U.S. ATLAS Testbed Status Report

                U.S. ATLAS Grid Testbed
                    Status and Plans



           Kaushik De
   University of Texas at Arlington



     DoE/NSF Mid-term Review

   NSF Headquarters, June 2002
                        Outline

 • Testbed Phase 2 launched: UTA Workshop
    http://heppc1.uta.edu/atlas/workshop_april_2002/index.html

 • New focus on rapid software deployment
   and grid-based data production, leading to
   demonstrations at Supercomputing 2002
 • Kaushik De coordinating U.S. Testbed and
   SC2002 planning since mid-April 2002
 • This talk is based on new & evolving plans
        - Testbed status
        - Software distribution
        - Application toolkit
        - MC production plans
        - Monitoring
        - Grid tools
        - Integration
        - SC2002 demos

                Testbed Goals

 • Demonstrate success of the grid computing
   model for High Energy Physics
        - in data production
        - in data access
        - in data analysis
 • Develop, deploy and test grid middleware
   and applications
        - integrate middleware with applications
        - simplify deployment - robust, rapid & scalable
        - interoperate with other testbeds & grid
          organizations (iVDGL, DataTAG…)
        - provide a single point of service for grid users
 • Evolve into a fully functioning, scalable,
   distributed tiered grid



             Testbed Website

 http://heppc1.uta.edu/atlas/grid-testbed/index.htm
[Screenshot of the testbed website omitted]
             Grid Testbed Sites


 [Map of grid testbed sites: Lawrence Berkeley National
  Laboratory, Argonne National Laboratory, University of Texas
  at Arlington, Oklahoma University, Indiana University,
  University of Michigan, Boston University, Brookhaven
  National Laboratory]

     U.S. ATLAS testbed launched February 2001
               Testbed Fabric

 • 8 production gatekeepers - ANL, BNL,
   LBNL, BU, IU, UM, OU, UTA
        - http://heppc1.uta.edu/atlas/grid-testbed/testbed-sites.htm

 • Large clusters at BNL, LBNL, IU, UTA, BU
        - BNL: RCF, LBNL: PDSF, IU/BU: prototype Tier 2
        - UTA awarded an NSF MRI for acquisition of a D0 &
          ATLAS grid facility ($950k+$400k) - Thanks!
 • Plus multiple R&D gatekeepers
        - gremlin@bnl - iVDGL GIIS
        - heppc5@uta - ATLAS hierarchical GIIS
        - atlas10/14@anl - EDG testing
        - heppc6@uta + gremlin@bnl - Glue schema
        - heppc17/19@uta - GRAT development
        - a few sites - Grappa portal
        - bnl - VO server
        - a few sites - iVDGL testbed
         Software Distribution

 • Jason Smith, Kaushik De, Saul Youssef,
   Wensheng Deng, Shava Smallen
 • Goals:
        - Easy installation by system administrators
        - Uniform software versions
        - Pacman is perfect for this task (example below)
 • First stage deployment
        - Done - May 2002 ✓
        - Pacman, Globus 2.0b, CERNLIB ✓
        - GRAT application/production package ✓
 • Second stage deployment
        - Magda, Grappa - June 2002
        - Tools for distributed production
 • Third stage
        - VDT 1.1.1, Chimera, … - July/August 2002

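For a flavor of Pacman-based installation, a site administrator's
session looks roughly like this (the cache and package names are
hypothetical, not the actual U.S. ATLAS cache layout):

    # fetch a package, plus all of its dependencies, from a named cache
    % pacman -get ATLAS:GRAT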
             Available Packages
[Screenshot of the available package list omitted]
             Applications Team

 • Horst Severini, Kaushik De, Dan Engh,
   Wensheng Deng, Ed May
 • Goal: enable physicists to use the testbed without worrying
   about underlying middleware or ATLAS software

 • Athena-Atlfast for the grid testbed
        - Tool 1: runs on any Globus-enabled node (requires
          transfer of a ~17MB executable package; sketch below)
        - Tool 2: runs on grid sites where the executable
          package has been preinstalled
        - Tool 3: runs on AFS-enabled sites (the latest
          version of the software is built and used)

 • GRid Applications Toolkit: GRAT
        - the above plus grid tools - ver 0.1 released 4/12/02 ✓
        - tested successfully on 17 U.S. ATLAS
          gatekeepers, a CMS gatekeeper, a D0 gatekeeper,
          an EDG CE node (RH 6.x and RH 7.x), ...
        - Version 0.3 of GRAT released May 8, 2002

 • Next, add Magda+ & merge with Grappa
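Under the hood, Tool 1 is ordinary Globus job submission; a minimal
sketch with the Globus 2.0 command-line tools (the gatekeeper contact
and wrapper script name are illustrative, not the actual GRAT
internals):

    # obtain a grid proxy from the user's certificate
    % grid-proxy-init
    # stage the Atlfast wrapper script to a gatekeeper and run it there
    % globus-job-run heppc1.uta.edu/jobmanager-pbs -s ./run-atlfast.sh 1000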
                    GRAT v 0.3
[Screenshot of GRAT v0.3 omitted]
 • Script-based toolkit. Now merging with the Grappa
   visual GUI tool (see Gardner talk)

             Testbed Production

 • Goals:
        - Demonstrate distributed ATLAS data production,
          access and analysis using grid middleware and
          tools developed by the testbed group
 • Plans:
        - Atlfast production to test middleware and tools,
          and to produce physics data for summer students,
          based on Athena-Atlfast, using VDT+Magda
          +Chimera and both GRAT and Grappa
                · 2 weeks to regenerate data, once a month
                · deploy new tools and middleware each cycle
                · move away from the farm paradigm to the grid model
                · very aggressive schedule - people-limited!
        - DC1 production to test fabric capabilities and to
          produce and access data, using the old Fortran codes
          atlsim, atrig and atrecon (see previous talks)
                · not repeatable - hard to actively test grid software
                · increase U.S. participation - involve the grid testbed

              Atlfast Production

 • Application: Athena-Atlfast
        - Current version 3.0.1. The next release will be
          3.2.0 (the official DC1 release)
 • Middleware: VDT+Magda+Chimera
 • Interface: GRAT, Grappa
 • Sites: 8 ATLAS testbed sites, 2 CMS testbed
   sites, 2 D0 MC farms, EDG sites? TeraGrid sites?
 • June 2002: Phase Alpha
        - Demonstrate software deployment and a simple
          production system - done ✓




              Summer Schedule


 • July 1-15: Phase 0, 10^7 events
        - Globus 2.0 beta, Athena 3.0.1, Grappa, common
          disk model, Magda, 5 physics processes, BNL VO
          manager, minimal job scheduler, GridView
          monitoring

 • August 5-19: Phase 1, 10^8 events
        - VDT 1.1.1, hierarchical GIIS server, Athena-Atlfast
          3.2.0, Grappa, Magda - data & replica
          management with metadata catalogue, 10 physics
          processes, static MDS-based job scheduler (sketch
          below), new visualization

 • September 2-16: Phase 2, 10^9 events,
   1 TB storage, 40k files
        - Athena-Atlfast 3.2.0 instrumented, 20 physics
          processes, upgraded BNL VO manager, dynamic
          job scheduler, fancy monitoring

 • Analysis tools still need some planning
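A static MDS-based job scheduler can be as simple as ranking
gatekeepers by the free CPUs they publish. A sketch under assumptions
(the host list and the Mds-Cpu-Total-Free attribute name are
illustrative; exact names depend on the deployed schema):

    #!/bin/sh
    # query each gatekeeper's GRIS and pick the least loaded site
    best=""; most=-1
    for gk in heppc1.uta.edu heppc5.uta.edu; do
        free=`grid-info-search -x -h $gk -p 2135 \
                -b "mds-vo-name=local, o=grid" \
                "(objectclass=MdsCpu)" Mds-Cpu-Total-Free \
              | awk -F': ' '$1 == "Mds-Cpu-Total-Free" {s += $2} END {print s+0}'`
        if [ "$free" -gt "$most" ]; then most=$free; best=$gk; fi
    done
    echo "submit to $best ($most free CPUs)"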

              Atlfast Production Architecture

 [Architecture diagram: the user submits jobs through the Grappa
  portal or GRAT scripts; a resource broker uses MDS information
  to dispatch boxed Athena-Atlfast jobs via Globus to the compute
  sites; Magda manages files on the storage elements, alongside a
  virtual data catalogue (VDC); jobOptions cover the Higgs, SUSY,
  QCD, Top and W/Z physics processes. A wrapper sketch follows.]
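One production job in this architecture can be pictured as a small
wrapper script (entirely illustrative; the real GRAT scripts and the
Magda command-line arguments differ):

    #!/bin/sh
    # run boxed Athena-Atlfast for one jobOptions choice, then
    # register the output file with Magda
    JOBOPT=$1        # physics process, e.g. Higgs, SUSY, QCD, Top, W/Z
    NEVT=$2          # number of events to generate
    OUT=atlfast.$JOBOPT.$$.ntuple
    ./run-athena-atlfast.sh $JOBOPT $NEVT $OUT   # hypothetical boxed executable
    magda_putfile $OUT                           # arguments illustrative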
             Monitoring Team

 • Dantong Yu, Patrick McGuigan, Craig Tull,
   Kaushik De, Shawn McKee, Dan Engh,
   Jason Smith
 • Monitoring is critically important in
   distributed grid computing
        - check system health, debug problems
        - discover resources using static data
        - make job scheduling and resource allocation decisions
          using dynamic data from MDS and other monitors

 • Testbed monitoring priorities
        - Discover site configuration
        - Discover software installation
        - Application monitoring
        - Grid status/operations monitoring

 • Also needed
        - Well-defined data for job scheduling
        - Visualization

       Monitoring - Back End

 • Publishing MDS information
        - Glue schema - BNL & UTA
        - Pippy - Pacman information service provider
        - BNL ACAS schema
        - Hierarchical GIIS server (query example below)
 • Non-MDS back ends
        - iperf, NetLogger, Prophesy, Ganglia
 • Archiving
        - MySQL
                · GridView, BNL ACAS
        - RRD
                · Network
 • Work needed
        - What to store?
        - Replication of archived information
 • Good progress on the back end!
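Any Globus client can query the testbed information services
directly; for example, an anonymous search of the iVDGL GIIS (the
full gremlin hostname and the VO name are assumptions based on
"gremlin@bnl" above):

    # anonymous LDAP search of the GIIS on the standard MDS port
    % grid-info-search -x -h gremlin.usatlas.bnl.gov -p 2135 \
          -b "mds-vo-name=ivdgl, o=grid" "(objectclass=MdsHost)"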
       Monitoring - Front End

 • MDS based
        - GridView, Gridsearcher
        - Converting TeraGrid and other toolkits
 • Non-MDS
        - Cricket, Ganglia
 • Work needed
        - Urgent for SC2002! Graphs, maps, drill-down…
        - New visualization team: Dantong Yu (evaluation
          of existing tools), Patrick McGuigan (Java CoG,
          Python), Jason Smith (PHP)




                  GridView 2.2

 • Simple visualization tool using the Globus Toolkit
        - First native Globus application for the ATLAS grid (March 2001)
        - Collects information using Globus tools; archival information
          is stored in a MySQL server on a second machine, and the data
          are published through a web server on a third machine
          (sketched below)
 http://heppc1.uta.edu/atlas/grid-status/index.html
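
The collection and archiving loop can be sketched as a cron job (host
names, database and table are hypothetical; GridView's actual scripts
differ in detail):

    #!/bin/sh
    # probe each gatekeeper with a trivial Globus job, archive the result
    for gk in heppc1.uta.edu gremlin.usatlas.bnl.gov; do
        if globus-job-run $gk /bin/date > /dev/null 2>&1; then up=1; else up=0; fi
        echo "INSERT INTO site_status (site, alive, checked) \
              VALUES ('$gk', $up, NOW());" | mysql -h dbhost gridview
    done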




                Testbed Tools

 • Many tools developed by the U.S. ATLAS
   testbed group during the past year
 • GridView - simple tool to monitor the status of the
   testbed (Kaushik De, Patrick McGuigan)
 • Gripe - unified user accounts (Rob Gardner)
 • Magda - MAnager for Grid DAta (Torre Wenaus,
   Wensheng Deng; see Gardner & Wenaus talks)
 • Pacman - package management and distribution
   tool (Saul Youssef)
        - being widely used or adopted by the iVDGL VDT,
          Ganga, and others (see Gardner talk)
 • Grappa - web portal using active notebook
   technology (Shava Smallen; see Gardner talk)
 • GRAT - GRid Applications Toolkit
 • Gridsearcher - MDS browser (Jennifer Schopf)
 • GridExpert - knowledge database (Mark Sosebee)
 • VO Toolkit - site AA (Rich Baker; see Baker talk)
 • ...
                   Integration!!

 • Coordination with other grid efforts and
   software developers - a very difficult task!
 • Project centric:
        - GriPhyN/iVDGL - Rob Gardner
        - PPDG - Torre Wenaus
        - EDG - Ed May, Jerry Gieraltowski
        - ATLAS/LHCb - Rich Baker
        - ATLAS/CMS - Kaushik De
        - ATLAS/D0 - Jae Yu
 • Fabric/middleware centric:
        - AFS software installations - Alex Undrus, Shane
          Canon, Iwona Sakrejda
        - Networking - Shawn McKee, Rob Gardner
        - Virtual and real data management -
          Wensheng Deng, Sasha Vaniachine, Pavel
          Nevski, David Malon, Rob Gardner, Dan Engh,
          Mike Wilde, Yong Zhao, Shava Smallen
        - Security/Site AA/VO - Rich Baker, Dantong Yu

                 SC2002 Plans

 • SC2002 in Maryland, mid-November
 • Testbed production demo (BNL; Kaushik De)
        - Monitor/interact with grid production
 • ATLAS/CMS demo (FNAL/SLAC; Kaushik De)
        - preliminary discussions with CMS
        - may become an iVDGL demo (see Gardner talk)
        - ATLAS GRAT already running at CMS sites
        - GridView is monitoring two CMS sites
 • Application monitoring (LBNL; Craig Tull)
        - Athena + NetLogger + Prophesy
 • Virtual data demo (ANL/UC/IU; Rob Gardner)
 • Common areas
        - Brochure - Rob Gardner
        - Posters - Craig Tull
        - Common script - Jennifer Schopf




   Testbed Production Demo (in BNL booth)

 • ATLAS physics story
 • ATLAS computing story
 • Visualize production:
        - Monitor site status
                · static - Glue, Pippy
                · dynamic - jobs, CPU usage
        - Monitor data status
                · Magda - visual?
                · VDC (same as IU booth)
        - Monitor applications
                · Athena instrumented (same as LBNL booth)

 • Event display?
 • First version at the LBNL U.S. Computing
   meeting, July 29-31



              ATLAS-CMS Demo Architecture

 [Architecture diagram: an ATLAS-CMS user job enters via Globus
  (Condor-G?); a scheduling and policy layer (Condor, Python?)
  dispatches production jobs to the ATLAS-CMS testbed through
  MOP, GRAT and Grappa; MDS, Ganglia and PAW/Root feed the
  SC2002 demo visualization of status and physics]
                      Summary

 • Testbed -> SC2002
        - Recently refocused testbed activities and plans
        - Important grid-based production milestone this
          summer to test middleware, using a lightweight,
          layered approach to software deployment
        - Testbed production should lead naturally to the
          Supercomputing 2002 demos
        - Exploring various integration and cooperation
          issues - no need to reinvent the wheel
        - The testbed can provide a lot of resources,
          hardware and people, when fully grid-enabled
        - In summary - hardware is not the limiting problem
          yet! Middleware is coming along. Serious work is
          needed on integration, deployment and testing. The
          shortage of people is critical here - lab and
          university base-funding shortages are the limiting
          factors!!


