PowerPoint Presentation - UC Grid

					UC Cloud Summit 2011 – LBL Campus Update

                                    Krishna Muriki,
                      High Performance Computing Services (HPCS),
                                      IT Division.
HPC activities – Condo Cluster Computing:
    Model
       • A new cluster support model.
       • To achieve flexibility, sharing and better utilization of hardware.

       • PIs purchase cluster hardware (nodes, leaf switches & cables).
       • PIs can purchase additional storage beyond what is provided.
       • HW in the condo must be refreshed within 4 years.
       • PIs get free compute time equivalent to their contribution.

   Making a Condo
       • HW is connected to and shared with the institutional cluster, Lawrencium.
       • PI-purchased storage will be accessible on all the condo nodes.
       • Scheduling policies are tuned to give faster turnaround time for PI jobs.

    Advantages
      • Monthly cluster support charges are waived.
       • Flexibility for PIs to use more resources (than purchased) when needed.
       • Easy mechanism to share idle resources with other users in the Lab.
IT Cloud Developments:

    Evaluating services for the past 3 years
    Google Apps including Google Docs and Sites
    Google Calendar
    Collaborative services like Manymoon and Smartsheet.
    Business Systems
       Point and Ship (for managing shipping)
       Daptiv (Ops project management)

   All systems leverage IT’s identity management infrastructure (SAML/Shib).

Future Cloud Developments:

    Additional Google apps like Google Code, Reader & Picasa
    Taleo, a SaaS Talent Management Application.
    Carbonite, a SaaS service for user-managed desktop backups.
IT service - Virtual Machine Hosting:

   • VMware-based virtual machine environment
   • Over 100 virtual machines running

IT service - Cloud Hosting (Amazon EC2 server):

   • Provides computing resources on Amazon’s AWS Platform.
   • CentOS AMIs with standard IT monitoring tools
   • Option to create a VPN connection to LBL.
   • IT manages the OS and Amazon layers
Support from Network Infrastructure - ESnet

   • ESnet peers with multiple cloud providers (including Amazon, Google, Microsoft).
   • When possible, we peer in multiple locations (Bay Area, Chicago, etc.), and we're
   eager to peer with other providers as well.
   • We're interested in pushing advanced network services (including virtual circuits and
   performance monitoring) into cloud contexts.
   • Multiple DOE-funded scientists are actively researching clouds for computation,
   services, and storage.
   • Several DOE sites are sourcing cloud services.

   Questions about ESnet and cloud? Please send email to
Experiments with Amazon EC2 services:

    Seeking Supernovae in the Clouds : A Performance Study
     – K. Jackson, L. Ramakrishnan, K. Runge, R. Thomas.
      • AWS can be very useful for scientific computing.
      • Porting today requires significant effort.
      • Failures occur frequently, and applications must be able to handle them gracefully.
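
One common way to handle frequent failures gracefully is to retry a failed step with exponential backoff. The following is only a minimal sketch of that general technique, assuming the failures are transient and the step is safe to re-run. The function `run_with_retries` and all of its parameters are hypothetical names for illustration; they are not taken from the study.

```python
import random
import time


def run_with_retries(task, max_attempts=5, base_delay=1.0, max_delay=60.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Hypothetical helper: the delay doubles after each failure, is capped
    at max_delay, and is jittered so many workers do not retry in lockstep.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except retry_on as err:
            last_error = err
            if attempt == max_attempts:
                break
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: sleep between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

A caller would wrap a single unit of work, for example `run_with_retries(lambda: process_chunk(path))`, where `process_chunk` stands in for whatever step of the application can fail. Passing a narrower `retry_on` tuple keeps programming errors from being retried.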

    Performance Analysis of HPC applications on the AWS Cloud
     – K. Jackson, L. Ramakrishnan, K. Muriki, S. Cannon, S. Cholia, J. Shalf, H. Wasserman, N. Wright.
      • Data shows that the more an application communicates, the worse its EC2 performance.
      • The shared nature of the virtualized environment causes significant variability
      in EC2 performance.

   Berkeley Lab Contributes Expertise to New Amazon Web Services Offering
      “When we applied these tests to the new Cluster Compute Instances for Amazon EC2, we
      found that the new offering performed 8.5 times faster than the previous Amazon instance
      types.” --K. Jackson.
 Cloud computing for Science.
  -- G. Bell, K. Jackson, G. Kurtzer, J. Li, K. Muriki, L. Ramakrishnan, J. White.

    • Large scale MPI has a high overhead on EC2.
    • Enables data-intensive science.
 HPC Cloud Applied to Lattice Optimization
   – C. Sun, H. Nishimura, S. James, K. Song, K. Muriki, Y. Qin, Lawrence Berkeley National Laboratory, CA 94720, U.S.A.

    “Increased performance of the recently introduced AWS CCI instances better meet the needs of scientific
    community, however EC2 may work less well for large-scale parallel applications that depend heavily on memory
    and interconnect performance. It remains important for researchers to benchmark their particular application
    and review the local costs when making a decision to use the cloud.”

    Abstract
       As Cloud services gain in popularity for enterprise use, vendors are now turning their focus towards
       providing cloud services suitable for scientific computing. Recently, Amazon Elastic Compute Cloud (EC2)
       introduced the Cluster Compute Instances (CCI), a new instance type specifically designed for High
       Performance Computing (HPC) applications.

       At Berkeley Lab, the physicists at the Advanced Light Source (ALS) have been running Lattice Optimization
       on a local cluster, but the queue wait time and the flexibility to request compute resources when needed
       are not ideal for rapid development work.

       To explore alternatives, for the first time we investigate running the Lattice Optimization application on
       Amazon’s new CCI to demonstrate the feasibility and trade-offs of using public cloud services for science.

    Amazon CCI Instances
       • Recently introduced by Amazon EC2
       • Available only from US-EAST region today
       • Pre defined architecture & hardware specification
       • Instance specifications:
          • Dual quad-core Intel Nehalem processors
          • 64 bit platform
          • 23 GB memory
          • 10Gb Ethernet
       • HVM (Hardware Virtual Machine)
       • HW not shared with multiple instances at the same time.
       • All other advantages of Amazon EC2
          • On demand access
          • No upfront costs
          • Pay as you go
          • Scalable

       [Figure: Amazon EC2 Regions and Zones]

    Cost Comparison (EC2 vs local cluster)
       The table below lists the services provided in each option.

                                      EC2      Local Cluster
          Hardware                    X        X
          Facilities                  X        X
          Electricity & HW effort
          SW effort                   Not      X

       SW effort represents the support and work involved in creating a usable environment using the hardware.

       Amazon EC2 provides above listed services @ $0.20 per core hour. Researchers can obtain access to locally
       managed shared clusters at a much lower cost. It is difficult to calculate the actual cost of a core hour
       on a locally managed cluster, because:
          • Facilities may be a sunk cost.
          • Electricity and cooling depend on local rates and data center efficiencies
          • Effective cost per core/hour depends on local cluster

    Cluster Configurations

                                EC2      LRC      Mako     LR2
          CPU Arch
          CPU Freq (GHz)
          Cache (MB)
          HT                    On       Off      Off      Off
          Interconnect (1)      10       20       40       40
          Virtualization        On       Off      Off      Off
          Cores/Node            16       8        8        12
          Memory/Node           23 GB    16 GB    24 GB    24 GB

          (1) Gb/s

    Cluster Block Diagram
       [Figure: Master/Head Node (NFS server for EBS vol) connected over 10 Gb Ethernet to Compute Node 1,
       Compute Node 2, ... Compute Node N]
       All nodes cc1.4xlarge type instances.
       Application code & I/O in EBS.

    Performance Efficiencies
       Runtime on Clusters

                          EC2      LRC      Mako     LR2
          Time (secs)     679      857      724      566

       [Figure: Lattice Optimization solution plot]

    Team
       Top row: K. Muriki, H. Nishimura, Y. Qin, K. Song;
       Bottom row: S. James, C. Sun

    The increased performance of the recently introduced Amazon CCI better meets the needs of the scientific
    community and this makes it a good option for researchers needing access to on-demand computing capacity.
    However, as demonstrated in this paper, EC2 may work less well for large-scale parallel applications that
    depend heavily on memory and interconnect performance. Therefore, it remains important for researchers to
    benchmark their particular application and review their local costs when making a decision to use the Cloud.

    Advanced Light Source | Information Technology
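
The poster's runtime figures (679, 857, 724 and 566 seconds) and its $0.20 per core hour EC2 rate lend themselves to a small back-of-the-envelope calculation. The sketch below is illustrative only. The runtimes and the EC2 rate come from the poster; the function names, the core count passed to `ec2_run_cost`, and every input to `local_cost_per_core_hour` are hypothetical, and the cost formula ignores any per-hour billing granularity.

```python
# Runtimes in seconds, as listed in the poster's "Runtime on Clusters" table.
RUNTIME_SECS = {"EC2": 679, "LRC": 857, "Mako": 724, "LR2": 566}

# USD per core hour, as quoted on the poster.
EC2_RATE_PER_CORE_HOUR = 0.20


def relative_runtime(runtimes, baseline):
    """Return each cluster's runtime divided by the baseline cluster's runtime."""
    base = runtimes[baseline]
    return {name: secs / base for name, secs in runtimes.items()}


def ec2_run_cost(runtime_secs, cores, rate=EC2_RATE_PER_CORE_HOUR):
    """Cost in USD of one run that holds `cores` cores for `runtime_secs`.

    Assumption: billing is proportional to core-seconds used.
    """
    return runtime_secs / 3600.0 * cores * rate


def local_cost_per_core_hour(annual_cost_usd, total_cores, utilization):
    """Effective local cost per core hour actually used.

    All three inputs are hypothetical: annual_cost_usd is whatever yearly
    cost a site attributes to the cluster, and utilization is the fraction
    of available core hours that run real jobs (0 < utilization <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hours_per_year = 24 * 365
    return annual_cost_usd / (total_cores * hours_per_year * utilization)


if __name__ == "__main__":
    for name, ratio in relative_runtime(RUNTIME_SECS, "LR2").items():
        print(f"{name}: {ratio:.2f}x the LR2 runtime")
    # The poster does not state how many cores the run used; 64 is a placeholder.
    print(f"EC2 cost at 64 cores: ${ec2_run_cost(RUNTIME_SECS['EC2'], 64):.2f}")
```

Because utilization sits in the denominator, the same hardware looks twice as expensive per core hour at 50% utilization as at 100%, which is one reason the local figure is hard to pin down.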
NERSC Initiatives:

   Magellan Project Mission:

      • Determine the appropriate role for commercial and/or private cloud
      computing for DOE/SC midrange workloads
      • Deploy a test bed cloud to serve the needs of mid-range scientific
      • Evaluate the effectiveness of this system for a wide spectrum of DOE/SC
      applications in comparison with other platform models.
