UC Cloud Summit 2011 – LBL Campus Update
High Performance Computing Services (HPCS)
HPC activities – Condo Cluster Computing:
• A new cluster support model.
• Aims to achieve flexibility, sharing, and better utilization of hardware.
• PIs purchase cluster hardware (nodes, leaf switches & cables).
• PIs can purchase additional storage beyond what is provided.
• Condo hardware must be refreshed within 4 years.
• PIs get free compute time equivalent to their contribution.
Making a Condo
• HW connected and shared with institutional cluster Lawrencium.
• PI purchased storage will be accessible on all the condo nodes.
• Scheduling policies tuned to give faster turnaround time for PI jobs.
• Monthly cluster support charges are waived.
• Flexibility for PIs to use more resources (than purchased) when needed.
• Easy mechanism to share idle resources with other users in the Lab.
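In practice, the turnaround and idle-sharing policies above come down to scheduler configuration. LBL's actual scheduler and settings are not given here; a hypothetical Slurm-style fragment (partition names and node ranges are made up) shows one way to give a PI's jobs priority on condo nodes while letting other Lab users preemptably backfill idle ones:

```
# Hypothetical slurm.conf fragment -- names and node ranges are illustrative.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
# PI jobs land here first and are never preempted.
PartitionName=condo_pi  Nodes=n[0001-0016] Priority=100 PreemptMode=OFF
# Other Lab users may backfill the same nodes, but their jobs are
# requeued when condo_pi work arrives.
PartitionName=lr_shared Nodes=n[0001-0016] Priority=10  PreemptMode=REQUEUE
```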
IT Cloud Developments:
Evaluating services for the past 3 years
Google Apps including Google Docs and Sites
Collaborative services like Manymoon and Smartsheet.
Point and Ship (for managing shipping)
Daptiv (Ops project management)
All systems leverage IT’s Identity Management infrastructure (SAML/Shibboleth).
Future Cloud Developments:
Additional Google apps like Google Code, Reader & Picasa
Taleo, a SaaS Talent Management Application.
Carbonite, a SaaS service for user-managed desktop backups.
IT service - Virtual Machine Hosting:
• VMware-based virtual machine environment
• Over 100 virtual machines running
IT service - Cloud Hosting (Amazon EC2):
• Provides computing resources on Amazon’s AWS Platform.
• CentOS AMIs with standard IT monitoring tools
• Option to create a VPN connection to LBL.
• IT manages the OS and Amazon layers
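To make the hosting workflow concrete, here is a minimal boto3-style sketch of assembling an EC2 RunInstances request; the AMI ID, instance type, and tag names are assumptions for illustration, not IT's actual values:

```python
def build_launch_request(ami_id, instance_type="m1.large", count=1):
    """Assemble keyword arguments for an EC2 RunInstances API call."""
    return {
        "ImageId": ami_id,  # e.g. a CentOS AMI with monitoring tools baked in
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "ManagedBy", "Value": "LBL-IT"}],  # illustrative tag
        }],
    }

# With AWS credentials configured, the request could be issued via boto3:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.run_instances(**build_launch_request("ami-0123456789abcdef0"))
```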
Support from Network Infrastructure - ESnet
• ESnet peers with multiple cloud providers (including Amazon, Google, Microsoft).
• When possible, we peer in multiple locations (Bay Area, Chicago, etc) and we're
eager to peer with other providers as well
• We're interested in pushing advanced network services (including virtual circuits and
performance monitoring) into cloud contexts
• Multiple DOE-funded scientists are actively researching clouds for computation.
• Several DOE sites are sourcing cloud services.
Questions about ESnet and cloud? Please send email to email@example.com
Experiments with Amazon EC2 services:
Seeking Supernovae in the Clouds : A Performance Study
– K. Jackson, L. Ramakrishnan, K. Runge, R. Thomas.
• AWS can be very useful for scientific computing.
• Porting today requires significant effort.
• Failures occur frequently, and applications must be able to handle them gracefully.
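The failure-handling point can be made concrete with a retry helper; this is a minimal sketch, not the study's code, and the exception type and backoff parameters are illustrative:

```python
import random
import time

def with_retries(task, attempts=5, base_delay=1.0):
    """Run task(), retrying with jittered exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return task()
        except OSError:  # stand-in for a lost node or dropped connection
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt) * random.random())
```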
Performance Analysis of HPC applications on the AWS Cloud
– K. Jackson, L. Ramakrishnan, K. Muriki, S. Cannon, S. Cholia, J. Shalf, H. Wasserman, N. Wright.
• Data show that the more communication an application performs, the worse its EC2 performance.
• The shared nature of the virtualized environment introduces significant variability into EC2 performance.
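One common way to quantify that run-to-run variability is the coefficient of variation of repeated benchmark timings; a minimal sketch, with made-up example timings rather than measured data:

```python
import statistics

def coefficient_of_variation(runtimes):
    """Standard deviation as a fraction of the mean: higher = less predictable."""
    return statistics.stdev(runtimes) / statistics.mean(runtimes)

# Hypothetical repeated timings (seconds) of the same job; not measured data.
ec2_runs = [679, 702, 915, 688, 731]
local_runs = [566, 570, 561, 568, 565]

print(round(coefficient_of_variation(ec2_runs), 3))    # shared, virtualized
print(round(coefficient_of_variation(local_runs), 3))  # dedicated local nodes
```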
Berkeley Lab Contributes Expertise to New Amazon Web Services Offering
“When we applied these tests to the new Cluster Compute Instances for Amazon EC2, we
found that the new offering performed 8.5 times faster than the previous Amazon instance
types.” --K. Jackson.
Cloud computing for Science.
-- G. Bell, K. Jackson, G. Kurtzer, J. Li, K. Muriki, L. Ramakrishnan, J. White.
• Large scale MPI has a high overhead on EC2.
• Enables data-intensive science.
HPC Cloud Applied to Lattice Optimization*
– C. Sun, H. Nishimura, S. James, K. Song, K. Muriki, Y. Qin, Lawrence Berkeley National Laboratory, CA 94720, U.S.A.

Abstract
As Cloud services gain in popularity for enterprise use, vendors are now turning their focus towards providing cloud services suitable for scientific computing. Recently, Amazon Elastic Compute Cloud (EC2) introduced the Cluster Compute Instances (CCI), a new instance type specifically designed for High Performance Computing (HPC) applications. At Berkeley Lab, the physicists at the Advanced Light Source (ALS) have been running Lattice Optimization on a local cluster, but the queue wait time and the flexibility to request compute resources when needed are not ideal for rapid development work. To explore alternatives, for the first time we investigate running the Lattice Optimization application on Amazon’s new CCI to demonstrate the feasibility and trade-offs of using public cloud services for science.

Amazon CCI Instances
• Recently introduced by Amazon EC2
• Available only from the US-EAST region today
• Pre-defined architecture & hardware specification:
  – Dual quad-core Intel Nehalem processors
  – 64-bit platform
  – 23 GB memory
  – 10 Gb Ethernet
• HVM (Hardware Virtual Machine) virtualization
• Hardware not shared with multiple instances at the same time
• All other advantages of Amazon EC2: on-demand access, no upfront costs, pay as you go
[Figure: Amazon EC2 Regions and Zones]

Cost Comparison (EC2 vs. local cluster)
Amazon EC2 provides hardware, facilities, and electricity & cooling at $0.20 per core-hour; on a locally managed cluster these are separate costs, along with the SW effort (the support and work involved in creating a usable environment from the hardware). Researchers can obtain access to locally managed shared clusters at a much lower cost, but it is difficult to calculate the actual cost of a core-hour on a locally managed cluster, because:
• Facilities may be a sunk cost.
• Electricity and cooling depend on local rates and data-center efficiencies.
• Effective cost per core-hour depends on local cluster utilization.

Cluster Configurations
Cluster               EC2     LRC     Mako    LR2
Cores/Node            16      8       8       12
Memory/Node           23 GB   16 GB   24 GB   24 GB
Interconnect (Gb/s)   10      20      40      40
Hyper-Threading (HT)  On      Off     Off     Off
Virtualization        On      Off     Off     Off

[Figure: Cluster block diagram – compute nodes 1..N and an NFS server for the EBS volume on 10 Gb Ethernet; all nodes are cc1.4xlarge instances; application code & I/O reside in EBS.]

Runtime on Clusters
Cluster      EC2   LRC   Mako   LR2
Time (secs)  679   857   724    566

[Figure: Lattice Optimization solution plot]
[Team photo – Top row: K. Muriki, H. Nishimura, Y. Qin, K. Song; Bottom row: S. James, C. Sun]

Conclusion
The increased performance of the recently introduced Amazon CCI better meets the needs of the scientific community, and this makes it a good option for researchers needing access to on-demand computing capacity. However, as demonstrated in this paper, EC2 may work less well for large-scale parallel applications that depend heavily on memory and interconnect performance. Therefore, it remains important for researchers to benchmark their particular application and review their local costs when making a decision to use the Cloud.

Advanced Light Source | Information Technology
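The poster's $0.20 per core-hour figure can be turned into a rough per-run cost estimate; a minimal sketch, in which the 4-node count is an assumption, since the poster does not state how many CCI nodes ran the job:

```python
def run_cost(runtime_secs, nodes, cores_per_node, price_per_core_hour):
    """Cost of one run if billed strictly per core-hour (real EC2 billing
    rounds up to whole instance-hours, so actual charges would be higher)."""
    core_hours = runtime_secs / 3600 * nodes * cores_per_node
    return core_hours * price_per_core_hour

# 679 s is the EC2 runtime reported in the poster; 4 nodes of 16 cores
# (cc1.4xlarge with Hyper-Threading) is a made-up example configuration.
print(f"${run_cost(679, nodes=4, cores_per_node=16, price_per_core_hour=0.20):.2f}")
```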
Magellan Project Mission:
• Determine the appropriate role for commercial and/or private cloud computing for DOE/SC midrange workloads.
• Deploy a test bed cloud to serve the needs of mid-range scientific computing.
• Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models.