Amazon Web Services Training by zhangyun


Virtualization & Life Science
2010 BioIT World Web Symposia Series

                                     - http://www.bioteam.n
Who am I?
• I’m from the BioTeam
  ▫ Independent consulting shop
  ▫ Staffed by scientists forced to
    learn IT to get our own
    research done

• Found a fun business niche
  ▫ Bridging the “gap” between
    science, IT & high performance
    computing

• This matters today because …
  ▫ We work daily in pharma,
    biotech, academic, government
    & military sites
      … often we can talk/share our
  ▫ We have no financial
    entanglements with providers of
    virtualization products
• When it comes to
  virtualization I’m beholden
  to nobody
• Plan on speaking only on
  what I’ve done, seen and
• Think of me as an on-the-
  ground spy reporting on
  trends and applications, not
  a pundit, “expert” or
  ▫ No excuses for not doing
    your own due diligence

I am not planning on …
• Explaining what virtualization is
• Selling you on a particular product or strategy
• Covering the pros & cons in any vague or generic sense

I do plan to talk
• … specifically about products and strategies we often see in
  research IT environments
• What I feel are the best benefits to IT and the
• Will also be specific about what my own company is

Meta-Topics for Today
• Why virtualize?
• Benefits: IT Organizations
• Benefits: Research
• Benefits: Everyone
• Some practical advice

Reasons to virtualize
Hype aside, what are the real benefits?

Why virtualize?
Three main areas of benefit …

• Enhances Research
• Enhances IT Operations
• Saves Money

Why virtualize?
Enhancing research

1. Gives researchers new capabilities
2. Increases the speed at which researchers
   can gain access to existing capabilities
3. Ideal meeting ground for when informatics
   & IT have potentially conflicting goals

Why virtualize?
Research: New Capabilities

• Desktop level virtualization gives
  researchers a manageable way to escape
  the confines of a managed corporate
  desktop OS image
• Uses:
   ▫ Access to Linux
   ▫ Sandbox environment for software dev &
   ▫ Testing of entire workflows & even small
     compute farms
Linux virtualization on Apple OS X

Why virtualize?
Research: Extend/Enhance Existing Capabilities

• Generally boils down to being able to do
  things faster
   ▫ Cliché example is rapidly provisioning new
     servers for research informatics workflows or
      Weeks or months to provision if hardware needs
       to be purchased & installed
      Days or weeks to provision if hardware already

Why virtualize?
Research: Extend/Enhance Existing Capabilities

• Also a nice way for Enterprise IT to meet
  Research IT halfway:
   ▫ Need: scientists often use webservers, databases
     and webapps for simple sharing of data among
     workgroups & projects
   ▫ Problem: scientists don’t typically code for
     security and their apps rarely rise to the level of
     needing to be added formally to the enterprise
     software portfolio
   ▫ Solution: enterprise qualified, patched & secured
     LAMP stack for researchers
Why virtualize?
And this brings me to …

• Virtual OS images can be an ideal “middle
  ground” between research informatics &
  Enterprise IT
   ▫ Especially IT support organizations …

Why virtualize?
Finding Common Ground between Research & Enterprise IT
• Absolutely standard, non-controversial needs in
  many research environments:
   ▫ Access to Linux
   ▫ Ability to easily install software, libraries and
     patches via internet & other repositories without
     having to file a helpdesk ticket or ask someone
     for permission
   ▫ Elevated access privileges to handle file and
     owner permissions in a project or workgroup
   ▫ Desire to run web, database and application
   ▫ Elevated access privileges to control servers &
     services
Why virtualize?
Finding Common Ground between Research & Enterprise IT

• Given the laundry list of requirements in the
  previous slide, how can Enterprise IT
  possibly support these crazy R&D people
  who need root access and write bad or
  insecure code?

   ▫ Provisioning enterprise-approved VM images
     makes this a manageable problem
   ▫ The VM image is also the ideal “line in the
     sand” when it comes to support
Why virtualize?
Finding Common Ground between Research & Enterprise IT

• The “blessed” Linux VM for researchers:
   ▫ Fully patched OS & kernel
   ▫ Already integrated with Active Directory
   ▫ Root password known only to IT
      Elevated privileges via managed /etc/sudoers
      Anyone who needs sudo is allowed to have it
      Sudo actions logged to external host for audit
   ▫ Configurable periodic snapshot-based
     backups
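
A rough sketch of what the "managed /etc/sudoers" and "logged to external host" pieces might look like when generated from a script. The group name `researchers`, the host `audit.example.org` and both function names are assumptions for illustration only; the slide does not say how the real image was built. The sudoers and rsyslog lines use standard syntax, but treat this as a starting point, not a vetted security configuration.

```python
def render_sudoers(group: str) -> str:
    """Return a sudoers drop-in that grants full sudo to one group."""
    lines = [
        "# Managed by IT; local edits will be overwritten",
        "Defaults syslog=authpriv",   # record sudo activity via syslog
        f"%{group} ALL=(ALL) ALL",    # group members may run any command
    ]
    return "\n".join(lines) + "\n"


def render_syslog_forward(log_host: str, port: int = 514) -> str:
    """Return an rsyslog rule that forwards authpriv messages over TCP."""
    return f"authpriv.* @@{log_host}:{port}\n"


# Hypothetical group and audit host, chosen only for the example.
print(render_sudoers("researchers"), end="")
print(render_syslog_forward("audit.example.org"), end="")
```

Generating both fragments from one place keeps the sudo grant and its audit trail in step, which is the point of the slide: anyone may have sudo, but every use is recorded off the VM.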
Why virtualize?
Finding Common Ground between Research & Enterprise IT

• The Middle Ground
   ▫ In exchange for having unlimited freedom on
     the Linux VM, researchers understand that
     they give up a certain amount of IT support &
   ▫ If the researchers freeze, kill, crash or
     corrupt the Linux VM during their work:
      IT will restore the VM from the last known-
       good snapshot or backup image
      No individual troubleshooting or support, if they
       “break” the system they will be given an older
        working backup image, nothing else.
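
The restore policy above is simple enough to sketch in a few lines. This is a minimal illustration, assuming each snapshot carries a timestamp and a "known good" flag; the `Snapshot` class and `last_known_good` function are hypothetical names, not part of any hypervisor API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    name: str
    taken_at: datetime
    known_good: bool


def last_known_good(snapshots: Sequence[Snapshot]) -> Optional[Snapshot]:
    """Return the newest snapshot flagged as known good, or None."""
    good = [s for s in snapshots if s.known_good]
    return max(good, key=lambda s: s.taken_at, default=None)


# Hypothetical history: the newest snapshot was taken after the VM broke.
history = [
    Snapshot("monday", datetime(2010, 5, 3), True),
    Snapshot("tuesday", datetime(2010, 5, 4), True),
    Snapshot("wednesday", datetime(2010, 5, 5), False),
]
print(last_known_good(history).name)  # prints "tuesday"
```

The support contract is the whole design: IT never debugs the broken VM, it only picks the restore point.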
Why virtualize?
Enhance IT Operations

Reduce Operational Burden
Remote mgmt. is non-trivial
1.   Remote power control
2.   Serial console switch
3.   Serial console cabling
4.   IP KVM Device

•    All these devices (and
     often more) are required
     to successfully provide a
     100% “lights-out” remote
     management capability

‘lights-out’ mgmt is baked into virtualized

Manage your virtual IT with an Apple iPad   ( shown: via SSH

Why virtualize?
Enhance IT Operations

• The “lights-out” stuff is nice but not a huge
   ▫ You still need remote power, serial & KVM to
     your hypervisor hosts after all!

• The real win is what comes next …

“Scriptable Infrastructure” is a BIG DEAL

      This single command will start a 5GB managed MySQL database in the Amazon
      cloud for $0.11/hour. The database is automatically patched, managed and
      backed up. Planned enhancements include auto-scaling & snapshots.
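
To put the $0.11/hour figure in perspective, here is the arithmetic as a small sketch. It assumes the instance runs around the clock and that a month is 30 days; storage, I/O and bandwidth charges are not included because the slide does not quote them.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.11")  # rate quoted on the slide, in USD


def running_cost(hours: int, rate: Decimal = HOURLY_RATE) -> Decimal:
    """Return the instance-hour cost of running for the given hours."""
    return rate * hours


print(running_cost(24))       # one day: 2.64
print(running_cost(24 * 30))  # assumed 30-day month: 79.20
```

`Decimal` is used instead of `float` so the cents come out exact.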

Why virtualize?
Scriptable VM infrastructure!

And just because we can … (SSH host management via

Why virtualize?
Save Money

• Can yield significant financial savings
• Four main ways
  1. API & management tools reduce admin staff
  2. Green IT (performance per watt, avg.
  3. Reduced physical footprint
  4. Storage efficiencies (*ymmv)

Case Study
Large Cancer Research Institute

• On-campus research datacenter almost full
• Explosive growth in demand for CPU and
   ▫ Driven by next-gen DNA sequencing …
• No additional electrical power available
• No additional HVAC/cooling resources

• What to do?
Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Datacenter turned into “virtual colo” facility
   ▫ What Customers?
       Campus research labs & PI’s
• Started small, initial focus on server
   ▫ Replaces the biggest or least efficient boxes
• As HVAC/power envelope grew, added serious
  central storage and backup systems

Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Results?
   ▫ Moderate gains:
       Increase in admin staff productivity
       Better monitoring & reporting capability
   ▫ Major gains:
       Server consolidation & more efficient hardware
        drove cooling & power requirements way down
       Now well within envelope existing facility can
       Instant ROI when measured against the cost of
        new datacenter
Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Additional gains for IT:
   ▫ Central storage system is “VM aware”
       More efficient disk utilization via thin provisioning
       Additional efficiency/backup gains with de-dupe
        and other content-optimization techniques
       Savings measured in many dozens of terabytes
       Each efficiency win on disk yields downstream
        efficiencies with tapes, backup & replication
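
Why de-dupe pays off so well for VM images can be shown with a toy example: most VMs share the same OS blocks, so a content-addressed store keeps each block once. This is only a sketch of the idea, using made-up block contents and a hypothetical `dedupe_stats` helper; it is not how any particular storage array works, and thin provisioning is not modeled here.

```python
import hashlib


def dedupe_stats(blocks):
    """Return (logical_bytes, stored_bytes) for a content-addressed store."""
    unique = {}
    logical = 0
    for block in blocks:
        logical += len(block)
        # Identical content hashes to the same key, so it is stored once.
        unique.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return logical, sum(unique.values())


# Two hypothetical VMs that share the same base OS blocks.
base_os = [b"kernel" * 100, b"libc" * 100]
vm_a = base_os + [b"blast-db" * 100]
vm_b = base_os + [b"mysql-data" * 100]

logical, stored = dedupe_stats(vm_a + vm_b)
print(logical, stored)  # prints "3800 2800"
```

Every additional VM built from the same base image adds only its unique blocks, and anything saved on disk is also saved again on tape and replication.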
Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Additional gains for Researchers & PIs:
   ▫ Have the expansion (CPU & disk) that they
   ▫ Delegated management tools let researchers
     “own” and control their own servers
       Especially the systems they had managed pre-
        consolidation (an important political issue)
   ▫ Delegated management of storage pools also
     let research self-manage disk resources &
Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• End result:
   ▫ Significant capability expansion within same
     physical, cooling & power envelope; no
     additional datacenter required
   ▫ Major centralized storage efficiency gains
     with thin provisioning & content optimization
   ▫ IT staff can be more productive
   ▫ Research staff have the same (if not more)
     administrative control over “their” systems

Additional Benefits
Items I could not wedge in elsewhere …

Virtualization is a step towards the cloud
• As a hype-averse anti-marketing cynic
  even I see the value of cloud platforms
• Virtualizing local resources puts you one step
  closer to an easier cloud migration in the
  ▫ Two main ways:
    Direct VM movement via commercial companies
     such as
    Open Virtualization Format (“OVF”) is a no-
     brainer middle-ground. I expect to see cloud
     providers supporting OVF import/export in the
     future*
Virtualization is audit/process friendly
• In audit-intensive environments it is
  straightforward to write the documents and
  idempotent deployment steps that will build
  a consistent OS image time and time again
• Combined with a good configuration
  management system you have an auditable
  system build process with change
  management & tracking built in
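
A minimal sketch of what "idempotent deployment steps" with tracking built in could mean in practice: each step states the desired end result, applies a change only if needed, and records what it did. The `ensure_line` and `apply_steps` helpers and the sshd-style settings are illustrative assumptions, not the process used at any particular site.

```python
def ensure_line(lines, wanted):
    """Return (new_lines, changed); add `wanted` only if it is missing."""
    if wanted in lines:
        return list(lines), False
    return list(lines) + [wanted], True


def apply_steps(config, steps):
    """Apply every step in order and return (config, audit_log)."""
    log = []
    for wanted in steps:
        config, changed = ensure_line(config, wanted)
        log.append((wanted, "changed" if changed else "unchanged"))
    return config, log


# Hypothetical hardening steps for the base image.
steps = ["PermitRootLogin no", "PasswordAuthentication no"]
first, log1 = apply_steps(["Port 22"], steps)
second, log2 = apply_steps(first, steps)

print(first == second)  # prints "True": a second run changes nothing
print(log2)
```

Because a rerun is harmless, the same steps can rebuild or verify an image at any time, and the log is the audit record.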

Virtualization is orchestration/CM
• I made the same mistake with the BioTeam
  virtualization platform as I originally did with
  the Amazon Cloud in 2007-2009
  ▫ Mistake: wasting too much time and effort
    building & managing unique OS images for
    different roles & tasks
• Then I saw the light …
  ▫ OpsCode Chef (
    Bork!

Chef lets you …
Treat your infrastructure as code

• Manage configuration as idempotent resources
• Put resources together as recipes
• Group recipes into roles
• Track it all like source code
• Configure your servers
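
The resource / recipe / role layering can be modeled in a few lines of plain Python. To be clear, this is a toy illustration of the concept and not Chef's Ruby DSL or API; every recipe, role and resource name below is made up for the example.

```python
# Recipes group resources; roles group recipes (all names hypothetical).
RECIPES = {
    "base": ["package:ntp", "service:ntp"],
    "mysql": ["package:mysql-server", "service:mysqld"],
    "apache": ["package:httpd", "service:httpd"],
}
ROLES = {
    "database": ["base", "mysql"],
    "lamp": ["base", "apache", "mysql"],
}


def expand_role(role):
    """Flatten a role into an ordered, duplicate-free resource list."""
    resources = []
    for recipe in ROLES[role]:
        for resource in RECIPES[recipe]:
            if resource not in resources:
                resources.append(resource)
    return resources


def converge(state, role):
    """Bring a server's state (a set) up to a role; return what changed."""
    changed = [r for r in expand_role(role) if r not in state]
    state.update(changed)
    return changed


server = set()                         # a freshly booted generic image
print(len(converge(server, "lamp")))   # prints "6": everything is new
print(converge(server, "lamp"))        # prints "[]": already converged
```

The second run doing nothing is the idempotence the slide refers to, and since the definitions are just text they can live in version control like any other source code.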

As a result of Chef …
BioTeam now manages only 4 Linux AMIs in the cloud
• 32 & 64 bit CentOS Linux
• 32 & 64 bit Debian Linux
   ▫ … using chef-solo or Chef Server we can orchestrate
     any cloud server into any role we need in a matter of
• My internal IT “to-do” list involves BioTeam’s
  corporate XenServer VM platform:
   ▫ Build a single, patched and LDAP-aware stripped
     down CentOS image
   ▫ Orchestrate it via Chef
   ▫ If I do it correctly
      … my Chef recipes will work locally & on the cloud
      Infrastructure agnosticism rocks!
My $.02 on virtualization in R&D
In a nutshell …

• Ideal “middle ground” between research
  informatics efforts & enterprise IT groups
• Significant administrative burden savings
• Significant “Green IT” facility/footprint
• Potentially large gains in storage efficiency
• Cloud & Orchestration friendly

Some final advice
An attempt to provide some specific tips …

Final Advice - 1
• Consider hiring an expert
 ▫ This is not a new field, best practices exist
 ▫ Don’t waste time making the mistakes that
   others have already discovered
• Involve your storage architects from day 1
 ▫ Many of the virtualization benefits are
   realized on top of a solid shared storage pool

Final Advice - 2
• VMWare is not the only game in town
 ▫ There are worthy competitors out there
• Commercial is not your only option
 ▫ BioTeam uses the free Citrix XenServer
 ▫ … does 100% of what we require
 ▫ … and 90% of what we would “like to have”

Final Advice - 3
• Don’t get fooled by hardware vendors & price
 ▫ Yes, for many large enterprise projects it DOES
   make sense to use Tier1 server and storage
   products …
 ▫ This is not true for all projects and all use-
• BioTeam runs its lab & business operations
 ▫ Citrix XenServer (free)
 ▫ Server iron from