Virtualization & Life Science R&D
2010 BioIT World Web Symposia Series

Who am I?
• I’m from the BioTeam
  ▫ Independent consulting shop
  ▫ Staffed by scientists forced to learn
    IT to get our own research done

• Found a fun business niche
  ▫ Bridging the “gap” between
    science, IT & high performance

• This matters today because …
  ▫ We work daily in pharma, biotech,
    academic, government & military
      … often we can talk/share our
  ▫ We have no financial
    entanglements with providers of
    virtualization products

• When it comes to virtualization
  I’m beholden to nobody
• Plan on speaking only on what
  I’ve done, seen and
• Think of me as an on-the-
  ground spy reporting on trends
  and applications, not a pundit,
  “expert” or “visionary”
  ▫ No excuses for not doing
    your own due diligence

I am not planning on …
• Explaining what virtualization is
• Selling you on a particular product or strategy
• Covering the pros & cons in any vague or generic sense

I do plan to talk
• … specifically about products and strategies we often see in research IT environments
• What I feel are the best benefits to IT and the
• Will also be specific about what my own company is

Meta-Topics for Today
• Why virtualize?
• Benefits: IT Organizations
• Benefits: Research
• Benefits: Everyone
• Some practical advice

Reasons to virtualize
Hype aside, what are the real benefits?

Why virtualize?
Three main areas of benefit …

• Enhances Research
• Enhances IT Operations
• Saves Money

Why virtualize?
Enhancing research

1. Gives researchers new capabilities
2. Increases the speed at which researchers can
   gain access to existing capabilities
3. Ideal meeting ground for when informatics & IT
   have potentially conflicting goals

Why virtualize?
Research: New Capabilities

• Desktop level virtualization gives researchers a
  manageable way to escape the confines of a
  managed corporate desktop OS image
• Uses:
   ▫ Access to Linux
   ▫ Sandbox environment for software dev & testing
   ▫ Testing of entire workflows & even small compute

Linux virtualization on Apple OS X

Why virtualize?
Research: Extend/Enhance Existing Capabilities

• Generally boils down to being able to do things
   ▫ Cliché example is rapidly provisioning new servers
     for research informatics workflows or development
      Weeks or months to provision if hardware needs to
       be purchased & installed
      Days or weeks to provision if hardware already

Why virtualize?
Research: Extend/Enhance Existing Capabilities

• Also a nice way for Enterprise IT to meet Research IT
   ▫ Need: scientists often use webservers, databases and
     webapps for simple sharing of data among
     workgroups & projects
   ▫ Problem: scientists don’t typically code for security
     and their apps rarely rise to the level of needing to be
     added formally to the enterprise software portfolio
   ▫ Solution: enterprise qualified, patched & secured
     LAMP stack for researchers

Why virtualize?
And this brings me to …

• Virtual OS images can be an ideal “middle
  ground” between research informatics &
  Enterprise IT
   ▫ Especially IT support organizations …

Why virtualize?
Finding Common Ground between Research & Enterprise IT
• Absolutely standard, non-controversial needs in
  many research environments:
   ▫ Access to Linux
   ▫ Ability to easily install software, libraries and patches
     via internet & other repositories without having to file a
     helpdesk ticket or ask someone for permission
   ▫ Elevated access privileges to handle file and owner
     permissions in a project or workgroup environment
   ▫ Desire to run web, database and application servers
   ▫ Elevated access privileges to control servers &
   ▫ Quick & dirty code & scripts for short-term & one-off
Why virtualize?
Finding Common Ground between Research & Enterprise IT

• Given the laundry list of requirements in the
  previous slide, how can Enterprise IT possibly
  support these crazy R&D people who need root
  access and write bad or insecure code?

   ▫ Provisioning enterprise-approved VM images
     makes this a manageable problem
   ▫ The VM image is also the ideal “line in the sand”
     when it comes to support

Why virtualize?
Finding Common Ground between Research & Enterprise IT

• The “blessed” Linux VM for researchers:
   ▫ Fully patched OS & kernel
   ▫ Already integrated with Active Directory
   ▫ Root password known only to IT
      Elevated privileges via managed /etc/sudoers file
      Anyone who needs sudo is allowed to have it
      Sudo actions logged to external host for audit trail
   ▫ Configurable periodic snapshot-based backups
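
As a rough illustration of the sudo items above, the sketch below renders a sudoers drop-in and an rsyslog forwarding rule as plain text. The group name `research` and the log host `loghost.example.org` are hypothetical, and this is one possible arrangement under those assumptions, not a description of any particular site's configuration. It only builds strings; nothing is written to /etc.

```python
# Sketch only: the group name and log host below are hypothetical placeholders.

def render_sudoers(group: str) -> str:
    """Return a sudoers drop-in granting full sudo to one group."""
    return (
        "# Managed by IT; local edits will be overwritten\n"
        "Defaults syslog=authpriv\n"      # log sudo events via syslog (authpriv facility)
        f"%{group} ALL=(ALL) ALL\n"       # group members may run any command via sudo
    )


def render_rsyslog_forward(log_host: str, port: int = 514) -> str:
    """Return an rsyslog rule that forwards authpriv messages over TCP (@@)."""
    return f"authpriv.* @@{log_host}:{port}\n"


sudoers_text = render_sudoers("research")
forward_text = render_rsyslog_forward("loghost.example.org")
print(sudoers_text + forward_text, end="")
```

Keeping the audit log on a separate host means a researcher with sudo on the VM cannot quietly edit the record of what was run.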

Why virtualize?
Finding Common Ground between Research & Enterprise IT

• The Middle Ground
   ▫ In exchange for having unlimited freedom on the
     Linux VM, researchers understand that they give
     up a certain amount of IT support & handholding
   ▫ If the researchers freeze, kill, crash or corrupt the
     Linux VM during their work:
      IT will restore the VM from the last known-good
       snapshot or backup image
      No individual troubleshooting or support, if they
       “break” the system they will be given an older
       working backup image, nothing else.
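
The restore policy above can be sketched in a few lines. The `Snapshot` record and its `known_good` flag are hypothetical names used only for illustration; a real hypervisor exposes snapshots through its own tools.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    name: str
    taken_at: datetime
    known_good: bool   # hypothetical flag set when the snapshot was verified


def pick_restore_point(snapshots: List[Snapshot]) -> Optional[Snapshot]:
    """Return the most recent known-good snapshot, or None if there is none."""
    good = [s for s in snapshots if s.known_good]
    return max(good, key=lambda s: s.taken_at, default=None)


history = [
    Snapshot("mon", datetime(2010, 5, 3), True),
    Snapshot("tue", datetime(2010, 5, 4), True),
    Snapshot("wed", datetime(2010, 5, 5), False),  # taken after the VM broke
]
restore = pick_restore_point(history)
print(restore.name if restore else "no known-good snapshot")
```
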
Why virtualize?
Enhance IT Operations

Reduce Operational Burden
Remote mgmt. is non-trivial
1.   Remote power control
2.   Serial console switch
3.   Serial console cabling
4.   IP KVM Device

•    All these devices (and often
     more) are required to
     successfully provide a 100%
     “lights-out” remote
     management capability

“lights-out” mgmt is baked into virtualized

Manage your virtual IT with an Apple iPad ( shown: via SSH

Why virtualize?
Enhance IT Operations

• The “lights-out” stuff is nice but not a huge win
   ▫ You still need remote power, serial & KVM to your
     hypervisor hosts after all!

• The real win is what comes next …

“Scriptable Infrastructure” is a BIG DEAL

      This single command will start a 5GB managed MySQL database in the Amazon
      cloud for $0.11/hour. The database is automatically patched, managed and
      backed up. Planned enhancements include auto-scaling & snapshots.
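
The command shown on the slide is not reproduced in this text. As a rough sketch of what scripting such a request can look like with today's AWS CLI (`aws rds create-db-instance`), the Python below only builds the command line and prints it; it does not contact AWS. The instance identifier, instance class, user name and password placeholder are hypothetical values, and current prices and storage limits may differ from the 2010 figures quoted above.

```python
import shlex
from typing import List


def rds_create_args(identifier: str, storage_gb: int, instance_class: str,
                    user: str, password: str) -> List[str]:
    """Build (but do not run) an AWS CLI call requesting a managed MySQL instance."""
    return [
        "aws", "rds", "create-db-instance",
        "--db-instance-identifier", identifier,
        "--engine", "mysql",
        "--allocated-storage", str(storage_gb),   # size in GB
        "--db-instance-class", instance_class,
        "--master-username", user,
        "--master-user-password", password,
    ]


# All values below are placeholders chosen for illustration.
args = rds_create_args("research-db", 5, "db.t3.micro", "labadmin", "CHANGE_ME")
command = " ".join(shlex.quote(a) for a in args)
print(command)
```

Because the request is just an argument list, it can be versioned, reviewed and repeated like any other script, which is the point of "scriptable infrastructure".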

Why virtualize?
Scriptable VM infrastructure!

And just because we can … (SSH host management via

Why virtualize?
Save Money

• Can yield significant financial savings
• Four main ways
  1. API & management tools reduce admin staff
  2. Green IT (performance per watt, avg. utilization)
  3. Reduced physical footprint
  4. Storage efficiencies (*ymmv)
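
A back-of-the-envelope sketch of the utilization argument in item 2 follows. Every number in it is hypothetical, and it assumes that all hosts have equal capacity and equal power draw; it ignores memory limits, peak load and cooling overhead, so treat it as arithmetic, not a sizing method.

```python
import math
from typing import Sequence


def hosts_needed(avg_utilizations: Sequence[float], target_utilization: float) -> int:
    """Equal-capacity hosts needed to carry the combined average load."""
    total_load = sum(avg_utilizations)
    return max(1, math.ceil(total_load / target_utilization))


def annual_kwh(host_count: int, watts_per_host: float) -> float:
    """Energy drawn per year by hosts running around the clock."""
    return host_count * watts_per_host * 24 * 365 / 1000


# Hypothetical fleet: 20 physical servers, each averaging 10% utilization.
before = 20
after = hosts_needed([0.10] * before, target_utilization=0.6)
print(after, annual_kwh(before, 300), annual_kwh(after, 300))
```
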

Case Study
Large Cancer Research Institute

• On-campus research datacenter almost full
• Explosive growth in demand for CPU and
   ▫ Driven by next-gen DNA sequencing …
• No additional electrical power available
• No additional HVAC/cooling resources available

• What to do?

Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Datacenter turned into “virtual colo” facility
   ▫ What Customers?
        Campus research labs & PIs
• Started small, initial focus on server
   ▫ Replaces the biggest or least efficient boxes
• As HVAC/power envelope grew, add serious
  central storage and backup systems

Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Results?
   ▫ Moderate gains:
       Increase in admin staff productivity
       Better monitoring & reporting capability
   ▫ Major gains:
       Server consolidation & more efficient hardware
        drove cooling & power requirements way down
       Now well within envelope existing facility can handle
       Instant ROI when measured against the cost of new
        datacenter construction
Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Additional gains for IT:
   ▫ Central storage system is “VM aware”
       More efficient disk utilization via thin provisioning
       Additional efficiency/backup gains with de-dupe and
        other content-optimization techniques
        Savings measured in many dozens of terabytes
       Each efficiency win on disk yields downstream
        efficiencies with tapes, backup & replication
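
To make the two storage ideas concrete, here is a minimal sketch, assuming fixed-size 4 KB blocks and SHA-256 content hashes. Real "VM aware" arrays use their own block sizes and techniques; the sizes and image contents below are invented for illustration.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

BLOCK_SIZE = 4096  # assumed block size, in bytes


def dedupe_blocks(images: Iterable[bytes]) -> Tuple[int, int]:
    """Return (logical_blocks, unique_blocks) across all images."""
    seen = set()
    logical = 0
    for data in images:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            logical += 1
            seen.add(hashlib.sha256(block).digest())  # identical content, identical hash
    return logical, len(seen)


def thin_savings_gb(provisioned_gb: Sequence[int], used_gb: Sequence[int]) -> int:
    """Space not consumed when disks are allocated only as data is written."""
    return sum(provisioned_gb) - sum(used_gb)


# Two toy VM images that share the same three "OS" blocks plus one unique block each.
base_os = b"".join(bytes([i]) * BLOCK_SIZE for i in range(3))
vm1 = base_os + b"x" * BLOCK_SIZE
vm2 = base_os + b"y" * BLOCK_SIZE

logical, unique = dedupe_blocks([vm1, vm2])
print(logical, unique, thin_savings_gb([100, 100], [12, 30]))
```

The more VMs are cloned from the same base image, the larger the share of blocks that are stored only once.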

Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• Additional gains for Researchers & PIs:
   ▫ Have the expansion (CPU & disk) that they
   ▫ Delegated management tools let researchers
     “own” and control their own servers
       Especially the systems they had managed pre-
        consolidation (an important political issue)
   ▫ Delegated management of storage pools also let
     research self-manage disk resources & quotas

Case Study
Large Cancer Research Institute – “Virtual Colocation Facility”
• End result:
   ▫ Significant capability expansion within same
     physical, cooling & power envelope; no additional
     datacenter required
   ▫ Major centralized storage efficiency gains with thin
     provisioning & content optimization
   ▫ IT staff can be more productive
   ▫ Research staff have the same (if not more)
     administrative control over “their” systems

Additional Benefits
Items I could not wedge in elsewhere …

Virtualization is a step towards the cloud …
• As a hype-averse anti-marketing cynic even I
  see the value of cloud platforms
• Virtualizing local resources puts you one step
  closer to an easier cloud migration in the future
  ▫ Two main ways:
    Direct VM movement via commercial companies
     such as
    Open Virtualization Format (“OVF”) is a no-brainer
     middle-ground. I expect to see cloud providers
     supporting OVF import/export in the future*

Virtualization is audit/process friendly
• In audit-intensive environments it is
  straightforward to write the documents and
  idempotent deployment steps that will build a
  consistent OS image time and time again
• Combined with a good configuration
  management system you have an auditable
  system build process with change management
  & tracking built in

Virtualization is orchestration/CM friendly
• I made the same mistake with the BioTeam
  virtualization platform as I originally did with the
  Amazon Cloud in 2007-2009
  ▫ Mistake: wasting too much time and effort building
    & managing unique OS images for different roles
    & tasks
• Then I saw the light …
  ▫ OpsCode Chef (
     Bork!

Chef lets you …
Treat your infrastructure as code

•   Manage configuration as idempotent resources
•   Put resources together as recipes
•   Group recipes into roles
•   Track it all like source code
•   Configure your servers
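
Chef itself is driven by a Ruby DSL. The Python below is not Chef's API; it is a toy model of the ideas in the list above, with hypothetical recipe and role names. A resource changes the server only when the desired state is not already in place, recipes are lists of resources, and roles are lists of recipes.

```python
from typing import Callable, Dict, List

State = Dict[str, str]               # toy "server": setting name -> value
Resource = Callable[[State], bool]   # returns True only if it changed something


def setting(key: str, value: str) -> Resource:
    """An idempotent resource: make sure state[key] equals value."""
    def apply(state: State) -> bool:
        if state.get(key) == value:
            return False             # already converged, do nothing
        state[key] = value
        return True
    return apply


# Hypothetical recipes and role, named for illustration only.
RECIPES: Dict[str, List[Resource]] = {
    "base": [setting("ntp", "enabled"), setting("ldap", "enabled")],
    "webserver": [setting("httpd", "running")],
}
ROLES: Dict[str, List[str]] = {"lamp_node": ["base", "webserver"]}


def converge(state: State, role: str) -> int:
    """Apply every resource of every recipe in the role; return the change count."""
    changes = 0
    for recipe in ROLES[role]:
        for resource in RECIPES[recipe]:
            changes += resource(state)   # True counts as 1, False as 0
    return changes


server: State = {}
first = converge(server, "lamp_node")
second = converge(server, "lamp_node")
print(first, second, server)
```

The second run making no changes is what idempotence buys you: the same definitions can be applied repeatedly, to any number of servers, without drifting.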

As a result of Chef …
BioTeam now manages only 4 Linux AMIs in the cloud
• 32 & 64 bit CentOS Linux
• 32 & 64 bit Debian Linux
   ▫ … using chef-solo or Chef Server we can orchestrate any
     cloud server into any role we need in a matter of minutes.
• My internal IT “to-do” list involves BioTeam’s corporate
  XenServer VM platform:
   ▫ Build a single, patched and LDAP-aware stripped down
     CentOS image
   ▫ Orchestrate it via Chef
   ▫ If I do it correctly
      … my Chef recipes will work locally & on the cloud
      Infrastructure agnosticism rocks!

My $.02 on virtualization in R&D
In a nutshell …

• Ideal “middle ground” between research
  informatics efforts & enterprise IT groups
• Significant administrative burden savings
• Significant “Green IT” facility/footprint savings
• Potentially large gains in storage efficiency
• Cloud & Orchestration friendly

Some final advice
An attempt to provide some specific tips …

Final Advice - 1
• Consider hiring an expert
 ▫ This is not a new field, best practices exist
 ▫ Don’t waste time making the mistakes that others
   have already discovered
• Involve your storage architects from day 1
 ▫ Many of the virtualization benefits are realized on
   top of a solid shared storage pool

Final Advice - 2
• VMware is not the only game in town
 ▫ There are worthy competitors out there
• Commercial is not your only option
 ▫ BioTeam uses the free Citrix XenServer platform
 ▫ … does 100% of what we require
 ▫ … and 90% of what we would “like to have”

Final Advice - 3
• Don’t get fooled by hardware vendors & price tags
 ▫ Yes, for many large enterprise projects it DOES
   make sense to use Tier1 server and storage
   products …
 ▫ This is not true for all projects and all use-cases
• BioTeam runs its lab & business operations using:
 ▫ Citrix XenServer (free)
 ▫ Server iron from
 ▫ OpenFiler NAS software (free) for our storage pools
    Lots of good high and midrange storage options out
     there (NexSan, etc. etc.)



