Blutopia Cluster Life Cycle Management Demonstration

					       IBM Research


Blutopia
Stackable Storage
for Cluster Management

       Eric Van Hensbergen, Fabio Oliveira,
       Gorka Guardiola, Jay Patel



       IEEE Cluster 2007       09/19/07       (c) 2007 IBM Corporation


    Motivation
     Complex infrastructure of data centers
     – Conglomerate of interdependent distributed components
     – Non-trivial, error-prone systems management tasks
     – Management costs rapidly overtaking hardware costs

     What system services can we provide to facilitate
     better solutions to management issues, reduce
     down time, and decrease time to provision (or
     reprovision) clusters of servers running existing
     operating systems and applications?






    Our Approach

     Appliance model system-stack delivery
      – Pre-built images provided by VARs and OS Distributions
     Management model
      – Storage consolidation
      – Centralized management through flexible disk image manipulation:
        stackable storage
      – Physical machines separated from logical machines
     Trivialized typical tasks
      – Deployment of new servers
      – Assignment of roles to servers
      – Exchanging roles between servers (cluster reprovision)
      – Upgrade of software
      – Checkpoint/rollback of logical volumes
     Result: less scope for human mistakes



    Outline

     Management with Blutopia
     Stackable storage design space
     Stackable storage prototype
     Evaluation
     – Avoidance of human mistakes
     – Performance analysis

     Conclusions




     Blutopia Model
     [Figure: Blutopia model. Labeled components: Web Management Interface;
     VAR Provided Image Packages; Customer Configured Images; Scalable
     Stackable Storage Server; three Managed Servers.]

     Layers of each stack, top to bottom:
      – Per-system copy-on-write layer
      – VAR supplied Application Image
      – VAR supplied Base Image
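
     The layering shown on this slide can be read as a union of shared
     read-only images under one writable layer per system. A minimal sketch,
     assuming an in-memory model in which each layer is a dictionary from
     path to contents; the class name Stack and the sample paths are
     hypothetical and are not taken from Blutopia:

```python
class Stack:
    """Illustrative union of read-only layers under one writable layer."""

    def __init__(self, *read_only_layers):
        # Layers are listed top to bottom, e.g. application image, then base image.
        self.read_only = list(read_only_layers)
        self.cow = {}  # per-system copy-on-write layer

    def read(self, path):
        # The topmost layer that holds the path wins.
        for layer in [self.cow, *self.read_only]:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Writes land in the per-system layer and never touch the shared images.
        self.cow[path] = data


base = {"/etc/hostname": "generic", "/bin/sh": "shell"}
app = {"/usr/sbin/httpd": "web server binary"}

node1 = Stack(app, base)
node2 = Stack(app, base)
node1.write("/etc/hostname", "node1")
```

     Both systems share the same base and application dictionaries, so a
     change made by one system stays invisible to the other.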







Features

     Auto-detect ‘blank’ systems via their network boot.

     Install service-oriented “shrink wrapped” system images via tftp boot
      of kernel and network-based root disk.

     Instantaneous snapshot and reboot-based rollback of system
      images.

     Provide snapshot-protected administrative actions (such as upgrades
      or configuration changes).

     Provide the ability to instantaneously clone a system based on its
      running image and then re-deploy the clone to multiple systems.
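
     A snapshot-protected administrative action can be pictured as "take a
     snapshot, run the action, roll back if it fails". The sketch below is
     an assumption-laden illustration: the volume is a plain dictionary, a
     deep copy stands in for the instantaneous snapshot, and the helper
     name snapshot_protected is hypothetical.

```python
import copy
from contextlib import contextmanager


@contextmanager
def snapshot_protected(volume):
    """Roll the volume back to its prior state if the action fails."""
    snapshot = copy.deepcopy(volume)  # stand-in for an instantaneous snapshot
    try:
        yield volume
    except Exception:
        volume.clear()
        volume.update(snapshot)  # stand-in for the reboot-based rollback
        raise


volume = {"httpd.conf": "Listen 80"}
try:
    with snapshot_protected(volume) as v:
        v["httpd.conf"] = "Listen 8080"
        raise RuntimeError("upgrade failed")  # simulated failed upgrade
except RuntimeError:
    pass
```

     In a stacked volume the snapshot would not require a copy at all,
     since freezing the writable layer is enough; the deep copy here only
     keeps the example self-contained.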







Front End Implementation

     AJAX/PHP front-end populates database and activates Perl
     scripts

     Perl scripts assemble stackable file systems, fix configuration
     items, and modify exports

     Scripts then instruct systems to reboot (currently via ssh, but
     could be done with BladeCenter MM)

     Configured systems have Ganglia pre-installed; the web interface
     detects system liveness from Ganglia and updates the status
     display.
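
     The "modify exports" step might look like the following when layers
     are served over NFS. This is only a sketch in Python rather than the
     Perl the slides mention; the directory layout, the host name, and the
     choice of export options are assumptions, and only the general
     "path client(options)" shape of an /etc/exports entry is relied on.

```python
def export_line(path, client, read_only):
    """Format one /etc/exports entry for a layer served over NFS."""
    mode = "ro" if read_only else "rw"
    return f"{path} {client}({mode},sync,no_root_squash)"


def exports_for(client, layers):
    # layers: (directory, read_only) pairs, top of the stack first
    return [export_line(path, client, ro) for path, ro in layers]


# Hypothetical directories for one managed server.
lines = exports_for("node1", [
    ("/export/personality/node1", False),
    ("/export/roles/webserver", True),
    ("/export/base", True),
])
```

     Only the per-system layer is exported writable; the shared role and
     base layers stay read-only for every client.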




    Example of Managed Service
     Three-tiered Internet service
     – Application images obtained from stackable storage
       • Web Server
       • Application Server
       • Database Server

     [Figure: a tier of Web Servers, a tier of Application Servers, and a
     Database.]





    Management Tasks
     Machine installation
     – Assignment of existing role to new machine

     Resulting stack (top to bottom):
       Personality 1       RW
       Web Server 1.0.33   RO
       Base                RO






     Management Tasks (cont.)

      Reprovisioning (change role)

      Stack before (top to bottom):
        Personality 1       RW
        Web Server 1.0.33   RO
        Base                RO

      Stack after (top to bottom):
        Personality 2       RW
        DB Server 4.11      RO
        Base                RO
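
      Installation and reprovisioning both reduce to choosing which
      read-only role layer sits between the shared base and a fresh
      writable personality. A minimal sketch, assuming a stack is modeled
      as a list of (layer, mode) pairs; the function names are
      hypothetical:

```python
def assign_role(base, role, personality):
    """Build a stack, top to bottom, as (layer, mode) pairs."""
    return [(personality, "RW"), (role, "RO"), (base, "RO")]


def reprovision(stack, new_role, new_personality):
    # Keep the shared base layer; swap the role and start a fresh personality.
    base = stack[-1][0]
    return assign_role(base, new_role, new_personality)


web = assign_role("Base", "Web Server 1.0.33", "Personality 1")
db = reprovision(web, "DB Server 4.11", "Personality 2")
```

      The original stack is left untouched, so the machine could later be
      returned to its previous role by booting from the old stack again.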






      Management Tasks (cont.)
       Checkpoint and rollback

       Original logical volume (top to bottom):
         Personality 1       RW
         DB Server 4.11      RO
         Base                RO

       Chosen snapshot (top to bottom):
         Personality 2       RW
         Web Server 2.0.54   RO
         Base                RO

       Resulting logical volume (top to bottom):
         Personality 3       RW
         Personality 2       RO
         Web Server 2.0.54   RO
         Base                RO
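
       Rolling back to a chosen snapshot, as drawn on this slide, freezes
       every layer of the snapshot and places a new writable personality
       on top. A short sketch under the assumption that a stack is a list
       of (layer, mode) pairs; rollback_to is a hypothetical name:

```python
def rollback_to(snapshot, new_personality):
    """Return a new stack that treats the snapshot as read-only history."""
    frozen = [(name, "RO") for name, _mode in snapshot]
    return [(new_personality, "RW")] + frozen


chosen = [("Personality 2", "RW"), ("Web Server 2.0.54", "RO"), ("Base", "RO")]
result = rollback_to(chosen, "Personality 3")
```

       Because the snapshot itself is never modified, the same snapshot
       can serve as the starting point for further rollbacks.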


     Management Tasks (cont.)
      Role publishing (VAR task)
      – Example: creation of an upgrade for Web Server

      [Figure: a stack of Personality over Web Server 2.0.54 over Base,
      shown with a new layer labeled Web Server 3.1.]





     Management Tasks (cont.)
      Upgrade deployment

      Stack before (top to bottom):
        Personality 1       RW
        Web Server 2.0.54   RO
        Base                RO

      Stack after (top to bottom):
        Personality 2       RW
        Personality 1       RO
        Web Server 3.1      RO
        Web Server 2.0.54   RO
        Base                RO
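
      Read from the slide, deploying an upgrade freezes the current
      personality, slots the upgrade layer directly above the existing
      role layer, and adds a new writable personality. The sketch below
      assumes that layer order and a stack whose top entry is its only
      writable layer; deploy_upgrade is a hypothetical name:

```python
def deploy_upgrade(stack, upgrade, new_personality):
    """Insert an upgrade layer above the role layer and add a fresh writable top."""
    old_personality, *lower = stack
    frozen = (old_personality[0], "RO")
    return [(new_personality, "RW"), frozen, (upgrade, "RO"), *lower]


before = [("Personality 1", "RW"), ("Web Server 2.0.54", "RO"), ("Base", "RO")]
after = deploy_upgrade(before, "Web Server 3.1", "Personality 2")
```

      Keeping the old role layer underneath the upgrade means the upgrade
      only has to carry the files that changed.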






     Stackable Storage Design Space
      Stack location: client-side or server-side stacking
      Stack granularity: block or file stacking
      Data transport protocol: block or file transport

      [Figure: client-side stacks and server-side stacks.]





     Stackable Storage Prototype
      File system stacking
      – We leverage UnionFS (stackable file system)
      – File transport protocol: NFS
      – Block transport protocol: iSCSI

      [Figure: two configurations, one labeled UnionFS with NFS or iSCSI,
      the other labeled UnionFS with NFS.]






     Stackable Storage Prototype (cont.)
      Block stacking
      – Implemented Stackable Block Devices (SBD)
        • Specialized driver + Linux device mapper
      – Block transport protocol: iSCSI

      [Figure: two configurations, each labeled SBD with iSCSI.]








Stackable File Systems

      Advantages
      – Easy to manage
      – Multiple layers can be managed relatively independently

      Disadvantages
      – Pure file-system copy-on-write solutions “copy up” whole
       files on first write, increasing delay and storage overhead
      – Current solutions interact poorly with distributed file
       systems and caching mechanisms
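
      The copy-up cost can be made concrete with a little arithmetic. The
      sketch below compares the bytes duplicated on a first write at file
      granularity with the bytes duplicated if only the touched blocks
      were copied. The 4096-byte block size, the file size, and the
      function names are assumptions chosen for illustration:

```python
def file_copy_up_bytes(file_size, write_size):
    """Bytes copied to the writable layer on the first write at file granularity."""
    return file_size if write_size > 0 else 0


def block_copy_bytes(offset, write_size, block_size=4096):
    """Bytes copied when only the blocks touched by the write are duplicated."""
    if write_size <= 0:
        return 0
    first = offset // block_size
    last = (offset + write_size - 1) // block_size
    return (last - first + 1) * block_size


file_size = 100 * 1024 * 1024  # hypothetical 100 MiB file
whole_file = file_copy_up_bytes(file_size, 512)
one_block = block_copy_bytes(0, 512)
```

      For a 512-byte write at the start of the file, file-level copy-up
      moves the full 100 MiB while block-level copying moves a single
      block.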






Stackable Block Devices

      Advantages
      – Simple
      – Can be distributed with NAS or SAN technologies

      Disadvantages
      – More care must be given to provisioning (sizing)
      – Stacks are not mutable: upper layers always depend
       on their ancestors
      – Metadata stacking is currently rather inefficient




     Performance Evaluation
      Experimental testbed
      – 9-node cluster on a Gigabit Ethernet network
          • 1 Blutopia server and 8 diskless clients
      – Two dual-core 2.4 GHz AMD Opteron processors
      – 1 MB of cache per core
      – 4 GB of RAM
      – Server storage
          • Two 10K RPM SCSI disks, average seek time of 4.1 ms
      – Linux kernel 2.6.17
      – Ext2 file system
      – NFSv3 over TCP
      – iSCSI over TCP
          • Enterprise Target version 0.4.13 (server)
          • Open-iSCSI version 1.0-485 (clients)
      – UnionFS version 1.2
      – Three-layer stacks: Linux + Apache + Personality
          • Reads from personality layer


Micro-benchmark Measuring UnionFS Layer Overhead






     Postmark Benchmark (client-side stacking)






     Postmark (Client vs. Server Side Stacking)






     Bonnie++ Benchmark
      No significant difference between iSCSI and NFS
      baselines

      Less pronounced overhead of UnionFS over NFS as
      compared to Postmark results

      – average of only 18% (as opposed to 93%)

      No noticeable overhead of block stacking over iSCSI

      UnionFS over iSCSI caused an average overhead of
      11%
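
      The percentages on this slide are averages of relative overhead. The
      slides do not say how the averages were formed, so the sketch below
      assumes runtime measurements (higher is worse) and a plain
      arithmetic mean across benchmark phases. The sample numbers are
      hypothetical and are not results from this evaluation:

```python
def relative_overhead(baseline, stacked):
    """Overhead of the stacked configuration as a percentage of the baseline."""
    return 100.0 * (stacked - baseline) / baseline


def average_overhead(pairs):
    # pairs: (baseline, stacked) measurements, one pair per benchmark phase
    values = [relative_overhead(b, s) for b, s in pairs]
    return sum(values) / len(values)


# Hypothetical timings in seconds, not measurements from the slides.
samples = [(10.0, 11.0), (20.0, 24.0)]
mean_overhead = average_overhead(samples)
```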





     dm-cache: client-side block cache

     [Figure: two bar charts of IOzone runtime (s), without cache vs. with
     cache. Read Performance: series First Run and Second Run. Write
     Performance: series First Run, Second Run, and Write Back.]

     * Separate work by Ming Zhao (mingzhao@ufl.edu)
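
     The first-run versus second-run comparison reflects how a client-side
     cache behaves: the first pass fetches blocks remotely and the second
     pass finds them locally. The sketch below is a generic LRU read cache
     and makes no claim about dm-cache's actual replacement policy or
     write handling; the class name BlockCache and the fake backend are
     hypothetical:

```python
from collections import OrderedDict


class BlockCache:
    """Illustrative LRU read cache placed in front of a remote block device."""

    def __init__(self, backend, capacity):
        self.backend = backend  # callable: block number -> bytes
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.backend(block_no)  # remote fetch, e.g. over iSCSI
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data


def remote(block_no):
    # Stand-in for the storage server: returns four identical bytes.
    return bytes([block_no % 256]) * 4


cache = BlockCache(remote, capacity=8)
for _run in range(2):  # first run misses, second run hits
    for block in range(4):
        cache.read(block)
```

     After the two passes the cache has recorded four misses followed by
     four hits, since the working set of four blocks fits in the cache.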




       Future Work

      Expand model to incorporate support for logical
      partitions and their mapping to physical hardware

      Explore options on how to better support Windows

      Evaluate alternative network transports

      Hybrid model – stackable file system metadata with
      common content-addressable-storage block backend

      Extend cache models to allow for collaborative peer
      caches and multicast distribution of commonly
      requested block data




      Conclusion
      Blutopia prototype is a platform for continuing research in
      simplifying administrative tasks

      Client-side stacking scales the best

      SAN technologies outperform Linux

      Client-side disk caches improve performance further

      Stacking allows for rapid provisioning and reprovisioning of
      systems

      Rapid snapshot and rollback help mitigate operator error

      Deficiencies in each approach suggest that a hybrid approach may be
      optimal