Polyserve Matrix - PowerPoint by cqy28049

									     Linux High Availability Cluster Selection


                           Tim Burke

                     tburke@redhat.com


     Which cluster product is right for me?
     •   There is no one-size-fits-all winner
     •   Rapidly evolving marketplace
     •   The good news: There is a lot to choose from
     •   The bad news: There is a lot to choose from
     •   Strategy - be an informed consumer




          Selection Process / Presentation
                      Outline
     •   Identify target applications - usage model
     •   Identify required cluster feature set
     •   Open source vs proprietary, product vs
         project
     •   Cost factors
     •   Vendor evaluation
     •   OEM & ISV endorsements


                   Identify Target Applications
     •   Clustering Categories
         •   High Availability Clusters
             •   Database
             •   Fileservers
             •   Off the shelf applications
         •   Load Balancing Clusters
             •   Dispatching web traffic
         •   High Performance Computing
             •   Large computational problems


         High Performance Computing
        HPC, HPTC cluster attributes
         1. Large # of systems working together to
           solve a common problem - scalability
         2. Performance, not reliability, is of utmost
           importance
         3. Requires custom parallelized
           applications
         4. Tends to be bleeding edge, early
           adopters
         5. Example deployments: genetics,
           pharmaceutical, weather, seismic
           analysis, modeling

              Load Balancing Clusters
     •   Front end dispatching node (or 2 for
         redundancy)
     •   Pool of inexpensive back end servers
     •   Redirect transactions so no 1 system is
         overloaded
     •   Balancing algorithms: round robin,
         weighted, load based
     •   Typically used for web server traffic
         (Apache front end)
     •   Useful for static content
     •   Not applicable for dynamic content
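
The three balancing algorithms named above can be sketched roughly as follows. This is a minimal illustration, not any particular product's dispatcher: the server names, weights, and connection counts are made-up values, and the function names are hypothetical.

```python
import itertools
import random


def round_robin(servers):
    """Round robin: hand out back-end servers in a fixed rotating order."""
    return itertools.cycle(servers)


def weighted_choice(weights, rng=random):
    """Weighted: pick a server at random, in proportion to its configured weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def least_loaded(active_connections):
    """Load based: pick the server currently handling the fewest connections."""
    return min(active_connections, key=active_connections.get)


if __name__ == "__main__":
    # Hypothetical back-end pool, for illustration only.
    rr = round_robin(["web1", "web2", "web3"])
    print([next(rr) for _ in range(4)])
    print(weighted_choice({"web1": 3, "web2": 1}))
    print(least_loaded({"web1": 12, "web2": 4, "web3": 9}))
```

A real dispatcher would also need to drop failed back ends from the pool, which this sketch leaves out.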

             High Availability Clusters
     •   The need for high availability (HA)
     •   Overview of high availability features




         Reliability, Availability, Serviceability
                         (RAS)
         Users & businesses have high expectations
          1. Reliability - high degree of protection for
            corporate data. Information is a crucial business
            asset.
          2. Availability - near continuous data access
          3. Serviceability - procedures to correct problems
            with minimal business impact




     Sources of Downtime
     The Standish Group - 2001
     [Pie chart: sources of downtime. Legend:
      application bug or error; main-system hardware failure;
      database error; main-server system bug; network;
      operator error; other server's hardware failure;
      other server's system bug; environmental conditions;
      planned outage; other]




          Downtime Costs -The Standish Group
     [Bar chart: cost per minute of downtime (dollars), y-axis 0 to 13,000.
      Application categories: Enterprise resource planning (ERP),
      supply chain management, e-commerce, Internet banking,
      customer service center, messaging]


     No Single Point of Failure (NSPF)
        Hardware Redundancy - increased overall
         reliability and availability
         1. Multiple paths between systems
         2. Storage - mirrored, RAID5
         3. Multiple power sources
         4. Multiple external networks




            High Availability Clusters
•   Redundancy for fault
    tolerance
•   Failover - if 1 node shuts
    down or fails, another
    node takes over
    application load
•   Facilitates planned
    maintenance


                             Failover
     •   Involves selecting a target node & moving
         resources - failover policies
     •   Example resource types
         1. Physical disk ownership
         2. Filesystems
         3. Applications
         4. Databases
         5. IP addresses
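
As a rough illustration of a failover policy, the sketch below selects a target node and moves the failed node's resources to it. The `Node` class, the `preferred` ordering, and the fall-back rule (healthy node holding the fewest resources) are assumptions made for this example; real cluster products define their own policies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    resources: list = field(default_factory=list)


def pick_target(nodes, failed, preferred=()):
    """Hypothetical policy: the first healthy node named in `preferred`,
    otherwise the healthy node that currently holds the fewest resources."""
    healthy = [n for n in nodes if n.up and n is not failed]
    if not healthy:
        raise RuntimeError("no surviving node available for failover")
    by_name = {n.name: n for n in healthy}
    for name in preferred:
        if name in by_name:
            return by_name[name]
    return min(healthy, key=lambda n: len(n.resources))


def fail_over(nodes, failed, preferred=()):
    """Mark the node as down and move all of its resources to the chosen target."""
    failed.up = False
    target = pick_target(nodes, failed, preferred)
    target.resources.extend(failed.resources)
    failed.resources = []
    return target


if __name__ == "__main__":
    # Illustrative resource names only.
    a = Node("a", resources=["disk0", "/export", "db", "10.0.0.5"])
    b = Node("b", resources=["web"])
    c = Node("c")
    print(fail_over([a, b, c], a).name)
```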


                   Failover Configurations
     •   Active / Passive
         •   1 node runs application(s)
         •   Other node on standby for takeover
         •   Idle node can take over with no performance
             degradation
     •   Active / Active
         •   All nodes actively running application(s)
         •   Workload moves to survivor on failure
         •   Effectively utilizes capacity (TCO)

               Data Integrity Provisions
     •   Crucial for safe failover of data centric services (filesystem /
         database)
     •   In failure scenarios (e.g. hung node), ensure the failed node
         cannot access storage - I/O Barriers, I/O Fencing
     •   Lack of I/O Fencing can result in
         •   Loss of data (backups ?)
         •   System crashes
     •   Common mechanisms
         •   Power switches
         •   SCSI reservations
         •   Watchdog timers
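
The ordering that I/O fencing enforces can be sketched as below: the survivor starts data centric services only after the failed node has been cut off from storage. The `fence` and `start_services` callables are hypothetical stand-ins for a real mechanism such as a power switch; this is not any product's API.

```python
class FencingError(Exception):
    """Raised when the failed node could not be cut off from shared storage."""


def take_over(failed_node, fence, start_services):
    """Start services on the survivor only after the failed node is fenced.

    `fence` is assumed to return True once the node can no longer issue I/O.
    If fencing fails, takeover is refused rather than risking two writers.
    """
    if not fence(failed_node):
        raise FencingError(f"could not fence {failed_node}; refusing takeover")
    return start_services()


if __name__ == "__main__":
    events = []

    def power_off(node):  # hypothetical power-switch driver
        events.append(("fence", node))
        return True

    def start():  # hypothetical service start on the survivor
        events.append(("start",))
        return "running"

    print(take_over("node1", power_off, start), events)
```

Refusing takeover when fencing fails trades availability for data integrity, which matches the slide's emphasis.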
                       Application Monitoring
     •   All HA clusters monitor node state
     •   Most monitor key cluster resources - network,
         disk
     •   Many monitor application health
         •   Process existence
         •   Application check scripts
             •   HTTP get on web server
             •   Record retrieval on database
             •   Filesystem directory listing
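
A few of the checks listed above could look roughly like this when written as simple scripts. This is a sketch using only the Python standard library; the function names are hypothetical, and the process check assumes a POSIX system.

```python
import http.client
import os
import urllib.request


def process_exists(pid):
    """Process existence: signal 0 probes the PID without delivering a signal (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def http_get_ok(url, timeout=5.0):
    """HTTP GET on a web server: healthy if a 2xx/3xx response arrives in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False


def directory_listable(path):
    """Filesystem check: healthy if the directory can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print(process_exists(os.getpid()), directory_listable("."))
```

A database check (record retrieval) would follow the same pattern with whichever client library the database uses.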

                        Failover Times
     •   Don't get too hung up on this
     •   Remember that data integrity is paramount
     •   Quoted failover times include only cluster overhead, not
         application recovery
      • Application startup time
      • Filesystem consistency checks
      • Database recovery - transaction replay
     •Example
      •   Product literature cites 5 second failover time
      •   Can be several minutes for database recovery (size &
          activity dependent)
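
To make the arithmetic concrete, the sketch below adds application recovery to a quoted failover time. Only the 5 second figure comes from the example above; the other durations are made-up values for illustration.

```python
def total_outage_seconds(cluster_failover, app_startup, fs_check, db_recovery):
    """Outage as seen by users: cluster overhead plus application recovery."""
    return cluster_failover + app_startup + fs_check + db_recovery


# 5 s is the quoted failover time; the remaining numbers are hypothetical.
total = total_outage_seconds(
    cluster_failover=5,
    app_startup=20,
    fs_check=60,
    db_recovery=180,
)
print(f"{total} s total ({total / 60:.1f} min)")
```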

           Open Source vs Proprietary
              Project vs Product
     •   Open source facilitates self-support &
         customization
     •   Support is a key determinant
     •   Products are generally well tested
     •   Some products are also open source
     •   If you care enough about high availability &
         solution stacks, you're likely to go the
         product route

          Heterogeneous HA Products
     •   Proprietary offerings that run on Linux, W2K,
         UNIX
     •   Unifies user training
     •   May compromise flexibility, adaptability or
         data integrity (ouch!)
     •   Some are Linux products with GUIs that run
         on other platforms
     •   Virtually none allow heterogeneous platforms
         within the same cluster
                           Cost Factors
     •   Beware of hidden charges
         •   Product base fee
         •   Application specific charges (Oracle, DB2, NFS,
             etc)
         •   Support
     •   Some only come with bundled service offerings
     •   Hardware requirements
     •   Proprietary UNIX offerings typically cost
         several times more
                   Vendor Evaluation
     • Company vision - do their cluster offerings complement it or
       distract from it? Future roadmap.
     • Financial Stability
     • Ability to impact the marketplace
     • Responsiveness - ability to provide ongoing feature
       enhancements
     • Proprietary vs open source
     • Product integration - fit with distribution, kernel patches,
       compatibility & support implications
     • New Linux technology vs large monolithic legacy ports
     • How long it's been on the market
                    Open Source Projects
     •   FailSafe - from SGI & SuSE
         •   Optional data integrity provisions (power switch)
         •   Supports 16 nodes
         •   Good set of application kits
     •   Red Hat Cluster Manager
         •   Also offered as a product
         •   Described later in presentation




         HA Cluster Product Comparisons
     •The ground rules
         •   Trying to remain objective
         •   Highlight product strengths
         •   Listed in alphabetical order
         •   Based on web site content as of 10/2002




              HP - MC/Serviceguard
     •   Proprietary - Ported from HP-UX
     •   Only supported on HP hardware
     •   Dynamic online addition/removal of members
     •   Worldwide support services
     •   Quorum voting membership
     •   Up to 8 nodes using FibreChannel storage, 2
         nodes using SCSI
     •   Compaq Alpha line targeted at HPC clusters

              Legato - Availability Manager
     •   Proprietary
     •   Heterogeneous (Linux, W2K, Solaris, HP-UX)
     •   Strong data centric services
         •   Well integrated with SAN environments
         •   Replication
         •   Storage management, volume management,
             backup
     •   Application monitoring
     •   Extensive set of application specific modules
         PolyServe - Application Manager
     •   Proprietary
     •   Application monitoring
     •   Up to 16 nodes
     •   Multiple platforms - Linux, W2K, Solaris
     •   Doesn't require shared storage
     •   Dynamic member addition/removal
     •   Centralized management


            PolyServe - Matrix Server
     •   Tailored for Oracle 9i Real Application
         Clusters
     •   Concurrent read + write access to data on
         shared storage SAN
     •   Cluster filesystem with lock manager +
         distributed cache
     •   Allows incremental growth by adding servers
         + storage
     •   Proprietary
                Red Hat - Cluster Manager
     •   Bundled with RHL Advanced Server 2.1
     •   Both open source & product
     •   Data integrity provisions
         •   Power switches (optional)
         •   Watchdog timer software
     •   Application monitoring
     •   Heterogeneous fileserving via NFS + Samba
     •   Web monitoring GUI
     •   Also integrated Piranha load balancing cluster



               SteelEye - LifeKeeper
     •   Proprietary - UNIX port
     •   Multi-platform - Linux, W2K
     •   Wide set of application kits (separately
         purchased)
     •   Established OEM relationships
     •   Data integrity provisions - via SCSI
         reservations, requiring kernel patches
     •   Application monitoring

                                      IBM
     •   Focusing on HPC
         •   Rackmounted Intel servers
         •   Custom solutions
         •   (older) xCAT software for management, parallel
             operations, and installation
         •   (newer) Cluster Systems Mgt (CSM) for Linux
             •   Remote monitoring, resets, BIOS console
             •   Parallel shell
             •   Requires IBM hardware for embedded service processor
     •   High Availability via partnering

             Veritas Cluster Server
     •   Recent Linux port
     •   16 nodes, wide range of supported apps
     •   Also runs on Windows, AIX, UNIX, Solaris
     •   Integrates with their storage offerings (volume
         management, backup, data replication)
     •   Proprietary




                             Other Vendors
     •   Dell
          •   Strategic partnering for HA software
     •   Penguin Computing
          •   HPC offering via partnership with Scyld Beowulf




                   Consolidated Solutions
     •   Egenera
         •   BladeFrame hardware, backplane eliminates
             cabling
         •   Management software, HA, provisioning
     •   Linux NetworX
         •   Turnkey solution, preintegrated hardware + management
             tools
         •   Custom hardware, dense racks




                        Summary
     •   Know what category of cluster is right for you
     •   Be knowledgeable of required cluster
         features
     •   Weigh your cost criteria
     •   Choose a vendor you can trust to safeguard
         your corporate assets
     •   Be wary of marketing collateral


