Managing Availability in an Enterprise

Agenda

• Welcome and Introductions
• Review of HA Requirements and IT Infrastructure
• Availability Options
• Open Discussion
The Issue of “Downtime”

• Defined:
   – Applications and/or data are not accessible by users for any reason

• Unplanned: 20% of all downtime
   – Environmental factors: 20%
   – Operator error: 40%
   – Application failure: 40%

• Planned: 80% of all downtime
   – Physical environment / backup / recovery: 10%
   – HW, network, OS, systems software: 10%
   – Batch processing: 10%
   – Application and database: 50%
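As a worked check on these figures, multiplying a category's share by a cause's share within that category gives the cause's share of all downtime. For example, operator error is 40% of the unplanned 20%, or 8% of total downtime. The Python sketch below does this arithmetic for every listed cause (the dictionary and function names are illustrative). It also shows that the four planned items, as listed, add up to only 80% of planned downtime, which leaves 16% of all downtime unattributed on this slide.

```python
# Share of all downtime for each category, as given on the slide.
UNPLANNED_SHARE = 0.20
PLANNED_SHARE = 0.80

# Each cause's share *within* its category, as given on the slide.
unplanned = {
    "Environmental factors": 0.20,
    "Operator error": 0.40,
    "Application failure": 0.40,
}
planned = {
    "Physical environment / backup / recovery": 0.10,
    "HW, network, OS, systems software": 0.10,
    "Batch processing": 0.10,
    "Application and database": 0.50,
}


def share_of_total(category_share, breakdown):
    """Scale within-category shares to shares of all downtime."""
    return {cause: category_share * pct for cause, pct in breakdown.items()}


totals = {
    **share_of_total(UNPLANNED_SHARE, unplanned),
    **share_of_total(PLANNED_SHARE, planned),
}

# Largest contributors first.
for cause, share in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cause:42s} {share:6.1%}")

# The four planned items sum to 80% of planned downtime, so part of it is unattributed.
unattributed = PLANNED_SHARE * (1 - sum(planned.values()))
print(f"{'Planned, not attributed on the slide':42s} {unattributed:6.1%}")
```

Application and database work comes out on top at 40% of all downtime. That matches the deck's emphasis on planned outages.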
The Analyst's Comment

   Gartner research shows that an average of 80 percent of mission-critical application service downtime is directly caused by people or process failures.

   Organizations must recognize that people, process and infrastructure are all interdependent facets of an HA solution. In fact, people and process issues make up at least 80% of the solution.
Did You Know …

• The average company has:
   – 4 different server operating systems
   – 6 operational databases
   – 10 GB of data per person

• 65% of IT managers have begun to integrate with suppliers
• 55% of IT managers have begun to integrate with customers
• 45% of all enterprise integration plans have failed
OS/400 Availability Options

[Diagram: the three options compared]
• Switched Disk (LAN and HSL connections)
• SAN: Shark/EMC² (LAN connection)
• Replication (LAN and HSL connections)
OS/400 Switched Disk (IASP)

[Diagram: a production server and a secondary server, both serving clients over a LAN, attached by HSL to a shared switched-disk tower (the IASP).]
OS/400 Switched Disk (IASP)
• System-level solution with limited redundancy for:
   – hardware upgrades
   – system maintenance
   – system failure
• Does not resolve:
   – the backup window
   – software maintenance
   – disk failure (same exposure as a single system)
   – disaster recovery
   – failure of IASP components (power supply, HSL breakage, disk failure)
• iSeries hosts must be physically close to the I/O tower
   – maximum HSL cable length is 50 feet (15 meters)
• Provides redundancy at the system unit level only
• Limited availability solution for avoiding downtime
   – No automatic switchover or heartbeat to detect system unit failure unless in a clustered environment
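Outside a cluster, nothing in the switched-disk setup notices when the production system unit stops. The missing piece can be sketched as a heartbeat monitor. The secondary expects a message at a fixed interval and declares the peer failed after several consecutive misses. This Python sketch is illustrative only. The class name, the 3-second interval, and the four-miss threshold are assumptions, not OS/400 cluster resource services behavior.

```python
import time


class HeartbeatMonitor:
    """Declares the peer failed after `max_missed` heartbeat intervals pass with no beat.

    Hypothetical sketch: the interval and threshold are illustrative values,
    not OS/400 cluster defaults.
    """

    def __init__(self, interval_s=3.0, max_missed=4, clock=time.monotonic):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_seen = clock()

    def record_heartbeat(self):
        """Call whenever a heartbeat message arrives from the production system."""
        self.last_seen = self.clock()

    def missed_beats(self):
        return int((self.clock() - self.last_seen) // self.interval_s)

    def peer_failed(self):
        return self.missed_beats() >= self.max_missed


# Simulated clock: advance time by hand instead of waiting.
now = [0.0]
monitor = HeartbeatMonitor(interval_s=3.0, max_missed=4, clock=lambda: now[0])

now[0] = 9.0                      # three intervals with no heartbeat
print(monitor.peer_failed())      # False: still within tolerance
now[0] = 12.0                     # fourth missed interval
print(monitor.peer_failed())      # True: a clustered setup could now begin a switchover
monitor.record_heartbeat()        # peer is back
print(monitor.peer_failed())      # False
```

Requiring several missed beats, rather than one, trades a slower detection for fewer false switchovers caused by a momentary network delay.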
OS/400 Switched Disk (IASP)

Object types not supported in an IASP

   *AUTHLR    Authorization holder
   *AUTL      Authorization list
   *CFGL      Configuration list
   *CNNL      Connection list
   *COSD      Class-of-service description
   *CRG       Cluster resource group
   *CSPMAP    Cross System Product map
   *CSPTBL    Cross System Product table
   *CTLD      Controller description
   *DDIR      Distributed file directory
   *DEVD      Device description
   *DOC       Document
   *EDTD      Edit description
   *EXITRG    Exit registration
   *FLR       Folder
   *IGCSRT    Double-byte character set (DBCS) sort table
   *IGCTBL    Double-byte character set (DBCS) font table
   *IPXD      Internetwork packet exchange description
   *JOBQ      Job queue
   *JOBSCD    Job scheduled entry
   *LIND      Line description
   *MODD      Mode description
   *M36       AS/400 Advanced 36 machine
   *M36CFG    AS/400 Advanced 36 machine configuration
   *NTBD      NetBIOS description
   *NWID      Network interface description
   *NWSD      Network server description
   *OUTQ      Output queue
   *PRDAVL    Product availability
   *SOCKET    Socket
   *SSND      Session description
   *S36       System/36 description
   *USRPRF    User profile
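OS/400 SAN Solution (Shark/EMC²)

[Diagram: production and secondary servers serve clients over a LAN. Each server is attached to its own SAN, and the two SANs are mirrored with PPRC (Shark) or SRDF (EMC²).]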
OS/400 SAN Solution (Shark/EMC²)
• Disk-image-based solution similar to Switched Disk
   – System unit solution only
   – Disk-level approach to disaster recovery
   – Does not address many availability challenges
• Data resiliency solution, not part of OS/400 topology, architecture, or clustering
   – Works with volumes (as on NT or Unix), while the iSeries file system is object-based
   – Copies whole images rather than "net change" objects
• Disaster recovery setup requires a second SAN
   – The second SAN can't be used for real-time processing such as backups
   – It stays in a restricted state and is unusable
   – Synchronous mirroring is limited to a distance of 64 miles (103 km)
• Recovery process is identical to recovery from a single-system outage: IPL and manual intervention
   – Primary copy must be brought to a restricted state and powered down to ensure object integrity
   – Requires full volume retrieval rather than individual objects
   – Individual objects cannot be retrieved from the second SAN image
   – Risk of damaged objects: data is intact, but the application does not start
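The distance limit on synchronous mirroring follows from propagation delay. The primary cannot complete a write until the secondary SAN acknowledges it. The sketch below assumes that light in fiber covers about 1 km every 5 microseconds and estimates the minimum delay added to each write. At the 103 km limit this is roughly 1 ms per write, before any switch, protocol, or disk latency. The constant and function name are illustrative assumptions, not figures from IBM or EMC.

```python
# Assumption: light in optical fiber covers roughly 1 km every 5 microseconds
# (about two-thirds of its speed in a vacuum).
FIBER_US_PER_KM = 5.0


def sync_write_penalty_ms(distance_km):
    """Lower bound on the delay a synchronous mirror adds to each write, in ms.

    The primary cannot complete the write until the remote SAN acknowledges it,
    so every write pays at least one round trip over the link. Switch, protocol
    and disk latency are ignored.
    """
    round_trip_km = 2 * distance_km
    return round_trip_km * FIBER_US_PER_KM / 1000.0


for km in (10, 50, 103):
    print(f"{km:4d} km: at least {sync_write_penalty_ms(km):.2f} ms per write")
```

Every write pays this penalty, so more link bandwidth does not remove it. Only shorter distance or asynchronous mirroring does.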
OS/400 Replicated Systems

[Diagram: replicated servers. Production and secondary servers serve clients over a LAN and are linked by HSL.]
OS/400 Solutions versus Downtime

Category   Outage source       % of cat.  RAID-5  SwDisk  SAN  2x SAN  H Avail
Planned    Backup window       68%        No      No      No   No      Yes
           Software changes    10%        No      No      No   No      Yes
           PTF installation    10%        No      No      No   No      Yes
           Maintenance         7%         No      No      No   No      Yes
           Hardware upgrade    5%         No      Yes     Yes  Yes     Yes
Unplanned  Disk unit failure   25%        No*     No      No   Yes     Yes
           Software            22%        No      No      No   Yes     Yes
           Power outage        17%        No      No      No   Yes     Yes
           Telecom             18%        No      No      No   No      No
           Human error         12%        No      No      No   No      Yes
           Processors          4%         No      Yes     Yes  Yes     Yes
           Disaster            2%         No      No      No   Yes     Yes

% of cat. = share of planned or of unplanned downtime; each category totals 100%.

* RAID-5 protects against the failure of a single disk within the RAID set. If the whole disk unit fails, more than one disk fails, or the load source disk is outside RAID-5 protection, data loss and downtime will occur.
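The percentages in each category of this table add up to 100%, so it can be summarized by adding the shares that each option protects against. The Python sketch below does that tally. RAID-5's qualified "No*" for disk unit failure is counted as not covered. By this count, replication (H Avail) covers all of the planned outage sources and 82% of the unplanned ones. Dual SANs cover 5% of planned and 70% of unplanned downtime. Switched disk covers 5% of planned and 4% of unplanned.

```python
# Solution columns from the outage table, in order.
SOLUTIONS = ["RAID-5", "SwDisk", "SAN", "2x SAN", "H Avail"]

# (category, outage source, % of category, Y/N per solution in column order).
# RAID-5's qualified "No*" for disk unit failure is recorded as N.
TABLE = [
    ("Planned", "Backup window", 68, "N N N N Y"),
    ("Planned", "Software changes", 10, "N N N N Y"),
    ("Planned", "PTF installation", 10, "N N N N Y"),
    ("Planned", "Maintenance", 7, "N N N N Y"),
    ("Planned", "Hardware upgrade", 5, "N Y Y Y Y"),
    ("Unplanned", "Disk unit failure", 25, "N N N Y Y"),
    ("Unplanned", "Software", 22, "N N N Y Y"),
    ("Unplanned", "Power outage", 17, "N N N Y Y"),
    ("Unplanned", "Telecom", 18, "N N N N N"),
    ("Unplanned", "Human error", 12, "N N N N Y"),
    ("Unplanned", "Processors", 4, "N Y Y Y Y"),
    ("Unplanned", "Disaster", 2, "N N N Y Y"),
]


def coverage(category):
    """Sum the % of a category's downtime that each solution protects against."""
    covered = {s: 0 for s in SOLUTIONS}
    for cat, _source, pct, flags in TABLE:
        if cat != category:
            continue
        for solution, flag in zip(SOLUTIONS, flags.split()):
            if flag == "Y":
                covered[solution] += pct
    return covered


for category in ("Planned", "Unplanned"):
    row = coverage(category)
    print(category.ljust(10), "  ".join(f"{s}: {row[s]}%" for s in SOLUTIONS))
```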
The Critical Criteria
Solution Fundamentals
• Professional Services
   – Certified Expertise
   – Proven Methodology
   – Delivered runbook of procedures
   – Education

• Vendor Support
   – Certified Technical Expertise
   – A Live Person on the Phone

• Field Service and Support
   – Understands the business and requirements
   – Supports the environment

• Solution Options for Your Requirements
   – Provider delivers a range of options, including applications, services and support

• References
   – In your industry and applications
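OS/400 Replication

[Diagram: replication from the production system to the backup system.
 Production system: DB2, data areas (DA), data queues (DQ), IFS, programs, user profiles, spooled files, line descriptions, and other objects; DB2 journal; audit journal; event polling; saving parameters; ODS/400 save-file processing.
 Links: SNA, IP, or Opti communication links.
 Backup system: staging queue; apply-change processes 1, 2 ... 8, 9; ODS/400 archive; replicated DB2, DA, DQ, IFS, programs, user profiles, spooled files, line descriptions, and other objects.]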
Remote journaling has no provision to ensure integrity of the target data.
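In the replication diagram above, the backup side feeds a staging queue into apply processes numbered 1 to 9. One common way to run several apply processes safely is to route each journal entry by object. All changes to one object land in the same apply queue and keep their journal order, while changes to different objects are applied in parallel. The sketch below shows this approach. It is an assumption, not a description of how ODS/400 works internally, and the class and field names are hypothetical.

```python
import zlib
from collections import deque
from dataclasses import dataclass


@dataclass
class JournalEntry:
    seq: int       # journal sequence number on the production system
    obj: str       # object the change belongs to, e.g. "APPLIB/ORDERS"
    change: str    # the change itself, simplified to text for this sketch


class Router:
    """Hypothetical router: spreads staged entries over N apply queues by object.

    Every entry for a given object lands in the same queue, so that object's
    changes keep their journal order; different objects can be applied in parallel.
    """

    def __init__(self, n_apply=9):
        self.queues = [deque() for _ in range(n_apply)]

    def route(self, entry):
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        idx = zlib.crc32(entry.obj.encode()) % len(self.queues)
        self.queues[idx].append(entry)
        return idx


router = Router(n_apply=9)
staging_queue = [
    JournalEntry(1, "APPLIB/ORDERS", "insert order 100"),
    JournalEntry(2, "APPLIB/CUSTOMER", "update customer 7"),
    JournalEntry(3, "APPLIB/ORDERS", "update order 100"),
    JournalEntry(4, "APPLIB/ORDERS", "delete order 99"),
]
for entry in staging_queue:          # entries arrive in journal sequence order
    router.route(entry)

for n, queue in enumerate(router.queues, start=1):
    if queue:
        print(f"apply {n}: {[e.seq for e in queue]}")
```

Routing by object keeps the design simple. The trade-off is that changes spanning several objects, such as a multi-file transaction, are not applied atomically across queues.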
OS/400 Remote Journaling

[Diagram: production system A and a backup system running applications A1 to A4, linked by eight local/remote journal pairs (Local J1/Remote J2 through Local J15/Remote J16) whose direction alternates between the two systems. The set of applications shown shrinks from A1 to A4 down to A1 alone.]
The Critical Criteria
Technology Fundamentals
• System Integrity
   – If the copy isn't perfect, it's useless
   – Includes ALL the data and objects
   – System integrity is different from data integrity
   – Role swap is about integrity, not about time

• Ease of Use
   – Powerful, intuitive, common interface
   – Ease of management and configuration
   – Easily trained and fully documented, including a “runbook”

• Performance
   – Slow isn't an option for replication or switching
   – Best throughput with the least CPU
   – Best throughput in catch-up mode

• Scalability
   – The solution must work on one to many machines, across applications, databases, departments, servers and service level agreements
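The point that a copy is useless unless it includes all the data and objects suggests checking the target against the source object by object, not just row by row. The audit below assumes each system can export an inventory that maps object names to their contents. That input is hypothetical and is not an OS/400 API. The audit compares the two inventories by SHA-256 fingerprint and reports objects that are missing, extra, or different.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit(source: dict, target: dict) -> dict:
    """Compare two {object name: contents} inventories and report differences."""
    src = {name: fingerprint(data) for name, data in source.items()}
    tgt = {name: fingerprint(data) for name, data in target.items()}
    return {
        "missing_on_target": sorted(src.keys() - tgt.keys()),
        "extra_on_target": sorted(tgt.keys() - src.keys()),
        "content_differs": sorted(n for n in src.keys() & tgt.keys() if src[n] != tgt[n]),
    }


# Hypothetical inventories: names combine library/object and object type.
source = {
    "APPLIB/ORDERS *FILE": b"order rows ...",
    "APPLIB/ORDPGM *PGM": b"program v2",
    "QSYS/JSMITH *USRPRF": b"profile settings",
}
target = {
    "APPLIB/ORDERS *FILE": b"order rows ...",
    "APPLIB/ORDPGM *PGM": b"program v1",
}

for problem, names in audit(source, target).items():
    print(problem, names)
```

In this example the database file matches, but the target still fails the audit because a program and a user profile differ. This is the distinction the slide draws between system integrity and data integrity.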
The Critical Criteria
Solution Fundamentals
• Professional Services
   – Certified Expertise
   – Proven Methodology
   – Delivered runbook of procedures

• Support
   – Certified Technical Expertise
   – A Live Person on the Phone

• Local Service and Support
   – Understands the business and requirements
   – Supports the environment

• Solution Options for Your Requirements
   – Provider delivers a range of options, including applications, services and support

• References
   – In your industry and applications
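How OS/400 Role Swap Works

[Diagram: source system on the left, target system on the right. On the source, user programs update the database, data areas and data queues (DTAA, DTAQ), and IFS objects, and the changes are recorded in the journal. The changes travel over SNA, IP, or Opti communication links to the target, where a router and staging queue feed apply processes A1, A2 ... An ... A9 that update the target's database, data areas, data queues, and IFS. The target also shows user programs and inquiries (PGM, INQ), printing (PRT), and its own journal (JRN).]

Vision Solutions 09, 2001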
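How OS/400 Role Swap Works

[Diagram: the same two systems after the swap. The left-hand system, formerly the source, is now labeled the target, and the right-hand system, formerly the target, is now the source running the user programs. Both systems show a journal, router, staging queue, apply processes A1 ... An ... A9, database, data areas, data queues, and IFS, with the SNA, IP, or Opti communication links between them.]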
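The two role-swap diagrams show the same components before and after the swap. The former target becomes the source, and replication runs in the opposite direction. The sequence below is a hypothetical Python model of a planned swap, not Vision Solutions' actual procedure. Users stop on the source, and the target applies everything still in its staging queue. An integrity check must pass before roles and the replication direction are reversed, which reflects the earlier point that a role swap is about integrity, not time. All names, including `apply_entry`, are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str                                           # "source" or "target"
    users_active: bool = False
    pending_apply: list = field(default_factory=list)   # staged, not yet applied


def apply_entry(node, entry):
    """Hypothetical placeholder for applying one journal entry to the node's copy."""


def role_swap(source, target, integrity_check):
    """Hypothetical planned role swap; returns the steps taken, in order.

    Raises RuntimeError, leaving roles unchanged, if the integrity check fails.
    """
    steps = []
    source.users_active = False
    steps.append(f"end user activity on {source.name}")

    while target.pending_apply:                 # drain the staging queue first
        apply_entry(target, target.pending_apply.pop(0))
    steps.append(f"apply backlog drained on {target.name}")

    if not integrity_check():
        raise RuntimeError("target copy failed integrity check; roles not swapped")
    steps.append(f"integrity verified on {target.name}")

    source.role, target.role = "target", "source"
    steps.append(f"replication reversed: {target.name} -> {source.name}")
    target.users_active = True
    steps.append(f"users switched to {target.name}")
    return steps


prod = Node("PROD", "source", users_active=True)
backup = Node("BACKUP", "target", pending_apply=["entry 41", "entry 42"])
for step in role_swap(prod, backup, integrity_check=lambda: True):
    print(step)
```

Draining the backlog before the integrity check matters. Checking a target that still has unapplied changes would report differences that are only in transit.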
Thank You!

				