Fault Tolerant and Resilient Web Services
By Terry B. Bobbie
Systems Engineer, Raytheon ITSS
Bobbie@usgs.gov



April 18, 2002

U.S. Department of the Interior
U.S. Geological Survey
Raytheon, Contractor for the USGS at the EROS Data Center

Goals of this briefing

   Examine “Fault Tolerant and Resilient”
   Introduce an approach to mapping your
    requirements to service offerings
   Foster “out of the cube” thinking
   Learn from open discussions
   Gather some feedback and have some fun



                                             2
So why all the hubbub?

   Your data and service needs may have an
    elevated importance not known before
   Importance of hazard and emergency
    response information
            Protection of life and property
            Business continuity
   Matured technology and services
       Improved and reliable services



                                               3
Have our requirements changed?
                     (suppliers and consumers)
           Private Industry, Academia, and Government


   Needs and requirements are diverse,
    unique, varied, and may not lend
    themselves to “stove-piped” solutions
       Diverse – community of users
       Uniqueness of use, data, and user elements
        (e.g., sophistication, access requirements,
        delivery requirements)
       “We’ve always done it this way” approach may
        no longer be valid


                                                        4
Fault Tolerant and Resilient

   What is “Fault Tolerant and Resilient”?
No Fault Web … where packets collide with each
  other (injury) on the information super highway
  without individual ownership of responsibility
  (no individual packet liability)
Resilient Web … where injured packets repair
  themselves to “good as new” while “on-the-fly”
  (and sue the switches for pain, suffering and
    BIG $$$)

                                                    5
How about a design based on
Replication of … < ? >


  [Diagram: three replicated nodes, each labeled aa->xxhost.domain]




                                                              6
Example – USGS National Web




                              7
Example technologies used
3 Servers                    One Interface

   3 Sun quad 450s
   Replicated File systems (Andrew File System)
   DNS configurations
   Cisco DistributedDirector will provide
    uninterrupted access to mirrored information
       Load balancing among available National modules
       Only available modules remain in the pick list
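
The pick-list behavior described above can be sketched as follows. This is only an illustration of the idea, not how Cisco DistributedDirector works internally; the module names, the `is_available` flag, and the round-robin policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One mirrored web module (hypothetical name and health flag)."""
    name: str
    is_available: bool = True


def pick_list(modules: List[Module]) -> List[Module]:
    """Return only the modules that are currently available."""
    return [m for m in modules if m.is_available]


def next_module(modules: List[Module], counter: int) -> Module:
    """Round-robin over the current pick list (assumed policy)."""
    available = pick_list(modules)
    if not available:
        raise RuntimeError("no modules available")
    return available[counter % len(available)]
```

When a module is marked unavailable it simply drops out of the pick list, and requests continue to be spread over the remaining modules.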




                                                          8
Benefits of Fault Tolerance and
Resiliency
   Improved reliability:
    Geographically distribute public access to
    content
   Improved customer service:
    Serve the public from high bandwidth sites and
    reclaim bandwidth for data transfer
   Improved management of content:
       Allow for distributed content management where
        appropriate while consolidating physical location



                                                            9
Benefits of Fault Tolerance and
Resiliency
   Improved security:
       Authentication and firewalls
       Sophisticated file access
            Kerberos authenticated editing of web pages from any
             system with an AFS client: desktop, laptop or server. At
             the office, home or away!
   Reduced System Administration Requirements
   Near 100% reliability for data and information
       Protects against
            network failures
            server failure
            natural disaster

                                                                        10
An approach to analyzing service
opportunities
   Phased approach
       1st phase is discovery, understanding, and
        translation of Web Service Requirements

       2nd phase is discovery, understanding, and
        translation of vendor market opportunities

       3rd phase is cross walking (mapping)
        requirements to vendor services available


                                                     11
Phase 1
Analyzing requirements
   Characterize Web hosting requirements
       Examples include
            Real-Time gathering and reporting
            WWW pages
            Images
            Flat files
            Databases
       Each may differ in their characteristics relative to
            Data
            Manipulation
            Access




                                                               12
Phase 1 - Web hosting
requirements - Real-Time
   Real-Time
       An event or series of events whose nature and mission require periodic
        data collection followed by timely delivery.
       In some cases this can be described as “on-demand”: a master process
        collects the changing data, and a corresponding slave process answers
        queries by returning one element or a series of elements, each collected
        at a specific moment and delivered quickly and efficiently.
       If the same request is repeated with identical parameters, the content
        delivered can be expected to differ.
       Example: sampling a digital clock. Each second, the new time is passed to
        a query and delivery staging area, which delivers its contents without
        delay when queried. Each new sample may overwrite the previous one or be
        appended to build a series. Depending on the query parameters, the
        response ranges from the single current time to a series running from
        newest to oldest, or any subset of that series.
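
The digital clock example can be sketched as a small staging area. This is a minimal illustration under assumptions: the class name `StagingArea`, the `keep_history` switch, and the `max_samples` limit are hypothetical and not part of the briefing.

```python
from collections import deque
from datetime import datetime
from typing import Deque, List


class StagingArea:
    """Holds collected samples for query and delivery.

    keep_history=False overwrites the previous sample each time;
    keep_history=True appends samples to build a series.
    """

    def __init__(self, keep_history: bool = True, max_samples: int = 3600) -> None:
        # maxlen=1 means every new sample replaces the previous one.
        self._samples: Deque[datetime] = deque(
            maxlen=max_samples if keep_history else 1
        )

    def collect(self, sample: datetime) -> None:
        """Called by the collecting process when a new sample arrives."""
        self._samples.append(sample)

    def query(self, count: int = 1) -> List[datetime]:
        """Return up to `count` samples, newest first."""
        newest_first = list(reversed(self._samples))
        return newest_first[:max(count, 0)]
```

Calling `query()` with no argument returns only the current time, while `query(n)` returns a series running from newest to oldest; repeating the same query a second later gives a different answer.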



                                                                                        13
Phase 1 - Web hosting
requirements – WWW pages
   WWW pages
       Content delivered within WWW pages may be text-based information,
        documents, or graphics that are vital to information and research
        but can generally be described as static. Each user request or “hit”
        returns the same front page information.
       Front pages of WWW servers that act as “directories or portals” of
        information may be static, requiring updates only as often as the listing
        changes. Such a page (a listing of directories) is static until a
        directory is added or deleted. The content behind a directory entry may
        be hosted by a different source than the directory itself, so that
        content is not necessarily static.




                                                                                      14
Phase 1 - Web hosting
requirements - Images
   Images
       Images can be large or small, compressed or uncompressed, and of
        different formats. Many images are JPEG (or another common format) and
        are used for logos, pictures for hosting, graphic representations, etc.
        Some images are pre-generated (like a logo), while others are generated
        dynamically from user input.
       Images may be static: generated one time and rarely changed (most
        often attached to WWW pages as static graphics delivered on each request
        or hit)
       Some images may be dynamically created, where user input defines the
        criteria for graphic generation (e.g., geo-spatial data and rendering)




                                                                                     15
Phase 1 - Web hosting
requirements - Databases
   Databases
       Many of today’s WWW pages contain user-selectable parameters that vary with the
        user’s subject matter or interest. Custom user input may be broad and open-ended
        (effectively an infinite number of input parameters), much like a keyword query.
        A good example is a service where results are returned based on input parameters
        chosen from a very large number of possible selections. WWW search engines are
        designed with the idea that user input is not entirely predictable (e.g., a keyword
        search where the keyword could be any word, or combination of words, in the
        English language of over one million words).
       One may counter that nothing is truly infinite because everything has an end limit
        or boundary. In the context of this definition, assume that “infinite” means a
        very large order of magnitude.
       USGS has many examples of this requirement today. In one, user-selectable
        boundaries are the input criteria for delivering geo-spatial data. The same
        database, populated with a known set of data files, is queried with different input
        parameters and combinations, and different geo-spatial information is delivered
        for each unique query.




                                                                                                  16
Phase 1 - Web hosting
Data characteristics
   Data Characteristics
       Frequency of update requirements
            How often the data requires updating, modification or
             deletion (e.g., hourly, weekly, monthly, dynamic)
       Volume of data
            Quantity as it relates to storage requirements
       Geographic Scope and Context
            Data may be relevant to global, national, regional, or local
             needs and may require service from multiple locations




                                                                            17
Phase 1 - Web hosting
manipulation requirements
   Manipulation Characteristics
       None (text-like)
            Data is served as a flat file without manipulation
       On-the-fly graphics generation
            Generation or rendering of graphics before presentation
             to a user
       Database query
            Lookup is executed based upon user input parameters
       Other special (Java based, map object rendering, etc.)




                                                                       18
Phase 1 - Web hosting access
requirements
   Access characteristics
       Frequency of use (hits, files served, etc.)
            How often are requests serviced in a period
       Fault tolerance limit (Low, medium, high)
       Importance of availability (L,M,H)
       Volume of units served per period
            150 WWW page (25KB ea.) deliveries hourly
            250 Images (500MB ea.) delivered per 24 hr day
            500 Database queries & responses per 8 hr business day
            350 “Gif-on-the-fly” deliveries per 24 hr day
       Expected delivery time per request
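
One way to record the Phase 1 characterization (data, manipulation, access) is a single record per requirement. The sketch below is an assumption about how such a record might look; the class and field names are hypothetical. The sample values are taken from Requirement # 1 of the case study later in this briefing.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "L"
    MEDIUM = "M"
    HIGH = "H"


class Manipulation(Enum):
    NONE = "none (text-like)"
    ON_THE_FLY_GRAPHICS = "on-the-fly graphics generation"
    DATABASE_QUERY = "database query"
    OTHER = "other special"


@dataclass(frozen=True)
class HostingRequirement:
    name: str
    # Data characteristics
    update_frequency: str          # e.g. "hourly", "monthly"
    volume_bytes: int              # storage required
    geographic_scope: str          # "local", "regional", "national", "global"
    # Manipulation characteristics
    manipulation: Manipulation
    # Access characteristics
    units_per_period: int          # e.g. pages served per period
    unit_size_bytes: int
    period_seconds: int
    fault_tolerance: Level
    availability_importance: Level
    expected_delivery_seconds: float


KB = 1_000        # decimal units assumed
MB = 1_000 * KB

req1 = HostingRequirement(
    name="WWW pages",
    update_frequency="monthly",
    volume_bytes=500 * MB,
    geographic_scope="local",
    manipulation=Manipulation.NONE,
    units_per_period=180,
    unit_size_bytes=50 * KB,
    period_seconds=3600,
    fault_tolerance=Level.HIGH,
    availability_importance=Level.LOW,
    expected_delivery_seconds=5,
)
```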


                                                                      19
An approach to analyzing service
opportunities
   Phased approach – Phase 2
       1st phase is discovery, understanding, and
        translation of Web Service Requirements

       2nd phase is discovery, understanding, and
        translation of vendor market opportunities

       3rd phase is cross walking (mapping) requirements
        to vendor services available


                                                            20
Phase 2 - Gain an understanding
of services available
   Web Services opportunities
       Vendor supplied
       Host site supplied
       Combinations of any or all
       Others ?




                                     21
Phase 2 - Characteristics of
Service Opportunities

   Key Characteristic Descriptions
   Data
       Local storage capability / capacity
       Responsiveness to (period or cycle) changes in source data (e.g.,
        new WWW page or content, add/delete/change image files, database
        content and architecture, real-time data gathering)
       Change Management Strategy and Plans (out-of-service
        maintenance, scheduled maintenance, access permissions, content
        change, software and platform changes, etc.)
       Geographic context (local, regional, national, global)




                                                                           22
Phase 2 - Characteristics of
Service Opportunities

   Key Characteristic Descriptions
   Manipulation
       Local processing capability/capacity
       Scalability of end-to-end response to events (e.g., excess capacity or
        headroom of resources, networks, CPU, memory, I/O interfaces,
        storage, other surge capability, etc.)




                                                                                23
Phase 2 - Characteristics of
Service Opportunities

   Key Characteristic Descriptions
   Access
       Bandwidth capability/capacity
       Service redundancy (networks, platforms, other infrastructure)
       Responsiveness (response time) to requests for serving data to
        end user
       Geographic context (locations are local, regional, national, global)
       Delivery of Data guarantee




                                                                               24
Phase 2 - Characteristics of
Service Opportunities
   Key Characteristic Descriptions
   Misc. (may apply to any or all of the categories)
       Uptime guarantee
       Security Management Strategy and Plans (system level, content,
        customer identity, etc.)
       Prioritized Users (e.g., can the vendor offer a scheme that prioritizes
        users based upon volume, frequency, emergency response, etc.)
       Operations and Service Level agreements (backup strategies, 24x7
        system monitoring, trouble analysis and resolution, network
        management, technical support to end users and customer,
        contingency plans, etc.)




                                                                             25
An approach to analyzing service
opportunities
   Phased approach – Phase 3
       1st phase is discovery, understanding, and
        translation of Web Service Requirements

       2nd phase is discovery, understanding, and
        translation of vendor market opportunities

       3rd phase is cross walking (mapping) requirements
        to vendor services available


                                                            26
Phase 3 - a crosswalk analysis of
requirements and services
   Case Study
       Requirement # 1 – WWW pages
           Data Requirements
                Frequency of update (monthly)
                Volume is 500 MB (stored pages, graphics, work area)
           Manipulation
                None (text based page with small static graphics)




                                                                        27
Phase 3 - a crosswalk analysis of
requirements and services
   Case Study – Requirement # 1 - Cont’d
          Access
               180 pages served per hour (history = 3 per min)
               Fault tolerance is high (outages are ok)
               Importance of availability is low (not required to
                safeguard human life and property)
               Volume is 180 x 50KB = 9000KB (9MB) per hour, or
                150KB/min, or 2500 bytes/sec (sustained rate)
               With an expected delivery time of 5 sec per page, the
                delivery rate requirement is 50KB / 5 sec = 10KB/sec



                                                                     28
Phase 3 - a crosswalk analysis of
requirements and services
   Case Study
       Requirement # 2 – Image file generation
            Data Requirements
                 Frequency of update (hourly updates required)
                 Volume is 300 TB (image graphics)
            Manipulation
                 High (Gif-on-the-fly generation graphics)




                                                                  29
Phase 3 - a crosswalk analysis of
requirements and services
   Case Study – Requirement # 2 - Cont’d
          Access
               10 files served per hour (history)
               Fault tolerance is low (very few outages)
               Importance of availability is medium (some
                requirement to safeguard human life and property)
               Volume is 10 x 500MB = 5000MB per hour, or 1.389MB/sec
                (sustained rate)
               An expected delivery time of 1 hr/file gives a delivery
                rate requirement of 500MB / 3600 sec = 139KB/sec
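
The rate arithmetic for both case-study requirements can be checked with a short script. This is a sketch assuming decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB), which is what the figures on the slides imply; the function names are hypothetical.

```python
KB = 1_000          # decimal units, matching the slide arithmetic
MB = 1_000 * KB


def sustained_rate(units: int, unit_bytes: int, period_seconds: int) -> float:
    """Average bytes per second needed to serve `units` items per period."""
    return units * unit_bytes / period_seconds


def delivery_rate(unit_bytes: int, delivery_seconds: float) -> float:
    """Bytes per second needed to deliver one item within the expected time."""
    return unit_bytes / delivery_seconds


# Requirement 1: 180 pages/hr at 50 KB, each delivered within 5 sec
req1_sustained = sustained_rate(180, 50 * KB, 3600)   # 2500.0 bytes/sec
req1_delivery = delivery_rate(50 * KB, 5)             # 10000.0 bytes/sec

# Requirement 2: 10 files/hr at 500 MB, each delivered within 1 hr
req2_sustained = sustained_rate(10, 500 * MB, 3600)   # about 1.389 MB/sec
req2_delivery = delivery_rate(500 * MB, 3600)         # about 139 KB/sec
```

The sustained rate describes average load over the period, while the delivery rate describes the bandwidth a single request needs to finish on time; a service offering has to satisfy both.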



                                                                    30
Requirements Matrix
Site and/or Customer Requirements

                                  Requirement 1         Requirement 2
                                  (WWW pages)           (Images)
Freq of Update                    Monthly               Hourly
Volume                            500 MB                300 TB
Geographic Scope                  Local                 National
Manipulation (H,M,L)              None (L)              Gif-on-the-fly (H)
Access                            180 pgs/hr (50KB)     10 files/hr (500 MB)
Volume Output (sustaining rate)   2.5KB/sec             1.389MB/sec
Expected Delivery Time            5 sec                 1 hr
(delivery rate)                   10 KB/sec             139 KB/sec




                                                                                                     31
Service Opportunities Matrix
Service Offerings (Opportunities)

                                                    Vendor A offers
Refresh Cycle (Freq of Update)                      Daily/Monthly
Storage Capacity (Volume)                           500 TB
Geographic Context (Location)                       National
Manipulation (Resource & Scalability requirements)  Yes (high)
Access (Bandwidth)                                  10 files/hr (500 MB)
Volume Output (Bandwidth Capacity)                  1.389MB/sec
Expected Delivery Time (Delivery rate guarantee)    .5 hr, 278 KB/sec




                                                                                        32
The Magic Algorithm Matrix
Service Offerings (Opportunities)

                                                    Requirement 1           Vendor A                Score
                                                    (WWW Pages)             offers                  (Pass/Fail)
Refresh Cycle (Freq of Update)                      Monthly                 Daily/Monthly           Pass
Storage Capacity (Volume)                           500 MB                  500 TB                  Pass
Geographic Context (Location)                       Local                   National                Pass
Manipulation (Resource & Scalability requirements)  Low                     Very Scalable (high)    Pass
Access (Bandwidth)                                  180 pgs/hr of 50KB ea   10 files/hr, 500 MB     Pass
Volume Output (Bandwidth Capacity)                  2.5KB/sec               1.389MB/sec             Pass
Expected Delivery Time (Delivery rate guarantee)    5 sec, 10KB/sec         .5 hr, 278 KB/sec       Pass

Overall Score = Pass

                                                                                                            33
The Magic Algorithm Matrix
Service Offerings (Opportunities)

                                                    Requirement 2           Vendor A offers         Score
                                                    (Image)                                         (Pass/Fail)
Refresh Cycle (Freq of Update)                      Hourly                  Daily/Monthly           Fail
Storage Capacity (Volume)                           300 TB                  500 TB                  Pass
Geographic Context (Location)                       Global                  National                Fail
Manipulation (Resource & Scalability requirements)  Gif-on-the-fly (high)   Very Scalable (high)    Pass
Access (Bandwidth)                                  10 files/hr, 500 MB ea  10 files/hr, 500 MB     Pass
Volume Output (Bandwidth Capacity)                  1.389MB/sec             1.389MB/sec             Pass
Expected Delivery Time (Delivery rate guarantee)    1 hr, 139KB/sec         .5 hr, 278 KB/sec       Pass

Overall Score = Fail
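
The row-by-row scoring in the two matrices above can be sketched in code. This is not a definitive algorithm; the briefing does not spell one out. The comparison rules, the dictionary keys, the ordinal ranking of geographic scope, and treating "monthly" as 30 days are all assumptions for illustration. The qualitative Manipulation and Access rows are left out, and Vendor A's refresh cycle is taken as its fastest option (daily).

```python
from typing import Dict

KB = 1_000
MB = 1_000 * KB
TB = 1_000_000 * MB

# Assumed ordinal scale: a wider scope covers the narrower ones.
SCOPE = {"local": 0, "regional": 1, "national": 2, "global": 3}
# Assumed refresh intervals in seconds ("monthly" taken as 30 days).
REFRESH = {"hourly": 3_600, "daily": 86_400, "monthly": 2_592_000}


def score(req: Dict, offer: Dict) -> Dict[str, bool]:
    """Pass/Fail per row: the offering must meet or exceed the requirement."""
    return {
        "refresh_cycle": REFRESH[offer["refresh"]] <= REFRESH[req["refresh"]],
        "storage": offer["storage"] >= req["storage"],
        "scope": SCOPE[offer["scope"]] >= SCOPE[req["scope"]],
        "volume_output": offer["output_rate"] >= req["output_rate"],
        "delivery_rate": offer["delivery_rate"] >= req["delivery_rate"],
    }


def overall(rows: Dict[str, bool]) -> bool:
    """A single failed row fails the requirement."""
    return all(rows.values())


vendor_a = {"refresh": "daily", "storage": 500 * TB, "scope": "national",
            "output_rate": 1.389 * MB, "delivery_rate": 278 * KB}
req1 = {"refresh": "monthly", "storage": 500 * MB, "scope": "local",
        "output_rate": 2.5 * KB, "delivery_rate": 10 * KB}
# Geographic scope for Requirement 2 is taken as Global, as in the matrix above.
req2 = {"refresh": "hourly", "storage": 300 * TB, "scope": "global",
        "output_rate": 1.389 * MB, "delivery_rate": 139 * KB}

req1_rows = score(req1, vendor_a)
req2_rows = score(req2, vendor_a)
```

With these inputs, Requirement 1 passes every row, while Requirement 2 fails on refresh cycle and geographic context, which matches the overall scores shown in the matrices.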

                                                                                                                 34
Cross walk Matrix

                                     Vendor A   Vendor B   Vendor C
Requirement 1 (WWW pages)            Pass       Pass       Fail
Requirement 2 (Image generation)     Fail       Pass       Pass
Other Requirement X                  P or F     P or F     P or F
Other Requirement Y                  P or F     P or F     P or F
Other Requirement Z                  P or F     P or F     P or F

Total Score (1 fail disqualifies)    Fail       Pass       Fail

In this case, only Vendor B meets all requirements
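
The "1 fail disqualifies" rule can be expressed in a few lines. This sketch uses only the two scored requirements from the matrix; Requirements X, Y, and Z are placeholders in the briefing and are omitted. The function name is hypothetical.

```python
from typing import Dict, List

# Pass (True) / Fail (False) per vendor and requirement, from the matrix.
crosswalk: Dict[str, Dict[str, bool]] = {
    "Vendor A": {"Requirement 1": True, "Requirement 2": False},
    "Vendor B": {"Requirement 1": True, "Requirement 2": True},
    "Vendor C": {"Requirement 1": False, "Requirement 2": True},
}


def qualifying_vendors(matrix: Dict[str, Dict[str, bool]]) -> List[str]:
    """Keep only vendors with no failed requirement."""
    return [vendor for vendor, results in matrix.items() if all(results.values())]


qualified = qualifying_vendors(crosswalk)
```

A strict all-or-nothing rule is the simplest choice; weighting requirements by importance would be an alternative when no vendor passes everything.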



                                                          35
Summary
   Using facts and data, characterize your
    requirements
   Analyze vendor service offerings and
    opportunities
   Map requirements to vendor services
   Perform cost analysis
   Explore other options (½ full or ½ empty)
   Expect that not all needs can be fully met by
    vendors
   Analyze the cost benefit and tradeoffs

                                                    36
Example – USGS National Web




                              37

								