
Lecture 9 - Grid Monitoring

Grid Monitoring Architecture (GMA)
Introduction
       The diversity of Grid components and their large number of
        users render a Grid vulnerable to faults, failures and
        excessive loads
       Grid monitoring is a critical facet for providing a
        robust, reliable and efficient environment
       The goal is to measure and publish the state of all
        resources – software, hardware, and networks – at a
        particular point in time
       Monitoring data can be used for
           Fault detection
           Recovery
           Performance forecast

    2                                                Grid Computing
Why do we need monitoring?
       Debugging purposes
       Resource Utilization
       Performance Evaluation
       Security
       Management Decisions
       Accounting




The Challenges of Grid Monitoring
 No single point of observation
 No central point of monitoring information
 Diverse Hardware and Software Systems
 Different policies and decision making mechanisms
 Network monitoring is very important
 Larger monitoring data sets
 Security




Characteristics for Grid Monitoring
 Scalable
 Dynamic
 Robust
 Flexible
 Should be integrated with other Grid Technologies
     and middleware (security infrastructure, resource
     brokers, schedulers, ...)




Grid Monitoring Architecture (GMA)
       The GMA consists of three types of components
           Directory Service: supports information publication and discovery
           Producer: makes performance data available (performance event
            source)
           Consumer: receives performance data (performance event sink)


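       [Figure: GMA components. The Producer stores its location in the Directory Service and the Consumer looks that location up there; event subscription and data transfer take place directly between Consumer and Producer.]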


Consumer
       Consumer is a program that receives monitoring data
        (events) from one or more producers
       Different types of consumer
           The archiving consumer: aggregates and stores monitoring
            data for later retrieval/analysis.
           Real-time consumer: collects monitoring data in real time.
           Overview consumer: collects events from several sources
            and uses the combined information for decision making.
           Job monitoring consumers: can be used to trigger an action
            based on an event from a job.




Consumer
       Consumer steps
        1.     Locate events: Consumers search a schema repository for a new event type. The
               schema repository can be a part of the GMA Directory Service.
        2.     Locate producers: Consumers search the Directory Service to find a suitable
               producer.
        3.     Initiate a query: Consumers request event(s) from a producer, which are delivered
               as part of the reply.
        4.     Initiate a subscription: Consumers can subscribe to a producer for certain kinds of
               events they are interested in. The events are then delivered in a stream.
        5.     Initiate an unsubscribe: Consumers terminate a subscription to a producer.
        6.     Register: Consumers can add/remove/update one or more entries in the Directory
               Service that describe events that the consumer will accept from producers.
        7.     Accept query: Consumers can also accept a query request from a producer. The
               “query” will also contain the response.
        8.     Accept subscribe: Consumers accept a subscribe request from a producer. The
               producer will be notified automatically once there are requests from the consumers.
        9.     Accept unsubscribe: Consumers accept an unsubscribe request from a producer. If
               this succeeds, no more events will be accepted for this subscription.
            Consumers that initiate the flow of events should support steps 2-5
            Consumers that allow a producer to initiate the flow of events should support
             steps 6-9
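
The consumer-initiated steps (locate producer, query, subscribe, unsubscribe) can be sketched as follows. This is only an illustration under stated assumptions: the class names, method names, and the "cpu.load" event type are hypothetical and are not defined by the GMA; everything runs in one process, whereas real components would communicate over the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    event_type: str  # hypothetical event type name, e.g. "cpu.load"
    value: float


class Producer:
    """Hypothetical producer that answers queries and serves subscriptions."""

    def __init__(self, event_type: str) -> None:
        self.event_type = event_type
        self._latest = Event(event_type, 0.0)
        self._subscribers: Dict[int, Callable[[Event], None]] = {}
        self._next_id = 0

    def accept_query(self) -> Event:
        # Query/response: the event is delivered as part of the reply.
        return self._latest

    def accept_subscribe(self, callback: Callable[[Event], None]) -> int:
        # Returns a subscription id the consumer can use to unsubscribe.
        self._next_id += 1
        self._subscribers[self._next_id] = callback
        return self._next_id

    def accept_unsubscribe(self, sub_id: int) -> None:
        self._subscribers.pop(sub_id, None)

    def publish(self, value: float) -> None:
        # A new measurement is streamed to every current subscriber.
        self._latest = Event(self.event_type, value)
        for callback in list(self._subscribers.values()):
            callback(self._latest)


class DirectoryService:
    """Holds producer locations only; it never stores event data."""

    def __init__(self) -> None:
        self._producers: Dict[str, List[Producer]] = {}

    def register(self, producer: Producer) -> None:
        self._producers.setdefault(producer.event_type, []).append(producer)

    def search(self, event_type: str) -> List[Producer]:
        return list(self._producers.get(event_type, []))


directory = DirectoryService()
cpu = Producer("cpu.load")
directory.register(cpu)
cpu.publish(0.42)

# Step 2: locate a producer. Step 3: query it.
producer = directory.search("cpu.load")[0]
snapshot = producer.accept_query()

# Step 4: subscribe. Step 5: unsubscribe.
received: List[float] = []
sub_id = producer.accept_subscribe(lambda e: received.append(e.value))
cpu.publish(0.55)
producer.accept_unsubscribe(sub_id)
cpu.publish(0.90)  # not delivered: the subscription has ended

print(snapshot.value, received)  # prints: 0.42 [0.55]
```

Note that the directory is consulted only to find the producer; the events themselves travel directly between producer and consumer.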

Directory Service
       Directory Service provides information about
        producers or consumers.
       When producers and consumers publish their
        existence, they must provide event types they produce
        or consume.
       The publication information allows producers and
        consumers to discover the types of available events,
        the characteristics of that data, and sources or sinks of
        data.
       Directory Service is not responsible for the storage of
        event data; it holds only information about which event
        instances can be provided.
Directory Service
    Functions supported by the Directory Service
        Authorize a search: Establish the identity of a consumer
         that wants to undertake a search.
        Authorize a modification: Establish the identity of a
         consumer that wishes to modify entries.
        Add: Add a record to the directory.
        Update: Change the state of a record in the directory
        Remove: Remove a record from the directory.
        Search: Perform a search for a producer or consumer of a
         particular type, possibly with fixed values for some of the
         event elements.
    There can be more than one Directory Service.
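
The functions listed above can be sketched as a small in-memory class. This is a minimal sketch, not a GMA-defined API: the record layout (`role`, `event_type`, `host`), the identity strings, and the method names are assumptions made for illustration, and authorization is reduced to a membership check.

```python
from typing import Dict, Iterable, List


class DirectoryService:
    """Hypothetical in-memory directory of producer/consumer records."""

    def __init__(self, authorized: Iterable[str]) -> None:
        self._authorized = set(authorized)       # identities allowed to act
        self._records: Dict[str, Dict[str, str]] = {}

    def _authorize(self, identity: str) -> None:
        # Authorize a search or a modification by checking the caller's identity.
        if identity not in self._authorized:
            raise PermissionError(f"{identity} is not authorized")

    def add(self, identity: str, name: str, **attributes: str) -> None:
        self._authorize(identity)
        self._records[name] = dict(attributes)

    def update(self, identity: str, name: str, **changes: str) -> None:
        self._authorize(identity)
        self._records[name].update(changes)

    def remove(self, identity: str, name: str) -> None:
        self._authorize(identity)
        del self._records[name]

    def search(self, identity: str, **criteria: str) -> List[str]:
        # Returns the names of records whose attributes match every fixed value.
        self._authorize(identity)
        return sorted(
            name
            for name, record in self._records.items()
            if all(record.get(k) == v for k, v in criteria.items())
        )


ds = DirectoryService(authorized={"alice"})
ds.add("alice", "p1", role="producer", event_type="cpu.load", host="node1")
ds.add("alice", "p2", role="producer", event_type="cpu.load", host="node2")
ds.add("alice", "c1", role="consumer", event_type="cpu.load", host="node3")

producers = ds.search("alice", role="producer", event_type="cpu.load")
ds.update("alice", "p2", host="node9")
on_node9 = ds.search("alice", role="producer", host="node9")
ds.remove("alice", "p1")
remaining = ds.search("alice", role="producer")

print(producers, on_node9, remaining)  # prints: ['p1', 'p2'] ['p2'] ['p2']
```

The directory stores only descriptions of producers and consumers, never the event data itself.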

Grid Monitoring Architecture (GMA)
    Extended Grid Monitoring Architecture with multiple Directory Services


    [Figure: three Consumers connect to an Event Directory Service Gateway, which sits above three Event Directory Services; five Producers on the Grid Resources sit below the directory services.]

Producer
    A producer is a software component that sends
     monitoring data (events) to a consumer
    Producers can deliver events in a stream or as a single
     response per request.
    Producers can also provide access control to events, so
     that, depending on policy, different consumers see different
     measurement frequencies and ranges of performance detail.




Producer
    Producer steps
     1.   Locate event: Search the Event Directory Service for the description of an event.
     2.   Locate consumer: Search the Event Directory Service for a consumer.
     3.   Register: Add/remove/update one or more entries in the Event Directory Service describing
          events that the producer makes available to consumers.
     4.   Accept query: Accept a query request from a consumer. One or more event(s) are returned in the
          reply.
     5.   Accept subscribe: Accept a subscribe request from a consumer. Further details about the event
          stream are returned in the reply
     6.   Accept unsubscribe: Accept an unsubscribe request from the consumer. If this succeeds, no
          more events will be sent for this subscription.
     7.   Initiate query: Send a single set of event(s) to a consumer as part of a query “request”.
     8.   Initiate subscribe: Request to send events to consumers, which are delivered in a stream.
          Further details about the event stream are returned in the reply.
     9.   Initiate unsubscribe: Terminate a subscription to a consumer. If this succeeds, no more data will
          be sent for this subscription.
    Producers that wish to handle new event types dynamically should
     support step 1
    Producers that allow consumers to initiate the flow of events should
     support steps 2-6
    Producers that initiate the flow of events should support steps 7-9
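
The producer-initiated flow (locate consumer, initiate subscribe, send events, initiate unsubscribe) can be sketched as follows. All class and method names, and the "disk.free" event type, are hypothetical; the sketch runs in a single process and omits the reply details that a real subscription would carry.

```python
from typing import List, Set


class Consumer:
    """Hypothetical consumer that lets producers initiate the event flow."""

    def __init__(self, name: str, accepted_type: str) -> None:
        self.name = name
        self.accepted_type = accepted_type
        self.active: Set[str] = set()    # producers with an accepted subscription
        self.received: List[float] = []

    def accept_subscribe(self, producer_name: str) -> bool:
        self.active.add(producer_name)
        return True

    def accept_unsubscribe(self, producer_name: str) -> None:
        self.active.discard(producer_name)

    def deliver(self, producer_name: str, value: float) -> None:
        # Events are accepted only while the subscription is active.
        if producer_name in self.active:
            self.received.append(value)


class EventDirectory:
    """Hypothetical directory holding consumer registrations."""

    def __init__(self) -> None:
        self._consumers: List[Consumer] = []

    def register(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def locate_consumers(self, event_type: str) -> List[Consumer]:
        return [c for c in self._consumers if c.accepted_type == event_type]


class Producer:
    def __init__(self, name: str, event_type: str) -> None:
        self.name = name
        self.event_type = event_type
        self._sinks: List[Consumer] = []

    def initiate_subscribe(self, directory: EventDirectory) -> None:
        # Locate consumers, then ask each one to accept a stream.
        for consumer in directory.locate_consumers(self.event_type):
            if consumer.accept_subscribe(self.name):
                self._sinks.append(consumer)

    def send(self, value: float) -> None:
        for consumer in self._sinks:
            consumer.deliver(self.name, value)

    def initiate_unsubscribe(self) -> None:
        for consumer in self._sinks:
            consumer.accept_unsubscribe(self.name)
        self._sinks.clear()


directory = EventDirectory()
archiver = Consumer("archiver", "disk.free")
directory.register(archiver)

sensor = Producer("sensor-1", "disk.free")
sensor.initiate_subscribe(directory)
sensor.send(120.0)
sensor.send(118.5)
sensor.initiate_unsubscribe()
sensor.send(50.0)  # no longer delivered

print(archiver.received)  # prints: [120.0, 118.5]
```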

Producer
    Optional producer tasks
        Event caching allows consumers to request historical data
         from a particular sensor for prediction algorithms
        Event filtering can be applied so that data is sent only if a
         value crosses a certain threshold
            CPU utilization is > 50%
            1, 10, 60-minute average CPU usage.
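
Both optional tasks can be sketched in a few lines. This is an illustrative sketch only: the `CachingSensor` class, the one-sample-per-minute assumption, and the sample values are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class CachingSensor:
    """Hypothetical producer-side helper: caches samples and filters by threshold."""

    def __init__(self, threshold: float, cache_minutes: int = 60) -> None:
        self.threshold = threshold
        # Assumes one sample per minute, so the cache spans `cache_minutes`.
        self._cache: Deque[float] = deque(maxlen=cache_minutes)

    def record(self, cpu_percent: float) -> Optional[float]:
        """Cache the sample; return it only if it exceeds the threshold."""
        self._cache.append(cpu_percent)
        return cpu_percent if cpu_percent > self.threshold else None

    def average(self, minutes: int) -> Optional[float]:
        """Average of the most recent `minutes` cached samples."""
        recent = list(self._cache)[-minutes:]
        return sum(recent) / len(recent) if recent else None


sensor = CachingSensor(threshold=50.0)
samples = [20.0, 40.0, 60.0, 80.0]

# Event filtering: only values above 50% would be sent to consumers.
sent = [v for v in map(sensor.record, samples) if v is not None]

# Event caching: historical averages over 1, 10 and 60 minutes.
averages = {m: sensor.average(m) for m in (1, 10, 60)}

print(sent)      # prints: [60.0, 80.0]
print(averages)  # prints: {1: 80.0, 10: 50.0, 60: 50.0}
```

With only four cached samples, the 10-minute and 60-minute averages cover the same data and therefore coincide.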




Intermediary
    The compound producer/consumer is a single component that
     implements both producer and consumer interfaces
    Forward, broadcast, filter, or cache the performance events
    Lessen the load on producers of event data that is of
     interest to many consumers
    [Figure: a Consumer connects to the Producer Interface of Monitoring Service X, whose Consumer Interface connects to two Producers.]
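
How an intermediary lessens producer load can be sketched as follows. The names (`SensorProducer`, `Intermediary`, `refresh`) and the values are hypothetical; the sketch shows caching and filtering only, not forwarding or broadcasting.

```python
from typing import Dict, List, Tuple


class SensorProducer:
    """Hypothetical upstream producer that counts how often it is queried."""

    def __init__(self, name: str, value: float) -> None:
        self.name = name
        self.value = value
        self.queries = 0

    def accept_query(self) -> Tuple[str, float]:
        self.queries += 1
        return self.name, self.value


class Intermediary:
    """Consumer interface towards producers, producer interface towards consumers."""

    def __init__(self, producers: List[SensorProducer]) -> None:
        self._producers = producers
        self._cache: Dict[str, float] = {}

    def refresh(self) -> None:
        # Consumer side: one query per upstream producer.
        for producer in self._producers:
            name, value = producer.accept_query()
            self._cache[name] = value

    def accept_query(self, min_value: float = float("-inf")) -> Dict[str, float]:
        # Producer side: answered from the cache, with an optional filter.
        return {n: v for n, v in self._cache.items() if v >= min_value}


node_a = SensorProducer("node-a", 30.0)
node_b = SensorProducer("node-b", 75.0)
service = Intermediary([node_a, node_b])
service.refresh()

# One hundred consumer queries are served without touching the producers again.
replies = [service.accept_query() for _ in range(100)]
busy = service.accept_query(min_value=50.0)

print(node_a.queries, node_b.queries, busy)  # prints: 1 1 {'node-b': 75.0}
```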

Monitoring data
    Time-related data
        Time-stamped dynamic data – may be provided by a
         counter related to the sampling rate. Data includes
         performance events and status monitoring.
        Time-stamped asynchronous data – indicates when an event
         happens (alerts and checkpoints)
    Non-time-related data – includes static information such as
     OS type and version, hardware characteristics or the update
     time of monitoring information




Monitoring data
    Information flow data
            Direct producer-consumer flow does not need a central
             component. Three interactions are described by the GMA
             document:
             1.   Publish/subscribe
             2.   Query/response
             3.   Notification
            Indirect data distribution via a centralized repository. This
             is useful for static information.
            Following a workflow’s path. The data is tagged so that it
             can be associated with a particular part of the workflow.
             Monitoring information is produced and stored locally.


Monitoring data
    Monitoring categories
        Static monitoring
            system configuration and descriptions
        Dynamic monitoring
            network and system performance
        Workflow monitoring
            A variable amount of data is produced as the processing of a
             job/task takes place.
            Processing status information, error reporting, job tracking




Criteria for Grid monitoring Tools
    Scalable and can tolerate faults
    Cross-API monitoring: can deal with data collection from
     legacy and specialized software.
    Homogeneous data presentation: Data are clear and
     presented in standard ways for clients
    Information searching can be done in a timely manner
    Run-time extensibility: can support rapid transitions when
     resources join and leave during runtime
    Filtering/fusing of data that comes from multiple streams
    Open and standard protocols
    Support standard security features
    Tools can be installed on demand, independent of other
     components
An overview of grid monitoring systems :
Autopilot
    Autopilot’s infrastructure is based on the GMA and uses the
     Globus Toolkit to perform wide-area communication between
     its components
    Sensor = GMA producer
    Actuator = GMA producer + mechanisms for steering remote
     applications and controlling sensor operation
    The AM (Autopilot Manager) performs the GMA registry role
    An Autopilot client corresponds to a GMA consumer, which
     locates sensors and actuators by searching the AM for
     registered keywords.
    APD (Autopilot Performance Daemon) retrieves and records
     system performance information from remote hosts
    [Figure: Autopilot components: Sensor, Classification, Decision Procedure, Resource Policy, Actuator and Application.]

An overview of grid monitoring systems :
CODE (Control and Observation in Distributed Environment)




    Sensors are installed on monitored hosts and gather monitoring
     data
    The SM (Sensor Manager) receives query requests and
     subscriptions from the Observer
    The Observer encapsulates the SM and sensor mechanisms on a
     monitored host and provides a Producer Interface (PI)
    The PI supports both query-response and subscription-based requests
    The Controller resides on a monitored host and provides
     mechanisms that allow consumers to execute actions on that
     host
    The Manager (consumer) connects to an observer to query for
     data, to subscribe, and to modify the subscriptions
    The Registry stores the locations of Observers and Controllers


An overview of grid monitoring systems :
GridRM
    GridRM has a hierarchical architecture that provides a homogeneous view of
     heterogeneous resources
    A Naming Schema (NS) defines the semantics by which resources are defined
    A Driver is a modular plug-in that is used to retrieve selected information from
     native monitoring agents
    The Local Layer accesses real-time and historical information from local resources
    The Global Layer provides inter-grid site or VO interaction between GridRM
     gateways
    Requests are received in an SQL form and passed to the Local Layer for
     processing
    Consumers interact with gateways at the Global Layer
    The Local Layer can perform caching
    [Figure: a GridRM Client and a GridRM Gateway at the local site, a GridRM Gateway at the remote site, and the GMA Directory.]

An overview of grid monitoring systems :
MDS4
    Information service for the Globus Toolkit 4, based on
     OGSA
    Scalable, uniform and efficient access to distributed
     information sources to support the discovery, selection
     and optimization of resources in a Globus environment
    Components of MDS4 are represented as information
     services; each instance has associated Service Data (SD)
     that reveals resource information
    Resource heterogeneity can be masked through
     standardized reporting of static and dynamic resource
     information.

MDS4
    MDS4 has a decentralized structure.
    MDS4 can handle both static and dynamic data
    It uses GSI to restrict access
    The Resource Layer consists of one or more service instances that produce SD
    The Collective Layer aggregates information from multiple “Resource Layer”
     services. The Index Service (IS) is an example of a Collective Layer service
    Clients, e.g. user applications, interact with the IS or resource-level services
     directly using subscription and query requests

    [Figure: two Clients, the MDS4 Index Service, and Resources A, B and C, each resource exposing several SDEs.]

MDS4
    Index – a resource/service registry that aggregates information from
     multiple ‘resource layer’ services. The Index service:
        Supports accessing, aggregating, generating, and querying SD from
         remote services.
        Provides service lookup mechanisms
        Provides caching
    Trigger – event-driven data filter. A Trigger can perform an action when
     a condition is met.
        Ex. can send email when queue length on a compute resource goes over a
         threshold value
    WebMDS – creates a specialized and homogeneous view of Index data
    MDS4 supports query/response and subscription/notification
     protocols.
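
The Trigger idea (a condition plus an action) can be sketched as follows. This is not the MDS4 Trigger API: the `Trigger` class, the record fields, and the threshold of 100 are assumptions, and appending to a list stands in for sending an email.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class Trigger:
    """Hypothetical event-driven filter: runs an action when a condition holds."""

    def __init__(
        self,
        condition: Callable[[Record], bool],
        action: Callable[[Record], None],
    ) -> None:
        self.condition = condition
        self.action = action

    def evaluate(self, record: Record) -> bool:
        # Returns True if the action fired for this record.
        if self.condition(record):
            self.action(record)
            return True
        return False


notifications: List[str] = []  # stands in for outgoing email

queue_alarm = Trigger(
    condition=lambda r: r["queue_length"] > 100,  # threshold is an assumption
    action=lambda r: notifications.append(
        f"queue length {r['queue_length']} on {r['resource']}"
    ),
)

fired_low = queue_alarm.evaluate({"resource": "cluster-a", "queue_length": 42})
fired_high = queue_alarm.evaluate({"resource": "cluster-a", "queue_length": 150})

print(fired_low, fired_high, notifications)
# prints: False True ['queue length 150 on cluster-a']
```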


Information Provider
    GT4 information providers collect information from
     other systems and make it accessible to a typical grid
     monitoring system.
    Examples of information providers
        Ganglia http://ganglia.sourceforge.net
        Nagios http://www.nagios.org
        Netlogger http://www-didc.lbl.gov/NetLogger




Ganglia
    Ganglia is a distributed monitoring system for high-
     performance computing systems such as clusters and
     the Grid
    Based on a hierarchical design, multicast-based
     listen/announce protocol
    Ganglia uses
        XML for data representation
        XDR for data transport
        RRDtool for data storage and visualization
    PHP Web User Interface provides a view of the
     gathered information via real-time dynamic Web pages

Ganglia
    The Ganglia Monitoring Daemon (gmond) is a multi-threaded
     daemon running on each cluster node to be monitored
        Monitor changes in host state
        Multicast relevant changes
        Listen to the state of all other Ganglia nodes via a multicast channel
        Answer requests for an XML description of the cluster state
    The Ganglia Meta Daemons (gmetad) are used to provide a
     federated view by polling a collection of child data sources.
    Data sources of gmetad may be either gmond or gmetad

    [Figure: Ganglia hierarchy. A Client contacts a top-level gmetad, which sits above two further gmetad instances; below them, a gmond runs on each of four nodes.]
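
A consumer of the XML cluster description might extract one metric per host as sketched below. The sample document is hand-written and abbreviated, in the general shape of gmond output; real output carries more elements and attributes, and the host names and values here are made up. The sketch parses a string rather than connecting to a running daemon.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hand-written, abbreviated sample; hosts and values are illustrative only.
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="demo">
    <HOST NAME="node1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node2" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.5" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_by_host(xml_text: str, metric: str) -> Dict[str, float]:
    """Return {host name: metric value} for every host reporting `metric`."""
    root = ET.fromstring(xml_text)
    values: Dict[str, float] = {}
    for host in root.iter("HOST"):
        for entry in host.iter("METRIC"):
            if entry.get("NAME") == metric:
                values[host.get("NAME")] = float(entry.get("VAL"))
    return values


loads = metric_by_host(SAMPLE, "load_one")
print(loads)  # prints: {'node1': 0.25, 'node2': 1.5}
```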

Summary
    Monitoring is critical for providing a robust, high-
     performance Grid environment
    A basic monitoring system has the following components
        Producers (sensors) that generate monitoring data (events)
        Consumers that consume events
        One or more directory services for registration and discovery of
         sensors/events/consumers
    A monitoring system should have
        GMA compliance
        Caching capability
        Scalability
        Monitoring of network resources, host resources and jobs
        Resource performance forecasting
        Resource performance analysis
        Various presentation views for resource monitoring
        Directory service for event subscription and notification



				