Document Sample

                      R. B. Patel
           Department of Computer Engineering
           M. M. Engineering Mullana Ambala

8/9/2012                                        1

    Introduction of the Problem

    Problem Solutions
     • Agent Host failure detection and recovery
     • Agent failure detection and recovery
     • Communication Failure Detection and Recovery

    Reliability Evaluations
     • Using agent implementation

          Introduction of the problem
     Mobile agents are autonomous software programs that can
     roam the network at times and to places of their own choosing.
     We focus on the design of a 3-Layered Monitor System
     (3LMS) for mobile agent systems.
     The challenges are:
      • Guarantee service availability in the presence of
         Agent Host failures.
      • Guarantee service availability in the presence of
         agent failures.
      • Preserve data consistency in both agents and Agent
         Hosts.
      • Preserve the exactly-once property.
      • Guarantee that the agent can eventually finish its assigned
         task.
Platform for Mobile Agent Distribution
   and Execution (PMADE) Model
   [Figure: PMADE model. Components: User, Agent Submitter, and
   Agent Host (Manager Modules and Host Driver). Flows: Mobile
   Agent with Task; Mobile Agent’s Result.]

 Agent Host (AH) is responsible for accepting and executing
 incoming autonomous Java agents. After an agent completes its
 assigned task, the AH checks the status of the next Agent Host
 in the itinerary. If there are no more hosts to visit, the AH
 checks the status of the Agent Submitter (AS). If the AS is online,
 the AH sends the agent's result to the AS; otherwise it sends the
 result to the base host (BH), if the BH is active. If neither is
 reachable, the AH retries the AS/BH after some time.

 Agent Submitter (AS) submits the Mobile Agent on behalf
 of the user to the Agent Host.
 Details of PMADE can be seen in [PG01a, PG03a].
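
 The Agent Host rule above can be sketched as a small decision
 function. This is only an illustration, not PMADE code: the names
 Action and next_action are hypothetical, and the liveness of the
 next host in the itinerary is deliberately not modelled.

```python
from enum import Enum


class Action(Enum):
    FORWARD_TO_NEXT_HOST = "forward agent to the next Agent Host"
    SEND_TO_AS = "send result to the Agent Submitter"
    SEND_TO_BH = "send result to the base host"
    RETRY_LATER = "retry AS/BH after some time"


def next_action(remaining_itinerary, as_online, bh_active):
    """Decide what an Agent Host does once the agent's local task is done.

    Assumption: a non-empty itinerary always means "forward"; whether the
    next host is actually up is outside this sketch.
    """
    if remaining_itinerary:
        return Action.FORWARD_TO_NEXT_HOST
    if as_online:
        return Action.SEND_TO_AS
    if bh_active:
        return Action.SEND_TO_BH
    return Action.RETRY_LATER
```

 For example, next_action([], False, True) selects the base host
 because the itinerary is exhausted and the AS is offline.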
                 Agent Submitter

• Receive the agent and check the database to verify that the
  received agent is in the list.
• If the agent is in the list, check the status of the Agent Host
  through the Task Manager and connect the AS to the Agent Host;
  otherwise send an appropriate reply to the user.
• If the connection succeeds, send the agent to the Agent
  Host; otherwise send an appropriate reply to the user.
• If the agent reaches the Agent Host successfully, receive an
  acknowledgment from the Agent Host.
• Receive the result and store it on the local disk for future
  reference by the client.
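
The submission steps above could look roughly like the following
sketch. The function submit_agent, its parameters and the reply
strings are assumptions made for illustration; the callables stand in
for the Task Manager status check, the connection attempt and the
transfer, none of which are specified on the slide.

```python
def submit_agent(agent_id, registered_agents, host_is_up, connect, send):
    """Walk through the Agent Submitter checks and return a reply string.

    host_is_up, connect and send are hypothetical callables:
    host_is_up() and connect() return bool, send(agent_id) returns True
    when the Agent Host acknowledges receipt of the agent.
    """
    if agent_id not in registered_agents:
        return "rejected: agent not in list"
    if not host_is_up():
        return "rejected: Agent Host is down"
    if not connect():
        return "rejected: connection failed"
    if send(agent_id):
        return "accepted: acknowledged by Agent Host"
    return "sent: no acknowledgment received"
```

Each early return corresponds to one "send appropriate reply to the
user" branch in the list.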
               Fault Tolerance Issues
    Host Failure: The host system on which an agent resides
    crashes or shuts down unexpectedly. Many agents on a host
    may be inactive but waiting, because the external events
    they depend on are unavailable. If more agents migrate to
    this host, it may run out of memory.
    Agent Failure: An agent travelling from one host to another
    never reaches its destination, either because it crashes on an
    uncaught exception that it throws itself or because it is
    terminated by a malicious host or agent.
    Communication Failure: An agent travelling from one host
    to another never reaches its destination because the destination
    host has failed, because it throws an uncaught exception during
    communication or execution, or because the operating system
    forces it to fail.
           Fault Tolerant System Design
 Existing fault tolerance schemes for mobile agent systems (MASs) are
  either centralized or distributed, depending on whether they are
  processed at a single site or at multiple sites.
• In the centralized control mechanism, a single process is set up
  on some node to monitor the whole system. The main limitation of
  this mechanism is scalability. It is also a single point of
  failure.
• In the distributed case, a coordinator is needed, which could be a
  separate process, or be embedded into a daemon process.
• Distributed control mechanisms are good in terms of scalability
  and adaptability. Their major weakness is complexity, which
  often leads to some very complicated implementations. Besides,
  they also suffer from a high volume of network traffic.
Fault Tolerant System Design (contd.)

• In order to maintain scalability and also avoid single points of
  failure, we have designed and implemented a new framework to
  handle fault tolerance in MASs, called 3-Layered Monitor
  System (3LMS).
• The model works at three layers, providing fault-tolerance at the
  local host level, between hosts on a single network and between
  different networks in a global system, respectively. Depending
  on where the fault has occurred in the system, the corresponding
  layer performs recovery.
• This optimizes the network traffic, reduces unnecessary
  communication delays and provides for fast recovery. Thus, we
  basically follow a centralized control mechanism, with peer-to-
  peer and local monitoring schemes.
              3-Layered Monitor System
   [Figure: 3-Layered Monitor System. Labels: CMP (Layer 3); on each
   Host, an MMP (Layer 2) and an LMP (Layer 1) beside the Agent Host;
   Mobile Agent; Agent Data.]
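
   The layer assignment can be written down as a lookup from where a
   fault occurs to the monitor that recovers it. This is a minimal
   sketch: the layer numbers and process names come from the figure,
   while FaultScope and recovering_layer are hypothetical names.

```python
from enum import Enum


class FaultScope(Enum):
    LOCAL_HOST = "inside one host"
    LOCAL_NETWORK = "between hosts on one network"
    GLOBAL = "between networks"


# Layer numbers and monitor names follow the 3LMS figure;
# the mapping structure itself is only illustrative.
RECOVERY_LAYER = {
    FaultScope.LOCAL_HOST: (1, "LMP"),
    FaultScope.LOCAL_NETWORK: (2, "MMP"),
    FaultScope.GLOBAL: (3, "CMP"),
}


def recovering_layer(scope):
    """Return (layer number, monitoring process) for a fault scope."""
    return RECOVERY_LAYER[scope]
```

   Keeping recovery at the lowest layer that can see the fault is what
   limits the network traffic in this design.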
       Central Monitoring Process (CMP)

• The router of a network is called the base host.
• The PMADE system installed on it is a base server, which
  behaves like a gateway and routes agents between networks.
• It is only responsible for receiving agents, not for executing
  them.
• It is an independent Agent Host, assumed to be failure-free.
• If an agent wants to migrate to another network, the base host
  first checks the status of the router of that network. If that
  router is active, it forwards the agent; otherwise it holds the
  agent itself.
• The central monitoring process is a proxy server called the Primary
  Daemon Server (PDS), installed on the router. It controls and
  monitors agents roaming in a global system of networks.
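
The forward-or-hold behaviour of the base host might be modelled as
below. The class BaseHost and its attributes are hypothetical, and
retry_held is an added assumption about how held agents are later
released; the slide only states that the agent is kept at the router.

```python
class BaseHost:
    """Toy base host: routes agents between networks, never executes them."""

    def __init__(self, name, active=True):
        self.name = name
        self.active = active
        self.inbox = []  # agents received from other networks
        self.held = []   # (agent, target) pairs waiting for a failed router

    def forward(self, agent, target):
        """Forward the agent if the target router is active, else hold it."""
        if target.active:
            target.inbox.append(agent)
            return True
        self.held.append((agent, target))
        return False

    def retry_held(self):
        """Assumed retry step: release agents whose target has come back."""
        still_held = []
        for agent, target in self.held:
            if target.active:
                target.inbox.append(agent)
            else:
                still_held.append((agent, target))
        self.held = still_held
```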
      Middle Monitoring Process (MMP)

• This is also a proxy server, called the Secondary Daemon
  Server (SDS), which runs on every host in the local
  network, outside the Agent Host.

• It monitors the status of agents migrating within the
  network and of the Agent Hosts running on its host, and helps
  to recover them when failures occur inside the network.

           Lower Monitoring Process (LMP)
• This is a stationary agent called the Daemon Agent (DA),
  which works on the instruction of either the agent owner or
  the PDS running at the router and the SDS running at the local
  host.
• It checks the status of the agents executing at an Agent Host,
  of the Agent Host itself and of the hosts.
• When an agent or Agent Host is found to be down, it
  restarts it on the instruction of the PDS and SDS.
• When the status of a host is not known because of faulty
  communication, it replaces that host with an active one.

  Host Failure Detection and Recovery

    Incorporate a failure detection program.
    When an Agent Host restarts, abort all
    uncommitted transactions in the Agent Host.
      This preserves data consistency.
    When the agent re-executes from its
    initial state:
      Already visited Agent Hosts will be visited again.
      This violates the exactly-once execution property.

  Host Failure Detection and Recovery
• We have a global daemon server (PDS)
  which monitors all the Agent Hosts.
• Single point of failure problem
   [Figure: a monitoring daemon server (PDS) watching the Agent Host
   pool.]

  Host Failure Detection and Recovery
    When the daemon server recovers an Agent Host:
     • It aborts all the uncommitted transactions performed by
       the agents that were lost.
     • This preserves data consistency in the Agent Host.

    This technique:
     • Is easy to implement.
     • Can be deployed on every existing mobile agent
       platform, without modifying the platform.
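
    A minimal sketch of the abort-on-recovery idea follows. The class
    AgentHostStore and its methods are hypothetical; a real platform
    would use its own transaction manager. The point shown is only
    that staged writes of lost agents are dropped while committed data
    is left untouched.

```python
class AgentHostStore:
    """Toy data store with staged (uncommitted) writes per transaction."""

    def __init__(self):
        self.committed = {}  # durable key -> value
        self.pending = {}    # txn_id -> {key: value} not yet committed

    def write(self, txn_id, key, value):
        self.pending.setdefault(txn_id, {})[key] = value

    def commit(self, txn_id):
        self.committed.update(self.pending.pop(txn_id, {}))

    def recover(self):
        """Called on restart: abort every uncommitted transaction."""
        aborted = sorted(self.pending)
        self.pending.clear()
        return aborted
```

    After recover(), the store contains exactly the effects of
    committed transactions, which is the consistency property claimed
    above.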
   Agent Failure Detection and Recovery

           When an Agent Host fails, its resident agents are lost.
            • At this level we aim to recover such lost agents, as well as
              agents terminated by malicious agents.
           By using checkpointing:
            • We checkpoint the agent's internal data.
            • We use the checkpointed data to recover lost agents.
            • Agent data consistency is preserved.
           Recovery of an agent happens on the failed Agent Host.
            • This preserves the exactly-once execution property.
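
           Checkpoint-based recovery can be illustrated with a small
           in-memory store. This is a sketch under assumptions:
           CheckpointStore is a hypothetical name, and the state is
           kept in memory here, whereas a real checkpoint would have
           to survive the host failure (for example on disk).

```python
import copy


class CheckpointStore:
    """Keeps the latest snapshot of each agent's internal data."""

    def __init__(self):
        self._snapshots = {}

    def save(self, agent_id, state):
        # Deep copy so later changes to the live agent do not leak in.
        self._snapshots[agent_id] = copy.deepcopy(state)

    def restore(self, agent_id):
        # Deep copy so the recovered agent cannot corrupt the snapshot.
        return copy.deepcopy(self._snapshots[agent_id])
```

           A recovered agent resumes from the restored state instead
           of its initial state, so hosts it has already visited are
           not visited again.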

           User Controlled (AFDR) contd.
• An agent owner directly launches an agent for execution on any
  host in a single (global) network. Thereafter, the agent travels in
  the network under the control of AHs.
• A time period is fixed in the agent’s itinerary to perform its
  assigned job and report back to its owner. The agent owner also
  fixes a time for each method, to perform its assigned task at the
  specified host.
• If the agent does not report back within this time period, the
  agent owner begins to broadcast a message to find its status.
  After broadcasting the message, it waits for a fixed interval of
  time (timeout period) for the return message.
• If the return message is not received, it keeps broadcasting the
  same message till it receives a message about the status of the
  agent.
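
The owner's broadcast-and-wait loop might be sketched as follows.
The name find_agent_status is hypothetical, broadcast is an assumed
callable that returns the status reply or None when the timeout
period elapses, and max_rounds is an added bound so the example
terminates; the scheme described above repeats without a limit.

```python
def find_agent_status(broadcast, max_rounds):
    """Repeat the status broadcast until a reply arrives.

    broadcast: hypothetical callable; returns a status string, or None
               if no return message arrived within the timeout period.
    Returns (reply, rounds_used); reply is None if every round timed out.
    """
    for round_no in range(1, max_rounds + 1):
        reply = broadcast()
        if reply is not None:
            return reply, round_no
    return None, max_rounds
```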
              User Controlled (AFDR) contd.




   [Figure: user controlled AFDR across successive hosts (Hi, Hi+1).
   Labels: checkpointed data, status flag, agent’s result, Daemon
   Agent.]
Base Host Controlled (AFDR)

• AS submits the agent to the BH of the network.
• The BH registers the agent and forwards it to the first active AH
  in the itinerary. When the agent completes its assigned task(s)
  on one network, but has tasks remaining in other networks, it is
  forwarded by the BH of the current network to the BH of the
  next network.
• If the agent is presently in network 1 and wants to migrate to
  network 2, whose router has failed, then it needs to wait at the
  router of network 1. If it has some task that can be performed on
  a third network, it is forwarded to that network.
• After completion of the task in this network, it retries network 2,
  etc. This process continues till the assigned task is over or no
  more hosts are active.
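
The skip-and-retry routing between networks can be sketched as a
loop over the networks that still have pending tasks. The function
visit_order and the callable router_is_up(network, step) are
hypothetical; the step argument only lets the example model a router
that comes back later.

```python
def visit_order(pending_networks, router_is_up):
    """Return (visited, unreached) under the skip-and-retry rule.

    At each step the agent goes to the first pending network whose
    router is active; networks with failed routers are retried after
    each completed visit. The loop stops when all tasks are done or no
    pending network is reachable.
    """
    pending = list(pending_networks)
    visited = []
    while pending:
        step = len(visited)
        target = next((n for n in pending if router_is_up(n, step)), None)
        if target is None:
            break  # no router of a pending network is active
        pending.remove(target)
        visited.append(target)
    return visited, pending
```

If the router of network 2 is down at first but up after one visit,
the agent handles network 3 first and then returns to network 2.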
  Base Host Controlled (AFDR) contd.


   [Figure: base host controlled AFDR. Labels: Base Host (router);
   Agent Hosts (AH) in Network 1, Network 2 and Network 3; status
   flag.]

User/Base Host Controlled (AFDR)
• The user controlled agent failure detection and recovery
  scheme has the disadvantage that the agent owner needs to
  keep in touch with its launched agent.
• Also, if three or more successive hosts in the itinerary fail,
  failure detection and recovery of roaming agents is not
  possible.
• This scheme is not suitable for slow or open networks like
  the Internet, because in such cases, time periods have to be
  allocated to agents to complete their itinerary.

                     Message Based (AFDR)
   [Figure: message based AFDR. Two Agent Hosts, each holding agent A
   and a daemon server (DSi-1, DSi), with a checkpoint store and a
   registration store on each host. The Base Host runs the PDS with a
   checkpoint store and a list of registered agents. Numbered steps:]
     1. Agent registered, R_reg
     2. Send message, M_reg^i
     3. After computation, checkpoint the agent
     4. Agent deregister message, M_dereg^i
     5. Send message, M_dereg^i

           Message Based (AFDR) contd.

   [Figure: agent at Agent Host i, between Agent Hosts i-1 and i+1,
   each with a registration store. The agent's message box holds
   "Arrive at i" and "Leave i"; checkpointing happens at Agent Host i.]
           Message Based (AFDR) contd.

   [Figure: agent at Agent Host i+1. The message box holds
   "Arrive at i+1" and "Leave i+1"; Agent Host i has recorded
   "Arrive at i" and "Leave i", and Agent Host i+1 records
   "Arrive at i+1" and "Leave i+1". Other labels: transfer charge,
   registration.]

  Failure and Recovery Scenarios
    We cover only stopping and termination
    failures.

    We handle most of the resulting cases:
     • The SDS fails to receive the “arrive at i” message.
     • The SDS fails to receive the “leave i” message.

           Missing arrive message
• The reason may be:
     1. The message is lost.
     2. The message arrives after the timeout period.
     3. The agent dies when it is ready to leave Agent Host i-1.
     4. The agent dies when it has just arrived at Agent Host i,
        without registering.
     5. The agent dies when it has just arrived at Agent Host i,
        after registering.
           Missing arrive message
• The 1st and 2nd cases are simple: the registration is found at
  Agent Host i.

   [Figure: Agent Hosts i-1 and i with their registration stores.
   Labels: message, "Arrive at i", found reg.]
           Missing arrive message
• For the 3rd and 4th cases, recovery takes
  place.

   [Figure: Agent Hosts i-1 and i with their registration stores.
   Label: no reg.]
           Missing arrive message
• The 5th case results in a missed detection:
     – the registration already appears in the Agent Host;
     – the consequence is that the “leave i” message never
       arrives.
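
The handling of a missing "arrive at i" message can be summarized in
a short sketch. The function on_missing_arrive and its return strings
are hypothetical; the only input it uses is whether a registration is
found at Agent Host i, which is what the previous slides use to
separate the cases.

```python
def on_missing_arrive(registration_found):
    """Reaction when "arrive at i" was not received within the timeout.

    Cases 1 and 2 (lost or late message) and case 5 (agent died after
    registering) all show a registration, so they cannot be told apart
    here; case 5 is therefore a missed detection at this point.
    Cases 3 and 4 show no registration and trigger recovery.
    """
    if registration_found:
        return "no recovery: agent assumed alive"
    return "recover agent from checkpointed data"
```

Because case 5 passes this check, it only surfaces later, when the
"leave i" message fails to arrive.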


           Missing leave message
• The reason may be:
     1. The message is lost.
     2. The message arrives after the timeout period.
     3. The agent dies when it has just sent the “arrive at i”
        message.
     4. The agent dies when it has just registered the
        “leave i” message.
           Missing leave message
• The 3rd case is the same as the previous
  missing detection case (missing arrive
  message case).
   [Figure: Agent Hosts i-1 and i with their registration stores.
   Labels: no reg., "Arrive at i", checkpoint data.]
           Missing leave message
• In this case, the recovery action is the same
  as in the missing arrive message case.

     • When the failure happens, the agent should be
       performing computation.
     • So, when the Agent Host recovers, the agent’s
       computation has been aborted.
           Missing leave message
• This again results in a missed detection.

     • This can be compensated for by the Agent Host.
     • The reason is that the witness will never receive the
       “arrive at i+1” message.

 Thread Based Agent Failure Detection
           and Recovery
• AS sends the agent to an AH.
• After registration, a thread object is created and execution starts.
  When the agent migrates from the AH after completion of task,
  its thread object is removed from the sending AH.
• The SDS running on the host keeps in touch with the AH and
  gets its status at regular intervals. On any update
  (arrival of a new agent, migration of an old agent, etc.) on an AH,
  it checks for status changes in the agents' thread objects,
  because a malicious agent generally attacks the thread. If a
  thread is suspended, the SDS resumes it. Similarly, if a thread is
  terminated, a new thread is created and started.
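
The SDS repair pass over the agents' thread objects might look like
the sketch below. It models thread status as plain data rather than
real threads; ThreadState and repair_threads are hypothetical names,
and a terminated thread is simply marked as running again to stand
for "a new thread is created and started".

```python
from enum import Enum


class ThreadState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


def repair_threads(threads):
    """Repair agent threads in place and report what was done.

    threads: dict mapping agent id -> ThreadState (a stand-in for the
             thread objects kept by the Agent Host).
    Returns a dict agent id -> "resumed" or "restarted".
    """
    actions = {}
    for agent_id, state in threads.items():
        if state is ThreadState.SUSPENDED:
            threads[agent_id] = ThreadState.RUNNING
            actions[agent_id] = "resumed"
        elif state is ThreadState.TERMINATED:
            # Stand-in for creating and starting a fresh thread.
            threads[agent_id] = ThreadState.RUNNING
            actions[agent_id] = "restarted"
    return actions
```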

   Communication Failure Detection and
               Recovery
• In the proposed model, we have provided an option that if a host
  fails, the agent is forwarded to the next host in the itinerary
  [PG03b]. Here we have assumed that any number of consecutive
  nodes can fail, but at least one node in the itinerary should be
  active to receive the agent.
• To avoid communication failures, PMADE permits a user to
  specify a list of alternative hosts in the agent’s itinerary, and the
  agent requests the executing host to transfer it to the first host in
  the list.
• If agent migration to the first host fails, an attempt is made to
  transfer it to the second host, and so on. If agent
  migration still fails, its result is transferred to its launcher, if it is
  active. Otherwise, it is sent to the persistence store.
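
The fallback order described above can be sketched as follows. The
function migrate_with_fallback, the callable try_transfer and the
outcome strings are assumptions for illustration, not part of PMADE.

```python
def migrate_with_fallback(alternative_hosts, try_transfer, launcher_active):
    """Try each alternative host in order, then fall back on the result path.

    try_transfer: hypothetical callable host -> bool (True on success).
    Returns (outcome, host); host is None when no migration succeeded.
    """
    for host in alternative_hosts:
        if try_transfer(host):
            return "migrated", host
    if launcher_active:
        return "result sent to launcher", None
    return "result sent to persistence store", None
```

Only when every listed host has been tried does the result leave the
itinerary and go to the launcher or to the persistence store.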