"Distributed Processing, ClientServer, and Clusters"
Distributed Processing, Client/Server, and Clusters
Chapter 13
Fred Kuhns

Distributed Message Passing
• Message passing is used to communicate among processes
• Send and receive messages as in a single system, or use remote procedure calls (RPC)

Basic Message-Passing Primitives

Reliability Versus Unreliability
• Reliable message-passing: guarantees delivery if possible
  – Not necessary to let the sending process know that the message was delivered
• Unreliable message-passing: sends the message out into the communication network without reporting success or failure
  – Reduces complexity and overhead

Blocking Versus Nonblocking
• Nonblocking
  – Process is not suspended
  – Efficient and flexible
  – Difficult to debug
• Blocking
  – Send does not return control to the sending process until
    • the message has been transmitted, OR
    • an acknowledgment is received
  – Receive does not return until a message has been placed in the allocated buffer

Remote Procedure Calls
• Allow programs on different machines to interact using simple procedure call/return semantics
• Widely accepted
• Standardized
  – Client and server modules can be moved among computers and operating systems easily

Client/Server Binding
• Binding specifies the relationship between a remote procedure and the calling program
• Nonpersistent binding
  – Logical connection is established at the time of the remote procedure call and dismantled when the call returns
• Persistent binding
  – Connection is sustained after the procedure returns

Synchronous versus Asynchronous
• Synchronous RPC
  – Behaves much like a subroutine call
• Asynchronous RPC
  – Does not block the caller
  – Enables a client to invoke a server repeatedly so that it has a number of requests in the pipeline at one time

Object-Oriented Mechanisms

Clusters
• Alternative to symmetric multiprocessing (SMP)
• Group of interconnected, whole computers working together as a unified computing resource
  – Illusion is one machine
  – Each system can run on its own

Benefits of Clusters
• Absolute scalability
  – Can have dozens of machines, each of which is a multiprocessor
• Incremental scalability
  – Add new systems in small increments
• High availability
  – Failure of one node does not mean loss of service
• Superior price/performance
  – A cluster can offer equal or greater computing power than a single large machine at a much lower cost

Clusters
• Separate server
  – Each computer is a separate server
  – No shared disks
  – Need management or scheduling software
  – Data must be constantly copied among systems so each is current

Clusters
• Shared nothing
  – Reduces communication overhead
  – Several servers connected to common disks
  – Disks partitioned into volumes
  – Each volume owned by a computer
  – If a computer fails, another computer gets ownership of the volume

Clusters
• Shared disk
  – Multiple computers share the same disks at the same time
  – Each computer has access to all of the volumes on all of the disks

Operating System Design Issues
• Failure management
  – Highly available cluster offers a high probability that all resources will be in service
    • No guarantee about the state of partially executed transactions if a failure occurs
  – Fault-tolerant cluster ensures that all resources are always available

Operating System Design Issues
• Load balancing
  – When a new computer is added to the cluster, the load-balancing facility should automatically include this computer in scheduling applications
• Parallelizing computation
  – Parallelizing compiler
  – Parallelized application
  – Parametric computing

Cluster Computer Architecture
• Cluster middleware services and functions
  – Single entry point
  – Single file hierarchy
  – Single control point
  – Single virtual networking
  – Single memory space
  – Single job-management system

Cluster Computer Architecture
• Cluster middleware services and functions
  – Single user interface
  – Single I/O space
  – Single process space
  – Checkpointing
  – Process migration

Clusters Compared to SMP
• SMP is easier to manage and configure
• SMP takes up less space and draws less power
• Clusters are better for incremental and absolute scalability
• Clusters are superior in terms of availability

Windows 2000 Cluster Service
• Cluster Service
  – Collection of software on each node that manages all cluster-specific activity
• Resource
  – Item managed by the cluster service
• Online
  – A resource is online at a node when it is providing service on that node
• Group
  – Collection of resources managed as a single unit

Sun Cluster
• Major components
  – Object and communication support
  – Process management
  – Networking
  – Global distributed file system

Beowulf and Linux Clusters
• Key features
  – Mass market commodity components
  – Dedicated processors (rather than scavenging cycles from idle workstations)
  – A dedicated, private network (LAN or WAN or Internet combination)
  – No custom components
  – Easy replication from multiple vendors

Beowulf and Linux Clusters
• Key features
  – Scalable I/O
  – A freely available software base
  – Using freely available distribution computing tools with minimal changes
  – Returning the design and improvements to the community

Distributed Process Management

Process Migration
• Transfer of a sufficient amount of the state of a process from one machine to another
• The process then executes on the target machine

Motivation
• Load sharing
  – Move processes from heavily loaded to lightly loaded systems
  – Load can be balanced to improve overall performance
• Communications performance
  – Processes that interact intensively can be moved to the same node to reduce communications cost
  – May be better to move the process to where the data reside when the data set is large

Motivation
• Availability
  – A long-running process may need to move because the machine it is running on will be down
• Utilizing special capabilities
  – A process can take advantage of unique hardware or software capabilities

Initiation of Migration
• Operating system
  – When the goal is load balancing
• Process
  – When the goal is to reach a particular resource

What is Migrated?
• Must destroy the process on the source system and create it on the target system
• Process control block and any links must be moved

What is Migrated?
• Eager (all): transfer the entire address space
  – No trace of the process is left behind
  – If the address space is large and the process does not need most of it, this approach may be unnecessarily expensive

What is Migrated?
• Precopy: the process continues to execute on the source node while the address space is copied
  – Pages modified on the source during the precopy operation have to be copied a second time
  – Reduces the time that a process is frozen and cannot execute during migration

What is Migrated?
• Eager (dirty): transfer only the portion of the address space that is in main memory and has been modified
  – Any additional blocks of the virtual address space are transferred on demand
  – The source machine is involved throughout the life of the process

What is Migrated?
• Copy-on-reference: pages are brought over only when referenced
  – Variation of eager (dirty)
  – Has the lowest initial cost of process migration

What is Migrated?
• Flushing: pages are cleared from main memory by flushing dirty pages to disk
  – Relieves the source of holding any pages of the migrated process in main memory

Negotiation of Migration
• Migration policy is the responsibility of the Starter utility
• The Starter utility is also responsible for long-term scheduling and memory allocation
• The decision to migrate must be reached jointly by two Starter processes (one on the source and one on the destination)

Eviction
• A system evicts a process that has been migrated to it
• If a workstation is idle, processes may have been migrated to it
  – Once the workstation is active, it may be necessary to evict the migrated processes to provide adequate response time

Distributed Global States
• The operating system cannot know the current state of all processes in the distributed system
• A process can know the current state only of the processes on the local system
• For remote processes, a process knows only the state information it receives in messages
  – These messages represent the state in the past

Example
• A bank account is distributed over two branches
• The total amount in the account is the sum of the amounts at each branch
• At 3 PM the account balance is determined
• Messages are sent to request the information

Example
• If, at the time of balance determination, the balance from branch A is in transit to branch B, the result is a false reading

Example
• All messages in transit must be examined at the time of observation
• The total consists of the balance at both branches plus the amount in the message

Example
• If the clocks at the two branches are not perfectly synchronized:
  – The amount is transferred from branch A at 3:01 (by A's clock)
  – The amount arrives at branch B at 2:59 (by B's clock)
  – At 3:00 the amount is counted twice

Some Terms
• Channel
  – Exists between two processes if they exchange messages
• State
  – Sequence of messages that have been sent and received along channels incident with the process

Some Terms
• Snapshot
  – Records the state of a process
• Global state
  – The combined state of all processes
• Distributed snapshot
  – A collection of snapshots, one for each process

Global State

Distributed Snapshot Algorithm

Mutual Exclusion Requirements
• Mutual exclusion must be enforced: only one process at a time is allowed in its critical section
• A process that fails in its noncritical section must do so without interfering with other processes
• It must not be possible for a process requiring access to a critical section to be delayed indefinitely: no deadlock or starvation

Mutual Exclusion Requirements
• When no process is in a critical section, any process that requests entry to its critical section must be permitted to enter without delay
• No assumptions are made about relative process speeds or number of processors
• A process remains inside its critical section for a finite time only

Centralized Algorithm: Mutual Exclusion
• One node is designated as the control node
• This node controls access to all shared objects
• If the control node fails, mutual exclusion breaks down

Distributed Algorithm
• All nodes have an equal amount of information, on average
• Each node has only a partial picture of the total system and must make decisions based on this information
• All nodes bear equal responsibility for the final decision

Distributed Algorithm
• All nodes expend equal effort, on average, in effecting a final decision
• Failure of a node, in general, does not result in a total system collapse
• There exists no system-wide common clock with which to regulate the time of events

Ordering of Events
• Events must be ordered to ensure mutual exclusion and avoid deadlock
• Clocks are not synchronized
• Communication delays
• State information for a process is not up to date

Ordering of Events
• Need to consistently say that one event occurs before another event
• Messages are sent when a process wants to enter its critical section and when it leaves its critical section
• Time-stamping
  – Orders events on a distributed system
  – System clock is not used

Time-Stamping
• Each system on the network maintains a counter which functions as a clock
• Each site has a numerical identifier
• When a message is received, the receiving system sets its counter to one more than the maximum of its current value and the incoming time-stamp (counter)

Time-Stamping
• If two messages have the same time-stamp, they are ordered by the numerical identifiers of their sites
• For this method to work, each message is sent from one process to all other processes
  – Ensures all sites have the same ordering of messages
  – For mutual exclusion and deadlock, all processes must be aware of the situation

Token-Passing Approach
• Pass a token among the participating processes
• The token is an entity that at any time is held by one process
• The process holding the token may enter its critical section without asking permission
• When a process leaves its critical section, it passes the token to another process

Deadlock in Resource Allocation
• Mutual exclusion
• Hold and wait
• No preemption
• Circular wait

Deadlock Prevention
• The circular-wait condition can be prevented by defining a linear ordering of resource types
• The hold-and-wait condition can be prevented by requiring that a process request all of its required resources at one time, and blocking the process until all requests can be granted simultaneously

Deadlock Avoidance
• Distributed deadlock avoidance is impractical
  – Every node must keep track of the global state of the system
  – The process of checking for a safe global state must be mutually exclusive
  – Checking for safe states involves considerable processing overhead for a distributed system with a large number of processes and resources

Distributed Deadlock Detection
• Each site knows only about its own resources
  – Deadlock may involve distributed resources
• Centralized control: one site is responsible for deadlock detection
• Hierarchical control: detection is done by the lowest node above the nodes involved in the deadlock
• Distributed control: all processes cooperate in the deadlock detection function

Deadlock in Message Communication
• Mutual waiting
  – Deadlock occurs in message communication when each of a group of processes is waiting for a message from another member of the group and there are no messages in transit

Deadlock in Message Communication
• Unavailability of message buffers
  – Well known in packet-switching data networks
  – Example: buffer space for A is filled with packets destined for B, and the reverse is true at B

Direct Store-and-Forward Deadlock

Deadlock in Message Communication
• Unavailability of message buffers
  – For each node, the queue to the adjacent node in one direction is full of packets destined for the next node beyond

Structured Buffer Pool
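
Time-Stamping: A Sketch
The time-stamping rule from the Ordering of Events slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full mutual exclusion protocol. The names `Site`, `send`, `receive`, `total_order`, and `demo` are hypothetical. Incrementing the counter before each send is also an assumption, borrowed from the usual logical-clock formulation; the slides state only the receive rule and the tie-break by site number.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One system on the network, with a counter acting as a logical clock."""
    site_id: int      # numerical identifier of the site
    counter: int = 0  # logical clock; the real system clock is never used

    def send(self):
        # Assumption (not stated in the slides): increment before sending.
        self.counter += 1
        return (self.counter, self.site_id)

    def receive(self, stamp):
        # Rule from the slides: one more than the maximum of the current
        # counter and the incoming time-stamp.
        incoming, _sender = stamp
        self.counter = 1 + max(self.counter, incoming)
        return self.counter


def total_order(stamps):
    # Order by time-stamp; equal time-stamps are ordered by site number.
    # Tuples (time_stamp, site_id) sort in exactly that way.
    return sorted(stamps)


def demo():
    s1, s2, s3 = Site(1), Site(2), Site(3)
    a = s1.send()   # (1, 1)
    x = s3.send()   # (1, 3): same time-stamp as a, different site
    s2.receive(a)   # site 2 counter: 1 + max(0, 1) = 2
    s2.receive(x)   # site 2 counter: 1 + max(2, 1) = 3
    b = s2.send()   # (4, 2)
    return total_order([b, x, a])


if __name__ == "__main__":
    print(demo())
```

Because each message is sent to all other processes, every site that sorts the same set of stamps this way arrives at the same ordering.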