Distributed Systems

					           Processes


http://net.pku.edu.cn/~course/cs501/2011

              Hongfei Yan
   School of EECS, Peking University
               3/16/2011
                       Contents
Chapter
• 01: Introduction
• 02: Architectures
• 03: Processes
• 04: Communication
• 05: Naming
• 06: Synchronization
• 07: Consistency & Replication
• 08: Fault Tolerance
• 09: Security
• 10: Distributed Object-Based Systems
• 11: Distributed File Systems
• 12: Distributed Web-Based Systems
• 13: Distributed Coordination-Based Systems
              03: Processes
•   3.1 Threads
•   3.2 Virtualization
•   3.3 Clients
•   3.4 Servers
•   3.5 Code Migration




             3.1 Threads
• 3.1.1 Introduction to Threads
• 3.1.2 Threads in Distributed Systems




                     Basic ideas
We build virtual processors in software, on top of physical
processors:
Processor: Provides a set of instructions along with the
capability of automatically executing a series of those
instructions.
Thread: A minimal software processor in whose context a
series of instructions can be executed. Saving a thread
context implies stopping the current execution and saving all
the data needed to continue the execution at a later stage.
Process: A software processor in whose context one or more
threads may be executed. Executing a thread means
executing a series of instructions in the context of that thread.

             Context Switching
• Processor context: The minimal collection of values
stored in the registers of a processor used for the execution
of a series of instructions (e.g., stack pointer, addressing
registers, program counter).
• Thread context: The minimal collection of values stored in
registers and memory, used for the execution of a series of
instructions (i.e., processor context, state).
• Process context: The minimal collection of values stored
in registers and memory, used for the execution of a thread
(i.e., thread context, but now also at least MMU register
values).


• Observation 1: Threads share the same
  address space. Do we need OS
  involvement?
• Observation 2: Process switching is
  generally more expensive. OS is involved.
  – e.g., trapping to the kernel.
• Observation 3: Creating and destroying
  threads is cheaper than doing it for a
  process.


  Context Switching in Large Apps
• A large app is commonly a set of cooperating processes,
  communicating via IPC, which is expensive.




       Threads and OS (1/2)
• Main Issue: Should an OS kernel provide
  threads or should they be implemented as
  part of a user-level package?
• User-space solution:
  – The kernel is not involved, so thread
    operations can be very efficient.
  – But everything done by a thread affects the
    whole process. So what happens when a
    thread blocks on a syscall?
  – Can we use multiple CPUs/cores?
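
As a rough illustration of the user-space approach, the sketch below multiplexes two "threads" inside one process without any kernel involvement. It assumes cooperative scheduling, with Python generators standing in for user-level threads; the names `worker` and `run_round_robin` are hypothetical and not from any thread package.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': yields to hand control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary context switch, no system call involved

def run_round_robin(threads):
    """Run generator-based threads in round-robin order until all finish."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the thread until its next yield
            ready.append(current)  # still runnable: back of the queue
        except StopIteration:
            pass                   # thread finished; drop it

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

If any `worker` made a blocking call instead of yielding, the scheduler and every other thread in the process would stall, and the single scheduler loop never uses more than one core. These are the two drawbacks raised in the bullets above.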

             Threads and OS (2/2)
• Kernel solution: The kernel implements
  threads. Every thread operation is a system call.
        – Operations that block a thread are no longer a
          problem: kernel schedules another.
        – External events are simple: the kernel (which
          catches all events) schedules the thread
          associated with the event.
        – Less efficient.
• Conclusion: Try to mix user-level and
  kernel-level threads into a single concept.
               Hybrid (Solaris)
• Use two levels. Multiplex user threads on top of LWPs
  (kernel threads).




• When a user-level thread does a syscall,
  the LWP executing it blocks. The thread
  remains bound to that LWP.
• The kernel can then schedule another LWP.
• Context switches between threads can occur
  entirely at the user level.
• When there are no threads to schedule, an
  LWP may be removed.
• Note
  – This concept has been virtually abandoned –
    it’s just either user-level or kernel-level
    threads.
 Threads and Distributed Systems
• Multithreaded clients: Main issue is hiding network
  latency.
• Multithreaded web client:
   – Browser scans HTML, and finds more files that need to be
     fetched.
   – Each file is fetched by a separate thread, each issuing an HTTP
     request.
   – As files come in, the browser displays them.
• Multiple request-response calls to other machines (RPC):
   – Client issues several calls, each one by a different thread.
   – Waits till all return.
   – If calls are to different servers, we may get a linear speedup.



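                 Fetching Images
• Suppose there are ten images in a page. How should
  they be fetched?
   – Sequentially
      • void fetch_sequential(void) {
            for (int i = 0; i < 10; i++) {
                int sockfd = ...;   /* connect to the server holding image i */
                const char *req = "GET ... HTTP/1.0\r\n\r\n";
                write(sockfd, req, strlen(req));
                /* helper: read at most 100 KB into jpeg[i] until the peer closes */
                ssize_t n = read_till_socket_closed(sockfd, jpeg[i], 100 * 1024);
                close(sockfd);
            }
        }
   – Concurrently
      • void fetch_concurrent(void) {
            int thread_ids[10];
            for (int i = 0; i < 10; i++) {
                /* helper: start a thread that fetches urls[i] into jpeg[i] */
                thread_ids[i] = start_read_thread(urls[i], jpeg[i]);
            }
            for (int i = 0; i < 10; i++) {
                wait_for_thread(thread_ids[i]);   /* helper: join the thread */
            }
        }
• Which is faster?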
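
One way to explore the question is a small simulation. The sketch below assumes each fetch costs a fixed delay (`LATENCY`, an invented value) and replaces the real HTTP exchange with `time.sleep`; `fetch`, `timed`, and the URL names are hypothetical stand-ins, not part of the slides.

```python
import threading
import time

LATENCY = 0.05  # assumed per-image network delay, in seconds

def fetch(url, results, index):
    """Stand-in for an HTTP fetch: waits, then stores fake image bytes."""
    time.sleep(LATENCY)
    results[index] = f"image data for {url}".encode()

def fetch_sequential(urls):
    results = [None] * len(urls)
    for i, url in enumerate(urls):
        fetch(url, results, i)
    return results

def fetch_concurrent(urls):
    results = [None] * len(urls)
    threads = [threading.Thread(target=fetch, args=(url, results, i))
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # wait until every fetch has returned
    return results

def timed(fn, urls):
    start = time.perf_counter()
    results = fn(urls)
    return results, time.perf_counter() - start

urls = [f"img{i}.jpg" for i in range(10)]
seq_results, seq_time = timed(fetch_sequential, urls)
con_results, con_time = timed(fetch_concurrent, urls)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Under this assumption the sequential version pays the latency ten times, while the concurrent version overlaps the waits. With a real network the gain also depends on bandwidth and on how many connections the server accepts.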

          A Finite State Machine
• A state machine consists of
   – state variables, which encode its state, and
   – commands, which transform its state.
   – Each command is implemented by a
     deterministic program;
       • execution of the command is atomic with respect
         to other commands and
       • modifies the state variables and/or produces some
         output.
[Schneider,1990] F. B. Schneider, "Implementing fault-tolerant services
using the state machine approach: a tutorial," ACM Comput. Surv., vol. 22,
pp. 299-319, 1990.
         A FSM implemented
• Using a single process that awaits
  messages containing requests and
  performs the actions they specify, as in a
  server.
   1.   Read any available input.
   2.   Process the input chunk.
   3.   Save state of that request.
   4.   Loop.
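
The four steps above can be sketched as a single-threaded handler that keeps explicit per-request state. This is a minimal sketch under stated assumptions: input events come from a list rather than from nonblocking sockets, a request ends at a newline, and the "service" merely upper-cases the request. `FsmServer` and its methods are hypothetical names.

```python
class FsmServer:
    """Single-threaded server that keeps explicit per-request state."""

    def __init__(self):
        self.partial = {}   # client id -> bytes received so far
        self.replies = []   # (client id, reply) pairs, in completion order

    def on_input(self, client, chunk):
        """Steps 1-3: take available input, process it, save the state."""
        data = self.partial.get(client, b"") + chunk
        while b"\n" in data:
            request, data = data.split(b"\n", 1)
            self.replies.append((client, request.upper()))
        self.partial[client] = data   # unfinished request waits for more input

    def run(self, events):
        """Step 4: loop over input events (here a list, not real sockets)."""
        for client, chunk in events:
            self.on_input(client, chunk)
        return self.replies

events = [(1, b"hel"), (2, b"get x\n"), (1, b"lo\n"), (2, b"pu"), (2, b"t y\n")]
print(FsmServer().run(events))
```

Client 2's first request completes while client 1's is still half received, so requests are interleaved without any thread ever blocking.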


       Threads and Distributed Systems:
            Multithreaded servers:
• Improve performance:
  – Starting a thread to handle an incoming request
    is much cheaper than starting a process.
  – Single-threaded server can’t take advantage of
    multiprocessor.
  – Hide network latency. Other work can be done
    while a request is coming in.
• Better structure:
  – Using simple blocking I/O calls is easier.
  – Multithreaded programs tend to be simpler.
• This is a controversial area.
            Multithreaded Servers




• A multithreaded server organized in a dispatcher/worker model.
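
A minimal sketch of the dispatcher/worker organization, assuming requests arrive as `(request_id, payload)` pairs in a list and that "handling" a request means doubling its payload. `dispatch`, `worker`, and the sentinel convention are illustrative choices, not taken from the slides.

```python
import queue
import threading

def worker(requests, results):
    """Worker thread: handle requests until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        request_id, payload = item
        results.put((request_id, payload * 2))   # placeholder for real work

def dispatch(incoming, num_workers=3):
    """Dispatcher: hand each incoming request to the shared worker queue."""
    requests, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in incoming:
        requests.put(item)
    for _ in workers:
        requests.put(None)      # one sentinel per worker
    for w in workers:
        w.join()
    return dict(results.get() for _ in range(len(incoming)))

replies = dispatch([(i, i + 10) for i in range(5)])
print(sorted(replies.items()))
```

Each worker can use plain blocking calls, which is what the "Threads" row of a server-model comparison means by parallelism with blocking system calls.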
           Multithreaded Servers

• Three ways to construct a server.



 Model                     Characteristics
 Threads                   Parallelism, blocking system calls
 Single-threaded process   No parallelism, blocking system calls
 Finite-state machine      Parallelism, nonblocking system calls



           3.2 Virtualization
• 3.2.1 The Role of Virtualization in
  Distributed Systems

• 3.2.2 Architectures of Virtual Machines




   Intuition About Virtualization
• Make something look like something else.
• Make it look like there is more than one of
  a particular thing.




                    Virtualization
• Observation: Virtualization is becoming
  increasingly important:
   – Hardware changes faster than software
   – Ease of portability and code migration
   – Isolation of failing or attacked components

 A virtual machine can support individual processes or a complete
 system depending on the abstraction level where virtualization
 occurs. Some VMs support flexible hardware usage and software
 isolation, while others translate from one instruction set to another.
 [Smith and Nair, 2005] J. E. Smith and R. Nair, "The Architecture of
 Virtual Machines," Computer, vol. 38, pp. 32-38, 2005.
  Abstraction and Virtualization
     applied to disk storage




• (a) Abstraction provides a simplified interface to
  underlying resources.
• (b) Virtualization provides a different interface or
  different resources at the same abstraction level.
                                  Virtualization
•    Originally developed by IBM.
•    Virtualization is increasingly important.
      –   Ease of portability.
      –   Isolation of failing or attacked components.
      –   Ease of running different configurations, versions, etc.
      –   Replicate whole web site to edge server.




    (a) General organization between a program, interface, and system.
    (b) General organization of virtualizing system A on top of system B.
Interfaces offered by computer systems
• Computer systems offer different levels of interfaces.
    –   Interface between hardware and software, non-privileged.
    –   Interface between hardware and software, privileged.
    –   System calls.
    –   Libraries.
• Virtualization can take place at very different levels, strongly
  depending on the interfaces as offered by various systems
  components:




                 Two kinds of VMs




•   Process VM: A program is compiled to intermediate (portable) code, which
    is then executed by a runtime system. (Example: Java VM)
•   VMM: A separate software layer mimics the instruction set of hardware: a
    complete OS and its apps can be supported. (Example: VMware, VirtualBox)
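
To make the process-VM idea concrete, here is a toy runtime that executes portable intermediate code. The instruction set (`PUSH`, `ADD`, `MUL`) is invented for this sketch and is far simpler than real JVM bytecode.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a small stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# Portable intermediate code for (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(program))  # prints 20
```

The same `program` list runs unchanged wherever `run` is available, which is the portability argument for process VMs. A VMM instead mimics the hardware instruction set itself, so an unmodified OS can run on top of it.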
                 3.3 Clients
• 3.3.1 Networked User Interfaces
• 3.3.2 Client-Side Software for Distributed
  Systems




    Networked User Interfaces
• Two approaches to building a client.
  – For every application, create a client part and
    a server part.
    • Client runs on local machine, such as a PDA.
  – Create a reusable GUI toolkit that runs on the
    client. GUI can be directly manipulated by the
    server-side application code.
    • This is a thin-client approach.



                    Thick client
• The protocol is application specific.
• For re-use, can be layered, but at the top, it is
  application-specific.




Thin-client




          X Window System
• The X Window System (commonly referred
  to as X or X11) is a network-transparent
  graphical windowing system based on a
  client/server model.
  – Primarily used on Unix and Unix-like systems
    such as Linux,
  – versions of X are also available for many other
    operating systems.
  – Although it was developed in 1984, X is still
    available and is in fact the standard
    environment for Unix windowing systems.
   The X Client/Server Model
• It's the server that runs on the local machine,
  providing its services to the display based on
  requests from client programs that may be
  running locally or remotely.
• It was specifically designed to work across a
  network.
   – The client and the server communicate via the X
     Protocol, a network protocol that can run locally
     or across a network.

         The X Window System




• Protocol tends to be heavyweight.
• Other examples of similar systems?
   – VNC
   – Remote desktop

     Scalability problems of X
• Too much bandwidth is needed.
  – By using compression techniques, bandwidth
    can be considerably reduced.
• There is a geographical scalability problem,
  as an application and the display generally
  need to synchronize too much.
  – By using caching techniques, the state of the
    display is effectively maintained at the
    application side, which reduces the need to
    synchronize.

     Compound documents
• User interface is application-aware =>
  inter-application communication
  – drag-and-drop: move objects across the
    screen to invoke interaction with other
    applications
  – in-place editing: integrate several applications
    at user-interface level (word processing +
    drawing facilities)



          Client-Side Software
• Often tailored for distribution transparency.
  – Access transparency: client-side stubs for
    RPCs.
  – Location/migration transparency: Let client-
    side software keep track of actual location.
  – Replication transparency: Multiple invocations
    handled by client-side stub.
  – Failure transparency: Can often be placed
    only at client.

• Transparent replication of a server using a
  client-side solution.
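
A client-side stub of this kind might look roughly as follows. The sketch assumes replicas are plain callables, that a crashed replica raises `ConnectionError`, and that the stub returns the most common reply; `ReplicatedStub`, `healthy`, and `crashed` are hypothetical names.

```python
from collections import Counter

class ReplicatedStub:
    """Client-side stub that hides replication behind a single call."""

    def __init__(self, replicas):
        self.replicas = replicas   # callables standing in for remote servers

    def invoke(self, request):
        """Send the request to every replica and return the most common reply."""
        replies = []
        for replica in self.replicas:
            try:
                replies.append(replica(request))
            except ConnectionError:
                continue           # a crashed replica is masked from the caller
        if not replies:
            raise ConnectionError("no replica answered")
        return Counter(replies).most_common(1)[0][0]

def healthy(request):
    return request.upper()

def crashed(request):
    raise ConnectionError("replica down")

stub = ReplicatedStub([healthy, crashed, healthy])
print(stub.invoke("read x"))
```

The application sees one `invoke` call and one reply, so neither the number of replicas nor the failure of one of them is visible to it.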




               3.4 Servers
• 3.4.1 General Design Issues
• 3.4.2 Server Clusters
• 3.4.3 Managing Server Clusters




Servers: General Organization
Basic model: A server is a process that waits for
incoming service requests at a specific transport address.
In practice, there is a one-to-one mapping between a
port and a service.

      Service    Port  Description
      ftp-data     20  File Transfer [Default Data]
      ftp          21  File Transfer [Control]
      telnet       23  Telnet
                   24  any private mail system
      smtp         25  Simple Mail Transfer
      login        49  Login Host Protocol
      sunrpc      111  SUN RPC (portmapper)
      courier     530  Xerox RPC
 Servers: General Organization
Types of servers
• Iterative vs. Concurrent: Iterative servers can
  handle only one client at a time, in contrast to
  concurrent servers.
• Superservers: Listen to multiple end points,
  then spawn the right server.




(a) Binding Using Registry. (b) Superserver




 Out-of-Band Communication
• Issue: Is it possible to interrupt a server once it has
  accepted (or is in the process of accepting) a service
  request?
• Solution 1: Use a separate port for urgent data (possibly
  per service request):
   – Server has a separate thread (or process) waiting for incoming
     urgent messages
   – When urgent message comes in, associated request is put on
     hold
   – Note: this requires that the OS support high-priority scheduling
     of specific threads or processes
• Solution 2: Use out-of-band communication facilities of
  the transport layer:
   – Example: TCP allows urgent messages to be sent in the same
     connection
   – Urgent messages can be caught using OS signaling techniques
                 Stateless Servers
• Never keep accurate information about the status of a
  client after having handled a request:
   – Don’t record whether a file has been opened (simply close it
     again after access)
   – Don’t promise to invalidate a client’s cache
   – Don’t keep track of your clients
• Consequences:
   – Clients and servers are completely independent
   – State inconsistencies due to client or server crashes are reduced
   – Possible loss of performance
       • because, e.g., a server cannot anticipate client behavior (think of prefetching
         file blocks)



                  Stateful Servers
• Keeps track of the status of its clients:
   – Record that a file has been opened, so that prefetching can be
     done
   – Knows which data a client has cached, and allows clients to
     keep local copies of shared data
• Observation: The performance of stateful servers can be
  extremely high, provided clients are allowed to keep
  local copies.
   – Session state vs. permanent state
   – As it turns out, reliability is not a major problem.
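
The contrast between the two designs can be sketched with a toy file service. The sketch assumes an in-memory dictionary `FILES` in place of a file system; the class and method names are hypothetical.

```python
FILES = {"a.txt": b"0123456789"}   # in-memory stand-in for a file system

class StatelessServer:
    """Every request is self-contained; nothing is remembered afterwards."""

    def read(self, path, offset, length):
        return FILES[path][offset:offset + length]

class StatefulServer:
    """Remembers open files and each client's current position."""

    def __init__(self):
        self.open_files = {}   # handle -> [path, offset]
        self.next_handle = 0

    def open(self, path):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [path, 0]
        return handle

    def read(self, handle, length):
        path, offset = self.open_files[handle]
        data = FILES[path][offset:offset + length]
        self.open_files[handle][1] = offset + len(data)   # state kept for next call
        return data

stateless = StatelessServer()
stateful = StatefulServer()
h = stateful.open("a.txt")
print(stateless.read("a.txt", 4, 3), stateful.read(h, 4), stateful.read(h, 3))
```

If the stateful server crashes, `open_files` is lost and clients have to recover; the stateless server has nothing to lose, but every request must carry the path and offset and the server cannot anticipate the next read.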




                        Cookies
• A small piece of data containing client-specific
  information that is of interest to the server
• Cookies and related things can serve two
  purposes:
   – They can be used to correlate the current client
     operation with a previous operation.
   – They can be used to store state.
      • For example, a cookie could record exactly what you were
        buying and which step of the checkout process you were in.
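
A minimal sketch of storing checkout state in cookies, using Python's standard `http.cookies` module. The cookie names `cart` and `step` and their values are made up for illustration.

```python
from http.cookies import SimpleCookie

def make_set_cookie_headers(cart_item, step):
    """Server side: encode checkout state into Set-Cookie header lines."""
    cookie = SimpleCookie()
    cookie["cart"] = cart_item
    cookie["step"] = step
    return cookie.output(header="Set-Cookie:")

def read_state(cookie_header):
    """Server side, next request: recover state from the Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}

print(make_set_cookie_headers("book-123", "payment"))
print(read_state("cart=book-123; step=payment"))
```

Because the client sends the state back with each request, the server itself can remain stateless between requests.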




                      Server Clusters
• Observation: Many server clusters are organized along three
  different tiers, to improve performance.
    – Typical organization below, into three tiers. 2 and 3 can be merged.
• Crucial element: The first tier is generally responsible for passing
  requests to an appropriate server.




             Request Handling
• Observation: Having the first tier handle all communication
  from/to the cluster may lead to a bottleneck.
• Solution: Various, but one popular one is TCP-handoff:




           Distributed Servers
• We can be even more distributed.
  – But over a wide area network, the situation is too
    dynamic to use TCP handoff.
  – Instead, use Mobile IP.
  – Are the servers really moving around?
• Mobile IP
  – A server has a home address (HoA), where it can
    always be contacted.
  – It leaves a care-of address (CoA), where it actually is.
  – Application still uses HoA.

Route optimization in a distributed server




     Managing Server Clusters
• Most common: manage each node the same way as a
  single computer.
  – Quite painful if you have 128 nodes.
• Next step: provide a single management
  framework that lets you monitor the whole
  cluster and distribute updates en masse.
  – Works for medium-sized clusters. What if you have
    5,000 nodes?
  – Need continuous repair, essentially autonomic
    computing.



        Example: PlanetLab
• Essence: Different organizations
  contribute machines, which they
  subsequently share for various
  experiments.
• Problem: We need to ensure that different
  distributed applications do not get into
  each other’s way => virtualization



                         PlanetLab
• Vserver: Independent and protected environment with its own
  libraries, server versions, and so on. Distributed applications are
  assigned a collection of vservers distributed across multiple
  machines (slice).




PlanetLab Management Issues
• Nodes belong to different organizations.
  – Each organization should be allowed to specify who
    may run applications on its nodes,
  – and to restrict resource usage appropriately.
• Available monitoring tools assume a very specific
  combination of hardware and software.
  – All are tailored for use within a single organization.
• Programs from different slices but running on the
  same node should not interfere with each other.



• Node manager
   – Separate vserver
   – Task: create other vservers and control resource allocation
   – No policy decisions
• Resource specification (rspec)
   – Specifies a time interval during which a specific resource is
     available.
   – Identified via a 128-bit ID, the resource capability (rcap).
       • Given rcap, node manager can look up rspec locally.
   – Resources bound to slices.
• Each slice is associated with a service provider.
   – A slice is identified by a (principal_id, slice_tag) pair:
     principal_id identifies the provider, and slice_tag is
     chosen by the provider.
• Slice creation service (SCS) runs on node, receives
  creation requests from some slice authority.
   – SCS contacts node manager. Node manager cannot be
     contacted directly. (Separation of mechanism from policy.)
• To create a slice, a service provider will contact a slice
  authority and ask it to create a slice.
• There are also management authorities that monitor nodes,
  make sure they run the right software, etc.
• Management relationships between PlanetLab entities:
  1. A node owner puts its node under the regime of a management
     authority, possibly restricting usage where appropriate.
  2. A management authority provides the necessary software to add
     a node to PlanetLab.
  3. A service provider registers itself with a management authority,
     trusting it to provide well-behaving nodes.
  4. A service provider contacts a slice authority to create a slice on a
     collection of nodes.
  5. The slice authority needs to authenticate the service provider.
  6. A node owner provides a slice creation service for a slice
     authority to create slices. It essentially delegates resource
     management to the slice authority.
  7. A management authority delegates the creation of slices to a
     slice authority.
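[Figure: management relationships between the node owner, the management
authority, the service provider, and the slice authority; the numbered
edges correspond to the items above.]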
         3.5 Code Migration
• 3.5.1 Approaches to Code Migration
• 3.5.2 Migration and Local Resources
• 3.5.3 Migration in Heterogeneous Systems




               Approaches
• Why code migration?
  – Moving processes from heavily loaded to lightly loaded machines.
  – Also, to minimize communication costs.
  – Moving code to data, rather than data to code.
  – Late binding for a protocol. (Download it.)




      Dynamic Client Configuration




The principle of dynamically configuring a client to communicate with a
server. The client first fetches the necessary software, and then
invokes the server.
A framework of Code Migration
A process consists of three segments
• Code segment: contains the actual code
• Resource segment: contains references to
  external resources needed by the process
  – E.g., files, printers, devices, other processes
• Execution segment: stores the current
  execution state of a process, consisting of
  private data, the stack, and the program
  counter
A Reference Model




Two Notions of Code Mobility
• Weak mobility: Only the code is moved, plus perhaps some
  initialization data; execution starts from the beginning
  after migration.
   – Examples: Java applets
• Strong mobility: Code and execution segment are
  moved
   – Migration: move the entire object from one machine to the other
   – Cloning: simply start a clone, and set it in the same execution
     state.
• Initiation
   – Sender-initiated migration
   – Receiver-initiated migration
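
Weak mobility can be illustrated with a few lines of Python. The sketch assumes the code segment travels as source text and that the receiver calls an agreed-upon entry point named `main`; both conventions, and the function `migrate_weak`, are hypothetical.

```python
def migrate_weak(code, init_data):
    """Receiver side: load shipped code and start it from the beginning."""
    namespace = {}
    exec(code, namespace)                 # install the code segment
    return namespace["main"](init_data)   # no execution state is carried over

# Sender side: only source text and initialization data are shipped.
shipped_code = """
def main(numbers):
    return sum(n * n for n in numbers)
"""
print(migrate_weak(shipped_code, [1, 2, 3]))  # prints 14
```

Strong mobility would additionally require capturing and restoring the execution segment (stack and program counter), which this sketch does not attempt. Running shipped code with `exec` is also unsafe unless the receiver protects itself against it.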


Alternatives for code migration




Migrating Local Resources (1/3)
• Problem: A process uses local resources that
  may or may not be available at the target site.
• Process-to-resource binding
  – Binding by identifier: the process refers to a resource
    by its identifier (e.g., a URL)
  – Binding by value: the object requires the value of a
    resource (e.g., a library)
  – Binding by type: the object requires that only a type of
    resource is available (e.g., local devices, such as
    monitors, printers, and so on)



Migrating Local Resources (2/3)
• Resource-to-machine binding
  – Unattached: the resource can easily be moved along
    with the object (small files, e.g. a cache)
  – Fastened: the resource (possibly a larger one) can, in
    principle, be migrated, but only at high cost
     • E.g., local databases and complete Web sites
  – Fixed: the resource cannot be migrated, such as local
    hardware (bound to the machine)
     • E.g., a local communication end point




Migrating Local Resources (3/3)
  Actions to be taken with respect to the references to local
  resources when migrating code to another machine.

  Process-to-resource     Resource-to-machine binding
  binding                 Unattached         Fastened          Fixed
  By identifier           MV (or GR)         GR (or MV)        GR
  By value                CP (or MV, GR)     GR (or CP)        GR
  By type                 RB (or GR, CP)     RB (or GR, CP)    RB (or GR)



     GR: Establish a global system-wide reference
     MV: Move the resource
     CP: Copy the value of the resource
     RB: Rebind the process to a locally available resource
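
The table can be read as a lookup from the two binding kinds to an ordered list of actions. The sketch below encodes it that way; the dictionary layout and the function name `preferred_action` are illustrative choices, and the entries are copied from the table as given in these slides.

```python
# Preferred action first, alternatives after it, as listed in the table.
ACTIONS = {
    ("identifier", "unattached"): ["MV", "GR"],
    ("identifier", "fastened"):   ["GR", "MV"],
    ("identifier", "fixed"):      ["GR"],
    ("value", "unattached"):      ["CP", "MV", "GR"],
    ("value", "fastened"):        ["GR", "CP"],
    ("value", "fixed"):           ["GR"],
    ("type", "unattached"):       ["RB", "GR", "CP"],
    ("type", "fastened"):         ["RB", "GR", "CP"],
    ("type", "fixed"):            ["RB", "GR"],
}

def preferred_action(process_binding, machine_binding):
    """Return the first-choice action for a given pair of bindings."""
    return ACTIONS[(process_binding, machine_binding)][0]

print(preferred_action("value", "unattached"))   # prints CP
print(preferred_action("identifier", "fixed"))   # prints GR
```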
 Migration in Heterogeneous Systems

• Main problem:
  – The target machine may not be suitable to execute
    the migrated code
  – The definition of process/thread/processor context is
    highly dependent on local hardware, operating
    system and runtime system
• Only solution: Make use of an abstract
  machine that is implemented on different
  platforms


            Current Solutions

• Interpreted languages running on a virtual
  machine
  – E.g., Java/JVM; scripting languages
• Virtual machine monitors, allowing
  migration of complete OS + apps.




 Live Migration of Virtual Machines

• Involves two major problems:
  – Migrating the entire memory image
    • Push phase
    • Stop-and-copy phase
    • Pull phase
  – Migrating bindings to local resources
    • If there is a single network, announce the new
      network-to-MAC address binding



• Three ways to handle memory migration
  (which can be combined)
 1. Pushing memory pages to the new
    machine and resending the ones that are
    later modified during the migration process.
 2. Stopping the current virtual machine,
    migrating memory, and starting the new
    virtual machine.
 3. Letting the new virtual machine pull in new
    pages as needed, that is, let processes
    start on the new virtual machine
    immediately and copy memory pages on
    demand.
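
How the first two approaches combine can be sketched as a simulation: push all pages, resend pages dirtied during each push round, then stop the VM and copy whatever is still dirty. Everything here is an assumption made for illustration: memory is a dictionary of pages, the guest's writes per round are given up front, and `live_migrate` is a hypothetical name. The pull phase (approach 3) is not modeled.

```python
def live_migrate(memory, writes_per_round, max_rounds=3):
    """Simulate pre-copy (push) followed by stop-and-copy.

    memory: dict page number -> contents on the source machine.
    writes_per_round: list of dicts, the pages the guest modifies
        while each push round is in flight.
    Returns (copy at target, pages sent while the VM was stopped).
    """
    target = {}
    to_send = set(memory)                 # round 0: push every page
    for round_no in range(max_rounds):
        for page in to_send:
            target[page] = memory[page]
        # The VM keeps running, so some pages are modified during the push.
        writes = writes_per_round[round_no] if round_no < len(writes_per_round) else {}
        memory.update(writes)
        to_send = set(writes)             # dirty pages must be resent
        if not to_send:
            break
    # Stop-and-copy: the VM is paused, so remaining dirty pages stay stable.
    for page in to_send:
        target[page] = memory[page]
    return target, len(to_send)

source = {0: "a", 1: "b", 2: "c", 3: "d"}
dirtied = [{1: "b1", 2: "c1"}, {2: "c2"}, {}]
copy, stopped_pages = live_migrate(source, dirtied)
print(copy == source, stopped_pages)
```

The fewer pages left for the stop-and-copy step, the shorter the downtime. A guest that dirties pages faster than they can be pushed never converges, which is why the number of push rounds is capped here.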


				