Distributed System Concepts and Architectures



•   Advantages and disadvantages of distributed OS
•   Goals
•   Transparency
•   Services
•   Architecture Models
•   Communication Network Protocols
•   Major Design Issues
•   Distributed Computing Environment (DCE)

          Distributed OS

• An integration of system services, presenting a transparent
  view of a multiple computer system with distributed
  resources and control
• A collection of independent computers that appear to the
  users of the system as a single computer
• Examples
   – Personal workstations + a pool of processors + single file system
   – Robots on the assembly line + Robots in the parts department
   – A large bank with hundreds of branch offices all over the world

         Advantages of Distributed Systems
         Over Centralized Systems
• Economics – microprocessors offer a better
  price/performance than mainframes
• Speed – a distributed system may have more total
  computing power than a mainframe
• Inherent distribution – some applications involve spatially
  separated machines
• Reliability – if one machine crashes, the system as a whole
  can still survive
• Incremental growth – computing power can be added in
  small increments

        Advantages of Distributed Systems
        Over Isolated Computers
• Data sharing – allow many users access to common data
• Device sharing – allow many users to share expensive
  peripherals like color printers
• Communication – make human-to-human communication
  easier, for example, by E-mail
• Flexibility – spread the workload over the available
  machines in the most cost effective way

         Disadvantages of Distributed Systems
• Software – complex software
• Networking – the network can saturate or cause other problems
• Security – easy access also applies to secret data

          Goals (I)

• Provide a high-performance and robust computing
  environment with least awareness of the management and
  control of distributed system resources
• Efficiency - difficult due to communication delays
   – Propagation delay – nothing can be done
   – Protocol overhead
      • Effective communication primitives, good protocols
   – Load distribution – bottlenecks or congestion in the network or software
      • Balance and overlap computation and communication
      • Distributed processing and load sharing

          Goals (II)

• Flexibility
   – User view: friendly system and freedom in using the system
      • Friendliness: user interface, consistency, reliability → use OO
      • Freedom:
           – No unreasonable restrictions in using systems
           – Easy to build additional tools or services
   – System view
      • Ability to evolve and migrate
      • Modularity, scalability, portability and interoperability
   – Difficult to achieve…
      • Heterogeneous HW/SW components

          Goals (III)
• Consistency - Lack of global information, replication and
  partitioning of data, component failures, complexity of
  interaction among components
   – User needs: uniformity in using the system and predictable system
     behavior
   – System needs: proper concurrency control mechanisms and failure
     handling and recovery procedures
• Robustness - problem with failures in communication links,
  processing nodes and client/server processes
   – System must reinitialize itself to a state where integrity is preserved,
     with only a small loss in performance
   – Handle exceptions and errors, changes to topology, long message
     delays, inability to locate server
   – Security: reliability, protection, and access control

• Transparency
   – Hide all irrelevant system-dependent details from users
   – Create an illusion of the model users are supposed to see
   – Trade-off between simplicity and effectiveness
• Objective
   – Provide a logical view of a physical system and at the same time
     reduce the effect and awareness of the physical system to a
     minimum

          Types of Transparency (I)

• Access: access local and remote system objects in the same way
   – Phone (local) vs. letter (remote)
• Location (name): No awareness of object location - use
  logical names
   – Area code for other cities
• Migration: object can be moved to different locations without
  changing names
   – Local numbers are changed if one moves to other cities
   – Need universal name (symbolic or numerical)
• Concurrency: sharing of objects without interference

           Types of Transparency (II)
• Relocation: a resource may be moved to another location when in use
• Replication: consistency of multiple instances of files and data
• Parallelism: permit parallel activities without users knowing how, where,
  and when these activities are carried out by the system
• Failure: fault tolerance, graceful performance degradation, minimum
  damages to the user
• Performance: consistent and predictable performance level even when the
  structure or load distribution changes
• Size: modularity and scalability; incremental growth in HW without user
  awareness
• Persistence: (software) resource may be in memory or on disk
• Revision: SW revisions not visible (vertical growth)

             Categorization of Transparency
             Based on System Goals
• Efficiency
    – Concurrency
    – Parallelism
    – Performance
• Flexibility
    – Access
    – Location
    – Relocation
    – Migration
    – Size
    – Revision
• Consistency
    – Access
    – Replication
    – Performance
    – Persistence
• Robustness
    – Failure
    – Replication
    – Size
    – Revision

            Distributed System Issues
            and Transparencies
Major Issues                                              Transparencies

Communication, distributed algorithms                     Interaction and control transparency
Process scheduling, deadlock handling, load balancing     Performance transparency
Resource scheduling, file sharing, concurrency control    Resource transparency
Failure handling                                          Failure transparency

          Services (I)

• Primitive services - most fundamental, in kernel
   – Must be implemented in the kernel of each node in the system
   – Communication – message passing (send/receive primitives)
       • Synchronous or asynchronous
   – Inter-node, inter-process Synchronization – synchronous
       • Synchronous semantics of communication or synchronization
   – Processor multiplexing -- Process server (for transparency reason)
       • Creation, deletion, tracking for memory and processing time

           Services (II)
• Services by System Servers – fundamental, but need not be in the kernel
   – Provide fundamental services for managing processes, files, and process
     communication
   – Can be implemented anywhere in the system, and still perform functions
     basic to the operation of a distributed system
   – Mapping logical names to physical addresses
       • Name server: locate processes, users, machines
       • Directory server: locate files, communication ports
   – Translate addresses and locations into communication paths: network server
   – Broadcast messages: broadcast or multicast servers
   – Clocks for synchronization - impossible to agree on global clock information
       • Time server: physical clocks and logical clocks (for event ordering)
   – File servers, print servers, migration server, authentication server
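
The logical clocks mentioned for the time server can be sketched as follows. This is a minimal illustration of a Lamport-style counter, one common way to order events without a shared physical clock; the class and method names are assumptions made for this example and do not come from the slides.

```python
class LamportClock:
    """Logical clock: a counter that orders events without a global physical clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                       # local event on node a
stamp = a.send()               # a sends a message carrying its timestamp
b.tick()                       # unrelated local event on node b
received = b.receive(stamp)    # b's clock jumps ahead of the send event
```

The receive event always gets a larger timestamp than the matching send event, so causally related events are ordered consistently on every node.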

          Services (III)

• Value-added Services - not essential in implementation of
  system but useful, higher-level or special purpose services
  (such as user applications)
   – Increase computational performance, enhance fault tolerance,
     cooperative activities
   – Example is Web server
   – Groups of interacting processes
       • Group server: membership (add/remove), admission policies,
   – Distributed conferencing server and concurrent editing server

          System Architecture Models

• System Architectures
   – Workstation-server model
       • Client workstations
           – Local processing capability and interface to the network
       • Server workstations
           – Dedicated for special services
   – Processor pool model - collect all processing power in one place,
     users use terminals only
       • Terminal: remote booting, remote file mounting, virtual terminal
         handling, packet assembling and disassembling (PAD)
       • File and processor allocation done by system
   – Integrated hybrid model

[Figure: Workstation-Server Model, showing a File Server and a Printer Server]

[Figure: Processor-Pool Model]

           Communication Network
           Architecture Models
• HW interconnection + inter-node inter-process communication protocols
• Hardware interconnection
   – Point-to-point links – direct connections between pairs of nodes
   – Multipoint links – allow connection of nodes into clusters
      • Common bus – time shared
            – IEEE 802 LAN Standard – Ethernet, Token Bus/Ring, FDDI…
      • Switch – space/time multiplexing at higher HW cost/complexity
            – Private switches for multiprocessor systems – cross-bar…
            – Public switches – ISDN, SMDS, ATM
• Ratio of propagation delay to transmission delay
   – LAN: small. Close components, more suitable for distributed processing
   – MAN/WAN: large. More communication oriented
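
The ratio above can be made concrete with a small calculation. This is a rough sketch: the link lengths, bit rate, frame size, and signal speed are illustrative assumptions, not figures from the slides.

```python
def delay_ratio(distance_m, bit_rate_bps, frame_bits, signal_speed_mps=2e8):
    """Ratio of propagation delay to transmission delay for one frame.

    signal_speed_mps defaults to roughly two thirds of the speed of light,
    an assumed typical value for a signal in cable.
    """
    propagation = distance_m / signal_speed_mps   # time for a bit to cross the link
    transmission = frame_bits / bit_rate_bps      # time to push the frame onto the link
    return propagation / transmission


# Assumed example: 1000-byte frames on a 10 Mbit/s link.
lan = delay_ratio(distance_m=100, bit_rate_bps=10e6, frame_bits=8000)
wan = delay_ratio(distance_m=3_000_000, bit_rate_bps=10e6, frame_bits=8000)
```

With these assumed numbers the LAN ratio is far below 1 and the WAN ratio is well above 1, which matches the small versus large distinction in the bullets.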




          Communication Network Protocols
• Communication Protocol: set of rules that regulate the
  exchange of messages to provide a reliable and orderly flow
  of information among communicating processes
• Connection-oriented communication service – Phone
   – Need explicit set up of a connection channel before communication
   – Messages are delivered reliably and in sequence
   – Virtual circuit (logical) or circuit switching (physical)
• Connectionless communication service – postal service
   – No initial connection establishment is necessary
   – Messages are delivered on a best-effort basis in timing and route
     and may arrive in arbitrary order
   – Datagram (logical) or packet switching (physical)

          OSI Protocol Suite

• Seven-layer protocol suite
• OSI focuses on interconnecting computers
• A process communicates with a remote process by passing
  data through the seven layers, then the physical network,
  and finally through the remote layers in reverse order
   – Segmenting/reassembling
   – Transparency between layers – encapsulation
      • Add header for protocol data unit (PDU) from upper layer
      • The remote corresponding layer strips off the header
• A gateway or intermediate node only stores and forwards
  messages at the three lower network dependent layers
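
Encapsulation between layers can be sketched as below. This is a simplified illustration: the bracketed tags are hypothetical stand-ins for real protocol headers, and the physical layer is left out because it transmits bits rather than adding a header.

```python
# Layers that add a header, listed from top to bottom.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]


def send_down(data: str) -> str:
    """Each layer prepends its own header to the PDU handed down from above."""
    pdu = data
    for layer in LAYERS:
        pdu = f"[{layer}]" + pdu
    return pdu


def receive_up(frame: str) -> str:
    """The peer layers strip the headers in reverse order, lowest layer first."""
    pdu = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not pdu.startswith(header):
            raise ValueError(f"missing {layer} header")
        pdu = pdu[len(header):]
    return pdu


frame = send_down("hello")
```

The outermost header belongs to the lowest layer, so each receiving layer sees only its own header and treats everything after it as opaque payload.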

        OSI Protocol Suite (Cont.)
                    Peer-to-Peer Protocols
Application                                         Application

Presentation                                        Presentation

  Session                                             Session

 Transport                                           Transport
                      Intermediate Node
  Network              Network Network                Network

 Data Link            Data Link Data Link            Data Link

  Physical             Physical Physical              Physical

       Communication Link             Communication Link

            OSI Protocol Suite (Cont.) --
            Physical Layer
• Specify the electrical and mechanical characteristics of the physical
  communication link – standardize
    – Coding method, modulation technique, wire/connector specification
    – Sharing of common bus needs interface standards for the medium access
      control in the data link layer
• Reliable mapping of signals to bits – need bit synchronization
• Bit synchronization
    – Detection of the beginning of a bit and a sequence of bits
    – Bit synchronous: large blocks of bits transmitted at a regular rate
       • Offer higher data transfer speed and better link utilization
    – Character asynchronous: small fixed-size bit sequences transmitted
       • Low-speed character-oriented terminals

            OSI Protocol Suite (Cont.) --
            Data Link Control (DLC) Layer
• Ensure reliable data transfer of groups of bits (frames)
• Configuration setup
    – Establishment and termination of a connection
    – Full- or half-duplex, synchronous or asynchronous connection?
• Error controls
    – Transmission errors and loss or replication of data frames
    – Detected by checksum or time-out mechanisms
    – Recovered by retransmissions or forward error corrections
• Sequencing
    – Maintain an orderly delivery of frames by sequence numbers
    – Sequence number can assist error control and flow control of data frames
• Flow control of data frames
    – Permit the transmission of a frame only if it falls into an allowed window of
      buffers for the sender and the receiver
• Multipoint configuration: DLC sublayer – MAC sublayer – Physical layer
    – Resolve the access contention of the multiple access channel
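
Window-based flow control can be sketched as follows. This is a minimal sender-side model under stated assumptions: sequence numbers grow without bound (real protocols wrap them modulo a fixed range), acknowledgements are cumulative, and the class name is invented for this example.

```python
class SlidingWindowSender:
    """Sender that may have at most window_size unacknowledged frames in flight."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # sequence number for the next new frame

    def can_send(self):
        """A frame may be sent only if its number falls inside the window."""
        return self.next_seq < self.base + self.window_size

    def send(self):
        """Return the sequence number used, or None if the window is full."""
        if not self.can_send():
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def ack(self, seq):
        """Cumulative acknowledgement: every frame up to seq has arrived."""
        if self.base <= seq < self.next_seq:
            self.base = seq + 1   # slide the window forward


sender = SlidingWindowSender(window_size=3)
sent = [sender.send() for _ in range(4)]   # the fourth attempt is blocked
sender.ack(0)                              # frame 0 acknowledged, window slides
after_ack = sender.send()                  # one more frame is now allowed
```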

          OSI Protocol Suite (Cont.) –
          Network Layer
• Address issues of sending packets across the network
  through several link segments
• Routing function
   – Which link should be selected for forwarding a packet, based on its
     destination address
   – Static or dynamic routing; centralized or distributed
   – Routing decision can be made at the time when a connection is
     requested and is being established (connection-oriented); or packet-
     by-packet basis (connectionless, multiple path routing)
• Error, sequencing, and flow control function
   – Reassemble packets and discard duplicate ones
   – Congestion control for favorable routing nodes
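
The routing function can be sketched as a next-hop table computed from link costs. This is one possible approach (centralized shortest-path routing with Dijkstra's algorithm), not a method named in the slides; the four-node topology and its costs are made up for the example.

```python
import heapq


def next_hop_table(graph, source):
    """Return {destination: neighbor to forward to} using Dijkstra's algorithm."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]          # (cost so far, node, first hop on the path)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                    # stale entry, a cheaper path was found
        if hop is not None:
            first_hop[node] = hop
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                # Paths leaving the source start at the neighbor itself.
                heapq.heappush(
                    heap, (nd, neighbor, neighbor if node == source else hop))
    return first_hop


# Hypothetical topology: node -> {neighbor: link cost}.
graph = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
table = next_hop_table(graph, "A")
```

Node A forwards every packet to B here, because even the direct A to C link costs more than the path through B. Recomputing the table when link costs change would turn this static table into a simple form of dynamic routing.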

          OSI Protocol Suite (Cont.) –
          Transport Layer
• The most important layer from the OS view
   – The only interface between the communication sub-network layers
     and network-independent layers
• Provide a reliable end-to-end communication between peers
   – All network-dependent faults or problems are to be shielded from the
     communicating processes
   – Message packets (breaking/reassembling)
   – Multiple sessions can be multiplexed on one transport connection
   – One session may occupy multiple transport connections
   – Five classes (TP0 to TP4) of transport services to support sessions
      • Depend on application and network quality
      • TP4: multiplexing, error detection, and retransmission
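
Breaking a message into packets and reassembling it at the receiver can be sketched as below. This is a simplified model: packets are plain (sequence number, payload) tuples, and the duplicate and the shuffling only simulate what an unreliable network might do.

```python
import random


def segment(message: bytes, max_payload: int):
    """Break a message into (sequence number, payload) packets."""
    return [(seq, message[i:i + max_payload])
            for seq, i in enumerate(range(0, len(message), max_payload))]


def reassemble(packets):
    """Drop duplicates, restore order by sequence number, and join the payloads."""
    unique = dict(packets)    # a repeated sequence number keeps a single copy
    return b"".join(unique[seq] for seq in sorted(unique))


message = b"distributed systems"
packets = segment(message, max_payload=4)
arrived = packets + [packets[1]]    # simulate a duplicated packet
random.shuffle(arrived)             # simulate out-of-order delivery
```

Calling `reassemble(arrived)` returns the original message whatever the arrival order, which is the shielding of network faults that the transport layer is meant to provide.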

          OSI Protocol Suite (Cont.) – Session,
          Presentation, Application Layers

• Session layer: add additional dialog and synchronization
  services to transport layer
   – Dialog: establishment of sessions
   – Synchronization: allow processes to insert checkpoints for efficient
     recovery from system crashes
• Presentation layer: data encryption, compression, and code
  conversion for messages that use different coding schemes
• Application layer: standard is completely left to the designer
  of the application

           TCP/IP Protocol Suite
• Address inter-process and inter-node communication
   – How is communication between a pair of processes maintained?
      • Transport Layer → TCP (TP4 in OSI)
      • Connection-oriented (TCP) or Connectionless (UDP)
   – How are messages routed through the network nodes?
      • Network Layer → IP (a little more than the OSI network layer)
      • Virtual circuit or datagram
• TCP/IP focuses on interconnecting networks
• (TCP, UDP) * (Virtual Circuit, Datagram IP)
   – Shift burden of maintaining reliable communication from network to OS
• Port and Socket (more in Chapter 4)
   – Port: inter-process communication endpoints
   – Socket: interface to port

         TCP/IP Protocol Suite (Cont.)

 Application            Peer to Peer Protocols    Application
 processes                                        processes
  Transport                                        Transport
    layer                                            layer
           packet             Gateway
   Internet                   Internet              Internet
     layer                      layer                 layer
Data link and              Data link and         Data link and
physical Layer             physical Layer        physical Layer

        Frame in bits

           Major Design Issues

• A distributed system consists of concurrent processes
  accessing distributed resources (which may be shared or
  replicated) through message passing in a network
  environment that may be unreliable and contain untrusted components
   –   How to model and identify objects
   –   How to coordinate the interaction among objects
   –   How to achieve communication among objects
   –   How to manage shared or replicated objects
   –   How to protect objects and system security
• How to support transparency

          Major Design Issues – Object
          Models and Naming Schemes
• Objects: processes, data files, memory, devices,
  processors, networks
• Assume all objects can be represented uniformly
   – An object is represented abstractly by the allowable operations
   – The physical details of the object are transparent to other objects
• To identify a server:
   – By name - map name to logical address
   – Physical or logical address - done by network service, port for logical
   – By service - needed by CAS
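
Identifying a server by name can be sketched as a lookup table from logical names to addresses. This is a toy, in-memory illustration; the class, the server name, and the addresses are all hypothetical.

```python
class NameServer:
    """Maps logical server names to (host, port) addresses."""

    def __init__(self):
        self._table = {}

    def register(self, name, host, port):
        """Bind a name to an address; re-registering a name rebinds it."""
        self._table[name] = (host, port)

    def lookup(self, name):
        """Resolve a name, or raise LookupError if it is not registered."""
        try:
            return self._table[name]
        except KeyError:
            raise LookupError(f"unknown server name: {name}") from None


names = NameServer()
names.register("file-server", "10.0.0.5", 2049)
before = names.lookup("file-server")

# The server moves to another machine; clients keep using the same name.
names.register("file-server", "10.0.0.9", 2049)
after = names.lookup("file-server")
```

Because clients only ever hold the name, the server can move without any client changing, which is how a naming scheme supports location and migration transparency.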

            Major Design Issues –
            Distributed Coordination
• Coordinate interacting concurrent processes to achieve synchronization
• Requirements
    – Barrier synchronization – a set of processes (or events) must reach a
      common synchronization point before they can continue
    – Condition coordination – a set of processes (or events) must wait for an
      asynchronous condition set by other processes to maintain some ordering
      of execution
    – Mutual exclusion - concurrent processes must have mutual exclusion when
      accessing a critical shared resource
• Need knowledge of state information about other processes
    – Through messages → inaccurate or incomplete (unreliable network)
    – Centralized coordinator (leader election) or distributed resolution
• Deadlock handling – detect and recover
• Assimilate partial global state information and use it for decision making
    – Exchange local knowledge among cooperating sites

           Major Design Issues (Cont.)
• IPC - Use high-level methods for transparency in communication
   – Message passing – low level and physical
   – Client/Server Model - system interactions through message exchanges:
   – RPC - request/reply like procedure call, built on top of client/server model
       • RPC assumes point to point, but need groups (multicast, broadcast)
• Distributed Resources - data processing capacity
   – Multiprocessor scheduling - static load distribution vs. dynamic load sharing
      • Process migration, real-time scheduling
   – Distributed file system and distributed shared memory
      • Sharing and replication of data

          Major Design Issues (Cont.)

• Fault tolerance and security
   – Failure - unintentional intrusion - redundancy alleviates it
   – Security violation - intentional intrusion - need secure communication
     processes, integrity of messages
   – Need to authenticate clients/servers, messages
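
Message authentication and integrity checking can be sketched with a keyed hash. This is one common technique (HMAC with a shared secret), not one named in the slides; the key and the message are made up, and distributing the key securely is outside the scope of the sketch.

```python
import hashlib
import hmac

# Assumed secret known only to the client and the server.
SHARED_KEY = b"demo-key-known-to-client-and-server"


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an authentication tag that only a holder of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


message = b"transfer 100 to account 42"
tag = sign(message)
```

A receiver that calls `verify` rejects any message that was altered in transit or was produced without the key, which covers both message integrity and sender authentication.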

          Distributed Computing
          Environment (DCE)
• Proposed by Open Software Foundation (OSF)
   – Develop and standardize an open Unix environment that is free from
     the influence of AT&T and Sun
• DCE: an integrated package of software and tools for
  developing distributed applications on an existing OS
• Hierarchically layered architecture

DCE Architecture

