					CS6253: Distributed Systems
                                        Lecture 1
                                      Fundamentals
                                         Joel Wein
                                       wein@poly.edu

Note: Portions of these slides – the text, the figures, or the slides themselves – may be drawn from
   the instructor’s material from the textbook “Distributed Systems: Concepts and Design” by
                                Coulouris, Dollimore and Kindberg.
              Course Information
   http://pdc-amd01.poly.edu/~wein/cs6253/
   Make sure that you are on the my.poly mailing list for the
    course.
               Lecture Overview
   Introduction to Distributed Systems Issues.
   An Example: DNS
   Some Fundamental Concepts
       Communications Patterns
       Distributed Objects
                Introduction
   In 2009, distributed systems are ubiquitous.
   A distributed system is one in which hardware
    and/or software components located at
    networked computers communicate and
    coordinate their activities only by passing
    messages.
      Consequences of Definition
   Concurrency
   No global clock
   Independent failures.
        Why Distributed Systems?
   Allows resource sharing in a more flexible and
    powerful way.
       Printer
       Database
       Web site
       Authorization server…
       On-demand computing
       …
Examples of distributed systems
   The internet
       Open architecture/protocols.
   The ATM banking network
   A company intranet
   Mobile Computing Networks
                           Services
   A distinct part of a computer system that
    manages a collection of related resources and
    presents their functionality to users and
    applications.
       Printing Service
       File services
       Name service
       Challenges in Distributed Systems
   Heterogeneity
   Openness
   Security
   Scalability
   Failure Handling
   Concurrency
   Achieving Transparency (*)
         Challenges I: Heterogeneity
   Heterogeneity:
       Networks
       Computer hardware
       Operating System
       Programming Language
   Middleware: software layer that provides a programming
    abstraction and masks the heterogeneity of the underlying systems.
       E.g. CORBA provides remote object invocation.
   Internet Protocols mask differences in underlying networks.
             Challenges II: Openness
   Openness: can the system be extended and
    reimplemented in various ways?
       Can new resource-sharing programs be added and made
        available for use by a variety of client programs?
            Key interfaces must be published, but this is only the starting point.
             Architectural vision must support openness as well.
            Internet Protocols use RFCs to publish interfaces.
             Challenges III: Security
   Security
       Confidentiality (encryption)
       Authentication
       Integrity
       Availability.
   The first three have reasonably good solutions.
   Open issues:
       Denial-of-service (DoS) attacks.
       Security of mobile code.
       A distributed system has a lot of components and is only as strong as its
        weakest link.
         Challenges IV: Scalability
   Distributed Systems can be a small intranet or the entire
    internet.
   Scalable: system remains effective when there is a significant
    increase in number of resources or users.
       Has the internet scaled well?
   Subchallenges:
       Cost of physical resources: should scale at MOST linearly with the
        number of users.
       Controlling performance impact: data lookup needs to scale at worst
        as O(log n) in the number of items.
       Preventing resources from running out (e.g. 32-bit IP addresses allow
        only 2^32, about 4.3 billion, addresses).
       Avoiding Performance Bottlenecks
       Dealing with peak/flash crowds in a cost-effective fashion
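To make the O(log n) target concrete, the sketch below counts how many probes binary search needs in a sorted table of one million keys. It only illustrates logarithmic growth on a single machine; it is not how any particular distributed service performs lookups, and `binary_search_probes` is a hypothetical helper.

```python
def binary_search_probes(sorted_keys, target):
    """Return (found, probes), where probes counts comparisons made against the table."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1          # discard the lower half
        else:
            hi = mid - 1          # discard the upper half
    return False, probes

keys = list(range(1_000_000))
found, probes = binary_search_probes(keys, 999_999)
print(found, probes)   # probes never exceeds floor(log2(1_000_000)) + 1 = 20
```

Doubling the table adds only one probe in the worst case, which is the kind of growth the scalability goal asks for.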
             Challenges V: Failures
   Lots of things can fail in a distributed system.
       Any component can die.
       A component can get overloaded, temporarily or for a long time.
       A component can get disconnected from the rest of the network.
                 Pipe can break
                 Pipe can get full
                 Peering, routing loops.
       How do you tell the difference between all of these?
Figure: Connectivity statistics of ~5000 distributed servers to ~10 “central” points
   Given the lack of clarity as to the status of another
    component, designing distributed protocols becomes very
    subtle.
   Certain failures can be detected
       Message corruption can be identified with a checksum.
       Sequence numbers may enable you to detect a lost packet.
   Other failures can just be tolerated.
   Others may be catastrophic or too expensive to fix.
   Failures can be innocent or malicious.
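A minimal sketch of the two detection techniques above, assuming a hypothetical packet layout (4-byte sequence number, 4-byte CRC-32 checksum of the payload, then the payload). CRC-32 from Python's zlib stands in for whatever checksum a real protocol uses.

```python
import zlib

def make_packet(seq: int, payload: bytes) -> bytes:
    # Hypothetical layout: sequence number, CRC-32 of the payload, payload.
    return seq.to_bytes(4, "big") + zlib.crc32(payload).to_bytes(4, "big") + payload

def check_packet(packet: bytes):
    """Return (seq, payload); raise ValueError if the checksum reveals corruption."""
    seq = int.from_bytes(packet[:4], "big")
    expected = int.from_bytes(packet[4:8], "big")
    payload = packet[8:]
    if zlib.crc32(payload) != expected:
        raise ValueError(f"packet {seq} is corrupted")
    return seq, payload

def missing_sequence_numbers(received):
    """Gaps between the lowest and highest sequence numbers seen suggest lost packets.
    A loss after the last packet received is invisible here; detecting it needs a timeout."""
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

damaged = bytearray(make_packet(3, b"hello"))
damaged[-1] ^= 0xFF                      # flip every bit of one payload byte
try:
    check_packet(bytes(damaged))
except ValueError as err:
    print(err)                           # packet 3 is corrupted
print(missing_sequence_numbers([0, 1, 2, 4, 5, 7]))  # [3, 6]
```

Note that neither check says *why* a packet went missing, which is exactly the ambiguity the failure slide describes.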
      Challenge VI: Concurrency
   Challenges in a uniprocessor environment now
    just multiply.
    Challenges VII: Lack of global information
   In any large distributed system that is required
    to be responsive, components must operate with local
    information only.
       Information that you received in the past may be
        out of date.
   Lack of a global clock is one instance of this.
                    The Goal…
   In the face of all of these challenges, the goal is …

                    TRANSPARENCY

        “The illusion that you have before you
         something as simple as a uniprocessor
                        system.”
                Transparency
   Definition: Concealment from the user and the
    application programmer of the separation of
    components in a distributed system, so that the
    system is perceived as a whole rather than as a
    collection of independent components.
           Forms of Transparency
   Access transparency:
       Local and remote resources accessed using identical
        operations
   Location transparency:
       Resources can be accessed without knowledge of their
        location
   Concurrency transparency
       Several processes can operate concurrently using shared
        resources without interference between them.
   Replication transparency
   Failure transparency
   Mobility transparency
       Can move resources and clients without affecting
        operations of users and programs
   Performance transparency
       System can adjust or be adjusted for performance
        as loads vary.
   Scaling transparency: can expand in scale
    without change to system structure or
    application algorithms.
          Some Example Problems
   Achieving Mutual Exclusion in a distributed fashion.
       Why is it hard?
   Allowing database transactions to be processed at distributed
    sites.
       Why is it hard?
   Storing data in a replicated fault-tolerant fashion.
       Why is it hard?
   Maintaining object freshness in a collection of distributed caches.
       Why is it hard?
   Replacing TV with video broadcast over the internet.
       Why is it hard?
    A First Example: The Internet Naming Service
   A very basic question: how do we name things and find
    things in a distributed system?
   How do we build a distributed system to accomplish this?
   Let’s evaluate all of this with respect to transparency.
        The role of names and name services
   Resources are accessed using an identifier or reference
       An identifier can be stored in variables and retrieved from tables
        quickly
       An identifier includes, or can be transformed to, an address for an object
            E.g. NFS file handle, CORBA remote object reference
       A name is a human-readable value (usually a string) that can be resolved
        to an identifier or address
            Internet domain name, file pathname, process number
            E.g. /etc/passwd, http://www.cdk3.net/
   For many purposes, names are preferable to identifiers
       because the binding of the named resource to a physical location is
        deferred and can be changed
            Early binding is evil
       because they are more meaningful to users
   Resource names are resolved by name services
       to give identifiers and other useful attributes
    Requirements for name spaces
   A name space is the collection of all valid names recognized by a
    particular service.
   Requirements:
       Allow simple but meaningful names to be used
       Potentially infinite number of names
       Structured
            to allow similar subnames without clashes
            to group related names
       Allow re-structuring of name trees
            for some types of change, old programs should continue to work
 Composed naming domains used to access a resource from a URL
Figure 9.1
   URL: http://www.cdk3.net:8888/WebExamples/earth.html
      -> DNS lookup ->
   Resource ID (IP number, port number, pathname): 138.37.88.61  8888  WebExamples/earth.html
      -> ARP lookup ->
   (Ethernet) network address: 2:60:8c:2:b0:5a
   The port number identifies a socket at the web server; the pathname identifies a file.
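The first step of Figure 9.1, splitting the URL into the name that DNS must resolve, the port number and the pathname, can be sketched with Python's standard `urllib.parse`. The DNS and ARP steps are not performed here; on a networked machine `socket.getaddrinfo(parts.hostname, parts.port)` would be one way to carry out the DNS step.

```python
from urllib.parse import urlsplit

url = "http://www.cdk3.net:8888/WebExamples/earth.html"
parts = urlsplit(url)

print(parts.scheme)    # http: tells the client which protocol to use
print(parts.hostname)  # www.cdk3.net: the name DNS must resolve to an IP number
print(parts.port)      # 8888: selects the server's socket
print(parts.path)      # /WebExamples/earth.html: interpreted by the web server
```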
                     Name Resolution
   Resolution is an iterative process whereby a name is
    repeatedly presented to naming contexts.
   A naming context either maps a given name onto a
    set of primitive attributes (such as those of a user)
    directly or it maps it onto a further naming context
    and a derived name to be presented to that context.
       E.g. /etc/passwd
            “etc” is presented to the context /, then “passwd” is presented to the context /etc
   DNS can’t store all the names in one database.
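A minimal sketch of resolution through naming contexts, assuming each context is modelled as a Python dict that maps a name component either to primitive attributes (a string here) or to a further naming context (a nested dict). The context contents are hypothetical.

```python
# Hypothetical naming contexts: a nested dict is a further context, a string stands
# for the primitive attributes the name finally maps to.
ROOT = {
    "etc": {
        "passwd": "attributes of /etc/passwd",
        "hosts": "attributes of /etc/hosts",
    },
}

def resolve(pathname, context=ROOT):
    """Present each component of the pathname to the current naming context in turn."""
    for component in (c for c in pathname.split("/") if c):
        if not isinstance(context, dict) or component not in context:
            raise LookupError(f"cannot resolve {component!r} in {pathname!r}")
        context = context[component]   # primitive attributes, or the next context
    return context

print(resolve("/etc/passwd"))   # attributes of /etc/passwd
```

In DNS the contexts live on different servers, which is why the next question is how a client navigates between them.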
                Iterative navigation
Figure 9.2
   A client iteratively contacts name servers NS1 (step 1), NS2 (step 2) and
    NS3 (step 3) in order to resolve a name.
Used in:
DNS: The client presents the entire name to servers, starting at a local
  server, NS1. If NS1 has the requested name, it resolves it; otherwise
  NS1 suggests contacting NS2 (a server for a domain that
  includes the requested name).
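A sketch of iterative navigation under simplifying assumptions: each server is a dict of address records plus referrals (domain suffix to server), and a server either answers or refers the client to a server for a more specific domain. The server contents and the address 192.0.2.10 (a documentation range) are hypothetical.

```python
# Hypothetical data for the three servers of Figure 9.2.
SERVERS = {
    "NS1": {"records": {}, "referrals": {"ac.uk": "NS2"}},
    "NS2": {"records": {}, "referrals": {"dcs.qmw.ac.uk": "NS3"}},
    "NS3": {"records": {"jeans-pc.dcs.qmw.ac.uk": "192.0.2.10"}, "referrals": {}},
}

def query(server, name):
    """One server's reply: ("answer", address), ("referral", server) or ("unknown", None)."""
    data = SERVERS[server]
    if name in data["records"]:
        return "answer", data["records"][name]
    matches = [d for d in data["referrals"] if name.endswith("." + d)]
    if matches:
        return "referral", data["referrals"][max(matches, key=len)]  # most specific domain
    return "unknown", None

def resolve_iteratively(name, first_server="NS1", max_steps=10):
    """The client itself contacts each suggested server in turn."""
    server, contacted = first_server, []
    for _ in range(max_steps):
        contacted.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, contacted
        if kind == "unknown":
            raise LookupError(f"{server} cannot resolve {name}")
        server = value   # follow the referral
    raise LookupError(f"too many referrals for {name}")

print(resolve_iteratively("jeans-pc.dcs.qmw.ac.uk"))
# ('192.0.2.10', ['NS1', 'NS2', 'NS3'])
```

The `max_steps` bound is a defensive choice so that a misconfigured referral loop cannot keep the client busy forever.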
 Non-recursive and recursive server-controlled navigation
Figure 9.3
   Non-recursive server-controlled: the client sends the name to NS1, which
    contacts NS2 and then NS3 itself and returns the result to the client.
   Recursive server-controlled: the client sends the name to NS1, which passes
    it to NS2, which in turn contacts NS3; the result is returned along the
    same chain to the client.
   A name server NS1 communicates with other name servers on behalf of a client.

DNS offers recursive navigation as an option, but iterative is the
  standard technique. Recursive navigation must be used in
  domains that limit client access to their DNS information for
  security reasons.
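For contrast, a sketch of recursive server-controlled navigation using the same kind of hypothetical server data: the client sends one request to NS1, and each server forwards the query to the next server itself, so the answer travels back along the chain of calls.

```python
# Hypothetical server data (same shape as an iterative example would use).
SERVERS = {
    "NS1": {"records": {}, "referrals": {"ac.uk": "NS2"}},
    "NS2": {"records": {}, "referrals": {"dcs.qmw.ac.uk": "NS3"}},
    "NS3": {"records": {"jeans-pc.dcs.qmw.ac.uk": "192.0.2.10"}, "referrals": {}},
}

def resolve_recursively(server, name, chain=None):
    """Resolve on behalf of the caller, forwarding to the next server when needed."""
    chain = [] if chain is None else chain
    chain.append(server)
    data = SERVERS[server]
    if name in data["records"]:
        return data["records"][name], chain
    matches = [d for d in data["referrals"] if name.endswith("." + d)]
    if not matches:
        raise LookupError(f"{server} cannot resolve {name}")
    next_server = data["referrals"][max(matches, key=len)]
    return resolve_recursively(next_server, name, chain)  # server-to-server query

# The client sends a single request, to NS1 only.
address, chain = resolve_recursively("NS1", "jeans-pc.dcs.qmw.ac.uk")
print(address, chain)
```

The client sees the same answer as with iterative navigation; what changes is who does the work and who needs to be able to reach NS2 and NS3.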
         DNS - The Internet Domain Name System
   A distributed naming database
   Name structure reflects administrative structure of the Internet
   Rapidly resolves domain names to IP addresses
        exploits caching heavily
        typical query time ~100 milliseconds
   Scales to millions of computers
        partitioned database
        caching
   Resilient to failure of a server
        replication
Basic DNS algorithm for name resolution (domain name -> IP number)
 • Look for the name in the local cache
 • Try a superior DNS server, which responds with:
    – another recommended DNS server
    – the IP address (which may not be entirely up to date)
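The basic algorithm above (local cache first, then a superior server) can be sketched as follows. Assumptions: `ask_superior` is a hypothetical callable that returns an IP address (chasing any referrals itself), every entry gets the same time to live, and the clock is injectable so the expiry behaviour can be demonstrated.

```python
import time

class CachingResolver:
    """Local cache first; on a miss or an expired entry, ask a superior server."""

    def __init__(self, ask_superior, ttl_seconds=300.0, clock=time.monotonic):
        self.ask_superior = ask_superior   # hypothetical: name -> IP address
        self.ttl = ttl_seconds             # assumed single TTL for every entry
        self.clock = clock
        self.cache = {}                    # name -> (IP address, expiry time)

    def lookup(self, name):
        entry = self.cache.get(name)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]                # cached: fast, but possibly out of date
        address = self.ask_superior(name)  # miss or expired: go to a superior server
        self.cache[name] = (address, self.clock() + self.ttl)
        return address

now = [0.0]
queries = []

def superior(name):
    queries.append(name)    # record how often the superior server is consulted
    return "192.0.2.7"      # hypothetical address from the documentation range

resolver = CachingResolver(superior, ttl_seconds=60, clock=lambda: now[0])
resolver.lookup("www.example.org")
resolver.lookup("www.example.org")   # answered from the cache
now[0] = 61.0                        # the TTL has passed
resolver.lookup("www.example.org")   # asks the superior server again
print(len(queries))                  # 2
```

Until the TTL expires, a changed address upstream is not seen here, which is the stale-data issue DNS accepts in exchange for speed.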
                             DNS name servers
     Figure 9.4
   a.root-servers.net (root): uk, purdue.edu, yahoo.com, ...
       ns1.nic.uk (uk): co.uk, ac.uk, ...
           ns0.ja.net (ac.uk): ic.ac.uk, qmw.ac.uk, ...
               alpha.qmw.ac.uk (qmw.ac.uk): dcs.qmw.ac.uk, *.qmw.ac.uk
                   dns0.dcs.qmw.ac.uk (dcs.qmw.ac.uk): *.dcs.qmw.ac.uk
               dns0-doc.ic.ac.uk (ic.ac.uk): *.ic.ac.uk
       ns.purdue.edu (purdue.edu): *.purdue.edu
Note: each name server is followed by its domain in parentheses and the
entries for which it holds name server records.
Authoritative path to look up jeans-pc.dcs.qmw.ac.uk: a.root-servers.net ->
ns1.nic.uk -> ns0.ja.net -> alpha.qmw.ac.uk -> dns0.dcs.qmw.ac.uk
           DNS in typical operation
Figure (without caching): the name server hierarchy of Figure 9.4, with the
host client.ic.ac.uk resolving jeans-pc.dcs.qmw.ac.uk. Numbered steps 1 to 4
carry the query “jeans-pc.dcs.qmw.ac.uk ?” and replies giving the IP addresses
of ns0.ja.net, alpha.qmw.ac.uk, dns0.dcs.qmw.ac.uk and finally
jeans-pc.dcs.qmw.ac.uk.
         DNS server functions and configuration
   Main function is to resolve domain names for computers, i.e. to
    get their IP addresses
       caches the results of previous searches until they pass their 'time to live'
   Other functions:
       get mail host for a domain
       reverse resolution - get domain name from IP address
       Host information - type of hardware and OS
       Well-known services - a list of well-known services offered by a host
       Other attributes can be included (optional)
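Reverse resolution is done through PTR records stored under the special domain in-addr.arpa, with the IPv4 address octets in reverse order. A small sketch of how that query name is formed; Python's standard `ipaddress` module computes the same name through its `reverse_pointer` attribute, which the last line uses as a cross-check.

```python
import ipaddress

def reverse_lookup_name(address: str) -> str:
    """Domain name whose PTR record maps an IPv4 address back to a host name."""
    ip = ipaddress.IPv4Address(address)          # rejects malformed addresses
    octets = str(ip).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

name = reverse_lookup_name("138.37.88.61")
print(name)   # 61.88.37.138.in-addr.arpa
assert name == ipaddress.ip_address("138.37.88.61").reverse_pointer
```

Reversing the octets puts the most general part of the address last, matching the left-to-right specificity of ordinary domain names.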
             DNS resource records
Figure 9.5
 Record type  Meaning                                  Main contents
 A            A computer address                       IP number
 NS           An authoritative name server             Domain name for server
 CNAME        The canonical name for an alias          Domain name for alias
 SOA          Marks the start of data for a zone       Parameters governing the zone
 WKS          A well-known service description         List of service names and protocols
 PTR          Domain name pointer (reverse lookups)    Domain name
 HINFO        Host information                         Machine architecture and operating system
 MX           Mail exchange                            List of <preference, host> pairs
 TXT          Text string                              Arbitrary text
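A sketch of resource records as plain data, showing how a mail host lookup uses MX records: lower preference values are tried first. The zone contents (example.org, 192.0.2.25) are hypothetical, and `ResourceRecord`, `lookup` and `mail_hosts` are illustrative names, not part of any DNS library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceRecord:
    name: str      # domain name the record belongs to
    rtype: str     # record type, e.g. "A", "NS", "MX"
    value: object  # main contents; their shape depends on the type

# Hypothetical zone data (example.org and 192.0.2.x are reserved for examples).
ZONE = [
    ResourceRecord("example.org", "MX", (20, "backup-mail.example.org")),
    ResourceRecord("example.org", "MX", (10, "mail.example.org")),
    ResourceRecord("mail.example.org", "A", "192.0.2.25"),
]

def lookup(zone, name, rtype):
    """All record contents of the given type for a name."""
    return [r.value for r in zone if r.name == name and r.rtype == rtype]

def mail_hosts(zone, domain):
    """Mail hosts for a domain, lowest preference value (most preferred) first."""
    return [host for _, host in sorted(lookup(zone, domain, "MX"))]

print(mail_hosts(ZONE, "example.org"))  # ['mail.example.org', 'backup-mail.example.org']
```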
                           DNS issues
   Name tables change infrequently, but when they do, caching
    can result in the delivery of stale data.
       Clients are responsible for detecting this and recovering
   Its design makes changes to the structure of the name space
    difficult. For example:
       merging previously separate domain trees under a new root
       moving subtrees to a different part of the structure (e.g. if Scotland
        became a separate country, its domains should all be moved to a new
        country-level domain).
   Let’s look at a live example.
   Let’s evaluate it with respect to transparency.
Some Basic Concepts
                  Send/Receive
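   One process sends a message (a sequence of bytes) to
    another process.
      Synchronous: sender and receiver synchronize at every message;
       send blocks until the corresponding receive is issued.
      Asynchronous: send returns as soon as the message has been
       queued; the sender continues in parallel with the transmission.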
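The two send semantics can be imitated within one Python process using a `queue.Queue` as the channel, an assumption made purely so the example is self-contained: asynchronous send just enqueues, while synchronous send also waits until the receiver has taken the message. Between real processes the same semantics would be provided by the messaging layer.

```python
import queue
import threading

def async_send(channel: queue.Queue, message: bytes) -> None:
    # Asynchronous send: queue the message and continue at once.
    channel.put(message)

def sync_send(channel: queue.Queue, message: bytes) -> None:
    # Synchronous send: block until the receiver has taken the message.
    channel.put(message)
    channel.join()   # returns once every queued item has been marked task_done()

def receive(channel: queue.Queue) -> bytes:
    # Blocking receive: wait for a message, then acknowledge it.
    message = channel.get()
    channel.task_done()
    return message

channel = queue.Queue()
received = []
receiver = threading.Thread(target=lambda: received.append(receive(channel)))
receiver.start()
sync_send(channel, b"hello")   # returns only after the receiver has the message
receiver.join()
print(received)                # [b'hello']
```

Because `Queue.join` waits for all outstanding items, mixing the two styles on one channel would make a synchronous send also wait for earlier asynchronous messages; a real implementation would track acknowledgements per message.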
              Message Passing
   Destinations
   Reliability
   Ordering
                       Sockets
   Socket abstraction provides an endpoint for
    communication between processes.
   For a socket to receive messages, it must be bound to a
    local port and to one of the Internet addresses of the
    computer on which it runs.
   Each socket is associated with a particular protocol –
    TCP or UDP.
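A minimal sketch with Python's standard `socket` module, using the loopback address 127.0.0.1 (an assumption so that both ends run on one machine; normally client and server are on different computers). A TCP socket would use `SOCK_STREAM` instead of `SOCK_DGRAM`.

```python
import socket

# The receiving socket is bound to a local port and one of this computer's
# Internet addresses; port 0 asks the OS for any free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # a UDP socket
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)                # do not block forever if the datagram is lost
address = receiver.getsockname()        # (Internet address, port) actually bound

# The sending socket is bound implicitly, to an OS-chosen port, by sendto.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", address)

data, from_addr = receiver.recvfrom(1024)
print(data, from_addr)                  # from_addr is the sender's (address, port)
sender.close()
receiver.close()
```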
                       Figure 4.2
                    Sockets and ports
   A client socket bound to any port at Internet address 138.37.94.248 sends a
    message to a server socket bound to an agreed port at Internet address
    138.37.88.249; the server computer has other ports as well.
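        External Data Representation and Marshalling
   Data structures must be flattened into a sequence of bytes for transmission.
   External data representation: an agreed standard for representing
    data structures and primitive values in messages
   Marshalling/unmarshalling: converting data to and from that representation
   Approaches:
       CORBA CDR (binary)
       Java Object Serialization
       Sun XDR
       XML-RPC (web services)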
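A sketch of marshalling a record with three fields (name, place, year) into bytes and back, using Python's standard `struct` module. The layout (big-endian 4-byte length before each UTF-8 string, big-endian 4-byte signed integer for the year) is an assumption in the spirit of binary formats like CDR and XDR, not a faithful implementation of either (both also pad strings to 4-byte boundaries, which this sketch omits).

```python
import struct

def marshal_person(name: str, place: str, year: int) -> bytes:
    """Flatten the record: length-prefixed UTF-8 strings, then a 4-byte signed int."""
    out = b""
    for text in (name, place):
        encoded = text.encode("utf-8")
        out += struct.pack(">I", len(encoded)) + encoded
    return out + struct.pack(">i", year)

def unmarshal_person(data: bytes):
    """Rebuild (name, place, year) from the flattened bytes."""
    offset = 0
    fields = []
    for _ in range(2):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        fields.append(data[offset:offset + length].decode("utf-8"))
        offset += length
    (year,) = struct.unpack_from(">i", data, offset)
    return fields[0], fields[1], year

flat = marshal_person("Smith", "London", 1934)
print(len(flat), unmarshal_person(flat))   # 23 ('Smith', 'London', 1934)
```

Fixing the byte order (">") is what makes the representation "external": sender and receiver agree on it regardless of their own hardware.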
         Communications Patterns
   Client Server
       Request-reply
            HTTP
       Other options
            Request
            Request-reply-acknowledge
         Figure 4.11
Request-reply communication
   Client: doOperation sends a request message to the server, then waits.
   Server: getRequest, select object, execute method, sendReply (reply message).
   Client: continuation once the reply message arrives.
          Implementing Request-Reply over UDP
   Operations:
       doOperation
       getRequest
       sendReply
   doOperation: uses a timeout to deal with failures.
            Could return immediately with a failure indication
            Or send repeatedly until it gets a response or is “sure” of failure
   Discarding duplicate requests:
       Protocol designed to recognize successive requests with the same identifier
        and filter duplicates.
   Lost Reply Messages
       Idempotency
       History
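A sketch of request-reply over UDP with the mechanisms above: `do_operation` (standing in for doOperation) retransmits after a timeout, and the server keeps a history of replies keyed by client and request identifier so a duplicate request gets the stored reply instead of being executed again. Assumptions: the message format `b"<id>:<args>"` is made up, both ends run on the loopback address, the history is never pruned, and `drop_first_reply` simulates one lost reply message.

```python
import socket
import threading

def serve(sock, operation, executed, drop_first_reply=False):
    """getRequest / execute / sendReply loop that filters duplicate requests."""
    history = {}     # (client address, request id) -> stored reply
    dropped = set()  # requests whose first reply is deliberately "lost"
    while True:
        data, client = sock.recvfrom(65535)            # getRequest
        if data == b"STOP":
            return
        request_id, _, args = data.partition(b":")
        key = (client, request_id)
        if key not in history:                         # new request: execute once
            executed.append(request_id)
            history[key] = request_id + b":" + operation(args)
        if drop_first_reply and key not in dropped:    # simulate a lost reply
            dropped.add(key)
            continue
        sock.sendto(history[key], client)              # sendReply, or resend from history

def do_operation(sock, server, request_id, args, timeout=0.5, retries=3):
    """Send a request and wait for the matching reply, retransmitting on timeout."""
    sock.settimeout(timeout)
    message = request_id + b":" + args
    for _ in range(retries + 1):
        sock.sendto(message, server)
        try:
            while True:
                data, _ = sock.recvfrom(65535)
                reply_id, _, result = data.partition(b":")
                if reply_id == request_id:             # ignore replies to older requests
                    return result
        except socket.timeout:
            continue                                   # no reply yet: send again
    raise TimeoutError(f"no reply to request {request_id!r}")

server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_sock.bind(("127.0.0.1", 0))
executed = []
server = threading.Thread(target=serve, args=(server_sock, bytes.upper, executed, True))
server.start()

client_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
result = do_operation(client_sock, server_sock.getsockname(), b"1", b"hello", timeout=0.2)
client_sock.sendto(b"STOP", server_sock.getsockname())
server.join()
client_sock.close()
server_sock.close()
print(result, executed)   # the operation ran once even though the request was sent twice
```

For an idempotent operation like `bytes.upper` the history is not strictly needed, since re-executing gives the same result; it matters for operations such as appending to a file or transferring money.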
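        Implementing Request-Reply over TCP
   A stream protocol has advantages over multi-packet protocols built on UDP:
    arguments and results of any size can be sent, and because TCP delivers
    data reliably, the request-reply protocol need not handle retransmission,
    duplicate filtering or histories.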
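One thing a stream does not provide is message boundaries, so request-reply over TCP needs some framing. A common choice, sketched here as an assumption rather than a standard, is a 4-byte length prefix. `socket.socketpair()` gives a connected pair of stream sockets that stands in for a TCP connection so the example runs without a server.

```python
import socket
import struct

def send_message(sock, payload: bytes) -> None:
    # A stream has no message boundaries, so prefix each message with its length.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exactly(sock, n: int) -> bytes:
    # recv may return fewer bytes than requested; keep reading until n have arrived.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        buf += chunk
    return buf

def recv_message(sock) -> bytes:
    (length,) = struct.unpack(">I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)

client, server = socket.socketpair()   # connected stream sockets
send_message(client, b"request 1")
send_message(client, b"request 2")     # both may arrive in a single recv() chunk
print(recv_message(server), recv_message(server))  # b'request 1' b'request 2'
```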
                              HTTP
   HTTP 1.0 (over TCP)
       Client requests and server accepts a connection at the
        default server port or at a port specified in the URL.
       Client sends a request message to the server
       Server sends a reply message to the client.
       Connection closed
   Establishing a new connection for each request-reply exchange is
    expensive.
       HTTP 1.1: persistent connections (pconns) that remain open over a
        series of request-reply exchanges between client and server.
       Can be closed by either side, or after a period of idleness.
             Figure 4.15
         HTTP request message
   method   URL or pathname                   HTTP version   headers   message body
   GET      //www.dcs.qmw.ac.uk/index.html    HTTP/1.1

             Figure 4.16
         HTTP reply message
   HTTP version   status code   reason   headers   message body
   HTTP/1.1       200           OK                 resource data
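A sketch of building a request in the layout of Figure 4.15 and splitting the first line of a reply in the layout of Figure 4.16. Assumptions: the request uses the pathname form plus a `Host` header (which HTTP/1.1 requires), the reply bytes are a made-up sample, and `build_request` / `parse_status_line` are illustrative helpers rather than library functions.

```python
def build_request(method: str, target: str, headers: dict, body: bytes = b"") -> bytes:
    """Request line, header lines, a blank line, then the optional message body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body

def parse_status_line(reply: bytes):
    """Split the first line of a reply into (HTTP version, status code, reason)."""
    first_line = reply.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason

request = build_request("GET", "/index.html", {"Host": "www.dcs.qmw.ac.uk"})
print(request)  # b'GET /index.html HTTP/1.1\r\nHost: www.dcs.qmw.ac.uk\r\n\r\n'
print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>"))
```

The reason phrase is split off with a limit of two splits because phrases such as "Not Found" contain spaces.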
      Communications Patterns: Multicast
   IP Multicast
   Reliable multicast