                                   Distributed Systems:
                 Undergraduate Assessed Coursework 1
                                     Dan Chalmers

                                  11th January, 2007

1     Introduction
      This is an individual piece of work and will be peer-assessed, with sampling by the
course tutor. The assignment is due in at 1600 on Thursday of week 4 – 1st February
2007, and will be submitted electronically. This exercise is worth 15% of the marks for
the course.

1.1   Aims of the Assignment
    • To experiment with writing socket and RMI based communications code in Java.

    • To consider the effect of different communication patterns and message handling.

    • To consider basic performance measurement and failure handling in distributed
      systems.

2     Requirements
2.1   Design and Implementation of Protocols
      You are required to develop three communications protocols, and to examine their
behaviour. In each case the system should support a node sending data to one or more
other nodes and receiving an acknowledgement. The data sent should identify:

    • whether it contains a message or acknowledgement

    • a simple name / value pair with variable content, i.e. having some structure, but
      not so much that the focus of the exercise becomes writing a parser, nor so little
      information that it might be optimised out entirely.

    • any other data required for the protocol
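As an illustration of the three items above, here is a minimal sketch of one possible message representation. The wire format (`kind;id;name=value`), the class name `WireMessage` and the `ackFrom` helper are all assumptions made for this example; the coursework does not prescribe any of them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical wire format, chosen only for illustration: kind;id;name=value
final class WireMessage {
    enum Kind { MSG, ACK }

    final Kind kind;     // message or acknowledgement
    final int id;        // lets an ACK say which message it answers
    final String name;   // name/value pair; must not contain ';' or '='
    final String value;

    WireMessage(Kind kind, int id, String name, String value) {
        this.kind = kind;
        this.id = id;
        this.name = name;
        this.value = value;
    }

    // Text form, e.g. "MSG;7;temp=21"
    String encode() {
        return kind + ";" + id + ";" + name + "=" + value;
    }

    // Bytes suitable for the payload of a java.net.DatagramPacket
    byte[] toBytes() {
        return encode().getBytes(StandardCharsets.UTF_8);
    }

    // Acknowledgement reusing the message id and naming the acknowledging node
    WireMessage ackFrom(String node) {
        return new WireMessage(Kind.ACK, id, "from", node);
    }
}
```

The id field is one example of "other data required for the protocol": it lets a sender match an acknowledgement to the message it answers.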

      There are a number of different implementations possible, and the core of this
assignment is to gain experience of these and make comparisons between them. The
main decisions are:
2                               Distributed Systems Undergraduate Coursework 1 – 2007

UDP vs RMI: Handling communications using UDP sockets or RMI (we shall not use
   multicast here). Refer to the programming exercise classes for initial steps on this.

Data Extraction: Handling the structure in the data with java.util.regex or
    java.util.StringTokenizer for UDP, and with arguments to methods (serializing
    objects where necessary) for RMI.

Communications Patterns: Handling communications to multiple destinations by
   different distributions of responsibility and effort:

        • a list to be sent to in-turn from the source (a → b, a → c, a → d, a → e);
        • arranging the destinations in a ring and forwarding
          (a → b, b → c, c → d, d → e);
        • arranging the destinations in a binary-tree and forwarding
          (a → b, a → c, b → d, b → e).

      The different communications patterns may achieve different performance,
      particularly in the case of the tree where there is obvious parallelism. There is also
      the possibility for different behaviour when a node fails. You should consider how
      acknowledgements are handled so that they fit the spirit of the pattern and any
      efficiencies it offers. Please do not use other pre-designed communications packages
      which may implement these patterns of communication for you.
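The three patterns differ only in which node sends to which. The sketch below computes the hops for an ordered list of node names; it performs no communication. The class name `ForwardingPlans` and the "sender->receiver" string form are assumptions for illustration. The tree variant assumes nodes are numbered level by level, so the parent of the node at index i is at index (i - 1) / 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the hops each pattern would use.
// nodes.get(0) is the original source; each hop is written "sender->receiver".
final class ForwardingPlans {

    // Source sends to every destination in turn.
    static List<String> direct(List<String> nodes) {
        List<String> hops = new ArrayList<>();
        for (int i = 1; i < nodes.size(); i++) {
            hops.add(nodes.get(0) + "->" + nodes.get(i));
        }
        return hops;
    }

    // Each node forwards to its successor in the ring order.
    static List<String> ring(List<String> nodes) {
        List<String> hops = new ArrayList<>();
        for (int i = 1; i < nodes.size(); i++) {
            hops.add(nodes.get(i - 1) + "->" + nodes.get(i));
        }
        return hops;
    }

    // Each node forwards to at most two children; parent of index i is (i - 1) / 2.
    static List<String> binaryTree(List<String> nodes) {
        List<String> hops = new ArrayList<>();
        for (int i = 1; i < nodes.size(); i++) {
            hops.add(nodes.get((i - 1) / 2) + "->" + nodes.get(i));
        }
        return hops;
    }
}
```

For the nodes a, b, c, d, e these methods give the same hops as the three bullet points above.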

      You should provide three different solutions. The set of solutions you provide
should include at least one solution using UDP sockets, at least one using RMI, and two
different source-to-destination patterns. This allows you to compare raw socket protocols
with RMI using the same communications pattern, and also to compare two communications
patterns using the same “transport” mechanism. The choice of regex or
StringTokenizer can be retained if multiple UDP implementations are made.
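For the UDP variants, the two data extraction options mentioned earlier could look roughly like the sketch below. It assumes a hypothetical text format `kind;id;name=value` (for example `MSG;7;temp=21`); the format and the class name `MessageParser` are illustrative assumptions, not part of the assignment.

```java
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the assumed format kind;id;name=value.
// Both methods return {kind, id, name, value} or throw on malformed input.
final class MessageParser {

    private static final Pattern LINE =
            Pattern.compile("(MSG|ACK);(\\d+);([^;=]+)=([^;=]+)");

    // Regex version: checks the whole structure in one step.
    static String[] parseWithRegex(String text) {
        Matcher m = LINE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed message: " + text);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    // Tokenizer version: splits on ';' and '=' and only checks the field count,
    // so it is more lenient than the regex about which delimiter appears where.
    static String[] parseWithTokenizer(String text) {
        StringTokenizer st = new StringTokenizer(text, ";=");
        if (st.countTokens() != 4) {
            throw new IllegalArgumentException("Malformed message: " + text);
        }
        return new String[] { st.nextToken(), st.nextToken(), st.nextToken(), st.nextToken() };
    }
}
```

The regex version validates more strictly, while the tokenizer version is shorter; either can be kept across multiple UDP implementations.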
      Where necessary, the original (rather than this-hop) source and the (lists of)
destination nodes should be identified using extra data. Where sockets or RMI already
provide this, duplication is not required.
      Messages should be acknowledged. For RMI the method return (without
exceptions) is sufficient. If it is not clear which message is being acknowledged, or which
node(s) the acknowledgement is coming from, then the protocol should include this
information.
      Your code should be clear and appropriately commented. Exceptions should be
handled appropriately (recovery is nice, clear messages a minimum). Threads may be
part of your solution; take care that concurrency is handled correctly.
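For the RMI case, treating a normal method return as the acknowledgement could be sketched as follows. The interface name `Receiver`, the method `deliver` and its parameters are assumptions for illustration. Exporting the remote object and looking it up in a registry are omitted here.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: a normal return from deliver() is the
// acknowledgement; a RemoteException means no acknowledgement was received.
interface Receiver extends Remote {
    void deliver(String origin, String name, String value) throws RemoteException;
}

final class RmiSender {

    // Returns true if the call returned normally (acknowledged), false otherwise.
    static boolean sendAndCheckAck(Receiver target, String origin, String name, String value) {
        try {
            target.deliver(origin, name, value);
            return true;
        } catch (RemoteException e) {
            // Minimum handling asked for above: a clear message.
            System.err.println("No acknowledgement from receiver: " + e.getMessage());
            return false;
        }
    }
}
```

The `origin` argument carries the original source, since in a ring or tree the immediate caller may be a forwarding node.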

2.2   Experiments
Function test: First, using one machine, confirm that your code functions correctly.

Failure Behaviour: Second, over a small number of machines, examine the behaviour of
     your code under node failures (i.e. terminate your program at one node).

Timing: Third, add some repetition and timing to your code. If you avoid having other
    applications running it should be possible to simply use Java’s Date class to make
      a reasonable analysis. Repeating the client call 100 times, as if 100 messages were
      to be sent in rapid succession, should be sufficient. Again, use more than one PC
      so that real network delays (albeit small within the lab) can be shown. Run these
      tests several times, to better understand any variation in the process.
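A minimal sketch of the repetition and timing described above, using Java's Date class as suggested. The class name `RepeatTimer` is an assumption, and the `Runnable` stands in for whatever client call is being measured.

```java
import java.util.Date;

// Hypothetical timing helper: runs a call repeatedly and reports elapsed wall-clock time.
final class RepeatTimer {

    // Returns the elapsed time in milliseconds for the given number of repetitions.
    static long timeMillis(Runnable call, int repetitions) {
        long start = new Date().getTime();
        for (int i = 0; i < repetitions; i++) {
            call.run();
        }
        return new Date().getTime() - start;
    }
}
```

A call such as `RepeatTimer.timeMillis(() -> client.send(message), 100)` (where `client` and `message` are placeholders) would time 100 sends; dividing by the repetition count gives a mean per-message time. Date measures wall-clock time, so clock adjustments during a run can distort a result; `System.nanoTime()` is an alternative that is not affected by such adjustments.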

      Take care in running these tests: keep a careful log of your results; build up the
number of processes; build up from one machine to two or three (the number of machines
need not be as large as the number of processes); build up the number of communication
cycles; take care in choosing which processes to stop; and above all do not do anything
which interferes with other people’s ability to use the labs. There is no need to use large
numbers of PCs or to overwhelm the network in order to perform these experiments.
You do not need to spend long using multiple machines. Consider the available resources
in the lab when scheduling your work. You are welcome to use personal systems and/or
teaching unix servers in these experiments – the absolute performance is less important
than the process and comparison. Do use similar specification machines and
machine/process configurations for timing tests so that comparisons are valid.
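In the failure-behaviour experiment, one way a terminated node can show up in a UDP implementation is as an acknowledgement that never arrives. The sketch below resends a datagram a fixed number of times and gives up if no reply is received within a timeout. The class name `UdpAckClient`, the retry policy and the treatment of any reply as an acknowledgement are simplifying assumptions for illustration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical UDP sender that waits for a reply and retries on timeout.
final class UdpAckClient {

    // Returns the reply text, or null if no reply arrived after all attempts.
    static String sendAndAwaitAck(InetAddress host, int port, String text,
                                  int timeoutMillis, int attempts) throws IOException {
        byte[] out = text.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(timeoutMillis);  // receive() now throws after this long
            for (int i = 0; i < attempts; i++) {
                socket.send(new DatagramPacket(out, out.length, host, port));
                byte[] buf = new byte[1024];
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(reply);
                    return new String(reply.getData(), 0, reply.getLength(),
                                      StandardCharsets.UTF_8);
                } catch (SocketTimeoutException e) {
                    // No acknowledgement for this attempt; loop to resend.
                }
            }
        }
        return null;  // the destination is presumed to have failed
    }
}
```

A real protocol would also check that the reply acknowledges the message just sent, because a late acknowledgement for an earlier attempt could otherwise be mistaken for the current one.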

2.3   Write-Up

       You should write a report of up to 1000 words, plus any relevant figures, describing
the implementations chosen and what you found. Which combinations of
communications patterns and sockets/RMI did you make? What is the behaviour of the
various solutions under node failure conditions? What might be done to improve this?
What is the performance of the various solutions, and how repeatable are your findings?
Given your findings and a theoretical analysis, how would you expect performance to
scale with 100 or 1,000,000 nodes, and with concurrent messages? Comment on the
complexity of the code and how difficult it would be to modify or extend your work.
Comment on any surprises or problems you encountered. Cite any external sources you
referred to in your work.

3     Submission
      Solutions should be emailed to “D.Chalmers@sussex.ac.uk” with the subject “DS
Coursework 1 Submission” and two attached files: “<userid>-report.ps”
containing your report and “<userid>-code.zip” containing all code. For example, I would
submit a mail with attachments “dc52-report.ps” and “dc52-code.zip”. This ensures that
your work remains readily identifiable. The report should be either PostScript (.ps, as
indicated), PDF (.pdf) or plain text (.txt), not Word. Gzipped tar is the only acceptable
alternative to zip. Your emailed submission will be acknowledged automatically once it
arrives in my mailbox.
      All files should contain near the start (in comments for code files) your name,
department, user id, which coursework this is, date of submission and the usual
declaration:

      “In making this submission I declare that my work contains no examples of
      misconduct, such as plagiarism, collusion, or fabrication of results.”

The exercise is intended as individual work; however, if any code or ideas arise from
books, web-sites or work with your class-mates, make clear which parts these are. This is
acceptable (for instance, I would expect that a Java API reference would be used,
although you are expected to have solved the core problems of the exercise yourself to
gain the marks); it simply needs to be clear which work is yours.
       You can use any computer system you have access to for the development of
solutions, taking appropriate backups if this is not a university system. The emailed
submission should come from your university account. Failures of personal systems and
off-site internet services are not reasonable grounds for late submission. A general
extension of the deadline will be given in the event of widespread and long-duration
failures of lab systems or university email systems. The usual rules for timely submission
and plagiarism apply and the usual penalties will be applied to those failing to meet
these requirements. Submission time will be taken from university mail servers.
      All code should compile and execute without error. Any code which you wish to
have considered as a partial solution but which causes errors should be commented out. A
portion of the marks will be given for correct function, which is most accurately assessed
by running your code, and this cannot be done where code fails. The required
incantations for execution should be clearly documented in your report.

4     Marking
4.1   Mark Scheme
      Marks will be allocated roughly as follows:
    • Correct use of RMI and UDP: 2 × 10% = 20%
    • Correctness & completeness of the communication protocols: 3 × 10% = 30%
    • Elegance of design, particularly any additional work on protocols: 10%
    • Design and analysis of testing: 20%
    • Overall feel of the piece of work: 10%
    • Participation in the peer marking: 10%
Most of these require both good code and a clear description in your report, but none
requires great complexity. The emphasis of the exercise is on the protocols developed
and analysis of them, rather than on a user interface, and no marks arise from the user
interface.
      Marks will typically be the average of two peer marks, moderated by the course
tutor. Your marked work will be returned via the department office as usual.

4.2   Peer Assessment
      Detailed marking guidelines will be given for the peer-marking session in the
lab/seminar session in week 5. As well as being worth 10% of the assessment, it is an
excellent opportunity to reflect on your own work by comparing it to that of others and
engaging with the assessment of work, both of which are instructive learning activities.
