Distributed Systems: Undergraduate Assessed Coursework 1

Dan Chalmers
11th January, 2007

1 Introduction

This is an individual piece of work and will be peer-assessed, with sampling by the course tutor. The assignment is due in at 1600 on Thursday of week 4 – 1st February 2007, and will be submitted electronically. This exercise is worth 15% of the marks for the course.

1.1 Aims of the Assignment

• To experiment with writing socket and RMI based communications code in Java.
• To consider the effect of different communication patterns and message handling mechanisms.
• To consider basic performance measurement and failure handling in distributed systems.

2 Requirements

2.1 Design and Implementation of Protocols

You are required to develop three communications protocols, and to examine their behaviour. In each case the system should support a node sending data to one or more other nodes and receiving an acknowledgement. The data sent should identify:

• whether it contains a message or acknowledgement
• a simple name / value pair with variable content, i.e. having some structure but not so much that the focus of the exercise becomes writing a parser or so little information that it might be optimised out entirely.
• any other data required for the protocol

There are a number of different implementations possible, and the core of this assignment is to gain experience of these and make comparisons between them. The main decisions are:

UDP vs RMI  Handling communications using UDP sockets or RMI (we shall not use multicast here). Refer to the programming exercise classes for initial steps on this.

Data Extraction  Handling the structure in the data with java.util.regex or java.util.StringTokenizer for UDP and with arguments to methods (serializing objects where necessary) for RMI.
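
As an illustration of the data extraction step for UDP, the sketch below parses one possible text payload with both java.util.StringTokenizer and java.util.regex. The wire format (type, sequence number and name=value separated by semicolons), the class name WireMessage and the sequence-number field are assumptions made for this example only; the brief does not prescribe a format, and a real solution may need further fields.

```java
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wire format (an assumption, not part of the brief):
//   <MSG|ACK>;<sequence number>;<name>=<value>
// Limitation of this sketch: names and values may not contain ';'.
class WireMessage {
    final boolean ack;   // true for an acknowledgement, false for a message
    final int seq;       // hypothetical field: says which message is acknowledged
    final String name;
    final String value;

    WireMessage(boolean ack, int seq, String name, String value) {
        this.ack = ack;
        this.seq = seq;
        this.name = name;
        this.value = value;
    }

    // Builds the text that would be placed in a UDP datagram.
    String encode() {
        return (ack ? "ACK" : "MSG") + ";" + seq + ";" + name + "=" + value;
    }

    // Variant 1: split the fields with java.util.StringTokenizer.
    static WireMessage parseWithTokenizer(String text) {
        StringTokenizer fields = new StringTokenizer(text, ";");
        if (fields.countTokens() != 3) {
            throw new IllegalArgumentException("Malformed message: " + text);
        }
        String type = fields.nextToken();
        int seq = Integer.parseInt(fields.nextToken());
        String pair = fields.nextToken();
        int eq = pair.indexOf('=');
        boolean knownType = type.equals("MSG") || type.equals("ACK");
        if (!knownType || eq < 1) {
            throw new IllegalArgumentException("Malformed message: " + text);
        }
        return new WireMessage(type.equals("ACK"), seq,
                pair.substring(0, eq), pair.substring(eq + 1));
    }

    // Variant 2: validate and extract the fields with one regular expression.
    private static final Pattern FORMAT =
            Pattern.compile("(MSG|ACK);(\\d+);([^=;]+)=([^;]*)");

    static WireMessage parseWithRegex(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed message: " + text);
        }
        return new WireMessage(m.group(1).equals("ACK"),
                Integer.parseInt(m.group(2)), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        WireMessage sent = new WireMessage(false, 7, "temp", "21.5");
        WireMessage received = parseWithTokenizer(sent.encode());
        System.out.println(received.name + " = " + received.value);
    }
}
```

Both variants reject malformed input with an IllegalArgumentException, so the receiving code can report a clear error rather than fail later on a missing field.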

Communications Patterns  Handling communications to multiple destinations by different distributions of responsibility and effort:

• a list of destinations, each sent to in turn from the source (a → b, a → c, a → d, a → e);
• arranging the destinations in a ring and forwarding (a → b, b → c, c → d, d → e);
• arranging the destinations in a binary tree and forwarding (a → b, a → c, b → d, b → e).

The different communications patterns may achieve different performance, particularly in the case of the tree, where there is obvious parallelism. There is also the possibility of different behaviour when a node fails. You should consider how acknowledgements are handled so that they fit the spirit of the pattern and any efficiencies it offers. Please do not use other pre-designed communications packages which may implement these patterns of communication for you.

You should provide three different solutions. The set of solutions you provide should include at least one solution with each of UDP sockets and RMI, and two different source-to-destination patterns. This allows you to compare raw socket protocols with RMI using the same communications pattern, and also two communications patterns using the same “transport” mechanism. The choice of regex or StringTokenizer can be retained if multiple UDP implementations are made.

Where necessary, the original (rather than this-hop) source and (lists of) destination nodes should be identified using extra data. Where sockets / RMI provide this, duplication is not required.

Messages should be acknowledged. For RMI the method return (without exceptions) is sufficient. If it is not clear which message is being acknowledged, or which node(s) the acknowledgement is coming from, then the protocol should include this information.

Your code should be clear and appropriately commented. Exceptions should be handled appropriately (recovery is nice, clear messages a minimum). Threads may be part of your solution; take care that concurrency is handled carefully.
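
The three patterns differ mainly in which nodes each node is responsible for sending to. The sketch below computes those next-hop targets from an ordered node list with the source at index 0. The class and method names are hypothetical, the tree layout (children of index i at 2i+1 and 2i+2) is one assumed way to obtain the a → b, a → c, b → d, b → e arrangement, and the routing of acknowledgements is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical, not part of the brief). Nodes are held in
// an ordered list with the original source at index 0; "self" is the index of
// the node asking where it should send or forward the message.
class ForwardingPlan {

    // List pattern: only the source sends, to every destination in turn.
    static List<String> listTargets(List<String> nodes, int self) {
        List<String> targets = new ArrayList<>();
        if (self == 0) {
            for (int i = 1; i < nodes.size(); i++) {
                targets.add(nodes.get(i));
            }
        }
        return targets;
    }

    // Ring pattern: each node forwards to its successor; the last node has
    // no onward target in this sketch.
    static List<String> ringTargets(List<String> nodes, int self) {
        List<String> targets = new ArrayList<>();
        if (self + 1 < nodes.size()) {
            targets.add(nodes.get(self + 1));
        }
        return targets;
    }

    // Binary-tree pattern: node i forwards to children 2i+1 and 2i+2,
    // where those indices exist.
    static List<String> treeTargets(List<String> nodes, int self) {
        List<String> targets = new ArrayList<>();
        for (int child = 2 * self + 1;
             child <= 2 * self + 2 && child < nodes.size(); child++) {
            targets.add(nodes.get(child));
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("a", "b", "c", "d", "e");
        for (int i = 0; i < nodes.size(); i++) {
            System.out.println(nodes.get(i)
                    + " list=" + listTargets(nodes, i)
                    + " ring=" + ringTargets(nodes, i)
                    + " tree=" + treeTargets(nodes, i));
        }
    }
}
```

Keeping this calculation separate from the transport code means the same plan could, in principle, drive both a UDP and an RMI implementation, which helps when comparing the two over the same pattern.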

2.2 Experiments

Function test  First, using one machine, confirm that your code functions correctly.

Failure Behaviour  Second, over a small number of machines examine the behaviour of your code under node-failures (i.e. terminate your program at one node).

Timing  Third, add some repetition and timing to your code. If you avoid having other applications running it should be possible to simply use Java’s Date class to make a reasonable analysis. Repeating the client call 100 times, as if 100 messages were to be sent in rapid succession, should be sufficient. Again, use more than one PC so that real network delays (albeit small within the lab) can be shown. Run these tests several times, to better understand any variation in the process.

Take care in running these tests: keep a careful log of your results; build up the number of processes; build up from one machine to two or three (the number of machines need not be as large as the number of processes); build up the number of communication cycles; take care in choosing which processes to stop; and above all do not do anything which interferes with other people’s ability to use the labs. There is no need to use large numbers of PCs or to overwhelm the network in order to perform these experiments. You do not need to spend long using multiple machines. Consider the available resources in the lab when scheduling your work. You are welcome to use personal systems and/or teaching unix servers in these experiments – the absolute performance is less important than the process and comparison. Do use similar specification machines and machine/process configurations for timing tests so that comparisons are valid.

2.3 Write-Up

You should write a report of up to 1000 words, plus any relevant figures, describing the implementations chosen and what you found.
Which combinations of communications patterns and sockets/RMI did you make?
What is the behaviour of the various solutions under node failure conditions? What might be done to improve this?
What is the performance of the various solutions, and how repeatable are your findings?
Given your findings and a theoretical analysis, how would you expect performance to scale with 100, or 1,000,000 nodes and with concurrent messages?
Comment on the complexity of the code and how difficult it would be to modify or extend your work.
Comment on any surprises or problems you encountered.
Cite any external sources you referred to in your work.

3 Submission

Solutions should be emailed to “D.Chalmers@sussex.ac.uk” with the subject “DS Coursework 1 Submission” and two attached files called “<userid>-report.ps” containing your report and “<userid>-code.zip” containing all code, e.g. I would submit a mail with attachments “dc52-report.ps” and “dc52-code.zip”. This ensures that your work remains readily identifiable. The report should be either Postscript (.ps, as indicated), PDF (.pdf) or plain text (.txt), not Word. GZipped-tar is the only acceptable alternative to zip. Your emailed submission will be acknowledged automatically once it arrives in my mailbox.

All files should contain near the start (in comments for code files) your name, department, user id, which coursework this is, date of submission and the usual declaration: “In making this submission I declare that my work contains no examples of misconduct, such as plagiarism, collusion, or fabrication of results.”

The exercise is intended as individual work; however, if any code or ideas arise from books, web-sites or work with your class-mates, make clear what parts these are. This is acceptable (for instance I would expect that a Java API reference would be used, although you are expected to have solved the core problems of the exercise yourself to gain the marks); it simply needs to be clear which work is yours.

You can use any computer system you have access to for the development of solutions, taking appropriate backups if this is not a university system. The emailed submission should come from your university account. Failures of personal systems and off-site internet services are not reasonable grounds for late submission. A general extension of the deadline will be given in the event of widespread and long-duration failures of lab systems or university email systems.

The usual rules for timely submission and plagiarism apply and the usual penalties will be applied to those failing to meet these requirements. Submission time will be taken from university mail servers.

All code should compile and execute without error. Any code which you wish to have considered as a partial solution but which causes errors should be commented out. A portion of the marks will be given for correct function, which is most accurately assessed by running your code, and this cannot be done where code fails. The required incantations for execution should be clearly documented in your report.

4 Marking

4.1 Mark Scheme

Marks will be allocated roughly as follows:

• Correct use of RMI and UDP: 2 * 10% = 20%
• Correctness & completeness of the communication protocols: 3 * 10% = 30%
• Elegance of design, particularly any additional work on protocols: 10%
• Design and analysis of testing: 20%
• Overall feel of the piece of work: 10%
• Participation in the peer marking: 10%

Most of these require both good code and a clear description in your report, but none requires great complexity. The emphasis of the exercise is on the protocols developed and the analysis of them, rather than on a user interface, and no marks arise from the user interface.

Marks will typically be the average of two peer marks, moderated by the course tutor. Your marked work will be returned via the department office as usual.

4.2 Peer Assessment

Detailed marking guidelines will be given for the peer-marking session in the lab/seminar session in week 5. As well as being worth 10% of the assessment, it is an excellent opportunity to reflect on your own work by comparing it to that of others and engaging with the assessment of work, both of which are instructive learning activities.