

									                  Seminar Report

   Dryad: Distributed Data-Parallel Programs from
             Sequential Building Blocks


DryadLINQ: A System for General-Purpose Distributed Data-
     Parallel Computing using a High-Level Language

              written by: Tahmineh Sanamrad

                 Hot Topics in DMS Seminar
               offered by: Prof. Nesime Tatbul

                Faculty of Computer Science
                   ETH Zürich University
                   Spring Semester 2009
1. Overview
As there were two papers to review, this report has two parts. The share dedicated to each paper corresponds
to the number of slides dedicated to that topic in the presentation. The emphasis is on the Dryad execution
engine; DryadLINQ is introduced as a complement to the first part, a high-level programming interface that
enables programmers to work with the Dryad engine.

2. Dryad
What I personally found helpful was the talk given by Michael Isard, a distinguished Microsoft researcher, at
Google, which is also publicly available on YouTube. Other talks by Michael Isard on the Microsoft site also
helped me a lot to get the idea. There are also descriptive slides on this topic, developed by the paper’s
authors for tech talks at some eminent universities.
A big pitfall of this project, as with other Microsoft projects, is that the software is proprietary. Thus, testing
the engine and DryadLINQ was not very straightforward. There is a 3-page agreement which has to be signed
by a partner university in order to get hold of the binaries and code, which is not an easy approach for busy
students and professors.
Therefore, my report is based mostly on the impression and research I made on the theoretical aspects of Dryad.

    2.1.         Project Name

A dryad is the animating spirit of the forest and the trees. The Dryad engine is likewise responsible for
controlling the flow, and for animating and scheduling the computations on vertices. I think the project group
members chose the name appropriately.

    2.2.         Goals
The main idea of the Dryad project is to make it easier for developers to write efficient parallel and distributed
applications. Dryad is not the only project or technology addressing this problem. The paper refers to
technologies such as shader languages for general-purpose programming on GPUs, parallel databases and MapReduce.
These technologies have been successful but are somewhat limited. In all of them the developer is explicitly
forced to consider the data parallelism of the computation. Once an application is cast into this framework, the
system is automatically able to provide the required scheduling and distribution. This can be an advantage or a
disadvantage: an advantage because it brings simplicity to the whole system (developer and execution engine), a
disadvantage because it is not flexible enough. A good question is: what can flexibility give us that we do not
already have using, for example, MapReduce?
MapReduce is said to run uniformly across the nodes of a cluster, but in reality we may not observe a uniform
pattern: some nodes are inefficient, or contribute nothing at all to the result. Dryad suggests that by cutting
out these nodes, or by better scheduling, we can gain excellent performance. In the following sections, I will
discuss how this “excellent performance” is achieved. Briefly, Dryad’s goal is achieved mainly through the
following subgoals:
    •   Scheduling across resources
    •   Optimizing the level of concurrency
    •   Recovering from communication or computer failures

    2.3.         Design

The main design point is that Dryad works with directed acyclic graphs (DAGs). According to Michael Isard, this
is the natural, most general design point. Why?
A directed graph may be used to represent a network of processing elements; in this formulation, data enters a
processing element through its incoming edges and leaves the element through its outgoing edges. Examples of
this include the following:
    •   In electronic circuit design, a combinational logic circuit is an acyclic system of logic gates that computes
        a function of an input, where the input and output of the function are represented as individual bits.
    •   A Bayesian network represents a system of probabilistic events as nodes in a directed acyclic graph. The
        likelihood of an event may be calculated from the likelihoods of its predecessors in the DAG.
    •   Dataflow programming languages describe systems of values that are related to each other by a directed
        acyclic graph. When one value changes, its successors are recalculated; each value is evaluated as a
        function of its predecessors in the DAG.
Allowing cycles causes trouble, for example deadlocks. A DAG supports the full relational algebra, because a
vertex can have multiple inputs and outputs. Most important of all, there is always at least one order in which
the vertices can be executed to produce the output. This is the topological sort, which rests on the following
property of DAGs: every DAG has one or more topological orderings.
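The topological sort mentioned above can be sketched in a few lines of Python. This is a minimal illustration using Kahn's algorithm, not code from the Dryad paper; the function name `topological_order` and the vertex names in the example job are assumptions made for this sketch.

```python
from collections import deque


def topological_order(edges):
    """Return one topological order of the vertices of a DAG.

    `edges` maps each vertex to the list of its successors.
    Raises ValueError if the graph contains a cycle.
    """
    # Count the incoming edges of every vertex that appears anywhere.
    indegree = {}
    for src, dsts in edges.items():
        indegree.setdefault(src, 0)
        for dst in dsts:
            indegree[dst] = indegree.get(dst, 0) + 1

    # Vertices without incoming edges can run immediately.
    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        # Finishing v may make some of its successors runnable.
        for dst in edges.get(v, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, no topological order exists")
    return order


# Hypothetical job: two inputs are joined, aggregated and written out.
job = {
    "input_a": ["join"],
    "input_b": ["join"],
    "join": ["aggregate"],
    "aggregate": ["output"],
}
print(topological_order(job))
```

If a cycle were allowed, the loop would run out of ready vertices before all vertices are ordered, which is the point at which the sketch raises an error.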
Dryad tries to optimize throughput, not latency (dealing with time-outs). This makes it unsuitable for real-time
applications, in which latency has to be kept as low as possible.
Another aspect is that the Dryad project assumes a private network, not the internet. Thus, authentication and
other security concerns have been left out of the project.
The Dryad execution engine also works only with a finite set of inputs, i.e. data streaming approaches cannot be
handled by this execution engine.
Back to the DAG: there are 4 kinds of objects to deal with:
    1) Inputs: which are normally persisted in the database
    2) Processing Vertices: which run on the computers of our cluster
    3) Outputs: the result of our processing
    4) Channels
Channels carry the intermediate outputs and inputs of the vertices. Channels can be files (sometimes the
intermediate data being produced needs to be persisted), TCP pipes, or shared memory (which is the fastest
communication channel, based on the benchmark at the end of the paper).
A job for Dryad is defined as a DAG, which the execution engine should schedule and optimize.
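The Dryad system organization is as follows:

[Figure: Dryad system organization, with the job manager (JM), the name server, the daemons (D) and the vertices (V)]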

A Dryad job is coordinated by a process called the job manager (JM in the picture). The JM runs either within the
cluster or on a user’s workstation with network access to the cluster. The job manager contains the application-
specific code along with library code to schedule the work across the available resources. The job manager is
only responsible for control decisions, so it is not a bottleneck for any data transfers. The cluster has a name
server, which helps the JM to discover the available resources and their positions, so that it can also optimize
based on locality. There is a daemon (D) running on each computer in the cluster that is responsible for creating
processes on behalf of the job manager. The daemons make it possible for the JM to communicate with the
remote vertices and to gather the statistics it needs for good scheduling.
Good Question: What are the responsibilities of the job manager?
    (1) instantiating a job’s dataflow graph
    (2) scheduling processes on cluster computers
    (3) providing fault tolerance by re-executing failed or slow processes
    (4) monitoring the job and collecting statistics
    (5) transforming the job dynamically according to feedback received from the daemons at runtime
Good Question: What is running on those machines that makes them part of this engine?
A daemon process runs on each of these machines on Dryad’s behalf.
Good Example: The job manager is similar to a project manager: it checks the available resources and assigns
proper tasks to them.
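The scheduling responsibility can be illustrated with a small Python sketch. It is a simplified greedy scheduler with a locality preference, written under my own assumptions; the function `schedule`, its parameters and the machine and vertex names are hypothetical and do not come from the Dryad implementation.

```python
def schedule(vertices, deps, preferred, machines):
    """Greedily assign vertices to machines in dependency order.

    vertices:  list of vertex names, already in a topological order
    deps:      vertex -> set of vertices whose output it reads
    preferred: vertex -> machine holding most of its input (optional)
    machines:  list of machine names, e.g. as reported by a name server
    Returns a list of rounds; each round maps machine -> vertex.
    """
    done, rounds = set(), []
    pending = list(vertices)
    while pending:
        free = list(machines)
        assignment = {}
        for v in list(pending):
            if not free:
                break  # every machine is busy in this round
            if not deps.get(v, set()) <= done:
                continue  # some input has not been produced yet
            # Prefer the machine close to the input, else take any free one.
            m = preferred.get(v)
            if m not in free:
                m = free[0]
            free.remove(m)
            assignment[m] = v
            pending.remove(v)
        if not assignment:
            raise ValueError("no runnable vertex: cycle or missing input")
        # Outputs become visible to other vertices only after the round.
        done.update(assignment.values())
        rounds.append(assignment)
    return rounds


rounds = schedule(
    vertices=["read1", "read2", "read3", "merge"],
    deps={"merge": {"read1", "read2", "read3"}},
    preferred={"read1": "m1", "read2": "m2", "read3": "m1"},
    machines=["m1", "m2"],
)
for number, assignment in enumerate(rounds, start=1):
    print(number, assignment)
```

The sketch only makes control decisions, which mirrors the point that the job manager is not involved in the data transfers themselves.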

    2.4.         Policy Managers

In order to simplify job management, each vertex is placed in a “stage” when the graph is constructed. The stages
can be seen as a summary of the overall job. Mostly, vertices having the same computational functionality are
placed into the same stage. Each stage has a “stage manager”. Each inter-stage set of edges has a “connection
manager”.
The managers get upcalls for all important events in the corresponding vertices and can make policy decisions;
they can even rewrite the graph at run-time. The user can replace the managers.
The slides also contain some examples of different kinds of policy managers and how they react to certain events
reported to the job manager from the vertices. The following policies are discussed on the slides:
    •     Duplicate Manager
    •     Aggregation Manager: putting the computations close to their input data
    •     Range-Distribution Manager
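To make the idea behind the aggregation policy more concrete, here is a minimal Python sketch of a graph rewrite that inserts one intermediate aggregator per rack, so that partial results are combined close to their inputs. The function `build_aggregation_tree`, the rack names and the vertex names are assumptions made for illustration; the real manager works on Dryad's own graph objects.

```python
from collections import defaultdict


def build_aggregation_tree(inputs, rack_of):
    """Insert one intermediate aggregator per rack.

    inputs:  list of input vertex names
    rack_of: vertex -> rack name (hypothetical locality information)
    Returns the edges of the rewritten graph as (source, destination) pairs.
    """
    # Group the input vertices by the rack they run on.
    by_rack = defaultdict(list)
    for v in inputs:
        by_rack[rack_of[v]].append(v)

    edges = []
    for rack in sorted(by_rack):
        local = f"agg_{rack}"
        # Inputs first feed the aggregator on their own rack ...
        edges += [(v, local) for v in by_rack[rack]]
        # ... and only the reduced result crosses racks.
        edges.append((local, "agg_final"))
    return edges


edges = build_aggregation_tree(
    inputs=["p0", "p1", "p2", "p3"],
    rack_of={"p0": "r1", "p1": "r1", "p2": "r2", "p3": "r2"},
)
for edge in edges:
    print(edge)
```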

    2.5.         Fault Tolerance

The fault tolerance discussed here is suitable when all vertex programs are deterministic, i.e. each time a
program runs with the same input, it produces the same output.
When a vertex execution fails for any reason, the job manager is informed:
If the vertex reports an error, it is reported through the daemon to the job manager;
If the process crashes, the daemon notifies the job manager;
If the daemon fails, the job manager receives a time-out.
In all of the above cases the vertex is re-executed.
If the failure occurs because an input channel fails, the vertex that created the failed input channel should be
re-executed.
The default stage manager includes heuristics to detect vertices that are running slower than their peers. The
duplicate policy (already seen in the policy manager section) executes a copy of such a vertex elsewhere, and
the output from whichever finishes first is used.
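Re-execution of a deterministic vertex can be sketched as a simple retry loop. This is an illustration only: `VertexFailure`, `run_with_retries`, `make_flaky` and the attempt limit are hypothetical names and values, and the failure is simulated rather than reported by a daemon.

```python
class VertexFailure(Exception):
    """Raised by a vertex program that could not complete."""


def run_with_retries(vertex, inputs, max_attempts=3):
    """Re-execute a deterministic vertex until it succeeds.

    Returns (result, number_of_attempts). Because the vertex is
    deterministic, any successful attempt yields the same result.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return vertex(inputs), attempt
        except VertexFailure as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"vertex failed {max_attempts} times") from last_error


def make_flaky(fail_times):
    """Build a vertex that crashes `fail_times` times, then sums its input."""
    state = {"calls": 0}

    def vertex(inputs):
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise VertexFailure("simulated crash")
        return sum(inputs)

    return vertex


result, attempts = run_with_retries(make_flaky(2), [1, 2, 3])
print(result, attempts)
```

The determinism assumption matters here: if the vertex could return different results for the same input, the output of a re-executed vertex might not match what downstream vertices have already consumed.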

    2.6.         Conclusion

The following points are, from my point of view, the weaknesses and strengths of this paper and project:
Disadvantage: Dryad cannot be easily downloaded and used.
Disadvantage: The paper does not compare Dryad with any other similar project, yet it claims excellent
performance. Compared to what?
Disadvantage: Dryad is too complex for an everyday programmer (only specialized C++ programmers can use it).
Disadvantage: The job manager uses a greedy algorithm to distribute computations to vertices, and thus only one
job can run at a time. The paper does not cover the case in which several jobs run simultaneously in the
system. Does it support job concurrency at all?
Disadvantage: It addresses a limited set of problems: take certain data inputs, perform some intermediate steps
on them, and produce a single output.
Disadvantage: It doesn’t work for data streaming problems; it works on a finite set of input data, does a bunch of
transformations and ends up with an output (as in decision support systems).
Disadvantage: There is no discussion of heterogeneous environments; as the software stack shows, it’s all
about Windows servers…

Advantage: Flexibility: a specialized programmer is able to place the computations and define the data
channels herself/himself.
Advantage: Runtime optimizations based on the statistics reported by the daemons on the cluster computers
Advantage: Centralized scheduler
Advantage: Fault tolerance

3. DryadLINQ
Michael Isard: “When the Dryad project was out, people were jumping up and down to use a proper
programming model to work with Dryad. Then a coworker on the Dryad project (Yuan Yu) came up with the idea to
integrate it with LINQ.”
Language Integrated Query (LINQ) is an extension of C# which allows one to write declarative computations on
collections. DryadLINQ translates LINQ programs into Dryad computations: C# and LINQ data objects become
distributed partitioned files, LINQ queries become distributed Dryad jobs, and C# methods become code running
on the vertices of a Dryad job.
The step-by-step execution is as follows:

    (1) A .NET user application runs. It creates a DryadLINQ expression object. The application calls
        ToDryadTable, triggering the data-parallel execution.
    (2) The expression object is handed to DryadLINQ.
    (3) DryadLINQ compiles the LINQ expression into a distributed Dryad execution plan. It performs: (a) the
        decomposition of the expression into subexpressions, each to be run in a separate Dryad vertex; (b) the
        generation of code and static data for the remote Dryad vertices; (c) the generation of serialization
        code for the required data types.
    (4) DryadLINQ calls the job manager. The job manager creates the job graph using the plan created in step 3.
        It schedules the vertices as resources become available.
    (5) Each vertex executes its assigned program.
    (6) When the job completes, it writes the data to the output tables.
    (7) The job manager process terminates, and control returns to DryadLINQ. DryadLINQ encapsulates
        the output in local DryadTable objects and passes them to the user application.
    (8) The application may generate subsequent DryadLINQ expressions, to be executed by repeating steps 2-7.
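The deferred execution in steps 1 to 7 can be imitated in Python. The sketch below is only an analogy and not the DryadLINQ API: the class `LazyQuery` and its methods `where`, `select` and `to_table` are hypothetical, and the "vertices" are plain function calls over in-memory partitions instead of processes on a cluster.

```python
class LazyQuery:
    """Records query operations; nothing runs until to_table() is called."""

    def __init__(self, partitions, ops=()):
        self.partitions = partitions  # list of lists, one per partition
        self.ops = tuple(ops)         # recorded (kind, function) pairs

    def where(self, predicate):
        # Only extend the expression, do not touch the data.
        return LazyQuery(self.partitions, self.ops + (("where", predicate),))

    def select(self, func):
        return LazyQuery(self.partitions, self.ops + (("select", func),))

    def _run_vertex(self, partition):
        """Apply all recorded operations to one partition."""
        rows = partition
        for kind, fn in self.ops:
            if kind == "where":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows

    def to_table(self):
        """Trigger execution: one 'vertex' per input partition."""
        return [self._run_vertex(p) for p in self.partitions]


# Building the expression does no work yet.
query = (
    LazyQuery([[1, 2, 3], [4, 5, 6]])
    .where(lambda x: x % 2 == 0)
    .select(lambda x: x * x)
)
# Execution happens only here, partition by partition.
table = query.to_table()
print(table)
```

The output stays partitioned, which corresponds to the idea that data objects become distributed partitioned files rather than one local collection.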

4. References
    4.1.        DryadLINQ
DryadLINQ Research Home Page
DryadLINQ Tutorial
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Some sample programs written in DryadLINQ

    4.2.        Dryad
Dryad Home Page
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

    4.3.        LINQ
LINQ: .NET Language-Integrated Query
Language-Integrated Query (LINQ)
Running Queries On Multi-Core Processors
