Desktop Grids by liuhongmeiyes


									                        Desktop Grids

Entropia Desktop Grid

    Commercial Desktop Grid System, designed for enterprise computing

    Windows-only system
    Out of business


Berkeley Open Infrastructure for Network Computing (BOINC)

    Example of a desktop grid and volunteer computing
    Open source, with clients for many platforms (Linux, Windows, MacOS).
    Servers run mostly on *nix systems, with some on Windows

      Outgrowth of the BOINC project

      Adds support for connecting local grids into a hierarchy of
                   Desktop Grids
    C# based grid system. Windows only. Closed source
Why desktop grids?

      $550: AMD Triple Core, 4GB memory, 320 GB hard drive

      $800: Intel Quad core processor, 5GB memory, 640 GB hard drive

      $1000: Core2 Quad Processor, 6GB memory, 750 GB hard drive

    Computers are mostly idle

    Return on investment
        Companies have already spent the money on desktops. No
        need for expensive new hardware if you can harness the
        power of the desktops

    Numbers of desktops worldwide expected to exceed 1 billion by
Requirements for desktop grid systems

  Efficient: Harvest most of the unused cycles on a computer.

  Robust: Jobs must complete with predictable performance,
  masking underlying resource failures. Must tolerate job, machine,
  and network failures.

  Secure: Protect both the integrity of the distributed application
  and the integrity of the physical node (no tampering
  with the user's registry, modifying the user's data, etc.)

  Scalable: Must scale to 1000s, 10,000s, 100,000s of machines.
  Not much additional administration should be required. Must scale
  up and down, depending on who's on and who's off the network.

  Manageable: Can't depend on dozens of admins to manage the

  Unobtrusive: The desktop user should not notice that their computer
  is also being used as a node in a desktop grid. The primary use
  of the computer is for spreadsheets, word processing, email, internet, etc.
Easy to integrate applications: Must support applications
developed with varied programming languages, models, and
tools, all with minimal development effort. This includes
binary-only applications where no source code is available,
so it cannot require recompilation of code.

Must see the same improvements in throughput as you would
see on a cluster or supercomputer.
Entropia grid architecture

  Physical Node Management

      Gathers statistics on the machine: CPU speed, memory, disk size

      Controls applications that run in the Entropia Virtual Machine

  Resource Scheduling
      Matches units of computation with the appropriate resource

      Must handle failure rates that are greater than in traditional
      clustered systems
Job Management

The end user submits a single logical job. The job
manager decomposes the logical job into individual
sub-jobs. It provides access to the status of subjobs
and aggregates the results of the sub-jobs
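
The decompose, track, and aggregate cycle described above can be sketched as follows. This is a minimal illustration, not Entropia's implementation: the class names `LogicalJob` and `SubJob`, the fixed `chunk_size` split, and the local loop standing in for remote execution are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SubJob:
    index: int
    inputs: list
    status: str = "pending"   # becomes "done" once a result is stored
    result: object = None

@dataclass
class LogicalJob:
    inputs: list
    chunk_size: int           # hypothetical knob: inputs per sub-job
    subjobs: list = field(default_factory=list)

    def decompose(self):
        """Split the inputs into fixed-size sub-jobs."""
        self.subjobs = [
            SubJob(i, self.inputs[start:start + self.chunk_size])
            for i, start in enumerate(range(0, len(self.inputs), self.chunk_size))
        ]

    def status(self):
        """Count sub-jobs by status."""
        counts = {}
        for sj in self.subjobs:
            counts[sj.status] = counts.get(sj.status, 0) + 1
        return counts

    def aggregate(self):
        """Concatenate sub-job results in their original order."""
        if any(sj.status != "done" for sj in self.subjobs):
            raise RuntimeError("not all sub-jobs have finished")
        return [r for sj in self.subjobs for r in sj.result]

job = LogicalJob(inputs=list(range(10)), chunk_size=4)
job.decompose()
for sj in job.subjobs:        # stand-in for execution on remote desktops
    sj.result = [x * x for x in sj.inputs]
    sj.status = "done"
squares = job.aggregate()
```

The end user only ever touches `job`; the sub-jobs and their ordering stay an internal detail of the job manager.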

The end user submits an application binary along with any required DLLs,
and registers any data sets that will be used and the Visual Basic scripts
that will be used to control sub-job execution.
Sub-job scheduler
    Maintains a list of clients on which it is allowed to run jobs

    Client information includes attributes of the physical nodes,
    whether they're connected, and status

    Maintains several priority queues for sub-jobs

    Each scheduler manages jobs for 1000s of clients

    Reports failed sub-jobs to the Job Manager

    Helps to cache executable and data files
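
A rough sketch of the scheduler duties listed above: a client list with attributes and status, prioritised sub-jobs, and failure reporting. The slides mention several priority queues; this sketch assumes a single heap ordered by a numeric priority instead, and the names (`SubJobScheduler`, `memory_mb`, and so on) are hypothetical.

```python
import heapq
import itertools

class SubJobScheduler:
    def __init__(self):
        self._queue = []                # heap of (priority, seq, (subjob_id, min_memory_mb))
        self._seq = itertools.count()   # tie-breaker keeps FIFO order within a priority
        self.clients = {}               # name -> attributes and status
        self.failed = []                # sub-jobs to report to the Job Manager

    def add_client(self, name, memory_mb, connected=True):
        self.clients[name] = {"memory_mb": memory_mb, "connected": connected, "busy": False}

    def submit(self, subjob_id, min_memory_mb, priority=10):
        # Lower number means higher priority
        heapq.heappush(self._queue, (priority, next(self._seq), (subjob_id, min_memory_mb)))

    def dispatch(self):
        """Assign the highest-priority sub-job to a suitable idle client."""
        if not self._queue:
            return None
        priority, seq, (subjob_id, need) = heapq.heappop(self._queue)
        for name, c in self.clients.items():
            if c["connected"] and not c["busy"] and c["memory_mb"] >= need:
                c["busy"] = True
                return subjob_id, name
        # No client fits right now: put the sub-job back unchanged
        heapq.heappush(self._queue, (priority, seq, (subjob_id, need)))
        return None

    def report_failure(self, subjob_id, client):
        self.clients[client]["busy"] = False
        self.failed.append(subjob_id)

sched = SubJobScheduler()
sched.add_client("pc-1", memory_mb=512)
sched.add_client("pc-2", memory_mb=2048)
sched.submit("a", min_memory_mb=1024, priority=5)
sched.submit("b", min_memory_mb=256, priority=1)
first = sched.dispatch()
second = sched.dispatch()
```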

Node Manager

    Centralized interface to manage all clients on the grid. Allows
    administrators to monitor, add, remove, stop, and restart clients. Also
    monitors the status of each client – issues, how much work has been
Desktop Client
  Runs sub-jobs on the machine at low priority

  Pauses jobs if desktop use is high.
  Makes sure that jobs do not get out of control. Kills any out-of-control

 Mediates access to system resources – file system, registry, and gui.
 This forms the basis for the sandboxed execution environment
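
The run, pause, or kill decision made by the desktop client could look roughly like this. The thresholds and parameter names are illustrative assumptions; the slides do not give Entropia's actual limits.

```python
def client_action(desktop_cpu_percent, job_memory_mb, job_threads,
                  busy_threshold=50.0, max_memory_mb=256, max_threads=32):
    """Decide what to do with a running sub-job (all limits are hypothetical)."""
    # A sub-job that exceeds its resource limits is considered out of control
    if job_memory_mb > max_memory_mb or job_threads > max_threads:
        return "kill"
    # The desktop user comes first: back off while they are busy
    if desktop_cpu_percent > busy_threshold:
        return "pause"
    return "run"
```

For instance, `client_action(80, 100, 4)` pauses the sub-job because the desktop is busy, while `client_action(10, 1000, 4)` kills it for exceeding the memory limit.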
Entropia Virtual Machine

  Desktop Controller is assigned a sub-job and monitors it for memory,
  CPU, I/O usage, number of processes and threads

Harvest unused cycles. All processes run at the lowest

Resource limits are configurable

Single Desktop Controller on each machine

VM Portal Thread

    Thread created when the process is started. It is hidden from
    the process
    Receives a heartbeat from the Desktop Controller. If this is lost, the
    thread terminates itself.
    This communication path also controls the pausing and resuming of
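
The heartbeat behaviour can be sketched as a small watchdog. This is only an illustration of the idea: the class name `PortalWatchdog`, the five-second timeout, and the message names are assumptions, and the clock is injectable so the example runs without waiting.

```python
import time

class PortalWatchdog:
    def __init__(self, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # hypothetical heartbeat timeout
        self.clock = clock
        self.last_beat = clock()
        self.paused = False
        self.alive = True

    def on_message(self, kind):
        """Handle 'heartbeat', 'pause', or 'resume' from the controller."""
        self.last_beat = self.clock()   # any message proves the controller is alive
        if kind == "pause":
            self.paused = True
        elif kind == "resume":
            self.paused = False

    def check(self):
        """Mark the thread as terminated if the heartbeat has been lost."""
        if self.clock() - self.last_beat > self.timeout_s:
            self.alive = False
        return self.alive

now = [0.0]                             # fake clock for the example
wd = PortalWatchdog(timeout_s=5.0, clock=lambda: now[0])
now[0] = 3.0
wd.on_message("heartbeat")
now[0] = 7.0
still_alive = wd.check()                # 4 s since the last heartbeat
now[0] = 13.0
after_loss = wd.check()                 # 10 s since the last heartbeat
```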

Sandbox Layer
Sandbox layer (continued)

Works by rewriting the import table of the binary application
so that vm.dll is the first dll loaded at runtime

When dll_main from vm.dll runs, it modifies the binary
and any other dll files to intercept system calls

Some of the Windows APIs (file system, registry, and
network) are mediated. Others (mouse, graphical user
interface, shutting down or logging off the current user) are
Device Driver Mediation

Installed as part of the EVM by the administrator.
Software interrupt handler - replaces entries in the jump
table with pointers to the EVM functions for certain system
calls (e.g. file I/O)

The Desktop Controller shares with the device driver the list of all
processes running under the EVM, so the driver knows
which processes to intercept.

A small amount of overhead is incurred for the system calls that are
intercepted. Not all calls are intercepted, only file I/O, process/thread
creation, and registry access. For example, file I/O, but not read/write

Also protects the EVM - non-sandboxed applications can't look
inside the Entropia directory
Self-modifying code is not permitted to execute in the EVM

The binary’s virtual address space is locked down.

The code portions of the address space are set as
“executable” and “non-writable”. Then the rest of the virtual
address space has its permissions set as “non-

One exception: the Java Virtual Machine, which uses a just-in-time
Sandbox layer (continued)

File System virtualization

File I/O routines are intercepted and virtualized to redirect
and restrict a subjob’s access to the filesystem

For example, a subjob believes that it is accessing a file in
the directory C:\Program Files\ when in fact it is accessing a
sandbox directory (e.g., C:\Entropia\root\C:\Program Files\).
The subjob only sees C:\Program Files\. Certain existing
directories (e.g. C:\WINNT\System\) on the desktop can be
accessed by subjobs as read-only
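
The redirection rule can be illustrated with a small path-mapping function. This is a sketch of the idea only: the function name `redirect` is hypothetical, the sandbox path is built by plain string concatenation to mirror the example on the slide, and a real interception layer would sit under the Windows file APIs rather than in Python.

```python
from pathlib import PureWindowsPath

SANDBOX_ROOT = r"C:\Entropia\root"                    # from the example above
READ_ONLY_DIRS = [PureWindowsPath(r"C:\WINNT\System")]

def redirect(path, write):
    """Map the path a sub-job asks for to the path it actually gets."""
    p = PureWindowsPath(path)
    for ro in READ_ONLY_DIRS:
        if p == ro or ro in p.parents:
            if write:
                raise PermissionError(f"{path} is read-only for sub-jobs")
            return str(p)                             # real directory, reads only
    # Everything else lands under the sandbox root
    return SANDBOX_ROOT + "\\" + str(p)

redirected = redirect(r"C:\Program Files\app\data.txt", write=True)
passthrough = redirect(r"C:\WINNT\System\kernel.dll", write=False)
```

Here `redirected` is `C:\Entropia\root\C:\Program Files\app\data.txt`, while the read of the system DLL goes to the real file.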

Registry Virtualization

Certain parts of the registry are marked as read-only, and the rest is marked
as not accessible. All writes to the registry are redirected. Updates to
a restricted registry entry will result in a copy on write.
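
A copy-on-write registry view can be sketched with an overlay dictionary. This is an assumption-laden toy: the real registry is replaced by a plain dict, and the class name `VirtualRegistry` and the prefix-based access rule are hypothetical.

```python
class VirtualRegistry:
    def __init__(self, real, readable_prefixes):
        self._real = real                        # stand-in for the real registry
        self._readable = tuple(readable_prefixes)
        self._overlay = {}                       # private copies written by the sub-job

    def read(self, key):
        if key in self._overlay:                 # the sub-job sees its own writes
            return self._overlay[key]
        if key.startswith(self._readable) and key in self._real:
            return self._real[key]               # read-only part of the real registry
        raise PermissionError(f"registry key not accessible: {key}")

    def write(self, key, value):
        # Redirected: the real registry is never modified
        self._overlay[key] = value

real = {r"HKLM\Software\Fonts\Default": "Arial", r"HKLM\Software\Secret": "x"}
reg = VirtualRegistry(real, [r"HKLM\Software\Fonts"])
reg.write(r"HKLM\Software\Fonts\Default", "Courier")
seen_by_subjob = reg.read(r"HKLM\Software\Fonts\Default")
```

The sub-job reads back `Courier`, but the desktop's own value stays `Arial`.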
Sandboxing (cont.)

    Validate binaries for execution

    A cryptographic checksum of each submitted binary
    file is created. These values are used to verify that
    the files to execute have not changed since they were

    A configuration file containing the list of sandboxed files
    and their checksums is also sent to the client to provide
    file validation. The configuration file is encrypted during
    subjob submission. It is stored on disk in encrypted form,
    and the key is securely communicated to the EVM.

    Before any application binary or dll file can be invoked the checksum
    is first validated.

    Intercept the CreateProcess call. A sandboxed application is allowed to
    launch another application only if (a) it is registered with the EVM in the
    configuration file and (b) its checksum matches.
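
The launch check in (a) and (b) can be sketched with Python's `hashlib`. The slides do not name the checksum algorithm, so SHA-256 is an assumption here, as are the function names and the plain dict standing in for the encrypted configuration file.

```python
import hashlib
import os
import tempfile

def checksum(path):
    """SHA-256 of a file, read in blocks (the algorithm is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def may_launch(path, manifest):
    """Allow a launch only if the file is registered and its checksum matches."""
    expected = manifest.get(os.path.abspath(path))
    return expected is not None and expected == checksum(path)

with tempfile.TemporaryDirectory() as d:
    binary = os.path.join(d, "app.exe")
    with open(binary, "wb") as f:
        f.write(b"original bytes")
    manifest = {os.path.abspath(binary): checksum(binary)}   # built at submission
    ok_before = may_launch(binary, manifest)
    with open(binary, "ab") as f:
        f.write(b"tampered")                                  # modify after submission
    ok_after = may_launch(binary, manifest)
    unregistered = may_launch(os.path.join(d, "other.exe"), manifest)
```

Only the first check succeeds: the modified binary fails (b) and the unregistered one fails (a).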

Prevent subjob processes from opening, with write permissions,
any binary listed in the configuration file
Application Security – protecting the jobs and data from the desktop

To start, desktop users must not have administrative privileges, and the
EVM and the subjob processes are run under a special Entropia user
account. This prevents other users from being able to look at the data of
the EVM and running subjob processes.

The Entropia sandbox keeps all data files encrypted on disk, so that their
contents are not accessible even if the disk is compromised.

In addition, the sandbox automatically monitors and checks the integrity of
a grid application’s binary, input and result files with an encrypted
checksum file.

If any tampering is found, the subjob is scheduled on another client.
BOINC Desktop Grid Architecture

BOINC Server

BOINC Client
The server
    Apache Web Server
    MySQL database
    The scheduler CGI program handles requests from clients, receiving
    completed results and sending new work to compute
    Feeder: Loads tasks from the database for the scheduler
    Transitioner: Handles state transitions of workunits and results. Generates
    results from workunits when they are first created.
    Validator: Compares results from work units (User-supplied daemon. May
    use fuzzy comparison in cases of floating-point results where there might be
    differences in precision.) If the results are valid, creates the “canonical”
    Assimilator: Processes the canonical result
    File deleter: Deletes output files
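
Since the validator is a user-supplied daemon, its comparison logic is project-specific. A minimal sketch of a fuzzy comparison and canonical-result selection might look like the following; the function names, the relative tolerance, and the quorum of two are assumptions, and this is not BOINC's validator API.

```python
import math

def results_agree(a, b, rel_tol=1e-6):
    """Fuzzy equality for two lists of floating-point outputs."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def pick_canonical(results, quorum=2, rel_tol=1e-6):
    """Return the first result that `quorum` results agree with (itself included)."""
    for candidate in results:
        matches = sum(results_agree(candidate, other, rel_tol) for other in results)
        if matches >= quorum:
            return candidate
    return None   # no agreement yet; more replicas would be needed

returned = [[1.0, 2.0], [1.0000000001, 2.0], [1.5, 2.0]]
canonical = pick_canonical(returned)
```

The first two results differ only by rounding, so they agree and `[1.0, 2.0]` becomes the canonical result; the third is an outlier.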
BOINC Server Features

   Homogeneous redundancy (sending workunits only to computers of the
   same platform -- e.g.: Win XP SP2 only.)

   Workunit trickling (sending information to the server before the
   workunit completes)

   Locality scheduling (sending workunits to computers that already
   have the necessary files and creating work on demand)
   Work distribution based on host parameters (workunits requiring 512
   MB of RAM, for example, will only be sent to hosts having at least that
   much RAM)

   Servers can be clustered for high demand applications. One server
   can handle ~1000 clients

   Sends redundant workunits to various clients to aid in validation of
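
Two of the features above, work distribution by host parameters and homogeneous redundancy, can be combined in a short sketch. The host records and function names are hypothetical and do not reflect BOINC's database schema.

```python
from collections import defaultdict

def eligible_hosts(hosts, min_ram_mb):
    """Keep only hosts with at least the RAM the workunit requires."""
    return [h for h in hosts if h["ram_mb"] >= min_ram_mb]

def replicas_same_platform(hosts, min_ram_mb, copies):
    """Pick `copies` eligible hosts that all share one platform, or None."""
    by_platform = defaultdict(list)
    for h in eligible_hosts(hosts, min_ram_mb):
        by_platform[h["platform"]].append(h["name"])
    for platform, names in by_platform.items():
        if len(names) >= copies:
            return platform, names[:copies]
    return None

hosts = [
    {"name": "a", "platform": "windows", "ram_mb": 256},
    {"name": "b", "platform": "linux", "ram_mb": 1024},
    {"name": "c", "platform": "windows", "ram_mb": 2048},
    {"name": "d", "platform": "linux", "ram_mb": 512},
]
choice = replicas_same_platform(hosts, min_ram_mb=512, copies=2)
```

Host `a` is excluded for lack of RAM, leaving only one eligible Windows host, so both replicas go to the Linux hosts `b` and `d`.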
BOINC Projects

   The BOINC runtime library is implemented in C++ and is easiest to
   use from C/C++ programs. For this approach, the software must be
   recompiled with the BOINC API for each supported platform (Linux,
   Windows, MacOS)

   Or, for legacy applications where source code is not available or for
   applications written in other languages, BOINC provides a wrapper
   which acts as a main program, managing communication with the
   BOINC client, and running the application as a subprocess.
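
The wrapper idea, a small main program that runs an unmodified application as a subprocess and reports on it, can be sketched generically. This is not the BOINC wrapper itself and uses none of its configuration; `run_wrapped` and the report fields are hypothetical, and the Python interpreter stands in for a legacy binary.

```python
import subprocess
import sys

def run_wrapped(command, timeout_s=60):
    """Run an unmodified program as a subprocess and summarise the outcome."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "ok": proc.returncode == 0,
    }

# Stand-in for a legacy binary: a one-line program run by the current interpreter
report = run_wrapped([sys.executable, "-c", "print('result: 42')"])
```

The application needs no knowledge of the grid; only the wrapper would talk to the client.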

   Digitally sign the application.

   (Optional) Register with an Account Manager, such as Grid Republic or
   BAM (BOINCStats Account Manager). This makes it easier for users to
   find a project, sign up for it, and manage their account using the BOINC
   desktop client.
BOINC Desktop Application

    Runs with the same privileges as the user who started the applications.
    Don't run as an administrator! The only other security provided is from
    the digital signatures of the downloaded applications. Data may also be
    encrypted, but users may still be able to view unencrypted data in a
    Allows users to sign up easily for new projects.

    Lets you configure your machine for the amount of workload and the
    scheduling of the workload, CPU, memory, and disk use.

    Lets you view various statistics about the projects running on your
Desktop Grid Applications

  The applications exhibit large degrees of parallelism (thousands to even
  hundreds of millions) with little or no coupling.
  May lead to the reevaluation of many existing algorithms to find novel
  uncoupled approaches.

  GIMPS (Great Internet Mersenne Prime Search): First application
  ported to the Entropia Desktop Grid

  Virtual Screening: The testing of hundreds of thousands (to millions) of
  candidate drug molecules to see if they alter the activity of a target protein
  by evaluating the binding affinity of the test molecule to a specific place on
  a protein in a process called docking. The amount of data required for
  each molecular evaluation is small - basically the atomic coordinates of the
  molecules - and the essential results are even smaller, a binding score.
Desktop Grid Applications (cont)

   Sequence Analysis: DNA or protein sequence analysis
   applications, including the BLAST, HMMER, and versions of Smith–
   Waterman programs. One sequence or set of sequences is
   compared to another sequence or set of sequences and evaluated
   for similarity. The sequence sizes vary, but each comparison is
   independent so that sets of millions of sequences (gigabytes) can
   be partitioned into thousands of slices. Each compute client
   receives a set of sequences to compare and the size of the
   database, enabling it to calculate expectation values properly for the
   final composite result.
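
The slicing scheme can be sketched as follows. The round-robin split and the use of total residue count as the "database size" are assumptions made for illustration; real tools such as BLAST have their own options for supplying the effective database size.

```python
def make_slices(sequences, n_slices):
    """Partition a sequence set; every slice carries the full database size."""
    total_residues = sum(len(s) for s in sequences)
    slices = []
    for i in range(n_slices):
        part = sequences[i::n_slices]          # simple round-robin partition
        if part:
            slices.append({"sequences": part, "database_size": total_residues})
    return slices

database = ["ACGT", "GG", "TTTAA", "C", "ACGTACGT"]
slices = make_slices(database, n_slices=2)
```

Each client sees only its own slice, but because `database_size` describes the whole set, scores computed on different clients remain comparable when merged.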

  Molecular Properties and Structure: These programs are deployed in
  a data parallel mode similar to docking, where the data-parallelism
  arises from independent molecule evaluations.
Desktop Grid Applications (cont)

  Financial Risk Management: Involves the use of Monte Carlo methods to
  evaluate a wide range of possible outcomes. The number of samples needed to
  achieve seed independence is about 10,000. Further increases in the number of
  samples, as well as increased model complexity, can increase the accuracy of
  results. Each sample simulation is independent and can be executed on a
  distinct processor.
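
Because each sample is independent, the work splits naturally into sub-jobs that each own a random stream. The sketch below assumes a toy model (standard normal outcomes) and uses distinct integer seeds as a simple stand-in for properly independent streams; a real risk model would replace `run_subjob`.

```python
import random
import statistics

def run_subjob(seed, n_samples):
    """One independent sub-job: draw n_samples outcomes from its own RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def run_job(n_subjobs, samples_per_subjob):
    """Run all sub-jobs and summarise the combined outcomes."""
    outcomes = []
    for seed in range(n_subjobs):     # each call could run on a different desktop
        outcomes.extend(run_subjob(seed, samples_per_subjob))
    return statistics.mean(outcomes), statistics.stdev(outcomes)

# 10 sub-jobs x 1,000 samples = 10,000 samples in total
mean, spread = run_job(n_subjobs=10, samples_per_subjob=1000)
```

Seeding by sub-job index also makes a failed sub-job reproducible when it is rescheduled on another client.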
Desktop Grid Performance
   SZTAKI Grid: Running averages from the previous 48 hours
        3488 Hosts
        983 Gflops avg.
        3.4 Tflops peak
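
As a quick reading of these figures, the average contribution per host and the peak-to-average ratio follow from simple division. The only assumption is the unit conversion 1 Tflops = 1000 Gflops.

```python
hosts = 3488
avg_gflops = 983.0
peak_gflops = 3.4 * 1000.0           # 3.4 Tflops expressed in Gflops

avg_per_host = avg_gflops / hosts    # average Gflops contributed per host
peak_to_avg = peak_gflops / avg_gflops

print(f"{avg_per_host:.2f} Gflops per host on average")   # 0.28
print(f"peak is {peak_to_avg:.1f}x the average")          # 3.5x
```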
Desktop Grid Performance

   BOINC (The Computational and Storage Potential of Volunteer Computing,
             David P. Anderson and Gilles Fedak)
   The BOINC client executes the Whetstone and Dhrystone benchmarks.
