FlumeJava: Easy, Efficient Data-Parallel Pipelines

GraphLab: A New Parallel Framework for Machine Learning

Carnegie Mellon
Based on slides by Joseph Gonzalez
Mosharaf Chowdhury

The Need for a New Abstraction

Data-Parallel: Map Reduce
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel: Pregel (Giraph)
• Belief Propagation
• Kernel Methods
• SVM
• Tensor Factorization
• PageRank
• Lasso
• Deep Belief Networks
• Neural Networks

GraphLab aims to support:


1.   Sparse Computational Dependencies
2.   Asynchronous Iterative Computation
3.   Sequential Consistency
4.   Prioritized Ordering
5.   Rapid Development

The GraphLab Framework
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.

Graph:
• Social network

Vertex Data:
• User profile text
• Current interest estimates

Edge Data:
• Similarity weights
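To make the data graph concrete, here is a minimal Python sketch of a graph that stores arbitrary data on every vertex and every edge, populated with the social-network example above. GraphLab itself stores C++ objects; the class `DataGraph`, its methods, and the sample profiles are hypothetical illustrations, not GraphLab's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DataGraph:
    """Hypothetical data graph: arbitrary data on every vertex and edge."""
    vertex_data: Dict[int, Any] = field(default_factory=dict)
    edge_data: Dict[Tuple[int, int], Any] = field(default_factory=dict)
    adjacency: Dict[int, List[int]] = field(default_factory=dict)

    def add_vertex(self, vid: int, data: Any) -> None:
        self.vertex_data[vid] = data
        self.adjacency.setdefault(vid, [])

    def add_edge(self, u: int, v: int, data: Any) -> None:
        # Undirected edge: store its data once under a canonical (min, max) key.
        self.edge_data[(min(u, v), max(u, v))] = data
        self.adjacency[u].append(v)
        self.adjacency[v].append(u)

    def neighbors(self, vid: int) -> List[int]:
        return self.adjacency[vid]

    def edge(self, u: int, v: int) -> Any:
        return self.edge_data[(min(u, v), max(u, v))]


# Social-network example: vertex data holds profile text and interest
# estimates; edge data holds a similarity weight.
g = DataGraph()
g.add_vertex(0, {"profile": "likes hiking and jazz", "interests": {"outdoors": 0.7}})
g.add_vertex(1, {"profile": "trail runner", "interests": {"outdoors": 0.9}})
g.add_vertex(2, {"profile": "jazz pianist", "interests": {"music": 0.8}})
g.add_edge(0, 1, {"similarity": 0.6})
g.add_edge(0, 2, {"similarity": 0.4})

print(g.neighbors(0))              # [1, 2]
print(g.edge(2, 0)["similarity"])  # 0.4
```

Keying undirected edges by `(min, max)` lets either endpoint look up the same edge data.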

Update Functions
An update function is a user-defined program that, when applied to a vertex, transforms the data in the scope of that vertex.

    label_prop(i, scope) {
      // Get neighborhood data
      (Likes[i], W_ij, Likes[j]) ← scope;
      // Update the vertex data
      Likes[i] ← Σ_{j ∈ Friends[i]} W_ij × Likes[j];
      // Reschedule neighbors if needed
      if Likes[i] changes then
        reschedule_neighbors_of(i);
    }
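A runnable Python version of `label_prop` follows. It assumes `Likes[i]` is a single score per user, that `weights[i][j]` holds W_ij for each friend j of i, and that "changes" means moving by more than a hypothetical tolerance `TOL`. The dictionary layout and the `reschedule_neighbors_of` callback are illustrative assumptions; a real scheduler would requeue the neighbors of the vertex it is given.

```python
from typing import Callable, Dict

TOL = 1e-6  # hypothetical threshold for "Likes[i] changes"


def label_prop(i: int,
               likes: Dict[int, float],
               weights: Dict[int, Dict[int, float]],
               reschedule_neighbors_of: Callable[[int], None]) -> None:
    """Update function for vertex i. Its scope is likes[i], the weights on
    i's edges, and likes[j] for every friend j of i."""
    # Get neighborhood data: W_ij for each friend j of i.
    friends = weights[i]
    # Update the vertex data: Likes[i] <- sum over friends j of W_ij * Likes[j].
    new_value = sum(w_ij * likes[j] for j, w_ij in friends.items())
    old_value = likes[i]
    likes[i] = new_value
    # Reschedule neighbors only if the vertex value actually changed.
    if abs(new_value - old_value) > TOL:
        reschedule_neighbors_of(i)


likes = {0: 0.0, 1: 1.0, 2: 0.5}
weights = {0: {1: 0.75, 2: 0.25}, 1: {0: 1.0}, 2: {0: 1.0}}
rescheduled = []  # records each vertex whose neighbors should be requeued
label_prop(0, likes, weights, rescheduled.append)
print(likes[0], rescheduled)  # 0.875 [0]
```

Calling `label_prop(0, ...)` a second time leaves `likes[0]` unchanged, so no further rescheduling is requested; this is what lets dynamic scheduling stop once values settle.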

The Scheduler
The scheduler determines the order in which vertices are updated.

[Figure: a scheduler queue holding vertices (e.g. a, b, h, i) dispatches them to CPU 1 and CPU 2, which update vertices of the graph a through k.]

The process repeats until the scheduler is empty.
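The loop described above (pop a vertex, apply the update function, let it reschedule others, stop when the queue is empty) can be sketched in Python with a FIFO scheduler that queues each vertex at most once. The update function here is PageRank, one of the algorithms listed later, in the form R(v) = 0.15 + 0.85 · Σ R(u)/outdeg(u) over in-neighbors u. `FifoScheduler`, `run`, the tiny graph, and `TOL` are hypothetical; unlike the figure's two CPUs, this sketch is single-threaded.

```python
from collections import deque
from typing import Callable, Dict, List, Set


class FifoScheduler:
    """Hypothetical FIFO scheduler: each vertex is queued at most once."""

    def __init__(self, vertices: List[int]) -> None:
        self.queue = deque(vertices)
        self.pending: Set[int] = set(vertices)

    def schedule(self, v: int) -> None:
        if v not in self.pending:  # skip vertices that are already queued
            self.pending.add(v)
            self.queue.append(v)

    def pop(self) -> int:
        v = self.queue.popleft()
        self.pending.discard(v)
        return v

    def empty(self) -> bool:
        return not self.queue


def run(scheduler: FifoScheduler, update: Callable[[int, FifoScheduler], None]) -> int:
    """Apply update functions until the scheduler is empty; return #updates."""
    count = 0
    while not scheduler.empty():
        update(scheduler.pop(), scheduler)
        count += 1
    return count


# Tiny directed graph: out_edges[u] lists the targets of u.
out_edges = {0: [1, 2], 1: [2], 2: [0]}
in_edges: Dict[int, List[int]] = {v: [] for v in out_edges}
for u, targets in out_edges.items():
    for v in targets:
        in_edges[v].append(u)

rank = {v: 1.0 for v in out_edges}
TOL = 1e-9  # hypothetical convergence threshold


def pagerank_update(v: int, sched: FifoScheduler) -> None:
    new = 0.15 + 0.85 * sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
    changed = abs(new - rank[v]) > TOL
    rank[v] = new
    if changed:  # only vertices that read rank[v] need to recompute
        for w in out_edges[v]:
            sched.schedule(w)


updates = run(FifoScheduler(list(out_edges)), pagerank_update)
print(updates, {v: round(r, 4) for v, r in rank.items()})
```

Deduplicating the queue keeps a vertex that is rescheduled by several neighbors from being updated redundantly before it is even popped.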
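
Sequential Consistency Models
– Full Consistency: an update function may write to its vertex, its adjacent edges, and its adjacent vertices (Write / Write / Write). To avoid deadlock, the locks on this scope are acquired in a canonical lock ordering.
– Edge Consistency: an update function may write to its vertex and its adjacent edges, but may only read its adjacent vertices (Read / Write / Read).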
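The slide names canonical lock ordering without showing it. Below is one common way to realize it for full consistency, sketched in Python: every vertex has its own lock, and an update acquires the locks of its whole scope in ascending vertex-ID order, so no two threads can end up waiting on each other in a cycle. The ring graph, the `visits` counter, and `full_consistency_update` are hypothetical. Edge consistency would also need reader-writer locks (shared for neighbors, exclusive for the center vertex), which Python's standard library does not provide, so it is not shown.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
from typing import Dict, List

# Ring graph of 8 vertices; each vertex carries an integer "visits" counter.
N = 8
neighbors: Dict[int, List[int]] = {v: [(v - 1) % N, (v + 1) % N] for v in range(N)}
visits = [0] * N
locks = [threading.Lock() for _ in range(N)]


def full_consistency_update(v: int) -> None:
    """Hypothetical update under full consistency: it writes to v and to all
    of v's neighbors, so it locks the whole scope first."""
    scope = sorted({v, *neighbors[v]})  # canonical order: ascending vertex id
    with ExitStack() as stack:
        for u in scope:
            stack.enter_context(locks[u])
        # Read-modify-write on every vertex in the scope; without the locks,
        # concurrent updates on overlapping scopes could lose increments.
        for u in scope:
            current = visits[u]
            visits[u] = current + 1


with ThreadPoolExecutor(max_workers=4) as pool:
    # Every vertex is updated 100 times, in an arbitrary interleaving.
    list(pool.map(full_consistency_update, [v for v in range(N) for _ in range(100)]))

print(visits)  # each vertex lies in 3 scopes x 100 updates -> 300 each
```

If threads instead locked "own vertex first, then neighbors", two adjacent updates could each hold one lock while waiting for the other's; sorting by ID removes that possibility.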
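
Consistency Through Scheduling
• Edge Consistency Model:
  – Two vertices can be updated simultaneously if they do not share an edge.
• Graph Coloring:
  – Two vertices can be assigned the same color if they do not share an edge.
  – Hence all vertices of one color can be updated in parallel without violating edge consistency.

[Figure: execution proceeds in Phase 1, Phase 2, and Phase 3, one color per phase, with a barrier after each phase.]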
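The phase-and-barrier scheme above can be sketched in Python: color the graph greedily, then run one phase per color, updating that color's vertices in parallel and waiting for all of them before the next phase. Greedy coloring is a swapped-in choice for illustration, and `greedy_coloring`, `run_phases`, `average_update`, and the example graph are hypothetical; the update writes only its own vertex and reads its neighbors, matching the edge-consistency scope.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def greedy_coloring(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color: Dict[int, int] = {}
    for v in sorted(adj):
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color


def run_phases(adj: Dict[int, List[int]], color: Dict[int, int],
               update: Callable[[int], None]) -> List[List[int]]:
    """One phase per color. Same-colored vertices share no edge, so they can be
    updated in parallel; consuming pool.map's results acts as the barrier."""
    phases = [sorted(v for v in adj if color[v] == c)
              for c in range(max(color.values()) + 1)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for phase in phases:
            list(pool.map(update, phase))  # blocks until every update finishes
    return phases


adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
value = {v: float(v) for v in adj}


def average_update(v: int) -> None:
    # Writes only value[v]; reads only v's neighbors (edge-consistency scope).
    value[v] = sum(value[u] for u in adj[v]) / len(adj[v])


color = greedy_coloring(adj)
phases = run_phases(adj, color, average_update)
print(color)   # {0: 0, 1: 1, 2: 2, 3: 0, 4: 1}
print(phases)  # [[0, 3], [1, 4], [2]]
```

Because no two vertices in a phase read each other's data, the parallel run produces the same values as a sequential run in the order 0, 3, 1, 4, 2, without any per-vertex locks.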

Algorithms Implemented
•   PageRank
•   Loopy Belief Propagation
•   Gibbs Sampling
•   CoEM
•   Graphical Model Parameter Learning
•   Probabilistic Matrix/Tensor Factorization
•   Alternating Least Squares
•   Lasso with Sparse Features
•   Support Vector Machines with Sparse Features
•   Label Propagation
•   …
The Table

						