FlumeJava Easy_ Efficient Data-Parallel Pipelines
Shared by: pptfiles (posted: 2/26/2013, language: English, pages: 12)


GraphLab
A New Parallel Framework for Machine Learning
Carnegie Mellon
Based on Slides by Joseph Gonzalez
Mosharaf Chowdhury
The Need for a New Abstraction
Data-Parallel (Map Reduce):
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics
Graph-Parallel (Pregel, Giraph):
• Belief Propagation
• Kernel Methods
• SVM
• Tensor Factorization
• PageRank
• Lasso
• Deep Belief Networks
• Neural Networks
GraphLab wants to support
1. Sparse Computational Dependencies
2. Asynchronous Iterative Computation
3. Sequential Consistency
4. Prioritized Ordering
5. Rapid Development
The GraphLab Framework
• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
Graph:
• Social Network
Vertex Data:
• User profile text
• Current interest estimates
Edge Data:
• Similarity weights
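As a rough illustration of the data graph, the sketch below attaches arbitrary objects to vertices and edges. GraphLab itself stores C++ objects; Python is used here only for brevity, and every name (DataGraph, VertexData, profile_text, interests, similarity) is hypothetical, chosen to mirror the social network example above.

```python
from dataclasses import dataclass, field


@dataclass
class VertexData:
    # Hypothetical fields mirroring the vertex data listed on the slide.
    profile_text: str
    interests: dict = field(default_factory=dict)  # current interest estimates


@dataclass
class DataGraph:
    vertices: dict = field(default_factory=dict)   # vertex id -> VertexData
    edges: dict = field(default_factory=dict)      # frozenset({u, v}) -> similarity weight
    adjacency: dict = field(default_factory=dict)  # vertex id -> set of neighbor ids

    def add_vertex(self, vid, data):
        self.vertices[vid] = data
        self.adjacency.setdefault(vid, set())

    def add_edge(self, u, v, similarity):
        # Undirected edge whose edge data is a similarity weight.
        # Both endpoints must already have been added with add_vertex.
        self.edges[frozenset((u, v))] = similarity
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)

    def neighbors(self, vid):
        return sorted(self.adjacency[vid])


graph = DataGraph()
graph.add_vertex("alice", VertexData("likes hiking", {"outdoors": 0.9}))
graph.add_vertex("bob", VertexData("likes cooking", {"outdoors": 0.2}))
graph.add_edge("alice", "bob", similarity=0.5)
```

Keying edges by an unordered pair is one possible choice for an undirected graph; a directed graph would use ordered tuples instead.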
Update Functions
An update function is a user-defined program that, when applied to a vertex, transforms the data in the scope of that vertex.
label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], Wij, Likes[j]) ← scope;
  // Update the vertex data
  Likes[i] ← Σ_{j ∈ Friends[i]} Wij × Likes[j];
  // Reschedule neighbors if needed
  if Likes[i] changes then
    reschedule_neighbors_of(i);
}
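The pseudocode above could be sketched in Python roughly as follows. This is not GraphLab's API: the scope is passed as three plain dictionaries (likes, weights, friends), rescheduling is modeled by returning the neighbors to revisit, and the tolerance used to decide whether Likes[i] "changes" is an assumption.

```python
def label_prop(i, likes, weights, friends, tolerance=1e-6):
    """Recompute likes[i] as the weighted sum of its friends' values.

    Returns the neighbors to reschedule (empty if likes[i] did not change
    by more than `tolerance`, which is an assumed threshold).
    """
    new_value = sum(weights[(i, j)] * likes[j] for j in friends[i])
    changed = abs(new_value - likes[i]) > tolerance
    likes[i] = new_value
    return list(friends[i]) if changed else []


# Hypothetical three-person network.
friends = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
weights = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "a"): 1.0, ("c", "a"): 1.0}
likes = {"a": 0.0, "b": 1.0, "c": 0.0}

to_reschedule = label_prop("a", likes, weights, friends)
```

Returning the vertices to reschedule, rather than calling a scheduler directly, keeps the update function independent of any particular scheduling policy.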
The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: a scheduler holding vertices a through k and dispatching them to CPU 1 and CPU 2.]
The process repeats until the scheduler is empty.
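A minimal single-worker sketch of this loop is given below, assuming a FIFO ordering; the slides show two CPUs pulling from the scheduler and list prioritized ordering as a goal, neither of which is modeled here. The names run_scheduler and take_max are hypothetical.

```python
from collections import deque


def run_scheduler(initial_vertices, update):
    """Apply `update` to scheduled vertices until the schedule is empty.

    `update(v)` returns the vertices it wants rescheduled.
    Returns the order in which vertices were updated.
    """
    queue = deque(initial_vertices)
    pending = set(queue)  # vertices currently waiting in the queue
    order = []
    while queue:
        v = queue.popleft()
        pending.discard(v)
        order.append(v)
        for u in update(v):
            if u not in pending:  # do not queue a vertex twice
                pending.add(u)
                queue.append(u)
    return order


# Hypothetical update: each vertex adopts the largest value in its scope
# and reschedules its neighbors only when its own value changed.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
value = {"a": 3, "b": 1, "c": 2}


def take_max(v):
    best = max([value[v]] + [value[u] for u in neighbors[v]])
    if best == value[v]:
        return []
    value[v] = best
    return neighbors[v]


order = run_scheduler(["a", "b", "c"], take_max)
```

Because vertices are rescheduled only when something changed, the loop stops on its own once the values have converged.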
Sequential Consistency Models
– Full Consistency [figure labels: Write, Write, Write]
Canonical Lock Ordering
– Edge Consistency [figure labels: Read, Write, Read; Read, Write]
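One common reading of canonical lock ordering is that every update acquires the locks in its scope in a single global order (here, sorted vertex id), so two overlapping updates cannot deadlock. The sketch below assumes that reading and uses only exclusive locks, which corresponds to the full consistency model; the read locks that edge consistency would place on neighboring vertices are not modeled. All function names are hypothetical.

```python
import threading


def scope_lock_order(v, neighbors):
    """Return the vertices in the scope of v in one global (sorted) order."""
    return sorted({v, *neighbors[v]})


def locked_update(v, neighbors, locks, update):
    """Acquire every lock in the scope of v in canonical order, then update."""
    order = scope_lock_order(v, neighbors)
    for u in order:
        locks[u].acquire()
    try:
        update(v)
    finally:
        for u in reversed(order):
            locks[u].release()


# Hypothetical chain 1 - 2 - 3 with overlapping scopes.
neighbors = {1: [2], 2: [1, 3], 3: [2]}
locks = {v: threading.Lock() for v in neighbors}
counts = {v: 0 for v in neighbors}


def bump(v):
    # Writes to every vertex in the scope, which is only safe under full consistency.
    for u in scope_lock_order(v, neighbors):
        counts[u] += 1


def worker(v, rounds=1000):
    for _ in range(rounds):
        locked_update(v, neighbors, locks, bump)


threads = [threading.Thread(target=worker, args=(v,)) for v in neighbors]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each counter ends up equal to 1000 times the number of scopes containing that vertex, since every write happens while the vertex's lock is held.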
Consistency Through Scheduling
• Edge Consistency Model:
– Two vertices can be updated simultaneously if they do not share an edge.
• Graph Coloring:
– Two vertices can be assigned the same color if they do not share an edge.
[Figure: execution proceeds in Phase 1, Phase 2, and Phase 3, with a barrier after each phase.]
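The link between the two statements above can be sketched as follows: color the graph, then update all vertices of one color per phase. The greedy coloring used here is a swapped-in, simple heuristic and not necessarily what GraphLab uses; the function names and the example graph are hypothetical.

```python
def greedy_coloring(neighbors):
    """Give each vertex the smallest color not used by an already colored neighbor."""
    color = {}
    for v in sorted(neighbors):
        taken = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color


def phases(color):
    """Group vertices by color; each group can be updated within one phase."""
    groups = {}
    for v, c in color.items():
        groups.setdefault(c, []).append(v)
    return [sorted(groups[c]) for c in sorted(groups)]


# Hypothetical 4-cycle a-b-c-d-a with one chord a-c.
neighbors = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c"],
}
color = greedy_coloring(neighbors)
schedule = phases(color)
```

Vertices within one phase share no edge, so they could be updated in parallel under edge consistency, with a barrier separating consecutive phases.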
Algorithms Implemented
• PageRank
• Loopy Belief Propagation
• Gibbs Sampling
• CoEM
• Graphical Model Parameter Learning
• Probabilistic Matrix/Tensor Factorization
• Alternating Least Squares
• Lasso with Sparse Features
• Support Vector Machines with Sparse Features
• Label-Propagation
• …
The Table