Distributed Nonnegative Matrix Factorization
for Web-Scale Dyadic Data Analysis on MapReduce

Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang

Internet Services Research Center (ISRC), Microsoft Research Redmond
Internet Services Research Center (ISRC)
 • Advancing the state of the art in online services
 • Dedicated to accelerating innovations in search and ad
   technologies
 • Representing a new model for moving technologies quickly
   from research projects to improved products and services
Thursday, 04/29/2010
• 10:30–12:00pm, Data Analysis & Efficiency:
  – Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce
• 1:30–3:00pm, Information Extraction:
  – Automatic Extraction of Clickable Structured Web Contents for Name Entity Queries

Friday, 04/30/2010
• 11:00–12:30pm, Query Analysis:
  – Exploring Web Scale Language Models for Search Query Processing
  – Building Taxonomy of Web Search Intents for Name Entity Queries
  – Optimal Rare Query Suggestion With Implicit User Feedback
• 1:30–3:00pm, Infrastructure 2:
  – Large-scale Bot Detection for Search Engines
       Dyadic Data on the Web
• Web abounds with dyadic data
  – Web search: term by document,
    query by clicked URL, web linkage, …
  – Advertising: query by ad, bid term by ad,
    user by ad, …
  – Social media: tag by image, user by community,
    friendship graph, …
• Common characteristics
  – Good source for discovering latent relationships
  – High dimensionality, sparse, nonnegative, dynamic
Nonnegative Matrix Factorization (NMF)
• Effective tool to uncover latent relationships in
  nonnegative matrices, with many applications [Berry et al.,
  2007; Sra & Dhillon, 2006]
   – Interpretable dimensionality reduction [Lee & Seung, 1999]
   – Document clustering [Shahnaz et al., 2006; Xu et al., 2006]

   A ≈ WH, where A is m×n, W is m×k, H is k×n,
   and A ≥ 0, W ≥ 0, H ≥ 0

• Challenge: can we scale NMF to million-by-million matrices?
NMF Algorithm [Lee & Seung, 2000]

  Factor A (m×n) into W (m×k) and H (k×n), with
  A ≥ 0, W ≥ 0, H ≥ 0, by iterating the multiplicative updates

  H ← H .* (WᵀA) ./ (WᵀWH)
  W ← W .* (AHᵀ) ./ (WHHᵀ)
      Parallel NMF [Robila & Maciak, 2006]
• Parallelism on multi-core machines
  – Partition along the long dimension for parallelism
  – Assuming all matrices can be held in shared memory
Distributed NMF
• Data partition: A, W and H are split across machines
  – A as tuples (i, j, A_ij)
  – W as rows (i, w_i)
  – H as columns (j, h_j)
Computing DNMF: The Big Picture

  H ← H .* X ./ Y,  where X = WᵀA and Y = WᵀWH

With A stored as (i, j, A_ij), W as (i, w_i), and H as (j, h_j),
one iteration chains five map/reduce stages:
• Map-I / Reduce-I: join A and W on row index i; emit (j, A_ij·w_i)
• Map-II / Reduce-II: group by column j and sum, yielding (j, x_j)
• Map-III / Reduce-III: emit (0, w_iᵀw_i) per row of W; sum them into (0, WᵀW)
• Map-IV: for each (j, h_j), compute (j, y_j) with y_j = (WᵀW)h_j
• Map-V / Reduce-V: join (j, h_j, x_j, y_j) and emit (j, h_j_new)
                                                 X W A         T


                         A : (i , j , Ai , j )   W : (i, wi )


                                          …                     …

      Map-I

(i , j , Ai , j , wi )
                                          …
  Reduce-I

 ( j , Ai , j wi )                        …
    Map-II

  ( j , Ai , j wi )                       …
  Reduce-II

     ( j, x j )                           …
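The two passes above can be simulated in memory. This is a sketch, not the authors' code: the dict lookup on `W_rows` stands in for the Reduce-I join, and the `defaultdict` accumulation stands in for the Reduce-II shuffle-and-sum.

```python
from collections import defaultdict
import numpy as np

def compute_X(A_tuples, W_rows, k):
    """X = W^T A from A:(i, j, Aij) tuples and W:(i, w_i) rows."""
    # Map-I / Reduce-I: join A and W on row index i, emit (j, Aij * w_i)
    partials = [(j, a * W_rows[i]) for i, j, a in A_tuples]
    # Map-II / Reduce-II: group the partial products by column j and sum
    X = defaultdict(lambda: np.zeros(k))
    for j, v in partials:
        X[j] += v
    return dict(X)   # x_j = sum_i Aij * w_i
```

Only nonzero entries of A appear as tuples, so the cost of this stage scales with the number of nonzeros rather than with m×n.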
                   Y  W WH        T



                            W : (i, wi )                           H : ( j, h j )

     .....                                 …                                   …
W            (i , wi )
                                          Map-III
                                                                                    Map-IV

                           (0, wiT wi )
                                                      ( j, y j )           …
                          Reduce-III
    ...                                                            Y  W TWH
                           (0,W TW )
                                                m
                         C  W TW   wiT wi
    ...                                        i 1
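A minimal in-memory sketch of this stage (function and variable names are illustrative, not from the talk): since C = WᵀW is only k×k, it fits on a single reducer and can then be broadcast to a map-only stage.

```python
import numpy as np

def compute_Y(W_rows, H_cols, k):
    """Y = (W^T W) H from W:(i, w_i) rows and H:(j, h_j) columns."""
    # Map-III: each row w_i contributes the k x k outer product w_i^T w_i
    # Reduce-III: a single reducer sums them into C = W^T W (small, in memory)
    C = np.zeros((k, k))
    for w in W_rows.values():
        C += np.outer(w, w)
    # Map-IV: map-side product with each column h_j of H: y_j = C h_j
    return {j: C @ h for j, h in H_cols.items()}
```

Computing Y as (WᵀW)H instead of Wᵀ(WH) is the key saving: it never materializes the dense m×n product WH.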
                 H  H.* X
                                  Y
                                                 H : ( j, h j )


                                                             …


                                  ( j, y j )
                                                         …


                                                                  Map-V

                        ( j, h j , x j , y j )
                                                             …
                                                                  Reduce
                                                                  -V
( j, x j )   …              ( j, hnew)
                                  j
                                                             …
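The Reduce-V step is a purely element-wise, per-column operation, which is what makes it embarrassingly parallel. A sketch (the `eps` divide-by-zero guard is an addition, not in the original):

```python
import numpy as np

def update_H(H_cols, X_cols, Y_cols, eps=1e-9):
    """Reduce-V: after joining (j, h_j, x_j, y_j) on j, apply the
    element-wise multiplicative update h_j_new = h_j .* x_j ./ y_j."""
    return {j: h * X_cols[j] / (Y_cols[j] + eps) for j, h in H_cols.items()}
```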
The Big Picture, revisited: Map-I/Reduce-I and Map-II/Reduce-II produce
X = WᵀA; Map-III/Reduce-III and Map-IV produce Y = WᵀWH; Map-V/Reduce-V
join (j, h_j, x_j, y_j) and emit the updated columns (j, h_j_new).
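Putting the five stages together, one full DNMF iteration can be simulated end-to-end in memory. This is a sketch under the partitioning described above (dict joins in place of shuffles), not the production pipeline; `eps` is an added numerical guard.

```python
from collections import defaultdict
import numpy as np

def dnmf_iteration(A_tuples, W_rows, H_cols, k, eps=1e-9):
    """One DNMF pass over A:(i, j, Aij), W:(i, w_i), H:(j, h_j)."""
    # Stages I-II: X = W^T A, accumulated column by column over nonzeros
    X = defaultdict(lambda: np.zeros(k))
    for i, j, a in A_tuples:
        X[j] += a * W_rows[i]
    # Stage III: C = W^T W as a sum of outer products (one small reducer)
    C = np.zeros((k, k))
    for w in W_rows.values():
        C += np.outer(w, w)
    # Stage IV: y_j = C h_j
    Y = {j: C @ h for j, h in H_cols.items()}
    # Stage V: h_j <- h_j .* x_j ./ y_j
    return {j: h * X[j] / (Y[j] + eps) for j, h in H_cols.items()}
```

Updating W uses the symmetric factorization A ≈ WH with the roles of W and H swapped, so the same pipeline serves both halves of an iteration.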
       Experimental Evaluation
• Synthesized data on a sandbox cluster
  – No interference from other jobs
  – Performance with various parameters


• Real-world data on a commercial cluster
  – Real-world scalability
 Synthesized Data on Sandbox Cluster
• A Hadoop cluster with 8 workers in total
  – Worker: Pentium-IV CPU, 1 or 2 cores, 1–2 GB memory, 150 GB hard drive
  – V: number of workers used in the cluster
• Matrix simulator
  – Generates an m-by-n matrix with sparsity δ
  – k: factorization dimensionality
  – Defaults: m = 2^17, n = 2^16, δ = 2^-7, k = 2^3
Computation Breakdown
• Computing X = WᵀA dominates the running time
• Computing Y = WᵀWH is lightweight
• The sparser the matrix, the faster each iteration
Performance w.r.t. Parameters
• Running time is linear in m×n×δ (the expected number of nonzeros)
• Linear in the factorization dimension k
• Sub-ideal speedup w.r.t. cluster size V
    Scalability on Real-world Data
• User-by-Website matrix
   – Browsed URLs of opt-in users, represented by UID
   – URLs trimmed to site level
      • http://www.cnn.com/breakingnews --> www.cnn.com




• Experiments on Microsoft SCOPE
   – SCOPE: Structured Computations Optimized for Parallel
     Execution [Chaiken et al., VLDB'08]
Executions w.r.t. Iterations

[Figure: normalized elapsed time vs. number of iterations;
 linear fit y = 0.7215x + 0.4226, R² = 0.9934]

• Observations
  – Longer total elapsed time
  – Shorter time per iteration
• Reason
  – Computation overlaps across iterations
Scalability w.r.t. Matrix Size

At 3 hours per iteration, 20 iterations take around 20 × 3 × 0.72 ≈ 43 hours

Less than 7 hours on a 43.9M-by-769M matrix
with 4.38 billion nonzero values
                Conclusion
• NMF is an effective tool for uncovering latent
  structures in dyadic data, which abounds on the Web
• NMF is amenable to MapReduce
• Distributed NMF solves the scalability challenge
• Applications down the road
  Q&A




Thank You!

				