PageRank Computing the PageRank The World’s Largest Eigenvalue by emadhafed


									                              Computing the                                                                            PageRank
                                        The World’s Largest Eigenvalue Problem

What is PageRank                                            The Link Structure Matrix P                                     The PageRank Algorithm

A search with Google’s search engine usually returns a            1           2            3                                The Google matrix P is currently of size 4.2×109
very large number of pages. E.g., a search on ‘weather                                                                      and therefore the eigenvalue computation is not
forecast’ returns 5.5 million pages.                                                                                        trivial. To find an approximation of the principal
                                                                                               A tiny internet              eigenvector the power method is used:
                                                                  4           5                with 5 pages.
                                                                                                                                      w0 = initial guess
Although the search returns several million pages, the                                                                                For k = 1 to 50
most relevant pages are usually found within the top        The model forming the basis of the PageRank
                                                                                                                                            wk = P*wk-1
ten or twenty pages in the list of results.                 algorithm is a random walk through all the pages of
                                                            the Internet. Let pt(x) denote the possibility of being
                                                            on page x at time t. The PageRank of page x is                            Return w50
How does the search engine know which pages are
the most important?                                         expressed as lim(pt (x)) for t → ∞. To make sure the
                                                            random walk process does not get stuck, pages with              The special properties of the matrix P ensures that the
                                                            no out-links (here: page 3) are assigned artificial links       largest eigenvalue is λ = 1, rendering normalisation in
Google assigns a number to each individual page,
                                                            or “teleporters” to all other pages.                            the power method unnecessary. Fast convergence of
expressing its importance. This number is known as the
                                                                                                                            the power method makes 50 iterations adequate.
PageRank and is computed via the eigenvalue problem
                                                            1           2           3           0 1/ 3 1/ 5 1/ 2      0   Because the computation involves an extremely large
                   Pw=λ w
                                                                                                1/ 3 0 1/ 5 0         0
                                                                                                                          matrix, the matrix-vector multiplications must be
                                                                                           P =  0 1/ 3 1/ 5 0         1
                                                                                                                          implemented in parallel on multi-processor systems.
where P is based on the link structure of the Internet.                                         1/ 3 0 1/ 5 0         0
                                                                                                1/ 3 1/ 3 1/ 5 1/ 2   0
                                                            4           5                                              

      PageRank                                              The matrix P is irreducible and stochastic and therefore
                                                            the random walk can be expressed as a Markov chain,
The key problem is to formulate the link structure, i.e.,   and the PageRank of all pages can be computed as the
the matrix P, in a proper way.                              principal eigenvector of P.

                 Informatics and Mathematical Modelling

To top