Google PageRank
How to increase your PR?
Laure Ninove, Cristobald de Kerchove and Paul Van Dooren
CESAME, Université catholique de Louvain, Belgium
Golub memorial, Feb 29, 2008
Google sorts webpages according to their PRs
Improving your PR will make your webpage more visible
The PR equations
The PR vector π is obtained by combining a random walk on the graph, taken with probability c, and a preferential zapping, taken with probability (1-c):
$\pi^T = c\,\pi^T P + (1-c)\,z^T, \qquad \pi^T e = 1$
Damping factor: c in ]0,1[
Stochastic matrix: $P = D^{-1}A$, with $D_{ii}$ = outdegree(i) and A = adjacency matrix
Personalization vector: z > 0 with $z^T e = 1$ (e = vector of all ones)
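As an illustration of these equations, here is a minimal sketch in plain Python that iterates the PR recursion (random walk with probability c, zapping with probability 1-c) until it stabilises. The 3-page graph, the uniform choice of z, the value c = 0.85 and the treatment of pages without outlinks (they zap according to z) are assumptions made for the example, not part of the talk.

```python
def pagerank(A, c=0.85, z=None, tol=1e-12, max_iter=1000):
    """Iterate pi^T <- c pi^T P + (1 - c) z^T, with P = D^{-1} A."""
    n = len(A)
    if z is None:
        z = [1.0 / n] * n  # assumption: uniform personalization vector
    outdeg = [sum(row) for row in A]
    pi = list(z)
    for _ in range(max_iter):
        new = [(1.0 - c) * zj for zj in z]
        for i in range(n):
            if outdeg[i] == 0:
                # assumption: a page without outlinks zaps according to z
                for j in range(n):
                    new[j] += c * pi[i] * z[j]
            else:
                share = c * pi[i] / outdeg[i]
                for j in range(n):
                    if A[i][j]:
                        new[j] += share
        converged = sum(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return pi


# Hypothetical 3-page graph: A[i][j] = 1 when page i links to page j.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
pi = pagerank(A)
print([round(p, 4) for p in pi])
```

In this small graph page 2 receives links from both other pages and ends up with the largest PR.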
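The Google matrix G
π is the left eigenvector, for the eigenvalue 1, of $G = cP + (1-c)\,e z^T$, where $P = D^{-1}A$: $\pi^T = \pi^T G$
G is irreducible and stochastic, therefore π is the stationary distribution of the corresponding Markov chain, of which G is the transition probability matrix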
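The properties stated for G can be checked numerically on a toy case. The sketch below builds G row by row for a hypothetical 3-page graph, then verifies that each row sums to 1, that all entries are positive (which implies irreducibility), and that the vector reached by power iteration is left unchanged by G. The graph, the uniform z, c = 0.85 and the handling of pages without outlinks are assumptions of the example.

```python
def google_matrix(A, c=0.85, z=None):
    """Return G = c P + (1 - c) e z^T as a list of rows."""
    n = len(A)
    if z is None:
        z = [1.0 / n] * n  # assumption: uniform personalization vector
    G = []
    for row in A:
        d = sum(row)
        # assumption: a page without outlinks gets the row z^T in P
        p_row = [a / d for a in row] if d else list(z)
        G.append([c * p + (1.0 - c) * zj for p, zj in zip(p_row, z)])
    return G


def left_multiply(x, G):
    """Return the row vector x^T G."""
    n = len(G)
    return [sum(x[i] * G[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-page graph: A[i][j] = 1 when page i links to page j.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
G = google_matrix(A)

pi = [1.0 / len(G)] * len(G)
for _ in range(500):  # power iteration on pi^T <- pi^T G
    pi = left_multiply(pi, G)

row_sums = [sum(row) for row in G]
residual = max(abs(a - b) for a, b in zip(left_multiply(pi, G), pi))
print(row_sums, residual)
```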
How to improve your PR?
By adding any inlink and by removing well-chosen outlinks
How to improve your PR?
Add inlinks
[Figure: example graph with your page marked "you", and its link matrix]
0 0 1 1 0 0 0 0 1
0 0 0 0 1 0 1 1 0
0 0 0 0 0 0 0 0 1
1 0 0 0 0 1 0 0 0
0 1 1 1 0 0 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 1 0 1 1 0 1 0 0
1 0 1 0 0 1 0 0 0
You have 3 inlinks. Adding new inlinks always improves your PR.
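The effect of a new inlink can be observed on a small case. The sketch below, in plain Python, recomputes the PR of one page after each page that does not yet link to it adds a link. The 4-page graph, the index chosen for "you", the uniform z and c = 0.85 are hypothetical choices; this is a numerical illustration, not a proof of the statement.

```python
def pagerank(A, c=0.85, iters=500):
    """Power iteration for pi^T = c pi^T P + (1 - c) z^T with uniform z."""
    n = len(A)
    z = [1.0 / n] * n  # assumption: uniform personalization vector
    pi = list(z)
    for _ in range(iters):
        new = [(1.0 - c) * zj for zj in z]
        for i, row in enumerate(A):
            d = sum(row)
            for j in range(n):
                # assumption: a page without outlinks zaps according to z
                p_ij = row[j] / d if d else z[j]
                new[j] += c * pi[i] * p_ij
        pi = new
    return pi


# Hypothetical 4-page graph: A[i][j] = 1 when page i links to page j.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0]]
you = 1
before = pagerank(A)[you]

gains = {}
for j in range(len(A)):
    if j != you and not A[j][you]:
        B = [row[:] for row in A]
        B[j][you] = 1  # page j adds a link to you
        gains[j] = pagerank(B)[you] - before

print(before, gains)
```

Here pages 2 and 3 are the only ones that can add an inlink, and both additions raise the PR of page 1.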
How to improve your PR?
Choose outlinks
[Figure: example graph with your page marked "you", and its link matrix]
0 0 1 1 0 0 0 0 1
0 0 0 0 1 0 1 1 0
0 0 0 0 0 0 0 0 1
1 0 0 0 0 1 0 0 0
0 1 1 1 0 0 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 1 0 1 1 0 1 0 0
1 0 1 0 0 1 0 0 0
You have 2 outlinks. Adding or removing outlinks does not always improve your PR.
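A small experiment shows why outlinks have to be chosen with care. In the hypothetical 4-page graph below, "you" (page 0) links to pages 1 and 2; page 1 links straight back to you, whereas page 2 only leads back through page 3. The sketch compares your PR with both outlinks, with only the link to page 1, and with only the link to page 2. The graph, the uniform z and c = 0.85 are assumptions of the example.

```python
def pagerank(A, c=0.85, iters=500):
    """Power iteration for pi^T = c pi^T P + (1 - c) z^T with uniform z."""
    n = len(A)
    z = [1.0 / n] * n  # assumption: uniform personalization vector
    pi = list(z)
    for _ in range(iters):
        new = [(1.0 - c) * zj for zj in z]
        for i, row in enumerate(A):
            d = sum(row)
            for j in range(n):
                # assumption: a page without outlinks zaps according to z
                p_ij = row[j] / d if d else z[j]
                new[j] += c * pi[i] * p_ij
        pi = new
    return pi


def with_outlinks(A, page, targets):
    """Copy A, replacing the outlinks of `page` by links to `targets`."""
    B = [row[:] for row in A]
    B[page] = [1 if j in targets else 0 for j in range(len(A))]
    return B


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 0, 2 -> 3, 3 -> 0.
A = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]
you = 0
pr_both = pagerank(A)[you]
pr_only_1 = pagerank(with_outlinks(A, you, {1}))[you]
pr_only_2 = pagerank(with_outlinks(A, you, {2}))[you]
print(pr_both, pr_only_1, pr_only_2)
```

Removing the link to page 2 raises your PR, while removing the link to page 1 lowers it, so the same operation can help or hurt depending on the link.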
Optimal choice of outlinks
for a single node
for a set of nodes
Optimal choice of outlinks
for a single node
The Google matrix:
you = a single node
0 1 1 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
1 1 0 0 0 0 0 0 1
1 0 0 0 0 1 0 0 0
0 1 1 1 0 0 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 1 0 1 1 0 1 0 0
1 0 1 0 0 1 0 0 0
We maximize the PR of your single webpage by choosing its outlinks; all other links of the graph are FIXED.
Optimal choice of outlinks
for a single node
Proposition: the maximal PR for node 1 is obtained by having a single outlink, pointing to a particular parent of node 1. Note, however, that some parents may be less interesting targets than nodes that are not parents of node 1.
Optimal choice of outlinks
for a single node
You must choose the nodes with smallest mean return time to your node; this can be expressed in terms of G
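The rule above can be tried out numerically. The sketch below reads "mean return time to your node" as the mean first passage time m_j from each candidate page j to yours (an interpretation), computed from the rows of G by fixed-point iteration. It then compares the page with the smallest m_j with the single outlink that gives the highest PR, and checks that your PR equals 1 / (1 + Σ_k G[you][k] m_k), the inverse of your mean return time. The 4-page graph, the uniform z, c = 0.85 and all function names are hypothetical.

```python
def google_matrix(A, c=0.85):
    """G = c P + (1 - c) e z^T with uniform z (assumption)."""
    n = len(A)
    z = [1.0 / n] * n
    G = []
    for row in A:
        d = sum(row)
        p_row = [a / d for a in row] if d else list(z)  # dangling rows: assumption
        G.append([c * p + (1.0 - c) * zj for p, zj in zip(p_row, z)])
    return G


def stationary(G, iters=500):
    """Stationary distribution of G by power iteration."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


def mean_first_passage(G, target, iters=2000):
    """Solve m_j = 1 + sum_{k != target} G[j][k] m_k for j != target."""
    n = len(G)
    m = [0.0] * n
    for _ in range(iters):
        m = [0.0 if j == target else
             1.0 + sum(G[j][k] * m[k] for k in range(n) if k != target)
             for j in range(n)]
    return m


# Hypothetical graph: 1 -> 0, 2 -> 3, 3 -> 0 and 3 -> 2.
# Row 0 (your current outlinks) does not influence m.
A = [[0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [1, 0, 1, 0]]
you = 0
n = len(A)
m = mean_first_passage(google_matrix(A), you)
candidates = [j for j in range(n) if j != you]
best_by_time = min(candidates, key=lambda j: m[j])

pr_single, return_time = {}, {}
for j in candidates:
    B = [row[:] for row in A]
    B[you] = [1 if k == j else 0 for k in range(n)]  # single outlink you -> j
    G = google_matrix(B)
    pr_single[j] = stationary(G)[you]
    return_time[j] = 1.0 + sum(G[you][k] * m[k] for k in range(n))
best_by_pr = max(candidates, key=lambda j: pr_single[j])

print(best_by_time, best_by_pr, pr_single)
```

In this example the best target is page 1, a parent of your page with a direct link back to you.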
Optimal choice of outlinks
for a set of nodes
The Google matrix:
you = a set of nodes
0 1 1 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
1 1 0 0 0 0 0 0 1
1 0 0 0 0 1 0 0 0
0 1 1 1 0 0 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 1 0 1 1 0 1 0 0
1 0 1 0 0 1 0 0 0
You have 6 outlinks. You have 5 inlinks.
We maximize the sum of the PRs of the set of nodes by choosing its outlinks; everything else in the matrix is FIXED.
Optimal choice of outlinks
for a set of nodes
Proposition: under the constraint that the set of nodes must have at least k outlinks (in order not to be dismissed by Google), the maximal sum of PRs is obtained by having exactly k outlinks.
Optimal choice of outlinks
for a set of nodes
Idea: it is always possible to remove an outlink in such a way that the sum of PRs increases.
The optimal choice depends on G and does not depend explicitly on z!
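One removal step of this kind can be sketched as a brute-force search: try each outlink leaving the set, recompute the PR, and keep the removal that most increases the sum of PRs over the set. This is a single greedy step on a hypothetical 4-page graph, not the authors' procedure and not a search for the global optimum. The set S, the uniform z, c = 0.85 and the rule of never leaving a page without outlinks are assumptions of the example.

```python
def pagerank(A, c=0.85, iters=500):
    """Power iteration for pi^T = c pi^T P + (1 - c) z^T with uniform z."""
    n = len(A)
    z = [1.0 / n] * n  # assumption: uniform personalization vector
    pi = list(z)
    for _ in range(iters):
        new = [(1.0 - c) * zj for zj in z]
        for i, row in enumerate(A):
            d = sum(row)
            for j in range(n):
                # assumption: a page without outlinks zaps according to z
                p_ij = row[j] / d if d else z[j]
                new[j] += c * pi[i] * p_ij
        pi = new
    return pi


def best_single_removal(A, S, c=0.85):
    """Try every outlink leaving S; return the removal with the best PR sum."""
    n = len(A)
    base = sum(pagerank(A, c)[i] for i in S)
    best_sum, best_link = base, None
    for i in S:
        for j in range(n):
            if j in S or not A[i][j]:
                continue  # only links from S to the rest of the graph
            if sum(A[i]) == 1:
                continue  # assumption: never leave a page without outlinks
            B = [row[:] for row in A]
            B[i][j] = 0
            s = sum(pagerank(B, c)[k] for k in S)
            if s > best_sum:
                best_sum, best_link = s, (i, j)
    return base, best_sum, best_link


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 0, 1 -> 3, 2 -> 0, 3 -> 2.
A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0],
     [0, 0, 1, 0]]
S = {0, 1}  # "you" = pages 0 and 1
base, best_sum, best_link = best_single_removal(A, S)
print(base, best_sum, best_link)
```

In this example both outlinks of the set can be removed with a gain, and removing the link from page 1 to page 3 gives the larger one.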
The optimum is not always reached by the monotonic or the locally optimal choice
[Figure legend: optimal local choice, monotonic function, random]
Concluding remarks
We modify the outlinks of a set of nodes
Results do not extend to the maximization of a weighted sum $c^T\pi$
What about modifying its internal structure?
The optimal structure is known but complex
Adding a link between 2 pages of the set can decrease the PR of one of these pages, or can even decrease the sum of their PRs!
What about Gene's graph?
The hub-authority algorithm used to rate "authorities" is nothing but the SVD …
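The link with the SVD can be made concrete on a toy graph. The sketch below runs the hub-authority iteration (hub scores h = A a, authority scores a = A^T h, then normalisation) and checks that the authority vector it reaches is an eigenvector of A^T A, that is, a right singular vector of the adjacency matrix A. The 3-page graph and the function names are hypothetical, and plain Python is used instead of a linear algebra library so that the snippet runs as is.

```python
def hits_authorities(A, iters=100):
    """Hub-authority iteration; returns unit-norm authority scores."""
    n = len(A)
    a = [1.0] * n
    for _ in range(iters):
        h = [sum(A[i][j] * a[j] for j in range(n)) for i in range(n)]  # h = A a
        a = [sum(A[i][j] * h[i] for i in range(n)) for j in range(n)]  # a = A^T h
        norm = sum(x * x for x in a) ** 0.5
        a = [x / norm for x in a]
    return a


def ata_times(A, x):
    """Return A^T A x."""
    n = len(A)
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [sum(A[i][j] * y[i] for i in range(n)) for j in range(n)]


# Hypothetical 3-page graph: A[i][j] = 1 when page i links to page j.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
a = hits_authorities(A)

v = ata_times(A, a)
sigma_sq = sum(x * x for x in v) ** 0.5  # largest squared singular value, since |a| = 1
residual = max(abs(vj - sigma_sq * aj) for vj, aj in zip(v, a))
print(a, sigma_sq, residual)
```

For this graph the dominant eigenvalue of A^T A is (3 + √5)/2, and page 2, which has two inlinks, gets the highest authority score.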