Internet Information 2006/2007
Web retrieval (3)
March 5, 2007
Valentin Jijkoun, Maarten de Rijke
ISLA, University of Amsterdam
http://ilps.science.uva.nl/Teaching/II0607

PageRank and HITS
✓ PageRank: repeated here for comparison only
HITS: hypertext induced topic search

PageRank
[Figures: PageRank example (1) to (5)]

HITS
Idea due to Kleinberg [1998]
There are two kinds of web pages: authorities and hubs
Authorities are web pages to which many hubs point
Hubs are web pages that point to many authorities
A web page is an authority or a hub to a certain degree
The degree is computed recursively
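
The recursive definition can be written as a pair of mutually dependent sums. This is a sketch only: the symbols a(p) and h(p) for the authority and hub degree of a page p are assumptions made here, and E is the link set of the web graph G = (V, E) used in these slides.

```latex
\begin{align*}
a(p) &= \sum_{q \,:\, (q,p) \in E} h(q)
  && \text{(authority: sum over the hubs pointing to } p\text{)} \\
h(p) &= \sum_{q \,:\, (p,q) \in E} a(q)
  && \text{(hub: sum over the authorities } p \text{ points to)}
\end{align*}
```

Each degree depends on the other, so the values are typically obtained by starting from uniform scores and applying both sums repeatedly, normalizing after each round.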
HITS (2)
Computing hubs and authorities
Given a query, perform regular term-based retrieval
The set W contains the top t (~200) pages
Expand the set W to the set S by adding all pages that link to, or are linked from, a page in W
Restriction: a page in W cannot add more than m (~50) pages
(Delete domain-internal links)
Put the remaining links into E and the pages of S into V, and we have a (small) web graph G = (V, E)
[Figure: the set W and its expansion S]
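
The construction steps can be sketched in Python. This is a minimal illustration, not the course's implementation: the function name `build_hits_graph` and the dictionaries `out_links` and `in_links` (stand-ins for a real link index) are hypothetical, and "domain" is assumed to mean the host part of the URL.

```python
from urllib.parse import urlparse


def build_hits_graph(root_pages, out_links, in_links, m=50):
    """Expand the root set W into S and return the web graph (V, E).

    root_pages: the top-t retrieval results (the set W), a list of URLs.
    out_links / in_links: hypothetical dicts mapping a URL to the URLs it
        links to / is linked from.
    m: maximum number of pages a single page of W may add.
    """
    def domain(url):
        # Assumption: two pages share a domain if their hosts are equal.
        return urlparse(url).netloc

    V = set(root_pages)
    for page in root_pages:
        neighbours = out_links.get(page, []) + in_links.get(page, [])
        added = 0
        for other in neighbours:
            if added >= m:
                break  # this page of W has used up its quota
            if other not in V:
                V.add(other)
                added += 1

    # Keep only links between pages of S that cross domain boundaries.
    E = set()
    for src in V:
        for dst in out_links.get(src, []):
            if dst in V and domain(src) != domain(dst):
                E.add((src, dst))
    return V, E
```

Here the cap m is applied to out-links and in-links together, following the wording of the slide; other variants restrict only the in-links.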

Flashback
Bibliometric analysis
Citation analysis
Citations generate “links”
Two key notions
Co-citation: if papers i and j are both cited by k, they are said to be co-cited by k (~authority)
Bibliographic coupling: if papers i and j both cite paper k, there is a bibliographic coupling between them (~hub)

HITS algorithm
Compute iteratively the hub score and the authority score for each page in V
Rank the documents with respect to their authority score
The original HITS algorithm identifies authorities per query
Alternative: compute authorities globally (T = whole collection)
Three shortcomings of the HITS algorithm
... ... ...
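
The iterative computation can be sketched as follows. This is an illustrative version under stated assumptions: the graph is given as a set of pages V and a set of (source, target) links E, the scores start at 1, a fixed number of rounds is used instead of a convergence test, and the function names are made up for this example.

```python
import math


def normalize(scores):
    """Scale the scores so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: (v / norm if norm > 0 else 0.0) for p, v in scores.items()}


def hits(V, E, iterations=50):
    """Iteratively compute hub and authority scores on the graph (V, E)."""
    hub = {p: 1.0 for p in V}
    auth = {p: 1.0 for p in V}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking to p.
        new_auth = {p: 0.0 for p in V}
        for src, dst in E:
            new_auth[dst] += hub[src]
        # Hub score: sum of the updated authority scores of the pages p links to.
        new_hub = {p: 0.0 for p in V}
        for src, dst in E:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


def rank_by_authority(auth):
    """Return the pages ordered by decreasing authority score."""
    return sorted(auth, key=auth.get, reverse=True)
```

Calling `hits(V, E)` on the query-specific graph and passing the resulting authority scores to `rank_by_authority` gives the per-query ranking; running the same code on the link graph of the whole collection corresponds to the global alternative.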
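
The two bibliometric notions from the flashback can also be computed directly from a citation matrix. The sketch below assumes a square 0/1 matrix A, given as a list of lists, with A[i][j] = 1 when paper i cites paper j; the matrix and the function names are illustrative and not taken from the slides.

```python
def co_citation(A):
    """C[i][j] = number of papers k that cite both i and j (entries of A^T A)."""
    n = len(A)
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bibliographic_coupling(A):
    """B[i][j] = number of papers k cited by both i and j (entries of A A^T)."""
    n = len(A)
    return [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Co-citation looks at incoming citations, as the authority score does, while bibliographic coupling looks at outgoing citations, as the hub score does. This is one way to read the ~authority and ~hub annotations.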

Integration into the retrieval model
Page content
Page structure: layout, positional information
Page ranking (PageRank or HITS): uses link structure
Site structure: “slash counting”
Age of page
“Physical location” of page: uses positional information about the user
Observed user behavior

PageRank and HITS
✓ PageRank: repeated here for comparison only
✓ HITS: hypertext induced topic search