PageRank and HITS

Internet Information 2006/2007
Web retrieval (3)
March 5, 2007

Valentin Jijkoun, Maarten de Rijke
ISLA, University of Amsterdam
http://ilps.science.uva.nl/Teaching/II0607

Outline
  - PageRank: repeated here for comparison only
  - HITS: hypertext induced topic search

PageRank
  - PageRank example (1) to (5) [example figures not preserved in this transcript]

HITS
  - Idea due to Kleinberg [1998]
  - There are two kinds of web pages:
      - Authorities
      - Hubs
  - Authorities are web pages to which many hubs point
  - Hubs are web pages that point to many authorities
  - A web page is an authority or a hub to a certain degree
  - The degree is computed recursively

HITS (2): Computing hubs and authorities
  - Given a query, perform regular term-based retrieval
      - The set W contains the top t (~200) pages
  - Expand the set W to the set S by adding all pages that link to, or are linked from, pages in W
      - Restriction: a page in W cannot add more than m (~50) pages
  - (Delete domain-internal links)
  - Put the remaining links into E and the pages of S into V, and we have a (small) web graph G = (V, E)
  [Figure: the sets W and S]

Flashback
  - Bibliometric analysis
      - Citation analysis
      - Citations generate “links”
  - Two key notions
      - Co-citation: if papers i and j are both cited by k, they are said to be co-cited by k (~authority)
      - Bibliographic coupling: if papers i and j both cite paper k, there is a bibliographic coupling between them (~hub)

HITS algorithm
  - Compute iteratively the hub score and the authority score for each page in V
  - Rank the documents with respect to their authority score
  - The original HITS algorithm identifies authorities per query
      - Alternative: compute authorities globally (T = whole collection)
  - Three shortcomings of the HITS algorithm
      - ...
      - ...
      - ...
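The set expansion and the iterative hub/authority computation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the course. The function names (expand_root_set, normalise, hits), the dictionary-based link tables, the fixed iteration count and the Euclidean normalisation are all choices made here, and the toy link data is invented. Deleting domain-internal links is left out for brevity.

```python
from math import sqrt


def expand_root_set(root, out_links, in_links, m=50):
    """Grow the root set W into the base set S (hypothetical helper).

    out_links[p] lists pages that p links to; in_links[p] lists pages linking to p.
    Each page in W may add at most m new pages.
    """
    base = set(root)
    for page in root:
        added = 0
        neighbours = list(out_links.get(page, [])) + list(in_links.get(page, []))
        for neighbour in neighbours:
            if added >= m:
                break
            if neighbour not in base:
                base.add(neighbour)
                added += 1
    return base


def normalise(scores):
    """Scale scores to unit Euclidean length (an assumed normalisation)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {p: v / norm for p, v in scores.items()}


def hits(pages, edges, iterations=50):
    """Return (hub, authority) scores for the graph G = (pages, edges)."""
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages pointing to the page.
        new_auth = {p: 0.0 for p in pages}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it points to.
        new_hub = {p: 0.0 for p in pages}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalise(new_auth)
        hub = normalise(new_hub)
    return hub, auth


# Toy link data, invented for illustration only.
out_links = {"h1": ["a1", "a2"], "h2": ["a1"]}
in_links = {"a1": ["h1", "h2"], "a2": ["h1"]}

base = expand_root_set(["h1", "a1"], out_links, in_links)
edges = [(s, d) for s in base for d in out_links.get(s, []) if d in base]
hub, auth = hits(base, edges)

# Rank the pages by authority score, highest first.
ranking = sorted(base, key=auth.get, reverse=True)
print(ranking)
```

In this toy graph the page pointed to by both hubs ends up with the highest authority score, and the page pointing to both authorities ends up with the highest hub score, which matches the mutually recursive definition of hubs and authorities.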
Integration into the retrieval model
  - Page content
  - Page structure
      - Layout
      - Positional information
  - Page ranking (PageRank or HITS)
      - Uses link structure
  - Site structure
      - “Slash counting”
  - Age of page
  - “Physical location” of page
      - Uses positional information about the user
  - Observed user behavior
