Keyword Search in Databases using PageRank

By Michael Sirivianos
April 11, 2003

Roadmap
• PageRank: Ranking Web Pages using the link structure of the web
• Ranking Keyword Search Results in Structured Databases
  • Ranking by Combining Individual PageRanks

PageRank (1)
• Stanford project
• Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. “The PageRank Citation Ranking: Bringing Order to the Web”.
• Started Google

PageRank (2)
• Makes use of the link structure of the web to calculate a quality ranking (PageRank) for each web page.
• Citation counting is a metric for measuring page/paper quality.
• PageRank is a more sophisticated citation counting method, not prone to manipulation.
• Each page has a unique PageRank, independent of the keyword query.
• PageRank does NOT express the relevance of a page to a query.

PageRank (3)
• Calculation intuition: the PageRank of page P increases when pages with large PageRanks point to P.
• The rank of a page is evenly distributed among its forward links.
• A problem: when two pages form a loop by pointing to each other but to no other page, the loop accumulates rank in every iteration and never distributes it. This is called a rank sink.

PageRank is a Usage Simulation
“Random surfer”:
• Is given a random URL
• Clicks randomly on links
• After a while gets bored and gets a new random URL
The number of visits to each page is its PageRank.

PageRank Calculation
PR(A) = (1-d) + d*( PR(T1)/C(T1) + … + PR(Tn)/C(Tn) )
• d: damping factor, normally set to 0.85.
• T1, …, Tn: pages pointing to page A.
• PR(A): PageRank of page A.
• PR(Ti): PageRank of page Ti.
• C(Ti): the number of links going out of page Ti.
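The formula above can be applied repeatedly until the values stop changing. The following is a minimal Python sketch of that iteration, not the author's implementation. The function names, the starting value of 1 for every page, and the stopping rule based on epsilon are assumptions made for illustration. The usage lines apply it to a four-page graph whose edges (A→B, A→C, B→C, C→A, D→C) are inferred from the contributions listed in the example slides; that edge list is an assumption, since the slides show the graph only as a figure.

```python
def pagerank_step(pr, out_links, d=0.85):
    """One synchronous update of PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)).

    pr:        dict page -> current PageRank
    out_links: dict page -> list of pages it links to
               (every linked page must also be a key of out_links)
    """
    new_pr = {page: 1.0 - d for page in pr}
    for source, targets in out_links.items():
        if not targets:
            continue  # a page without forward links passes on nothing
        share = d * pr[source] / len(targets)  # rank is split evenly
        for target in targets:
            new_pr[target] += share
    return new_pr


def pagerank(out_links, d=0.85, epsilon=1e-9, max_iterations=1000):
    """Iterate from PR = 1 for every page until no value moves by epsilon."""
    pr = {page: 1.0 for page in out_links}
    for _ in range(max_iterations):
        new_pr = pagerank_step(pr, out_links, d)
        if max(abs(new_pr[p] - pr[p]) for p in pr) < epsilon:
            return new_pr
        pr = new_pr
    return pr


# Assumed edge list for the four-page example graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

first = pagerank_step({page: 1.0 for page in links}, links)
# first is approximately {'A': 1.0, 'B': 0.575, 'C': 2.275, 'D': 0.15}

converged = pagerank(links)
```

Each step reads only the values from the previous step, which is how the example slides compute their numbers.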
Note: the damping factor d accounts for rank sinks.

Example of Calculation (1)
[Figure: link graph over Page A, Page B, Page C and Page D]

Example of Calculation (2)
[Figure: every page starts with PageRank 1. Page A passes 1*0.85/2 along each of its two links; Pages B, C and D each pass 1*0.85 along their single link.]

Example of Calculation (3)
Each page has not passed on 0.15, so we get:
• Page A: 0.85 (from Page C) + 0.15 (not transferred) = 1
• Page B: 0.425 (from Page A) + 0.15 (not transferred) = 0.575
• Page C: 0.85 (from Page D) + 0.85 (from Page B) + 0.425 (from Page A) + 0.15 (not transferred) = 2.275
• Page D: receives none, but has not transferred 0.15 = 0.15
Result: Page A 1, Page B 0.575, Page C 2.275, Page D 0.15

Example of Calculation (4)
• Page A: 2.275*0.85 (from Page C) + 0.15 (not transferred) = 2.08375
• Page B: 1*0.85/2 (from Page A) + 0.15 (not transferred) = 0.575
• Page C: 0.15*0.85 (from Page D) + 0.575*0.85 (from Page B) + 1*0.85/2 (from Page A) + 0.15 (not transferred) = 1.19125
• Page D: receives none, but has not transferred 0.15 = 0.15
Result: Page A 2.08375, Page B 0.575, Page C 1.19125, Page D 0.15

Example - Conclusions
• More iterations lead to convergence of the PageRanks.
• Once the values converge, Page C has the highest PageRank and Page A the next highest: Page C has the highest importance in this page graph.

Base set
• In practice, when the user gets bored he tends to go to one of his bookmarked pages instead of a random one.
• These bookmarked pages constitute the base set.
• The PR formula is modified to reflect this behavior:
PR(A) = (1-d)*E + d*( PR(T1)/C(T1) + … + PR(Tn)/C(Tn) )
where E = 1 if A is in the base set, else E = 0.

Roadmap
• PageRank: Ranking Web Pages using the link structure of the web
• Ranking Keyword Search Results in Structured Databases
  • Ranking by Combining Individual PageRanks

Keyword Query
• Input: set of keywords
• Output: list of nodes ranked according to their relevance to the keywords
• Score of a result node:
  • Sum of keyword-specific PRs (OR semantics)
  • Product of keyword-specific PRs (AND semantics)

Database Schema
C(cid, name)        C: conference
Y(yid, year, cid)   Y: conference year
P(pid, title, yid)  P: paper
A(aid, name)        A: author
PP(pid1, pid2)
PA(pid, aid)
• Tuples in C, Y, P, A are objects that represent nodes in the graph.
• Primary-to-foreign-key relations represent edges in the graph.
• All connections are two-way except P-P, which goes only from a paper to the paper it cites.

Architecture
[Figure: two-stage pipeline]
• Preprocessing stage: the "Create PR index" module takes the database and the parameters d, edge weights, epsilon and threshold, and produces the PRindex table.
• Attributes of the PRindex table:
  • Keyword
  • CLOB of (id, PR) list
• Query stage: the Query Module takes the keywords and k, reads PRindex, and returns the results, a list of:
  • Node id
  • Node text
  • PR with respect to all keywords

Modified PageRank Formula
PR(A) = (1-d) + d*( weight(T1→A)*PR(T1)/C(T1) + … + weight(Tn→A)*PR(Tn)/C(Tn) ), if A has the keyword
PR(A) = d*( weight(T1→A)*PR(T1)/C(T1) + … + weight(Tn→A)*PR(Tn)/C(Tn) ), if A doesn’t have the keyword

Preprocessing stage (1)
• Load the whole database in memory
  • Create edges Hashtable (nodeId, nodeId, type of edge)
  • Create nodes Hashtable (nodeId)
  • Create text Hashtable (nodeId, text)
• For each keyword
  • Find all nodes that contain the keyword and put them in the base set.
  • Execute the PR algorithm with the base set.

Preprocessing stage (2)
• Create a descending list of (nodeid, PR) pairs.
• Store the list in a CLOB in the PRindex table, indexed by keyword.

Query Stage
• For each keyword in the input, retrieve the (id, PR) list from the database.
• Resolve the top-k ids with respect to the sum of PageRanks using Fagin’s algorithm (PODS 2001).
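The preprocessing and query stages above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the system described in the slides: the function names, the in-memory dictionary standing in for the PRindex table, the whitespace keyword matching, the starting values, and the use of epsilon as a convergence tolerance are all assumptions. The query function sums the lists by a full scan (OR semantics) rather than using Fagin's algorithm.

```python
def keyword_pagerank(edges, nodes, base_set, d=0.85, epsilon=1e-6,
                     max_iterations=200):
    """Modified PageRank: only base-set nodes receive the (1-d) term.

    edges: list of (source, target, weight) tuples
    """
    nodes = list(nodes)
    out_degree = {n: 0 for n in nodes}
    for source, _, _ in edges:
        out_degree[source] += 1          # C(T): links going out of T
    # Assumed starting point: rank 1 on base-set nodes, 0 elsewhere.
    pr = {n: 1.0 if n in base_set else 0.0 for n in nodes}
    for _ in range(max_iterations):
        new_pr = {n: (1.0 - d) if n in base_set else 0.0 for n in nodes}
        for source, target, weight in edges:
            new_pr[target] += d * weight * pr[source] / out_degree[source]
        delta = max(abs(new_pr[n] - pr[n]) for n in nodes)
        pr = new_pr
        if delta < epsilon:
            break
    return pr


def build_pr_index(edges, node_text, keywords, **params):
    """Preprocessing: one descending (node, PR) list per keyword."""
    index = {}
    for keyword in keywords:
        base_set = {n for n, text in node_text.items()
                    if keyword in text.lower().split()}
        pr = keyword_pagerank(edges, node_text.keys(), base_set, **params)
        index[keyword] = sorted(pr.items(), key=lambda item: item[1],
                                reverse=True)
    return index


def query(index, keywords, k):
    """Query stage with OR semantics: rank nodes by the sum of their PRs."""
    totals = {}
    for keyword in keywords:
        for node, pr in index.get(keyword, []):
            totals[node] = totals.get(node, 0.0) + pr
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy data: two papers and one author; p1 cites p2.
node_text = {"p1": "keyword search in databases",
             "p2": "ranking web pages",
             "a1": "alice"}
edges = [("p1", "p2", 0.7),                      # paper -> cited paper
         ("p1", "a1", 0.2), ("a1", "p1", 0.2),   # paper <-> author
         ("p2", "a1", 0.2), ("a1", "p2", 0.2)]
index = build_pr_index(edges, node_text, ["ranking", "databases"])
top = query(index, ["ranking", "databases"], k=2)
```

Storing each list already sorted in descending order is what lets a top-k algorithm stop early at query time instead of summing every entry as this sketch does.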

Fagin’s Algorithm
• Input: keyword-specific PR lists sorted in descending order.
• For each node, keep its maximum possible value: the sum of the PRs extracted for the node in the lists scanned so far, plus the PRs of the currently pointed-to entries in the other lists.
• For each node, keep its minimum value: the sum of the PRs extracted for the node so far.
• The algorithm terminates when it finds k objects whose minimum value is greater than the maximum possible value of every other node.

Conclusions
• We implemented a system for keyword search in databases using PageRank.
• It uses an index of keyword-specific Object Ranks.
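
The bound-keeping scheme described under Fagin’s Algorithm can be sketched in Python as follows. This is a hedged illustration, not the system's code: the function name top_k_sum is hypothetical, lists are scanned in lockstep one entry at a time, a node missing from a list is assumed to score 0 there, and the stopping test uses "greater than or equal" rather than strictly greater. The description appears to correspond to the sorted-access-only variant from the PODS 2001 paper, but that mapping is an assumption.

```python
def top_k_sum(lists, k):
    """Top-k nodes by summed score, reading each list only in sorted order.

    lists: one [(node, score), ...] list per keyword, sorted descending.
    Returns [(node, minimum value), ...]; on early exit the minimum value
    is a lower bound on the node's true sum.
    """
    if k <= 0:
        return []
    seen = {}                                   # node -> {list index: score}
    bottoms = [lst[0][1] if lst else 0.0 for lst in lists]
    max_depth = max((len(lst) for lst in lists), default=0)

    for depth in range(max_depth):
        for i, lst in enumerate(lists):
            if depth < len(lst):
                node, score = lst[depth]
                seen.setdefault(node, {})[i] = score
                bottoms[i] = score              # currently pointed-to value
            else:
                bottoms[i] = 0.0                # exhausted list adds nothing

        # Minimum value: what has been extracted for the node so far.
        lower = {n: sum(s.values()) for n, s in seen.items()}

        def upper(n):
            # Maximum possible value: extracted scores plus the current
            # pointer values of the lists where n has not appeared yet.
            return lower[n] + sum(b for i, b in enumerate(bottoms)
                                  if i not in seen[n])

        ranked = sorted(seen, key=lambda n: lower[n], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            kth_minimum = lower[top[-1]]
            # A node not seen anywhere yet can score at most sum(bottoms).
            best_other = max([upper(n) for n in rest] + [sum(bottoms)])
            if kth_minimum >= best_other:
                break                           # no other node can overtake

    ranked = sorted(seen, key=lambda n: sum(seen[n].values()), reverse=True)
    return [(n, sum(seen[n].values())) for n in ranked[:k]]


# Hypothetical keyword-specific lists, as they would come from the index.
list_1 = [("a", 0.9), ("b", 0.5), ("c", 0.1)]
list_2 = [("b", 0.8), ("c", 0.7), ("a", 0.05)]
result = top_k_sum([list_1, list_2], k=2)
```

With the toy lists above, node b ranks first and node a second.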
