Web data mining is a systematic approach to keyword-based and hyperlink-based web
research for gaining business intelligence. It requires analytical skills to understand
the hyperlink structure of a given website. Hyperlinks carry an enormous amount of
latent human annotation that can help infer authority automatically. If a
webmaster provides a hyperlink pointing to another website or web page, this action
is perceived as an endorsement of that page. Search engines rely heavily on
such endorsements to gauge the importance of a page and rank it higher in
organic search results.
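
To make the endorsement idea concrete, the sketch below counts how many distinct pages
link to each page in a small link graph. It is only an illustration: the function name
inlink_counts, the dictionary format, and the example.com-style page names are
assumptions made for this example, and real search engines use far more elaborate
signals than a raw count.

```python
from collections import Counter


def inlink_counts(links):
    """Count distinct inbound links per page.

    links: dict mapping a source page to the list of pages it links to
    (a hypothetical, hand-built link graph).
    """
    counts = Counter()
    for src, targets in links.items():
        # set() so that repeated links from the same page count once
        for dst in set(targets):
            if dst != src:  # a page linking to itself is not an endorsement
                counts[dst] += 1
    return counts


# Hypothetical link graph used only for illustration
example_links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["c.example"],
}

counts = inlink_counts(example_links)
print(counts.most_common())
```

Here c.example receives endorsements from two other pages and a.example from one, so a
purely count-based ranking would place c.example first.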

However, not every hyperlink is an endorsement, since the webmaster
may have used it for other purposes, such as navigation or paid
advertisements. It is also important to note that authoritative pages rarely provide
informative self-descriptions. For instance, Google’s homepage may not explicitly
describe itself as a “Web search engine.”

These features of hyperlink systems have led researchers to consider another
important category of web page, called hubs. A hub is an informative web page
that offers a collection of links to authorities. It may contain only a few outbound
links, but those links point to a collection of prominent sites on a single topic. A hub
thus confers authority status on sites that focus on a single topic. Typically, a
quality hub points to many quality authorities and, conversely, a web page that
many such hubs link to can be deemed a superior authority.
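
This mutually reinforcing relationship between hubs and authorities is the idea behind
the HITS algorithm. The following is a minimal sketch of that iteration, not a
production implementation: the function name hits, the fixed iteration count, and the
page names in example_links are all assumptions chosen for illustration.

```python
import math


def hits(links, iterations=50):
    """Iteratively compute hub and authority scores.

    links: dict mapping a source page to the list of pages it links to
    (a hypothetical, hand-built link graph).
    Returns two dicts: (hub_scores, authority_scores).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to it
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of the pages it links to
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }

        # Normalize so the scores do not grow without bound
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical link graph used only for illustration
example_links = {
    "hub1": ["siteA", "siteB", "siteC"],
    "hub2": ["siteA", "siteB"],
    "blog": ["siteA"],
}

hub, auth = hits(example_links)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

In this toy graph, siteA is linked to by every other page and ends up with the highest
authority score, while hub1, which links to the most authorities, ends up with the
highest hub score.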

This approach to identifying authoritative pages has resulted in the development of
various popularity algorithms such as PageRank. Google uses the PageRank algorithm to
define the authority of each webpage for a relevant search query. By analyzing hyperlink
structures and web page content, these search engines can render better-quality
search results than term-index engines such as Ask and topic directories such as

