Google_s_Query_Refinements by zhucezhao

VIEWS: 3 PAGES: 3

									Google’s Query Refinements

Word Count:
1096

Summary:
Search engines aim to provide the most relevant results in response to
queries but limitations can be seen on what is actually returned based on
the queries used. Search queries can either be too specific or too
general for search engines to recognize good results. Google has filed
patent applications regarding alternative query terms or query
refinements to offer a solution.

The Google Solution

Search queries that are not too effective in providing good results
include...


Keywords:
google, internet, online business, seo


Article Body:
Search engines aim to provide the most relevant results in response to
queries but limitations can be seen on what is actually returned based on
the queries used. Search queries can either be too specific or too
general for search engines to recognize good results. Google has filed
patent applications regarding alternative query terms or query
refinements to offer a solution.

The Google Solution

Search queries that are not too effective in providing good results
include homonyms which are words that have the same sound or spelling but
different meanings. Improper contexts in the choice of words can also be
very confusing especially to search engines. Very general terms provide
results that are too broad while very narrow terms can be very
restrictive and may provide non-responsive search results.

Google presents a system and method that attempts to address this
particular problem. In this system, a stored query and a stored document
are associated as a logical pairing. The pairing is assigned a weight
thus when a search query is issued, a set of search documents is
produced. There is at least one search document that matches at least one
document. Retrieval is done when the stored query and the assigned weight
associated with it matches at least one stored document. A cluster is
formed through this and scoring is done on at least one cluster relative
to at least one other cluster. At least one such scored query is
suggested as a set of query refinements.

The process starts when Google finds results by choosing the top 100
documents for clustering. During this phase, term vectors are computed
for each of the said documents which were ranked by relevance score. The
documents are matched to a stored document listed in an association
database. Alternative query terms are found by looking at associations
with queries that had been computed beforehand for the matched stored
documents.

Term vectors are also created for alternative query terms. Clusters are
created from both sets of term vectors to form groupings. Each cluster
has a calculated cluster centroid. Search queries associated with a
search document in the cluster are scored according to the distance from
this centroid and the percent of stored documents occurring in the
cluster. The best suggested query refinement contains the highest number
search query terms and the most frequently seen in the documents in the
cluster.

Other clusters and query names may be created to come up with additional
suggested query refinements. Refinements are sorted by relevance scores.
Alternative queries can include negated forms of terms appearing in the
set of refinements but does not appear on the original search query. A
number of predetermined search queries selected from past user queries
can be used to arrive at a precomputed possible set of refinements. The
predetermined queries would be issued while search results are maintained
in a database for future user search requests. The refined queries would
be provided to the user together with the results of the original search.

The precomputation stage happens before any query is entered into the
search engine. It is best described with the use of at least four parts –
associator, selector, regenerator and inverter.

The associator creates relevance-weighted relationships between stored
queries and stored documents. The selector decides which stored documents
and stored queries should be retrieved. The regenerator looks at query
logs and selects stored documents based on previous searches. The
inverter looks at the cached data and selects documents and associated
queries based on the cached data.

The query refinements system itself has four parts. A matcher matches one
or more stored documents to the actual search documents which have been
generated by the search engine to answer a search query. It also
identifies the stored queries and assigned weights using the associations
corresponding to the matched stored documents. A clusterer forms one or
more clusters using term vectors formed from the terms occurring in the
matched stored queries and corresponding weights. The scorer computes
centroids which represent the weighted center of each cluster’s term
vector. A presenter identifies the highest scoring search queries as one
or more query refinements to the user. The interesting aspect about this
approach is how user data is incorporated into results through the use of
log files and cached information.

The patent application shows one way of achieving query refinements but
no one really knows for sure exactly how Google comes up with alternative
results. However, it offers some hints on how to create contents on
websites and how to show up in these alternative results. By taking into
careful consideration the words that people will probably search for and
what appears in Google’s results for search phrases, a clue can be
provided on how the search refinements approach will treat a website.

Multi-Stage Query Processing

The determination of page relevancy in responding to queries from
searchers considers how a term or phrase is used in the context of a
page. A patent application that looks into the possible ways of
considering the context of these words was likewise submitted by Google.
It describes a multi-stage process that determines relevancy and finds
results to a search.

The possible actions to be taken as described in this document can be
divided into stages. The first stage deals with deletion of stop words,
term stemming and expansion of queries to use things like synonyms and
related terms that commonly co-occur with them. During this stage, the
relevancy scores are created between query and each document computed
with one or more scoring algorithms. The second stage uses adjacency and
proximity of terms to rank documents. The third stage reviews the term
attributes such as determining whether terms are titles, headings,
metadata or whether these terms possess certain font characteristics. The
fourth and last stage is the generation of snippets to return with
results.

Interactive query refinements have shown that it can promote effective
retrieval. Major search engines use the history of a user’s actions such
as queries or clicks to personalize search results. The query-specific
web recommendations (QSRs) retroactively answer queries from the user’s
history as new results arise. Its main goal is to recommend new web pages
for user’s old queries. However, this will not be of any use unless the
user has a standing interest in a particular query. Focus can also be
shifted from individual queries to query sessions which includes all
actions associated with a given initial query. A query is considered a
query refinement of the previous one if both queries contain at least one
common term.

								
To top