How Latent Semantic Indexing Is Achieved

In order to understand how Latent Semantic Indexing is
achieved, it is important to know some basic high
school math, particularly Cartesian coordinates.

Typically, before any search query can be answered, a
term-document matrix is created from pages that have
been pre-processed, so that only the words which carry
semantic meaning remain.

All formatting from the pages, including
capitalization, punctuation and extraneous markup, is
removed.

Next, conjunctions, common verbs, pronouns and
prepositions are removed. Lastly, common word endings
are stripped, and what you have left are the stem words.
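
The clean-up steps above can be sketched in a few lines of
Python. This is only a minimal illustration: the stop-word
list, the list of endings and the function names are all
made up for the example, and a real system would use much
larger word lists and a proper stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "are", "at", "but", "for", "he", "in", "is",
              "it", "of", "on", "or", "she", "the", "they", "to", "was", "with"}

# Hypothetical list of common endings, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_ending(word):
    """Remove one common ending, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Drop markup, capitalization and punctuation, then stop words and endings."""
    text = re.sub(r"<[^>]+>", " ", text)        # remove markup tags
    words = re.findall(r"[a-z]+", text.lower())  # lowercase, letters only
    return [strip_ending(w) for w in words if w not in STOP_WORDS]


stems = preprocess("<b>The Dogs</b> jumped, and the cats are jumping!")
print(stems)  # ['dog', 'jump', 'cat', 'jump']
```

Note that such a crude ending-stripper will sometimes produce
stems that are not real words; that is acceptable as long as
the same word always maps to the same stem.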

In order to plot the position of a web page, you
need to think of the page in terms of a
three-dimensional space.

Using three words instead of three axis lines, you are
able to build this picture: each word becomes one axis.
The space in which every page that contains these three
words is positioned is known as a term space.

Each page forms a vector in the space, and the vector's
direction and magnitude reflect how many times each of
the three words appears in the page.
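
As a rough illustration, here is how a page could be turned
into such a vector in Python. The three axis words and the
sample page are hypothetical, and the page is assumed to
have already been reduced to stem words.

```python
import math

# Hypothetical three stem words, one per axis of the term space.
AXES = ("dog", "cat", "jump")


def page_vector(stems):
    """Count how often each axis word occurs in a page's list of stems."""
    return tuple(stems.count(word) for word in AXES)


def magnitude(vector):
    """Euclidean length of the vector, measured from the origin."""
    return math.sqrt(sum(x * x for x in vector))


page = ["dog", "jump", "cat", "jump", "dog", "dog"]
vec = page_vector(page)
print(vec)             # (3, 1, 2)
print(magnitude(vec))  # square root of 14, about 3.74
```

The three counts are the page's coordinates, exactly like an
(x, y, z) point in Cartesian coordinates.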

With three words, it is easy to imagine what the
resulting picture may look like, and a query on those
words would turn up a good number of correct results.
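
One common way to compare a query with the pages in a term
space is to treat the query as a vector too and measure the
angle between it and each page. The text above does not name
a specific measure, so cosine similarity is an assumption
here, and the pages and counts are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical pages as (dog, cat, jump) count vectors.
pages = {
    "page_a": (3, 0, 1),
    "page_b": (0, 4, 0),
    "page_c": (1, 1, 1),
}

query = (1, 0, 1)  # a query mentioning "dog" and "jump"

# Rank pages from most to least similar to the query.
ranked = sorted(pages, key=lambda name: cosine(query, pages[name]), reverse=True)
print(ranked)  # ['page_a', 'page_c', 'page_b']
```

A page that points in nearly the same direction as the query
ranks highest, regardless of how long the page is.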

If instead every word and every page were represented,
the number of dimensions would be enormous. It is also
not practical to assume that every web page in
existence can be seen and processed. This is simply
not possible.

In practice, the term-document matrix is built only
from pages that have already been pre-processed, so
that only the words which carry semantic meaning
remain. All formatting of those pages, including
capitalization and punctuation, is removed.
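
A term-document matrix of the kind described here can be
sketched in plain Python as a table with one row per stem
word and one column per page. The page names and their
contents are hypothetical, and raw counts are used as the
cell values, which is an assumption for the example.

```python
def term_document_matrix(docs):
    """Return (terms, matrix); matrix[i][j] counts terms[i] in the j-th page."""
    names = list(docs)
    terms = sorted({stem for stems in docs.values() for stem in stems})
    matrix = [[docs[name].count(term) for name in names] for term in terms]
    return terms, matrix


# Hypothetical pages, already reduced to stem words.
docs = {
    "page_a": ["dog", "jump", "dog"],
    "page_b": ["cat", "sleep"],
    "page_c": ["dog", "cat", "jump"],
}

terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
# cat [0, 1, 1]
# dog [2, 0, 1]
# jump [1, 0, 1]
# sleep [0, 1, 0]
```

Each column of the table is the vector for one page, so the
matrix has as many dimensions as there are distinct stem
words.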



