indexing by latent semantic analysis by truth4reviews


									Indexing By Latent Semantic Analysis

Indexing by latent semantic analysis (latent semantic
indexing, or LSI) is a natural language processing
technique in vectorial semantics that analyzes the
relationships between a set of documents and the terms
they contain. It also produces a set of concepts
related to the documents and their terms.
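
A minimal sketch of the core computation follows. It
assumes Python with NumPy, raw term counts (no tf-idf
or other weighting), and a hypothetical four-document
toy corpus. It builds a term-document matrix and keeps
only the k largest singular values of its singular
value decomposition (SVD); the k retained dimensions
play the role of the concepts. The helper names
`term_document_matrix` and `lsi` are illustrative, not
part of any library.

```python
import numpy as np


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i, j] counts term vocab[i] in docs[j]."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    row = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[row[word], j] += 1
    return vocab, matrix


def lsi(matrix, k):
    """Truncated SVD: keep the k largest singular values, A ~ U_k diag(s_k) Vt_k."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted descending
    return u[:, :k], s[:k], vt[:k, :]


# Hypothetical toy corpus: two documents about cars, two about food.
docs = ["car engine repair", "automobile engine", "fruit salad", "fruit"]
vocab, A = term_document_matrix(docs)
U_k, s_k, Vt_k = lsi(A, k=2)

term_concepts = U_k * s_k               # one row per term, one column per concept
doc_concepts = (s_k[:, None] * Vt_k).T  # one row per document, one column per concept
print(vocab)  # ['automobile', 'car', 'engine', 'fruit', 'repair', 'salad']
print(np.round(doc_concepts, 3))
```

Choosing k much smaller than the number of terms is
what forces related terms to share dimensions; the
value of k is a tuning choice for each collection.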

The concept space produced by latent semantic
indexing can be used to compare documents with one
another. This supports tasks such as data clustering
and document classification.
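
Comparing documents in the concept space usually means
measuring the cosine similarity of their reduced
vectors. The sketch below assumes NumPy and a
hand-written, hypothetical count matrix for four toy
documents; `cosine` and `doc_vectors` are illustrative
names. It contrasts the similarity of two car documents
in the original term space with their similarity after
reduction to two concepts.

```python
import numpy as np

# Hypothetical toy counts: rows are terms, columns are documents d0..d3.
# d0 = "car engine repair", d1 = "automobile engine", d2 = "fruit salad", d3 = "fruit"
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [1, 0, 0, 0],  # repair
    [0, 0, 1, 1],  # fruit
    [0, 0, 1, 0],  # salad
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def doc_vectors(matrix, k):
    """Rank-k LSI document vectors: one row per document, (diag(s_k) Vt_k)^T."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:k, None] * vt[:k, :]).T


docs_k = doc_vectors(A, k=2)
raw = cosine(A[:, 0], A[:, 1])          # d0 vs d1 in the original term space
reduced = cosine(docs_k[0], docs_k[1])  # d0 vs d1 in the 2-concept space
print(f"term space: {raw:.3f}, concept space: {reduced:.3f}")
# term space: 0.408, concept space: 1.000
```

The two car documents share only the word "engine",
yet in this toy example they collapse onto the same
concept, while the food documents end up orthogonal to
them. Real collections give less extreme, partial
similarities.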

LSI can also be used to find similar documents across
languages, which is called cross-language retrieval,
and to find relations between terms, such as synonymy
and polysemy.
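
Term relations can be read off the same decomposition:
each term gets a row of U_k diag(s_k), and terms are
compared by cosine similarity. This sketch assumes
NumPy and the same kind of hypothetical toy counts;
`term_vectors` is an illustrative name. "car" and
"automobile" never appear in the same document, but
both co-occur with "engine", which is the kind of
synonymy signal LSI picks up.

```python
import numpy as np

# Hypothetical toy counts: rows are terms, columns are four documents:
# "car engine repair", "automobile engine", "fruit salad", "fruit"
terms = ["car", "automobile", "engine", "repair", "fruit", "salad"]
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def term_vectors(matrix, k):
    """Rank-k LSI term vectors: one row per term, U_k diag(s_k)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


terms_k = term_vectors(A, k=2)
car, auto = terms.index("car"), terms.index("automobile")
print(f"term space:    {cosine(A[car], A[auto]):.3f}")              # 0.000
print(f"concept space: {cosine(terms_k[car], terms_k[auto]):.3f}")  # 1.000
```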

Given a query made up of terms, LSI can translate the
query into the concept space and find matching
documents. This is commonly known as information
retrieval.
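
One common way to translate a query, usually called
"folding in", treats the query as a pseudo-document
term vector q and maps it with
q_k = diag(1/s_k) U_k^T q, then ranks documents by
cosine similarity against the rows of V_k. The sketch
assumes NumPy, raw counts, hypothetical toy data, and
illustrative helper names `fold_in` and `rank`; it also
assumes the query contains at least one known term.

```python
import numpy as np

# Hypothetical toy counts: rows are terms, columns are documents d0..d3:
# "car engine repair", "automobile engine", "fruit salad", "fruit"
terms = ["car", "automobile", "engine", "repair", "fruit", "salad"]
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T  # V_k has one row per document


def fold_in(query):
    """Map a bag-of-words query into concept space: q_k = diag(1/s_k) U_k^T q."""
    q = np.array([query.split().count(t) for t in terms], dtype=float)
    return (U_k.T @ q) / s_k


def rank(query):
    """Return (document indices best-first, cosine scores) for a query."""
    q_k = fold_in(query)  # zero vector if no query word is a known term
    sims = V_k @ q_k / (np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k))
    return [int(i) for i in np.argsort(-sims)], sims


order, sims = rank("engine repair")
print(order, np.round(sims, 3))
```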

Synonymy and polysemy are fundamental problems in
natural language processing. Synonymy is where
different words describe the same idea.

As a result, a query in a search engine may fail to
retrieve a relevant document that does not contain
the words appearing in the query. Even when different
words have the same meaning, the search may not turn
up all of the related articles.
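
The synonymy problem can be seen in a few lines. The
sketch below assumes NumPy and hypothetical toy counts,
with illustrative helpers `keyword_scores` and
`lsi_scores`. Plain keyword matching gives the document
"car engine repair" a score of zero for the query
"automobile", while scoring against the rank-k
reconstruction A_k = U_k diag(s_k) Vt_k gives it a
positive weight, because "car" and "automobile" share
the context word "engine".

```python
import numpy as np

# Hypothetical toy counts: rows are terms, columns are documents d0..d3:
# "car engine repair", "automobile engine", "fruit salad", "fruit"
terms = ["car", "automobile", "engine", "repair", "fruit", "salad"]
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)


def query_vector(query):
    """Bag-of-words count vector for a query over the fixed term list."""
    return np.array([query.split().count(t) for t in terms], dtype=float)


def keyword_scores(query):
    """Plain keyword matching: sum of query-term counts in each document."""
    return query_vector(query) @ A


def lsi_scores(query, k=2):
    """Score documents against the rank-k reconstruction A_k = U_k diag(s_k) Vt_k."""
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    A_k = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    return query_vector(query) @ A_k


# d0 = "car engine repair" never contains the word "automobile".
print("keyword:", keyword_scores("automobile"))  # [0. 1. 0. 0.]
print("LSI:    ", np.round(lsi_scores("automobile"), 3))
```

In this toy data the LSI score for d0 is about 0.447,
so the car document is no longer invisible to the
query; how large such scores get depends on k and on
the weighting scheme.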

Polysemy is where the same word has multiple meanings.
When a query is made, the search may return irrelevant
documents that contain the desired word in the wrong
sense.

LSI adds an important step to the indexing process.
In addition to recording which keywords a document
contains, LSI examines the whole collection to see
which other documents contain the same keywords.
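
Both parts of that step can be sketched with a binary
term-document incidence matrix A, assuming NumPy and a
hypothetical four-document collection. Each column
records which keywords one document contains, and entry
(i, j) of A^T A counts the keywords that documents i
and j have in common across the collection. The right
singular vectors of A are eigenvectors of A^T A, which
is one way to see how the SVD captures these shared
keyword patterns.

```python
import numpy as np

# Hypothetical toy collection.
docs = ["car engine repair", "automobile engine", "fruit salad", "fruit"]

# Step 1: record which keywords each document contains (binary incidence,
# rows are keywords in sorted order, columns are documents).
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[1.0 if w in d.split() else 0.0 for d in docs] for w in vocab])

# Step 2: look across the whole collection. shared[i, j] is the number of
# keywords that documents i and j have in common.
shared = A.T @ A
print(shared.astype(int))
# [[3 1 0 0]
#  [1 2 0 0]
#  [0 0 2 1]
#  [0 0 1 1]]
```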
