Indexing By Latent Semantic Analysis

Latent semantic indexing (LSI), or indexing by latent semantic analysis, is a natural language processing technique from vectorial semantics that analyzes the relationships between a set of documents and the terms they contain. It produces a set of concepts related to the documents and terms.

The concept space produced by the analysis has several uses. Documents can be compared in the concept space, which supports data clustering and document classification. Similar documents can be found across languages, which is called cross-language retrieval. Relations between terms can be found, which helps with synonymy and polysemy. Finally, a query made of terms can be translated into the concept space and matched against documents there, which is commonly known as information retrieval.

Synonymy and polysemy are fundamental problems in natural language processing. Synonymy is where different words describe the same idea. A search engine query may fail to retrieve a relevant document simply because the document does not contain the words that appear in the query. So even when words have the same meaning, a search may not turn up all related articles. Polysemy is where the same word has multiple meanings. A search may then return irrelevant documents that contain the desired word used in the wrong sense.

LSI adds an important step to the indexing process. In addition to recording which keywords a document contains, LSI examines the whole collection to see which other documents contain the same keywords.
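As a rough illustration of the synonymy case, here is a minimal sketch in Python with NumPy. The text above does not say how the concept space is computed. This sketch assumes the usual formulation, a truncated singular value decomposition of the term-document matrix. The four-document corpus, the raw word counts (no term weighting), the choice of k = 2 concepts, and the helper names `to_vector`, `cosine` and `lsi_scores` are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (hypothetical): two documents about vehicles, two about finance.
docs = [
    "car engine repair",
    "automobile engine repair",
    "bank loan interest",
    "bank interest rate",
]

# Vocabulary and a term -> row index lookup.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}


def to_vector(text):
    """Raw term-count vector over the vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-document matrix: one row per term, one column per document.
A = np.column_stack([to_vector(d) for d in docs])

# Truncated SVD: keep only the k strongest concepts (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]
docs_k = Vt[:k, :].T  # each row is one document in concept space


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def lsi_scores(query):
    """Fold the query into concept space and score it against every document."""
    q_k = (to_vector(query) @ U_k) / s_k
    return [cosine(q_k, d) for d in docs_k]


scores = lsi_scores("car")
for doc, score in zip(docs, scores):
    print(f"{score:+.3f}  {doc}")
```

The query "car" shares no word with "automobile engine repair", so plain keyword matching gives that document no credit. In the concept space its score comes out close to 1, because both documents co-occur with "engine" and "repair", while the two finance documents score close to 0.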