Probabilistic Latent Semantic Indexing
Probabilistic latent semantic indexing (PLSI) is an automated
document indexing technique based on a statistical latent class
model for factor analysis of count data.
It is an approach to automatic indexing and
information retrieval that overcomes the problems of literal
term matching by mapping documents and terms into a
lower-dimensional latent semantic space.
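Concretely, in Hofmann's aspect model each observed co-occurrence of a document d and a word w is associated with an unobserved topic variable z; in its symmetric parameterization the model reads:

```latex
P(d, w) \;=\; \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)
```

Documents and terms are thus represented by their conditional distributions over the latent topics z, which play the role of the dimensions of the latent semantic space.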
Although latent semantic indexing (LSI) has been applied with
considerable success in different domains, it has a number of
deficits, which stem from its lack of a sound statistical
foundation.
One typical scenario of human-machine interaction in
information retrieval is the natural language query: the user
provides a number of key words and expects the system to
retrieve all relevant articles or pages that contain them.
But such systems are not infallible. Most search engines
return a large number of unrelated results. This is usually
because a key word carries more than one meaning, or because
the same idea can be expressed with many different words.
These two problems are called polysemy and synonymy. Many of
the newer latent semantic indexing systems, however, have
eliminated much of this unwanted noise.
Many retrieval methods are based on simple word matching, and
it is well known that literal term matching has severe
drawbacks.
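To make one such drawback, synonymy, concrete, here is a minimal sketch of a purely literal matcher; the corpus, document ids, and function name are invented for illustration:

```python
# Hypothetical mini-corpus: a literal match for "car" misses the
# document that uses the synonym "automobile".
docs = {
    "d1": "the car dealership sells used cars",
    "d2": "automobile repair and maintenance tips",
    "d3": "recipes for summer salads",
}

def literal_match(query, docs):
    """Return ids of documents containing every query term verbatim."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(t in text.lower().split() for t in terms)]

print(literal_match("car", docs))  # only d1 -- d2 is relevant but unmatched
```

A latent semantic method would instead notice that "car" and "automobile" co-occur with similar contexts and place both documents near the query in the latent space.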
But newer latent semantic analysis (LSA) methods search with
greater precision and return far better results than the old
keyword-based queries did.
The standard procedure for maximum likelihood estimation in
latent variable models such as PLSI is the
Expectation-Maximization (EM) algorithm, which alternates an
expectation (E) step with a maximization (M) step.
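The EM fit for the aspect model can be sketched as follows; this is an illustrative implementation with NumPy (function name and hyperparameters are assumptions, not part of the original text), where the E-step computes the topic posterior P(z|d,w) and the M-step re-estimates the parameters from the count-weighted posteriors:

```python
import numpy as np

def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Fit the aspect model P(d,w) = sum_z P(z) P(d|z) P(w|z) by EM.

    counts: (n_docs, n_words) term-frequency matrix n(d, w).
    Returns the estimated distributions (P(z), P(d|z), P(w|z)).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Randomly initialized, row-normalized parameters.
    p_z = np.full(n_topics, 1.0 / n_topics)        # P(z)
    p_d_z = rng.random((n_topics, n_docs))         # P(d|z)
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))        # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: posterior P(z|d,w) proportional to P(z) P(d|z) P(w|z).
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        joint /= joint.sum(axis=0, keepdims=True) + 1e-12     # (z, d, w)
        # M-step: reweight the posterior by the observed counts n(d,w).
        weighted = joint * counts[None, :, :]
        totals = weighted.sum(axis=(1, 2))                    # per-topic mass
        p_d_z = weighted.sum(axis=2) / (totals[:, None] + 1e-12)
        p_w_z = weighted.sum(axis=1) / (totals[:, None] + 1e-12)
        p_z = totals / totals.sum()
    return p_z, p_d_z, p_w_z
```

Each iteration is guaranteed not to decrease the likelihood of the observed counts, which is the defining property of EM.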