Application of Vector Indexing in Information Retrieval

Darius Mahdjoubi
Don Drumtra
September 25, 2000
In the theater, how do you go to your seat?
That depends on who you are!




Vectors are not used only in math class.
They can be used in the theater as well, by “most” people!
The City of Austin is laid out on a vector system.
“Congress Bridge” is the origin of this vector system.

Congress St., like the Y axis, separates Austin into two parts: East and West.

The river, like the X axis, separates Austin into two parts: North and South.
The address of any location can be depicted as a vector: a distance along Congress Street, then a distance East or West.

Any vector can be broken down into its components as well.
In Austin, any two locations can be compared by the ANGLE between their vector addresses and by how far each one is from the Congress Bridge.

We can also think of the “neighborness” of homes in terms of their vector addresses!

If the “angle” between the vector addresses of two homes is small, then they can be neighbors!
US ZIP Codes are based on a type of Vector Representation.
Canadian Postal Codes are also a type of Vector Representation.




Note the similarities between ZIP (postal) coding and library classification schemes.
Can library classification schemes also be conceived of as a type of vector system?
Vectors are also used for retrieving information about time.




On the clock, 12 is the main axis, and we measure time by the angle of the hands.
Are we using vector indexing as an information retrieval system on the car panel?
On the car panel above, vector indexing can be mixed with fuzzy logic, as in the “temperature gauge” and the “gas tank gauge.”
 Application of Vector Indexing in Information Retrieval

In “vector indexing” for information retrieval, each “subject heading” in a document or a query is treated as one dimension of a vector system, and the relative “weight” of the subject heading is the length of the vector along that dimension. If we have N subject headings, then we need an N-dimensional vector system.
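As a minimal sketch (not the authors' method), each document or query can be stored as a list of weights, one entry per subject heading, in a fixed heading order. The heading names and the helper `to_vector` are hypothetical; a heading that does not appear in a document is assumed to get weight 0.

```python
# Hypothetical helper: turn {subject heading: weight} into an N-dimensional vector.
def to_vector(weights: dict[str, float], headings: list[str]) -> list[float]:
    """Return one weight per heading, in the fixed order given by `headings`.

    Headings missing from `weights` get weight 0 (an assumption), so every
    document and query lives in the same N-dimensional space, N = len(headings).
    """
    return [float(weights.get(h, 0.0)) for h in headings]


# Assumed ordering for N = 2 subject headings, matching the dog/cat example.
HEADINGS = ["dog", "cat"]

doc_vector = to_vector({"dog": 1, "cat": 2}, HEADINGS)
query_vector = to_vector({"cat": 2, "dog": 1}, HEADINGS)
print(doc_vector, query_vector)
```

Fixing the heading order once is what makes the i-th entry of every vector refer to the same subject heading, so documents and queries can be compared dimension by dimension.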
In the N-dimensional vector system, we can depict vectors for both the “document” and the “query.” The angle between the document vector and the query vector defines how similar they are. If the angle between the document and the query is small, they are similar. If the angle is large, they are not similar.
Recall the map of Austin, where a small angle between two addresses means the homes can be neighbors!
Example: Suppose there are five DOCUMENTs (numbered DOC 1, DOC 2, DOC 3, DOC 4 and DOC 5) and two TERMs (TERM 1 and TERM 2). The WEIGHT of each TERM in these DOCUMENTs is shown in the next table:


           TERM 1 (dog)   TERM 2 (cat)
DOC 1           1              2
DOC 2           1              1
DOC 3           0              1
DOC 4           1              0
DOC 5           2              2
If we plot DOC 1 to DOC 5 from the table as vectors, we get the next diagram.

[Diagram: DOC 1 to DOC 5 drawn as vectors; horizontal axis Term 1 (dog), vertical axis Term 2 (cat).]

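The diagram can be reproduced numerically by describing each document vector by its direction (angle from the TERM 1 axis) and its length. This is a small sketch using the weights from the table above; the names `DOCS` and `polar` are hypothetical.

```python
import math

# Weights from the table above, as (TERM 1 = dog, TERM 2 = cat) pairs.
DOCS = {
    "DOC 1": (1, 2),
    "DOC 2": (1, 1),
    "DOC 3": (0, 1),
    "DOC 4": (1, 0),
    "DOC 5": (2, 2),
}

# For each document: direction = angle from the TERM 1 (dog) axis in degrees,
# length = overall size of the weight vector.
polar = {
    name: (math.degrees(math.atan2(cat, dog)), math.hypot(dog, cat))
    for name, (dog, cat) in DOCS.items()
}

for name, (direction, length) in polar.items():
    print(f"{name}: direction {direction:5.1f} degrees, length {length:.2f}")
```

DOC 2 and DOC 5 both point at 45 degrees; DOC 5 is simply twice as long. Under the angle measure they are therefore equally similar to any query.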
Example: Suppose there are two QUERIES (numbered QUERY 1 and QUERY 2) and the same two TERMs (TERM 1 and TERM 2). The WEIGHT of each TERM in the QUERIES is shown in the next table:




           TERM 1 (dog)   TERM 2 (cat)
QUERY 1         2              1
QUERY 2         1              2

If we plot QUERY 1 and QUERY 2 from the table as vectors, we get the next diagram.

[Diagram: QUERY 1 and QUERY 2 drawn as vectors; horizontal axis Term 1 (dog), vertical axis Term 2 (cat).]

If we combine the two vector systems, we get the following diagram.

[Diagram: DOC 1 to DOC 5 and QUERY 1, QUERY 2 drawn together; horizontal axis Term 1 (dog), vertical axis Term 2 (cat).]

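The above diagram indicates that DOC 1 is fully relevant to QUERY 2: their vectors point in exactly the same direction. DOC 5 is the nearest to QUERY 1, tied with DOC 2, which points in the same direction as DOC 5.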
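To check this reading of the diagram numerically, the sketch below ranks the five documents by their angle to each query, using the weights from the two tables above. The names `DOCS`, `QUERIES`, `angle_degrees` and `rank` are hypothetical, and all vectors are assumed to be non-zero.

```python
import math

# Weights from the tables above, as (TERM 1 = dog, TERM 2 = cat).
DOCS = {"DOC 1": (1, 2), "DOC 2": (1, 1), "DOC 3": (0, 1),
        "DOC 4": (1, 0), "DOC 5": (2, 2)}
QUERIES = {"QUERY 1": (2, 1), "QUERY 2": (1, 2)}


def angle_degrees(u, v):
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def rank(query):
    """Document names sorted from smallest to largest angle with the query."""
    return sorted(DOCS, key=lambda name: angle_degrees(DOCS[name], query))


for qname, qvec in QUERIES.items():
    ranked = rank(qvec)
    print(qname, [(d, round(angle_degrees(DOCS[d], qvec), 1)) for d in ranked])
```

With these weights, DOC 1 sits at essentially 0 degrees from QUERY 2. For QUERY 1, DOC 2 and DOC 5 tie at about 18.4 degrees, followed by DOC 4 at about 26.6 degrees.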
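To quantify the similarity between a document and a query, instead of measuring the angle between their vectors directly, we can measure the cosine of the angle. When the angle is zero (the two vectors point in exactly the same direction), the cosine is 1. If the angle is 90 degrees, the cosine is 0, which means there is minimum relevance between the document and the query. This relationship can be measured by the following formula, where $d_i$ and $q_i$ are the weights of subject heading $i$ in the document and the query, and $N$ is the number of subject headings:

$$\cos\theta = \frac{\sum_{i=1}^{N} d_i\, q_i}{\sqrt{\sum_{i=1}^{N} d_i^{2}}\;\sqrt{\sum_{i=1}^{N} q_i^{2}}}$$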
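A minimal Python sketch of this cosine measure, cos θ = Σ d_i q_i / (√(Σ d_i²) · √(Σ q_i²)), applied to the example documents and QUERY 1. The function name `cosine` is hypothetical, and returning 0 for an all-zero vector is an assumption, not something the slides specify.

```python
import math


def cosine(d: list[float], q: list[float]) -> float:
    """cos(theta) = sum(d_i * q_i) / (sqrt(sum d_i^2) * sqrt(sum q_i^2))."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm_d = math.sqrt(sum(di * di for di in d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumption: an empty document or query matches nothing
    return dot / (norm_d * norm_q)


docs = {"DOC 1": [1, 2], "DOC 2": [1, 1], "DOC 3": [0, 1],
        "DOC 4": [1, 0], "DOC 5": [2, 2]}
query1 = [2, 1]

# Highest cosine = smallest angle = most similar.
for name in sorted(docs, key=lambda n: cosine(docs[n], query1), reverse=True):
    print(f"{name}: {cosine(docs[name], query1):.3f}")
```

Because the cosine decreases as the angle grows from 0 to 180 degrees, sorting by descending cosine gives the same order as sorting by ascending angle, without computing any angles. With non-negative weights, as in these examples, the cosine always falls between 0 and 1.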
Thank you!

				