Application of Vector Indexing in Information Retrieval
Darius Mahdjoubi, Don Drumtra
September 25, 2000

In the theater, how do you go to your seat? That depends on who you are!

Vectors are not used only in math class. They can be used in theaters as well, by “most” people!

The City of Austin is planned on a vector system. “Congress Bridge” is the origin of this vector system. Congress St., like the Y axis, separates Austin into two parts: East and West. The river, like the X axis, separates Austin into two parts: North and South.

The address of any location can be depicted as a vector along Congress Street and then East or West. Any vector can be broken down into its components as well.

In Austin, any two locations can be compared based on the ANGLE between their vector addresses and how far they are from the Congress Bridge.

We can also think of the “neighborness” of homes according to their vector addresses: if the angle between the vector addresses of two homes is small, then they can be neighbors!

US ZIP Codes are based on a type of vector representation. The Canadian Postal Code is also a type of vector representation.

Note the similarities between ZIP (postal) coding and library classification schemes. Can library classification schemes also be conceived as a type of vector system?

Vectors are used for information retrieval about time: on a clock face, 12 is the main axis, and we read the time from the angles of the hands.

Are we using vector indexing for information retrieval in the car instrument panel? Vector indexing can be mixed with fuzzy logic in the “temperature gauge” and “gas tank gauge” of such a panel.

Application of Vector Indexing in Information Retrieval

In “vector indexing” for information retrieval, each “subject heading” in a document or a query is treated like a dimension of a vector system, and the relative “weight” of the subject heading is like the length of the vector along that dimension.
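As a minimal sketch of this representation, the Python snippet below turns a set of weighted subject headings into a vector with one slot per heading. The heading list and the helper name `to_vector` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical subject headings; each one is a dimension of the vector system.
SUBJECT_HEADINGS = ["dog", "cat", "bird"]


def to_vector(weights, headings=SUBJECT_HEADINGS):
    """Map {heading: weight} to a list with one weight per heading.

    Headings that do not occur in the document or query get weight 0.
    """
    return [weights.get(heading, 0) for heading in headings]


# A document and a query described by the same headings (illustrative weights).
document = to_vector({"dog": 1, "cat": 2})
query = to_vector({"cat": 1})

print(document)
print(query)
```

Because both vectors are built from the same heading list, a document and a query always live in the same vector system and can be compared directly.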
If we have “N” subject headings, then we need an “N-dimensional” vector system. In the N-dimensional vector system, we can depict the vectors for both the “document” and the “query.” The angle between the vectors of the document and the query defines how similar they are. If the angle between the document and the query is small, they are similar. If the angle is big, then they are not similar. Recall the similarity with the map of Austin, where if the angle between two addresses is small, then they are neighbors!

Example: Suppose there are five DOCUMENTs (numbered DOC 1, DOC 2, DOC 3, DOC 4 and DOC 5) and two TERMs (TERM 1 and TERM 2). The WEIGHT of each TERM in these DOCUMENTs is given in the next table.

          TERM 1 (dog)   TERM 2 (cat)
DOC 1          1              2
DOC 2          1              1
DOC 3          0              1
DOC 4          1              0
DOC 5          2              2

If we transfer the table for DOC 1 to DOC 5 into a vector system, then we will have the next diagram.

[Diagram: DOC 1 to DOC 5 drawn as vectors, with Term 1 (dog) and Term 2 (cat) as the axes.]

Example: Suppose there are two QUERYs (numbered QUERY 1 and QUERY 2) and two TERMs (TERM 1 and TERM 2). The WEIGHT of each TERM in the QUERYs is given in the next table.

          TERM 1 (dog)   TERM 2 (cat)
QUERY 1        2              1
QUERY 2        1              2

If we transfer the table for QUERY 1 and QUERY 2 into a vector system, then we will have the next diagram.

[Diagram: QUERY 1 and QUERY 2 drawn as vectors, with Term 1 (dog) and Term 2 (cat) as the axes.]

If we combine the two vector systems, we will have the following diagram.

[Diagram: DOC 1 to DOC 5 together with QUERY 1 and QUERY 2, drawn on the same Term 1 (dog) and Term 2 (cat) axes.]

The above diagram indicates that DOC 1 is fully relevant to QUERY 2, because the two vectors coincide. DOC 5 and DOC 2, which point in the same direction as each other, are the nearest in angle to QUERY 1.

To quantify the similarity between a document and a query, instead of measuring the angle between their vectors directly, we can measure the cosine of the angle. When the angle is zero (meaning the two vectors lie exactly along the same path), the cosine is 1.
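The worked example can be checked numerically. The sketch below is only an illustration: it copies the term weights from the two tables, and it assumes the standard dot-product form of the cosine between two vectors. The function names `cosine` and `rank` are hypothetical, and treating an all-zero vector as matching nothing is an assumption of this sketch.

```python
import math

# Term weights copied from the tables above, in the order (dog, cat).
documents = {
    "DOC 1": (1, 2),
    "DOC 2": (1, 1),
    "DOC 3": (0, 1),
    "DOC 4": (1, 0),
    "DOC 5": (2, 2),
}
queries = {
    "QUERY 1": (2, 1),
    "QUERY 2": (1, 2),
}


def cosine(a, b):
    """Cosine of the angle between two weight vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (name, cosine) pairs sorted from most to least similar."""
    scores = [(name, cosine(query, vec)) for name, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for q_name, q_vec in queries.items():
    print(q_name)
    for d_name, score in rank(q_vec, documents):
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, score))))
        print(f"  {d_name}: cosine = {score:.3f}, angle = {angle:.1f} degrees")
```

Ranking by cosine depends only on direction, not on vector length, which is why DOC 2 and DOC 5 receive the same score for any query.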
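If the angle is 90 degrees, then the cosine is 0, which means there is minimum relevancy between the document and the query. This relationship can be measured by the standard cosine formula: cos(angle) = (D · Q) / (|D| × |Q|), where D · Q is the sum of the products of the matching term weights of the document D and the query Q, and |D| and |Q| are the lengths of the two vectors.

Thank you!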