Re-ranking method based on inter-document distances


            Instructors: Prof. 陳彥良, Prof. 許秉瑜


            Presenter: 吳家齊
    Outline
   Background
   Main conception
   Method
   An example
   Conclusion
    Background & Motivation
   The best documents should be located as
    close to the top of the list as possible.
   None of the existing models can
    guarantee that relevant documents will
    occupy the top position in the list.
   Effective integration of more information
    should lead to better information
    retrieval.
   One kind of additional information is
    inter-document relationships.
    Objective
   Proposing a new document re-ranking
    method.
   It uses the distances between documents
    for modifying initial relevance weights.
    Main conception
   Similarities can be equivalently described
    by means of distances.
   Documents strongly interrelated should
    not be assigned very different weights.
   Metric space property:
    |δ(d, d*) - δ(e, d)| ≤ δ(e, d*) ≤ δ(d, d*) + δ(e, d)
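    Main conception
   [Figure: documents d and e and the point d* drawn as a triangle]
   [Figure: e, d*, d in a row, segments labelled δ(e, d*) and δ(d, d*)]
    δ(d, e) ≤ δ(e, d*) + δ(d, d*)
   [Figure: e, d, d* in a row, segments labelled δ(e, d*) and δ(d, d*)]
    δ(d, e) ≥ δ(e, d*) - δ(d, d*)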
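The bounds implied by the metric space property can be sketched in a few lines of Python. This is only an illustration: the function name triangle_bounds and the sample numbers are assumptions, not part of the slides.

```python
def triangle_bounds(c_i, d_ij):
    """Return the interval (lower, upper) in which delta(e, d*) must lie,
    given c_i = delta(d, d*) and d_ij = delta(d, e), by the triangle inequality."""
    return abs(c_i - d_ij), c_i + d_ij


# Illustrative numbers: delta(d, d*) = 2 and delta(d, e) = 5.
lower, upper = triangle_bounds(2, 5)
print(lower, upper)           # 3 7
print(lower <= 8 <= upper)    # False: a distance of 8 would violate the bounds
```

A distance to d* that falls outside this interval is exactly what the re-ranking method treats as an inconsistency to be corrected.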
    Method
   D = {d1, d2, …, dn} is a set of documents
    returned in response to a query.
   Input
       distance vector c = [δ(d1, d*), δ(d2, d*), …, δ(dn, d*)]
       distance matrix D = [dij], where dij = δ(di, dj)
       maxerror > 0
   Output
       A better distance vector c’
    Method
    Change matrix
   Metric space property:
    |δ(d, d*) - δ(e, d)| ≤ δ(e, d*) ≤ δ(d, d*) + δ(e, d)
   With d = di and e = dj this becomes:
    |ci - dij| ≤ cj ≤ ci + dij

              |ci - dij| - cj   if cj < |ci - dij|
    Zi,j =    ci + dij - cj     if cj > ci + dij
              0                 if |ci - dij| ≤ cj ≤ ci + dij
    Method
   An example of change matrix

      c = [2, 8, 3]

            0   5   7
      D =   5   0   3
            7   3   0

            0   -1   +2
      Z =  +1    0   +2
           +2   -2    0

   For i = 1, j = 2: c2 = 8 > c1 + d12 = 7, so Z1,2 = 7 - 8 = -1.
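The change matrix definition can be checked against this example with a short Python sketch. The function name change_matrix is an assumption made for illustration; only the piecewise rule for Zi,j comes from the slides.

```python
def change_matrix(c, D):
    """Build Z from distance vector c and distance matrix D.

    Z[i][j] is the correction that would move c[j] back into the interval
    [|c[i] - D[i][j]|, c[i] + D[i][j]], or 0 if c[j] already lies inside it.
    """
    n = len(c)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = abs(c[i] - D[i][j])
            upper = c[i] + D[i][j]
            if c[j] < lower:
                Z[i][j] = lower - c[j]   # c[j] too small: positive correction
            elif c[j] > upper:
                Z[i][j] = upper - c[j]   # c[j] too large: negative correction
    return Z


c = [2, 8, 3]
D = [[0, 5, 7],
     [5, 0, 3],
     [7, 3, 0]]
for row in change_matrix(c, D):
    print(row)
# [0, -1, 2]
# [1, 0, 2]
# [2, -2, 0]
```

The sketch covers only the construction of Z; how the corrections are combined into the new vector c' is not spelled out on the slides and is left out here.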
    An example
   Query: "data mining methods"
   Term list produced by a dictionary:
    S = (association, classification, clustering, data,
         method, mining, regression)
   Query vector over S:
    q = [0,0,0,1,1,1,0]
   Response documents D = {d1,d2,d3,d4}
    d1=[0,0,0,2,1,2,0]
    d2=[2,2,2,4,4,4,2]
    d3=[0,1,2,1,2,2,1]
    d4=[0,1,0,0,0,1,2]
    An example
   Determine the relevance weights of
    documents using the cosine coefficient
    (Salton & McGill, 1983)
   r1 = sim(d1, q) = 0.96
   r2 = sim(d2, q) = 0.87
   r3 = sim(d3, q) = 0.75
   r4 = sim(d4, q) = 0.24
   r = [0.96, 0.87, 0.75, 0.24]
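As a check on these weights, the cosine coefficient can be computed directly from the vectors in the example. This is a minimal sketch using only the Python standard library; the helper name cosine is an assumption.

```python
import math


def cosine(u, v):
    """Cosine coefficient: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


q = [0, 0, 0, 1, 1, 1, 0]
docs = [
    [0, 0, 0, 2, 1, 2, 0],  # d1
    [2, 2, 2, 4, 4, 4, 2],  # d2
    [0, 1, 2, 1, 2, 2, 1],  # d3
    [0, 1, 0, 0, 0, 1, 2],  # d4
]
r = [round(cosine(d, q), 2) for d in docs]
print(r)  # [0.96, 0.87, 0.75, 0.24]
```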
    An example
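   Obtain the distance matrix using the
    Hamming distance (computed here as the sum of
    absolute differences between term weights)

    d1 = [0, 0, 0, 2, 1, 2, 0]
    d2 = [2, 2, 2, 4, 4, 4, 2]
    d3 = [0, 1, 2, 1, 2, 2, 1]
    d4 = [0, 1, 0, 0, 0, 1, 2]

             0   15    6    7
    D =     15    0   11   16
             6   11    0    7
             7   16    7    0

   For example, δ(d1, d3) = 0 + 1 + 2 + 1 + 1 + 0 + 1 = 6.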
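The entries of D above are reproduced by summing the absolute differences between term weights, which the following Python sketch illustrates. The helper name l1_distance is an assumption, chosen because this sum is the L1 distance.

```python
def l1_distance(u, v):
    """Sum of absolute coordinate differences between two term vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


docs = [
    [0, 0, 0, 2, 1, 2, 0],  # d1
    [2, 2, 2, 4, 4, 4, 2],  # d2
    [0, 1, 2, 1, 2, 2, 1],  # d3
    [0, 1, 0, 0, 0, 1, 2],  # d4
]
D = [[l1_distance(u, v) for v in docs] for u in docs]
for row in D:
    print(row)
# [0, 15, 6, 7]
# [15, 0, 11, 16]
# [6, 11, 0, 7]
# [7, 16, 7, 0]
```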
    An example
   Determine distances with the relevance
    function
       relevance function: r = 1 – 0.03c
        ri = 1 – 0.03ci
        ci = (1 – ri) / 0.03

       c = [1.33, 4.33, 8.33, 25.33]
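The conversion from relevance weights to distances can be sketched as below. The function name and the slope parameter are assumptions made for illustration; the constant 0.03 is the one used in the relevance function of the example.

```python
def distance_from_relevance(r, slope=0.03):
    """Invert the relevance function r = 1 - slope * c to get c."""
    return (1.0 - r) / slope


r = [0.96, 0.87, 0.75, 0.24]
c = [round(distance_from_relevance(x), 2) for x in r]
print(c)  # [1.33, 4.33, 8.33, 25.33]
```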
    An example
   Produce initial change matrix with DWI
       c = [1.33, 4.33, 8.33, 25.33]
                0   15    6    7
        D =    15    0   11   16
                6   11    0    7
                7   16    7    0

                0      9.34   -1   -17
        Z =     9.34   0       0    -5
                1      0       0   -10
               17      5      10     0
    An example
   A new distance vector is determined
    c’ = [10.25, 4.75, 9.08, 16.08]
   Using the relevance function
    ri’ = 1 – 0.03ci’
    new weights are calculated.
    r’ = [0.69, 0.86, 0.73, 0.52]
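To see the effect on the ranking, the new weights can be recomputed from c' and the documents sorted by them. This is a minimal sketch: the function name and the labels d1 to d4 are illustrative, and c' is taken as given from the example rather than derived.

```python
def relevance_from_distance(c, slope=0.03):
    """Relevance function from the example: r = 1 - slope * c."""
    return 1.0 - slope * c


names = ["d1", "d2", "d3", "d4"]
c_new = [10.25, 4.75, 9.08, 16.08]
r_new = [round(relevance_from_distance(c), 2) for c in c_new]
print(r_new)  # [0.69, 0.86, 0.73, 0.52]

# Sort documents by decreasing new weight.
ranking = [name for _, name in sorted(zip(r_new, names), reverse=True)]
print(ranking)  # ['d2', 'd3', 'd1', 'd4']
```

With the initial weights the order was d1, d2, d3, d4; after re-ranking, d1 drops from first to third place.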
    Conclusion
   This method takes the relationships among
    all the documents in the answer into
    account.
   The practical benefits of the method
    proposed remain to be shown.
Thanks for your attention

				