Context-Sensitive Information Retrieval Using Implicit Feedback

Xuehua Shen, Department of Computer Science, University of Illinois at Urbana-Champaign
Bin Tan, Department of Computer Science, University of Illinois at Urbana-Champaign
ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign

Presented by Chia-Hao Lee

Outline
• Introduction
• Problem Definition
• Language Models for Context-Sensitive Information
  Retrieval
   –   Basic retrieval model
   –   Fixed Coefficient Interpolation (FixInt)
   –   Bayesian Interpolation (BayesInt)
   –   Online Bayesian Updating (OnlineUp)
   –   Batch Bayesian Updating (BatchUp)
• Experiments
• Conclusions and Future Work



                       Introduction

• In most existing information retrieval models, the retrieval
  problem is treated as involving one single query and a
  set of documents.

• From a single query, however, the retrieval system can
  get only very limited clues about the user’s information
  need.

• An optimal retrieval system thus should try to exploit as
  much additional context information as possible to
  improve retrieval accuracy, whenever it is available.



                      Introduction
• There are many kinds of context that we can exploit.

• Relevance feedback is known to be effective for
  improving retrieval accuracy.

• However, relevance feedback requires that a user
  explicitly provide feedback information, such as
  specifying the category of the information need or
  marking a subset of retrieved documents as relevant
  documents.




                      Introduction

• A major advantage of implicit feedback is that we can
  improve the retrieval accuracy without requiring any user
  effort.

• For example, if the current query is “java”, it is
  impossible to know, without any extra information,
  whether the user means the Java programming language
  or the island of Java in Indonesia.




                    Problem Definition
• There are two kinds of context information we can use
  for implicit feedback.
   – Short-term context
   – Long-term context


• Short-term context is the immediate surrounding
  information which throws light on a user’s current
  information need in a single session.

• A session can be considered as a period consisting of all
  interactions for the same information need.



                     Problem Definition
• In a single search session, a user may interact with the
  search system several times. During these interactions, the
  user may repeatedly modify the query.

• Therefore, for the current query Q_k, there is a query history
  H_Q = (Q_1, ..., Q_{k-1}) associated with it, which consists of
  the preceding queries given by the same user in the current
  session.

• Indeed, our work has shown that the short-term query
  history is useful for improving retrieval accuracy.

                    Problem Definition
• A user will presumably click on some of the returned
  documents to view them.

• We refer to data associated with these actions as
  clickthrough history.

• The clickthrough data may include the title, summary, and
  perhaps also the content and location of the clicked
  document.

• Our work has shown positive results using similar
  clickthrough information.

 Language models for context-sensitive information retrieval

• We propose to use statistical language models to model a
  user’s information need and develop four specific context-
  sensitive language models to incorporate context
  information into a basic retrieval model.

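• 1. Basic retrieval model
       We estimate a query language model θ_Q and a document
  language model θ_D and compute the KL divergence
  D(θ_Q || θ_D), which serves as the score of the document
  (a smaller divergence means a closer match).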

       One advantage of this approach is that we can
  naturally incorporate the search context as additional
  evidence to improve our estimate of the query language
  model.
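
The scoring step can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how the document model is smoothed, so Dirichlet smoothing against a collection model is assumed here, and all function names are hypothetical.

```python
import math
from collections import Counter

def unigram_mle(text):
    """Maximum-likelihood unigram model: p(w|X) = c(w, X) / |X|."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def smoothed_doc_model(doc, collection_model, mu=100.0):
    """Return p(w|theta_D) as a function of w.

    Assumption: Dirichlet smoothing with a collection model, so that
    words missing from the document still get non-zero probability.
    """
    words = doc.lower().split()
    counts = Counter(words)
    n = len(words)
    def p(w):
        # 1e-9 is an arbitrary floor for words unseen in the collection.
        return (counts.get(w, 0) + mu * collection_model.get(w, 1e-9)) / (n + mu)
    return p

def kl_divergence(query_model, doc_model):
    """D(theta_Q || theta_D) = sum_w p(w|theta_Q) * log(p(w|theta_Q) / p(w|theta_D))."""
    return sum(pq * math.log(pq / doc_model(w))
               for w, pq in query_model.items() if pq > 0)

def rank(query_model, docs, collection_model):
    """Order documents by increasing divergence from the query model."""
    scored = [(kl_divergence(query_model, smoothed_doc_model(d, collection_model)), d)
              for d in docs]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0])]
```

Because the query model is just a word distribution, any of the context-sensitive estimates described later can be passed to `rank` in place of the plain query model.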

Language models for context-sensitive information retrieval

• Our task is to estimate a context query model, which we
  denote by p(w | θ_k), based on the current query Q_k, as well
  as the query history H_Q and clickthrough history H_C.

• We will use c(w, X) to denote the count of word w in text
  X, which could be either a query or a clicked document’s
  summary or any other text.

• We will use |X| to denote the length of text X or the total
  number of words in X.




 Language models for context-sensitive information retrieval

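• 2. Fixed Coefficient Interpolation (FixInt)
     Our first idea is to summarize the query history H_Q with a
  unigram language model p(w | H_Q) and the clickthrough history H_C
  with another unigram language model p(w | H_C), and then to
  interpolate both linearly with the current query model p(w | Q_k):

       p(w | Q_i) = c(w, Q_i) / |Q_i|
       p(w | H_Q) = (1 / (k-1)) Σ_{i=1..k-1} p(w | Q_i)
       p(w | C_i) = c(w, C_i) / |C_i|
       p(w | H_C) = (1 / (k-1)) Σ_{i=1..k-1} p(w | C_i)
       p(w | H)   = β p(w | H_C) + (1 - β) p(w | H_Q)
       p(w | θ_k) = α p(w | Q_k) + (1 - α) p(w | H)

  where β ∈ [0, 1] sets the weight on the clickthrough history model
  and α ∈ [0, 1] sets the weight on the current query.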
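
A minimal sketch of FixInt, assuming the history models are averages of per-query and per-summary maximum-likelihood models and that the final model is α p(w | Q_k) + (1 - α) [β p(w | H_C) + (1 - β) p(w | H_Q)]. The function names, the whitespace tokenizer, and the default coefficient values are illustrative assumptions, and both histories are assumed to be non-empty.

```python
from collections import Counter

def mle(text):
    """Maximum-likelihood unigram model: p(w|X) = c(w, X) / |X|."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def average_models(texts):
    """Average the MLE models of several texts: (1/n) * sum_i p(w|X_i)."""
    models = [mle(t) for t in texts]
    vocab = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}

def fixint(current_query, past_queries, clicked_summaries, alpha=0.5, beta=0.5):
    """Context query model with fixed interpolation coefficients.

    Assumes past_queries and clicked_summaries are non-empty lists of strings.
    """
    p_q = mle(current_query)                    # p(w|Q_k)
    p_hq = average_models(past_queries)         # p(w|H_Q)
    p_hc = average_models(clicked_summaries)    # p(w|H_C)
    model = {}
    for w in set(p_q) | set(p_hq) | set(p_hc):
        p_h = beta * p_hc.get(w, 0.0) + (1 - beta) * p_hq.get(w, 0.0)
        model[w] = alpha * p_q.get(w, 0.0) + (1 - alpha) * p_h
    return model
```

For instance, `fixint("java", ["programming language"], ["java tutorial for programmers"])` moves probability mass from "java" onto words such as "programming" and "tutorial", which is the disambiguating effect the history is meant to provide.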




 Language models for context-sensitive information retrieval

• 3. Bayesian Interpolation (BayesInt)
       One possible problem with the FixInt approach is that
  the coefficients, especially α, are fixed across all queries.

       If our current query Q_k is very long, we should trust the
  current query more, whereas if Q_k has just one word, it may
  be beneficial to put more weight on the history.

       To capture this intuition, we treat p(w | H_Q) and p(w | H_C)
  as Dirichlet priors and Q_k as the observed data, and estimate a
  context query model using a Bayesian estimator.



Language models for context-sensitive information retrieval

       The estimated model is given by

       p(w | θ_k) = [c(w, Q_k) + μ p(w | H_Q) + ν p(w | H_C)] / (|Q_k| + μ + ν)

       μ : the prior sample size for p(w | H_Q)

       ν : the prior sample size for p(w | H_C)
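
A small sketch of BayesInt, assuming the estimate is p(w | θ_k) = [c(w, Q_k) + μ p(w | H_Q) + ν p(w | H_C)] / (|Q_k| + μ + ν), with the two history models built as averages of maximum-likelihood models. Function names and the default values of `mu` and `nu` are illustrative assumptions only.

```python
from collections import Counter

def mle(text):
    """Maximum-likelihood unigram model: p(w|X) = c(w, X) / |X|."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def average_models(texts):
    """Average the MLE models of several texts."""
    models = [mle(t) for t in texts]
    vocab = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}

def bayesint(current_query, past_queries, clicked_summaries, mu=0.2, nu=5.0):
    """Context query model with the two history models acting as Dirichlet priors.

    Assumes both history lists are non-empty.
    """
    words = current_query.lower().split()
    counts = Counter(words)                     # c(w, Q_k)
    p_hq = average_models(past_queries)         # p(w|H_Q)
    p_hc = average_models(clicked_summaries)    # p(w|H_C)
    denom = len(words) + mu + nu                # |Q_k| + mu + nu
    vocab = set(counts) | set(p_hq) | set(p_hc)
    return {w: (counts.get(w, 0) + mu * p_hq.get(w, 0.0) + nu * p_hc.get(w, 0.0)) / denom
            for w in vocab}

def effective_alpha(current_query, mu, nu):
    """Implied weight on the current query: |Q_k| / (|Q_k| + mu + nu)."""
    n = len(current_query.split())
    return n / (n + mu + nu)
```

The helper `effective_alpha` makes the intended behaviour visible: with the same `mu` and `nu`, a longer query automatically receives a larger share of the weight than a one-word query.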




Language models for context-sensitive information retrieval

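• 4. Online Bayesian Updating (OnlineUp)
       4.1 Bayesian updating
       Let p(w | φ) be our current query model and T be a new
  piece of text evidence observed. To update the query
  model based on T, we use φ to define a Dirichlet prior
  parameterized as

       Dir(μ_T p(w_1 | φ), ..., μ_T p(w_N | φ))

  where μ_T is the equivalent sample size of the prior.
         With such a conjugate prior, the predictive
  distribution of φ (the updated query model φ') is

       p(w | φ') = [c(w, T) + μ_T p(w | φ)] / (|T| + μ_T)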
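
A single update step can be sketched as below, assuming the updated model is [c(w, T) + μ_T p(w | φ)] / (|T| + μ_T), that is, observed counts from T smoothed by the current model with equivalent sample size μ_T. The function name and the whitespace tokenizer are illustrative assumptions.

```python
from collections import Counter

def bayesian_update(prior_model, text, mu_t):
    """Update a unigram query model with new text evidence T.

    prior_model: dict mapping word -> p(w|phi)
    text:        the new evidence T as a string
    mu_t:        equivalent sample size of the Dirichlet prior (must be > 0)
    """
    words = text.lower().split()
    counts = Counter(words)                 # c(w, T)
    denom = len(words) + mu_t               # |T| + mu_T
    vocab = set(prior_model) | set(counts)
    return {w: (counts.get(w, 0) + mu_t * prior_model.get(w, 0.0)) / denom
            for w in vocab}
```

A large `mu_t` keeps the updated model close to the current one, while a small `mu_t` lets the new text dominate.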




 Language models for context-sensitive information retrieval

• 4.2 Sequential query model updating
         We use whatever information is available before the
  first query to define a prior on the query model, which is
  denoted by φ'_0.
         After we observe the first query Q_1, we can update
  the query model based on the new observed data Q_1.
         The updated query model φ_1 can then be used for
  ranking documents in response to Q_1. As the user views
  some documents, the displayed summary text for such
  documents, C_1, can serve as new data for us to further
  update the query model to obtain φ'_1.




Language models for context-sensitive information retrieval

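         We see two types of updating:
     (1) updating based on a new query Q_i
     (2) updating based on a new clicked summary C_i

     Thus we have the following updating equations:

       p(w | φ_i)  = [c(w, Q_i) + μ p(w | φ'_{i-1})] / (|Q_i| + μ)
       p(w | φ'_i) = [c(w, C_i) + ν p(w | φ_i)] / (|C_i| + ν)

     and the context query model is p(w | θ_k) = p(w | φ_k).

                 μ : the equivalent sample size for the prior
                     when updating the model based on a query
                 ν : the equivalent sample size for the prior
                     when updating the model based on a clicked summary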
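
The alternation between the two update types can be sketched as a loop. This assumes each update has the form [c(w, T) + m · p(w | current)] / (|T| + m), with m = μ for a query and m = ν for a clicked summary, and that the model used to answer the k-th query is the one obtained right after folding in Q_k. The function names and the choice of initial prior are illustrative assumptions.

```python
from collections import Counter

def bayesian_update(prior_model, text, sample_size):
    """One Dirichlet-prior update: counts from text smoothed by the prior model."""
    words = text.lower().split()
    counts = Counter(words)
    denom = len(words) + sample_size
    vocab = set(prior_model) | set(counts)
    return {w: (counts.get(w, 0) + sample_size * prior_model.get(w, 0.0)) / denom
            for w in vocab}

def online_up(queries, clicked_summaries, prior, mu, nu):
    """Sequentially update the query model over a session.

    queries:            [Q_1, ..., Q_k]
    clicked_summaries:  [C_1, ..., C_{k-1}], where C_i follows Q_i
    prior:              initial model (a dict word -> probability)
    Returns the model after the last query has been folded in.
    """
    model = dict(prior)
    for i, query in enumerate(queries):
        model = bayesian_update(model, query, mu)        # update on query Q_i
        if i < len(clicked_summaries):
            model = bayesian_update(model, clicked_summaries[i], nu)  # update on summary C_i
    return model
```

One consequence of this scheme is that evidence seen early in the session is discounted again at every later step, so its influence decays as the session grows.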


Language models for context-sensitive information retrieval

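• 5. Batch Bayesian Updating (BatchUp)
      The query model is first updated with all the queries, and
  the clicked summaries are then used together in one batch.
      The updating equations are as follows.

       p(w | φ_i) = [c(w, Q_i) + μ p(w | φ_{i-1})] / (|Q_i| + μ)
       p(w | θ_k) = [Σ_{j=1..k-1} c(w, C_j) + ν p(w | φ_k)] / (Σ_{j=1..k-1} |C_j| + ν)

             μ : the same interpretation as in OnlineUp

             ν : indicates to what extent we want to trust the clicked summaries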
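
A hedged sketch of BatchUp, assuming the queries are folded in one at a time with sample size μ and the clicked summaries are then pooled into a single batch of counts that is combined with the resulting query model using sample size ν. The exact form of the batch step is an assumption here, as are the function names and the initial prior.

```python
from collections import Counter

def bayesian_update(prior_model, words, sample_size):
    """One Dirichlet-prior update from a list of words."""
    counts = Counter(words)
    denom = len(words) + sample_size
    vocab = set(prior_model) | set(counts)
    return {w: (counts.get(w, 0) + sample_size * prior_model.get(w, 0.0)) / denom
            for w in vocab}

def batch_up(queries, clicked_summaries, prior, mu, nu):
    """Fold in all queries first, then all clicked summaries at once.

    queries:            [Q_1, ..., Q_k]
    clicked_summaries:  [C_1, ..., C_{k-1}]
    prior:              initial model (a dict word -> probability)
    """
    model = dict(prior)
    for query in queries:
        model = bayesian_update(model, query.lower().split(), mu)
    # Pool the words of every clicked summary into one batch (assumed form).
    pooled = [w for summary in clicked_summaries for w in summary.lower().split()]
    return bayesian_update(model, pooled, nu)
```

Compared with updating after every click, pooling means every clicked summary contributes with equal weight regardless of when in the session it was viewed.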




Experiments

(The result tables and figures shown on the five Experiments slides were images and are not preserved here.)

                       Conclusions

• In this paper, we have explored how to exploit implicit
  feedback information, including query history and
  clickthrough history within the same search session, to
  improve information retrieval performance.

• Experimental results show that using implicit feedback,
  especially clickthrough history, can substantially improve
  retrieval performance without requiring any additional
  user effort.



