Context-Sensitive Information Retrieval Using Implicit Feedback
Xuehua Shen, Bin Tan, ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign
Presented by Chia-Hao Lee

Outline
• Introduction
• Problem Definition
• Language Models for Context-Sensitive Information Retrieval
– Basic retrieval model
– Fixed Coefficient Interpolation (FixInt)
– Bayesian Interpolation (BayesInt)
– Online Bayesian Updating (OnlineUp)
– Batch Bayesian Updating (BatchUp)
• Experiments
• Conclusions and Future Work

Introduction
• In most existing information retrieval models, the retrieval problem is treated as involving a single query and a set of documents.
• From a single query, however, the retrieval system has only very limited clues about the user's information need.
• An optimal retrieval system should therefore exploit as much additional context information as possible, whenever it is available, to improve retrieval accuracy.

Introduction
• There are many kinds of context that we can exploit.
• Relevance feedback is known to be effective for improving retrieval accuracy.
• However, relevance feedback requires the user to explicitly provide feedback information, such as specifying the category of the information need or marking a subset of retrieved documents as relevant.

Introduction
• A major advantage of implicit feedback is that retrieval accuracy can be improved without requiring any user effort.
• For example, if the current query is "java", then without any extra information it is impossible to tell whether the user means the Java programming language or the island of Java in Indonesia.

Problem Definition
• There are two kinds of context information we can use for implicit feedback:
– Short-term context
– Long-term context
• Short-term context is the immediate surrounding information that sheds light on a user's current information need within a single session.
• A session can be considered a period consisting of all interactions for the same information need.

Problem Definition
• In a single search session, a user may interact with the search system several times, continuously modifying the query during these interactions.
• Therefore, the current query q_k has an associated query history H_Q = (q_1, ..., q_{k-1}), consisting of the preceding queries issued by the same user in the current session.
• Indeed, our work has shown that the short-term query history is useful for improving retrieval accuracy.

Problem Definition
• A user will presumably frequently click some documents to view them.
• We refer to the data associated with these actions as the clickthrough history H_C.
• The clickthrough data may include the title, summary, and perhaps also the content and location of each clicked document.
• Our work has shown positive results using such clickthrough information.

Language Models for Context-Sensitive Information Retrieval
• We propose to use statistical language models to model a user's information need, and we develop four specific context-sensitive language models to incorporate context information into a basic retrieval model.
• 1. Basic retrieval model
We compute the Kullback-Leibler divergence D(θ_k ‖ θ_D) between the query language model θ_k and the document language model θ_D; its negative serves as the score of the document. One advantage of this approach is that we can naturally incorporate the search context as additional evidence to improve our estimate of the query language model.

Language Models for Context-Sensitive Information Retrieval
• Our task is to estimate a context query model, which we denote by θ_k, based on the current query q_k as well as the query history H_Q and the clickthrough history H_C.
• We will use c(w, X) to denote the count of word w in text X, which could be a query, a clicked document's summary, or any other text.
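The counting notation above is easy to make concrete. Below is a minimal Python sketch of c(w, X), assuming simple lowercase whitespace tokenization; the function names and the tokenization scheme are illustrative choices, not from the paper.

```python
from collections import Counter

def c(w, x):
    """Count of word w in text x -- the c(w, X) of the slides.
    Assumes lowercase whitespace tokenization (an illustrative choice)."""
    return Counter(x.lower().split())[w]

def length(x):
    """Total number of words in text x -- the |X| of the slides."""
    return len(x.lower().split())

query = "java island map"
print(c("java", query))   # 1
print(length(query))      # 3
```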
• We will use |X| to denote the length of text X, i.e., the total number of words in X.

Language Models for Context-Sensitive Information Retrieval
• 2. Fixed Coefficient Interpolation (FixInt)
Our first idea is to summarize the query history H_Q with a unigram language model p(w|H_Q) and the clickthrough history H_C with another unigram language model p(w|H_C), and to interpolate them with the current query model using fixed coefficients α and β:

p(w|θ_k) = α p(w|q_k) + (1 − α)[β p(w|H_C) + (1 − β) p(w|H_Q)]

Language Models for Context-Sensitive Information Retrieval
• 3. Bayesian Interpolation (BayesInt)
One possible problem with the FixInt approach is that the coefficients, especially α, are fixed across all queries. If the current query q_k is very long, we should trust q_k more, whereas if q_k has just one word, it may be beneficial to put more weight on the history. To capture this intuition, we treat p(w|H_Q) and p(w|H_C) as Dirichlet priors and q_k as the observed data, and estimate the context query model using a Bayesian estimator.

The estimated model is given by

p(w|θ_k) = [c(w, q_k) + μ p(w|H_Q) + ν p(w|H_C)] / (|q_k| + μ + ν)

μ: the prior sample size for p(·|H_Q)
ν: the prior sample size for p(·|H_C)

Language Models for Context-Sensitive Information Retrieval
• 4. Online Bayesian Updating (OnlineUp)
4.1 Bayesian updating
Let φ be our current query model and T be a new piece of text evidence observed (a query or a clicked summary). To update the query model based on T, we use φ to define a Dirichlet prior parameterized as Dir(μ_T p(w_1|φ), ..., μ_T p(w_N|φ)), where μ_T is the equivalent sample size of the prior. With such a conjugate prior, the predictive distribution of the updated model φ' is

p(w|φ') = [c(w, T) + μ_T p(w|φ)] / (|T| + μ_T)

4.2 Sequential query model updating
We use the information available before the first query to define a prior on the query model, which we denote by θ_0'. After we observe the first query q_1, we can update the query model based on the newly observed data q_1. The updated query model θ_1 can then be used for ranking documents in response to q_1. As the user views some documents, the displayed summary text of the clicked documents serves as new data with which we further update the query model to obtain θ_1'.
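The conjugate-prior update in 4.1, applied sequentially as just described, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary representation of models, the whitespace tokenization, and the μ values are assumptions.

```python
def dirichlet_update(model, text, mu):
    """One Bayesian update step:
    p(w|phi') = (c(w, T) + mu * p(w|phi)) / (|T| + mu).
    `model` maps words to probabilities; words absent from the model
    get prior probability 0 in this simplified sketch."""
    words = text.lower().split()
    n = len(words)                       # |T|
    counts = {}
    for w in words:                      # c(w, T)
        counts[w] = counts.get(w, 0) + 1
    vocab = set(model) | set(counts)
    return {w: (counts.get(w, 0) + mu * model.get(w, 0.0)) / (n + mu)
            for w in vocab}

# Sequential updating: start from a prior, fold in the first query,
# then a clicked summary (the prior and mu values are illustrative).
prior = {"java": 0.5, "island": 0.5}
theta1 = dirichlet_update(prior, "java programming", mu=2.0)
theta1p = dirichlet_update(theta1, "java language tutorial", mu=5.0)
```

Because the Dirichlet prior is conjugate to the multinomial, each update has the same closed form, so the model can be refreshed incrementally as every new query or clicked summary arrives.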
Language Models for Context-Sensitive Information Retrieval
We see two types of updating:
(1) updating based on a new query
(2) updating based on a new clicked summary

Thus we have the following updating equations:

p(w|θ_k) = [c(w, q_k) + μ p(w|θ_{k−1}')] / (|q_k| + μ)
p(w|θ_k') = [c(w, C_k) + ν p(w|θ_k)] / (|C_k| + ν)

where C_k denotes the concatenated summaries of the documents clicked after query q_k.

μ: the equivalent sample size for the prior when updating the model based on a query
ν: the equivalent sample size for the prior when updating the model based on a clicked summary

Language Models for Context-Sensitive Information Retrieval
• 5. Batch Bayesian Updating (BatchUp)
Instead of updating the query model each time a new clicked summary is observed, BatchUp updates the model sequentially with the query history only, and then incorporates all clicked summaries from the history in a single batch update. The updating equations are as follows:

p(w|θ_k) = [c(w, q_k) + μ p(w|θ_{k−1})] / (|q_k| + μ)
p(w|θ_k') = [Σ_{i=1..k−1} c(w, C_i) + ν p(w|θ_k)] / (Σ_{i=1..k−1} |C_i| + ν)

μ: the same interpretation as in OnlineUp
ν: indicates to what extent we want to trust the clicked summaries

Experiments
[Slides 18–22: experimental results, presented as tables and figures]

Conclusions
• In this paper, we have explored how to exploit implicit feedback information, including the query history and clickthrough history within the same search session, to improve information retrieval performance.
• Experimental results show that using implicit feedback, especially the clickthrough history, can substantially improve retrieval performance without requiring any additional user effort.
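As a closing illustration, the BayesInt estimator from the talk can be sketched in Python. This is a minimal sketch, not the paper's implementation; the whitespace tokenization, the toy history models, and the μ, ν values are illustrative assumptions.

```python
from collections import Counter

def bayes_int(query, hq_model, hc_model, mu, nu):
    """BayesInt estimator:
    p(w|theta_k) = (c(w,q_k) + mu*p(w|H_Q) + nu*p(w|H_C))
                   / (|q_k| + mu + nu)."""
    counts = Counter(query.lower().split())   # c(w, q_k)
    qlen = sum(counts.values())               # |q_k|
    vocab = set(counts) | set(hq_model) | set(hc_model)
    return {w: (counts[w] + mu * hq_model.get(w, 0.0)
                + nu * hc_model.get(w, 0.0)) / (qlen + mu + nu)
            for w in vocab}

# Toy history models (would be estimated from past queries and
# clicked summaries in practice).
hq = {"java": 0.6, "tutorial": 0.4}      # query-history model p(w|H_Q)
hc = {"java": 0.5, "programming": 0.5}   # clickthrough model p(w|H_C)
theta = bayes_int("java", hq, hc, mu=0.2, nu=5.0)
```

Note how the one-word query contributes only 1 to the denominator, so the history models dominate, which is exactly the adaptive weighting BayesInt is designed to achieve for short queries.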