QLMLesson4SearchShortcuts

An efficient algorithm to generate Search Shortcuts




          “las vegas”
“caesars palace”       “gambling places”




           “bellagio hotel”
               Last query of the session:
              click on (at least) one result
“las vegas”

                 “satisfactory” session
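The definition above (a session is "satisfactory" when its last query received a click on at least one result) can be sketched in a few lines. This is only an illustration: the `QueryEvent` record, its field names, and the example session are assumptions made here, not the data model used by the authors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryEvent:
    """One query of a session (hypothetical record layout)."""
    text: str
    clicked_results: List[str] = field(default_factory=list)


def is_satisfactory(session: List[QueryEvent]) -> bool:
    """True when the last query of the session has at least one clicked result."""
    return bool(session) and len(session[-1].clicked_results) > 0


def satisfactory_share(sessions: List[List[QueryEvent]]) -> float:
    """Fraction of sessions that are satisfactory (0.0 for an empty log)."""
    if not sessions:
        return 0.0
    return sum(is_satisfactory(s) for s in sessions) / len(sessions)


# Illustrative session: the click URL is made up for the example.
example_session = [
    QueryEvent("las vegas"),
    QueryEvent("gambling places"),
    QueryEvent("caesars palace", clicked_results=["http://example.com/caesars"]),
]
print(is_satisfactory(example_session))
```

The final query of such a session ("caesars palace" here) is the candidate shortcut for other users who start from a similar query.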
64% of users have clicked on at least one result on the last query.

[Figure: % satisfactory vs. Rank (log)]
“las vegas”




                use as suggestion: final query
              from other “satisfactory” sessions
5   minutes
    each session
las vegas               caesars palace
gambling places
hotels pool          las vegas gambling
las vegas casino     places hotels pool
 caesars palace      las vegas casino




                   “las vegas hotels”


                      1. caesars palace
las vegas              caesars palace     las vegas poker
gambling places                           las vegas hotels
hotels pool          las vegas gambling   caesars casino
las vegas casino     places hotels pool
 caesars palace      las vegas casino       caesars palace




                   “poker gambling”


                     1. caesars palace
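The two examples above suggest how a virtual document works: the queries of the satisfactory sessions that end with the same final query are merged into one text, titled with that final query, and the text is indexed so that a new query retrieves the titles as suggestions. The sketch below is a minimal reading of that idea. Excluding the final query from the document body, the plain term-overlap scoring, and the third session are assumptions made for illustration; the slides only state that an inverted index is used.

```python
from collections import defaultdict


def build_virtual_documents(satisfactory_sessions):
    """Map each final query (the title) to the terms of the queries that preceded it."""
    docs = defaultdict(list)
    for session in satisfactory_sessions:
        final_query = session[-1]
        for query in session[:-1]:  # assumption: the title is not part of the body
            docs[final_query].extend(query.split())
    return dict(docs)


def build_inverted_index(docs):
    """Map each term to the set of virtual-document titles containing it."""
    index = defaultdict(set)
    for title, terms in docs.items():
        for term in terms:
            index[term].add(title)
    return index


def candidate_shortcuts(index, query):
    """Titles sharing at least one term with the query, most matching terms first."""
    hits = defaultdict(int)
    for term in set(query.split()):
        for title in index.get(term, ()):
            hits[title] += 1
    return sorted(hits, key=lambda title: (-hits[title], title))


sessions = [
    # The first two sessions follow the slide example.
    ["las vegas", "gambling places", "hotels pool", "las vegas casino", "caesars palace"],
    ["las vegas poker", "las vegas hotels", "caesars casino", "caesars palace"],
    # Hypothetical extra session, added only to show that ranking discriminates.
    ["paris hotels", "eiffel tower tickets", "louvre museum"],
]
docs = build_virtual_documents(sessions)
index = build_inverted_index(docs)
print(candidate_shortcuts(index, "las vegas hotels"))
print(candidate_shortcuts(index, "poker gambling"))
```

Because the second session adds "poker" to the "caesars palace" document, the query "poker gambling" can reach that shortcut even though the exact query may never have been seen before, which is the point of the approach for long-tail queries.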
                     Suggestions ranking

[...] in relation to Λ, defined above in section 4 as the concatenation of
terms appearing in the head of the current search session, is computed [...]:

    w(\Lambda, q_{f_i}) = \alpha \cdot BM25(\Lambda, q_{f_i}) + \beta \cdot freq(q_{f_i})

    first term: IR-rank          second term: popularity

Notice that both BM25 rank and frequency are normalized values [...]
domain is defined by the range (0..2].

In our experimental settings we used α = β = 1, giving the same weight
to both the parameters; obviously, further tests aimed to find [...]
of α and β coefficients in the above formula could be performed.

Slide annotation: α = β = 1/2
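The ranking formula combines a normalized BM25 score (IR-rank) with a normalized frequency (popularity) through two coefficients. The following sketch shows one way to compute it. The BM25 scores are taken as given inputs, normalization by the maximum value is an assumption (the text only says the values are normalized), and all numbers in the example are made up.

```python
def normalize(scores):
    """Scale positive scores into (0, 1] by dividing by the largest one (assumed scheme)."""
    top = max(scores.values())
    return {key: value / top for key, value in scores.items()}


def rank_suggestions(bm25_scores, frequencies, alpha=1.0, beta=1.0):
    """Return (suggestion, weight) pairs sorted by alpha * BM25 + beta * freq, best first."""
    bm25_n = normalize(bm25_scores)
    freq_n = normalize(frequencies)
    weights = {
        query: alpha * bm25_n[query] + beta * freq_n[query]
        for query in bm25_scores
    }
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for two candidate shortcuts.
bm25_scores = {"caesars palace": 7.2, "bellagio hotel": 3.6}
frequencies = {"caesars palace": 120, "bellagio hotel": 300}

for suggestion, weight in rank_suggestions(bm25_scores, frequencies):
    print(suggestion, round(weight, 2))
```

With both coefficients equal, the two signals carry the same weight, so halving both changes the scale of the weights but not the order of the suggestions.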
14,921,286 total queries from Microsoft log
9,461,423 sessions
1,949,320 virtual documents
COVER GRAPH: Baeza-Yates et al., KDD ‘07
QUERY FLOW GRAPH: Boldi et al., CIKM ‘08

[Figure: fragment of a query-flow graph with weighted edges between the queries “barcelona”, “barcelona weather”, “barcelona weather online”, “barcelona hotels”, “cheap barcelona hotels”, “luxury barcelona hotels”, “barcelona fc”, “barcelona fc website”, “barcelona fc fixtures”, “real madrid” and the node <T>]

[...] are also given the query-flow graph, which has been computed with the sessions of S as part of its input. The chain-finding problem can also be defined in the case that the sessions of S have not participated in the construction of the query-flow graph. However, in this paper we focus on the former case and we leave the latter for future work.
One of the challenges of the problem we consider arises from our definition of chains: we allow chains not to be consecutive in the supersession S; in other words, the supersession S may contain many intertwined chains such as the ones shown in Table 1. Previous work has mostly focused on the case where all chains are consecutive.

Chain #1                       Chain #2
...                            ...
football results january 2nd   pointui forum
royal carribean cruises        audi ipswich
holidays                       golfers elbow
motherwell football club       cox ipswich
...                            ...

Table 1: Two fragments from actual sessions containing non-consecutive chains.
 MANUAL EVALUATION



simple?
objective?
reproducible?
                                                  (TREC query no. 14)



                        “dinosaurs”

1. Go to the Discovery Channel’s dinosaur site, which has
   pictures of dinosaurs and games.
2. I’m looking for free pictures of dinosaurs.
3. I want to find pictures of dinosaurs that I can color in, as in a
   coloring book.
4. I’m looking for a list of all (or many of) the different kinds of
   dinosaurs, with pictures
5. Take me to the homepage for the BBC series, “Walking with
   Dinosaurs”
                            (Search Shortcuts suggestions)


80% topic coverage

                   “dinosaurs”

 1.dinosaur pictures         “I’m looking for free pictures of
                             dinosaurs.” (sub-topic 2)
 2.dinosaur worksheets
 3.dinosaur games            “Take me to the homepage for the
 4.all about dinosaurs       BBC series, ‘Walking with
 5.walking with dinosaurs    Dinosaurs’.” (sub-topic 5)

 6.poetry dinosaurs          “I want to find pictures of dinosaurs
 7.dinosaur clip art         that I can color in, as in a coloring
 8.trooden dinosaurs         book.” (sub-topic 3)
 9.dinosaurs list            “I’m looking for a list of all (or
10.tyrannosaurus dinosaur    many of) the different kinds of
                             dinosaurs, with pictures.” (sub-topic 4)
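Topic coverage, as used in this example, can be read as the fraction of a TREC query's sub-topics that are addressed by at least one suggestion. The sketch below computes it under that reading. The pairing of individual suggestions with sub-topics is an assumption inferred from the slide layout; only the set of covered sub-topics (2, 3, 4 and 5 out of 5) is taken from the slide.

```python
def topic_coverage(covered_by_suggestion, n_subtopics):
    """Fraction of sub-topics covered by at least one suggestion."""
    covered = set()
    for subtopics in covered_by_suggestion.values():
        covered.update(subtopics)
    return len(covered) / n_subtopics


# Sub-topics judged as covered for the query "dinosaurs" (TREC query no. 14).
# The suggestion-to-sub-topic pairing is illustrative.
judgments = {
    "dinosaur pictures": {2},
    "walking with dinosaurs": {5},
    "dinosaur clip art": {3},
    "dinosaurs list": {4},
}

coverage = topic_coverage(judgments, n_subtopics=5)
print(f"{coverage:.0%}")
```

Sub-topic 1 (the Discovery Channel site) is not matched by any suggestion, which is why four of the five sub-topics are covered.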
                          SS      CG      QFG
AVERAGE TOPIC COVERAGE    47.06   18.76   8.40
> 50% TOPIC COVERAGE      27/50   5/50    0/50
                                  (TREC query no. 13)
                            (Search Shortcuts suggestions)

9/10 related suggestions
                 “map of the united states”

                   1.map of united states
                   2.blank map of the united states
                   3.map of united states of america
                   4.united states maps
                   5.outline map of the united states
                   6.united states of america map
                   7.printable united states map
                   8.united states region map
                   9.political map of the united states
                  10.updated wrestling news
                                       SS     CG     QFG
AVERAGE PRECISION on 10 suggestions    9.52   4.72   2.46
               The story so far...


Search Shortcuts: the idea and preliminary studies

How the idea becomes reality: the implementation

             Ranking of suggestions

    A new evaluation metric: topic coverage

               Analysis of results
                More details...




Daniele Broccolo, Lorenzo Marcon, Franco Maria Nardini, Raffaele Perego, Fabrizio Silvestri.
Generating suggestions for queries in the long tail with an inverted index.
Information Processing & Management (2011). doi:10.1016/j.ipm.2011.07.005
http://searchshortcuts.isti.cnr.it

				