Future-of-Search-by-Yury-Lifshits

St. Petersburg CS Club
December 2008

Future of Search

Yury Lifshits
Yahoo! Research
http://yury.name

Outline


 Structured Search
 Yahoo! Work in Search
   SearchMonkey
   BOSS
 Research Agenda
Structured Search:
work in progress
  Structured Search =
  Bring structured data to search users




M.K. Bergman. The Deep Web: Surfacing Hidden Value. 2001.
Value Proposition

 Coverage
   Real-time data
   Semi-private data
 Structured queries
 Ordering and filtering results
 Straight-to-answers
User Interface: Query
   Search assist: Yahoo!
   Selector: LinkedIn, VKontakte.ru
   Multiple search buttons: Gmail
   Search tabs: Yahoo / Google
  User Interface: Results

   Federated page
   Facets
   Search transfer / search form



K.P. Yee, K. Swearingen, K. Li, M. Hearst.
Faceted metadata for image search and browsing. CHI 2003.

Fernando Diaz. Aggregation of News Content Into Web Results. WSDM 2009.

http://glue.yahoo.com
http://au.alpha.yahoo.com
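Facets let users narrow a result list by attribute values, as in the Yee et al. citation above. Below is a minimal Python sketch. It assumes each result is a plain dict of attribute values. The field names `type` and `year` and the sample data are hypothetical. The sketch counts how many results fall under each value of each facet, then filters by the user's selections.

```python
from collections import Counter, defaultdict


def facet_counts(results, facets):
    """Count how many results carry each value of each facet."""
    counts = defaultdict(Counter)
    for result in results:
        for facet in facets:
            if facet in result:
                counts[facet][result[facet]] += 1
    # Every requested facet appears in the output, even with no values.
    return {facet: dict(counts[facet]) for facet in facets}


def apply_selection(results, selection):
    """Keep only results that match every selected facet value."""
    return [
        r for r in results
        if all(r.get(facet) == value for facet, value in selection.items())
    ]


# Hypothetical results with two facets: content type and year.
results = [
    {"title": "A", "type": "news", "year": 2008},
    {"title": "B", "type": "image", "year": 2008},
    {"title": "C", "type": "news", "year": 2007},
]

print(facet_counts(results, ["type", "year"]))
# {'type': {'news': 2, 'image': 1}, 'year': {2008: 2, 2007: 1}}
print([r["title"] for r in apply_selection(results, {"type": "news"})])
# ['A', 'C']
```

After each click, recomputing the counts on the filtered list gives the updated facet sidebar.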
Data Supply Chain

 Atomic fact: Flight, Event, Patent
 Data aggregator: US Patents, Amadeus/Sabre flights, Upcoming.com
 Domain search: Expedia, Spock
 General purpose search: Yahoo!, Google, Yandex, Baidu
Getting structured data

   Entity extraction
   Markup
   Feeds
   Search API (OpenSearch)

OR

 Do a search transfer
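The last structured-data option on the slide is a search API in the OpenSearch style. Such an API publishes a URL template that clients fill in, and filling in the template is also one simple way to perform a search transfer. The Python sketch below expands a template of this kind. The template URL is hypothetical. The sketch covers only the basic `{name}` / `{name?}` substitution, not the full OpenSearch description document.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; a trailing "?" marks an optional parameter.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_opensearch_template(template, values):
    """Expand an OpenSearch-style URL template with percent-encoded values.

    Optional parameters without a value become empty strings; a missing
    required parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical template, as a site might publish it for its search box.
template = "http://example.com/search?q={searchTerms}&start={startIndex?}"
print(fill_opensearch_template(template, {"searchTerms": "deep web"}))
# http://example.com/search?q=deep%20web&start=
```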
Give Us Your Data For …

 Traffic via search transfer: Firefox search box
 Better presentation in search: SearchMonkey
 Hosted search: BOSS Custom
 Showing your ads: Yahoo Local + AT&T
Yahoo! Work in Search
Slides by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com)

Full version: http://www.slideshare.net/ptarjan/searchmonkey-presentation
What is SearchMonkey?

An open platform for using structured data to build more useful and relevant search results.



[Screenshots: a search result before and after SearchMonkey]
Enhanced Result: Zagat
[Screenshot with labeled parts: image, links, key/value pairs or abstract]
Infobar: Wikipedia Preview
[Screenshot with labeled parts: summary, blob]
Creating an Infobar
 Infobar advantages
   Annotate someone else’s site
   Use links and images from other domains
     • Mash up info from multiple sites
     • Affiliate / coupon links? Hmmm…
   Can act on * (all websites)
     • But these apps can be annoying if poorly
       designed
 Key design principles
   Put something useful in the summary
   Be creative with the HTML
How to get data to SearchMonkey?

Humans see:
• name
• picture of a person
• current job
• industry, …

Computers see:
an undifferentiated blob of HTML

Can we make computers smarter?
How does it work?
1. Site owners/publishers share structured data with Yahoo!

2. Site owners & third-party developers build SearchMonkey apps.

3. Consumers customize their search experience with Enhanced Results or Infobars.


[Diagram: Acme.com's web pages (via page extraction or RDF/Microformat markup) and Acme.com's database (via a DataRSS feed or web services) supply data to the index]
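One of the data paths named here is microformat markup. Markup of this kind turns the "undifferentiated blob of HTML" into fields a machine can read. Below is a minimal Python sketch that uses only the standard library's `html.parser`. It pulls three hCard properties (`fn`, `title`, `org`) out of a page. The sample markup is hypothetical. Real hCard parsing has many more rules that this sketch ignores, such as nested properties, `value` class patterns, and attribute-based properties like `photo`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    PROPERTIES = {"fn", "title", "org"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._stack.append(prop)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing elements carry no text content

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing property, if any.
        for prop in reversed(self._stack):
            if prop:
                self.fields[prop] = self.fields.get(prop, "") + data
                break


def extract_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each collected value.
    return {k: " ".join(v.split()) for k, v in parser.fields.items()}


# Hypothetical profile snippet marked up with hCard class names.
page = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
        '<span class="title">Engineer</span><br> at '
        '<span class="org">Acme</span></div>')
print(extract_hcard(page))
# {'fn': 'Jane Doe', 'title': 'Engineer', 'org': 'Acme'}
```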
SearchMonkey Resources


 Main:
   http://developer.yahoo.com/searchmonkey
 Lists and forums:
   searchmonkey-developers@yahoogroups.com
   http://suggestions.yahoo.com/searchmonkey
Vik Singh (Architect)
Graham Mudd (Senior PMM)
What

BOSS = Build your Own Search Service

Open Yahoo!'s core search features via web services
to let 3rd parties revolutionize search

Unrestricted:

• Unlimited queries
• Blend, re-order, discard
• Full presentation control
• Non-search apps OK

Monetization: Free or CPM or Ads
                           Why

Barriers to entry are massive
• $300M, top talent, a prayer to get to basic parity


No monopoly over great ideas

Search anywhere
• Improve vertical quality with Web comprehensiveness
• Fragment the market, foster more players, choice, competition

Yahoo! extends its advertising reach; 3rd parties get a revenue share
[Diagram: Traditional Search Distribution vs. + BOSS Distribution]

Tracks

API
A self-service web services model for developers and start-ups to quickly build and deploy new search experiences.

CUSTOM
Working with 3rd parties to build a more relevant, brand/site-specific web search experience. This option is jointly built by Yahoo! and select partners.

ACADEMIC
Working with the following universities to allow for wide-scale research in the search field:
• UIUC
• CMU
• Stanford
• Purdue
• IIT Bombay
• MIT
• UMass




Interested in Custom? Email us at bosscustom@yahoo-inc.com
BOSS API v1

http://boss.yahooapis.com/ysearch/{vert}/v1/{q}

{vert} := {web, news, images, spelling}

Required:
appid

Optional (Y!OS compliant):
start, count, lang, region, format, callback, sites
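The URL template above can be filled in mechanically. Below is a minimal Python sketch. It assumes the query is percent-encoded into the `{q}` path segment, and that `appid` and the optional parameters are passed as ordinary query-string arguments. `YOUR_APPID` is a placeholder. The function only builds the URL and makes no request.

```python
from urllib.parse import quote, urlencode

BOSS_BASE = "http://boss.yahooapis.com/ysearch"
VERTICALS = {"web", "news", "images", "spelling"}
OPTIONAL_PARAMS = {"start", "count", "lang", "region", "format",
                   "callback", "sites"}


def build_boss_url(vert, query, appid, **optional):
    """Build a BOSS v1 request URL following the template on the slide.

    Only constructs the string; no network request is made.
    """
    if vert not in VERTICALS:
        raise ValueError(f"unknown vertical: {vert}")
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Assumption: the query is percent-encoded into the {q} path segment.
    path = f"{BOSS_BASE}/{vert}/v1/{quote(query, safe='')}"
    params = {"appid": appid, **optional}
    return f"{path}?{urlencode(params)}"


print(build_boss_url("news", "yahoo research", "YOUR_APPID", count=10))
# http://boss.yahooapis.com/ysearch/news/v1/yahoo%20research?appid=YOUR_APPID&count=10
```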
BOSS Mashup Framework

Python (v2.5+) library

BOSS Search SDK plus …

SQL for remixing arbitrary XML/JSON sources

Loosely functional programming paradigm
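The slide describes BMF as SQL for remixing XML/JSON sources in a loosely functional style. The sketch below is not BMF's actual API. It is a hypothetical plain-Python illustration of the same idea. Each parsed source is treated as a list of dicts, and small pure functions (`where`, `join`, `select`, `order_by`) are composed over those lists. The source names and fields are made up.

```python
def where(rows, predicate):
    """Keep rows for which the predicate holds (SQL WHERE)."""
    return [r for r in rows if predicate(r)]


def join(left, right, key):
    """Inner join two row lists on a shared key, using a hash index."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**row, **match} for row in left for match in index.get(row[key], [])]


def select(rows, fields):
    """Project each row onto the given fields (SQL SELECT)."""
    return [{f: r[f] for f in fields} for r in rows]


def order_by(rows, field, descending=False):
    """Return a new list sorted by one field (SQL ORDER BY)."""
    return sorted(rows, key=lambda r: r[field], reverse=descending)


# Hypothetical parsed JSON: web results and a ratings feed keyed by URL.
web = [
    {"url": "a.com", "title": "Alpha"},
    {"url": "b.com", "title": "Beta"},
    {"url": "c.com", "title": "Gamma"},
]
ratings = [{"url": "a.com", "stars": 3}, {"url": "c.com", "stars": 5}]

remix = select(
    order_by(where(join(web, ratings, "url"), lambda r: r["stars"] >= 3),
             "stars", descending=True),
    ["title", "stars"],
)
print(remix)
# [{'title': 'Gamma', 'stars': 5}, {'title': 'Alpha', 'stars': 3}]
```

None of these functions modifies its input, so intermediate results can be reused or recombined freely. That is the sense in which the style is "loosely functional".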
BMF + Google App Engine

Ported an enhanced version of BMF to the GAE platform
http://zooie.wordpress.com/2008/08/04/yahoo-boss-google-app-engine-integrated/

Easiest way to deploy a BOSS application online
               Examples


http://www.4hoursearch.com

http://123people.com

Mashable! Contest for BOSS search engines
http://mashable.com/boss/
BOSS Custom for TechCrunch
TechCrunch Network Search

   CrunchBase + Posts + Web
   Sort by time / relevance
   Enhanced results
   Domain-specific facets
   Yahoo! sponsored search
   Real-time indexing
   Special results
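The TechCrunch search blends CrunchBase entries, posts, and web results, then sorts them by time or by relevance. Below is a minimal Python sketch of such blending. It is not a description of the actual TechCrunch implementation. It assumes each result is a dict with hypothetical `url`, `score`, and `published` fields. It also assumes scores from different sources are already on a comparable scale; in practice they would need normalizing first.

```python
from datetime import date
from operator import itemgetter

SORT_KEYS = {"relevance": itemgetter("score"), "time": itemgetter("published")}


def blend(sources, sort_by="relevance"):
    """Merge several result lists, drop duplicate URLs, and sort descending.

    Keeps the highest-scoring copy of each URL; assumes scores are already
    comparable across sources.
    """
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unknown sort order: {sort_by}")
    best = {}
    for results in sources:
        for r in results:
            current = best.get(r["url"])
            if current is None or r["score"] > current["score"]:
                best[r["url"]] = r
    return sorted(best.values(), key=SORT_KEYS[sort_by], reverse=True)


# Hypothetical results from three sources.
crunchbase = [{"url": "cb/acme", "score": 0.9, "published": date(2008, 5, 1)}]
posts = [{"url": "tc/acme-raises", "score": 0.7, "published": date(2008, 11, 20)}]
web = [
    {"url": "tc/acme-raises", "score": 0.4, "published": date(2008, 11, 20)},
    {"url": "acme.com", "score": 0.8, "published": date(2007, 1, 15)},
]

print([r["url"] for r in blend([crunchbase, posts, web])])
# ['cb/acme', 'acme.com', 'tc/acme-raises']
print([r["url"] for r in blend([crunchbase, posts, web], sort_by="time")])
# ['tc/acme-raises', 'cb/acme', 'acme.com']
```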
Research Agenda
Structured Search

 Analysis of search demand
   Intent classification
   General search vs. vertical

 Incentives in data supply
 Push & real-time indexing
 Search user interface
   One box vs. multi-box
   General vs. vertical

 Deciding search transfer
   When?
   To whom?
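One way to frame the "when?" and "to whom?" questions is to score each vertical for a query and transfer only above a confidence threshold. Below is a deliberately naive Python sketch. The verticals, trigger words, and the 0.2 threshold are all hypothetical, and keyword overlap stands in for a real intent classifier.

```python
# Hypothetical trigger words per vertical; a real system would use a trained
# intent classifier and feedback data rather than keyword overlap.
TRIGGERS = {
    "flights": {"flight", "flights", "airfare", "airline"},
    "patents": {"patent", "patents", "uspto"},
    "events": {"concert", "festival", "tickets", "event"},
}


def vertical_scores(query):
    """Score each vertical by the fraction of query words that trigger it."""
    words = set(query.lower().split())
    if not words:
        return {}
    return {v: len(words & triggers) / len(words)
            for v, triggers in TRIGGERS.items()}


def decide_transfer(query, threshold=0.2):
    """Return the vertical to transfer to ("to whom?"), or None to stay in
    general search when no vertical is confident enough ("when?")."""
    scores = vertical_scores(query)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


print(decide_transfer("cheap flights to paris"))     # flights
print(decide_transfer("history of search engines"))  # None
```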
  Key Scientific Challenges
  Draft: http://research.yahoo.com/ksc



  1.   Search intent
  2.   Quality metrics
  3.   Web mining
  4.   Multilingual IR
  5.   Next-generation search
          Synthesized result pages
  6.   World knowledge

A.Z. Broder. A taxonomy of web search. SIGIR Forum 2002.
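Broder's taxonomy sorts queries into three classes. Navigational queries aim to reach a particular site, informational queries seek information, and transactional queries aim to perform a web-mediated activity. Below is a toy rule-based Python sketch of that classification. The cue words, site list, and domain pattern are hypothetical, and the rules are ours, not the paper's.

```python
import re

# Hypothetical cues; Broder's paper defines the three classes, not these rules.
TRANSACTIONAL_CUES = {"buy", "download", "order", "book", "tickets", "price"}
KNOWN_SITES = {"yahoo", "linkedin", "vkontakte", "expedia"}
DOMAIN = re.compile(r"^(www\.)?[a-z0-9-]+\.(com|org|net|ru|edu)$")


def broder_intent(query):
    """Label a query navigational, transactional, or informational."""
    words = query.lower().split()
    # Navigational: the query names a site (a domain or a known site name).
    names_site = any(DOMAIN.match(w) for w in words)
    if names_site or (len(words) == 1 and words[0] in KNOWN_SITES):
        return "navigational"
    # Transactional: the query asks to do something on the web.
    if TRANSACTIONAL_CUES & set(words):
        return "transactional"
    # Everything else is treated as a request for information.
    return "informational"


for q in ["linkedin", "vkontakte.ru", "download firefox", "deep web size"]:
    print(q, "->", broder_intent(q))
# linkedin -> navigational
# vkontakte.ru -> navigational
# download firefox -> transactional
# deep web size -> informational
```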
More Problems

 Discovery search

 Web search vs. asking people

 Event search
Thanks for your attention!



Yury Lifshits
http://yury.name
yury@yury.name

				