                         St. Petersburg CS Club
                               December 2008

             Future of Search

Yury Lifshits
Yahoo! Research

 Structured Search
 Yahoo! Work in Search
   SearchMonkey
   BOSS
 Research Agenda
Structured Search:
work in progress
  Structured Search =
  Bring structured data to search users

M.K. Bergman. The Deep Web: Surfacing Hidden Value. 2001.
Value Proposition

 Coverage
   Real-time data
   Semi-private data
 Structured queries
 Ordering and filtering results
 Straight-to-answers
User Interface: Query
   Search assist: Yahoo!
   Selector: LinkedIn, VKontakte.ru
   Multiple search buttons: Gmail
   Search tabs: Yahoo / Google
  User Interface: Results

   Federated page
   Facets
   Search transfer / search form

K.P. Yee, K. Swearingen, K. Li, M. Hearst.
Faceted metadata for image search and browsing. CHI 2003.

Fernando Diaz. Aggregation of News Content Into Web Results. WSDM 2009.
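
As an illustration of the "Facets" item above, here is a minimal sketch of how facet counts and facet filtering could be computed over a result list. The records and field names (`medium`, `year`) are invented for the example and do not come from any of the systems named in these slides.

```python
from collections import Counter

# Hypothetical result records; the fields are illustrative only.
RESULTS = [
    {"title": "Sunset over the Neva", "medium": "photo", "year": 2007},
    {"title": "Hermitage facade", "medium": "photo", "year": 2008},
    {"title": "Bridge sketch", "medium": "drawing", "year": 2008},
]


def facet_counts(results, field):
    """Count how many results carry each value of the given facet field."""
    return Counter(r[field] for r in results if field in r)


def apply_facet(results, field, value):
    """Keep only the results whose facet field equals the chosen value."""
    return [r for r in results if r.get(field) == value]


if __name__ == "__main__":
    print(facet_counts(RESULTS, "medium"))
    print([r["title"] for r in apply_facet(RESULTS, "year", 2008)])
```

The counts are what a results page would show next to each facet value; clicking a value corresponds to `apply_facet`.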

Data Supply Chain

 Atomic fact
                                     Flight, Event, Patent

 Data aggregator
         US Patents, Amadeus/Sabre flights, Upcoming.com

 Domain search
                                          Expedia, Spock

 General purpose search
                           Yahoo!, Google, Yandex, Baidu
Getting structured data

   Entity extraction
   Markup
   Feeds
   Search API (OpenSearch)
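
For the "Search API (OpenSearch)" route, a site describes its search endpoint with a URL template such as `{searchTerms}` and `{count?}`, where a trailing `?` marks an optional parameter. The sketch below fills such a template. The endpoint `example.com` and the helper name `fill_template` are assumptions made for illustration; only the template syntax follows OpenSearch.

```python
import re
from urllib.parse import quote

# Matches {name} or {name?}; the "?" marks an optional parameter.
_PARAM = re.compile(r"\{([A-Za-z:]+)(\??)\}")


def fill_template(template, **values):
    """Substitute values into an OpenSearch-style URL template."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters without a value become empty
        raise KeyError(f"missing required template parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical template for an imaginary site.
TEMPLATE = "http://example.com/search?q={searchTerms}&start={startIndex?}&n={count?}"

if __name__ == "__main__":
    print(fill_template(TEMPLATE, searchTerms="flights to Rome", count=10))
```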


 Do a search transfer
Give Us Your Data For …

 Traffic via search transfer
Firefox search box

 Better presentation in search

 Hosted search
BOSS Custom

 Showing your ads
Yahoo Local + AT&T
Yahoo! Work in Search
                         Slides by:
            Paul Tarjan, Chief Technical Monkey

Full version http://www.slideshare.net/ptarjan/searchmonkey-presentation
   What is SearchMonkey?

         an open platform for using structured data to build more
         useful and relevant search results

Before                               After
Enhanced Result: Zagat

    Image, Links, Key/Value Pairs or Abstract
Infobar: Wikipedia Preview

        Summary    Blob
Creating an Infobar
 Infobar advantages
   Annotate someone else’s site
   Use links and images from other domains
     • Mash up info from multiple sites
     • Affiliate / coupon links? Hmmm…
   Can act on *, all websites
     • But these apps can be annoying if poorly
 Key design principles
   Put something useful in the summary
   Be creative with the HTML
How to get data to SearchMonkey?

                      Humans see:
                      • name
                      • picture of a person
                      • current job
                      • industry, …

                      Computers see:
                      an undifferentiated
                      blob of HTML

                      Can we make
                      computers smarter?
      How does it work?
1.    Site owners and publishers share structured data with Yahoo!.

2.    Site owners and third-party developers build SearchMonkey apps.

3.    Consumers customize their search experience with Enhanced Results or Infobars.

Data paths into SearchMonkey:
• Web pages: page extraction, RDF/Microformat markup
• DataRSS feed
• Web services
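
To make the markup path concrete, here is a minimal sketch of turning a "blob of HTML" into named fields when the page carries microformat-style class names. It uses only the Python standard library. The class names `fn`, `title`, and `org` are hCard property names; the sample page and the class `ClassTextExtractor` are invented for this example and are not SearchMonkey's own extraction code.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class names a wanted property."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in self.wanted), None))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost open element that names a property.
        for prop in reversed(self._stack):
            if prop:
                self.fields[prop] = self.fields.get(prop, "") + data
                break


def extract_properties(html, wanted=("fn", "title", "org")):
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}


# Hypothetical profile page fragment marked up in hCard style.
SAMPLE = """
<div class="vcard">
  <span class="fn">Ada Example</span>
  <span class="title">Research Scientist</span>
  <span class="org">Example Labs</span>
</div>
"""

if __name__ == "__main__":
    print(extract_properties(SAMPLE))
```

Without the class names, the same page gives a program nothing to distinguish the person's name from the job title, which is the point of the "Computers see" slide.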
SearchMonkey Resources

 Main:
   http://developer.yahoo.com/searchmonkey
 Lists and forums:
   searchmonkey-developers@yahoogroups.com
   http://suggestions.yahoo.com/searchmonkey
Vik Singh (Architect)
Graham Mudd (Senior PMM)

BOSS = Build your Own Search Service

Open Yahoo’s core search features via web services
to let 3rd parties revolutionize Search



• Unlimited queries
• Blend, re-order, discard
• Full presentation control
• Non-search apps OK

Monetization: Free or CPM or Ads

Barriers to entry are massive
• $300M, top talent, a prayer to get to basic parity

No monopoly over great ideas

Search anywhere
• Improve Vertical Quality w/ Web comprehensiveness
• Fragment the market, foster more players, choice, competition

Yahoo extends advertising reach, 3rd parties revenue share
+ BOSS Distribution

     Traditional Search Distribution

API
A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences.

CUSTOM
Working with 3rd parties to build a more relevant, brand/site specific web search experience. This option is jointly built by Yahoo! and select partners.

ACADEMIC
Working with the following universities to allow for wide-scale research in the search field:
• UIUC
• CMU
• Stanford
• Purdue
• IIT Bombay
• MIT
• UMass

Interested in Custom? Email us bosscustom@yahoo-inc.com
                      BOSS API v1


{vert} := {web, news, images, spelling}

@ required

@ optional (Y!OS compliant)
start, count, lang, region, format, callback, sites
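
A minimal sketch of how a request URL might be assembled from the pieces listed above: the vertical, the query, and the optional parameters. The base URL and the `appid` parameter are assumptions recalled from the public BOSS v1 documentation; they do not appear in the slide text, and the service has since been retired, so the code only builds the string and makes no network call.

```python
from urllib.parse import quote, urlencode

VERTICALS = {"web", "news", "images", "spelling"}
OPTIONAL = {"start", "count", "lang", "region", "format", "callback", "sites"}

# Assumed base URL; not shown in the slide text.
BASE = "http://boss.yahooapis.com/ysearch"


def boss_url(vert, query, appid, **options):
    """Build a BOSS v1 style request URL (sketch; endpoint is an assumption)."""
    if vert not in VERTICALS:
        raise ValueError(f"unknown vertical: {vert}")
    unknown = set(options) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown optional parameters: {sorted(unknown)}")
    # "appid" is assumed to be the required parameter.
    params = {"appid": appid, **options}
    return f"{BASE}/{vert}/v1/{quote(query, safe='')}?{urlencode(params)}"


if __name__ == "__main__":
    print(boss_url("web", "structured search", "YOUR_APP_ID", count=5, format="json"))
```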
                   BOSS Mashup Framework

Python (v2.5+) library

BOSS Search SDK plus …

SQL for remixing arbitrary XML/JSON sources

Loosely Functional programming paradigm
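
To illustrate the idea of "SQL for remixing" JSON sources, the sketch below loads two small JSON lists into an in-memory SQLite database and joins them. This is not the BOSS Mashup Framework API, which is not reproduced here. It uses Python's standard `sqlite3` module as a stand-in, and both data sources are invented.

```python
import json
import sqlite3

# Hypothetical sources: search results and per-URL vote counts from another site.
SEARCH_JSON = """[
  {"url": "http://a.example/", "title": "A"},
  {"url": "http://b.example/", "title": "B"},
  {"url": "http://c.example/", "title": "C"}
]"""
VOTES_JSON = """[
  {"url": "http://b.example/", "votes": 12},
  {"url": "http://a.example/", "votes": 3}
]"""


def remix(search_json, votes_json):
    """Join search results with vote counts and order them by votes."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE search (url TEXT, title TEXT)")
    db.execute("CREATE TABLE votes (url TEXT, votes INTEGER)")
    db.executemany("INSERT INTO search VALUES (:url, :title)", json.loads(search_json))
    db.executemany("INSERT INTO votes VALUES (:url, :votes)", json.loads(votes_json))
    rows = db.execute(
        "SELECT s.title, v.votes FROM search s "
        "JOIN votes v ON s.url = v.url ORDER BY v.votes DESC"
    ).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    print(remix(SEARCH_JSON, VOTES_JSON))
```

The join drops results that have no votes and re-orders the rest, which matches the "blend, re-order, discard" freedoms listed for BOSS.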
                              BMF + Google App Engine

Ported enhanced version of BMF to GAE platform

Easiest way to deploy a BOSS application online



Mashable! Contest for BOSS search engines
BOSS Custom for TechCrunch
TechCrunch Network Search

   CrunchBase + Posts + Web
   Sort by time / relevance
   Enhanced results
   Domain-specific facets
   Yahoo! sponsored search
   Real-time indexing
   Special results
Research Agenda
Structured Search

 Analysis of search demand
   Intent classification
   General search vs. vertical

 Incentives in data supply
 Push & real-time indexing
 Search user interface
   One box vs. multi-box
   General vs. vertical

 Deciding search transfer
   When?
   To whom?
  Key Scientific Challenges
  Draft: http://research.yahoo.com/ksc

  1.   Search intent
  2.   Quality metrics
  3.   Web mining
  4.   Multilingual IR
  5.   Nextgen search
          Synthesized result pages
  6. World knowledge

A.Z. Broder. Taxonomy of web search. SIGIR 2002.
More Problems

 Discovery search

 Web search vs. asking people

 Event search
Thanks for your attention!

Yury Lifshits
