Data sets _Test collections or TCs_ Project started in ... - CLEF 2012

					           NTCIR: NII Testbeds and Community for Information access Research
                                               Research Infrastructure for Evaluating IA
          A series of evaluation workshops designed to enhance
 research in information-access technologies by providing an
 infrastructure for large-scale evaluations.
  ■Data sets, evaluation methodologies, and forum

Project started in late 1997
  Once every 18 months

Data sets (Test collections or TCs)
  Scientific, news, patents, web, CQA, Wikipedia, Entrance Exams
  Chinese, Korean, Japanese, and English

[Figure: bar chart for NTCIR-1 through NTCIR-9 showing # of countries, # of active participant groups, and # of registered groups; horizontal axis from 0 to 150]

Tasks (Research Areas)
 IR: Cross-lingual tasks, patents, web, Geo
 QA: Monolingual tasks, cross-lingual tasks
 Summarization, trend info., patent maps
 Opinion analysis, text mining
Community-based Research Activities
                                          Tasks at Past NTCIRs

[Figure: matrix marking which tasks ran at each past NTCIR]
Task categories: User Generated Contents; Module-Based; IR for Focused Domain; Question Answering; Extraction; Semantic; Summarization / Consolidation; Interactive; Web; Crosslingual; Text Retrieval
Tasks: Community QA; Opinion Analysis; Inference + QA; Cross-Lingual QA + IR; Entrance Exam; Spoken Doc IR; Geo Temporal; Patent; Complex/Any Types; Dialog; Cross-Lingual; Factoid, List; Cross Link Discovery; Inference + QA; Text Mining / Classification; Trend Info Visualization; Text Summarization; Visualization + IIR; Intent Mining; Quality of SERP; Web; Statistical MT; Cross-Lingual IR; Non-English Search; Ad Hoc IR, IR for QA
The years shown are those in which the meetings were held. The tasks started 18 months before.


        NTCIR-9/10: Objectives
• Solid foundation
  – New structure
• Task diversity
  – Covers a wide context in Information Access
  – Studies rich media types
• Community-led task organisation
  – Sustainability of research
  – Seed Funding
• Promotion of research resources
  – Show case in the NTCIR Meeting

NTCIR: A Contest of Strength for Search Systems




       http://research.nii.ac.jp/ntcir/
             NTCIR-10: Structure

• General Co-Chairs
   – Tsuneaki Kato (U Tokyo)
   – Noriko Kando (NII)
   – Douglas W. Oard (University of Maryland)
   – Mark Sanderson (RMIT)
• Program Co-Chairs
   – Hideo Joho (U Tsukuba)
   – Tetsuya Sakai (MSRA)
• Task Organizers
   – 48 researchers all over the world
   – Participants (you!)
• EVIA 2013 Co-Chairs
   – Ruihua Song (MSRA)
   – William Webber (University of Maryland)

http://research.nii.ac.jp/ntcir/ntcir-10/organizers.html
      NTCIR-10 Program Committee
• Charles Clarke (University of Waterloo, Canada)
• Kalervo Järvelin (University of Tampere, Finland)
• Hideo Joho (Co-chair, University of Tsukuba, Japan)
• Gareth Jones (Dublin City University, Ireland)
• Noriko Kando (NII, Japan)
• Tsuneaki Kato (The University of Tokyo, Japan)
• Douglas W. Oard (University of Maryland, US)
• Tetsuya Sakai (Co-chair, Microsoft Research Asia, PRC)
• Mark Sanderson (RMIT, Australia)
• Ian Soboroff (NIST, US)
                  NTCIR-9 Tasks
CORE TASKS
• [Intent] Intent (with One-Click Access)
   – Subtask 1 CLICK -> CORE
• [RITE] Recognizing Inference in Text
• [GeoTime] Geotemporal information retrieval (x)
• [SpokenDoc] IR for Spoken Documents
PILOT TASKS
• [CrossLink] Cross-lingual Link Discovery -> Core
• [Vis-Ex] Interactive Visual Exploration (x)
• [PatentMT] Patent Machine Translation -> Core
         NTCIR-10 Accepted Tasks
Core
• [Intent-2] Search intent and diversification
• [1Click-2] One-Click Access
• [RITE-2] Recognizing Inference in Text
• [SpokenDoc-2] IR for Spoken Documents
• [PatentMT-2] Cross-lingual access to Patent Docs
• [CrossLink-2] Cross-lingual Link Discovery
Pilot
• [Math] Access to mathematical contents

                          Cross-lingual Link Discovery
Article: Australia
…
Ranked third in the Index of Economic Freedom (2010),[178] Australia is the world's thirteenth largest economy and has the ninth highest per capita GDP; higher than that of the United Kingdom, Germany, France, Canada, Japan, and the United States. The country was ranked second in the United Nations 2010 Human Development Index and first in Legatum's 2008 Prosperity Index.[179] All of Australia's major cities fare well in global comparative livability surveys;[180] Melbourne reached first place on The Economist's 2011 World's Most Livable Cities list, followed by Sydney, Perth, and Adelaide in sixth, eighth, and ninth place respectively.[181] Total government debt in Australia is about $190 billion.[182] Australia has among the highest house prices and some of the highest household debt levels in the world.
…

No link was created for this term. To find articles in the languages we prefer, traditionally we do: Search, Translate
  Links in other languages?
  New articles?
  Missing links?
  Not what we are looking for?
  What about other relevant links?

Cross-lingual Link Discovery
  The Economist …  ->  经济学人 … / 이코노미스트 … / エコノミスト …

Cross-lingual Links
  New Links
  Better Links
  More options

How to automatically create cross-lingual links for a document if no links exist yet?



   •All about multi-lingual knowledge discovery in knowledge bases (e.g. Wikipedia)
   •All about easy and efficient information access
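
A minimal sketch of the link-creation question on this slide, under stated assumptions: the title table, the function name `suggest_cross_lingual_links`, and the exact-match rule are hypothetical illustrations, not the CrossLink task's method. It proposes target-language articles for any known anchor term found in a passage.

```python
from typing import Dict, List, Tuple

# Hypothetical title table: English anchor term -> {language code: target article title}.
# A real system would derive such mappings from a knowledge base such as Wikipedia.
TITLE_TABLE: Dict[str, Dict[str, str]] = {
    "The Economist": {"zh": "经济学人", "ko": "이코노미스트", "ja": "エコノミスト"},
}


def suggest_cross_lingual_links(
    text: str, table: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (anchor, language, target title) for every known anchor found in text."""
    suggestions = []
    for anchor, targets in table.items():
        if anchor in text:  # naive exact-match anchor detection (an assumption)
            for lang, title in sorted(targets.items()):
                suggestions.append((anchor, lang, title))
    return suggestions


article = (
    "Melbourne reached first place on The Economist's 2011 "
    "World's Most Livable Cities list."
)
links = suggest_cross_lingual_links(article, TITLE_TABLE)
for anchor, lang, title in links:
    print(f"{anchor} -> [{lang}] {title}")
```

Exact string matching is only a placeholder; anchor detection and target ranking are the hard parts that the task evaluates.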
INTENT-2: Underspecified Query
  Example query: Harry Potter
Search Results Diversification
  Produce a search engine result page (SERP) that satisfies the various users' intents behind an underspecified query
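
To make the idea of diversification concrete, here is a minimal sketch of one generic approach, greedy intent coverage. It is not the INTENT task's evaluation method or any participant's system; the intent probabilities, document labels, and the function name `diversify` are all hypothetical.

```python
from typing import Dict, List, Set


def diversify(
    doc_intents: Dict[str, Set[str]],
    intent_prob: Dict[str, float],
    k: int,
) -> List[str]:
    """Greedily pick up to k documents, each time maximizing newly covered intent probability."""
    covered: Set[str] = set()
    ranking: List[str] = []
    candidates = dict(doc_intents)
    while candidates and len(ranking) < k:
        # Marginal gain = total probability of the intents this document newly covers.
        def gain(doc: str) -> float:
            return sum(intent_prob.get(i, 0.0) for i in candidates[doc] - covered)

        # Iterate over sorted keys so ties are broken alphabetically.
        best = max(sorted(candidates), key=gain)
        ranking.append(best)
        covered |= candidates.pop(best)
    return ranking


# Hypothetical intents behind the query "Harry Potter" and the documents covering them.
intent_prob = {"books": 0.6, "films": 0.25, "games": 0.15}
doc_intents = {
    "d1": {"books"},
    "d2": {"books"},
    "d3": {"films"},
    "d4": {"games", "films"},
}
serp = diversify(doc_intents, intent_prob, k=3)
print(serp)
```

With these made-up inputs the second slot goes to `d4` rather than the redundant `d2`, because `d4` covers two intents that `d1` left unsatisfied.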
       INTENT-1 & INTENT-2
INTENT-1@NTCIR-9: INTENT-1 topics, INTENT-1 systems
INTENT-2@NTCIR-10: INTENT-2 topics, INTENT-2 systems
• INTENT-1 system tests on INTENT-2 topics (test the relationship of the two topic sets)
• INTENT-2 system tests on INTENT-1 topics (show improvements from INTENT-1)
          Traditional Search
     = More-than-One Click Access
Query: 湘南厚木病院 (Shonan Atsugi Hospital)
• Enter query
• Click SEARCH button
• Scan ranked list of URLs
• Click URL
• Read URL contents
• Get all desired information
                   One Click Access
Query: 湘南厚木病院 (Shonan Atsugi Hospital)
• Enter query
• Click SEARCH button
• The system outputs X-string:
    Phone: 046-223-3636. Fax: 046-223-3630. Address: 118-1
    Nurumizu, Atsugi, 243-8551. Email: soumu@shonan-atsugi.jp.
    Visiting hours: general ward Mon-Fri 15-20;
    Sat&Holidays 13-20 / Intensive Care Unit (ICU) 11-11:30,
    15:30, 19-19:30.
• Get all desired information

Particularly important for mobile search
Go beyond the "ten-blue-link" paradigm in Web search
                   NTCIR-10 RITE-2 Task
                (Recognizing Inference in TExt)

Yotaro Watanabe (1), Junta Mizuno (1), Yusuke Miyao (2), Tomohide Shibata (3), Cheng-Wei Lee (4), Chuan-Jie Lin (5), Shuming Shi (6), Hiroshi Kanayama (7), Koichi Takeda (7), Hideki Shima (8), Teruko Mitamura (8)

(1) Tohoku University  (2) National Institute of Informatics  (3) Kyoto University  (4) Academia Sinica  (5) National Taiwan Ocean University  (6) Microsoft Research Asia  (7) IBM Research  (8) Carnegie Mellon University

              NTCIR-10 Kick-off Event, March 8th, 2012
Three Subtasks in NTCIR-9 RITE
Binary-class subtask
• Given a text pair <t1, t2>, your system detects whether t1 entails a
  hypothesis t2 or not
Multi-class (5-way) subtask
• Given a text pair <t1, t2>, your system detects whether t1 and t2
    – Have an entailment relation: t1 → t2 (forward) / t1 ← t2 (reverse) / t1 ↔ t2 (bidirectional)
    – Do not have an entailment relation: contradiction / unknown (cannot be
      determined)
RITE4QA subtask
• Same as the binary-class subtask, but in a Question Answering
  scenario
    – t2 is a question converted to affirmative statement with a wh-word
      replaced with an answer candidate. t1 is a sentence/paragraph
      containing the answer candidate.
    – What’s the impact of RITE on a practical dataset/task?
Entrance Exam subtask
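
As a rough illustration of the binary-class setting, the sketch below decides entailment from word overlap alone. This is a deliberately naive baseline under stated assumptions, not a RITE system: the function names and the 0.8 threshold are hypothetical, and whitespace tokenization would have to be replaced by a word segmenter for Japanese or Chinese text.

```python
def overlap_ratio(t1: str, t2: str) -> float:
    """Fraction of the distinct hypothesis (t2) words that also appear in the text (t1)."""
    words1 = set(t1.lower().split())
    words2 = set(t2.lower().split())
    if not words2:
        return 0.0
    return len(words1 & words2) / len(words2)


def entails(t1: str, t2: str, threshold: float = 0.8) -> bool:
    """Naive binary-class decision: True if enough of t2 is covered by t1."""
    return overlap_ratio(t1, t2) >= threshold


t1 = "a dog is running in the park with a ball"
t2 = "a dog is running in the park"
forward = entails(t1, t2)  # does t1 entail t2?
reverse = entails(t2, t1)  # does t2 entail t1?
print(forward, reverse)
```

Running the check in both directions, as above, is also one simple way to approximate the forward, reverse, and bidirectional labels of the multi-class subtask.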
             NTCIR-10 RITE-2
• Binary-Class (BC)
• Multi-Class (MC) 4 types
• Entrance Exam
  – BC
  – Retrieval
     • Hypothesis + Whole (Wikipedia + Text book) Corpus
                       Spoken Doc
                    Target Speech Data
• Type of speech data
  – Broadcast news speech, podcast, lecture speech, … speech
  – [Figure annotations: "Having noisier words"; "Our target"]
• Databases
  – CSJ (Corpus of Spontaneous Japanese)
       • 2,702 lecture speeches, 628 hours
  – New target! Real academic meeting lectures collection
       • Over 70 speeches from the spoken document processing
         workshops
    Spoken Document Retrieval
• Ad-hoc Information Retrieval from lecture
  speeches
• Finding the passages including the relevant
  information related to a given query topic
• Query
  – Text query
  – Spoken Query (optional)
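
The passage-finding step can be pictured with a minimal sketch that ranks transcript passages by query-term counts. Everything here is an illustrative assumption: the function name `rank_passages`, the sample passages, and the scoring rule are hypothetical, and a real system would work on speech recognizer output with a proper retrieval model.

```python
from collections import Counter
from typing import List, Tuple


def rank_passages(query: str, passages: List[str]) -> List[Tuple[int, int]]:
    """Rank passages by how many query-term occurrences they contain.

    Returns (passage index, score) pairs, best first; zero-score passages are dropped.
    """
    terms = set(query.lower().split())
    scored = []
    for idx, passage in enumerate(passages):
        counts = Counter(passage.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((idx, score))
    # Sort by descending score, then by position in the lecture.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical transcript passages from one lecture.
passages = [
    "today we introduce the acoustic model",
    "the language model predicts the next word",
    "a language model and an acoustic model are combined in decoding",
]
result = rank_passages("language model", passages)
print(result)
```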
                 Goals of PatentMT
• To develop challenging and significant practical research into patent
  machine translation.

• To investigate the performance of state-of-the-art machine translation
  systems in terms of patent translations involving Japanese, English, and
  Chinese.

• To compare the effects of different methods of patent translation by
  applying them to the same test data.

• To create publicly-available parallel corpora of patent documents and
  human evaluations of MT results for patent information processing
  research.

• To drive machine translation research, which is an important technology
  for cross-lingual access of information written in unknown languages.

• The ultimate goal is fostering scientific cooperation.
Findings of PatentMT at NTCIR-9
• SMT (statistical machine translation) was the best system for Chinese to
  English and English to Japanese patent translation.
   – This is the first time SMT has been shown to achieve quality equal to
     or better than that of top-level RBMT (rule-based machine translation)
     for English to Japanese patent translation.
   – The pre-ordering method of NTT-UT for SMT is very
     effective for English to Japanese patent translation.

• With the best system for Chinese to English patent translation,
  80% of patent sentences could be understood.

• RBMT was the best system for Japanese to English
  patent translation.
     The Goal of NTCIR-10 Math Task
 • NTCIR Math Task aims at exploring methods
   for mathematical content access through its
   task design and the construction of the
   evaluation dataset.

[ Formula ]
a mathematical relationship or rule expressed in symbols
(Oxford Dictionary)


In science, a formula is a concise way of expressing
information, or a general relationship between
quantities. (Wikipedia)
                            Math Information Access

Representations
• Embedded image (png, gif, ...)
• Character sequence (latex source)
    log(z_1)+log(z_2) == log(Z_1, Z_2)\\; z_1+z_2 \geq 0
• Web-browsable XML
    <math xmlns='http://www.w3.org/1998/Math/MathML' mathematica:form='TraditionalForm' xmlns:mathematica='http://www.wolfram.com/XML/'> <semantics> <mrow> <mrow> <mrow> <mrow> <mi> log </mi> <mo> &#8289; </mo> <mo> ( </mo> <msub> <mi> z </mi> <mn> 1 </mn> </msub>
• XML for math semantics
    <annotation-xml encoding='MathML-Content'> <apply> <ci> Condition </ci> <apply> <eq /> <apply> <plus /> <apply> <ln /> <apply> <ci> Subscript </ci> <ci> z </ci> <cn type='integer'> 1 </cn> </apply> </apply> <apply> <ln /> <apply> <ci> Subscript </ci>

Resources
• Mathematical knowledge-base and math ontology
• OpenMath
• Strict Content MathML (W3C recommendation)
• Wolfram MathWorld: 13,081 entries (Sep. 13, 2011)
• Wolfram Functions site: 307,409 formulas (Sep. 15, 2011)
• Wikipedia: 26,566 mathematics articles

Requirement
• NISTEP Policy Study: Mathematics as deserted science in Japanese S&T policy - Current situation on mathematical sciences research in major countries and need for mathematical sciences from the science in Japan (2006.5)
• Q. Is mathematics related to your research? (Strongly related / Related / Somewhat related)
  77% researchers across diversity of disciplines answered ‘YES’.
      NTCIR-9 pilot VisEX Task Outline
[Figure: task framework diagram. Labels: Organizer (Provide Framework, Provide Baseline); Participants (Submit); Browser (Log Collection); Editor; IAES Core; Log Collection; Display … etc. Func.; Info. Retrieval Engine; Documents; Experimental Tasks (Mainly by the Organizer); Laboratory Experiments; Human Subjects]

It is important to discuss the following through the workshop:
                  • I/F between an IAES core and the framework
                  • Taxonomy of process primitives
                  • Detailed design of the laboratory experiments
                 NTCIR’s Grand Challenge
[Figure: NTCIR’s Infrastructure for IA Evaluation, aiming at "Impact to real challenges in our society". Tasks shown: Intent, Vis-Ex, RITE, CrossLink, GeoTime, PatentMT, SpokenDoc. Side labels: Infrastructure; Work task, roles; Interaction; Between-document; Time; Document structure. Bottom labels: News, Web, Legal, Speech]
                 Special Issues
•   Diversified Search (IRJ)
•   RITE (TALIP)
•   LT for IA (NLPJ)
•   PATENT (IRJ)




  http://research.nii.ac.jp/ntcir/ntcir-10/
  Registration is still open
  Conference : 18-21 June 2013
  EVIA : 18 June 2013

              Thank you for your attention!
                  For further enquiries, contact the NTCIR office
                                          ntc-secretariat nii.ac.jp

Q&A



				