					                                            ACLCLP-IR2010 Workshop




Web-Scale Knowledge Discovery and Population
           from Unstructured Data


                         Heng Ji
              Computer Science Department
          Queens College and the Graduate Center
                City University of New York
                    hengji@cs.qc.cuny.edu
                      December 3, 2010


                                                                     1/55
Outline
   Motivation of Knowledge Base Population (KBP)
   KBP2010 Task Overview
   Data Annotation and Analysis
   Evaluation Metrics
   A Glance of Evaluation Results
   CUNY-BLENDER team @ KBP2010
   Discussions and Lessons
   Preview of KBP2011
       Cross-lingual (Chinese-English) KBP
       Temporal KBP




                                                    2/55
Limitations of Traditional IE/QA Tracks
   Traditional Information Extraction (IE) Evaluations (e.g.
    Message Understanding Conference /Automatic Content
    Extraction programs)
        Most IE systems operate on one document at a time; MUC-style Event
         Extraction hit the 60% "performance ceiling"
       Look back at the initial goal of IE
            Create a database of relations and events from the entire corpus
            Within-doc/Within-Sent IE was an artificial constraint to simplify the
             task and evaluation

   Traditional Question Answering (QA) Evaluations
       Limited efforts on disambiguating entities in queries
       Limited use of relation/event extraction in answer search




                                                                                 3/55
    The Goal of KBP
   Hosted by the U.S. NIST, started in 2009, supported by DOD, coordinated
     by Heng Ji and Ralph Grishman in 2010; 55 teams registered, 23 teams
     participated
   Our Goal
      Bridge IE and QA communities

      Promote research in discovering facts about entities and expanding a
        knowledge source
   What's New & Valuable
        Extraction at large scale (> 1 million documents);
        Using a representative collection (not selected for relevance);
        Cross-document entity resolution (extending the limited effort in ACE);
        Linking the facts in text to a knowledge base;
        Distant (and noisy) supervision through Infoboxes;
        Rapid adaptation to new relations;
        Support multi-lingual information fusion (KBP2011);
        Capture temporal information (KBP2011)

   All of these raise interesting and important research issues


                                                                                   4/55
Knowledge Base Population
(KBP2010) Task Overview




                            5/55
          KBP Setup




   Knowledge Base (KB)
       Attributes (a.k.a., “slots”) derived from Wikipedia
        infoboxes are used to create the reference KB

   Source Collection
       A large corpus of newswire and web documents
        (>1.3 million docs) is provided for systems to
        discover information to expand and populate KB
                                                              6/55
  Entity Linking: Create Wiki Entry?


                             NIL




Query = “James Parsons”




                                       7/55
    Entity Linking Task Definition
   Involve Three Entity Types
       Person, Geo-political, Organization
   Regular Entity Linking
       Names must be aligned to entities in the KB; can use
        Wikipedia texts
   Optional Entity linking
       Without using Wikipedia texts, can use Infobox values
   Query Example
    <query id="EL000304">
     <name>Jim Parsons</name>
     <docid>eng-NG-31-100578-11879229</docid>
    </query>


                                                                8/55
Slot Filling: Create Wiki Infoboxes?

       <query id="SF114">
         <name>Jim Parsons</name>
         <docid>eng-WL-11-174592-12943233</docid>
         <enttype>PER</enttype>
         <nodeid>E0300113</nodeid>
         <ignore>per:date_of_birth
                 per:age per:country_of_birth
                 per:city_of_birth</ignore>
        </query>

    School Attended: University of Houston




                                                    9/55
  Regular Slot Filling
                       Person                                      Organization
per:alternate_names                 per:title          org:alternate_names
per:date_of_birth                   per:member_of      org:political/religious_affiliation
per:age                             per:employee_of    org:top_members/employees
per:country_of_birth                per:religion       org:number_of_employees/members
per:stateorprovince_of_birth        per:spouse         org:members
per:city_of_birth                   per:children       org:member_of
per:origin                          per:parents        org:subsidiaries
per:date_of_death                   per:siblings       org:parents
per:country_of_death                per:other_family   org:founded_by
per:stateorprovince_of_death        per:charges        org:founded
per:city_of_death                                      org:dissolved
per:cause_of_death                                     org:country_of_headquarters
per:countries_of_residence                             org:stateorprovince_of_headquarters
per:stateorprovinces_of_residence                      org:city_of_headquarters
per:cities_of_residence                                org:shareholders
per:schools_attended                                   org:website

                                                                                       10/55
Data Annotation and Analysis




                               11/55
       Annotation Overview
Source Data collection: about 1.3 million newswire docs and 500K web docs,
a few speech transcribed docs

Entity Linking   Genre/Source                  Size (entity mentions)
   Corpus                            Person        Organization         GPE
   Training      2009 Training        627               2710            567
                 2010 Web data        500                500            500
  Evaluation       Newswire           500                500            500
                   Web data           250                250            250

 Slot Filling        Task             Source               Size (entities)
   Corpus                                             Person      Organization
  Training       Regular Task     2009 Evaluation       17              31
                                 2010 Participants      25              25
                                    2010 LDC            25              25
                 Surprise Task      2010 LDC            16              16
 Evaluation      Regular Task          LDC              50              50
                 Surprise Task         LDC              20              20

                                                                                 12/55
Entity Linking Inter-Annotator Agreement

                Annotator 1                          Annotator 2




                                       Annotator 3

Entity Type       #Total Queries   Agreement Rate     Genre        #Disagreed Queries
  Person                59            91.53%         Newswire              4
                                                     Web Text              1
Geo-political           64             87.5%         Newswire              3
                                                     Web Text              5
Organization            57            92.98%         Newswire              3
                                                     Web Text              1


                                                                                13/55
Slot Filling Human Annotation Performance
   Evaluation assessment of LDC Hand Annotation
               Performance                  P(%) R(%)        F(%)
                 All Slots                  70.14 54.06      61.06
        All except per:top-employee,        71.63    57.6    63.86
             per:member_of, per:title

   Why is the precision only 70%?
       32 responses were judged as inexact and 200 as wrong answers
        A third annotator's assessment on 20 answers marked as wrong:
           65% incorrect; 15% correct; 20% uncertain
       Some annotated answers are not explicitly stated in the document
            … some require a little world knowledge and reasoning
       Ambiguities and underspecification in the annotation guideline
       Confusion about acceptable answers
       Updates to KBP2010 annotation guideline for assessment

                                                                       14/55
    Slot Filling Annotation Bottleneck
   The overlap rates between two participant annotators in the community
     are generally lower than 30%
   Does adding more human annotators help? No




                                                                 15/55
        Can Amazon Mechanical Turk Help?
   Given a q, a and supporting context sentence, the Turker should judge
    whether the answer is Y: correct; N: incorrect; U: unsure
   Result distribution for 1690 instances:

     Useful Annotations (41.8%)        Useless Annotations (58.2%)
     Cases        Number               Cases        Number
     YYYYY        230                  YYYNN        164
     NNNNN        16                   YYYNU        165
     YYYYU        151                  NNNYY        158
     NNNNU        24                   YYNNU        171
     YYYYN        227                  YYNUU        77
     NNNNY        46                   YYUUU        17
     YYYUU        13                   NNNYU        72
     NNNUU        59                   NNYUU        57
                                       YNUUU        22
                                       YUUUU        8
                                       NNUUU        11
                                       NUUUU        1
                                       UUUUU        1


                                                                                  16/55
     Why is Annotation so hard for Non-Experts?
     Even for all-agreed cases, some annotations are
      incorrect…
     Query           Slot      Answer                     Context
                org:                      He and Tim Sullivan, Citibank's Boston
                                 Tim      area manager, said they still to plan
    Citibank    top_members    Sullivan   seek advice from activists going forward.
                /employees

                                          President George W. Bush said
International   org:            World     Saturday that a summit of world leaders
  Monetary      subsidiaries    Bank      agreed to make reforms to the World
    Fund
                                          Bank and International Monetary Fund.


     Require quality control
     Training difficulties

                                                                              17/55
Evaluation Metrics




                     18/55
Entity Linking Scoring Metric
   Micro-averaged Accuracy (official metric)
       Mean accuracy across all queries


   Macro-averaged Accuracy
       Mean accuracy across all KB entries
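
As a rough illustration of the two averages (not the official scorer), the sketch below computes micro-averaged accuracy over queries and macro-averaged accuracy over gold KB entries; the data layout and function name are assumptions.

```python
from collections import defaultdict

def micro_macro_accuracy(results):
    """results: list of (gold_kb_id, predicted_kb_id) pairs, one per query."""
    # Micro-average: mean accuracy across all queries.
    micro = sum(g == p for g, p in results) / len(results)
    # Macro-average: mean of per-KB-entry accuracies (queries grouped by gold id).
    by_entry = defaultdict(list)
    for g, p in results:
        by_entry[g].append(g == p)
    macro = sum(sum(v) / len(v) for v in by_entry.values()) / len(by_entry)
    return micro, macro

# Two queries for entry E1 (one linked correctly, one to NIL), one for E2.
print(micro_macro_accuracy([("E1", "E1"), ("E1", "NIL"), ("E2", "E2")]))
# micro = 2/3 ≈ 0.67; macro = mean(0.5, 1.0) = 0.75
```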




                                                19/55
Slot Filling Scoring Metric
   Each response is rated as correct, inexact, redundant, or
    wrong (credit only given for correct responses)
       Redundancy: (1) response vs. KB; (2) among responses: build
        equivalence class, credit only for one member of each class


   Correct = # (non-NIL system output slots judged correct)
   System = # (non-NIL system output slots)
   Reference =
      # (single-valued slots with a correct non-NIL response) +
      # (equivalence classes for all list-valued slots)

   Standard Precision, Recall, F-measure
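
A minimal sketch of the precision/recall/F computation defined above, assuming the three counts have already been produced by assessment; the function name is illustrative, not the official scorer.

```python
def slot_filling_prf(correct, system, reference):
    """correct:   # non-NIL system slot fills judged correct
    system:    # non-NIL system output slot fills
    reference: # single-valued slots with a correct non-NIL response
               + # equivalence classes for all list-valued slots"""
    precision = correct / system if system else 0.0
    recall = correct / reference if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# e.g. 30 correct fills out of 100 returned, against 80 reference slots/classes
print(slot_filling_prf(30, 100, 80))  # (0.3, 0.375, 0.333...)
```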


                                                                20/55
Evaluation Results




                     21/55
Top-10 Regular Entity Linking Systems
                       < 0.8 correlation between overall and
                          non-NIL performance




                                                             22/55
Human/System Entity Linking Comparison (subset of 200 queries)




           Average among three annotators




                                                                 23/55
Top-10 Regular Slot Filling Systems




                                      24/55
CUNY-BLENDER Team @
KBP2010




                      25/55
System Overview

   Architecture (diagram): Query → Query Expansion, consulting external KBs
    (Freebase, Wikipedia, text mining)
   Three parallel pipelines: IE, Pattern Matching, QA, each followed by
    Answer Filtering / Answer Validation
   Cross-System & Cross-Slot Reasoning
   Statistical Answer Re-ranking
   Priority-based Combination
   Inexact & Redundant Answer Removal; Answer Validation
   Answers
                                                                        26/55
IE Pipeline
      Apply ACE Cross-document IE (Ji et al., 2009)
      Mapping ACE to KBP, examples:
                 KBP 2010 slots                    ACE2005 relations/ events
    per:date_of_birth, per:country_of_birth,   event: be-born
    per:stateorprovince_of_birth,
    per:city_of_birth
    per:countries_of_residence,                relation:citizen-resident-religion-
    per:stateorprovinces_of_residence,             ethnicity
    per:cities_of_residence,per:religion
    per:school_attended                        relation:student-alum
    per:member_of                              relation:membership,
                                               relation:sports-affiliation
    per:employee_of                            relation:employment
    per:spouse, per:children, per:parents,     relation:family, event: marry,
    per:siblings, per:other_family             event:divorce
    per:charges                                event:charge-indict, event:convict

                                                                               27/55
    Pattern Learning Pipeline
   Selection of query-answer pairs from Wikipedia
    Infobox
       split into two sets
   Pattern extraction
        For each {q,a} pair, generalize patterns by entity tagging
         and regular expressions, e.g. <q> died at the age of <a>
   Pattern assessment
       Evaluate and filter based on matching rate
   Pattern matching
       Combine with coreference resolution
   Answer Filtering based on entity type checking,
    dictionary checking and dependency parsing
    constraint filtering
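
A toy sketch of the pattern-generalization and matching steps described above: a seed (query, answer) pair is abstracted into a regular-expression pattern and then applied to new sentences. Entity tagging, coreference, and the assessment/filtering steps are omitted, and all names are illustrative.

```python
import re

def generalize(seed_sentence, query, answer):
    """Turn a sentence containing a seed (query, answer) Infobox pair into a
    reusable pattern, e.g. '<q> died at the age of <a>'."""
    pattern = re.escape(seed_sentence)
    pattern = pattern.replace(re.escape(query), r"(?P<q>.+?)")
    pattern = pattern.replace(re.escape(answer), r"(?P<a>\d+)")  # age-style slot
    return re.compile(pattern)

# Learn a pattern from a seed pair ...
pat = generalize("Smith died at the age of 78", "Smith", "78")
# ... and match it against new text to propose a candidate slot fill.
m = pat.match("Jones died at the age of 91")
if m:
    print(m.group("q"), "->", m.group("a"))   # Jones -> 91
```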

                                                               28/55
    QA Pipeline
   Apply open domain QA system, OpenEphyra
    (Schlaefer et al., 2007)
   Relevance metric related to PMI and CCP
        Answer pattern probability:
          P(q, a) = P(q NEAR a), where NEAR means within the same sentence
          boundary

          R(q, a) = freq(q NEAR a) / (freq(q) × freq(a)) × #sentences

   Limited by occurrence-based confidence and recall issues (see the
     relevance-score sketch below)
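
A small sketch of the co-occurrence-based relevance score above, counting sentence-level co-occurrence of the query and a candidate answer; the corpus interface and example sentences are hypothetical.

```python
def relevance(query, answer, sentences):
    """R(q, a) = freq(q NEAR a) / (freq(q) * freq(a)) * #sentences,
    where NEAR means 'within the same sentence'."""
    freq_q = sum(query in s for s in sentences)
    freq_a = sum(answer in s for s in sentences)
    freq_near = sum(query in s and answer in s for s in sentences)
    if freq_q == 0 or freq_a == 0:
        return 0.0
    return freq_near / (freq_q * freq_a) * len(sentences)

sents = ["Jim Parsons attended the University of Houston.",
         "Jim Parsons stars in a sitcom.",
         "The University of Houston is in Texas."]
print(relevance("Jim Parsons", "University of Houston", sents))  # 1/(2*2)*3 = 0.75
```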


                                                                  29/55
More Queries and Fewer Answers
   Query Template expansion
        Generated 68 question templates for organizations and 68 for
         persons
           Who founded <org>?
           Who established <org>?
           <org> was created by who?
   Query Name expansion
       Wikipedia redirect links
   Heuristic rules for Answer Filtering
       Format validation
       Gazetteer based validation
       Regular expression based filtering
       Structured data identification and answer filtering
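
A toy sketch of the template-expansion step: slot-specific question templates are instantiated with the query name and its expansions (e.g. names gathered from Wikipedia redirect links) before being sent to the QA pipeline. The three templates shown are the examples from this slide; the dictionary layout and function name are assumptions.

```python
# Example templates for an org:founded_by-style slot (the full system used 68 per type).
TEMPLATES = {
    "org:founded_by": ["Who founded {name}?",
                       "Who established {name}?",
                       "{name} was created by who?"],
}

def expand_queries(name, slot, aliases=()):
    """Instantiate every template for the slot with the query name and its
    aliases (e.g. from Wikipedia redirect links)."""
    names = [name, *aliases]
    return [t.format(name=n) for t in TEMPLATES.get(slot, []) for n in names]

print(expand_queries("IBM", "org:founded_by",
                     aliases=["International Business Machines"]))
```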
                                                              30/55
Motivation of Statistical Re-Ranking
       Union and voting are too sensitive to the performance of
        baseline systems
           Union guarantees highest recall
              requires comparable performance
           Voting
              assumes more frequent answers are more likely true (FALSE)
           Priority-based combination
              voting with weights
              assumes system performance does not vary by slot (FALSE)
              Slot                           IE    QA       PL
              org:country_of_headquarters   75.0   15.8    16.7
              org:founded                    -     46.2      -
              per:date_of_birth             100    33.3    76.9
              per:origin                     -     22.6     40


                                                                            31/55
Statistical Re-Ranking
   Maximum Entropy (MaxEnt) based supervised re-
    ranking model to re-rank candidate answers for the
    same slot
   Features
       Baseline Confidence
       Answer Name Type
       Slot Type X System
       Number of Tokens X Slot Type
       Gazetteer constraints
       Data format
       Context sentence annotation (dependency parsing, …)
       …
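
A compact sketch of supervised re-ranking with a log-linear (MaxEnt-style) model over a few of the features listed above, using scikit-learn's LogisticRegression as a stand-in; the feature names, candidate fields, and training data are simplified assumptions rather than the team's actual implementation.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(cand):
    """cand: dict with 'system', 'slot', 'confidence', 'answer' fields."""
    return {
        "baseline_conf": cand["confidence"],
        "slot_x_system": cand["slot"] + "|" + cand["system"],
        "ntokens_x_slot": str(len(cand["answer"].split())) + "|" + cand["slot"],
    }

# Train on assessed candidates (1 = correct fill, 0 = wrong).
train = [{"system": "QA", "slot": "org:founded", "confidence": 0.4, "answer": "1976"},
         {"system": "IE", "slot": "per:title", "confidence": 0.9, "answer": "driver"}]
labels = [1, 0]
vec = DictVectorizer()
model = LogisticRegression().fit(vec.fit_transform([features(c) for c in train]), labels)

def rerank(candidates):
    """Order candidate answers for a slot by P(correct | features)."""
    scores = model.predict_proba(vec.transform([features(c) for c in candidates]))[:, 1]
    return [c for _, c in sorted(zip(scores, candidates), key=lambda sc: -sc[0])]
```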



                                                              32/55
MLN-based Cross-Slot Reasoning
   Motivation
       each slot is often dependent on other slots
        can construct new 'revertible' queries to verify candidate
         answers
        X is per:children of Y → Y is per:parents of X;
        X was born on date Y → age of X is approximately (the current
         year – Y)

   Use Markov Logic Networks (MLN) to encode
    cross-slot reasoning rules
       Heuristic inferences are highly dependent on the order of
        applying rules
        MLN can
           add a weight to each inference rule
           integrate soft rules and hard rules (a toy sketch follows below)
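
A minimal Python sketch of the kind of weighted cross-slot rules an MLN would encode, applied here only as simple forward inferences (no actual MLN learning or joint inference); the rule weights and the 2010 reference year are illustrative assumptions.

```python
# Each rule maps an observed fill (slot, query, answer) to an implied fill,
# with a weight: hard rules get a large weight, soft rules a smaller one.
RULES = [
    # X is per:children of Y  =>  Y is per:parents of X        (hard rule)
    (lambda s, q, a: ("per:parents", a, q) if s == "per:children" else None, 10.0),
    # X born in year Y  =>  per:age of X ~ current_year - Y    (soft rule)
    (lambda s, q, a: ("per:age", q, str(2010 - int(a)))
         if s == "per:date_of_birth" and a.isdigit() else None, 2.0),
]

def implied_fills(slot, query, answer):
    """Return (implied_fill, weight) pairs that can support or veto candidates."""
    results = []
    for rule, weight in RULES:
        implied = rule(slot, query, answer)
        if implied is not None:
            results.append((implied, weight))
    return results

print(implied_fills("per:children", "Julia Roberts", "Henry Daniel Moder"))
# [(('per:parents', 'Henry Daniel Moder', 'Julia Roberts'), 10.0)]
```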



                                                                    33/55
       Error Analysis on Supervised Model
   Name Error Examples
               classification errors     spurious errors

    <PER>Faisalabad</PER>'s <PER>Catholic Bishop</PER> <PER>John
       Joseph</PER>, who had been campaigning against the law, shot himself
       in the head outside a court in Sahiwal district when the judge convicted
       Christian Ayub Masih under the law in 1998.

                 missing errors
   Nominal Missing Error Examples
      supremo/shepherd/prophet/sheikh/Imam/overseer/oligarchs/Shiites
       …
   Intuitions of using lexical knowledge discovered from ngrams
        Each person has a Gender (he, she…) and is Animate (who…)

                                                                        34/55
    Motivations of Using Web-scale Ngrams
   Data is Power
       Web is one of the largest text corpora: however,
        web search is slooooow (if you have a million queries).


   N-gram data: compressed version of the web
       Already proven to be useful for language modeling
       Google N-gram: 1 trillion token corpus




                        (Ji and Lin, 2009)


                                                                  35/55
car 13966, automobile 2954, road 1892, auto 1650, traffic 1549, tragic 1480, motorcycle 1399,
boating 823, freak 733, drowning 438, vehicle 417, hunting 304, helicopter 289, skiing 281,
mining 254, train 250, airplane 236, plane 234, climbing 231, bus 208, motor 198, industrial 187
swimming 180, training 170, motorbike 155, aircraft 152, terrible 137, riding 136, bicycle 132,
diving 127, tractor 115, construction 111, farming 107, horrible 105, one-car 104, flying 103, hit-
and-run 99, similar 89, racing 89, hiking 89, truck 86, farm 81, bike 78, mine 75, carriage 73,
logging 72, unfortunate 71, railroad 71, work-related 70, snowmobile 70, mysterious 68, fishing
67, shooting 66, mountaineering 66, highway 66, single-car 63, cycling 62, air 59, boat 59,
horrific 56, sailing 55, fatal 55, workplace 50, skydiving 50, rollover 50, one-vehicle 48, <UNK>
48, work 47, single-vehicle 47, vehicular 45, kayaking 43, surfing 42, automobile 41, car 40,
electrical 39, ATV 39, railway 38, Humvee 38, skating 35, hang-gliding 35, canoeing 35, 0000
35, shuttle 34, parachuting 34, jeep 34, ski 33, bulldozer 31, aviation 30, van 30, bizarre 30,
wagon 27, two-vehicle 27, street 27, glider 26, " 25, sawmill 25, horse 25, bomb-making 25,
bicycling 25, auto 25, alcohol-related 24, snowboarding 24, motoring 24, early-morning 24,
trucking 23, elevator 22, horse-riding 22, fire 22, two-car 21, strange 20, mountain-climbing 20,
drunk-driving 20, gun 19, rail 18, snowmobiling 17, mill 17, forklift 17, biking 17, river 16,
motorcyle 16, lab 16, gliding 16, bonfire 16, apparent 15, aeroplane 15, testing 15, sledding 15,
scuba-diving 15, rock-climbing 15, rafting 15, fiery 15, scooter 14, parachute 14, four-wheeler
14, suspicious 13, rodeo 13, mountain 13, laboratory 13, flight 13, domestic 13, buggy 13,
horrific 12, violent 12, trolley 12, three-vehicle 12, tank 12, sudden 12, stupid 12, speedboat 12,
single 12, jousting 12, ferry 12, airplane 12, unrelated 11, transporter 11, tram 11, scuba 11,
common 11, canoe 11, skateboarding 10, ship 10, paragliding 10, paddock 10, moped 10,
                                                                                            36/55
          Gender Discovery from Ngrams

    Discovery Patterns (Bergsma et al., 2005, 2008)
           (tag=N.*|word=[A-Z].*) tag=CC.* (word=his|her|its|their)
           (tag=N.*|word=[A-Z].*) tag=V.* (word=his|her|its|their)
           …
     If a mention indicates male or female with high confidence,
      it's likely to be a person mention
    Patterns for candidate mentions   male   female   neutral   plural
     John Joseph bought/… his/…       32       0        0         0
           Haifa and its/…            21      19        92       15
    screenwriter published/… his/…    144     27        0         0
            it/… is/… fish            22      41       1741     1186
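
A small sketch of how the gender counts in the table above could be turned into a person/non-person decision: keep the dominant property and require it to be male or female; the minimum-count threshold is an assumption.

```python
def is_person_by_gender(counts, min_hits=10):
    """counts: n-gram pattern hits, e.g. {'male': 32, 'female': 0, 'neutral': 0, 'plural': 0}.
    Return True if the dominant property is male/female with enough evidence."""
    best = max(counts, key=counts.get)
    return counts[best] >= min_hits and best in ("male", "female")

# Counts taken from the slide's table.
print(is_person_by_gender({"male": 32, "female": 0, "neutral": 0, "plural": 0}))    # True  (John Joseph)
print(is_person_by_gender({"male": 21, "female": 19, "neutral": 92, "plural": 15})) # False (Haifa)
```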




                                                                         37/55
        Animacy Discovery from Ngrams
   Discovery Patterns
       Count the relative pronoun after nouns
       not (tag=(IN|[NJ].*) tag=[NJ].* (? (word=,))
        (word=who|which|where|when)
   If a mention indicates animacy with high confidence, it's likely
     to be a person mention
           Patterns for          Animate          Non-Animate
        candidate mentions         who     when        where    which
             supremo               24        0          0        0
            shepherd               807      24          0        56
             prophet              7372     1066         63      1141
              imam                 910      76          0        57
            oligarchs              299       13         0        28
             sheikh                338       11         0        0

                                                                        38/55
       Unsupervised Mention Detection Using
           Gender and Animacy Statistics
   Candidate mention detection
      Name: capitalized sequence of <=3 words; filter stop words,
       nationality words, dates, numbers and title words
      Nominal: un-capitalized sequence of <=3 words without stop
       words
   Margin Confidence Estimation
       (freq(best property) – freq(second best property)) / freq(second best property)
   Confidence(candidate, Male/Female/Animate) > threshold
      Full Matching: John Joseph (M:32)
      Composite Matching: Ayub (M:87) Masih (M:117)
      Relaxed Matching:
         Mahmoud (M:159 F:13) Hamadan(N:19) Salim(F:13 M:188)
        Qawasmi(M:0 F:0)
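
A sketch of the margin-confidence test and the full vs. composite matching decision above; the threshold value and the count-lookup interface are assumptions.

```python
def margin_confidence(counts):
    """(freq(best) - freq(second best)) / freq(second best)."""
    ranked = sorted(counts.values(), reverse=True)
    best, second = ranked[0], ranked[1]
    return float("inf") if second == 0 else (best - second) / second

def is_person_mention(tokens, lookup, threshold=2.0):
    """Full matching: the whole candidate string passes the confidence test.
    Composite matching: every individual token passes it."""
    full = lookup(" ".join(tokens))          # lookup returns a counts dict or None
    if full and margin_confidence(full) > threshold:
        return True
    per_token = [lookup(t) for t in tokens]
    return bool(per_token) and all(c and margin_confidence(c) > threshold
                                   for c in per_token)

# Full matching example from the slide: "John Joseph" with male count 32.
counts = {"John Joseph": {"male": 32, "female": 0, "neutral": 0, "plural": 0}}
print(is_person_mention(["John", "Joseph"], lambda s: counts.get(s)))  # True
```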

                                                                     39/55
          Mention Detection Performance

                 Methods                      P(%)    R(%)     F(%)
    Name             Supervised Model        88.24    81.08    84.51
   Mention
                  Unsupervised Methods       87.05    82.34    84.63
   Detection
                     Using Ngrams

   Nominal           Supervised Model        85.93    70.56    77.49
   Mention        Unsupervised Methods       71.20    85.18    77.57
   Detection         Using Ngrams

• Apply the parameters optimized on the dev set directly to the blind test set
• Blind test on 50 ACE05 newswire documents, 555 person name
  mentions and 900 person nominal mentions

                                                                       40/55
       Impact of Statistical Re-Ranking
                         Pipelines      Precision   Recall   F-measure
    Bottom-up          Supervised IE     0.2416     0.1421     0.1789
                     Pattern Matching    0.2186     0.3769     0.2767
    Top-down                QA           0.2668     0.1730     0.2099
        Priority based Combination       0.3048     0.2658     0.2840
    Re-Ranking based Combination         0.2797     0.4433    0.3430



    5-fold cross-validation on training set
    Mitigate the impact of errors produced by scoring based on
     co-occurrence (slot type x sys feature)
     e.g. the query “Moro National Liberation Front” and answer
      “1976” did not have a high co-occurrence, but the answer was
      bumped up by the re-ranker based on the slot type feature
      org:founded

                                                                  41/55
  Impact of Cross-Slot Reasoning


Operations          Total          Correct(%)      Incorrect(%)
 Removal             277              88%              12%
  Adding             16              100%              0%


 Brian McFadden | per:title | singers | “She had two daughter
 with one of the MK’d Westlife singers, Brian McFadden, calling
 them Molly Marie and Lilly Sue”




                                                                  42/55
Slot-Specific Analysis
   A few slots account for a large fraction of the answers:
        per:title, per:employee_of, per:member_of, and
         org:top_members/employees account for 37% of correct responses

   For a few slots, delimiting the exact answer is difficult …
     the result is 'inexact' slot fills
        per:charges, per:title (“rookie driver”; “record producer”)

   For a few slots, equivalent-answer detection is important
    to avoid redundant answers
        per:title again accounts for the largest number of cases. e.g.,
         “defense minister” and “defense chief” are equivalent.




                                                                           43/55
How much Inference is Needed?




                                44/55
Why KBP is more difficult than ACE

   Cross-sentence Inference – non-identity coreference (per:children)
       Lahoud is married to an Armenian and the couple have three children. Eldest
        son Emile Emile Lahoud was a member of parliament between 2000 and 2005.

   Cross-slot Inference (per:children)
       People Magazine has confirmed that actress Julia Roberts has given birth to her
        third child a boy named Henry Daniel Moder. Henry was born Monday in Los
         Angeles and weighed 8½ lbs. Roberts, 39, and husband Danny Moder, 38, are
        already parents to twins Hazel and Phinnaeus who were born in November
        2006.




                                                                                  45/55
Statistical Re-Ranking based Active Learning




                                               46/55
Preview of KBP2011




                     47/55
Cross-lingual Entity Linking




                    Query = “吉姆.帕森斯”




                                       48/55
Cross-lingual Slot Filling




   Other family: Todd Spiewak




               Query = “James Parsons”



                                         49/55
Cross-lingual Slot Filling
   Two Possible Strategies
       1. Entity Translation (ET) + Chinese KBP
       2. Machine Translation (MT) + English KBP
   Stimulate Research on
       Information-aware Machine Translation
       Translation-aware Information Extraction
       Foreign Language KBP, Cross-lingual Distant Learning




                                                               50/55
    Error Example of SF on MT output
   Query: Elizabeth II
   Slot type: per:cities_of_residence
   Answer: Gulf
   XIN20030511.0130.0011 | Xinhua News Agency,
    London , may 10 -according to British media ten,
    British Queen Elizabeth II did not favour in the
    Gulf region to return British unit to celebrate the
    victory in the war.




                                                      51/55
    Query Name in Document Not Translated

    Query: Celine Dion
    Answer: PER:Origin = Canada
    British singer , Clinton's plan to Caesar Palace of the ( Central
     news of UNAMIR in Los Angeles , 15th (Ta Kung Pao) -
     consider British singer , Clinton ( ELT on John ) today,
     according to the Canadian and the seats of the Matignon
     Accords , the second to Las Vegas in the international arena
     heavyweight.




                                                                    52/55
Answer in Document Not Translated
   Query: David Kelly
   Answer: per:schools_attended = Oxford University
   MT: The 59-year-old Kelly is the flea basket for trapping
    fish microbiology and internationally renowned biological
    and chemical weapons experts. He had participated in
    the United Nations Iraq weapons verification work, and
    the British Broadcasting Corporation ( BBC ) the British
    Government for the use of force on Iraq evidence the
    major sources of information. On , Kelly in the nearby
    slashed his wrist, and public opinion holds that he &quot;
    cannot endure the enormous psychological pressure
    &quot; to commit suicide.


                                                           53/55
Temporal KBP (Slot Filling)




                              54/55
Temporal KBP
   Many attributes, such as a person's title, employer, and
     spouse, change over time
      Time-stamped data is more valuable

      Distinguish static attributes and dynamic attributes

      Address the multiple answer problem


   What representation to require?
       <start_date, end_date>
           Such explicit info rarely provided
       <<earliest_start_date, latest_start_date>,
         <earliest_end_date, latest_end_date>>
           Captures wider range of information

                                                      55/55
Temporal KBP: scoring
   Score each element of 4-tuple separately,
    then combine scores
   Smoothed score to handle +∞ and -∞
   Need rules for granularity mismatches
          Year vs month vs day
   Possible Formula (constraint-based validation)
     key = <t1, t2, t3, t4>; answer = <x1, x2, x3, x4>
     if xi is judged as incorrect then S(xi) = 0
     otherwise S(xi) = (1/4) × 1 / (1 + |ti − xi| / m)



                                                         56/55
Need Cross-document Aggregation
   Query: Ali Larijani; Answer: Iran
 Doc1:

Ali Larijani had held the post for over two years but resigned after
    reportedly falling out with the hardline Ahmadinejad over the handling
    of Iran's nuclear case.
 Doc2:

The new speaker, Ali Larijani, who resigned as the country's nuclear
    negotiator in October over differences with Ahmadinejad, is a
    conservative and an ardent advocate of Iran's nuclear program, but is
    seen as more pragmatic in his approach and perhaps willing to
    engage in diplomacy with the West.




                                                                     57/55
Same Relation Repeat Over Time
   Query: Mark Buse; Answer: McCain
   Doc1: NYT_ENG_20080220.0185.LDC2009T13.sgm
    (seven years, P7Y); (2001, 2001) In his case, it was a round
    trip through the revolving door: Buse had directed McCain's
    committee staff for seven years before leaving in 2001 to lobby
    for telecommunications companies.
   Doc2:LTW_ENG_20081003.0118.LDC2009T13.sgm
    (this year, 2008) Buse returned to McCain's office this year
    as chief of staff.




                                                              58/55
Require Paraphrase Discovery
   Query: During what period was R. Nicholas Burns a member of the U.S. State Department?
   Answer: 1995-2008
   <DOCID> APW_ENG_19950112.0477.LDC2007T07 </DOCID>
    R. Nicholas Burns, a career foreign service officer in charge of Russian affairs at
    the National Security Council, is due to be named the new spokesman at the U.S.
    State Department, a senior U.S. official said Thursday.
   [APW_ENG_20070324.0924.LDC2009T13 and many other DOCS]
    The United States is "very pleased by the strength of this resolution" after two years
    of diplomacy, said R. Nicholas Burns, undersecretary for political affairs at the
    State Department.
   <DOCID> NYT_ENG_20080118.0161.LDC2009T13 </DOCID>
    R. Nicholas Burns, the country's third-ranking diplomat and Secretary of State
    Condoleezza Rice's right-hand man, is retiring for personal reasons, the State
    Department said Friday.
   <DOCID> NYT_ENG_20080302.0157.LDC2009T13 </DOCID>
    The chief U.S. negotiator, R. Nicholas Burns, who left his job on Friday, countered
    that the sanctions were all about Iran's refusal to stop enriching uranium, not about
    weapons. But that argument was a tough sell.


                                                                                      59/55
Related Work
   Extracting slots for persons and organizations (Bikel et al., 2009; Li et
    al., 2009; Artiles et al., 2008)
        Distant Learning (Mintz et al., 2009)
   Re-ranking techniques (e.g. Collins et al., 2002; Zhai et al., 2004; Ji et
    al., 2006)
   Answer validation for QA (e.g. Magnini et al., 2002; Peñas et al.,
     2007; Ravichandran et al., 2003; Huang et al., 2009)
   Inference for Slot Filling (Bikel et al., 2009; Castelli et al., 2010)




                                                                         60/55
Conclusions
   KBP has proven to be a much more challenging task than traditional IE/QA
   Brings great opportunities to stimulate research and collaboration
     across communities
   An adventure in promoting IE to web-scale processing and higher
     quality
   Encourages research on cross-document, cross-lingual IE
   Big gains from statistical re-ranking combining 3 pipelines
        Information Extraction
        Pattern Learning
        Question-Answering
   Further gains from MLN cross-slot reasoning
   Automatic profiles from SF dramatically improve EL
   Human-system combination provides efficient answer-key
    generation
      Faster, better, cheaper!




                                                                        61/55
Thank you, and join us:
http://nlp.cs.qc.cuny.edu/kbp/2010




                                     62/55

				