TDT 2002 Straw Man

TDT 2001 Workshop

November 12-13, 2001

Corpus

• TDT-4
• English and Mandarin required
• Arabic optional, but encouraged
• Provided in TREC-friendly format
• Made substantially cheaper
• Topics
• 60 new topics, as before
• “Brief” redefined to be “more useful”
• Stand-off notation standard
• So sites can provide useful annotations

Tasks

• Drop tracking
• Transition to TREC filtering
• Entry task if TREC does not pick it up
• Keep segmentation (sunset?)
• Focus on FSD and SLD
• Exploratory evaluations on clustering
• New event-based evaluation

Changes to “brief”

• Currently “brief” is “less than 10% is on topic”
• LDC does this strictly
• So 10.5% is “YES”!
• Prefer notion of whether topic is central to the story or not
• If central topic, then YES
• If mentioned in passing, then BRIEF
• Requires a SHARED-CENTRAL possibility?
• This idea requires rethinking
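
The contrast between the current rule and the proposed one can be sketched in a few lines. This is only an illustration, not LDC's procedure: the function names are hypothetical, the on-topic fraction and the centrality judgment are assumed to come from an annotator, and the "NO" label for stories with no on-topic content is an assumption not stated on the slide.

```python
def label_current(on_topic_fraction: float) -> str:
    """Current rule as described on the slide: a strict 10% cutoff."""
    if on_topic_fraction <= 0.0:
        return "NO"  # assumption: a story with no on-topic content is off topic
    if on_topic_fraction < 0.10:
        return "BRIEF"
    return "YES"


def label_proposed(mentions_topic: bool, topic_is_central: bool) -> str:
    """Proposed rule: decide on centrality rather than on a percentage."""
    if not mentions_topic:
        return "NO"  # assumption, as above
    return "YES" if topic_is_central else "BRIEF"


# A story that is 10.5% on topic is YES under the strict cutoff,
# even if the topic is only mentioned in passing.
print(label_current(0.105))          # YES
print(label_proposed(True, False))   # BRIEF
```

The sketch leaves out the SHARED-CENTRAL case raised above, since the slide poses it as an open question.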

Standoff annotation

• TDT-2 and TDT-3 distributed with:
• ASR into text
• SYSTRAN into English
• Named entity tagging
• Standardize a means for sites to provide other annotations:
• POS or parsings
• Co-references for named entities
• Time expressions with normalization
• Alternate translations
• Subject-like headings à la BBN’s tags
• …

Clustering evaluations

• Few people happy with clustering measures
• Many people unhappy with central idea of clustering
• Partitioning of corpus (single topic stories)
• No hierarchies permitted in results
• Allow exploration of new models for clustering
• Perhaps inspired by IFE Bio and IFE Arabic?
• Both systems have UMass detection running
• Or new problems based on clustering
• Linking clusters, describing their substructures,



Event-based evaluation

• Most (not all) TDT approaches would work just as well for IR filtering or even IR document retrieval
• Force exploration of TDT-specific needs
• Topic is made of events
• Seminal events and inevitable ones
• Event is something that happens somewhere at a particular time
• Who, where, when, what
• Explicitly capture components of events?

Event-based straw man

• Based on link detection
• Given two stories:
• Is the perpetrator (“who”) the same?
• Do they describe events that take place at the same location?
• …at the same time?
• Idea:
• If two stories talk about events at the same time, they’re more likely to be talking about the same event (obviously more than time needed)
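
The per-component questions above could be sketched as a comparison of two event descriptions. Everything here is a hypothetical illustration: the `EventDescription` type, the exact-match comparison, and the simple evidence score are assumptions, and a real system would presumably need co-reference resolution and normalized time expressions rather than string equality.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EventDescription:
    who: Optional[str] = None     # perpetrator or main actor, if extracted
    where: Optional[str] = None   # location, if extracted
    when: Optional[date] = None   # normalized date, if extracted


def _same(a, b) -> Optional[bool]:
    """True/False if both values are known, None if either is missing."""
    if a is None or b is None:
        return None
    if isinstance(a, str):
        return a.strip().lower() == b.strip().lower()
    return a == b


def compare_components(a: EventDescription, b: EventDescription) -> dict:
    """Answer the who / where / when questions for a pair of stories."""
    return {
        "who": _same(a.who, b.who),
        "where": _same(a.where, b.where),
        "when": _same(a.when, b.when),
    }


def link_evidence(a: EventDescription, b: EventDescription) -> float:
    """Fraction of components known to match; a time match alone scores low."""
    answers = compare_components(a, b)
    return sum(1 for v in answers.values() if v is True) / len(answers)


# Made-up example values.
story_a = EventDescription(who="Group X", where="Paris", when=date(2001, 11, 12))
story_b = EventDescription(where="paris", when=date(2001, 11, 12))
print(compare_components(story_a, story_b))
print(link_evidence(story_a, story_b))
```

Reporting the three answers separately, rather than only a combined score, matches the straw man's idea of evaluating each component of an event on its own.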

