TDT 2002 Straw Man
TDT 2001 Workshop
November 12-13, 2001
Corpus
TDT-4
English and Mandarin required
Arabic optional, but encouraged
Provided in TREC-friendly format
Made substantially cheaper
Topics
60 new topics, as before
“Brief” redefined to be “more useful”
Stand-off annotation standard
So sites can provide useful annotations
Tasks
Drop tracking
Transition to TREC filtering
Entry task if TREC does not pick it up
Keep segmentation (sunset?)
Focus on FSD and SLD
Exploratory evaluations on clustering
New event-based evaluation
Changes to “brief”
Currently “brief” means “less than 10% of the story is on topic”
LDC does this strictly
So 10.5% is “YES”!
Prefer notion of whether topic is central to the story or not
If central topic, then YES
If mentioned in passing, then BRIEF
Requires a SHARED-CENTRAL possibility?
This idea requires rethinking
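The two labeling rules on this slide can be contrasted in a short sketch. It assumes the usual YES/BRIEF/NO judgments and treats a story with no on-topic content as NO; the names `label_by_fraction`, `label_by_centrality`, and `Role` are hypothetical. The open SHARED-CENTRAL case is deliberately not modeled.

```python
from enum import Enum


class Label(Enum):
    YES = "YES"
    BRIEF = "BRIEF"
    NO = "NO"


# Current rule: "less than 10% of the story is on topic" => BRIEF.
BRIEF_THRESHOLD = 0.10


def label_by_fraction(on_topic_fraction: float) -> Label:
    """Current rule, applied strictly: the 10% boundary is a hard cutoff."""
    if on_topic_fraction <= 0.0:
        return Label.NO  # assumption: no on-topic content at all => NO
    if on_topic_fraction < BRIEF_THRESHOLD:
        return Label.BRIEF
    return Label.YES  # so 10.5% on topic is already YES


class Role(Enum):
    """Hypothetical annotator judgment of how the topic figures in the story."""
    CENTRAL = "central"
    PASSING = "passing"
    ABSENT = "absent"


def label_by_centrality(role: Role) -> Label:
    """Proposed rule: ask whether the topic is central, not how much text it gets."""
    if role is Role.CENTRAL:
        return Label.YES
    if role is Role.PASSING:
        return Label.BRIEF
    return Label.NO


if __name__ == "__main__":
    print(label_by_fraction(0.105).value)
    print(label_by_centrality(Role.PASSING).value)
```

Under the fraction rule a story that spends 10.5% of its text on the topic becomes YES even if the topic is only mentioned in passing; the centrality rule would label that story BRIEF.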
Stand-off annotation
TDT-2 and TDT-3 distributed with:
ASR into text
SYSTRAN into English
Named entity tagging
Standardize a means for sites to provide other annotations:
POS or parsings
Co-references for named entities
Time expressions with normalization
Alternate translations
Subject-like headings à la BBN’s tags
…
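A minimal sketch of what a stand-off record might look like, assuming annotations point into the unmodified story text by character offsets. The field names, layer names, and tab-separated line format are hypothetical, not a TDT or LDC standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One stand-off annotation: refers to the source text by offsets
    instead of embedding markup in it (hypothetical field names)."""
    doc_id: str
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    layer: str   # e.g. "named_entity", "pos", "timex"
    value: str   # label or normalized value for the span

    def to_line(self) -> str:
        """Serialize as one tab-separated line (hypothetical format)."""
        return f"{self.doc_id}\t{self.start}\t{self.end}\t{self.layer}\t{self.value}"


def spans_for(doc_id: str, doc_text: str,
              annotations: List[Annotation], layer: str) -> List[Tuple[str, str]]:
    """Resolve one layer's annotations for one story back to the text they cover."""
    return [(doc_text[a.start:a.end], a.value)
            for a in annotations if a.doc_id == doc_id and a.layer == layer]


if __name__ == "__main__":
    text = "The summit opened in Geneva on Monday."
    anns = [
        Annotation("story-001", 21, 27, "named_entity", "LOCATION"),
        Annotation("story-001", 31, 37, "timex", "<normalized date>"),
    ]
    print(spans_for("story-001", text, anns, "named_entity"))
    for a in anns:
        print(a.to_line())
```

Because the corpus text itself is never modified, different sites can each publish their own layers (POS, co-reference, alternate translations) against the same story and they remain compatible.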
Clustering evaluations
Few people happy with clustering measures
Many people unhappy with central idea of clustering
Partitioning of corpus (single-topic stories)
No hierarchies permitted in results
Allow exploration of new models for clustering
Perhaps inspired by IFE Bio and IFE Arabic?
Both systems have UMass detection running
Or new problems based on clustering
Linking clusters, describing their substructures,
…
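The current constraint (a flat partition of the corpus, one topic per story, no hierarchies) can be made concrete with a small validity check. This is a sketch assuming system output is a mapping from cluster id to story ids; `is_partition` is a hypothetical helper, not part of any TDT evaluation tool.

```python
from collections import Counter
from typing import Dict, Iterable, List


def is_partition(clusters: Dict[str, List[str]], story_ids: Iterable[str]) -> bool:
    """True if every corpus story lands in exactly one cluster, no cluster
    names an unknown story, and no cluster is empty."""
    counts = Counter(s for members in clusters.values() for s in members)
    corpus = set(story_ids)
    every_story_once = all(counts.get(s, 0) == 1 for s in corpus)
    no_unknown_stories = set(counts) <= corpus
    no_empty_clusters = all(clusters.values())
    return every_story_once and no_unknown_stories and no_empty_clusters


if __name__ == "__main__":
    corpus = ["s1", "s2", "s3"]
    print(is_partition({"c1": ["s1", "s2"], "c2": ["s3"]}, corpus))
    # Overlapping clusters (a story under two topics, or a parent and child
    # cluster in a hierarchy) are rejected by this constraint.
    print(is_partition({"c1": ["s1", "s2"], "c2": ["s2", "s3"]}, corpus))
```

Any of the new models suggested here (linked clusters, substructure, hierarchies) would fail this check, which is why they need different evaluation measures.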
Event-based evaluation
Most (not all) TDT approaches would work just as well for IR filtering or event IR document retrieval
Force exploration of TDT-specific needs
Topic is made of events
Seminal events and inevitable ones
Event is something that happens somewhere at a particular time
Who, where, when, what
Explicitly capture components of events?
Event-based straw man
Based on link detection
Given two stories:
Is the perpetrator (“who”) the same?
Do they describe events that take place at the same location?
…at the same time?
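One way to picture this straw man is a per-component comparison that answers each question separately. The sketch assumes some extractor has already filled in who/where/when for each story; `EventFrame`, `component_links`, and the exact-equality test are hypothetical simplifications (real systems would need name and date normalization).

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, Optional


class Match(Enum):
    SAME = "same"
    DIFFERENT = "different"
    UNKNOWN = "unknown"  # component missing from at least one story


@dataclass
class EventFrame:
    """Hypothetical extracted components of the event a story describes."""
    who: Optional[str] = None
    where: Optional[str] = None
    when: Optional[date] = None


def compare(a: object, b: object) -> Match:
    """Exact-equality comparison of one component (a simplification)."""
    if a is None or b is None:
        return Match.UNKNOWN
    return Match.SAME if a == b else Match.DIFFERENT


def component_links(s1: EventFrame, s2: EventFrame) -> Dict[str, Match]:
    """Answer the straw-man questions for one story pair, one per component."""
    return {
        "who": compare(s1.who, s2.who),
        "where": compare(s1.where, s2.where),
        "when": compare(s1.when, s2.when),
    }


if __name__ == "__main__":
    a = EventFrame(who="group X", where="Geneva", when=date(2001, 11, 12))
    b = EventFrame(where="Geneva", when=date(2001, 11, 13))
    for component, match in component_links(a, b).items():
        print(component, match.value)
```

Keeping an explicit UNKNOWN answer matters because many stories will not state every component.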
Idea:
If two stories talk about events at the same time, they’re more likely to be talking about the same event (obviously more than time is needed)
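A minimal sketch of the idea, assuming time proximity is used to weight a content-similarity score rather than to decide links on its own. The exponential decay, the 7-day `half_life_days` default, and bag-of-words cosine similarity are all hypothetical choices for illustration.

```python
import math
from collections import Counter
from datetime import date


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def time_weight(d1: date, d2: date, half_life_days: float = 7.0) -> float:
    """Hypothetical decay: 1.0 on the same day, halving every half_life_days."""
    gap = abs((d1 - d2).days)
    return 0.5 ** (gap / half_life_days)


def link_score(text1: str, date1: date, text2: str, date2: date,
               half_life_days: float = 7.0) -> float:
    """Content similarity scaled by event-time proximity. Time alone never
    links two stories: zero shared words gives a zero score."""
    sim = cosine(Counter(text1.lower().split()), Counter(text2.lower().split()))
    return sim * time_weight(date1, date2, half_life_days)


if __name__ == "__main__":
    d = date(2001, 11, 12)
    print(link_score("talks open in geneva", d, "geneva talks open", d))
    print(link_score("talks open in geneva", d, "geneva talks open", date(2001, 11, 19)))
```

The multiplicative form encodes the slide's caveat: same-time events only boost pairs that already share content.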