Domain adaptation in named entity recognition




Massimiliano Ciaramita
Yahoo! Research Barcelona, Spain

Joint work with Yasemin Altun, Jordi Atserias, Giuseppe Attardi, Olivier Chapelle, Peter Mika, Mihai Surdeanu, Hugo Zaragoza

University of Sussex, Nov. 22, 2007
The domain effect

Named entity recognition: detect entity mentions in text (news):
    State of the art: sequence models (HMMs) discriminatively trained on existing annotated data (e.g., BBN, ACE, CoNLL)
    Very accurate “in domain”, i.e., when the target data is similar to the training data
    What happens “out of domain”?
Close: Web News

ISLAMABAD, Pakistan - Former Prime Minister Benazir Bhutto will not be allowed to hold a protest procession across Pakistan because it will violate a ban on political rallies under the state of emergency, a government spokesman said Monday.
Close: Web News (tagged)

ISLAMABAD [CITY], Pakistan [COUNTRY] - Former Prime Minister [PER DESC] Benazir Bhutto [PER] will not be allowed to hold a protest procession across Pakistan [COUNTRY] because it will violate a ban on political rallies under the state [GPE DESC] of emergency, a government [ORG DESC] spokesman [PER DESC] said Monday [DATE].

Tags: CORRECT, WRONG, SUSPICIOUS
Not so close: Web News - Part 2

TOKYO - Japan's whaling fleet was set to leave port Sunday for its biggest-ever scientific whale hunt in the South Pacific, the government fisheries agency said. Greenpeace and the animal rights activist group Sea Shepherd have said they will track the South Pacific hunt. Humpback whales were hunted to near-extinction four decades ago. Japanese fisheries officials insist, however, that the animals' population has returned to a sustainable level.
Not so close: Web News - Part 2 (tagged)

TOKYO [CITY] - Japan [COUNTRY]'s whaling fleet [VEHICLE] was set to leave port [FAC DESC] Sunday [DATE] for its biggest-ever scientific whale hunt [ORG DESC] in the South Pacific [OCEAN], the government fisheries agency [ORG DESC] said. Greenpeace [ORG] and the animal [ANIMAL] rights activist group [PER DESC] Sea Shepherd [PER] have said they will track the South Pacific [OCEAN] hunt [PER DESC]. Humpback [DISEASE] whales were hunted to near-extinction four decades ago. Japanese [NATIONALITY] fisheries [WEAPON] officials [PER DESC] insist, however, that the animals [ANIMAL] population [PER DESC] has returned to a sustainable level.
Different: Yahoo! Answers

Who will win at Survivor Series?
The Hell in the Cell Match for the World Heavyweight Championship between Batista and the Undertaker. A team captained by Triple H takes on a team captained by Umaga in a classic Survivor Series elimination match. Plus a 10 diva tag team match. A triple threat match for the ECW title between CM Punk, John Morrison, and the Miz. Lance Cade and Trevor Murdoch defend the World tag team titles against Hardcore Holly and Cody Rhodes. Hornswoggle will face the Great Khali. Randy Orton will defend the WWE title against Shawn Michaels. If Orton gets disqualified, he loses the title. If HBK uses Sweet Chin Music or attempts to use it, he loses the match and will never get another shot as long as Orton is the champion. In your opinion, who will win at Survivor Series? Who will survive?
Different: Yahoo! Answers (tagged)

Who will win at Survivor Series [EVENT]? The Hell [CITY] in the Cell Match [ORG] for the World Heavyweight Championship [ORG] between Batista [ORG] and the Undertaker [PER DESC]. A team captained by Triple H [ORG] takes on a team captained by Umaga [PER] in a classic Survivor Series [EVENT] elimination match. Plus a 10 diva [PER DESC] tag team [PER DESC] match. A triple threat match for the ECW [ORG GOV] title between CM Punk [ORG], John Morrison [PER], and the Miz [VEHICLE]. Lance Cade [PER] and Trevor Murdoch [PER] defend the World [ORG] tag team titles against Hardcore Holly [PER] and Cody Rhodes [PER]. Hornswoggle [ORG] will face the Great Khali [EVENT]. Randy Orton [PER] will defend the WWE [NULL] title against Shawn Michaels [PER]. If Orton [CITY] gets disqualified, he loses the title. If HBK [ORG] uses Sweet Chin Music [PER] or attempts to use it, he loses the match and will never get another shot as long as Orton [CITY] ...
1   Domain adaptation in NLP
2   Named entity detection
3   Concluding remarks
Domain adaptation

The most typical setting in any application: a model is trained on some annotated dataset, the in-domain or source data, and applied to some out-of-domain, or target, data.

Basic assumptions:
  1   Training data and target data come from different distributions, e.g., financial news vs. Web pages (in contrast with semi-supervised learning)
  2   There is no labeled data from the target domain
  3   At least some out-of-domain data is available for evaluation, OR
  4   Direct evaluation in the context of a larger application
Motivations for domain adaptation

  1   A challenging scientific problem: human language processing, and reasoning in general, adapt effortlessly to “domain” and context shifts
  2   The most typical setting in any application... Web technology?

      http://www.nytimes.com/2007/02/09/technology/09license.html
Main domain adaptation methods in NLP

    More data: directly add data from the target domain to the standard training data: self-training, semi-supervised methods (co-training)
    Shared representations: induce feature representations shared between the target and original domain data: structural correspondence learning [Blitzer et al., 2006]
    More knowledge: enrich models with external, more general knowledge that might partially cover both training and target data: WordNet, Wikipedia, gazetteers, etc.
    Other learning approaches: revision/corrective modeling [Attardi et al., 2007], model parameter adjustment with some labeled data (see [Chelba and Acero, 2004])
Parsing, reranking and self-training

Adaptation in constituent parsing [McClosky et al., 2006]:
    Discriminative reranking [Charniak and Johnson, 2005]: train a model to re-rank the n highest-scoring trees of the base parser; better use of global features, less overfitting
    Self-training: train a parser in-domain, parse additional data, add the automatically parsed external data to the gold standard and re-train (see the sketch below)
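A minimal sketch of the self-training loop just described, in Python. The train_parser and parse callables and the corpora are hypothetical stand-ins for a real constituent parser and the WSJ/NANC data; only the overall recipe is shown, not the McClosky et al. system.

def self_train(gold_treebank, unlabeled_sentences, train_parser, parse, rounds=1):
    """Train on gold trees, parse extra raw text, and retrain on the union."""
    parser = train_parser(list(gold_treebank))        # base (in-domain) parser
    for _ in range(rounds):
        # Parse the additional unlabeled data (e.g., NANC news text)...
        auto_trees = [parse(parser, s) for s in unlabeled_sentences]
        # ...and add the automatically parsed data to the gold-standard training set.
        parser = train_parser(list(gold_treebank) + auto_trees)
    return parser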
Parsing, reranking and self-training

  PARSER        TRAIN   TEST          F-score
  Base parser   WSJ     WSJ           89.7
  +Reranker     WSJ     WSJ           91.3
  +NANC         WSJ     WSJ           92.1
  Base parser   WSJ     BROWN         83.9
  +Reranker     WSJ     BROWN         85.8
  +NANC         WSJ     BROWN         87.7
  Base parser   WSJ     SWITCHBOARD   74.0
  +Reranker     WSJ     SWITCHBOARD   75.9
  +NANC         WSJ     SWITCHBOARD   77.0
Parsing, reranking and self-training

    Limitations: self-training is not always effective: Charniak (1997), Steedman et al. (2003), and Clark et al. (2003) for PoS tagging; NER (ongoing work)
    Related work: Ratnaparkhi (1999), Hwa (1999), Gildea (2001), Lease and Charniak (2005), Dasgupta et al. (2001), Bacchiani et al. (2006)
Structural correspondence learning

General idea: build a feature representation shared between domains:
    SCL [Blitzer et al., 2006]: use unlabeled data to identify frequently occurring pivot features in both domains:
        Pivot features are extended to represent correlations to the other features (a matrix W)
        Singular value decomposition (SVD) on W
    Related work: [Ando, 2004], use SVD on in- and out-of-domain features to induce a shared feature representation for tagging and chunking.
Structural correspondence learning

    NER and PoS tagging:
        NER from WSJ to CNS: 65.3% to 76.2% [Ando]
        PoS from WSJ to Brown: 94.7% to 95.6% [Ando]
        PoS from WSJ to MEDLINE: 87.9% to 88.9% [SCL]
    Sentiment analysis: positive results on online reviews from different domains (DVDs, books, electronics, kitchen) [SCL]
    Dependency parsing: not useful (yet) on the CoNLL shared task on domain adaptation [Dredze et al., 2007]
Adaptation in dependency parsing: CoNLL 2007

[Figure omitted]
Adaptation as revision

    Train:
      1   Train a model in the standard way
      2   Use the standard model to re-annotate the training data (by cross-validation, or with a weaker model)
      3   Train a revision model on the gold-standard and predicted labels of the training data
    Prediction:
      1   Predict an outcome with the standard model
      2   Revise the base prediction with the revision model
    Some initial promising results on the CoNLL 2007 domain adaptation track [Attardi et al., 2007]
    Two independent systems for the first and second processing steps; the advantages of reranking without being limited to the n best hypotheses (a sketch of the two-stage recipe follows)
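A minimal sketch of the two-stage revision recipe, in Python. The train and predict callables and the feature encoding are hypothetical placeholders; in practice step 2 would re-annotate by cross-validation (or with a weaker model) rather than with the model trained on the very same data.

def train_with_revision(X, y_gold, train, predict):
    base_model = train(X, y_gold)                       # 1. standard training
    y_base = [predict(base_model, x) for x in X]        # 2. re-annotate the training data
    # 3. the revision model sees the original input plus the base prediction
    X_rev = [x + [("base_prediction", yb)] for x, yb in zip(X, y_base)]
    revision_model = train(X_rev, y_gold)
    return base_model, revision_model

def predict_with_revision(x, base_model, revision_model, predict):
    y_base = predict(base_model, x)                      # first-step prediction
    return predict(revision_model, x + [("base_prediction", y_base)])  # revised output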
Wrong edge revision

[Figure omitted]
Named entity detection

    CoNLL tagset: Person, Location, Organization, Miscellaneous
    Goal: improve non-standard retrieval tasks, e.g., non-document retrieval (Answers, Blogs, Ads, etc.); NER/SRL can improve QA retrieval [Bilotti et al., 2007]
    Domain adaptation: the quality of NER systems is strongly affected by the domain effect; e.g., [Ciaramita and Altun, 2005] report a 90% to 64% F-score degradation from CoNLL to the Wall Street Journal
    Evaluation: how can we evaluate systematically, minimizing bias and noise and avoiding manual annotation?
The WSJ-CoNLL NER Dataset

Goal: compare named entity detection models across different domains
    WSJ-BBN entity corpus (Treebank collection, 1987):
        49,199 sentences
        105 categories: 12 NE types, 9 nominal entity types, 7 numeric types
    CoNLL-Reuters 2003 NER data: from the Reuters Corpus (1996-97)
        20,744 sentences
        4 NE categories: PER, LOC, ORG, MISC
    Mapping WSJ tagset → CoNLL tagset: the WSJ corpus is re-annotated with the simpler CoNLL tagset
BBN Entity Corpus (2005)

    Supplements the Wall Street Journal Penn Treebank with annotation of a large set of entity types:
        12 named entity types (70.5%): Person, Facility, Organization, GPE, Location, Nationality, Product, Event, Work of Art, Law, Language, and Contact-Info
        9 nominal entity types (17%): Person, Facility, Organization, GPE, Product, Plant, Animal, Substance, Disease and Game
        7 numeric types (12.5%): Date, Time, Percent, Money, Quantity, Ordinal and Cardinal
    Several categories are further divided into subtypes: 105 categories; 23.5% of all tokens in the WSJ Penn Treebank have a non-null tag
WSJ → CoNLL mapping

  1   Train a WSJ-BBN tagger and annotate the CoNLL data
  2   Compare the annotated strings in both corpora and find the most frequent pairs of tags:
          E:WORK OF ART:BOOK → MISC
          E:ORGANIZATION:EDUCATIONAL → ORG
          E:LOCATION:CONTINENT → LOC
          E:PERSON → PER
  3   Remove (i.e., map to “NULL”) all “DESC” WSJ-BBN categories
  4   Manually double-check and correct: e.g., E:EVENT:HURRICANE → MISC (was PER) (see the sketch below)
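A minimal sketch of the resulting mapping step, in Python. The tag strings follow the examples on this slide; the full 105-category table and the frequency-based induction of the pairs are omitted, and the default for unlisted categories is an assumption.

BBN_TO_CONLL = {
    "E:WORK OF ART:BOOK": "MISC",
    "E:ORGANIZATION:EDUCATIONAL": "ORG",
    "E:LOCATION:CONTINENT": "LOC",
    "E:PERSON": "PER",
    "E:EVENT:HURRICANE": "MISC",   # manually corrected (was mapped to PER)
}

def map_tag(bbn_tag):
    """Map a WSJ-BBN tag to the CoNLL tagset; descriptive (DESC) tags become NULL."""
    if "DESC" in bbn_tag:
        return "NULL"
    return BBN_TO_CONLL.get(bbn_tag, "NULL")   # assumption: unlisted tags also map to NULL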
NER tagger

    Regularized perceptron-trained 1st-order Hidden Markov Model (Collins 2002):
        Viterbi decoding: linear complexity
        Generic NER features: words, lemmas, PoS tags, prefixes/suffixes, and word shapes
        Label-to-label dependencies limited to the previous tag
        One adjustable parameter, the number of epochs, tuned on development data
        ≈ 87% F-score on WSJ-BBN (105 categories)
    (Ciaramita & Altun (2006), http://sourceforge.net/projects/supersensetag)
    A sketch of the perceptron/Viterbi training loop follows.
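A minimal sketch of a perceptron-trained first-order sequence tagger with Viterbi decoding, in the spirit of Collins (2002). The feature set here is a toy stand-in (word identity, word shape, previous tag); the real tagger also uses lemmas, PoS tags and affixes, and is regularized through weight averaging, which is omitted.

from collections import defaultdict

def features(words, i, prev_tag, tag):
    """Local features at position i: emission-like and transition-like features."""
    shape = "".join("X" if c.isupper() else "d" if c.isdigit() else "x" for c in words[i])
    return [f"w={words[i]}|t={tag}", f"shape={shape}|t={tag}", f"prev={prev_tag}|t={tag}"]

def score(weights, words, i, prev_tag, tag):
    return sum(weights[f] for f in features(words, i, prev_tag, tag))

def viterbi(weights, words, tagset):
    """Exact decoding over a first-order tag chain, linear in sentence length."""
    n = len(words)
    delta = [{t: score(weights, words, 0, "<s>", t) for t in tagset}]
    back = [{}]
    for i in range(1, n):
        delta.append({}); back.append({})
        for t in tagset:
            prev = max(tagset, key=lambda p: delta[i - 1][p] + score(weights, words, i, p, t))
            delta[i][t] = delta[i - 1][prev] + score(weights, words, i, prev, t)
            back[i][t] = prev
    tags = [max(tagset, key=lambda t: delta[n - 1][t])]
    for i in range(n - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))

def train(corpus, tagset, epochs=5):
    """corpus: list of (words, gold_tags) pairs; the epoch count is the one tuned parameter."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in corpus:
            pred = viterbi(weights, words, tagset)
            if pred != gold:
                for i in range(len(words)):
                    for f in features(words, i, gold[i - 1] if i else "<s>", gold[i]):
                        weights[f] += 1.0
                    for f in features(words, i, pred[i - 1] if i else "<s>", pred[i]):
                        weights[f] -= 1.0
    return weights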
In-domain

[Figure: "Standard in-domain evaluation". F-score (y-axis, roughly 0.85 to 0.92) against training iteration (x-axis, 0 to 20) for two curves: WSJ on WSJ and CoNLL on CoNLL.]
Out-of-domain

[Figure: "Out-of-domain evaluation". F-score (y-axis, roughly 0.58 to 0.66) against training iteration (x-axis, 0 to 20) for two curves: WSJ on CoNLL and CoNLL on WSJ.]
In/out-of-domain

[Figure: "In and out domain". F-score (y-axis, roughly 0.55 to 0.95) against training iteration (x-axis, 0 to 20) for four curves: WSJ on CoNLL, CoNLL on WSJ, WSJ on WSJ, and CoNLL on CoNLL.]
Adaptation with pivot features

Several experiments with variants of Ando's method and SCL:
    Learn with pivot features:
      1   features which appear often in both the source and target domain
      2   features that are informative, as measured by the KL divergence between P(Y | Xi = 1) and P(Y | Xi = 0) (sketched below)
      3   both.
    Two strategies:
      A   select the best pivot features and discard the others
      B   include all features but weight them by their "score"
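A minimal sketch of scoring candidate pivot features, in Python: a feature's informativeness is the KL divergence between P(Y | Xi = 1) and P(Y | Xi = 0), and a simple frequency threshold stands in for "appears often in both domains". The data layout (sets of active features plus a label) is an assumption.

import math
from collections import Counter

def label_dist(labels):
    counts = Counter(labels)
    total = sum(counts.values()) or 1
    return {y: c / total for y, c in counts.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two label distributions given as {label: probability} dicts."""
    return sum(p[y] * math.log((p[y] + eps) / (q.get(y, 0.0) + eps)) for y in p if p[y] > 0)

def pivot_scores(examples, feature_names, min_count=10):
    """examples: list of (active_feature_set, label) pairs. Returns {feature: score}."""
    scores = {}
    for f in feature_names:
        on = [y for feats, y in examples if f in feats]        # labels where Xi = 1
        off = [y for feats, y in examples if f not in feats]   # labels where Xi = 0
        frequent = len(on) >= min_count and len(off) >= min_count
        scores[f] = kl_divergence(label_dist(on), label_dist(off)) if frequent else 0.0
    return scores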
Adaptation with LSA

Variants of Ando's method:
    Build a "reduced" Latent Semantic Analysis (LSA) representation on both the source and target domains (sketched below)
    Strategies:
      1   Use only the new LSA features in training and prediction
      2   Append (add) the LSA features to the original features
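A minimal sketch of the LSA variant, in Python/numpy: pool the source and target feature matrices, take a truncated SVD, and either use the projected features alone (strategy 1) or append them to the originals (strategy 2). Dense 0/1 feature matrices are assumed purely for illustration.

import numpy as np

def lsa_features(X_source, X_target, k=50, append=True):
    X = np.vstack([X_source, X_target])           # pool both domains (no labels needed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                                 # top-k right singular vectors
    Z_source, Z_target = X_source @ V_k, X_target @ V_k
    if append:                                     # strategy 2: original + LSA features
        return np.hstack([X_source, Z_source]), np.hstack([X_target, Z_target])
    return Z_source, Z_target                      # strategy 1: LSA features only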
Adaptation with SCL

    Take the pivot features to be frequent and informative
    Compute the correlation between each pivot feature and the other features and build the matrix W
    Do one of the following:
      A   Project the training data onto W to generate the new feature values
      B   As (A), but do an SVD of W first (as in SCL)
      C   As (A), but do a PCA afterwards
    Either use the new features alone or append them to the existing features. In the latter case, rescale the new features so that their average L1 norm is close to the L1 norm of the original features. (Variant (B) is sketched below.)
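A minimal sketch of variant (B), in Python/numpy, following the simplified recipe above (a correlation-based W rather than the auxiliary pivot predictors of full SCL). Dense feature matrices and the rescaling constant are assumptions.

import numpy as np

def scl_projection(X_unlabeled, pivot_idx, k=25):
    """Projection built from correlations between all features and the pivot features."""
    C = np.nan_to_num(np.corrcoef(X_unlabeled, rowvar=False))   # d x d feature correlations
    W = C[:, pivot_idx]                                          # keep only pivot columns
    U, s, Vt = np.linalg.svd(W, full_matrices=False)             # variant B: SVD of W
    return U[:, :k]                                              # d x k projection matrix

def augment(X, theta, append=True):
    Z = X @ theta                                                # shared-representation features
    if not append:
        return Z
    # Rescale so the average L1 norm of the new features matches that of the originals.
    scale = np.abs(X).sum(axis=1).mean() / (np.abs(Z).sum(axis=1).mean() + 1e-12)
    return np.hstack([X, scale * Z])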
Results

Outcome: all of these methods proved unsuccessful
    Is it possible to improve at all?
    Ben-David et al. (2007) propose that the domain adaptation loss depends on:
      1   the difference in distribution between domains
      2   the difference in labeling functions
    If the source/target difference is only due to (2), there is no hope for adaptation
    Does (2) explain all the errors?
Adding external knowledge

    External lexical knowledge can have a significant impact in domain adaptation (e.g., the SMM approach to DA of [Ciaramita and Altun, 2005])
    Simple approach: inject more general knowledge through "multilayered" gazetteer features (sketched below):
      1   WordNet: supersense features
      2   GATE: lists of trigger words, locations, first/last person names
      3   FORBES: a list of a few hundred company names
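A minimal sketch of such multilayered gazetteer features, in Python. The resource names follow the slide, but the entries shown are hypothetical toy examples and the lookup is token-based only (no multi-word matching).

GAZETTEERS = {
    "wordnet_supersense": {"spokesman": "noun.person", "agency": "noun.group"},
    "gate_locations": {"islamabad", "pakistan", "tokyo"},
    "forbes_companies": {"yahoo!", "reuters"},
}

def gazetteer_features(tokens, i):
    """Return the gazetteer feature layers that fire for the token at position i."""
    w = tokens[i].lower()
    feats = []
    supersense = GAZETTEERS["wordnet_supersense"].get(w)
    if supersense:
        feats.append(f"wn_supersense={supersense}")
    if w in GAZETTEERS["gate_locations"]:
        feats.append("gate=location")
    if w in GAZETTEERS["forbes_companies"]:
        feats.append("forbes=company")
    return feats

# Example: gazetteer_features(["ISLAMABAD", ",", "Pakistan"], 0) -> ["gate=location"]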
WSJ on WSJ + gazetteers

[Figure: "WSJ with/without gazetteers". F-score (y-axis, roughly 0.85 to 0.91) against training iteration (x-axis, 0 to 20) for WSJ-22 with gazetteers and WSJ-22 without gazetteers.]
CoNLL on CoNLL + gazetteers

[Figure: "CoNLL with/without gazetteer features". F-score (y-axis, roughly 0.85 to 0.93) against training iteration (x-axis, 0 to 20), with and without gazetteers.]
CoNLL on WSJ + gazetteers

[Figure: "CoNLL on WSJ with/without gazetteers". F-score (y-axis, roughly 0.58 to 0.70) against training iteration (x-axis, 0 to 20), with and without gazetteer features.]
WSJ on CoNLL + gazetteers

[Figure: "WSJ on CoNLL with/without gazetteer features". F-score (y-axis, roughly 0.58 to 0.67) against training iteration (x-axis, 0 to 20), with and without gazetteer features.]
Adaptation and background knowledge

    Background knowledge (world knowledge?) seems crucial in domain shifting; does it provide support for generalization?
    Shared representations alone might not provide equally effective support for this purpose: if the original features do not support generalization, the shared features won't either
    Need to re-evaluate the SCL/Ando framework on the new data...
    Can we leverage the largest and most up-to-date resource, Wikipedia, in this direction?
Adaptation via Wikipedia

    Use Wikipedia as an external knowledge base [Kazama and Torisawa, 2007]
    Sequences with matching entries in Wikipedia articles can be assigned "label" features
    "Label" features are extracted with heuristic methods from Wikipedia definition sentences (sketched below)
    Standard sequential learning (CRF)
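A minimal sketch of deriving a "label" feature from a Wikipedia definition sentence, in Python, loosely in the spirit of [Kazama and Torisawa, 2007]: take the noun phrase right after the copula of the first sentence and use its last word as the label. The lookup function and the stop-word heuristic are assumptions, not the paper's actual extraction rules.

import re

STOP = {"who", "that", "which", "of", "in", "from", "and", "for", "with", "born"}

def wikipedia_label(definition_sentence):
    """E.g., 'Benazir Bhutto was a Pakistani politician who ...' -> 'politician'."""
    tokens = re.findall(r"[a-z]+", definition_sentence.lower())
    for i, (w, nxt) in enumerate(zip(tokens, tokens[1:])):
        if w in {"is", "was", "are", "were"} and nxt in {"a", "an", "the"}:
            phrase = []
            for t in tokens[i + 2:]:
                if t in STOP:
                    break
                phrase.append(t)
            return phrase[-1] if phrase else None
    return None

def label_feature(phrase, lookup):
    """lookup maps a candidate phrase to the first sentence of its Wikipedia article (assumed)."""
    sentence = lookup(phrase)
    label = wikipedia_label(sentence) if sentence else None
    return [f"wiki_label={label}"] if label else []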
Domain adaptation in NLP

    NLP is still looking for a killer application; Web technology is an obvious target
    Systems' accuracy drops dramatically out of domain in virtually every basic NLP task: PoS tagging, parsing, NER, etc. Degradation is particularly severe on Web data
    Still limited understanding of domain adaptation and its solutions: techniques work in some cases but not in others; single solutions provide small improvements, combined solutions seem more promising/robust
    Lack of data with consistent annotations across different domains: more data from different domains is needed, rather than new training data (LDC)
Domain adaptation in NER

    Dramatic drop in accuracy (-20/30% F-score)
    Small generalization power of the learned models; learning seems mostly based on memorization
    As a single solution, additional knowledge seems more important than building shared representations via machine learning
    Shallow semantic characterizations in NER often seem inadequate and too general: do entity categories always translate to new domains? E.g., the "Survivor Series" example (ORG→TEAMS, PER→CHARACTER, etc.?)
Conclusion

    Too much focus on in-domain tasks: a limited perspective both scientifically and application-wise; several NLP methods are mature "in domain" but unreliable "out of domain".
    Are "black box" evaluations, i.e., within the context of an application and its global accuracy, a viable approach? Limitation: additional complexity...
    Domain adaptation is not a task that will be solved with one stroke: virtually every step (usually taken for granted) needs to be reconsidered, from tokenization on...
    SW1 Corpus: Wikipedia annotated with NEs, parsed, plus an entity graph: http://www.yr-bcn.es/semanticWikipedia
References

Ando, R. (2004). Exploiting unannotated corpora for tagging and chunking. In Proceedings of ACL.

Attardi, G., Chanev, A., Dell'Orletta, F., Simi, M., and Ciaramita, M. (2007). Multilingual dependency parsing and domain adaptation with DeSR. In Proceedings of the CoNLL Shared Task.

Bilotti, M., Ogilvie, P., Callan, J., and Nyberg, E. (2007). Structured retrieval for question answering. In Proceedings of SIGIR.

Blitzer, J., McDonald, R., and Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Proceedings of EMNLP.

Charniak, E. and Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of ACL.

Chelba, C. and Acero, A. (2004). Adaptation of maximum entropy capitalizer: A little data can help a lot. In Proceedings of ACL.

Ciaramita, M. and Altun, Y. (2005). Named-entity recognition in novel domains with external lexical knowledge. In Advances in Structured Learning for Text and Speech Processing (NIPS 2005).

Dredze, M., Blitzer, J., Talukdar, P., Ganchev, K., Graca, J., and Pereira, F. (2007). Frustratingly hard domain adaptation for parsing. In Proceedings of CoNLL.

Kazama, J. and Torisawa, K. (2007). Exploiting Wikipedia as external knowledge for named entity recognition. In Proceedings of EMNLP.

McClosky, D., Charniak, E., and Johnson, M. (2006). Reranking and self-training for parser adaptation. In Proceedings of COLING-ACL 2006.
Johnson [PER], Hendrick [PER] and Boone [PER] counties are in the running.
