Owen's slides - Department of Computer Science_ Columbia

Center for Computational Learning Systems
• Independent research center within the Engineering School
• NLP people at CCLS: Mona Diab, Nizar Habash, Martin Jansche,
  Rebecca Passonneau, Owen Rambow
• We are part of “The NLP Group” but not of the CS department
• What we do:
    o   Conduct research
    o   Work with Kathy and Julia
    o   Our own projects
    o   Sometimes teach
    o   Supervise students (PhD, Masters, independent studies)
• Some of us are in CEPSR, some in the Interchurch Building
• Some NLP Group meetings will take place in the Interchurch Center

CLiMB 2: Computational Linguistics for Metadata Building, phase 2

• Becky Passonneau (with University of
  Maryland)
• Interactive workbench for image
  cataloguers/indexers: Use NLP to extract
  descriptive terms from scholarly text
• Mellon Foundation
• http://www.umiacs.umd.edu/~climb/

Automated Readers Advisor, Heiskell Talking Books and Braille Library (NYPL)

• Becky Passonneau
• Replace some of librarians’ tasks in current
  over-the-phone borrowing system with
  automated dialogue system
• Use Wizard-of-Oz paradigm for data collection
• Joint project with CCNY (Esther Levin)
• http://www.cs.columbia.edu/~becky/pubs/WozVariant.ppt

Tracking Emergent Narrative Skills (TENS)

• Becky Passonneau
• Current data set: ten-year-olds retelling silent
  movies
• Develop quantitative methods to compare
  semantic and pragmatic content (e.g., adapt
  Pyramid Method for evaluating summary
  content)
• Joint project with University of Connecticut
  (Elena Levy)

Arabic NLP
• CADIM Group: Mona Diab, Nizar Habash, Owen Rambow
• Focus on Standard Arabic AND the dialects
• NLP tools for Arabic:
  o   Morphological analysis (exists)
  o   Morphological tagging (exists, best-performing)
        -  Tokenization
        -  POS tagging (best-performing)
        -  Diacritization (best-performing)
  o   Word-sense disambiguation (in progress)
  o   Sentence-boundary detection for ASR (in progress)
  o   Parsing (initial research)
  o   Named-entity recognition (joint with Fair Isaac, in progress)
  o   …

Machine Translation
• Nizar Habash
• Focus: Arabic-English MT
• Different hybrid MT approaches explored
  o   Linguistic preprocessing for statistical MT
        -  Morphological and syntactic preprocessing
  o   Adding statistical resources to rule-based MT systems
        -  Automatically extracted phrase tables combined with
           Generation-Heavy MT
• Columbia's first participation in NIST MTEval
  (2006)

Word Sense Modeling and Disambiguation

• Mona Diab
• Using corpora (including multilingual
  parallel and similar) for unsupervised
  learning
• Arabic WordNet
• Arabic PropBank

Email Summarization: Social Networks
• Aaron Harnly (PhD student) and Owen Rambow, with
  Kathy McKeown
• Study interaction between:
  o   Email-intrinsic factors
        -  Language in email (lexicon, syntax, …)
        -  Email genre
  o   Structure of dialog
        -  Threads
        -  Speech acts
  o   Relations among people
        -  Roles in organization
        -  Social networks
• Use to predict one factor from the others
• Use in high-level summaries of large amounts of
  email communication

Multilingual Metagrammars
• Owen Rambow (with University of
  Pennsylvania)
• Goal: high-level abstract representation of
  syntax of (many/all) natural languages, from
  which we can automatically generate
  grammars that can be used for NLP
• Have: Universal Grammar component and
  language-specific modules for Korean,
  German, Yiddish
• Next: Icelandic, Mainland Scandinavian,
  English, Kashmiri, …

				