          Semantic Labeling of Compound Nominalization in Chinese


              Jinglei Zhao, Hui Liu and Ruzhan Lu
                      zjl@sjtu.edu.cn

              Shanghai Jiao Tong University
Introduction

   Compound Nominalization:
    A subset of nominal compounds in which the head word is a
    nominalization, e.g., 卫星-导航 (satellite navigation).

   Properties of Chinese Nominalization:
    a. Unlike in English, a verb nominalization in Chinese has the
    same form as the verb predicate, e.g., 导航 is both the noun
    "navigation" and the verb "navigate".
    b. As in English, nominalizations retain the argument structure of
    the corresponding predicates.
Problem Definition

   Semantic Labeling of Compound Nominalization:
    Classify the semantic relation that holds between the modifier
    and the nominalization head of a compound nominalization.

    For example, for the compound nominalization 卫星-导航 “satellite
    navigation”, determine that the relation between 卫星 (satellite) and
    导航 (navigation) is manner: the navigation is performed by means of
    satellites.
Semantic Relations

   The semantic relation between a noun modifier and a verb
    nominalization head can be characterized by the semantic role
    the modifier takes with respect to the corresponding verb
    predicate.
   By analyzing a set of compound nominalizations of length two
    from a balanced corpus, we find that the semantic relations between
    a noun modifier and a verb nominalization head can be
    characterized by four coarse-grained semantic roles:
    Proto-Agent (PA), Proto-Patient (PP), Range (RA) and Manner
    (MA).
Examples of the four roles
Proposed Method to Label the Relations

1 Use relation-specific paraphrase patterns, formed from selected
  word instances (prepositions, support verbs, feature nouns and
  aspect markers), to indicate specific semantic relations.
2 Use the PMI-IR co-occurrence scores between the paraphrase patterns
  and a compound nominalization, obtained from a standard Web search
  engine, as features for a Maximum Entropy (ME) classifier.
3 Use a log transformation and the CAIM discretization algorithm to
  scale the raw PMI features and obtain better performance.
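Step 2 relies on a Maximum Entropy (ME) classifier, which is equivalent to multinomial logistic regression. The toolkit and its settings are not given here. The following is therefore only a minimal, self-contained sketch trained by batch gradient ascent with an L2 (Gaussian) prior. The class name MaxEntClassifier, the hyperparameters lr, epochs and l2, and the toy two-feature training data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MaxEntClassifier:
    """Hypothetical ME (multinomial logistic regression) classifier."""

    def __init__(self, labels: Sequence[str], n_features: int,
                 lr: float = 0.5, epochs: int = 500, l2: float = 0.01):
        self.labels = list(labels)
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        # One weight vector per class; the last weight is a bias term.
        self.w = [[0.0] * (n_features + 1) for _ in self.labels]

    def _scores(self, x: Sequence[float]) -> List[float]:
        xb = list(x) + [1.0]
        return [sum(wj * xj for wj, xj in zip(wk, xb)) for wk in self.w]

    def predict_proba(self, x: Sequence[float]) -> Dict[str, float]:
        return dict(zip(self.labels, softmax(self._scores(x))))

    def predict(self, x: Sequence[float]) -> str:
        probs = self.predict_proba(x)
        return max(probs, key=probs.get)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "MaxEntClassifier":
        n = len(X)
        for _ in range(self.epochs):
            grad = [[0.0] * len(wk) for wk in self.w]
            for x, gold in zip(X, y):
                xb = list(x) + [1.0]
                p = softmax(self._scores(x))
                for k, label in enumerate(self.labels):
                    # Log-likelihood gradient: (observed - expected) * feature value.
                    err = (1.0 if label == gold else 0.0) - p[k]
                    for j, xj in enumerate(xb):
                        grad[k][j] += err * xj
            for k, wk in enumerate(self.w):
                for j in range(len(wk)):
                    # Ascend the averaged log-likelihood minus the L2 penalty.
                    wk[j] += self.lr * (grad[k][j] / n - self.l2 * wk[j])
        return self


# Toy scaled PMI-IR features (hypothetical values), one column per template.
X = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.1, 1.8]]
y = ["PP", "PP", "MA", "MA"]
clf = MaxEntClassifier(["PP", "PA", "MA", "RA"], n_features=2).fit(X, y)
print(clf.predict([2.0, 0.0]))  # PP
```

Any off-the-shelf multinomial logistic regression would play the same role. The hand-rolled version only keeps the sketch free of dependencies.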
Paraphrase Pattern Templates
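A paraphrase pattern template pairs a relation with a surface form that has slots for the modifier x and the head y. Instantiating the template with a compound yields a phrase query for the search engine. The sketch below is illustrative only. The class PatternTemplate and every template string in HYPOTHETICAL_TEMPLATES are assumptions built from the word classes named in the method (prepositions, support verbs, feature nouns, aspect markers). They are not the authors' actual patterns. The quoted-phrase query syntax is likewise assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PatternTemplate:
    relation: str  # relation the pattern is assumed to signal: PA, PP, RA or MA
    form: str      # surface form with {x} for the modifier and {y} for the head

    def instantiate(self, x: str, y: str) -> str:
        # Quote the phrase so the search engine counts exact matches (assumed syntax).
        return '"' + self.form.format(x=x, y=y) + '"'


# Hypothetical templates, one or more per relation, for illustration only.
HYPOTHETICAL_TEMPLATES = [
    PatternTemplate("PA", "由{x}{y}"),      # preposition 由 'by', marking an agent
    PatternTemplate("PP", "对{x}进行{y}"),  # preposition 对 + support verb 进行
    PatternTemplate("PP", "{y}了{x}"),      # aspect marker 了 after the verb, object follows
    PatternTemplate("RA", "在{x}方面{y}"),  # preposition 在 + feature noun 方面 'aspect'
    PatternTemplate("MA", "用{x}{y}"),      # preposition 用 'by means of'
]


def build_queries(x: str, y: str,
                  templates: List[PatternTemplate] = HYPOTHETICAL_TEMPLATES
                  ) -> List[Tuple[str, str]]:
    """Return (relation, query) pairs for the compound nominalization x-y."""
    return [(t.relation, t.instantiate(x, y)) for t in templates]


for relation, query in build_queries("卫星", "导航"):
    print(relation, query)
```

For 卫星-导航, the MA template yields the query "用卫星导航" ("navigate by means of satellites").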
PMI-IR As Features

  Given a compound nominalization pair p(x, y) and a set of
 paraphrase pattern templates t1, t2, ..., tn, a PMI-IR score is
 computed between p and each template ti from Web search hit counts.
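The following is a minimal sketch of a PMI-IR score. It assumes the standard pointwise mutual information estimated from hit counts, in the spirit of Turney's PMI-IR: PMI(p, ti) = log2(N · hits(ti(x, y)) / (hits(p) · hits(ti))). In this formula:

- hits(ti(x, y)) counts pages that match the template instantiated with the compound.
- hits(p) counts pages that match the compound itself.
- hits(ti) counts pages that match the template's fixed words alone.
- N is an assumed index size.

The exact normalization may differ from the authors'. The function names pmi_ir and pmi_features and the zero-count policy are hypothetical.

```python
import math
from typing import List, Sequence


def pmi_ir(joint_hits: int, pair_hits: int, template_hits: int,
           total_pages: int) -> float:
    """Pointwise mutual information estimated from (hypothetical) Web hit counts.

    joint_hits    -- hits for template t_i instantiated with x and y
    pair_hits     -- hits for the compound nominalization p = x y
    template_hits -- hits for the template's fixed words alone
    total_pages   -- assumed number of indexed pages N
    """
    if min(pair_hits, template_hits, total_pages) <= 0:
        raise ValueError("marginal counts and N must be positive")
    if joint_hits == 0:
        # Unseen co-occurrence: return 0 ("no evidence"), an assumed policy.
        return 0.0
    return math.log2(total_pages * joint_hits / (pair_hits * template_hits))


def pmi_features(pair_hits: int, template_hits: Sequence[int],
                 joint_hits: Sequence[int], total_pages: int) -> List[float]:
    """One PMI-IR feature per template t_1..t_n, kept in template order."""
    return [pmi_ir(j, pair_hits, t, total_pages)
            for j, t in zip(joint_hits, template_hits)]


print(pmi_ir(10, 100, 1000, 10**6))  # log2(100), about 6.64
```

The resulting feature vector has one raw PMI-IR value per template. These raw values are what the scaling step below operates on.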
Scaling PMI Features


   Web counts are inflated, so they need to be scaled to obtain a
    good estimate of the underlying probability density function in
    the ME model.
    The two scaling procedures:
      1. Log sub-linear transformation: compresses the inflated
    feature space.
      2. CAIM discretization: discretizes an attribute into the
    smallest number of intervals while maximizing the class-attribute
    interdependency.
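The following is a minimal sketch of both scaling steps. The sign-preserving log1p form of log_scale is an assumption, since only a "log sub-linear transformation" is specified.

caim_discretize follows the greedy CAIM procedure of Kurgan and Cios:

- Candidate cut points are the midpoints between adjacent distinct values.
- The cut that maximizes CAIM = (1/n) Σr maxr² / M+r is added while it raises the global CAIM value, or while there are fewer intervals than classes.
- Here maxr is the largest single-class count in interval r, and M+r is the total count in that interval.

The function names and toy data are hypothetical. This quadratic-time version favors clarity over speed.

```python
import bisect
import math
from typing import List, Sequence


def log_scale(v: float) -> float:
    # Sub-linear compression; log1p keeps 0 at 0 and copysign keeps negative PMI negative.
    return math.copysign(math.log1p(abs(v)), v)


def caim_value(values: Sequence[float], labels: Sequence[str],
               bounds: List[float], classes: List[str]) -> float:
    """CAIM criterion for intervals [b0, b1], (b1, b2], ..., (b_{n-1}, b_n]."""
    n = len(bounds) - 1
    total = 0.0
    for r in range(n):
        lo, hi = bounds[r], bounds[r + 1]
        counts = {c: 0 for c in classes}
        for v, c in zip(values, labels):
            if lo < v <= hi or (r == 0 and v == lo):
                counts[c] += 1
        m_r = sum(counts.values())
        if m_r:
            total += max(counts.values()) ** 2 / m_r
    return total / n


def caim_discretize(values: Sequence[float], labels: Sequence[str]) -> List[float]:
    """Greedy CAIM discretization; returns the sorted interval boundaries."""
    classes = sorted(set(labels))
    distinct = sorted(set(values))
    bounds = [distinct[0], distinct[-1]]
    # Candidate cut points: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    global_caim, k = 0.0, 1
    while candidates:
        best_val, best_cut = -1.0, None
        for cut in candidates:
            val = caim_value(values, labels, sorted(bounds + [cut]), classes)
            if val > best_val:
                best_val, best_cut = val, cut
        # Accept the best cut if it improves CAIM or there are fewer intervals than classes.
        if best_val > global_caim or k < len(classes):
            bounds = sorted(bounds + [best_cut])
            candidates.remove(best_cut)
            global_caim, k = best_val, k + 1
        else:
            break
    return bounds


def to_interval(v: float, bounds: List[float]) -> int:
    """Index of the interval containing v; values outside the range are clamped."""
    n = len(bounds) - 1
    return min(max(bisect.bisect_left(bounds, v) - 1, 0), n - 1)


# Hypothetical raw PMI values for one template, with gold relations.
raw = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
gold = ["PP", "PP", "PP", "MA", "MA", "MA"]
scaled = [log_scale(v) for v in raw]
bounds = caim_discretize(scaled, gold)
print([to_interval(v, bounds) for v in scaled])  # [0, 0, 0, 1, 1, 1]
```

Boundaries are learned on training data only. to_interval then maps unseen scores into the same intervals, so test-time features stay on the training scale.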
Experiments

Dataset:

   300 compound nominalizations extracted from the Chinese
    National Corpus, annotated by two annotators with a kappa of
    87.3%.

   The proportions of the relations PP, PA, MA and RA are 45.6%, 27.7%,
    16.7% and 10%, respectively. This gives a majority-class baseline
    of 45.6%, obtained by labeling every relation as PP.
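The two dataset statistics can be checked with a short sketch:

- majority_baseline returns the share of the most frequent relation, which equals the accuracy of labeling every compound as PP.
- cohen_kappa computes Cohen's kappa, (p_o - p_e) / (1 - p_e), from two annotators' label sequences.

The kappa variant is not specified, so using Cohen's is an assumption. The annotator labels in the usage lines are invented toy data, not the authors' annotations.

```python
from collections import Counter
from typing import Dict, Sequence


def majority_baseline(proportions: Dict[str, float]) -> float:
    # Labeling every item with the most frequent relation scores that relation's share.
    return max(proportions.values())


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Relation proportions reported for the 300-compound dataset.
print(majority_baseline({"PP": 0.456, "PA": 0.277, "MA": 0.167, "RA": 0.10}))  # 0.456

# Hypothetical labels from two annotators on four items.
print(cohen_kappa(["PP", "PP", "PA", "MA"], ["PP", "PA", "PA", "MA"]))  # 7/11, about 0.636
```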
Result
Conclusions


1 The semantic relations in Chinese compound nominalizations can be
  characterized by four coarse-grained roles.
2 These verb-related roles can be classified without lexical
  resources or parsed text.
3 Scaling the PMI-IR scores improves performance on this task.
Thank You!!

				