Semantic Labeling of Compound Nominalization in Chinese

Jinglei Zhao, Hui Liu and Ruzhan Lu
email@example.com
Shanghai Jiao Tong University

Introduction

Compound Nominalization: A subset of nominal compounds in which the head word is a nominalization, e.g., 卫星-导航 (satellite navigation).

Properties of Chinese Nominalization:
a. Unlike in English, a verb nominalization in Chinese has the same form as the verb predicate, e.g., navigation 导航 = navigate 导航.
b. As in English, nominalizations retain the argument structure of the corresponding predicates.

Problem Definition

Semantic Labeling of Compound Nominalization: Classify the semantic relation that holds between the modifier and the nominalization head in a compound nominalization.

For example, for the compound nominalization 卫星-导航 "satellite navigation", determine that the relation between 卫星 (satellite) and 导航 (navigation) is manner.

Semantic Relations

The semantic relation between a noun modifier and a verb nominalization head can be characterized by the semantic role the modifier takes with respect to the corresponding verb predicate. By analyzing a set of compound nominalizations of length two from a balanced corpus, we find that these relations fall into four coarse-grained semantic roles: Proto-Agent (PA), Proto-Patient (PP), Range (RA) and Manner (MA).

[Table: Examples of the four roles]

Proposed Method to Label the Relations

1. Use relation-specific paraphrase patterns, formed from selected word instances (prepositions, support verbs, feature nouns and aspect markers), to indicate specific semantic relations.
2. Use the PMI-IR co-occurrence between the paraphrase patterns and the compound nominalization, obtained from a standard Web search engine, as the classification features of a Maximum Entropy (ME) classifier.
3. Use a log transformation and the CAIM discretization algorithm to scale the raw PMI features and obtain better performance.
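
Steps 2 and 3 of the method can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a common PMI-IR-style ratio, hits(pattern filled with the compound) / (hits(compound) × hits(pattern)), which may differ from the exact score used in this work. The hit counts, the template names t1 to t3, the smoothing constant and the helper names are all hypothetical; real counts would come from search-engine queries. CAIM discretization is not shown.

```python
import math

# Hypothetical Web hit counts (illustrative numbers only).
HITS_COMPOUND = 12000  # hits for the compound itself, e.g. 卫星导航
HITS_PATTERN = {"t1": 500000, "t2": 80000, "t3": 2000000}  # hits for each pattern's fixed words
HITS_JOINT = {"t1": 40, "t2": 900, "t3": 0}  # hits for each pattern filled with the compound


def raw_pmi_ir(joint, compound, pattern, smoothing=0.5):
    """Assumed PMI-IR-style ratio; additive smoothing keeps zero counts finite."""
    return (joint + smoothing) / ((compound + smoothing) * (pattern + smoothing))


def log_scale(score):
    """Sub-linear log transformation that compresses the inflated score range."""
    return math.log(score)


def feature_vector(hits_compound, hits_pattern, hits_joint):
    """One log-scaled PMI-IR feature per paraphrase pattern template."""
    return {
        name: log_scale(raw_pmi_ir(hits_joint[name], hits_compound, hits_pattern[name]))
        for name in hits_pattern
    }


features = feature_vector(HITS_COMPOUND, HITS_PATTERN, HITS_JOINT)
for name, value in sorted(features.items()):
    print(name, round(value, 3))
```

The resulting dictionary holds one real-valued feature per template. In the method described here, such values would then be discretized and passed to the ME classifier.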

Paraphrase Pattern Templates

PMI-IR As Features

Given a compound nominalization pair p(x, y) and a set of paraphrase pattern templates t1, t2, ..., tn, a PMI-IR score between p and each ti is computed from Web search hit counts and used as a classification feature.

Scaling PMI Features

Web counts are inflated and need to be scaled to give a good estimate of the underlying probability density function in the ME model. The two scaling procedures:
1. Log sub-linear transformation: compresses the inflated feature space.
2. CAIM discretization: discretizes an attribute into the smallest number of intervals while maximizing the class-attribute interdependency.

Experiments

Dataset: 300 compound nominalizations extracted from the Chinese National Corpus, annotated by two annotators with a kappa of 87.3%. The proportions of the relations PP, PA, MA and RA are 45.6%, 27.7%, 16.7% and 10% respectively, so labeling every compound as PP gives a baseline of 45.6% for the classification problem.

Result

Conclusions

1. Compound nominalization in Chinese can be characterized by four coarse-grained roles.
2. Verb-related roles can be classified without lexical resources or parsed text.
3. Scaling the PMI-IR scores has a positive effect on this task.

Thank You!!