Morphology

Morphology (CS 626-449)
By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya

What is Morphology?
• Study of Words
– Their internal structure
   washing = wash + -ing
– How they are formed
   bat → bats, rat → rats
   write → writer, browse → browser

• Morphology tries to formulate rules

Morphology for NLP
• Machine Translation
– Analyze: किताबें (Hindi, "books") → किताब, Noun, Direct Case, Plural
– Transfer: पुस्तक, Noun, Direct Case, Plural
– Generate: पुस्तके (Marathi, "books")

• Information Retrieval
– goose and geese are two words referring to the same root goose

Need of MA and MG
• Why not list all the forms of a word along with their features?
– drink: drink, V, 1st person; drink, V, 2nd person; drink, V, 3rd person, plural
– drinks: drink, V, 3rd person, singular
– drank: …
– drunk: …
– drinking: …

Need of MA and MG
• Reasons:
– Productivity: going, drinking, running, playing
• Storing every form leads to inefficiency

– Addition of new words
• Verb: To fax. Forms: fax, faxes, faxed, faxing

– Morphologically complex languages: Marathi
• दारासमोरच्यांनी = दार (SG) + समोर + चा (PL) + नी
• Meaning (in Hindi): दरवाजे के सामने वालों ने, i.e. "the ones in front of the door" with an ergative marker
• Polymorphemic
• Is it possible to store all the forms?
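The productivity and new-word arguments can be illustrated with a minimal sketch: instead of listing every form, a few rules generate the forms of a regular verb, including a newly coined one such as fax. The function name `regular_verb_forms` and its simplified spelling rules are assumptions made for illustration; irregular verbs such as drink would still have to be listed.

```python
def regular_verb_forms(stem):
    """Generate the stem, -s, -ed and -ing forms of a regular English verb.

    Simplified on purpose: handles sibilant endings (fax -> faxes) and a
    final silent -e (parse -> parsed, parsing), but not consonant doubling
    (sob -> sobbed) or -y changes (fry -> fries).
    """
    if stem.endswith(("s", "z", "x", "sh", "ch")):
        s_form = stem + "es"
    else:
        s_form = stem + "s"

    if stem.endswith("e"):
        ed_form = stem + "d"
        ing_form = stem[:-1] + "ing"
    else:
        ed_form = stem + "ed"
        ing_form = stem + "ing"

    return [stem, s_form, ed_form, ing_form]


print(regular_verb_forms("fax"))   # ['fax', 'faxes', 'faxed', 'faxing']
print(regular_verb_forms("play"))  # ['play', 'plays', 'played', 'playing']
```

A new verb needs only its stem in the lexicon; the rules supply the remaining forms.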

Morphemes
• Smallest meaning bearing units constituting a word
• Two kinds of morphemes: stems and affixes
– Stems: tree, go, fat
– Affixes
   Prefixes: post- (postpone)
   Suffixes: -ed (tossed)
• Example: reconsideration = re- (prefix) + consider (stem) + -ation (suffix)
• Affixes in Hindi?

Classes of Morphology
• Inflection
• Derivation

Inflection
• Indicates some grammatical function, such as
– Case: लड़का (Direct), लड़के (Oblique) ("boy")
– Number: लड़का (Sg), लड़के (Pl)
– Person: जाऊँगा (1st), जाओगे (2nd) ("will go")
– Gender: जाऊँगा (Masc), जाऊँगी (Fem)
– Tense: गया (Past) ("went")
• Results in a word of the same class
• Productive

Derivation
• Usually results in a word of a different class
– -able attached to a verb gives an adjective
– read (V) + -able = readable (Adj)
• Often the meaning of the derived word is difficult to predict exactly
– write :: writer (one who writes)
– paint :: painter (one who paints)
– cut :: cutter? (an instrument used to cut)
• Less productive
– eatable :: readable :: runnable?

Problems in MA
• Productivity
• False Analysis
• Bound Base Morphemes

Productivity
• Property of a morphological process to give rise to new formations on a systematic basis
– Transitive verb (read) + -able: productive (readable)
– Noun (game) + -able: not productive (gameable)
• Exceptions (nouns that do take -able)
– peaceable, saleable, impressionable, actionable, marriageable, fashionable, companionable, reasonable, knowledgeable

False analysis
• Examples: hospitable, sizeable
• They do not have the meaning “to be able”
• They cannot take the suffix -ity to form a noun
• Analyzing them as words containing the suffix -able leads to a false analysis

Bound Base Morphemes
• Occur only in a particular complex word
• Do not have independent existence
• Compound = base (nonexistent) + morpheme (known)
– malleable
– feasible (fease + -ible)
• -able has the regular meaning “be able”
• The -ity form is possible
• Base words don’t exist independently

More on Inflection
Inflectional Suffixes in English
• Noun inflectional suffixes
– Plural marker -s
– Possessive marker 's
• Verb inflectional suffixes
– Third person present singular marker -s
– Past tense marker -ed
– Progressive marker -ing
– Past participle markers -en or -ed
• Adjective inflectional suffixes
– Comparative marker -er
– Superlative marker -est

Spelling Rules
• Generally, words are pluralized by adding -s to the end
• Words ending in -s, -z, -sh and sometimes -x require -es
– buses, quizzes, dishes, boxes
• Nouns ending in -y preceded by a consonant change the -y to -i and take -es
– babies, floppies
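The spelling rules above can be sketched as a small function. This is an illustration under stated assumptions, not a complete pluralizer: the name `pluralize` is hypothetical, -x is always treated as taking -es, and the doubling of a single final -z after a vowel (quiz → quizzes) is an extra assumption added so that the slide's own example comes out right.

```python
VOWELS = "aeiou"


def pluralize(noun):
    """Pluralize a regular English noun using the spelling rules above."""
    # -y preceded by a consonant: change -y to -i and add -es (baby -> babies)
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Assumption: a single final -z after a vowel is doubled (quiz -> quizzes)
    if noun.endswith("z") and len(noun) > 1 and noun[-2] in VOWELS:
        return noun + "zes"
    # -s, -z, -sh and (here: always) -x take -es
    if noun.endswith(("s", "z", "sh", "x")):
        return noun + "es"
    # Default rule: add -s
    return noun + "s"


for word in ["bus", "quiz", "dish", "box", "baby", "floppy", "cat", "boy"]:
    print(word, "->", pluralize(word))
```

Irregular nouns such as goose or mouse are outside the scope of these rules and need their plural forms listed in the lexicon.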

Verbal Inflection
Regularly Inflected Verbs

Morphological Form Class
Stem              Jump      Parse     Fry      Sob
-s form           Jumps     Parses    Fries    Sobs
-ing participle   Jumping   Parsing   Frying   Sobbing
Past form         Jumped    Parsed    Fried    Sobbed
-ed participle    Jumped    Parsed    Fried    Sobbed

Irregularly Inflected Verbs

Morphological Form Class
Stem              Eat       Bring      Cut
-s form           Eats      Brings     Cuts
-ing participle   Eating    Bringing   Cutting
Past form         Ate       Brought    Cut
-ed participle    Eaten     Brought    Cut

Forms such as Parsing, Fries and Sobbed are governed by spelling rules; forms such as Ate, Eaten and Brought are idiosyncratic.

Morphological Parsing
• Finding
– Constituent morphemes
– Features

Input     Morphological Parsed Output
cats      cat +N +PL
geese     goose +N +PL
goose     (goose +N +SG) or (goose +V)
gooses    goose +V +3SG
caught    (catch +V +PAST-PART) or (catch +V +PAST)
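As a rough illustration of how such a table of parses could be produced, the sketch below combines a tiny lexicon with one suffix rule and returns every analysis it finds, so ambiguous inputs such as goose and caught yield more than one parse. The dictionaries `STEMS` and `IRREGULAR_FORMS`, the class names, and the function `parse` are all hypothetical, and +3SG is used as the tag for third person singular.

```python
# Hypothetical toy lexicon: stem -> morphological classes it belongs to.
STEMS = {
    "cat": ["reg-noun"],
    "goose": ["irreg-sg-noun", "verb"],
}

# Irregular surface forms are stored whole, together with their analyses.
IRREGULAR_FORMS = {
    "geese": ["goose +N +PL"],
    "caught": ["catch +V +PAST-PART", "catch +V +PAST"],
}


def parse(word):
    """Return all morphological analyses of `word` known to the toy lexicon."""
    analyses = list(IRREGULAR_FORMS.get(word, []))

    # Bare stem: singular noun or uninflected verb.
    for cls in STEMS.get(word, []):
        analyses.append(f"{word} +V" if cls == "verb" else f"{word} +N +SG")

    # Stem + -s: plural of a regular noun, or third person singular verb.
    if word.endswith("s"):
        stem = word[:-1]
        for cls in STEMS.get(stem, []):
            if cls == "reg-noun":
                analyses.append(f"{stem} +N +PL")
            elif cls == "verb":
                analyses.append(f"{stem} +V +3SG")

    return analyses


for w in ["cats", "geese", "goose", "gooses", "caught"]:
    print(w, "->", parse(w))
```

Because goose is marked as an irregular noun, gooses receives only the verb reading, matching the table.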

Resources
• Lexicon: list of stems and suffixes along with basic information about them
• Morphotactics: a model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes
• Orthographic rules: spelling rules used to model the changes that occur in the word, usually when two morphemes combine

Morphological Recognition: Nouns
Lexicon

reg-noun    irreg-sg-noun    irreg-pl-noun    plural
flower      goose            geese            -s
cat         sheep            sheep
dog         mouse            mice

FSA

q0 --[reg-noun]--> q1 --[plural (-s)]--> q2
q0 --[irreg-sg-noun]--> q2
q0 --[irreg-pl-noun]--> q2

Note: Here, we are ignoring the nouns that take the suffix -es for pluralization.
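A minimal sketch of recognition with this FSA follows. It assumes the arcs q0 → q1 on reg-noun, q1 → q2 on the plural -s, and q0 → q2 on either irregular class, with q1 and q2 as accepting states; the names `LEXICON`, `TRANSITIONS`, `ACCEPTING` and `recognize` are hypothetical. The recognizer tries every way of splitting the word into lexicon morphemes and accepts if one split drives the machine to an accepting state.

```python
# Morpheme classes from the lexicon above.
LEXICON = {
    "reg-noun": {"flower", "cat", "dog"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "plural": {"s"},
}

# FSA arcs: (state, morpheme class) -> next state.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural"): "q2",
}

# Assumption: a bare regular noun (q1) and any complete noun form (q2) are accepted.
ACCEPTING = {"q1", "q2"}


def recognize(word, state="q0"):
    """Return True if `word` can be segmented into morphemes accepted by the FSA."""
    if not word:
        return state in ACCEPTING
    for (source, cls), target in TRANSITIONS.items():
        if source != state:
            continue
        for morpheme in LEXICON[cls]:
            if word.startswith(morpheme) and recognize(word[len(morpheme):], target):
                return True
    return False


for w in ["cat", "cats", "geese", "sheep", "gooses", "mouses"]:
    print(w, recognize(w))
```

As in the note above, nouns that pluralize with -es are not covered; handling them would require orthographic rules on top of the morphotactics.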

Adjectives
Type         Properties                      Examples
adj-root1    Occur with un- and -ly          happy, real
adj-root2    Can’t occur with un- and -ly    big, red

FSA

q0 --[un-]--> q1 --[adj-root1]--> q2 --[-er, -ly, -est]--> q5
q0 --[adj-root1]--> q2
q0 --[ε]--> q3 --[adj-root2]--> q4 --[-er, -est]--> q5
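The adjective FSA can be simulated as a nondeterministic automaton with an ε-move. The sketch below works on already segmented morphemes, so spelling changes such as happy + -er → happier are out of scope. The arc targets are one reading of the diagram (un- leads to q1, adj-root1 to q2, ε to q3, adj-root2 to q4, suffixes to q5), and the accepting states q2, q4 and q5 are an assumption; all identifiers are hypothetical.

```python
ADJ_ROOT1 = {"happy", "real"}   # occur with un- and -ly
ADJ_ROOT2 = {"big", "red"}      # cannot occur with un- and -ly

# NFA arcs: (state, symbol) -> set of next states; "ε" marks the empty move.
TRANSITIONS = {
    ("q0", "un-"): {"q1"},
    ("q0", "adj-root1"): {"q2"},
    ("q0", "ε"): {"q3"},
    ("q1", "adj-root1"): {"q2"},
    ("q2", "-er"): {"q5"},
    ("q2", "-ly"): {"q5"},
    ("q2", "-est"): {"q5"},
    ("q3", "adj-root2"): {"q4"},
    ("q4", "-er"): {"q5"},
    ("q4", "-est"): {"q5"},
}

# Assumption: bare roots (q2, q4) and suffixed forms (q5) are accepted.
ACCEPTING = {"q2", "q4", "q5"}


def symbol_for(morpheme):
    """Map a root to its class; affixes such as 'un-' or '-ly' stand for themselves."""
    if morpheme in ADJ_ROOT1:
        return "adj-root1"
    if morpheme in ADJ_ROOT2:
        return "adj-root2"
    return morpheme


def epsilon_closure(states):
    """All states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in TRANSITIONS.get((state, "ε"), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(morphemes):
    """Return True if the morpheme sequence is accepted by the adjective FSA."""
    current = epsilon_closure({"q0"})
    for morpheme in morphemes:
        symbol = symbol_for(morpheme)
        reached = set()
        for state in current:
            reached |= TRANSITIONS.get((state, symbol), set())
        current = epsilon_closure(reached)
    return bool(current & ACCEPTING)


print(accepts(["un-", "happy"]))   # True
print(accepts(["real", "-ly"]))    # True
print(accepts(["big", "-er"]))     # True
print(accepts(["un-", "big"]))     # False
print(accepts(["red", "-ly"]))     # False
```

Splitting the roots into two classes is what blocks forms such as unbig and redly while still allowing bigger and reddest.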
