Computer Sc & Engineering Department, IIT Kharagpur
Natural Language Processing CS60057
Autumn 2006 Midterm Full Marks: 50
Date: Sep 15, 2006 Time: 2 hours
1. In what ways do natural languages (i) resemble and (ii) differ from artificial languages?
2. The Soundex algorithm is a method that can be used for representing people’s
names. Write a Finite State Transducer to implement the first three steps of the
Soundex algorithm. Construct a second FST to implement steps (iv) and (v) of the
algorithm. [3+3]
(i) Retain the first letter of the word.
(ii) Remove all occurrences of the following letters except from the first position:
'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'.
(iii) Change letters from the following sets into the digit given:
'B', 'F', 'P', 'V' → 1
'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z' → 2
(iv) Collapse every run of identical adjacent digits in the string that resulted
from step (iii) into a single digit (e.g., 666 is changed to 6).
(v) Pad the string that resulted from step (iv) with trailing zeros and return only the
first four positions, which will be of the form
<uppercase letter> <digit> <digit> <digit>.
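The five steps can be sketched procedurally as a sanity check for an FST design. This is a minimal Python sketch, not a transducer: the function name `soundex` and the tables `CODES` and `DROPPED` are hypothetical, the digit classes 3 to 6 are assumed from the conventional Soundex table because they do not appear in the list above, and the input is assumed to be a non-empty alphabetic name.

```python
# Digit classes. Codes 1 and 2 come from the question; codes 3-6 are an
# ASSUMPTION taken from the conventional Soundex table.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}
DROPPED = set("AEIOUHWY")


def soundex(name):
    """Apply steps (i)-(v) to a non-empty alphabetic name."""
    name = name.upper()
    # (i) Retain the first letter.
    first, rest = name[0], name[1:]
    # (ii) Drop the listed letters from every non-initial position.
    kept = [c for c in rest if c not in DROPPED]
    # (iii) Replace each remaining letter by its digit.
    digits = [CODES[c] for c in kept if c in CODES]
    # (iv) Collapse runs of identical adjacent digits.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # (v) Pad with zeros and keep the first four positions.
    return (first + "".join(collapsed) + "000")[:4]
```

Splitting the function at the comment for step (iv) mirrors the two transducers the question asks for: the first maps letters to a letter-plus-digits string, the second rewrites that string.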
3. The following table lists some bigram counts from the BERP domain. [2+4+4]
         I     want  to    eat   Chinese  food  lunch
I        8     1087  0     13    0        0     0
want     3     0     786   0     6        8     6
to       3     0     10    860   3        0     12
eat      0     0     2     0     19       2     52
Chinese  2     0     0     0     0        120   1
food     19    0     17    0     0        0     0
lunch    4     0     0     0     0        1     0
(i) Explain add-one OR any one of the other smoothing methods.
(ii) Apply the smoothing method to these bigrams and compute the smoothed
estimates of the bigram probability table (containing P(wi|wj)).
(iii) Now, calculate the probability of the following sentence based on the
smoothed estimates:
I want Chinese lunch.
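Add-one smoothing on this table can be sketched as follows. The question gives no unigram counts or vocabulary size, so this sketch assumes the vocabulary is just the seven words shown (V = 7) and uses each row sum as the context count; both are assumptions, and the names `COUNTS`, `add_one` and `sentence_prob` are hypothetical. Rows are the preceding word, columns the following word. No sentence-start probability is included.

```python
WORDS = ["I", "want", "to", "eat", "Chinese", "food", "lunch"]

# Bigram counts from the table: row = previous word, column = next word.
COUNTS = [
    [8, 1087, 0, 13, 0, 0, 0],
    [3, 0, 786, 0, 6, 8, 6],
    [3, 0, 10, 860, 3, 0, 12],
    [0, 0, 2, 0, 19, 2, 52],
    [2, 0, 0, 0, 0, 120, 1],
    [19, 0, 17, 0, 0, 0, 0],
    [4, 0, 0, 0, 0, 1, 0],
]


def add_one(counts):
    """Return P(next | prev) with add-one smoothing.

    ASSUMPTION: vocabulary size is the number of rows, and the context
    count of each word is approximated by its row sum.
    """
    v = len(counts)
    table = []
    for row in counts:
        total = sum(row) + v
        table.append([(c + 1) / total for c in row])
    return table


def sentence_prob(words, table):
    """Multiply smoothed bigram probabilities along the sentence."""
    index = {w: i for i, w in enumerate(WORDS)}
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= table[index[prev]][index[cur]]
    return p


PROBS = add_one(COUNTS)
P_SENT = sentence_prob(["I", "want", "Chinese", "lunch"], PROBS)
```

With real unigram counts and the full vocabulary size, only the `total` line would change.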
4. Show a grammar that you can use to handle article-noun agreement in English. You 
will need a distinction between mass and count nouns (e.g., water, love, honesty vs. cup,
word, idea) and between singular count and plural count or mass indefinite articles (a vs.
some). Your grammar should accept the following:
but reject the following:
Give the necessary grammar rules and the lexical entries for the words a, some, water,
cup, and cups, and show how the phrases would succeed or fail during parsing. You can
ignore the other details of the parser (assuming that the right entries or rules are
5. Suggest logical forms for the following sentences: [4+6]
(i) PC teaches CS305 to UG students
(ii) A bus connects Kharagpur with Digha
Write a grammar/lexicon that computes the logical form for each of the above sentences.
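One possible way to compute such logical forms compositionally is sketched below, with lexical entries written as Python lambdas that build predicate-logic strings. Everything here is an illustrative assumption rather than the expected answer: the helper names `proper`, `a` and `ditransitive` are hypothetical, "UG students" is simplified to a single constant `UG_students`, and both verbs are treated as three-place predicates.

```python
def proper(name):
    # Type-raised proper noun: takes a predicate P and returns P(name).
    return lambda pred: pred(name)


def a(noun):
    # Indefinite article: takes P and returns "exists x. (noun(x) & P(x))".
    return lambda pred: f"exists x. ({noun}(x) & {pred('x')})"


def ditransitive(rel):
    # Verb with object and oblique arguments; the subject is applied last.
    # All three arguments are type-raised noun phrases.
    return lambda obj: lambda obl: lambda subj: subj(
        lambda s: obj(lambda o: obl(lambda b: f"{rel}({s}, {o}, {b})"))
    )


# (i) PC teaches CS305 to UG students
LF1 = ditransitive("teaches")(proper("CS305"))(proper("UG_students"))(proper("PC"))

# (ii) A bus connects Kharagpur with Digha
LF2 = ditransitive("connects")(proper("Kharagpur"))(proper("Digha"))(a("bus"))
```

Here `LF1` evaluates to `teaches(PC, CS305, UG_students)` and `LF2` to `exists x. (bus(x) & connects(x, Kharagpur, Digha))`. A fuller treatment would quantify over the plural noun phrase rather than using a constant.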
6. Consider a small corpus consisting of the following sentences. [7+3]
I want to book a room. Book me a large room. I wish to read a book. Book a bed for the
night. I want a night light.
Parts of speech:
N : noun
V : verb
P : preposition
R : pronoun
A : article
J : adjective

Lexicon:
N → book | room | bed | night | light
V → want | book | wish | light
Pp → to | for
A → a | the | an
Pro → I | you | me
J → night | large
Construct by hand a Markov model from this corpus with six states corresponding to the
six parts of speech. No smoothing is required. Show the transition probabilities as well as
the emission probabilities. Provide a tagging for the following sentence and evaluate the
probability of this tag sequence.
I book a large book.
Show all your work.
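The hand computation can be cross-checked with a short script that estimates the model by relative frequency. The tagging of the corpus below is an assumption made for this sketch (in particular, "read" is tagged V although it is not in the lexicon, and "night" is tagged J in "night light"), as is the use of sentence-initial tag frequencies as start probabilities. The names `TAGGED`, `train`, `prob` and `score` are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hand-assigned tags: an ASSUMPTION for this sketch, not part of the question.
# Sentence-initial "Book" is lowercased so that it matches "book".
TAGGED = [
    "I/R want/V to/P book/V a/A room/N",
    "book/V me/R a/A large/J room/N",
    "I/R wish/V to/P read/V a/A book/N",
    "book/V a/A bed/N for/P the/A night/N",
    "I/R want/V a/A night/J light/N",
]


def train(tagged):
    """Count start tags, tag-to-tag transitions and tag-to-word emissions."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged:
        pairs = [tok.rsplit("/", 1) for tok in sentence.split()]
        start[pairs[0][1]] += 1
        for word, tag in pairs:
            emit[tag][word] += 1
        for (_, t1), (_, t2) in zip(pairs, pairs[1:]):
            trans[t1][t2] += 1
    return start, trans, emit


def prob(counter, key):
    """Unsmoothed relative frequency of key within counter."""
    total = sum(counter.values())
    return Fraction(counter[key], total) if total else Fraction(0)


def score(words, tags, model):
    """Joint probability of a word sequence and a tag sequence."""
    start, trans, emit = model
    p = prob(start, tags[0])
    for i, (word, tag) in enumerate(zip(words, tags)):
        if i > 0:
            p *= prob(trans[tags[i - 1]], tag)
        p *= prob(emit[tag], word)
    return p


MODEL = train(TAGGED)
P_TAGS = score("I book a large book".split(), list("RVAJN"), MODEL)
```

Using `Fraction` keeps every transition and emission probability exact, which makes the result easy to compare against a hand-built table. A different tagging of the corpus would change the counts.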