POSTED ON: 12/5/2011
600.103 MidTerm Exam (4 pages) + Extra Credit/Homework (3 pages)

Name:

Question 1: Binomial Distribution

Write a Python function called binomial_coef that takes two arguments, n and k, and returns the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

Run binomial_coef with the following loop and verify that it computes Pascal's triangle:

    for n in range(9):
        print([binomial_coef(n, k) for k in range(n + 1)])

What is the next row of Pascal's triangle? 1, 6 and then what?

Use your binomial_coef function to compute binomial_prob(k, n, p), the probability of exactly k successes in n independent trials, each with success probability p:

$$\text{binomial\_prob}(k, n, p) = \binom{n}{k}\, p^k (1-p)^{n-k}$$

For example, if we toss a fair coin (p = ½) n = 2 times, the probability is ¼ for k = 0, ½ for k = 1, and ¼ for k = 2. Check the Python function against the following simulation in R:

    table(rbinom(1e6, 2, 0.5))/1e6

Let's use your binomial_prob function to model the distribution of "the" in text. Suppose that the next word is "the" with probability p = 0.05. Fill out the following table with the probability of seeing exactly k instances of "the" in a sample of 100 words of text. Use both R and Python to compute these numbers, and round probabilities to hundredths.

k      | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
R      |   |   |   |   |   |   |   |
Python |   |   |   |   |   |   |   |

Consider words with p between 0.007 and 0.008. Fill out the following table with binomial_prob(k, 100, p), rounded to hundredths, for these two values of p and these six values of k.

k       | 0 | 1 | 2 | 3 | 4 | 5
p=0.007 |   |   |   |   |   |
p=0.008 |   |   |   |   |   |

We can estimate p for a word w in a text as p(w) = freq(w)/N, where freq(w) is the number of times w appears in the text and N is the number of words in the text.

What is N? That is, how many words are there in Genesis?
What is freq('the') in Genesis? That is, how many times does 'the' appear in Genesis?
What is p('the'), that is, freq('the')/N?
Which words in Genesis have p between 0.007 and 0.008?

Split Genesis into blocks of 100 words each. There are N/100 such blocks.
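A minimal Python sketch of the Question 1 helpers follows, assuming binomial_prob takes its arguments in the order (k, n, p) used in binomial_prob(k, 100, p) above. The tokenizer (runs of ASCII letters, case kept) and the helper names tokenize, word_probs and blocks_containing are illustrative assumptions, not part of the assignment.

```python
import re
from collections import Counter


def binomial_coef(n, k):
    """C(n, k) = n! / (k! (n-k)!), via the multiplicative formula."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        # after this step result == C(n - k + i, i), so // is exact
        result = result * (n - k + i) // i
    return result


def binomial_prob(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return binomial_coef(n, k) * p ** k * (1 - p) ** (n - k)


def tokenize(text):
    """Hypothetical tokenizer: maximal runs of ASCII letters, case kept."""
    return re.findall(r"[A-Za-z]+", text)


def word_probs(words):
    """Estimate p(w) = freq(w) / N for every word type."""
    n = len(words)
    return {w: c / n for w, c in Counter(words).items()}


def blocks_containing(words, target, size=100):
    """Count distinct blocks (0-based word index // size) that contain target."""
    return len({i // size for i, w in enumerate(words) if w == target})


if __name__ == "__main__":
    for n in range(9):
        print([binomial_coef(n, k) for k in range(n + 1)])
    print([binomial_prob(k, 2, 0.5) for k in range(3)])  # [0.25, 0.5, 0.25]
    words = tokenize("In the beginning God created the heaven and the earth.")
    print(len(words), word_probs(words)["the"], blocks_containing(words, "the", size=5))  # 10 0.3 2
```

Blocks here are numbered with 0-based word indices, so the boundaries are shifted by one word relative to the awk command's int(NR/100), which uses 1-based line numbers.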
For words with p between 0.007 and 0.008, we would expect to see them (one or more times) in B blocks, where B_low ≤ B ≤ B_high and

    B_low  = N/100 * (1 - binomial_prob(0, 100, 0.007))
    B_high = N/100 * (1 - binomial_prob(0, 100, 0.008))

Each factor 1 - binomial_prob(0, 100, p) is the probability that a 100-word block contains the word at least once.

Fill out the following table. The last two columns are B_low and B_high, respectively.

Note: Since the last two columns don't depend on the word, they will be the same for all words.

Hint: This Unix command counts the number of 100-word blocks that contain "thou". Use this value for the observed column (but change "thou" to the appropriate word).

    tr -sc 'A-Za-z' '\n' < genesis.txt | egrep . | awk '/^thou$/ {print int(NR/100)}' | sort -u | wc -l

word | observed | B_low | B_high
     |          |       |

Is the observed column more or less than expected? That is, is the 2nd column larger or smaller than the last two columns?

End of MidTerm (everything after this is extra credit/homework). Homework is due at dawn before the next class (after spring break).

Why isn't the 2nd column between the last two columns? What's wrong with the binomial assumption?

Question 2: Fibonacci

fib0, fib1 and fib2 are three programs that almost compute the Fibonacci numbers in R.
    fib0 = function(n) if(n <= 1) 1 else fib0(n-1) + fib0(n-2)
    fib1 = function(n) matpow(matrix(data=c(1,1,1,0), ncol=2), n)[1,1]
    fib2 = function(n) round(((1+sqrt(5))/2)^n/sqrt(5))

    # note: pow is the divide-and-conquer log n method of powering a number
    pow = function(x, n) {
      if(n == 1) x
      else if(even(n)) pow(x, n/2)^2
      else x * pow(x, n-1)
    }
    even = function(n) n == floor(n/2) * 2

    # matpow uses the same divide-and-conquer log n method to power a matrix
    # note: %*% is matrix multiplication
    matpow = function(x, n) {
      if(n == 1) x
      else if(even(n)) matsquare(matpow(x, n/2))
      else x %*% matpow(x, n-1)
    }
    matsquare = function(x) x %*% x

Run all three implementations of fib in R and compare their output to the truth:

n   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7  | 8  | 9  | 10 | 11 | 12  | 13  | 14  | 15  | 16  | 17   | 18   | 19   | 20
F_n | 0 | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55 | 89 | 144 | 233 | 377 | 610 | 987 | 1597 | 2584 | 4181 | 6765

Specifically, fill out the following table. Hint: fill in the fib0 column with for(n in 0:7) print(fib0(n)).

n | truth | fib0 | fib1 | fib2
0 | 0     |      |      |
1 | 1     |      |      |
2 | 1     |      |      |
3 | 2     |      |      |
4 | 3     |      |      |
5 | 5     |      |      |
6 | 8     |      |      |
7 | 13    |      |      |

What's wrong with these programs? Can you fix them? Specifically, modify all three programs so they produce the truth (at least for these 8 values).

Fib0:
Fib1:
Fib2:

How fast are these three programs? Which implementation is fastest? How does time grow with n? Answer the question both empirically and theoretically.

To show growth with n empirically, write down the elapsed time in each cell of the table below. Hint: You can fill out the fib0 column with:

    for(i in 25:30) print(system.time(fib0(i)))

Hint: Some of these implementations are so fast that system.time will report 0 elapsed time. If that happens, try

    system.time(for(i in 1:1000) fib1(30))

and then divide the elapsed time by 1000.

n  | fib0 | fib1 | fib2
25 |      |      |
26 |      |      |
27 |      |      |
28 |      |      |
29 |      |      |
30 |      |      |

Theoretically, using the methods discussed in last week's lecture, how would we expect time to grow with n?

Fib0:
Fib1:
Fib2:
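The questions above concern the R code, but the growth rates behind them can be illustrated with a small Python sketch (Python only to match the other added example). It assumes the truth table's indexing (F0 = 0, F1 = 1); the function names and the call and multiplication counters are illustrative, not part of the assignment. Each plain recursive call spawns two more, so calls(n) = calls(n-1) + calls(n-2) + 1, which solves to 2F(n+1) - 1, i.e. exponential growth. Repeated squaring needs only O(log n) 2x2 matrix products. The closed-form rounding uses a fixed number of floating-point operations but stops being exact for large n.

```python
from math import sqrt


def fib_naive(n, calls):
    """F(n) by plain recursion, F(0) = 0, F(1) = 1; calls[0] counts invocations."""
    calls[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


def mat_mult(a, b):
    """Product of two 2x2 matrices stored as ((a11, a12), (a21, a22))."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]))


def mat_pow(m, n, mults):
    """m**n for n >= 0 by repeated squaring; mults[0] counts 2x2 products."""
    if n == 0:
        return ((1, 0), (0, 1))  # identity matrix, so n = 0 is well defined
    if n % 2 == 0:
        half = mat_pow(m, n // 2, mults)
        mults[0] += 1
        return mat_mult(half, half)
    mults[0] += 1
    return mat_mult(m, mat_pow(m, n - 1, mults))


def fib_matrix(n, mults):
    # ((1, 1), (1, 0))**n == ((F(n+1), F(n)), (F(n), F(n-1))), so take entry [0][1]
    return mat_pow(((1, 1), (1, 0)), n, mults)[0][1]


def fib_binet(n):
    """Round phi**n / sqrt(5); float rounding error makes this wrong for large n."""
    phi = (1 + sqrt(5)) / 2
    return round(phi ** n / sqrt(5))


if __name__ == "__main__":
    print([fib_naive(n, [0]) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
    for n in range(20, 26):
        calls, mults = [0], [0]
        fib_naive(n, calls)
        fib_matrix(n, mults)
        print(n, calls[0], mults[0], fib_binet(n))
```

Counting calls and matrix products, rather than timing them, keeps the comparison independent of machine speed; system.time in R supplies the empirical side.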