Frequency distributions: Testing of goodness of fit and contingency tables

Chi-square statistics
• Widely used for the analysis of nominal data
• Introduced by Karl Pearson in 1900
• Its theory and application were expanded by him and R. A. Fisher
• This lecture covers the Chi-square test, the G test and the Kolmogorov-Smirnov goodness of fit test for continuous data

The $\chi^2$ test: $\chi^2 = \sum (O - E)^2 / E$, where O = observed freq. and E = expected freq.
• Obtain a sample of nominal scale data and infer whether the population conforms to a certain theoretical distribution, e.g. in a genetic study
• Test Ho that the observations (not the variables) are independent of each other for the population.
• Based on the difference between the actual observed frequencies (not %) and the expected frequencies
• A measure of how far a sample distribution deviates from a theoretical distribution
• Ho: no difference between the observed and expected frequencies (HA: they are different)
• If Ho is true: the difference and the Chi-square are SMALL
• If Ho is false: both measurements are LARGE

For Questionnaire: Example (1)
• In a questionnaire, 259 adults were asked what they thought about cutting air pollution by increasing tax on vehicle fuel.
• 113 people agreed with this idea but the rest disagreed.
• Perform a Chi-square test to determine the probability of the results being obtained by chance.

            Agree            Disagree
Observed    113              259 - 113 = 146
Expected    259/2 = 129.5    259/2 = 129.5

Ho: Observed = Expected
$\chi^2 = (113 - 129.5)^2/129.5 + (146 - 129.5)^2/129.5 = 2.102 + 2.102 = 4.204$
df = k - 1 = 2 - 1 = 1
From the Chi-square table (Table B1 in Zar's book): $\chi^2_{0.05,\,1} = 3.841$; for $\chi^2 = 4.204$, 0.025 < p < 0.05
Therefore, reject Ho. The probability of the results being obtained by chance is between 0.025 and 0.05.
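The questionnaire example can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the function names are illustrative, and the p-value shortcut via `math.erfc` holds only for 1 degree of freedom, which is the case here.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_df1(stat):
    """Upper-tail probability of a chi-square variate with df = 1 only."""
    return math.erfc(math.sqrt(stat / 2.0))


# Questionnaire example: 113 agree, 146 disagree, Ho expects an even split.
observed = [113, 146]
n = sum(observed)
expected = [n / 2, n / 2]

stat = chi_square_gof(observed, expected)
p_value = chi_square_p_df1(stat)
print(f"chi-square = {stat:.3f}, df = 1, p = {p_value:.4f}")
```

The computed p-value should fall inside the 0.025 to 0.05 bracket read from the table.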
For Genetics: Practical (1)
• Calculate the Chi-square for data consisting of 100 flowers against a hypothesized color ratio of 3:1 (red:green) and test Ho
• Ho: the sample data come from a population having a 3:1 ratio of red to green flowers
• Observation: 84 red and 16 green
• Expected frequency for 100 flowers:
  – 75 red and 25 green
Please do it now

For Genetics: Practical (2)
• Same task and same Ho as Practical (1), with a different sample of 100 flowers
• Observation: 67 red and 33 green
• Expected frequency for 100 flowers:
  – 75 red and 25 green
Please do it now

For Genetics: > 2 categories
• Ho: The sample of Drosophila (F2) comes from a population having a 9:3:3:1 ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to dark-normal wing (DNW) to dark-vestigial wing (DVW)
• Student's observations in the lab are given in the table; calculate the Chi-square and test Ho

                   PNW      PVW      DNW      DVW      Total
Observed           300      77       89       36       502
Exp. proportion    9/16     3/16     3/16     1/16     1
Expected           282.4    94.1     94.1     31.4     502
O - E              17.6     -17.1    -5.1     4.6      0
(O - E)^2          309.8    292.4    26.0     21.2
(O - E)^2/E        1.1      3.1      0.3      0.7

$\chi^2 = 1.1 + 3.1 + 0.3 + 0.7 = 5.2$
df = 4 - 1 = 3
$\chi^2_{0.05,\,3} = 7.815$; for $\chi^2 = 5.20$, 0.10 < p < 0.25
Therefore, do not reject Ho.
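The three genetics exercises follow the same recipe: turn the hypothesized ratio into expected counts, sum (O - E)^2/E, and compare with the tabulated critical value. A minimal sketch follows; the helper names are illustrative and the critical values are the ones quoted from Zar's Table B1.

```python
def expected_from_ratio(ratio, n):
    """Split a total of n observations according to a hypothesized ratio."""
    total = sum(ratio)
    return [n * part / total for part in ratio]


def chi_square_gof(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def ratio_gof(observed, ratio, critical):
    """Return (chi-square, reject Ho?) for observed counts against a ratio."""
    expected = expected_from_ratio(ratio, sum(observed))
    stat = chi_square_gof(observed, expected)
    return stat, stat > critical


# Critical values at alpha = 0.05, as quoted in the lecture.
CRIT_DF1 = 3.841
CRIT_DF3 = 7.815

flowers_1 = ratio_gof([84, 16], [3, 1], CRIT_DF1)
flowers_2 = ratio_gof([67, 33], [3, 1], CRIT_DF1)
drosophila = ratio_gof([300, 77, 89, 36], [9, 3, 3, 1], CRIT_DF3)

for name, (stat, reject) in [("Practical (1)", flowers_1),
                             ("Practical (2)", flowers_2),
                             ("Drosophila", drosophila)]:
    verdict = "reject Ho" if reject else "do not reject Ho"
    print(f"{name}: chi-square = {stat:.3f} -> {verdict}")
```

Because the script keeps unrounded expected frequencies, the Drosophila statistic comes out near 5.18 rather than the hand-rounded 5.2; the conclusion is the same.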
For Questionnaire: Cross Tabulation or Contingency Tables
– Further examination of the data on the opinion on increasing fuel tax to cut down air pollution (example 1):
– Ho: the decision is independent of sex

            Males      Females
Agree       13 (a)     100 (b)
Disagree    116 (c)    30 (d)

Expected frequency for cell b = (a + b)[(b + d)/N]

                Males                    Females                  n
Agree (O)       13                       100                      113
Agree (E)       113(129/259) = 56.28     113(130/259) = 56.72
Disagree (O)    116                      30                       146
Disagree (E)    146(129/259) = 72.72     146(130/259) = 73.28
n               129                      130                      259

$\chi^2 = (13 - 56.28)^2/56.28 + (100 - 56.72)^2/56.72 + (116 - 72.72)^2/72.72 + (30 - 73.28)^2/73.28 = 117.63$
df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
$\chi^2_{0.05,\,1} = 3.841$; p < 0.001
Therefore, reject Ho and accept HA that the decision is dependent on sex.

Quicker method for 2 x 2 cross tabulation:

           Class A    Class B    n
State 1    a          b          a + b
State 2    c          d          c + d
n          a + c      b + d      n = a + b + c + d

$\chi^2 = n(ad - bc)^2 / [(a + b)(c + d)(a + c)(b + d)]$

            Males    Females    n
Agree       13       100        113
Disagree    116      30         146
n           129      130        259

$\chi^2 = 259(13 \times 30 - 116 \times 100)^2 / [(113)(146)(129)(130)] = 117.64$
$\chi^2_{0.05,\,1} = 3.841$; p < 0.001; therefore, reject Ho.

Yates' continuity correction:
• Chi-square is a continuous distribution, while the frequencies being analyzed are discontinuous (whole numbers).
• To improve the analysis, Yates' correction is often applied (Yates, 1934):
• $\chi^2 = \sum (|O - E| - 0.5)^2 / E$
• For a 2 x 2 contingency table: $\chi^2 = n(|ad - bc| - 0.5n)^2 / [(a + b)(c + d)(a + c)(b + d)]$

Yates' correction (example 1):
Same 2 x 2 table as above (a = 13, b = 100, c = 116, d = 30, n = 259).
$\chi^2 = 259(|13 \times 30 - 116 \times 100| - 0.5 \times 259)^2 / [(113)(146)(129)(130)] = 114.935$ (smaller than 117.64, less bias)
$\chi^2_{0.05,\,1} = 3.841$; p < 0.001; therefore, reject Ho.
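The 2 x 2 shortcut and its Yates-corrected form differ only in the numerator, so one function can cover both. The sketch below is illustrative (function and variable names are not from the lecture); clamping the corrected difference at zero is a common safeguard added here as an assumption, and it has no effect on this example.

```python
def expected_counts(table):
    """Expected cell frequencies: (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(a, b, c, d, yates=False):
    """Shortcut chi-square for a 2 x 2 table, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Assumption: never let the corrected difference go below zero.
        diff = max(diff - 0.5 * n, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Opinion by sex: rows = agree / disagree, columns = males / females.
table = [[13, 100], [116, 30]]
expected = expected_counts(table)
plain = chi_square_2x2(13, 100, 116, 30)
corrected = chi_square_2x2(13, 100, 116, 30, yates=True)

print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {plain:.2f}, with Yates' correction = {corrected:.3f}")
```

Both statistics are far above the critical value of 3.841, so the correction does not change the decision here.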
Practical 3:
• $\chi^2 = n(|ad - bc| - 0.5n)^2 / [(a + b)(c + d)(a + c)(b + d)]$
• For a drug test, Ho: The survival of the animals is independent of whether the drug is administered

               Dead    Alive    n
Treated        12      30       42
Not treated    27      31       58
n              39      61       100

Use Yates' correction to calculate $\chi^2$ and test the hypothesis. Please do it at home.

Bias in Chi-square calculations
• If values of the expected frequency (fi) are very small, the calculated $\chi^2$ is biased in that it is larger than the theoretical $\chi^2$ value, and we shall tend to reject Ho.
• Rules: fi > 1 and no more than 20% of fi < 5.0.
• It may be conservative at significance levels < 5%, especially when the expected frequencies are all equal.
• If having small fi, (1) increase the sample size if possible or use the G test, or (2) combine the categories if possible.

The G test (log-likelihood ratio): $G = 2 \sum O \ln(O/E)$
• Similar to the $\chi^2$ test
• Many statisticians believe that the G test is superior to the $\chi^2$ test (although at present it is not as popular)
• For 2 x 2 cross tabulation:

           Class A    Class B
State 1    a          b
State 2    c          d

The expected frequency for cell a = (a + b)[(a + c)/n]

Practical 3 (expected frequencies in parentheses):

               Dead          Alive         n
Treated        12 (16.38)    30 (25.62)    42
Not treated    27 (22.62)    31 (35.38)    58
n              39            61            100

(1) Calculate G:
$G = 2[12 \ln(12/16.38) + 30 \ln(30/25.62) + 27 \ln(27/22.62) + 31 \ln(31/35.38)] = 2(1.683) = 3.366$
(2) Calculate Williams' correction: $1 + (w^2 - 1)/(6nd)$, where w is the number of frequency cells, n is the total number of measurements and d is the degrees of freedom, (r - 1)(c - 1):
$1 + (4^2 - 1)/[(6)(100)(1)] = 1.025$
G (adjusted) = 3.366/1.025 = 3.28 (< 3.31 from the $\chi^2$ test)
$\chi^2_{0.05,\,1} = 3.841$; p > 0.05; therefore, do not reject Ho.
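The G calculation for Practical 3 is easy to get slightly wrong by hand because of the logarithms, so a short script is a useful check. This is a sketch only: the function names are illustrative, and `williams_factor` implements the simplified correction exactly as written on the slide, not any other published variant.

```python
import math


def expected_counts(table):
    """Expected cell frequencies: (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def g_statistic(table):
    """Log-likelihood ratio statistic G = 2 * sum of O * ln(O / E)."""
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # a zero count contributes nothing to the sum
                total += o * math.log(o / e)
    return 2.0 * total


def williams_factor(cells, n, df):
    """Correction factor in the lecture's form: 1 + (w^2 - 1) / (6 n d)."""
    return 1.0 + (cells ** 2 - 1) / (6.0 * n * df)


# Drug test: rows = treated / not treated, columns = dead / alive.
drug = [[12, 30], [27, 31]]
g = g_statistic(drug)
q = williams_factor(cells=4, n=100, df=1)
g_adj = g / q

print(f"G = {g:.3f}, correction = {q:.3f}, adjusted G = {g_adj:.2f}")
```

The adjusted G of about 3.28 stays below the critical value 3.841, in line with the conclusion above.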
The G test: Drosophila example
• Ho: The sample of Drosophila (F2) comes from a population having a 9:3:3:1 ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to dark-normal wing (DNW) to dark-vestigial wing (DVW)

              PNW      PVW       DNW      DVW     Total
Observed      300      77        89       36      502
Expected      282.4    94.1      94.1     31.4
O ln(O/E)     18.14    -15.44    -4.96    4.92

G value: $G = 2(18.14 - 15.44 - 4.96 + 4.92) = 5.32$
Williams' correction: $1 + (4^2 - 1)/[6(502)(3)] = 1.00166$
G (adjusted): 5.32/1.00166 = 5.311
$\chi^2_{0.05,\,3} = 7.815$; for G (adjusted) = 5.311, 0.10 < p < 0.25
Therefore, do not reject Ho.

The Kolmogorov-Smirnov goodness of fit test = Kolmogorov-Smirnov one-sample test
• The goodness of fit tests above are applicable to nominal scale data; this test deals with data in ordered categories
• Example: 35 cats were tested one at a time and allowed to choose among 5 diets with different moisture content (1 = very moist to 5 = very dry):
• Ho: Cats prefer all five diets equally

                1     2     3     4     5     n
O               2     18    10    4     1     35
E               7     7     7     7     7     35
Cumulative O    2     20    30    34    35
Cumulative E    7     14    21    28    35
|di|            5     6     9     6     0

dmax = maximum |di| = 9
$(d_{max})_{\alpha,\,k,\,n} = (d_{max})_{0.05,\,5,\,35} = 7$ (Table B8: k = no. of categories)
Since 9 > 7, reject Ho. 0.002 < p < 0.005
• When applicable (i.e. the categories are ordered), the K-S test is more powerful than the $\chi^2$ test when n is small or when values of observed frequencies are small.
• Note: if the order of the same data is changed to 2, 1, 4, 18 and 10, the $\chi^2$ test will give the same result (it is independent of the order) but the calculated dmax from the K-S test will be different.
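The order dependence mentioned in the last note is easy to demonstrate. The following sketch (illustrative names, standard library only) computes dmax from the cumulative frequencies for the cat data and for the reordered data, alongside the Chi-square for both.

```python
from itertools import accumulate


def ks_dmax(observed, expected):
    """Largest absolute gap between cumulative observed and expected counts."""
    cum_obs = accumulate(observed)
    cum_exp = accumulate(expected)
    return max(abs(o - e) for o, e in zip(cum_obs, cum_exp))


def chi_square_gof(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


equal_preference = [7] * 5          # Ho: all five diets equally preferred
diets = [2, 18, 10, 4, 1]           # observed choices, moist to dry
reordered = [2, 1, 4, 18, 10]       # same counts, different category order

d_original = ks_dmax(diets, equal_preference)
d_reordered = ks_dmax(reordered, equal_preference)
chi_original = chi_square_gof(diets, equal_preference)
chi_reordered = chi_square_gof(reordered, equal_preference)

print(f"dmax: {d_original} vs {d_reordered}")
print(f"chi-square: {chi_original:.3f} vs {chi_reordered:.3f}")
```

The Chi-square is identical for both orderings, while dmax changes, which is why the K-S test should only be used when the category order is meaningful.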
Kolmogorov-Smirnov one-sample test for continuous ratio scale data
• Example 22.11 (page 479 in Zar)
• Ho: Moths are distributed uniformly from ground level to a height of 25 m
• HA: Moths are not distributed uniformly from ground level to a height of 25 m
• Use of Table B9

Xi = height (m); fi = frequency; Fi = cumulative frequency; Fi/15 = relative cumulative frequency; Xi/25 = relative expected frequency; Di = |Fi/15 - Xi/25|; D'i = |F(i-1)/15 - Xi/25|

no.   Xi      fi    Fi    Fi/15     Xi/25    Di        D'i
1     1.4     1     1     0.0667    0.056    0.0107    0.0560
2     2.6     1     2     0.1333    0.104    0.0293    0.0373
3     3.3     1     3     0.2000    0.132    0.0680    0.0013
4     4.2     1     4     0.2667    0.168    0.0987    0.0320
5     4.7     1     5     0.3333    0.188    0.1453    0.0787
6     5.6     2     7     0.4667    0.224    0.2427    0.1093
7     6.4     1     8     0.5333    0.256    0.2773    0.2107
8     7.7     1     9     0.6000    0.308    0.2920    0.2253
9     9.3     1     10    0.6667    0.372    0.2947    0.2280
10    10.6    1     11    0.7333    0.424    0.3093    0.2427
11    11.5    1     12    0.8000    0.460    0.3400    0.2733
12    12.4    1     13    0.8667    0.496    0.3707    0.3040
13    18.6    1     14    0.9333    0.744    0.1893    0.1227
14    22.3    1     15    1.0000    0.892    0.1080    0.0413
Max                                          0.3707    0.3040

Table B9: $D_{0.05,\,15} = 0.3376$ < D max
Therefore, reject Ho. 0.02 < p < 0.05

Kolmogorov-Smirnov one-sample test for grouped data (example 22.11)

Xi                 0-5 m    5-10 m    10-15 m    15-20 m    20-25 m    n
Observed fi        5        5         3          1          1          15
Expected fi        3        3         3          3          3          15
Cumulative O fi    5        10        13         14         15
Cumulative E fi    3        6         9          12         15
|di|               2        4         4          2          0

d max = 4
$(d_{max})_{0.05,\,5,\,15} = 5$ (use Table B8)
Thus, do not reject Ho. 0.05 < p < 0.10
• Note: Power is lost by grouping the data (here Ho is no longer rejected), and therefore grouping should be avoided whenever possible.
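For the ungrouped moth data, both D and D' have to be tracked at each distinct height. A minimal sketch of that bookkeeping is shown below; the function name is illustrative, the uniform model on 0 to 25 m comes from Ho, and the critical value is the one quoted from Table B9.

```python
from collections import Counter


def ks_uniform(values, upper):
    """Return (max D, max D') against a uniform distribution on [0, upper].

    D compares the cumulative relative frequency at each value with the
    expected one; D' uses the cumulative relative frequency just before it.
    """
    n = len(values)
    d_max = d_prime_max = 0.0
    cumulative = 0
    for x, f in sorted(Counter(values).items()):
        previous_rel = cumulative / n
        cumulative += f
        current_rel = cumulative / n
        expected_rel = x / upper
        d_max = max(d_max, abs(current_rel - expected_rel))
        d_prime_max = max(d_prime_max, abs(previous_rel - expected_rel))
    return d_max, d_prime_max


# Heights (m) of the 15 moths; 5.6 m was recorded twice.
heights = [1.4, 2.6, 3.3, 4.2, 4.7, 5.6, 5.6, 6.4, 7.7, 9.3,
           10.6, 11.5, 12.4, 18.6, 22.3]

d, d_prime = ks_uniform(heights, upper=25.0)
critical = 0.3376  # D(0.05, 15) as quoted from Table B9

print(f"D = {d:.4f}, D' = {d_prime:.4f}")
print("reject Ho" if max(d, d_prime) > critical else "do not reject Ho")
```

Both maxima occur at 12.4 m, matching row 12 of the table.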
• K-S test can be used to test normality of data
• Recognizing the distribution of your data is important
  – Provides a firm base on which to establish and test hypotheses
  – If data are normally distributed, you can use parametric tests;
  – Otherwise transform data to a normal distribution
  – Or non-parametric tests should be performed
• For a reliable test for normality of interval data, n must be large enough (e.g. > 15)
  – Difficult to tell whether a small data set (e.g. 5) is normally distributed

Methods for checking normality:
• Inspection of the frequency histogram
• Probability plot
• Chi-square goodness of fit
• Kolmogorov-Smirnov one-sample test
• Symmetry and Kurtosis: D'Agostino-Pearson K2 test (Chapters 6 & 7, Zar 99)

Inspection of the frequency histogram
• Construct the frequency histogram
• Calculate the mean and median (mode as well, if possible)
• Check the shape of the distribution and the location of these measurements

Probability plot, e.g. 1
Percentile = c.f./61; z: =NORMSINV(percentile); Probit = 5 + z

Class      Frequency    Cumulative    Percentile    z          Probit     Upper class
                        frequency                              (5 + z)    limit
0 - 2      1            1             0.0164        -2.1347    2.8653     2
2 - 4      2            3             0.0492        -1.6529    3.3471     4
4 - 6      3            6             0.0984        -1.2910    3.7090     6
6 - 8      5            11            0.1803        -0.9141    4.0859     8
8 - 10     8            19            0.3115        -0.4917    4.5083     10
10 - 12    11           30            0.4918        -0.0205    4.9795     12
12 - 14    8            38            0.6230        0.3132     5.3132     14
14 - 16    9            47            0.7705        0.7405     5.7405     16
16 - 18    6            53            0.8689        1.1210     6.1210     18
18 - 20    4            57            0.9344        1.5096     6.5096     20
20 - 22    3            60            0.9836        2.1347     7.1347     22
22 - 24    1            61            1.0000

[Figure, e.g. 1: frequency histogram (frequency vs. bin number, bin size = 2) and probability plot (expected cumulative p vs. observed cumulative p); fitted line y = 0.8502x + 0.0736, R2 = 0.9711]

Probability plot, e.g. 2

Class      Frequency    Cumulative    Percentile    z          Probit     Upper class
                        frequency                              (5 + z)    limit
0 - 2      10           10            0.1111        -1.2206    3.7794     2
2 - 4      24           20            0.2222        -0.7647    4.2353     4
4 - 6      11           44            0.4889        -0.0279    4.9721     6
6 - 8      9            55            0.6111        0.2822     5.2822     8
8 - 10     4            64            0.7111        0.5566     5.5566     10
10 - 12    5            68            0.7556        0.6921     5.6921     12
12 - 14    2            73            0.8111        0.8820     5.8820     14
14 - 16    2            75            0.8333        0.9674     5.9674     16
16 - 18    4            77            0.8556        1.0606     6.0606     18
18 - 20    1            81            0.9000        1.2816     6.2816     20
20 - 22    8            82            0.9111        1.3476     6.3476     22
22 - 24    10           90            1.0000

[Figure, e.g. 2: frequency histogram (frequency vs. bin number, bin size = 2) and probability plot (expected cumulative p vs. observed cumulative p); fitted line y = 1.0876x - 0.2443, R2 = 0.8576]
• Obviously, the data are not distributed on the line.
• Based on the frequency distribution of the data, the distribution is positively skewed (higher frequencies at lower classes)

Interpreting the probability plot
• A concave curve indicates positive skew, which suggests a log-normal distribution (i.e. log-transformation of the upper class limit is required); very common, e.g. mortality rates
• A convex curve indicates negative skew; less common (e.g. some binomial distributions)
• An S-shaped curve suggests 'bad' kurtosis: a departure from normality in which the mean, median and mode remain equal
• Leptokurtic distribution: data bunched around the mean, giving a sharp peak
• Platykurtic distribution: a broad summit which falls rapidly in the tails
• Bimodal distributions, e.g. toxicity data, produce a sigmoid probability plot
• Multi-modal distributions: data from animals with several age-classes; undulating wave-like curve

Chi-Square Goodness of Fit
The heights of 70 students: Chi-square goodness of fit of a normal distribution. (Example 6.1 in Zar; 7.4)
O = observed frequency; Xi = upper class limit; Z = (Xi - mean)/s; E = expected frequency = n(P(Xi)); the second E column combines classes 1-2 and classes 16-17

no.   Height class    O     Xi      Z        P(z)      P(Xi)     E         E (pooled)   (O-E)^2/E
1     <62.5           0     62.5    -2.32    0.0102    0.0102    0.7172
2     62.5 - 63.5     2     63.5    -2.02    0.0219    0.0117    0.8191    1.5363       0.1400
3     63.5 - 64.5     2     64.5    -1.71    0.0434    0.0214    1.4987    1.4987       0.1677
4     64.5 - 65.5     3     65.5    -1.41    0.0791    0.0358    2.5048    2.5048       0.0979
5     65.5 - 66.5     5     66.5    -1.11    0.1338    0.0546    3.8238    3.8238       0.3618
6     66.5 - 67.5     4     67.5    -0.81    0.2099    0.0762    5.3318    5.3318       0.3327
7     67.5 - 68.5     6     68.5    -0.50    0.3069    0.0970    6.7906    6.7906       0.0921
8     68.5 - 69.5     5     69.5    -0.20    0.4198    0.1129    7.8996    7.8996       1.0643
9     69.5 - 70.5     8     70.5    0.10     0.5397    0.1199    8.3939    8.3939       0.0185
10    70.5 - 71.5     7     71.5    0.40     0.6561    0.1164    8.1467    8.1467       0.1614
11    71.5 - 72.5     7     72.5    0.70     0.7593    0.1032    7.2220    7.2220       0.0068
12    72.5 - 73.5     10    73.5    1.01     0.8428    0.0835    5.8479    5.8479       2.9481
13    73.5 - 74.5     6     74.5    1.31     0.9046    0.0618    4.3251    4.3251       0.6486
14    74.5 - 75.5     3     75.5    1.61     0.9463    0.0417    2.9219    2.9219       0.0021
15    75.5 - 76.5     2     76.5    1.91     0.9721    0.0258    1.8029    1.8029       0.0215
16    76.5 - 77.5     0     77.5    2.21     0.9866    0.0145    1.0161    1.5392       1.5392
17    77.5 - 78.5     0     78.5    2.52     0.9941    0.0075    0.5231

Chi-square = 7.6026
$\chi^2_{0.05,\,12} = 21.026$
Do not reject Ho: the data are normally distributed

Mean and standard deviation of the heights (Xi = mid height of class, fi = frequency):

Xi      fi    fiXi    fi(Xi)^2
63      2     126     7938
64      2     128     8192
65      3     195     12675
66      5     330     21780
67      4     268     17956
68      6     408     27744
69      5     345     23805
70      8     560     39200
71      7     497     35287
72      7     504     36288
73      10    730     53290
74      6     444     32856
75      3     225     16875
76      2     152     11552
sum     70    4912    345438

mean = 70.17; sd = 3.310, from $s^2 = (345438 - 4912^2/70)/(70 - 1)$

[Figure: observed and expected frequency vs. height (in), Xi]

Kolmogorov-Smirnov one-sample test
The heights of 70 students: Kolmogorov-Smirnov goodness of fit of a normal distribution.
Xi = upper class limit; O = observed frequency; Z = (Xi - mean)/s; cum. E = cumulative expected relative frequency

no.   Height class    Xi      O     Cum. O    Cum. rel. O    Z        Cum. E    Di        D'i
1     <62.5           62.5    0     0         0.0000         -2.32    0.0102    0.0102    0.0102
2     62.5 - 63.5     63.5    2     2         0.0286         -2.02    0.0219    0.0066    0.0219
3     63.5 - 64.5     64.5    2     4         0.0571         -1.71    0.0434    0.0138    0.0148
4     64.5 - 65.5     65.5    3     7         0.1000         -1.41    0.0791    0.0209    0.0220
5     65.5 - 66.5     66.5    5     12        0.1714         -1.11    0.1338    0.0377    0.0338
6     66.5 - 67.5     67.5    4     16        0.2286         -0.81    0.2099    0.0186    0.0385
7     67.5 - 68.5     68.5    6     22        0.3143         -0.50    0.3069    0.0073    0.0784
8     68.5 - 69.5     69.5    5     27        0.3857         -0.20    0.4198    0.0341    0.1055
9     69.5 - 70.5     70.5    8     35        0.5000         0.10     0.5397    0.0397    0.1540
10    70.5 - 71.5     71.5    7     42        0.6000         0.40     0.6561    0.0561    0.1561
11    71.5 - 72.5     72.5    7     49        0.7000         0.70     0.7593    0.0593    0.1593
12    72.5 - 73.5     73.5    10    59        0.8429         1.01     0.8428    0.0001    0.1428
13    73.5 - 74.5     74.5    6     65        0.9286         1.31     0.9046    0.0240    0.0617
14    74.5 - 75.5     75.5    3     68        0.9714         1.61     0.9463    0.0251    0.0178
15    75.5 - 76.5     76.5    2     70        1.0000         1.91     0.9721    0.0279    0.0007
16    76.5 - 77.5     77.5    0     70        1.0000         2.21     0.9866    0.0134    0.0134
17    77.5 - 78.5     78.5    0     70        1.0000         2.52     0.9941    0.0059    0.0059
D max                                                                           0.0593    0.1593

$D_{0.05,\,70} = 0.1598$ > D max; do not reject Ho
Another method can be found in example 7.14 (Zar 99)

Symmetry (Skewness) and Kurtosis
Skewness
• A measure of the asymmetry of a distribution.
• The normal distribution is symmetric, and has a skewness value of zero.
• A distribution with a significant positive skewness has a long right tail.
• A distribution with a significant negative skewness has a long left tail.
• As a rough guide, a skewness value more than twice its standard error is taken to indicate a departure from symmetry.

Symmetry (Skewness) and Kurtosis
Kurtosis
• A measure of the extent to which observations cluster around a central point.
• For a normal distribution, the value of the kurtosis statistic is 0.
• Positive kurtosis indicates that the observations cluster more and have longer tails than those in the normal distribution (leptokurtic).
• Negative kurtosis indicates that the observations cluster less and have shorter tails (platykurtic).
• You should read Chapters 1-7 of Zar 1999, which have been covered by the five lectures so far.
• The frequency distribution of a sample can often be identified with a theoretical distribution, such as the normal distribution.
• Five methods for comparing a sample distribution: inspection of the frequency histogram, probability plot, Chi-square goodness of fit, Kolmogorov-Smirnov one-sample test and D'Agostino-Pearson K2 test.
• Probability plots can be used for testing normal and log-normal distributions.
• Graphical methods often provide evidence of non-normal distributions, such as skewness and kurtosis (Excel or SPSS can determine the degree of these two measurements).
• The Chi-square goodness of fit or the Kolmogorov-Smirnov one-sample test can also be used to test an unknown distribution against a theoretical distribution other than the normal distribution.

Binomial & Poisson Distributions and their Application (Chapters 24 & 25, Zar 1999)

Binomial
• Consider nominal scale data that come from a population with only two categories
  – members of a mammal litter may be classified as male or female
  – victims of an epidemic as dead or alive
  – progeny of a Drosophila cross as white-eyed or red-eyed

Binomial Distributions
The proportion of the population belonging to one of the two categories is denoted as:
  – p, then the other is q = 1 - p
  – e.g. if 48% are male and 52% female, then p = 0.48 and q = 0.52
(Sources of photos: BBC; http://www.mun.ca/biology/scarr/Bird_sexing.htm; http://zygote.swarthmore.edu/chap20.html)

Binomial Distributions
• e.g. if p = 0.4 and q = 0.6: when taking 10 random samples, you will expect 4 males and 6 females; however, you might get 1 male and 9 females.
• The probability of two independent events both occurring is the product of the probabilities of the two separate events:
  – (p)(q) = (0.4)(0.6) = 0.24;
  – (p)(p) = 0.16; and
  – (q)(q) = 0.36
• The probability of either of two mutually exclusive events occurring is the sum of the probabilities of each event, e.g. for having one male and one female in a sample of two: pq + qp = 2pq = 2(0.4)(0.6) = 0.48
• For having all male, all female or both sexes: 0.16 + 0.36 + 0.48 = 1

Binomial Distributions
If a random sample of size n is taken from a binomial population, the probability of X individuals being in one category (other category = n - X) is
$P(X) = \frac{n!}{X!(n-X)!} p^X q^{n-X}$
For n = 5, X = 3, p = q = 0.5:
$P(X) = \frac{5!}{3!\,2!}(0.5^3)(0.5^2) = (10)(0.125)(0.25) = 0.3125$
For X = 0, 1, 2, 4, 5, P(X) = 0.03125, 0.15625, 0.31250, 0.15625, 0.03125, respectively

Binomial distributions
• For example: The data consist of observed frequencies of females in 54 litters of 5 offspring per litter. X = 0 denotes a litter having no females, X = 1 denotes a litter having one female, etc.; f is the observed number of litters, and ef is the number of litters expected if the null hypothesis is true. Computation of the values of ef requires the values of P(X)
• Ho: The sexes of the offspring are from a binomial distribution with p = q = 0.5

X    n    n-X    n!/(X!(n-X)!)    p      q      p^X        q^(n-X)    P(X)       Observed fi    Expected efi = P(X)(54)
0    5    5      1                0.5    0.5    1          0.03125    0.03125    3              1.688
1    5    4      5                0.5    0.5    0.5        0.0625     0.15625    10             8.438
2    5    3      10               0.5    0.5    0.25       0.125      0.31250    14             16.875
3    5    2      10               0.5    0.5    0.125      0.25       0.31250    17             16.875
4    5    1      5                0.5    0.5    0.0625     0.5        0.15625    9              8.438
5    5    0      1                0.5    0.5    0.03125    1          0.03125    1              1.688

$\chi^2 = \sum (\text{Observed freq} - \text{Expected freq})^2 / \text{Expected freq}$
$\chi^2 = (3 - 1.688)^2/1.688 + 0.2894 + 0.4898 + 0.0009 + 0.0375 + 0.2801 = 2.117$
df = k - 1 = 6 - 1 = 5; $\chi^2_{0.05,\,5} = 11.07$, so do not reject Ho.
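The litter example combines the binomial formula with the goodness of fit calculation, and both steps are short in code. A minimal sketch, with illustrative names and using `math.comb` (Python 3.8 or later) for the binomial coefficient:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X) = n! / (X! (n - X)!) * p^X * q^(n - X), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


litter_size = 5
observed = [3, 10, 14, 17, 9, 1]   # litters with 0, 1, ..., 5 females
litters = sum(observed)            # 54 litters in total

# Ho: sexes follow a binomial distribution with p = q = 0.5.
probs = [binomial_pmf(x, litter_size, 0.5) for x in range(litter_size + 1)]
expected = [litters * pr for pr in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for x, (pr, e) in enumerate(zip(probs, expected)):
    print(f"X = {x}: P(X) = {pr:.5f}, expected litters = {e:.3f}")
print(f"chi-square = {stat:.3f}, df = {len(observed) - 1}")
```

Using unrounded expected frequencies, the statistic comes out at about 2.119, marginally above the hand-rounded 2.117 and still far below 11.07.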
P > 0.05 for the litter example, with $P(X) = \frac{n!}{X!(n-X)!} p^X q^{n-X}$

Poisson Distributions
Important in describing random occurrences, these occurrences being either objects in space or events in time.
$P(X) = e^{-\mu} \mu^X / X!$
• When n is large and p is very small, the binomial distribution approaches the Poisson distribution.
• Interesting property: $\sigma^2 = \mu$

Poisson Distributions
• e.g. The data are the number of sparrow nests in an area of given size (8,000 m2). In total, 40 areas of the same size were surveyed. Xi is the number of nests in an area; fi is the frequency of areas with Xi nests; and P(Xi) is the probability of Xi nests per area, if the nests are distributed randomly.
• Ho: the population of sparrow nests is distributed randomly

Example 25.3 (Zar 1999)

Xi      fi (O)    fiXi    P(Xi)      E = [P(Xi)](n)    (O-E)^2/E
0       9         0       0.33287    13.3148           1.398280
1       22        22      0.36616    14.6463           3.692154
2       6         12      0.20139    8.0555            0.524488
3       2         6       0.07384    2.9537            0.307921
4       1         4       0.02031    0.8123            0.043392
>=5     0
sum     40        44

mean = 44/40 = 1.1
Chi-square = 5.966234
df = k - 2 = 3
$\chi^2_{0.05,\,3} = 7.815$
Do not reject Ho
$P(0) = e^{-1.1} = 0.332871$
$P(1) = (0.332871)(1.1)/1$

For further reading on Binomial and Poisson distributions: Zar's chapters 24 and 25
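The sparrow nest table can be reproduced by estimating the mean from the data and then evaluating the Poisson formula for each class. The sketch below uses illustrative names and, like the table, uses only the classes 0 to 4 and ignores the empty >= 5 class; that choice is an assumption carried over from the slide rather than a general rule.

```python
import math


def poisson_pmf(x, mu):
    """P(X) = e^(-mu) * mu^X / X!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)


nests = [0, 1, 2, 3, 4]        # Xi: nests per surveyed area
areas = [9, 22, 6, 2, 1]       # fi: number of areas with Xi nests

n = sum(areas)                                        # 40 areas
mean = sum(x * f for x, f in zip(nests, areas)) / n   # estimate of mu

expected = [n * poisson_pmf(x, mean) for x in nests]
stat = sum((o - e) ** 2 / e for o, e in zip(areas, expected))
df = len(nests) - 2   # one df lost for the total, one for estimating mu

print(f"mean = {mean:.2f}, P(0) = {poisson_pmf(0, mean):.6f}")
print(f"chi-square = {stat:.4f}, df = {df}")
```

The extra degree of freedom is lost because the mean is estimated from the sample instead of being specified by Ho.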
