# Frequency distributions: Testing of goodness of fit and contingency tables


```
Frequency distributions:
Testing of goodness of fit and
contingency tables

Chi-square statistics
• Widely used for the analysis of nominal data
• Introduced by Karl Pearson in 1900
• Its theory and application were expanded by
him and R. A. Fisher
• This lecture will cover the Chi-square test, the
G test, and the Kolmogorov-Smirnov goodness
of fit test for continuous data
The χ² test:
χ² = Σ (observed freq. - expected freq.)²/expected freq.

• Obtain a sample of nominal scale data and
infer whether the population conforms to a certain
theoretical distribution, e.g. a genetic study
• Test Ho that the observations (not the
variables) are independent of each other for
the population.
• Based on the difference between the actual
observed frequencies (not %) and the
expected frequencies
The 2 test:
2 =  (observed freq. - expected freq.)2/ expected freq.

• A measure of how far a sample distribution
deviates from a theoretical distribution
• Ho: no difference between the observed and
expected frequencies (HA: they are different)
• If Ho is true: the differences, and hence the Chi-square
value, will be SMALL
• If Ho is false: both will be LARGE
For Questionnaire

Example (1)
• 259 people were asked what they thought about cutting air pollution
by increasing the tax on vehicle fuel.
• 113 people agreed with this idea but the rest
disagreed.
• Perform a Chi-square test to determine the
probability of the results being obtained by
chance.
For Questionnaire
                        Agree               Disagree
Observed                  113        259 - 113 = 146
Expected        259/2 = 129.5          259/2 = 129.5
Ho: Observed = Expected
χ² = (113 - 129.5)²/129.5 + (146 - 129.5)²/129.5
   = 2.102 + 2.102 = 4.204
df = k - 1 = 2 - 1 = 1

From the Chi-square table (Table B1 in Zar’s book):
χ²(α = 0.05, df = 1) = 3.841; for χ² = 4.204, 0.025 < p < 0.05

Therefore, reject Ho. The probability of the results
being obtained by chance is between 0.025 and 0.05.
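The questionnaire calculation above can be sketched in a few lines of Python (an illustrative sketch, not part of the slides; scipy.stats.chisquare would give the same statistic plus a p-value):

```python
# Chi-square goodness of fit for the questionnaire data (113 agree, 146 disagree).
# Expected frequencies under Ho: an even 50:50 split of the 259 respondents.
observed = [113, 146]
expected = [259 / 2, 259 / 2]  # 129.5 each

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(chi_square, df)  # chi-square about 4.204, df = 1
```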
For Genetics

Practical (1)
• Calculate the Chi-square of data consisting of
100 flowers to a hypothesized color ratio of 3:1
(red: green) and test the Ho
• Ho: the sample data come from a population
having a 3:1 ratio of red to green flowers
• Observation: 84 red and 16 green
• Expected frequency for 100 flowers:
– 75 red and 25 green
For Genetics

Practical (2)
• Calculate the Chi-square of data consisting of
100 flowers to a hypothesized color ratio of 3:1
(red: green) and test the Ho
• Ho: the sample data come from a population
having a 3:1 ratio of red to green flowers
• Observation: 67 red and 33 green
• Expected frequency for 100 flowers:
– 75 red and 25 green
For Genetics
For > 2 categories
• Ho: The sample of Drosophila from a population
having 9: 3: 3: 1 ratio of pale body-normal wing
(PNW) to pale-vestigial wing (PVW) to dark-normal
wing (DNW) to dark-vestigial wing (DVW)

• Student’s observations in the lab:
PNW     PVW    DNW     DVW      Total
300     77     89      36       502

Calculate the chi-square and test Ho
For Genetics

• Ho: The sample of Drosophila (F2) from a population having 9: 3: 3: 1
ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to
dark-normal wing (DNW) to dark-vestigial wing (DVW)

                PNW     PVW     DNW     DVW     Total
Observed        300     77      89      36      502
Exp. proportion 9/16    3/16    3/16    1/16    1
Expected        282.4   94.1    94.1    31.4    502
O - E           17.6    -17.1   -5.1    4.6     0
(O - E)²        309.8   292.4   26.0    21.2
(O - E)²/E      1.1     3.1     0.3     0.7
χ² = 1.1 + 3.1 + 0.3 + 0.7 = 5.2
df = 4 - 1 = 3
χ²(α = 0.05, df = 3) = 7.815; for χ² = 5.2, 0.10 < p < 0.25

Therefore, accept Ho.
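The 9:3:3:1 calculation generalizes to any number of categories; a minimal Python sketch (not from the slides) using the expected proportions:

```python
# Chi-square goodness of fit for the 9:3:3:1 Drosophila cross.
observed = [300, 77, 89, 36]                  # PNW, PVW, DNW, DVW
proportions = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
n = sum(observed)                             # 502
expected = [p * n for p in proportions]       # 282.375, 94.125, 94.125, 31.375

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                        # 3

print(chi_square)  # about 5.2, as on the slide
```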
For Questionnaire

Cross Tabulation or Contingency Tables

– Further examination of the data on the opinion on increasing
fuel to cut down air pollution (example 1):
– Ho: the decision is independent of sex
Males             Females
Agree          13 (a)              100 (b)
Disagree      116 (c)               30 (d)
Expected frequency for cell b = (a + b)[(b + d)/N]

                Males                      Females                    n
Agree           13                         100                      113
  expected:     113(129/259) = 56.28       113(130/259) = 56.72
Disagree        116                        30                       146
  expected:     146(129/259) = 72.72       146(130/259) = 73.28
n               129                        130                      259
Cross tabulation or contingency tables:
– Ho: the decision is independent of sex
                Males            Females            n
Agree           13 (56.28)       100 (56.72)      113
Disagree        116 (72.72)      30 (73.28)       146
n               129              130              259

2 = (13 - 56.28)2/56.28 + (100 - 56.72)2/56.72 + (116 - 72.72) 2/72.72 +
(30 - 73.28)2/73.28
= 117.63
df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
2 ( = 0.05, df = 1)= 3.841  p<0.001
Therefore, reject Ho and accept H A that the decision is dependent of sex.
Quicker method for 2 x 2 cross tabulation:
Class A         Class B         n
State 1       a              b           a+b
State 2       c              d           c+d
n       a+c             b+d       n = a + b + c +d
χ² = n(ad - bc)²/[(a + b)(c + d)(a + c)(b + d)]
Males         Females
Agree       13             100         113
Disagree   116              30         146
129             130         259
2 = 259(13  30 - 116  100)2/(113)(146)(129)(130) = 117.64
2 (   = 0.05, df = 1)=   3.841  p<0.001; Therefore, rejected Ho.
Yates’ continuity correction:

• Chi-square is also a continuous distribution, while the
frequencies being analyzed are discontinuous (whole
numbers).
• To improve the analysis, Yates’ correction is often applied
(Yates, 1934):

• χ² = Σ (|observed freq. - expected freq.| - 0.5)²/expected freq.

• For a 2 x 2 contingency table:
χ² = n(|ad - bc| - 0.5n)²/[(a + b)(c + d)(a + c)(b + d)]
Yates’ correction (example 1):
• χ² = n(|ad - bc| - 0.5n)²/[(a + b)(c + d)(a + c)(b + d)]

                Males            Females
Agree           13               100              113
Disagree        116              30               146
                129              130              259
χ² = 259(|13 × 30 - 116 × 100| - 0.5 × 259)²/[(113)(146)(129)(130)]
   = 114.935 (smaller than 117.64, less biased)

χ²(α = 0.05, df = 1) = 3.841; p < 0.001; therefore, reject Ho.
Practical 3:
• 2 = n (ad - bc- 0.5n)2/(a + b)(c + d)(a + c)(b + d)
• For a drug test, Ho: The survival of the animals is
independent of whether the drug is administered
Treated            12             30              42
Not treated        27             31              58
n                  39             61              100

Use Yates’ correction to calculate χ² and test the hypothesis

Bias in Chi-square calculations

• If values of the expected frequency (fi) are very small,
the calculated χ² is biased in that it is larger than the
theoretical χ² value, and we shall tend to reject Ho.

• Rules: fi > 1 and no more than 20% of fi < 5.0.

• The test may be conservative at significance levels < 5%,
especially when the expected frequencies are all
equal.

• If fi values are small: (1) increase the sample size if
possible, (2) use the G test, or (3) combine categories if
possible.
The G test (log-likelihood ratio)
G = 2  O ln (O/E)
• Similar to the 2 test
• Many statisticians believe that the G test is superior to the
2 test (although at present it is not as popular)
• For 2 x 2 cross tabulation:
Class A        Class B
State 1         a               b
State 2         c               d
The expected frequency for cell a = (a+b)[(a+c)/n]
Treated              12 (16.38)        30 (25.62)     42
Not treated          27 (22.62)        31 (35.38)     58
n                    39                61            100
G = 2  O ln (O/E)
Treated              12 (16.38)       30 (25.62)       42
Not treated          27 (22.62)       31 (35.38)       58
n                    39               61              100

(1) Calculate G:
G = 2 [ 12 ln(12/16.38) + 30 ln(30/25.62) + 27 ln(27/22.62) + 31
ln(31/35.38)]
G = 2(1.681) = 3.362
(2) Calculate Williams’ correction: 1 + [(w² - 1)/6nd], where w
is the number of frequency cells, n is the total number of
measurements and d is the degrees of freedom (r - 1)(c - 1)
= 1 + [(4² - 1)/(6)(100)(1)] = 1.025
G (adjusted) = 3.362/1.025 = 3.28 (< 3.31 from the χ² test)
χ²(α = 0.05, df = 1) = 3.841; p > 0.05; therefore, accept Ho.
• Ho: The sample of Drosophila (F2) from a population having 9: 3: 3: 1
ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to
dark-normal wing (DNW) to dark-vestigial wing (DVW)

            PNW      PVW      DNW      DVW      Total
Observed    300      77       89       36       502
Expected    282.4    94.1     94.1     31.4
O ln(O/E)   18.14    -15.44   -4.96    4.92
G value:             G = 2 Σ = 2(18.14 - 15.44 - 4.96 + 4.92) = 5.32
Williams’ correction:    1 + [(4² - 1)/6(502)(3)] = 1.00166
G (adjusted) = 5.32/1.00166 = 5.31
χ²(α = 0.05, df = 3) = 7.815; for G (adjusted) = 5.31, 0.10 < p < 0.25

Therefore, accept Ho.
The Kolmogorov-Smirnov goodness of fit test =
Kolmogorov-Smirnov one-sample test

• Deals with goodness of fit tests applicable to nominal scale
data and to data in ordered categories
• Example: 35 cats were tested one at a time, and allowed to
choose 5 different diets with different moisture content (1=
very moist to 5 = very dry):

• Ho: Cats prefer all five equally
               1      2       3      4       5      n
Observed       2      18      10     4       1      35
Expected       7      7       7      7       7      35
Kolmogorov-Smirnov one-sample test
• Ho: Cats prefer all five diets equally
               1      2       3       4       5       n
O              2      18      10      4       1       35
E              7      7       7       7       7       35
Cumulative O   2      20      30      34      35
Cumulative E   7      14      21      28      35
|di|           5      6       9       6       0
dmax = maximum |di| = 9
(dmax)α, k, n = (dmax)0.05, 5, 35 = 7 (Table B8: k = no. of categories)
Therefore, reject Ho.     0.002 < p < 0.005
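The cumulative comparison is mechanical; a Python sketch of the cat-diet test (critical values still come from Table B8):

```python
# Kolmogorov-Smirnov one-sample test on the cat diet data (ordered categories).
from itertools import accumulate

observed = [2, 18, 10, 4, 1]
expected = [7, 7, 7, 7, 7]          # equal preference under Ho

cum_o = list(accumulate(observed))  # [2, 20, 30, 34, 35]
cum_e = list(accumulate(expected))  # [7, 14, 21, 28, 35]
d = [abs(o - e) for o, e in zip(cum_o, cum_e)]

d_max = max(d)
print(d_max)  # 9, compared against the tabled critical value 7
```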
• When applicable (i.e. the categories are ordered), the K-S test is more
powerful than the χ² test when n is small or when the observed
frequencies are small.
• Note: if the order of the same data is changed to 2, 1, 4, 18 and 10, the χ²
test will give the same result (it is independent of the order), but the
calculated dmax from the K-S test will be different.
Kolmogorov-Smirnov one-sample test for
continuous ratio scale data

• Example 22.11 (page 479 in Zar)
• Ho: Moths are distributed uniformly from ground
level to height of 25 m
• HA: Moths are not distributed uniformly from
ground level to height of 25 m
• Use of Table B9
Ho: Moths are distributed uniformly from ground level to height of 25 m

                       cumulative    relative cum.   relative expected
no.   Xi (height)  fi  frequency Fi  freq. Fi/15     freq. Xi/25          Di        D'i
1        1.4       1        1          0.0667          0.056            0.0107    0.0560
2        2.6       1        2          0.1333          0.104            0.0293    0.0373
3        3.3       1        3          0.2000          0.132            0.0680    0.0013
4        4.2       1        4          0.2667          0.168            0.0987    0.0320
5        4.7       1        5          0.3333          0.188            0.1453    0.0787
6        5.6       2        7          0.4667          0.224            0.2427    0.1093
7        6.4       1        8          0.5333          0.256            0.2773    0.2107
8        7.7       1        9          0.6000          0.308            0.2920    0.2253
9        9.3       1       10          0.6667          0.372            0.2947    0.2280
10      10.6       1       11          0.7333          0.424            0.3093    0.2427
11      11.5       1       12          0.8000          0.460            0.3400    0.2733
12      12.4       1       13          0.8667          0.496            0.3707    0.3040
13      18.6       1       14          0.9333          0.744            0.1893    0.1227
14      22.3       1       15          1.0000          0.892            0.1080    0.0413

Max Di = 0.3707     Max D'i = 0.3040

Table B9: D 0.05, 15 = 0.3376 < Dmax

Therefore, reject Ho.     0.02 < p < 0.05
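The Di and D'i columns for the moth data can be generated directly from the raw heights; a sketch (heights read off the table, with 5.6 m observed twice):

```python
# K-S test of moth heights against a uniform 0-25 m distribution (Zar ex. 22.11).
heights = [1.4, 2.6, 3.3, 4.2, 4.7, 5.6, 5.6, 6.4, 7.7,
           9.3, 10.6, 11.5, 12.4, 18.6, 22.3]            # n = 15
n = len(heights)

d_values = []
for i, x in enumerate(sorted(heights), start=1):
    rel_cum = i / n          # observed cumulative relative frequency
    rel_exp = x / 25         # expected under a uniform distribution on 0-25 m
    d_values.append(abs(rel_cum - rel_exp))          # Di
    d_values.append(abs((i - 1) / n - rel_exp))      # D'i

d_max = max(d_values)
print(d_max)  # about 0.3707, compared against D 0.05,15 = 0.3376
```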
Kolmogorov-Smirnov one-sample test
for grouped data (example 22.11)
Xi                  0-5 m     5-10 m   10-15 m   15-20 m   20-25 m      n
observed fi             5         5         3         1         1      15
expected fi             3         3         3         3         3      15
Cumulative O fi         5        10        13        14        15
Cumulative E fi         3         6         9        12        15
|di|                    2         4         4         2         0
dmax = 4
(dmax)0.05, 5, 15 = 5 (use Table B8)
Thus, accept Ho.     0.05 < p < 0.10

Note: The power is lost (and Ho is not rejected) by grouping the data
and therefore grouping should be avoided whenever possible.

• K-S test can be used to test normality of data
• Recognizing the distribution of your data
is important
– Provides a firm base on which to establish
and test hypotheses
– If data are normally distributed, you can
use parametric tests
– Otherwise, transform the data to a normal
distribution, or perform non-parametric
tests
• For a reliable test for normality of
interval data, n must be large enough
(e.g. > 15)

– Difficult to tell whether a small data set
(e.g. 5) is normally distributed
•   Inspection of the frequency histogram
•   Probability plot
•   Chi-square goodness of fit
•   Kolmogorov-Smirnov one-sample test
•   Symmetry and Kurtosis: D’Agostino-
Pearson K2 test (Chapters 6 & 7, Zar 99)
Inspection of the frequency histogram

• Construct the frequency histogram
• Calculate the mean and median (mode as well, if possible)
• Check the shape of the distribution and the location of
these measurements
Probability plot
e.g. 1     (percentile = c.f./61;  probit z = NORMSINV(percentile))

                          Cumulative                Probit     Upper
Class        frequency    frequency   Percentile    z          (5 + z)    class limit
0-   2            1            1   0.0164    -2.1347     2.8653          2
2-   4            2            3   0.0492    -1.6529     3.3471          4
4-   6            3            6   0.0984    -1.2910     3.7090          6
6-   8            5           11   0.1803    -0.9141     4.0859          8
8-   10           8           19   0.3115    -0.4917     4.5083         10
10 -   12          11           30   0.4918    -0.0205     4.9795         12
12 -   14           8           38   0.6230     0.3132     5.3132         14
14 -   16           9           47   0.7705     0.7405     5.7405         16
16 -   18           6           53   0.8689     1.1210     6.1210         18
18 -   20           4           57   0.9344     1.5096     6.5096         20
20 -   22           3           60   0.9836     2.1347     7.1347         22
22 -   24           1           61   1.0000
e.g. 1  Probability plot

[Figure: frequency histogram (bin numbers 1-12, bin size = 2) and probability
plot of expected vs observed cumulative p; fitted line y = 0.8502x + 0.0736,
R² = 0.9711]
Probability plot
e.g. 2

                          Cumulative                Probit     Upper
Class        frequency    frequency   Percentile    z          (5 + z)    class limit
0-2             10            10       0.1111      -1.2206     3.7794          2
2-4             10            20       0.2222      -0.7647     4.2353          4
4-6             24            44       0.4889      -0.0279     4.9721          6
6-8             11            55       0.6111       0.2822     5.2822          8
8-10             9            64       0.7111       0.5566     5.5566         10
10-12            4            68       0.7556       0.6921     5.6921         12
12-14            5            73       0.8111       0.8820     5.8820         14
14-16            2            75       0.8333       0.9674     5.9674         16
16-18            2            77       0.8556       1.0606     6.0606         18
18-20            4            81       0.9000       1.2816     6.2816         20
20-22            1            82       0.9111       1.3476     6.3476         22
22-24            8            90       1.0000
Probability plot
e.g. 2

[Figure: frequency histogram (bin numbers 1-12, bin size = 2) and probability
plot of expected vs observed cumulative p; fitted line y = 1.0876x - 0.2443,
R² = 0.8576]

• Obviously, the data are not distributed on the line.
• Based on the frequency distribution of the data, the distribution is
positively skewed (higher frequencies at the lower classes)
• A concave curve indicates positive skew, which suggests a log-
normal distribution (i.e. log-transformation of the upper class
limit is required)
→ very common, e.g. mortality rates

• A convex curve indicates negative skew
→ less common (e.g. some binomial distributions)
• An S-shaped curve suggests ‘bad’ kurtosis: a departure from normality,
but the mean, median and mode remain equal
• Leptokurtic distribution: data bunched around the mean, giving a
sharp peak
• Platykurtic distribution: a broad summit which falls rapidly in the
tails

• Bimodal distributions e.g.
toxicity data produce a
sigmoid probability plot
• Multi-modal distributions:
data from animals with several
age-classes; an undulating wave-like curve
Chi-Square Goodness of Fit
The heights of 70 students: Chi-square goodness of fit of a normal distribution.
(Example 6.1 in Zar)
                           O           Xi           Z                              Expected    Pooled exp.
no.   Height class         observed    upper        (Xi-mean)/s   P(z)    P(Xi)    frequency   frequency    (O-E)²/E
                           frequency   class limit                                 n(P(Xi))    n(P(Xi))
1   <62.5                  0       62.5        -2.32   0.0102    0.0102     0.7172
2    62.5 - 63.5           2       63.5        -2.02   0.0219    0.0117     0.8191          1.5363       0.1400
3    63.5 - 64.5           2       64.5        -1.71   0.0434    0.0214     1.4987          1.4987       0.1677
4    64.5 - 65.5           3       65.5        -1.41   0.0791    0.0358     2.5048          2.5048       0.0979
5    65.5 - 66.5           5       66.5        -1.11   0.1338    0.0546     3.8238          3.8238       0.3618
6    66.5 - 67.5           4       67.5        -0.81   0.2099    0.0762     5.3318          5.3318       0.3327
7    67.5 - 68.5           6       68.5        -0.50   0.3069    0.0970     6.7906          6.7906       0.0921
8    68.5 - 69.5           5       69.5        -0.20   0.4198    0.1129     7.8996          7.8996       1.0643
9    69.5 - 70.5           8       70.5         0.10   0.5397    0.1199     8.3939          8.3939       0.0185
10    70.5 - 71.5           7       71.5         0.40   0.6561    0.1164     8.1467          8.1467       0.1614
11    71.5 - 72.5           7       72.5         0.70   0.7593    0.1032     7.2220          7.2220       0.0068
12    72.5 - 73.5          10       73.5         1.01   0.8428    0.0835     5.8479          5.8479       2.9481
13    73.5 - 74.5           6       74.5         1.31   0.9046    0.0618     4.3251          4.3251       0.6486
14    74.5 - 75.5           3       75.5         1.61   0.9463    0.0417     2.9219          2.9219       0.0021
15    75.5 - 76.5           2       76.5         1.91   0.9721    0.0258     1.8029          1.8029       0.0215
16    76.5 - 77.5           0       77.5         2.21   0.9866    0.0145     1.0161          1.5392       1.5392
17    77.5 - 78.5           0       78.5         2.52   0.9941    0.0075     0.5231
Chi-square = 7.6026
χ² 0.05, 12 = 21.026  (the tail classes are pooled, leaving 15 classes; df = 15 - 1 - 2 = 12)

Accept Ho: the data are normally distributed
Xi              fi
Mid height      freq          fiXi       fi(Xi)²
63               2             126          7938
64               2             128          8192
65               3             195         12675
66               5             330         21780
67               4             268         17956
68               6             408         27744
69               5             345         23805
70               8             560         39200
71               7             497         35287
72               7             504         36288
73              10             730         53290
74               6             444         32856
75               3             225         16875
76               2             152         11552
sum             70            4912        345438
mean = 4912/70 = 70.17
sd = 3.310     [s² = (345438 - 4912²/70)/(70 - 1)]

[Figure: observed vs expected frequency histograms; Height (in), Xi, from 60 to 80]
Kolmogorov-Smirnov one-sample test
The heights of 70 students: Kolmogorov-Smirnov goodness of fit of a normal distribution.

                                 observed    cumulative   cumulative     Z             cumulative
                    upper class  O           O            relative O     (Xi-mean)/s   E
no.  Height class   limit        frequency   frequency    frequency                    frequency    Di        D'i
1    <62.5             62.5         0           0.0        0.0000        -2.32         0.0102     0.0102    0.0102
2    62.5 - 63.5       63.5         2           2.0        0.0286        -2.02         0.0219     0.0066    0.0219
3    63.5 - 64.5       64.5         2           4.0        0.0571        -1.71         0.0434     0.0138    0.0148
4    64.5 - 65.5       65.5         3           7.0        0.1000        -1.41         0.0791     0.0209    0.0220
5    65.5 - 66.5       66.5         5          12.0        0.1714        -1.11         0.1338     0.0377    0.0338
6    66.5 - 67.5       67.5         4          16.0        0.2286        -0.81         0.2099     0.0186    0.0385
7    67.5 - 68.5       68.5         6          22.0        0.3143        -0.50         0.3069     0.0073    0.0784
8    68.5 - 69.5       69.5         5          27.0        0.3857        -0.20         0.4198     0.0341    0.1055
9    69.5 - 70.5       70.5         8          35.0        0.5000         0.10         0.5397     0.0397    0.1540
10   70.5 - 71.5       71.5         7          42.0        0.6000         0.40         0.6561     0.0561    0.1561
11   71.5 - 72.5       72.5         7          49.0        0.7000         0.70         0.7593     0.0593    0.1593
12   72.5 - 73.5       73.5        10          59.0        0.8429         1.01         0.8428     0.0001    0.1428
13   73.5 - 74.5       74.5         6          65.0        0.9286         1.31         0.9046     0.0240    0.0617
14   74.5 - 75.5       75.5         3          68.0        0.9714         1.61         0.9463     0.0251    0.0178
15   75.5 - 76.5       76.5         2          70.0        1.0000         1.91         0.9721     0.0279    0.0007
16   76.5 - 77.5       77.5         0          70.0        1.0000         2.21         0.9866     0.0134    0.0134
17   77.5 - 78.5       78.5         0          70.0        1.0000         2.52         0.9941     0.0059    0.0059

Max Di = 0.0593     Max D'i = 0.1593
D 0.05, 70 = 0.1598 > Dmax

Accept Ho. (Another method can be found in example 7.14, Zar 99.)
Symmetry (Skewness)
and Kurtosis
Skewness
• A measure of the asymmetry of a
distribution.
• The normal distribution is
symmetric, and has a skewness
value of zero.
• A distribution with a significant
positive skewness has a long
right tail.
• A distribution with a significant
negative skewness has a long left
tail.
• As a rough guide, a skewness
value more than twice its
standard error is taken to
indicate a departure from
symmetry.
Symmetry (Skewness)
and Kurtosis
Kurtosis
• A measure of the extent to which
observations cluster around a central
point.
• For a normal distribution, the value
of the kurtosis statistic is 0.
• Positive kurtosis indicates that the
observations cluster more and have
longer tails than those in the normal
distribution (leptokurtic).
• Negative kurtosis indicates that the
observations cluster less and have
shorter tails (platykurtic).
• You should read Chapters 1-7 of Zar 1999, which have been
covered by the five lectures so far.
• The frequency distribution of a sample can often be identified with a
theoretical distribution, such as the normal distribution.
• Five methods for comparing a sample distribution with a theoretical
one: inspection of the frequency histogram; probability plot; Chi-square
goodness of fit; Kolmogorov-Smirnov one-sample test; and the
D’Agostino-Pearson K2 test.
• Probability plots can be used for testing normal and log-normal
distributions.
• Graphical methods often provide evidence of non-normal distributions,
such as skewness and kurtosis (Excel or SPSS can determine the
degree of these two measurements).
• The Chi-square goodness of fit or Kolmogorov-Smirnov one-sample
test can also be used to test an unknown distribution against a
theoretical distribution (other than the normal distribution).
Binomial & Poisson Distributions
and their Application

(Chapters 24 & 25, Zar 1999)
Binomial
• Consider nominal scale data that come from
a population with only two categories
– members of a mammal litter may be classified
as male or female
– victims of an epidemic as dead or alive
– progeny of a Drosophila cross as white-eyed or
red-eyed
Binomial Distributions

The proportion of the population belonging to one of
the two categories is denoted as:

– p; the other is then q = 1 - p

– e.g. if 48% male and 52% female so
p = 0.48 and q = 0.52
(Source of photos: BBC)
http://www.mun.ca/biology/scarr/Bird_sexing.htm
http://zygote.swarthmore.edu/chap20.html
Binomial Distributions
• e.g. if p = 0.4 and q = 0.6: for taking 10 random samples,
you will expect 4 males and 6 females; however, you
might get 1 male and 9 females.

• The probability of two independent events both occurring
is the product of the probabilities of the two separate
events:
– (p)(q) = (0.4)(0.6) = 0.24;
– (p)(p) = 0.16; and
– (q)(q) = 0.36
Binomial Distributions

• e.g. if p = 0.4 and q = 0.6: for taking 10 random
samples, you will expect 4 males and 6 females

• The probability of either of two mutually exclusive outcomes
is the sum of their probabilities, e.g. for
having one male and one female in the sample:
pq + qp = 2pq = 2(0.4)(0.6) = 0.48

• For all male (0.16), all female (0.36), or
both sexes (0.48): 0.16 + 0.36 + 0.48 = 1
Binomial Distributions
If a random sample of size n is taken from a binomial
population, the probability of X individuals being in
one category (other category = n - X) is
P(X) = [n!/(X!(n-X)!)] p^X q^(n-X)

For n = 5, X = 3, p = q = 0.5, then
P(X) = (5!/3!2!)(0.5³)(0.5²)
P(X) = (10)(0.125)(0.25) = 0.3125

For X = 0, 1, 2, 4, 5,
P(X) = 0.03125, 0.15625, 0.31250, 0.15625, 0.03125,
respectively
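The binomial formula maps directly onto the stdlib; a sketch (helper name ours):

```python
# Binomial probabilities P(X) for n = 5, p = q = 0.5.
from math import comb

def binom_p(x, n, p):
    # P(X) = C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

probs = [binom_p(x, 5, 0.5) for x in range(6)]
print(probs)  # 0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125
```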
Binomial distributions
• For example: The data consist of observed frequencies of females in 54
litters of 5 offspring per litter. X = 0 denotes a litter having no females, X
= 1 denotes a litter having one female, etc.; f is the observed number of
litters, and ef is the number of litters expected if the null hypothesis is
true. Computation of the values of ef requires the values of P(X)
• Ho: The sexes of the offspring are from a binomial distribution with p = q =
0.5

X    n    n-X   n!/(X!(n-X)!)   p     q     p^X       q^(n-X)   P(X)      Observed fi   efi = 54 P(X)
0    5     5          1         0.5   0.5   1         0.03125   0.03125        3           1.688
1    5     4          5         0.5   0.5   0.5       0.0625    0.15625      10           8.438
2    5     3         10         0.5   0.5   0.25      0.125     0.31250      14          16.875
3    5     2         10         0.5   0.5   0.125     0.25      0.31250      17          16.875
4    5     1          5         0.5   0.5   0.0625    0.5       0.15625       9           8.438
5    5     0          1         0.5   0.5   0.03125   1         0.03125       1           1.688

χ² = Σ (observed freq - expected freq)²/expected freq
χ² = (3 - 1.688)²/1.688 + 0.2948 + 0.4898 + 0.0009 + 0.0375 + 0.2801
   = 2.117
df = k - 1 = 6 - 1 = 5; χ² 0.05, 5 = 11.07, so accept Ho. p > 0.05
P(X) = [n!/(X!(n-X)!)] p^X q^(n-X)
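Under Ho the expected litter counts come straight from P(X); a Python sketch of the whole fit (the slide rounds the total to 2.117):

```python
# Chi-square fit of the 54 litters to Binomial(5, 0.5).
from math import comb

observed = [3, 10, 14, 17, 9, 1]      # litters with X = 0..5 females
n_litters, litter_size, p = 54, 5, 0.5

# with p = q = 0.5, P(X) = C(5, X) / 32
expected = [n_litters * comb(litter_size, x) * p ** litter_size
            for x in range(litter_size + 1)]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # about 2.12 (the slide's rounded terms give 2.117)
```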
Poisson Distributions
Important in describing random occurrences, these
occurrences being either objects in space or events in
time.

P(X) = e^(-μ) μ^X/X!

• When n is large and p is very small, the Poisson
distribution approaches the binomial distribution.
• Interesting property: σ² = μ
Poisson Distributions
P(X) = e^(-μ) μ^X/X!
• e.g. The data are the number of sparrow nests in areas
of a given size (8,000 m²). A total of 40
areas of the same size were surveyed. Xi is the
number of nests in an area; fi is the frequency of
areas with Xi nests; and P(Xi) is the probability of Xi
nests per area if the nests are distributed
randomly.
• Ho: the population of sparrow nests is distributed
randomly
Example 25.3 (Zar 1999)
• Ho: the population of sparrow nests is distributed
randomly
         O                                  E fi
Xi       fi        fiXi       P(Xi)      [P(Xi)](n)    (O-E)²/E
0         9          0        0.33287     13.3148      1.398280
1        22         22        0.36616     14.6463      3.692154
2         6         12        0.20139      8.0555      0.524488
3         2          6        0.07384      2.9537      0.307921
4         1          4        0.02031      0.8123      0.043392
>=5       0
sum      40         44                  Chi-square = 5.966234
mean = 44/40 = 1.1
df = k - 2 = 3
Chi-square (0.05, 3) = 7.815
Accept Ho

P(0) = e^(-1.1) = 0.332871
P(1) = (0.332871)(1.1)/1 = 0.366158          P(X) = e^(-μ) μ^X/X!
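The whole Poisson fit can be sketched in Python (mean estimated from the data, as on the slide):

```python
# Poisson fit of the sparrow-nest counts (Zar example 25.3).
from math import exp, factorial

xs = [0, 1, 2, 3, 4]
fs = [9, 22, 6, 2, 1]
n = sum(fs)                                   # 40 areas
mu = sum(x * f for x, f in zip(xs, fs)) / n   # 44/40 = 1.1 nests per area

def poisson_p(x, mu):
    # P(X) = e^(-mu) mu^X / X!
    return exp(-mu) * mu ** x / factorial(x)

expected = [n * poisson_p(x, mu) for x in xs]
chi_square = sum((o - e) ** 2 / e for o, e in zip(fs, expected))
print(chi_square)  # about 5.966
```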
For further reading on Binomial and Poisson
distributions: Zar’s chapters 24 and 25

```