# Poisson

```
Ch 12, Several Proportions
   Skip 12.1, 12.2
   12.3 deals with constructing 2x2 tables such as Table 12.9, p 556
   We will assume the tables have been constructed

SM219 Ch12            1
   There are several ways to view these tables
   One is that there are two categorical
variables for each subject
   In 12.72, p 576, this could be (1) diabetic?
(2) education?
   We then want to test if the two
categorizations are independent

   If Education is independent of Diabetic, then the conditional probability of each level of Education is the same for Diabetic and non-Diabetic subjects
   For the researcher, this would mean that the distribution across Education levels does not depend on whether or not the subject is Diabetic

   In a problem like 12.74, this is harder to visualize
   Each variable has >2 levels
   We could think of each of the levels of illness and imagine
that there are probabilities for each type of response
   Then the question becomes, “Are the probabilities of the
different responses the same for each level of illness?”
   This is nearly the same as the way we posed the question
before.
   The analysis is the same

   Consider 12.74
   H0 says that the probability of “Def not” is the same for Normal, Mild and Severe
   We don’t know what this probability is
   We could estimate it by the proportion of our whole sample who said “Def not”, regardless of Illness
   If there are N subjects and R1 said “Def not”, then we would estimate the probability by R1/N

   Similarly, P(“Don’t think so”) would be estimated by R2/N, etc
   There are slightly different numbers of subjects in each Illness category
   To account for this, we find the expected count in each cell
   For Normal, Def not, this would be (R1/N) * C1, where C1 is the total of col 1
   For Mild, Def not, it is (R1/N) * C2, etc


   For the (i,j) entry, the expected count is Ri*Cj/N
   We then have 12 expected values to compare to our 12 observed values
   Use Chi^2 = sum over all cells of (obs-exp)^2/exp
   If this is small, then all the observed counts are close to what was expected (when H0 is true)
   If this is not small, then at least one observed count is different from expected

   How large? How small?
   To compute p-values, we use the Chi-square distribution
   The only parameter is “degrees of freedom”
   (Yes, it is related to df for t.)

   Use the Chi^2 distribution to determine the probability of a statistic as large as ours (or larger)
   This is the p-value for the test
   The test asks either whether the distributions are the same for each level of one category
   Or whether the two classifications are independent
   For these problems, df = (# rows - 1) * (# cols - 1)

   Recall the binomial model
   We can compute the p-value for a difference in proportions using the normal distribution
   Or use Chi^2
   For a two-sided problem, we get the same p-value

Goodness of Fit
   Previously, we found the expected cell counts using the idea that the distribution of each row (col) should be the same as the overall distribution
   That’s not the only way to find the expected counts

   IQ tests are designed so that the
scores are normally distributed with
mean=100 and SD=10
   We have gathered scores for 75
students and tabulated the scores

Range      # of Scores
<90          7
90-100      18
100-110     34
>110        16

   We can use the hypothesized normal distribution of IQ scores to find how many scores we would expect in each range
   We put the probabilities in the table

Range      # of Scores    Prob
<90          7            0.158
90-100      18            0.341
100-110     34            0.341
>110        16            0.158

   Now we can find the expected # of
scores by multiplying the probabilities
by 75

Range      # of Scores    Prob     Exp.
<90          7            0.158    11.9
90-100      18            0.341    25.6
100-110     34            0.341    25.6
>110        16            0.158    11.9

   We now have the observed counts (2nd col) and the expected counts (4th col) and can compute Chi^2
   The value of the statistic is 8.44
   FOR THIS PROBLEM, df = # cells - 1 = 4 - 1 = 3
   So the p-value is 0.037 and we would reject H0 that these scores come from a normal distribution with mean=100 and SD=10

   This is a rather particular problem where we knew not only that the distribution was normal, but also its mean and SD
   A more general problem is determining if the data might come from ANY normal distribution

   There are 2 modifications to the method
   (1) Estimate the mean and SD from the
data
   (2) DF=# cells -1 - # parameters estimated
   (In this example, we estimate 2 parameters
from the data.)

   Suppose our original data had
avg=103 and SD=9.5
   Our table of probabilities and
expected values becomes

Range      # of Scores    Prob     Exp.
<90          7            0.086     6.4
90-100      18            0.29     21.8
100-110     34            0.39     29.5
>110        16            0.23     17.3

   If we compute Chi^2 from this table, the value of the statistic is 1.5
   DF = 4-1-2 = 1
   The p-value is 0.22, so we would not reject H0. The data might come from “some” normal distribution.

Ch 12, Pooling
   If we find a small p-value, we might want to
know where the differences are
   Look for a pattern in the differences
between obs and exp
   Is one row somewhat consistently higher
than expected? Lower?

   We can then combine the counts for
rows or cols that have similar
patterns
   If we base our combinations on the
data, then the df do not change


Group    Fly    Nuke    Surface
I         15       8         15
II        11      18          8
III        8      22          7

df   = 4
LO   = 12.36708
HI   = 99
Ans  = 0.0148

   So we are quite sure that Service depends on Group
   But what are the differences?
   Consider the table of (obs-exp)/exp
   Note: no square


Group     Fly     Nuke     Surf
I         0.30    -0.51     0.47
II       -0.02     0.14    -0.19
III      -0.29     0.39    -0.29
   In this table, Groups II and III show the opposite pattern from Group I
   This suggests we combine II and III


Group     Fly    Nuke    Surf
I          15       8      15
II+III     19      40      15


df   = 4
LO   = 11.41149
HI   = 99
Ans  = 0.0223
   Note the df are unchanged
   Chi^2 is slightly lower
   The p-value is slightly higher
   The p-value is still small
   Conclude that Group I is different from the other Groups

   Look at (obs-exp)/exp for this table


Group      Fly     Nuke     Surf
I          0.30    -0.51     0.47
II+III    -0.15     0.26    -0.24

   Note that Fly and Surface are similar
   Combine these


Group     Fly, Surf    Nuke
I             30          8
II+III        34         40


df   = 4
LO   = 11.16548
HI   = 99
Ans  = 0.0248

   As before, Chi^2 is slightly smaller
   The p-value is slightly larger
   It is still small
   Conclude that the difference is really in the Group I, Nuke cell

```
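The calculations in the slides can be checked with a short script. The following is a minimal sketch, not the course's own tool: the function names (`expected_counts`, `chi_square`, `chi_square_p_value`, `normal_cdf`, `bin_probs`) are illustrative assumptions, and the p-value uses the closed-form upper tail of the Chi-square distribution for integer df so that only the Python standard library is needed. It recomputes the 3x3 Group by Service test and the IQ goodness-of-fit test with mean 100 and SD 10.

```python
import math


def expected_counts(table):
    """Expected count for cell (i, j) under H0: R_i * C_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def flatten(table):
    """Turn a list of rows into one flat list of cells."""
    return [value for row in table for value in row]


def chi_square(observed, expected):
    """Sum of (obs - exp)^2 / exp over two flat lists of equal length."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(x, df):
    """Upper-tail probability P(X >= x) for Chi-square with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-style sum.
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: normal tail plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    extra = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-half) * extra


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))


def bin_probs(cuts, mean, sd):
    """Probabilities of the ranges (-inf, c1), (c1, c2), ..., (ck, +inf)."""
    cdf = [0.0] + [normal_cdf(c, mean, sd) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Group (rows I, II, III) by Service (cols Fly, Nuke, Surface), from the slides.
service = [[15, 8, 15], [11, 18, 8], [8, 22, 7]]
stat_service = chi_square(flatten(service), flatten(expected_counts(service)))
df_service = (len(service) - 1) * (len(service[0]) - 1)
p_service = chi_square_p_value(stat_service, df_service)

# IQ goodness of fit: 75 scores in the ranges <90, 90-100, 100-110, >110.
iq_counts = [7, 18, 34, 16]
exp_iq = [sum(iq_counts) * p for p in bin_probs([90, 100, 110], 100, 10)]
stat_iq = chi_square(iq_counts, exp_iq)
p_iq = chi_square_p_value(stat_iq, len(iq_counts) - 1)

print(f"Service: Chi^2 = {stat_service:.5f}, df = {df_service}, p = {p_service:.4f}")
print(f"IQ fit:  Chi^2 = {stat_iq:.2f}, df = {len(iq_counts) - 1}, p = {p_iq:.3f}")
```

When parameters are estimated from the data, as in the mean 103 and SD 9.5 version of the IQ example, the same functions apply with `bin_probs([90, 100, 110], 103, 9.5)` and df reduced by the number of estimated parameters.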