# Lec.18 Chi Square.ppt


```
Chi-Square Test of Independence:

TARIQ MAHMOOD
In this regard, it is interesting to note
that, since the formula of chi-square in this
particular situation is very similar to the
formula that we have just discussed, the
chi-square test of independence can also be
regarded as a kind of chi-square test of
goodness of fit.
We illustrate this concept with the help
of an example:
EXAMPLE
A random sample of 250 men and 250
women were polled as to their desire
concerning the ownership of personal
computers.
The following data resulted:
                 Men    Women   Total
Want PC          120     80      200
Don't Want PC    130    170      300
Total            250    250      500
Test the hypothesis that desire to own a
personal computer is independent of sex at
the 0.05 level of significance.
SOLUTION

i)   H0 : The two variables of classification
(i.e. gender and desire for PC) are
independent, and
H1 : The two variables of classification
are not independent.
ii)  The significance level is set at $\alpha = 0.05$.

iii) The test-statistic to be used is

$\chi^2 = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$
This statistic, if H0 is true, has an
approximate chi-square distribution with
(r - 1) (c - 1) = (2 - 1) (2 - 1) = 1
degrees of freedom.
iv) Computations:
In order to determine the value of $\chi^2$,
we carry out the following computations:
The first step is to compute the
expected frequencies.
The expected frequency of any cell is
obtained by multiplying the marginal total to
the right of that cell by the marginal total
directly below that cell, and dividing this
product by the grand total.
200 250  
In this example,   e11                     100 ,
500
200250 
e12               100 ,
500
300250 
e21                    150 ,
500
and
300 250 
e22                    150 .
500
Hence, we have:
Expected Frequencies:

                 Men    Women   Total
Want PC          100    100      200
Don't Want PC    150    150      300
Total            250    250      500
Next, we construct the columns of
$o_{ij} - e_{ij}$, $(o_{ij} - e_{ij})^2$ and
$(o_{ij} - e_{ij})^2 / e_{ij}$, as shown below:

Observed    Expected
Frequency   Frequency
$o_{ij}$    $e_{ij}$    $o_{ij} - e_{ij}$   $(o_{ij} - e_{ij})^2$   $(o_{ij} - e_{ij})^2 / e_{ij}$

120         100          20                  400                     4.00
130         150         -20                  400                     2.67
 80         100         -20                  400                     4.00
170         150          20                  400                     2.67

                                                    $\chi^2 =$      13.33
Hence, the computed value of our
test-statistic comes out to be

$\chi^2 = 13.33$
v)   Critical Region:
$\chi^2 \geq \chi^2_{0.05}(1) = 3.84$
vi) Conclusion:
Since 13.33 is bigger than 3.84, we
reject H0 and conclude that desire to
own a personal computer and sex are
associated.
Now that we have concluded that
gender and desire for PC are associated, the
natural question is, “Which gender is it
where the proportion of persons wanting a
PC is higher?”
We have:
                 Men    Women   Total
Want PC          120     80      200
Don't Want PC    130    170      300
Total            250    250      500
A close look at the given data indicates
clearly that the proportion of persons who
are desirous of owning a personal computer
is higher among men than among women.
And, since our test statistic has come
out to be significant, we can say
that the proportion of men wanting a PC is
significantly higher than the proportion of
women wanting to own a PC.
Let us consider another example:
EXAMPLE

A national survey was conducted in a
country to obtain information regarding the
smoking patterns of adult males by
marital status.
A random sample of 1772 citizens, 18
years old and over, yielded the following
data :
                      SMOKING PATTERN
Marital       Total         Only        Regular
Status        Abstinence    at times    Smoker     Total
Single            67          213          74        354
Married          411          633         129       1173
Widowed           85           51           7        143
Divorced          27           60          15        102
Total            590          957         225       1772
Use this data to decide whether there is
an association between marital status and
smoking patterns.
The students are encouraged to work
on this problem on their own, and to decide
for themselves whether to accept or reject
the null hypothesis.
(In this problem, the null and the
alternative hypotheses will be:
H0 : Marital status and smoking patterns
are statistically independent.
HA : Marital status and smoking
patterns are not statistically independent.)
In addition, in today's lecture, you
learnt that the statistic

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$

follows the chi-square distribution having
(r-1)(c-1) degrees of freedom.
Let us try to understand this point:
Consider the 2 x 2 contingency table ---
similar to the one that we had in the example
regarding the desire for ownership of a
personal computer.
In this regard, suppose that we have two
variables of classification, A and B, and the
situation is as follows:
A1    A2    Total
B1                  200
B2                  300
Total   250   250    500
The point is that, given the marginal
totals and the grand total, if we choose the
frequency of the first cell of the first row
freely, we are not free to choose the
frequency of the second cell of the first row.
Also, given the frequency of the above-
mentioned first cell, we are not even free to
choose the frequency of the second cell of
the first column.
Not only this, it is interesting to note
that, given the above, we are not even free to
choose the frequency of the second cell of
the second row or the second column !
Hence, given the marginal and grand
totals, we have only one degree of freedom
(i.e. 1 = 1 x 1 = (2-1)(2-1) degrees of
freedom).
A similar situation holds in the case of a
2 x 3 contingency table. The students are
encouraged to work on this point on their
own, and to realize for themselves that, in the
case of a 2 x 3 contingency table, there exist
(2 - 1) ( 3 - 1) = 2 degrees of freedom .
Next, let us consider the concept of p-
value:
You will recall that, with reference to
the concept of hypothesis-testing, we
compared the computed value of our test
statistic with a critical value.
For example, in case of a right-tailed
test, we rejected the null hypothesis if our
computed value exceeded the critical value,
and we accepted the null hypothesis if our
computed value turned out to be smaller than
the critical value.
A hypothesis can also be tested by
means of what is known as the p-value:
p-Value:

The probability of observing a sample
value as extreme as, or more extreme than,
the value observed, given that the null
hypothesis is true.
We illustrate this concept with the help
of the example concerning the hourly wages
of computer analysts and registered nurses
that we discussed in an earlier lecture:
The students will recall that the
example was as follows:
EXAMPLE

A survey conducted by a market-
research organization five years ago showed
that the estimated hourly wage for temporary
computer analysts was essentially the same
as the hourly wage for registered nurses.
This year, a random sample of 32
temporary computer analysts from across the
country is taken.
The analysts are contacted by telephone
and asked what rates they are currently able
to obtain in the market-place.
A similar random sample of 34
registered nurses is taken.
The resulting wage figures are listed in
the following table:
Computer Analysts             Registered Nurses
$24.10   $25.00   $24.25      $20.75   $23.30   $22.75
 23.75    22.70    21.75       23.80    24.00    23.00
 24.25    21.30    22.00       22.00    21.75    21.25
 22.00    22.55    18.00       21.85    21.50    20.00
 23.50    23.25    23.50       24.16    20.40    21.75
 22.80    22.10    22.70       21.10    23.25    20.50
 24.00    24.25    21.50       23.75    19.50    22.60
 23.85    23.50    23.80       22.50    21.75    21.70
 24.20    22.75    25.60       25.00    20.80    20.75
 22.90    23.80    24.10       22.70    20.25    22.50
 23.20                         23.25    22.45
 23.55                         21.90    19.10
Conduct a hypothesis test at the 2%
level of significance to determine whether
the hourly wages of the computer analysts
are still the same as those of registered
nurses.
In order to carry out this test, the Null
and Alternative Hypotheses were set up as
follows:
Null and Alternative Hypotheses:

H0 : $\mu_1 - \mu_2 = 0$
HA : $\mu_1 - \mu_2 \neq 0$
(Two-tailed test)
The computed value of our test
statistic came out to be 3.43, whereas, at the
2% level of significance, the critical value
was 2.33; hence, we rejected H0.
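[Figure: standard normal curve centred at Z = 0
($\mu_1 - \mu_2 = 0$), with critical values
Z.01 = -2.33 and Z.01 = +2.33; the calculated
Z = 3.43 corresponds to $\bar{X}_1 - \bar{X}_2 = 1.15$.]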
Hence, we concluded that there was a
significant difference between the average
hourly wage of a temporary computer
analyst and the average hourly wage of a
registered nurse.
This conclusion could also have been
reached by using the p-value method:
I. Looking up the probability of Z > 3.43 in
the area table of the standard normal
distribution yields an area of .5000 – .4996 =
.0004.
II. To compute the p-value, we need to be
concerned with the region less than –3.43 as
well as the region greater than 3.43 (because
the rejection region is in both tails).
[Figure: standard normal curve on the scale of z,
with rejection regions of area $\alpha/2 = .05/2 = .025$
beyond -1.96 and 1.96, and tail areas of 0.0004
beyond -3.43 and 3.43.]

The p-value is
0.0004 + 0.0004 = 0.0008 .
Since this value is very small, it means
that the result that we have obtained in this
example is highly improbable if, in fact, the
null hypothesis is true.
Hence, with such a small p-value, we
decide to reject the null hypothesis.
The above example shows that
the p-value is a property of the data,
and it indicates "how improbable" the
obtained result really is.
A simple rule is that
if our p-value is less than the level of
significance $\alpha$, then we should reject H0,
whereas
if our p-value is greater than the level of
significance $\alpha$, then we should accept H0.
(In the above example,
$\alpha = 0.02$ whereas the p-value is equal to
0.0008, hence we reject H0.)

```
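
The computations of the PC-ownership example, and the two-tailed p-value of the wage example, can be sketched in Python as follows. This is only an illustrative sketch, not part of the lecture: the function names (`expected_frequencies`, `chi_square_statistic`, `degrees_of_freedom`, `two_tailed_p_value`) are hypothetical, and the critical value 3.84 is copied from the lecture rather than computed. The p-value is obtained from the standard normal distribution via `math.erfc`, so it should come out somewhat smaller (roughly 0.0006) than the table-based 0.0008 quoted above.

```python
import math


def expected_frequencies(observed):
    """Expected cell counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (o - e)^2 / e over all cells of the contingency table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def degrees_of_freedom(observed):
    """(r - 1)(c - 1) for an r x c contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, using the error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Observed frequencies from the PC example (rows: Want PC, Don't Want PC;
# columns: Men, Women).
pc_table = [[120, 80], [130, 170]]

chi2 = chi_square_statistic(pc_table)
df = degrees_of_freedom(pc_table)

# Critical value for alpha = 0.05 with 1 degree of freedom, as quoted
# in the lecture (an assumption taken from the text, not computed here).
CRITICAL_VALUE = 3.84
reject = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject}")

# Wage example: calculated Z = 3.43, significance level alpha = 0.02.
p_value = two_tailed_p_value(3.43)
print(f"two-tailed p-value = {p_value:.4f}, reject H0: {p_value < 0.02}")
```

The same functions accept any r x c table given as a list of rows, so the marital-status exercise can be checked by passing its four rows of observed frequencies to `chi_square_statistic`.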
 views: 34 posted: 1/9/2013 language: English pages: 59