QQ Plots for Normality - QQ Plots to Assess Normality

Q-Q Plots to Assess Normality

Suppose that a sample x1,…,xn is ordered from smallest to largest: x(1)≤x(2)≤…≤x(n). The
proportion of the sample that is at or below x(j) might be thought of as j/n, but, since ties
are possible, it is usually calculated as (j-½)/n. If we have a random variable Z with pdf f(z),
the jth quantile is a number q(j) such that P[Z≤q(j)]=(j-½)/n. If Z is standard normal, the
quantiles are found by solving the equation

$$\int_{-\infty}^{q_{(j)}} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz = \frac{j - \tfrac{1}{2}}{n}$$

for q(j). Of course, for a non-standard normal with mean μ and standard deviation σ, things are
slightly more complex, since the quantile must be transformed to μ+σq(j). If the sample is drawn
from a normal distribution, then x(j) should approximately equal μ+σq(j). To see whether this is
true, we draw a Q-Q diagram (see the SPSS drop-down list for Descriptive Statistics) with points
(x(1), q(1)), …, (x(n), q(n)); these should lie close to a straight line. In the graph below, the
Q-Q plot for Fortune 500 “sales” is shown. Notice that sales tend to fall below the line at the
extremes and above the line in the middle, suggesting that the data follow a curve, not a line.
Hence, sales are not very “normal.”
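
The quantile calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normal_qq_points` and the data values are hypothetical (they are not the Fortune 500 sales figures), and μ and σ are replaced by the sample mean and sample standard deviation. The standard library's `statistics.NormalDist.inv_cdf` is used to solve P[Z≤q(j)]=(j-½)/n for q(j).

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (x_(j), mu_hat + sigma_hat * q_(j)) pairs for a normal Q-Q plot."""
    xs = sorted(sample)                       # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    # Assumption: mu and sigma are estimated by the sample mean and std. deviation.
    mu_hat, sigma_hat = mean(xs), stdev(xs)
    std_normal = NormalDist()                 # standard normal: mean 0, sd 1
    points = []
    for j, x in enumerate(xs, start=1):
        p = (j - 0.5) / n                     # probability level (j - 1/2)/n
        q = std_normal.inv_cdf(p)             # solves P[Z <= q] = p for q
        points.append((x, mu_hat + sigma_hat * q))
    return points


# Hypothetical data for illustration only.
data = [4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.1, 4.4]
for observed, expected in normal_qq_points(data):
    print(f"{observed:6.2f}  {expected:6.2f}")
```

Plotting the first column against the second gives the Q-Q diagram. For normal data the points should fall near the 45-degree line, because the expected values have already been rescaled to μ+σq(j).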


[Figure: Normal Q-Q Plot of sales. Horizontal axis: Observed Value (0 to 140,000); vertical axis: Expected Normal Value (0 to 120,000).]

        This procedure can be generalized to multiple dimensions as follows. For each observation
i, compute the squared Mahalanobis distance $d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$, where
$\bar{x}$ is the sample mean vector and $S$ is the sample covariance matrix. These distances are on
the diagonal of the matrix $(X - 1\bar{x}') S^{-1} (X - 1\bar{x}')'$. If X is multivariate normal
with p variables, then the $d_i^2$ should be approximately $\chi^2_p$-distributed. Rank order the
squared Mahalanobis distances and compare them to the quantiles of the $\chi^2_p$ distribution;
there should be linearity with slope 1. SPSS allows you to draw a Q-Q plot for $\chi^2_p$.
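
As a rough sketch of the multivariate check, the Python below handles only the special case p = 2. That restriction is an assumption made here so the example needs nothing beyond the standard library: the 2×2 covariance matrix can be inverted by hand, and the chi-square distribution with 2 degrees of freedom has the closed-form quantile -2 ln(1 - prob). The function name `chi_square_qq_points` and the data are hypothetical.

```python
import math


def chi_square_qq_points(rows):
    """Pair ordered squared Mahalanobis distances with chi-square(2) quantiles.

    Assumes exactly p = 2 variables per observation (an illustrative restriction).
    """
    n = len(rows)
    mean1 = sum(r[0] for r in rows) / n
    mean2 = sum(r[1] for r in rows) / n
    # Entries of the sample covariance matrix S (divisor n - 1).
    s11 = sum((r[0] - mean1) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - mean2) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - mean1) * (r[1] - mean2) for r in rows) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar), expanded with the 2x2 inverse.
    d2 = []
    for r in rows:
        a, b = r[0] - mean1, r[1] - mean2
        d2.append((s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det)
    d2.sort()                                  # rank order the distances
    # Chi-square(2) quantile at level (j - 1/2)/n: solve 1 - exp(-x/2) = prob.
    quantiles = [-2.0 * math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(d2, quantiles))


# Hypothetical bivariate data for illustration only.
rows = [(2.0, 5.1), (3.5, 4.0), (1.2, 6.3), (4.8, 5.5), (3.0, 7.2), (2.6, 3.8)]
for distance, quantile in chi_square_qq_points(rows):
    print(f"{distance:7.3f}  {quantile:7.3f}")
```

For general p, one would normally rely on a numerical library for the matrix inverse and the chi-square quantile function. The structure stays the same: compute the distances, sort them, and plot them against the quantiles.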

Document info: Lingjuan Ma