# Chapter 1 - PowerPoint Template

```
STAT312: Applied Regression Methods

 http://www.mysmu.edu/faculty/zlyang/
Course Contents

1   Linear Regression

2   Transformed Linear Regression

3   Logistic Regression

4   Multinomial Logistic Regression

5   Loglinear Model

STAT312, Term II, 10/11                 2                       Zhenlin Yang, SMU
Chapter 1: Introduction
Contents
• History of regression analysis
• Basic concepts
• Applications - Examples
• Sampling models
• Sampling distributions
• Statistical inference
• Large sample inference
• Matrix algebra
History of Regression Analysis

The earliest form of regression was the method of least
squares, which was published by Legendre in 1805, and
by Gauss in 1809. The term “least squares” is from
Legendre’s term, moindres carrés. However, Gauss
claimed that he had known the method since 1795.

Legendre and Gauss both applied the method to the
problem of determining, from astronomical observations,
the orbits of bodies about the Sun. Euler had worked on
the same problem (1748) without success. Gauss
published a further development of the theory of least
squares in 1821, including a version of the Gauss–Markov
theorem.


The term "regression" was coined in the nineteenth
century to describe a biological phenomenon, namely
that the progeny of exceptional individuals tend on
average to be less exceptional than their parents and
more like their more distant ancestors.

Francis Galton, a cousin of Charles Darwin, studied this
phenomenon and applied the slightly misleading term
"regression towards mediocrity" to it. For Galton,
regression had only this biological meaning, but his work
was later extended by Udny Yule and Karl Pearson to a
more general statistical context. Nowadays the term
"regression" is often synonymous with "least squares
curve fitting".

Some Basic Concepts

A variable represents some characteristic of a social
phenomenon.

A continuous variable is a variable that assumes values in
an interval.

A discrete variable is a variable that assumes distinct
values.

A random variable is a variable that is used to represent
possible outcomes of some random phenomenon, e.g., results
of tossing a coin, results of rolling a die, etc.

Response/dependent variable: variable of interest in the
study.


Explanatory/independent variable: variable used to
“explain” or “study” the variable of interest.

Regression Analysis: a method for investigating functional
relationships among variables.

Categorical variables having ordered scales are called
ordinal variables.

Categorical variables having unordered scales are called
nominal variables.

Categorical variables are often referred to as qualitative
variables, and numerical-valued variables are called
quantitative variables.

Applications - Examples
Example 1.1. Where to locate a new motor inn?
• La Quinta Motor Inns is planning an expansion.
• Management wishes to predict which sites are likely to be profitable.
• Predictors of profitability can be identified in several areas:
  - Competition
  - Market awareness
  - Demand generators
  - Demographics
  - Physical quality

Profitability is measured by Operating Margin. Predictors of profitability, by area:
• Competition: Rooms = number of hotel/motel rooms within 3 miles of the site.
• Market awareness: Nearest = distance to the nearest La Quinta inn.
• Customers: Office space; College enrollment.
• Community: Income = median household income.
• Physical: Disttwn = distance to downtown.
Data: EX1-01.XLS, EX1-01.TXT
Example 1.2. The table below cross-classifies 1091 respondents to the
1991 General Social Survey by their gender and their belief in an
afterlife.

Table 1.1 Cross Classification of Belief in Afterlife by Gender
                 Belief in Afterlife
Gender        Yes      No or undecided
Females       435      147
Males         375      134

Purpose of the study: to determine whether an association exists between
gender and belief in an afterlife. Is one sex more likely than the other to
believe in an afterlife, or is belief in an afterlife independent of gender?
This is a typical 2×2 contingency table, where an analysis of association
between two factors (variables) is of interest.

Example 1.3. The data given below come from a randomized, double-blind
clinical trial investigating a new treatment for rheumatoid arthritis.
Investigators compared the new treatment with a placebo. The response
measured was whether there was no, some, or marked improvement in
the symptoms of rheumatoid arthritis.

Table 1.2 Rheumatoid Arthritis Data
                               Improvement
Gender    Treatment     None   Some   Marked   Total
Female    Test Drugs       6      5       16      27
Female    Placebo         19      7        6      32
          Total           25     12       22      59
Male      Test Drugs       7      2        5      14
Male      Placebo         10      0        1      11
          Total           17      2        6      25

This is a 2×2×3 contingency table, where interest lies in the association
between treatment and degree of improvement, adjusting for gender effect.

Example 1.4. (Education Expenditure) Per capita expenditure on
public education can be affected by (1) per capita personal income, (2)
number of residents per thousand under 18 years of age, (3) number of
people per thousand residing in urban areas, and (4) the geographical
region. The data were collected for each of the 50 U.S. states in
1960, 1970, and 1975.
The problems of interest can be:
• How is education expenditure related to the factors listed above?
• Does the geographical region make a difference to education
expenditure?
• Is the relationship between education expenditure and other variables
constant over time?

Example 1.5. (Probability of Bankruptcies) Detecting ailing financial
and business establishments is an important function of audit and
control. Systematic failure of audit and control can lead to grave
consequences, such as the savings-and-loan fiasco of the 1980s in the
United States and the current financial crises. The data P322.txt gives some
of the operating financial ratios of 33 firms that went bankrupt after 2
years and 33 that remained solvent during the same period:
X1 = Retained Earnings / Total Assets
X2 = Earnings Before Interest and Taxes / Total Assets
X3 = Sales / Total Assets
Y = 0 if bankrupt after two years, and 1 if solvent after two years.
Question: given a firm’s characteristics, what is the chance that this
firm remains solvent after two years?
Example 1.6. (Chemical Diabetes) To determine the treatment and
management of diabetes, it is necessary to determine whether the patient
has chemical diabetes or overt diabetes. The data P331.txt is from a
study to determine the nature of chemical diabetes. The measurements
were taken on 145 non-obese volunteers who were subject to the same
regimen. Many variables were measured, but only three are considered here:
insulin response (IR), the steady state of plasma glucose (SSPG), which
measures insulin resistance, and relative weight (RW). The diabetes
status of each subject was recorded. The clinical classification (CC)
categories were overt diabetes (1), chemical diabetes (2), and normal (3).

Question: given a subject’s characteristics, what is the chance that
he/she has overt diabetes, or has chemical diabetes, or is normal?

Example 1.7. (Political Ideology and Party Affiliation) The data
given below, from a General Social Survey, relate political ideology to
political party affiliation.

Table 1.3 Political Ideology and Party Affiliation Data
                                         Political Ideology
          Political     Very      Slightly              Slightly        Very
Gender    Party         Liberal   Liberal    Moderate   Conservative    Conservative
Female    Democratic       44        47         118          23              32
          Republican       18        28          86          39              48
Male      Democratic       36        34          53          18              23
          Republican       12        18          62          45              51

Question: Who is more conservative, Democrats or Republicans, males
or females?

Sampling Models
Normal Distribution: A random variable X is said to have a normal
distribution, denoted by X ~ N(μ, σ²), if its probability density function
(pdf) is of the form
$$ f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} $$
This is a continuous distribution with mean μ and variance σ².
[Figure: two panels of normal density curves. Extracted panel labels: "= 1, 1.5, 2" and "= 1.0"; the parameter symbols were lost in extraction.]

Binomial Distribution: A random variable X is said to have a binomial
distribution, denoted by X ~ Bin(n, π), if its probability function is:
$$ p(x) = \frac{n!}{x!\,(n-x)!}\, \pi^x (1-\pi)^{n-x}, \quad x = 0, 1, 2, \ldots, n $$
a discrete distribution with mean nπ and variance nπ(1-π).
[Figure: two panels of binomial probability functions p(x) against x, for n=20, p=0.5 and n=20, p=0.3, where p in the panel labels denotes the success probability π.]

Poisson Distribution: A random variable X is said to have a Poisson
distribution, denoted by X ~ Poi(λ), if its probability function is:
$$ p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots $$
a discrete distribution with mean λ, called the Poisson rate, and
variance also equal to λ.
[Figure: two panels of Poisson probability functions p(x) against x, for mean=6 and mean=10.]

Chi-squared Distribution: A random variable X is said to have a
Chi-squared distribution with ν degrees of freedom, denoted by χ²_ν,
if its pdf is of the form
$$ f(x) = \frac{x^{(\nu-2)/2} \exp(-x/2)}{2^{\nu/2}\, \Gamma(\nu/2)}, \quad x > 0 $$
a continuous distribution with mean ν and variance 2ν. Equivalently,
$$ \chi^2_\nu = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2 $$
where Z1, Z2, …, Z_ν are iid standard normal random variables.
It is a distribution used in statistical inference.
[Figure: chi-squared density curves for ν = 3, 6, 12, 20.]

Student’s t distribution: A continuous r.v. T is said to
have a Student’s t distribution with v df, denoted by T ~ t_v,
if its pdf has the form:
$$ f(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}} \left( 1 + \frac{t^2}{v} \right)^{-(v+1)/2}, \quad -\infty < t < \infty,\ v > 0 $$
• The t distribution is symmetric around zero.
• If Z ~ N(0, 1) and Y ~ χ²(v), and if Z and Y are independent, then
$$ T = \frac{Z}{\sqrt{Y/v}} \sim t_v $$
• It approaches N(0, 1) as v → ∞.

F Distribution: If X1 ~ χ²(v1) and X2 ~ χ²(v2) are independent,
then the r.v.
$$ Y = \frac{X_1 / v_1}{X_2 / v_2} $$
follows an F distribution, with v1 df in the numerator and v2 df in the
denominator.

• This distributional result is often used to construct F tests in
linear regression models.

Sampling Distributions

The Central Limit Theorem
If a random sample X1, …, Xn is drawn from any
population with finite variance, the sampling distribution
of the sample mean $\bar{x}$ is approximately normal for a
sufficiently large sample size.
The larger the sample size, the more closely
the sampling distribution of $\bar{x}$ will resemble a
normal distribution.

In more detail:
Let X represent a population, and X1, …, Xn be a
random sample drawn from this population. Then,
1. $\mu_{\bar{X}} = \mu_X$
2. $\sigma^2_{\bar{X}} = \sigma^2_X / n$
3. If X is normal, $\bar{X}$ is normal. If X is nonnormal,
$\bar{X}$ is approximately normally distributed for
sufficiently large sample size (n ≥ 30).

Sampling Distribution of a Sample Proportion
Let $\hat{p}$ be the proportion of “successes” in a sequence of
n Bernoulli trials.

• From the laws of expected value and variance, it can be
shown that E($\hat{p}$) = p and Var($\hat{p}$) = p(1-p)/n.
• If both np ≥ 5 and n(1-p) ≥ 5, then
$$ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} $$
is approximately standard normally distributed.
Sampling Distribution of the Difference Between
Two Sample Means
The distribution of $\bar{x}_1 - \bar{x}_2$ is normal if
• The two samples are independent, and
• The parent populations are normally distributed.

If the two populations are not both normally
distributed, but the sample sizes are 30 or
more, the distribution of $\bar{x}_1 - \bar{x}_2$ is
approximately normal.

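• Applying the laws of expected value and variance, we have:
$$ E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 $$
$$ \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} $$
• We can define:
$$ Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$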

Sampling Distribution of the Difference Between
Two Sample Proportions
• From the laws of expected value and variance, it can be shown that
$$ E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2 $$
$$ \mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} $$
• If n1p1 ≥ 5, n1(1-p1) ≥ 5, n2p2 ≥ 5, and n2(1-p2) ≥ 5, then
$$ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} $$
is approximately standard normally distributed.

Sampling Distribution of Sample Variance
The statistic (n-1)s²/σ² has a Chi-squared
distribution with df = n-1, if the population
is normally distributed:
$$ \chi^2 = \frac{(n-1)s^2}{\sigma^2}, \quad d.f. = n-1 $$
[Figure: chi-squared density curves for d.f. = 5 and d.f. = 10.]

Sampling Distribution of the Ratio of Two
Sample Variances
Define the statistic:
$$ F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} $$
where the two samples are independent and are drawn from two normal populations.

The sampling distribution of this statistic
is an F distribution with df v1 = n1-1 for the numerator
and df v2 = n2-1 for the denominator.
Statistical Inference
• Any numerical feature of a population, such as the mean or the
variance, is called a parameter.
• Statistical inference deals with drawing
generalizations about population parameters from an
analysis of information contained in the sample data.
Studying the whole population is usually impractical;
that is why we study a part of it. Inferences include:
  - Point estimation: obtain a guess or an estimate for
the unknown true parameter value.
  - Interval estimation: obtain an interval of plausible
values for the parameter, and determine the accuracy of
the procedure.
  - Testing hypotheses: decide whether the value of the
parameter is equal to some pre-assumed value.

• Point Estimation
Let f(x; θ) be the pdf with parameter (vector) θ. Let X1, X2, …, Xn be a
random sample drawn from f(x; θ).
Point estimation of θ is to find a statistic such that its value computed
from the sample data reflects the value of θ as closely as possible.
Such a statistic is called an estimator of θ, and a specific value of the
estimator computed from sample data is called an estimate of θ.
Maximum Likelihood Estimation Method:
The joint pdf of X1, X2, …, Xn, when regarded as a function of θ, is
called the likelihood function of θ:
$$ L(\theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta) $$
The value of θ, denoted by $\hat{\theta}$, that maximizes L(θ) is called the
maximum likelihood estimator, or the MLE.
Example 1.8. Bernoulli sampling: $f(x; p) = p^x (1-p)^{1-x}$, x = 0, 1.
$$ L(p) = p^{\sum x_i} (1-p)^{n - \sum x_i}, \quad 0 < p < 1 \quad \Rightarrow \quad \hat{p} = \sum x_i / n $$

Example 1.9. Normal sampling: Xi ~ N(μ, σ²).
$$ L(\mu, \sigma^2) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left\{ -\sum_{i=1}^n \frac{(x_i - \mu)^2}{2\sigma^2} \right\} $$
$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X}, \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 $$
Other methods: least squares, method of moments, Bayesian estimation, etc.
Properties: unbiasedness, relative efficiency, etc.
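• Confidence Interval
Let L(X) and U(X) be functions of X = (X1, X2, …, Xn) such that
$$ P[L(X) < \theta < U(X)] = 1 - \alpha $$
Then the interval (L(X), U(X)) is called a 100(1-α)% confidence
interval (CI) for θ, L(X) and U(X) are the lower and the upper confidence
limits, and (1-α) is the confidence coefficient associated with the interval.

It is an approximate CI if the above equality holds only approximately.

Bernoulli sampling: an approximate CI for p:
$$ \hat{p} \pm Z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n} $$

Normal sampling: an exact CI for μ:
$$ \hat{\mu} \pm t_{\alpha/2}\, \hat{\sigma} / \sqrt{n-1} $$
where $t_{\alpha/2}$ is the upper α/2 point of the t distribution with n-1 df,
and $\hat{\mu}$, $\hat{\sigma}$ are the MLEs of Example 1.9 (so that $\hat{\sigma}/\sqrt{n-1} = s/\sqrt{n}$).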

• Testing Statistical Hypotheses

Null Hypothesis (H0): A theory about the values of population
parameter(s), representing the status quo, accepted until proven
false.
Alternative Hypothesis (Ha): A theory that contradicts H0, which is
favored when sufficient evidence exists.
Test Statistic: A sample statistic used to decide whether to reject H0,
which is a measure of the difference between the data and what is
expected under the null hypothesis.
Rejection Region: The numerical values of the test statistic for which H0
is rejected. This region is chosen so that the probability is α that it
will contain the test statistic when H0 is true, thereby leading to a
wrong rejection (Type I error). α is also referred to as the level of
significance.


Conclusion: If the numerical value of the test statistic falls
in the rejection region, we reject H0 and conclude
that Ha is true. If the test statistic does not fall in the
rejection region, we reserve judgment about which
hypothesis is true. An incorrect acceptance of H0 (failing to
reject H0 when it is false) is a Type II error.
p-value: the probability (assuming H0 is true) of observing
a value of the test statistic that is at least as
contradictory to the null hypothesis as the one
computed from the data.
Power of the test: the probability of rejecting a false null
hypothesis.

Example 1.10. From past sales records, it is known that 30% of
the population buys Brand X toothpaste. A new advertising
campaign is completed, and to test its effectiveness, 1000
people are asked whether they buy Brand X toothpaste now. If
334 answer yes, does this indicate that the advertising campaign
has been successful?

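H0: p = 0.30, Ha: p > 0.30.
n = 1000, $\hat{p}$ = 0.334, Z_{0.05} = 1.65. Rejection region: Z > 1.65.
Test statistic:
$$ Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.334 - 0.3}{0.01449} = 2.35 $$
Since 2.35 > 1.65, H0 is rejected at the 5% level: the data indicate that
the proportion buying Brand X has increased.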
Large Sample Inference Methods
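When the exact sampling distribution of an estimator is unknown,
statistical inference can only be made approximately, based on the
large-sample properties of the estimator. Common methods include the
following, each holding approximately under H0:

Wald Statistic:
$$ \frac{\hat{\theta} - \theta_0}{\mathrm{ASE}(\hat{\theta})} \sim N(0, 1) $$
Score Statistic:
$$ \frac{S(\theta_0)}{\mathrm{ASE}[S(\theta_0)]} \sim N(0, 1), \quad \text{where } S(\theta_0) = \left. \frac{d \log L(\theta)}{d\theta} \right|_{\theta = \theta_0} $$
LR Statistic:
$$ \Lambda = \frac{\text{maximum likelihood when parameters satisfy } H_0}{\text{maximum likelihood when parameters are unrestricted}}, \quad -2 \log \Lambda \sim \chi^2_{df} $$
Here ASE denotes the asymptotic standard error.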
Matrix Algebra
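Vector (column vector a, row vector b):
$$ a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix} $$
Matrix:
$$ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} $$
Transpose:
$$ a' = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} $$
Matrix Multiplication:
$$ A = \begin{pmatrix} 10 & 1 \\ 3 & 6 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad \text{then } A b = \begin{pmatrix} 23 \\ 24 \end{pmatrix} $$
Matrix Inverse:
$$ A^{-1} = \begin{pmatrix} 0.10526316 & -0.01754386 \\ -0.05263158 & 0.17543860 \end{pmatrix} $$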

Computer Software: R

• ‘R’ is a computer package for statistical analysis that is
relatively simple to use.
• R is an open source software project and can be obtained from:
  http://info.smu.edu.sg/rsite/
  http://cran.r-project.org/
  http://cran.bic.nus.edu.sg/
  http://www.mysmu.edu/faculty/zlyang/
• Other popular software packages include: Excel, Minitab,
Matlab, SPSS, SAS, Gauss, S-Plus.

```
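
The arithmetic in Example 1.10 (334 of 1000 respondents buying Brand X, tested against p0 = 0.30) can be checked numerically. The block below is a minimal sketch in Python rather than R, using only the standard library. The function name `one_sample_proportion_z` is an illustrative assumption, not part of the course material, and the one-sided p-value is an addition that the slide itself does not report.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Large-sample z test of H0: p = p0 against Ha: p > p0.

    Hypothetical helper written for this illustration only.
    Returns the sample proportion, the standard error under H0,
    the z statistic, and the one-sided (upper-tail) p-value.
    """
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error evaluated at p0
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)      # P(Z >= z) under N(0, 1)
    return p_hat, se0, z, p_value


# Numbers taken from Example 1.10.
p_hat, se0, z, p_value = one_sample_proportion_z(334, 1000, 0.30)
reject_h0 = z > 1.65                       # rejection region Z > 1.65

print(f"p_hat = {p_hat:.3f}, SE = {se0:.5f}, Z = {z:.2f}")
print(f"one-sided p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The standard error is computed at p0 rather than at the sample proportion because the test statistic is evaluated under the null hypothesis, which matches the denominator 0.01449 used in the example.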
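
The 2×2 matrix example in the Matrix Algebra slide (A with rows (10, 1) and (3, 6), and b = (2, 3)') can be reproduced with a few lines of code. This is a sketch in plain Python under the assumption that a closed-form 2×2 inverse is enough for illustration. The helper names `matvec` and `inverse_2x2` are hypothetical and not from the course material, which uses R.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inverse_2x2(A):
    """Invert a 2x2 matrix via the adjugate formula A^{-1} = adj(A) / det(A)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]


# Values taken from the Matrix Algebra slide.
A = [[10, 1], [3, 6]]
b = [2, 3]

Ab = matvec(A, b)            # product A b
A_inv = inverse_2x2(A)       # inverse of A (det = 57)

print("A b    =", Ab)
print("A^{-1} =", [[round(x, 8) for x in row] for row in A_inv])
```

For matrices larger than 2×2 the adjugate formula is impractical, and a numerical library routine would be the usual choice.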