# Correlation and Multiple Regression


Correlation and Multiple Regression

Robert K. Toutkoushian
Associate Professor
Indiana University
Objectives of Module
• Review statistical procedures such as
correlation and multiple regression
analysis
• Examine ways in which these procedures
can be applied to institutional research
• Practice using SPSS to implement these
procedures
• Discuss more involved procedures and
applications
My Approach
• Aim for a “middle ground” in terms of
difficulty (higher UG/lower G level)
• Focus more on intuition behind procedures
rather than proofs & derivations
• Assume familiarity with descriptive stats
and hypothesis testing
• STRONGLY encourage questions at any
time!
Covariance and Correlation
Both measure the extent to which two
variables “move together.” They differ
only in units of measure:
• Positive covariance/correlation: Both
variables tend to move in the same
direction
• Negative covariance/correlation: Both
variables tend to move in the opposite
direction
Remember...
• When looking for correlations, you may
have to first reorder one of the variables
• If two variables are related, then knowing
the value of one variable may help with
guesses as to the value of the other (e.g.,
retention and SAT/ACT scores)
• “Correlation” does not imply “causation”!
Calculating the Covariance

• Calculate the means for X and Y (denoted “x-
bar” and “y-bar”)
• Subtract the mean for X from each X value
and repeat for Y
• Multiply the differences together for each
observation, then sum and divide by degrees
of freedom (n-1)
Covariance = -132,000/(4-1) = -44,000
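The steps above can be sketched in a few lines of Python (the module uses SPSS; the x and y values here are made-up illustration data, not the slide's):

```python
# Sample covariance, following the steps on the slide.
# x and y here are made-up illustration data.

def covariance(x, y):
    n = len(x)
    x_bar = sum(x) / n                    # mean of X
    y_bar = sum(y) / n                    # mean of Y
    # subtract each mean, multiply the paired differences,
    # then sum and divide by degrees of freedom (n - 1)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [2, 4, 6, 8]
y = [10, 8, 5, 3]
print(covariance(x, y))  # -8.0: the variables move in opposite directions
```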
Correlation Coefficient
Properties of correlation coefficient:
• A “standardized” measure of covariance
that ranges between -1 and +1
• Positive: 0 < r ≤ +1
• Negative: -1 ≤ r < 0
• No correlation: r = 0
• Cov(x,y) and r will have the same sign
• Stronger relationship as r moves away from
zero
Calculating Correlation
Coefficient
• Calculate cov(x,y) as before
• Calculate st. dev's for X and Y
• Divide cov(x,y) by product of standard
deviations
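That division can be sketched as follows (Python stands in for SPSS here; the data are made up, chosen to be perfectly linear so r comes out at -1):

```python
import math

# Correlation = covariance divided by the product of the standard deviations.

def correlation(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    cov = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - xb) ** 2 for a in x) / (n - 1))  # sample st. dev. of X
    sy = math.sqrt(sum((b - yb) ** 2 for b in y) / (n - 1))  # sample st. dev. of Y
    return cov / (sx * sy)

print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (up to rounding)
```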
Institutional Research
Applications
Correlations can be useful in IR when one
variable of interest is unobservable, and a
correlated variable is observable:
• College performance (correlated with HS
performance)
• Faculty experience (correlated with age, years
since degree)
• Teaching quality (correlated with student
evaluations)
Limitations
• Weak correlations are less useful for
making inferences
• Correlations vary across factors, so it is
difficult to compare across factors (e.g.,
stock prices and faculty salaries)
• May be multiple factors affecting a single
factor of interest
• Does not measure non-linear relationships
Class Example #1
Filename TUITION.SAV contains data on
average public tuition rates, state
appropriations, and median family income by
state in 1994. In SPSS:
• Calculate the means and standard deviations
for these three variables.
• Calculate the covariances and correlations
between state appropriations and (a) public
tuition rates, (b) median family income.
Linear Regression (“OLS”)
• Objective: find the best linear (“straight line”)
relationship between two or more variables.
• Ordinary Least Squares (OLS) is the technique
most often used to choose the best line.
• This linear relationship is based on the
covariance between two variables.
• Regression analysis requires the analyst to
specify the direction of causation.
Regression:
• Can predict/forecast one variable (Y)
based on values of another variable (X)
• Can perform hypothesis tests to determine
if X affects Y
• Can control for differences in Y due to X
• Very flexible with regard to functional form,
model specification, etc.
Example: Gender Equity in
Salaries
institution and determine if there is a gender equity problem.
Descriptive stats show that on average men earn more than
women.
• How can you control for salary differences due to justifiable
factors such as experience, productivity?
• How can you determine if the remaining pay difference is
large enough to conclude that this is a problem?
Ordinary Least Squares
Three Formulations
• Population model: Y = α + βX + ε
• Sample model: Y = a + bX + e
• Fitted line: Ŷ = a + bX
• Slope = β in population, b in sample
• Error term (ε or e) encompasses effects of all omitted factors
• Parameters in the population model are unobservable
• Sample line is what you estimate with OLS
Assumptions in Linear
Regression
• The error term has a mean of zero and
constant variance
• The errors are unrelated to each other
• The errors are unrelated to the
independent variable(s) in the model
• The error term is normally distributed
(needed for hypothesis testing)
Ordinary Least Squares
OLS specifies that the “best” line is the one that minimizes
the sum of squared errors (minimize Σei²)

Slope (b) = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
Intercept (a) = Ȳ - bX̄
Notes on OLS:

The slope formula is the covariance between
X and Y divided by the variance for X
The slope and covariance will always have
the same sign
• b > 0 indicates a positive relationship
• b < 0 indicates a negative relationship
• b = 0 indicates no linear relationship
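A minimal sketch of these slope and intercept formulas, using made-up data that lie exactly on the line y = 1 + 2x:

```python
# OLS for one X, per the formulas above:
# slope b = cov(x, y) / var(x); intercept a = y-bar - b * x-bar.

def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = (sum((a - xb) * (c - yb) for a, c in zip(x, y))
         / sum((a - xb) ** 2 for a in x))
    a = yb - b * xb   # the fitted line passes through (x-bar, y-bar)
    return a, b

a, b = ols([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```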
Example: An IR analyst is asked to help
forecast applications. She believes there
is a relationship between HS grads and
resident applications each year.
Regression line: Ŷ = -358.28 + 0.29X
• For each additional HS grad, predicted applications will rise by 0.29.
• The intercept may not have much meaning.
• Can predict applications given projections of HS grads:
Ŷ = -358.28 + 0.29(36,000) = 10,082
Goodness of Fit
• Measures the strength of the relationship between X
and Y
• R-squared (or coefficient of determination):
proportion of total deviation in Y that is “explained”
by X(s)
• R-squared is bounded between 0 and 1 (R2 = 1 if
perfect fit, R2 = 0 if no fit)
• R-squared = square of correlation coefficient (with
only one X variable in the model)
More on R-squared...
• When there is no covariance, the slope of the
regression line is zero and R2 = 0.
• Adding variables to the regression model will
almost always raise R2, but this does not mean
that the resulting model is “better”
• Adjusted R2 attempts to correct for this, but no
longer has the same interpretation
• R2 varies depending on the dependent variable.
Do not use this to compare regression models
with different Y's.
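R-squared as described above (one minus the error sum of squares over the total sum of squares) can be sketched as follows, with made-up data:

```python
# R-squared = 1 - ESS/TSS: the share of total deviation in Y explained by X.

def r_squared(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = (sum((a - xb) * (c - yb) for a, c in zip(x, y))
         / sum((a - xb) ** 2 for a in x))
    a0 = yb - b * xb
    tss = sum((yi - yb) ** 2 for yi in y)                         # total sum of squares
    ess = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))  # error sum of squares
    return 1 - ess / tss

print(r_squared([1, 2, 3], [2, 4, 6]))        # 1.0: perfect fit
print(r_squared([1, 2, 3, 4], [3, 5, 8, 9]))  # ~0.969
```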
Predicting Resident Applications

Note that HS grads account for 88.5%
of the total deviation in applications.
Class Example #2
Using TUITION.SAV, in SPSS:
• Calculate a regression line showing how
median income affects average tuition
• Calculate R2, TSS, RSS, ESS, and corr(x,y).
SPSS syntax:
• REGRESSION
•  /MISSING LISTWISE
•  /STATISTICS COEFF OUTS R ANOVA
•  /CRITERIA=PIN(.05) POUT(.10)
•  /NOORIGIN
•  /DEPENDENT tuition
•  /METHOD=ENTER income .
Equation: Tuition = 313.119 + 0.0719*Income
Hypothesis Testing for β
• In most situations in the social sciences, it is
rarely known for sure if X affects Y
• A hypothesis test can be used to determine
if the data provide sufficient evidence of a
relationship
• For most variables, the sample slope 'b' will
not exactly equal zero. How far from zero
must it be in order to safely conclude that β ≠ 0?
Steps in Hypothesis
Testing
• Specify null (H0) and alternative (HA)
hypotheses
• Identify test statistic and find critical
value(s) based on degrees of freedom and
significance level
• Calculate test statistic and compare to
critical value(s)
Common Hypotheses for β
• β = 0 (X has no effect on Y)
• β > 0 (X has a positive effect on Y)
• β < 0 (X has a negative effect on Y)
• β  0 (X has some effect on Y…+ or - )
Choose two hypotheses that are mutually
exclusive and exhaustive.
The null hypothesis (H0) should always
contain some form of equal sign.
Test Statistic for β
If ε ~ N(0, σ²), then b ~ N(β, Var(b))
Therefore the t-ratio = (b - β)/se(b), which follows a t
distribution with n-k degrees of freedom (k = # parameters to be
estimated). The t-ratio is defined as the random variable minus its
mean (when H0 is true), divided by its standard deviation.
Notes on Hypothesis
Testing
• The t-ratio simply counts the # standard
deviations the slope is from zero (“distance”)
• The greater the distance, the less likely you
would have found the value of b if β = 0.
• For significance tests, since β = 0, the t-ratio is
the slope divided by its standard deviation (or
“standard error”)
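A sketch of that significance t-ratio (slope over its standard error), with hypothetical data:

```python
import math

# t-ratio for H0: beta = 0, with se(b) = sqrt(s^2 / sum((x - x-bar)^2))
# and s^2 = ESS/(n - 2).

def t_ratio(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sxx
    a0 = yb - b * xb
    ess = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ess / (n - 2)              # two estimated parameters: a and b
    se_b = math.sqrt(s2 / sxx)      # standard error of the slope
    return b / se_b                 # "distance" of b from zero in std. errors

print(t_ratio([1, 2, 3, 4], [3, 5, 8, 9]))  # ~7.94
```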
Example: t-ratio of +2.40
This shows that there
is only a 1.3% chance
of finding a t-ratio of
2.40 or greater if in
fact β = 0. Therefore,
if you found a t-value
this high, it is
unlikely that β = 0.
Example: Do undergraduate enrollments have a significant effect on
average costs per student?
R2 = 0.025
TSS = 1.9E+10
ESS = 1.9E+10
se = √(ESS/826) = 4766
Null Hypothesis: β = 0; Alternative Hypothesis: β ≠ 0
For 826 df and a 1% significance level, reject the null when the
calculated t-ratio exceeds 2.575 in absolute value.
P-value = Probability of drawing a more extreme sample value
given that the null hypothesis is true:
P-value = Pr(b < -0.175) = Pr(t < -4.577) = 0.000
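With degrees of freedom as large as 826, the t distribution is essentially standard normal, so a p-value like the one above can be approximated with the error function (a sketch, not the exact t-distribution value SPSS reports):

```python
import math

# Two-sided p-value using the normal approximation,
# accurate for large df such as 826.

def two_sided_p(t):
    phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

print(two_sided_p(4.577))  # well below 0.001, reported as 0.000
print(two_sided_p(1.96))   # ~0.05, the familiar 5% cutoff
```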
Units of Measurement
The significance levels of any variable will
not be influenced by the units of measure
used for X or Y
• The coefficient represents the # units
change in Y due to a one-unit change in X
• When the units of measure change, both
the coefficients and standard errors
change proportionately (t-ratio remains the
same)
Out-of-Sample
Forecasts/Predictions
The regression model can be used to derive
predictions of Y given values of X(s)
• Point estimates are found by substituting X into
the equation and solving for Y (“I predict that the
grad rate will be 70%”)
• Interval estimates are predictions that Y will fall
within a certain interval (“I am 95% certain that the
grad rate will be between 68% and 72%”)
• Interval estimates are more conservative, and
convey the uncertainty in predictions.
Two Types of Intervals
C.I. For expected value (“mean”) of Y
• For given X, what is the predicted average value
of Y
C.I. For a single value of Y
• For given X, what is the predicted single value of Y
(more uncertainty, so wider interval)
The two methods yield very similar intervals. Most
IR applications use C.I.'s for a single value.
Intervals can be obtained in SPSS using the “save”
subcommand.
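The single-value interval can be sketched as follows (the critical value of 1.96 is a large-sample assumption; look up the exact t value for your degrees of freedom; data are made up):

```python
import math

# 95% interval for a single new Y at x_new. The "1 +" inside the square
# root is what makes this wider than the interval for the mean of Y.

def single_value_interval(x, y, x_new, t_crit=1.96):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sxx
    a0 = yb - b * xb
    s2 = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = a0 + b * x_new
    half = t_crit * math.sqrt(s2 * (1 + 1 / n + (x_new - xb) ** 2 / sxx))
    return y_hat - half, y_hat + half

# perfect-fit data, so the interval collapses to the point estimate
print(single_value_interval([1, 2, 3, 4], [3, 5, 7, 9], 2))  # (5.0, 5.0)
```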
Example: New Hampshire
An IR analyst is charged with developing a model to
help predict changes in HS grads in the state
through 2006.
File AIR1.SAV in SPSS has two vars: year and # HS grads in
the state.
• Estimate a regression model
• Form point and 95% CI estimates of high school
grads for the next ten years.
Under statistics > correlate > bivariate:
Note that r = +0.959, cov(x,y) = 701,160 (n=12)
Under statistics > regression > linear:
In 2006, the model predicts there will be 14,919 high
school grads. We can be 95% certain that in 2006 there
will be between 14,185 and 15,652 high school grads.
Multiple Regression
Analysis
In most IR applications, the dependent variable may
be influenced by multiple factors:
• Grad rate = f(avg. SAT, gender composition, avg.
HS rank, % students on campus,...)
• Faculty Salary = f(education, experience,
productivity, field,...)
• Education Costs = f(enrollments, research
intensity, student/faculty ratio,...)
Assumptions in Multiple
Regression
• Error term has a mean of zero and
constant variance
• Error terms are unrelated to each other
• Error term is unrelated to independent
vars
• Error term is normally distributed
• Independent variables are not collinear
with each other (no “multicollinearity”)
Ordinary Least Squares
Least Squares Estimates
Model: Y = a + b1X1 + b2X2 + … + bkXk + e
• The coefficients are referred to as “partial
effects” because they show the effect of
one variable on Y holding other vars
constant.
• The OLS formula takes into account any
relationships between the X variables. For
this reason, the coefficients usually change
when variables are added to or dropped
from the model.
Other Stats in Multiple
Regression
• Hypothesis tests for significance of coefficients
can be performed as before, except degrees of
freedom change (n-k-1).
• Goodness of fit measures are calculated as
before. R-squared now represents the %
deviation in Y explained by all X's together. Thus,
R2 usually rises as X's are added.
• Confidence intervals and point estimates can be
calculated as before.
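As a sketch of what the multiple-regression "partial effects" computation involves, here is a small pure-Python solve of the normal equations (X'X)b = X'y; the data are made up so that y = 1 + 2·x1 - 3·x2 holds exactly:

```python
# Pure-Python multiple regression via the normal equations (illustration only).

def solve(A, rhs):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def ols_multi(X, y):
    Z = [[1.0] + row for row in X]          # prepend a column of 1s for the intercept
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)                  # [intercept, b1, b2, ...]

coefs = ols_multi([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, -2, 0, 2])
print(coefs)  # approximately [1.0, 2.0, -3.0]
```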
Example: Average Public Tuition
Rates
An IR analyst is asked to help explain why
there are variations across states in
their tuition rates at public institutions.
She feels that factors such as state aid
given to students and state
appropriations help account for these
differences.
• Open the file TUITION.SAV in SPSS.
Question 1: How do state appropriations affect average tuition?

State appropriations
account for 13.5%
of differences in
tuition.
Question 2: How do state appropriations and aid to students
affect average tuition?

These two variables
account for 40.4% of
differences in tuition.

A $1 increase in appropriations reduces tuition by 22.6 cents,
holding constant state aid per student.
Extensions of Regression
Model
So far, we have only considered linear
models where X's and Y's were
continuous. We will now examine how to
handle:

• Categorical X's
• Interactions among X's
• Non-linear relationships between X and Y
Categorical Variables
There are many examples of independent variables
that are not numerical (ex: gender, race,
institution attended, attitudes/beliefs)
“Likert scale” variables (assign #'s to categorical
responses) should not be used in regression
models in their present form due to problems in
interpreting changes in units.
• Slope = # units change in Y due to a one-unit
change in X (but Likert #'s are artificial)
Dummy Variables
However, categorical X's can be used if they are
first recoded into “dummy variables”
• Dummy variable: has only two values (0,1)
• Need to specify an assignment rule. Can be used
for categorical, Likert, and continuous variables.
• The variable can now be used in regression
• It does not matter which group is assigned 1
• Coef represents the difference in intercepts for the
two groups
• Must omit one of the dummy variables for a
construct to avoid multicollinearity
Examples of Assignment
Rules
Let X = 1 if (0 otherwise):
•   Teaches in Psychology Department
•   Enrolled in public university
•   Family income exceeds $100,000
•   Student is “very satisfied” with the
quality of instruction
•   Student dropped out of college
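Assignment rules like these amount to simple recodes. A sketch using hypothetical faculty-rank data:

```python
# Recoding a categorical variable into dummies. "Assistant" is the
# omitted reference group, so its effect is absorbed by the intercept
# and the other coefficients are differences from it.

ranks = ["Full", "Associate", "Assistant", "Full", "Assistant"]

full = [1 if r == "Full" else 0 for r in ranks]
assoc = [1 if r == "Associate" else 0 for r in ranks]

print(full)   # [1, 0, 0, 1, 0]
print(assoc)  # [0, 1, 0, 0, 0]
```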
Note: Both equations have the same slope for RANK.
Question: Does living on campus matter?
Variable Interactions
It is possible that the joint occurrence of two X's
has an effect on Y separate from each X's
effect:
• Academic performance of students with high
SAT scores and HS ranks
• State appropriations for higher ed in states
with low incomes and high tax rates
• The salary increase from promotions for men
and women may be different
Interactions (cont'd)
In these examples, there is something
special about the joint occurrence of two
variables.
• To test these assertions, an “interaction
variable” can be created and added to the
regression model.
• Interaction variables are created by
defining a third variable as the product of
the two variables in question.
The interaction variable is then added to the regression model and
treated as any other variable:
Y = a + b1X1 + b2X2 + b3(X1·X2)
To find the effect of x1 on y, you need to differentiate the equation
with respect to x1: dY/dX1 = b1 + b3X2
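A sketch of both steps, building the interaction variable and reading off the partial effect of x1, with hypothetical data and coefficients:

```python
# For y = a + b1*x1 + b2*x2 + b3*(x1*x2), the effect of x1 is b1 + b3*x2:
# it depends on the level of x2.

x1 = [1, 2, 3]
x2 = [0, 1, 1]
x1x2 = [a * b for a, b in zip(x1, x2)]   # the interaction variable

b1, b3 = 0.5, 0.25                        # hypothetical estimated coefficients

def effect_of_x1(x2_value):
    return b1 + b3 * x2_value

print(x1x2)             # [0, 2, 3]
print(effect_of_x1(1))  # 0.75
```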
Non-linear Functional
Forms
Regression analysis can also be used in
situations where X has a non-linear
relationship with Y
• Linear: The change in Y due to a one-unit
change in X is constant.
• Non-linear: The change in Y due to a
one-unit change in X can vary with the
level of X.
Graphs of Non-linear Functions
[Figures: Exponential: Y = exp(X); Logarithmic: Y = ln(X);
hill-shaped (“Maximize” Y); U-shaped (“Minimize” Y)]
Possible IR Examples
Exponential: Implies that as X increases, Y
increases at a faster rate.
• Y = salary, X = years of experience
Logarithmic: Implies that as X increases, Y
increases at a slower rate.
• Y = college GPA, X = hrs/week studying
• Y = retention rate, X = avg. student SAT
score
Examples:
“Maximize” Y: There is some value of X at
which Y is maximized.
• Y = Tuition revenue, X = tuition rate
• Y = Student gains, X = class size
“Minimize” Y: There is some value of X at
which Y is minimized.
• Y = costs/student, X = enrollments
Using Non-linear Functions
• Regression analysis requires a linear
relationship between X and Y.
• When there is a non-linear relationship, you can
transform one or more variables and then use
the transformed variables in the regression
model.
• As long as there is a linear relationship between
the transformed variables, regression analysis is
appropriate.
Exponential Transformations
ln(Y) = α + βX
• The coefficient estimate for β represents the
approximate percentage change in Y due to a one-
unit increase in X.
• The variable X always has the same directional
effect on Y (positive or negative)
• The change in Y due to a change in X increases
at an increasing rate
Natural Log Function
The natural log function is the inverse of the exponential
function:      ln (exp (X)) = X
Logarithmic Transformations
Y = α + β ln(X)
This can also be used for a subset of X's.
“Double-Log” Function: Elasticities
ln(Y) = α + β ln(X), where β is the elasticity of Y
with respect to X (% change in Y for a 1% change in X)
If X is believed to have a quadratic effect on Y, then create
a new variable as the square of X and add this to the
regression model:
Y = β1 + β2X1 + β3X1²
The change in Y due to a one-unit change in X1 would
be found by differentiating the equation with respect to
this variable:
dY/dX1 = β2 + 2β3X1
Hill-shaped if β3 < 0, U-shaped if β3 > 0, linear if β3 = 0
Quadratic Functions
• The value of X that maximizes or
minimizes Y can have important
implications. This is found by solving for
X in the first-derivative.
• Higher-order functions (ex., cubic) can
also be used in regression. They can
yield better representations of
relationships, but are harder to explain
and interpret.
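Setting the first derivative of the quadratic to zero gives the turning point X* = -(linear coef)/(2 × squared coef). A sketch with hypothetical coefficients:

```python
# Turning point of a quadratic regression: solve for X in the
# first derivative (linear coef) + 2*(squared coef)*X = 0.

def turning_point(b_linear, b_squared):
    return -b_linear / (2 * b_squared)

# hill-shaped case (squared coef < 0): this X maximizes Y,
# e.g. a hypothetical tuition-revenue vs. tuition-rate model
print(turning_point(8.0, -0.002))  # 2000.0
```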
SPSS Exercise: Faculty
Salaries
An IR analyst is asked to investigate if female faculty are
paid less than comparable males. She draws a sample
of 432 faculty and creates these variables:
• Salary = monthly base salary (in dollars)
• Rank = 1 if Full, 2 if Associate, 3 if Assistant
• Gender = 1 if male, 0 otherwise
• Prevexp = days of experience before current job
• Npleave = days of non-professional leave
• Potenexp = days since highest degree
• Nine12 = 1 if nine-month appointment, 0 otherwise
• Cite85 = Citations in 1985 to all publications
Open the SPSS system file FACSAL.SAV:
• Estimate a regression model showing how
gender affects salary. How do these results
compare to a two-sample t-test?
• How do your findings change when potential
experience is added to the model?
• An economist argues that salaries rise
exponentially with potential experience,
citations, and gender. How can this be
modeled?
Note: mean difference
is $916, which has a t-
value of 6.227 and is
significant.
• The VP for Finance argues that individuals
with high experience levels often get
smaller percentage salary increases than
others. How could this be addressed (use
same function as in previous example)?
• A female faculty member claims that
women face discrimination in part because
they are rewarded less for each citation
they receive. How could you test this?
Model Selection
For most IR problems, there are many
alternative models from which to choose.
How should the “best” model be selected?
• Begin with published studies that look at the
same (similar) Y's. What variables and
functional forms do they use?
• Is there a theory that can be used to guide:
human capital theory > salary models
median voter theory > state funding for HE
Tinto's model > student retention
More model selection
• Better to include too many factors than to omit
important variables (“omitted variable bias”)
• Can estimate several competing model
specifications and compare results. Be careful
not to simply select model with the most
“appealing” results!
• Keep in mind trade-off between simplicity and
accuracy. A simple model is worth its weight in
gold when explaining to decisionmakers!
Faculty Salary Example
Using FACSAL.SAV, create a dummy variable for full professors.
• Estimate a model explaining salary as a
function of gender, then gender and full
professor.
• Estimate a model explaining salary as a
function of gender, full professor, and
potential experience.
Problems in Regression
Analysis
There are three main problems which may
arise in multiple regression:
• Autocorrelation
• Heteroscedasticity
• Multicollinearity
We will briefly discuss what each means,
how they can be detected, and what can
be done about them when they occur.
Autocorrelation
This can occur in time-series data when the error in
one period is related to the error in the next.
• Violates the assumption E(εiεj) = 0 for i ≠ j
• Causes the computer to calculate incorrect
standard errors, thereby affecting t-ratios. Usually,
st. errs are too small, so t-ratios are too high
(making X appear significant when it isn't.)
• Possible IR Examples: Predicting applications, HS
grads over time
First-order autocorrelation
[Figure: errors εt plotted over time, drifting from positive to
negative and back rather than varying randomly]
Durbin-Watson test
Calculates a “d-statistic” that reflects the
correlation among subsequent error terms:
d = Σ(et - et-1)² / Σet²
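The d-statistic is simple to compute from saved residuals; a sketch with hypothetical residuals:

```python
# Durbin-Watson d: sum of squared successive differences over the sum of
# squared residuals. d near 2 suggests no autocorrelation; near 0,
# positive autocorrelation; near 4, negative autocorrelation.

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

print(durbin_watson([1, -1, 1, -1, 1]))  # 3.2: alternating errors, d near 4
```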
Correcting Autocorrelation
If autocorrelation is detected, it can be corrected
through transforming the data to yield correct
standard errors (“generalized least squares”).
• Cochrane-Orcutt and Prais-Winsten are two commonly-
used methods
• Standard “autocorrelation” option in SPSS does
not do this. Use SPSS Trends or another
program.
• Keep in mind that autocorrelation affects the
standard errors and not coefficients.
Heteroscedasticity
May occur in cross-section data when the variance
of the error term is related to one or more
independent variables (σi² not constant).
• Affects standard errors, and hence t-ratios (but
not coefficient estimates)
Potential IR examples:
• Effects of enrollments on average costs
• Effect of tax revenues on state appropriations
• Effect of program size on expenditures
Graph of Heteroscedasticity
[Figure: scatter of the dependent variable around the regression
line, fanning out as the independent variable increases]
As X increases, the possible errors become larger.
Testing for
Heteroscedasticity
• Visual: Plot residuals against the variable
thought to be causing the problem.
• Park-Glejser test: Estimate model and save
residuals. Regress the log of squared
residuals against the log of the variable thought
to cause the problem.
• Other tests: White (1980), Goldfeld-Quandt.
• SPSS will not do these by default (must do
by hand or with other software).
Correcting for
Heteroscedasticity
• Weighted least squares: Weight
observations by the variable causing
heteroscedasticity. However, you must
know the form.
• For example, if σi2 = σ2X1i, then weighting
each observation by the square root of X1
will yield correct standard errors.
• An option that does not require knowing
the form of heteroscedasticity is by White.
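The weighting described above, for the case σi² = σ²X1i, can be sketched as a data transformation (hypothetical values):

```python
import math

# Weighted least squares transform for Var(e_i) = sigma^2 * x1_i:
# dividing every term (including the intercept column of 1s) by sqrt(x1_i)
# yields a model whose error variance is constant.

def wls_transform(y, x1):
    w = [1 / math.sqrt(v) for v in x1]          # weights = 1/sqrt(x1)
    y_t = [yi * wi for yi, wi in zip(y, w)]     # transformed Y
    const_t = w[:]                               # transformed intercept column
    x1_t = [xi * wi for xi, wi in zip(x1, w)]   # transformed X1 (= sqrt(x1))
    return y_t, const_t, x1_t

y_t, const_t, x1_t = wls_transform([8, 27], [4, 9])
print(x1_t)  # [2.0, 3.0]: the square roots of x1
```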
Multicollinearity

Multicollinearity arises when there is an
extremely high correlation between two or
more independent variables in the model.

• The coefficient estimates become unstable; the stats
program cannot separate the effects of the
collinear variables
• Standard errors increase, making t-ratios
small
Multicollinearity (cont'd)
Potential IR examples include: (1) effect of
current and previous experience on faculty
salaries, (2) effect of SAT score and high
school rank on academic performance, (3)
effect of family income and wealth on student
demand for higher education.
• A significant correlation between X's does
not necessarily lead to multicollinearity. Only
when the correlation is very high does this
occur.
Testing/Correcting Multicollinearity
There is no universally-accepted test for
multicollinearity.
• Variance inflation factors (VIF) estimate how much
the standard errors increase due to correlation with
other X's. There is no single “cutoff point” for VIFs.
Signs of multicollinearity include:
• Two similar variables have widely different effects
on Y (e.g., only one is signif.)
• The standard errors are large
To test, drop one of the variables from the model and
compare results. If the coef and st. err. change
considerably, this may be a problem.
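With only two X's, the auxiliary R² inside the VIF formula is just the squared correlation between them, which makes the idea easy to sketch (the r values below are hypothetical):

```python
# VIF = 1 / (1 - R^2 from regressing one X on the others).
# In the two-variable case that auxiliary R^2 is simply r^2.

def vif_two_vars(r):
    return 1 / (1 - r ** 2)

print(vif_two_vars(0.5))   # ~1.33: little inflation
print(vif_two_vars(0.99))  # ~50: standard errors blown up badly
```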
Correcting Multicollinearity

There is also no uniformly-accepted
solution to this problem. However, you
can drop one of the problem variables
from the model.
• Multicollinearity may not be an important
issue if the collinearity occurs between
“unimportant” variables.
“Makin' Multicollinearity”
Using FACSAL.SAV, create a new variable:
• newpot = potenexp / 365 (“years of
exper”) and add this to the regression
model
• Then, make slight changes to first two
data points: change “27.02” to “13” and
change “19.01” to “27”.
• Estimate regression model again, using
gender, potenexp, newpot
Using gender and potenexp:
Using gender, potenexp and newpot:

Variable POTENEXP drops out of the equation because it
is perfectly correlated with NEWPOT.
Using gender, potenexp,
newpot (after changes)

• Gender is significant throughout all three models.
• Standard errors are about forty-three times larger than before!
Limited Dependent
Variables
Thus far, we have considered instances
where Y was continuous and unbounded.
However, there are many situations where
this is violated:
• Individual student data are often
dichotomous (0,1) variables: 1 if graduate, 1
if return, 1 if apply/enroll.
• Some data are discrete counts: number of
journal articles or citations, number of times
a student changes his/her major
Problems with OLS when Y is
(0,1)
• Predictions can be > 1 or < 0
• Coefficients may be biased
• Heteroscedasticity is present (σ² = P(1-P))
• Error term is not normally distributed (only
two possible values), so hypothesis tests
are invalid
Of these problems, the last is the most
severe.
Maximum Likelihood Estimation
In this instance, there are advantages to
using a technique (MLE) in place of OLS.
• MLE: Find the coefficients that maximize
the likelihood of generating the
observations on Y in the sample.
• Recall that OLS chooses the coefficients
based on those that minimize the sum of
squared errors.
Logit and probit analysis

When Y = (0,1), the two most commonly-used
functional forms in MLE are the cumulative
logistic distribution (“logit analysis” or
“logistic regression”) and the cumulative
normal distribution (“probit analysis”).
• The two choices usually yield similar results
• Each avoids the four problems noted with
OLS
Logistic regression
For logistic regression, the following functional form is used:

ln[P/(1-P)] = a + b1X1 + b2X2 + …
where P = probability that Y=1

In practice, all you have to do is create the dummy variable for
Y and tell SPSS to use logistic regression to estimate the
model. SPSS will create the log odds ratio for you.
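Once the log-odds equation has been estimated, predicted probabilities come from inverting it; a sketch:

```python
import math

# Inverting the log odds to get a predicted probability:
# ln(P/(1-P)) = a + b1*X1 + ...  =>  P = 1 / (1 + exp(-(log odds)))

def predicted_probability(log_odds):
    return 1 / (1 + math.exp(-log_odds))

print(predicted_probability(0.0))  # 0.5: log odds of zero mean even odds
print(predicted_probability(2.0))  # ~0.88
```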
Interpreting results
• The coefficients from logistic regression are
hard to interpret and explain.
• Focus on the signs of the coefficients:
– If the sign is positive and significant, then as X
increases, the probability that Y=1 will also
increase
– If the sign is negative and significant, then as X
increases, the probability that Y=1 will
decrease
– If the coefficient is not significant, then X has no
effect on the probability that Y=1.
Example: Faculty Rank
Using FACSAL.SAV, estimate a
logistic regression model to explain whether a
faculty member is a Full professor (under
“Regression / Binary Logistic”):
• X's include gender, potenexp, prevexp, and
cite85
• Need to create a dummy variable for Full
Professor first
• The SPSS Probit module is different than used
here.
Wald Chi-square statistic: (coefficient / standard error)².
Note that these Chi-square values are the square of the
standard t-ratios.
For each variable, the output reports the coefficient (effect
of X on the log odds), its standard error, the P-value, and
the odds ratio.
Results from rank analysis:
• Since the coefficient for GENDER is
positive and significant, it means that men
are more likely than women to hold the
rank of Full professor after controlling for
experience and citations.
• The positive and significant coef for
CITE85 means that a faculty member is
more likely to be a Full Prof as citations
rise
Final Exam: SATDATA.SAV
File contains data on 1,999 NH high school seniors
in 1996 who have taken the SAT

• ASSOC = 1 if highest planned degree AA
• MA = 1 if highest planned degree Masters
• PHD = 1 if highest planned degree Doctorate
• MALE = 1 if male
• FIRSTGEN = 1 if 1st generation
• INCOME = family income
• INCOME2 = income squared
• PUBHS = 1 if attend public high school
• SATCOMB = Combined SAT score
• SATCOMB2 = SAT squared
• ANYAP = 1 if taken any AP course
• GRADEAVG = high school GPA
• UNH = 1 if sent SAT score to UNH
• KSC = 1 if sent SAT score to KSC
• PSC = 1 if sent SAT score to PSC
Questions
• How do family income, student ability,
and student intentions affect whether a
student submits SAT scores to UNH, KSC,
or PSC?
• Do SAT takers from poor families and/or
first generation families do worse on the
SAT than other students?
