# Frequency Distributions by dffhrtcv3


## Slide 1: Multiple Regression

Sample Homework Problem

Solving the Problem with SPSS

Logic for Multiple Regression

## Slide 2: Key points about multiple regression

• Few, if any, phenomena in social and behavioral
research can be explained with a single predictor.
More realistically, social phenomena are very
complex, requiring a number of predictors to model
the relationship.

• Multiple regression is an extension of simple linear
regression that enables us to include multiple
predictors in our regression equation. The
interpretation of a multiple regression is very similar
to the interpretation of a simple linear regression,
but there are important differences.

## Slide 3: Similarities and differences - 1

• In both simple linear and multiple regression, there is an ANOVA test of the overall relationship.

• In both simple linear and multiple regression, R² represents the proportion of variance explained (error reduced) in predicting the dependent variable based on the independent variable(s).

• In both simple linear and multiple regression, Multiple R represents the strength of the relationship and the effect size. In multiple regression it is always positive and, unlike in simple linear regression, is generally not equal to any of the beta coefficients.

## Slide 4: Similarities and differences - 2

• In simple linear regression, the significance of the overall relationship and the significance of the relationship for the single independent variable are the same. In multiple regression, in addition to the overall test, there is a separate test of significance for the coefficient of each independent variable.

• There is no necessary relationship between the significance of the overall relationship and the significance of the relationships for each of the individual predictors. When the overall relationship is significant, it is possible that none, some, or all of the individual relationships will be significant.

## Slide 5: Similarities and differences - 3

• Multiple regression requires all of the assumptions of simple linear regression:
1. The relationship is linear.
2. The residuals have the same variance.
3. The residuals are independent of each other.
4. The residuals are normally distributed.

• In addition, multiple regression assumes that the independent variables are independent of one another, i.e., that they add to the variance explained in the dependent variable rather than explaining the same variance explained by other independent variables.

## Slide 6: Similarities and differences - 4

• In a multiple regression equation, the coefficient for each individual variable represents the change in the dependent variable that it is uniquely responsible for, i.e., the change while holding the other independent variables constant.
• When individual predictors are correlated with each other, part of their contribution toward explaining the dependent variable is made jointly by both and is not credited to either individual predictor.
• In extreme cases, the relationship between the independent variables is so strong that neither is credited with explaining the dependent variable, even though both might have a strong individual relationship to the dependent variable.

## Slide 7: Similarities and differences - 5

• If this happens, a predictor that really has a strong relationship with the dependent variable may have a b coefficient that is not statistically significant. Concluding from the non-significant b coefficient that the variable has no relationship would be an error.

• To satisfy the assumption of independence of the independent variables, our regression must not include variables that are collinear.

• The diagnostic statistic for detecting multicollinearity is "tolerance," which SPSS includes in the table of coefficients.
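
Tolerance for a predictor is 1 − R² from regressing that predictor on all of the other predictors, so a value near 0 means the predictor is almost entirely explained by the others. The sketch below recomputes it with NumPy on made-up data; the helpers `r_squared` and `tolerances` are hypothetical names, not SPSS functions, and this is an illustration of the definition rather than SPSS's internal computation.

```python
import numpy as np

def r_squared(y, X):
    """R² from an ordinary least squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def tolerances(X):
    """Tolerance of each column: 1 - R² when it is regressed on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)   # every predictor except column j
        out.append(1.0 - r_squared(X[:, j], others))
    return np.array(out)

# Made-up data: x2 is nearly a copy of x1, while x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
tol = tolerances(np.column_stack([x1, x2, x3]))
print(tol.round(3))  # x1 and x2 get tolerances far below 0.10; x3 stays near 1
```

The collinear pair is flagged together, which matches the point above: neither x1 nor x2 can be credited separately with variance the other explains.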

## Slide 8: Similarities and differences - 6

• In extreme cases of multicollinearity, SPSS cannot compute the regression equation. In this case, SPSS will exclude the variable it identifies as producing the problem, even though we have told it to include the variable in the analysis.

## Slide 9: Similarities and differences - 7

• Having more than one predictor in the regression equation raises the question of which variable has the more important relationship to the dependent variable, i.e., which has the largest impact on the predicted scores.

• Since beta coefficients are standardized, the one with the largest absolute value (ignoring the sign) is the most important: each beta is the change, in standard deviations, in the dependent variable produced by a one standard deviation change in the independent variable.
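
Beta weights can be obtained from the unstandardized b coefficients as beta = b × sd(x) / sd(y), and the most important predictor is the one with the largest absolute beta. A minimal sketch; the function names are hypothetical, the first example's b values and standard deviations are invented, and the second example uses the betas reported in the sample problem.

```python
import numpy as np

def standardized_betas(b, x_sds, y_sd):
    """Convert unstandardized b coefficients to beta weights:
    beta_j = b_j * sd(x_j) / sd(y)."""
    return np.asarray(b, dtype=float) * np.asarray(x_sds, dtype=float) / y_sd

def most_important(betas):
    """Return the predictor whose beta has the largest absolute value;
    the sign is ignored because only the size of the effect matters."""
    return max(betas, key=lambda name: abs(betas[name]))

# Hypothetical b coefficients and standard deviations, for illustration only.
print(standardized_betas([2.0, -1.0], [1.5, 4.0], 3.0))  # about [1.0, -1.333]

# Beta coefficients reported in the sample homework problem.
reported = {"pgrowth": -0.393, "fertrate": 0.965, "poverty": 0.280}
print(most_important(reported))  # fertrate
```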

## Slide 10: Change in response for sample size

• On the simple linear regression problems, the answer was "Incorrect application of a statistic" if the sample size available to the analysis was less than the number recommended by Tabachnick and Fidell.
• In reviewing problems, there were numerous occasions when a smaller sample yielded a statistically significant result, which made the response "Incorrect application of a statistic" itself inappropriate.
• For these problems, I am changing the response to reflect the possibility that planning a sample of the given size risked not finding a significant result, but that the small sample does not negate an otherwise useful result.

## Slide 11: Sample homework problem (multiple regression, part 1)

This is the general framework for the problems in the homework assignment on multiple regression.

Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Use .05 for alpha.

"Population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001).

"Population growth rate" significantly predicted "infant mortality rate", β = -0.393, t(91) = -4.04, p < .001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate".

"Total fertility rate" significantly predicted "infant mortality rate", β = 0.965, t(91) = 8.90, p < .001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate".

The problem includes a statement for the overall relationship, an individual statement for each of the independent variables, and a statement on the relative importance of predictors.

## Slide 12: Sample homework problem (multiple regression, part 2, cont’d)

"Percent of the population below poverty line" significantly predicted "infant mortality rate", β = 0.280, t(91) = 4.41, p < .001. Higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate".

"Total fertility rate" [fertrate] was the most important predictor of the value of "infant mortality rate" [infmort] compared to the other independent variables.

o   True
o   True with caution
o   False
o   Incorrect application of a statistic

## Slide 13: Sample homework problem (data set and alpha)

Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Use .05 for alpha.

The first paragraph identifies:
• The data set to use, e.g. 2001WorldFactbook.sav
• The alpha level for the hypothesis test

## Slide 14: Sample homework problem (the overall relationship)

"Population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001).

The second paragraph states the finding that we want to verify with multiple regression. The finding identifies:
• The independent variables
• The dependent variable
• The strength of the relationship

## Slide 15: Sample homework problem (individual relationships)

"Population growth rate" significantly predicted "infant mortality rate", β = -0.393, t(91) = -4.04, p < .001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate".

"Total fertility rate" significantly predicted "infant mortality rate", β = 0.965, t(91) = 8.90, p < .001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate".

Each of the paragraphs for the individual independent variables contains:
• A statement about the significance of the relationship between the individual independent variable and the dependent variable
• A statement about the direction of the relationship between the individual independent variable and the dependent variable

## Slide 16: Sample homework problem (importance of variables)

"Total fertility rate" [fertrate] was the most important predictor of the value of "infant mortality rate" [infmort] compared to the other independent variables.

The last paragraph is a statement of the relative importance of the predictors, e.g. which variable makes the largest change in the dependent variable.

o   True
o   True with caution
o   False
o   Incorrect application of a statistic

The answer will be:
• True if all parts of the problem are correct.
• False if any part of the problem is not correct.
• True with caution if the analysis includes an ordinal variable or we do not meet the sample size requirement.
• Incorrect application of a statistic if the level of measurement or multicollinearity requirement is violated.
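
The answer rules above can be sketched as a single decision function. This is a hypothetical helper (`grade_answer` and its boolean inputs are illustrative names, not part of SPSS), and it assumes the checks are applied in the order the rules list them: requirement violations first, then the correctness of each statement, then the cautions.

```python
def grade_answer(level_ok, collinearity_ok, all_parts_correct,
                 has_ordinal, sample_size_ok):
    """Combine the homework checks into one of the four answer options."""
    # A violated level-of-measurement or multicollinearity requirement
    # makes the analysis itself inappropriate.
    if not level_ok or not collinearity_ok:
        return "Incorrect application of a statistic"
    # Any incorrect part of the statement makes the whole answer False.
    if not all_parts_correct:
        return "False"
    # Correct statements get a caution for ordinal variables or small samples.
    if has_ordinal or not sample_size_ok:
        return "True with caution"
    return "True"

# The sample problem: all statements correct, but 95 cases is below the
# required sample size, so the answer carries a caution.
print(grade_answer(True, True, True, False, False))  # True with caution
```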

## Slide 17: Solving the problem with SPSS: Level of measurement

Multiple regression requires that the dependent
variable be interval and the independent variables be
interval or dichotomous. "Infant mortality rate"
[infmort] is interval level, satisfying the requirement
for the dependent variable. "Population growth rate"
[pgrowth] is interval level, satisfying the requirement
for the independent variable. "Total fertility rate"
[fertrate] is interval level, satisfying the requirement
for the independent variable. "Percent of the
population below poverty line" [poverty] is interval
level, satisfying the requirement for the independent
variable.

## Slide 18: Solving the problem with SPSS: Multiple regression - 1

To address the other issues involved in solving the problem, we need to generate the SPSS output.

Select Regression > Linear… from the Analyze menu.

## Slide 19: Solving the problem with SPSS: Multiple regression - 2

First, move the
dependent variable
infmort to the
Dependent list
box.

Second, move the independent
variables pgrowth, fertrate, and
poverty to the Independents
list box.

Third, click on the Statistics… button.

## Slide 20: Solving the problem with SPSS: Multiple regression - 3

First, mark the check boxes for Descriptives and Collinearity diagnostics.

Second, click on the Continue button to close the dialog box.

## Slide 21: Solving the problem with SPSS: Multiple regression - 4

Back in the Linear Regression dialog box, we click on OK to obtain the output.
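
For readers who want to see what the Linear Regression procedure computes, here is a minimal NumPy sketch of ordinary least squares that returns the main statistics SPSS reports (Multiple R, R², F, B, standard errors, t, and Beta). `ols_summary` is a hypothetical helper, the data are made up rather than taken from 2001WorldFactbook.sav, missing values are assumed to be already removed, and p-values are omitted; they would require the t and F distributions (for example `scipy.stats.t` and `scipy.stats.f` in SciPy).

```python
import numpy as np

def ols_summary(y, X):
    """Ordinary least squares with an intercept.

    Returns the quantities SPSS shows in its Model Summary, ANOVA and
    Coefficients tables, except p-values. Assumes no missing values.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])        # intercept + predictors
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)   # unstandardized B
    resid = y - X1 @ b
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    df_res = n - k - 1
    f = (r2 / k) / ((1.0 - r2) / df_res)         # overall ANOVA F
    mse = ss_res / df_res
    se = np.sqrt(np.diag(mse * np.linalg.inv(X1.T @ X1)))  # SE of each B
    t = b / se                                   # t = B / SE
    beta = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)   # standardized
    return {"R": np.sqrt(r2), "R2": r2, "F": f, "df": (k, df_res),
            "B": b, "SE": se, "t": t, "Beta": beta}

# Made-up data with three predictors and 95 cases (not the Factbook data).
rng = np.random.default_rng(1)
X = rng.normal(size=(95, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=95)
res = ols_summary(y, X)
print(res["df"])  # (3, 91), the same degrees of freedom as F(3, 91)
```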

## Slide 22: Solving the problem with SPSS: Multicollinearity

The tolerance values for all of the independent
variables are larger than 0.10: "population growth
rate" [pgrowth] (0.287), "total fertility rate" [fertrate]
(0.230) and "percent of the population below poverty
line" [poverty] (0.673).

Multicollinearity is not a problem in this regression
analysis.
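
The multicollinearity rule reduces to comparing each tolerance with 0.10. A minimal sketch using the tolerances reported in this problem; `collinearity_ok` is a hypothetical name, and it uses "at least 0.10" to follow the logic flowchart for this requirement.

```python
def collinearity_ok(tolerances, cutoff=0.10):
    """True when every predictor's tolerance is at least the cutoff."""
    return all(tol >= cutoff for tol in tolerances.values())

# Tolerances from the SPSS coefficients table for this problem.
reported = {"pgrowth": 0.287, "fertrate": 0.230, "poverty": 0.673}
print(collinearity_ok(reported))  # True

# The variance inflation factor (VIF) is 1 / tolerance,
# e.g. fertrate: 1 / 0.230 is about 4.35.
```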

## Slide 23: Solving the problem with SPSS: Sample size

Using the rule of thumb from Tabachnick and Fidell that the required number of cases should be the larger of (the number of independent variables × 8) + 50 or the number of independent variables + 105, multiple regression with 3 independent variables requires 108 cases (the larger of 3 × 8 + 50 = 74 and 3 + 105 = 108). With 95 valid cases, the sample size requirement is not satisfied. A caution should be added to our findings.

NOTE: adding a caution to our findings rather than concluding that it is not an appropriate use of statistics is a more reasonable response than what we did for simple linear regression.
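
The sample size rule is simple arithmetic; a minimal sketch, where `required_cases` is a hypothetical helper implementing the rule of thumb exactly as stated above.

```python
def required_cases(num_predictors):
    """Rule of thumb as stated above: the larger of
    8 * k + 50 and k + 105, where k is the number of predictors."""
    return max(8 * num_predictors + 50, num_predictors + 105)

k = 3             # pgrowth, fertrate, poverty
valid_cases = 95  # valid cases in this analysis
needed = required_cases(k)
print(needed, valid_cases >= needed)  # 108 False -> add a caution
```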

## Slide 24: Solving the problem with SPSS: Interpreting the overall relationship - 1

The first sentence in the finding states that: "Population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001).

The R² of .753 is the proportional reduction in error achieved by using scores for "population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] to predict scores for "infant mortality rate" [infmort].

The overall relationship between the independent variables "population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] and the dependent variable "infant mortality rate" [infmort] was statistically significant, R² = 0.753, F(3, 91) = 92.67, p < .001.
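
As a consistency check on these numbers, Multiple R and the F statistic can be recovered from R² with standard formulas, taking n = 95 valid cases and k = 3 predictors (the values implied by F(3, 91)). The small gap from the reported F = 92.67 comes from R² being rounded to three decimals.

$$
R = \sqrt{R^2} = \sqrt{0.753} \approx 0.868
$$

$$
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.753 / 3}{0.247 / 91} \approx 92.5
$$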

## Slide 25: Solving the problem with SPSS: Interpreting the overall relationship - 2

The first sentence in the
finding states that:
"Population growth rate"
[pgrowth],"total fertility
rate" [fertrate] and
"percent of the population
below poverty line"
[poverty] significantly
predicted "infant mortality
rate" [infmort]. The
relationship was strong and
reduced the error in
predicting "infant mortality
rate" by approximately 75%
(R² = 0.753, F(3, 91) =
92.67, p < .001).

We reject the null hypothesis that all of
the partial slopes (b coefficients) = 0 and
conclude that at least one of the partial
slopes (b coefficients) ≠ 0.

## Slide 26: Solving the problem with SPSS: Interpreting the overall relationship - 3

The first sentence in the
finding states that:
"Population growth rate"
[pgrowth],"total fertility
rate" [fertrate] and "percent
of the population below
poverty line" [poverty]
significantly predicted
"infant mortality rate"
[infmort]. The relationship
was strong and reduced
the error in predicting
"infant mortality rate" by
approximately 75% (R² =
0.753, F(3, 91) = 92.67, p
< .001).

The Multiple R of 0.868 was correctly
characterized as a strong relationship,
using Cohen’s criteria:

• r < .1 = Trivial
• .1 ≤ r < .3 = Small or weak
• .3 ≤ r < .5 = Medium or moderate
• r ≥ .5 = Large or strong
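
Cohen's criteria listed above translate directly into a lookup; `cohen_strength` is a hypothetical helper name, and it takes the absolute value so it also works for a signed correlation.

```python
import math

def cohen_strength(r):
    """Label the size of a correlation using the cutoffs listed above."""
    r = abs(r)
    if r < 0.1:
        return "Trivial"
    if r < 0.3:
        return "Small or weak"
    if r < 0.5:
        return "Medium or moderate"
    return "Large or strong"

# Multiple R for the sample problem is the square root of R² = 0.753.
print(cohen_strength(math.sqrt(0.753)))  # Large or strong
```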

## Slide 27: Solving the problem with SPSS: Interpreting individual relationships - 1

The second sentence in the finding states that:
"Population growth rate" significantly
predicted "infant mortality rate", β = -0.393,
t(91) = -4.04, p < .001. Higher values of
"population growth rate" were inversely related to
lower values of "infant mortality rate".

The individual relationship between the independent variable "population growth rate" [pgrowth] and the dependent variable "infant mortality rate" [infmort] was statistically significant, β = -0.393, t(91) = -4.04, p < .001.

We reject the null hypothesis that the partial slope
(b coefficient) for the variable "population growth
rate" = 0 and conclude that the partial slope (b
coefficient) for the variable "population growth rate"
≠ 0.
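
The test behind this conclusion can be written out. Each t in the SPSS coefficients table is the B coefficient divided by its standard error, compared with a t distribution whose degrees of freedom are the residual degrees of freedom; with n = 95 valid cases and k = 3 predictors:

$$
H_0: b_j = 0 \quad \text{vs.} \quad H_1: b_j \neq 0, \qquad t = \frac{b_j}{SE_{b_j}}, \qquad df = n - k - 1 = 95 - 3 - 1 = 91
$$

H₀ is rejected when the two-tailed p-value is less than or equal to α = .05; for "population growth rate", t(91) = -4.04 gives p < .001.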

## Slide 28: Solving the problem with SPSS: Interpreting individual relationships - 2

The second sentence in the finding states that:
"Population growth rate" significantly predicted
"infant mortality rate", β = -0.393, t(91) = -4.04,
p < .001. Higher values of "population
growth rate" were inversely related to lower
values of "infant mortality rate".

The negative sign of the B coefficient and the Beta coefficient indicates an inverse relationship: higher values of "population growth rate" were associated with lower values of "infant mortality rate".

## Slide 29: Solving the problem with SPSS: Interpreting individual relationships - 3

The third sentence in the finding states that:
"Total fertility rate" significantly predicted
"infant mortality rate", β = 0.965, t(91) =
8.90, p < .001. Higher values of "total fertility
rate" were directly related to higher values of
"infant mortality rate".

The individual relationship
between the independent variable
"total fertility rate" [fertrate] and
the dependent variable "infant
mortality rate" [infmort] was
statistically significant, β = 0.965,
t(91) = 8.90, p < .001.

We reject the null hypothesis that the partial slope
(b coefficient) for the variable "total fertility rate"
= 0 and conclude that the partial slope (b
coefficient) for the variable "total fertility rate" ≠
0.

## Slide 30: Solving the problem with SPSS: Interpreting individual relationships - 4

The third sentence in the finding states that:
"Total fertility rate" significantly predicted "infant
mortality rate", β = 0.965, t(91) = 8.90, p <
.001. Higher values of "total fertility rate"
were directly related to higher values of
"infant mortality rate".

The positive sign of the B coefficient and the
Beta coefficient implies that higher values of
"total fertility rate" were directly related to
higher values of "infant mortality rate".

## Slide 31: Solving the problem with SPSS: Interpreting individual relationships - 5

The fourth sentence in the finding states that:
"Percent of the population below poverty
line" significantly predicted "infant mortality
rate", β = 0.280, t(91) = 4.41, p < .001.
Higher values of "percent of the population below
poverty line" were directly related to higher
values of "infant mortality rate".

The individual relationship
between the independent variable
"percent of the population below
poverty line" [poverty] and the
dependent variable "infant
mortality rate" [infmort] was
statistically significant, β =
0.280, t(91) = 4.41, p < .001.

We reject the null hypothesis that the partial slope (b coefficient) for the variable "percent of the population below poverty line" = 0 and conclude that the partial slope (b coefficient) for the variable "percent of the population below poverty line" ≠ 0.

## Slide 32: Solving the problem with SPSS: Interpreting individual relationships - 6

The fourth sentence in the finding states that:
"Percent of the population below poverty line"
significantly predicted "infant mortality rate",
β = 0.280, t(91) = 4.41, p < .001. Higher
values of "percent of the population
below poverty line" were directly related
to higher values of "infant mortality
rate".

The positive sign of the B coefficient and the
Beta coefficient implies that higher values of
"percent of the population below poverty line"
were directly related to higher values of "infant
mortality rate".

## Slide 33: Solving the problem with SPSS: Interpreting individual relationships - 7

The fifth sentence in the finding states
that:
"Total fertility rate" [fertrate] was the most
important predictor of the value of "infant
mortality rate" [infmort] compared to the
other independent variables.

"Total fertility rate" [fertrate] was the most
important predictor because the absolute value
of its beta coefficient (0.965) was larger than
the absolute value of the beta coefficients for
the other independent variables.

## Slide 34: Solving the problem with SPSS

The findings for this problem state that:
• "Population growth rate" [pgrowth],"total fertility rate" [fertrate] and
"percent of the population below poverty line" [poverty] significantly
predicted "infant mortality rate" [infmort]. The relationship was
strong and reduced the error in predicting "infant mortality rate" by
approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001).
• "Population growth rate" significantly predicted "infant mortality
rate", β = -0.393, t(91) = -4.04, p < .001. Higher values of
"population growth rate" were inversely related to lower values of
"infant mortality rate".
• "Total fertility rate" significantly predicted "infant mortality rate", β =
0.965, t(91) = 8.90, p < .001. Higher values of "total fertility rate"
were directly related to higher values of "infant mortality rate".
• "Percent of the population below poverty line" significantly predicted
"infant mortality rate", β = 0.280, t(91) = 4.41, p < .001. Higher
values of "percent of the population below poverty line" were directly
related to higher values of "infant mortality rate".
• "Total fertility rate" [fertrate] was the most important predictor of
the value of "infant mortality rate" [infmort] compared to the other
independent variables.

All of the statements of findings
are true, so the answer to the
question is True with caution.
The caution is added because we
did not satisfy the required sample
size.

## Slide 35: Logic for multiple regression: Level of measurement

Measurement level of independent variable?
• Nominal: Inappropriate application of a statistic.
• Interval/ordinal/dichotomous: check the measurement level of the dependent variable.
  • Nominal/dichotomous: Inappropriate application of a statistic.
  • Interval/ordinal: continue. Strictly speaking, the test requires an interval level variable. We will allow ordinal level variables with a caution.
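
The level-of-measurement branch can be sketched as a small check. `level_check` and its return labels are hypothetical; the allowed levels follow the flowchart above (independent variables interval, ordinal or dichotomous; dependent variable interval or ordinal), with any ordinal variable triggering the caution.

```python
IV_ALLOWED = {"interval", "ordinal", "dichotomous"}
DV_ALLOWED = {"interval", "ordinal"}

def level_check(iv_levels, dv_level):
    """Apply the level-of-measurement flowchart.

    Returns "Inappropriate application of a statistic",
    "OK with caution" (an ordinal variable is involved) or "OK".
    """
    if any(level not in IV_ALLOWED for level in iv_levels):
        return "Inappropriate application of a statistic"
    if dv_level not in DV_ALLOWED:
        return "Inappropriate application of a statistic"
    if "ordinal" in iv_levels or dv_level == "ordinal":
        return "OK with caution"
    return "OK"

# The sample problem: three interval predictors and an interval outcome.
print(level_check(["interval", "interval", "interval"], "interval"))  # OK
```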

## Slide 36: Logic for multiple regression: Multicollinearity

Compute linear regression including descriptive statistics.

Tolerance for all independent variables ≥ 0.10?
• No: Inappropriate application of a statistic.
• Yes: continue.

## Slide 37: Logic for multiple regression: Sample size requirement

Compute linear regression including descriptive statistics.

Do the valid cases satisfy the computed requirement?
• No: caution added to any true findings.
• Yes: continue.

The sample size requirement is the larger of:
• the number of independent variables × 8 + 50
• the number of independent variables + 105

NOTE: violation of sample size requirements is a caution rather than an inappropriate application of a statistic.

## Slide 38: Logic for multiple regression: Significant, non-trivial overall relationship

Probability for F-test for all coefficients less than or equal to alpha?
• No: False.
• Yes: Is the effect size (Multiple R) not trivial by Cohen's scale, i.e., equal to or larger than 0.10?
  • No: False.
  • Yes: continue.
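
The two overall checks in this flowchart combine into one test. `overall_relationship_ok` is a hypothetical helper; since SPSS prints the sample problem's significance as .000, the value 0.0001 below is only a stand-in for "p < .001".

```python
def overall_relationship_ok(f_p_value, multiple_r, alpha=0.05):
    """Overall checks: F-test p <= alpha and Multiple R >= 0.10."""
    if f_p_value > alpha:
        return False            # not significant: the answer is False
    return multiple_r >= 0.10   # trivial effect size: the answer is False

# Sample problem: p < .001 (0.0001 used as a stand-in), Multiple R = 0.868.
print(overall_relationship_ok(0.0001, 0.868))  # True
```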

## Slide 39: Logic for multiple regression: Strength of overall relationship

Strength of relationship correctly interpreted (Multiple R)?
• No: False.
• Yes: Reduction in error correctly interpreted based on Multiple R²?
  • No: False.
  • Yes: continue.

## Slide 40: Logic for multiple regression: Significance and direction of individual relationships

Probability for t-test for B coefficient less than or equal to alpha?
• No: False.
• Yes: Direction of relationship correctly interpreted based on B or Beta coefficient?
  • No: False.
  • Yes: continue.

These steps must be repeated for each independent variable.
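
Repeating the significance and direction checks for each predictor can be sketched as a loop. `check_individual` is a hypothetical helper; "direct" and "inverse" mirror the wording of the problem statements, and 0.0001 stands in for the reported "p < .001".

```python
def check_individual(p_value, beta, stated_direction, alpha=0.05):
    """Checks for one predictor.

    stated_direction is "direct" or "inverse", as claimed in the problem.
    Returns True only if the t-test is significant and the stated
    direction matches the sign of the coefficient.
    """
    if p_value > alpha:
        return False
    actual = "direct" if beta > 0 else "inverse"
    return actual == stated_direction

# Betas and stated directions from the sample problem.
claims = {
    "pgrowth": (0.0001, -0.393, "inverse"),
    "fertrate": (0.0001, 0.965, "direct"),
    "poverty": (0.0001, 0.280, "direct"),
}
results = {name: check_individual(*args) for name, args in claims.items()}
print(results)  # every predictor passes both checks
```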

## Slide 41: Logic for multiple regression: Importance of individual predictors

Predictor with largest absolute Beta identified as most important?
• No: False.
• Yes: continue.

The statistics in the SPSS
output match all of the
statistics cited in the
problem?
No