Sample answers for Socy 709 Homework 1

Homework for Socy 709, lecture 1: Assigned 1/29/09, due 2/05/09

In this exercise, you or your group will use a STATA data file to explore the relationship between fertility and career success. Use the sample STATA .do file I have provided to do some regression analyses and explain your results.

1. Estimate three regression models with socioeconomic index score as the outcome. (Show STATA output for each).

a. SEI as a function of childbearing, age, and sex
. regress SEI CHILDS AGE SEX

      Source |       SS       df       MS              Number of obs =    2628
-------------+------------------------------           F(  3,  2624) =   19.08
       Model |  20778.9294     3   6926.3098           Prob > F      =  0.0000
    Residual |  952734.284  2624  363.084712           R-squared     =  0.0213
-------------+------------------------------           Adj R-squared =  0.0202
       Total |  973513.213  2627   370.57983           Root MSE      =  19.055

------------------------------------------------------------------------------
         SEI |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      CHILDS |  -1.843863   .2463378    -7.49   0.000    -2.326898   -1.360827
         AGE |   .0959523   .0238434     4.02   0.000     .0491984    .1427061
         SEX |   .7101726   .7492548     0.95   0.343    -.7590175    2.179363
       _cons |   46.58078   1.566806    29.73   0.000     43.50848    49.65309
------------------------------------------------------------------------------

The main result from model a.) is that each additional child in the family predicts a decrease of 1.84 points in the respondent’s SEI score, net of differences in age and sex.
b. SEI as a function of childbearing, for men only

. regress SEI CHILDS if SEX==1

      Source |       SS       df       MS              Number of obs =    1187
-------------+------------------------------           F(  1,  1185) =    6.08
       Model |  2230.41601     1  2230.41601           Prob > F      =  0.0138
    Residual |  434467.031  1185  366.638845           R-squared     =  0.0051
-------------+------------------------------           Adj R-squared =  0.0043
       Total |  436697.447  1186  368.210327           Root MSE      =  19.148

------------------------------------------------------------------------------
         SEI |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      CHILDS |  -.8011278    .324809    -2.47   0.014    -1.438393   -.1638629
       _cons |   49.99346    .776164    64.41   0.000     48.47065    51.51627
------------------------------------------------------------------------------
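The summary statistics in the header of the model b output can be recovered from its sums of squares and degrees of freedom. The following is a minimal sketch in Python rather than STATA, using only numbers copied from the output above; the variable names are my own and nothing is re-estimated from the data.

```python
import math

# Values copied from the model b output (men only); nothing here is re-estimated.
ss_model, df_model = 2230.41601, 1
ss_resid, df_resid = 434467.031, 1185
ss_total = 436697.447
coef_childs, se_childs = -0.8011278, 0.324809

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

# R-squared: share of the total variation in SEI accounted for by the model.
r_squared = ss_model / ss_total

# Adjusted R-squared penalizes for the number of predictors.
df_total = df_model + df_resid
adj_r_squared = 1 - (1 - r_squared) * df_total / df_resid

# Root MSE: square root of the residual mean square.
root_mse = math.sqrt(ms_resid)

# F compares the model mean square with the residual mean square;
# t is the coefficient divided by its standard error.
f_stat = ms_model / ms_resid
t_stat = coef_childs / se_childs

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE      = {root_mse:.3f}")
print(f"F             = {f_stat:.2f}")
print(f"t (CHILDS)    = {t_stat:.2f}")
```

With a single predictor, the F statistic is the square of the t statistic for CHILDS, which is why both tests report essentially the same p-value in this model.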
c. SEI as a function of childbearing, for women only

. regress SEI CHILDS if SEX==2

      Source |       SS       df       MS              Number of obs =    1450
-------------+------------------------------           F(  1,  1448) =   42.15
       Model |  15269.5867     1  15269.5867           Prob > F      =  0.0000
    Residual |  524596.511  1448  362.290408           R-squared     =  0.0283
-------------+------------------------------           Adj R-squared =  0.0276
       Total |  539866.097  1449  372.578397           Root MSE      =  19.034

------------------------------------------------------------------------------
         SEI |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      CHILDS |  -2.011947    .309907    -6.49   0.000    -2.619862   -1.404033
       _cons |   52.76572    .777605    67.86   0.000     51.24037    54.29108
------------------------------------------------------------------------------
d. SEI as a function of childbearing and age, for men only

. regress SEI CHILDS AGE if SEX==1

      Source |       SS       df       MS              Number of obs =    1184
-------------+------------------------------           F(  2,  1181) =   15.40
       Model |  11079.7773     2  5539.88867           Prob > F      =  0.0000
    Residual |  424931.431  1181  359.806462           R-squared     =  0.0254
-------------+------------------------------           Adj R-squared =  0.0238
       Total |  436011.209  1183  368.563997           Root MSE      =  18.969

------------------------------------------------------------------------------
         SEI |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      CHILDS |  -1.652731   .3664228    -4.51   0.000    -2.371644   -.9338193
         AGE |   .1873807   .0375427     4.99   0.000     .1137228    .2610386
       _cons |   42.76771   1.628161    26.27   0.000     39.57329    45.96212
------------------------------------------------------------------------------
e. SEI as a function of childbearing and age, for women only

. regress SEI CHILDS AGE if SEX==2

      Source |       SS       df       MS              Number of obs =    1444
-------------+------------------------------           F(  2,  1441) =   21.99
       Model |  15919.6435     2  7959.82176           Prob > F      =  0.0000
    Residual |   521520.01  1441  361.915343           R-squared     =  0.0296
-------------+------------------------------           Adj R-squared =  0.0283
       Total |  537439.653  1443  372.446052           Root MSE      =  19.024

------------------------------------------------------------------------------
         SEI |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      CHILDS |  -2.156846   .3332142    -6.47   0.000    -2.810483   -1.503209
         AGE |   .0309402   .0307616     1.01   0.315    -.0294022    .0912826
       _cons |   51.63514   1.432192    36.05   0.000     48.82574    54.44455
------------------------------------------------------------------------------
2. Interpret the coefficient for CHILDS in model 1b.

The coefficient for CHILDS from model b.) indicates that each additional child predicts a decrease of 0.80 points in a man’s SEI score.

3. Interpret the coefficient for the constant in model 1b.

The coefficient for _cons from model b.) indicates that the predicted SEI score for a man with no children is 49.99.

4. What does R-squared in model 1b. signify?

The R-squared of .0051 in model 1b.) indicates that the variable CHILDS explains only 0.51% of the variance in the variable SEI in the analysis for men. This is lower than the R-squared of .0283 in the model for women, but the “true” strength of the relationship between CHILDS and SEI for men might be understated due to the confounding effects of age.

5. What does Root MSE in model 1b signify?

The root MSE of 19.148 in model 1b. indicates that when we use CHILDS to predict SEI for men, our prediction is typically off by 19.148 points on the SEI scale. (Root MSE is the typical difference between the observed value y_i and the predicted value yhat_i.)

6. Describe and explain the results from models 1b to 1e, using coefficients as needed to support your conclusions. Questions to answer include: “Is childbearing associated with lower socioeconomic success?” “Are results similar or different across gender categories?” “How does age act as a potential confounding variable for men?”
In model b.) I look at the relationship between number of children and SEI only for men, and without additional controls for age. The main result from model b.) is that each additional child predicts a decrease of 0.80 points in the man’s SEI score. However, this model does not control for age, so this result might be confounded if older men average more children and have higher SEI scores for reasons unrelated to family size.

In model c.) I look at the relationship between number of children and SEI only for women, and without additional controls for age. The main result from model c.) is that each additional child predicts a decrease of 2.01 points in the woman’s SEI score. Because the coefficient in the parallel model for men was only -.80, the relationship between number of children and SEI appears to be stronger for women than for men. Note that I haven’t controlled for age yet, so this story might change.

In model d.) I look at the relationship between number of children and SEI only for men, and with additional controls for age. The main result from model d.) is that each additional child predicts a decrease of 1.65 points in the man’s SEI score. Because the coefficient in the model for men without age was only -.80, this model suggests that number of children is actually a fairly strong predictor of low SEI for men, but that age is a confounding variable that partly conceals this relationship.

In model e.) I look at the relationship between number of children and SEI only for women, and with additional controls for age. The main result from model e.) is that each additional child predicts a decrease of 2.16 points in the woman’s SEI score. Because the coefficient in the model for women without age was -2.01, this model suggests that age did not act (much) as a confounding variable for women.

Comparing the coefficient of -1.65 for model d.) and -2.16 for model e.), I conclude that a higher number of children predicts a lower SEI score for both men and women, perhaps more so for women.
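The way age masks the CHILDS coefficient for men can be made concrete with the omitted-variable relationship: the CHILDS coefficient without age is approximately the CHILDS coefficient with age, plus the AGE coefficient times the slope of AGE on CHILDS. The sketch below, in Python rather than STATA, backs out that implied slope from the coefficients reported in models b through e. It is only an approximation, because the models with and without AGE were fit on slightly different samples (1187 vs. 1184 men, 1450 vs. 1444 women); the function and variable names are my own.

```python
# Coefficients copied from models b-e above.
# "short" = CHILDS coefficient without AGE, "long" = CHILDS coefficient with AGE,
# "age"   = AGE coefficient in the model that includes it.
models = {
    "men":   {"short": -0.8011278, "long": -1.652731, "age": 0.1873807},
    "women": {"short": -2.011947,  "long": -2.156846, "age": 0.0309402},
}

def implied_age_per_child(short, long, age):
    """Years of AGE per additional child implied by short = long + age * delta."""
    return (short - long) / age

results = {}
for group, m in models.items():
    shift = m["short"] - m["long"]  # how much omitting AGE moves the CHILDS coefficient
    delta = implied_age_per_child(m["short"], m["long"], m["age"])
    results[group] = (shift, delta)
    print(f"{group}: omitting AGE shifts CHILDS by {shift:+.3f}; "
          f"implied AGE difference per child = {delta:.2f} years")
```

Under these assumptions, the implied age difference per additional child is similar for men and women (between four and five years), so the much larger shift for men comes mainly from the larger AGE coefficient in the men’s model, not from a different age pattern of childbearing.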
7. Venture some interpretations for your findings in part 6.
I am not surprised that the coefficient for CHILDS is more negative in the full model for women (-2.16) than in the full model for men (-1.65), because I would expect the conflicts between work and family that arise in large families to have a greater impact on women’s careers than on men’s careers. I also think that a large part of the coefficient for CHILDS might be related to variables that I have not measured, because I believe there is a general relationship between lower social class and larger family size, such that lower-class persons might have both lower SEI scores and larger numbers of children.
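One rough way to judge whether the gap between the men’s and women’s CHILDS coefficients is larger than sampling noise is a large-sample z-test for the difference between two coefficients estimated on independent samples. This check is not part of the original analysis; it is a sketch in Python rather than STATA that uses only the coefficients and standard errors reported above, and the function name is my own. A pooled model with a CHILDS-by-SEX interaction would be the more usual approach.

```python
import math
from statistics import NormalDist

def coefficient_gap_z(b1, se1, b2, se2):
    """z statistic and two-sided p-value for b1 - b2, assuming independent samples."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Men vs. women, without age controls (models b and c).
z_no_age, p_no_age = coefficient_gap_z(-0.8011278, 0.324809, -2.011947, 0.309907)

# Men vs. women, with age controls (models d and e).
z_age, p_age = coefficient_gap_z(-1.652731, 0.3664228, -2.156846, 0.3332142)

print(f"without AGE: z = {z_no_age:.2f}, p = {p_no_age:.3f}")
print(f"with AGE:    z = {z_age:.2f}, p = {p_age:.3f}")
```

By this approximation the gender gap is roughly 2.7 standard errors wide before controlling for age but only about 1 standard error wide afterward, which is consistent with the hedged wording “perhaps more so for women.”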
8. Obviously this analysis is not free of statistical problems.

a. Identify possible problems with this analysis, such as violations of assumptions of the regression model, and the specific results that lead you to think the problems might exist.
The first problem I always look for is nonlinearity in the relationship between Y and X. To check for this problem, I summarize Y for each category of X. (I reformatted the log output to save paper.)

. by CHILDS: summarize SEI if SEX==1   (for men)

      CHILDS |   Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           0 |   408   49.48529   19.44575     21    97
           1 |   192   46.26563   17.12343     20    92
           2 |   259   50.66795   19.70986     17    97
           3 |   193   50.21244   19.77481     22    97
           4 |    68   46.19118   19.07977     21    90
           5 |    27   42.55556     19.186     26    85
           6 |    15       45.4   17.54911     28    90
           7 |     6   37.33333   11.12954     22    50
           8 |    19   35.47368   15.00136     21    75

. by CHILDS: summarize SEI if SEX==2   (for women)

      CHILDS |   Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           0 |   345   52.48696   19.12932     18    97
           1 |   261   49.50575   19.71033     17    97
           2 |   374   49.83422   19.50498     17    88
           3 |   269   47.88104   18.94495     18    96
           4 |   111   42.91892     16.564     17    78
           5 |    44   42.70455   19.21794     22    92
           6 |    23   38.08696   13.93137     21    76
           7 |    15       36.6   17.57758     22    78
           8 |     8         36   21.16601     22    73

The main problem here is that the relationship between CHILDS and SEI is not linear for men, but peaks at 2 children. Part of the reason I got smaller results for men than for women could have been that I misspecified a nonlinear relationship for men using a linear regression model.

For men there is also some evidence of heteroskedasticity, in that the standard deviation for SEI scores is smaller at the highest numbers of children (for example, the S.D. is only 11.13 at CHILDS == 7). This problem might create some bias in the model’s estimates of standard errors.

One additional concern might be the small numbers of cases at the largest family sizes. Rare events like CHILDS == 8 could be having disproportionate influence on the overall model, especially since that category of children has very low SEI scores.

b. How might you change the study to correct the problems?
One solution to the nonlinearity problem might be to use a quadratic specification that includes both CHILDS and its square, CHILDS². Another would be to separate CHILDS into a series of dichotomous variables, and restate the problem as a comparison of each number of children to the reference category, CHILDS = 2. For heteroskedasticity, some researchers recommend robust variance estimation, but I don’t see this problem as big enough to warrant a separate fix. Switching to dichotomous categories for CHILDS would also address the problem of influential observations, because each category for number of children would have its own estimate and standard error.
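To illustrate what adding a squared term for CHILDS would do for men, the sketch below fits a straight line and a quadratic to the men’s group means from part 8a, weighting each mean by its number of observations. It is written in Python rather than STATA and uses only the numbers in that table. With a discrete predictor, this weighted fit to group means gives the same coefficients as a regression on the individual cases, so the linear fit should reproduce model b. AGE is not available in the table, so this is not a re-estimate of model d, and the helper functions are my own.

```python
# Group sizes and mean SEI by number of children, men only (table in part 8a).
childs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
n_obs = [408, 192, 259, 193, 68, 27, 15, 6, 19]
mean_sei = [49.48529, 46.26563, 50.66795, 50.21244, 46.19118,
            42.55556, 45.4, 37.33333, 35.47368]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def weighted_poly_fit(degree):
    """Weighted least squares of mean SEI on powers of CHILDS (weights = group sizes)."""
    powers = range(degree + 1)
    xtwx = [[sum(w * x ** (i + j) for w, x in zip(n_obs, childs)) for j in powers]
            for i in powers]
    xtwy = [sum(w * y * x ** i for w, x, y in zip(n_obs, childs, mean_sei))
            for i in powers]
    return solve(xtwx, xtwy)  # coefficients ordered: constant, CHILDS, CHILDS squared

def weighted_sse(coefs):
    """Between-group residual sum of squares for a fitted polynomial."""
    return sum(w * (y - sum(c * x ** k for k, c in enumerate(coefs))) ** 2
               for w, x, y in zip(n_obs, childs, mean_sei))

linear = weighted_poly_fit(1)
quadratic = weighted_poly_fit(2)
turning_point = -quadratic[1] / (2 * quadratic[2])

print("linear:   ", [round(c, 4) for c in linear])
print("quadratic:", [round(c, 4) for c in quadratic])
print("turning point of the quadratic (children):", round(turning_point, 2))
print("between-group SSE, linear vs quadratic:",
      round(weighted_sse(linear), 1), round(weighted_sse(quadratic), 1))
```

A negative coefficient on the squared term, with a turning point at a small positive number of children, would be consistent with the pattern described in part 8a, where mean SEI for men is highest at a low but nonzero family size.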