Posted on: 12/14/2011. Public Domain.
Multivariate Analysis

One-way ANOVA
Tests the difference in the means of 2 or more nominal groups
  E.g., High vs. Medium vs. Low exposure
Can be used with more than one IV
  Two-way ANOVA, Three-way ANOVA, etc.

ANOVA
_______-way ANOVA
  Number refers to the number of IVs
Tests whether there are differences in the means of IV groups
  E.g.:
  Experimental vs. control group
  Women vs. Men
  High vs. Medium vs. Low exposure

Logic of ANOVA
Variance partitioned into:
  1. Systematic variance: the result of the influence of the IVs
  2. Error variance: the result of unknown factors
The variation in scores is partitioned into two parts by calculating the “sum of squares”:
  1. Between groups variation (systematic)
  2. Within groups variation (error)
SS total = SS between + SS within

Significant and Non-significant Differences
  Significant:      Between > Within
  Non-significant:  Within > Between

Partitioning the Variance
Comparisons
  Total variation = score - grand mean
  Between variation = group mean - grand mean
  Within variation = score - group mean
Each deviation is taken, then squared, then summed across cases
  Hence the term “Sum of squares” (SS)

One-way ANOVA example
Total SS (deviation from grand mean)
Scores:

              Group A   Group B   Group C
                 49        56        54
                 52        57        52
                 52        57        56
                 53        60        50
                 49        60        53
  Mean =         51        58        53
  Grand mean = 54

One-way ANOVA example
Total SS (deviation from grand mean)
Deviation of each score from the grand mean, and its square:

       Group A            Group B            Group C
   Dev.   Squared     Dev.   Squared     Dev.   Squared
    -5      25          2       4          0       0
    -2       4          3       9         -2       4
    -2       4          3       9          2       4
    -1       1          6      36         -4      16
    -5      25          6      36         -1       1
  Sum of squares = 59 + 94 + 25 = 178

One-way ANOVA example
Between SS (group mean - grand mean)

                                      A     B     C
  Group means                        51    58    53
  Group deviation from grand mean    -3     4    -1
  Squared deviation                   9    16     1
  n(squared deviation)               45    80     5
  Between SS = 45 + 80 + 5 = 130
  Grand mean = 54

One-way ANOVA example
Within SS (score - group mean)

                                 A     B     C
  Group means                   51    58    53
  Deviation from group means    -2    -2     1
                                 1    -1    -1
                                 1    -1     3
                                 2     2    -3
                                -2     2     0
  Squared deviations             4     4     1
                                 1     1     1
                                 1     1     9
                                 4     4     9
                                 4     4     0
  Within SS = 14 + 14 + 20 = 48

The F equation for ANOVA
F = [Between groups sum of squares / (k - 1)] / [Within groups sum of squares / (N - k)]
  N = total number of subjects
  k = number of groups
  Numerator = Mean square between groups
  Denominator = Mean square within groups

F-table page 1
95% POINTS FOR THE F DISTRIBUTION, Page 1
Columns: numerator degrees of freedom. Rows: denominator degrees of freedom.

    df     1     2     3     4     5     6     7     8     9    10
     1   161   199   216   225   230   234   237   239   241   242
     2  18.5  19.0  19.2  19.2  19.3  19.3  19.4  19.4  19.4  19.4
     3  10.1  9.55  9.28  9.12  9.01  8.94  8.89  8.85  8.81  8.79
     4  7.71  6.94  6.59  6.39  6.26  6.16  6.09  6.04  6.00  5.96
     5  6.61  5.79  5.41  5.19  5.05  4.95  4.88  4.82  4.77  4.74
     6  5.99  5.14  4.76  4.53  4.39  4.28  4.21  4.15  4.10  4.06
     7  5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73  3.68  3.64
     8  5.32  4.46  4.07  3.84  3.69  3.58  3.50  3.44  3.39  3.35
     9  5.12  4.26  3.86  3.63  3.48  3.37  3.29  3.23  3.18  3.14
    10  4.96  4.10  3.71  3.48  3.33  3.22  3.14  3.07  3.02  2.98
    11  4.84  3.98  3.59  3.36  3.20  3.09  3.01  2.95  2.90  2.85
    12  4.75  3.89  3.49  3.26  3.11  3.00  2.91  2.85  2.80  2.75
    13  4.67  3.81  3.41  3.18  3.03  2.92  2.83  2.77  2.71  2.67
    14  4.60  3.74  3.34  3.11  2.96  2.85  2.76  2.70  2.65  2.60
    15  4.54  3.68  3.29  3.06  2.90  2.79  2.71  2.64  2.59  2.54
    16  4.49  3.63  3.24  3.01  2.85  2.74  2.66  2.59  2.54  2.49
    17  4.45  3.59  3.20  2.96  2.81  2.70  2.61  2.55  2.49  2.45
    18  4.41  3.55  3.16  2.93  2.77  2.66  2.58  2.51  2.46  2.41
    19  4.38  3.52  3.13  2.90  2.74  2.63  2.54  2.48  2.42  2.38
    20  4.35  3.49  3.10  2.87  2.71  2.60  2.51  2.45  2.39  2.35
    21  4.32  3.47  3.07  2.84  2.68  2.57  2.49  2.42  2.37  2.32
    22  4.30  3.44  3.05  2.82  2.66  2.55  2.46  2.40  2.34  2.30
    23  4.28  3.42  3.03  2.80  2.64  2.53  2.44  2.37  2.32  2.27
    24  4.26  3.40  3.01  2.78  2.62  2.51  2.42  2.36  2.30  2.25
    25  4.24  3.39  2.99  2.76  2.60  2.49  2.40  2.34  2.28  2.24
    26  4.23  3.37  2.98  2.74  2.59  2.47  2.39  2.32  2.27  2.22
    27  4.21  3.35  2.96  2.73  2.57  2.46  2.37  2.31  2.25  2.20
    28  4.20  3.34  2.95  2.71  2.56  2.45  2.36  2.29  2.24  2.19
    29  4.18  3.33  2.93  2.70  2.55  2.43  2.35  2.28  2.22  2.18
    30  4.17  3.32  2.92  2.69  2.53  2.42  2.33  2.27  2.21  2.16
    35  4.12  3.27  2.87  2.64  2.49  2.37  2.29  2.22  2.16  2.11
    40  4.08  3.23  2.84  2.61  2.45  2.34  2.25  2.18  2.12  2.08
    50  4.03  3.18  2.79  2.56  2.40  2.29  2.20  2.13  2.07  2.03
    60  4.00  3.15  2.76  2.53  2.37  2.25  2.17  2.10  2.04  1.99
    70  3.98  3.13  2.74  2.50  2.35  2.23  2.14  2.07  2.02  1.97
    80  3.96  3.11  2.72  2.49  2.33  2.21  2.13  2.06  2.00  1.95
   100  3.94  3.09  2.70  2.46  2.31  2.19  2.10  2.03  1.97  1.93
   150  3.90  3.06  2.66  2.43  2.27  2.16  2.07  2.00  1.94  1.89
   300  3.87  3.03  2.63  2.40  2.24  2.13  2.04  1.97  1.91  1.86
  1000  3.85  3.00  2.61  2.38  2.22  2.11  2.02  1.95  1.89  1.84

Significance of F
F-critical is 3.89 (2, 12 df)
F observed = (130 / 2) / (48 / 12) = 65 / 4 = 16.25
F observed 16.25 > F critical 3.89
Groups are significantly different
  t-tests could then be run to determine which groups are significantly different from which other groups

Computer Printout Example

Descriptives
GAVE 'THE FINGER' TO SOMEONE WHILE DRIVI

                                                     95% Confidence Interval for Mean
          N      Mean     Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum  Maximum
  1.00    1462   1.7148   1.28915         .03372      1.6486       1.7809       1.00     7.00
  2.00    1858   1.3660   .93491          .02169      1.3234       1.4085       1.00     7.00
  Total   3320   1.5196   1.11830         .01941      1.4815       1.5576       1.00     7.00

ANOVA
GAVE 'THE FINGER' TO SOMEONE WHILE DRIVI

                   Sum of Squares   df     Mean Square   F        Sig.
  Between Groups   99.536           1      99.536        81.522   .000
  Within Groups    4051.191         3318   1.221
  Total            4150.727         3319

Two-way ANOVA
ANOVA compares:
  Between and within groups variance
Adds a second IV to one-way ANOVA
  2 IVs and 1 DV
Analyzes significance of:
  Main effects of each IV
  Interaction effect of the IVs

Graphs of potential outcomes
  No main effects or interactions
  Main effects of color only
  Main effects for motion only
  Main effects for color and motion
  Interactions

Graphs
[Line graphs, one blank and one plotted for each outcome. Vertical axis: AROUSAL. Horizontal axis: Color, B&W. Plotted lines: x = Motion, * = Still.]
  No main effects or interactions
  Main effects for color only
  Main effects for motion only
  Main effects for color and motion
  Transverse interaction
  Interaction: color only makes a difference for motion

Partitioning the variance for Two-way ANOVA
Total variation = Main effect variable 1 + Main effect variable 2 + Interaction + Residual (within)

Summary Table for Two-way ANOVA

  Source          SS    df    MS    F
  Main effect 1
  Main effect 2
  Interaction
  Within
  Total

Printout Example

Tests of Between-Subjects Effects
Dependent Variable: MARIJUANA USE SHOULD BE LEGALIZED

  Source            Type III Sum of Squares   df     Mean Square   F          Sig.
  Corrected Model   74.465 (a)                7      10.638        3.392      .001
  Intercept         5889.077                  1      5889.077      1877.565   .000
  SEX               13.191                    1      13.191        4.205      .040
  RACE2             19.048                    3      6.349         2.024      .108
  SEX * RACE2       .560                      3      .187          .060       .981
  Error             10366.297                 3305   3.137
  Total             31942.000                 3313
  Corrected Total   10440.762                 3312
  a. R Squared = .007 (Adjusted R Squared = .005)

Printout plot
[Line plot: Estimated Marginal Means of MARIJUANA USE S (title truncated in the printout). Horizontal axis: SEX OF RESPONDENT (1.00, 2.00). Separate lines: RACE OF RESPONDENT(W (label truncated in the printout), values 1.00 to 4.00. Vertical axis: 2.2 to 3.2.]

Scatter Plot of Price and Attendance
[Scatter plot. Vertical axis: Attendance (0.5 to 2.5). Horizontal axis: Price (2 to 6).]
Price is the average seat price for a single regular season game in today’s dollars.
Attendance is total annual attendance, in millions of people.
Is there a relation there?
Let’s use linear regression to find out; that is, let’s fit a straight line to the data.
But aren’t there lots of straight lines that could fit? Yes!

Desirable Properties
We would like the “closest” line, that is, the one that minimizes the error.
  The idea here is that there is actually a relation, but there is also noise. We would like the noise (i.e., the deviation from the postulated straight line) to be as small as possible.
We would like the error (or noise) to be unrelated to the independent variable (in this case price).
  If it were related, it would not be noise, right?

Simple Regression
The simple linear regression MODEL is:
  y = β0 + β1x + ε
The model describes how y is related to x.
β0 and β1 are called parameters of the model.
ε is a random variable called the error term.

Simple Regression
Graph of the regression equation is a straight line.
β0 is the population y-intercept of the regression line.
β1 is the population slope of the regression line.
E(y) is the expected value of y for a given x value.

Simple Regression
[Graph: E(y) against x. Regression line with intercept β0; slope β1 is positive.]

Simple Regression
[Graph: E(y) against x. Regression line with intercept β0; slope β1 is 0.]

Types of Regression Models
Regression models
  1 explanatory variable: Simple
    Linear
    Non-linear
  2+ explanatory variables: Multiple
    Linear
    Non-linear

Regression Modeling Steps
1. Hypothesize Deterministic Components
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Linear Multiple Regression Model
1. Relationship between 1 dependent & 2 or more independent variables is a linear function

  Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi

  Yi: dependent (response) variable
  β0: population Y-intercept
  β1, ..., βk: population slopes
  X1i, ..., Xki: independent (explanatory) variables
  εi: random error

Multiple Regression Model
Multivariate model
[Graph: response plane over the X1 and X2 axes, with intercept β0, an observed Y above the point (X1i, X2i), and its error εi.]
  Yi = β0 + β1X1i + β2X2i + εi     (observed Y)
  E(Y) = β0 + β1X1i + β2X2i        (response plane)
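
Estimating the multiple regression model
One common way to estimate β0, β1 and β2 in the two-variable model is least squares. The sketch below is only an illustration under stated assumptions: the data arrays x1, x2 and y are made up for the example (they are not from the slides), and the names b0, b1, b2, fitted and residuals are hypothetical labels for the estimates, the points on the estimated response plane, and the estimated errors.

```python
import numpy as np

# Hypothetical illustrative data (NOT from the slides): two explanatory
# variables and one response, six observations each.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 4.9, 5.2, 7.1, 7.0, 9.2])

# Design matrix: a leading column of ones carries the intercept β0,
# the other two columns carry X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of (β0, β1, β2).
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

# Points on the estimated response plane, and the estimated errors εi.
fitted = X @ coef
residuals = y - fitted

print("b0, b1, b2 =", b0, b1, b2)
print("sum of residuals =", residuals.sum())
```

With an intercept in the model, the least-squares residuals sum to zero and are uncorrelated in the sample with each explanatory variable, which matches the earlier requirement that the error be unrelated to the independent variable.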