
Hypothesis Tests with Means

Sampling Distribution of the Mean
The distribution of the means of an infinite number of samples of size n
◦ This could be with respect to a single mean, a difference between independent means, etc.
Note: This forms the basis for generating critical values, p-values, etc.

Central Limit Theorem
The sampling distribution of the mean approaches normal, and this tendency increases with n
The mean of the sampling distribution (μX̄) equals μ, and the variance of the sampling distribution (σ²X̄) equals σ²/n

Extensions of the C.L.T.
Regardless of the shape of the original raw score population, as n increases the distribution of sample means will approach normal
As a rule of thumb, for a sample size of 30 or greater even a markedly skewed raw score population will generate an approximately normal sampling distribution of the mean

Z-test for a Single Mean (σ Known)
Z = (X - μ) / σ was used to test the relative position of a score (X) within a distribution of scores with mean μ and standard deviation σ
Z = (X̄ - μ) / σX̄ is used to test the relative position of a mean within a sampling distribution of means with mean μ and standard deviation σX̄ = σ / √n

Z-test for a Single Mean Example
Dr. Brown is interested in whether a group of students who have followed strict vegetarian diets (n = 25) since early childhood will have significantly different IQs than the general population (μ = 100, σ = 15; α = .01)
Ho: μv = 100
Ha: μv ≠ 100
X̄ = 91
Z = (X̄ - μ) / σX̄ = (91 - 100) / (15 / √25) = -3
The probability of obtaining a Z of -3 or lower is .0013 (two-tailed p = .0027)
With α = .01 (two-tailed) we can conclude that the vegetarian diet group scored significantly lower than the general population in IQ (i.e., .0013 < .005)
◦ Thought: What would a good effect size be?

How can I do a One-Sample z test in R?
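One way to check the arithmetic of the vegetarian-diet example is to compute the z statistic by hand in base R from the summary values alone. This is only a sketch: the variable names (xbar, mu0, sigma, n) are illustrative and do not come from the slides.

```r
# Summary values taken from the vegetarian-diet example
xbar  <- 91    # sample mean
mu0   <- 100   # hypothesized population mean
sigma <- 15    # known population standard deviation
n     <- 25    # sample size

se <- sigma / sqrt(n)          # standard error of the mean
z  <- (xbar - mu0) / se        # z statistic

p_lower <- pnorm(z)            # one-tailed probability, P(Z <= z)
p_two   <- 2 * pnorm(-abs(z))  # two-tailed p-value

round(c(z = z, p_lower = p_lower, p_two = p_two), 4)
```

The rounded values should agree with the slide's Z = -3 and one-tailed probability of .0013. The two-tailed value is what a two-sided test function reports.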
    iq <- c(95, 105, 105, 108, 85, 87, 83, 80, 120, 72,
            107, 94, 90, 67, 71, 63, 84, 100, 83, 87, 102, 103,
            90, 101, 93)
    install.packages("TeachingDemos")
    library(TeachingDemos)
    z.test(iq, mu = 100, sd = 15)

◦ One Sample z-test
◦ data: iq
◦ z = -3, n = 25, Std. Dev. = 15, Std. Dev. of the sample mean = 3, p-value = 0.0027

t-test for a Single Mean
The logic of the z-test for a single mean and the t-test for a single mean is the same, although the t-test is adopted when we do not have information about the population standard deviation
One of the consequences of using s² as an estimate of σ² is that the sampling distribution is no longer standard normal (unless n is large)

Standard Normal and t distributions
More specifically, for small sample sizes s² will often underestimate σ² and result in a test statistic (t) that is larger than what would have been found if we (could have) used the true value of σ²
Therefore, the t-distribution is flatter than the standard normal (especially for small n) and requires larger test statistic values for significance

Student’s t-distribution

t-test for a Single Mean (σ Unknown)
t = (X̄ - μ) / sX̄ is used to test the relative position of a mean within a sampling distribution of means with mean μ, standard deviation sX̄ = s / √n, and df = n - 1
The statistical decision rule we use is to reject Ho: μ = ? if |t| ≥ tα,df (for a two-tailed test)

t-test for a Single Mean Example
Dr.
Brown is interested in whether a group of students who have followed strict vegetarian diets (n = 25) since early childhood will have significantly different IQs than the general population (α = .01)
Ho: μv = 100
Ha: μv ≠ 100
X̄ = 94.36, s = 9.76
t = (X̄ - μ) / sX̄ = (94.36 - 100) / (9.76 / √25) = -2.89
The one-tailed probability of obtaining a t of -2.89 or lower with df = 24 is .004 (one-tailed critical value t.01,24 = -2.492); the two-tailed probability is about .008 (two-tailed critical value t.01,24 = ±2.797)
With α = .01 we can conclude that the vegetarian diet group scored significantly lower than the general population in IQ (the obtained t exceeds the critical value in either case)

How can I do a One-Sample t test in R?

    iq <- c(85, 105, 105, 95, 85, 107, 83, 80, 110, 92,
            107, 94, 90, 94, 94, 80, 84, 100, 83, 87, 102, 103, 90,
            111, 93)
    t.test(iq, mu = 100, alternative = "less")

◦ One Sample t-test
◦ data: iq
◦ t = -2.8896, df = 24, p-value = 0.004027
◦ alternative hypothesis: true mean is less than 100
◦ Note: this output is the one-tailed test; leaving out the alternative argument gives the two-tailed test, with a p-value of about .008

Effect Size
The fact that an effect is statistically significant does not necessarily mean that the effect has any “practical significance”
As n increases, the probability of finding even minute differences between means statistically significant approaches 1
Effect size measures help to clarify the meaning of significant or nonsignificant effects

t-test for a Single Mean: Effect Size
In order to quantify how large the difference is between the sample mean and the hypothesized value, Cohen’s d can be used: d = (X̄ - μ) / s
◦ For our example: d = (94.36 - 100) / 9.76 = -0.58

Guidelines for Interpreting Cohen’s d
Cohen provided the following guidelines for interpreting the absolute value of d:
◦ .20-.50 is a small effect
◦ .50-.80 is a medium effect
◦ .80 + is a large effect
Therefore, for our experiment we can conclude that the effect is statistically significant and has a moderate effect size
◦ Note that even the raw difference between the sample mean and the hypothesized value could be used as an appropriate measure of effect size

Confidence Intervals for the One-Sample t-test
Confidence interval: If samples of size n are drawn repeatedly from a population, and a 95% CI is calculated from each sample, then 95% of these
intervals should contain the population mean
(1 - α)100% CI = X̄ ± tα,df sX̄
For our previous example:
◦ 99% CI = 94.36 ± 2.797 (1.95) = {88.91, 99.81}
The fact that the CI does not include 100 is consistent with our previous hypothesis testing conclusion

Single Sample Inference with Nonnormal Distributions
Valid use of the one-sample t-test requires that
◦ A) The observations are independent of one another (random sampling from the population)
◦ B) The population distribution is normal in form
When the population distribution is not normal, the probability of spurious results (Type I/Type II errors) increases (and this tendency is greatest with small n)
◦ Thought: Why does the CLT not completely fix this problem?
In this case we can use a nonparametric test

Wilcoxon’s Signed Rank Test
When the distribution shape is nonnormal we can use the signed rank test to make more valid inferences
Let’s say we want to determine if the average IQ of Psych grad students is greater than 110 (Ho: μ = 110; H1: μ > 110)
◦ Data: 115, 123, 128, 116, 106, 91, 113
◦ Data - 110: 5, 13, 18, 6, -4, -19, 3
◦ Ordered (regardless of sign): 3, -4, 5, 6, 13, 18, -19
◦ Signed Ranks: 1, -2, 3, 4, 5, 6, -7

Wilcoxon Signed Ranks Example
We take the smaller of the sum of the positive ranks (19) and the absolute value of the sum of the negative ranks (9) as our test statistic (T)
◦ Thus, we compare T = 9 to our critical value in Appendix T (which is based on n, α, and whether the test is one-tailed or two-tailed)
◦ At α = .05 (one-tailed) we would need a T of 3 or less to reject Ho (hence we do not reject the null hypothesis; we lack evidence that grad student IQ is greater than 110)

How can I do a Wilcoxon Signed Ranks test in R?
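The ranking steps for the grad student data can also be reproduced by hand in base R. The following is a minimal sketch; the variable names are illustrative, and it assumes no zero differences and no tied absolute differences (true for these seven scores).

```r
# Grad student IQ scores from the example; 110 is the hypothesized value
scores <- c(115, 123, 128, 116, 106, 91, 113)

d <- scores - 110                 # differences from the hypothesized value
r <- rank(abs(d))                 # ranks of the absolute differences
signed_ranks <- sign(d) * r       # attach the sign of each difference

sum_pos <- sum(r[d > 0])          # sum of the positive ranks
sum_neg <- sum(r[d < 0])          # sum of the negative ranks (absolute value)
T_stat  <- min(sum_pos, sum_neg)  # statistic compared with the table value

# Exact one-tailed p-value for H1: location > 110, i.e. P(V >= sum_pos) under Ho
p_one <- 1 - psignrank(sum_pos - 1, n = length(d))

c(sum_pos = sum_pos, sum_neg = sum_neg, T = T_stat, p_one = p_one)
```

The sums match the 19 and 9 on the slide, so T = 9. The exact p-value comes from the signed rank distribution (psignrank in the stats package) rather than from a printed table.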
    iq <- c(115, 123, 128, 116, 106, 91, 113)
    wilcox.test(iq, mu = 110, alternative = "greater")

◦ Wilcoxon signed rank test
◦ data: iq
◦ V = 19, p-value = 0.2344
◦ alternative hypothesis: true location is greater than 110
◦ Note: R reports V, the sum of the positive ranks (19), rather than the smaller rank sum T = 9

Two Independent-Samples t-test
The two independent-samples t-test is much more common in empirical studies than is the one-sample t-test
The primary reason is that we rarely have information about population means (or even legitimate comparison values), so we compare two sample means (where often one group is a control group)

Sampling Distribution of the Difference between Means
The mean of the sampling distribution of the difference between means is μ1 - μ2 and the variance is σ²1/n1 + σ²2/n2
From this we could deduce that the formula for the two independent-samples t is:

$$ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Two Independent-Samples t
However, if we assume that the variances are equal (more on that to come) we can take a weighted average of the variances (i.e., compute a pooled estimate of the variance, s²p)
This statistic will be:

$$ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} $$

This statistic is distributed as t with n1 + n2 - 2 degrees of freedom
The assumptions of this statistic are:
◦ 1. Subjects are randomly and independently selected from their respective populations
◦ 2. Population variances are equal
◦ 3. Population distributions are normal in form

Two Independent-Samples t Example
Dr. Stein would like to know if there are motivational differences between students in an 8:30 a.m. class and students in an 11:30 a.m.
class (α = .10)
She posts a notice for students to sign up to fill out a questionnaire on “achievement motivation”
5 and 15 students from the 8:30 and 11:30 classes, respectively, show up to fill out the questionnaire
Results (achievement motivation scores):
◦ 8:30: 12, 1, 14, 15, 2
◦ 11:30: 6, 4, 4, 6, 3, 6, 3, 7, 5, 6, 5, 4, 5, 4, 6
X̄1 = 8.8, s1 = 6.76
X̄2 = 4.93, s2 = 1.22
Ho: μ1 = μ2, H1: μ1 ≠ μ2
df = n1 + n2 - 2 = 5 + 15 - 2 = 18
Decision Rules:
◦ If |t| ≥ tα,df then reject Ho
◦ If |t| < tα,df then do not reject Ho
  ▪ Note that if we were using R the two-tailed p-value would have been .039
t.10,18 = 1.734
Therefore, since our obtained t (2.23) is greater than our critical t (1.734) we reject the null hypothesis and conclude that motivation scores are significantly higher in the 8:30 a.m. class than in the 11:30 a.m. class
Using our two-tailed p-value (.039), we would make the same conclusion because .039 < .10

How can I do a two independent-samples t test in R?
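Before calling t.test, the pooled-variance t for the motivation data can be worked out step by step in base R. This is a sketch under the equal-variance assumption described earlier; the object names (early, late, sp2, and so on) are illustrative.

```r
# Achievement motivation scores from the example
early <- c(12, 1, 14, 15, 2)                                # 8:30 class
late  <- c(6, 4, 4, 6, 3, 6, 3, 7, 5, 6, 5, 4, 5, 4, 6)     # 11:30 class

n1 <- length(early)
n2 <- length(late)
df <- n1 + n2 - 2

# Pooled variance: the two sample variances weighted by their df
sp2 <- ((n1 - 1) * var(early) + (n2 - 1) * var(late)) / df

se     <- sqrt(sp2 * (1 / n1 + 1 / n2))      # standard error of the difference
t_stat <- (mean(early) - mean(late)) / se    # pooled-variance t
p_two  <- 2 * pt(-abs(t_stat), df)           # two-tailed p-value
t_crit <- qt(1 - 0.10 / 2, df)               # two-tailed critical t at alpha = .10

round(c(t = t_stat, df = df, p = p_two, t_crit = t_crit), 4)
```

The hand-computed t is positive because the 8:30 class is entered first. Its size and the two-tailed p-value should match the slide's t = 2.23 and p = .039.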
    motiv <- c(12, 1, 14, 15, 2, 6, 4, 4, 6, 3, 6, 3, 7, 5, 6, 5, 4, 5, 4, 6)
    class <- rep(c("8:30", "11:30"), c(5, 15))
    t.test(motiv ~ class, var.equal = TRUE, conf.level = .9)

◦ Two Sample t-test
◦ data: motiv by class
◦ t = -2.2257, df = 18, p-value = 0.03906
◦ alternative hypothesis: true difference in means is not equal to 0
◦ 90 % CI: -6.8792855 -0.8540479
◦ Note: t and the CI are negative because R orders the groups alphabetically ("11:30" before "8:30"), so the difference is 11:30 minus 8:30

Confidence Interval for the Difference between Means
90% CI = (X̄1 - X̄2) ± tα,df sX̄1-X̄2
For our previous example:
◦ 90% CI = (8.80 - 4.93) ± 1.734 (1.74) = {0.85, 6.89}
The fact that the CI does not include 0 is consistent with our previous conclusion that the difference between the means differs from 0

Cohen’s d
For population parameters, d = (μ1 - μ2) / σ
For sample statistics, d = (X̄1 - X̄2) / sp
◦ Where sp represents the square root of the pooled variance

Cohen’s d Example
From our previous example we can calculate d to be 1.15, which would be considered a large effect

Measures of Association Strength
Eta-squared (η²), or the squared point-biserial correlation (r²pb), provides a useful measure of the proportion of variability in the dependent variable that can be accounted for by variability in the independent variable
η² = r²pb = t² / (t² + df)
For our example η² = 4.97 / (4.97 + 18) = .22

Interpreting Eta-Squared
Cohen suggested the following guidelines for interpreting eta-squared
◦ .01-.05 is a small association
◦ .06-.14 is a medium association
◦ .15 + is a large association
Therefore, 22% of the variability in motivation is associated with the difference in the times of the classes, which we can interpret as a large effect

Omega-Squared
Some authors have reported that the eta-squared statistic is biased and have instead recommended a modified version of the eta-squared statistic, called omega-squared (ω²)
ω² = (t² - 1) / (t² + df + 1)
For our example: ω² = 3.97 / 23.97 = .17 (which would still be considered a large effect)

The Variance Homogeneity Assumption
In computing the two independent-samples t test in the previous example
one of the assumptions was that the variances of the two groups were equal
However, recall that s²1 = 45.7 and s²2 = 1.49
What effect does this have on our test statistic?
Not much, unless the sample sizes are unequal

Sample Size & Variance Heterogeneity
When both sample sizes and variances are unequal the t-test can become severely biased (with respect to Type I and Type II error rates)
Why is this? The pooled variance weights each sample variance by its degrees of freedom, so the larger sample dominates the estimate:

$$ s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} $$

Positively and Negatively Paired Sample Sizes and Variances
Liberal Test: When the larger n is paired with the smaller s² (and the smaller n is paired with the larger s²) the empirical Type I error rate is inflated
Conservative Test: When the larger n is paired with the larger s² (and the smaller n is paired with the smaller s²) the empirical Type I error rate is deflated

How to Detect Variance Heterogeneity (Levene)
Levene’s Test: Levene developed a test of variance homogeneity that tests the null hypothesis that the group variances are equal
◦ Therefore, a significant Levene test (i.e., p ≤ α, where α is usually set at .10) indicates that the variances are not equal
The ‘lawstat’ package in R has an excellent levene.test function that provides modified versions of the test based on the median or trimmed mean

How to Detect Variance Heterogeneity (Variance Ratios)
Another way to determine if the variances are unequal is to just look at the ratio of the largest to smallest variance
Ratios larger than 2:1 (for unequal ns) or 4:1 (for equal ns) indicate variance heterogeneity
Note that both methods for detecting variance heterogeneity are affected by nonnormality (although the Levene test is less affected, especially when used with the median or trimmed mean)

Welch (1938) Test Statistic
A statistic developed by Welch can be used to test for mean equality when the variances of the two groups are not equal
The Welch statistic is reported in SPSS as the two independent samples t with “equal variances not assumed”

$$ t' = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Sampling Distribution of t’
The sampling distribution of t’ is very difficult to completely determine, although t’ is approximately distributed as t with df’ degrees of freedom, where:

$$ df' = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{s_1^4}{n_1^2 (n_1 - 1)} + \dfrac{s_2^4}{n_2^2 (n_2 - 1)}} $$

Welch t for the Motivation Example
If we compare the motivation of the 8:30 and 11:30 a.m. classes using t’ we find that t’ = 1.27 with df’ = 4.09
Rounding off df’ to 4, we have a critical t of 2.132
Therefore, with the Welch t, there is no significant difference
Why might we find a significant difference between the 8:30 and 11:30 a.m. classes on motivation with the Student t, but not with the Welch t?
Take a look at the pattern of the sample sizes and variances (negatively paired)
◦ 8:30 class: n = 5, s1 = 6.76
◦ 11:30 class: n = 15, s2 = 1.22
The significant independent samples t test result may have been a Type I error

How can I do a two independent-samples Welch t test in R?

    motiv <- c(12, 1, 14, 15, 2, 6, 4, 4, 6, 3, 6, 3, 7, 5, 6, 5, 4, 5, 4, 6)
    class <- rep(c("8:30", "11:30"), c(5, 15))
    t.test(motiv ~ class, conf.level = .9)

◦ Welch Two Sample t-test
◦ data: motiv by class
◦ t = -1.2721, df = 4.088, p-value = 0.2709
◦ alternative hypothesis: true difference in means is not equal to 0
◦ 90 % confidence interval: -10.307118 2.573785

Welch t or Student t?
When variances are equal the Welch t (t’) is only slightly less powerful than the Student t
Further, when variances are unequal the Welch t maintains Type I error rates at the nominal level (i.e., α)
So why do we not always use Welch?
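For readers who want to see where the Welch values for the motivation example come from, t’ and df’ can be computed directly from the two samples in base R. This is a minimal sketch; the object names are illustrative.

```r
# Achievement motivation scores from the example
early <- c(12, 1, 14, 15, 2)                                # 8:30 class
late  <- c(6, 4, 4, 6, 3, 6, 3, 7, 5, 6, 5, 4, 5, 4, 6)     # 11:30 class

n1 <- length(early)
n2 <- length(late)

v1 <- var(early) / n1    # s1^2 / n1
v2 <- var(late)  / n2    # s2^2 / n2

t_welch  <- (mean(early) - mean(late)) / sqrt(v1 + v2)          # Welch t'
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))   # Welch df'
p_two    <- 2 * pt(-abs(t_welch), df_welch)                     # two-tailed p-value

round(c(t = t_welch, df = df_welch, p = p_two), 4)
```

Here v1^2 / (n1 - 1) is the same quantity as s1⁴ / (n1²(n1 - 1)) in the df’ formula. The 8:30 class contributes almost all of the standard error, which is why df’ falls close to that group's own degrees of freedom (n1 - 1 = 4).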
