Bootstrapping for statistical inference. T-Tests.
Jacob Seybert
9/17/09

A couple of quick notes:
• The “RUN;” statement in SAS: technically, you need only one at the end of a program. In practice, many people place one after every PROC step, which lets you submit selected blocks of commands.
• Hypothesis testing and the equality of variances: during lab there was confusion about the output line “If SDs are equal, estimate is:”. This refers to the assumption that the variances of the two populations are equal.

Assumption of homogeneity of variance
(Why the above caused some confusion)
• When the sample sizes are equal or nearly equal, relatively large differences in the population variances have relatively small consequences for t-tests.
• If you cannot get equal, or close to equal, sample sizes in each group, you correct the degrees of freedom.

Independent Sample T-Test
Using PROC TTEST:

    proc ttest data=Data1;
       *This is the grouping variable, which defines the two groups;
       class condition;
       var variable;
       title 'Independent t-test';
    run;

Using PROC GLM:

    proc glm data=Data1;
       *This is the grouping variable, which defines the two groups;
       class categoricalvar;
       model dependentvar = categoricalvar;
       *MEANS gives the mean and SD of each group on the continuous variable and is also used for post hoc tests. HOVTEST runs a test for homogeneity of variance (Levene test by default);
       means categoricalvar / hovtest;
    run;

Compare:
[Output: PROC TTEST vs. PROC GLM]

Tests for Equality of Variances: “Folded F”
• F = Variance1 (larger) / Variance2 (smaller).
• Very sensitive to violations of normality.
• Example: with group SDs of 10.09292 and 9.33082, F = 10.09292² / 9.33082² = 101.867 / 87.064 ≈ 1.17. (The ratio of the SDs themselves is only about 1.08.)

Tests for Equality of Variances: Levene’s Test of Homogeneity
• Runs an ANOVA on the deviations of each score from its group mean.
• Tested with an F distribution.

What to do if significant?
• T-test: examine the t statistic for the unequal-variances case (Satterthwaite). It is used just like the pooled-standard-error t-test discussed in class, except that it uses each group’s own variance instead of a pooled variance and adjusts the degrees of freedom.
   • Lab 7, Slide 11.
• ANOVA: the method of unweighted means. I will let Brannick explain that! (If he does.)
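To make the variance checks and the two t statistics concrete, here is a minimal sketch in Python (used instead of SAS so it runs anywhere). It is not how PROC TTEST or PROC GLM are implemented; the function names and the example scores are hypothetical. `levene_f` runs a one-way ANOVA on the deviations from each group mean, and its `squared` flag (an assumption of this sketch) switches between squared and absolute deviations. The last line reproduces the folded F arithmetic above, treating 10.09292 and 9.33082 as the two group SDs.

```python
import math
import statistics


def folded_f(x, y):
    """Folded F: the larger sample variance divided by the smaller one."""
    v1, v2 = statistics.variance(x), statistics.variance(y)
    return max(v1, v2) / min(v1, v2)


def levene_f(x, y, squared=True):
    """Levene-type test: one-way ANOVA F on each score's deviation from its group mean.

    squared=True uses squared deviations; squared=False uses absolute deviations.
    """
    groups = []
    for g in (x, y):
        m = statistics.fmean(g)
        groups.append([(v - m) ** 2 if squared else abs(v - m) for v in g])
    all_z = [z for g in groups for z in g]
    grand = statistics.fmean(all_z)
    k, n_total = len(groups), len(all_z)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((z - statistics.fmean(g)) ** 2 for g in groups for z in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(x, y):
    """Equal-variances t: pooled variance estimate, df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.fmean(x) - statistics.fmean(y)) / se, n1 + n2 - 2


def satterthwaite_t(x, y):
    """Unequal-variances t: separate variances, Satterthwaite-approximated df."""
    a = statistics.variance(x) / len(x)
    b = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical scores for two equal-sized conditions (not the lab's data).
    group1 = [4, 5, 6, 5, 7, 3]
    group2 = [9, 3, 10, 2, 8, 4]
    print("Folded F:", round(folded_f(group1, group2), 2))
    print("Levene F (squared deviations):", round(levene_f(group1, group2), 2))
    print("Pooled t, df:", pooled_t(group1, group2))
    print("Satterthwaite t, df:", satterthwaite_t(group1, group2))
    # Folded F from the slide, treating 10.09292 and 9.33082 as the group SDs:
    print(round(10.09292 ** 2 / 9.33082 ** 2, 2))  # 1.17
```

With equal group sizes, as in this example, the pooled and Satterthwaite t statistics come out identical and only the degrees of freedom differ, which is one way to see why equal n makes the equal-variance assumption matter less.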
Tests for Normality
• Contained in PROC UNIVARIATE (request the tests with the NORMAL option).
• Descriptive statistics:
   • Skewness (positive = positive skew, etc.)
   • Kurtosis: indicates how peaked the data are
• Provides 4 statistics:
   • Shapiro-Wilk: want W close to 1. Sample size: 7 ≤ N ≤ 2,000
   • Kolmogorov-Smirnov
   • Cramér-von Mises
   • Anderson-Darling

[Output: SAS Tests for Normality]

What to do if not normal?
• Look at how “not normal” the data are.
• Transform the data to be more normal:
   • Square root
   • Logarithmic
   • Many others…
• Examine the robustness of your statistic to violations of normality. We already discussed the implications of equality of variance.
• Bootstrap!

Questions about independent sample t-tests?

Homework 4

Bootstrapping!
• Bootstrapping is cool and common. You will hear about it in:
   • Univariate Statistics
   • Multivariate Statistics
   • Factor Analysis
   • Structural Equation Modeling
   • Item Response Theory
   • … and more I haven’t been exposed to.

What is bootstrapping?
• Randomly sampling, with replacement, from an original dataset to obtain statistical estimates.
   • Start with a set of values.
   • Randomly draw a value from the “population”.
   • The value stays in the available population of values (this is what “with replacement” means).
   • Randomly draw another value from the population.
   • Do this N times (N = the size of the original sample) to fill your dataset.
   • Perform an analysis on your dataset.
   • Do this 10,000 times.
   • Use the results of your 10,000 analyses to draw conclusions.

Why bootstrap? Good question.
• Small sample size.
• Little to no parametric modeling.
• Non-normal distribution of the sample.
• A test of means for two samples.
• Less sensitive to N.

What bootstrapping looks like:
• http://luna.cas.usf.edu/~mbrannic/files/software/boots_ind_t.sas
• http://luna.cas.usf.edu/~mbrannic/files/software/boots_correl.sas
• Both are written in SAS/IML.

Bootstrap T-Test Explained
• 2 samples.
• Create 10,000 pairs of resampled datasets from the original 2 samples (one from each sample per cycle).
• For each pair, run the t-test.
• Save the t value in a matrix.
• Sort that matrix of 10,000 t values from high to low.
• Select the middle 95% of the distribution, removing the extremes (the 250 highest and 250 lowest of the 10,000 values).
• This results in a “bootstrap confidence interval”.
• Want this interval to not contain zero. In our example: if the interval contained zero, we could not be confident that the difference between the population means is not zero.

Examine the distribution of t:
[Output: mean and SD of the t distribution]
[Output: the distribution of t]
[Output: t-test of original data vs. bootstrap]

Bootstrap Correlation Explained
• A sample of variable pairs.
• Create 10,000 resampled datasets from the original sample, drawing whole pairs (each X stays with its Y).
• For each dataset, find the correlation.
• Save the correlation values in a matrix.
• Sort that matrix of 10,000 correlations from high to low.
• Select the middle 95% of the distribution, removing the extremes.
• This results in a “bootstrap confidence interval”.
• Want this interval to not contain zero. In our example: if the interval contained zero, we could not be confident that the population correlation is not zero.

Examine the distribution of correlations:
[Output: mean and SD of the correlation distribution]
[Output: the distribution of correlations]
[Output: correlation of original data vs. bootstrap]

So what did we learn?
• Bootstrapping is interesting.
• It provides a distribution of statistical values from which to draw conclusions.
• Many applications are based on the same technique but serve different purposes.
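The resampling steps for the bootstrap t-test and the bootstrap correlation can be sketched in Python as follows. This is a hedged illustration, not a translation of the SAS/IML programs linked earlier: the function names, the example data, and the choice to redraw any resample in which t or r would be undefined (zero variance) are assumptions of this sketch. `percentile_ci` keeps the middle 95% of the sorted statistics, which with 10,000 resamples means dropping 250 from each end.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Independent-samples t statistic using the pooled (equal-variance) SE."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile_ci(values, level=0.95):
    """Sort the bootstrap statistics and keep the middle `level` share of them."""
    s = sorted(values)
    cut = round(len(s) * (1 - level) / 2)  # 250 per tail when there are 10,000 values
    return s[cut], s[len(s) - cut - 1]


def bootstrap_t(x, y, reps=10_000, seed=None):
    """Resample each group separately, with replacement, and save each t value."""
    rng = random.Random(seed)
    ts = []
    while len(ts) < reps:
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        if len(set(bx)) < 2 and len(set(by)) < 2:
            continue  # both resamples constant: pooled SE is 0, so redraw (sketch's choice)
        ts.append(pooled_t(bx, by))
    return ts


def bootstrap_r(x, y, reps=10_000, seed=None):
    """Resample whole (x, y) pairs with replacement and save each correlation."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    rs = []
    while len(rs) < reps:
        bx, by = zip(*rng.choices(pairs, k=len(pairs)))
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # r is undefined when either variable is constant: redraw
        rs.append(pearson_r(bx, by))
    return rs


if __name__ == "__main__":
    # Hypothetical data for illustration only (not the lab's example).
    group1 = [4, 5, 6, 5, 7, 3]
    group2 = [9, 3, 10, 2, 8, 4]
    ts = bootstrap_t(group1, group2, seed=1)
    lo, hi = percentile_ci(ts)
    print(f"t: mean {statistics.fmean(ts):.2f}, SD {statistics.stdev(ts):.2f}")
    print(f"95% bootstrap CI for t: [{lo:.2f}, {hi:.2f}], contains 0: {lo <= 0 <= hi}")

    x = [2, 4, 5, 7, 8, 10, 11, 13]
    y = [1, 3, 6, 6, 9, 8, 12, 14]
    rs = bootstrap_r(x, y, seed=1)
    lo, hi = percentile_ci(rs)
    print(f"r: mean {statistics.fmean(rs):.2f}, SD {statistics.stdev(rs):.2f}")
    print(f"95% bootstrap CI for r: [{lo:.2f}, {hi:.2f}], contains 0: {lo <= 0 <= hi}")
```

For the t-test, each group is resampled on its own; for the correlation, whole pairs are resampled so that the relationship between X and Y is preserved in every bootstrap dataset.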