# Bootstrapping for statistical inference. T-Tests by malj

```
Bootstrapping for statistical inference.
T-Tests.
Jacob Seybert
9/17/09
A couple of quick notes:
• The “Run;” statement in SAS
  • Technically, a program needs only one such statement, at the end.
  • In practice, many people place one after every proc.
  • This allows selected steps to be submitted on their own.
Note:
• Regarding hypothesis testing and the equality of variances:
  • There was confusion during lab regarding “If SDs are equal, estimate is:”
  • This refers to an assumption that the variances of both populations are equal.
  • This is the assumption of homogeneity of variance (which is why the above caused some confusion).
  • When sample sizes are equal or nearly equal, relatively big differences in population variances have relatively small consequences for t-tests.
  • If you cannot get equal, or close to equal, sample sizes in each group, you correct the degrees of freedom (the Satterthwaite approximation).
Independent Sample T-Test
Using Proc TTEST
Proc TTEST;
class condition;      *This is the grouping variable, which defines the two groups;
var variable;
title 'Independent t-test';
RUN;
Using Proc GLM
Proc GLM data=Data1;
Class categoricalvar; *This is the grouping variable, which defines the two groups;
Model dependentvar=categoricalvar; *The continuous outcome is modeled on the grouping variable;
Means categoricalvar /hovtest; *This finds the means and SDs
for each group on the continuous
variable and is also used for post-hoc
tests. Hovtest runs a test for
the homogeneity of variance (Levene by default);
Run;
Compare:
[Figure: Proc TTEST output]
[Figure: Proc GLM output]
Tests for Equality of Variances
• “Folded F”
  • F = Variance1 (larger) / Variance2 (smaller).
  • Very sensitive to violations of normality.

10.0929^2 / 9.3308^2 = 1.17   (the two group SDs, squared)
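/* A minimal sketch of the folded F arithmetic above. The two values are
   the group standard deviations from the example, so each is squared to
   get a variance. The names sd_large, sd_small, and folded_f are made up
   for illustration. */
data _null_;
  sd_large = 10.0929;                     /* larger group SD */
  sd_small = 9.3308;                      /* smaller group SD */
  folded_f = sd_large**2 / sd_small**2;   /* larger variance over smaller variance */
  put folded_f= 6.2;                      /* writes the ratio to the log */
run;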
Tests for Equality of Variances
• Levene’s Test of Homogeneity
  • Runs an ANOVA on the absolute or squared deviations of each observation from its group mean.
  • Tested with an F distribution.
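/* A rough sketch of what the Levene test does by hand, assuming the Data1,
   categoricalvar, and dependentvar names from the Proc GLM example. The
   Dev dataset and the absdev variable are made up for illustration. This
   version uses absolute deviations, which should correspond to
   hovtest=levene(type=abs). By default, hovtest uses squared deviations. */
proc sql;
  create table Dev as
  select categoricalvar,
         abs(dependentvar - mean(dependentvar)) as absdev
  from Data1
  group by categoricalvar;          /* mean() is taken within each group */
quit;

proc glm data=Dev;
  class categoricalvar;
  model absdev = categoricalvar;    /* one-way ANOVA on the deviations */
run;
quit;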
What to do if significant?
• T-test
  • Examine the t-test statistic that does not assume equal variances (Satterthwaite).
  • Just like the Pooled Standard Error method discussed in class.
    • Lab 7, Slide 11.

• ANOVA
  • Method of unweighted means
  • I will let Brannick explain that! (If he does.)
Tests for Normality
• Contained in Proc Univariate (requested with the NORMAL option).
• Descriptive statistics:
  • Skew (positive = positive skew, etc.)
  • Kurtosis – indicates how peaked the data are
• Provides 4 test statistics:
  • Shapiro-Wilk
    • Want W close to 1.
    • Sample size: 7 ≤ N ≤ 2,000
  • Kolmogorov-Smirnov
  • Cramer-von Mises
  • Anderson-Darling
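/* A minimal sketch of requesting these statistics, assuming the Data1
   dataset and the dependentvar variable from the Proc GLM example. */
proc univariate data=Data1 normal;    /* NORMAL adds the tests for normality */
  var dependentvar;                   /* skew and kurtosis print by default */
  histogram dependentvar / normal;    /* overlays a fitted normal curve */
run;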
SAS Tests for Normality
[Figure: Proc Univariate tests for normality output]
What to do if not normal?
• Look at how “not normal” the data are.
• Transform the data to be more normal
  • Square-root
  • Logarithmic
  • Many others…
• Examine the robustness of your statistic to violations of normality.
  • We already discussed the implications of equality of variance.
• Bootstrap!
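/* A small sketch of the two transformations named above, assuming the
   Data1 and dependentvar names from the Proc GLM example. Data2, sqrt_y,
   and log_y are made-up names. */
data Data2;
  set Data1;
  if dependentvar >= 0 then sqrt_y = sqrt(dependentvar);   /* square root needs y >= 0 */
  if dependentvar > 0  then log_y  = log(dependentvar);    /* natural log needs y > 0 */
run;

proc univariate data=Data2 normal;    /* re-check normality after transforming */
  var sqrt_y log_y;
run;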
t-tests?
• Homework 4
Bootstrapping!
Bootstrapping is cool and common.
• You will hear about it in:
  • Univariate Statistics
  • Multivariate Statistics
  • Factor Analysis
  • Structural Equation Modeling
  • Item Response Theory
  • … and more I haven’t been exposed to.
What is bootstrapping?
• Randomly sampling, with replacement, from an original dataset for use in obtaining statistical estimates.

• Randomly draw a value from the “population” (the original sample).
• The value stays in the available population of values.
• Randomly draw another value from the population.
• Do this N times to fill your dataset, where N is the original sample size.
• Perform an analysis on your dataset.
• Do this 10,000 times.
• Use the results of your 10,000 analyses to draw conclusions.
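/* A sketch of the resampling steps above. The programs used in this lab
   do the resampling in SAS IML. This sketch swaps in Proc SURVEYSELECT
   instead. Data1 and dependentvar come from the Proc GLM example.
   BootSamples, BootMeans, boot_mean, and the seed are made up. */
proc surveyselect data=Data1 out=BootSamples
     seed=12345        /* arbitrary seed so the run can be repeated */
     method=urs        /* unrestricted random sampling, i.e. with replacement */
     samprate=1        /* each replicate has as many rows as Data1 */
     outhits           /* write one row per draw, including repeats */
     reps=10000;       /* number of bootstrap datasets */
run;

proc means data=BootSamples noprint;  /* the analysis, once per replicate */
  by Replicate;
  var dependentvar;
  output out=BootMeans mean=boot_mean;
run;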
Why bootstrap?
• Good question.

• Small sample size.
• Little to no parametric modeling.
• Non-normal distribution of the sample.
• A test of means for two samples.
• Not as sensitive to N.
What bootstrapping looks like:
http://luna.cas.usf.edu/~mbrannic/files/software/boots_ind_t.sas
http://luna.cas.usf.edu/~mbrannic/files/software/boots_correl.sas

• All done in SAS IML.
Bootstrap T-Test Explained
• 2 Samples
• Create 10,000 pairs of bootstrap samples (2 samples for each cycle) from the original 2 samples.
• For each sample pair, run the t-test.
  • Save the t value in a matrix.
• Sort that matrix of 10,000 t values from high to low.
• Select the middle 95% of the values from the distribution.
  • Removes the extremes (the top and bottom 2.5%).

• This results in a “Bootstrap Confidence Interval”
  • Want this to not contain zero.
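/* A sketch of the bootstrap t-test steps above, swapping Proc SURVEYSELECT
   and Proc TTEST in for the SAS IML used in the lab program. Data1 is
   assumed to hold both groups, with the condition and variable names from
   the Proc TTEST example. Boot, BootT, BootCI, and the seed are made up. */
proc sort data=Data1; by condition; run;    /* STRATA needs sorted input */

proc surveyselect data=Data1 out=Boot seed=12345
     method=urs samprate=1 outhits reps=10000;
  strata condition;                         /* resample within each group */
run;

proc sort data=Boot; by Replicate condition; run;

ods exclude all;                            /* hide 10,000 sets of printed output */
proc ttest data=Boot;
  by Replicate;
  class condition;
  var variable;
  ods output TTests=BootT;                  /* keep the t statistics */
run;
ods exclude none;

proc univariate data=BootT noprint;
  where Method = 'Pooled';                  /* one t value per replicate */
  var tValue;
  output out=BootCI pctlpts=2.5 97.5 pctlpre=t_;   /* bounds of the middle 95% */
run;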
Our Example:

• If the interval contains zero, we cannot be confident that the difference between the sample means is not zero.
Examine the Distribution of t:
• Mean and SD of the t distribution:

• Examine the distribution:
T-Test of Original Data

• Vs.
Bootstrap Correlation Explained
• A sample of variable pairs.
• Create 10,000 bootstrap datasets from the original sample pairs.
• For each dataset, find the correlation.
  • Save the correlation values in a matrix.
• Sort that matrix of 10,000 correlation values from high to low.
• Select the middle 95% of the values from the distribution.
  • Removes the extremes (the top and bottom 2.5%).

• This results in a “Bootstrap Confidence Interval”
  • Want this to not contain zero.
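/* A sketch of the bootstrap correlation steps above, using Proc
   SURVEYSELECT and Proc CORR rather than the SAS IML in the lab program.
   Pairs, x, y, BootPairs, BootCorr, BootR, BootCI, and the seed are
   made-up names. */
proc surveyselect data=Pairs out=BootPairs seed=12345
     method=urs samprate=1 outhits reps=10000;    /* whole rows (pairs) are drawn */
run;

proc corr data=BootPairs noprint outp=BootCorr;
  by Replicate;
  var x y;
run;

data BootR;                                 /* one correlation per replicate */
  set BootCorr;
  where _TYPE_ = 'CORR' and upcase(_NAME_) = 'X';
  r = y;                                    /* the x row, y column holds r(x, y) */
  keep Replicate r;
run;

proc univariate data=BootR noprint;
  var r;
  output out=BootCI pctlpts=2.5 97.5 pctlpre=r_;   /* bounds of the middle 95% */
run;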
Our Example:

• If the interval contains zero, we cannot be confident that the correlation is not zero.
Examine the Distribution of Correlations:
• Mean and SD of the correlation distribution:

• Examine the distribution:
Correlation of Original Data

• Vs.
So what did we learn?
• Bootstrapping is interesting.

• It provides a distribution of statistical values from which to draw conclusions.

• It has many applications, all based on the same technique but done for different purposes.

```