Bootstrapping for statistical inference.
T-Tests.
Jacob Seybert
9/17/09
A couple of quick notes:
 ➢ The “Run;” command in SAS
   ▪ Technically, only one such statement is needed, at the end of a program.
   ▪ In practice, many people place one after every proc.
      ➢ This allows selected commands to be submitted on their own.
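A small illustration of the note above: each step ends with its own RUN;, so either step can be highlighted and submitted alone. The dataset sashelp.class is a sample dataset shipped with SAS and is used here only as a stand-in; it is not the lab data.

```sas
/* Each step ends with its own RUN; so it can be submitted by itself. */
proc means data=sashelp.class;
  var height;
run;

proc print data=sashelp.class(obs=5);
run;
```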
Note:
 ➢ Regarding hypothesis testing and the equality of variances:
   ▪ There was confusion during lab regarding “If SDs are equal, estimate is:”
      ➢ This refers to an assumption that the variances of both populations are equal.
 ➢ Assumption of homogeneity of variance (why the above caused some confusion)
   ▪ When sample sizes are equal or nearly equal, relatively big differences in population variances have relatively small consequences for t-tests.
   ▪ If you cannot get equal, or close to equal, sample sizes in each group, you apply a correction to the degrees of freedom.
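Independent Sample T-Test
Using Proc TTEST
Proc TTEST data=Data1;   *Without data=, SAS uses the most recently created dataset;
  class condition;       *This is the grouping variable which defines the two groups;
  var variable;          *This is the continuous variable being compared;
  title 'Independent t-test';
RUN;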
Using Proc GLM
Proc GLM data=Data1;
  Class categoricalvar;               *This is the grouping variable which defines the two groups;
  Model dependentvar=categoricalvar;  *Here the independent variable is the grouping variable;
  Means categoricalvar / hovtest;     *This finds the means and SDs for each group on the
                                       continuous variable and also is used for post-hoc
                                       tests. Hovtest runs a test for the homogeneity
                                       of variance;
Run;
Quit;
Compare:
Proc TTEST:

Proc GLM:
Tests for Equality of Variances
 ➢ “Folded F”
   ▪ Variance1 (larger) / Variance2 (smaller) = F.
   ▪ Very sensitive to violations of normality.

  10.0929² / 9.3308² = 101.87 / 87.06 = 1.17 (the two group SDs, squared to give variances)
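The ratio on this slide can be reproduced in a short data step. This is only a sketch: the two SDs are the ones shown above, but the group sizes n_large and n_small are hypothetical placeholders (the slide does not give them), so the p-value it prints is illustrative only.

```sas
/* Folded F: larger sample variance divided by smaller sample variance. */
/* n_large and n_small are HYPOTHETICAL group sizes, not the lab data.  */
data folded_f;
  sd_large = 10.0929;   /* SD of the group with the larger variance  */
  sd_small = 9.3308;    /* SD of the group with the smaller variance */
  n_large  = 30;        /* hypothetical */
  n_small  = 30;        /* hypothetical */
  f = sd_large**2 / sd_small**2;
  /* two-sided p-value: double the upper-tail area of F(n_large-1, n_small-1) */
  p = min(1, 2 * (1 - probf(f, n_large - 1, n_small - 1)));
run;

proc print data=folded_f noobs;
  var f p;
  format f 6.2 p pvalue6.4;
run;
```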
Tests for Equality of Variances
 ➢ Levene’s Test of Homogeneity
   ▪ Runs an ANOVA on the deviations from each group’s mean.
   ▪ Tested with an F distribution.
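To make “an ANOVA on the deviations” concrete, here is a minimal sketch that builds the deviations by hand and then runs the ANOVA. The dataset scores and its variables condition and score are hypothetical. This version uses absolute deviations, which should correspond to hovtest=levene(type=abs) in Proc GLM; the Proc GLM default uses squared deviations instead.

```sas
/* HYPOTHETICAL data: two groups, one continuous score. */
data scores;
  input condition $ score @@;
  datalines;
a 12 a 15 a 11 a 19 a 14 a 16
b 22 b 9 b 30 b 13 b 25 b 8
;
run;

/* Absolute deviation of each score from its own group mean. */
proc sql;
  create table devs as
  select condition, score,
         abs(score - mean(score)) as absdev
  from scores
  group by condition;
quit;

/* One-way ANOVA on the deviations; the F test for condition is the Levene test. */
proc glm data=devs;
  class condition;
  model absdev = condition;
run;
quit;
```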
What to do if significant?
 ➢ T-test
   ▪ Examine the t statistic that does not assume equal variances (the Satterthwaite row).
      ➢ Just like the Pooled Standard Error method discussed in class.
         • Lab 7, Slide 11.
 ➢ ANOVA
   ▪ Method of unweighted means
   ▪ I will let Brannick explain that! (If he does.)
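For reference, the unequal-variance t statistic and the Satterthwaite degrees of freedom mentioned under T-test above are usually written as follows. The notation is an assumption of this note, not taken from the slides: x̄, s² and n are the sample mean, sample variance and size of groups 1 and 2.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
df \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1}
      + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
```

When the two groups have equal sizes and equal sample variances, this df reduces to n1 + n2 - 2, the same as the pooled test.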
Tests for Normality
 ➢ Contained in the Proc Univariate command.
 ➢ Descriptive Statistics:
   ▪ Skew (positive = positive skew, etc.)
   ▪ Kurtosis: indicates how peaked the data are
 ➢ Provides 4 statistics:
   ▪ Shapiro-Wilk
      ➢ Want close to 1.
      ➢ Sample size: 7 ≤ N ≤ 2,000
   ▪ Kolmogorov-Smirnov
   ▪ Cramer-von Mises
   ▪ Anderson-Darling
SAS Tests for Normality
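A minimal sketch of how these tests can be requested. The NORMAL option on the Proc Univariate statement adds the “Tests for Normality” table to the output. The dataset scores and the variable score are hypothetical stand-ins for your own data.

```sas
/* HYPOTHETICAL data: one continuous variable. */
data scores;
  input score @@;
  datalines;
12 15 11 19 14 16 22 9 30 13 25 8 17 21 10
;
run;

/* NORMAL requests the tests for normality;                      */
/* skewness and kurtosis appear in the Moments table by default. */
proc univariate data=scores normal;
  var score;
  histogram score / normal;                 /* histogram with a fitted normal curve */
  qqplot score / normal(mu=est sigma=est);  /* normal Q-Q plot                      */
run;
```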
What to do if not normal?
 ➢ Look at how “not normal” the data are.
 ➢ Transform the data to be more normal
   ▪ Square-root
   ▪ Logarithmic
   ▪ Many others…
 ➢ Examine the robustness of your statistic to violations of normality.
   ▪ We already discussed implications of equality of variance.
 ➢ Bootstrap!
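The square-root and logarithmic transformations listed above might be sketched like this. The dataset scores and the variable score are hypothetical. The guards are there because SQRT needs non-negative values and LOG needs strictly positive values; anything else is left missing.

```sas
/* HYPOTHETICAL positively skewed data. */
data scores;
  input score @@;
  datalines;
1 2 2 3 3 4 5 7 9 14 22 40
;
run;

data transformed;
  set scores;
  if score >= 0 then sqrt_score = sqrt(score);
  if score >  0 then log_score  = log(score);   /* natural log */
run;

/* Re-check normality on the original and transformed variables. */
proc univariate data=transformed normal;
  var score sqrt_score log_score;
run;
```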
Questions about independent sample t-tests?
 ➢ Homework 4
Bootstrapping!
Bootstrapping is cool and common.
 ➢ You will hear about it in:
   ▪ Univariate Statistics
   ▪ Multivariate Statistics
   ▪ Factor Analysis
   ▪ Structural Equation Modeling
   ▪ Item Response Theory
   ▪ … and more I haven’t been exposed to.
What is bootstrapping?
 ➢ Randomly sampling, with replacement, from an original dataset for use in obtaining statistical estimates.

   ▪ Start with a set of N values (the original sample).
   ▪ Randomly draw a value from this “population”.
      ➢ The value stays in the available population of values.
   ▪ Randomly draw another value from the population.
   ▪ Do this N times to fill your dataset.
   ▪ Perform an analysis on your dataset.
   ▪ Repeat the draw-and-analyze cycle 10,000 times.
   ▪ Utilize the results of your 10,000 analyses to draw conclusions.
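The steps above can be sketched in a data step, using the mean as the analysis. This is an illustration under stated assumptions, not the SAS IML programs used in lab: the dataset original, the variable score and the seed are all hypothetical.

```sas
/* HYPOTHETICAL original sample. */
data original;
  input score @@;
  datalines;
12 15 11 19 14 16 22 9 30 13
;
run;

/* Draw N values with replacement, 10,000 times. */
data boot;
  call streaminit(2009);                 /* fixed seed so the run is repeatable */
  do replicate = 1 to 10000;
    do draw = 1 to n;                    /* n = number of rows in ORIGINAL      */
      pick = ceil(n * rand("uniform"));  /* random row number from 1 to n       */
      set original point=pick nobs=n;    /* read that row; it stays available   */
      output;
    end;
  end;
  stop;                                  /* required when reading with POINT=   */
  drop draw;
run;

/* Perform the analysis (here, the mean) on each bootstrap dataset. */
proc means data=boot noprint;
  by replicate;
  var score;
  output out=boot_means mean=mean_score;
run;

/* Middle 95% of the 10,000 means. */
proc univariate data=boot_means noprint;
  var mean_score;
  output out=boot_ci pctlpts=2.5 97.5 pctlpre=ci_;
run;

proc print data=boot_ci noobs;
run;
```

The printed variables ci_2_5 and ci_97_5 are the lower and upper ends of the interval.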
Why bootstrap?
 ➢ Good question.

 ➢ Small sample size.
 ➢ Little to no parametric modeling.
 ➢ Non-normal distribution of the sample.
 ➢ A test of means for two samples.
   ▪ Not as sensitive to N.
What bootstrapping looks like:
http://luna.cas.usf.edu/~mbrannic/files/software/boots_ind_t.sas
http://luna.cas.usf.edu/~mbrannic/files/software/boots_correl.sas

 ➢ All done in SAS IML.
Bootstrap T-Test Explained
 ➢ 2 Samples
 ➢ Create 10,000 pairs of resampled datasets (one per original sample in each cycle) from the original 2 samples.
 ➢ For each sample pair, run the t-test.
   ▪ Save the t value in a matrix
 ➢ Sort that matrix of 10,000 t-test values from high to low.
 ➢ Select 95% of the values from the middle of the distribution.
   ▪ Removes the extremes.

 ➢ This results in a “Bootstrap Confidence Interval”
   ▪ Want this to not contain zero.
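One way to sketch this procedure without SAS IML is to resample with Proc SURVEYSELECT and run Proc TTEST once per replicate. This is a different route from the linked IML programs, shown only to make the steps concrete. The dataset scores, the variables condition and score, and the seed are hypothetical.

```sas
/* HYPOTHETICAL data: two groups. */
data scores;
  input condition $ score @@;
  datalines;
a 12 a 15 a 11 a 19 a 14 a 16 a 17 a 13
b 22 b 19 b 30 b 18 b 25 b 21 b 16 b 24
;
run;

proc sort data=scores;
  by condition;
run;

/* Resample each group with replacement, keeping its original size, 10,000 times. */
proc surveyselect data=scores out=boot seed=2009
                  method=urs samprate=1 outhits reps=10000 noprint;
  strata condition;
run;

proc sort data=boot;
  by replicate condition;
run;

/* Run the t-test on every replicate and capture the t values. */
ods graphics off;
ods exclude all;                 /* no printed output for 10,000 tests */
ods output TTests=boot_t;
proc ttest data=boot;
  by replicate;
  class condition;
  var score;
run;
ods exclude none;

/* Mean, SD and middle 95% of the bootstrap t values (pooled-variance row). */
proc univariate data=boot_t noprint;
  where method = "Pooled";
  var tValue;
  output out=t_ci mean=t_mean std=t_sd pctlpts=2.5 97.5 pctlpre=t_;
run;

proc print data=t_ci noobs;
run;
```

The variables t_2_5 and t_97_5 bound the middle 95% of the t values. Taking percentiles this way replaces the explicit sort-and-trim step described above.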
Our Example:

 ➢ If the interval contained zero, we cannot be confident that the difference between the sample means is not zero.
Examine the Distribution of t:
 ➢ Mean and SD of the t distribution:

 ➢ Examine the distribution:
T-Test of Original Data

 ➢ Vs.
Bootstrap Correlation Explained
 ➢ A sample of variable pairs.
 ➢ Create 10,000 resampled datasets from the original sample pairs.
 ➢ For each dataset, find the correlation.
   ▪ Save the correlation values in a matrix
 ➢ Sort that matrix of 10,000 correlation values from high to low.
 ➢ Select 95% of the values from the middle of the distribution.
   ▪ Removes the extremes.

 ➢ This results in a “Bootstrap Confidence Interval”
   ▪ Want this to not contain zero.
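A comparable sketch for the correlation, again using Proc SURVEYSELECT rather than the linked SAS IML program. Whole rows are resampled so that each x stays paired with its y. The dataset pairs, the variables x and y, and the seed are hypothetical.

```sas
/* HYPOTHETICAL paired data. */
data pairs;
  input x y @@;
  datalines;
1 2.1 2 2.9 3 3.2 4 4.8 5 4.1 6 6.3 7 6.0 8 8.2 9 7.9 10 9.5
;
run;

/* Resample whole rows with replacement, 10,000 times. */
proc surveyselect data=pairs out=boot seed=2009
                  method=urs samprate=1 outhits reps=10000 noprint;
run;

/* Pearson correlation for each replicate. */
proc corr data=boot noprint outp=corr_out;
  by replicate;
  var x y;
run;

/* Keep one r per replicate: in the CORR row for x, column y holds r(x, y). */
data boot_r;
  set corr_out;
  where _type_ = "CORR" and upcase(_name_) = "X";
  r = y;
  keep replicate r;
run;

/* Mean, SD and middle 95% of the bootstrap correlations. */
proc univariate data=boot_r noprint;
  var r;
  output out=r_ci mean=r_mean std=r_sd pctlpts=2.5 97.5 pctlpre=r_;
run;

proc print data=r_ci noobs;
run;
```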
Our Example:

 ➢ If the interval contained zero, we cannot be confident that the correlation is not zero.
Examine the Distribution of Correlations:
 ➢ Mean and SD of the correlation distribution:

 ➢ Examine the distribution:
Correlation of Original Data

 ➢ Vs.
So what did we learn?
 ➢ Bootstrapping is interesting.

 ➢ Provides a distribution of statistical values from which to draw conclusions.

 ➢ Many applications based on the same technique but done for different purposes.

				