
Today’s Lecture
• Interpreting data and problems to help select the correct statistical test
• Introduction to the analysis of 3 or more variables

The First Things that You Should Do When Given Data and a Problem
• First Question:
  – What type of data do I have?
  – What is the level of measurement?
  – How many sets of data are there?
• Second Question:
  – What is being asked of me in the question?
  – Does the question mention any key words like compare or associate?

Narrowing the Range of Possibilities
• In our class, I have taught you what I viewed to be the most applicable tests for the various types of data that you will encounter.
• There are entire groups of methods that deal with data forms that we didn't cover.
• What this means is that your options on the exam (although seemingly large) are actually quite limited.
• We spent the bulk of our time in hypothesis testing working on two types of statistical cases.
• The first was comparisons of samples via their means, medians, distributions, variances, etc.
• The second was the association of two variables at different levels of measurement.

Samples and Variables
• Samples are the portion of a population that is observed.
• At their simplest, they are a representation of a larger group.
• Variables are measurable phenomena whose values change from observation to observation.
• In statistics, samples of variables can exist for data at any level of measurement.
• Variables are often associated with one another; such associations can be spurious or a potential source of causality.

Example
• If we were interested in comparing the AFC to the NFC, what would be the correct method?
• If we look at the data (point differential for each team in the AFC and NFC) we can see that it is definitely a variable.
• But when we look at the column heads, it would be easy to consider AFC and NFC to be a categorical variable as well.
• But is this a two variable case? No, it isn't really.
• There is only one variable here (point differential). The categories are immaterial because our comparison is between the NFC and AFC.
• The nature of the test that we would use assumes in its null hypothesis that there is no difference between the AFC and NFC.
• It assumes that they are two samples from the same population (the NFL).
• So AFC and NFC are not a nominal variable in the statistical sense.

AFC (PF-PA)   NFC (PF-PA)
        208            90
        -26            41
        -69            -8
        -78             1
         93            52
         -6            15
          5            12
        -46           -37
        118            20
         41           -24
          4           -13
        -23           -54
        -22            26
         25           -17
       -101           -82
        -25          -120

Another Example
• Here we have Point Differential plotted vs Number of Wins
• There are clearly at least two variables here
• Any question or hypothesis would deal with the association between two variables

[Figure: scatter plot of Wins (vertical axis, 0 to 10) against Point Differential (horizontal axis, -200 to 300), with AFC and NFC teams plotted as separate series]

Decisions for One Variable
If we have data with one continuous variable, then we have a number of options in terms of analysis (all of which are essentially comparisons of samples to samples or samples to populations).
• First Question: How many samples?
  – One Sample
  – Two Samples
  – Three or More Samples

One Sample
• Question: Estimation of parameters or test of distribution?
  – Estimation. Question: Are the population parameters known?
    – Yes: Normal Distribution
    – No: T-Distribution
  – Distribution: Goodness of Fit (Chi-Square/K-S)

With one sample, the only options are estimation of population parameters (like the mean, variance or proportion), or comparison of the sample distribution to a hypothesized theoretical distribution via a goodness of fit test (most commonly done via a Chi-square Test).

Two Samples
Question: Are the samples dependent or independent of one another?
• Two Samples - Dependent or Paired
  – Question: Sample size?
    – Large (>30) samples: check normality with the K-S Test
    – Small samples: check normality with the S-W Test
  – Question: Is the sample normally distributed?
    – Normal: Paired T-Test
    – Not normal: Wilcoxon Signed-Rank Test
• Two Samples - Independent
  – Question: Sample size?
    – Large (>30) samples: check normality with the K-S Test
    – Small samples: check normality with the S-W Test
  – Question: Is the sample normally distributed?
    – Not normal: Wilcoxon Rank Sum
    – Normal. Question: Are the variances equal? (Check with Ratio of Variances)
      – Yes: T-Test (pooled variance)
      – No: T-Test (non-pooled variance)

With two samples, we have to ask a minimum of three questions.

Two Samples - Continued
• Are the samples independent of one another? (Remember that paired cases require a slightly different approach.)
• How large are our samples?
  – The larger the sample, the more likely that you will approach a normal distribution; larger samples are more robust with respect to assumptions
  – Different tests of normality work best on different sample sizes (Shapiro-Wilk for smaller samples, Kolmogorov-Smirnov for larger samples)
  – Non-parametric tests tend to require large sample approximations for large samples (the tables for large samples aren’t published)

Two Samples - Continued
• Is each sample normal in its distribution?
  – If one of your samples fails the test for normality, then it is almost always better to use a non-parametric test
• If your samples are normal, then you will use a t-test, but the standard t-test pools the variance from each sample
• Are your variances roughly equal? If yes, then the pooled t-test is the correct statistic; if they aren’t, then you will want to use a non-pooled variance t-test to compare the means of your samples

Three or More Samples
Our course only covered 2 options for three or more samples.
• Question: Sample size?
  – Large (>30) samples: check normality with the K-S Test
  – Small samples: check normality with the S-W Test
• Question: Is the sample normally distributed?
  – Normal: Analysis of Variance, then T-tests
  – Not normal: Kruskal-Wallis, then Wilcoxon Rank Sum
• You should note that I left out the ANOVA pretest for equality of variances (Levene’s Test)

Three or More Samples
• We only need to ask two questions:
  – What is our sample size?
  – Are all our samples normally distributed?
• Once we determine the sample size and run the correct test for normality, we can select the appropriate test to compare samples.
• If even one sample is not normal, then we should use the Kruskal-Wallis test in lieu of the ANOVA
• If all samples are normal, then you have to run Levene’s Test for equality of variance before the data can meet the assumptions for an ANOVA

Three or More Samples - Continued
• Remember that when you have completed your comparison of samples, a rejection of the null hypothesis (that they are all the same) is only the first step
• When you determine that there is a difference, you then have to find which samples differ via a series of T-tests (if normal) or Wilcoxon Rank Sums (if not normal)
• Your work isn’t done until you have determined which samples differ significantly

Two Variable Associations
• We started looking at association with simple tests for independence.
• Given two variables, we used a Chi-Square test of independence, comparing the observed data against the distribution expected if the variables were completely independent.
• From there we moved into measures of association or correlation to assess the strength and potentially the direction of the association

Key Questions for any Association Problem
• First: What is the level of measurement for your data?
• The following question depends on your first answer:
  – If nominal, then what is the size of your table?
  – If ordinal and in categories, then what is the geometry of your table (square or rectangular)?
  – If ordinal and in ranks, then no further questions
  – If interval/ratio data, then is it normal?

Nominal Associations
• If you have nominal data, then your best recourse is to test for independence between the nominal variables using a Chi-square test of independence
• Once you have determined that there is an association, you should assess its strength with Phi if you have a 2x2 table and with Cramer’s V if your table is larger than 2x2

Ordinal Category Associations
• If your data is in Ordinal Categories (with a clear hierarchy), then your biggest question is whether the table is symmetrical (2x2, 3x3, etc.) or asymmetrical (2x3, 3x4, etc.)
  – If it is symmetrical, then you use Kendall’s Tau-b, so you can include ties in your analysis
  – If it is asymmetrical, then you use the less sensitive but more versatile Kendall’s Tau-c

Ordinal Rank Associations
• This type of data is continuous and can therefore be treated much like interval/ratio data.
• The only difference is that instead of running your correlation on raw numbers, you run it on ranks via a Spearman’s Rank Correlation

Interval Ratio Associations
• The definitive parametric correlation is the Pearson’s Product Moment Correlation
• However, this test requires both bivariate normality and a linear relationship, so if your data fails a test for normality or the scatter plot is clearly non-linear, then you should rank your data and use the non-parametric Spearman’s Rank Correlation

Summary Table for Associations

                                                        Measures of Association
Level of Measurement          Tests of Independence     Strength        Strength and Direction
Nominal Category Data
  2x2 Tables                  Chi-Square                Phi
  2x3 Tables or Larger        Chi-Square                Cramer's V
Ordinal Category Data
  Symmetric Tables            Kendall's Tau-b                           Kendall's Tau-b
  Asymmetric Tables           Kendall's Tau-c                           Kendall's Tau-c
Ordinal Rank Data             Spearman's Rho                            Spearman's Rho
Interval Ratio Data
  Normally Distributed        Pearson's r                               Pearson's r
  Not Normally Distributed    Spearman's Rho                            Spearman's Rho

Note that there is no measure that will determine the direction of the association in purely nominal data. But if your data is pseudo-nominal (ordinal), then you can make the determination by looking at the major diagonal and off diagonal of the table. If your data is potentially ordinal, then you should consider a Kendall’s test in lieu of the Chi-square.

I promised on the first day that we would cover all of this:

[Flowchart of the course's data-analysis decisions. Recoverable node labels: Begin Data Analysis; Describe Variables?; Test Hypothesis?; One Variable?; Two Variables?; Organized in Tables; Describe Distribution; Measures of Centrality; Measures of Dispersion; One Sample; Two Samples; Three or More; Estimate Population Values; Measures of Association; Analysis of Variance; Three or More Variables; End Data Analysis]

Association Between Three or More Variables
• Given the tools that you now have, dealing with multiple dependent variables is only an extension of the simpler two variable analysis
• Typically what we do is create a matrix of correlations between each of the variables and then observe their relationships to one another
• The statistics are exactly the same, but we run them multiple times (once for each pair of variables)

Example Output from SPSS

Descriptive Statistics
             Mean      Std. Deviation    N
VAR00001     1.5350    1.19661           20
VAR00002     54.6500   25.39224          20
VAR00003     13.0000   4.80132           20

Correlations (Pearson’s r)
                                  VAR00001   VAR00002   VAR00003
VAR00001   Pearson Correlation    1          -.567**    .263
           Sig. (2-tailed)                   .009       .263
           N                      20         20         20
VAR00002   Pearson Correlation    -.567**    1          -.526*
           Sig. (2-tailed)        .009                  .017
           N                      20         20         20
VAR00003   Pearson Correlation    .263       -.526*     1
           Sig. (2-tailed)        .263       .017
           N                      20         20         20
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Correlations (Spearman’s Rho)
                                      VAR00001   VAR00002   VAR00003
VAR00001   Correlation Coefficient    1.000      -.533*     .162
           Sig. (2-tailed)            .          .015       .495
           N                          20         20         20
VAR00002   Correlation Coefficient    -.533*     1.000      -.419
           Sig. (2-tailed)            .015       .          .066
           N                          20         20         20
VAR00003   Correlation Coefficient    .162       -.419      1.000
           Sig. (2-tailed)            .495       .066       .
           N                          20         20         20
*. Correlation is significant at the 0.05 level (2-tailed).

The End - for now at least
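
Worked sketch: a matrix of correlations
As a rough illustration of the matrix-of-correlations idea described for three or more variables, the Python sketch below runs the same statistic once for each pair of variables, first as Pearson's r and then as Spearman's Rho (Pearson's r computed on ranks). The data values and the helper names pearson_r, average_ranks, spearman_rho and correlation_matrix are assumptions made up for this illustration; they are not the data behind the SPSS output, and the sketch does not compute the significance values that SPSS reports.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's Rho: Pearson's r run on ranks instead of raw numbers."""
    return pearson_r(average_ranks(x), average_ranks(y))


def correlation_matrix(data, corr):
    """Apply corr once per pair of variables in a dict of name -> values."""
    names = list(data)
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = corr(data[a], data[b])
    return matrix


# Hypothetical illustrative values (NOT the data behind the SPSS output).
data = {
    "var1": [1, 2, 3, 4, 5],
    "var2": [10, 8, 6, 4, 2],
    "var3": [2, 1, 4, 3, 5],
}
pearson = correlation_matrix(data, pearson_r)
spearman = correlation_matrix(data, spearman_rho)

for name, row in pearson.items():
    print(name, [round(row[other], 3) for other in data])
```

Passing the correlation function in as an argument mirrors the point that the statistics are exactly the same and are simply run once per pair; in practice a statistics package would also supply the significance test for each coefficient.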
