Data Analysis


Leedy and Ormrod Ch. 11
Gray Ch. 14
The difference between Parametric
and Non-Parametric statistics
   Parametric Statistics:
     -appropriate for interval/ratio data
     -generalizable to a population
     -assumes normal distributions


   Non-Parametric Statistics:
     -used with nominal/ordinal data
     -not generalizable to a population
     -does not assume normal distributions
Contingency (Cross-Tabs) Analysis
and Chi-Square or Gamma/Tau-b
 - non-parametric statistics (no assumption of
  normal distributions)
 Assumptions
     Nominal or ordinal (categorical) data
     Any type of distribution
   The hypothesis test: the null hypothesis is that
    the two (or more) samples come from the
    same distribution
Contingency (cont.)
Conducting the Analysis:

   a. Calculate percentages within the categories
    of the IV and compare across the categories of
    the DV. Are there differences in the outcomes?
   b. For nominal data
     Chi-square statistic: is the relationship (the above
      differences) real?
     Phi, Cramer's V, etc.: how strong is the relationship?
   c. For ordinal data
     t-test for gamma, tau-b: is the relationship (the above
      differences) real?
     Gamma, tau-b: how strong and what direction?
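A minimal sketch of steps a and b in Python with numpy and scipy; the 2x3 cross-tab of counts is entirely hypothetical:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical observed counts: IV categories in rows, DV categories in columns
    observed = np.array([[30, 20, 10],
                         [15, 25, 20]])

    # Step a: percentages within each IV category, compared across the DV
    row_pct = observed / observed.sum(axis=1, keepdims=True)
    print(np.round(row_pct, 2))

    # Step b: chi-square tests whether the apparent differences are real
    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

    # Cramer's V measures strength: sqrt(chi2 / (n * (min(rows, cols) - 1)))
    n = observed.sum()
    v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
    print(f"Cramer's V = {v:.2f}")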
T-Tests (parametric) for Means and
Proportions
   The t-test is used to determine whether
    sample(s) have different means. Essentially, the
    t-test is the ratio between the sample mean
    difference and the standard error of that
    difference. The t-test makes some important
    assumptions:
     Interval/Ratio level data
     one or two levels of one or two variables
     normal distributions
     equal variances (relatively). Use the Levene’s test to
      determine whether variances are equal.
    T-tests (cont.)
   a. The one sample t-test:
       tests a sample mean against a known population mean. The null
        hypothesis is that the sample mean equals the population mean.
   b. The independent samples t-test:
     tests whether the mean of one sample is different from the mean of another
      sample. The null hypothesis is that the mean of sample 1 equals the
      mean of sample 2.
     Note: With independent t-tests, you must pay attention to the standard
      error of the sample(s). There are two ways to estimate the standard error,
      and Levene's test (reported by SPSS) determines which estimate to use.
     Equal variances: the two samples have roughly equal variances
     Non-equal variances: the variances of the two samples differ (large
      discrepancy)
   c. The paired group t-test (dependent or related samples)
     tests whether two related measurements (e.g., the same cases measured
      twice) differ on the same dependent variable. The null hypothesis is that
      the mean of (var1 - var2) equals 0.
     Overall, you will be looking for the t-value and its corresponding p-value.
      Depending on your alpha level, you will reject or fail to reject the null
      hypothesis based on these numbers.
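A minimal sketch of the three t-tests (with the Levene check for the independent case) in Python with scipy, on made-up data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group1 = rng.normal(50, 10, 40)   # hypothetical sample 1
    group2 = rng.normal(55, 10, 40)   # hypothetical sample 2

    # a. One-sample t-test against a known population mean of 52
    t1, p1 = stats.ttest_1samp(group1, popmean=52)

    # b. Levene's test first: are the two variances (roughly) equal?
    lev_stat, lev_p = stats.levene(group1, group2)
    equal_var = lev_p > 0.05          # a common, if rough, decision rule

    # Independent-samples t-test; equal_var=False gives Welch's version
    t2, p2 = stats.ttest_ind(group1, group2, equal_var=equal_var)

    # c. Paired t-test: the same cases measured twice (e.g., pre/post)
    post = group1 + rng.normal(2, 3, 40)
    t3, p3 = stats.ttest_rel(group1, post)

    print(p1, p2, p3)   # compare each p-value against your alpha level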
    ANOVA (parametric)
   Analysis of Variance, or ANOVA, tests for differences in
    the means among 3 or more samples.
   One-way ANOVA Assumptions:
     One independent variable -- categorical with two+ levels
     Dependent variable -- interval or ratio
   Two-way ANOVA Assumptions
     Two or more independent variables -- categorical with two+
      levels
     Dependent variable -- interval or ratio level
     Analysis will include main effects and an interaction term.
   ANOVA tests the ratio (F) of the mean squares between
    groups to the mean squares within groups. Given the degrees of
    freedom, the F statistic shows whether there is a difference in the
    means among all of the groups.
    ANOVA (cont.)
   One-way ANOVA will provide you with an F-ratio and its
    corresponding p-value. If there is a large enough difference
    between the between groups mean squares and the within
    groups mean squares, then the null hypothesis will be
    rejected, indicating that there is a difference in the mean
    scores among the groups. However, the F-ratio does not tell
    you where those differences are. You can run post-hoc
    comparisons such as the Tukey-b, Bonferroni, or Scheffe
    tests to find out.

   Two-way ANOVA will provide you with an F-ratio and its
    corresponding p-value as well as F-ratios and p-values for
    each main effect and interaction term. When the interaction
    is significant, the interaction means (in SPSS, ask for this
    under options in GLM) should also be interpreted.
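A minimal sketch of a one-way ANOVA on made-up data for three groups, in Python with scipy; the Tukey post-hoc step assumes scipy 1.8 or newer:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # One categorical IV with three levels; hypothetical scores
    g1 = rng.normal(10, 2, 30)
    g2 = rng.normal(12, 2, 30)
    g3 = rng.normal(11, 2, 30)

    # F is the ratio of between-group to within-group mean squares
    f, p = stats.f_oneway(g1, g2, g3)
    print(f"F = {f:.2f}, p = {p:.4f}")

    # Post-hoc pairwise comparisons show *where* the differences are
    # (Tukey here; Bonferroni or Scheffe are common alternatives)
    print(stats.tukey_hsd(g1, g2, g3))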
Correlation (parametric)
   Used to test the presence, strength and direction
    of a linear relationship among variables.
   Correlation is a numerical expression that signifies
    the relationship between two variables. Correlation
    allows you to explore this relationship by
    'measuring the association' between the variables.
   Correlation is a 'measure of association' because
    the correlation coefficient provides the degree of
    the relationship between the variables. Correlation
    does not imply causality! Typically, you need at
    least interval- or ratio-level data. However, you can
    run correlation with ordinal-level data with 5 or
    more categories.
    Correlation (cont.)
   a. The relationships: Essentially, there are four types of
    relationships: (1) positive, (2) negative, (3) curvilinear, and (4) no
    relationship.

   b. The hypotheses and tests: The correlation statistic (Pearson's
    r) tests the null hypothesis that there is no relationship between the
    variables.

   c. The Correlation Coefficient: Pearson's r, the correlation
    coefficient, is the numeric value of the relationship between
    variables. The correlation coefficient is a standardized index that
    can vary between -1 and +1. If no relationship exists, then the
    correlation coefficient would equal 0. Pearson's r provides (1) an
    estimate of the strength of the relationship and (2) an estimate of
    the direction of the relationship.
       If the correlation coefficient lies between -1 and 0, it is a negative
        (inverse) relationship; between 0 and +1, it is a positive relationship;
        and if it is 0, there is no relationship. The closer the coefficient lies
        to -1 or +1, the stronger the relationship.
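A minimal sketch of Pearson's r in Python with scipy, on made-up data with a built-in positive relationship:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(0, 1, 100)
    y = 0.6 * x + rng.normal(0, 1, 100)   # positive linear relationship by construction

    # Tests the null hypothesis of no linear relationship
    r, p = stats.pearsonr(x, y)
    print(f"r = {r:.2f}, p = {p:.4f}")
    # The sign of r gives direction; |r| near 1 means a strong relationship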
Correlation (cont.)
   d. Coefficient of determination: Related to the
    correlation coefficient is the coefficient of
    determination. This statistic gives the proportion
    of the variance in one variable accounted for by
    the other (x & y). To calculate the coefficient of
    determination, you square the r value. In other
    words, if you had an r of .90, your coefficient of
    determination would be .81, meaning 81 percent
    of the variance is shared between the variables.
   e. Partial Correlation: When you need to
    'control' for the effect of variables, you can use
    partial correlation.
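A minimal sketch, on made-up data, of the coefficient of determination and of one standard way to compute a partial correlation: correlate the residuals left after regressing each variable on the control variable:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    z = rng.normal(0, 1, 200)        # control variable
    x = z + rng.normal(0, 1, 200)
    y = z + rng.normal(0, 1, 200)    # x and y are related only through z

    r, _ = stats.pearsonr(x, y)
    print(f"r = {r:.2f}, r-squared = {r**2:.2f}")   # e.g., r = .90 -> r^2 = .81

    # Partial correlation of x and y controlling for z
    def residuals(v, control):
        slope, intercept, *_ = stats.linregress(control, v)
        return v - (intercept + slope * control)

    pr, pp = stats.pearsonr(residuals(x, z), residuals(y, z))
    print(f"partial r = {pr:.2f}, p = {pp:.4f}")    # near 0 once z is removed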
    Simple (Bivariate) and Multiple
    (Multivariate) Regression
   Regression is used to model, calculate, and predict
    the pattern of a linear relationship among two or
    more variables.
   There are two types of regression -- simple &
    multiple
   a. Assumptions
     Note:  Variables should be approximately normally
      distributed. If not, recode and use non-parametric
      measures.
     Dependent Variable: at least interval (can use ordinal if
      using a summated scale)
     Independent Variable: should be interval. Independent
      variables should be independent of each other, not
      related in any way. You can use nominal if it is a binary
      or 'dummy' variable (0, 1)
Regression (cont.)
   b. Tests
     Overall: The null hypothesis is that the regression
      (estimated) line predicts the dependent variable
      no better than the mean line
     Coefficients (slope "b", etc.): The null is that the
      estimated coefficient equals 0
   c. Statistics
     Overall: R-squared, F-test
     Coefficients: t-tests
   d. Limitations
     Only addresses linear patterns
     Multicollinearity may be a problem
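A minimal sketch of a simple (bivariate) regression in Python with scipy's linregress, on made-up data; multiple regression would normally use a dedicated library such as statsmodels:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = rng.normal(0, 1, 100)                   # hypothetical interval-level IV
    y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)   # hypothetical DV

    res = stats.linregress(x, y)
    print(f"b = {res.slope:.2f}, t-test p = {res.pvalue:.4f}")   # null: b = 0
    print(f"R-squared = {res.rvalue**2:.2f}")                    # overall fit

    # With several IVs, check correlations among the predictors first:
    # highly correlated predictors signal possible multicollinearity.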
Useful Sources
   Agresti and Finlay, Statistical Methods
    for the Social Sciences

   Tabachnick and Fidell (2001) Using
    Multivariate Statistics

   For writing up the above statistics,
    “From Numbers to Words”

   Links at bottom of Soc302 webpage
