POSTED ON: 10/16/2009
Vocab. 1-5, 7

Population – entire collection of events that are of interest
External validity – extent to which results apply to the general population
Internal validity – extent to which the experiment is accurately executed
Descriptive statistics – statistics that describe a set of data
Inferential statistics – statistics used to draw conclusions about the significance and meaning of the data
Parameter – measure that refers to an entire population
Statistic – measure calculated from a sample of data
Nominal scale – labeling of items (categorical)
Ordinal scale – system of ranking (ordering) along a set continuum
Interval scale – measurement scale with equal, meaningful differences between scale points
Ratio scale – scale with a true zero point
Symmetric – distribution with the same shape on both sides of the center
Bimodal – distribution with two peaks
Unimodal – distribution with only one peak
Modality – term for the number of peaks in a distribution
Negatively skewed – tail to the left
Positively skewed – tail to the right
Skewness – degree of asymmetry
Kurtosis – relative concentration of scores in the center, tails, and shoulders of a distribution
Mesokurtic – kurtosis of a normal distribution
Platykurtic – center of the distribution is too flat (relative to normal)
Leptokurtic – too many scores in the center and the tails of the distribution (relative to normal)
Measures of central tendency (measures of location) – measures that reflect where the distribution is centered
Dispersion – variability around the mean, mode, or any other point
Interquartile range – obtained by discarding the upper 25% and the lower 25% of the distribution and taking the range of what remains.
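The interquartile range, skewness, and kurtosis entries above can be made concrete with a short sketch. This is only an illustration under stated assumptions: the function names are made up for this example, the quartiles use Python's "inclusive" quantile method (textbooks differ on how quartiles are computed), and skewness and kurtosis use the simple moment-based formulas with N in the denominator, which is one of several conventions.

```python
import statistics


def interquartile_range(data):
    """Range of the middle 50% of the data (Q3 - Q1).

    Assumption: quartiles are computed with the 'inclusive' method.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


def central_moment(data, k):
    """k-th central moment, dividing by N (population-style)."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Negative = tail to the left, positive = tail to the right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """About 0 for mesokurtic data, below 0 platykurtic, above 0 leptokurtic."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # flat, symmetric
right_tailed_scores = [1, 1, 1, 2, 10]           # one large score in the right tail

print("IQR:", interquartile_range(symmetric_scores))
print("skewness (symmetric):", skewness(symmetric_scores))
print("skewness (right tail):", skewness(right_tailed_scores))
print("excess kurtosis (flat):", excess_kurtosis(symmetric_scores))
```

The flat, symmetric scores come out with zero skewness and negative excess kurtosis (platykurtic), while the set with one large score is positively skewed.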
Trimmed samples – samples with a certain percentage of the values in each tail removed
Trimmed statistics – statistics calculated on trimmed samples
Mean absolute deviation – sum of the absolute deviations from the mean, divided by N
Sample variance – sum of squared deviations, divided by N − 1
Population variance – sum of squared deviations, divided by N
Standard deviation – positive square root of the variance for a sample or population
Sufficient statistic – contains or uses all of the information in a sample
Expected value – long-run average over many, many samples
Unbiased estimator – estimator whose expected value equals the parameter to be estimated
Efficiency – degree of accuracy (low sampling variability) in estimating the parameter in question
Resistance – degree to which an estimator is not influenced by outliers
Degrees of freedom – the number of independent pieces of data; a restriction is imposed (a degree of freedom is lost) whenever an estimate is used
Standard normal distribution – normal distribution with a mean of 0 and an SD of 1
Sampling error – variability due to chance
Sampling distributions – distributions describing the degree of sample-to-sample variability we can expect by chance due to sampling error
Sampling distribution of the mean – distribution of the means of repeated samples of a given size
Sample statistics – statistics derived directly from the data sample
Test statistics – statistics derived from tests performed on the data sample
Decision-making – deciding whether an event with probability X is likely or unlikely enough to cause rejection of H0
Rejection region – set of outcomes whose probability under H0 is less than or equal to the significance level, leading to a rejection of H0
Type I error – rejecting H0 when it is in fact true; probability designated as alpha (size of the rejection region)
Type II error – failing to reject H0 when it is false; probability designated as beta
Power – probability of rejecting H0 when it is actually false (1 − beta)
One-tailed (directional) test – rejection region is located in only one tail of the distribution; the prediction indicates the direction of deviation from the mean
Relative frequency view – probability defined as the limit of the relative frequency of
occurrences of the desired event that we approach as the number of draws increases
Subjective probability – individual's subjective belief in the likelihood of the occurrence of an event
Additive law of probability – given a set of mutually exclusive events, the probability of the occurrence of one event or another is equal to the sum of their separate probabilities
Density – height of the curve at different values of X
Combinatorics – branch of mathematics that deals with the number of ways that objects can be combined together
Permutation – an ordered arrangement of items
Combinations – selections of items with no consideration of order
Binomial distribution – distribution for trials that result in one of two mutually exclusive outcomes
Sampling distribution of the mean – the distribution of the sample means
Central limit theorem – given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean (the distribution of the sample means) will have a mean equal to $\mu$ (i.e., $\mu_{\bar{X}} = \mu$), a variance ($\sigma^2_{\bar{X}}$) equal to $\sigma^2/n$, and a standard deviation ($\sigma_{\bar{X}}$) equal to $\sigma/\sqrt{n}$. The distribution will approach the normal distribution as n, the sample size, increases.
Uniform distribution – every value in a range (e.g., between 0 and 100) is equally likely
Standard error – standard deviation of a sampling distribution
Matched samples –
Repeated measures – subjects respond on two or more occasions
Related samples (correlated samples, paired samples, dependent samples) – samples in which the observations are paired, as with repeated measures or matched samples
Matched-sample t test – test to assess the difference between two means from a matched sample
Difference scores – difference between X1 and X2 for each subject (or matched pair), analyzed across subjects
Sampling distribution of the difference between means – distribution of differences between pairs of sample means drawn independently; its variance is given by the variance sum law
Variance sum law – the variance of a sum or difference of two independent variables is equal to the sum of their variances.
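The central limit theorem entry above can be checked with a small simulation. The sketch below is an illustration only: it assumes a uniform population between 0 and 100 (so the population mean is 50 and the population variance is 100²/12), and the sample size, number of samples, seed, and function name are arbitrary choices made for this example.

```python
import random
import statistics


def simulate_sample_means(n, n_samples=10000, low=0.0, high=100.0, seed=1):
    """Draw n_samples samples of size n from a uniform population
    and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.uniform(low, high) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


n = 25
population_mean = 50.0
population_variance = 100.0 ** 2 / 12      # variance of a uniform(0, 100) population

sample_means = simulate_sample_means(n)
observed_mean = statistics.fmean(sample_means)
observed_variance = statistics.pvariance(sample_means)
expected_variance = population_variance / n   # sigma^2 / n
expected_std_error = expected_variance ** 0.5  # sigma / sqrt(n)

print("mean of sample means:", observed_mean, "expected:", population_mean)
print("variance of sample means:", observed_variance, "expected:", expected_variance)
print("standard error:", statistics.pstdev(sample_means), "expected:", expected_std_error)
```

The mean of the sample means should land close to 50 and their variance close to σ²/n, even though the population itself is flat rather than normal.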
Standard error of difference between means – $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Weighted average – sample variances are weighted by their degrees of freedom ($n_i - 1$): $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Pooled variance estimate – weighted average of two sample variances
Effect size – commonly associated with d: $d = \frac{\mu_1 - \mu_2}{\sigma}$
Point estimate – specific estimate of a parameter
Interval estimates – limits set to encompass the true (population) value of the mean
Confidence limits – limits enclosing a confidence interval
Confidence interval – a probabilistic interval that is likely to contain $\mu$ given the data on hand
Homogeneity of variance – $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Heterogeneous variance – $\sigma_1^2 \neq \sigma_2^2$
Robust – test is relatively unaffected by moderate departures from the underlying assumptions

Chapter 9

Correlation –
Regression –
Random variable – variable beyond experimental control
Fixed variable – variable whose values are determined by the experimenter
Linear regression models
Bivariate normal models
Scatterplot (scatter diagram, scattergram) – representation of each subject by a point in two-dimensional space.
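The pooled variance estimate, the standard error of the difference between means, and the effect size d defined earlier in this list can be sketched for two independent samples as follows. This is a hedged illustration: the function names and the two small data sets are invented for the example, and the sample version of d here substitutes the pooled standard deviation for σ, which is an assumption rather than something the list states.

```python
import math
import statistics


def pooled_variance(sample1, sample2):
    """Weighted average of the two sample variances,
    each weighted by its degrees of freedom (n_i - 1)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)   # divides by n1 - 1
    s2_sq = statistics.variance(sample2)   # divides by n2 - 1
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def standard_error_of_difference(sample1, sample2):
    """Estimated standard error of (mean1 - mean2), using the pooled variance."""
    sp_sq = pooled_variance(sample1, sample2)
    return math.sqrt(sp_sq / len(sample1) + sp_sq / len(sample2))


def effect_size_d(sample1, sample2):
    """Sample estimate of d. Assumption: pooled SD stands in for sigma."""
    mean_diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    return mean_diff / math.sqrt(pooled_variance(sample1, sample2))


# Hypothetical scores for two independent groups.
group_a = [2.0, 4.0, 6.0, 8.0]
group_b = [3.0, 5.0, 7.0, 9.0, 11.0]

print("pooled variance:", pooled_variance(group_a, group_b))
print("SE of difference:", standard_error_of_difference(group_a, group_b))
print("d:", effect_size_d(group_a, group_b))
```

Pooling only makes sense under homogeneity of variance; with clearly heterogeneous variances the two variances would be kept separate, as in the first formula for the standard error.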
In a scatterplot, the coordinates X and Y are the individual's scores on variables X and Y.
Predictor – variable represented on the abscissa (x-axis); variable from which predictions are made
Criterion – variable represented on the ordinate (y-axis); variable that is predicted
Regression line – gives the predicted value of $Y_i$ for a given value of $X_i$, for the ith subject or observation
Correlation (r) – degree to which the actual values of Y agree with the predicted values; degree to which points cluster around the regression line
Pearson product-moment correlation coefficient – $r = \frac{cov_{XY}}{s_X s_Y}$
Correlation coefficient in the population – $\rho$ (rho)
Adjusted correlation coefficient ($r_{adj}$) – $r_{adj} = \sqrt{1 - \frac{(1 - r^2)(N - 1)}{N - 2}}$
Slope – amount of difference in Y associated with a one-unit difference in X
Intercept – value of Y when X = 0
Normal equations – $a = \bar{Y} - b\bar{X}$, $b = \frac{cov_{XY}}{s_X^2}$
Standardized regression coefficient – slope coefficient for standardized data
Sum of squares of Y – $SS_Y = \sum (Y - \bar{Y})^2$
Standard error of estimate ($s_{Y \cdot X}$) – standard deviation of Y about the values predicted from X; error of prediction
Residual variance (error variance) – $s^2_{Y \cdot X}$, an unbiased estimate of the corresponding parameter ($\sigma^2_{Y \cdot X}$)
Conditional distribution – set of Ys corresponding to a specific X; distribution of Y scores for those cases that meet a certain condition with respect to X
Proportional reduction in error (PRE) – $\frac{SS_Y - SS_{residual}}{SS_Y} = r^2$
Proportional improvement in prediction (PIP) – reduction in the size of the standard error: $PIP = 1 - \sqrt{1 - r^2}$
Array – set of Y values conditional on a specific X; its variance is the residual variance of Y for that X
Homogeneity of variance in arrays – assumption that the variance of Y for each value of X is constant
Normality in arrays – values of Y corresponding to any specified value of X are normally distributed around $\hat{Y}$
Conditional array – the Y values that correspond to a specific $X_i$
Conditional distributions – distribution of Y conditional upon a specific value of X
Marginal distribution – distribution of all values of Y (or X) regardless of X (or Y)
Assumption of linearity of regression – the
relationship between X and Y is linear; the line that best fits the data is a straight line
Range restrictions – alter the correlation between X and Y in comparison to what the correlation would have been had the range not been restricted
Heterogeneous subsamples –

Chapter 15 – Multiple Linear Regression

Validities – correlation of each predictor with the criterion
Collinearity – correlation of a variable with several other predictors
Regression coefficients –
Standardized regression coefficients – coefficients for standardized variables: equal standard deviations, unit differences are comparable, intercept equals zero
Residual variance (residual error) – $MS_{residual} = MS_{error} = \frac{\sum (Y - \hat{Y})^2}{N - p - 1}$
Multivariate normal – joint distribution of multiple variables (extension to multiple variables of the bivariate normal distribution described in Chapter 9)
Multiple correlation coefficient (R) – correlation between the criterion and the best linear combination of the predictors taken simultaneously
Hyperspace – multidimensional space
Regression surface – analog of the regression line or plane
Partial correlation – correlation between two variables with one or more variables partialed out of both X and Y
Semipartial correlation – correlation between the criterion and a partialed predictor variable
Suppressor variable – predictor that improves prediction by suppressing irrelevant variance in other predictors; its regression coefficient may be significantly negative in this situation
Multivariate outliers –
Distance – identifies potential outliers in the dependent variable
Leverage – identifies potential outliers in the independent variables
Influence – combines distance and leverage to identify unusually influential observations.
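The partial and semipartial correlations defined above can be computed from the three pairwise correlations when a single variable Z is partialed out. The sketch below uses the standard first-order formulas; the function names and example correlations are chosen only for illustration, and partialing out more than one variable would need a more general approach.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation: cov_xy / (s_x * s_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (s_x * s_y)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z partialed out of both X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of criterion Y with predictor X after Z is partialed
    out of X only (Y is left untouched)."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)


# Hypothetical pairwise correlations among X, Y, and Z.
r_xy, r_xz, r_yz = 0.6, 0.5, 0.5

print("partial r:", partial_r(r_xy, r_xz, r_yz))
print("semipartial r:", semipartial_r(r_xy, r_xz, r_yz))
```

Because the semipartial correlation leaves Z's share of Y in the denominator, its absolute value never exceeds that of the corresponding partial correlation.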
Influential – an observation is influential if the regression surface would change markedly depending on the presence or absence of that observation
Cook's D – measure of influence; function of the sum of the squared changes in $b_j$ that would occur if the ith observation were removed from the data and the analysis rerun
Studentized residuals – residuals that can be interpreted as standard t statistics on (N − p − 1) degrees of freedom
Tolerance – index of the degree to which one predictor can itself be predicted by the other predictors in the model ($1 - R_j^2$)
Cross-correlation – correlation between one predictor and all other predictors
Singular covariance – one predictor can be perfectly predicted from the others
Variance inflation factor (VIF) – degree to which the standard error of $b_j$ is increased because $X_j$ is correlated with the other predictors (reciprocal of tolerance)
All subsets regression – looks at all possible subsets of the predictor variables and chooses the set that is optimal in some way (such as maximizing $R^2$ or minimizing the mean square error)
Backward elimination – proceeds in a logical stepwise fashion: start with a model that includes all predictors, remove the variable that contributes least to the model, rerun the regression without that predictor, find the variable with the smallest contribution, remove it, and continue
Stepwise regression – reverse of backward elimination; variables are added and tested for their contribution to R until the addition of further variables produces no significant improvement
Forward selection – similar to stepwise regression, but variables already in the model are not removed before the addition of another variable (no test of whether a variable falls below or above "F to remove")
Listwise deletion – deletion of an entire case because of a single missing observation within that case
Pairwise deletion –
Multicollinearity – predictor variables are highly correlated
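Tolerance and VIF from the list above are simple functions of $R_j^2$, the squared multiple correlation of predictor $X_j$ with the other predictors. The sketch below takes $R_j^2$ as given; the function names and example values are assumptions made for illustration, and obtaining $R_j^2$ with several predictors would require regressing $X_j$ on the others (with only two predictors it is just the squared correlation between them).

```python
import math


def tolerance(r_squared_j):
    """Proportion of X_j's variance NOT predictable from the other predictors."""
    return 1.0 - r_squared_j


def variance_inflation_factor(r_squared_j):
    """Reciprocal of tolerance. The sampling variance of b_j is multiplied by
    this factor, so its standard error is multiplied by sqrt(VIF)."""
    tol = tolerance(r_squared_j)
    if tol == 0.0:
        # X_j is perfectly predicted by the others (singular covariance matrix).
        raise ValueError("predictor is perfectly collinear with the others")
    return 1.0 / tol


# Hypothetical values of R_j^2 for three predictors.
for r_squared_j in (0.0, 0.75, 0.99):
    vif = variance_inflation_factor(r_squared_j)
    print(
        f"R_j^2={r_squared_j:.2f}  tolerance={tolerance(r_squared_j):.2f}  "
        f"VIF={vif:.1f}  SE multiplier={math.sqrt(vif):.2f}"
    )
```

An uncorrelated predictor has a tolerance of 1 and a VIF of 1, while a predictor that shares 75% of its variance with the others has its standard error doubled.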