


Vocab. 1-5, 7
Population – entire collection of events that are of interest
External validity – extent to which results apply to the general population
Internal validity – extent to which the experiment is accurately executed
Descriptive statistics – describing a set of data
Inferential statistics – significance and meaning of the data
Parameter – measure that refers to an entire population
Statistic – measure calculated from a sample of data
Nominal scale – labeling of items (categorical)
Ordinal scale – system of ranking (ordering) along a set continuum
Interval scale – measurement scale with definite differences between scale points
Ratio scale – scale with a true zero point
Symmetric – distributions with the same shape on both sides of the center
Bimodal – distribution with two peaks
Unimodal – distribution with only one peak
Modality – term for number of peaks in a distribution
Negatively skewed – tail to the left
Positively skewed – tail to the right
Skewness – degree of asymmetry

Kurtosis – relative concentration of scores in the center, tails, and shoulders of a distribution
Mesokurtic – normal distribution
Platykurtic – center of distribution is too flat
Leptokurtic – too many scores in the center and in the tails of the distribution
Measures of central tendency (measures of location) – measures that reflect where the distribution is centered
Dispersion (variability) – spread of scores around the mean, mode, or any other point
Interquartile range – obtained by discarding the upper 25% and the lower 25% of the distribution and taking the range of what remains
Trimmed samples – samples with a certain percentage of the values in each tail removed
Trimmed statistics – statistics calculated on trimmed samples
Mean absolute deviation – sum of absolute deviations from the mean, divided by N
Sample variance – sum of squared deviations from the mean, divided by N-1
Population variance – sum of squared deviations from the mean, divided by N
Standard deviation – positive square root of the variance for a sample/population
Sufficient statistic – contains or uses all of the information in a sample
Expected value – long range average of many, many samples
Unbiased estimator – estimator whose expected value equals the parameter to be estimated
Efficiency – degree of accuracy in estimating the parameter in question
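The dispersion definitions above can be checked on a small data set. The sketch below is only an illustration: the function names and the scores are made up for this example, and the trimmed mean assumes the same proportion is cut from each tail.

```python
import math

def dispersion_summary(scores):
    """Compute the dispersion measures defined above for a list of numbers."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    sum_sq = sum(d * d for d in deviations)  # sum of squared deviations
    return {
        "mean_abs_dev": sum(abs(d) for d in deviations) / n,
        "population_variance": sum_sq / n,        # divide by N
        "sample_variance": sum_sq / (n - 1),      # divide by N-1
        "sample_sd": math.sqrt(sum_sq / (n - 1)),
    }

def trimmed_mean(scores, proportion):
    """Mean after removing `proportion` of the values from each tail.

    Assumes proportion < 0.5 so that at least one value remains.
    """
    k = int(len(scores) * proportion)  # number of values cut from each tail
    ordered = sorted(scores)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
summary = dispersion_summary(scores)
print(summary)
print(trimmed_mean(scores, 0.25))
```

Dividing by N-1 rather than N is what makes the sample variance an unbiased estimator of the population variance.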

Resistance – degree to which an estimator is not influenced by outliers
Degrees of freedom – the number of independent pieces of data; a restriction is imposed whenever an estimate is used
Standard normal distribution – mean of 0, SD of 1
Sampling error – variability due to chance
Sampling distributions – degree of sample-to-sample variability we can expect by chance due to sampling error
Sampling distribution of the mean – distribution of the means of repeated samples
Sample statistics – statistics derived directly from the data sample
Test statistics – statistics derived from tests performed on the data sample
Decision-making – deciding whether an event with X probability is likely or unlikely to cause rejection of H0
Rejection region – set of outcomes whose probability under H0 is less than or equal to the significance level, leading to a rejection of H0
Type I error – rejecting H0 when it is in fact true; probability designated as alpha (size of rejection region)
Type II error – failing to reject H0 when it is false; probability designated as beta
Power – probability of rejecting H0 when it is actually false
One-tailed (directional) test – rejection region is located in only one tail of the distribution; prediction indicates direction of deviation from mean
Relative frequency view – probability as the limit of the relative frequency of occurrences of the desired event that we approach as the number of draws increases
Subjective probability – individual’s subjective belief in the occurrence of an event
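Type I error and power can be made concrete with a small simulation. This is a sketch under assumptions that are not in the notes: a two-tailed one-sample z test with known SD of 1, H0: mean = 0, n = 25, and a hypothetical true mean of 0.5 for the power run. The function name is invented for the example.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, n=25, alpha=0.05, reps=4000, seed=1):
    """Proportion of simulated samples in which H0: mean = 0 is rejected.

    Assumes a known population SD of 1 and a two-tailed z test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    standard_error = 1.0 / n ** 0.5
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(sample) / n - 0.0) / standard_error
        if abs(z) >= z_crit:  # outcome falls in the rejection region
            rejections += 1
    return rejections / reps

# H0 true: every rejection is a Type I error, so the rate should be near alpha.
type_i_rate = rejection_rate(true_mean=0.0)
# H0 false: the rejection rate estimates power (1 - beta).
power = rejection_rate(true_mean=0.5)
print(type_i_rate, power)
```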

Additive law of probability – given a set of mutually exclusive events, the probability of the occurrence of one event or another is equal to the sum of their separate probabilities
Density – height of the curve at different values of X
Combinatorics – branch of mathematics that deals with the number of ways that objects can be combined together
Permutation – the ordering of items
Combinations – no consideration of order, simply the combination of items
Binomial distribution – distribution for trials that result in one of two mutually exclusive outcomes
Sampling distribution of the mean – the distribution of the sample means
Central limit theorem – given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean (the distribution of the sample means) will have a mean equal to $\mu$ (i.e. $\mu_{\bar{X}} = \mu$), a variance ($\sigma^2_{\bar{X}}$) equal to $\sigma^2/n$, and a standard deviation ($\sigma_{\bar{X}}$) equal to $\sigma/\sqrt{n}$. The distribution will approach the normal distribution as n, the sample size, increases.
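The central limit theorem can be checked by simulation. The sketch below assumes a uniform population on 0 to 100 (mean 50, variance 100²/12) and samples of size 25; these values and the function name are chosen only for illustration.

```python
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from Uniform(0, 100); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.uniform(0, 100) for _ in range(n))
        for _ in range(reps)
    ]

n = 25
means = simulate_sample_means(n)

population_mean = 50.0
population_variance = 100 ** 2 / 12
expected_se = (population_variance / n) ** 0.5  # sigma / sqrt(n)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)
print(mean_of_means, sd_of_means, expected_se)
```

The mean of the simulated sample means should land close to 50 and their standard deviation close to $\sigma/\sqrt{n}$, even though the population itself is flat rather than normal.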

Uniform distribution – every value in the range (e.g. between 0 and 100) is equally likely
Standard error – standard deviation of a sampling distribution
Matched samples –
Repeated measures – subjects respond on two (or more?) occasions
Related samples (correlated samples, paired samples, dependent samples) – repeated measures?
Matched-sample t test – test to assess difference between two means from a matched sample



Difference scores – difference between X1 and X2 for each subject (or matched pair), compared across subjects
Sampling distribution of difference between means – when pairs of sample means are drawn independently, the variance sum law gives the variance of the distribution of their differences
Variance sum law – the variance of a sum or difference of two independent variables is equal to the sum of their variances
Standard error of difference between means – $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$



Weighted average – sample variances are weighted by their degrees of freedom ($n_i - 1$)
Pooled variance estimate – weighted average of two sample variances
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Effect size – commonly associated with d
$d = \frac{\mu_1 - \mu_2}{\sigma}$

point estimate – specific estimate of a parameter

interval estimates – limits set to encompass the true (population) value of the mean
confidence limits – limits enclosing a confidence interval
confidence interval – a probabilistic interval that’s likely to contain $\mu$ given the data on hand
homogeneity of variance – $\sigma_1^2 = \sigma_2^2 = \sigma^2$
heterogeneous variance – $\sigma_1^2 \neq \sigma_2^2$

robust – test is relatively unaffected by moderate departures from the underlying assumptions

Chapter 9 – Correlation and Regression
random variable – variable beyond experimental control
fixed variable – determined by experimenter
linear regression models
bivariate normal models
scatterplot (scatter diagram, scattergram) – representation of each subject by a point in two-dimensional space, where the coordinates (X, Y) are the individual’s scores on variables X and Y
predictor – variable represented on the abscissa (x-axis); variable from which predictions are made
criterion – variable represented on the ordinate (y-axis); variable that is predicted
regression lines – prediction of Yi for a given value of Xi, for the ith subject or observation
correlation (r) – degree to which the actual values of Y agree with the predicted values; degree to which points cluster around the regression line
pearson product-moment correlation coefficient – $r = \frac{cov_{XY}}{s_X s_Y}$


correlation coefficient in the population – $\rho$ (rho)
adjusted correlation coefficient ($r_{adj}$) – $r_{adj} = \sqrt{1 - \frac{(1 - r^2)(N - 1)}{N - 2}}$


slope – amount of difference in Y associated with a one-unit difference in X
intercept – value of Y when X = 0

normal equations – $a = \bar{Y} - b\bar{X}$, $b = \frac{cov_{XY}}{s_X^2}$
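The correlation and regression formulas in this chapter can be worked through on a small data set. The sketch below uses hypothetical data and function names; the `max(0.0, ...)` guard in the adjusted correlation is an addition for this example, so that very small r with small N does not produce the square root of a negative number.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over N - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def fit_line(xs, ys):
    """Return intercept a, slope b, and Pearson r."""
    cov_xy = covariance(xs, ys)
    var_x = covariance(xs, xs)  # covariance of X with itself is its variance
    var_y = covariance(ys, ys)
    b = cov_xy / var_x                             # b = cov_XY / s_X^2
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)  # a = mean(Y) - b * mean(X)
    r = cov_xy / math.sqrt(var_x * var_y)          # r = cov_XY / (s_X * s_Y)
    return a, b, r

def adjusted_r(r, n):
    """Adjusted correlation coefficient; the guard against negatives is an assumption."""
    return math.sqrt(max(0.0, 1 - (1 - r ** 2) * (n - 1) / (n - 2)))

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
a, b, r = fit_line(x, y)
print(a, b, r, adjusted_r(r, len(x)))
```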

standardized regression coefficient – slope coefficient for standardized data

sum of squares of Y ($SS_Y$) = $\sum (Y - \bar{Y})^2$


standard error of estimate – $s_{Y \cdot X}$; standard deviation of the errors made when Y is predicted from X (error of prediction)
residual variance (error variance) – $s^2_{Y \cdot X}$; unbiased estimate of the corresponding parameter ($\sigma^2_{Y \cdot X}$)

conditional distribution – sets of Ys corresponding to a specific X; distribution of Y scores for those cases that meet a certain condition with respect to X
proportional reduction in error (PRE) – $r^2 = \frac{SS_Y - SS_{residual}}{SS_Y}$, where $SS_{residual} = \sum (Y - \hat{Y})^2$


proportional improvement in prediction (PIP) – reduction in the size of the standard error
$PIP = 1 - \sqrt{1 - r^2}$
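PRE and PIP can both be read off the sums of squares of a fitted line. The sketch below uses hypothetical data and a made-up function name; it assumes simple regression with one predictor, so the standard error of estimate divides the residual sum of squares by N - 2.

```python
import math

def prediction_summary(xs, ys):
    """Fit Y on X by least squares and return r squared (PRE), s_Y.X, and PIP."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sum_products / ss_x  # same as cov_XY / s_X^2, since N - 1 cancels
    a = mean_y - b * mean_x
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = (ss_y - ss_residual) / ss_y  # proportional reduction in error
    return {
        "r_squared": r_squared,
        "std_error_estimate": math.sqrt(ss_residual / (n - 2)),
        # Guard against tiny negative values from rounding when the fit is perfect.
        "pip": 1 - math.sqrt(max(0.0, 1 - r_squared)),
    }

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
result = prediction_summary(x, y)
print(result)
```

With these numbers r² is .60 but PIP is only about .37, which shows why the reduction in the standard error looks less impressive than the reduction in squared error.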

array – residual variance of Y conditional on a specific X

homogeneity of variance in arrays – assumption that the variance of Y for each value of X is constant
normality in arrays – values of Y corresponding to any specified value of X are normally distributed around $\hat{Y}$
conditional array – each Y for Xi, or the Ys that correspond to each specific X

conditional distributions – distribution of Y conditional upon a specific value of X
marginal distribution – distribution of all values of Y (or X) regardless of X (or Y)
assumption of linearity of regression – the relationship between X and Y is linear; the line that best fits the data is a straight line
range restrictions – alter the correlation between X and Y in comparison to what the correlation would have been had the range not been restricted
heterogeneous subsamples –

Chapter 15 – Multiple Linear Regression
Validities – correlation of each predictor with the criterion
Collinearity – correlation of a variable with several other predictors
Regression coefficients –
Standardized regression coefficients – equal standard deviation, unit differences are comparable, intercept equals zero

Residual variance (residual error) – $MS_{residual} = MS_{error} = \frac{\sum (Y - \hat{Y})^2}{N - p - 1}$

Multivariate normal – joint distribution of multiple variables (extension to multiple variables of the bivariate normal distribution described in chapter 9)
Multiple correlation coefficient – correlation between the criterion and the best linear combination of the predictors
Hyperspace – multidimensional space
Regression surface – analog of the regression line or plane
Partial correlation – correlation between two variables with one or more variables partialed out of both X and Y
Semipartial correlation – correlation between the criterion and a partialed predictor variable
Suppressor variable – a regression coefficient in this situation that is significantly negative
Multivariate outliers –
Distance – identifies potential outliers in the dependent variable
Leverage – identifies potential outliers in the independent variable
Influence – combines distance and leverage to identify unusually influential observations
Influential – an observation is influential if the regression surface would change markedly depending on its presence or absence
Cook’s D – measure of influence; function of the sum of the squared changes in bj that would occur if the ith observation were removed from the data and the analysis rerun

Studentized residuals – residuals that can be interpreted as standard t statistics on (N-p-1) degrees of freedom
Tolerance – degree to which one predictor can itself be predicted by other predictors in the model
Cross-correlation – correlation between one predictor and all other predictors
Singular covariance – one predictor can be perfectly predicted from the others
Variance inflation factor (VIF) – degree to which the standard error of bj is increased because Xj is correlated with the other predictors
All subsets regression – looks at all possible subsets of the predictor variables and chooses the set that is optimal in some way (such as maximizing R² or minimizing the mean square error)
Backward elimination – proceeds in logical stepwise fashion: model includes all predictors; remove the variable that contributes least to the model; rerun regression without that predictor; find variable with smallest contribution; remove; continue
Stepwise regression – reverse of backward elimination; variables are added and tested for their contribution to R until the addition of further variables produces no significant improvement
Forward selection – similar to stepwise regression, but variables are not removed before the addition of another variable (removal in stepwise regression is based on whether a variable tests below or above the “F to remove” criterion)
Listwise deletion – deletion of an entire case based on the lack of a single observation within that case
Pairwise deletion –
Multicollinearity – variables are highly correlated
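Tolerance and VIF are easiest to see with only two predictors, where the squared multiple correlation of one predictor with "the others" reduces to the squared correlation between the two. The sketch below relies on the standard definitions tolerance = 1 - R²j and VIF = 1 / tolerance, which the notes above describe only in words; the data and function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors.

    With two predictors, R squared for predicting one from the other is r12 squared.
    Fails with a division by zero if the predictors are perfectly correlated.
    """
    r12 = pearson_r(x1, x2)
    tolerance = 1 - r12 ** 2
    return tolerance, 1 / tolerance

x1 = [1, 2, 3, 4, 5]  # hypothetical predictor 1
x2 = [2, 4, 5, 4, 5]  # hypothetical predictor 2
tolerance, vif = tolerance_and_vif(x1, x2)
print(tolerance, vif)
```

A perfectly predictable predictor has tolerance 0, which is the singular case listed above; the division by zero in this sketch is the numerical symptom of that problem.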
