3 Types of Distributions
1. Population
– Distribution of all points in the population
2. Sample Data / Data
– Distribution of one particular sample
3. Sampling Distribution
– Distribution of the means of all possible samples of a given size n
Means Problem
Distribution Shapes
• Sample Data (Data Distribution) has roughly the same shape as the population
– If population is skewed, so is the sample data
– If population is normal, so is the sample data
Sampling Distribution (HW 6.4-6.6)
• 16 subjects are from a population skewed right, with mean 40 and s.d. 8.
• Shape of Sample Data: Skewed right (population)
• Mean of Sampling Distribution: 40 (population)
• St. Dev. of Sampling Distribution: 8 / sqrt(16) = 2
• Shape of Sampling Distribution (and why): No conclusion (too small a sample, and population shape isn’t normal)
• Sampling Distribution is normal if…
– Population is normal, or…
– n > 30 (approximately normal, by the Central Limit Theorem)
– Otherwise no conclusion about shape
Sampling Distribution (HW 6.4-6.6)
• 100 subjects are from a population skewed left, with mean 40 and s.d. 8.
• Shape of Sample Data: Skewed left (population)
• Mean of Sampling Distribution: 40 (population)
• St. Dev. of Sampling Distribution: 8 / sqrt(100) = .8
• Shape of Sampling Distribution (and why): Approximately normal (Central Limit Theorem, since n = 100 > 30)
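The shape, center, and spread rules for the sampling distribution of the mean can be collected into one function. This is a minimal Python sketch: the name `sampling_distribution_of_mean` and its `population_normal` flag are hypothetical, and the n > 30 cutoff is the rule of thumb from these notes rather than a hard law.

```python
import math


def sampling_distribution_of_mean(mu: float, sigma: float, n: int,
                                  population_normal: bool) -> tuple[float, float, str]:
    """Return (mean, standard error, shape) of the sampling distribution of x-bar.

    Hypothetical helper that applies the rules from these notes.
    """
    mean = mu                              # the center does not depend on n
    standard_error = sigma / math.sqrt(n)  # spread shrinks as n grows
    if population_normal:
        shape = "normal"
    elif n > 30:                           # Central Limit Theorem rule of thumb
        shape = "approximately normal (CLT)"
    else:
        shape = "no conclusion"
    return mean, standard_error, shape


# The two examples above: skewed populations with mean 40 and s.d. 8.
print(sampling_distribution_of_mean(40, 8, 16, population_normal=False))
print(sampling_distribution_of_mean(40, 8, 100, population_normal=False))
```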
Mean & Standard Error Properties
As the sample size n increases…
• The mean of the sampling distribution does not change.
• The standard error (s.d. of the sampling distribution) decreases.
• Example: 8 / sqrt(16) = 2 > 8 / sqrt(100) = .8 (larger denominator, smaller overall fraction)
Similarly, as the sample size decreases...
• The mean of the sampling distribution does not change.
• The standard error increases (the opposite).
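A quick simulation makes the point concrete: the center of the sample means stays put while their spread shrinks like σ/√n. This sketch assumes a right-skewed exponential population with mean 40 (so its s.d. is also 40). The helper name, the 5000 repetitions, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_sample_means(n: int, reps: int = 5000, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from a skewed population and return their means.

    Assumption: the population is exponential with mean 40 (s.d. also 40).
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1 / 40) for _ in range(n))
            for _ in range(reps)]


# Columns: n, mean of the sample means, s.d. of the sample means, theory 40 / sqrt(n)
for n in (4, 16, 100):
    means = simulate_sample_means(n)
    print(n, round(statistics.fmean(means), 2),
          round(statistics.stdev(means), 2), round(40 / n ** 0.5, 2))
```

The second column should hover near 40 for every n, while the third column should track the fourth.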
Distributions (HW 6.4-6.6B)
• The size of a badger colony follows a skewed left distribution with mean 20 and s.d. 4. A sample of 36 colonies is selected, and this sample has mean 16 and s.d. 5.5.
• What are the center and spread for the population?
µ = 20, σ = 4
• What are the center and spread for the sampling distribution with size 36?
µ = 20, $\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{36}} = .66667$
• What shape is the sampling distribution? Approximately normal, since n > 30 (CLT)
• Suppose now we adjust the sample size, and the new resulting sampling distribution has a standard error of 1. Did we use a larger or smaller sample size? Smaller, since standard error increases with a smaller sample size
• How will the mean of the new sampling distribution change? It stays the same: it does not depend on sample size
• What are the center and spread for the sample data?
x̄ = 16, s = 5.5
• What shape is the sample data? Skewed left, same as the population
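Here is the spread calculation, plus one step the slide only answers qualitatively. Solving σ/√n = 1 for n shows how much smaller the new sample must be. This is a minimal sketch, and `target_se` is a hypothetical name.

```python
import math

sigma = 4        # population s.d.
n = 36
standard_error = sigma / math.sqrt(n)      # 4 / 6

# Solve sigma / sqrt(n) = target_se for n:  n = (sigma / target_se) ** 2
target_se = 1
n_for_target = (sigma / target_se) ** 2

print(round(standard_error, 5))   # 0.66667
print(n_for_target)               # 16.0, smaller than 36
```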
StatCrunch (HW 6.4-6.6A)
• The average household temperature in Vancouver is 67.6 degrees, and the s.d. is 4.2. A sample of 51 households is selected.
• What’s the probability the average of this sample will be above 68.1? Fill in the boxes.
• Sampling distribution with n = 51
– Mean = 67.6 (population’s)
– S.D. = standard error = 4.2 / sqrt(51) = .58812
• What’s the probability the average of the sample will be within 1.5 degrees of the population mean?
• Hint: draw a sketch!
– Each tail beyond 67.6 ± 1.5 has probability .00538, so the middle area is 1 - 2(.00538) = .98924
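Python's standard library can stand in for StatCrunch's normal calculator here. This sketch uses `statistics.NormalDist` (Python 3.8+) and treats the sampling distribution as normal because n = 51 > 30. The within-1.5 result should agree with .98924 up to rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 67.6, 4.2, 51
se = sigma / math.sqrt(n)                      # standard error
sampling = NormalDist(mu, se)                  # approximately normal since n > 30

p_above = 1 - sampling.cdf(68.1)               # P(x-bar > 68.1)
p_within = sampling.cdf(mu + 1.5) - sampling.cdf(mu - 1.5)

print(round(se, 5), round(p_above, 4), round(p_within, 5))
```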
Proportions Problem
Computing Square Roots
• It is very important to type in square roots on your calculator correctly, especially with the proportions standard errors
• Example: compute $\sqrt{\frac{.20(1-.20)}{64}}$
• One safe way is to first compute .20(1 - .20) / 64 = .0025, then square root the result: sqrt(.0025) = .05
• Otherwise, use parentheses wisely
– Correct: sqrt(.20*(1 - .20)/64) = .05
– Correct: sqrt(.20*.80/64) = .05
– Incorrect: sqrt(.20)(1 - .20)/64 = .00559
– Incorrect: sqrt(.20(1 - .20)) / 64 = .00625
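The same four calculator entries, typed as Python expressions, show how the placement of parentheses changes the answer. The sketch uses only the standard `math` module.

```python
import math

p, n = 0.20, 64

correct = math.sqrt(p * (1 - p) / n)        # whole fraction under the root
also_correct = math.sqrt(0.20 * 0.80 / 64)
wrong_1 = math.sqrt(p) * (1 - p) / n        # only p under the root
wrong_2 = math.sqrt(p * (1 - p)) / n        # n left outside the root

print(round(correct, 5), round(also_correct, 5), round(wrong_1, 5), round(wrong_2, 5))
```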
Proportions (HW 6.4-6.6A)
• 60% of students at an academy in Vancouver are female. In a random sample of 55 students, 26 of them are female.
• Let 1 = female and 0 = male.
• Identify the population distribution of gender.
X | P(X)
1 | .60
0 | .40 = 1 - .60
• Identify the data distribution of gender.
X | P(X)
1 | 26 / 55 = .47273
0 | 1 - .47273 = .52727
• What is the mean & standard error of the sampling distribution of the sample proportion?
• Is the sampling distribution approximately normal? np = 55(.60) = 33 and n(1 - p) = 55(.40) = 22, both greater than 15, so yes
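A short sketch for the proportions example. It also computes the mean and standard error asked for above, using mean = p and standard error sqrt(p(1 - p)/n) with p = .60 and n = 55. The > 15 check follows the rule of thumb in the Proportions Summary of these notes.

```python
import math

p, n = 0.60, 55              # population proportion and sample size
mean = p                     # mean of the sampling distribution of p-hat
se = math.sqrt(p * (1 - p) / n)
large_enough = n * p > 15 and n * (1 - p) > 15   # rule of thumb from these notes

p_hat = 26 / 55              # data distribution: proportion female in this sample
print(mean, round(se, 5), large_enough, round(p_hat, 5))
```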
Notation
We use different letters for population parameters versus sample statistics.
What symbol is used to denote the population mean?
µ (population mean)
Population: µ = population mean, σ = population st. dev., p = population proportion
Sample: x̄ = sample mean, s = sample st. dev., p̂ = sample proportion
Know the differences among these!
What symbol is used to estimate the population proportion?
p̂ (the sample proportion estimates p)
What symbol is used to describe the spread in one sample? s (sample standard deviation)
Empirical Rule
• Quick Flashback to Test 1…
• The population mean is 55, and s.d. is 12. Find an interval within which about 95% of the population will fall.
• Lower Limit: 55 – 2(12) = 31
• Upper Limit: 55 + 2(12) = 79
• So our interval would be (31, 79)
Empirical Rule (HW 6.4-6.6A)
• Same idea, but with a normal sampling distribution: the mean is .56 and the standard error is .0086. Find an interval within which the sample mean will almost certainly fall.
• Recall: standard error is the s.d. of the sampling distribution
• “Almost certainly” means about 99.7%, so go out 3 standard errors
• Lower Limit: .56 – 3(.0086) = .5342
• Upper Limit: .56 + 3(.0086) = .5858
• So the interval is (.5342, .5858)
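Both empirical-rule intervals have the form center ± k·spread, with k = 2 (about 95%) or k = 3 (about 99.7%). Here is a minimal sketch with a hypothetical helper name. Floating-point output for the second call may show a few extra trailing digits.

```python
def empirical_interval(center: float, spread: float, k: int) -> tuple[float, float]:
    """Interval center ± k*spread; k = 1, 2, 3 covers about 68%, 95%, 99.7% of a normal distribution."""
    return center - k * spread, center + k * spread


print(empirical_interval(55, 12, 2))        # population example: (31, 79)
print(empirical_interval(0.56, 0.0086, 3))  # sampling distribution example
```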
Confidence Intervals
• Calculate the sample mean/proportion in your sample
• Point Estimate = sample mean/proportion
• Calculate the width (margin of error), based on level of confidence and standard error
• You get a range of plausible values for the true population mean/proportion
C.I. for Proportions
Properties of a C.I.
• The sample proportion/mean is ALWAYS inside the confidence interval!
• In fact, it’s always right in the center
• The population proportion/mean may or may not be inside the confidence interval
Interpretation of a C.I.
• Any number falling within a confidence interval is a plausible value for the true population mean, while any number outside the interval is not.
• Example: a 98% interval is (60, 75)
– With 98% confidence, the population mean is somewhere between 60 and 75
– We can conclude, with 98% confidence, that the true mean is above any number less than 60
– Similarly, we can conclude that the true mean is below any number greater than 75
– But we cannot rule out any numbers in this interval
Interpretation of a C.I.
• The main interpretation of a 95% interval is as follows:
• We are 95% certain the population mean/proportion is somewhere inside the interval.
– The population mean, while unknown, is fixed
– What changes from sample to sample is the interval
• Warning: It is incorrect to say “The population mean is in the confidence interval 95% of the time.”
– This is because this statement implies that the population mean changes and sometimes happens to be in the interval
– Incorrect because the population mean is fixed
Another Way to Interpret A C.I.
• A 95% C.I. also means that about 95% of all C.I.s constructed contain the true population proportion/mean, and about 5% do not
• A 99% C.I. means that about 99% of all C.I.s constructed contain the true population proportion/mean, and about 1% do not
• Example: 1000 intervals
– At 95%, about 950 (maybe 940-960) contain the true proportion
– At 99%, about 990 (maybe 985-995) contain the true proportion
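The “about 950 out of 1000” claim can be checked by simulation. To keep the sketch short, it uses the simplest case: a normal population with known σ and the z-interval x̄ ± z·σ/√n. The t and proportion intervals used elsewhere in these notes are not simulated here. The population values (µ = 50, σ = 10, n = 40) and the seed are made up for illustration.

```python
import math
import random


def coverage(num_intervals: int = 1000, confidence_z: float = 1.96,
             mu: float = 50.0, sigma: float = 10.0, n: int = 40, seed: int = 7) -> float:
    """Fraction of z-intervals x-bar ± z*sigma/sqrt(n) that contain the true mean mu.

    Assumptions: normal population, known sigma, illustrative parameter values.
    """
    rng = random.Random(seed)
    margin = confidence_z * sigma / math.sqrt(n)   # same width for every interval
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:  # the fixed mean is caught
            hits += 1
    return hits / num_intervals


print(coverage(confidence_z=1.96))   # should land near .95
print(coverage(confidence_z=2.58))   # should land near .99
```

The true mean never moves in this simulation. Only the intervals change, which matches the interpretation above.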
C.I. (HW 7.1-7.2)
• The annual salaries of 100 randomly selected people in Vancouver have a mean of $49,000 and a margin of error of $8000 with 95% confidence.
• Find the point estimate for this sample. 49000
• Construct the 95% C.I.
(49000 - 8000, 49000 + 8000) = (41000, 57000)
We are 95% certain the true salary average falls somewhere between $41000 and $57000.
Determining z
• z depends on the level of confidence
• 95% C.I.: z = 1.96
• 99% C.I.: z = 2.58
• To get these numbers…
– 95%, 5% is left over
– Half of that is 2.5%
– P(Z >= ?) = .025 in StatCrunch
• What about 85%?
– 85%, 15% is left over
– Half of that is 7.5%
– So enter .075 in the right box
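StatCrunch's “P(Z >= ?) = tail” lookup corresponds to an inverse normal CDF. This sketch uses `statistics.NormalDist().inv_cdf`. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """z such that P(Z >= z) = (1 - confidence) / 2 for a standard normal Z."""
    tail = (1 - confidence) / 2          # e.g. 95% -> 5% left over -> 2.5% per tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.95, 0.99, 0.85):
    print(level, round(z_critical(level), 2))
```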
Proportions C.I. (HW 7.1-7.2)
• A random sample of 200 people was asked if they believed in the Loch Ness Monster. 160 said yes.
• Find a point estimate for the proportion of people who said yes.
$\hat{p} = \frac{160}{200} = .8$
• Find the standard error.
$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{.8(1-.8)}{200}} = .02828$
• Find the margin of error for a 95% C.I.
95%, so z = 1.96: $1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96(.02828) = .05544$
• Construct the 95% C.I.
$(.8 - .05544, .8 + .05544) = (.74456, .85544)$
• Can we conclude that more than 70% of all people believe in the Loch Ness Monster? Yes, because .70 is below this interval
• How about less than 88%? Yes, because .88 is above this interval
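The Loch Ness steps can be written as a single function. This sketch applies the same p̂ ± z·sqrt(p̂(1 - p̂)/n) formula used above. `proportion_ci` is a hypothetical name, and z defaults to 1.96 for 95%.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Return (p_hat, standard error, margin of error, (lower, upper))."""
    p_hat = successes / n                         # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error
    margin = z * se                               # margin of error
    return p_hat, se, margin, (p_hat - margin, p_hat + margin)


p_hat, se, margin, (low, high) = proportion_ci(160, 200)
print(p_hat, round(se, 5), round(margin, 5), (round(low, 5), round(high, 5)))
print(low > 0.70, high < 0.88)   # .70 is below the interval, .88 is above it
```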
Proportions C.I. (HW 7.1-7.2)
• Same problem: now suppose another, different sample, also of size 200, is taken, and the confidence interval from this new sample is (.76, .80).
• If possible, find the population proportion and the new sample proportion.
– We can’t compute the population proportion: it’s unknown
– The sample proportion is in the center: $\hat{p} = \frac{.76 + .80}{2} = .78$
• What is the new margin of error? The distance between the center and an endpoint: .80 - .78 = .02
• Compare this interval with our earlier 95% interval, which was (.74456, .85544). Is this new interval more likely a 91% or a 98% interval? 91%, since it is shorter, yet the sample size is the same
C.I. Properties
• Increasing the level of confidence (z) widens the interval
• Decreasing the level of confidence (z) shortens the interval
$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Intuition: narrowing your field for the true proportion means you’re not as certain it really does fall inside the interval
C.I. Properties
• Increasing the sample size shortens the C.I.
• Decreasing the sample size widens the C.I.
• This is because standard error decreases as n increases, so the margin of error (width) decreases as well: $\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Intuition: a larger sample size gives a more accurate estimate and allows you to zero in on the true proportion.
Summary of C.I. Width Factors
Confidence Level (z)
• As z increases, C.I. widens
• As z decreases, C.I. shortens
Sample Size (n)
• As n increases, C.I. shortens
• As n decreases, C.I. widens
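To see both width factors at once, the sketch below holds p̂ = .8 fixed and changes one factor at a time. It reads z from the normal distribution via `statistics.NormalDist`. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """z * sqrt(p_hat(1 - p_hat)/n), with z read from the standard normal distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Vary the confidence level with n = 200 fixed.
for level in (0.90, 0.95, 0.99):
    print("confidence", level, round(margin_of_error(0.8, 200, level), 4))

# Vary the sample size with 95% confidence fixed.
for n in (50, 200, 800):
    print("n", n, round(margin_of_error(0.8, n, 0.95), 4))
```

Quadrupling n from 200 to 800 halves the margin of error, because n sits under a square root.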
• Assumptions for proportion C.I.:
1. Sample is randomly selected 2. 3.
Proportions C.I. in StatCrunch
• Stat > Proportions > One Sample > With Summary
• # Successes and # Observations
• C.I., level, Standard-Wald, Calculate
• Tells you the C.I. and standard error
• Does not tell you the margin of error
C.I. with Means
• Same general idea: point estimate ± margin of error
• But we have a different formula: $\bar{x} \pm t\frac{s}{\sqrt{n}}$
• With proportions, use z
• With means, use t
C.I. for Means
The T Calculator
• Only new feature: degrees of freedom
• DF = n – 1
• Same strategy as before: with 95%...
• 5% left over, and half of that is 2.5%
• P(X >= ?) = .025 with 18 observations (DF = 17)
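Python's standard library has no t distribution, so this sketch approximates StatCrunch's T calculator numerically. Simpson's rule integrates the t density, and bisection searches for the cutoff with the requested upper-tail area. The helper names, step count, and search bracket are assumptions. With SciPy installed, `scipy.stats.t.ppf(1 - tail, df)` returns the same value directly.

```python
import math


def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T >= x) for x >= 0: 0.5 minus the integral of the t density over [0, x]."""
    # lgamma avoids overflow of the gamma function for large df
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t: float) -> float:
        return coef * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps                       # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_critical(tail: float, df: int) -> float:
    """Solve P(T >= t) = tail for t by bisection, like StatCrunch's P(X >= ?) box."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail:
            lo = mid                    # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.025, 17), 3))   # 95% C.I. with 18 observations
print(round(t_critical(0.025, 2), 3))    # 95% C.I. with 3 observations
```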
C.I. with Means
• T-values change as degrees of freedom change (unlike the normal calculator)
• Degrees of freedom = n – 1 ALWAYS
• Assumptions for doing C.I. for means:
– Random sample
– One of these two should be true:
• Sampling from a normal population
• n > 30
C.I. with Means (HW 7.3-7.4)
• 480 people responded to a question on how many children they have:
– Mean = 3
– Median = 2
– S.D. = 1.78
• Find the point estimate for the sample.
• Find the standard error of the sample mean.
• Suppose the 95% C.I. was (2.84, 3.16). Choose an answer: With 95% confidence, the true mean lies (above, below, within) this interval.
• Is it plausible that the true population mean is 2? Why or why not? No, because 2 is below the interval, not inside it
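The point estimate and standard error asked for above follow from the formulas in these notes: x̄ = 3 and s/√n. This sketch computes them and also checks the interval's center and whether 2 is a plausible value.

```python
import math

n, mean, s = 480, 3, 1.78
point_estimate = mean                 # x-bar estimates µ
standard_error = s / math.sqrt(n)

low, high = 2.84, 3.16                # the 95% C.I. given above
print(point_estimate, round(standard_error, 5))
print(round((low + high) / 2, 5))     # the center of the interval is the point estimate
print(low <= 2 <= high)               # False: 2 is not a plausible value
```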
C.I. Means (HW 7.3-7.4)
• New Problem: The starting salaries of a sample are $34000, $44000, $54000.
• Mean = 44000, S.D. = 10000
• Find the point estimate. 44000
• Find the standard error. $\frac{s}{\sqrt{n}} = \frac{10000}{\sqrt{3}} = 5773.50269$
• How many degrees of freedom? 3 - 1 = 2
• Find the margin of error for a 95% C.I. (t = 4.30265)
$t\frac{s}{\sqrt{n}} = (4.30265)(5773.50269) = 24841.36135$
• Find the 95% C.I.
$\bar{x} \pm t\frac{s}{\sqrt{n}} = 44000 \pm 24841.36135 = (19158.63865, 68841.36135)$
• What effect will increasing the sample size have on the 95% interval? A larger sample size will make the interval narrower
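Here is the whole salary calculation in code. `statistics.stdev` uses the n - 1 divisor, which matches the sample s.d. of 10000. The t value 4.30265 is taken from the notes rather than computed.

```python
import math
import statistics

salaries = [34000, 44000, 54000]
n = len(salaries)
x_bar = statistics.mean(salaries)          # point estimate
s = statistics.stdev(salaries)             # sample s.d. (divides by n - 1)
se = s / math.sqrt(n)                      # standard error
t = 4.30265                                # 95% with df = n - 1 = 2, from the notes
margin = t * se

print(x_bar, s, round(se, 5), round(margin, 5))
print((round(x_bar - margin, 5), round(x_bar + margin, 5)))
```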
C.I. Means (StatCrunch)
• With data: enter data in one column
• Stat > T-Statistics > One Sample > With Data
• Select var1, Next
• C.I., level of confidence, calculate
C.I. Means (StatCrunch)
• Stat > T-Statistics > One Sample > With Summary
• Enter sample mean, s.d., sample size
• C.I., level of confidence, calculate
Choosing Sample Size
• Idea: We have a given confidence level and a desired margin of error
• What sample size is needed to achieve that?
• Formula is different for proportions and means (see formula sheet)
Sample Size Needed Formulas
Proportions: $n = \frac{z^2\hat{p}(1-\hat{p})}{m^2}$
Means: $n = \frac{z^2 s^2}{m^2}$
n = sample size needed
z = z-score for confidence level
s = sample standard deviation
m = desired margin of error
• What do we choose for the sample proportion p̂?
1. Proportion of a previous study
2. If nothing is known, p̂ = .50
Sample Size (HW 7.3-7.4)
• We are interested in the proportion of students at a college that are affiliated with Greek life. We want to estimate it with probability .95 and within 0.06.
• What sample size do we need, if no previous study is known?
p̂ = .5, m = .06, z = 1.96 (95% confidence)
$n = \frac{z^2\hat{p}(1-\hat{p})}{m^2} = \frac{(1.96)^2(.5)(1-.5)}{(.06)^2} = 266.77778 \approx 267$
Sample Size (HW 7.3-7.4)
• Now suppose we’re told that a previous study at Harvard said that 20% of students are involved with Greek life.
• Before doing any calculations, will the required sample size for a margin of error of .06 be larger or smaller than before? Smaller, since p̂ = .50 is the “worst case scenario.” Any other choice will yield a smaller needed sample size.
• Find the new sample size, again with probability .95.
p̂ = .20, m = .06, z = 1.96 (95% confidence)
$n = \frac{z^2\hat{p}(1-\hat{p})}{m^2} = \frac{(1.96)^2(.20)(1-.20)}{(.06)^2} = 170.73778 \approx 171$
Round up, since sample size has to be a whole number
• All other things the same, if we instead estimate with probability .98, will we need more or fewer subjects? This will use a higher number for z (from the normal calculator), so the numerator is bigger and therefore the whole fraction is bigger. Thus, we would need more subjects.
• What is the advantage to knowing a previous study proportion? We get to use a different choice for p̂, which won’t require as large a sample size.
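Both Greek-life calculations use n = z²p̂(1 - p̂)/m², rounded up. This is a sketch with a hypothetical helper name, and `math.ceil` does the rounding up.

```python
import math


def sample_size_proportion(z: float, m: float, p_hat: float = 0.5) -> int:
    """n = z^2 * p_hat * (1 - p_hat) / m^2, rounded up to a whole number of subjects."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / m ** 2)


print(sample_size_proportion(1.96, 0.06))          # no previous study: 267
print(sample_size_proportion(1.96, 0.06, 0.20))    # previous study p_hat = .20: 171
```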
Sample Size (HW 7.3-7.4)
• We are estimating the average number of acres on a farm to within 23. In a previous study, the sample s.d. was 202 acres.
• Find the sample size needed. m = 23, s = 202
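The means formula n = z²s²/m² applies directly to the farm problem. The problem does not state a confidence level, so this sketch assumes 95% (z = 1.96) purely for illustration. A different level would change the answer.

```python
import math


def sample_size_mean(z: float, s: float, m: float) -> int:
    """n = z^2 * s^2 / m^2, rounded up to a whole number of farms."""
    return math.ceil(z ** 2 * s ** 2 / m ** 2)


# Assumption: 95% confidence (z = 1.96); the problem above does not state a level.
print(sample_size_mean(1.96, 202, 23))
```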
• All other things being equal, if we wanted a smaller margin of error, will we need more or fewer subjects? The denominator will be smaller, so the overall number will be bigger. Therefore we would need more subjects.
Proportions Summary
Sampling Distribution
• Mean = p (population)
• S.D. = $\sqrt{\frac{p(1-p)}{n}}$ (standard error)
• Normal if sample size is large: np > 15 and n(1 - p) > 15
Confidence Interval
• Point Estimate = p̂
• Standard Error = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Level of Confidence: use z
• Margin of Error = $z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Lower Limit = $\hat{p} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Upper Limit = $\hat{p} + z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Finding Sample Size
• $n = \frac{z^2\hat{p}(1-\hat{p})}{m^2}$
Means Summary
Sampling Distribution
• Mean = µ (population)
• S.D. = $\frac{\sigma}{\sqrt{n}}$ (standard error)
• Normal if population’s normal, or n > 30
Confidence Intervals
• Point Estimate = x̄
• Standard Error = $\frac{s}{\sqrt{n}}$
• Level of confidence depends on t
• Margin of Error = $t\frac{s}{\sqrt{n}}$
• Lower Limit = $\bar{x} - t\frac{s}{\sqrt{n}}$
• Upper Limit = $\bar{x} + t\frac{s}{\sqrt{n}}$
Finding Sample Size
• $n = \frac{z^2 s^2}{m^2}$