Confidence Interval Estimation


Definition
A point estimate of a parameter is the value of a statistic that is used to estimate the parameter.
A confidence-interval estimate of a parameter consists of an interval of numbers obtained from a point estimate of the parameter, together with a percentage that specifies how confident we are that the parameter lies in the interval. This percentage is called the confidence level.
Interval Estimation
• Confidence interval estimation for the mean (σ known and unknown)
• Confidence interval estimation for the proportion
• Sample size estimation

Estimation Process
A random sample is drawn from a population whose mean µ is unknown. The sample mean is $\bar{X} = 50$, which leads to a statement such as: "I am 95% confident that µ is between 40 and 60."
Population Parameters Estimated

Estimate population parameter...   with sample statistic
Mean, $\mu$                         $\bar{X}$
Proportion, $p$                     $p_s$
Variance, $\sigma^2$                $s^2$
Difference, $\mu_1 - \mu_2$         $\bar{X}_1 - \bar{X}_2$

Confidence Interval Estimation
• Provides a range of values
• Based on observations from 1 sample
• Gives information about closeness to the unknown population parameter
• Stated in terms of probability
• Never 100% sure
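Elements of Confidence Interval Estimation
A probability that the population parameter falls somewhere within the interval.
[Figure: the confidence interval runs from the lower confidence limit to the upper confidence limit, centred on the sample statistic.]

Confidence Limits for Population Mean
Parameter = Statistic ± Its Error
$$\mu = \bar{X} \pm \text{Error}, \qquad \text{Error} = |\bar{X} - \mu|$$
$$Z = \frac{|\bar{X} - \mu|}{\sigma_{\bar{X}}} = \frac{\text{Error}}{\sigma_{\bar{X}}} \quad\Rightarrow\quad \text{Error} = Z\,\sigma_{\bar{X}}$$
$$\mu = \bar{X} \pm Z\,\sigma_{\bar{X}}$$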
Confidence Intervals
$$\bar{X} \pm Z\,\sigma_{\bar{X}} = \bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$$
[Figure: sampling distribution of $\bar{X}$ centred on µ. 90% of sample means fall within $\mu \pm 1.645\,\sigma_{\bar{X}}$, 95% within $\mu \pm 1.96\,\sigma_{\bar{X}}$, and 99% within $\mu \pm 2.58\,\sigma_{\bar{X}}$.]

Level of Confidence
• Probability that the unknown population parameter falls within the interval
• Denoted $100(1 - \alpha)\%$ = level of confidence, e.g. 90%, 95%, 99%
• α is the probability that the parameter is not within the interval
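The critical values 1.645, 1.96 and 2.58 are upper $\alpha/2$ quantiles of the standard normal distribution. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, recovers $z_{\alpha/2}$ for each confidence level; the helper name `z_critical` is hypothetical. Note that the slides round 2.576 to 2.58 here (and to 2.575 in a later example).

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    # Upper alpha/2 quantile of the standard normal distribution
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```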
Procedure
Assumptions:
• Normal population or large sample;
• σ is known.
Step 1: for a confidence level of 1 − α, find $z_{\alpha/2}$.
Step 2: the confidence interval for µ is
$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},$$
where n is the sample size. The interval is exact for a normal population and approximately correct for large samples from non-normal populations.

Facts
For small samples, say, of size less than 15, the z-interval procedure should be used only when the variable under consideration is normally distributed or very close to being so.
For moderate-size samples, say, between 15 and 30, the z-interval procedure can be used unless the data contain outliers or the variable is far from being normally distributed.
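The two-step z-interval procedure can be sketched as follows, together with a small simulation that checks the claim that the interval is exact for a normal population: with σ known and normal data, roughly 95% of the simulated 95% intervals should contain µ. The function names and the population values (µ = 50, σ = 10, n = 20, 2000 repetitions) are hypothetical illustrations, not taken from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(xbar, sigma, n, confidence=0.95):
    """z-interval for mu when sigma is known (normal population or large n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Step 1: z_{alpha/2}
    h = z * sigma / sqrt(n)                               # margin of error
    return xbar - h, xbar + h                             # Step 2: the interval


def coverage(mu=50.0, sigma=10.0, n=20, confidence=0.95, reps=2000, seed=1):
    """Share of simulated normal samples whose z-interval contains mu (hypothetical setup)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(fmean(sample), sigma, n, confidence)
        hits += lo <= mu <= hi  # True counts as 1
    return hits / reps


print(z_interval(50, 10, 25))
print(coverage())
```

For instance, `z_interval(50, 10, 25)` gives roughly (46.08, 53.92), and the simulated coverage lands close to 0.95.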
Facts
For large samples, say, of size 30 or more, the z-interval procedure can be used essentially without restriction. However, if outliers are present and their removal is not justified, the effect of the outliers on the confidence interval should be examined by comparing the confidence intervals obtained with and without the outliers. If the effect is substantial, it is probably best to use a different procedure or take another sample.
If outliers are present but their removal is justified and results in a data set for which the z-interval procedure is appropriate (see above), then the procedure can be used.
Factors Affecting Interval Width
• Data variation, measured by σ
• Sample size: $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$
• Level of confidence, $1 - \alpha$
Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$.

Margin of Error
The margin of error for the mean under the assumptions
• Normal population or large sample;
• σ is known;
is
$$H = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},$$
that is, half of the length of the confidence interval.
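To see how the three factors act on the width, a minimal sketch computes $H$ for a hypothetical σ = 10 at several sample sizes; `margin_of_error` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """H = z_{alpha/2} * sigma / sqrt(n): half the length of the z-interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


# Hypothetical data variation: sigma = 10
for n in (25, 100, 400):
    print(n, round(margin_of_error(10, n), 3))
```

Because n enters through $\sqrt{n}$, quadrupling the sample size halves H, while a larger σ or a higher confidence level widens the interval.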
Confidence Intervals (σ Known)
Assumptions:
• Population standard deviation is known
• Population is normally distributed
• If not normal, use large samples
Confidence interval estimate:
$$\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Confidence Intervals (σ Unknown)
Assumptions:
• Population standard deviation is unknown
• Population must be normally distributed
Use Student’s t distribution.
Confidence interval estimate:
$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
Student’s t Distribution
[Figure: standard normal (Z) density compared with t (df = 13) and t (df = 5), all centred at 0.]
• Bell-shaped
• Symmetric
• 'Fatter' tails than the standard normal

Degrees of Freedom (df)
Number of observations that are free to vary after the sample mean has been calculated.
Example: the mean of 3 numbers is 2.
X1 = 1 (or any number)
X2 = 2 (or any number)
X3 = 3 (cannot vary)
Mean = 2
degrees of freedom = n − 1 = 3 − 1 = 2
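Student’s t Table
Assume n = 3, so df = n − 1 = 2; α = .10, α/2 = .05.

Upper tail area
df    .25     .10     .05
1     1.000   3.078   6.314
2     0.816   1.886   2.920
3     0.765   1.638   2.353

[Figure: t density (df = 2) with upper-tail area α/2 = .05 to the right of t = 2.920; t values on the horizontal axis.]

Interval Estimation, σ Unknown
A random sample of n = 25 has $\bar{X} = 50$ and s = 8. Set up a 95% confidence interval estimate for µ (here $t_{.025,\,24} = 2.0639$).
$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
$$50 - 2.0639\cdot\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\cdot\frac{8}{\sqrt{25}}$$
$$46.70 \le \mu \le 53.30$$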
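Python's standard library has no t-distribution quantile function, so this sketch takes $t_{\alpha/2,n-1}$ as an argument read from a t table (SciPy's `scipy.stats.t.ppf` is one way to compute it, but it is not used here). The function name `t_interval` is hypothetical; the numbers are those of the example above.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """t-interval for mu with sigma unknown; t_crit = t_{alpha/2, n-1} from a t table."""
    h = t_crit * s / sqrt(n)  # half-width
    return xbar - h, xbar + h


# Example above: n = 25, xbar = 50, s = 8, 95% level, t_{.025,24} = 2.0639
lo, hi = t_interval(50, 8, 25, 2.0639)
print(f"{lo:.2f} <= mu <= {hi:.2f}")  # 46.70 <= mu <= 53.30
```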
Procedure
Assumptions:
• Normal population or large sample;
• σ is NOT known.
Step 1: for a confidence level of 1 − α and df = n − 1, find $t_{\alpha/2}$.
Step 2: the confidence interval for µ is
$$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}},$$
where n is the sample size. The interval is exact for a normal population and approximately correct for large samples from non-normal populations.

Project effort
Effort_month   Duration_month   Size_LOC
16.7           23               6050
22.6           15.5             8363
32.2           14               13334
3.9            9.2              5942
17.3           13.5             3315
67.7           24.5             38988
10.1           15.2             38614
19.3           14.7             12762
10.6           7.7              13510
59.5           15               26500

[Figure: Effort QQ plot (Normal Q-Q plot of Effort_month; sample quantiles vs. theoretical quantiles).]

Mean effort: 25.99
Standard deviation: 21.33
In R, qt(0.025, 9) returns −2.262157, so for α = 0.05 (95% confidence) and df = 9 the critical value is t = 2.262157.
$$25.99 \pm 2.262157\cdot\frac{21.33}{\sqrt{10}} = 25.99 \pm 15.26,$$
i.e. approximately $10.73 \le \mu \le 41.25$.
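The summary statistics and interval for the project-effort data can be reproduced with the standard library. This sketch reuses the t value 2.262157 quoted above from R rather than computing it; the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev

# Effort_month column from the "Project effort" table
effort = [16.7, 22.6, 32.2, 3.9, 17.3, 67.7, 10.1, 19.3, 10.6, 59.5]

n = len(effort)
mean = fmean(effort)
s = stdev(effort)          # sample standard deviation (divides by n - 1)
t_crit = 2.262157          # t_{.025, 9}, the magnitude of R's qt(0.025, 9)
h = t_crit * s / sqrt(n)   # half-width of the 95% interval

print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"95% CI: {mean:.2f} +/- {h:.2f} -> [{mean - h:.2f}, {mean + h:.2f}]")
```

With only 10 observations, the QQ plot is the main evidence for the normality assumption behind this t-interval.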
Estimation for Finite Populations
Assumptions:
• Sample is large relative to population: n / N > .05
Use the finite population correction factor.
Confidence interval (mean, $\sigma_X$ unknown):
$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu_X \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

Confidence Interval Estimate: Proportion
Assumptions:
• Two categorical outcomes (faulty/not faulty, complex/easy)
• Population follows binomial distribution
• Normal approximation can be used if $n p \ge 5$ and $n(1 - p) \ge 5$
Confidence interval estimate:
$$p_s - Z_{\alpha/2}\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\sqrt{\frac{p_s(1-p_s)}{n}}$$
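The finite population correction simply scales the half-width by $\sqrt{(N-n)/(N-1)}$. A minimal sketch, assuming a hypothetical population of N = 200 from which the earlier n = 25 sample ($\bar{X} = 50$, s = 8, $t_{.025,24} = 2.0639$) was drawn; the function names are illustrative.

```python
from math import sqrt


def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


def t_interval_finite(xbar, s, n, N, t_crit):
    """t-interval for the mean of a finite population of size N."""
    h = t_crit * s / sqrt(n) * fpc(n, N)
    return xbar - h, xbar + h


# Hypothetical: the n = 25 sample comes from a population of N = 200
n, N = 25, 200
print(n / N > 0.05)  # True: the correction applies
print(t_interval_finite(50, 8, n, N, 2.0639))
```

Since the factor is below 1 whenever n > 1, the corrected interval is narrower than the uncorrected 46.70 to 53.30.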
Example: Estimating Proportion
A random sample of 400 voters showed 32 preferred Candidate A (i.e., 8%). Set up a 95% confidence interval estimate for p.
$$p_s - Z_{\alpha/2}\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\sqrt{\frac{p_s(1-p_s)}{n}}$$
$$.08 - 1.96\sqrt{\frac{.08(1-.08)}{400}} \le p \le .08 + 1.96\sqrt{\frac{.08(1-.08)}{400}}$$
$$.053 \le p \le .107$$

Faults
Out of 17 modules there are 11 with more than 1 fault, so the remaining 6 have at most one fault: $p_s = 6/17 \approx .353$. (The proportion with more than one fault, $11/17 \approx .647$, gives the same half-width, because $p_s(1-p_s)$ is unchanged.) Set up a 99% confidence interval for p.
$$.353 \pm 2.575\sqrt{\frac{.353(1-.353)}{17}} = .353 \pm 0.30$$
Notice the 95% interval is:
$$.353 \pm 1.96\sqrt{\frac{.353(1-.353)}{17}} = .353 \pm 0.23$$
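Both examples use the normal-approximation (Wald) interval. A minimal sketch follows; the function name is hypothetical, and checking the "≥ 5" rule with $p_s$ in place of the unknown p is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95, z=None):
    """Normal-approximation (Wald) interval for a population proportion p."""
    ps = successes / n
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Rule of thumb n*p >= 5 and n*(1 - p) >= 5, checked here with ps
    if n * ps < 5 or n * (1 - ps) < 5:
        raise ValueError("normal approximation not justified")
    h = z * sqrt(ps * (1 - ps) / n)
    return ps - h, ps + h


print(proportion_interval(32, 400, z=1.96))  # voters, 95%
print(proportion_interval(6, 17, z=2.575))   # modules, 99%
print(proportion_interval(6, 17, z=1.96))    # modules, 95%
```

With only 17 modules the counts (6 and 11) barely pass the rule of thumb, which is one reason the fault intervals are so wide.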
Sample Size
• Too big: requires too many resources
• Too small: won't do the job

Basics
Notice that the half-width depends on the sample size:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}, \qquad H = t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
$$p_s \pm Z_{\alpha/2}\sqrt{\frac{p_s(1-p_s)}{n}}, \qquad H = Z_{\alpha/2}\sqrt{\frac{p_s(1-p_s)}{n}}$$
Example: Sample Size for Mean
What sample size is needed to be 90% confident (i.e., Z = 1.645) of being correct within ± 5? A pilot study suggested that the standard deviation is 45.
$$n = \frac{Z^2\sigma^2}{\text{Error}^2} = \frac{1.645^2 \cdot 45^2}{5^2} = 219.2 \cong 220 \quad\text{(round up)}$$

Example: Sample Size for Proportion
What sample size is needed to be within ± 5 percentage points (error = .05) with 90% confidence? Out of a population of 1,000 files, we randomly selected 100, of which 30 were defective (Pr[fault > 0] = 30%).
$$n = \frac{Z^2 p(1-p)}{\text{error}^2} = \frac{1.645^2\,(.30)(.70)}{.05^2} = 227.3 \cong 228 \quad\text{(round up)}$$
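Both sample-size formulas solve $H = \text{Error}$ for n and round up, since n must be a whole number and rounding down would leave the half-width slightly above the target. A minimal sketch with illustrative function names:

```python
from math import ceil
from statistics import NormalDist


def z_for(confidence):
    """z_{alpha/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma, error, z):
    """n = z^2 sigma^2 / error^2, rounded up."""
    return ceil((z * sigma / error) ** 2)


def n_for_proportion(p, error, z):
    """n = z^2 p (1 - p) / error^2, rounded up."""
    return ceil(z ** 2 * p * (1 - p) / error ** 2)


print(n_for_mean(45, 5, 1.645))             # 220
print(n_for_proportion(0.30, 0.05, 1.645))  # 228
```

Using the unrounded `z_for(0.90)` instead of 1.645 gives the same two sample sizes here.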