# Confidence Interval Estimation

Category: Stats. Posted: 7/20/2010. Language: English. Pages: 8.

```
Definition
A point estimate of a parameter is the value of a statistic that is used to estimate the parameter.

A confidence-interval estimate of a parameter consists of an interval of numbers obtained from a point estimate of the parameter, together with a percentage that specifies how confident we are that the parameter lies in the interval. This percentage is called the confidence level.

Interval Estimation
• Confidence Interval Estimation for the Mean (σ Known and Unknown)
• Confidence Interval Estimation for the Proportion
• Sample Size Estimation

Estimation Process
Population: the mean, µ, is unknown.
Random sample: the sample mean is X̄ = 50.
Conclusion: "I am 95% confident that µ is between 40 & 60."
Population Parameters Estimated
    Estimate population parameter ...   with sample statistic
    Mean          µ                     X̄
    Proportion    p                     p_s
    Variance      σ²                    s²
    Difference    µ1 − µ2               X̄1 − X̄2

Confidence Interval Estimation
• Provides Range of Values
• Based on Observations from 1 Sample
• Gives Information about Closeness to Unknown Population Parameter
• Stated in terms of Probability
• Never 100% Sure

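Elements of Confidence Interval Estimation
A probability that the population parameter falls somewhere within the interval.
[Figure: a confidence interval around the sample statistic, running from the lower confidence limit to the upper confidence limit]

Confidence Limits for Population Mean
Parameter = Statistic ± Its Error
    $\mu = \bar{X} \pm \text{Error}$, where $\text{Error} = \bar{X} - \mu$ or $\mu - \bar{X}$
    $Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{\text{Error}}{\sigma_{\bar{X}}}$
    $\text{Error} = Z\,\sigma_{\bar{X}}$
    $\mu = \bar{X} \pm Z\,\sigma_{\bar{X}}$
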
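Confidence Intervals
    $\bar{X} \pm Z\,\sigma_{\bar{X}} = \bar{X} \pm Z\,\dfrac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution of X̄ centred at µ]
• 90% of sample means fall between $\mu - 1.645\,\sigma_{\bar{X}}$ and $\mu + 1.645\,\sigma_{\bar{X}}$
• 95% of sample means fall between $\mu - 1.96\,\sigma_{\bar{X}}$ and $\mu + 1.96\,\sigma_{\bar{X}}$
• 99% of sample means fall between $\mu - 2.58\,\sigma_{\bar{X}}$ and $\mu + 2.58\,\sigma_{\bar{X}}$

Level of Confidence
• Probability that the unknown population parameter falls within the interval; more precisely, the proportion of intervals, over repeated samples, that contain the parameter
• Denoted (1 − α)·100% = level of confidence, e.g. 90%, 95%, 99%
• α is the probability that the parameter is not within the interval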

Procedure
Assumptions:
• Normal population or large sample;
• σ is known.
Step 1: for a confidence level of 1 − α, find $Z_{\alpha/2}$.
Step 2: the confidence interval for µ is
    $\bar{x} - Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ to $\bar{x} + Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
where n is the sample size. The interval is exact for a normal population and approximately correct for a large sample from a non-normal population.

Facts
For small samples, say, of size less than 15, the z-interval procedure should be used only when the variable under consideration is normally distributed or very close to being so.

For moderate-size samples, say, between 15 and 30, the z-interval procedure can be used unless the data contain outliers or the variable is far from being normally distributed.

Facts
For large samples, say, of size 30 or more, the z-interval procedure can be used essentially without restriction. However, if outliers are present and their removal is not justified, examine the effect of the outliers on the confidence interval by comparing the intervals obtained with and without them. If the effect is substantial, it is probably best to use a different procedure or take another sample.

If outliers are present but their removal is justified and results in a data set for which the z-interval procedure is appropriate (see above), then the procedure can be used.

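Margin of Error
The margin of error for the mean under the assumptions
• Normal population or large sample;
• σ is known;
is
    $H = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$,
half of the length of the confidence interval.

Factors Affecting Interval Width
Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$.
• Data variation, measured by σ (more variation, wider interval)
• Sample size, through $\sigma_{\bar{X}} = \sigma_X/\sqrt{n}$ (larger n, narrower interval)
• Level of confidence, 1 − α (higher confidence, wider interval)
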
Confidence Intervals (σ Known)
Assumptions:
• Population Standard Deviation Is Known
• Population Is Normally Distributed
• If Not Normal, use large samples
Confidence interval estimate:
    $\bar{X} - Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Confidence Intervals (σ Unknown)
Assumptions:
• Population Standard Deviation Is Unknown
• Population Must Be Normally Distributed
• Use Student’s t Distribution
Confidence interval estimate:
    $\bar{X} - t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}$

Student’s t Distribution
• Bell-Shaped
• Symmetric
• 'Fatter' Tails than the standard normal
[Figure: standard normal (Z) density compared with t (df = 13) and t (df = 5), all centred at 0; the smaller the df, the fatter the tails]

Degrees of Freedom (df)
Number of observations that are free to vary after the sample mean has been calculated:
    degrees of freedom = n − 1
Example: the mean of 3 numbers is 2.
    X1 = 1 (or any number)
    X2 = 2 (or any number)
    X3 = 3 (cannot vary: X3 = 3 · 2 − X1 − X2)
    Mean = 2, so df = 3 − 1 = 2

Student’s t Table
Assume: n = 3, so df = n − 1 = 2; α = .10, so α/2 = .05.

              Upper tail area
    df      .25      .10      .05
    1       1.000    3.078    6.314
    2       0.816    1.886    2.920
    3       0.765    1.638    2.353

[Figure: t density (df = 2) with upper tail area α/2 = .05 beyond the t value 2.920]

Interval Estimation (σ Unknown)
A random sample of n = 25 has X̄ = 50 and s = 8. Set up a 95% confidence interval estimate for µ.
    $\bar{X} - t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}$
    $50 - 2.0639\cdot\dfrac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\cdot\dfrac{8}{\sqrt{25}}$   (here $t_{.025,\,24} = 2.0639$)
    $46.70 \le \mu \le 53.30$

Procedure
Assumptions:
• Normal population or large sample;
• σ is NOT known.
Step 1: for a confidence level of 1 − α and df = n − 1, find $t_{\alpha/2}$.
Step 2: the confidence interval for µ is
    $\bar{x} - t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$ to $\bar{x} + t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
where n is the sample size. The interval is exact for a normal population and approximately correct for a large sample from a non-normal population.

Project effort
    Effort_month   Duration_month   Size_LOC
    16.7           23               6050
    22.6           15.5             8363
    32.2           14               13334
    3.9            9.2              5942
    17.3           13.5             3315
    67.7           24.5             38988
    10.1           15.2             38614
    19.3           14.7             12762
    10.6           7.7              13510
    59.5           15               26500
[Figure: Effort QQ plot, a Normal Q-Q plot of Effort_month (sample quantiles vs. theoretical quantiles)]
Mean effort: 25.99
Standard deviation: 21.33
Confidence level 95% (α = 0.05): qt(0.025, 9) gives −2.262157, so t = 2.262157 with df = n − 1 = 9.
    $25.99 \pm 2.262157\cdot\dfrac{21.33}{\sqrt{10}} = 25.99 \pm 15.26$, i.e. from 10.73 to 41.25
Estimation for Finite Populations
Assumptions:
• Sample Is Large Relative to Population: n / N > .05
• Use Finite Population Correction Factor
Confidence Interval (Mean, σ_X Unknown):
    $\bar{X} - t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} \le \mu_X \le \bar{X} + t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$

Confidence Interval Estimate for the Proportion
Assumptions:
• Two Categorical Outcomes (faulty/not faulty, complex/easy)
• Population Follows Binomial Distribution
• Normal Approximation Can Be Used if: n·p ≥ 5 & n·(1 − p) ≥ 5
Confidence interval estimate:
    $p_s - Z_{\alpha/2}\sqrt{\dfrac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\sqrt{\dfrac{p_s(1-p_s)}{n}}$

Example: Estimating Proportion
A random sample of 400 voters showed 32 preferred Candidate A (i.e., 8%). Set up a 95% confidence interval estimate for p.
    $p_s - Z_{\alpha/2}\sqrt{\dfrac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\sqrt{\dfrac{p_s(1-p_s)}{n}}$
    $.08 - 1.96\sqrt{\dfrac{.08(1-.08)}{400}} \le p \le .08 + 1.96\sqrt{\dfrac{.08(1-.08)}{400}}$
    $.053 \le p \le .107$

Faults
Out of 17 modules there are 11 with more than 1 fault. Set up a 99% confidence interval for p, with p_s = .353. (Note: .353 = 6/17, the share of the other 6 modules. For 11/17 = .647 the interval is centred at .647 but has the same half-width, since .647(1 − .647) = .353(1 − .353).)
    $.353 \pm 2.575\sqrt{\dfrac{.353(1-.353)}{17}} = .353 \pm 0.30$
Notice the 95% interval is:
    $.353 \pm 1.96\sqrt{\dfrac{.353(1-.353)}{17}} = .353 \pm 0.23$

Sample Size
Too big:
• Requires too many resources
Too small:
• Won't do the job

Basics
Notice that the half-width H depends on the sample size:
    Mean:        $\bar{X} \pm t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}$,   $H = t_{\alpha/2,\,n-1}\,\dfrac{S}{\sqrt{n}}$
    Proportion:  $p_s \pm Z_{\alpha/2}\sqrt{\dfrac{p_s(1-p_s)}{n}}$,   $H = Z_{\alpha/2}\sqrt{\dfrac{p_s(1-p_s)}{n}}$

Example: Sample Size for Mean
What sample size is needed to be 90% (i.e., Z = 1.645) confident of being correct within ± 5? A pilot study suggested that the standard deviation is 45.
    $n = \dfrac{Z^2\sigma^2}{\text{Error}^2} = \dfrac{1.645^2 \cdot 45^2}{5^2} = 219.2 \approx 220$

Example: Sample Size for Proportion
What sample size is needed to be within ± .05 (5 percentage points) with 90% confidence? Out of a population of 1,000 files, we randomly selected 100, of which 30 were defective (Pr[fault > 0] = 30%).
    $n = \dfrac{Z^2 p(1-p)}{\text{Error}^2} = \dfrac{1.645^2 (.30)(.70)}{.05^2} = 227.3 \approx 228$

In both cases, round up to the next whole number.

```
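The σ-known procedure can be sketched in Python using only the standard library. Step 1 finds $Z_{\alpha/2}$, and Step 2 computes $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$. This is a minimal sketch, not the author's code:

- `statistics.NormalDist().inv_cdf` supplies the critical value in place of a printed table.
- The function names are hypothetical.
- The usage inputs (X̄ = 50, σ = 8, n = 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value Z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def margin_of_error_z(sigma: float, n: int, confidence: float) -> float:
    """Half-width H = Z_{alpha/2} * sigma / sqrt(n), valid when sigma is known."""
    return z_critical(confidence) * sigma / sqrt(n)


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Return (lower, upper) confidence limits for mu when sigma is known."""
    h = margin_of_error_z(sigma, n, confidence)
    return xbar - h, xbar + h


# Critical values behind the 90%, 95% and 99% bands (1.645, 1.96, 2.58 on the slides)
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Hypothetical inputs: xbar = 50, sigma = 8, n = 25
print(z_interval(50, 8, 25, 0.95))

# Width shrinks with sqrt(n): quadrupling n halves the margin of error
print(margin_of_error_z(8, 25, 0.95), margin_of_error_z(8, 100, 0.95))
```

The last line illustrates the "Factors Affecting Interval Width" slide: n enters only through $1/\sqrt{n}$, so four times the data buys half the width.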
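The "Level of Confidence" slide describes 1 − α as a probability about the interval. A small simulation makes the repeated-sampling meaning concrete:

1. Draw many samples from a normal population with known µ and σ.
2. Build the z-interval from each sample.
3. Count how often the interval contains µ.

The population values (µ = 100, σ = 15), the sample size n = 20, the number of trials and the seed are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu=100.0, sigma=15.0, n=20, confidence=0.95, trials=2000, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


for level in (0.90, 0.95, 0.99):
    print(f"nominal {level:.0%}: observed coverage {coverage(confidence=level):.3f}")
```

The observed fractions should land close to the nominal levels. Any single interval either contains µ or it does not; α is the long-run share of intervals that miss.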
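The "Student's t Table" values can also be computed rather than looked up. The sketch below is a stand-in for the table, and for R's `qt`, which the "Project effort" slide uses. It works as follows:

- For integer df, P(|T| < t) has a closed-form series in θ = atan(t/√df) (Abramowitz and Stegun, section 26.7).
- Bisection inverts that series to find the critical value.
- An upper tail area a corresponds to a two-sided confidence level of 1 − 2a.

The function names are hypothetical. In practice, `scipy.stats.t.ppf` does the same job.

```python
from math import atan, cos, pi, sin, sqrt


def prob_abs_t_below(t: float, df: int) -> float:
    """P(|T| < t) for Student's t with integer df >= 1 (closed-form series in theta)."""
    theta = atan(t / sqrt(df))
    s, c = sin(theta), cos(theta)
    c2 = c * c
    if df == 1:
        return 2.0 * theta / pi  # Cauchy case
    term, total = 1.0, 1.0
    if df % 2 == 0:
        # even df: sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...], df/2 terms
        for k in range(1, df // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        return s * total
    # odd df >= 3: (2/pi) * [theta + s*c*(1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ...)]
    for k in range(1, (df - 1) // 2):
        term *= c2 * (2 * k) / (2 * k + 1)
        total += term
    return 2.0 / pi * (theta + s * c * total)


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value t_{alpha/2, df}: solve P(|T| < t) = confidence by bisection."""
    lo, hi = 0.0, 1.0
    while prob_abs_t_below(hi, df) < confidence:  # grow the bracket until it holds the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if prob_abs_t_below(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Rebuild the slide's table: upper tail area a  <->  two-sided confidence 1 - 2a
for df in (1, 2, 3):
    row = [t_critical(1 - 2 * a, df) for a in (0.25, 0.10, 0.05)]
    print(df, " ".join(f"{v:.3f}" for v in row))
```

The same function reproduces the other critical values the slides use: `t_critical(0.95, 24)` for the n = 25 example and `t_critical(0.95, 9)` for the project-effort data.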
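The worked example on the "Interval Estimation (σ Unknown)" slide uses n = 25, X̄ = 50, s = 8 and 95% confidence. It reduces to a few lines. In this sketch the caller passes in the critical value, here the slide's $t_{.025,24} = 2.0639$. The function name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """(lower, upper) = xbar -/+ t_crit * s / sqrt(n), for the sigma-unknown case.

    t_crit = t_{alpha/2, n-1} is supplied by the caller (t table, R's qt, ...).
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Slide example: n = 25, xbar = 50, s = 8, 95% confidence, t_{.025, 24} = 2.0639
lower, upper = t_interval(50, 8, 25, 2.0639)
print(f"{lower:.2f} <= mu <= {upper:.2f}")  # prints: 46.70 <= mu <= 53.30
```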
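The "Project effort" numbers can be reproduced from the Effort_month column of the table, with the decimal commas read as decimal points. This sketch uses only Python's `statistics` module; `stdev` divides by n − 1, matching the sample standard deviation s. The critical value 2.262157 is taken from the slide's `qt(0.025, 9)` result (its absolute value) rather than computed here. With only 10 projects, the t interval leans on the normality that the Q-Q plot is meant to check.

```python
from math import sqrt
from statistics import mean, stdev

# Effort_month column of the "Project effort" table
effort = [16.7, 22.6, 32.2, 3.9, 17.3, 67.7, 10.1, 19.3, 10.6, 59.5]

n = len(effort)
xbar = mean(effort)      # sample mean
s = stdev(effort)        # sample standard deviation (divisor n - 1)
t_crit = 2.262157        # |qt(0.025, 9)| from the slide (R), i.e. t_{0.025, 9}
half_width = t_crit * s / sqrt(n)

print(f"mean = {xbar:.2f}, s = {s:.2f}, H = {half_width:.2f}")
print(f"95% CI for mean effort: {xbar - half_width:.2f} to {xbar + half_width:.2f}")
```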
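The "Estimation for Finite Populations" slide multiplies the half-width by $\sqrt{(N-n)/(N-1)}$ when n/N > .05. Below is a hypothetical helper that follows that rule. The population size N = 100 in the usage lines is an invented value. It is reused with the n = 25, X̄ = 50, s = 8 example only to show the effect of the correction.

```python
from math import sqrt


def fpc(n: int, N: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


def t_interval_finite(xbar: float, s: float, n: int, N: int, t_crit: float):
    """Mean CI (sigma unknown), applying the correction only when n / N > 0.05."""
    half_width = t_crit * s / sqrt(n)
    if n / N > 0.05:
        half_width *= fpc(n, N)
    return xbar - half_width, xbar + half_width


# Hypothetical population of N = 100 items: n = 25 is 25% of it, so the factor applies
print(t_interval_finite(50, 8, 25, 100, 2.0639))
# Same sample from a huge population: n / N is tiny, so no correction is applied
print(t_interval_finite(50, 8, 25, 1_000_000, 2.0639))
```

The threshold check mirrors the slide's rule. Below n/N = .05 the factor is close to 1 anyway, so skipping it changes little.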
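The proportion slides (the confidence interval estimate for the proportion, the voter example and the "Faults" example) all use the normal-approximation interval $p_s \pm Z_{\alpha/2}\sqrt{p_s(1-p_s)/n}$. Three assumptions in this sketch should be flagged:

- The n·p ≥ 5 check uses p_s in place of the unknown p.
- The faults calls use 6 of 17 modules, because that is what the slide's p_s = .353 corresponds to.
- Z is computed exactly (about 2.576 for 99%) instead of the slide's 2.575, so the last digits may differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation interval ps -/+ Z_{alpha/2} * sqrt(ps (1 - ps) / n)."""
    ps = successes / n
    # Slide rule n*p >= 5 and n*(1 - p) >= 5, checked with the sample proportion ps
    if n * ps < 5 or n * (1 - ps) < 5:
        raise ValueError("normal approximation needs n*ps >= 5 and n*(1 - ps) >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(ps * (1 - ps) / n)
    return ps - half_width, ps + half_width


print(proportion_interval(32, 400, 0.95))  # voters: 32 of 400
print(proportion_interval(6, 17, 0.99))    # faults: ps = 6/17 = .353
print(proportion_interval(6, 17, 0.95))
```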
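The two sample-size examples invert the half-width formula and round up:

- Mean: $n = Z^2\sigma^2/\text{Error}^2$
- Proportion: $n = Z^2 p(1-p)/\text{Error}^2$

This is a sketch with hypothetical function names. The proportion version expects the error as a proportion (.05, not 5).

```python
from math import ceil


def sample_size_mean(z: float, sigma: float, error: float) -> int:
    """n = Z^2 sigma^2 / Error^2, rounded up to a whole observation."""
    return ceil((z * sigma / error) ** 2)


def sample_size_proportion(z: float, p: float, error: float) -> int:
    """n = Z^2 p (1 - p) / Error^2, rounded up; error is a proportion (e.g. 0.05)."""
    return ceil(z ** 2 * p * (1 - p) / error ** 2)


print(sample_size_mean(1.645, 45, 5))             # 220 (219.2 rounded up)
print(sample_size_proportion(1.645, 0.30, 0.05))  # 228 (227.3 rounded up)
```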