Confidence Interval Estimation

Definition

A point estimate of a parameter is the value of a statistic that is used to estimate the parameter.

A confidence-interval estimate of a parameter consists of an interval of numbers obtained from a point estimate of the parameter, together with a percentage that specifies how confident we are that the parameter lies in the interval. The confidence percentage is called the confidence level.




Interval Estimation
• Confidence Interval Estimation for the Mean (σ Known and Unknown)
• Confidence Interval Estimation for the Proportion
• Sample Size Estimation

Estimation Process
• Population: the mean, µ, is unknown.
• Random sample: the sample mean is $\bar{X} = 50$.
• Conclusion: "I am 95% confident that µ is between 40 and 60."
Population Parameters Estimated

Population parameter     Symbol               Sample statistic
Mean                     $\mu$                $\bar{X}$
Proportion               $p$                  $p_s$
Variance                 $\sigma^2$           $s^2$
Difference               $\mu_1 - \mu_2$      $\bar{x}_1 - \bar{x}_2$

Confidence Interval Estimation
• Provides a range of values
• Based on observations from one sample
• Gives information about closeness to the unknown population parameter
• Stated in terms of probability: never 100% sure




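Elements of Confidence Interval Estimation
• A probability that the population parameter falls somewhere within the interval.
• The confidence interval is built around the sample statistic and runs from the lower confidence limit to the upper confidence limit.

Confidence Limits for Population Mean
Parameter = Statistic ± Its Error:
$$\mu = \bar{X} \pm \text{Error}$$
The error is the distance between $\bar{X}$ and µ in either direction, so $\text{Error} = \bar{X} - \mu$ or $\text{Error} = \mu - \bar{X}$. Standardizing:
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\text{Error}}{\sigma_{\bar{X}}}$$
$$\text{Error} = Z\,\sigma_{\bar{X}}$$
$$\mu = \bar{X} \pm Z\,\sigma_{\bar{X}}$$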
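Confidence Intervals
$$\bar{X} \pm Z\,\sigma_{\bar{X}} = \bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$$
[Figure: sampling distribution of $\bar{X}$ with standard deviation $\sigma_{\bar{X}}$]
• 90% of samples: $\mu - 1.645\,\sigma_{\bar{X}}$ to $\mu + 1.645\,\sigma_{\bar{X}}$
• 95% of samples: $\mu - 1.96\,\sigma_{\bar{X}}$ to $\mu + 1.96\,\sigma_{\bar{X}}$
• 99% of samples: $\mu - 2.58\,\sigma_{\bar{X}}$ to $\mu + 2.58\,\sigma_{\bar{X}}$

Level of Confidence
• Probability that the unknown population parameter falls within the interval
• Denoted 100(1 − α)% = level of confidence, e.g. 90%, 95%, 99%
• α is the probability that the parameter is not within the interval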
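The critical values quoted for the 90%, 95% and 99% levels can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `z_critical` is an illustrative choice, not something taken from the slides.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value Z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    # Z_{alpha/2} leaves an area of alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")
```

The 99% value comes out at about 2.576, which the slides round to 2.58 in one place and to 2.575 in another.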




Procedure
Assumptions:
• Normal population or large sample;
• σ is known.
Step 1: For a confidence level of 1 − α, find $Z_{\alpha/2}$.
Step 2: The confidence interval for µ is
$$\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where n is the sample size. The interval is exact for a normal population and approximately correct for large samples from non-normal populations.

Facts
• For small samples, say, of size less than 15, the z-interval procedure should be used only when the variable under consideration is normally distributed or very close to being so.
• For moderate-size samples, say, between 15 and 30, the z-interval procedure can be used unless the data contain outliers (or the variable is far from being normally distributed).
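The two-step z-interval procedure for a known σ can be sketched in a few lines of Python. This is only an illustration: the function name `z_interval` and the input values (sample mean 50, σ = 10, n = 25) are hypothetical and do not come from the slides.

```python
import math
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    # Step 1: critical value Z_{alpha/2} for the requested confidence level.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Step 2: xbar -/+ Z_{alpha/2} * sigma / sqrt(n).
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, chosen only to exercise the formula.
lower, upper = z_interval(xbar=50.0, sigma=10.0, n=25, confidence=0.95)
print(f"{lower:.2f} to {upper:.2f}")
```

Whether the resulting interval can be trusted still depends on the sample-size and normality guidance given in the Facts.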
Facts (continued)
• For large samples, say, of size 30 or more, the z-interval procedure can be used essentially without restriction. However, if outliers are present and their removal is not justified, the effect of the outliers on the confidence interval should be examined: compare the confidence intervals obtained with and without the outliers. If the effect is substantial, then it is probably best to use a different procedure or take another sample.
• If outliers are present but their removal is justified and results in a data set for which the z-interval procedure is appropriate (see above), then the procedure can be used.




Margin of Error
The margin of error for the mean, under the assumptions
• Normal population or large sample;
• σ is known;
is
$$H = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
that is, half of the length of the confidence interval.

Factors Affecting Interval Width
Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$.
• Data variation, measured by σ
• Sample size: $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$
• Level of confidence (1 − α)
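To see how each of the three factors moves the margin of error, the formula for H can be evaluated while changing one input at a time. The sketch below uses hypothetical values (σ = 10, n = 25, 95% confidence) as a baseline; `margin_of_error` is an illustrative name.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """H = Z_{alpha/2} * sigma / sqrt(n), half the interval length."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / math.sqrt(n)

# Hypothetical baseline, then one factor changed at a time.
base = margin_of_error(sigma=10.0, n=25, confidence=0.95)
more_variation = margin_of_error(sigma=20.0, n=25, confidence=0.95)
larger_sample = margin_of_error(sigma=10.0, n=100, confidence=0.95)
more_confidence = margin_of_error(sigma=10.0, n=25, confidence=0.99)

print(f"baseline:          {base:.2f}")
print(f"sigma doubled:     {more_variation:.2f}")   # twice as wide
print(f"n quadrupled:      {larger_sample:.2f}")    # half as wide
print(f"99% instead of 95%: {more_confidence:.2f}")  # wider
```

Because n enters through its square root, halving the margin of error requires four times as many observations.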
Confidence Intervals (σ Known)
Assumptions
• Population standard deviation is known
• Population is normally distributed
• If not normal, use large samples
Confidence Interval Estimate
$$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Confidence Intervals (σ Unknown)
Assumptions
• Population standard deviation is unknown
• Population must be normally distributed
Use Student’s t Distribution
Confidence Interval Estimate
$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$




Student’s t Distribution
• Bell-shaped
• Symmetric
• ‘Fatter’ tails than the standard normal
[Figure: density curves centered at 0 for the standard normal (Z), t (df = 13) and t (df = 5)]

Degrees of Freedom (df)
Number of observations that are free to vary after the sample mean has been calculated.
Example: the mean of 3 numbers is 2.
• X1 = 1 (or any number)
• X2 = 2 (or any number)
• X3 = 3 (cannot vary)
• Mean = 2
degrees of freedom = n − 1 = 3 − 1 = 2
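The degrees-of-freedom example can be checked numerically: once the mean and the first n − 1 values are fixed, the last value is forced. A small sketch follows; the helper name `last_value` is hypothetical.

```python
def last_value(mean: float, free_values: list) -> float:
    """Value the final observation must take so the sample has the given mean."""
    n = len(free_values) + 1
    # The n observations must sum to n * mean, so the last one is determined.
    return n * mean - sum(free_values)

# Mean of 3 numbers is 2, with X1 = 1 and X2 = 2 chosen freely.
print(last_value(2, [1, 2]))
```

Choosing other values for X1 and X2 changes X3 but never leaves it free, which is why only n − 1 = 2 observations count as free to vary.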
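Student’s t Table

Upper tail area
df     .25      .10      .05
1      1.000    3.078    6.314
2      0.817    1.886    2.920
3      0.765    1.638    2.353

Assume n = 3, so df = n − 1 = 2. With α = .10, α/2 = .05 and the t value is 2.920.
[Figure: t density centered at 0 with upper-tail area α/2 = .05 to the right of t = 2.920]

Interval Estimation, σ Unknown
A random sample of n = 25 has $\bar{X} = 50$ and s = 8. Set up a 95% confidence interval estimate for µ.
$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
$$50 - 2.0639 \cdot \frac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639 \cdot \frac{8}{\sqrt{25}}$$
$$46.70 \le \mu \le 53.30$$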
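The n = 25 example can be verified with a short calculation. Python's standard library has no Student's t quantile function, so this sketch takes the critical value 2.0639 from the example as an input rather than computing it; `t_interval` is an illustrative name.

```python
import math

def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is t_{alpha/2, n-1}, supplied by the caller (e.g. from a t table).
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Values from the example: n = 25, sample mean 50, s = 8, t_{.025, 24} = 2.0639.
lower, upper = t_interval(xbar=50.0, s=8.0, n=25, t_crit=2.0639)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The half-width is 2.0639 · 8/5 ≈ 3.302, so the lower limit evaluates to about 46.698, which is 46.70 to two decimals.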




Procedure
Assumptions:
• Normal population or large sample;
• σ is NOT known.
Step 1: For a confidence level of 1 − α and df = n − 1, find $t_{\alpha/2}$.
Step 2: The confidence interval for µ is
$$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
where n is the sample size. The interval is exact for a normal population and approximately correct for large samples from non-normal populations.

Project effort

Effort_month    Duration_month    Size_LOC
16.7            23                6050
22.6            15.5              8363
32.2            14                13334
3.9             9.2               5942
17.3            13.5              3315
67.7            24.5              38988
10.1            15.2              38614
19.3            14.7              12762
10.6            7.7               13510
59.5            15                26500

Mean effort: 25.99
Standard deviation: 21.33
In R, qt(0.975, 9) gives the critical value for α = 0.05 (95% confidence) with df = 9: t = 2.262157. (qt(0.025, 9) returns the same value with a negative sign.)
$$25.99 \pm 2.262157 \cdot \frac{21.33}{\sqrt{10}} = 25.99 \pm 15.26$$
[Figure: Effort QQ Plot. Normal Q-Q plot of effort, sample quantiles against theoretical quantiles]
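The project-effort figures can be recomputed from the ten Effort_month values. The sketch below uses Python's standard library for the mean and sample standard deviation and takes the critical value 2.262157 from the text as a constant, since the standard library has no t quantile function.

```python
import math
import statistics

# Effort_month column from the project-effort table (decimal commas read as points).
effort = [16.7, 22.6, 32.2, 3.9, 17.3, 67.7, 10.1, 19.3, 10.6, 59.5]

# Critical value t_{.025, 9} as quoted in the text; not computed here.
T_CRIT = 2.262157

n = len(effort)
mean = statistics.mean(effort)
sd = statistics.stdev(effort)            # sample standard deviation (n - 1 divisor)
half_width = T_CRIT * sd / math.sqrt(n)

print(f"mean = {mean:.2f}, s = {sd:.2f}")
print(f"95% interval: {mean:.2f} +/- {half_width:.2f}")
```

With only ten observations the interval relies on effort being roughly normal, which is what the Q-Q plot is there to check.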
Estimation for Finite Populations
Assumptions
• Sample is large relative to population: n / N > .05
Use Finite Population Correction Factor
Confidence Interval (Mean, σX Unknown)
$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\,\sqrt{\frac{N-n}{N-1}} \le \mu_X \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\,\sqrt{\frac{N-n}{N-1}}$$

Confidence Interval Estimate: Proportion
Assumptions
• Two categorical outcomes (e.g. faulty/not faulty, complex/easy)
• Population follows binomial distribution
• Normal approximation can be used if n·p ≥ 5 and n·(1 − p) ≥ 5
Confidence Interval Estimate
$$p_s - Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}}$$
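The effect of the finite population correction factor on the interval for the mean can be illustrated as follows. The sketch reuses n = 25, sample mean 50, s = 8 and t = 2.0639 from the earlier σ-unknown example; the population size N = 200 is a hypothetical value chosen so that n / N > .05, and the function names are illustrative.

```python
import math

def fpc(n: int, N: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def t_interval_finite(xbar: float, s: float, n: int, N: int, t_crit: float):
    """t interval for the mean with the finite population correction applied."""
    half_width = t_crit * s / math.sqrt(n) * fpc(n, N)
    return xbar - half_width, xbar + half_width

# N = 200 is hypothetical; the other values come from the sigma-unknown example.
n, N = 25, 200
print(f"n / N = {n / N:.3f}, correction factor = {fpc(n, N):.4f}")
lower, upper = t_interval_finite(xbar=50.0, s=8.0, n=n, N=N, t_crit=2.0639)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The factor is always below 1 when n > 1, so the corrected interval is narrower than the uncorrected one, and it shrinks to zero width when the whole population is sampled.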




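Example: Estimating Proportion
A random sample of 400 voters showed 32 preferred Candidate A (i.e., 8%). Set up a 95% confidence interval estimate for p.
$$p_s - Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}}$$
$$.08 - 1.96\,\sqrt{\frac{.08\,(1-.08)}{400}} \le p \le .08 + 1.96\,\sqrt{\frac{.08\,(1-.08)}{400}}$$
$$.053 \le p \le .107$$

Faults
Out of 17 modules there are 11 with more than 1 fault. Set up a 99% confidence interval for p, using $p_s = .353$.
(Note that .353 = 6/17 is the proportion of the other 6 modules, while 11/17 = .647. The half-width below is the same for either value, because $p_s(1-p_s)$ is unchanged.)
$$p_s - Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}}$$
$$.353 \pm 2.575\,\sqrt{\frac{.353\,(1-.353)}{17}} = .353 \pm 0.30$$
Notice the 95% interval is:
$$.353 \pm 1.96\,\sqrt{\frac{.353\,(1-.353)}{17}} = .353 \pm 0.23$$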
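Both proportion examples can be reproduced with a small function. This is a sketch under one stated assumption: the n·p ≥ 5 and n·(1 − p) ≥ 5 check is applied to the sample proportion, since the true p is unknown. The function name `proportion_interval` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_s = successes / n
    # Assumption: the n*p >= 5 rule is checked with the sample proportion p_s.
    if n * p_s < 5 or n * (1 - p_s) < 5:
        raise ValueError("normal approximation is not appropriate")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(p_s * (1 - p_s) / n)
    return p_s - half_width, p_s + half_width

# Voters example: 32 of 400 at 95% confidence.
lo, hi = proportion_interval(32, 400, 0.95)
print(f"{lo:.3f} <= p <= {hi:.3f}")

# Faults example with p_s = 6/17 = .353 at 99% and at 95% confidence.
for level in (0.99, 0.95):
    lo, hi = proportion_interval(6, 17, level)
    print(f"{level:.0%}: .353 +/- {(hi - lo) / 2:.2f}")
```

With only 17 modules the interval is very wide, which shows how strongly the half-width depends on n.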
Sample Size
• Too big: requires too many resources
• Too small: won’t do the job

Basics
Notice that the half-width depends on the sample size:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}, \qquad H = t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
$$p_s \pm Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}}, \qquad H = Z_{\alpha/2}\,\sqrt{\frac{p_s(1-p_s)}{n}}$$




Example: Sample Size for Mean
What sample size is needed to be 90% (i.e., Z = 1.645) confident of being correct within ± 5? A pilot study suggested that the standard deviation is 45.
$$n = \frac{Z^2\sigma^2}{\text{Error}^2} = \frac{1.645^2 \cdot 45^2}{5^2} = 219.2 \cong 220$$

Example: Sample Size for Proportion
What sample size is needed to be within ± 5% (i.e., ± .05) with 90% confidence? Out of a population of 1,000 files, we randomly selected 100, of which 30 were defective (Pr[fault > 0] = 30%).
$$n = \frac{Z^2\,p\,(1-p)}{\text{Error}^2} = \frac{1.645^2\,(.30)(.70)}{.05^2} = 227.3 \cong 228$$
Round up.
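Both sample-size calculations follow the same pattern: solve the half-width formula for n and round up. A minimal sketch, with illustrative function names and Z = 1.645 taken from the examples:

```python
import math

def sample_size_mean(z: float, sigma: float, error: float) -> int:
    """Smallest n with Z * sigma / sqrt(n) <= error."""
    return math.ceil((z * sigma / error) ** 2)

def sample_size_proportion(z: float, p: float, error: float) -> int:
    """Smallest n with Z * sqrt(p * (1 - p) / n) <= error."""
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

# Mean example: 90% confidence, pilot standard deviation 45, error 5.
print(sample_size_mean(z=1.645, sigma=45, error=5))

# Proportion example: 90% confidence, pilot proportion .30, error .05.
print(sample_size_proportion(z=1.645, p=0.30, error=0.05))
```

Rounding up rather than to the nearest integer guarantees that the achieved half-width does not exceed the target error.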

						