Variance and Standard Deviation (7.5)

10/29/2009




We discussed one statistical quantity, the mean. Quantities like this are meant to extract some understanding of our data.

You probably have some intuition for what the mean is. The standard deviation measures how much, on average, our data differs from the mean.




Here is the typical example. Consider the following distributions for students’ scores on a quiz. In the first one, everyone received 40%.

[Figure: relative frequency against Total Score (out of 10)]

In this case, the mean is

$\mu = 4 \cdot 1 = 4.$








Here is a different distribution, where there are 5 different scores: 20% of the students have each of the scores 2, 3, 4, 5, and 6.

[Figure: relative frequency against Total Score (out of 10)]

Here we can also compute the mean score, and it is

$\mu = 0.2 \cdot 2 + 0.2 \cdot 3 + 0.2 \cdot 4 + 0.2 \cdot 5 + 0.2 \cdot 6 = 4.$




So, in both cases, the mean is the same, but the distributions are quite different. There is a quantity that distinguishes between these two, which measures the average “spread” of the data about the mean.

It is called the standard deviation.

Continuing to denote the mean by μ, you would probably guess that the best way to define the average difference from the mean is

$E(X - \mu) = \sum_i (x_i - \mu)\, p_i,$

where the $x_i$ are the possible values and the $p_i$ are their probabilities.

However, notice that in the second example from before, the positive and negative contributions will cancel, and so we will take the square to eliminate this problem.
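To see the cancellation concretely, here is a minimal sketch using the two quiz distributions above. The helper name `expected` and the dictionary layout are choices made for this illustration, not anything from the course materials.

```python
from fractions import Fraction

# The two quiz distributions from the slides: score -> relative frequency.
all_fours = {4: Fraction(1)}
spread_out = {s: Fraction(1, 5) for s in (2, 3, 4, 5, 6)}

def expected(dist, f=lambda x: x):
    """Expected value of f(X) for a finite distribution {value: probability}."""
    return sum(f(x) * p for x, p in dist.items())

for name, dist in (("all fours", all_fours), ("spread out", spread_out)):
    mu = expected(dist)
    signed = expected(dist, lambda x: x - mu)          # signed deviations cancel
    squared = expected(dist, lambda x: (x - mu) ** 2)  # squared deviations do not
    print(name, mu, signed, squared)
```

Both distributions report a mean of 4 and an average signed deviation of 0, so only the squared deviations tell them apart.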








So we define the variance as

$\sigma^2 = E\big((X - \mu)^2\big) = \sum_i (x_i - \mu)^2\, p_i.$

Just note that when we only have a sample from which we want to understand something about a larger population, there is a slightly different, better estimator for the variance. I don’t want to go into it here so as not to confuse you, but I’ll say some words about this in class.

So, if you’d like to see some algebra performed on random variables, we have that, for a probability distribution,

$E\big((X - E(X))^2\big) = E\big(X^2 - 2X\,E(X) + E(X)^2\big) = E(X^2) - 2E(X)\,E(X) + E(X)^2 = E(X^2) - E(X)^2.$

Here we use linearity of expected value and the fact that E(X) is a constant.
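The identity Var(X) = E(X²) − E(X)² can be checked numerically. The sketch below assumes a distribution stored as a dictionary of value to probability; the function names are made up for this example.

```python
from fractions import Fraction

def expected(dist, f=lambda x: x):
    """Expected value of f(X) for a finite distribution {value: probability}."""
    return sum(f(x) * p for x, p in dist.items())

def variance_definition(dist):
    """E((X - mu)^2), straight from the definition."""
    mu = expected(dist)
    return expected(dist, lambda x: (x - mu) ** 2)

def variance_shortcut(dist):
    """E(X^2) - E(X)^2, the form obtained from the algebra above."""
    return expected(dist, lambda x: x * x) - expected(dist) ** 2

# Quiz scores 2..6, each with relative frequency 20%.
quiz = {s: Fraction(1, 5) for s in (2, 3, 4, 5, 6)}
print(variance_definition(quiz), variance_shortcut(quiz))  # both are 2
```

Exact fractions are used so the two results can be compared with `==` rather than up to rounding error.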




Example (7.5.7): This table has the relative frequency distribution for the weekly sales of two businesses, A and B.
a.) Compute the population mean and the variance for each business.
b.) Which business has the better sales record?
c.) Which business has the more consistent sales record?

          Relative freq.
  Sales     A      B
   100     0.1    0.0
   101     0.2    0.2
   102     0.3    0.0
   103     0.0    0.2
   104     0.0    0.1
   105     0.2    0.2
   106     0.2    0.3

Part (a) is the only one with actual work. Let’s first find the mean in each case. Putting these numbers into a calculator, we have that

$\mu_A = 100(0.1) + 101(0.2) + 102(0.3) + 105(0.2) + 106(0.2) = 103$
$\mu_B = 101(0.2) + 103(0.2) + 104(0.1) + 105(0.2) + 106(0.3) = 104$








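One way to find the variance now is to compute $E(X^2)$ in each case. Let’s recall the computation we just did for $E(X)$ first:

$E(X_A) = 100(0.1) + 101(0.2) + 102(0.3) + 105(0.2) + 106(0.2) = 103$
$E(X_B) = 101(0.2) + 103(0.2) + 104(0.1) + 105(0.2) + 106(0.3) = 104$

Squaring each sales value before weighting it gives

$E(X_A^2) = 100^2(0.1) + 101^2(0.2) + 102^2(0.3) + 105^2(0.2) + 106^2(0.2) = 10613.6$
$E(X_B^2) = 101^2(0.2) + 103^2(0.2) + 104^2(0.1) + 105^2(0.2) + 106^2(0.3) = 10819.4$

Now we just apply our formula for variance.

$\sigma_A^2 = E(X_A^2) - E(X_A)^2 = 10613.6 - 103^2 = 4.6$
$\sigma_B^2 = E(X_B^2) - E(X_B)^2 = 10819.4 - 104^2 = 3.4$

For part (b), we see that Business B has the higher mean number of sales, so it has the better sales record, and for part (c), Business B also has a smaller variance, so it is more consistent as well.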
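The calculator work for Example 7.5.7 can be reproduced with a short script. This is only a sketch: the variable names and the helper `mean_and_variance` are invented here, and the frequencies are typed in as strings so that they convert to exact fractions.

```python
from fractions import Fraction

sales = [100, 101, 102, 103, 104, 105, 106]
# Relative frequencies from the table, kept as strings for exact conversion.
freq_a = ["0.1", "0.2", "0.3", "0.0", "0.0", "0.2", "0.2"]
freq_b = ["0.0", "0.2", "0.0", "0.2", "0.1", "0.2", "0.3"]

def mean_and_variance(values, freqs):
    """Return (E(X), E(X^2) - E(X)^2) for values with relative frequencies."""
    probs = [Fraction(f) for f in freqs]
    if sum(probs) != 1:
        raise ValueError("relative frequencies must sum to 1")
    ex = sum(x * p for x, p in zip(values, probs))
    ex2 = sum(x * x * p for x, p in zip(values, probs))
    return ex, ex2 - ex ** 2

mean_a, var_a = mean_and_variance(sales, freq_a)
mean_b, var_b = mean_and_variance(sales, freq_b)
print("A:", float(mean_a), float(var_a))
print("B:", float(mean_b), float(var_b))
```

Business B comes out with the larger mean and the smaller variance, which is what parts (b) and (c) rely on.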




Definition: The standard deviation is the square root of the variance. It is usually denoted by the letter σ.

You are probably more used to hearing about the standard deviation than the variance.

Theorem (Chebychev’s Inequality): Let μ be the mean and let σ be the standard deviation of a probability distribution. Then, for any $c > 0$,

$P(\mu - c \le X \le \mu + c) \ge 1 - \frac{\sigma^2}{c^2}.$

The bounds that the theorem yields are in practice not so great…








Example (7.5.12): An electronics firm determines that the number of defective transistors in each batch averages 15 with standard deviation 10. Suppose that 100 batches are produced. Estimate the number of batches having between 0 and 30 defective transistors.

We’ll use Chebychev’s Inequality. Here we’re trying to find the probability of being within 15 of the mean. That is, we’re trying to find

$P(15 - 15 \le X \le 15 + 15) = P(0 \le X \le 30).$

Our theorem says that this will be

$P(0 \le X \le 30) \ge 1 - \frac{10^2}{15^2} = 1 - \frac{100}{225} = \frac{5}{9}.$

So at least $5/9$ of the batches will have between 0 and 30 defective transistors, which is approximately 56 batches.
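The arithmetic in this example is easy to wrap in a small function. The sketch below assumes the form of the inequality used in these slides, with lower bound 1 − σ²/c²; the function name `chebyshev_lower_bound` is made up, and the bound is clipped at 0 because a negative bound carries no information.

```python
from fractions import Fraction

def chebyshev_lower_bound(sigma, c):
    """Lower bound 1 - sigma^2 / c^2 on P(mu - c <= X <= mu + c), clipped at 0."""
    bound = 1 - Fraction(sigma) ** 2 / Fraction(c) ** 2
    return max(bound, Fraction(0))

# Example 7.5.12: standard deviation 10, interval within 15 of the mean.
bound = chebyshev_lower_bound(sigma=10, c=15)
batches = 100
print(bound, round(float(bound * batches)))  # 5/9 of 100 batches, about 56
```

Note that the mean itself does not enter the bound; only σ and the half-width c of the interval matter.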




Recall that last time we had a simple formula for the expected value of a binomial random variable, which was

$E(X) = np.$

There is also a simple formula for its variance, which is

$\sigma^2 = npq.$

Example: What is the probability of success for a binomial random variable with 20 trials whose variance is 5?

We just need to plug the numbers $n = 20$ and $\sigma^2 = 5$ into the formula $\sigma^2 = npq$, remembering that $q = 1 - p$. Then we get the equation

$5 = 20p(1 - p).$
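The formula σ² = npq can be sanity-checked against the definition of variance by building the binomial distribution term by term. This is a sketch only: the values n = 10 and p = 3/10 are hypothetical and chosen just for the check, and the function names are invented.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List of P(X = k) for k = 0..n, where X counts successes in n trials."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

def mean_and_variance(pmf):
    """Mean and variance computed directly from a list of probabilities."""
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second_moment - mean ** 2

n, p = 10, Fraction(3, 10)  # hypothetical values, not from the slides
mean, var = mean_and_variance(binomial_pmf(n, p))
print(mean, n * p)            # direct mean vs. np
print(var, n * p * (1 - p))   # direct variance vs. npq
```

With exact fractions the direct computation and the closed-form expressions agree exactly, not just approximately.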








The equation $20p(1 - p) = 5$ has solution $p = 1/2$, but we can solve it explicitly: dividing by 5 and rearranging gives $4p^2 - 4p + 1 = 0$, that is, $(2p - 1)^2 = 0$, so $p = 1/2$.

Example (7.5.15): The probability distribution for the sum of numbers obtained from tossing a pair of dice is given in the table.
a.) Compute the mean and the variance of this probability distribution.
b.) Using the table, calculate the probability that the number is between 4 and 10, inclusive.
c.) Use the Chebychev inequality to estimate the probability that the number is between 4 and 10, inclusive.

Number        2     3     4     5     6     7     8     9     10    11    12
Probability   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Hopefully you can compute the mean easily now. Just to make things clear, I’ll compute the variance in two different ways.




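(a). The mean is given by

$\mu = \sum_i x_i\, p_i = \frac{2(1) + 3(2) + 4(3) + 5(4) + 6(5) + 7(6) + 8(5) + 9(4) + 10(3) + 11(2) + 12(1)}{36} = \frac{252}{36} = 7.$

This is what you should have expected. Now let’s start computing the variance.

$E(X^2) = \sum_i x_i^2\, p_i = \frac{4(1) + 9(2) + 16(3) + 25(4) + 36(5) + 49(6) + 64(5) + 81(4) + 100(3) + 121(2) + 144(1)}{36} = \frac{1974}{36} = \frac{329}{6}.$

The variance is then

$\sigma^2 = E(X^2) - \mu^2 = \frac{329}{6} - 49 = \frac{35}{6} \approx 5.83.$

If we wanted to, we could have computed the variance using our original formula

$\sigma^2 = \sum_i (x_i - \mu)^2\, p_i.$

So let’s do that now, in case you like that way better.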








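$\sigma^2 = \sum_i (x_i - 7)^2\, p_i = \frac{25(1) + 16(2) + 9(3) + 4(4) + 1(5) + 0(6) + 1(5) + 4(4) + 9(3) + 16(2) + 25(1)}{36} = \frac{210}{36} = \frac{35}{6}.$

As expected, we get the same answer. Notice that the standard deviation is

$\sigma = \sqrt{35/6} \approx 2.42.$

For part (b), we have to calculate the probability that the sum rolled is between 4 and 10 inclusive, using the table. This is just adding a few numbers together:

$P(4 \le X \le 10) = \frac{3 + 4 + 5 + 6 + 5 + 4 + 3}{36} = \frac{30}{36} = \frac{5}{6} \approx 0.83.$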
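Parts (a) and (b) can be double-checked by enumerating all 36 equally likely rolls instead of reading the table. This is a minimal sketch with made-up variable names, assuming two fair six-sided dice.

```python
from fractions import Fraction
from itertools import product

# Build the distribution of the sum from the 36 equally likely outcomes.
dist = {}
for a, b in product(range(1, 7), repeat=2):
    dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)

mu = sum(x * p for x, p in dist.items())
var_shortcut = sum(x * x * p for x, p in dist.items()) - mu ** 2
var_definition = sum((x - mu) ** 2 * p for x, p in dist.items())
prob_4_to_10 = sum(p for x, p in dist.items() if 4 <= x <= 10)

print(mu, var_shortcut, var_definition, prob_4_to_10)  # 7 35/6 35/6 5/6
```

Both variance computations return 35/6, matching the two hand calculations.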




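For part (c), we will use Chebychev’s inequality to estimate what this should be. Remember that the inequality states

$P(\mu - c \le X \le \mu + c) \ge 1 - \frac{\sigma^2}{c^2}.$

In our case, μ = 7 and σ ≈ 2.42. We want to estimate the probability that the number is between 4 and 10 inclusive, and so we should take c to be 3. This is because we want to find

$P(4 \le X \le 10) = P(7 - 3 \le X \le 7 + 3).$

Following the theorem with c = 3,

$P(4 \le X \le 10) \ge 1 - \frac{35/6}{9} = \frac{19}{54} \approx 0.35.$

So the bounds that we get are not that great: the table gives this probability exactly as 5/6 ≈ 0.83.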
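To get a feel for how loose the bound is, the sketch below compares it with the exact probability for several interval half-widths c on the two-dice distribution. The function names are invented for this illustration, and the bound is clipped at 0 when 1 − σ²/c² is negative.

```python
from fractions import Fraction

# Sum of two fair dice: the sum s occurs in 6 - |s - 7| of the 36 outcomes.
dist = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
mu = sum(s * p for s, p in dist.items())
var = sum((s - mu) ** 2 * p for s, p in dist.items())

def chebyshev_bound(var, c):
    """Lower bound 1 - var / c^2 on P(mu - c <= X <= mu + c), clipped at 0."""
    return max(Fraction(0), 1 - var / Fraction(c) ** 2)

def exact_probability(dist, mu, c):
    """Exact P(mu - c <= X <= mu + c) read off the distribution."""
    return sum(p for s, p in dist.items() if mu - c <= s <= mu + c)

for c in range(1, 6):
    print(c, float(chebyshev_bound(var, c)), float(exact_probability(dist, mu, c)))
```

For c = 1 and c = 2 the bound is 0, and even at c = 3 it is well below the exact value, which is the sense in which the bounds are not that great.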



