# Variance and Standard Deviation (7.5)

10/29/2009

We discussed one statistical quantity, which was the mean. Such quantities are supposed to extract some understanding of our data. You probably have some intuition for what the mean is. The standard deviation measures how much, on average, our data differ from the average.

Here is the typical example. Consider the following distributions of students’ scores on a quiz. Here is the first one:

[Figure: histogram of the first distribution of quiz scores; horizontal axis: Total Score (out of 10), 0 to 10; vertical axis from 0 to 1.0]

In this case, the mean is μ = 4.

Here is a different distribution, with 5 different scores: 20% of the students scored 2, 20% scored 3, 20% scored 4, 20% scored 5, and 20% scored 6.

[Figure: histogram of the second distribution of quiz scores; horizontal axis: Total Score (out of 10), 0 to 10; vertical axis from 0 to 1.0]

Here we can also compute the mean score, and it is

$$\mu = 2(0.2) + 3(0.2) + 4(0.2) + 5(0.2) + 6(0.2) = 4.$$
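
The mean of a probability distribution is each outcome multiplied by its probability, then summed. Here is a minimal Python sketch of that computation for the second quiz distribution; the `{outcome: probability}` dictionary and the helper name `mean` are illustrative choices, not notation from these notes. Exact fractions are used because 0.2 is not exactly representable in floating point.

```python
from fractions import Fraction

def mean(dist):
    """Expected value of a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in dist.items())

# Second quiz distribution: scores 2 through 6, each with relative frequency 1/5 (20%).
quiz2 = {score: Fraction(1, 5) for score in range(2, 7)}

print(mean(quiz2))  # 4
```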

So, in both cases, the mean is the same, but the distributions are quite different. There is a quantity that distinguishes between these two by measuring the average “spread” of the data about the mean. It is called the standard deviation.

Continuing to denote the mean by μ, you would probably guess that the best way to define the average difference from the mean is

$$E(X - \mu).$$

However, notice that in the second example from before, the positive and negative contributions cancel:

$$E(X - \mu) = (2-4)(0.2) + (3-4)(0.2) + (4-4)(0.2) + (5-4)(0.2) + (6-4)(0.2) = 0,$$

and so we take the square to eliminate this problem.
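
To see the cancellation concretely, the following sketch (using the same hypothetical `{outcome: probability}` dictionary) computes both the average signed deviation and the average squared deviation for the second quiz distribution.

```python
from fractions import Fraction

# Second quiz distribution: scores 2 through 6, each with probability 1/5.
quiz2 = {score: Fraction(1, 5) for score in range(2, 7)}
mu = sum(x * p for x, p in quiz2.items())  # the mean, equal to 4

# Average signed deviation: the positive and negative terms cancel.
signed = sum((x - mu) * p for x, p in quiz2.items())

# Average squared deviation: every term is nonnegative, so nothing cancels.
squared = sum((x - mu) ** 2 * p for x, p in quiz2.items())

print(signed, squared)  # 0 2
```

The squared version is the quantity that the notes go on to call the variance.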


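So we define the variance as

$$\operatorname{Var}(X) = E\big[(X - \mu)^2\big].$$

For a probability distribution with outcomes $x_1, \dots, x_n$ occurring with probabilities $p_1, \dots, p_n$, this is

$$\operatorname{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 p_i.$$

Just note that when we have a sample from which we want to learn something about a larger population, there is a slightly different, better estimator for the sample variance. I don’t want to go into it here so as not to confuse you, but I’ll say some…

So, if you’d like to see some algebra performed on random variables, we have that, for a probability distribution,

$$\operatorname{Var}(X) = E\big[(X - \mu)^2\big] = E\big[X^2 - 2\mu X + \mu^2\big] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.$$

Here we use linearity of expected value and the fact that $E(X) = \mu$ is a constant.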
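
Both ways of computing the variance, the definition $E[(X - \mu)^2]$ and the shortcut $E(X^2) - \mu^2$, can be checked against each other numerically. A sketch, with hypothetical helper names `variance_definition` and `variance_shortcut`, applied to the second quiz distribution:

```python
from fractions import Fraction

def variance_definition(dist):
    """Var(X) = E[(X - mu)^2], computed straight from the definition."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Var(X) = E(X^2) - mu^2, the algebraic shortcut."""
    mu = sum(x * p for x, p in dist.items())
    ex2 = sum(x ** 2 * p for x, p in dist.items())
    return ex2 - mu ** 2

quiz2 = {score: Fraction(1, 5) for score in range(2, 7)}
print(variance_definition(quiz2), variance_shortcut(quiz2))  # 2 2
```

The shortcut is usually less work by hand because it avoids subtracting μ from every outcome.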

Example (7.5.7): This table has the relative frequency distribution for the weekly sales of two businesses, A and B.

a.) Compute the population mean and the variance for each business.

b.) Which business has the better sales record?

c.) Which business has the more consistent sales record?

| Sales | Relative freq. A | Relative freq. B |
|---|---|---|
| 100 | 0.1 | 0.0 |
| 101 | 0.2 | 0.2 |
| 102 | 0.3 | 0.0 |
| 103 | 0.0 | 0.2 |
| 104 | 0.0 | 0.1 |
| 105 | 0.2 | 0.2 |
| 106 | 0.2 | 0.3 |

Part (a) is the only one with actual work. Let’s first find the mean in each case. Putting these numbers into a calculator, we have that

$$\mu_A = 100(0.1) + 101(0.2) + 102(0.3) + 103(0) + 104(0) + 105(0.2) + 106(0.2) = 103,$$

$$\mu_B = 100(0) + 101(0.2) + 102(0) + 103(0.2) + 104(0.1) + 105(0.2) + 106(0.3) = 104.$$


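One way to find the variance now is to compute $E(X^2)$ in each case. Let’s recall the computation we just did for $E(X)$ first: $E_A(X) = 103$ and $E_B(X) = 104$. Squaring each sales figure and weighting by the same relative frequencies (terms with relative frequency 0 drop out) gives

$$E_A(X^2) = 100^2(0.1) + 101^2(0.2) + 102^2(0.3) + 105^2(0.2) + 106^2(0.2) = 10613.6,$$

$$E_B(X^2) = 101^2(0.2) + 103^2(0.2) + 104^2(0.1) + 105^2(0.2) + 106^2(0.3) = 10819.4.$$

Now we just apply our formula for variance:

$$\operatorname{Var}_A(X) = 10613.6 - 103^2 = 4.6, \qquad \operatorname{Var}_B(X) = 10819.4 - 104^2 = 3.4.$$

For part (b), we see that Business B has the higher mean number of sales, so it has the better sales record. For part (c), Business B also has the smaller variance, so it is more consistent as well.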
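
All of part (a) can be reproduced in a few lines. This sketch stores each column of the table as a dictionary (our choice of representation) and uses the shortcut $E(X^2) - \mu^2$; the relative frequencies are entered as exact fractions so the printed variances are free of floating-point error.

```python
from fractions import Fraction as F

# Relative frequency table from Example (7.5.7): weekly sales -> relative frequency.
business_a = {100: F("0.1"), 101: F("0.2"), 102: F("0.3"), 103: F("0.0"),
              104: F("0.0"), 105: F("0.2"), 106: F("0.2")}
business_b = {100: F("0.0"), 101: F("0.2"), 102: F("0.0"), 103: F("0.2"),
              104: F("0.1"), 105: F("0.2"), 106: F("0.3")}

def mean_and_variance(dist):
    """Return (E(X), E(X^2) - E(X)^2) for a {value: probability} table."""
    mu = sum(x * p for x, p in dist.items())
    ex2 = sum(x * x * p for x, p in dist.items())
    return mu, ex2 - mu ** 2

for name, dist in [("A", business_a), ("B", business_b)]:
    mu, var = mean_and_variance(dist)
    print(name, float(mu), float(var))
# A 103.0 4.6
# B 104.0 3.4
```

A higher mean with a lower variance is what makes B better on both counts in parts (b) and (c).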

Definition: The standard deviation is the square root of the variance. It is usually denoted by the letter σ:

$$\sigma = \sqrt{\operatorname{Var}(X)}.$$

You are probably more used to hearing about the standard deviation than the variance.

Theorem (Chebychev’s Inequality): Let μ be the mean and let σ be the standard deviation of a probability distribution. Then, for any number $c > 0$,

$$P(\mu - c\sigma \le X \le \mu + c\sigma) \ge 1 - \frac{1}{c^2}.$$

The bounds that the theorem yields are in practice not so great…
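
Chebychev’s bound depends only on c, not on the shape of the distribution. As an illustration (the helper names `chebychev_lower_bound` and `prob_within` are hypothetical), this sketch compares the bound with the exact probability of landing within cσ of the mean for the second quiz distribution, where σ = √2.

```python
import math

def chebychev_lower_bound(c):
    """Chebychev: P(mu - c*sigma <= X <= mu + c*sigma) >= 1 - 1/c**2, for c > 0."""
    if c <= 0:
        raise ValueError("c must be positive")
    return 1 - 1 / c**2

def prob_within(dist, c):
    """Exact P(mu - c*sigma <= X <= mu + c*sigma) for a {value: probability} table."""
    mu = sum(x * p for x, p in dist.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))
    return sum(p for x, p in dist.items() if mu - c * sigma <= x <= mu + c * sigma)

# Second quiz distribution: scores 2..6, each with probability 0.2, so sigma = sqrt(2).
quiz2 = {score: 0.2 for score in range(2, 7)}

# Print c, the Chebychev lower bound, and the exact probability side by side.
for c in (1, 1.5, 2):
    print(c, round(chebychev_lower_bound(c), 3), round(prob_within(quiz2, c), 3))
```

For c = 1 the bound is 0, which is true but useless; for c = 1.5 and c = 2 it holds with plenty of room to spare, which is the sense in which the bounds are "not so great".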


Example (7.5.12): An electronics firm determines that the number of defective transistors in each batch averages 15 with standard deviation 10. Suppose that 100 batches are produced. Estimate the number of batches having between 0 and 30 defective transistors.

We’ll use Chebychev’s Inequality. Here we’re trying to find the probability of being within 15 of the mean. That is, we’re trying to find

$$P(15 - 15 \le X \le 15 + 15) = P\big(15 - 1.5(10) \le X \le 15 + 1.5(10)\big),$$

so $c = 1.5$. Our theorem says that this will be at least

$$1 - \frac{1}{1.5^2} = 1 - \frac{4}{9} = \frac{5}{9}.$$

So at least $\frac{5}{9}$ of the batches will have between 0 and 30 defective transistors, which is $\frac{5}{9} \cdot 100 \approx 55.6$, or approximately 56 batches.
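
The arithmetic of this example as a short sketch; the variable names are ours, and exact fractions keep 5/9 from turning into a rounded decimal until the very end.

```python
from fractions import Fraction

mean_defects = 15   # average number of defective transistors per batch
sd_defects = 10     # standard deviation of that number
batches = 100

# 0 to 30 defects is the interval mean +/- 15, i.e. mean +/- c*sigma with c = 15/10.
c = Fraction(30 - mean_defects, sd_defects)
fraction_within = 1 - 1 / c**2   # Chebychev lower bound on the proportion of batches

print(c, fraction_within, round(float(batches * fraction_within), 1))  # 3/2 5/9 55.6
```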

Recall that last time we had a simple formula for the expected value of a binomial random variable with $n$ trials and probability of success $p$, which was

$$E(X) = np.$$

There is also a simple formula for its variance, which is

$$\operatorname{Var}(X) = npq,$$

remembering that $q = 1 - p$.

Example: What is the probability of success for a binomial random variable with 20 trials whose variance is 5?

We just need to plug the numbers into the formula. Then we get the equation

$$20p(1 - p) = 5.$$
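
The formula $\operatorname{Var}(X) = npq$ can be checked directly against the binomial probabilities $\binom{n}{k}p^k q^{n-k}$, and the equation $20p(1 - p) = 5$ can be handed to the quadratic formula. A sketch with hypothetical helper names `binomial_pmf` and `variance`:

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_pmf(n, p):
    """{k: P(X = k)} for a binomial random variable with n trials, success probability p."""
    q = 1 - p
    return {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

def variance(dist):
    """E(X^2) - E(X)^2 for a {value: probability} table."""
    mu = sum(k * pk for k, pk in dist.items())
    return sum(k * k * pk for k, pk in dist.items()) - mu**2

# Compare the pmf-based variance with npq for n = 20, p = 1/2.
n, p = 20, Fraction(1, 2)
print(variance(binomial_pmf(n, p)), n * p * (1 - p))  # 5 5

# Solve 20p(1 - p) = 5, i.e. 20p^2 - 20p + 5 = 0, with the quadratic formula.
a, b, c = 20, -20, 5
disc = b * b - 4 * a * c   # discriminant; it is 0 here, so there is one (double) root
roots = {(-b + sign * sqrt(disc)) / (2 * a) for sign in (1, -1)}
print(roots)  # {0.5}
```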


This has solution $p = \tfrac{1}{2}$, but we can solve it explicitly…

$$20p - 20p^2 = 5 \;\Longrightarrow\; 4p^2 - 4p + 1 = 0 \;\Longrightarrow\; (2p - 1)^2 = 0 \;\Longrightarrow\; p = \tfrac{1}{2}.$$

Example (7.5.15): The probability distribution for the sum of numbers obtained from tossing a pair of dice is given in the table.

a.) Compute the mean and the variance of this probability distribution.

b.) Using the table, calculate the probability that the number is between 4 and 10, inclusive.

c.) Use the Chebychev inequality to estimate the probability that the number is between 4 and 10, inclusive.

| Number | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Probability | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

Hopefully you can compute the mean easily now. Just to make things clear, I’ll compute the variance in two different ways.

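(a). The mean is given by

$$\mu = \frac{2\cdot 1 + 3\cdot 2 + 4\cdot 3 + 5\cdot 4 + 6\cdot 5 + 7\cdot 6 + 8\cdot 5 + 9\cdot 4 + 10\cdot 3 + 11\cdot 2 + 12\cdot 1}{36} = \frac{252}{36} = 7.$$

This is what you should have expected, since the distribution is symmetric about 7. Now let’s start computing the variance, beginning with

$$E(X^2) = \frac{2^2\cdot 1 + 3^2\cdot 2 + 4^2\cdot 3 + 5^2\cdot 4 + 6^2\cdot 5 + 7^2\cdot 6 + 8^2\cdot 5 + 9^2\cdot 4 + 10^2\cdot 3 + 11^2\cdot 2 + 12^2\cdot 1}{36} = \frac{1974}{36} = \frac{329}{6}.$$

The variance is then

$$\operatorname{Var}(X) = E(X^2) - \mu^2 = \frac{329}{6} - 49 = \frac{35}{6} \approx 5.83.$$

If we wanted to, we could have computed the variance using our original formula. So let’s do that now, in case you like that way better.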
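
Assuming two fair six-sided dice, the table itself can be generated by counting the 36 equally likely outcomes, after which the mean and the shortcut variance follow. A sketch (the variable names are ours):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Build the table by counting the 36 equally likely outcomes of two fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dice = {s: Fraction(k, 36) for s, k in sorted(counts.items())}

mu = sum(s * p for s, p in dice.items())       # E(X)
ex2 = sum(s * s * p for s, p in dice.items())  # E(X^2)
print(mu, ex2, ex2 - mu**2)  # 7 329/6 35/6
```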


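$$\operatorname{Var}(X) = E\big[(X - \mu)^2\big] = \frac{(-5)^2\cdot 1 + (-4)^2\cdot 2 + (-3)^2\cdot 3 + (-2)^2\cdot 4 + (-1)^2\cdot 5 + 0^2\cdot 6 + 1^2\cdot 5 + 2^2\cdot 4 + 3^2\cdot 3 + 4^2\cdot 2 + 5^2\cdot 1}{36} = \frac{210}{36} = \frac{35}{6}.$$

As expected, we get the same answer. Notice that the standard deviation is

$$\sigma = \sqrt{\tfrac{35}{6}} \approx 2.42.$$

For part (b), we have to calculate the probability that the sum rolled is between 4 and 10 inclusive, using the table. This is just adding a few numbers together:

$$P(4 \le X \le 10) = \frac{3 + 4 + 5 + 6 + 5 + 4 + 3}{36} = \frac{30}{36} = \frac{5}{6} \approx 0.83.$$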
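
A sketch of the other route for part (a), plus part (b). For fair dice, the table can be written in closed form as $P(X = s) = \frac{6 - |s - 7|}{36}$ for $s = 2, \dots, 12$, which reproduces the probabilities 1/36, 2/36, …, 6/36, …, 1/36; the code uses that form for brevity.

```python
import math
from fractions import Fraction

# Sum of two fair dice: P(sum = s) = (6 - |s - 7|) / 36 for s = 2, ..., 12.
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
mu = sum(s * p for s, p in dice.items())

# Variance from the original definition E[(X - mu)^2].
var = sum((s - mu) ** 2 * p for s, p in dice.items())
sigma = math.sqrt(float(var))

# Part (b): add up the table entries for sums 4 through 10.
p_4_to_10 = sum(dice[s] for s in range(4, 11))

print(var, round(sigma, 2), p_4_to_10)  # 35/6 2.42 5/6
```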

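For part (c), we will use Chebychev’s inequality to estimate what this should be. Remember that the inequality states

$$P(\mu - c\sigma \le X \le \mu + c\sigma) \ge 1 - \frac{1}{c^2}.$$

In our case, μ = 7 and σ ≈ 2.42. We want to estimate the probability that the number is between 4 and 10 inclusive, and so we should take c to be $\frac{3}{2.42} \approx 1.24$. This is because we want to find

$$P(4 \le X \le 10) = P(7 - 3 \le X \le 7 + 3),$$

and $c\sigma = 3$ exactly when $c = \frac{3}{\sigma}$. Following the theorem with $c = \frac{3}{2.42}$,

$$P(4 \le X \le 10) \ge 1 - \frac{1}{c^2} = 1 - \left(\frac{2.42}{3}\right)^2 \approx 0.35.$$

So the bounds that we get are not that great: the exact probability from part (b) is $\frac{5}{6} \approx 0.83$.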
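
Finally, part (c) next to the exact answer. This sketch uses the unrounded σ = √(35/6), so its bound is 1 − σ²/9 = 19/54 ≈ 0.352 rather than the ≈ 0.349 obtained with σ rounded to 2.42; either way it falls far short of the exact 5/6.

```python
import math
from fractions import Fraction

# Sum of two fair dice, in closed form: P(sum = s) = (6 - |s - 7|) / 36.
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
mu = sum(s * p for s, p in dice.items())  # 7
sigma = math.sqrt(float(sum((s - mu) ** 2 * p for s, p in dice.items())))  # about 2.415

c = 3 / sigma                # 4..10 is mu +/- 3, so we need c * sigma = 3
bound = 1 - 1 / c**2         # Chebychev lower bound
exact = float(sum(dice[s] for s in range(4, 11)))

print(round(c, 2), round(bound, 3), round(exact, 3))  # 1.24 0.352 0.833
```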
