Chapter 7: Random Variables



Key Vocabulary:




   random variable

   discrete random variable

   probability distribution

   probability histogram

   density curve

   probability density curve

   continuous random variable

   uniform distribution

   normal distribution

   expected value

   Law of Large Numbers

   variance

   standard deviation

7.1 Discrete and Continuous Random Variables (pp.367-379)

1. What is a discrete random variable?



2. If X is a discrete random variable, what information does the probability distribution of X give?



3. In a probability histogram what does the height of each bar represent?



4. In a probability histogram, what is the sum of the heights of all the bars?



5. What is a continuous random variable?



6. If X is a continuous random variable, how is the probability distribution of X described?



7. What is the area under a probability density curve equal to?



8. What is the difference between a discrete random variable and a continuous random variable?




9. If X is a discrete random variable, do            and             have the same value? Explain.




10. If X is a continuous random variable, do            and             have the same value? Explain.

11. How is a normal distribution related to a probability distribution?




12. If a normal distribution is always a probability distribution, is a probability distribution always a normal distribution?

7.2 Means and Variances of Random Variables (pp.385-402)

1. Explain the difference between the notations       and       .


2. What is meant by the expected value of X?


3. How do you calculate the mean of a discrete random variable X?


4. Explain the Law of Large Numbers.



5. Suppose       = 5 and       = 10. According to the rules for means, what is    ?


6. Suppose       = 2. According to the rules for means, what is        ?


7. Explain how to calculate the variance of a discrete random variable X using the formula

   $\sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i$.




8. Given the variance of a random variable, explain how to calculate the standard deviation.


9. Suppose       = 2 and        = 3 and X and Y are independent random variables. According to the rules for variances, what is       ? What is         ?

10. Suppose   = 4. According to the rules for variances, what is   ? What is   ?

Chapter 7
  Sec 7.1

Sample spaces need not consist of numbers. In statistics, however, we are most often interested in
numerical outcomes, such as the count of some occurrence. We denote such an outcome by a letter
such as X and call X a random variable because its values vary when the phenomenon is repeated.
We use capital letters near the end of the alphabet, such as X or Y, for random variables.

A random variable is a variable whose value is a numerical outcome of a random phenomenon.
When a random variable describes a random phenomenon, the sample space S just lists the
possible values of the random variable. There are two ways of assigning probabilities to the
values of a random variable, one for discrete and one for continuous random variables, and they
will dominate our application of probability as we study statistical inference.

Random variables can be either discrete or continuous. A discrete random variable X has a
countable number of possible values. The probability distribution of X lists the values and their
probabilities in table form. The probabilities must satisfy two requirements:
1) every probability $p_i$ is a number between 0 and 1
2) $p_1 + p_2 + \cdots + p_k = 1$.
The probability of any event is found by adding the probabilities $p_i$ of the particular values $x_i$ that
make up the event.
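The two requirements and the "add up the $p_i$" rule can be checked mechanically. Below is a minimal sketch, assuming X = number of heads in three tosses of a fair coin (an example chosen for illustration); the helpers `check_distribution` and `prob` are hypothetical names, not from the text.

```python
from fractions import Fraction

# Assumed example: X = number of heads in three tosses of a fair coin.
# Keys are the values x_i; entries are the probabilities p_i.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def check_distribution(d):
    """Hypothetical helper: True if every p_i is in [0, 1] and the p_i sum to 1."""
    return all(0 <= p <= 1 for p in d.values()) and sum(d.values()) == 1


def prob(d, event):
    """Hypothetical helper: P(event) = sum of the p_i of the x_i in the event."""
    return sum((p for x, p in d.items() if event(x)), Fraction(0))


print(check_distribution(dist))       # True
print(prob(dist, lambda x: x >= 2))   # 1/2
```

Exact fractions are used so that the check "the probabilities add to exactly 1" is not thrown off by floating-point rounding.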

In Chapters 1 and 2 we used histograms and density curves to describe finite quantitative data.
In this chapter we will use analogous methods to describe the probabilities of random variables:
histograms for discrete random variables and density curves for continuous ones. We previously
used histograms to picture the distributions of data; for a discrete random variable, a probability
histogram can display the probability distribution instead of a table. The height of each bar
shows the probability of the outcome at its base. Because the heights are probabilities, they add
to 1. All the bars in the histogram have the same width, so the areas of the bars also display the
assignment of probability to outcomes. See Ex. 7.2 page 394 for more explanation.
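A drawn histogram is best here, but a text version shows the same idea: one bar per value, bar length proportional to probability. This is a sketch under the same assumed example (heads in three fair coin tosses); `text_histogram` is a hypothetical helper, not from the text.

```python
from fractions import Fraction

# Assumed example: X = number of heads in three tosses of a fair coin.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def text_histogram(d, width=40):
    """Hypothetical helper: one text bar per value x, length proportional to P(X = x).

    Every bar is one line tall (same width), so bar length, like bar area
    in a drawn probability histogram, tracks probability.
    """
    lines = []
    for x in sorted(d):
        bar = "#" * round(d[x] * width)
        lines.append(f"{x:>3} | {bar} {d[x]}")
    return lines


for line in text_histogram(dist):
    print(line)

# The bar heights are probabilities, so they add to 1.
print(sum(dist.values()) == 1)  # True
```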

For continuous random variables, which take all values in an interval and therefore have infinitely
many possible values, other methods must be employed. We cannot assign a probability to EACH
individual value of X and then add, since there are INFINITELY many possible values. Instead we
assign probabilities directly to events using areas under a density curve. Any density curve has
area exactly 1 underneath it, corresponding to total probability 1.

More formally...

A continuous random variable X takes all values in an interval of numbers. The probability
distribution of X is described by a density curve. The probability of any event is the area under
the density curve and above the values of X that make up the event.

The probability model for a continuous random variable assigns probabilities to intervals
of outcomes rather than to individual outcomes. In fact all continuous probability
distributions assign probability 0 to every individual outcome. Only intervals of values
have positive probability.
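The uniform distribution (from the vocabulary list) makes "probability = area" concrete, because its density curve is a flat line and every area is a rectangle. A minimal sketch, assuming X is uniform on an interval [low, high] (default [0, 1]); `uniform_prob` is a hypothetical helper name.

```python
def uniform_prob(a, b, low=0.0, high=1.0):
    """P(a <= X <= b) for X uniform on [low, high]: the area of a rectangle.

    The density curve has constant height 1 / (high - low), so the area
    above [a, b] (clipped to [low, high]) is width * height.
    """
    a, b = max(a, low), min(b, high)
    if b <= a:
        return 0.0  # empty interval or a single point: zero area
    return (b - a) / (high - low)


print(uniform_prob(0.0, 1.0))   # 1.0: total area under the density curve
print(uniform_prob(0.3, 0.7))   # about 0.4
print(uniform_prob(0.5, 0.5))   # 0.0: an individual outcome has probability 0
```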

We ignore the distinction between ≥ and > (or between ≤ and <) when finding probabilities for
continuous random variables, because P(X = a) = 0 for any single value a, but we keep the
distinction when working with discrete random variables.
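A quick comparison shows why the distinction matters in one case and not the other. This sketch assumes two illustrative examples: heads in three fair coin tosses (discrete) and a standard normal Z (continuous, via Python's standard-library `statistics.NormalDist`).

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case (assumed example): X = number of heads in three fair tosses.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
p_ge_2 = sum(p for x, p in dist.items() if x >= 2)  # P(X >= 2)
p_gt_2 = sum(p for x, p in dist.items() if x > 2)   # P(X > 2)
print(p_ge_2, p_gt_2)  # 1/2 1/8: they differ by P(X = 2) = 3/8

# Continuous case (assumed): Z standard normal. The area of a strip
# [c, c + h] above the single value c shrinks toward 0 as the width h
# shrinks, so P(Z = c) = 0 and P(Z >= c) equals P(Z > c).
z = NormalDist(0, 1)
c = 1.0
for h in (0.1, 0.01, 0.001):
    print(h, z.cdf(c + h) - z.cdf(c))
```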

Because any density curve describes an assignment of probabilities, normal distributions are
probability distributions. Recall that for data, the notation N(mean, standard deviation) let us
standardize observations into z-scores, $z = (x - \mu)/\sigma$. A normally distributed random variable X
can be standardized with the same formula, $Z = (X - \mu)/\sigma$, to give a standard normal random
variable Z having distribution N(0,1).
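Standardizing does not change the probability; it only moves the question onto the N(0,1) scale. A sketch using the standard-library `statistics.NormalDist`; the distribution N(100, 15) and the cutoff 130 are made-up numbers for illustration.

```python
from statistics import NormalDist

# Assumed for illustration: X is normal with mean 100 and standard deviation 15.
mu, sigma = 100, 15
x = 130

# Standardize with the same z-score formula used for data.
z = (x - mu) / sigma
print(z)  # 2.0

# P(X < 130) computed directly from N(100, 15) ...
p_direct = NormalDist(mu, sigma).cdf(x)
# ... matches P(Z < 2) computed from the standard normal N(0, 1).
p_standard = NormalDist(0, 1).cdf(z)
print(round(p_direct, 4), round(p_standard, 4))  # 0.9772 0.9772
```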
Chapter 7
  Sec 7.2

Probability is the math language that describes the LONG-RUN regular behavior of random
phenomena.

Read the first sentence again until you understand every word.

The mean x of a set of observations is their ordinary average. The mean of a random variable
X is also an average of the possible values of X, but with an essential CHANGE to take into
account the fact that NOT all outcomes need be equally likely. See Ex.7.5 page 407. The mean
of X is the LONG RUN AVERAGE you expect for a very large number of times. Just as
probabilities are an idealized description of long run proportions, the mean of a probability
distribution describes the long run average outcome.

The common symbol for the mean of a probability distribution is x ...notice the subscript to
indicate this is the mean of a random variable X and not the mean of a normal distribution. The
mean of a random variable X is often called the EXPECTED VALUE of X. The mean of a
discrete random variable is the average of the possible outcomes, but a weighted average in
which each outcome is weighted by its probability. Because the probabilities add to 1, we have
total weight 1 to distribute among the outcomes. The probability distribution of a discrete
random variable is given in table form as on page 408 with row 1 giving variable values and row
2 giving corresponding probabilities. To find the mean of X, multiply each possible value by it
probability, then ADD. Symbolically, it looks like
                                    $\mu_X = x_1p_1 + x_2p_2 + \cdots + x_kp_k$
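The formula translates directly into "multiply, then add". A minimal sketch; the game below is hypothetical, chosen so that the weighted mean and the plain (unweighted) average of the values clearly differ.

```python
from fractions import Fraction


def mean(dist):
    """mu_X = x1*p1 + x2*p2 + ... + xk*pk: each value weighted by its probability."""
    return sum(x * p for x, p in dist.items())


# Hypothetical game: win $10 with probability 1/10, otherwise win $0.
game = {0: Fraction(9, 10), 10: Fraction(1, 10)}
print(mean(game))               # 1: the long-run average payout
print(sum(game) / len(game))    # 5.0: plain average of the values, ignores probabilities

# A fair die, where all six values are equally likely.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(mean(die))                # 7/2
```

When every outcome is equally likely (the die), the weighted average and the ordinary average agree; the game shows why the weighting is the essential change.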

The mean is a measure of the center of a distribution. The variance and the standard deviation
are the measures of spread that accompany the choice of the mean to measure center. To
distinguish between the variance of a data set ($s^2$) and the variance of a random variable, we
change our notation to $\sigma_X^2$. The definition of the variance of a random variable is similar to the
definition of the sample variance from Chapter 1. That is, the variance is an average of the
squared deviations $(X - \mu_X)^2$ of the variable X from its mean, each weighted by its probability:
$\sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i$. See page 410 for more detail.

The LAW OF LARGE NUMBERS (holds true for any population with a finite mean)

Draw independent observations at random from any population with finite mean $\mu$. Decide how
accurately you would like to estimate $\mu$. As the number of observations drawn increases, the
mean $\bar{x}$ of the observed values eventually approaches the mean $\mu$ of the population as
closely as you specified and then stays that close. (Asymptotic, remember?) Broadly, the law says
that the average of many independent observations is stable and predictable: averaging over many
individuals produces a stable result.
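A short simulation shows the law at work. This sketch assumes a fair die (population mean 3.5) and Python's `random` module with a fixed seed; the exact running means depend on the seed, so none are listed here, but x-bar at 100,000 rolls lands very close to 3.5.

```python
import random

# Roll a fair die repeatedly (population mean mu = 3.5) and record the
# running mean x-bar of the observed values at a few checkpoints.
random.seed(1)  # fixed seed so the run is reproducible

total = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_means = {}
for n in range(1, 100_001):
    total += random.randint(1, 6)  # one roll: integer from 1 to 6
    if n in checkpoints:
        running_means[n] = total / n

for n, xbar in sorted(running_means.items()):
    print(f"n = {n:>6}: x-bar = {xbar:.4f}, distance from 3.5 = {abs(xbar - 3.5):.4f}")
```

Early checkpoints can wander noticeably; it is only as n grows that x-bar settles near 3.5 and stays there.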

The mean of a random variable is the average of the variable in two senses:
1) by definition it is the average of the possible values, weighted by their probabilities
2) by the law of large numbers it is the long run average of many independent observations on
the variable.
Our intuition does a poor job of distinguishing random behavior from systematic influences, which
points out the need for statistical inference to supplement exploratory analysis of data. Probability
calculations can help verify that what we see in the data is more than a random pattern. How large
is "large" depends on the variability of the random outcomes: the more variable the outcomes, the
more trials are needed to ensure that the mean outcome is close to the distribution mean.


RULES FOR VARIANCES:

The mean of a sum of random variables is the sum of their means, BUT variances do not always
add. If random variables are independent, any association between their values is ruled out,
and their variances DO ADD. Two random variables X and Y are independent if knowing whether
any event involving X alone did or did not occur tells us nothing about the occurrence of any
event involving Y alone. Probability models often assume independence when the random
variables describe outcomes that appear unrelated to each other. You should ask in each instance
whether the assumption of independence seems reasonable.
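Exact computation makes the contrast visible. This sketch assumes two illustrative cases built from a fair die: two separate (independent) dice, and a "second" variable that is just the same die again (Y = X, completely dependent). `distribution_of` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def distribution_of(pairs):
    """Hypothetical helper: collapse (value, probability) pairs into {value: prob}."""
    d = {}
    for v, p in pairs:
        d[v] = d.get(v, 0) + p
    return d


die = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair die (assumed example)

# Independent: two separate dice, so P(x and y) = P(x) * P(y).
indep_sum = distribution_of(
    (x + y, px * py) for (x, px), (y, py) in product(die.items(), repeat=2)
)

# Not independent: Y is the same die as X (Y = X), so X + Y = 2X.
dep_sum = distribution_of((2 * x, px) for x, px in die.items())

print(variance(die))        # 35/12
print(variance(indep_sum))  # 35/6 = 35/12 + 35/12: variances add
print(variance(dep_sum))    # 35/3 = 4 * 35/12: variances do not add
```

The means add in both cases (7 each time); only the variances care about independence.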

The exact rules for variances can be found on pages 420 and 421. See "Combining normal random
variables" on page 424.
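For independent normal random variables, the sum and difference are again normal, with means that add (or subtract) and variances that add. Python's standard-library `statistics.NormalDist` supports `+` and `-` between two instances under exactly this independence assumption. The two distributions below are made-up numbers for illustration.

```python
from statistics import NormalDist

# Assumed, made-up example: two independent normal random variables.
X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=20, sigma=4)

# NormalDist's "+" and "-" assume independence: means add (or subtract)
# and variances add, so the standard deviation is sqrt(3**2 + 4**2) = 5.
total = X + Y
diff = Y - X
print(total.mean, total.stdev)   # mean 30, standard deviation 5
print(diff.mean, diff.stdev)     # mean 10, standard deviation 5
print(round(total.cdf(35), 4))   # P(X + Y < 35) = P(Z < 1), about 0.8413
```

Note that the standard deviations (3 and 4) do not add to 7; it is the variances (9 and 16) that add to 25.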
Ch7 Supplement



Discrete and Continuous Random Variables:



A variable is a quantity whose value changes.



A discrete variable is a variable whose value is obtained by counting.



Examples:    number of students present

                       number of red marbles in a jar

                       number of heads when flipping three coins

                       students’ grade level



A continuous variable is a variable whose value is obtained by measuring.



Examples:    height of students in class

               weight of students in class

               time it takes to get to school

               distance traveled between classes



A random variable is a variable whose value is a numerical outcome of a random
phenomenon.



   ▪   A random variable is denoted with a capital letter
   ▪    The probability distribution of a random variable X tells what the possible values
        of X are and how probabilities are assigned to those values

   ▪    A random variable can be discrete or continuous



A discrete random variable X has a countable number of possible values.



Example: Let X represent the sum of two dice.




Then the probability distribution of X is as follows:



   X       2     3     4     5     6     7     8     9    10    11    12
 P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36




To graph the probability distribution of a discrete random variable, construct a
probability histogram.
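The P(X) row can be built by listing all 36 equally likely outcomes of two fair dice and adding 1/36 to the sum each outcome produces. A sketch, with a text bar (one "#" per 1/36) standing in for the probability histogram; Python's `Fraction` prints reduced forms, so 2/36 appears as 1/18.

```python
from fractions import Fraction

# Enumerate the 36 equally likely (die 1, die 2) outcomes of two fair dice
# and add 1/36 to the running probability of each sum.
dist = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        s = d1 + d2
        dist[s] = dist.get(s, 0) + Fraction(1, 36)

# Text stand-in for the probability histogram: one bar per value of X,
# with bar length proportional to P(X = x) (one "#" per 1/36).
for x in sorted(dist):
    print(f"{x:>2} | {'#' * int(dist[x] * 36)} {dist[x]}")
```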
A continuous random variable X takes all values in a given interval of numbers.



   ▪   The probability distribution of a continuous random variable is shown by a
       density curve.
   ▪   The probability that X falls in an interval of numbers is the area under the
       density curve between the interval's endpoints
   ▪   The probability that a continuous random variable X is exactly equal to a
       number is zero
Means and Variances of Random Variables:



The mean of a discrete random variable, X, is its weighted average. Each value of X is
weighted by its probability.



To find the mean of X, multiply each value of X by its probability, then add all the
products.




The mean of a random variable X is called the expected value of X.




Law of Large Numbers:




As the number of observations increases, the mean of the observed values, $\bar{x}$, approaches the mean of the population, $\mu$.




The more variation in the outcomes, the more trials are needed to ensure that $\bar{x}$ is close to $\mu$.
Rules for Means:



If X is a random variable and a and b are fixed numbers, then

      $\mu_{a+bX} = a + b\mu_X$

If X and Y are random variables, then

      $\mu_{X+Y} = \mu_X + \mu_Y$
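Both rules for means can be confirmed by exact computation on small distributions. The distributions X and Y below are assumed examples (not from the text), and a = 20, b = 100 simply echo the PSAT/SAT conversion; `transform` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    return sum(x * p for x, p in dist.items())


def transform(dist, a, b):
    """Hypothetical helper: distribution of a + bX (assumes b != 0)."""
    return {a + b * x: p for x, p in dist.items()}


# Assumed example distributions.
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = {1: Fraction(1, 3), 4: Fraction(2, 3)}

# Rule 1: mu_(a+bX) = a + b * mu_X
a, b = 20, 100
print(mean(transform(X, a, b)) == a + b * mean(X))  # True

# Rule 2: mu_(X+Y) = mu_X + mu_Y. X and Y are combined as independent here
# only for convenience; the rule for means does not require independence.
sum_dist = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    sum_dist[x + y] = sum_dist.get(x + y, 0) + px * py
print(mean(sum_dist) == mean(X) + mean(Y))  # True
```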

Example:

Suppose the equation Y = 20 + 100X converts a PSAT math score, X, into an SAT math
score, Y. Suppose the average PSAT math score is 48. What is the average SAT math
score?
Example:



Let                represent the average SAT math score.



Let                represent the average SAT verbal score.




                         represents the average combined SAT score. Then

                                                        is the average combined
total SAT score.
The Variance of a Discrete Random Variable:




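If X is a discrete random variable with mean $\mu_X$, then the variance of X is

      $\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_k - \mu_X)^2 p_k = \sum_i (x_i - \mu_X)^2 p_i$

The standard deviation $\sigma_X$ is the square root of the variance.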




Rules for Variances:



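If X is a random variable and a and b are fixed numbers, then

      $\sigma_{a+bX}^2 = b^2 \sigma_X^2$

If X and Y are independent random variables, then

      $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$
      $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$
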
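The rules for variances can be checked the same way. This sketch assumes the same kind of small example distributions (not from the text) and a negative b to show that the sign of b disappears when it is squared.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Assumed example distributions.
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = {1: Fraction(1, 3), 4: Fraction(2, 3)}

# Rule 1: sigma^2_(a+bX) = b^2 * sigma^2_X. Adding a shifts the values but
# does not change their spread; a negative b still gives b^2 >= 0.
a, b = 20, -3
shifted = {a + b * x: p for x, p in X.items()}
print(variance(shifted) == b ** 2 * variance(X))  # True

# Rule 2 (X and Y independent): X + Y and X - Y both have variance
# sigma^2_X + sigma^2_Y. Independence lets us multiply P(x) * P(y).
plus, minus = {}, {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    plus[x + y] = plus.get(x + y, 0) + px * py
    minus[x - y] = minus.get(x - y, 0) + px * py
print(variance(plus) == variance(X) + variance(Y))   # True
print(variance(minus) == variance(X) + variance(Y))  # True
```

Subtracting Y does not subtract its variance: X - Y is just as spread out as X + Y.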
Example:

Suppose the equation Y = 20 + 100X converts a PSAT math score, X, into an SAT math
score, Y. Suppose the standard deviation for the PSAT math score is 1.5 points. What
is the standard deviation for the SAT math score?




Suppose the standard deviation for the SAT math score is 150 points, and the standard
deviation for the SAT verbal score is 165 points. What is the standard deviation for the
combined SAT score?



*** Because the SAT math score and SAT verbal score are not independent, the rule for
     adding variances does not apply!
