Chapter 7: Random Variables

Key Vocabulary: random variable, discrete random variable, probability distribution, probability histogram, density curve, probability density curve, continuous random variable, uniform distribution, normal distribution, expected value, Law of Large Numbers, variance, standard deviation

7.1 Discrete and Continuous Random Variables (pp.367-379)
1. What is a discrete random variable?
2. If X is a discrete random variable, what information does the probability distribution of X give?
3. In a probability histogram, what does the height of each bar represent?
4. In a probability histogram, what is the sum of the heights of the bars?
5. What is a continuous random variable?
6. If X is a discrete random variable, how is the probability distribution of X described?
7. What is the area under a probability density curve equal to?
8. What is the difference between a discrete random variable and a continuous random variable?
9. If X is a discrete random variable, do $P(X > a)$ and $P(X \ge a)$ have the same value? Explain.
10. If X is a continuous random variable, do $P(X > a)$ and $P(X \ge a)$ have the same value? Explain.
11. How is a normal distribution related to a probability distribution?
12. If a normal distribution is always a probability distribution, is a probability distribution always a normal distribution?

7.2 Means and Variances of Random Variables (pp.385-402)
1. Explain the difference between the notations $\bar{x}$ and $\mu_X$.
2. What is meant by the expected value of X?
3. How do you calculate the mean of a discrete random variable X?
4. Explain the Law of Large Numbers.
5. Suppose $\mu_X$ = 5 and $\mu_Y$ = 10. According to the rules for means, what is $\mu_{X+Y}$?
6. Suppose $\mu_X$ = 2. According to the rules for means, what is ?
7. Explain how to calculate the variance of a discrete random variable X using the formula $\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i$.
8. Given the variance of a random variable, explain how to calculate the standard deviation.
9. Suppose = 2 and = 3 and X and Y are independent random variables. According to the rules for variances, what is ? What is ?
10.
Suppose = 4. According to the rules for variances, what is ? What is ?

Chapter 7 Sec 7.1

Sample spaces need not consist of numbers. In statistics, we are most often interested in numerical outcomes such as the "count" of an occurrence. We call X a random variable because its values vary when the phenomenon is repeated. We use capital letters near the end of the alphabet, like X or Y.

A random variable is a variable whose value is a numerical outcome of a random phenomenon. When a random variable describes a random phenomenon, the sample space S just lists the possible values of the random variable.

There are two ways of assigning probabilities to the values of a random variable that will dominate our application of probability as we study statistical inference. Random variables can be either discrete or continuous.

A discrete random variable X has a countable number of possible values. The probability distribution of X lists the values and their probabilities in table form. The probabilities must satisfy two requirements:
1) every probability $p_i$ is a number between 0 and 1
2) $p_1 + p_2 + \cdots + p_k = 1$
The probability of any event is found by adding the probabilities $p_i$ of the particular values $x_i$ that make up the event.

In Chapters 1 and 2 we used histograms and density curves to describe finite quantitative data. In this chapter we will use analogous methods to describe the probabilities of discrete (finite) random variables.

For discrete random variables, histograms can be used to display probability distributions instead of table form. We previously used histograms to picture the distributions of data. In a probability histogram, the height of each bar shows the probability of the outcome at its base. Because the heights are probabilities, they add to 1. All the bars in the histogram have the same width, so the areas of the bars also display the assignment of probability to outcomes. See Ex. 7.2 page 394 for more explanation.
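The two requirements and the "add the probabilities" rule above can be sketched in a few lines of Python. This is only an illustration, not an example from the text: the random variable (X = number of heads in three fair coin flips) and the helper names `is_valid_distribution` and `event_probability` are assumptions chosen for the sketch.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X = number of heads in three FAIR coin flips.
outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes

# Build the probability distribution of X as {value: probability}.
distribution = {}
for outcome in outcomes:
    heads = outcome.count("H")
    distribution[heads] = distribution.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

def is_valid_distribution(dist):
    """Requirement 1: every p_i is between 0 and 1. Requirement 2: the p_i add to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

def event_probability(dist, event):
    """Add the probabilities p_i of the values x_i that make up the event."""
    return sum(p for x, p in dist.items() if event(x))

p_at_least_two = event_probability(distribution, lambda x: x >= 2)
p_more_than_two = event_probability(distribution, lambda x: x > 2)

print(is_valid_distribution(distribution))
print(p_at_least_two, p_more_than_two)
```

Exact fractions are used so that the check "the probabilities add to 1" is not disturbed by floating-point rounding. For this discrete variable the events X ≥ 2 and X > 2 get different probabilities, because the single value X = 2 carries positive probability.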
For continuous random variables, which take infinitely many values within a given interval, other methods must be employed. We cannot assign probabilities to EACH individual value of x and then sum, since there are INFINITELY many possible values. Instead we assign probabilities directly to events using areas under a density curve. Any density curve has area exactly 1 underneath it, corresponding to total probability 1.

More formally... A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.

The probability model for a continuous random variable assigns probabilities to intervals of outcomes rather than to individual outcomes. In fact, all continuous probability distributions assign probability 0 to every individual outcome. Only intervals of values have positive probability. We ignore the distinction between > and ≥ when finding probabilities for continuous random variables, but keep the distinction when working with discrete random variables.

Because any density curve describes an assignment of probabilities, normal distributions are probability distributions. Recall the notation N(mean, standard deviation) for data, which permitted standardization of data to "z scores". A normal random variable X with distribution $N(\mu, \sigma)$ can also be standardized, using the same formula $Z = (X - \mu)/\sigma$, to become a standard normal random variable Z having distribution N(0,1).

Chapter 7 Sec 7.2

Probability is the math language that describes the LONG-RUN regular behavior of random phenomena. Read the first sentence again until you understand every word.

The mean $\bar{x}$ of a set of observations is their ordinary average. The mean of a random variable X is also an average of the possible values of X, but with an essential CHANGE to take into account the fact that NOT all outcomes need be equally likely. See Ex.7.5 page 407.
The mean of X is the LONG RUN AVERAGE you expect over a very large number of repetitions. Just as probabilities are an idealized description of long run proportions, the mean of a probability distribution describes the long run average outcome. The common symbol for the mean of a probability distribution is $\mu_X$ ...notice the subscript to indicate this is the mean of a random variable X and not the mean of a normal distribution. The mean of a random variable X is often called the EXPECTED VALUE of X.

The mean of a discrete random variable is the average of the possible outcomes, but a weighted average in which each outcome is weighted by its probability. Because the probabilities add to 1, we have total weight 1 to distribute among the outcomes. The probability distribution of a discrete random variable is given in table form as on page 408, with row 1 giving variable values and row 2 giving corresponding probabilities. To find the mean of X, multiply each possible value by its probability, then ADD. Symbolically, it looks like

$\mu_X = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$

The mean is a measure of the center of a distribution. The variance and the standard deviation are the measures of spread that accompany the choice of the mean to measure center. To distinguish between the variance of a data set ($s^2$) and the variance of a random variable, we need to change our notation to $\sigma_X^2$. The definition of the variance of a random variable is similar to the definition of the sample variance from Chapter 1. That is, the variance is an average of the squared deviation $(X - \mu_X)^2$ of the variable X from its mean. See page 410 for more detail.

The "LAW OF LARGE NUMBERS" (holds true for any population):
Draw independent observations at random from any population with finite mean $\mu$. Decide how accurately you would like to estimate $\mu$. As the number of observations drawn increases, the mean $\bar{x}$ of the observed values eventually approaches the mean $\mu$ of the population as closely as you specified and then stays that close.
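A small simulation can make the Law of Large Numbers concrete. The sketch below is an illustration under stated assumptions, not part of the text: the "population" is one roll of a fair six-sided die (so the population mean is 3.5), and the names `running_means` and `roll_die` are made up for the example.

```python
import random

def running_means(draw, n_observations, seed=0):
    """Return the running sample mean x-bar after each of n independent draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    means = []
    for n in range(1, n_observations + 1):
        total += draw(rng)
        means.append(total / n)
    return means

# Hypothetical population: one roll of a fair die, with mean mu = 3.5.
def roll_die(rng):
    return rng.randint(1, 6)

means = running_means(roll_die, 100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: x-bar = {means[n - 1]:.4f}")
```

The early running means can wander noticeably; as n grows, they settle near 3.5 and stay there. A more variable population would need more draws to settle to the same accuracy.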
(Asymptotic - remember?)

The law says broadly that the average of many independent observations is stable and predictable, and that averaging over many individuals produces a stable result. The mean of a random variable is the average of the variable in two senses:
1) by definition, it is the average of the possible values, weighted by their probabilities
2) by the law of large numbers, it is the long run average of many independent observations on the variable.

We are often unable to distinguish random behavior from systematic influences, which points to the need for statistical inference to supplement exploratory analysis of data. Probability calculations can help verify that what we see in the data is more than a random pattern.

How large is large? That depends on the variability of the random outcomes. The more variable the outcomes, the more trials are needed to ensure that the mean outcome is close to the distribution mean.

RULES FOR VARIANCES: The mean of a sum of random variables is the sum of their means, BUT this addition rule does not always hold for variances. If random variables are independent, association between their values is ruled out and their variances DO ADD. Two random variables X and Y are independent if knowing that any event involving X alone did or did not occur tells us nothing about the occurrence of any event involving Y alone. Probability models often assume independence when the random variables describe outcomes that appear unrelated to each other. You should ask in each instance whether the assumption of independence seems reasonable. The exact rules for variances can be found on pages 420 and 421. See "Combining normal random variables" on page 424.

Ch7 Supplement

Discrete and Continuous Random Variables:
A variable is a quantity whose value changes.
A discrete variable is a variable whose value is obtained by counting.
Examples:
number of students present
number of red marbles in a jar
number of heads when flipping three coins
students’ grade level

A continuous variable is a variable whose value is obtained by measuring.
Examples:
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes

A random variable is a variable whose value is a numerical outcome of a random phenomenon.
▪ A random variable is denoted with a capital letter
▪ The probability distribution of a random variable X tells what the possible values of X are and how probabilities are assigned to those values
▪ A random variable can be discrete or continuous

A discrete random variable X has a countable number of possible values.
Example: Let X represent the sum of two dice. Then the probability distribution of X (assuming fair dice) is as follows:

X      2     3     4     5     6     7     8     9     10    11    12
P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

To graph the probability distribution of a discrete random variable, construct a probability histogram.

A continuous random variable X takes all values in a given interval of numbers.
▪ The probability distribution of a continuous random variable is shown by a density curve.
▪ The probability that X falls in an interval of numbers is the area under the density curve between the interval endpoints
▪ The probability that a continuous random variable X is exactly equal to a number is zero

Means and Variances of Random Variables:
The mean of a discrete random variable, X, is its weighted average. Each value of X is weighted by its probability. To find the mean of X, multiply each value of X by its probability, then add all the products. The mean of a random variable X is called the expected value of X.

Law of Large Numbers:
As the number of observations increases, the mean of the observed values, $\bar{x}$, approaches the mean of the population, $\mu$. The more variation in the outcomes, the more trials are needed to ensure that $\bar{x}$ is close to $\mu$.
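As a worked illustration of "multiply each value by its probability, then add", the sketch below computes the mean, variance, and standard deviation for the sum-of-two-dice example. It assumes both dice are fair; the function names `mean` and `variance` are chosen for the sketch and are not from the text.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Assumption: both dice are fair, so all 36 (die1, die2) pairs are equally likely.
dist = {}
for d1, d2 in product(range(1, 7), repeat=2):
    dist[d1 + d2] = dist.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

def mean(dist):
    """Weighted average: sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Weighted average of squared deviations: sum of (x_i - mu)^2 * p_i."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu_x = mean(dist)        # exact value: 7
var_x = variance(dist)   # exact value: 35/6
sd_x = sqrt(var_x)       # square root of the variance, about 2.415

print(mu_x, var_x, round(sd_x, 3))
```

The mean 7 is the long run average sum you would expect over many rolls of the pair of dice, which is also the value the running average in the Law of Large Numbers settles toward.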
Rules for Means:
If X is a random variable and a and b are fixed numbers, then
$\mu_{a+bX} = a + b\mu_X$
If X and Y are random variables, then
$\mu_{X+Y} = \mu_X + \mu_Y$

Example: Suppose the equation Y = 20 + 100X converts a PSAT math score, X, into an SAT math score, Y. Suppose the average PSAT math score is 48. What is the average SAT math score?

Example: Let $\mu_X$ represent the average SAT math score. Let $\mu_Y$ represent the average SAT verbal score. $\mu_{X+Y}$ represents the average combined SAT score. Then $\mu_{X+Y} = \mu_X + \mu_Y$ is the average combined total SAT score.

The Variance of a Discrete Random Variable:
If X is a discrete random variable with mean $\mu_X$, then the variance of X is
$\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_k - \mu_X)^2 p_k = \sum (x_i - \mu_X)^2 p_i$
The standard deviation $\sigma_X$ is the square root of the variance.

Rules for Variances:
If X is a random variable and a and b are fixed numbers, then
$\sigma_{a+bX}^2 = b^2 \sigma_X^2$
If X and Y are independent random variables, then
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$
$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$

Example: Suppose the equation Y = 20 + 100X converts a PSAT math score, X, into an SAT math score, Y. Suppose the standard deviation for the PSAT math score is 1.5 points. What is the standard deviation for the SAT math score?

Suppose the standard deviation for the SAT math score is 150 points, and the standard deviation for the SAT verbal score is 165 points. What is the standard deviation for the combined SAT score?
*** Because the SAT math score and SAT verbal score are not independent, the rule for adding variances does not apply!
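The rules for means and variances of a linear transformation can be checked with a short sketch. It takes the conversion equation Y = 20 + 100X and the PSAT figures exactly as written in the examples above; the function names are hypothetical, and the numbers 3 and 4 in the last call are made-up values for two variables assumed to be independent.

```python
from math import sqrt

def linear_transform_mean(a, b, mean_x):
    """Rule for means: the mean of a + bX is a + b * (mean of X)."""
    return a + b * mean_x

def linear_transform_sd(b, sd_x):
    """Rule for variances: var(a + bX) = b^2 * var(X), so sd(a + bX) = |b| * sd(X).
    The added constant a shifts the center but does not change the spread."""
    return abs(b) * sd_x

def independent_sum_sd(sd_x, sd_y):
    """Variances add ONLY when X and Y are independent; standard deviations never add."""
    return sqrt(sd_x ** 2 + sd_y ** 2)

# Values taken from the examples as written: Y = 20 + 100X, mean 48, sd 1.5.
mean_y = linear_transform_mean(20, 100, 48)
sd_y = linear_transform_sd(100, 1.5)

# Hypothetical independent variables with standard deviations 3 and 4.
sd_sum = independent_sum_sd(3, 4)

print(mean_y, sd_y, sd_sum)
```

`independent_sum_sd` is deliberately not applied to the SAT math and verbal scores: as the note above says, those scores are not independent, so their variances cannot simply be added.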