Introduction to Probability
Frank Betz
EUI
September 2007

Outline
1 Introduction
2 The Probability Space
3 Conditioning, independence, Bayes' Rule and combinatorics
4 Random variables and probability distributions
5 Moments of random variables
6 Some common univariate distributions

Probability theory: why do we care?
- The descriptive analysis of data does not allow for generalisations beyond the data under consideration.
- However, as economists our concern is to draw inference on the underlying population.
- Probability theory enables us to do just that: it develops mathematical models for probability that provide the logical foundations for statistical inference.

Probability theory - statistics - econometrics
- Probability theory analyses characteristics of probability mechanisms on the basis of a limited number of definitions and axioms.
- On the basis of data on trials and some maintained assumptions about a probability mechanism, statistics "estimates" its parameters, or assesses "hypotheses" about them.
- Econometrics applies statistics to test or quantify economic models and theories.

References
- Spanos (1986), "The statistical foundations of econometric modelling"
- Appendix to Hansen's Lecture Notes
- Benoît Champagne's class notes "Probability and random signals I"
- Capinski and Kopp (2004), "Measure, integral and probability"
- Ivan Wilde's script "Measure, integration and probability"

Outline: 2 The Probability Space
- Approaches to Probability
- Random experiment, sample space, and event
- Sigma-field
- Probability

Classical probability
Definition: If a random experiment can result in $N$ mutually exclusive and equally likely outcomes, and if $N_A$ of these outcomes result in the occurrence of the event $A$, then the probability of $A$ is defined by
$$P(A) = \frac{N_A}{N}$$
Problems:
- Applies only to situations with a finite number of outcomes.
- Due to the 'equally likely' condition the definition is circular.

The frequency approach to probability
Definition: An experiment is repeated many times under similar conditions. For an event of interest, we postulate a number $P_A$, called the probability of the event, and approximate $P_A$ by the relative frequency with which the repeated observations satisfy the event.
To overcome the circularity of the classical definition, the frequency approach views probability as the limit of empirical frequencies.

Axiomatic probability
Definition: A probability space is a triple $(S, \mathcal{F}, P(\cdot))$, where
- $S$ is the sample space, a collection of possible outcomes of an experiment.
- $\mathcal{F}$ is the event space or set of events: a collection of subsets of $S$.
- $P(\cdot)$ is the probability function defined on $\mathcal{F}$: to any event, $P(\cdot)$ assigns the probability that the event is observed once the experiment is completed.
Modern probability theory, and also the remainder of this course, is concerned with axiomatic probability. We now take a closer look at its ingredients.

2 The Probability Space: Random experiment, sample space, and event

Random experiment
Definition: A random experiment $\mathcal{E}$ is an experiment which satisfies the following conditions:
- all possible distinct outcomes are known a priori;
- in any particular trial the outcome is not known a priori;
- it can be repeated under identical conditions.
The axiomatic approach to probability can be viewed as a formalisation of the concept of a random experiment.

Sample space
Definition: The sample space $S$ is defined to be the set of all possible outcomes of the experiment $\mathcal{E}$. The elements of $S$ are called elementary events.

Tossing a fair coin twice
Example: Consider the random experiment $\mathcal{E}$ of tossing a fair coin twice and observing the faces turning up. The sample space of $\mathcal{E}$ is
$$S = \{(HT), (TH), (HH), (TT)\}$$
with $(HT), (TH), (HH), (TT)$ being the outcomes or elementary events of $S$.

Event
- An event is a subset of the sample space $S$, formed by set theoretic operations on the elementary events.
- An event occurs when any of the elementary events it comprises occurs.
- When a trial is made only one elementary event is observed, but a large set of events may have occurred.
- Special events are $S$, the sure event, and $\emptyset$, the impossible event.
Example:
- $A_1 = \{(HT), (TH), (HH)\} = \{(HT)\} \cup \{(TH)\} \cup \{(HH)\}$, that is, 'two tails' does not occur.
- $A_2 = \{(TT), (HH)\} = \{(TT)\} \cup \{(HH)\}$, that is, either 'two heads' or 'two tails' occurs.

Event
We observe that for two events $A_1$ and $A_2$ the following are also events:
- $S \setminus A_1 = \bar{A}_1$, the complement of $A_1$, and likewise for $A_2$
- $A_1 \cup A_2$, the union of $A_1$ and $A_2$
- $A_1 \cap A_2 = \overline{\bar{A}_1 \cup \bar{A}_2}$, the intersection of $A_1$ and $A_2$
- $A_1 \setminus A_2 = A_1 \cap \bar{A}_2 = \overline{\bar{A}_1 \cup A_2}$
- ...
To construct the events we only need the set theoretic operations complementation and union.

Example:
- $\bar{A}_1 = \overline{\{(HT), (TH), (HH)\}} = \{(TT)\}$
- $A_1 \cup A_2 = \{(HT), (TH), (HH)\} \cup \{(TT), (HH)\} = S$
- $A_1 \cap A_2 = \{(HT), (TH), (HH)\} \cap \{(TT), (HH)\} = \{(HH)\}$
- $A_1 \setminus A_2 = \{(HT), (TH), (HH)\} \cap \{(HT), (TH)\} = \{(HT), (TH)\}$
The occurrence or non-occurrence of $A_1$ or $A_2$ implies the occurrence or non-occurrence of these events. A mathematical structure that aims to assign probabilities to events has to take this into account.

2 The Probability Space: Sigma-field

Set of events
- We now understand the meaning and the properties of events. Our goal, however, is to assign probabilities to these events.
- The probability function $P(\cdot) : \mathcal{F} \to [0, 1]$ does just that. As we use sets to represent events, its domain is a collection of sets, denoted by $\mathcal{F}$.
- To understand the probability function we first have to understand its domain.
- The key property of $\mathcal{F}$ is that if $A \in \mathcal{F}$, then sets obtained via the various set theoretic operations performed on $A$ must also be elements of $\mathcal{F}$.

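
The event operations for the two-toss example can be reproduced with Python sets. This is a minimal sketch: labelling outcomes by strings such as "HT" and the helper name `complement` are illustrative assumptions, not part of the slides. It also checks that intersection and difference can be rebuilt from complement and union alone.

```python
# Sample space for tossing a coin twice; outcomes are labelled by strings.
S = frozenset({"HT", "TH", "HH", "TT"})
A1 = frozenset({"HT", "TH", "HH"})  # 'two tails' does not occur
A2 = frozenset({"TT", "HH"})        # either 'two heads' or 'two tails' occurs


def complement(event, sample_space=S):
    """Complement of an event relative to the sample space."""
    return sample_space - event


union = A1 | A2
intersection = A1 & A2
difference = A1 - A2

# Intersection and difference expressed with complement and union only.
intersection_via_union = complement(complement(A1) | complement(A2))
difference_via_union = complement(complement(A1) | A2)

print(sorted(complement(A1)))
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Using `frozenset` rather than `set` keeps events hashable, so they can later be collected into a family of events.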
Power set
Example: When tossing a fair coin twice, the set of all subsets of $S$, that is the power set of $S$, is an obvious candidate for $\mathcal{F}$. The power set is given by:
$$\mathcal{P} = \{\emptyset, S, \{(HT)\}, \{(HH)\}, \{(TT)\}, \{(TH)\}, \{(HT), (HH)\}, \{(TT), (HH)\}, \{(TH), (HH)\}, \{(HT), (TH)\}, \{(TT), (TH)\}, \{(HT), (TT)\}, \{(HT), (TH), (TT)\}, \{(HT), (TH), (HH)\}, \{(HH), (TT), (TH)\}, \{(HH), (TT), (HT)\}\}$$
However, one can only define $\mathcal{F}$ to be the power set of $S$ if $S$ is finite or countably infinite. To assign probabilities when $S$ is uncountable we require that $\mathcal{F}$ be a $\sigma$-field.

Sigma-field
Definition: Let $\mathcal{F}$ be a set of subsets of $S$. $\mathcal{F}$ is called a sigma-field if:
- if $A \in \mathcal{F}$, then $\bar{A} \in \mathcal{F}$ (closure under complementation); and
- if $A_i \in \mathcal{F}$, $i = 1, 2, \dots$, then $\bigcup_i A_i \in \mathcal{F}$ (closure under countable union).
We observe that the definition implies:
- $S \in \mathcal{F}$, because $A \cup \bar{A} = S$
- $\emptyset \in \mathcal{F}$, as $\bar{S} = \emptyset$
- if $A_i \in \mathcal{F}$, $i = 1, 2, \dots$, then $\bigcap_i A_i \in \mathcal{F}$

Example: Check whether the collections $\mathcal{F}_1$ and $\mathcal{F}_2$ are $\sigma$-fields, where $\mathcal{F}_1$, $\mathcal{F}_2$ are given by
$$\mathcal{F}_1 = \{\{(HT)\}, \{(HH), (TH), (TT)\}, \emptyset, S\}, \quad \mathcal{F}_2 = \{\{(HT), (TH)\}, \{(HH)\}, \emptyset, S\}$$

Example: Suppose you are interested in the event 'first toss head'. How do you obtain a $\sigma$-field?

Sigma-field generated by a family of sets
Definition: $\mathcal{G}$ is the $\sigma$-field generated by a family of sets $\mathcal{A}$ if
$$\mathcal{G} = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-field such that } \mathcal{F} \supset \mathcal{A}\}$$
- $\mathcal{G}$ is the smallest $\sigma$-field containing $\mathcal{A}$.
- Starting from some events of interest and taking the $\sigma$-field they generate is necessary to construct $\sigma$-fields when $S$ is infinite or uncountable.

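
For a finite sample space, both the closure check and the generated $\sigma$-field can be computed by brute force. The sketch below assumes outcomes labelled by strings and uses the hypothetical helper names `is_sigma_field` and `generate_sigma_field`; on a finite $S$, closure under pairwise unions is enough to give closure under countable unions.

```python
from itertools import combinations

S = frozenset({"HT", "TH", "HH", "TT"})


def is_sigma_field(family, sample_space=S):
    """Check closure under complement and union on a finite sample space."""
    family = set(family)
    if not family:
        return False
    closed_complement = all(sample_space - a in family for a in family)
    closed_union = all(a | b in family for a, b in combinations(family, 2))
    return closed_complement and closed_union


def generate_sigma_field(events, sample_space=S):
    """Smallest family containing `events` closed under complement and union."""
    family = {frozenset(e) for e in events} | {sample_space}
    while True:
        new = {sample_space - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:  # nothing new was produced: the family is closed
            return family
        family |= new


F1 = {frozenset({"HT"}), frozenset({"HH", "TH", "TT"}), frozenset(), S}
F2 = {frozenset({"HT", "TH"}), frozenset({"HH"}), frozenset(), S}
first_toss_head = {"HT", "HH"}

print(is_sigma_field(F1), is_sigma_field(F2))
print(sorted(sorted(a) for a in generate_sigma_field([first_toss_head])))
```

The fixed-point loop terminates because the power set of a finite $S$ is finite; it is not a method for infinite sample spaces.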
Borel field
Example: Let $S$ be the real line $\mathbb{R}$ and suppose we are interested in the events $J = \{B_x : x \in \mathbb{R}\}$, where
$$B_x = \{z : z \le x\} = (-\infty, x]$$
Construct the $\sigma$-field generated by the sets $B_x$ by taking complements and countable unions.

Borel field
Definition: Put $\mathcal{B} = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-field containing all intervals}\}$. We say that $\mathcal{B}$ is the $\sigma$-field generated by all intervals and we call the elements of $\mathcal{B}$ Borel sets.
- $\mathcal{B}$ is the smallest $\sigma$-field containing all intervals.
- We started with all intervals of the type $(-\infty, x]$ to generate $\mathcal{B}$. However, we could have started with all intervals of any other type (all open intervals, all closed intervals, ...) and still arrived at $\mathcal{B}$.
- Not only does $\mathcal{B}$ contain all open intervals, it also contains all open sets, as any open set is a countable union of open intervals.
- Countable sets are Borel sets, since each is a countable union of closed intervals of the form $[a, a]$; in particular $\mathbb{N}$ and $\mathbb{Q}$ are Borel sets.

2 The Probability Space: Probability

Probability
Definition: Probability is defined as a set function on $\mathcal{F}$ satisfying the following axioms:
- $P(A) \ge 0$ for every $A \in \mathcal{F}$
- $P(S) = 1$
- $P(\bigcup_i A_i) = \sum_i P(A_i)$ if $\{A_i\}$ is a sequence of mutually exclusive events in $\mathcal{F}$ (called countable additivity)
Hence, probability is a countably additive set function with domain $\mathcal{F}$ and range $[0, 1]$.

Properties of the probability set function
Proposition: Let $P(\cdot)$ be a probability measure on $\mathcal{F}$. Then the following hold.
(1) $P(\bar{A}) = 1 - P(A)$, $A \in \mathcal{F}$
(2) $P(\emptyset) = 0$
(3) If $A_1 \subset A_2$ then $P(A_1) \le P(A_2)$, $A_1, A_2 \in \mathcal{F}$
(4) $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$

Properties of the probability set function
Proposition:
- If $A_1 \subseteq A_2 \subseteq \dots$ with $A_n \in \mathcal{F}$, $n = 1, 2, \dots$, then $\lim_{n \to \infty} P(A_n) = P(\bigcup_i A_i)$.
- If $A_1 \supseteq A_2 \supseteq \dots$ with $A_n \in \mathcal{F}$, $n = 1, 2, \dots$, then $\lim_{n \to \infty} P(A_n) = P(\bigcap_i A_i)$.

Probability set function
Example: Is it possible to have two events $A$ and $B$ with $P(A) = 0.7$, $P(B) = 0.9$ and $P(A \cap B) = 0.5$?

Measure
Definition: A finite measure on a measurable space $(S, \mathcal{F})$ is a map $\mu : \mathcal{F} \to [0, \infty)$ such that if $A_1, A_2, \dots$ is any sequence of pairwise disjoint members of $\mathcal{F}$, then
$$\mu\Big(\bigcup_i A_i\Big) = \sum_i \mu(A_i)$$
This requirement is referred to as countable additivity.

Probability and measure
- A measure space is a triple $(S, \mathcal{F}, \mu)$, where $\mu$ is a measure on the $\sigma$-field $\mathcal{F}$ of subsets of $S$.
- If $\mu(S) = 1$, then $\mu$ is called a probability measure and $(S, \mathcal{F}, \mu)$ is called a probability space.
- Probability is therefore a special case of a finite measure.

Outline: 3 Conditioning, independence, Bayes' Rule and combinatorics
- Conditional Probability
- Independence
- Bayes' Theorem
- Combinatorics

Conditional probability
Example: Suppose that in the case of tossing a fair coin twice we know that the first toss was heads. What does this imply for the sample space, the associated $\sigma$-field and the corresponding probabilities?

Conditional probability
Definition: Suppose that $P(B) > 0$. Then the number
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
is called the conditional probability of $A$ given $B$.

Example: An urn contains 3 white and 2 red balls. Suppose you draw two balls without replacement and consider the events $A$ 'second ball red' and $B$ 'first ball red'. Compute $P(A)$, $P(B)$, $P(A \cap B)$, and $P(A|B)$.

3 Conditioning, independence, Bayes' Rule and combinatorics: Independence

Independence
Definition: Two events $A$ and $B$ are independent if and only if
(5) $P(A \cap B) = P(A)P(B)$, or
(6) $P(A|B) = P(A)$, or
(7) $P(B|A) = P(B)$
where (6) and (7) are defined only if $P(B) > 0$ and $P(A) > 0$ respectively.

Independence
Example: A student takes two courses, calculus and statistics. It is known that the probability that he passes both courses is 0.5, that he passes only calculus is 0.2, that he passes only statistics is 0.1, and that he fails both courses is 0.2. Is the performance in statistics independent of the performance in calculus?

Independence of n events
Definition: Events $A_1, \dots, A_n$ are independent if and only if
$$P(A_i \cap A_j) = P(A_i)P(A_j), \quad 1 \le i < j \le n$$
$$P(A_i \cap A_j \cap A_k) = P(A_i)P(A_j)P(A_k), \quad 1 \le i < j < k \le n$$
$$\dots$$
$$P\Big(\bigcap_{k=1}^{n} A_k\Big) = \prod_{k=1}^{n} P(A_k)$$
Put differently, the events $A_1, \dots, A_n$ are independent if for all $2 \le k \le n$ and for each choice of $k$ events, the probability of their intersection is the product of their probabilities.

Independence of n events
- There are $2^n - n - 1$ equations.
- Independence of $n$ events implies independence of any subset $A_1, \dots, A_k$.
- If only $P(A_i \cap A_j) = P(A_i)P(A_j)$ holds, the events are called pairwise independent.
- However, $P(\bigcap_k A_k) = \prod_k P(A_k)$ does not imply pairwise independence.

3 Conditioning, independence, Bayes' Rule and combinatorics: Bayes' Theorem

The law of total probability
Theorem: Let $A_1, \dots, A_n$ be pairwise disjoint events such that $P(A_i) > 0$ and $\bigcup_i A_i = S$. Let $B \subset S$. Then
$$P(B) = \sum_i P(B|A_i)P(A_i)$$

The law of total probability
Example: Prove the law of total probability.

Bayes' Rule
Theorem: Let $A_1, \dots, A_n$ be pairwise disjoint events such that $P(A_i) > 0$ and $\bigcup_i A_i = S$. Let $B \subset S$ where $P(B) > 0$. Then
$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_j P(B|A_j)P(A_j)}$$
In Bayes' formula $A_1, \dots, A_n$ are often called hypotheses, $P(A_i)$ the a priori probability of $A_i$, and $P(A_i|B)$ the a posteriori probability of $A_i$.

Bayes' Rule
Example: Prove Bayes' Rule.

Example: A product is produced by two machines, 1 and 2. Machine 1 produces 40% of the output, but 5% of its output is substandard. Machine 2 produces 60% of the output, but 10% of its products are deficient. Compute the probability that a deficient product is from machine 1.

3 Conditioning, independence, Bayes' Rule and combinatorics: Combinatorics

Classical probability
Classical probability rests on the assumption that the sample space is finite.
1 Suppose $S = \{s_1, \dots, s_n\}$, that is, $\mathcal{E}$ results in $n$ elementary events.
2 Find $n$ positive real numbers $p_1, \dots, p_n$ such that $\sum_i p_i = 1$. These numbers represent the probabilities of the elementary events.
3 Obtain the field of events $\mathcal{F}$ as the power set of $S$. Then any event $A \in \mathcal{F}$ can be represented as $A = \{s_{i_1}, \dots, s_{i_k}\}$ with $P(A) = p_{i_1} + \dots + p_{i_k}$.
Suppose all elementary events are equally likely, that is $p_1 = \dots = p_n = \frac{1}{n}$. Then, for any event $A \in \mathcal{F}$ define $P(A) = \frac{|A|}{|S|}$, called the classical probability of $A$.

Combinatorics
- Combinatorics is located in the realm of classical probability.
- It evolved from the analysis of games of chance, where the elementary events are typically equally likely.
- Many of these problems can be formulated as drawing $k$ objects from a set of $n$ elements with or without replacement, while being concerned or not concerned with the ordering of the draw.

Ordered sample, with replacement
Proposition: Suppose $k$ objects are drawn with replacement out of a set of $n$ elements. Then the total number of ordered $k$-tuples is $n^k$.
Example: Suppose an urn contains $n$ distinctly numbered balls. You draw $k$ balls with replacement and write down the sequence of numbers.

Ordered sample, without replacement
Proposition: Suppose $k$ objects are drawn without replacement out of a set of $n$ elements. Then the total number of ordered $k$-tuples is
$$n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}$$
denoted by $P(n, k)$.
Example: Suppose an urn contains $n$ distinctly numbered balls. You draw $k$ balls without replacement and write down the sequence of numbers.

Unordered sample, without replacement
Proposition: Suppose $k$ objects are drawn without replacement out of a set of $n$ elements. Then the total number of unordered $k$-tuples is
$$\frac{n!}{(n-k)!\,k!}$$
denoted by $\binom{n}{k}$ or $C(n, k)$.
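
The three counting formulas stated so far can be checked against brute-force enumeration for a small urn. This is an illustrative sketch only: the values $n = 5$, $k = 3$ are arbitrary choices, and `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations, product

n, k = 5, 3              # arbitrary small urn, chosen for illustration
urn = range(1, n + 1)    # distinctly numbered balls 1..n

ordered_with_replacement = n ** k
ordered_without_replacement = math.perm(n, k)    # n! / (n - k)!
unordered_without_replacement = math.comb(n, k)  # n! / ((n - k)! k!)

# Enumerate the samples explicitly and compare the counts with the formulas.
assert ordered_with_replacement == len(list(product(urn, repeat=k)))
assert ordered_without_replacement == len(list(permutations(urn, k)))
assert unordered_without_replacement == len(list(combinations(urn, k)))

print(ordered_with_replacement,
      ordered_without_replacement,
      unordered_without_replacement)
```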
Example: Take $k$ balls in a single draw from the urn containing $n$ balls.

Unordered sample, with replacement
Proposition: Suppose $k$ objects are drawn with replacement out of a set of $n$ elements. Then the total number of unordered $k$-tuples is
$$\frac{(n+k-1)!}{(n-1)!\,k!} = \binom{n+k-1}{k} = C(n+k-1, k)$$
Example: Suppose an urn contains $n$ distinctly numbered balls. You draw $k$ balls with replacement and note the sequence of numbers. You then count how many times the 1st, ..., $n$th ball appeared.

Example: In poker, a player is dealt five cards from a 52-card deck. Compute the number of possible hands.

Example: In a lottery six balls are drawn from an urn containing 49 distinctly numbered balls. Compute the probability that a player correctly guesses the numbers of four balls out of the six balls drawn.

Useful formulas for unordered sample without replacement
Proposition (binomial expansion):
$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$$
Proposition (Pascal's triangle):
$$\binom{n-1}{k-1} + \binom{n-1}{k} = \binom{n}{k}$$

Outline: 4 Random variables and probability distributions
- Random variable
- Distribution and density functions

Why do we care?
- The concept of the probability space $(S, \mathcal{F}, P(\cdot))$ was developed to consistently assign probabilities to events, especially when the sample space $S$ is infinite.
- However, the mathematical manipulation of $P(\cdot)$ is difficult, as its domain is a $\sigma$-field $\mathcal{F}$ of arbitrary sets.
- To go further in our development of a mathematical probability model we need a more flexible framework.
- However, random experiments $\mathcal{E}$ often yield quantifiable outcomes, i.e. outcomes that can be represented by 'numbers'.
- The concept of a random variable exploits this opportunity by assigning numbers to outcomes without altering the probabilistic structure of $(S, \mathcal{F}, P(\cdot))$.

Coin toss revisited
Example: Suppose that in our familiar example we are interested in the number of heads. Consider the function $X(\cdot) : S \to R_X$ that maps all elementary events $s_i$ onto the set $R_X = \{0, 1, 2\}$.
Hence, $X(\cdot)$ is a function that assigns numbers to the outcomes of the random experiment.
However, for $X(\cdot)$ to be a random variable we require more. A random variable has to preserve the event structure of $(S, \mathcal{F}, P(\cdot))$. This means that for every value of $X$ there exists a corresponding subset in $\mathcal{F}$, and that the mapping preserves unions and complements.

Coin toss revisited
Example (continued): We obtain:
$$X^{-1}(0) = \{(TT)\}, \quad X^{-1}(1) = \{(TH), (HT)\}, \quad X^{-1}(2) = \{(HH)\}$$
If $X^{-1}(0) \in \mathcal{F}$, $X^{-1}(1) \in \mathcal{F}$, $X^{-1}(2) \in \mathcal{F}$, $X^{-1}(\{0\} \cup \{1\}) \in \mathcal{F}$, $X^{-1}(\{1\} \cup \{2\}) \in \mathcal{F}$, and $X^{-1}(\{0\} \cup \{2\}) \in \mathcal{F}$, then $X(\cdot)$ is a random variable with respect to $\mathcal{F}$.

Random variable
Definition: A random variable $X$ is a real valued function from $S$ to $\mathbb{R}$ which satisfies the condition that for each Borel set $B \in \mathcal{B}$ on $\mathbb{R}$, the set $X^{-1}(B) = \{s : X(s) \in B, s \in S\}$ is an event in $\mathcal{F}$.

Features of a random variable
- A random variable is a real valued function. It is neither 'random' nor a 'variable'.
- A random variable is always defined relative to some specific $\sigma$-field.
- In deciding whether some function $Y(\cdot) : S \to \mathbb{R}$ is a random variable we proceed from the elements of the Borel field $\mathcal{B}$ to those of the $\sigma$-field $\mathcal{F}$ and not the other way round.

Features of a random variable
There is no need to consider all Borel sets $B \in \mathcal{B}$.
$\mathcal{B}$ is the $\sigma$-field generated by all intervals of the type $(-\infty, x]$. Hence, if $X(\cdot)$ is such that
$$X^{-1}((-\infty, x]) = \{s : X(s) \in (-\infty, x], s \in S\} \in \mathcal{F} \quad \text{for all } (-\infty, x] \in \mathcal{B},$$
then
$$X^{-1}(B) = \{s : X(s) \in B, s \in S\} \in \mathcal{F} \quad \text{for all } B \in \mathcal{B},$$
since all Borel sets can be expressed in terms of the semi-closed intervals $(-\infty, x]$.

Tossing the coin, once more
Example: Consider $X$, the number of heads, and let
$$\mathcal{F} = \{\emptyset, S, \{(HH)\}, \{(TT)\}, \{(HT), (HH), (TH)\}, \{(TH), (HT), (TT)\}, \{(HH), (TT)\}, \{(HT), (TH)\}\}$$
Then
$$X^{-1}((-\infty, x]) = \begin{cases} \emptyset, & x < 0 \\ \{(TT)\}, & 0 \le x < 1 \\ \{(TT), (TH), (HT)\}, & 1 \le x < 2 \\ S, & 2 \le x \end{cases}$$
and $X^{-1}((-\infty, x]) \in \mathcal{F}$ for all $x \in \mathbb{R}$, and thus $X(\cdot)$ is a random variable with respect to $\mathcal{F}$.

Example, continued
Example: Let $Y$ equal 1 if the first toss is a head and 0 otherwise, with $\mathcal{F}$ given as before by
$$\mathcal{F} = \{\emptyset, S, \{(HH)\}, \{(TT)\}, \{(HT), (HH), (TH)\}, \{(TH), (HT), (TT)\}, \{(HH), (TT)\}, \{(HT), (TH)\}\}$$
Then
$$Y^{-1}((-\infty, y]) = \begin{cases} \emptyset, & y < 0 \\ \{(TH), (TT)\}, & 0 \le y < 1 \\ S, & 1 \le y \end{cases}$$
Thus $Y^{-1}((-\infty, y]) \notin \mathcal{F}$ for $0 \le y < 1$, since $\{(TH), (TT)\} \notin \mathcal{F}$. Therefore, $Y(\cdot)$ is not a random variable with respect to $\mathcal{F}$. How can we turn $Y(\cdot)$ into a random variable?

The set function $P_X(\cdot)$
- To assign probabilities to the Borel sets $B \in \mathcal{B}$ we define the set function $P_X(\cdot) : \mathcal{B} \to [0, 1]$ such that
$$P_X(B) = P(X^{-1}(B)) = P(\{s : X(s) \in B, s \in S\}) \quad \text{for all } B \in \mathcal{B}.$$
- Again, there is no need to consider all Borel sets $B \in \mathcal{B}$ when defining $P_X(\cdot)$. It is sufficient to consider semi-closed intervals of the type $(-\infty, x]$, as all Borel sets can be expressed in terms of these intervals.

Tossing a coin twice
Example: $X(\cdot)$, the number of heads, is defined with respect to the same $\sigma$-field $\mathcal{F}$ as before.
Then,
$$P_X((-\infty, x]) = \begin{cases} 0, & x < 0 \\ \frac{1}{4}, & 0 \le x < 1 \\ \frac{3}{4}, & 1 \le x < 2 \\ 1, & 2 \le x \end{cases}$$

Probability space transformed
Using the concept of a random variable, we have transformed the probability space $(S, \mathcal{F}, P(\cdot))$ into the equivalent probability space $(\mathbb{R}, \mathcal{B}, P_X(\cdot))$. In particular:
- $S$, a set of arbitrary elements, has been replaced by the real line $\mathbb{R}$.
- Correspondingly we traded $\mathcal{F}$, the $\sigma$-field of subsets of $S$, for $\mathcal{B}$, the Borel field on the real line.
- $P(\cdot)$, a set function defined on arbitrary sets, has been replaced by $P_X(\cdot)$, a set function defined on semi-closed intervals on the real line.
Hence, we have obtained a more flexible framework to develop a probability model.

4 Random variables and probability distributions: Distribution and density functions

Motivation for distribution function
- Though we defined $P_X(\cdot)$ on semi-closed intervals rather than arbitrary sets, it is still a set function.
- We would like to simplify it further by transforming it into a point function, which can be easily represented by an algebraic formula.
- This seems feasible as the intervals $(-\infty, x]$ differ only in their 'end point' $x$.
- Heuristically, one defines $F(\cdot)$ as a point function by $P_X((-\infty, x]) = F(x) - F(-\infty)$, $\forall x \in \mathbb{R}$, and assigning the value zero to $F(-\infty)$.

Distribution function
Definition: Let $X$ be a random variable defined on $(S, \mathcal{F}, P(\cdot))$. The point function $F(\cdot) : \mathbb{R} \to [0, 1]$ defined by
$$F(x) = P_X((-\infty, x]) = \Pr(X \le x), \quad \forall x \in \mathbb{R}$$
is called the distribution function of $X$ and satisfies the following properties:
1 $F(x)$ is non-decreasing;
2 $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$, $F(\infty) = \lim_{x \to \infty} F(x) = 1$;
3 $F(x)$ is continuous from the right ($\lim_{h \downarrow 0} F(x + h) = F(x)$, $\forall x \in \mathbb{R}$).

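
The distribution function of the number of heads in two tosses can be tabulated directly from the definition $F(x) = \Pr(X \le x)$. The sketch below assumes string labels for the outcomes and uses exact fractions; the names `S`, `P`, `X` and `F` mirror the notation above but the code itself is only an illustration.

```python
from fractions import Fraction

# Equally likely outcomes of tossing a fair coin twice.
S = ["HT", "TH", "HH", "TT"]
P = {s: Fraction(1, 4) for s in S}


def X(s):
    """Number of heads in outcome s."""
    return s.count("H")


def F(x):
    """Distribution function F(x) = P(X <= x), summing over elementary events."""
    return sum(P[s] for s in S if X(s) <= x)


grid = [-1, 0, 0.5, 1, 1.5, 2, 3]
for x in grid:
    print(x, F(x))

# Property 1 on the grid: F is non-decreasing.
values = [F(x) for x in grid]
assert all(a <= b for a, b in zip(values, values[1:]))
```

The step heights reproduce the values $0, \frac{1}{4}, \frac{3}{4}, 1$ of $P_X((-\infty, x])$, and the jumps occur exactly at the points of $R_X$.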
Discrete random variable
Definition: A random variable is called discrete if its range $R_X$ is a countable set.
Example: We roll three dice once and define $X$ to be the sum of the numbers that occur. The set $R_X = \{3, 4, \dots, 18\}$ is finite and thus $X$ is discrete.

Continuous random variable
Definition: A random variable is called (absolutely) continuous if its distribution function $F(x)$ is continuous for all $x \in \mathbb{R}$ and there exists a non-negative function $f(x)$ on the real line such that
$$F(x) = \int_{-\infty}^{x} f(u)\,du, \quad \forall x \in \mathbb{R}$$
Note that continuity of $X$ requires more than a continuous distribution function $F(x)$: $F(x)$ must be derivable by integrating some non-negative function $f(x)$.

Density function
Definition: Let $F(x)$ be the distribution function of the r.v. $X$. The non-negative function $f(x)$ defined by
$$F(x) = \int_{-\infty}^{x} f(u)\,du, \quad \forall x \in \mathbb{R} \quad \text{(continuous)}$$
or
$$F(x) = \sum_{u \le x} f(u), \quad \forall x \in \mathbb{R} \quad \text{(discrete)}$$
is said to be the (probability) density function (pdf) of $X$.

Uniform distribution
Example: Suppose $X$ takes values in the interval $[a, b]$ and all values of $X$ are equally likely. In this case, we say that $X$ is uniformly distributed with distribution function
$$F(x) = \begin{cases} 0, & x < a \\ \frac{x-a}{b-a}, & a \le x < b \\ 1, & x \ge b \end{cases}$$
What does the corresponding density function look like?

Properties of the density function
- $f(x) \ge 0$, $\forall x \in \mathbb{R}$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- $\Pr(a < X < b) = \int_a^b f(x)\,dx$
- $f(x) = \frac{d}{dx} F(x)$ at every point where $f(x)$ is continuous

Example: Let $f_X(x)$ be given by
$$f_X(x) = \begin{cases} ax + b, & x \in [-2b, 2b] \\ 0, & \text{otherwise} \end{cases}$$
For which $a$, $b$ is $f_X(x)$ a probability density function?

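
Candidate values of $a$ and $b$ in the last example can be screened numerically against the first two density properties (non-negativity and unit integral). This is a rough sketch, not a derivation: the function names, the midpoint rule and the tolerance are all assumptions made for illustration.

```python
def f(x, a, b):
    """Candidate density: a*x + b on [-2b, 2b], zero elsewhere."""
    return a * x + b if -2 * b <= x <= 2 * b else 0.0


def is_density(a, b, steps=10_000, tol=1e-9):
    """Check f >= 0 and that f integrates to one over [-2b, 2b]."""
    if b <= 0:
        return False  # the interval [-2b, 2b] would be empty or a single point
    lo, hi = -2 * b, 2 * b
    h = (hi - lo) / steps
    # Midpoint rule; it is exact for linear integrands up to rounding error.
    integral = h * sum(f(lo + (j + 0.5) * h, a, b) for j in range(steps))
    # A linear function attains its minimum at an endpoint of the interval.
    nonnegative = min(f(lo, a, b), f(hi, a, b)) >= 0
    return nonnegative and abs(integral - 1.0) < tol


for a, b in [(0.25, 0.5), (1.0, 0.5), (0.0, 1.0)]:
    print(a, b, is_density(a, b))
```

A check like this only confirms or rejects particular parameter values; characterising all admissible $(a, b)$ still requires the analytical argument asked for in the example.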
Outline: 5 Moments of random variables
- Mean
- Variance
- Moment generating function

Mean
Definition: The mean or expected value of a random variable $X$ is given by
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
if $X$ is continuous, and
$$E(X) = \sum_i x_i p_i$$
if $X$ is discrete.

Example: Suppose you are in a casino and are offered the following game. An urn contains 100 balls, numbered from 0 to 99. Twelve balls will be drawn randomly from the urn. Before they are selected you are asked to choose any number $N$ from 0 to 99. If the ball numbered $N$ is among the twelve balls drawn from the urn you will be paid $N$ dollars. Would you be willing to play if you have to pay 15 dollars to enter the game? What is the optimal strategy?

Example: A fair coin is tossed until two successive heads occur. Let $X$ be the number of tosses required and compute $E(X)$.

Properties of the expectation
1 $E(c) = c$ if $c$ is a constant
2 $E(aX_1 + bX_2) = aE(X_1) + bE(X_2)$ for any two r.v.'s $X_1$ and $X_2$ whose means exist, where $a$, $b$ are real constants
3 $P(X \ge \lambda E(X)) \le \frac{1}{\lambda}$ for a positive r.v. $X$ and $\lambda > 0$. This result is known as Markov's inequality.
Properties (1) and (2) define $E(\cdot)$ as a linear operator. Hence $E(aX + b) = aE(X) + b$.

Existence of the expectation
For the expectation to exist we require that
$$\int_{-\infty}^{\infty} |x| f_X(x)\,dx < \infty$$
if the random variable is continuous, or
$$\sum_i |x_i| f(x_i) < \infty$$
if the random variable is discrete.

Example: Show that in the case of the Cauchy distribution, given by
$$f_X(x) = \frac{1}{\pi(1 + x^2)}, \quad \forall x \in \mathbb{R}$$
the mean does not exist.

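
The existence condition can be explored numerically for the Cauchy density by truncating the integral of $|x| f_X(x)$ to $[-T, T]$ and letting $T$ grow. The sketch below compares a midpoint-rule approximation with the closed form $\frac{1}{\pi}\ln(1 + T^2)$ of the truncated integral; the function names, truncation points and step count are illustrative assumptions, and the growth it displays is a numerical hint rather than a proof.

```python
import math


def cauchy_pdf(x):
    """Cauchy density f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def truncated_abs_moment(T, steps=200_000):
    """Midpoint approximation of the integral of |x| f(x) over [-T, T]."""
    h = T / steps
    # By symmetry the integral equals twice the integral over [0, T].
    return 2.0 * h * sum((j + 0.5) * h * cauchy_pdf((j + 0.5) * h)
                         for j in range(steps))


def closed_form(T):
    """Exact value of the truncated integral: ln(1 + T^2) / pi."""
    return math.log(1.0 + T * T) / math.pi


for T in (10, 100, 1000):
    print(T, truncated_abs_moment(T), closed_form(T))
```

The truncated integral keeps growing like $\frac{2}{\pi}\ln T$ instead of settling down, which is the behaviour the existence condition rules out.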
Expectation of a function of a random variable
Definition: If $X$ is a random variable and $g(\cdot) : \mathbb{R} \to \mathbb{R}$ is a function, then
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$$
if $X$ is continuous, and
$$E(g(X)) = \sum_i g(x_i) p_i$$
if $X$ is discrete.

5 Moments of random variables: Variance

Variance
Definition: The variance of a random variable $X$ is defined by
$$Var(X) = E[(X - E(X))^2]$$
which equals
$$\int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx \quad \text{or} \quad \sum_i (x_i - E(X))^2 p_i$$
when $X$ is continuous and discrete respectively.
- Often, it is more convenient to work with the equality $Var(X) = E(X^2) - (E(X))^2$.
- The square root of the variance is called the standard deviation.

Uniform Distribution
Example: Let $X$ be uniformly distributed on the interval $[0, b]$. Compute its mean and variance.

Properties of the variance
1 $Var(c) = 0$ for any constant $c$.
2 $Var(aX) = a^2 Var(X)$ for a constant $a$.
3 $P(|X - E(X)| \ge k) \le \frac{Var(X)}{k^2}$, $k > 0$, which is known as Chebyshev's inequality. It provides an upper bound on dispersion as defined by $|X - E(X)| \ge k$.

Example: A random variable $X$ has mean $E(X) = 8$ and variance $Var(X) = 4$. Provide an upper bound for the probability $P(X \le 4 \text{ or } X \ge 20)$.

5 Moments of random variables: Moment generating function

Higher order moments
- If it exists, the $r$th raw moment is defined by
$$\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f_X(x)\,dx, \quad r = 0, 1, 2, \dots$$
- The mean is the first raw moment $\mu'_1 = E(X) = \mu$.
If it exists, the r-th central moment is defined by
\[ \mu_r = E\big[(X - \mu)^r\big] = \int_{-\infty}^{\infty} (x - \mu)^r f_X(x) \, dx, \quad r = 0, 1, 2, \ldots \]
The variance is the second central moment, $\mu_2 = E[(X - \mu)^2] = \sigma^2$.

Skewness

The coefficient of skewness is given by $S = \frac{\mu_3}{\mu_2^{3/2}}$, where $\mu_i$ is the i-th central moment.
If S < 0 we say the distribution is negatively skewed or skewed to the left.
If S > 0 the distribution is positively skewed or skewed to the right.
If the distribution is symmetric, then all odd central moments that exist are equal to zero. Hence S = 0.

Kurtosis

The coefficient of kurtosis is given by $K = \frac{\mu_4}{\mu_2^2} - 3$, where $\mu_i$ is the i-th central moment.
Distributions with zero kurtosis are called mesokurtic. The normal distribution is mesokurtic.
If K > 0 the distribution is called leptokurtic. Leptokurtic distributions have a more pronounced peak.
If K < 0 the distribution is called platykurtic and has a relatively flat peak.

Moment generating function

Definition. Let X be a random variable with pdf $f_X(\cdot)$. If there exists a real constant h > 0 such that $E(e^{tX})$ exists for all $|t| < h$, then $m_X(t) = E(e^{tX})$ is called the moment generating function (mgf).

If X is discrete the mgf is given by $m_X(t) = \sum_i e^{t x_i} p_i$.
In case of continuous X, the mgf is given by $m_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx$.
The mgf need not exist!

Properties of the mgf

If the moment generating function exists, then all raw moments $\mu'_r = E(X^r)$ of a random variable X can be derived as
\[ \mu'_r = \left. \frac{d^r m_X(t)}{dt^r} \right|_{t=0} = E(X^r), \]
which follows from the series expansion
\[ m_X(t) = E\Big(1 + Xt + \frac{1}{2!}(Xt)^2 + \ldots\Big) = \sum_{i=0}^{\infty} \frac{1}{i!} \, \mu'_i \, t^i. \]
Let X, Y be random variables with associated densities $f_X(x)$, $f_Y(y)$. If the moment generating functions $m_X(t)$, $m_Y(t)$ exist and $m_X(t) = m_Y(t)$ $\forall t \in (-h, h)$, $h > 0$, then $F_X(\cdot) = F_Y(\cdot)$.
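The derivative property of the mgf can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original slides: it takes a fair six-sided die as the discrete random variable, evaluates $m_X(t) = \sum_i e^{t x_i} p_i$, and approximates the first two derivatives at t = 0 by central finite differences. The step size h and all function names are hypothetical choices.

```python
import math

# Fair six-sided die: outcomes 1..6, each with probability 1/6 (illustrative choice).
OUTCOMES = [1, 2, 3, 4, 5, 6]
PROBS = [1.0 / 6.0] * 6


def mgf(t: float) -> float:
    """Moment generating function of the discrete r.v.: sum of exp(t * x_i) * p_i."""
    return sum(math.exp(t * x) * p for x, p in zip(OUTCOMES, PROBS))


def raw_moment(r: int) -> float:
    """r-th raw moment computed directly from the definition E(X^r)."""
    return sum(x ** r * p for x, p in zip(OUTCOMES, PROBS))


def mgf_first_derivative_at_zero(h: float = 1e-4) -> float:
    """Central finite-difference approximation of m'(0)."""
    return (mgf(h) - mgf(-h)) / (2.0 * h)


def mgf_second_derivative_at_zero(h: float = 1e-4) -> float:
    """Central finite-difference approximation of m''(0)."""
    return (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)


m1 = mgf_first_derivative_at_zero()   # approximates E(X)
m2 = mgf_second_derivative_at_zero()  # approximates E(X^2)
variance = m2 - m1 ** 2               # Var(X) = E(X^2) - [E(X)]^2
print(m1, raw_moment(1))
print(m2, raw_moment(2))
print(variance)
```

The finite-difference values agree with the directly computed raw moments up to approximation error, which is what the derivative property predicts.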
Example

The probability density function of an exponentially distributed random variable is given by
\[ f_X(x) = \theta e^{-\theta x}, \quad x \ge 0. \]
Find Var(X), using the mgf.

Characteristic function

The characteristic function is an alternative to the moment generating function. It is given by
\[ \psi_X(t) = E(e^{itX}), \quad i = \sqrt{-1}. \]
The advantage of the characteristic function is that it always exists, since $|e^{itX}| = |\cos tX + i \sin tX| = 1$, such that convergence is guaranteed.
To obtain moments from the characteristic function compute $i^r \mu'_r = \psi_X^{(r)}(0)$, where $\mu'_r = E(X^r)$ is the r-th raw moment, provided it exists.

Outline

6 Some common univariate distributions
   Discrete distributions
   Continuous distributions

Parametric families

We can now define our probability model in the form of a parametric family of density functions, denoted by $\Phi = \{f(x; \theta), \theta \in \Theta\}$.
Every member of $\Phi$ is indexed by a parameter $\theta$, which belongs to the parameter space $\Theta$.
Choosing a parametric family to model a real world phenomenon assumes that the data have been generated by one of the densities in $\Phi$.
The uncertainty relating to the outcome of a particular trial of the random experiment E has been transformed into the uncertainty regarding the value of the parameter $\theta$.

Bernoulli distribution

A random experiment with outcomes "success" (x = 1) and "failure" (x = 0) and associated probabilities p and q = 1 − p is called a Bernoulli experiment.
A Bernoulli distributed rv has probability function
\[ f_X(x; p) = \begin{cases} p^x (1 - p)^{1 - x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases} \]
The distribution function follows immediately.
A Bernoulli random variable has mean $E(X) = p$ and variance $Var(X) = p(1 - p)$.

Example. Tossing a coin once.
X: "head occurs".

Binomial distribution

A binomial r.v. counts the number of successes in a sequence of n independent Bernoulli experiments.
The probability function of a binomial r.v. is given by
\[ b_X(x; n, p) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x}, & x = 0, 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]
A binomially distributed rv has mean $E(X) = np$ and variance $Var(X) = np(1 - p)$.

Example

A fair die is rolled three times. Compute the probability of at least two sixes.

Example

Obtain the expectation of a binomial random variable in three different ways.

Example

How many times do we have to throw a die for the probability of obtaining no 6 to be lower than 5%?

Poisson distribution

The Poisson distribution is often used to model the number of events occurring within a given time interval.
A Poisson rv has probability function
\[ p_X(x; \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \]
Mean and variance of the Poisson distribution are given by $E(X) = Var(X) = \lambda$. Hence, $\lambda$ can be understood as the average number of events, that is the arrival rate of events, over the unit interval.

Example

At a manufacturing plant accidents have occurred once every two months on average. Assuming accidents occur independently, what is the expected number of accidents per year? What is the probability that in a given month there will be no accidents?

Example

Suppose $X \sim B(n, p)$ and let $n \to \infty$, $p \to 0$, such that $np = \lambda$. Show that
\[ \lim_{n \to \infty} b\Big(k; n, \frac{\lambda}{n}\Big) = e^{-\lambda} \frac{\lambda^k}{k!}. \]
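The Poisson limit of the binomial distribution can be made plausible numerically before proving it. The following minimal sketch is an illustration, not a proof and not part of the original slides: it fixes $\lambda = 2$ (an arbitrary assumption), sets $p = \lambda / n$, and reports the largest pointwise gap between the two probability functions over k = 0, ..., 10 for increasing n. All function names are hypothetical.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """b(k; n, p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """p(k; lambda) = exp(-lambda) * lambda^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_gap(n: int, lam: float, k_max: int = 10) -> float:
    """Largest absolute difference between the two pmfs for k = 0..k_max, with p = lambda / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


lam = 2.0  # illustrative arrival rate
for n in (10, 100, 10_000):
    print(n, max_gap(n, lam))
```

The gap shrinks as n grows while np is held fixed, which is the behaviour the limit statement describes.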
Geometric distribution

In a sequence of independent Bernoulli experiments, the number of trials before achieving "success" for the first time is geometrically distributed.
The geometric distribution has probability function
\[ g_X(x; p) = \begin{cases} (1 - p)^x \, p, & x \in \mathbb{N}_0 \\ 0, & \text{otherwise} \end{cases} \]
The geometric distribution has mean $E(X) = \frac{1 - p}{p}$ and variance $Var(X) = \frac{1 - p}{p^2}$.
The geometric distribution is also called a "discrete waiting time" distribution.

Example

Show that the geometric distribution is "memoryless", that is $P(X > a + b \mid X > a) = P(X > b)$.

Hypergeometric distribution

Suppose that without replacement we draw a sample of size n out of a population of size N, K of which share a certain property. Then, a rv X counting the number of elements sharing this property in the sample is hypergeometrically distributed.
The hypergeometric distribution has probability function
\[ h_X(x; n, N, K) = \begin{cases} \dfrac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]
The hypergeometric distribution has mean $E(X) = n \frac{K}{N}$ and variance $Var(X) = n \, \frac{K}{N} \, \frac{N - K}{N} \, \frac{N - n}{N - 1}$.

Example

Suppose you are dealt five cards from a standard 52 card Poker deck. Compute the probability of receiving at least two aces.

Example

Suppose $X \sim H(x; n, N, K)$ and let $N \to \infty$, $K \to \infty$ such that $\frac{K}{N} = p$.
Show that $\lim h_X(x; n, N, K) = b_X(x; n, p)$.

Uniform distribution

Suppose X takes values in the interval [a, b] and all values of X are equally likely, that is, the density is constant on [a, b]. In this case, we say that X is uniformly distributed with density function
\[ f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases} \]
A uniformly distributed rv has mean $E(X) = \frac{a + b}{2}$ and variance $Var(X) = \frac{(b - a)^2}{12}$.

Exponential distribution

In a Poisson process with parameter $\lambda$ the waiting time until the event occurs for the first time follows an exponential distribution.
The density function of the exponential distribution is given by
\[ f_X(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
The exponential distribution has mean $E(X) = \frac{1}{\lambda}$ and variance $Var(X) = \frac{1}{\lambda^2}$.
The cdf is given by $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$.

Properties of the exponential distribution

Suppose the number of events per unit interval is $X \sim P(\lambda)$ (Poisson). Then
\[ P(\text{no event occurs over } (0, t)) = e^{-\lambda t}. \]
On the other hand, let $Y \sim Exp(\lambda)$ be the waiting time T until the first event. Then
\[ P(\text{event occurs at } T > t) = 1 - F_Y(t) = e^{-\lambda t}. \]
The exponential distribution also exhibits the memoryless property.

Example

A study on the lifespan of lightbulbs measures the total number of kilowatt hours consumed until failure. For a 100 Watt lightbulb, which has already been lit for 5 tu (time units), Y represents total kilowatt hours consumed. One tu equals 1,000 hours. A second random variable X records the remaining lifespan in tu and is exponentially distributed with parameter $\lambda = 0.1$.
a. Compute the distribution of Y.
b. Compute the probability that the lightbulb consumes more than 2,500 kilowatt hours.
c.
Suppose that after 20 tu the lightbulb still works. Compute the probability that it works another 20 tu.

Normal distribution

The normal distribution is probably the single most important distribution in econometrics.
This is due mainly to results in asymptotic theory: under fairly general conditions the distributions of suitably standardised sums and averages converge to the normal distribution.
The normal distribution has pdf
\[ f_X(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi} \, \sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \]
The normal distribution has mean $E(X) = \mu$ and variance $Var(X) = \sigma^2$.
There is no closed form solution for the cdf.

Standard normal distribution

A normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$ is called standard normal distribution.
The corresponding pdf reads
\[ \phi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} x^2}, \]
while the cdf is given by
\[ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} t^2} \, dt. \]
Let $X \sim N(\mu, \sigma^2)$ with corresponding pdf and cdf. Then $\forall x \in \mathbb{R}$ we have $f_X(x) = \frac{1}{\sigma} \phi\big(\frac{x - \mu}{\sigma}\big)$ and $F_X(x) = \Phi\big(\frac{x - \mu}{\sigma}\big)$.

σ-rules

Let $X \sim N(\mu, \sigma^2)$. Then, the "σ-rules" are given by
\[ P(|X - \mu| \le \sigma) = 0.683 \]
\[ P(|X - \mu| \le 2\sigma) = 0.954 \]
\[ P(|X - \mu| \le 3\sigma) = 0.997 \]
Hence, for practical purposes a normal rv does not deviate more than three standard deviations from its mean.
If a rv deviates by more than three standard deviations from the mean it is unlikely to be normal.

Examples

Example. Let $X \sim N(0, \sigma^2)$. Find $\sigma^2$ given that $P(|X| < 0.5) = 0.3$.

Example. Suppose the velocity of cars on a certain road is normally distributed. However, we only know that 21.2% of cars travel faster than 90 km/h, and that the share of cars driving more slowly than 60 km/h is ten times smaller. Compute $\mu$ and $\sigma$.
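Although the normal cdf has no closed form, it can be evaluated through the error function, using the identity $\Phi(x) = \frac{1}{2}\big(1 + \mathrm{erf}(x / \sqrt{2})\big)$. The sketch below is an illustration, not part of the original slides: it uses this identity together with the standardisation $F_X(x) = \Phi\big(\frac{x - \mu}{\sigma}\big)$ to evaluate $P(|X - \mu| \le k\sigma)$ for k = 1, 2, 3. The function names are hypothetical.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x), expressed through the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) for X ~ N(mu, sigma^2), via standardisation (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


def prob_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(|X - mu| <= k * sigma) for X ~ N(mu, sigma^2)."""
    upper = normal_cdf(mu + k * sigma, mu, sigma)
    lower = normal_cdf(mu - k * sigma, mu, sigma)
    return upper - lower


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```

The probabilities do not depend on $\mu$ or $\sigma$, since standardising removes both parameters; the same helper functions can be used to check answers to the two examples above.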