Conditional Probability. Independent Events
Department of Mathematics
Dr. Oleksiy Us
Mathematics
for IET and MET Students
MATH 401
Lecture Nr. 3
Conditional Probability. Independent Events
Conditional Probability
One of the most important concepts in probability theory is motivated by the following
observation.
Example 1 A container has 5 defective and 45 non-defective parts. An experiment
consists of taking one item, and then another one out of the container. The order is
assumed to be important. Find the probability that the first item is non-defective and
the second one is defective.
Solution. Let E denote the event whose probability is to be found. By the
counting rules, the sample space S consists of
N(S) = 50 · 49 outcomes.
Similarly, the number of outcomes in E equals
N(E) = 45 · 5.
The probability of E is then, naturally,
\[ P(E) = \frac{N(E)}{N(S)} = \frac{5}{49} \cdot \frac{45}{50}. \]
Let us look more closely at how the answer is constructed. Firstly, the event
E = D2 ∩ E1 ,
where E1 is the event that the first item is not defective, and D2 is the event that the
second item is defective. Secondly, it is clear that
\[ P(E_1) = \frac{45}{50}. \]
Is it possible to express the other fraction in the answer as the probability of an event
related to E1 and D2 ?
With a little bit of consideration, the answer is “Yes!”. Namely,
\[ \frac{5}{49} = P(D_2 \text{ if it is known that } E_1 \text{ occurred}). \]
The above probability is called the conditional probability of D2 given that E1 has
occurred and is denoted by
P (D2 |E1 ).
We also observe that
P (E) = P (D2 ∩ E1 ) = P (E1 ) · P (D2|E1 ).
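The counting argument of Example 1 can be replayed by brute force. The following is only an illustrative sketch: the names `parts`, `outcomes` and `favourable` are introduced here and are not part of the lecture. It enumerates all ordered draws of two distinct parts and compares the resulting probability with the product P(E1) · P(D2|E1).

```python
from fractions import Fraction
from itertools import permutations

# 45 non-defective ("N") and 5 defective ("D") parts, as in Example 1.
parts = ["N"] * 45 + ["D"] * 5

# All ordered draws of two distinct parts; indices keep equal labels apart.
outcomes = list(permutations(range(len(parts)), 2))

# E: the first part is non-defective and the second one is defective.
favourable = [(i, j) for i, j in outcomes if parts[i] == "N" and parts[j] == "D"]

p_E = Fraction(len(favourable), len(outcomes))
p_E1 = Fraction(45, 50)           # P(E1)
p_D2_given_E1 = Fraction(5, 49)   # P(D2|E1)

print(len(outcomes), len(favourable))
print(p_E, p_E == p_E1 * p_D2_given_E1)
```

Exact fractions are used instead of floating-point numbers so that the comparison in the last line is not affected by rounding.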
Before stating a formal definition we shall look into another example.
Example 2 A sample of 400 parts was taken for inspection by the quality control
department. These parts were then classified as to whether or not they have surface
flaws and whether or not they are (functionally) defective. These results were organized
in a table as follows.
                       surface flaws
                    Yes     No    Total
defective    Yes     10     18      28
             No      30    342     372
Total                40    360     400
Find the probability that a randomly selected part is defective if it has a surface flaw.
What is the probability that a part is defective, if it has no surface flaw?
Solution. There are 40 parts with surface flaws in the sample, 10 of them proved to
be defective, so the requested probability equals 10/40 = 0.25. To answer the second
question, we observe that 18 items out of 360 without surface flaws turned out to be
defective. So the probability equals 18/360 = 0.05.
If we denote by F the event that a randomly selected part has surface flaws, and by
D that a part is defective, then the above probabilities can be expressed as
P (D|F ) = 0.25, and P (D|F ′) = 0.05.
Hence, we have just computed the conditional probabilities of the event D given that F
and F ′ occurred, respectively.
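The two relative frequencies of Example 2 can be read off the table mechanically. The sketch below assumes the counts are stored in a dictionary keyed by (defective, has surface flaw); the dictionary `counts` and the helper `p_defective_given` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Counts from the table in Example 2, keyed by (defective, has_surface_flaw).
counts = {
    (True, True): 10,
    (True, False): 18,
    (False, True): 30,
    (False, False): 342,
}

def p_defective_given(flawed):
    """Share of defective parts among the parts with the given flaw status."""
    column_total = counts[(True, flawed)] + counts[(False, flawed)]
    return Fraction(counts[(True, flawed)], column_total)

p_D_given_F = p_defective_given(True)       # P(D|F)
p_D_given_not_F = p_defective_given(False)  # P(D|F')

print(float(p_D_given_F), float(p_D_given_not_F))
```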
We shall now slightly change the data in the above example. Suppose that the size of
the sample is not known and only the following probabilities are given:
P (D ∩ F ) = 0.025, P (D ∩ F ′ ) = 0.045, P (D ′ ∩ F ) = 0.075, P (D ′ ∩ F ′ ) = 0.855.
                       surface flaws
                     F       F′     Total
defective    D     0.025   0.045      ?
             D′    0.075   0.855      ?
Total                ?       ?        ?
Is it possible to find, for example, P (D|F )?
The key observation to make here is that by assuming that the event F occurred we
automatically reduce the sample space: the event F upon which we condition the event
D becomes the new sample space.
So if the event F occurs, then in order for D to occur the actual outcome must be
a point in both D and F , i.e. in D ∩ F . Since the event F becomes our new (reduced)
sample space, the conditional probability of D given F equals the probability of D ∩ F
relative to the probability of F .
The rigorous definition is formulated below.
Definition 3 Let E and F be events in a sample space S and P (F ) > 0. The conditional
probability P (E|F ) is defined by
\[ P(E \mid F) = \frac{P(E \cap F)}{P(F)}. \]
In view of Example 1, the formula is not surprising at all!
We can now return to the last example. Using the axioms of probability we get
\[ P(F) = P(F \cap S) = P(F \cap (D \cup D')) = P((F \cap D) \cup (F \cap D')) = P(F \cap D) + P(F \cap D') = 0.025 + 0.075 = 0.1. \]
So by definition we get
\[ P(D \mid F) = \frac{P(D \cap F)}{P(F)} = \frac{0.025}{0.1} = 0.25, \]
which is consistent with Example 2.
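The same computation can be carried out for all the unknown entries of the probability table at once. This is a minimal sketch, assuming the four joint probabilities are stored in a dictionary named `joint` (a name introduced here); it fills in the marginal totals and applies Definition 3.

```python
from fractions import Fraction

# Joint probabilities given in the text, keyed by (row event, column event).
joint = {
    ("D", "F"): Fraction("0.025"),
    ("D", "F'"): Fraction("0.045"),
    ("D'", "F"): Fraction("0.075"),
    ("D'", "F'"): Fraction("0.855"),
}

# Marginal totals: the entries marked "?" in the table.
p_D = joint[("D", "F")] + joint[("D", "F'")]
p_F = joint[("D", "F")] + joint[("D'", "F")]
total = sum(joint.values())

# Definition 3: P(D|F) = P(D ∩ F) / P(F), which requires P(F) > 0.
p_D_given_F = joint[("D", "F")] / p_F
p_D_given_not_F = joint[("D", "F'")] / (1 - p_F)

print(float(p_D), float(p_F), float(total))
print(float(p_D_given_F), float(p_D_given_not_F))
```

Both conditional probabilities agree with the values 0.25 and 0.05 obtained from the raw counts in Example 2.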
Total Probability Rule. Bayes Formula
The definition of the conditional probability implies that
P (E ∩ F ) = P (E|F ) · P (F ),
for all events E and F in a sample space S with P (F ) > 0.
Example 4 A solution to Example 1 can now be obtained from the above formula:
P (E1 ∩ D2 ) = P (D2 |E1 ) · P (E1 ).
Needless to say, it is consistent with the previous solution.
The implications of this rule are, however, much more significant than the last example
suggests.
It appears that conditional probabilities allow one to compute probabilities of complex
events by “conditioning” those to certain events and their complements.
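Let E and F be events in a sample space S with 0 < P (F ) < 1. Since E ∩ F and E ∩ F ′
are mutually exclusive and their union is E, we get
P (E) = P (E ∩ F ) + P (E ∩ F ′ ) = P (E|F )P (F ) + P (E|F ′)P (F ′ ).
This identity is the total probability rule. The probability of an event E can therefore
be regarded as the weighted mean of the respective conditional probabilities, with
weights P (F ) and P (F ′ ).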
Example 5 A student will be asked one of n questions in an oral exam. He can only
answer k of them. These questions are further referred to as “easy”. The student wonders
whether the probability of getting an “easy” question (one of the k) is higher if he
chooses to be the first person to receive a question, the second, etc. (We assume that
questions are not returned to the set after being asked once.)
Solution. If the student goes first then the probability of getting an easy question is
clearly k/n. Let us assume now that he lets his friend go first. The probability of
drawing an easy question is now dependent on what question the friend received.
Let E stand for the event of pulling an easy question. Let F be the event that the
friend got a difficult one. If the event F occurred then our student would get an easy
question with the probability k/(n − 1). If, however, the friend got an easy question
then the respective conditional probability for this student would be (k − 1)/(n − 1). It
is also clear that P (F ) = (n − k)/n.
Substituting the above values into the formula for P (E) we get
\[ P(E) = \frac{k}{n-1} \cdot \frac{n-k}{n} + \frac{k-1}{n-1} \cdot \frac{k}{n} = \frac{k}{n}. \]
And so this probability is the same whether the student goes first or second. Repeating
the argument shows that it does not depend on his position in the queue at all.
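The algebra in Example 5 can be checked numerically for many values of n and k. The helper `p_easy_second` below is a hypothetical name; it is a sketch that evaluates the total probability formula with exact fractions for a student who goes second.

```python
from fractions import Fraction

def p_easy_second(n, k):
    """P(easy question) for the student who draws second, by conditioning on
    whether the friend drew a difficult question (event F) or an easy one."""
    p_F = Fraction(n - k, n)                 # friend got a difficult question
    p_E_given_F = Fraction(k, n - 1)         # all k easy questions remain
    p_E_given_not_F = Fraction(k - 1, n - 1) # one easy question is gone
    return p_E_given_F * p_F + p_E_given_not_F * (1 - p_F)

# Compare with k/n, the probability for the student who draws first.
checks = [
    p_easy_second(n, k) == Fraction(k, n)
    for n in range(2, 30)
    for k in range(1, n)
]
print(all(checks))
```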
Example 6 In semiconductor manufacturing the probability is 0.1 that a chip subject
to high levels of contamination causes a product failure. For a chip that is not subject
to high levels of contamination, the respective probability is 0.005. Find the probability
that a product fails if 20% of all chips are subject to high levels of contamination.
Solution. Let E stand for the event that the product fails. We denote by H the event
that a chip used in manufacturing process was subject to high levels of contamination.
By the total probability rule we have
P (E) = P (E|H)P (H) + P (E|H ′)P (H ′) = 0.1 · 0.2 + 0.005 · 0.8 = 0.024.
We can refine this example by classifying contamination levels more precisely as high,
medium or low, with the proportions being 20%, 30% and 50%, respectively. We also
assume that the respective conditional probabilities of a product failure are 0.1, 0.01
and 0.001.
Let H, M and L stand for the events of a chip being highly, moderately or slightly
contaminated, respectively. The events are clearly mutually exclusive and H∪M ∪L = S.
We get therefore
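\[ \begin{aligned} P(E) &= P(E \cap H) + P(E \cap M) + P(E \cap L) \\ &= P(E \mid H) P(H) + P(E \mid M) P(M) + P(E \mid L) P(L) \\ &= 0.1 \cdot 0.2 + 0.01 \cdot 0.3 + 0.001 \cdot 0.5 = 0.0235. \end{aligned} \]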
We have just extended the total probability rule to the case of three mutually exclusive
and exhaustive events (i.e. events whose union equals the whole sample space).
The approach we used in Example 6 may be extended to a more general setting. Suppose
F1 , . . . , Fn are mutually exclusive events such that
\[ \bigcup_{k=1}^{n} F_k = S. \]
Then for any event E we have
\[ P(E) = \sum_{k=1}^{n} P(E \mid F_k) P(F_k). \]
The last formula is called the general total probability rule.
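The general total probability rule translates directly into a short helper. The function name `total_probability` is an assumption made for this sketch; the inputs are the conditional probabilities P(E|Fk) and the probabilities P(Fk) from both versions of Example 6.

```python
from fractions import Fraction

def total_probability(conditionals, priors):
    """Return the sum of P(E|F_k) * P(F_k) over a partition F_1, ..., F_n."""
    if sum(priors) != 1:
        raise ValueError("the probabilities P(F_k) must sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))

# Example 6 with two contamination classes (high, not high).
two_level = total_probability(
    [Fraction("0.1"), Fraction("0.005")],
    [Fraction("0.2"), Fraction("0.8")],
)

# Example 6 with three contamination classes (high, medium, low).
three_level = total_probability(
    [Fraction("0.1"), Fraction("0.01"), Fraction("0.001")],
    [Fraction("0.2"), Fraction("0.3"), Fraction("0.5")],
)

print(float(two_level), float(three_level))
```

The check on the sum of the priors guards against a collection of events that is not exhaustive, in which case the formula does not apply.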
Example 7 We now return to the three-level version of Example 6. Suppose that a
product in question was examined and proved to be a failure. How likely is it that the
chip was exposed to high contamination levels?
Solution. We are requested to find the conditional probability P (H|E). By definition,
the multiplication rule and the total probability rule we have
\[ P(H \mid E) = \frac{P(H \cap E)}{P(E)} = \frac{P(E \mid H) P(H)}{P(E)} = \frac{0.02}{0.0235} \approx 0.8511. \]
Hence, on the basis of an extra piece of information (the product failure), the initial
probability of high contamination levels P (H) = 0.2 was re-evaluated: P (H|E) ≈ 0.85.
The chances are now about 85% that the contamination levels were high when the chip
was manufactured.
The trick can of course be generalized to any finite collection of mutually exclusive
exhaustive events. Using the same notation as in the general total probability rule, for
each k = 1, . . . , n we have
\[ P(F_k \mid E) = \frac{P(E \mid F_k) P(F_k)}{P(E)} = \frac{P(E \mid F_k) P(F_k)}{\sum_{j=1}^{n} P(E \mid F_j) P(F_j)}. \]
The last expression is known as the Bayes Formula.
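The Bayes formula can be sketched as a function that returns all the probabilities P(Fk|E) at once. The name `posterior` is chosen for this illustration only; the data are those of Example 7 (the three-level version of Example 6).

```python
from fractions import Fraction

def posterior(likelihoods, priors):
    """Bayes formula: return the list of P(F_k|E) for a partition F_1, ..., F_n,
    given likelihoods P(E|F_k) and priors P(F_k)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(E ∩ F_k)
    evidence = sum(joint)                                 # P(E), total probability rule
    return [j / evidence for j in joint]

# Contamination classes in the order high, medium, low.
post = posterior(
    [Fraction("0.1"), Fraction("0.01"), Fraction("0.001")],
    [Fraction("0.2"), Fraction("0.3"), Fraction("0.5")],
)

for name, value in zip(["H", "M", "L"], post):
    print(name, value, round(float(value), 4))
```

The first entry reproduces P(H|E) ≈ 0.8511, and the three values add up to 1, as they must for a partition of the sample space.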
The following example illustrates the total probability rule and the Bayes formula once
again.
Example 8 An insurance company classifies all customers as accident prone or not
accident prone. On the basis of previous records, the probability of having an accident
within 12 months is 0.5 for accident prone customers. For customers who are not
accident prone this probability equals 0.1. The company also estimates that 70% of
customers are not accident prone. Find the probability that a new customer will have
an accident within a 1-year period.
Solution. Let E stand for the event that the new customer will have an accident within
one year. By F we denote the event that he is accident prone. Then
P (E) = P (E|F )P (F ) + P (E|F ′)P (F ′) = 0.5 · 0.3 + 0.1 · 0.7 = 0.22.
In other words, on the basis of the company’s statistical data one may conclude that
approximately one in every five new customers will have an accident within a year.
It is, however, natural to expect that having observed a new customer for 1 year the
insurer would be able to make a better judgment as to whether or not the policy holder
is accident prone. Suppose the customer had an accident within one year. What is the
probability that he/she is accident prone?
Originally, when no information about the customer was available, this probability
was 0.3. It can now be recalculated as follows:
\[ P(F \mid E) = \frac{P(E \cap F)}{P(E)} = \frac{P(E \mid F) P(F)}{P(E)} = \frac{0.5 \cdot 0.3}{0.22} = \frac{0.15}{0.22} \approx 0.68. \]
Hence, knowing that the policy holder had an accident within one year, the insurer is
about 68% sure that the customer is accident prone.
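The numbers of Example 8 can be reproduced in a few lines. This is an illustrative sketch with variable names chosen here. The last quantity, the probability of being accident prone after an accident-free year, is not computed in the lecture and is added only to show that the same formula handles the complementary observation E′.

```python
from fractions import Fraction

p_F = Fraction("0.3")              # P(F): customer is accident prone
p_E_given_F = Fraction("0.5")      # P(E|F)
p_E_given_not_F = Fraction("0.1")  # P(E|F')

# Total probability rule: P(E).
p_E = p_E_given_F * p_F + p_E_given_not_F * (1 - p_F)

# Bayes formula: P(F|E).
p_F_given_E = p_E_given_F * p_F / p_E

# Extra illustration (not in the lecture): P(F|E'), no accident within the year.
p_F_given_not_E = (1 - p_E_given_F) * p_F / (1 - p_E)

print(float(p_E), round(float(p_F_given_E), 2), round(float(p_F_given_not_E), 2))
```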
Food for thought.
Independent Events
The previous examples show that the conditional probability of E given F is not
generally equal to the unconditional probability of E. In other words, knowing that F
has occurred may change the chances that E occurs.
Suppose, for example, that a batch of 100 semiconductor chips contains 20 defective
ones. We select two without replacement. What is the probability that both are
defective?
Solution. Let D1 and D2 be the events that the first and the second chips selected are
defective. The requested probability is clearly
\[ P(D_1 \cap D_2) = P(D_2 \mid D_1) P(D_1) = \frac{19}{99} \cdot \frac{20}{100}. \]
We assume now that chips are selected with replacement, i.e. the first one is returned to
the batch before the second is selected. When answering the same question we encounter
a peculiar situation:
\[ P(D_1 \cap D_2) = P(D_2 \mid D_1) P(D_1) = \frac{20}{100} \cdot \frac{20}{100}, \quad \text{i.e. } P(D_2) = P(D_2 \mid D_1). \]
In other words, the fact that D1 occurred does not affect the chances that D2 occurs, or
the probability of the event D2 does not depend on whether D1 occurs.
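The contrast between the two sampling schemes can be made concrete by enumerating all ordered pairs of chips. The sketch below is illustrative; `chips` and `summarize` are names introduced here. For each scheme it returns P(D1), P(D2), P(D1 ∩ D2) and P(D2|D1).

```python
from fractions import Fraction
from itertools import permutations, product

# 100 chips, of which 20 are defective (True marks a defective chip).
chips = [True] * 20 + [False] * 80

def summarize(pairs):
    """Return P(D1), P(D2), P(D1 ∩ D2) and P(D2|D1) for equally likely pairs."""
    pairs = list(pairs)
    n = len(pairs)
    p_d1 = Fraction(sum(chips[i] for i, _ in pairs), n)
    p_d2 = Fraction(sum(chips[j] for _, j in pairs), n)
    p_both = Fraction(sum(chips[i] and chips[j] for i, j in pairs), n)
    return p_d1, p_d2, p_both, p_both / p_d1

# Without replacement: the two indices must differ.
without_repl = summarize(permutations(range(100), 2))
# With replacement: the same chip may be selected twice.
with_repl = summarize(product(range(100), repeat=2))

print("without replacement:", without_repl)
print("with replacement:   ", with_repl)
```

In both schemes P(D2) equals 20/100, but only with replacement does P(D2|D1) coincide with it.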
The above considerations motivate the following definition.
Definition 9 Events E and F are called independent if
P (E ∩ F ) = P (E)P (F ).
Events which are not independent are naturally called dependent.
Warning! DO NOT GET CONFUSED between mutually exclusive and independent
events! The events E and F are mutually exclusive if E ∩F = ∅. For independent events
E and F we have P (E ∩ F ) = P (E)P (F ). So if the events E and F are independent
and P (E) > 0 and P (F ) > 0, then E and F are not mutually exclusive (Prove this
statement!).
It is easy to see that if E and F are independent, then so are E and F ′ . Indeed,
P (E ∩ F ′ ) = P (E) − P (E ∩ F ) = P (E) − P (E)P (F ) = P (E)(1 − P (F )) = P (E)P (F ′).
Exercise. Show that the statement is also valid for E ′ and F ′ !
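A numerical check is no substitute for the proofs requested above, but it helps to keep the two notions apart. The sketch below uses one roll of a fair die as a sample space. The events E, F, A and B are hypothetical examples chosen for illustration and do not come from the lecture.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # outcomes of one roll of a fair die

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def independent(A, B):
    """Definition 9: P(A ∩ B) = P(A) P(B)."""
    return P(A & B) == P(A) * P(B)

E = frozenset({2, 4, 6})     # the roll is even
F = frozenset({1, 2, 3, 4})  # the roll is at most 4
A = frozenset({1, 2})
B = frozenset({3, 4})

# Independence carries over to complements.
print(independent(E, F), independent(E, S - F), independent(S - E, S - F))
# A and B are mutually exclusive with positive probabilities, hence dependent.
print(A & B == frozenset(), independent(A, B))
```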
For three events the definition of independence takes the following form
Definition 10 The three events E, F, G are called independent if
P (E ∩ F ) = P (E)P (F ), P (F ∩ G) = P (F )P (G), P (E ∩ G) = P (E)P (G),
P (E ∩ F ∩ G) = P (E)P (F )P (G).
In general, n events are called independent if for each group of k of them (2 ≤ k ≤ n)
the probability of their intersection is equal to the product of the respective probabilities.
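The last condition in Definition 10 does not follow from the three pairwise conditions. A standard illustration, which is not taken from the lecture, uses two tosses of a fair coin: E is "the first toss is heads", F is "the second toss is heads" and G is "both tosses agree". The sketch below checks the conditions of Definition 10 by enumeration.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))  # two tosses of a fair coin

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def both(*events):
    """Intersection of events given as predicates."""
    return lambda s: all(ev(s) for ev in events)

E = lambda s: s[0] == "H"   # first toss is heads
F = lambda s: s[1] == "H"   # second toss is heads
G = lambda s: s[0] == s[1]  # both tosses agree

pairwise = all(
    P(both(A, B)) == P(A) * P(B) for A, B in [(E, F), (F, G), (E, G)]
)
triple = P(both(E, F, G)) == P(E) * P(F) * P(G)

print(pairwise, triple)
```

The three events are pairwise independent, yet the triple product condition fails, so they are not independent in the sense of Definition 10.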
Sometimes a probability experiment consists of performing a sequence of sub-experiments.
It is often reasonable to assume that the outcomes of any group of sub-experiments have
no effect on the probabilities of the outcomes of the other sub-experiments. In such a
case we say that the sub-experiments are independent.
Example 11 A system composed of n separate components is called a parallel system if
it functions when at least one of the components functions. Suppose the probability that
the k-th component functions is pk . We also assume that functioning of one component
is independent of functioning of the rest. Find the probability that the system functions.
Solution. We denote by Ek the event that the k-th component is working. The events
Ek are clearly independent. The event E that the system operates is then
\[ E = \bigcup_{k=1}^{n} E_k. \]
In order to find the probability of E we apply De Morgan's laws and independence
in turn, and get
\[ P(E) = P\Big(\bigcup_{k=1}^{n} E_k\Big) = P\Big(\Big(\bigcap_{k=1}^{n} E_k'\Big)'\Big) = 1 - P\Big(\bigcap_{k=1}^{n} E_k'\Big) = 1 - \prod_{k=1}^{n} (1 - p_k). \]
Similar to the last example, the independence of events involved follows from the nature
of a probability experiment.
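The formula for the parallel system in Example 11 can be compared with a direct enumeration of all component states. Both function names and the probabilities in `p` are assumptions made for this sketch; the lecture gives no numerical values.

```python
from fractions import Fraction
from itertools import product
from math import prod

def parallel_reliability(p):
    """Closed form from Example 11: 1 minus the product of (1 - p_k)."""
    return 1 - prod(1 - pk for pk in p)

def parallel_reliability_brute_force(p):
    """Sum the probabilities of all component states with at least one
    working component, assuming the components are independent."""
    total = Fraction(0)
    for state in product([True, False], repeat=len(p)):
        if any(state):
            total += prod(pk if works else 1 - pk for pk, works in zip(p, state))
    return total

# Hypothetical component reliabilities p_1, p_2, p_3.
p = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]

print(parallel_reliability(p), parallel_reliability_brute_force(p))
```

The enumeration visits 2^n states, so it is practical only for small n, whereas the closed form needs just n multiplications.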