Probability and Statistics Review
Shared by: hcj
Posted: 8/29/2012


Thursday Mar 12
Learning model
What are the important variables?
What is their range?
What are the important combinations?
Quick review of probability
• Event, collection of events
• Random variable
• Conditional probability, law of total probability, chain rule, Bayes' rule
• Dependence and independence
• Expectation, variance, and covariance
• Moments
Sample space and Events
• Ω: sample space, the set of possible outcomes of an experiment
• If you toss a coin twice, Ω = {HH, HT, TH, TT}
• Event: a subset of Ω
• First toss is heads = {HH, HT}
• S: event space, a set of events:
• Closed under finite union and complement
• Hence closed under other set operations: intersection, difference, etc.
• Contains the empty event and Ω
Probability Measure
• Defined over (Ω, S) such that
• P(a) >= 0 for all a in S
• P(Ω) = 1
• If a, b are disjoint, then
• P(a ∪ b) = P(a) + P(b)
• We can deduce other properties from these axioms
• Ex: P(a ∪ b) for non-disjoint events
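The two-toss example and the non-disjoint case can be checked by brute-force enumeration. The sketch below assumes a fair coin (a uniform measure over Ω, which the slides do not state) and uses the standard identity P(a ∪ b) = P(a) + P(b) - P(a ∩ b); the names `omega` and `prob` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses; the uniform measure is an assumption.
omega = {"".join(t) for t in product("HT", repeat=2)}

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

first_is_head = {w for w in omega if w[0] == "H"}     # {HH, HT}
second_is_head = {w for w in omega if w[1] == "H"}    # {HH, TH}

# Inclusion-exclusion for two events that are not disjoint (both contain HH).
lhs = prob(first_is_head | second_is_head)
rhs = (prob(first_is_head) + prob(second_is_head)
       - prob(first_is_head & second_is_head))
print(lhs, rhs)
```

Both sides come out equal because the shared outcome HH is counted twice by the plain sum and subtracted once.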
Visualization
• We can go on and define conditional
probability, using the above visualization
Conditional probability
• P(F|H) = fraction of worlds in which H is true that also have F true
$$p(F \mid H) = \frac{p(F \cap H)}{p(H)}$$
Rule of total probability
[Figure: an event A overlapping a partition B1, ..., B7 of the sample space]
$$p(A) = \sum_i P(B_i)\, P(A \mid B_i)$$
Bayes Rule
• We know that P(smart) = .7
• If we also know that the student's grade is
A+, how does this affect our belief about
his intelligence?
$$P(x \mid y) = \frac{P(x)\, P(y \mid x)}{P(y)}$$
• Where does this come from?
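One way to see where the rule comes from, assuming P(x) > 0 and P(y) > 0, is to write the joint probability with the definition of conditional probability in both orders and equate the two:

$$P(x, y) = P(x \mid y)\, P(y) = P(y \mid x)\, P(x) \quad\Longrightarrow\quad P(x \mid y) = \frac{P(x)\, P(y \mid x)}{P(y)}$$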
Example
A factory operates two machines, A and B. 10% of the factory's output is produced by machine A and 90% by machine B. 1% of the products made by machine A and 5% of the products made by machine B are defective.
a. A product is chosen at random. What is the probability that it is defective?
b. A product is found to be defective. What is the probability that it was made by machine A?
c. After a visit by a technician who services machine B, 1.9% of the factory's products are found to be defective. What is now the probability that a product made by machine B is defective?
Solution:
Define the following events: A - the chosen product was made by machine A; B - the chosen product was made by machine B; C - the chosen product is defective.
a. By the law of total probability: P(C) = P(C|A)*P(A) + P(C|B)*P(B) = 0.01*0.1 + 0.05*0.9 = 0.046
b. By Bayes' rule: P(A|C) = P(C|A)*P(A)/P(C) = 0.01*0.1/0.046 ≈ 0.0217
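The arithmetic of the factory example can be reproduced in a few lines. The sketch also works out the third question, which the slides leave unanswered, under the assumption that the technician's visit changes only machine B's defect rate (machine shares and machine A's rate stay as given). Variable names are illustrative.

```python
# Share of output per machine and defect rate per machine (from the example).
p_a, p_b = 0.10, 0.90
p_c_given_a, p_c_given_b = 0.01, 0.05

# (a) Law of total probability: P(C) = P(C|A)P(A) + P(C|B)P(B).
p_c = p_c_given_a * p_a + p_c_given_b * p_b

# (b) Bayes' rule: P(A|C) = P(C|A)P(A) / P(C).
p_a_given_c = p_c_given_a * p_a / p_c

# (c) New overall defect rate is 1.9%. Assuming only machine B changed,
# solve P(C) = P(C|A)P(A) + P(C|B)P(B) for the new P(C|B).
p_c_new = 0.019
p_c_given_b_new = (p_c_new - p_c_given_a * p_a) / p_b

print(round(p_c, 3), round(p_a_given_c, 4), round(p_c_given_b_new, 3))
```

Under that assumption, machine B's defect rate drops from 5% to 2%.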
Discrete random variable
• A random variable (RV) is a function from the general sample space (the world) to the space of attributes
• In effect, the random variable induces a new distribution that we can work with
• Modeling students (Grade and Intelligence):
• Ω = all possible students
• What are the events?
• Grade_A = all students with grade A
• Grade_B = all students with grade B
• Intelligence_High = … with high intelligence
Random Variables
[Figure: the sample space Ω partitioned by I: Intelligence (high, low) and by G: Grade (A+, A, B)]
P(I = high) = P( {all students whose intelligence is high})
Joint probability
• Joint probability distributions quantify this
• P( X= x, Y= y) = P(x, y)
• How probable is it to observe these two attributes together?
• How can we manipulate Joint probability distributions?
Example: a two-digit number is formed at random from the digits 1, 2, 3, 4.
Let X be the number of distinct digits appearing in the number and Y the number of times the digit 1 appears.
What is the joint distribution of the pair (X, Y)?
         X=1     X=2
Y=0     3/16    6/16
Y=1     0       6/16
Y=2     1/16    0
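The table can be reproduced by enumerating all outcomes. This sketch assumes the two digits are drawn independently and uniformly with repetition allowed, which gives the 16 equally likely numbers implied by the denominators in the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

digits = (1, 2, 3, 4)
numbers = list(product(digits, repeat=2))   # 16 equally likely two-digit numbers

counts = Counter()
for n in numbers:
    x = len(set(n))      # X: number of distinct digits
    y = n.count(1)       # Y: how many times the digit 1 appears
    counts[(x, y)] += 1

# Joint distribution P(X = x, Y = y); pairs that never occur are simply absent.
joint = {xy: Fraction(c, len(numbers)) for xy, c in counts.items()}

for y in (0, 1, 2):
    print(y, [str(joint.get((x, y), Fraction(0))) for x in (1, 2)])
```

`Fraction` reduces automatically, so 6/16 is displayed as 3/8.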
Chain rule
• Always true
• P(x,y,z) = p(x) p(y|x) p(z|x,y)
= p(z) p(y|z) p(x|y,z)
= …
To complicate things a little, we add the following to the previous question: let z be the number of times a digit strictly greater than 2 appears.
P(x=2, y=1, z=1) = P(x=2) * P(y=1|x=2) * P(z=1|y=1, x=2)
= 0.75 * [(6/16)/P(x=2)] * [(4/16)/(6/16)] = 0.75 * 0.5 * (2/3) = 0.25
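As a check on the chain-rule computation, the product of the three factors can be compared against the joint probability counted directly. The sketch assumes the same 16 equally likely two-digit numbers as in the previous example; `features` and `prob` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

numbers = list(product((1, 2, 3, 4), repeat=2))

def features(n):
    """Return (x, y, z) for a two-digit number given as a tuple of digits."""
    x = len(set(n))              # number of distinct digits
    y = n.count(1)               # occurrences of the digit 1
    z = sum(d > 2 for d in n)    # occurrences of digits strictly greater than 2
    return x, y, z

def prob(pred):
    """Probability that pred(x, y, z) holds under the uniform distribution."""
    hits = sum(1 for n in numbers if pred(*features(n)))
    return Fraction(hits, len(numbers))

p_x = prob(lambda x, y, z: x == 2)
p_y_given_x = prob(lambda x, y, z: x == 2 and y == 1) / p_x
p_z_given_xy = (prob(lambda x, y, z: x == 2 and y == 1 and z == 1)
                / prob(lambda x, y, z: x == 2 and y == 1))

chain = p_x * p_y_given_x * p_z_given_xy
direct = prob(lambda x, y, z: (x, y, z) == (2, 1, 1))
print(chain, direct)
```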
Conditional Probability
Here X = x and Y = y are events:
$$P(X = x \mid Y = y) = \frac{P(X = x \cap Y = y)}{P(Y = y)}$$
But we will always write it this way:
$$P(x \mid y) = \frac{p(x, y)}{p(y)}$$
Marginal probability
• We know P(X,Y), what is P(X=x)?
• We can use the law of total probability. Why?
$$p(x) = \sum_y P(x, y) = \sum_y P(y)\, P(x \mid y)$$
[Figure: an event A overlapping a partition B1, ..., B7 of the sample space]
Marginalization Cont.
• Another example
$$p(x) = \sum_{y,z} P(x, y, z) = \sum_{y,z} P(y, z)\, P(x \mid y, z)$$
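Marginalization amounts to summing a joint table over the variables that are not of interest. A minimal sketch, using the joint distribution of (X, Y) from the two-digit-number example; the helper `marginal` is a hypothetical name, and it works for tuples of any length.

```python
from fractions import Fraction as F

# Joint distribution P(X, Y) from the two-digit-number example.
joint = {
    (1, 0): F(3, 16), (2, 0): F(6, 16),
    (1, 1): F(0),     (2, 1): F(6, 16),
    (1, 2): F(1, 16), (2, 2): F(0),
}

def marginal(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    out = {}
    for assignment, p in joint.items():
        key = assignment[axis]
        out[key] = out.get(key, F(0)) + p
    return out

p_x = marginal(joint, 0)   # P(X)
p_y = marginal(joint, 1)   # P(Y)
print(p_x, p_y)
```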
Bayes Rule cont.
• You can condition on more variables
$$P(x \mid y, z) = \frac{P(x \mid z)\, P(y \mid x, z)}{P(y \mid z)}$$
Independence
• X is independent of Y means that knowing Y
does not change our belief about X.
• P(X|Y=y) = P(X)
• P(X=x, Y=y) = P(X=x) P(Y=y)
• Why is this true?
• The above should hold for all x, y
• It is symmetric and written as X ⊥ Y
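The factorization test P(x, y) = P(x) P(y) for all x, y can be applied mechanically to any finite joint table. The sketch below tries it on two fair coin tosses (an assumed example, independent by construction) and on the (X, Y) table from the two-digit-number example; `is_independent` is a hypothetical helper.

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint):
    """True if P(x, y) == P(x) P(y) for every pair of values."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), F(0)) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), F(0)) for x in xs) for y in ys}
    return all(joint.get((x, y), F(0)) == p_x[x] * p_y[y]
               for x, y in product(xs, ys))

# Two fair coin tosses: independent by construction.
coins = {(a, b): F(1, 4) for a, b in product("HT", repeat=2)}

# X = number of distinct digits, Y = number of 1s in a random two-digit number.
digits_xy = {(1, 0): F(3, 16), (2, 0): F(6, 16),
             (2, 1): F(6, 16), (1, 2): F(1, 16)}

print(is_independent(coins), is_independent(digits_xy))
```

X and Y from the digits example fail the test: for instance P(X=1, Y=1) = 0 while P(X=1) P(Y=1) is positive.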
CI: Conditional Independence
• X ⊥ Y | Z if, once Z is observed, knowing the
value of Y does not change our belief about X
• The following should hold for all x, y, z
• P(X=x | Z=z, Y=y) = P(X=x | Z=z)
• P(Y=y | Z=z, X=x) = P(Y=y | Z=z)
• P(X=x, Y=y | Z=z) = P(X=x | Z=z) P(Y=y | Z=z)
We call these factors: a very useful concept!
Properties of CI
• Symmetry:
– (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
• Decomposition:
– (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z)
• Weak union:
– (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z,W)
• Contraction:
– (X ⊥ W | Y,Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y,W | Z)
• Intersection:
– (X ⊥ Y | W,Z) & (X ⊥ W | Y,Z) ⇒ (X ⊥ Y,W | Z)
– Only for positive distributions!
– P(a) > 0 for all a ≠ ∅
Monty Hall Problem
You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
You pick a door, say No. 1
The host, who knows what's behind the doors,
opens another door, say No. 3, which has a goat.
Do you want to pick door No. 2 instead?
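[Figure: the three cases for your first pick. If you picked the car, the host reveals Goat A or Goat B. If you picked Goat A, the host must reveal Goat B. If you picked Goat B, the host must reveal Goat A.]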
Monty Hall Problem: Bayes Rule
C_i: the car is behind door i, i = 1, 2, 3
$$P(C_i) = \frac{1}{3}$$
H_ij: the host opens door j after you pick door i
$$P(H_{ij} \mid C_k) = \begin{cases} 0 & i = j \\ 0 & j = k \\ 1/2 & i = k \\ 1 & i \neq k,\ j \neq k \end{cases}$$
Monty Hall Problem
WLOG, i = 1, j = 3
$$P(C_1 \mid H_{13}) = \frac{P(H_{13} \mid C_1)\, P(C_1)}{P(H_{13})}$$
$$P(H_{13} \mid C_1)\, P(C_1) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$$
Monty Hall Problem: Bayes Rule cont.
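$$P(H_{13}) = P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3)$$
$$= P(H_{13} \mid C_1)\, P(C_1) + P(H_{13} \mid C_2)\, P(C_2) \quad \text{(the } C_3 \text{ term is zero)}$$
$$= \frac{1}{6} + 1 \cdot \frac{1}{3} = \frac{1}{2}$$
$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$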
Monty Hall Problem: Bayes Rule cont.
$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$
$$P(C_2 \mid H_{13}) = 1 - \frac{1}{3} = \frac{2}{3} > P(C_1 \mid H_{13})$$
You should switch!
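The Bayes-rule calculation can be carried out exactly with rational arithmetic. This is a sketch of the model stated on the slides (uniform prior over doors, host chooses uniformly when he has two goats to pick from); the function names are illustrative.

```python
from fractions import Fraction as F

DOORS = (1, 2, 3)

def host_likelihood(i, j, k):
    """P(H_ij | C_k): host opens door j after you pick i, car behind door k."""
    if j == i or j == k:
        return F(0)          # host never opens your door or the car's door
    if i == k:
        return F(1, 2)       # you hold the car: host picks one of two goats
    return F(1)              # otherwise the host's choice is forced

def posterior(i, j):
    """P(C_k | H_ij) for each door k, with a uniform prior P(C_k) = 1/3."""
    prior = F(1, 3)
    joint = {k: host_likelihood(i, j, k) * prior for k in DOORS}
    evidence = sum(joint.values())           # P(H_ij)
    return {k: p / evidence for k, p in joint.items()}

post = posterior(1, 3)
print(post[1], post[2], post[3])
```

With i = 1 and j = 3 the posterior puts 1/3 on door 1 and 2/3 on door 2, matching the derivation.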
Moments
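Mean (Expectation): $\mu = E(X)$
Discrete RVs:
$$E(X) = \sum_{v_i} v_i\, P(X = v_i)$$
Continuous RVs:
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx$$
Variance: $V(X) = E\left((X - \mu)^2\right)$
Discrete RVs:
$$V(X) = \sum_{v_i} (v_i - \mu)^2\, P(X = v_i)$$
Continuous RVs:
$$V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$$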
Properties of Moments
Mean
$$E(X + Y) = E(X) + E(Y)$$
$$E(aX) = a E(X)$$
If X and Y are independent, $E(XY) = E(X)\, E(Y)$
Variance
$$V(aX + b) = a^2 V(X)$$
If X and Y are independent, $V(X + Y) = V(X) + V(Y)$
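The definitions and properties above can be checked on a small discrete example. The sketch assumes a fair six-sided die (not part of the slides) and uses exact fractions so the identities hold without rounding.

```python
from fractions import Fraction as F
from itertools import product

def mean(pmf):
    """E(X) for a dict mapping values to probabilities."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """V(X) = E((X - mu)^2)."""
    mu = mean(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

die = {v: F(1, 6) for v in range(1, 7)}     # fair six-sided die (assumed)

# PMF of the sum of two independent dice.
two_dice = {}
for (a, pa), (b, pb) in product(die.items(), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, F(0)) + pa * pb

# PMF of 3X + 2 for a single die.
scaled = {3 * v + 2: p for v, p in die.items()}

print(mean(die), variance(die))
print(mean(two_dice), variance(two_dice), variance(scaled))
```

The sum of two independent dice has twice the mean and twice the variance of one die, and 3X + 2 has nine times the variance.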
The Big Picture
[Figure: probability leads from a model to data; estimation/learning leads from data back to a model]
Statistical Inference
Given observations from a model:
What (conditional) independence assumptions
hold?
Structure learning
If you know the family of the model (e.g.,
multinomial), what are the values of the
parameters? MLE, Bayesian estimation.
Parameter learning
MLE
Maximum Likelihood estimation
Example on board
Given N coin tosses, what is the coin bias (θ)?
Sufficient Statistics: SS
A useful concept that we will make use of later
In solving the above estimation problem, we only
cared about Nh and Nt (the numbers of heads and tails); these are called the SS of
this model.
All sequences of coin tosses that have the same SS will result in the
same value of θ
Why is this useful?
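The board example is not in the transcript, so the following is only a sketch of the standard result, assuming independent tosses with a fixed bias θ = P(heads). The likelihood is θ^Nh (1 - θ)^Nt, which is maximized at θ = Nh / (Nh + Nt), so the estimate depends on the data only through the counts. Function names are illustrative.

```python
from collections import Counter

def sufficient_stats(tosses):
    """Return (Nh, Nt): counts of heads and tails in a string of 'H'/'T'."""
    c = Counter(tosses)
    return c["H"], c["T"]

def mle_bias(tosses):
    """Maximum likelihood estimate of P(heads), assuming i.i.d. tosses."""
    n_h, n_t = sufficient_stats(tosses)
    return n_h / (n_h + n_t)

a = "HHTHT"
b = "THTHH"   # different order, same counts
print(sufficient_stats(a), mle_bias(a), mle_bias(b))
```

Two sequences with the same counts give the same estimate, so only Nh and Nt need to be stored, not the full sequence.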
Statistical Inference
We need some concepts from information theory
Information Theory
• P(X) encodes our uncertainty about X
• Some variables are more uncertain than others
[Figure: two distributions, P(X) and P(Y), plotted over X and Y]
• How can we quantify this intuition?
• Entropy: average number of bits required to encode X
$$H_P(X) = E\left[\log \frac{1}{p(x)}\right] = \sum_x P(x) \log \frac{1}{P(x)}$$
Information Theory cont.
• We can define conditional entropy similarly
$$H_P(X \mid Y) = E\left[\log \frac{1}{p(x \mid y)}\right] = H_P(X, Y) - H_P(Y)$$
• We can also define a chain rule for entropies (not surprising)
$$H_P(X, Y, Z) = H_P(X) + H_P(Y \mid X) + H_P(Z \mid X, Y)$$
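Entropy and conditional entropy are easy to compute for small tables. The sketch below uses base-2 logarithms (so the unit is bits) and obtains H(X | Y) from the identity H(X | Y) = H(X, Y) - H(Y); the joint table is the (X, Y) distribution from the two-digit-number example, and the helper names are hypothetical.

```python
from math import log2

def entropy(dist):
    """H(P) in bits for a dict mapping outcomes to probabilities."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    out = {}
    for assignment, p in joint.items():
        out[assignment[axis]] = out.get(assignment[axis], 0.0) + p
    return out

# Joint P(X, Y) from the two-digit-number example.
joint = {(1, 0): 3/16, (2, 0): 6/16, (2, 1): 6/16, (1, 2): 1/16}

h_xy = entropy(joint)
h_y = entropy(marginal(joint, 1))
h_x_given_y = h_xy - h_y          # H(X | Y) = H(X, Y) - H(Y)

print(entropy({"H": 0.5, "T": 0.5}), h_x_given_y)
```

A fair coin has exactly one bit of entropy. In the digits example, X is uncertain only when Y = 0, so H(X | Y) is well below one bit.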
Mutual Information: MI
• Remember independence?
• If X ⊥ Y then knowing Y won't change our belief about X
• Mutual information can help quantify this! (not the only
way though)
• MI:
$$I_P(X; Y) = H_P(X) - H_P(X \mid Y)$$
• Symmetric
• I(X;Y) = 0 iff X and Y are independent!
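A short sketch of mutual information on two extreme cases. It uses the form I(X; Y) = Σ P(x, y) log2[P(x, y) / (P(x) P(y))], which is algebraically equivalent to H(X) - H(X | Y); the two example tables are assumptions chosen for illustration.

```python
from math import log2

def mutual_information(joint):
    """I(X; Y) in bits for a dict mapping (x, y) pairs to probabilities."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

# Two independent fair coins: knowing Y tells us nothing about X.
independent = {(a, b): 0.25 for a in "HT" for b in "HT"}

# Y is an exact copy of X: knowing Y removes all uncertainty about X.
copies = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(mutual_information(independent), mutual_information(copies))
```

The independent pair gives zero bits and the copied pair gives one bit, the full entropy of a fair coin.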
Continuous Random Variables
What if X is continuous?
Probability density function (pdf) instead of
probability mass function (pmf)
A pdf is any function f(x) that describes the
probability density in terms of the input
variable x.
PDF
Properties of pdf
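$$f(x) \ge 0, \quad \forall x$$
$$\int_{-\infty}^{\infty} f(x)\, dx = 1$$
Is f(x) <= 1? Not necessarily: a density may exceed 1 as long as it integrates to 1.
Actual probability can be obtained by taking
the integral of the pdf
E.g. the probability of X being between 0 and 1 is
$$P(0 \le X \le 1) = \int_0^1 f(x)\, dx$$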
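To see why a density is not bounded by 1, consider a uniform distribution on [0, 0.5] (an assumed example): its density is 2 on that interval, yet the probability of any interval is still at most 1. The sketch integrates the density numerically with a simple midpoint rule; both function names are illustrative.

```python
def uniform_pdf(x, a=0.0, b=0.5):
    """Density of a uniform distribution on [a, b]; equals 1/(b-a) inside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

density_at_quarter = uniform_pdf(0.25)          # a density value above 1
prob_0_to_1 = integrate(uniform_pdf, 0.0, 1.0)  # P(0 <= X <= 1)
print(density_at_quarter, round(prob_0_to_1, 6))
```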
Cumulative Distribution Function
$$F_X(v) = P(X \le v)$$
Discrete RVs
$$F_X(v) = \sum_{v_i \le v} P(X = v_i)$$
Continuous RVs
$$F_X(v) = \int_{-\infty}^{v} f(x)\, dx$$
$$\frac{d}{dx} F_X(x) = f(x)$$
Acknowledgment
Andrew Moore Tutorial: http://www.autonlab.org/tutorials/prob.html
Monty hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem
http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html