Probability and Statistics Review

Shared by: hcj
posted:
8/29/2012
Thursday Mar 12
Learning Model
• What are the important variables?
• What is their range?
• What are the important combinations?
Quick Review of Probability
• Event, collection of events
• Random variable
• Conditional probability, law of total probability, chain rule, Bayes' rule
• Dependence and independence
• Variance, expectation, and covariance
• Moments
Sample space and Events
• W : Sample Space, result of an experiment
  • If you toss a coin twice W = {HH,HT,TH,TT}
• Event: a subset of W
   • First toss is head = {HH,HT}
• S: event space, a set of events:
   • Closed under finite union and complements
      • Entails closure under other binary operations: intersection, difference, etc.
   • Contains the empty event and W
Probability Measure
• Defined over (W,S) s.t.
  • P(a) >= 0 for all a in S
  • P(W) = 1
  • If a, b are disjoint, then
     • P(a ∪ b) = P(a) + P(b)
• We can deduce other properties from the above axioms
  • Ex: for non-disjoint events, P(a ∪ b) = P(a) + P(b) − P(a ∩ b)
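The two-toss sample space and the rule for non-disjoint events can be checked by enumeration. The following is a minimal sketch, assuming a fair coin (so every outcome in W is equally likely); the names `sample_space` and `prob` are illustrative and not from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: {HH, HT, TH, TT}
sample_space = {"".join(t) for t in product("HT", repeat=2)}

def prob(event):
    """Uniform probability measure (assumption: fair coin): |event| / |W|."""
    return Fraction(len(event & sample_space), len(sample_space))

first_is_head = {w for w in sample_space if w[0] == "H"}   # {HH, HT}
second_is_head = {w for w in sample_space if w[1] == "H"}  # {HH, TH}

# P(a U b) computed directly, and via P(a) + P(b) - P(a and b)
lhs = prob(first_is_head | second_is_head)
rhs = prob(first_is_head) + prob(second_is_head) - prob(first_is_head & second_is_head)
print(lhs, rhs)
```

Using exact fractions rather than floats keeps the comparison between the two sides free of rounding error.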
Visualization




• We can go on and define conditional
  probability, using the above visualization
Conditional Probability
• P(F|H) = fraction of worlds in which H is true that
  also have F true

$$p(F \mid H) = \frac{p(F \cap H)}{p(H)}$$
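Rule of total probability

[Figure: the sample space partitioned into disjoint events B1, ..., B7, with event A overlapping several of them]

$$p(A) = \sum_i P(B_i)\, P(A \mid B_i)$$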
Bayes Rule
• We know that P(smart) = .7
  • If we also know that the student's grade is
    A+, how does this affect our belief about
    his intelligence?

$$P(x \mid y) = \frac{P(x)\, P(y \mid x)}{P(y)}$$

  • Where does this come from? From the definition of conditional
    probability: P(x, y) = P(x) P(y|x) = P(y) P(x|y)
Example
Two machines, A and B, operate in a factory. 10% of the factory's output is made by machine A and
90% by machine B. 1% of the products made by machine A and 5% of the products
made by machine B are defective.
• A product is chosen at random. What is the probability that it is defective?
• A product is found to be defective. What is the probability that it was made by machine A?
• After a visit by a technician who services machine B, 1.9% of the factory's products
  are found to be defective. What is now the probability that a product made by machine B is defective?
Solution:
Define the events: A = the chosen product was made by machine A, B = the chosen product
was made by machine B, C = the chosen product is defective.
a. By the law of total probability: P(C) = P(C|A)*P(A) + P(C|B)*P(B)
   = 0.01*0.1 + 0.05*0.9 = 0.046
b. By Bayes' rule: P(A|C) = P(C|A)*P(A)/P(C)
   = 0.01*0.1/0.046 ≈ 0.0217
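The arithmetic of the factory example can be reproduced in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and part (c), which the solution above does not work out, is computed under the assumption that P(A), P(B) and P(C|A) are unchanged by the technician's visit.

```python
# Priors and defect rates from the example
p_a, p_b = 0.10, 0.90                      # P(A), P(B)
p_c_given_a, p_c_given_b = 0.01, 0.05      # P(C|A), P(C|B)

# (a) Law of total probability
p_c = p_c_given_a * p_a + p_c_given_b * p_b

# (b) Bayes' rule
p_a_given_c = p_c_given_a * p_a / p_c

# (c) After the visit P(C) = 0.019. Solve the total-probability equation
# for the new P(C|B), assuming P(A), P(B) and P(C|A) did not change.
p_c_new = 0.019
p_c_given_b_new = (p_c_new - p_c_given_a * p_a) / p_b

print(round(p_c, 4), round(p_a_given_c, 4), round(p_c_given_b_new, 4))
```

Under that assumption, part (c) comes out to a 2% defect rate for machine B.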
Discrete Random Variable
• A random variable (RV) is a function from the general sample space (the world)
  to the space of attributes
• In effect, the random variable induces a new distribution

• Modeling students (Grade and Intelligence):
  • W = all possible students
  • What are the events?
     • Grade_A = all students with grade A
     • Grade_B = all students with grade B
     • Intelligence_High = … with high intelligence
Random Variables
[Figure: the sample space W mapped by I: Intelligence to {High, low} and by G: Grade to {A+, A, B}]

   P(I = high) = P( {all students whose intelligence is high})
Joint Probability
• How probable is it to observe these two attributes together?
     • Joint probability distributions quantify this
• P( X= x, Y= y) = P(x, y)
• How can we manipulate joint probability distributions?
Example: a two-digit number is formed at random from the digits 1, 2, 3, 4 (digits may repeat).
Let X be the number of distinct digits that appear in the number and Y the number of times the digit 1 appears.
What is the joint distribution of the pair (X, Y)?

         X = 1    X = 2
Y = 0    3/16     6/16
Y = 1    0        6/16
Y = 2    1/16     0
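The table can be reproduced by enumerating the 16 equally likely two-digit numbers. A minimal sketch follows; the names `outcomes`, `counts` and `joint` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

digits = [1, 2, 3, 4]
outcomes = list(product(digits, repeat=2))   # 16 equally likely two-digit numbers

counts = Counter()
for d1, d2 in outcomes:
    x = len({d1, d2})            # X: number of distinct digits
    y = (d1, d2).count(1)        # Y: number of times the digit 1 appears
    counts[(x, y)] += 1

# Joint distribution keyed by (x, y); pairs that never occur are simply absent
joint = {xy: Fraction(n, len(outcomes)) for xy, n in counts.items()}
for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
```

Note that `Fraction` prints values in lowest terms, so 6/16 is shown as 3/8.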
Chain Rule
• Always true
   • P(x,y,z) = p(x) p(y|x) p(z|x, y)
              = p(z) p(y|z) p(x|y, z)
              = …
To complicate things a little, add the following to the previous example:
let z be the number of times a digit strictly greater than 2 appears.
          P(x=2,y=1,z=1) = P(x=2)*P(y=1|x=2)*P(z=1|y=1,x=2)
                         = 0.75*[(6/16)/0.75]*[(4/16)/(6/16)]
                         = 4/16 = 0.25
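The chain-rule factorization for the two-digit example can be verified by brute force. This sketch assumes the same setup as the example (two digits drawn from 1 to 4 with repetition); the helper names `features` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([1, 2, 3, 4], repeat=2))

def features(d1, d2):
    x = len({d1, d2})                      # number of distinct digits
    y = (d1, d2).count(1)                  # occurrences of the digit 1
    z = sum(1 for d in (d1, d2) if d > 2)  # digits strictly greater than 2
    return x, y, z

def prob(pred):
    """Probability of the event {outcomes whose (x, y, z) satisfy pred}."""
    hits = sum(1 for o in outcomes if pred(*features(*o)))
    return Fraction(hits, len(outcomes))

p_x = prob(lambda x, y, z: x == 2)
p_xy = prob(lambda x, y, z: x == 2 and y == 1)
p_xyz = prob(lambda x, y, z: x == 2 and y == 1 and z == 1)

# Chain rule: P(x) * P(y|x) * P(z|x,y)
chain = p_x * (p_xy / p_x) * (p_xyz / p_xy)
print(chain, p_xyz)
```

Both printed values should agree, since the conditional terms cancel back to the joint probability by construction.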
Conditional Probability
• For the events X = x and Y = y:

$$P(X = x \mid Y = y) = \frac{P(X = x \cap Y = y)}{P(Y = y)}$$

But we will always write it this way:

$$P(x \mid y) = \frac{p(x, y)}{p(y)}$$
Marginal Probability
• We know P(X,Y), what is P(X=x)?
• We can use the law of total probability, why?

$$p(x) = \sum_y P(x, y) = \sum_y P(y)\, P(x \mid y)$$

[Figure: the partition B1, ..., B7 overlapped by event A, as in the rule of total probability]
Marginalization Cont.
• Another example

$$p(x) = \sum_{y,z} P(x, y, z) = \sum_{y,z} P(y, z)\, P(x \mid y, z)$$
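Marginalization amounts to summing the joint table over the variables that are not of interest. The sketch below uses the (X, Y) table from the two-digit example as input; the function name `marginal` and its `axis` parameter are illustrative choices, and the same function works for joints over three or more variables.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution P(X, Y) from the two-digit example, keyed by (x, y)
joint = {
    (1, 0): Fraction(3, 16), (2, 0): Fraction(6, 16),
    (2, 1): Fraction(6, 16), (1, 2): Fraction(1, 16),
}

def marginal(joint, axis):
    """Sum the joint over every variable except the one at position `axis`."""
    out = defaultdict(Fraction)          # Fraction() is zero
    for assignment, p in joint.items():
        out[assignment[axis]] += p
    return dict(out)

p_x = marginal(joint, 0)   # P(X)
p_y = marginal(joint, 1)   # P(Y)
print(p_x, p_y)
```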
Bayes Rule cont.
• You can condition on more variables

$$P(x \mid y, z) = \frac{P(x \mid z)\, P(y \mid x, z)}{P(y \mid z)}$$
Independence
• X is independent of Y means that knowing Y
  does not change our belief about X.
   • P(X|Y=y) = P(X)
   • P(X=x, Y=y) = P(X=x) P(Y=y)
      • Why is this true? By the chain rule, P(x, y) = P(y) P(x|y) = P(y) P(x)
   • The above should hold for all x, y
   • It is symmetric and written as X ⊥ Y
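The product condition can be tested directly on a joint table. The following is a sketch with a hypothetical helper `is_independent`; the two input tables (two fair coin tosses, and the (X, Y) table of the two-digit example) are chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """True iff P(x, y) == P(x) * P(y) for every pair (x, y)."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    # Pairs missing from the table have probability zero
    return all(joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
               for x, y in product(list(p_x), list(p_y)))

two_coins = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}
digits_xy = {
    (1, 0): Fraction(3, 16), (2, 0): Fraction(6, 16),
    (2, 1): Fraction(6, 16), (1, 2): Fraction(1, 16),
}
print(is_independent(two_coins), is_independent(digits_xy))
```

The digit table fails the check because, for instance, P(X=1, Y=1) = 0 while P(X=1) P(Y=1) is positive.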
CI: Conditional Independence

• X ⊥ Y | Z if once Z is observed, knowing the
  value of Y does not change our belief about X
   • The following should hold for all x,y,z
   • P(X=x | Z=z, Y=y) = P(X=x | Z=z)
   • P(Y=y | Z=z, X=x) = P(Y=y | Z=z)
   • P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z)

We call these terms factors: a very useful concept!
Properties of CI
• Symmetry:
   – (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
• Decomposition:
   – (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z)
• Weak union:
   – (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z,W)
• Contraction:
   – (X ⊥ W | Y,Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y,W | Z)
• Intersection:
   – (X ⊥ Y | W,Z) & (X ⊥ W | Y,Z) ⇒ (X ⊥ Y,W | Z)
   – Only for positive distributions!
   – P(α) > 0, ∀α, α ≠ ∅
Monty Hall Problem

• You're given the choice of three doors: behind one
  door is a car; behind the others, goats.
• You pick a door, say No. 1
• The host, who knows what's behind the doors,
  opens another door, say No. 3, which has a goat.
• Do you want to pick door No. 2 instead?
[Figure: the three cases. If you picked the car, the host reveals Goat A or Goat B; if you picked Goat A, the host must reveal Goat B; if you picked Goat B, the host must reveal Goat A.]
Monty Hall Problem: Bayes Rule

• C_i: the car is behind door i, i = 1, 2, 3

$$P(C_i) = \frac{1}{3}$$

• H_ij: the host opens door j after you pick door i

$$P(H_{ij} \mid C_k) = \begin{cases} 0 & i = j \\ 0 & j = k \\ 1/2 & i = k \\ 1 & i \neq k,\ j \neq k \end{cases}$$
Monty Hall Problem
• WLOG, i=1, j=3

$$P(C_1 \mid H_{13}) = \frac{P(H_{13} \mid C_1)\, P(C_1)}{P(H_{13})}$$

$$P(H_{13} \mid C_1)\, P(C_1) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$$
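Monty Hall Problem: Bayes Rule cont.

$$\begin{aligned}
P(H_{13}) &= P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3) \\
          &= P(H_{13} \mid C_1)\, P(C_1) + P(H_{13} \mid C_2)\, P(C_2) && \text{since } P(H_{13} \mid C_3) = 0 \\
          &= \frac{1}{6} + 1 \cdot \frac{1}{3} \\
          &= \frac{1}{2}
\end{aligned}$$

$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$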
Monty Hall Problem: Bayes Rule cont.

$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$

$$P(C_2 \mid H_{13}) = 1 - \frac{1}{3} = \frac{2}{3} > P(C_1 \mid H_{13})$$

• You should switch!
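The Bayes-rule computation can be cross-checked both exactly and by simulation. This is a sketch under the assumptions stated in the slides (uniform prior over doors, and a host who picks uniformly at random when two goat doors are available); the function names and the fixed seed are illustrative choices.

```python
import random
from fractions import Fraction

def host_likelihood(i, j, k):
    """P(H_ij | C_k): host opens door j after you pick door i, car behind k."""
    if j == i or j == k:
        return Fraction(0)
    return Fraction(1, 2) if i == k else Fraction(1)

def posterior(i, j):
    """Exact P(C_k | H_ij) for k = 1, 2, 3 via Bayes' rule with a uniform prior."""
    prior = Fraction(1, 3)
    joint = {k: host_likelihood(i, j, k) * prior for k in (1, 2, 3)}
    evidence = sum(joint.values())          # P(H_ij)
    return {k: p / evidence for k, p in joint.items()}

def simulate(trials=100_000, seed=0):
    """Monte Carlo estimate of the win rate when always switching."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car, pick = rng.randint(1, 3), rng.randint(1, 3)
        opened = rng.choice([d for d in (1, 2, 3) if d != pick and d != car])
        switched = next(d for d in (1, 2, 3) if d not in (pick, opened))
        wins += switched == car
    return wins / trials

print(posterior(1, 3))
print(simulate())
```

The simulated switching win rate should land near the exact posterior for door 2, up to sampling noise.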
Moments

• Mean (Expectation): $\mu = E(X)$
   – Discrete RVs: $E(X) = \sum_{v_i} v_i\, P(X = v_i)$
   – Continuous RVs: $E(X) = \int_{-\infty}^{\infty} x f(x)\, dx$

• Variance: $V(X) = E\left[(X - \mu)^2\right]$
   – Discrete RVs: $V(X) = \sum_{v_i} (v_i - \mu)^2\, P(X = v_i)$
   – Continuous RVs: $V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$
Properties of Moments

• Mean
   – $E(X + Y) = E(X) + E(Y)$
   – $E(aX) = a\,E(X)$
   – If X and Y are independent, $E(XY) = E(X) \cdot E(Y)$
• Variance
   – $V(aX + b) = a^2 V(X)$
   – If X and Y are independent, $V(X + Y) = V(X) + V(Y)$
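The discrete formulas for the mean and variance, and the scaling property of the variance, can be checked on a small example. The sketch below assumes a fair six-sided die as the random variable; that choice and the helper names `mean`, `variance` and `affine` are illustrative, not from the slides.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) = sum_i v_i * P(X = v_i) for a pmf given as {value: probability}."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """V(X) = sum_i (v_i - mu)^2 * P(X = v_i)."""
    mu = mean(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

def affine(pmf, a, b):
    """pmf of aX + b (assumes a != 0 so distinct values stay distinct)."""
    return {a * v + b: p for v, p in pmf.items()}

die = {v: Fraction(1, 6) for v in range(1, 7)}   # fair six-sided die
print(mean(die), variance(die))                  # 7/2 35/12
print(variance(affine(die, 3, 5)), 9 * variance(die))
```

The last line prints the same value twice, which is the property V(aX + b) = a^2 V(X) with a = 3 and b = 5.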
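The Big Picture
[Figure: a cycle between Model and Data. Probability leads from the model to the data; estimation/learning leads from the data back to the model.]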
Statistical Inference

• Given observations from a model
   – What (conditional) independence assumptions
     hold?
      • Structure learning
   – If you know the family of the model (e.g.,
     multinomial), what are the values of the
     parameters? MLE, Bayesian estimation.
      • Parameter learning
MLE

• Maximum likelihood estimation
   – Example on board
      • Given N coin tosses, what is the coin bias (θ)?
• Sufficient statistics: SS
   – Useful concept that we will make use of later
   – In solving the above estimation problem, we only
     cared about Nh, Nt; these are called the SS of
     this model.
      • All sequences of coin tosses that have the same SS will result in the
        same value of θ
      • Why is this useful?
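The board example is not reproduced in the slides, so the following is only a sketch of the standard result for independent tosses of one coin: the likelihood θ^Nh (1 − θ)^Nt is maximized at θ = Nh / (Nh + Nt). The function names and the toss encoding ("H"/"T" strings) are assumptions made for illustration.

```python
from collections import Counter

def sufficient_stats(tosses):
    """Return (Nh, Nt): the counts of heads and tails in a toss sequence."""
    counts = Counter(tosses)
    return counts["H"], counts["T"]

def mle_bias(tosses):
    """MLE of theta = P(heads) for i.i.d. tosses: Nh / (Nh + Nt)."""
    n_h, n_t = sufficient_stats(tosses)
    return n_h / (n_h + n_t)

# Two different sequences with the same sufficient statistics (Nh=3, Nt=2)
print(mle_bias("HHTHT"), mle_bias("THTHH"))   # 0.6 0.6
```

Because the estimate depends on the data only through (Nh, Nt), the order of the tosses never needs to be stored.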
Statistical Inference (cont.)
• We need some concepts from information theory
Information Theory
• P(X) encodes our uncertainty about X
  • Some variables are more uncertain than others

[Figure: two distributions, P(X) plotted over X and P(Y) plotted over Y]

  • How can we quantify this intuition?
     • Entropy: average number of bits required to encode X

$$H_P(X) = E\left[\log \frac{1}{p(x)}\right] = \sum_x P(x) \log \frac{1}{P(x)}$$
Information Theory cont.
• Entropy: average number of bits required to encode X

$$H_P(X) = E\left[\log \frac{1}{p(x)}\right] = \sum_x P(x) \log \frac{1}{P(x)}$$

• We can define conditional entropy similarly

$$H_P(X \mid Y) = E\left[\log \frac{1}{p(x \mid y)}\right] = H_P(X, Y) - H_P(Y)$$

• We can also define a chain rule for entropies (not surprising)

$$H_P(X, Y, Z) = H_P(X) + H_P(Y \mid X) + H_P(Z \mid X, Y)$$
Mutual Information: MI
• Remember independence?
   • If X ⊥ Y then knowing Y won't change our belief about X
   • Mutual information can help quantify this! (not the only
      way though)
• MI:

$$I_P(X; Y) = H_P(X) - H_P(X \mid Y)$$

   • Symmetric
   • I(X;Y) = 0 iff X and Y are independent!
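Entropy, conditional entropy and mutual information can all be computed from a joint table. The sketch below assumes base-2 logarithms (so results are in bits) and uses two made-up joint distributions: a pair of independent fair coins, and a pair in which Y is an exact copy of X. The helper names are illustrative.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """H = sum_x P(x) * log2(1 / P(x)), in bits; zero-probability terms are skipped."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis`."""
    out = defaultdict(float)
    for xy, p in joint.items():
        out[xy[axis]] += p
    return dict(out)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y)."""
    h_x = entropy(marginal(joint, 0))
    h_y = entropy(marginal(joint, 1))
    h_x_given_y = entropy(joint) - h_y
    return h_x - h_x_given_y

independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # two fair coins
copied = {(0, 0): 0.5, (1, 1): 0.5}                           # Y is a copy of X
print(mutual_information(independent), mutual_information(copied))
```

The independent pair gives zero bits of mutual information, while the copied pair gives the full one bit of H(X).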
Continuous Random Variables

• What if X is continuous?
• Probability density function (pdf) instead of
  probability mass function (pmf)
• A pdf is any function f(x) that describes the
  probability density in terms of the input
  variable x.
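PDF
• Properties of pdf
   – $f(x) \ge 0, \ \forall x$
   – $\int_{-\infty}^{\infty} f(x)\, dx = 1$
   – $f(x) \le 1$ ??? Not required: a density may exceed 1 (e.g., the uniform density on [0, 1/2] equals 2)
• Actual probability can be obtained by taking
  the integral of pdf
   – E.g. the probability of X being between 0 and 1 is

$$P(0 \le X \le 1) = \int_0^1 f(x)\, dx$$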
Cumulative Distribution Function

$$F_X(v) = P(X \le v)$$

• Discrete RVs
   – $F_X(v) = \sum_{v_i \le v} P(X = v_i)$
• Continuous RVs
   – $F_X(v) = \int_{-\infty}^{v} f(x)\, dx$
   – $\frac{d}{dx} F_X(x) = f(x)$
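The relation between a pdf, interval probabilities and the CDF can be illustrated numerically. This sketch assumes an exponential density with rate 2 (an arbitrary choice, not from the slides) and a simple midpoint rule for integration; `pdf`, `integrate` and `cdf` are hypothetical helper names.

```python
from math import exp

def pdf(x, rate=2.0):
    """Exponential density (illustrative choice): rate * exp(-rate * x) for x >= 0."""
    return rate * exp(-rate * x) if x >= 0 else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))

def cdf(v):
    """F_X(v) = integral of the pdf up to v (the density is zero below 0)."""
    return integrate(pdf, 0.0, v) if v > 0 else 0.0

print(pdf(0.0))                    # 2.0, a density value above 1
print(integrate(pdf, 0.0, 1.0))    # P(0 <= X <= 1), close to 1 - exp(-2)
```

The density exceeds 1 at x = 0, yet every probability obtained by integrating it stays between 0 and 1.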
Acknowledgment

• Andrew Moore Tutorial: http://www.autonlab.org/tutorials/prob.html
• Monty Hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem
• http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html

						