Probability

 Lecture 2
              Probability
• Why did we spend last class talking about
  probability?
• How do we use this?
            You’re the FDA
• A company wants you to approve a new
  drug
• They run an experimental trial
  – 40 people have the disease
  – 20 get drug, 20 get placebo
  – Random assignment
  – Conducted perfectly
             You’re the FDA
• Results:
  – Placebo group: 10 of 20 live
  – Drug group: 11 of 20 live
• Does the drug work?
  – Would you approve it?
  – Why or why not?
            You’re the FDA
• Different study, same design
• Results:
  – Placebo: 2 of 20 live
  – Drug: 18 of 20 live
• Does the drug work?
  – Would you approve it?
  – Why or why not?
            You’re the FDA
• Different study, same design
• Results:
  – Placebo: 8 of 20 live
  – Drug: 12 of 20 live
• Does the drug work?
  – Would you approve it?
  – Why or why not?
• How big of a difference do we need?
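One way to put numbers on the three trials above is sketched below. It assumes a one-sided Fisher-style exact calculation, which these slides do not name: if the drug did nothing, how often would random assignment alone put at least this many survivors in the drug group? The function name `one_sided_p` is hypothetical.

```python
from math import comb

def one_sided_p(drug_live, placebo_live, n_per_group=20):
    """P(drug group gets >= drug_live survivors) if treatment is irrelevant.

    Assumption: survivors are fixed and only the random assignment varies
    (hypergeometric model). This is an illustration, not the lecture's method.
    """
    total = 2 * n_per_group
    survivors = drug_live + placebo_live
    ways_to_assign = comb(total, n_per_group)
    p = 0.0
    for k in range(drug_live, min(survivors, n_per_group) + 1):
        # k survivors land in the drug group, the other slots go to non-survivors
        p += comb(survivors, k) * comb(total - survivors, n_per_group - k) / ways_to_assign
    return p

p_trial_1 = one_sided_p(11, 10)   # 11 of 20 vs 10 of 20
p_trial_2 = one_sided_p(18, 2)    # 18 of 20 vs 2 of 20
p_trial_3 = one_sided_p(12, 8)    # 12 of 20 vs 8 of 20
print(p_trial_1, p_trial_2, p_trial_3)
```

Under these assumptions the first result is easy to get by chance, the second is extremely hard to get by chance, and the third sits in between.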
            Why probability
• Probability provides the answer
  – A set of agreed-upon rules
  – All based on mathematical formulas
                 Example
• How many of you would accept the
  following wager:
  – If no two people in the class have the same
    birthday (month and day) you get an
    automatic A.
  – If two or more people in class have the same
    birthday, you get an automatic F.
• Not ethical for me to accept the wager
                        Example
• Would you have won?
• What is the probability?
  – Not 60/365
  – Think of the complement
  – How many possible pairs are there in the class?
     •   Me and each student = N
     •   First student and every other student = N-1
     •   Second student and every remaining student = N-2
     •   …
     •   Last two students
     •   N + (N-1) + … + 1 = N(N+1)/2 = 1770 (with N = 59 students)
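The pair count can be checked three ways, as a minimal sketch. It assumes 59 students plus the instructor (60 people), which is the class size that gives 1770 pairs; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

students = 59            # assumption: 59 students plus the instructor
people = students + 1

pairs_by_sum = sum(range(1, students + 1))        # N + (N-1) + ... + 1
pairs_by_formula = comb(people, 2)                # "60 choose 2"
pairs_by_listing = sum(1 for _ in combinations(range(people), 2))

print(pairs_by_sum, pairs_by_formula, pairs_by_listing)   # 1770 1770 1770
```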
                  Example
• P that any given pair matches is 1/365 ≈ 0.00274
• P that a given pair doesn’t match is 1 − 0.00274
  – = 0.99726
• We have 1770 pairs.
  – Remember the rule
  – Joint probability of all not matching is:
  – P(first pair not match)*P(second pair not
    match)*…*P(last pair not match)
               What is random?
• What are the odds that the first flip is a heads?
   – ½
   – Each outcome is equally likely
• The second flip?
   – ½
• So what are the odds that both are?
   – Four outcomes:
      • HH, HT, TH, TT
      • so ¼ (each equally likely)
              What is random?
• Odds the third flip is a heads?
   – ½
• Odds that all three are heads?
   – 8 outcomes
   – HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
   – So, 1/8
• Odds the fourth flip is a heads?
   – ½
• All four?
   – 1/16
               What is random?
• Odds that five in a row are heads?
   – 1/32
• Odds that six in a row?
   – 1/64
• Written as decimals, these probabilities are:
   –   0.5
   –   0.25
   –   0.125
   –   0.0625
   –   0.03125
   –   0.015625
• Each is the previous probability multiplied by 0.5
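The coin-flip probabilities above can be reproduced by listing every equally likely outcome and counting the ones that are all heads. This is a small sketch; the helper name `p_all_heads` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def p_all_heads(n_flips):
    """Enumerate all 2**n_flips equally likely sequences and count all-heads ones."""
    outcomes = list(product("HT", repeat=n_flips))
    all_heads = [o for o in outcomes if all(flip == "H" for flip in o)]
    return Fraction(len(all_heads), len(outcomes))

for n in range(1, 7):
    p = p_all_heads(n)
    print(n, p, float(p))
```

Each line printed is half the one before it, which is the multiplication rule the slides rely on.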
                    Example
• P that any given pair matches is 1/365 ≈ 0.00274
• P that a given pair doesn’t match is 1 − 0.00274
  – = 0.99726
• We have 1770 pairs.
  – Remember the rule
  – Joint probability of all not matching is:
  – P(first pair not match)*P(second pair not
    match)*…*P(last pair not match)
  – = 0.99726^1770
  – ≈ 0.008
• Seems likely that at least one would match
• Rules of probability and math let us
  determine how likely an event is.
• Want to be able to determine “statistical
  significance”
  – Can we conclude that the pattern we see
    didn’t happen by chance?
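The birthday wager calculation can be sketched in a few lines. The first function follows the slides and treats all 1770 pairs as independent, which is an approximation; the second is the usual exact calculation, included for comparison. Both assume 60 people, 365 equally likely birthdays, and no leap day, and the function names are hypothetical.

```python
def p_no_match_pairs(people, days=365):
    """Slide approach: multiply P(no match) across every pair, as if independent."""
    pairs = people * (people - 1) // 2
    return (1 - 1 / days) ** pairs

def p_no_match_exact(people, days=365):
    """Exact approach: each new person must avoid every birthday already taken."""
    p = 1.0
    for taken in range(people):
        p *= (days - taken) / days
    return p

approx = p_no_match_pairs(60)
exact = p_no_match_exact(60)
print(round(approx, 4), round(exact, 4))
```

The two answers differ slightly, but both say the same thing: with 60 people, the chance that nobody shares a birthday is well under 1%.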
          What is “statistical
           significance?”
• First, let’s be clear about what statistical
  significance is NOT.
• A finding that a relationship between some
  X and some Y is “statistically significant”
  does NOT mean that the relationship is
  “strong.” (It might be strong, but not
  because it’s statistically significant.)
    This is a common mistake
• Many people think that a “statistically
  significant” relationship is by definition a
  “strong” one. In fact, many people think
  that “statistical significance” IS ITSELF a
  test of the strength of the relationship. It’s
  not.
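A made-up illustration of the point: the sketch below compares a weak relationship measured in a huge sample with a strong relationship measured in a tiny one. It assumes a two-proportion z-test with a normal approximation, which is rough for very small samples, and all of the numbers are invented for the example.

```python
from math import erfc, sqrt

def two_sided_p(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))

# Weak relationship (51% vs 49%), very large hypothetical sample
weak_p = two_sided_p(51_000, 100_000, 49_000, 100_000)
# Strong relationship (70% vs 40%), very small hypothetical sample
strong_p = two_sided_p(7, 10, 4, 10)
print(weak_p, strong_p)
```

The 2-point gap comes out statistically significant and the 30-point gap does not, so significance by itself says nothing about strength.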
       Then what is statistical
           significance?
• It is a probabilistic statement—typically, 95%
  confidence—that the relationship we observe in
  the sample, no matter how strong or weak,
  exists in the population.
           But, as always…
• There is a 5% chance we could be wrong
  —that is, that despite what we observe in
  the sample, there really is no relationship
  in the population.
     How do we demonstrate
     statistical significance?
• We perform something called “hypothesis
  testing.”
• We actually begin with a statement called
  the “null hypothesis.” It is always a
  statement that there is not a relationship
  between two variables.
    Why a Null Hypothesis?
• We want to know if there is a relationship
• Our theory is not strong enough to tell us
  how large the effect is
  – Theory: Gender helps determine vote choice
  – Hypothesis: Women were more likely to vote
    for Obama than men were
  – Problem: How much more? We don’t know.
• How large of a difference would be big
  enough?
           Null Hypothesis
• Big enough to not happen by chance

• Ok, but how much is enough to be “not by
  chance?”

• This is where probability comes in
• Anything is possible: the normal
  distribution is unbounded.
         Null Hypothesis
• Everything may be possible, but
  not everything is probable

• We want to know the probability that a
  relationship could exist in the data by
  chance
         Example: Gender Gap
              Female      Male
McCain        281 (31%)   230 (37%)
Obama         625 (69%)   391 (63%)
                 Probability
• If we make some assumptions we can
  calculate how probable any outcome is.
• What do we assume?
  – There is no difference between treatments
  – What the probability distribution is (this is
    technical and I will tell you what matters).
• With these, we can calculate P(data
  occurred by chance).
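To make "assume there is no difference" concrete, here is a simulation sketch using the gender gap counts. If gender is unrelated to vote choice, shuffling the votes across respondents should produce a gap as large as the observed one about as often as chance allows. The shuffle approach, the trial count, and the seed are assumptions of this sketch, not something the slides specify.

```python
import random

def simulate_p(obama_f=625, n_f=906, obama_m=391, n_m=621, trials=2000, seed=0):
    """Share of random shuffles with a gender gap at least as large as observed."""
    rng = random.Random(seed)
    observed_gap = abs(obama_f / n_f - obama_m / n_m)
    # 1 = Obama vote, 0 = McCain vote, pooled across both genders
    votes = [1] * (obama_f + obama_m) + [0] * (n_f + n_m - obama_f - obama_m)
    hits = 0
    for _ in range(trials):
        rng.shuffle(votes)                 # break any link between gender and vote
        gap = abs(sum(votes[:n_f]) / n_f - sum(votes[n_f:]) / n_m)
        if gap >= observed_gap:
            hits += 1
    return hits / trials

p_sim = simulate_p()
print(p_sim)
```

The result varies a little with the seed and the number of trials, but it should land in the neighborhood of one or two in a hundred.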
               Probability
• But that isn’t exactly what we want to
  know.
• We want to know the probability that there is a
  difference; what we can calculate is the probability
  of seeing our data if there is no difference.
• Unfortunately that is as good as we can do
                  Probability
• So, what is the null hypothesis (since that is
  where this started)?
• It is the hypothesis that there is no relationship
  (thus, “null”). This is what we can test.
• It is the inverse of what we want to know.
• So, if our theory is right the null hypothesis is
  wrong and we will reject the null hypothesis.
• If our theory is wrong, we will fail to reject the
  null hypothesis (loosely, “accept” it)
• What does this mean?
• How likely are we to see this by chance?
• If there were no difference between genders,
  the probability of seeing this difference is 0.01

              Female      Male
McCain        281 (31%)   230 (37%)
Obama         625 (69%)   391 (63%)
•   0.01
•   That is pretty unlikely, but what does that
    mean?
•   One of three things occurred
    1. The data are wrong
    2. We were really unlucky
    3. The assumption of no relationship is wrong
– We conclude the last one: the assumption of no
  relationship is wrong, so there is a relationship.
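The 0.01 figure for the gender gap table can be approximated with a Pearson chi-square test on the four counts. That choice of test is an assumption here, since the slides do not say which one was used. With one degree of freedom the p-value can be computed from the standard library alone.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n           # count expected if there is no relationship
        chi2 += (observed - expected) ** 2 / expected
    p = erfc(sqrt(chi2 / 2))               # upper tail of chi-square with 1 df
    return chi2, p

# Rows: McCain, Obama. Columns: Female, Male.
chi2, p = chi_square_2x2(281, 230, 625, 391)
print(round(chi2, 2), round(p, 3))
```

The p-value comes out between 0.01 and 0.02, in line with the rounded value on the slide.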
               Probability
• How unlikely does the null have to be for
  us to reject it?
• 1 out of 20 (5%)
• Why?
• Vestige of pre-computer days
  – Norm

				
posted:6/11/2013