Case Study 3 - Surviving the Titanic Disaster
Description
The Titanic disaster, which occurred on the 15th of April 1912, still captures the
interest of film producers, historians and scientists. The carefully designed boat
became the tomb of more than 1500 people. Several characteristics of the passengers
and crew are recorded in the dataset used for this analysis. The dataset contains
2201 subjects.
The data available are coded as follows:

SURVIVED: 0 = Not survived
            1 = Survived
AGE:     0 = Child
         1 = Adult
GENDER: 0 = Male
         1 = Female
CLASS:   1 = First class
         2 = Second class
         3 = Third class
         4 = Crew members

- Which factor(s) is most important in predicting survival rate?
- Which subgroup has the highest survival rate? For instance,
    - Is “Women and children first?” true in emergencies?
    - Did the crew leave the boat last, resulting in a low survival rate in this group?




Suggested approaches:

Approach                        Reason                          Type of questions addressed

Data restructuring
Create a new variable for       To compare the survival         “Is it women and children
women and children              rates of adult males with       first?”
                                the combination of women
                                and children
Create dummy variables for      To use in regression
the class variable

Summary statistics
Survival rates for each         To compare survival rates       “Which subgroup has the
group (e.g. male vs female,     of different groups             highest survival rate?”
or first class vs second
class)
Odds ratio between survival     To quantify the association     “Is survival independent of
status and age; between         between survival and other      passenger characteristics?”
survival status and gender      variables
Cross-table of survival
status and class

Visual displays
Bar charts of survival for      To compare survival rates       “Which subgroup has the
all variables                   of different subgroups          highest survival rate?”
Mosaic plots of all             To check independence among     “Are variables independent?”
categorical variables           variables, and to explore       “Are there any unusually
                                multivariate relations          small or large subgroups?”

Regression
Logistic regression of          To determine the most           “Which factor(s) is most
survival status on other        significant factors in          important in predicting/
variables                       survival from the Titanic       estimating survival rate?”


      Partial solutions

      Raw counts

Not survived   Survived
1490           711

Adult   Child
2092    109

Female  Male
470     1731

1st   2nd   3rd   Crew
325   285   706   885




Survival rate (group total in parentheses)

                    1st           2nd           3rd           Crew          Total
Total               0.625 (325)   0.414 (285)   0.252 (706)   0.24 (885)    0.323 (2201)
Women & Children    0.973 (150)   0.89 (117)    0.422 (244)   0.87 (23)     0.698 (534)
Adult Males         0.326 (175)   0.083 (168)   0.162 (462)   0.223 (862)   0.203 (1667)
Adult Females       0.972 (144)   0.86 (93)     0.46 (165)    0.869 (23)    0.744 (425)
Children            1 (6)         1 (24)        0.34 (79)     - (0)         0.523 (109)
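
As a quick consistency check on the survival-rate table, the Total entry of a row should equal the size-weighted average of that row's class-specific rates. The following is a minimal Python sketch for the Women & Children row; the variable names are illustrative and not part of the original analysis.

```python
# Survival rate and group size per class for the Women & Children row,
# copied from the survival-rate table: (rate, group size).
women_children = {
    "1st": (0.973, 150),
    "2nd": (0.89, 117),
    "3rd": (0.422, 244),
    "crew": (0.87, 23),
}

# Total number of women and children across all classes.
total_n = sum(n for _, n in women_children.values())

# The pooled rate is the size-weighted average of the class rates.
pooled_rate = sum(rate * n for rate, n in women_children.values()) / total_n

print(total_n)
print(pooled_rate)
```

Because the class rates in the table are rounded, the pooled value agrees with the tabulated 0.698 only up to rounding error.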

                          Male Adults    Women & Children     Total
Survived                  338            373                  711
Not survived              1329           161                  1490
Total                     1667           534                  2201
    Estimated odds ratio (adult males vs. women & children):

    OR = (338 * 161) / (1329 * 373) = 0.11

    The odds of adult males surviving the Titanic disaster were about 89% lower
    than the odds of women & children surviving.
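
The odds-ratio arithmetic can be reproduced from the 2x2 table with a short Python sketch. The standard error of log(OR) computed at the end uses the usual large-sample formula (square root of the sum of reciprocal cell counts); it is an addition for illustration and is not part of the calculation shown in this case study.

```python
import math

# 2x2 table: survival status by group.
survived = {"adult_male": 338, "women_children": 373}
not_survived = {"adult_male": 1329, "women_children": 161}

# Odds of survival within each group.
odds_male = survived["adult_male"] / not_survived["adult_male"]
odds_wc = survived["women_children"] / not_survived["women_children"]

# Odds ratio of adult males relative to women and children.
odds_ratio = odds_male / odds_wc

# Large-sample standard error of log(OR): an assumption added here,
# not a quantity reported alongside the odds ratio in the source.
cells = list(survived.values()) + list(not_survived.values())
se_log_or = math.sqrt(sum(1 / c for c in cells))

print(round(odds_ratio, 2))      # odds ratio
print(round(1 - odds_ratio, 2))  # proportional reduction in odds
print(round(se_log_or, 5))       # standard error of log(OR)
```

The standard error obtained this way matches the one reported for the I(Women&Child) coefficient in the logistic regression output, since that coefficient is the log odds ratio of the same table with the groups swapped.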

Logistic regression

logit(P(Yi=1)) = β0 + β1 * I(Women&Child)

                       where I(Women&Child) = 0 for an adult male and 1 for a woman or child

Coefficients:
                        Estimate      Std. Error    z value     Pr(>|z|)
(Intercept)            -1.36914        0.06092      -22.48      <2e-16 ***
I(Women&Child)         2.20931         0.11226      19.68       <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1


AIC: 2338.8
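
With a single binary predictor the model is saturated, so the maximum-likelihood estimates can be written down directly from the 2x2 counts without an iterative fit. The sketch below, in plain Python, illustrates this under that assumption; the function names are hypothetical and it is not the R code that produced the output above.

```python
import math

# Counts (survived, not survived) from the 2x2 table in this case study.
adult_male = (338, 1329)      # I(Women&Child) = 0
women_children = (373, 161)   # I(Women&Child) = 1


def log_odds(survived, died):
    """Log odds of survival for one group."""
    return math.log(survived / died)


# The intercept is the log odds for the reference group (adult males);
# the slope is the difference in log odds between the two groups.
beta0 = log_odds(*adult_male)
beta1 = log_odds(*women_children) - beta0


def group_log_lik(survived, died):
    """Binomial log-likelihood of a group at its observed survival rate."""
    p = survived / (survived + died)
    return survived * math.log(p) + died * math.log(1 - p)


# AIC = -2 * log-likelihood + 2 * (number of estimated parameters).
log_lik = group_log_lik(*adult_male) + group_log_lik(*women_children)
aic = -2 * log_lik + 2 * 2

print(round(beta0, 5), round(beta1, 5), round(aic, 1))
```

The resulting values agree with the reported estimates (-1.36914 and 2.20931) and with the reported AIC of 2338.8.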




    Classification table

                                          Observed
                           Not survived       Survived
Predicted
            Not survived   1329               338      1667
            Survived       161                373      534
            Total          1490               711      2201

Misclassification rate= (338+161)/2201=0.2267
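
The classification table can be rebuilt from the fitted model with a few lines of Python. The sketch assumes the usual cutoff of 0.5 on the fitted probability, which the case study does not state explicitly but which is consistent with the table.

```python
import math

# Fitted coefficients of the single-predictor model.
beta0, beta1 = -1.36914, 2.20931


def predicted_probability(women_or_child):
    """Fitted survival probability; women_or_child is 0 or 1."""
    eta = beta0 + beta1 * women_or_child
    return 1 / (1 + math.exp(-eta))


# Assumed rule: predict "survived" when the fitted probability exceeds 0.5.
predicts_survival = {g: predicted_probability(g) > 0.5 for g in (0, 1)}

# Observed counts per group: (survived, not survived).
counts = {0: (338, 1329), 1: (373, 161)}

# A subject is misclassified when the observed outcome differs
# from the outcome predicted for that subject's group.
errors = sum(
    died if predicts_survival[g] else survived
    for g, (survived, died) in counts.items()
)
total = sum(survived + died for survived, died in counts.values())
misclassification_rate = errors / total

print(predicts_survival)
print(round(misclassification_rate, 4))
```

Under this rule every adult male is predicted not to survive and every woman or child is predicted to survive, which is why the table's rows coincide with the two groups.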


    logit(P(Yi=1)) = β0 + β1*AGE + β2*SEX + β3*I(c1) + β4*I(c2) + β5*I(c3)

    where    SEX is the GENDER variable coded above (0 = male, 1 = female);
             c1 = 0 if crew and 1 otherwise;
             c2 = 0 if 2nd class or crew and 1 otherwise;
             c3 = 0 if 3rd class or crew and 1 otherwise.



    Coefficients:
                      Estimate    Std. Error z value    Pr(>|z|)
    (Intercept)           -0.1724      0.2567    -0.671     0.502
    titanic2[, "AGE"]     -1.0615       0.2440   -4.350     1.36e-05 ***
    titanic2[, "SEX"]     2.4201        0.1404   17.236     < 2e-16 ***
    titanic2[, c1]        -1.9382      0.2535    -7.645     2.09e-14 ***
    titanic2[, c2]        1.0181        0.1960   5.194      2.05e-07 ***
    titanic2[, c3]        1.7778        0.1716   10.362     < 2e-16 ***

    AIC: 2222.1
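
To see how the indicator coding and the coefficient table combine, the following Python sketch computes fitted survival probabilities for individual profiles. The function names are hypothetical, and SEX is assumed to use the same coding as GENDER (0 = male, 1 = female).

```python
import math

# Coefficient estimates copied from the regression output.
COEF = {"intercept": -0.1724, "AGE": -1.0615, "SEX": 2.4201,
        "c1": -1.9382, "c2": 1.0181, "c3": 1.7778}


def class_indicators(cls):
    """Indicator coding from the model definition; cls is 1, 2, 3 or 4 (crew)."""
    c1 = 0 if cls == 4 else 1
    c2 = 0 if cls in (2, 4) else 1
    c3 = 0 if cls in (3, 4) else 1
    return c1, c2, c3


def survival_probability(age, sex, cls):
    """Fitted P(survived); age: 0 child, 1 adult; sex: 0 male, 1 female."""
    c1, c2, c3 = class_indicators(cls)
    eta = (COEF["intercept"] + COEF["AGE"] * age + COEF["SEX"] * sex
           + COEF["c1"] * c1 + COEF["c2"] * c2 + COEF["c3"] * c3)
    return 1 / (1 + math.exp(-eta))


p_male_crew = survival_probability(age=1, sex=0, cls=4)
p_female_first = survival_probability(age=1, sex=1, cls=1)

print(p_male_crew, p_female_first)
```

For an adult male crew member the fitted probability is about 0.23, close to the observed rate of 0.223 in the survival-rate table.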

Akaike Information Criterion (AIC)
A smaller AIC indicates a better fit. The measure is not standardized, so its value
has no interpretation for a single model. For two models estimated from the same data
set, the model with the smaller AIC is preferred.
         Since 2222.1 < 2338.8, this model is a better fit than the previous one.

    Misclassification rate decreased slightly from 0.2267 to 0.2217.


The AGE coefficient is negative, so an adult (AGE=1) has lower odds of survival than a
child (AGE=0), or equivalently a lower probability of survival (odds = p/(1-p), so
odds ↓ implies p ↓ and (1-p) ↑).

Odds ratio of survival for females to males = exp(2.42) = 11.25
  The odds of surviving the Titanic for females were 11.25 times the odds for males,
  holding age and class fixed.

Compare odds of survival for 1st class with 2nd class: OR=exp(1.018)=2.7676

Compare odds of survival for 2nd class with 3rd class: OR=exp(1.778-1.018)=2.14

Compare odds of survival for 3rd class passengers with crew: OR=exp(-1.94+1.018)=0.4
(the odds of survival for 3rd class are 60% lower than for crew!)
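
The odds-ratio comparisons follow from differences in the linear predictor, which depend on how each class maps onto the indicators c1, c2 and c3. A small Python sketch of that bookkeeping, with illustrative names, is:

```python
import math

# Coefficient estimates from the fitted model.
b_sex, b_c1, b_c2, b_c3 = 2.4201, -1.9382, 1.0181, 1.7778

# Contribution of each class to the linear predictor under the stated coding
# (c1 = 1 unless crew, c2 = 1 for 1st or 3rd class, c3 = 1 for 1st or 2nd class).
class_effect = {
    "1st": b_c1 + b_c2 + b_c3,
    "2nd": b_c1 + b_c3,
    "3rd": b_c1 + b_c2,
    "crew": 0.0,
}


def odds_ratio(group_a, group_b):
    """Odds ratio of survival, group_a relative to group_b, same age and sex."""
    return math.exp(class_effect[group_a] - class_effect[group_b])


or_female_vs_male = math.exp(b_sex)
or_1st_vs_2nd = odds_ratio("1st", "2nd")
or_2nd_vs_3rd = odds_ratio("2nd", "3rd")
or_3rd_vs_crew = odds_ratio("3rd", "crew")

print(or_female_vs_male, or_1st_vs_2nd, or_2nd_vs_3rd, or_3rd_vs_crew)
```

Writing the class effects out this way makes it clear why the 1st vs 2nd comparison involves only the c2 coefficient, while 3rd class vs crew involves the sum of the c1 and c2 coefficients.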

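Why does comparing the odds of survival for crew and 3rd class not make much sense?
Class and sex are strongly associated (only 23 of the 885 crew members were women),
so the class comparison is hard to separate from the effect of sex.
Chisq = 349.9, df = 3, p-value = 1.557e-75
   Reject H0: class and sex are independent.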
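
The chi-square statistic can be recomputed by hand from a class-by-sex table. The cell counts used below are not listed in this case study; they are an assumption taken from the commonly published 2201-person Titanic table, and they agree with the class and gender totals given under Raw counts. The sketch computes only the Pearson statistic and degrees of freedom, not the p-value.

```python
# Class-by-sex counts as (male, female). ASSUMPTION: these cells are not
# given in the case study; they come from the widely published Titanic
# table and match the marginal totals reported under Raw counts.
observed = {
    "1st": (180, 145),
    "2nd": (179, 106),
    "3rd": (510, 196),
    "crew": (862, 23),
}

row_totals = {cls: sum(cells) for cls, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_sq = 0.0
for cls, cells in observed.items():
    for j, obs in enumerate(cells):
        expected = row_totals[cls] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)

print(round(chi_sq, 1), df)
```

Most of the statistic comes from the crew row, where far fewer women are observed than independence would predict.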

