
Attribute Interactions in Medical Data Analysis



     A. Jakulin1, I. Bratko1,2, D. Smrke3, J. Demšar1, B. Zupan1,2,4

1.     University of Ljubljana, Slovenia.
2.     Jožef Stefan Institute, Ljubljana, Slovenia.
3.     Dept. of Traumatology, University Clinical Center, Ljubljana, Slovenia.
4.     Dept. of Human and Mol. Genetics, Baylor College of Medicine, USA.
                      Overview
1. Interactions:
  –   Correlation can be generalized to more than two
      attributes, capturing interactions: higher-order
      regularities.
2. Information theory:
  –   A non-parametric approach for measuring
      ‘association’ and ‘uncertainty’.
3. Applications:
  –   Automatic selection of informative visualizations
      uncovers previously unseen structure in medical data.
  –   Automatic constructive induction of new features.
4. Results:
  –   Better predictive models for hip arthroplasty.
  –   Better understanding of the data.
Attribute Dependencies

[Diagram: the label C (outcome, diagnosis) at the top, with the attributes (features) A and B below. The edges A–C and B–C show the importance of attributes A and B; the edge A–B shows attribute correlation. These pairwise edges are the 2-way interactions. The 3-way interaction is what is common to A, B and C together, and cannot be inferred from pairs of attributes.]
Shannon’s Entropy

[Venn diagram of the entropies of A and C, given C’s empirical probability distribution (p = [0.2, 0.8]).]

•   H(A): entropy – the information which came with knowledge of A.
•   H(AC): joint entropy of A and C.
•   I(A;C) = H(A) + H(C) − H(AC): mutual information or information gain – how much A and C have in common.
•   H(C|A) = H(C) − I(A;C): conditional entropy – the remaining uncertainty in C after knowing A.
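These quantities are easy to estimate from data. A minimal sketch in Python (the helper names `entropy` and `mutual_information` are ours, not from the talk), using the slide's distribution p = [0.2, 0.8]:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X), in bits, from a column of observed values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(XY)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# C with the slide's empirical distribution p = [0.2, 0.8]:
c = [0] * 2 + [1] * 8
print(round(entropy(c), 3))  # 0.722 bits
```

Mutual information is symmetric and non-negative, and equals H(X) when Y is a copy of X.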
Interaction Information

    I(A;B;C) := I(AB;C) − I(A;C) − I(B;C)
              = I(A;B|C) − I(A;B)
• Interaction information can be:
   – NEGATIVE – redundancy among attributes (negative int.)
   – NEGLIGIBLE – no interaction
   – POSITIVE – synergy between attributes (positive int.)
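The three cases can be checked on synthetic data. A sketch, using the entropy expansion I(A;B;C) = −H(A) − H(B) − H(C) + H(AB) + H(AC) + H(BC) − H(ABC), which follows from the definition above: XOR gives a positive interaction, duplicated attributes a negative one.

```python
import math
from collections import Counter

def H(*cols):
    """Joint Shannon entropy, in bits, of one or more aligned columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in Counter(rows).values())

def interaction_information(a, b, c):
    """I(A;B;C) = I(AB;C) - I(A;C) - I(B;C), expanded into entropies."""
    return (-H(a) - H(b) - H(c)
            + H(a, b) + H(a, c) + H(b, c)
            - H(a, b, c))

# XOR: each attribute alone says nothing about c; together they determine it.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]
print(interaction_information(a, b, c))   # +1.0 bit: synergy (positive)
print(interaction_information(a, a, a))   # -1.0 bit: redundancy (negative)
```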
    History of Interaction Information

(Partial) history of independent reinventions:

•   McGill ‘54 (Psychometrika)                - interaction information
•   Han ‘80 (Information & Control)           - multiple mutual information
•   Yeung ‘91 (IEEE Trans. Inf. Theory)       - mutual information
•   Grabisch & Roubens ‘99 (game theory)      - Banzhaf interaction index
•   Matsuda ‘00 (Physical Review E)           - higher-order mutual inf.
•   Brenner et al. ‘00 (Neural Computation)   - average synergy
•   Demšar ’02 (machine learning)             - relative information gain
•   Bell ‘03 (NIPS02, ICA2003)                - co-information
•   Jakulin ’03 (machine learning)            - interaction gain
Utility of Interaction Information
1. Visualization of interactions in data
   •   Interaction graphs, dendrograms
2. Construction of predictive models
   •   Feature construction, combination, selection

Case studies:
• Predicting the success of hip arthroplasty (HHS).
• Predicting the contraception method used from
   demographic data (CMC).

Predictive modeling helps us focus only on
    interactions that involve the outcome.
Interaction Matrix for CMC Domain

[Matrix plot illustrating the interaction information for all pairs of attributes: red – positive, blue – negative, green – independent.]
Interaction Graphs

[Graph: nodes are attributes labelled with their information gain, 100% · I(A;C)/H(C); e.g. an attribute that “explains” 1.98% of label entropy. Edges are interactions, labelled with 100% · I(A;B;C)/H(C).]

•   A positive interaction (e.g. +1.85%): the two attributes are in a synergy; treating them holistically may result in 1.85% extra uncertainty explained.
•   A negative interaction (e.g. −1.15%): the two attributes are slightly redundant; 1.15% of label uncertainty is explained by each of the two attributes.
Interaction Dendrogram

[Dendrogram: cluster “tightness” encodes interaction strength, from weakly interacting (loose) to strongly interacting (tight); attribute shading encodes information gain, from uninformative to informative attributes.]
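One way to build such a dendrogram (a sketch, not the authors' exact recipe): turn pairwise interaction magnitudes into dissimilarities, so strongly interacting attribute pairs end up in tight clusters, and feed them to standard hierarchical clustering, here via SciPy. The attribute values and labels are hypothetical toy data.

```python
import math
from collections import Counter
from itertools import combinations
from scipy.cluster.hierarchy import linkage

def H(*cols):
    """Joint Shannon entropy, in bits, of one or more aligned columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in Counter(rows).values())

def interaction(a, b, c):
    """I(A;B;C), expanded into joint entropies."""
    return (-H(a) - H(b) - H(c)
            + H(a, b) + H(a, c) + H(b, c) - H(a, b, c))

# hypothetical toy attributes and class label
data = {
    "A1": [0, 0, 1, 1, 0, 1, 0, 1],
    "A2": [0, 1, 0, 1, 1, 0, 1, 0],
    "A3": [0, 0, 0, 1, 1, 1, 1, 0],
}
label = [0, 1, 1, 0, 1, 1, 0, 0]

names = sorted(data)
# condensed dissimilarity vector, pairs in (i, j), i < j order:
# strong interaction -> small distance -> tight cluster
condensed = [
    1.0 / (1e-6 + abs(interaction(data[i], data[j], label)))
    for i, j in combinations(names, 2)
]
Z = linkage(condensed, method="average")  # (n-1) x 4 merge table
```

`scipy.cluster.hierarchy.dendrogram(Z, labels=names)` would then draw the tree.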
Interpreting the Dendrogram

[Annotated dendrogram pointing out: an unimportant interaction; a cluster of negatively interacting attributes; a positive interaction; a weakly negative interaction; a useless attribute.]
Application to Harris Hip Score (HHS) Prediction

Attribute Structure for HHS

[Diagram of the attribute structure, including intermediate concepts such as late complications and rehabilitation; part of the structure was discovered from data, part designed by the physician.]

“Bipolar endoprosthesis and short duration of operation significantly increase the chances of a good outcome.”

“Presence of neurological disease is a high risk factor only in the presence of other complications during operation.”
          A Positive Interaction




   Both attributes are useless alone, but useful together.
They should be combined into a single feature (e.g. with a
classification tree, a rule or a Cartesian product attribute).
         These two attributes are also correlated:
          correlation doesn’t imply redundancy.
A Negative Interaction

[Plot of wife’s education vs. husband’s education; some combinations have very few instances.]

Once we know the wife’s or the husband’s education, the other attribute will not provide much new information. But they do provide some, if you know how to use it! Feature combination may work; feature selection throws data away.
Prediction of HHS

Brier score - probabilistic evaluation (K classes, N instances):

    BS(p, p̂) = (1/N) · Σ_{i=1..N} Σ_{j=1..K} (p_{i,j} − p̂_{i,j})²
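A minimal sketch of this evaluation, assuming the true class is one-hot encoded as a 0/1 probability vector and the score is averaged over the N instances:

```python
def brier_score(p, p_hat):
    """Brier score: squared difference between the true (0/1) and
    predicted class-probability vectors, averaged over the N instances."""
    n = len(p)
    return sum(
        sum((pij - qij) ** 2 for pij, qij in zip(pi, qi))
        for pi, qi in zip(p, p_hat)
    ) / n

# two instances, K = 3 classes; the true class is one-hot encoded
p     = [[1, 0, 0], [0, 1, 0]]
p_hat = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]]
print(round(brier_score(p, p_hat), 4))  # 0.16
```

Lower is better: a perfect probabilistic prediction scores 0.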
Models:
• Tree-Augmented NBC:                         0.227 ± 0.018
• Naïve Bayesian classifier:                  0.223 ± 0.014
• General Bayesian net:                       0.208 ± 0.006
• Simple feature selection with NBC:          0.196 ± 0.012
• FSS with background concepts:               0.196 ± 0.011
• 10 top interactions → FSS:                  0.189 ± 0.011
   – Tree-Augmented NB:                       0.207 ± 0.017
   – Search for feature comb.:                0.185 ± 0.012
The Best Model

[Structure of the best model.]

These two (not very logical) combinations of features are only worth a 0.2% loss in performance.
   The endoprosthesis and
operation duration interaction
provides little information that
wouldn’t already be provided
by these attributes: it interacts
  negatively with the model.
A Causal Diagram

[Diagram: HHS connected to loss of consciousness, pulmonary disease, sitting ability, diabetes, neurological disease, late luxation, luxation, and hospitalization duration; injury and operation time also appear. The legend distinguishes cause, effect, and moderator.]
[Slide: Orange, the data mining toolkit.]
                       Summary
1. Visualization methods attempt to:
   •   Summarize the relationships between attributes in
       data (interaction graph, interaction dendrogram,
       interaction matrix).
   •   Assist the user in exploring the domain and
       constructing classification models (interactive
       interaction analysis).
2. What to do with interactions:
   •   Do make use of interactions! (rules, trees,
       dependency models)
       •   Myopia: naïve Bayesian classifier, linear SVM, perceptron,
           feature selection, discretization.
   •   Do not assume an interaction when there isn’t one!
       •   Fragmentation: classification trees, rules, general Bayesian
           networks, TAN.

								