Attribute Interactions in Medical Data Analysis

A. Jakulin (1), I. Bratko (1,2), D. Smrke (3), J. Demšar (1), B. Zupan (1,2,4)
1. University of Ljubljana, Slovenia.
2. Jožef Stefan Institute, Ljubljana, Slovenia.
3. Dept. of Traumatology, University Clinical Center, Ljubljana, Slovenia.
4. Dept. of Human and Mol. Genetics, Baylor College of Medicine, USA.

Overview
1. Interactions:
– Correlation can be generalized to more than 2 attributes, to capture interactions: higher-order regularities.
2. Information theory:
– A non-parametric approach to measuring 'association' and 'uncertainty'.
3. Applications:
– Automatic selection of informative visualizations uncovers previously unseen structure in medical data.
– Automatic constructive induction of new features.
4. Results:
– Better predictive models for hip arthroplasty.
– Better understanding of the data.

Attribute Dependencies
Consider the label C (outcome, diagnosis) and two attributes (features) A and B. There are three 2-way dependencies: the importance of attribute A (its association with C), the importance of attribute B, and the attribute correlation between A and B. A 3-way interaction is what is common to A, B and C together, and cannot be inferred from the pairs of attributes alone.

Shannon's Entropy
• H(C) – entropy given C's empirical probability distribution (e.g. p = [0.2, 0.8]).
• H(C|A) = H(C) - I(A;C) – conditional entropy: the uncertainty remaining in C after knowing A.
• H(AC) – joint entropy of A and C together.
• I(A;C) = H(A) + H(C) - H(AC) – mutual information, or information gain: how much A and C have in common; the information about C which came with knowledge of A.

Interaction Information
I(A;B;C) := I(AB;C) - I(A;C) - I(B;C) = I(A;B|C) - I(A;B)
• Interaction information can be:
– NEGATIVE – redundancy among the attributes (a negative interaction)
– NEGLIGIBLE – no interaction
– POSITIVE – synergy between the attributes (a positive interaction)

History of Interaction Information
(Partial) history of independent reinventions:
• McGill '54 (Psychometrika) – interaction information
• Han '80 (Information & Control) – multiple mutual information
• Yeung '91 (IEEE Trans. Inf. Theory) – mutual information
• Grabisch & Roubens '99 (game theory) – Banzhaf interaction index
• Matsuda '00 (Physical Review E) – higher-order mutual information
• Brenner et al. '00 (Neural Computation) – average synergy
• Demšar '02 (machine learning) – relative information gain
• Bell '03 (NIPS02, ICA2003) – co-information
• Jakulin '03 (machine learning) – interaction gain

Utility of Interaction Information
1. Visualization of interactions in data
• Interaction graphs, dendrograms
2. Construction of predictive models
• Feature construction, combination, selection
Case studies:
• Predicting the success of hip arthroplasty (HHS).
• Predicting the contraception method used from demographic data (CMC).
Predictive modeling helps us focus only on the interactions that involve the outcome.

Interaction Matrix for CMC Domain
Illustrates the interaction information for all pairs of attributes: red – positive, blue – negative, green – independent.

Interaction Graphs
• Information gain, 100% × I(A;C)/H(C): the attribute "explains" 1.98% of the label entropy.
• A positive interaction, 100% × I(A;B;C)/H(C): the two attributes are in a synergy; treating them holistically may result in 1.85% extra uncertainty explained.
• A negative interaction, 100% × I(A;B;C)/H(C): the two attributes are slightly redundant; 1.15% of the label uncertainty is explained by each of the two attributes.
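The entropies, mutual information and interaction information above can all be estimated directly from co-occurrence counts in the data. A minimal sketch in Python (function names are ours, not from the poster; entropies in bits, using the identity I(A;B;C) = H(AB) + H(AC) + H(BC) - H(A) - H(B) - H(C) - H(ABC), which follows from the definition above):

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(a, c):
    """I(A;C) = H(A) + H(C) - H(AC)."""
    return entropy(a) + entropy(c) - entropy(list(zip(a, c)))

def interaction_information(a, b, c):
    """I(A;B;C) = I(AB;C) - I(A;C) - I(B;C), in its symmetric expansion."""
    return (entropy(list(zip(a, b))) + entropy(list(zip(a, c)))
            + entropy(list(zip(b, c)))
            - entropy(a) - entropy(b) - entropy(c)
            - entropy(list(zip(a, b, c))))

# The slide's example distribution p = [0.2, 0.8]:
entropy(['x'] + ['y'] * 4)          # ~0.722 bits
# XOR-like synergy: c is determined by a and b together, by neither alone.
interaction_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0])  # +1 bit
```

A positive value signals synergy and a negative value redundancy, exactly as in the interaction graphs above.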
Interaction Dendrogram
(Figure: attributes are clustered by the strength of their interactions. Weakly interacting attributes form loose clusters, strongly interacting ones tight clusters; the cluster "tightness" encodes interaction strength, and each attribute is shaded from uninformative to informative by its information gain.)

Interpreting the Dendrogram
Patterns visible in the dendrogram include: an unimportant interaction, a cluster of negatively interacting attributes, a positive interaction, a weakly negative interaction, and a useless attribute.

Application to the Harris Hip Score Prediction (HHS)
Attribute Structure for HHS: part of the structure (e.g. late complications, rehabilitation) was designed by the physician, part was discovered from data. Examples of discovered regularities:
"A bipolar endoprosthesis and a short duration of operation significantly increase the chances of a good outcome."
"Presence of neurological disease is a high risk factor only in the presence of other complications during the operation."

A Positive Interaction
Both attributes are useless alone, but useful together. They should be combined into a single feature (e.g. with a classification tree, a rule, or a Cartesian product attribute). These two attributes are also correlated: correlation does not imply redundancy.

A Negative Interaction
(Note: very few instances!) Once we know the wife's or the husband's education, the other attribute will not provide much new information. But it does provide some, if you know how to use it! Feature combination may work; feature selection throws data away.

Prediction of HHS
Brier score – probabilistic evaluation (K classes, N instances):

BS(p, p̂) = (1/N) Σ_{i=1..N} Σ_{j=1..K} (p_{i,j} - p̂_{i,j})²

Models (lower is better):
• Tree-Augmented NBC: 0.227 ± 0.018
• Naïve Bayesian classifier: 0.223 ± 0.014
• General Bayesian net: 0.208 ± 0.006
• Simple feature selection with NBC: 0.196 ± 0.012
• FSS with background concepts: 0.196 ± 0.011
• 10 top interactions → FSS: 0.189 ± 0.011
– Tree-Augmented NB: 0.207 ± 0.017
– Search for feature combinations: 0.185 ± 0.012

The Best Model
These two (not very logical) combinations of features are only worth a 0.2% loss in performance.
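The Brier score above can be computed in a few lines. A sketch under the usual multiclass convention (the true class is encoded as a 0/1 indicator vector; the function name and example numbers are ours, not the poster's):

```python
def brier_score(true_labels, pred_probs):
    """Mean over instances of the squared distance between the predicted
    class distribution and the 0/1 indicator of the true class:
    BS = (1/N) * sum_i sum_j (p_ij - p̂_ij)^2."""
    n = len(true_labels)
    total = 0.0
    for y, probs in zip(true_labels, pred_probs):
        for j, p in enumerate(probs):
            target = 1.0 if j == y else 0.0
            total += (p - target) ** 2
    return total / n

# A perfectly confident, correct prediction scores 0; pure guessing scores 0.5.
brier_score([0], [[1.0, 0.0]])          # 0.0
brier_score([0], [[0.5, 0.5]])          # 0.5
brier_score([0, 1], [[0.8, 0.2], [0.3, 0.7]])  # 0.13
```

With this convention, smaller values are better, consistent with the model ranking above.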
The endoprosthesis and operation duration interaction provides little information that wouldn't already be provided by these attributes: it interacts negatively with the model.

A Causal Diagram
(Figure: a causal diagram relating the attributes to the HHS outcome, with arrows marked as cause, effect and moderator. Nodes include injury, operation time, operation duration, diabetes, neurological disease, hospitalization duration, loss of consciousness, pulmonary disease, sitting ability, luxation and late luxation. Built with the Orange toolkit.)

Summary
1. Visualization methods attempt to:
• Summarize the relationships between attributes in the data (interaction graph, interaction dendrogram, interaction matrix).
• Assist the user in exploring the domain and constructing classification models (interactive interaction analysis).
2. What to do with interactions:
• Do make use of interactions! (rules, trees, dependency models)
– Myopia – the failure mode of methods that ignore them: naïve Bayesian classifier, linear SVM, perceptron, feature selection, discretization.
• Do not assume an interaction when there isn't one!
– Fragmentation – the failure mode of methods that model them: classification trees, rules, general Bayesian networks, TAN.
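One concrete way to "make use of interactions", mentioned earlier for positive interactions, is a Cartesian product attribute: replace two synergistic attributes by their joint value. A hypothetical sketch (names and the XOR-style toy data are ours) showing two attributes that are useless alone but informative together:

```python
import math
from collections import Counter

def info_gain(attr, label):
    """I(A;C) = H(A) + H(C) - H(AC), estimated from paired samples (bits)."""
    def H(xs):
        n = len(xs)
        return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())
    return H(attr) + H(label) - H(list(zip(attr, label)))

def cartesian_feature(a, b):
    """Merge two attribute columns into one joint attribute of value pairs."""
    return list(zip(a, b))

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]          # label depends on a and b only jointly
info_gain(a, c)                         # 0.0 bits: useless alone
info_gain(cartesian_feature(a, b), c)   # 1.0 bit: fully informative together
```

A myopic learner such as the naïve Bayesian classifier can exploit the positive interaction only after such a combination; for a negative interaction the same construction mostly wastes data, which is why the poster treats the two cases differently.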