Boosting and predictive modeling

Yoav Freund
Columbia University

University of Washington
What is “data mining”?

Lots of data, complex models:
• Classifying customers using transaction logs.
• Classifying events in high-energy physics experiments.
• Object detection in computer vision.
• Predicting gene regulatory networks.
• Predicting stock prices and portfolio management.
Leo Breiman
“Statistical Modeling: The Two Cultures”, Statistical Science, 2001

• The data modeling culture (generative modeling)
  – Assume a stochastic model (5-50 parameters).
  – Estimate model parameters.
  – Interpret the model and make predictions.
  – Estimated population: 98% of statisticians.
• The algorithmic modeling culture (predictive modeling)
  – Assume the relationship between predictor variables and response variables has a functional form (10^2 - 10^6 parameters).
  – Search (efficiently) for the best prediction function.
  – Make predictions.
  – Interpretation / causation: mostly an afterthought.
  – Estimated population: 2% of statisticians (many in other fields).
Toy example

• A computer receives a telephone call.
• It measures the pitch of the caller's voice.
• It decides the gender of the caller.

[Diagram: Human Voice -> classifier -> Male / Female]
Generative modeling

[Figure: two class-conditional probability densities over voice pitch, with means mean1, mean2 and variances var1, var2.]

Discriminative approach

[Figure: number of mistakes as a function of the decision threshold on voice pitch.]

Ill-behaved data

[Figure: ill-behaved data, where the fitted class densities (mean1, mean2) are misleading but the number-of-mistakes curve over voice pitch still identifies a good threshold.]
Plan of talk

• Boosting: combining weak classifiers.
• Alternating decision trees: a hybrid of boosting and decision trees.
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence-rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary
Batch learning for binary classification

Data distribution: $(x,y) \sim D$, with $y \in \{-1,+1\}$.

GOAL: find $h : X \to \{-1,+1\}$ that minimizes the generalization error
$$\epsilon(h) \doteq P_{(x,y)\sim D}\left[h(x) \neq y\right].$$

Training set: $T = \{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, $T \sim D^m$.

Training error:
$$\hat\epsilon(h) \doteq \frac{1}{m}\sum_{(x,y)\in T} \mathbf{1}\left[h(x)\neq y\right] = P_{(x,y)\sim T}\left[h(x)\neq y\right].$$
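To make the training error concrete, here is a minimal Python sketch (the threshold classifier and the toy "voice pitch" numbers are illustrative placeholders, not from the talk); the generalization error is the same quantity taken under the true distribution D rather than the sample T.

```python
import numpy as np

def training_error(h, X, y):
    """Empirical error of classifier h on a labeled sample (X, y), labels in {-1, +1}."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Hypothetical threshold classifier on a 1-d "voice pitch" feature.
h = lambda pitch: 1 if pitch > 150.0 else -1
X = np.array([120.0, 140.0, 180.0, 200.0])
y = np.array([-1, -1, 1, 1])
print(training_error(h, X, y))  # 0.0 on this toy sample
```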
A weighted training set

$$(x_1,y_1,w_1),\;(x_2,y_2,w_2),\;\dots,\;(x_m,y_m,w_m)$$
A weak learner

A weak learner takes a weighted training set
$$(x_1,y_1,w_1),\;(x_2,y_2,w_2),\;\dots,\;(x_m,y_m,w_m)$$
and produces a weak rule $h$ that maps the instances $x_1,\dots,x_m$ to predictions $\hat y_1,\dots,\hat y_m$, with $\hat y_i \in \{0,1\}$.

The weak requirement (an edge over random guessing):
$$\frac{\sum_{i=1}^{m} w_i\, y_i\, \hat y_i}{\sum_{i=1}^{m} w_i} \;>\; 0$$
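As a concrete illustration of a weak learner, here is a minimal Python sketch of a threshold "decision stump" over a one-dimensional feature. It is a common choice of weak rule, not the specific learner used in the talk, and it returns ±1-valued rules (rather than the {0,1}-valued rules on the slide); it simply maximizes the weighted correlation ("edge") defined above.

```python
import numpy as np

def best_stump(X, y, w):
    """Pick the stump h(x) = s if x > theta else -s (s in {+1,-1}) with the largest
    weighted edge  sum_i w_i * y_i * h(x_i) / sum_i w_i."""
    best = None
    for theta in np.unique(X):
        for s in (+1, -1):
            preds = np.where(X > theta, s, -s)
            edge = np.dot(w, y * preds) / w.sum()
            if best is None or edge > best[0]:
                best = (edge, theta, s)
    edge, theta, s = best
    h = lambda x: np.where(x > theta, s, -s)   # the selected weak rule
    return h, edge
```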
The boosting process

$$\{(x_1,y_1,1),\dots,(x_n,y_n,1)\} \;\to\; \text{weak learner} \;\to\; h_1$$
$$\{(x_1,y_1,w_1^1),\dots,(x_n,y_n,w_n^1)\} \;\to\; \text{weak learner} \;\to\; h_2$$
$$\{(x_1,y_1,w_1^2),\dots,(x_n,y_n,w_n^2)\} \;\to\; \text{weak learner} \;\to\; h_3$$
$$\vdots$$
$$\{(x_1,y_1,w_1^{T-1}),\dots,(x_n,y_n,w_n^{T-1})\} \;\to\; \text{weak learner} \;\to\; h_T$$

$$F_T(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x) + \dots + \alpha_T h_T(x)$$

Final rule: $f_T(x) = \mathrm{sign}\left(F_T(x)\right)$
AdaBoost
Freund, Schapire 1997

$F_0(x) \equiv 0$

for $t = 1 \dots T$:
  $w_i^t = \exp\left(-y_i\, F_{t-1}(x_i)\right)$
  Get $h_t$ from the weak learner.
  $\alpha_t = \frac{1}{2}\ln\!\left(\dfrac{\sum_{i:\,h_t(x_i)=1,\;y_i=+1} w_i^t}{\sum_{i:\,h_t(x_i)=1,\;y_i=-1} w_i^t}\right)$
  $F_{t+1} = F_t + \alpha_t h_t$
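A minimal AdaBoost sketch, reusing the `best_stump` weak learner from the earlier sketch (so the weak rules are ±1-valued stumps on a 1-d feature). The rule weight uses the standard $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ for ±1-valued rules, which plays the role of the log-ratio on the slide; this is an illustration, not the talk's exact implementation.

```python
import numpy as np

def adaboost(X, y, T, weak_learner):
    """Boost a weak learner for T rounds on labels y in {-1, +1}."""
    F = np.zeros(len(X))                     # F_0(x_i) = 0 on the training set
    rules = []
    for t in range(T):
        w = np.exp(-y * F)                   # w_i^t = exp(-y_i F_{t-1}(x_i))
        h, _ = weak_learner(X, y, w)         # get h_t from the weak learner
        preds = h(X)
        eps = np.dot(w, preds != y) / w.sum()           # weighted error of h_t
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        rules.append((alpha, h))
        F = F + alpha * preds                # F_{t+1} = F_t + alpha_t h_t
    return lambda x: np.sign(sum(a * h(x) for a, h in rules))   # f_T = sign(F_T)

# Toy 1-d data: negatives below 3.5, positives above.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([-1, -1, -1, 1, 1, 1])
f = adaboost(X, y, T=5, weak_learner=best_stump)
print(f(np.array([2.5, 5.5])))               # -> [-1.  1.]
```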
Main property of AdaBoost

If the advantages of the weak rules over random guessing are $\gamma_1, \gamma_2, \dots, \gamma_T$, then the training error of the final rule is at most
$$\hat\epsilon\left(f_T\right) \;\le\; \exp\!\left(-\sum_{t=1}^{T} \gamma_t^{\,2}\right).$$
Boosting block diagram

[Diagram: the Strong Learner outputs an Accurate Rule. Internally, a Booster sends example weights to a Weak Learner, which returns a weak rule to the Booster.]
AdaBoost as gradient descent

Each variant replaces the 0-1 loss with a smooth loss that is a function of the margin $y\,F_t(x)$ (mistakes correspond to negative margin, correct predictions to positive margin):

• AdaBoost: $\;e^{-y F_t(x)}$
• LogitBoost: $\;\ln\!\left(1 + e^{-y F_t(x)}\right)$
• BrownBoost: $\;\dfrac{1}{2}\left(1 - \mathrm{erf}\!\left(\dfrac{y F_t(x) + c - t}{\sqrt{c - t}}\right)\right)$
Decision trees

[Figure: a decision tree with root test X>3; if X≤3 predict -1, otherwise test Y>5, predicting -1 if Y≤5 and +1 if Y>5. The corresponding partition of the (X,Y) plane is shown alongside, with the +1 region at X>3, Y>5.]
A decision tree as a sum of weak rules

[Figure: the same tree rewritten as the sign of a sum of weak rules: a baseline score of -0.2, scores +0.1 / -0.1 attached to the two branches of the X>3 test, and scores +0.2 / -0.3 attached to the branches of the Y>5 test. Summing the scores along a path gives the score of the corresponding region of the plane.]
An alternating decision tree
Freund, Mason 1997

[Figure: an alternating decision tree: a root prediction node (0.0) followed by splitter nodes Y<1 and X>3, each with prediction nodes on both branches (e.g. +0.7 / -0.1 and +0.1 / -0.1), and a further splitter Y>5 (with -0.3 / +0.2) below the X>3 branch. The score of an instance is the sum of the prediction nodes on all paths consistent with it; the resulting regions of the plane carry scores such as +0.2, -0.1, -0.3, and +0.7.]
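To make the "sum of prediction nodes along all consistent paths" rule concrete, here is a minimal sketch of evaluating a small alternating decision tree; the tree structure and the numbers are illustrative (taken loosely from the figure above), not a tree learned by the algorithm.

```python
def adt_score(x):
    """Score of a toy alternating decision tree on a point x = (X, Y).
    Every splitter reached contributes the prediction node of the branch
    consistent with x; the root prediction node is always added."""
    X, Y = x
    score = 0.0                           # root prediction node
    score += 0.7 if Y < 1 else -0.1       # splitter: Y < 1
    if X > 3:                             # splitter: X > 3
        score += 0.1
        score += 0.2 if Y > 5 else -0.3   # nested splitter under X > 3
    else:
        score += -0.1
    return score

def adt_predict(x):
    return 1 if adt_score(x) > 0 else -1

print(adt_predict((4.0, 6.0)), adt_score((4.0, 6.0)))   # 1, 0.2 (a positive region)
```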
Example: medical diagnostics

• “Cleve” dataset from the UC Irvine database.
• Heart disease diagnostics (+1 = healthy, -1 = sick).
• 13 features from tests (real-valued and discrete).
• 303 instances.
AD-tree for heart-disease diagnostics

[Figure: the learned alternating decision tree; a total score >0 predicts Healthy, <0 predicts Sick.]
AT&T “buisosity” problem
Freund, Mason, Rogers, Pregibon, Cortes 2000

• Distinguish business from residence customers using call detail information (time of day, length of call, …).
• 230M telephone numbers, label unknown for ~30%.
• 260M calls / day.
• Required computer resources:
  – Huge: counting log entries to produce statistics; uses specialized I/O-efficient sorting algorithms (Hancock).
  – Significant: calculating the classification for ~70M customers.
  – Negligible: learning (2 hours on 10K training examples on an off-line computer).
AD-tree for “buisosity”

[Figure: the alternating decision tree learned for the buisosity problem.]

AD-tree (detail)

[Figure: a zoomed-in portion of the buisosity AD-tree.]
Quantifiable results

Varying the score threshold, define:
$$\text{Precision} = \frac{\text{TPos}}{\text{TPos} + \text{FPos}}, \qquad \text{Recall} = \frac{\text{TPos}}{\text{AllPos}}$$

[Figure: precision vs. recall curve.]

• For accuracy 94%, increased coverage from 44% to 56%.
• Saved AT&T $15M in the year 2000 in operations costs and missed opportunities.
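A minimal sketch of the two quantities above for a score-thresholded rule; the scores, labels, and threshold are made-up placeholders.

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall of the rule 'predict positive if score > threshold'."""
    predicted_pos = scores > threshold
    true_pos = np.sum(predicted_pos & (labels == 1))     # TPos
    false_pos = np.sum(predicted_pos & (labels == -1))   # FPos
    all_pos = np.sum(labels == 1)                        # AllPos
    precision = true_pos / max(true_pos + false_pos, 1)
    recall = true_pos / max(all_pos, 1)
    return precision, recall

scores = np.array([2.1, 0.3, -0.5, 1.7, -1.2])
labels = np.array([1, -1, -1, 1, -1])
print(precision_recall(scores, labels, threshold=0.0))   # (0.666..., 1.0)
```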
The database bottleneck

• Physical limit: a disk “seek” takes 0.01 sec
  – the same time it takes to read/write 10^5 bytes
  – the same time it takes to perform 10^7 CPU operations
• Commercial DBMSs are optimized for varying queries and transactions.
• Statistical analysis requires evaluation of fixed queries on massive data streams.
• Keeping disk I/O sequential is key.
• Data compression improves I/O speed but restricts random access.
CS theory regarding very large data-sets

• Massive datasets: “You pay 1 per disk block you read/write per CPU operation. Internal memory can store N disk blocks.”
  – Example problem: given a stream of line segments (in the plane), identify all segment pairs that intersect.
  – Vitter, Motwani, Indyk, …
• Property testing: “You can only look at a small fraction of the data.”
  – Example problem: decide whether a given graph is bipartite by testing only a small fraction of the edges.
  – Rubinfeld, Ron, Sudan, Goldreich, Goldwasser, …
A very curious phenomenon

[Figure: training and test error curves for boosting decision trees.]

Using <10,000 training examples we fit >2,000,000 parameters.
Large margins

$$\mathrm{margin}_{F_T}(x,y) \;\doteq\; \frac{y\, F_T(x)}{\sum_{t=1}^{T} |\alpha_t|} \;=\; \frac{y \sum_{t=1}^{T} \alpha_t h_t(x)}{\sum_{t=1}^{T} |\alpha_t|} \;\in\; [-1, +1]$$

$$\mathrm{margin}_{F_T}(x,y) > 0 \;\iff\; f_T(x) = y$$

Thesis: large margins => reliable predictions.
Very similar to SVM.
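A small sketch of the normalized margin for one example; the rule weights and the ±1 outputs of the weak rules on that example are passed in explicitly, and the numbers are illustrative.

```python
import numpy as np

def normalized_margin(alphas, rule_outputs, y):
    """margin_{F_T}(x, y) = y * sum_t alpha_t h_t(x) / sum_t |alpha_t|, always in [-1, +1]."""
    F = np.dot(alphas, rule_outputs)          # F_T(x)
    return y * F / np.sum(np.abs(alphas))

# Three weak rules voting on one example whose true label is y = +1.
print(normalized_margin(np.array([0.5, 0.3, 0.2]), np.array([+1, +1, -1]), y=+1))  # 0.6
```

A positive margin means the combined rule classifies the example correctly; values near +1 mean nearly all of the weight votes for the right label.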
Experimental evidence

[Figure: experimental evidence for the large-margins thesis.]
Theorem
Schapire, Freund, Bartlett & Lee, Annals of Statistics, 1998

$H$: a set of binary functions with VC-dimension $d$.
$$C = \left\{ \textstyle\sum_i \alpha_i h_i \;\middle|\; h_i \in H,\ \alpha_i \ge 0,\ \textstyle\sum_i \alpha_i = 1 \right\}$$
$$T = \{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}; \quad T \sim D^m$$

For all $c \in C$ and all $\theta > 0$, with probability $1-\delta$ w.r.t. $T \sim D^m$:
$$P_{(x,y)\sim D}\left[\mathrm{sign}(c(x)) \neq y\right] \;\le\; P_{(x,y)\sim T}\left[\mathrm{margin}_c(x,y) \le \theta\right] + \tilde O\!\left(\frac{\sqrt{d/m}}{\theta}\right) + O\!\left(\sqrt{\tfrac{1}{m}\log\tfrac{1}{\delta}}\right)$$

No dependence on the number of combined functions!
Idea of proof

[Figure: sketch of the proof of the margin bound.]
A motivating example

[Figure: + and - training examples scattered in the plane, with “?” query points; regions where the two labels mix, or where there is no data, are marked “Unsure”.]
The algorithm
Freund, Mansour, Schapire, Annals of Statistics, August 2004

Parameters: $\eta > 0,\ \Delta > 0$

Hypothesis weight: $\;w(h) \doteq e^{-\eta\,\hat\epsilon(h)}$

Empirical log ratio:
$$\hat l(x) \;\doteq\; \frac{1}{\eta}\,\ln\!\left(\frac{\sum_{h:\,h(x)=+1} w(h)}{\sum_{h:\,h(x)=-1} w(h)}\right)$$

Prediction rule:
$$\hat p_{\eta,\Delta}(x) \;=\; \begin{cases} +1 & \text{if } \hat l(x) > \Delta \\ \{-1,+1\}\ \text{(unsure)} & \text{if } |\hat l(x)| \le \Delta \\ -1 & \text{if } \hat l(x) < -\Delta \end{cases}$$
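A minimal sketch of this prediction rule for a finite hypothesis set; the hypotheses, their training errors, and the parameter values are placeholders, and abstention is returned as 0 rather than the set {-1,+1}.

```python
import numpy as np

def confidence_rated_predict(x, hypotheses, train_errors, eta, delta):
    """Weight each hypothesis by exp(-eta * training error), compute the empirical
    log ratio, and abstain when it falls inside [-delta, +delta]."""
    w = np.exp(-eta * np.array(train_errors))            # w(h)
    votes = np.array([h(x) for h in hypotheses])          # each h(x) in {-1, +1}
    w_plus = w[votes == +1].sum()
    w_minus = w[votes == -1].sum()
    l_hat = np.log((w_plus + 1e-12) / (w_minus + 1e-12)) / eta   # empirical log ratio
    if l_hat > delta:
        return +1
    if l_hat < -delta:
        return -1
    return 0   # abstain ("unsure")

# Toy hypothesis set: three threshold rules on a scalar x, with made-up training errors.
hypotheses = [lambda x: 1 if x > 0 else -1,
              lambda x: 1 if x > 1 else -1,
              lambda x: 1 if x > -1 else -1]
print(confidence_rated_predict(0.5, hypotheses, [0.1, 0.2, 0.15], eta=10.0, delta=0.05))
```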
Suggested tuning

$H$ is a finite set; $\delta$ = probability of failure; $m$ = size of the training set.

Setting
$$\eta_0 = 4\left(\frac{m}{\ln|H| + \ln(1/\delta)}\right)^{1/2}$$
yields:

1) $\displaystyle P(\text{mistake}) = P_{(x,y)\sim D}\left[y \notin \hat p(x)\right] \;\le\; 2\,\epsilon(h^*) + O\!\left(\frac{\ln m}{m^{1/2}}\right)$

2) For $m = \Omega\!\left(\frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)\right)$:
$\displaystyle \quad P(\text{abstain}) = P_{(x,y)\sim D}\left[\hat p(x) = \{-1,+1\}\right] \;\le\; 5\,\epsilon(h^*) + O\!\left(\frac{\ln\frac{1}{\delta} + \ln|H|}{m^{1/2}}\right)$
Confidence-rating block diagram

[Diagram: the training examples $(x_1,y_1),\dots,(x_m,y_m)$ and a set of candidate rules are fed to a Rater-Combiner, which outputs a confidence-rated rule.]
Summary of confidence-rated classifiers

• A frequentist explanation for the benefits of model averaging.
• Separates inherent uncertainty from uncertainty due to the finite training set.
• Computational hardness: unknown other than in a few special cases.
• Margins from boosting or SVM can be used as an approximation.
• Many practical applications!
Face detection: using confidence to save time
Viola & Jones 1999

• Paul Viola and Mike Jones developed a face detector that works in real time (15 frames per second).

[Embedded video demonstration omitted.]
Image features

“Rectangle filters”: similar to Haar wavelets (Papageorgiou et al.).

$$h_t(x_i) = \begin{cases} \alpha_t & \text{if } f_t(x_i) > \theta_t \\ \beta_t & \text{otherwise} \end{cases}$$

$$60{,}000 \times 100 = 6{,}000{,}000 \text{ unique binary features}$$
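A small sketch of the thresholded-feature weak classifier $h_t$ above; the feature function (a crude stand-in for a rectangle-filter response) and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def make_weak_classifier(f, theta, alpha, beta):
    """h(x) = alpha if f(x) > theta else beta, the form of the rectangle-filter rules."""
    return lambda x: alpha if f(x) > theta else beta

# Hypothetical rectangle-filter response: sum of the top half of a patch minus the bottom half.
f = lambda patch: patch[:2].sum() - patch[2:].sum()
h = make_weak_classifier(f, theta=10.0, alpha=+0.8, beta=-0.3)

patch = np.array([[9, 9], [9, 9], [1, 1], [1, 1]])
print(h(patch))   # f(patch) = 36 - 4 = 32 > 10, so h returns +0.8
```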
Example classifier for face detection

A classifier with 200 rectangle features was learned using AdaBoost.

95% correct detection on the test set, with 1 in 14,084 false positives.

Not quite competitive...

[Figure: ROC curve for the 200-feature classifier.]
Employing a cascade to minimize average detection time

The accurate detector combines 6,000 simple features using AdaBoost.

In most boxes, only 8-9 features are calculated.

[Diagram: all boxes are passed through features 1-3; boxes that are “definitely not a face” are rejected immediately, while boxes that “might be a face” continue to features 4-10, and so on.]
Using confidence to avoid labeling
Levin, Viola, Freund 2003
[Image: Image 1.]
[Image: Image 1, difference from the time average.]
[Image: Image 2.]
[Image: Image 2, difference from the time average.]
Co-training
Blum and Mitchell, 1998

[Diagram: highway images are fed to two partially trained classifiers, one based on the grey-scale (B/W) image and one based on the difference image; each classifier's confident predictions are used to further train the other.]
[Figure: grey-scale detection score vs. subtract-average detection score, for cars and non-cars.]
Co-training results

[Figure: raw-image detector and difference-image detector, before and after co-training.]
Gene regulation

• Regulatory proteins bind to the non-coding regulatory sequence of a gene to control its rate of transcription.

[Diagram: regulators bind to binding sites in the DNA upstream of a gene; transcription produces the mRNA transcript, the measurable quantity.]

From mRNA to protein

[Diagram: the ribosome translates the mRNA transcript into a protein sequence, which then folds into a protein.]

Protein transcription factors

[Diagram: a folded protein acting as a regulator (transcription factor).]
Genome-wide expression data

• Microarrays measure mRNA transcript expression levels for all of the ~6,000 yeast genes at once.
• Very noisy data.
• A rough time slice over all compartments of many cells.
• Protein expression is not observed.
Partial “parts list” for yeast

Many known and putative:
• Transcription factors (TF)
• Signaling molecules (SM) that activate transcription factors
• Known and putative binding site “motifs”
• In yeast, regulatory sequence = the 500 bp upstream region
GeneClass: problem formulation
M. Middendorf, A. Kundaje, C. Wiggins, Y. Freund, C. Leslie.
Predicting Genetic Regulatory Response Using Classification. ISMB 2004.

Predict target gene regulatory response from regulator activity and binding-site data.

[Diagram: for each microarray experiment, the expression of the “parent” (regulator) genes R1..Rp and the binding sites (motifs) in the upstream region of each target gene G1..Gt are the inputs; the target gene's expression in that microarray is the output.]
Role of quantization

By quantizing expression into three classes {-1, 0, +1} we reduce noise but maintain most of the signal.

Weighting +1/-1 examples linearly with expression level performs slightly better.
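A minimal sketch of the three-class quantization; the cutoff value is an illustrative placeholder, since the slide does not specify the thresholds used in GeneClass.

```python
import numpy as np

def quantize_expression(log_ratios, cutoff=0.5):
    """Map log expression ratios to {-1, 0, +1}: down-regulated, baseline, up-regulated.
    The cutoff here is a hypothetical value chosen for illustration."""
    q = np.zeros_like(log_ratios, dtype=int)
    q[log_ratios > cutoff] = 1
    q[log_ratios < -cutoff] = -1
    return q

print(quantize_expression(np.array([-1.2, -0.1, 0.3, 0.9])))  # [-1  0  0  1]
```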
Problem setup

• Data point = (target gene, microarray) pair.
• Input features:
  – Parent state {-1, 0, +1}
  – Motif presence {0, 1}
• Output to predict:
  – Target gene state {-1, +1}
Boosting with alternating decision trees (ADTs)

• Use boosting to build a single ADT, a margin-based generalization of a decision tree.

[Diagram: an ADT whose splitter nodes test conditions such as “Is Motif_MIG1 present AND Parent_XBP1 up?”; F(x) is given by the sum of the prediction nodes along all paths consistent with x.]
Statistical validation

• 10-fold cross-validation experiments, ~50,000 (gene, microarray) training examples.
• Significant correlation between prediction score and true log expression ratio on held-out data.
• Prediction accuracy on +1/-1 labels: 88.5%.
Biological interpretation: from correlation to causation

• Good prediction only implies correlation.
• To infer causation we need to integrate additional knowledge.
• Comparative case studies: train on similar conditions (stresses), test on related experiments.
• Extract significant features from the learned model:
  – Iteration score (IS): the boosting iteration at which a feature first appears; identifies significant motifs and motif-parent pairs.
  – Abundance score (AS): the number of nodes in the ADT containing the feature; identifies important regulators.
• In silico knock-outs: remove a significant regulator and retrain.
Case study: heat shock and osmolarity

• Training set: heat shock, osmolarity, amino acid starvation.
• Test set: stationary phase, simultaneous heat shock + osmolarity.
• Results:
  – Test error = 9.3%.
  – Supports the Gasch hypothesis: the heat shock and osmolarity pathways are independent and additive.
  – High-scoring parents (AS): USV1 (stationary phase and heat shock), PPT1 (osmolarity response), GAC1 (response to heat).
Case study: heat shock and osmolarity (continued)

• Results:
  – High-scoring binding sites (IS):
    MSN2/MSN4 STRE element;
    heat shock related: HSF1 and RAP1 binding sites;
    osmolarity/glycerol pathways: CAT8, MIG1, GCN4;
    amino acid starvation: GCN4, CHA4, MET31.
  – High-scoring motif-parent pair (IS): the TPK1~STRE pair (a kinase that regulates MSN2 via cellular localization), an indirect effect.

[Diagram: three ways a motif-parent pair can arise: direct binding, indirect effect, and co-occurrence.]
Case study: in silico knockout

• Training and test sets: same as the heat shock and osmolarity case study.
• Knockout: remove USV1 from the regulator list and retrain.
• Results:
  – Test error: 12% (up from 9%).
  – Identify putative downstream targets of USV1: target genes that change from a correct to an incorrect label.
  – GO annotation analysis reveals putative functions: nucleoside transport, cell-wall organization and biogenesis, heat-shock protein activity.
  – Putative functions match those identified in a wet-lab USV1 knockout (Segal et al., 2003).
Conclusions: gene regulation

• A new predictive model for the study of gene regulation:
  – The first gene regulation model to make quantitative predictions.
  – Uses actual expression levels, no clustering.
  – Strong prediction accuracy on held-out experiments.
  – Interpretable hypotheses: significant regulators, binding motifs, regulator-motif pairs.
• A new methodology for biological analysis: comparative training/test studies, in silico knockouts.
Summary

• Moving from density estimation to classification can make hard problems tractable.
• Boosting is an efficient and flexible method for constructing complex and accurate classifiers.
• I/O is the main bottleneck to data-mining.
  – Sampling, data localization and parallelization help.
• Correlation -> causation is still a hard problem; it requires domain-specific expertise and integration of data sources.
Future work

• New applications:
  – Bio-informatics.
  – Vision / speech and signal processing.
  – Information retrieval and information extraction.
• Theory:
  – Improving the robustness of learning algorithms.
  – Utilization of unlabeled examples in confidence-rated classification.
  – Sequential experimental design.
  – Relationships between learning algorithms and stochastic differential equations.
Extra
Plan of talk (extra)

• High-energy physics: analysis for the MiniBooNE experiment.
Analysis for the MiniBooNE experiment

• Goal: to test for neutrino mass by searching for neutrino oscillations.
• Important because it may lead us to physics beyond the Standard Model.
• The BooNE project began in 1997.
• The first beam-induced neutrino events were detected in September 2002.

[Photo: the MiniBooNE detector (Fermi Lab).]
MiniBooNE classification task
Ion Stancu, UC Riverside

[Diagram: a simulation of the MiniBooNE detector goes through reconstruction to produce a feature vector x of 52 reals, which a neural network (52 inputs, 26 hidden units) classifies as a ν_e event vs. other.]
Results

[Figure: classification results for the MiniBooNE task.]
Using confidence to reduce labeling

[Diagram: unlabeled data is scored by a partially trained classifier; a sample of the unconfident examples is labeled and added to the labeled examples used for further training.]

Query-by-committee: Seung, Opper & Sompolinsky; Freund, Seung, Shamir & Tishby.
Results from Yotam Abramson

[Figure omitted.]

				