Microarray Data Analysis - Classification

Chien-Yu Chen
Graduate School of Biotechnology and Bioinformatics, Yuan Ze University
(Many slides adapted from the lecture notes of Dr. Yen-Jen Oyang.)
Class Prediction

- It is reasonable to conjecture that the tumor classes can be distinguished by comparing their gene expression profiles.
- Class prediction (supervised classification) techniques can be applied to learn a classification rule that discriminates between the classes.
- The learned rule can then be used to predict the class of a new tumor of unknown class from its gene expression profile.
Model of Microarray Data Sets

[Diagram: the data set is arranged as a matrix with one row per sample and one column per gene (Gene 1, Gene 2, ..., Gene n). The rows are grouped by class: samples S11, S12, S13, ... belong to Class 1; S21, S22, S23, ... to Class 2; S31, S32, S33, ... to Class 3. The entry v_{i,j} is the expression value of gene j in sample i.]
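As a concrete (hypothetical) illustration of this layout, the matrix can be stored as a samples-by-genes array together with a class-label vector; the numbers below are made up solely for illustration.

import numpy as np

# Hypothetical expression matrix: one row per sample, one column per gene.
# X[i, j] plays the role of v_{i,j}, the expression value of gene j in sample i.
X = np.array([
    [2.1, 0.3, 1.7],   # S11 (Class 1)
    [1.9, 0.4, 1.5],   # S12 (Class 1)
    [0.2, 2.8, 0.1],   # S21 (Class 2)
    [0.3, 2.6, 0.2],   # S22 (Class 2)
    [1.0, 1.1, 3.0],   # S31 (Class 3)
])

# Class label of each sample, aligned with the rows of X.
y = np.array([1, 1, 2, 2, 3])

print(X.shape)  # (5, 3): 5 samples, 3 genes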
Learning From Training Data

- A classification task usually involves training and testing data, each consisting of a number of data instances.
- Each instance in the training set contains one "target value" (the class label) and several "attributes" (the features).




Testing Procedure

- The goal of a classifier is to produce a model that predicts the target value of the instances in the testing set, for which only the attributes are given.




Cross Validation

- Most data classification algorithms require some parameters to be set, e.g., k in the KNN classifier or the pruning threshold in a decision tree.
- One way to find an appropriate parameter setting is through k-fold cross validation, normally with k = 10.
- In k-fold cross validation, the training data set is divided into k subsets. Then k runs of the classification algorithm are conducted, with each subset serving as the test set once while the remaining (k - 1) subsets are used as the training set.
- The parameter values that yield the maximum accuracy in cross validation are then adopted.

- In the cross validation process, we set the parameters of the classifier to a particular combination of values of interest and then evaluate how good that combination is, using one of the following schemes.
- With the leave-one-out cross validation scheme, we attempt to predict the class of each sample using all the remaining samples as the training data set.
- With 10-fold cross validation, we evenly divide the training data set into 10 subsets. Each time, we test the prediction accuracy on one of the 10 subsets, using the other 9 subsets as the training set.

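A minimal sketch of the two schemes described above, assuming scikit-learn is available; the data set here is a random placeholder.

import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))        # 30 samples, 5 features (placeholder data)
y = np.repeat([0, 1], 15)           # binary class labels

# 10-fold cross validation: every sample is used for testing exactly once.
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # ... fit a classifier on (X_train, y_train) and evaluate it on (X_test, y_test)

# Leave-one-out: each single sample is predicted from all the others.
for train_idx, test_idx in LeaveOneOut().split(X):
    pass  # same idea, with a test set of size 1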
Naive Bayes Classifier

- The naive Bayes classifier assigns an instance s_k with attribute values (A_1 = v_1, A_2 = v_2, ..., A_m = v_m) to the class C_i that maximizes Prob(C_i | (v_1, v_2, ..., v_m)) over all i.
- The naive Bayes classifier exploits Bayes' rule and assumes that the attributes are independent given the class.



- Likelihood of s_k belonging to C_i:

  $\mathrm{Prob}(C_i \mid v_1, v_2, \ldots, v_m) = \dfrac{P(v_1, v_2, \ldots, v_m \mid C_i)\, P(C_i)}{P(v_1, v_2, \ldots, v_m)}$

- Likelihood of s_k belonging to C_j:

  $\mathrm{Prob}(C_j \mid v_1, v_2, \ldots, v_m) = \dfrac{P(v_1, v_2, \ldots, v_m \mid C_j)\, P(C_j)}{P(v_1, v_2, \ldots, v_m)}$

- Therefore, when comparing Prob(C_i | (v_1, v_2, ..., v_m)) and Prob(C_j | (v_1, v_2, ..., v_m)), we only need to compute P((v_1, v_2, ..., v_m) | C_i) P(C_i) and P((v_1, v_2, ..., v_m) | C_j) P(C_j), since the denominators are identical.

- Under the assumption of independent attributes,

  $P(v_1, v_2, \ldots, v_m \mid C_j) = P(A_1 = v_1 \mid C_j)\, P(A_2 = v_2 \mid C_j) \cdots P(A_m = v_m \mid C_j) = \prod_{h=1}^{m} P(A_h = v_h \mid C_j)$

- Furthermore, P(C_j) can be computed by

  $P(C_j) = \dfrac{\text{number of training samples belonging to } C_j}{\text{total number of training samples}}$
An Example of the Naive Bayes Classifier

The weather data, with counts and probabilities:

              outlook           temperature           humidity             windy            play
              yes   no          yes   no              yes   no            yes   no         yes    no
  sunny        2     3   hot     2     2    high       3     4    false    6     2          9      5
  overcast     4     0   mild    4     2    normal     6     1    true     3     3
  rainy        3     2   cool    3     1
  sunny       2/9   3/5  hot    2/9   2/5   high      3/9   4/5   false   6/9   2/5       9/14   5/14
  overcast    4/9   0/5  mild   4/9   2/5   normal    6/9   1/5   true    3/9   3/5
  rainy       3/9   2/5  cool   3/9   1/5

A new day:

  outlook   temperature   humidity   windy   play
  sunny     cool          high       true    ?
- Likelihood of yes:

  $\dfrac{2}{9} \times \dfrac{3}{9} \times \dfrac{3}{9} \times \dfrac{3}{9} \times \dfrac{9}{14} \approx 0.0053$

- Likelihood of no:

  $\dfrac{3}{5} \times \dfrac{1}{5} \times \dfrac{4}{5} \times \dfrac{3}{5} \times \dfrac{5}{14} \approx 0.0206$

- Therefore, the prediction is no.



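The two likelihoods above can be checked by multiplying the conditional probabilities read off the count table; the short script below (not part of the original slides) does exactly that.

# Conditional probabilities from the weather table for the new day
# (outlook = sunny, temperature = cool, humidity = high, windy = true).
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ~0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ~0.0206

print(f"likelihood of yes: {p_yes:.4f}")
print(f"likelihood of no : {p_no:.4f}")
print("prediction:", "yes" if p_yes > p_no else "no")   # -> no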
The Naive Bayes Classifier for Data Sets with Numerical Attribute Values

- One common practice for handling numerical attribute values is to assume that each numerical attribute follows a normal distribution within each class.




The numeric weather data with summary statistics:

              outlook              temperature                humidity              windy            play
              yes   no             yes     no                 yes     no           yes   no         yes    no
  sunny        2     3              83     85                  86     85    false    6     2          9      5
  overcast     4     0              70     80                  96     90    true     3     3
  rainy        3     2              68     65                  80     70
                                    64     72                  65     95
                                    69     71                  70     91
                                    75                         80
                                    75                         70
                                    72                         90
                                    81                         75
  sunny       2/9   3/5   mean      73     74.6    mean       79.1    86.2  false   6/9   2/5      9/14   5/14
  overcast    4/9   0/5   std dev   6.2    7.9     std dev    10.2     9.7  true    3/9   3/5
  rainy       3/9   2/5
- Let x_1, x_2, ..., x_n be the values of a numerical attribute in the training data set. Its mean and variance are

  $\mu = \dfrac{1}{n} \sum_{i=1}^{n} x_i, \qquad \sigma^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2$

  and the corresponding normal density is

  $f(w) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(w-\mu)^2}{2\sigma^2}}$
- For example,

  $f(\text{temperature} = 66 \mid \text{Yes}) = \dfrac{1}{\sqrt{2\pi}\,(6.2)}\, e^{-\frac{(66-73)^2}{2 \cdot 6.2^2}} \approx 0.0340$

- Likelihood of Yes $= \dfrac{2}{9} \times 0.0340 \times 0.0221 \times \dfrac{3}{9} \times \dfrac{9}{14} \approx 0.000036$

- Likelihood of No $= \dfrac{3}{5} \times 0.0291 \times 0.038 \times \dfrac{3}{5} \times \dfrac{5}{14} \approx 0.000136$




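The densities above come straight from the normal formula; the sketch below (not from the slides) reproduces f(temperature = 66 | Yes) ≈ 0.0340 with the mean and standard deviation from the summary table, and assumes the new day's humidity is 90, which matches the 0.0221 factor used on the slide.

import math

def normal_density(w, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at w."""
    return math.exp(-(w - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Mean/std dev values taken from the summary table above.
f_temp_yes = normal_density(66, 73.0, 6.2)    # ~0.0340
f_hum_yes  = normal_density(90, 79.1, 10.2)   # ~0.0221 (humidity = 90 assumed for the new day)

likelihood_yes = (2/9) * f_temp_yes * f_hum_yes * (3/9) * (9/14)   # ~0.000036
print(f"{f_temp_yes:.4f}  {f_hum_yes:.4f}  {likelihood_yes:.6f}")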
Instance-Based Learning

- In instance-based learning, we take the k nearest training samples of a new instance (v_1, v_2, ..., v_m) and assign the new instance to the class that has the most instances among those k nearest training samples.
- Classifiers that adopt instance-based learning are commonly called KNN (k-nearest-neighbor) classifiers.



- The basic version of the KNN classifier works only for data sets with numerical values. However, extensions have been proposed for handling data sets with categorical attributes.
- If the number of training samples is sufficiently large, then it can be shown statistically that the KNN classifier can deliver accuracy close to the best achievable by learning from the training data set.
- However, if the number of training samples is not large enough, the KNN classifier may not work well.

- If the data set is noiseless, then the 1NN classifier should work well. In general, the noisier the data set, the larger k should be set. However, the optimal k value should be determined through cross validation.
- The ranges of the attribute values should be normalized before the KNN classifier is applied. There are two common normalization approaches (a KNN sketch using the second one follows this slide):

  $w = \dfrac{v - v_{\min}}{v_{\max} - v_{\min}}$

  $w = \dfrac{v - \mu}{\sigma}$, where $\mu$ and $\sigma^2$ are the mean and the variance of the attribute values, respectively.

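A minimal KNN sketch with the z-score normalization described above, assuming scikit-learn; the data are random placeholders and k = 3 is arbitrary.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))   # 40 training samples, 3 attributes (placeholder data)
y_train = np.repeat([0, 1], 20)      # two classes
X_test = rng.normal(size=(5, 3))

# Normalize each attribute to zero mean and unit variance (the w = (v - mu) / sigma scheme).
scaler = StandardScaler().fit(X_train)
X_train_n = scaler.transform(X_train)
X_test_n = scaler.transform(X_test)

# 3-nearest-neighbor vote; in practice k should be chosen by cross validation.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train_n, y_train)
print(knn.predict(X_test_n))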
Example of the KNN Classifiers

[Figure: a scatter plot of "X" and "O" training points with a marked test instance.]

- If a 1NN classifier is employed, the test instance is predicted to be "X".
- If a 3NN classifier is employed, the test instance is predicted to be "O".
Alternative Similarity Functions

- Let <v_{r,1}, v_{r,2}, ..., v_{r,n}> and <v_{t,1}, v_{t,2}, ..., v_{t,n}> be the gene expression vectors, i.e., the feature vectors, of samples S_r and S_t, respectively. Then the following alternative similarity functions can be employed:
  - Euclidean distance:

    $\text{dissimilarity} = \sqrt{\sum_{h=1}^{n} (v_{r,h} - v_{t,h})^2}$

  - Cosine:

    $\text{similarity} = \dfrac{\sum_{h=1}^{n} v_{r,h}\, v_{t,h}}{\sqrt{\sum_{h=1}^{n} v_{r,h}^2}\; \sqrt{\sum_{h=1}^{n} v_{t,h}^2}}$

  - Correlation coefficient:

    $\text{similarity} = \dfrac{\frac{1}{n-1}\sum_{h=1}^{n} (v_{r,h} - \mu_r)(v_{t,h} - \mu_t)}{\sigma_r\, \sigma_t}$, where

    $\mu_r = \dfrac{1}{n} \sum_{h=1}^{n} v_{r,h}, \qquad \mu_t = \dfrac{1}{n} \sum_{h=1}^{n} v_{t,h}$

    $\sigma_r = \sqrt{\dfrac{1}{n-1} \sum_{h=1}^{n} (v_{r,h} - \mu_r)^2}, \qquad \sigma_t = \sqrt{\dfrac{1}{n-1} \sum_{h=1}^{n} (v_{t,h} - \mu_t)^2}$
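The three measures translate into a few lines of numpy; this sketch is illustrative and not part of the original slides (the example vectors are made up).

import numpy as np

def euclidean_dissimilarity(v_r, v_t):
    return np.sqrt(np.sum((v_r - v_t) ** 2))

def cosine_similarity(v_r, v_t):
    return np.dot(v_r, v_t) / (np.linalg.norm(v_r) * np.linalg.norm(v_t))

def correlation_similarity(v_r, v_t):
    # Pearson correlation: cosine similarity of the mean-centered vectors.
    v_r_c = v_r - v_r.mean()
    v_t_c = v_t - v_t.mean()
    return np.dot(v_r_c, v_t_c) / (np.linalg.norm(v_r_c) * np.linalg.norm(v_t_c))

v_r = np.array([1.0, 2.0, 3.0, 4.0])
v_t = np.array([1.5, 1.8, 3.2, 3.9])
print(euclidean_dissimilarity(v_r, v_t),
      cosine_similarity(v_r, v_t),
      correlation_similarity(v_r, v_t))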
Importance of Feature Selection

- Inclusion of features that are not correlated with the classification decision may make the problem even more complicated.
- For example, in the data set shown on the following page, including the feature corresponding to the y-axis causes an incorrect prediction for the marked test instance when a 3NN classifier is employed.

[Figure: a scatter plot of "o" and "x" points in the x-y plane, with the two groups separated by the vertical line x = 10 and a marked test instance.]

- It is apparent that the "o"s and "x"s are separated by x = 10. If only the attribute corresponding to the x-axis were selected, then the 3NN classifier would predict the class of the test instance correctly.

Feature Selection for Microarray Data Analysis

- In microarray data analysis, it is highly desirable to identify the genes that are correlated with the classes of the samples.
- For example, the Leukemia data set contains 7129 genes, and we want to identify the genes that distinguish the different disease types (a simple gene-ranking sketch follows this slide).




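One common way to rank genes by how strongly they separate two sample classes is a per-gene two-sample t-test; the sketch below assumes scipy is available and uses random placeholder data, and the t-test is only one of many possible ranking criteria.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))   # 20 samples x 100 genes (a stand-in for e.g. 7129 genes)
y = np.repeat([0, 1], 10)        # two disease types

# Two-sample t statistic for every gene; a small p-value suggests the gene
# is expressed differently in the two classes.
t_stat, p_val = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
top_genes = np.argsort(p_val)[:10]   # indices of the 10 most discriminative genes
print(top_genes)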
Parameter Setting through Cross Validation

- When carrying out data classification, we normally need to set one or more parameters of the classification algorithm.
- For example, we need to set the value of k for the KNN classifier.
- The typical approach is to conduct cross validation to find the optimal value (a sketch follows this slide).

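For instance, k for a KNN classifier can be selected by scoring each candidate value with 10-fold cross validation; a minimal sketch assuming scikit-learn, with placeholder data:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))   # placeholder data: 60 samples, 4 features
y = np.repeat([0, 1], 30)

# Each candidate k is evaluated by 10-fold cross validation; the best one is kept.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                      cv=10)
search.fit(X, y)
print(search.best_params_, search.best_score_)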
Linearly Separable and Non-Linearly Separable

- The example above shows a linearly separable case.
- The following is an example of a non-linearly separable case.




[Figure: a non-linearly separable two-class data set.]
An Example of a Non-Separable Case

[Figure: a two-class data set whose classes cannot be separated without error.]
Support Vector Machines (SVM)

- Over the last few years, the SVM has been established as one of the preferred approaches to many problems in pattern recognition and regression estimation.
- In most cases, SVM generalization performance has been found to be either equal to or much better than that of conventional methods.

Optimal Separation

- Suppose we are interested in finding out how to separate a set of training data vectors that belong to two distinct classes.
- If the data are separable in the input space, there may exist many hyperplanes that achieve such a separation.
- We are interested in finding the optimal hyperplane classifier: the one with the maximal margin of separation between the two classes (formulated below).
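In symbols, the maximal-margin (hard-margin) hyperplane is usually written as the solution of the following optimization problem; this is the standard textbook formulation rather than anything reproduced from these slides. The margin equals 2/||w||, so minimizing ||w||^2/2 maximizes it.

% Hard-margin SVM: training pairs (x_i, y_i) with labels y_i in {-1, +1}
\min_{\mathbf{w},\, b} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) \ge 1, \qquad i = 1, \ldots, n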
A Practical Guide to SVM Classification

- http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf




Support Vector Machines (SVM)

[Figure: SVM illustration.]
Graphic Interface

- http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html#GUI




Specificity/Selectivity vs. Sensitivity

- Sensitivity
  - True positives / (True positives + False negatives)
- Selectivity
  - True positives / (True positives + False positives)
- Specificity
  - True negatives / (True negatives + False positives)
- Accuracy
  - (True positives + True negatives) / (True positives + True negatives + False positives + False negatives)
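These definitions translate directly into code; a minimal sketch (the counts are placeholders):

def classification_metrics(tp, fp, tn, fn):
    """Compute the measures defined above from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)    # also known as recall
    selectivity = tp / (tp + fp)    # also known as precision
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, selectivity, specificity, accuracy

print(classification_metrics(tp=40, fp=10, tn=45, fn=5))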

				