DNA Microarray Data Analysis using Artificial Neural Network Models


                      by
         Venkatanand Venkatachalapathy
                  (‘Venkat’)
        ECE/ CS/ ME 539 Course Project
             Genetic information flow

 Genes {DNA} --Transcription--> RNA intermediate --Translation--> Protein

 GENE EXPRESSION
 (Gene expression refers to both transcription and translation.)



• Genes (information molecules) code for RNA and proteins (functional
  molecules, which determine the properties of the cell).

• “Gene expression level”: the amount of protein or RNA produced per gene.
  Expression varies dynamically with time, depending on the environment, the
  cell's stage of development, etc.

• When the expression level is “high” or “low” with respect to a reference
  condition (the ‘normal state’), the gene is said to be switched ‘ON’ or
  ‘OFF’, respectively.
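The ON/OFF idea can be made concrete with a small sketch. The slides do not give a cutoff for “high” or “low”, so the two-fold threshold (`fold=2.0`), the function name `gene_state`, and the example levels below are all assumptions chosen for illustration.

```python
def gene_state(current, reference, fold=2.0):
    """Label a gene ON, OFF, or UNCHANGED relative to the reference state.

    `fold` is a hypothetical cutoff (not from the slides): a gene counts as
    ON when its level is at least `fold` times the reference level, and OFF
    when it is at most 1/fold of the reference level.
    """
    ratio = current / reference
    if ratio >= fold:
        return "ON"
    if ratio <= 1.0 / fold:
        return "OFF"
    return "UNCHANGED"


# Illustrative (current, reference) expression levels for three genes.
states = [gene_state(3000, 10), gene_state(10, 30), gene_state(1, 1)]
print(states)  # ['ON', 'OFF', 'UNCHANGED']
```

A stricter or looser cutoff only changes which genes fall into the UNCHANGED band.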
          Microarray experiments and data
• A microarray measures the expression level of thousands of genes in a
  single experiment.

• For a single experiment, each gene has one data point, expressed as the
  ratio of current-state expression to reference-state expression.
• E.g., expression levels for genes [A B C] = [3000/10  10/30  1/1]
  (Conventionally, these ratios are normalized on a log scale.)
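As a rough sketch of the log-scale normalization, the example ratios for genes A, B and C can be converted as follows. The slides do not name the base of the logarithm; base 2 is assumed here, and the helper name `log_ratio` is hypothetical.

```python
import math

# Current-state / reference-state expression levels from the example above.
levels = {"A": (3000, 10), "B": (10, 30), "C": (1, 1)}


def log_ratio(current, reference):
    """Base-2 log of the expression ratio (base 2 is an assumed convention)."""
    return math.log2(current / reference)


log_ratios = {gene: log_ratio(cur, ref) for gene, (cur, ref) in levels.items()}
for gene, value in log_ratios.items():
    print(f"{gene}: {value:+.2f}")
```

On the log scale an up-regulated gene (A) gets a positive value, a down-regulated gene (B) a negative value, and an unchanged gene (C) exactly zero, so increases and decreases of the same fold are symmetric around zero.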

• ‘N’ such experiments for M genes give rise to the
  GENE EXPRESSION MATRIX (M x N):
   G(i,j) = expression level of the i-th gene in the j-th experiment.
  (A collection of gene expression row vectors, one per gene.)
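A minimal sketch of building such a matrix from comma-delimited text is shown below. The file layout (gene name in the first column, followed by N log-ratio values) is an assumption, and the gene names and numbers are made up for illustration.

```python
import csv
import io

# Hypothetical comma-delimited layout: gene name, then N log-ratio values.
# Gene names and values are invented for this example.
raw = """geneA,8.23,7.90,0.10
geneB,-1.58,-1.20,0.05
geneC,0.00,0.15,-0.10
geneD,2.10,1.95,0.30
"""

genes = []
G = []  # G[i][j] = expression level of gene i in experiment j (0-based)
for row in csv.reader(io.StringIO(raw)):
    genes.append(row[0])
    G.append([float(value) for value in row[1:]])

M, N = len(G), len(G[0])
print(f"{M} genes x {N} experiments")
print(genes[1], G[1])  # expression row vector of the second gene
print(G[0][2])         # first gene, third experiment
```

Note that the code uses 0-based indices, whereas G(i,j) on the slide counts genes and experiments from 1.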

• Enormous significance in biotechnology and medicine. Why?
  The genome projects are complete,
   => we KNOW the GENETIC CODE; we MUST now FIND the FUNCTION.
       Project Problem & Methodology
• OBJECTIVE:
    Classify “unknown” genes into functional classes based on:
     - microarray gene expression data, and
     - knowledge about the function of “well known” genes.
    Provide a Graphical User Interface for the analysis.


• SOLUTION STRATEGY:
    Functionally related genes have similar expression levels!
• Two steps:
1.  For “well known” genes, correlate the gene expression vector with the
    functional class. This correlation can be encoded in a neural network!
2.  Using this neural network, classify each unknown gene from its gene
    expression vector!
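To make the two-step idea tangible without training a network, the sketch below swaps in a much simpler stand-in: a nearest-neighbour rule based on Pearson correlation. This is not the project's neural network; it only illustrates the premise that similar expression vectors suggest the same functional class. All gene names, vectors and class labels are toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression vectors of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Step 1: "well known" genes with their functional class (toy values).
known = {
    "known1": ([2.0, 1.5, -0.5, -1.0], "TCA"),
    "known2": ([1.8, 1.2, -0.3, -1.2], "TCA"),
    "known3": ([-1.0, -0.8, 1.5, 2.0], "Non-TCA"),
}


def classify(vector):
    """Step 2: assign the class of the most correlated known gene."""
    best = max(known.values(), key=lambda item: pearson(vector, item[0]))
    return best[1]


unknown = [1.9, 1.0, -0.6, -0.9]
print(classify(unknown))
```

A neural network replaces the fixed correlation rule with a learned mapping from expression vector to class, which can capture relationships that a single similarity measure misses.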
    ANN Models & Program Features
• Models chosen: MLP (used bp.m) and SVM (linear kernel, polynomial kernel,
  radial basis kernel; used svmdemo.m)
• GUI accepting comma-delimited gene expression data files (.csv).
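For reference, the three SVM kernel families named above can be sketched in their standard textbook forms. The parameter names and defaults (`degree`, `coef0`, `gamma`) are assumptions; the parameterization used in svmdemo.m may differ.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x = [1.0, 2.0]
y = [3.0, 0.5]
print(linear_kernel(x, y))      # 1*3 + 2*0.5 = 4.0
print(polynomial_kernel(x, y))  # (4 + 1)^2 = 25.0
print(rbf_kernel(x, x))         # identical vectors give 1.0
```

Each kernel measures similarity between two gene expression vectors; the linear kernel yields a linear decision boundary, while the polynomial and radial basis kernels allow curved ones.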
                    Preliminary Results
  • Data Source: Stanford Microarray Database.
  • Classification of 2467 genes into a “TCA” class and a “Non-TCA”
    class (tested by 3-way cross-validation).
  • Brown et al. used an SVM with a radial basis kernel: 99.5%
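A minimal sketch of 3-way cross-validation over the 2467 genes follows. The slides do not say how the folds were drawn, so the random shuffle, the fixed seed and the function names are assumptions.

```python
import random


def three_way_folds(n_samples, seed=0):
    """Shuffle sample indices and deal them into 3 nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::3] for k in range(3)]


def cross_validation_splits(n_samples, seed=0):
    """Yield (train_indices, test_indices), using each fold once for testing."""
    folds = three_way_folds(n_samples, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


splits = list(cross_validation_splits(2467))
for train, test in splits:
    print(len(train), len(test))
```

Each gene appears in exactly one test fold, so every gene is classified once by a model that never saw it during training. When one class is rare, drawing the folds separately per class (stratification) is a common refinement.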
MLP results

ARCHITECTURE        TRAIN C-RATE    TEST C-RATE
80 – 15 – 1         99.3%           96.5%
80 – 15 – 5 – 1     99.6%           97%

SVM results

KERNEL ORDER             TRAIN C-RATE    TEST C-RATE
Linear                   100%            99.1%
Poly – 2 and Poly – 3    To be done      To be done
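The MLP architectures can be sketched as a forward pass, reading “80 – 15 – 1” as 80 inputs, 15 hidden units and 1 output. This is an untrained toy network, not bp.m: the sigmoid activation, the weight range, the seed and all function names are assumptions.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights plus one trailing bias per output unit."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def build(sizes, seed=0):
    """Create one weight layer for each consecutive pair of layer sizes."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(layers, x):
    """Propagate x through sigmoid layers; row[-1] of each unit is its bias."""
    for layer in layers:
        x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + row[-1])))
             for row in layer]
    return x


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(row) for layer in layers for row in layer)


small = build([80, 15, 1])
deep = build([80, 15, 5, 1])
x = [0.0] * 80  # placeholder expression vector with 80 measurements
print(count_parameters(small), count_parameters(deep))  # 1231 1301
print(forward(small, x))  # one value between 0 and 1
```

With a single sigmoid output, a value above some cutoff (0.5 is a common choice) would be read as one class and a value below it as the other.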

				