Survival Analysis in Cancer Gene Using Vector Space Model

							                                                      (IJCSIS) International Journal of Computer Science and Information Security,
                                                      Vol. 10, No. 5, May 2012




Survival Analysis in Cancer Gene Using Vector Space Model

Jitasha Mishra
Assistant Professor, Computer Science and Engineering,
Gandhi Institute Of Technology And Management, Bhubaneswar, India
Sibu_124@yahoo.co.in

Debashis Hati
Assistant Professor, Computer Science and Engineering,
Gandhi Engineering College, Bhubaneswar, India
d_hati@yahoo.com

Amritesh Kumar
Assistant Professor, Computer Science and Engineering,
Gandhi Institute Of Technology And Management, Bhubaneswar, India
amritesh.kiit@gmail.com




Abstract: This study aims to design a stable classification system and a survival analysis for categorizing microarray gene expression profiles. High-throughput microarray technology is now widely used to probe the expression values of thousands of genes simultaneously in a biological sample. However, due to the nature of DNA hybridization, the expression profiles are highly noisy and demand specialized data mining methods for analysis. Our proposed approach focuses on developing an effective and stable sample classification system using gene expression data. The traditional cancer prognostic tools of tumor stage and morphology are inadequate benchmarks for the accurate determination of patient risk. The emergence of microarray technology has enabled the simultaneous measurement of thousands of gene expression levels, allowing researchers to apply sophisticated data mining and statistical techniques in the search for a superior prognostic methodology. This paper extends an existing procedure called Bayesian Model Averaging (BMA) to Cosine Bayesian Model Averaging for application to survival analysis. Cosine BMA is a method for predicting survival prognosis by isolating a small group of relevant predictor genes from a high-dimensional microarray dataset. In this paper, the Cosine BMA algorithm for survival analysis is applied to two real cancer datasets: diffuse large B-cell lymphoma and breast cancer. The selected genes are used to divide patients into high- and low-risk categories. Results show that the Cosine BMA algorithm for survival analysis consistently selects a small number of relevant genes while providing a higher degree of predictive accuracy than other feature selection methods. The procedure shows promise as a powerful and cost-effective prognostic tool in future cancer research.

Keywords: Microarray, Supervised learning, survival analysis, classification, DNA, FNAC, biopsy.

                      I.    INTRODUCTION

   In the field of bioinformatics, a number of algorithms are applied to problems such as cancer. Cancer is a disease that poses a major threat to human life, and many researchers are striving to overcome it. They have explored many solutions, but an optimal one is still required. Bioinformatics has led to a vast amount of research advances and has proven effective for diagnosing, classifying and discovering many aspects that lead to diseases like cancer. The shift of focus from the macro level to the molecular level has led to a better understanding of the functions of genes. Various developments in the field of bioinformatics have led to efficient data mining and classification algorithms and techniques. The present study involves the application of machine learning methods for the classification of cancer samples using the gene expression data obtained from the microarray experiment.

Understanding gene expressions: A brief explanation of gene expression and microarrays will aid the proper understanding of the current classification problem. Genetic material is the same in all cells of the body. The only thing that makes the organs in the body act differently is that some genes are dormant in certain cells. Some genes are expressed in a cell while others are not, and this creates the whole variation. Dormant genes are sometimes triggered under certain circumstances, which leads to malfunctions in the proper working of the cells and to diseases and disorders such as cancer. Bioinformatics research shows that gene expression levels that deviate from those of normal samples might be a reason for several abnormalities. With the help of new technologies, we are now able to study the expression levels of thousands of genes at once. In this way, we can compare the expression levels in normal and abnormal cells. Comparing the expression values of affected genes with regular expression values can thus indicate the reason for the abnormality. A gene is considered informative when its expression helps to classify samples as belonging to a disease condition or not. These informative genes help us develop classification systems that can distinguish normal cells from abnormal ones. The goal of this study is to build a classification model that can efficiently classify normal and tumor samples using gene expression data obtained from a microarray study.

Understanding microarrays: A microarray is a tool used to sift through and analyze the information contained within a genome. A microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead. The first DNA microarray chip was engineered at Stanford University,




                                                                    59                            http://sites.google.com/site/ijcsis/
                                                                                                  ISSN 1947-5500




whereas Affymetrix Inc. was the first to create the patented DNA microarray wafer chip called the GeneChip. The microarray data used for the current study were collected using Affymetrix GeneChips, also known as oligonucleotide microarrays. Messenger RNA is extracted from the cell and converted to cDNA. After amplification and labeling, the sample is hybridized on the chip. After the unhybridized material is washed away, the chip is scanned with a laser scanner and the image is analyzed by computer.

The relatively recent emergence of microarray technology has allowed for new advances in the determination of survival prognosis through the study of gene expression levels. Our proposed approach presents a survival analysis extension to the Cosine Bayesian Model Averaging (CBMA) algorithm, which is a method for predicting survival prognosis by isolating a small group of relevant predictor genes from a high-dimensional microarray dataset. The purpose of our proposed approach is to show that the Cosine BMA algorithm for survival analysis consistently selects a small and cost-effective number of predictor genes while outperforming other feature selection, classification and supervised learning procedures in the accuracy of patient risk assessment.

        II        Background Work

    The science of biology has proven to be one of the most important areas of knowledge in human history. A tremendous amount of data is required to explore biological phenomena, and computer science techniques have become pivotal in managing this mountain of information. Bioinformatics is the merger of biology and computer science [9]. Bioinformatics spans the complete range of biological topics, but the study of gene expression is one of the most prominent and important subjects to pursue in the quest for medical advances. Gene expression is defined as the process by which a gene's DNA sequence is converted into a functional protein [3]. The DNA sequence is first encoded as messenger RNA (mRNA) through the process of transcription, and the mRNA is subsequently translated into a protein [2]. A higher expression score denotes a greater amount of gene activity. Gene expression profiles are created and studied to determine which genes are expressed in particular cell types, at what times these genes are expressed, and under what conditions expression occurs. Medical researchers often use these profiles to compare the expression levels of normal cells with those of cells in a special condition. This condition could involve cancer, other diseases, starvation, extreme temperature, or any other unique cellular state that scientists are interested in understanding. By studying the differences between the expression levels of these cells, researchers are able to determine the effects of the given condition on the process of gene expression in a particular cell type [8]. Gene expression profiles are most commonly extracted through the use of microarray technologies. A single silicon microarray chip can measure the expression levels of tens of thousands of genes at once, and this number changes rapidly as the technology grows more robust. To put this in perspective, the human genome is believed to contain around 20,000 to 25,000 genes, so one microarray chip could ostensibly produce expression scores for more genes than exist in the entire human genome [2]. Microarrays come in several different types, including short oligonucleotide arrays, cDNA or spotted arrays, long oligonucleotide arrays, and fiber-optic arrays. Short oligonucleotide arrays, manufactured by the company Affymetrix, are the most popular commercial variety on the market today [2].

A growing number of studies have used a variety of statistical methodologies to perform classification on high-dimensional microarray data [1]. One study developed a risk index based on 50 genes that classified lung cancer patients into either low- or high-risk groups with a considerable degree of accuracy [4]. The two groups differed significantly from one another in terms of survival rates (p-value=0.0006), and these results still held among patients with stage-1 tumors (p-value=0.028). The authors tested their risk index through leave-one-out cross validation on the original data set [4]. The index was then applied to an independent dataset with the same genetic expression information. Again, the high- and low-risk groups were found to differ significantly from one another, both overall (p-value=0.003) and among patients with stage-1 tumors (p-value=0.006). Another approach applied a similar methodology based on 64 genes, with improved results. That study used multivariate Cox Proportional Hazards regression with bootstrap resampling and forward selection to identify a 64-gene signature, whereas the earlier approach employed univariate Cox regression [5][4]. In addition, one approach gathered seven independent data sets for validation purposes [7]. Once the authors had created a risk index by training on two of the seven datasets, they divided patients from the other five into high- and low-risk groups. In all cases, the two groups were significantly different from one another (p < 0.001). Another group published a survival classification study involving primary invasive breast carcinomas [4]. They attempted to classify patients into two groups: those whose cancer recurred within five years of diagnosis and tumor resection (poor prognosis group), and those who remained disease-free beyond five years (good prognosis group) [9]. They used a three-step supervised classification method to develop a 70-gene predictive signature, which they applied to an independent test set of 19 patients. Their classifier correctly labeled 17 of the 19 samples, an accuracy of 89.5% (p-value=0.0018).

More recently, one approach used a combination of three different feature selection methods to identify a set of predictive genes: Prediction Analysis for Microarrays (PAM), Significance Analysis for Microarrays (SAM), and a correlation-based technique similar to that of earlier work [1][7]. Of the genes selected by each method, 21 were present in all three models. This 21-gene signature was applied to two independent acute myeloid leukemia datasets and successfully separated patients into good and poor outcome groups (p < 0.0005).








           III         PROPOSED ARCHITECTURE

          The proposed architecture covers the overall structure of our approach. First, the system is trained on the training dataset using supervised learning. Feature extraction then isolates a small set of cancer-affected genes, and the microarray data model is constructed using the mathematical exponential function. Finally, classification and survival analysis are performed. The microarray data are fed as input to the risk ratio calculation system, and a survival analysis of the cancer genes is then carried out.

    Training dataset (gene data + survival time + censor information)

    Supervised learning
        Feature selection: isolate a small set of predictor genes from the training dataset.
        Model construction: build a mathematical model from the training data for use in prediction.

    Classification
        Use the selected genes and the trained mathematical model to predict the time of event on independent datasets (test data).

    Survival analysis
        Risk ratio of patient survival from the test data and the mathematical model.

                       Figure 1. The Proposed Architecture

Steps involved in synthesizing a dataset
    1. Collection of the training dataset.
    2. Learning by a supervised machine.
    3. Transformation of the training dataset to the test dataset.
    4. Conversion and feature extraction.
    5. Synthesis using the mathematical model.
    6. Test conducted on the test dataset.
    7. Survival analysis on the microarray data.

           IV          Proposed Approach

A. Cosine Bayesian model averaging method for classification

    The traditional BMA implementation is incompatible with microarray data. This is because the typical microarray dataset contains thousands or even tens of thousands of genes, but the leaps and bounds algorithm can only consider a maximum of 30 variables when selecting the top best models to return to the user. The usual practice of employing stepwise backward elimination to reduce the number of genes down to 30 is not applicable in a situation where the number of predictive variables is greater than the number of samples. The Cosine BMA algorithm therefore takes a rank-ordered list of genes and successively applies the traditional BMA algorithm until all genes up to a user-specified value p (G1, G2, ..., Gp) have been processed. It begins by using the ratio of between-group to within-group sum of squares to rank-order the genes from the microarray dataset. As the algorithm iterates, genes with a high posterior probability are retained while genes with a low posterior probability are eliminated. The default threshold for inclusion is set to 0.3; genes whose posterior probabilities are less than 0.3 are discarded.

Breast cancer: Breast cancer is a disease that occurs in human beings at a high rate, so it needs to be classified and treated. Whether a malignancy is present can be determined in two ways, clinically and pathologically. Symptoms of breast cancer include the presence of a lump, discharge of blood, retraction of the nipple and distortion of shape. Here we find the breast cancer genes, classify them with the proposed algorithm and estimate the survival of the patients. The different values used for classification are:

Test                                            Total   Values                    Result
MAMMOGRAPHY-BIRADS (Hormonal test for cancer)   100     Below -50                 Normal
                                                        Above 50 and up to 60     4
                                                        Above 60 and up to 70     5
                                                        Above 70 and up to 100    6
F.N.A.C. (Blood test for cancer)                100     Below -10                 Normal
                                                        Above 10 and up to 100    Suspicious for Malignancy
BIOPSY (Tissue test for cancer)                 100     Below -50                 Normal
                                                        Above 50 and up to 60     BR Score-7
                                                        Above 60 and up to 70     BR Score-8
                                                        Above 70 and up to 100    BR Score-9

                     Table 1 Tests conducted for breast cancer

Diffuse large B-cell lymphoma: DLBCL is a type of cancer that occurs when the count of lymphoblasts in the blood increases. Lymphoblasts are present in white blood corpuscles. The disease occurs in many human beings, so it requires an approach for classification and for estimating the prospects of a cure. The different values for DLBCL cancer classification are:








Test                                               Total   Values                   Result
MONOCLONAL BAND (Blood/Urine test for M-band,      100     Above -0                 CANCER POSITIVE
monoclonal band, monoclonal spike,                         Less than -0             NORMAL
monoclonal protein)
ESR (Cell test in Blood for cancer)                100     Above & equal to 50      CANCER POSITIVE
                                                           Less than 50             NORMAL
LDH (Tissue test in Blood for cancer)              1000    Above & equal to -500    CANCER POSITIVE
                                                           Less than -500           NORMAL
F.N.A.C. (Blood test for cancer)                   100     Below -10                NORMAL
                                                           Above 10 and up to 100   Suspicious for Malignancy

                      Table 2 Tests for DLBCL cancer

B. Cosine Bayesian model averaging method for survival analysis

   In order to extend the Cosine BMA method to survival analysis, a number of algorithmic modifications were implemented. First, instead of applying the BSS/WSS technique (the ratio of between-group to within-group sum of squares) to rank-order the genes in the preprocessing step, Cox Proportional Hazards regression is used to rank each individual gene. Cox regression is a popular choice in the realm of survival analysis due to its broad applicability and capacity for handling censored data. It is a semi-parametric method that quantifies the hazard rate for a subject s at time T as follows:

    \lambda(T \mid p_s) = \lambda_0(T) \exp(p_s \theta)                 (1)

In this equation, \lambda_0(T) is the baseline hazard function at time T, p_s is the vector of effect parameters for subject s and \theta is the vector of unknown predictor coefficients. It has been observed that the baseline hazard function in equation (1) can be left unspecified if the effect of a covariate on one individual remains the same for all times T (e.g., if an environmental variable doubles your personal risk of dying at time 5, it also doubles your risk at time 8). Therefore, an estimate of \theta is all that is needed. Our proposed approach calculates the similarity between the existing patient dataset and the new patient dataset based on the vector space model:

    R(t, p) = \frac{\sum_k w_{kt} \, w_{kp}}{\left(\sum_k w_{kt} \, w_{kt}\right) \left(\sum_k w_{kp} \, w_{kp}\right)}                 (2)

In equation (2), R(t,p) is the relationship between the training dataset and the test dataset, where the training dataset describes existing patients and the test dataset describes a new patient. The value of R(t,p) lies between 0 and 1. If the value of R(t,p) is greater than or equal to 0.3, the patient is at high risk. w_{kt} is the weight of the k-th attribute in the existing patient (training) dataset, and w_{kp} is the weight of the k-th attribute in the new patient (test) dataset. The patient attributes are diameter of the cell, texture, length, fluidity, cell size and volume for breast cancer, and LYM number, analysis set, follow-up years, status, subgroup and germinal centre of B-cell for DLBCL cancer.

Following this step, the algorithm iterates through the user-specified p top-ranked genes, applying the Cosine BMA algorithm for survival analysis to each group of variables in the current BMA window (where the window size is denoted by xmax). This part of the procedure is similar to the classification method described previously; genes with high posterior probabilities are retained while genes with low posterior probabilities are eliminated.

To evaluate the performance of this method, the continuous risk scores of the patients were discretised into risk groups. The overall risk score for a single patient is the weighted average of the risk scores calculated for each model in the set S of contending models.

         V.      Proposed Algorithm (Cosine BMA)

          The Cosine Bayesian model averaging algorithm for survival analysis used in our proposed approach for the survival analysis of cancer patients is as follows:

STEP 1 Input: training set TD with G genes and n samples.
STEP 2 Pre-processing step: Rank-order all G genes by applying Cox Proportional Hazards regression to each individual gene. Let x1, x2, ..., xG be the ordered list of genes, sorted in descending order of log likelihood. Let xmax denote the user-specified size of the Cosine BMA window (maximum 5).
STEP 3 Parameters: nbest and p, where p is the total number of genes to be processed by the Cosine BMA algorithm and G is the total number of genes in the training dataset. Note that xmax < p ≤ G.
STEP 4: Initially, start with the xmax top-ranked genes (x1, x2, ..., xmax) and apply the traditional BMA algorithm for survival analysis. Let the list "to be processed" be an ordered list of the genes with ranks (xmax + 1) to p. Initially, this list is (x_{xmax+1}, x_{xmax+2}, ..., xp).
STEP 5: End

         VI       Results and Implementation

   A.    Methods and Materials

    Our proposed work is to find the rate of survival of cancer patients. We have therefore considered two cancer datasets for this work: breast cancer and diffuse large B-cell lymphoma.
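The windowed iteration of Section V (STEP 4, together with the retain-or-discard rule and the 0.3 threshold of Section IV) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `iterate_window`, the callback `posterior` and the toy scores are hypothetical names introduced for illustration, the BMA fit itself is not implemented (the callback merely stands in for it), and both the refill step and the rule of dropping the weakest gene when the window is full are assumptions, since the listed steps do not spell them out.

```python
from typing import Callable, Dict, List, Sequence


def iterate_window(
    ranked_genes: Sequence[str],
    posterior: Callable[[List[str]], Dict[str, float]],
    p: int,
    xmax: int = 5,
    threshold: float = 0.3,
) -> List[str]:
    """Slide a window of at most xmax genes over the top-p ranked genes.

    `posterior` is a hypothetical stand-in for a BMA fit: it maps the genes
    in the current window to posterior probabilities.
    """
    if not xmax < p <= len(ranked_genes):
        raise ValueError("expected xmax < p <= G")

    window = list(ranked_genes[:xmax])            # STEP 4: top xmax genes
    to_be_processed = list(ranked_genes[xmax:p])  # ranks xmax+1 .. p

    while True:
        probs = posterior(window)
        # Retain genes at or above the threshold, discard the rest.
        window = [g for g in window if probs[g] >= threshold]
        if not to_be_processed:
            return window
        if len(window) == xmax:
            # Assumption: when nothing was discarded, drop the weakest gene
            # so that the next ranked gene can enter the window.
            window.remove(min(window, key=lambda g: probs[g]))
        # Assumption: refill the window from the ranked list.
        while len(window) < xmax and to_be_processed:
            window.append(to_be_processed.pop(0))


# Toy example with made-up posterior probabilities (illustrative only).
toy_scores = {"g1": 0.9, "g2": 0.1, "g3": 0.6, "g4": 0.2,
              "g5": 0.8, "g6": 0.05, "g7": 0.7, "g8": 0.4}


def toy_posterior(window: List[str]) -> Dict[str, float]:
    return {g: toy_scores[g] for g in window}


selected = iterate_window(list(toy_scores), toy_posterior, p=8, xmax=5)
print(selected)
```

In a real analysis the callback would run the BMA fit on the genes in the window; the fixed toy scores only make the control flow easy to follow.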








Attributes                 Value
Diameter of the cell       0.38
Texture                    -0.024
Length                     -0.242
Fluidity                   0.732
Cell size                  -0.259
Volume                     -0.134

               Table 3 Training Dataset For Cancer

Attributes                 Value
Diameter of the cell       0.092
Texture                    -0.347
Length                     -0.157
Fluidity                   -0.04
Cell size                  -0.575
Volume                     -0.291

               Table 4 Test Dataset of a cancer Patient

LYM number   Analysis set   Follow up years   Status   Sub group   Germinal center of b-cell
1            0.5            -0.22             -1.29    -0.56       -0.3
3            1.15           -0.31             -0.14    -0.05       -0.68
5            1.72           0.59              0.79     -0.04       0.9
9            0.32           -0.05             -0.76    -0.66       -0.75
13           0.91           -0.04             -0.6     1.48        -1.32
14           0.43           0.23              -0.22    1.35        -1.19
16           1.44           -0.41             0.52     -0.7        -0.2
17           -0.16          0.71              -0.53    0.35        -0.21
19           -1.59          0.51              -0.38    -2.41       1.27

               Table 6 Dataset of DLBCL cancer

Table 6 lists test datasets of patients who are affected by DLBCL cancer.

 Here we are implementing table 3 and table 4 in equation (2),                            B. Results
where the total value of training dataset is represented by wkt                           This section presents the results from the application
and the total value of test dataset is represented by wkp.                      of the Cosine BMA algorithm for survival analysis to the
When we correlate all the values of table 3 and table 4 with                    DLBCL and breast cancer datasets.
equation (2), we find                                                            DLBCL cancer: In order to determine the best combination
                                                                                of these input parameters, a series of 10-fold cross validation
0.239921/ 0.685752088= 0.349863463.                                             runs were performed on the DLBCL training dataset becomes
As this value is greater than 0.3 so this patient is suffering from             inefficient for BMA windows larger than 5 variables.
cancer.                                                                         Classification for DLBCL cancer: In this classification the
                                                                                monoclonal blood sample test, ESR, LDH and FNAC tests are
Diameter        Texture      Length       Fluidity    Cell      Volume          conducted by taking the blood samples of different patients.
of the cell                                           size
                                                                                Then after carrying out the supervised learning the test dataset
  -0.511          0.045          -0.737    0.226     -0.084      -0.15          of a patient is taken to classify a patient is cancer affected or
                                                                                not.
  -0.448         -0.573          -0.264    0.109     0.484      0.416

  0.005           0.496          -0.377    0.435     -0.161     0.186

  -0.097          0.242          -0.497   -0.639     -0.21      -0.715

  0.046          -0.075          -0.832   -0.499     -0.518     -0.481
                                                                                                        Figure 2
  -0.15          -0.436          -0.786   -0.299     0.585      -0.663

  -0.798         -0.506          -0.349   -0.591     -0.534     -0.671

  0.854          -0.637          -0.775    -0.63     0.353      -0.319


                            Table 5 Dataset of breast cancer

In table 5 there are the test datasets of patients those who are
suffering from breast cancer.


                                                                                                     Figure 3
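
The scoring step behind the worked example for table 3 and table 4 can be reproduced with a short script. This is only a minimal sketch: it assumes that equation (2) is the usual cosine measure (the dot product of the two vectors divided by the square root of the product of their squared sums), which is what the reported numerator 0.239921 and the final value 0.349863463 correspond to. The names cosine_relationship, is_cancer and THRESHOLD are illustrative and do not come from the paper.

```python
import math

# Values copied from Table 3 (training dataset) and Table 4 (test dataset),
# in the order: diameter of the cell, texture, length, fluidity, cell size, volume.
TRAINING = [0.38, -0.024, -0.242, 0.732, -0.259, -0.134]   # w_kt
TEST = [0.092, -0.347, -0.157, -0.04, -0.575, -0.291]      # w_kp

# Decision threshold quoted in the text (a value above 0.3 is read as cancer).
THRESHOLD = 0.3


def cosine_relationship(w_t, w_p):
    """Assumed form of equation (2): dot product over the product of the norms."""
    if len(w_t) != len(w_p):
        raise ValueError("vectors must have the same number of attributes")
    numerator = sum(t * p for t, p in zip(w_t, w_p))
    denominator = math.sqrt(sum(t * t for t in w_t) * sum(p * p for p in w_p))
    if denominator == 0.0:
        raise ValueError("the measure is undefined for an all-zero vector")
    return numerator / denominator


def is_cancer(w_t, w_p, threshold=THRESHOLD):
    """Apply the threshold rule from the text to the score of equation (2)."""
    return cosine_relationship(w_t, w_p) > threshold


if __name__ == "__main__":
    score = cosine_relationship(TRAINING, TEST)
    print(f"R(t,p) = {score:.6f}, classified as cancer: {is_cancer(TRAINING, TEST)}")
```

Keeping the score and the threshold rule in separate functions makes it easy to try a different cut-off without touching the similarity computation.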

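
The 10-fold cross validation runs mentioned for the DLBCL training dataset can be pictured with the following sketch. The paper does not describe how the folds were formed, so the shuffling, the fixed seed, the function name k_fold_indices and the sample count used in the usage lines are all assumptions made for illustration only.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return a list of (train_indices, validation_indices) pairs, one per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


if __name__ == "__main__":
    # 160 is a hypothetical sample count, not a figure taken from the paper.
    for fold_number, (train, validation) in enumerate(k_fold_indices(160), start=1):
        print(fold_number, len(train), len(validation))
```

Each candidate setting of the input parameters would be fitted on the training indices and scored on the validation indices of every fold, and the setting with the best average score would be kept.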







Figure 4: Curve showing survival of DLBCL cancer patients

Breast cancer: In the application of the Cosine BMA algorithm to the breast
cancer dataset, the optimal input parameters determined from the cross
validation study on the DLBCL dataset were used. During preliminary analyses,
training sets with fewer than 100 samples tended to result in fatal errors at
higher values of maxNvar.
Classification of a particular patient: First, the various values of the breast
cancer tests are taken and used to train the system. Then a test dataset of a
patient is taken, containing the original values of the mammography BI-RADS,
FNAC and biopsy tests, and the system finally shows the survival chances on a
five-year scale.

                     Figure 5

VII        Conclusion and Future Work
A. Conclusion
     This paper has demonstrated that the Cosine BMA algorithm for survival
analysis consistently isolates a small group of relevant predictor genes from
microarray datasets, providing better predictive power than other feature
selection and supervised learning methods. The algorithm is easy to use. It
efficiently reduces vast amounts of microarray data down to a handful of
predictor variables, making it a cost-effective diagnostic tool in the clinical
setting.

B. Future Work
     Future work in this area would involve collaboration with cancer
biologists to validate the predictor genes selected by applying the Cosine BMA
algorithm to microarray data, and to assess the prediction accuracy of the
methodology on PCR data generated using independent patient samples. Deploying
this technique in hospitals could help clinicians treat cancer patients more
effectively. Furthermore, the Cosine BMA algorithm could be extended to other
types of high-throughput data, such as proteomics data produced from mass
spectrometry. The multivariate nature of BMA, combined with its ability to
account for model uncertainty, makes it a particularly attractive candidate for
extracting predictive genes from any high-dimensional biological dataset.

                     Figure 6

Figure 7: Curve for survival rate of breast cancer








                           AUTHORS’ PROFILE

Jitasha Mishra received her B.Tech in Information Technology from Biju Patnaik
University of Technology in 2008 and her M.Tech in Computer Science and
Engineering from Biju Patnaik University of Technology in 2011. She was an
Assistant System Manager at Thyssenkrupp Elevator India Pvt. Ltd. between 2008
and 2009, and has been an Assistant Professor in the Department of Computer
Science and Engineering at the Gandhi Institute of Technology and Management,
Bhubaneswar, since 2010. Her research interests are bioinformatics, data
mining, hypertext information retrieval and web analysis.

Debashis Hati received his M.Tech in Computer Science from NIT, Rourkela in
2000. He was an Assistant Professor at C.V. Raman Engineering College and KIIT
University between 2001 and 2010, and is now an Assistant Professor in the
Department of Computer Science and Engineering at Gandhi Engineering College.
His research interests are bioinformatics, data mining, hypertext information
retrieval and web analysis.

Amritesh Kumar received his MCA from IGNOU in 2007 and his M.Tech in Computer
Science from KIIT University in 2010. He was a trainee at IBM between 2007 and
2008, and is now an Assistant Professor in the Department of Computer Science
and Engineering at the Gandhi Institute of Technology and Management,
Bhubaneswar. His research interests are bioinformatics, data mining, hypertext
information retrieval and web analysis.



