The International Journal of Computer Science and Information Security (IJCSIS) is a well-established publication venue for novel research in computer science and information security. The year 2010 has been very eventful and encouraging for all IJCSIS authors/researchers and the IJCSIS technical committee, as we see more and more interest in IJCSIS research publications. IJCSIS is now supported by thousands of academics, researchers, authors/reviewers/students and research organizations. Reaching this milestone would not have been possible without the support, feedback, and continuous engagement of our authors and reviewers.

Field coverage includes: security infrastructures, network security: Internet security, content protection, cryptography, steganography and formal methods in information security; multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management (see the monthly Call for Papers).

We are grateful to our reviewers for providing valuable comments. The IJCSIS December 2010 issue (Vol. 8, No. 9) has a paper acceptance rate of nearly 35%.
We wish everyone a successful year of scientific research in 2011.

Available in IJCSIS Vol. 8, No. 9, December 2010 Edition, ISSN 1947-5500, © IJCSIS, USA.

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 9, December 2010

Constructing Models from Microarray Data with Swarm Algorithms

Mrs. Aruchamy Rajini
Lecturer in Computer Applications
Hindusthan College of Arts & Science, Coimbatore

Dr. (Mrs.) Vasantha kalayani David
Associate Professor, Department of Computer Science
Avinashilingam Deemed University, Coimbatore

Abstract
Building a model plays an important role in the analysis of DNA microarray data. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. In this paper, a Flexible Neural Tree (FNT) model is used to classify gene expression profiles. Based on the pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved. This framework allows input variable selection, over-layer connections and different activation functions for the various nodes involved. The FNT structure is developed using Ant Colony Optimization (ACO), and the free parameters embedded in the neural tree are optimized by the Particle Swarm Optimization (PSO) algorithm and its enhancement, Exponential PSO (EPSO). The purpose of this research is to find an appropriate model for feature selection and tree-based ensemble models that are capable of delivering high-performance classification models for microarray data.

Keywords --- DNA, FNT, ACO, PSO, EPSO

I. INTRODUCTION
A DNA microarray (also commonly known as a DNA chip or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic or silicon chip, forming an array for the purpose of expression profiling, i.e., monitoring the expression levels of thousands of genes simultaneously. Microarrays provide a powerful basis for monitoring the expression of thousands of genes in order to identify the mechanisms that govern the activation of genes in an organism [1].

Recent advances in DNA microarray technology allow scientists to measure the expression levels of thousands of genes simultaneously in a biological organism. Since cancer cells usually evolve from normal cells due to mutations in genomic DNA, comparing the gene expression levels of cancerous and normal tissues, or of different cancerous tissues, may be useful to identify those genes that might anticipate the clinical behavior of cancers.

Microarray technology has advanced modern biological research by permitting the simultaneous study of genes comprising a large part of the genome [2]. In response to the development of DNA microarray technologies, classification methods and gene selection techniques have been developed to make better use of classification algorithms on microarray gene expression data [3] [4].

Variable selection refers to the problem of selecting the input variables that are most predictive for a given outcome. Appropriate variable selection can greatly enhance the effectiveness and potential interpretability of an inference model. Variable selection problems are found in all supervised and unsupervised machine learning tasks, including classification, regression, time-series prediction, and clustering [5].

This paper develops a Flexible Neural Tree (FNT) [6] for selecting the input variables. Based on the pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved. FNT allows input variable selection, over-layer connections and different activation functions for different nodes. The tuning of the parameters encoded in the structure is accomplished using the Particle Swarm Optimization (PSO) algorithm and its enhancement (EPSO).

The proposed method interleaves both optimizations. Starting with random structures and corresponding parameters, it first tries to improve the structure, and as soon as an improved structure is found, it tunes that structure's parameters. It then goes back to improving the structure and again tunes the parameters. This loop continues until a satisfactory solution is found or a time limit is reached.

II. THE FLEXIBLE NEURAL TREE MODEL
The function set F and terminal instruction set T used for generating an FNT model are described as $S = F \cup T = \{+_2, +_3, \ldots, +_N\} \cup \{x_1, \ldots, x_n\}$, where $+_i$ ($i = 2, 3, \ldots, N$) denotes a non-leaf node instruction taking $i$ arguments, and $x_1, x_2, \ldots, x_n$ are leaf node instructions taking no arguments. The output of a non-leaf node is calculated as a flexible neuron model (see Fig. 1). From this point of view, the instruction $+_i$ is also called a flexible neuron operator with $i$ inputs.

In the creation process of a neural tree, if a nonterminal instruction $+_i$ ($i = 2, 3, 4, \ldots, N$) is selected, $i$ real values are randomly generated and used to represent the connection strengths between the node $+_i$ and its children. In addition, two adjustable parameters $a_i$ and $b_i$ are randomly created as flexible activation function parameters, with values in the range [0, 1]. For
developing the forecasting model, the flexible activation function

$$f(a_i, b_i, x) = e^{-\left(\frac{x - a_i}{b_i}\right)^2}$$

is used. The total excitation of $+_n$ is

$$\mathrm{net}_n = \sum_{j=1}^{n} w_j x_j,$$

where $x_j$ ($j = 1, 2, \ldots, n$) are the inputs to node $+_n$ and the weights $w_j$ are generated randomly with values in the range [0, 1]. The output of the node $+_n$ is then calculated by

$$\mathrm{out}_n = f(a_n, b_n, \mathrm{net}_n) = e^{-\left(\frac{\mathrm{net}_n - a_n}{b_n}\right)^2}.$$

The overall output of the flexible neural tree can be computed recursively from left to right by the depth-first method [7].

[Figure: flexible neuron operator +n with inputs x1, x2, x3, weights w1, w2, w3 and output y (left); FNT with input layer, hidden layers and output layer rooted at +6 (right)]
Fig. 1. A flexible neuron operator (left), and a typical representation of the FNT with function instruction set $F = \{+_2, +_3, +_4, +_5, +_6\}$ and terminal instruction set $T = \{x_1, x_2, x_3\}$ (right)

III. SWARM INTELLIGENCE ALGORITHMS
Swarm Intelligence (SI) has recently emerged as a family of nature-inspired algorithms, especially known for their ability to produce low-cost, fast and reasonably accurate solutions to complex search problems [1]. Reference [1] gives an introduction to swarm intelligence with special emphasis on two well-known SI algorithms, Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO).

PSO originated from computer simulations of the coordinated motion of flocks of birds or schools of fish. Just as these animals wander through three-dimensional space searching for food or evading predators, PSO uses particles that move through an n-dimensional space, with velocities dynamically adjusted according to their own historical behavior and that of their companions, to search for solutions of an n-variable function optimization problem. The PSO algorithm includes some tuning parameters that greatly influence its performance, often described in terms of the exploration-exploitation trade-off. Exploration is the ability to test various regions in the problem space in order to locate a good optimum, hopefully the global one. Exploitation is the ability to concentrate the search around a promising candidate solution in order to locate the optimum precisely [8][9][10][11].

El-Desouky et al. [10] proposed an enhanced particle swarm algorithm that varies the inertia weight exponentially instead of linearly, which gives better results on some benchmark functions. In this paper, three models are compared: 1) a tree structure created with ACO; 2) a tree structure created with ACO whose parameters are optimized with PSO; 3) a tree structure created with ACO whose parameters are optimized with EPSO. The three models are compared in order to propose an efficient methodology.

IV. ANT COLONY OPTIMIZATION (ACO) FOR EVOLVING THE ARCHITECTURE OF FNT
ACO is a relatively new probabilistic technique for solving computational problems that can be reduced to finding optimal paths. It is a paradigm for designing metaheuristic algorithms for combinatorial optimization problems. The main underlying idea, inspired by the behavior of real ants, is that of a parallel search over several constructive threads based on local problem data and on a dynamic memory structure containing information on the quality of previously obtained results.

In this algorithm, each ant builds and modifies trees according to the quantity of pheromone at each node. Each node memorizes the rate of pheromone. First, a population of programs is generated randomly. The pheromone at each node is initialized at 0.5, which means that the probability of choosing each terminal and function is initially equal. The higher the rate of pheromone, the higher the probability of being chosen. Each ant is then evaluated using a predefined objective function given by the Mean Square Error (MSE):

$$\mathrm{Fit}(i) = \frac{1}{p}\sum_{j=1}^{p}\left(At_j - Ex_j\right)^2 \qquad (1)$$

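The node computation of Section II and the MSE fitness of Eq. (1) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The class names `Leaf` and `FlexibleNeuron` and the helper `random_tree` are hypothetical. So are the 0.3 probability of emitting a leaf and the lower bound of 0.01 on $b$, which is there only to avoid division by zero. Each non-leaf node sums its weighted child outputs into $\mathrm{net}$ and passes the result through $f(a, b, \mathrm{net}) = e^{-((\mathrm{net}-a)/b)^2}$. Evaluating the root recursively visits the children left to right, depth first. The fitness is treated as the mean squared error between the tree output $At_j$ and the expected output $Ex_j$ over the $p$ samples.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Terminal instruction x_k: returns input feature k (hypothetical class name)."""
    index: int

    def evaluate(self, x: Sequence[float]) -> float:
        return x[self.index]


@dataclass
class FlexibleNeuron:
    """Non-terminal instruction +i: i children, i weights and activation parameters a, b."""
    children: List["Node"]
    weights: List[float]
    a: float
    b: float

    def evaluate(self, x: Sequence[float]) -> float:
        # Total excitation net = sum_j w_j * (output of child j), children visited left to right
        net = sum(w * child.evaluate(x) for w, child in zip(self.weights, self.children))
        # Flexible activation f(a, b, net) = exp(-((net - a) / b) ** 2)
        return math.exp(-((net - self.a) / self.b) ** 2)


Node = Union[Leaf, FlexibleNeuron]


def random_tree(n_inputs: int, max_arity: int, depth: int, rng: random.Random) -> Node:
    """Random initial tree; weights and a drawn from [0, 1], b from [0.01, 1] (assumption)."""
    if depth == 0 or rng.random() < 0.3:          # 0.3 leaf probability is an assumption
        return Leaf(rng.randrange(n_inputs))
    arity = rng.randint(2, max_arity)              # choose one of +2 ... +max_arity
    return FlexibleNeuron(
        children=[random_tree(n_inputs, max_arity, depth - 1, rng) for _ in range(arity)],
        weights=[rng.random() for _ in range(arity)],
        a=rng.random(),
        b=rng.uniform(0.01, 1.0),
    )


def fitness(tree: Node, samples: Sequence[Sequence[float]], expected: Sequence[float]) -> float:
    """Eq. (1): mean squared error between actual outputs At_j and expected outputs Ex_j."""
    p = len(samples)
    return sum((tree.evaluate(x) - ex) ** 2 for x, ex in zip(samples, expected)) / p


if __name__ == "__main__":
    tree = FlexibleNeuron(children=[Leaf(0), Leaf(1)], weights=[0.5, 0.5], a=0.5, b=1.0)
    print(tree.evaluate([1.0, 0.0]))   # 1.0, because net = 0.5 equals a
    print(fitness(tree, [[1.0, 0.0], [0.0, 0.0]], [1.0, 1.0]))
```

In the paper's setting, ACO chooses the structure of such a tree, and PSO or EPSO then tunes its weights and $a$, $b$ parameters. Here those values are fixed by hand so that the output can be checked. Take the tree $+_2(x_0, x_1)$ with weights 0.5 and 0.5, $a = 0.5$ and $b = 1$. The input (1, 0) gives $\mathrm{net} = 0.5 = a$, and hence an output of exactly 1.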

In Eq. (1), p is the total number of samples, and $At_j$ and $Ex_j$ are the actual and expected outputs of the j-th sample. Fit(i) denotes the fitness value of the i-th ant.

The pheromone is updated by two mechanisms:

1. Trail evaporation: evaporation decreases the rate of pheromone for every instruction on every node, in order to avoid unlimited accumulation of trails, according to the following formula:

$$P_g = (1 - \alpha)\,P_{g-1} \qquad (2)$$

where $P_g$ denotes the pheromone value at generation g and α is a constant (α = 0.15).

2. Daemon actions: for each tree, the components of the tree are reinforced according to the fitness of the tree. The formula is

$$P_{i,s_i} = P_{i,s_i} + \frac{\alpha}{\mathrm{Fit}(s)} \qquad (3)$$

where s is a solution (tree), Fit(s) is its fitness, $s_i$ is the function or terminal at node i in this individual, α is a constant (here α = 0.1), and $P_{i,s_i}$ is the value of the pheromone for the instruction $s_i$ at node i [7].

A brief description of the ant programming (AP) algorithm is as follows: (1) every component of the pheromone tree is set to an average value; (2) trees are generated randomly based on the pheromone; (3) the ants are evaluated; (4) the pheromone is updated; (5) go to step (1) unless some criterion is satisfied [7].

V. PARAMETER OPTIMIZATION WITH PSO
PSO [12] is a multi-agent parallel search technique. It does not require any gradient information about the function to be optimized and uses only primitive mathematical operators. Particles are conceptual entities that fly through the multi-dimensional search space.

PSO was inspired by the social behavior of a bird flock or fish school. PSO [13] conducts searches using a population of particles, which correspond to individuals. In the PSO algorithm, the birds in a flock are symbolically represented as particles. These particles can be considered as simple agents "flying" through a problem space. A particle's location represents a potential solution to the problem in the multi-dimensional problem space, and a different solution is generated when a particle moves to a new location.

A PSO model consists of a swarm of particles initialized with random positions. They move iteratively through the d-dimensional problem space to search for new solutions, where the fitness f of each particle is measured by Eq. (1). Each particle has a position represented by a position vector $x_i$ (i is the index of the particle) and a velocity represented by a velocity vector $v_i$. Each particle keeps track of its own best position, i.e., the position associated with the best fitness it has achieved so far, in a vector $p_i$. The best position obtained so far among all the particles in the population is kept as $p_g$.

Each particle i thus maintains the following information: $x_i$, the current position of the particle, and $v_i$, the current velocity of the particle, which is bounded by the parameters $v_{\min}$ and $v_{\max}$. At each time step t, using the individual best position $p_i(t)$ and the global best position $p_g(t)$, the velocity of particle i is updated by [1]

$$v_i(t+1) = w\,v_i(t) + c_1\varphi_1\left(p_i(t) - x_i(t)\right) + c_2\varphi_2\left(p_g(t) - x_i(t)\right) \qquad (4)$$

where w is the inertia weight, whose range is [0.4, 0.9], and $c_1$ and $c_2$ are positive constants called the learning factors: respectively, the cognitive parameter and the social parameter. Proper fine-tuning of these parameters may result in faster convergence and alleviation of local minima. The default values $c_1 = c_2 = 2$ are usually used, although $c_1 = c_2 = 1.49$ has also been reported to give better results. $\varphi_1$ and $\varphi_2$ are uniformly distributed random numbers in the range [0, 1].

During iteration t, the velocity is updated from its previous value to its new value by Eq. (4). The new position is then determined by the sum of the previous position and the new velocity:

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (5)$$

Various methods are used to identify which particles influence an individual. Two basic approaches to PSO exist, based on the interpretation of the neighborhood of particles: (1) the global best (gbest) version of PSO, where the neighborhood of each particle is the entire swarm, so that the social component draws particles toward the best particle in the swarm; and (2) the local best (lbest) PSO model, where particles have information only about their own best and their nearest neighbors' best (lbest) rather than that of the entire group. The gbest model converges quickly but has the weakness of being trapped in local optima. The gbest model is strongly recommended for unimodal objective functions [1].

PSO is executed by repeated application of Equations (4) and (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations.

The PSO algorithm works as follows:
1) The initial population is generated randomly, and the learning parameters $c_1$ and $c_2$ are assigned in advance.
2) The objective function value for each particle is calculated.
3) The search point is modified: the current search point of each particle is changed using Equations (4) and (5).
4) If
the maximum number of iterations is reached, then stop; otherwise go to step 2).

VI. EXPONENTIAL PARTICLE SWARM OPTIMIZATION (EPSO)
In linear PSO, the particles tend to fly towards the gbest position found so far by all particles. This social cooperation helps them to discover fairly good solutions rapidly. However, it is exactly this instant social collaboration that makes particles stagnate on local optima and fail to converge to the global optimum. Once a new gbest is found, it spreads over the particles immediately, so all particles are attracted to this position in subsequent iterations until another better solution is found. Therefore, the stagnation of PSO is caused by the overall-speed diffusion of the newly found gbest [10].

An improvement to the original PSO is that w is not kept constant during execution. Instead, it starts from a maximal value and is linearly decremented down to a minimal value as the number of iterations increases [4]. It is initially set to 0.9 and decreases to 0.4 over the first 1500 iterations. If the run is longer than 1500 iterations, it remains at 0.4 for the remainder of the run, according to

$$W = (w - 0.4)\,\frac{\mathrm{MAXITER} - \mathrm{ITERATION}}{\mathrm{MAXITER}} + 0.4 \qquad (6)$$

where MAXITER is the maximum number of iterations and ITERATION is the current iteration number.

EPSO has a great impact on global and local exploration. It is intended to make the search proceed quickly and intelligently, and it keeps the particles from stagnating in local optima by varying the inertia weight exponentially, as given by

$$W = (w - 0.4)\,e^{\frac{\mathrm{MAXITER} - \mathrm{ITERATION}}{\mathrm{MAXITER}} - 1} + 0.4 \qquad (7)$$

With Equation (7), the particles move faster and stay farther apart from each other.

A. General Learning Procedure
The general learning procedure for constructing the FNT model can be described as follows.
1) Create an initial population randomly (FNT trees and their corresponding parameters);
2) Structure optimization is achieved by the Ant Colony Optimization algorithm.
3) If a better structure is found, then go to step 4); otherwise go to step 2);
4) Parameter optimization is achieved by the EPSO algorithm. In this stage, the architecture of the FNT model is fixed: it is the best tree found by the end of the structure search. The parameters (weights and flexible activation function parameters) encoded in the best tree form a particle.
5) If the maximum number of local search iterations is reached, or no better parameter vector is found for a significantly long time, then go to step 6); otherwise go to step 4);
6) If a satisfactory solution is found, its corresponding informative genes are extracted and the algorithm is stopped; otherwise go to step 2).

VII. RESULTS
As a preliminary study, the methodology described above was applied to the Wisconsin Prognostic Breast Cancer (WPBC) [18] data set, which has 34 attributes (32 real-valued) and 198 instances. Half of the observations were selected for training and the remaining samples for testing the performance of the different models. All the models were trained and tested with the same set of data. The instruction set used to create an optimal FNT classifier is $S = F \cup T = \{+_2, \ldots, +_N\} \cup \{x_0, x_1, \ldots, x_{31}\}$, where $x_i$ ($i = 0, 1, \ldots, 31$) denotes the 32 input features. To obtain an optimal tree structure, an ACO algorithm is applied. In this experiment, the inputs are the number of ants and the number of iterations. Each ant is made to run for a specified number of iterations and constructs a neural tree whose objective function is calculated as the MSE. The ant that gives the lowest MSE is taken to be the best tree, and its parameters are then optimized with PSO and with EPSO. The tree that produces the lowest error is the optimized neural tree, and it is used to extract the informative genes.

For the breast cancer data set, the results show that the tree structure built with ACO, with parameters optimized by EPSO, achieves better accuracy than the other models. The main purpose is to compare the quality of the models, measured by the error rate, the mean absolute percentage error and the accuracy. The ACO-EPSO model has the smallest error rate of the three models. All three models were run for the same number of iterations, and the results show that ACO-EPSO reached the optimal minimum in all runs, giving better minimum points than the other models. This is depicted in the following figures.

In Figures 1 and 2, the error rate and the mean absolute percentage error of the ACO-EPSO model are lower than those of ACO and ACO-PSO.

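The ACO-PSO and ACO-EPSO models compared above differ only in the inertia-weight schedule used while tuning the parameters of the ACO-built tree. The Python sketch below implements gbest PSO with Eqs. (4) and (5) and a pluggable schedule. It is illustrative, not the authors' code, and it rests on several assumptions. It reads Eq. (7) as $W = (w - 0.4)\,e^{(\mathrm{MAXITER}-\mathrm{ITERATION})/\mathrm{MAXITER} - 1} + 0.4$. Under that reading, the schedule starts at $w$ like Eq. (6) but decays towards $0.4 + 0.5/e \approx 0.58$ rather than 0.4. The swarm size, the velocity bound `vmax = 0.5`, the initialisation in $[0, 1]^d$ and the toy `sphere` objective are also assumptions. In the FNT setting, a particle would be the vector of weights and $a_i, b_i$ of the best tree, and the objective would be the fitness of Eq. (1).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Schedule = Callable[[float, int, int], float]


def linear_inertia(w0: float, iteration: int, max_iter: int) -> float:
    """Eq. (6): decreases linearly from w0 at iteration 0 to 0.4 at max_iter."""
    return (w0 - 0.4) * (max_iter - iteration) / max_iter + 0.4


def exponential_inertia(w0: float, iteration: int, max_iter: int) -> float:
    """Assumed reading of Eq. (7): (w0 - 0.4) * exp((max_iter - iteration) / max_iter - 1) + 0.4."""
    return (w0 - 0.4) * math.exp((max_iter - iteration) / max_iter - 1.0) + 0.4


def pso(objective: Callable[[Sequence[float]], float], dim: int, inertia: Schedule,
        n_particles: int = 20, max_iter: int = 200, w0: float = 0.9,
        c1: float = 1.49, c2: float = 1.49, vmax: float = 0.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """gbest PSO that minimises `objective`; returns (best position, best fitness)."""
    rng = random.Random(seed)
    x = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                       # personal best positions p_i
    p_fit = [objective(xi) for xi in x]
    best = min(range(n_particles), key=lambda i: p_fit[i])
    g, g_fit = p[best][:], p_fit[best]            # global best position p_g
    for t in range(max_iter):
        w = inertia(w0, t, max_iter)
        for i in range(n_particles):
            for d in range(dim):
                phi1, phi2 = rng.random(), rng.random()
                # Eq. (4): inertia, cognitive and social terms
                vel = (w * v[i][d]
                       + c1 * phi1 * (p[i][d] - x[i][d])
                       + c2 * phi2 * (g[d] - x[i][d]))
                v[i][d] = max(-vmax, min(vmax, vel))  # keep the velocity within [-vmax, vmax]
                x[i][d] += v[i][d]                    # Eq. (5)
            f = objective(x[i])
            if f < p_fit[i]:                          # update p_i and, if improved, p_g
                p[i], p_fit[i] = x[i][:], f
                if f < g_fit:
                    g, g_fit = x[i][:], f
    return g, g_fit


def sphere(x: Sequence[float]) -> float:
    """Hypothetical toy objective with its minimum (0) at x = (0.3, ..., 0.3)."""
    return sum((xi - 0.3) ** 2 for xi in x)


if __name__ == "__main__":
    for schedule in (linear_inertia, exponential_inertia):
        _, best_fit = pso(sphere, dim=3, inertia=schedule)
        print(schedule.__name__, best_fit)
```

The exponential schedule keeps the inertia weight higher for longer. At mid-run it is about 0.70, versus 0.65 for the linear schedule with $w = 0.9$. As a result, the particles keep moving faster and stay more widely spread, which is the behaviour the text attributes to EPSO.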

Fig. 1: Comparison of models in terms of error rate

Fig. 2: Comparison of models in terms of mean absolute percentage error

In Figure 3, the accuracy of the ACO-EPSO model is high, which indicates that the proposed model is efficient and could be used to obtain faster convergence and a lower error rate.

Fig. 3: Comparison of models in terms of accuracy

This paper proposed a new forecasting model based on a neural tree representation evolved by ACO, with its parameters optimized by EPSO; that is, a combined ACO and EPSO approach operating on the encoded neural tree was developed. Other tree-structure-based evolutionary algorithms and parameter optimization algorithms could be employed to accomplish the same task, but the proposed model demonstrates feasibility and effectiveness. The proposed model helps to find optimal solutions with faster convergence. EPSO converges more slowly but to a low error, whereas the other methods converge faster to a larger error. The proposed method increases the possibility of finding optimal solutions, as it decreases the error rate.

[1] Swagatam Das, Ajith Abraham, Amit Konar, "Swarm Intelligence Algorithms in Bioinformatics", Studies in Computational Intelligence (SCI) 94, 113-147, Springer-Verlag Berlin Heidelberg (2008).
[2] Per Broberg, "Statistical methods for ranking differentially expressed genes", Molecular Sciences, AstraZeneca Research and Development Lund, S-221 87 Lund, Sweden, 7 May 2003.
[3] Hong Chai and Carlotta Domeniconi, "An Evaluation of Gene Selection Methods for Multiclass Microarray Data Classification", Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics.

[4] J. Jager, R. Sengupta, W. L. Ruzzo, "Improved gene selection for classification of microarrays", Pacific Symposium on Biocomputing 8:53-64, 2003.
[5] Yuehui Chen, Ajith Abraham and Bo Yang, "Feature Selection and Classification using Flexible Neural Tree", Elsevier Science, 15 January 2006.
[6] Chen, Y., Yang, B., Dong, J., "Nonlinear systems modelling via optimal design of neural trees", International Journal of Neural Systems, vol. 14, pp. 125-138, 2004.
[7] Yuehui Chen, Bo Yang and Jiwen Dong, "Evolving Flexible Neural Networks using Ant Programming and PSO Algorithm", Springer-Verlag Berlin Heidelberg, 2004.
[8] Li-ping, Z., Huan-jun, Y., Shang-xu, H., "Optimal Choice of Parameters for Particle Swarm Optimization", Journal of Zhejiang University Science, Vol. 6A(6), pp. 528-534, 2004.
[9] Sousa, T., Silva, A., Neves, A., "Particle Swarm Based Data Mining Algorithms for Classification Tasks", Parallel Computing 30, pp. 767-783, 2004.
[10] El-Desouky, N., Ghali, N., Zaki, M., "A New Approach to Weight Variation in Swarm Optimization", Proceedings of Al-Azhar Engineering, the 9th International Conference, April 12-14, 2007.
[11] Neveen I. Ghali, Nahed El-Dessouki, Mervat A. N. and Lamiaa Bakrawi, "Exponential Particle Swarm Optimization Approach for Improving Data Clustering", World Academy of Science, Engineering & Technology 42, 2008.
[12] Kennedy, J., Eberhart, R. and Shi, Y., "Swarm Intelligence", Morgan Kaufmann Academic Press, 2001.
[13] Kennedy, J., Eberhart, R., "Particle Swarm Optimization", Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 4, pp. 1942-1948, 1995.
[14] Shi, Y., Eberhart, R., "Parameter Selection in Particle Swarm Optimization", Proceedings of the 7th International Conference on Evolutionary Programming VII, pp. 591-600.
[15] Vasantha Kalyani David, Sundaramoorthy Rajasekaran, "Pattern Recognition using Neural and Functional Networks", Springer, 2009.
[16] Yuehui Chen, Ajith Abraham and Lizhi Peng, "Gene Expression Profiling using Flexible Neural Trees", Springer-Verlag Berlin Heidelberg, 2006.
[17] V. Sarvanan, R. Mallika, "An Effective Classification Model for Cancer Diagnosis using Micro Array Gene Expression Data", IEEE Xplore.
[18] Available on the UW CS ftp server, ftp, cd
[19] Yuehui Chen, Ajith Abraham and Yong Zhang, "Ensemble of Flexible Neural Trees for Breast Cancer Detection", 2003.
