A Hybrid PSO-SVM Approach for Haplotype Tagging SNP Selection Problem
IJCSIS is an open access publishing venue for research in general computer science and information security. Target Audience: IT academics, university IT faculties; industry IT departments; government departments; the mobile industry and computing industry. Coverage includes: security infrastructures, network security: Internet security, content protection, cryptography, steganography and formal methods in information security; computer science, computer applications, multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management. The average paper acceptance rate for IJCSIS issues is kept at 25-30% with an aim to provide selective research work of quality in the areas of computer science and engineering. Thanks for your contributions in September 2010 issue and we are grateful to the experienced team of reviewers for providing valuable comments.
- views:
- 87
- posted:
- 10/10/2010
- language:
- English
- pages:
- 6

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.6, 2010
A Hybrid PSO-SVM Approach for Haplotype
Tagging SNP Selection Problem
Min-Hui Lin Chun-Liang Leu
Department of Computer Science and Information Department of Information Technology, Ching Kuo
Engineering, Dahan Institute of Technology, Institute of Management and Health,
Sincheng, Hualien County , Taiwan, Republic of China Keelung , Taiwan, Republic of China
Abstract—Due to the large number of single nucleotide two categories: block-based and block-free methods. The
polymorphisms (SNPs), it is essential to use only a subset of all block-based methods [1-2] firstly partition human genome into
SNPs called haplotype tagging SNPs (htSNPs) for finding the haplotype blocks. The haplotype diversity is limited and then
relationship between complex diseases and SNPs in biomedical subsets of tagging SNPs are searched within each haplotype
research. In this paper, a PSO-SVM model that hybridizes the block. A main drawback of block-based methods is that the
particle swarm optimization (PSO) and support vector machine definition of blocks is not a standard form and there is no
(SVM) with feature selection and parameter optimization is consensus about how these blocks should be partitioned. The
proposed to appropriately select the htSNPs. Several public algorithmic framework for selecting a minimum informative
datasets of different sizes are considered to compare the proposed
set of SNPs avoiding any reference to haplotype blocks is
approach with other previously published methods. The
computational results validate the effectiveness and performance
called block-free methods [3]. In the literature [4-5], feature
of the proposed approach and the high prediction accuracy with selection technique was adopted to solve for the tagging SNPs
the fewer htSNPs can be obtained. selection problem and achieved some promising results.
Feature selection algorithms may be widely categorized
Keywords : Single Nucleotide Polymorphisms (SNPs), into two groups: the filter approach and the wrapper approach.
Haplotype Tagging SNPs (htSNPs), Particle Swarm Optimization The filter approach selects highly ranked features based on a
(PSO), Support Vector Machine (SVM). statistical score as a preprocessing step. They are relatively
computationally cheap since they do not involve the induction
algorithm. Wrapper approach, on the contrary, directly uses the
I. INTRODUCTION induction algorithm to evaluate the feature subsets. It generally
outperforms filter method in terms of classification accuracy,
The large number of single nucleotide polymorphisms but computationally more intensive. Support Vector Machine
(SNPs) in the human genome provides the essential tools for (SVM) [6] is a useful technique for data classification. A
finding the association between sequence variation and practical difficulty of using SVM is the selection of parameters
complex diseases. A description of the SNPs in each such as the penalty parameter C of the error term and the kernel
chromosome is called a haplotype. The string element of each parameter γ in RBF kernel function. The appropriate choice of
haplotype is 0 or 1, where 0 denotes the major allele and 1 parameters is to get the better generalization performance.
denotes the minor allele. The genotype is the combined
information of two haplotypes on the homologous In this paper, a hybrid PSO-SVM model that incorporates
chromosomes and is prohibitively expensive to directly the Particle Swarm Optimization (PSO) and Support Vector
determine the haplotypes of an individual. Usually, the string Machine (SVM) with feature selection and parameter
element of a genotype is 0, 1, or 2, where 0 represents the optimization is proposed to appropriately select the htSNPs.
major allele in homozygous site, 1 represents the minor allele Several public benchmark datasets are considered to compare
in homozygous site, and 2 is in the heterozygous site. The the proposed approach with other published methods.
genotyping cost is affected by the number of SNPs typed. In Experimental results validate the effectiveness of the proposed
order to reduce this cost, a small number of haplotype tagging approach and the high prediction accuracy with the fewer
SNPs (htSNPs) which predicts the rest of SNPs are needed. htSNPs can be obtained. The remainder of the paper is
organized as follows: Section 2 introduces the problem
The haplotype tagging SNP selection problem has become formulation. Section 3 describes the PSO and SVM classifier.
a very active research topic and is promising in disease In Section 4, the particle representation, fitness measurement,
association studies. Several computational algorithms have and the proposed hybrid system procedure are presented. Three
been proposed in the past few years, which can be divided into public benchmark problems are used to validate the proposed
60 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.6, 2010
approach and the comparison results are described in Section 5. where d = 1, 2,..., D , i = 1, 2,..., S , and D is the dimension of
Finally, conclusions are made in Section 6. the problem space, S is the size of population, k is the iterative
k k
times; vid is the i-th particle velocity, xid is the current particle
k
solution, pbid is the i-th particle best ( pbest ) solution achieved
II. PROBLEM FORMULATION k
so far; gbid is the global best ( gbest ) solution obtained so far by
As shown in Figure 1, assume that dataset U consists of n any particle in the population; r1 and r2 are random values in
haplotypes {hi }1≤i ≤ n , each with p different SNPs {S j }1≤ j ≤ p , U is the range [0,1], both of c1 and c2 are learning factors, usually
n×p matrix. Each row in U indicates the haplotype hi and each c1 = c2 = 2 , w is a inertia factor. A large inertia weight
column in U represents the SNP S j . The element di , j denotes facilitates global exploration, while a small one tends to local
the j-th SNP of i-th haplotype, di , j ∈ {0,1} . Our goal is to exploration. In order to achieve more refined solution, a
general rule of thumb suggests that the initial inertia value had
determine a minimum size g set of selected SNPs (htSNPs)
better be set to the maximum wmax = 0.9 , and gradually down
V = {vk }, k ∈ {1, 2,..., p} , g = V , in which each random
to the minimum wmin = 0.4 .
variable vk corresponding to the k-th SNP of haplotypes in U,
to predict the remaining unselected ones with a minimum According to the searching behavior of PSO, the gbest
prediction error. The size of V is smaller than a user-defined value will be an important clue in leading particles to the global
value R ( g ≤ R ), and the selected SNPs are called haplotype optimal solution. It is unavoidable for the solution to fall into
tagging SNPs (htSNPs) while the remaining unselected ones the local minimum while particles try to find better solutions.
are named as tagged SNPs. Thus, the selection set V of htSNPs In order to allow the solution exploration in the area to produce
is based on how well to predict the remaining set of the more potential solutions, a mutation-like disturbance operation
unselected SNPs and the number g of selected SNPs is usually is inserted between Eq. (1) and Eq. (2). The disturbance
minimized according to the prediction error by calculating the operation random selects k dimensions (1 ≤ k ≤ problem
leave-one-out cross-validation (LOOCV) experiments [7]. dimensions) of m particles (1 ≤ m ≤ particle numbers) to put
Gaussian noise into their moving vectors (velocities). The
S1 S2 L Sj L S p −1 Sp disturbance operation will affect particles moving toward to
unexpected direction in selected dimensions but not previous
h1 ⎡ d1,1 d1,2 L d1, j L d1, p −1 d1, p ⎤ experience. It will lead particle jump out from local search and
⎢d d 2,2 L d 2, j L d 2, p −1 d 2, p ⎥ further can explore more un-searched area.
h2 ⎢ 2,1 ⎥
M ⎢ M M O M N M M ⎥ According to the velocity and position updated formula
⎢ ⎥ mentioned above, the basic process of the PSO algorithm is
hi ⎢ di ,1 d i ,2 L di , j L d i , p −1 di , p ⎥
given as follows:
M ⎢ M M N M O M M ⎥
⎢ ⎥ 1.) Initialize the swarm by randomly generating initial
hn −1 ⎢ d n −1,1 d n −1,2 L d n −1, j L d n −1, p −1 d n −1, p ⎥
particles.
hn ⎢ d n ,1
⎣ d n ,2 L d n, j L d n , p −1 d n, p ⎥ ⎦ n× p 2.) Evaluate the fitness of each particle in the population.
Figure 1 The haplotype tagging SNP Selection Problem.
3.) Compare the particle’s fitness value to identify the both
of pbest and gbest values.
4.) Update the velocity of all particles using Equation (1).
5.) Add disturbance operator to moving vector (velocity).
III. RELATED WORKS
6.) Update the position of all particles using Equation (2).
7.) Repeat the Step 2 to Step 6 until a termination criterion
A. Particle Swarm Optimization is satisfied (e.g., the number of iteration reaches the pre-defined
maximum number or a sufficiently good fitness value is
The PSO is a novel optimization method originally obtained).
developed by Kennedy and Eberhart [8]. It models the The authors [8] proposed a discrete binary version to allow
processes of the sociological behavior associated with bird the PSO algorithm to operate in discrete problem spaces. In the
flocking and is one of the evolutionary computation techniques. binary PSO (BPSO), the particle’s personal best and global
In the PSO, each solution is a ‘bird’ in the flock and is referred best is updated as in continuous value. The major different
to as a ‘particle’. A particle is analogous to a chromosome in between discrete PSO with continuous version is that velocities
GA. Each particle traverses the search space looking for the of the particles are rather defined in terms of probabilities that a
global optimum. The basic PSO algorithm is as follow: bit whether change to one. By this definition, a velocity must
vid+1 = w ⋅ vid + c1 ⋅ r1 ⋅ ( pbid − xid ) + c2 ⋅ r2 ⋅ ( gbid − xid )
k k k k k k
(1) be restricted within the range [Vmin , Vmax ] . If vid+1 ∉ (Vmin , Vmax )
k
then vid+1 = max(min(Vmax , vid+1 ), Vmin ) . The new particle position
k k
xid+1 = vid+1 + xid
k k k
(2) is calculated using the following rule:
61 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.6, 2010
If rand ( ) < S (vid+1 ) , then xid+1 = 1 ; else xid+1 = 0
k k k
(3) Linear SVM can be generalized to non-linear SVM via a
mapping function Φ , which is also called the kernel function,
and the training data can be linearly separated by applying the
1
, where S (vid+1 ) =
k
k +1
(4) linear SVM formulation. The inner product (Φ ( xi ) ⋅ Φ ( x j )) is
1 + e − vid calculated by the kernel function k ( xi , x j ) for given training
The function S (vid ) is a sigmoid limiting transformation and data. By introducing the kernel function, the non-linear SVM
rand() is a random number selected from a uniform distribution (optimal hyper-plane) has the following forms:
in [0, 1]. Note that the BPSO is susceptible to sigmod function m
saturation which occurs when velocity values are either too f ( x, α * , b* ) = ∑ yiα i* < Φ ( x) ⋅ Φ ( xi ) > +b*
large or too small. For a velocity of zero, it is a probability of i =1
50% for the bit to flip. m
(8)
= ∑ yiα k ( x, xi ) + b
*
i
*
i =1
B. Support Vector Machine Classifier
Though new kernel functions are being proposed by
SVM starts from a linear classifier and searches the optimal researchers, there are four basic kernels as follows.
hyper-plane with maximal margin. The main motivating
criterion is to separate the various classes in the training set • Linear: k ( xi , x j ) = xiT x j (9)
with a surface that maximizes the margin between them. It is an
approximate implementation of the structural risk minimization • Polynomial: k ( xi , x j ) = (γ xiT x j + r ) d , γ > 0 (10)
induction principle that aims to minimize a bound on the
generalization error of a model. • RBF: k ( xi , x j ) = exp(−γ || xi − x j ||2 ), γ > 0 (11)
Given a training set of instance-label pairs • Sigmoid: k ( xi , x j ) = tanh(γ xiT x j + r ) (12)
( xi , yi ), i = 1, 2,..., m where xi ∈ R n and yi ∈ {+1, −1} . The
generalized linear SVM finds an optimal separating hyper- where γ , r and d are kernel parameters. Radial basis function
plane f ( x) = w ⋅ x + b by solving the following optimization (RBF) is a common kernel function as Eq. (11). In order to
problem: improve classification accuracy, the kernel parameter γ in the
kernel function should be properly set.
m
1 T
Minimize w w + C ∑ ξi
w , b ,ξ 2 i =1 (5) IV. METHODS
Subject to : yi (< w ⋅ xi > +b) + ξi − 1 ≥ 0, ξi ≥ 0 As the htSNPs selection problem mentioned above in
Section 2, the notations and definitions are used to present our
where C is a penalty parameter on the training error, and ξi is proposed method. In the dataset U of n×p matrix, each row
the non-negative slack variables. This optimization model can (haplotypes) can be viewed as a learning instance belonging to
be solved using the Lagrangian method, which maximizes the a class and each column (SNPs) are attributes or features based
same dual variables Lagrangian LD (α ) (6) as in the separable on which sequences can be classified into class. Given the
case. values of g htSNPs of an unknown individual x and the known
full training samples from U, a SNP prediction process can be
m
1 m treated as the problem of selecting tagging SNPs as a feature
Maximize LD (α ) = ∑ α i − ∑ α iα j yi y j < xi ⋅ x j > selection problem to predict the non-selected tagging SNPs in x.
α i =1 2 i , j =1
(6) Thus, the tagging SNPs selection can be transformed to solve
m
Subject to : 0 ≤ α i ≤ C , i = 1, 2,..., m and ∑ α i yi = 0
i =1
for a binary classification of vectors with g coordinates by
using the support vector machine classifier. Here, an effective
PSO-SVM model that hybridizes the particle swarm
To solve the optimal hyper-plane, a dual Lagrangian optimization and support vector machine with feature selection
LD (α ) must be maximized with respect to non-negative α i and parameter optimization is proposed to appropriately select
m the htSNPs. The particle representation, fitness definition,
under the constraint ∑α y
i =1
i i = 0 and 0 ≤ α i ≤ C . The penalty disturbance strategy for PSO operation and system procedure
for the proposed hybrid model are described as follows.
parameter C is a constant to be chosen by the user. A larger
value of C corresponds to assigning a higher penalty to the
errors. After the optimal solution α i* is obtained, the optimal A. Particle Representation
* *
hyper-plane parameters w and b can be determined. The The RBF kernel function is used in the SVM classifier to
optimal decision hyper-plane f ( x, α * , b* ) can be written as: implement our proposed method. The RBF kernel function
requires that only two parameters, C and γ should be set. Using
the RBF kernel for SVM, the parameters C , γ and SNPs
m
f ( x, α * , b* ) = ∑ yiα i* < xi ⋅ x > +b* = w* ⋅ x + b* (7)
i =1 viewed as input features which must be optimized
62 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.6, 2010
simultaneously for our proposed PSO-SVM hybrid system. The
particle representation consists of three parts including: C
and γ are the continuous variables, and the SNPs mask are the Data Preprocessing
discrete variables. For the feature selection, if n f features are Dataset
required to decide which features are chosen, then n f +2
decision variables in each particle must be adopted. Split dataset by LOOCV
Table 1 shows the particle representation of our design.
The representation of particle i with dimension of n f + 2 , Training set Testing set
where n f is the number of SNPs (features) that varies from
htSNPs (Feature) Selection
different datasets. xi ,1 ~ xi , n f ∈ {0,1} denotes the SNPs mask,
Selected htSNPs (features) subset F
xi , n f +1 indicates the parameter value C , xi , n f + 2 represents the PSO parameter : htSNPs mask
parameter value γ . If xi , k = 1, k = 1, 2,..., n f represents the k-th
SNP on the i-th particle to be selected, and vice versa. Testing set tagged SNPs Training set htSNPs
TABLE I. The particle i representation. SVM parameter Optimization
Discrete-variables Continuous-variables Training SVM classifier
PSO parameters : C and r
SNPs mask C γ
Learned SVM classifier
xi ,1 xi ,2 L xi , n f xi , n f +1 xi , n f + 2
PSO operation
A random key encoding method [9] is applied in the PSO Fitness calculation
algorithm. Generally, random key encoding is used for an
order-based encoding scheme where the value of random key is No PSO
Termination check ?
the genotype and the decoding value is the phenotype. Note operation
that the particle in each {xi , k }1≤ k ≤ n f is assigned a random Yes
number on (0, 1), and to decode in ascending order with regard Optimized C , r , and feature subset F
to its value. In the PSO learning process, the particle to be
counted larger tends to evolve closer to 1 and those to be
counted smaller tends to evolve closer to 0. Therefore, a repair Figure 2 The flowchart of the proposed PSO-SVM model.
mechanism such as particle amendment in [5] to guarantee the
number of htSNPs after update process in PSO is not required.
measurement mentioned above, details of the proposed hybrid
B. Fitness Measurement PSO-SVM procedure are described as follows:
Procedure PSO-SVM hybrid model
In order to compare the performance of our proposed
approach with other published methods SVM/STSA in [4] and 1.) Data preparation
BPSO in [5], the leave-one-out cross validation is used to Given a dataset U is considered using the leave-one-out
evaluate the quality of fitness measurement. The prediction cross-validation process to split the data into training and
accuracy is measured as the percentage of correctly predicted testing sets. The training and testing sets are represented as
SNP values on non-selected SNPs. In the LOOCV experiments, U TR and U TE , respectively.
each haplotype sequence is removed one by one from dataset U, 2.) PSO initialization and parameters setting
the htSNPs are selected using only the remaining haplotypes to
Set the PSO parameters including the number of iterations,
predict these tagged SNPs values for the removed one. This
procedure is repeated such that each haplotype in U is run once number of particles, velocity limitation, particle dimension,
in turn as the validation data. disturbance rate. Generate initial particles comprised of the
features mask, C and γ .
3.) Selected htSNPs (features) subset
C. The Proposed Hybrid System Procedure
Select input features for training set according to the feature
Figure 2 shows the system architecture of our proposed mask which is represented in the particle from 2), then the
hybrid model. Based on the particle representation and fitness features subset can be determined.
63 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.6, 2010
TABLE II. Results to compare PSO-SVM with SVM/STSA [4] and BPSO [5] on four real haplotype datasets.
Datasets Prediction accuracy %
(num of SNPs) 80 85 90 91 92 93 94 95 96 97 98 99
SVM/STSA 1 1 3 3 4 5 6 8 10 22 42 51
5q31 BPSO 1 1 2 3 4 5 6 7 9 14 29 42
(103)
PSO-SVM 1 1 2 2 3 4 5 6 7 10 23 36
SVM/STSA 1 1 2 5 5 6 7 8 10 15 15 24
TRPM8
BPSO 1 1 2 5 5 6 7 8 9 13 14 22
(101)
PSO-SVM 1 1 2 4 4 5 6 7 8 11 13 21
SVM/STSA 2 3 4 10 13 20 25 30 35 39 42 47
LPL
BPSO 2 3 6 9 12 16 18 21 25 28 31 37
(88)
PSO-SVM 2 3 4 7 10 12 13 17 20 22 26 31
4.) SVM model training and testing ranges of continuous type dimension parameters are:
Based on the parameters C and γ which are represented in C ∈ [10−2 ,104 ] and γ ∈ [10 −4 ,10 4 ] . The discrete type particle
the particle, to train the SVM classifier on the training dataset, for features mask, we set [Vmin , Vmax ] = [ −6, 6] , which yields a
then the prediction accuracy for SVM on the testing dataset by range of [0.9975, 0.0025] using the sigmoid limiting
LOOCV can be evaluated.
transformation by Eq. (4). Both the cognition learning factor c1
5.) Fitness calculation
For each particle, evaluate its fitness value by the prediction and the social learning factor c2 are set to 2. The disturbance
accuracy obtained from previous step. The optimal fitness rate is 0.05, and the number of generation is 600. The inertia
value will be stored to provide feedback on the evolution weight factor wmin = 0.4 and wmax = 0.9 . The linearly
process of PSO to find the increasing fitness of particle in the decreasing inertia weight is set as Eq. (13), where inow is the
next generation. current iteration and imax is the pre-defined maximum iteration.
6.) Termination check
When the maximal evolutionary epoch is reached, the inow
program ends; otherwise, go to the next step. w = wmax − ( wmax − wmin ) (13)
imax
7.) PSO operation
In the evolution process, discrete valued and continuous
valued dimension of PSO with the disturbance operator may be To compare the proposed PSO-SVM approach with the
applied to search for better solutions. SVM/STSA in [4] and BPSO in [5] on the three haplotype
datasets by LOOCV experiments, the computational results of
prediction accuracy according to the numbers of selected
V. EXPERIMENTAL RESULTS AND COMPARISONS htSNPs are summarized in Table 2. As mentioned in [4], it is
astonished that only one SNP for the 80% prediction accuracy
To validate the performance of the developed hybrid in 5q31 and TRPM8 datasets can be achieved. In practice, if
approach, three public experimental SNP datasets [4] including one guesses each SNP as 0, the prediction accuracy of 72.5%
5q31, TRPM8 and LPL are used to compare the proposed for 5q31 dataset and 79.3% for TRPM8 dataset would be
approach with other previously published methods. When there obtained. Therefore, the appropriate selection of one htSNPs to
are missing data exist in haplotype datasets, the GERBIL [4-5] correctly predict 80% on the rest of non-selected SNPs is
program is used to resolve them. The chromosome 5q31 reasonable. It is obvious that the proposed PSO-SVM hybrid
dataset was from the 616 kilobase region of human model achieves higher prediction accuracy with fewer selected
chromosome 5q31 and the SNPs were 103. The TRPM8 which htSNPs in the three haplotype datasets. In general, the
consists of 101 SNPs was obtained from HapMap. The human prediction accuracy is increased refers to the incremental
lipoprotein lipase (LPL) gene was derived from the haplotypes selected htSNPs number. From Figure 3 to Figure 5 show that
of 71 individuals typed over 88 SNPs. the numbers of selected htSNPs on haplotype datasets are
Our implementation platform was carried out on the Matlab proportional to the prediction accuracy and the PSO-SVM
7.3, a mathematical development environment by extending the algorithm has very good performance for haplotype tagging
Libsvm which is originally designed by Chang and Lin [10]. SNPs selection problem in the three testing cases.
The empirical evaluation was performed on Intel Pentium IV
CPU running at 3.4GHz and 2 GB RAM. Through initial
experiment, the parameter values of the PSO were set as
follows. The swarm size is set to 200 particles. The searching
64 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.6, 2010
VI. CONCLUSION
In this paper, a hybrid PSO-SVM model that combines the
particle swarm optimization (PSO) and support vector machine
(SVM) with feature selection and parameter optimization is
proposed to effectively solve for the haplotype tagging SNP
selection problem. Several public datasets of different sizes are
considered to compare the PSO-SVM with SVM/STSA and
BPSO previously published methods. The experimental results
show that the effectiveness of the proposed approach and the
high prediction accuracy with the fewer number of haplotype
tagging SNP can be obtained by the hybrid PSO-SVM system.
REFERENCES
Figure 3 The comparison result of prediction accuracy associated with selected [1] K. Zhang, M. Deng, T. Chen, M. Waterman and F. Sun, “A dynamic
htSNPs on 5q31 datasets. programming algorithm for haplotype block partitioning,” Proc. Natl.
Acad. Sci., vol. 99, pp. 7335–7339, 2002.
[2] K. Zhang, F. Sun, S. Waterman and T. Chen, “Haplotype block partition
with limited resources and applications to human chromosome 21
haplotype data,” Am. J. Hum. Genet., vol. 73, pp. 63–73, 2003.
[3] H. Avi-Itzhak, X. Su, and F. de la Vega, “Selection of minimum subsets
of single nucleotide polymorphisms to capture haplotype block
diversity,” In Proceedings of Pacific Symposium on Biocomputing, vol.
8, pp. 466–477, 2003.
[4] He Jingwu and A. Zelikovsky, “Informative SNP Selection Methods
Based on SNP Prediction,” IEEE Transactions on NanoBioscience, Vol.
6, pp. 60-67, 2007.
[5] Cheng-Hong Yang, Chang-Hsuan Ho and Li-Yeh Chuang, “Improved
tag SNP selection using binary particle swarm optimization,” IEEE
Congress on Evolutionary Computation (CEC 2008), pp. 854-860, 2008.
[6] V.N. Vapnik, “The nature of statistical learning theory,” New York:
Springer-Verlag, 1995.
[7] E. Halperin, G. Kimmel and R. Shamir, “Tag SNP selection in genotype
data for maximizing SNP prediction accuracy,” Bioinformatics, Vol. 21,
Figure 4 The comparison result of prediction accuracy associated with pp. 195-203, 2005.
selected htSNPs on TRPM8 datasets. [8] J. Kennedy and R. C. Eberhart, “A discrete binary version of the particle
swarm algorithm,” in Proceedings of the World Multiconference on
Systemics, Cybernetics and Informatics, Piscataway, NJ, 1997, pp.
4104-4109.
[9] J.C. Bean, “Genetics and random keys for sequencing and optimization,”
ORSA J. Comput., Vol. 6, pp. 154-160, 1994.
[10] C.C. Chang, and C.J. Lin, LIBSVM: a library for support vector
machines, Software available at:
http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
Figure 5 The comparison result of prediction accuracy associated with
selected htSNPs on LPL datasets.
65 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
Get documents about "