					International Journal of Application or Innovation in Engineering & Management (IJAIEM)
       Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 1, Issue 1, September 2012                                       ISSN 2319 - 4847

           A Novel Approach for Classify Different
         Classes in Large Dimensional problems based
             on Feature Extraction and Selection
                                                      Fernando Emami
                                        School of Computer Science and Engineering,
                                             Higher Institute of Gaya, Portugal

Feature extraction and feature selection are two general approaches to dimensionality reduction, which is still a major problem in pattern recognition. In this paper, a novel approach is proposed to classify different classes in high-dimensional problems. It uses a two-layer feature reduction. First, a parallel cooperative feature selection is applied to the data. Second, the data are transformed into a new feature space. Among the many search methods that can choose the best subset of features in a dataset, Tabu Search (TS) is a suitable and progressive one. After enough iterations to satisfy the objective function, a selection step between the two parallel searches yields the best subset. The data restricted to this subset are then mapped into a new space, where noisy features are removed by a feature extraction approach. Direct linear discriminant analysis (D-LDA) is used as an efficient feature extraction method. Finally, the data are classified by a Support Vector Machine (SVM), a widely used classifier. Filters and wrappers are the two traditional types of objective functions for feature selection. Both criteria have been implemented, and their results on standard UCI datasets are reported. The results show the superiority of our combined approach over traditional methods.
Keywords: Feature selection, Tabu search, Direct linear discriminant analysis, Filter and Wrapper, Davies-Bouldin

   The curse of dimensionality, introduced by Bellman, is one of the most important issues in classifying data with many input dimensions. Feature selection and feature extraction are common solutions [1, 2, 3]. This article outlines an approach that exploits the advantages of feature selection methods. The aim is to find an appropriate way to map the feature vectors to a new space of lower dimension and then to classify the test data with an SVM classifier.

Feature subset selection (FSS) requires two components. The first is a search strategy that selects candidate subsets. The second is an objective function that evaluates these candidates and returns their "goodness" value, which the search strategy then uses as a feedback signal to select new candidates. Exponential, sequential and randomized algorithms are the usual search methods for selecting a subset of features. Among the sequential algorithms, forward and backward selection are defined. Backward selection starts from the full set and repeatedly removes the feature whose removal causes the smallest decrease in the value of the objective function. Exhaustive searches are expensive and time-consuming but can find the global minimum. Tabu Search is a thorough, though not exhaustive, search method whose running time depends on the initial condition. Therefore, in this study we run the search from two different initializations to speed up execution, and then apply a selection step to the best subsets they return.
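For illustration, the backward selection strategy described above can be sketched as follows. This is a minimal generic sketch, not the paper's method, which uses Tabu Search. The names `backward_selection`, `toy_cost` and `USEFUL` are hypothetical. The objective is assumed to be a cost to minimize, so "smallest decrease in goodness" becomes "lowest cost after the removal".

```python
from typing import Callable, FrozenSet, List


def backward_selection(
    n_features: int,
    objective: Callable[[FrozenSet[int]], float],
    target_size: int,
) -> List[int]:
    """Sequential backward selection.

    Starts from the full feature set and repeatedly removes the feature whose
    removal gives the lowest cost, until `target_size` features remain.
    `objective` is assumed to be a cost to minimize (e.g. an error rate).
    """
    selected = frozenset(range(n_features))
    while len(selected) > target_size:
        # Evaluate every one-feature-smaller subset and keep the cheapest.
        selected = min((selected - {f} for f in selected), key=objective)
    return sorted(selected)


# Hypothetical toy objective: features 0 and 2 are informative, 1 and 3 are noise.
USEFUL = {0, 2}


def toy_cost(subset: FrozenSet[int]) -> float:
    # Keeping a noisy feature costs 1; dropping an informative one costs 10.
    return len(subset - USEFUL) + 10 * len(USEFUL - subset)


result = backward_selection(4, toy_cost, target_size=2)
print(result)  # [0, 2]
```

Each pass evaluates every remaining feature once. Shrinking from d features therefore needs O(d^2) objective evaluations, which is why more guided searches are attractive in large dimensions.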
   Objective functions are usually divided into two groups: filters and wrappers [12]. From an experimental research point of view, each group has advantages and disadvantages. In the filter approach, feature subset selection is performed independently of classifier training and is regarded as a preprocessing step for induction. This is computationally more efficient, but it ignores the fact that the optimal choice of features depends on the classifier model. The wrapper approach is more complex than the filter approach, but it takes the interaction between features and classifier into account. The proposed method uses both: a filter criterion, and the accuracy rate of a well-established classifier as a wrapper criterion. It has also been observed that in high-dimensional problems, some noisy features inadvertently reduce classification accuracy and distort the slope of the separating hyperplane (a line in 2-D). Hence, a two-layer feature reduction algorithm is proposed here to mitigate these effects. In the first step, two parallel Tabu searches are used for feature selection, and a different initial condition is assigned to each
Tabu search. The output of each search is then passed to a selection function, which chooses the best subset by taking the cooperation between the two searches into account. The second step maps the data to a lower-dimensional space with an effective feature extraction algorithm. Direct Linear Discriminant Analysis (D-LDA) has recently been used successfully as an efficient feature extraction method, so it is adopted in this study. Tabu search is described in section two, a brief definition of D-LDA is given in section three, and the classification of the reduced data is described in section four. Results and discussion of applying the proposed methods to the UCI datasets are presented in the results section.

    As described above, a two-layer feature reduction is proposed here. The first layer is feature selection and the second is feature extraction. Applying both methods in sequence combines the advantages of both groups. Tabu search (TS) is the feature selection algorithm in the first layer. The initial subset is known to be critical in TS, so in this paper two TS runs are started from two different initial points, and each produces its own subset of features. Each TS run outputs a binary array whose length equals the total number of features: the element for a selected feature is set to one and the element for a non-selected feature is set to zero. A simple selection function combines the two resulting subsets by applying an OR operation to the two arrays. Subsection A briefly explains the TS algorithm and the objective functions used in this study.
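The selection function described above (an element-wise OR of the two binary arrays) is small enough to show directly. This is a minimal Python sketch that assumes masks are plain lists of 0/1 values. The name `fuse_masks_or` and the example masks are hypothetical.

```python
from typing import List


def fuse_masks_or(mask_a: List[int], mask_b: List[int]) -> List[int]:
    """Combine two binary feature masks (1 = selected, 0 = not selected)
    with an element-wise logical OR."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same number of features")
    return [int(a or b) for a, b in zip(mask_a, mask_b)]


# Hypothetical outputs of the two Tabu Search runs on a 6-feature dataset.
ts_run_1 = [1, 0, 1, 0, 0, 1]
ts_run_2 = [1, 1, 0, 0, 0, 1]
fused = fuse_masks_or(ts_run_1, ts_run_2)
print(fused)  # [1, 1, 1, 0, 0, 1]

# Column indices passed on to the feature extraction layer.
selected_indices = [i for i, bit in enumerate(fused) if bit]
print(selected_indices)  # [0, 1, 2, 5]
```

OR keeps every feature chosen by at least one search. This favors keeping useful features over compactness, and the following D-LDA layer reduces the dimension further.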
    In the second layer, D-LDA, a well-known feature extraction method, is applied to the output of the first layer's selection function. D-LDA is described in subsection B. Fig. 1 illustrates the flowchart of the proposed reduction method.
    A. Tabu Search. As described by Glover [6], Tabu Search (TS) is a meta-heuristic superimposed on another heuristic. A memory forces the search to keep exploring the search space, so it does not get trapped in local minima. The memory has one entry per feature. It is initialized with an arbitrary value, which sets the number of iterations during which the presence or absence of a feature may not be changed.
    Both filter and wrapper objective functions have been tested in our study. The Davies-Bouldin criterion serves as the filter and the classification error as the wrapper, and both are minimized. The details of the classifier are explained in section IV. The Davies-Bouldin index measures cluster/class separation and is defined as [9]:
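$$DB(C) = \frac{1}{n}\sum_{i=1}^{n}\max_{j \neq i}\left\{\frac{\Delta(c_i) + \Delta(c_j)}{\delta(c_i, c_j)}\right\} \qquad (1)$$

where $\Delta(c_i)$ is the intra-cluster/class distance, $\delta(c_i, c_j)$ is the inter-cluster/class distance, and $n$ is the number of clusters/classes. A lower value of index (1) indicates greater separability of the feature subset. The pseudocode of Tabu Search is given below: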
     Feature Selection Subroutine
            Construct initial solution: assignment(i) = 1 for all i
            Construct: tabu tenure initial, Memory(i) = 0 for all i, maxItr, and BestValue = inf
            for Iteration = 1 : maxItr
                if (IsSatisfied(BestValue)) break
                for all features k
                    if (Memory(k) == 0)
                        call evaluate(k); keep the feature with the minimum evaluation as minindex
                end for
                assignment(minindex) = Not(assignment(minindex))
                for all features i with Memory(i) > 0
                    Memory(i) = Memory(i) - 1
                end for
                Memory(minindex) = initial
                if (evaluation of minindex < BestValue)
                    BestValue = evaluation of minindex
                    Answer = assignment
            end for
            End of Subroutine

            evaluate Subroutine (index)
                Not(assignment(index)) on a temporary copy of assignment
                return davies_bouldin_measure         (filter criterion)
                or return classification_error_rate   (wrapper criterion)
            End of evaluate Subroutine
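The Tabu Search pseudocode above can be turned into a short runnable sketch. It assumes a cost function over binary masks (lower is better), a fixed tabu tenure, and no aspiration criterion. The names `tabu_feature_selection` and `toy_cost` and the toy data are hypothetical; they illustrate the search mechanics, not the paper's exact implementation. The last lines mimic two differently initialized runs combined by an OR.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Mask = Tuple[int, ...]


def tabu_feature_selection(
    n_features: int,
    cost: Callable[[Mask], float],
    max_iter: int = 50,
    tenure: int = 3,
    target: Optional[float] = None,
    initial: Optional[Sequence[int]] = None,
) -> Tuple[Mask, float]:
    """Tabu Search over binary feature masks (1 = feature kept, 0 = dropped).

    Each iteration flips the non-tabu feature whose flip gives the lowest
    cost, then forbids flipping it again for `tenure` iterations. `cost`
    can be a filter (e.g. Davies-Bouldin) or a wrapper (e.g. an SVM error
    rate); lower is better. Reaching `target` plays the role of IsSatisfied.
    """
    current: List[int] = list(initial) if initial is not None else [1] * n_features
    memory = [0] * n_features          # remaining tabu iterations per feature
    best_mask: Mask = tuple(current)
    best_value = cost(best_mask)
    for _ in range(max_iter):
        if target is not None and best_value <= target:
            break
        # Evaluate every admissible (non-tabu) single-feature flip.
        candidates = []
        for k in range(n_features):
            if memory[k] == 0:
                trial = current.copy()
                trial[k] = 1 - trial[k]
                candidates.append((cost(tuple(trial)), k))
        if not candidates:              # every move is currently tabu
            break
        move_value, min_index = min(candidates)
        current[min_index] = 1 - current[min_index]   # take the best move
        # Age the tabu memory, then make the moved feature tabu.
        memory = [max(m - 1, 0) for m in memory]
        memory[min_index] = tenure
        if move_value < best_value:
            best_value, best_mask = move_value, tuple(current)
    return best_mask, best_value


# Hypothetical toy objective: features 0 and 3 are informative, the rest noise.
USEFUL = {0, 3}


def toy_cost(mask: Mask) -> float:
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept - USEFUL) + 10 * len(USEFUL - kept)


# Two runs from different starting subsets, fused with an element-wise OR.
mask, value = tabu_feature_selection(5, toy_cost, tenure=2, target=0)
m2, _ = tabu_feature_selection(5, toy_cost, tenure=2, target=0, initial=[0, 0, 1, 0, 1])
fused = tuple(a | b for a, b in zip(mask, m2))
print(mask, value, fused)  # (1, 0, 0, 1, 0) 0 (1, 0, 0, 1, 0)
```

Keeping the best mask separate from the current one lets the search accept worsening moves, which is how the tabu memory helps it leave local minima.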

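The Davies-Bouldin filter criterion of Eq. (1) can be computed as below. The paper does not specify how the intra- and inter-class distances are measured. This sketch assumes the common centroid-based choice: the mean distance of a class's samples to its centroid, and the distance between centroids. The function `davies_bouldin` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def _centroid(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def davies_bouldin(X: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Davies-Bouldin index of Eq. (1); lower means better-separated classes.

    Assumed distances: intra-class = mean Euclidean distance of the class's
    samples to their centroid; inter-class = distance between centroids.
    """
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for x, label in zip(X, labels):
        groups[label].append(x)
    classes = sorted(groups)
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    centroid = {c: _centroid(groups[c]) for c in classes}
    intra = {
        c: sum(math.dist(p, centroid[c]) for p in groups[c]) / len(groups[c])
        for c in classes
    }
    total = 0.0
    for i in classes:
        # Worst (largest) similarity ratio of class i to any other class.
        total += max(
            (intra[i] + intra[j]) / math.dist(centroid[i], centroid[j])
            for j in classes
            if j != i
        )
    return total / len(classes)


# Hypothetical toy data: feature 0 separates the classes, feature 1 only adds spread.
X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
y = [0, 0, 1, 1]
print(davies_bouldin(X, y))                        # 0.2
print(davies_bouldin([[row[0]] for row in X], y))  # 0.0 (feature 0 only)
```

To use the index as the TS objective, the data are first restricted to the columns whose mask bit is 1, as in the second call. Identical class centroids make the ratio undefined (division by zero), so a production version would need a guard for that case.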
    B. Direct Linear Discriminant Analysis. D-LDA is based on the "simultaneous diagonalization" of the between-class scatter matrix ($S_b$) and the within-class scatter matrix ($S_w$). This is an alternative linear-algebra approach to the generalized eigenvalue problem $S_b w_i = \lambda_i S_w w_i$. Traditional methods first diagonalize $S_w$; D-LDA instead first whitens (diagonalizes and scales) $S_b$ and then diagonalizes $S_w$ (see the steps below) [5].

  1: Diagonalize $S_b$ by eigenanalysis: find a matrix $V$ such that $V^T S_b V = \Lambda$, where $V^T V = I$ and $\Lambda$ is diagonal. Keep only the components with non-zero eigenvalues (at most the number of classes minus one). Let $Y$ be the new basis and $D_b$ the diagonal matrix of the corresponding non-zero eigenvalues, so that $Y^T S_b Y = D_b$.
  2: Project and diagonalize $S_w$. Let $Z = Y D_b^{-1/2}$ (whitening $S_b$). Factorize $Z^T S_w Z$ by eigenanalysis, $U^T (Z^T S_w Z) U = D_w$, where $U^T U = I$ and $D_w$ is diagonal. Keep the eigenvectors with the smallest eigenvalues.
  3: Reconstruct the matrix of feature vectors $W = Z U D_w^{-1/2}$. For a given input vector $x$, its projection in the feature space is $x^* = W^T x$.
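The three D-LDA steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions:

- The scatter matrices are formed explicitly as d x d arrays, which is practical only for moderate d.
- Eigenvalues of $S_b$ below a relative tolerance are treated as zero.
- Near-zero eigenvalues in $D_w$ are clipped before the $D_w^{-1/2}$ scaling.

The function `direct_lda` and the synthetic data are hypothetical.

```python
import numpy as np


def direct_lda(X: np.ndarray, y: np.ndarray, n_components=None, tol: float = 1e-10) -> np.ndarray:
    """Direct LDA following steps 1-3 above. Returns W (d x m) so that a
    sample x is projected as x* = W.T @ x (for a data matrix: X @ W)."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter
        centered = Xc - mc
        Sw += centered.T @ centered            # within-class scatter

    # Step 1: diagonalize Sb and keep the eigenvectors with non-zero eigenvalues.
    evals_b, V = np.linalg.eigh(Sb)
    keep = evals_b > tol * evals_b.max()
    Y, Db = V[:, keep], evals_b[keep]

    # Step 2: whiten Sb (Z = Y Db^{-1/2}), then diagonalize Z^T Sw Z.
    Z = Y / np.sqrt(Db)                        # scales column j by Db[j] ** -0.5
    evals_w, U = np.linalg.eigh(Z.T @ Sw @ Z)  # eigenvalues in ascending order
    if n_components is not None:               # keep the smallest-eigenvalue directions
        U, evals_w = U[:, :n_components], evals_w[:n_components]

    # Step 3: W = Z U Dw^{-1/2}; clip tiny eigenvalues to avoid division by zero.
    return (Z @ U) / np.sqrt(np.maximum(evals_w, tol))


# Hypothetical synthetic data: two classes separated along axis 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 3)),
               rng.normal(size=(50, 3)) + np.array([5.0, 0.0, 0.0])])
y = np.array([0] * 50 + [1] * 50)
W = direct_lda(X, y)
projected = X @ W          # row k is x_k* = W^T x_k
print(W.shape)             # (3, 1)
```

$S_b$ has rank at most (number of classes minus one), so this two-class example yields a single discriminant direction.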

                                          Table 1: Characteristics of the data sets

                                        Figure 1: Flowchart of proposed algorithm

  Support Vector Machine [13,14,15] implements the structural risk minimization (SRM) principle, in which the empirical risk is kept fixed while the VC confidence interval is minimized. The success of SVM rests on the mathematical foundations of statistical learning theory. Instead of merely minimizing the training error, SVM minimizes the structural risk, which expresses an upper bound on the generalization error. Assuming a linear decision boundary, the central idea is to find a weight vector W such that the margin is as large as possible.
  Assuming that the data are linearly separable, we seek the smallest possible W, which corresponds to the largest separation (margin) between the two classes. This can be formally expressed as a quadratic optimization problem:

$$\min_{W,\,b}\ \frac{1}{2}\lVert W \rVert^2 \qquad (2)$$
$$\text{subject to}\quad y_i\,(W^T x_i + b) \ge 1, \quad i = 1, \dots, N \qquad (3)$$

  where $x_i$ is a training sample with class label $y_i \in \{-1, +1\}$, $N$ is the number of training samples, and $b$ is the bias of the maximum-margin hyperplane, which is determined by the vector $W$. By transforming the above convex optimization problem into its dual problem, the
solution is found in the form

$$W = \sum_{i} a_i\, y_i\, x_i \qquad (4)$$

  where $a_i$ is a Lagrange multiplier. It is nonzero only for those data points that satisfy the constraints in (3) with equality, and these data samples are called support vectors. SVM is a local method in the sense that the solution is determined solely by the support vectors; all other data points are irrelevant to the decision hyperplane [13].
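The paper solves the margin-maximization problem through its dual. As a self-contained illustration, the sketch below swaps in a different, simpler solver: Pegasos-style stochastic sub-gradient descent on the primal soft-margin (hinge-loss) objective. On separable data it only approximates the hard-margin problem above. The names `train_linear_svm` and `predict` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Linear soft-margin SVM trained by Pegasos-style stochastic sub-gradient
    descent on the primal objective  lam/2 * ||w||^2 + mean(hinge loss).

    Labels must be -1/+1. The bias is learned by appending a constant 1 to
    every sample, so it is regularized along with W (a simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]   # augmented samples [x, 1]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                              # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]           # regularizer step
            if margin < 1.0:                                   # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the decision function W^T x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical linearly separable toy data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

For real datasets, a dedicated solver (for example an SMO-based dual QP implementation) would normally be preferred.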

   To assess the performance of the proposed method, we used six standard datasets available from the UCI machine learning repository (Table 1).
                           Table 2: Results on the datasets for the two types of objective function

   For each dataset, the available samples are divided into a 90% training set and a 10% testing set following 10-fold cross-validation [16]. The results of our method are summarized in Table 2. For each dataset, TS is run with both the SVM error rate and the Davies-Bouldin criterion as the objective function.
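The 10-fold protocol can be sketched as below. The samples are shuffled once and split into ten folds. Each fold serves once as the 10% test set while the other nine form the 90% training set. The paper does not say whether the folds were stratified by class; this sketch is unstratified, and `k_fold_splits` is a hypothetical helper.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds.

    Samples are shuffled once, then dealt round-robin into k folds; each fold
    is the test set exactly once while the other k-1 folds form the training
    set (about 90% / 10% for k = 10).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


splits = list(k_fold_splits(100, k=10))   # 100 samples: a hypothetical dataset size
print([(len(tr), len(te)) for tr, te in splits][:2])  # [(90, 10), (90, 10)]
```

In a full pipeline, the TS and D-LDA steps would typically be fit on each training split only and then applied to the matching test split, so the test fold does not influence feature selection.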

             Figure 2: SVM accuracy rate on different datasets with various feature reduction approaches

   As expected, Fig. 2 shows that combining the selection and extraction methods eliminates noisy features from the input data. The selected subset of features also has more discriminative power than when TS or D-LDA is used alone. Moreover, the resulting error rates are promising on several of the examined datasets, such as Wine and WDBC. Using two versions of TS with different initial subsets lets the search visit more locations (subsets) in the feature space, which increases exploration, while each TS
individually tries to exploit the feature space. In addition, a threshold on the objective value in the TS algorithm controls the degree of exploitation. The presented framework therefore balances the exploration-exploitation trade-off to some extent.

   In this paper, we outlined a two-step feature reduction algorithm to enhance classification performance. A key point is that the proposed algorithm does not depend on the type of objective function: it shows good accuracy with both filter and wrapper criteria. Applying feature selection and feature extraction in sequence also eliminates most of the noisy features from the feature set. Our results show that the proposed method yields a higher classification rate on some datasets. Moreover, the algorithm can cope with noisy features in the training data.

  [1] K. J. Cios, W. Pedrycz, and R. W. Swiniarski, "Data Mining Methods for Knowledge Discovery", chapter 9, Kluwer Academic Publishers, 1998.
  [2] R. O. Duda and P. E. Hart, "Pattern Classification and Scene Analysis", 1st edition, John Wiley & Sons, New York.
  [3] K. Fukunaga, "Introduction to Statistical Pattern Recognition", 2nd edition, Academic Press, 1990.
  [4] V. N. Vapnik, "Statistical Learning Theory", Wiley, New York, 1998.
  [5] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data, with application to face recognition", Pattern Recognition, Vol. 34, pp. 2067-2070, 2001.
  [6] Q. Ke, T. Jiang, and S. D. Ma, "A tabu search method for geometric primitive extraction", Pattern Recognition Letters, Vol. 18, Issue 14, pp. 1443-1449, 1997.
  [7] M. A. Tahir, A. Bouridane, and F. Kurugollu, "Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier", Pattern Recognition Letters, Vol. 28, Issue 4, pp. 438-446, 2007.
  [8] K. W. Lau and Q. H. Wu, "Leave one support vector out cross validation for fast estimation of generalization errors", Pattern Recognition, Vol. 37, Issue 9, pp. 1835-1840, 2004.
  [9] D. Davies and D. Bouldin, "A Cluster Separation Measure", IEEE Trans. Pattern Anal. Mach. Intell., Vol. 1, No. 2, pp. 224-227, 1979.
  [10] D. W. Aha and R. L. Bankert, "A Comparative Evaluation of Sequential Feature Selection Algorithms", Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, pp. 1-7, 1995.
  [11] L. C. Molina, L. Belanche, and À. Nebot, "Feature Selection Algorithms: A Survey and Experimental Evaluation", Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), p. 306, December 9-12, 2002.
  [12] R. Kohavi and G. H. John, "Wrappers for Feature Subset Selection", Artificial Intelligence, Vol. 97, pp. 273-324, 1997.
  [13] V. Cherkassky and F. Mulier, "Learning from Data: Concepts, Theory and Methods", Wiley-Interscience, 1998.
  [14] V. Vapnik, "The Nature of Statistical Learning Theory", Springer-Verlag, New York, 1995.
  [15] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, "Least Squares Support Vector Machines", World Scientific Pub. Co., Singapore, 2003.
  [16] O. Dubrule, "Cross validation of kriging in a unique neighborhood", Mathematical Geology, 2 August 1982.
