A Hybrid Approach for DICOM Image Feature Extraction, Feature Selection Using Fuzzy Rough set and Genetic Algorithm

					                                                         (IJCSIS) International Journal of Computer Science and Information Security,
                                                         Vol. 9, No. 11, November 2011

  A Hybrid Approach for DICOM Image Feature
  Extraction, Feature Selection Using Fuzzy Rough
             set and Genetic Algorithm

                     J. Umamaheswari                                                              Dr. G. Radhamani

    Research Scholar, Department of Computer Science                                  Director, Department of Computer Science
              Dr. G.R.D College of Science,                                                 Dr. G.R.D College of Science,
              Coimbatore, Tamilnadu, India                                                 Coimbatore, Tamilnadu, India.

Abstract- This paper proposes a hybrid approach for feature extraction, feature reduction and
feature selection of medical images based on Rough set and Genetic Algorithm (GA). A Gray Level
Co-occurrence Matrix (GLCM) and histogram based texture feature set is derived. The texture
features are extracted from normal and infected Digital Imaging and Communications in Medicine
(DICOM) images by using GLCM and histogram based features. These features are taken as the input
to the feature selection process. The feature selection is carried out using Fuzzy Rough set and
GA. The resulting optimal features are used to classify the DICOM images into normal and
infected. The performance of the algorithm is evaluated on a series of DICOM datasets collected
from medical laboratories.

Keywords- Fuzzy rough set; GLCM; Texture features; Histogram features; Region features.

I.    INTRODUCTION

DICOM image analysis is becoming increasingly important in the diagnosis process. Optimal
identification and early detection of diseases, which improve the survival rate, are not easy to
achieve. In general, DICOM imaging is a valuable and reliable means of early detection.

Feature reduction for DICOM images has been addressed with statistical methods, texture based
methods and features extracted using image processing techniques [3]. Other methods are based on
fuzzy theory [1] and neural networks [2].

The lack of systematic research on the extracted features and their role in the classification
results forces researchers to select features arbitrarily as input to their systems. Genetic
algorithms have been successful in discovering an optimal or near-optimal solution amongst a huge
number of possible solutions (Goldberg 1989). Moreover, a combination of genetic algorithms and
fuzzy methods can prove to be very powerful in classification problems. Previously, genetic
algorithms have been used either to evolve neural network topology (Stathakis and Kanellopoulos
2006) [4] or to select features (Kavzoglu and Mather 2002) [5], but not both at the same time.

GLCM, histogram, level set, Gabor filters and wavelet transform [6, 7, 8, 9] are among the
approaches to the texture classification problem. Gabor filters perform poorly because their lack
of orthogonality results in redundant features, while the wavelet transform is capable of
representing textures at the most suitable scale by varying the spatial resolution, and there is
also a wide range of choices for the wavelet function.

In medical image analysis, normal and infected brains are distinguished by using texture. DICOM
and CT image texture has proved useful to identify the normal brain [10] and to detect the
diseased part of the brain [11].

Selecting the optimal features is a major problem in medical imaging. The evaluation of possible
feature subsets is usually a laborious task that requires a large amount of computational effort.
Fuzzy rough set and Genetic Algorithm (GA) appear to be a suitable approach to choose the best
feature subset while maintaining acceptable feature selection. Siedlecki and Sklansky [12]
compared the GA with classical algorithms and proposed the GA for feature selection. Fuzzy rough
set proved to be the best selection method for optimal classification.

A new method for extracting features from DICOM images with lower computational requirements is
proposed, and the selection percentage is analyzed. The tables provide the user with all relevant
information for making efficient decisions. Thus a synergy of genetic algorithms and fuzzy rough
sets is used for feature selection in the proposed method.

The rest of the paper is organized as follows. Section 2 describes the feature extraction
process. The feature selection problem is discussed in Section 3, while Section 4 contains the
experimental results. Finally, Section 5 presents the conclusion, followed by the references.

                                                                                                     ISSN 1947-5500

II.    FEATURE EXTRACTION

Feature extraction methodologies analyze objects and images to extract the features that are
representative of the various classes of objects. In this work, intensity histogram features and
Gray Level Co-occurrence Matrix (GLCM) features are extracted [12].

TABLE 3 GLCM FEATURES AND VALUES EXTRACTED FROM NORMAL & INFECTED MEDICAL IMAGES (see Section 2.2)

2.1 Intensity Histogram Features
Intensity histogram analysis has been used extensively. The intensity histogram features are
mean, variance, skewness, kurtosis, entropy and energy. These are shown in Table 1.

The average values of the intensity histogram features obtained for different types of medical
images are given in Table 2.
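
As a minimal sketch of the six intensity histogram features named in Section 2.1, the following Python function computes them from the normalized gray-level histogram of an image. Table 1 is not reproduced in this text, so the formulas used here are the usual first-order statistics and are an assumption rather than the authors' exact definitions. The function name `histogram_features` and the list-of-rows image format are illustrative only, and kurtosis is computed without the "minus 3" correction.

```python
import math
from collections import Counter


def histogram_features(image):
    """First-order texture features from the gray-level histogram.

    `image` is a non-empty list of rows of integer gray levels
    (assumed format; a DICOM pixel array would first be converted to this).
    """
    pixels = [g for row in image for g in row]
    n = len(pixels)
    # Normalized histogram: p[g] is the fraction of pixels with gray level g.
    p = {g: c / n for g, c in Counter(pixels).items()}

    mean = sum(g * pg for g, pg in p.items())
    variance = sum((g - mean) ** 2 * pg for g, pg in p.items())
    std = math.sqrt(variance)
    if std > 0:
        # Standardized third and fourth central moments.
        skewness = sum((g - mean) ** 3 * pg for g, pg in p.items()) / std ** 3
        kurtosis = sum((g - mean) ** 4 * pg for g, pg in p.items()) / std ** 4
    else:
        # A constant image has no spread; both moments are set to 0 here.
        skewness = kurtosis = 0.0
    energy = sum(pg ** 2 for pg in p.values())
    entropy = -sum(pg * math.log2(pg) for pg in p.values())

    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "energy": energy,
        "entropy": entropy,
    }


# Tiny 2 x 2 example: half of the pixels are 0 and half are 2.
features = histogram_features([[0, 0], [2, 2]])
print(features)
```

For the 2 x 2 example, the two gray levels are equally likely, so the mean and variance are both 1, the skewness is 0, and the entropy is 1 bit.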


2.2 GLCM Features
The Gray-Level Co-occurrence Matrix (GLCM), also known as the gray-level spatial dependence
matrix, is a statistical method that considers the spatial relationship of pixels. The spatial
relationship is taken between a pixel and its adjacent pixel, and other spatial relationships
between the two pixels can also be specified.

The following GLCM features were extracted in this paper: Autocorrelation, Contrast, Correlation,
Cluster Prominence, Cluster Shade, Dissimilarity, Energy, Entropy, Homogeneity, Maximum
probability, Sum of squares, Sum average, Sum variance, Sum entropy, Difference variance,
Difference entropy, Information measure of correlation (two measures), Inverse difference
normalized.

The values obtained for the above features for a typical normal and infected DICOM image are
given in Table 3.

III.   FEATURE SELECTION
Feature selection is used to improve the prediction accuracy and minimize the computation time.
It works by reducing the feature space: irrelevant, redundant and noisy features are removed,
which reduces the dimensionality. Commonly used feature selection algorithms are Sequential
Forward Selection, Sequential Backward Selection, Genetic Algorithm and Particle Swarm
Optimization. In this paper a combined approach of the fuzzy rough set method with the Genetic
Algorithm is proposed to select the optimal features. The selected optimal features are
considered for classification.

3.1 Genetic Algorithm (GA) based Feature selection:
During classification, the number of features can be large, and many of them can be irrelevant or
redundant, so the optimal solution is not reached. To solve this problem, feature reduction is
introduced to improve the process by searching for the best feature subset from the original
features.

GA is an adaptive method of global-optimization searching that simulates the behavior of the
evolution process in nature. It is based on Darwin's principle of survival of the fittest, which
states that an initial population of individuals evolves through natural selection in such a way
that the fittest individuals have a higher chance of survival.

The GA maintains a population of competing feature matrices. To evaluate each matrix in this
population, the inputs are multiplied by the matrix, producing a set of outputs which are then
sent to a classifier. The classifier typically divides the features into a training set and a
testing set to evaluate the classification accuracy. Generally, each feature is encoded into a
vector called a chromosome.

$$\mathrm{fitness} = W_A \cdot \mathrm{Accuracy} + \frac{W_{nb}}{N} \tag{1}$$

where $W_A$ is the weight of accuracy and $W_{nb}$ is the weight of the $N$ features that
participated in classification, with $N \neq 0$.

A fitness value is used to measure the fitness of a chromosome and to decide whether a chromosome
is good or not in a given population. The initial population in the genetic process is created
randomly. GA uses three operators to produce the next generation from the current generation:
reproduction, crossover and mutation. GA eliminates the chromosomes of low fitness and keeps the
ones of high fitness. Thus more chromosomes of high fitness move to the next generation. This
process is repeated until a good chromosome (individual) is found. Figure 1 illustrates feature
selection using the genetic algorithm.

FIGURE 1 FEATURE SELECTION USING GA

The total number of features extracted is 40. The features selected using the GA method are
tabulated as follows:

TABLE 4 FEATURES SELECTED BY GENETIC ALGORITHM METHOD

Table 4 above shows the features selected by the GA method.

3.2 Feature selection by Rough Set
Fuzzy set theory involves more advanced mathematical concepts, real numbers and functions,
whereas in classical set theory the notion of a set is taken as the fundamental notion of the
whole of mathematics and is used to derive other mathematical concepts, e.g., numbers and
functions [13,14].

Rough set theory can be viewed as a specific implementation of Frege's idea of vagueness, i.e.,
imprecision in this approach is expressed by a boundary region of a set, and not by a partial
membership as in fuzzy set theory. The rough set concept can be defined quite generally by means
of the topological operations interior and closure, called approximations. The concept of rough
set theory is based on the following:

3.2.1 Decision Tables
A decision table consists of two different attribute sets. One attribute set is designated to
represent the Conditions (C) and the other set represents the Decision (D). Each row of a
decision table therefore describes a decision rule, which indicates the particular decision to be
taken if its corresponding condition is satisfied. If a set of decision rules has a common
condition but different decisions, then all the decision rules belonging to this set are
inconsistent; otherwise they are consistent.

3.2.2 Dependency of Attributes
Similar to relational databases, dependencies between attributes may be discovered. If all the
values of the attributes from D are uniquely determined by the values of the attributes from C,
then D depends totally on C, or C functionally determines D, which is denoted by
$C \Rightarrow D$. If D depends on only some of the attributes of C (i.e. not on all), then it is
a partial dependency $C \Rightarrow_k D$, and a degree of dependency $k$ ($0 \le k \le 1$) can be
computed as $k = \gamma(C, D)$, where $\gamma(C, D)$ is the consistency factor of the decision
table. $\gamma(C, D)$ is defined as the ratio of the number of consistent decision rules to the
total number of decision rules in the decision table.

3.2.3 Reduction of Attributes
Decision tables in which the feature vectors are the conditions (C) and the desired values for
the corresponding classes are the decisions (D) can also represent the classification of feature
vectors. Dimensionality reduction can then simply be considered as the removal of some attributes
from the decision table (that is, some features from the feature vector) while preserving its
basic classification capability. If a decision table contains redundant or superfluous data,
those data are collected and removed.

The features selected using the Rough set method are tabulated as follows:

                 1              Kurtosis
                 2              Std
                 3              Sum Average
                 4              Sum Variance

TABLE 6 FEATURES SELECTED BY PROPOSED APPROACH (see Section 3.3)
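
The consistency factor γ(C, D) of Section 3.2.2 and the attribute removal of Section 3.2.3 can be sketched in a few lines of Python. This is a crisp rough-set illustration under stated assumptions, not the fuzzy rough set procedure used in the paper: feature values are assumed to be discretized already, attributes are dropped greedily in the order given, and the function names and the small decision table are hypothetical.

```python
from collections import defaultdict


def dependency(rows, cond, dec):
    """gamma(C, D): fraction of decision rules (rows) that are consistent.

    A rule is consistent when every row that shares its values on the
    condition attributes `cond` also shares its value on the decision `dec`.
    """
    decisions = defaultdict(set)
    for r in rows:
        decisions[tuple(r[a] for a in cond)].add(r[dec])
    consistent = sum(
        1 for r in rows if len(decisions[tuple(r[a] for a in cond)]) == 1
    )
    return consistent / len(rows)


def reduce_attributes(rows, cond, dec):
    """Greedily drop condition attributes while gamma(C, D) is preserved."""
    target = dependency(rows, cond, dec)
    kept = list(cond)
    for a in cond:
        trial = [b for b in kept if b != a]
        # Keep at least one attribute; drop `a` only if gamma does not change.
        if trial and dependency(rows, trial, dec) == target:
            kept = trial
    return kept


# Hypothetical decision table with discretized feature values.
table = [
    {"kurtosis": "high", "std": "low", "energy": "low", "label": "infected"},
    {"kurtosis": "high", "std": "high", "energy": "low", "label": "infected"},
    {"kurtosis": "low", "std": "low", "energy": "low", "label": "normal"},
    {"kurtosis": "low", "std": "high", "energy": "high", "label": "normal"},
]
conditions = ["kurtosis", "std", "energy"]

print(dependency(table, conditions, "label"))
print(reduce_attributes(table, conditions, "label"))
```

In this toy table the `kurtosis` attribute alone determines the label, so the greedy pass keeps only that attribute. In general the result of such a greedy pass depends on the order in which the attributes are tried.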
3.3 Proposed Hybrid Approach Algorithm:
1. N features are extracted from the preprocessed image using the GLCM and histogram texture
    features.
2. Apply the rough set algorithm to select the optimal set containing n1 features, where n1 < N.
3. Apply the genetic algorithm to select the best subset containing n2 features, where n2 < N.
4. Find the union of the n1 features and the n2 features to form the final n features.
5. Use the n features, where n < N, for classification.

FIGURE 2 PROPOSED APPROACH FOR FEATURE SELECTION

Figure 2 shows the feature selection by the proposed approach. Table 6 gives the features
selected by the proposed approach.

IV.   EXPERIMENTAL RESULTS
For comparison, different feature reduction methods, namely rough set, GA and the proposed
method, have been used. The feature space is formed using the DICOM images. In total, forty
features are extracted, which form the feature space. Using GA, the feature space is reduced to
eight features, and by the rough set method it is reduced to four features. The proposed method
selects only twelve features. These features improve the class prediction.

The percentage of reduction by the GA method is 80%, and 75% reduction is achieved by the rough
set method. The selected features are used for classification, which reduces the classification
time and improves the prediction accuracy. The proposed approach selects a feature space of DICOM
images which is reduced by 95%. Table 7 gives the results of the proposed method.

TABLE 7 RESULTS OBTAINED BY PROPOSED METHOD
GA method                           80%
Rough set Method                    75%
Proposed method                     95%

This shows that the proposed approach is efficient for image analysis. It is a better tool for
doctors or radiologists to classify normal brain images and infected brain images.

V.    CONCLUSION
This paper developed a hybrid technique for normal and infected DICOM images. The proposed
approach gives results in extraction and selection for classifying the images that help the
physician to make a final decision. The approach performs feature extraction, feature reduction
and feature selection of images based on Rough set and Genetic Algorithm (GA). A Gray Level
Co-occurrence Matrix (GLCM) and histogram based texture feature set is derived. The feature
selection is done by Fuzzy Rough set and GA. These optimal features are used to classify the
DICOM images into normal and infected.

The performance of the algorithm is evaluated on a series of DICOM datasets collected from
medical laboratories. The method has proved to be easier and gives desirable results for future
processing.

REFERENCES
[1] D. Brzakovic and M. Neskovic, "Mammogram screening using multiresolution based image
segmentation", International Journal of Pattern Recognition and Artificial Intelligence, Vol. 7,
No. 6, P. 1437-1460, 1993.
[2] I. Christoyianni et al., "Fast detection of masses in computer aided mammography", IEEE
Signal Processing Magazine, P. 54-64, 2000.
[3] S. Lai, X. Li and W. Bischof, "On techniques for detecting circumscribed masses in
mammograms", IEEE Trans. on Medical Imaging, Vol. 8, No. 4, P. 377-386, 1989.
[4] K. Topouzelis, D. Stathakis and V. Karathanassi, "Investigation of genetic algorithms
contribution to feature selection for oil spill detection", Vol. 30, No. 3, P. 611-625, 2009.
[5] Kavzoglu T. and Mather P.M., "The role of feature selection in artificial neural network
applications", International Journal of Remote Sensing, Vol. 23, No. 15, P. 2919-2937, 2002.
[6] Dunn C., Higgins W.E., "Optimal Gabor filters for texture segmentation", IEEE Transactions on
Image Processing, Vol. 4, No. 7, P. 947-964, 1995.
[7] Chang T., Kuo C., "Texture analysis and classification with tree structured wavelet
transform", IEEE Transactions on Image Processing, Vol. 2, No. 4, P. 429-441, 1993.
[8] Dr. H.B. Kekre, Sudeep D. Thepade, Tanuja K. Sarode and Vashali Suryawanshi, "Image Retrieval
using Texture Features extracted from GLCM, LBG and KPE", Vol. 2, No. 5, P. 1793-8201, 2010.
[9] M.M. Trivedi, R.M. Haralick, R.W. Conners and S. Goh, "Object Detection based on Gray Level
Cooccurrence", Computer Vision, Graphics, and Image Processing, Vol. 28, P. 199-219, 1984.
[10] Schad L.R., Bluml S., Zuna I., "MR tissue characterization of intracranial tumors by means
of texture analysis", Magnetic Resonance Imaging, Vol. 11, No. 6, P. 889-896, 1993.
[11] Freeborough P.A., Fox N.C., "MR image texture analysis applied to the diagnosis and tracking
of Alzheimer's disease", IEEE Transactions on Medical Imaging, Vol. 17, No. 3, P. 475-479, 1998.
[12] Serkawt Khola, "Feature Weighting and Selection A Novel Genetic Evolutionary Approach",
World Academy of Science, Engineering and Technology 73, P. 1007-1012, 2011.
[13] Ping Yao, "Fuzzy Rough Set and Information Entropy Based Feature Selection for Credit
Scoring", IEEE, P. 247-251, 2009.
[14] Pradipta Maji and Sankar K. Pal, "Fuzzy-Rough Sets for Information Measures and Selection of
Relevant Genes From Microarray Data", IEEE, Vol. 40, No. 3, P. 741-752, 2010.

AUTHORS PROFILE

Ms. J. Umamaheswari, Research Scholar in Computer Science, Dr. GRD College, Coimbatore. She has
5 years of teaching experience and two years in research. Her areas of interest include Image
Processing, Multimedia and Communication. She has more than 3 publications at international
level. She is a life member of the professional organization IAENG.

Dr. G. Radhamani, Director in Computer Science, Dr. GRD College, Coimbatore. She has more than
5 years of teaching and research experience. She has a volume of publications at international
level. Her areas of interest include Mobile computing, e-internet and communication. She is a
member of IEEE.
