"Face Recognition Using Biogeography Based Optimization"
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 5, May 2011
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Er. Navdeep Kaur Johal, Er. Poonam Gupta, Er. Amanpreet Kaur
R.I.E.I.T., Railmajra, Distt. SBS Nagar, Punjab, India
firstname.lastname@example.org, email@example.com, firstname.lastname@example.org

Abstract: Feature selection (FS) is a global optimization problem in machine learning. It reduces the number of features, removes irrelevant, noisy and redundant data, and still yields acceptable recognition accuracy. It is one of the most important steps affecting the performance of a pattern recognition system. This paper presents a novel feature selection algorithm based on Biogeography Based Optimization (BBO), a recently developed evolutionary algorithm (EA) motivated by biogeography, the study of the distribution of species over time and space. The algorithm is applied to coefficients extracted by the discrete cosine transform (DCT). The proposed BBO-based feature selection algorithm searches the feature space for the optimal feature subset, where features are selected according to a well-defined discrimination criterion. Evolution is driven by a fitness function defined in terms of maximizing the class separation (scatter index). The classifier performance and the length of the selected feature vector are considered for performance evaluation using the ORL face database. Experimental results show that the BBO-based feature selection algorithm generates excellent recognition results with a reduced set of selected features.

Keywords: Face Recognition, Biogeography Based Optimization, DCT, Feature Selection

I. INTRODUCTION

Face recognition is the process of matching an input image against a given database and returning the stored image that is most similar to the input. As one of the most successful applications of image analysis and understanding, face recognition has received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, they are still far from the capability of the human perception system. Many approaches to face recognition have been developed; an excellent survey of the different face recognition techniques can be found in [ ].

A. Feature Extraction

The first step in any face recognition system is the extraction of the feature matrix. A typical feature extraction algorithm builds a computational model through some linear or non-linear transform of the data so that the extracted features are as representative as possible. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but not much information), it is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to carry the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

Best results are achieved when an expert constructs a set of application-dependent features. If no such expert knowledge is available, general dimensionality reduction or feature extraction techniques may help. These include:

- geometrical feature extraction
- statistical (algebraic) feature extraction [2 - 8]

The geometrical approach represents the face in terms of structural measurements and distinctive facial features. These include distances and angles between the most characteristic face components such as eyes, nose and mouth, or facial templates such as nose length and width, mouth position, and chin type. These features are used to recognize an unknown face by matching it to the nearest neighbor in the stored database. Statistical feature extraction is usually driven by algebraic methods such as principal component analysis (PCA) and independent component analysis (ICA) [ ]. These methods find a mapping from the original feature space to a lower dimensional feature space.

Alternative algebraic methods are based on transforms such as downsampling, the Fourier transform (FT), the discrete cosine transform (DCT), and the discrete wavelet transform (DWT). Transformation-based feature extraction methods such as the DCT have been found to generate good FR accuracies with very low computational cost [ ].

B. Discrete Cosine Transform

The DCT has emerged as a popular transformation technique widely used in signal and image processing. This is due to its strong "energy compaction" property: most of the signal information tends to be concentrated in a few low-frequency components of the DCT. The use of the DCT for feature extraction in FR has been described by several research groups [9-15]. The DCT transforms the input into a linear combination of weighted basis functions. These basis functions are the frequency components of the input data.

The DCT of an N x M image f(x, y) is defined by the following equation:

$$F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} \cos\left[\frac{\pi u}{2N}(2x+1)\right] \cos\left[\frac{\pi v}{2M}(2y+1)\right] f(x,y) \qquad \text{(i)}$$

where f(x, y) is the intensity of the pixel in row x and column y; u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1; and the functions α(u), α(v) are defined as:

$$\alpha(u), \alpha(v) = \begin{cases} \sqrt{1/N} & \text{for } u, v = 0 \\ \sqrt{2/N} & \text{for } u, v \neq 0 \end{cases} \qquad \text{(ii)}$$

(For a non-square image, N is replaced by M in α(v).)

For most images, much of the signal energy lies at low frequencies (corresponding to large DCT coefficient magnitudes); these are located in the upper-left corner of the DCT array. Conversely, the lower-right values of the DCT array represent higher frequencies, and turn out to be small enough to be truncated or removed with little visible distortion. This means that the DCT is an effective tool that can pack the most useful features of the input image into the fewest coefficients.

C. Feature Selection

After extracting the features, we need a minimal subset of features that is still sufficient to recognize the face. For this reason we need a feature selection algorithm that removes as many of the irrelevant and redundant features obtained during feature extraction as possible while maintaining acceptable classification accuracy. Among the various methods proposed for FS, population-based optimization algorithms such as Genetic Algorithm (GA)-based methods [16-18] and Ant Colony Optimization (ACO)-based methods have attracted a lot of attention [ ]. In the proposed FR system we use an evolutionary feature selection algorithm based on swarm intelligence called Biogeography Based Optimization, which is explained in the next section.

D. Biogeography based Optimization

Biogeography is the study of the distribution of biodiversity over space and time. It aims to analyze where organisms live, and in what abundance. Biogeography is modeled in terms of such factors as habitat area, immigration rate and emigration rate, and describes the evolution, extinction and migration of species. Biogeography-Based Optimization (BBO) [ ] is a new biogeography-inspired algorithm for global optimization, which is similar to the island model-based GAs [ ]. Each individual is considered as a "habitat" with a habitat suitability index (HSI) that measures the quality of the individual. The variables of the individual that characterize habitability are called suitability index variables (SIVs). In BBO, each individual has its own immigration rate λ and emigration rate µ. Both rates are functions of the number of species in the habitat. They can be calculated as follows:

$$\lambda_k = I\left(1 - \frac{k}{n}\right) \qquad \text{(iii)}$$

$$\mu_k = E\,\frac{k}{n} \qquad \text{(iv)}$$

where I is the maximum possible immigration rate; E is the maximum possible emigration rate; k is the number of species of the kth individual; and n is the maximum number of species. Note that Eqs. (iii) and (iv) are just one method for calculating λ and µ; there are other options for assigning them based on different species models [ ].

In BBO, there are two main operators: migration and mutation. Suppose that we have a global optimization problem and a population of candidate individuals. Each individual is represented by a D-dimensional integer vector of SIVs. The population consists of NP = n parameter vectors Xi, i = 1, ..., NP. One option for implementing the migration operator and the mutation operator is described in Figures 1 and 2, respectively, where rndreal(0, 1) is a uniformly distributed random real number in (0, 1) and Xi(j) is the jth SIV of the solution Xi. mi is the mutation rate, calculated as:

$$m_i = m_{max}\left(1 - \frac{P_i}{P_{max}}\right) \qquad \text{(v)}$$

where m_max is a user-defined parameter, and P_max = max P_i, i = 1, ..., NP. Each population member has an associated probability, which indicates the likelihood that it was expected a priori to exist as a solution to the given problem. The steady-state value for the probability of each species count is given by [ ]:

$$P_k = \begin{cases} \dfrac{1}{1 + \sum_{l=1}^{n} \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{l-1}}{\mu_1 \mu_2 \cdots \mu_l}}, & k = 0 \\[3ex] \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{k-1}}{\mu_1 \mu_2 \cdots \mu_k \left(1 + \sum_{l=1}^{n} \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{l-1}}{\mu_1 \mu_2 \cdots \mu_l}\right)}, & 1 \le k \le n \end{cases} \qquad \text{(vi)}$$

The largest possible number of species that the habitat can support is n. It is necessary that µ_k ≠ 0 for all k for these limiting probabilities to exist.

    for i = 1 to NP
        Select Xi with probability ∝ λi
        if rndreal(0, 1) < λi then
            for j = 1 to NP do
                Select Xj with probability ∝ µj
                if rndreal(0, 1) < µj then
                    Randomly select an SIV σ from Xj
                    Replace a random SIV in Xi with σ
                end if
            end for
        end if
    end for

Figure 1: Algorithm for Habitat Migration.

    for i = 1 to NP
        Compute the probability Pi
        Select SIV Xi(j) with probability ∝ Pi
        if rndreal(0, 1) < mi then
            Replace Xi(j) with a randomly generated SIV
        end if
    end for

Figure 2: Algorithm for Habitat Mutation.

With the migration operator, BBO can share information between solutions. A good solution is analogous to a habitat with a high HSI, and a poor solution represents a habitat with a low HSI. High HSI solutions resist change more than low HSI solutions. By the same token, high HSI solutions tend to share their features with low HSI solutions. (This does not mean that the features disappear from the high HSI solution; the shared features remain in the high HSI solutions, while at the same time appearing as new features in the low HSI solutions. This is similar to representatives of a species migrating to a habitat, while other representatives remain in their original habitat.) Poor solutions accept a lot of new features from good solutions. This addition of new features to low HSI solutions may raise the quality of those solutions. Good solutions have a high emigration rate and share their features (SIVs) with bad solutions, which have a high immigration rate. Additionally, the mutation operator tends to increase the diversity of the population. The BBO algorithm is described by the pseudo-code in Figure 3. Here H indicates a habitat, HSI is the fitness, an SIV (suitability index variable) is a solution feature, λ denotes the immigration rate and µ denotes the emigration rate.

    Biogeography-Based Optimization (BBO)
    Begin
        /* BBO parameter initialization */
        Create a random set of habitats (population) H1, H2, ..., Hn;
        Compute corresponding HSI values;
        /* End of BBO parameter initialization */
        While not T    /* T is a termination criterion */
            Compute immigration rate λ and emigration rate µ for each habitat based on HSI;
            /* Migration */
            Apply migration as defined in Figure 1.
            /* End of migration */
            /* Mutation */
            Apply mutation as defined in Figure 2.
            /* End of mutation */
            Recompute HSI values;
        End while
    End

Figure 3: Main BBO Algorithm

II. BBO-BASED FEATURE SELECTION

In the proposed work, the features of an image are extracted using the DCT. The extracted features are then reduced further using Biogeography Based Optimization to remove redundant and irrelevant features. The resulting feature subset (obtained by BBO) is taken as the most representative subset and is used to recognize the face from the face gallery.

A. Habitat Representation

Each habitat represents one possible solution (feature subset) for face recognition. Each of the features extracted by the DCT of the image corresponds to one Suitability Index Variable (SIV) of the habitat. During feature subset selection each of these features is either selected or rejected, so SIV ∈ C is an integer with C = {0, 1}. A habitat H ∈ SIV^m, where m is the length of the feature vector extracted by the DCT.

B. SIV Mutation

A habitat is chosen for mutation based on the mutation rate and species count probabilities defined in Eqs. (v) and (vi). Once a habitat is selected for mutation, a random SIV is selected; it is mutated to 0 if its value is 1, or vice versa. Therefore, if a particular feature was previously selected, it is rejected after mutation, and vice versa.

C. Habitat Suitability Index

In each generation, each habitat is evaluated, and a value of goodness or fitness is returned by a fitness function. The evolution is driven by the fitness function F, which evaluates the quality of a habitat in terms of its ability to maximize the class separation term indicated by the scatter index among the different classes [ ]. Let w1, w2, ..., wL and N1, N2, ..., NL denote the classes and the number of images within each class, respectively. Let M1, M2, ..., ML and M0 be the means of the corresponding classes and the grand mean in the feature space. Mi can be calculated as:

$$M_i = \frac{1}{N_i} \sum_{j=1}^{N_i} W_j^{(i)}, \quad i = 1, 2, \ldots, L \qquad \text{(vii)}$$

where W_j^{(i)}, j = 1, 2, ..., Ni, represents the sample images from class wi. The grand mean M0 is:

$$M_0 = \frac{1}{N} \sum_{i=1}^{L} N_i M_i \qquad \text{(viii)}$$

where N is the total number of images of all the classes. The between-class scatter fitness function F is then computed as follows, with the means treated as row vectors so that each term is a scalar:

$$F = \sum_{i=1}^{L} (M_i - M_0)(M_i - M_0)^t \qquad \text{(ix)}$$

D. Classifier

After the training phase, the Euclidean distance is used to measure the similarity between the test vector and the reference vectors in the gallery. The Euclidean distance is defined as the straight-line distance between two points. In N-dimensional space, the Euclidean distance between any two points p and q is given by:

$$D = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2} \qquad \text{(x)}$$

where pi (or qi) is the coordinate of p (or q) in dimension i.

E. Proposed BBO-Based Feature Selection Algorithm

In the proposed work (Figure 4), the features of an image are extracted using the DCT. These extracted features are further reduced (selected) using BBO. Each SIV of each habitat is initially set to either 0 or 1 at random, so the initial feature subsets are random; the BBO iterations then steer the population toward the best set of features among those given. The stopping criterion of the proposed algorithm is the number of iterations. At the end of the training phase, we have the selected set of features. These features are then picked from the test image and from the face gallery. The test image is recognized as the gallery face that has the minimum Euclidean distance from the test image on the basis of these selected features.

1. Feature Extraction: Obtain the DCT array by applying the Discrete Cosine Transform to the image.
2. Take the most representative features of size n x n from the upper-left corner of the DCT array.
3. Feature Selection: Apply the BBO algorithm defined in Figure 3 to obtain the feature subset of the extracted features.
4. Pick the habitat H with the maximum HSI value. The SIVs of this habitat H represent the best feature subset of the features defined in step 2. (Feature Selection ends.)
5. Classification: Calculate the difference between the feature subset (obtained in step 4) of each image of the face gallery and that of the test image using the Euclidean distance defined in Eq. (x). The index of the image that has the smallest distance to the image under test is taken as the required index.

Figure 4: Face Recognition using BBO based Feature Selection

III. EXPERIMENTAL RESULTS

The performance of the proposed feature selection algorithm is evaluated using the standard Cambridge ORL gray-scale face database. The ORL database of faces contains a set of face images taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK (later AT&T Laboratories Cambridge) [ ], [ ]. The database is composed of 400 images corresponding to 40 distinct persons. The original size of each image is 92x112 pixels, with 256 grey levels per pixel. Each subject has 10 different images taken in various sessions, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). Four images per person were used in the training set and the remaining six images were used for testing.

TABLE I. BBO parameter setting

    Size of ecosystem (number of habitats)    30
    Number of iterations of BBO algorithm     100
    SIV value                                 0 or 1

In this work, we test the BBO-based feature selection algorithm with feature vectors based on various numbers of DCT coefficients. The 2-dimensional DCT is applied to the input image and only a subset of the DCT coefficients, corresponding to the upper-left corner of the DCT array, is retained. Subset sizes of 50x50, 40x40, 30x30 and 20x20 of the original 92x112 DCT array are used in this work. Each 2-dimensional DCT subset is converted to a 1-dimensional array using a raster scan, that is, by processing the array row by row and concatenating the consecutive rows into a column vector. This column vector is the input to the subsequent BBO feature selection algorithm.

To calculate the average recognition rate for each problem instance (20X20, 30X30, 40X40, and 50X50 DCT array), a test image is randomly chosen from the 40 classes. Five trials are run for each problem instance. The average recognition rate is the fraction of the 5 trials in which the correct face was identified. It is reported together with the CPU training time and the average number of selected features for each problem instance. The algorithm has been implemented in Matlab 7 and the results for each problem instance are shown in Table II.

TABLE II. Results of BBO-FS algorithm

    DCT feature    Features input    Average no. of features    Training time    Average
    vector size    to BBO-FS         selected by BBO-FS         (seconds)        recognition rate
    20X20          400               219                        85.743           100%
    30X30          900               451                        95.166           100%
    40X40          1600              814                        136.178          100%
    50X50          2500              1243                       165.101          100%

For each problem instance (20X20, 30X30, 40X40, and 50X50), the algorithm is run 5 times, and each time a random test image is chosen to be matched against the face gallery. The test face matched an image of the correct person in every trial, so the average recognition rate is 100% for each problem instance. The BBO feature selection algorithm reduces the feature vector to roughly half of its original size: about 54.8%, 50.1%, 50.9%, and 49.7% for the 20X20, 30X30, 40X40, and 50X50 instances respectively (219/400, 451/900, 814/1600, and 1243/2500 in Table II). For example, if the DCT of an image is calculated and the 20X20 subset is taken from the upper left of the DCT array, there are 400 features in total, which are given as input to the BBO-FS algorithm. BBO-FS reduces the 400 features to 219, which means only 219 features are required to recognize the face from the face gallery.

IV. CONCLUSION

In this paper, a novel BBO-based feature selection algorithm for FR is proposed. The algorithm is applied to feature vectors extracted by the Discrete Cosine Transform and is used to search the feature space for the optimal feature subset. Evolution is driven by a fitness function defined in terms of class separation. The classifier performance and the length of the selected feature vector were considered for performance evaluation using the ORL face database. In the reported experiments, the BBO-based feature selection algorithm reached a 100% recognition rate over five trials per problem instance while retaining roughly half of the extracted DCT features.

REFERENCES

[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face Recognition: A Literature Survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399-458, 2003.
[2] R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, 1993.
[3] C. Liu and H. Wechsler, "Evolutionary Pursuit and Its Application to Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 570-582, 2000.
[4] M. A. Turk and A. P. Pentland, "Face Recognition using Eigenfaces," Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 586-591, June 1991.
[5] L. Du, Z. Jia, and L. Xue, "Human Face Recognition Based on Principal Component Analysis and Particle Swarm Optimization-BP Neural Network," Proc. 3rd Conference on Natural Computation (ICNC 2007), vol. 3, pp. 287-291, August 2007.
[6] X. Yi-qiong, L. Bi-cheng, and W. Bo, "Face Recognition by Fast Independent Component Analysis and Genetic Algorithm," Proc. of the 4th International Conference on Computer and Information Technology (CIT'04), pp. 194-198, Sept. 2004.
[7] X. Fan and B. Verma, "Face recognition: a new feature selection and classification technique," Proc. 7th Asia-Pacific Conference on Complex Systems, December 2004.
[8] A. S. Samra, S. E. Gad Allah, and R. M. Ibrahim, "Face Recognition Using Wavelet Transform, Fast Fourier Transform and Discrete Cosine Transform," Proc. 46th IEEE International Midwest Symp. Circuits and Systems (MWSCAS'03), vol. 1, pp. 272-275, 2003.
[9] A. S. Samra, S. E. Gad Allah, and R. M. Ibrahim, "Face Recognition Using Wavelet Transform, Fast Fourier Transform and Discrete Cosine Transform," Proc. 46th IEEE International Midwest Symp. Circuits and Systems (MWSCAS'03), vol. 1, pp. 272-275, 2003.
[10] Z. Yankun and L. Chongqing, "Efficient Face Recognition Method based on DCT and LDA," Journal of Systems Engineering and Electronics, vol. 15, no. 2, pp. 211-216, 2004.
[11] C. Podilchuk and X. Zhang, "Face Recognition Using DCT-Based Feature Vectors," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'96), vol. 4, pp. 2144-2147, May 1996.
[12] F. M. Matos, L. V. Batista, and J. Poel, "Face Recognition Using DCT Coefficients Selection," Proc. of the 2008 ACM Symposium on Applied Computing (SAC'08), pp. 1753-1757, March 2008.
[13] M. Yu, G. Yan, and Q.-W. Zhu, "New Face recognition Method Based on DWT/DCT Combined Feature Selection," Proc. 5th International Conference on Machine Learning and Cybernetics, pp. 3233-3236, August 2006.
[14] Z. Pan and H. Bolouri, "High Speed Face Recognition Based on Discrete Cosine Transform and Neural Networks," Technical Report, Science and Technology Research Center (STRC), University of Hertfordshire.
[15] Z. M. Hafed and M. D. Levine, "Face Recognition Using Discrete Cosine Transform," International Journal of Computer Vision, vol. 43, no. 3, pp. 167-188, 2001.
[16] X. Fan and B. Verma, "Face recognition: a new feature selection and classification technique," Proc. 7th Asia-Pacific Conference on Complex Systems, December 2004.
[17] D.-S. Kim, I.-J. Jeon, S.-Y. Lee, P.-K. Rhee, and D.-J. Chung, "Embedded Face Recognition based on Fast Genetic Algorithm for Intelligent Digital Photography," IEEE Trans. Consumer Electronics, vol. 52, no. 3, pp. 726-734, August 2006.
[18] M. L. Raymer, W. F. Punch, E. D. Goodman, L. A. Kuhn, and A. K. Jain, "Dimensionality Reduction Using Genetic Algorithms," IEEE Trans. Evolutionary Computation, vol. 4, no. 2, pp. 164-171, July 2000.
[19] H. R. Kanan, K. Faez, and M. Hosseinzadeh, "Face Recognition System Using Ant Colony Optimization-Based Selected Features," Proc. IEEE Symp. Computational Intelligence in Security and Defense Applications (CISDA 2007), pp. 57-62, April 2007.
[20] D. Simon, "Biogeography-based optimization," IEEE Transactions on Evolutionary Computation, vol. 12, no. 6, pp. 702-713, 2008.
[21] D. Whitley, S. Rana, and R. B. Heckendorn, "The island model genetic algorithm: on separability, population size and convergence," Journal of Computing and Information Technology, vol. 7, pp. 33-47, 1998.
[22] W. Gong, Z. Cai, C. X. Ling, and H. Li, "A real-coded biogeography-based optimization with mutation," Applied Mathematics and Computation, vol. 216, no. 9, pp. 2749-2758, 201
[23] C. Liu and H. Wechsler, "Evolutionary Pursuit and Its Application to Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 570-582, 2000.
[24] F. M. Matos, L. V. Batista, and J. Poel, "Face Recognition Using DCT Coefficients Selection," Proc. of the 2008 ACM Symposium on Applied Computing (SAC'08), pp. 1753-1757, March 2008.
[25] H. R. Kanan, K. Faez, and M. Hosseinzadeh, "Face Recognition System Using Ant Colony Optimization-Based Selected Features," Proc. IEEE Symp. Computational Intelligence in Security and Defense Applications (CISDA 2007), pp. 57-62, April 2007.

AUTHORS PROFILE

Navdeep Kaur completed a B.Tech (Hons.) in Computer Science & Engineering, scoring 81% marks, at Punjab Technical University, Jalandhar (India) in 2005, and an M.Tech in Computer Science & Engineering at Guru Nanak Dev Engineering College, Ludhiana, India in 2009. She is currently working as a lecturer in the Computer Science and IT department of Rayat Institute of Engineering & Information Technology, India.

Er. Poonam Gupta completed a B.Tech (First Division) in Information Technology, scoring 67.64% marks, at Kurukshetra University, Kurukshetra (India) in 2007, and an M.Tech in Computer Science & Engineering at Rayat Institute of Engineering & Information Technology, Railmajra, India in 2011. She is currently working as a lecturer in the Computer Science and IT department of Rayat Polytechnic College (Evening shift), Punjab, India. She has presented 3 papers in national conferences.

Amanpreet Kaur completed a B.Tech in Computer Science & Engineering, scoring 73% marks, at Punjab Technical University, Jalandhar (India) in 2007, and an M.Tech in the same stream at Rayat Institute of Engineering & Information Technology, Railmajra, Punjab, India in 2010, scoring 71% marks in her postgraduate degree. She is working as a lecturer in the Computer Science & I.T. department at Rayat Institute of Engineering & Information Technology, Railmajra, Punjab, India.
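
Illustrative sketch for the feature extraction step (Discrete Cosine Transform section; steps 1 and 2 of Figure 4). The paper reports a Matlab 7 implementation but does not list it, so the Python below is only a minimal sketch under stated assumptions, not the authors' code. It evaluates the DCT of Eq. (i) as two separable one-dimensional passes, keeps the upper-left n x n block and raster-scans it into a flat feature vector. The function names (dct_basis, dct2, dct_features) are hypothetical, and the normalization of α(v) by the number of columns M is the usual orthonormal convention, assumed here.

```python
import math

def dct_basis(size):
    """Rows of the orthonormal DCT-II basis: alpha(u) * cos(pi * u * (2x + 1) / (2 * size))."""
    basis = []
    for u in range(size):
        alpha = math.sqrt(1.0 / size) if u == 0 else math.sqrt(2.0 / size)
        basis.append([alpha * math.cos(math.pi * u * (2 * x + 1) / (2 * size))
                      for x in range(size)])
    return basis

def dct2(image):
    """2-D DCT of a list-of-rows image, evaluated as two separable 1-D passes."""
    n_rows, n_cols = len(image), len(image[0])
    row_basis, col_basis = dct_basis(n_rows), dct_basis(n_cols)
    # Pass 1: transform every row along y, giving temp[x][v].
    temp = [[sum(col_basis[v][y] * image[x][y] for y in range(n_cols))
             for v in range(n_cols)] for x in range(n_rows)]
    # Pass 2: transform every column along x, giving coeffs[u][v].
    return [[sum(row_basis[u][x] * temp[x][v] for x in range(n_rows))
             for v in range(n_cols)] for u in range(n_rows)]

def dct_features(image, n):
    """Keep the upper-left n x n block of the DCT array and raster-scan it row by row."""
    coeffs = dct2(image)
    return [coeffs[u][v] for u in range(n) for v in range(n)]

if __name__ == "__main__":
    flat = [[1.0] * 6 for _ in range(4)]  # constant 4 x 6 test "image"
    print(dct_features(flat, 2))          # only the first (DC) coefficient is non-zero
```

For a 92x112 ORL image, dct_features(image, 20) would give the 400-element vector used in the 20X20 problem instance. The pure-Python loops are written for readability, not speed.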
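
Illustrative sketch for Eqs. (iii) and (iv) and the migration operator of Figure 1. This is one possible reading, not the authors' implementation. The paper does not say how the species count k is obtained from the HSI, so the rank-based mapping in species_counts_from_hsi is an assumption, as are all function names. The migrate function follows Figure 1 literally (a random SIV of X_j overwrites a random SIV of X_i); skipping the case j = i is an added assumption.

```python
import random

def migration_rates(species_counts, n_max, max_immigration=1.0, max_emigration=1.0):
    """Eqs. (iii)-(iv): lambda_k = I * (1 - k / n) and mu_k = E * k / n."""
    lam = [max_immigration * (1.0 - k / n_max) for k in species_counts]
    mu = [max_emigration * k / n_max for k in species_counts]
    return lam, mu

def species_counts_from_hsi(hsi_values):
    """Assumption: rank habitats by HSI; the worst gets count 1, the best gets count NP."""
    order = sorted(range(len(hsi_values)), key=lambda i: hsi_values[i])
    counts = [0] * len(hsi_values)
    for rank, index in enumerate(order, start=1):
        counts[index] = rank
    return counts

def migrate(population, lam, mu, rng):
    """Figure 1, read literally: X_i imports a random SIV from each emigrating X_j."""
    new_population = [list(habitat) for habitat in population]
    for i, target in enumerate(new_population):
        if rng.random() >= lam[i]:
            continue  # habitat i does not immigrate in this generation
        for j, source in enumerate(population):
            if j != i and rng.random() < mu[j]:
                sigma = rng.choice(source)                  # random SIV from X_j
                target[rng.randrange(len(target))] = sigma  # overwrite a random SIV of X_i
    return new_population

if __name__ == "__main__":
    rng = random.Random(1)
    habitats = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0]]
    counts = species_counts_from_hsi([0.7, 0.1, 0.4])
    lam, mu = migration_rates(counts, n_max=len(habitats))
    print(counts, lam, mu)
    print(migrate(habitats, lam, mu, rng))
```

Because each SIV position stands for a specific DCT coefficient in this paper, a variant that copies the SIV at the same index from X_j to X_i may be more natural for feature selection; the text does not say which variant was used.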
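
Illustrative sketch for Eqs. (v) and (vi) and the SIV mutation described in the SIV Mutation section. The steady-state probabilities are computed from running products of λ and µ, the mutation rate follows Eq. (v), and a mutation flips one randomly chosen 0/1 SIV. Function names are hypothetical, and the indexing convention (lists lam[0..n] and mu[0..n], with mu[0] unused) is an assumption made for this sketch.

```python
import random

def species_probabilities(lam, mu):
    """Eq. (vi): steady-state P_0..P_n from lam[0..n] and mu[0..n] (mu[0] is not used)."""
    n = len(lam) - 1
    ratios = [1.0]  # ratios[l] = (lam_0 ... lam_{l-1}) / (mu_1 ... mu_l)
    for l in range(1, n + 1):
        ratios.append(ratios[-1] * lam[l - 1] / mu[l])
    total = sum(ratios)  # equals 1 + sum over l = 1..n of ratios[l]
    return [r / total for r in ratios]

def mutation_rate(p_i, p_max, m_max):
    """Eq. (v): m_i = m_max * (1 - P_i / P_max)."""
    return m_max * (1.0 - p_i / p_max)

def mutate(habitat, rate, rng):
    """With probability `rate`, flip one randomly chosen binary SIV (select <-> reject)."""
    mutated = list(habitat)
    if rng.random() < rate:
        j = rng.randrange(len(mutated))
        mutated[j] = 1 - mutated[j]
    return mutated

if __name__ == "__main__":
    n = 4
    lam = [1.0 - k / n for k in range(n + 1)]  # Eq. (iii) with I = 1
    mu = [k / n for k in range(n + 1)]         # Eq. (iv) with E = 1
    probs = species_probabilities(lam, mu)
    print(probs)
    print([mutation_rate(p, max(probs), 0.1) for p in probs])
    print(mutate([0, 1, 0, 1, 1], 1.0, random.Random(0)))
```

With the linear rates of Eqs. (iii) and (iv), I = E and n = 4, the probabilities come out as 1/16, 4/16, 6/16, 4/16, 1/16, so habitats with very low or very high species counts receive the largest mutation rates.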
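
Illustrative sketch for the Habitat Suitability Index of Eqs. (vii) to (ix). It computes the class means and grand mean over the features whose SIV is 1, then sums the squared distances of the class means from the grand mean. This assumes that Eq. (ix) is a plain sum of inner products; the printed equation is damaged in the source, and if a square root was present the ranking of habitats would be unchanged. The data layout and the name scatter_index are hypothetical.

```python
def scatter_index(samples_by_class, mask):
    """Between-class scatter over the selected features.

    samples_by_class: list of classes, each a list of equal-length feature vectors.
    mask: list of 0/1 SIVs, one per feature (1 = feature selected).
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # assumption: an empty feature subset has zero fitness
    class_means, class_sizes = [], []
    for samples in samples_by_class:
        n_i = len(samples)
        # Eq. (vii): mean of class i, restricted to the selected features.
        class_means.append([sum(s[j] for s in samples) / n_i for j in selected])
        class_sizes.append(n_i)
    total = sum(class_sizes)
    # Eq. (viii): grand mean as the size-weighted average of the class means.
    grand = [sum(n_i * m[d] for n_i, m in zip(class_sizes, class_means)) / total
             for d in range(len(selected))]
    # Eq. (ix): sum over classes of (M_i - M_0)(M_i - M_0)^t.
    return sum(sum((m[d] - grand[d]) ** 2 for d in range(len(selected)))
               for m in class_means)

if __name__ == "__main__":
    # Two classes; feature 0 separates them, feature 1 is identical everywhere.
    classes = [[[0.0, 5.0], [0.0, 5.0]], [[2.0, 5.0], [2.0, 5.0]]]
    for mask in ([1, 1], [1, 0], [0, 1]):
        print(mask, scatter_index(classes, mask))
```

Under this reading each selected feature contributes a non-negative term, so F never decreases when a feature is added. Any preference for shorter subsets would have to come from another part of the procedure, which the text does not specify.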
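
Illustrative sketch for the classifier of Eq. (x) and step 5 of Figure 4. The test vector and every gallery vector are reduced to the features selected by the best habitat, and the gallery index with the smallest Euclidean distance is returned. The function names and the list-based data layout are hypothetical choices for this sketch.

```python
import math

def euclidean_distance(p, q):
    """Eq. (x): straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def apply_mask(vector, mask):
    """Keep only the features whose SIV is 1."""
    return [value for value, bit in zip(vector, mask) if bit]

def recognize(test_vector, gallery, mask):
    """Return the index of the gallery vector closest to the test vector."""
    reduced_test = apply_mask(test_vector, mask)
    distances = [euclidean_distance(reduced_test, apply_mask(g, mask)) for g in gallery]
    return min(range(len(gallery)), key=distances.__getitem__)

if __name__ == "__main__":
    gallery = [[0.0, 0.0, 9.0], [3.0, 4.0, 0.0]]
    test = [3.0, 4.0, 9.0]
    print(recognize(test, gallery, [1, 1, 0]))  # compares on the first two features only
    print(recognize(test, gallery, [0, 0, 1]))  # compares on the last feature only
```

The example shows that the selected subset determines the match: the same test vector is assigned to different gallery entries depending on which features the mask keeps.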