Classification System Optimization with Multi-Objective Genetic Algorithms†

Paulo V. W. Radtke 1,2, Robert Sabourin 1,2, Tony Wong 1
1 École de Technologie Supérieure, Montreal, Canada
2 Pontifícia Universidade Católica do Paraná, Curitiba, Brazil
† e-mail: radtke@livia.etsmtl.ca

Abstract

This paper discusses a two-level approach to optimizing classification systems with multi-objective genetic algorithms. The first level creates a set of representations through feature extraction, which is used to train a classifier set. At this point, the best-performing classifier can be selected for a single classifier system, or an ensemble of classifiers can be optimized for improved accuracy. Two zoning strategies for feature extraction are discussed and compared using global validation to select optimized solutions. Experiments conducted with isolated handwritten digits and uppercase letters demonstrate the effectiveness of this approach, which encourages further research in this direction.

Keywords: classification systems, feature extraction, ensemble of classifiers, multi-objective genetic algorithms

1. Introduction

Image-based pattern recognition (PR) requires that pixel information first be transformed into an abstract representation (a feature vector) suitable for recognition with classifiers, a process known as feature extraction. A relevant classification problem is intelligent character recognition (ICR), more specifically the offline recognition of isolated handwritten symbols on documents. A methodology to extract features must select the spatial locations at which transformations are applied to the image [1]. The choice takes into account the domain context (the type of symbols to classify) and the domain knowledge (what was previously done in similar problems). This process is usually performed by a human expert through trial and error. Furthermore, changes in the domain context may manifest in the same classification problem, which also requires changes to the classification system.

To minimize human intervention in defining and adapting classification systems, this problem is modeled as an evolutionary multi-objective optimization problem (MOOP), using the domain knowledge and the domain context. This paper details the two-level genetic approach in Fig. 1 to optimize classification systems. The first level employs the Intelligent Feature Extraction (IFE) methodology to extract feature sets that are used on the second level to optimize an ensemble of classifiers (EoC) for improved accuracy.

Figure 1. Classification system optimization approach: (a) the IFE applies the chosen feature transformations to the data set, producing the representation set RS_IFE; (b) classifier training yields the set K, from which the single classifier SI is selected or the EoC SE is optimized. Representations obtained with IFE are used to further improve accuracy with EoCs.

This paper extends the work in [2]. The new contributions lie in (1) the comparison of zoning operators for the IFE methodology with handwritten digits, and (2) the application of the best-performing operator to optimize a classification system for uppercase letters. Another difference is the use of a global validation strategy [3] to select solutions during optimization. The global validation strategy improves average results in comparison to the traditional validation approach used in [2]. The paper has the following structure. The approach to optimize classification systems is discussed in Section 2, and Section 3 discusses how the multi-objective genetic algorithms (MOGAs) were used. Section 4 details the experimental protocol and Section 5 presents the results obtained. Finally, Section 6 discusses the goals attained.

2. Classification System Optimization

Classification systems are modeled in a two-level process. The first level uses the IFE methodology to obtain the representation set RS_IFE (Fig. 1.a).
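Before each level is detailed, the two outcomes of this process can be illustrated with a minimal sketch. This is not the authors' implementation; the function names and data layout are assumptions for illustration only. It shows (a) selecting the single most accurate classifier SI from a trained set K, and (b) evaluating an EoC encoded as a binary mask over K, aggregated by majority voting:

```python
# Illustrative sketch (not the authors' code) of the two outcomes of the
# two-level process: selecting SI from the trained set K, or evaluating an
# EoC encoded as a binary mask over K via majority voting.
from collections import Counter

def select_single_classifier(error_rates):
    """Index of SI, the most accurate classifier in K."""
    return min(range(len(error_rates)), key=error_rates.__getitem__)

def ensemble_error(eoc_mask, predictions, labels):
    """Error rate of the EoC selected by `eoc_mask`, aggregating the
    active classifiers' predictions by majority voting."""
    errors = 0
    for obs, label in enumerate(labels):
        votes = [predictions[i][obs]
                 for i, active in enumerate(eoc_mask) if active]
        if Counter(votes).most_common(1)[0][0] != label:
            errors += 1
    return errors / len(labels)
```

In the paper's setting the mask is the string E optimized by a MOGA (Section 2.2); here it is simply a Python list of 0/1 values.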
The representations in RS_IFE are then used to train the classifier set K, which is considered for aggregation in an EoC SE for improved accuracy (Fig. 1.b). Otherwise, if a single classifier is desired for limited hardware, such as embedded devices, the most accurate single classifier SI may be selected from K. The next two subsections detail both the IFE and EoC optimization methodologies.

2.1. Intelligent Feature Extraction

The goal of IFE is to help the human expert define representations in the context of isolated handwritten symbols, using a wrapper approach with a fast-training classifier. IFE models handwritten symbols as features extracted from specific foci of attention on images using zoning. Two operators are used to generate representations with IFE: a zoning operator to define foci of attention over images, and a feature extraction operator to apply transformations in zones. The choice of transformations for the feature extraction operator constitutes the domain knowledge. The domain context is introduced as actual observations in the optimization data set used to evaluate and compare solutions. Hence, the zoning operator is optimized by the IFE to the domain context and domain knowledge.

The IFE structure is illustrated in Fig. 2. The zoning operator defines the zoning strategy Z = {z_1, ..., z_n}, where z_i, 1 <= i <= n, is a zone in the image I and n is the total number of zones. Pixels inside the zones in Z are transformed by the feature extraction operator into the representation F = {f_1, ..., f_n}, where f_i, 1 <= i <= n, is the partial feature vector extracted from z_i. At the end of the optimization process, the resulting representation set RS_IFE = {F_1, ..., F_p} presents the IFE user with a choice among various trade-offs with respect to the optimization objectives.

Figure 2. IFE structure: the zoning operator defines the zones z_1, ..., z_n on the image I, and the feature extraction operator produces the representation F = {f_1, ..., f_n}.

The resulting set RS_IFE is used to train a discriminating classifier set K = {K_1, ..., K_p}, where K_i is the classifier trained with representation F_i. The first hypothesis is to select the most accurate classifier SI, SI in K, for a single classifier system. The second hypothesis is to use K to optimize an EoC for higher accuracy, an approach discussed in Section 2.2. The remainder of this section discusses the IFE operators chosen for experimentation with isolated handwritten characters and the candidate solution evaluation.

2.1.1. Zoning Operators

Two zoning operators are compared: the divider zoning operator and the hierarchical zoning operator. Both are compared to a baseline representation with a high degree of accuracy on handwritten digits with a multi-layer perceptron (MLP) classifier [4]. The baseline zoning strategy, detailed in Fig. 3.b, is defined as a set of three image dividers, producing 6 zones. The divider zoning operator expands the baseline zoning concept into a set of 5 horizontal and 5 vertical dividers that can be either active or inactive, producing zoning strategies with 1 to 36 zones. Fig. 3.a details the operator template, genetically represented by a 10-bit binary string. Each bit is associated with a divider's state (1 for active, 0 for inactive).

Figure 3. Divider zoning operator: (a) the operator template with dividers d0, ..., d9, encoded as a 10-bit binary string. The baseline representation in (b) is obtained by setting only d2, d6 and d8 as active.

The hierarchical zoning operator is the second option, recursively defining a zoning strategy with the set of eight patterns in Fig. 4. Zones inside a root pattern are recursively partitioned with another pattern in the set, as illustrated in Fig. 5. That zoning strategy is described by the string ba#eg, where # is a pattern ignored in the root pattern.

Figure 4. Hierarchical recursive patterns (a)-(h), each partitioning a zone into sub-zones A to D.

Figure 5. Hierarchical zoning example, described by the string ba#eg.

For our experiments with the hierarchical zoning operator, only one level of recursion is allowed and a maximum of 16 zones can be defined. This choice avoids zones that are too small, close to pixel size, which would not contribute to classification. The operator is genetically encoded with a 15-bit binary string, where 5 patterns (one root plus four leaves) are encoded with 3 bits each. Unlike the divider zoning operator, the hierarchical zoning operator cannot reproduce the baseline representation.

2.1.2. Feature Extraction Operator

Oliveira et al. used and detailed in [4] a mixture of concavities, contour directions and black pixel surface transformations, extracting 22 features per zone (13 for concavities, 8 for contour directions and 1 for surface). To allow a direct comparison between IFE and the baseline representation, the same feature transformations (the domain knowledge) are used to assess the IFE.

2.1.3. Candidate Solution Evaluation

Candidate solutions are evaluated with respect to two objective functions: classification accuracy (wrapper mode) and cardinality. A lower representation dimensionality is associated with higher generalization power and less processing time for feature extraction and classification. Thus, the objectives are to minimize both the dimensionality (zone number) and the classification error rate on the optimization data set (the domain context).

The wrapped classifier needs to be computationally efficient and reasonably accurate to prototype IFE solutions. Kimura et al. discussed in [5] the projection distance (PD) classifier, which is fairly quick to train and to classify observations. Therefore, the PD classifier was chosen for the IFE wrapper approach.

2.2. EoC Optimization

A recent trend in PR has been to combine several classifiers to improve their overall performance. Algorithms for creating EoCs usually fall into one of two main categories: they either manipulate the training samples for each classifier in the ensemble (like Bagging and Boosting), or they manipulate the feature set used to train classifiers [6]. The key issue is to generate a set of diverse and fairly accurate classifiers for aggregation [7].

We create EoCs in a two-level process. The first level creates a classifier set K with IFE, and the second level optimizes the classifier aggregation as a MOOP. We assume that RS_IFE generates a set K of p diverse and fairly accurate classifiers. To realize this task as a MOOP, the classifiers in K are associated with a binary string E of p bits, which is optimized to select the best combination of classifiers using a MOGA. The classifier K_i is associated with the ith binary value in E, which indicates whether or not the classifier is active in the EoC.

The optimization process is guided by two objectives, EoC cardinality and EoC quality. EoC cardinality is minimized to reduce classification time, and quality is measured through the combined classifier accuracy on the optimization data set, as discussed in [8]. The optimization goal is thus to minimize both the EoC cardinality and the associated error rate on the optimization data set. Evaluating the EoC error rate requires actual classifier aggregation. The normalized continuous values of MLP outputs are aggregated by their output average [7]. To speed up the process, the MLP outputs are calculated only once and stored in memory for future aggregation. PD classifiers are aggregated by majority voting. As with MLP classifiers, PD votes are calculated only once and stored in memory.

3. Multi-Objective Genetic Optimization

Two algorithms are used in the experiments. The first was designed for the IFE methodology: the Multi-Objective Memetic Algorithm (MOMA) [9]. The second algorithm is used for the EoC optimization: the Fast Non-Dominated Sorting Genetic Algorithm (NSGA-II) [10], a well-known algorithm in the literature. MOMA was chosen over NSGA-II for the IFE methodology for its higher solution diversity to optimize an EoC later. On the other hand, NSGA-II is chosen over MOMA for EoC optimization for being faster and producing comparable results.

IFE and EoC solutions obtained by the optimization algorithm may be over-fitted to the optimization data set. To avoid over-fit after the optimization process, resulting solutions are traditionally validated with a disjoint selection data set [11] to select the most accurate solution. However, our experiments indicate that a more robust validation process is needed. With NSGA-II optimizing an EoC, Fig. 6 details all individuals in the population at generation t = 14. Fig. 6.a is the objective function space used during the optimization process (optimization data set), and Fig. 6.b is the objective function space used for validation (with the validation data set). Points are candidate solutions in the current generation (MLP EoCs). Circles represent the best optimization trade-offs, and diamonds the best trade-offs in validation. Solutions with good generalization power may be eliminated by genetic selection, which emphasizes solutions with good performance on the optimization data set (memorization). The most appropriate approach is to validate all candidate solutions during the optimization process with a selection data set and store good solutions in an auxiliary archive. This process is referred to as global validation and is further detailed in [3].

Figure 6. MLP EoC solutions (error rate versus number of classifiers) as perceived by (a) the optimization and (b) the validation processes at generation t = 14 with NSGA-II; over-fit solutions are visible in (b).

An algorithmic template for MOGAs using global validation is detailed in Algorithm 1, requiring a selection data set and an auxiliary archive S to store the validated solutions. A MOGA evolves the population Pt during mg generations. At each generation, the population Pt+1 is validated and the auxiliary archive S is updated with good solutions. As with the validation strategy used to train classifiers, this validation stage provides no feedback to the MOGA. At the end of the optimization process, the best solutions are stored in S.

Algorithm 1: Algorithmic template for a MOGA with global validation.
    Result: Auxiliary archive S
    Create initial population P1 with m individuals;
    S = {};
    t = 1;
    while t < mg do
        Evolve Pt+1 from Pt;
        Validate Pt+1 with the selection data set;
        Update the auxiliary archive S with individuals from Pt+1 based on the validation results;
        t = t + 1;
    end

4. Experimental Protocol

The tests are performed as in Fig. 1, targeting both the PD and MLP classifiers. The IFE methodology is solved to obtain the representation set RS_IFE. This set is used to train the classifier sets K_PD and K_MLP, using the PD and MLP classifiers. For a single classifier system, the most accurate classifiers SI_PD, SI_PD in K_PD, and SI_MLP, SI_MLP in K_MLP, are selected. EoCs are then created with K_PD and K_MLP, producing SE_PD and SE_MLP. Zoning strategies are compared with handwritten digits and the best-performing one is then applied to handwritten uppercase letters. The solutions obtained are compared to the baseline representation defined in [4]. All tests are replicated 30 times and average values are presented.

The data sets in Tables 1 and 2 are used in the experiments: isolated handwritten digits and uppercase letters from NIST-SD19. MLP hidden nodes are optimized as feature set cardinality fractions in the set f = {0.4, 0.45, 0.5, 0.55, 0.6}. Classifier training is performed with the training data set, except for handwritten digits with the PD classifier, which uses the smaller training' data set (to implement a computationally efficient wrapper). The validation data set is used to adjust the classifier parameters (MLP hidden nodes and PD hyperplanes). The wrapper approach is performed with the optimization data set, and the selection data set is used with the global validation strategy. Solutions are compared with the test data sets: testa and testb for digits, and test for uppercase letters.

Table 1. Handwritten digits data sets extracted from NIST-SD19.

    Data set       Size     Origin    Offset
    training'      50000    hsf_0123  1
    training       150000   hsf_0123  1
    validation     15000    hsf_0123  150001
    optimization   15000    hsf_0123  165001
    selection      15000    hsf_0123  180001
    testa          60089    hsf_7     1
    testb          58646    hsf_4     1

Table 2. Handwritten uppercase letters data sets extracted from NIST-SD19.

    Data set       Size     Origin    Offset
    training       43160    hsf_0123  1
    validation     3980     hsf_4     1
    optimization   3980     hsf_4     3981
    selection      3980     hsf_4     7961
    test           12092    hsf_7     1

The parameters used with MOMA are the following: the crossover probability is set to pc = 80%, and the mutation probability to pm = 1/L, where L is the length of the mutated binary string [12]. The maximum number of generations is set to mg = 1000, and the local search looks for n = 1 neighbors during NI = 3 iterations, with deviation a = 0%. Each slot in the archive S is allowed to store maxSl = 5 solutions.
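The global-validation template of Algorithm 1 can be sketched in a few lines. This is an illustrative skeleton only: `evolve`, `validate` and `update_archive` stand in for the MOGA-specific steps (NSGA-II or MOMA operators, selection-set evaluation, archive policy) and are not the authors' implementation.

```python
def moga_with_global_validation(init_population, evolve, validate,
                                update_archive, mg):
    """Sketch of Algorithm 1: evolve for mg generations, validating every
    individual on the selection data set and archiving good solutions.
    The validation stage gives no feedback to the MOGA itself."""
    P = init_population()      # initial population P1 with m individuals
    S = []                     # auxiliary archive S
    t = 1
    while t < mg:
        P = evolve(P)                       # evolve P_{t+1} from P_t
        scores = validate(P)                # validate on the selection set
        S = update_archive(S, P, scores)    # keep validated trade-offs
        t += 1
    return S                   # best validated solutions
```

Note that the archive update depends only on the selection-set scores, so over-fitted individuals that dominate on the optimization set do not displace better-generalizing solutions in S.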
These parameters were determined empirically. The same parameters (pc = 80%, pm = 1/L and mg = 1000) are used for NSGA-II. The population size depends on the optimization problem. To optimize the IFE with MOMA, the population size is m = 64. For the EoC optimization, m = 166 is used. Individual initialization is performed in two steps for both optimization algorithms. The first step creates one individual for each possible cardinality value (zone number for the IFE, number of aggregated classifiers for EoC optimization). The second step completes the population with individuals initialized with a Bernoulli distribution.

Experiments are conducted on a Beowulf cluster with 25 nodes (Athlon XP 2500+ processors and 1 GB RAM). The optimization algorithms were implemented using LAM MPI v6.5 in master-slave mode with a simple load balance. PD vote and MLP output calculations were performed once in parallel using a load balance strategy, and the results were stored in files to be loaded into memory for the EoC optimization process.

5. Experimental Results

The classification system is first optimized for handwritten digits using both zoning operators. Results for both the PD and MLP classifiers are indicated in Table 3. The table columns are as follows: zoning operator is the IFE zoning operator used, solution indicates the solution name, |S| the solution cardinality (feature or classifier number), HN the number of MLP hidden nodes, and the error rates on the testa and testb data sets are indicated as e_testa and e_testb respectively.

The first conclusion is that the optimized EoC SE provides lower error rates than the single classifier SI. Comparing zoning operators, the superiority of the divider zoning operator is clear, as the hierarchical zoning operator has higher error rates. Comparing results with the baseline representation defined by the human expert, we observe that the divider zoning operator outperforms the baseline representation in both SI and SE solutions, with both classifiers, whereas the hierarchical zoning operator fails to do the same.

To verify these statements, a multiple comparison is performed. A Kruskal-Wallis nonparametric test is used to test the equality of mean values, using bootstrap to create the confidence intervals from the 30 observations in each sample. The conclusions presented regarding the zoning strategies and the improvements obtained with EoCs were confirmed as true, with a confidence level of 95% (alpha = 0.05). Thus, we choose the divider zoning operator to experiment with uppercase letters, also expecting accuracy improvements with the EoC optimization.

Results obtained with uppercase letters are indicated in Table 4. Again, the baseline representation is outperformed by solutions produced by our classification system optimization approach. We observe again that the optimized EoC SE is more accurate than the single classifier SI, further justifying the choice of EoCs for robust classification systems.

Comparing solutions with the baseline representation, the average improvements obtained with the IFE and EoC approaches justify the methodology. The IFE produced the same results in the 30 replications, thus the SI error rates in Tables 3 and 4 are those of the most accurate single classifiers. For digits and the divider zoning operator, the lowest EoC error rates with PD are e_testa = 1.93% and e_testb = 5.06%, and with the MLP they are e_testa = 0.73% and e_testb = 2.31%. Finally, for uppercase letters the lowest EoC error rate with PD is e_test = 6.22%, and e_test = 3.89% with MLP.

The global validation strategy outperformed the traditional validation approach used in [2]. With handwritten digits, selecting the best EoC validated in the last population Pt yields average error rates of e_testa = 2.07% and e_testb = 5.37% with PD, and e_testa = 0.77% and e_testb = 2.42% with MLP — higher values in comparison to the EoCs in Table 3. A more complete analysis is presented in [3].

Finally, Fig. 7 details the zoning strategies used to train the SI classifiers selected from K_PD/K_MLP. Figures 7.a and 7.b detail the zoning strategies selected with handwritten digits, using the divider and hierarchical zoning operators respectively. For handwritten digits, the zoning representation was the same for both the PD and MLP classifiers. However, with uppercase letters the selected zoning representation depends on the classifier. With the PD classifier, the classifier SI_PD was trained using the representation in Fig. 7.c, while with the MLP classifier, SI_MLP used the representation in Fig. 7.d.

Comparing the results obtained to other representations in the literature, we have the following scenario. Milgram et al. experimented with isolated handwritten digits in [13]. Using the same baseline representation, they obtained error rates of 1.35% on testa with a NN classifier and 0.63% on testa with a SVM (one against all). As for handwritten uppercase letters, it is difficult to compare results directly. Differences in the experimental protocol to train and test classifiers make a direct comparison unfeasible with the results in [14, 15]. The same protocol was used in [13] with the baseline representation, yielding error rates of 7.60% with a 3-NN classifier and 3.17% with a SVM classifier (one against all). These results indicate that the use of a more discriminant target classifier may improve the results obtained with the proposed approach to optimize classification systems.
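The bootstrap step of the multiple comparison above can be sketched as follows. This is illustrative only — the paper does not state which software was used — and the percentile-bootstrap variant shown is an assumption; the Kruskal-Wallis test itself would come from a statistics package.

```python
# Illustrative sketch (not the authors' code) of building a bootstrap
# confidence interval for the mean error rate over the 30 replications
# of one experiment, as used before the Kruskal-Wallis comparison.
import random

def bootstrap_mean_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(sample, k=len(sample))) / len(sample)
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi

# Example: 30 synthetic EoC error rates around 2%.
sample = [0.019, 0.020, 0.021] * 10
lo, hi = bootstrap_mean_ci(sample)
# The interval brackets the sample mean (0.020 here).
```

Non-overlapping intervals between two solutions at alpha = 0.05 support the same conclusion as the 95% confidence level reported above.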
Table 3. Handwritten digits results — mean values on 30 replications for SI and SE.

                                  PD classifier                MLP classifier
    Zoning Operator  Solution   |S|     e_testa  e_testb    |S|     HN    e_testa  e_testb
    —                Baseline   132     2.96%    6.83%      132     60    0.91%    2.89%
    Divider          SI         330     2.18%    5.47%      330     132   0.82%    2.51%
                     SE         24.67   2.00%    5.19%      14.13   —     0.76%    2.36%
    Hierarchical     SI         242     3.46%    8.30%      242     134   1.14%    3.31%
                     SE         13.7    3.09%    7.33%      22.86   —     0.99%    2.99%

Table 4. Handwritten uppercase letters results — mean values on 30 replications for SI and SE.

                  PD classifier          MLP classifier
    Solution    |S|     e_test         |S|     HN    e_test
    Baseline    132     9.20%          132     80    5.00%
    SI          352     7.19%          220     88    4.29%
    SE          14.41   6.43%          5.37    —     4.02%

Figure 7. Zoning strategies selected: (a) digits, divider operator; (b) digits, hierarchical operator; (c) uppercase letters, PD classifier; (d) uppercase letters, MLP classifier.

6. Discussion

The methodology to optimize classification systems outperformed the baseline representation defined by a human expert. The solutions obtained are suitable for two different situations. The single classifier SI can be applied on hardware with limited processing power, and the EoC SE is suitable for classification systems running on server computers. Two IFE zoning operators were tested for feature extraction with handwritten digits, and the divider zoning operator outperformed the hierarchical zoning operator. The divider zoning operator was then used successfully with handwritten uppercase letters. Global validation also improved classification accuracy in comparison to the traditional selection method previously used.

Future work will extend the optimization of single classifier systems with feature subset selection, aiming to reduce representation complexity and classification time. Other zoning operators will be considered as well, to allow a more flexible definition of foci of attention.

Acknowledgments

The first author would like to acknowledge CAPES and the Brazilian government for supporting this research through scholarship grant BEX 2234/03-3. The other authors would like to acknowledge the NSERC (Canada) for supporting this research.

References

[1] Z.-C. Li and C. Y. Suen, "The partition-combination method for recognition of handwritten characters", Pattern Recognition Letters, Vol. 21(8): 701-720, 2000.
[2] P. V. W. Radtke, R. Sabourin and T. Wong, "Intelligent Feature Extraction for Ensemble of Classifiers", Proceedings of the 8th International Conference on Document Analysis and Recognition, Seoul, South Korea, 2005, pp. 866-870.
[3] P. V. W. Radtke, T. Wong and R. Sabourin, "An Evaluation of Over-Fit Control Strategies for Multi-Objective Evolutionary Optimization", submitted to the 2006 International Joint Conference on Neural Networks, Vancouver, Canada, 2006.
[4] L. S. Oliveira, R. Sabourin, F. Bortolozzi and C. Y. Suen, "Automatic Recognition of Handwritten Numerical Strings: A Recognition and Verification Strategy", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24(11): 1438-1454, 2002.
[5] F. Kimura, S. Inoue, T. Wakabayashi, S. Tsuruoka and Y. Miyake, "Handwritten Numeral Recognition using Autoassociative Neural Networks", Proceedings of the International Conference on Pattern Recognition, 1998, pp. 152-155.
[6] L. I. Kuncheva and L. C. Jain, "Designing Classifier Fusion Systems by Genetic Algorithms", IEEE Transactions on Evolutionary Computation, Vol. 4(4): 327-336, 2000.
[7] J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20(3): 226-239, 1998.
[8] D. Ruta and B. Gabrys, "Classifier Selection for Majority Voting", Information Fusion, Vol. 6: 63-81, 2005.
[9] P. V. W. Radtke, T. Wong and R. Sabourin, "A Multi-Objective Memetic Algorithm for Intelligent Feature Extraction", Proceedings of the Third International Conference on Evolutionary Multi-Criterion Optimization, Guanajuato, Mexico, 2005, pp. 767-781.
[10] K. Deb, S. Agrawal, A. Pratab and T. Meyarivan, "A Fast Elitist Non-Dominated Sorting Genetic Algorithm for Multi-Objective Optimization: NSGA-II", Proceedings of the Parallel Problem Solving from Nature VI Conference, Paris, France, 2000, pp. 849-858.
[11] C. Emmanouilidis, A. Hunter and J. MacIntyre, "A Multiobjective Evolutionary Setting for Feature Selection and a Commonality-Based Crossover Operator", Proceedings of the 2000 Congress on Evolutionary Computation, La Jolla, USA, 2000, pp. 309-316.
[12] A. E. Eiben, R. Hinterding and Z. Michalewicz, "Parameter Control in Evolutionary Algorithms", IEEE Transactions on Evolutionary Computation, Vol. 3(2): 124-141, 1999.
[13] J. Milgram, R. Sabourin and M. Cheriet, "Estimating Posterior Probabilities with Support Vector Machines: A Case Study on Isolated Handwritten Character Recognition", submitted to the IEEE Transactions on Neural Networks, 2006.
[14] A. L. Koerich, "Large Vocabulary Off-Line Handwritten Word Recognition", PhD thesis, École de Technologie Supérieure, Montreal, Canada, 314 p., 2002.
[15] I.-S. Oh and C. Y. Suen, "Distance features for neural network-based recognition of handwritten characters", International Journal on Document Analysis and Recognition, Vol. 1(2): 73-88, 1998.