SITES: www.informatik.uni_trier.de www.ncst.ernet.in www.cse.iith.ac.in www.amse.modeling.org
HINDI CHARACTER RECOGNITION – A FUZZY BASED APPROACH
The main challenge in handwritten character recognition is developing a method that can generate a description of handwritten objects in a short period of time. In our view, fuzzy logic is the most efficient method for this task. Our paper, entitled “Hindi character recognition – A Fuzzy approach”, deals with the recognition of handwritten Hindi characters. It involves two main steps:
1. Preprocessing
2. Character recognition
The most tedious task in character recognition with fuzzy logic is forming the rule base that describes the characters to be recognized. The problem is complicated because different people write the same character in completely different ways. This paper describes a method for generating a fuzzy value database that describes characters written by different individuals.
INTRODUCTION:
Handwritten characters differ from person to person. Traditional methods such as neural networks therefore need extensive training of the system. For this reason we developed a system that recognizes features in handwritten characters using fuzzy logic. In this paper, entitled “Hindi character recognition – A fuzzy approach”, we attempt to recognize handwritten Hindi characters without using any formulae, relying instead on a simple algorithm. The approach combines image processing and fuzzy logic.
We selected fuzzy logic for the following reasons:
1. It can be used to model human perception.
2. The mathematics that fuzzy logic requires is extremely fundamental.
3. Any algorithm developed using fuzzy logic requires relatively simple and short calculations.
Preprocessing is relevant to character recognition because it makes the raw input data suitable for applying the fuzzy values. In the character recognition process, the preprocessing stage is image compression, which reduces the handwritten character to a manageable size for the recognizer. It also makes the given character noise free. The amount of compression needed is application specific.
IMAGE COMPRESSION ALGORITHM:
The image compression algorithm used here is run length coding (RLC). The image (in bitmap format) is composed of a matrix of binary valued pixels, because the input character is written in black on a white background. Hence the character pattern, which is the area containing information, is represented by the pixel value ‘1’ and the rest of the image by ‘0’. RLC is a technique used to reduce the size of a repeating string of characters. This repeating string is called a run. RLC can compress any type of data regardless of its information content, although the content of the data affects the compression ratio achieved. It is easy to implement and quick to execute. Run length encoding is supported by most bitmap file formats, such as TIFF, BMP and PCX.
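As an illustration of the run idea described above, the following is a minimal sketch of run length coding applied to one row of binary pixels. The function names `rle_encode_row` and `rle_decode_row` and the (value, run length) pair representation are our assumptions for illustration; the text does not specify how runs are stored.

```python
from itertools import groupby


def rle_encode_row(row):
    """Encode one row of binary pixels as (value, run_length) pairs.

    Assumption: a run is stored as a (pixel value, length) tuple.
    """
    return [(value, len(list(group))) for value, group in groupby(row)]


def rle_decode_row(runs):
    """Rebuild the original pixel row from (value, run_length) pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# One row of a character bitmap: 1 = ink, 0 = background.
row = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
runs = rle_encode_row(row)
print(runs)
print(rle_decode_row(runs) == row)
```

Long runs of background pixels collapse into a single pair, which is why the achieved compression depends on the image content.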
DATA FLOW DIAGRAM FOR PREPROCESSING: Figure 1
Matrix format: 0’s and 1’s

Notation    Description
SD          Scanned Document
FE          Feature Extraction
HC          Horizontal Compression
VC          Vertical Compression
T           Trimming
Fuzzy rules are based on the character set and the features under consideration. We use 10 features, and with these 10 features we can define the fuzzy Hindi grammar. The features are as follows.
FEATURE NAMES: F1 to F10 (the feature shapes shown in the original table are not recoverable)
With the help of the above features we can define any character of the Hindi alphabet.
Consider the following examples.
The training mode consists of two main steps:
1. Identifying the crucial pattern
2. Forming the fuzzy database for the combination of crucial and confusional patterns
1. IDENTIFYING THE CRUCIAL PATTERN:
Crucial pattern: the major, important features of a character, which identify it correctly.
Confusion pattern: features that may be common to several characters and may therefore make it hard to recognize them correctly.
Figure 2: Original character, crucial feature, and confusional feature
Here only the crucial features are considered: the fuzzy rules are created only for those crucial features, and their fuzzy values are specified. In some cases the characters are recognized using the crucial features alone, without considering the confusional features. In the other cases this step still reduces the overhead, because the search of the secondary fuzzy database is limited to the entries that have those crucial fuzzy values.
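The narrowing effect of the crucial features can be sketched as follows. This is only an illustration under our own assumptions: the rule base `CRUCIAL_RULES`, the labels `char_a` to `char_c`, and the feature assignments are hypothetical and are not taken from the paper's rule base.

```python
# Hypothetical rule base: character label -> set of crucial features.
# The labels and feature assignments are invented for illustration.
CRUCIAL_RULES = {
    "char_a": {"F3", "F5"},
    "char_b": {"F3", "F7"},
    "char_c": {"F4"},
}


def candidates_from_crucial(observed_features):
    """Return the labels whose crucial features all appear in the input."""
    return sorted(
        label
        for label, crucial in CRUCIAL_RULES.items()
        if crucial <= observed_features  # subset test
    )


def needs_secondary_lookup(candidates):
    """More than one candidate means the confusional features must decide."""
    return len(candidates) > 1


# A single surviving candidate is recognized from crucial features alone.
print(candidates_from_crucial({"F4"}))
# Several candidates remain, so only these are searched in the secondary database.
print(candidates_from_crucial({"F3", "F5", "F7"}))
```

In this sketch the secondary database is consulted only for the surviving candidates, which is one way to read the overhead reduction described above.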
2. FORMING THE FUZZY DATABASE:
This is the secondary database, which contains the fuzzy values for both the crucial and the confusional features of the characters.
Here we show the fuzzy database for two characters. It has to be defined similarly for the whole Hindi character set. Figure 3
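[Fuzzy database tables for two characters: rows are features (F3 to F8 survive in the first table), columns are quadrants Q1 to Q4. Values surviving extraction: first table .9 .5 .9 .4 .4; second table .4 .4 .1 .4 .6. The cell positions are not recoverable.]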
FIELD NAME    DESCRIPTION
F1 – F10      Features from 1 to 10
Q1 – Q4       Quadrants from 1 to 4
Since handwritten characters differ from person to person, a standardized database cannot always be accepted. In such cases we specify an error value. For our fuzzy evaluation we have opted for a value of 0.3. This error value varies depending upon the individual’s fuzzy value calculation.
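The role of the error value can be sketched as follows. The text fixes the error value at 0.3 but does not define how the error is computed, so the mean absolute difference used here is our assumption, and the database entries are hypothetical.

```python
ERROR_TOLERANCE = 0.3  # error value quoted in the text


def mean_abs_error(observed, stored):
    """Average absolute gap between observed and stored fuzzy values.

    Assumption: the text does not give an error formula; mean absolute
    difference over the stored (feature, quadrant) cells is used here.
    Cells missing from the observation count as 0.0. `stored` must be
    non-empty.
    """
    total = sum(abs(observed.get(key, 0.0) - value) for key, value in stored.items())
    return total / len(stored)


def best_match(observed, database, tolerance=ERROR_TOLERANCE):
    """Return (label, error) for the closest entry within tolerance,
    or (None, None) when no entry is close enough."""
    scored = [(mean_abs_error(observed, stored), label) for label, stored in database.items()]
    if not scored:
        return None, None
    error, label = min(scored)
    if error <= tolerance:
        return label, error
    return None, None


# Hypothetical database: label -> {(feature, quadrant): fuzzy value}.
DATABASE = {
    "char_a": {("F3", "Q1"): 0.9, ("F5", "Q2"): 0.5},
    "char_b": {("F3", "Q1"): 0.4, ("F5", "Q2"): 0.1},
}

observed = {("F3", "Q1"): 0.8, ("F5", "Q2"): 0.6}
print(best_match(observed, DATABASE))
```

A larger tolerance accepts more writing variation but also raises the chance of confusing similar characters, which is consistent with the remark that the value depends on the individual’s fuzzy value calculation.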
The following steps are performed to recognize a character:
1. Preprocessing (image compression).
2. Segregate the character into segments.
3. Extract the features.
4. Extract the crucial features and compare them with their rules.
5. Based on the above result, compare with the secondary database.
6. Calculate the error values.
7. Recognize the output based on the error value.
DATAFLOW DIAGRAM FOR CHARACTER RECOGNITION:
NOTATION    DESCRIPTION
SIS         Segregate into segments
EF          Extract features
FRB         Fuzzy rule base
To our knowledge, there are as yet no specific tools for character recognition that combine image processing with a fuzzy approach and do so without formulae. With our approach, the recognition rate ranges from 60% to 70%. If the approach is extended to a lexical phase, the accuracy should increase, and the recognition rate could reach up to 90%. The approach can be implemented for other languages with a different fuzzy value evaluation.