International Journal of Graphics and Multimedia (IJGM), ISSN 0976 – 6448(Print),
ISSN 0976 – 6456(Online) Volume 4, Issue 1, January - April 2013, © IAEME

              INTERNATIONAL JOURNAL OF GRAPHICS AND MULTIMEDIA (IJGM)

ISSN 0976 - 6448 (Print)
ISSN 0976 - 6456 (Online)
Volume 4, Issue 1, January - April 2013, pp. 31-40
© IAEME: www.iaeme.com/ijgm.asp
Journal Impact Factor (2013): 4.1089 (Calculated by GISI)
www.jifactor.com




            SCRIPT IDENTIFICATION USING DCT COEFFICIENTS

                            M. M. Kodabagi¹, Hemavati C. Purad²
       ¹ Department of Computer Science and Engineering, Basaveshwar Engineering College,
                               Bagalkot-587102, Karnataka, India
       ² Department of Computer Science and Engineering, Tontadarya College of Engineering,
                                Gadag-582101, Karnataka, India



  ABSTRACT

           Automated systems for understanding low resolution images of display boards
  enable new applications such as assistants for the blind, tour guide systems and location
  aware systems. Script identification at word level is an important pre-processing step in
  such systems, because it must precede further image analysis. In this paper, a new approach
  for word level script identification of text in low resolution images of display boards is
  presented. The proposed methodology uses horizontal run statistics and texture features to
  distinguish three scripts used in India: Hindi, Kannada and English. The method computes
  discrete cosine transform based texture features from the input word image and uses a newly
  defined threshold based discriminant function to identify the script class. The methodology
  is evaluated on 800 low resolution word images of display boards. The proposed method is
  robust to variations in font size and style, number of characters, thickness and spacing
  between characters, noise, and other degradations. It achieves an average identification
  accuracy of 85.44% over the three scripts, with individual identification accuracies of
  100% for Hindi Script, 70.33% for Kannada Script and 86% for English Script.

  1.       INTRODUCTION

         In recent years, camera-embedded handheld systems such as smart mobile phones,
  tablets and PDAs have come into wide use, and they exhibit increasingly higher
  computing and communication capabilities. These devices, with internet access facilities,
  are used for a wide variety of purposes such as information seeking, mobile commerce and
  other business and enterprise applications. One such application is to understand written text
on display boards in an unknown environment. People who travel to different places in
the world for field work and business find it difficult to understand written text on display
boards, particularly in a foreign environment. This is especially true in multilingual
countries like India. Hence there is a need for a gadget that helps people understand
display boards by detecting and translating written matter while providing localized
information.
         The written matter on display boards/name boards provides important information for
the needs and safety of people, and may be written in unknown languages. The written matter
can be street names, restaurant names, building names, company names, traffic directions,
warning signs etc. Researchers have focused their attention on development of techniques for
understanding written text on such display boards. There is a spurt of activity in the
development of web based intelligent hand held systems for such applications.
         Among the reported works [1-10] on intelligent systems for handheld devices, not many
pertain to understanding written text on display boards. Therefore, scope exists for
exploring such possibilities. Text understanding involves several processing steps: text
detection and extraction, preprocessing for line, word and character separation, script
identification, text recognition and language translation. In the Indian context, the written
text on a display board may contain multilingual information. Therefore, the recognition and
language translation tasks require script identification at word level, which makes it one of
the most important processing steps in such systems prior to further analysis.
         Script identification of text in low resolution images of display boards is a
difficult and challenging problem due to issues such as variation in font size, style and
spacing between characters, skew and other degradations. The reported works on script
identification employ a number of different approaches, which are categorized into local and
global methods. The local approaches use connected component analysis to determine
the script of text. In contrast, the global approaches measure the properties of a region/block
of text and give sufficient characterization of the underlying script. Hence global approaches
such as texture analysis are a good choice for solving this problem.
         Script identification of text in a low resolution image of a display board is an
important step whose output is used by the later processing steps of a display board
understanding system. In this paper, a new approach for word level script identification of
text in low resolution images of display boards is presented. The proposed methodology uses
horizontal run statistics and texture features to distinguish three scripts used in India:
Hindi, Kannada and English. The method computes discrete cosine transform (DCT) based texture
features from the input word image and uses a newly defined threshold based discriminant
function to identify the script class. The proposed method is robust to variations in
size and style of font, number of characters, thickness and spacing between characters, noise,
and other degradations. It achieves an average identification accuracy of 85.44% over the
three scripts, with individual identification accuracies of 100% for Hindi Script, 70.33% for
Kannada Script and 86% for English Script.
         The rest of the paper is organized as follows. A survey of work related to script
identification from images is given in Section 2. The proposed method is presented in
Section 3. The experimental results and discussion are given in Section 4. Section 5
concludes the work and lists future directions.




2. RELATED WORKS

         A substantial amount of research has gone into script identification
from printed document images. Some of the related works are summarized in the following.
         Script identification of a low resolution image of a display board is a necessary step
for the other tasks of a display board understanding system. A number of
methods for script identification have been published in recent years and are categorized into
local and global approaches. The local approaches perform connected component analysis
and use statistical features for script identification. A few such methods are summarized
below. An approach for determining the script and language of document images is
proposed in [11]. Initially, the algorithm determines connected components and locates
upward concavities in the connected components. It then classifies the script into two broad
classes: Han-based (Chinese, Japanese and Korean) and Latin-based (English, French,
German and Russian) languages. The Han-based languages are then differentiated using
statistics of optical densities of connected components, and the Latin-based languages are
identified based on the most frequently occurring word shape characteristics.
         An automatic technique for the identification of printed Roman, Chinese, Arabic,
Devanagari and Bangla text lines from a single document image is found in [12]. The method
uses the headline feature to separate Devanagari and Bangla script lines into one group and
the other script lines (English, Chinese and Arabic) into another group. The technique
obtains zone wise features to identify Devanagari and Bangla scripts. Further, vertical run
length statistics and water reservoir features are used to classify Chinese, English and Arabic
scripts. Experiments were conducted on 25000 text lines, and identification rates
of 97.32%, 98.65%, 97.53%, 96.02% and 97.12% for English, Chinese, Arabic, Devanagari
and Bangla scripts respectively are reported. However, the approach reports higher error rates
for short text lines containing a word with few characters.
         A method for script and language identification of noisy and degraded document
images is presented in [13]. The method identifies the script using a document vectorization
technique that converts each image into a vertical cut vector and character extremum points,
which characterize the shape and frequency of the contained character or word images. The
method is tolerant to variation in text fonts and styles, noise, and various types of
document degradation. For each script or language under study, a script or language template
is first constructed through a training process. Scripts and languages of document images are
then determined according to the distances between converted document vectors and the
pre-constructed script and language templates. Experimental results show that the
technique is accurate, easy to extend, and tolerant to noise and various types of document
degradation. The authors propose further investigation for images containing
perspective and curvature distortion and skew.
         In contrast, the global approaches measure the texture of a region of text to identify
the underlying script. Some of the texture based approaches are detailed below. A method
describing the effectiveness of rotation invariant texture features for automatic script
identification is found in [14]. The method computes features from text blocks using
multi-channel Gabor filters and constructs a representative feature vector for each language.
Then, a Euclidean distance classifier is used for script identification of 6 languages (Chinese,
English, Greek, Russian, Persian, and Malayalam). An average classification accuracy of 96.7% is
reported. The sensitivity of texture analysis to different fonts is also discussed.


        A technique that investigates the use of texture analysis for script and language
identification from document images is presented in [15]. The method obtains a uniform
block of text from the document image. Multiple channel Gabor filters and gray level
co-occurrence matrices (GLCMs) are used to extract texture features. Then a K-NN classifier is
used to classify seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian.
The test results showed that Gabor filters were more accurate than the GLCMs,
producing results which are over 95% accurate.
        A texture analysis technique for script identification is described in [16]. The
method evaluates commonly used texture features for the purpose of script
identification and provides a qualitative measure of which features are most appropriate for
this task. The texture features include GLCM, Gabor filter bank energies, and a number of
wavelet energy features. The experimental results have shown that the wavelet log
co-occurrence features outperform the other techniques, giving the lowest error rate of 1%. The
effectiveness of features extracted from co-occurrence histograms of wavelet decomposed
images with a KNN classifier for script identification of 7 Indian languages is discussed in
[17]. Other recent works on script identification are reported in [18-19].
        From the works cited in the literature, it is found that a few limitations still exist
in the reported script and language identification methods. First, the performance of local
approaches depends upon correct segmentation of connected components. Consequently, they
are very sensitive to segmentation errors resulting from noise and various types of
document degradation. Second, the global techniques need more time to measure the texture
of a region. However, these methods are a good choice for analysis of low resolution images of
display boards. Hence, the use of textural features is further investigated in the proposed work.
        It is also noticed that the global techniques operate on text blocks of predefined size
containing matter pertaining to the same script to determine the script and language of the
underlying document. But this is not the case with written text on display boards in the Indian
scenario, as the text may contain multilingual information. Therefore, it is necessary to identify
script and language at word level, which is essential for later processing steps such as text
understanding and language translation. The task of script identification at word level is
difficult and challenging, because distinguishing properties are to be obtained from a small
region containing text of variable size and font. Therefore more research is needed
to model the texture of a small region containing text of variable size and font for better
characterization and classification with reduced computational complexity. Hence, the
current work is undertaken to identify new properties of texture using discrete cosine
transform coefficients for script identification of low resolution images of display boards.
The detailed description of the proposed methodology is given in the next section.

3.     PROPOSED METHODOLOGY FOR SCRIPT IDENTIFICATION

        The proposed methodology uses DCT based texture features for identification of the
script class of low resolution display board images. The methodology comprises three phases:
Preprocessing, Extraction of DCT Energy Features and Script Class Identification. The block
diagram of the proposed model is given in Fig. 1. The detailed description of each processing
step is presented in the following subsections.




                          Test word image of display board
                                        |
                                        v
             Preprocessing for Binarization and Bounding Box Generation
                                        |
                                        v
               Computational Strategy for Hindi Script Identification
                        |                                   |
                  Not Satisfied                         Satisfied
                        |                                   |
                        v                                   v
   Compute Discrete Cosine Transform Energy Features      Hindi Test Word Image
                        |
                        v
             Threshold Based Classification
                        |
                        v
        Word Image Classified as Kannada/English


                         Fig. 1: Block diagram of proposed method

3.1 Preprocessing
        The works reported in the literature preprocess the document image to obtain a uniform
sized text block, detect and correct skew, and remove uneven spacing between lines, words and
characters, in order to obtain optimal texture features for an improved classification rate.
This is because the presence of noise, skew, uneven spacing and other degradations
significantly affects texture features, leading to higher classification errors. But the
preprocessing task is difficult and computationally expensive, and may not be suitable for
applications that process a small amount of text containing few lines. Hence, in this work, an
attempt is made to evaluate the performance of new texture features extracted directly from
variable sized word images without removal of noise, skew, uneven spacing and other
degradations. The only preprocessing done is to binarize the image and generate a bounding box
around the word.
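
The preprocessing step can be sketched as follows. The paper does not state which binarization method is used, so this minimal Python sketch assumes a simple global threshold at the mean gray level and assumes that text pixels are brighter than the background; the function names binarize, bounding_box and crop are illustrative only.

```python
def binarize(gray, threshold=None):
    """Return a 0/1 image; 1 marks pixels at or above the threshold."""
    if threshold is None:
        # Assumption: the mean gray level serves as a global threshold.
        pixels = [p for row in gray for p in row]
        threshold = sum(pixels) / len(pixels)
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def bounding_box(binary):
    """Return (top, bottom, left, right), inclusive, of the 1-valued pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None  # no foreground pixels found
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(binary, box):
    """Cut the image down to the bounding box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


# Tiny synthetic example: three bright "text" pixels on a dark background.
gray = [
    [10, 10, 10, 10, 10],
    [10, 200, 220, 10, 10],
    [10, 210, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
binary = binarize(gray)
box = bounding_box(binary)
word = crop(binary, box)
print(box, word)
```

For dark text on a bright board the comparison in binarize would be inverted; that choice is not specified in the text.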

3.3 Extraction of DCT Energy Features
       In this phase, the two-dimensional discrete cosine transform (DCT) is applied to the
processed image to obtain a DCT matrix d of size M x N, and energy features E1, E2 and E3 are
computed on the chosen regions of DCT coefficients as given in equations (1) to (3).



        [Equation for energy feature E1 missing]                                     (1)

        [Equation for energy feature E2 missing]                                     (2)

        [Equation for energy feature E3 missing]                                     (3)

where
  • d is the DCT matrix of dimension M x N obtained by applying the DCT to the input image.
  • Mid1 and Mid2 are the column and row numbers used in the computation of energy
       feature E3.

Fig. 2 shows the regions chosen to calculate energy features E1, E2 and E3.




Fig. 2. DCT matrix and 3 chosen regions for determining energy features E1, E2 and E3
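
As an illustration of this phase, the following Python sketch computes an orthonormal two-dimensional DCT of a small word image and the fraction of the total energy that falls in a rectangular region of the DCT matrix d. It is a sketch under stated assumptions, not the paper's definition of E1, E2 and E3: the quadrant boundaries, the normalization by total energy, and the helper names dct_1d, dct_2d and region_energy are all hypothetical. Only the names d, M, N, Mid1 (a column number) and Mid2 (a row number) come from the text.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(values))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(image):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def region_energy(d, row_range, col_range):
    """Energy (sum of squares) in rows [r0, r1) and columns [c0, c1),
    as a fraction of the total energy. The normalization is an assumption."""
    total = sum(c * c for row in d for c in row)
    if total == 0:
        return 0.0
    r0, r1 = row_range
    c0, c1 = col_range
    part = sum(d[r][c] ** 2 for r in range(r0, r1) for c in range(c0, c1))
    return part / total


# Small binary word image (1 = text pixel), purely synthetic.
word = [
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1],
]
d = dct_2d(word)
M, N = len(d), len(d[0])
mid2, mid1 = M // 2, N // 2  # hypothetical choice of Mid2 (row) and Mid1 (column)

# Hypothetical regions: the four quadrants of the DCT matrix.
quadrants = {
    "top_left": region_energy(d, (0, mid2), (0, mid1)),
    "top_right": region_energy(d, (0, mid2), (mid1, N)),
    "bottom_left": region_energy(d, (mid2, M), (0, mid1)),
    "bottom_right": region_energy(d, (mid2, M), (mid1, N)),
}
print(quadrants)
```

Because the transform is orthonormal, the total energy of d equals the sum of squared pixel values, so each region energy lies between 0 and 1. That range is compatible with the thresholds used later, but whether the paper normalizes in this way is not stated.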

3.4 Script Class Identification
        The script identification task consists of 2 processing stages. In stage 1, the test word
image is processed to determine whether it belongs to Hindi Script. If it does not, stage 2 uses
threshold based classification to determine whether it belongs to Kannada or English Script.
The functionality of both stages is described in the following sections.

3.4.1 Computational strategy for Hindi Script Identification
        In this stage, horizontal run statistics of the test word image are used to determine
whether the written word in the display board image belongs to Hindi or to other scripts.
Initially, the horizontal runs of length greater than 6 are computed for every row of the word
image and are stored in a feature vector. The vector records the row number and run length of
all runs for all rows. These run length values are thresholded to classify the word image into
one of two classes, w1 and w2, where w1 corresponds to Hindi script and w2 corresponds to the
other scripts category. A word image classified into class w2 is further processed in stage 2 to
determine whether it belongs to Kannada or English Script.
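
The run extraction described above can be sketched in Python as follows. The run length limit (greater than 6) is taken from the text. The thresholding rule itself is not given, so the decision in looks_like_hindi (some run covers at least a fraction min_fraction of the word width, as a headline would) is a hypothetical stand-in, and the function names are illustrative. Runs are assumed to be runs of text pixels (value 1).

```python
def horizontal_runs(binary, min_length=7):
    """Return (row_number, run_length) for every horizontal run of 1s
    whose length is greater than 6, scanning each row left to right."""
    runs = []
    for r, row in enumerate(binary):
        length = 0
        for pixel in list(row) + [0]:  # trailing 0 closes a run at the row end
            if pixel:
                length += 1
            else:
                if length >= min_length:
                    runs.append((r, length))
                length = 0
    return runs


def looks_like_hindi(binary, min_fraction=0.7):
    """Hypothetical rule for class w1: at least one run spans most of the
    word width. The value 0.7 is an assumption, not taken from the paper."""
    width = len(binary[0])
    return any(length >= min_fraction * width
               for _, length in horizontal_runs(binary))


# Synthetic word image with a full-width line in row 1.
with_headline = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0],
]
# Synthetic word image whose only long run covers 7 of 20 columns.
without_headline = [[1] * 7 + [0] * 13]

print(horizontal_runs(with_headline), looks_like_hindi(with_headline))
print(horizontal_runs(without_headline), looks_like_hindi(without_headline))
```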


3.4.2 Threshold based classification
             The threshold classification phase of the proposed model uses a discriminant
function to classify English and Kannada scripts. The discriminant function uses thresholds to
determine the script class. The thresholds are heuristic values, chosen empirically. The
classification rules using the discriminant function are stated below.
Algorithm 3.4.1: Threshold Based Classification
             Input:              E1, E2 and E3
             Output:             Script Class: English or Kannada
Begin
             if E1 >= 0.1000 & E1 <= 0.4000 & E2 >= 0.0200 & E2 <= 0.2000
                     Print “Script is ENGLISH”
             else if E1 >= 0.0300 & E1 < 0.1000 & E2 >= 0.0100 & E2 <= 0.0850
                     Print “Script is KANNADA”
             end
End
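
The rules of Algorithm 3.4.1 translate directly into a small function. The Python sketch below uses the thresholds exactly as stated. Two points are assumptions of the sketch: returning None when neither rule is satisfied (the algorithm does not say what happens to such inputs), and accepting but ignoring E3, which is listed as an input yet does not appear in either rule. The function name classify_script is illustrative.

```python
def classify_script(e1, e2, e3=None):
    """Threshold based classification of a non-Hindi word image.

    e3 is accepted for parity with the algorithm's stated inputs but is
    unused, because neither rule in Algorithm 3.4.1 refers to it.
    """
    if 0.1000 <= e1 <= 0.4000 and 0.0200 <= e2 <= 0.2000:
        return "ENGLISH"
    if 0.0300 <= e1 < 0.1000 and 0.0100 <= e2 <= 0.0850:
        return "KANNADA"
    return None  # assumption: no script is assigned when neither rule holds


print(classify_script(0.25, 0.10))   # inside the English ranges
print(classify_script(0.05, 0.05))   # inside the Kannada ranges
print(classify_script(0.50, 0.50))   # outside both ranges
```

The two E1 ranges meet at 0.1000 without overlapping, so the outcome never depends on the order of the rules; inputs with E1 below 0.0300 or above 0.4000 match neither rule.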

4.      EXPERIMENTAL RESULTS AND DISCUSSION

        The proposed methodology for script identification has been evaluated on low
resolution word images of display boards with varying font size and style. The experimental
tests were conducted on word images of 3 scripts, Hindi, Kannada and English, and the results
were encouraging. The results of processing several display board word images
dealing with various issues and the overall performance of the system are reported in Section
4.1.

4.1 Script Identification: An experimental analysis dealing with various issues
        The effectiveness of proposed methodology for script identification using DCT
features has been evaluated for 800 low resolution word images of display boards. The
images were captured from display boards of government offices in India. The image
database consists of 300 Kannada, 300 English, and 200 Hindi script word images of varying
resolutions. The images are characterized by variable number of characters, variable font size
and style, uneven thickness and spacing between characters, minimal information context,
small skew, noise and other degradations.
        The proposed methodology has produced good results for low resolution word images
containing text of different size, font, and alignment with varying background. The approach
also identifies the script of text regions with small skew. The proposed method achieves an
average identification accuracy of 85.44% over the three scripts, with individual identification
accuracies of 100% for Hindi Script, 70.33% for Kannada Script and 86% for English Script. A
closer examination of results revealed that misclassifications arise due to minimal information
context, noise and larger skew, which affect the texture of the region of text and the
performance of the texture based approach. The correctly classified images dealing with various
issues are described in Table 1, and the overall performance of the system is reported in Table 2.




   TABLE 1: Performance of the system in processing different images dealing with
                                    various issues

     Input Sample Image    Image Resolution    Description

     [image]               33 x 101            Script identification of an image having minimal
                                               information context.
     [image]               76 x 127            The effectiveness of the method in processing a
                                               degraded Kannada image containing characters of
                                               uneven thickness, lighting and spacing between
                                               characters, noise, small skew and other
                                               degradations.
     [image]               36 x 123            The robustness of the method in identifying the
                                               script of an image containing 7 characters and a
                                               degraded background.
     [image]               96 x 351            The texture of an image having a different font
                                               style and large font size is correctly modeled as
                                               Kannada Script.
     [image]               132 x 451           The method processes a larger blurred image with
                                               small skew and classifies it as English text.
     [image]               78 x 151            The robustness of the method in processing a
                                               degraded image of English text in an unusual font.
     [image]               92 x 190            The effectiveness of the method in correctly
                                               classifying part of a word.



                         TABLE 2: Overall System Performance

     Number of Word Images   Classified as Kannada   Classified as English   Classified as Hindi   Accuracy

     200 Hindi                        -                       -                     200             100%
     300 Kannada                     211                      89                     -              70.33%
     300 English                      -                      258                     -              86%
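
The accuracy figures can be checked from the counts in Table 2. The short Python sketch below (variable names are illustrative) computes the per-script accuracies, their mean, and the pooled accuracy over all 800 images. The mean of the three per-script accuracies reproduces the reported 85.44%, whereas pooling all images gives 669/800 = 83.625%, so the reported figure is an average over scripts rather than a per-image rate.

```python
# (number of word images, number correctly classified), taken from Table 2
results = {
    "Hindi": (200, 200),
    "Kannada": (300, 211),
    "English": (300, 258),
}

per_script = {script: 100 * correct / total
              for script, (total, correct) in results.items()}

# Mean of the per-script accuracies (each script weighted equally).
mean_accuracy = sum(per_script.values()) / len(per_script)

# Pooled accuracy (each image weighted equally).
total_images = sum(total for total, _ in results.values())
total_correct = sum(correct for _, correct in results.values())
pooled_accuracy = 100 * total_correct / total_images

for script, accuracy in per_script.items():
    print(f"{script}: {accuracy:.2f}%")
print(f"mean of per-script accuracies: {mean_accuracy:.2f}%")
print(f"pooled over {total_images} images: {pooled_accuracy:.3f}%")
```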




5.     CONCLUSION

        In this paper, an approach for word level script identification of low resolution images
of display boards employing DCT energy features is proposed. The method identifies the script
of a word image without applying techniques for removal of noise and other degradations. This
aspect of the work makes it more robust and efficient. The proposed set of texture features
models the texture of a region of text well and thus provides sufficient characterization.
The heuristic threshold based classification function is found to be robust and
efficient. Testing the methodology on 800 low
resolution word images containing text of different size, font, and alignment with varying
background has yielded an average classification accuracy of 85.44%. The system is found to
be resilient to the presence of small skew and degradations. This is a significant result, which
makes this work suitable for text understanding and translation systems, especially in the
Indian context. The method can be extended to script identification of images belonging to
other scripts, and further investigations can focus on language identification of word images.

REFERENCES

[1]   Gregory D. Abowd, Christopher G. Atkeson, Jason Hong, Sue Long, Rob Kooper, and
      Mike Pinkerton, 1997, “CyberGuide: A mobile context-aware tour guide”, Wireless
      Networks, 3(5), pp. 421-433.
[2]   Natalia Marmasse and Chris Schmandt, 2000, “Location aware information delivery
      with comMotion”, In Proceedings of Conference on Human Factors in Computing
      Systems, pp. 157-171.
[3]   Tollmar K., Yeh T. and Darrell T., 2004, “IDeixis - Image-Based Deixis for Finding
      Location-Based Information”, In Proceedings of Conference on Human Factors in
      Computing Systems (CHI’04), pp. 781-782.
[4]   Gillian Leetch, Eleni Mangina, 2005, “A Multi-Agent System to Stream Multimedia
      to Handheld Devices”, Proceedings of the Sixth International Conference on
      Computational Intelligence and Multimedia Applications (ICCIMA’05).
[5]   Wichian Premchaiswadi, 2009, “A mobile Image search for Tourist Information
      System”, Proceedings of 9th International Conference on Signal Processing,
      Computational Geometry and Artificial Vision, pp. 62-67.
[6]   Ma Chang-jie, Fang Jin-yun, 2008, “Location Based Mobile Tour Guide Services
      Towards Digital Dunhuang”, International Archives of the Photogrammetry, Remote
      Sensing and Spatial Information Sciences, Vol. XXXVII, Part B4, Beijing.
[7]   Shih-Hung Wu, Min-Xiang Li, Ping-che Yang, Tsun Ku, 2010, “Ubiquitous
      Wikipedia on Handheld Device for Mobile Learning”, 6th IEEE International
      Conference on Wireless, Mobile, and Ubiquitous Technologies in Education,
      pp. 228-230.
[8]   Tom Yeh, Kristen Grauman, and K. Tollmar, 2005, “A picture is worth a thousand
      keywords: image-based object search on a mobile platform”, In Proceedings of
      Conference on Human Factors in Computing Systems, pp. 2025-2028.
[9]   Fan X., Xie X., Li Z., Li M. and Ma, 2005, “Photo-to-search: using multimodal queries to
      search web from mobile phones”, In Proceedings of 7th ACM SIGMM International
      Workshop on Multimedia Information Retrieval.


[10] Lim Joo Hwee, Jean Pierre Chevallet and Sihem Nouarah Merah, 2005, “SnapToTell:
     Ubiquitous information access from camera”, Mobile human computer interaction with
     mobile devices and services, Glasgow, Scotland.
[11] Lu Shijian, Chew Lim Tan, 2008, “Script and Language Identification in Noisy and
     Degraded Document Images”, IEEE Transactions on Pattern Analysis and Machine
     Intelligence, 30(1), January.
[12] Linlin Li, Chew Lim Tan, 2008, “Script identification of camera-based images”,
     19th International Conference on Pattern Recognition (ICPR 2008), pp. 1-4, 8-11 Dec. 2008.
[13] T.N. Tan, 1998, “Rotation Invariant Texture Features and Their Use in Automatic
     Script Identification”, IEEE Trans. Pattern Analysis and Machine Intelligence, 20(7),
     pp. 751-756.
[14] G.S. Peake and T.N. Tan, 1997, “Script and Language Identification from Document
     Images”, Proc. Eighth British Mach. Vision Conf., vol. 2, pp. 230-233, Sept.
[15] A. Busch, W.W. Boles, and S. Sridharan, 2005, “Texture for Script Identification”,
     IEEE Trans. Pattern Analysis and Machine Intelligence, 27(11), pp. 1720-1732.
[16] Hiremath P. S. et al., 2010, “Script identification in a handwritten document image
     using texture features”, IEEE 2nd International Advance Computing
     Conference, pp. 110-114, 2010.
[17] Li Yang, Xuelong Hu, Jun Pan, 2008, “Approaches to image retrieval using fuzzy set
     theory”, International Conference on Neural Networks and Signal Processing,
     pp. 422-425, 7-11 June 2008.
[18] S. A. Angadi, M. M. Kodabagi, “Word Level Script Identification of Text in Low
     Resolution Images of Display Boards using wavelet features”, Proceedings of
     International Conference on Advances in Computing (ICADC 2012), AISC 174,
     pp. 209-220, Springer India 2012.
[19] M. M. Kodabagi and S. R. Karjol, “Script Identification from Printed Document Images
     using Statistical Features”, International Journal of Computer Engineering &
     Technology (IJCET), Volume 4, Issue 2, 2013, pp. 607 - 622, ISSN Print: 0976 – 6367,
     ISSN Online: 0976 – 6375.
[20] P. Prasanth Babu, L. Rangaiah and D. Maruthi Kumar, “Comparison and Improvement
     of Image Compression using Dct, Dwt & Huffman Encoding Techniques”, International
     Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, 2013,
     pp. 54 - 60, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.



