Extraction of Information from Images using Dewrapping Techniques

					                                     (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                               Vol. 8, No.8, 2010

        Extraction of Information from Images using Dewrapping Techniques
    Khalid Nazim S. A., Research Scholar, Singhania University, Rajasthan, India.
    Dr. M.B. Sanjay Pande, Professor and Head, Department of Computer Science & Engineering, VVIET, Mysore, India.

Abstract- An image containing textual information is called a document image. The textual information in document images is useful in areas such as vehicle number plate reading, passport reading and cargo container reading. Extracting useful textual information from the document image therefore plays an important role in many applications. One of the major challenges in camera document analysis is dealing with wrap and perspective distortions. In spite of the prevalence of dewrapping techniques, there is no standard, efficient algorithm for performance evaluation that concentrates on visualization. Wrapping is common in camera-captured document images and has to be corrected before recognition. To capture the document images, a mobile camera of 2 megapixel resolution is used. A database is developed with variations in background, size and colour, containing wrapped, blurred and clean images. This database is explored and text extraction from those document images is performed. For wrapped images, no efficient dewrapping techniques have been implemented to date. Extraction of text from the wrapped images is therefore done by maintaining a suitable template database. Further, the text extracted from the wrapped or other document images is converted into an editable form such as a Notepad or MS Word document. The experimental results were corroborated on various objects of the database.

Keywords: Dewrapping, Template Database, Text Extraction.

I. INTRODUCTION

An image may be defined as a two dimensional function f(x, y), where x and y are spatial co-ordinates and the amplitude of f at any pair of co-ordinates (x, y) is the intensity or gray level of the image at that point. When x, y and the intensity values of f are all finite, discrete quantities, the image is a digital image composed of a finite number of elements, each of which has a particular location and value. These elements are called picture elements, image elements, pels or pixels [7][14]. Image processing can be broadly categorized into two classes. The first category takes images as input and gives images as output. The other category takes images as input and gives attributes of the images as output. The entire processing can be listed as:
(i) Image enhancement: It involves manipulating an image so that the result is more suitable than the original for processing.
(ii) Image restoration: It involves improving the appearance of an image based on a mathematical or probabilistic model of image degradation.
(iii) Colour image processing: Colour can be used as a factor or basis for extracting features

                                                        101                               http://sites.google.com/site/ijcsis/
                                                                                          ISSN 1947-5500

of interest in an image.
(iv) Compression: This reduces the storage required to save an image or the bandwidth required to transmit it.
(v) Morphological image processing: It deals with the tools for extracting image components that are useful in the representation and description of shape.
(vi) Segmentation: It deals with the partitioning of an image into constituent parts, namely autonomous and rugged segmentation [7][4].

A document is a bounded physical or digital representation of a body of information with the capacity (and usually the intent) to communicate. Document image processing and understanding has been studied extensively over the past 40 years and has carved a niche out of the more general problem of computer vision because of its pseudo binary nature and the regularity of the patterns used as a "visual" representation of language. In the early 1960s, optical character recognition was taken as one of the first clear applications of pattern recognition, and today, for some simple tasks with clean and well-formed data, document analysis is viewed as a solved problem. Unfortunately, these simple tasks do not represent the most common needs of the users of document image analysis. The challenges of complex content and layout, noisy data and variations in font and style presentation keep the field active.

Traditionally, document images are scanned from pseudo binary hardcopy paper manuscripts with a flatbed, sheet-fed or mounted imaging device. Recently, the community has seen an increased interest in adapting digital cameras to tasks related to document image analysis. Digital camcorders, digital cameras, PC cams, PDAs (personal digital assistants) and even cell phone cameras are becoming increasingly popular and have shown potential as alternative imaging devices. Although they cannot replace scanners, they are small, light, easily integrated with various networks and more suitable for many document capturing tasks in less constrained environments. These advantages are leading to a natural extension of the document processing community in which cameras are used to image hardcopy documents or natural scenes containing textual content [12].

Cameras in an uncontrolled environment have triggered a lot of interest in the research community over the last few years and many approaches have been proposed. However, no satisfactory work on dewrapping techniques has been presented so far. Wrapping is common in camera-captured document images [13]. It is the primary factor that makes such document images hard to recognize. It is therefore necessary to restore a wrapped document image before recognition. Documents captured with cameras often suffer from various distortions, such as non-planar (wrapped) shape, uneven light shading, motion blur, perspective distortion, under-exposure and over-exposure. Current Optical Character Recognition (OCR) systems do not deal with these distortions when applied directly to wrapped camera-captured document images.

Captured images suffer from distortions such as noise and blur. These distortions have to be removed before operations can be performed on the document. Noise removal and blur removal are done using filters. Several types of filters are available, of which the Gaussian filter is one of the most commonly used. Gaussian filters are a class of linear smoothing filters with the weights chosen according to the shape of the Gaussian function. The Gaussian smoothing filter is a very good filter for removing noise drawn from a normal distribution. Gaussian functions are rotationally symmetric in two dimensions, i.e. the amount of smoothing performed by the filter is the same in all directions. In image sharpening the goal is to highlight fine details in an image, that is, to enhance details that have been blurred. Fine details in the frequency domain correspond to high


frequencies, thus the use of high-pass filters for image sharpening [3][10].

Text detection refers to the determination of the presence of text in a given frame (normally text detection is used for a sequence of images). Text localization is the process of determining the location of text in the image and generating bounding boxes around the text [2][7]. Text tracking is performed to reduce the processing time for text localization and to maintain the integrity of position across adjacent frames. Although the precise location of text in an image can be indicated by bounding boxes, the text still needs to be segmented from the background to facilitate its recognition. This means that the extracted text image has to be converted to a binary image, and the enhanced image is then used for text extraction. Text extraction is the stage where the text components are segmented from the background. Enhancement of the extracted text components is required because the text region usually has a low resolution and is prone to noise.

II. LITERATURE SURVEY

Jian Liang et al. proposed a method focused on analyzing text and documents captured by a camera, known as camera-based analysis of text and documents. Camera-based document analysis is more flexible and provides the capability to capture information for visual communication, indexing and reading graphical text in web pages. In camera-based analysis of text and documents, the sources of images are paper-based printed and handwritten documents, journals etc. Scanner-based processes provide a good reference and starting point, but they cannot be used directly on camera-captured images.

Content in an image can be perceptual or semantic, but the text within an image is of more interest as it describes the contents of the image. It can be extracted more easily than the semantic content. A variety of approaches to Text Information Extraction (TIE) from images and videos have been proposed for specific applications including page segmentation, address block location, number plate location and content-based image or video indexing. Text extraction systems have various applications such as portable computers, content-based video/document coding, license plate recognition and video content analysis. To enhance the performance of a text information system it is advantageous to merge various sources, as proposed by Keechul et al.

Portable digital cameras are now used for digitalizing documents and as a fast way to acquire document images, taking advantage of their low weight, portability, low cost, small dimensions etc. Several specific problems arise in this digitization process. Rafael et al. addressed the inherent problems of document image digitization using a portable camera. Their work assumed documents on translucent paper such that back-to-front interference was not observed. When a document image is taken with the camera, the strobe flash causes uneven illumination of the document. Marginal noise not only lowers the quality of the resulting image for CRT screen visualization, but also consumes storage space and large amounts of toner for printing. It also affects the segmentation algorithm of the optical character recognition and thus the number of characters and words correctly transcribed. It assumes that the background may be of any colour or texture, provided that there is a


colour difference of at least 32 levels between the image background and at least one of the RGB components of the most frequent colour of the document background (paper). Two different experiments were set up to evaluate lens distortion. The first is visual inspection by humans, while the second is based on analyzing the effect of the compensation of lens distortion on optical character recognition.

For documents with no iconographic or artistic value, monochromatic images save storage space and bandwidth in network transmission. The binarized preprocessed documents lowered the number of character substitutions but also raised the incidence of insertion errors due to spurious noise inserted in the image. This paper presents ways to significantly improve the visualization of images, whether displayed on a CRT or LCD screen or printed.

Celine and Bernard presented a solution which is independent of scenes, colours, lighting and other varying conditions. Their algorithm was based on multi-hypothesis text extraction.

Yu Zhanga et al. proposed an algorithm based on binary document images which considers the horizontal text that is mostly present in both Arabic and Chinese characters. Wrapped document images should have text lines whose main direction is horizontal. Several pairs of key points, when mapped using Thin Plate Splines (TPS), then restore the original image based on an interpolation algorithm [5][8].

Syed Saqib Bukhari et al. used a novel dewrapping approach based on curled text-line information, which was extracted using a ridge-based modified active contour model (coupled snakes). This dewrapping technique is less sensitive to different directions of curl and to variable line spacing. The optical character recognition error rate from wrapped to dewrapped documents was reduced from 5.15% to 1.92% for the dataset collected. The approaches for document image dewrapping can be divided into two main categories based on the document capturing methodology: one in which a specialized hardware arrangement such as a stereo camera is required for 3D shape reconstruction of the wrapped document, and the other in which the dewrapping method is designed for an image captured using a single hand-held camera in an uncontrolled environment.

Faisal Shafait presents an overview of the approaches based on the evaluation measure and the dataset used. The methods used are continuous skeletal image representation for document image dewrapping, segmentation based document image dewrapping, the Coordinate Transform Model (CTM) and document rectification for book dewrapping. Dewrapping of documents captured with hand-held cameras has triggered a lot of interest and many approaches have been proposed to achieve it. However, there has been no comparative evaluation of different dewrapping techniques so far. A dataset of 102 documents captured with a hand-held camera was created and made freely available online. Text-line, text-zone and ASCII text ground truth was prepared for the documents in this dataset. The results showed that the CTM presented by Wenxin Li et al. performed better than the other two methods, but the difference was not statistically significant. Overall, all participating methods worked well and the mean edit distance was less than 1% for each of them.

Based on the literature survey our main aim is to recover document images.


Hence the proposed technique was applied to gray scale document images and is based on several distinct steps: adaptive document image binarization, text line and word detection, a draft binary image dewrapping based on word rotation and shifting, and finally a complete restoration of the original gray scale wrapped image guided by the binary dewrapping. The problems encountered are background removal and skew, which is often found in the image in relation to the photograph axes because documents have no fixed mechanical support in the document image.

A. DATA SAMPLE DESCRIPTION

Text data present in images and video contains useful information for automatic annotation, indexing and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement and recognition of the text from a given image. The data samples for text extraction were captured as images from a 2 megapixel mobile camera with a resolution of 960 x 1280. The size of each captured image varies between 60 and 80 KB. The text present in the captured images varies in size, style, orientation and alignment, and the images have low contrast and complex backgrounds.

The captured images are manually categorized into blur, clean and wrapped images as shown in Fig 1.

                           Figure 1: Blur Images, Clean Images and Wrapped images

III. IMPLEMENTATION

The process is divided into three main phases, namely the preprocessing phase, the dewrapping phase and the text extraction phase. In the preprocessing phase, the quality of an image is enhanced. In the dewrapping phase, the wrapped document images (the images which are captured from the surface of cylindrical objects) are dewrapped. In the text extraction phase, the text in the document image is detected, localized and finally extracted into editable form.

1. Pre-processing Phase: The experimental setup for capturing an image containing text requires a 2 megapixel mobile camera which is placed at a standard distance. A measuring scale is used to measure the distance of an


image from the mobile camera. Further, the intensity of light is also varied. The images are captured in a controlled environment at distances of 10cm, 13cm and 15cm respectively.

1.1 Preprocessing with Different Image Documents: Preprocessing involves several steps which are presented in the following section. Preprocessing removes noise and blur and then sharpens the image, as shown in Figure 2.

2. Dewrapping Phase: Dewrapping is a two-step approach. In the first step, a coarse dewrapping is accomplished with the help of a transformation model that maps the projection of a curved surface to a 2D rectangular area. The projection of the curved surface is delimited by the two curved lines which fit the top and bottom text lines, along with the two straight lines that fit the left and right text boundaries. In the second step, a fine dewrapping is achieved based on the words detected. All words are pose-normalized, guided by the lower and upper word baselines.

3. Text extraction Phase: Text data present in images and video contains useful information for automatic annotation, indexing and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation and alignment, as well as low image contrast and complex backgrounds, make the problem of automatic text extraction extremely challenging.
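The localization and extraction steps of the text extraction phase can be illustrated with a small Python sketch. The paper does not state how the bounding boxes are obtained, so the use of 4-connected component labelling on a binary image is an assumption made here for illustration, and the function name `bounding_boxes` is hypothetical.

```python
from collections import deque

def bounding_boxes(binary):
    """Return (top, left, bottom, right) for each 4-connected text region.

    `binary` is a non-empty list of rows holding 0 (background) or 1 (text).
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component,
            # growing its bounding box as new pixels are reached.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort by left edge so boxes follow reading order on a single text line.
    return sorted(boxes, key=lambda box: box[1])
```

Each returned box can then be cropped from the binary image to obtain one candidate character. Touching or broken characters would need extra handling that this sketch does not attempt.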

                                                Pre-processing Phases
(1) Converting the input RGB image into grayscale image: I = rgb2gray(RGB)
(2) Converting gray image to binary image: BW = im2bw(I, level)
(3) Blur removal: h = fspecial(type), B = imfilter(A, H)
(4) Sharpening: h = fspecial('unsharp', alpha)

                                      Figure 2: Block diagram of preprocessing
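The two filtering boxes of Figure 2 can be sketched in plain Python to show what the filter kernels contain. This is an illustrative re-implementation under stated assumptions, not the toolbox code used in the experiments: the function names are hypothetical, the kernel size and sigma defaults are chosen only for the example, and border pixels are replicated here as a simplification.

```python
import math

def gaussian_kernel(size=3, sigma=0.5):
    """Square Gaussian kernel (odd size), normalized so weights sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[w / total for w in row] for row in k]

def unsharp_kernel(alpha=0.2):
    """3x3 sharpening kernel: identity minus a Laplacian shaped by alpha."""
    s = 1.0 / (alpha + 1.0)
    edge, corner = (alpha - 1.0) * s, -alpha * s
    return [[corner, edge, corner],
            [edge, (alpha + 5.0) * s, edge],
            [corner, edge, corner]]

def correlate(image, kernel):
    """Correlate a 2-D image with an odd-sized kernel, replicating borders."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # Clamp coordinates so border pixels are reused.
                    y = min(max(r + dy, 0), rows - 1)
                    x = min(max(c + dx, 0), cols - 1)
                    acc += image[y][x] * kernel[dy + half][dx + half]
            out[r][c] = acc
    return out
```

Both kernels sum to 1, so flat regions keep their gray level. The Gaussian kernel spreads an isolated bright pixel over its neighbours, while the unsharp kernel amplifies it.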

         V. ALGORITHM
Stage 1: Convert the input RGB image into grayscale image:
    (The RGB values are normalized to a single gray scale value. Grayscale images are distinct from one-bit black-and-white images, which in the context of computer imaging are images with only two colours, black and white [bi-level images].)
Stage 2: Convert grayscale image into binary image:
          Stage 2.1: BW = im2bw (I, level)
    (The output image BW replaces all pixels in the input image with luminance greater than level with the value 1 (white) and replaces all other pixels with the value 0 (black).)
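Stages 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration rather than the toolbox implementation: the function names are hypothetical, pixel values are assumed to be 8-bit (0 to 255), `level` is assumed to be a fraction between 0 and 1, and the luminance weights 0.2989, 0.5870 and 0.1140 are the commonly used values for this conversion.

```python
def rgb_to_gray(rgb):
    """Collapse each (R, G, B) pixel (0-255) to a single luminance value."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in rgb]

def to_binary(gray, level=0.5):
    """Threshold 0-255 luminance: 1 (white) above level, otherwise 0 (black)."""
    cutoff = level * 255.0
    return [[1 if value > cutoff else 0 for value in row] for row in gray]

# Usage: a 2x2 image with white, black, pure red and pure green pixels.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 255, 0)]]
binary = to_binary(rgb_to_gray(rgb), level=0.5)
```

A fixed `level` is the simplest choice. Unevenly lit camera images may call for a threshold chosen per image or per region.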


Stage 3: Blur removal:
           Stage 3.1: h = fspecial (type)
     (Creates a two-dimensional filter h of the specified type. fspecial returns h as a correlation kernel, which is the appropriate form to use with imfilter. type is a string such as 'average', 'disk' or 'gaussian'.)
           Stage 3.2: B = imfilter (A, H)
     (Filters the multidimensional array A with the multidimensional filter H.)
Stage 4: Sharpening:
           Stage 4.1: h = fspecial ('unsharp', alpha)
     (Returns a 3-by-3 unsharp contrast enhancement filter. fspecial creates the unsharp filter from the negative of the Laplacian filter with parameter alpha, where alpha controls the shape of the Laplacian and must be in the range 0.0 to 1.0. The default value for alpha is 0.2.)
Stage 5: Dewrapping:
Based on the background of the image, different approaches have been proposed for document image dewrapping. These approaches can be divided into two main categories based on the document capturing methodology: (i) approaches in which a specialized hardware arrangement such as a stereo camera is required for 3D shape reconstruction of the wrapped document and (ii) approaches in which the dewrapping method is designed for an image captured using a single hand-held camera or mobile phone in an uncontrolled environment. In our present work, we deal with the second approach, in which documents on curved surfaces are captured using mobile phones [8].

We have also worked with certain typical special cases where the recognized character did not match the one present in the given document image, mainly due to the style in which characters are represented in the given document image, as illustrated in Figure 3.

   INPUT DOCUMENT IMAGE -> OUTPUT EDITABLE FORM
   Case 1: The letter 'I' could not be matched with the template 'I'.
   Case 2: Since the degree of wrapping is high, the letter 'T' could not be matched.
   Case 3: The digit '8' has a high degree of wrapping. Thus '8' is recognized as '5'.

                              Figure 3: Typical Special Cases
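The matching failures in Figure 3 can be illustrated with a small Python sketch of template matching. The similarity measure is not specified in the paper, so the fraction of agreeing pixels used here is an assumption, and the 3x3 templates, the function names and the `min_score` cut-off are hypothetical values chosen only for the example.

```python
# Hypothetical 3x3 binary templates standing in for a template database.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

def match_score(glyph, template):
    """Fraction of pixels on which two equal-sized binary images agree."""
    total = agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += (g == t)
    return agree / total

def recognize(glyph, templates=TEMPLATES, min_score=0.8):
    """Return the best-matching label, or None if no template is close enough."""
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None
```

A lightly distorted glyph still scores highest against its own template, while a strongly distorted one either falls below `min_score` and stays unmatched or scores higher against a different template. These two outcomes correspond to the unmatched and misrecognized cases listed in Figure 3.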


Typical results of the various operations of the proposed approach are shown in Fig. 4.

A. Input Image: Fig 4.1 The input document image
B. Preprocessing: Fig 4.2 Input image, Motion blurred image, Gray image, Sharpened image
B.1. Noise Removal using bwareaopen: Fig 4.3 Input image without noise
B.2. Binary image: Fig 4.4 Gray image converted into Binary image
C. Text localization: Fig 4.5 Bounding box for the text in the document
D. Text Extraction: Fig 4.6 Characters extracted individually
E. Editable document: Fig 4.7 Text converted into editable format

                         Figure 4: Various Operations performed

IX. CONCLUSION

Image processing is a method that has wide applications in disciplines within a researcher's purview. In the present work, we have taken up the concept of document image analysis in a broader sense, but we have restricted ourselves to text extraction.

Specifically, document analysis plays a very important role in image processing, since any information that has to be

authenticated will be in the form of a document only. Thus, in the present
problem, we have used the concept of template matching to compare the text
present in an input image against a database of characters acquired as
knowledge during training.
         Image wrapping or dewrapping may be implemented using texture
mapping by defining a correspondence between a uniform polygonal mesh and
a wrapped mesh. The points of the wrapped mesh are assigned the
corresponding texture coordinates of the uniform mesh, and the mesh is
texture mapped with the original image. Using this technique, simple
transformations such as zoom, rotation or shearing can be implemented
efficiently.
         This paper provides various tools for manipulating images and
opens up new avenues in areas such as text extraction and detection. The
approach first converts the input document image, that is, the text (which
may be text present in an image), into its gray-image constituents. Before
performing any other operation, the approach applies pre-processing stages
such as blurring, cleaning and sharpening to build a knowledge base. The
algorithm designed for the present work is able to identify alphanumeric
text, i.e. letters (capital A-Z and small a-z) and numerals (0-9),
including text that is skewed.
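The template-matching step summarized in this section can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' implementation: the paper does not specify a similarity measure, so the fraction of agreeing pixels is used here, and the function names and the 3x3 TEMPLATES database are hypothetical stand-ins for the trained character database.

```python
def match_score(glyph, template):
    """Fraction of pixels on which two equally sized binary images agree."""
    same_shape = len(glyph) == len(template) and all(
        len(g_row) == len(t_row) for g_row, t_row in zip(glyph, template)
    )
    if not glyph or not glyph[0] or not same_shape:
        raise ValueError("glyph and template must be non-empty and equally sized")
    total = sum(len(row) for row in glyph)
    agree = sum(
        1
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
        if g == t
    )
    return agree / total


def classify(glyph, template_db):
    """Return (label, score) of the best-matching template in the database."""
    best_label, best_score = None, -1.0
    for label, template in template_db.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical 3x3 templates standing in for a trained character database.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}
```

In practice each extracted character would first be cropped to its bounding box and resized to the template size before being passed to classify. The sketch leaves that step out.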
