Web OCR Services in Indian Language - C-DAC, Noida

                              Presented By:
       Tushar Patnaik            Bhupendra Kumar          Deepak Kumar Arya
   School of Information       School of Information     School of Information
        Technology                  Technology               Technology
       CDAC, Noida                CDAC, Noida               CDAC, Noida
   Introduction to OCR
    OCR translates scanned document images into
      machine-encoded text.

[Figure: Input image -> OCR System -> Output text]
Introduction to Web Services
 A software system designed to support interoperable machine-to-machine
  interaction over a network.
 Builds a distributed computing platform for the web.
 Can combine different web applications and components.
 A web application is an application that is accessed over a
  network such as the internet or an intranet. Web applications offer:
      no complex deployment procedure, even in large organizations
      little or no disk space needed on the client
     easy upgrades, since all new features are implemented on the server
     cross-platform compatibility
     integration with other web applications
Web OCR Services
 The OCR is implemented as a web application where a user
  can upload an image and generate text output on the fly through
  the web.
 Challenges:
    Multi-user support with session handling
    Handling of non-standard documents
   Administrative control over the user and application
   Controlled access to resources
   Scalability issues
Need for Web OCR Service
    The need to develop a web-based OCR arose in order
    to make the stand-alone OCR for Indian scripts
    available online and to get feedback from users about the
    availability of such a service.
    There was a need to provide an online service so that
    users globally can also take advantage of OCR
    for Indian scripts.
   To maintain large volumes of data through digital
   To preserve old and historical documents in
    electronic format.
Comparison of development platforms: Visual Studio, Netbeans, PHP

Key criterion: Nature
  Visual Studio: Has a dynamic nature and has broken new ground by entering
    into new languages (even developing some of its own). It is form
    oriented, object oriented and precompiled.
  Netbeans: It is also form oriented, object oriented and precompiled, like VS.
  PHP: PHP is still tied to its scripting-language model. It is an old
    software with no newer versions.

Key criterion: Programming Languages
  Visual Studio: Supports different programming languages by means of
    language services, which allow the code editor and debugger to support
    nearly any programming language.
  Netbeans: Written in Java, but can run anywhere a JVM is installed.
  PHP: PHP code is embedded into the HTML source document and interpreted
    by a web server, which generates the web page document. It has also
    evolved to include a command-line interface capability and can be used
    in standalone graphical applications.

Key criterion: Compiler
  Visual Studio: Parallel compilation on multicore systems improves
    performance by a good 25-30% over previous versions on C# apps. It
    integrates web services hosting, which earlier had to be done
    separately by the users.
  Netbeans: A newer lexer makes runtime compilation faster.
  PHP: PHP is a loosely typed, objects-optional, fixed-syntax,
    component-less, runtime-interpreted, structured programming model.
    It is not precompiled and form oriented as VS is.

Key criterion: Space utilization
  Visual Studio: ASP.NET utilizes server space while running.
  Netbeans: It uses server space and not inbuilt memory.
  PHP: PHP uses inbuilt memory space while running.

Key criterion: Security
  Visual Studio: ASP.NET is reputed for creating sophisticated techniques
    to ensure the safety of confidential data. It is professional in
    nature and is used for corporate projects.
  Netbeans: Security techniques are present, but not as strong as in VS.
  PHP: PHP provides security, but does not ensure as much as .NET. It is
    not as professional and secure as required for corporate projects.
 The OCR Web portal can be
  accessed by two different types
  of users.
     The administrator
  controls other users' activities
  and has write access to web
 The Web portal provides services
  to the users through
  back-end web applications such as
  OCR services, the preprocessing
  service, text editing and other
  image processing facilities.
Web OCR Service: A View
[Screenshots, in order: User's Dashboard (no files to display); File Upload;
Preview Uploaded File; File Editing; Edited File; OCR File; OCRed File with
Output; User's Dashboard after the file has been OCRed]
Web OCR Services using Grid

   Grid computing follows a service-oriented architecture and
   provides hardware and software services and infrastructure
   for secure and uniform access to heterogeneous resources,
   and enables the formation and management of virtual organizations.
   "A computational grid is a hardware and software
   infrastructure that provides dependable, consistent,
   pervasive, and inexpensive access to high-end computational
   capabilities."
   - "The Grid: Blueprint for a New Computing Infrastructure",
   Foster & Kesselman
What can grid computing provide?

  • Exploit underutilized resources
    • the application must be executable remotely.
    • remote machine must meet any special hardware, software, or
      resource requirements imposed by the application.
  • Parallel CPU capacity
  • Access to additional resources
  • Resource balancing
  • Reliability
    • multiple copies
    • automatically resubmit jobs
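
The reliability point above (automatically resubmitting jobs) can be illustrated with a minimal sketch. It is not the grid middleware used by the authors; the function `run_with_resubmission`, the job `ocr_job` and the worker names are hypothetical, and a failed worker is simulated by raising `RuntimeError`.

```python
def run_with_resubmission(job, workers, max_attempts=3):
    """Submit a job to workers in turn until one of them succeeds.

    `job` is any callable that takes a worker name and raises
    RuntimeError when that worker fails. Returns (result, attempts_used).
    """
    last_error = None
    for attempt, worker in enumerate(list(workers)[:max_attempts], start=1):
        try:
            return job(worker), attempt
        except RuntimeError as exc:
            last_error = exc  # remember the failure and resubmit elsewhere
    raise RuntimeError(f"job failed on all attempted workers: {last_error}")


# Hypothetical OCR job: the worker named "node-b" is assumed to be down.
def ocr_job(worker):
    if worker == "node-b":
        raise RuntimeError(f"{worker} is unreachable")
    return f"text recognised on {worker}"


result, attempts = run_with_resubmission(ocr_job, ["node-b", "node-a"])
print(result, "after", attempts, "attempts")
```

A real grid scheduler would also track worker load and keep multiple copies of the data; this sketch only shows the resubmission loop.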
        Basic Grid Architecture
[Figure: basic grid architecture. Labels: Users, Internet, Web OCR, Worker Agent]
    Future Scope
 Multiple file upload with status bar.
 Animation/progress indicator at the time of OCR execution.
 Batch processing of files.
 Deciding the process-flow and saving the workflow for future use.
 Dictionary based corrections in the output of OCR.
 Controls for applying multiple types of text formatting like Bold,
  Italics, Underline etc.
 Zoom-in and Zoom-out functions for both input and output images.
 Conversion of the executable (exe) files of all the OCRs to DLL library
  files and integrating them.
 Authenticating user login through OCR CAPTCHA.
 The proposed system has been designed and implemented,
    providing the services defined.
   At present, OCRs for five scripts have been integrated.
   OCRs for seven more scripts are planned to be integrated during
    the next two years.
   The computational work of the OCR engine will be handled by
    the grid architecture.
   The number of users of the Web OCR services may not be
    large at present, but as more facilities and OCRs
    are included, a large number of users will benefit.
 Software Works, "Comparison of dot net, J2ee, PHP".
 Ian Foster, Carl Kesselman, and S. Tuecke, "The Anatomy of the
  Grid: Enabling Scalable Virtual Organizations",
  International Journal of Supercomputer Applications, 15(3), Sage
  Publications, USA, 2001.
 Rajkumar Buyya and Srikumar Venugopal, "A Gentle
  Introduction to Grid Computing and Technologies", CSI
  Communications, Vol. 9, July 2005.
 In this paper, a development methodology for the Web OCR services
  is proposed.
 The term Web services describes a standardized way of
  integrating Web-based applications using standards such as XML and SOAP.
 Rather than providing a GUI, Web services share business logic, data and
  processes through a programmatic interface across a network.
 Developers can then add the Web service to a GUI (such as a Web
  page or an executable program) to offer specific functionality to
  users. Services such as optical character recognition, where a user can
  upload an image and get the text output on the fly through the web, are
  still not available online for Indian languages.

        Keywords- Web Services, OCR Services, Image processing
 A web application is an application that is accessed over a network
    such as the internet or an intranet. Web applications are popular due to
    the ubiquity of web browsers and the convenience of using a web
    browser as a client, sometimes called a thin client.
   Common web applications include webmail, online retail sales, online
    auctions, wikis and many others.
   Services such as optical character recognition, where a user can upload
    an image and get the text output on the fly through the web, are still
    not available online for Indian languages.
   The framework for the OCR services uses ASP.NET for the
    middle-tier application logic. It supports multiple users,
    authentication, session handling, multiple file upload, user control over
    the technical flow, session saving and multilingual facilities for the user.
   It also supports handling of non-standard images, administrative control
    over client requests and resources, multilevel user priorities,
    horizontal and vertical scalability, and transparent
    replacement, repair and upgrade of the application.
 The Ministry of Information and Technology has constituted a
  consortium to develop Indian-language OCR so that documents in
  all Indian languages can be digitized. CDAC Noida, as a consortium
  member, has developed a Web OCR service portal for the internet
 The comparative study led us to select Visual Studio .NET,
  as it has a dynamic nature and has broken new ground by
  entering into new languages (even developing some of its own). It
  is form oriented, object oriented and precompiled, unlike PHP.
 VS Team System Database Edition has excellent database-code
  integration tools. LINQ code generators are another excellent
  feature. Winforms and ASP forms are great and better than
 In this paper, we propose a development methodology for the Web
  OCR services using Visual Studio 2010 .NET tools with ASP.NET
  version 4 as the development technology.
 The architecture in Figure 1
  shows that the OCR Web portal
  can be accessed by two different
  types of users.
 The administrator's
  functions and controls are
  different from those of a normal user.
 The Web portal provides services to
  the users through back-end
  web applications such as OCR
  services, the preprocessing service,
  text editing and other image
  processing facilities.

                                     Figure 1
 The use-case diagram is
  shown in Figure 2. The
  diagram describes the set of
  actions that the system can
  perform in collaboration with
  external users or actors.

                                 Figure 2
• The OCR Web Portal incorporates
an end-to-end OCR system for
different scripts, preprocessing modules and
different levels of access for the
end user and the administrator.
• The user can upload an input
file or files through the web
portal after logging in to
the server, and can then select
the OCR or preprocessing
module to execute.
• The text output can be
edited by the user through the
web portal, which therefore requires
an online keyboard for each script.
• The administrator is provided with
the facility to control
other users' activity and
the configuration of the
OCR and preprocessing modules.

                                   Fig 3. Workflow
       A. User activity and control
 This module defines the role and access rights of the end user
    of the OCR Web Portal.
   The user module interfaces with the login and new registration modules.
   The login module checks the credentials and
    verifies the user.
   The new registration module defines how a
    new user obtains credentials.
   The user module specifies the services provided to the end
    user and maintains sessions. The services include uploading
    one or more files, downloading the output data, selecting the OCR or
    preprocessing module to execute on the input file, editing
    the output text file using the online keyboard, and logout.
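
The registration, login and session steps of the user module can be sketched as follows. This is only an illustration in Python, not the authors' ASP.NET implementation: the class `UserModule`, its method names and the in-memory dictionaries are assumptions, and a real portal would keep users and sessions in a database.

```python
import hashlib
import hmac
import secrets


class UserModule:
    """Hypothetical stand-in for the registration, login and session services."""

    def __init__(self):
        self._users = {}     # username -> (salt, password hash)
        self._sessions = {}  # session token -> username

    @staticmethod
    def _hash(password, salt):
        # Salted, iterated hash so plain passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, username, password):
        if username in self._users:
            raise ValueError("username already registered")
        salt = secrets.token_bytes(16)
        self._users[username] = (salt, self._hash(password, salt))

    def login(self, username, password):
        """Return a new session token, or None if the credentials are wrong."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, stored = record
        if not hmac.compare_digest(stored, self._hash(password, salt)):
            return None
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def current_user(self, token):
        return self._sessions.get(token)

    def logout(self, token):
        self._sessions.pop(token, None)


users = UserModule()
users.register("asha", "s3cret")
token = users.login("asha", "s3cret")
print(users.current_user(token))
```

The session token is what later requests (file upload, OCR selection, text editing) would present to identify the user.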
        B. Online Keyboard module

 The module specifies the design and usage of the online keyboard
  used by the user module for text editing through the OCR
  web portal.
 This module generates an online keyboard for each script
  included in the OCR module.
 It also interfaces with the OCR selected by the user, so that it
  can initialize the correct keyboard for the user on the web portal.
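
One possible way to pick the keyboard for the selected OCR is to map each script to its Unicode block and list the assigned characters in it. The sketch below assumes this approach; the slides do not say how the keyboards are built, the function `keyboard_keys` is hypothetical, and the four scripts listed are examples rather than the five scripts actually integrated.

```python
import unicodedata

# Unicode block ranges for a few Indian scripts (illustrative selection).
SCRIPT_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Tamil": (0x0B80, 0x0BFF),
}


def keyboard_keys(script):
    """Return the assigned characters of the script's Unicode block."""
    start, end = SCRIPT_BLOCKS[script]
    keys = []
    for code_point in range(start, end + 1):
        ch = chr(code_point)
        # unicodedata.name returns the default "" for unassigned code points.
        if unicodedata.name(ch, ""):
            keys.append(ch)
    return keys


# The keyboard is initialized from the script of the OCR the user selected.
selected_script = "Devanagari"
print(len(keyboard_keys(selected_script)), "keys for", selected_script)
```

A usable keyboard would group these characters into vowels, consonants, signs and digits; the sketch only shows how the script selection drives the key set.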
C. Administrator activity and control
 The module specifies the control mechanism for the Web portal.
   The administrator must provide valid
    credentials to access the services. The services include
    checking the input and output files of normal users and
    controlling the configuration files for the OCR and
    preprocessing modules.
   The OCR and preprocessing modules read the
    configuration file before processing the input; the file controls the
    technical flow.
   The administrator can change the configuration files
    to help generate better output for the user.
   The administrator can also access the various log files, as this module
    interfaces with the log generation and maintenance module.
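
The interplay between administrator-controlled configuration and the processing modules can be sketched as below. The names `ConfigStore` and `run_preprocessing`, the configuration keys, and the use of an in-memory dictionary in place of configuration files are all assumptions made for illustration.

```python
# Hypothetical default configuration; the real keys are not given in the slides.
DEFAULT_CONFIG = {"binarize": True, "deskew": True, "script": "Devanagari"}


class ConfigStore:
    """In-memory stand-in for the configuration files."""

    def __init__(self, config=None):
        self._config = dict(config or DEFAULT_CONFIG)

    def read(self):
        # Hand out a copy so callers cannot change the stored settings.
        return dict(self._config)

    def update(self, role, **changes):
        if role != "administrator":
            raise PermissionError("only the administrator may change the configuration")
        unknown = set(changes) - set(self._config)
        if unknown:
            raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
        self._config.update(changes)


def run_preprocessing(store, image_name):
    """Read the configuration first, then decide which steps to run."""
    config = store.read()
    steps = [name for name in ("binarize", "deskew") if config[name]]
    return {"image": image_name, "steps": steps, "script": config["script"]}


store = ConfigStore()
store.update("administrator", deskew=False)
print(run_preprocessing(store, "page1.png"))
```

Because every module reads the configuration before it runs, an administrator change takes effect on the next job without touching the module code.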
    D. OCR Module/Preprocessing
 The module is responsible for generating XML files
  according to schemas, which in turn provide a common interface
  for any OCR and preprocessing module.
 The current version of Web OCR contains OCR engines for five scripts.
 Other image editing facilities are also provided in the
  Web portal, such as image rotation, brightness control and image
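
As an illustration of XML acting as the common interface between modules, the sketch below writes OCR output as a small XML document. The element and attribute names (`ocrResult`, `line`, `image`, `script`, `n`) are invented for this example; the actual schemas used by the portal are not given in the slides.

```python
import xml.etree.ElementTree as ET


def build_ocr_xml(image_name, script, lines):
    """Serialize recognised text lines into a hypothetical XML layout."""
    root = ET.Element("ocrResult", {"image": image_name, "script": script})
    for number, text in enumerate(lines, start=1):
        line = ET.SubElement(root, "line", {"n": str(number)})
        line.text = text
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


xml_text = build_ocr_xml("page1.png", "Devanagari", ["पहली पंक्ति", "दूसरी पंक्ति"])
print(xml_text)
```

Any module that understands the agreed layout can consume this output, whichever OCR engine produced it.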
    E. Log creation and Maintenance
   This module interfaces with the user module and the administrator
    module and logs information about activities on the Web portal.
   The interface to the user module is used for creating a log of user
    activities, while the interface to the administrator module is used for
    retrieving the log information.
   This module also records the
    text editing done by the user.
   The log contains every edit made to the text
    output of the document image.
   This information is very useful for improving OCR
    engine performance, as it reveals the most frequent errors
    made by the OCR itself at the character and word level.
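
A minimal sketch of how logged edits could be turned into a list of frequent character-level errors is given below. It assumes the log stores pairs of (OCR output, user-corrected text), which the slides do not specify; the function `substitution_counts` is hypothetical and uses Python's standard `difflib` to align the two strings.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_counts(edit_log):
    """Count (wrong, corrected) replacements across logged edits."""
    counts = Counter()
    for ocr_text, corrected in edit_log:
        matcher = SequenceMatcher(None, ocr_text, corrected, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":
                counts[(ocr_text[i1:i2], corrected[j1:j2])] += 1
    return counts


# Illustrative log entries with Latin text; the same idea applies to Indian scripts.
edit_log = [("c1ear", "clear"), ("he1lo", "hello"), ("w0rd", "word")]
for (wrong, right), count in substitution_counts(edit_log).most_common():
    print(f"{wrong!r} -> {right!r}: {count}")
```

For Indian scripts the comparison would probably need to work on whole syllable clusters rather than single code points, since one visible character can span several code points.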
        F. Output Generation module
   The module defines the format of the output text generated by the
    OCR engine and the text editing facilities to be provided.
   The output text should be in Unicode so that the output for all
    scripts is standardized and accessible everywhere.
   Text editing is provided by a rich text control,
    where the user edits the output with bold, italics, underline, coloring and
    other services.
   This control also provides a print function, so the user can send the
    output to a printer without saving it locally.
   The dictionary module can also be embedded into the rich text control.
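
To make the Unicode requirement concrete, the sketch below normalizes OCR output and encodes it as UTF-8. The choice of NFC normalization and UTF-8 is an assumption; the slides only state that the output should be in Unicode. The function name `to_unicode_output` is hypothetical.

```python
import unicodedata


def to_unicode_output(text):
    """Normalize text to NFC and encode it as UTF-8 bytes."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


# The same visible text from two engines can differ in code points;
# normalization makes the stored output identical.
composed = "\u00e9"      # single code point
decomposed = "e\u0301"   # base letter plus combining accent
print(to_unicode_output(composed) == to_unicode_output(decomposed))
```

Storing one normalized encoding keeps searching, dictionary lookup and comparison of output consistent across scripts.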
