Web OCR Services in Indian
Tushar Patnaik, Bhupendra Kumar, Deepak Kumar Arya
School of Information Technology, CDAC, Noida
Introduction to OCR
Translates scanned document images into
machine-encoded text.
Introduction to Web Services
A software system designed to support interoperable
machine-to-machine interaction over a network.
Builds a distributed computing platform for the web.
Can combine different web applications and components.
A web application is an application that is accessed over a
network such as the internet or an intranet.
Web applications do not require any complex procedure to deploy in large organizations.
They need little or no disk space on the client.
Easy upgrade since all new features are implemented on the server
Integration to other web applications
Web OCR Services
The OCR is implemented as a web application where a user
can upload an image and generate text output on the fly through
the web.
Multi-user support with session handling
Handling of non-standard documents
Administrative control over the user and application
Controlled access to resources
Need for Web OCR Service
The need to develop a web-based OCR arose so as
to make the stand-alone OCR for Indian scripts
available online and to get feedback from users about the
availability of such a service.
There was a need to provide an online service so that
users globally can take advantage of the OCR service
for Indian scripts as well.
To maintain large volumes of data through digital
To preserve old and historical documents in
Key Criteria: Nature
    Visual Studio: ASP.net has dynamic nature and has broken new grounds by entering into new languages (even developing some of its own). It is form oriented, object oriented and precompiled.
    Netbeans: It is also form oriented, object oriented and precompiled as VS.
    PHP: PHP is still stuck to its scripting language. It is an old software with no newer versions.

Key Criteria: Programming Languages
    Visual Studio: Visual Studio supports different programming languages by means of language services, which allow the code editor and debugger to support nearly any programming language.
    Netbeans: Written in Java but can run anywhere a JVM is installed.
    PHP: PHP code is embedded into the HTML source document and interpreted by a web server, which generates the web page document. It also has evolved to include a command-line interface capability and can be used standalone.

Key Criteria: Compiler
    Visual Studio: Parallel compilation on multicore systems does improve performance by a good 25-30% over previous versions on C# apps. It integrates web services hosting, which earlier had to be done separately by the users.
    Netbeans: Newer Lexer makes faster runtime compilation.
    PHP: PHP is a loosely typed, objects optional, fixed syntax, component-less, runtime interpreted, structured programming model. It is not precompiled and form oriented as VS.

Key Criteria: Space utilization
    Visual Studio: ASP.Net utilizes server space while running.
    Netbeans: It uses server space and not inbuilt memory.
    PHP: Inbuilt memory space is used by PHP while running.

Key Criteria: Security
    Visual Studio: ASP.Net is reputed for creating sophisticated techniques to ensure the safety of confidential data. It is professional in nature and is used for corporate projects.
    Netbeans: Security techniques present but not as great as VS.
    PHP: PHP provides security but does not ensure as much as DOT Net. It is not professional and secured as required for corporate projects.
The OCR Web portal can be
accessed by two different types
of users.
The administrator user controls
other users' activities and has
write access to web
Web portal provides services
to the users through the
back-end web applications like
OCR services, preprocessing
service, text editing and other
image processing facilities.
Web OCR Service: a View
[Screenshots: "No files to Display"; "Preview Uploaded File"; "OCRed File with Output"; "User's Dashboard after File has"]
Web OCR Services using Grid
Grid computing follows a service-oriented architecture and
provides hardware and software services and infrastructure
for secure and uniform access to heterogeneous resources,
and enables the formation and management of virtual
organizations.
"A computational grid is a hardware and software
infrastructure that provides dependable, consistent,
pervasive, and inexpensive access to high-end computational
capabilities."
- "The Grid: Blueprint for a New Computing Infrastructure",
Kesselman & Foster
What can grid computing provide?
• Exploit underutilized resources
    • the application must be executable remotely.
    • the remote machine must meet any special hardware, software, or
      resource requirements imposed by the application.
• Parallel CPU capacity
• Access to additional resources
• Resource balancing
• multiple copies
• automatically resubmit jobs
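The "automatically resubmit jobs" point can be illustrated with a minimal sketch. It assumes a hypothetical `submit(node, job)` callable standing in for whatever the grid middleware offers; the function names, the node names, and the try-each-node policy are illustrative assumptions, not part of the system described here.

```python
from typing import Callable, Sequence


def run_with_resubmit(job: str,
                      nodes: Sequence[str],
                      submit: Callable[[str, str], str]) -> str:
    """Try each node in turn and resubmit the job when a node fails.

    `submit(node, job)` is a hypothetical stand-in for a grid middleware
    call. It returns the job output or raises RuntimeError on failure.
    """
    errors = []
    for node in nodes:
        try:
            return submit(node, job)
        except RuntimeError as exc:
            # Remember the failure and move on to the next node.
            errors.append(f"{node}: {exc}")
    raise RuntimeError("job failed on every node: " + "; ".join(errors))


def demo_submit(node: str, job: str) -> str:
    # Illustrative behaviour only: the first node is treated as down.
    if node == "node-a":
        raise RuntimeError("node unavailable")
    return f"{job} done on {node}"
```

Calling `run_with_resubmit("ocr-page-1", ["node-a", "node-b"], demo_submit)` fails on the first node and completes on the second.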
Basic Grid Architecture
Multiple file upload with status bar.
Animation/progress indicator at the time of OCR execution.
Batch processing of files.
Deciding the process-flow and saving the workflow for future use.
Dictionary based corrections in the output of OCR.
Controls for applying multiple types of text formatting like Bold,
Italics, Underline etc.
Zoom-in and Zoom-out functions for both input and output images.
Conversion of exe files for all the OCRs to dll library files and
Authenticating user login through OCR CAPTCHA.
The proposed system has been designed and implemented,
providing the services defined.
At present, OCRs for five scripts have been integrated.
OCRs for seven more scripts are planned to be integrated during
the next two years.
The computational work of the OCR engines will be provided by
the grid architecture.
The number of users of the Web OCR services may not be large
at present, but as more facilities and OCRs are
included, a large number of users will benefit.
• Software Works, "Comparison of dot net, J2ee, PHP".
• I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the
Grid: Enabling Scalable Virtual Organizations",
International Journal of Supercomputer Applications, 15(3), Sage
Publications, 2001, USA.
• R. Buyya and S. Venugopal, "A Gentle
Introduction to Grid Computing and Technologies", CSI
Communications, Vol. 9, July 2005.
In this paper development methodology for the web OCR services
The term Web services describes a standardized way of integrating
Web-based applications using the XML, SOAP, WSDL and HTTP
Web services instead share business logic, data and processes
through a programmatic interface across a network.
Developers can then add the Web service to a GUI (such as a Web
page or an executable program) to offer specific functionality to
users. Services like optical character recognition, where a user can upload an
image and get the text output on the fly through the web, are still not
available on the web for Indian languages.
Keywords- Web Services, OCR Services, Image processing
A web application is an application that is accessed over a network
such as the internet or an intranet. Web applications are popular due to
the ubiquity of web browsers, and the convenience of using a web
browser as a client, sometimes called a thin client.
Common web applications include webmail, online retail sales, online
auctions, wikis and many other functions.
Services like optical character recognition, where a user can upload an
image and get the text output on the fly through the web, are still not
available on the web for Indian languages.
The framework for the OCR services uses ASP.NET
for the middle-tier application logic. The framework supports multiple users,
authentication, session handling, multiple file upload, user control over the
technical flow, session saving, and multilingual facilities for the user.
It also supports handling of non-standard images, administrative control
over client requests and resources, multilevel priorities for users,
scalability (horizontal and vertical), and transparency to
replace, repair and upgrade the application.
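The "multilevel priorities" idea can be sketched as a priority queue of OCR jobs. This is a minimal illustration in Python rather than the portal's ASP.NET code; the class name `JobQueue`, the convention that a lower number means higher priority, and the sample users are all assumptions.

```python
import heapq
import itertools


class JobQueue:
    """Serve OCR jobs by user priority, in arrival order within a level."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that keeps arrival order

    def submit(self, priority: int, user: str, filename: str) -> None:
        # Assumption of this sketch: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._order), user, filename))

    def next_job(self):
        """Remove and return the (user, filename) pair to process next."""
        _priority, _order, user, filename = heapq.heappop(self._heap)
        return user, filename

    def __len__(self):
        return len(self._heap)
```

With jobs submitted as `(2, "guest", "a.tif")`, `(1, "admin", "b.tif")`, `(2, "guest2", "c.tif")`, the queue serves the admin job first and then the two guest jobs in the order they arrived.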
The Ministry of Information and Technology has constituted a
consortium to develop Indian-language OCR so that digitization
of all Indian languages can be done. CDAC Noida, as a consortium
member, has developed a Web OCR service portal for the internet
The comparative study led us to select Visual Studio .NET,
as it has a dynamic nature and has broken new ground by
entering into new languages (even developing some of its own). It
is form oriented, object oriented and precompiled, unlike PHP.
VS Team System Database Edition has excellent database-code
integration tools. LINQ code generators are another excellent
feature. Winforms and ASP forms are great and better than
In this paper, we propose a development methodology for the Web
OCR services using Visual Studio 2010 .NET tools with ASP.NET
version 4 as the development technology.
The architecture in Figure 1
shows that the OCR Web portal can
be accessed by two different
types of users.
The administrator user's
functions and controls are
different from the normal user's.
Web portal provides services to
the users through the back-end
web applications like OCR
services, preprocessing service,
text editing and other image
processing facilities.
Figure 1
The use-case diagram is
shown in Figure 2. The
diagram describes the set
of actions that the system can
perform in collaboration with
external users or actors.
Figure 2
• The OCR Web Portal is to
incorporate end-to-end OCR
systems for different scripts,
preprocessing modules and
different levels of access for the
end user and administrator.
• The user can upload an input
file or files through the web
portal after logging in to
the server and can then select
the OCR or preprocessing
module for execution.
• The text outputs can be
edited by the user through the
web portal, so it requires an
online keyboard for each
script.
• The administrator control of the
web portal provides
the facility of controlling
other users' activity and
controlling the configuration of the
OCR and preprocessing
modules.
Fig 3. Workflow
MODULES AND PROCESSES
This section provides a general description of the modules
and where each fits in the global picture. The OCR Web
Portal comprises the following modules.
User activity and control
• Login module
• New registration module
Administrator activity and control
Log creation and maintenance
Output Generation modules
A. User activity and control
This module defines the role and accessibility of the end user
of the OCR Web Portal.
The user module interfaces with the login and new registration
modules.
The login module checks the supplied credentials and
verifies the user.
The new registration module defines the method by which a
new user obtains credentials.
The user module specifies the services provided to the end
user and maintains sessions. The services include uploading one or
more files, downloading the output data, selecting the OCR or
preprocessing module to execute on the input file, editing
the output text file using the online keyboard, and logout.
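The interplay of registration, login and session handling described above can be sketched as follows. This is a minimal in-memory illustration in Python, not the portal's ASP.NET implementation; the class `UserStore`, its method names, and the choice of salted PBKDF2 hashing are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


class UserStore:
    """In-memory registration, login and session handling (illustrative only)."""

    def __init__(self):
        self._users = {}     # username -> (salt, password hash)
        self._sessions = {}  # session token -> username

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, username: str, password: str) -> bool:
        """New registration: refuse names that are already taken."""
        if username in self._users:
            return False
        salt = secrets.token_bytes(16)
        self._users[username] = (salt, self._hash(password, salt))
        return True

    def login(self, username: str, password: str):
        """Verify the credentials and open a session, or return None."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, stored = record
        if not hmac.compare_digest(stored, self._hash(password, salt)):
            return None
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def current_user(self, token: str):
        return self._sessions.get(token)

    def logout(self, token: str) -> None:
        self._sessions.pop(token, None)
```

A successful `login` returns a session token that later requests (upload, module selection, editing) would present; `logout` invalidates it.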
B. Online Keyboard module
The module specifies the design and usage of the online keyboard
used by the user module for text editing through the OCR
Web Portal.
This module generates an online keyboard for every script
included in the OCR module.
It also interfaces with the OCR selected by the user so that it
can initialize the correct keyboard for the user on the web
portal.
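One way to pick the correct keyboard for the selected OCR is to derive the key set from the script's Unicode block. The sketch below is a hedged illustration: the text does not name the supported scripts, so the `SCRIPT_BLOCKS` table and the function `keyboard_keys` are assumptions.

```python
import unicodedata
from typing import List

# Unicode block ranges for a few Indian scripts. Which scripts the portal
# actually supports is not stated in the text; this table is an assumption.
SCRIPT_BLOCKS = {
    "devanagari": (0x0900, 0x097F),
    "bengali": (0x0980, 0x09FF),
    "gurmukhi": (0x0A00, 0x0A7F),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
}


def keyboard_keys(script: str) -> List[str]:
    """Return the assigned characters of the script's Unicode block."""
    start, end = SCRIPT_BLOCKS[script.lower()]
    keys = []
    for code in range(start, end + 1):
        ch = chr(code)
        # unicodedata.name returns the default for unassigned code points.
        if unicodedata.name(ch, None) is not None:
            keys.append(ch)
    return keys
```

For example, `keyboard_keys("Devanagari")` yields the characters a Devanagari on-screen keyboard could offer; a real layout would group and order them for typing.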
C. Administrator activity and control
The module specifies the control mechanism for the Web
A user with administrator privileges needs to provide valid
credentials to access the services. The services include
checking the input and output files of normal users and
controlling the configuration files for the OCR and
preprocessing modules.
The OCR and preprocessing modules access the
configuration file before executing the input to control the
The administrator can change the configuration files
to help generate better output for the user.
The administrator can also access the various log files, as this module interfaces with the log
generation and maintenance module.
D. OCR Module/Preprocessing
The module is responsible for generating XML files
according to schemas, which in turn provide a common interface
for any OCR and preprocessing module.
The current version of Web OCR contains OCR engines for five
scripts.
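A common XML wrapper for engine output might look like the sketch below. The actual schemas are not given in the text, so the element and attribute names (`ocrResult`, `line`, `file`, `script`, `n`) and the function `build_ocr_result` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ocr_result(filename: str, script: str, lines: list) -> str:
    """Wrap recognized text lines in a small XML document.

    The element and attribute names are illustrative, not the real schema.
    """
    root = ET.Element("ocrResult", {"file": filename, "script": script})
    for number, text in enumerate(lines, start=1):
        line = ET.SubElement(root, "line", {"n": str(number)})
        line.text = text
    return ET.tostring(root, encoding="unicode")
```

Because every engine would emit the same structure, the portal could parse the result of any OCR or preprocessing module with one reader.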
The other image editing facilities are also provided in the
Web portal like image rotation, brightness control and image
E. Log creation and Maintenance
This module interfaces with the user module and the administrator
module and logs information about the activities on the Web
portal.
The interface for the user module is used for creating a log of user
activities, while the interface for the administrator module is used for
retrieving the log information.
This module also provides important information about the
text editing done by the user.
The log information contains all the activity performed on the text
output of the document image.
This information is very useful for improving the OCR
engine's performance, as it can identify the most frequent errors
made by the OCR itself at character and word level.
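How logged edits could reveal frequent character-level errors can be sketched with a diff between the OCR output and the user's corrected text. The log format (pairs of strings) and the function names are assumptions; the example uses ASCII for readability, and scripts with combining marks would need alignment at the level of whole characters or syllables.

```python
from collections import Counter
from difflib import SequenceMatcher


def character_errors(ocr_text: str, corrected_text: str) -> Counter:
    """Count (OCR output, user correction) substitutions between two strings."""
    errors = Counter()
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            errors[(ocr_text[i1:i2], corrected_text[j1:j2])] += 1
    return errors


def most_frequent_errors(edit_log, top=5):
    """`edit_log` is an iterable of (ocr_text, corrected_text) pairs."""
    totals = Counter()
    for ocr_text, corrected_text in edit_log:
        totals.update(character_errors(ocr_text, corrected_text))
    return totals.most_common(top)
```

Given the log `[("c1ean", "clean"), ("1ine", "line"), ("word", "ward")]`, the substitution of "1" by "l" is counted twice and ranks first.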
F. Output Generation module
The module defines the format for the output text generated by
the OCR engine and the text editing facilities to be provided.
The output text should be in Unicode so that the
output for all scripts is standardized and accessible everywhere.
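A minimal sketch of standardizing output text is to normalize it and encode it as UTF-8. The choice of the NFC normalization form and the function name `to_standard_unicode` are assumptions; the text only requires Unicode output.

```python
import unicodedata


def to_standard_unicode(text: str) -> bytes:
    """Normalize text to NFC and encode it as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")
```

Normalization matters because the same visible character can be stored as different code point sequences; after this step, equivalent sequences from different OCR engines compare equal.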
The text editing service is provided by a rich text control
where the user edits the output with bold, italics, underline, coloring and
Also, this control provides a print function through which the user can send the
output to the printer without saving the output to local
The dictionary module can also be embedded into the rich text