Digitization Practices in India - DELNET

Digitization Practices in India: Issues and Challenges




                 V.N. Shukla
         C-DAC, NOIDA UNIT
C-DAC MISSION

• Natural Language Processing and Interfaces
• Infrastructure and Support Services
• Human Resource Development in Hi-Tech Areas




                   AREAS OF COMPETENCE (NOIDA)

• NLP
• Graphical Display System
• E-Governance
• Security Systems
• Internet on CATV & E-Commerce
• Embedded System
• Solar Energy System
• System Engineering and Consultancy

   Digital Library Activities : CDAC Noida

•Digital Library Projects

       •Mega Centre for Digital Library
       •Mobile Digital Library : Dware Dware Gyan Sampada
       •Digital Library at President’s House
       •Digital Library at Nagari Pracharini Sabha Varanasi
       •Digital Library at Uttaranchal
       •GyanNidhi : Multilingual Parallel Corpus in Indian Languages
       •Digital Library at Gujarat Vidyapeeth, Ahmedabad
       •Digitization of Libraries
          Digital Library Mission
        To organize the information and make it universally
                      accessible and useful.




Online Content: billions of web pages
Offline Content: billions of items still unindexed

DL Initiatives

Only ~15% of books are in print.
~85% of books are out of print and/or out of copyright; these
books are only found in libraries.




       GOAL: Create a comprehensive virtual card catalog of all
       books in all languages, while respecting publishers’ rights
                                                                      Source: Google
Digital Libraries

[Diagram; labels: Traditional Libraries; Index; DL creation & processes; Hyperlinks; Metadata Search; Index; Users]
                      A Typical Library Collection

The value is in the middle

In-Print: ~15%
Unclear copyright status: ~65% or more
   • May be in copyright, but not for sale
   • Rights may have reverted to author
   • May be in the public domain
Public Domain: Less than 20%**



        92% of the world's books are neither generating revenue for the
        copyright holder nor easily accessible to potential readers.*

*Source: Covey, Denise Troll. "Global Cooperation for Global Access: The Million Book Project"
**OCLC analysis of the Google Books Library Project: http://www.dlib.org/dlib/september05/lavoie/09lavoie.html
DIGITAL LIBRARY DEFINITION

   Digital Library (DL) may be seen as
    “Collection of intelligent creations by human
    beings through their own language and
    culture. It also reflects cultural heritage
    besides providing archive and generating
    many research issues pertaining to Natural
    Language Processing”
Digital Library ?
Sun Microsystems defines a digital library as the electronic extension of
functions users typically perform and the resources they access in a
traditional library.

These information resources can be translated into digital form, stored
in multimedia repositories, and made available through Web-based
services.


According to another definition, digital libraries are

“Organizations that provide the resources, including the specialized
staff, to select, structure, offer intellectual access to, interpret,
distribute, preserve the integrity of, and ensure the persistence over
time of collections of digital works so that they are readily available for
use by a defined community or set of communities”.
      What is a Digital Library?


   A Service? An Architecture?
   A set of Information Resources?
   A set of tools to locate, search, retrieve
    information?
   Possibly the tools to create such resources and
    services also fall within the purview of DLs
   Digital face of traditional libraries
   Include both digital and traditional collections
   Backbone and nervous system of libraries.
              Digital library Vs traditional library


•Efficient, high-quality services by collecting, organizing, storing,
 disseminating, retrieving and preserving information.

•Preservation benefits besides making information retrieval & delivery more
 comfortable.

•Online access to historical and cultural documents whose existence is
 endangered due to physical decay.

Digital libraries necessarily include a strong focus on the management of
digital content, just as traditional libraries have focused for long on the
management of content in physical forms.
                        Digital Content Management
Most of the digital content that is being managed includes:

• Human language in various forms: character-coded electronic text, scanned
images, printed or handwritten text, or human speech.

• Language technology helps in managing digital content.

• Learning from past experience also helps in managing content.

The major areas that can be exploited are:

    • Information retrieval,
    • multimedia,
    • database,
    • data mining,
    • data warehouse,
    • on-line information repositories,
    • image processing, hypertext,
    • World Wide Web and wide area information services (WAIS).
                   A few advantages of digital libraries
  • Access anywhere

  • Reducing delays

  • Distributed storage – central access

  • Better cataloguing

  • Cross references to other documents

  • Full text search

  • Protected information source

  • Wide exploration and exploitation of the information



The information explosion, high-bandwidth data networks and the potential
of Internet-based technologies such as the Web make digital libraries one of
the important application areas of computer science.
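
Full text search, listed among the advantages above, is commonly built on an inverted index that maps each word to the documents containing it. The following is a minimal sketch of that idea in Python, not a description of any particular digital library system; the sample documents and the function names (tokenize, build_index, search) are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample collection, for illustration only.
documents = {
    "doc1": "Palm leaf manuscripts preserved in libraries",
    "doc2": "Digital libraries provide full text search",
    "doc3": "Scanned manuscripts in digital form",
}
index = build_index(documents)
print(search(index, "digital manuscripts"))
```

Because the pattern `\w+` matches Unicode word characters in Python 3, the same tokenizer splits text in many scripts on whitespace and punctuation, although real multilingual search generally needs language-specific processing.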
Process of Digital Preservation

[Flowchart; connecting arrows lost. Boxes: Centralized Server; Book scanning status (decision: Yes / No); Reject the Book; Scanned Image in TIFF format; S/w to divide even & odd pages; Batch cropping & Cleaning; OCR; Conversion to TXT/RTF/HTML; XML Meta File Creation using Dublin Core Std.; Uploading]
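
The "XML Meta File Creation using Dublin core Std." step in the process above can be illustrated with a short sketch. It uses Python's standard xml.etree.ElementTree module and the Dublin Core elements namespace. The wrapper element name `metadata`, the helper `make_dc_record` and every field value below are assumptions made for illustration; the actual file layout used in the project is not given in the slides.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def make_dc_record(fields):
    """Build a <metadata> element with one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical book record; every value here is a placeholder.
record = make_dc_record([
    ("title", "Sample Book Title"),
    ("creator", "Sample Author"),
    ("language", "hi"),
    ("format", "image/tiff"),
    ("identifier", "sample-0001"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A list of pairs is used instead of a dictionary because Dublin Core elements are repeatable, so a record may carry several `creator` or `subject` entries.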
                         Goals of DL

   Focused on digitization technology, metadata
    schemes, data management techniques, and digital
    preservation.
   Second-generation digital library
       exploring new opportunities and developing new
        competencies.
   Third-generation digital library
       focusing instead on fully integrating digital material into the
        library’s collections through a modular systems
        architecture.
                   Ingredients for DLs


   Hardware
    The minimum machinery to do the job
   Software
    The programs for handling data
   Digital Objects
    Articles, Conference Papers, Theses, …
   Basic Skills
    Things one has to learn
                      Hardware

   A Server
       You’ll need access to a web server
   A good PC
   Scanners
    Flatbed – Auto feed, Back to back
    MF
    Book Scanner
                  Software



   Open Source Software (OSS)
    DSpace, EPrints, Fedora, GSDL, …


   Proprietary software you can’t avoid
    Image Editing and Optical Character Recognition Software
      have to be purchased
              Content is King




The information content is more important than the systems used for its
storage, management and retrieval.

Objects should not be “locked” in specific DLs or archives.
                    Creating DLs …

   Seven steps
       Selecting
       Acquiring
       Digitization
       Creation of Metadata
       Organizing
       Archiving
       Providing Access
Possible Delivery Formats

   Pure image formats: TIFF, JPEG
   Open encoded formats: XML, HTML, ASCII, and
    Unicode
   Hybrid formats: PDF, DjVu – can contain both image and
    text
   Proprietary formats: Microsoft Word, WordPerfect
             Digitization: Issues


   Copyright
   Access copy and archive copy
   File size
   Storage media (CD, hard disk…)
   File format (TIFF, JPEG…)
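
To make the file size issue concrete, the sketch below estimates the uncompressed size of a scanned page from its dimensions, resolution and bit depth. The page size (6 x 9 inches), the 300 dpi resolution and the 300-page book are assumed example values, not figures from the presentation, and the function names are hypothetical. Compressed formats will be smaller, often much smaller, which is one reason an archive copy and an access copy are usually kept separately.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Size in bytes of one uncompressed raster scan of a page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def book_size_mb(pages, page_bytes):
    """Total size of a book in megabytes (1 MB = 10**6 bytes)."""
    return pages * page_bytes / 10**6


# Assumed example values: a 6 x 9 inch page scanned at 300 dpi.
bitonal = uncompressed_scan_bytes(6, 9, 300, 1)   # 1 bit per pixel
colour = uncompressed_scan_bytes(6, 9, 300, 24)   # 24-bit colour

print(bitonal, colour)
print(book_size_mb(300, bitonal), book_size_mb(300, colour))
```

Under these assumptions a single page is about 0.6 MB in bitonal and about 14.6 MB in 24-bit colour before compression, so the choice of bit depth and format dominates storage planning.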
    Challenges in Digitization

   Building digital collections of national importance from
    existing texts, documents, images . . .

   Creating new digital documents & linking them

   Subject portals: Selecting and maintaining open source
    digital resources

   Developing / adapting management tools for digital
    collections

   Providing access to digital collections
    Challenges in Digitization (continued)

   Integrating digital & other library collections

       incl. integration of OPACs, subscribed e-resources and
        subject portals

   Establishing services for digital libraries

       online access & offline support
       education & training of users and librarians


   Addressing social, legal, policy issues


          Challenges in Publishing


   Preservation of layout

   Searchability of content and metadata

   Efficient image compression

   Easy browsing of books

   Accommodating low-bandwidth users

   Multilingual text support

   Multipaging
Digital Library Support in India
 Funding
   Ministry of Communication & Information Technology
     (MIT)
   Ministry of Human Resource Development (MHRD)

   Manuscript Mission of India

   Department of Scientific & Industrial Research (DSIR-TRP)
   All India Council for Technical Education (AICTE)

   University Grants Commission (UGC)
Digital Library Initiatives in India
    Library Consortium in India
    Scholarly Science Journals
    Theses & Dissertations
    Institutional E-Print Archives
    Books (out of copyright)
    Manuscripts
    Newspapers
    Online Courseware
    Open Access at Metadata Level
    Portal and Gateway Services


                  Government of India

Min. of C&IT
    Universal Digital Library

Min. of Culture
    National Manuscript Library

Others
    CSIR E-Journals Consortium
    INDEST-AICTE Consortium
    UGC Infonet Consortium
    FORSA Consortium
    IIM Libraries Consortium
                  Participating centers of DLI

[Map of centres participating in the Digital Library of India. Labels: PTU-1; PTU-2; PTU-3; Rashtrapathi Bhavan; ERNET; CDAC Noida; IIIT-Allahabad; CDAC Kolkata; MIDC; Pune University; IIIT-H; State & City Central Library; University of Hyderabad; Goa University; IISc; TTD Tirupati; Sringeri Mutt; Anna University; IISc, IIAP, PoornaPragya; Kanchi Mutt; ASR Melkote; SASTRA; AKCE]

Mega Scanning Centres at IIITH, IIITA, CDAC-Noida and Kolkata
Digital Library Initiatives in India




         Some Examples
                     Digital Library of India
                     http://www.dli.ernet.in/




April 20, 2009   Workshop on Institutional Repositories   33
                       http://www.ias.ac.in/


                                            http://www.insa.ac.in/




                                            http://medind.nic.in/




Manuscripts
   India has the largest collection of manuscripts in the world
    (approximately 5 million).

   India is the repository of an astounding wealth of ancient knowledge
    belonging to different periods of history, going back thousands of
    years. Most of this knowledge, belonging to different areas of
    intellectual activity such as religion, philosophy, science, arts and
    literature, is preserved in the form of manuscripts. Composed in
    different Indian languages and scripts, they are preserved in materials
    such as birch bark, palm leaf, cloth, wood, stone and paper.

   The National Manuscript Mission was launched as a five-year programme in
    Feb. 2003 by the Ministry of Human Resource Development, Govt. of
    India, to gather all the manuscripts and conserve them.
http://namami.nic.in/
          Archives of Indian Labour
      V.V. Giri National Labour Institute

Heritage of Indian Working Class
  Commissions on Labour
   Oral History Collections
   Trade Union Collections
   Regional Collections
   Strike Collections
 Powered by Greenstone Digital Library
http://www.indialabourarchives.org/
        Digital Libraries Benefits : Individual

  Gain access to the holdings of libraries worldwide through
   automated catalogs. Locate both physical and digitized
   versions of scholarly articles and books.
 Optimize searches, simultaneously search the Internet,
   commercial databases, and library collections.
 Save search results and conduct additional processing to
   narrow or qualify results.
 From search results, click through to access the digitized
   content or locate additional items of interest.
All of these capabilities are available from the desktop or
   other Web-enabled device such as a personal digital
   assistant or cellular telephone.
    Conclusion
   Digital Libraries are redefining the role of libraries in
    society & the role of librarians & information specialists

   National level mechanism is essential to promote and
    coordinate open access and public domain digital library
    systems

       Improve awareness of open access
       Regular training – tools, processes, standards
       Support setting up of working models, services
       National Resource Centre for open access publishing

   International agencies like UNESCO, ICSU, ICSTI,
    CODATA need to actively promote and support
    developing country initiatives
Thank You

				