Active Routers for Selective Discard of Streamed MPEG Packets

									           Million Books to the Web
             An Example of Indo-US Collaboration
              Lessons Learnt & The Road Ahead

                        Prof N. Balakrishnan



Supercomputer Education and Research Centre   School of Computer Science
         Indian Institute of Science          Carnegie Mellon University
              Bangalore India                      Pittsburgh USA




         Indo-US Workshop on Open Digital Libraries & Interoperability
                             Washington, DC
                              June 23, 2003
Lessons from the past
 • fires of Alexandria
    – irrevocably severed our access to any of the works of the ancients.
 • introduction of printing technology
    – much Indian and Chinese knowledge disseminated by word of mouth
      and on palm leaves virtually disappeared or became inaccessible
 • New cultural revolutions
    – edifices built by destroying the past irrevocably
    – later revolutions seek solace in attempting to preserve what was
      destroyed
    – we need to preserve our heritage independent of the political and
      social ups and downs


A single wanton act of destruction
                     can destroy an entire line of heritage
Lessons from Reality
 In a thousand years:
    only a few of the paper documents we have today will survive
    the ravages of deterioration, loss, and outright destruction.
 Existing paper archives and many other works still in existence
    today are rare
    - only accessible to a small population of scholars and
    collectors at specific geographic locations
 Contrary to popular belief, libraries, museums, and
    publishers do not routinely maintain broadly comprehensive
    archives of the considered works of man
 No one can afford to do this, unless the archive is digital
The Approach
 • Technology Driven Vision
 • Decide on the stakeholders
   – Never make it exclusive
 • Pilot Projects to perfect technology
 • Bring in advanced management concepts
   – like People Maturity Models
   – Quality assurance
   – automate wherever possible




                                     Continued…
The Approach
 • Lessons from the past
   – Too many Digital Library Projects
   – with half-life of less than 2 years from the date of
     “Launch” or a long incubation time
   – Follow Nike – JUST DO IT
 • Digital Library must have two ingredients
   – A knowledge Amplifier
   – Free-access, giving avenues for every one to make
     economic benefit
      • still contribute to multiplication of knowledge by circulation
 • In India, it should be a test bed for our
   Language Technology Research
   – a show case for our heritage
Elements of Technology
  • Microprocessors
  • Memory
  • Connectivity
  • Software
   All these technologies are growing
   exponentially
Communication Revolution
  If you are amazed at the drop in cost of computing,
  wait till you see what is going to happen to bandwidth.

  Network technology will increase 10-100 times faster
  than processor technology
                             -Andy Grove, Titan of Intel

     Bandwidth will double every year
     Network speeds become comparable to
      interconnect speeds
Together, the technology of Computers
and Communications Revolutions aim at


              Death of Time and Distance
             Anytime, Anyplace and Anyone
The World of Computers &
Communication
 Small fish eat the Big Fish
 Microprocessors offer performances comparable to
  supercomputers; Paradigm Shift from Dinosaurs to
  mammals- from performance to functionality
 NETWORK is everywhere
 Web is a preferred medium of communication for
  everyone - including the military & the terrorists
 Companies that make more and more Software Free
  – capitalize more- Open archives
Processor of Tomorrow
 • Carbon Nano Tubes
    – 5 to 10 atoms wide
    – promise to replace silicon soon
 • Flexible Transistors
    – made from plastic, organic materials
 • Silicon will live for 15 years
 • Moore’s law will live longer
 • 1000 times growth in 10 years

The winner will be decided by:
Material Convergence + Human Like interactions
Processor of Tomorrow
 • A billion Transistors at 10 to 20 GHz Clock
   rates by 2010
 • 128 G Bytes of Main Memory
 • Terabyte of Disk Storage- may be
   Holographic
 • Speech input/ output ASR
 • Multilingual
 • Terabit connectivity at PC
 • The DL plans of today must be sensitive to
   this
The Road Ahead
 [Chart: Knowledge Content (Poor, Medium, Rich, Brilliant) plotted against Evolution]
  • Scientific Calculations
  • Data Analysis
  • Expert Systems
  • Emulating Human Performance: See, Hear, Talk, and “Think”
  • SuperHumans (Bill Joy’s Nightmare)
The future trends:
• Browser will be the only medium of
  communication.
• It will be active- with voice and video, language
  independent.
• Mobility will be the key.
• Small form factor devices such as Palms, PDAs
  and Tablets would be the future.
• We would soon see TVPCT at the cost of a TV
• We will witness major convergence between ICT,
  Nano Technologies and Biological Sciences
Electronic Resources and the
Library of the Future
    E-mags; E-books; E-music;
            E-Movies
Dedicated E-book Readers
 • Dedicated readers – about
   20,000
 • Palm devices – 6,000,000
 • PC’s – hundreds of millions
 • “For people accustomed to
   reading text on a computer
   for hours at a time, e-book
   screen clarity is a non-
   issue.”
 • A low-cost E-Book reader
   design is under way in India
http://www.eink.com/technology/index.htm
   • E Ink is made up of millions of microcapsules
      – each the diameter of a human hair
   • Each microcapsule contains
      – positively charged white particles &
      – negatively charged black particles
         • that float in a clear fluid
   • A film of transistors supplies the voltage to the
     capsules
   • A negative charge makes the white particles
     move to the top of the microcapsule
      – an opposite electric field pulls the black particles to
        the bottom of the microcapsules, mimicking the
        effect of print.
   • Electronic ink is a real power miser
E-ink/e-paper (Lucent)
The technology has been identified and
   development is well under way
By the year 2003, we envision electronic
   books
• that can display volumes of information
   as easily as flipping a page,
• permanent newspapers that update
   themselves daily via wireless broadcast
• Just as today's books give people easy
   access to everyday information,
   tomorrow's books will provide the
   same easy access to the dynamic data of
   the information age




   The world of publishing will never be the same
Indian Institute of Science’s
Simputer
 •   A hand held Linux Box at around US$ 200
 •   Has the state of the art browser
 •   Color screen
 •   very good speech synthesizer
     – In English and many Indian Languages
 • A very powerful tool for access with wireless
 • Soon to be modified as an E-book
 www.simputer.org
 www.picopeta.com
 www.ncoretech.com
The Challenges in Computing
Tomorrow’s computing needs
  are not in mflops and Gflops
The computer to process
  Information, recognition and
  DM like a Human
Small inexpensive
   Robots, swarms will
   be a reality
Ray Kurzweil:
The Age of Spiritual Machines
“A $1,000 PC (in 1999-dollars)…
  – 2009 = trillion calculations/second
  – 2019 = 20 million billion
    calculations/second
     (the human brain)
  – 2029 = 2 × 10^19 calculations/second
                  (1,000 human brains)
Ray Kurzweil:
The Age of Spiritual Machines
 • 2009: “Computer displays have all the
   display qualities of paper- high
   resolution, high contrast, large viewing
   angle, and no flicker. Books, magazines,
   and newspapers are now routinely read
   on displays that are the size of small
   books.”
 • 2009: “At least half of all (business)
   transactions are conducted online.”
Ray Kurzweil:
The Age of Spiritual Machines
 • 2009: “There is effective convergence of
   all media, which exist as digital objects
   (that is, files) distributed by the ever-
   present high-bandwidth, wireless
   information web. Users can instantly
   download books, magazines,
   newspapers, television, radio, movies,
   and other forms of software to their
   highly portable personal communication
   devices.”
2009
• A $1,000 PC delivers Terahertz speeds
• PCs with high resolution visual displays come in a
  range of sizes
   – from those small enough to be embedded in clothing and
     jewelry
   – to the size of a thin book
• Cables are disappearing
   – Communication between components uses wireless
     technology, as does access to the Web
• The majority of text is created using continuous
  speech recognition
   – Also ubiquitous are language user interfaces.
• Most routine business transactions (purchases, travel,
  etc.) take place between a human and a virtual
  personality
   – Often the virtual personality includes an animated visual
     presence that looks like a human face
Ray Kurzweil:
The Age of Spiritual Machines
• 2019: “Reading books, magazines, newspapers,
  and other Web documents; listening to music;
  watching three-dimensional moving images
  (for example, television, movies); engaging in
  three-dimensional visual phone calls; entering
  virtual environments (by yourself, or with
  others who may be geographically remote);
  and various combinations of these activities
  are all done through the ever-present
  communications Web and do not require any
  equipment, devices, or objects that are not
  worn or implanted.”
Ray Kurzweil:
The Age of Spiritual Machines
2029: “The ever learning Society”
• Learning now constitutes the primary focus of the
  human species.
• Human learning is accomplished using virtual
  teachers (and virtual libraries?).
• Learning is enhanced by widely available neural
  implants, which improve memory and perception but
  cannot yet download knowledge directly.
• Automated agents are learning, on their own without
  human assistance. Machines can now create
  significant new knowledge with little or no human
  intervention; unlike humans, machines easily share
  knowledge structures with one another.
And Then There Was Music

 •   RealJukeBox
 •   Win Amp
 •   MP3
 •   Napster
The Growth rates
 • The processor performance doubles every 18
   Months
 • The Network bandwidth doubles every year
 • The storage capacity doubles every nine
   months
 • Soon you will have processor bottleneck
 • 1000 times growth in storage in 10 years – I
   already have 250 GB on a single disk-
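The doubling periods quoted above can be turned into ten-year growth factors. The following is a minimal sketch, assuming steady exponential growth; the function name `growth_factor` is illustrative and not part of the talk. Under these assumptions, a 1000-fold gain in ten years corresponds to doubling once a year, while a nine-month doubling period would give roughly ten times more.

```python
def growth_factor(doubling_months: float, years: float = 10.0) -> float:
    """Growth multiple after `years` if capacity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)

# Doubling periods as quoted on the slide (in months).
rates = {"processor": 18.0, "network": 12.0, "storage": 9.0}
factors = {name: growth_factor(months) for name, months in rates.items()}

for name, factor in factors.items():
    print(f"{name}: about {factor:,.0f}x in 10 years")
```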
Recognition versus Recall
 • Recognition is like seeing your friend’s face
   in a sea of faces
   – even if he has changed since you last saw him
   – storage intensive and fast
 • Recall is like figuring out how to repair your
   car’s carburetor using a manual and you
   have never done that before- applying
   knowledge to a new situation- processor
   intensive and less storage
 • Brain works on recognition
 • Present day computers prefer recall –
   remember the Y2K
 • Future computers would work like the
   brain- recognition
Recognition versus Recall- what
it does to our DL
 • We will move away from quantitative search
   (key word match) to “aboutness” and content
   based retrieval
 • In Future the documents will be read more by
   computers than by humans – will it change
   the way we write ? Would we think in html
   or in xml ?
 • From mere Text data to 3d Objects, voice and
   video
 • Multilingual
 • Every conceivable form of knowledge
   expression
Technology Driven vision for The
Digital Library
 • We can store everything
   – all the knowledge of the human race
   – in all forms
   – that is the Universal Digital Library
 • Cost of Selection is stationary but
   storage cost is plummeting

It is not about contents alone-
           It is about networking of people
Education
                               Universities
                               Colleges
                               Schools




                               Real-time
                                Engineering
                                Science
                                Business




   3 Ls of Learning
1. Face-to-Face Lectures
2. Virtual Labs
3. Universal Digital Library
Universal Library Vision
 All recorded information online
 • instantly available
   –   To Anyone
   –   Anywhere in the world
   –   In any language
   –   searchable, browsable, navigable by humans
       and machines
Digital Library Contents
• Books
• Periodicals (journals, newspapers)
• Art, photographs
• Databases, software
• Movies, video
• Music, opera, dance
   Suppose all of this were on the Web
Digital Library of the future
 • Digital library
 • Digital museum
 • Digital tour guide
 • Research assistant
 • Knowledge amplifier
Can we store all the human
knowledge in a Digital form
There are about 100 Million books written by the human
   race
Multiply by 10 for all other form of knowledge
1 book = 500 pp. = 1 MB uncompressed
    – 10^9 books = 10^15 bytes = 1 petabyte
140 million computers on the Internet
   – At 20 GB free space each  >2.8 exabytes now
1 GB of disk costs ~$1
   – 1 petabyte < $1 million
   – Our Peta Byte server Initiative
   – Storage is not the limitation but creation and
     coordination are
   – Avoiding Duplication and connectivity are
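The storage estimate on this slide can be checked with a few lines of arithmetic. This is a rough sketch using the slide's own figures and decimal units (1 GB = 10^9 bytes), which is an assumption; the variable names are illustrative.

```python
MB, GB, PB, EB = 10**6, 10**9, 10**15, 10**18  # decimal units (assumed)

books = 100_000_000          # books written by the human race, per the slide
other_forms_multiplier = 10  # "multiply by 10" for all other forms of knowledge
bytes_per_book = 1 * MB      # 500 pages of uncompressed text

total_bytes = books * other_forms_multiplier * bytes_per_book

pcs_online = 140_000_000
free_bytes = pcs_online * 20 * GB   # 20 GB free on each machine

cost_dollars = total_bytes // GB    # at roughly $1 per GB

print(total_bytes / PB, "PB needed")
print(free_bytes / EB, "EB free on networked PCs")
print(cost_dollars, "dollars of disk at $1/GB")
```

On these figures the free space already on networked PCs exceeds the requirement by a factor of a few thousand, which supports the slide's point that storage is not the limiting factor.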
Universal Digital Library
  •   More than 120 million PCs on the net
  •   Each having at least 20 GB of free space
  •   Peer to peer Communication
  •   Can we store all the Human Knowledge
      in the computers




                         This is today
                 The time consuming process is taking the
                printed books to the web- The technology is
                            not an impediment
Technology Driven Vision for the
Universal Digital Library
• A vision to store everything that the
  human race ever produced
• A mission to digitize 1 Million Books and
  make them freely available
The Strategy for Scanning of
books
• A planetary Scanner like the Minolta PS 7000
• Takes about two hours to scan a 500 page book,
  crop, OCR and convert it to TIFF, HTML and
  XML files
• About 10,000 pages to the web in a day
• Storage per book is around 60 MB
• 100 terabytes is not an issue
• Our partner, the Internet Archive, has 370 TB,
  adding 30 TB a day
• Distributed data bases
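The figures on this slide give a quick feel for the scale of the million-book goal. A minimal sketch follows; it treats 10,000 pages a day as one combined daily output, which is an assumption because the slide does not say whether the rate is per scanner or per centre. The names are illustrative.

```python
PAGES_PER_BOOK = 500     # average book, per the slide
MB_PER_BOOK = 60         # storage per digitized book, per the slide
TARGET_BOOKS = 1_000_000

def days_to_scan(books: int, pages_per_day: int) -> float:
    """Days needed to scan `books` at a steady combined rate of pages per day."""
    return books * PAGES_PER_BOOK / pages_per_day

storage_tb = TARGET_BOOKS * MB_PER_BOOK / 1_000_000   # MB -> TB, decimal units
days_at_10k = days_to_scan(TARGET_BOOKS, 10_000)

print(storage_tb, "TB for a million books")
print(days_at_10k, "days at 10,000 pages a day")
```

Storage stays well under the 100 TB mentioned above, while the scanning time at a single 10,000-page-a-day pipeline runs to decades, so the rate has to be multiplied across many scanners and centres.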
Process Involved
1. Identification of Books
2. Pre-Scanning Process
3. Scanning Process
4. Image Processing
5. Conversion Process
Scanning




            •2 pages at a time
           •Stored in tif format
Post scanning operations
• Skew Correction
• Document Registration
• Dot Shading and Speck Removal
• Image centering
• Image Cropping
• Smoothing and Completion
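Two of the operations listed above, speck removal and image cropping, can be illustrated on a tiny binary page image. This is only a sketch of the general idea using nested lists; it is not the project's cropping software, and the function names and the "isolated pixel" rule are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # 1 = ink, 0 = background

def remove_specks(img: Image) -> Image:
    """Return a copy with ink pixels cleared when none of their 8 neighbours is ink."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out

def crop_to_content(img: Image) -> Image:
    """Crop to the bounding box of the ink; a blank image is returned unchanged."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not rows:
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

page = [
    [1, 0, 0, 0, 0, 0],   # isolated speck in the corner
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],   # the actual content
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = crop_to_content(remove_specks(page))
print(cleaned)
```

Removing specks before cropping matters here: with the corner speck left in place, the bounding box would stretch to include it.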
Image comparison
Original Image
Processed Image
SW 1
OCR CONVERSION
Performance evaluation for various fonts in
Kannada language OCR




Series1: Average performance efficiency before using the cropping software.
Series2: Average performance efficiency after using cropping software.
The Digitized book
• Average book size ~ 500 Pages
• Size of Page as Image ~ 50-150 KB
• Size of Page as text file
               (rtf /htm) ~ 8 – 15 KB
• Average size of Digitized book ~ 60MB
    Brightness – Dark(1 in scale) and contrast – 9(in scale)




Original image




Cropped image
Million Books to the web- Stake
holders as Partners
 • Academia- CS, IS and users
 • Researchers and Language
  Technologists
 • Cultural and Religious Organizations
 • Public Libraries
 • Government Agencies
 • None too exclusive
Background and Status
 • Collaborative Project between India and US
 • Lead roles by CMU and IISc
 • Initiated by CMU sending scanners free of cost to
   India. NSF supported
 • Initiated by the Office of the Principal Scientific
   Advisor to GOI by a Seed funding to IISc
 • Fuelled by MCIT’s whole hearted support
 • More than 16 centres in academic, religious and
   government institutions spread across the country
 • 69 scanners in place
 • China, Egypt (Alexandria Library), Sri Lanka,
   Australia joining in
 • There is light on the other side of the tunnel
  Hubs of DL Activities in India
Anna University, Chennai, Tamil Nadu
Arulmigu Kalasalingam College of Engineering, Srivilliputur, Madurai,
   Tamil Nadu
Goa University, Goa
Indian Institute of Information Technology, Allahabad, Uttar Pradesh
International Institute of Information Technology, Hyderabad, Andhra
   Pradesh
City and State Central Library, Andhra Pradesh
Shanmugha Arts, Science, Technology & Research Academy,
   Thanjavur, Tamil Nadu
Sringeri Mutt, Sringeri, Karnataka
Tirumala Tirupathi Devasthanams, Tirupathi, Andhra Pradesh
Maharashtra Industrial Development Corporation, Maharashtra
University of Pune, Pune
Kanchi University, Kanchi, Tamil Nadu
Indian Institute of Astrophysics, Karnataka
Scanner Operation at Hubs
 [Bar chart: number of scanners at each hub]
   AKCE              2
   AU                1
   SASTRA            2
   Kanchi            1
   IIAP              1
   Sringeri          1
   S&C Library      10
   IIIT Allahabad    5
   Rashtrapathi      3
   IISc              4
   Goa University    2
   Pune              1
   Punjab Univ       3
   MIDC              5
   IIIT Hyd         40
Progress of Various Centres in Scanning
 [Bar chart: No. of Books by Centre]
   Centre    No. of Books
   IISc      1704
   AKCE      1031
   SASTRA    1097
   TTD       2000
   MIDC       504
   PUNE       465
   AU         273
   Kanchi     158
   CCL       6276
   SCL       3042
Number of Pages Scanned
 [Bar chart: No. of Pages by Centre]
   Centre    No. of Pages
   IISc       837708
   AKCE       158933
   SASTRA     451452
   TTD        500000
   MIDC       134100
   PUNE        97334
   AU         152502
   Kanchi      39395
   CCL       1319001
   SCL       1080759
Category of Books
 [Bar chart: number of books by language]
   English    2962
   Telugu     5596
   Tamil       836
   Sanskrit    430
   Kannada     176
   Others      168
   Urdu        384
Cumulative Status
   Books    16550
   Pages    4771184
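The cumulative figures can be cross-checked against the per-centre numbers in the two progress charts. The sketch below assumes the values pair with the centres in the order they appear on the chart axes; that pairing is a reading of the charts, not something the slides state, but the totals do not depend on it.

```python
# (books, pages) per centre, paired in chart order (pairing is an assumption)
progress = {
    "IISc":   (1704, 837708),
    "AKCE":   (1031, 158933),
    "SASTRA": (1097, 451452),
    "TTD":    (2000, 500000),
    "MIDC":   (504, 134100),
    "PUNE":   (465, 97334),
    "AU":     (273, 152502),
    "Kanchi": (158, 39395),
    "CCL":    (6276, 1319001),
    "SCL":    (3042, 1080759),
}

total_books = sum(books for books, _ in progress.values())
total_pages = sum(pages for _, pages in progress.values())
avg_pages_per_book = total_pages / total_books

print(total_books, "books;", total_pages, "pages")
print(f"average of about {avg_pages_per_book:.0f} pages per book")
```

The sums reproduce the cumulative totals shown above, and the resulting average is noticeably below the 500-page book assumed elsewhere in the talk.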
More Centres and Initiatives-
Already 61 scanners in operation
+ 39 in the pipeline

• Rashtrapathi Bhavan
• Punjab Technical University
• IIIT Hyderabad and University of Hyderabad
MCIT’s Initiatives
 • Mobile Van with VSAT for the Book Mobile
 • ERNET providing connectivity to all centres
 • Many Centres supported with funds for
   computers and for scanning operations
 • Total spending from Government support and
   from Scanning Centre’s resources is ten times
   more than the Scanning equipment cost and
   effectively 100 times more
 • Support from all quarters of the government,
   religious leaders, academia and private agencies
 • Universal Digital Library of India to be launched
Some Observations
          and the Road ahead
• More than 5 million pages have been scanned
• The highest average rate of sustained scanning
  was about 4,000 pages per day at Hyderabad
  during February.
• Our goal is to establish best practices to reach
  6000 pages a day
• 3 years – 1 M Books
• By 2020 – 20 Million Books, 2 Million Songs,
  200,000 Movies
• The most enviable content creation
Road Ahead
 • Establishing the Digital Library of India
   on the same lines as the E-Governance
   Initiative
 • Under the MCIT
 • Headquartered in AP
 • A think tank for content selection,
   delivery, technology and policy
   directions for the country
 • Creation of special funds for 4C
Criteria for Selecting Mega
Centres- 5 of them planned
• Geographical Distribution
• Availability of contents of interest to larger
  user base
• Local enthusiasm to support and sustain this
  activity
• Budget of US$ 200,000 Initially and around 0.5
  cent per page of output
• One single scanner can produce 2 Million
  pages a year-
• We will have 300 scanners – a Million books a
  year
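The capacity claim in the last two bullets can be worked through directly. This is a rough sketch; the 500-page average book comes from elsewhere in the talk, and reading "0.5 cent per page" as a running cost per scanned page is an assumption.

```python
SCANNERS = 300
PAGES_PER_SCANNER_PER_YEAR = 2_000_000
PAGES_PER_BOOK = 500   # average book size used elsewhere in the talk

pages_per_year = SCANNERS * PAGES_PER_SCANNER_PER_YEAR
books_per_year = pages_per_year // PAGES_PER_BOOK

# 0.5 cent per page = 5 dollars per 1000 pages (integer arithmetic avoids rounding)
running_cost_usd = pages_per_year * 5 // 1000

print(pages_per_year, "pages a year")
print(books_per_year, "books a year")
print(running_cost_usd, "US dollars a year at 0.5 cent per page")
```

At these rates 300 scanners come out slightly above a million books a year, leaving some margin for downtime and shorter working days.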
Road Ahead
 • Mega Content Creation Centres
 • New Delhi, Varanasi, Allahabad, Hyderabad,
   Far east (Tawang or Guwahati), Kolkata and
   Chennai
 • Each Centre having around 40 scanners and 5
   mobile scanners
 • Content Creation Centres with up to 5 scanners
   in Gujarat, Rajasthan so as to cover the entire
   country
 • Spearheading Language Technology
   Initiatives
 • Adding voice and video of our heritage
Universal Digital Library
 • Goal — To have all public knowledge online,
   available for free to all, everywhere
 • An achievable goal
   – There are only some 100,000,000 books in the world
   – A few billion dollars could bring these online
 • Limitations
   – Copyright and licensing issues
   – Different language books and character recognition
     technologies
      • We must ensure that English is not necessarily the de facto
        language
 • Universal Library
TECHNOLOGICAL CHALLENGES
• Input (scanning, digitizing, OCR)
• Data representation
  – text, notations, images, web pages
• Navigation and Search
• Multilingual Issues
• Output (voice, pictures, virtual reality)
• Synthetic Documents
SEARCH ENGINE of UDL
• Very powerful light weight and scalable
  CMU search engine
• Greenstone
• Both are working and are being evaluated
  for the choice
• Both have been modified for use as Indian
  Language search engines- language
  independent search
• Future- Semantic web and content based
  retrieval – Speech input and speech output
  COMPARATIVE ANALYSIS – GREENSTONE Vs UDL SEARCH ENGINES

  Search Engine   Time Taken                            Boolean                  Proximity          Case                         Stemming
  Greenstone      Not depending on the number of hits   OR & NOT; default: AND   Phrase searching   User can select the option   Stemming allowed
  UDL             Highly depending on the number of hits   OR; default: AND      No                 No case sensitivity          Not available
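The Boolean and case columns in the comparison above can be illustrated with a toy inverted index. This sketch is not the Greenstone or UDL implementation and uses none of their APIs; the function names and sample titles are hypothetical. It shows a default-AND query, an OR query, a NOT filter, and case-insensitive matching.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index: Dict[str, Set[str]], terms: Iterable[str],
           mode: str = "AND", exclude: Iterable[str] = ()) -> Set[str]:
    """Combine postings with AND (the default) or OR, then drop NOT terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    hits = set.intersection(*postings) if mode == "AND" else set.union(*postings)
    for t in exclude:
        hits = hits - index.get(t.lower(), set())
    return hits

# Hypothetical titles, for illustration only
titles = {"b1": "Tamil grammar", "b2": "Kannada grammar", "b3": "Tamil poetry"}
index = build_index(titles)

and_hits = search(index, ["tamil", "GRAMMAR"])             # default AND, any case
or_hits = search(index, ["tamil", "grammar"], mode="OR")
not_hits = search(index, ["grammar"], exclude=["Kannada"])
print(and_hits, or_hits, not_hits)
```

Stemming and proximity (phrase) search, the other two columns, would need word normalisation and position information in the index, which this sketch leaves out.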
Choice of Collection
 • Use books from libraries that are beyond
   copyright
 • Administrative metadata from OCLC, ISBN,
   and other sources
 • Dublin Core for Indian Books
 • Copyright metadata – aggressive attempts
   to obtain copyright clearance; free copyright from
   many agencies including GoI
 • Source Library Metadata
 • Converge towards focussed collection
Funding – Road Ahead
 • Funding effort must be an organized activity
 • Commercial funding unlikely for “public
   good” activity
   – Must go to governments, NGOs
 • World Bank
 • Qatar (if CMU deal succeeds)
 • Benefits of UDL:
   –   Digital Opportunity
   –   Use in distance education
   –   International involvement – cultural diversity
   –   Technology dissemination
   –   Low cost v. conventional libraries
 • Funding is tied to Outreach (next slide)
Outreach
• The UDL message must be disseminated
• Present at World Summit (WSIS) in Geneva
  (12/03)
• Pre-WSIS meeting at CERN (12/03)
• Establish liaison with UN Decade of
  Literacy (2003-2013)
• Points:
  – Terabyte servers
  – “Free to read” policy
  – Universal Dictionary (applicability to other
    domains)
Access by Public
 • All content free to read, print one page at a
   time
 • Restrictions imposed by donors will be
   respected
 • Categories of use will be recognized, e.g.
   cannot print entire document
 • Buttons, links to fulfillment houses and
   publishers are allowed- to take in “born
   Digital” copyrighted material
Partner Relations- Future
• All material scanned or input as part of the
  UDL will be shared by all partners
• Preference for national umbrella
  organizations to simplify international
  partner relations
• Relationships between partners and their
  national DLs encouraged
• Online communication and collaboration
  tools needed to facilitate partner questions
  and interchanges
• Written partnership agreement will be made
Standards
• Published standards within the UDL
• Quality control and testing standard
• Funding to be sought to support standards
  development
• Logo to be developed (graphic device
  without words). Must appear on all sites, all
  pages
• Logo should have a hot link to a gateway
  site that links all UDL sites
• Local variability in look and feel of sites is
  permitted so long as the logo is displayed
Scanning/OCR Policy
 • We scan what gives greatest impetus to
   continued funding
 • Language: majority of content in English;
   otherwise no restriction
 • Scans will be previewed for minimum
   quality; OCR will not be corrected unless
   local site desires
Metadata

 • All entries MUST have metadata according
   to MARC or Dublin Core
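A metadata requirement like this one lends itself to an automated check at intake. The sketch below validates a record against the fifteen element names of the Dublin Core Metadata Element Set; the choice of required elements, the sample record, and the function name are all assumptions for illustration, not project policy.

```python
# The fifteen elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def check_record(record: dict, required=("title", "language", "rights")) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"unknown element: {key}" for key in record if key not in DC_ELEMENTS]
    problems += [f"missing element: {key}" for key in required if not record.get(key)]
    return problems

# Hypothetical record for a scanned book
record = {
    "title": "Example Title",
    "creator": "Unknown",
    "language": "kan",
    "scanner": "PS 7000",   # not a Dublin Core element
}
problems = check_record(record)
print(problems)
```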
Copyright
 • Public domain materials: no restrictions,
   tools for printing entire document provided
 • Works of uncertain copyright status:
   – Good faith effort to determine status, locate owner
   – Scan and index work
   – After a waiting period (at least one month), make
     work viewable
 • Archival material (old but unique)
   – Allow resolution restriction to avoid devaluation
     of original
 • Out-of-print in-copyright (OPIC)
   – Seek blanket permissions from publishers
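The copyright rules above amount to a small decision procedure. The following sketch encodes them as a function; the status labels, the returned strings, and the 30-day reading of "at least one month" are assumptions made for illustration.

```python
from datetime import date, timedelta

WAITING_PERIOD = timedelta(days=30)  # "at least one month"; 30 days is assumed

def access_policy(status: str, scanned_on: date, today: date) -> str:
    """Map a work's copyright status to the access the slide describes."""
    if status == "public_domain":
        return "view and print entire document"
    if status == "uncertain":
        # Scanned and indexed at once, viewable only after the waiting period
        if today - scanned_on >= WAITING_PERIOD:
            return "viewable"
        return "indexed only"
    if status == "archival":
        return "viewable at restricted resolution"
    if status == "opic":
        return "pending publisher permission"
    raise ValueError(f"unknown status: {status}")

early = access_policy("uncertain", date(2003, 6, 1), date(2003, 6, 15))
later = access_policy("uncertain", date(2003, 6, 1), date(2003, 7, 15))
print(early, "->", later)
```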
Possible Intake Model
   HINDI     LOCAL           TAMIL     LOCAL         GUJARATI      LOCAL
  INTAKE    MATERIALS       INTAKE    MATERIALS       INTAKE      MATERIALS


             SCANNING           SCANNING             SCANNING
              CENTER             CENTER               CENTER



  ENGLISH      SCANNING            INDIA          SCANNING          ART
   INTAKE       CENTER           CENTRAL           CENTER         INTAKE
                                MIRROR SITE

 INDIA


OUTSIDE
 INDIA        AUSTRALIAN            CMU               CHINESE
              MIRROR SITE        UL SERVER          MIRROR SITE
The Digital Library a Test Bed for
language research
 • Rich data in many languages from the Million
   Books to the web Project - at least 10,000 books
   in any language
 • Translations in many languages- Gita, NBT,
   NCERT etc- an excellent tool for language
   translation-
 • Training data for the OCR
 • The case insensitive ITRANS standard
The Digital Library a Test Bed for
language research
 • Rich data makes the creation of OCRs in
   Indian languages easy- In Tamil, Kannada
   and Malayalam – A rapid prototyping
 • Speech synthesis and recognition
 • Indian Language Search Engines
 • Example Based Machine Translation
 • Universal Dictionary
The Universal Dictionary
  Word                English                            POS   Pron   Use   Lang   Language
  danúbia             linen tape                                            HUN    HUNGARIAN
  danum               water                                                 PMP    KAPAMPANGAN
  danun               early                                                 PMP    KAPAMPANGAN
  danup               hunger                                                PMP    KAPAMPANGAN
  danup               hunger, starvation                                    PMP    KAPAMPANGAN
  danupan             hungry, starving                                      PMP    KAPAMPANGAN
  daný                existent                                              SLO    SLOVAK
  daný                existing                                              SLO    SLOVAK
  daný                given                                                 SLO    SLOVAK
  daný číslom         numerical                                             SLO    SLOVAK
  daný na pospas      obnoxious                                             SLO    SLOVAK
  danyag              landscape                          n                  HIL    HILIGAYNON
  daog                overturn                           v                  CEB    CEBUANO
  daog                prevail                            v                  CEB    CEBUANO
  daogdaog            manhandle                          v                  CEB    CEBUANO
  daong               boat with a covered cabin, ark                        TAG    TAGALOG
  daong               bring the ship to shore                               TAG    TAGALOG
  daot                harm                               v                  CEB    CEBUANO
  daot                mar                                v                  CEB    CEBUANO
  daotan              bad                                adj                CEB    CEBUANO
  daotan'g buut       dislike                            n                  CEB    CEBUANO
  daotan'g hitabo     mishap                             n                  CEB    CEBUANO
  daotan'g tinguha    malice                             n                  CEB    CEBUANO
  daotan'g tuyo       malice                             n                  CEB    CEBUANO
  dapa                granary                            n                  CEB    CEBUANO
  dapa                lie flat on stomach or face                           PMP    KAPAMPANGAN
  dapa                down on stomach or face, lie flat                     TAG    TAGALOG
  dapače              on the contrary                    adv                BOS    BOSNIAN
  dapadnúť (na nohy)  to land                                               SLO    SLOVAK
  d'apaiser           to appease                         v                  FRE    FRENCH
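A dictionary like the one tabulated above can be pictured as a single index keyed by headword, with each hit carrying its gloss, part of speech and language code. The sketch below is only an illustration under that assumption: the names `ENTRIES`, `build_index` and `lookup` are hypothetical, and the talk does not describe how the Universal Dictionary is actually stored. The sample rows are copied from the table.

```python
from collections import defaultdict

# (headword, English gloss, part of speech or None, language code)
ENTRIES = [
    ("danum", "water", None, "PMP"),
    ("daný", "given", None, "SLO"),
    ("daog", "overturn", "v", "CEB"),
    ("daog", "prevail", "v", "CEB"),
    ("daong", "boat with a covered cabin, ark", None, "TAG"),
    ("daot", "harm", "v", "CEB"),
]


def build_index(entries):
    """Group entries by case-folded headword so one word can map to many senses."""
    index = defaultdict(list)
    for word, gloss, pos, lang in entries:
        index[word.casefold()].append((gloss, pos, lang))
    return index


def lookup(index, word, lang=None):
    """Return all senses of word, optionally restricted to one language code."""
    hits = index.get(word.casefold(), [])
    return [hit for hit in hits if lang is None or hit[2] == lang]


if __name__ == "__main__":
    index = build_index(ENTRIES)
    for gloss, pos, lang in lookup(index, "daog"):
        print(gloss, pos, lang)
```

Keeping the language code on every sense lets the same headword appear in several languages without separate per-language tables.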
  Aboutness Hierarchy- Dr Shamos
                                                 Universe
SUBJECT SEARCHING
   OCCURS HERE
                               Collection


        Newspaper     Book                                  3D Artifact


                     Chapter
          Article
                     Section

                    Paragraph

                    Sentence         Photograph

KEYWORD SEARCHING     Word
   OCCURS HERE                              Object
                      Glyph
Legal and Business Challenges
• Use of copyrighted material
• Economics (Who pays? Who gets?)
• Privacy
• Reliability of information
• Change in the nature of teaching
• Change in the nature of Information
  creation and use
Philosophy of Copyright Laws
 • Protect the inventor so that private
   investments in R & D would flow
 • Disseminate the information so that
   society grows
 • Protect fair use
 • Ensure you get what you paid for
What can be copyrighted?
 • Must be tangible, e.g. a lecture can’t be
   copyrighted, a transcript of it can

 • Work must be original

 • Work must be creative - even minimal
   efforts usually count as creative
Fair use doctrine
 • Authorizes any person to make fair use of a
   published or unpublished copyrighted work
   (including the making of unauthorized
   copies) in these contexts:
    – In connection with criticism of or comment
      on the work
    – In the course of news reporting
    – For teaching purposes or
    – As part of scholarship or research activity
Four basic Factors:
 1.   The purpose and character of the use,
      including whether such use is of a
      commercial nature or is for nonprofit
      educational purposes
 2.   The nature of the copyrighted work
 3.   The amount and substantiality of the portion
      used in relation to the copyrighted work as a
      whole; and
 4.   The effect of the use upon the potential
      market for or value of the copyrighted work
www.library.org principles
1. Scholarly and government information
   and knowledge is a public good
  • that should be available, maintaining the
    balance of the rights of the individual creator
    vs. the needs of the public
2. The Library is the intellectual crossroads
   of the community.
3. Librarians will conceptualize and ensure
  • implementation of innovative new systems
     • for the creation and dissemination of information
       for succeeding generations.
“This rule provides that the first sale of a
  copy of a work to a member of the public
  ‘exhausts’ the rights holder’s ability to
  control further distribution of that copy. A
  library is thus free to lend, or even rent or
  sell, its copies of books to patrons”

How does this work in the Digital
 World?
Music, Movie and Entertainment
Industry
 •   A much larger part of most economies
 •   Large production costs
 •   Need to protect business interests
 •   Need for technology to protect
 •   NAPSTER – peer-to-peer communication
 •   DeCSS
 •   NAPSTER for video?
 •   Consumer is different from the creator
New paradigms in the Digital
Library
 • Should the laws used for protecting
   commercially attractive enterprises such
   as patents, music and entertainment be
   applied to DL?
 • The dissemination of information
   creates multiplication, unlike in music
   etc.
 • Shorter life cycles for the information
Copyright: Conflicting
requirements
 1.   Need to protect the financial interests
      of creators in order to encourage
      private investments in the economy
 2.   Need to create a framework for every
      human being to create

 • The 2nd principle should dominate in DL
 • The 1st principle should dominate in the
   others
The Concept of FourC
 • The scientific community is the only
   one that is both creator and consumer of
   information
 • It pays for both
 • The software industry has shown the way
   with freeware
 • Can we do it in scholarly
   communication, textbooks etc.?
The Concept of FourC
 • In the 20th century, in the interest of public
   good, governments created BBC, PBS,
   AIR and also the public library system,
   which provided compensation for artists and
   writers while providing free access to the public
 • Total global expenditure on public
   broadcasting and public libraries exceeds
   $100 billion
 • Look at our kings, who supported all the
   poets and scholars
 • We need to find the 21st-century equivalent
   of BBC, AIR and PBS
The Concept of FourC

 • Learn from NAPSTER: will we have a
   video equivalent of NAPSTER?
 • It is impossible to police and protect IP
   rights at gigabit-rate connections
 • Some countries and WIPO, under pressure
   from lobbying groups, form draconian
   copyright laws
 • Remember the fair use doctrine, and
   what the creators want: recognition and
   compensation
The Solution -FourC
 • Consortium for Compensation of Creative
   Contents (FourC)
 • Set aside 25% of the current national
   expenditure on public broadcasting and public
   libraries (PLs)
 • Authors are encouraged to put their work on
   the web after a few years of commercial
   exploitation; many models are possible, e.g. a
   tax exemption in return
 • India is showing the way: IASc and INSA
 • Books out of print
 • Titanic effect
 • Authors can take back the copyright
The Solution -FourC

 • Authors' compensation based on hits
 • Future versions of textbooks may be FAQs
   and XMLised
 • Many economic models
 • Can work for courseware as well
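One of the many possible economic models for hit-based compensation is a pro-rata split, where each author's share of a fund equals that author's share of total hits. The sketch below assumes exactly that rule; the function name `share_by_hits`, the fund size and the hit counts are all hypothetical and are not figures from the talk.

```python
def share_by_hits(fund, hits):
    """Split fund among authors in proportion to their hit counts.

    fund: total amount available for compensation (assumed input).
    hits: mapping of author name -> number of hits on that author's works.
    """
    total = sum(hits.values())
    if total == 0:
        # No hits recorded yet: nobody is paid under this rule.
        return {author: 0.0 for author in hits}
    return {author: fund * count / total for author, count in hits.items()}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    payouts = share_by_hits(1000.0, {"author_a": 300, "author_b": 100})
    for author, amount in payouts.items():
        print(author, amount)
```

Other models (flat fees, tiers, caps per author) would replace only the last line of the function.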
The Solution -FourC
 • The changing trend in publications: we want
   the documents to be readable by machines
   as well as humans
 • Born-digital documents
 • Can we compensate those who create content
   for the web?
 • Can we compensate those who create music
   and movies for the web, in a really small form
   factor for small screens?
Conclusion
• Knowledge multiplies whenever bits are circulated on
  the web
• Technology has a habit of creating a problem (the
  knowledge explosion) and spending the rest of its time
  trying to solve it, here through the Digital Library
• The Universal Digital Library with 20 million books by
  2020, the year by which our President dreams of India
  becoming a developed nation
• A FourC policy and a Digital Library Act are on the anvil
  in India to meet this mission
• If a billion people sneeze together, we can create a
  hurricane
• With the technology of the two nations we will convert
  this hurricane into useful energy and light up the world
  of knowledge
• If you are creating a digital library, it should be for
  access by anyone, anytime and from any place
• If your digital library is for exclusive use, let us talk
  about the weather
• There is nothing called your DL or my DL
   – It is our DL
   – The Universal Digital Library

								