Metadata Architecture for Digital Libraries:
Conceptual Framework for Indian Digital Libraries
Madhusudana Rao CR
C-DAC, Bangalore.

              Metadata for DL
Agenda
• Introduction
• Metadata
• Digital Library Architecture
  – SODA
  – STARTS
• Indian Digital Library
  – Background

  – Proposed Architecture
  – SODA & STARTS
• Conclusion




Exclude
• Search Engines - General
• Digital Library - General




Introduction
• Information Processing & Retrieval
  –   Typical Library Environment
  –   Library Automation
  –   Networking of Libraries
  –   Digital Library
  –   Digital Library initiatives



Introduction
• Digital Library Scene
  – Search Engines
     •   Heterogeneous
     •   Vertical Information Retrieval
     •   Unique User Interface
     •   Search engines are different
     •   Protocols are different
     •   Querying & Ranking
     •   Incompatible across the sources
Introduction
 – Possible solutions
    •   Identifying the User Group
    •   Identifying the Information Sources
    •   Negotiating with different Information Sources
    •   Resource Description Format
    •   Choose the best Information Sources to evaluate the Query
    •   Evaluate the Query at these sources
    •   Merge the Query Results from these sources


New Protocol
•   User
•   User Query
•   Information Source
•   Networked Environment
•   RDF Metadata
•   User Interface
•   Search & Retrieval
Issues..
• Metadata
• Network Protocols
• Possible Solutions for typical environment




Metadata…definition

Structured data about data...


Metadata…definition
• Metadata is data that helps to design, create,
  describe, preserve, and use information systems
  and resources.
• Metadata can play a role in the development of
  effective, authoritative, interoperable,
  scalable, and preservable information and
  record-keeping systems.

Metadata…means
• Information Resource
• Library Catalogue
  – Indexes, Abstracts, Catalog Records, etc.
    (MARC, AACR, LCSH, etc.)
• Human Generated Textual description
• Machine generated data


Metadata….features
• Content
  – Intrinsic
     • What it contains?
     • What is it about?
• Context
  – Extrinsic
     • Who, What, Why, Where, How etc.
• Structure
  – Formal Set
Metadata…Attributes
• Intrinsic
  – Subject, Title, Author, Publisher, Publication
    place, Other agent, Date, Object type, Form,
    Identifier, Relation, Source, Language,
    Coverage, Abstract, Version, Notes, Signature,
    Classification, Keyword




Metadata…Attributes
• Extrinsic
  – System Requirement, Mode of access,
    Availability, Cost, Control, Extent, Encoding
    description, Revision description




Metadata…for two communities
• Information Generators
• Librarians / Cataloguers




Metadata… can be
• Information Objects
  – Physical
  – Intellectual Form




Metadata…similar
• Typical Physical Library:
  – Catalogue
  – Book Racks
  – Books




Metadata…currently
• Electronic Information Environment
  – Users search Metadata
  – Pointers
  – Primary Information available on computer
    display
• Distinction
  – Electronic Environment

Metadata…process
[Figure: Two Communities. Generators of information and Libraries & Cataloguers, linked through Metadata to Users.]


Metadata…can be
•   Need not be Digital
•   More than description of an object
•   Come from a variety of sources
•   Continue to accrue
•   One information object’s metadata can be another
    information object’s data

Metadata…can be
• Intermediate steps to retrieve content
• Surrogates of objects




Metadata… need
• Internet & WWW witnessed exponential
  growth
• The Internet urgently needs catalogs
  of some kind
• The Internet/WWW was not designed to catalog
  its contents


Metadata…need
• Resource Description is a Challenge
• Tools are available, but they are just directory
  listings of network resources and search engines
• Metadata is one of the solutions
• Standards, again, are yet to make an impact

Metadata…issues
• Increased accessibility
  – Searching depends on the existence of rich and
    consistent metadata
  – search across multiple collections
  – Distributed across several repositories




Metadata…issues
• Retention of Context
  – Collection of objects
  – Complex interrelationships with people, places,
    movements & events
  – Documenting and maintaining those
    relationships
  – authenticity, structural and procedural integrity


Metadata…issues
• Expanding use
  –   Disseminating digital versions
  –   Geography
  –   Economics
  –   Infinite ways to search information
  –   Retrieval by a wider community



Metadata…issues
• Multi-versioning
  – variant versions
  – High resolution copy for preservation
  – Low resolution copy for thumbnail image for
    quick reference and network transfers




Metadata…issues
• Legal Issues
  – Track many layers of rights and reproduction
    information
  – Privacy
  – Proprietary interests




Metadata…issues
• Preservation
  – Successive generations of hardware and software
  – Technical, Descriptive and Preservation data
  – Information objects to remain accessible and
    intelligible over time




Metadata…issues
• System improvement and economics
  – Benchmarking
  – Planning new systems




Metadata..life cycle
[Figure: metadata life cycle. Creation & Multi-Versioning, Organization, Searching & Retrieval, Utilization, Preservation & Disposition.]
Metadata…standards
• For Metadata to be useful & cost-effective,
  it is essential that it
  – Conforms to standards in Structure, Semantics
    and Syntax
  – Captures the essence of sources
  – Follows a distributed metadata model



Metadata…standards
• There is no single international standard for
  Metadata
• Different levels, from complex, rich formats
  to simple ones
• Several metadata schemes have been
  proposed for different levels of
  requirements

Metadata…standards
• IAFA templates
• WWW semantic header
• URC (Uniform Resource Citation)
• OCLC InterCat project
• TEI (Text Encoding Initiative)
• Search engine meta tags
• Resource Description Framework
• EAD (Encoded Archival Description)
• GILS (Government Information Locator Service)
• Federal Geographic Data Committee
• Museum Educational Site Licensing Project
• Dublin Core
Dublin Core

Because it is simple... yet effective.

Dublin Core..means
• Dublin, Ohio
• International consensus meetings,
  workshops, etc
• Emerging Infrastructure for Internet
• Support Resource Discovery
• Elements represent a broad interdisciplinary
  consensus
• Core set of elements for DL
Dublin Core..standard
• Comprises 15 core elements
• Consensus of an international, cross-disciplinary
  group representing
  –   Library & Information
  –   Computer Science
  –   Text Encoding
  –   Museum
  –   Related fields of scholarship
Dublin Core..standard
• Each of the 15 elements is optional and repeatable
• Each element has a limited set of qualifiers
  and attributes
• Simple DC
• Qualified DC
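
A minimal sketch of what "optional and repeatable" means for a Simple DC record, assuming a plain dictionary representation. The function name make_record and the sample values are hypothetical and only illustrate the idea; they are not part of any Dublin Core specification.

```python
# The 15 element names of Simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_record(**elements):
    """Build a Simple DC record: every element is optional and repeatable.

    Each keyword maps an element name to one value or a list of values.
    """
    record = {}
    for name, value in elements.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        values = value if isinstance(value, list) else [value]
        record[name] = [str(v) for v in values]
    return record


# Hypothetical example values, for illustration only.
record = make_record(
    title="Palm leaf manuscript, sample",
    creator=["First Author", "Second Author"],  # a repeated element
    language=["sa", "en"],
)
```

Elements that are not supplied (publisher, rights, and so on) are simply absent from the record, and a repeated element is stored as a list of values.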



Dublin Core..goals
• Simplicity of creation & Maintenance
  – Non-specialists can create descriptive records for
    effective retrieval in a networked environment
• Commonly understood semantics
  – Helps the "digital tourist" (a non-specialist searcher)
  – Convergence of common, more generic
    elements
  – increasing visibility and accessibility
Dublin Core..goals
• International scope
  – 20 languages
  – Coordinating efforts
  – RDF - WWW
• Technical challenges of Internationalization
  – Multilingual & Multicultural nature of
    electronic information universe

Dublin Core..goals
• Extensibility
  – Additional resource discovery needs




Dublin Core..elements
• Content
  – Coverage, Description, Type, Relation, Source,
    Subject and Title
• Intellectual property
  – Contributor, Creator, Publisher & Rights
• Instantiation
  – Date, Format, Identifier & Language
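
The three groups above can be written down as a lookup table. This is only a sketch: the names DC_GROUPS, group_of and split_by_group are hypothetical helpers, and the flat record layout is an assumption.

```python
# Element groups exactly as listed on the slide.
DC_GROUPS = {
    "Content": ["Coverage", "Description", "Type", "Relation",
                "Source", "Subject", "Title"],
    "Intellectual property": ["Contributor", "Creator", "Publisher", "Rights"],
    "Instantiation": ["Date", "Format", "Identifier", "Language"],
}


def group_of(element):
    """Return the group an element belongs to (case-insensitive)."""
    wanted = element.strip().lower()
    for group, names in DC_GROUPS.items():
        if wanted in (name.lower() for name in names):
            return group
    raise KeyError(element)


def split_by_group(record):
    """Split a flat {element: value} record into the three groups."""
    grouped = {group: {} for group in DC_GROUPS}
    for element, value in record.items():
        grouped[group_of(element)][element] = value
    return grouped
```

For example, split_by_group({"Title": "Sample", "Creator": "Someone", "Date": "2000"}) places each element under Content, Intellectual property and Instantiation respectively.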

Dublin Core..implementation
• The Dublin Core web site lists implementations by
  region: 15 in North America and Mexico, others in
  Europe, and 12 in Asia and Australia




Digital Library Architecture
• SODA (Smart Objects Dumb Archives)
• STARTS (Stanford Protocol proposal for
  Internet Retrieval and Search)




Digital Library
• Digital Library Services
  – User
     • Functionality & Interface
  – Searching
  – Browsing
• Archive
  – Managed sets of objects

Digital Library
• Digital Object
  – Stored and trafficked digital content
     • Simple files,
     • Sophisticated objects




Digital Library

[Figure: digital library model. Labels: Library Users; Digital Library Services; Digital Library Service Providers; Archive 1, Archive 2, ..., Archive N; Digital Objects in Archives; Digital Objects out of Archives; Publishers.]
Digital Library.. builds
• Identifying a user group
• Identifying archives holding information of
  interest
• Negotiating terms and conditions with
  publishers
• Creating Indices
• Services such as Search & Browse
Digital Library.. builds
• Creating User interaction services
  –   Terms & Conditions
  –   Authentication
  –   Billing
  –   Display




Digital Library.. hindered
• Interoperability
• Object mobility
• Complex archives




Digital Library..cons
• Digital Libraries are partitioned
  – Discipline - Computer Science, Aeronautics,
    Physics, etc.
  – Format - Technical reports, video, software,
    etc.
• Interdisciplinary search difficult
• Resource Description includes manuscripts,
  software, data sets etc.
Digital Library..cons
• Manuscripts Vs Other objects -
  Reintegration
• All digital storage and transmission, tight
  integration




SODA…background
• Information generated in several forms
• Differentiated by semantic types (report,
  software, video, data sets etc.)
• A given semantic type is further differentiated
  by syntactic representation (PS, PDF,
  Word)
• Media boundaries exist

SODA…addresses
• Archive-independent container construct
• All semantic and syntactic data types
• Objects that are logically grouped together
• Archived & manipulated as a single object
• Several objects can communicate with each
  other
• Arbitrary network services
SODA..addresses
• Traditional functionality associated with
  archives has been pushed down into objects
• Objects become smarter (more responsibility)
• Archives become dumber (less responsibility)



SODA
• Archives exist to help the user locate
  the objects
• Once an object is found, the user interacts
  directly with the object




Smart Objects.. illustration

                 Smart Archives              Dumb Archives

 Smart objects   SOSA: Smart Objects,        SODA: Smart Objects,
                 Smart Archives              Dumb Archives
                 Ex: none                    Ex: NCSTRL+

 Dumb objects    DOSA: Dumb Objects,         DODA: Dumb Objects,
                 Smart Archives              Dumb Archives
                 Ex: NCSTRL                  Ex: FTP server
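
The two-by-two grid can be restated as a small lookup, shown here as a sketch. The function classify and the table EXAMPLES are hypothetical names; the example systems are the ones given in the grid.

```python
def classify(smart_objects: bool, smart_archives: bool) -> str:
    """Return the four-letter model name for an object/archive combination."""
    objects = "S" if smart_objects else "D"
    archives = "S" if smart_archives else "D"
    return f"{objects}O{archives}A"


# Example systems from the grid; None means no example is given.
EXAMPLES = {
    "SOSA": None,
    "SODA": "NCSTRL+",
    "DOSA": "NCSTRL",
    "DODA": "FTP server",
}
```

For instance, classify(True, False) names the SODA model, whose listed example is NCSTRL+.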



SODA Model…implementation




Buckets..containers
• Object oriented containers
• Logically grouped items are
  – Collected
  – Stored
  – Transported as a single unit
• Many forms of same data
• Related & non traditional data (Supportive
  material)
Buckets.. containers
• Multiple packages
• Packages can correspond to semantic types
  –   manuscript, software etc.
  –   metadata
  –   terms and conditions
  –   pointers
• Single package can have several items

Bucket..architecture
[Figure: bucket structure]
• Handle (unique ID)
• Access Methods
• Packages inside the bucket
  – Terms and Conditions
  – Metadata (RFC 1807, Dublin Core)
  – Manuscript (.ps, .pdf, .tex, .doc)
  – Software (.tar, .c, .java, .asp)
  – Images (.gif, .jpg)
  – Data sets (.xls, .tar)
• Elements inside the package (the individual files)
Bucket…requirements
• Unique ID - handle
• Either standalone or multiple repositories
• Standalone - WWW through TCP/IP
• Moderation of number of buckets through
  intelligence and functionality
• Individual buckets may have custom terms
  and conditions
Buckets..characteristics
• Is of arbitrary size
• Globally unique ID
• 0 or more components called packages
• Package contains 1 or more components -
  elements
• Element can be a file or pointer
• Packages and elements can be other buckets
Buckets..characteristics
• A package can be a pointer to a remote
  bucket, another package or element
• Buckets can keep internal logs of actions
• Interactions or communication between
  buckets are made only through defined
  methods
• Buckets can initiate actions, they do not
  have to wait to be acted on
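
The characteristics listed above can be sketched as a small class. This is an illustrative assumption, not the actual bucket implementation: the class layout, the method names, and the use of uuid4 as a stand-in for a real handle are all hypothetical.

```python
import uuid


class Bucket:
    """Sketch of a bucket: a unique ID, packages of elements, and a log."""

    def __init__(self, handle=None):
        # Globally unique ID; uuid4 is a stand-in for a real handle service.
        self.handle = handle or str(uuid.uuid4())
        self._packages = {}  # package name -> {element name: content}
        self._log = []       # internal log of actions

    def add_package(self, name):
        # A new package starts empty here; elements are added afterwards.
        self._packages.setdefault(name, {})
        self._log.append(("add_package", name))

    def add_element(self, package, name, content):
        """Store file content, a pointer (such as a URL), or another Bucket."""
        if package not in self._packages:
            raise KeyError(f"no such package: {package}")
        self._packages[package][name] = content
        self._log.append(("add_element", package, name))

    def get_element(self, package, name):
        # All access goes through defined methods, so it can be logged.
        self._log.append(("get_element", package, name))
        return self._packages[package][name]

    def list_packages(self):
        return sorted(self._packages)

    def log(self):
        return list(self._log)
```

Because callers never touch the internal dictionaries directly, every interaction passes through a defined method and leaves a log entry, which mirrors the "defined methods" and "internal logs" points above.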
Traditional Vs Bucket repository
[Figure: traditional repository versus bucket repository. Traditional: User, Repository Interface (intelligence), Archived objects. Bucket: User, Repository Interface (optional intelligence), Bucket extraction procedure, Archived Buckets.]
Buckets..protocol

[Figure: bucket protocol between User and Archive, exchanging a bucket. Operations: Index holdings, Search/retrieve holdings, Display holdings.]
Bucket..Tools
• Author Tool
  –   Metadata
  –   Adds packages
  –   Adds elements to package
  –   Selects applicable clusters
  –   Terms and conditions



Bucket..Tools
• Management Tool
  – Interface
  – Query and update buckets
• Bucket Matching System
  –   SDI
  –   Find similar works by different authors
  –   Arbitrary SDI
  –   Metadata scrubbing
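
A rough sketch of how a bucket matching system could drive SDI-style alerts, assuming each bucket exposes a list of subject terms and each user has a list of interest terms. The Jaccard overlap measure, the threshold, and the names match_score and notify are assumptions made for illustration; the slides do not specify a matching method.

```python
def match_score(profile_terms, bucket_subjects):
    """Jaccard overlap between a user's interest terms and a bucket's subjects."""
    profile = {term.lower() for term in profile_terms}
    subjects = {subject.lower() for subject in bucket_subjects}
    if not profile or not subjects:
        return 0.0
    return len(profile & subjects) / len(profile | subjects)


def notify(profiles, buckets, threshold=0.25):
    """Return {user: [bucket handles]} for buckets that match each profile."""
    alerts = {}
    for user, terms in profiles.items():
        hits = [handle for handle, subjects in buckets.items()
                if match_score(terms, subjects) >= threshold]
        alerts[user] = sorted(hits)
    return alerts
```

The same score could be computed between two buckets' subject lists to find similar works by different authors.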
Buckets..implementation
• NCSTRL
• NCSTRL+




STARTS
• Stanford Digital Library Project
• Search Engine Vendors




STARTS
• Document Sources
  – Internal networks
  – Internet
• Source Contents
  – Hidden behind search interfaces
• Algorithms/Protocols are different


STARTS..Architecture




STARTS..Architecture
• Large Number of resources
• Each resource consists of one or more sources
• A source is a collection of files
• Accepts queries from clients and produces
  results
• Sources may be small or large
• Extract the source list from resources
  periodically
STARTS..Architecture
• Extract Metadata and content summaries
  from sources periodically
• Send the query to a source at a resource
• Communicate with promising resources
• Merge the results from multiple sources
  and return them to the user
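
The flow on these two slides (collect summaries, pick promising sources, query them, merge) can be sketched with toy in-memory sources. Everything here is a simplified assumption: the Source class, the single-term query, and the occurrence-count score stand in for the real protocol messages.

```python
class Source:
    """A toy source: a named collection of documents with a search interface."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # {doc_id: text}

    def summary(self):
        """Content summary: the set of words that appear in the source."""
        words = set()
        for text in self.documents.values():
            words.update(text.lower().split())
        return words

    def search(self, term):
        """Return (doc_id, score) pairs; score = occurrences of the term."""
        term = term.lower()
        hits = []
        for doc_id, text in self.documents.items():
            count = text.lower().split().count(term)
            if count:
                hits.append((doc_id, count))
        return hits


def metasearch(sources, term):
    # 1. Collect content summaries (done periodically in practice).
    summaries = {s.name: s.summary() for s in sources}
    # 2. Keep only the promising sources.
    promising = [s for s in sources if term.lower() in summaries[s.name]]
    # 3. Query them and merge the results, best score first.
    merged = [(score, s.name, doc_id)
              for s in promising for doc_id, score in s.search(term)]
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return [(name, doc_id, score) for score, name, doc_id in merged]
```

Sources whose summary does not contain the query term are never contacted, which is the point of gathering metadata about collections rather than the documents themselves.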


STARTS..Query language
• Filter expression
  – Boolean nature
  – Defines documents
• Ranking expression
  – Associates score with documents




STARTS..Query language
• L-strings
  – language-country
  – string behavior
• Atomic Terms
  – Fields
  – Modifiers
• Complex filter expression
  – and, or, and-not, prox etc
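
A small sketch of how a Boolean filter expression and a ranking expression differ. The nested-tuple encoding is a hypothetical representation chosen for this example and is not the STARTS wire syntax; prox and modifiers are left out.

```python
def matches(expr, doc):
    """Evaluate a Boolean filter expression against a document.

    expr is ("term", field, word) or (operator, left, right) with
    operator in {"and", "or", "and-not"}; doc maps field names to text.
    """
    op = expr[0]
    if op == "term":
        _, field, word = expr
        return word.lower() in doc.get(field, "").lower().split()
    left, right = matches(expr[1], doc), matches(expr[2], doc)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "and-not":
        return left and not right
    raise ValueError(f"unsupported operator: {op}")


def score(ranking_terms, doc):
    """Toy ranking expression: count occurrences of (field, word) terms."""
    return sum(doc.get(field, "").lower().split().count(word.lower())
               for field, word in ranking_terms)
```

The filter decides which documents qualify at all; the ranking expression then orders the qualifying documents by score.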
STARTS..Query language
• Complex ranking expressions
• Global settings




STARTS..Merging ranks
• Unnormalized score of the document for
  each query
• ID of the sources where document appears
• Statistics
  – Term-frequency, Term-weight, Document-
    frequency, Document-size, Document-count
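
Because each source scores documents in its own way, the returned statistics let a metasearcher re-score everything on one scale. The sketch below assumes a single query term and uses a length-normalized TF-IDF formula; that formula, the smoothing, and the field names are assumptions for illustration, not something the protocol prescribes.

```python
import math


def merge_ranks(results, source_stats):
    """Re-score documents from several sources on one common scale.

    results: list of dicts with keys "id", "source", "tf" (term frequency)
             and "size" (document size in words, assumed to be > 0).
    source_stats: {source: {"df": document frequency of the query term,
                            "count": number of documents in the source}}
    """
    total_docs = sum(stats["count"] for stats in source_stats.values())
    total_df = sum(stats["df"] for stats in source_stats.values())
    # Smoothed inverse document frequency over all sources together.
    idf = math.log((1 + total_docs) / (1 + total_df)) + 1.0
    rescored = [(r["tf"] / r["size"] * idf, r["source"], r["id"])
                for r in results]
    rescored.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return [(doc_id, source, round(value, 4))
            for value, source, doc_id in rescored]
```

The unnormalized scores reported by the sources are ignored in this sketch; only the shared statistics are used, so documents from different sources become comparable.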



STARTS..Source metadata
• Properties of the source
  – Fields supported, score range, linkage etc.
• Content Summary of the source
  – List of words that appear in the source
  – statistics of each word listed
  – total documents in the list etc.
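
A content summary is enough to guess which sources are worth querying. The sketch below is one simple estimate under an independence assumption between query words; the data layout and the name rank_sources are hypothetical.

```python
def rank_sources(query_words, summaries):
    """Rank sources by the estimated share of documents matching the query.

    summaries: {source: {"total": number of documents,
                         "df": {word: number of documents containing it}}}
    """
    ranked = []
    for name, summary in summaries.items():
        total = summary["total"]
        # Multiply the fraction of documents containing each word
        # (treats the words as independent, which is only an approximation).
        estimate = 1.0
        for word in query_words:
            df = summary["df"].get(word.lower(), 0)
            estimate *= (df / total) if total else 0.0
        ranked.append((estimate, name))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for estimate, name in ranked if estimate > 0]
```

Sources with an estimate of zero are dropped, so the query is only sent to a small set of promising collections.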


STARTS..in the end
• General Search Engines
  – Gathers all documents on the network
• STARTS
  – Gathers metadata about collections
  – Selects small set of collections
  – Search & retrieve


STARTS..implementation
• Alexandria Digital Library




STARTS..limitation
• Text only




Indian Digital Library..
•   Ancient & Diverse culture
•   5,000-year-old culture
•   Largest Democracy
•   Seventh largest country
•   High population
•   Illiteracy
•   Important part of World Economy
Indian Digital Library..
•   World’s largest middle class
•   Poverty
•   Highly skilled manpower
•   Generates Research Oriented Information
•   Global interest
•   A major player in IT in the World
•   The World is looking toward ancient Indian Culture
Indian Scene..IT
• Content is lacking
• Control of Indian Literature (both bibliographic
  and full text) is sketchy in almost all fields
• NII
• DL on Indian Heritage
• World Wide accord for Indian Heritage
• Religion is a hot attraction on the Internet
Indian Scene.. IT
• Western research has been done on the Vedas,
  Upanishads, Shastras, Philosophy etc. but
  the soul is missing
• Protection, Preservation, Study, Research,
  Propagation for posterity
• NLP
• Knowledge Presentation

Indian Scene.. IT
•   Speech recognition
•   OCR
•   Machine translation
•   NL interfaces
•   Text Processing through Index,
    Concordance, Thesauri, Dictionaries

Indian Scene.. IT
• National Integration, Guide Humanity,
  Conflicts, Aberrations, intolerance etc
• Value based system
• Historic priceless manuscripts




Indian Heritage
•   Indian Art
•   Indian Paintings
•   Indian Sculpture
•   Religion




Proposed Architecture….
• Background
  – User Group
     • Skilled & Illiterates
     • Oral tradition still exists
     • Multilingual
  – Information Sources
     • Content is lacking
     • Literature Control both Bibliographic and Text is
       very weak
Proposed Architecture….
    • Media
         – Computer Generated files to Palm leaf manuscripts
    •   Language
    •   lack of standards for communication
    •   Geographical boundaries
    •   Accessibility
    •   Reaching rural population
 – Publishing
    • Restricted to regional and local

Proposed Architecture….
   • National initiatives are yet to take off
   • Cooperative publishing is lacking
   • Unicode/Universal protocol is yet to make its impact
 – Network Resources
   • Communication infrastructure exists but not stable
   • Individuals, Organizations, local, regional are
     generators of sources
   • Loose networks - manpower & infrastructure
   • Lack of communication standards
   • Duplicate works
Proposed Architecture….
 – Need of Networked Information Sources
    • Much priceless knowledge has been lost or is being lost
    • Future generations are missing the values of life
      taught by their ancestors
    • Protection, Preservation, Study, Research,
      Propagation for posterity
 – Looking for future
    • NII
    • Better CCC: Computer, Communication, Content

Hybrid Architecture….
• Combination of SODA & STARTS
  Architecture
  – From SODA - Bucket Architecture
  – From STARTS - Search and Retrieval protocol
• Metadata - Dublin Core
  – For its simplicity and popularity



Bucket Architecture….
• Buckets are logically grouped
  – Language, Region, Content, Media, Images,
    etc. (any combination or together as intelligent)
• Large archives have buckets with many
  different functionalities
• Bucket may contain resources, packages,
  elements, metadata, pointers, etc.

Bucket Architecture….
• Bucket may be unique entity or many
  buckets may form an entity
• Bucket may be standalone with the content
• Many buckets may become resource
• Each bucket has been built with some
  degree of intelligence and functionality
• Includes author tool and management tool
Bucket Architecture….
• Similarly user’s buckets are also created
• Bucket matching may take place
• Interactions with packages or elements are
  made only through defined methods on a
  bucket
• Bucket can initiate actions
• Buckets can exist inside or out of a
  repository
STARTS Architecture….
• Search, Retrieval and Browse within Bucket
• Resources, Sources, Elements, Packages,
  Pointers, etc. based on the Bucket definition
• Search query is made within the source
  defined in Bucket
• Query may be within the bucket or across
  the bucket based on the definition and
  functionality
STARTS Architecture….
• Ranking is done within the source
• Matching is done with User’s Bucket
  definition
• Results displayed based on Ranking and
  user’s requirements
• Although STARTS uses Z39.50 for
  metadata & transfer protocol, we propose to
  use Dublin Core for metadata
New Protocol..
•   Need to create standard for communication
•   Information processing and retrieval
•   Feeling of a universal information source
•   Many sources converge as one resource
•   Global information resource
•   Universal accessibility by unified protocol
•   Global access
New Protocol..
• The framework is just beginning





				