					                                               OCLC Online Computer Library Center




             CONTENTdm Interoperability:
       Leveraging resources; repurposing collections




                                  Claire Cocco, Product Manager
                                  Geri Ingram, Customer Service Specialist
                                  DiMeMa, Inc.

ALA Annual, New Orleans, LA
Friday, June 23rd, 9 am to noon




Agenda Part 1

 9:00 to 10:15
 I.    Mainstream digital objects into existing workflows
         Importing from legacy systems

 II.   Exporting
 III. Example of collaborative development for
        interoperability
         METS transform (courtesy of CDL)



 [BREAK 10:15 TO 10:30]




Agenda Part 2

 10:30 to 11:30

  Customizing and integrating your
   CONTENTdm site
    Web templates

    Custom Queries and Results

    Configuration files




Agenda Part 3

 11:30 to Noon

  Handling Finding Aids

  Importing EAD files into CONTENTdm



Setting the context:
fully engaged in digital library transformation


   Library services and collections expanding to
    encompass all
   Traditional to digital
      Licensed
      Reformatted

   Sharing
   Preserving




Leveraging resources

   Staff time and skills throughout the organization
    and/or consortium

   Existing metadata in some form

   Existing digital collections (images and
    transcripts)




Why? For better customer service

  To mainstream your processing and amplify your
   efforts.
  Your digital collections should ultimately be mainstreamed
   into regular workflows, similar to the ones used for other
   materials (whether that’s done centrally or in a distributed
   fashion).
  This includes selection, technical processing (cataloging,
   organizing, importing), integration with your site for
   presentation, and archiving.




Mainstreaming processing of digital formats
    (Part 1 of 3)

  I.     Importing from other systems to CONTENTdm

  II.    Exporting from CONTENTdm

  III. Example of collaborative development for
       interoperability
        A.   CONTENTdm Standard Export

        B.   METS transform for import




I. Importing from other systems to CONTENTdm


  • Metadata only
     • When records describe items that are not yet scanned

     • Replace "null" files at a later time

  • Metadata AND their digital files




From an OPAC or other database system


   When you have…

    Individual image files cataloged already

    And can export them from an OPAC or other DBMS



   Or where you have compound digital objects ready
     for migration




Migration steps:

   Prepare the collection and the import files
      Cross-walk metadata to Dublin Core

      Configure the CONTENTdm collection fields

   Export and prep data in a tab-delimited ASCII file

   Import the file to CONTENTdm
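The export-and-prep step above can be sketched in Python. This is a minimal illustration, assuming the import file starts with a header row of field names and ends each row with the image file name (as the troubleshooting slides suggest); the field list `FIELDS`, the "Filename" header, and the helper names are hypothetical and should be matched to your own collection.

```python
from pathlib import PureWindowsPath

# Hypothetical field order; match it to the fields configured in your
# CONTENTdm collection. The image file name always goes in the last column.
FIELDS = ["Title", "Creator", "Subject", "Date", "Description"]


def clean_value(value: str) -> str:
    """Collapse tabs, line breaks, and repeated spaces into single spaces."""
    return " ".join(value.split())


def build_import_text(records: list[dict]) -> str:
    """Return tab-delimited text: a header row, then one row per item."""
    lines = ["\t".join(FIELDS + ["Filename"])]  # "Filename" header is an assumption
    for rec in records:
        row = [clean_value(rec.get(field, "")) for field in FIELDS]
        # Keep only the file name, never the full path.
        row.append(PureWindowsPath(rec["filename"]).name)
        lines.append("\t".join(row))
    # Exactly one line break at the end of the file.
    return "\n".join(lines) + "\n"


records = [
    {"Title": "Main Street\tbridge\r\n(north view)", "Creator": "Smith",
     "filename": r"C:\scans\img001.jpg"},
]
print(build_import_text(records), end="")
```

Since the slides call for an ASCII file, writing the result with `encoding="ascii"` makes Python raise an error on any stray non-ASCII character instead of saving it silently.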




Data prep: Common problems in tab-delimited
data files
 Extra data in columns or rows
    Extra tabs at end of line
    Extra CRs at end of file (there should be only 1 CR)
    Carriage return or tab inside metadata
 Files must exist
    Check for 0 (zero) versus O (letter)
    The error may be in the previous record; check a few rows
     before and after the reported error
 File names are required, not full pathnames
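The checks above are easy to automate before an import. The sketch below is a hypothetical Python helper, not part of CONTENTdm; it assumes the first row holds field names and the last column holds the image file name, and its 0/O hint is only a simple heuristic.

```python
from pathlib import Path


def check_import_file(text: str, image_dir: str) -> list[tuple[int, str]]:
    """Report common problems in a tab-delimited import file.

    Returns (line_number, problem) pairs; line 1 is the header row.
    """
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the single expected line break at end of file
    if not lines:
        return [(0, "file is empty")]
    problems = []
    expected = len(lines[0].split("\t"))
    folder = Path(image_dir)
    for num, line in enumerate(lines[1:], start=2):
        if line.strip() == "":
            problems.append((num, "blank line (extra line break)"))
            continue
        if line.endswith("\t"):
            problems.append((num, "extra tab at end of line"))
        cols = line.split("\t")
        if len(cols) != expected:
            # Often caused by a tab or carriage return inside a value.
            problems.append((num, f"expected {expected} columns, found {len(cols)}"))
        filename = cols[-1].strip()
        if not filename:
            problems.append((num, "missing file name in last column"))
        elif "/" in filename or "\\" in filename:
            problems.append((num, "use a file name, not a full path"))
        elif not (folder / filename).is_file():
            # Simple heuristic: swap every 0 and O and see if that file exists.
            swapped = filename.translate(str.maketrans("0O", "O0"))
            hint = " (check 0 vs O)" if (folder / swapped).is_file() else ""
            problems.append((num, f"file not found: {filename}{hint}"))
    return problems
```

Because a bad record can throw off the rows after it, each reported line number is a starting point; as the slide says, look a few rows on either side.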



Data prep: Troubleshooting with Excel

 Use Microsoft Excel to open the file and view the data
    Each row should be one item, with the file name in the
     last column
 Work with small batches to find errors: keep
  adding items until the record with the error is found
 Use Excel’s "CLEAN" function to remove invisible
  (non-printing) characters
 Import the images from a directory without using a
  tab-delimited file
    Checks for image file errors independently of the metadata
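Excel's CLEAN function removes the first 32 non-printing ASCII characters (codes 0 to 31). The Python sketch below does the same for every cell of a tab-delimited file; the helper names are hypothetical. It splits rows on line feeds only, so a line feed typed inside a value still cannot be told apart from a row break and must be fixed at the source.

```python
def clean(cell: str) -> str:
    """Drop ASCII control characters 0-31, as Excel's CLEAN function does."""
    return "".join(ch for ch in cell if ord(ch) >= 32)


def clean_file(text: str) -> str:
    """Apply clean() to every cell of a tab-delimited file.

    Rows are split on line feeds only, so a stray carriage return or
    form feed inside a value is removed rather than starting a new row.
    Rows containing nothing but tabs or spaces after cleaning are dropped.
    """
    rows = text.replace("\r\n", "\n").split("\n")
    out = []
    for row in rows:
        cleaned = "\t".join(clean(cell) for cell in row.split("\t"))
        if cleaned.strip():
            out.append(cleaned)
    return "\n".join(out) + "\n"
```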




Demo: MARC to DC

  Export MARC records to tab-delimited text
   file (using ILS or MarcEdit)

  Format and clean up the text file to
   conform to your CONTENTdm Collection
   schema

  Import the file (with or without images)
   to the Collection
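The cross-walk step of this demo maps MARC fields to Dublin Core. As a rough sketch, the Python below applies a small, simplified subset of a MARC-to-DC crosswalk to one record. The input structure (a list of tag and subfield pairs), the chosen subset of tags, and the "; " separator for repeated values are all assumptions for illustration; MarcEdit or your ILS would produce the real export.

```python
# Simplified subset of a MARC-to-Dublin Core crosswalk:
# (MARC tag, subfield codes to keep, Dublin Core element)
CROSSWALK = [
    ("245", "ab", "Title"),
    ("100", "a", "Creator"),
    ("110", "a", "Creator"),
    ("650", "a", "Subject"),
    ("260", "b", "Publisher"),
    ("260", "c", "Date"),
    ("520", "a", "Description"),
]


def marc_to_dc(record: list[tuple[str, dict[str, str]]]) -> dict[str, str]:
    """Map one MARC record to Dublin Core values.

    `record` is a hypothetical structure: a list of (tag, {code: value})
    pairs, one pair per field.
    """
    dc: dict[str, list[str]] = {}
    for tag, codes, element in CROSSWALK:
        for field_tag, subfields in record:
            if field_tag != tag:
                continue
            text = " ".join(subfields[c] for c in codes if c in subfields)
            # Trim trailing ISBD punctuation (" / : ; ,"); periods are kept
            # because they may end an abbreviation such as "Co.".
            text = text.rstrip(" /:;,").strip()
            if text:
                dc.setdefault(element, []).append(text)
    # Repeated values are joined with "; " (assumed multi-value separator).
    return {element: "; ".join(values) for element, values in dc.items()}
```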




Importing compound objects

  • For documents, postcards, monographs and
    picture cubes

  • Can do singly or in batch

  • Much easier to start with singles, then set up for
    batch when process is smooth




Migrate compound objects from another
database system

  Where you have many compound digital objects to migrate
   Prepare the collection and the import files
      Cross-walk metadata to Dublin Core
      Configure the CONTENTdm collection fields
      Configure folders for scans and transcripts (if appropriate)
      Choose an import method based on your data structure
      Create tab-delimited ASCII file(s) appropriate to the method
      Import the files to CONTENTdm in batches




Multiple compound object wizard

    Documented in online tutorial
    Today’s demo described in handout
    Four import methods for multiple object loading
       Compound object (same as single, but upload batched)
       Directory Structure (most flexible and efficient)
       Object List (useful when NO page-level metadata)
       Job List


    Time allowing, demonstrate three different object types using 3 of
     4 methods



Choose a multiple compound import method
based on your data

              Compound    Directory    Object List
              Object      Structure    (no page-level metadata)
Postcards     YES         YES          YES*
Documents     YES         YES*         YES
Monograph     YES*        YES          YES

* Will demo
Decision flowchart:

1. Are your scan files separated into compound object directories?
   No: create directories for EACH compound object. Yes: go to 2.
2. Are they all the same type of compound object?
   No: break them up into batches by type. Yes: go to 3.
3. Do you have page-level metadata for the compound objects?
   Yes: use DIRECTORY STRUCTURE. No: go to 4.
4. Do you have one tab-delimited text file containing ALL the objects?
   Yes: use OBJECT LIST. No: go to 5.
5. Do you have tab-delimited text files for EACH compound object?
   Yes: use DIRECTORY STRUCTURE. No: create a text file listing all
   compound objects and object metadata, or create a text file for
   each compound object.
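The flowchart reduces to a short chain of yes/no checks. The function below restates it in Python purely as an illustration; its name and parameters are hypothetical and it is not part of the CONTENTdm wizard.

```python
def choose_import_method(
    dirs_per_object: bool,
    same_type: bool,
    page_level_metadata: bool,
    one_file_for_all: bool,
    one_file_per_object: bool,
) -> str:
    """Walk the decision flowchart for multiple compound object import."""
    if not dirs_per_object:
        return "Create a directory for each compound object first"
    if not same_type:
        return "Break the objects into batches by type first"
    if page_level_metadata:
        return "Directory Structure"
    if one_file_for_all:
        return "Object List"
    if one_file_per_object:
        return "Directory Structure"
    return ("Create a text file listing all compound objects and their "
            "metadata, or a text file for each compound object")
```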



Every one of the four
CONTENTdm compound object importing methods


   • Requires object-level metadata

   • Requires preparation
      • File naming, keeping sort order in mind

      • Each object has its own directory for scans

      • May use tab-delimited text file(s)

   • Accommodates transcripts
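"Keeping sort order in mind" matters because file names sort as text, so page10 lands before page2. The Python sketch below zero-pads the page number, assuming (as the slide implies) that pages within an object are ordered by file name; the helper name and the default width of 3 digits are hypothetical choices.

```python
import re


def padded_name(name: str, width: int = 3) -> str:
    """Zero-pad the last run of digits in a file name: page7.jpg -> page007.jpg."""
    return re.sub(r"(\d+)(?!.*\d)", lambda m: m.group(1).zfill(width), name)


pages = ["page1.jpg", "page10.jpg", "page2.jpg"]
print(sorted(pages))                            # page10 sorts before page2
print(sorted(padded_name(p) for p in pages))    # page001, page002, page010
```

Choose a width large enough for the longest object in the batch; a 1,200-page monograph needs four digits.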




A word about descriptive page-level metadata


   • Supported by some, but not all, of the 4 import methods
      • NOT supported by Object List

   • At the page level, Title is the only required field
      • Technical metadata can be generated by Template
        Creator




More on transcripts
 Typescripts and transcripts
    Require a field designated with the data type “Full Text
     Search”
    Inserted into that field of the scanned page’s metadata
        During import
            From a .txt file, if one is found, or
            By Template Creator
                 If the OCR Extension is in use
                 Or by “Directory Import,” as with early versions of CONTENTdm

 Transcripts and typescripts are supported by all four methods
  (i.e., they are not considered “metadata” for purposes of this
  discussion)
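One way to picture the .txt route: each page scan is paired with a transcript, and the transcript text goes into the Full Text Search field. The Python sketch below assumes a same-base-name pairing convention (page001.jpg with page001.txt), which these slides do not confirm, and a hypothetical set of image extensions; it only gathers the pairs and does not talk to CONTENTdm.

```python
from pathlib import Path
from typing import Optional

# Image extensions to treat as page scans (an assumption; adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".tif", ".tiff", ".gif", ".jp2"}


def pair_transcripts(object_dir: str) -> list[tuple[str, Optional[str]]]:
    """List (image name, transcript text or None) for each page, in file-name order."""
    folder = Path(object_dir)
    scans = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    pages = []
    for scan in scans:
        transcript = scan.with_suffix(".txt")  # page001.jpg -> page001.txt
        text = transcript.read_text(encoding="utf-8") if transcript.is_file() else None
        pages.append((scan.name, text))
    return pages
```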




Demo: Import Multiple Compound Objects

   Monograph using Compound Object method

   Postcards using Object List method

   Documents using Directory Structure method




II. Exporting from CONTENTdm

   To ASCII tab-delimited text with field headers

   To XML:
      Standard Dublin Core: DC fields only

      Custom: all fields, including local fields, but not structure

      CDM Standard: all fields, including structure
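Because the tab-delimited export carries field headers, it can be read back by field name, for example to find records that still need work. A minimal Python sketch; the helper names and the field names used in the example ("Title", "Subject") are hypothetical and depend on your collection.

```python
import csv
import io


def read_export(text: str) -> list[dict[str, str]]:
    """Parse a tab-delimited export whose first row holds the field names."""
    # QUOTE_NONE: treat quotation marks in titles as ordinary characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def missing(rows: list[dict[str, str]], field: str) -> list[dict[str, str]]:
    """Return the rows whose `field` is empty, e.g. to find records to fix."""
    return [row for row in rows if not (row.get(field) or "").strip()]


rows = read_export('Title\tSubject\n"Main" Street\tBridges\nDepot\t\n')
print([row["Title"] for row in missing(rows, "Subject")])
```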




III. Examples of collaboration for interoperability


     • Web integration through search engines, RSS

     • OAI harvesting
        • Enable at collection or server level

        • Choose to suppress <pagedata> or not

     • WorldCat registration
        • Open WorldCat integration
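Once OAI is enabled, any harvester can request the collection's Dublin Core records with the standard OAI-PMH `ListRecords` verb and `metadataPrefix=oai_dc`, following `resumptionToken`s until the last page. A minimal Python harvester sketch; the endpoint URL is a placeholder (check your server's OAI base URL), and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, token: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core (oai_dc)."""
    if token:
        # A follow-up request carries only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text) -> tuple[list[str], Optional[str]]:
    """Return (titles, resumption token or None) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text or "" for el in root.iter(DC + "title")]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return titles, token


def harvest_titles(base_url: str, set_spec: Optional[str] = None) -> list[str]:
    """Fetch every page of a ListRecords response (needs network access)."""
    titles, token = [], None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token, set_spec)) as resp:
            page_titles, token = parse_page(resp.read())
        titles.extend(page_titles)
        if not token:
            return titles


# Usage (placeholder endpoint):
# titles = harvest_titles("http://cdm.example.edu/oai", set_spec="postcards")
```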




CONTENTdm and a new METS transform


   Info available on USC in July

   Code at SourceForge

   Windows-oriented
The CONTENTdm to METS conversion tool

                  What is/are METS?

                  Why is/are METS good?

                  What is 7train?

                  How do I use 7train?

                  What do I get from 7train?

                  How do I get 7train?
               What is/are METS?


 METS (Metadata Encoding and Transmission Standard) is an
  XML-based standard for encoding metadata to describe
    objects (digital or otherwise) within a digital library.


See http://www.loc.gov/standards/mets/ for more information
                  What is/are METS?
METS

   metsHdr       Metadata about this particular METS document:
                 encoder, contact info, etc.
   dmdSec        Descriptive metadata: title, author, subjects, etc.
   amdSec        Metadata for the management of the object:
                 technical details, object history, etc.
   fileSec       A list of files that make up the object
   structMap     Description of the structure of the object,
                 i.e. how the files fit together
   behaviorSec   What to do with the object: machine-actionable
                 instructions

   structMap is the only required section; all others are optional.
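To make the section list concrete, the sketch below builds a bare-bones METS document in Python for a two-page object: a fileSec listing the page files and the structMap that orders them. It is an illustration only, not 7train's output; a real submission (for example to the Online Archive of California) would also carry descriptive and administrative metadata, and the file names, label, and `FILE001`-style IDs here are made up.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(label: str, page_files: list[str]) -> str:
    """Build a minimal METS document: a fileSec plus the required structMap."""
    mets = ET.Element(m("mets"))
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    struct_map = ET.SubElement(mets, m("structMap"), TYPE="physical")
    top = ET.SubElement(struct_map, m("div"), LABEL=label)
    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:03d}"  # hypothetical ID scheme
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id)
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # Each page div points back to its file through FILEID.
        page = ET.SubElement(top, m("div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page, m("fptr"), FILEID=file_id)
    return ET.tostring(mets, encoding="unicode")


print(build_mets("Letter, 1890", ["page001.jpg", "page002.jpg"]))
```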
                  Why METS?
To be able to add your objects to other collections and
    increase the visibility of your institution's assets.
                   What is 7train?

  7train is an XSL-based tool for converting XML documents (in
  this case, CONTENTdm exports describing objects managed in the
  CONTENTdm system) into METS objects suitable for submission to
  a digital library system, such as the California Digital
  Library's Online Archive of California.

  7train is a platform-independent, standalone tool that was
  designed to work on any system and to be simple to use.
           How does 7train work?

It is as easy as dragging your CONTENTdm XML export file
                    onto an executable file.
[Screenshots: How does 7train work? What do you get? Output: a sample METS document]
References & Links

   7train Home: http://seventrain.sourceforge.net
   7train Download: http://seventrain.sourceforge.net/7train_download.html
   CONTENTdm: http://www.dimema.com
   METS: http://www.loc.gov/standards/mets/
   XSL: http://www.w3.org/Style/XSL/
   The California Digital Library: http://www.cdlib.org
   The Online Archive of California: http://www.oac.cdlib.org


Interoperability

[Diagram. Labels: Librarians, Archivists; Web; WorldCat; Open WorldCat; Regional Union Catalog; MARC records; OPACs; CONTENTdm (DC, XML; 10K/50K/Unlimited Objects); Existing Libraries; New Libraries; Other CONTENTdm sites; CONTENTdm Multi-Site Server; OAI; Other digital archives; For Library Users]




BREAK: 15 minutes

  This concludes Part 1

  To come after the break:

 Part 2
    Customization

 Part 3
    Finding Aids




Customizing and integrating your CONTENTdm site
    (Part 2 of 3)



    Web templates

    Custom Queries and Results

    Configuration files




CONTENTdm Web Templates

 Customizable for integration

 Designed to support broad range of users
    Small to large organizations

    Beginners to experts

 Use out of the box with minimal customization
    Basic customization requires minimal HTML skills

 Fully customize including advanced extensions

 Based on a PHP (Hypertext Preprocessor) API (Application
  Programming Interface)




 Basic Customizations
 Minimal skills needed

 Easy to make changes
    Global include files

    Variables

 We recommend that all organizations make basic
  customizations
    Header (name/logo), contact e-mail address, colors, about
     page, home page

   http://www.contentdm.com/help4/custom/templates.html




Getting Started

   Access to Web server docs directory

   HTML editor or text editor

   Design plan

   Logo or other graphics

   Backup copy of original files




Customization Demo

 http://sr.contentdmdemo.com

 Files located in /cdm4 directory
    /includes/global_header.php

    /client/LOC_global.php

    /client/STY_global_style.php

    about.php

    browse.php

    results.php

 New logo saved in /cdm4/images/
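Before editing the files listed above, the "backup copy of original files" step can be scripted. A minimal Python sketch, assuming the machine running it can reach the /cdm4 directory; the time-stamped folder naming and the `back_up` helper are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Files named on the slide, relative to the /cdm4 directory.
FILES_TO_EDIT = [
    "includes/global_header.php",
    "client/LOC_global.php",
    "client/STY_global_style.php",
    "about.php",
    "browse.php",
    "results.php",
]


def back_up(cdm4_dir: str, backup_root: str) -> tuple[Path, list[str]]:
    """Copy the files you plan to edit into a time-stamped backup folder."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest_root = Path(backup_root) / f"cdm4-backup-{stamp}"
    copied = []
    for rel in FILES_TO_EDIT:
        src = Path(cdm4_dir) / rel
        if not src.is_file():
            continue  # skip files this installation does not have
        dest = dest_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(rel)
    return dest_root, copied
```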




Advanced Customizations
   Experience with HTML, PHP, and JavaScript needed
   Customize looks for each collection
     University of Nevada, Reno

   Web Template extensions
     E-commerce (University of Utah, Oregon State University)
     Comment forms (SENYLRC, Enoch Pratt Free Library, OSU)
     Custom metadata display (University of Oregon)
     QuickTime video (Williams College)

   http://www.contentdm.com/customers/index.html




Examples of Advanced Customizations

   University of Nevada, Reno
    http://imageserver.library.unr.edu/

   University of Utah
    http://www.lib.utah.edu/digital/bodmer/

   Oregon State University
    http://digitalcollections.library.oregonstate.edu/cdm4/client/bracero/

   SENYLRC
    http://www.hrvh.org/

   Enoch Pratt Free Library
    http://www.mdch.org/

   Williams College
    http://contentdm.williams.edu/




Customizations Tips

 Always make a backup!
 Be aware of encoding (UTF-8 vs. ASCII)
 See what other users are doing
    Share, borrow, and copy ideas and code
    http://www.contentdm.com/customers/index.html
    Listserv

 Document changes
    Document which files are edited and what code changes
     are made to ease upgrading to newer versions
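A quick way to act on the encoding tip is to check each edited file's bytes before uploading it. The Python sketch below classifies content as ASCII, UTF-8, UTF-8 with a byte-order mark, or something else; the helper names are hypothetical, and the Windows-1252 guess in the last message is only a common possibility, not a detection.

```python
from pathlib import Path


def describe_encoding(data: bytes) -> str:
    """Classify file contents as ASCII, UTF-8, UTF-8 with BOM, or something else."""
    if data.startswith(b"\xef\xbb\xbf"):
        # A byte-order mark at the top of a PHP include is sent to the browser
        # as output and can appear as stray characters on the page.
        return "UTF-8 with BOM"
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "not UTF-8 (possibly a legacy encoding such as Windows-1252)"


def report(folder: str) -> None:
    """Print the encoding of every .php file under `folder`."""
    for path in sorted(Path(folder).rglob("*.php")):
        print(path, describe_encoding(path.read_bytes()))
```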




Custom Queries and Results (CQR)
   Create predefined, custom queries
     Virtual collections
     Guide users to specific results
     Integrate with other sites
   Multiple options
     Simple hyperlink, drop-down list, index box, text box, browse
   Easy to use
     Wizard generates code to copy and paste into Web pages
   Documentation
     http://www.contentdm.com/help4/custom/cqr.html
     http://www.contentdm.com/USC/tutorials/cqr.pdf




CQR DEMO

  Generate code using CQR

  Copy and paste into Web pages
    May need to change path

    Customize as desired
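When a simple-hyperlink CQR is pasted into a page on another server, usually only the part of the URL in front of the query needs to change. The Python sketch below keeps the page name and query string exactly as generated and swaps in a new location; the helper name and the URLs in the example, including their query strings, are placeholders rather than real CQR output.

```python
from urllib.parse import urlsplit, urlunsplit


def repoint(generated_url: str, new_base: str) -> str:
    """Move a wizard-generated link to a different server or /cdm4 location.

    The page name (e.g. results.php), query string, and fragment are kept
    exactly; the directory path comes from new_base, and the scheme and
    host are taken from new_base when it has them.
    """
    old = urlsplit(generated_url)
    base = urlsplit(new_base)
    page = old.path.rsplit("/", 1)[-1]
    path = base.path.rstrip("/") + "/" + page
    return urlunsplit((base.scheme or old.scheme, base.netloc or old.netloc,
                       path, old.query, old.fragment))


print(repoint("http://localhost/cdm4/results.php?a=1&b=2",
              "http://digital.example.edu/collections/cdm4/"))
```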




Configuration Files
 Customizable files that reside on the server
 Stop words
    Full text field stop words – fullstop.txt
    Automatic hyperlink stop words – stopwords.txt
    http://www.contentdm.com/help4/custom/stopwords.html
 Image viewer
    Customize how images are displayed – imageconf.txt
    For all collections or per collection
    http://www.contentdm.com/help4/custom/zoompan.html
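To see what a stop-word list does, the sketch below filters the words of a field against such a list. The one-word-per-line layout is an assumption for illustration; check the stop-word help page for the actual format of fullstop.txt and stopwords.txt, and note that CONTENTdm applies its own lists on the server. The helper names are hypothetical.

```python
import re


def load_stop_words(text: str) -> set[str]:
    """Read a stop-word list, assuming one word per line (format not confirmed)."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def indexable_terms(field_text: str, stop_words: set[str]) -> list[str]:
    """Split text into lowercase words and drop any listed stop words."""
    words = re.findall(r"[\w']+", field_text.lower())
    return [w for w in words if w not in stop_words]


stops = load_stop_words("the\nOf\n\nand\n")
print(indexable_terms("The History of Reno and Sparks", stops))
```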




Imageconf.txt Demo

  Located in the /conf directory on the CONTENTdm
   server

  Can change globally or for individual collections
     If you wish to change the zoom and pan default settings
      for a particular collection, copy the imageconf.txt file
      from the Server/conf directory to the index/etc
      directory of the collection(s) you wish to modify.

  Make a backup copy!
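The per-collection override described above is a file copy that can be scripted. A minimal Python sketch, assuming you pass the server's conf directory and each collection's directory (the paths and the `override_imageconf` helper are hypothetical); it keeps a .bak of any existing per-collection file and refuses to create a missing index/etc directory, since that usually means the path is wrong.

```python
import shutil
from pathlib import Path


def override_imageconf(server_conf_dir: str, collection_dirs: list[str]) -> list[Path]:
    """Copy the server-wide imageconf.txt into each collection's index/etc directory."""
    source = Path(server_conf_dir) / "imageconf.txt"
    copied = []
    for coll in collection_dirs:
        etc = Path(coll) / "index" / "etc"
        if not etc.is_dir():
            # A missing index/etc usually means the collection path is wrong.
            raise FileNotFoundError(f"no index/etc directory under {coll}")
        dest = etc / "imageconf.txt"
        if dest.exists():
            # Keep the existing per-collection settings before overwriting them.
            shutil.copy2(dest, etc / "imageconf.txt.bak")
        shutil.copy2(source, dest)
        copied.append(dest)
    return copied
```

After copying, edit only the collection's copy; the server-wide file in /conf keeps serving every other collection.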




Introduction to Finding Aids

   How many of you have them?

   Are they digital documents or paper?

   If digital, are they XML?
     Basic: create them as documents or monographs, and
      link to them over HTTP

     XML: use the EAD DTD and a style sheet for display




Handling Finding Aids
   Part 3

  Importing EAD files to CONTENTdm




Current EAD Support
   Import of EAD files
   Automatic text extraction from EAD files when:
     The file extension of the EAD is .xml.
     The file includes a header record beginning with DOCTYPE ead.
     The collection has a full text search field.
     The full text search field is empty when the item is added to the
      collection.

   Up to 128,000 characters are extracted from the following
    elements and placed in the full text search field:
     titleproper, title, unittitle, persname, famname, corpname,
      genreform
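The file-level rules above can be mimicked in Python to preview what text an EAD file would contribute. This is a sketch, not CONTENTdm's code: it checks the .xml extension and the DOCTYPE ead header, then gathers text from the listed elements up to 128,000 characters. The two collection-level conditions (a Full Text Search field exists and is empty) are not modeled, and the helper name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Elements whose text is extracted, per the slide.
EAD_ELEMENTS = {"titleproper", "title", "unittitle", "persname",
                "famname", "corpname", "genreform"}
MAX_CHARS = 128_000


def extract_ead_text(path: str) -> Optional[str]:
    """Mimic the extraction rules for one file; None if the file does not qualify."""
    p = Path(path)
    if p.suffix.lower() != ".xml":
        return None
    raw = p.read_bytes()
    if not re.search(rb"<!DOCTYPE\s+ead\b", raw, re.IGNORECASE):
        return None
    root = ET.fromstring(raw)  # the external DTD is not fetched
    parts = []
    for el in root.iter():
        if el.tag in EAD_ELEMENTS:
            # Note: text of a matching element nested inside another
            # matching element (e.g. title in unittitle) is counted twice.
            text = " ".join("".join(el.itertext()).split())
            if text:
                parts.append(text)
    return " ".join(parts)[:MAX_CHARS]
```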




Current EAD Support

  Display determined by style sheet
     XSLT

     CSS

  Client side parsing

  Affected by Web browser




Getting Started

 EAD XML files

 EAD DTD

 XSLT style sheet




EAD Demo
 Configure Full Text Search field

 Store DTD and style sheet on server

 Edit path to DTD and XSLT in EAD files

 Import (single or batch)
   Add metadata

   Custom thumbnail if desired

 Upload, approve, index
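The "edit path to DTD and XSLT" step can be scripted for a batch of EAD files. The hypothetical Python sketch below uses regular expressions rather than an XML parser, so the rest of each file is left as it was: it rewrites the DOCTYPE's system identifier (the quoted path ending in .dtd) and the `xml-stylesheet` href, adding that processing instruction after the XML declaration if the file lacks one. The server URLs in the example are placeholders.

```python
import re


def repoint_ead(text: str, dtd_url: str, xsl_url: str) -> str:
    """Point an EAD file's DTD and XSLT references at copies on your server."""
    # Replace the system identifier (the quoted path ending in .dtd) in the
    # DOCTYPE; the quoted public identifier, if any, is left alone.
    text = re.sub(
        r'(<!DOCTYPE\s+ead\b[^>]*?)"[^"]*\.dtd"',
        lambda m: f'{m.group(1)}"{dtd_url}"',
        text, count=1, flags=re.IGNORECASE,
    )
    # Update an existing xml-stylesheet href, or add the instruction.
    text, n = re.subn(
        r'(<\?xml-stylesheet\b[^?]*?href=")[^"]*(")',
        lambda m: m.group(1) + xsl_url + m.group(2),
        text, count=1,
    )
    if n == 0:
        pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_url}"?>\n'
        decl = re.match(r"<\?xml\s[^?]*\?>\s*", text)
        cut = decl.end() if decl else 0
        text = text[:cut] + pi + text[cut:]
    return text
```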




Custom EAD Extension

 Example by Oregon State University
   Terry Reese, terry.reese@oregonstate.edu

 Customized Web templates
   Client side or server side parsing

   Integrates display in templates

 VBScript for extracting metadata from EAD to
  tab-delimited text file
 www.contentdm.com/USC/templates/index.asp
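Oregon State's extraction script is written in VBScript. As a rough illustration of the same idea in Python (a different language, and not their code), the sketch below pulls a few EAD elements into one tab-delimited row per finding aid, ready for a CONTENTdm import. The element-to-field mapping and the "Filename" column are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from collection field to EAD element.
MAPPING = [("Title", "titleproper"), ("Creator", "origination"),
           ("Date", "unitdate"), ("Subject", "subject")]


def all_text(root: ET.Element, tag: str) -> str:
    """Text of every `tag` element, whitespace collapsed, joined with "; "."""
    values = (" ".join("".join(el.itertext()).split()) for el in root.iter(tag))
    return "; ".join(v for v in values if v)


def ead_import_text(files: list[tuple[str, bytes]]) -> str:
    """Build tab-delimited text from (file name, EAD bytes) pairs."""
    lines = ["\t".join([field for field, _ in MAPPING] + ["Filename"])]
    for name, data in files:
        root = ET.fromstring(data)
        cells = [all_text(root, tag) for _, tag in MAPPING]
        lines.append("\t".join(cells + [name]))
    return "\n".join(lines) + "\n"
```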




Oregon State University EAD Collection
http://digitalcollections.library.oregonstate.edu/




Announcing new exposure for your
CONTENTdm Collections


   Collection of Collections
      http://collections.contentdmdemo.com/

      (also featured at contentdm.com/customers)

   Harvesting metadata from Collection sites at:
      http://primarysources.contentdmdemo.com
       Uses CONTENTdm Multi-site server