					             International Conference on Developing Digital Institutional
                      Repositories: Experiences and Challenges
                         December 9-10, 2004, Hong Kong


                         DSpace in Action
                  Implementing the
         HKUST Institutional Repository System

                          Presented by K.T. Lam
                         Head of Library Systems
          The Hong Kong University of Science and Technology Library
                                lblkt@ust.hk


9 December 2004
                                                    Table of Contents

       From Idea to Creation
                Why have an IR?
                IR Software Selection
  Major Features
  Future Improvements
  Conclusions




Implementing the HKUST Institutional Repository System / K.T. Lam       2
                                            From Idea to Creation

  The idea of establishing an IR originated from a
   staff development workshop at HKUST Library
   on 26 November 2002, where Kimberly Douglas
   was invited to speak on “E-prints, OAI and
   Institutional Repository”.
  After the workshop, a Task Force was formed to
   investigate the idea.
  After two months of software evaluation, DSpace
   was selected to build the Repository.


                                    From Idea to Creation (cont.)

       The IR System at HKUST was brought to life in
        February 2003, with the following configuration
        and data content:

                DSpace Version 1.01
                Server with Intel Pentium III 733 MHz, 512 MB RAM,
                 and RedHat Linux Release 7.3
                105 Computer Science Technical Reports




                                    From Idea to Creation (cont.)

       Background / Experience Facilitating the
        Creation
                HKUST Library is an early supporter of the Open
                 Access concept - joined SPARC (Scholarly Publishing
                 & Academic Resources Coalition) in 2001
                Experience in conducting digital library projects
                 with CJK capabilities
                  • Electronic Course Reserve - 1993
                  • Digital University Archives and Electronic Theses -
                    1997
                  • etc.


                                    From Idea to Creation (cont.)

       Why have an IR?
                To create a permanent record of the scholarly output
                 of HKUST
                  • No available access to some scholarly works
                    published by our own faculty
                  • Collections of working papers, technical reports,
                    research reports floating around
                  • Some of our scholarly works are in the public
                    domain



                                    From Idea to Creation (cont.)

       Why have an IR? (cont.)
                To make HKUST’s scholarly output more globally and
                 openly accessible
                To support the international Open Access effort.
                 “[T]he mission of disseminating knowledge is only half
                 complete if it is not widely and readily available to
                 society” - Berlin Declaration
                 (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html)




                                    From Idea to Creation (cont.)

       IR Software Selection
                The July/August 2004 issue of Library Technology
                 Reports provides a very detailed discussion on
                 institutional repository systems and functional
                 requirements




                                    From Idea to Creation (cont.)

       IR Software Selection (cont.)
                Decision at the first meeting of the IR Task Force in
                 mid-December 2002:
                  • follow Caltech's model, i.e. base our IR on open
                    source software with an OAI-PMH interface.
                We therefore evaluated two IR systems: EPrints and
                 DSpace




                                    From Idea to Creation (cont.)

       IR Software Selection (cont.)
                EPrints
                  • Developed by University of Southampton
                  • The very first open source IR software; since 2000
                  • Written in Perl, with MySQL database and Apache
                    Web server




                                    From Idea to Creation (cont.)

       IR Software Selection (cont.)
                DSpace
                  • Jointly developed by MIT Libraries and Hewlett-
                    Packard Company
                  • Open source software
                  • Released on Sourceforge during our system
                    evaluation period in late December 2002
                  • Written in Java, with PostgreSQL database,
                    Lucene search engine, and a Tomcat web servlet
                    container


                                    From Idea to Creation (cont.)

       IR Software Selection (cont.)
                We chose DSpace (almost two years ago) because:
                  • DSpace's development built on the experience
                    gained from EPrints - the very first and most
                    popular open source IR software at that time
                  • EPrints did not fully support Unicode and was
                    not Java- and servlet-based
                  • Both EPrints and DSpace are open source
                    software, fulfill our functional requirements, and
                    follow state-of-the-art library standards


              Current Configuration of IR at HKUST

 As of 4 December 2004:

 Home URL:         http://repository.ust.hk/
 IR Software:      DSpace Version 1.2
 System Software:  Fedora Core 2 Linux; Tomcat 5.0; JDK 1.4.2
 Server:           Intel Pentium 4 2.4 GHz, 1 GB RAM
 Content:          1,650 documents from 38 departments
 Usage:            Documents were accessed 9,051 times in the
                   previous month

            Growth (May 2003 to September 2004)




                                                         Major Features

       This section covers the following topics
                Data structure
                Document submission form
                Add item form
                CJK support
                OAI data provider
                SRW/U interface
                Google pilot project
                Authentication and authorization


                                                Major Features (cont.)

       Data Structure
                Document Types
                  • Preprints, technical reports, working papers,
                    conference papers, journal articles, presentations,
                    book chapters, patents, theses, etc.
                Document Formats
                  • Mainly PDF files; also contains PowerPoint files




                                                Major Features (cont.)

       Data Structure (cont.)
                DSpace data model
                  • Communities (and sub-communities)
                  • Collections
                  • Items
                                  Metadata
                                  Bundles of bitstreams
                HKUST implementation: Items are grouped by
                 Departments (i.e. communities) and then by
                 Document Types (i.e. collections).
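
The data model above can be sketched as a handful of nested record types. This is a minimal Python illustration, not DSpace's actual Java classes: the field names, the "ORIGINAL" bundle name, the sample data and the count_items helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    name: str                               # e.g. a PDF or PowerPoint file


@dataclass
class Bundle:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    metadata: Dict[str, List[str]]          # metadata element -> values
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str                               # at HKUST: a document type
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str                               # at HKUST: a department
    collections: List[Collection] = field(default_factory=list)
    sub_communities: List["Community"] = field(default_factory=list)


def count_items(community: Community) -> int:
    """Count items in a community, including its sub-communities."""
    own = sum(len(c.items) for c in community.collections)
    return own + sum(count_items(s) for s in community.sub_communities)


# Hypothetical sample data, for illustration only.
report = Item(
    metadata={"title": ["A sample technical report"]},
    bundles=[Bundle("ORIGINAL", [Bitstream("report.pdf")])],
)
cs = Community(
    name="Computer Science",
    collections=[Collection("Technical Reports", [report]),
                 Collection("Preprints")],
)
print(count_items(cs))
```

Grouping by department and then by document type keeps every item reachable through exactly one community and one collection in this sketch.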


        [Screenshot callouts: Community; Collections]




        [Screenshot callouts: CNRI Handle (Persistent Identifier); Document in PDF]




                                                Major Features (cont.)

       Document Submission Form
                Faculty are apathetic about self-submission
                DSpace’s submission and workflow functions are too
                 lengthy; might scare off faculty
                In need of a simple and effortless submission form -
                 as a quick medium for submitting documents




                                                Major Features (cont.)

       Document Submission Form (cont.)
                Decided to develop our own form
                  • Requires only very minimal data entry
                  • Non-exclusive distribution license agreement
                  • Library IR staff enhance the metadata of the
                    submissions and then add them to DSpace
                  -------
                  • Written in Perl
                  • Submitted data stored in DSpace “Simple Archive
                    Format”
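
For readers unfamiliar with the DSpace Simple Archive Format: each item is a directory holding a dublin_core.xml metadata file, a contents file listing the item's files, and the files themselves. The sketch below writes one such directory in Python. It is only an illustration (the form described above was written in Perl), and the function name, directory name, and sample title, author and file are hypothetical.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(item_dir: Path, title: str, author: str,
                   file_name: str, file_bytes: bytes) -> None:
    """Write one item directory: dublin_core.xml, contents, and the file."""
    item_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")
    for element, qualifier, value in [
        ("title", "none", title),
        ("contributor", "author", author),
    ]:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier})
        dcvalue.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # The contents file lists the item's file names, one per line.
    (item_dir / "contents").write_text(file_name + "\n", encoding="utf-8")
    (item_dir / file_name).write_bytes(file_bytes)


# Hypothetical submission, written to a temporary directory.
archive_root = Path(tempfile.mkdtemp())
write_saf_item(archive_root / "item_000", "A sample working paper",
               "Doe, Jane", "paper.pdf", b"%PDF-1.4 placeholder")
print(sorted(p.name for p in (archive_root / "item_000").iterdir()))
```

Storing submissions in this layout means staff can review and enrich the metadata file before the item is added to DSpace.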

                                                Major Features (cont.)

       Add Item Form
                Locally developed JSP application to add items to
                 DSpace by Library IR staff
                Allows IR staff to:
                  • Create new item from scratch
                  • Enhance the metadata from faculty submission
                    and then add the item to DSpace




                                                Major Features (cont.)

       CJK (Chinese, Japanese, Korean) Support
                DSpace supports Unicode
                Problem - Lucene search engine is unable to search
                 by CJK characters
                  • Solved by replacing DSpace’s Tokenizer with a
                    CJKTokenizer - but has an interesting side effect
                Problem - URL of query containing CJK characters is
                 not properly encoded
                  • Solved by setting Tomcat URIEncoding="UTF-8"
                    and adding URLEncode() to one line of the Java
                    source code
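
The two fixes above can be illustrated with a small sketch. Lucene's CJKTokenizer indexes runs of CJK characters as overlapping two-character tokens (bigrams), which is what makes them searchable without word boundaries. The Python below imitates that idea and shows UTF-8 percent-encoding of a CJK query term. It is not the Lucene code or the HKUST patch, the character ranges are a rough approximation, and the "query" parameter name is an assumption.

```python
from urllib.parse import urlencode


def is_cjk(ch: str) -> bool:
    """Rough check covering common CJK ideographs, kana and Hangul."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
            or 0x3040 <= code <= 0x30FF     # Hiragana and Katakana
            or 0xAC00 <= code <= 0xD7AF)    # Hangul syllables


def tokenize(text: str) -> list:
    """Split text into lowercase words and overlapping CJK bigrams."""
    tokens, word, run = [], "", ""

    def flush_word():
        nonlocal word
        if word:
            tokens.append(word.lower())
        word = ""

    def flush_run():
        nonlocal run
        if len(run) == 1:
            tokens.append(run)              # a lone CJK character
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run += ch
        elif ch.isalnum():
            flush_run()
            word += ch
        else:                               # whitespace or punctuation
            flush_word()
            flush_run()
    flush_word()
    flush_run()
    return tokens


print(tokenize("香港科技大學 Library"))
# Percent-encode a CJK search term as UTF-8 for use in a query string.
print(urlencode({"query": "科技"}))
```

A query for 科技 then matches the indexed bigram 科技 directly, whereas a tokenizer that treats the whole run as one word would not match it.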


                So, ...

         Sorting problem.
       Can you figure out the
          logic behind it?




                                                Major Features (cont.)

       OAI Data Provider
                DSpace is OAI-compliant
                This means that OAI harvesters can easily collect the
                 metadata (in Dublin Core format) from various IRs
                 (including HKUST’s) for their value-added
                 indexing/searching services.
                For example: OAIster
                OAI Path to IR at HKUST:
                   http://repository.ust.hk/dspace-oai/request?
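
As a rough illustration of how a harvester addresses this interface, the sketch below builds OAI-PMH request URLs against the base path given above. The verbs Identify and ListRecords and the oai_dc metadata prefix come from the OAI-PMH protocol; the helper name is made up for the example, and no request is actually sent.

```python
from urllib.parse import urlencode

OAI_BASE = "http://repository.ust.hk/dspace-oai/request"  # from the slide


def oai_request_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return OAI_BASE + "?" + urlencode(params)


# Ask the repository to describe itself.
print(oai_request_url("Identify"))
# Ask for all records in unqualified Dublin Core.
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch these URLs and parse the XML responses; that step is left out here.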



   http://repository.ust.hk/dspace-oai/request?verb=GetRecord& ... 1783.1/1805




                                                Major Features (cont.)

       SRW/U Interface
                SRW: Search/Retrieve Web Service; SRU: Search/Retrieve
                 via URL
                Retains the core functionality of Z39.50 but in the
                 form of web services
                This means search service providers can broadcast a
                 search to various IRs and deliver the search results in
                 their own user interface
                SRW/U Interface for the IR at HKUST
                   • Based on OCLC’s SRW/U software
                   • URL: http://repository.ust.hk/SRW/
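
To show what a search "by URL" looks like, here is a minimal sketch that builds an SRU searchRetrieve request. The parameter names (operation, version, query, maximumRecords) are those of SRU 1.1; the helper name is hypothetical. The slide gives only the base URL, so a real request may need an additional database path that is not known here.

```python
from urllib.parse import urlencode

SRU_BASE = "http://repository.ust.hk/SRW/"  # from the slide; path may be incomplete


def sru_search_url(cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return SRU_BASE + "?" + urlencode(params)


# A CQL phrase query; the quotes and the space are percent-encoded.
print(sru_search_url('"vapor generator"'))
```

A search service provider could send the same query string to several repositories and merge the XML responses in its own interface.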


                              The results of an SRW/U search, with XSLT transformation


                                                Major Features (cont.)

       Google Pilot Project
                Initiated in March 2004 by the DSpace user
                 community under the leadership of MacKenzie Smith
                To improve access to DSpace IRs from within Google
                HKUST is a participant in this project
                Result - created a restrict=dspace search filter for
                 use in the Google URL. For example:

           http://www.google.com/search?restrict=dspace&q=collaboration








                                                Major Features (cont.)

       Authentication and Authorization
                Authentication - by EPerson record created through
                 user registration
                Authorization - based on the policy settings on the
                 object (community, collection, item, bitstream, etc.)
                A&A are not a big concern for our IR
                  • We do not use DSpace’s submission and workflow
                    functions
                  • It is open to the public
                  • A&A only required when our library IR staff access
                    DSpace’s administration functions


                                                Major Features (cont.)

       DSpace Authentication and Authorization (cont.)
                We have however customized DSpace to allow for
                 campus-wide LDAP authentication
                  • Mainly for a different project that also uses DSpace
                    (Digital University Archives).
                  • Transparent creation of EPerson record on-the-fly
                    during authentication
                We have also investigated the feasibility of hooking
                 DSpace up to Yale’s Central Authentication Service (CAS)
                  • With only limited success - due to cumbersome stage
                    transfer from authentication to authorization
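
The LDAP customization described above, where an EPerson record is created on the fly during authentication, follows a common pattern that can be sketched as below. This is an assumed outline in Python, not the HKUST code: the ldap_bind callable stands in for a real directory bind, and the record fields and the test account are invented for the example.

```python
from typing import Callable, Dict, Optional


def login(username: str, password: str,
          ldap_bind: Callable[[str, str], bool],
          epersons: Dict[str, dict]) -> Optional[dict]:
    """Return the user's EPerson record, creating it on first login."""
    if not ldap_bind(username, password):
        return None                      # the directory rejected the login
    if username not in epersons:
        # First successful login: create the record transparently.
        epersons[username] = {"netid": username, "can_log_in": True}
    return epersons[username]


def fake_ldap_bind(username: str, password: str) -> bool:
    """Stand-in for a real LDAP bind; accepts one hard-coded account."""
    return (username, password) == ("staff01", "secret")


registry: Dict[str, dict] = {}
print(login("staff01", "secret", fake_ldap_bind, registry) is not None)
print(login("staff01", "wrong", fake_ldap_bind, registry) is not None)
print(sorted(registry))
```

Authorization is a separate step in this sketch: the record returned here would still be checked against the policies set on each object.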


                                                         https://archives.ust.hk/




                                                                           Login to see more…


                                             Future Improvements

  Flat community + collection structure - 2 levels
   only, not deep enough
  Linked collection - a collection that belongs to
   more than one community
  Unable to search across multiple collections
   from multiple communities
  Query Syntax not apparent to users, e.g.
                    +water +rapid                                   [for exact word match]
                    "vapor generator"                               [for phrase search]


                                    Future Improvements (cont.)

  Insufficient capability for sorting search results
  Unable to display the number of items in a
   community and in a collection
                We have developed a JSP page to display the size of
                 the Repository
       No capability to transfer an item from one
        collection to another, or a collection from one
        community to another

        DSpace is open source software; its success depends
        on contributions from its user community

                                                               Conclusions

  DSpace was selected about two years ago to
   build the HKUST IR.
  The IR makes HKUST's scholarly research more
   openly and globally accessible.
  Installing DSpace is straightforward, but tailoring
   it to work effectively in your institutional
   environment is not trivial.




                                                      Conclusions (cont.)

       Customization:
                CJK support with UTF-8 encoding
                Driven by the fact that faculty are apathetic about self-
                 submission, a simple document submission form was
                 developed.
                Developed the “Add Item Form” to allow IR staff to
                 add items to DSpace without the need of batch
                 importing




                                                      Conclusions (cont.)

       By having the following implementations:
                DSpace's built-in OAI support
                OCLC's SRW/U on DSpace
                Google’s DSpace search filter
        documents in the Repository are more fully
        exposed on the Internet for easy harvesting,
        searching and discovery




                                                      Conclusions (cont.)

       Finally, many many thanks to the DSpace team
        from MIT and HP for developing this high quality
        open source product!




                                                                            Thank you!
                                                                              謝 謝!
