FLOW: Federating Libraries on the Web
ACM/IEEE Joint Conference on Digital Libraries: Portland, July 17, 2002
Anna Keller Gold (UC San Diego Libraries); Karen Baker (Scripps Institution of
Oceanography, LTER); Kim Baldridge (San Diego Supercomputer Center);
Jean-Yves LeMeur (European Center for Nuclear Research, CERN)
Outline:
1.   In theory: defining repository success and
     developing system requirements to match
2.   In practice: field report and local
     observations
3.   Next steps: developing for the future



                      JCDL, July 17 2002
1. In theory:
   The individual, team and network have document
    management needs in common
   Building successful research repositories entails
    active participation by relevant research
    communities in the full range of repository activities
    (“GSD”):
     Gather
     Share
     Discover

   Repository success depends on a good match between
    technical and social design.
       E.g., institutional vs. disciplinary repositories
   Good social design remains an unsolved research
    problem. See the call for participants for the October 2002
    conference addressing the cultural and management
    aspects of repository building, with emphasis on
    institutionally-based repositories:
               http://www.arl.org/ir2002.html
FLOW hypothesis:
Repository success depends on addressing
   divergent roles of repository participants and
   multiple levels of organization, including:
     1)   Divergence among ingrained, more-or-less well-
          functioning workflows and practices
     2)   Multiple (and differing) motivations for participation
          by individuals, groups, networks, institutions,
          disciplines


Practices and motivations, e.g. of:
   Individuals
   Research groups
   Institutions
   Disciplines




Practices as individuals:
     Notebooks, articles, office files
     Circulate preprints by mail, by email, and in
      person
     Personal web pages (multi-format links)
     Personal databases (e.g. flat files, citation
      managers, which support extracting records as
      well as downloading and importing them)
     Deposit to/extract from disciplinary repositories
      (e.g. arXiv)
Motivations as individuals:
   Tenure (maintain lists of peer-reviewed
    publications; track citation counts)
   Manage knowledge for easy retrieval and
    discovery
   Exchange with key colleagues
   Participate in building shared knowledge


Practices of research groups:
   Internal databases (shared)
   Web sites with lists




Motivations of research groups:
     Manage knowledge
     Track output (for funding agencies)
     Track impact (greater exposure leads to
      greater impact)
     Discovery



Practices of institutions, organizations:
     Publish (e.g. tech reports, conf.
      proceedings, journals)
     Create internal databases
     Establish repositories
     Establish libraries
     Create hybrid library/repositories


Motivations of institutions:
     Sharing, discovery, and reputation
     Management and reporting (including
      accountability to funding agencies)
     Archiving




Practices of disciplines:
     Professional society databases, portals
     Establish disciplinary repositories (may be
      distributed & federated or centralized, e.g.
      NCSTRL, arXiv)




Motivations of disciplines:
     Sharing
     Discovery




FLOW:
   The distinctive document management tools
    and practices used within each layer
    (individual, group, center, network,
    discipline) represent boundaries across which
    information could flow openly if technology
    and metadata could provide an enabling
    digital framework (“metadata grid”)


2. In practice:
   Field report of progress in creating a
    prototype repository at the San Diego
    Supercomputer Center using CERN’s
    CDSware
   Goal is to prototype a system that reconciles
    the divergent practices and motivations of
    target repository participants

CDSware: reasons for selection

   Proven institutional implementation at CERN
   Extended features fully implemented
    (personalization, review)
   OAI compliant
   Supports hybrid repository / bibliography
   Technical support and active development
   Open source
CDSware:
   CERN implementation of CDSware manages
    over 350 collections of data, consisting of
    over 550,000 bibliographic records, including
    220,000 full-text documents: preprints,
    articles, books, journals, photographs…
                http://cdsware.cern.ch/


CDSware:
   Configurable portal-like interface for hosting various
    kinds of collections:
       Powerful search engine with Google-like syntax
       User personalization, including document baskets and
        email notification alerts
       Electronic submission and upload of various types of
        documents
       Runs an OAI data and service provider, enabling
        metadata exchange between heterogeneous repositories
       Automated citation recognition and linking
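
To illustrate the OAI metadata exchange listed above, here is a minimal Python sketch of how a harvester might build a ListRecords request and read Dublin Core titles from the reply. The base URL and the sample response are invented for illustration and are not CDSware's actual endpoint or output; the `verb` and `metadataPrefix` parameters and the two namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL; a real repository publishes its own.
BASE_URL = "http://repository.example.org/oai"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return the dc:title values of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [t.text for t in root.findall(".//oai:record//dc:title", NS)]


# Invented sample response, trimmed to the elements used here.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url(BASE_URL)
titles = extract_titles(SAMPLE)
```

A real harvester would fetch `url` over HTTP and follow resumption tokens for large result sets; both steps are left out to keep the sketch self-contained.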

CDSware:
   MySQL database server (adaptable to Oracle)
   Apache/{PHP,Python} web application server
   Compile-time configuration via GNU Autoconf and
    WML
   Runtime configuration via MySQL configuration
    tables
   Integrates with other platform-independent services
       E.g. CDS Conversion Server – converts file formats
   Extensible: enables the integration of any other
    installation-specific application.

CDSware status:
   CDSware is a major revision and repackaging
    of CDS (CERN Document Server)
   First public release planned for July 2002
   Announce & users mailing lists released June
    2002
   News:
       http://cdsware.cern.ch/news/

Why another repository?
   Repositories and their design diverge in
    important ways:
       How things get in
       How things get out
       Who can put things in (and take out)
       What things can be put in
       What linkages they have to other systems
       What protocols/standards they follow

Comparing repository tools
 Tools compared: openEprints, CDSware, Reference Web Poster, library catalogs

 1. How things get in
    openEprints:
       * Deposit by registered / authorized people
    CDSware:
       * Deposit by registered / authorized people
       * Upload from structured file
    Reference Web Poster:
       * Upload by administrator from one or more private citation libraries
       * Additions to citation library can be batch-extracted from commercial sources
       * Additions may also be individual entries by private library manager
    Library catalogs:
       * FTP of batch files consisting of individual entries or single record
         copies from bibliographic utilities

 2. How things get out
    openEprints:
       * OAI metadata harvesting protocol
    CDSware:
       * OAI metadata harvesting protocol
       * Marked records can be downloaded singly or collectively
       * Personal “baskets” can be made, shared
       * Record output in XML, HTML, MARC, DC record formats
       * CERN applications support file format conversions
    Reference Web Poster:
       * Marked records may be downloaded to citation management software (Z39.80)
    Library catalogs:
       * Marked records may be extracted in printable or downloadable formats,
         e.g. to citation management software
Comparing repository tools
 Tools compared: openEprints, CDSware, Reference Web Poster, library catalogs

 3. Who can put things in
    openEprints:
       * Configurable: registered or authorized people; may include researcher
         direct deposits, or be configured to “flow” deposits through administrators
    CDSware:
       * Configurable: may be registered or authorized people: researchers in or
         outside the institution; may be linked to institutional ID
    Reference Web Poster:
       * Administrator with access to server and commercial software
    Library catalogs:
       * Specially trained and / or authorized staff, usually in libraries, using
         locally configured catalog software

 4. What things can be put in
    openEprints:
       * Focus on preprints, working papers (full text)
       * Other uses: conference proceedings (CalTech); other monographs
    CDSware:
       * Configurable; current support for documents with metadata or metadata alone
       * CERN configured for preprints, commercial articles, books, photos,
         presentations, etc.
       * Developing “people” records
    Reference Web Poster:
       * Articles and conference proceedings are focus
    Library catalogs:
       * Monographic works and entire journals are primary focus
Comparing repository tools
 Tools compared: openEprints, CDSware, Reference Web Poster, library catalogs

 5. What linkages to other systems
    openEprints:
       * OAI supports cross-repository searching
    CDSware:
       * OAI supports cross-repository searching
       * Linkages created to local applications and databases, e.g. personnel database
       * Upload from citation management software OK
    Reference Web Poster:
       * Primarily to commercial article databases for which citation management
         download filters have been written
    Library catalogs:
       * Via Z39.50, federated search of other library catalogs; extraction and
         deposit to parallel collective catalogs (OCLC, union catalogs)

 6. What protocols, standards followed
    openEprints:
       * DC
       * OAI-PMH
       * Crosswalks from other metadata formats
    CDSware:
       * OAI-PMH
       * DC
       * MARC21
       * Z39.80 (in dev.)
    Reference Web Poster:
       * Z39.80
       * MARC
    Library catalogs:
       * MARC
       * Z39.50
       * Z39.80
CDSware at SDSC:
   How things get in:
       One-by-one item deposits
       Batch uploading from local collections
       Goal: to also populate the collection via
        intelligent spidering of designated open
        collections/documents (ResearchIndex does this
        now)
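
Batch uploading from a local collection usually starts by parsing a file exported from citation management software. The slides do not name the file format, so the sketch below assumes RIS-style tagged text (two-character tag, two spaces, a hyphen, then the value) purely as an illustration; the sample entries are invented and the function is not part of CDSware.

```python
import re

# RIS-style line: two-character tag, two spaces, hyphen, optional space, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS-style tagged text into a list of records (tag -> list of values)."""
    records, current = [], {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or unrecognized lines
        tag, value = match.groups()
        if tag == "ER":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    if current:  # tolerate a missing final ER line
        records.append(current)
    return records


# Invented sample export with two entries.
SAMPLE = """TY  - CONF
AU  - Gold, Anna Keller
AU  - Baker, Karen
TI  - FLOW: Federating Libraries on the Web
PY  - 2002
ER  -
TY  - RPRT
TI  - Second invented entry
ER  -
"""

records = parse_ris(SAMPLE)
```

Repeated tags such as `AU` accumulate into a list, so multi-author entries survive the batch load intact.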


CDSware at SDSC:
   How things get out:
       Extract to bibliographic software
       Extract as XML
       Extract as MARC 21 records
       Extract as DC
       Batch or single item extraction
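
As a rough sketch of single-item and batch extraction in Dublin Core, the Python below serializes a record held as a plain dictionary into simple DC XML. The dictionary layout, the wrapping `record` element, and the function names are assumptions made for this example and do not reflect CDSware's internal record model; only the Dublin Core element names and namespace are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the conventional dc: prefix

# Subset of Dublin Core elements handled by this sketch.
DC_FIELDS = ("title", "creator", "date", "type", "identifier")


def record_to_dc(record):
    """Serialize a dict of field -> list of values as simple Dublin Core XML."""
    root = ET.Element("record")  # hypothetical wrapper element
    for field in DC_FIELDS:
        for value in record.get(field, []):
            element = ET.SubElement(root, "{%s}%s" % (DC_NS, field))
            element.text = value
    return ET.tostring(root, encoding="unicode")


def export_batch(records):
    """Batch extraction: one XML string per record."""
    return [record_to_dc(r) for r in records]


sample = {
    "title": ["FLOW: Federating Libraries on the Web"],
    "creator": ["Gold, Anna Keller", "Baker, Karen"],
    "date": ["2002"],
}
xml_out = record_to_dc(sample)
```

The same dictionary could feed other serializers (MARC 21, or a citation manager format), which is the point of keeping one internal record and several output formats.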



CDSware at SDSC:
   Who can put things in (or take out)
       Organization affiliates (tracked by personnel
        database)
       Registered affiliates (voluntary deposits),
        associated by research collaboration, or just
        research interest
       Any interested parties (extract only)
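
The three participant classes above can be read as a small permission table. The Python sketch below is one possible encoding, with role and action names invented for illustration; it is not how CDSware itself configures access.

```python
from enum import Enum


class Role(Enum):
    ORGANIZATION_AFFILIATE = "organization affiliate"  # tracked by personnel database
    REGISTERED_AFFILIATE = "registered affiliate"      # voluntary deposits
    PUBLIC = "any interested party"                    # extract only


# Hypothetical action names: "deposit" puts things in, "extract" takes them out.
PERMISSIONS = {
    Role.ORGANIZATION_AFFILIATE: {"deposit", "extract"},
    Role.REGISTERED_AFFILIATE: {"deposit", "extract"},
    Role.PUBLIC: {"extract"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]
```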


CDSware at SDSC:
   What things can be put in:
       Digital objects plus metadata
       Metadata only
       Document-like objects
       Event records
       People records (and associations with
        organizations and research groups)


CDSware at SDSC:
   What (data) linkages with other systems?
       Now: personnel database at SDSC
       Future:
           NSF grants database
           OpenURL
           Storage Resource Broker (SRB)




CDSware at SDSC:
   What protocols / standards followed:
       OAI Protocol for Metadata Harvesting (OAI-PMH)
       MARC 21
       Z39.80 (article databases, bibliographic software)
       DC




Design decisions:
     People and digital objects:
         Q: Are “creators” authors or people? A: Both.
         Integration with personnel database (also enables organization
          views – “all the people associated with XYZ research group”)
     Incorporate records for non-document objects (groups,
      people, grants)
     Allow hybrid system of metadata with or without
      associated digital objects
     Goal: end-user uploading from EndNote or similar
      commercial citation management software
     Genre-based views for public; organization views for
      center
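
The organization view described above ("all the people associated with XYZ research group") comes down to a join between people, groups, and records. The sketch below uses an in-memory SQLite database with an invented schema and invented sample rows; the actual SDSC personnel database and CDSware tables are not described in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE research_groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE membership (person_id INTEGER, group_id INTEGER);
CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE creators (record_id INTEGER, person_id INTEGER);

-- Invented sample data.
INSERT INTO people VALUES (1, 'Person A'), (2, 'Person B'), (3, 'Person C');
INSERT INTO research_groups VALUES (1, 'XYZ'), (2, 'ABC');
INSERT INTO membership VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO records VALUES (1, 'Report one'), (2, 'Preprint two');
INSERT INTO creators VALUES (1, 1), (1, 2), (2, 3);
""")


def people_in_group(conn, group_name):
    """Organization view: everyone associated with a research group."""
    rows = conn.execute(
        """SELECT p.name FROM people p
           JOIN membership m ON m.person_id = p.id
           JOIN research_groups g ON g.id = m.group_id
           WHERE g.name = ? ORDER BY p.name""",
        (group_name,),
    )
    return [row[0] for row in rows]


def records_for_group(conn, group_name):
    """Records whose creators belong to the given research group."""
    rows = conn.execute(
        """SELECT DISTINCT r.title FROM records r
           JOIN creators c ON c.record_id = r.id
           JOIN membership m ON m.person_id = c.person_id
           JOIN research_groups g ON g.id = m.group_id
           WHERE g.name = ? ORDER BY r.title""",
        (group_name,),
    )
    return [row[0] for row in rows]
```

Treating creators as people (rows in `people`) rather than as free-text author strings is what makes the group-level query possible.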
Accomplishments:
   Formed interdisciplinary team
   Assessed available repository software and design
    choices
   Demonstrated upload from test citation management
    file
   Integrated repository database with internal “people”
    table linking people with organizations
   Grounded the design in both local practices and
    management demands
Next steps:
   Complete demonstration of submit and upload
    functions from citation management software
    and grants database
   Populate database using both individual and
    batch submissions
   Demonstrate internal views of data for
    program administrators

Conclusion:
   Further work needed to address integration of
    repository building with researcher workflow.
   Further assess centrality of people and
    organizations in digital libraries / repositories.
   Further assess the prospect of creating a metadata
    grid in which participation and flow are
    multilateral and multidirectional.
        In short – continued work toward…
D-Repository Grail:
   Accommodate current practices at all levels
                        and
   Enhance participation at all stages of research
    / learning process.




Acknowledgements:

   Programming support:
         Frank Sudholt and Josh Polterock (SDSC)
   Integrative Biosciences at SDSC
   NSF (DBI and OPP)




References and more information
   CDSware:
       http://cdsware.cern.ch/
   CDS at SDSC:
       agold@ucsd.edu



