					                                                      a centre of expertise in data curation and preservation




 “Tomorrow, and tomorrow, and tomorrow”:
the players on the curation stage
                             Chris Rusbridge
                          Presentation at OCLC
                                                                                                   Funded by:
  This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK:
  Scotland License, excluding content that is the property of others. To view a copy of this license,
  (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative
  Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.




        "To-morrow, and to-morrow, and to-morrow,
        Creeps in this petty pace from day to day,
        To the last syllable of recorded time;
        And all our yesterdays have lighted fools
        The way to dusty death.
        Out, out, brief candle!
        Life's but a walking shadow; a poor player,
        That struts and frets his hour upon the stage,
        And then is heard no more: it is a tale
        Told by an idiot, full of sound and fury,
        Signifying nothing."
                                            •Shakespeare: Macbeth
OCLC October 2006




                    •Dunsinane Hill




                                                       •Photo by Fabrice












                    Contents
     •   Curation and the Digital Curation Centre
     •   Science and Data Citations
     •   The “poor players” of data curation
     •   Sustainability of curated data
     •   Macbeth again…








                     Curation
     • Data increasingly important as evidence
        • Experimental verifiability (the basis of science)
        • Unrepeatable observations & experiments
          (particularly environmental in broadest sense)
        • Legal, compliance & transactions
        • Cultural resources


     • “Preservation” view vs “Publishing” view






                Lynch remarks
     • Closing the Curation Conference
     • 3 views of digital curation
        • Finite process, handover to preservation
        • Whole life process, evolving object(s)
        • Collection as a living thing








                    Digital curation?
                                                        For later use

                                                                    Static

                               Digital preservation








                        Digital curation?
      In use now (and the future)                                 For later use

      Dynamic                                                                 Static
      Long-term
                    Digital curation     Digital preservation








                       Digital curation
      In use now (and the future)                                 For later use

      Dynamic                                                                 Static
      Long-term
                     Digital curation & preservation



      “maintaining and adding value to a trusted body
      of digital information for current and future use”






                    Mission
       “The over-riding purpose of the DCC is to
       support and promote continuing improvement
       in the quality of data curation, and of
       associated digital preservation”







  Organisation to Engage & Collaborate
     [Diagram: DCC functions (community support & outreach; service definition & delivery; management & admin support; research; development co-ordination; testbeds & tools) linked to communities of practice (users), curation organisations (eg DPC), the Associates Network, research collaborators, Industry and standards bodies]




 Organisation to Engage & Collaborate: Leads
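     [Diagram: the same organisation with lead partners: community support & outreach, Bath; service definition & delivery, Glasgow; management & admin support, Edinburgh; research, Edinburgh; development co-ordination and testbeds & tools, CCLRC. External links as before: communities of practice (users), curation organisations (eg DPC), Associates Network, research collaborators, Industry, standards bodies]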





              Associated work
     • DCC LOCKSS Technical Support Service
        (Lots of Copies Keep Stuff Safe)
     • DCC SCARP Project
        • Disciplinary approaches to sharing, curation, re-
          use and preservation
     • Associated EU projects
        • CASPAR
        • Digital Preservation Europe
        • PLANETS





                    Phase 2
     • Externally-moderated, reflective self-
       evaluation completed
     • Phase 2 proposal (2007/10) to JISC
        • Accepted: focus on science data, reduced scale
     • EPSRC-funded Research continues until
       2007/8








     2nd International Digital Curation
               Conference
     • Research & invited presentations
     • Glasgow, 21/22 November, 2006
     • Please register at:
       http://www.dcc.ac.uk/events/dcc-2006/












          Data resource stages
     • Curated data is created…
        • Observations? Fixed!
     • Or acquired…
        • Data brought/bought from outside
        • Ingest
     • Development
        • Derived, refined, combined, processed data
        • Potentially many stages







     [Images: TWOMASS (Infrared); SDSS (Visual)]
                                               Slide from Rajendra Bose




                                        Slide from Rajendra Bose




               New discovery…
     • National Virtual Observatory
        • Johns Hopkins press release: “Scientists working to create the
          NVO, an online portal for astronomical research unifying dozens of
          large astronomical databases, confirmed discovery of [a] new
          brown dwarf recently. The star emerged from a computerized
          search of information on millions of astronomical objects in two
          separate astronomical databases. Thanks to an NVO prototype,
          that search, formerly an endeavor requiring weeks or months of
          human attention, took approximately two minutes.”








                    Context
     • Data meaningless without context
        • Linkage
        • Metadata of many kinds
        • Workflow!
     • Provenance
        • Computational lineage
        • Authenticity
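The provenance bullets above can be made concrete as a lineage graph. Each derived dataset records the datasets it was computed from, and walking those links upstream recovers its computational lineage. The Python sketch below is a minimal illustration under assumed names: `Dataset`, `LineageRegistry` and the dataset names in the usage note are all hypothetical and do not describe any system mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset and the names of the datasets it was derived from (hypothetical model)."""
    name: str
    process: str                                     # free-text description of the producing step
    inputs: list[str] = field(default_factory=list)  # names of upstream datasets


class LineageRegistry:
    """Hypothetical registry recording how each dataset was produced."""

    def __init__(self) -> None:
        self._items: dict[str, Dataset] = {}

    def record(self, ds: Dataset) -> None:
        # Reject derivations from unrecorded inputs, so every lineage chain
        # ends at a documented source (an observation or an acquisition).
        missing = [name for name in ds.inputs if name not in self._items]
        if missing:
            raise KeyError(f"unknown inputs: {missing}")
        self._items[ds.name] = ds

    def lineage(self, name: str) -> list[str]:
        """Return all upstream datasets of `name`, nearest first (breadth-first)."""
        seen: list[str] = []
        queue = list(self._items[name].inputs)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._items[current].inputs)
        return seen
```

Suppose you record `raw_obs`, `calibrated` (from `raw_obs`), `ancillary`, and `composite` (from `calibrated` and `ancillary`). In that case, `lineage("composite")` returns `["calibrated", "ancillary", "raw_obs"]`. Refusing unknown inputs at record time is a design choice: it keeps lineage complete rather than letting it silently stop at an undocumented source.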







     [Diagram: data-processing workflow spanning NASA, University research groups 1, 2 and 3, and a local decision-making body; labels include PAR subscene, E0, Csat, 8-day composite and subscene, HRPT, SST, Ctot calc, Zeu calc, PP eu calc and Pbopt calc]
                                                                                  Slide from Rajendra Bose




            Access and re-use
     • Ethics and rights control access
        • Our means of expressing these controls long-term are weak
     • Collaboration tools
        • Annotation, discussion, review
        • Re-use leading to change and development
     • “Publication”
        • Not just in “print”
        • Underlying data should be “published”, too
     • Citation…





 CLADDIER citation investigation
   “My last example was an MST data set held at the BADC, and I was
     suggesting something like this (for a citation):
   <Citation><Author> Natural Environment Research Council </Author>
   <Title> Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth </Title>
   <Medium> Internet </Medium>
   <Publisher> British Atmospheric Data Centre (BADC) </Publisher>
   <PublicationDate status="ongoing"> 1990</PublicationDate>
   <Identifier> badc.nerc.ac.uk/data/mst/v3/upd15032006</Identifier>
   <Feature><FeatureType>http://featuretype.registry/verticalProfile</FeatureType>
   <LocalID>200409031205</LocalID></Feature>
   <AccessDate> Sep 21 2006 </AccessDate>
   <AvailableAt><url>http://badc.nerc.ac.uk/data/mst/v3/</url></AvailableAt>
   </Citation>
   (Made up tags!)”




                                                       •Bryan Lawrence Weblog
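Because the proposed citation is structured, software can extract what a reader or a link resolver needs from it. The Python sketch below parses the example above with the standard library's `xml.etree.ElementTree` and renders a one-line citation. The tag names are the blog's own “made up” ones. The rendering style is an assumption, not something CLADDIER proposed; this includes the open-ended “1990-” used for an ongoing data set.

```python
import xml.etree.ElementTree as ET

# The example citation from the CLADDIER discussion (tags are illustrative).
CITATION_XML = """
<Citation>
  <Author>Natural Environment Research Council</Author>
  <Title>Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth</Title>
  <Medium>Internet</Medium>
  <Publisher>British Atmospheric Data Centre (BADC)</Publisher>
  <PublicationDate status="ongoing">1990</PublicationDate>
  <Identifier>badc.nerc.ac.uk/data/mst/v3/upd15032006</Identifier>
  <Feature>
    <FeatureType>http://featuretype.registry/verticalProfile</FeatureType>
    <LocalID>200409031205</LocalID>
  </Feature>
  <AccessDate>Sep 21 2006</AccessDate>
  <AvailableAt><url>http://badc.nerc.ac.uk/data/mst/v3/</url></AvailableAt>
</Citation>
"""


def text(root: ET.Element, path: str) -> str:
    """Return the stripped text at `path`, or '' if the element is absent."""
    node = root.find(path)
    return (node.text or "").strip() if node is not None else ""


def format_citation(xml_source: str) -> str:
    """Render the citation as a single line (a hypothetical house style)."""
    root = ET.fromstring(xml_source.strip())
    year = text(root, "PublicationDate")
    date_el = root.find("PublicationDate")
    if date_el is not None and date_el.get("status") == "ongoing":
        year += "-"  # an ongoing data set has no end date yet
    return (
        f"{text(root, 'Author')} ({year}). {text(root, 'Title')}. "
        f"{text(root, 'Publisher')}. {text(root, 'Identifier')}, "
        f"feature {text(root, 'Feature/LocalID')}. "
        f"Available at {text(root, 'AvailableAt/url')} "
        f"(accessed {text(root, 'AccessDate')})."
    )
```

For example, `format_citation(CITATION_XML)` returns “Natural Environment Research Council (1990-). Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth. British Atmospheric Data Centre (BADC). badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205. Available at http://badc.nerc.ac.uk/data/mst/v3/ (accessed Sep 21 2006).”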




CLADDIER 2: “Version of record”
     • Role of Publisher: add value
        • provision of catalogue metadata
        • some commitment to maintenance of the resource
          at the AvailableAt url
        • some commitment to the resource being
          conformant to the description of the Feature
        • some commitment to the maintenance of the
          mapping between the identifier [LocalID] and the
          resource.



                                              •Bryan Lawrence Weblog




      CLADDIER 3: persistence
     • Wayback Machine
        • Only snapshots (eg only 2004 version of Bryan’s home
          page!)
     • WebCite
        • allows the creator of content to submit URLs for [archiving],
          thus ensuring when one writes an academic document, the
          material will be archived, and the citation will be persistent
        • But no real help for data…
     • “… only allow [data citation] when we believe in the
       persistence of the organisation making the data
       available…”

                                                    •Bryan Lawrence Weblog








                          Citation
     • Needs a stable resource to cite…
        OWL Web Ontology Language
        Reference
         W3C Proposed Recommendation 15 December 2003
         This version:
         http://www.w3.org/TR/2003/PR-owl-ref-20031215/
         Latest version:
         http://www.w3.org/TR/owl-ref/
         Previous version:
         http://www.w3.org/TR/2003/CR-owl-ref-2003081

     • (FRBR works & expressions?)
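The W3C pattern above gives each release a fixed, dated URI and keeps a separate “latest version” URI that moves. The Python sketch below models that pattern with hypothetical classes (`Version`, `VersionedDocument`). Whichever address the reader started from, a citation then records the dated URI. The URIs in the test are placeholders, not W3C addresses.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    uri: str      # fixed, dated URI: what it points to never changes
    issued: date
    status: str   # maturity label, e.g. "CR" or "PR" in W3C usage


class VersionedDocument:
    """Hypothetical registry modelled on 'this / latest / previous version' links."""

    def __init__(self, latest_uri: str) -> None:
        self.latest_uri = latest_uri          # moving alias: not suitable for citation
        self._versions: list[Version] = []

    def publish(self, v: Version) -> None:
        # Enforce date order so that "previous version" is always well defined.
        if self._versions and v.issued <= self._versions[-1].issued:
            raise ValueError("versions must be published in date order")
        self._versions.append(v)

    def resolve(self, uri: str) -> Version:
        """The alias resolves to the newest version; a dated URI to itself."""
        if uri == self.latest_uri:
            if not self._versions:
                raise KeyError(uri)
            return self._versions[-1]
        for v in self._versions:
            if v.uri == uri:
                return v
        raise KeyError(uri)

    def previous(self, v: Version) -> Optional[Version]:
        """Return the version published before `v`, or None for the first."""
        i = self._versions.index(v)
        return self._versions[i - 1] if i > 0 else None

    def citable_uri(self, uri: str) -> str:
        """Turn whatever URI a reader has into the stable, dated one to cite."""
        return self.resolve(uri).uri
```

The key point is the asymmetry: the alias is convenient for readers, but only the dated URI makes a stable citation target.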




                      Citation…
     • The date alone (as in common web citation
       approaches) is not enough!
                    •[6] The CIA World Factbook.
                    •www.cia.gov/cia/publications/factbook/.
                    •Retrieved on 8 Jan 2006.
        • Cited object likely to have changed…
        • Citation should link to the cited object as it was!
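A web citation can say more than “Retrieved on 8 Jan 2006” if it also records a cryptographic digest of the content as it was cited. The Python sketch below illustrates that idea; it is not a scheme proposed in the talk. `PinnedCitation`, `cite` and `still_matches` are hypothetical names, and SHA-256 is simply an assumed choice of digest.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedCitation:
    """A web citation that records a digest of the bytes cited, not only a date.

    Field names are illustrative; no citation standard is implied.
    """
    title: str
    url: str
    retrieved: str   # the conventional "Retrieved on ..." date
    sha256: str      # hex digest of the content as it was when cited


def cite(title: str, url: str, retrieved: str, content: bytes) -> PinnedCitation:
    """Create a citation pinned to the exact bytes retrieved."""
    return PinnedCitation(title, url, retrieved, hashlib.sha256(content).hexdigest())


def still_matches(citation: PinnedCitation, content_now: bytes) -> bool:
    """True only if the content served now is byte-identical to what was cited."""
    return hashlib.sha256(content_now).hexdigest() == citation.sha256
```

A digest tells a reader whether the cited object has changed, but it cannot recover the earlier state. Recovering that state still requires an archive of past versions.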








                 Citation needs…
     • An efficient way to reference and access “archived” past states
       of a changing dataset (work in progress, Buneman et al)
     • Not important for original observations
         • Don’t mess with those data
     • Less important for incremental datasets
         • Later stuff should not invalidate earlier
     • Very important for revisable datasets
         • Eg Genomics… datasets that result from the combined work of
           curators, or contain opinions or facts likely to change
         • Eg Mapping… OS maps represent a huge database that changes
           on a daily basis






 XMLArch: System Architecture
     [Diagram: a Data Extractor produces an XML Snapshot at time t from a Relational Database; the XML Archiver (Pre-processor, Version Merger) merges it with the XML Archive at time t - 1 to give the XML Archive at time t]
                                                                              •Carwyn Edwards
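The architecture above merges each new snapshot into the existing archive rather than storing every snapshot separately. The Python sketch below shows that merging idea on a deliberately simplified model: flat records identified by a key, rather than the keyed XML documents in the diagram. `KeyedArchive` and its methods are hypothetical names. Each distinct value is stored once and tagged with the versions in which it held, so any past state of a revisable dataset can be reconstructed and cited.

```python
from collections import defaultdict


class KeyedArchive:
    """Simplified sketch of archiving by merging versions (hypothetical API).

    Each keyed record keeps its successive values, each tagged with the set
    of version numbers in which the record had that value. A value that does
    not change is stored once; only its version set grows.
    """

    def __init__(self) -> None:
        # key -> list of (value, set of versions in which key had that value)
        self._history: dict[str, list[tuple[str, set[int]]]] = defaultdict(list)
        self.version = 0

    def merge(self, snapshot: dict[str, str]) -> int:
        """Merge the snapshot at time t into the archive at time t - 1."""
        self.version += 1
        for key, value in snapshot.items():
            entries = self._history[key]
            if entries and entries[-1][0] == value:
                entries[-1][1].add(self.version)         # unchanged: extend its versions
            else:
                entries.append((value, {self.version}))  # new or changed value
        return self.version

    def retrieve(self, version: int) -> dict[str, str]:
        """Reconstruct the dataset exactly as it was at `version`."""
        return {
            key: value
            for key, entries in self._history.items()
            for value, versions in entries
            if version in versions
        }
```

Each version number appears in at most one entry per key, so `retrieve` returns an unambiguous past state. A key that disappears and later returns with the same value simply gets a gap in its version set.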




   Who are the curation players?








            Curation: Individual
     • “Small science” produces 2-3 times more data than “Big
       science”, but that data is much more at risk
     • PhD student? RA? PI? Administrator? IT support?
     • Data potentially on local hard drives, or at best
       shared network drives
        • May be inadequately protected
        • Liable to policy-led deletion on resignation
     • Individual “knows” too much
        • Documentation/metadata unlikely to be adequate
     • Tomorrow: gone!





         Department: eCrystals
                               • Specialist department
                                 archive (& national service)
                               • Workflow recording of lab
                                 parameters (R4L)
                               • Public & private elements
                               • Trying to build eCrystals
                                 federation (eBank 3)
                               • But… ReciprocalNet?
                                 French COD efforts?
                                 Fragmented discipline!
                               • Tomorrow: likely to continue







    Institution: Cambridge Chemistry
                                 • 175,000 small molecule
                                   structures in CML
                                 • Alongside Archaeology,
                                   Manuscripts, Learning
                                   Materials, etc
                                 • No library curation skills;
                                   dependent on research
                                   group enthusiast
                                 • Collection isolated from
                                   other Chemistry
                                 • Tomorrow: assured…








              Community: CDL
                                   • Shared effort from
                                     group of institutions
                                   • Compare OhioLINK?
                                   • Document tradition, not
                                     data
                                   • Passive role re
                                     collections
                                   • Rely on departmental &
                                     domain expertise
                                   • Tomorrow: assured…





           Community: SDSC?
                                 • Data specialists
                                 • Multiple disciplines
                                 • Distinct from domains;
                                   curation dependent on
                                   external expertise
                                 • Research ethos
                                 • Tomorrow: dependent
                                   on grant/contract
                                   income & research
                                   priorities





         Community: LOCKSS?
                                 • Self-selected group of
                                   collectors: closest to genuine
                                   open activity (despite
                                   Alliance)?
                                 • Traditionally libraries
                                   collecting eJournals
                                 • Model respects IPR
                                 • No domain expertise; rely on
                                   origins
                                 • Data limitations…
                                 • Tomorrow: potentially very
                                   persistent (low cost, high
                                   reliability, attack resistance,
                                   distributed)
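The persistence claim rests on keeping many independent copies and using them to check each other. The Python sketch below shows only that core idea: compare digests across peers, then repair any copy the majority disagrees with. The real LOCKSS protocol is considerably more elaborate, with polls among peers designed to resist attack. `audit_and_repair` and the peer names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """SHA-256 digest used to compare copies without shipping the content."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Compare peers' copies by digest and repair any copy outvoted by the majority.

    Modifies `replicas` in place and returns the names of repaired peers.
    Raises if no strict majority agrees, since the intact copy is then unknown.
    """
    votes = Counter(digest(c) for c in replicas.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(replicas):
        raise RuntimeError("no majority: cannot decide which copy is intact")
    good_copy = next(c for c in replicas.values() if digest(c) == winner)
    repaired = []
    for peer, content in replicas.items():
        if digest(content) != winner:
            replicas[peer] = good_copy  # repair from an agreeing peer
            repaired.append(peer)
    return repaired
```

Requiring a strict majority is the conservative choice. With an even split, the sketch refuses to guess rather than risk overwriting the intact copy.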





        Discipline: Archaeology
                                  • Staffed by archaeologist
                                    curators
                                  • Understand special
                                    legal issues
                                  • Strong relationship with
                                    community & peers
                                  • Internationally still
                                    fragmented?
                                  • Tomorrow: dependent
                                    on research council
                                    grants + deposit funding






          Discipline: Astronomy
                                   • Part of major
                                     international effort
                                   • Expensive shared
                                     facilities, global reach
                                   • Well integrated into
                                     community
                                   • Enable new science
                                   • Tomorrow: assured by
                                     community (another
                                     large facility)






         Discipline: Atmosphere
                                  • Strong believer in need
                                    for domain scientists as
                                    curators
                                  • Significant participant in
                                    “community proxy”
                                    agenda-setting activities
                                  • Internationally
                                    fragmented resources
                                  • Tomorrow: mostly
                                    dependent on grant
                                    funding (but strong
                                    commitment)





       Discipline: Pharmacology
                                 • International Scientific
                                   Union
                                 • Attempting to build
                                   credit for data
                                   contributions
                                 • DB ownership rotates
                                 • Tomorrow: extremely
                                   limited funding








      Discipline: Social Sciences
                                  • Mature!
                                  • Staffed by Social
                                    Science curators
                                  • Alert to opportunities
                                  • Able to appraise
                                    material offered
                                  • Strong relationship to
                                    discipline
                                  • Tomorrow: assured
                                    through broad mix of
                                    funding streams




      Publisher: Crystallography
                                 • Publisher and Scientific
                                   Union
                                 • Created key domain
                                   crystallographic standard
                                   (CIF)
                                 • Strong motivator for deposit
                                   of structure data
                                 • Consistent quality checks
                                 • DOIs used for structure data
                                 • Tomorrow: publishing
                                   business model



                                                 •Slide from IUCr




   National bodies: British Library
                                 • Serious and robust
                                   approach
                                 • Legal deposit powers &
                                   responsibilities as driver
                                 • Oriented primarily
                                   towards “cultural
                                   heritage” (broadly
                                   interpreted)
                                 • Little data, no science
                                   domain experience
                                 • Tomorrow: strong future
                                   commitment





    National bodies: TNA/NDAD
                                 • Specialist archive for
                                   government datasets
                                 • Understand government
                                   regulations, dynamics &
                                   requirements
                                 • Subject generalists;
                                   disconnected from
                                   associated science
                                 • Technology specialists
                                   (understand databases)
                                 • Tomorrow: likely to pass
                                   eventually to The National
                                   Archives







    National bodies: NOAA (etc)
                                 • Government body
                                   making serious data
                                   available
                                 • Domain scientists
                                   curate data
                                 • Operates in current
                                   political context (!)
                                 • Tomorrow: reasonably
                                   assured but some un-
                                   funded mandates?






            3rd parties: OCLC?
                                   • Should this be
                                     community?
                                   • Demand driven
                                   • No domain science
                                     expertise: rely on
                                     origins
                                   • Tomorrow: business
                                     case








            3rd parties: Portico
                                    • Specific area: eJournals
                                    • Depends on publisher
                                      agreements
                                    • No data or domain
                                      science expertise
                                    • Tomorrow: commitment
                                      from Mellon +
                                      publishers +
                                      subscriptions, good
                                      funding mix






      3rd Parties: Iron Mountain
                                 • Records management
                                   IS a curation problem
                                 • Organisations like this
                                   very likely to branch out
                                 • No domain science
                                   expertise
                                 • Tomorrow: business
                                   case, viability, stock
                                   market…







       Institutions & the network
     • Institutions have some fundamental
       sustainability
     • Disciplines live in the network; sustainability is
       an issue
     • Can we get the best of both?








                  Intersections…
                Institution Institution Institution                  etc
                     1           2           3
     Discipline      X                       X
         1
     Discipline                  X                 X
         2
     Discipline      X           X
         3
        etc






    Who are the curation players
              again?








          Project StORe findings
  • Discipline commonality from survey (Miller, UKDA, 2006):
     •   2-way links between data & publication useful
     •   Barriers to actual deposit of data/outputs
     •   Sharing data important, likely between colleagues
     •   Perceived inconsistency across repositories
     •   Most common searching: Google type
     •   Researchers favour self-reliance rather than library support
     •   Recognise need for common minimum metadata
  • Aim for pilot linking middleware demonstrator
  • “Creating small scale ‘silos’ of information with institutional
    repositories is not … a compelling information
    management strategy in the ‘Google age’” (Heery &
    Anderson for JISC, 2005)




      Sustainability: tomorrow is the
             emerging worry
     • Sustainability work package in DCC (new
       grant!)
     • JISC/NDIIPP meeting addressed it
     • AHRC report draft soon
     • Research Information Network report draft
     • JISC study on sustainable IT systems for HE
     • Recent ARL/NSF workshop, NSF strategy






          Sustainability of what?
     •   Repository as an organisation
     •   Repository as a service
     •   Repository as a system
     •   Repositories as a network (federation?)
     •   Collections and objects supported by
         repositories

     • Commit to collection: contract the manager!





                    Social factors
     • Commitment essential… much more than anything else
       (cf persistent identifiers)
     • Funder requirements express social determination
        • Policy & grant application forms, selection criteria
        • Monitoring essential
     • Legal, ethical, IPR impacts all significant
     • Public good questions
        • Academic credit (citations?)
        • Free-loaders (embargoes?)
        • Disciplines are different!
     • Workforce skills: researcher, data librarian/scientist





     Sustainability a function of...
     •   Commitment
     •   Goals
     •   Value and cost
     •   Business model
     •   Time
     •   Environment
     •   Domain knowledge and information
     •   Dimensions (how much stuff)
     •   Technical approaches
     •   Usage





                So, tomorrow…
     • Digital data repositories already sustained > 30 years
        • How?
        • Vision, leadership, commitment
     • Libraries, archives, museums sustained 100s of
       years
        • How?
        • Aggregate value proposition
        • Perception now under threat!


     • Collectively we need to identify the next steps toward
       digital data sustainability, for tomorrow, and
       tomorrow, and tomorrow!





             Macbeth again…
          "To-morrow, and to-morrow, and to-morrow,
          Creeps in this petty pace from day to day,
          To the last syllable of recorded time;



          …it is a tale
          Told by an idiot, full of sound and fury,
          Signifying nothing."








          Mission (impossible?)
     • To that last syllable of recorded time
     • Keep our tales forever full of significance!




                        Thank you



