Disciplines, Documents, and Data:
Convergence and Divergence in the Scholarly Information Infrastructure
Christine L. Borgman
Professor & Presidential Chair in Information Studies, University of California, Los Angeles
i-Conference Plenary Presentation, 17 October 2006
These slides are available under Creative Commons Non-commercial Attribution License, Christine L. Borgman, 2006 http://creativecommons.org
Scholarly Information Infrastructure
Cyberinfrastructure, e-Science, e-Social Science, eHumanities, … e-Research
Goal: enable new forms of scholarship that are
  information-intensive
  data-intensive
  distributed
  collaborative
  multi-disciplinary
Means: use information technology to
  improve access to scholarly information
  manage the “data deluge”
  leverage data as a form of scholarly capital
http://www.fairholme.qld.edu.au/images/Image66.jpg
Driving Forces
Technology push
  Distributed access to content and computing resources
  Tools and services for data collection, mining
Collaboration pull
  Virtual organizations
  Share distributed resources
Open science, open scholarship
  Open access
    “Author self-archiving”
    “Author pays” journals
    Institutional repositories…
  Open source software
http://www.tzanis.org/tzanisblog/archives/images/push-pull-thumb.jpg
e-Science infrastructure: Layered Model
Applications Space: User Interfaces & Tools
Information & knowledge layer: Content (Digital Libraries, Scientific DBs)
Middleware services layer
ITC Infrastructure: Processors, memory, network
Slide courtesy of Stephen Griffin, NSF, and Norman Wiseman, JISC
Content layer
Documents
  Publications: books, journals, conference papers, ...
  Semi-formal: technical reports, working papers, proposals…
  Unpublished: websites, blogs, wikis…
Data
  Observational
  Computational
  Experimental
  Records
Composite objects
http://www.medscape.com/content/2004/00/46/81/468129/art-mgm468129.fig1.jpg
Value chain of information
Links
  Cited/citing documents
  Publications to data sources
  Data to publications in which reported
Across boundaries
  Repositories
  Publisher databases
  Disciplines
  Countries
Image: http://www.indexgeo.com.au/tech/asdd/discover.gif
How do publications enter the value chain?
Function: Legitimization (Authority, quality, priority, trustworthiness)
  Print: Peer review
  Digital: Peer review
Function: Dissemination (Awareness, diffusion, publicity)
  Print: Publisher; Pre-print distribution (Copy, Mail)
  Digital: Publisher; Pre-print distribution (Post on Web, Deposit)
Function: Access, preservation, curation (Availability, discovery, retrieval, persistence)
  Print: Library
  Digital: Library; Publisher; Repository; Homepage
Role of data in the value chain
Scholarly capital
  Human capital
  Instrumentation
  Data
Leverage research investments
  Replicate, verify research findings
  Ask new questions with extant data
    Computational biology, chemistry
    Longitudinal and comparative social research
    Mining large bodies of text
Collaborative research
  Data creation
  Data sharing, reuse
Roman Forum, Western End, ca. 400AD, copyright Regents of the University of California
What are data?
Technical definition:
A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen.
Reference Model for an Open Archival Information System (2002).
Socio-technical definition:
“alleged evidence” (Buckland, 2006)
Image: http://cdiac.ornl.gov/oceans/NAtl_map.jpg
How do data enter the value chain?
Function: Legitimization (Authority, quality, priority, trustworthiness)
  Reported in a publication: Peer review in context
    Quality of method
    Evidence for conclusions
    Verify, reanalyze?
  Contributed to a data repository: Peer review
    Quality of metadata, documentation
    “test drive” the data
    Author reputation
Function: Dissemination (Awareness, diffusion, publicity)
  Reported in a publication: Description in a publication
  Contributed to a data repository: Repository publisher
Function: Access, preservation, curation (Availability, discovery, retrieval, persistence)
  Reported in a publication
    Request to author
    Author maintains own data
    Author directs requestor to data source
  Contributed to a data repository: Repository
    Immediate access
    Embargo period
    Curation responsibility
Disciplinary differences in the content layer
Sciences Social sciences Humanities
Image: Christine L. Borgman, 2005
Citation distribution of scientific literature
Graphs by Jillian Wallis, UCLA
Science Citation Index coverage
SCI and Scopus Coverage
Citation distribution of social scientific literature
Social Sciences Citation Index coverage
SSCI and Scopus coverage
Citation distribution of humanities literature
Arts & Humanities Citation Index coverage
A&HCI and Scopus coverage
Disciplinary Comparison of Online Access to Literature
Business models
  Web of Knowledge: Long tail
  Scopus: Hits
Sciences
  Steepest curve: shortest time span of use
  Online access goes furthest into the tail
  Most advantaged of the disciplines
Humanities
  Shallowest curve: longest time span of use
  Online access goes least far into the tail
  Least advantaged of the disciplines
Mass digitization projects will favor the humanities
Scientific data
Examples
  Ecology: weather, ground water, sensor readings, historical record
  Medicine: X-rays
  Chemistry: protein structures
  Astronomy: spectral surveys
  Biology: specimens
  Physics: events, objects
  Documentation: lab and field notebooks, spreadsheets
Sources
  Generate own data
  Acquire from collaborators, other scientists
  Data repository
Social scientific data
Examples
  Opinion polls
  Surveys, interviews
  Mass media
  Laboratory experiments
  Field experiments
  Demographic records
  Census records
  Voting records
  Economic indicators
Sources
  Generate own data
  Acquire from other scholars
  Data repositories: Social Surveys
  Government records
  Corporate records
http://www.census.gov/population/cen2000/map02.gif
Humanities data
Examples
  Newspapers
  Photographs
  Letters
  Diaries
  Books
  Articles
  Birth, death, marriage records
  Church records
  Court records
  School and college yearbooks
  Maps…
Sources
  Search libraries, archives, public records
  Acquire from other scholars
  Data repositories: Beazley, Arts & Humanities Data Service (UK)
  Corporate records, mass media
http://ecai.org/silkroad/cultures/index.html
Data sources by discipline
Research data generated within the field
  High: Sciences
  Medium: Social sciences
  Low: Humanities
Non-research data from external sources
  High: Humanities
  Medium: Social sciences
  Low: Sciences
Incentives to share
Tradition of “open science”
Collaboration
Reciprocity
Recognition
Coercion: required by funding agency, journal, university
Image source: www.buffaloworks.us/images/sharing%20orangs.jpg
Contributing to the content layer
Publications
  Publisher dissemination, library access
  “Author self-archiving” (Harnad)
    Institutional or disciplinary repositories
    Post online: personal, lab, department site
Contribution rates
  Physics (arXiv): very high
    4,000 new submissions/month
    15,000 to 38,000 connections/hour
  Other fields: about 15%
  PubMed Central: less than 4%
Christine L. Borgman, 1995
Contributing to the content layer
Data
  Maintain locally, release upon request
  Maintain locally, post publicly, provide links from publications
  Contribute to data repository
    Discipline: Protein Data Bank, Survey Research center…
    Institutional: University
Contribution, sharing rates
  Mandatory: high
  Optional: low
Image source: www.buffaloworks.us/images/sharing%20orangs.jpg
http://buckminster.physics.sunysb.edu/images/ac60poly.jpg
Data sharing in science
Physics: shared by collaborators, not openly published
Genomics: deposit expected
Seismology: two-year embargo
Chemistry: highly contentious intellectual property
Ecology: many small, local projects, local data
Image source: http://www.bbc.co.uk/schools/gcsebitesize/img/ict04datastorage.gif
Incentives not to share
Rewards for publication, not for data management
Effort to document data
Competition, priority of claims
Intellectual property
  Control of own resources
  Access to resources controlled by others
Image source: www.buildingsrus.co.uk/.../ target1.htm
Sciences: Incentives not to share
Rewards for publication, not for data management
Effort to document data
  Local, personal use: minimal
  Shared use: extensive, follow metadata standards
Competition, priority of claims
  Control until publication
  Control until finished mining data
Intellectual property
  Control of own resources
    Collaborative resources
      Authority to release
      Multiple jurisdictions
    Concerns for misuse, misinterpretation
    Concerns for “free riders”
    Resale value of products - chemical, drug data
http://nick.omp.net/graphics/tongue.jpg
Social sciences: Incentives not to share
Rewards for publication, not for data management
Effort to document data
  Local, personal use: minimal
  Shared use: extensive, follow metadata standards
  Personal data must be de-identified
Competition, priority of claims
  Control until publication
  Control until finished mining data
Intellectual property
  Control of own resources
    Collaborative resources
      Authority to release
      Multiple jurisdictions
    Concerns for misuse, misinterpretation
    Concerns for “free riders”
    Resale value of products - low
    Confidential data that cannot be released
  Use of resources controlled by others
    Acquire permission to use resources
    Permissions may not be transferable
http://www.fci.org
Humanities: Incentives not to share
Rewards for publication, not for data management
Effort to document data
  Local, personal use: minimal
  Shared use: extensive, follow metadata standards
Competition, priority of claims
  Control until publication
  Control until finished mining sources
Intellectual property
  Control of own resources
    Collaborative resources
      Authority to release
      Multiple jurisdictions
    Concerns for misuse, misinterpretation
    Concerns for “free riders”
    Resale value of products - art, cultural heritage
  Use of resources controlled by others
    Acquire permission to use resources
    Permissions may not be transferable
    Permissions may be prohibitively expensive or unavailable
http://www.fortunecity.com/millennium/galaxyway/375/14months/talking.jpg
Research frontiers in information
Cyberinfrastructure/e-Research
  Infrastructure design and development
  Changing nature of scholarship
  Interaction of social, technical, economic, policy factors
Information
  Expansion of sources and resources
  Computational uses of content
  Information as evidence
Society
  Differential access to resources
  Differential uses of resources
Institutions and policy
  Roles of libraries, archives, museums
  Institutional information infrastructure
  Policy implemented by information technology
Education frontiers in information
Information professionals
  Librarians
  Archivists
  Information architects
  Data scientists
  …
Information scholars
  Social questions
  Technical questions
  Policy questions
  Subject domain expertise
  Collaborative expertise
Convene stakeholders
  Disciplines
  Information institutions
  Universities
  Private enterprise, publishers
  Funding agencies
  Policy makers
http://www.nelsonmullins.com/legal-practice-area/Practice_Insets/Intellectual-Property-Inter.jpg
Thank you!
http://www.eng.ox.ac.uk/
http://www.rhodesscholar.org/
http://www.cens.ucla.edu
New York Times