Citation Analysis for the Free,
Online Literature
Tim Brody
Intelligence, Agents, Multimedia Group
University of Southampton
28 April 2004, Second Nordic Conference on Scholarly Communication
Content
• Current services for Open Access
Literature
• Institutional Archives Registry
• Metadata Harvesting through Celestial
• Citebase Search
– Citation Linking
– Search and Navigation Service
• Web Impact as a predictor of Citation
Impact
Institutional Archives Registry
Sites in the IAR
• Things we want to know:
– GNU EPrints sites
– Other research collections (Other Archives, Open
Journals)
– BOAI-1 (self-archiving) vs. BOAI-2 (open-access journals)
• A submission form consisting of:
– URL, Name, OAI URL, Country, ‘type’, full-text,
software
• Can’t (yet) track full-texts
• (Create a master list so archives only register once?)
Celestial
• Designed to:
– Be an abstraction over OAI-PMH versions
– Cache OAI metadata records
• Technological questions:
– How big can OAI-PMH go? (OK for 5 million records so far)
– How reliable are OAI-PMH implementations?
• Feeds Citebase, IAR, some external users
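A minimal sketch of the request/parse step that a metadata cache of this kind depends on, assuming an OAI-PMH 2.0 repository. The function names and the sample response are illustrative and are not Celestial's actual code; only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    When a resumptionToken is supplied it is sent as the only argument
    besides the verb, which is what the protocol requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented response, trimmed to the elements the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=next_token)
```

A harvester would repeat the request with each returned token until the repository stops sending one, storing the records locally so downstream services do not have to re-query every archive.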
Services for Open Access Literature
[Diagram of service layers: Search Engines (OAIster, Google, Scirus); OAI-PMH Transport; Navigation Tools; Analysis & Assessment; Citation Analysis/Linking Services (Citebase / Citeseer / OpenURL / DOI); Version Linking Services; Self-Archived Full-texts (Pre/Post-prints); Open Access Publishing. Other labels in the diagram: Citebase, Citeseer, BMC, arXiv.org.]
N.B. Scirus and OAIster aren't citation-analysis aware yet; Google indexes Citeseer. Not an exhaustive list.
Citation Analysis & Linking
• A citation is a reference from one work to
another [as a hyperlink: a citation link]
• Citation analysis uses citation
relationships to analyse patterns in
research
• As a graph a work (paper, book etc.) is a
vertex and a citation an edge
• ‘Bibliometrics’
– (study of patterns in literature)
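The graph view above can be sketched in a few lines. This is an illustration only: the work labels and the edge list are invented, and the function names are hypothetical.

```python
from collections import defaultdict

# Each edge is (citing work, cited work); the labels are made up.
citations = [("B", "A"), ("C", "A"), ("C", "B")]


def build_graph(edges):
    """Index the citation edges in both directions."""
    references = defaultdict(set)  # work -> works it cites
    cited_by = defaultdict(set)    # work -> works that cite it
    for citing, cited in edges:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def citation_counts(cited_by):
    """Citation count of a work = number of incoming edges (in-degree)."""
    return {work: len(citers) for work, citers in cited_by.items()}


references, cited_by = build_graph(citations)
counts = citation_counts(cited_by)
```

Keeping both directions indexed makes the two basic questions cheap: what does this work cite, and what cites it.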
Digitometric/Informetric Analysis
• Bibliometrics for the online age
• Couple citation analysis with Web analysis
– (how many times has x been accessed?)
• Similar to readership studies, but easier to
survey and more comprehensive
– (though subject to the same problems of
copies being re-distributed, multiple accesses
etc.)
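As a rough illustration of counting accesses while damping the multiple-access problem, the sketch below counts at most one download per host, paper, and day. The log tuple layout and the de-duplication rule are assumptions made for the example, not a description of how any particular service filters its logs.

```python
from collections import Counter


def download_counts(log_entries):
    """Count downloads per paper, ignoring repeats from one host on one day.

    log_entries: iterable of (host, paper_id, day) tuples (assumed layout).
    """
    seen = set()
    counts = Counter()
    for host, paper_id, day in log_entries:
        key = (host, paper_id, day)
        if key in seen:
            continue  # same host fetching the same paper again that day
        seen.add(key)
        counts[paper_id] += 1
    return counts


# Invented log entries.
log = [
    ("host-a", "paper-1", "2004-01-05"),
    ("host-a", "paper-1", "2004-01-05"),  # repeat, not counted
    ("host-b", "paper-1", "2004-01-05"),
    ("host-a", "paper-2", "2004-01-06"),
]
hits = download_counts(log)
```

This does nothing about redistributed copies, which never reach the server logs at all.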
Citebase Search
[Diagram labels: Repositories; Metadata Harvest (OAI-PMH); Full-text Harvest; Meta Database; Citation Database; References Database; Web Interface; OAI-PMH Interface; Citebase.]
Citation Linking
• Retrieve and cache full-texts
– LaTeX, PDF, XML
• Extract reference list
• Extract individual references
• Parse references into components
– Author, year, title, journal, volume, pagination
• Store in structured database
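The extraction and parsing steps above might look like the following sketch. It assumes a reference list already extracted as plain text, numbered `[1]`, `[2]`, ... with entries in one fixed "Author (Year). Title. Journal, Volume, Pages." layout; real reference lists vary far more, and the pattern, function names, and sample references are all invented for illustration.

```python
import re

# Matches one assumed reference layout only.
REFERENCE_PATTERN = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?),\s+(?P<volume>\d+),\s+"
    r"(?P<pages>\d+-\d+)\.?$"
)


def split_reference_list(text):
    """Split a '[n]'-numbered reference list into one string per reference."""
    parts = re.split(r"^\[\d+\]\s*", text, flags=re.MULTILINE)
    # Collapse line wraps and extra spaces inside each reference.
    return [" ".join(part.split()) for part in parts if part.strip()]


def parse_reference(reference):
    """Return the reference's components as a dict, or None if it does not match."""
    match = REFERENCE_PATTERN.match(reference)
    return match.groupdict() if match else None


# Invented reference list.
reference_list = """[1] Smith, J. (2001). An example title. Journal of Examples, 12, 34-56.
[2] Jones, A. (1999). Another
    example title. Example Letters, 3, 1-9.
"""

parsed = [parse_reference(ref) for ref in split_reference_list(reference_list)]
```

References that fail to match return `None`, so they can be set aside for a different pattern or manual review instead of being stored with wrong fields.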
Citebase Search
Citebase Search: Navigation by Citation Links
[Diagram labels: Current Article; Article with reference list; Reference link; Past; Future; Related (Co-cited).]
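One way to read the three navigation directions is sketched below: "past" as the works the current article cites, "future" as the works that cite it, and "related" as works cited alongside it by some third article. That reading, the data, and the function name are assumptions for illustration, not Citebase's implementation.

```python
from collections import Counter

# citing article -> set of articles it cites (invented data)
references = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "A": {"X"},
}


def navigate(article, references):
    """Return the past, future and co-cited neighbours of an article."""
    past = set(references.get(article, set()))
    future = {citing for citing, cited in references.items() if article in cited}
    co_cited = Counter()
    for citing in future:
        for other in references[citing]:
            if other != article:
                co_cited[other] += 1  # cited together by `citing`
    return past, future, co_cited


past, future, co_cited = navigate("A", references)
```

The co-citation counts give a natural ranking for the "related" list: the more articles cite two works together, the more closely related they are assumed to be.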
Predicting Citation Impact
• The Web gives us access to new metrics
– Download/access frequency
• Can early-day ‘download’ frequency give an
indication of longer-term citation frequency?
• (Web logs from the UK arXiv.org mirror, Citation
data from Citebase Search)
• Pearson correlation after 6 months of web logs =
0.42 for the High Energy Physics sub-arXiv
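For reference, a Pearson correlation between early download counts and later citation counts can be computed as in this sketch. The two data series are invented purely to exercise the function and say nothing about the arXiv results quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented per-paper figures: downloads soon after deposit, citations later on.
early_downloads = [5, 12, 30, 7, 55]
later_citations = [1, 3, 9, 2, 20]
r = pearson_r(early_downloads, later_citations)
```

A value near 1 means papers downloaded more than average also tend to be cited more than average; a value near 0 means downloads carry little linear information about citations.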
[Figure: correlation (r), axis 0 to 0.5, plotted against days since deposit, axis 0 to 800.]
Assessing Research(ers)
• Citation Impact
– By-Paper, Author, [Journal, Institution]
• Web Impact
– Predictor of citation-impact, combine with
citation-impact
• Search Engines
• More detailed research assessment
Comparing Online/Offline Impact
• Using ISI CD-ROM data
• Use Web crawlers to find ‘online’ articles
• Compare citation impact of online and
offline articles
– By discipline, by journal, by author?
• Initial results for Physics show a 2-3x increase in citation impact for online articles
– arXiv.org
• Southampton, U. Quebec, Oldenburg (de)
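The comparison could be sketched as a ratio of mean citation counts, grouped by discipline (or journal, or author). The record layout, function name, and numbers below are invented for illustration and are not the study's data or method.

```python
from collections import defaultdict


def impact_ratio_by_group(articles):
    """Mean citations of online articles divided by that of offline ones.

    articles: iterable of (group, is_online, citation_count) tuples.
    Groups lacking either kind of article, or with zero offline
    citations, are left out because the ratio is undefined.
    """
    totals = defaultdict(lambda: {"online": [0, 0], "offline": [0, 0]})
    for group, is_online, cites in articles:
        bucket = totals[group]["online" if is_online else "offline"]
        bucket[0] += cites  # running citation total
        bucket[1] += 1      # running article count
    ratios = {}
    for group, t in totals.items():
        (on_sum, on_n), (off_sum, off_n) = t["online"], t["offline"]
        if on_n and off_n and off_sum:
            ratios[group] = (on_sum / on_n) / (off_sum / off_n)
    return ratios


# Invented sample records.
sample = [
    ("physics", True, 12),
    ("physics", True, 8),
    ("physics", False, 4),
    ("physics", False, 6),
    ("biology", True, 3),  # no offline articles, so no ratio
]
ratios = impact_ratio_by_group(sample)
```

A ratio above 1 for a group means its online articles are cited more on average than its offline ones; it does not by itself show that being online caused the difference.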
Relevant Web Pages
• EPrints – http://www.eprints.org/
– IAR: http://archives.eprints.org/
• Citebase Search
– http://citebase.eprints.org/
• Celestial
– http://celestial.eprints.org/
• Correlation Generator
– http://citebase.eprints.org/analysis/correlation.php
• Tim Brody <tdb01r@ecs.soton.ac.uk>