         Towards OpenURL Quality Metrics: Initial Findings
                   Adam Chandler
               Cornell University Library




 2009 American Library Association Annual Conference, Chicago
OpenURL model
                    OpenURL model cont.
incoming OpenURL
http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/


in our knowledge base?

title: Library hi tech        issn: 0737-8831 start date: 19970101 end date:
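The knowledge base lookup amounts to a date-range check against the holdings record. A minimal sketch in Python, assuming dates are stored as YYYYMMDD strings and that an empty end date means the holdings are still open; the function names are hypothetical and this is not the link resolver's actual logic:

```python
from datetime import date

def parse_kb_date(text):
    """Parse a YYYYMMDD knowledge base date; an empty string means 'open'."""
    text = text.strip()
    if not text:
        return None
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

def in_holdings(citation_year, start, end):
    """Return True if the citation year is not outside the holdings range."""
    start_date = parse_kb_date(start)
    end_date = parse_kb_date(end)
    if start_date is not None and citation_year < start_date.year:
        return False
    if end_date is not None and citation_year > end_date.year:
        return False
    return True

# The record above: start date 19970101, no end date; the citation is from 2009.
print(in_holdings(2009, "19970101", ""))
```

A wrong start or end date in this record would make the same check fail for an article the library actually licenses.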


link-to syntax for Emerald

http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
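To make the link-to step concrete, here is a rough Python sketch of filling such a template from an incoming OpenURL. The mapping from template tokens to OpenURL keys is an assumption made for illustration, and `fill_template` is a hypothetical helper, not the WebBridge implementation:

```python
import re
from urllib.parse import parse_qs, urlsplit

TEMPLATE = ("http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker"
            "&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#")

# Assumed correspondence between template tokens and OpenURL keys.
TOKEN_TO_KEY = {
    "ISSN-HYPHEN": "rft.issn",
    "DATE": "rft.date",
    "VOLUME": "rft.volume",
    "ISSUE": "rft.issue",
    "SPAGE": "rft.spage",
}

def fill_template(template, openurl):
    """Replace each #@TOKEN# with the first matching value from the OpenURL."""
    query = parse_qs(urlsplit(openurl).query)

    def substitute(match):
        key = TOKEN_TO_KEY[match.group(1)]
        values = query.get(key)
        if not values:
            raise KeyError("OpenURL is missing " + key)
        return values[0]

    return re.sub(r"#@([A-Z-]+)#", substitute, template)

openurl = ("http://linkresolver.library.cornell.edu:4550/resserv?"
           "rft.issn=0737-8831&rft.date=2009&rft.volume=27"
           "&rft.issue=1&rft.spage=151")
print(fill_template(TEMPLATE, openurl))
```

Every token the target needs must arrive in usable form; a missing or malformed element breaks the link.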
       OpenURL is pervasive

Cornell link resolver alone, July 1, 2008 – June 30, 2009:
 402,000 OpenURL service requests.
 Estimate for all ARL libraries: 402,000 * 123 (ARL libraries) =
 roughly 49 million requests per year
 Cornell’s top 10 OpenURL sources
1. Web of Knowledge
2. Google Scholar
3. Webfeat (our “Find Articles” service)
4. EBSCOHost
5. OCLC FirstSearch
6. SilverPlatter
7. Weill Cornell Medical Center
8. SciFinder Scholar
9. PubMed
10. Refworks
              example OpenURL (1)
http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004
&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx
&rft_val_fmt=info:ofi/fmt:kev:mtx:journal
&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange
&rft.auinit=c
&rft.aulast=merk
&rft.date=2009
&rft.epage=162
&rft.genre=article
&rft.issn=0737-8831
            example OpenURL (2)
&rft.issue=1
&rft.place=bingley
&rft.pub=emerald+group+publishing+limited
&rft.spage=151
&rft.stitle=libr+hi+tech
&rft.title=library+hi+tech
&rft.volume=27
&rfr_id=info:sid/www.isinet.com:wok:wos
&rft.au=scholze,+f
&rft.au=windisch,+n
&rft_id=info:doi/10.1108%2f07378830910942991/
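Because this key/encoded-value form is an ordinary query string, the elements can be pulled apart with a standard URL parser. A small Python sketch using only a subset of the elements above (`parse_openurl` is a hypothetical name):

```python
from urllib.parse import parse_qs, urlsplit

OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.aulast=merk&rft.date=2009&rft.genre=article&rft.issn=0737-8831"
    "&rft.spage=151&rft.stitle=libr+hi+tech&rft.volume=27"
    "&rfr_id=info:sid/www.isinet.com:wok:wos"
    "&rft.au=scholze,+f&rft.au=windisch,+n"
    "&rft_id=info:doi/10.1108%2f07378830910942991/"
)

def parse_openurl(url):
    """Return a dict mapping each key to the list of its decoded values."""
    return parse_qs(urlsplit(url).query)

fields = parse_openurl(OPENURL)
for key in sorted(fields):
    print(key, fields[key])
```

Repeated keys such as rft.au come back as lists, and percent-encoded characters and plus signs are decoded along the way.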
         Literature review
Since the OpenURL standard was introduced some ten years ago, I can
identify no systematic study designed and carried out to benchmark
the quality of linking.
 Wakimoto, Walker, and Dabbour (2006)


Main finding: users simply expect full text.
 When they do not get it, they are
 disappointed.




Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The
Myths and Realities of SFX in Academic Libraries." The Journal of Academic
Librarianship 32 (2): 127–136
 Wakimoto, Walker, and Dabbour (2006)

"Where does SFX start and where does it end? If
  an SFX request does not result in a full-text link,
  does the problem lie with the source database’s
  metadata, the construction of the OpenURL
  request, the SFX KnowledgeBase, the SFX
  software, the resulting target resource, or even
  the local library’s collection development plan?"
  (p. 134)

 … but finding the cause of the problem is hard

• Wrong start or end date in the local library's holdings
  knowledge base (see KBART)
• Semantically inaccurate metadata from the OpenURL origin
  (wrong ISSN, for example)
• Wrong link-to syntax in link resolver
• Fragile handling of incoming links by content provider
• Inaccurate or missing Crossref DOI URL (sometimes the DOI
  registration process is out of sync with the mounting of
  articles)
• Subscription errors (especially with the start of a new
  calendar year)
• Syntactically incorrect metadata from the OpenURL origin
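The difference between syntactic and semantic errors can be illustrated with the ISSN. A value can be checked for well-formedness (pattern plus the standard modulus-11 check digit), but a well-formed ISSN that belongs to the wrong journal still passes. A minimal Python sketch with hypothetical function names:

```python
import re

def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)

def is_well_formed_issn(value):
    """True if value looks like NNNN-NNNC and the check character is right."""
    match = re.fullmatch(r"([0-9]{4})-([0-9]{3})([0-9Xx])", value)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    return issn_check_digit(digits) == match.group(3).upper()

print(is_well_formed_issn("0737-8831"))             # ISSN from the example OpenURL
print(is_well_formed_issn("0065-2598%28print%29"))  # extra text appended
```

A check like this catches only the syntactic class of problem; semantic errors need comparison against a trusted source.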
         Blake and Knudson (2002)

• “Increased communication between primary
  publishers and secondary publishers.
  Metadata corrections and updates need to be
  better coordinated.”
  See: Culling, James (2007). "Link Resolvers and the Serials Supply
  Chain." UKSG. <http://www.uksg.org/projects/linkfinal> and NISO/UKSG
  KBART

 Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference
 Linking." Library Collections, Acquisitions & Technical Services 26 (3),
 (2002): 219-230.
         Blake and Knudson (2002)

• “Increased awareness of bibliographic/citation
  standards by authors. Increased submission of
  publications with bibliographical references
  reflecting the accepted standards.”



         Blake and Knudson (2002)
• “Increased outreach by librarians to authors
  emphasizing and promoting the importance of
  citation standards for electronic document
  retrieval.”




         Blake and Knudson (2002)
• “Increased consistency in metadata within a
  single database and across databases. This
  would result in a higher success rate of linking
  and would allow the algorithms to be simpler.
  Simpler algorithms are easier to maintain and
  modify.”


 Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference
 Linking." Library Collections, Acquisitions & Technical Services 26 (3),
 (2002): 230.
                        Hughes (2004)
• Hughes describes an initiative of the Open
  Language Archives Community (OLAC), a
  consortium of linguistic data archives, to
  create an infrastructure to support metadata
  quality assessment within a specialized Open
  Archives Initiative (OAI) community.




    Baden Hughes, Metadata Quality Evaluation: Experience from the Open
    Language Archives Community. 7th International Conference on Asian
    Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004.
    Proceedings, pp. 320-329.
                        Hughes (2004)
• Metadata quality should be evaluated on a
  per record and per collection basis and
  assessed against the baseline of broader
  community practice. Metadata quality
  requires both structural and semantic
  validation.
                        Hughes (2004)
• Goals:
  – establish a baseline against which future
    instances can be compared;
  – provide assistance to data providers;
  – evaluate a set of domain-grounded controlled
    vocabularies.

                  Hughes’ approach
• Each metadata record is scored from 0 to 10.
• The score has two weighted parts: a "Code Existence Score" and an
  "Element Absence Penalty."
• The Code Existence Score is specific to the OLAC community's use
  of Dublin Core extensions.
• The Element Absence Penalty is based on the premise that the
  usefulness of a given metadata record decreases in the absence of
  core metadata fields.
• The absence of each core element incurs a penalty of 0.2.
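As a rough illustration of how the two parts could combine, here is a Python sketch. Only the 0.2 penalty per absent core element and the 0 to 10 range come from the description above; the list of core elements, the way the two parts are multiplied, and the function names are assumptions, so see Hughes (2004) for the actual definitions.

```python
# Hypothetical set of core elements; the real list is defined by OLAC.
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")

def element_absence_penalty(record):
    """0.2 for every core element that is missing or empty in the record."""
    absent = [name for name in CORE_ELEMENTS if not record.get(name)]
    return 0.2 * len(absent)

def record_score(record, code_existence_score):
    """Assumed combination: scale the code score by what the penalty leaves."""
    penalty = min(element_absence_penalty(record), 1.0)
    return round(10 * code_existence_score * (1 - penalty), 2)

record = {"title": "Example", "date": "2004", "identifier": "oai:example:1"}
print(record_score(record, code_existence_score=1.0))
```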



                  Hughes’ approach
• From this simple approach, an array of metrics is derived:
   –   archive diversity;
   –   metadata quality;
   –   core elements per record;
   –   core element usage;
   –   code usage;
   –   code and element usage;
   –   star rating.
• From these metrics a score is computed for each metadata
  record, each archive, and the community as a whole.

   Mellon-funded planning grant for L'Année philologique

1. Canonical Citation Linking: http://cwkb.org
   In collaboration with Eric Rebillard, Professor,
   Classics and History, and David Ruddy, Cornell
   University Library

2. OpenURL Quality
   Is it possible to build a system for evaluating
   OpenURL quality from a content provider?
 Key findings from 2008 Mellon OpenURL quality investigation

Hughes’ approach to metadata evaluation is
excellent scaffolding to help build a model
for OpenURL metadata evaluation, but it
does not match the problem exactly.
Constant 1: Key elements used by content providers in their link-to targets
  (Based on an analysis of link-tos in the Cornell instance of
  the III WebBridge link resolver product.)
  title  - 64%
  spage  - 64%
  volume - 61%
  issue  - 60%
  date   - 48%
  aulast - 47%
  issn   - 35%
  atitle - 35%
  DOI    - 14%
  ISBN   - 5%
Constant 2: Frequency of element string patterns for all sources
                                 aulast
# Tally aulast (author last name) values by string pattern, per source ($sid) and genre.
if ($element =~ /aulast/) {
    # For FirstSearch sources, skip the rft.-prefixed form of the element.
    next if $sid =~ /firstsearch/ && $element =~ /rft\.aulast/;
    $patterns{allsids}{$genre}{"aulast"}++;
    $patterns{$sid}{$genre}{"aulast"}++;
    if    ($value =~ /^[A-Za-z]+$/)              { $patterns{$sid}{$genre}{"aulast_simple"}++; }            # letters only
    elsif ($value =~ /^[A-Za-z]+, .+$/)          { $patterns{$sid}{$genre}{"aulast_comma"}++; }             # Last, rest
    elsif ($value =~ /^[A-Z][a-z]+( [A-Z]\.)+$/) { $patterns{$sid}{$genre}{"aulast_simpleplusinitial"}++; } # Last F.
    else                                         { $patterns{$sid}{$genre}{"aulast_other"}++; }
}
Simple flat structure
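The counting structure can be mirrored in a few lines of Python. This is an illustrative port, not the script used in the study: a flat `Counter` keyed by (source, genre, pattern) tuples stands in for the nested Perl hash, the FirstSearch special case is left out, and the function names are hypothetical.

```python
import re
from collections import Counter

# Same three patterns as the Perl above, tried in order.
AULAST_RULES = [
    ("aulast_simple", re.compile(r"[A-Za-z]+")),
    ("aulast_comma", re.compile(r"[A-Za-z]+, .+")),
    ("aulast_simpleplusinitial", re.compile(r"[A-Z][a-z]+( [A-Z]\.)+")),
]

def classify_aulast(value):
    """Return the name of the first pattern the whole value matches."""
    for name, pattern in AULAST_RULES:
        if pattern.fullmatch(value):
            return name
    return "aulast_other"

def tally(records):
    """records: iterable of (sid, genre, value) tuples for the aulast element."""
    patterns = Counter()
    for sid, genre, value in records:
        patterns[("allsids", genre, "aulast")] += 1
        patterns[(sid, genre, "aulast")] += 1
        patterns[(sid, genre, classify_aulast(value))] += 1
    return patterns

sample = [
    ("wos", "article", "merk"),
    ("wos", "article", "Ryan S Miller"),
    ("scholar", "article", "Smith, J"),
]
for key, count in sorted(tally(sample).items()):
    print(key, count)
```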
          aulast_other examples
Ryan S Miller
Louise D Bryant
DAVID J MCKENZIE
%C4%90okovi%C4%87
Indu B Ahluwalia
Carreras-Sangr%c3%a0
Bautista-Casta%C3%B1o
O%27Shea
Melissa Ventura Marra
Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
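These examples land in aulast_other for different reasons. Some are legitimate surnames that merely contain percent-encoded non-ASCII letters, apostrophes, or hyphens; others carry full names or several authors in one field. A small Python sketch that separates the two cases after decoding; `is_single_surname` is a hypothetical heuristic, not part of the study:

```python
from urllib.parse import unquote

def is_single_surname(raw):
    """Decode the value and accept letters plus hyphens and apostrophes only."""
    name = unquote(raw)
    letters = name.replace("-", "").replace("'", "")
    return letters.isalpha()

examples = [
    "%C4%90okovi%C4%87",
    "Carreras-Sangr%c3%a0",
    "O%27Shea",
    "Ryan S Miller",
    "Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia",
]
for raw in examples:
    print(unquote(raw), is_single_surname(raw))
```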
                              spage

# Tally spage (start page) values by string pattern, per source ($sid) and genre.
if ($element =~ /spage/) {
    # For FirstSearch sources, skip the rft.-prefixed form of the element.
    next if $sid =~ /firstsearch/ && $element =~ /rft\.spage/;
    $patterns{allsids}{$genre}{"spage"}++;
    $patterns{$sid}{$genre}{"spage"}++;
    if    ($value =~ /^\d+$/)         { $patterns{$sid}{$genre}{"spage_number"}++; }          # 151
    elsif ($value =~ /^\d+-\d+$/)     { $patterns{$sid}{$genre}{"spage_number_number"}++; }   # 151-162
    elsif ($value =~ /[A-Za-z].+\d/)  { $patterns{$sid}{$genre}{"spage_string_w_number"}++; } # letters, then a number
    else                              { $patterns{$sid}{$genre}{"spage_other"}++; }
}
          spage_other examples
•   1033 (6 pages)
•   85(19)
•   575 (11 pages)
•   283...290
•   PHYS
•   GLRM
•   58,+VI
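Several of these values still begin with what looks like a usable page number. One possible remediation, sketched in Python as an assumption rather than anything the study implemented, is to decode the value and keep only the leading digits; `leading_page_number` is a hypothetical helper, and the result is a guess that would need checking against the target.

```python
import re
from urllib.parse import unquote_plus

def leading_page_number(raw):
    """Return the run of digits at the start of the decoded value, if any."""
    match = re.match(r"[0-9]+", unquote_plus(raw).strip())
    return match.group(0) if match else None

for raw in ["1033 (6 pages)", "85(19)", "283...290", "PHYS", "58,+VI"]:
    print(repr(raw), "->", leading_page_number(raw))
```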
                                 date
# Tally date values by string pattern, per source ($sid) and genre.
if ($element =~ /date/) {
    # For FirstSearch sources, skip the rft.-prefixed form of the element.
    next if $sid =~ /firstsearch/ && $element =~ /rft\.date/;
    $patterns{allsids}{$genre}{"date"}++;
    $patterns{$sid}{$genre}{"date"}++;
    if    ($value =~ /^\d{4}$/)             { $patterns{$sid}{$genre}{"date_dddd"}++; }       # YYYY
    elsif ($value =~ /^\d{4}-\d{2}$/)       { $patterns{$sid}{$genre}{"date_dddd-dd"}++; }    # YYYY-MM
    elsif ($value =~ /^\d{4}-\d{2}-\d{2}$/) { $patterns{$sid}{$genre}{"date_dddd-dd-dd"}++; } # YYYY-MM-DD
    elsif ($value =~ /^\d{4}-\d{4}$/)       { $patterns{$sid}{$genre}{"date_dddd-dddd"}++; }  # YYYY-YYYY
    elsif ($value =~ /^\d{8}$/)             { $patterns{$sid}{$genre}{"date_dddddddd"}++; }   # YYYYMMDD
    else                                    { $patterns{$sid}{$genre}{"date_dateother"}++; }
}
         date_other examples
•   1956 July
•   %7E1994
•   June 5%2C 2002
•   JUN 30 05
•   2006%282007%29
•   1922,+April+25th
•   %5B%5B1943-06-19%5D%5D
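Most of these values still contain a recognizable four-digit year once they are decoded. As a sketch of one way a resolver or source might salvage them (an assumption, not a method from the study), the Python below pulls out the first standalone year between 1500 and 2099; the range and the name `first_four_digit_year` are arbitrary choices for illustration.

```python
import re
from urllib.parse import unquote_plus

YEAR_RE = re.compile(r"(?<![0-9])(1[5-9][0-9]{2}|20[0-9]{2})(?![0-9])")

def first_four_digit_year(raw):
    """Decode the value and return the first standalone 4-digit year, if any."""
    match = YEAR_RE.search(unquote_plus(raw))
    return match.group(1) if match else None

examples = ["1956 July", "%7E1994", "June 5%2C 2002", "JUN 30 05",
            "2006%282007%29", "1922,+April+25th", "%5B%5B1943-06-19%5D%5D"]
for raw in examples:
    print(repr(raw), "->", first_four_digit_year(raw))
```

Two-digit years such as "JUN 30 05" are left unresolved, since the century cannot be inferred safely.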
                           issn
# Tally issn values by string pattern, per source ($sid) and genre.
if ($element =~ /issn/) {
    # For FirstSearch sources, skip the rft.-prefixed form of the element.
    next if $sid =~ /firstsearch/ && $element =~ /rft\.issn/;
    $patterns{allsids}{$genre}{"issn"}++;
    $patterns{$sid}{$genre}{"issn"}++;
    if    ($value =~ /^\d+-\d+$/)  { $patterns{$sid}{$genre}{"issn_number_number"}++; }  # 0737-8831
    elsif ($value =~ /^\d+$/)      { $patterns{$sid}{$genre}{"issn_number"}++; }         # 07378831
    elsif ($value =~ /^\d+X$/)     { $patterns{$sid}{$genre}{"issn_numberX"}++; }        # digits, then X
    elsif ($value =~ /^\d+-\d+X$/) { $patterns{$sid}{$genre}{"issn_number_numberX"}++; } # hyphenated, then X
    else {
        $patterns{$sid}{$genre}{"issn_other"}++;
        print "$value\n";    # list unmatched values for inspection
    }
}
         issn_other examples
• 0065-2598%28print%29
• 0018-5345+%28ISSN+print%29
• ISSN ISBN 0-9525091-5-6.
• 0021-8375%28print%29%7C1439-
  0361%28electronic%29
• 1471-2164+%28ISSN+online%29
• 0191-8699%3B0191-8699
• 0741-8329 (Print)%3B NLM Unique Journal
  Identifier%3A 8502311
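In most of these values a well-formed ISSN is buried inside extra text. A hedged Python sketch of pulling out every ISSN-shaped substring after decoding; `extract_issns` is a hypothetical helper, it checks shape only (no check digit), and it is not part of the study's code:

```python
import re
from urllib.parse import unquote_plus

# Four digits, hyphen, three digits, then a digit or X, not inside a longer number.
ISSN_RE = re.compile(r"(?<![0-9-])[0-9]{4}-[0-9]{3}[0-9Xx](?![0-9-])")

def extract_issns(raw):
    """Return the distinct ISSN-shaped strings in the decoded value, in order."""
    found = ISSN_RE.findall(unquote_plus(raw))
    return list(dict.fromkeys(s.upper() for s in found))

examples = [
    "0065-2598%28print%29",
    "ISSN ISBN 0-9525091-5-6.",
    "0021-8375%28print%29%7C1439-0361%28electronic%29",
    "0191-8699%3B0191-8699",
]
for raw in examples:
    print(repr(raw), "->", extract_issns(raw))
```

A value that yields two different ISSNs (print and electronic) still needs a policy for which one to send on to the target.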
            How often?
metric          frequency in July-Sep 2008 sample
aulast_other    5476
spage_other     772
date_other      591
issn_other      200
Demo of OQ UI
Element report (screenshots)
Pattern report (screenshots)
                  Next steps
• add non-Cornell data, from libraries or link
  resolver vendors (model is agnostic to source)
• confirm and publicize key elements used by
  target syntaxes
• outreach to content providers
• refine and expand metrics
• more reports
  – longitudinal by source
  – compare frequency of an element’s use across sources
  – compare frequency of an element pattern across
    sources
   How to stay in the loop
http://openurlquality.blogspot.com/

                 Adam Chandler
                 Database Management and Electronic
                 Resources Librarian
                 Library Technical Services
                 Cornell University Library
                 tel: 607-255-5760
                 email: alc28@cornell.edu

								