									Using Trackback to Support
    Citation Notification
           Services
   Brian Matthews, Katie Portwin, Cathy
          Jones, Bryan Lawrence
             E-Science Centre,
   STFC Rutherford Appleton Laboratory
        Science and Technology
           Facilities Council
We employ more than 2200 staff, deployed at 7 locations: Swindon, where
the headquarters is based; the Rutherford Appleton Laboratory; the
Daresbury Laboratory; the Chilbolton Observatory; the UK Astronomy
Technology Centre in Edinburgh; the Isaac Newton Group of Telescopes on
La Palma; and the Joint Astronomy Centre in Hawaii.

Annually, over 15000 scientists visit from around the world, from both
academia and industry.
     Research and Science
       Support at STFC
E-science at STFC aims to:
           Deliver world class science
              Engender world class science
               Communicate world class science

By providing services, development and
innovation in IT
         Academic Discourse
Academics have long established means of
communication via:

     - Journal Publication;
     - Stored in Libraries;
     - Which cite prior published work

     - Which inspires new ideas
     - Which leads to new data being
       collected
     - Which leads to new papers.
             Backward Citations




Traditional Citations – refer backwards between formal written
                           publications

If you try hard enough, you can trace the prior influences on papers
              back to the 17th Century.

          Usually does not include citation of data.
      A Changing Landscape
Over the last few years, the wider availability of
the internet and of computing in science has
  led to a change in the research discourse:

   •Institutional repositories - capturing the
      published output of the institution

 •Data Repositories capturing and classifying
           science data for reuse
BADC
          New Opportunities
Find datasets in repositories which are used to derive publications.
Find papers which are written from datasets.

•Can check the results of the paper
•Can perform new secondary analyses
•Can judge the value of a data set from its use
•Can give credit to data providers
•Can also add forward links to papers, to evaluate their use.
    Adding Forward Citations




  Include Forward Citations – refer forwards to
derived datasets and formal written publications

   Include data into the network of citations.
     Microsoft’s Science 2020
              Report

  Modern scientific communication relies on
both journals and databases. At present these
              are not integrated.

 By 2020 mutual linking will be commonplace
and publications just containing peer-reviewed
         data will become available.


http://research.microsoft.com/towards2020science/downloads.htm
                 About CLADDIER
Citation, Location and Deposition in Discipline and Institutional Repositories

    Funded via a JISC grant, through the Digital Repositories programme
            - July 2005-Oct 2007
                                                  Bryan Lawrence (PI, BADC)
                                            Sam Pepler (Project Manager, BADC)
                                                     Sue Latham (BADC)
                                                   Pauline Simpson (NOCS)
                                                  Jessie Hey (Southampton)
                                                    Brian Matthews (STFC)
                                                    Catherine Jones (STFC)
                                                     Alistair Miles (STFC)
                                                     Katie Portwin (STFC)
                                                      Shoaib Sufi (STFC)
                                                     Kevin O’Neil (STFC)
                                              Katherine Bouton (Reading, NCAS)
                 Citation and linking in
                       repositories
    In order to achieve this scenario we need to provide a set of key
                               mechanisms

• Publishing of Data
    –    Conventions for the citation of data
    –    Can then treat data citation in similar way to publications
• Browsing and searching
    –    across different repositories
    –    across data and publication
• Cross-citation of data and publication
    –    forward and backward citation
    –    need to maintain currency of citation links
    –    A simple mechanism to push citation information between repositories

        A practical look at citation of data and how repositories could
                      communicate citation information.
                         Data Publication
 In this context “publication” is defined as the process
 through which data is fixed and made retrievable over
   the long term, and may imply that there has been
              some quality control process.

     – Defining data : fixing and encapsulating a
       “meaningful” data set
     – Quality Control : Publishers, Data Centres

Natural Environment Research Council, Mesosphere-Stratosphere-
Troposphere Radar Facility [Thomas, L.; Vaughan, G.] . Mesosphere-
Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet].
Version 2, Cartesian products. British Atmospheric Data Centre (BADC),
1990- [cited 2006 Apr 25]. Available from http://badc.nerc.ac.uk/data/mst.
           Maintaining Links
Ideally the archives holding the datasets and publications would be
notified that a paper citing them had been submitted.

    – Metadata associated with those records would be updated to
      reflect the citations.
    – The metadata in the publication repository should also link
      to the metadata in the data archives and vice versa.
    – It would be great if this notification could be done
      automatically.
        • Tedious to enter citations
        • “forward citations” (“cited-by”) are hard to track

To support this, we need to provide a citation notification service

• Federated repositories register with the service
• Repositories notify the service of citations
• The service informs repositories of citations (via broadcasting or targeting)
• The service provides sufficient information to update metadata
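
The register / notify / inform cycle listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CLADDIER service: the class and field names are hypothetical, delivery is a plain callback rather than a network call, and "targeting" is assumed to mean matching the cited URL against a URL prefix that each repository registers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Citation:
    source_url: str      # the citing object
    target_url: str      # the cited object
    metadata: str = ""   # description the target can use to update its record


@dataclass
class NotificationService:
    # Maps a repository's URL prefix to a callback that delivers notifications.
    registry: Dict[str, Callable[[Citation], None]] = field(default_factory=dict)

    def register(self, url_prefix: str, deliver: Callable[[Citation], None]) -> None:
        """A federated repository registers the URL space it is responsible for."""
        self.registry[url_prefix] = deliver

    def notify(self, citation: Citation) -> int:
        """Pass the citation to every repository whose prefix matches the target.

        Returns the number of repositories informed.
        """
        delivered = 0
        for prefix, deliver in self.registry.items():
            if citation.target_url.startswith(prefix):
                deliver(citation)
                delivered += 1
        return delivered


# Usage: a data archive registers, then a publication repository reports a citation.
inbox: List[Citation] = []
service = NotificationService()
service.register("http://data.example.org/", inbox.append)
informed = service.notify(
    Citation("http://papers.example.org/1", "http://data.example.org/mst")
)
```

Broadcasting would simply call every registered callback regardless of prefix.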
            Architectures for Notification
                      Services
[Figure: five architecture diagrams. In each, Resource A1 in Repository A
references Resource B1 in Repository B.]

 1. Broker (mediated push)
 2. Linkback (blogging) (peer-to-peer push)
 3. RSS (peer-to-peer pull)
 4. Harvester (Crossref) (mediated pull)
 5. Trackback with Registration (Broker/Trackback Hybrid)

[Labels surviving from the diagrams: Notification Service; Registry;
0. register; 1. notify; 1. poll; 1. Match URL prefix; 2. notify]
                   Using Trackback for
                       Notification
However, the Linkback approach was chosen because:

• Linkback is a pure peer-to-peer approach
     – Does not rely on third-party services.

• Linkbacks can be received without the source or target repository knowing of
each other’s existence in advance

• Simple, well-known and well-defined protocols for article cross-referencing
already exist in blogging.

A number of Linkback specifications exist:

• Trackback is a simple “framework for peer-to-peer communication”
• It is supported by blogging tools such as Movable Type
• It has relatively simple metadata transmission, using HTTP POST.
• Problems with spamming are well known, and mitigation is possible.

Consequently, Trackback was chosen as the basis of the experimental Citation
Notification Service.

With our own extensions…
             Trackback Protocol
[Figure: the sender publication and the cited page, with these annotations]
  - Sender Publication: this publication has a citation to a technical report
  - Adds Citation
  - Sends trackback call to this URI
  - Embedded Metadata
  - Trackback URI
             Trackback Message
The Trackback sender sends the following message to the
Trackback URL.

HTTP POST to http://trackbackurl-for-resourceB
  &title=[title of source object]
  &url=[URL of source object]
  &excerpt=[excerpt from source object]
  &blog_name=[website of source repository]
  &metadata=[rich metadata about source object]
  &metadataformat=[format of metadata]
  &type=[type of cross-reference being notified]

Three extra keys were added to the Trackback protocol: metadata,
metadataformat and type.
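
As a rough illustration of the message above, the following Python sketch builds the form-encoded POST using only the standard library. The function name, parameter names and example URLs are hypothetical; the key names (title, url, excerpt, blog_name, metadata, metadataformat, type) are the ones listed on the slide. Dropping the extension keys when they are not supplied is an assumption made here.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, *, title, url, excerpt, blog_name,
                         metadata=None, metadata_format=None, citation_type=None):
    """Build (but do not send) an extended Trackback ping as an HTTP POST."""
    # The four standard Trackback keys.
    fields = {"title": title, "url": url, "excerpt": excerpt, "blog_name": blog_name}
    # The three extension keys; omitted from the body when not supplied.
    extensions = {
        "metadata": metadata,
        "metadataformat": metadata_format,
        "type": citation_type,
    }
    fields.update({key: value for key, value in extensions.items() if value is not None})
    body = urlencode(fields).encode("utf-8")
    return Request(
        trackback_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
    )


ping = build_trackback_ping(
    "http://repository-b.example.org/trackback/B1",   # hypothetical Trackback URL
    title="Best article ever",
    url="http://repository-a.example.org/A1",
    excerpt="This publication cites a technical report.",
    blog_name="http://repository-a.example.org/",
    citation_type="cites",
)
decoded = parse_qs(ping.data.decode("utf-8"))
# urllib.request.urlopen(ping) would send it; left out so the sketch runs offline.
```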
                 Trackback Metadata
    In the &metadata key can add metadata on the
                citing paper – in RDF.

<rdf:RDF
   xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/elements/1.1/" >
 <rdf:Description rdf:about="http://theidentifier">
   <dc:title>Best article ever</dc:title>
   <dc:creator>F Flinstone</dc:creator>
   <dc:creator>J Bloggs</dc:creator>
   <dcterms:issued>2007</dcterms:issued>
   <dcterms:isPartOf>urn:ISSN:1234-5678</dcterms:isPartOf>
   <dcterms:bibliographicCitation>&amp;ctx_ver=Z39.88-2004
           &amp;rft_val_fmt=info:ofi/fmt:kev:mtx:journal
           &amp;rft.jtitle=Lecture Notes in Computer Science
           &amp;rft.volume=1
           &amp;rft.issue=11
           &amp;rft.spage=111
    </dcterms:bibliographicCitation>
  </rdf:Description>
</rdf:RDF>
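
A sender could generate a description like the one above programmatically. The sketch below uses Python's xml.etree.ElementTree to produce the Dublin Core part of the record; the function name is hypothetical, and the isPartOf and bibliographicCitation elements are left out to keep it short.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Ask ElementTree to serialise with the same prefixes as the slide.
for prefix, uri in (("rdf", RDF), ("dc", DC), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)


def citing_paper_rdf(identifier, title, creators, issued):
    """Return an RDF/XML string describing the citing paper."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": identifier}
    )
    ET.SubElement(description, f"{{{DC}}}title").text = title
    for creator in creators:
        ET.SubElement(description, f"{{{DC}}}creator").text = creator
    ET.SubElement(description, f"{{{DCTERMS}}}issued").text = str(issued)
    return ET.tostring(root, encoding="unicode")


rdf_xml = citing_paper_rdf(
    "http://theidentifier",
    "Best article ever",
    ["F Flinstone", "J Bloggs"],
    2007,
)
```

The resulting string would be sent as the value of the &metadata key.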
            After Trackback – cited-by
                    link added
[Figure: the receiver publication, annotated "Added this cited-by link"]
         Further Extensions to
              Trackback

       Once we have established a P2P
communications channel between repositories,
 we can start to use it for additional functions.
                                 Whitelists
Well-known problem of Trackback – Spamming
            with bogus messages.
Maintain a whitelist of known and trusted sites.
  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:wl ="http://epubs.cclrc.ac.uk/vocab/trackback/">
       <wl:repository rdf:about="http://testserver.claddier">
           <wl:hostname>Claddier Testserver</wl:hostname>
           <wl:ipaddress>130.246.142.155</wl:ipaddress>
      </wl:repository>
      <wl:repository rdf:about="http://test.home">
           <wl:hostname>Test Home</wl:hostname>
           <wl:ipaddress>130.246.242.92</wl:ipaddress>
      </wl:repository>
  </rdf:RDF>


  Not the most satisfactory of solutions to the Spamming
     problem.
  Still on the lookout for good solutions to this problem.
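
A receiver might consult such a whitelist as follows. This is a sketch only: the slide does not say how the whitelist is checked, so matching the sender's IP address against the wl:ipaddress entries is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

WL = "http://epubs.cclrc.ac.uk/vocab/trackback/"

WHITELIST_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:wl="http://epubs.cclrc.ac.uk/vocab/trackback/">
  <wl:repository rdf:about="http://testserver.claddier">
    <wl:hostname>Claddier Testserver</wl:hostname>
    <wl:ipaddress>130.246.142.155</wl:ipaddress>
  </wl:repository>
  <wl:repository rdf:about="http://test.home">
    <wl:hostname>Test Home</wl:hostname>
    <wl:ipaddress>130.246.242.92</wl:ipaddress>
  </wl:repository>
</rdf:RDF>"""


def load_whitelist(xml_text):
    """Return the set of trusted IP addresses listed in the whitelist."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(f"{{{WL}}}ipaddress") if el.text}


def is_trusted(sender_ip, whitelist):
    """Accept a ping only if it comes from a whitelisted address."""
    return sender_ip in whitelist


trusted = load_whitelist(WHITELIST_XML)
```

Pings from addresses outside the set could be rejected outright or queued for an administrator.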
          Reverse Trackback
 The normal trackback sends metadata from the
             sender to the receiver.
Can also send metadata from the receiver to the
          sender – reverse trackback.
Embed metadata in the first requested page with
                Trackback URL.
   This metadata can be used to enhance or
           correct the citation data.
     Could mean that user data entry is at a
                   minimum
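
One way a sender could read metadata embedded alongside the Trackback URL is sketched below. It assumes the page follows the standard Trackback auto-discovery convention (an RDF block inside an HTML comment, with a trackback:ping attribute). How CLADDIER laid out the additional reverse-trackback metadata is not shown on the slide, so the dc:title attribute, the example page and the function name are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
TB = "http://madskills.com/public/xml/rss/module/trackback/"

# Hypothetical page served by the receiving repository.
PAGE = """<html><body>
<h1>A technical report</h1>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <rdf:Description rdf:about="http://repository-b.example.org/B1"
                   dc:title="A technical report"
                   trackback:ping="http://repository-b.example.org/trackback/B1"/>
</rdf:RDF>
-->
</body></html>"""


def discover(page_html):
    """Return the Trackback URL and embedded metadata, or None if absent."""
    match = re.search(r"<rdf:RDF.*?</rdf:RDF>", page_html, re.DOTALL)
    if match is None:
        return None
    description = ET.fromstring(match.group(0)).find(f"{{{RDF}}}Description")
    if description is None:
        return None
    return {
        "ping_url": description.get(f"{{{TB}}}ping"),
        "identifier": description.get(f"{{{RDF}}}about"),
        "title": description.get(f"{{{DC}}}title"),
    }


info = discover(PAGE)
```

The sender can then use the returned title to fill in or correct its own citation record before sending the ping.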
Embedded Metadata
[Figure: page source, annotated "Metadata for reverse trackback"]
    Multiple metadata formats
Senders and receivers could support a number
         of different metadata formats.
  Embed supported metadata formats in the
                 scraped page.
  Also transmit metadata format in the Ping
  message using the &metadataformat key
           Embedded Metadata
[Figure: page source, annotated "Formats accepted"]
                              Citation Type
We have assumed so far that the citation is a “backward-citation”.

We can use the &type key Trackback extension to indicate the type of
citation being added. This key can take the following values.

    Value      Meaning
    backward   The citation being notified is a backward citation. This is
               the default if the type key is omitted.
    cites      Same as backward.
    forward    The citation being notified is a forward citation.
    cited-by   Same as forward.
    copy       The sender is notifying that it holds a copy of an object
               which is held by the receiver.
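
A receiver could map the values above onto three canonical types as in this sketch. The function name is hypothetical; treating values case-insensitively and rejecting unknown values with an error are assumptions, since the slide specifies only the five values and the default.

```python
from typing import Optional

# Each accepted value of the type key and the canonical type it stands for.
_ALIASES = {
    "backward": "backward",
    "cites": "backward",
    "forward": "forward",
    "cited-by": "forward",
    "copy": "copy",
}


def normalise_citation_type(value: Optional[str]) -> str:
    """Map a received type key onto 'backward', 'forward' or 'copy'."""
    if value is None or value.strip() == "":
        return "backward"  # the default when the type key is omitted
    try:
        return _ALIASES[value.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown citation type: {value!r}") from None
```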
              Error correction
A number of error situations can occur, which should be
    signalled and handled to keep the data clean.

  – Duplicates: The same citation can be notified a
    number of times as the citing entry is updated.

  – Anti-Trackback: citations may need to be
    retracted. A small extension to the protocol would
    allow the sender to send a “delete” request.

  In both these cases, it should be up to the receiving
repository to determine how it responds to the request.
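
One possible receiver policy for both cases is sketched below: a repeated notification updates the stored metadata instead of creating a second link, and a delete request removes the link if it exists. The class and method names are hypothetical, and since the response is left to the receiving repository, this is one choice among several.

```python
class CitationStore:
    """In-memory record of citation links held by a receiving repository."""

    def __init__(self):
        # (source_url, target_url, citation_type) -> latest metadata received
        self._links = {}

    def add(self, source_url, target_url, citation_type="backward", metadata=None):
        """Record a citation. Returns True if new, False if it was a duplicate."""
        key = (source_url, target_url, citation_type)
        is_new = key not in self._links
        self._links[key] = metadata  # a duplicate refreshes the metadata
        return is_new

    def retract(self, source_url, target_url, citation_type="backward"):
        """Handle a delete request. Returns True if a link was removed."""
        key = (source_url, target_url, citation_type)
        if key in self._links:
            del self._links[key]
            return True
        return False

    def __len__(self):
        return len(self._links)


store = CitationStore()
first = store.add("http://a.example.org/A1", "http://b.example.org/B1", metadata="v1")
again = store.add("http://a.example.org/A1", "http://b.example.org/B1", metadata="v2")
```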
          Comments on Trackback
•A simple existing protocol
    – P2P – loosely federates repositories
    – Extended to carry metadata of the citation
    – To add “cited-by” links
•Can also indicate which metadata is expected
    – Simple Dublin Core
    – ePrints Application Profile
•Can also use the metadata of the receiver
    – Improves the citation metadata
•Implemented in ePubs
    – Also partially in BADC
    – Receiver only – send email to admin.

Some problems or extensions are under consideration
•Data entry is HARD
•Link to metadata – not full text
•Spamming – anyone could send trackbacks
    – Whitelists
    – Administrator intervention
•Multiple entries
    – Same citation multiple times
    – Same citation in different repositories
•Retraction of citation
    – A delete protocol
                     Conclusions
   CLADDIER supports the scientific process with federated
                       repositories

This requires a cross-linked network of information objects,
      which needs to be stored, maintained and searched
                 Now doing some user testing

          Tools and ideas relatively straightforward
              Extending existing components
             Keep it simple – so it will get used

  Simple P2P protocols can lead to great possibilities in new
                          contexts.

                 http://claddier.badc.ac.uk/

								