Using Trackback to Support
Brian Matthews, Katie Portwin, Cathy Jones, Bryan Lawrence
STFC Rutherford Appleton Laboratory
Science and Technology
We employ more than 2200 staff, deployed at 7 locations: Swindon, where the
headquarters is based; the Rutherford Appleton Laboratory; the Daresbury
Laboratory; the Chilbolton Observatory; the UK Astronomy Technology Centre in
Edinburgh; the Isaac Newton Group of Telescopes on La Palma; and the Joint
Astronomy Centre in Hawaii.
Annually, over 15000 visiting scientists from around the world, from both academia
Research and Science Support at STFC
E-science at STFC aims to:
• Deliver world-class science
• Engender world-class science
• Communicate world-class science
by providing services, development and innovation in IT.
Academics have long-established means of
- Journal Publication;
- Stored in Libraries;
- Which cite prior published work
- Which inspires new ideas
- Which leads to new data being
- Which leads to new papers.
Traditional citations refer backwards between formal written publications.
If you try hard enough, you can trace the prior influences on papers back to
the 17th century.
Traditional citation usually does not include citation of data.
A Changing Landscape
Over the last few years, the wider availability of the internet and of
computing in science has led to a change in the research discourse:
• Institutional repositories: capturing the published output of the institution
• Data repositories: capturing and classifying science data for reuse
Find datasets in repositories which were used to derive publications.
Find papers which were written from datasets.
• Can check the results of the paper
• Can perform new secondary analyses
• Can judge the value of a dataset from its use
• Can give credit to data providers
• Can also add forward links to papers, to evaluate their use.
Adding Forward Citations
Include forward citations, which refer forwards to derived datasets and
formal written publications.
Include data in the network of citations.
Microsoft’s Science 2020
Modern scientific communication relies on
both journals and databases. At present these
are not integrated.
By 2020 mutual linking will be commonplace
and publications just containing peer-reviewed
data will become available.
Citation, Location and Deposition in Discipline and Institutional Repositories
Funded via a JISC grant, through the Digital Repositories programme, July 2005 to October 2007
Bryan Lawrence (PI, BADC)
Sam Pepler (Project Manager, BADC)
Sue Latham (BADC)
Pauline Simpson (NOCS)
Jessie Hey (Southampton)
Brian Matthews (STFC)
Catherine Jones (STFC)
Alistair Miles (STFC)
Katie Portwin (STFC)
Shoaib Sufi (STFC)
Kevin O’Neil (STFC)
Katherine Bouton (Reading, NCAS)
Citation and linking in
In order to achieve this scenario we need to provide a set of key
• Publishing of Data
– Conventions for the citation of data
– Can then treat data citation in a similar way to publications
• Browsing and searching
– across different repositories
– across data and publication
• Cross-citation of data and publication
– forward and backward citation
– need to maintain currency of citation links
– A simple mechanism to push citation information between repositories
A practical look at citation of data and how repositories could
communicate citation information.
In this context, “publication” is defined as the process through which data
is fixed and made retrievable over the long term; it may also imply that
there has been some quality control process.
– Defining data: fixing and encapsulating a “meaningful” dataset
– Quality control: publishers, data centres
Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar
Facility [Thomas, L.; Vaughan, G.]. Mesosphere-Stratosphere-Troposphere Radar
Facility at Aberystwyth, [Internet]. Version 2, Cartesian products. British
Atmospheric Data Centre (BADC), 1990- [cited 2006 Apr 25]. Available from
http://badc.nerc.ac.uk/data/mst.
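The citation above follows a fixed pattern: responsible organisation, short title, named investigators, full title, medium, version, data centre, start year, date cited and location. Agreeing on such a convention is what lets data citations be handled like publication citations. As a minimal sketch, assuming a hypothetical `DatasetCitation` record whose field names are ours and not a CLADDIER convention, the string can be assembled from structured metadata:

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical structured record for one dataset citation."""
    author: str        # organisation responsible for the data
    title: str         # short title of the dataset
    contributors: str  # named investigators, shown in brackets
    long_title: str    # full title of the dataset
    version: str       # version and product description
    publisher: str     # data centre holding the data
    start_year: int    # first year of the (ongoing) series
    cited: str         # date the citation was made, e.g. "2006 Apr 25"
    url: str           # where the data can be retrieved


def format_citation(c: DatasetCitation) -> str:
    """Assemble a citation string in the style of the MST radar example."""
    return (
        f"{c.author}, {c.title} [{c.contributors}]. "
        f"{c.long_title}, [Internet]. {c.version}. {c.publisher}, "
        f"{c.start_year}- [cited {c.cited}]. Available from {c.url}."
    )
```

Filled with the MST radar fields, this reproduces the example citation; keeping the parts as separate fields also means a receiving repository can match on the URL rather than parsing free text.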
Ideally, the archives holding the cited datasets and publications would be
notified that a paper citing them had been submitted.
– The metadata associated with those records would be updated to
reflect the citations.
– The metadata in the publication repository should also link
to the metadata in the data archives, and vice versa.
– It would be great if this notification could be done automatically, because:
• it is tedious to enter citations
• “forward citations” (“cited-by”) are hard to track
To support this, we need to provide a citation notification service:
• Federated repositories register with the service
• Repositories notify the service of citations
• The service informs repositories of citations (by broadcasting or targeting)
• The service provides sufficient information to update metadata
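A notification service of this kind (a broker doing mediated push) can be sketched in a few lines. This is a minimal in-memory sketch, not the CLADDIER implementation: the class and method names are hypothetical, and matching the cited URL against a registered URL prefix is our assumption about how the service finds the repository to target. A real service would deliver notifications over HTTP rather than by direct function call.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Citation:
    citing_url: str  # URL of the citing object (e.g. a paper)
    cited_url: str   # URL of the cited object (e.g. a dataset)
    title: str       # title of the citing object, used to update metadata


class CitationBroker:
    """Hypothetical mediated-push citation notification service."""

    def __init__(self) -> None:
        # Registered repositories: URL prefix -> delivery callback.
        self._registry: Dict[str, Callable[[Citation], None]] = {}

    def register(self, url_prefix: str, deliver: Callable[[Citation], None]) -> None:
        """A federated repository registers the URL prefix it is responsible for."""
        self._registry[url_prefix] = deliver

    def notify(self, citation: Citation) -> bool:
        """Forward a citation to the repository holding the cited object (targeting).

        Returns False if no registered repository matches the cited URL.
        """
        matches = [p for p in self._registry if citation.cited_url.startswith(p)]
        if not matches:
            return False
        # Prefer the longest matching prefix, so nested collections resolve correctly.
        self._registry[max(matches, key=len)](citation)
        return True
```

The central registry is exactly the dependency on a third party that a peer-to-peer design avoids.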
Architectures for Notification
[Figure: five candidate architectures for notifying Repository B that a resource in Repository A cites one of its resources: 1. Broker (mediated push); 2. Linkback, as used in blogging (peer-to-peer push); 3. RSS (peer-to-peer pull); 4. Harvester, e.g. Crossref (mediated pull); 5. Trackback with registration (broker/Trackback hybrid)]
Using Trackback for
However, the Linkback approach was chosen because:
• Linkback is a pure peer-to-peer approach
– It does not rely on third-party services.
• Linkbacks can be received without the source and target repositories knowing
of each other’s existence in advance.
• There are existing, well-known, well-defined and simple protocols for article
cross-referencing, used in blogging.
A number of Linkback specifications exist:
• Trackback is a simple “framework for peer-to-peer communication”
• It is supported by blogging tools such as Movable Type
• It has relatively simple metadata transmission, using HTTP POST.
• Problems with spamming are well known, and can be mitigated.
Consequently, Trackback was chosen as the basis of the experimental Citation
with our own extensions.
[Figure: a publication in the sending repository has a citation to a technical report; the sender sends a Trackback call to the report's Trackback URI]
The Trackback sender sends the following message to the receiver:
HTTP POST to http://trackbackurl-for-resourceB
&title=[title of source object]
&url=[URL of source object]
&excerpt=[excerpt from source object]
&blog_name=[website of source repository]
&metadata=[rich metadata about source object]
&metadataformat=[format of metadata]
&type=[type of cross-reference being notified]
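A minimal sender can be written with Python's standard library alone. In this sketch the body is form-encoded as in the Trackback specification, and the receiver is assumed to answer with the standard Trackback XML response (`<error>0</error>` for success; `<error>1</error>` plus a `<message>` otherwise). The function names are hypothetical, and the extension keys are simply extra form fields passed in the same dictionary.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def build_ping_body(fields: Dict[str, str]) -> bytes:
    """URL-encode the ping fields as an application/x-www-form-urlencoded body."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def parse_ping_response(data: bytes) -> Tuple[bool, Optional[str]]:
    """Interpret a Trackback response: <error>0</error> means success."""
    root = ET.fromstring(data)
    error = root.findtext("error", default="1").strip()
    return error == "0", root.findtext("message")


def send_ping(trackback_url: str, fields: Dict[str, str],
              timeout: float = 10.0) -> Tuple[bool, Optional[str]]:
    """POST a Trackback ping (standard keys plus extension keys) to a receiver."""
    request = urllib.request.Request(
        trackback_url,
        data=build_ping_body(fields),
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ping_response(response.read())
```

A call would pass `title`, `url`, `excerpt` and `blog_name` as usual, plus `metadata`, `metadataformat` and `type`; a receiver that ignores unknown keys still records an ordinary Trackback.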
We added three extra keys to the Trackback protocol: &metadata, &metadataformat
and &type.
The &metadata key can carry metadata about the citing paper, for example in RDF:
<dc:title>Best article ever</dc:title>
&rft.jtitle=Lecture Notes in Computer Science
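A sketch of how a sender might build the &metadata value as simple Dublin Core in RDF/XML, using only `xml.etree.ElementTree`. The element set (title, creator, date) and the function name are assumptions; the slide shows only a `dc:title` element. The `rft.jtitle` line is an OpenURL key/value field, which suggests other formats can be carried too; this sketch covers only the RDF case. The resulting string would be URL-encoded as the &metadata field, with &metadataformat naming the format.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Serialise with the conventional rdf: and dc: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def citing_paper_rdf(url: str, title: str, creators: List[str], date: str) -> str:
    """Describe the citing paper in simple Dublin Core, serialised as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url})
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    for creator in creators:
        ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC_NS}}}date").text = date
    return ET.tostring(root, encoding="unicode")
```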
After Trackback: cited-by
[Figure: the receiving repository's record for the cited object, with the added “cited by” link]
Further Extensions to
Once we have established a P2P
communications channel between repositories,
we can start to use it for additional functions.
A well-known problem with Trackback is spamming with bogus messages.
Our approach: maintain a whitelist of known and trusted sites.
This is not the most satisfactory solution to the spamming problem;
we are still on the lookout for good solutions.
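A receiver-side whitelist check might look like the following minimal sketch, assuming the check is made on the host of the ping's &url field. The class name and host list are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlparse


class TrackbackWhitelist:
    """Hypothetical receiver-side filter: accept pings only from trusted hosts."""

    def __init__(self, trusted_hosts: Iterable[str]) -> None:
        # Host names are case-insensitive, so store them lower-cased.
        self._trusted = {h.lower() for h in trusted_hosts}

    def accepts(self, source_url: str) -> bool:
        """Return True if the ping's &url (source object) is on a trusted host."""
        host = urlparse(source_url).hostname  # lower-cased; None if there is no host
        return host is not None and host in self._trusted
```

This also shows why the approach is unsatisfactory. The &url value is supplied by the sender, so a spammer can forge it. Checking the requesting address, or fetching the source page to confirm that it really links to the target, would be stronger but costs more.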
The normal Trackback sends metadata from the sender to the receiver.
Metadata can also be sent from the receiver to the sender: reverse Trackback.
Embed metadata in the first requested page with
This metadata can be used to enhance or
correct the citation data.
This could mean that user data entry is kept to a minimum.
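Reverse Trackback can be sketched from the sender's side. This sketch assumes the receiver embeds its metadata the way standard Trackback autodiscovery does, as an `rdf:RDF` block (usually inside an HTML comment) on the object's landing page, with the namespaces declared on that block. The function names, and the choice to let the receiver's title and identifier override what the user typed, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
TB_NS = "http://madskills.com/public/xml/rss/module/trackback/"

# Assumes the block uses the conventional "rdf:" prefix.
_RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def read_embedded_metadata(html: str) -> Optional[Dict[str, str]]:
    """Extract the first embedded RDF description from a landing page."""
    match = _RDF_BLOCK.search(html)
    if match is None:
        return None
    desc = ET.fromstring(match.group(0)).find(f"{{{RDF_NS}}}Description")
    if desc is None:
        return None
    return {
        "about": desc.get(f"{{{RDF_NS}}}about", ""),
        "identifier": desc.get(f"{{{DC_NS}}}identifier", ""),
        "title": desc.get(f"{{{DC_NS}}}title", ""),
        "ping": desc.get(f"{{{TB_NS}}}ping", ""),
    }


def correct_citation(citation: Dict[str, str], embedded: Dict[str, str]) -> Dict[str, str]:
    """Prefer the receiver's own title and identifier over user-entered values."""
    corrected = dict(citation)
    for key in ("title", "identifier"):
        if embedded.get(key):
            corrected[key] = embedded[key]
    return corrected
```

The same page fetch that discovers the Trackback ping URL also yields metadata, so the user only needs to supply the link.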
Multiple metadata formats
Senders and receivers could support a number
of different metadata formats.
Embed the supported metadata formats in the
Also transmit the metadata format in the ping
message, using the &metadataformat key.
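One way a sender could use the formats a receiver advertises is sketched below, assuming the sender holds its metadata in several serialisations listed in order of preference. The format identifiers `eprints-ap` and `simple-dc` are hypothetical labels for the ePrints Application Profile and simple Dublin Core, not registered names.

```python
from typing import Dict, Optional, Sequence


def choose_metadata_format(sender_prefers: Sequence[str],
                           receiver_supports: Sequence[str]) -> Optional[str]:
    """Pick the sender's most preferred format that the receiver advertises."""
    supported = set(receiver_supports)
    for fmt in sender_prefers:
        if fmt in supported:
            return fmt
    return None


def metadata_fields(records: Dict[str, str],
                    receiver_supports: Sequence[str]) -> Dict[str, str]:
    """Return the &metadata/&metadataformat ping fields, or none if no format is shared.

    `records` maps format identifier -> serialised metadata, in preference order.
    """
    fmt = choose_metadata_format(list(records), receiver_supports)
    if fmt is None:
        return {}  # fall back to a plain Trackback ping (title, url, excerpt only)
    return {"metadata": records[fmt], "metadataformat": fmt}
```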
We have assumed so far that the citation is a “backward citation”.
We can use the &type Trackback extension key to indicate the type of
citation being added. This key can take the following values:
backward: the citation being notified is a backward citation. This is the default if the type key is omitted.
cites: same as backward.
forward: the citation being notified is a forward citation.
cited-by: same as forward.
copy: the sender is notifying that it holds a copy of an object which is held by the receiver.
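On the receiving side, the five values collapse to three meanings. A minimal sketch of normalising the key follows. Treating an empty value like an omitted one, ignoring case, and rejecting unknown values are our assumptions, not part of the extension as described.

```python
from typing import Optional

# "cites" and "cited-by" are synonyms for "backward" and "forward".
_CANONICAL_TYPES = {
    "backward": "backward",
    "cites": "backward",
    "forward": "forward",
    "cited-by": "forward",
    "copy": "copy",
}


def citation_type(value: Optional[str]) -> str:
    """Map a received &type value to its canonical form.

    An omitted (or, by assumption, empty) key defaults to "backward". Unknown
    values raise ValueError, so the receiver can return a Trackback error
    response instead of guessing.
    """
    if value is None or value.strip() == "":
        return "backward"
    try:
        return _CANONICAL_TYPES[value.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported citation type: {value!r}") from None
```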
A number of error situations can occur, which should be
signalled and handled to keep the data clean.
– Duplicates: the same citation can be notified a
number of times as the citing entry is updated.
– Anti-Trackback: citations may need to be
retracted. A small extension to the protocol would
allow the sender to send a “delete” request.
In both cases, it should be up to the receiving
repository to determine how it responds to the request.
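Because the response is left to the receiver, here is one possible policy as a sketch. It treats the pair (citing URL, cited object) as the identity of a citation, so a repeated ping updates the existing link rather than duplicating it, and it honours retractions. The `action=delete` key is hypothetical: the delete request is proposed but its syntax is not defined.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, str]  # (citing URL, cited object id)


@dataclass
class CitationStore:
    """Hypothetical receiver-side store of cited-by links for its own objects."""
    links: Dict[Key, dict] = field(default_factory=dict)

    def receive(self, target_id: str, fields: Dict[str, str]) -> str:
        """Apply one ping; returns "added", "updated", "deleted" or "ignored"."""
        key = (fields["url"], target_id)
        if fields.get("action") == "delete":  # hypothetical retraction key
            return "deleted" if self.links.pop(key, None) is not None else "ignored"
        status = "updated" if key in self.links else "added"
        # A duplicate notification (e.g. after the citing entry is edited)
        # overwrites the earlier metadata instead of adding a second link.
        self.links[key] = dict(fields)
        return status
```

Keying on the citing URL handles repeats from one repository. The same citation arriving from different repositories (for example, a paper deposited in two places) would still appear twice, which needs a shared identifier to resolve.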
Comments on Trackback
• A simple existing protocol
– P2P: loosely federates repositories
– Extended to carry metadata of the citation
– To add “cited-by” links
• Can also indicate which metadata is expected
– Simple Dublin Core
– ePrints Application Profile
• Can also use the metadata of the receiver
– Improves the citation
• Implemented in ePubs
– Also partially in BADC: receiver only, sends email to admin.

Some problems or extensions are under consideration:
• Data entry is HARD
• Link to metadata, not full text
• Spamming: anyone could send trackbacks
– Administrator intervention
• Multiple entries
– Same citation multiple times
– Same citation in different repositories
• Retraction of citation
– A delete protocol
CLADDIER supports the scientific process with federated repositories.
This requires a cross-linked network of information objects,
which needs to be stored, maintained and searched.
We are now doing some user testing.
The tools and ideas are relatively straightforward:
• Extending existing components
• Keep it simple, so it will get used
Simple P2P protocols can lead to great possibilities in new