Accessing the data: going beyond what the author wanted to tell you
Interactive Publications and the Record of Science
                     ICSTI Winter Workshop
                                            Paris, Monday, February 8, 2010

   Brian McMahon
   International Union of Crystallography
   5 Abbey Square, Chester CH1 2HU, UK
       PDFs and data impoverishment

Henry Rzepa: Publishers are likely to love interactive PDF, since it is
easy to archive. However ... such objects are data impoverished.
Whereas with Jmol, one is obliged to provide semantically accurate
data (e.g. CML or equivalent), the PDF object is simply a (pre)rendering
of that data. Thus reconstituting a useful molecule from Jmol is trivial
(and that reconstitution can then be used for many other purposes),
reconstituting a molecule from a 3D PDF is likely to be non trivial, and
will almost certainly suffer information loss compared to the original
data. By all means, provide both, but I strongly urge that a 3D PDF
should not be the only object provided.
19 December 2009:
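
The contrast drawn in the quotation can be illustrated with a short sketch. It assumes a hypothetical CML fragment for a water molecule (the coordinates are illustrative, not taken from any deposited structure) and uses only the Python standard library. Because the atoms are stored as labelled elements with coordinates, a derived quantity such as a bond length can be recomputed directly, which a pre-rendered 3D object does not readily allow.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical CML fragment (water); coordinates in angstroms, for illustration only.
CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="H" x3="0.957" y3="0.000" z3="0.000"/>
    <atom id="a3" elementType="H" x3="-0.240" y3="0.927" z3="0.000"/>
  </atomArray>
</molecule>"""

NS = {"cml": "http://www.xml-cml.org/schema"}


def read_atoms(cml_text):
    """Return {atom id: (element symbol, (x, y, z))} from a CML string."""
    root = ET.fromstring(cml_text)
    atoms = {}
    for atom in root.findall(".//cml:atom", NS):
        xyz = tuple(float(atom.get(key)) for key in ("x3", "y3", "z3"))
        atoms[atom.get("id")] = (atom.get("elementType"), xyz)
    return atoms


def distance(atoms, id1, id2):
    """Euclidean distance between two atoms, in the units of the input."""
    return math.dist(atoms[id1][1], atoms[id2][1])


atoms = read_atoms(CML)
oh_bond = distance(atoms, "a1", "a2")
```

The same dictionary of atoms could feed any other calculation or display, which is the reuse the quotation argues for.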
Jmol interactive visualizations
                 • Not new
                    Biochem J. (2008). 412
                 • Bespoke design /
                 • Expensive
                 • Requires consultation
                 • Supplementary
         The right tool for the job
Then (ca. 2004):
   • Protein structures (RasMol)
   • Small organic chemical molecules (Chime)
  • Crystal lattices (symmetry)
  • Inorganic materials (coordination polyhedra)
  • Displacement ellipsoids
  • Symmetry operations
  • Electron orbitals
  • Electron density maps
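
As a rough illustration of what displaying symmetry operations involves, the sketch below applies one operation to an atom position in fractional coordinates. The function name, the tuple layout and the atom position are assumptions made for this example; they are not taken from Jmol or from any journal toolkit.

```python
def apply_symop(rotation, translation, frac):
    """Apply a symmetry operation (R, t) to fractional coordinates.

    The result is wrapped back into the unit cell, i.e. into [0, 1).
    """
    new = []
    for row, shift in zip(rotation, translation):
        value = sum(r * x for r, x in zip(row, frac)) + shift
        new.append(value % 1.0)
    return tuple(new)


# A 2_1 screw axis along b, written in CIF notation as "-x, y+1/2, -z".
SCREW_2_1_B = (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0))

atom = (0.10, 0.20, 0.30)  # hypothetical fractional coordinates
image = apply_symop(*SCREW_2_1_B, atom)
back = apply_symop(*SCREW_2_1_B, image)  # applying the screw twice returns the start
```

Applying the operation twice gives a pure lattice translation, so after wrapping the atom returns to its starting position.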
            Making it easier to use

• Editing toolkit
• High-quality immediate visual feedback
• Context-sensitive help
• Manuals, examples, tutorials
• Reference: McMahon, B. & Hanson, R.M. (2008).
   J. Appl. Cryst. 41, 811-814. A toolkit for publishing enhanced
            Interactive molecular visualizations enhance understanding

Acta Cryst. (2008). F64,

•   Rotate
•   Modify orientation
•   Alternative representations
•   Overlay representations
•   Interrogate
    Infrastructure for publication workflow

•    Server/client architecture
•    Ability to create interactive figures before or during
     article submission/review
•    Opportunity for peer review/revision
•    Auto-generation of static equivalent
•    Easy generation/activation of multiple scripts to provide
     alternative views
        Requirements for routine publication of enhanced figures
• Platform independence
• Web access for authors
• Serving visualization
  application and data
• Integration into
  submission/review procedures
• Integration into journal
  production workflow
• Automated generation of static
  copy (for failsafe/PDF)
• Authoring tools
         The authoring environment
• The author uploads a data file
• The system provides different
  default styles according to the
  type of structure
• The author edits and annotates
  the view
• The author may supply
  additional scripts
• The author saves the result as
  an enhanced figure +
  publication-quality static figure
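
The authoring steps above can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: the style table, the rule of guessing the structure type from the file extension, and the function names are all assumptions. The strings are ordinary Jmol script commands chosen as plausible defaults.

```python
from pathlib import Path

# Hypothetical default styles, expressed as Jmol script commands.
DEFAULT_STYLES = {
    "macromolecule": "select all; cartoons only; color structure;",
    "small-molecule": "select all; wireframe 0.15; spacefill 23%; color cpk;",
}


def structure_type(filename):
    """Guess the structure type from the file extension (an assumption of this sketch)."""
    suffix = Path(filename).suffix.lower()
    return "macromolecule" if suffix in {".pdb", ".ent"} else "small-molecule"


def enhanced_figure_script(filename, extra_scripts=()):
    """Build a script: load the data, apply the default style, then any author scripts."""
    parts = [f'load "{filename}";', DEFAULT_STYLES[structure_type(filename)]]
    parts.extend(extra_scripts)
    return " ".join(parts)


script = enhanced_figure_script("3jw1.pdb", extra_scripts=["spin on;"])
```

Author-supplied scripts are appended last so that they can override the default style.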
        Saving the enhanced figure
• Interactive applet
• Active scripts provided by the author
• High-resolution static image
• Option to view dynamic or
  static image online
• Link to allow peer review
The toolkit editing interface

• Essential tool for authors
• Accommodates novice and advanced users
• Tabbed interface allows authors to concentrate
  on scientific aspects of visualization
• Presets tuned to journal style requirements
• Live testing, preview and feedback mechanisms
• Author may prepare enhanced
  figure ahead of publication
• Simply enter URL of edit
  workspace when asked to
  ‘upload source files’
• Presented alongside other
  conventional figures
• Available for peer review
• Can be edited in response to
  referee comments
Interactive authorship: publBio

             • Start with the data (PDB)
                 example 3jw1

             • Add structured text
             • Online look-up:
                 • authors
                 • references
                 • crystallization solution components
             • Validation
                 • references
             • Visualisation (Jmol)
             • Update data file as submission
Uniform (compatible) markup systems
              • Crystallographic Information
                Framework (CIF)
                 • Treat data/metadata,
                   text/numerical data as peers
                 • Domain-specific extensions
                   (dictionaries = ontologies)
                 • Image format
              • Some data fields may need
                to contain richer content
                 • Text markup
                 • Mathematical equations
                 • Interactive figure scripts
              • Machine validation of
                dictionary attributes
              • Methods
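
Machine validation against dictionary attributes can be sketched in a few lines. The example assumes a tiny stand-in dictionary and a parser that handles only single-line name/value pairs (no loops or multi-line text fields). The data names follow the core CIF dictionary, but the data structures and function names are hypothetical.

```python
import re

# Tiny stand-in for a CIF dictionary: data name -> (type, (minimum, maximum)).
MINI_DICTIONARY = {
    "_cell_length_a": ("numb", (0.0, None)),
    "_cell_angle_alpha": ("numb", (0.0, 180.0)),
    "_chemical_formula_sum": ("char", None),
}

CIF_TEXT = """data_example
_cell_length_a        10.234(2)
_cell_angle_alpha     90.00
_chemical_formula_sum 'C6 H12 O6'
"""


def parse_simple_cif(text):
    """Parse single-line name/value pairs only; quotes around values are removed."""
    items = {}
    for line in text.splitlines():
        match = re.match(r"(_\S+)\s+(.+)", line.strip())
        if match:
            items[match.group(1)] = match.group(2).strip().strip("'\"")
    return items


def validate(items, dictionary):
    """Return a list of messages for items that break the dictionary's rules."""
    errors = []
    for name, value in items.items():
        if name not in dictionary:
            errors.append(f"{name}: not defined in dictionary")
            continue
        kind, limits = dictionary[name]
        if kind == "numb":
            try:
                # Drop a trailing standard uncertainty, e.g. 10.234(2) -> 10.234
                number = float(re.sub(r"\(\d+\)$", "", value))
            except ValueError:
                errors.append(f"{name}: '{value}' is not numeric")
                continue
            low, high = limits
            if (low is not None and number < low) or (high is not None and number > high):
                errors.append(f"{name}: {number} outside allowed range")
    return errors


items = parse_simple_cif(CIF_TEXT)
problems = validate(items, MINI_DICTIONARY)
```

A production validator would read the type and range attributes from the published dictionary files rather than from a hand-written table.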
•   The working scientist really wants to interact with the data
•   What interactive PDF offers is currently limited
•   Publishers should develop compatible architectures
•   Need domain-specific implementations (learned societies)
•   Investment in new applications; integration with workflow
•   Education for a new paradigm
•   Archiving
     •   requires more standardisation
     •   proper compound document model
     •   concentrate on data (or semantic content), not the implementation
     •   ‘record not what it looks like, but what you are looking at’
• Distributed content sources
     • data not necessarily integral part of document
     • retrieval of non-discrete data sets

