The NLM DTD in archiving – a case study



http://archive.pepublishing.com




  Chris Baker Project Management Ltd.: chris@chrisjbaker.co.uk
                        Themes
• Practical example of using this DTD
• Why we chose the NLM archiving and
  interchange DTD (“Green DTD”) and how
  it helped.




    The Institution of Mechanical Engineers (IMechE)
• Original (1847) mission statement:
  – “to enable mechanics and engineers engaged
    in the different manufactories, railways and
    other establishments in the kingdom, to meet
    and correspond, and by a mutual interchange
    of ideas respecting improvements in the
    various branches of mechanical science to
    increase their knowledge, and give an
    impulse to inventions likely to be useful to the
    world”.
              Background (2)
• Publishing continuously from 1847 to the
  present day - Professional Engineering
  Publishing Ltd.
• Other Institutions merged with IMechE
  during its history

• Chris Baker

       What is in the archive?
• 22,485 articles in the archive (so far…)
• Technical papers (many seminal papers and
  leading names); shows evolution of modern
  journal publishing
• Social interest – e.g. the employment of women
  in munitions factories
• Local history – works visits
• Biographical and genealogical interest –
  obituaries and memoirs, membership records.
• History of the institutions – e.g. minutes of
  meetings

             Project overview / timeline (not to scale)
• Mid 2004: Planning & approval
• Jan 2005: Data capture spec., samples
• April 2005: Full data production
• Website planning, building and testing
• Jan 2006: Data complete
• March 2006: Go live
The start of a project like this…




        Where did we start?
• What features?
  – IMechE Library’s knowledge of their users’
    needs
  – market research
• What technical issues?
  – Discussions with potential digitizers;
  – rough page counts;
  – helpful advice from contacts at IoP, RSC and
    ICE.

             It was decided…
• To limit the initial content of the archive to things that
  were published as IMechE Proceedings.
   – Therefore the online archive would be broadly journal-like:
     perhaps choose a host used to hosting journals.
• That there were business and functionality
  benefits in using the same host for the archive and
  current journals.
   – Therefore hosting negotiations were bound up with the
     upcoming renewal of the hosting contract for current
     journals.

At this point (late 2004) we already knew we wanted to use the NLM DTD.

                           Why?


     Why use the NLM DTD?
• We were not going to write our own DTD!
• We wanted a vendor-independent DTD.
• We wanted a DTD that was familiar to
  digitizers and hosts.
• We needed a DTD that would enable us to
  create rich metadata in the way we wanted
  to do it.
• We wanted a very flexible DTD.
              Flexibility in DTDs
• Totalitarian: “Everything that is not compulsory is forbidden.”
• Liberal: “Where there is flexibility there must be responsibility.”

Moving from totalitarian to liberal:
• Increasing the variety of material that can be handled under the one DTD
• Decreasing number of rules already assumed by the DTD (so you have to
  make your own rules and QA systems)


Resist monkeying with the DTD




 Decisions, decisions!
• How will the digitiser recognise which kind of
  article has been encountered?
• What metadata are to be captured for each kind?
• How does the digitiser recognise start and
  stop points for an article?
• Inconsistencies are the most likely challenge.

        Still more work to do…
• Abstracts
• Keywording the content – choice of
  categories and of vocabularies is
  important.
• How to exploit hidden assets – e.g.
  images?
• Good file-naming and progress
  tracking systems are important.
• QA arrangements.
• Expect some surprises!
• Document all this – host needs it
  later.
                …or else




The Front matter element
Annotations on the content-model diagram:
• Some content too old for ISSN
• Could have handled unusual requirements here

Source: http://dtd.nlm.nih.gov/archiving/tag-library/2.3/index.html (NB: “Green” DTD)
Key: ? = optional (0 or 1 time); + = required, can have >1; * = optional (0, 1 or >1 times)
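
The occurrence indicators in the key can be read as simple count rules. The following is a minimal Python sketch of that reading, not part of the original slides. The function name `allowed` and the table `OCCURRENCE` are hypothetical, and the "no indicator means exactly once" row is the standard DTD convention rather than something stated on the slide.

```python
# Illustrative only: maps DTD occurrence indicators to (min, max) counts.
# A max of None means "no upper limit".
OCCURRENCE = {
    "":  (1, 1),     # no indicator: exactly once (standard DTD rule, not on the slide)
    "?": (0, 1),     # optional: 0 or 1 time
    "+": (1, None),  # required: 1 or more times
    "*": (0, None),  # optional: 0, 1 or more times
}


def allowed(indicator: str, count: int) -> bool:
    """Return True if `count` occurrences of an element satisfy `indicator`."""
    low, high = OCCURRENCE[indicator]
    return count >= low and (high is None or count <= high)


if __name__ == "__main__":
    # An element marked "?" may not appear twice.
    print(allowed("?", 2))  # False
    # An element marked "+" must appear at least once.
    print(allowed("+", 0))  # False
```

A check like this only covers how often an element appears; the order and nesting of elements are still governed by the DTD itself.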
                            The article-meta element
Annotations on the content-model diagram:
• Used for filename/DOI
• Could have used to sort kinds of paper
• Did use to sort kinds of paper, as well as indicate subject
• Could have handled unusual requirements here
• Abstracts – big issue/opportunity for us

Key: ? = optional (0 or 1 time); + = required, can have >1; * = optional (0, 1 or >1 times)
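
As an illustration of the kind of metadata the article-meta annotations refer to, here is a minimal Python sketch that reads a small article-meta fragment and reports the identifier, subject categories and whether an abstract is present. It is a sketch under stated assumptions, not the project's actual tooling. The element names (`article-id`, `article-categories`, `subj-group`, `subject`, `abstract`) come from the NLM tag set, but which element each slide annotation points at is assumed here, and the sample values, the DOI and the function name `summarise` are invented for the example. The fragment is not validated against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample fragment; the DOI and text values are placeholders.
SAMPLE = """
<article-meta>
  <article-id pub-id-type="doi">10.0000/example</article-id>
  <article-categories>
    <subj-group><subject>Obituary</subject></subj-group>
  </article-categories>
  <title-group><article-title>Sample title</article-title></title-group>
  <abstract><p>Sample abstract.</p></abstract>
</article-meta>
"""


def summarise(meta_xml: str) -> dict:
    """Collect a few QA-relevant facts from an article-meta fragment."""
    meta = ET.fromstring(meta_xml)
    return {
        # Identifier (assumed to be the element "used for filename/DOI").
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        # Subject categories (assumed to be how kinds of paper were sorted).
        "subjects": [s.text for s in meta.iter("subject")],
        # Abstracts are flagged on the slide as a big issue, so check for one.
        "has_abstract": meta.find("abstract") is not None,
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

A summary like this could feed a simple QA report, for example listing every article that lacks an abstract or a subject category.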
Into full data production
• Batch and dispatch books
• Abstracts
• QA
• Progress monitoring
• Lots of work (a full-time post for 1 year)

                  Building the site



• Issues about how best the data will fit into the
  host’s database structure
• How to display oddities – e.g. the large technical
  drawings
• Publisher should keep a correct archive copy of
  the data (need to reflect fixes by host).
          http://archive.pepublishing.com
• Went live March 2006 – on time, below
  budget!
• Doing fine commercially
• Reduces half a ton of books to one
  website.
• Some metadata features not exploited on
  current site - mostly future-proofing items.

      With especial thanks to…
                (in no particular order)

• Alan Singleton, Peter Williams, Mick
  Spencer, Rosie Grimes, Sarah Espiner –
  Professional Engineering Publishing
• Keith Moore, Sarah Rogers, Rebecca
  Stockley – IMechE library
• Ritu Popli, Chris McKeown et al. at
  Techbooks (now Aptara)
• Heather Klusendorf et al. at Metapress
            Any Questions?
        The IMechE Proceedings Archive -
          http://archive.pepublishing.com
         NLM Archiving DTD (now v 2.3)-
      http://dtd.nlm.nih.gov/2.3/index.html
Chris Baker (Chris Baker Project Management
                     Limited)
           Chris@chrisjbaker.co.uk
