Lecture 7 (1/2 hour)

XML as a Document Format
Sang Shin
Java™ Technology Evangelist
sang.shin@sun.com
(You can use this material in any way you want, but if you can drop me an email when you do, that will be greatly appreciated.)

Topics
- XML as a narrative document format
- TEI and DocBook
- Document permanence
- Transformation and presentation

XML as a Document Format
- Documents are for humans, while computer data is for computers
- Document examples
  - Web pages, books, scholarly articles, poems, short stories, reference manuals, tutorials, textbooks, legal pleadings, contracts, instruction sheets
- Computer data examples
  - Order processing, object serialization, database exchange and backup

Documents vs. Computer data
- Documents are more free-form, while computer data has to follow a more rigid structure
- Documents tend to be more permanent, while computer data can be either transitory (XML messages) or permanent (XML archives)
- XML handles both forms well

SGML
- XML is a simplified form of SGML
- Same characteristics as XML
  - Semantic and structural markup language for text documents
- Useful for managing complex documents
- Too complicated
- SGML programs are often not compatible with one another

XML
- XML handles all kinds of publishing well
  - Web
  - Books
  - Magazines
  - Journals
  - Newspapers
- XML is particularly useful when you need to publish to any or all of these media from the same content

Narrative Document Structure
- A minimal tree structure is assumed
  - A book has chapters
- Meta-information about the document
  - Title, document's author, dates written and modified
- Meta-information goes in the root element or in a child of the root element
  - HTML document
    - html is the root element
    - Meta-information is in the head element

Narrative Document Structure
- Sections have subsections
- Sections and subsections have
  - Title or attributes
  - Mixed content
  - Elements
- Paragraphs, headlines, figures, sidebars, footnotes
- Words, inline marked-up elements
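
The idea of mixed content (words interleaved with inline marked-up elements) can be sketched with Python's standard xml.etree.ElementTree module. The element names `section`, `para`, `emphasis`, and `acronym` are assumptions chosen for illustration, not taken from the lecture:

```python
import xml.etree.ElementTree as ET

# Hypothetical narrative fragment: a section with a title and one
# paragraph whose text is interleaved with inline elements.
SOURCE = """\
<section>
  <title>Getting Started</title>
  <para>XML is a <emphasis>simplified</emphasis> form of <acronym>SGML</acronym>.</para>
</section>"""

def flatten(element):
    """Return (kind, value) pairs for an element's mixed content, in document order."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:
            # Text that follows a child element is stored on that child as .tail
            parts.append(("text", child.tail))
    return parts

section = ET.fromstring(SOURCE)
para = section.find("para")
pieces = flatten(para)
plain_text = "".join(para.itertext())
print(pieces)
print(plain_text)
```

The paragraph reads as ordinary linear text once the markup is ignored, which is what distinguishes narrative documents from record-like data.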

Characteristics of Narrative Document
- Linear
  - Markup is not a fundamental part of the document for people to read
- Composed of words
  - Text
- DTD was designed for narrative documents
  - This is the reason DTDs do not provide strong data typing
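
The data typing point can be illustrated with a small sketch. A DTD can declare that an element holds character data, but not that the data is a number, so the check falls to application code. The `item` and `price` elements are hypothetical, and because the Python standard library does not validate against DTDs, the declaration appears only as a comment:

```python
import xml.etree.ElementTree as ET

# In a DTD, the strongest statement available for this element is
#   <!ELEMENT price (#PCDATA)>
# which accepts any character data. Both hypothetical documents below
# would therefore be valid against such a declaration.
GOOD = "<item><price>19.95</price></item>"
BAD = "<item><price>nineteen dollars</price></item>"

def read_price(xml_text):
    """Parse the document and convert <price> to a float, or return None."""
    price_text = ET.fromstring(xml_text).findtext("price")
    try:
        return float(price_text)
    except (TypeError, ValueError):
        # The type check happens here, in application code, not in the DTD.
        return None

good_price = read_price(GOOD)
bad_price = read_price(BAD)
print(good_price, bad_price)
```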

Narrative Document
- Two primary SGML applications
  - TEI (Text Encoding Initiative)
  - DocBook
- Like most SGML applications, both are moving toward XML

TEI
- For classic literature
  - Prose, plays, poems, etc.
- Intended for scholarly analysis rather than casual reading
- Defines
  - Common document structure
  - Grammatical structures
  - Transcription errors and emendations
- An XML version of TEI is now available

TEI Example
- Example 6-1 from “XML in a Nutshell”

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEI.2 SYSTEM "xteilite.dtd">
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>XML in a Nutshell</title>
        <author>Harold, Elliotte Rusty</author>
        <author>Means, W. Scott</author>
      </titleStmt>
      <publicationStmt><p></p></publicationStmt>
      <sourceDesc><p>Early manuscript draft</p></sourceDesc>
    </fileDesc>
  </teiHeader>

Continued
  <text id="HarXMLi">
    <front>
      <div type='toc'>
        <head>Table Of Contents</head>
        <list>
          <item>Introducing XML</item>
          <item>XML as a Document Format</item>
          <item>XML as a "better" HTML</item>
        </list>
      </div>
    </front>
    <body>
      <div1 type="chapter">
        <head>Introducing XML</head>
        <p></p>
      </div1>
    </body>
  </text>
</TEI.2>
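
As a rough illustration of how the meta-information in the TEI header might be read by a program, the sketch below uses Python's standard xml.etree.ElementTree module. It embeds only the header portion of the example, leaves out the DOCTYPE so that no DTD file is needed, and closes the root element so that the fragment parses; those adjustments are assumptions made for the sketch:

```python
import xml.etree.ElementTree as ET

# Header portion of the TEI example, with the DOCTYPE left out so that
# no DTD file is needed, and the root element closed so that it parses.
TEI_SOURCE = """\
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>XML in a Nutshell</title>
        <author>Harold, Elliotte Rusty</author>
        <author>Means, W. Scott</author>
      </titleStmt>
      <publicationStmt><p></p></publicationStmt>
      <sourceDesc><p>Early manuscript draft</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI.2>"""

root = ET.fromstring(TEI_SOURCE)
# The metadata lives in a child of the root element, not in the body text.
title_stmt = root.find("teiHeader/fileDesc/titleStmt")
title = title_stmt.findtext("title")
authors = [author.text for author in title_stmt.findall("author")]
source_note = root.findtext("teiHeader/fileDesc/sourceDesc/p")
print(title, authors, source_note)
```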

DocBook
- For new documents such as computer manuals and books
- Offers many advantages to technical authors
  - Open and nonproprietary
  - Can be created with any text editor
  - Free tools are available
  - Modular
- Used for much of the Linux documentation and several O’Reilly books

DocBook
- It is an authoring format, not a presentation format
- Before humans read it, a DocBook document needs to be converted to
  - HTML
  - XSL Formatting Objects (FO)
  - Rich Text Format (RTF)
  - TeX (high-quality printing)
- Multiple output documents can be created from one source
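
The idea of one authoring source feeding several presentation formats can be sketched in a few lines of Python. This is a hand-written stand-in for illustration only, not how DocBook toolchains actually work; it handles just four element names, and the paragraph text is a placeholder invented for the sketch:

```python
import xml.etree.ElementTree as ET
from html import escape

# A small DocBook-like source. Only book, title, chapter, and para are
# handled; real DocBook has a far larger vocabulary.
DOCBOOK = """\
<book>
  <title>XML in a Nutshell</title>
  <chapter>
    <title>Introducing XML</title>
    <para>XML is a markup language for text documents.</para>
  </chapter>
</book>"""

def to_html(book):
    """Render the book as a minimal HTML fragment."""
    lines = ["<h1>%s</h1>" % escape(book.findtext("title"))]
    for chapter in book.findall("chapter"):
        lines.append("<h2>%s</h2>" % escape(chapter.findtext("title")))
        for para in chapter.findall("para"):
            lines.append("<p>%s</p>" % escape("".join(para.itertext())))
    return "\n".join(lines)

def to_text(book):
    """Render the same book as plain text with underlined titles."""
    title = book.findtext("title")
    lines = [title, "=" * len(title)]
    for chapter in book.findall("chapter"):
        heading = chapter.findtext("title")
        lines += ["", heading, "-" * len(heading)]
        for para in chapter.findall("para"):
            lines.append("".join(para.itertext()))
    return "\n".join(lines)

book = ET.fromstring(DOCBOOK)
html_output = to_html(book)
text_output = to_text(book)
print(html_output)
print(text_output)
```

Both renderers read the same parsed tree, so a change to the source shows up in every output format without editing any of them by hand.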

DocBook Example
- Example 6-2 from “XML in a Nutshell”

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.7//EN"
                      "docbook/docbookx.dtd">
<book>
  <title>XML in a Nutshell</title>
  <bookinfo>
    <author>
      <firstname>Elliotte Rusty</firstname>
      <surname>Harold</surname>
    </author>
    <author>
      <firstname>W. Scott</firstname>
      <surname>Means</surname>
    </author>
  </bookinfo>

Continued
  <toc>
    <tocchap><tocentry>Introducing XML</tocentry></tocchap>
    <tocchap><tocentry>XML as a Document Format</tocentry></tocchap>
    <tocchap><tocentry>XML as a "better" HTML</tocentry></tocchap>
  </toc>
  <chapter>
    <title>Introducing XML</title>
    <para></para>
  </chapter>
</book>

Document Permanence
- XML documents for archiving purposes
- XML documents for human consumption
- Recommendations
  - The format of the document (DTD) should be well documented
  - Standard DTDs such as DocBook and TEI should be preferred to custom DTDs
  - Do not include data in the stylesheet

Transformation and Presentation
- XML data does not say anything about presentation
- The input format is not necessarily the same as the output format
  - Need for transformation from input format to output format
- An XML document can be transformed into
  - XHTML, XSL-FO, HTML
  - PostScript, RTF, PDF
- XSLT is the standard for transformation

Output Format
- PostScript for printing on paper, overhead transparencies, slides
- PDF for all of the above, plus on-screen viewing
- HTML is better than PDF for on-screen reading

CSS
- Alternative to transformation-based presentation
- Describes the presentation format of each element
- Useful for narrative documents
  - Typically cares about fonts, styles, sizes
  - No reordering or rearrangement is needed
- Does not work well for data-oriented documents

XSLT vs. Programming
- Programming is useful when you do more than transformation
  - Interpreting certain elements as database queries
  - Inserting the query results into the output document
  - Asking users questions in the middle of the transformation
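
A minimal sketch of the first two cases, using Python's standard sqlite3 and xml.etree.ElementTree modules. The `query` element, the `authors` table, and the `list`/`item` output elements are all invented for this illustration. Running SQL taken from a document is only reasonable when the document comes from a trusted source:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source document: the <query> element is an invented
# convention for this sketch, not part of DocBook or TEI.
SOURCE = """\
<report>
  <title>Authors</title>
  <query>SELECT name FROM authors ORDER BY name</query>
</report>"""

def build_database():
    """Create an in-memory database with a small authors table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE authors (name TEXT)")
    connection.executemany(
        "INSERT INTO authors (name) VALUES (?)",
        [("W. Scott Means",), ("Elliotte Rusty Harold",)],
    )
    return connection

def expand_queries(root, connection):
    """Replace each <query> child of root with a <list> of result <item>s."""
    for index, child in enumerate(list(root)):
        if child.tag != "query":
            continue
        result = ET.Element("list")
        for (value,) in connection.execute(child.text):
            ET.SubElement(result, "item").text = str(value)
        result.tail = child.tail  # keep the surrounding whitespace
        root[index] = result
    return root

connection = build_database()
report = expand_queries(ET.fromstring(SOURCE), connection)
items = [item.text for item in report.iter("item")]
connection.close()
print(items)
```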

Summary
- XML is good for writing both narrative documents and data-oriented documents
- TEI and DocBook are useful for vendor-neutral, presentation-neutral narrative documents
- Transformation (for example, with XSLT) is useful for turning XML documents into presentation formats

References
- “XML in a Nutshell” by Elliotte Rusty Harold and W. Scott Means, O’Reilly, January 2001 (1st edition), Chapter 6, “XML as a Document Format”