Lecture 8 (1 hour)

XML on the Web & XHTML

Topics
- Displaying XML
- XHTML
- Browser support of XHTML documents
- Browser support of arbitrary XML documents
- Stylesheet processing instruction
- Web search

Motivation for Using XML on the Web
- XML is a very attractive language for writing and serving web pages
  - Well-formedness results in less browser incompatibility
  - Much easier for robots and search engines to parse and search
    - since XML provides semantic information as search criteria
    - since XML is highly structured

Motivation for Using XML on the Web (continued)
- XML is a very attractive language for writing and serving web pages
  - Many XML tools are available
  - Standards such as XSLT and XSP have been developed or are under development
  - Content is increasingly represented in XML (either natively or through an adaptor)

Displaying XML - 3 Options
- XHTML
- Direct display of raw XML documents of arbitrary vocabularies
  - CSS stylesheet
  - XSLT stylesheet
- XHTML mixed with raw XML
  - XHTML 1.1 (Modular XHTML)
  - Needs namespaces

XHTML
- Official W3C recommendation
- Defines an XML-compatible version of HTML
  - Well-formedness rules of XML
  - Validity rules of the DTD
  - Miscellaneous requirements

XHTML - Well-formedness
- Add missing end tags
  - </p>, </li>
- Follow the nesting rules of XML
- Match case between opening tag and closing tag
- Put quotes around attribute values
  - <p align=center> wrong!
  - <p align="center"> correct!

XHTML - Well-formedness (continued)
- Make sure all attributes have values
  - <input type="checkbox" checked> wrong!
  - <input type="checkbox" checked="checked" /> correct!
- Replace any occurrences of & and < in character data or attribute values with &amp; and &lt;
  - A&P wrong!
  - A&amp;P correct!
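
The escaping rule above can be applied automatically rather than by hand. The following is a minimal sketch, assuming Python and its standard xml.sax.saxutils module; the sample strings are only illustrations.

```python
from xml.sax.saxutils import escape, quoteattr

# Character data: & and < must be replaced with entity references.
print(escape("A&P"))      # A&amp;P
print(escape("1 < 2"))    # 1 &lt; 2

# Attribute values: quoteattr escapes the value and adds the quotes.
print(quoteattr("A&P"))   # "A&amp;P"
```

Escaping at the point where text is inserted into markup keeps the source strings readable and the output well-formed.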

XHTML - Well-formedness (continued)
- Make sure the document has a single root element
- Follow the empty-tag rule
  - <hr> alone is wrong!
  - <hr></hr> correct!
  - <hr/> correct!
- Encode the document in UTF-8 or UTF-16, or specify the encoding in the XML declaration
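
The well-formedness rules can be checked mechanically with any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Each pair shows a rule from the slides: first the wrong form, then a correct one.
samples = [
    '<p align=center>text</p>',          # unquoted attribute value
    '<p align="center">text</p>',
    '<input type="checkbox" checked/>',  # attribute without a value
    '<input type="checkbox" checked="checked"/>',
    '<body><hr></body>',                 # empty element left open
    '<body><hr/></body>',
    '<p>one</p><p>two</p>',              # more than one root element
    '<div><p>one</p><p>two</p></div>',
]
for sample in samples:
    print(is_well_formed(sample), sample)
```

A parser only reports well-formedness here; validity against an XHTML DTD is a separate check.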

XHTML - Validity
- Add a DOCTYPE declaration pointing to one of the XHTML DTDs
- Make all element and attribute names lowercase
- Eliminate non-standard elements
  - “marquee”
- Add required attributes
  - “alt” attribute of the “img” element
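
A few of these validity requirements can be spot-checked without a validating parser. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module; the function lint_xhtml and the NON_STANDARD set are hypothetical names, and the checks cover only the examples listed on this slide. It is not a substitute for validation against the DTD.

```python
import xml.etree.ElementTree as ET

NON_STANDARD = {"marquee"}  # assumption: extend with other non-standard names


def lint_xhtml(markup):
    """Report a few validity problems; this is not full DTD validation."""
    problems = []
    for elem in ET.fromstring(markup).iter():
        name = elem.tag.split("}")[-1]  # drop the "{namespace}" prefix if present
        if name != name.lower():
            problems.append(f"<{name}>: element names must be lowercase")
        if name.lower() in NON_STANDARD:
            problems.append(f"<{name}>: non-standard element")
        if name == "img" and "alt" not in elem.attrib:
            problems.append("<img>: missing required alt attribute")
    return problems


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<marquee>Sale!</marquee><img src="logo.gif"/>'
    '</body></html>'
)
for problem in lint_xhtml(sample):
    print(problem)
```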

XHTML - Other requirements
- The root element must be “html”
- The DOCTYPE declaration uses a PUBLIC ID to identify one of the three public XHTML DTDs
- The root element must declare the default namespace
  - <html xmlns="http://www.w3.org/1999/xhtml">
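
These three requirements are easy to test programmatically. The following is a minimal sketch, assuming Python's standard xml.dom.minidom module and the three XHTML 1.0 public identifiers; the function name check_xhtml_shell is hypothetical.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
XHTML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
}


def check_xhtml_shell(markup):
    """Check root element name, DOCTYPE public ID, and default namespace."""
    doc = minidom.parseString(markup)
    root = doc.documentElement
    return {
        "root_is_html": root.tagName == "html",
        "public_dtd": doc.doctype is not None
        and doc.doctype.publicId in XHTML_PUBLIC_IDS,
        "default_namespace": root.namespaceURI == XHTML_NS,
    }


good = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head><body/></html>'
)
print(check_xhtml_shell(good))
```

The parser does not fetch the DTD here; only the identifier in the DOCTYPE declaration is inspected.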

XHTML Example
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta name="generator" content="HTML Tidy, see www.w3.org" />
  <style type="text/css">
    body {background-color: #FFFFFF; color: #000000}
    a:visited {color: #0000CC}
    a:link {color: #990000}
  </style>
  <title>O'Reilly Shipping Information</title>
</head>
<body>
<table border="0" width="515">
  <tr>
    <td><img src="/www/graphics_new/generic_ora_header_wide.gif"
             style="border-width: 0" alt="O'Reilly"/>
      <h2>U.S. Shipping Information</h2>
      <hr style="height: 1; text-align: left"/>
      <dl>
        <dt><b>UPS Ground Service (Continental US only -- 5-7 business days):</b></dt>

Tool
- Dave Raggett’s Tidy
  - Converts HTML documents to valid XHTML documents
  - http://www.w3.org/People/Raggett/tidy/

Three DTDs for XHTML
- Strict
  - Most strict among the three
  - Contains all basic elements and attributes
  - Recommended DTD
- Transitional
  - Supports some deprecated elements and attributes
- Frameset
  - Allows frame-related elements

Browser Support for XHTML
- Internet Explorer 5.5 (and later versions)
- Netscape 6.0 (and later versions)
- Mozilla 5.0 (and later versions)

Direct Display of Arbitrary XML Documents in Browser
- Ideal form of presentation
  - No need to convert the XML document to XHTML or HTML at the server
- However, browsers cannot understand the semantics and vocabularies of arbitrary XML documents
- Need for a stylesheet
  - Instructions on how to render the document

Three major stylesheet languages for now
- CSS1
- CSS2
- XSLT 1.0

xml-stylesheet
- Processing instruction
- Defines pseudo-attributes
  - Two are required
    - href
    - type
  - Four are optional
    - media
    - charset
    - alternate
    - title

xml-stylesheet - Required attributes
- href
  - Specifies where the stylesheet can be found, as a URL
- type
  - Specifies the MIME media type of the stylesheet
  - text/css for cascading stylesheets
  - text/xml or application/xml for XSLT stylesheets

xml-stylesheet - Example
- The processing instruction tells browsers to apply the CSS stylesheet person.css to the document before showing it to the reader

<?xml version="1.0"?>
<?xml-stylesheet href="person.css" type="text/css"?>
<person>
  Alan Turing
</person>
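
To see how a program might read this processing instruction, here is a minimal sketch, assuming Python's standard xml.dom.minidom and re modules. The function name stylesheet_instructions and the regular expression are hypothetical. XML parsers hand over the instruction's data as one string, so the pseudo-attributes have to be split out by the application.

```python
import re
from xml.dom import minidom

# Pseudo-attributes look like attributes: name="value" or name='value'.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def stylesheet_instructions(markup):
    """Return one dict of pseudo-attributes per xml-stylesheet instruction."""
    doc = minidom.parseString(markup)
    found = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append({name: dq or sq
                          for name, dq, sq in PSEUDO_ATTR.findall(node.data)})
    return found


document = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet href="person.css" type="text/css"?>\n'
    '<person>\n  Alan Turing\n</person>'
)
print(stylesheet_instructions(document))
```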

xml-stylesheet - Optional attribute: Media
- media
  - screen - computer monitor
  - tv - television, WebTV, game console
  - projection - slides
  - handheld - PDAs, cell phones, GameBoys
  - print - paper
  - braille - tactile feedback devices
  - aural - screen readers and speech synthesizers

xml-stylesheet - Optional attribute: Media (continued)
- Use the CSS stylesheet at the specified URL for television, projection, and print media

<?xml-stylesheet href="http://www.ibiblio.org/xml/style/titus.css" type="text/css" media="tv, projection, print"?>

xml-stylesheet - Optional attribute: charset
- Specifies the character set in which the stylesheet is written

<?xml-stylesheet href="big.css" type="text/css" charset="ISO-8859-6"?>

Internet Explorer
- IE 4.0
  - Has an internal parser that is available only to VBScript or JavaScript
- IE 5.0
  - Exposed its parser (buggy)
  - Can display XML files directly, with or without a stylesheet
  - Supports CSS Level 1 and some of Level 2
- IE 5.5
  - Supports its own custom version of XSLT

[Screenshot: Internet Explorer 5.0, well-formed XML, no stylesheet]

[Screenshot: Internet Explorer 5.5, well-formedness checking]

[Screenshot: Internet Explorer 5.5, without CSS]

[Screenshot: Internet Explorer 5.5, with CSS]

Netscape and Mozilla
- Mozilla 5.0 and Netscape 6.0
  - Checks well-formedness
  - Does not validate (based on my experiment)
  - Can display XML documents in the browser
    - without CSS - default XSLT behavior
    - with CSS stylesheet
  - CSS level 2 is supported
  - Does not support XSLT
  - Current Mozilla open source projects
    - XSLT work
    - SVG and MathML support

[Screenshot: Netscape 6.0, well-formedness checking]

[Screenshot: Netscape 6.0, without CSS]

[Screenshot: Netscape 6.0, with CSS]

Modular XHTML
- XHTML 1.1 now supports Modular XHTML
- XHTML DTDs are divided into modules
- You can pick and choose the modules you want
- Parameter entities connect the modules by including or leaving out particular modules

XHTML 1.1 Modules
- Structure module %xhtml-struct.module
- Text module %xhtml-text.module
- List module %xhtml-list.module
- Applet module %xhtml-applet.module
- Stylesheet module %xhtml-style.module
- Many more

Example XHTML Document
<catalog_entry>
  <name>Aluminum Duck Drainer</name>
  <price>34.99</price>
  <item_number>2456</item_number>
  <color>silver</color>
  <description>
    <p>
      This sturdy <strong>silver</strong> colored sink stopper dignifies
      the <em>finest kitchens</em>. It makes a great gift for
    </p>
    <ul>
      <li>Christmas</li>
      <li>Birthdays</li>
    </ul>
    <p>and all other occasions!</p>
  </description>
</catalog_entry>

Modular XHTML DTD
- Writing the XHTML document is easy
- The tricky part is writing an appropriate DTD
- Instead of writing a DTD from scratch, use the XHTML 1.1 DTD modules
- Use parameter entity references

Modular XHTML DTD Example
<!ELEMENT catalog (catalog_entry*)>
<!ELEMENT catalog_entry (name, price, item_number, color, size, description)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT item_number (#PCDATA)>
<!ELEMENT color (#PCDATA)>

<!-- throw away the modules we don't need -->
<!ENTITY % xhtml-hypertext.module "IGNORE" >
<!ENTITY % xhtml-ruby.module "IGNORE" >
<!ENTITY % xhtml-edit.module "IGNORE" >
<!ENTITY % xhtml-pres.module "IGNORE" >
<!ENTITY % xhtml-applet.module "IGNORE" >
<!-- deleted a few to compress this example -->

<!-- import the XHTML DTD, at least those parts we aren't ignoring.
     You will probably need to change the system ID to point to
     whatever directory you've stored the DTD in. -->
<!ENTITY % xhtml11.mod PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "xhtml-modularization/DTD/xhtml11.dtd">
%xhtml11.mod;

<!ELEMENT description (%Block.mix;)+>

Server Side Transformation
- Motivation
  - Browsers are not completely XML-aware
  - People still use old HTML browsers
  - Many client types
  - More control
- Transform XML documents to HTML or WML documents on the server
- XSLT processing engine + stylesheet + XML document
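
The idea of server-side transformation can be sketched in a few lines. Python's standard library has no XSLT engine, so the example below substitutes a hand-written transformation using xml.etree.ElementTree for the XSLT processor and stylesheet. The function name catalog_entry_to_html and the HTML layout are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET


def catalog_entry_to_html(xml_text):
    """Turn a catalog_entry document into a small HTML page (hand-written
    stand-in for what an XSLT stylesheet would do on the server)."""
    entry = ET.fromstring(xml_text)
    name = entry.findtext("name", default="")
    price = entry.findtext("price", default="")

    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = name
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = name
    ET.SubElement(body, "p").text = "Price: " + price
    # method="html" serializes with HTML rules for older, non-XML browsers.
    return ET.tostring(html, encoding="unicode", method="html")


source = (
    "<catalog_entry><name>Aluminum Duck Drainer</name>"
    "<price>34.99</price></catalog_entry>"
)
print(catalog_entry_to_html(source))
```

The client receives plain HTML and never needs to understand the catalog vocabulary, which is the point of doing the transformation on the server.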

Web Search
- XML provides “searchable” tags
- In practice
  - Search engines only see HTML front-end documents, which are generated from back-end XML documents
  - Add XML hints to HTML pages
    - RDF
    - Dublin Core
    - “robots” processing instruction

RDF
- XML encoding for a simple data model
- An RDF document describes resources
- Each resource has zero or more properties
- Each property is a name and value pair

RDF Example
- Example 7-9 from “XML in a Nutshell”

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description about="urn:isbn:0596000588">
    <author>Elliotte Rusty Harold</author>
    <author>W. Scott Means</author>
  </rdf:Description>
</rdf:RDF>
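
The resource/property model in this example can be read back with an ordinary XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name describe is hypothetical, and it handles only the simple form used in this example rather than RDF syntax in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_text = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description about="urn:isbn:0596000588">
    <author>Elliotte Rusty Harold</author>
    <author>W. Scott Means</author>
  </rdf:Description>
</rdf:RDF>"""


def describe(rdf_source):
    """Map each resource to its list of (property name, value) pairs."""
    resources = {}
    for desc in ET.fromstring(rdf_source).iter(f"{{{RDF_NS}}}Description"):
        resources[desc.get("about")] = [(prop.tag, prop.text) for prop in desc]
    return resources


for resource, properties in describe(rdf_text).items():
    print(resource, properties)
```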

Dublin Core
- Standard set of information items for catalogs
- Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
- Can be encoded in various forms, including HTML META tags and RDF

Dublin Core Example
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://www.purl.org/dc/">
  <rdf:Description about="urn:isbn:0596000588">
    <dc:Title>XML in a Nutshell</dc:Title>
    <dc:Creator>W. Scott Means</dc:Creator>
    <dc:Creator>Elliotte Rusty Harold</dc:Creator>
    <dc:Subject>XML (Document markup language)</dc:Subject>
    <dc:Description>
      A brief tutorial on and quick reference to XML and related
      technologies and specifications
    </dc:Description>
    <dc:Publisher>O'Reilly &amp; Associates</dc:Publisher>
    <dc:Contributor>Laurie Petrycki</dc:Contributor>
    <dc:Date>2000-08-23</dc:Date>
    <dc:Type>text</dc:Type>
    <dc:Format>6" x 9"</dc:Format>
    <dc:Identifier>0596000588</dc:Identifier>
    <dc:Language>en-US</dc:Language>
    <dc:Relation>http://www.oreilly.com/catalog/xmlnut/</dc:Relation>
    <dc:Coverage>US UK ZA CA AU NZ</dc:Coverage>
    <dc:Rights>Copyright 2000 O'Reilly &amp; Associates</dc:Rights>
  </rdf:Description>
</rdf:RDF>
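
The same Dublin Core items can also be carried in HTML META tags. The following is a rough sketch, assuming Python's standard html module; the function name dublin_core_meta is hypothetical, and the "DC." name prefix follows a common convention rather than anything stated in these slides.

```python
from html import escape


def dublin_core_meta(items):
    """Build one META tag per (Dublin Core name, value) pair.

    The "DC." prefix on the name is an assumed naming convention.
    """
    return [
        '<meta name="DC.{}" content="{}" />'.format(name, escape(value, quote=True))
        for name, value in items
    ]


record = [
    ("Title", "XML in a Nutshell"),
    ("Creator", "W. Scott Means"),
    ("Creator", "Elliotte Rusty Harold"),
    ("Format", '6" x 9"'),
]
for tag in dublin_core_meta(record):
    print(tag)
```

Values are escaped before being placed in the content attribute, so quotes and ampersands in the metadata cannot break the markup.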

Robots
- HTML’s META tag
  - Tells search engines and robots whether they may index a page
- XML document
  - <?robots index="yes" follow="no"?>

Summary
- 3 different ways of displaying XML documents on the Web
- XHTML
- Stylesheet processing instruction
- Web search mechanism based on XML

References
- “XML in a Nutshell” by Elliotte Rusty Harold & W. Scott Means, O’Reilly, Jan. 2001 (1st Edition), Chapter 7, “XML on the Web”