Standards, the Web
and eLib Projects
Brian Kelly
UK Web Focus
Email Address: B.Kelly@ukoln.ac.uk
UKOLN
University of Bath
http://www.ukoln.ac.uk/
UKOLN is funded by the British Library Research and Innovation Centre,
the Joint Information Systems Committee of the Higher Education Funding
Councils, as well as by project funding from the JISC’s Electronic Libraries
Programme and the European Union. UKOLN also receives support from
the University of Bath where it is based.
Contents
• Introduction
• Web Standards Overview
• Web Standards:
– Data Formats
– Transport
– Addressing
– Metadata
– Accessibility
– Programming Languages
– Distributed Searching
• Deployment Issues
• Questions
Aims of Talk
• To review key web standards
• To describe standards bodies
• To identify opportunity for involvement
• To briefly address implementation models
UK Web Focus / W3C
UK Web Focus:
• JISC funded post based at UKOLN (Bath Univ)
• Advises UK HE community on web issues
• Represents JISC on W3C
W3C (World Wide Web Consortium):
• International consortium, with headquarters at
MIT, INRIA and Keio University (Japan)
• Coordinates development of web protocols
• Four domains:
– Architecture
– User Interface
– Technology & Society
– Web Accessibility
Standardisation
Note: JISC Standards Subcommittee
Proprietary
• De facto standards
• Often initially appealing (cf PowerPoint)
• May emerge as standards
Examples: HTML extensions, PDF and Java?
W3C
• Produces W3C Recommendations on Web protocols
• Managed approach to developments
• Protocols initially developed by W3C members
• Decisions made by W3C, influenced by member and public review
Examples: PNG, HTML, HTTP
ISO
• Produces ISO Standards
• Can be slow moving and bureaucratic
• Produce robust standards
Examples: PNG, HTML, Z39.50, Java?
IETF
• Produces Internet Drafts on Internet protocols
• Bottom-up approach to developments
• Protocols developed by interested individuals
• "Rough consensus and working code"
Examples: HTTP, URN, whois++
The Web Vision
Tim Berners-Lee's vision for the Web:
• Automation of information management:
If a decision can be made by machine, it should
• All structured data formats should be based on
XML
• Migrate HTML to XML
• All logical assertions to map onto RDF model
• All metadata to use RDF
Standards
Need for standards to provide:
• Platform independence
• Application independence
• Avoidance of patented technologies
• Flexibility ("evolvability" - Tim Berners-Lee)
• Architectural integrity
• Long-term access to data
Ideally look at standards first, then find applications
which support the standards
Difficult to achieve this ideal!
Web Protocols
Web initially based on three simple protocols:
[Diagram: Data Format (HTML), Addressing (URL), Transport (HTTP)]
• Data Formats
HTML (HyperText Markup Language)
provides the data format for native documents
• Addressing
URLs (Uniform Resource Locators) provide an
addressing mechanism for web resources
• Transport
HTTP (HyperText Transfer Protocol) defines
transfer of resources between client and server
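How addressing and transport fit together can be sketched in a few lines of Python. This is a minimal illustration rather than a real client: the helper name `build_get_request` is an assumption made for this example, and nothing is sent over the network. The URL is split into its parts, and those parts determine which host to contact and what HTTP/1.0 request text to send.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_text) for a simple HTTP/1.0 GET.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname          # addressing: which server to contact
    port = parts.port or 80        # default HTTP port when none is given
    path = parts.path or "/"       # addressing: which resource on that server
    if parts.query:
        path += "?" + parts.query
    # transport: the request line, followed by a blank line to end the headers
    request = f"GET {path} HTTP/1.0\r\n\r\n"
    return host, port, request


host, port, request = build_get_request("http://www.ukoln.ac.uk/")
print(host, port)
print(repr(request))
```

The server's reply would normally carry an HTML document, which is the third of the three components.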
HTML History
HTML 1.0 Unpublished specification. DTD
developed by Tim Berners-Lee (CERN).
HTML 2.0 Spec. based on innovations from NCSA
(forms and inline images!)
HTML 3.0 Proposed spec. (renamed from HTML+).
Very comprehensive
Failed to complete IETF standardisation
Little implementation experience
Proprietary Introduction of proprietary HTML elements
by Netscape and Microsoft
HTML 3.2 Spec. based on description of mainstream
innovations in marketplace
HTML 4.0 Current recommendation
Problems with Extensions
Device Dependency
• Resources are dependent on a particular browser
• Platform dependency
Costs
• Potential costs in re-engineering
Architecture
• Proprietary innovations have been flawed:
– Merging content and appearance
– Maintenance of resources
• Accessibility problems:
– Poor support for access by disabled
But:
• Experiments are needed
HTML 4.0, CSS 2.0 and DOM
HTML 4.0 used in conjunction with CSS 2.0
(Cascading Style Sheets) and the DOM provides an
architecturally pure, yet functionally rich environment
HTML 4.0
• Improved forms
• Hooks for stylesheets
• Hooks for scripting languages
• Table enhancements
• Better printing
CSS 2.0
• Support for all HTML formatting
• Positioning of HTML elements
• Multiple media support
DOM
• Document Object Model
• Hooks for scripting languages
• Permits changes to HTML & CSS properties and content
CSS Problems
• Changes during CSS development
• Netscape & IE incompatibilities
• Continued use of browsers with known bugs
HTML Limitations
HTML 4.0 / CSS 2.0 have limitations:
• Difficulties in introducing new elements
– Time-consuming standardisation process
()
– Dictated by browser vendor (, )
• Area may be inappropriate for standardisation:
– Covers specialist area (maths, music, ...)
– Application-specific ()
• HTML is a display (output) format
• HTML's lack of arbitrary structure limits
functionality:
– Find all memos copied to John Smith
– How many unique tracks on Jackson Browne CDs
XML
XML:
• Extensible Markup Language
• A lightweight SGML designed for network use
• Addresses HTML's lack of evolvability
• Arbitrary elements can be defined (, , etc)
• Agreement achieved quickly - XML 1.0 became
W3C Recommendation in Feb 1998
• Support from industry (SGML vendors, Microsoft,
etc.)
• Support in Netscape 5 and IE 5
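Why arbitrary elements matter can be shown with a small sketch. With structure in the markup, a query such as "find all memos copied to John Smith" becomes a simple tree search. The element names (`memo`, `to`, `cc`, `subject`) and the sample data are assumptions invented for this illustration; they do not come from any published DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo markup; element names are made up for this example.
MEMOS = """
<memos>
  <memo>
    <to>Jane Jones</to>
    <cc>John Smith</cc>
    <subject>Budget</subject>
  </memo>
  <memo>
    <to>John Smith</to>
    <subject>Holiday rota</subject>
  </memo>
  <memo>
    <to>Ann Brown</to>
    <cc>John Smith</cc>
    <cc>Jane Jones</cc>
    <subject>Web standards</subject>
  </memo>
</memos>
"""


def memos_copied_to(xml_text, name):
    """Return the subjects of memos that have a <cc> element naming `name`."""
    root = ET.fromstring(xml_text)
    return [
        memo.findtext("subject")
        for memo in root.iter("memo")
        if any(cc.text == name for cc in memo.findall("cc"))
    ]


subjects = memos_copied_to(MEMOS, "John Smith")
print(subjects)
```

The memo addressed directly to John Smith is not returned, because the query distinguishes `to` from `cc`. HTML offers no equivalent distinction.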
XML Concepts
Well-formed XML resources:
Make end-tags explicit: ...
Make empty elements explicit:
Quote attributes
Insert M-471
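The three rules can be checked mechanically, since any XML parser rejects input that is not well-formed. The following sketch uses Python's standard library parser. The two sample strings are illustrative assumptions, not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: omitted end-tags, unmarked empty elements, unquoted attribute
html_style = "<doc><p>First paragraph<p>Second<br><img src=logo.gif></doc>"

# The same content with end-tags explicit, empty elements explicit, attribute quoted
xml_style = '<doc><p>First paragraph</p><p>Second<br/><img src="logo.gif"/></p></doc>'

print(is_well_formed(html_style))
print(is_well_formed(xml_style))
```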
XML Deployment
Ariadne issue 15 has
article on "What Is XML?"
Describes how XML
support can be provided:
• Natively by new browsers
• Back end conversion
of XML - HTML
• Client-side conversion
of XML - HTML / CSS
• Java rendering of XML
Examples of intermediaries
See http://www.ariadne.ac.uk/issue15/what-is/
XLink, XPointer and XSL
XLink will provide sophisticated
hyperlinking missing in HTML:
• Links that lead user to multiple destinations
• Bidirectional links
• Links with special behaviors:
– Expand-in-place / Replace / Create new window
– Link on load / Link on user action
• Link databases
XPointer will provide
access to arbitrary
portions of XML resource
XSL stylesheet language will provide extensibility and
transformation facilities (e.g. create a table of contents)
Adobe PDF
NOTE: PDF is not a W3C activity
Adobe PDF:
Proprietary format
Provides control over document appearance
(originally lacking in HTML)
Lack of support for document structure
Requires proprietary (though free) plugin (Acrobat)
Proprietary plugin provides richer functionality (e.g.
suppress printing)
Development work on improved hyperlinking
Becoming more open?
Conclusion
• Acceptable output format?
Addressing
URLs (e.g. http://www.bristol-poly.ac.uk/depts/music/) have limitations:
• Lack of long-term persistency
– Organisation changes name
– Department scrapped
– Directory structure reorganised
• Inability to support multiple versions of resources
(mirroring)
URNs (Uniform Resource Names):
• Proposed as solution
• Difficult to implement (no W3C activity in this
area)
Addressing - Solutions
DOIs (Document Object Identifiers):
• Proposed by publishing industry as a solution
• Aimed at supporting rights ownership
• Business model needed
PURLs (Persistent URLs):
• Provide single level of redirection
Cache support:
• National caches could provide simple URN
support
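The single level of redirection behind PURLs can be sketched as a lookup table consulted by a resolver. This is only an illustration of the idea: the function `resolve`, the path `/net/music-dept` and the replacement address are hypothetical, and a real resolver would sit behind an HTTP server.

```python
# Hypothetical resolver table: persistent path -> current location.
purl_table = {
    "/net/music-dept": "http://www.bristol-poly.ac.uk/depts/music/",
}


def resolve(purl_path, table):
    """Return (status, headers) as a PURL-style resolver might answer."""
    target = table.get(purl_path)
    if target is None:
        return 404, {}
    # A redirect sends the client on to wherever the resource lives today.
    return 302, {"Location": target}


status, headers = resolve("/net/music-dept", purl_table)
print(status, headers)

# The organisation changes name: only the resolver table is updated,
# while every published link to the persistent path keeps working.
purl_table["/net/music-dept"] = "http://www.example.ac.uk/music/"
status_after, headers_after = resolve("/net/music-dept", purl_table)
print(status_after, headers_after)
```

The persistence therefore depends on someone maintaining the table, which is an organisational commitment rather than a technical one.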
For further information see:
Transport
HTTP/0.9 and HTTP/1.0:
Made the Web popular
Design flaws and implementation problems
caused poor performance
HTTP/1.1:
Addresses some of these problems
60% server support, client & proxy support
beginning
Performance benefits! (optimised implementation
reduces packet traffic by 2/3)
Is acting as fire-fighter
Poor usage counting
Not sufficiently flexible or extensible
HTTP/NG
HTTP/NG:
• Two W3C Working Groups:
Web Characterisations:
Study Web usage and form requirements
New log format for easier collection & anonymisation
Protocol Design:
Redesign Web as distributed object application
• Transition to HTTP/NG will be gradual
– Use of proxies / HTTP/1.1 UPGRADE header
– Layer HTTP/NG on top of existing HTTP using POST
• Distributed searching as HTTP/NG application?
• W3C Briefing Package due out on 7 July
Metadata
Metadata - the missing architectural component from the initial implementation of the web
[Diagram: Metadata alongside Addressing (URL), Transport (HTTP) and Data format (HTML)]
Metadata Needs:
• Resource discovery
• Content filtering
• Authentication
• Improved navigation
• Multiple format support
• Rights management
Privacy
Note: Relevant to Jun 98 lis-elib discussion
P3P (Platform for Privacy Preferences):
• Example of a metadata application
• Privacy concerns are a current barrier to Web
development (esp. in US)
• P3P project developing methods for exchanging
Privacy Practices of Web sites and user
• Documents on architecture and vocabulary
available
• P3P1.0 draft spec released on 19 May 1998
• See
Digital Signatures
DSig (Digital Signatures initiative):
• Key component for providing trust on the web
• DSig 1.0 is based on PICS
• DSig 2.0 will be based on RDF and will
support signed assertions:
– This page is from the University of Bath
– This page is a legally-binding list of courses
provided by the University
• Potential for use in authentication but:
– Little activity in this area in W3C
– Implementation would require expensive
infrastructure
RDF
RDF (Resource Description Framework):
• Highlight of WWW 7 conference
• Provides a metadata framework ("machine
understandable metadata for the web")
• Based on ideas from content rating (PICS),
resource discovery (Dublin Core) and site
mapping (MCF)
• Applications include:
– cataloging resources – resource discovery
– electronic commerce – intelligent agents
– digital signatures – content rating
– intellectual property rights – privacy
• See
RDF Model
RDF:
• Based on a formal data model (directed labelled graphs)
• Syntax for interchange of data
• Schema model
[Figure: RDF Data Model. Node and arc labels: Resource, PropertyType, Value, Property. Examples: page.html, Cost, £0.05; and the same property with ValidUntil 11-May-98 attached, drawn with PropObj, PropName, Value and InstanceOf arcs.]
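The data model can be sketched as a set of labelled arcs between nodes. The `Graph` class below is a hypothetical illustration written for these notes, not part of any RDF toolkit, and the node name `prop1` is an assumed identifier. It records the slide's example (page.html has Cost £0.05) and then describes that property as a node in its own right so that a ValidUntil date can be attached to it.

```python
from collections import defaultdict


class Graph:
    """A tiny directed labelled graph: resource --property type--> value."""

    def __init__(self):
        self._arcs = defaultdict(list)

    def add(self, resource, property_type, value):
        self._arcs[resource].append((property_type, value))

    def values(self, resource, property_type):
        return [v for p, v in self._arcs.get(resource, []) if p == property_type]


g = Graph()

# Simple statement: page.html has Cost £0.05
g.add("page.html", "Cost", "£0.05")

# The same property described as a node, so more can be said about it.
# "prop1" is an assumed identifier for that node.
g.add("prop1", "InstanceOf", "Property")
g.add("prop1", "PropObj", "page.html")
g.add("prop1", "PropName", "Cost")
g.add("prop1", "Value", "£0.05")
g.add("prop1", "ValidUntil", "11-May-98")

print(g.values("page.html", "Cost"))
print(g.values("prop1", "ValidUntil"))
```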
RDF Example
Example of Dublin Core metadata in RDF
John Smith
John’s Home Page
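A record of this kind can be generated with a few lines of Python. Note the assumptions: the namespace URIs and the `rdf:Description` / `rdf:about` form are those of the later RDF and Dublin Core specifications, not necessarily the draft syntax shown on the original slide, and the page address is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the later RDF and Dublin Core specifications (assumption:
# the 1998 draft shown in the talk may have used different ones).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(url, creator, title):
    """Build an RDF/XML description of `url` with Dublin Core creator and title."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url}
    )
    ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    return ET.tostring(rdf, encoding="unicode")


# Hypothetical page address, used only for illustration.
record = describe("http://www.example.org/~john/", "John Smith", "John's Home Page")
print(record)
```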
Browser Support for RDF
Mozilla (Netscape's source code release) provides support for RDF.
Mozilla supports site maps in RDF, as well as bookmarks and history lists
See Netscape's or HotWired home page for a link to the RDF file.
[Figure labels: Trusted 3rd Party Metadata; Embedded Metadata, e.g. sitemaps]
Image from http://purl.oclc.org/net/eric/talks/www7/devday/
RDF Conclusion
RDF is a general-purpose framework
RDF provides structured, machine-
understandable metadata for the Web
Metadata vocabularies can be developed
without central coordination
Role for eLib projects in defining schemas?
RDF Schemas describe the meaning of
each property name
Signed RDF is the basis for trust
Languages
Java
• Powerful platform independent object-oriented
system with:
• Language • Java Virtual Machine • Chip • OS
• Owned by Sun but being standardised by ISO
• Beware Microsoft Java DK
• "This is the year the performance problem is solved"
• See
ECMAScript
• Standardised version of JavaScript
• Important role in DHTML, DOM, XSL, ...
• See
WAI
WAI (Web Accessibility Initiative):
• Ensures web specs address accessibility issues
• Based on universal design principles
Authoring:
• Page Author Accessibility Checklist and Guidelines
draft at
Software
• WAI Accessibility Guidelines: User Agent draft at
Note
• JISC DISinHE project at Dundee University.
See
Distributed Searching
Distributed searching important for the DNER
(Distributed National Electronic Resource)
ROADS prototype provides cross-searching using whois++
AHDS prototype provides cross-searching using Z39.50 (http://prospero.ahds.ac.uk:8080/ahds_live/)
Distributed Searching Issues
Providing access to resources by software rather than
by humans raises several issues:
• Loss of visibility of service / value-added web services
• Possible performance problems
• Information overload
• Finding the service
Solutions:
• Giving visibility and pointers in results sets
• Service metadata:
– Service only available for cross-searching by non AC.UK
users outside peak hours
– Service covers UK Census data
• Need for agreed metadata standards (profiles, rights issues, …)
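One way of giving visibility and pointers in result sets is for the cross-search software to tag every hit with the service it came from. The sketch below assumes in-memory services with made-up names, addresses and records. A real deployment would query whois++ or Z39.50 targets instead of Python lists.

```python
def make_service(name, home, records):
    """Build a hypothetical searchable service backed by a list of titles."""

    def search(query):
        q = query.lower()
        return [r for r in records if q in r.lower()]

    return {"name": name, "home": home, "search": search}


def cross_search(query, services):
    """Query every service and keep a pointer back to the originating service."""
    hits = []
    for service in services:
        for title in service["search"](query):
            hits.append(
                {"title": title, "service": service["name"], "home": service["home"]}
            )
    return hits


# Made-up services and records, for illustration only.
services = [
    make_service(
        "Gateway A",
        "http://gateway-a.example.ac.uk/",
        ["Census 1991 tables", "Web standards guide"],
    ),
    make_service(
        "Archive B",
        "http://archive-b.example.ac.uk/",
        ["UK Census data", "Music scores"],
    ),
]

hits = cross_search("census", services)
for hit in hits:
    print(f'{hit["title"]} (from {hit["service"]}, {hit["home"]})')
```

Service metadata such as opening hours or subject coverage could be held in the same dictionaries and checked before a service is queried.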
Deployment Issues
More sophisticated deployment techniques can be
adopted to overcome deficiencies in simple model
Original Model
[Diagram: HTML resource, Web server, Web browser]
Web server simply sends file to client
File contains redundant information (for old browsers) plus client interrogation support
Sophisticated Model
[Diagram: HTML / XML / database resource, intelligent server, server proxy, client proxy, Web browser]
Intermediaries can provide functionality not available at client:
• DOI support
• XML support
• Format conversion
[Figure: Example of an intermediary]
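Format conversion at an intermediary can be sketched as a function sitting between server and client. If the client cannot handle XML, the intermediary converts the resource to HTML before passing it on. The function names, the table layout and the sample memo are assumptions made for this illustration. A real proxy would decide from the request headers rather than a boolean flag.

```python
import xml.etree.ElementTree as ET
from html import escape


def xml_to_html(xml_text):
    """Render each child of the root element as a row of an HTML table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in root
    )
    return f"<table>{rows}</table>"


def intermediary(body, content_type, client_accepts_xml):
    """Pass the resource through, converting XML to HTML for older clients."""
    if content_type == "text/xml" and not client_accepts_xml:
        return xml_to_html(body), "text/html"
    return body, content_type


# Hypothetical XML resource held on the server.
memo = "<memo><to>Jane Jones</to><cc>John Smith</cc></memo>"

html_body, html_type = intermediary(memo, "text/xml", client_accepts_xml=False)
xml_body, xml_type = intermediary(memo, "text/xml", client_accepts_xml=True)
print(html_type, html_body)
print(xml_type, xml_body)
```

The resource itself stays in XML on the server, so newer clients lose nothing while older ones still get something they can display.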
Conclusions
To conclude:
• Standards are important, especially for national
initiatives, such as eLib
• Proprietary solutions are often tempting because:
– They are available
– They are often well-marketed and well-supported
– They may become standardised
– Solutions based on standards may not be properly
supported by applications
• Intermediaries may have a role to play in deploying
standards-based solutions
• Opportunity for involvement with standards bodies
(e.g. W3C Working Groups)
Question Time
Any questions?