HTML (HyperText Markup Language)



            WANDA Y. McLEAN
             10 MARCH 2002
                            TABLE OF CONTENTS

I. Abstract

II. In the Beginning

       a. SGML (Standard Generalized Markup Language)

       b. HTML (HyperText Markup Language)

              1. The Evolution of HTML

              2. Elements and Structure of an HTML Document

              3. The Limitations of HTML

III. The Present

       a. What is XML (eXtensible Markup Language)?

       b. The Benefits of XML

       c. XML Specifications

       d. How Does XML differ from HTML?

       e. How Does XML Work?

       f. The XML Document

       g. XHTML (eXtensible HyperText Markup Language)

IV. The Future

       a. Language Applications

       b. Developing Technology

V. Conclusion

Abstract

    The Extensible Markup Language (XML) is the trend that will replace HTML as the standard
for Web-based applications. It has been taking the Internet world by storm. Developed under the
auspices of the World Wide Web Consortium (W3C) beginning in 1996, XML is intended to make it
easy and straightforward to use SGML-defined documents, and easy to transmit and share them
across the Web. XML is not a presentation language, but by providing a common language for
describing data, it will enable more precise searching, let businesses share data more
efficiently, and make navigating data much easier.

In The Beginning
   To fully understand how and why XML came about, we must trace it back to its roots, starting
with SGML (Standard Generalized Markup Language).

SGML (Standard Generalized Markup Language)

   XML and HTML both grew out of the Standard Generalized Markup Language (SGML), the
international standard for describing the structure of different types of electronic
documents. SGML is a metalanguage, a language that lets you create other languages. It is very
large, powerful, and complex. SGML allowed various groups and industries (such as the airline
industry) to define the tags, elements, and attributes specific to their applications in
Document Type Definitions (DTDs) (Gottesman, 2001).

   SGML is the mother tongue, and has been used for describing thousands of different
document types in many fields of human activity, from transcriptions of ancient Irish
manuscripts to the technical documentation for stealth bombers, and from patients’ clinical
records to musical notation. SGML is very large and complex, however, and probably overkill
for most common applications (Flynn, 2002).

HTML (HyperText Markup Language)

   The great enabling technology of the Internet has been HTML, the language used to define
the presentation of all the pages we encounter on the Web. HTML has given us the ability to
combine text and graphics on a page and create an intricate system of hyperlinks between pages.
But HTML isn’t very useful when it comes to describing information (Gottesman, 2001).

   HTML was designed to specify the logical organization of a document, with important
hypertext extensions. It was not designed to be the language of a word processor such as Word
or WordPerfect. This choice was made because the same HTML document may be viewed by
many different “browsers” of very different abilities. Thus, for example, HTML allows you to
mark selections of text as titles or paragraphs, and then leaves the interpretation of these marked
elements up to the browser (Graham, 1998).

   It defines a very simple class of report-style documents, with section headings, paragraphs,
lists, tables, and illustrations, with a few informational and presentational items and some
hypertext and multimedia (Flynn, 2002).

The Evolution of HTML

   HTML 2.0: The first definitive version. It had most of the elements we know today;
however, it lacked some of the Netscape/Microsoft extensions and it did not support tables or
ALIGN attributes.

   HTML 3: It was an attempt to upgrade the features and utilities of the earlier version. This
version was never completed or implemented.

   HTML 3.2: The next official version. It integrated many of the upgraded features of HTML
3 along with support for TABLES, image, heading and other ALIGN attributes. It still lacked
some of the Netscape/Microsoft extensions, such as FRAMEs, EMBED, and APPLET.

   HTML 4.01: The current official standard. It includes support for most of the proprietary
extensions, plus support for extra features that are not universally supported (internationalized
documents, Cascading Style Sheets, and extra TABLE, FORM, and JavaScript enhancements)
(Graham, 1998).

Elements and Structure of an HTML Document

   HTML instructions divide the text of a document into blocks called elements. These can be
divided into two broad categories: those that define how the BODY of the document is to be
displayed by the browser, and those that define information about the document, such as the title
or relationships to other documents. The HTML instructions are themselves called tags, and
look like <element_name>; that is, they are simply the element name surrounded by left and
right angle brackets. Some elements are empty; that is, they do not affect a block of the
document. These elements do not require an ending tag. Also, element names are
case-insensitive (Graham, 1998).
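
   As a rough illustration of tags, empty elements, and case-insensitive element names, the
following Python sketch feeds a small piece of markup to the standard library's HTML parser and
records the tag names it reports. The sample markup and the names TagCollector and collect_tags
are invented for this illustration and do not come from the sources cited above.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag names already converted to lowercase.
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


def collect_tags(markup):
    """Parse the markup and return (start tag names, end tag names)."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags, collector.end_tags


# Hypothetical sample: mixed-case tags and an empty element (BR) with no end tag.
sample = "<P>First line<BR>second line</p>"
starts, ends = collect_tags(sample)
print(starts)
print(ends)
```

   The <P> start tag and the </p> end tag are reported under the same name, and BR appears only
in the list of start tags because an empty element has no ending tag.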

   Many elements can have arguments that pass parameters to the interpreter handling the
element. These arguments are called attributes of the element (Graham, 1998). An attribute is
additional information included inside the start tag. For example, you can specify the
font or size of a character (48-point Times New Roman) by including the appropriate attribute
in the start tag of the element that encloses it.

   HTML documents are structured into two parts, the HEAD and the BODY. Both of these are
contained within the HTML element; this element simply denotes this as an HTML document.
The HEAD contains information about the document that is not generally displayed with the
document, such as the TITLE. The BODY contains the body of the text, and is where you place
the document material to be displayed. Elements allowed inside the HEAD, such as TITLE, are
not allowed inside the BODY, and vice versa (Graham, 1998).

   The following is an example of the document structure of HTML:

    <HTML>
    <HEAD>
    <TITLE> A Sample HTML Document </TITLE>
    </HEAD>
    <BODY>
    <h1> XML is Changing the Way We Do Business on the Web </h1>

     <p>The Extensible Markup Language, HTML’s likely successor for capturing web content, has generated
     a lot of interest. Created by the World Wide Web Consortium to address HTML’s limitations, XML
     resembles HTML’s format but offers users a more extensible language (Rosenthal and Seligman, 2001).</p>

    <p>Is this a true statement? We shall find out.</p>
    </BODY>
    </HTML>

When displayed by a browser, the code would appear on a Web page as follows:

XML is Changing the Way We Do Business on the Web
The Extensible Markup Language, HTML’s likely successor for capturing web content, has
generated a lot of interest. Created by the World Wide Web Consortium to address HTML’s
limitations, XML resembles HTML’s format but offers users a more extensible language
(Rosenthal and Seligman, 2001).

Is this a true statement? We shall find out.

The Limitations of HTML

   HTML doesn’t lend itself to the creation of dynamic designs. It consists of markup tag
containers that tell a browser client how to display content. HTML has rigid and limited
formatting, a lack of flexibility, and links that are easy to break. HTML browser clients don’t
validate documents properly; they ignore syntax violations. HTML has no real concept of
structure, resulting in lifeless and flat documents. But worst of all, HTML is not extensible, so
you can’t express content display without adding proprietary extensions (Internet Magazine,
1999).

The Present
   XML was developed by an XML Working Group (originally known as the SGML Editorial
Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was
chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special
Interest Group (previously known as the SGML Working Group) also organized by the W3C.

What is XML (eXtensible Markup Language)?

   XML was designed to improve the functionality of the Web by providing more flexible and
adaptable information identification (Flynn, 2002). It promises to move the Internet forward
into a whole new and more powerful era of computing, and in every way imaginable, not just for
searching. XML is not a presentation language, but by providing a common language for
describing data, it will enable more precise searching, let businesses share data more efficiently,
and make navigating data much easier (Gottesman, 2001).

   XML is intended to make it easy and straightforward to use SGML on the Web: easy to
define document types, easy to author and manage SGML-defined documents, and easy to
transmit and share them across the Web. It defines an extremely simple dialect of SGML which
is completely described in the XML Specification. The goal is to enable generic SGML to be
served, received, and processed on the Web in the way that is now possible with HTML (Flynn,
2002).

   XML’s main purpose is the interchange of hierarchical data. XML makes it easier for
different companies, different departments within the same company, different applications, or
even different portions of the same program to communicate in a well-ordered yet flexible way
(Feldman, 1999).

   XML allows the flexible development of user-defined document types. It provides a robust,
non-proprietary, persistent, and verifiable file format for the storage and transmission of text and
data both on and off the Web; and it removes the more complex options of SGML, making it
easier to program for (Flynn, 2002).

   What makes XML so powerful is that any type of data, even abstract data concepts, can be
given form and structure. You give data concepts, such as customer and inventory, form by
describing their components and the relationship between those components. Instead of a
customer, you have a specified structure that describes customer-related information, such as
customer name, account number, and address. Once you have created the structure, you group
the data together in a document and serve it up to the world (Gottesman, 2001).
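
   The customer structure described above can be sketched in a few lines of Python using the
standard library's ElementTree module. This is only a minimal illustration: the element names
(customer, name, accountNumber, address), the helper build_customer, and the sample values are
assumptions made for the example, not a structure defined by any of the cited sources.

```python
import xml.etree.ElementTree as ET


def build_customer(name, account_number, address):
    """Return a <customer> element with one child per piece of information."""
    customer = ET.Element("customer")
    ET.SubElement(customer, "name").text = name
    ET.SubElement(customer, "accountNumber").text = account_number
    ET.SubElement(customer, "address").text = address
    return customer


# Hypothetical sample data, invented for illustration.
customer = build_customer("Jane Doe", "A-1001", "12 Main Street")

# Serialize the structure to text so it can be stored or sent to another system.
xml_text = ET.tostring(customer, encoding="unicode")
print(xml_text)
```

   The abstract idea of a customer becomes a concrete, named hierarchy that any XML-aware
program can read back without knowing how the sending application stores its data.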

  The Benefits of XML

   In business-to-business electronic commerce systems, organizations can move data between
companies without having to invent custom, special-purpose applications at both ends. There is
no need to invent new mechanisms to move the data around: most data will travel between
companies using HTTP and message queuing technologies, such as Microsoft’s Message Queue
Server and IBM’s MQSeries (McFadden, 2000).

   Another important benefit is in the integration of e-commerce systems and existing back office
systems. Older applications often have no way of understanding XML. If legacy systems can be
modified to accept and publish XML documents, or alternatively use middleware for that task,
they can suddenly be a part of an integrated business-to-business e-commerce strategy
(McFadden, 2000).

   In addition, every major database vendor is committed to accepting and issuing XML
documents as a feature of their databases. This means that an elderly order processing system
based on IBM’s DB2 can talk directly to a new, Internet-enabled order entry system based on
Microsoft’s Site Server, Commerce Edition. That is, as long as they understand each other
(ENT, 2000).

XML Specifications

   According to Norman Walsh, the W3C had ten very specific goals it wanted to
accomplish with the development of XML:

   1. It shall be straightforward to use XML over the Internet. Users must be able to view
      XML documents as quickly and easily as HTML documents. In practice, this will only be
      possible when XML browsers are as robust and widely available as HTML browsers, but
      the principle remains.

   2. XML shall support a wide variety of applications. XML should be beneficial to a wide
      variety of diverse applications: authoring, browsing, content analysis, etc. Although the
      initial focus is on serving structured documents over the Web, it is not meant to narrowly
      define XML.

   3. XML shall be compatible with SGML. Most of the people involved in the XML effort
      come from organizations that have a large, in some cases staggering, amount of material
      in SGML. XML was designed pragmatically to be compatible with existing standards
      while solving the relatively new problem of sending richly structured documents over the
      Web.

   4. It shall be easy to write programs that process XML documents. The colloquial way of
      expressing this goal while the spec was being developed was that it ought to take about
      two weeks for a competent computer science graduate student to build a program that can
      process XML documents.

   5. The number of optional features in XML is to be kept to an absolute minimum, ideally
      zero. Optional features inevitably raise compatibility problems when users want to share
      documents and sometimes lead to confusion and frustration.

   6. XML documents should be human-legible and reasonably clear. If you don’t have an
      XML browser and you’ve received a hunk of XML from somewhere, you ought to be able
      to look at it in your favorite text editor and actually figure out what the content means.

   7. The XML design should be prepared quickly. Standards efforts are notoriously slow.
      XML was needed immediately and was developed as quickly as possible.

   8. The design of XML shall be formal and concise. In many ways a corollary to goal 4, it
      essentially means that XML must be expressed in EBNF and must be amenable to modern
      compiler tools and techniques. There are a number of technical reasons why the SGML
      grammar cannot be expressed in EBNF. Writing a proper SGML parser requires handling
      a variety of rarely used and difficult-to-parse language features. XML does not.

   9. XML documents shall be easy to create. Although there will eventually be sophisticated
      editors to create and edit XML content, they won’t appear immediately. In the interim, it
      must be possible to create XML documents in other ways: directly in a text editor, with
      simple shell and Perl scripts, etc.

   10. Terseness in XML markup is of minimal importance. Several SGML language features
       were designed to minimize the amount of typing required to manually key in SGML
       documents. These features are not supported in XML. From an abstract point of view,
       these documents are indistinguishable from their more fully specified forms, but
       supporting these features adds a considerable burden to the SGML parser (or the person
       writing it, anyway). In addition, most modern editors offer better facilities to define
       shortcuts when entering text (Walsh, 1998).

How Does XML differ from HTML?

   XML is not designed to replace HTML; the two languages have significantly different aims
and can, in some cases, be complementary. Specifically, unlike HTML, XML isn’t designed to
format your data. Instead, its purpose is to organize it. In HTML, you apply tags to add
formatting: to let the browser know which text to bold, which text to italicize, and which text to
make a hyperlink. In XML, on the other hand, you use tags to classify your data into its parts
and subparts (Feldman, 1999).

   XML is used to define the structure of data rather than to describe how the data will be
presented. You define data structures using markup tags. Unlike HTML, XML lets you define
your own tags, giving you control over the document structure. You can also define attributes
for XML tags. Most HTML attributes are used for formatting, but most XML attributes provide
additional information regarding the data structure—for example, a flag that determines whether
a customer account can be processed (Gottesman, 2001).
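
   To make the idea of user-defined tags and data-describing attributes concrete, the sketch
below reads a small document in which each customer carries a flag saying whether the account can
be processed. The tag names, the processable attribute, and the function processable_names are
hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names and the "processable" attribute are invented.
CUSTOMERS_XML = """
<customers>
  <customer processable="yes">
    <name>Jane Doe</name>
  </customer>
  <customer processable="no">
    <name>John Roe</name>
  </customer>
</customers>
"""


def processable_names(xml_text):
    """Return the names of customers whose processable attribute is "yes"."""
    root = ET.fromstring(xml_text)
    return [
        customer.findtext("name")
        for customer in root.findall("customer")
        if customer.get("processable") == "yes"
    ]


names = processable_names(CUSTOMERS_XML)
print(names)
```

   Nothing in the markup says how a customer should look on screen; the tags and the attribute
only describe what the data is, which is what allows a program to act on it.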

   Whereas HTML is a specific instance of SGML (a fixed set of tags specifically designed to
display Web pages), XML is a metalanguage, like SGML, that grew out of a need for
something simpler than SGML to describe documents on the Web (Gottesman, 2001). XML
makes it easier for you to define your own document types, and makes it easier for programmers
to write programs to handle them. It omits all the options, and most of the more complex and
less-used parts of SGML, in return for the benefits of being easier to write applications for, easier
to understand, and more suited to delivery and interoperability over the Web. But it is still
SGML, and XML files may still be processed in the same way as any other SGML file (Flynn,
2002).

How Does XML Work?

   The power in XML comes from interaction with the Document Object Model (DOM), the
interface to a document’s structure that defines the mechanics it will use for data access. The
DOM lets you produce standardized scripts for dynamic content, so you can use specific content
to create a specific action. Adding XML to the equation gives you access to the technologies
needed to build the next generation of Web-based applications (Internet Magazine, 1999).
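
   A small sketch may help show what using specific content to create a specific action can
look like through the DOM. It relies on Python's xml.dom.minidom, one implementation of the DOM
interface; the catalog document, the expensive attribute, and the function mark_expensive are
assumptions invented for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog, invented for illustration.
CATALOG_XML = "<catalog><item price='5'>Pen</item><item price='12'>Book</item></catalog>"


def mark_expensive(xml_text, threshold):
    """Use DOM calls to flag every <item> whose price exceeds the threshold."""
    document = parseString(xml_text)
    flagged = []
    for item in document.getElementsByTagName("item"):
        if int(item.getAttribute("price")) > threshold:
            # The content of the document drives a change to the document itself.
            item.setAttribute("expensive", "true")
            flagged.append(item.firstChild.data)
    return document, flagged


document, flagged = mark_expensive(CATALOG_XML, 10)
print(flagged)
print(document.documentElement.toxml())
```

   The same calls (getElementsByTagName, getAttribute, setAttribute) are defined by the DOM
itself, so a script written against them is not tied to one particular document type.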

   But in order to appreciate this power, you have to understand how XML works. An XML
document has three key structural elements: the Document Type Definition (DTD), the Extensible
Style Language (XSL), and the Extensible Link Language (XLL).

   The DTD concept, inherited from SGML, defines the rules for document markup. XML
content cannot be validated without a DTD, although a well-formed document can still be parsed
without one. The DTD sets out what names are to be used for the
different types of elements, where they may occur, and how they all fit together. The DTD can
be either part of, or apart from, an XML document, but unlike HTML, XML has no universal
DTD. The benefit is that any industry or company wanting to use XML for data exchange
purposes can define its own DTDs to suit its specific purposes (Internet Magazine, 1999).

   XSL is more powerful than the Cascading Style Sheets (CSS) of HTML, so you can
create documents that change appearance dynamically without requiring additional interaction
with the server. Elements can be formatted and displayed in several places within a document,
and multiple style sheets can deliver one data set to different platforms or output devices.
XSL handles multiple tags in several ways, so it should bring advanced layout techniques to the
Web. It supports simple CSS objects and complex DSSSL concepts, such as flow objects, where
text can flow into a template and apply style construction (Internet Magazine, 1999).

   The XLL supports simple links in HTML documents, but it also works with advanced
concepts, such as extended links. Extended links work like a Web ring—clicking on an icon
could take you through a linked list of URLs. XLL also lets you identify resources by contextual
location, so you can create a link to a resource that has no unique identifier.

The XML Document

   An XML document can be broken into two basic parts: the header, which gives an XML
parser and XML applications information about how to handle the document; and the content,
which is the XML data itself (McLaughlin, 2001).

   The header is the XML declaration and is written as <?xml version="1.0"?>. The header
can also include an encoding, and whether the document is a standalone document or requires
other documents to be referenced for a complete understanding of its meaning:
<?xml version="1.0" encoding="UTF-8" standalone="no"?> (McLaughlin, 2001).

   The bulk of an XML document is the content which contains elements, attributes, and the data
that is entered by the user. The root element, which is the highest element of the XML
document, must be the first opening tag and the last closing tag within the document. It provides
a reference point that enables an XML parser or XML-aware application to recognize a
beginning and end to an XML document. The opening tag and its matching closing tag surround
all of the other data within the content of an XML document. XML specifies that there may be
only one root element per document.

   Elements, delimited by angle brackets, identify the nature of the content they surround. Some
elements may be empty, in which case they have no content. If an element is not empty, it
begins with a start-tag, <element>, and ends with an end-tag, </element>.

   There are rules that govern the creation of elements:

       1. Their names must start with a letter or underscore, and then may contain any number
of letters, numbers, underscores, hyphens, or periods. They may not contain embedded spaces.

       2. XML element names are also case-sensitive.

       3. Every opened element must in turn be closed.

If any of the rules for XML syntax are not followed in an XML document, the document is not
well-formed. A well-formed document is one in which all XML syntax rules are followed, and
all elements and attributes are correctly positioned (McLaughlin, 2001).
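
   The rules above can be tried out with any XML parser, since a conforming parser rejects a
document that is not well-formed. The following sketch uses Python's ElementTree for this; the
helper is_well_formed and the sample snippets are invented for illustration, and the check covers
well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets: one correct document and one violation per rule.
samples = {
    "valid": "<book><title>Java and XML</title></book>",
    "name starts with a digit": "<1book></1book>",
    "case mismatch": "<book></Book>",
    "element never closed": "<book><title>Java and XML</book>",
    "two root elements": "<book></book><book></book>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, ok)
```

   Only the first snippet is accepted. Unlike the HTML browsers described earlier, which ignore
syntax violations, the parser stops at the first one.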

   Once the elements have been defined, it is time to define the attributes. An attribute list
declaration first names the element and then lists its attributes. Each definition gives the
name of the attribute, the type of attribute, and whether the attribute is required or implied.
The following is an example of attributes:

       <!ATTLIST chapter
                  title           CDATA #REQUIRED
                  number          CDATA #REQUIRED>

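   After you have all the ingredients (header, elements, attributes, and data), you can create
an XML document such as the following (the enclosing books and book elements are illustrative,
supplied so that the document has a single root element):

          <?xml version="1.0"?>
          <books>
            <book><author>Brett McLaughlin</author><title>Java and XML</title></book>
            <book><author>Tom Willard</author><title>Buffalo Soldiers</title></book>
          </books>
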
XHTML (eXtensible HyperText Markup Language)

   The evolution of HTML has essentially stopped. During 1999, HTML 4.01 was recast in
XML, and the resulting XHTML 1.0 became a W3C Recommendation in January 2000.
XHTML brings the Web of the future to content authors today. XHTML is in many ways
similar to HTML, but is designed to work with XML and can be put to immediate use with
existing browsers by following a few simple guidelines. XHTML helps create standards that
provide richer Web pages on an ever-increasing range of browser platforms including cell
phones, televisions, cars, wallet-sized wireless communicators, kiosks, and desktops (W3C,
2000).

   XHTML is intended to be used in conjunction with tag sets from other XML vocabularies, so
that in principle, you can combine XHTML tags with SVG graphics or XML tags from any other
XML vocabulary.
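
   Combining vocabularies in this way depends on XML namespaces, which keep the tags of one
vocabulary apart from those of another. The sketch below parses a small XHTML page with an
embedded SVG drawing and looks up elements from each vocabulary separately. The page content is
invented for illustration; the two namespace names are the ones the W3C assigns to XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical page: XHTML markup with a small SVG drawing embedded in the body.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed vocabularies</title></head>
  <body>
    <p>A circle drawn with SVG:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40" />
    </svg>
  </body>
</html>"""

# Short prefixes used only in the search paths below.
namespaces = {"x": XHTML_NS, "s": SVG_NS}
root = ET.fromstring(PAGE)

paragraphs = [p.text for p in root.findall(".//x:p", namespaces)]
circles = root.findall(".//s:circle", namespaces)
print(paragraphs)
print(circles[0].get("r"))
```

   Because the whole page is well-formed XML, one parser handles both vocabularies, and each
element can be attributed to XHTML or SVG by its namespace.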

    The primary advantage of XHTML is its extensibility. Critics called for HTML to be
extended as the Web is used for more varied applications. The addition of new HTML
applications will often demand either making a browser increasingly incompatible with other
browsers, or modifying the standard – a painstakingly slow process led by a committee. But
because XHTML is based partly on XML, it can be extended by adding new elements without
altering the entire DTD that the document is based on (Kiely, 2000).

    The other major advantage of XHTML is referred to as both interoperability and portability.
Most Internet access is through browsers on desktop computers, though more and different types
of devices are constantly being introduced. Refrigerators, PDAs, digital TVs, and other
alternative platforms are being connected to the Internet, in part to access Web documents.
These devices obviously won’t have the processing power of a desktop computer, and browsers
on them will be less able to tolerate malformed markup when rendering a document. XHTML is
designed to make Web documents accessible and interoperable across platforms, in part by
enforcing a rigorous coding standard.

The Future
   While the use of XML to glue together business-to-business partnerships is growing, another,
larger trend is emerging. Entire industries are building XML languages, or schemas, that will
simplify communications and business processes between partners in those sectors (McFadden,

Language Applications

There is a plethora of markup languages that are appearing as a result of the XML standard:

       1. MathML (Mathematical Markup Language) – makes it easy for mathematicians to include
equations in their Web pages.

       2. Chemical Markup Language (CML) – lets chemists graphically render molecular
structures in Web pages.

       3. BiosequenceML (BSML) – used for exchanging and manipulating gene mapping

       4. Astronomical Instrument ML (AIML) – being developed by NASA to let ground-
based engineers control the SOFIA infrared telescope, as it flies miles through the air aboard a
jumbo jet.

       5. Open Financial Exchange (OFX) – used as the standard for exchange of financial data
between financial institutions, businesses, and customers.

       6. Wireless Markup Language (WML) – defines how Web content is accessed and
presented by small, handheld devices.

Developing Technology

   The use of XML for loosely-coupled application integration has become a high profile topic,
crucial as it is for the future conduct of electronic business transactions (Dumbill, 2000). The
following is a list of a few of the current technology proposals that use the XML protocol (the
reference used for the following is Dumbill, 2000):

       1. Advertising and Discovery of Services (ADS) – A proposal from IBM to allow Web
service providers to advertise the availability of their services. It allows for aggregation of this
service information at a well-known location, and it provides ways of linking content and

       2. Electronic Business XML Initiative: Transport/Routing and Packaging (ebXML) –
ebXML is a project hosted by the United Nations body for Trade Facilitation and Electronic
Business (UN/CEFACT) and OASIS to create an XML infrastructure for conducting electronic
business.

       3. Universal Description, Discovery and Integration of Business for the Web (UDDI) –
UDDI provides a framework for the description and discovery of business services on the Web.
It does this by using distributed Web-based registries of services, and conventions for accessing
those registries using SOAP (Simple Object Access Protocol).

Conclusion

    XML is becoming the language of choice for Web applications. It promises to move the
Internet forward into a whole new and more powerful era of computing and in every way
imaginable. XML has changed the way organizations conduct business-to-business e-commerce.
It allows organizations to move data between companies without having to invent custom,
special-purpose applications at both ends. The question has been asked, “Will XML replace
HTML?” The answer is no. Although HTML has reached its limits, XML was not made to
replace HTML but to complement it. HTML is still used today in many Web applications and
will continue to be used in the future. To extend the longevity of HTML, XHTML was
developed. XHTML, a combination of HTML and XML, brings the Web of the future to content
authors today.

   Since XML is a metalanguage, a language that creates other languages, many other languages
have been developed using the concept of XML. Some of these languages are MathML,
Chemical Markup Language (CML), and Open Financial Exchange (OFX). Technology is
changing at a fast pace and XML is a part of the change. There are many technology proposals that
include XML as the primary language. XML is the future of the Web.

References

Dumbill, E. (2000). XML Protocol Technology Reference. Retrieved March 9, 2002, from

Feldman, B. (1999). Get up to speed with XML. XML Magazine. Retrieved March 3, 2002, from

Flynn, P. (2002). The XML FAQ. Retrieved from

Gottesman, B. (2001). Why XML matters. Retrieved March 3, 2002, from

Graham, I. (2000). Introduction to HTML. Retrieved March 3, 2002, from

Internet Magazine (1999). A practical guide to XML. Retrieved March 3, 2002, from

Kiely, D. (2000). XHTML the best of two languages? XML Magazine. Retrieved March 9, 2002,
 from

Kyrnin, J. (2000). What is XML Content? Retrieved March 9, 2002, from

McFadden, M. (1998). HTML and XML: the Internet team. Retrieved March 3, 2002, from

McFadden, M. (2000). Sticking with XML: tying disparate systems together. Retrieved March 3,
 2002, from

McLaughlin, B. (2001). Java & XML. O’Reilly and Associates.

Simpson, J. (2000). Will XML replace HTML? Retrieved March 3, 2002, from

Wahlin, D. (2001). Back to basics: the XML fundamentals. XML Magazine. Retrieved March 3,
 2002, from

Walsh, N. (1998). What is XML? Retrieved March 3, 2002, from

Walsh, N. (1998). What do XML documents look like? Retrieved March 3, 2002, from

World Wide Web Consortium (W3C). (2000). Hypertext Markup Language Activity Statement.
 Retrieved March 9, 2002, from