					World Wide Web Development

     The World Wide Web

                                 Various Views

• In which we consider the evolution of the web in various ways:
   •   what pages are
   •   what they contain
   •   how they are presented
   •   how they are styled
   •   how they are syndicated
   •   how they react
   •   how they are delivered


• First, a bit of history.
• The World Wide Web was invented in approximately 1989 by Tim
  Berners-Lee.
• In 1990 he released the first web browser, called, appropriately
  enough, WorldWideWeb. It originally ran on a NeXT computer,
  and was later renamed Nexus.
• The main aim, at the time, was to use it to share information at
  CERN, the European Laboratory for Particle Physics.


The first web browser


• “In 1989 one of the main objectives of the WWW was to be a
  space for sharing information. It seemed evident that it should be
  a space in which anyone could be creative, to which anyone could
  contribute. The first browser was actually a browser/editor, which
  allowed one to edit any page, and save it back to the web if one
  had access rights.” (Berners-Lee, 2005)


• The key feature of the first web browser was hypertext - it allowed
  text in one document to link to another document.
• While hypertext had existed before (most notably in Apple’s
  HyperCard system), it was well implemented in early browsers, and
  is still the key feature today.

                          History: Compatibility

•   One of the major issues facing Berners-Lee was making
    browsers work on as many different operating systems as
    possible.
•   This isn’t surprising - at the time there were a variety of operating
    systems to cater for, and physicists tend to use the operating
    system that they are most comfortable with.
•   Three factors made this possible:
    1. Other browsers were rapidly developed.
    2. HTTP (HyperText Transfer Protocol) proved to be fast and platform
       independent.
    3. HTML (HyperText Mark-up Language) only provided minimal instructions
       on how to display pages, leaving most decisions to the browser.


• It is that last point that matters here: HTML was designed to
  provide only minimal instructions as to how a page should be
  displayed.
• Thus different browsers could make their own decisions on how to
  implement these instructions depending on what the computer
  was capable of.
• It is much harder (as we’ll see) to make more complex and
  specific instructions work exactly the same way on every browser
  and every operating system.


• The basic theory behind HTML is simple. If you take an existing
  string of text:
   This is a series of words. They should be laid out neatly, in two paragraphs,
     with bold and italics.
• The browser needs to be informed how to lay it out. When is the
  paragraph supposed to start? What is in bold? What is in italics?
• That’s where instructions come in.
• But there’s a problem: the only information that can be transmitted
  is text. How can the browser tell the difference between an
  instruction and text to display?


• The answer is to use tags.
• Tags are instructions contained within < >. Every time the browser
  sees those two symbols, it assumes that whatever lies between is
  an instruction.
• And anything not within those symbols is therefore assumed to be
  normal text that should be displayed.
• Note that if you want to display those symbols you’re in trouble,
  but there are ways around it.
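
As a rough sketch of the idea above, Python's standard html.parser module can split a string into tags and displayable text, and html.escape shows one of the "ways around" displaying the symbols themselves. The class name TagSplitter and the sample strings are made up for illustration; a real browser does far more than this.

```python
from html import escape
from html.parser import HTMLParser


class TagSplitter(HTMLParser):
    """Collects tags and displayable text separately (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tags = []   # instructions found between < and >
        self.text = []   # everything else, which would be displayed

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


splitter = TagSplitter()
splitter.feed("<p>Some words.</p>")
splitter.close()
print(splitter.tags)   # start tags seen by the parser
print(splitter.text)   # text left over for display

# To display the symbols themselves, they are written as character references.
print(escape("3 < 5 and 7 > 2"))
```

The escaped form uses &lt; and &gt; in place of the two symbols, so the browser treats them as text rather than as the start of an instruction.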


• The problems aren’t over. Not only do you need to say “start
  doing this”, but sometimes you also need to say “stop doing this”.
• For example, you want an instruction that says “make this bold”,
  but you’ll also need a second one to say “stop making this bold”.
• The solution was to use a forward slash:
   <>        is a start tag
   </ >      is a stop (end) tag
• For example:
   <b>       says “start making the text bold”
   </b>      says “stop making the text bold”
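
The start/stop idea can be sketched in Python with the standard html.parser module: a flag is switched on at the start tag and off at the end tag, and each run of text is recorded with the flag's current value. BoldTracker and the sample string are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class BoldTracker(HTMLParser):
    """Records each run of text together with whether bold was switched on."""

    def __init__(self):
        super().__init__()
        self.bold = False
        self.runs = []  # list of (text, is_bold) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold = True    # <b> means "start making the text bold"

    def handle_endtag(self, tag):
        if tag == "b":
            self.bold = False   # </b> means "stop making the text bold"

    def handle_data(self, data):
        self.runs.append((data, self.bold))


tracker = BoldTracker()
tracker.feed("plain <b>bold</b> plain again")
tracker.close()
print(tracker.runs)
```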


• Both concepts come from another “markup language” - the
  Standard Generalised Markup Language (SGML) which
  Berners-Lee had previously been exposed to.
• SGML has its roots in the 1960s, and also used tags.
• HTML was very similar to SGML: however, it wasn’t quite as
  rigorous, and therefore early versions weren’t compatible.

                          HTML: Basic Tags

• Some of the most basic HTML tags are:
   <html></html>     Encompasses all text containing other HTML tags.
   <head></head>     Information about the page, such as author, title and date
                     it was created.
   <body></body>     The content to be displayed.
   <p></p>           Start and end of a paragraph.
   <b></b>           Bold text
   <i></i>           Italics
   <a href=""></a>   An anchor tag. The "href" is used to specify what page to
                     link to.

                             HTML: Basic Tags

• Putting it together, we get:
   <html><body><p>This is a series of words.</p><p>They should be laid out
     neatly, in two paragraphs, with <b>bold</b> and
     <i>italics</i>.</p></body></html>
• Which would appear in a browser as:
   This is a series of words.
   They should be laid out neatly, in two paragraphs, with bold and italics.

                               HTML: Layout

• To make things easier, HTML doesn’t care about extra
  whitespace. Thus the same information can be presented as:
   <p>This is a series of words.</p>

   <p>They should be laid out neatly, in two paragraphs, with <b>bold</b> and
     <i>italics</i>.</p>
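
A minimal Python sketch of this whitespace rule, assuming that every run of spaces, tabs and newlines is treated as a single space. This is a simplification of what browsers do (for example, it ignores elements where whitespace is preserved), and the function name collapse_whitespace is made up for the example.

```python
import re


def collapse_whitespace(source: str) -> str:
    """Replace every run of spaces, tabs and newlines with a single space."""
    return re.sub(r"\s+", " ", source).strip()


one_line = "<p>They should be laid out neatly, in two paragraphs.</p>"
many_lines = """<p>They should be
      laid out    neatly,
   in two paragraphs.</p>"""

# Both versions collapse to the same string, so they would display alike.
print(collapse_whitespace(one_line) == collapse_whitespace(many_lines))
```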

                          HTML Grows Up

• As mentioned, HTML was intended simply to provide basic layout
  instructions.
• There wasn’t intended to be much control on the part of the
  designer over how the page looked.
• This makes sense: do you really need that much control over
  design to present a physics paper?
• But as the Web became popular, it started being used in different
  ways.
• As a result, people started to want additional control over layout.

                        HTML Grows Up

• The result was an expansion of what HTML could do, both
  officially from the World Wide Web Consortium (W3C) and
  unofficially from the browser manufacturers.
• Thus a number of different versions of HTML were released,
  culminating (at least for now) with version 4.01.
• However, this pushed HTML well beyond what it was originally
  designed for. HTML became very complex, browser development
  became extremely difficult, and cross-platform issues became
  more serious.

                              XHTML & CSS

• The result was XHTML and CSS.
• XHTML is, in effect, a stricter form of HTML.
• It has tighter rules, such as:
   • All tags must be in lower case.
   • All tags must have an associated “end” tag.
   • Many “browser specific” tags no longer work.
• More to the point, XHTML focused on describing what content
  was - “this is a heading” and “this is a menu” - rather than on how
  to display it.
• Presentation moved to CSS.

                          XHTML & CSS

• CSS (Cascading Style Sheets) tells the browser how to display
  the information contained in an XHTML file.
• Thus while XHTML says “this is a major heading”, CSS states
  “make major headings bold, using Arial as the font, 30 pixels high
  and centered on the page”.
• The advantage comes from it being separate: if you develop a site
  using XHTML and CSS, you can update the look-and-feel simply
  by modifying the CSS file, without touching the XHTML.

     XHTML & CSS

UniSA’s website: XHTML and CSS

       XHTML & CSS

UniSA’s website with CSS turned off


• Finally, a bit about XML. XML stands for eXtensible Markup
  Language.
• It is an attempt to go even further than XHTML (and, in fact,
  predates it).
• XML can be used to mark up almost any information in a way that
  makes sense to both machines and (to an extent) people.
• Mostly it is used in databases and to share information with RSS.
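
To illustrate how XML marks up information for machines, here is a small Python sketch using the standard xml.etree.ElementTree module. The document is a made-up, RSS-like fragment; the element names and URLs are chosen for illustration and are not taken from a real feed.

```python
import xml.etree.ElementTree as ET

# A made-up, RSS-like document; the element names are illustrative only.
feed_xml = """
<channel>
  <title>Example news</title>
  <item><title>First story</title><link>http://example.com/1</link></item>
  <item><title>Second story</title><link>http://example.com/2</link></item>
</channel>
"""

channel = ET.fromstring(feed_xml)
# Because the tags say what each piece of data is, a program can pick out
# exactly the parts it needs.
titles = [item.findtext("title") for item in channel.findall("item")]
print(channel.findtext("title"))
print(titles)
```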

                          What’s in a Page?

• As discussed, pages can contain a number of different elements:
  text, images, audio, programs, and the like.
• They can also use files in a number of different formats: text, html,
  xhtml, css, xml, and so on.
• MIME Types are used to tell the browser what sort of file is going
  to arrive.

                            MIME Types: Text

• Text MIME Types include:
   •   text/css: Cascading Style Sheets; Defined in RFC 2318
   •   text/html: HTML; Defined in RFC 2854
   •   text/plain: Textual data; Defined in RFC 2046 and RFC 3676
   •   text/xml: Extensible Markup Language; Defined in RFC 3023

                         MIME Types: Images

• Type image
   •   image/gif: GIF image; Defined in RFC 2045 and RFC 2046
   •   image/jpeg: JPEG JFIF image; Defined in RFC 2045 and RFC 2046
   •   image/png: Portable Network Graphics; Registered
   •   image/tiff: Tag Image File Format; Defined in RFC 3302
   •   image/vnd.microsoft.icon: ICO image; Registered

                         MIME Types: Audio

• Type audio
   • audio/mpeg: MP3 or other MPEG audio; Defined in RFC 3003
   • audio/x-ms-wma: Windows Media Audio; Documented in Microsoft KB
   • audio/vnd.rn-realaudio: RealAudio; Documented in RealPlayer Customer
     Support Answer 2559
   • audio/x-wav: WAV audio

                       MIME Types: Application

• Type application: Multipurpose files
   • application/msword: Microsoft Word .doc files
   • application/pdf: Adobe’s portable document format (Acrobat files)
   • application/javascript: JavaScript; Defined in RFC 4329
   • application/xhtml+xml: XHTML; Defined by RFC 3236
   • application/x-shockwave-flash: Adobe Flash files; Documented in Adobe
     TechNote tn_4151 and Adobe TechNote tn_16509
   • application/octet-stream: Arbitrary byte stream. This is thought of as the
     “default” media type used by several operating systems
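
As a small illustration, Python's standard mimetypes module guesses a MIME type from a file name's extension, which is one common way for software to decide what type to announce. The file names below are hypothetical, and the exact mapping can vary slightly between systems.

```python
import mimetypes

# Hypothetical file names, chosen to match types listed above.
for name in ["index.html", "style.css", "photo.png", "paper.pdf"]:
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)
```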

                        What's in a Page?

• Headings, links, paragraphs & lists from the beginning
  (WorldWideWeb browser 1990)
• images added by NCSA Mosaic (Andreessen, 1993)
• scripts, Java and plug-ins appear in Netscape 2 (1995) – the
  RealAudio plugin bundled with Navigator
• stylesheets and ActiveX arrive with Internet Explorer 3
• forms add controls like:

        radio buttons and

                          What's in a Page?

• The first image on the web was
  also the first rock band on the
  web.
• By the time the NCSA released
  Mosaic, pages included images
  embedded in the HTML.

                        What's in a Page?

• tables are introduced in HTML 3 (1995/6)
• The applet tag (HTML 3.2) allows the inclusion of Java programs
  within a web page
• divs provide arbitrary divisions of the HTML document (useful to
  define different areas of the page)
• The embed tag allows us to include video and audio via plugins
• Generic objects allow us to add arbitrary items

                     How is the page arranged?

• Early pages were just a single column, items appearing from top
  to bottom;
• tables were used to arrange elements in a grid;
• floats allowed pictures to appear to the left or right of the
  matching text;
• divs could be positioned relative to each other;
• z-position allowed layers to overlap.

  (Figure: a top layer, in an absolutely horrendous colour scheme to
  highlight the overlap, sitting over a bottom layer of text.)

                Frames: several pages at a time

• Netscape 2 introduced the idea of showing several pages at a
  time: frames divided the window into several regions, each of
  which was attached to a web page.
   <frameset cols='20%, 75%'>
     <frame src='left.html'>
     <frame src='right.html'>
   </frameset>

                         Pages with Style

• Early pages used <font…> tags to define font family and size
• Transparent gifs used as spacers
• CSS1 (1996) styles provide font, padding and margins
• CSS (1998) positioning allows exact placement on page —
  partly implemented by level 4 browsers
• CSS width and float allows side-by-side divs
• CSS2 defines styles for separate media types, child and adjacent
  sibling selectors
• CSS3 adds multiple columns, shadows and opacity settings


• forms plus CGI (Common gateway interface):
  generates a new page based on arguments
• forms plus client-side JavaScript allow changes in the existing
  page
• Styles with hover, active classes on links
• JavaScript triggers on mouseover, mouseout and keypress
• Adobe’s Flash plugin allows video/audio and interactive controls
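
The "forms plus CGI" idea, where a program generates a new page from the submitted arguments, can be sketched in Python as follows. The function name render_page and the form field "name" are assumptions made for this example; a real CGI program would also read the query string from its environment and print HTTP headers.

```python
from html import escape
from urllib.parse import parse_qs


def render_page(query_string: str) -> str:
    """Build a new HTML page from form arguments, as a CGI program might."""
    args = parse_qs(query_string)
    # parse_qs returns a list of values per field; take the first, with a default.
    name = args.get("name", ["stranger"])[0]
    # Escape the value so that any < or > typed by the user is shown as text.
    return f"<html><body><p>Hello, {escape(name)}!</p></body></html>"


print(render_page("name=Ada&colour=blue"))
```

Each form submission produces a complete new page, in contrast with client-side scripting, which changes the page already on screen.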


• JavaScript expanded to alter styles of document elements
• IE4 introduces innerHTML to allow JavaScript to write elements
• Standard Document Object Model (W3C DOM Level 1, 1998)
  allows dynamic changes to elements of web page

                     Interactivity and AJAX

• Internet Explorer 5 introduced the XMLHTTP object to
  JavaScript, allowing background requests to the web server.
• XMLHttpRequest is now a standard JavaScript object in level 6
  browsers.
• AJAX refers to the use of XMLHttpRequest to query databases
  and update parts of a web page in the background.
• JavaScript extension libraries such as Prototype provide standard
  ways to do animation, effects and drag & drop of HTML page elements.

                         Web Page Delivery

1. The user specifies a file using a URL.
2. The server locates the file (or the program) on its hard drive, and
   transmits a copy (or the output) to the client.
3. The connection is then effectively "broken".
4. The browser now interprets the file and displays it on the screen
   (or saves it, or whatever else it is asked to do with it).
5. Thus a web page exists first on a server, then as a copy on a
   client's machine.
6. This process can be referred to as "call and response" - the
   server receives a call from a client, and responds to that call.

                          Call and Response

            Client                Server                  File

Step 1: The client (your computer) requests a file (URL) from the server.
Step 2: The server finds the requested file (or program).
Step 3: The server reads the information to be delivered.
Step 4: The server sends the data to the client.
Step 5: The connection is “broken”, and the browser displays the data.
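
The five steps can be sketched with Python's standard http.server and http.client modules, running a tiny server and a client in the same program. The page content, the path /index.html and the names PageHandler and fetch_once are made up for this illustration; it is a sketch of the exchange, not of how a production web server works.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><p>Hello from the server.</p></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Steps 2-4: find the content, read it, and send it to the client.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        client.request("GET", "/index.html")   # Step 1: the client makes the call
        body = client.getresponse().read()     # the server's response arrives
        client.close()                         # Step 5: the connection is broken
    finally:
        server.shutdown()
        server.server_close()
    return body


print(fetch_once().decode())
```

After the call returns, the client holds its own copy of the page and the server keeps no record of the exchange.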

                      Why Does This Matter?

• Files must be downloaded to the client.
   • Therefore, it is difficult to protect those files.
   • So HTML/XHTML code, images, some JavaScript and the like
     are difficult, or even impossible, to protect.
• HTTP is a stateless protocol - it has no memory.
   • It is difficult (or impossible) to write software using HTTP that
     employs synchronous communication. Mostly we fake it by
     making many small requests to the server.
   • Each individual file - including every single image - requires a
     separate request.
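
To make the "separate request per file" point concrete, this Python sketch counts the files a made-up page refers to. ResourceCounter and the file names are hypothetical, and only img, script and link tags are considered, which is a simplification.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collects the URLs of files a page refers to (simplified)."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.resources.append(attributes["href"])


page = """
<html><head><link rel="stylesheet" href="site.css"></head>
<body><img src="logo.gif"><img src="photo.jpg"><p>Text</p></body></html>
"""
counter = ResourceCounter()
counter.feed(page)
counter.close()
# One request for the page itself, plus one for each file it refers to.
total_requests = 1 + len(counter.resources)
print(counter.resources, total_requests)
```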

                  Some Historical Drivers of the Web

• 1990-1993: Predominately for scientific research
   • "Presentation" was less significant than cross-platform compatibility.
• 1993: Multiple browsers, and the web opens up for commercial
  use
   • Some incompatibility between browsers emerges.
   • Commercial interests start to push towards presentation as well as content.
• 1994: AOL and CompuServe allow access to the web
   • Massive increase in number of users.
   • Users have (on average) less computer experience than previously,
     pushing "ease of use".

                  Some Historical Drivers of the Web

• 1995: eBay founded, Amazon launched
   • Big online-only businesses emerge, although profits are slight.
• 1995-1996: Browser wars begin - Netscape 3 vs MSIE 3
   • Forces rapid development of browser technology.
   • Deliberate incompatibilities emerge between browsers.
• 1995-1997: Introduction of W3C standards

                  Some Historical Drivers of the Web

• Late 1990's: Mobile Commerce emerges
• 1998: Google becomes a privately owned company
   • Dramatically improves practicality of web searches.
• 1999: LiveJournal and
   • Help popularise "blogging".
• 2000: Dot-com crash
• 2004: Long tail
   • Acknowledges that there are business models that can best be realised

                  Some Historical Drivers of the Web

• 2004: Podcasting
   • Podcasting, along with video sharing (such as YouTube), image sharing
     (Flickr), online music sales, Massive Multiplayer Online Games and peer-
     to-peer downloading strongly push uptake of broadband.
• 2005: AJAX
   • Shows the viability of web-based applications, such as Writely (now Google
     Docs), Google Calendar, and others.
   • Underlying technologies predate 2005

The End

