Introduction to HTML & Servlet

Introduction and History

File transfers were possible with the earliest versions of internet software, and the program that handles them, ftp, still survives more or less intact. As internet users started to place files for public access and advertise their wares on bulletin boards, some people started distributing "lists" of hot ftp material on those boards. Along came archie, a program to locate ftp sites using a keyword search. It worked by collecting data at archie servers (located at some sites) and allowing archie clients to connect to the servers to search.

The next development was gopher (created at the University of Minnesota), a utility that combined several tools (a file viewer, ftp and telnet) in a single easy-to-use menu-driven interface.

At the same time, the publishing industry had been experimenting with so-called hypertext documents (electronic documents with nonlinear organization of data), but only on single machines. A standard called SGML (Standard Generalized Markup Language) was developed to write such documents in plain ASCII text (similar to LaTeX, troff etc). Ideally, SGML would be integrated with TCP/IP to provide links across the network, but SGML is large and complex. Thus came HTML (HyperText Markup Language), a much simpler formatting language developed at CERN in Switzerland, whose documents are linked and delivered over TCP/IP.

The whole idea in using HTML is to display more than plain text, that is, formatted text and images. For this, a "browser" is needed, most often one written for a windowing system, such as xmosaic, one of the first graphical browsers, written for X Windows.

HTML Tag Basics

HTML documents are written in ASCII text, with commands specified by particular sequences of characters. Commands in HTML usually consist of 3 components: a start tag, a middle, and a stop tag.
For example, to specify the title of a document, such as Red Riding Hood, you would use the <title> command:

    <title> Red Riding Hood </title>

Note that the start tag is the keyword <title> (with angle brackets, the signature of HTML commands), the stop tag is the keyword </title>, and the middle is the textual data.

Some characters, such as "<", ">" and "&", are reserved for HTML commands. To display them as ordinary text you must use special character codes (see Special Characters).

HTML Basic Structure

    <html>
    <!-- This is an internal comment; it won't be displayed -->
    <!-- Note the exclamation mark and the two dashes on either side -->
    <head>
      <title> Red Riding Hood </title>
    </head>
    <body>
      Once upon a time, in a land far far away, there lived...
    </body>
    </html>

Write HTML in Free Text

Now, HTML is written in free text, so the previous document could just as well be written into a text file as:

    <html> <!-- This is an
    internal comment; it won't
    be displayed --> <!-- Note
    the exclamation mark and
    the two dashes on either
    side -->
    <head> <title> Red Riding
    Hood </title> </head>
    <body> Once
    upon a time, in a land far
    far away, there lived...
    </body> </html>

NOTE:

Obviously, some indentation will make the text file more readable.

The <head> blah-blah </head> part of the document is used to specify a title (so that it can be displayed in the title window of a browser). Thus, the bulk of the document will be in the <body> blah-blah-blah </body> part.

Typically, a file that starts with <html> and ends with </html> corresponds to one page during browsing; that is, following a link leads to a new HTML document.

HTML Formatting Commands

    Start tag      Stop tag        Description
    ------------------------------------------------------------------
    <html>         </html>         HTML document indicator
    <head>         </head>         Document head
    <title>        </title>        Title (usually in <head> section)
    <body>         </body>         Document body
    <address>      </address>      Document author info
    <!--           -->             Comment
    <h1>           </h1>           Level-1 heading
    <h2>           </h2>           Level-2 heading
    ...
    <h6>           </h6>           Level-6 heading
    <i>            </i>            Italics
    <em>           </em>           Emphasized text - similar to italics
    <b>            </b>            Bold face
    <strong>       </strong>       Strong - similar to bold
    <tt>           </tt>           Teletype
    <strike>       </strike>       Strike-through
    <var>          </var>          Variable - similar to italics
    <cite>         </cite>         Citation - similar to italics
    <code>         </code>         Code - similar to teletype
    <kbd>          </kbd>          Keyboard - similar to teletype
    <samp>         </samp>         Sample - similar to teletype
    <dfn>          </dfn>          Definition - for definitions
    <key>          </key>          Keyword - for keywords
    <p>                            End a paragraph, start a new one
    <br>                           Line break - start a new line
    <hr>                           Horizontal rule
    <pre>          </pre>          Preformatted text - format exactly as entered in ASCII
    <blockquote>   </blockquote>   To set apart a quote

Itemized List

An unordered list is defined by <ul> list of things </ul>, and an ordered list by <ol> list of ordered items </ol>. Each item is specified by a <li>. Unordered lists are bulleted and ordered lists are numbered. There are also <menu> and <dir> types of lists, both being similar to unordered lists. Lists can be nested within other lists, as the example shows.

    <body>
    <h2> Red Riding Hood's shopping list </h2>
    <ul>
      <li> Picnic basket
      <li> Red items
        <ul>
          <li> Red delicious apples
          <li> Red sneakers
        </ul>
      <li> Safety items
        <ol>
          <li> Magnesium flare
          <li> Cellular phone
        </ol>
    </ul>
    </body>

Special Characters

    Character code   Description
    ------------------------------------------------------
    &lt;             the less-than symbol (<)
    &gt;             the greater-than symbol (>)
    &amp;            the ampersand symbol (&)

Adding Links: an Example

    <body>
    <h1> Red's Early Years </h1>
    <a href="early/birth.html"> Birth </a>
    <a href="early/preschool.html"> Pre-School </a>
    <h1> Red Goes to High-School </h1>
    <a href="home.html"> Red Sets up a Homepage </a>
    </body>

What is a URL?

A URL (Uniform Resource Locator) is a document name that contains complete access information: how the document is to be fetched, where it is (internet address), the path name (sequence of directories) and other information. For example, consider this URL:

    http://www.cs.wm.edu:80/tales/fairy/modern/masterlist.html

It specifies the following:

    http - the protocol (HTTP) used to fetch the document, which is typically an HTML file
    www.cs.wm.edu - the internet address or system name
    80 - the port at which the httpd daemon is listening
    tales/fairy/modern - a path name leading to a file
    masterlist.html - a file name

Port 80 is the standard (default) port for http, so it is usually left out of the URL. It is also possible to pass parameters in a URL.

Another Type of Anchor

Let's assume that stories.html contains the tales (each with hyperlinks to other files). Clicking on any story in masterlist.html takes you to the top of the stories.html file, and you then have to scroll down to the story you want. To avoid this problem, we mark the beginning of each story in stories.html and use the mark in the href specification. For example, in stories.html, let us mark Red Riding Hood as follows:

    <h1> <a NAME="red"> Red Riding Hood </a> </h1>

Now, in the appropriate href part in masterlist.html, we specify this mark:

    <li> <a href="stories.html#red"> Red Riding Hood </a>

Observe the hash symbol being used to specify a named anchor. You can also use named anchors for rapid movement within a single HTML document.
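The URL breakdown and the hash notation described above can be checked with a few lines of Python. This is only a sketch built on the standard urllib.parse module; the names describe_url, split_anchor and DEFAULT_PORTS are illustrative assumptions made for this example, not part of any browser or web server.

```python
from urllib.parse import urlsplit, urldefrag

# Assumption for this sketch: only the http default port is listed.
DEFAULT_PORTS = {"http": 80}


def describe_url(url):
    """Split a URL into the parts named in the text."""
    parts = urlsplit(url)
    # Separate the directory path from the file name at the last slash.
    directory, _, filename = parts.path.rpartition("/")
    port = parts.port
    if port is None:
        # No explicit port in the URL: fall back to the scheme's default.
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": directory.lstrip("/"),
        "file": filename,
    }


def split_anchor(href):
    """Separate the document part of an href from its named anchor."""
    document, anchor = urldefrag(href)
    return document, anchor


if __name__ == "__main__":
    print(describe_url("http://www.cs.wm.edu:80/tales/fairy/modern/masterlist.html"))
    print(split_anchor("stories.html#red"))
```

Leaving ":80" out of the URL gives the same port in this sketch, which mirrors the remark that the default port need not be written.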
Relative Addressing

Suppose the address of the current document is

    http://www.cs.wm.edu/tales/fairy/modern/masterlist.html

We have seen that links in the file masterlist.html are created by giving an address in the href part of an anchor. We can provide either a full address or a partial (relative) address, which is interpreted relative to the location of the current document. Above, we saw an example of a relative address:

    <a href="stories.html"> Red Riding Hood </a>

We could have also given the complete address:

    <a href="http://www.cs.wm.edu/tales/fairy/modern/stories.html"> Red Riding Hood </a>

Image and Other Things

What is MIME?

MIME (Multipurpose Internet Mail Extensions) is a standard for identifying the type of a piece of data; it covers many well-known file formats. The idea is that the browser doesn't handle these formats itself and instead calls a "plug-in", a program that knows what to do with the data. Thus, for "postscript" files, a postscript viewer is called by the browser. By setting options in the browser, you can decide which application programs (plug-ins) handle which file extensions. Here are some common extensions (some of which, like .gif, are directly handled by the browser):

    gif - .gif files are graphics or bitmap files in the GIF (Graphics Interchange Format) format.
    jpeg - a bitmap format for still images.
    mpeg - format for motion pictures.
    ps - postscript
    pdf - the format used by Adobe Acrobat documents

How to display in-lined images:

In-lined images are images displayed within the HTML document (as opposed to spawning a viewer). Consider the following example, which displays an image in the file mypicture.gif:

    <body>
    <h1> The Next President of the United Brewpub Tasters of America </h1>
    <img alt="my mugshot" align=bottom src="mypicture.gif">
    </body>

With the <img ... > command, we specify the source file (mypicture.gif), an alignment relative to the adjacent line of text, and an alternate ASCII string (my mugshot) for browsers that don't support images.
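As a rough illustration of the two steps a browser takes with an href or src value, the sketch below first resolves a relative address against the address of the current document and then guesses the format from the file extension. It uses Python's standard urllib.parse and mimetypes modules. The names resolve_link, guess_format and needs_plugin are hypothetical, the INLINE_TYPES set is an assumption based on the remark that formats like .gif are handled directly, and placing mypicture.gif next to masterlist.html is assumed purely for the example.

```python
import mimetypes
from urllib.parse import urljoin

# Address of the current document, taken from the text.
CURRENT_DOCUMENT = "http://www.cs.wm.edu/tales/fairy/modern/masterlist.html"

# Assumption: formats this imaginary browser displays without a plug-in.
INLINE_TYPES = {"text/html", "image/gif"}


def resolve_link(href, base=CURRENT_DOCUMENT):
    """Turn a relative or complete address into a complete address."""
    return urljoin(base, href)


def guess_format(url):
    """Guess the MIME type of a URL from its file extension."""
    mime_type, _encoding = mimetypes.guess_type(url)
    return mime_type


def needs_plugin(url):
    """Return True if the guessed format is not in INLINE_TYPES."""
    return guess_format(url) not in INLINE_TYPES


if __name__ == "__main__":
    for href in ("stories.html", "mypicture.gif", "paper.ps"):
        full = resolve_link(href)
        print(full, guess_format(full), needs_plugin(full))
```

A complete address passed to resolve_link comes back unchanged, which is why either form can be used in an anchor.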
Adding links to images (simulating buttons):

This is easy: simply enclose the entire image command inside an anchor command. For example:

    <body>
    <h1> The Next President of the United Brewpub Tasters of America </h1>
    <a href="bio.html"> <img alt="my mugshot" align=bottom src="mypicture.gif"> </a>
    Click on my picture to get my biodata
    </body>

Homepages

You now want to know where to place files that others can view: your homepages. In Unix, you need to create a subdirectory of your home directory and call it public_html. Note: webservers differ in what they want you to use as "home" directories.

In the subdirectory public_html, create a file called index.html. When someone accesses your homepage by just giving your username, it is this file that is brought up. Thus, the URL

    http://www.cs.wm.edu/~simha

is really the file public_html/index.html, which the webserver knows to get. (You don't have to understand this last point.)

Make sure that you grant public access to this directory and to the files you place in it. You can now place all other files you want others to access in the directory public_html. For example, if I create my CV in an HTML file called cv.html, put the file in the directory public_html, and refer to it by the URL

    http://www.cs.wm.edu/~simha/cv.html

then others can "open" this URL and get the file.
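The mapping from a homepage URL to a file under public_html can be sketched as follows. This is a simplified model of the convention described above, not the behaviour of any particular webserver (as noted, servers differ); the function name homepage_file is hypothetical, and where the user's home directory lives on disk is deliberately left out.

```python
import posixpath
from urllib.parse import urlsplit


def homepage_file(url):
    """Return (username, file path relative to that user's home directory).

    Assumes the "~username" convention and the public_html/index.html
    defaults described in the text.
    """
    path = urlsplit(url).path
    if not path.startswith("/~"):
        raise ValueError("not a user homepage URL: " + url)
    # Everything up to the next slash is the username.
    user, _, rest = path[2:].partition("/")
    # A bare username or a directory falls back to index.html.
    if rest == "" or rest.endswith("/"):
        rest += "index.html"
    return user, posixpath.join("public_html", rest)


if __name__ == "__main__":
    print(homepage_file("http://www.cs.wm.edu/~simha"))
    print(homepage_file("http://www.cs.wm.edu/~simha/cv.html"))
```

A real server would also check that the file is publicly readable and reject paths that try to escape public_html; both checks are omitted here to keep the sketch short.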