
ECT 250: Survey of e-commerce technology

  Searching, images, frames, and markup
         Searching the WWW
• Exploring the Web can be very time-consuming.
• Search engines and directories enable you to locate
   relevant web pages more quickly and efficiently.
• A search engine is software that allows you to type
   in keywords. The engine scans a database of
   Web pages and displays a list of pages that meet
   your criteria.
• A directory organizes Web pages into categories.
   You can click on appropriate categories until you
   find a Web page that matches your chosen topic.

   Search engines/directories
• Altavista
• Excite
• DirectHit
• Fast Search
• Go
• Google
• HotBot
• Northern Light
• Yahoo
• Web Crawler

             Naïve searches
• A single keyword search can yield thousands
   of sites, many of which are irrelevant.
   Example: A search for climbing yields
   2,400,000 hits.
• Multiple keywords can help.
   Example: Illinois, Wisconsin, climbing yields
   only 32,500 hits.
• To save time and effort it pays to construct a
   more sophisticated search that will yield fewer
   hits with a higher percentage of relevant pages.
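
A minimal sketch of how a keyword search might match pages, and why adding keywords narrows the result list. The page collection, the URLs, and the function names are made up for illustration; real engines index far more pages and also rank the results.

```python
from collections import defaultdict

# Hypothetical mini "database" of pages: URL -> page text.
PAGES = {
    "example.org/climbing-gear": "climbing gear and rope reviews",
    "example.org/illinois-climbing": "rock climbing areas in illinois",
    "example.org/wisconsin-climbing": "climbing at devils lake wisconsin",
    "example.org/midwest": "illinois and wisconsin climbing guide",
    "example.org/roses": "climbing roses for your garden",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keywords):
    """Return the URLs that contain every one of the keywords."""
    result = None
    for word in keywords:
        hits = index.get(word.lower(), set())
        result = hits if result is None else result & hits
    return sorted(result or [])

index = build_index(PAGES)
print(len(search(index, ["climbing"])))                       # 5
print(search(index, ["illinois", "wisconsin", "climbing"]))   # ['example.org/midwest']
```

The single keyword matches every page, including the irrelevant one about roses; the three-keyword query leaves only the page that mentions all three words.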

               Searching tips
• Use a directory to find information on a general
   topic. Use keywords in a search engine for
   specific information or narrow topics.
• Use the searching tips to construct a precise query.
• Use multiple, specific keywords and synonyms.
• Use advanced search features to make your query
   more focused.
• Try multiple search engines/directories or use a
   meta-search engine (e.g. DogPile).
• Use a specialized search engine (e.g. Business
   search engine)
     Advanced search options
• Special operators (and, or, not, near)
• Search for phrases, not just keywords
• Domain specific searches: include or exclude
   pages based on their domain
• Specify the language of the search
• Page specific searches: pages that link to or are
   similar to a given page
• Give a bound on the most recent update
• Specify whether the site contains images, audio,
   or visual information
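
The options above can be thought of as extra filters applied to each candidate page. The sketch below is only an illustration under assumed names (the Page class, the matches function, and the sample pages are hypothetical); each search engine has its own syntax for these features.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    domain: str   # top-level domain, e.g. "edu" or "com"
    text: str

def matches(page, phrase=None, all_of=(), none_of=(), domains=None):
    """Apply phrase, AND, NOT, and domain filters to one page."""
    text = page.text.lower()
    words = set(text.split())
    if phrase is not None and phrase.lower() not in text:
        return False                      # exact phrase must appear
    if any(w.lower() not in words for w in all_of):
        return False                      # "and": every word required
    if any(w.lower() in words for w in none_of):
        return False                      # "not": excluded words
    if domains is not None and page.domain not in domains:
        return False                      # domain-specific search
    return True

pages = [
    Page("example.edu/a", "edu", "rock climbing in illinois state parks"),
    Page("example.com/b", "com", "rock climbing gear sale"),
    Page("example.edu/c", "edu", "climbing the corporate rock ladder"),
]

hits = [p.url for p in pages
        if matches(p, phrase="rock climbing", none_of=["sale"], domains={"edu"})]
print(hits)   # ['example.edu/a']
```

Searching for the phrase rather than the two separate keywords is what rules out the third page, which contains both words but not next to each other.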
Search engines examine only a fraction of the web
pages available on the World Wide Web.

A study released in 1998 estimated that the best
engines indexed only 33% of the publicly indexable
Web. The 1999 follow-up study found the coverage
had decreased to only 16%.

More important are the techniques used by the search
engine in ranking and updating pages.

         Loading efficiency
• Most Web pages contain graphical images to
   add interest, make navigation easier, and
   convey necessary information.

• Most Web users will wait only a short time for
   a page to load, so efficiency considerations
   are important.
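
A rough way to see why file sizes matter is to estimate the transfer time of a page. The sketch below assumes a 56 kbps dial-up connection and ignores latency and protocol overhead; the file sizes are invented for the example.

```python
def load_seconds(file_sizes_bytes, bits_per_second):
    """Estimate transfer time: total bits divided by connection speed."""
    total_bits = 8 * sum(file_sizes_bytes)
    return total_bits / bits_per_second

# Hypothetical page: HTML file, one photograph, one logo (sizes in bytes).
page_files = [10_000, 45_000, 15_000]

print(load_seconds(page_files, 56_000))   # 10.0
```

Under these assumptions, 70,000 bytes take about ten seconds to arrive, and the photograph alone accounts for well over half of that time.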

              Graphic formats
• Graphic formats are usually referred to by their file
   extensions, such as .tif, .bmp, .gif, .jpg, and .png.
• Web page images are commonly in the .gif,
   .jpg, or .png format.
• Graphic formats are usually compressed. File
   compression can be either lossless, which does
   not decrease image quality, or lossy, which does
   reduce image quality.

• The Graphics Interchange Format (GIF) is the
    standard format for Web page images and is
    supported by all browsers that display images.
• It is an efficient, compressed format that allows
    up to 256 colors. It uses lossless compression.
• GIF images are always rectangular, but a
    transparent background can be used to make
    the images appear to be non-rectangular.
• GIF images can be interlaced, which means that
    the image is displayed initially at low resolution
    and its quality is increased as it downloads.
• The Joint Photographic Experts Group (JPEG)
    format is supported by most browsers that
    display images.
• JPEG images use lossy compression. The amount
    of compression ranges from 0% to 100%. The
    higher the compression, the smaller the file size
    and the lower the image quality.
• JPEG cannot be made transparent, but it can be
    specified as a progressive JPEG, which is loaded
    the same way as an interlaced GIF.

• The Portable Network Graphics (PNG) format is
    a new(ish) format created for Web page images.
• It is expected that it will eventually replace GIF.
• PNG images use a lossless compression that is
    more efficient than GIF.
• It can use a color palette of 256 colors or fewer, like
    GIF, or support true color, like JPEG images.
• PNG images can be interlaced and transparent.

          Selecting a format
• The GIF or PNG format is usually used for line
    art such as clip art, logos, etc.
• JPEG is chosen for photographs because true
    color is desirable and selecting the amount of
    compression can result in smaller sized files.
• One approach is to save an image in several
    formats and choose the one with the smallest
    file size that produces acceptable quality.

          Size considerations
• GIF, JPEG, and PNG images are all bitmapped
   formats, which means that the images are
   made of a rectangular grid of pixels.
• Web images are measured in pixels.
   Example: 500 x 55
• Do not make images too wide. Images that do
   not fit into a single screen will force scrolling.
• For efficiency considerations, you may choose
   to create a thumbnail image. This is a smaller
   version of an image that allows a preview of
   the picture. Example: LLBean
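
Because these formats store a grid of pixels, the uncompressed size of an image follows directly from its dimensions and color depth. The sketch below assumes 8 bits per pixel for a 256-color palette and 24 bits per pixel for true color; actual GIF, JPEG, and PNG files are smaller because they are compressed. The thumbnail dimensions are an invented example.

```python
def raw_bytes(width, height, bits_per_pixel):
    """Uncompressed bitmap size: pixels times bits per pixel, in bytes."""
    return width * height * bits_per_pixel // 8

print(raw_bytes(500, 55, 8))    # 27500  (256-color palette)
print(raw_bytes(500, 55, 24))   # 82500  (true color)
print(raw_bytes(100, 11, 24))   # 3300   (a thumbnail at one fifth the width and height)
```

Reducing both dimensions to one fifth cuts the pixel count by a factor of 25, which is why thumbnails load so much faster than the full image.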
Frames allow more than one Web page to be
displayed within the browser window at a time.

When frames are used, the page opened in the
browser is a special page containing instructions
about how the browser window is to be divided
into separate regions and which page should be
initially displayed into each region. This special
page is called the frames page or frameset.

      Navigating with frames
When frames are used, clicking on a link in one
frame can:
• Change the contents of that frame
• Change the contents of a different frame
• Display a page without using the frames page

An application of frames is for a table of contents
or a navigation bar. Frames allow the contents or
navigation bar to be visible at all times.
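
As a sketch of what a frames page for a navigation bar might look like, the helper below builds the HTML as a string. The function names, file names, frame names, and the 20%/80% split are illustrative assumptions; the frameset, frame, and target markup is standard HTML 4.

```python
def frameset_page(nav_url, main_url):
    """Return a frames page with a narrow navigation frame on the left."""
    return (
        "<html>\n"
        "<head><title>Frames page</title></head>\n"
        '<frameset cols="20%,80%">\n'
        f'  <frame src="{nav_url}" name="nav">\n'
        f'  <frame src="{main_url}" name="main">\n'
        "</frameset>\n"
        "</html>"
    )

def link(url, text, target=None):
    """Return an anchor tag, optionally aimed at a named frame."""
    attr = f' target="{target}"' if target else ""
    return f'<a href="{url}"{attr}>{text}</a>'

print(frameset_page("nav.html", "home.html"))
print(link("news.html", "News"))           # opens in the frame that holds the link
print(link("news.html", "News", "main"))   # opens in the frame named "main"
print(link("news.html", "News", "_top"))   # replaces the whole frames page
```

The three links correspond to the three navigation cases listed above: same frame, a different frame, and a page shown without the frames page.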

Sites that use frames:
• Macromedia:
• National Discount Brokers:
• XSL Tutorial:
• A personal page: Jim Jacobson

Some sites that do not use frames:
• Amazon:
• DePaul CTI:
• Gap:
• NY Times:
        Frames: good or evil?
There is a significant controversy about whether
the use of frames is a good or bad thing.

What are some of the issues surrounding frames?

For a longer discussion of some of the issues see:
• Aren’t frames bad?
• Web design: frames – good or bad?

Some problems with frames
• Search engines do not deal well with frames
• Printing becomes more difficult
• Saving pages is more complicated
• Creating browser bookmarks may not work
• Frames can require a high screen resolution

Why use frames at all?

          Benefits of frames
• Navigation can be easier
• Easier updating of pages

Many of the problems given on the previous page
are technology issues. Once a solution is found,
frames may become more attractive.

Example: MS IE 5.0 supports frames better than
previous versions.

    Conclusions about frames
• Use frames only when the benefits outweigh the
   drawbacks.
• Tables or shared borders can be used instead of
   frames to place a navigation bar, table of
   contents, or other item on the edge of the page.
• Frames have become much less popular at large
   web sites.

          Markup languages
• FrontPage is an HTML editor.
• HTML stands for hypertext markup language.
• It is an example of a markup language.
• Historically markup has described annotations
    and handwritten notes found on manuscript
    pages that tell a typist how a particular page
    should be laid out or typeset.
• Electronic markup languages use tags to govern
    the display, formatting, and organization of
    text elements.

       Three markup languages
Three markup languages are of particular interest:
1. SGML (Standard Generalized Markup Language)
   is the parent language from which the other two
   are derived. It is a meta language used to define
   other markup languages.
2. HTML (Hypertext Markup Language)
3. XML (Extensible Markup Language) is another
   descendent of SGML. It defines data structures
   important for a wide range of data exchange

An HTML document contains both document
content and tags.
• The content consists of all the information that
   appears in the browser window, including
   text, graphics, and video.
• Tags are the HTML codes that specify how
   the document should be formatted.


              HTML tags
• Each HTML tag is enclosed in angle brackets.
• Two-sided HTML tags come in pairs.
   The general form of a two-sided tag is:
   <tagname properties>Content</tagname>
   The opening tag is <tagname properties>.
   The closing tag is </tagname>.
• Some HTML tags are one-sided, requiring
   only the opening tag.
• Tags are not case-sensitive.
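
One way to see two-sided tags, one-sided tags, and case-insensitivity in practice is to run a small fragment through an HTML parser. The sketch below uses Python's standard html.parser module; the TagLister class and the sample fragment are made up for the example.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Record every opening and closing tag the parser encounters."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

parser = TagLister()
parser.feed("<P>First line<BR>second line</p>")
print(parser.events)
# [('open', 'p'), ('open', 'br'), ('close', 'p')]
```

The parser reports tag names in lower case, so <P> and </p> are treated as a matching pair, and the one-sided <BR> tag appears with no closing tag.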

              Types of tags
There are a large number of tags. Some examples:
• Document tags: specify the parts of the document
    such as the head, title, and body.
    <title></title>, <html></html>
• Text structure tags: determine the layout of the
    text found in the body of the document.
    <h1></h1>, <p></p>, <br>
• Style tags: specify how text will be shown by the
    browser. <center></center>, <em></em>
• Image tag: <img src="name" other-attributes>
• Anchor tag: <a href="URL"></a>
               The meta tag
Search engines catalog sites by following links from
page to page and saving identification information
for each page visited.

The main HTML element that interacts with search
engines is the Meta tag.

Using the Meta tag you can list information about
your page that allows a search engine to better
classify the contents of your page.

      Attributes of the meta tag
The Meta tag has two attributes that should always
be used:
1. The Name attribute identifies the type of Meta
   tag you are including.
2. The Content attribute provides information the
   search engine will be cataloging about your site.
<Meta Name = "keywords" Content = "algorithms,
complexity, quantum, information, retrieval,
kolmogorov, security, arrays, cryptography, faculty,
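
To illustrate how a program such as a search engine's crawler could read the Name and Content attributes, the sketch below extracts the keywords from a small page using Python's standard html.parser module. The MetaReader class and the sample page (with a shortened keyword list) are assumptions made for the example, not a description of how any particular engine works.

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collect name/content pairs from the meta tags of a page."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)   # attribute names arrive in lower case
            if "name" in fields and "content" in fields:
                self.meta[fields["name"].lower()] = fields["content"]

# Hypothetical page with a shortened keyword list.
page = (
    "<html><head><title>Demo</title>"
    '<Meta Name="keywords" Content="algorithms, complexity, cryptography">'
    "</head><body>Hello</body></html>"
)

reader = MetaReader()
reader.feed(page)
keywords = [k.strip() for k in reader.meta["keywords"].split(",")]
print(keywords)   # ['algorithms', 'complexity', 'cryptography']
```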
          History of HTML
• HTML 1.0: Introduced in 1991 by Berners-Lee.
   At that time there was no standard for HTML.
• HTML 2.0: Released in 1995.
   Began to move to a standard. Released at the
   same time were MS IE 2.0 and Netscape’s
   Navigator 2.0.

Recall that the World Wide Web Consortium
(W3C) serves as a leader in maintaining Web
standards and common protocols. It was founded
in 1994.
         History of HTML
• HTML 3.2: Introduced in 1997 by the W3C.
   Supported tables, applets, and text
   flow around images.
• HTML 4.0: Released by W3C in 1997.
   Included support for cascading style sheets,
   and added international features such as the
   ability to render text right to left.
• HTML 4.01: Released by W3C in 1999.
   Supported more multimedia options and
   scripting languages, and made documents more
   accessible to users with disabilities.
         History of HTML
• XHTML Basic: Released in December 2000 by
   W3C, incorporating elements of XML into
   HTML to allow development on a wider set
   of devices such as TVs, PDAs, pagers, and
   cellular phones.
• XHTML 1.0: Released by W3C in January 2000 as
   a reformulation of HTML 4 in XML.

• Work on the definition of a Generalized Markup
    Language for describing electronic documents
    and their format was begun in the 1960s.
• In 1986, the International Organization for
    Standardization (ISO) adopted a version of the standard
    called Standard Generalized Markup Language.
• SGML includes a standard that defines device-
    independent and machine-independent methods
    for representing electronic documents.

        Advantages of SGML
• SGML is good for organizations with special or
    complex requirements for the management of
    documents. Examples: U.S. DOD, HP
• It is stable since it was standardized in 1986.
• It is platform independent and will outlive most
    current applications.
• It supports user-defined tags and architecture.

Why is SGML not used by everyone?

     Disadvantages of SGML
• SGML’s tools are relatively expensive when
    compared to HTML.
• SGML has a steep learning curve.
• It is costly to set up and maintain, requiring
    extensive training and expertise.
• Creating document type definitions with SGML
    can be expensive in terms of human labor.

• Extensible Markup Language is also derived from
    SGML, although it is newer than HTML.
• It represents an effort to define what information
    is on a Web page. This contrasts with HTML
    where the emphasis is on the format of the data.
• XML allows designers to easily describe and
    deliver structured data from any application in
    a standard, consistent way.

        Idea behind XML
• XML is both a markup language and meta
    markup language.
• XML allows you to create new tags for each
    type of document you are storing.
• In this way, XML stores information in a
    structured manner.
• It is also interoperable with both HTML and
    SGML. This allows data stored in XML to
    be displayed (using HTML) and integrated
    with SGML documents.
            XML example I
        <title>Some XML</title>
        <date>April 25, 2001</date>
        <summary>Sample XML</summary>
        <content>XML is not for displaying information
                but for managing information.</content>
            XML example II
        <list>
          <position>network administrator</position>
          <position>web designer</position>
        </list>
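
Because the tags describe what the data is rather than how it should look, a program can pull the values out directly. The sketch below reads a list of position elements with Python's standard xml.etree.ElementTree module; wrapping them in a single list element is an assumption about the document structure.

```python
import xml.etree.ElementTree as ET

# Assumed structure: a <list> element holding <position> entries.
doc = """
<list>
  <position>network administrator</position>
  <position>web designer</position>
</list>
""".strip()

root = ET.fromstring(doc)
positions = [p.text for p in root.findall("position")]

print(root.tag)     # list
print(positions)    # ['network administrator', 'web designer']
```

Nothing in the XML says how the positions should be displayed; a separate step, for example generating HTML, would decide that.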