					Introduction to Web Science

         Web 1.0
       Dr Alexiei Dingli

          Introducing Web 1.0
•   Packet switching network
•   IP Addressing
•   Internet Applications
•   The WWW and markup
•   Searching the WWW
•   Intelligent Agents
•   Internet Governance
  Packet-Switched Networks (1)
• Local area network (LAN)
   – Network of computers located close together
• Wide area networks (WANs)
   – Networks of computers connected over greater
     distances
• Circuit
   – Combination of telephone lines and closed
     switches that connect them to each other
 Packet-Switched Networks (2)
• Circuit switching is used in telephone
  networks

• The Internet uses packet switching

• Packet switching needs computers called
  'routers' and programs called 'routing
  algorithms'
  Packet-Switched Networks (3)
• Information is divided into packets

• It is passed from node to node

• It is recomposed as one chunk on the destination server
              Routing Packets
• Routing computers
  – Computers that decide how best to forward
    each packet

• Routing algorithms
  – Rules contained in programs on router computers
    that determine the best path on which to send
    each packet
  – Programs apply their routing algorithms to
    information they have stored in routing tables
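
As a rough illustration of how a router might consult its routing table, the sketch below picks the most specific route that contains a destination address (longest-prefix match). The table entries, next-hop names and the function name `next_hop` are invented for illustration; real routers fill their tables using routing protocols.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
    (ipaddress.ip_network("10.0.0.0/8"), "router-a"),
    (ipaddress.ip_network("10.1.0.0/16"), "router-b"),
]

def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route containing the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if addr in net]
    # The route with the longest prefix is the most specific one.
    _, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop

print(next_hop("10.1.2.3"))    # matches all three routes, /16 wins
print(next_hop("10.9.9.9"))    # matches /8 and the default route
print(next_hop("192.0.2.1"))   # only the default route matches
```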
• Communications protocol suite
  – Packet switched protocol
     • No end-to-end connection is required
     • Each message broken down into small pieces called packets
     • Packets possibly routed to destination over different paths

  – Transmission Control Protocol (TCP)
     • Breaks messages into packets
     • Numbers packets in order
     • Reorders packets at the destination

  – Internet Protocol (IP)
     • Routes packets to the proper destination
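
The TCP steps listed above (break into packets, number them, reorder at the destination) can be sketched in a few lines. This is a toy model only: the packet size, the function names and the use of `random.shuffle` to imitate packets taking different paths are assumptions of the example, not part of TCP itself.

```python
import random

def to_packets(message: str, size: int = 8) -> list[tuple[int, str]]:
    """Split a message into (sequence number, chunk) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort packets by sequence number and join the chunks again."""
    return "".join(chunk for _, chunk in sorted(packets))

message = "Packets may arrive out of order."
packets = to_packets(message)
random.shuffle(packets)   # imitate packets arriving over different paths
print(packets)
print(reassemble(packets))
```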
     Open Systems Interconnection
 OSI model layers (from the highest to the lowest),
  with example protocols from the TCP/IP suite:

7 Application   }
6 Presentation  }     HTTP, SMTP, FTP, Telnet,
5 Session       }     SSH, Whois, etc.
4 Transport           TCP, UDP
3 Network             IP
2 Data Link           Ethernet
1 Physical            Wire, Radio, Fibre Optic
               IP Address
• Internet addresses are based on a 32-bit
  number called an IP address

• IP addresses are written as four numbers
  (each from 0 to 255) separated by periods

• An address such as uniquely
  identifies a computer connected to the
  Internet

• IP subnetting conceptually divides a large
  network into smaller sub-networks
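
To make the 32-bit number and the idea of subnetting concrete, here is a small sketch using Python's standard `ipaddress` module. The address 192.168.1.10 and the split of a /24 network into four /26 sub-networks are example values chosen for illustration only.

```python
import ipaddress

addr = ipaddress.ip_address("192.168.1.10")   # example address (assumption)
print(int(addr))                # the dotted form as a single 32-bit integer
print(f"{int(addr):032b}")      # the same value written as 32 bits

# Divide one network of 256 addresses into four smaller sub-networks.
network = ipaddress.ip_network("192.168.1.0/24")
subnets = list(network.subnets(new_prefix=26))
for subnet in subnets:
    # Two addresses per subnet are reserved (network and broadcast).
    print(subnet, "usable hosts:", subnet.num_addresses - 2)
```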
IP Classes (1)

              IP Classes (2)
Class     Leading bits   Number of networks   Addresses per network
Class A   0              126                  16,777,214
Class B   10             16,384               65,534
Class C   110            2,097,152            254
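
The figures in the table follow from how many bits each class leaves for the network and host fields. The sketch below classifies an address by its leading bits and recomputes the counts; the function name `address_class` and the sample addresses are illustrative, and classes D and E are lumped together as "other".

```python
def address_class(address: str) -> str:
    """Classify a dotted IPv4 address by the leading bits of its first number."""
    first = int(address.split(".")[0])
    if first >> 7 == 0b0:
        return "A"
    if first >> 6 == 0b10:
        return "B"
    if first >> 5 == 0b110:
        return "C"
    return "other"   # classes D and E are not in the table

# Class A: 7 network bits (two values reserved), 24 host bits.
# Class B: 14 network bits, 16 host bits. Class C: 21 network bits, 8 host bits.
# Two host values per network are reserved (network and broadcast address).
counts = {
    "A": (2**7 - 2, 2**24 - 2),
    "B": (2**14, 2**16 - 2),
    "C": (2**21, 2**8 - 2),
}
for name, (networks, hosts) in counts.items():
    print(f"Class {name}: {networks:,} networks, {hosts:,} addresses each")

print(address_class("10.0.0.1"), address_class("172.16.0.1"), address_class("192.168.1.1"))
```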


          Without subnetting …
• Explosion in size of IP routing tables.

• Every time more address space was needed, the
  administrator would have to apply for a new block of
  addresses

• Any changes to the internal structure of a company's
  network would potentially affect devices and sites
  outside the organization.

• Keeping track of all those different Class C networks
  would be a bit of a headache in its own right.
         Benefits of Subnetting
• Better Match to Physical Network Structure

• Flexibility

• Invisibility To Public Internet

• No Need To Request New IP Addresses

• No Routing Table Entry Proliferation
  IPv6 (or IP Next Generation)
• Network layer protocol
• Developed in 1994

• Will replace the IPv4 standard
  – limits on network addresses will eventually lead to
     exhaustion of available addresses (by 2023)
  – IPv4 supports only 4,294,967,296 addresses (32 bits)

• Improvements include
   – providing future cell phones and mobile devices their own
     unique & permanent addresses
   – support for about 3.4 × 10^38 addresses (128 bits)
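
Both address counts come directly from the number of bits in an address, which a short calculation can confirm. The variable names are arbitrary.

```python
ipv4_addresses = 2 ** 32    # 32-bit addresses
ipv6_addresses = 2 ** 128   # 128-bit addresses

print(f"{ipv4_addresses:,}")
print(f"{ipv6_addresses:.1e}")
```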
             Domain Names
• A Uniform Resource Locator (URL) consists of names and
  abbreviations that are much easier to remember than IP
  addresses

• The HTTP protocol defines how an Internet resource is

• An address such as is called a domain

• Domain Name System (DNS)
   – A database of Internet names
   – DNS Servers convert Internet names to IP addresses
   – Top level domains
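
As a hedged illustration of the pieces involved, the sketch below splits a URL into its protocol, domain name and path, and wraps the system resolver (which in turn queries DNS) in a small helper. The URL is a placeholder and `resolve` is a name made up for this example; the actual lookup is left commented out because it needs a network connection.

```python
import socket
from urllib.parse import urlsplit

def resolve(host: str) -> str:
    """Ask the system resolver (and hence DNS) for an IPv4 address."""
    return socket.gethostbyname(host)

url = "http://www.example.com/index.html"   # placeholder URL (assumption)
parts = urlsplit(url)
print(parts.scheme)     # the protocol used to fetch the resource
print(parts.hostname)   # the domain name that DNS converts to an IP address
print(parts.path)       # the resource requested from that server

# Requires network access, so it is not run here:
# print(resolve(parts.hostname))
```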

    Top-Level Domain Names

• Internet Corporation for Assigned Names and
  Numbers (ICANN)
  – Responsible for managing domain names and
    coordinating them with IP address registrars
      Domain Name case study
• The web was not an 'open' place

• Only one company, Network Solutions, sold
  .com, .net and .org domains

• Price of 100 dollars and a two-year minimum

• Back then, there was a good chance you would be
  able to buy a dictionary word as a .com

• In 2000, it lost its monopoly position and
  domain prices dropped by over 95%

• Since then innovation halted and Network
  Solutions became one of thousands of
  anonymous domain registrars
           Internet Applications
• E-Mail
• File transfers
• Instant messaging (IM)
• Newsgroups
• Streaming audio and video
• Internet telephony
• World Wide Web (WWW)
                    E-Mail
• Most popular and widely used Internet application
• 30 billion e-mails sent every day
  – Spam – junk e-mail messages
  – Spam costs corporate America $9 billion per year

• Every e-mail message contains a header that
  describes the source and destination of the message
• E-mail messages are text, but may have
  attachments of many types of digital data
  – Viruses are often transmitted via e-mail
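
To show what the header and an attachment look like in practice, here is a small sketch built with Python's standard `email` package. The addresses, subject, file name and attachment bytes are all placeholders invented for the example; nothing is actually sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
# Header fields describing the source and destination (placeholder values).
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Lecture notes"

# The body is text; the attachment can be any kind of digital data.
msg.set_content("The notes are attached.")
msg.add_attachment(b"binary data here", maintype="application",
                   subtype="octet-stream", filename="notes.bin")

print(msg["From"], "->", msg["To"])
print(msg.get_content_type())
attachments = [part.get_filename() for part in msg.iter_attachments()]
print(attachments)
```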
     SMTP, POP, and IMAP (1)
• E-mail sent across the Internet is managed and stored
  by mail servers

• Simple Mail Transfer Protocol (SMTP) is the standard for
  sending mail to the server

• Post Office Protocol (POP) is the standard for retrieving
  mail from the server

• The Internet Message Access Protocol (IMAP) is a newer
  e-mail retrieval protocol
SMTP, POP, and IMAP (2)

                Controlling Spam
• Use complex email addresses rather than name and surname
   – Why? Bots? Name directories?

• Control exposure of email address
   – How? JavaScript? JPEG?

• Use multiple email addresses for different purposes
   – On what occasions?

• Use content-filtering software
   – blacklist spam filter
   – whitelist spam filter
   – challenge-response using graphical challenges?
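
The difference between the two list-based filters can be sketched as follows: a blacklist rejects only known bad senders, while a whitelist accepts only known good ones. The addresses and the function name `classify` are hypothetical, and real filters also look at message content.

```python
BLACKLIST = {"spammer@example.net"}   # hypothetical known spam senders
WHITELIST = {"friend@example.org"}    # hypothetical trusted senders

def classify(sender: str, mode: str = "blacklist") -> str:
    """Return 'accept' or 'reject' for a sender under the chosen filter mode."""
    sender = sender.lower()
    if mode == "whitelist":
        # Only senders on the list get through.
        return "accept" if sender in WHITELIST else "reject"
    # Blacklist mode: everyone gets through except listed senders.
    return "reject" if sender in BLACKLIST else "accept"

print(classify("stranger@example.com", mode="blacklist"))
print(classify("stranger@example.com", mode="whitelist"))
```

An unknown sender passes the blacklist filter but is stopped by the whitelist filter, which is why whitelists block more spam at the cost of also blocking legitimate new contacts.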
                E-Mail Case Study
• Hotmail (1995)

• First place to get a free email address,
  disconnected from an ISP

• 4 years later, 30 million people worldwide were
  exchanging @hotmail email addresses

• Bought by Microsoft in 1998 for just $400 million

• 2007: the end of Hotmail is near
    – transformation to “Live” mail to become an
      integrated part of Microsoft's “Live” family
                  File Transfers
• File transfer protocol (FTP)
  – Protocol providing for transmission of a file between
    an Internet server and a user's computer

• Peer-to-peer (P2P) file sharing
  – Share data from one computer to another
  – Every user can be a server
  – Napster
  – Kazaa
  – Gnutella
  – With P2P, every user on the network can make data
    available to every other user on the network
           Instant Messaging
• Allows user to create a private chat session with
  another user

• IM started with AOL

• IM sneaking into corporate networks

• Many Web-based companies use IM technology
  for customer service
  – eBay
                      ICQ case study
•   ICQ is short for “I seek you”

•   1996: first easy-to-use instant messenger program where you
    could add friends to your list and see if they were online

•   Back then it was revolutionary for the masses and it became the
    'application' everybody had installed

•   Acquired by AOL in June 1998 for a whopping $287 million

•   Eventually the program acquired too many additional features, which
    made the application heavy and disorganized

•   Competition from AOL IM, Yahoo IM, and MSN Messenger
    increased, and friends on your ICQ list left the application,
    eventually resulting in a mass abandonment of the network
            Usenet Newsgroups
• Online, bulletin board discussion forums

• Users post and read messages

• More than 100,000 newsgroups

• Millions of newsgroup readers

• Important information resource, especially for technical
  issues and products

• Newsgroup messages distributed using open standard
   – Many are uncensored
    Streaming Audio and Video
• Creating and sending audio and video files
  – Sports
     • Basketball at
     • Major league baseball

  – News
     • Fox News
     • CNN radio

  – Business
     • ZDNet

  – Education
     • Warriors of the Net
          Internet Telephony
• Voice-over Internet Protocol (VoIP)

• Use your computer like a telephone

• Software connects computers via the Internet and
  transmits voice data

• Savings come from eliminating toll charges
  between locations

            The World Wide Web
• Collection of hyperlinked computer files on the Internet
• Client-server application
   – Web servers
   – Web browsers as clients

• WWW standards
   – Hypertext markup language (HTML)
      • Current standard for writing Web pages
      • Tags in HTML instruct the client browser how to format and display the
        Web page content
   – Hypertext transfer protocol (HTTP)
      • Establishes a connection between Web server and client
   – Extensible markup language (XML)
      • A meta-markup language
      • Gives meaning to the data enclosed within XML tags
              Website case study
• Create your own free homepage on the web

• 1997: fifth most popular website, with over
  500,000 homepages created

• Yahoo bought Geocities two years later for
  $3.57 billion and started to actively
  commercialize the homepages with various
  types of advertising, which led to their demise

• With 'real' web hosting becoming affordable for
  anybody, the need for free homepages in this
  form vanished
Overview of Markup Languages
• SGML is a rich meta language that is useful for
  defining markup languages

• HTML is particularly useful for displaying Web
  pages

• XML defines data structures for electronic
  commerce (and much more …)

Development of Markup Languages

    Standard Generalized Markup Language
• The ISO adopted the SGML standard in 1986

• SGML is nonproprietary and platform-
  independent

• SGML supports user-defined tags and
  architecture to complement the required
  richness of documents
  Extensible Markup Language
• XML is a descendant of SGML

• XML allows designers to easily describe and deliver
  structured data from any application in a standard,
  consistent way

• XML can be embedded within an HTML document

• XML allows you to create your own customized
  markup language.
          Learn XML in a slide 
•    Tag – a piece of Markup
    – An opening tag           <name>
    – A closing tag            </name>
•    Element – well formed usage of tags
    – <name>Alexiei</name>
•    Attribute – properties
    – <name length="7">Alexiei</name>

•    Rules to keep XML well formed
    1. Can be nested but not overlapping
    2. Case sensitivity
    3. Quoted attributes
    4. Required end tag

•    Short hand
    – <abc></abc> is equivalent to <abc/>
           Some XML examples

<book>E-Commerce</booK>

<book pages=100>E-Commerce</book>

<book pages="100"><title>E-Commerce</book></title>

<book pages="100"><title>E-Commerce</title></book>

<book pages="100">
</book>
           Some XML examples
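<book>E-Commerce</booK>
   – Not well formed: closing tag does not match in case

<book pages=100>E-Commerce</book>
   – Not well formed: attribute value is not quoted

<book pages="100"><title>E-Commerce</book></title>
   – Not well formed: tags overlap

<book pages="100"><title>E-Commerce</title></book>
   – Well formed

<book pages="100">
</book>
   – Well formed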
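
One way to check the examples above against the well-formedness rules is to hand them to an XML parser and see which ones it rejects. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for this example, and the attribute quotes are written as plain straight quotes, as XML requires.

```python
import xml.etree.ElementTree as ET

examples = [
    '<book>E-Commerce</booK>',                              # case mismatch
    '<book pages=100>E-Commerce</book>',                    # unquoted attribute
    '<book pages="100"><title>E-Commerce</book></title>',   # overlapping tags
    '<book pages="100"><title>E-Commerce</title></book>',   # properly nested
    '<book pages="100">\n</book>',                          # empty element
]

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = [is_well_formed(text) for text in examples]
for text, ok in zip(examples, results):
    print("well formed" if ok else "NOT well formed", repr(text))
```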
  Processing a Request for an XML Page

• Why go through all this hassle?
• How would you go about displaying HTML on a
   – PC
   – Handheld
   – Mobile
   Hypertext Markup Language
• Tim Berners-Lee invented HTML

• HTML is a document production language that
  includes a set of tags that define the format and
  style of a document

• HTML is based on SGML

• HTML is an instance of one particular SGML
  document type – Document Type Definition
  (DTD)
                    HTML Tags
• An HTML document contains both document content and
  tags

• The tags are the HTML codes inserted in a document to
  specify the format on screen

•   Each tag is enclosed in angle brackets (< >)

• Most tags are two-sided – opening and closing tags

• Well formed tags, bots, meta tags?? Why are they
  important?
                   HTML Links
• Hyperlinks are bits of text that connect the current
  document to:
   – Another location in the same document
   – Another document on the same host machine
   – Another document on the Internet
   – Can they link to a toaster at home?

• Hyperlinks are created using the HTML anchor tag

• Two popular link structures:
   – Linear hyperlink structure
   – Hierarchical hyperlink structure
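
As an illustration of how the anchor tag carries a link, the sketch below feeds a tiny HTML fragment to Python's standard `html.parser` and collects the `href` targets. The fragment, its three targets (same document, same host, elsewhere on the Internet) and the class name `LinkCollector` are invented for the example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every anchor (<a>) tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """
<a href="#top">Another location in the same document</a>
<a href="notes.html">Another document on the same host</a>
<a href="http://www.example.com/">Another document on the Internet</a>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```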
          HTML Version History
• HTML version 1.0 was introduced in 1991

• HTML 2.0 was published (as RFC 1866) in Nov. 1995

• HTML 3.2 was introduced in 1997

• HTML 4.0 was released by W3C in Dec 1997

• HTML 4.01 was released in Dec 1999

• XHTML 1.0 became a W3C recommendation in Jan 2000
            HTML Editors (1)
• Low-end editors display HTML code on the
  screen and allow you to insert HTML tag pairs by
  clicking selected buttons

• High-end editors are Web site builder programs;
  they provide a rich environment that displays the
  Web page, not the HTML code

• Microsoft FrontPage and Macromedia
  Dreamweaver are examples of Web site builders
HTML Editors (2)

   Static versus Dynamic Pages
• HTML and XML only display and exchange data
• No interactivity; no processing of data
• Scripting languages
   – Provides basic interactivity
        • Rollovers
        • Crawling text
   – JavaScript
   – VBScript

• Full-featured Web programming
   –   Java
   –   Client side scripting or browser side scripting
   –   Applets
   –   J2EE

• Common Gateway Interface (CGI)
   – Allows passing of data between a static HTML page and a
     computer program
         Searching the WWW
• Most data on the Internet is part of the WWW
• Search engines – large databases that index
  WWW content
• Building the search engine database
  – Submit a site to the search engine administrator for
  – Spiders
     • Metatags

  – Google
  – Yahoo
              Search Engines
• A search engine is a special kind of Web page
  software that finds other Web pages that match a
  word or phrase you entered

• A Web directory is a listing of hyperlinks to Web pages
  that is organized into hierarchical categories Eg:

• Search engines contain three major parts: spider,
  index, and utility
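
The index and the search utility can be sketched with a simple inverted index, a mapping from each word to the pages that contain it. The page names and texts below stand in for what a spider might have collected and are invented for the example; real search engines add ranking, stemming and much more.

```python
from collections import defaultdict

# Hypothetical pages a spider might have fetched: name -> text.
pages = {
    "a.html": "packet switching on the internet",
    "b.html": "internet domain names",
    "c.html": "markup languages for the web",
}

def build_index(pages):
    """Build the index: map every word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """The search utility: return pages containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
print(sorted(search(index, "internet")))
print(sorted(search(index, "domain internet")))
```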
Popular Search Engines

Spiders and Crawlers


       Search Engine case study
• Search engine AltaVista was the Google of the last millennium

• First real effort to index the World Wide Web

• One of the few search engines that actually came up with good
  search results

• Had a hard time fighting spam listings in their results

• While spam grew exponentially in AltaVista, a company named
  Google found a way to prioritize web pages more intelligently, and
  thus keep spam out better

        Case Study: Google's PageRank
• PageRank relies on the uniquely democratic nature of
  the web by using its vast link structure as an indicator
  of an individual page's value

• Google interprets a link from page A to page B as a
  vote, by page A, for page B

• But Google looks at more than the sheer volume of
  votes, or links a page receives; it also analyzes the
  page that casts the vote

• Votes cast by pages that are themselves "important"
  weigh more heavily and help to make other pages
  "important"
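
The voting idea can be approximated with a short iterative calculation. This is only a simplified sketch: the slide gives no formula, so the damping factor of 0.85, the number of iterations, the handling of pages without outgoing links and the four-page link graph are all assumptions made for the example, and Google's production ranking is far more elaborate.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links spreads its vote evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Each link is a vote, weighted by the rank of the voting page.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph page C collects the most votes, and A ranks well mainly because its single incoming link comes from the highly ranked C, which mirrors the point that votes from "important" pages weigh more.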
          Intelligent Agents
• An intelligent agent is a program that
  performs functions such as
  – information gathering,
  – information filtering, or
  – mediation,
  running in the background on behalf of a person or
  entity

• What agents can you think of?
           Intelligent Agents (2)
• Search Agents
   – Improve your information retrieval on the Internet
   – Used to find pages on the Web easily and quickly
      • Meta Agents, Specialised (MP3), etc

• Web Agents
  – Improve browsing experience
     • Automate form filling, off-line browsing, etc

• Monitoring Agents
  – Monitor web sites or specific themes
  – Used to get automatic alerts about the latest news
            Intelligent Agents (3)
• Virtual Assistants
   – Artificial life
   – Characters, plants, animals or people living on your desktop

• Shop Bots
   – Allow users to compare prices on the Internet
   – Find the best price for books, CDs, movies, etc.

• Webmastering Agents
  – Make it easy to manage a Web site and make it more effective
  – Monitor broken links, content gathering etc.
       Intelligent Agents (4)
• Other agents …

  – Development agents
    • Used to develop other agents

  – Games agents
    • Used in games

 Ms Dewey: not your ordinary search agent!

         Internet Governance
• Internet Engineering Task Force (IETF)
  – Works in groups to develop standards
• Internet Engineering Steering Group (IESG)
  – Approves or disapproves standards developed by the
    IETF

• Internet Architecture Board (IAB)
  – The oversight authority for the standards development
    process

• World Wide Web Consortium (W3C)
  – Promotes the WWW and develops new web
    technologies and standards
• We're all very familiar with Web 1.0

• But what makes Web 2.0?

• Next lecture …


