Brief Introduction to the Internet and Web for CS 301
What is the Internet?
The Internet is an international network that connects many thousands of networks and millions of
computers across the world. The connection of networks of computers makes possible the exchange of
ideas and information in a manner not possible via traditional electronic and print media, resulting in an
astounding diversity of content. This diversity is increasing as developments in computing and
telecommunications enable previously autonomous services such as video, radio and television to be
accessed on the Internet.
To communicate with each other and exchange information, these different types of computers
need to comply with a set of standard communication rules called a protocol. All computers connected to
the Internet use IP, the Internet Protocol, which controls the breakup of data messages into units called
packets and governs the routing of data from sender to receiver. IP is one of a suite of protocols known
as Transmission Control Protocol/Internet Protocol, or TCP/IP, which was developed by the US
Department of Defense to enable communications over different types of networks.
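As a rough illustration of how a message is broken into packets, the following Python sketch splits a message into numbered pieces and rebuilds it even when the pieces arrive out of order. The packet size, the (sequence number, payload) layout, and the function names are assumptions made for this example. Real IP packets carry binary headers with addresses and other fields, and putting data back in order is the job of TCP rather than IP.

```python
import random

PACKET_SIZE = 8  # bytes of payload per packet (illustrative, not a real IP limit)

def packetize(message: bytes, size: int = PACKET_SIZE) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message by sorting packets on their sequence number."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Hello from CS 301 over the Internet"
packets = packetize(message)
random.shuffle(packets)          # packets may arrive out of order
restored = reassemble(packets)   # identical to the original message
```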
The backbone of the Internet consists of high-speed data communication lines linking major nodes or
host computers. These lines carry the bulk of the data traffic. Major Internet service providers (ISPs) own
the largest networks, which comprise the backbone of the Internet. By connecting together, these
networks form an extremely fast data pipeline that crisscrosses the world.
No one can cause the Internet to crash, as no single computer or node controls it. One or more Internet
nodes could fail without jeopardizing the Internet as a whole or preventing communications. However,
different parts of the world are not equally well defended against Internet service failure. In more
developed countries, the backbone of the Internet usually has redundant intersecting points. If one part
fails, data traffic is quickly rerouted to another. This feature is called redundancy. The more redundancy
the backbone has, the more reliable the Internet service is.
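The rerouting idea can be sketched with a small graph search. The four-node network below and the function name find_route are hypothetical, and breadth-first search stands in for the far more elaborate routing protocols that real backbones use. The point is only that a route survives as long as some path avoids the failed node.

```python
from collections import deque

# Hypothetical backbone: each node lists its directly connected neighbors.
BACKBONE = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def find_route(links, source, target, failed=frozenset()):
    """Breadth-first search for a route that avoids failed nodes."""
    if source in failed or target in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in links[node]:
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no surviving route

normal_route = find_route(BACKBONE, "A", "D")            # goes through B
rerouted = find_route(BACKBONE, "A", "D", failed={"B"})  # goes through C instead
```

With both B and C marked as failed, the function returns None, which corresponds to a region whose backbone has no remaining redundancy.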
While no one owns or controls the Internet, the Internet backbone is made up of large networks operated
by major telecommunications companies such as GTE, MCI, Qwest, Sprint, UUNet, and ANS. Global,
regional and local Internet Service Providers or ISPs (including universities) use these backbone
networks to interconnect their own networks. These ISPs provide Internet access to businesses and
individuals who connect their computers through telephone (at KU, via dial-in) or cable television lines,
directly to a dedicated network cable (at KU, the Ethernet), or even through wireless technology.
Brief history of the Internet
Sometime in the mid-1960s, during the Cold War, it became apparent that there was a need for a
bombproof communications system. A concept was devised to link computers together
throughout the country. With such a system in place, large sections of the country could be destroyed
in a nuclear attack and messages could still get through.
In the beginning, only government "think tanks" and a few universities were linked. Basically the
Internet was an emergency military communications system operated by the Department of
Defense's Advanced Research Project Agency (ARPA). The whole operation was referred to as
ARPANET.
In time, ARPANET computers were installed at every university in the United States that had
defense-related funding. Gradually, the Internet evolved from a military pipeline to a
communications tool for scientists. As more scholars came online, the administration of the
system transferred from ARPA to the National Science Foundation.
Years later, businesses began using the Internet and the administrative responsibilities were
once again transferred. At this time no one party "operates" the Internet; instead, several
entities "oversee" the system and the protocols that are involved. The speed of the Internet
has changed the way people receive information. It combines the immediacy of broadcast with
the in-depth coverage of newspapers and the reference scope of a university library, all within a
few keystrokes' reach.
How does it exist (and grow) financially?
The Internet is composed of hundreds of networks, each administered by a network provider.
Network providers generate revenue by charging Internet access fees to their customers. These
customers may be either individuals or other network providers seeking to connect their own
customers. Direct connection of distinct networks is referred to as a peer connection, while the
point at which networks exchange traffic is termed the peering point. Wherever two networks
peer, their respective network providers characteristically share the costs of the physical network
and the infrastructure required to facilitate the connection. These costs, of course, are passed
along to users in the form of additional surcharges.
While present Internet governance is loose, it has to date been effective. Profit-driven Internet
providers are motivated to improve both the quality of their service and the scope of their networks to
increase their revenues still further. A second element of the profit equation is provided by
Internet merchants, who significantly subsidize the Internet by paying service providers
additional fees for the memory and bandwidth they consume.
The Internet is a true example of economy of scale. A vast number of users paying relatively
nominal fees have subsidized, and will continue to subsidize, the proliferation of this valuable
resource. The Internet is also beginning to exploit economy of scope as new media
implementations begin to offer additional revenue sources. Examples include net phone,
video teleconferencing, and video on demand. The significant streaming requirements of these
media are already providing network providers with revenues based on information transfer charges
and even percentage profit sharing. Internet revenue generation will become as creative as the
technologies that comprise it.
Who are the users and how do they use the Internet?
Generally, we think of Internet users as representing a demographic group as broad as the
resources and subjects offered on the net. However, like everything else that is capable of
generating revenue, there exists a corresponding typical user profile:
Age 33
60% are male
Household income of $59,000
80% access daily
84% report email and web indispensable
25% buy on the web
With cookie information retrieval and website log-on profiling, the Internet user is
perhaps the most thoroughly profiled consumer capable of writing a check or initiating an electronic funds
transfer.
Internet users are connected to a local Internet Service Provider (ISP) via their computer and a
modem, typically phone, cable, or DSL. Upon physical connection to the provided network, the
user's request for a website is routed through this network until it reaches the web server that
stores or "hosts" the requested site. Upon user initiated query, that server responds by returning
the requested information from the site back through the network to the user.
Internet user applications are varied in scope and technology, with many new offerings developed daily,
including net phone, tele-video conferencing, audio and video file download, multi-person gaming,
financial analysis, and real-time stock quoting. This has truly become a medium whose offerings are limited
only by imagination.
What services are available?
Only a short time ago, many of the Internet services that we now take for granted were not available on
any platform other than UNIX. Now an abundance of powerful services is widely available within
common Internet browsers or only a few mouse clicks away. No longer does providing Internet
services require a fascination with the user-unfriendly complexities of UNIX or the significant amounts of
time and money involved in purchasing and maintaining a UNIX workstation. Section 5.0 gives an overview
of the more conventional Internet services available.
Electronic Mail (Email)
Electronic mail is one of the most basic services supported by the Internet. Email is typically used for
person-to-person communication and for group work, both by delivering messages to multiple
recipients (a mailing list) and by exchanging data files as e-mail attachments. Small mailing lists
can be created from within the software of individual users, but
larger lists are best administered using specialized mailing list software. By the same logic, although
smaller data files can be transferred through e-mail attachments, larger files are best transferred using
the File Transfer Protocol (Section 5.2).
Email works through an electronic exchange between a client and a server. An e-mail client is
known as a mail user agent, or MUA, while an e-mail server is known as a mail transport agent, or MTA.
The acronym makes the obvious point that mail servers simply transport messages generated by other
software. The mail client allows users to compose, send, receive, and read electronic mail, as well as
delete and file sent and received messages. A mail server, on the other hand,
handles the delivery of mail to mail clients and the routing of mail between different servers.
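To make the client side concrete, here is a minimal sketch of what a mail client does when it composes a message, using Python's standard email package. The addresses, subject, and attachment contents are placeholders invented for the example, and nothing is actually sent. Handing the finished text to a mail server is a separate step.

```python
from email.message import EmailMessage

# Addresses and file contents below are placeholders for illustration.
msg = EmailMessage()
msg["From"] = "student@example.edu"
msg["To"] = "classmate1@example.edu, classmate2@example.edu"  # a small mailing list
msg["Subject"] = "CS 301 project notes"
msg.set_content("The notes for this week are attached.")
msg.add_attachment(b"week 1 notes", maintype="application",
                   subtype="octet-stream", filename="notes.txt")

# The text form that a mail client would hand to a mail server for delivery.
wire_format = msg.as_string()
```

Adding the attachment turns the message into a multipart message, which is how a single e-mail carries both readable text and a data file.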
File Transfer Protocol (FTP)
When individuals exchange small files among themselves via the Internet, the most common
approach is to send those files as e-mail attachments. However, if it is necessary to distribute large files or
make files of any size publicly available, then FTP (File Transfer Protocol) is employed.
FTP relies on an FTP server, which is simply a file server like any office file server, such as a Novell
NetWare server. The difference is that local file servers are available only on local networks. By
contrast, FTP servers are file servers whose potential audience is made up of everyone on the Internet.
FTP is a standard that was designed to ensure that any machine, whether it's a client or a server, a UNIX
machine or a Macintosh, is able to transfer files back and forth with ease. Every FTP client and server
shares a common protocol (speaks the same language). They all share the same set of commands that
enables user log in, file structure navigation, and the transmission or reception of files.
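The three command groups mentioned above (log in, navigation, and file transfer) map directly onto Python's standard ftplib module. The sketch below is only an outline. The function name fetch_file is invented for this example, the server and path in the commented call are hypothetical, and the function is defined but not run here because it needs a live FTP server.

```python
from ftplib import FTP

def fetch_file(host: str, directory: str, filename: str) -> bytes:
    """Log in anonymously, change directory, and download one file."""
    chunks = []
    with FTP(host, timeout=30) as ftp:   # connect to the FTP server
        ftp.login()                      # user log in (anonymous by default)
        ftp.cwd(directory)               # file structure navigation
        ftp.retrbinary(f"RETR {filename}", chunks.append)  # reception of the file
    return b"".join(chunks)

# Example call (hypothetical server and path, not executed here):
# data = fetch_file("ftp.example.org", "/pub", "README")
```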
Telnet
Telnet makes it possible for any computer to function as if it were a terminal attached directly to a remote
computer in order to access databases, library card catalogs, and other information. Use of telnet
requires the installation and configuration of telnet software. Some telnet sites allow the user to log in
using a generic user ID and password; others require that the user become a member or registered user by
first logging on as a guest in order to register. The advantage of telnet is that the direct server linkage can
dramatically improve information retrieval speeds and give the user access to library files
potentially missed by conventional browsers.
Search Engines
Search engines are huge databases of web page files that have been assembled automatically by
machine. There are two fundamental types of search engines: individual and meta. Individual search
engines compile their own searchable databases of the web. In contrast, meta-searchers do not compile
databases. Instead, they search the databases of multiple individual engines simultaneously.
Search engines compile their databases by employing "spiders" or "robots" ("bots") to crawl through web
space from link to link, identifying and reading pages. Sites that no other pages link to may be missed
by spiders altogether. Once the spiders get to a web site, they typically index most of the words on the
publicly available pages at the site. Web page owners may submit their URLs to search engines for
"crawling" and eventual inclusion in their databases. Whenever a user submits a query to a search
engine, the engine scans its index of sites and matches the user’s keywords and phrases with
those in the texts of documents within the engine's database.
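The indexing and matching steps described above can be sketched with an inverted index, a structure that maps each word to the pages containing it. The page URLs and texts below are made up, and the functions build_index and search are illustrative names. Production search engines add ranking, phrase matching, and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages a spider has already fetched: URL -> page text.
PAGES = {
    "http://example.edu/a": "Internet protocols and packets",
    "http://example.edu/b": "Search engines index web pages",
    "http://example.edu/c": "Web pages travel as packets",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
hits = search(INDEX, "web pages")   # pages b and c contain both words
```

Note that the search consults only INDEX, never the pages themselves, so a page changed after indexing is matched against its old text.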
It is important to remember that when using a search engine, the search is not of the entire web as it
exists at that moment. The search is actually of only a portion of the web, captured in a fixed index
created at an earlier date. How much earlier is difficult to determine. Spiders regularly return to the web
pages they index to look for changes. When changes occur, the index is updated to reflect the new
information. However, the process of updating can take a while, depending upon how often the spiders
make their rounds and then, how promptly the information they gather is added to the index. Until a page
has been both "spidered" and "indexed," any new information will not be available for general access.
Push Technology
Push technology is the process by which information is delivered automatically to a PC according to
programmed preferences, eliminating the need to surf several Web sites to gather specific news or
material. The advantages of push technology are straightforward. The traditional pull approach requires
that users know a priori where and when to look for data or that they spend an inordinate amount of time
polling known sites for updates and/or hunting on the network for relevant sites. Push relieves the user of
these burdens.
The problems of push are also fairly obvious. Push transfers control from the users to the data providers,
raising the potential that users receive irrelevant data while not receiving the information they need.
These potential problems can arise due to issues ranging from poor prediction of user interests to outright
abuse of the mechanism, such as "spamming". The "in-your-face" nature of push technology is the root
of both its potential benefits and disadvantages.
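The difference between push and pull can be sketched as a tiny publish/subscribe channel: users register their preferences once, and the provider initiates every delivery from then on. The class name PushChannel, the user names, and the topics are all invented for illustration and do not describe any particular push product.

```python
from collections import defaultdict

class PushChannel:
    """Delivers each published item to users whose preferences match its topic."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # topic -> user names
        self.inboxes = defaultdict(list)      # user name -> delivered items

    def subscribe(self, user, topic):
        """Record a user's programmed preference for a topic."""
        self.subscribers[topic].add(user)

    def publish(self, topic, item):
        """The provider, not the user, initiates delivery."""
        for user in self.subscribers.get(topic, ()):
            self.inboxes[user].append(item)

channel = PushChannel()
channel.subscribe("alice", "weather")
channel.subscribe("bob", "sports")
channel.publish("weather", "Storm warning")
channel.publish("stocks", "Market closes higher")  # no subscribers, nothing delivered
```

A provider that ignored the subscriber list and appended to every inbox would be the "spamming" case described above.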
Chat
Internet Relay Chat, or IRC, provides a means by which individuals around the world can
conduct online discussions. Like every other Internet service, IRC has client programs and server
programs. The client is, as usual, the program run on the user’s local machine. An IRC server resembles
a large switchboard: it receives everything a user types and relays it to other users, and vice
versa. To support real-time chat, all the different servers are in constant contact with each
other. As a result, text typed to one server is quickly relayed to the other servers so that the entire IRC
world becomes a dynamic “chat room”.
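The switchboard behavior can be sketched with a toy server class that relays each typed line to its own users and to the servers it is linked to. Everything here (the ChatServer class, the server names, the nicknames) is a simplification invented for this example. Real IRC adds channels, a wire protocol, and a carefully managed server topology.

```python
class ChatServer:
    """Toy IRC-style server: relays text to local users and to peer servers."""

    def __init__(self, name):
        self.name = name
        self.peers = []    # other servers this one is linked to
        self.users = {}    # nickname -> list of lines received

    def connect_user(self, nick):
        self.users[nick] = []

    def link(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def say(self, nick, text):
        """Called when a local user types a line."""
        self._relay(f"<{nick}> {text}", sender=nick, seen=set())

    def _relay(self, line, sender, seen):
        seen.add(self.name)                  # avoid relaying in circles
        for nick, inbox in self.users.items():
            if nick != sender:
                inbox.append(line)
        for peer in self.peers:
            if peer.name not in seen:
                peer._relay(line, sender, seen)

east, west = ChatServer("east"), ChatServer("west")
east.link(west)
east.connect_user("ann")
west.connect_user("raj")
east.say("ann", "hello from the east server")  # raj sees it on the west server
```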
E-Commerce
E-commerce is the Internet-facilitated marketing of products and exchange of commodities or services for
payment. It is the means by which prospective buyers may quickly find products of interest, often at
reduced cost. Conversely, the scope of the Internet gives sellers literally worldwide market exposure.
While the notion is predicated upon a conventional “brick and mortar” business model, the
infrastructure is necessarily supported by all the fundamental Internet core requirements:
data storage and retrieval (relational databases, SQL)
web technology (client/server model)
Java interface to relational databases
web security (cryptography and ciphers, secure Internet protocols, digital certificates, digital signatures, digital envelopes, and firewalls)
Video conferencing
Video conferencing in its most basic form is the transmission of image (video) and speech (audio) back
and forth between two or more physically separate locations. This is accomplished through the use of
cameras (to capture and send video), video displays (to display received video), microphones (to capture
and send audio), and speakers (to play received audio).
Video conferencing began over a decade ago with the introduction of expensive group conferencing
systems designed to send and receive compressed audio and video over network connections that could
guarantee a dedicated rate of transmission and predictable service.
Now, however, basic video conferencing may be accomplished for about $150, the cost of all
necessary equipment: a digital camera and microphone. The most popular support software is a free
video conferencing program, CU-SeeMe, available to anyone with a Macintosh or Windows computer and a
connection to the Internet. With CU-SeeMe, anyone can video conference with another site located
anywhere in the world.
How does a private user connect to the internet?
Four predominant connection alternatives are available to the general private user: the
phone modem, the cable modem, the DSL modem, and the satellite dish.
Modems
While modems still provide the most common means of private user access to the Internet, modem
speeds have essentially hit their limit with 56K modems. In fact, 56K is a little misleading. Due to
FCC regulations, the maximum transmission rate is closer to 53 Kbps. Anyone who has used a modem
knows the problems associated with dial-up access. Even if you just want to check your email for one
minute, you have to wait a couple of minutes for your modem to dial a number and establish a connection to
your ISP. It often takes less time to check your email than it does to connect to the Internet! While this isn't
a major problem if you rarely use the Internet, it can be a major annoyance if you use it heavily. For
heavy users, a dedicated, "always on" connection such as DSL or cable is the better alternative.
Cable
Cable connects the user to the Internet through a coaxial cable, often using the same line that carries the
cable TV service.
DSL
xDSL is used to describe several types of DSL (Digital Subscriber Line) technologies, including ADSL,
which provides different upload and download speeds and is most popular with consumers, and SDSL,
which provides the same speed in both directions and is most popular with businesses.
Satellite
For rural users or people who don't have DSL in their area, satellite connectivity is becoming a more and
more viable alternative for high-speed Internet access. One company, DirecPC (which is an offshoot of
DirecTV), has taken the lead among satellite providers. It offers its service through AOL,
Earthlink, and Pegasus Broadband. The main problem with satellite is that it's only for downloading data.
You'll need a modem to send information out.
Table 8.1 summarizes the private user connection alternatives.
Connection  Cost        Speed                      Hardware        Pros                               Cons
Modem       $12 - $30   Up to 56 Kbps              56K modem       Inexpensive                        Ties up phone line
                                                                   Wide availability                  Connection is not "always on"
                                                                                                      Slower
Cable       $40 - $60   500 Kbps to 2 Mbps         Cable modem     Wide availability                  Potential security risks
                                                                   Relatively inexpensive
ADSL        $60 - $80   128 Kbps - 1.54 Mbps       DSL modem       Affordable                         Available only in limited areas
                                                                   Shares a telephone line            Speed can vary widely
                                                                   Wide variety of speeds and prices  User must be within 3 miles of switching site
                                                                   Choice of service providers
Satellite   $50 - $100  Up to 400 Kbps download    Satellite dish  Access Internet anywhere           Limited upload speed
                        Up to 56 Kbps upload                       Available almost everywhere        Limited competition
Table 8.1 - User Connection Alternatives
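To get a feel for what these speeds mean in practice, the short calculation below estimates how long a file would take to download at the top nominal speed of each connection type. The 5 MB file size is an arbitrary choice for this example, the conversion assumes 1 MB = 8,000 kilobits, and protocol overhead is ignored, so real transfers would take longer.

```python
# Nominal top download speeds in kilobits per second, taken from Table 8.1
# (upper bounds; real throughput is usually lower).
SPEEDS_KBPS = {
    "Modem": 56,
    "Cable": 2000,
    "ADSL": 1540,
    "Satellite": 400,
}

def download_seconds(size_megabytes: float, speed_kbps: float) -> float:
    """Time to transfer a file, assuming 1 MB = 8,000 kilobits and no overhead."""
    return size_megabytes * 8000 / speed_kbps

for name, speed in SPEEDS_KBPS.items():
    print(f"{name:<10}{download_seconds(5, speed):8.1f} s for a 5 MB file")
```

Under these assumptions the modem needs roughly 12 minutes for the file, while the cable connection needs about 20 seconds.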
Description of the World Wide Web
The World Wide Web is a subset of the Internet: it is one of many Internet services, alongside email,
FTP, telnet, chat, and USENET news.
Each of these services requires its own servers and clients and uses its own protocol or language to
communicate.
A number of key attributes dominated the design of the World Wide Web:
Independence of specifications
Flexibility was clearly a key objective. Every specification needed to ensure interoperability placed
constraints on the implementation and use of the Web. Therefore, minimal constraints combined with
independent specifications provided the design objectives. The independence of specifications would
allow parts of the design to be replaced while preserving the basic architecture. Thus, an old FTP
protocol could be intermixed with the new HTTP protocol in the address space, and conventional text
documents could be intermixed with new hypertext documents. This ability to evolve from the past to the
present within the general principles of the WWW architecture ensured that evolution into the future would
be smooth and incremental.
Universal Resource Identifiers
The power of a link in the Web is that it can point to any resource of any kind in the universe of
information. This requires a global space of identifiers. These Universal Resource Identifiers are the
primary element of Web architecture. The now well-known structure starts with a prefix such as "http:" to
indicate into which space the rest of the string points. The URI space is universal in that any new space
of any kind which has some kind of identifying, naming or addressing syntax can be mapped into a
printable syntax and given a prefix, and can then become part of URI space. The properties of any given
URI depend on the properties of the space into which it points. Depending on these properties, some
spaces tend to be known as "name" spaces, and some as "address" spaces. The web architecture,
fortunately, does not depend on the decision as to whether a URI is a name or an address, although the
term URL (locator) has been coined to indicate that most URIs actually in use were considered more
like addresses than names.
Opaqueness of identifiers
An important principle is that URIs are generally treated as opaque strings: client software is not allowed
to look inside them and to draw conclusions about the object referenced.
HTTP
As protocols for accessing remote data went, a standard already existed in the File Transfer Protocol (FTP).
However, FTP was not optimal for the web: it was too slow and not sufficiently rich in features. A
new protocol, the HyperText Transfer Protocol (HTTP), was therefore designed to operate with the speed
necessary for traversing hypertext links. HTTP URIs are resolved into the addressed document by splitting
them into two halves. The first half is applied to the Domain Name Service to discover a suitable server,
and the second half is an opaque string which is handed to that server.
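The two-halves split can be demonstrated with Python's standard urllib.parse module. The URI below is a made-up example and split_http_uri is a name chosen for this sketch. It separates the host name, which would be looked up through the Domain Name Service, from the remainder, which is passed to the server without being interpreted by the client.

```python
from urllib.parse import urlsplit

def split_http_uri(uri: str) -> tuple[str, str, str]:
    """Return (scheme, host for the name lookup, opaque string for the server)."""
    parts = urlsplit(uri)
    opaque = parts.path or "/"
    if parts.query:
        opaque += "?" + parts.query
    return parts.scheme, parts.hostname or "", opaque

scheme, host, opaque = split_http_uri(
    "http://www.example.edu/courses/cs301/notes.html")
# scheme is the prefix naming the space, host is the first half,
# and opaque is the second half handed to the server as-is.
```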
HTML
For the interchange of hypertext, the Hypertext Markup Language was defined as an associated data
format. Given the presumed difficulty of encouraging the world to use a new global information system,
HTML was chosen to resemble some SGML-based systems in order to encourage its adoption by the
documentation community, among whom SGML was a preferred syntax, and the hypertext community,
among whom SGML was the only syntax considered as a possible standard.
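For readers who have not seen HTML, the sketch below holds a very small hypertext document in a Python string and uses the standard html.parser module to pull out its links. The page content, the link target, and the LinkCollector class are all invented for illustration. The angle-bracket tags are the SGML-style syntax referred to above.

```python
from html.parser import HTMLParser

# A small hypertext document; the link target is a placeholder.
PAGE = """<html>
  <head><title>CS 301 Notes</title></head>
  <body>
    <h1>The Internet</h1>
    <p>See the <a href="http://www.example.edu/web.html">notes on the Web</a>.</p>
  </body>
</html>"""

class LinkCollector(HTMLParser):
    """Collects the href of every anchor (<a>) tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(PAGE)
# collector.links now holds the one URI that the page links to.
```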