Searching the Internet
Internet searching is the “art” of submitting a word or phrase to a web catalog or engine and receiving a series of
URLs containing the word or phrase. Search engines have become an important method of locating data on the web.
Uniform Resource Locator (URL) is a specification of the location of a link. It specifies the
protocol (http:// for a web page), site name, path and file name to the resource. Think of it as
a networked extension of the standard filename concept: not only can you point to a file in a
directory, but that file and directory can exist on any machine on the network, can be served via
any of several different methods, and might not even be something as simple as a file: URLs can
also point to … queries stored deep within databases… (…from The Webmaster’s Lexicon)
The standard format of a URL is:
1. Scheme: appears before the colon, and describes the protocol, or the way the browser should
handle the resource.
http: = HyperText Transfer Protocol, the native transfer method on the Web.
ftp: = File Transfer Protocol, for downloading files from an FTP server.
file: = specifies a file on your computer as the resource
news: = specifies a newsserver and newsgroup as the host and resource
mailto: = starts the mail program associated with the browser, with a recipient as the resource
2. Host: appears after two forward slashes (//), and references the host computer (site) on which
the resource resides. The host segment includes the domain name of the host computer. Domain
names end in 2-6 letter zone names to indicate what type of site you are contacting:
.com = commercial organization
.edu = educational institutions
.gov = U.S. government and public sites
.mil = U.S. military sites
.net = networking organizations, communications service providers
.org = non-profit organizations and others not fitting existing categories
.fr, .uk, .us, .ca, … = country domains, which end in a two-letter country code
The Internet Corporation for Assigned Names and Numbers (www.icann.org) has added 7 new domains:
.info = information services
.biz = trademarked businesses
.name = individual/personal sites
.pro = professionals
.aero = aviation
.coop = business cooperatives
.museum = museums
3. Resource: appears after a single forward slash (/), describing the full path to a file or
document. index.html is usually the default resource at an http location.
http://www.unf.edu/index.html or http://osprey.unf.edu or http://www.state.fl.us
file:///C|/temp/jenny.gif (note changes to standard file notation)
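The three parts described above can be pulled out of a URL programmatically. The following is a minimal sketch, assuming Python and its standard urllib.parse module; the function name describe_url is made up for this illustration and is not part of any library.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the scheme, host, and resource parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # text before the colon
        "host": parts.netloc,      # text after the two forward slashes
        "resource": parts.path,    # path after the host (may be empty)
    }

for example in ("http://www.unf.edu/index.html", "http://osprey.unf.edu"):
    print(describe_url(example))
```

For the second URL the resource part comes back empty, which is the case where the server supplies its default document.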
[K.Brown – rev: 5/02] page 1 of 2
Where do I start?
A good place to start Internet searching is through the UNF Library’s home page.
On http://www.unf.edu/library you will find a link to Internet Search Engines.
This link, http://www.unf.edu/library/guides/search.html, describes and connects to several of
the most popular and useful search tools available.
I. Search Engines:
Search Engines are tools to let you explore the databases containing text from over a billion
unclassified Web pages (documents). Most concentrate on providing powerful search
capabilities, not organization of the data. Search engines index data; they do not provide a
review process on the content or value of the data.
Most of the major search engines now also include additional services such as directories and
meta-index searches, as discussed below.
The most comprehensive search engine is AltaVista.
Others are Fast, HotBot, Infoseek, and Excite.
Excite includes reviews, discussion groups and classified ads.
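To make "indexing" concrete, here is a small sketch of a word index of the kind a search engine might build. It is an illustration only, written in Python, and does not describe how any engine named above actually works. The page addresses and text are invented.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Hypothetical pages, invented for illustration only.
pages = {
    "http://example.com/a": "field mouse habitat",
    "http://example.com/b": "Mickey Mouse cartoons",
}
index = build_index(pages)
print(sorted(index["mouse"]))
```

Looking up a word returns every page containing it, with no judgment about the content or value of those pages.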
II. Internet Directories:
Internet Directory tools provide multi-level topic directories of a smaller database of
documents, allowing you to browse for information on a given subject. Topic directories are
established based on reviewing and classifying each Web site for content.
Since classification of Web sites requires human intervention, these directories are smaller in
scope, but often lead to more precise results. The data is organized!
Yahoo arranges and reviews over a million sites.
LookSmart contains over 500,000.
Magellan reviews sites for value, allowing the user to screen out “content for mature
Lycos includes abstracts for sites matching search results.
III. Meta-Indexes:
Meta-indexes search other indexes. These tools translate your query into the format of several
other search tools and return results categorized by the tool used.
Google, Dogpile and MetaCrawler query most of the major search engines.
Google is currently the largest index with access to over 1.3 billion pages.
One site, www.37.com, claims to search using thirty-seven different engines.
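The idea of translating one query into the format of several tools can be sketched as follows. This is a hedged illustration in Python: the engine names, addresses, parameter names and syntax rules are all hypothetical and do not belong to any real search tool.

```python
from urllib.parse import urlencode

def and_keyword(terms):
    """Format for an engine that joins required terms with AND."""
    return " AND ".join(terms)

def plus_prefix(terms):
    """Format for an engine that marks required terms with a plus sign."""
    return " ".join("+" + term for term in terms)

# Hypothetical engines: address, query parameter name, and query format.
ENGINES = {
    "engine-a": ("http://engine-a.example/search", "q", and_keyword),
    "engine-b": ("http://engine-b.example/find", "query", plus_prefix),
}

def translate(terms):
    """Return {engine name: search URL} for a query requiring every term."""
    urls = {}
    for name, (base, param, fmt) in ENGINES.items():
        urls[name] = base + "?" + urlencode({param: fmt(terms)})
    return urls

for name, url in translate(["Gore", "Bush"]).items():
    print(name, url)
```

Keeping the results keyed by engine name mirrors the way a meta-index returns results categorized by the tool used.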
Search Qualifiers / Boolean Operators: examples of commonly used operators
AND (or +): Gore AND Bush returns documents with both Gore and Bush.
OR: Gore OR Bush returns documents with either Gore or Bush.
NOT (or -): mickey NOT mouse, or +mickey -mouse, returns mickey but not mouse. Mickey Mantle would be found, but not Mickey Mouse.
Capitalization: Mouse returns the proper name. Mickey Mouse would be found, but not field mouse.
“phrase in quotes”: “Duke Blue Devils” returns the exact phrase. Excludes pages about Duke Power, devil worship or blue suede shoes.
NEAR: Duke NEAR Blue NEAR Devils is similar to quotes, except that proximity of the words determines results.
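The AND, OR and NOT operators can be illustrated with a short sketch. It assumes Python, uses two invented one-line documents, and ignores capitalization, phrases and NEAR. The function names are hypothetical, and real search tools differ in their details.

```python
def words(text):
    """Return the set of lowercase words in a text."""
    return set(text.lower().split())

def search(docs, required=(), optional=(), excluded=()):
    """Return titles of documents matching AND / OR / NOT word lists."""
    hits = []
    for title, text in docs.items():
        found = words(text)
        if not all(w in found for w in required):               # AND
            continue
        if optional and not any(w in found for w in optional):  # OR
            continue
        if any(w in found for w in excluded):                   # NOT
            continue
        hits.append(title)
    return hits

# Invented documents for illustration only.
docs = {
    "baseball": "Mickey Mantle played baseball",
    "cartoon": "Mickey Mouse is a cartoon",
}
print(search(docs, required=["mickey"], excluded=["mouse"]))  # ['baseball']
```

The call shown corresponds to +mickey -mouse: the Mickey Mantle document is found, but the Mickey Mouse document is not.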