INTERNET SEARCHING
VLIR Workshop on Library
Development Problems
CBA Heverlee 13.2.2003
Ludo Holans
Outline
• Introduction
• Meta search engines
• Pitfalls
• Classic searching versus free internet
searching
• New developments : hyperbolic browsers
and the semantic web
Introduction
• The majority of information that can be searched
on the web is text-based.
• Searching techniques and operators
• a. Boolean operators
• b. Proximity search (NEAR, NEAR/n, WITH, ADJ, BEFORE, AFTER)
• c. Wildcards (truncation)
• d. Nesting (parentheses) « »
• e. Field searching (domain, image, link, text, title, url)
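The operators listed above can be tried out on a handful of documents. The following is a minimal sketch, not the syntax of any particular search engine: the helper names (term, AND, OR, NOT, near) and the sample documents are assumptions made for illustration. Truncation is written with "*", and nesting is expressed by nesting the function calls.

```python
import re
from fnmatch import fnmatchcase

def words(text):
    """Lower-case a text and split it into a list of words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term(pattern):
    """Match documents containing a word that fits the pattern ('*' = truncation)."""
    return lambda doc: any(fnmatchcase(w, pattern) for w in words(doc))

# Boolean operators: each one combines smaller queries into a larger one.
def AND(*queries):
    return lambda doc: all(q(doc) for q in queries)

def OR(*queries):
    return lambda doc: any(q(doc) for q in queries)

def NOT(query):
    return lambda doc: not query(doc)

def near(a, b, n):
    """Proximity search: words a and b occur within n words of each other."""
    def match(doc):
        ws = words(doc)
        pos_a = [i for i, w in enumerate(ws) if w == a]
        pos_b = [i for i, w in enumerate(ws) if w == b]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return match

# Hypothetical sample documents.
docs = [
    "Library catalogues and internet searching",
    "Searching the web with Boolean operators",
    "Cooking recipes from the internet",
]

# (librar* OR web) AND search* AND NOT cooking
query = AND(OR(term("librar*"), term("web")), term("search*"), NOT(term("cooking")))
hits = [d for d in docs if query(d)]
```

Here hits contains the first two documents; the third is excluded because it matches neither librar* nor web and contains "cooking".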
META SEARCH ENGINES
• A. Searching instruments
• 1. Directories (databases compiled and
indexed by people). Hierarchical system of
categories. Tree structure.
• Examples : http://www.britannica.com
• http://dmoz.org
• http://www.yahoo.com/
• 2. Search engines
• A spider or crawler finds new or updated
information, which is then stored in the
database. A stoplist can eliminate a large
share of the noise. Stemming (reducing a
keyword to its stem) makes it possible to use
wildcards later on.
• Examples : http://www.altavista.com/
• http://www.excite.com/
• http://www.google.com/
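The stoplist and stemming steps described above can be sketched as follows. This is an illustration under stated assumptions, not how any of the engines listed actually works: the stoplist is a tiny sample, and the suffix-stripping rule is a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Assumption: a very small illustrative stoplist.
STOPLIST = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}

# Assumption: crude suffix stripping, checked longest suffix first.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Tokenise, drop stoplist words, and reduce the rest to their stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPLIST]

def build_index(docs):
    """Build the 'database': a mapping from stem to the ids of documents containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for t in index_terms(text):
            index.setdefault(t, set()).add(doc_id)
    return index

index = build_index(["Searching the web", "Web crawlers"])
```

Because "searching", "searched" and "searchers" all reduce to the stem "search", a later query for search* finds all of them in a single lookup.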
• 3. Metasearch instruments
• a. Metasites bring searching instruments together
on one website or web page, via hyperlinks or
embedded search interfaces.
• Examples :
• http://www.gogettem.com/
• http://stommel.tamu.edu/~baum/linuxlist/linuxlist/node4.html
• b. « Human resource » websites
• Via a form you send your query to a team
of specialists. The answers to your question
are sent to you afterwards.
• Example : http://www.about.com/
• c. Metasearch engines
• Examples :
• http://www.ixquick.com/
• http://www.mamma.com/
• http://www.MetaCrawler.com/
• http://www.Profusion.com/
• http://www.Queryserver.com/
• http://www.search.com/
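A metasearch engine sends one query to several search engines and combines the answers. The sketch below shows one possible way to merge the result lists: duplicates are collapsed, and each URL is scored by the sum of 1/rank over the engines that returned it. The merging rule, the engine names and the example.org URLs are assumptions for illustration; the slides do not say how the engines listed above merge their results.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Merge ranked URL lists from several engines into one ranked list.

    result_lists maps an engine name to its URLs, best result first.
    A URL earns 1/rank from each engine that returned it.
    """
    scores = defaultdict(float)
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical answers from two engines to the same query.
engine_results = {
    "engine_a": ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    "engine_b": ["http://example.org/b", "http://example.org/d"],
}
merged = merge_results(engine_results)
```

In this example the page returned by both engines ends up first, ahead of the page that only one engine ranked at the top.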
B. How a search engine works
• Matching of document features to a query
• 1. Term frequency
• 2. Location of terms
• 3. Link analysis
• 4. Popularity
• 5. Date of publication
• 6. Length
• 7. Proximity of query terms
• 8. Proper nouns
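Three of the features above (1. term frequency, 2. location of terms, 7. proximity of query terms) can be combined into a single relevance score. This is a toy sketch: the weights and the formula are assumptions chosen for illustration, and real engines keep their ranking formulas to themselves.

```python
import re

def tokens(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, title, body, w_tf=1.0, w_title=2.0, w_prox=1.5):
    """Score a document against a query; the weights are arbitrary assumptions."""
    q = tokens(query)
    body_toks = tokens(body)
    title_toks = set(tokens(title))

    # 1. Term frequency: share of the body words that are query terms.
    tf = sum(body_toks.count(t) for t in q) / max(len(body_toks), 1)

    # 2. Location: fraction of the query terms that appear in the title.
    in_title = sum(t in title_toks for t in q) / max(len(q), 1)

    # 7. Proximity: inverse of the smallest distance between the first two query terms.
    prox = 0.0
    if len(q) >= 2:
        pos_a = [i for i, t in enumerate(body_toks) if t == q[0]]
        pos_b = [i for i, t in enumerate(body_toks) if t == q[1]]
        gaps = [abs(i - j) for i in pos_a for j in pos_b if i != j]
        if gaps:
            prox = 1.0 / min(gaps)

    return w_tf * tf + w_title * in_title + w_prox * prox

# Hypothetical documents.
doc_a = score("semantic web", "Semantic web", "the semantic web links data")
doc_b = score("semantic web", "Cooking", "web pages about semantic cooking and more web")
```

The first document scores higher: the query terms appear in its title and stand next to each other in its text.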
PITFALLS
• Common errors :
• 1. Misspellings (searching, serching,
seerching, sherching etc)
• 2. Redundant terms (6 to 8 terms)
• 3. Stoplist terms
• 4. Too many terms, synonyms
• 5. Alternate spellings
• 6. Construction too complicated or not the
correct Boolean construction
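Misspelled query terms can be caught by comparing them with a list of known terms before the search is sent. The sketch below uses get_close_matches from Python's standard difflib module; the vocabulary and the cutoff of 0.8 are assumptions for illustration.

```python
from difflib import get_close_matches

# Assumption: a small vocabulary of known index terms.
VOCABULARY = ["searching", "engine", "library", "catalogue", "catalog"]

def suggest(term, vocabulary=VOCABULARY):
    """Return the closest known term, or the term itself if nothing is close enough."""
    matches = get_close_matches(term.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else term

corrected = [suggest(t) for t in ["serching", "seerching", "sherching"]]
```

All three misspellings from the list of common errors are mapped back to "searching"; a term with no close match in the vocabulary is returned unchanged.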
• USING FILTERS for slicing and dicing
your search results
• Filters
• a. Site filters
• com, edu, gov, mil, net, org, arts, firm,
info, nom, rec, store, web
• country domains (geographic or ISO 3166
domains)
• b. Size and date filters
• c. Specialty filters and search options
• e.g. people’s names, depth, anchor, java
applets, domain, host, image, link, title, url,
file/media types, business document types
• UNDERSTAND your engines
• http://www.beaucoup.com
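Site filters and the host filter can also be applied to a result list after the search. A minimal sketch, using urlparse from Python's standard library on URLs taken from these slides; the function names are assumptions for illustration.

```python
from urllib.parse import urlparse

def host_of(url):
    """Return the lower-cased host name of a URL ('' if there is none)."""
    return (urlparse(url).hostname or "").lower()

def filter_by_domain(urls, suffix):
    """Site filter: keep URLs whose host ends in a given domain, e.g. 'edu' or 'be'."""
    suffix = "." + suffix.lower().lstrip(".")
    return [u for u in urls if host_of(u).endswith(suffix)]

def filter_by_host(urls, host):
    """Host filter: keep URLs served by exactly this host."""
    return [u for u in urls if host_of(u) == host.lower()]

results = [
    "http://www.apache.org/",
    "http://stommel.tamu.edu/~baum/linuxlist/linuxlist/node4.html",
    "http://www.samba.org",
    "http://www.google.com/linux",
]

org_sites = filter_by_domain(results, "org")
edu_sites = filter_by_domain(results, "edu")
google_pages = filter_by_host(results, "www.google.com")
```

The same two functions cover the generic domains (com, edu, gov, ...) and the country domains, since both are simply the last part of the host name.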
New developments
• Hyperbolic browsers :
http://www.webbrain.com
• Semantic web
• Vertical search engines
• http://searchmil.com/
• http://www.corrosionsource.com/
• http://www.google.com/linux
• Open source software in a university
library
• Samba : http://www.samba.org
• Apache Web Server :
http://www.apache.org
• Majordomo :
http://www.greatcircle.com/majordomo
CLASSIC VERSUS FREE
INTERNET SEARCHING
• What can you expect or what should you
expect ?
• Example of a citation (Web of Knowledge
and NEC)
• Example of a topic : cellular neural
networks and visual computing
B. Exercises
• Panamarenko in the title-field
• Laplace in url-field
• Searchenginewatch.com in host-field
• Osama Bin Laden in image-field
Get documents about "