The Internet
Introduction
Search engines
Biosci Newsgroups
Link collections
Important entry sites
Comparison HUSAR / Internet
HUSAR HUSAR
Introduction
• It is impossible to present all the resources on the entire WWW
• Interesting sites are added every week
• As the internet changes and grows, sites that are interesting today may be boring tomorrow
Our object for this session
• Not primarily where to get information but:
• How to get to information (search strategies)
The ‘best’ method doesn’t exist; therefore we present a personal view!
All the indicated links can be found at: genome.dkfz-heidelberg.de

Several basic sources of information
• Search engines
• Metasearch engines
• Newsgroups (BIOSCI)
• Medline
• Homepages / portals
• Electronic journals
• White lists
The internet is a rich source of information
... but you have to combine the right question with the right source!
Questions
• I need info on Dr. Complexname.
• How do I purify DNA from hair?
• I need info on baldness.
• Is there a database of hair growth related proteins?
• I need info on Dr. Smith, member of National Baldness Society.
• I need info on baldness related diseases/syndromes.
• I need info on Dr. Smith, working at Baldness University.
Sources
• Newsgroups (BIOSCI)
• Search engines
• Metasearch engines
• Homepages / portals
• Medline
• Electronic journals
• White lists
Search engines: shapes and sizes
• AltaVista and Northern Light are two of the largest search engines on the web.
• FAST Search aims to index the entire web.
• Excite is a medium-sized index but uses concept searching.
• Companies can pay money to GoTo to be placed higher in the search results.
• Google is a search engine that makes use of link popularity to rank web sites.
• Yahoo is the largest human-compiled directory of the web; it employs 150 editors.
• Specialized search engines: Biofinder, www.biologie.de, BioHunt, Pasteur NetBook
Multiple search engines (metasearch engines) query several other search engines in parallel
• Examples: Metacrawler, DogPile, MetaFind, Cyber 411, Savvysearch
cuiwww.unige.ch/meta-index.html
www.monash.com/spidap4.html
www.library.carleton.edu/staff/terry/websearch/

Do not rely on just one search engine
Engine        TMHMM    Krogh      TMHMM & Krogh   HUSAR     Venter
Yahoo         15       2/0        4               5/0       4/0
FAST          99       23870/1    23              3965/0    56000/>1
Altavista     41       19186/?    19257/15        3946/>1   26000/>1
Excite        40/30    90/2       50/2            60/2      >150/3
Metacrawler   20       44/1       11              21/1      30/3
Total hits: blue. Relevant hits: red.
Search engines may employ different AND / OR rules
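Why AND / OR rules matter can be shown with a toy example. The sketch below is not how any of the engines above works internally; the page texts, the `PAGES` dictionary and the `search` function are invented for illustration. It only shows that the same two-word query returns different hit counts depending on whether the words are combined with AND or with OR.

```python
import re

# Invented mini "index": page name -> page text (illustration only).
PAGES = {
    "page1": "TMHMM predicts transmembrane helices (Krogh et al.)",
    "page2": "Anders Krogh: hidden Markov models in biology",
    "page3": "TMHMM server usage notes",
    "page4": "HUSAR sequence analysis package",
}


def search(query, pages, rule="AND"):
    """Return the sorted names of pages matching the query words.

    rule="AND": every query word must occur on the page.
    rule="OR":  at least one query word must occur on the page.
    """
    words = query.lower().split()
    combine = all if rule == "AND" else any
    hits = []
    for name, text in pages.items():
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        if combine(word in tokens for word in words):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print("AND:", search("TMHMM Krogh", PAGES, "AND"))
    print("OR: ", search("TMHMM Krogh", PAGES, "OR"))
```

With these invented pages, the AND rule finds only the page mentioning both words, while the OR rule also returns pages mentioning just one of them. An engine that silently applies OR therefore tends to report many more hits for a combined query.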
Comparison of search engines
[Figure: chart comparing NorthernLight, AltaVista, Excite, Inktomi, Google, InfoSeek, Lycos, Yahoo, Microsoft and Netscape]
Source: http://searchenginewatch.com
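The metasearch idea (one query sent to several search engines in parallel, results merged) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of Metacrawler or any other service: the three `engine_*` functions are hypothetical stand-ins that return fixed example URLs instead of contacting a real search engine.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real engines: each returns a ranked URL list.
def engine_a(query):
    return ["http://example.org/a1", "http://example.org/shared"]


def engine_b(query):
    return ["http://example.org/shared", "http://example.org/b1"]


def engine_c(query):
    return ["http://example.org/c1"]


def metasearch(query, engines):
    """Send the query to all engines in parallel and merge the results.

    Results are concatenated in engine order; a URL already returned
    by an earlier engine is dropped.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    for url in metasearch("TMHMM", [engine_a, engine_b, engine_c]):
        print(url)
```

The merge step here is deliberately simple. A real metasearch service also has to translate the query into each engine's own syntax, which is one reason why hit counts differ between engines.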
Usenet Biosci Newsgroups
Object: to organize discussions on a large variety of topics
Advantages
• simple to complex questions
• resources are scientists
• Netscape and newslist format
Disadvantages
• traffic can be too high or too low
• resources are scientists
• spam!
Usenet Biosci Newsgroups
• Access over a newsreader (e.g. Pine) is also very convenient.
• Mailing lists or reading by Deja Vu is also possible.
• Instructions on how to install Usenet newsgroups are provided by www.bio.net
Useful link collections
• Many, many links for molecular biology. Internet problem: last update 1996
• Many links to a wide variety of databases.
• Many, many links. Highly scientific
EMBL / EBI
Keywords: molecular biology
• Home of databases EMBL, TREMBL and Swissprot
• Mitochondrial database
• Ligand/Receptor database
• Home of European Drosophila Genome Project and Flybase
• Original home of SRS
• Macromolecule structure
• Large array of downloadable software

EMBL / EBI
• Proteomics
• Regular newsletter about the EBI and Bioinformatics
• http://industry.ebi.ac.uk/
  Datamining, ESTs, gene prediction, Java, microarrays, sequence analysis, visualisation and Web technology.
• Database of databases
EBI's Biocatalog

http://www.sanger.ac.uk
Keywords: large scale sequencing and analysis
This includes major databases and analysis software
• Home of Pfam (protein families database of alignments and HMMs)
• Home of AceDB (management of genome project data)
• and EMBOSS (the European Molecular Biology Open Software Suite)
• An abundance of tools, e.g. Victor Solovyev's gene prediction software
NCBI
Keyword: major US site for sequence analysis
• ENTREZ search and retrieval system
• Pubmed
• Home of BLAST
• Home of GENBANK
• Unigene database
• COGs (Clusters of Orthologous Groups)
• dbSNP, the Single Nucleotide Polymorphisms database
• 600+ genome maps
• Tools such as ORF Finder and e-PCR

NIH
Keywords: research, funding, USA science politics
• 25 separate institutes
• Huge amount of data
NIH
• Local search engine
• Still difficult to get to relevant data

The Institute for Genomic Research
Keyword: genome projects
http://www.tigr.org/tdb/
Genome databases, e.g.:
• >20 microbes
• Parasites: Trypanosoma brucei and Plasmodium falciparum
• Human
• Arabidopsis
Abundant software, including:
• A system for finding genes in microbial DNA
• MUMmer for aligning whole genome sequences
• Sequence clean-up program
Institut Pasteur: Bio Netbook
• The Bio Netbook is a search engine especially designed for biologists
• Its index contains only biological expressions (2945)
• Growing database
• The homepage www.pasteur.fr contains a large amount of additional information

GenomeNet
Keywords: metabolic pathways / proteomics / metabolomics
KEGG
• Metabolic pathways
• Regulatory pathways
• Disease Catalogs, Cell Catalogs
• Molecule Catalogs; compounds and enzymes
• Gene Catalogs
• Genome Maps
• Gene Expression Profiles
• Computational Tools
• Links to other pathway and compound sites
GenomeNet / KEGG
Regulatory Pathways
Gene Expression Profiles
• Still preliminary character
• Clickable signals allow identification of enzymes
Metabolic Pathways
• Graphical pathway maps and ortholog group tables
• Maps are fully interactive
More about proteomics
Gene Expression Profiles
• Still preliminary character
• http://bodymap.ims.u-tokyo.ac.jp

ExPASy
Keywords: proteins / proteomics / applications
• Many applications
• Largest collection of biology links on the WWW (few outdated)
• High quality search engine for biologists
ExPASy
• SWISS-MODEL, an automated comparative protein modelling server
• Swiss-PdbViewer is an application that provides a user-friendly interface for analysing several proteins at the same time.
• Software for 2D analysis
PubCrawler
• It goes to the library. You go to the pub.
• Automatic system which searches PubMed or other databases as often as you want with your keywords or sequences
• Similar systems exist as well; links are indicated on the PubCrawler homepage

MIPS
Keyword: proteins and more...
• Databases of proteins (Protfam), RNAs, mitochondrial sequences
• Genome projects of human, yeast and Arabidopsis
• Pathways, Proteomics
• Yeast ORFs and genes
• Small but comprehensive link list
• An alert utility sends you once per week, via email, new database entries related to your field of study.
• ORPHEUS is a software system for gene prediction in complete bacterial genomes and large genomic fragments.
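The idea behind an automatic literature alert (repeat the same keyword search at regular intervals and look only at recent entries) can be sketched as follows. This is not PubCrawler's own code. It assumes NCBI's public E-utilities `esearch` interface, which is not mentioned on the slides; the function name `weekly_pubmed_query` and the example keywords are hypothetical. The sketch only builds the query URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint (not named in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def weekly_pubmed_query(keywords, days=7):
    """Build an ESearch URL for PubMed entries added in the last `days` days.

    All keywords are combined with AND, so every keyword must match.
    """
    params = {
        "db": "pubmed",                  # database to search
        "term": " AND ".join(keywords),  # combined search term
        "reldate": days,                 # only entries from the last N days
        "datetype": "edat",              # use the Entrez entry date
        "retmode": "json",               # ask for a JSON reply
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(weekly_pubmed_query(["baldness", "hair"]))
```

Running such a query once per week from a scheduled job, and mailing the returned identifiers, would give a very basic alert of the kind described above.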
IMB Jena
Keywords: biotech and molecular biology
• Many useful links, up to date
• Tools, databases, services

HUSAR
Keyword: sequence analysis
• Sequence Retrieval System
• GDB
• OMIM
• AceDB
• Genecards is mirrored at the DKFZ
• FAQ, Bioinformatics information
• Link list (>200 links)
• Several free tools (e.g. Genscan)
• ... and the HUSAR package
Sequence analysis: Internet vs. HUSAR
                                      The Internet       HUSAR
Number of applications                many               >250
Up-to-date                            few                all
Comprehensive                         no                 yes
Databases                             many               >90
Speed                                 low                high
Security                              low                high
Data storage                          none               40 MB
Handling of multiple or large files   bad (copy/paste)   good
Batch job utilities                   mostly absent      yes
User support                          low                high
Development of customized tools       no                 yes
Training                              low                high
Bug removal                           slow               fast
Costs                                 none               low
Conclusion
The internet contains abundant information; the important thing is to use clever strategies to find it.
You can always contact us at:
genome @ dkfz-heidelberg.de
if you have difficulty locating the information that you need.