WHAT IS BIOINFORMATICS?
ORIGIN OF BIOINFORMATICS
AIMS OF BIOINFORMATICS
NEEDS OF BIOINFORMATICS
POTENTIAL OF BIOINFORMATICS
LEVELS OF BIOINFORMATICS
TOOLS OF BIOINFORMATICS
BIOINFORMATICS IN INDIA
APPLICATION PROGRAMMES IN BIOINFORMATICS
BIOINFORMATICS AND ITS SCOPE
In the last few decades, advances in molecular biology and the
equipment available for research in this field have allowed the
increasingly rapid sequencing of large portions of the genomes of
several species. In fact, to date, several bacterial genomes, as
well as those of some simple eukaryotes (e.g., Saccharomyces
cerevisiae, or baker's yeast), have been sequenced in full. The
Human Genome Project, designed to sequence all 24 of the human
chromosomes, is also progressing. Popular sequence databases,
such as GenBank and EMBL, have been growing at exponential
rates. This deluge of information has necessitated the careful
storage, organization and indexing of sequence information.
Information science has been applied to biology to produce the
field called Bioinformatics.
WHAT IS BIOINFORMATICS?
Bioinformatics is the application of
computer technology to the
management of biological information.
Computers are used to gather, store,
analyze and integrate biological and
genetic information which can then be
applied to gene-based drug discovery and development.
The most pressing tasks in bioinformatics involve the
analysis of sequence information.
Computational Biology is the name given
to this process, and it involves the following:
• Finding the genes in the DNA sequences of various organisms.
• Developing methods to predict the structure and/or
function of newly discovered proteins and structural RNA sequences.
• Clustering protein sequences into families of related
sequences and the development of protein models.
• Aligning similar proteins and generating
phylogenetic trees to examine evolutionary relationships.
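As a rough illustration of the first task, finding genes in DNA, the sketch below scans a sequence for open reading frames (a start codon followed in the same frame by a stop codon). It is a deliberately naive toy in Python: the function name `find_orfs` and the `min_codons` parameter are assumptions made for this example, only the forward strand is scanned, and real gene finders use far richer statistical models.

```python
# Naive open-reading-frame (ORF) scan on the forward strand only.
# Illustrative sketch; the names and the min_codons threshold are assumptions.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ATG...stop stretches.

    All three forward reading frames are scanned. `end` is exclusive and
    includes the stop codon, so dna[start:end] is the whole ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Made-up example sequence: one short ORF, ATG AAA TAG, starting at index 2.
example = "CCATGAAATAGCC"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

Keeping the scan per frame makes the logic easy to follow; a practical tool would also examine the reverse complement and score candidates rather than accept every start/stop pair.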
ORIGIN OF BIOINFORMATICS
Over a century ago, the history of bioinformatics started with
an Austrian monk named Gregor Mendel. He is known
as the "Father of Genetics". He cross-fertilized
different colors of the same species of flowers. He
kept careful records of the colors of flowers that he
cross-fertilized and the color(s) of flowers they
produced. Mendel illustrated that the inheritance of
traits could be more easily explained if it was
controlled by factors passed down from generation to
generation.
Since Mendel, bioinformatics and genetic record
keeping have come a long way.
AIMS OF BIOINFORMATICS
The aims of bioinformatics are basically three-fold. They are:
• Organization of data in such a way that it allows researchers to
access existing information and to submit new entries as they are
produced. While data-creation is an essential task, the information
stored in these databases is useless unless analyzed. Thus the
purpose of bioinformatics extends well beyond mere volume.
• Development of tools and resources that help in the analysis of data.
For example, having sequenced a particular protein, it is of interest
to compare it with previously characterized sequences. This requires
more than just a straightforward database search. As such, programs
such as FASTA and PSI-BLAST must consider what constitutes a
biologically significant resemblance. Development of such resources
requires extensive knowledge of computational theory, as well as a
thorough understanding of biology.
• Use of these tools to analyze individual systems in detail,
and frequently to compare them with a few that are related.
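To illustrate why sequence comparison needs more than a plain text search, here is a minimal Python sketch of local alignment scoring in the style of the classical Smith-Waterman algorithm. This is not how FASTA or PSI-BLAST work internally (both rely on faster heuristics and statistical significance estimates); the scoring values `match=2`, `mismatch=-1` and `gap=-2` are arbitrary assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Classic dynamic programming: each cell holds the best score of any
    local alignment ending at that pair of positions, floored at zero.
    Only two rows of the table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Two made-up sequences sharing the stretch "ACGT" in different contexts.
print(local_alignment_score("GGACGTGG", "TTACGTTT"))
```

An exact substring search would report no match between two such sequences once a single residue differs, whereas a scored alignment still ranks them by how much they share. Deciding whether a given score is biologically significant is a separate statistical question that this sketch does not address.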
NEEDS OF BIOINFORMATICS
The need for Bioinformatics capabilities has been
precipitated by the explosion of publicly available
genomic information resulting from the
Human Genome Project.
The goal of this project - determination of
the sequence of the entire human genome
(approximately three billion base pairs) - will be
reached by the year 2002.
• Whole Genome Analyses and Sequences.
• Experimental Analyses involving thousands of Genes.
• DNA Chips and Array Analyses: Expression Arrays.
• Comparative Analyses between Species and Strains.
• Proteomics: 'Proteome' of an Organism.
• Medical applications: Genetic Disease, and Biotech Industry.
POTENTIAL OF BIOINFORMATICS
The potential of Bioinformatics in the identification of useful genes,
leading to the development of new gene products, drug discovery and
drug development, has led to a paradigm shift in biology and
biotechnology: these fields are becoming more and more
computationally intensive.
The new paradigm, now emerging, is that all the genes will
be known "in the sense of being resident in database available
electronically", and the starting point of biological investigation will
be theoretical. A scientist will begin with a theoretical conjecture,
only then turning to experiment to follow or test the hypothesis.
THREE LEVELS OF BIOINFORMATICS
1- Analysis of a single gene (protein) sequence. For example:
• Similarity with other known genes.
• Phylogenetic trees; evolutionary relationships.
2- Analysis of complete genomes. For example:
• Which gene families are present, which are missing?
• Location of genes on the chromosomes; correlation with
function or evolution.
3- Analysis of genes and genomes with respect to
functional data. For example:
• Expression analysis; microarray data; mRNA concentration.
• Identification of essential genes, or genes involved in
A biological database is a large, organized body of
persistent data, usually associated with computerized
software designed to update, query, and retrieve
components of the data stored within the system.
A simple database might be a single file containing
many records, each of which includes the same set of
information. For example, a record associated with a
nucleotide sequence database typically contains
information such as contact name; the input sequence
with a description of the type of molecule; the scientific
name of the source organism from which it was isolated;
and, often, literature citations associated with the sequence.
For researchers to benefit from the data stored in a
database, two additional requirements must be met:
• Easy access to the information; and
• A method for extracting only that information needed
to answer a specific biological question.
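The record structure and the two requirements described above can be sketched in a few lines of Python. This is only a toy in-memory model, not the format of GenBank or any real database: the class `SequenceRecord`, its field names, the `query` helper and all example values are assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """One record; every record carries the same set of fields."""
    contact: str
    sequence: str
    molecule_type: str
    organism: str
    citations: List[str] = field(default_factory=list)


def query(records, **criteria):
    """Return only the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]


# Made-up records standing in for a "single file containing many records".
records = [
    SequenceRecord("A. Researcher", "ATGGCCTAA", "DNA",
                   "Saccharomyces cerevisiae", ["Hypothetical citation 1"]),
    SequenceRecord("B. Scientist", "AUGGCCUAA", "mRNA", "Homo sapiens"),
    SequenceRecord("C. Student", "ATGAAATAG", "DNA", "Homo sapiens"),
]

# Extract only the information needed for one specific question.
human_dna = query(records, organism="Homo sapiens", molecule_type="DNA")
print([r.contact for r in human_dna])
```

Real sequence databases add indexing, persistence and dedicated query interfaces on top of this idea, but the principle of uniform records plus selective retrieval is the same.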
Currently, a lot of bioinformatics work is concerned
with the technology of databases. These databases
include both "public" repositories of gene data like
GenBank or the Protein DataBank (the PDB), and
private databases like those used by research groups
involved in gene mapping projects or those held by
biotech companies.
A few popular databases are GenBank from NCBI
(National Center for Biotechnology Information),
SwissProt from the Swiss Institute of
Bioinformatics and PIR from the Protein
Information Resource.
TOOLS OF BIOINFORMATICS
There are both standard and customized products to meet the
requirements of particular projects. There are data-mining tools
that retrieve data from genomic sequence databases and also
visualization tools to analyze and retrieve information from
proteomic databases.
Homology and Similarity Tools:
Homologous sequences are sequences that are related by
divergence from a common ancestor. Thus the degree of similarity
between two sequences can be measured, while their homology is a
case of being either true or false. This set of tools can be used to
identify similarities between novel query sequences of unknown
structure and function and database sequences whose structure
and function have been elucidated.
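The distinction between measurable similarity and yes-or-no homology can be made concrete with a small Python sketch. Percent identity is one simple similarity measure; the function name `percent_identity`, the convention that inputs are already aligned with "-" marking gaps, and the choice to count gap columns as mismatches are all assumptions of this example.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of alignment columns holding the same residue.

    Both inputs must already be aligned to equal length, with "-" as the
    gap character. Columns containing a gap never count as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y and x != "-")
    return 100.0 * matches / len(seq_a)


# Made-up aligned sequences: three of four columns are identical.
print(percent_identity("ACGT", "ACGA"))  # 75.0
```

A value such as 75% describes similarity only. Whether two sequences are homologous is an inference about shared ancestry that a high score may support but does not by itself establish.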
JAVA in Bioinformatics:
Since research centers are scattered all around the globe, ranging from
private to academic settings, and a range of hardware and OSs are
used, Java is emerging as a key player in bioinformatics. Physiome
Sciences' computer-based biological simulation technologies and
Bioinformatics Solutions' PatternHunter are two examples of the
adoption of Java in bioinformatics.
Perl in Bioinformatics:
String manipulation, regular expression matching, file parsing and data format
interconversion are the common text-processing tasks performed in
bioinformatics. Perl excels in such tasks and is being used by many
developers. Developers have designed several of their own individual modules
for the purpose, which have become quite popular and are coordinated by the
BioPerl project.
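The text-processing tasks named here (file parsing, regular expression matching, string manipulation) are easy to show in miniature. The sketch below uses Python rather than Perl purely for illustration and relies only on the standard library; the helper names `parse_fasta`, `find_motif` and `reverse_complement` are assumptions of this example, not functions from BioPerl or any other toolkit.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()  # join wrapped sequence lines
    return records


def find_motif(sequence, pattern):
    """Return 0-based start positions of non-overlapping regex matches."""
    return [m.start() for m in re.finditer(pattern, sequence)]


def reverse_complement(dna):
    """Reverse-complement an upper-case DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Made-up FASTA text with two records.
fasta_text = ">seq1 demo\nGAATTCAA\ngaattc\n>seq2\nATGC\n"
seqs = parse_fasta(fasta_text)
print(find_motif(seqs["seq1 demo"], "GAATTC"))
print(reverse_complement(seqs["seq2"]))
```

For real work, an established library is usually a better choice than hand-written parsers, since formats such as FASTA have many variants in practice.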
BioJava:
The BioJava Project is dedicated to providing Java tools for
processing biological data, which include objects for manipulating
sequences, dynamic programming, file parsers and simple statistical
routines.
BioPerl:
The BioPerl project is an international association of developers of
Perl tools for bioinformatics and provides an online resource for
modules, scripts and web links for developers of Perl-based
software for life science research.
BioXML:
A part of the BioPerl project, this is a resource to gather XML
documentation, DTDs and XML-aware tools for biology in one
location.
BioCORBA:
CORBA is a framework for interlanguage support, and the
BioCORBA project is currently implementing a CORBA interface for
BioPerl.
BIOINFORMATICS IN INDIA
IDC studies point out that India will be a potential star in the
bioscience field in the coming years, considering factors
like bio-diversity, human resources, infrastructure facilities and
the government’s initiatives. According to IDC, bioscience includes
pharma, Bio-IT (bioinformatics), agriculture and R&D. IDC has
reported that the pharmaceutical firms and research institutes in
India are looking for cost-effective and high-quality
research, development, and manufacturing of drugs with more
Bioinformatics has emerged out of the inputs from several
different areas such as biology, biochemistry, biophysics, molecular
biology, biostatistics, and computer science.
This sector is the fastest-growing field in the country. The
vertical growth is due to the linkages between IT and
biotechnology, spurred by the Human Genome Project.
Promising start-ups are already present in Bangalore, Hyderabad,
Pune, Chennai, and Delhi. There are over 200 companies functioning
in these places. IT majors such as Intel, IBM, and Wipro are getting into
this segment spurred by the promises in technological
Bioinformatics has evolved into a full-fledged scientific discipline
over the last decade. The definition of Bioinformatics is not
restricted to computational molecular biology and computational
structural biology. It now encompasses fields such as comparative
genomics, structural genomics, transcriptomics, proteomics,
cellomics and metabolic pathway engineering. Developments in
these fields have direct implications for healthcare, medicine,
discovery of next generation drugs, development of agricultural
products, renewable energy, environmental protection etc.
Bioinformatics integrates the advances in the areas of Computer
Science, Information Science and Information Technology to solve
complex problems in Life Sciences.
The core data comprises the genomes and proteomes of
human and other organisms, 3-D structures and functions of
proteins, microarray data, metabolic pathways, cell lines &
hybridoma, biodiversity etc.
Bioinformatics has a key role to play in the cutting edge
Research & Development areas such as functional genomics,
proteomics, protein engineering, pharmacogenomics, discovery of
new drugs and vaccines, molecular diagnostic kits, agro-
It has now been universally recognized that Bioinformatics is the
key to the new grand data-intensive molecular biology that will
take us into the 21st century.
A Bioinformatician must acquire/possess expertise in the
essential multi-disciplinary fields that comprise the core of this new
science. Quality research and education in Bioinformatics are vital
not only to meet the existing challenges but also to set and
accomplish new goals in Life Sciences.