Bioinformatics



In the last few decades, advances in molecular biology and the
equipment available for research in this field have allowed the
 increasingly rapid sequencing of large portions of the genomes of
 several species. In fact, to date, several bacterial genomes, as well
 as those of some simple eukaryotes (e.g., Saccharomyces
 cerevisiae, or baker's yeast) have been sequenced in full. The
Human Genome Project, designed to sequence all 24 of the human
 chromosomes, is also progressing. Popular sequence databases,
 such as GenBank and EMBL, have been growing at exponential
 rates. This deluge of information has necessitated the careful
 storage, organization and indexing of sequence information.
 Information science has been applied to biology to produce the
field called Bioinformatics.
Bioinformatics is the application of
computer technology to the management
of biological information. Computers are
used to gather, store, analyze and
integrate biological and genetic
information, which can then be applied to
gene-based drug discovery and development.
The most pressing tasks in bioinformatics involve the
analysis of sequence information.

Computational Biology is the name given
to this process, and it involves the following:

Finding the genes in the DNA sequences of various organisms.

Developing methods to predict the structure and/or
function of newly discovered proteins and structural RNA.

Clustering protein sequences into families of related
sequences and developing protein models.

Aligning similar proteins and generating phylogenetic
trees to examine evolutionary relationships.
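As a rough illustration of the first task, gene finding is often introduced through open reading frame (ORF) scanning. The sketch below is a deliberately simplified assumption, not a real gene finder: it scans only the three forward reading frames of a DNA string for an ATG start codon followed in the same frame by a stop codon. The function name find_orfs and the min_codons parameter are hypothetical and chosen for this example only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) tuples for simple forward-strand ORFs.

    An ORF here is an ATG codon followed, in the same reading frame, by the
    first stop codon. Coordinates are 0-based and end-exclusive, and the
    returned sequence includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs)
```

Practical gene finders also consider the reverse strand, introns, and statistical models of coding sequence, none of which is attempted here.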
The history of bioinformatics began over a century ago with
an Austrian monk named Gregor Mendel, known as the
"Father of Genetics". He cross-fertilized plants of the
same species that bore flowers of different colors, and kept
careful records of the colors of the flowers he crossed
and the color(s) of the flowers they produced.
Mendel illustrated that the inheritance of traits could be
more easily explained if it was controlled by factors
passed down from generation to generation.

Since Mendel, bioinformatics and genetic record keeping
have come a long way.
The aims of bioinformatics are basically three-fold. They are:

Organization of data in such a way that it allows researchers to
access existing information and to submit new entries as they are
produced. While data curation is an essential task, the information
stored in these databases is useless unless analyzed. Thus the
purpose of bioinformatics extends well beyond mere volume control.

Development of tools and resources that help in the analysis of data.
For example, having sequenced a particular protein, it is of interest
to compare it with previously characterized sequences. This requires
more than just a straightforward database search. As such, programs
such as FASTA and PSI-BLAST must consider what constitutes a
biologically significant resemblance. Development of such resources
requires extensive knowledge of computational theory, as well as a
thorough understanding of biology.

Use of these tools to analyze individual systems in detail, and
frequently to compare them with a few related systems.
The need for bioinformatics capabilities has been
precipitated by the explosion of publicly available
genomic information resulting from the
Human Genome Project.

The goal of this project, determination of
the sequence of the entire human genome
(approximately three billion base pairs), will be
reached by the year 2002.
Whole genome analyses and sequences.

Experimental analyses involving thousands of genes.

DNA chips and array analyses: expression arrays,
comparative analyses between species and strains.

Proteomics: the 'proteome' of an organism.

Medical applications: genetic disease, and the pharmaceutical
and biotech industry.

Forensic applications.

Agricultural applications.
               POTENTIAL OF BIOINFORMATICS
The potential of bioinformatics in the identification of useful genes,
leading to the development of new gene products, drug discovery and
drug development, has led to a paradigm shift in biology and
biotechnology: these fields are becoming more and more computationally
intensive.

The new paradigm, now emerging, is that all the genes will
be known "in the sense of being resident in database available
electronically", and the starting point of biological investigation will be
theoretical. A scientist will begin with a theoretical conjecture and
only then turn to experiment to follow or test the hypothesis. With a
much deeper understanding of biological processes at the molecular
level, bioinformatics scientists have developed new techniques to
analyze genes on an industrial scale, resulting in a new area of science
known as 'Genomics'.
1- Analysis of a single gene (protein) sequence. For example:

      Similarity with other known genes.

      Phylogenetic trees; evolutionary relationships.

2- Analysis of complete genomes. For example:

      Which gene families are present, which are missing?

      Location of genes on the chromosomes, and correlation with
      function or evolution.

3- Analysis of genes and genomes with respect to
functional data. For example:

      Expression analysis; microarray data; mRNA concentration.

      Identification of essential genes, or genes involved in
      specific processes.
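For the functional-data analyses in the third group, a common first step with expression measurements is to compare two conditions gene by gene. The sketch below assumes expression values are held in plain dictionaries keyed by gene name and computes a log2 fold change with a pseudocount. The function names, the pseudocount and the threshold are illustrative assumptions, not part of any particular microarray package.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Map each shared gene to log2((treated + p) / (control + p))."""
    changes = {}
    for gene in control.keys() & treated.keys():
        ratio = (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        changes[gene] = math.log2(ratio)
    return changes


def changed_genes(changes, threshold=1.0):
    """Return, sorted by name, genes whose |log2 fold change| >= threshold."""
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Made-up expression values, for illustration only.
control = {"geneA": 7, "geneB": 15, "geneC": 3}
treated = {"geneA": 31, "geneB": 15, "geneC": 0}
changes = log2_fold_changes(control, treated)
```

The pseudocount keeps the ratio defined when a gene has a zero reading. Real analyses add normalization and statistical testing on top of a ratio like this.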
A biological database is a large, organized body of
persistent data, usually associated with computerized
software designed to update, query, and retrieve components
of the data stored within the system.

A simple database might be a single file containing many
records, each of which includes the same set of information.
For example, a record associated with a nucleotide sequence
database typically contains information such as contact
name; the input sequence with a description of the type of
molecule; the scientific name of the source organism from
which it was isolated; and, often, literature citations
associated with the sequence.

For researchers to benefit from the data stored in a
database, two additional requirements must be met:

   •   Easy access to the information; and
   •   A method for extracting only that information needed
       to answer a specific biological question.
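The record layout and the two requirements above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the schema of GenBank or any real database: the SequenceRecord fields mirror the items listed in the text, while the accession field, the query function and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    accession: str       # hypothetical identifier for the record
    contact: str         # contact name
    molecule_type: str   # description of the type of molecule
    organism: str        # scientific name of the source organism
    sequence: str        # the input sequence
    citations: list = field(default_factory=list)


def query(records, organism=None, molecule_type=None):
    """Return only the records that match every criterion that was given."""
    hits = []
    for rec in records:
        if organism is not None and rec.organism != organism:
            continue
        if molecule_type is not None and rec.molecule_type != molecule_type:
            continue
        hits.append(rec)
    return hits


# Made-up entries, for illustration only.
records = [
    SequenceRecord("DEMO0001", "A. Researcher", "DNA",
                   "Saccharomyces cerevisiae", "ATGGCC"),
    SequenceRecord("DEMO0002", "A. Researcher", "DNA",
                   "Homo sapiens", "ATGAAA"),
    SequenceRecord("DEMO0003", "B. Researcher", "mRNA",
                   "Homo sapiens", "AUGAAA"),
]
human_mrna = query(records, organism="Homo sapiens", molecule_type="mRNA")
```

A production database would add indexing and persistent storage, but the idea of filtering records down to those that answer one question is the same.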
Currently, a lot of bioinformatics work is concerned
  with the technology of databases. These databases
  include both "public" repositories of gene data like
  GenBank or the Protein Data Bank (the PDB), and
  private databases like those used by research groups
  involved in gene mapping projects or those held by
  biotech companies.

A few popular databases are GenBank from NCBI
  (National Center for Biotechnology Information),
  SwissProt from the Swiss Institute of Bioinformatics
  and PIR from the Protein Information Resource.
There are both standard and customized products to meet the
requirements of particular projects. There is data-mining software
that retrieves data from genomic sequence databases, and there are
visualization tools to analyze and retrieve information from
proteomic databases.

Homology and Similarity Tools:

Homologous sequences are sequences that are related by
divergence from a common ancestor. Thus the degree of similarity
between two sequences can be measured, while their homology is a
case of being either true or false. This set of tools can be used to
identify similarities between novel query sequences of unknown
structure and function and database sequences whose structure and
function have been elucidated. An example is BLAST.
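To make "degree of similarity" concrete, the sketch below aligns two sequences with the classic Needleman-Wunsch global alignment and reports percent identity over the alignment. It is an illustration only and is not how BLAST works; BLAST uses a faster heuristic local search. The scoring values (match 1, mismatch -1, gap -1) and the function names are assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")       # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

For example, aligning "ACGT" with "AGT" under these scores gives the alignment ACGT / A-GT with score 2 and 75% identity. Whether such similarity reflects homology is a separate biological judgment.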
Java in Bioinformatics:
Since research centers are scattered all around the globe ranging from
private to academic settings, and a range of hardware and OSs are being
used, Java is emerging as a key player in bioinformatics. Physiome
Sciences' computer-based biological simulation technologies and
Bioinformatics Solutions' PatternHunter are two examples of the growing
adoption of Java in bioinformatics.

Perl in Bioinformatics:
String manipulation, regular expression matching, file parsing and data
format interconversion are the common text-processing tasks
performed in bioinformatics. Perl excels at such tasks and is used
by many developers. Developers have designed several of their own
individual modules for the purpose, which have become quite popular
and are coordinated by the BioPerl project.
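The text-processing tasks named above can be illustrated briefly. The sketch is written in Python rather than Perl, purely as an illustration, and does not use BioPerl or any other library beyond the standard re module. The function names are hypothetical, and the FASTA parser assumes well-formed input with a ">" header line before each sequence.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word is the identifier
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def find_motif(sequence, pattern):
    """Return 0-based start positions of non-overlapping regex matches."""
    return [m.start() for m in re.finditer(pattern, sequence)]


def reverse_complement(dna):
    """Reverse-complement an uppercase DNA string made of A, C, G and T."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Made-up input, for illustration only.
fasta_text = ">seq1 demo\nACGT\nacgt\n>seq2\nGAATTCGG\n"
sequences = parse_fasta(fasta_text)
```

Here find_motif(sequences["seq2"], "GAATTC") locates the pattern at position 0, which shows file parsing and regular expression matching working together.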
The BioJava Project is dedicated to providing Java tools for processing
biological data which includes objects for manipulating sequences,
dynamic programming, file parsers, simple statistical routines, etc.

The BioPerl project is an international association of developers of Perl
tools for bioinformatics and provides an online resource for modules,
scripts and web links for developers of Perl-based software.

A part of the BioPerl project, this is a resource to gather XML
documentation, DTDs and XML aware tools for biology in one location.

CORBA is one such framework for interlanguage support, and the
biocorba project is currently implementing a CORBA interface for
IDC studies point out that India will be a potential star in the
bioscience field in the coming years, considering factors such as
biodiversity, human resources, infrastructure facilities and
government initiatives. According to IDC, bioscience includes pharma,
Bio-IT (bioinformatics), agriculture and R&D. IDC has reported that
pharmaceutical firms and research institutes in India are looking
for cost-effective, high-quality and faster research, development and
manufacturing of drugs.

Bioinformatics has emerged from the inputs of several different
areas such as biology, biochemistry, biophysics, molecular biology,
biostatistics, and computer science.

This sector is the fastest-growing field in the country. The steep
growth is because of the linkages between IT and biotechnology,
spurred by the Human Genome Project. Promising start-ups are
already established in Bangalore, Hyderabad, Pune, Chennai, and Delhi;
over 200 companies operate in these cities. IT majors such as
Intel, IBM and Wipro are entering this segment, drawn by the promise
of technological developments.
       ITS SCOPE
Bioinformatics has evolved into a full-fledged scientific discipline
over the last decade. The definition of bioinformatics is not restricted
to computational molecular biology and computational structural
biology. It now encompasses fields such as comparative genomics,
structural genomics, transcriptomics, proteomics, cellomics and
metabolic pathway engineering. Developments in these fields have
direct implications for healthcare, medicine, discovery of
next-generation drugs, development of agricultural products, renewable
energy, environmental protection etc.

Bioinformatics integrates the advances in the areas of Computer
Science, Information Science and Information Technology to solve
complex problems in Life Sciences.

The core data comprise the genomes and proteomes of humans
and other organisms, 3-D structures and functions of proteins,
microarray data, metabolic pathways, cell lines and hybridomas,
biodiversity etc.
Bioinformatics has a key role to play in cutting-edge
research and development areas such as functional genomics,
proteomics, protein engineering, pharmacogenomics, discovery of new
drugs and vaccines, molecular diagnostic kits, agro-biotechnology etc.

It is now widely recognized that bioinformatics is the key
to the new, data-intensive molecular biology that will take us into
the 21st century.
A Bioinformatician must acquire/possess expertise in the
essential multi-disciplinary fields that comprise the core of this new
science. Quality research and education in Bioinformatics are vital not
only to meet the existing challenges but also to set and accomplish
new goals in Life Sciences.
