Project Title Creation of facili

              PVPSIT, Vijayawada
             30th September 2010

               Allam Appa Rao

10/12/2010         Allam Appa Rao   1
Socrates taught us, the essence of … is measuring, counting, and weighing together with reasoning from postulates or …

"Computer Science is no more about computers than astronomy is about …"

"Science is what we understand well enough to explain to a computer." --- Knuth

"If you want to understand life, don't think about vibrant, throbbing gels and oozes, think about information technology."
--- Richard Dawkins, Oxford University, The Blind Watchmaker, 1986, Norton, p. 112.


    Over the past few decades, rapid developments in molecular research
    technologies (MRT) and in information technologies (IT) have combined
    to produce a tremendous amount of information related to molecular …
             Bioinformatics: Definition
    Bioinformatics is the application of information technology and
    computer science to the field of molecular biology.
    The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper
    in 1970 for the study of informatic processes in biotic systems.
             Bioinformatics: Applications
    Bioinformatics focuses on developing and applying computationally
    intensive techniques such as pattern recognition, data mining,
    machine learning algorithms, and visualization.

             Bioinformatics: Activities
    Common activities in bioinformatics include mapping and analyzing
    DNA and protein sequences, aligning different DNA and protein
    sequences to compare them, and creating and viewing 3-D models of
    protein structures.
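Aligning two sequences to compare them is usually done with dynamic programming. The sketch below is a minimal global (Needleman-Wunsch) alignment, assuming a toy scoring scheme chosen only for illustration (match +1, mismatch -1, linear gap -1); real tools such as ClustalW or FASTA use substitution matrices and more elaborate gap penalties, so their scores differ.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
print(score)   # 0 for this pair under the toy scoring scheme
print(top)
print(bottom)
```

When several alignments tie for the best score, the traceback order (diagonal, then up, then left) decides which one is reported.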
             DM: Definition

    Diabetes Mellitus is a condition in which the body either does not
    produce enough insulin, or does not properly respond to it; insulin
    is a hormone produced in the pancreas.
             Bioinformatics: Entailment
    Bioinformatics entails the creation
    and advancement of databases,
    algorithms, computational and
    statistical techniques, and theory to
    solve formal and practical problems
    arising from the management and
    analysis of biological data.
             Bioinformatics: Research
    Major research efforts in the field include sequence alignment, gene
    finding, genome assembly, protein structure alignment, protein structure
    prediction, prediction of gene expression and protein-protein
    interactions, genome-wide association studies and the modelling of evolution.
             DM: Epidemic
    Diabetes Mellitus (DM) affects about 10-15% of Indians and is
    assuming epidemic proportions; the development of newer, robust
    therapeutic approaches for both its prevention and its treatment
    is therefore needed.
             Our Work: BDNF
• BDNF is a natural compound present in the human body; hence
  therapeutics developed using this molecule are expected to have
  fewer side effects.
• If BDNF can indeed prevent and ameliorate DM, it would pave the way
  to develop newer therapeutic approaches.
• BDNF as a therapeutic tool for the prevention and treatment of DM.

                             Protein folding?
The ability of a protein to fold reliably into a predetermined conformation, despite a near-infinite
number of possibilities, is still poorly understood, even after much research.

The structure of a protein is determined purely by the amino acid sequence, and the structure of
the protein determines the function.

The function of a protein depends entirely on the ability of the protein to fold rapidly and
reliably to its native structure. Many proteins fold spontaneously into their native structure in
aqueous solution.

It has been suggested that for a protein of 100 amino acids, a purely random conformational
search would require around 10^29 years, and yet proteins are able to fold on a timescale of
milliseconds to seconds.

This suggests that only a small amount of conformational space is sampled during the folding
process and this in turn implies the existence of kinetic folding pathways.

This paradox of how proteins fold rapidly and reliably to their native conformation is known as
the protein folding problem.
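The order of magnitude of the random-search estimate can be reproduced with a back-of-the-envelope calculation. The two inputs below are our assumptions, not values given in the talk: each of the 100 residues can adopt 3 conformations, and 10^11 conformations are sampled per second.

```python
import math

RESIDUES = 100             # protein length used in the text
STATES_PER_RESIDUE = 3     # assumption: 3 backbone conformations per residue
SAMPLES_PER_SECOND = 1e11  # assumption: conformations tried per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

conformations = STATES_PER_RESIDUE ** RESIDUES   # exact integer, roughly 5.2e47
years = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR
print(f"{conformations:.2e} conformations, about 10^{math.log10(years):.1f} years")
```

With these assumptions the search takes roughly 1.6 x 10^29 years. A different sampling rate shifts the exponent, but any plausible value still leaves an exhaustive search astronomically slower than the observed millisecond-to-second folding times, which is the point of the paradox.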
      Statistics is the essence of making sense

             George Gamow (1904-68)

    Application of Shannon's information theory breaks genetics and
    molecular biology out of the descriptive mode into the
    quantitative mode.

               Shannon - Information Flow
  Information flow, in an information-theoretic context, is the transfer of
   information from a variable h to a variable l in a given process (here, from gene to protein).
  The measure of information flow, P, is defined as the uncertainty before the process started minus the
   uncertainty after the process terminated. This can be quantified as

   $$P = H(h \mid l) - H(h \mid l')$$

   where H(h | l) is the conditional entropy (equivocation) of variable h
   (before the process started) given the variable l (before the process
   started), and H(h | l') is the conditional entropy (equivocation) of variable
   h (before the process started) given the variable l' (the value of variable l
   after the process finished).
  H(X,Y) is the joint entropy, and can be calculated as follows:

   $$H(X,Y) = -\sum_{x}\sum_{y} p(x,y)\,\log_2 p(x,y)$$

   The conditional entropies then follow as H(X | Y) = H(X,Y) - H(Y).
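To make the definition of the information-flow measure P = H(h|l) - H(h|l') concrete, the sketch below computes joint and conditional entropies (in bits) from small joint probability tables. The two tables are hypothetical toy examples, not data from the talk: before the process, l carries no information about a fair bit h; afterwards, l' is an exact copy of h.

```python
import math
from collections import defaultdict


def joint_entropy(p_xy):
    """H(X,Y) = -sum p(x,y) log2 p(x,y), in bits; p_xy maps (x, y) -> probability."""
    return -sum(p * math.log2(p) for p in p_xy.values() if p > 0)


def marginal_y(p_xy):
    """Sum the joint distribution over x to get p(y)."""
    p_y = defaultdict(float)
    for (_, y), p in p_xy.items():
        p_y[y] += p
    return p_y


def conditional_entropy(p_xy):
    """H(X|Y) = H(X,Y) - H(Y)."""
    h_y = -sum(p * math.log2(p) for p in marginal_y(p_xy).values() if p > 0)
    return joint_entropy(p_xy) - h_y


def information_flow(p_h_l, p_h_lprime):
    """P = H(h|l) - H(h|l'): uncertainty about h before minus after the process."""
    return conditional_entropy(p_h_l) - conditional_entropy(p_h_lprime)


# Hypothetical toy tables, keys are (h, l) pairs
before = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # l independent of h
after = {(0, 0): 0.5, (1, 1): 0.5}                                  # l' copies h exactly
print(information_flow(before, after))  # 1.0 bit flows from h to l'
```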

             Information in living organisms

One of the prime characteristics of all living organisms is the information
they contain for all operational processes.

Braitenberg, a German cybernetist, has submitted evidence 'that information is
an intrinsic part of the essential nature of life.' The transmission of
information plays a fundamental role in everything that lives.
•       Without a doubt, the most complex information processing system in existence
        is the human body. If we take all human information processes together, that is,
        conscious ones (language) and unconscious ones (information-controlled functions
        of the organs, hormone system), this involves the processing of 10^24 bits daily.
•       This astronomically high figure is higher by a factor of 1,000,000 than the total
        human knowledge of 10^18 bits stored in all the world's libraries.

                             Alan Turing and
             Gatlinburg symposium on information theory in biology

The logic of Turing machines has … with the logic of the genetic information system:

  • Information source: DNA
  • Transmission of information: through mRNA, tRNA and rRNA
  • Tasks to be completed: transcription, translation
  • Output: protein(s)

                              Information Theory, Evolution,
                              and the Origin of Life
                              Hubert P. Yockey, pp 35

  Shannon, Turing, Gamow and Rao

             H (gene)     L (protein)


The word "information" derives from the Latin informare, which means
"to put into form".

  • Latent (DNA): existing or present, but concealed or inactive
  • Manifest (protein): readily seen or understood; apparent, clear,
    evident, noticeable, observable

                     How does the code work?
Inherited disease: broken/damaged DNA → broken/damaged proteins
Viral disease: DNA/RNA → foreign proteins (akin to a computer virus)

 • Template for construction of proteins

              Latent information → manifested

             Genomic Information
               What good is all this genetic information?

             Information Inheritance

    • Human beings are endowed with the information encoded in the
      genetic material of inheritance, which controls development,
      reproduction and self-repair.

    • The carrier of this information is DNA, a complex molecular structure.

             Anything Computable!
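• Markov defined what became known as Markov algorithms, a
  string-rewriting model of computation (hidden Markov models,
  HMMs, are a separate probabilistic tool widely used in
  biological computation)
• Alonzo Church used the lambda calculus for computing
• Kurt Gödel defined recursive functions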
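A Markov algorithm is an ordered list of string-rewriting rules: at each step the first rule whose left-hand side occurs in the word rewrites its leftmost occurrence, and the run stops when no rule applies or a terminal rule fires. The interpreter below is a minimal sketch of that model; the rule set is the textbook binary-to-unary example, chosen by us for illustration and unrelated to any biological data.

```python
def run_markov(rules, word, max_steps=10_000):
    """Run an ordered list of (pattern, replacement, is_terminal) rules on `word`."""
    for _ in range(max_steps):
        for pattern, replacement, terminal in rules:
            if pattern in word:
                # Rewrite only the leftmost occurrence, then restart from the first rule
                word = word.replace(pattern, replacement, 1)
                if terminal:
                    return word
                break
        else:
            return word  # no rule applied: the algorithm halts
    raise RuntimeError("step limit reached")


# Converts a binary numeral into unary notation (one '|' per unit)
BINARY_TO_UNARY = [("|0", "0||", False), ("1", "0|", False), ("0", "", False)]
print(run_markov(BINARY_TO_UNARY, "101"))  # |||||  (binary 101 = 5)
```

The `max_steps` guard is our addition: a Markov algorithm, like any universal model of computation, need not halt.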

             Genetic Information Flow
• Within the body, genetic information flows from DNA to protein and
  other products:
   – first, by the transcription of portions of the DNA into so-called
     messenger RNA, and
   – second, by translation, the assembly of individual amino acids
     into polypeptides, including proteins.

  This flow of information is central to life and living.
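The two steps can be sketched in a few lines using the standard genetic code. This is a simplification under stated assumptions: we start from the DNA coding strand (so transcription is just T → U), ignore splicing, and translate from the first AUG to the first stop codon. The example sequence is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in U, C, A, G order (first base slowest);
# '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand_dna: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTTAA")
print(mrna, translate(mrna))  # AUGGCCUUUUAA MAF
```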

 Paradigm of 'computational thinking'
    Advances in computational power and computational methods have led
    to the crystallization of the paradigm of 'computational thinking'.
    This paradigm takes applications of computer science far beyond mere
    programming and data management.
    It also provides newer methods for understanding complex lifestyle
    diseases such as diabetes.
               The Value of the Right Tool

3.2 Billion Nucleotides          A “Supercomputer” Cluster
       The new Challenges in Computer Science

• Promoter recognition in genomic sequences
• Understanding data from microarray experiments, and
• Accurate prediction of protein folds from sequences

                 Bioinformatics approach to extract
                       information from genes
With the widespread availability of nucleotide and amino acid sequences, novel
methods for extracting biologically and clinically relevant knowledge are feasible.

Data is deposited on the Internet on websites such as GeneCards.

Further information can be obtained from related sites such as UniProt and SwissProt.

Using FASTA and CLUSTAL_X programs, similarity scores can be calculated to choose
items of interest.
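FASTA's speed comes from a first step that looks for short exact word matches (k-tuples, "ktup") between the query and a database sequence and groups them by diagonal, i.e. by the offset between their positions. The sketch below shows only that first step, under our own simplifications; the later FASTA stages (rescoring with a substitution matrix, joining regions, banded alignment) are omitted, and the sequences are hypothetical.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query: str, subject: str, k: int = 2) -> Counter:
    """Count shared k-mers per diagonal (offset = query_pos - subject_pos)."""
    index = defaultdict(list)  # k-mer -> list of start positions in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            diagonals[i - j] += 1
    return diagonals


hits = ktup_diagonals("GATTACAGATTACA", "TTACAG", k=3)
print(hits.most_common(1))  # [(2, 4)]: the subject lines up best at query offset 2
```

A diagonal with many hits marks a region worth aligning in detail; larger `k` makes the lookup faster but less sensitive.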

Further information can be obtained by mining text, either manually or increasingly
using text-mining tools such as PathBinderH and GENIA corpus.
     Computational approach of protein analysis using
     affinity chromatography: application to proteomics
Computerization is necessary to build a database of analytical data from
chromatography coupled with mass spectrometry.

On the basis of these proteomic data, we can identify proteins that serve as
biomarkers because they are expressed differently in healthy and diseased states.

Affinity chromatography is one of the fastest liquid chromatographic
methods for the separation and purification of biomolecules, owing to its high
molecular specificity.

Proteomics and tools for the identification and understanding of the
biochemistry of proteins and pathways are in a new stage of development
and evaluation.
     Development and Validation of LC Method for the
      Determination of Famciclovir in Pharmaceutical
        Formulation Using an Experimental Design
A rapid and sensitive RP-HPLC method with UV detection (242 nm) for routine
analysis of famciclovir in pharmaceutical formulations was developed.

Chromatography was performed with a mobile phase consisting of a mixture of
methanol and phosphate buffer (50:50, v/v) at a flow rate of 1.0 mL min−1.

Quantitation was accomplished with the internal standard method. The procedure
was validated for linearity (correlation coefficient = 0.9999), accuracy, robustness
and intermediate precision.

Experimental design was used for validation of robustness and intermediate
precision. To test robustness, three factors were considered: the percentage (v/v) of
methanol in the mobile phase, flow rate and pH. Flow rate, the percentage of
organic modifier and pH all have a considerable effect on the response.

For intermediate precision, the variables considered were analyst,
equipment and number of days. The RSD value (0.86%, n = 24) indicated
acceptable precision of the analytical method.

The proposed method was simple, sensitive, precise, accurate and quick and
useful for routine quality control.
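The two validation figures quoted for this method, the correlation coefficient for linearity and the relative standard deviation (RSD) for precision, are straightforward to compute. In the sketch below the response is taken as the ratio of analyte to internal-standard peak area, as in an internal standard method; all numbers are hypothetical placeholders for illustration, not data from the study.

```python
import statistics


def relative_standard_deviation(values):
    """RSD (%) = sample standard deviation / mean x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def pearson_r(x, y):
    """Pearson correlation coefficient of a calibration line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values for illustration only (not data from the study)
conc = [10, 20, 30, 40, 50]                       # standard concentrations, arbitrary units
area_ratio = [0.201, 0.399, 0.602, 0.798, 1.001]  # analyte / internal-standard peak area
assays = [99.0, 100.0, 101.0]                     # repeated assay results, % of label claim
print(round(pearson_r(conc, area_ratio), 5), round(relative_standard_deviation(assays), 2))
```

`statistics.stdev` uses the sample (n - 1) standard deviation, which is the usual convention for reporting RSD.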
     Mathematical Analysis of Diabetes Related Proteins
           Having High Sequence Complexity

We searched for proteins affecting diabetes, identified the common species in
which these proteins were most prevalent, and performed protein composition
analysis of those having high sequence complexity.

About 90% of rat genes have counterparts in the mouse and human genomes (Rat
Genome Sequencing Consortium 2004, …), which is why we looked for proteins
common to the three species.

The distribution pattern of the protein variates was examined, and bivariate plots
were then drawn.

The bivariate plots show similar clustering for Rattus norvegicus and Mus
musculus but some variation for Homo sapiens. This is consistent with Rattus
norvegicus and Mus musculus being relatively close in the phylogenetic tree
(Sridhar GR et al.), and hence clustering similarly.

Proteins lying away from the cluster are outliers because they have different
compositional characteristics.
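Protein composition analysis starts from the percentage of each amino acid in a sequence; two such percentages per protein give one point on a bivariate plot. As a simple proxy for sequence complexity, the sketch also computes the Shannon entropy of the residue distribution. Using entropy as the complexity measure is our assumption; the study does not state which measure it used (tools such as SEG use window-based compositional complexity). The two sequences are hypothetical extremes.

```python
import math
from collections import Counter


def composition(seq: str) -> dict:
    """Percentage of each amino acid (one-letter codes) in a protein sequence."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {aa: 100 * n / total for aa, n in sorted(counts.items())}


def shannon_complexity(seq: str) -> float:
    """Shannon entropy of the residue distribution in bits (maximum log2(20), about 4.32)."""
    counts = Counter(seq.upper())
    total = len(seq)
    return sum(n / total * math.log2(total / n) for n in counts.values())


low = "QQQQQQQQQQQQQQQQQQQQ"   # hypothetical low-complexity stretch
high = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical: every residue once
print(shannon_complexity(low), round(shannon_complexity(high), 3))  # 0.0 4.322
```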
    Bioinformatics analysis of diabetic retinopathy using
               functional protein sequences
Diabetic retinopathy is the leading cause of blindness among patients with
diabetes mellitus.

We evaluated the role of several proteins that are likely to be involved in diabetic
retinopathy by performing multiple sequence alignment with the ClustalW tool and
constructing a phylogram using functional protein sequences extracted from …

The phylogram was constructed using the Neighbor-Joining algorithm.
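Neighbor-joining builds a tree from a distance matrix by repeatedly joining the pair of nodes that minimizes the Q-criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i, and then replacing the pair with a new internal node. The sketch below is a minimal plain-Python version of that procedure (no handling of negative branch lengths); the five-taxon matrix is a standard textbook example, not data from this study.

```python
def neighbor_joining(dist):
    """Neighbor-joining on a symmetric dict-of-dicts distance matrix.

    Returns the joins as (node_i, node_j, branch_i, branch_j, new_node),
    followed by the final edge (node_a, node_b, length).
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    joins, counter = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing the Q-criterion (ties broken by node name)
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for x, a in enumerate(nodes) for b in nodes[x + 1:]
        )
        branch_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        counter += 1
        u = f"node{counter}"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):  # distance from the new node to every remaining node
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        joins.append((i, j, branch_i, branch_j, u))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    joins.append((a, b, d[a][b]))
    return joins


D = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
steps = neighbor_joining(D)
for step in steps:
    print(step)  # first join: ('a', 'b', 2.0, 3.0, 'node1')
```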

It was observed that aldose reductase and nitric oxide synthase are closely
associated with diabetic retinopathy.
It is likely that vascular endothelial growth factor, pro-inflammatory cytokines,
advanced glycation end products, and adhesion molecules that also play a role
in diabetic retinopathy may do so by modulating the activities of aldose
reductase and nitric oxide synthase.

These results imply that methods designed to normalize aldose reductase and
nitric oxide synthase activities could be of significant benefit in the prevention
and treatment of diabetic retinopathy.

The role of bioinformatics is to aid life scientists in gathering and processing
genomic data to study protein function.

Another important role is to aid researchers at pharmaceutical companies in
making detailed studies of protein structures to facilitate drug design.

The human genome, with about 3 billion nucleotide bases, has about 30,000
genes whose functions are known to a great extent.

These genes dictate the synthesis of different proteins, which differ
from one another in their amino acid sequence.

The physiological functions of a protein depend upon this sequence.
The functions of the protein butyrylcholinesterase are not known to a great
extent. Therefore, its amino acid sequence was compared with the sequences of 29
different proteins using computational techniques.

Close similarity was observed with the protein EST2_human, which suggests
similar physiological actions.

Findings such as this, obtained from computational techniques that are now
indispensable, help life scientists proceed with their work in the wet laboratory.

Amino acid sequence of BChE is compared with proteins which act as
inhibitors of neovascularisation and similarity is found with one of the
inhibitors. Early onset of diabetic retinopathy is often found in patients who
have insufficient BChE in their serum.

This suggests that BChE may act as an inhibitor of neovascularisation
that causes retinopathy.
Bioinformatics analysis of functional protein sequences
 reveals a role for brain-derived neurotrophic factor in
          obesity and type 2 diabetes mellitus

Using bioinformatics techniques and sequence analysis algorithms, a
comparative study between humans and rodents revealed similarity in the
behavior of genes involved in the control of energy homeostasis.

Brain-derived neurotrophic factor (BDNF) modulates the secretion and
actions of insulin, leptin, ghrelin, various neurotransmitters and peptides, and
pro-inflammatory cytokines involved in energy homeostasis suggesting that it
(BDNF) has a significant role in the pathobiology of obesity and type 2
diabetes mellitus.

Based on this evidence, we propose that obesity and type 2 diabetes could be
disorders of the brain and that BDNF could serve as a biomarker in predicting their …
Hence, methods developed to selectively deliver BDNF to appropriate
hypothalamic neurons may form a novel approach in their treatment.
Bioinformatics Analysis of Functional Protein Sequences
 Reveals a Role for Tumor Necrosis Factor-α and Nitric
         Oxide in Insulin Resistance Syndrome

Using bioinformatics techniques and sequence analysis algorithms, we
found that tumor necrosis factor-α (TNF-α) and nitric oxide (NO) have a
significant role in the pathobiology of insulin resistance syndrome, a condition that
is common in subjects with abdominal obesity, hypertension, dyslipidemia,
atherosclerosis, and coronary heart disease, all of which are accompanied by endothelial
dysfunction due to reduced endothelial nitric oxide generation.

TNF-α has neurotoxic actions, stimulates inducible NO synthase activity, and
modulates the expression of neurotransmitters involved in the control of feeding
and thermogenesis.

NO is a neurotransmitter and influences secretion and actions of various
hypothalamic peptides and neuropeptides.

Insulin suppresses the production of TNF-α but stimulates that of endothelial NO.

This close interaction between TNF-α, NO, hypothalamic peptides, and insulin
suggests that regulation of TNF-α and NO production and action could be critical in
the management of insulin resistance syndrome and its associated conditions.
  Phylogenetic Tree Construction of Butyrylcholinesterase
                 Sequences in Life Forms

Butyrylcholinesterase is an enzyme with few known physiological functions. It is
related to acetylcholinesterase, which has been shown to be expressed in a variety of life forms.

We performed a search using the human butyrylcholinesterase gene
(HGNC:983; MIM:177400), and found the sequence in a broad spectrum of life forms,
including plants, bacteria and animals.

Butyrylcholinesterase therefore appears to have arisen early in evolution
and to have been conserved.
   Serum butyrylcholinesterase in type 2 diabetes mellitus: a
   biochemical and bioinformatics approach

Butyrylcholinesterase is an enzyme that may serve as a marker of metabolic
syndrome. We (a) measured its level in persons with diabetes mellitus and (b)
constructed a family tree of the enzyme using nucleotide sequences downloaded
from NCBI. Butyrylcholinesterase was estimated colorimetrically using a
commercially available kit (Randox Lab, UK). Phylogenetic trees were constructed
by a distance method (Fitch and Margoliash method) and by the maximum parsimony
method.

There was a negative correlation between serum total cholesterol
and butyrylcholinesterase (-0.407; p < 0.05) and between serum
LDL cholesterol and butyrylcholinesterase (-0.435; p < 0.05). There
was no statistically significant correlation among the other
biochemical parameters. In the evolutionary tree construction, both
methods gave similar trees, except for an inversion in the position of
Sus scrofa (M62778) and Oryctolagus cuniculus (M62779) between the
Fitch and Margoliash and the maximum parsimony methods.

The level of butyrylcholinesterase enzyme was inversely related to
serum cholesterol; the dendrogram showed that the structures from
evolutionarily close species were placed near each other.
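Maximum-parsimony methods prefer the tree that needs the fewest character changes. For a fixed rooted binary tree, that count can be obtained site by site with Fitch's small-parsimony algorithm, sketched below; the search over tree topologies that a full maximum-parsimony program performs, and the Fitch-Margoliash distance method, are not shown. The four-taxon alignment and tree shapes are hypothetical.

```python
def fitch_site(tree, states):
    """Fitch small parsimony for one alignment column.

    `tree` is a leaf name (str) or a pair (left, right); `states` maps leaf -> character.
    Returns (candidate_state_set, number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one substitution
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total Fitch changes over all columns of an alignment {taxon: sequence}."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        states = {taxon: seq[col] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


alignment = {"t1": "ACGT", "t2": "ACGA", "t3": "GCTA", "t4": "GCTA"}
print(parsimony_score((("t1", "t2"), ("t3", "t4")), alignment))  # 3
print(parsimony_score((("t1", "t3"), ("t2", "t4")), alignment))  # 5
```

Comparing scores across candidate topologies is the essence of the method: here the first tree, with the lower score, is the more parsimonious one.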
        Alzheimer's disease and Type 2 diabetes mellitus: the
                     cholinesterase connection?
Alzheimer's disease and type 2 diabetes mellitus tend to occur together.

We sought to identify protein(s) common to both conditions that could suggest a
possible unifying pathogenic role. Using human neuronal butyrylcholinesterase
(AAH08396.1) as the reference protein, we used the BLAST tool for protein-to-protein
comparison in humans. We found three groups of sequences among a series of
12, with an E-value between 0–12, common to both Alzheimer's disease and
diabetes: butyrylcholinesterase precursor K allele (NP_000046.1),
acetylcholinesterase isoform E4-E6 precursor (NP_000656.1), and apoptosis-related
acetylcholinesterase (1B41|A).
Butyrylcholinesterase and acetylcholinesterase related proteins were found
common to both Alzheimer's disease and diabetes; they may play an etiological
role via influencing insulin resistance and lipid metabolism.
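The E-value reported by BLAST is the number of alignments at least as good as the observed one that would be expected by chance in the search. Under Karlin-Altschul statistics it is E = K m n e^(-λS), where m and n are the (effective) query and database lengths and S is the raw score; equivalently E = m n 2^(-S'), with bit score S' = (λS - ln K) / ln 2. The λ and K values and lengths below are illustrative assumptions only (real values depend on the scoring matrix and gap costs, and BLAST also corrects the lengths for edge effects), so the numbers are not comparable with those in the study.

```python
import math


def karlin_altschul_evalue(raw_score, m, n, lam, k):
    """Expected number of chance hits with score >= raw_score: E = K m n exp(-lambda S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda S - ln K) / ln 2, so that E = m n 2^(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative assumptions: 600-residue query, 1e8-residue database
LAMBDA, K = 0.267, 0.041
for s in (50, 100, 200):
    print(s, f"{karlin_altschul_evalue(s, 600, 1e8, LAMBDA, K):.2e}")
```

Because E falls exponentially with S, doubling the raw score shrinks the E-value by many orders of magnitude, which is why very small E-values are reported for strong homologs.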