gen proteom & bioinformatics by SBMirza


Genomics, Proteomics and Bioinformatics
      BINM 473
   Dr. Asma Ashraf
What is bioinformatics?

• Interface of biology and computers

• Analysis of genomes, genes, mRNA
and proteins using computer algorithms
and computer databases
  What is Genomics?

  What is Proteomics?

What is the Transcriptome?
What do you want out of this course?
Top ten challenges for bioinformatics

[1] Precise models of where and when transcription
     will occur in a genome (initiation and termination)

[2] Precise, predictive models of alternative RNA splicing

[3] Precise models of biological pathways;
    ability to predict cellular responses to external stimuli

[4] Determining protein:DNA, protein:RNA, protein:protein
    recognition codes

[5] Accurate ab initio protein structure prediction

[6] Rational design of small molecule inhibitors of proteins

[7] Mechanistic understanding of protein evolution

[8] Mechanistic understanding of speciation

[9] Development of effective gene ontologies:
    systematic ways to describe gene and protein function

                                              Source: Ewan Birney,
                                              Chris Burge, Jim Fickett
 Themes throughout the course:
     gene/protein families

We will study these families in a variety of contexts, including:
--homologs in various species
--sequence alignment
--gene expression
--protein structure
[Figure: bioinformatics alongside medical and public health fields; databases and algorithms; DNA, RNA, protein, phenotype; cDNA, ESTs, sequence databases]
There are three major public DNA databases:

• EMBL, housed at the EBI (European Bioinformatics Institute)
• GenBank, housed at the NCBI (National Center for Biotechnology Information)
• DDBJ (DNA Data Bank of Japan), housed in Japan
Growth of GenBank
[Figure: sequences (millions) and base pairs of DNA (billions) in GenBank, 1982 to 2002. Updated 8-12-04: >40 billion base pairs]
Press Release (August 22, 2005)

   100 gigabases of sequence data
     (NCBI, EMBL, & DDBJ)

   over 165,000 organisms
The growth of GenBank. The blue area shows the total number of bases including
those from whole genome shotgun sequencing projects (WGS). The checkered
area shows only the non-WGS portion. With release 149, the number of WGS
bases exceeded the number of bases in the traditional GenBank divisions.
Go to NCBI website
PubMed is…

• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via “Education” on the sidebar)
Entrez integrates…
• the scientific literature;
• DNA and protein sequence databases;
• 3D protein structure data;
• population study data sets;
• assemblies of complete genomes
Entrez is a search and retrieval system
    that integrates NCBI databases
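NCBI exposes Entrez to programs through its E-utilities. ESearch returns the IDs of records that match a query. EFetch retrieves the full records for those IDs. The sketch below only builds the request URLs and does not send them. The parameters `db`, `term`, `retmax`, `id`, `rettype` and `retmode` belong to the E-utilities interface. The helper names and the IDs in the usage lines are hypothetical placeholders. NCBI also asks clients to keep request rates low, limiting them to a few requests per second.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Hypothetical helper: URL asking Entrez for IDs of records in `db` matching `term`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


def efetch_url(db: str, ids: list, rettype: str = "fasta") -> str:
    """Hypothetical helper: URL retrieving full records for a list of IDs as plain text."""
    query = urlencode({"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"})
    return f"{EUTILS_BASE}efetch.fcgi?{query}"


if __name__ == "__main__":
    # Search PubMed for articles with "bioinformatics" in the title.
    print(esearch_url("pubmed", "bioinformatics[title]", retmax=5))
    # Placeholder IDs; real ones would come from the ESearch response.
    print(efetch_url("nuccore", ["12345", "67890"]))
```

The same pattern covers the other Entrez databases listed above. Only the `db` value changes.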

BLAST is…
• Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day
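BLAST (Basic Local Alignment Search Tool) is a fast heuristic for finding local alignments. The exact local alignment score it approximates can be computed with the Smith-Waterman dynamic program, and this sketch shows that exact method rather than BLAST itself. The scoring values are assumptions chosen for illustration: match +2, mismatch -1, linear gap -2. BLAST's real scoring uses substitution matrices such as BLOSUM62 for proteins, together with affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # gap inserted in b
            left = H[i][j - 1] + gap   # gap inserted in a
            # Flooring at 0 lets a local alignment start fresh at any position
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The shared segment ACGT gives 4 matches x 2 = 8
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Sequences with no similar segment score 0. Local alignment never has to pay for unrelated flanking regions.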
OMIM is…

• Online Mendelian Inheritance in Man
• catalog of human genes and genetic disorders
Tax Browser is…
• browser for the major divisions of living organisms
  (archaea, bacteria, eukaryota, viruses)
• taxonomy information such as genetic codes
• molecular data on extinct organisms
Structure site includes…
• Molecular Modelling Database (MMDB)
• biopolymer structures obtained from
  the Protein Data Bank (PDB)
• Cn3D (a 3D-structure viewer)
• Vector Alignment Search Tool (VAST)
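Synonymous vs. nonsynonymous
   Proline: a fourfold degenerate amino acid
   (any base at the third codon position still encodes proline)

   CCT
   CCC       Synonymous changes (CCT, CCC, CCA and CCG all encode proline)
   CCA
   CCG

   CGT       Nonsynonymous change (second-position C→G gives CGT, which encodes arginine)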
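The proline example can be checked mechanically. Translate the codon before and after a change, then compare the amino acids. This sketch assumes the standard genetic code (NCBI translation table 1) and codons written as DNA. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon_before: str, codon_after: str) -> str:
    """Label a codon change as synonymous (same amino acid) or nonsynonymous."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


def is_fourfold_degenerate(codon: str) -> bool:
    """True if all four bases at the third position give the same amino acid."""
    prefix = codon.upper()[:2]
    return len({CODON_TABLE[prefix + base] for base in BASES}) == 1


if __name__ == "__main__":
    for new_codon in ("CCC", "CCA", "CCG", "CGT"):
        print("CCT ->", new_codon, classify_change("CCT", new_codon))
```

Proline (CCN) passes the fourfold test. Histidine/glutamine codons (CAT, CAC, CAA, CAG) fail it, because the third position changes the amino acid.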

           Central Dogma

• DNA → RNA → protein

• sequence → structure → function → evolution
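The DNA → RNA → protein flow can be sketched in a few lines. Transcription gives an mRNA whose sequence matches the coding (sense) DNA strand, with U in place of T. Translation then reads that mRNA three bases at a time. This sketch makes several simplifying assumptions. It uses the standard genetic code. Reading starts at the first base, with no search for a start codon. Translation stops at the first stop codon. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code written with RNA bases, enumerated in UCAG order.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {"".join(c): aa for c, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA sequence: same as the coding DNA strand, with U replacing T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base; stop at the first stop codon ("*")."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = RNA_CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGCCTTAA")
    print(mrna, translate(mrna))  # AUGCCUUAA MP
```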
RNA Modifications
What are cDNAs?
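A cDNA (complementary DNA) is DNA that reverse transcriptase copies from an mRNA template. Because the template is processed mRNA, the cDNA carries no introns. The first strand is complementary and antiparallel to the mRNA. The second strand has the same sequence as the mRNA, with T in place of U. The sketch below is idealized. It assumes an error-free, full-length copy and does not model primers such as oligo(dT). The function names are hypothetical.

```python
# Pairing used by reverse transcriptase: RNA base -> complementary DNA base.
RNA_TO_DNA_COMPLEMENT = str.maketrans("AUGC", "TACG")
# Pairing between DNA strands, used to make the second cDNA strand.
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (5'->3') copied from an mRNA written 5'->3'."""
    # Complement every base, then reverse: the new strand runs antiparallel.
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(first_strand: str) -> str:
    """Second cDNA strand: matches the mRNA sequence, with T in place of U."""
    return first_strand.upper().translate(DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    first = first_strand_cdna("AUGCCUUAA")
    print(first, second_strand_cdna(first))  # TTAAGGCAT ATGCCTTAA
```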
           Protein structures

• Determined by X-ray crystallography and nuclear
  magnetic resonance (NMR)
• Primary structure
  – linear sequence of amino acids
• Secondary structure
  – alpha helix and beta sheet
• Tertiary structure
  – 3-D fold that exposes binding domains, etc.
             Linkage maps

• YAC (yeast artificial chromosome) and
• BAC (bacterial artificial chromosome)
   - used to clone large pieces of DNA
   - overlapping clones
• Are genes linked?
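A standard way to ask whether two genes are linked is to count the offspring that carry a recombinant combination of the two markers. Genes that assort independently give a recombination frequency (RF) near 0.5. Linked genes give an RF well below 0.5. For small values, 1% recombination corresponds to 1 centimorgan (cM) of map distance. This is a simplified sketch with hypothetical function names and an assumed cutoff. A real analysis would test significance, for example with a chi-square test or a LOD score.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring with a recombinant combination of the two markers."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return recombinants / total


def map_distance_cm(rf: float) -> float:
    """Approximate map distance: 1% recombination = 1 cM (reasonable only for small RF)."""
    return rf * 100


def suggests_linkage(rf: float, cutoff: float = 0.5) -> bool:
    """Crude check: RF below ~0.5 suggests linkage (no statistical test here)."""
    return rf < cutoff


if __name__ == "__main__":
    rf = recombination_frequency(18, 200)  # hypothetical cross: 18 of 200 recombinant
    print(rf, round(map_distance_cm(rf), 2), suggests_linkage(rf))
```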
 How do we determine the functions of genes?
• Expression patterns
   –   Northern blots
   –   RT-PCR
   –   SAGE (serial analysis of gene expression)
   –   Microarrays
• Transgenics
   – insert genes: what results?
• Mutants
   – classical genetics
   – molecular genetics
• Functional protein assays
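Expression methods such as microarrays are often summarized as a log2 ratio of a gene's signal in two conditions. A value of +1 means two-fold higher, -1 means two-fold lower, and 0 means unchanged. This sketch assumes the signals are already normalized. It adds a small pseudocount to avoid dividing by zero and uses an assumed two-fold cutoff. The gene names and values are hypothetical.

```python
import math


def log2_ratio(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 fold change of one gene's expression signal between two conditions."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def changed_genes(signals: dict, min_abs_log2: float = 1.0) -> dict:
    """Genes whose |log2 ratio| meets the cutoff (1.0 = at least two-fold)."""
    result = {}
    for gene, (treated, control) in signals.items():
        ratio = log2_ratio(treated, control)
        if abs(ratio) >= min_abs_log2:
            result[gene] = ratio
    return result


if __name__ == "__main__":
    # Hypothetical normalized signals: (treated, control)
    data = {"geneA": (399, 99), "geneB": (100, 100), "geneC": (24, 99)}
    print(changed_genes(data))
```

Working on a log scale makes up- and down-regulation symmetric. A four-fold increase gives +2 and a four-fold decrease gives -2.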
• All organisms alive today can trace their
  ancestry back to the origin of life some 3.8
  billion years ago
  – Since then, millions if not billions of branching
    events have occurred
• Mechanisms must be in place for change
  to occur
  – genetic drift and natural selection
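Both mechanisms can be illustrated with a Wright-Fisher model. Drift comes from randomly resampling the 2N gene copies of a diploid population each generation. Selection, if any, shifts the allele's expected frequency before sampling. This sketch assumes no mutation or migration and a simple genic selection model with relative fitness 1 + s, where s > -1. The function name is hypothetical.

```python
import random
from typing import List, Optional


def wright_fisher(pop_size: int, start_freq: float, generations: int,
                  s: float = 0.0, seed: Optional[int] = None) -> List[float]:
    """Allele frequency per generation under drift (and optional selection s)."""
    rng = random.Random(seed)
    copies = 2 * pop_size  # gene copies in a diploid population of N individuals
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        if freq in (0.0, 1.0):
            break  # allele lost or fixed: without mutation it cannot change again
        # Selection: p' = p(1+s) / (p(1+s) + (1-p)); s = 0 means pure drift
        p_sel = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # Drift: binomial sampling of 2N copies, done as 2N Bernoulli draws
        count = sum(1 for _ in range(copies) if rng.random() < p_sel)
        freq = count / copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    for seed in range(3):
        path = wright_fisher(pop_size=10, start_freq=0.5, generations=200, seed=seed)
        print(f"seed {seed}: final frequency {path[-1]} after {len(path) - 1} generations")
```

In small populations, drift alone usually drives the allele to loss (0.0) or fixation (1.0) within a few dozen generations. Larger populations drift more slowly.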
