gen proteom & bioinformatics by SBMirza


Proteomics and Bioinformatics
BINM 473
Dr. Asma Ashraf
What is bioinformatics?

• Interface of biology and computers

• Analysis of genomes, genes, mRNA
and proteins using computer algorithms
and computer databases
  What is Genomics?


  What is Proteomics?


What is the Transcriptome?
What do you want out of this course?
Top ten challenges for bioinformatics

[1] Precise models of where and when transcription
     will occur in a genome (initiation and termination)

[2] Precise, predictive models of alternative RNA splicing

[3] Precise models of biological pathways;
    ability to predict cellular responses to external stimuli

[4] Determining protein:DNA, protein:RNA, protein:protein
    recognition codes

[5] Accurate ab initio protein structure prediction

[6] Rational design of small molecule inhibitors of proteins

[7] Mechanistic understanding of protein evolution

[8] Mechanistic understanding of speciation

[9] Development of effective gene ontologies:
    systematic ways to describe gene and protein function




Source: Ewan Birney, Chris Burge, Jim Fickett
 Themes throughout the course:
     gene/protein families



We will study it in a variety of contexts including
--homologs in various species
--sequence alignment
--gene expression
--protein structure
--phylogeny
[Figure: bioinformatics, medical informatics, and public health informatics; tool-users and tool-makers; databases, algorithms, and infrastructure]
[Figure: DNA → RNA → protein → phenotype, with the matching databases: genomic DNA databases (DNA); cDNA, ESTs, UniGene (RNA); protein sequence databases (protein)]
There are three major public DNA databases



• EMBL: housed at EBI, the European Bioinformatics Institute
• GenBank: housed at NCBI, the National Center for Biotechnology Information
• DDBJ: housed in Japan
Growth of GenBank

[Figure: sequences (millions) and base pairs of DNA (billions) by year, 1982 to 2002]
Updated 8-12-04: >40 billion base pairs
Press Release (August 22, 2005)

   100 gigabases of sequence data
     (NCBI, EMBL, & DDBJ)

   over 165,000 organisms
The growth of GenBank. The blue area shows the total number of bases including
those from whole genome shotgun sequencing projects (WGS). The checkered
area shows only the non-WGS portion. With release 149, the number of WGS
bases exceeded the number of bases in the traditional GenBank divisions.
Go to NCBI website

http://www.ncbi.nlm.nih.gov/
PubMed is…

• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via “Education” on side bar)
Entrez integrates…
• the scientific literature;
• DNA and protein sequence databases;
• 3D protein structure data;
• population study data sets;
• assemblies of complete genomes
Entrez is a search and retrieval system
    that integrates NCBI databases
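
A minimal sketch of how an Entrez search can be addressed from a script. The slides only describe Entrez as a web search system; the use of NCBI's E-utilities ESearch endpoint here is an assumption added for illustration, and `build_esearch_url` is a hypothetical helper name. The code only builds the query URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities ESearch endpoint (not named in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Build an ESearch URL for one Entrez database (hypothetical helper)."""
    # db = which Entrez database to search, term = the query text,
    # retmax = the maximum number of record IDs to return.
    params = {"db": database, "term": term, "retmax": max_results}
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Same query against two Entrez databases: literature and protein sequences.
    print(build_esearch_url("pubmed", "proteomics", 5))
    print(build_esearch_url("protein", "hemoglobin", 5))
```

Changing only the `db` value points the same query at a different database, which is the sense in which Entrez integrates them.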
BLAST is…

• Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day
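
To make "local alignment" concrete, here is a minimal sketch of the Smith-Waterman dynamic-programming score. This is not how BLAST itself works: BLAST is a faster heuristic, while Smith-Waterman is the exact local alignment method it approximates. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared stretch ACGT is found even though the flanking bases differ.
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

The floor at zero is what makes the alignment local: a poor region cannot drag down the score of a good region elsewhere in the sequences.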
OMIM is…

• Online Mendelian Inheritance in Man
• catalog of human genes and genetic disorders
Tax Browser is…
• browser for the major divisions of living organisms
  (archaea, bacteria, eukaryota, viruses)
• taxonomy information such as genetic codes
• molecular data on extinct organisms
Structure site includes…
• Molecular Modeling Database (MMDB)
• biopolymer structures obtained from
  the Protein Data Bank (PDB)
• Cn3D (a 3D-structure viewer)
• Vector Alignment Search Tool (VAST)
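Synonymous vs. nonsynonymous changes

Proline: a four-fold degenerate amino acid
   C   C   T
   C   C   C
   C   C   A
   C   C   G

• Synonymous changes: any change at the third position (e.g. CCT to CCC) still encodes proline
• Nonsynonymous changes: a change that alters the amino acid (e.g. CCT to CGT, proline to arginine)

[Figure: synonymous substitution vs. non-synonymous substitution]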
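
The distinction can be checked mechanically with a codon table. The sketch below assumes the standard genetic code and DNA codons; `classify_substitution` is a hypothetical helper name used only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as synonymous or nonsynonymous (hypothetical helper)."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


if __name__ == "__main__":
    # Third-position change within the proline codons.
    print(classify_substitution("CCT", "CCG"))
    # Second-position change from proline (CCT) to arginine (CGT).
    print(classify_substitution("CCT", "CGT"))
```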
           Central Dogma

• DNA → RNA → protein

• sequence → structure → function → evolution
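
The first line of the dogma can be sketched in a few lines of code. This is a simplified illustration under stated assumptions: the input is the coding strand of DNA, the standard genetic code applies, reading starts at the first base, and RNA processing is ignored. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code for RNA codons, ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA to RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """RNA to protein: read codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = RNA_CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGCCTCGTTAA")
    print(mrna)
    print(translate(mrna))
```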
RNA Modifications
What are cDNAs?
           Protein structures

• Determined by X-ray crystallography and nuclear
  magnetic resonance (NMR)
• Primary structure
  – linear amino acid sequence
• Secondary structure
  – alpha helix and beta sheet
• Tertiary structure
  – 3-D fold that exposes binding domains, etc.
             Linkage maps

• YAC: yeast artificial chromosome
• BAC: bacterial artificial chromosome
   - used to clone large pieces of DNA
   - overlapping clones
• Are genes linked?
How do we determine functions of genes?
• Expression patterns
   – Northerns
   – RT-PCR
   – SAGE
   – Microarrays
• Transgenics
   – insert genes: what results?
• Mutants
   – classical genetics
   – molecular genetics
• Functional protein assays
                    Species
• All organisms alive today can trace their
  ancestry back to the origin of life some 3.8
  billion years ago
  – Since then millions if not billions of branching
    events have occurred
• Mechanisms have to be in place for change
  to occur
  – genetic drift and natural selection

								