                  Introduction to
     Molecular Biology, Genetics and Genomics


                     Sushmita Roy

             www.biostat.wisc.edu/bmi576/
                sroy@biostat.wisc.edu
                  September 6, 2012

BMI/CS 576
                     Goals for today

• Molecular biology crash course:
   – The different parts of a cell
   – DNA, RNA, chromosomes, nucleus, cytoplasm
   – Biochemical entities of a cell: mRNA, proteins,
     metabolites
   – genes, heredity, transcription, translation, gene regulation,
     gene expression, alternative splicing
• Genomics crash course:
   – Genomes, functional genomics, other omes, networks
               Organization of biological information



     Organism
                                                                        Chromosome

                                Tissue

                                                     Cell




                                                                 Gene

http://publications.nigms.nih.gov/thenewgenetics/chapter1.html
The central dogma of Molecular biology

                DNA


                    Transcription


                RNA


                    Translation


               Proteins
image from the DOE Human Genome Program
http://www.ornl.gov/hgmis
                          DNA

• Short for deoxyribonucleic acid

• composed of small chemical units called nucleotides (or
  bases)
   – adenine (A), cytosine (C), guanine (G) and thymine (T)
   – ATGC is the alphabet

• DNA is double stranded: made up of two twisting strands

• Each strand of DNA is a string composed of the four
  letters: A, C, G, T
                 DNA is a double helical molecule

    DNA molecules consist of two strands
      arranged in a double helix

    • DNA is made up of nucleotides

    Double-helical structure is needed for the DNA
    molecule to store and pass on genetic information
    with great precision




James Watson, Francis Crick, Maurice Wilkins
and Rosalind Franklin
        Watson-Crick Base Pairs




   A always bonds to T                 C always bonds to G



This is called base pairing.
A and G are double-ringed structures called purines.
C and T are single-ringed structures called pyrimidines.
          5’ and 3’ of a DNA molecule
• The backbone of this molecule has
  alternating sugar (deoxyribose) and
  phosphate groups
• each strand of DNA has a “direction”
   – at one end, the terminal carbon atom
      in the backbone is the 5’ carbon atom
      of the terminal sugar
   – at the other end, the terminal carbon
      atom is the 3’ carbon atom of the
      terminal sugar
• therefore we can talk about the 5’ and the
  3’ ends of a DNA strand
   DNA stores the blueprint of an organism

• The heredity molecule
• Has the information needed to make an organism
• Base pairing enables self-replication:
  – one strand has all the information
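
Why one strand carries all the information can be sketched in a few lines of Python. This is only an illustration: the function name and the example strand are hypothetical, and it assumes the two strands run in opposite (antiparallel) 5' to 3' directions, so the partner is the reverse complement.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'."""
    # Complement every base, reading the input backwards because
    # the two strands are assumed to be antiparallel.
    return "".join(PAIR[base] for base in reversed(strand.upper()))

strand = "ATGGCTTAC"                    # hypothetical strand, 5' -> 3'
partner = reverse_complement(strand)
print(partner)                          # GTAAGCCAT
# Copying the partner regenerates the original strand.
print(reverse_complement(partner) == strand)  # True
```

Applying the function twice returns the starting strand, which is the sense in which either strand alone is enough to rebuild the double helix.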
                            Chromosomes

  • All the DNA of an organism is
    divided up into individual
    chromosomes

  • prokaryotes (single-celled
    organisms lacking nuclei)
    typically have a single circular
    chromosome

  • eukaryotes (organisms with
    nuclei) have a species-specific
    number of chromosomes

Image from www.genome.gov
                   DNA packaging in Chromatin




DNA is very long (3 m in humans), but the cell is very small.
Packaging into chromosomes compresses the DNA molecule 50,000-fold.
The collection of DNA and proteins is called chromatin.
Different organisms have different numbers of
                chromosomes
   Organism             # of chromosomes
   Yeast                      32
   Human                      46
   Fly                         8
   Mouse                      40
   Arabidopsis                10
   Worm                       12
                               Genes
• genes are the basic units of
  heredity
• a gene is a sequence of bases
  that specifies a protein or an RNA
• the human genome comprises roughly
  20,000-25,000 protein-coding genes
  (the count is still being revised)
• One gene can have many
  functions
• One function can require many
  genes
                          …GTATGTCTAAGCCTGAATTCAGTCTGCTTTAAACGGCTTC…
                 Structure of genes




DNA
        Gene A           Gene B       Gene C

      Non-coding          Gene          Promoter
                     Genomes
• Refers to the complete complement of DNA for a given
  species

• the human genome consists of 2 × 23 chromosomes (23 pairs)

• every cell (except egg and sperm cells and mature red
  blood cells) contains the complete genome of an organism
Some Greatest Hits
Some Genome Sizes
Number of sequenced genomes
                            RNA

• RNA is like DNA except:
   – single stranded
   – U is used in place of T
• a strand of RNA can be thought of as a string composed of
  the four letters: A, C, G, U
                   Transcription
• In eukaryotes: happens inside the nucleus
• RNA polymerase is an enzyme that builds an RNA strand
  from a gene
• RNA Pol II is recruited at specific parts of the genome in a
  condition-specific way.
• Transcription factor proteins are responsible for recruiting
  Pol II.

• RNA that is transcribed from a protein-coding gene is
  called messenger RNA (mRNA)
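
At the level of strings, transcription can be sketched as follows. This is a simplification: it assumes the input is the coding (sense) strand, so the RNA has the same letters with U in place of T. RNA polymerase actually reads the opposite, template strand. The function name and the example sequence are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA matching the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

gene = "ATGGCTTAC"          # hypothetical coding-strand fragment
print(transcribe(gene))     # AUGGCUUAC
```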
 Transcription: the process of copying DNA
                 into RNA




mRNA
                          Translation
•   Process of turning mRNA into proteins.

•   Happens inside the cytoplasm in ribosomes

•   ribosomes are the machines that synthesize proteins from mRNA

•   Translation process reads one codon at a time

•   translation begins with the start codon

•   translation ends with the stop codon
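
The steps above (read one codon at a time, begin at the start codon, end at a stop codon) can be sketched in Python. This is a minimal illustration under stated assumptions: it uses the standard genetic code, takes the first AUG as the start codon, and expects an upper-case string over A, C, G, U. The names `CODON_TABLE` and `translate` and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code, one letter per amino acid, '*' marks a stop codon.
# Codons are enumerated with each position cycling through U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to, but not including, the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                       # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":           # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: GG | AUG GCU ACC UAA | GG
print(translate("GGAUGGCUACCUAAGG"))    # MAT (Met-Ala-Thr)
```

If the sequence ends before a stop codon is reached, the sketch simply returns what has been translated so far.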
Translation happens in ribosomes
                       Codons




• Each triplet of bases is called a codon
• How many codons are possible?
• Each codon is responsible for coding a particular
  amino acid.
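
The question of how many codons are possible can be answered by counting: each of the three positions holds one of four bases, giving 4 × 4 × 4 = 64 triplets. A short Python check, written over the RNA alphabet (the variable names are hypothetical):

```python
from itertools import product

# Enumerate every triplet over the four RNA bases.
codons = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

print(len(codons))      # 64
print(4 ** 3)           # 64
print(codons[:4])       # ['AAA', 'AAC', 'AAG', 'AAU']
```

Because there are more codons than amino acids, several codons can code for the same amino acid.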
The Genetic Code
Codons and Reading Frames
                    Alanine


                    Threonine
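
A reading frame is the choice of where the first codon starts. The sketch below splits one sequence into triplets at each of the three possible offsets on a single strand. The function name and the example sequence are hypothetical. In the standard genetic code GCU codes for alanine and ACG for threonine.

```python
def codons_in_frame(seq: str, frame: int) -> list:
    """Split seq into complete triplets, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

seq = "GCUACGGAGCUU"        # hypothetical RNA fragment
for frame in range(3):
    print(frame, codons_in_frame(seq, frame))
# 0 ['GCU', 'ACG', 'GAG', 'CUU']
# 1 ['CUA', 'CGG', 'AGC']
# 2 ['UAC', 'GGA', 'GCU']
```

Shifting the start by one base changes every codon, and therefore every amino acid, that follows.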
                      Proteins
• Proteins are long strings composed of amino acids

• 20 standard amino acids are encoded by the genetic code
Amino Acids
Proteins are the workhorses of the cell

•   structural support
•   storage of amino acids
•   transport of other substances
•   coordination of an organism’s activities
•   response of cell to chemical stimuli
•   movement
•   protection against disease
•   selective acceleration of chemical reactions
        Proteins are complex molecules

• Primary amino acid
  sequence
• Secondary structure
• Tertiary structure
• Quaternary structure
        Some well-known proteins




Hemoglobin: carries oxygen
Insulin: metabolism of sugar
Actin: maintenance of cell structure
                       Hemoglobin protein HBA1

DNA sequence (491 bp)

>gi|224589807:226679-227520 Homo sapiens chromosome 16, GRCh37.p9 Primary Assembly
  1 cccacagact cagagagaac ccaccatggt gctgtctcct gacgacaaga ccaacgtcaa
 61 ggccgcctgg ggtaaggtcg gcgcgcacgc tggcgagtat ggtgcggagg ccctggagag
121 gatgttcctg tccttcccca ccaccaagac ctacttcccg cacttcgacc tgagccacgg
181 ctctgcccag gttaagggcc acggcaagaa ggtggccgac gcgctgacca acgccgtggc
241 gcacgtggac gacatgccca acgcgctgtc cgccctgagc gacctgcacg cgcacaagct
301 tcgggtggac ccggtcaact tcaagctcct aagccactgc ctgctggtga ccctggccgc
361 ccacctcccc gccgagttca cccctgcggt gcacgcctcc ctggacaagt tcctggcttc
421 tgtgagcacc gtgctgacct ccaaataccg ttaagctgga gcctcggtgg ccatgcttct
481 tgcccctttg g

Amino acid sequence (142 aa)

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFL
SFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA
HVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVT
LAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

Protein 3d structure
     RNA Processing in Eukaryotes

• eukaryotes are organisms that have membrane-enclosed nuclei in
  their cells

• in many eukaryotes, RNAs consist of alternating
  exon/intron segments

• exons are the segments kept in the mature RNA; they include the coding parts

• introns are spliced out before translation
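
Splicing can be pictured as cutting out the intron intervals and joining what remains. The sketch below assumes the exon positions are already known and are given as half-open (start, end) index pairs. The function name, the coordinates and the toy sequence are hypothetical. It does not model how the cell recognizes intron boundaries.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon segments, given as (start, end) half-open index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: exon 1 + intron + exon 2
pre_mrna = "AUGGCU" + "GUAAGUCCUAG" + "ACCUAA"
exons = [(0, 6), (17, 23)]

print(splice(pre_mrna, exons))   # AUGGCUACCUAA
```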
RNA Splicing
                   RNA Genes
• not all genes encode proteins
• for some genes the end product is RNA
   – ribosomal RNA (rRNA), which includes major
      constituents of ribosomes
   – transfer RNAs (tRNAs), which carry amino acids to
      ribosomes
   – micro RNAs (miRNAs), which play an important
      regulatory role in various plants and animals
   – lincRNAs (long intergenic non-coding RNAs), which play
      important regulatory roles
Central Dogma revisited

              DNA


                Transcription


              RNA

Translation               Non-coding RNA processing


Proteins                ncRNA, miRNA, rRNAs
                      Summary

• Key concepts in molecular biology
   – Central Dogma
   – DNA, RNA, proteins
   – Chromosomes, Nucleus, Ribosomes
• Important processes
   – Transcription
   – Translation
   – RNA splicing
                Functional Genomics

• Aims to characterize the genes and proteins of an organism in
  an unbiased way using high-throughput technologies.
• Really focused on “beyond the genetic sequence”
• What does a piece of DNA do?
   – Gene, regulatory element, a mutation
• Has generated large collections of “omics” datasets
   – Gene expression
   – Protein expression
   – Metabolite levels
                        Metabolites

• Metabolism:
   – A set of chemical processes in cells
   – Needed for sustaining life
• Metabolites: small molecules that are intermediates of
  metabolism
   – Sugar
   – Glycerol
• Metabolic pathway
   – A set of chemical reactions in a cell
                   The tricarboxylic acid (TCA) cycle
                                  Metabolites




                                                   Enzyme




Courtesy KEGG Pathways
Yeast metabolic pathways
        Context-specific expression of a cell

 • The DNA is static
 • But the set of mRNAs present can differ by cell type,
   environment and time point.
 • A key process is gene regulation
    – determines which genes are expressed when
Environmental signal
             Transcriptional gene regulation

 • Key control process that determines what genes are
   expressed when
 • Requires
     – RNA Polymerase
     – Transcription factors
     – Energy



http://www.youtube.com/watch?v=WsofH466lqk
           Transcriptional gene regulation


                       Transcription factor
                       level (trans)
                              P1    P2
                                              HSP12
Transcription factor
binding sites (cis)           Promoter


                                              mRNA levels
             Regulation of GAL genes

• GAL genes are required for yeast to grow on
  galactose.
• Four of them are metabolic genes
   – GAL1, GAL10, GAL2 and GAL7
• Three are regulatory genes
   – GAL4, GAL80 and GAL3
                Regulation of GAL genes

 No Galactose




                                          A metabolic GAL gene

In Galactose
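
The two conditions can be sketched as Boolean logic. The rules below follow the common textbook picture of the GAL switch and are an assumption, not something read off the slide figure: Gal4 activates the metabolic genes, Gal80 blocks Gal4, and Gal3 relieves that block only when galactose is present. Repression by glucose is ignored, and the function and parameter names are hypothetical.

```python
def metabolic_gal_gene_expressed(galactose: bool,
                                 gal4: bool = True,
                                 gal80: bool = True,
                                 gal3: bool = True) -> bool:
    """Toy Boolean model of one metabolic GAL gene (e.g. GAL1)."""
    # Assumed rule: Gal80 stays active unless galactose-bound Gal3 inactivates it.
    gal80_active = gal80 and not (galactose and gal3)
    # Assumed rule: Gal4 switches the gene on unless active Gal80 blocks it.
    return gal4 and not gal80_active

print(metabolic_gal_gene_expressed(galactose=False))               # False
print(metabolic_gal_gene_expressed(galactose=True))                # True
# Without Gal80 the model is on even when galactose is absent.
print(metabolic_gal_gene_expressed(galactose=False, gal80=False))  # True
```

The keyword arguments stand for whether each regulatory gene is functional, so setting one to False mimics deleting that gene in this toy model.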
                     Transcriptome

• The entire set of RNA products in a cell
• A cell can decide to make more or less of a particular
  RNA
   – Levels change
• Its constituents are context-specific
• Context is determined by environment of a cell
            Transcriptional Regulatory networks

• The entire set of
  interactions between TFs
  and genes in an
  organism
• The transcriptome is the
  output of a regulatory
  network




 Image courtesy: Dr. Mike Snyder, http://compbio.pbworks.com/w/page/16252928/Transcription%20Regluatory%20Network#1
           Understanding cells requires an iterative
             approach spanning multiple levels




Ideker et al., Science 2002
                      Summary

• Cells are made up of many different molecular
  entities
• Functional genomics enables us to identify these
  entities
• Cells function via the interaction of these entities
• Putting it together into comprehensive models is a
  major goal of systems biology

								