Changes in DNA Mutation

					Changes in DNA
•   Any change in the DNA sequence of an organism is a mutation.
•   Mutation is a decay force whose ultimate roots are in the second law of
    thermodynamics (entropy). Living things survive inevitable mutations by a
    combination of being tolerant of a certain level of mutation, repairing
    mutational damage, killing cells that are mutated beyond repair, and relying
    on natural selection to remove individuals with unfavorable mutations.
•   Mutations are the source of the altered versions of genes that provide the
    raw material for evolution.
•   A central tenet of biology is that the flow of information from DNA to protein
    is one way. DNA cannot be altered in a directed way by changing the
    environment. Only random DNA changes occur.

•   Some terminology: the genotype is the organism’s genetic constitution:
    ultimately, the sequence of its DNA. The phenotype is the physical
    characteristics of the organism: its appearance, biochemistry, reactions to
    the environment, etc.
     – before DNA sequencing, the genotype was deduced from the phenotypes of
       parents and offspring.
     – the point of genome annotation is to deduce the phenotype that will result from a
       given genotype.
     More Mutation Generalities
• Most mutations have no effect on the organism, especially among
  the eukaryotes, because a large portion of the DNA is not in genes
  and thus does not affect the organism’s phenotype.
• Even within genes, mutations can have little or no effect
    – the genetic code is degenerate: some mutations are translated into the
      same amino acid
    – many amino acid changes have little or no effect on protein function.
• Of the mutations that do affect the phenotype, the most common
  effect is lethality, because most genes are necessary
  for life.

• From a bioinformatics point of view, the three simplest types of
  mutation (base substitutions, small insertions and deletions, and
  simple sequence repeats) affect sequence alignment programs.
  Larger mutations, such as transposable element movements,
  recombination-induced mutations, and general chromosome
  rearrangements, affect large-scale issues such as genomic maps.
                Base Change Mutations
•   The simplest mutations are base changes,
    where one base is converted to another.
    (Also called “substitutions”, or “point
    mutations”.) These can be classified as either:
     –   “transitions”, where one purine is changed to
         another purine (A -> G, for example), or one
         pyrimidine is changed to another pyrimidine
         (T -> C, for example).
     –   “transversions”, where a purine is substituted for
         a pyrimidine, or a pyrimidine is substituted for a
         purine. For example, A -> C.
•   Transitions are more common than
    transversions, because they are easier to
    create, and because transitions often have
    less drastic effects than transversions.
    Example substitution probabilities (row: original base; column: base
    after mutation). Each transition (A <-> G, C <-> T) is twice as likely
    as each transversion:
               A      C      G      T
         A    0.6    0.1    0.2    0.1
         C    0.1    0.6    0.1    0.2
         G    0.2    0.1    0.6    0.1
         T    0.1    0.2    0.1    0.6
•   Base change mutations are the cause of
    single nucleotide polymorphisms (SNPs).
    Mapping SNPs is the current best way to
    locate human disease genes.
•   Base change mutations are the most common
    mutations, and they are the easiest to handle
    for statistics and evolutionary studies.
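Whether a base change is a transition or a transversion depends only on whether the old and new bases belong to the same chemical class. That makes it easy to compute. The following is a minimal Python sketch. It assumes bases are given as single letters A, C, G, T, and the function names are hypothetical. The transition/transversion (ts/tv) ratio it computes is a common way to summarize a set of observed substitutions.

```python
# Purines (two-ring bases) and pyrimidines (one-ring bases).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old: str, new: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    old, new = old.upper(), new.upper()
    for base in (old, new):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if old == new:
        raise ValueError("not a substitution: the bases are identical")
    # Same chemical class (purine->purine or pyrimidine->pyrimidine) = transition.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(changes):
    """Ratio of transitions to transversions in a list of (old, new) pairs."""
    kinds = [classify_substitution(old, new) for old, new in changes]
    ts, tv = kinds.count("transition"), kinds.count("transversion")
    return ts / tv if tv else float("inf")


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("T", "C"))  # transition
print(classify_substitution("A", "C"))  # transversion
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "T")]))  # 2.0
```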
           Base Change Causes
• Base changes occur naturally as errors in replication: the
  wrong base gets inserted.
    – DNA polymerase has an editing function that detects most
      errors, then backs up, removes the wrong base and puts in the
      proper base.
    – enzymes that replicate RNA don’t have the editing function, so
      their error rate is about 100 times that of DNA polymerase, causing the high
      mutation rate of RNA viruses.
• Various chemical changes in a base can cause
  mutation. For instance, the spontaneous loss of the
  amino group on cytosine converts it to uracil (which will
  pair with A, not G).
• Environmental chemicals that attach bulky groups onto
  bases (alkylating agents) can cause the bases to be
  misread by DNA polymerase.
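The cytosine-deamination example can be traced through two rounds of replication to show why it produces a C:G to T:A transition. This sketch is deliberately simplified. It treats replication as base-by-base complementation, it ignores strand orientation (5'/3') and DNA repair, and the function names are hypothetical.

```python
# Base-pairing rules used by the polymerase; uracil (U) pairs like thymine.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def copy_strand(template: str) -> str:
    """Return the strand a polymerase would build on `template`.

    Simplification: orientation is ignored, so the result is the plain
    complement read in the same left-to-right order.
    """
    return "".join(PAIR[base] for base in template)


def deaminate(strand: str, pos: int) -> str:
    """Spontaneous deamination: the cytosine at `pos` becomes uracil."""
    if strand[pos] != "C":
        raise ValueError("deamination converts C to U; no C at that position")
    return strand[:pos] + "U" + strand[pos + 1:]


top = "GCA"                    # original strand (its partner is "CGT")
damaged = deaminate(top, 1)    # "GUA"
round1 = copy_strand(damaged)  # "CAT": an A is placed opposite the U
round2 = copy_strand(round1)   # "GTA": the former C:G pair is now T:A
print(damaged, round1, round2)
```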
           Phenotypic Effects of Base Changes
•   Mutations can be classified according to their effects on the protein (or mRNA)
    produced by the gene that is mutated.
•   1. Silent mutations (synonymous mutations). Since the genetic code is degenerate,
    several codons produce the same amino acid. In particular, changes in the third base
    of a codon often have no effect on the amino acid sequence of the protein. These
    mutations affect the DNA but not the protein, so they are usually treated as neutral
    mutations: mutations that should have no effect on the organism’s phenotype.
•   2. Missense mutations. Missense mutations substitute one amino acid for another.
    Some missense mutations have very large effects, while others have minimal or no
    effect. It depends on where the mutation occurs in the protein’s structure, and how
    different the new amino acid is from the original one.
•   3. Nonsense mutations convert an amino acid codon into a stop codon. The effect is to
    shorten the resulting protein. Sometimes this has only a little effect, as the ends of
    proteins are often relatively unimportant to function. However, often nonsense
    mutations result in completely non-functional proteins.
•   4. Sense mutations are the opposite of nonsense mutations. Here, a stop codon is
    converted into an amino acid codon, and translation continues into the sequence
    that follows. Since 3 of the 64 codons are stop codons, a new stop is reached on
    average about every 21 codons, so the translation process usually stops after
    producing a slightly longer protein.

•   Base changes can also affect RNA initiation, splicing and termination.
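The four categories can be told apart by translating the affected codon before and after the change. The following is a minimal Python sketch. It assumes the standard genetic code and an in-frame coding sequence that starts at position 0, and the function name is hypothetical. It only examines the single affected codon, so effects on initiation, splicing or termination signals are not modelled. The last line checks the 3-in-64 arithmetic. If codons after a lost stop were random, the expected number read up to and including the first new stop would be 64/3.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons listed in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(cds: str, pos: int, new_base: str) -> str:
    """Classify a base change at 0-based `pos` of an in-frame coding sequence."""
    start = pos - pos % 3                      # first base of the affected codon
    old_codon = cds[start:start + 3]
    offset = pos - start
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    if old_aa == "*":
        return "sense"       # stop codon lost; translation reads on
    return "missense"


cds = "ATGGCTTGGTAA"  # hypothetical gene: Met-Ala-Trp-stop
for pos, base in [(5, "C"), (4, "A"), (7, "A"), (10, "C")]:
    print(pos, base, classify_point_mutation(cds, pos, base))
# 5 C silent / 4 A missense / 7 A nonsense / 10 C sense

print(round(64 / 3, 1))  # 21.3 codons expected before a random stop
```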
                     More on Substitution
•   In addition to synonymous
    mutations, some amino acid
    changes are “conservative” in
    that they have little or no effect
    on the protein’s function.
     –   For example, isoleucine and
         valine are both hydrophobic
         and readily substitute for each
         other.
     –   Other amino acid substitutions
         are very unlikely, such as leucine
         (hydrophobic) for aspartic acid
         (hydrophilic and charged). This
         would be a non-conservative
         substitution.
     –   Some amino acids play unique
         roles: cysteines form disulfide
         bridges, prolines induce kinks
         in the chain, etc.
     –   However, some amino acids
         are critical for active sites and
         cannot be substituted.
•   Tables of substitution
    frequencies for all pairs of
    amino acids have been compiled.
    BLOSUM62 table. Each entry is a log-odds score for a pair of amino
    acids: positive values mark pairs seen aligned more often than expected
    by chance, negative values pairs seen less often. Diagonal entries score
    an amino acid staying the same; off-diagonal entries score substitutions.
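The sketch below shows how such a table is used. It hard-codes a small excerpt of BLOSUM62 that covers only the amino acids mentioned above (I, V, L, D). Treating a positive score as "conservative" is a rough rule of thumb chosen for illustration. It is not part of the matrix definition. A real program would load the full 20 x 20 matrix, for example with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`. The function names here are hypothetical.

```python
# Excerpt of BLOSUM62 for four amino acids (one-letter codes).
# The matrix is symmetric, so each unordered pair is stored once.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("V", "V"): 4, ("L", "L"): 4, ("D", "D"): 6,
    ("I", "V"): 3, ("I", "L"): 2, ("L", "V"): 1,
    ("D", "I"): -3, ("D", "V"): -3, ("D", "L"): -4,
}


def blosum62(a: str, b: str) -> int:
    """Look up the score for an aligned pair, in either order."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def describe(a: str, b: str) -> str:
    """Label a substitution using the sign of its score (a heuristic)."""
    score = blosum62(a, b)
    kind = "conservative" if score > 0 else "non-conservative"
    return f"{a}->{b}: score {score} ({kind})"


print(describe("I", "V"))  # I->V: score 3 (conservative)
print(describe("L", "D"))  # L->D: score -4 (non-conservative)
```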
•   Another simple type of mutation is the gain
    or loss of one or a few bases. These
    mutations are called indels, which is short
    for “insertion/deletion”.
     – When comparing two species it isn’t easy to
       tell whether an insertion occurred in one
       species or a deletion occurred in the other.
•   Indels are thought to be generated when
    the DNA polymerase slips forward or
    backward on the template DNA it is
    copying.
     – This occurs most easily in repeated
       sequences, but can occur anywhere.

•   A second cause of short indels is
    chemical- or radiation-induced loss of the
    base portion of the nucleotide. The DNA
    polymerase often skips right over these
    sugar/phosphate stumps, leaving a
    missing base in the resulting DNA chain.
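The alignment consequences of indels can be sketched in a few lines of Python. A position-by-position comparison treats one deleted base as a long run of mismatches. An edit distance that allows gaps counts the same deletion as a single event. Inside a repeat, several different deletion positions give the identical mutant, so the exact site cannot be recovered from the sequences alone. The sequences and function names are hypothetical. Real aligners use scored alignment, such as Needleman-Wunsch or Smith-Waterman with gap penalties, rather than plain edit distance.

```python
def mismatches_without_gaps(a: str, b: str) -> int:
    """Compare position by position, as if no indels were allowed."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x from a
                           cur[j - 1] + 1,           # insert y into a
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


def deletion_sites(seq: str, mutant: str) -> list[int]:
    """All single-base deletion positions in `seq` that yield `mutant`."""
    return [i for i in range(len(seq)) if seq[:i] + seq[i + 1:] == mutant]


original = "GATTACAGATTACA"
deleted = "GATACAGATTACA"   # one T removed from the first TT run
print(mismatches_without_gaps(original, deleted))  # 10
print(edit_distance(original, deleted))            # 1
print(deletion_sites(original, deleted))           # [2, 3]
```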
      Frameshifts and Reversions
•   Translation occurs codon by codon,
    examining nucleotides in groups of 3.
    If one or two nucleotides (or any number
    that is not a multiple of 3) are added or
    removed, the grouping of the codons
    is altered. This is a frameshift
    mutation, where the reading frame of
    the ribosome is altered.
•   Frameshift mutations result in all
    amino acids downstream from the
    mutation site being completely
    different from wild type. These
    proteins are generally non-functional.

•   A reversion is a second mutation that
    reverses the effects of an initial
    mutation, bringing the phenotype back
    to wild type (or almost).
     –   Frameshift mutations sometimes have
         “second site reversions”, where a
         second frameshift downstream from the
         first frameshift reverses the effect.
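The reading-frame argument can be checked directly. The sketch translates a short gene, inserts one base, and then deletes a second base further downstream. It is a minimal Python sketch that assumes the standard genetic code. The sequence is invented for illustration and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


wild_type = "ATGGCTGCAGCTGCAGATGGCTAA"              # hypothetical gene
frameshift = wild_type[:3] + "T" + wild_type[3:]    # +1 insertion after ATG
revertant = frameshift[:12] + frameshift[13:]       # -1 deletion downstream

print(translate(wild_type))   # MAAAADG
print(translate(frameshift))  # MCCSCRWL (every residue after M changes)
print(translate(revertant))   # MCCSADG (frame restored after the deletion)
```

In the revertant, only the stretch between the two mutations (CCS instead of AAA) remains altered. Whether such a protein works depends on how important that stretch is.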
    Microsatellites/Simple Sequence Repeats
•   Two words for the same phenomenon.
•   During replication, DNA polymerase can “stutter” when it replicates several tandem
    copies of a short sequence, say 2-5 bp.
     –   For example, CAGCAGCAGCAG, 4 copies of CAG, will occasionally be converted to 3
         copies or 5 copies by DNA polymerase stuttering.
•   Outside of genes, this effect produces useful genetic markers called SSR (simple
    sequence repeats).
•   They are heavily used in genetic mapping, for several reasons.
     –   They are easy to detect.
     –   They are fairly stable across generations yet have a high enough mutation rate that many
         alleles exist in the population.
     –   They are found in many locations in the genome of all organisms.
•   Within a gene, this effect can cause certain amino acids to be repeated many times
    within the protein. In some cases this causes disease.
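Counting repeat copies can be sketched in Python, along with a toy model of polymerase stutter. The sketch makes several assumptions. The repeat is perfect, with no interrupting bases. Each replication changes the copy number by at most one unit. The slip rate of 0.2 is an arbitrary, greatly exaggerated value chosen so that changes show up in a short run. `tandem_copies` and `replicate_with_stutter` are hypothetical names.

```python
import random
import re


def tandem_copies(seq: str, unit: str) -> int:
    """Copies of `unit` in the longest perfect tandem run within `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


def replicate_with_stutter(copies: int, slip_rate: float, rng: random.Random) -> int:
    """One round of replication: with probability slip_rate, gain or lose one unit."""
    if rng.random() < slip_rate:
        return max(1, copies + rng.choice((-1, 1)))
    return copies


print(tandem_copies("TTCAGCAGCAGCAGGA", "CAG"))  # 4

rng = random.Random(1)   # fixed seed so the run is repeatable
lineage = [4]            # start from 4 copies, as in CAGCAGCAGCAG
for _ in range(10):
    lineage.append(replicate_with_stutter(lineage[-1], 0.2, rng))
print(lineage)
```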
                    Huntington Disease
•   Huntington disease is an autosomal dominant
    disease; most affected people are heterozygotes.
•   Onset usually in middle age.
•   Neurological: starts with irritability and
    depression, includes fidgety behavior and
    involuntary movement (chorea), followed by
    psychosis and death.
•   Caused by CAG repeats within the coding
    region, giving a tract of glutamines. Below
    28 copies is normal, between 28 and 34
    copies is the premutation allele: normal
    phenotype but unstable copy number that
    puts the next generation at risk. Above 34
    copies gives the disease.
•   HD shows “anticipation”: the age of onset
    tends to get earlier with each generation.
    This is because the repeat often expands
    when it is passed on, and higher copy
    numbers cause earlier onset.
•   There is a genetic test for the disease, but in
    the absence of effective treatment few
    actually take the test.
•   The function of the protein remains unknown;
    the excess glutamines may cause it to
    aggregate and lose function.
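Because HD alleles are defined by copy number, classifying a sequenced allele comes down to counting the CAG tract and applying thresholds. This minimal sketch uses the cutoffs stated above. Published clinical guidelines use somewhat different boundaries, so treat the numbers here as illustrative. The function names and the example sequence are hypothetical.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Number of CAG units in the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def hd_category(cag_copies: int) -> str:
    """Classify a CAG copy number using the cutoffs given in the text."""
    if cag_copies < 28:
        return "normal"
    if cag_copies <= 34:
        return "premutation (normal phenotype, unstable in offspring)"
    return "disease allele"


allele = "CCG" + "CAG" * 41 + "CAACAGCCG"   # hypothetical sequence read
n = longest_cag_run(allele)
print(n, hd_category(n))   # 41 disease allele
```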
            Larger Scale Mutations
•   Larger mutations include insertion of whole new
    sequences, often due to movements of transposable
    elements in the DNA or to chromosome changes such
    as inversions or translocations.
•   Deletions of large segments of DNA also occur.
•   These phenomena affect the order of genes on the
    chromosome.
     – In classical genetics, synteny means that two genes are
       on the same chromosome. This term has a slightly
       different meaning in genomics and bioinformatics: that a
       group of genes are in the same order on the chromosome
       in different species.
     – Synteny tends to be conserved in closely related species,
       but breaks down in more distantly related species.
•   Also, the genes at the breakpoints of a large-scale
    mutation are often broken in half or otherwise
    disrupted.
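The genomics sense of synteny can be made concrete by comparing gene orders. The sketch below makes three assumptions. Orthologous genes have already been matched and given the same hypothetical names in both species. Each genome is a single chromosome. Each gene occurs once. The sketch reports runs of genes whose neighbours are also neighbours in the other species. Adjacency is compared without regard to orientation, so an inverted block still counts as conserved, and the breakpoint of the rearrangement separates the blocks.

```python
def adjacencies(order):
    """Unordered pairs of genes that sit next to each other on a chromosome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_adjacencies(order_a, order_b):
    """Neighbouring gene pairs shared by two species (orientation ignored)."""
    return adjacencies(order_a) & adjacencies(order_b)


def syntenic_blocks(order_a, order_b):
    """Runs of species A's genes whose neighbours are also neighbours in B."""
    if not order_a:
        return []
    shared = conserved_adjacencies(order_a, order_b)
    blocks, current = [], [order_a[0]]
    for left, right in zip(order_a, order_a[1:]):
        if frozenset((left, right)) in shared:
            current.append(right)
        else:                       # breakpoint: start a new block
            blocks.append(current)
            current = [right]
    blocks.append(current)
    return [block for block in blocks if len(block) > 1]


species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = ["g1", "g2", "g3", "g6", "g5", "g4"]   # g4..g6 inverted
print(syntenic_blocks(species_a, species_b))
# [['g1', 'g2', 'g3'], ['g4', 'g5', 'g6']]
```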
              Transposable Elements
•   Transposable elements are DNA sequences that
    move from place to place in the genome. Unlike
    genes, transposable elements don’t have a fixed
    location on the chromosome.
•   Transposable elements are essentially parasites. In
    general they don’t contribute to the evolutionary
    fitness of the organism.
•   Most of the genes in an organism are necessary, at
    least under some circumstances, for the organism’s
    survival. Genes avoid being destroyed by random
    mutations because individuals with mutated genes
    are less fit: don’t survive or reproduce as well as
    unmutated individuals.
•   Transposable elements avoid being destroyed by
    increasing their numbers enough to keep some
    functional copies present even if some are
    inactivated by mutation.
     – However, too much increase in numbers will kill the
       organism because sometimes transposable elements
       insert within a gene, inactivating it.
           More Transposable Elements
•   Two basic types: those that are strictly DNA, and those that replicate
    through an RNA intermediate. These are sometimes called class II and
    class I respectively, but I have a hard time keeping those arbitrary
    numbers straight. The most important nomenclature issue is that the
    prefix “retro-” implies the use of reverse transcriptase, which copies
    RNA into DNA, the defining characteristic of RNA-intermediate
    transposable elements.
•   Eukaryotes often contain very short (200-500 bp) elements that contain the
    ends of a longer DNA transposon and miscellaneous junk inside.
    They move to new locations using the transposase enzyme from a full
    length element.
•   Most bacterial TEs are DNA only. In eukaryotes, DNA transposable
    elements occur, but are less common than retrotransposons.
      – Transposable elements were first studied by Barbara
          McClintock in corn. They are an important source of the
          variation seen in ornamental flowers.
•   Most common type in bacteria: Insertion Sequences (IS)
      – roughly 1-3 kbp long, carrying a transposase gene and
          bounded by short (10-40 bp) inverted repeats
      – many different families, not well conserved across species
•   Transposons are longer TEs, usually composed of 2 IS elements with
    one or more genes in between, often an antibiotic resistance gene.
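The defining ends of an IS element can be checked computationally. The left end, read 5' to 3', should equal the reverse complement of the right end. This minimal sketch assumes an uppercase A/C/G/T sequence and perfect repeats (real IS termini often contain a few mismatches). It also uses the 10-40 bp length range given above. The example element and the function names are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat(element: str, min_len: int = 10, max_len: int = 40) -> int:
    """Length of the longest perfect terminal inverted repeat (0 if shorter than min_len)."""
    best = 0
    for k in range(min_len, min(max_len, len(element) // 2) + 1):
        # Left end must equal the reverse complement of the right end.
        if element[:k] == reverse_complement(element[-k:]):
            best = k
    return best


left_end = "GGTCATGCAT"                         # invented 10 bp terminus
element = left_end + "A" * 60 + reverse_complement(left_end)
print(reverse_complement(left_end))             # ATGCATGACC
print(terminal_inverted_repeat(element))        # 10
```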
                            Retro Elements
•   RNA transposable elements are called retrotransposons in
    eukaryotes. They are characterized by the use of reverse
    transcriptase in their life cycle.
•   They are related to retroviruses, such as HIV, feline leukemia
    virus, etc.
     –   Retrotransposons lack the gene necessary to move outside the cell.
•   There are a variety of retro element types, some of which
    contain long terminal repeats (LTRs) and some of which don’t.
•   Also, there are many non-functional, degenerate sequences
    in eukaryotic genomes that started out as retrotransposons.
     –   More than 40% of the human genome.
•   In bacteria, the common RNA TE is a “mobile group II intron”.
      – When transcribed into messenger RNA, they can splice
         themselves out without the need for proteins.
      – group II introns contain a gene for reverse transcriptase,
         which copies the RNA back into DNA at a new location
         in the genome.
 Recombination-Induced Mutations
• Most recombination occurs between
  homologous sites: two chromosomes
  line up in meiosis and have a break-
  and-rejoin event at the same location,
  resulting in daughter chromosomes
  that contain a mixture of alleles from
  both parents.
• However, any two sites that contain
  similar DNA sequences can pair up
  and have a crossover. These events
  can significantly rearrange the chromosome.
    Hemophilia A: Inversion Problems
•   The clotting factor VIII gene, F8, is on the X
    chromosome; mutations in F8 cause
    hemophilia A.
•   F8 is a large gene, and completely contained
    within intron 22 are two small genes
    transcribed from the opposite strand.
•   One of these genes, F8A, has another copy
    several hundred kb away, on the opposite
    strand. Thus, these two very similar genes
    are in opposite orientation.
•   Sometimes these regions pair during meiosis
    and recombination occurs between them. This
    results in an inversion.
•   The inversion completely disrupts the main
    F8 gene, because its 5’ half is now inverted
    and far away from its 3’ half.
•   This accounts for about 45% of severe hemophilia A cases.
•   Almost all new cases arise during male
    meiosis: in females, the two homologous X
    chromosomes are paired, which seems to
    inhibit this inversion.
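The inversion mechanism can be modelled on a toy chromosome string. A crossover between two copies of a sequence that sit in opposite orientations reverse-complements everything between them. The sketch assumes exactly one inverted copy upstream of one direct copy. The sequences are invented stand-ins for the F8A copies, the 5' and 3' halves of F8, and the DNA in between. They are not real F8 sequence, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def recombine_inverted_repeats(chrom: str, repeat: str) -> str:
    """Model a crossover between two inverted copies of `repeat`.

    Assumes one copy of reverse_complement(repeat) lies upstream of one copy
    of `repeat`; the span between them (copies included) is flipped in place.
    """
    left = chrom.find(reverse_complement(repeat))
    if left < 0:
        raise ValueError("no inverted copy of the repeat found")
    right_start = chrom.find(repeat, left + len(repeat))
    if right_start < 0:
        raise ValueError("no direct copy of the repeat downstream")
    right = right_start + len(repeat)
    return chrom[:left] + reverse_complement(chrom[left:right]) + chrom[right:]


# Invented stand-in sequences (not real F8 sequence).
repeat = "AACCG"              # F8A-like copy inside intron 22
flank = "CCCC"
spacer = "TTTTTTTT"           # DNA between the distant copy and the gene
five_prime_half = "ATGGAC"    # F8 exons upstream of intron 22
three_prime_half = "GGGCCC"   # F8 exons downstream of intron 22

chrom = (flank + reverse_complement(repeat) + spacer + five_prime_half
         + repeat + three_prime_half)
inverted = recombine_inverted_repeats(chrom, repeat)
print(chrom)
print(inverted)   # the 5' half is now reversed and separated from the 3' half
```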
             Tandem Duplications
• Genes are duplicated if there is more than one copy present in the
  haploid genome.
    – Some duplications are “dispersed”, found in very different locations from
      each other.
    – Other duplications are “tandem”, found next to each other.
• Tandem duplications play a major role in evolution, because it is
  easy to generate extra copies of the duplicated genes through the
  process of unequal crossing over.
    – These extra copies can then mutate to take on altered roles in the cell,
      or they can become pseudogenes, inactive forms of the gene, by
      accumulating inactivating mutations.
• Most commonly tandem duplications affect only one gene, resulting
  in an array of very similar genes.
    – Sometimes duplicated regions exist within a gene, which can cause
      havoc when trying to align the sequences.
                   Unequal Crossing Over
•   Unequal crossing over happens during prophase
    of meiosis 1. Homologous chromosomes pair at
    this stage, and sometimes pairing occurs between
    the similar but not identical copies of a tandem
    duplication. If a crossover occurs within the
    mispaired copies, one of the resulting gametes will
    have an extra copy of the duplication and the
    other will be missing a copy.
•   As an example, the beta-globin gene cluster in
    humans contains 6 genes, called epsilon (an
    embryonic form), gamma-G, gamma-A (the
    gammas are fetal forms), pseudo-beta-one (an
    inactive pseudogene), delta (1% of adult beta-type
    globin), and beta (99% of adult beta-type globin).
    Gamma-G and gamma-A are very similar, differing
    by only 1 amino acid.
•   If mispairing in meiosis occurs, followed by a
    crossover between delta and beta, the hemoglobin
    variant Hb-Lepore is formed. This is a gene that
    starts out delta and ends as beta. Since the gene
    is controlled by DNA sequences upstream from
    the gene, Hb-Lepore is expressed as if it were a
    delta. That is, it is expressed at about 1% of the
    level that beta is expressed. A person who inherits
    Hb-Lepore from both parents makes no normal beta
    globin and has severe anemia.
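Unequal crossing over is easy to model if a chromosome is reduced to an ordered list of segments. In this sketch, the delta and beta genes are each split into hypothetical 5' and 3' halves so that a crossover can fall inside a gene. The labels are illustrative, not real sequence, and the function name is hypothetical.

```python
def crossover(chrom1, chrom2, i, j):
    """Exchange segments after position i of chrom1 and position j of chrom2.

    i == j models normal, equal crossing over between correctly paired
    chromosomes; i != j models a crossover after mispairing.
    """
    product1 = chrom1[:i + 1] + chrom2[j + 1:]
    product2 = chrom2[:j + 1] + chrom1[i + 1:]
    return product1, product2


# Hypothetical segment labels for the beta-globin cluster.
cluster = ["epsilon", "gamma-G", "gamma-A", "pseudo-beta-1",
           "delta-5'", "delta-3'", "beta-5'", "beta-3'"]

# Mispairing: delta on one chromosome lines up with beta on the other,
# and the crossover falls between the 5' and 3' halves.
i = cluster.index("delta-5'")
j = cluster.index("beta-5'")
lepore, anti_lepore = crossover(cluster, cluster, i, j)
print(lepore)        # ends with delta-5', beta-3': no intact beta gene
print(anti_lepore)   # delta, then a beta-5'/delta-3' fusion, then beta
```

The first product carries the delta-5'/beta-3' fusion and no intact beta gene, matching Hb-Lepore. The reciprocal product gains a beta-5'/delta-3' fusion (anti-Lepore) between intact delta and beta genes.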
              Chromosome Breaks
•   DNA sometimes breaks due to mechanical stress,
    ionizing radiation, or chemical attack.
•   Most organisms contain enzymes that rejoin
    broken DNA molecules, a process called
    non-homologous end joining.
•   If there is more than one break, ends are joined
    randomly, which can lead to a rearranged
    chromosome.
     – This breaks up blocks of genes over evolutionary
       time.
          Horizontal Gene Transfer
•   In eukaryotes, there is little doubt that almost all
    genes are transmitted from parent to offspring,
    with each species having a separate line of
    descent.
     –   Large exceptions: endosymbionts, the mitochondria
         and chloroplasts. Many genes from these formerly
         free-living organisms have migrated into the nucleus.
     –   There are other cases of single genes being
         transferred horizontally.
•   This is much less true in the prokaryotes, where a
    great deal of DNA is transferred across species
    boundaries.
     –   I have seen an estimate that 15% of all prokaryotic
         genes are derived from horizontal transfers.
•   Horizontal gene transfer is usually identified by
    performing phylogenetic lineage studies on
    individual genes, and seeing that some gene has
    more in common with genes in distant species
    than with genes in closely related species.
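A crude first-pass screen for the pattern described above can be sketched in Python. This is a simpler, swapped-in technique, not the phylogenetic analysis the text describes. It flags a gene when its best match in some distant species is more similar than its best match in any close relative. Flagged genes are only candidates and would still need tree-based confirmation. The data structure, the percent-identity numbers and the species labels are all invented for illustration, and the function name is hypothetical.

```python
def flag_possible_transfers(similarity, close_relatives):
    """Return genes that match a distant species better than any close relative.

    similarity maps gene -> {species: percent identity of the best homolog
    found in that species}; close_relatives is a set of species names.
    """
    flagged = []
    for gene, by_species in similarity.items():
        close = max((pid for sp, pid in by_species.items() if sp in close_relatives),
                    default=0.0)
        distant = max((pid for sp, pid in by_species.items() if sp not in close_relatives),
                      default=0.0)
        if distant > close:
            flagged.append(gene)
    return flagged


# Invented numbers for three hypothetical genes.
similarity = {
    "geneA": {"relative_1": 92.0, "relative_2": 88.0, "distant_1": 45.0},
    "geneB": {"relative_1": 85.0, "distant_1": 50.0},
    "geneC": {"relative_1": 40.0, "relative_2": 42.0, "distant_1": 88.0},
}
print(flag_possible_transfers(similarity, {"relative_1", "relative_2"}))  # ['geneC']
```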
          Sources of New DNA
•   Bacteria reproduce by binary
    fission: replicating their DNA,
    then splitting in half. Each
    cell has only 1 parent, and
    there is no regular sexual reproduction.
•   Bacteria have 3 main ways
    of bringing in new DNA:
     – conjugation: direct transfer
       of DNA between 2 cells
       (although not necessarily of
       the same species)
     – transduction: transfer of
       DNA between cells using a
       bacteriophage (virus) as an intermediary
     – transformation: the cell
       takes up DNA molecules
       from the environment
      Lysogenic Bacteriophage
•   Bacteriophage (phage) are bacterial viruses: DNA (or RNA) surrounded by
    a protein coat, but with no internal metabolic activity.
•   Most bacteriophage enter the cell, hijack its machinery to reproduce
    themselves, and then kill the cell by lysing it (breaking it open). This is
    called the lytic cycle.
•   Some phage have the ability to insert themselves into the bacterial genome
    and remain there, inactive, for many generations: the lysogenic cycle.
     – First described in phage lambda
     – the inserted phage chromosome is called the prophage.
•   When conditions get harsh, the phage DNA comes out of the chromosome
    and enters the normal lytic pathway. It reproduces and kills the host cell.
•   Sometimes the prophage is inactivated by mutation and becomes a
    permanent part of the chromosome.
