Types and Sources
• Mutation is a decay force whose ultimate roots are in the second law of
thermodynamics (entropy). Living things survive inevitable mutations by a
combination of being tolerant of a certain level of mutation, repairing mutational
damage, killing cells that are mutated beyond repair, and relying on natural selection
to remove individuals with unfavorable mutations.
• Simple mutations: base substitutions and small indels. “Indel” stands for insertion-
deletion, which is based on the idea that when you see a difference in DNA sequence
between two species it is usually difficult to tell whether there was an insertion in one
species or a deletion in the other.
• More complex mutations are larger events involving the insertion, rearrangement, or
deletion of large pieces of DNA. Typical events include fusion of two different genes
and insertion of transposable elements.
• Internal sources: DNA polymerase can insert the wrong nucleotide or slip at a certain
rate. Transposable elements can move, cause other sections of DNA to move, or
produce reverse transcriptase that acts on other messenger RNAs.
• External sources: damage to the DNA caused by chemicals in the environment,
including oxygen, or by radiation.
• DNA polymerase, the enzyme that
replicates DNA, is not perfectly
accurate. One problem is that bases
spontaneously undergo a “keto-enol
shift”, where a hydrogen moves its
position in ketones. Guanine and
thymine bases are subject to this at a
low rate, and it causes mispairing.
• DNA polymerase has a proofreading
function, a 3’ to 5’ exonuclease
activity, which backs up and removes
newly inserted nucleotides if they are
mispaired. This function lowers the
DNA polymerase error rate from about
1 error in 10^6 nucleotides to about 1 in
10^9. Still, that is about 6 errors every
time the genome is replicated.
• DNA polymerase also can slip,
especially when replicating short
repeats (microsatellites). This
generates small indels.
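The error-rate arithmetic above can be checked with a short sketch. The diploid genome size of 6 x 10^9 bp is an assumption that does not appear in the text; it is the value that reproduces the "about 6 errors" figure.

```python
# Expected replication errors per round of genome replication,
# with and without proofreading.
# GENOME_BP is an assumed diploid human genome size (not given in the text).
GENOME_BP = 6e9
ERROR_RATE_NO_PROOFREADING = 1e-6  # about 1 error per 10^6 nucleotides
ERROR_RATE_PROOFREADING = 1e-9     # about 1 error per 10^9 nucleotides


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of misincorporated bases in one round of replication."""
    return genome_bp * error_rate


if __name__ == "__main__":
    print("without proofreading:",
          expected_errors(GENOME_BP, ERROR_RATE_NO_PROOFREADING))
    print("with proofreading:",
          expected_errors(GENOME_BP, ERROR_RATE_PROOFREADING))
```

Proofreading reduces the expected count by a factor of about 1000, from thousands of errors per replication to a handful.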
• Another chemical instability is that cytosine
occasionally gets deaminated: it loses an amino
group. This converts it into uracil, which is not a DNA
base and is removed by repair enzymes.
• However, in many places, a C followed by a G (CpG:
the “p” is the connecting phosphate) gets methylated:
a CH3 group is attached to the 5 position on the ring.
• When 5-methyl cytosine is spontaneously
deaminated, it is converted to thymine, a standard
DNA base. Replication leads to a base change: one
daughter stays a C-G base pair while the other is
converted to T-A.
• Over evolutionary time, this has led to a loss of CpG
dinucleotides in human DNA.
• However, methylation of cytosine is associated with
gene inactivation, and genes that are expressed in
most cells (housekeeping genes) usually do not have
methylated cytosines at their 5’ ends. In these areas,
the frequency of CpG stays high.
• These areas of high CpG are called “CpG islands”.
There are about 30,000 of them in the human genome,
and most of them are associated with genes.
• However, the presence of a CpG island does not
necessarily imply the existence of a gene, and vice
versa.
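CpG depletion can be quantified by comparing the number of CpG dinucleotides observed in a sequence with the number expected from its C and G content. The sketch below uses an observed/expected ratio that is commonly used for this purpose. The text does not say how CpG islands are defined, so the function name and the example sequences are illustrative assumptions.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Values well below 1 indicate CpG depletion; values near or above 1
    are what one would look for in a candidate CpG island.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cg = seq.count("CG")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both C and G
    return n_cg * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than real genomic windows.
    print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich
    print(cpg_observed_expected("CATGCATG"))  # contains C and G but no CpG
```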
• Two basic types:
– transition: converting one purine to the other purine, or
one pyrimidine into the other pyrimidine.
– transversion: converting a purine to a pyrimidine or the
reverse.
• Logically, transversions should be twice as frequent,
since there are twice as many possible transversions as transitions.
• However, in practice, transitions are about twice as
common as transversions, due to a combination of
natural selection and ease of occurrence.
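The "twice as many" count can be verified by enumerating all twelve possible single-base changes. This is a minimal sketch; the function names are illustrative.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def count_substitution_types() -> dict:
    """Count transitions and transversions among all ordered base pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in permutations("ACGT", 2):
        counts[substitution_type(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    print(count_substitution_types())
```

Of the twelve possible changes, four are transitions and eight are transversions, which is why the observed excess of transitions is notable.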
• Neutral substitution rate: how often nucleotides
change in the absence of selection pressure. In a
comparison of the human and mouse genomes, 165
Mbp of DNA associated with non-functional transposon
sequences were identified in both species. These had
about 67% identical bases, and models implied a rate
of 0.46 substitutions per position over the 75 million
years since the human and mouse lineages diverged.
This works out to 2 x 10^-9 substitutions per year for
each site, in the absence of selection pressure. This
estimate agrees with other estimates.
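The step from 33% observed differences to 0.46 substitutions per position reflects a correction for multiple changes at the same site. The sketch below uses the Jukes-Cantor correction purely as an illustration. That choice is an assumption: the text does not say which models were used, and this simple model gives a somewhat lower value (roughly 0.43) than the 0.46 quoted.

```python
import math


def jukes_cantor_distance(p_diff: float) -> float:
    """Estimated substitutions per site from the observed fraction of
    differing sites, under the Jukes-Cantor model (all changes equally likely).
    """
    if not 0.0 <= p_diff < 0.75:
        raise ValueError("p_diff must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_diff)


if __name__ == "__main__":
    identical = 0.67            # fraction of identical bases, from the text
    p_diff = 1.0 - identical    # observed fraction of differing sites
    print(round(jukes_cantor_distance(p_diff), 3))
```

The corrected distance exceeds the raw 0.33 because some sites have changed more than once, and some changes have reverted or coincided in both lineages.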
Substitutions Within Genes
• We mostly care about the functional parts of
the genome, the genes and their control
regions. Since most of the genes are
presumably necessary for life, some
mutations will be deleterious and others not.
• In the human-mouse genome comparison,
variation in the rate of substitutions across
the various portions of genes was clear:
fewest in the exons, most in the introns, and
an intermediate amount in the UTRs and
• For coding regions, the degeneracy of the
genetic code has a large effect.
– some sites are non-degenerate: any change
results in a different amino acid. 65% of
codon sites.
– other sites are two-fold degenerate:
transitions give the same amino acid while
transversions give a different amino acid. 19%
of codon sites.
– other sites are four-fold degenerate: any
mutation gives the same amino acid. These
sites are all third positions of codons. 16% of
codon sites.
• Mutations that give the same amino acid are
called silent or synonymous mutations.
They are presumed to be selectively neutral.
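The degeneracy classes above can be derived directly from the standard genetic code. The sketch below counts, for a given codon position, how many of the four bases encode the same amino acid. It is a simple count-based classification under the standard code, not the weighted procedure that produced the percentages above, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon: str, position: int) -> int:
    """Number of bases (1 to 4) at `position` (0, 1 or 2) that give the
    same amino acid as the original codon.

    1 = non-degenerate, 2 = two-fold, 4 = four-fold degenerate.
    A value of 3 occurs at the third position of the isoleucine codons.
    """
    codon = codon.upper().replace("U", "T")
    original = CODON_TABLE[codon]
    same = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == original:
            same += 1
    return same


if __name__ == "__main__":
    for codon in ("GGT", "GAT", "ATG"):
        print(codon, [site_degeneracy(codon, pos) for pos in range(3)])
```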
More on Substitution
• In addition to synonymous
mutations, some amino acid
changes are “conservative” in
that they have little or no effect
on the protein’s function.
– for example, isoleucine and
valine are both hydrophobic
and readily substitute for each
other.
– other amino acid substitutions
are very unlikely: leucine
(hydrophobic) for aspartic acid
(hydrophilic and charged). This
would be a non-conservative
substitution.
– Some amino acids play unique
roles: cysteines form disulfide
bridges, prolines induce kinks
in the chain, etc.
– However, some amino acids
are critical for active sites and
cannot be substituted.
• Tables of substitution
frequencies for all pairs of
amino acids have been compiled.
BLOSUM62 Table. Numbers on the diagonal indicate the likelihood of the amino acid
staying the same. The off-diagonal numbers are relative substitution frequencies.
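As an illustration of how such a table is used, the sketch below looks up a few entries copied from the standard BLOSUM62 matrix, covering the isoleucine/valine and leucine/aspartic acid examples above. Only three entries are included. Treating a positive score as "conservative" is a convention assumed here for illustration, not something the text states.

```python
# A small excerpt of BLOSUM62; the full matrix covers all 20 x 20 pairs.
# Keys are unordered pairs, so (I, V) and (V, I) share one entry.
BLOSUM62_EXCERPT = {
    frozenset("IV"): 3,    # isoleucine <-> valine, both hydrophobic
    frozenset("LD"): -4,   # leucine <-> aspartic acid
    frozenset("C"): 9,     # cysteine staying cysteine (diagonal entry)
}


def blosum62_score(aa1: str, aa2: str) -> int:
    """Look up a score in the excerpt; raises KeyError for pairs not included."""
    return BLOSUM62_EXCERPT[frozenset((aa1.upper(), aa2.upper()))]


def is_conservative(aa1: str, aa2: str) -> bool:
    """Assumed convention: a positive score marks a conservative change."""
    return blosum62_score(aa1, aa2) > 0


if __name__ == "__main__":
    print("I/V:", blosum62_score("I", "V"), is_conservative("I", "V"))
    print("L/D:", blosum62_score("L", "D"), is_conservative("L", "D"))
```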
Detecting Natural Selection
• Patterns of base substitution within a gene can be used as evidence for natural
selection, by comparing the ratio of synonymous to non-synonymous substitutions.
• Compare orthologs: genes in two different species that can be traced to a common
ancestor.
• Can also compare paralogs within a species: genes resulting from duplication.
– a confounding problem: can you accurately identify orthologs between species, or are you
comparing paralogs between the species?
• Measured by comparing KS, the number of synonymous substitutions per site, to KA,
the number of non-synonymous substitutions per site. Note that these numbers are
corrected for the different levels of degeneracy for each site. The summary statistic
is the KA / KS ratio.
• Possible results.
– neutral evolution: the gene is apparently not being selected. Often seen when a pseudogene
is compared to a functional gene. Synonymous and non-synonymous substitutions occur at
the same frequency. KA / KS = 1.
– negative (purifying) selection: the gene is being selected for similar functions in both species.
Synonymous substitutions are more frequent than non-synonymous. KA / KS < 1
– positive (diversifying) selection: the gene is being selected for different functions in the two
species. An unexpectedly high number of non-synonymous substitutions. KA / KS > 1
• The median KA / KS value for humans vs. mice was 0.115. The lowest values
(greatest purifying selection) were for calmodulin, histones, ribosomal proteins,
ubiquitin, and actin: genes involved with critical cellular functions common to all
organisms. The highest ratios were seen for defense and immune response proteins.
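The three possible outcomes listed above can be written as a small decision rule on the KA / KS ratio. This sketch assumes KA and KS have already been estimated; the `tolerance` band around 1 is a hypothetical parameter, since in practice the call depends on a statistical test rather than a fixed cutoff.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a KA / KS ratio.

    ka: non-synonymous substitutions per non-synonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: assumed width of the band around 1 treated as neutral
    """
    if ks <= 0:
        raise ValueError("ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


if __name__ == "__main__":
    # The human vs. mouse median ratio quoted in the text.
    print(classify_selection(ka=0.115, ks=1.0))
```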
• Trinucleotide repeats (TNRs) are a type of microsatellite, an array of
3 bp repeats.
• DNA polymerase often slips at TNRs, increasing or decreasing the
number of repeats.
• Because a codon is 3 bp long, TNRs within a coding region don’t
change the reading frame.
• However, some TNRs cause diseases even though they are in the
• There are only 10 possible TNRs, considering the two DNA strands
and the different orders you could write the bases. For example,
the TNR that causes Fragile X syndrome could be written as CGG,
GGC, GCG, CCG, CGC, or GCC.
• Below a certain number, the repeats are relatively stable. But,
above that, the copy number can change drastically in both mitosis
and meiosis. These alleles are called “pre-mutation alleles”. Above
an even higher point, the mutant phenotype appears.
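The count of 10 possible TNRs can be reproduced by grouping all trinucleotides that are the same repeat read in a different frame or from the opposite strand. This is a minimal sketch with illustrative function names; single-base runs such as AAA are excluded because they are mononucleotide repeats.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def equivalent_forms(unit: str) -> frozenset:
    """All ways of writing the same repeat: every rotation on both strands."""
    forms = set()
    for strand in (unit, reverse_complement(unit)):
        for i in range(len(strand)):
            forms.add(strand[i:] + strand[:i])
    return frozenset(forms)


def trinucleotide_repeat_classes() -> set:
    """Distinct trinucleotide repeat classes."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        unit = "".join(bases)
        if len(set(unit)) == 1:
            continue  # AAA, CCC, GGG, TTT are mononucleotide repeats
        classes.add(equivalent_forms(unit))
    return classes


if __name__ == "__main__":
    print(len(trinucleotide_repeat_classes()))
    print(sorted(equivalent_forms("CGG")))
```

Each class contains six written forms (three reading frames on each of two strands), so the 60 trinucleotides that are not single-base runs fall into 10 classes.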
• Huntington Disease. A dominant autosomal
disease, with most affected people heterozygotes.
• Onset usually in middle age.
• Neurological: starts with irritability and
depression, includes fidgety behavior and
involuntary movement (chorea), followed by
psychosis and death.
• Caused by CAG repeats within the coding
region, giving a tract of glutamines. Below
28 copies is normal, between 28 and 34
copies is the premutation allele: normal
phenotype but unstable copy number that
puts the next generation at risk. Above 34
copies gives the disease.
• HD shows “anticipation”: the age of onset
tends to get earlier with each generation. This is
because copy number tends to increase between
generations, and higher copy numbers are
correlated with earlier onset.
• There is a genetic test for the disease, but in
the absence of effective treatment few
actually take the test.
• Function of the protein remains unknown;
the excess glutamines may cause it to
aggregate and lose function.
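The copy-number ranges quoted above for the CAG repeat can be expressed as a simple lookup. The cutoffs in this sketch are those given in the text and are not meant as clinical reference values; the function name is illustrative.

```python
def classify_hd_allele(cag_repeats: int) -> str:
    """Classify a Huntington disease allele by its CAG copy number,
    using the ranges quoted in the text."""
    if cag_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cag_repeats < 28:
        return "normal"
    if cag_repeats <= 34:
        # Normal phenotype, but unstable copy number in the next generation.
        return "premutation"
    return "disease"


if __name__ == "__main__":
    for n in (20, 30, 45):
        print(n, classify_hd_allele(n))
```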
Fragile X Syndrome
• Fragile X syndrome. The most common inherited
form of intellectual disability.
• The phenotype includes moderate to severe
intellectual disability, macroorchidism, large
ears, prominent jaw, and high-pitched,
jocular speech. Expression is variable, with
intellectual disability the most common
feature.
• Males, having only 1 X, are affected more
frequently and severely than females.
• Named for a secondary constriction on the
X that appears in cells starved for folate.
The X can actually break at that point, but
this isn’t a common feature.
• Caused by CGG repeats in the 5’ UTR of
the FMR1 gene.
• Normal copy number is about 30. Between
55 and 200 copies, the copy number is
unstable, but the person is normal. Above
200 copies, the mutant phenotype appears.
• The gene gets heavily methylated and is not
expressed.
• The function of the protein is unclear, but it
is an RNA-binding protein that seems to be
involved with translational regulation,
possibly through RNA interference as part of
the RISC complex.
Mutations Affecting RNA
• Altered promoters, splice sites, poly-A sites.
• If a cell contains two different copies of
a gene, either on homologous
chromosomes or as paralogs,
sometimes one copy will “convert” the
other copy to its sequence.
• Gene conversion (at least between
homologues) is a normal outcome of
recombination. We need to look at the
Holliday molecular model of
recombination to understand this. This
model is a bit simple compared to
current theory, but is still basically
correct.
• The homologues are paired in
prophase of meiosis 1.
• Single stranded breaks in both
homologues are catalyzed by
• The free ends invade the homologous
DNA, forming heteroduplexes.
• “Branch migration” occurs and the
heteroduplexes are extended.
More Gene Conversion
• Recombinase cuts the DNA
• Two possibilities at this
point, occurring with equal
frequency:
• 1. A “north-south” cut occurs
after the 2 DNA molecules
twist relative to each other.
The result is a crossover: the
two homologues are broken
and rejoined at this point,
giving recombinant
chromosomes. Note that
there is a heteroduplex
region at the breakpoint.
• 2. The other possibility is
that an “east-west” cut
occurs. This gives a
heteroduplex
region, but the 2
chromosomes are still
intact: no crossover has
occurred.
• However, if the
heteroduplex region is
within a gene that is
being monitored, it will
result in an offspring with
an altered gene: gene
conversion.
Steroid 21-Hydroxylase Deficiency
• The medical condition is “congenital adrenal hyperplasia”, an autosomal
recessive condition. 21-hydroxylase is an enzyme necessary for converting
cholesterol into aldosterone and cortisol. Aldosterone affects kidney
function: causes salt to be retained. Cortisol is the main stress response
hormone.
• The biggest problem is that hormone precursors build up in the adrenals
and get converted to testosterone, the major male hormone. This causes
the external genitalia to develop into the male pattern, or develop
“ambiguous genitalia” regardless of the individual’s gender (“virilization”). In
milder cases, and in males, puberty occurs early in childhood. Female
embryos develop a normal uterus and ovaries.
• In some cases, salt is not retained in the body well, which is life-threatening
but treatable with hormones.
• The functional gene, CYP21A2, is located about 30 kb from a pseudogene,
CYP21A1P, on chromosome 6p. The pseudogene contains 9 mutations that
inactivate it. Almost all cases result from one of two causes:
– An unequal crossing over between these loci, resulting in a normal 5’ end of the
gene and a mutant 3’ end (from the pseudogene), plus deletion of all the
DNA between them.
– Gene conversion converts part of the normal allele to the pseudogene sequence.
Hemophilia A: Inversion Problems
• The clotting factor VIII gene, F8, is on the X
chromosome; mutations in it are the major cause of
hemophilia.
• F8 is a large gene, and completely contained
within intron 22 are two small genes
transcribed from the opposite strand.
• One of these genes, F8A, has another copy
several hundred kb away, on the opposite
strand. Thus, these two very similar genes
are in opposite orientation.
• Sometimes, during meiosis, these regions
will pair and recombination will
occur. This results in an inversion.
• The inversion completely disrupts the main
F8 gene, because its 5’ half is now inverted
and far away from its 3’ half.
• This accounts for about 45% of hemophilia A
cases.
• Almost all new cases arise during male
meiosis: in females, the two homologous X
chromosomes are paired, which seems to
inhibit this inversion.
Transposable Element Insertions
• Functional copies of LINE-1 elements, Alu sequences,
and some endogenous retroviral sequences (LTR
retrotransposons) exist in the human genome. They
occasionally transpose into genes, giving a detectable
phenotype.
• The first examples found were two independent
insertions of the 3’ end of LINE-1 into exons of the
clotting factor 8 gene. Additional examples have been
found since.
• Transposable element movement has also been
implicated in cancer and the chromosome
rearrangements that accompany it.
• Recombination between Alu sequences in different parts
of the genome can generate deletions.
• A list of agents that damage DNA:
– ionizing radiation: induces breaks in DNA
– ultraviolet light: crosslinks adjacent thymidines (thymidine
dimers)
– alkylating agents: attach hydrocarbon groups to bases, either
blocking DNA polymerase or crosslinking the bases
– intercalating agents: slip between the DNA bases and cause
DNA polymerase to insert extra bases or misread the sequence.
– depurination: the link between purine bases and the deoxyribose
is broken, leaving a site with no base
– deamination: loss of the amino group from cytosine converts it to
uracil
– reactive oxygen: peroxide and superoxide attack the purine and
pyrimidine bases
• There are at least 5 separate DNA repair
mechanisms in human cells.
• Direct repair, simply reversing the damage,
is possible in some cases, notably removing
methyl groups from guanine.
• Base excision repair. A damaged base is
removed from its sugar by a DNA
glycosylase (several types). After this, the
DNA strand is cut by AP endonuclease and
the sugar-phosphate without its base is
removed from the DNA chain. A new
nucleotide is added by DNA polymerase and
the chain is re-ligated.
• Nucleotide excision repair. Abnormal bases,
including thymidine dimers, are removed
along with a number of surrounding bases.
The missing section is then re-synthesized
and ligated. Xeroderma pigmentosum, a
genetic disease that causes extreme
sensitivity to sunlight, is due to defects in
this repair system.
• Post-replication repair. Double stranded
breaks are repaired by randomly joining
DNA ends, or by a gene-conversion-like
mechanism that involves the homologous
chromosome. The breast cancer
susceptibility genes BRCA1 and BRCA2 are
involved in this pathway.
• Mismatch repair. Mispaired bases (those
not caught by the DNA polymerase’s editing
function) are repaired by an enzyme
complex that moves along the DNA. When
it finds a mismatched base pair, it removes a
number of bases on one of the DNA strands
and re-synthesizes them. The gene for
hereditary non-polyposis colon cancer is
involved in this system.
• In addition, cells with DNA damage are often
induced to kill themselves through the
process of apoptosis, or they stop dividing
by not entering the S phase of the cell cycle.
More on this when we talk about cancer.