Changes in DNA

Mutations
• Any change in the DNA sequence of an organism is a mutation.
• Mutation is a decay force whose ultimate roots are in the second law of thermodynamics (entropy). Living things survive inevitable mutations by a combination of tolerating a certain level of mutation, repairing mutational damage, killing cells that are mutated beyond repair, and relying on natural selection to remove individuals with unfavorable mutations.
• Mutations are the source of the altered versions of genes that provide the raw material for evolution.
• A central tenet of biology is that the flow of information from DNA to protein is one way. DNA cannot be altered in a directed way by changing the environment. Only random DNA changes occur.
• Some terminology: the genotype is the organism’s genetic constitution; at bottom, the sequence of its DNA. The phenotype is the physical characteristics of the organism: its appearance, biochemistry, reactions to the environment, etc.
  – Before DNA sequencing, the genotype was deduced from the phenotypes of parents and offspring.
  – The point of genome annotation is to deduce the phenotype that will result from a given genotype.

More Mutation Generalities
• Most mutations have no effect on the organism, especially among the eukaryotes, because a large portion of the DNA is not in genes and thus does not affect the organism’s phenotype.
• Even within genes, mutations can have little or no effect:
  – The genetic code is degenerate: some mutated codons are still translated into the same amino acid.
  – Many amino acid changes have little or no effect on protein function.
• Of the mutations that do affect the phenotype, the most common effect is lethality, because most genes are necessary for life.
• From a bioinformatics point of view, the three simplest types of mutation (base substitutions, small insertions and deletions, and simple sequence repeats) affect sequence alignment programs.
Larger mutations, such as transposable element movements, recombination-induced mutations, and general chromosome rearrangements, affect large-scale issues such as genomic maps.

Base Change Mutations
• The simplest mutations are base changes, where one base is converted to another (also called “substitutions” or “point mutations”). These can be classified as either:
  – “transitions”, where one purine is changed to another purine (A -> G, for example), or one pyrimidine is changed to another pyrimidine (T -> C, for example).
  – “transversions”, where a purine is substituted for a pyrimidine, or a pyrimidine is substituted for a purine. For example, A -> C.
• Transitions are more common than transversions, because they are easier to create, and because transitions often have less drastic effects than transversions.
• Base change mutations are the cause of single nucleotide polymorphisms (SNPs). Mapping SNPs is the current best way to locate human disease genes.
• Base change mutations are the most common mutations, and they are the easiest to handle for statistics and evolutionary studies.

      A     C     G     T
A    0.6   0.1   0.2   0.1
C    0.1   0.6   0.1   0.2
G    0.2   0.1   0.6   0.1
T    0.1   0.2   0.1   0.6

Base Change Causes
• Base changes occur naturally as errors in replication: the wrong base gets inserted.
  – DNA polymerase has an editing function that detects most errors, then backs up, removes the wrong base and puts in the proper base.
  – Enzymes that replicate RNA don’t have the editing function, so their error rate is 100 times that of DNA polymerase, causing the high mutation rate of RNA viruses.
• Various chemical changes in a base can cause mutation. For instance, the spontaneous loss of the amino group on cytosine converts it to uracil (which will pair with A, not G).
• Environmental chemicals that attach bulky groups onto bases (alkylating agents) can cause the bases to be misread by DNA polymerase.
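
The transition/transversion distinction from the Base Change Mutations slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `classify_substitution` and `count_substitutions` are made up for this example, and the two sequences are assumed to be already aligned, of equal length, and free of gaps.

```python
# Purines and pyrimidines, as used in the transition/transversion definitions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as 'transition', 'transversion', or 'none'.

    Hypothetical helper for illustration; accepts upper- or lower-case DNA bases.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError(f"not a DNA base: {ref!r} or {alt!r}")
    if ref == alt:
        return "none"
    # Same chemical class (purine->purine or pyrimidine->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Count transitions and transversions between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length (aligned, no gaps)")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind != "none":
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitution("A", "G"))   # purine -> purine
    print(classify_substitution("A", "C"))   # purine -> pyrimidine
    print(count_substitutions("ACGTACGT", "GCGTCCGA"))
```

Real alignments contain gaps and ambiguous bases, which this sketch deliberately rejects or ignores; an alignment program would have to handle them separately.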

Phenotypic Effects of Base Changes
• Mutations can be classified according to their effects on the protein (or mRNA) produced by the gene that is mutated.
• 1. Silent mutations (synonymous mutations). Since the genetic code is degenerate, several codons produce the same amino acid. In particular, third-base changes often have no effect on the amino acid sequence of the protein. These mutations affect the DNA but not the protein. Therefore they are called neutral mutations: mutations which should have no effect on the organism’s phenotype.
• 2. Missense mutations. Missense mutations substitute one amino acid for another. Some missense mutations have very large effects, while others have minimal or no effect. It depends on where the mutation occurs in the protein’s structure, and how big a change in the type of amino acid it is.
• 3. Nonsense mutations convert an amino acid codon into a stop codon. The effect is to shorten the resulting protein. Sometimes this has only a little effect, as the ends of proteins are often relatively unimportant to function. However, nonsense mutations often result in completely non-functional proteins.
• 4. Sense mutations are the opposite of nonsense mutations. Here, a stop codon is converted into an amino acid codon. Since DNA outside of protein-coding regions contains an average of 3 stop codons per 64 codons, translation usually stops after producing a slightly longer protein.
• Base changes can also affect RNA initiation, splicing and termination.

More on Substitution
• In addition to synonymous mutations, some amino acid changes are “conservative” in that they have little or no effect on the protein’s function.
  – For example, isoleucine and valine are both hydrophobic and readily substitute for each other.
  – Other amino acid substitutions are very unlikely: leucine (hydrophobic) for aspartic acid (hydrophilic and charged). This would be a non-conservative substitution.
  – Some amino acids play unique roles: cysteines form disulfide bridges, prolines induce kinks in the chain, etc.
  – However, some amino acids are critical for active sites and cannot be substituted.
• Tables of substitution frequencies for all pairs of amino acids have been generated.

BLOSUM62 Table. Numbers on the diagonal indicate the likelihood of the amino acid staying the same. The off-diagonal numbers are relative substitution frequencies.

Indels
• Another simple type of mutation is the gain or loss of one or a few bases. These mutations are called indels, which is short for “insertion/deletion”.
  – When comparing two species it isn’t easy to tell whether an insertion occurred in one species or a deletion occurred in the other.
• Indels are thought to be generated when the DNA polymerase slips forward or backward on the template DNA it is copying.
  – This occurs most easily in repeated sequences, but can occur anywhere.
• A second cause of short indels is chemical- or radiation-induced loss of the base portion of the nucleotide. The DNA polymerase often skips right over these sugar/phosphate stumps, leaving a missing base in the resulting DNA chain.

Frameshifts and Reversions
• Translation occurs codon by codon, examining nucleotides in groups of 3. If a nucleotide or two is added or removed, the grouping of the codons is altered. This is a frameshift mutation, where the reading frame of the ribosome is altered.
• Frameshift mutations result in all amino acids downstream from the mutation site being completely different from wild type. These proteins are generally non-functional.
• A reversion is a second mutation that reverses the effects of an initial mutation, bringing the phenotype back to wild type (or almost).
  – Frameshift mutations sometimes have “second site reversions”, where a second frameshift downstream from the first frameshift restores the reading frame.

Microsatellites/Simple Sequence Repeats
• Two names for the same phenomenon.
• During replication, DNA polymerase can “stutter” when it replicates several tandem copies of a short sequence, say 2-5 bp.
  – For example, CAGCAGCAGCAG, 4 copies of CAG, will occasionally be converted to 3 copies or 5 copies by DNA polymerase stuttering.
• Outside of genes, this effect produces useful genetic markers called SSRs (simple sequence repeats).
• They are heavily used in genetic mapping, for several reasons.
  – They are easy to detect.
  – They are fairly stable across generations yet have a high enough mutation rate that many alleles exist in the population.
  – They are found in many locations in the genomes of all organisms.
• Within a gene, this effect can cause certain amino acids to be repeated many times within the protein. In some cases this causes disease.

Huntington Disease
• Huntington Disease is a dominant autosomal disease; most affected people are heterozygotes.
• Onset is usually in middle age.
• Neurological: starts with irritability and depression, includes fidgety behavior and involuntary movement (chorea), followed by psychosis and death.
• Caused by CAG repeats within the coding region, giving a tract of glutamines. Below 28 copies is normal. Between 28 and 34 copies is the premutation allele: normal phenotype, but an unstable copy number that puts the next generation at risk. Above 34 copies gives the disease.
• HD shows “anticipation”: the age of onset gets earlier with every generation. This is due to the correlation between copy number and age of onset: more copies means earlier onset, and the copy number tends to grow from one generation to the next.
• There is a genetic test for the disease, but in the absence of effective treatment few people at risk actually take the test.
• The function of the protein remains unknown; the excess glutamines may cause it to aggregate and lose function.

Larger Scale Mutations
• Larger mutations include insertion of whole new sequences, often due to movements of transposable elements in the DNA or to chromosome changes such as inversions or translocations.
• Deletions of large segments of DNA also occur.
• These phenomena affect the order of genes on the chromosome.
  – In classical genetics, synteny means that two genes are on the same chromosome. This term has a slightly different meaning in genomics and bioinformatics: that a group of genes is in the same order on the chromosome in different species.
  – Synteny tends to be conserved in closely related species, but breaks down in more distantly related species.
• Also, the genes at the breakpoints of a large scale mutation are often broken in half or otherwise disrupted.

Transposable Elements
• Transposable elements are DNA sequences that move from place to place in the genome. Unlike genes, transposable elements don’t have a fixed location on the chromosome.
• Transposable elements are essentially parasites. In general they don’t contribute to the evolutionary fitness of the organism.
• Most of the genes in an organism are necessary, at least under some circumstances, for the organism’s survival. Genes avoid being destroyed by random mutations because individuals with mutated genes are less fit: they don’t survive or reproduce as well as unmutated individuals.
• Transposable elements avoid being destroyed by increasing their numbers enough to keep some functional copies present even if some are destroyed.
  – However, too much increase in numbers will kill the organism, because sometimes transposable elements insert within a gene, inactivating it.

More Transposable Elements
• Two basic types: those that are strictly DNA, and those that replicate through an RNA intermediate. These are sometimes called type 1 and type 2, but I have a hard time keeping those arbitrary numbers straight. The most important nomenclature issue is that the prefix “retro-” implies the use of reverse transcriptase, which copies RNA into DNA, the defining characteristic of RNA-intermediate transposable elements.
• Eukaryotes often contain very short (200-500 bp) elements that contain the ends of a longer DNA transposon and miscellaneous junk inside.
They move to new locations using the transposase enzyme from a full-length element.
• Most bacterial TEs are DNA only. In eukaryotes, DNA transposable elements occur, but are less common than retrotransposons.
  – Transposable elements were first studied by Barbara McClintock in corn. They are an important source of the variation seen in ornamental flowers.
• Most common type in bacteria: Insertion Sequences (IS)
  – roughly 1-3 kbp long, containing a transposase gene and bounded by short (10-40 bp) inverted repeats
  – many different families, not well conserved across species
• Transposons are longer TEs, usually composed of 2 IS elements with one or more genes in between, often an antibiotic resistance gene.

Retro Elements
• RNA transposable elements are called retrotransposons in eukaryotes. They are characterized by the use of reverse transcriptase in their life cycle.
• They are related to retroviruses, such as HIV, feline leukemia virus, etc.
  – Retrotransposons lack the gene necessary to move outside the cell.
• There are a variety of retro element types, some of which contain long terminal repeats (LTRs) and some of which don’t.
• Also, there are many non-functional, degenerate sequences in eukaryotic genomes that started out as retrotransposons.
  – Up to 25% of the human genome.
• In bacteria, the common RNA TE is a “mobile group II intron”.
  – When transcribed into messenger RNA, they can splice themselves out without the need for proteins.
  – Group II introns contain a gene for reverse transcriptase, which copies the RNA back into DNA at a new location in the genome.

Recombination-Induced Mutations
• Most recombination occurs between homologous sites: two chromosomes line up in meiosis and have a break-and-rejoin event at the same location, resulting in daughter chromosomes that contain a mixture of alleles from both parents.
• However, any two sites that contain similar DNA sequences can pair up and have a crossover.
These events can significantly rearrange the genome.

Hemophilia A: Inversion Problems
• The clotting factor VIII gene, F8, is on the X chromosome, and mutations in it are the major cause of hemophilia.
• F8 is a large gene; completely contained within its intron 22 are two small genes transcribed from the opposite strand.
• One of these genes, F8A, has another copy several hundred kb away, on the opposite strand. Thus, these two very similar genes are in opposite orientation.
• Sometimes these two regions pair during meiosis and recombination occurs between them. This results in an inversion.
• The inversion completely disrupts the main F8 gene, because its 5’ half is now inverted and far away from its 3’ half.
• This accounts for about 45% of hemophilia A cases.
• Almost all new cases arise during male meiosis: in females, the two homologous X chromosomes are paired, which seems to inhibit this inversion.

Tandem Duplications
• Genes are duplicated if there is more than one copy present in the haploid genome.
  – Some duplications are “dispersed”, found in very different locations from each other.
  – Other duplications are “tandem”, found next to each other.
• Tandem duplications play a major role in evolution, because it is easy to generate extra copies of the duplicated genes through the process of unequal crossing over.
  – These extra copies can then mutate to take on altered roles in the cell, or they can become pseudogenes (inactive forms of the gene) by mutation.
• Most commonly, tandem duplications affect only one gene, resulting in an array of very similar genes.
  – Sometimes duplicated regions exist within a gene, which can cause havoc when trying to align the sequences.

Unequal Crossing Over
• Unequal crossing over happens during prophase of meiosis 1. Homologous chromosomes pair at this stage, and sometimes pairing occurs between the similar but not identical copies of a tandem duplication.
If a crossover occurs within the mispaired copies, one of the resulting gametes will have an extra copy of the duplication and the other will be missing a copy.
• As an example, the beta-globin gene cluster in humans contains 6 genes, called epsilon (an embryonic form), gamma-G, gamma-A (the gammas are fetal forms), pseudo-beta-one (an inactive pseudogene), delta (1% of adult beta-type globin), and beta (99% of adult beta-type globin). Gamma-G and gamma-A are very similar, differing by only 1 amino acid.
• If mispairing in meiosis occurs, followed by a crossover between delta and beta, the hemoglobin variant Hb-Lepore is formed. This is a gene that starts out as delta and ends as beta. Since the gene is controlled by DNA sequences upstream from the gene, Hb-Lepore is expressed as if it were a delta. That is, it is expressed at about 1% of the level that beta is expressed. Since normal beta globin is absent in Hb-Lepore, the person has severe anemia.

Chromosome Breaks
• DNA sometimes breaks due to mechanical stress, ionizing radiation, or chemical attack.
• Most organisms contain enzymes that reassemble broken DNA molecules, in a process called non-homologous end joining.
• If there is more than one break, ends are joined randomly, which can lead to a rearranged genome.
  – This breaks up blocks of genes over evolutionary time.

Horizontal Gene Transfer
• In eukaryotes, there is little doubt that almost all genes are transmitted from parent to offspring, with each species having a separate line of descent.
  – Large exceptions: the endosymbionts, mitochondria and chloroplasts. Many genes from these formerly free-living organisms have migrated into the nucleus.
  – There are other cases of single genes being transferred horizontally.
• This is much less true in the prokaryotes, where a great deal of DNA is transferred across species lines.
  – I have seen an estimate that 15% of all prokaryotic genes are derived from horizontal transfers.
• Horizontal gene transfer is usually identified by performing phylogenetic lineage studies on individual genes, and seeing that some gene has more in common with genes in distant species than with genes in closely related species.

Sources of New DNA
• Bacteria reproduce by binary fission: replicating their DNA, then splitting in half. Each cell has only 1 parent, and there is no regular sexual process.
• Bacteria have 3 main ways of bringing in new DNA:
  – conjugation: direct transfer of DNA between 2 cells (although not necessarily of the same species)
  – transduction: transfer of DNA between cells using a bacteriophage (virus) as an intermediate
  – transformation: the cell takes up DNA molecules from the environment

Lysogenic Bacteriophage
• Bacteriophage (phage) are bacterial viruses: DNA (or RNA) surrounded by a protein coat, but with no internal metabolic activity.
• Most bacteriophage enter the cell, hijack its machinery to reproduce themselves, and then kill the cell by lysing it (breaking it open). This is called the lytic cycle.
• Some phage have the ability to insert themselves into the bacterial genome and remain there, inactive, for many generations: the lysogenic cycle.
  – First described in phage lambda.
  – The inserted phage chromosome is called the prophage.
• When conditions get harsh, the phage DNA comes out of the chromosome and enters the normal lytic pathway. It reproduces and kills the host cell.
• Sometimes the prophage is inactivated by mutation and becomes a permanent part of the chromosome.