Genome Plasticity
Transposable and repetitive DNA: comparison between species (Genome Analysis course)
Transposable elements - Principal force in reshaping genomes
All eukaryotic genomes contain individual repeat units that are interspersed
genome-wide.
Some areas are devoid of such repeats whereas some areas almost
exclusively consist of such repeats.
Studies of the functional consequences of transposable DNA are key to the
understanding of chromosome structure and evolution.
Their amplification and integration within host genomes contribute to genome
plasticity.
Interspersion at distinct chromosomal locations occurs
by either retrotransposition or DNA transposition.
Interspersed repetitive DNA in Eukaryotic Genomes
Active transposons may cause rearrangements, create new genes, modify existing genes, and modulate GC-content of host genomes.
Repeats constitute a rich paleontological record and provide clues for the evolutionary history of genomes
Most repeats are passive markers in the genome and mutations that strike these sequences can be used to study processes of mutation and selection.
Future study of DNA repeats should also be informative for our overall understanding of chromosome structure.
Repeat Content of Mammalian Genomes
Repetitive DNA comprises at least 46% and 37.5% of the human and mouse genomes, respectively, i.e. about half of the genome has evolved through mechanisms involving retrotransposition and DNA transposition
In comparison, protein-coding sequences comprise about 1.5%
UTRs of protein-coding genes comprise about 1%
The remaining ≈51.5% of the DNA comprises RNA genes, introns, regulatory DNA, and intergenic DNA with unknown function.
Insertion patterns of repetitive DNA (e.g. LTR retrotransposons) differ between populations and can be used as evolutionary markers
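The composition figures quoted above for the human genome can be checked with a few lines of arithmetic. This is only a sketch: the dictionary keys and variable names are illustrative, and the percentages are the approximate slide values.

```python
# Approximate human genome composition in percent, as quoted on the slide.
human_percent = {
    "repetitive DNA": 46.0,
    "protein-coding sequence": 1.5,
    "UTRs": 1.0,
}

# Whatever is not accounted for above: RNA genes, introns, regulatory DNA,
# and intergenic DNA of unknown function.
remaining = 100.0 - sum(human_percent.values())

# How many times more repeat sequence than protein-coding sequence.
repeat_to_coding = human_percent["repetitive DNA"] / human_percent["protein-coding sequence"]

print(f"remaining: {remaining:.1f}%")
print(f"repeats vs. coding: about {repeat_to_coding:.0f}-fold")
```

The ratio makes the point of the slide explicit: repeats outweigh protein-coding sequence by roughly 30 to 1.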
Rate of transposition in mouse vs. man
• Transposition is more active in mouse than in man
– Since divergence, the mouse genome has accumulated at least 120 Mbp more transposon-derived sequence
– Mouse (Mm): 32.4% (818 Mbp) lineage-specific repeats
– Human (Hsa): 24.4% (695 Mbp) lineage-specific repeats
– Note that the dog genome has 31% repeats
• The apparently low total percentage in mouse (37.5%) is explained by
– A two-fold higher mutation rate in mouse; the high divergence of ancestral repeats makes them undetectable by current programs
• RepeatMasker cannot efficiently detect repeats of >37% divergence
– The WGS approach, which will miss highly similar, recently mobilized transposons
• More repeat insertions are not the reason for the 0.4 Gbp larger size of the human genome
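The lineage-specific figures above can be compared directly. The sketch below uses only the slide values. The genome sizes it derives are back-calculated from those values (an assumption of this example, not numbers stated on the slide), so they are rough.

```python
# Lineage-specific repeats since the mouse-human divergence (slide values).
lineage_specific = {
    "mouse": {"fraction": 0.324, "mbp": 818},
    "human": {"fraction": 0.244, "mbp": 695},
}

# Extra transposon-derived sequence accumulated in the mouse lineage.
extra_in_mouse = lineage_specific["mouse"]["mbp"] - lineage_specific["human"]["mbp"]

# Rough genome size implied by each (Mbp, fraction) pair.
implied_size = {
    species: values["mbp"] / values["fraction"]
    for species, values in lineage_specific.items()
}

print(f"extra lineage-specific repeats in mouse: {extra_in_mouse} Mbp")
for species, size in implied_size.items():
    print(f"implied {species} genome size: about {size:.0f} Mbp")
```

The mouse lineage gained more repeat sequence, yet the implied human genome is the larger one, which is why repeat insertions alone cannot account for the size difference.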
Four classes of transposable elements
• Long interspersed nucleotide element (LINE)-like elements (retroposons)
• Short interspersed nucleotide elements (SINEs) (retroposons)
• Retrovirus-like elements with LTRs (retrotransposons)
– These three classes transpose through an RNA intermediate
• DNA transposons
– Transpose by a cut and paste mechanism
Classification of Retroelements
RETROPOSONS
Lack env genes and are non-infectious
These elements are sub-divided based on the presence of RT (LINEs) or the absence of RT (SINEs)
RETROTRANSPOSONS
Related to retroviruses but lack env genes
• truncated endogenous retroviruses
• Yeast Ty1 elements
• Drosophila copia elements
ENDOGENOUS RETROVIRUSES (ERVs)
• Complete ERVs have gag, pol, and env genes and may produce
infectious retroviral particles
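The classification above can be read as a small decision tree. The sketch below is one way to write it down. The function name and the three boolean features are illustrative, and using the presence of LTRs to separate retroposons from the retrovirus-like groups is an assumption based on the earlier slide on the four classes.

```python
def classify_retroelement(has_ltr: bool, has_rt: bool, has_env: bool) -> str:
    """Assign a retroelement to one of the groups named on the slide."""
    if has_ltr:
        # Retrovirus-like elements: env separates ERVs from retrotransposons.
        if has_env:
            return "endogenous retrovirus (ERV)"
        return "retrotransposon"
    if has_env:
        # Retroposons lack env genes, so this combination is not covered.
        raise ValueError("element without LTRs but with env is not in the scheme")
    # Retroposons: sub-divided by presence or absence of reverse transcriptase.
    return "LINE (retroposon with RT)" if has_rt else "SINE (retroposon without RT)"


print(classify_retroelement(has_ltr=False, has_rt=True, has_env=False))
print(classify_retroelement(has_ltr=False, has_rt=False, has_env=False))
print(classify_retroelement(has_ltr=True, has_rt=True, has_env=False))
print(classify_retroelement(has_ltr=True, has_rt=True, has_env=True))
```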
Long interspersed nucleotide elements (LINEs) (1)
• LINEs are found in the genomes of all mammals. Insertion bias towards AT-rich regions
• Three LINE families are present in the human and mouse genomes: LINE1-3
– Only some LINE1 elements (L1Hs) are still actively retrotransposing in the human genome
– LINE1 is highly active in the mouse; three active sub-families
• >97% similar but with divergent 5’-ends
• LINE is the most ancient class of transposable elements
– LINE1 >150 Myr old
• LINEs perform autonomous retrotransposition
• Full-length copies are approximately 6-8 kb long
– Copy # in the human genome: 850,000, 20.4% of the genome
– Internal RNA polymerase II promoter
– Encode two open reading frames (ORFs); ORF2 encodes the endonuclease and reverse transcriptase (RT)
– LINE RNA assembles with its encoded proteins and moves to the nucleus
• The endonuclease makes a single-strand nick in the genomic DNA, and RT uses the nicked DNA to prime reverse transcription from the 3’-end of the LINE RNA
– Reverse transcription of LINE RNA is frequently terminated prior to completion, which results in truncated, non-functional insertions
LINEs (2)
• LINE1-3 in the human genome
– LINE1: 516,000 copies (462 Mb, 16.9% of the genome)
– LINE2: 315,000 copies (88 Mb, 3.2% of the genome)
– LINE3: 37,000 copies (8 Mb, 0.3% of the genome)
• LINE1 is the largest family in both mouse and man
• Note that most inserted LINEs are truncated because reverse transcription of LINE RNA terminates prior to integration of LINE DNA
– Average size of LINE1 is approx. 1 kb
• LINE insertion sites are flanked by short (7-20 bp) direct repeats as a result of target site duplication
• The LINE machinery is believed to be responsible for most detectable reverse transcription that occurs in human cells
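Two features described above, the target site duplication and the 5’ truncation, can be illustrated on toy strings. This is a deliberately simplified sketch, not a model of the enzymology: the function names, the lower-case toy genome, and the placeholder element string are all made up for the example.

```python
def truncate_5prime(element: str, kept: int) -> str:
    """Keep only the last `kept` bases of the element.

    Reverse transcription starts at the 3' end of the LINE RNA, so an
    early stop loses the 5' part of the element.
    """
    if not 0 <= kept <= len(element):
        raise ValueError("kept must be between 0 and the element length")
    return element[len(element) - kept:]


def insert_with_tsd(genome: str, element: str, pos: int, tsd_len: int) -> str:
    """Insert `element` so that the target site ends up on both sides of it."""
    if not 0 <= pos <= len(genome) - tsd_len:
        raise ValueError("target site does not fit in the genome")
    target = genome[pos:pos + tsd_len]
    # The target site is present once before and once after the new copy.
    return genome[:pos] + target + element + target + genome[pos + tsd_len:]


genome = "aaacccgggttt"
full_length = "LINEELEMENT"
truncated = truncate_5prime(full_length, 7)       # only the 3' part survives
print(insert_with_tsd(genome, truncated, pos=3, tsd_len=3))
```

In the printed result the inserted (upper-case) copy is flanked on both sides by the same short `ccc` direct repeat, which is the signature used to recognise retrotransposed insertions.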
LINEs (3)
Transposable elements are the principal force in reshaping and modulating genomes
• Recombination events involving pig LINE elements have caused the KIT duplication in Large White pigs
• Insertion of LINE elements within or in the vicinity of genes has created pseudogenes by affecting gene regulation or by disrupting ORFs.
• LINE elements prefer sex chromosomes
– Increased frequency on sex chromosomes may reflect an inability to purge insertions because of the lower recombination rates on X and Y
– Hypothesis: LINEs may facilitate X chromosome inactivation by interaction with Xist RNA. Xist RNA is only expressed from the X “marked” for inactivation. The highest LINE density is close to the X inactivation centre
– On the Y chromosome, increased tolerance may be due to the lower relative number of genes
Short interspersed nucleotide elements (SINEs) (1)
• SINEs are non-autonomous repeats found in the genomes of all mammals
• SINE is the most abundant class of retroelements
– Copy # human genome: 1.5 × 10^6, 13% of the genome
• Insertion bias into GC-rich DNA
• Density is influenced by features of GC-rich DNA, e.g. active transcription
• Human genome
– 1.5 × 10^6, 10.7% of the genome
– Alu elements are the only active SINEs in the human genome
– Alu - derived from 7SL RNA
SINEs (2)
• Three SINE families in the human genome: Alu, MIR, and MIR3
– The active Alu family uses the LINE1 machinery for retrotransposition. The MIR elements are inactive
– MIR was adapted for reverse transcription by LINE2. When LINE2 became "extinct" 80-100 Myr ago, MIR also became inactive.
• Alu elements are derived from 7SL RNA
– Alu elements are >80 Myr old
• Full length copies of SINEs are between 100 and 400 bp long
– Internal RNA polymerase III promoter
– SINEs encode no proteins
SINEs (3)
• Mouse genome
– Four distinct SINEs, 7.6% of the genome
– B1 - derived from 7SL RNA, tRNA-derived internal promoter, transcribed by RNA pol III
– ID - derived from neuronal BC1 RNA
– B2 - resembles Ala-tRNA
– B4 - fusion between B1 and ID elements
LTR Retroposons and Retrotransposons
• LTR retroposons are non-autonomous repeats
• LTR retrotransposons are autonomous repeats that contain gag and pol genes
• Endogenous retroviruses also have an env gene
• LTR elements contain regulatory sequences for transcription initiation, termination, and polyadenylation
• Copy # in the human genome of LTR retroelements: 0.44 × 10^6, 8% of the genome
• Reverse transcription occurs in the cytoplasm in virus-like particles
– Reverse transcription is primed by tRNAs (different tRNAs are used depending on subclass)
– ERVs are active in mammalian genomes. They appear to be more active in mouse than in man.
• 85% of the human LTR retroposons consist only of the LTR (solitary LTR elements). These are created by homologous recombination between flanking LTR elements.
• Several ERVs have been shown to have functional roles in cellular gene expression, splicing, and polyadenylation.
• A proposed functional role for HERV expression is to prevent retrovirus infection by receptor interference.
Background
• Definition of an endogenous retrovirus
• Infection by exogenous retrovirus in germ line cells
• Carried by an individual as part of its germ line DNA
• Present in all cells of the organism
• All vertebrates have endogenous retroviruses (ERV) as integrated proviruses in their genomes
• Mendelian inheritance
• Complete ERVs have two LTRs, gag, pro, pol, and env genes and may produce infectious retroviral particles
[Figure: RETROVIRUS LIFE CYCLE; label: CfERV]
Background
• Retroviruses are classified into seven genera
• Alpharetrovirus
• Betaretrovirus
• Gammaretrovirus
• Deltaretrovirus
• Epsilonretrovirus
• Spumavirus
• Lentivirus
ERVs
• Class I: similar to gammaretroviruses
• Class II: similar to beta- and alpharetroviruses
• Class III: spuma-like
Long Terminal Repeats (LTRs)
• LTRs formed during reverse transcription
• Composed of unique U3 and U5 regions separated by an R segment
• TSS is within the R segment
• U3 contains the RV promoter
• Parts of the R segment are repeated at each end of genomic RNA
Provirus LTR
5’-U3RU5-------------U3RU5-3’ (each LTR: U3, R with the TSS, U5)
Genomic RNA
5’-RU5---------------U3R-3’
Solo LTRs: 10-100x more abundant than complete ERVs
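The relationship between the genomic RNA layout (R-U5 ... U3-R) and the provirus layout (U3-R-U5 ... U3-R-U5) can be written out symbolically. The sketch below treats each region as a label in a list. The function names are illustrative, and the solo-LTR step simply encodes the statement from the LTR slide that solitary LTRs arise by homologous recombination between the two flanking LTRs.

```python
LTR = ["U3", "R", "U5"]


def provirus_from_genomic_rna(rna: list[str]) -> list[str]:
    """Rebuild the provirus layout: a complete LTR at each end."""
    if rna[:2] != ["R", "U5"] or rna[-2:] != ["U3", "R"]:
        raise ValueError("expected genomic RNA of the form R-U5 ... U3-R")
    internal = rna[2:-2]
    # Reverse transcription duplicates U3 and U5 so both ends carry U3-R-U5.
    return LTR + internal + LTR


def solo_ltr(provirus: list[str]) -> list[str]:
    """Recombination between the two LTRs removes the internal region."""
    if provirus[:3] != LTR or provirus[-3:] != LTR:
        raise ValueError("expected a provirus flanked by two complete LTRs")
    return list(LTR)


genomic_rna = ["R", "U5", "gag", "pol", "env", "U3", "R"]
provirus = provirus_from_genomic_rna(genomic_rna)
print("-".join(provirus))
print("-".join(solo_ltr(provirus)))
```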
SIMPLIFIED SCHEMATIC RETROVIRAL PHYLOGENY
[Figure: tree with the labels Spumaviruses (FFV, HFV), Lentiviruses (HIV), Betaretroviruses (MMTV), Gammaretroviruses (GALV, PERV, MLV), and HTLV-related viruses; CfERV is marked at two positions]
DNA transposons
• DNA transposons that encode transposase are autonomous repeats
• DNA transposons resemble bacterial transposons in having terminal inverted repeats
• The transposase mediates mobility by a cut and paste mechanism and can cause large-scale chromosome rearrangements
• Four classes in mouse: MMAR1, hAT-URR1, RMER30, and RChar1
– Fraction of the genome: 0.88%
• Seven major classes in the human genome:
– Charlie, PiggyBac, Mariner, Tc2, Tigger, Zaphod, unclassified
• Total copy # in the human genome of DNA transposons: 0.3 × 10^6, 3% of the genome
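The difference between the cut and paste mechanism of DNA transposons and the copy-based spread of retroelements can be shown on a toy string. This is a simplified sketch with made-up function names. It ignores terminal inverted repeats, target site duplications, and excision footprints.

```python
def cut_and_paste(genome: str, start: int, end: int, new_pos: int) -> str:
    """Excise genome[start:end] and re-insert it at `new_pos`.

    `new_pos` is a coordinate in the genome after excision.
    """
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:new_pos] + element + remainder[new_pos:]


def copy_and_paste(genome: str, start: int, end: int, new_pos: int) -> str:
    """Leave the donor copy in place and add a new copy at `new_pos`."""
    element = genome[start:end]
    return genome[:new_pos] + element + genome[new_pos:]


genome = "aaaaTTTTcccc"                              # upper case marks the element
moved = cut_and_paste(genome, 4, 8, new_pos=8)
copied = copy_and_paste(genome, 4, 8, new_pos=12)
print(moved, len(moved))
print(copied, len(copied))
```

In this toy model cut and paste leaves the genome length unchanged, whereas every copy-and-paste event adds the length of the element.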
Processed pseudogenes and retrogenes
• Cellular mRNAs and small structural RNAs are occasionally reverse transcribed by RT and re-integrated into the genome.
• Following integration, these transposed copies are often struck by mutation and become processed pseudogenes.
• Creation of a retrogene
– Occasionally, integration occurs at a favorable site and the gene can be transcribed. Some genes that lack introns in mammalian genomes have evolved through such mechanisms.
– Example: the FGF4 retrogene that causes short legs in dogs (chondrodysplasia)
Variation in repeat distribution
• High LINE density on the X chromosome: Mm X 28.5%, Hsa X 17.5%; autosomes: Mm 14.6% and Hsa 7.5%
• A striking variation in the distribution of repeats in the human genome
– Highest density: a 525 kb segment on Xp11 with 89% transposable elements, which contains a 200 kb segment with 98% density and a 100 kb segment with 89% LINE1
– Regions with high Alu density: >56% at three loci, one on 7q11
– Regions with high density of MIR: >15% on 1p36
• The gene-dense HLA class II region has some 25% repeats
• Some regions have almost no repeats
– In the four homeobox gene clusters, HOXA, HOXB, HOXC, and HOXD, each locus (100 kb) contains <2% interspersed repeats
– A 63 kb region on 8q21 with <1.5% repeats that contains a gene that encodes a homeodomain zinc-finger protein
Distribution of repeats by GC content
• A striking difference between the distribution of LINEs and SINEs is evident in both the human and mouse genomes
– LINE repeats occur at much higher density in AT-rich DNA
• Approximately 4-fold enriched. AT-rich DNA has fewer genes, and integration of LINEs in gene-rich regions would be selectively disadvantageous
– SINE repeats show the opposite trend: about five-fold lower density in AT-rich DNA
– Regions with high Alu density are GC-rich
• Basis for the apparently distinct insertion bias of LINEs vs. SINEs?
– Both LINEs and SINEs would be equally bad if integrated in gene-rich areas.
– Recently integrated Alu elements (AluHs) show the same preference as LINEs for integration in AT-rich DNA. Older Alus (Alu elements that integrated several Myr ago) show a stronger bias for GC-rich DNA
• LTR retrotransposons, retroposons, and DNA transposons are evenly distributed. However, integration seems to be more frequent in areas with locally more AT.
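Statements about AT-rich and GC-rich DNA rest on GC content measured in windows along the sequence. The sketch below shows that measurement on a toy string. The function names, the window size, and the example sequence are illustrative only and are not taken from the analyses summarised above.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


toy_sequence = "AAAATTTTGGGGCCCCATGCATGC"
for index, gc in enumerate(windowed_gc(toy_sequence, window=8)):
    label = "GC-rich" if gc > 0.5 else "AT-rich" if gc < 0.5 else "balanced"
    print(f"window {index}: GC = {gc:.2f} ({label})")
```

Repeat densities per window could then be compared between the AT-rich and GC-rich bins to look for the kind of LINE and SINE bias described above.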
Repeat architecture on the Y chromosome
• Repeats found on the Y chromosome are unusually young, e.g. LINE elements and HERVs are much younger than those found on autosomes
– Could this indicate a higher tolerance for integration of transposable DNA on the Y chromosome and loss of old repeats by deletion?
Identification of active retrotransposons
• Population studies can be used to determine integration sites for interspersed DNA that are still polymorphic.
• Many known cases of LINE insertions are associated with genetic disease, and these LINEs all belong to the youngest sub-family, L1Hs.
• At least 50% of all known L1Hs elements still segregate as polymorphisms in the human population.
• Future analysis of multiple individuals for interspersed repeat polymorphism will be informative for identification of actively retrotransposing elements.
• Endogenous retroviruses in the mouse genome are actively retrotransposing.
• Preliminary analysis of transposable ERV elements in the pig genome suggests active retrotransposition in an inbred herd of pigs.
– It is possible, but not shown, that this is also the case in the human genome.
– An argument against this is that the human draft sequence indicates that most integrated HERVs are evolutionarily old.
Conclusions
• All eukaryotic genomes contain transposable elements (TEs).
• Nearly half of each mammalian genome is derived from ancient TEs
• TEs are in some cases still active
• Many genes have TE-derived promoters
• TEs have contributed and are still contributing to genome evolution
• May cause mutation upon integration
• Documented cases as disease-causing agents in human, dog, and mouse
• Mediate recombination, e.g. the KIT locus in pig