Genome
Complete set of instructions for making an organism
• master blueprints for all enzymes, cellular structures & activities
An organism's complete set of DNA
The total genetic information carried by a single set of
chromosomes in a haploid nucleus
Located in the nucleus of each of an organism's trillions of cells
Consists of tightly coiled threads of DNA organized into chromosomes
Viral genomes
Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular
Viruses with RNA genomes:
• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome.
Replicative form of viral genomes
• all ssRNA viruses produce dsRNA molecules
• many linear DNA molecules become circular
Molecular weight and contour length:
• duplex length per base pair = 3.4 Å
• molecular weight per base pair = ~660 Da
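As a worked example, the two constants above turn a genome size in base pairs into a contour length and a molecular weight. This is a minimal Python sketch applied to the lambda genome (48,502 bp); the function names are illustrative, and the 3.4 Å rise assumes B-form duplex DNA.

```python
# Constants from the slide: rise per base pair and average mass per base pair.
RISE_PER_BP_ANGSTROM = 3.4   # duplex length per base pair (Å)
MW_PER_BP_DALTON = 660       # approximate molecular weight per base pair (Da)

def contour_length_um(bp: int) -> float:
    """Contour length of a duplex in micrometres (1 µm = 10^4 Å)."""
    return bp * RISE_PER_BP_ANGSTROM / 1e4

def molecular_weight_da(bp: int) -> int:
    """Approximate molecular weight of a duplex in daltons."""
    return bp * MW_PER_BP_DALTON

lambda_bp = 48_502
print(f"{contour_length_um(lambda_bp):.1f} um")    # 16.5 um
print(f"{molecular_weight_da(lambda_bp):.2e} Da")  # 3.20e+07 Da
```

The same two functions work for any duplex genome; only the base-pair count changes.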
Bacterial genomes: E. coli
4288 protein-coding genes:
• Average ORF 317 amino acids
• Very compact: average distance between genes 118 bp
Numerous paralogous gene families:
38–45% of genes have arisen through duplication
Homologues:
• H. influenzae (1130 of 1703)
• Synechocystis (675 of 3168)
• M. jannaschii (231 of 1738)
• S. cerevisiae (254 of 5885)
Procaryotic genomes
Generally 1 circular chromosome (dsDNA)
Usually without introns
Relatively high gene density (~2500 genes per mm of E. coli DNA)
Contour length of E. coli genome: 1.7 mm
Often indigenous plasmids are present
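The gene density and contour length on this slide can be cross-checked against the 4288 protein-coding genes listed above with a single multiplication. A minimal Python sketch using only the slide's numbers (the helper name is illustrative):

```python
# Numbers taken from the slide above.
GENES_PER_MM = 2500        # approximate gene density of E. coli DNA
CONTOUR_LENGTH_MM = 1.7    # contour length of the E. coli genome

def estimated_gene_count(genes_per_mm: float, length_mm: float) -> int:
    """Gene count implied by a density (genes/mm) times a length (mm)."""
    return round(genes_per_mm * length_mm)

print(estimated_gene_count(GENES_PER_MM, CONTOUR_LENGTH_MM))  # 4250
```

About 4250 genes, close to the 4288 protein-coding genes given for E. coli, so the two approximate figures agree.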
Bacterial gene-finding: an easy problem
• Dense genomes
• Short intergenic regions
• Uninterrupted ORFs
• Conserved signals
• Abundant comparative information
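Because bacterial genes are mostly uninterrupted ORFs packed into a dense genome, a first pass at gene-finding can simply scan for long stretches from ATG to an in-frame stop codon. The sketch below is a simplification under stated assumptions: forward strand only, ATG as the only start codon (real bacteria also use others such as GTG and TTG), and a hypothetical default threshold of 100 codons. Real gene finders also use the conserved signals and comparative information listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) 0-based half-open coordinates of ATG...stop ORFs
    on the forward strand, in all three reading frames.

    Simplified sketch: the first ATG after the previous stop in a frame is
    taken as the start, and only ORFs of at least min_codons codons
    (counting the stop codon) are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)

# Toy example: ATG + 3 sense codons + TAA, beginning at position 1.
print(find_orfs("CATGAAACCCGGGTAA", min_codons=5))  # [(1, 16)]
```

Note that the out-of-frame TGA at positions 2 to 4 of the toy sequence is ignored, because only codons read in the same frame as the ATG count.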
Complete genomes: gene content
E. coli
4000 genes X 1 kbp/gene=4 Mbp
Genome=4 Mbp!
Plasmids
(Figure: plasmid map showing the β-lactamase gene, the origin of replication (ori) and an inserted foreign gene)
Extra-chromosomal circular DNAs
Found in bacteria, yeast and other fungi
Size varies from ~3,000 bp to 100,000 bp
Replicate autonomously (origin of replication)
May contain resistance genes
May be transferred from one bacterium to another
May be transferred across kingdoms
Multicopy plasmids (up to ~400 plasmids per cell)
Low-copy plasmids (1–2 copies per cell)
Plasmids may be incompatible with each other
Are used as vectors that can carry a foreign gene of interest (e.g. insulin)
Agrobacterium tumefaciens
Characteristics
• Plant parasite that causes Crown Gall Disease
• Carries a large (~250 kbp) plasmid called the Tumor-inducing (Ti) plasmid
A portion of the Ti plasmid, the T-DNA (transferred DNA), is transferred from the bacterial cell to the plant cell
T-DNA integrates stably into the plant genome
The single-stranded T-DNA fragment is converted to a dsDNA fragment by the plant cell
It is then integrated into the plant genome
Two 25 bp direct repeats (the T-DNA border sequences) play an important role in the excision and integration process
Tumor formation = hyperplasia
Hormone imbalance
Caused by A. tumefaciens
• Lives in intercellular spaces of the plant
• Plasmid contains genes responsible for the disease
Part of the plasmid is inserted into plant DNA
Wound = entry point; 10–14 days later, a tumor forms
What is naturally encoded in T-DNA?
• Enzymes for auxin and cytokinin synthesis
Cause a hormone imbalance, leading to tumor formation/undifferentiated callus
Mutants in enzymes have been characterized
• Opine synthesis genes (e.g. octopine or nopaline)
Carbon and nitrogen source for A. tumefaciens growth
Insertion genes
• Virulence (vir) genes
• Allow excision and integration into plant genome
Ti plasmid of A. tumefaciens
1. Auxin, cytokinin and opine synthesis genes are transferred to the plant
2. The plant makes all 3 compounds
3. Auxins and cytokinins cause gall formation
4. Opines provide a unique carbon/nitrogen source that only A. tumefaciens can use!
Fungal genomes: S. cerevisiae
First completely sequenced eukaryote genome
Very compact genome:
• Short intergenic regions
• Scarcity of introns
• Lack of repetitive sequences
Strong evidence of duplication:
• Chromosome segments
• Single genes
Redundancy: non-essential genes provide a selective advantage
Eucaryotic genomes
Located on several chromosomes
Relatively low gene density (50 genes per mm of DNA in humans)
Contour length of DNA
Carry organellar genomes as well
Human genomes
50,000 genes X 2 kbp = 100 Mbp
Introns = 300 Mbp?
Regulatory regions = 300 Mbp?
• Only 5–10% of the human genome codes for genes
- the function of the other DNA (mostly repetitive sequences) is unknown, but it might serve structural or regulatory roles
2300 Mbp = ???
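The "2300 Mbp = ???" line is what remains after subtracting the slide's estimates from the whole genome. A minimal Python sketch of that bookkeeping, assuming a total of ~3000 Mbp (the total implied by the slide's figures; it is not stated on the slide):

```python
# Bookkeeping for the human genome slide, in Mbp.
# ASSUMPTION: a total of ~3000 Mbp, the total implied by the slide's figures.
TOTAL_MBP = 3000

genes_mbp = 50_000 * 2 / 1000   # 50,000 genes x 2 kbp = 100 Mbp
introns_mbp = 300               # slide's estimate (marked "?")
regulatory_mbp = 300            # slide's estimate (marked "?")

unaccounted_mbp = TOTAL_MBP - genes_mbp - introns_mbp - regulatory_mbp
print(unaccounted_mbp)  # 2300.0
```

Changing any estimate (for example, a different average gene length) shifts the unaccounted remainder by the same amount.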
Plant genomes
A plant cell contains three genomes (nuclear, plastid and mitochondrial)
The size of genomes is given in base pairs (bp)
The size of genomes is species dependent
The difference in genome size is mainly due to different numbers of identical sequences of various sizes arranged one after another (in tandem)
The genes for ribosomal RNAs occur as repetitive sequences and, together with the genes for some transfer RNAs, are present in several thousand copies
Structural genes are present in only a few copies, sometimes just a single copy. Structural genes encoding structurally and functionally related proteins often form a gene family
The genetic information is divided among the chromosomes
The DNA in the genome is replicated during interphase (S phase), before mitosis
Size of the genome in plants and in humans (million bp)
Genome          Arabidopsis thaliana   Zea mays   Vicia faba   Human
Nucleus         70                     3900       14500        2800
Plastid         0.156                  0.136      0.120        –
Mitochondrion   0.370                  0.570      0.290        0.017
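Combining the nuclear genome sizes in the table above with the 3.4 Å rise per base pair gives the contour length of DNA in one copy of each nuclear genome. A minimal Python sketch; it assumes the table values are per genome copy and that the DNA is B-form duplex.

```python
RISE_PER_BP_M = 3.4e-10  # 3.4 Å rise per base pair (B-form DNA), in metres

def contour_length_m(size_mbp: float) -> float:
    """Contour length in metres of one copy of a genome of size_mbp Mbp."""
    return size_mbp * 1e6 * RISE_PER_BP_M

# Nuclear genome sizes (million bp) from the table above.
nuclear_mbp = {
    "Arabidopsis thaliana": 70,
    "Zea mays": 3900,
    "Vicia faba": 14500,
    "Human": 2800,
}

for species, size in nuclear_mbp.items():
    print(f"{species}: {contour_length_m(size):.2f} m")
```

With these numbers, one copy of the human nuclear genome is roughly 0.95 m of DNA and that of Vicia faba about 4.9 m, all of which must be tightly coiled into chromosomes.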
Plant genomes: Arabidopsis thaliana
A weed growing at roadsides in central Europe
It has only 2 x 5 chromosomes
Its genome is just 70 Mbp
It has a life cycle of only 6 weeks
A model plant for the investigation of plant function
Contains 25,498 structural genes from 11,000 families
The structural genes are present in only a few copies, sometimes just one
Structural genes encoding structurally and functionally related proteins often form a gene family
Cross-phylum matches:
• Vertebrates 12%
• Bacteria / Archaea 10%
• Fungi 8%
60% have no match in non-plant databases
Evolution involved whole-genome duplication followed by gene loss and extensive local gene duplications
Complex genome DNA
~10% highly repetitive (300 Mbp)
• NOT GENES
~25% moderate repetitive (750 Mbp)
• Some genes
~25% exons and introns (800 Mbp)
40%=?
• Regulatory regions
• Intergenic regions
Genome organization
“Nonfunctional” DNA
Higher eukaryotes have a lot of noncoding DNA
Some has no known structural or regulatory function (no genes)
Duplicated genes
Encode closely related (homologous) proteins
Clustered together in genome
Formed by duplication of an ancestral gene followed by mutation
(Figure: a gene cluster containing five functional genes and two pseudogenes)
Pseudogenes
Nonfunctional copies of genes
Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to a lack of regulatory sequences
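A nonsense or frameshift mutation in a pseudogene shows up as a stop codon reached before the end of the original coding sequence. The sketch below reads a sequence in frame from its first base and reports whether translation would stop early; the toy sequences and function names are hypothetical illustrations, and the other causes listed above (defective mRNA processing, missing regulatory sequences) are not modeled.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(cds: str) -> Optional[int]:
    """Codon index of the first stop codon read in frame from position 0."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def has_premature_stop(cds: str) -> bool:
    """True if translation would stop before the last complete codon."""
    stop = first_in_frame_stop(cds)
    return stop is not None and stop < len(cds) // 3 - 1

functional = "ATGCAGCTGACCTAA"   # ATG CAG CTG ACC TAA: stop only at the end
nonsense   = "ATGTAGCTGACCTAA"   # C->T in codon 2 turns CAG into TAG
frameshift = "ATGCAGTGACCTAA"    # one C deleted: frame now reads ATG CAG TGA ...

for name, seq in [("functional", functional), ("nonsense", nonsense),
                  ("frameshift", frameshift)]:
    print(name, has_premature_stop(seq))
# functional False / nonsense True / frameshift True
```

In the functional sequence, the TGA spanning codons 3 and 4 is out of frame and is correctly ignored; deleting one C moves it into frame.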
Repetitive DNA
Moderately repeated DNA
• Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
• Large duplicated gene families
• Mobile DNA
Simple-sequence DNA
• Tandemly repeated short sequences
• Found in centromeres and telomeres (and others)
• Used in DNA fingerprinting to identify individuals
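Simple-sequence DNA consists of a short unit repeated back-to-back, for example the TTAGGG unit of vertebrate telomeres, and DNA fingerprinting exploits differences between individuals in how many copies sit at a locus. A minimal Python sketch that finds the longest tandem run of a unit of a given length; it is a brute-force illustration and assumes exact copies (no mismatches), unlike real repeat-detection tools.

```python
def longest_tandem_repeat(seq: str, unit_len: int) -> tuple[str, int, int]:
    """Return (unit, copies, start) for the longest back-to-back run of any
    repeat unit of length unit_len in seq (earliest run wins ties)."""
    if unit_len < 1:
        raise ValueError("unit_len must be at least 1")
    seq = seq.upper()
    best = ("", 0, -1)
    for start in range(len(seq) - unit_len + 1):
        unit = seq[start:start + unit_len]
        copies, pos = 1, start + unit_len
        # Extend while the next block is an exact copy of the unit.
        while seq[pos:pos + unit_len] == unit:
            copies += 1
            pos += unit_len
        if copies > best[1]:
            best = (unit, copies, start)
    return best

print(longest_tandem_repeat("CC" + "TTAGGG" * 4 + "AT", 6))  # ('TTAGGG', 4, 2)
print(longest_tandem_repeat("GATTACACACACACAGGT", 2))         # ('AC', 5, 4)
```

Comparing the copy number found at the same locus in two samples is the basic idea behind using these repeats for identification.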
Mobile DNA
Move within genomes
Make up most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
• L1 LINE is ~5% of human DNA (~50,000 copies)
• Alu is ~5% of human DNA (>500,000 copies)
Some encode enzymes that catalyze their movement
Transposition
Movement of mobile DNA
Involves copying of the mobile DNA element and insertion into a new site in the genome
Why?
Molecular parasite: “selfish DNA”
Probably have a significant effect on evolution by facilitating exon shuffling and gene duplication, which provides the fuel for evolution
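Transposition as described here (copy the element, insert the copy at a new site) can be modeled as a string operation, which also shows how repeated rounds make an element moderately repeated. This is a toy sketch with a hypothetical function name: the genome is a plain string, the element's coordinates are given, and the enzymology of real transposition is ignored.

```python
def transpose(genome: str, element_start: int, element_end: int,
              target: int) -> str:
    """Copy genome[element_start:element_end] and insert the copy at target.

    Toy copy-and-insert model: the original element stays in place, so the
    genome grows by the element's length. target indexes the input genome.
    """
    element = genome[element_start:element_end]
    return genome[:target] + element + genome[target:]

genome = "ggggTEcccccc"               # host DNA in lower case, element "TE"
once = transpose(genome, 4, 6, 9)
print(once)                           # ggggTEcccTEccc
print(transpose(once, 4, 6, 0))       # TEggggTEcccTEccc
```

Each round adds one more copy, so the element's copy number rises while the host sequence is unchanged apart from the insertions.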
Mitochondrial genome (mtDNA)
Number of mitochondria per plant cell ranges from 50 to 2000
Each mitochondrion contains 1–100 genome copies (multiple circular chromosomes: one large and several smaller ones)
Size ~15 kb in animals
Size ~200 kb to 2,500 kb in plants
mtDNA is replicated before or during mitosis
In plant mitochondria, transcription of mtDNA can yield an mRNA that does not contain the correct information for the protein to be synthesized; RNA editing of the transcript corrects this
Over 95% of mitochondrial proteins are encoded in the
nuclear genome.
Often A+T rich genomes
Chloroplast genome (ctDNA)
Multiple circular molecules, similar to the genomes of procaryotic cyanobacteria, although much smaller (0.001–0.1% of the size of nuclear genomes)
Cells contain many copies of plastids and each plastid contains many genome copies
Size ranges from 120 kb to 160 kb
The plastid genome has changed very little during evolution.
Even when two plants are very distantly related, their plastid genomes are rather similar in gene composition and arrangement
Some plastid genomes contain introns
Many chloroplast proteins are encoded in the nucleus (and carry a separate signal sequence)
“Cellular” Genomes
(Figure: genome types. Viruses: viral genome inside a capsid. Procaryotes: bacterial chromosome and plasmids. Eucaryotes: chromosomes in the nucleus (nuclear genome), mitochondrial genome and chloroplast genome.)
Genome: all of an organism's genes plus intergenic DNA
Intergenic DNA = DNA between genes
Estimated genome sizes
(Figure: ranges of genome size in nucleotides, on a log scale from 10^1 to 10^12, for mammals, plants, fungi, bacteria, mitochondria and viruses. Numbers in parentheses are completely sequenced genomes: bacteria (>100), mitochondria (~100), viruses (1024).)
What Did These Individuals
Contribute to Molecular Genetics?
Anton van Leeuwenhoek
Was the first to observe many kinds of cells under the microscope:
• Bacteria
• Protists
• Red blood cells
Gregor Johann Mendel
Discovered the basic laws of inheritance (founder of genetics)
Walter Sutton
Showed that chromosomes carry the units of heredity (chromosome theory of inheritance)
Thomas Hunt Morgan
Discovered how genes are transmitted through chromosomes
Rosalind Elsie Franklin
Her X-ray diffraction research led to the discovery of the double-helix structure of DNA
James Watson and
Francis Crick
Discovered the double-helix structure of DNA
DNA’s History
1866  Gregor Mendel                                       Laws of heredity
1900  Carl Correns, Hugo de Vries & Erich von Tschermak   Rediscovery of Mendel's laws
1944  Avery, MacLeod & McCarty                            Genes consist of DNA
1952  Hershey & Chase                                     DNA is the genetic material
1953  Watson & Crick                                      DNA double helix
1971  Cohen & Boyer                                       Transformation technology
1972  Berg                                                Recombinant DNA technology
1973  Arber, Smith & Nathans                              Restriction enzymes
Chromosome parts
Chromatid
• one of the two sister strands after replication
• still joined at the centromere
Centromere
• roughly the “middle” of the chromosome
• spindle attachment site
Telomeres
• ends of chromosomes
• important for the stability of chromosome tips
Chromosomal Regions
Heterochromatin
compact;
few genes;
largely structural role
Euchromatin
contains most of the genes.
Chromosome
Gene
The hereditary determinant of a specified difference between individuals
The unit of heredity
The unit that is passed from generation to generation following simple Mendelian inheritance
A segment of DNA that encodes a protein
Any of the units occurring at specific points on the chromosomes, by which hereditary characters are transmitted and determined; each is regarded as a particular state of organization of the chromatin in the chromosome, which consists primarily of DNA and protein
Gene classification (simplified)
Chromosome
• Intergenic region
• Coding genes → messenger RNA → proteins (structural proteins, enzymes)
• Non-coding genes → structural RNA (transfer RNA, ribosomal RNA, other RNA)
Gene
Molecular definition:
DNA sequence encoding protein
What are the problems with this definition?
Gene
Some genomes are RNA instead of DNA
Some gene products are RNA (tRNA, rRNA,
and others) instead of protein
Some nucleic acid sequences that do not
encode gene products (noncoding regions)
are necessary for production of the gene
product (RNA or protein)
Coding region
Nucleotides (open reading frame) encoding the
amino acid sequence of a protein
The molecular definition of gene includes more
than just the coding region
Noncoding regions
Regulatory regions
• RNA polymerase binding site
• Transcription factor binding sites
Introns
Polyadenylation [poly(A)] sites
Gene
Molecular definition:
Entire nucleic acid sequence necessary for the
synthesis of a functional polypeptide (protein
chain) or functional RNA
Bacterial genes
Most do not have introns
Many are organized in operons: contiguous
genes, transcribed as a single polycistronic
mRNA, that encode proteins with related
functions
Polycistronic mRNA encodes several proteins
Bacterial operon
What would be the effect of a mutation in
the control region (a) compared to a
mutation in a structural gene (b)?
Eukaryotic genes
Most have introns
Produce monocistronic mRNA: only one
encoded protein
Large
Eucaryotic genes
Hemoglobin beta subunit gene
Exon 1 Intron A Exon 2 Intron B Exon 3
90 bp 131 bp 222 bp 851 bp 126 bp
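Adding up the segment lengths in the table above shows how much of this gene is intron. A minimal Python sketch using only the table's values (whether the exon lengths include untranslated regions is not stated on the slide):

```python
# Segment lengths (bp) of the hemoglobin beta subunit gene, from the table above.
segments = [
    ("Exon 1", 90), ("Intron A", 131), ("Exon 2", 222),
    ("Intron B", 851), ("Exon 3", 126),
]

gene_bp = sum(length for _, length in segments)
exon_bp = sum(length for name, length in segments if name.startswith("Exon"))
intron_bp = gene_bp - exon_bp

print(gene_bp, exon_bp, intron_bp)           # 1420 438 982
print(f"{intron_bp / gene_bp:.0%} intron")   # 69% intron
```

Even in this relatively small gene, about two thirds of the transcribed length is removed by splicing.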
Splicing
Introns: intervening sequences within a gene that are not translated
into a protein sequence. Collagen has 50 introns.
Exons: sequences within a gene that encode protein sequences
Splicing: removal of introns from the pre-mRNA (primary transcript) to give the mature mRNA.
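Splicing can be pictured as cutting the intron segments out of the primary transcript and joining the exons in order. This is a minimal Python sketch that assumes the intron coordinates are already known; in the cell the spliceosome recognizes the splice sites, which is not modeled. The toy sequence and function name are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as 0-based half-open (start, end) coordinates,
    and join the remaining exons in their original order."""
    pieces = []
    pos = 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[pos:start])  # exon before this intron
        pos = end                           # skip over the intron
    pieces.append(pre_mrna[pos:])           # final exon
    return "".join(pieces)

# Toy pre-mRNA: exons in upper case, introns in lower case.
pre = "AUGGCC" + "guaaag" + "UUCGGA" + "guccag" + "UAA"
print(splice(pre, [(6, 12), (18, 24)]))  # AUGGCCUUCGGAUAA
```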
Alternative splicing
Splicing is the removal of introns
mRNA from some genes can be spliced
into two or more different mRNAs
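One common form of alternative splicing is exon skipping, where internal exons are either kept or left out. The sketch below enumerates every exon-skipping combination for a gene, assuming the first and last exons are always kept and exon order is preserved; other alternative-splicing patterns are not modeled, and a real gene typically produces only some of these mRNAs.

```python
from itertools import combinations

def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Enumerate mRNAs that keep the first and last exon and include any
    subset of the internal exons, in their original order.
    Requires at least two exons."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):  # preserves input order
            isoforms.append(first + "".join(kept) + last)
    return isoforms

print(exon_skipping_isoforms(["E1", "E2", "E3", "E4"]))
# ['E1E2E3E4', 'E1E2E4', 'E1E3E4', 'E1E4']
```

With n internal exons this gives 2^n possible mRNAs, which is how a single gene can encode several different proteins.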