Embed
Email

genome_org1

Document Sample

Shared by: panniuniu
Categories
Tags
Stats
views:
1
posted:
12/12/2011
language:
pages:
46
Genome Organization &

Protein Synthesis and

Processing in Plants

Viral genomes

Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or

ciruclar



Viruses with RNA genomes:

•Almost all plant viruses and some bacterial and animal viruses

•Genomes are rather small (a few thousand nucleotides)

Viruses with DNA genomes (e.g. lambda = 48,502 bp):

•Often a circular genome.

Replicative form of viral genomes

•all ssRNA viruses produce dsRNA molecules

•many linear DNA molecules become circular

Molecular weight and contour length:

• duplex length per nucleotide = 3.4 Å

• Mol. Weight per base pair = ~ 660

Procaryotic genomes

• Generally 1 circular chromosome (dsDNA)

• Usually without introns

• Relatively high gene density (~2500 genes

per mm of E. coli DNA)

• Contour length of E.coli genome: 1.7 mm

• Often indigenous plasmids are present

-lactamase

Plasmids ori

Extra chromosomal circular DNAs

• Found in bacteria, yeast and other fungi

• Size varies form ~ 3,000 bp to 100,000 bp. foreign gene

• Replicate autonomously (origin of replication)

• May contain resistance genes

• May be transferred from one bacterium to another

• May be transferred across kingdoms

• Multicopy plasmids (~ up to 400 plasmids/per cell)

• Low copy plasmids (1 –2 copies per cell)

• Plasmids may be incompatible with each other

• Are used as vectors that could carry a foreign gene of

interest (e.g. insulin)

Eukaryotic genome

• Moderately repetitive

– Functional (protein coding, tRNA coding)

– Unknown function

• SINEs (short interspersed elements)

– 200-300 bp

– 100,000 copies

• LINEs (long interspersed elements)

– 1-5 kb

– 10-10,000 copies

Eukaryotic genome

• Highly repetitive

– Minisatellites

• Repeats of 14-500 bp

• 1-5 kb long

• Scattered throughout genome

– Microsatellites

• Repeats up to 13 bp

• 100s of kb long, 10 6 copies

• Around centromere

– Telomeres

• Short repeats (6 bp)

• 250-1,000 at ends of chromosomes

Eucaryotic genomes

• Located on several chromosomes

• Relatively low gene density (50 genes per mm of

DNA in humans)

• Contour length of DNA from a single human cell = 2

meters

• Approximately 1011 cells = total length 2 x 1011 km

• Distance between sun and earth (1.5 x 108 km)

• Human chromosomes vary in length over a 25 fold

range

• Carry organelles genome as well

Mitochondrial genome (mtDNA)





• Multiple identical circular chromosomes

• Size ~15 Kb in animals

• Size ~ 200 kb to 2,500 kb in plants

• Over 95% of mitochondrial proteins are

encoded in the nuclear genome.

• Often A+T rich genomes.

• Mt DNA is replicated before or during

mitosis

Chloroplast genome

(cpDNA)

• Multiple circular molecules

• Size ranges from 120 kb to 160 kb

• Similar to mtDNA

• Many chloroplast proteins are encoded

in the nucleus (separate signal

sequence)

“Cellular” Genomes

Viruses Procaryotes Eucaryotes

Nucleus









Capsid

Plasmids

Viral genome Bacterial

Chromosomes Mitochondrial

chromosome

(Nuclear genome) genome



Chloroplast

genome

Genome: all of an organism‟s genes plus intergenic DNA

Intergenic DNA = DNA between genes

Estimated genome sizes

mammals

plants

fungi

bacteria (>100)



mitochondria (~ 100)



viruses (1024)





1e1 1e2 1e3 1e4 1e5 1e6 1e7 1e8 1e9 1e10 1e11 1e12

Size in nucleotides. Number in ( ) = completely sequenced genomes

Size of genomes

Epstein-Barr virus 0.172 x 106

E. coli 4.6 x 106

S. cerevisiae 12.1 x 106

C. elegans 95.5 x 106

A. thaliana 117 x 106

D. melanogaster 180 x 106

H. sapiens 3200 x 106

Chromosome organization

Eucaryotic chromosome



Telomere Centromere Telomere



p-arm q-arm

Centromere:

• DNA sequence that serve as an attachment for protein during mitosis.

• In yeast these sequences (~ 130 nts) are very A+T rich.

• In higher eucaryotes centromers are much longer and contain

“satellite DNA”

Telomeres:

• At the end of chromosomes; help stabilize the chromosome

• In yeast telomeres are ~ 100 bp long (imperfect repeats)

• Repeats are added by a specific telomerase

5’ – (TxGy)n x and y = 1 - 4

3’ – (AxCy)n n = 20 to 100; (1500 in mammals)

Gene classification

intergenic

region non-coding

coding genes genes

Chromosome

(simplified)



Messenger RNA Structural RNA





Proteins

transfer ribosomal other

RNA RNA RNA

Structural proteins Enzymes

What is a gene ?

• Definitions

1. Classical definition: Portion of a DNA that determines a

single character (phenotype)

2. One gene – one enzyme (Beadle & Tatum 1940): “Every

gene encodes the information for one enzyme”

3. One gene – one protein: “One gene contains information

for one protein (structural proteins included) one gene –

one polypeptide

4. Current definition: A piece of DNA (or in some cases

RNA) that contains the primary sequence to produce a

functional biological gene product (RNA, protein).

Coding region

Nucleotides (open reading frame) encoding

the amino acid sequence of a protein









The molecular definition of gene includes

more than just the coding region

Noncoding regions

• Regulatory regions

– RNA polymerase binding site

– Transcription factor binding sites

• Introns

• Polyadenylation [poly(A)] sites

Gene

Molecular definition:

Entire nucleic acid sequence necessary for the

synthesis of a functional polypeptide

(protein chain) or functional RNA

Anatomy of a gene

• ORF. From start (ATG) to stop (TGA,

TAA, TAG)

• Upstream region with binding site. (e.g.

TATA box).

• Poly-a „tail‟

• Splices. Bounded by AG and GT splice

signals.

Bacterial genes

• Most do not have introns

• Many are organized in operons: contiguous

genes, transcribed as a single polycistronic

mRNA, that encode proteins with related

functions



Polycistronic mRNA encodes several proteins

Bacterial operon









What would be the effect of a mutation in

the control region (a) compared to a

mutation in a structural gene (b)?

Eucaryotic genes

Hemoglobin beta subunit gene

Exon 1 Intron A Exon 2 Intron B Exon 3

90 bp 131 bp 222 bp 851 bp 126 bp





Splicing







Introns: intervening sequences within a gene that are not translated

into a protein sequence. Collagen has 50 introns.

Exons: sequences within a gene that encode protein sequences

Splicing: Removal of introns from the mRNA molecule.

Regulatory mechanisms

• „organize expression of genes‟ (function

calls)

• Promoter region (binding site), usually near

coding region

• Binding can block (inhibit) expression

• Computational challenges

– Identify binding sites

– Correlate sequence to expression

Eukaryotic genes









• Most have introns

• Produce monocistronic mRNA: only one

encoded protein

• Large

Alternative splicing









• Splicing is the removal of introns

• mRNA from some genes can be spliced into

two or more different mRNAs

“Nonfunctional” DNA





80 kb









• Higher eukaryotes have a lot of noncoding

DNA

• Some has no known structural or regulatory

function (no genes)

Types of eukaryotic DNA

Duplicated genes

• Encode closely related (homologous)

proteins

• Clustered together in genome

• Formed by duplication of an ancestral gene

followed by mutation









Five functional genes and two pseudogenes

Pseudogenes

• Nonfunctional copies of genes

• Formed by duplication of ancestral gene, or

reverse transcription (and integration)

• Not expressed due to mutations that

produce a stop codon (nonsense or

frameshift) or prevent mRNA processing, or

due to lack of regulatory sequences

Repetitive DNA

• Moderately repeated DNA

– Tandemly repeated rRNA, tRNA and histone

genes (gene products needed in high amounts)

– Large duplicated gene families

– Mobile DNA

• Simple-sequence DNA

– Tandemly repeated short sequences

– Found in centromeres and telomeres (and others)

– Used in DNA fingerprinting to identify

individuals

Types of DNA repeats

Perfect repeats vs degenerate repeats

Tandem repeats (e.g. satellite DNA)



5’-CATGTGCTGAAGGCTATGTGCTGCGACG- 3’

3’-GTACACGACTTCCGATACACGACGCTGC- 5’

Inverted repeats (e.g. in transposons)



5’-CATGTGCTGAAGGCTCAGCACATCGACG- 3’ Loop

3’-GTACACGACTTCCGAGTCGTGTAGCTGC- 5’ Stem

• Form stem-loop structures

Palindroms = adjacent inverted repeats

(e.g. restriction sites) Hairpin

• Form hairpin structures

Repetitive sequences DNA

Satellite

Chromosomal DNA







Caesium chloride

Repeats in the mouse genome density gradient



Type No. of Size Percent of

Repeats genome

Highly > 1 Mill 1000 ~ 150 - ~300 bp 20 %

repetitive

DNA repeats and forensics

AluSTXa

Gender determination

1) Standard technique: PCR amplification M F Suspect

of the amelogenin locus

(Males = XY => 103 + 109 bp)

2) AluSTXa Alu insertion on X 878 bp

3) AluSTYa Alu insertion on Y 556 bp



AluSTYa

X-Y homologous regions M F Suspect

AluSTYa

X

Y 528 bp

Alu sequence 199 bp

Mobile DNA

• Move within genomes

• Most of moderately repeated DNA sequences

found throughout higher eukaryotic genomes

– L1 LINE is ~5% of human DNA (~50,000 copies)

– Alu is ~5% of human DNA (>500,000 copies)

• Some encode enzymes that catalyze

movement

Transposition

• Movement of mobile DNA

• Involves copying of mobile DNA element

and insertion into new site in genome

Why?

• Molecular parasite: “selfish DNA”

• Probably have significant effect on

evolution by facilitating gene duplication,

which provides the fuel for evolution, and

exon shuffling

RNA or DNA intermediate



• Transposon moves

using DNA

intermediate

• Retrotransposon

moves using RNA

intermediate

Types of mobile DNA elements

LTR (long terminal repeat)

• Flank viral retrotransposons and retroviruses

• Contain regulatory sequences

Transcription start site and poly (A) site

LINES and SINES

• Non-viral retro-transposons

– RNA intermediate

– Lack LTR

• LINES (long interspersed elements)

– ~6000 to 7000 base pairs

– L1 LINE (~5% of human DNA)

– Encode enzymes that catalyze movement

• SINES (short interspersed elements)

– ~300 base pairs

– Alu (~5% of human DNA)

Proteins

• Most protein sequences (today) are inferred

• What‟s wrong with this?

• Proteins (and nucleic acids) are modified

• „mature‟ Rna

• Computational challenges

– Identify (possible) aspects of molecular life cycle

– Identify protein-protein and protein-nucleic acid

interactions

Genetic variation

• Variable number tandem repeats

(minisatellites). 10-100 bp. Forensic

applications.

• Short tandem repeat polymorphisms

(microsatellites). 2-5 bp, 10-30 consecutive

copies.

• Single nucleotide polymorphisms

Single nucleotide polymorphisms

• 1/2000 bp.

• Types

– Silent

– Truncating

– Shifting

• Significance: much of individual variation.

• Challenge: correlation to disease

Yeast genome

• 4.6 x 106 bp. One chromosome. Published

1997.

• 4,285 protein-coding genes

• 122 structural RNA genes

• Repeats. Regulatory elements. Transposons.

• Lateral transfers.

Yeast protein functions

Regulatory 45 1.05%

Cell structure 182 4.24

Transposons,etc 87 2.03

Transport & binding 281 6.55

Putative transport 146 3.40

Replication, repair 115 2.68

Transcription 55 1.28

Translation 182 4.24

Enzymes 251 5.85

Unknown 1632 38.06



Other docs by panniuniu
Valuation of contingent claims and the
Views: 0  |  Downloads: 0
excel sample
Views: 0  |  Downloads: 0
Bare
Views: 0  |  Downloads: 0
Ch14
Views: 0  |  Downloads: 0
By registering with docstoc.com you agree to our
privacy policy

You are almost ready to download!

You are almost ready to download!