Organization of the human genome

Human genome = nuclear genome + mitochondrial genome
• Human nuclear genome: 24 chromosomes (haploid), 3200 Mbp, 30,000 genes
• Mitochondrial genome: 16,569 bp, 37 genes

Human Mitochondrial Genome
• Small (16.5 kb) circular DNA
• rRNA, tRNA and protein-encoding genes (37 in total); 1 gene per 0.45 kb
• Very few repeats
• No introns; 93% coding
• Genes are transcribed as multimeric transcripts
• Recombination not evident
• Maternal inheritance
• H strand enriched in G; L strand enriched in C
• 7S DNA: short repetitive segment of the H strand attached to the L strand (abortive replication); an element of the triple-stranded DNA structure

What are the mitochondrial genes?
• 24 of 37 genes are RNA-coding
  - 22 mt tRNAs
  - 2 mt ribosomal RNAs (12S, 16S)
• 13 of 37 genes are protein-coding (synthesized on ribosomes inside mitochondria): some subunits of respiratory complexes and oxidative phosphorylation enzymes

Limited autonomy of mitochondria
Component                  mt-encoded    Nuclear-encoded
NADH dehydrogenase         7 subunits    >41 subunits
Succinate CoQ reductase    0 subunits    4 subunits
Cytochrome b-c1 complex    1 subunit     10 subunits
Cytochrome c oxidase       3 subunits    10 subunits
ATP synthase complex       2 subunits    14 subunits
tRNA components            22 tRNAs      none
rRNA components            2 components  none
Ribosomal proteins         none          ~80
Other mt proteins          none          mtDNA pol, RNA pol, etc.

Two overlapping genes encoded by the same strand of mtDNA (unique example)
• Two independent AUG start codons lie in different reading frames relative to each other
• The second stop codon is formed from TA plus an A supplied by the poly(A) tail

Mitochondrial codon table
• 22 tRNAs cover 60 codon positions via third-base wobble

Human Nuclear Genome
• 3200 Mb
• 23 (XX) or 24 (XY) linear chromosomes
• 30,000-35,000 genes; 1 gene per 100 kb
• Introns in most of the genes
• 1.5% of DNA is coding
• Genes are transcribed individually
• Repetitive DNA sequences (45%)
• Recombination at least once for each chromosome
• Mendelian inheritance (X chromosome and autosomes; paternal Y)

Human Genome Organization (From: Dr Finbarr Hayes lecture)
• Nuclear genome: 3000 Mb, 65,000-80,000 genes
  - Genes and gene-related sequences (30%; unique or moderately repetitive)
      Coding DNA (10%)
      Noncoding DNA (90%): pseudogenes, gene fragments, introns, untranslated sequences, etc.
  - Extragenic DNA (70%)
      Unique or low copy number (80%)
      Moderate to highly repetitive (20%): tandemly repeated or clustered repeats; interspersed repeats
• Mitochondrial genome: 16.6 kb, 37 genes
  - Two rRNA genes
  - 22 tRNA genes
  - 13 polypeptide-encoding genes

Human nuclear genome
• Euchromatic portion: 3000 Mb
• Constitutive heterochromatin: 200 Mb
• Heterochromatin is distributed between chromosomes unevenly:

Chr    Total DNA (Mb)    Heterochromatic DNA (Mb)
1      279               30
2      251               3
16     104               15
17     88                3
21     45                11

Gene-poor chromosomes (with extra heterochromatin)
• Short arms of acrocentric chromosomes 13, 14, 15, 21, 22
• Part of the long arms of chr 1, 9, 16
• Long arm of chromosome Y

Human genome base content
• 41% GC on average; 38% GC for chromosomes 4 and 13; 49% for chromosome 19
• Regions with wide swings in GC content (e.g. from 33.1% to 59.3%)
• GC content correlates with Giemsa staining, and so does gene distribution: gene density is higher where GC content is higher

Conspicuous depletion of the CpG dinucleotide
• Expected frequency is 0.042 (4.2%), i.e. 0.205 x 0.205 for a genome with 41% GC
• Observed frequency is five times lower
• The depletion is caused by methylation-dependent mutation: methylated cytosine in CpG deaminates to thymine

CpG islands in the regulatory areas of human genes
• [Figure: location of CpG islands in the gene]
• CpG islands do NOT have a deficit of CpG dinucleotides

REPEATS
3 main components in eukaryotic genomes
DNA purified from a human does not self-anneal along a simple sigmoidal curve.
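For reference, the "simple sigmoidal curve" is what ideal second-order reassociation kinetics would give for DNA made of a single sequence component. A sketch of that standard relation follows; the symbols $C$ (concentration of DNA still single-stranded at time $t$) and $k$ (reassociation rate constant) are introduced here for illustration and do not appear on the slides:

```latex
\frac{C}{C_0} = \frac{1}{1 + k\,C_0 t},
\qquad
C_0 t_{1/2} = \frac{1}{k}
```

Plotted against $\log(C_0 t)$, this is a single sigmoid whose midpoint $C_0 t_{1/2}$ shifts to higher values as sequence complexity increases.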
Instead we see a curve which is the sum of the reannealing curves of many different components.
• The C0t curve is a measure of sequence complexity [figure panels: REPEATS, NO REPEATS]
• C0 = the initial concentration of nucleotides; t = time in seconds

Human C0t DNA (commercial preparation)
This is human DNA which has been denatured and allowed to reanneal to a C0t value of 1. The double-stranded component is then purified from the single-stranded component and is supplied commercially. It contains most of the human repetitive DNA but very little "single copy" DNA (unique genes). It is used to suppress background hybridization of complex probes.

Satellite DNA
• Satellite DNA is repetitive DNA that can be separated by buoyant density
• Equilibrium density gradient centrifugation: sheared DNA in a cesium chloride gradient
• Alpha-satellite (centromere DNA)
• Microsatellites
• Minisatellites
Do you still remember what these are? If not, please refer to the previous lectures and to the book.

Repetitive DNA
• Moderately repeated DNA
  - Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
  - Large duplicated gene families
  - Mobile DNA (transposons), to be discussed soon
• Simple-sequence DNA
  - Tandemly repeated short sequences
  - Found in centromeres and telomeres (and elsewhere) = MINI- and MICROSATELLITES

Human Mobile DNA (transposons)
• Moves within the genome
• LINE (long interspersed nuclear elements)
  - L1, L2, L3: LINEs are ~21% of human DNA (~1,00,000 copies)
• SINE (short interspersed nuclear elements)
  - Alu is ~10.7% of human DNA (1,200,000 copies)
  - MIR, MIR3 are 3% of human DNA (500,000 copies)
• LTR elements (long terminal repeats)
  - ERV and MaLR are 8% of human DNA (500,000 copies)
• DNA transposons
  - MER1 (Charlie), MER2 (Tigger), others: 2.8% of human DNA (350,000 copies)
• TOTAL: approx. 45% of human DNA

RNA or DNA intermediate
• A transposon moves using a DNA intermediate
• A retrotransposon moves using an RNA intermediate
[Figure: LINEs and ERVs. Source: http://www.hos.ufl.edu/mooreweb/]

Long interspersed nuclear elements
(LINEs): 20% of genome
[Figure labels: RNA binding; also endonuclease; internal promoter]
• LINE1: active (also many truncated inactive sequences)
• LINE2: inactive
• LINE3: inactive
• LINEs prefer AT-rich euchromatic bands
• In everyone's genome, 60-100 copies of LINE1 are still capable of transposing and may occasionally cause disease by gene disruption

Mechanism of LINE repeat jumps
1. A full-length LINE transcript is generated from the promoter located in the 5' UTR
2. ORF1 and ORF2 are translated into proteins that stay bound to the LINE mRNA
3. The ORF1/ORF2/mRNA complex moves back into the nucleus
4. The product of ORF2 cuts double-stranded DNA
5. The freed 3' end serves as a primer for LINE reverse transcription, starting from the 3' UTR

ORF2 and ORF1 function
• ORF1 keeps ORF2 and the LINE mRNA bound together and brings them back into the nucleus
• ORF2 (endonuclease) cuts dsDNA to provide a free 3' end as a primer at the LINE 3' UTR
• ORF2 (reverse transcriptase) makes a cDNA copy of the LINE mRNA, which becomes integrated into chromosomal DNA (as it is bound to it by the freed 3' end)
• TTTT/A is the ORF2 endonuclease cleavage site, which is why integration prefers AT-rich regions

LINE replication is not a very efficient process
• The reverse transcriptase of LINE elements is a "weak" enzyme (it has low processivity)
• Many insertions are truncated (the copies are not able to copy themselves further)
• Most insertions are only 900 bp (instead of 6.1 kb); only 1 of 100 insertions is successful
[Figure: full-size LINEs and their fossil derivatives]

Short interspersed nuclear elements (SINEs): 13% of genome
• Non-autonomous (no reverse transcriptase)
• 100-400 bp long
• No open reading frames
• Derived from tRNA (transcribed by RNA pol III, which leaves the internal promoter in the transcript)
• Share sequences with the 3' ends of LINEs
• Depend on the LINE machinery for their movement

Alu elements
• Derived from the signal recognition particle RNA, 7SL
• Do not share their 3' end with a LINE
• The internal promoter is active, but requires an appropriate flanking sequence for activation, so an Alu is active only if lucky with
its integration site
• Integrates in GC-rich sequences
• The only active SINE in the human genome
[Figure credit: Mark A. Batzer and Prescott L. Deininger]

As Alu repeats do not have open reading frames, Alus have to use the RT enzyme and endonuclease provided by LINE repeats or other transposons.
After integration, Alu copies rapidly mutate at the sites of their 24 CpGs.
[Figure: alignment of Alu-subfamily consensus sequences. Mark A. Batzer and Prescott L. Deininger]
[Figure: the expansion of Alu elements in the primate lineage. Mark A. Batzer and Prescott L. Deininger]

Potential Alu-mediated damage to the human genome
• Insertional mutagenesis
• Alu-mediated unequal recombination

Diseases sometimes caused by de novo Alu integration
• Neurofibromatosis (Schwann cell tumors)
• Haemophilia
• Breast cancer
• Apert syndrome (distortions of the head and face and webbing of the hands and feet)
• Cholinesterase deficiency (congenital myasthenic syndrome)
• Complement deficiency (hereditary angioedema)

Diseases sometimes caused by Alu-mediated unequal recombination
• Insulin-resistant diabetes type II (insulin receptor)
• Lesch-Nyhan syndrome (overproduction of uric acid leading to a neurologic syndrome)
• Tay-Sachs disease
• Complement component C3 deficiency
• Familial hypercholesterolaemia
• α-thalassaemia
• Several types of cancer, including Ewing sarcoma, breast cancer, acute myelogenous leukaemia

Positive role of Alu repeats in evolution
• Insertion of the repeat near a gene may change its expression pattern or gene structure, or lead to alternatively spliced mRNA isoforms
• LTRs contain promoters; Alu repeats contain TF binding sites

Human repeat distribution depends on the GC content of integration sites

Alu paradox
• Alu repeats are found in GC-rich (gene-rich) regions more often than in AT-rich regions
• De novo integration of Alu repeats happens in AT-rich areas (as they hijack the ORF2 product of LINEs)
• Alus are subject to positive selection (as they CREATE new genes) by supplying genome segments ready to become genes with
promoter-like elements and exon-like boundaries. They are also GC-rich themselves, so they transform AT-rich regions into GC-rich ones.

LTR transposons
• Any transposon flanked by long terminal repeats
• Both DNA-based transposons and retrotransposons
• DNA-based transposons: contain transposase; already silent in the human genome; fossils (Charlie and Tigger types)
• Endogenous retroviral sequences (ERVs): contain gag and pol genes; only HERV-K still looks capable of moving
[Figure: DNA transposons and retrotransposons, LINE, SINE. Kazazian, Science, Vol 303, Issue 5664]

Human RNA genes (non-coding RNA transcripts)
• 3000 RNA genes in the human genome (rough estimate). Lecturer's note: THIS IS NOT TRUE; MY OPINION IS CLOSE TO 100,000
• rRNA
• tRNA
• Small nuclear RNA
• Small nucleolar RNA
• SRP RNA
• MicroRNA
• Antisense RNA
• Non-coding gene mRNA isoforms; RNAs from transcribed pseudogenes
miRNA and antisense RNA are underestimated; "other non-coding RNAs" are not represented.

rRNA genes (1200 genes)
• 18S, 5.8S and 28S are encoded by single transcription units, located in 5 clusters: Chr. 13, 14, 15, 21, 22
• 5S is in tandem arrays; the largest is on Chr. 1q41-42
• All this serves to increase gene dosage

tRNA genes (497 nuclear genes + 324 putative pseudogenes)
• Humans have fewer tRNA genes than the worm (584), but more than the fly (284)
• The frog X. laevis has thousands of tRNA genes
• The number of tRNA genes correlates with the size of the oocytes: in large oocytes, lots of protein needs to be synthesized simultaneously
• 49 families according to codon recognition (there should be 61, one for every coding triplet); the paradox is resolved by codon wobble
• Very rough correlation between tRNA gene number and amino acid frequency in proteins
• 280 out of 497 genes are on Chr. 6, most of them clustered in the same 4 Mb region; the others are also more or less clustered (Chr.
1 and 7)
• All chromosomes carry at least one tRNA gene; chr. 22 and Y are the exceptions

Representation of amino acids by human tRNAs (examples)
Amino acid        Frequency    Number of tRNAs
Alanine           7.06%        40
Leucine           9.95%        35
Tryptophan        1.30%        7
Valine            6.12%        44
Aspartate         4.78%        10
Cysteine          2.25%        30
Histidine         2.56%        12
Selenocysteine    <0.01%       1

Small nuclear RNA (snRNA)
• Uridine-rich
• Numbered U1, U2, U3, etc.
• Include the spliceosomal RNAs U6 (44 genes) and U1 (16 genes)
• Sometimes clustered, in very irregular or almost perfect groups, e.g. the RNU1 locus at 1p36 and RNU2 at 17q21
• For U6 snRNA, 1135 fragmentary/pseudogenic sequences have been identified

Small nucleolar RNA (snoRNA)
• Employed in the nucleolus to guide site-specific base modifications in rRNA
• Can also modify U6 RNA
• snoRNA genes are often found in the introns of other genes
• Generally not clustered, except for the SNURF-SNRPN unit on 15q, which is possibly involved in Prader-Willi syndrome
• C/D box snoRNAs: site-specific 2'-O-ribose methylation of rRNA (105-107 sites)
• H/ACA snoRNAs: site-specific pseudouridylation (95 sites)

SRP RNA (7SL RNA)
• The protein export machinery of the endoplasmic reticulum binds a protein-RNA complex (signal recognition particle) that contains 7SL RNA
• Four 7SL genes, 500 7SL pseudogenes, and all the Alu repeats, which are derived from the 7SL gene

Micro RNA (miRNA)
• A family of 21-25-nucleotide small RNAs that negatively regulate gene expression at the post-transcriptional level
• Primary transcripts of miRNAs are processed sequentially by two RNase III enzymes, Drosha and Dicer, into a small, imperfect dsRNA duplex (miRNA:miRNA*) that contains the mature miRNA strand plus its complementary strand (miRNA*)
• The RNA-induced silencing complex (RISC) is operated by miRNA:miRNA* and Ago proteins
[Figure: miRNA processing. Elizabeth P Murchison and Gregory J Hannon]
  - RNase III Drosha processes the primary transcript; this form is exported from the nucleus by Exportin-5
  - Dicer cleaves microRNAs into their mature form
  - miRNA is incorporated into effector complexes: the miRNA is recognized by the PAZ domain of an Ago protein, which facilitates transfer of the single-stranded miRNA into RISC
• Depending on the RISC components, RISC may target homologous mRNA for cleavage or stall mRNA translation

Non-coding mRNA with poly(A) tail, transcribed by RNA pol II
• Mid-to-large size mRNA
• For most of them the function is unknown
• Often overexpressed in tumors
• 7SK RNA decreases the rate of RNA pol II elongation and inhibits the activity of CDK9/cyclin T complexes
• SRA RNA: co-activator of steroid receptors
• XIST RNA: X-chromosome inactivation in female cells

Antisense RNA
• TSIX regulates the XIST gene
• Antisense regulation of imprinted genes
• aHIF regulates hypoxia-inducible factor (HIF)-1alpha and HIF-2alpha
• The Makorin-2 gene is antisense to the RAF1 oncogene
• RFP2 CLL candidate gene and the RFP2OS transcript
• Antisense beta myosin heavy chain RNA switches myosin heavy chain gene expression from myosin beta to myosin alpha in heart muscle

Polypeptide-encoding genes
• In the human genome, clusters of gene-rich regions are separated by gene deserts
• Chr. 19 has the highest gene density; Chr.
13 and Y show the lowest gene density
• Total gene number is estimated at 30,000-40,000, with an average gene size of 27 kb
• Hundreds of human genes share homology with bacterial genes

Some more statistics
• Gene density is 1 per 100 kb (varies widely)
• 9 exons per gene on average
• 363 exons in the titin gene
• Many genes are intronless
• Largest intron is 800 kb (WWOX gene)
• Smallest introns: 10 bp
• Average 5' UTR: 0.2-0.3 kb
• Average 3' UTR: 0.77 kb, but underestimated
• Largest protein: titin, 38,138 aa

INTRONLESS GENES
• Interferon genes
• Histone genes
• Many ribonuclease genes
• Heat shock protein genes
• Many G-protein coupled receptors
• Some genes with HMG boxes
• Various neurotransmitter receptors and hormone receptors

[Figures: smallest human genes; typical human genes; extra-large human genes. Percentages describe exon content relative to the length of the gene. IG genes are shown as germline genes, before rearrangements.]

Transcription of long introns is costly: in highly expressed genes, introns are 14 times shorter than in genes expressed at a low level (Castillo-Davis et al., 2002).

[Figure: presumable functions of human genes]
[Figure: human genes and their homology to genes from other organisms]

Why do we humans, kings of nature, have so few genes?
• Human: 30,000 genes
• Drosophila: 13,000
• Nematode: 19,000
The potential for proteome and transcriptome diversity is so great that there is no need to increase the number of genes.

Gene families
• Functionally identical genes
  - Recently duplicated genes (alpha-globins)
  - Histone genes (86 members, some are identical)
  - Ubiquitin-encoding genes (some are in polycistronic transcription units)
• Functionally similar genes usually also arise by duplication, then diverge
• Functionally related genes belong to the same pathway or encode subunits of a protein complex (usually unrelated in sequence)
[Figure: chromosomal distribution of human histone genes]

Bidirectional and partially overlapping genes
• Not very common in the human genome, as a density of 1 gene per 100 kb allows genes to be loosely spaced
• Provide a possibility for common regulation of a gene pair
• Partially overlapping genes are usually encoded by opposite DNA strands
• Found in gene-dense areas, such as the HLA class III complex on 6p21.3
• Could represent a sense-antisense pair in which one gene encodes an mRNA and the other is non-coding

HLA class III complex on 6p21.3: an example of tightly packed genes
• MHC class III genes encode complement proteins C4A and C4B, C2 and factor B, and tumour necrosis factors α and β
• Plus some immunologically irrelevant genes: genes encoding 21-hydroxylase, RNA helicase, casein kinase, heat shock protein 70, sialidase

An example of a complex human gene locus: INK4a-ARF (From: Prof. Gordon Peters website)

Genes within genes
• Intron 26 of the neurofibromatosis gene (NF1) encodes OMGP (oligodendrocyte myelin glycoprotein), EVI2A and EVI2B (homologues of ecotropic viral integration sites in mouse)

Gene families
• Classical gene families (conserved overall): histones, alpha- and beta-globins
• Gene families with large conserved domains (other parts may be poorly conserved): HLH/bZIP box transcription factors
• Gene families with short conserved motifs, e.g.
DEAD box (Asp-Glu-Ala-Asp), WD repeat

Example of human gene families clustered together
[Figure labels: CS = chorionic somatomammotropin; four placenta-specific genes, primates only; serum albumin, alpha-albumin, vitamin D-binding protein]

Example of human protein motifs
• DEAD box proteins are involved in mRNA splicing and translation initiation; they have 8 conserved boxes, of which DEAD is the most evident
• WD proteins take part in a variety of regulatory functions; GH (Gly-His) should be at a distance of 23-41 aa from WD (Trp-Asp)

Gene superfamilies
• Proteins that are functionally related in a general sense, but show only weak homology
• Immunoglobulin superfamily (IG genes, T-cell receptor genes, HLA genes, ...)
• Globin superfamily (myoglobin, alpha- and beta-globins, neuroglobin, etc.)
• G-protein coupled receptor superfamily (seven transmembrane domains, but low homology)
• And so on
[Figure: illustration of a gene superfamily]

Major mechanisms of gene family spreading
• Ancient gene or chromosomal segment duplications
  - Tandem duplications
  - Duplications with gene transfer to another chromosome
• Retrotransposition events (processed copies only, with no introns)

Fig. 33: the finished human genome sequence has 1.5% interchromosomal and 2% intrachromosomal segmental duplications. The duplications are 10-50 kb long and highly homologous.
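The short-motif idea from the protein motif slide (a DEAD tetrapeptide, and a GH dipeptide placed 23-41 residues before a WD dipeptide) can be sketched as a pattern search. This is a minimal illustration that assumes a one-letter amino acid string as input; the function name `find_motifs` and the literal patterns are hypothetical simplifications, not a real domain-detection method, which would use profile-based tools instead.

```python
import re

# Literal DEAD tetrapeptide (Asp-Glu-Ala-Asp).
DEAD_BOX = re.compile(r"DEAD")

# GH followed by 23-41 arbitrary residues and then WD, using the
# spacing quoted on the slide. The lazy quantifier picks the nearest WD.
WD_REPEAT = re.compile(r"GH.{23,41}?WD")


def find_motifs(protein: str) -> dict:
    """Return 0-based start positions of both toy motifs in a protein string."""
    seq = protein.upper()
    return {
        "dead_box": [m.start() for m in DEAD_BOX.finditer(seq)],
        "wd_repeat": [m.start() for m in WD_REPEAT.finditer(seq)],
    }


if __name__ == "__main__":
    # Synthetic sequence built only to exercise the patterns.
    toy = "MK" + "DEAD" + "AAAAA" + "GH" + "S" * 30 + "WD" + "K"
    print(find_motifs(toy))
```

Because `finditer` reports non-overlapping matches only, overlapping repeats would be undercounted in this sketch.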
Human gene families extended recently by gene duplication
• Some regions of the genome are more prone to rearrangements than others
[Figure: chromosome 22q]

Human pseudogenes
• Non-processed pseudogenes
  - Contain introns
  - Arise by duplications
  - Frequency of transfer depends on the chromosomal context (pericentromeric fragments are transferred more often)
• Processed pseudogenes
  - Do not contain introns
  - Arise by retrotransposition
  - Frequency of transfer depends on the initial level of gene expression (highly expressed genes are transferred more often)
• Both types can be complete or partial
• Both types of pseudogenes are raw material for evolution

[Figure: HLA type I cluster; domain structure of a typical HLA type I gene; complete pseudogenes; partial pseudogenes and their structure]

NF1 gene and its pseudogenes on different chromosomes
• All NF1 pseudogenes are partial; 11 of them are found in the genome

Mechanism of processed pseudogene transfer into a new location
• Could be very prolific: there are 95 functional ribosomal genes and 2090 pseudogenes

Transcription from pseudogenes
[Figure labels: master gene on Chr 1; copies on Chr 15 and Chr 7]
• Partial duplication with preservation of the promoter: expression is preserved in evolution if the transcript encodes a partial (regulatory) protein, or if rare transcription factor sites are present in both promoters
• LINE-mediated inclusion of a cDNA copy of the master gene: brought under a heterologous promoter by chance; could be antisense

Human Genome Project
• The International Human Genome Consortium. Initial sequencing and analysis of the human genome. Nature, 409, February 15, 860-921 (2001)
• Venter et al.
(Celera). The Sequence of the Human Genome. Science, 291, February 16, 1304-1351 (2001)

History of the Human Genome Consortium
• 1984 to 1986: first proposed at US DOE meetings
• 1988: endorsed by the US National Research Council
  - creation of genetic, physical and sequence maps of the human genome
  - parallel efforts in five model organisms: bacteria, yeast, worms, flies and mice
  - development of supporting technology
  - ethical, legal and social issues (ELSI)
• 1990: Human Genome Project (NHGRI) with NIH
• Later: UK, France, Japan, Germany, China, Russia

Technical developments necessary for human genome completion
1. Automated capillary-based DNA sequencing
2. Electronic databases (GenBank, UniGene) and sequence assembly software

Completed sequences
• 1995: first complete bacterial genomes
• 2002: about 35 bacterial genomes; 0.5-5 Mb; hundreds to 2000 genes
• 1996 April: yeast (Saccharomyces cerevisiae), 12 Mb, 5,500 genes
• 1998 Dec.: worm (Caenorhabditis elegans), 97 Mb, 19,000 genes
• 2000 March: fly (Drosophila melanogaster), 137 Mb, 13,500 genes
• 2000 Dec.: mustard (Arabidopsis thaliana), 125 Mb, 25,498 genes
• 2000 June: human (Homo sapiens), 1st rough draft
• 2001 Feb 15/16: human, "working draft", 3000 Mb, 35,000-40,000 genes
• Mouse, rat, chimp

Sequencing strategies
• BAC-by-BAC shotgun (public consortium): a clone contig is a prerequisite
• Total shotgun (Celera): no prerequisites

Prerequisite of human genome sequencing: genetic and physical maps
• Genetic mapping: based on recombination frequency (expressed in cM). Key word: co-segregation
• Physical mapping: actual molecular distance in nucleotide base pairs (expressed in bp, kb or Mb). Key word: contig
• Genetic maps are important crutches for physical maps

Genetic markers for mapping
• Polymorphic markers for genetic mapping:
  - RFLPs: restriction fragment length polymorphisms
  - SSRs: simple sequence repeats (also called microsatellites)
• For high-resolution physical mapping:
  - STSs: sequence-tagged sites
  - ESTs: expressed sequence tags

[Figure: fragment of a human genetic map]
One map unit = one centimorgan (cM) = 1% recombination between loci

Physical maps

Ways to create a genetic map: analysis of large human pedigrees
The CEPH Family Panel (Centre d'Etude du Polymorphisme Humain): 40 nuclear families
• 10 are French families
• 27 are Utah Mormon pedigrees
• 2 are Venezuelan Huntington's pedigrees
• 1 is an Old Order Amish pedigree (with bipolar affective disorder segregating)
The total number of individuals in the panel is 520, with an average sibship size of 8.

Typical task of CEPH-based research: to integrate a new polymorphic marker into the human genetic map
1) PCR the given marker in DNA from members of the CEPH families
2) Compare the segregation pattern of the given marker with the segregation patterns of other known (already mapped) markers
3) Infer the genetic location of the marker of interest from its co-segregation with a known marker

Ways to create a physical map:
1. Somatic cell hybrids
2. Radiation hybrids
3. Enrichment of starting DNA for library construction
  - chromosome flow-sorting
  - microdissection
4. Contig construction from genomic fragments
  - BAC, YAC, PAC
  - cosmids
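The map-unit definition given above (one centimorgan = 1% recombination between loci) can be turned into a small calculation of the kind used when a new marker is compared with an already mapped one. The sketch below is illustrative only: the function name and the counts are hypothetical, and the direct percentage-to-cM conversion is assumed to hold only for closely linked loci, since multiple crossovers make it underestimate larger distances.

```python
def map_distance_cm(recombinants: int, informative_meioses: int) -> float:
    """Estimate map distance in centimorgans from offspring counts.

    recombinants        -- offspring in which the two markers did not co-segregate
    informative_meioses -- total offspring scored for both markers
    """
    if informative_meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    if not 0 <= recombinants <= informative_meioses:
        raise ValueError("recombinants must be between 0 and the total")
    # 1% recombination between loci corresponds to 1 cM.
    return 100.0 * recombinants / informative_meioses


if __name__ == "__main__":
    # Hypothetical counts: 8 recombinant children among 200 scored.
    print(map_distance_cm(8, 200), "cM")
```

A value near 50 cM would indicate no detectable linkage between the two markers, so in practice only much smaller values are used to place a marker.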
Somatic cell hybrids
[Figure source: cgil.uoguelph.ca/QTL/SomaticCellHybrids.htm]
• Each of the resulting cell colonies contains a full mouse genome plus a few human chromosomes
• The resulting colonies are stable
• Monochromosomal hybrids are the most useful
• 24 colonies = 24 PCRs = chromosomal location of your sample

Radiation hybrids
• Whole-genome radiation hybrids
• RH maps are constructed by typing a panel of hybrids with a set of human DNA markers
• Only a PROPORTION of the pieces of the broken human chromosomes will integrate into the rodent chromosomes

RH panels available before the human genome sequence
Panel           Cell lines          Human genome per line (average)    Average fragment size
GeneBridge 4    93 human-hamster    32%                                25 Mb
Stanford G3     83 human-hamster    16%                                2.4 Mb (allows finer mapping)
Both operate via databases on corresponding central servers.

Creation of representative clone libraries
• In shotgun cloning we hope that randomly picked fragments will cover every piece of the genome
• Cover the whole genome with handy bacterial clones, then screen for the one that you need
[Figure: contig and contiguous maps. Source: www.biozentrum.uni-wuerzburg.de/ .../weenie/fig1.html]

Shotgun cloning (clone everything, hope for the best)
How many clones would be required to have a complete library of the human genome?
$N = \ln(1 - P) / \ln(1 - f)$
where N = number of colonies required, P = probability of recovering any particular sequence, and f = average fraction of the genome per clone (clone/genome ratio).
• For the human genome of $3 \times 10^{9}$ bp, the pUC18 plasmid accepts a 5,000 bp insert, so $f = 5000 / (3 \times 10^{9})$
• Set P = 0.999 to be on the safe side, i.e. 99.9% of the genome is represented at least once, or any particular gene segment is present with 99.9% probability
• Then $N = 4.1 \times 10^{6}$ plasmid clones are required
• You will need fewer clones if the average insert size is larger

Partial restriction with Sau3A
[Figure source: ww2.mcgill.ca/biology/undergra/c200a/f07-16.gif]
• Sau3A recognizes the 4-bp sequence GATC and leaves sticky ends
• Partial digestion of this region of DNA would yield a variety of overlapping fragments (blue), ~20 kb long
• Use of such overlapping fragments increases the probability that all sequences in the genomic DNA will be represented in a lambda library
[Figure: gridded library in a 96-well plate]

For human genome sequencing:
• Underlying map: YAC map (YACs are too large for sequencing, are often rearranged and are difficult to handle)
• Map subjected to sequencing: BAC/PAC map
• The most important and interesting regions were mapped by cosmid-based maps

Yeast artificial chromosome (YAC)
YAC vector capacity is 150 kb to 2 Mb.
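The library-size formula above is easy to check numerically, which also shows how strongly the answer depends on insert size. The following is a minimal sketch: the function name is hypothetical, and the cosmid, BAC and YAC insert sizes are illustrative round numbers chosen for the comparison, not values prescribed by the slides.

```python
import math


def clones_required(genome_bp: float, insert_bp: float, p: float = 0.999) -> int:
    """Number of clones N with N = ln(1 - P) / ln(1 - f), f = insert/genome."""
    if not 0.0 < p < 1.0:
        raise ValueError("P must be strictly between 0 and 1")
    if not 0.0 < insert_bp < genome_bp:
        raise ValueError("insert must be smaller than the genome")
    f = insert_bp / genome_bp
    # log1p(-f) equals ln(1 - f) but stays accurate when f is tiny.
    return math.ceil(math.log(1.0 - p) / math.log1p(-f))


if __name__ == "__main__":
    genome = 3e9  # human genome size used on the slide
    # Insert sizes other than the 5 kb plasmid are assumed for illustration.
    for vector, insert in [("plasmid", 5_000), ("cosmid", 45_000),
                           ("BAC", 150_000), ("YAC", 1_000_000)]:
        print(f"{vector:8s} {insert:>9,d} bp -> {clones_required(genome, insert):,d} clones")
```

For the 5 kb plasmid case the result is about 4.1 million clones, in line with the figure quoted on the slide, and the count falls roughly in proportion to the insert size.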
YACs are propagated in yeast cells. The YAC vectors contain:
1) a yeast centromere (CEN) and yeast telomeres (TEL)
2) an autonomously replicating sequence (ARS) = origin of replication
3) the URA3 gene, involved in uracil synthesis, for positive selection of yeast cells carrying the YAC (in a URA- host strain)
4) a bacterial replication origin (ORI) and a bacterial selection marker (Amp), to propagate the empty vector
[Figure source: www.indstate.edu/thcme/mwking/yac.gif]

BAC (Bacterial Artificial Chromosome)
• Capacity = 100-200 kb
[Figure source: www.labs.roslin.ac.uk/jwilliam/bacinfo.html]

Cosmid vectors
• Cosmid = plasmid that contains the cos site (packaging signal) of lambda phage
• Vector capacity = 42-45 kb
• Vector size = 5 kb
• Phage-derived advantages: size selection, large capacity
• Plasmid-derived advantages: plasmid minipreps, growth as a colony
[Figure sources: www.web-books.com/MoBio/Free/images/Ch9A6.gif; http://www.epicentre.com/f5_3/f5_3pw3.gif]
[Figure: cosmid map in the human PDE4A region]

For human genome sequencing, the YAC, BAC/PAC and cosmid maps were verified by STS maps. The STS maps included:
1) Lots of polymorphic STR markers (genetic maps)
2) Lots of short sequences mapped on RH hybrid panels or on somatic cell hybrids
3) Lots of ESTs (mostly representing genes)

For human genome sequencing:
• BACs/PACs representing verified contigs were subcloned into M13 vectors (short inserts) and randomly sequenced
• The resulting sequences of every BAC/PAC clone were aligned by the PHRED/PHRAP software
  - PHRED analyses raw sequence traces and provides a quality score for each base position (estimating the degree of confidence)
  - PHRAP performs the sequence assembly itself
[Figure: PHRED, a base-calling program that works with different image-processing settings and gives the most accurate quality scores; PHRAP aligns the sequenced clones]

Problems with genomic sequence alignments
• High-copy repeats (Alu, LINEs)
• Segmental duplications
  - Sometimes the blocks are very large (>200 kb)
  - Highly similar (>97%)
  - No characteristic sequences

Duplication detection (From: Dr.
Vicky Choi lectures)
[Figure: over-representation of Celera reads in the duplicated regions of clones; panels for 5-10 copies, 10-20 copies and ~40 copies]

Human gene maps

Gene prediction
• Signal-based: starts, stops, splicing signals, promoters (GRAIL, GENEFINDER)
• Content-based: ORFs, codon preference of the organism, EST coverage

GRAIL algorithm
1. Generate the list of all exon candidates
  - translational start, splice donor, splice acceptor, translational stop
  - thousands of candidate exons per 10 kb of sequence
2. Remove improbable exon candidates using a set of 30 rules (95% of candidates are removed)
3. Evaluate the remaining exon candidates with a neural network
  - the exon candidate score is the output node

Gene predictions are inaccurate
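Step 1 of the list above (enumerating every exon candidate from raw sequence signals) explains why the candidate list is so large before filtering. The sketch below is a toy illustration of that enumeration only; it is not the GRAIL implementation, and the function names, the bare GT/AG splice signals and the minimum-length cutoff are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def signal_positions(dna: str) -> dict:
    """Return 0-based positions of the simplest possible sequence signals."""
    seq = dna.upper()
    triplets = range(len(seq) - 2)
    pairs = range(len(seq) - 1)
    return {
        "start": [i for i in triplets if seq[i:i + 3] == "ATG"],
        "stop": [i for i in triplets if seq[i:i + 3] in STOP_CODONS],
        "donor": [i for i in pairs if seq[i:i + 2] == "GT"],     # intron begins with GT
        "acceptor": [i for i in pairs if seq[i:i + 2] == "AG"],  # intron ends with AG
    }


def internal_exon_candidates(dna: str, min_len: int = 3) -> list:
    """Pair every acceptor with every downstream donor.

    Each candidate is a half-open interval (exon_start, exon_end): the exon
    begins right after the acceptor AG and ends right before the donor GT.
    """
    signals = signal_positions(dna)
    candidates = []
    for acceptor in signals["acceptor"]:
        exon_start = acceptor + 2
        for donor in signals["donor"]:
            if donor - exon_start >= min_len:
                candidates.append((exon_start, donor))
    return candidates


if __name__ == "__main__":
    toy = "CCAGAAACCCGTAA"  # synthetic sequence, one acceptor and one donor
    print(signal_positions(toy))
    print(internal_exon_candidates(toy))
```

Because GT and AG each occur by chance about once every 16 bases, pairing them all yields a very large number of candidates on real sequence, which is why the rule-based filtering and scoring steps are needed.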