Genomics

The Human Genome Project



 Mapping and Sequencing the Genomes of Model Organisms

 Data Collection and Distribution

 Ethical, Legal, and Social Considerations

 Research Training

 Technology Development

 Technology Transfer

A Few Genome Resources



 NCBI Genome Resources

 UCSC Human Genome Browser

 Ensembl Human Genome Server

Genome Sequencing Progress



 NCBI Genome Sequence Repository

 All organisms

 Eukaryotic genomes

 Prokaryotic genomes

 Archaea genomes

 Viruses

Genome Sequencing









From NCBI, 5/2001

Human Genome Sequencing 2/11/2001









From NCBI

Human Genome Progress 2/11/2001





             Total sequence (kb)   Non-redundant sequence (kb)   Percentage of genome
Finished          1,140,365                 1,040,372                  32.50%
Unfinished        3,547,899                 1,951,344                  61.00%
Total             4,688,264                 2,991,716                  93.50%









From NCBI

Microbial Genomes



 Published complete microbial genomes

 Microbial genomes and chromosomes in progress

Genome Informatics



 Annotation and Analysis

 Data Handling

 Metabolic Reconstruction

 Comparative Genomics

 Functional Genomics

Genome Project Organization



 Cloning

 Mapping

 Sequencing

 Annotation

 Analysis

Cloning and Mapping

Cloning



 Large

 YACs

 1 Mb

 BACs

 100 - 200 Kb

 Intermediate

 Cosmids

 Lambda clones

 Small

 Plasmids; M13

Mapping



 Establishment of Guideposts

 Aids in Assembly

 Error Checking

 Useful in mapping of genetic disorders

Genetic Maps



 Cytogenetic markers

 Linkage maps

 Polymorphic loci screened by PCR to determine inheritance patterns

 Produce linkage map with nearby loci

Physical Maps



 Radiation Hybrid/YACs/Cosmids

 Restriction Sites

 Sequence Tagged Sites

 100 Kb resolution needed

 30,000 STS’s

 Expressed Sequence Tags

 Detection

 PCR

 Hybridization

 FISH

 Fluorescent in situ hybridization
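The 30,000 STS figure follows from dividing the genome length by the target marker spacing. A minimal sketch of that arithmetic, assuming a haploid human genome of roughly 3,000 Mb (a round figure that the slide does not state) and evenly spaced markers:

```python
# Assumption: haploid human genome size of about 3,000 Mb (not given on the slide).
GENOME_SIZE_BP = 3_000_000_000
# From the slide: "100 Kb resolution needed".
TARGET_SPACING_BP = 100_000


def markers_needed(genome_size_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced markers needed for the given average spacing."""
    return -(-genome_size_bp // spacing_bp)  # ceiling division


print(markers_needed(GENOME_SIZE_BP, TARGET_SPACING_BP))
```

Real markers are not evenly spaced, so this is a lower bound on the number needed to guarantee 100 Kb spacing everywhere.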

Human Genome STS Mapping Strategy



 STS Content Mapping

 Screen YACs by PCR

 Radiation Hybrid Mapping

 Screen RH Cell lines by PCR

 Genetic Mapping

 PCR Screening of polymorphic loci

 Combine above to produce an integrated map

Mapping Resolution



 YAC mapping

 1 Mb

 Radiation hybrid mapping

 10 Mb

 Genetic map

 30 Mb

GeneMap’98



 Integrated Human Genetic Map

 Over 30,000 unique gene-based markers

 100 Kb resolution

 http://www.ncbi.nlm.nih.gov/genemap98/

Map Integration

Human Chromosome 1 Genetic Map

Human Chromosome 1 Combination Map

Sequencing

Sequencing Methods



 Random Shotgun

 Ordered Shotgun

 Directed

 Primer Walking

 Direct genomic sequencing

Random Shotgun Sequencing



 Randomly shear or cut DNA into small pieces

 2-4 Kb

 Clone into M13, pUC or some other sequencing vector

 Sequence the clones from both ends

 Rely on the computer to assemble the sequences into one (or as few as possible) contigs
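To make the assembly step concrete, here is a toy greedy overlap assembler. It is only a sketch under strong assumptions (error-free reads, a single strand, no repeats) and is not the algorithm used by any production assembler; the function names and the example sequence are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up 25 bp "genome" and four overlapping reads taken from it.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [genome[0:10], genome[6:16], genome[12:22], genome[17:25]]
print(greedy_assemble(reads))
```

With real data, repeats longer than the reads and sequencing errors make the greedy choice unreliable, which is one reason a shotgun project ends with several contigs rather than one.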

Shotgun Sequencing Statistics



 Lander and Waterman equation

 Poisson distribution

 $P_0 = e^{-m}$

 probability that a base is not sequenced, where m = sequence coverage
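The relationship can be tabulated directly. A small sketch that evaluates the probability for a few coverage values, under the same idealized assumption of uniformly random reads (the function name is illustrative):

```python
import math


def fraction_unsequenced(coverage: float) -> float:
    """Poisson probability that a given base is covered by zero reads."""
    return math.exp(-coverage)


for m in (1, 2, 5, 8):
    p0 = fraction_unsequenced(m)
    print(f"{m}X coverage: P0 = {p0:.4f}, sequenced = {1 - p0:.2%}")
```

Cloning bias and uneven shearing make real coverage less uniform than this model, so the values are best read as optimistic estimates.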

H. influenzae Sequencing



 For 1X random sequence coverage = 1.8 Mb

 $P_0 = e^{-1} \approx 0.37$ (63% of the bases are sequenced)

 To get > 99% of the bases sequenced

 5X coverage = 8.74 Mb of sequence

 $P_0 = e^{-5} \approx 0.0067$

 This coverage would leave approx. 128 gaps of about 100 bp in size

 From Science 269:496-512. 1995
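The gap estimate can be approximated with the same Poisson model: with n reads at coverage m, roughly n·e^(-m) gaps are expected, covering about L·e^(-m) bases in total. The sketch below assumes an average read length of 460 bp, which is an assumption and not a figure from the slide, so its output only approximates the slide's numbers.

```python
import math


def shotgun_gap_estimate(genome_bp: float, read_bp: float, coverage: float) -> dict:
    """Rough gap statistics for an idealized random shotgun project."""
    n_reads = coverage * genome_bp / read_bp  # reads needed for this coverage
    p0 = math.exp(-coverage)                  # chance a base is missed
    return {
        "reads": n_reads,
        "unsequenced_bp": genome_bp * p0,     # total bases left in gaps
        "gaps": n_reads * p0,                 # expected number of gaps
        "mean_gap_bp": genome_bp / n_reads,   # unsequenced_bp / gaps
    }


# 1.8 Mb is the slide's 1X figure; 460 bp per read is an assumed value.
estimate = shotgun_gap_estimate(genome_bp=1_800_000, read_bp=460, coverage=5)
for key, value in estimate.items():
    print(f"{key}: {value:,.0f}")
```

Under these assumptions the estimate lands in the same range as the slide (on the order of 130 gaps of roughly 90 to 100 bp each); the exact values shift with the assumed read length.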

Ordered Sequencing



 Generate a set of large sequence clones in lambda phage

 May be subcloned from YACs or BACs as necessary

 End sequence the lambda clones and order the clones to produce a map of the genome

 Choose a minimal tiling path of the genome from the ordered lambda clones
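Choosing a minimal tiling path from mapped clones is an interval-covering problem. A minimal sketch using the standard greedy approach, assuming each clone is already placed on the map as a (name, start, end) interval; the clone names and coordinates are invented for illustration.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end); end is exclusive


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Pick the fewest clones that cover [region_start, region_end).

    Greedy rule: among clones that start at or before the covered
    position, take the one that reaches furthest.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical clone map (coordinates in kb).
clones = [("A", 0, 40), ("B", 10, 55), ("C", 30, 90),
          ("D", 50, 100), ("E", 85, 130), ("F", 95, 120)]
print([name for name, _, _ in minimal_tiling_path(clones, 0, 130)])
```

This version lets neighbouring clones merely touch; a real project would normally require some minimum overlap between adjacent clones so that their sequences can be joined.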

Ordered Sequencing...



 Shear and subclone the lambda inserts that comprise the minimal tiling set into sequencing vectors

 Shotgun sequence and assemble each of these lambda inserts individually

 Assemble all sequences into one, contiguous genome

Directed Sequencing



 Process used for finishing following the shotgun sequencing phase

 Gap closure

 Use specific sequencing primers to extend appropriate clones into gap regions

 Use specific sequencing primers to sequence directly from genomic DNA

Sequence Assembly

Assembly of Shotgun Fragments



 For H. influenzae (TIGR) 1.8 Mb

 24,304 Sequence fragments were generated

for the random assembly phase

 11,631,485 bases

 Generated 140 contigs

 Assembled using the TIGR Assembler

 30 hours of cpu time
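The figures above imply an average read length and a coverage depth. A short sketch of that arithmetic, using only the numbers on the slide and treating "1.8 Mb" as exactly 1,800,000 bp (a rounding assumption):

```python
import math

fragments = 24_304        # sequence fragments in the random phase
total_bases = 11_631_485  # bases in those fragments
genome_bp = 1_800_000     # "1.8 Mb", taken as a round number

mean_read_bp = total_bases / fragments
coverage = total_bases / genome_bp
p_missed = math.exp(-coverage)  # Poisson estimate of unsequenced fraction

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"coverage: {coverage:.2f}X")
print(f"expected unsequenced fraction: {p_missed:.4f}")
```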

phred/phrap/consed



 Widely used programs for sequence:

 base calling (phred)

 assembly (phrap)

 editing (consed)

 Developed at the University of Washington

 Phil Green (phrap)

 Brent Ewing (phred)

 David Gordon (consed)

Genome Annotation and Analysis



Pattern Matching

Sequence Annotation



 ORF identification

 Frameshift resolution

 Genome map construction

 Functional assignments

 Metabolic pathway assignment

 Metabolic pathway Reconstruction

 Comparative analysis
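ORF identification, the first step in the list, can be illustrated with a simple six-frame scan. This is a sketch only: it treats ATG as the sole start codon, ignores frameshifts, and uses an arbitrary minimum length. The stop codon set is a parameter because mycoplasmas and ureaplasmas read TGA as tryptophan rather than stop.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
MYCOPLASMA_STOPS = frozenset({"TAA", "TAG"})  # TGA encodes Trp in these genomes
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100, stops=STANDARD_STOPS):
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 0-based on the strand that was scanned, `end` is
    exclusive and includes the stop codon. `min_codons` counts codons
    before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in stops:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos + 3))
                    start = None
    return orfs


# Tiny made-up sequence: the reading frame changes length with the stop set.
toy = "ATGTGAAAATAA"
print(find_orfs(toy, min_codons=1))
print(find_orfs(toy, min_codons=1, stops=MYCOPLASMA_STOPS))
```

Using the standard stop set on a mycoplasma genome would truncate many genes at internal TGA codons, which is one way annotation errors can enter a database.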

Annotation Tools



 Semi-automated

 Manual

MAGPIE



 Multipurpose Automated Genome Project Investigation Environment

 Terry Gaasterland et al.

 http://genomes.rockefeller.edu/magpie/magpie.html

 Semi-automated analysis tool for microbial genome projects

MAGPIE Example

Non-Automated Analysis and Prediction



 The Ureaplasma urealyticum genome database

 Run analysis tool

 Parse results

 Dump results into the database

 View results

 Manually annotate
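The run, parse, store, view, annotate cycle can be sketched with the Python standard library. Everything specific here is hypothetical: the tab-delimited result format, the table layout, and the ORF identifiers are invented for illustration and do not describe the actual Ureaplasma urealyticum database.

```python
import csv
import io
import sqlite3

# Hypothetical output of an analysis tool (tab-delimited, made-up rows).
RAW_RESULTS = """\
orf_id\thit_description\te_value
ORF_0001\texample protein A\t1e-80
ORF_0002\thypothetical protein\t3e-05
"""


def parse_results(text: str):
    """Parse tab-delimited tool output into (orf_id, description, e_value) rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(r["orf_id"], r["hit_description"], float(r["e_value"])) for r in reader]


def load_results(conn: sqlite3.Connection, rows) -> None:
    """Dump parsed rows into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(orf_id TEXT, description TEXT, e_value REAL, curated_note TEXT)"
    )
    conn.executemany(
        "INSERT INTO hits (orf_id, description, e_value) VALUES (?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_results(conn, parse_results(RAW_RESULTS))

# Manual annotation step: a curator records a note against one ORF.
conn.execute(
    "UPDATE hits SET curated_note = ? WHERE orf_id = ?",
    ("reviewed by curator", "ORF_0001"),
)

# View results.
for row in conn.execute(
    "SELECT orf_id, description, e_value, curated_note FROM hits ORDER BY orf_id"
):
    print(row)
```

Keeping the curated note in a separate column from the tool output preserves the distinction between predicted and manually reviewed annotation.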

Genomic Sequence Database



 Data Storage

 Sequence

 Gene Map

 Annotation

 User Interface

 Web browser

 Customizable

The Ureaplasma urealyticum Genome Project



 Uu - 751,719 bp

 http://genome.microbio.uab.edu/uu/uugen.htm

 Web-based genome analysis tool

Annotation Problems



 Problems with existing sequence databases

 Incomplete datasets

 Skewed datasets

 Incorrectly annotated records

 Annotations based on experimental vs. predicted data

 Nomenclature differences

 Transitive errors in gene function predictions

 Functional predictions for “hypothetical” genes

Metabolic Pathway Reconstruction

Metabolic Pathway Reconstruction



 Role assignment

 Extract metabolic pathways from genomes

 Navigation and analysis

 Pathway editing

Metabolic Assignments

 Amino acid Biosynthesis

 Biosynthesis of cofactors, prosthetic groups, and carriers

 Cell envelope

 Cellular processes

 Central intermediary metabolism

 Energy metabolism

 Fatty acid and phospholipid metabolism

 Purines, pyrimidines, nucleosides, and nucleotides

 Regulatory functions

 Replication

 Transcription

 Translation

 Transport and binding proteins

 Other categories, Unassigned

 Hypothetical

Ureaplasma urealyticum Gene Map

[Figure: linear gene map of the 751,719 bp genome, drawn in rows of 50,000 bp from position 1 to 751,719, with genes marked by role category]

Legend: Cofactor Biosynthesis; Cell envelope; Cellular processes; Central Intermediary Metabolism; Energy Metabolism; Fatty Acid Metabolism; Hypothetical; Nucleotide Metabolism; Replication; Transcription; Translation; Transport; Other; RNA; tRNA

                                      Uu Genes            Mg Genes
Role                                #   % of Total      #   % of Total
Amino acid Biosynthesis             1      0.2%         0      0.0%
Biosynthesis of cofactors          10      1.7%         7      1.5%
Cell envelope                      19      3.1%        26      5.4%
Cellular processes                 13      2.1%        15      3.1%
Central intermediary metabolism    15      2.5%         7      1.5%
Energy metabolism                  23      3.8%        30      6.3%
Fatty acid - phospholipids          6      1.0%         7      1.5%
Hypothetical                      293     48.3%       169     35.3%
Other categories                    1      0.2%         3      0.6%
Purines, pyrimidines               18      3.0%        20      4.2%
Regulatory functions                4      0.7%         4      0.8%
Replication                        45      7.4%        31      6.5%
Transcription                      17      2.8%        19      4.0%
Translation                       100     16.5%        99     20.7%
Transport and binding proteins     37      6.1%        35      7.3%
Unassigned                          4      0.7%         7      1.5%
Total                             606    100.0%       479    100.0%

EcoCyc



 Peter D. Karp, PhD

 SRI International

 Menlo Park, CA

 http://ecocyc.pangeasystems.com/ecocyc/ecocyc.html

Pathway Reconstruction

[Figure: layered diagram, top to bottom: Cell; Metabolic Network; Pathways; Annotated Genome; List of Gene Products and Reactions (Compounds); List of Genes/ORFs and Gene Products; DNA Sequence and Genes; Genomic Maps]

Adapted from P. Karp, Pangea Systems

Glycolysis in Uu?

glucose-1-phosphate
   | phosphoglucomutase (?)
glucose-6-phosphate
   | phosphoglucose isomerase
fructose-6-phosphate
   | 6-phosphofructokinase
fructose-1,6-bisphosphate
   | fructose bisphosphate aldolase
glyceraldehyde-3-phosphate
   | glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12)
3-phospho-D-glyceroyl-phosphate
   | phosphoglycerate kinase
3-phosphoglycerate
   | ...
pyruvate

Alternative step: glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.9) converts glyceraldehyde-3-phosphate directly to 3-phosphoglycerate.

Uu Energy Metabolism



 Glycolysis

 Missing several components

 Pentose-phosphate pathway

 Only 2/8 enzyme complexes present

 Proton motive force - ATP synthase complex

 Urease Gene Complex

 Biologically relevant
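Statements such as "missing several components" or "only 2/8 enzyme complexes present" come from comparing a pathway's required enzymes against the annotated gene products. A minimal sketch of that comparison follows. The pathway list reuses enzyme names from the glycolysis slide, but the annotated set is made up purely to exercise the function and says nothing about which enzymes Uu actually encodes.

```python
def pathway_coverage(pathway_enzymes, annotated_enzymes):
    """Split a pathway's enzymes into present and missing; return the fraction present."""
    present = [e for e in pathway_enzymes if e in annotated_enzymes]
    missing = [e for e in pathway_enzymes if e not in annotated_enzymes]
    return present, missing, len(present) / len(pathway_enzymes)


# Enzyme names taken from the glycolysis slide (upper part of the pathway).
GLYCOLYSIS_STEPS = [
    "phosphoglucomutase",
    "phosphoglucose isomerase",
    "6-phosphofructokinase",
    "fructose bisphosphate aldolase",
    "glyceraldehyde-3-phosphate dehydrogenase",
    "phosphoglycerate kinase",
]

# Hypothetical annotation set, for illustration only.
example_annotation = {
    "phosphoglucose isomerase",
    "fructose bisphosphate aldolase",
    "phosphoglycerate kinase",
}

present, missing, fraction = pathway_coverage(GLYCOLYSIS_STEPS, example_annotation)
print(f"{len(present)}/{len(GLYCOLYSIS_STEPS)} steps present ({fraction:.0%})")
print("missing:", missing)
```

A missing step can mean the enzyme is truly absent, that a different enzyme fills the role, or that the gene sits among the "hypothetical" ORFs, so the output is a list of questions rather than a conclusion.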

Comparative Genomics



 What makes one organism different from all other organisms?

 Molecular Biology

 Physiology

 Pathogenesis

 Epidemiology

 Genetics

Ortholog Comparisons



 Uu to Mg genes: 324

 53% of Uu; 67% of Mg

 71 hypothetical

 Mh to Mg genes: 314

 41% of Mh; 57% of Mg

 55 hypothetical (2 – “unique hypothetical”)

 Mh to Uu genes: 330

 47% of Uu; 43% of Mh

 82 hypothetical (19 – “unique hypothetical”)

M. genitalium - M. pneumoniae Gene Order

[Figure: dot plot of gene order. Horizontal axis: M. pneumoniae gene position, 0 to 800,000. Vertical axis: 0 to 500,000.]

M. genitalium - U. urealyticum Gene Order

[Figure: dot plot of gene order. Horizontal axis: U. urealyticum gene position, 0 to 700,000. Vertical axis: 0 to 500,000.]

Paralog Analysis



 Identification of conserved, paralogous groups

 All against All comparison

 Genes within one organism

 Identifies groups of “related” genes

 Primary sequence

 Structure

 Function
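An all-against-all comparison followed by grouping can be sketched as pairwise scoring plus single-linkage clustering. The similarity measure below (shared 3-mers) is a stand-in chosen to keep the example self-contained; an actual analysis would normally use alignment scores from a sequence comparison tool. The protein names, sequences, and threshold are invented.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Fraction of k-mers shared between two sequences (toy similarity score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def paralog_clusters(proteins: dict, threshold: float = 0.4, k: int = 3):
    """Group proteins from one genome by single-linkage clustering."""
    parent = {name: name for name in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # All-against-all: every pair is scored once.
    for a, b in combinations(proteins, 2):
        if jaccard(proteins[a], proteins[b], k) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in proteins:
        clusters.setdefault(find(name), []).append(name)
    # Report only groups with more than one member.
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Made-up sequences: p1 to p3 are near-identical, q1 is unrelated.
example = {
    "p1": "MKTLLVAGGAS",
    "p2": "MKTLLVAGGAT",
    "p3": "MKTLIVAGGAS",
    "q1": "WWPPHHCCYYF",
}
print(paralog_clusters(example))
```

Single linkage can chain loosely related genes into one large group, so the threshold has to be chosen with care.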

Uu Paralogous Clusters >3



 4 tRNA synthetase

 4 Translation factors

 4 Hypothetical membrane lipoprotein

 5 ATP synthase alpha, beta chains

 6 MBA

 7 Hypothetical membrane lipoprotein

 8 Hypothetical

 10 Iron transporters

 13 Transporters

Functional Genomics



 Gene Expression

 Gene Regulation

 Genome-wide Mutagenesis

Expression Arrays



 Cell growth in different environments

 Isolate cDNAs

 Measure expression using array technology

 Create database of expression information

 Display information in an easy-to-use format

 Show ratio of expression under different conditions
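The last step, showing the ratio of expression under different conditions, is commonly reported on a log2 scale so that two-fold increases and decreases are symmetric. A minimal sketch, assuming per-gene intensity values for two conditions; the gene names, values, and pseudocount are illustrative assumptions, and the slide does not specify how ratios were calculated.

```python
import math


def log2_ratios(condition_a: dict, condition_b: dict, pseudocount: float = 1.0) -> dict:
    """log2(B / A) for every gene measured in both conditions.

    The pseudocount avoids division by zero for genes with no signal.
    """
    return {
        gene: math.log2((condition_b[gene] + pseudocount) / (condition_a[gene] + pseudocount))
        for gene in condition_a
        if gene in condition_b
    }


# Made-up intensities for two growth conditions.
baseline = {"geneA": 99, "geneB": 7, "geneC": 399}
treated = {"geneA": 199, "geneB": 7, "geneC": 99}

for gene, ratio in log2_ratios(baseline, treated).items():
    print(f"{gene}: {ratio:+.2f}")
```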

Putting it all together

From F. Blattner, U. Wisc.

Chromosome Views



 Ensembl view

 UC Santa Cruz view

 NCBI View

A Final Caveat



 “The difficulty of identifying genes in anonymous vertebrate sequences”

Claverie JM, Poirot O, Lopez F

Comput Chem 1997;21(4):203-14

The identification of genes in newly determined vertebrate genomic sequences can range from a trivial to an impossible task. In a statistical preamble, we show how "insignificant" are the individual features on which gene identification can be rigorously based: promoter signals, splice sites, open reading frames, etc. The practical identification of genes is thus ultimately a tributary of their resemblance to those already present in sequence databases, or incorporated into training sets. The inherent conservatism of the currently popular methods (database similarity search, GRAIL) will greatly limit our capacity for making unexpected biological discoveries from increasingly abundant genomic data. Beyond a very limited subset of trivial cases, the automated interpretation (i.e. without experimental validation) of genomic data, is still a myth. On the other hand, characterizing the 60,000 to 100,000 genes thought to be hidden in the human genome by the mean of individual experiments is not feasible. Thus, it appears that our only hope of turning genome data into genome information must rely on drastic progresses in the way we identify and analyze genes in silico.

Only One Final Word of Wisdom...



 “...although the computer is a wonderful helpmate for the sequence searcher and comparer, biochemists and molecular biologists must guard against the blind acceptance of any algorithmic output; given the choice, think like a biologist and not a statistician.”

 - Russell F. Doolittle, 1990

