Genomics, Bioinformatics and Proteomics

MG 500 Web site:

www.biosci.ohio-state.edu/~dverma/

Genomics, Bioinformatics and Proteomics
Chapter 19

From genes to genomes
• The functions of many genes are not known
• The genes underlying many phenotypes are not known
• Mutants are known for which the genotype has not been determined
• Polygenic traits
• The goal for the next 10 years is to define the function of most genes in plants and animals

Genome size of model organisms
Organism     Cellular complexity       Genomic complexity
_________________________________________________________
Bacterium    one cell, prokaryote      4 Mb
Yeast        one cell, eukaryote       15 Mb
Nematode     1000 cells                100 Mb
Drosophila   50,000 cells              180 Mb
Mouse        10^11 cells               3000 Mb
Human        10^14 cells               3000 Mb

A Genomic Project
• Genetic map
• Physical map
• cDNA sequencing
• Genomic sequencing
– Clone-by-clone method
– Shotgun method

Annotating the sequence
• Identification of open reading frames (ORFs)
– Six possible reading frames (three on each strand)
– Initiation codon (ATG)
– Termination codons (TAA, TGA, TAG)
– Splicing
• False open reading frames
• Wrong termination
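
The six-frame search described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it treats the DNA as unspliced (no introns), and the function names and the `min_codons` length cutoff are hypothetical choices made for this example. A minimum-length cutoff is one simple way to discard some of the false open reading frames that arise by chance.

```python
# Minimal six-frame ORF scan (illustrative sketch; ignores introns and splicing).
STOP_CODONS = {"TAA", "TGA", "TAG"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf) tuples for all six frames.

    start and end are 0-based offsets on the strand that was scanned.
    min_codons (a hypothetical cutoff) counts the ATG but not the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC"):
        print(orf)
```

Raising `min_codons` makes the scan stricter; real gene-finding programs combine this kind of scan with the additional signals listed on the next slide.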

ORF Search Programs
• Codon bias
– High in exons
– Nil or low in introns
• Intron-exon junctions
• Upstream regulatory sequences
• Poly(A) addition signal
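
As a rough illustration of the codon bias signal, the sketch below tabulates how often each codon occurs in an in-frame sequence. The function name `codon_usage` is hypothetical, and the sketch stops at counting: an ORF search program would go on to compare such frequencies with a reference table for the organism, which is not shown here.

```python
from collections import Counter


def codon_usage(coding_seq):
    """Return the relative frequency of each codon in an in-frame sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    s = coding_seq.upper()
    codons = [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Two of the three alanine codons in this toy sequence are GCT.
    for codon, freq in sorted(codon_usage("ATGGCTGCTGCCTAA").items()):
        print(codon, round(freq, 2))
```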

EST: Expressed Sequence Tags
• Not full-length sequences
• Provide general information about a gene transcript, its tissue specificity and abundance of expression
• Full-length sequence can be obtained by PCR

Gene Arrays
• Synthetic DNA (50-60 bases) immobilized on a chip
• Hybridize with fluorescently labeled cDNA
• Commercially available gene arrays
• Total genome array
• Tissue-specific gene array
• Disease-specific gene array
• Signal transduction pathway arrays

Computer Programs for Genome Analysis
• Databases (over 800 genomes sequenced)
– Gene databases, ESTs, SNPs, protein databases
– EMBL / GenBank (National Center for Biotechnology Information)
– FASTA, ORF
– BlastN
– BlastP
– Consensus sequence databases, secondary structure analysis

Procaryotic Genomes
• Most genomes are circular
• Gene density is very high
– One gene per kb
– Intergenic regions are very short
– Few or no introns
• Presence of operons (polycistronic transcription units)
• 1500-5000 genes

Eucaryotic Genomes
• Low gene density
• Increase in genome size is not proportional to gene number (yeast: 1 gene/2.5 kb; human: 1 gene/8.5 kb)
• The number and size of introns increase as genome size increases (yeast has few introns; a single human gene may have 100 introns)
• Presence of repetitive DNA and large intergenic regions

Eucaryotic gene
• Monocistronic, but C. elegans has 25% of its genes arranged as polycistronic units
• Genes are found within introns of other genes
• Gene duplications and the presence of pseudogenes
• Maize has a genome 10 times larger than that of Arabidopsis but contains about the same number of genes (genes are arranged in clusters)
• Gene-empty regions may be involved in chromosomal rearrangement

Insights from genomics
• Organisms resembling single-celled algae existed 1.4 billion years ago
• Genomes are highly dynamic and evolving rapidly
• Smallest genome: Mycoplasma, with only 470 genes
• Disease-causing bacteria have reduced their genome size (Mycobacterium leprae has lost 50% of its genome), causing slow growth

Gene Duplication
• Important for evolution
• Over 50% of the genome is duplicated in yeast
• Provides insight into the evolution of a species (over 10,000 genes in humans are duplicated)
• Gene duplication increases genetic diversity

Gene Duplication
• Caused by unequal crossing over
• Replication errors
• Molecular phylogenetics allows determination of duplication and divergence events
• Duplicated genes may remain linked or become scattered on different chromosomes, e.g. globin genes
• Multigene families can provide diverse functions during development
• Multiple proteins that arise from gene duplications are known as paralogs

Immunoglobulin Genes
• Two light chains and two heavy chains, each with constant and variable regions
• A unique somatic recombination occurs during B cell maturation to generate over 100,000 possible configurations, each specific to an unknown antigen
• Recombination occurs between the variable region, J region and C region
• J and C regions have no promoters

Immunoglobulins

Proteomics
• The one gene-one enzyme concept is not valid
• Many proteins are post-translationally modified or combine with other proteins to make a functional complex
• The proteome changes during development and in response to the environment

2D Gel analysis
• Denature proteins
• Isoelectric focusing (resolves on the basis of the isoelectric point of a protein)
• SDS-polyacrylamide gel electrophoresis (resolves on the basis of size)
• Resolves 200-1000 major spots

[Figure: 2D gel, with isoelectric point (pH) on one axis and molecular weight (MW) on the other]

Protein fingerprinting
• Each spot from a gel can be cut out and sequenced using a mass spectrometer
• Treat with the protease trypsin and measure the masses of the fragments; compare these with the peptide fragment masses predicted by computer analysis of a protein database
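
The comparison step can be sketched as follows. This is a simplified illustration, not how any particular search program works: it assumes the common rule that trypsin cleaves after K or R unless the next residue is P, uses standard monoisotopic residue masses, and scores a candidate protein by the fraction of observed masses it explains. The function names and the 0.5 Da tolerance are hypothetical choices for this example.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    seq = protein.upper()
    peptides, current = [], ""
    for i, aa in enumerate(seq):
        current += aa
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_fraction(observed, protein, tol=0.5):
    """Fraction of observed masses within tol Da of a predicted tryptic peptide."""
    predicted = [peptide_mass(p) for p in tryptic_digest(protein)]
    hits = sum(any(abs(m - p) <= tol for p in predicted) for m in observed)
    return hits / len(observed) if observed else 0.0


if __name__ == "__main__":
    candidate = "AAKGGRPLLKMM"  # made-up sequence, for illustration only
    print(tryptic_digest(candidate))
    print(match_fraction([peptide_mass("AAK"), 5000.0], candidate))
```

In practice the same score would be computed for every entry in a protein database, and the best-scoring entries reported as candidate identifications for the spot.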

Genomics/proteomics
• A fertile field with enormous potential for new discoveries and products
– Bioinformatics data mining
– Diagnostics
– Novel agricultural crops