Introduction to Bioinformatics

Changhui (Charles) Yan
Old Main 401 F
http://www.cs.usu.edu/~cyan
6/15/2006

How Old Is The Discipline?
"The term bioinformatics is a relatively recent invention, not appearing in the literature until 1991 … However, … had been building databases, developing algorithms and making biological discoveries by sequence analysis since the 1960s---long before anyone thought to label this activity with a special term … So bioinformatics has, in fact, been in existence for more than 40 years …"
(Mark S. Boguski, Trends Guide to Bioinformatics, Elsevier, Trends Supplement 1998, p. 1)

What Is Bioinformatics?
- Any use of computers to handle biological information
- The use of computers to characterize biological molecules or to simulate the dynamics of molecules
- The use of computers to store, compare, retrieve, or analyze biological information
- Computational Biology, Proteomics, Genomics, Medical Informatics …

Bioinformatic Problems

Central Dogma

Genome

Bioinformatic Problems
- Genome Sequencing

Human Genome Project (HGP)
- To determine the sequences of the 3 billion bases that make up human DNA
- To identify the approximately 100,000 genes in human DNA (the estimate had been revised to 20,000-25,000 by Oct 2004)
- To store this information in databases
- To develop tools for data analysis

Human Genome Project (HGP), continued
- HGP began in October 1990 and was completed in 2003
- 99% of the human DNA sequence finished to 99.99% accuracy (April 2003)
- 15,000 full-length human genes identified (March 2003)
- Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster (April 2003)
- Post-genome era

Completely Sequenced Genomes

Genome Projects
- More than 60 eukaryotic genome sequencing projects are underway

Genome Sequencing

Difficulties due to
- Repeats
- Uncertainty
- Missing data
- Huge size!
- …

Gene Finding
- Problems: Genome Sequencing, Gene Finding
- 15,000 human genes identified
- The estimates are 100,000 (1990) … 20,000-25,000 (Oct 2004)
- 3 billion bases that make up human DNA

Gene-finders

Sequence Alignment
- Problems: Genome, Gene Finding, Sequence Alignment

Longest Common Subsequences

Sequence Alignment
- Pair-wise Alignment
- Multiple Sequence Alignment
- Searching Databases: http://www.ncbi.nlm.nih.gov/BLAST/

Sequence Alignment
- Global vs. Local

Gene Expression
- Problems: Genome Sequencing, Gene Finding, Sequence Alignment, Gene Expression

Protein Folding
- Problems: Genome Sequencing, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure

Protein Structure
- Visualization of protein structure
- Protein structure alignment
- Protein structure prediction

Protein Structure Prediction
- Comparative modeling: used when the sequence is similar to another sequence whose structure is known.
- Fold recognition: in the absence of a significantly similar sequence with known structure, these methods try to determine how well a known structure fits the sequence to be modeled.
- Ab initio prediction: can detect structures that have not been discovered. Monte Carlo search for lowest energy.
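
The "Monte Carlo search for lowest energy" idea named under ab initio prediction can be illustrated with a toy example. What follows is only a minimal sketch under loud assumptions: `toy_energy` is a made-up one-dimensional function standing in for a real protein energy model, `metropolis_search` is a hypothetical helper name, and the Metropolis acceptance rule is one common choice rather than something the slides specify.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1-D energy landscape with several local minima.

    This is NOT a protein force field; it only gives the search
    something bumpy to explore.
    """
    return 0.1 * x * x + math.sin(3.0 * x)


def metropolis_search(energy, start, steps=5000, step_size=0.5,
                      temperature=1.0, seed=0):
    """Random-walk search that keeps the lowest-energy state seen.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-delta / temperature) (Metropolis criterion), which
    lets the walk escape local minima.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    current = start
    current_e = energy(current)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


best_x, best_e = metropolis_search(toy_energy, start=4.0)
print(f"lowest energy seen: {best_e:.3f} at x = {best_x:.3f}")
```

Real ab initio methods search over protein conformations with far richer energy functions; the sketch only shows the accept/reject loop that gives the approach its name.
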

Protein Function Prediction
- Problems: Genome Sequencing, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure, Protein Function

Protein Function Prediction, continued
- “similar sequence-similar structure-similar function” paradigm
- Identification of homologous sequences (BLAST, PSI-BLAST) (>30% identity)
- Identification of conserved functional sites (<=30%)

Conserved Functional Sites -- Motifs
- [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G

Motifs

Conserved Functional Sites -- Motifs
- Single motif
- PROSITE: a database of biologically significant sites

Conserved Functional Sites -- Motifs
- Multiple motifs
- PRINTS: a database of protein fingerprints. A fingerprint is a group of conserved motifs characterizing a protein function

PRINTS
- >ATHA_PIG

Conserved Functional Sites -- Motifs
- Hidden Markov Model
- Pfam:

Protein Interaction Network
- Problems: Genome, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure, Protein Function, Protein Interaction Network

Bioinformatic Problems
- Genome
- Gene Finding
- Sequence Alignment
- Gene Expression
- Protein Structure
- Protein Function
- Protein Interaction Network

Bioinformatic Problems
- There are more …
- Phylogeny analysis: Tree of life
- Databases and tools development

Bioinformatic Databases
- GenBank (DNA sequences)
- Protein Data Bank (protein structures)
- PIR (protein sequences)
- Nucleic Acids Research (2005): 719 databases

Bioinformatic Programs
- Sequence analysis: BLAST, ClustalX, EMBOSS, GCG
- Molecular imaging/modeling: PyMOL, MOLMOL, RasMol …
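
The motif pattern shown on the Conserved Functional Sites slide above, `[AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G`, is written in PROSITE-style syntax, and it may help to see how such a pattern maps onto an ordinary regular expression. The sketch below is an illustration only: `prosite_to_regex` is a hypothetical helper, it ignores the `<` and `>` anchors, and the test sequence is made up for the example rather than taken from any database.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles x (any residue), [ABC] (one of), {ABC} (none of), and
    repeat counts written as (n) or (m,n). The < and > anchors are
    not handled in this sketch.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat count, e.g. "x(0,1)" -> ("x", "0,1").
        parsed = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = parsed.group(1), parsed.group(2)
        if core == "x":
            piece = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            piece = core                      # single letter or [ABC] class
        if count is not None:
            piece += "{" + count + "}"
        pieces.append(piece)
    return "".join(pieces)


slide_pattern = "[AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G"
regex = prosite_to_regex(slide_pattern)

# Made-up sequence built to satisfy the pattern (not a real protein).
made_up_sequence = "AGGKNLS" + "A" * 6 + "G" + "A" * 9 + "G"
hit = re.search(regex, made_up_sequence)
print(regex)
print("match found" if hit else "no match")
```

Databases such as PROSITE pair each pattern with curated annotation, so a regex hit on its own is a hint at a conserved site, not evidence of function.
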
