Docstoc

Bioinformatics I

Document Sample
Bioinformatics I Powered By Docstoc
					Swiss Institute of Bioinformatics




                                             Bioinformatics     I
                                             2003 - 2004




                             Mihaela Zavolan | Michael Primig
                          Erik van Nimwegen | Torsten Schwede
Bioinformatics I




              For lecture notes, articles, references, links, etc.

                                 see „Teaching“

                                       @


               http://www.bioz.unibas.ch/personal/schwede/

                   http://www.bioz.unibas.ch/personal/primig/
Bioinformatics I




21. 10.   Introduction                                                         M.P. M.Z. & E.vN.
28. 10.   Sequence Comparison I: pair wise aligments, dynamic programming      M. Zavolan
04. 11.   Sequence Comparison II: multiple sequence aligments                  A. Brüngger (Novartis)
11. 11.   Alignment Programs and Sequence Database Searching                   A. Brüngger (Novartis)
18. 11.   Evolution of Genes and Genomes                                       M. Primig
25. 11.   Evolution of Protein Families                                        T. Schwede
02. 12.   no lecture (SAB meeting)                                             ---
09. 12    Models of Molecular Evolution                                        E. van Nimwegen
16. 12.   Genefinding I: protein-coding genes, procaryotic genome annotation   M. Zavolan
06. 01.   Genefinding II: eucaryotic genome annotation                         M. Zavolan
13. 01.   Pharmacogenomics & SNPs                                              T. Schwede
20. 01.   Clustering                                                           E. van Nimwegen
27. 01.   Microarray Data Computation                                          M. Peitsch
03. 02.   Microarray Data Analysis                                             M.P. & T.S.
10. 02.   WebXam
Bioinformatics I


What is Bioinformatics?

         An operational definition:

         The applications of computer sciences to
         molecular biology - in particular to the study of
         macromolecules such as proteins and nucleic acids.


         Some Synonyms:

         • Molecular Bioinformatics

         • Computational Biology

         • Biocomputing
Bioinformatics I


What is not Bioinformatics?

      • “Bio-inspired” computer sciences (e.g. artificial life, neural
          networks, genetic algorithms)

      • Genome Sequencing (e.g. genetic mapping and contig
      assembly)

      • Biomathematics or bio-statistics

      • Modeling of biological systems (Ecological or population
      modeling)

      • Protein Structure Refinement (X-Ray, NMR)

      • DNA Computers
Bioinformatics I




            Why do we need Bioinformatics?
Bioinformatics I


Why do we need Bioinformatics?
           5’                         3’
 DNA
                                             Transcription

                                                Splicing
 mRNA


                                              Translation
 Poly-
 peptide
                                                Folding


 Protein



                                      • Transport / Localization
                                      • Oligomerization
                                      • PTM (Post-Translational Modification)
                Function   Function
Bioinformatics I
                                                Genome Projects
Why do we need Bioinformatics?                  need to store and
                                                 organize DNA
           5’                         3’          sequences.
 DNA
                                             Transcription

                                                Splicing
 mRNA


                                              Translation
 Poly-
 peptide
                                                Folding


 Protein



                                      • Transport / Localization
                                      • Oligomerization
                                      • PTM (Post-Translational Modification)
                Function   Function
Bioinformatics I


DNA databases...
  gggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcagaccaaatttagtcagtgtgaaaa
  atctctagcagtggcgcctgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagagagatgggtgc
  gagagcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatac
  tgggacaactacaaccagcccttcagacaggatcagaagaacttagatcattatataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagca
  gcagctgacacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaacatccaggggcaaatggtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatc
  agaaggagccaccccacaagatttaaacaccatgctaaacacagtggggggacatcaagcagccatgcaaatgttaaaagagaccatcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaa
  gtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccacctatcccagtaggagaaatctataagagatggataatcctgggattaaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaa
  ccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgacagaaaccttgttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgggaccagcagctacactagaagaaatgatgacagc
  atgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaattgcaggg
  cccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccagggaattttcctcagaacagactagagccaacagccccaccagccccaccagaagag
  agcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaactaaaggaagctctattagatacaggagcagatgataca
  gtattagaagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcgtagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtcaacataattggaagaaatct
  gttgactcagattggttgcactttaaattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaattt
  caaaaatcgggcctgaaaatccatataatactccagtatttgccataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatca
  gtaacagtactggatgtgggtgatgcatatttttcagttcccttagataaagaattcaggaagtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcag
  catgacaaaaatcttagagccttttagaaaacaaaatccagacatagttatctatcaatacatggacgatttgtatgtaggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaac
  atcagaaagaacctccattcctttggatgggttatgaactccatcctgataaatggacagtacagcctatagtgctgccagaaaaggacagctggactgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaa
  ttatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctagaactggcagaaaacagggaaattctaaaagaaccagtacatggagtgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtca
  atggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgtaaaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagtaatatggggaaagactcctaaatttaaactacccatac
  aaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaagaacccataataggagcagaaactttctatgtagatggggcagctaacagggagactaaa
  ttaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaagtaaacatagtaacagactcacaatatgcattaggaatcattcaagcaca
  accagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatcaggaaagtactctttttagatggaatagata
  aagcccaagaagaacatgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaatatggcaactagat
  tgtacacatttagaaggaaaaattatcctggtagcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcac
  cagtactacagttaaggccgcctgttggtgggcaggaatcaagcaggaatttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcag
  tattcatccacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaagga
  ccagcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataatagtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttag
  taaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactatgaaagtactcatccgagaataagttcagaagtacacatcccactagggaatgcaaaattggtaataacaacatattggggtctacatacaggagaaagagactggcatttgggt
  caaggagtctccatagaattgaggaaaaggagatatagcacacaattagaccctaacctagcagaccaactaattcatctgcattactttgattgtttttcagaatctgctataagaaatgccatattaggacatatagttagccctaggtgtgaatatcaagcaggacataa
  caaggtaggatctctacagtacttggcactaacagcattagtaagaccaagaaaaaagataaagccacctttgcctagtgttacaaaactgacagaggatagatggaacaagccccagaagaccaagggccacaaagggaaccatacaatgaatggacactagaacttttaga
  ggagctcaagaatgaagctgttagacattttcctaggatatggctccatagcttagggcaacatatctatgaaacttatggagatacttgggcaggagtggaagccataataagaattctgcaacaactgctgtttattcatttcagaattgggtgtcaacatagcagaatag
  acattcttcgacgaaggagagcaagaaatggagccagtagatcctagactagagccctggaagcatccaggaagtcagcctaggactgcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttgtttcataacaaaaggcttaggcatctcctatggcag
  gaagaagcggagacagcgacgaagagctcctcaagacagtcagactcatcaagtttctctatcaaagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagtagcattagtagtagcagcaataatagcaatagttgtgtggtccatagtattcatagaat
  ataggaaaataagaagacaaaacaaaatagaaaggttgattgatagaataatagaaagagcagaagacagtggcaatgagagtgacggagatcaggaagaattatcagcacttgtggaaatggggcacgatgctccttgggatgttaatgatctgtaaagctgcagaaaattt
  gtgggtcacagtttattatggggtacctgtgtggaaagaagcaaccaccactctattttgtgcctcagatgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcctgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgacagaaa
  attttaacatgtggaaaaataacatggtagaccaaatgcatgaggatataattagtttatgggatcaaagcctaaagccatgtgtaaaattaaccccactctgtgttactttaaattgcactgattatgggaatgatactaacaccaataatagtagtgctactaaccccact
  agtagtagcgggggaatggaggggagaggagaaataaaaaattgctctttcaatatcaccagaagcataagagataaagtgaagaaagaatatgcacttttttatagtcttgatgtaataccaataaaagatgataatactagctataggttgagaagttgtaacacctcagt
  cattacacaggcctgtccaaaggtatcctttgaaccaattcccatacattattgtgccccggctggttttgcgattctaaagtgtaatgataaaaagttcaatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatggaattaggccagtagtatcaactc
  aactgctgttaaatggcagtctagcagaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagtcataatagtacatctgaatgaatctgtagaaattaattgtacaagactcaacaacattacaaggagaagtatacatgtaggacatgtaggaccaggc
  agagcaatttatacaacaggaataataggaaaaataagacaagcacattgtaacattagtagagcaaaatggaataacactttaaaacagatagttacaaaattaagagaacaatttaagaataaaacaatagtctttaatcaatcctcaggaggggacccagaaattgtaat
  gcacagttttaattgtggaggggaatttttctactgtaattcaacacaactgtttaacagtacttggaatggtactgcatggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgcagaataaaacaaattataaacatgtggcaggaagtaggaaaag
  caatgtatgcacctcccatcagaggacaaattagatgttcatcaaatattacagggctgatattaacaagagatggtggtattaaccagaccaacaccaccgagattttcaggcctggaggaggagatatgaaggacaattggagaagtgaattatataaatataaagtagta
  aaaattgaaccattaggagtagcacccaccaaggcaaagagaagagtggtgcaaagagaaaaaagagcagtgggaataataggagctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagcgtcaatgacgctgacggtacaggccagacaattattgtc
  tggtatagtgcaacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttgcacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgtggaaagatacctaagggatcaacagctcctggggttttggggttgctctggaaaactcattt
  gcaccactgctgtgccttggaatactagttggagtaataaatctctgagtcagatttgggataacatgacctggatgcagtgggaaagggaaattgataattacacaagcttaatatacaacttaattgaagaatcgcaaaaccaacaagaaaagaatgaacaagagttattg
  gaattagataactgggcaagtttgtggaattggtttagcataacaaattggctgtggtatataaaaatattcataatgatagtaggaggcttggtaggtttaagaatagtttttactgtactttctatagtaaatagagttaggcagggatactcaccattgtcgtttcagac
  gcgcctcccagccaggaggggacccgacaggcccgaaggaatcgaagaagaaggtggagagagagacagagacagatccggtcaattagtggatggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttcagctaccaccgcttgagagacttactcttga
  ttgtaacgaggattgtggaacttctgggacgcagggggtgggaagccctcaaatattggtggaatctcctacaatattggattcaggaactaaagaatagtgctgttagcttgctcaacgccacagccatagcagtagctgagggaactgatagggttatagaagtattacaa
  agagcttgtagagctattctccacatacctagaagaataagacagggcttagaaagggctttgcaataagatgggtggtaagtggtcaaaaagtagtaaaattggatggcctactgtaagggaaagaatgagaagagctgagccagcagcagatggggtgggagcagtatctc
  gagacctggaaaaacatggagcaatcacaagtagtaatacagcaactaacaatgctgattgtgcctggctagaagcacaagaggaggaggaggtgggttttccagtcagacctcaggtacctttaagaccaatgacttacaagggagcgttagatcttagccactttttaaaa
  gaaaaggggggactggaagggctaatttggtcccagaaaagacaagacatccttgatttgtgggtccaccacacacaaggctacttccctgattggcagaactacacaccagggccagggatcagatatccactgacctttggttggtgcttcaagctagtaccagttgagcc
  agagaaggtagaagaggccaatgaaggagagaacaacagattgttacaccctgtgagcctgcatgggatggaggacccggagaaagaagtgttagtatggaggtttgacagccgcctagtactccgtcacatggcccgagagctgcatccggagtactacaaggactgctgac
  actgagctttctacaagggactttccgctggggactttccagggaggcgtggcctgggcgggactggggagtggcgagccctcagatgctgcatataagcagctgctttttgcctgtactgggtctctcttgttagaccagatctgagcctgggagctctctggctaactagg
  gaacccactgcttaagcctcaataaagcttgccttgagtgcttca



                Human immunodeficiency virus type 1, complete genome 9214 BP
Bioinformatics I




 DNA databases




         • GenBank (USA) http://www.ncbi.nlm.nih.gov/Genbank/
         • EMBL (Europe) http://www.ebi.ac.uk/embl/
         • DDBJ (Japan)   http://www.ddbj.nig.ac.jp/
Bioinformatics I
Bioinformatics I




   EMBL/GenBank/DDBJ Annotations:



                        Warning !




   DNA data base annotations are full of errors:

   • In sequences, in annotations, in CDS attribution
   • No consistency of annotations
   • Most annotations are done by scientist who submit
   data
   • Heterogeneity of data quality and updating
Bioinformatics I


 Some interesting sequence annotations:

      FT source        1..124
      FT           /db_xref="taxon:4097"
      FT           /organelle="plastid:chloroplast"
      FT           /organism="Nicotiana tabacum"
      FT           /isolate="Cuban cahibo cigar, gift from President Fidel
      FT           Castro"


Or:
      FT source         1..17084
      FT           /chromosome="complete mitochondrial genome"
      FT           /db_xref="taxon:9267"


                                                       ???
      FT           /organelle="mitochondrion"
      FT           /organism="Didelphis virginiana"
      FT           /dev_stage="adult"
      FT           /isolate="fresh road killed individual"
      FT           /tissue_type="liver"
Bioinformatics I




Taxonomy
Browser
@ EBI:




                   http://www.ebi.ac.uk/newt/
Bioinformatics I


Why do we need Bioinformatics?                   How do we find
           5’                         3’          protein coding
 DNA                                           regions, introns and
                                                exons in genomic
                                             Transcription
                                                DNA sequences?

                                                Splicing
 mRNA


                                              Translation
 Poly-
 peptide
                                                Folding


 Protein



                                      • Transport / Localization
                                      • Oligomerization
                                      • PTM (Post-Translational Modification)
                Function   Function
gggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcagaccaaatttagtcagtgtgaaaaatctctagcagtggcgcc
tgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagagagatgggtgcgagagcgtcgatattaagcgggggaggattagatag

   Bioinformatics I
atgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatactgggacaactacaaccagcccttcagacaggatcagaagaacttagatcattat
ataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagcagcagctgacacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaacatccaggggcaaatg
gtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatcagaaggagccaccccacaagatttaaacaccatgctaaacacagtggggggacatcaagcagccatgcaaatgttaaaagagaccatcaa
tgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaagtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccacctatcccagtaggagaaatctataagagatggataatcctgggat
taaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaaccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgacagaaaccttgttggtccaaaatgcgaacccagattgtaagactatt
ttaaaagcattgggaccagcagctacactagaagaaatgatgacagcatgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaa
agaagggcacatagccaaaaattgcagggcccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccagggaattttcctcagaacagactagagccaacagccccaccagccc
caccagaagagagcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaactaaaggaagctctattagatacaggagcagatgatacagtattag
aagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcgtagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtcaacataattggaagaaatctgttgactcagattggttgcacttta


      Finding genes in prokaryotic genomes ...
aattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaatttcaaaaatcgggcctgaaaatccatataatactccagtatttgc
cataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatcagtaacagtactggatgtgggtgatgcatatttttcagttcccttagataaagaattcagga
agtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcagcatgacaaaaatcttagagccttttagaaaacaaaatccagacatagttatctatcaatacatggacgatttgtatgta
ggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaacatcagaaagaacctccattcctttggatgggttatgaactccatcctgataaatggacagtacagcctatagtgctgccagaaaaggacagctggac
tgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaattatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctagaactggcagaaaacagggaaattctaaaagaaccagtacatggag
tgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtcaatggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgtaaaacaattaacagaggcagtgcaaaaaataaccacagaaagcata
gtaatatggggaaagactcctaaatttaaactacccatacaaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaagaacccataataggagcagaaactttctatgtagatgg
ggcagctaacagggagactaaattaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaagtaaacatagtaacagactcacaatatgcattaggaatcattcaag
cacaaccagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatcaggaaagtactctttttagatggaatagataaagcccaagaagaa
catgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaatatggcaactagattgtacacatttagaaggaaaaattatcctggt
agcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcaccagtactacagttaaggccgcctgttggtgggcaggaatcaagcaggaat
ttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcagtattcatccacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaagaatagtagac
ataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaaggaccagcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataatagtgacataaaagtagtgccaagaagaaaagcaaa
gatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttagtaaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactatgaaagtactcatccgagaataagttcagaagtacacatcccacta
gggaatgcaaaattggtaataacaacatattggggtctacatacaggagaaagagactggcatttgggtcaaggagtctccatagaattgaggaaaaggagatatagcacacaattagaccctaacctagcagaccaactaattcatctgcattactttgattgtttttcagaatctgctataagaaatgc
catattaggacatatagttagccctaggtgtgaatatcaagcaggacataacaaggtaggatctctacagtacttggcactaacagcattagtaagaccaagaaaaaagataaagccacctttgcctagtgttacaaaactgacagaggatagatggaacaagccccagaagaccaagggccacaaaggga
accatacaatgaatggacactagaacttttagaggagctcaagaatgaagctgttagacattttcctaggatatggctccatagcttagggcaacatatctatgaaacttatggagatacttgggcaggagtggaagccataataagaattctgcaacaactgctgtttattcatttcagaattgggtgtc
aacatagcagaatagacattcttcgacgaaggagagcaagaaatggagccagtagatcctagactagagccctggaagcatccaggaagtcagcctaggactgcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttgtttcataacaaaaggcttaggcatctcctatggcaggaa
gaagcggagacagcgacgaagagctcctcaagacagtcagactcatcaagtttctctatcaaagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagtagcattagtagtagcagcaataatagcaatagttgtgtggtccatagtattcatagaatataggaaaataagaagacaaa
acaaaatagaaaggttgattgatagaataatagaaagagcagaagacagtggcaatgagagtgacggagatcaggaagaattatcagcacttgtggaaatggggcacgatgctccttgggatgttaatgatctgtaaagctgcagaaaatttgtgggtcacagtttattatggggtacctgtgtggaaaga
agcaaccaccactctattttgtgcctcagatgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcctgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgacagaaaattttaacatgtggaaaaataacatggtagaccaaatgcatgaggatataattagtt
tatgggatcaaagcctaaagccatgtgtaaaattaaccccactctgtgttactttaaattgcactgattatgggaatgatactaacaccaataatagtagtgctactaaccccactagtagtagcgggggaatggaggggagaggagaaataaaaaattgctctttcaatatcaccagaagcataagagat
aaagtgaagaaagaatatgcacttttttatagtcttgatgtaataccaataaaagatgataatactagctataggttgagaagttgtaacacctcagtcattacacaggcctgtccaaaggtatcctttgaaccaattcccatacattattgtgccccggctggttttgcgattctaaagtgtaatgataa
aaagttcaatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatggaattaggccagtagtatcaactcaactgctgttaaatggcagtctagcagaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagtcataatagtacatctgaatgaatctgtagaaatta
attgtacaagactcaacaacattacaaggagaagtatacatgtaggacatgtaggaccaggcagagcaatttatacaacaggaataataggaaaaataagacaagcacattgtaacattagtagagcaaaatggaataacactttaaaacagatagttacaaaattaagagaacaatttaagaataaaaca
atagtctttaatcaatcctcaggaggggacccagaaattgtaatgcacagttttaattgtggaggggaatttttctactgtaattcaacacaactgtttaacagtacttggaatggtactgcatggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgcagaataaaacaaattat
aaacatgtggcaggaagtaggaaaagcaatgtatgcacctcccatcagaggacaaattagatgttcatcaaatattacagggctgatattaacaagagatggtggtattaaccagaccaacaccaccgagattttcaggcctggaggaggagatatgaaggacaattggagaagtgaattatataaatata
aagtagtaaaaattgaaccattaggagtagcacccaccaaggcaaagagaagagtggtgcaaagagaaaaaagagcagtgggaataataggagctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagcgtcaatgacgctgacggtacaggccagacaattattgtctggtatagtg
caacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttgcacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgtggaaagatacctaagggatcaacagctcctggggttttggggttgctctggaaaactcatttgcaccactgctgtgccttggaatactag
ttggagtaataaatctctgagtcagatttgggataacatgacctggatgcagtgggaaagggaaattgataattacacaagcttaatatacaacttaattgaagaatcgcaaaaccaacaagaaaagaatgaacaagagttattggaattagataactgggcaagtttgtggaattggtttagcataacaa
attggctgtggtatataaaaatattcataatgatagtaggaggcttggtaggtttaagaatagtttttactgtactttctatagtaaatagagttaggcagggatactcaccattgtcgtttcagacgcgcctcccagccaggaggggacccgacaggcccgaaggaatcgaagaagaaggtggagagaga
gacagagacagatccggtcaattagtggatggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttcagctaccaccgcttgagagacttactcttgattgtaacgaggattgtggaacttctgggacgcagggggtgggaagccctcaaatattggtggaatctcctacaatattggat
tcaggaactaaagaatagtgctgttagcttgctcaacgccacagccatagcagtagctgagggaactgatagggttatagaagtattacaaagagcttgtagagctattctccacatacctagaagaataagacagggcttagaaagggctttgcaataagatgggtggtaagtggtcaaaaagtagtaaa
attggatggcctactgtaagggaaagaatgagaagagctgagccagcagcagatggggtgggagcagtatctcgagacctggaaaaacatggagcaatcacaagtagtaatacagcaactaacaatgctgattgtgcctggctagaagcacaagaggaggaggaggtgggttttccagtcagacctcaggt
acctttaagaccaatgacttacaagggagcgttagatcttagccactttttaaaagaaaaggggggactggaagggctaatttggtcccagaaaagacaagacatccttgatttgtgggtccaccacacacaaggctacttccctgattggcagaactacacaccagggccagggatcagatatccactga
cctttggttggtgcttcaagctagtaccagttgagccagagaaggtagaagaggccaatgaaggagagaacaacagattgttacaccctgtgagcctgcatgggatggaggacccggagaaagaagtgttagtatggaggtttgacagccgcctagtactccgtcacatggcccgagagctgcatccggag
tactacaaggactgctgacactgagctttctacaagggactttccgctggggactttccagggaggcgtggcctgggcgggactggggagtggcgagccctcagatgctgcatataagcagctgctttttgcctgtactgggtctctcttgttagaccagatctgagcctgggagctctctggctaactag
ggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatccct
tcagaccaaatttagtcagtgtgaaaaatctctagcagtggcgcctgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagaga
gatgggtgcgagagcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatactgggacaac
tacaaccagcccttcagacaggatcagaagaacttagatcattatataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagcagcagctgacacaggaaatagcagccag
gtcagccaaaattaccccatagtgcagaacatccaggggcaaatggtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatcagaaggagccaccccacaagatttaaacaccatgctaaacacagt
ggggggacatcaagcagccatgcaaatgttaaaagagaccatcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaagtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccac
ctatcccagtaggagaaatctataagagatggataatcctgggattaaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaaccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgaca
gaaaccttgttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgggaccagcagctacactagaagaaatgatgacagcatgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaa
ttttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaattgcagggcccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccag
ggaattttcctcagaacagactagagccaacagccccaccagccccaccagaagagagcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaac
taaaggaagctctattagatacaggagcagatgatacagtattagaagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcgtagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtc
aacataattggaagaaatctgttgactcagattggttgcactttaaattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaat
ttcaaaaatcgggcctgaaaatccatataatactccagtatttgccataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatcagtaacagtactggatg
tgggtgatgcatatttttcagttcccttagataaagaattcaggaagtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcagcatgacaaaaatcttagagccttttagaaaacaa
aatccagacatagttatctatcaatacatggacgatttgtatgtaggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaacatcagaaagaacctccattcctttggatgggttatgaactccatcctgataa
atggacagtacagcctatagtgctgccagaaaaggacagctggactgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaattatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctag
aactggcagaaaacagggaaattctaaaagaaccagtacatggagtgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtcaatggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgta
aaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagtaatatggggaaagactcctaaatttaaactacccatacaaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagtt
agagaaagaacccataataggagcagaaactttctatgtagatggggcagctaacagggagactaaattaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaag
taaacatagtaacagactcacaatatgcattaggaatcattcaagcacaaccagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatc
aggaaagtactctttttagatggaatagataaagcccaagaagaacatgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaat
atggcaactagattgtacacatttagaaggaaaaattatcctggtagcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcaccagta
ctacagttaaggccgcctgttggtgggcaggaatcaagcaggaatttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcagtattcatccacaattttaaaaga
aaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaaggaccagcaaagcttctctggaaaggtgaaggggcagtagtaat
acaagataatagtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttagtaaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactat
Bioinformatics I




 Finding genes in prokaryotic genomes ...


      Intrinsic Information:

      • Transcription signals
      • Codon usage
      • GC – content

       External Information:

      • Similarity to known proteins in other organisms
Bioinformatics I




 Finding genes in eukaryotic genomes ...

   Some of the problems:


        • Complexity of genome increases with complexity of
        organisms
        • 1-2 % coding sequence
        • Intron – exon structure
        • Alternative splicing (~30% of all genes)
        • Pseudogenes
Bioinformatics I


Why do we need Bioinformatics?
           5’                         3’
 DNA
                                              Transcription

                                                Splicing
 mRNA
                                               Under which
                                               Translation
                                           conditions is a certain
 Poly-                                       gene transcribed?
 peptide
                                                Folding


 Protein



                                      • Transport / Localization
                                      • Oligomerization
                                      • PTM (Post-Translational Modification)
                Function   Function
Bioinformatics I




  Gene Expression Analysis using

            DNA Microarrays

        Microarrays are ordered sets of DNA molecules of
        known or unknown sequence immobilized on a support
        (e.g. glass).
        • DNA
        • DNA replication
        • RNA
        • Genome-wide Protein-DNA interactions
Bioinformatics I




    PCR Microarrays
Bioinformatics I




    GeneChips




     $$$
Bioinformatics I
Bioinformatics I




                   Examples of Microarray images




               Genechip              PCR Microarray
Bioinformatics I


Analysis & Interpretation: making
sense of the expression data
 The most frequent problem is to find
 sets of genes that show correlated
 expression profiles during a process
 (cell cycle, development), a pathological
 condition (cancer, myopathy) or an
 external stimulus (temperature,
 medium, drug)
 • Computing raw data
 • Filter genes (up- or
 downregulated)
 • Cluster expression profiles



                   http://www.biozentrum.unibas.ch/primig/
Bioinformatics I


Where do we need Bioinformatics?
           5’                            3’
 DNA

                                          WhatTranscription
                                               do we know
                                          about a specific
                                             protein?
                                                  Splicing
 mRNA


                                                 Translation
 Poly-
 peptide
                                                  Folding


 Protein



                                         • Transport / Localization
                                         • Oligomerization
                                         • PTM (Post-Translational
                   Function   Function   Modification)
Bioinformatics I




     Minimal content of a « protein sequence » db

                   •   Sequences !!
                   •   Accession number (AC)
                   •   Taxonomic data
                   •   References
                   •   ANNOTATION/CURATION
                   •   Keywords
                   •   Cross-references
                   •   Documentation
Bioinformatics I




    SWISS-PROT/TrEMBL


   • Collaboration between the SIB (CH) and EMBL/EBI (UK)


   • SWISS-PROT: Fully annotated (manually), non-redundant,
       cross-referenced, documented protein sequence database.


   • TrEMBL: is automatically generated (from annotated EMBL
       coding sequences (CDS)) and annotated using software
       tools.

                                   http://www.expasy.org/sprot/
Bioinformatics I




  ExPASy Web Server

  ExPASy =

  Expert
  Protein
  Analysis
  System




                      http://www.expasy.org/
Bioinformatics I


         Some databases in the field of molecular biology…
 AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD,
 Beanref, Biolmage, BioMagResBank,      BIOMDB,     BLOCKS,      BovGBASE,
 BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC,
 CD4OLbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
 CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Picty_cDB, DIP, DOGS,
 DOMO, DPD, DPlnteract, ECDC, ECGC, EC02DBASE, EcoCyc, EcoGene, EMBL,
 EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB,
 GENATLAS, Genbank, GeneCards, Genline, GenLink, GENOTK, GenProtEC,
 GermOnline, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB,
 HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HlVdb, HotMolecBase,
 HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA,
 KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI,
 MHCPEP5 Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR,
 MutBase, MycDB, NDB, NRSub, 0-lycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb,
 PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB,
 PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP,
 REBASE, RGP, SBASE, SCOP, SeqAnaiRef, SGD, SGP, SheepMap, Soybase, SPAD,
 SRNA db, SRPDB, STACK, StyGene,Sub2D, SubtiList, SWISS-2DPAGE, SWISS-
 3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS,
 TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT,
 WormPep, YEPD, YPD, YPM, ......... etc
Bioinformatics I


Why do we need Bioinformatics?
           5’                            3’
 DNA
                                                Transcription

                                                   Splicing
 mRNA
                                      How can we compare
                                       protein sequences?
                                                  Translation
 Poly-
 peptide
                                                   Folding


 Protein



                                         • Transport / Localization
                                         • Oligomerization
                                         • PTM (Post-Translational Modification)
                Function   Function
Bioinformatics I



 Sequence alignments and comparison

      1: MYTAILORISRICH

      2: MONTAILLEURESTRICHE

      1: MY-TAIL--ORIS-RICH-
         ¦x ¦¦¦¦ x¦x¦ ¦¦¦¦         Global Alignment
      2: MONTAILLEURESTRICHE

      1:           TAILO   RICH
                   ¦¦¦¦x   ¦¦¦¦    Two Local Alignments
      2:           TAILL   RICHE
                                     ¦ = Identity
                                     x = Mismatch
                                     - = Insertion / Deletion
Bioinformatics I

                   HBA_CHICK    VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL    48

 Multiple
                   HBAD_CHICK   ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL    48
                   HBPI_CHICK   AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV    48
                   HBB_CHICK    VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL    48

 Sequence          HBE_CHICK
                   HBRH_CHICK
                                VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL
                                VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL
                                                                                      48
                                                                                      48
                   MYG_CHICK    GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL    49

 Alignment                         ....   .   ..*   . .. * * * *.. .* *    * * ..

                   HBA_CHICK    SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV   93
                   HBAD_CHICK   SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV   93
                   HBPI_CHICK   SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV   93
                   HBB_CHICK    SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV   98
 (MSA)             HBE_CHICK
                   HBRH_CHICK
                                SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV
                                SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV
                                                                                     98
                                                                                     98
                   MYG_CHICK    KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI   99
                                .      *. .. ** .*.. . . .. ..     .   *.. *    ..

                   HBA_CHICK    DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR--   141
                   HBAD_CHICK   DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR--   141
                   HBPI_CHICK   DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR--   141
 Programs:         HBB_CHICK
                   HBE_CHICK
                                DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH--
                                DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH--
                                                                                     146
                                                                                     146

 • CLUSTALW        HBRH_CHICK
                   MYG_CHICK
                                DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH--
                                PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF
                                                                                     146
                                                                                     149
                                    . .... .   .* .   . ... .   .* .      .. **.
 • T_COFFEE
                   HBA_CHICK    ----   141
 • MULTALIGN       HBAD_CHICK
                   HBPI_CHICK
                                ----
                                ----
                                       141
                                       141
                   HBB_CHICK    ----   146
                   HBE_CHICK    ----   146
                   HBRH_CHICK   ----   146
                   MYG_CHICK    GFQG   153

                   Consensus length: 154; Identity : 19 ( 12.3%); Similarity: 51 ( 33.1%)
                   Character to show that a position in the alignment is perfectly conserved: '*'
                   Character to show that a position is well conserved: '.'
Bioinformatics I   NCBI BLAST   http://www.ncbi.nlm.nih.gov/blast/




BLAST:

Basic
Local
Alignment
Search
Tool
Bioinformatics I


Why do we need Bioinformatics?
           5’                            3’
 DNA
                                               Transcription

                                                  Splicing
 mRNA
                                        Can we predict
                                      protein structures?
                                                  Translation
 Poly-
 peptide
                                                  Folding


 Protein



                                        • Transport / Localization
                                        • Oligomerization
                                        • PTM (Post-Translational Modification)
                Function   Function
Bioinformatics I



  Protein Structure Modeling

     • Ab initio modeling
     • Threading & Fold Recognition
     • Homology Modeling



 MNIFEMLRID
 HLLTKSPSLN
               EGLRLKIYKD
               AAKSELDKAI
                            TEGYYTIGIG
                            GRNCNGVITK
                                         ?
 DEAEKLFNQD    VDAAVRGILR   NAKLKPVYDS
 LDAVRRCALI    NMVFQMGETG   VAGFTNSLRM
 LQQKRWDEAA    VNLAKSRWYN   QTPNRAKRVI
 TTFRTGTWDA    YKNL
Bioinformatics I




  Some remarks on bioinformatics in general ...


    • It does not replace but complement
      experimental research

    • It helps plan experiments

    • Good bioinformatics studies take a long
      time

    • Like anywhere else: some garbage in, a
      lot of garbage out
Bioinformatics I




    For lecture notes, articles, references, links, etc.

                           see „Teaching“

                                  @

             http://www.bioz.unibas.ch/personal/schwede/

               http://www.bioz.unibas.ch/personal/primig/

				
DOCUMENT INFO