                Swiss Institute of Bioinformatics
                       Bioinformatics I
                       2003 - 2004
               Mihaela Zavolan | Michael Primig
             Erik van Nimwegen | Torsten Schwede

       For lecture notes, articles, references, links, etc.,
       see "Teaching" at:

        http://www.bioz.unibas.ch/personal/schwede/

          http://www.bioz.unibas.ch/personal/primig/

Date     Topic                                                                 Lecturer
21.10.   Introduction                                                          M.P., M.Z. & E.v.N.
28.10.   Sequence Comparison I: pairwise alignments, dynamic programming       M. Zavolan
04.11.   Sequence Comparison II: multiple sequence alignments                  A. Brüngger (Novartis)
11.11.   Alignment Programs and Sequence Database Searching                    A. Brüngger (Novartis)
18.11.   Evolution of Genes and Genomes                                        M. Primig
25.11.   Evolution of Protein Families                                         T. Schwede
02.12.   no lecture (SAB meeting)                                              ---
09.12.   Models of Molecular Evolution                                         E. van Nimwegen
16.12.   Gene Finding I: protein-coding genes, prokaryotic genome annotation   M. Zavolan
06.01.   Gene Finding II: eukaryotic genome annotation                         M. Zavolan
13.01.   Pharmacogenomics & SNPs                                               T. Schwede
20.01.   Clustering                                                            E. van Nimwegen
27.01.   Microarray Data Computation                                           M. Peitsch
03.02.   Microarray Data Analysis                                              M.P. & T.S.
10.02.   WebXam


What is Bioinformatics?

     An operational definition:

     The application of computer science to molecular biology,
     in particular to the study of macromolecules such as
     proteins and nucleic acids.


     Some Synonyms:

     • Molecular Bioinformatics

     • Computational Biology

     • Biocomputing


What is not Bioinformatics?

   • “Bio-inspired” computer science (e.g. artificial life, neural
     networks, genetic algorithms)

   • Genome sequencing (e.g. genetic mapping and contig
     assembly)

   • Biomathematics or biostatistics

   • Modeling of biological systems (ecological or population
     modeling)

   • Protein Structure Refinement (X-Ray, NMR)

   • DNA Computers


Why do we need Bioinformatics?
      5’             3’
 DNA
                       Transcription

                        Splicing
 mRNA


                       Translation
 Polypeptide
                        Folding


 Protein                   • Transport / Localization
                   • Oligomerization
                   • PTM (Post-Translational Modification)
        Function  Function
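
The DNA -> mRNA -> polypeptide steps of the diagram can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is the coding strand, splicing is skipped, the standard genetic code is used, and translation starts at the first ATG and stops at the first stop codon. The function names `transcribe` and `translate` are made up for this sketch. The example fragment is taken from the HIV-1 sequence shown on the DNA databases slide, with a stop codon (`taa`) appended for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (T replaced by U). Splicing is ignored."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon.

    Assumes the sequence contains only A, C, G and U.
    """
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Fragment from the HIV-1 sequence in the slides, plus an illustrative stop codon.
fragment = "atgggtgcgagagcgtcg" + "taa"
mrna = transcribe(fragment)
peptide = translate(mrna)
```

Folding, localization and post-translational modification, the later steps of the diagram, are not modeled here.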

Why do we need Bioinformatics?

     Genome projects need to store and organize DNA sequences.

      5’             3’
 DNA
                       Transcription

                        Splicing
 mRNA


                       Translation
 Polypeptide
                        Folding


 Protein                   • Transport / Localization
                   • Oligomerization
                   • PTM (Post-Translational Modification)
        Function  Function


DNA databases...
 gggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcagaccaaatttagtcagtgtgaaaa
 atctctagcagtggcgcctgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagagagatgggtgc
 gagagcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatac
 tgggacaactacaaccagcccttcagacaggatcagaagaacttagatcattatataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagca
 gcagctgacacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaacatccaggggcaaatggtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatc
 agaaggagccaccccacaagatttaaacaccatgctaaacacagtggggggacatcaagcagccatgcaaatgttaaaagagaccatcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaa
 gtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccacctatcccagtaggagaaatctataagagatggataatcctgggattaaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaa
 ccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgacagaaaccttgttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgggaccagcagctacactagaagaaatgatgacagc
 atgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaattgcaggg
 cccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccagggaattttcctcagaacagactagagccaacagccccaccagccccaccagaagag
 agcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaactaaaggaagctctattagatacaggagcagatgataca
 gtattagaagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcgtagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtcaacataattggaagaaatct
 gttgactcagattggttgcactttaaattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaattt
 caaaaatcgggcctgaaaatccatataatactccagtatttgccataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatca
 gtaacagtactggatgtgggtgatgcatatttttcagttcccttagataaagaattcaggaagtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcag
 catgacaaaaatcttagagccttttagaaaacaaaatccagacatagttatctatcaatacatggacgatttgtatgtaggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaac
 atcagaaagaacctccattcctttggatgggttatgaactccatcctgataaatggacagtacagcctatagtgctgccagaaaaggacagctggactgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaa
 ttatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctagaactggcagaaaacagggaaattctaaaagaaccagtacatggagtgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtca
 atggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgtaaaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagtaatatggggaaagactcctaaatttaaactacccatac
 aaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaagaacccataataggagcagaaactttctatgtagatggggcagctaacagggagactaaa
 ttaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaagtaaacatagtaacagactcacaatatgcattaggaatcattcaagcaca
 accagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatcaggaaagtactctttttagatggaatagata
 aagcccaagaagaacatgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaatatggcaactagat
 tgtacacatttagaaggaaaaattatcctggtagcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcac
 cagtactacagttaaggccgcctgttggtgggcaggaatcaagcaggaatttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcag
 tattcatccacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaagga
 ccagcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataatagtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttag
 taaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactatgaaagtactcatccgagaataagttcagaagtacacatcccactagggaatgcaaaattggtaataacaacatattggggtctacatacaggagaaagagactggcatttgggt
 caaggagtctccatagaattgaggaaaaggagatatagcacacaattagaccctaacctagcagaccaactaattcatctgcattactttgattgtttttcagaatctgctataagaaatgccatattaggacatatagttagccctaggtgtgaatatcaagcaggacataa
 caaggtaggatctctacagtacttggcactaacagcattagtaagaccaagaaaaaagataaagccacctttgcctagtgttacaaaactgacagaggatagatggaacaagccccagaagaccaagggccacaaagggaaccatacaatgaatggacactagaacttttaga
 ggagctcaagaatgaagctgttagacattttcctaggatatggctccatagcttagggcaacatatctatgaaacttatggagatacttgggcaggagtggaagccataataagaattctgcaacaactgctgtttattcatttcagaattgggtgtcaacatagcagaatag
 acattcttcgacgaaggagagcaagaaatggagccagtagatcctagactagagccctggaagcatccaggaagtcagcctaggactgcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttgtttcataacaaaaggcttaggcatctcctatggcag
 gaagaagcggagacagcgacgaagagctcctcaagacagtcagactcatcaagtttctctatcaaagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagtagcattagtagtagcagcaataatagcaatagttgtgtggtccatagtattcatagaat
 ataggaaaataagaagacaaaacaaaatagaaaggttgattgatagaataatagaaagagcagaagacagtggcaatgagagtgacggagatcaggaagaattatcagcacttgtggaaatggggcacgatgctccttgggatgttaatgatctgtaaagctgcagaaaattt
 gtgggtcacagtttattatggggtacctgtgtggaaagaagcaaccaccactctattttgtgcctcagatgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcctgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgacagaaa
 attttaacatgtggaaaaataacatggtagaccaaatgcatgaggatataattagtttatgggatcaaagcctaaagccatgtgtaaaattaaccccactctgtgttactttaaattgcactgattatgggaatgatactaacaccaataatagtagtgctactaaccccact
 agtagtagcgggggaatggaggggagaggagaaataaaaaattgctctttcaatatcaccagaagcataagagataaagtgaagaaagaatatgcacttttttatagtcttgatgtaataccaataaaagatgataatactagctataggttgagaagttgtaacacctcagt
 cattacacaggcctgtccaaaggtatcctttgaaccaattcccatacattattgtgccccggctggttttgcgattctaaagtgtaatgataaaaagttcaatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatggaattaggccagtagtatcaactc
 aactgctgttaaatggcagtctagcagaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagtcataatagtacatctgaatgaatctgtagaaattaattgtacaagactcaacaacattacaaggagaagtatacatgtaggacatgtaggaccaggc
 agagcaatttatacaacaggaataataggaaaaataagacaagcacattgtaacattagtagagcaaaatggaataacactttaaaacagatagttacaaaattaagagaacaatttaagaataaaacaatagtctttaatcaatcctcaggaggggacccagaaattgtaat
 gcacagttttaattgtggaggggaatttttctactgtaattcaacacaactgtttaacagtacttggaatggtactgcatggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgcagaataaaacaaattataaacatgtggcaggaagtaggaaaag
 caatgtatgcacctcccatcagaggacaaattagatgttcatcaaatattacagggctgatattaacaagagatggtggtattaaccagaccaacaccaccgagattttcaggcctggaggaggagatatgaaggacaattggagaagtgaattatataaatataaagtagta
 aaaattgaaccattaggagtagcacccaccaaggcaaagagaagagtggtgcaaagagaaaaaagagcagtgggaataataggagctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagcgtcaatgacgctgacggtacaggccagacaattattgtc
 tggtatagtgcaacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttgcacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgtggaaagatacctaagggatcaacagctcctggggttttggggttgctctggaaaactcattt
 gcaccactgctgtgccttggaatactagttggagtaataaatctctgagtcagatttgggataacatgacctggatgcagtgggaaagggaaattgataattacacaagcttaatatacaacttaattgaagaatcgcaaaaccaacaagaaaagaatgaacaagagttattg
 gaattagataactgggcaagtttgtggaattggtttagcataacaaattggctgtggtatataaaaatattcataatgatagtaggaggcttggtaggtttaagaatagtttttactgtactttctatagtaaatagagttaggcagggatactcaccattgtcgtttcagac
 gcgcctcccagccaggaggggacccgacaggcccgaaggaatcgaagaagaaggtggagagagagacagagacagatccggtcaattagtggatggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttcagctaccaccgcttgagagacttactcttga
 ttgtaacgaggattgtggaacttctgggacgcagggggtgggaagccctcaaatattggtggaatctcctacaatattggattcaggaactaaagaatagtgctgttagcttgctcaacgccacagccatagcagtagctgagggaactgatagggttatagaagtattacaa
 agagcttgtagagctattctccacatacctagaagaataagacagggcttagaaagggctttgcaataagatgggtggtaagtggtcaaaaagtagtaaaattggatggcctactgtaagggaaagaatgagaagagctgagccagcagcagatggggtgggagcagtatctc
 gagacctggaaaaacatggagcaatcacaagtagtaatacagcaactaacaatgctgattgtgcctggctagaagcacaagaggaggaggaggtgggttttccagtcagacctcaggtacctttaagaccaatgacttacaagggagcgttagatcttagccactttttaaaa
 gaaaaggggggactggaagggctaatttggtcccagaaaagacaagacatccttgatttgtgggtccaccacacacaaggctacttccctgattggcagaactacacaccagggccagggatcagatatccactgacctttggttggtgcttcaagctagtaccagttgagcc
 agagaaggtagaagaggccaatgaaggagagaacaacagattgttacaccctgtgagcctgcatgggatggaggacccggagaaagaagtgttagtatggaggtttgacagccgcctagtactccgtcacatggcccgagagctgcatccggagtactacaaggactgctgac
 actgagctttctacaagggactttccgctggggactttccagggaggcgtggcctgggcgggactggggagtggcgagccctcagatgctgcatataagcagctgctttttgcctgtactgggtctctcttgttagaccagatctgagcctgggagctctctggctaactagg
 gaacccactgcttaagcctcaataaagcttgccttgagtgcttca

 Human immunodeficiency virus type 1, complete genome 9214 BP

 DNA databases
     • GenBank (USA) http://www.ncbi.nlm.nih.gov/Genbank/
     • EMBL (Europe) http://www.ebi.ac.uk/embl/
     • DDBJ (Japan)  http://www.ddbj.nig.ac.jp/

  EMBL/GenBank/DDBJ Annotations: Warning!

  DNA database annotations are full of errors:

  • Errors in sequences, in annotations, in CDS attribution
  • No consistency of annotations
  • Most annotations are done by the scientists who submit
    the data
  • Heterogeneous data quality and updating


 Some interesting sequence annotations:

   FT source    1..124
   FT      /db_xref="taxon:4097"
   FT      /organelle="plastid:chloroplast"
   FT      /organism="Nicotiana tabacum"
   FT      /isolate="Cuban cahibo cigar, gift from President Fidel
   FT      Castro"


Or:
   FT source     1..17084
   FT      /chromosome="complete mitochondrial genome"
   FT      /db_xref="taxon:9267"
   FT      /organelle="mitochondrion"
   FT      /organism="Didelphis virginiana"
   FT      /dev_stage="adult"
   FT      /isolate="fresh road killed individual"
   FT      /tissue_type="liver"

                            ???
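
Annotations like the ones above are easier to check for consistency once they are parsed into key/value pairs. The following is a minimal sketch, not a full EMBL parser: it assumes every feature line has a key and a location, that qualifiers follow their feature, and that a quoted value may continue on the next `FT` line (as in the `/isolate` example). The function name `parse_ft_lines` is hypothetical.

```python
def parse_ft_lines(lines):
    """Parse EMBL-style 'FT' lines into a list of feature dictionaries."""
    features = []
    open_qualifier = None  # qualifier whose quoted value is not closed yet
    for raw in lines:
        if not raw.startswith("FT"):
            continue
        text = raw[2:].strip()
        if not text:
            continue
        if open_qualifier is not None:
            # Continuation of a quoted value from the previous line.
            quals = features[-1]["qualifiers"]
            quals[open_qualifier] += " " + text
            if text.endswith('"'):
                quals[open_qualifier] = quals[open_qualifier].strip('"')
                open_qualifier = None
        elif text.startswith("/"):
            name, _, value = text[1:].partition("=")
            closed = len(value) > 1 and value.endswith('"')
            if value.startswith('"') and not closed:
                open_qualifier = name
                features[-1]["qualifiers"][name] = value
            else:
                features[-1]["qualifiers"][name] = value.strip('"')
        else:
            key, location = text.split(None, 1)
            features.append(
                {"key": key, "location": location.strip(), "qualifiers": {}}
            )
    return features


# The first annotation example from the slide.
EXAMPLE = [
    'FT source    1..124',
    'FT      /db_xref="taxon:4097"',
    'FT      /organelle="plastid:chloroplast"',
    'FT      /organism="Nicotiana tabacum"',
    'FT      /isolate="Cuban cahibo cigar, gift from President Fidel',
    'FT      Castro"',
]
features = parse_ft_lines(EXAMPLE)
```

With the qualifiers in a dictionary, simple checks become possible, for example flagging a `/chromosome` value that does not look like a chromosome name.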

Taxonomy Browser @ EBI:
          http://www.ebi.ac.uk/newt/

Why do we need Bioinformatics?

     How do we find protein coding regions, introns and exons in
     genomic DNA sequences?

      5’             3’
 DNA
                       Transcription

                        Splicing
 mRNA


                       Translation
 Polypeptide
                        Folding


 Protein                   • Transport / Localization
                   • Oligomerization
                   • PTM (Post-Translational Modification)
        Function  Function
 Finding genes in prokaryotic genomes ...


   Intrinsic Information:

   • Transcription signals
   • Codon usage
   • GC content

    External Information:

   • Similarity to known proteins in other organisms
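
Two of the intrinsic signals can be illustrated with a small sketch: GC content and open reading frames (ORFs) as a crude stand-in for gene candidates. This is not how a production gene finder works; it assumes forward-strand frames only, ATG as the only start codon, and a hypothetical minimum length `min_codons`. Codon usage and transcription signals are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_orfs(seq, min_codons=3):
    """Return (start, end) of ATG...stop stretches on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts the codons from ATG up to, not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence (made up for illustration) with one short ORF in frame 2.
toy = "CCATGAAACCCGGGTAGTT"
toy_orfs = find_orfs(toy)
```

A real prokaryotic gene finder would also scan the reverse strand and score candidates by codon usage; the external evidence mentioned above (similarity to known proteins) would come from a database search.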
 Finding genes in eukaryotic genomes ...

  Some of the problems:


    • Complexity of the genome increases with the complexity of
      the organism
    • Only 1-2 % coding sequence
    • Intron-exon structure
    • Alternative splicing (~30% of all genes)
    • Pseudogenes
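
The intron-exon structure and alternative splicing can be made concrete with a toy example. The sketch below assumes exon coordinates are already known (finding them is the hard part) and uses a made-up gene built from three exons and two introns; `splice` and `coding_fraction` are hypothetical helper names.

```python
def splice(gene, exons):
    """Join exon segments given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def coding_fraction(gene_length, exons):
    """Fraction of the gene covered by the given exons."""
    return sum(end - start for start, end in exons) / gene_length


# Made-up gene: exons in upper-case names, introns start with GT and end with AG.
EXON1, INTRON1 = "ATGGCC", "GTAAGTTTTCAG"
EXON2, INTRON2 = "AAAGGG", "GTATGTCCCTAG"
EXON3 = "TTTTAA"
gene = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3

all_exons = [(0, 6), (18, 24), (36, 42)]
full_transcript = splice(gene, all_exons)

# Alternative splicing: the same gene with exon 2 skipped.
skipped_transcript = splice(gene, [(0, 6), (36, 42)])
```

In this toy gene 18 of 42 bases are exonic; in a real eukaryotic genome the coding share is far lower, as noted above.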

Why do we need Bioinformatics?

     Under which conditions is a certain gene transcribed?

      5’             3’
 DNA
                       Transcription

                        Splicing
 mRNA


                       Translation
 Polypeptide
                        Folding


 Protein                   • Transport / Localization
                   • Oligomerization
                   • PTM (Post-Translational Modification)
        Function  Function

 Gene Expression Analysis using DNA Microarrays

    Microarrays are ordered sets of DNA molecules of
    known or unknown sequence immobilized on a support
    (e.g. glass).
    • DNA
    • DNA replication
    • RNA
    • Genome-wide Protein-DNA interactions

  PCR Microarrays

  GeneChips
   $$$

  Examples of Microarray images: GeneChip, PCR Microarray



Analysis & Interpretation: making sense of the expression data

 The most frequent problem is to find sets of genes that show
 correlated expression profiles during a process (cell cycle,
 development), in a pathological condition (cancer, myopathy) or
 in response to an external stimulus (temperature, medium, drug).

 • Compute raw data
 • Filter genes (up- or downregulated)
 • Cluster expression profiles

 http://www.biozentrum.unibas.ch/primig/
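
The three steps can be sketched on toy data. The example below is an assumption-laden illustration, not the analysis pipeline used in the course: expression values are taken to be log2 ratios, the filter keeps genes whose absolute log2 ratio reaches a threshold at least once, and "correlated" is measured with the Pearson coefficient. Listing correlated pairs is a simple stand-in for a real clustering method. Gene names and values are made up.

```python
import math


def log2_ratios(sample, reference):
    """Raw intensities -> log2(sample / reference) per time point."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def filter_regulated(profiles, threshold=1.0):
    """Keep genes that are up- or downregulated at least 2**threshold-fold once."""
    return {
        gene: values
        for gene, values in profiles.items()
        if max(abs(v) for v in values) >= threshold
    }


def correlated_pairs(profiles, min_r=0.9):
    """List gene pairs whose profiles correlate with r >= min_r."""
    genes = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if pearson(profiles[a], profiles[b]) >= min_r
    ]


# Hypothetical log2-ratio profiles over four time points.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.1, 2.1, 3.1],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [0.1, -0.1, 0.1, -0.1],
}
regulated = filter_regulated(profiles)
pairs = correlated_pairs(regulated)
```

Here `geneD` is dropped by the filter, and only `geneA` and `geneB` are reported as co-expressed; `geneC` is anti-correlated with both.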

Where do we need Bioinformatics?

     What do we know about a specific protein?

      5’              3’
 DNA
                         Transcription

                         Splicing
 mRNA


                         Translation
 Polypeptide
                         Folding


 Protein                     • Transport / Localization
                     • Oligomerization
                     • PTM (Post-Translational Modification)
          Function  Function

   Minimal content of a "protein sequence" database

          •  Sequences!!
          •  Accession number (AC)
          •  Taxonomic data
          •  References
          •  ANNOTATION/CURATION
          •  Keywords
          •  Cross-references
          •  Documentation
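
As a rough illustration of the list above, one database entry could be modeled as a small record type. The field names are assumptions chosen to mirror the slide, not the schema of any real database, and the example values (including the accession number) are placeholders. Documentation is left out because it describes the database as a whole rather than a single entry.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinEntry:
    """One entry of a hypothetical protein sequence database."""
    accession: str                  # accession number (AC)
    sequence: str                   # amino acid sequence, one-letter code
    organism: str                   # taxonomic data
    references: list = field(default_factory=list)
    annotation: dict = field(default_factory=dict)
    keywords: list = field(default_factory=list)
    cross_references: dict = field(default_factory=dict)

    def is_minimal(self):
        """True if the entry has at least accession, sequence and organism."""
        return bool(self.accession and self.sequence and self.organism)


# Placeholder entry; none of these values come from a real database.
entry = ProteinEntry(
    accession="X00000",
    sequence="MGARAS",
    organism="Human immunodeficiency virus type 1",
    keywords=["example"],
)
```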
  SWISS-PROT/TrEMBL


  • Collaboration between the SIB (CH) and EMBL/EBI (UK)


  • SWISS-PROT: Fully annotated (manually), non-redundant,
    cross-referenced, documented protein sequence database.


  • TrEMBL: automatically generated from annotated EMBL
    coding sequences (CDS) and annotated using software
    tools.

                  http://www.expasy.org/sprot/

 ExPASy Web Server

 ExPASy = Expert Protein Analysis System

           http://www.expasy.org/


     Some databases in the field of molecular biology…
 AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD,
 Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
 BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC,
 CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
 CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS,
 DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL,
 EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB,
 GENATLAS, GenBank, GeneCards, Genline, GenLink, GENOTK, GenProtEC,
 GermOnline, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB,
 HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase,
 HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA,
 KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI,
 MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR,
 MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb,
 PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB,
 PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP,
 REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD,
 SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE,
 SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS,
 TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT,
 WormPep, YEPD, YPD, YPM, ... etc.
Bioinformatics I


Why do we need Bioinformatics?
 How can we compare protein sequences?

 DNA          5’ ------------ 3’
                 Transcription
                 Splicing
 mRNA
                 Translation
 Polypeptide
                 Folding
 Protein
                 • Transport / Localization
                 • Oligomerization
                 • PTM (Post-Translational Modification)
 Function
Bioinformatics I
 Sequence alignments and comparison

   1: MYTAILORISRICH
   2: MONTAILLEURESTRICHE

   Global Alignment
   1: MY-TAIL--ORIS-RICH-
      ¦x ¦¦¦¦  x¦x¦ ¦¦¦¦
   2: MONTAILLEURESTRICHE

   Two Local Alignments
   1: TAILO  RICH
      ¦¦¦¦x  ¦¦¦¦
   2: TAILL  RICHE

   ¦ = Identity
   x = Mismatch
   - = Insertion / Deletion
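
The legend can be checked mechanically. The short Python sketch below takes the two gapped sequences of the global alignment shown on this slide, rebuilds the match line and counts identities, mismatches and gap positions. The function name is made up for the example, and "|" is used in place of "¦" for identities.

```python
def annotate_alignment(a, b):
    """Return the match line and per-category counts for two aligned sequences.

    Both strings must already be aligned (same length, '-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    counts = {"identity": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            symbols.append(" ")      # insertion / deletion
            counts["gap"] += 1
        elif x == y:
            symbols.append("|")      # identity
            counts["identity"] += 1
        else:
            symbols.append("x")      # mismatch
            counts["mismatch"] += 1
    return "".join(symbols), counts

seq1 = "MY-TAIL--ORIS-RICH-"
seq2 = "MONTAILLEURESTRICHE"
match_line, counts = annotate_alignment(seq1, seq2)

print(seq1)
print(match_line)
print(seq2)
print(counts)
```

This only annotates an alignment that is already given. Finding the best alignment is a separate problem, treated in the lecture on pairwise alignments and dynamic programming.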
Bioinformatics I

 Multiple Sequence Alignment (MSA)

          HBA_CHICK   VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL  48
          HBAD_CHICK  ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL  48
          HBPI_CHICK  AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV  48
          HBB_CHICK   VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL  48
          HBE_CHICK   VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL  48
          HBRH_CHICK  VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL  48
          MYG_CHICK   GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL  49
                      ....  .  ..*  . .. * * * *.. .* *  * * ..

          HBA_CHICK   SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV  93
          HBAD_CHICK  SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV  93
          HBPI_CHICK  SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV  93
          HBB_CHICK   SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV  98
          HBE_CHICK   SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV  98
          HBRH_CHICK  SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV  98
          MYG_CHICK   KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI  99
                      .   *. .. ** .*.. . . .. ..   .  *.. *  ..

          HBA_CHICK   DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR--  141
          HBAD_CHICK  DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR--  141
          HBPI_CHICK  DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR--  141
          HBB_CHICK   DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH--  146
          HBE_CHICK   DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH--  146
          HBRH_CHICK  DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH--  146
          MYG_CHICK   PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF  149
                      . .... .  .* .  . ... .  .* .   .. **.

          HBA_CHICK   ----  141
          HBAD_CHICK  ----  141
          HBPI_CHICK  ----  141
          HBB_CHICK   ----  146
          HBE_CHICK   ----  146
          HBRH_CHICK  ----  146
          MYG_CHICK   GFQG  153

          Consensus length: 154; Identity: 19 (12.3%); Similarity: 51 (33.1%)
          Character to show that a position in the alignment is perfectly conserved: '*'
          Character to show that a position is well conserved: '.'

 Programs:
 • CLUSTALW
 • T_COFFEE
 • MULTALIGN
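
To make the '*' marker and the identity percentage concrete, here is a small Python sketch. It marks the perfectly conserved columns in the first 20 columns of the alignment above and recomputes the two percentages from the counts on the slide. The function names are invented for the example, and the '.' marker is left out because its definition depends on the similarity scheme of the alignment program, which the slide does not give.

```python
def conserved_columns(rows):
    """Return a marker line with '*' under every perfectly conserved column.

    A column counts as conserved if all rows carry the same residue
    and that residue is not a gap ('-')."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)

def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

# First 20 alignment columns of the seven sequences shown above.
rows = [
    "VL-SAADKNNVKGIFTKIAG",  # HBA_CHICK
    "ML-TAEDKKLIQQAWEKAAS",  # HBAD_CHICK
    "AL-TQAEKAAVTTIWAKVAT",  # HBPI_CHICK
    "VHWTAEEKQLITGLWGKV--",  # HBB_CHICK
    "VHWSAEEKQLITSVWSKV--",  # HBE_CHICK
    "VHWSAEEKQLITSVWSKV--",  # HBRH_CHICK
    "GL-SDQEWQQVLTIWGKVEA",  # MYG_CHICK
]

stars = conserved_columns(rows)
for row in rows:
    print(row)
print(stars)

# Counts taken from the summary line of the slide.
print(percent(19, 154), percent(51, 154))
```

Only one of these 20 columns, the lysine at position 17, is identical in all seven sequences.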
Bioinformatics I
 NCBI BLAST

 BLAST = Basic Local Alignment Search Tool

            http://www.ncbi.nlm.nih.gov/blast/
Bioinformatics I


Why do we need Bioinformatics?
 Can we predict protein structures?

 DNA          5’ ------------ 3’
                 Transcription
                 Splicing
 mRNA
                 Translation
 Polypeptide
                 Folding
 Protein
                 • Transport / Localization
                 • Oligomerization
                 • PTM (Post-Translational Modification)
 Function
Bioinformatics I
 Protein Structure Modeling

   • Ab initio modeling
   • Threading & Fold Recognition
   • Homology Modeling

   MNIFEMLRID  EGLRLKIYKD  TEGYYTIGIG
   HLLTKSPSLN  AAKSELDKAI  GRNCNGVITK
   DEAEKLFNQD  VDAAVRGILR  NAKLKPVYDS
   LDAVRRCALI  NMVFQMGETG  VAGFTNSLRM
   LQQKRWDEAA  VNLAKSRWYN  QTPNRAKRVI
   TTFRTGTWDA  YKNL

                     ?
Bioinformatics I
 Some remarks on bioinformatics in general ...


  • It does not replace experimental research
   but complements it

  • It helps to plan experiments

  • Good bioinformatics studies take a long
   time

  • Like anywhere else: some garbage in, a
   lot of garbage out