Prediction of Protein-2D-Structure

                          R. Merkl, University of Regensburg




Predicting 2D-Structure                                        1
                               Content


            Protein-2D-Structure

            Methods for predicting 2D
            Generating MSAs
            Neural networks

            State-of-the-art 2D prediction

            Utilization of 2D information
            Homology modeling

                              Protein-Architecture




              How can one classify proteins?


   The number of protein architectures is limited (< 10 000)

         It must be possible to cluster proteins



   First approach: Based on Function

         Content of Domains




               Definition of a Protein Domain




      A protein domain is the smallest unit with a well-defined
      structure that folds independently of the rest of the
      protein. Usually, domains consist of 50-150 residues and
      possess individual functions. Their concerted interplay
      constitutes the function of the protein.

                          CAP-Protein


                                C-terminal domain (blue)
                                involved in dimerisation

                                N-terminal domain (orange)
                                binds cAMP




   Domain Composition of SAP27 and MAGI-1A



         Content of Domains

            Which domains occur in the protein?
            How often?




        Second approach: Classification of
           Proteins Based on Structure


   Levels of hierarchy [Linderstrøm-Lang, 1952]

            primary structure (1D)
            secondary structure (2D)
            tertiary structure (3D)

            quaternary structure (4D)

                          First Level: Sequence



>1LBM_A
MVRVKICGITNLEDALFSVESGADAVGFVFYPKSKRYISPEDARRISVELP
PFVFRVGVFVNEEPEKILDVASYVQLNAVQLHGEEPIELCRKIAERILVIK
AVGVSNERDMERALNYREFPILLDTKTPEYGGSGKTFDWSLILPYRDRFRY
LVLSGGLNPENVRSAIDVVRPFAVDVSSGVEAFPGKKDHDSIKMFIKNAKG
L




                     Result of Sequence Comparison (BLAST):
        Bifunctional Enzyme Phosphoribosylanthranilate Isomerase (1pii):
             Indoleglycerolphosphate Synthase from Escherichia coli


Length=452 Query Score = 79.3 bits (194), Expect = 7e-16, Method:
Compositional matrix adjust. Identities = 64/195 (32%),
Positives = 96/195 (49%), Gaps = 22/195 (11%)

Query    5       KICGITNLEDALFSVESGADAVGFVFYPKSKRYISPEDARRISVELPPFVFRVGVFVNEE   64
                 K+CG+T +DA + ++GA     G +F   S R ++ E A+ +    P + VGVF N +
Sbjct    258     KVCGLTRGQDAKAAYDAGAIYGGLIFVATSPRCVNVEQAQEVMAAAP--LQYVGVFRNHD   315

Query    65      PEKILDVASYVQLNAVQLHGEEP---IELCRK-IAERILVIKAVGVSNERDMERALNYRE   120
                    ++D A + L AVQLHG E     I+ R+ +    + + KA+ V         L RE
Sbjct    316     IADVVDKAKVLSLAAVQLHGNEEQLYIDTLREALPAHVAIWKALSVG------ETLPARE   369

Query    121     FP----ILLDTKTPEYGGSGKTFDWSLILPYRDRFRYLVLSGGLNPENVRSAIDVVRPFA   176
                 F      +LD      GGSG+ FDWSL+         ++L+GGL +N    A
Sbjct    370     FQHVDKYVLDNGQ---GGSGQRFDWSLL--NGQSLGNVLLAGGLGADNCVEAAQ-TGCAG   423

Query    177     VDVSSGVEAFPGKKD   191
                 +D +S VE+ PG KD
Sbjct    424     LDFNSAVESQPGIKD   438
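
The "Identities" figure in the report is a plain count of aligned positions carrying the same residue. As a minimal sketch (the function name `count_identities` is hypothetical and not part of BLAST), the last alignment block above can be recounted like this:

```python
def count_identities(query: str, subject: str) -> int:
    """Count aligned positions where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # A gap character never counts as an identity.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


query = "VDVSSGVEAFPGKKD"    # Query 177-191 from the alignment above
subject = "LDFNSAVESQPGIKD"  # Sbjct 424-438 from the alignment above

matches = count_identities(query, subject)
print(f"{matches}/{len(query)} identical positions in this block")
```

Summed over all blocks, the same count gives the 64 identities in 195 aligned columns that the report rounds down to 32%.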




                     Second Level: Series of 2D-Elements




                          Third Level: 3D-Structure




                 Fourth Level: 3D-Structure of Complexes:
                        Large Unit of the Ribosome




                              Summary




            Each classification level has specific pros and cons

            Depending on the problem considered,
            a different level has to be chosen

   In the following: Focussing on 2D-Structure

            Which biochemical properties determine the topology?

          Topology of the Peptide Bond




Two angles (Φ, Ψ) per residue determine the orientation of the main chain.
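
Both angles are dihedral (torsion) angles defined by four consecutive backbone atoms: Φ by C(i-1), N(i), Cα(i), C(i) and Ψ by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of how such an angle can be computed from coordinates; the helper names are hypothetical and the test points are made up, not taken from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (cis = 0, trans = 180)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the central bond runs along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applied to the backbone atoms of every residue, this yields the (Φ, Ψ) pairs that a Ramachandran plot displays.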
    Figure 3: Ramachandran plot of the main-chain angles [phi] and [psi] for MFE-23
    (Biochemical Journal, www.biochemj.org, Biochem. J. (2000) 346, 519-528)

                          Specific Areas




                               2D-Elements


   Regularly arranged 3D-Structures

                      which are part of the main chain



   Caused by

                      hydrogen bonds between

                      imino- and carbonyl-groups




                            α-Helices


   If the (Φ, Ψ) angles of successive residues are constant,
   then the results are

                          helical structures



   Most important


                              α -Helix




                          α-Helices


                                      Hydrogen bonds
                                      between the
                                      C=O-group of one
                                      AA and the
                                      NH-group of the AA
                                      following after four
                                      positions.

                                      3.6 AAs constitute
                                      one turn.

               Helices at the Protein Surface




                              Hydrophobic moment:

                          Hydrophobic, non-polar residues
                          Hydrophilic, polar residues
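
One way to quantify this segregation is to sum per-residue hydrophobicity values as vectors around the helix axis, advancing 100° per residue (360° / 3.6 residues per turn). The sketch below is an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale, which the slides do not name, and the function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumption: the slides do not specify a scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, delta_deg=100.0):
    """Mean hydrophobic moment of a segment, assuming delta_deg rotation per residue."""
    delta = math.radians(delta_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(n * delta) for n, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(n * delta) for n, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Uniform segment: 18 residues cover exactly five turns, so the vectors cancel.
uniform = "L" * 18

# Artificial amphipathic segment: Leu on one face of the helix, Lys on the other.
amphipathic = "".join(
    "L" if math.cos(math.radians(100 * n)) >= 0 else "K" for n in range(18)
)

print(hydrophobic_moment(uniform), hydrophobic_moment(amphipathic))
```

A surface helix with one hydrophobic and one hydrophilic face gives a large value, while a uniformly hydrophobic segment gives a value near zero.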
                          β-Sheets

   β-sheets consist of individual
        β-strands (usually 5-10 residues long)

   Caused by hydrogen bonds between residues
   of different strands.

   They link a C=O group of one strand
   with an NH group of the adjacent strand.


   The Cα atoms of successive residues lie alternately above
   and below the plane spanned by the sheet.

                               β-Sheet




       Here:

       Anti-parallel strands




                          β-Sheet




                                    H-bonds between positions (k, l)
                                    distant in sequence

                          Architecture of Sheets

   The strands may have different orientations:

                          parallel
                          anti-parallel


   Inside proteins β-sheets are frequently parallel.


   At the surface they are frequently anti-parallel.

   In this case: an alternation of hydrophobic and hydrophilic
   amino acids, which extend into the core of the protein or
   into the solvent, respectively.

           β-Strands at the Protein Surface

                           core:    hydrophobic
                           solvent: hydrophilic

                Specifying 2D-Structure:
              DSSP [Kabsch + Sander 1983]
      1 PYQYPALTPE QKKELSDIAH RIVAPGKGIL AADESTGSIA KRLQSIGTEN
           SS HH HHHHHHHHHH HHTTTT EEE E HHHHH HHHHTTT

    51 TEENRRFYRQ LLLTADDRVN PCIGGVILFH ETLYQKADDG RPFPQVIKSK
       HHHHHHHHH HHTTTTTTTT TTBSEEEE H HHHT B TTS BHHHHHHHT

   101 GGVVGIKVDK GVVPLAGTNG ETTTQGLDGL SERCAQYKKD GADFAKWRCV
          T EEEEE EEE SSSSS EEE TTH HHHHHHHHHT T EEEEEEE

   151 LKIGEHTPSA LAIMENANVL ARYASICQQN GIVPIVEPEI LPDGDHDLKR
        E SSS S H HHHHHHHHHH HHHHHHHHTT T EEEEEEEE SS HHH

   201 CQYVTEKVLA AVYKALSDHH IYLEGTLLKP NMVTPGHACT QKFSHEEIAM
       HHHHHHHHHH HHHHHHHHTT GGGEEE TT S HHHHHH

   251 ATVTALRRTV PPAVTGITFL SGGQSEEEAS INLNAINKCP LLKPWALTFS
       HHHHHHHTTS TTS EEEE TT HHHHH HHHHHHHH S S SEEEEE

   301 YGRALQASAL KAWGGKKENL KAAQEEYVKR ALANSLACQG KYTPSGQAGA
       ESHHHHTHHH HHHTTTGGGH HHHHHHHHHH HHHHHHHTTT SS S

   351 AASESLFVSN HAY
       TTSS

   Code   Structural Element
   H      α-helix
   B      residue in an isolated β-strand
   E      extended, part of a β-sheet
   G      3₁₀-helix (3-helix)
   I      π-helix (5-helix)
   T      turn with H-bond
   S      bend

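
Prediction methods usually work with three classes (helix, strand, loop) rather than the full DSSP alphabet. A minimal sketch of such a reduction follows; the mapping shown (H, G, I to helix; E, B to strand; everything else to loop) is one common convention and is an assumption here, not something the slides prescribe.

```python
# Assumed reduction of DSSP codes to three classes.
DSSP_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Map a DSSP string to H (helix), E (strand) or C (loop/other)."""
    # T, S and blanks all fall through to the loop class C.
    return "".join(DSSP_TO_THREE.get(code, "C") for code in states)


example = reduce_dssp("HHTT EEB GGG S")
print(example)
```

Other reductions exist (for example treating G or B as loop), and the choice changes reported accuracies slightly.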
                     Supersecondary Structure


                                       (βα)8-Barrel

                                       8 (βα)-units
                                       arranged in a
                                       rotationally symmetric
                                       manner

                          Predicting
                   Protein Secondary Structure

               Goal: Predict 2D-Structure from Sequence




                          First Approaches


                          Chou-Fasman [ChFa74]

   Basis

            15 protein 3D-structures (X-ray)

   Analysis

            Occurrence of the amino acids
            in α-helices and β-sheets

   Residue   Pα     Class      Residue   Pβ     Class
   Glu       1.53   Hα         Met       1.67   Hβ
   Ala       1.45   Hα         Val       1.65   Hβ
   Leu       1.34   Hα         Ile       1.60   Hβ
   His       1.24   hα         Cys       1.30   hβ
   Met       1.20   hα         Tyr       1.29   hβ
   Gln       1.17   hα         Phe       1.28   hβ
   Trp       1.14   hα         Gln       1.23   hβ
   Val       1.14   hα         Leu       1.22   hβ
   Phe       1.12   hα         Thr       1.20   hβ
   Lys       1.07   Iα         Trp       1.19   hβ
   Ile       1.00   Iα         Ala       0.97   Iβ
   Asp       0.98   iα         Arg       0.90   iβ
   Thr       0.82   iα         Gly       0.81   iβ
   Ser       0.79   iα         Asp       0.80   iβ
   Arg       0.79   iα         Lys       0.74   bβ
   Cys       0.77   iα         Ser       0.72   bβ
   Asn       0.73   bα         His       0.71   bβ
   Tyr       0.61   bα         Asn       0.65   bβ
   Pro       0.59   Bα         Pro       0.62   bβ
   Gly       0.53   Bα         Glu       0.26   Bβ

   Hα: strong helix former
   hα: helix former
   Iα: weak helix former
   iα: indifferent
   bα: helix breaker
   Bα: strong helix breaker
   Analogous terms are used for β-sheets, according to [ChFa74].

                              Rules

     A cluster of four helical residues (Hα or hα) out of six along
     the protein sequence will initiate an α-helix. The helical
     segment is extended in both directions until sets of
     tetrapeptide breakers (mean Pα < 1.00) are reached.

     Proline cannot occur in the inner helix or at the C-terminal
     helical end but can occur within the last three residues at the
     N-terminal end.

     The inner helix is defined as one omitting the three helical
     residues at both the amino and carboxyl ends. Any segment
     that is at least six residues long with mean Pα > 1.03 and
     Pα > Pβ is predicted as helical.
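
The nucleation and extension rules above can be sketched in a few lines. This is a simplified illustration, not the published procedure: it uses the Pα values from the table, treats Hα and hα residues as formers, and leaves out the proline rules and the comparison with Pβ. All function names are hypothetical.

```python
# P-alpha values transcribed from the propensity table (one-letter codes).
P_ALPHA = {
    "E": 1.53, "A": 1.45, "L": 1.34, "H": 1.24, "M": 1.20, "Q": 1.17,
    "W": 1.14, "V": 1.14, "F": 1.12, "K": 1.07, "I": 1.00, "D": 0.98,
    "T": 0.82, "S": 0.79, "R": 0.79, "C": 0.77, "N": 0.73, "Y": 0.61,
    "P": 0.59, "G": 0.53,
}
HELIX_FORMERS = set("EALHMQWVF")  # classes H-alpha and h-alpha


def mean_p_alpha(segment):
    return sum(P_ALPHA[aa] for aa in segment) / len(segment)


def helix_nuclei(seq, window=6, min_formers=4):
    """Start indices of windows with at least min_formers helix formers."""
    return [
        i for i in range(len(seq) - window + 1)
        if sum(aa in HELIX_FORMERS for aa in seq[i:i + window]) >= min_formers
    ]


def extend_helix(seq, start, window=6, breaker_len=4, cutoff=1.00):
    """Grow a nucleus until the neighbouring tetrapeptide has mean P-alpha < cutoff.

    Returns a half-open interval (left, right) into seq.
    """
    left, right = start, start + window
    while left > 0 and mean_p_alpha(seq[max(0, left - breaker_len):left]) >= cutoff:
        left -= 1
    while right < len(seq) and mean_p_alpha(seq[right:right + breaker_len]) >= cutoff:
        right += 1
    return left, right


seq = "GGPGEALLEAMGPGG"  # made-up test sequence
nuclei = helix_nuclei(seq)
segment = extend_helix(seq, 4)
print(nuclei, seq[segment[0]:segment[1]])
```

In this toy sequence the flanking Gly/Pro tetrapeptides act as breakers, so the nucleus starting at position 4 is not extended.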




                            Rules


     A cluster of three β-formers, or a cluster of three formers
     out of five residues along the sequence, will initiate a
     β-sheet. The β-sheet is propagated in both directions until
     terminated by a set of tetrapeptide breakers (mean Pβ < 1.00).

                               Performance

         On average, the 2D-structure is correct
         for 50-55% of the predictions.

         Reasons for low performance:

                   Scores deduced from local frequencies
                   do not consider long range interactions

                   Number of structures was low


         However

                   Assessment of a larger set of training data
                   does not improve the method.
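
Performance figures of this kind are per-residue accuracies: the fraction of positions whose predicted state matches the observed one. A minimal sketch, with made-up three-state strings and a hypothetical function name:

```python
def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


predicted = "HHHHCCEEEE"  # made-up prediction
observed = "HHHCCCEEEC"   # made-up reference assignment
accuracy = per_residue_accuracy(predicted, observed)
print(f"{100 * accuracy:.0f}% of residues predicted correctly")
```

With three states, random guessing already scores roughly a third, which puts the 50-55% figure into perspective.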


                    Roadmap to Better Results




                  Assessment of longer sub-sequences

                  Analysis of sequence profiles




                          β-Sheet




                                    H-bonds between positions (k, l)
                                    distant in sequence

                 Profile based approach: PHD


                          Rost and Sander [RoSa93], [Rost96]




   Performance

   For the CASP2 data set,
   74% of the 2D-structure was predicted correctly.

                                 Outline


   Essentials

                 Input (query) used to generate a profile.
                 The profile gets analyzed by means of a
                 complex neural network.

   Reason for good performance

                    replacing the query sequence by a
                    profile (MSA).

                                       Multiple Sequence Alignments




"One or two homologous sequences whisper...
a full multiple sequence alignment shouts out loud."
                       Arthur Lesk, 1996

                             A typical MSA

   EETI-II        ----GCPRILMRCKQDSDCLAGCVCGPNG-FCGSP   30
   Ii_Mutant      ----GCPRLLMRCKQDSDCLAGCVCGPNG-FCG--   28
   BDTI-II        ---RGCPRILMRCKRDSDCLAGCVCQKNG-YCG--   29
   CMeTI-B        ---VGCPRILMKCKTDRDCLTGCTCKRNG-YCG--   29
   CMTI-IV        HEERVCPRILMKCKKDSDCLAECVCLEHG-YCG--   32
   CSTI-IIB       ---MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS   32
   MRTI-I         ---GICPRILMECKRDSDCLAQCVCKRQG-YCG--   29
   Trypsin        ---RICPRIWMECTRDSDCMAKCICVAG--HCG--   28
   ITRA_MOMCH     ---RSCPRIWMECTRDSDCMAKCICVAG--HCG--   28
   MCTI-A         ---RICPRIWMECKRDSDCMAQCICVDG--HCG--   28
   LCTI-III       ---RICPRILMECSSDSDCLAECICLENG-FCG--   29
                       **:: *.*. * **:  * *     .**

                          Why use MSAs?




    Each column of an MSA characterizes, more precisely than
    a single sequence, the requirements imposed by the structure
    and/or function of the protein at the position considered.
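
As an illustration of what a column reveals, the sketch below lists the strictly conserved columns of the inhibitor alignment shown earlier. The rows are transcribed from that alignment; the function name is hypothetical.

```python
MSA = [
    "----GCPRILMRCKQDSDCLAGCVCGPNG-FCGSP",  # EETI-II
    "----GCPRLLMRCKQDSDCLAGCVCGPNG-FCG--",  # Ii_Mutant
    "---RGCPRILMRCKRDSDCLAGCVCQKNG-YCG--",  # BDTI-II
    "---VGCPRILMKCKTDRDCLTGCTCKRNG-YCG--",  # CMeTI-B
    "HEERVCPRILMKCKKDSDCLAECVCLEHG-YCG--",  # CMTI-IV
    "---MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS",  # CSTI-IIB
    "---GICPRILMECKRDSDCLAQCVCKRQG-YCG--",  # MRTI-I
    "---RICPRIWMECTRDSDCMAKCICVAG--HCG--",  # Trypsin
    "---RSCPRIWMECTRDSDCMAKCICVAG--HCG--",  # ITRA_MOMCH
    "---RICPRIWMECKRDSDCMAQCICVDG--HCG--",  # MCTI-A
    "---RICPRILMECSSDSDCLAECICLENG-FCG--",  # LCTI-III
]


def conserved_columns(rows):
    """Indices (0-based) of columns in which every row shows the same residue."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    return [
        i for i, column in enumerate(zip(*rows))
        if len(set(column)) == 1 and column[0] != "-"
    ]


columns = conserved_columns(MSA)
residues = "".join(MSA[0][i] for i in columns)
print(columns, residues)
```

Six of the conserved positions are cysteines, a pattern that no single sequence would reveal on its own.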




                  Reading an MSA

Strictly conserved residues       loops

                          Is the Quality Sufficient?

   EETI-II        ----GCPRILMRCKQDSDCLAGCVCGPNG-FCGSP   30
   Ii_Mutant      ----GCPRLLMRCKQDSDCLAGCVCGPNG-FCG--   28
   BDTI-II        ---RGCPRILMRCKRDSDCLAGCVCQKNG-YCG--   29
   CMeTI-B        ---VGCPRILMKCKTDRDCLTGCTCKRNG-YCG--   29
   CMTI-IV        HEERVCPRILMKCKKDSDCLAECVCLEHG-YCG--   32
   CSTI-IIB       ---MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS   32
   MRTI-I         ---GICPRILMECKRDSDCLAQCVCKRQG-YCG--   29
   Trypsin        ---RICPRIWMECTRDSDCMAKCICVAG--HCG--   28
   ITRA_MOMCH     ---RSCPRIWMECTRDSDCMAKCICVAG--HCG--   28
   MCTI-A         ---RICPRIWMECKRDSDCMAQCICVDG--HCG--   28
   LCTI-III       ---RICPRILMECSSDSDCLAECICLENG-FCG--   29
                       **:: *.*. * **:  * *     .**

           Methods to Compute an MSA

                          ClustalW

                          T-Coffee


                          Progressive Alignment


1) Compute pair-wise sequence similarities
2) Determine a guide tree
3) Align sequences according to the guide tree
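
Steps 1 and 2 can be illustrated with four toy sequences. The sketch below is a stand-in under stated assumptions: it scores pairs with the `difflib.SequenceMatcher` ratio from the Python standard library instead of a real alignment score, and it only determines which pair a guide tree would join first. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

SEQUENCES = {
    "SEQ1": "GARFIELD THE LAST FAT CAT",
    "SEQ2": "GARFIELD THE FAST CAT",
    "SEQ3": "GARFIELD THE VERY FAST CAT",
    "SEQ4": "THE FAT CAT",
}


def pairwise_similarities(seqs):
    """Step 1: a similarity value for every pair of sequences."""
    return {
        (a, b): SequenceMatcher(None, seqs[a], seqs[b], autojunk=False).ratio()
        for a, b in combinations(seqs, 2)
    }


def closest_pair(similarities):
    """Start of step 2: the most similar pair is joined first."""
    return max(similarities, key=similarities.get)


similarities = pairwise_similarities(SEQUENCES)
first_join = closest_pair(similarities)
print(first_join)
```

A full guide tree would repeat the joining step on the remaining sequences and clusters; real programs derive the similarities from pair-wise alignments.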




                               ClustalW

                           Progressive alignment
                           based on global scores

   What is the concept?

   The more similar the sequences,
   the more reliable is the information.

   A logical consequence:

   Gaps and misalignments introduced during the iterations
   are not corrected later.

                          Once a gap, always a gap

            Classical Progressive Alignment

                             SEQ1 GARFIELD THE LAST FAT CAT

                             SEQ2 GARFIELD THE FAST CAT

                             SEQ3 GARFIELD THE VERY FAST CAT

                             SEQ4 THE FAT CAT


                          SEQ1   GARFIELD   THE   LAST   FA-T   CAT
                          SEQ2   GARFIELD   THE   FAST   CA-T   ---
                          SEQ3   GARFIELD   THE   VERY   FAST   CAT
                          SEQ4   --------   THE   ----   FA-T   CAT


                          Limitations of ClustalW

   Based on a global alignment

        inappropriate for sequences having only
        one/few domain(s) in common

   Improvement:

                               T-Coffee [NoHiHe00]

   Starting with a pair-wise comparison
   based on different methods,
   computation of local similarities, structural correspondences

        These scores steer the iterative alignment.

                              Extended Library

                          SEQk GARFIELD THE LAST FAT        CAT

                          SEQl GARFIELD THE          FAST CAT

           Align all combinations of sequences SEQk and SEQl
                               to create a scoring table.


                      This is used for an alignment of the pairs:



                          SEQk GARFIELD THE LAST FA-T CAT
                          SEQl GARFIELD THE ---- FAST CAT

                                 Guide Tree


                          SEQ1 GARFIELD THE LAST FAT CAT
                          SEQ2 GARFIELD THE FAST CAT
                          SEQ3 GARFIELD THE VERY FAST CAT
                          SEQ4 THE FAT CAT

  ClustalW
                          SEQ1   GARFIELD   THE   LAST   FA-T   CAT
                          SEQ2   GARFIELD   THE   FAST   CA-T   ---
                          SEQ3   GARFIELD   THE   VERY   FAST   CAT
                          SEQ4   --------   THE   ----   FA-T   CAT

                          SEQ1   GARFIELD   THE   LAST   FA-T   CAT
                          SEQ2   GARFIELD   THE   ----   FAST   CAT
                          SEQ3   GARFIELD   THE   VERY   FAST   CAT
                          SEQ4   --------   THE   ----   FA-T   CAT

                          M-Coffee (2006)

                   Meta-Approach
                   Combination of 8 MSA programs
                       Poa
                       Dialign-T
                       ClustalW
                       PCMA
                       FINSI
                       T-Coffee
                       Muscle
                       ProbCons

     Performance of Individual Programs




                                       Column Score
                  Upper line: Addition of individual methods to M-Coffee,
                  Lower line: Performance of individual methods

                              MSA                  Profiles


                               A       T       A       A       G      C
                               A       -       A       T       G      C
                               A       -       A       A       G      C


                              a1     a2     a3     a4     a5     a6
                          A   1.0    0.0    1.0    0.66   0.0    0.0
                          C   0.0    0.0    0.0    0.0    0.0    1.0
                          G   0.0    0.0    0.0    0.0    1.0    0.0
                          T   0.0    0.33   0.0    0.33   0.0    0.0
                          -   0.0    0.66   0.0    0.0    0.0    0.0




                   PHD: Replace query with a profile
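
The profile is simply the relative frequency of each symbol in each alignment column. A minimal sketch that recomputes the toy table above (the function name is hypothetical; the table truncates 2/3 to 0.66):

```python
from collections import Counter


def msa_profile(rows, alphabet):
    """Relative frequency of every symbol of alphabet in every MSA column."""
    n_rows = len(rows)
    profile = {symbol: [] for symbol in alphabet}
    for column in zip(*rows):
        counts = Counter(column)
        for symbol in alphabet:
            profile[symbol].append(counts[symbol] / n_rows)
    return profile


rows = ["ATAAGC", "A-ATGC", "A-AAGC"]  # the three aligned sequences above
profile = msa_profile(rows, "ACGT-")

for symbol, frequencies in profile.items():
    print(symbol, " ".join(f"{value:.2f}" for value in frequencies))
```

For proteins the alphabet has 20 residues plus the gap symbol, so each position of the query is described by a vector of frequencies instead of a single letter.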


                          PHD: Overview


             Software architecture

                          2 interconnected neural networks

             Upstream:

                          Program for the generation of MSAs

             Downstream:

                          Jury decision

                          PHD: Preparatory Stage


   MSA

   BLAST-ing the query against SWISS-PROT

   The resulting sequences are used to create an MSA.

   The MSA is the basis for extracting global and local parameters.

   These data are fed into the
   input layer of the first neural network.

                            Global Parameters

                      Amino acid composition of the query
                      Length of the query
                      For each position the distance to the
                      C- and N-terminus of the protein


   Global parameters are analyzed in both networks.

                          Local Parameters

   Local parameters are determined residue-wise

   Deduced from the MSA

            Frequency distribution of AAs
            The number of insertions and deletions
            A score for conservedness

                          Machine-Learning:
                           Neural Networks



               Machine Learning (Wikipedia)



       Machine learning is a scientific discipline that is concerned
       with the design and development of algorithms
       that allow computers to learn based on data, such as
       from sensors or databases.


       A major focus of machine learning research is to
       automatically learn to recognize complex patterns and
       make intelligent decisions based on data.

       Neural Networks: First Approaches


   McCulloch and Pitts introduced an element
   (the neuron) possessing:

                          several input lines
                          one output line
                          two output states (0, 1)
                          a threshold function



   Problem

   How can one implement machine learning?

                     Hebb's Hypothesis (1949)

            Plasticity of the interconnections:

            Each signal is multiplied with a weight factor wij

            During the learning phase:

            Step-wise modification of the weights

                Perceptron, Rosenblatt, 1958

One example of a basic module:


                          n input lines xi
                          one output line yr

The state yr of the output is computed as:

                          y_r := f\left( \sum_{i=1}^{n} w_{ri} \, x_i \right)

                             Perceptron


                          Output




                               Input

                          Threshold-Function


     f(x) := \begin{cases} 1 & \text{if } x > 0.5 \\ 0 & \text{otherwise} \end{cases}

     f(x) := \frac{1}{1 + e^{-x}}

               Examples: Boolean Functions

                   OR             AND           NOT

         ?                ?   ?         ?   ?         ?
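
One possible set of weights for the three functions, using the step function with threshold 0.5 shown earlier, is sketched below. The weights are an assumption chosen for illustration (many other choices work), and NOT is given an additional constant input of 1 so that it can produce a 1 from a 0.

```python
def threshold(x, theta=0.5):
    """Step function: 1 if x > theta, otherwise 0."""
    return 1 if x > theta else 0


def perceptron(weights, inputs):
    """Weighted sum of the inputs passed through the threshold function."""
    return threshold(sum(w * x for w, x in zip(weights, inputs)))


# Illustrative weights (assumptions, not taken from the slides).
OR_WEIGHTS = (1.0, 1.0)
AND_WEIGHTS = (0.3, 0.3)
NOT_WEIGHTS = (1.0, -1.0)  # first input is a constant 1 (bias), second is x


def logical_or(x1, x2):
    return perceptron(OR_WEIGHTS, (x1, x2))


def logical_and(x1, x2):
    return perceptron(AND_WEIGHTS, (x1, x2))


def logical_not(x):
    return perceptron(NOT_WEIGHTS, (1, x))


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, logical_or(x1, x2), logical_and(x1, x2))
```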




          Architecture of a Neural Network


   A NN consists of interconnected neurons.

   Based on the way of wiring one can distinguish

                   recurrent architectures,
                   containing directed loops

                   feed forward architectures,
                   which do not contain loops

                          Layered Architecture

        Neurons are organized in classes (layers);
        the interconnections are specified class-wise.

Layers are grouped into

        visible layers (input, output layer)

        hidden layers

Applications in computational biology

        Usually layered, feedforward networks

                          Architecture


                          Output

                           Input

                       Learning: Backpropagation

For a set of known protein structures:
Compare the predictions with the real 2D-elements and determine the error E.

If the weights w_ij are chosen in an optimal way,
the error E is minimal. Thus, learning the weights is a
         minimization problem.

Simple approach to minimization: gradient descent.
The backpropagation algorithm determines the local error step by step
and changes each weight against its gradient, with learning rate η:

                    \Delta w_{ij} := -\eta \frac{\partial E}{\partial w_{ij}}
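
The update rule can be illustrated on the smallest possible case, a single sigmoid neuron, where no error has to be propagated through hidden layers. This is a sketch under stated assumptions: the squared error E = 1/2 Σ (y − t)², the training data (the OR function with a constant bias input), the start weights, and η = 0.5 are all illustrative choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def error(weights, samples):
    """Squared error E = 1/2 * sum over samples of (y - t)^2."""
    return 0.5 * sum((output(weights, x) - t) ** 2 for x, t in samples)


def gradient_step(weights, samples, eta=0.5):
    """One update w_j := w_j - eta * dE/dw_j."""
    grad = [0.0] * len(weights)
    for x, t in samples:
        y = output(weights, x)
        delta = (y - t) * y * (1.0 - y)  # dE/d(net input) for a sigmoid
        for j, xj in enumerate(x):
            grad[j] += delta * xj
    return [w - eta * g for w, g in zip(weights, grad)]


# OR function; the third input is a constant 1 acting as bias (assumption).
samples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
weights = [0.0, 0.0, 0.0]
error_before = error(weights, samples)
for _ in range(1000):
    weights = gradient_step(weights, samples)
error_after = error(weights, samples)
print(error_before, error_after)
```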


                        Learning the OR-Function




Panel   w1    w2    x1   x2   yr   true result
1       0.2   0.1   1    1    0    1
2       0.2   0.3   1    0    0    1
3       0.4   0.3   1    0    0    1
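
Error-driven learning of the OR function can be sketched as follows. The start weights 0.2 and 0.1 are those of the first panel and the threshold 0.5 is the one defined earlier; the update rule (perceptron rule), the learning rate 0.2, and the order of the training examples are assumptions, so the intermediate weights need not match the panels exactly.

```python
def predict(weights, x, theta=0.5):
    """Threshold neuron: fires if the weighted sum exceeds theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0


OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def train_or(weights=(0.2, 0.1), eta=0.2, max_epochs=100):
    """Raise or lower the weights of active inputs whenever the result is wrong."""
    w = list(weights)
    for _ in range(max_epochs):
        errors = 0
        for x, target in OR_SAMPLES:
            diff = target - predict(w, x)
            if diff != 0:
                errors += 1
                w = [wi + eta * diff * xi for wi, xi in zip(w, x)]
        if errors == 0:  # a full pass without mistakes: done
            break
    return w


learned = train_or()
print(learned)
print([predict(learned, x) for x, _ in OR_SAMPLES])
```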




                          Back to PHD
                        1. Neural Net:
               Mapping Sequence → 2D-Structure

   Scope:
       Predict the 2D-structure for the central residue
       of a window of length 13.

   Sliding Window Approach
        The input window is moved along the MSA.

   Input
       Local parameters for the 13 residues plus a global parameter.

   Output
       For the central residue, a value representing
       one of three 2D-elements: α-helix, β-strand or loop.
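
The sliding-window idea can be sketched on a single sequence. This is a simplification: PHD moves the window along an MSA and feeds per-column parameters to the network, whereas the sketch below only cuts out the residues. The padding symbol `-` for window positions beyond the sequence ends is an assumption.

```python
def sliding_windows(sequence, width=13, pad="-"):
    """Return one window per residue, centered on that residue.

    width must be odd so that the window has a central position.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


seq = "MADVHDKATRSKNMRAIATRD"  # start of the example sequence from the slides
windows = sliding_windows(seq)
print(windows[0])   # window for the first residue
print(windows[10])  # window for residue 11
```

Each window would then be encoded numerically and presented to the network, which predicts the state of the central residue only.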




    2. Neural Net: 2D-Structure → 2D-Structure


   Input:
   Results generated by the first NN plus global parameters

   Sliding Window
   Window of length 17:
   For each residue one of the symbols (α, β, L),
   in addition a global parameter.

   Output:
   For the central residue, again a predicted 2D-state:
   α-helix, β-strand or loop

   Results are stored in a table MM,
   which is used to determine a mean value.

   The classification with the highest mean value is the output.

                          Architecture




                          Training the NNs


      Training set
      Consists of pairs (sequence, known 2D-structure)

      Originates from 3D-structures of the PDB-database,
      possessing less than 25% sequence identity (130 entries).

      Occurrence of 2D-elements

      32 % of the residues in helices
      21 % in β-sheets
      47 % in loops
                           Performance


   Prediction quality

            On average, the prediction of PHD is correct
            for 72 % of the residues.

   Prerequisite

            A sufficiently large number of similar sequences

            Without an MSA, the prediction quality drops by about 10 %.
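
The quality measure used here, the fraction of residues with a correctly predicted 2D-element, is straightforward to compute (it is commonly called Q3 for three states). A minimal sketch; the function name and the two example strings are made up for illustration and are not PHD output.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# H = helix, E = strand, L = loop (hypothetical strings)
predicted = "HHHHLLEEEL"
observed = "HHHLLLEEEE"
print(per_residue_accuracy(predicted, observed))  # 8 of 10 positions agree
```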




                            Example (PHD-Output)

PHD results (brief)

           ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8...
AA         MADVHDKATRSKNMRAIATRDTAIEKRLASLLTGQGLAFRVQDASLPGRPDFVVDEYRCVIFTHGCFWHHHHCYLFKVPATRT
PHD_sec               HHHHHHHH     HHHHHHHHHHHHHHHHEE             E EEEEEEEEE
Rel_sec      ******** ******* ****************        **********      ****** *      *** ******
P_3_acc      eeeeeeeee eeebebeeeee eeeeebbebbbebbbeb eeeeebeb bebbbee ebbbbbbbbbbbb
Rel_acc                   * * *         *   *   * *                    ***** * **




                    Performance: Comparison
                    with real 2D (CAP-Protein)




                          State of the Art

                               Spine
                           [Dor, Zhou 2007]

 NN-based, window length 21 residues

 additional parameters:
     a steric parameter (graph shape index),
     hydrophobicity,
     volume,
     polarisability,
     isoelectric point,
     helix probability,
     sheet probability

 Result:
     A mean prediction quality of 80% for
     sequences of length 80 to 300 residues.
                          Are there Problems?


   Hard cases:

   Predicting the 2D-structure of membrane proteins

   For that application, special algorithms are needed:

         The statistics for the occurrence of AAs are biased.
                    Applications of
             2D-Predictions / 2D-Structures




                          Classification of proteins (SCOP)

                          Homology modeling




           Homology Modeling /
               Threading
                                Concept



    Comparison of

                 an arbitrary protein sequence and
                 a small number of known structures.

         Goal: predict a 3D-structure


    Criterion for the assessment

                 the environment of individual residues


                     Architecture of 3D-PSSM
                                Summary


            Predicting 2D-structure with acceptable quality is
            possible for most cases

            Servers available

            2D-structure is the basis for protein classification

            Hierarchical categorization implemented

            2D-elements relevant for homology modeling
     Thanks for your attention
                          Consequence

   When using a random selection of training examples
   (unbalanced training), the prediction quality is
   proportional to the frequency of the 2D-elements in
   the training set.


   Correction by means of balanced training

   Select training elements with identical frequency.

   Prediction quality is then similarly high for all three 2D-elements.
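
Balanced training can be sketched as drawing the same number of examples from each 2D-element class. The sketch below draws with replacement, which is one possible way to equalize the frequencies and an assumption here; the slides do not say how PHD selects the elements. The toy data reuse the class frequencies 32 / 21 / 47 given for the training set.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(examples, per_class, seed=0):
    """examples: list of (window, label) pairs; returns per_class items per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    sample = []
    for items in by_label.values():
        # Drawing with replacement (assumption) lets rare classes be reused.
        sample.extend(rng.choices(items, k=per_class))
    rng.shuffle(sample)
    return sample


# Toy training set with hypothetical window names.
examples = (
    [("window_%d" % i, "H") for i in range(32)]
    + [("window_%d" % i, "E") for i in range(32, 53)]
    + [("window_%d" % i, "L") for i in range(53, 100)]
)
sample = balanced_sample(examples, per_class=20)
print(Counter(label for _, label in sample))
```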




                          Approach of PHD


   Combination of 4 networks



   For each of the two types of NNs, one network is trained in balanced
   mode, the other one in unbalanced mode.



   The 2 x 2 combinations generate 4 predictions, which are
   averaged; the 2D-element with the maximal mean is the output.
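
Combining the four predictions can be sketched as averaging the output values per 2D-element and choosing the element with the largest mean. The output values below are invented for illustration; the function name and the state symbols H, E, L are assumptions.

```python
def jury_decision(predictions, states=("H", "E", "L")):
    """predictions: one tuple of output values (one per state) per network."""
    n = len(predictions)
    means = [sum(p[i] for p in predictions) / n for i in range(len(states))]
    best = max(range(len(states)), key=lambda i: means[i])
    return states[best], means


# Hypothetical outputs of the four network combinations for one residue.
four_outputs = [
    (0.6, 0.1, 0.3),
    (0.5, 0.2, 0.3),
    (0.4, 0.1, 0.5),  # this network alone would vote for loop
    (0.7, 0.1, 0.2),
]
state, means = jury_decision(four_outputs)
print(state, means)
```

Averaging lets a single deviating network be outvoted by the other three.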



