									  Multiple Mapping Method with Multiple Templates (M4T):
optimizing sequence-to-structure alignments and combining
         unique information from multiple templates




                       András Fiser

                       Department of Biochemistry and
                       Seaver Center for Bioinformatics
                       Albert Einstein College of Medicine
                       Bronx, New York, USA
     Comparative protein structure modeling

     START

Template Search                Multiple Templates

Target – Template Alignment    Multiple Mapping Method

Model Building                 Loop, side chain modeling

Model Evaluation               Statistical potential

     END
   Why do we need sequence alignments?


#Sequence vs. sequence:
 Establishing residue equivalencies between two proteins to locate
  conserved/variable regions

#Sequence vs. databases:
  Querying sequence databases

#Sequence vs. structure:
 To generate input alignment for comparative modeling / threading
Ranking of models built on alternative alignments

Example:     Template: 1a6m;   Target: 1spg, chain B;   ~21% sequence identity
Template     VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW   DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D   DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template     GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW   DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D   DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH




Problem: None of the currently available methods produces consistently
superior results in all cases.
Alternative solutions vs. sequence similarity




  Instead of relying on just one alignment method, one should
  combine results of several alternative techniques
           Multiple Mapping Method

• Idea:
   – Improve the accuracy of sequence-to-structure
     alignment by optimally splicing alternative inputs.

• Three components:
   – Sampling
   – Algorithm
   – Scoring function
             MMM scoring function:
increasing the dimensionality of input information
    Template     VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
    Target CLW   DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV
1   Template     KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
    Target CLW   QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----



    Template     VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
    Target A2D   DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV
2   Template     KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
    Target A2D   QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH




Different mappings (1 and 2) identify a different structural environment
for each residue to align.

Assess the “fitness” of each mapping.
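
To make the notion of a mapping concrete, the sketch below converts a gapped pairwise alignment into a table from target residue index to template residue index, and lists the target residues that two alternative alignments place differently. The function names `residue_mapping` and `differing_residues` are illustrative and are not taken from the MMM software; the inputs are a 17-column excerpt (columns 14 to 30) of the CLW and A2D alignments shown on this slide.

```python
def residue_mapping(template_aln, target_aln, gap="-"):
    """Map each target residue index to a template residue index.

    A target residue aligned to a gap is mapped to None.
    Indices are 0-based and count residues only, not alignment columns.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    t_idx = q_idx = 0
    for t_char, q_char in zip(template_aln, target_aln):
        t_is_res = t_char != gap
        q_is_res = q_char != gap
        if q_is_res:
            mapping[q_idx] = t_idx if t_is_res else None
            q_idx += 1
        if t_is_res:
            t_idx += 1
    return mapping


def differing_residues(map_a, map_b):
    """Target residue indices that the two mappings place differently."""
    return [i for i in sorted(map_a) if map_a[i] != map_b[i]]


template = "WAKVEADVAGHGQDILI"                      # excerpt of 1a6m
clw = residue_mapping(template, "WGKIDVGEIGP--QALS")  # ClustalW variant
a2d = residue_mapping(template, "WGKI--DVGEIGPQALS")  # Align2D variant

print(differing_residues(clw, a2d))  # [4, 5, 6, 7, 8, 9, 10]
```

In this excerpt the seven residues DVGEIGP are shifted by two template positions between the two inputs, so each of them sits in a different template environment depending on the mapping.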
      Multiple Mapping Method: Algorithm

   Step 1: Identify variable regions from the consensus alignment of the input set

   Step 2: Select the best-scoring variable segments and combine them
           with the core region of the alignment.




Example:     Template: 1a6m;   Target: 1spg, chain B;   21% sequence identity

Template     VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW   DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D   DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template     GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW   DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D   DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
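
The two steps can be sketched as follows, under stated assumptions: variable regions are taken to be runs of target residues that the input alignments map to different template positions, and the segment score is a placeholder (count of identical aligned residue pairs) standing in for the real MMM scoring function. All function names are hypothetical, and the sequences are a 17-column excerpt of the example above.

```python
def residue_mapping(template_aln, target_aln, gap="-"):
    """Target residue index -> template residue index (None if unaligned)."""
    mapping, t_idx, q_idx = {}, 0, 0
    for t_char, q_char in zip(template_aln, target_aln):
        if q_char != gap:
            mapping[q_idx] = t_idx if t_char != gap else None
            q_idx += 1
        if t_char != gap:
            t_idx += 1
    return mapping


def variable_regions(mappings):
    """Step 1: half-open (start, end) runs of target residues where inputs disagree."""
    n = len(mappings[0])
    regions, start = [], None
    for i in range(n):
        disagree = len({m[i] for m in mappings}) > 1
        if disagree and start is None:
            start = i
        elif not disagree and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, n))
    return regions


def splice(mappings, score_segment):
    """Step 2: keep the core, take the best-scoring alternative per variable region."""
    combined = dict(mappings[0])  # core positions are identical in all inputs
    for start, end in variable_regions(mappings):
        best = max(mappings, key=lambda m: score_segment(m, start, end))
        for i in range(start, end):
            combined[i] = best[i]
    return combined


template_seq = "WAKVEADVAGHGQDILI"  # excerpt of the template, no gaps
target_seq = "WGKIDVGEIGPQALS"      # same excerpt of the target, gaps removed


def identity_score(mapping, start, end):
    """Placeholder score: number of identical aligned residue pairs in the segment."""
    return sum(
        1 for i in range(start, end)
        if mapping[i] is not None and target_seq[i] == template_seq[mapping[i]]
    )


clw = residue_mapping(template_seq, "WGKIDVGEIGP--QALS")
a2d = residue_mapping(template_seq, "WGKI--DVGEIGPQALS")

regions = variable_regions([clw, a2d])
combined = splice([clw, a2d], identity_score)
print(regions)          # [(4, 11)]
print(combined == a2d)  # True
```

Because the inputs agree on the residues flanking each variable region, any alternative chosen inside a region remains consistent with the shared core.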
MMM example using ideal scoring function
[Figure: experimental structure superposed on the ClustalW model (RMSD 2.0 Å)
and the Align2D model (RMSD 2.7 Å). Region labels in the figure:
CLUSTALW 4.6 Å / ALIGN2D 1.1 Å;  CLUSTALW 2.6 Å / ALIGN2D 6.1 Å]


Template     VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target MMM   DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template     GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target MMM   DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----




[Figure: experimental structure superposed on the MMM model, RMSD 1.8 Å]
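
A minimal sketch of the selection, assuming that "ideal scoring function" means an oracle that keeps, for each variable region, the alternative closest to the experimental structure, and assuming that the paired CLUSTALW/ALIGN2D values on this slide are local RMSDs of the two variable regions. The keys `region_1` and `region_2` are placeholder names.

```python
# Local RMSD values (in Å) as labeled on the slide; region names are placeholders.
local_rmsd = {
    "region_1": {"CLUSTALW": 4.6, "ALIGN2D": 1.1},
    "region_2": {"CLUSTALW": 2.6, "ALIGN2D": 6.1},
}


def ideal_choice(rmsd_by_method):
    """Oracle selection: the method with the lowest RMSD to the experiment."""
    return min(rmsd_by_method, key=rmsd_by_method.get)


choices = {region: ideal_choice(values) for region, values in local_rmsd.items()}
print(choices)  # {'region_1': 'ALIGN2D', 'region_2': 'CLUSTALW'}
```

This reading is consistent with the MMM alignment on this slide, which follows the A2D gap placement in the first block and the CLW gap placement in the second.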
Multiple Mapping Method: scoring function (1)
    A composite scoring function to assess the
    compatibility/fit of alternative variable segments in the
    template structural environment.

•   The composite scoring function consists of three mostly
    non-overlapping components.

    1. Environment-specific substitution matrices (FUGUE [1]).

    2. A scoring scheme based on a comparison (PHD vs. DSSP) of
       the secondary structure types (H3P2 [2]).

    3. Statistically derived residue-residue contact energy (Rykunov
       and Fiser [3]).

        [1] Shi et al., J. Mol. Biol. (2001) 310, 243-257
        [2] Rice et al., J. Mol. Biol. (1997) 267, 1026-1038
        [3] Rykunov & Fiser, Proteins (2007) 67, 559-68
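
The slide lists the three components but not how they are combined. The sketch below assumes a plain weighted sum with every term oriented so that higher is better (an energy would be negated first). `ss_agreement` only counts matching secondary-structure states and is a stand-in, not the H3P2 matrix; the environment and contact terms are passed in as precomputed numbers. All names, weights and input strings are hypothetical.

```python
def ss_agreement(predicted_ss, template_ss, pairs):
    """Fraction of aligned (target_idx, template_idx) pairs with the same SS state."""
    if not pairs:
        return 0.0
    hits = sum(1 for q, t in pairs if predicted_ss[q] == template_ss[t])
    return hits / len(pairs)


def composite_score(env_score, ss_score, contact_score, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three component scores (the combination rule is assumed)."""
    w_env, w_ss, w_contact = weights
    return w_env * env_score + w_ss * ss_score + w_contact * contact_score


predicted = "HHHHCCEE"  # toy predicted states for the target segment
observed = "HHHCCCEE"   # toy observed states for the template segment
pairs = [(i, i) for i in range(8)]

ss = ss_agreement(predicted, observed, pairs)
print(ss)  # 0.875
print(composite_score(env_score=2.0, ss_score=ss, contact_score=0.5))  # 3.375
```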
MMM performance on 1400 pairs
MMM performance on 87 pairs, meta-servers

                    ESypred3D
                    Consensus
Sampling vs. Scoring
                     Summary


• The Multiple Mapping Method optimally combines alternative
  alignments obtained from different methods or scoring
  functions.

  On a benchmark dataset of 6635 protein pair structural
  alignments, comparative models built using MMM
  alignments are approximately 0.3 Å and 0.5 Å more
  accurate on average over the whole sequence identity
  spectrum and in the <30% target-template sequence
  identity region, respectively, than the average accuracy
  of models built using the alternative input alignments
  (~3 and ~4 Å).
Optimally combining multiple templates
         Selecting multiple templates
• Template search for the target sequence: by PSI-BLAST.

• Hits are selected if their sequence overlap with the target is more than 60%
  of the actual SCOP domain length, or more than 75% of the PDB chain length
  when no SCOP classification is available.

• Iterative clustering procedure identifies the most suitable templates to
  combine. Templates are selected or discarded according to a hierarchical
  selection procedure that accounts for
   –   sequence identity between templates and target sequence,
   –   sequence identity among templates,
   –   crystal resolution of the templates,
   –   contribution of templates to the target sequence (i.e. if a region is covered by
       several templates or by a single template only).
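
The overlap criterion for accepting a PSI-BLAST hit can be written as a small predicate. This is a sketch of the stated thresholds only; the function and parameter names are hypothetical, and the iterative clustering step is not modeled because the slide does not give its details.

```python
from typing import Optional


def passes_overlap_filter(
    overlap_len: int,
    pdb_chain_len: int,
    scop_domain_len: Optional[int] = None,
) -> bool:
    """Apply the hit-selection thresholds described above.

    overlap_len      number of target residues overlapping the hit
    pdb_chain_len    length of the hit's PDB chain
    scop_domain_len  length of the hit's SCOP domain, or None if unclassified
    """
    if scop_domain_len is not None:
        # more than 60% of the SCOP domain length (integer arithmetic, strict)
        return overlap_len * 100 > 60 * scop_domain_len
    # no SCOP classification: more than 75% of the PDB chain length
    return overlap_len * 100 > 75 * pdb_chain_len


print(passes_overlap_filter(70, 150, scop_domain_len=100))  # True
print(passes_overlap_filter(70, 150))                       # False
```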
Single versus multiple templates
Using a dataset of 765 proteins with known structure, two sets of models were
built: (1) using one template (best E-value hit; light bars) and (2) using
multiple templates (grey bars).
       And…increased coverage
Histogram of the difference in model length. Each query sequence is modeled
using single and multiple templates. The histogram shows the frequency of
(Lm–Ls), where Lm is the length of the model built using multiple templates
and Ls is the length of the model built using a single template.
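
As an illustration of the quantity being plotted, the sketch below tallies Lm–Ls over a set of queries. The lengths are made-up toy values, not data from the study, and the function name is hypothetical.

```python
from collections import Counter


def length_difference_histogram(multi_lengths, single_lengths):
    """Count how often each value of (Lm - Ls) occurs across the queries."""
    if len(multi_lengths) != len(single_lengths):
        raise ValueError("need one Lm and one Ls per query")
    return Counter(lm - ls for lm, ls in zip(multi_lengths, single_lengths))


# Toy model lengths for four hypothetical queries.
lm = [120, 150, 98, 210]  # models built on multiple templates
ls = [120, 130, 98, 190]  # models built on a single template

hist = length_difference_histogram(lm, ls)
print(sorted(hist.items()))  # [(0, 2), (20, 2)]
```

Positive values of Lm–Ls correspond to queries where the extra templates extended the modeled region.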
The x-ray structure, the model with multiple templates and with a single
template are shown in grey, red, and blue, respectively.

The model built with multiple templates agrees much better with the x-ray
structure in two exposed regions, A and B, than the model built using a
single template.
                            Increased Coverage

 The x-ray structure, the model with multiple templates, and the model with a
       single template are shown in grey, red, and blue, respectively.

Adding extra templates yielded a longer model that includes an extra
     beta-turn-beta-turn region (20 amino acids), depicted in ribbon.
                    Acknowledgement

•   Lab members:
     – Dmitrij Rykunov
     – Rotem Rubinstein
     – J. Eduardo Fajardo
     – Carlos J. Madrid-Aliste
     – Veena Venkatagiriyappa
     – Joseph Dybas
     – Mario Pujato



     – Brajesh Rai
     – Narcis Fernandez-Fuentes
     – Elliot Sternberger
http://www.fiserlab.org/servers

								