Bioinformatics of Disease:
immune epitope prediction

Shoba Ranganathan
Professor and Chair – Bioinformatics
Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute
Macquarie University, Sydney, Australia
(shoba.ranganathan@mq.edu.au)

Adjunct Professor
Dept. of Biochemistry, Yong Loo Lin School of Medicine
National University of Singapore, Singapore
(shoba@bic.nus.edu.sg)

Visiting scientist @
Institute for Infocomm Research (I2R), Singapore
Bioinformatics is …..

 Bioinformatics is the study of living
  systems through computation
Data in Bioinformatics (in the main)
 Sequences, structures, genomes, transcriptomes
 Genetics and populations
 Networks, pathways and systems

and their management and analysis:
 Databases and ontologies; data & text mining
 Algorithms; maths/stats
 Physics/chemistry; evolution and phylogenetics
Overview of my research
                  1.   Genome analysis
                  2.   Transcriptome analysis
                  3.   Protein/Proteome analysis
                  4.   Systems Biology
                  5.   Immunoinformatics
                  6.   Genome-phenome mapping
                  7.   Biodiversity Informatics
5. What is Immunoinformatics?

 Using Bioinformatics to address problems
  in Immunology
    Application of bioinformatics to accelerate immune system research
     has the potential to deliver vaccines and immunotherapeutics.
    Computational systems biology of
     immune response
Immunoinformatics
[Diagram: IMMUNOINFORMATICS at the intersection of immunology (basic and clinical), computer science and biology, drawing on artificial intelligence, databases, algorithms, maths/stats, physics/chemistry, networks, pathways and systems, -omics and cell biology]
Summary
 Introduction
 Structural Immunoinformatic Database development
 Data Analysis
 Computational models
 Applications
[Diagram labels: genetics and populations; networks, pathways and systems; basic and clinical immunology; -omics]
The immune system
 Composed of many interdependent cell types, organs and tissues that
   protect the body from infections (bacterial, parasitic, fungal or viral) and
   arrest abnormal growth and differentiation
 Inappropriate immune responses lead to allergies and autoimmunity
 Second most complex system in the human body
Genomics vs. Immunomics
 Genomics: solving the genome puzzle
   10^4 genes coding for 10^6 products
 Immunomics: understanding immune response
   10^2 to 10^3 genes leading to >10^12 products
Enormous diversity in immunomics has implications for immune function and modulation
It is a numbers game….
 >10^13 MHC class I haplotypes (IMGT-HLA)
 10^7 to 10^15 T cell receptors (Arstila et al., 1999)
 >10^9 combinatorial antibodies (Jerne, 1993)
 10^12 B cell clonotypes (Jerne, 1993)
 10^11 linear epitopes composed of nine amino acids
 >>10^11 conformational epitopes
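The linear-epitope estimate follows from simple combinatorics: 20 standard amino acids at each of nine positions give 20^9 possible nonamers, about 5 × 10^11, i.e. of order 10^11. A quick check:

```python
# Number of possible linear nonamer epitopes built from the 20 standard amino acids.
N_AMINO_ACIDS = 20
EPITOPE_LENGTH = 9

n_nonamers = N_AMINO_ACIDS ** EPITOPE_LENGTH
print(f"{n_nonamers:,} possible nonamers (~{n_nonamers:.1e})")
```

The same expression with a different length shows how quickly the space grows: every extra residue multiplies it by 20.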
T cell mediated adaptive immune
response
 Specific peptide residues critical for stimulating
  cellular immune responses
 Major histocompatibility complex (MHC)
  molecules (Human Leukocyte Antigen or HLA in
  humans) bind and present short antigenic
  peptides to T cell receptors, for inspection
 Antigen presentation is by two classes of MHC
  (class I and class II)
 Those peptides that bind to specific MHC and
  trigger T cell recognition (T cell epitopes) are
  targets for vaccine and immunotherapy
  development
How to generate a T cell-mediated immune response
[Figure: 1. Epitope; 2. MHC; 3. T cell receptor]
Major histocompatibility complex
[Figures: gene structure of the human MHC; 3D structures of human MHC class II and MHC class I]
MHC Class I for endogenous peptides
[Figure by Eric A.J. Reits]
MHC class II for exogenous peptides
[Figure by Eric A.J. Reits]
Antigen processing pathway: peptides, MHC, T cells
1. Degradation of antigen
2. Peptide binding to MHC
3. Recognition of peptide-MHC complex by T cells
Yewdell et al., Annu. Rev. Immunol. (1999)

[Figure: 20% processed → 0.5% bind MHC → 50% CTL response, giving a 0.05% chance of immunogenicity]
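The 0.05% figure is the product of the three stage-wise fractions. The sketch below assumes each percentage applies to the peptides that passed the previous stage (the figure itself does not spell this out):

```python
# Stage-wise fractions from the antigen-processing figure (Yewdell et al., 1999).
# Assumption: each fraction applies to peptides that survived the previous stage.
stages = {
    "processed": 0.20,
    "bind MHC": 0.005,
    "CTL response": 0.50,
}

p_immunogenic = 1.0
for fraction in stages.values():
    p_immunogenic *= fraction  # chain the conditional probabilities

print(f"Overall chance of immunogenicity: {p_immunogenic:.2%}")
```

Multiplying the three stages recovers the 0.0005 (0.05%) quoted on the slide, which is why unguided peptide screening is so inefficient.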
Physico-chemical properties
affect MHC-peptide binding
Epitope prediction “Fishing”
 Computational models can help
 identify T cell epitopes
 Suggest candidate epitopes by in silico
  screening of entire proteins and even
  proteomes with specificity at:
   the allele level
   the supertype level
   disease-implicated alleles alone.
 Minimize the number of wet-lab experiments
 Cut down the lead time involved in epitope
  discovery and vaccine design
 Predicting MHC-binding peptides
                               Tong, Tan and Ranganathan (2007)
                               Briefings in Bioinformatics 8: 96-108
1.   Sequence-based approach
      Pattern recognition techniques
          • binding motif, matrices, ANN, HMM, SVM
      Main limitations:
          • Require large amount of data for training
          • Preclude data with limited sequence conservation
2.   Structure-based approach
      Rigid backbone modeling techniques
      Flexible docking techniques
      Main advantage: large training datasets unnecessary
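As an illustration of the matrix flavor of the sequence-based approach, the sketch below scores every nonamer window of a protein with a position-specific scoring matrix (PSSM) and returns the top candidates. The matrix values, the choice of P2/P9 anchors and the function names are hypothetical placeholders, not a published allele matrix or the authors' method.

```python
from typing import Dict, List, Tuple

# Hypothetical PSSM for a nonamer binder: one dict per peptide position (P1..P9);
# residues not listed score 0.0. Values are illustrative only (anchors at P2 and P9).
PSSM: List[Dict[str, float]] = [
    {},                                   # P1
    {"L": 2.0, "M": 1.5, "I": 1.0},       # P2 anchor
    {}, {}, {}, {}, {}, {},               # P3-P8
    {"V": 2.0, "L": 1.5, "I": 1.0},       # P9 anchor
]
WINDOW = len(PSSM)  # 9 positions


def score_window(peptide: str) -> float:
    """Sum the per-position matrix scores of a single nonamer."""
    return sum(pos.get(aa, 0.0) for pos, aa in zip(PSSM, peptide))


def scan_protein(sequence: str, top_n: int = 3) -> List[Tuple[int, str, float]]:
    """Score every window of length WINDOW; return (start, peptide, score), best first."""
    hits = [
        (i, sequence[i:i + WINDOW], score_window(sequence[i:i + WINDOW]))
        for i in range(len(sequence) - WINDOW + 1)
    ]
    return sorted(hits, key=lambda h: h[2], reverse=True)[:top_n]


print(scan_protein("MGILGFVFTLTV"))
```

The weakness noted above is visible here: the matrix is only as good as the binder data used to fill it, which is exactly what is missing for poorly studied alleles.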
Our aim: Structure-based prediction
      of MHC-binding peptides
Why structure?
 Great potential to:
   generate biologically meaningful data for analysis
   predict candidate peptides for alleles that have not
    been widely studied, where sequence-based
    approaches fail or are not attempted
   predict binding affinity of peptides
   predict non-contiguous epitopes

 Structure determination through experimental methods is
  both expensive and time-consuming
 Has not been extensively studied due to high
  computational costs and development complexity
Existing Structure-based
  Prediction Techniques
   Protein Threading [Altuvia et al. 1995; Schueler-Furman
    et al. 2000]
   Homology Modeling [Michielin et al. 2000]
   Rigid/Flexible Docking [Rosenfeld et al. 1993;
    Sezerman et al. 1996; Rognan et al. 1999; Desmet et
    al. 2000; Michielin et al. 2003]
Hypothesis for epitope selection

 Peptides bound to MHC alleles are similar to
  substrates bound to enzymes
 “Lock-and-key” mechanism for peptide
  selection
   Shape
   Size
   Electrostatic characteristics
 Introduction
 Structural Immunoinformatic Database development
 Data Analysis
 Computational models
 Applications
[Diagram labels: databases, ontologies; sequences; structures; basic immunology; genetics and populations]
MPID: MHC-Peptide Interaction Database
Govindarajan et al. (2003) Bioinformatics, 19: 309-310
Relational database (RDB) of 82 curated pMHC complexes (class I: 64; class II: 18)
[Bar chart: distribution of entries by MHC allele specificity (y-axis 0 to 25). Alleles: A*0201, A*6801, B*0801, B*2705, B*3501, B*5101, B*5301, DQ8, DR1, DR2, DR3, DR4, H2-Db, H2-Dd, H2-Kb, H2-Ld, HLA-Cw3, HLA-Cw4, I-Ad, I-Ak, RT1.Aa]
Peptide/MHC interaction characteristics
 Peptide length
 Interacting residues
 Intermolecular hydrogen bonds
 Interface area
 Gap volume
 Gap index:

$$\text{Gap index} = \frac{\text{Gap volume}}{\text{Interface area}}$$
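The gap index is a plain ratio of two interface descriptors, so with volume in Å³ and area in Å² it has units of Å. A minimal sketch, with a hypothetical helper name, applied to the mean HLA-A2 values from the class I superfamily table (a ratio of means, which need not equal the mean of per-complex ratios reported there):

```python
def gap_index(gap_volume_a3: float, interface_area_a2: float) -> float:
    """Gap index = gap volume (Å^3) / interface area (Å^2); result is in Å."""
    if interface_area_a2 <= 0:
        raise ValueError("interface area must be positive")
    return gap_volume_a3 / interface_area_a2


# Mean HLA-A2 superfamily values: gap volume 799.8 Å^3, interface area 846.3 Å^2.
print(round(gap_index(799.8, 846.3), 3))
```

A larger gap index means proportionally more empty space per unit of contact surface, i.e. a looser fit between peptide and groove.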
MPID-T: MHC-Peptide-T Cell Receptor Interaction Database
Tong et al. (2006) Applied Bioinformatics, 5: 111-114
[Bar chart: distribution of MHC entries by allele (y-axis 0 to 40). Alleles: A*0101, A*0201, A*6801, B*1501, B*3501, B*0801, B*2705, B*2709, B*4402, B*4403, B*4405, B*5101, B*5301, Cw*0304, Cw*0401, E*0103, E*0101, G*0101, DRB1*0101, DRB1*0301, DRB1*1501, DRB5*0101, DQB1*0302, DQB1*0602, DRB1*0401, DQB1*0201, RT1.Aa, RT1-A1C, H2-Db, H2-Dd, H2-Kb, H2-Ld, H2-M3, H2-Qa-2, I-Ak, I-Ab, I-Ad, I-Au, I-Ek, I-Ag7]
                                   101 new entries
                                   187 curated entries (Human: 110; Murine: 74; Rat: 3)
                                   134 non-redundant entries (class I: 100; class II: 34)
                                   121 class I and 41 class II entries
                                   40 alleles: 26 HLA (class I: 18; class II: 8) and 14 rodent (class I: 8; class II: 6)
                                   16 TCR/peptide/MHC complexes
Peptide/MHC binding motifs
[Figure: peptide residues color-coded as polar, amide, basic, acidic or hydrophobic]
 Conserved peptide properties in solution structures
 Classified according to
    Alleles
    Peptide length
How to obtain structures of experimentally unsolved alleles?

1. As of 2006, crystal structures were available for only 36 unique MHC
   alleles, versus 1765 unique MHC alleles identified in the IMGT/HLA database
2. Structure determination through experimental methods is both expensive
   and time-consuming
3. Homology model building for alleles with no structural data!
 Introduction
 Structural Immunoinformatic Database development
 Data Analysis of pMHC Class I complexes
 Computational models
 Applications
[Diagram labels: structures; data & text mining; maths/stats]
MHC Class I superfamilies have different interaction characteristics
   Single linkage cluster analysis of 68 pMHC Class I complexes from 13 alleles (all available A and B)

Superfamily              HLA-A2              HLA-B7              HLA-B27
                         (36 entries)        (12 entries)        (18 entries)
Interface area (Å²)      846.3 ± 48.9        876.7 ± 72.4        934.0 ± 136.0
Gap volume (Å³)          799.8 ± 195.2       870.2 ± 198.0       985.1 ± 101.5
Gap index                0.9 ± 0.2           1.0 ± 0.1           1.0 ± 0.3
Hydrogen bonds           11.1 ± 1.9          14.3 ± 2.3          17.9 ± 2.8
Hydrogen-bond location   Concentrated at     Well distributed    Concentrated at
                         pockets A, B, F                         pockets A, B, F
Data
 68 peptide–HLA complexes spanning 13 class I alleles from MPID-T
Hierarchical clustering
 Hierarchical clustering using the agglomerative algorithm.
 Distance between structures computed by the single-linkage method (MATLAB version 7.0), based on the separation between each pair of data points.
 Nearest neighbors merged into clusters.
 Smaller clusters then merged into larger clusters based on inter-cluster distances, until all structures are combined.
 Last 3 levels considered for defining HLA class I supertypes.
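The analysis above used MATLAB 7.0. As a language-neutral illustration of the same agglomerative idea, here is a small pure-Python sketch that repeatedly merges the two clusters whose closest members are nearest (single linkage) until a chosen number of clusters remains. The function names and descriptor vectors are hypothetical; in practice each descriptor should be standardized first, since interface area and hydrogen-bond counts sit on very different scales.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the pair of
    clusters with the smallest nearest-neighbor distance until n_clusters remain.
    Returns clusters as sorted lists of point indices.
    """
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: inter-cluster distance = closest pair of members.
            d = min(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical standardized descriptors (e.g. interface area, gap volume, H-bonds).
complexes = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [5.0, 5.1, 4.9], [5.2, 4.9, 5.0]]
print(single_linkage(complexes, 2))
```

Stopping at a chosen number of clusters mirrors reading off the top levels of the dendrogram, as was done to define the supertypes.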
Interaction parameters
 Significant for the characterization of the peptide/MHC interface:
     Intermolecular hydrogen bonds
     pMHC interface area
     Gap volume
     Gap index
 Binding characteristics of HLA supertypes analyzed
Do the Class I alleles aggregate into “superfamilies” using receptor-ligand interaction patterns?
[Figure: clustering of class I complexes; legend: B27, B44, B7, B62, B8]
MHC Class I superfamilies from receptor-ligand interactions
Tong, Tan and Ranganathan (2007) Bioinformatics, 23: 177-183
 80 HLA class I complexes
 13 class I alleles
 Five descriptors
 Hierarchical clustering using nearest neighbor algorithm
 77% consensus with data from other groups
 Supertype definition: receptor structure, ligand binding motifs, or receptor-ligand interaction patterns
[Figure legend: B27, B44, B7, B62, B8]
 Introduction
 Structural Immunoinformatic Database development
 Data Analysis
 Computational models
 Applications
[Diagram labels: sequences; structures; physics/chemistry; maths/stats]
Two-step approach to predict
MHC-binding peptides
1. Finding the best fit conformation (docking) of
   peptides within the MHC binding groove
2. Screening potential binders from the
   background
Docking is a computationally intensive procedure
 Large number of possible peptide conformations
    3 global translational degrees of freedom
    3 global rotational degrees of freedom
    1 conformational degree of freedom for each rotatable bond
>10^10 possible conformations for a 10-residue peptide
[Figure: peptide backbone atoms (N, Cα, C, O) and side chain R in an x, y, z coordinate frame]
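The combinatorial explosion can be illustrated with a crude count. The sketch assumes, purely for illustration, that only the backbone φ/ψ torsions of each residue rotate and that each is sampled at three discrete values; side-chain torsions and any discretization of the six rigid-body degrees of freedom would multiply the total further.

```python
RIGID_BODY_DOF = 6  # 3 translational + 3 rotational


def degrees_of_freedom(n_rotatable_bonds: int) -> int:
    """Total degrees of freedom: rigid-body placement plus one per rotatable bond."""
    return RIGID_BODY_DOF + n_rotatable_bonds


def torsional_conformations(n_rotatable_bonds: int, states_per_bond: int) -> int:
    """Number of distinct torsion combinations on a discrete grid."""
    return states_per_bond ** n_rotatable_bonds


# Illustrative assumption: 10-residue peptide, only phi/psi rotatable (2 per residue).
n_bonds = 2 * 10
print(degrees_of_freedom(n_bonds))                   # rigid-body + torsional DOF
print(f"{torsional_conformations(n_bonds, 3):.2e}")  # backbone-only grid size
```

Even this backbone-only grid gives about 3.5 × 10^9 combinations; adding side-chain rotamers or finer angular sampling easily pushes past 10^10, which is why the search space has to be pruned rather than enumerated.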
Conservation of nonamer peptide backbone conformation
                   Class I peptides
                      N-terminal residues: 0.02 – 0.29 Å
                      C-terminal residues: 0.00 – 0.25 Å

                   Class II binding registers
                      Only 9 residues fit in the binding groove
                      N-terminal residues: 0.01 – 0.22 Å
                      C-terminal residues: 0.02 – 0.27 Å
Rapid docking of peptide to MHC
Tong, Tan & Ranganathan (2004) Protein Sci. 13: 2523-2532

1. Anchoring root fragments to reduce search space
   (pseudo-Brownian rigid-body docking)
2. Loop modeling (loop closure of the central backbone by
   satisfaction of spatial restraints)
3. Ligand backbone and side-chain refinement (entire
   backbone and interacting side chains)
Benchmarking with existing techniques
Author             Technique                          Peptide       RMSD^a (Å)   RMSD^b (Å)
Rognan et al.      Simulated annealing                TLTSCNTSV     1.04         0.46
                                                      FLPSDFFPSV    1.59         1.10
                                                      GILGFVFTL     0.46         0.32
                                                      ILKEPVHGV     0.87         0.87
                                                      LLFGYPVYV     0.78         0.33
Desmet et al.      Combinatorial buildup algorithm    RGYVYQGL      0.56         0.32
                                                      FAPGNYPAL     2.70         0.40
Rosenfeld et al.   Multiple copy algorithm            GILGFVFTL     1.40         0.32
                                                      LLFGYPVYV     1.40         0.33
                                                      ILKEPVHGV     1.30         0.87
Sezerman et al.    Combinatorial buildup algorithm    GILGFVFTL     1.60         0.32
                                                      TLTSCNTSV     2.20         0.46
^a RMSD of peptide backbone obtained from the respective authors. ^b RMSD of peptide backbone
obtained in our work from redocking bound complexes and single template, respectively.
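Both RMSD columns compare backbone atom positions of a predicted peptide pose with the crystal pose. A minimal sketch, assuming both poses are already in the same MHC reference frame (so no extra superposition step) and that atoms are matched one-to-one; the coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: List[Coord], reference: List[Coord]) -> float:
    """Root-mean-square deviation between matched atoms (same order, same frame), in Å."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 + (p[2] - r[2]) ** 2
        for p, r in zip(predicted, reference)
    )
    return math.sqrt(total / len(predicted))


# Hypothetical backbone atoms (N, CA, C) of one residue, each shifted by 0.5 Å along x.
ref = [(0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0)]
pred = [(x + 0.5, y, z) for x, y, z in ref]
print(round(rmsd(pred, ref), 3))
```

A uniform shift of every atom by 0.5 Å yields an RMSD of 0.5 Å, a useful sanity check when wiring this into a benchmark.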
Quantitative separation of binders from
non-binders: empirical free energy scoring
function
Gbind = αGH + βGS + GEL + C
   Gbind = binding free energy
   GH = hydrophobic term
   GS = decrease in side chain entropy
   GEL = electrostatic term
   C       = entropy change in system due to external factors
   α, β, γ optimized by least-square multivariate regression
    with experimental binding affinities (IC50) of MHC-peptides
    in training dataset (Rognan et al., 1999)

    Gbind ≈ -RT ln (IC50) (Rognan et al., 1999).
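To make the scoring step concrete, here is a minimal sketch assuming the three energy terms have already been computed for each peptide. It converts IC50 (nM) to an approximate free energy by treating IC50 as a dissociation constant, ΔG ≈ RT ln(IC50 in mol/L), at an assumed 298.15 K, and fits α, β, γ and C by ordinary least squares via the normal equations. The function names and the synthetic training data are hypothetical; this is not the authors' or Rognan et al.'s code.

```python
import math
from typing import List, Sequence

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def dg_from_ic50(ic50_nm: float, temperature_k: float = 298.15) -> float:
    """Approximate binding free energy (kJ/mol) from IC50, treated as a dissociation constant."""
    return R_KJ * temperature_k * math.log(ic50_nm * 1e-9)  # nM -> mol/L


def fit_linear(features: List[Sequence[float]], targets: List[float]) -> List[float]:
    """Least-squares fit of targets ~ w1*x1 + ... + wk*xk + C via the normal equations.

    Returns [w1, ..., wk, C], solved by Gaussian elimination with partial pivoting.
    """
    rows = [list(x) + [1.0] for x in features]  # append intercept column for C
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]  # X^T X
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]        # X^T y
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= factor * a[col][j]
            b[k] -= factor * b[col]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef


# Synthetic check: (dG_H, dG_S, dG_EL) terms generated from known hypothetical weights.
true_coef = [0.5, 1.2, 0.8, -3.0]  # alpha, beta, gamma, C
terms = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0), (0, 2, 3)]
targets = [sum(w * x for w, x in zip(true_coef, t + (1,))) for t in terms]
print([round(c, 3) for c in fit_linear(terms, targets)])
print(round(dg_from_ic50(20.0), 1))
```

For an IC50 of 20 nM the conversion gives roughly -44 kJ/mol, the same sign and magnitude as the binding energies in the epitope table. With real data, the fitted weights would be checked on held-out peptides rather than on the training set.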
Test case: MHC Class II DQ8

 DQ3.2β (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases:
    Celiac disease
    Insulin-dependent diabetes mellitus (IDDM)
    IDDM-associated periodontal disease
    Autoimmune polyendocrine syndrome type II
Data used
 Structure: 1JK8 (DQ3.2β–insulin B9-23 complex)
 Dataset I: 127 peptides with experimentally determined IC50 values, derived from biochemical studies:
    70 high-affinity binders (IC50 < 500 nM)
    13 medium-affinity binders (500 nM < IC50 < 1500 nM)
    23 low-affinity binders (1500 nM < IC50 < 5000 nM)
    21 non-binders (IC50 > 5000 nM)
    87 with known binding registers
 Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, 7 of which elicit DQ3.2β-restricted T-cell proliferation.
Scoring: Training & testing datasets
 Training
    56 binding conformations with known registers
    30 non-binding conformations from 3 non-
     binders
 Testing
    Test set 1 – 68 peptides from biochemical
     studies
       16 strong; 13 medium; 21 weak; 18 non-binders
    Test set 2 – 12 peptides from functional studies
       7 elicit T-cell proliferation
Screening class II binding register: a sliding window approach
E285B 112-126 peptide: YQTIEENIKIFEEDA

Core sequence   Binding energy (kJ/mol)
YQTIEENIK       -23.12
QTIEENIKI       -21.34
TIEENIKIF       -25.32
IEENIKIFE       -29.53
EENIKIFEE       -32.27
ENIKIFEED       -21.72
NIKIFEEDA       -22.95
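The register scan amounts to generating every nine-residue window of the peptide and keeping the most favorable (lowest) energy. In the sketch below, the per-window energies are the values listed above rather than computed by docking, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

CORE_LENGTH = 9  # the class II groove accommodates a nine-residue core


def core_windows(peptide: str, length: int = CORE_LENGTH) -> List[str]:
    """All contiguous candidate binding cores (registers) of the given length."""
    return [peptide[i:i + length] for i in range(len(peptide) - length + 1)]


def best_register(energies: Dict[str, float]) -> Tuple[str, float]:
    """Pick the core with the most favorable (lowest) predicted binding energy."""
    core = min(energies, key=energies.get)
    return core, energies[core]


peptide = "YQTIEENIKIFEEDA"  # E285B 112-126
# Binding energies (kJ/mol) per register, in window order, as listed on the slide.
scores = dict(zip(core_windows(peptide),
                  [-23.12, -21.34, -25.32, -29.53, -32.27, -21.72, -22.95]))
print(best_register(scores))
```

A 15-mer yields 15 - 9 + 1 = 7 candidate registers; in a real pipeline each window would be docked and scored before the minimum is taken.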
4-step protocol used
Docking:
 A. Anchoring root fragments (probes) to reduce search space
 B. Loop modeling
 C. Refinement of binding register
 D. Extension of flanking residues for MHC Class II
Accuracy estimates
 Sensitivity (SE): fraction of binders correctly predicted
   SE = TP/AP = TP/(TP + FN)
 Specificity (SP): fraction of non-binders correctly predicted
   SP = TN/AN = TN/(TN + FP)
 Area under the ROC (receiver operating characteristic) curve, AROC:
   >0.90 excellent
   >0.80 good
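SE and SP follow directly from confusion-matrix counts, and AROC can be computed without drawing the curve, as the probability that a randomly chosen binder scores better than a randomly chosen non-binder (the Mann-Whitney formulation). A minimal sketch, assuming lower (more negative) binding energy means a stronger predicted binder; the energies are made up.

```python
from typing import List


def sensitivity(tp: int, fn: int) -> float:
    """SE = TP / (TP + FN): fraction of actual binders predicted as binders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP): fraction of actual non-binders predicted as non-binders."""
    return tn / (tn + fp)


def roc_auc(binder_energies: List[float], nonbinder_energies: List[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    A binder 'wins' a pair when its energy is lower than the non-binder's;
    ties count as half a win.
    """
    wins = 0.0
    for b in binder_energies:
        for n in nonbinder_energies:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_energies) * len(nonbinder_energies))


# Hypothetical energies (kJ/mol), for illustration only.
print(roc_auc([-35.0, -31.0, -29.0], [-30.0, -25.0]))
```

The pairwise loop is O(n·m), which is fine for test sets of tens of peptides; for large sets a rank-based version avoids the nested loop.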
Results for Training set
Specificity (SP) level       Group   Sensitivity (SE)   Binding energy threshold (kJ/mol)
SP = 0.80                    LMH     0.90               -28.70
(high SE; good for           MH      0.85               -29.10
 most predictions)           H       0.75               -30.82
SP = 0.90                    LMH     0.84               -29.10
                             MH      0.77               -30.50
                             H       0.75               -32.74
SP = 0.95                    LMH     0.81               -29.93
(very few FPs, but also      MH      0.73               -32.12
 fewer predictions)          H       0.63               -33.59
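Each row above pairs a target specificity with an energy cut-off below which a peptide is called a binder. One simple way to derive such a cut-off is sketched below: sort the non-binder energies and allow at most as many false positives as the target SP permits. This is an assumed procedure for illustration, not necessarily the one used to produce the table, and the function names and energies are hypothetical.

```python
import math
from typing import List, Tuple


def threshold_for_specificity(nonbinder_energies: List[float], target_sp: float) -> float:
    """Cut-off such that calling 'binder' when energy < cut-off keeps SP >= target_sp."""
    ordered = sorted(nonbinder_energies)
    n = len(ordered)
    # Largest number of false positives compatible with the target specificity
    # (the small epsilon guards against floating-point error in n * (1 - target_sp)).
    max_fp = math.floor(n * (1.0 - target_sp) + 1e-9)
    if max_fp >= n:
        return math.inf  # every peptide may be called a binder
    return ordered[max_fp]


def evaluate(binders: List[float], nonbinders: List[float], cutoff: float) -> Tuple[float, float]:
    """Return (SE, SP) when energy < cutoff is predicted as a binder."""
    se = sum(e < cutoff for e in binders) / len(binders)
    sp = sum(e >= cutoff for e in nonbinders) / len(nonbinders)
    return se, sp


# Hypothetical binding energies (kJ/mol).
nonbinders = [-31.0, -28.0, -27.0, -26.0, -25.0, -24.0, -23.0, -22.0, -21.0, -20.0]
binders = [-35.0, -33.0, -30.0, -29.0, -27.0]
cut = threshold_for_specificity(nonbinders, 0.90)
print(cut, evaluate(binders, nonbinders, cut))
```

Raising the target SP pushes the cut-off to more negative energies, trading sensitivity for fewer false positives, the same trend visible down each group in the table.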
Screening class II binding register:
HLA-DQ8 prediction accuracy for Test Set I
 Group           LMH              MH     H
 AROC            0.88             0.93   0.93

 Classification of binding peptides
   High-affinity binders (H): IC50 ≤ 500 nM
   Medium-affinity binders (M): 500 nM < IC50 ≤ 1500 nM
   Low-affinity binders (L): 1500 nM < IC50 ≤ 5000 nM
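The affinity classes map directly onto IC50 ranges. A small helper with a hypothetical name, using the ≤ boundaries given above (the Data used slide quotes the same thresholds with strict inequalities):

```python
def affinity_class(ic50_nm: float) -> str:
    """Map an IC50 (nM) to the affinity classes used on these slides."""
    if ic50_nm <= 500:
        return "H"  # high-affinity binder
    if ic50_nm <= 1500:
        return "M"  # medium-affinity binder
    if ic50_nm <= 5000:
        return "L"  # low-affinity binder
    return "non-binder"


# 20 and 1156 nM appear in the Test Set 1 table; 4000 and 8000 nM are illustrative.
for ic50 in (20, 1156, 4000, 8000):
    print(ic50, affinity_class(ic50))
```

Cumulative groups such as "MH" or "LMH" then simply mean the set of peptides whose class falls in that subset.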
Test Set 1: improved detection of binders lacking position-specific binding motifs
[Figure: DQ8 binding motif, residues at peptide positions 1, 4, 6, 7 and 9]

Ligand/epitope           Source                 BE (kJ/mol)   IC50 (nM)
LQLQPFPQPQPFPPL          A-gliadin 56-70        -41.01        20
DMTPADALDDFDL            HSV                    -40.53        173
AAAAAVAAEAY              Artificial sequence    -39.98        48
GVAGLLVALAV              IA-2 499-509           -36.16        95
DSNIMNSINNVMDEIDFFEK     Pf ABRA 487–506        -36.01        171
FESTGNLIAPEYGFKISY       HA 255–271Y            -35.70        62
YPFIEQEGPEFFDQE          MHC Ia 51–63 analog    -35.34        1156
LLDILDTAGLEEYSAMRD       p21 51–66; C out       -35.27        202
QPYPQPQPFPSQQPY          A-gliadin 41-55        -35.26        1120
FPSQQPYLQLQPFPQ          A-gliadin 49-63        -33.93        20
CDGERPTLAFLQDVM          GAD 101–115            -33.57        69
SFPPQQPYPQPQPQY          A-gliadin 77-91        -33.35        370
SQDLELSWNLNGLQADLSS      FceR 104–122           -32.89        123
EPRAPWIEQEGPEYW          MHC Ia 46-63           -32.89        519
PPLYATGRLSQAQLMPSPPM     VP16                   -32.59        538
SQDLELSWNLNGLQAY         FceR 104–122 analog    -32.49        118
IARAKMFPAVAEK            34P3A                  -31.91        541
Binding registers
20/23 (87%) binding registers
Only register (aa 4-12) from Test Set 2
  (Der p 2: 1-20)
(SE=0.80; SP(LMH)=0.90)
T-cell proliferation
 Top 5 predictions are experimental
  positives at very stringent threshold criteria
  (SP=0.95; SE(H)=0.63)
Multiple registers (SP=0.95, SE(LMH)=0.81): 58% of Test Set 1
Figure: number of peptides (y-axis, 0-14) against number of binding registers (x-axis, 1-7) for weak, medium and strong binders
Mainly for medium and high binders
Experimental support: Sinha et al. for DRB1*0402
Is this why binding motifs are unsuccessful?
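A peptide has multiple binding registers when more than one of its 9-residue cores is predicted to bind. The sketch below counts them. The scoring function is a hypothetical placeholder for the docking-based binding energy used in the talk, and the threshold is arbitrary.

```python
def binding_registers(peptide, score_core, threshold, core_len=9):
    """List every 9-residue core (binding register) in `peptide` whose predicted
    energy is at or below `threshold`, as (1-based start, core, energy)."""
    hits = []
    for start in range(len(peptide) - core_len + 1):
        core = peptide[start:start + core_len]
        energy = score_core(core)
        if energy <= threshold:
            hits.append((start + 1, core, energy))
    return hits


def toy_score(core):
    """Hypothetical stand-in for a docking energy: -4 kJ/mol per hydrophobic
    residue. For illustration only, not the model described here."""
    return -4.0 * sum(core.count(aa) for aa in "LIVFM")


peptide = "LQLQPFPQPQPFPPL"  # A-gliadin 56-70 from Test Set 1
print(len(binding_registers(peptide, toy_score, -8.0)))  # 7
print(binding_registers(peptide, toy_score, -10.0))      # [(1, 'LQLQPFPQP', -12.0)]
```

Counting the registers that pass the threshold for each peptide gives the per-peptide tallies plotted in a histogram like the one above.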
 Introduction
 Structural Immunoinformatic Database
  development
 Data Analysis
 Computational models developed
 Applications
Pemphigus vulgaris (PV)




  www.aafp.org



                    adam.about.com

                                     http://www.medscape.com
 Autoimmune blistering skin disorder
 Characterized by autoantibodies targeting
  desmoglein-3 (Dsg3)
 Strong association with DR4 and DR6 alleles
Who are the major players in PV?
 DR4 PV-implicated alleles (in Semitic populations)
    DRB1*0401
    DRB1*0402
    DRB1*0404
    DRB1*0406
 DR6 PV-implicated alleles (in Caucasian populations)
    DRB1*1401
    DRB1*1404
    DRB1*1405
    DQB1*0503
What is known about DR4?
DR4 PV implicated alleles (DRB1*0401, *0402, *0404, *0406)
 High sequence conservation
     97.9 – 99.0% identity
     98.4 – 99.5% similarity
 High structural conservation
     Cα RMSD <0.22 Å for all key binding pockets
 7 polymorphic residues within binding cleft
     Pocket 1 (β86)
     Pocket 4 (β70, 71, 74)
     Pocket 6 (β11)
     Pocket 7 (β71)
     Pocket 9 (β37)
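The percent-identity figures and the count of polymorphic binding-cleft residues can both be derived from a multiple sequence alignment. The sketch below assumes the sequences are already aligned ('-' marks a gap). Identity is counted over alignment columns, ignoring columns where both sequences are gapped. Other denominators are also in use. Similarity would additionally need a substitution matrix and is not covered. The toy alignment is invented and is not a real HLA sequence.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences ('-' = gap).
    Columns where both sequences have a gap are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


def polymorphic_positions(aligned, positions):
    """Return the 1-based alignment positions, from `positions`, at which the
    aligned sequences do not all carry the same residue."""
    return [p for p in positions if len({s[p - 1] for s in aligned}) > 1]


# Toy 9-residue alignment, invented for illustration.
alleles = ["ACDEFGHIK", "ACDEYGHIK", "ACNEFGHIR"]
print(round(percent_identity(alleles[0], alleles[1]), 1))  # 88.9
print(polymorphic_positions(alleles, [1, 3, 5, 9]))        # [3, 5, 9]
```

Restricting `positions` to pocket-lining residues (for example β86 for pocket 1) gives a per-pocket polymorphism count.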
What is known about DR6?
DR6 PV implicated alleles
(DRB1*1401, *1404, *1405, DQB1*0503)
 High sequence conservation
    85.8 – 94.1% identity
    83.2 – 97.3% similarity
 High structural conservation
    Cα RMSD <0.22 Å for all key binding pockets
 14 polymorphic residues within binding clefts
    Pocket 1 (β86)
    Pocket 4 (β13, 70, 71, 74, 78)
    Pocket 6 (β11)
    Pocket 7 (β28, 30, 67, 71)
    Pocket 9 (β9, 37, 57, 60)
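The Cα RMSD values quoted for the DR4 and DR6 binding pockets follow the standard formula RMSD = sqrt((1/N) Σ |r_a,i - r_b,i|²) over N paired Cα atoms. The sketch below assumes the two structures have already been optimally superposed, for example with the Kabsch algorithm. It performs no fitting itself. The coordinates are invented.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    Cα coordinates in Å. Assumes the structures are already superposed;
    no fitting is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two toy pocket traces offset by 0.2 Å along z (invented coordinates).
pocket_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
pocket_b = [(x, y, z + 0.2) for x, y, z in pocket_a]
print(round(ca_rmsd(pocket_a, pocket_b), 3))  # 0.2
```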
Clues…
  9 stimulatory Dsg3 peptides tested on PV patients
   possessing DR4 and DR6 PV implicated alleles
    1. Dsg3 96-112 (DR4, DR6)
    2. Dsg3 191-205 (DR4, DR6)
    3. Dsg3 206-220 (DR4, DR6)
    4. Dsg3 252-266 (DR4, DR6)
    5. Dsg3 342-356 (DR4, DR6)
    6. Dsg3 380-394 (DR4, DR6)
    7. Dsg3 763-777 (DR4, DR6)
    8. Dsg3 810-824 (DR4)
    9. Dsg3 963-977 (DR4)
Disease associated alleles vs.
innocent bystanders
DR4 PV
 8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402
 Atomic clashes with all other investigated DR4 subtypes
DR6 PV
 6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503
 Atomic clashes with all other investigated DR6 subtypes
     HLA association in DR6 PV more likely to be at DQ than
      DR locus
     Consistent with experimental work done by Sinha et al.
      (2002, 2005, 2006)
Tong et al. (2006) Immunome Research, 2: 1
Whither sequence motifs (again!)?




   1/9 investigated Dsg3 peptides fits existing binding motifs
   Flanking residues – clashes in fitting binding register
   Register-shift for Peptide V (Dsg3 342-356)
       Detected binding register: Dsg3 346-354
       Binding motifs: Dsg3 347-355 (Veldman et al., 2003);
                       Dsg3 345-353 (Sinha et al., 2006)
Large-scale screening of Dsg3 peptides
Tong et al. (2006) BMC Bioinformatics, 7(Suppl 5): S7
 Docking of 936 15-mer Dsg3 peptides, generated using a
  sliding window of width 15 across the entire Dsg3 glycoprotein
 Each 15-mer (N to C terminus) contains a binding register
  (sliding window of width 9); the remaining residues are
  flanking residues
 Training set: 8 peptides each, with experimental IC50 values
  and known binding registers (5 binders and 3 non-binders)
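The two nested sliding windows can be sketched as follows. By simple arithmetic, a protein of length L yields L - 14 windows of width 15, and each 15-mer offers 15 - 9 + 1 = 7 candidate binding registers. The function names are illustrative, and the example reuses a Test Set 1 peptide rather than Dsg3.

```python
def sliding_windows(seq, width):
    """All contiguous windows of `width` residues as (1-based start, window)."""
    return [(i + 1, seq[i:i + width]) for i in range(len(seq) - width + 1)]


def split_register(peptide, core_start, core_len=9):
    """Split a peptide into (N-terminal flank, binding register, C-terminal flank);
    `core_start` is the 0-based offset of the register within the peptide."""
    if core_start < 0 or core_start + core_len > len(peptide):
        raise ValueError("register does not fit inside the peptide")
    core = peptide[core_start:core_start + core_len]
    return peptide[:core_start], core, peptide[core_start + core_len:]


peptide = "LQLQPFPQPQPFPPL"  # a 15-mer (A-gliadin 56-70, Test Set 1)
print(len(sliding_windows(peptide, 9)))  # 7
print(split_register(peptide, 3))        # ('LQL', 'QPFPQPQPF', 'PPL')
```

Applied to a full protein sequence, `sliding_windows(seq, 15)` generates the 15-mers to dock. `split_register` then separates each candidate register from its flanking residues.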
Figure: Large-scale screening of Dsg3 peptides. Binding energy (y-axis, -40.00 to -15.00) against 15-mer start position (x-axis, panels spanning positions 50-1050) for DRB1*0402 and DQB1*0503, with the extracellular, transmembrane and intracellular regions and the immunoreactive region marked.
Common epitopes possibly responsible
for inducing disease in DR4 & DR6
patients
 Significant cross-reactivity observed between
  DRB1*0402 and DQB1*0503 (AROC=0.93)
   57% of peptides investigated in this study predicted to
    bind to both alleles with high affinity
   90% of known Dsg3 peptides predicted to bind to both
    alleles
   12/20 top predicted DQB1*0503-specific Dsg3 peptides
    from the transmembrane region
   All top predicted DRB1*0402-specific Dsg3 peptides
    from extracellular regions
   Disease initiation implications: DR4 from the
    extracellular domain (ECD); DR6 from the
    transmembrane region (TM)
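Splitting predicted binders into shared and allele-specific sets is a set operation once each peptide has a best predicted energy per allele. The sketch below assumes one common energy cut-off for both alleles. The peptide identifiers and energies are invented.

```python
def cross_reactivity(be_a, be_b, threshold):
    """Compare predicted binders for two alleles.

    be_a and be_b map a peptide identifier to its best predicted binding
    energy for allele A and allele B. A peptide counts as a binder when that
    energy is at or below `threshold`."""
    if be_a.keys() != be_b.keys():
        raise ValueError("both alleles must be scored on the same peptide set")
    a = {p for p, e in be_a.items() if e <= threshold}
    b = {p for p, e in be_b.items() if e <= threshold}
    return {
        "both": sorted(a & b),
        "a_only": sorted(a - b),
        "b_only": sorted(b - a),
        "fraction_both": len(a & b) / len(be_a),
    }


# Hypothetical energies (kJ/mol), for illustration only.
drb1_0402 = {"p1": -36.0, "p2": -33.5, "p3": -28.0, "p4": -35.2}
dqb1_0503 = {"p1": -34.1, "p2": -29.0, "p3": -33.8, "p4": -36.4}
print(cross_reactivity(drb1_0402, dqb1_0503, -32.0))
# {'both': ['p1', 'p4'], 'a_only': ['p2'], 'b_only': ['p3'], 'fraction_both': 0.5}
```

`fraction_both` corresponds to the share of investigated peptides predicted to bind both alleles. The `a_only` and `b_only` sets are the allele-specific peptides that can then be mapped to protein regions.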
Multiple binding registers revisited
 76% (410/539) predicted high-affinity binders to DRB1*0402
  possess > 2 binding registers
 57% (384/673) predicted high-affinity binders to DQB1*0503
  possess > 2 binding registers
 66% (354/539) bind both alleles at different registers
 Similar proportion (70%) detected in known binders to both
  alleles
 Both alleles bind similar peptides via different binding registers
Figure: number of peptides (y-axis, 0-350) against number of binding registers (x-axis, 0-6) for DQB1*0503 and DRB1*0402
What next?

 We have developed a predictive model for
  HLA-C (Cw*0401) with very limited (only
  six) experimental binding values.
 The model yields excellent results for test
  data (AROC=0.93).
 Application to identify immunological
  hot spots in the HIV-1 p24 (Gag) and gp160 (Env)
  proteins yields binding energies
  similar to those for HLA-A and -B.
Conclusions

 Computational models for immunogenic
  epitope prediction can be successfully
  developed, even for alleles with limited
  experimental data.
 While computations can never completely
  replace “wet-lab” experiments, in silico
  predictions can significantly cut down the
  development time of therapeutic vaccines.
1. Genome analysis
Approaches
 EST analysis
 Annotation pipeline using workflow strategies
Outcomes
 Comprehensive annotation at the gene and protein levels
 Novel &/or pathogen-specific genes
 Immune response evasion strategies
Applications
 Parasitic nematodes
 Cancer EST data
2. Transcriptome analysis
Approaches
 Graph formalism for alternative splicing
 Genome-wide analysis
Outcomes
 New mRNA-gDNA alignment method, MGAlign & MGAlignIt
 First splicing graph database, DEDB
 Web server for splicing graphs, ASGS
 Sub-graph elements for alternative splicing
 Multi-species splicing graph database, GraphDB
Applications
 Drosophila genome
 Chicken compared to human and mouse
 Kallikrein variants as markers
3. Protein/Proteome research:
Origin and evolution of structural domains
Approaches
 Intron mapping to domain boundaries
 All eukaryotic proteins analyzed
Outcomes
 New database of protein coding genes, XPro
 Visualization of intronic locations on protein structural
  domains, XDomView
 Analysis tool, Go Module Viewer
Applications
 Domain prediction in EST/genome data
 Effect of splice variants on domains
3. Protein/Proteome research:
Small disulfide-rich proteins
<100 aa per domain; ≥ 2 SS bonds
Approaches
 Multiple structure alignment and hierarchical classification
 Comparative modeling rules
 Sequence, structure and evolutionary analysis of Potato II
  inhibitor family
Outcomes
 New database, DSFD
 Server for model building, SDPMOD
 Understanding of wound-induced protease inhibitor folding
Applications
 Design of protease inhibitors, channel modulators, growth
  regulators
3. Protein/Proteome research:
Protease cleavage site prediction
Approaches
 Detailed structural modeling and docking of signal peptide
  moiety to signal peptidase I
 SVM for caspases
Outcomes
 New databases, SPdb, CasBase
 Server for caspase cleavage prediction, CASVM
 Signal peptide cleavage prediction (under development)
Applications
 Enhanced production of therapeutic and commercial
  heterologous proteins
 Apoptosis initiation
4. Systems Biology
Approaches
 Holistic computational, molecular biology and FRET study to
  locate secretion roadblocks
 EST analysis of host-parasite interactions
Outcomes
 Improved heterologous protein production using filamentous
  fungi
 Understanding of how parasites evade host immune activation
Applications
 Trichoderma reesei as fungal bioreactor
 Parasites that lead to liver cancer (food-borne trematode
  Opisthorchis viverrini) and bladder cancer (Schistosoma
  haematobium)
6. Genome-Phenome mapping
Approaches
 Mutation data for non-laboratory animals
 Mapping to OMIM
 Mapping to structure
Outcomes
 OMIA database, with links to OMIM (courtesy NCBI)
 Mutations linked to severity of disease for
  α-D-mannosidosis
 Predictions of new human disease mutations from known
  mutation sites in cow, cat and guinea pig
Applications
 OMIA-OMIM mapping to structure
 Correlation between genotype and disease phenotype
7. Biodiversity Informatics:
   Customary medicinal plants
Approaches
 Integrating, visualizing and analyzing ethnobotanical,
  phytochemical and pharmacological data on customary
  medicinal plants
 Data from Australian Aboriginal elders and Indian Siddha
  doctors
Outcomes
 CMkb, an integrated knowledgebase
Applications
 Novel antimicrobial, anti-inflammatory and anti-cancer lead
  compounds
Dedications
 Prof. Bernard Pullman
 Mme. Alberte Pullman
 My brother, a CML survivor
Acknowledgements
 Dr. (Victor) J.C. Tong, NUS & I2R, Singapore
 A/Prof. Tin Wee Tan, NUS
 Dr. Animesh Sinha, Weill Medical College of
  Cornell University & Michigan State
  University, USA
 Drs. J. Tom August (JHU) and Vladimir
  Brusic (DFCI) (NIAID-NIH Grant #5 U19
  AI56541 & Contract
  #HHSN266200400085C).
 All of you!