PASS2: Semi-Automated database of Protein Alignments organized as Structural Superfamilies

					PASS2 Database

Bioinformatics Group,
National Centre for Biological Sciences (NCBS),
Bangalore, India.
                      About NCBS
• The National Centre for Biological Sciences (NCBS),
  Bangalore, is a new centre of the Tata Institute of
  Fundamental Research, Mumbai.
• NCBS has four major research groups:
•   Biochemistry, Biophysics and Bioinformatics
•   Genetics and Development
•   Cellular Organization and Signaling
•   Neurobiology
     About Sowdhamini’s group
   (Computational Approaches to Protein Science)

• This lab began its activities at NCBS in July 1998.
• The lab studies protein structural similarities, which are
  important for predicting the fold and/or function of new
  proteins.
• The group is interested in:
•   developing tools for such predictive exercises;
•   large-scale sequence analysis of proteins identified by
    genome sequencing projects;
•   applying existing methods to specific systems that are
    of interest to collaborative research.
             Development of Tools
• PASS2 - Protein superfamily alignment database. Compilation
  and analysis of distantly related protein structural similarities
  form an important component of the lab’s research activities.
  These similarities are being used to recognize newer members of
  relatively well-understood protein superfamilies.
• DDBASE - DIAL-derived Domain database.
• DSDBASE - Protein disulphide database, useful for searching
  known proteins for sub-structural folds that can accommodate
  a given disulphide connectivity.
             Other Activities
• Genome analysis
• Structure prediction and modelling of the
  transmembrane domain of inositol trisphosphate
  and ryanodine receptors
• Modelling of transforming growth factors (TGF) and
  their interaction with receptors
• Sequence analysis and structure prediction of a
  protein in the bioluminescence superfamily
• Comparative modelling and analysis of the active-site
  region of a non-functional serine protease domain
PASS2: Semi-Automated database
of Protein Alignments organized as
  Structural Superfamilies (ver.2)

• Homologous proteins resemble each other in sequence, 3D
  structure and usually function; they are related through
  evolutionary divergence.
• Divergent evolution has also led to the existence of
  superfamilies with very low sequence identities, but
  similar topologies and often related functions.
• Derived databases are now available that classify protein
  structures deposited in the PDB into homologous families,
  superfamilies and folds.
• We have generated an updated, nearly automated version
  of the original superfamily alignment database
  CAMPASS (Sowdhamini et al., 1998), called PASS2:
  Semi-Automated database of Protein Alignments
  organised as Structural Superfamilies.
• Structure-based alignments of superfamilies confirm that
  identities and/or conservative variation in sequence are
  usually associated with structural determinants (key
  packing relationships or hydrogen bonds) or functional
  requirements, common to the superfamily.
• Therefore, these structure-based alignments provide a firm
  basis for understanding and predicting amino acid
  substitutions in superfamilies and for developing methods
  of “Fold Recognition”.
Structure-based Sequence Alignment
  and Compilation of the Database
• This new version, PASS2, contains all possible alignments
  of protein structures at the superfamily level, in direct
  correspondence with the SCOP database (Murzin et al.).
• For all multi-member superfamilies (MMS), the alignment of
  family members is based on the conservation of structural
  features such as solvent accessibility, hydrogen bonding and
  the presence of secondary structures.
• The superfamilies are chosen from SCOP, and representative
  members with high-resolution structures are extracted using
  a 27% sequence identity cutoff. For NMR structures we
  assigned 3.2 Å as the resolution cutoff.
• The initial equivalences are obtained from the non-gap positions
  of the MALIGN alignment of superfamily members, using JOY with
  the MNYFIT input option.
• We have employed COMPARER (Sali et al., 1990; Sali et
  al., 1992) to obtain the multiple sequence alignment of distantly
  related proteins.
• The final alignments of MMS and SMS (single-member
  superfamilies) are presented in JOY format
  (Mizuguchi et al., 1998) to include structural features.
• Structure-based sequence alignments are checked for
  misalignments using an RMSD validation method.
• If the resulting alignments were not satisfactory, we repeated
  the process with STAMP and then the COMPARER program
  with the simulated annealing option.
• The final sequence alignment of members in each MMS was used
  as the initial equivalence to superpose structures using MNYFIT,
  without updating the equivalences.
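The initial-equivalence step can be illustrated with a short Python sketch. It is not the MALIGN/JOY/MNYFIT pipeline itself; it only shows, assuming gaps are written as '-', how the ungapped columns of a multiple alignment map to residue positions in each member. The function name `nongap_equivalences` and the toy sequences are hypothetical.

```python
def nongap_equivalences(aligned):
    """Return, for every alignment column that contains no gap, the
    1-based residue index of each sequence in that column.

    `aligned` is a list of equal-length strings; '-' marks a gap
    (an assumption of this sketch).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    counters = [0] * len(aligned)  # residues seen so far in each sequence
    equivalences = []
    for column in zip(*aligned):
        # Advance the counter of every sequence that has a residue here.
        for i, ch in enumerate(column):
            if ch != "-":
                counters[i] += 1
        # Keep the column only if no sequence has a gap in it.
        if "-" not in column:
            equivalences.append(tuple(counters))
    return equivalences


# Toy alignment (hypothetical sequences, not PASS2 data)
toy = ["ACD-EF",
       "A-DGEF",
       "ACDGE-"]
print(nongap_equivalences(toy))
```

Each tuple in the result lists residues, one per structure, that could serve as a starting equivalence for a rigid-body superposition.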
        Description of the Database
• The Access Methods page provides many links, as follows:
• SMS-PASS2 Single Member Superfamilies of PASS2
• MMS-PASS2 Multi Member Superfamilies of PASS2
• More information about the database
• Software information
• Browse the list of Entries
• Keyword search
• Align Your sequence with PASS2 members
• PSI-BLAST search against PASS2
• PHI-BLAST search against PASS2
• CAMPASS (old version of MMS)
  Genome Distribution and other
• For the MMS and SMS family members we determined the genome
  distribution across 72 genomes, of which more than 40 are
  completed genomes.
• Finally, we annotated the hits with the JOY program; the
  results are available via the Genome Distribution link.
• Two other main links, ‘Structural-Motifs’ and
  ‘Functional-Motifs’, will be available soon.
• For each entry in MMS and SMS there are links to its
  output data files (atm, seq, hbond, sstruc and psa),
  which are helpful for detailed analysis.
• There are also links to more than 25 other databases and
  analysis tools with information on each MMS/SMS.
Genome Distribution Procedure
   MMS/SMS Entry → Hits extracted → JOY annotation

   Structural Phylogeny
(RMSD of equivalent regions)
   Cytochromes Superfamily
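The quantity named in this title, the RMSD over equivalent regions, can be sketched in a few lines of Python. The sketch assumes the two coordinate lists are already superposed and already restricted to equivalent positions; it performs no fitting and is not the MNYFIT procedure. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) points. The points are assumed to be paired equivalent
    positions from two structures that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates for two equivalent positions
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(a, b))
```

Pairwise values of this kind, computed for every pair of superfamily members, form a distance matrix from which a tree could be built.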
Percentage Identity Matrix of
    Cytochromes Superfamily

                 1bbha-   1cpq--   256bb-
         1bbha-   100.0     23.6     22.6
         1cpq--    23.6    100.0     19.8
         256bb-    22.6     19.8    100.0

       Highest percentage identity: 23.6 (1bbha- vs 1cpq--)
        Lowest percentage identity: 19.8 (1cpq-- vs 256bb-)
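A matrix of this kind can be derived from an alignment with a short Python sketch. The slides do not state which denominator PASS2 uses, so the sketch assumes identity is counted over columns where both sequences have a residue; it will not necessarily reproduce the values above. Case is ignored because JOY uses it to encode solvent accessibility. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical residues over the columns in which both
    aligned sequences have a residue. Columns with a gap ('-') in either
    sequence are skipped (an assumed convention)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    # Compare case-insensitively: in JOY output, case carries
    # solvent accessibility, not residue identity.
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a nested dict of pairwise percentage identities, rounded
    to one decimal place, from a {name: aligned_sequence} mapping."""
    names = list(aligned)
    return {p: {q: round(percent_identity(aligned[p], aligned[q]), 1)
                for q in names}
            for p in names}


# Toy alignment (hypothetical sequences, not PASS2 data)
toy = {"s1": "ACDE-G", "s2": "aCdEFG", "s3": "TC-EFG"}
matrix = identity_matrix(toy)
for name, row in matrix.items():
    print(name, row)
```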
        Alignment Based On Similarities In
   Structural Features (COMPARER Alignment)
                  Cytochromes superfamily

1bbha-( 1 ) aglspeeqIetRqagyefmgwNmgkIkanlegeynaaqVeaAAnvIaaiA
1cpq--( 1 ) adtkevleaReayfkslggSmkaMtgvaka-fdaeaAkveAakLekil
256bb-( 1 )      adledNmetlndnlkvIekAd----naaqVkdALtkMraAA
                    aaaaaaaaaaaaaaaaa           aaaaaaaaaaaa

1bbha-( 51 ) nsgmgalygpgTdk-nvgdvkTrvkpeffqnmedvgkiarefvgaAntLa
1cpq--( 48 ) atdvaplFpagTSstdlpg-qTeakaaIwanmddfgakgkamheaGgaVi
256bb-( 38 ) ld-AqkatPpkle---------dksp-dspeMkdfrhGfdilvgqIddAl

1bbha-( 100 ) evAatgeaeaVktafgdVgaackschekYrak
1cpq--( 97 ) aaAnagdgaaFgaalqkLggtckachddYreed
256bb-( 77 ) klAnegkvkeAqaaAeqLkttrnayhqkyr
              aaa   aaaaaaaaaaa aaaaaaaaaaa
             Key to JOY

Structural feature                     Representation   Example
solvent inaccessible                   UPPER CASE       X
solvent accessible                     lower case       x
alpha helix                            red              x
beta strand                            blue             x
3-10 helix                             maroon           x
hydrogen bond to main chain amide      bold             x
hydrogen bond to main chain carbonyl   underline        x
disulphide bond                        cedilla          ç
positive phi                           italic           x
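In plain text only the case convention of the key survives (colour, bold, underline and italic are lost), so solvent accessibility is the one feature that can still be read back from a JOY line. The Python sketch below lists the solvent-inaccessible (upper case) positions in one line of the alignment shown earlier. The function name `buried_positions` is hypothetical, and positions are counted from the start of the fragment, not from the start of the protein.

```python
def buried_positions(joy_line):
    """Return the 1-based positions (within the given fragment) of
    solvent-inaccessible residues, which JOY writes in upper case.
    Gaps ('-') and whitespace are skipped and do not count as residues."""
    positions = []
    index = 0
    for ch in joy_line:
        if ch == "-" or ch.isspace():
            continue
        index += 1
        if ch.isupper():
            positions.append(index)
    return positions


# Third block of the 1bbha- line from the COMPARER alignment
fragment = "evAatgeaeaVktafgdVgaackschekYrak"
print(buried_positions(fragment))
```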
 Superposed structure of
Cytochromes superfamily
   Statistics of Superfamily Entries
                in PASS2
S.No  Class                                No. of Folds  No. of SF  No. of SMS  No. of C-SMS  No. of MMS  No. of C-MMS
   1  All Alpha Proteins (a)                        128        197         156           151          41            34
   2  All Beta Proteins (b)                          87        158         115           110          43            15
   3  Alpha and Beta Proteins (a/b)                  93        153          92            88          61            35
   4  Alpha and Beta Proteins (a+b)                 168        237         185           174          52            19
   5  Multi Domain Proteins                          25         25          22            20           3             0
   6  Membrane and Cell Surface Proteins             11         17          16            14           1             0
   7  Small Proteins                                 52         72          57            56          15             7
   8  Total                                         564        859         643           613         216           110
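The figures in this table can be cross-checked with a short Python sketch. It verifies that each column sums to the reported total and that, in every class, the number of superfamilies equals SMS plus MMS. That second relation is inferred here from the definitions of single-member and multi-member superfamilies; the slides do not state it explicitly. The variable and function names are hypothetical.

```python
# One tuple per class, in the table's row order:
# (folds, superfamilies, SMS, C-SMS, MMS, C-MMS)
rows = [
    (128, 197, 156, 151, 41, 34),
    (87, 158, 115, 110, 43, 15),
    (93, 153, 92, 88, 61, 35),
    (168, 237, 185, 174, 52, 19),
    (25, 25, 22, 20, 3, 0),
    (11, 17, 16, 14, 1, 0),
    (52, 72, 57, 56, 15, 7),
]
reported_total = (564, 859, 643, 613, 216, 110)


def column_totals(table):
    """Sum each numeric column over all class rows."""
    return tuple(sum(column) for column in zip(*table))


def sf_equals_sms_plus_mms(table):
    """Check that superfamilies = SMS + MMS in every row."""
    return all(sf == sms + mms for _, sf, sms, _, mms, _ in table)


print(column_totals(rows) == reported_total)
print(sf_equals_sms_plus_mms(rows))
```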
              References
• Overington, J.P., Johnson, M.S., Sali, A. and Blundell, T.L. (1990). Tertiary
  structural constraints on protein evolutionary diversity: templates, key
  residues and structure prediction. Proc. Royal Soc. Lond., B241, p.146.
• Sali, A. and Blundell, T.L. (1990). The definition of topological equivalence
  in homologous and analogous structures: a procedure involving a
  comparison of local properties and relationships. J. Mol. Biol., 212, p.403.
• Sowdhamini, R. and Blundell, T.L. (1995). An automatic method involving
  cluster analysis of secondary structure for the identification of domains in
  proteins. Prot. Sci., 4, p.506.
• Sowdhamini, R., Rufino, S.D. and Blundell, T.L. (1996). A database of
  globular protein structural domains: clustering of representative family
  members into similar folds. Folding and Design, 1, p.209.
• Zhu, Z.-Y., Sali, A. and Blundell, T.L. (1992). A variable gap penalty
  function and feature weights for protein 3-D structure comparison. Prot.
  Engng., 5, p.43.
              Acknowledgements
• The Wellcome Trust, UK.
• Dr. R. Sowdhamini, NCBS-TIFR, Bangalore.
• Lab members.
• NCBS-TIFR, Bangalore, India.
                  Good News!
• The PASS2 database paper has been accepted by NAR and
  will appear in the January 2002 NAR Database Supplement.
Thank You!
