National Centre for Biological Sciences
• The National Centre for Biological Sciences (NCBS),
Bangalore is a new centre of the Tata Institute of
Fundamental Research, Mumbai
• There are four major research groups, including:
• Biochemistry, Biophysics and Bioinformatics
• Genetics and Development
• Cellular Organization and Signaling
About Sowdhamini’s group
(Computational Approaches to Protein Science)
• This lab began its activities at NCBS in July 1998.
• This lab is interested in studying protein structural
similarities which are important to the prediction of fold
and/or function of new proteins.
• This group is also interested:
• In the development of tools for such predictive exercises.
• In large-scale sequence analysis of proteins that are
determined by genome sequence projects.
• In applying existing methods to specific systems which are
of interest to collaborative research.
Development of Tools
• Protein superfamily alignment database (PASS2): compilation
and analysis of distantly related protein structural similarities
form an important component of the lab’s research activities.
These similarities are being used to recognize newer members of
relatively well-understood protein superfamilies.
• DDBASE (DIAL-derived Domain database)
• DSDBASE: the Protein Disulphide database is useful in searching
for sub-structural folds in known proteins that can
accommodate a given disulphide connectivity.
• Genome Analysis
• Structure prediction and modelling of the
transmembrane domain of inositol trisphosphate
and ryanodine receptors
• Modelling transforming growth factors (TGF) and
their interaction with receptors
• Sequence analysis and structure prediction of a
protein in bioluminescence superfamily
• Comparative modelling and analysis of active site
region of a non-functional serine protease domain
PASS2: Semi-Automated database
of Protein Alignments organized as
Structural Superfamilies (ver. 2)
• Homologous proteins resemble each other in sequence, 3D
structure and usually function; they are related through
divergent evolution.
• Divergent evolution has also led to the existence of
superfamilies with very low sequence identities, but
similar topologies and often related functions.
• Derived databases are now available that classify protein
structures deposited in the PDB into homologous families,
superfamilies and folds.
• We have generated an updated, nearly automated version
of the original superfamily alignment database,
CAMPASS (Sowdhamini et al., 1998), called PASS2:
Semi-Automated database of Protein Alignments
organised as Structural Superfamilies.
• Structure-based alignments of superfamilies confirm that
identities and/or conservative variation in sequence are
usually associated with structural determinants (key
packing relationships or hydrogen bonds) or functional
requirements, common to the superfamily.
• Therefore, these structure-based alignments provide a firm
basis for understanding and predicting amino acid
substitutions in superfamilies and for developing methods
of “Fold Recognition”.
Structure-based Sequence Alignment
and Compilation of the Database
• This new version, PASS2, contains all possible alignments
of protein structures at the superfamily level, in direct
correspondence with the SCOP database (Murzin et al.,
• For all MMS (Multi Member Superfamilies), the alignment of
family members is based on the conservation of structural
features such as solvent accessibility, hydrogen bonding and
the presence of secondary structures.
• The superfamilies are chosen from SCOP, and the
representative members are extracted using a 27%
sequence identity cutoff, favouring high-resolution structures.
For NMR structures we assigned 3.2 Å as the resolution cutoff.
• The initial equivalences are obtained from the non-gap positions of the
MALIGN alignment of superfamily members with JOY, MNYFIT
• We have employed COMPARER (Sali et al., 1990; Sali et
al., 1992) to obtain the multiple sequence alignment of distantly
• The final alignments of MMS and SMS are presented in JOY format
(Mizuguchi et al., 1998) to include structural features.
• Structure-based sequence alignments are checked for misalignments
using the RMSD validation method.
• If the optimal alignments were not good, we repeated the
process once more with STAMP and then with the COMPARER program
using the simulated annealing option.
• The final sequence alignments of members in the MMS were used as
initial equivalences to superpose the structures using MNYFIT,
without updating the equivalences.
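The RMSD check over equivalent positions mentioned above can be sketched as follows. This is only an illustration, not the PASS2 validation code: the function names are hypothetical, the structures are assumed to be already superposed (for example by MNYFIT), and one coordinate triple per residue (such as the C-alpha atom) is assumed.

```python
import math

def equivalent_pairs(aln_a, aln_b, gap="-"):
    """Return (i, j) residue-index pairs for columns where neither sequence has a gap."""
    pairs = []
    i = j = 0  # running residue indices in each ungapped sequence
    for ca, cb in zip(aln_a, aln_b):
        if ca != gap and cb != gap:
            pairs.append((i, j))
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs

def rmsd_equivalent(aln_a, aln_b, coords_a, coords_b):
    """RMSD over aligned, non-gap positions of two already superposed structures.

    coords_a and coords_b hold one (x, y, z) tuple per residue (assumption).
    """
    pairs = equivalent_pairs(aln_a, aln_b)
    if not pairs:
        raise ValueError("alignment has no equivalent positions")
    total = 0.0
    for i, j in pairs:
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(pairs))
```

A large RMSD over a stretch of equivalent positions would point to a possible misalignment in that region; the threshold used by PASS2 is not stated here, so none is assumed.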
Description of the Database
• PASS2 URL:
• The access methods page has many links, as follows:
• SMS-PASS2 Single Member Superfamilies of PASS2
• MMS-PASS2 Multi Member Superfamilies of PASS2
• More information about the database
• Software information
• Browse the list of Entries
• Keyword search
• Align Your sequence with PASS2 members
• PSI-BLAST search against PASS2
• PHI-BLAST search against PASS2
• CAMPASS (old version of MMS)
Genome Distribution and other
• For the MMS and SMS family members, we determined the genome
distribution across 72 genomes, of which more than 40
are completed genomes.
• Finally, we annotated the genome distribution with the JOY program,
which is available via the Genome Distribution link.
• Two other main links, ‘Structural-Motifs’ and
‘Functional-Motifs’, will be available soon.
• For each entry in MMS and SMS, links to the
output data files, such as atm, seq, hbond, sstruc and psa,
are also available; these data files will be helpful for
• There are also links to more than 25 other databases and
analysis tools with information on each MMS/SMS
Genome Distribution Procedure
(RMSD of equivalent regions)
Percentage Identity Matrix of
         1bbha-   1cpq--   256bb-
1bbha-   100.0     23.6     22.6
1cpq--    23.6    100.0     19.8
256bb-    22.6     19.8    100.0
high highest percentage identity
low lowest percentage identity
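A matrix like the one above can be computed from a gapped multiple alignment. The sketch below is one common definition (identical residues divided by aligned non-gap columns, times 100); the exact definition used for PASS2 is not given here, so this is an assumption, and the function names are hypothetical. Case is ignored because JOY uses upper and lower case to encode solvent accessibility, not residue type.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percentage of identical residues over columns where both sequences are aligned."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for ca, cb in zip(aln_a, aln_b):
        if ca == gap or cb == gap:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        # JOY case encodes accessibility, so compare case-insensitively.
        if ca.upper() == cb.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0

def identity_matrix(alignment):
    """Build a nested dict {name_a: {name_b: percent identity}} from {name: aligned_seq}."""
    names = list(alignment)
    return {
        a: {b: percent_identity(alignment[a], alignment[b]) for b in names}
        for a in names
    }
```

Called with a dictionary keyed by entry name (for example 1bbha-, 1cpq-- and 256bb-), identity_matrix returns a symmetric table with 100.0 on the diagonal.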
Alignment Based On Similarities In
Structural Features (COMPARER Alignment)
1bbha-( 1 ) aglspeeqIetRqagyefmgwNmgkIkanlegeynaaqVeaAAnvIaaiA
1cpq--( 1 ) adtkevleaReayfkslggSmkaMtgvaka-fdaeaAkveAakLekil
256bb-( 1 ) adledNmetlndnlkvIekAd----naaqVkdALtkMraAA
1bbha-( 51 ) nsgmgalygpgTdk-nvgdvkTrvkpeffqnmedvgkiarefvgaAntLa
1cpq--( 48 ) atdvaplFpagTSstdlpg-qTeakaaIwanmddfgakgkamheaGgaVi
256bb-( 38 ) ld-AqkatPpkle---------dksp-dspeMkdfrhGfdilvgqIddAl
1bbha-( 100 ) evAatgeaeaVktafgdVgaackschekYrak
1cpq--( 97 ) aaAnagdgaaFgaalqkLggtckachddYreed
256bb-( 77 ) klAnegkvkeAqaaAeqLkttrnayhqkyr
aaa aaaaaaaaaaa aaaaaaaaaaa
Key to JOY
solvent inaccessible                    UPPER CASE   X
solvent accessible                      lower case   x
alpha helix                             red          x
beta strand                             blue         x
3-10 helix                              maroon       x
hydrogen bond to main chain amide       bold         x
hydrogen bond to main chain carbonyl    underline    x
disulphide bond                         cedilla      ç
positive phi                            italic       x
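In plain text, only the case convention of the JOY key survives (colour, bold, underline and italic are lost). As a minimal sketch under that assumption, the buried and exposed residues of one alignment line can be counted from letter case alone; the function name is hypothetical.

```python
def accessibility_counts(joy_line):
    """Return (buried, exposed) residue counts for one JOY-formatted sequence line.

    Upper case = solvent inaccessible, lower case = solvent accessible.
    Gap characters and spaces are ignored because they are not letters.
    """
    buried = sum(1 for ch in joy_line if ch.isalpha() and ch.isupper())
    exposed = sum(1 for ch in joy_line if ch.isalpha() and ch.islower())
    return buried, exposed

# First block of the 1bbha- line from the COMPARER alignment
line = "aglspeeqIetRqagyefmgwNmgkIkanlegeynaaqVeaAAnvIaaiA"
buried, exposed = accessibility_counts(line)
print(f"buried={buried} exposed={exposed}")
```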
Superposed structure of
Statistics of Superfamily Entries
S.No  Class                               No. of Folds  No. of SF  No. of SMS  No. of C-SMS  No. of MMS  No. of C-MMS
1     All Alpha Proteins (a)              128           197        156         151           41          34
2     All Beta Proteins (b)               87            158        115         110           43          15
3     Alpha and Beta Proteins (a/b)       93            153        92          88            61          35
4     Alpha and Beta Proteins (a+b)       168           237        185         174           52          19
5     Multi Domain Proteins               25            25         22          20            3           0
6     Membrane and Cell Surface Proteins  11            17         16          14            1           0
7     Small Proteins                      52            72         57          56            15          7
      Total                               564           859        643         613           216         110
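The table can be checked arithmetically. The sketch below copies the counts from the table and verifies two things: that each column sums to the reported total, and that the number of superfamilies in each class equals SMS plus MMS. The second check assumes every superfamily is either single-member or multi-member, which the numbers are consistent with. The names are hypothetical and the class labels are shortened.

```python
# Counts per class, copied from the table:
# (folds, superfamilies, SMS, C-SMS, MMS, C-MMS)
ROWS = {
    "All Alpha (a)": (128, 197, 156, 151, 41, 34),
    "All Beta (b)": (87, 158, 115, 110, 43, 15),
    "Alpha and Beta (a/b)": (93, 153, 92, 88, 61, 35),
    "Alpha and Beta (a+b)": (168, 237, 185, 174, 52, 19),
    "Multi Domain": (25, 25, 22, 20, 3, 0),
    "Membrane": (11, 17, 16, 14, 1, 0),
    "Small Proteins": (52, 72, 57, 56, 15, 7),
}
REPORTED_TOTAL = (564, 859, 643, 613, 216, 110)

def column_sums(rows):
    """Sum each column across all classes."""
    return tuple(sum(col) for col in zip(*rows.values()))

def sf_split_is_consistent(rows):
    """Check that superfamilies = SMS + MMS in every class (assumed relation)."""
    return all(sf == sms + mms for _, sf, sms, _, mms, _ in rows.values())

print(column_sums(ROWS) == REPORTED_TOTAL)
print(sf_split_is_consistent(ROWS))
```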
• Overington, J.P., Johnson, M.S., Sali, A. and Blundell, T.L. (1990). Tertiary
structural constraints on protein evolutionary diversity: templates, key
residues and structure prediction. Proc. Royal Soc. Lond., B241, p.146.
• Sali, A. and Blundell, T.L. (1990). The definition of topological equivalence
in homologous and analogous structures: a procedure involving a
comparison of local properties and relationships. J. Mol. Biol., 212, p.403.
• Sowdhamini, R. and Blundell, T.L. (1995). An automatic method involving
cluster analysis of secondary structure for the identification of domains in
proteins. Prot. Sci., 4, p.506.
• Sowdhamini, R., Rufino, S.D. and Blundell, T.L. (1996). A database of
globular protein structural domains: clustering of representative family
members into similar folds. Folding and Design, 1, p.209.
• Zhu, Z.-Y., Sali, A. and Blundell, T.L. (1992). A variable gap penalty
function and feature weights for protein 3-D structure comparison. Prot.
Engng., 5, p.43.
• The Wellcome Trust, UK.
• Dr. R. Sowdhamini, NCBS-TIFR, Bangalore.
• Lab members.
• NCBS-TIFR, Bangalore, India.
• The PASS2 database paper has been accepted by NAR and
will appear in the January 2002 NAR Database Supplement.