CATH Database (slides by wulinqing)


CATH Database
Homologous Superfamily
CATH Database
• The CATH database is a hierarchical domain
     classification of protein structures in the
     Protein Data Bank.
• There are four major levels in this hierarchy, and their initials give
     CATH its name:
    1.   Class (C-Level)
    2.   Architecture (A-Level)
    3.   Topology (T-Level)
    4.   Homologous Superfamily (H-Level)
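Because the four levels are nested, a domain's position can be written as one number per level. The sketch below assumes CATH's dot-separated node codes (for example `3.40.50.300`: class, architecture, topology, homologous superfamily) and shows how two codes reveal the deepest level two domains share. The function names are hypothetical helpers, and the codes in the usage lines are illustrative only.

```python
from typing import Optional

# One entry per CATH level, in hierarchy order.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_code(code: str) -> dict[str, int]:
    """Split a four-part code such as '3.40.50.300' into its C, A, T and H numbers."""
    parts = code.split(".")
    if len(parts) != len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return dict(zip(LEVELS, (int(p) for p in parts)))


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the deepest level at which two domains share a node, or None.

    The hierarchy is nested, so two domains can only share a level if they also
    share every level above it; the codes are therefore compared left to right.
    """
    a = parse_cath_code(code_a)
    b = parse_cath_code(code_b)
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


print(deepest_shared_level("3.40.50.300", "3.40.50.720"))  # same C, A and T
print(deepest_shared_level("1.10.10.10", "2.60.40.10"))    # different classes
```

Comparing prefixes rather than individual numbers matters: architecture number 40 means different things in different classes, so equal A numbers under different C numbers do not indicate any shared architecture.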
CATH Database – Level 1
Class: C-Level
•    Determined by the secondary-structure composition and packing within
     the domain. Three major classes are recognized, plus a fourth:
    1. Mainly Alpha
    2. Mainly Beta
    3. Alpha Beta, which covers both
       1. Alpha/Beta (alternating alpha and beta segments)
       2. Alpha+Beta (largely segregated alpha and beta regions)
    4. Few Secondary Structures: protein domains with low
        secondary-structure content.
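Class assignment rests on how much helix and strand a domain contains. The sketch below is a simplified illustration, assuming a per-residue secondary-structure string with DSSP-style codes (`H` for helix, `E` for strand, anything else treated as coil). The thresholds `min_ss_fraction` and `dominance` are hypothetical values chosen for illustration, not CATH's published criteria, and the sketch ignores packing, so it does not separate Alpha/Beta from Alpha+Beta.

```python
from collections import Counter

# CATH class numbers and names (1-4, as in the list above).
CLASS_NAMES = {1: "Mainly Alpha", 2: "Mainly Beta", 3: "Alpha Beta",
               4: "Few Secondary Structures"}


def assign_class(ss: str, min_ss_fraction: float = 0.2,
                 dominance: float = 0.85) -> int:
    """Guess a CATH-style class number from a secondary-structure string.

    Both thresholds are illustrative assumptions (hypothetical parameters).
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss)
    helix, strand = counts["H"], counts["E"]
    ss_total = helix + strand
    if ss_total / len(ss) < min_ss_fraction:
        return 4                      # too little regular secondary structure
    alpha_share = helix / ss_total    # fraction of SS residues that are helical
    if alpha_share >= dominance:
        return 1                      # helices dominate
    if alpha_share <= 1 - dominance:
        return 2                      # strands dominate
    return 3                          # substantial helix and strand content


print(CLASS_NAMES[assign_class("HHHHHEEEEE")])
```

Distinguishing the two Alpha Beta sub-groups would additionally need the order and spatial packing of the helices and strands, which a composition count cannot capture.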
CATH Database – Level 2
Architecture: A-Level
• Describes the overall shape of the
  domain structure, as determined by the
  orientations of the secondary structures,
  but ignores the connectivity between the
  secondary structures.
• It is currently assigned manually, using a
  simple description of the arrangement of
  the secondary structures.
CATH Database – Level 3
Topology (Fold Family): T-Level
• Structures are grouped into fold groups at
  this level according to both the overall
  shape and the connectivity of the
  secondary structures.
• This grouping uses the structure
  comparison algorithm SSAP (Sequential
  Structure Alignment Program).
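SSAP compares two structures by double dynamic programming: a lower-level alignment compares what a residue "sees" of the rest of its own structure with what a residue in the other structure sees, and an upper-level alignment runs over the resulting residue-pair scores. The sketch below is a toy in that spirit, not SSAP itself. It assumes each domain is a list of C-alpha coordinates, represents a residue's view by its distances to the other residues (SSAP uses vectors in a local residue frame), and uses hypothetical constants `A`, `B`, `GAP` and a hypothetical 0-100 normalization, so its values are not comparable with real SSAP scores.

```python
import math

Coords = list[tuple[float, float, float]]

# Hypothetical scoring constants for this toy; not SSAP's published values.
A, B, GAP = 50.0, 10.0, 2.0


def _align(n: int, m: int, score, gap: float) -> float:
    """Global (Needleman-Wunsch style) best score for aligning two ordered lists."""
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(i - 1, j - 1),
                           dp[i - 1][j] - gap, dp[i][j - 1] - gap)
    return dp[n][m]


def _view_score(a: Coords, b: Coords, i: int, j: int) -> float:
    """Lower-level DP: compare residue i's distance view with residue j's."""
    da = [math.dist(a[i], p) for p in a]
    db = [math.dist(b[j], q) for q in b]

    def sim(k: int, l: int) -> float:
        return A / (abs(da[k] - db[l]) + B)   # at most A / B when distances match

    # Residues before i pair only with residues before j, and likewise after.
    before = _align(i, j, sim, GAP)
    after = _align(len(a) - i - 1, len(b) - j - 1,
                   lambda k, l: sim(i + 1 + k, j + 1 + l), GAP)
    return before + after


def _raw_score(a: Coords, b: Coords) -> float:
    """Upper-level DP over the matrix of residue-pair view scores."""
    r = [[_view_score(a, b, i, j) for j in range(len(b))] for i in range(len(a))]
    return _align(len(a), len(b), lambda i, j: r[i][j], GAP)


def toy_ssap(a: Coords, b: Coords) -> float:
    """Similarity on a 0-100 scale, normalized by both self-comparison scores."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each structure needs at least two residues")
    s = max(0.0, _raw_score(a, b))
    return 100.0 * s / math.sqrt(_raw_score(a, a) * _raw_score(b, b))


line = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(toy_ssap(line, line), toy_ssap(line, square))
```

Because only distances are compared, the score ignores mirror-image differences, one of the reasons SSAP itself works with oriented vectors. The O(n^4) cost also limits this toy to very small inputs.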
CATH Database – Level 4
Homologous Superfamily: H-Level
•   This level groups together protein domains which are thought to share a
    common ancestor and can therefore be described as homologous.
•   Similarities are identified either by high sequence identity or by structure
    comparison using SSAP.
•   Structures are clustered into the same homologous superfamily if they
    satisfy one of the following criteria:
    1.   Sequence identity >= 35%, and at least 60% of the larger structure
         equivalent to the smaller.
    2.   SSAP score >= 80.0, sequence identity >= 20%, and at least 60% of the
         larger structure equivalent to the smaller.
    3.   SSAP score >= 70.0, at least 60% of the larger structure equivalent to the
         smaller, and related domain functions, as judged from the literature and
         the Pfam protein family database.
    4.   Significant similarity from HMM-sequence searches and HMM-HMM
         comparisons using SAM.
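The four criteria are alternatives, so a pair of domains joins a superfamily as soon as any one of them holds. The sketch below encodes them as boolean checks. It assumes the comparison results (sequence identity, overlap, SSAP score, a curated function flag and a SAM hit flag) have already been computed elsewhere; the `DomainPair` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainPair:
    """Pre-computed comparison results for two domains (hypothetical input type)."""
    seq_identity: float     # % sequence identity
    overlap: float          # % of the larger domain equivalent to the smaller one
    ssap_score: float       # SSAP score
    related_function: bool  # curated judgement from the literature / Pfam
    hmm_match: bool         # significant SAM HMM-sequence or HMM-HMM hit


def homology_criteria(p: DomainPair) -> list[int]:
    """Return the numbers (1-4) of the H-level criteria that the pair satisfies."""
    met = []
    if p.seq_identity >= 35 and p.overlap >= 60:
        met.append(1)
    if p.ssap_score >= 80 and p.seq_identity >= 20 and p.overlap >= 60:
        met.append(2)
    if p.ssap_score >= 70 and p.overlap >= 60 and p.related_function:
        met.append(3)
    if p.hmm_match:
        met.append(4)
    return met


def same_superfamily(p: DomainPair) -> bool:
    # Any single criterion is enough to cluster the pair together.
    return bool(homology_criteria(p))


print(homology_criteria(DomainPair(25, 65, 85, False, False)))
```

Returning the list of satisfied criteria, rather than only a yes/no answer, keeps a record of why two domains were grouped, for example whether the decision rested on sequence, on structure, or on curated functional evidence.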
CATH Hierarchy Search Example
• Class: Alpha Beta

• Architecture: Roll, Alpha-Beta Barrel, Box

• Topology: Proliferating Cell Nuclear Antigen
CATH Example Results
[Screenshot: table of matching domains, listed by PDB ID]
