      Tree Pattern Matching in Phylogenetic Trees

Automatic Search for Orthologs or Paralogs
in Homologous Gene Sequence Databases

By: Jean-François Dufayard, Laurent Duret,
  Simon Penel, Manolo Gouy, François
      Rechenmann, and Guy Perrière

           Presented by: Jean Yeh
         Background Information

 The authors have created three databases that
  gather genes into homologous families
     HOVERGEN – vertebrates
     HOBACGEN – prokaryotes
     HOGENOM – completely sequenced organisms
 Among homologous genes, need to be able to
  differentiate orthologs from paralogs
        Homologous Sequences

 Homologs: Two genes related by descent
  from a common ancestral DNA sequence
 Orthologs: Two genes in different species;
  evolved from a single ancestral gene by
  speciation
 Paralogs: Two genes related by duplication
  within a genome
   Orthologs and Paralogs
               Gene Function

 Gene function tends to change after gene
  duplication
 Orthologs are more reliable predictors of
  gene function than paralogs
 Evolutionary distance also plays a role
 Closely related paralogs are probably more
  similar than distantly related orthologs

 Create algorithms that allow for automatic
  searching for orthologs or paralogs in their
  databases
     One algorithm for tree reconciliation
     One algorithm for tree pattern matching
     Implement under architecture used to query the
      databases
              Tree Reconciliation

 Infers speciation and duplication events
 Compares gene tree G with species tree S to
  give a reconciled tree R
 Algorithm:
     R=S
     Step through G and R simultaneously
     If nodes are incongruent, insert duplication node
      in R and annotate gene losses
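
A minimal sketch of how duplication and speciation events can be inferred by comparing G with S. It uses the standard LCA-mapping formulation of reconciliation, not necessarily the authors' node-insertion procedure, and it does not annotate gene losses. Trees are assumed to be binary nested tuples, and gene-tree leaves are assumed to be labeled with the species they come from; the names `leaf_set`, `all_clades`, and `reconcile` are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Return the leaf set of every node in the tree (root first)."""
    out = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            out.extend(all_clades(child))
    return out


def reconcile(gene_tree, species_tree):
    """Label each internal gene-tree node as a speciation or a duplication.

    Each gene node is mapped to the smallest species clade that contains
    all of its species. A node mapped to the same clade as one of its
    children is inferred to be a duplication.
    """
    clades = all_clades(species_tree)
    events = []  # (gene subtree, event) pairs in post-order

    def lca(species):
        return min((c for c in clades if species <= c), key=len)

    def visit(node):
        if isinstance(node, str):
            return lca(frozenset([node]))
        left, right = (visit(child) for child in node)  # binary tree assumed
        here = lca(left | right)
        event = "duplication" if here in (left, right) else "speciation"
        events.append((node, event))
        return here

    visit(gene_tree)
    return events


S = (("human", "mouse"), "chicken")
G = (("human", "chicken"), "mouse")  # incongruent with S
for subtree, event in reconcile(G, S):
    print(subtree, event)
```

In this toy case the root of G is labeled a duplication, because the grouping of human with chicken cannot be explained by speciation alone under S.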
          Tree Pattern Matching

 A tree pattern is a particular tree structure with
  taxonomic and evolutionary parameters
  contained in nodes and leaves
 Can be considered a subtree
 Want to match to a target tree
 E.g. pattern (X, Y, Z) matches ((X, Y), Z),
  (X, (Y, Z)), and ((X, Z), Y)
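
One way to reproduce the (X, Y, Z) example is a clade-containment check: an unresolved pattern matches any target tree that has the same leaves and contains every grouping the pattern requires. This is an illustrative sketch, not the authors' algorithm; trees are assumed to be nested tuples of leaf names, and `clades` and `matches` are hypothetical names.

```python
def clades(tree):
    """Return (leaf set, set of all clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def matches(pattern, target):
    """True if target has the pattern's leaves and all of its groupings."""
    p_leaves, p_clades = clades(pattern)
    t_leaves, t_clades = clades(target)
    return p_leaves == t_leaves and p_clades <= t_clades


pattern = ("X", "Y", "Z")
for target in [(("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")]:
    print(target, matches(pattern, target))
```

A fully resolved pattern is stricter: `matches((("X", "Y"), "Z"), ("X", ("Y", "Z")))` is false, because the target never groups X with Y.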
          Tree Pattern Matching

 Uses a recursive algorithm that takes into
  account different taxonomic levels as well as
  the specific branch constraints
 Cuts down on run time by checking the
  number of leaves in the pattern and the target
 Allows users to search for orthologs/paralogs
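
The points above can be sketched as a recursive search with a leaf-count shortcut. This is a simplified illustration under several assumptions: trees are binary nested tuples, a pattern leaf names a taxon, a target leaf names a species, and branch constraints are left out. The `TAXA` table and the function names are hypothetical.

```python
# Hypothetical toy taxonomy: species -> taxa it belongs to, at any level.
TAXA = {
    "human": {"human", "Mammalia", "Vertebrata"},
    "mouse": {"mouse", "Mammalia", "Vertebrata"},
    "chicken": {"chicken", "Aves", "Vertebrata"},
}


def n_leaves(tree):
    """Count the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return 1
    return sum(n_leaves(child) for child in tree)


def match_at(pattern, target):
    """True if the pattern matches the whole subtree, ignoring child order."""
    if n_leaves(pattern) != n_leaves(target):  # cheap rejection test
        return False
    if isinstance(pattern, str):
        # Equal leaf counts guarantee target is a leaf here.
        return pattern in TAXA[target]
    (pl, pr), (tl, tr) = pattern, target
    return (match_at(pl, tl) and match_at(pr, tr)) or \
           (match_at(pl, tr) and match_at(pr, tl))


def find_matches(pattern, target):
    """Return every subtree of target that the pattern matches."""
    if n_leaves(pattern) > n_leaves(target):
        return []  # the pattern cannot fit anywhere below this node
    hits = [target] if match_at(pattern, target) else []
    if not isinstance(target, str):
        for child in target:
            hits.extend(find_matches(pattern, child))
    return hits


gene_tree = (("human", "mouse"), "chicken")
print(find_matches(("Mammalia", "Mammalia"), gene_tree))
```

Leaf counts are recomputed on every call to keep the sketch short; caching them per node would avoid the repeated work.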
               FamFetch Interface

 User interface to access the databases
 Incorporates both algorithms
 Pattern editor has two frames: tool and
  pattern
     Pattern frame – interactive editor to construct,
      load, save, and match patterns with a tree
     Tool frame – tools used in pattern frame
                Tree Rooting

 For tree reconciliation, the trees must be
  rooted
 Authors use their reconciliation algorithm to
  find the most parsimonious solution – the
  one that requires the least number of gene
 Reconciliation algorithm relatively fast
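
A rough sketch of rooting by parsimony: try every possible root position in the gene tree and keep the one whose reconciliation scores best. The score used here is the number of duplications inferred by LCA mapping, which is an assumption about what is being minimized. Trees are assumed to be binary nested tuples with gene leaves labeled by species, and all function names are hypothetical.

```python
import itertools


def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))


def all_clades(tree):
    out = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            out.extend(all_clades(child))
    return out


def count_duplications(gene_tree, species_tree):
    """Count gene nodes mapped to the same species clade as a child."""
    clades = all_clades(species_tree)
    dups = 0

    def lca(species):
        return min((c for c in clades if species <= c), key=len)

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(frozenset([node]))
        left, right = (visit(c) for c in node)
        here = lca(left | right)
        if here in (left, right):
            dups += 1
        return here

    visit(gene_tree)
    return dups


def all_rootings(tree):
    """Return the tree rooted on each of its edges in turn."""
    adj, label, ids = {}, {}, itertools.count()

    def add(node):
        me = next(ids)
        adj[me] = []
        if isinstance(node, str):
            label[me] = node
        else:
            for child in node:
                kid = add(child)
                adj[me].append(kid)
                adj[kid].append(me)
        return me

    root = add(tree)
    a, b = adj.pop(root)  # drop the old root to get an unrooted tree
    adj[a].remove(root)
    adj[b].remove(root)
    adj[a].append(b)
    adj[b].append(a)

    def grow(node, parent):
        if node in label:
            return label[node]
        return tuple(grow(n, node) for n in adj[node] if n != parent)

    return [(grow(u, v), grow(v, u)) for u in adj for v in adj[u] if u < v]


def best_rooting(gene_tree, species_tree):
    """Return (rooted tree, duplication count) with the fewest duplications."""
    scored = [(count_duplications(r, species_tree), r)
              for r in all_rootings(gene_tree)]
    dups, rooted = min(scored, key=lambda pair: pair[0])
    return rooted, dups


S = (("human", "mouse"), "chicken")
print(best_rooting((("human", "chicken"), "mouse"), S))
```

Ties between equally parsimonious roots are broken arbitrarily here; a real tool would need a rule for that case.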
            Tree Pattern Search

 By forming their algorithm as a tree pattern
  search, the authors managed to increase
  possible queries for the users
 Can search for gene duplication or
  speciation events, not just orthologs and
  paralogs
 The algorithm is also relatively fast, though it
  loses the flexibility of human pattern matching
   Automatic Search for Orthologs

 Previously done with pairwise BLAST
  searches and reciprocal hits
     Needs all genes, and if the genes are wrong, the
      results may be wrong
 Classifying genes into clusters of orthologs
  depends on evolutionary distance between
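
For comparison, the reciprocal-hit approach mentioned above can be sketched in a few lines. The input is assumed to be two tables of pairwise similarity scores, one per search direction; the gene names and scores below are made up for illustration and are not BLAST output.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: {(query, subject): score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50,
          ("a2", "b1"): 180, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 180,
          ("b2", "a1"): 50, ("b2", "a2"): 90}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here only a1 and b1 are paired: a2's best hit is b1, but b1's best hit is a1, so a2 is left without a partner even though b2 points back to it.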
          Possible Improvement

 Have program estimate reliability of
 While it allows for easier comparative
  sequence analysis, it was designed solely for
  databases the authors had already created
 Might be improved if it could be generalized
  for more databases
