# matthew by yaoyufang

```
Towards Identifying Lateral Gene Transfer Events
L. Addario-Berry, M. Hallett, J. Lagergren

Presented By: Jeff Mathew
• Key terms
• τ-transfer problem
• H-moves and I-moves algorithm
• Tree generation for simulation
• Experimental results
• Conclusions and future work
Lateral transfer scenario
• LGT (lateral gene transfer) = HGT (horizontal gene transfer)
• The root of the scenario tree must correspond to the root of the gene tree
• The scenario tree is connected and respects the direction of evolution implied by the arcs of T and S.
α-activity
• An α-active scenario for a gene tree and species tree allows at most α copies of a gene to exist simultaneously in the genome of an ancestral taxon.
• The authors focus on 1-active scenarios, though intractability results were proved earlier for α ≥ 1.
τ-transfer problem
• Input: species tree S, gene tree T, integer τ
• Output: a τ* lateral transfer scenario for S and T, with τ* ≤ τ
• Intractability result
◦ The decision version of the α-active, τ-transfer problem (does there exist an α-active scenario with cost ≤ τ?) is NP-complete.
• τ is the number of lateral transfer events needed to explain the difference between S and T
Algorithm
• Two-phase approach
• Phase 1
◦ While H-fat or I-fat vertices remain
  ▪ Perform an H-fat move or an I-fat move
• At the end of phase 1, the scenario is guaranteed to be 1-active. What about cycles?
• Phase 2
◦ Remove the minimum number of LGT events from each candidate to make it acyclic.
• Running time: O(2^{4τ} n^2)
Simulating species trees
• Create a random species tree S on n leaves, with Θ(log n) expected depth
• S is supposed to reflect the actual evolutionary relationships between taxa
◦ S is ultrametric, so edge weights correspond to time.
◦ Randomly assign weights to every edge such that every root-to-leaf path has weighted sum 1.
Simulating gene trees
• Begin with the generated ultrametric species tree
• Lateral transfer events occur according to a Poisson process with mean rate λ
• Moving from root to leaves, for each vertex x0 with children x1 and x2, examine both edges
◦ If the Poisson process yields a lateral transfer event along (x0, x1), add it and point it to a randomly chosen edge alive at that point in time.
◦ Otherwise, add a speciation event for x1
◦ Repeat the analysis for (x0, x2)
Degenerate Cases
• Simulation can produce biologically plausible events that are not detectable by the algorithm.
• Useless transfers: LGTs that don't change the gene tree
• Transfer-loss events: one child of a node is an LGT event and the other child is a loss event.
Results
• τ = true number of LGT events
• τ′ = cost of the minimum-cost LGT scenario found by the algorithm
• λ = mean rate of LGTs from the Poisson process
• Ω = number of repetitions
Finding the saturation point
• The point at which the average τ′ stops increasing.
• Random trees from a large pool were chosen as gene trees and species trees
◦ Trials suggest that the saturation point is slightly above n/2, i.e., when τ > n/2, the algorithm stops detecting new LGT events
• Thus, if τ′ > n/2, the correspondence between T and S via LGT events is not very meaningful.
Conclusions
• Empirically verified the feasibility of the τ-transfer algorithm
• Degenerate events, such as transfer-loss events, that result in over-estimates of transfers occur with low probability
• Achieved near-optimal scenarios when λ is low enough not to cause saturation
• The cycle-elimination phase of the algorithm is rarely needed in practice, implying an O(2^{2τ} n^2) running time.
Future work and open problems
• Use weighted gene trees and species trees
◦ Species trees are nearly ultrametric while gene trees are not
• Do fast algorithms exist when the input is a set of gene trees with no species tree?
• Tractability on larger phylogenies
• Can we consider gene duplication, lateral gene transfers, and other events simultaneously?
• Can we use probabilistic models that assign likelihoods to various events and optimize over such models in a tractable manner?

```
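
The simulation slides (an ultrametric species tree, then Poisson-distributed transfers along its edges) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' simulator. The slides do not say how the random tree shape is drawn, so the sketch assumes it comes from repeatedly splitting a uniformly chosen leaf. It also assumes at most one transfer per edge, and it only records transfer events without building the resulting gene tree. The names `random_ultrametric_tree` and `sample_transfers` are hypothetical.

```python
import random


def random_ultrametric_tree(n, rng):
    """Random binary species tree on n leaves; every root-to-leaf path sums to 1.

    Assumption (not from the slides): the shape is grown by repeatedly
    splitting a uniformly chosen current leaf.
    Returns (parent, node_time, leaves); the edge into vertex v has weight
    node_time[v] - node_time[parent[v]].
    """
    if n < 2:
        raise ValueError("need at least two leaves")
    # The root splits at time 0; the other n - 2 splits occur at sorted random times.
    split_times = [0.0] + sorted(rng.random() for _ in range(n - 2))
    parent, node_time = {0: None}, {}
    leaves, next_id = [0], 1
    for t in split_times:
        v = leaves.pop(rng.randrange(len(leaves)))  # lineage that speciates at time t
        node_time[v] = t
        for _ in range(2):
            parent[next_id] = v
            leaves.append(next_id)
            next_id += 1
    for v in leaves:
        node_time[v] = 1.0  # all extant taxa sit at time 1 (ultrametric)
    return parent, node_time, leaves


def sample_transfers(parent, node_time, lam, rng):
    """Sample at most one LGT per edge; returns (source, target, time) triples.

    Each edge is named by its child vertex. Vertex ids increase from the
    root downward, so iterating ids in order visits parents before children.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    events = []
    for child in range(1, len(parent)):
        start, end = node_time[parent[child]], node_time[child]
        wait = rng.expovariate(lam)  # waiting time to the first Poisson event
        if wait >= end - start:
            continue  # no transfer on this edge: plain speciation
        t = start + wait
        # Other edges alive at time t are the candidate targets of the transfer.
        alive = [c for c in range(1, len(parent))
                 if c != child and node_time[parent[c]] <= t < node_time[c]]
        if alive:
            events.append((child, rng.choice(alive), t))
    return events


if __name__ == "__main__":
    rng = random.Random(0)
    parent, node_time, leaves = random_ultrametric_tree(8, rng)
    for source, target, t in sample_transfers(parent, node_time, 1.0, rng):
        print(f"transfer from edge {source} to edge {target} at time {t:.3f}")
```

Assigning a time to each vertex and deriving edge weights as time differences keeps the tree ultrametric by construction, so no separate rescaling step is needed to make every root-to-leaf path sum to 1.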