Towards Identifying Lateral Gene Transfer Events
L. Addario-Berry, M. Hallett, J. Lagergren
Presented By: Jeff Mathew

Roadmap
Key terms
τ-transfer problem
H-moves and I-moves algorithm
Tree generation for simulation
Experimental results
Conclusions and future work

Lateral transfer scenario
LGT (lateral gene transfer) = HGT (horizontal gene transfer)
The root of the scenario tree must correspond to the root of the gene tree.
The scenario tree is connected and respects the direction of evolution implied by the arcs of the gene tree T and the species tree S.

α-activity
An α-active scenario for a gene tree and a species tree allows at most α copies of a gene to exist simultaneously in the genome of an ancestral taxon.
The authors focus on 1-active scenarios, although intractability results had already been proved for α ≥ 1.

τ-transfer problem
Input: species tree S, gene tree T, integer τ
Output: a lateral transfer scenario for S and T with τ* transfers, where τ* ≤ τ
Intractability result
◦ The decision version of the α-Active, τ-Transfer Problem (does there exist an α-active scenario with cost ≤ τ?) is NP-complete.
τ is the number of lateral transfer events needed to explain the difference between S and T.

Algorithm
2-phase approach
Phase 1
◦ While H-fat or I-fat vertices remain, perform an H-fat move or an I-fat move.
At the end of Phase 1, the scenario is guaranteed to be 1-active. What about cycles?
Phase 2
◦ Remove the minimum number of LGT events from each candidate scenario to make it acyclic.
Running time: $O(2^{4\tau} n^2)$

Simulating species trees
Create a random species tree S on n leaves, with Θ(log n) expected depth.
S is meant to reflect the actual evolutionary relationships between taxa.
◦ S is ultrametric; therefore, edge weights correspond to time.
◦ Randomly assign weights to every edge such that every root-to-leaf path has weighted sum 1.
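The species-tree simulation above can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not say how the tree shape or the edge weights were drawn, so splitting each leaf set at a uniformly random point (a Yule-style split, which gives Θ(log n) expected depth) and drawing each internal node's time uniformly between its parent's time and 1 are our choices. `Node`, `random_ultrametric_tree`, and `root_to_leaf_lengths` are hypothetical names. Because each edge weight is the child's time minus the parent's time, every root-to-leaf path telescopes to exactly 1, which is what makes the tree ultrametric.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    time: float                      # distance from the root: root = 0.0, leaves = 1.0
    weight: float = 0.0              # length of the edge entering this node
    children: list = field(default_factory=list)


def random_ultrametric_tree(n, rng=None):
    """Hypothetical generator of a random ultrametric species tree on n >= 2 leaves.

    Assumptions (not from the slides): leaf sets are split at a uniformly random
    point, and each internal node's time is uniform between its parent's time and 1.
    """
    if n < 2:
        raise ValueError("need at least two leaves")
    rng = rng or random.Random()
    labels = [f"s{i}" for i in range(n)]
    rng.shuffle(labels)
    counter = [0]

    def build(leaves, time, parent_time):
        """Build the subtree on `leaves` rooted at `time` (ignored for a leaf)."""
        if len(leaves) == 1:
            return Node(leaves[0], 1.0, 1.0 - parent_time)   # all leaves at time 1
        counter[0] += 1
        node = Node(f"u{counter[0]}", time, time - parent_time)
        k = rng.randint(1, len(leaves) - 1)                  # uniform split point
        node.children = [build(part, rng.uniform(time, 1.0), time)
                         for part in (leaves[:k], leaves[k:])]
        return node

    return build(labels, 0.0, 0.0)                           # root at time 0, no incoming edge


def root_to_leaf_lengths(node, acc=0.0):
    """Weighted length of every root-to-leaf path."""
    acc += node.weight
    if not node.children:
        return [acc]
    return [x for child in node.children for x in root_to_leaf_lengths(child, acc)]


tree = random_ultrametric_tree(8, random.Random(42))
print(all(abs(x - 1.0) < 1e-9 for x in root_to_leaf_lengths(tree)))  # True
```

Storing absolute times on the nodes, rather than drawing edge weights directly, keeps the ultrametric property automatic; the later gene-tree step also needs those times to decide which edges are alive at a given moment.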
Simulating gene trees
Begin with the generated ultrametric species tree.
Lateral transfer events occur according to a Poisson process with mean rate λ.
Moving from the root to the leaves, for each vertex x0 with children x1 and x2, examine both edges:
◦ If the Poisson process produces a lateral transfer event along (x0, x1), add it and point it to a randomly chosen edge alive at that point in time.
◦ Otherwise, add a speciation event for x1.
◦ Repeat the analysis for (x0, x2).

Degenerate cases
The simulation can produce biologically plausible events that the algorithm cannot detect.
Useless transfers: LGTs that don't change the gene tree.
Transfer-loss events: one child of a node is an LGT event and the other child is a loss event.

Results
τ = true number of LGT events
τ' = cost of the minimum-cost LGT scenario found by the algorithm
λ = mean rate of LGTs in the Poisson process
Ω = number of repetitions

Finding the saturation point
The saturation point is where the average τ' stops increasing.
Gene trees and species trees were chosen at random from a large pool.
◦ Trials suggest that the saturation point is slightly above n/2, i.e., when τ > n/2, the algorithm stops detecting new LGT events.
Thus, if τ' > n/2, the correspondence between T and S via LGT events is not very meaningful.
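The transfer-sampling part of the gene-tree simulation can be sketched like this. It is a minimal sketch, assuming λ is a rate per unit time per lineage and that every edge carries an independent Poisson process; the hard-coded four-leaf tree `EDGES` and the function `sample_transfers` are hypothetical. Poisson arrivals are generated from exponential waiting times, and each transfer's recipient is drawn uniformly from the other edges alive at that moment. The sketch only samples the (time, donor, recipient) events; it does not rewire them into the gene tree T.

```python
import random

# Hypothetical ultrametric species tree ((a,b),(c,d)): root r at time 0,
# internal nodes x at 0.4 and y at 0.6, leaves at 1.0.
# Each edge is (edge_id, start_time, end_time).
EDGES = [
    ("r-x", 0.0, 0.4), ("r-y", 0.0, 0.6),
    ("x-a", 0.4, 1.0), ("x-b", 0.4, 1.0),
    ("y-c", 0.6, 1.0), ("y-d", 0.6, 1.0),
]


def sample_transfers(edges, lam, rng):
    """Sample LGT events on each edge as a Poisson process with rate lam > 0.

    Returns (time, donor_edge, recipient_edge) triples sorted by time; each
    recipient is chosen uniformly among the other edges alive at that time.
    """
    events = []
    for eid, start, end in edges:
        t = start + rng.expovariate(lam)          # exponential waiting time to first event
        while t < end:
            alive = [e for e, s, f in edges if s < t < f and e != eid]
            if alive:
                events.append((t, eid, rng.choice(alive)))
            t += rng.expovariate(lam)             # waiting time to the next event
    return sorted(events)


for time, donor, recipient in sample_transfers(EDGES, 1.0, random.Random(3)):
    print(f"t={time:.3f}: transfer from {donor} to {recipient}")
```

Raising `lam` raises the expected number of transfers in proportion to the total tree length (3.0 in this toy tree), which is the knob the experiments turn when searching for the saturation point.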
Conclusions
Empirically verified the feasibility of the τ-transfer algorithm.
Degenerate events, such as transfer-loss events, that cause transfers to be over-estimated occur with low probability.
Achieved near-optimal scenarios when λ is low enough not to cause saturation.
The cycle-elimination phase of the algorithm is needed extremely rarely in practice, implying an $O(2^{2\tau} n^2)$ running time in practice.

Future work and open problems
Use weighted gene trees and species trees
◦ Species trees are nearly ultrametric, while gene trees are not.
Do fast algorithms exist when the input is a set of gene trees with no species tree?
Tractability on larger phylogenies
Can we consider gene duplication, lateral gene transfer, and other events simultaneously?
Can we use probabilistic models that assign likelihoods to various events, and optimize over such models in a tractable manner?
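The cycle-elimination phase (Phase 2), which the conclusions report is rarely needed, can be illustrated with a small brute-force sketch. The slides do not say how a candidate scenario is represented, so this uses a hypothetical timing model: node times must respect the tree arcs (parent before child), and a transfer from donor edge (a, b) to recipient edge (c, d) requires the two edges to overlap in time, i.e. a before d and c before b. Under this model a candidate is timing-consistent exactly when its constraint graph is acyclic. The names `is_acyclic`, `transfer_arcs`, and `min_transfers_to_drop` are hypothetical, and trying subsets in order of size is our choice for clarity, not the authors' procedure.

```python
from itertools import combinations


def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True iff the directed graph (nodes, arcs) has no cycle."""
    indeg = {v: 0 for v in nodes}
    out = {v: [] for v in nodes}
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return visited == len(nodes)


def transfer_arcs(transfer):
    """Donor edge (a, b) and recipient edge (c, d) must overlap in time: a < d and c < b."""
    (a, b), (c, d) = transfer
    return [(a, d), (c, b)]


def min_transfers_to_drop(tree_arcs, transfers):
    """Smallest set of transfer indices whose removal makes the constraint graph acyclic."""
    nodes = {x for arc in tree_arcs for x in arc}
    for k in range(len(transfers) + 1):
        for dropped in combinations(range(len(transfers)), k):
            kept = [t for i, t in enumerate(transfers) if i not in dropped]
            arcs = list(tree_arcs) + [arc for t in kept for arc in transfer_arcs(t)]
            if is_acyclic(nodes, arcs):
                return set(dropped)
    return None  # only reached if the tree arcs themselves contain a cycle


# Hypothetical species tree ((a,b),(c,d)) with root r and internal nodes x, y.
TREE = [("r", "x"), ("r", "y"), ("x", "a"), ("x", "b"), ("y", "c"), ("y", "d")]
T1 = (("r", "x"), ("y", "c"))   # forces y before x
T2 = (("r", "y"), ("x", "a"))   # forces x before y
print(min_transfers_to_drop(TREE, [T1]))       # set()
print(min_transfers_to_drop(TREE, [T1, T2]))   # {0}
```

Because T1 and T2 force x and y into opposite orders, dropping either one suffices and the search returns the first it finds. The exhaustive loop is exponential in the number of transfers, which is tolerable only because τ is small.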