# Slide 1 by e1X71NK

```
Phylogeny Tree Reconstruction

[Figure: example tree topologies over leaves 1-5]

Phylogenetic Trees

• Nodes: species
• Edges: time of independent evolution

• Edge length represents evolution time
   - AKA genetic distance
   - Not necessarily chronological time

CS262 Lecture 13, Win07, Batzoglou
Parsimony – direct method not using distances

• One of the most popular methods:
   GIVEN a multiple alignment
   FIND a tree & history of substitutions explaining the alignment

Idea:
Find the tree that explains the observed sequences with a minimal number of substitutions

Two computational subproblems:

1. Find the parsimony cost of a given tree (easy)

2. Search through all tree topologies (hard)

Example: Parsimony cost of one column

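[Figure: tree over leaves A, B, A, A with leaf sets {A}, {B}, {A}, {A}.
 One internal node is labeled {A, B}, where the cost is incremented (C += 1);
 the other two internal nodes, including the root, are labeled {A}.
 Final cost C = 1]

Parsimony Scoring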
Given a tree, and an alignment column u
Label internal nodes to minimize the number of required substitutions

Initialization:
   Set cost C = 0; node k = 2N – 1 (the root)

Iteration:
   If k is a leaf, set R_k = { x_k[u] }   // R_k is simply the character of the kth species

   If k is not a leaf,
      Let i, j be the daughter nodes; compute R_i and R_j first
      Set R_k = R_i ∩ R_j if the intersection is nonempty
      Set R_k = R_i ∪ R_j, and C += 1, if the intersection is empty

Termination:
   Minimal cost of tree for column u = C

Example

[Figure: tree over leaves A, A, A, A, B, B, A, B with leaf sets
 {A}, {A}, {A}, {A}, {B}, {B}, {A}, {B}. The seven internal nodes are labeled
 {A} (three nodes), {B} (two nodes), and {A, B} (two nodes)]

Traceback to find ancestral nucleotides

Traceback:

1. Choose an arbitrary nucleotide from R_{2N – 1} for the root

2. Having chosen nucleotide r for parent k,
      If r ∈ R_i, choose r for daughter i
      Else, choose an arbitrary nucleotide from R_i

Easy to see that this traceback produces some assignment of cost C

Example

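[Figure: tree over leaves A, B, A, B with leaf sets {A}, {B}, {A}, {B} and
 internal sets {A, B}, {A}, {A, B}. Several ancestral assignments are shown,
 with substitutions marked x; one is annotated "Still optimal, but …"]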

Multiple Sequence Alignments

Definition

• Given N sequences x_1, x_2, …, x_N:
   - Insert gaps (-) in each sequence x_i, such that
      • All sequences have the same length L
      • Score of the global map is maximum

• A faint similarity between two sequences becomes significant if present in many

• Multiple alignments reveal elements that are conserved among a class of organisms and therefore important in their common biology

• The patterns of conservation can help us infer the function of the element

Scoring Function: Sum Of Pairs

Definition: Induced pairwise alignment
A pairwise alignment induced by the multiple alignment

Example:
x:   AC-GCGG-C
y:   AC-GC-GAG
z:   GCCGC-GAG
Induces:

x: ACGCGG-C;         x: AC-GCGG-C;   y: AC-GCGAG
y: ACGC-GAG;         z: GCCGC-GAG;   z: GCCGCGAG

Sum Of Pairs (cont’d)

• Heuristic way to incorporate evolutionary tree:

[Figure: tree over Human, Mouse, Duck, Chicken]

• Weighted SOP:

S(m) = Σ_{k<l} w_kl s(m^k, m^l)

A Profile Representation

     -   A   G   G   C   T   A   T   C   A   C   C   T   G
     T   A   G   -   C   T   A   C   C   A   -   -   -   G
     C   A   G   -   C   T   A   C   C   A   -   -   -   G
     C   A   G   -   C   T   A   T   C   A   C   -   G   G
     C   A   G   -   C   T   A   T   C   G   C   -   G   G

A    0   1   0   0   0   0   1   0   0  .8   0   0   0   0
C   .6   0   0   0   1   0   0  .4   1   0  .6  .2   0   0
G    0   0   1  .2   0   0   0   0   0  .2   0   0  .4   1
T   .2   0   0   0   0   1   0  .6   0   0   0   0  .2   0
-   .2   0   0  .8   0   0   0   0   0   0  .4  .8  .4   0

• Given a multiple alignment M = m_1…m_n
   - Replace each column m_i with profile entry p_i
      • Frequency of each letter in Σ
      • # gaps
      • Optional: # gap openings, extensions, closings
   - Can think of this as a “likelihood” of each letter in each position

Multiple Sequence Alignments

Algorithms

Multidimensional DP

Generalization of Needleman-Wunsch:

S(m) = Σ_i S(m_i)

(sum of column scores)

F(i_1, i_2, …, i_N):   Optimal alignment up to (i_1, …, i_N)

F(i_1, i_2, …, i_N) = max over all neighbors nbr of the cell in the hypercube of ( F(nbr) + S(nbr) )

Multidimensional DP

• Example: in 3D (three sequences):

• 7 neighbors/cell

F(i, j, k) = max{ F(i – 1, j – 1, k – 1) + S(x_i, x_j, x_k),
                  F(i – 1, j – 1, k    ) + S(x_i, x_j, -  ),
                  F(i – 1, j    , k – 1) + S(x_i, -  , x_k),
                  F(i – 1, j    , k    ) + S(x_i, -  , -  ),
                  F(i    , j – 1, k – 1) + S(-  , x_j, x_k),
                  F(i    , j – 1, k    ) + S(-  , x_j, -  ),
                  F(i    , j    , k – 1) + S(-  , -  , x_k) }

Multidimensional DP

Running Time:

1. Size of matrix: L^N;

   Where L = length of each sequence
         N = number of sequences

2. Neighbors/cell: 2^N – 1

Therefore………………………… O(2^N L^N)

Multidimensional DP

• How do gap states generalize?

   - Require 2^N – 1 states, one per combination of gapped/ungapped sequences
   - Running time: O(2^N × 2^N × L^N) = O(4^N L^N)

[Figure: the 7 states for three sequences X, Y, Z: X, Y, Z, XY, XZ, YZ, XYZ]

Progressive Alignment
[Figure: guide tree in which x and y merge into profile p_xy, z and w merge
 into p_zw, and p_xy and p_zw merge into p_xyzw]

• When evolutionary tree is known:

   - Align closest first, in the order of the tree
   - In each step, align two sequences x, y, or profiles p_x, p_y, to generate a new
     alignment with associated profile p_result

Weighted version:
   - Tree edges have weights, proportional to the divergence in that edge
   - New profile is a weighted average of two old profiles

Progressive Alignment
Example

Profile: (A, C, G, T, -)
   p_x = (0.8, 0.2, 0, 0, 0)
   p_y = (0.6, 0, 0, 0, 0.4)

s(p_x, p_y) = 0.8*0.6*s(A, A) + 0.2*0.6*s(C, A)
            + 0.8*0.4*s(A, -) + 0.2*0.4*s(C, -)
Result: p_xy = (0.7, 0.1, 0, 0, 0.2)

s(p_x, -) = 0.8*1.0*s(A, -) + 0.2*1.0*s(C, -)
Result: p_x- = (0.4, 0.1, 0, 0, 0.5)

Progressive Alignment
[Figure: sequences x, y, z, w with an unknown tree (?)]

• When evolutionary tree is unknown:

   - Perform all pairwise alignments
   - Define distance matrix D, where D(x, y) is a measure of evolutionary
     distance, based on pairwise alignment
   - Construct a tree (UPGMA / Neighbor Joining / Other methods)
   - Align on the tree

Heuristics to improve alignments

• Iterative refinement schemes

• A*-based search

• Consistency

• Simulated Annealing

• …

Iterative Refinement

One problem of progressive alignment:
• Initial alignments are “frozen” even when new evidence arrives

Example:

x:             GAAGTT
y:             GAC-TT   Frozen!

z:             GAACTG   Now clear correct y = GA-CTT
w:             GTACTG

Iterative Refinement

Algorithm (Barton-Sternberg):

1.  For j = 1 to N,
       Remove x_j, and realign it to x_1…x_{j-1} x_{j+1}…x_N
2.  Repeat step 1 until convergence

[Figure: sequences x, y, z; allow y to vary while the projection of x, z stays fixed]

Iterative Refinement

Example: align (x,y), (z,w), (xy, zw):

x:   GAAGTTA
y:   GAC-TTA
z:   GAACTGA
w:   GTACTGA

After realigning y:

x:   GAAGTTA
y:   G-ACTTA   + 3 matches
z:   GAACTGA
w:   GTACTGA

Iterative Refinement

Example not handled well:

x:    GAAGTTA
y1:   GAC-TTA   Realigning any single yi
y2:   GAC-TTA   changes nothing
y3:   GAC-TTA

z:    GAACTGA
w:    GTACTGA

Consistency

[Figure: sequences x, y, z with positions x_i, z_k, y_j, and y_j’ marked]

Basic method for applying consistency

• Compute all pairs of alignments xy, xz, yz, …

• When aligning x, y during progressive alignment,

   - For each (x_i, y_j), let s(x_i, y_j) = function_of(x_i, y_j, a_xz, a_yz)
   - Align x and y with DP using the modified s(.,.) function

Some Resources
Genome Resources

Annotation and alignment genome browser at UCSC
http://genome.ucsc.edu/cgi-bin/hgGateway

Specialized VISTA alignment browser at LBNL
http://pipeline.lbl.gov/cgi-bin/gateway2

ABC—Nice Stanford tool for browsing alignments
http://encode.stanford.edu/~asimenos/ABC/

Protein Multiple Aligners

http://www.ebi.ac.uk/clustalw/
CLUSTALW – most widely used

http://phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
MUSCLE – most scalable

http://probcons.stanford.edu/
PROBCONS – most accurate


```
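
The Parsimony Scoring and Traceback slides above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the `Node` class and the helpers `leaf`, `join`, `score`, `traceback`, and `substitutions` are hypothetical names, the tree is assumed to be binary, and the balanced topology in the usage lines is an assumption made for illustration (the slide's exact tree is not recoverable from the text).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Binary tree node; a leaf carries the observed character for column u."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    char: Optional[str] = None      # set only on leaves
    R: frozenset = frozenset()      # candidate set, filled in by score()
    label: Optional[str] = None     # ancestral character, filled in by traceback()


def leaf(c):
    return Node(char=c)


def join(a, b):
    return Node(left=a, right=b)


def score(node):
    """Post-order pass: fill in R for every node and return the parsimony cost."""
    if node.char is not None:               # leaf: R is just its character
        node.R = frozenset(node.char)
        return 0
    cost = score(node.left) + score(node.right)
    common = node.left.R & node.right.R
    if common:                              # nonempty intersection: no new substitution
        node.R = common
    else:                                   # empty intersection: take the union, C += 1
        node.R = node.left.R | node.right.R
        cost += 1
    return cost


def traceback(node, parent_char=None):
    """Top-down pass: keep the parent's character if it is in R, else pick one from R."""
    if parent_char in node.R:
        node.label = parent_char
    else:
        node.label = min(node.R)            # "arbitrary" choice; min() keeps it deterministic
    if node.char is None:
        traceback(node.left, node.label)
        traceback(node.right, node.label)


def substitutions(node):
    """Count edges whose two endpoints carry different labels."""
    if node.char is not None:
        return 0
    return sum((child.label != node.label) + substitutions(child)
               for child in (node.left, node.right))


# Leaves A A A A B B A B on a balanced tree (topology assumed for illustration).
leaves = [leaf(c) for c in "AAAABBAB"]
root = join(join(join(leaves[0], leaves[1]), join(leaves[2], leaves[3])),
            join(join(leaves[4], leaves[5]), join(leaves[6], leaves[7])))
cost = score(root)
traceback(root)
print(cost, sorted(root.R), substitutions(root))
```

On this assumed tree the labeling produced by `traceback` uses exactly as many substitutions as the cost returned by `score`, which is the property the Traceback slide states.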
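
The Sum Of Pairs and Profile Representation slides can likewise be sketched in Python. This is a rough illustration under stated assumptions: the function names (`induced_pair`, `pair_score`, `sum_of_pairs`, `profile`) are hypothetical, and the match/mismatch/gap values are placeholders chosen for the example, since the slides do not fix a pairwise scoring function s.

```python
from collections import Counter
from itertools import combinations

ALPHABET = "ACGT-"


def induced_pair(a, b):
    """Project a multiple alignment onto two rows, dropping columns where both are gaps."""
    kept = [(p, q) for p, q in zip(a, b) if not (p == "-" and q == "-")]
    return "".join(p for p, _ in kept), "".join(q for _, q in kept)


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Column-wise score of a pairwise alignment (score values are assumptions)."""
    total = 0
    for p, q in zip(a, b):
        if p == "-" or q == "-":
            total += gap
        elif p == q:
            total += match
        else:
            total += mismatch
    return total


def sum_of_pairs(rows, weights=None):
    """S(m) = sum over k < l of w_kl * s(induced alignment of rows k and l)."""
    total = 0
    for k, l in combinations(range(len(rows)), 2):
        w = 1 if weights is None else weights[k][l]
        total += w * pair_score(*induced_pair(rows[k], rows[l]))
    return total


def profile(rows):
    """Per-column frequency of each symbol in ALPHABET (gap included)."""
    n = len(rows)
    columns = []
    for col in zip(*rows):
        counts = Counter(col)
        columns.append({c: counts[c] / n for c in ALPHABET})
    return columns


# The three-sequence example from the Sum Of Pairs slide.
x, y, z = "AC-GCGG-C", "AC-GC-GAG", "GCCGC-GAG"
print(induced_pair(x, y))
print(sum_of_pairs([x, y, z]))

# The five-row alignment from the Profile Representation slide.
rows = ["-AGGCTATCACCTG",
        "TAG-CTACCA---G",
        "CAG-CTACCA---G",
        "CAG-CTATCAC-GG",
        "CAG-CTATCGC-GG"]
print(profile(rows)[0])
```

Passing a `weights` matrix (indexed as `weights[k][l]` for k < l) gives the weighted variant; with no weights every pair counts equally.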