Bio472-Phylogenetics1-4spp

Phylogenetics

• The study of evolutionary relationships.
• Conversion of DNA or protein sequence data into a branching
  diagram (‘tree’) that shows the relationships between the
  sequences.

Phylogenetics
the anatomy of a tree

[Figure: rooted tree with tips A, B, C and D drawn along a time axis. Labels: Node; Branch (Edge, Link); Taxon (Leaf, Tip); Clade.]




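Phylogenetics
the many shapes of trees

[Figure: four drawings of the same tree, shown as equivalent, with tip orders A B C D = A B C D = D C B A = C B A D.]

Phylogenetics
time and most recent common ancestors (MRCA)

[Figure: rooted tree with tips A, B, C, D and a time axis. MRCA (A,B) at time 1; MRCA (AB,C) at time 2; MRCA (ABC,D) at time 3.]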
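The idea of a most recent common ancestor can be made concrete with a small sketch. It assumes a rooted tree stored as nested Python tuples with tip names as strings; this representation and the helper names `leaves` and `mrca` are illustrative choices, not part of the lecture. The MRCA of a set of taxa is taken to be the smallest subtree that contains all of them.

```python
def leaves(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def mrca(tree, taxa):
    """Return the smallest subtree containing every name in taxa, or None."""
    taxa = set(taxa)
    if not taxa <= leaves(tree):
        return None  # this subtree does not contain all requested taxa
    if not isinstance(tree, str):
        for child in tree:
            found = mrca(child, taxa)
            if found is not None:
                return found  # a deeper (more recent) ancestor exists
    return tree


# The tree from the slide: A and B join first, then C, then D.
tree = ((("A", "B"), "C"), "D")
mrca_ab = mrca(tree, {"A", "B"})   # clade joining A and B (time 1)
mrca_abc = mrca(tree, {"A", "C"})  # clade joining AB and C (time 2)
mrca_all = mrca(tree, {"B", "D"})  # the root (time 3)
```

Asking for the MRCA of A and C returns the clade that also contains B, which is why the slide labels that node MRCA (AB,C).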
Phylogenetics
rooted vs. unrooted trees

[Figure: a rooted tree with tips A, B, C, D, and an unrooted tree joining A and B on one side to C and D on the other.]

Rooting Trees

• Root – one node identified as the root from which all other
  nodes descend.
• Rooted trees have a direction corresponding to evolutionary
  time.
   – Allow us to define ancestor-descendant relationships




Rooting Trees

Unrooted Tree
[Figure: unrooted tree with A and B on one side and C and D on the other; branches numbered 1 (internal), 2 (to A), 3 (to B), 4 (to C), 5 (to D).]

Rooted Trees (one for each numbered branch)

   Root   Tip order   Tree
   1      A B C D     ((A,B),(C,D))
   2      A B C D     (A,(B,(C,D)))
   3      B A C D     (B,(A,(C,D)))
   4      C D A B     (C,(D,(A,B)))
   5      D C A B     (D,(C,(A,B)))

Rooting Trees

[Figure: an unrooted 3 taxa tree; 3 possible roots for a 3 taxa tree; 15 possible roots for a 4 taxa tree.]
Rooting Trees

   # Sequences     # Unrooted Trees     # Rooted Trees
        2                    1                     1
        3                    1                     3
        4                    3                    15
        5                   15                   105
        6                  105                   945
        7                  945                10,395
        8               10,395               135,135
        9              135,135             2,027,025
       10            2,027,025            34,459,425

Rooting Trees

[Figure: the unrooted tree of A, B, C, D rooted in two ways. Outgroup Rooting (tips Outgroup, A, B, C, D) and Midpoint Rooting (tips A, B, C, D).]
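The counts in the table follow the standard double-factorial formulas for strictly bifurcating trees: (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n sequences. The slides do not state these formulas, so the sketch below is an added illustration; the function names are hypothetical.

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n):
    """Number of unrooted bifurcating trees for n >= 2 labelled tips."""
    return 1 if n < 3 else double_factorial(2 * n - 5)


def n_rooted(n):
    """Number of rooted bifurcating trees for n >= 2 labelled tips."""
    return double_factorial(2 * n - 3)


# Rebuild the table: (sequences, unrooted trees, rooted trees).
table = [(n, n_unrooted(n), n_rooted(n)) for n in range(2, 11)]
for row in table:
    print(row)
```

Each unrooted tree of n tips has 2n-3 branches, and a root can be placed on any of them, which is why the rooted count for n sequences equals the unrooted count for n+1.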




Rooting Trees
SARS example

Replicase protein

[Figure: unrooted tree of replicase protein sequences with three candidate root positions marked 1, 2 and 3. Tips: feline AAK09095, human NP 073549, porcine AJ271965, porcine AF353511, avian AJ311317, SARS AAP13442, bovine AF220295, murine NP 068668, murine AF201929, murine AF029248.]

[Figure: the three resulting rooted trees, tips listed top to bottom.]
   root 1: avian AJ311317, SARS AAP13442, bovine AF220295, murine AF201929, murine AF029248, murine NP 068668, porcine AJ271965, feline AAK09095, porcine AF353511, human NP 073549
   root 2: avian AJ311317, SARS AAP13442, bovine AF220295, murine AF201929, murine AF029248, murine NP 068668, porcine AJ271965, feline AAK09095, porcine AF353511, human NP 073549
   root 3: bovine AF220295, murine AF201929, murine AF029248, murine NP 068668, SARS AAP13442, avian AJ311317, porcine AJ271965, feline AAK09095, porcine AF353511, human NP 073549

Stavrinides & Guttman 2004 J.Virology 78:76

Phylogenetics
terminology

• Ancestral State
   – The state of the common ancestor
   – a.k.a. plesiomorphy
• Derived State
   – A state that has changed from the ancestor
   – a.k.a. apomorphy
      • Autapomorphy = unique derived state
      • Synapomorphy = shared derived state
• Homoplasy
   – Similarity due to parallel evolution, convergent evolution or
     secondary loss.
Phylogenetics
terminology

[Figure: Derived Character vs. Ancestral Character.]

Phylogenetics
homoplasy

Homoplasy
• Parallel Evolution: independent evolution of same character from same ancestral state
• Convergent Evolution: independent evolution of same character from different ancestral state
• Secondary Loss: reversion to ancestral state




Phylogenetics
homoplasy

Ancestral Sequence
ACTGAACGTAACGC

   Site   Ancestral   Lineage 1     Lineage 2     Type
    2        C        (no change)   C → A         Single substitution
    5        A        A → C → T     (no change)   Multiple substitution
    7        C        C → G         C → A         Coincidental substitutions
    9        T        T → A         T → A         Parallel substitutions
   11        A        A → C → T     A → T         Convergent substitution (*)
   14        C        (no change)   C → T → C     Back substitution (+)

Phylogenetics
fundamental elements

• Taxa
   – Proper sampling
• Loci
   – Homologous sequences
   – Sufficient (but not excessive) genetic variation
   – Proper sampling of genetic variation
   – Independence of characters
• Analysis
   – Quality data
   – Good multiple sequence alignments
Phylogenetics
tree building methods

 • Distance methods
    – UPGMA
    – Neighbor-joining
    – Minimum evolution
 • Character-based (discrete) methods
    – Maximum parsimony
    – Maximum likelihood
 • Note, there are many other powerful and important methods
   that we do not have time to discuss.

Phylogenetics
distance-based methods

 • Relationships based upon the amount of dissimilarity between sequences.
 • Advantages
    – Computationally fast.
    – Single ‘best tree’ found.
 • Disadvantages
    – Assume additive distances.
    – Information loss occurs when transforming sequence data into distances.
    – Uninterpretable branch lengths (e.g. negative distances or distance
      corresponding to fractional substitutions).
    – Single ‘best tree’ found.




Distance-Based Phylogenetic Methods
additive distances and the four-point metric condition

• The evolutionary distance between each pair of taxa is equal to the sum of
  the lengths of each branch in the path connecting them.
   – This includes hypothetical ancestral taxa.
• Must satisfy the four-point metric condition.

   dAB + dCD ≤ maximum(dAC + dBD, dAD + dBC)        dij is the distance between taxa i and j

• This is equivalent to stating that, of the three sums
  (dAB + dCD, dAC + dBD, dAD + dBC), the two largest are equal.

[Figure: unrooted tree in which A (branch v1) and B (branch v2) are joined by internal branch v3 to C (branch v4) and D (branch v5).]

   dAB = v1 + v2
   dAC = v1 + v3 + v4
   dAD = v1 + v3 + v5
   dBC = v2 + v3 + v4
   dBD = v2 + v3 + v5
   dCD = v4 + v5

Distance-Based Phylogenetic Methods
four-point metric condition

   A   ACGCGTTGGGCGATGGCAAC
   B   -------------C--T--T
   C   ----A---AAT----AT--T
   D   --A-A---A-T---AAT--T

        A    B    C    D
   A    -    3    7    8
   B         -    6    7
   C              -    3
   D                   -

[Figure: unrooted tree with A and B on one side and C and D on the other; internal branch of length 4, terminal branches of length 2 and 1 on each side.]

   dAB + dCD ≤ max(dAC + dBD, dAD + dBC)
   3+3 ≤ 8+6 = 7+7
   6 ≤ 14 = 14
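The worked example can be reproduced with a short sketch. It assumes that a '-' in the alignment means "same base as sequence A at this site" (a reading that reproduces the slide's distance matrix) and that the distance is the raw count of differing sites; the helper names are hypothetical.

```python
from itertools import combinations

# Alignment from the slide; '-' is read as "identical to sequence A here".
reference = "ACGCGTTGGGCGATGGCAAC"
compact = {
    "A": reference,
    "B": "-------------C--T--T",
    "C": "----A---AAT----AT--T",
    "D": "--A-A---A-T---AAT--T",
}


def expand(row, ref):
    """Replace each '-' with the reference base at the same site."""
    return "".join(r if c == "-" else c for c, r in zip(row, ref))


def differences(s, t):
    """Number of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s, t))


seqs = {name: expand(row, reference) for name, row in compact.items()}
dist = {
    frozenset(pair): differences(seqs[pair[0]], seqs[pair[1]])
    for pair in combinations(seqs, 2)
}


def four_point_sums(d, a, b, c, e):
    """The three pairwise-distance sums for a quartet, sorted ascending."""
    return sorted([
        d[frozenset((a, b))] + d[frozenset((c, e))],
        d[frozenset((a, c))] + d[frozenset((b, e))],
        d[frozenset((a, e))] + d[frozenset((b, c))],
    ])


sums = four_point_sums(dist, "A", "B", "C", "D")
satisfies_four_point = sums[1] == sums[2]  # the two largest sums are equal
print(sums, satisfies_four_point)
```

With real data the two largest sums are rarely exactly equal, so a practical check would compare them within a tolerance rather than with `==`.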
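Distance-Based Phylogenetic Methods
ultrametric trees

• Distances between all taxa and their common
  ancestors are equal.
• Molecular Clock: rate of evolution is the same and constant for all
  lineages

   A    -
   B    2    -
   C    6    6    -
   D   10   10   10    -
        A    B    C    D

[Figure: rooted tree on a scale from 5 to 0 with terminal branch lengths A 1, B 1, C 3, D 5 and two internal branches of length 2; brackets mark the pairwise distances 2, 6 and 10.]

Distance-Based Phylogenetic Methods
additive trees

• The taxa may diverge different amounts from
  their common ancestor

   A    -
   B    6    -
   C    7    3    -
   D   14   10    9    -
        A    B    C    D

[Figure: rooted tree on a scale from 6 to 0 with terminal branch lengths A 5, B 1, C 1, D 6 and internal branches of length 1 and 2; brackets mark the pairwise distances 6, 7, 14, 3, 10 and 9.]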
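One way to tell the two matrices apart is the three-point condition, a standard test for ultrametric distances that the slides do not state explicitly: for every triple of taxa, the two largest of the three pairwise distances must be equal. The sketch below applies it to both matrices; the helper names and the lower-triangle input format are assumptions made for illustration.

```python
from itertools import combinations


def from_lower_triangle(names, rows):
    """Build a pair -> distance map from lower-triangle rows (no diagonal)."""
    d = {}
    for i, row in enumerate(rows, start=1):
        for j, value in enumerate(row):
            d[frozenset((names[i], names[j]))] = value
    return d


def is_ultrametric(names, d, tol=1e-9):
    """True if, in every triple, the two largest distances are equal."""
    for x, y, z in combinations(names, 3):
        _, mid, top = sorted([
            d[frozenset((x, y))],
            d[frozenset((x, z))],
            d[frozenset((y, z))],
        ])
        if abs(top - mid) > tol:
            return False
    return True


names = ["A", "B", "C", "D"]
clock_like = from_lower_triangle(names, [[2], [6, 6], [10, 10, 10]])
additive_only = from_lower_triangle(names, [[6], [7, 3], [14, 10, 9]])

clock_like_ok = is_ultrametric(names, clock_like)
additive_only_ok = is_ultrametric(names, additive_only)
print(clock_like_ok, additive_only_ok)
```

The second matrix fails on the triple A, B, C (distances 6, 7 and 3), which matches the slide's point that taxa in an additive tree may diverge by different amounts from their common ancestor.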




Distance-Based Phylogenetic Methods
Unweighted Pair Group Method using Arithmetic Averages (UPGMA)

• Assumes rate of change among branches of the tree is constant
  (molecular clock).
• Distances are ultrametric.
   – Distances between all taxa and their common ancestors are
     equal.

[Figure: rooted tree with tips A, B, C, D.]

UPGMA
1.   Given a matrix of pairwise distances, find the clusters i and j such that dij is the minimum.

             A       B       C       D       E
        A    -      0.17    0.21    0.31    0.23
        B            -      0.30    0.34    0.21
        C                    -      0.28    0.39
        D                            -      0.43
        E                                    -

2.   Define the depth of the branch between i and j to be dij/2

     Branch between A and B at depth of 0.17 / 2 = 0.085

[Figure: A and B joined at depth 0.085.]
UPGMA
3.   Define a distance u to each other cluster (k) to be an average of the distances dki and dkj.

             A       B       C       D       E
        A    -      0.17    0.21    0.31    0.23
        B            -      0.30    0.34    0.21
        C                    -      0.28    0.39
        D                            -      0.43
        E                                    -

        dA:B,C = (0.21 + 0.30)/2 = 0.26
        dA:B,D = (0.31 + 0.34)/2 = 0.33
        dA:B,E = (0.23 + 0.21)/2 = 0.22

             A:B      C       D       E
        A:B   -      0.26    0.33    0.22
        C             -      0.28    0.40
        D                     -      0.43
        E                             -

UPGMA
4.   Go back to step 1 with one fewer cluster
     •   clusters i and j have been eliminated, and cluster u has been added.

     Branch between A:B and E at depth of 0.22 / 2 = 0.11

[Figure: A and B joined to each other, then to E at depth 0.11.]
                            D                                  -         0.43
                            E                                              -
   UPGMA                                                                                                    UPGMA


                                     A:B        C             D             E                                                                 A:B           C             D             E
                         A:B          -        0.26          0.33          0.22                                                   A:B          -           0.26          0.33          0.22
                          C                      -           0.28          0.40                                                    C                         -           0.28          0.40
                          D                                    -           0.43                                                    D                                       -           0.43
                          E                                                  -                                                     E                                                     -

If cluster i contains Ti taxa and cluster j contains Tj taxa, then:   dku = (Ti·dki + Tj·dkj) / (Ti + Tj)   If cluster i contains Ti taxa and cluster j contains Tj taxa, then:   dku = (Ti·dki + Tj·dkj) / (Ti + Tj)

                                           dA:B:E,C = (2 x 0.26 + 0.40)/3 = 0.31                                                                    dA:B:E,C = (2 x 0.26 + 0.40)/3 = 0.31
                                                                                                                                                    dA:B:E,D = (2 x 0.33 + 0.43)/3 = 0.36


                                       A:B:E           C             D                                                                          A:B:E              C             D
                             A:B:E       -            0.31                                                                            A:B:E       -               0.31          0.36
                               C                        -           0.28                                                                C                           -           0.28
                               D                                      -                                                                 D                                         -




   UPGMA                                                                                                    UPGMA

                                      A:B:E        C            D                                                                              A:B:E          C             D
                          A:B:E         -         0.31         0.36                                                                A:B:E         -           0.31          0.36
                            C                       -          0.28                                                                  C                         -           0.28
                            D                                    -                                                                   D                                       -


             Branch between C and D at depth of 0.28 / 2 = 0.14

                                                                           A                                                                    dA:B:E,C:D = (3 x 0.31 + 3 x 0.36)/6 = 0.34


                                                                           B
                                                                                                                                                     A:B:E          C:D
                                                                           E                                                            A:B:E          -            0.34
                                                                                                                                         C:D                         -
                                                                           C
                                               0.14
                                                                           D
UPGMA                                                                   Distance-Based Phylogenetic Methods
                                                                        Neighbor-Joining

                                  A:B:E           C:D
                       A:B:E          -           0.34                • Additive trees, but removes the assumption that the data are
                        C:D                        -                    ultrametric.
                                                                         – The taxa may diverge different amounts from their common
                                                                            ancestor.
   Branch between A:B:E and C:D at depth of 0.34 / 2 = 0.17

                                                         A

                                                         B

                                                         E
                                 0.17
                                                         C

                                                         D
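The UPGMA walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `upgma`, the `frozenset`-keyed distance dictionary, and the colon-joined cluster labels are assumptions made for this example. It starts from the initial five-taxon matrix (C–E = 0.39) and keeps unrounded intermediate values, so later depths can differ from the slides in the second decimal place.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    Returns a list of (cluster_label, depth) in the order clusters form.
    """
    d = dict(dist)
    size = {name: 1 for name in names}  # number of taxa in each current cluster
    joins = []
    while len(size) > 1:
        # Join the closest pair; the branch point sits at depth d_ij / 2.
        i, j = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        d_ij = d[frozenset((i, j))]
        t_i, t_j = size.pop(i), size.pop(j)
        u = ":".join(sorted(i.split(":") + j.split(":")))
        # Step 3: size-weighted average distance from u to each other cluster k.
        for k in size:
            d[frozenset((k, u))] = (
                t_i * d[frozenset((k, i))] + t_j * d[frozenset((k, j))]
            ) / (t_i + t_j)
        size[u] = t_i + t_j  # Step 4: i and j are gone, u has been added
        joins.append((u, d_ij / 2))
    return joins


taxa = ["A", "B", "C", "D", "E"]
upper = {
    ("A", "B"): 0.17, ("A", "C"): 0.21, ("A", "D"): 0.31, ("A", "E"): 0.23,
    ("B", "C"): 0.30, ("B", "D"): 0.34, ("B", "E"): 0.21,
    ("C", "D"): 0.28, ("C", "E"): 0.39,
    ("D", "E"): 0.43,
}
dist = {frozenset(pair): value for pair, value in upper.items()}

for label, depth in upgma(taxa, dist):
    print(f"{label}: depth {depth:.3f}")
```

The join order (A:B, then A:B:E, then C:D, then everything) matches the slides; the weighting by cluster size is what keeps every original taxon contributing equally to each average.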




Neighbor-Joining

 1. Given a matrix of pairwise distances (d), calculate the net
    divergence (ri) of each terminal node i from all other taxa. N is the
    number of terminal nodes.
            r_i = \sum_{k=1}^{N} d_{ik}

 2. Create a rate-corrected distance matrix (M).
            Mij = dij – (ri + rj) / (N – 2)
        - the only value of interest is the minimum Mij

 3. Define a new node u whose three branches join nodes i, j and
    the rest of the tree.
            siu = dij / 2 + (ri – rj) / [2(N – 2)]
            sju = dij – siu

 4. Regenerate the matrix by defining the distance from u to each
    remaining terminal node k as:
            dku = (dik + djk – dij) / 2

 5. If more than 2 nodes remain, return to step 1. If the tree is fully
    defined except for the length of the branch joining the 2 remaining
    nodes (i and j), then define this branch as
            sij = dij
Neighbor-Joining
star decomposition

[Figure: star decomposition, four panels with taxa A–H]

Neighbor-Joining

                     Pairwise distances – upper diagonal
                     Rate-corrected distances – lower diagonal

              A        B         C         D         E       ri
    A         -       0.17      0.21      0.31      0.23
    B                   -       0.30      0.34      0.21
    C                             -       0.28      0.39
    D                                       -       0.43
    E                                                 -

 1.   Given a matrix of pairwise distances (d), calculate the net divergence
      (ri) of each terminal node i from all other taxa. N is the number of
      terminal nodes.
              r_i = \sum_{k=1}^{N} d_{ik}




Neighbor-Joining                                                                              Neighbor-Joining


                                          Pairwise distances – upper diagonal                                                       Pairwise distances – upper diagonal
                                          Rate-corrected distances – lower diagonal                                                 Rate-corrected distances – lower diagonal

                      A            B      C            D              E          ri                             A         B        C            D            E      ri
          A           -           0.17   0.21      0.31              0.23       0.92                  A         -       0.17      0.21      0.31            0.23   0.92
          B                        -     0.30      0.34              0.21                             B                   -       0.30      0.34            0.21   1.02
          C                               -        0.28              0.39                             C                            -        0.28            0.39
          D                                            -             0.43                             D                                         -           0.43
          E                                                           -                               E                                                      -



 1.   Given a matrix of pairwise distances (d), calculate the net divergence                   1.   Given a matrix of pairwise distances (d), calculate the net divergence
      (ri) of each terminal node i from all other taxa. N is the number of                          (ri) of each terminal node i from all other taxa. N is the number of
      terminal nodes.                                                                               terminal nodes.
                        r_i = \sum_{k=1}^{N} d_{ik}                                                                       r_i = \sum_{k=1}^{N} d_{ik}
Neighbor-Joining                                                                     Neighbor-Joining


                                         Pairwise distances – upper diagonal                                               Pairwise distances – upper diagonal
                                         Rate-corrected distances – lower diagonal                                         Rate-corrected distances – lower diagonal

                   A         B           C            D            E      ri                         A         B           C            D       E      ri
           A       -        0.17       0.21          0.31         0.23   0.92                A       -        0.17       0.21          0.31    0.23   0.92
           B                  -        0.30          0.34         0.21   1.02                B     -0.48        -        0.30          0.34    0.21   1.02
           C                             -           0.28         0.39   1.18                C                             -           0.28    0.39   1.18
           D                                           -          0.43   1.36                D                                           -     0.43   1.36
           E                                                       -     1.26                E                                                  -     1.26

 1.   Given a matrix of pairwise distances (d), calculate the net divergence          2.   Create a rate-corrected distance matrix (M).
      (ri) of each terminal node i from all other taxa. N is the number of                      Mij = dij – (ri + rj) / (N – 2)
      terminal nodes.                                                                           MAB = 0.17 – (0.92 + 1.02)/3 = -0.48
                        r_i = \sum_{k=1}^{N} d_{ik}




Neighbor-Joining                                                                     Neighbor-Joining


                                         Pairwise distances – upper diagonal                                               Pairwise distances – upper diagonal
                                         Rate-corrected distances – lower diagonal                                         Rate-corrected distances – lower diagonal

                   A         B           C            D            E      ri                         A         B           C            D       E      ri
           A       -        0.17       0.21          0.31         0.23   0.92                A       -        0.17       0.21          0.31    0.23   0.93
           B     -0.48        -        0.30          0.34         0.21   1.02                B     -0.48        -        0.30          0.34    0.21   1.02
           C     -0.49     -0.43         -           0.28         0.39   1.18                C     -0.49     -0.43         -           0.28    0.39   1.19
           D     -0.45     -0.45      -0.57            -          0.43   1.36                D     -0.45     -0.45      -0.57            -     0.43   1.36
           E     -0.50     -0.55      -0.42          -0.44         -     1.26                E     -0.50     -0.55      -0.42          -0.44    -     1.26

      2.   Create a rate-corrected distance matrix (M).                                 2.   Create a rate-corrected distance matrix (M).
                   Mij = dij – (ri + rj) / (N – 2)                                                   Mij = dij – (ri + rj) / (N – 2)

                                                                                             - identify the minimum Mij
Neighbor-Joining                                                                           Neighbor-Joining


                                               Pairwise distances – upper diagonal                                                        Pairwise distances – upper diagonal
                                               Rate-corrected distances – lower diagonal                                                  Rate-corrected distances – lower diagonal

                    A            B            C             D      E      ri                                   A            B            C               D         E      ri
        A           -           0.17         0.21          0.31   0.23   0.93                      A           -           0.17         0.21          0.31        0.23   0.93
        B                        -           0.30          0.34   0.21   1.02                      B                        -           0.30          0.34        0.21   1.02
        C                                      -           0.28   0.39   1.19                      C                                      -           0.28        0.39   1.19
        D                                    -0.57          -     0.43   1.36                      D                                    -0.57            -        0.43   1.36
        E                                                          -     1.26                      E                                                               -     1.26

3.   Define a new node u whose three branches join nodes i, j and the                      3.   Define a new node u whose three branches join nodes i, j and the
     rest of the tree.                                                                          rest of the tree.
          siu = dij / 2 + (ri – rj) / [2(N – 2)]                                                     sC,Node1 = dCD / 2 + (rC – rD) / [2(N – 2)]
                                                                                                              = 0.28 / 2 + (1.19 – 1.36) / [2(3)] = 0.11
         sju = dij - siu                                                                             sD,Node1 = dCD – sC,Node1
                                                                                                              = 0.28 – 0.11 = 0.17




Neighbor-Joining                                                                           Neighbor-Joining

                        Node1        Node2         Node3                                                           Node1        Node2         Node3
             A            -                                                                             A            -
             B            -                                                                             B            -
                                                                                                                                                                                D
             C          0.11                                                                            C          0.11
             D          0.17                                                                            D          0.17
              E           -                                                                              E           -
            Node1         -                                                                            Node1         -
            Node2         -                                                                            Node2         -
                                                   E                                                                                          E

                                     A                      D                                                                   A
                                                                                                                                                             n1


                                         B             C                                                                            B


                                                                                                                                                  0.05                    C
 Neighbor-Joining                                                                                     Neighbor-Joining
                                                                                                     Begin the process again:
                            A           B              C             D             E
                A           -          0.17           0.21          0.31          0.23                   •     Calculate net divergence (ri) and rate-corrected distance matrix (M).
                B                       -             0.30          0.34          0.21                                             A          B            E       node1            ri
                C                                      -            0.28          0.39                              A              -         0.17         0.23         0.12     0.52
                D                                                    -            0.43                              B          -0.37          -           0.21         0.18     0.56
                E                                                                  -                                E          -0.39     -0.43             -           0.27     0.71

4.   Regenerate matrix by defining the distance from u to each remaining terminal                                node1         -0.43     -0.39        -0.37             -       0.57
     node as:    dku = (dik + djk – dij) / 2   where u is the new node joining i and j, and k is a remaining tip
                                                                                                 •     Find the minimum rate corrected distance
                    dA,Node1 = (dA,C + dA,D - dC,D) / 2
                                                                                                 •     Calculate the distance to node2
                    dA,Node1 = (0.21 + 0.31 – 0.28) / 2 = 0.12
                                                                                                             siu = dij / 2 + (ri – rj) / [2(N – 2)]
                                  A              B            E          node1                               sA,Node2 = dA,Node1 / 2 + (rA – rNode1) / [2(N – 2)]
                     A            -           0.17           0.23          0.12                              sA,Node2 = 0.12 / 2 + (0.52 – 0.57) / 4 = 0.05
                     B                           -           0.21          0.18                              sju = dij - siu
                      E                                       -            0.27                              sNode1,Node2 = 0.12 – 0.05 = 0.07
                    node1                                                   -




 Neighbor-Joining                                                                                     Neighbor-Joining

                                      Node1          Node2        Node3
                                                                                                                                        B             E          node2         ri
                            A           -            0.05
                            B           -              -                                                                       B         -          0.21         0.11         0.32
                                                                                             D                                 E       -0.51          -          0.19         0.40
                            C         0.11             -
                            D         0.17             -                                                                  node2        -0.51        -0.51          -          0.31
                            E           -              -
                          Node1         -            0.07
                                                                                                                          sB,Node3 = 0.11 / 2 + (0.32 – 0.31) / 2 = 0.06
                          Node2         -              -
                                                                                                                          sNode2,Node3 = 0.11 – 0.06 = 0.05
                     E

                                                                                                                                                      E          node3
                                                                    n1
                                            n2                                                                                           E            -          0.14
                                                                                                                                       node3                       -
                      B

                                        A                                                                                                     sE,Node3 = 0.14
                                                           0.05                          C
Neighbor-Joining                                                                   Neighbor-Joining

                             Node1         Node2        Node3                                                       Node1             Node2      Node3
                   A              -        0.05              -                                            A              -                0.05        -
                   B              -          -              0.06                                          B              -                 -         0.06
                                                                               D
                   C             0.11        -               -                                            C             0.11               -          -

E                  D             0.17        -               -                                            D             0.17               -          -
                    E             -          -              0.14                                           E             -                 -         0.14
                  Node1           -        0.07              -                                           Node1           -                0.07        -
                  Node2           -          -              0.05                                         Node2           -                 -         0.05



                                                                                                               n3                                B
              n3                                            n1                               n2
                                      n2                                                                                                                        E
                                                                                                                    A
                                                                                                                                                            C
         B                                                                                          n1
                                  A                                                                                                                             D

                                                 0.05                  C                              0.05




Distance-Based Phylogenetic Methods
UPGMA vs. Neighbor-Joining

[Figure: the UPGMA tree (tips A, B, E, C, D) shown beside the neighbor-joining tree (tips A–E, internal nodes n1, n2, n3, scale bar 0.05)]

Distance-Based Phylogenetic Methods
Minimum Evolution

• Finds the tree that minimizes the total branch length of the tree (L).

• Unrooted tree of n taxa has (2n-3) branches, each with length ei

• Estimate the length ei of each branch from the pairwise distances between
  taxa.

            L = \sum_{i=1}^{2n-3} e_i

• Has an optimality criterion - optimum defined by minimum tree length
    – Conceptually similar to Maximum Parsimony except based upon pairwise
      distances

Simulation studies find ME to be one of the most accurate methods.
Distance-Based Phylogenetic Methods                                                 Distance-Based Phylogenetic Methods
Minimum Evolution

1. Pick a tree topology.

2. Estimate the length of each branch of the tree based on the pairwise
   distance between the taxa.

3. Determine the total tree length by summing the individual branch lengths.

4. Return to step 1 and repeat until the smallest (optimal) tree length is found.
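
The four steps above can be sketched for the smallest interesting case, four taxa, where there are only three unrooted topologies to try. This is a minimal sketch, not the method as implemented in any particular program: the distance values are hypothetical, and the branch lengths use the closed-form ordinary least-squares solution that exists for a four-taxon tree (larger trees need a general least-squares fit).

```python
def quartet_branch_lengths(dist, pair1, pair2):
    """Least-squares branch lengths for the unrooted tree (pair1 | pair2)."""
    (i, j), (k, l) = pair1, pair2
    g = lambda x, y: dist[frozenset((x, y))]   # symmetric distance lookup
    across = g(i, k) + g(i, l) + g(j, k) + g(j, l)
    return {
        i: g(i, j) / 2 + (g(i, k) + g(i, l) - g(j, k) - g(j, l)) / 4,
        j: g(i, j) / 2 + (g(j, k) + g(j, l) - g(i, k) - g(i, l)) / 4,
        k: g(k, l) / 2 + (g(i, k) + g(j, k) - g(i, l) - g(j, l)) / 4,
        l: g(k, l) / 2 + (g(i, l) + g(j, l) - g(i, k) - g(j, k)) / 4,
        "internal": across / 4 - (g(i, j) + g(k, l)) / 2,
    }


def minimum_evolution(dist, taxa):
    """Try every four-taxon topology and keep the one with the smallest L."""
    a, b, c, e = taxa
    topologies = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]  # step 1
    best = None
    for pair1, pair2 in topologies:
        branches = quartet_branch_lengths(dist, pair1, pair2)            # step 2
        length = sum(branches.values())                                  # step 3
        if best is None or length < best[0]:                             # step 4
            best = (length, (pair1, pair2), branches)
    return best


# Hypothetical pairwise distances (not taken from the slides).
raw = {("A", "B"): 0.3, ("A", "C"): 0.9, ("A", "D"): 1.0,
       ("B", "C"): 1.0, ("B", "D"): 1.1, ("C", "D"): 0.7}
dist = {frozenset(pair): value for pair, value in raw.items()}

best_length, best_tree, best_branches = minimum_evolution(dist, ("A", "B", "C", "D"))
print(best_tree, round(best_length, 3))
```

Each candidate tree has 2n-3 = 5 branches, so `best_branches` holds five entries. With these made-up distances the tree ((A,B),(C,D)) has the smallest total length.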




                                                                                                                      www.megasoftware.net




  Character-Based Phylogenetic Methods
  Maximum Parsimony

    • Optimization criterion
          – The best explanation of the data is the simplest
                • requires the fewest ad hoc assumptions.
          – Gives rise to the shortest tree
                • The tree with the fewest substitutions.
                • The tree with the fewest homoplasies.

  Maximum Parsimony

    • Advantages
          – Maximizes similarity that can be attributed to common ancestry.
                • Any character that does not fit a given tree requires us to postulate
                  that the similarity is due to homoplasy, not homology.
          – Based on an implicit evolutionary assumption:
                • Evolutionary change is rare.

    • Disadvantages
          – Can be inconsistent under certain models
                • “Long branch attraction”
          – Can produce many equally parsimonious trees.

  [Figure: four-taxon tree with tips A, B, C and D]




  Maximum Parsimony
  minimizing evolutionary change across a tree

                                Sequence 1   ATATT
                                Sequence 2   ATCGT
                                Sequence 3   GCAGT
                                Sequence 4   GCCGT

  assume tree ((1,2),(3,4))

  [Figure: two reconstructions of Position 1 (tip states 1: A, 2: A, 3: G, 4: G) on tree ((1,2),(3,4)): internal nodes A and G require 1 step; internal nodes G and A require 5 steps]

  Maximum Parsimony
  selecting the most parsimonious tree

  Position 1 (tip states 1: A, 2: A, 3: G, 4: G)
      Potential Tree 1   ((1,2),(3,4))   1 Step
      Potential Tree 2   ((1,3),(2,4))   2 Steps
      Potential Tree 3   ((1,4),(2,3))   2 Steps

  Remaining positions on tree ((1,2),(3,4))
      Position 2 (tip states 1: T, 2: T, 3: C, 4: C)   1 Step
      Position 3 (tip states 1: A, 2: C, 3: A, 4: C)   2 Steps (internal nodes both C, or both A)
      Position 4 (tip states 1: T, 2: G, 3: G, 4: G)   1 Step
      Position 5 (tip states 1: T, 2: T, 3: T, 4: T)   0 Steps

                                   Sites
       Tree           1     2     3     4*    5*    Total
  ((1,2),(3,4))       1     1     2     1     0       5
  ((1,3),(2,4))       2     2     1     1     0       6
  ((1,4),(2,3))       2     2     2     1     0       7
  * Not phylogenetically informative

  Most parsimonious tree: ((1,2),(3,4))

  [Figure: the three possible trees, ((1,2),(3,4)), ((1,3),(2,4)) and ((1,4),(2,3))]
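
The step counts in the site table can be recomputed by brute force. The sketch below is illustrative rather than the slides' own procedure: for each site it tries every pair of states at the two internal nodes of a four-taxon tree and keeps the reconstruction needing the fewest changes. The function and variable names are hypothetical.

```python
from itertools import product


def quartet_site_length(a, b, c, d):
    """Fewest changes at one site for tip states a, b | c, d on tree ((a,b),(c,d))."""
    return min(
        # x and y are the states at the two internal nodes; each mismatch
        # along one of the five branches counts as one change
        (x != a) + (x != b) + (x != y) + (y != c) + (y != d)
        for x, y in product("ACGT", repeat=2)
    )


sequences = {1: "ATATT", 2: "ATCGT", 3: "GCAGT", 4: "GCCGT"}
trees = {
    "((1,2),(3,4))": ((1, 2), (3, 4)),
    "((1,3),(2,4))": ((1, 3), (2, 4)),
    "((1,4),(2,3))": ((1, 4), (2, 3)),
}

results = {}
for name, ((p, q), (r, s)) in trees.items():
    per_site = [
        quartet_site_length(sequences[p][i], sequences[q][i],
                            sequences[r][i], sequences[s][i])
        for i in range(5)
    ]
    results[name] = (per_site, sum(per_site))
    print(name, per_site, sum(per_site))
```

Enumerating internal states is only practical for tiny trees, since the number of reconstructions grows as 4 to the power of the number of internal nodes. It does make the "1 Step versus 5 Steps" comparison for Position 1 concrete: both are entries in the same enumeration, and parsimony keeps the minimum.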




  Maximum Parsimony
  the process

  Taxa                        1     2     3     4     5 (root)
  Character state (S1..5)    {A}   {C}   {A}   {G}    {C}

 Initialize tree length: l = 0

Visit internal nodes:
   • If the intersection of the states Si and Sj is not empty,
      set the state of this node (Sk) to the intersection state.
   • Else, set the state of this node (Sk) to be the union of Si
      and Sj.

[Figure: tips Si = {A}, Sj = {C}, then {A}, {G}; first internal node Sk = {A,C}; root {C}; l=0]

Continue with the procedure until you reach the node above the root
  (the basal fork node).

If the root state is not contained in the basal fork node state
  set, increase the length of the tree by 1.

[Figure: tips {A}, {C}, {A}, {G}; internal node sets {A,C}, {A}, {A,G}; root {C}; l=1]

Start at the root node and traverse up through the nodes.

For the case where an internal node has more than one
  potential state:
   • If the derived node state set shares a state with the
     ancestral node, set the derived node state equal to the
     ancestral node state.
   • If the derived node state set does not share a state
     with the ancestral node, pick one of the states from
     the derived node arbitrarily.

[Figure: tips {A}, {C}, {A}, {G}; internal nodes resolved to {A}, {A}, {A}; root {C}; l=1]
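
The pass from the tips toward the root can be sketched as below for the example character ({A}, {C}, {A}, {G}, with root state C). This is a textbook variant of the Fitch procedure, not a transcription of the slides: it adds one change each time an empty intersection forces a union, then applies the root check, instead of counting changes in a separate pass. The tree shape (((1,2),3),4) is an assumption read off the node sets in the figure, and all names are hypothetical.

```python
def fitch_down_pass(node, tip_states, visited):
    """Return (state set, changes so far) for `node`.

    A node is either a tip label or a (left, right) tuple.
    The state set of each internal node is appended to `visited`.
    """
    if not isinstance(node, tuple):                 # a tip: its observed state
        return {tip_states[node]}, 0
    s_i, n_i = fitch_down_pass(node[0], tip_states, visited)
    s_j, n_j = fitch_down_pass(node[1], tip_states, visited)
    common = s_i & s_j
    if common:                                      # intersection not empty
        s_k, changes = common, n_i + n_j
    else:                                           # union, one extra change
        s_k, changes = s_i | s_j, n_i + n_j + 1
    visited.append(s_k)
    return s_k, changes


tip_states = {1: "A", 2: "C", 3: "A", 4: "G"}
root_state = "C"                                    # taxon 5, the root
ingroup = (((1, 2), 3), 4)                          # assumed tree shape

visited = []
basal_set, length = fitch_down_pass(ingroup, tip_states, visited)
if root_state not in basal_set:                     # root check at the basal fork
    length += 1

print(visited)
print(length)
```

The internal node sets come out as {A,C}, {A} and {A,G}, and the character needs 3 changes on this tree.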
   Maximum Parsimony
   the process

Count the number of state changes and add this number to
  the tree length.

In this case, number of state changes = 3

[Figure: tips {A}, {C}, {A}, {G}; internal nodes {A}, {A}, {A}; root {C}; l=4]

Note that there is an alternative, equally parsimonious
  reconstruction for this tree.

[Figure: tips {A}, {C}, {A}, {G}; internal nodes {A}, {A}, {G}; root {C}; l=4]

The total number of evolutionary changes on a tree (the tree’s
  length, L) is the sum of the number of changes at each
  position. If we have k positions, each with a length l_i, the
  total length L of a tree is:

      L = \sum_{i=1}^{k} l_i

   Maximum Parsimony
   generalized parsimony

 • We can easily establish weighting schemes for all possible
   substitutions to increase the biological realism.

   Substitution Model: equal substitution probability (every change costs 1)
   Step Matrix
             A   C   G   T
         A   0   1   1   1
         C   1   0   1   1
         G   1   1   0   1
         T   1   1   1   0

   Substitution Model: transversions more costly than transitions
   (transitions A↔G and C↔T cost 1, transversions cost 2)
   Step Matrix
             A   C   G   T
         A   0   2   1   2
         C   2   0   2   1
         G   1   2   0   2
         T   2   1   2   0
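
One common way to score a tree under a step matrix is Sankoff's dynamic programming algorithm. The slides do not name an algorithm, so treat the following as an illustrative sketch with hypothetical function names. For every node it stores the cheapest cost of the subtree below it for each possible state, and the site length is the minimum at the root.

```python
BASES = "ACGT"
INF = float("inf")


def step_matrix(ts, tv):
    """Cost of every substitution: `ts` for transitions, `tv` for transversions."""
    purines = {"A", "G"}
    cost = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                cost[x, y] = 0
            elif (x in purines) == (y in purines):   # A<->G or C<->T
                cost[x, y] = ts
            else:
                cost[x, y] = tv
    return cost


def sankoff(node, tip_states, cost):
    """Map each state to the minimum cost of the subtree if `node` has that state."""
    if not isinstance(node, tuple):                  # a tip: only its own state is free
        return {s: (0 if s == tip_states[node] else INF) for s in BASES}
    children = [sankoff(child, tip_states, cost) for child in node]
    return {
        s: sum(min(cost[s, t] + child[t] for t in BASES) for child in children)
        for s in BASES
    }


def tree_length(tree, seqs, cost):
    """Sum the minimum per-site costs over all alignment positions."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in seqs.items()}
        total += min(sankoff(tree, column, cost).values())
    return total


sequences = {1: "ATATT", 2: "ATCGT", 3: "GCAGT", 4: "GCCGT"}
trees = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

equal = [tree_length(t, sequences, step_matrix(1, 1)) for t in trees]
weighted = [tree_length(t, sequences, step_matrix(1, 2)) for t in trees]
print(equal, weighted)
```

With equal costs this reduces to ordinary parsimony. Under the second matrix, position 4 (T in sequence 1, G elsewhere) costs 2 on every tree because T↔G is a transversion.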




   Maximum Parsimony
   generalized parsimony

                                    Sequence 1       ATATT
                                    Sequence 2       ATCGT
                                    Sequence 3       GCAGT
                                    Sequence 4       GCCGT

   [Figure: the three possible trees with the position 3 states at the tips (1: A, 2: C, 3: A, 4: C)]

                              Tv = Ts = 1                     Tv = 2, Ts = 1
          Tree        1   2   3   4   5   Total       1   2   3   4   5   Total
      ((1,2),(3,4))   1   1   2   1   0     5         1   1   4   2   0     8
      ((1,3),(2,4))   2   2   1   1   0     6         2   2   2   2   0     8
      ((1,4),(2,3))   2   2   2   1   0     7         2   2   4   2   0    10

   Maximum Parsimony
   weighted parsimony

 • Different sequence positions often evolve at different rates (e.g. 3rd
   positions of codons).

 • Rapidly evolving sites may quickly become saturated with change.

      L = \sum_{i=1}^{k} w_i l_i

      w_i = weight of position i
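
As a small worked illustration of the weighted length, the sketch below applies per-position weights to the unweighted (Tv = Ts = 1) site lengths from the table. The weights are hypothetical, chosen only to show a fast-evolving position being down-weighted, and the function name is an assumption.

```python
def weighted_length(site_lengths, weights):
    """L = sum over positions of w_i * l_i."""
    if len(site_lengths) != len(weights):
        raise ValueError("need exactly one weight per position")
    return sum(w * l for w, l in zip(weights, site_lengths))


# Per-site lengths l_i for each tree, from the Tv = Ts = 1 columns.
site_lengths = {
    "((1,2),(3,4))": [1, 1, 2, 1, 0],
    "((1,3),(2,4))": [2, 2, 1, 1, 0],
    "((1,4),(2,3))": [2, 2, 2, 1, 0],
}

# Hypothetical weights: position 3 counts half as much as the others.
weights = [1.0, 1.0, 0.5, 1.0, 1.0]

scores = {tree: weighted_length(l, weights) for tree, l in site_lengths.items()}
print(scores)
```

Setting every weight to 1 recovers the unweighted totals of 5, 6 and 7.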

				
posted:2/26/2010