An introduction to trees and networks
David Penny
Allan Wilson Centre for Molecular Ecology and Evolution
overview of talk
trees and networks (with cycles)
spanning and phylogenetic (Steiner) trees
exploratory data analysis, visualization
multiscale
splits
compatibility/incompatibility
application - the MinMax squeeze
tree terminology
[Figure: panels 1a to 1i illustrating tree terminology on taxa t1 to t6, with labelled nodes m to s and edge lengths; 1c, spanning tree; 1f, phylogenetic (Steiner) tree]
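The difference between a spanning tree and a Steiner tree can be sketched on a toy example. This is only an illustration: the three sequences, the Hamming metric and the helper names (`hamming`, `mst_length`, `majority_sequence`) are assumptions, not data or methods from the talk. A spanning tree may only join observed sequences, whereas a Steiner tree is allowed to add inferred internal nodes, which can shorten the tree.

```python
def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mst_length(seqs):
    """Length of a minimum spanning tree (Prim's algorithm) under Hamming distance."""
    in_tree = [seqs[0]]
    remaining = list(seqs[1:])
    total = 0
    while remaining:
        # cheapest edge from the growing tree to a sequence not yet in it
        d, nxt = min((hamming(u, v), v) for u in in_tree for v in remaining)
        total += d
        in_tree.append(nxt)
        remaining.remove(nxt)
    return total

def majority_sequence(seqs):
    """Site-wise majority state; assumes binary sites and an odd number of sequences."""
    return "".join(max("01", key=col.count) for col in zip(*seqs))

observed = ["110", "101", "011"]  # hypothetical toy data, not from the slides

# Spanning tree: only observed sequences may be nodes.
spanning_length = mst_length(observed)

# Tree with one inferred internal node joined to every observed sequence.
steiner_node = majority_sequence(observed)
steiner_length = sum(hamming(steiner_node, s) for s in observed)

print(spanning_length, steiner_node, steiner_length)
```

On this toy input the spanning tree has length 4, while adding the unobserved node 111 gives a tree of length 3.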
multiscale, population growth
[Figure: population-growth diagram with axis ticks 0 to 3]
lose internal nodes
lose nodes near the root
minimum spanning subtrees
mixture of spanning and Steiner trees, even using complete sequences
the question of multiscale
effects of time scale
sequences abundant | sequences sparse
ancestral states still present | speciation only at tips
multifurcations and cycles | binary phylogenetic trees
sites in same rate class (gamma) | sites on or off, covarion model
neutral plus deleterious, hypervariable sites | average over all sites
TIME SCALE: generations, populations, species, families, orders, classes, phyla
microevolution to macroevolution
Data set against RASA
dataset used to argue poor performance of RASA, Mol. Biol. Evol. 19:14–23. 2002
RASA1.NXS
taxon A 1000010000
taxon B 1000001000
taxon C 0100001000
taxon D 0100000100
taxon E 0010000100
taxon F 0010000010
taxon G 0001000010
taxon H 0001000001
taxon I 0000100001
taxon J 0000110000
[Figure: split graph of the ten taxa, Fit=100.0 ntax=10 nchar=10 -dsplits -hamming, scale bar 0.01]
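A quick way to see why this matrix does not fit a tree is to compute Hamming distances directly from the RASA1.NXS rows. The sketch below copies the ten rows from the slide; the variable names and the choice to list each taxon's nearest neighbours are illustrative assumptions, and it does not reproduce the split-graph computation itself.

```python
taxa = {
    "A": "1000010000",
    "B": "1000001000",
    "C": "0100001000",
    "D": "0100000100",
    "E": "0010000100",
    "F": "0010000010",
    "G": "0001000010",
    "H": "0001000001",
    "I": "0000100001",
    "J": "0000110000",
}

def hamming(a, b):
    """Number of characters at which two rows differ."""
    return sum(x != y for x, y in zip(a, b))

names = list(taxa)
dist = {(p, q): hamming(taxa[p], taxa[q]) for p in names for q in names}

# Distinct distances between different taxa.
off_diagonal = {d for (p, q), d in dist.items() if p != q}

# For each taxon, the taxa at the smallest non-zero distance.
nearest = {
    p: [q for q in names if q != p and dist[p, q] == min(off_diagonal)]
    for p in names
}

print(sorted(off_diagonal))
for p in names:
    print(p, nearest[p])
```

Every taxon shares exactly one character with each of two others, so each has two nearest neighbours and the taxa close up into a ring A, B, C, ..., J, A rather than a tree.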
some reasons for cycles?
repeated mutations
- random (insufficient information)
- systematic [e.g. GC bias]
- positive selection
hybridization
gene conversion/concerted evolution
lateral transfer (incl. endosymbiosis)
alternative (non-Steiner tree) models
(incorrect assumptions about the mechanism)
exploratory data analysis - EDA
[Figure: three displays of a mammal data set: a strict consensus tree, a 10% consensus, and a third panel that adds marsupial and monotreme taxa (bandicoot, opossum, wallaroo, possum, platypus); taxon labels omitted]
consensus methods lose information
Three data sets from Bandelt et al. (1995). The consensus of each would be a polytomy as for (a). However, the median network of each shows that each has a different data structure.
          1     2    3
taxon A  1000  110  1000
taxon B  0100  101  0111
taxon C  0010  011  0100
taxon D  0001  000  0010
[Figure: median networks of the three data sets, with nodes labelled by sequence]
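One way to make the different data structures visible without drawing the networks is to turn each column into a split of the taxa and count how many pairs of splits clash. This is a minimal sketch, not the median-network construction: the three matrices are copied from the slide, while the function names and the decision to ignore trivial (single-taxon) splits are assumptions made for illustration.

```python
from itertools import combinations

TAXA = "ABCD"
datasets = {
    1: ["1000", "0100", "0010", "0001"],
    2: ["110", "101", "011", "000"],
    3: ["1000", "0111", "0100", "0010"],
}

def character_splits(rows):
    """Non-trivial split of the taxa induced by each column (site)."""
    splits = []
    for col in zip(*rows):
        ones = frozenset(t for t, state in zip(TAXA, col) if state == "1")
        zeros = frozenset(TAXA) - ones
        if min(len(ones), len(zeros)) >= 2:  # skip single-taxon splits
            splits.append((ones, zeros))
    return splits

def compatible(s, t):
    """Two splits are compatible if at least one of the four intersections is empty."""
    return any(not (a & b) for a in s for b in t)

summary = {}
for key, rows in datasets.items():
    splits = character_splits(rows)
    clashes = sum(not compatible(s, t) for s, t in combinations(splits, 2))
    summary[key] = (len(splits), clashes)

print(summary)
```

Data set 1 has no informative splits, data set 2 has three mutually incompatible splits, and data set 3 has two splits that clash with each other, so the three matrices differ even though none of them supports a resolved tree.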
are sites always in the same rate class?
[Figure: three panels: observed (NJ), expected under i.i.d. (split graph/dcov), observed (split graph/dcov); groups shown are cyanobacteria, low G+C, high G+C and green plants, with edges labelled γ and α and numeric edge labels 100, 100, 100, 87, 98, 82]
Similar results for RecA, 16SrDNA, Hsp60, FtsZ
AtpB (2000) MBE 17, 835-838
Lento plot, ex Kerryn
[Figure: Lento plot of support and clashes; y-axis from -0.06 to 0.12; splits marked * and +]
signals, Kerry’s birds
[Figure: signal plot with series labelled falcon & stork/penguin and chicken & duck/goose; axis tick values omitted]
when MP and ML are equivalent
- for a single column of data (this leads to),
- there is no common mechanism shared by the characters,
- maximum evolutionary pathway likelihood (ML ep) is equivalent to maximum parsimony,
- there are effectively infinite character states (the same mutation never occurs twice)
- abundant sequences?
loss of information
[Figure: plot with curves labelled 0.01, 0.005, 0.002 and 0.001; logarithmic x-axis from 1 to 10000, y-axis from -0.2 to 1; axis names not recoverable]
splits
- from trees and networks
- from data - sequences (character states)
- from data - distances
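what is a split - from trees
[Figure: unrooted tree with leaves a, b, c, d, e; one internal edge separates leaf set A from leaf set B]
Removing that edge partitions the leaves into A = {a, b} and B = {c, d, e}.
Denote this split by A | B.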
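Reading splits off a tree can be sketched as follows: delete each edge in turn and record which leaves fall on either side. The tree topology below is hypothetical (the slide only fixes the edge that separates {a, b} from {c, d, e}); the internal node names `u`, `v`, `w` and the function `edge_splits` are assumptions made for the example.

```python
def edge_splits(edges, leaves):
    """For each edge of a tree, return the split of the leaves obtained by removing it."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = []
    for u, v in edges:
        # Collect everything reachable from u without crossing the edge (u, v).
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and not (x == u and y == v):
                    seen.add(y)
                    stack.append(y)
        side = frozenset(seen & leaves)
        result.append((side, frozenset(leaves - side)))
    return result

# Hypothetical unrooted tree: (a, b) join at u, c joins at v, (d, e) join at w.
edges = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"),
         ("v", "w"), ("d", "w"), ("e", "w")]
leaves = set("abcde")

splits = edge_splits(edges, leaves)
nontrivial = [s for s in splits if min(len(s[0]), len(s[1])) >= 2]

for a_side, b_side in nontrivial:
    print(sorted(a_side), "|", sorted(b_side))
```

Each of the seven edges gives one split; the five leaf edges give trivial splits and the two internal edges give {a, b} | {c, d, e} and {a, b, c} | {d, e}.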
splits in a network
[Figure: split network on leaves a, b, c, d, e]
{a,b,c} | {d,e}
{a,b,e} | {c,d}
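Why these two splits need a network rather than a tree can be checked with the standard pairwise compatibility test: two splits A1 | B1 and A2 | B2 fit on one tree only if at least one of the four intersections A1∩A2, A1∩B2, B1∩A2, B1∩B2 is empty. The sketch below applies that test to the two splits on this slide; the function names are illustrative.

```python
def intersections(s1, s2):
    """The four pairwise intersections between the sides of two splits."""
    return [x & y for x in s1 for y in s2]

def compatible(s1, s2):
    """Compatible if at least one of the four intersections is empty."""
    return any(len(part) == 0 for part in intersections(s1, s2))

split_1 = (frozenset("abc"), frozenset("de"))
split_2 = (frozenset("abe"), frozenset("cd"))

parts = [sorted(p) for p in intersections(split_1, split_2)]
print(parts)
print(compatible(split_1, split_2))

# For comparison, a split that does sit on a tree with either of them.
tree_split = (frozenset("ab"), frozenset("cde"))
print(compatible(tree_split, split_1), compatible(tree_split, split_2))
```

All four intersections ({a, b}, {c}, {e}, {d}) are non-empty, so the two splits are incompatible and can only be shown together as a cycle (a box) in a network.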
summary of introduction
overview of talk
trees and networks (with cycles)
spanning and phylogenetic (Steiner) trees
exploratory data analysis, visualization
multiscale – generations to macroevolution
splits – from trees, characters or distances
compatibility/incompatibility
application - the MinMax squeeze