Data Matrix Construction. Two criteria were applied for selection of genes and
species: a) only single-copy loci for which data from more than 10% of species in the
clade were available were retained, and b) only species with sequence data from more
than 3 of these loci were retained. Enforcement of these two criteria resulted in a data
matrix containing 77 species out of a total of 211 described species for this clade. The
density of taxon sampling was similar across major lineages (Supplementary Table 1).
D. willistoni and D. sturtevanti were utilised as outgroups. We used only loci present
as a single copy in the D. melanogaster genome to reduced concerns stemming from
the existence of hidden paralogs. In total, 11 loci fitting the selection criteria were
identified. Six of the loci are physically located in the nuclear genome, with the
remaining 5 located in the mitochondrial genome. Each species in the data matrix
contained an average of 6 loci (Supplementary Table 2).
Acquisition of genes from D. gunungcola and D. tristis. Gene fragments were
obtained by PCR amplification (primers are indicated in Supplementary Table 3) from
genomic DNA and sequencing of the cloned product. D. gunungcola: Adh (accession
#: AM 181670), COII (accession #: AM 181672), hb (accession #: AM 181673);
D.tristis COII (accession #: AM 181671), hb (accession #: AM 181668), gpdh
(accession #: AM 181669).
Acquisition of genes from raw genomic data. To acquire genes from unanotated
genomic data, we used the publicly available trace sequence data at the NCBI Trace
Archive from the genome projects of D. ananassae, D. erecta, D. sechellia, D.
simulans, D. yakuba, D. persimilis, D. pseudoobscura, and D. willistoni. Trace
sequence files for the genes of interest from these taxa were retrieved using Cross-
Species Mega BLAST (Altschul et al., 1997). Orthologs from various other
Drosophilid species were used as input queries for blasting each taxon’s genomic
trace archive. The resulting trace sequences were then used to build contigs using
Sequencher 4.1 (Gene Codes).
Phylogenetic Analysis. Genes were aligned with ClustalW (Thompson et al., 1994).
Indels and areas of uncertain alignment were excluded from further analysis.
Phylogenies were estimated using Bayesian inference (BI), maximum parsimony
(MP) and maximum likelihood (ML). The ML tree with clade support values under
all three optimality criteria is shown in Fig. S1.
Bayesian inference was conducted using MrBayes, version 3.1 (Huelsenbeck and
Ronquist, 2001). The analyses were designed so that each of the 11 loci was allowed
its own model of sequence evolution, rate of heterogeneity among sites and
proportion of invariable sites. Two independent analyses were run in parallel. Each
analysis contained 4 chains (1 cold and 3 incrementally heated) and trees were
sampled every 100 generations. The analyses were run until the average deviation of
split frequencies fell below 0.015 (after approximately 1,400,000 generations).
Following inspection of the likelihood values, the first 139,200 generations from each
of the two analyses were discarded as the burn-in. Clade support was assessed by
inspection of posterior probabilities from a total sample of 24,750 trees. Maximum
Parsimony analyses were performed using PAUP*, version 4.0b10 (Swofford, 2002).
All characters equally weighted and clade support was assessed using non-parametric
bootstrap re-sampling (100 replicates). Maximum likelihood analyses were performed
using PHYML, version 2.4.4 (Guindon and Gascuel, 2003). The model of nucleotide
evolution utilised was estimated by Modeltest (Posada and Crandall, 1998). The best-
fit model was the general-time reversible (R-matrix: A-C = 1.29, A-G = 6.37, A-T =
3.47, C-G = 3.36, C-T = 9.31, G-T = 1.00), and included rate heterogeneity across
sites (shape parameter alpha=0.79) and a proportion of invariable sites (p-inv = 0.53).
Clade support was assessed using non-parametric bootstrap re-sampling (100
Ancestral Trait Reconstruction. For the purposes of this study, the wing spot was
considered as a single character with two states (present / absent). We reconstructed
the evolution of the wing spot using Bayesian Inference as implemented in
BayesMultistate (Pagel et al., 2004), which provides a statistical framework that
explicitly accomodates for both phylogenetic and mapping uncertainty in its ancestral
character reconstruction estimates. This is achieved by construction of a model of
character evolution that is defined by: a) the two rates q01 and q10 (corresponding to
the rate of gain of spot and the rate of loss of wing spot, respectively), b) prior
probability distributions for these two rates, c) the distribution of trees from the BI
analysis, c) the likelihood function to evaluate the data on the distribution of trees
given the rates (Pagel et al., 2004).
We first tested whether the rate of gain (q01) was significantly different from the rate
of loss (q10), by performing ancestral character reconstruction using ML as
implemented in Discrete (Pagel, 1994). Comparison of the likelihood score obtained
when q01 = q10 with the likelihood score when q01 ≠ q10 showed that the model of
character evolution in which q01 ≠ q10 offers a significantly better fit to the data (LR =
7.58, alpha = 0.05, critical value for χ2 with 1 df = 3.841). The ML analyses also
allowed the construction of empirical bayes prior distributions for both rates (for q01
the prior distribution was uniform, ranging from 0.7 to 3.3; for q10 the prior
distribution was uniform, ranging from 6.4 to 19.2).
Analyses of the parameter values from the BI analyses (see above) showed that
neighboring trees in our 24,750 tree sample were not statistically independent. To
obtain independent phylogenetic trees from the posterior distribution, we retained
only 150 trees from a total sample of 24,750 trees (only one every 165 trees from the
posterior distribution was retained). Ancestral character reconstruction analyses using
BayesMultistate (Pagel et al., 2004) were performed on these 150 trees. Ancestral
character reconstruction analyses were run for 7,500,000 generations, with sampling
of parameter values occurring every 500 generations. The first 10% of the sample
discarded as burn-in, with the remaining 90% of the sample used for estimation of the
posterior probabilities for the ancestral character states of the most recent common
ancestors of various clades (Fig. 1). Similar results were obtained by independent runs
and by using a flat prior distribution (ranging from 0 to 100) for both rates (data not
Molecular Clock Analysis. Divergence date estimates for the whole clade were
obtained using the penalized likelihood method as implemented in r8s, version 1.7
(Sanderson, 2003). Confidence intervals for dates were estimated by repeating the
analyses on 100 nonparametric bootstrap replicates. The five paleobiogeographic
dates used in all analyses as calibration points are listed below.
a) 80 Myr was used as an upper bound on the age of this Drosophila clade. This date
is based on biogeographic evidence arguing that the age of the Drosophilidae is not
older than 80 Myr (Beverley and Wilson, 1984).
b) 36 Myr was used as a lower bound on the age of the most recent common ancestor
of the obscura and melanogaster species groups. This date stems from
Throckmorton’s (1975) inference showing that five Drosophilid lineages (two
subgenus Drosophila lineages, the sophophoran obscura lineage, Scaptodrosophila
and Hirtodrosophila) exhibit the same biogeographic patterns and that these patterns
indicate a Cenozoic age for each of these lineages (Powell and DeSalle, 1995).
c-e) 17 Myr was used as an upper bound on the age of the melanogaster species
subgroup, 7 Myr was used as an upper bound on the age of divergence between D.
erecta and D. orena, and 3 Myr was used as an upper bound on the age of divergence
between D. melanogaster and D. simulans. All dates are based on the biogeographic
inferences provided by Lachaise and colleagues (1988).
Drosophila stocks and maintenance. All stocks were raised on standard cornmeal or
Wheeler-Clayton (1965) food at 25°C, except D. tristis which needs to be maintained
at 18°C. Constructs were injected into D. melanogaster yw mutants as previously
described (Miller et al., 2002; Spradling and Rubin, 1982). Fly stocks were kindly
provided by M. T. Kimura (D. elegans, Hong-Kong and D. gunungcola, Sukarami,
Indonesia), S. Hagemann (D. tristis, Tübingen, Germany, D. guanche, Canary
Islands), V. Stamataki (D. tristis, Heidelberg, Germany), A. Kopp (D. mimetica,
Kuala Belalong, Brunei) and the Tucson Drosophila Stock Center (AZ, USA) (D.
Immunochemistry. Pupal wing dissected 70 hours after puparium formation were
stained with a rat anti-Yellow antibody (Wittkopp et al., 2002), as described
previously (Gompel et al., 2005).
Cloning. All the y fragments for reporter constructs were PCR amplified from
genomic DNA from various species (see Supplementary Table 3 for primer
sequences) and cloned into a customized version of a P-based transformation vector
(Barolo et al., 2000; Gompel et al., 2005). The entire 5’ regions of D. elegans and D.
gunungcola extend from the y coding sequence to the closest upstream gene (CG3777
in D. melanogaster (Flybase). The chimeric spotele or spotgun constructs were built by
stitching PCR sub-fragments or using the Quickchange Site-directed Mutagenesis kit
(Stratagene, La Jolla, USA) (primer sequences available on request).
Imaging. Wing imaging (transmitted light and confocal) was performed as described
previously (Gompel et al., 2005) on the wing of 5-day-old adult flies, or pupae 70 to
90 h after puparium formation. Courtship behaviours were imaged on a Leica M420
stereoscope equipped with a JVC TK-C1380 charge-coupled device camera.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and
Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 25, 3389-402.
Barolo, S., Carver, L. A. and Posakony, J. W. (2000). GFP and beta-galactosidase
transformation vectors for promoter/enhancer analysis in Drosophila. Biotechniques
29, 726, 728, 730, 732.
Beverley, S. M. and Wilson, A. C. (1984). Molecular evolution in Drosophila and the
higher Diptera II. A time scale for fly evolution. J Mol Evol 21, 1-13.
Gompel, N., Prud'homme, B., Wittkopp, P. J., Kassner, V. A. and Carroll, S. B.
(2005). Chance caught on the wing: cis-regulatory evolution and the origin of pigment
patterns in Drosophila. Nature 433, 481-7.
Guindon, S. and Gascuel, O. (2003). A simple, fast, and accurate algorithm to
estimate large phylogenies by maximum likelihood. Syst Biol 52, 696-704.
Huelsenbeck, J. P. and Ronquist, F. (2001). MRBAYES: Bayesian inference of
phylogenetic trees. Bioinformatics 17, 754-755.
Lachaise, D., Cariou, M.-L., David, J. R., Lemeunier, F., Tsacas, L. and Ashburner,
M. (1988). Historical biogeography of the Drosophila melanogaster species
subgroup. Evolutionary Biology 22, 159-226.
Miller, D. F., Holtzman, S. L. and Kaufman, T. C. (2002). Customized microinjection
glass capillary needles for P-element transformations in Drosophila melanogaster.
Biotechniques 33, 366-7, 369-70, 372 passim.
Pagel, M. (1994). Detecting correlated evolution on phylogenies: a general method
for the comparative analysis of discrete characters. Proceedings of the Royal Society
of London Series B-Biological Sciences 255, 37-45.
Pagel, M., Meade, A. and Barker, D. (2004). Bayesian estimation of ancestral
character states on phylogenies. Syst Biol 53, 673-84.
Posada, D. and Crandall, K. A. (1998). MODELTEST: testing the model of DNA
substitution. Bioinformatics 14, 817-818.
Powell, J. R. and DeSalle, R. (1995). Drosophila molecular phylogenies and their
uses. Evolutionary Biology 28, 87-138.
Sanderson, M. J. (2003). r8s: inferring absolute rates of molecular evolution and
divergence times in the absence of a molecular clock. Bioinformatics 19, 301-2.
Spradling, A. C. and Rubin, G. M. (1982). Transposition of cloned P elements into
Drosophila germ line chromosomes. Science 218, 341-7.
Swofford, D. L. (2002). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other
Methods), (ed. Sunderland, MA: Sinauer.
Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994). Clustal-W - improving the
sensitivity of progressive multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22,
Throckmorton, L. (1975). The phylogeny, ecology and geography of Drosophila. In
Handbook of Genetics, vol. 3 (ed. R. C. King), pp. 421-469. New York: Plenum
Wheeler, M. R. and Clayton, F. E. (1965). A new Drosophila culture technique.
Drosophila Information Service 40, 98.
Wittkopp, P. J., True, J. R. and Carroll, S. B. (2002). Reciprocal functions of the
Drosophila yellow and ebony proteins in the development and evolution of pigment
patterns. Development 129, 1849-58.