Drosophila Microarray Analysis
RNA isolation, Hybridization, Normalization, DE analysis
Total RNA was isolated from eye imaginal discs with TRIzol reagent (Invitrogen) and further
purified using an RNeasy kit (Qiagen). All samples were processed according to the Affymetrix
protocol (Affymetrix expression manual), and a total of 15 µg of fragmented, labeled cRNA
was hybridized to Affymetrix GeneChip arrays (Drosophila genome 385K 2.0). The chips were
then washed and stained using an Affymetrix Fluidics Station 450, and fluorescence was
detected using the Affymetrix GS3000. There were three biological replicates for each
genotype. A total of 12 Affymetrix raw files (*.CEL files) were background corrected and
normalized using the R-Bioconductor (Gentleman et al. 2004) package “affy” (Irizarry et al.
2003a; Gautier et al. 2004). Following normalization, the RMA (Robust Multi-chip Average)
method (Irizarry et al. 2003a; Irizarry et al. 2003b) was used to estimate gene expression.
The Bioconductor package “arrayQualityMetrics” (Kauffmann et al. 2009) was used to assess
the quality and variability of the microarray experiment. For differential expression (DE)
analysis of the preprocessed data set, the Bioconductor package “limma” (Smyth 2004), which
fits a linear model with an empirical Bayes method, was used to calculate the log2 fold
change (logFC). The fold change (FC) is defined as the ratio of intensities between the two
experimental conditions under comparison. Because each condition corresponded to a population
of RNA comprising three biological replicates, the ratio of the average intensities across
replicates was calculated. The package computes a t-test p-value for each gene and then a
p-value corrected for multiple testing (FDR) (Benjamini and Hochberg 1995) for each
comparison between two experimental conditions. For all comparisons, a gene was considered
differentially expressed if it had an FDR-adjusted p-value < 0.05 (irrespective of its logFC).
To generate a more refined set of DE genes, a more stringent filter of |logFC| > 0.5
(approximately a >1.4-fold increase or decrease) was additionally applied to all comparisons.
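The selection rule described above can be illustrated with a short Python sketch. It is not the limma implementation: the moderated t-test is not reproduced, and the per-gene p-values are simply taken as input. The sketch assumes expression values are on the log2 scale (as RMA output is), so that a ratio of intensities becomes a difference of means. The function names and the example values are hypothetical.

```python
def log2_fold_change(cond_a, cond_b):
    """Difference of mean log2 expression (condition A minus condition B).

    Assumes both lists hold log2-scale values, one per biological replicate.
    """
    return sum(cond_a) / len(cond_a) - sum(cond_b) / len(cond_b)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(genes, logfc, pvalues, fdr_cutoff=0.05, logfc_cutoff=0.0):
    """Keep genes with adjusted p-value < fdr_cutoff and |logFC| > logfc_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, lfc, q in zip(genes, logfc, fdr)
        if q < fdr_cutoff and abs(lfc) > logfc_cutoff
    ]


# Hypothetical example: three genes, raw p-values supplied by some upstream test.
genes = ["geneA", "geneB", "geneC"]
logfc = [1.0, 0.2, -0.8]
pvals = [0.001, 0.01, 0.9]

relaxed = select_de_genes(genes, logfc, pvals)                    # FDR filter only
refined = select_de_genes(genes, logfc, pvals, logfc_cutoff=0.5)  # plus |logFC| > 0.5
print(relaxed, refined)
```

A logFC cutoff of 0.5 corresponds to a fold change of 2 ** 0.5, roughly 1.41, which is where the approximately 1.4-fold figure comes from.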
To annotate probes with the corresponding Ensembl gene ID, gene symbol and description, we
used the annotation descriptions from the Affymetrix website
(http://www.affymetrix.com/support/technical/annotationfilesmain.affx), Biomart, Ensembl v.55
(Drosophila melanogaster genes; BDGP 5.4) (Hubbard et al. 2007) and the FlyBase database
(Drysdale 2008).
Enrichment Analysis (EA) of Pathways
Functional annotation of genes is based on Gene Ontology (GO Consortium, 2006). EA was
performed using Gitools (http://www.gitools.org) to identify processes that might be enriched
among up- or down-regulated genes. Statistical significance (p-value) was assessed using the
binomial distribution, with the p-value calculated from the following quantities:
n = total number of genes in the category
x = number of differentially expressed genes in the category
p = frequency of upregulated or downregulated genes
Resulting p-values were adjusted for multiple testing using the Benjamini and Hochberg
False Discovery Rate (FDR) method (Benjamini and Hochberg 1995).
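As an illustration of a binomial enrichment test on n, x and p, the sketch below computes the one-sided upper-tail probability P(X >= x) for X ~ Binomial(n, p). Using the upper tail is an assumption made here, since it is a common choice for over-representation; the exact formula Gitools applies is not given in this text. The function name is hypothetical.

```python
from math import comb


def binomial_enrichment_pvalue(x, n, p):
    """Upper-tail binomial probability P(X >= x) for X ~ Binomial(n, p).

    x: number of differentially expressed genes in the category
    n: total number of genes in the category
    p: background frequency of up- or downregulated genes
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    # Sum the probability mass of observing x, x+1, ..., n hits.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))


# Hypothetical category: 40 genes, 12 of them upregulated, background frequency 0.1.
print(binomial_enrichment_pvalue(12, 40, 0.1))
```

One such p-value would be computed per category and per direction (up or down) before the multiple-testing adjustment.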
Identification of putative Sd and dE2F1 binding sites for DE genes
We determined whether the DE genes found only in rbf/wt, warts/wt, or rbf wts (DM)/wt contain
putative binding sites for Sd, dE2F1, or both Sd and dE2F1. To search for Sd binding sites in
the promoter regions of these genes (600 bp of upstream sequence plus 100 bp of downstream
sequence relative to the transcription start site (TSS)), the TRANSFAC database
(Release 2009.1) (Matys et al. 2003) position frequency matrices (PFMs) for Drosophila
(insect matrices) were used. The STORM algorithm (Schones et al. 2007) was used to scan the
sequences for Sd binding sites represented in the TRANSFAC PFMs. Since the TRANSFAC database
does not offer a PFM for a Drosophila (or insect) specific E2F transcription factor, we used
the ChIP-on-chip data from Xu et al. (2007) for E2F1, E2F4 and E2F6 in five different human
cell lines, including normal cells. We then used the Drosophila orthologs (Biomart, Ensembl
v.55; Homo sapiens genes GRCh37; Drosophila melanogaster genes BDGP 5.4; Hubbard et al. 2007)
of these human E2F target genes to determine which DE genes from our analysis were among the
ortholog targets. All Drosophila ortholog targets of human E2F1, E2F4 and E2F6 were considered
putative Drosophila dE2F1 targets. Following this, an examination of common targets
determined which DE genes contained putative binding sites for both Sd and dE2F1.
Hybridization was done by the Functional Genomics Facility at the University of Chicago, and
raw data analysis was performed by AI and NLB at the University of Pompeu Fabra.
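To make the idea of scanning a promoter with a position frequency matrix concrete, here is a minimal Python sketch. It is not the STORM algorithm, which additionally assesses the statistical significance of matches. It only converts a PFM into log2-odds weights and reports windows scoring above a user-chosen threshold, on the forward strand only. The pseudocount, the uniform background, the threshold, and the toy matrix and sequence are all assumptions for illustration; the matrix is not the TRANSFAC Sd matrix.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    pfm is a list of dicts, one per motif position, mapping base to count.
    """
    pwm = []
    for counts in pfm:
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((counts[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 4-bp matrix (hypothetical) that strongly prefers the word CATT.
toy_pfm = [
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
toy_pwm = pfm_to_pwm(toy_pfm)
for offset, score in scan("GGCATTGG", toy_pwm, threshold=5.0):
    print(offset, round(score, 2))
```

For a promoter window of 600 bp upstream plus 100 bp downstream, each sequence passed to `scan` would be about 700 bp long, and a complete search would also scan the reverse complement.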
Cloning of endogenous luciferase reporters
A minimal promoter taken from the heat shock protein-70 (hsp70) gene was cloned between the
HindIII and BglII sites of the pGL3-basic luciferase vector (Promega). For each reporter, the
following fragments were PCR amplified from genomic DNA isolated from Canton S flies and then
subcloned (in the sense orientation) upstream of the hsp70 minimal promoter in this vector.
[All position references are with respect to the transcriptional start site of each gene as
annotated by FlyBase version 2010_04.]
Gene                     Fragment             Sites subcloned in between
Cdc2c                    -620bp to +140bp     MluI-XhoI
dDP                      -701bp to +119bp     MluI-XhoI
Ex2kb                    -2kb to -1bp         MluI-XhoI
Ex1kb                    -1kb to +1kb         MluI-XhoI
CycB3                    -640bp to +120bp     KpnI-NheI
Dachs                    -560bp to +220bp     KpnI-NheI
DNA polymerase ε         -680bp to +280bp     KpnI-NheI
Mcm2                     -620bp to +101bp     KpnI-NheI
Mcm3                     -680bp to +149bp     KpnI-NheI
Mcm10                    -600bp to +159bp     KpnI-NheI
DNA polymerase ε 553     -680bp to -127bp     KpnI-NheI
DNA polymerase ε SdE2F   -427bp to -127bp     KpnI-NheI
dDP 709                  -701bp to +8bp       KpnI-NheI
dDP 273                  -428bp to +119bp     KpnI-NheI
dDP SdE2F                -428bp to +8bp       KpnI-NheI
Following sequencing analysis, each plasmid was then used as described in the paper. Primer sequences
used in the cloning and plasmid maps are available upon request.
References
Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and
powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57: 289-300.
Drysdale, R. 2008. FlyBase: a database for the Drosophila research community. Methods
Mol Biol 420: 45-59.
Gautier, L., Cope, L., Bolstad, B.M., and Irizarry, R.A. 2004. affy--analysis of Affymetrix
GeneChip data at the probe level. Bioinformatics 20(3): 307-315.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B.,
Gautier, L., Ge, Y., Gentry, J., et al. 2004. Bioconductor: open software development for
computational biology and bioinformatics. Genome Biol 5(10): R80.
Gene Ontology Consortium. 2006. The Gene Ontology (GO) project in 2006. Nucleic Acids Res 34.
Hubbard, T.J., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L.,
Coates, G., Cunningham, F., Cutts, T., et al. 2007. Ensembl 2007. Nucleic Acids Res
35(Database issue): D610-617.
Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L.M., Hobbs, B., and Speed, T.P. 2003a.
Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31(4): e15.
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., and
Speed, T.P. 2003b. Exploration, normalization, and summaries of high density oligonucleotide
array probe level data. Biostatistics 4(2): 249-264.
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T.,
Kawashima, S., Okuda, S., Tokimatsu, T., and Yamanishi, Y. 2008. KEGG for linking genomes to
life and the environment. Nucleic Acids Res 36(Database issue): D480-484.
Kauffmann, A., Gentleman, R., and Huber, W. 2009. arrayQualityMetrics--a bioconductor
package for quality assessment of microarray data. Bioinformatics 25(3): 415-416.
Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K.,
Karas, D., Kel, A.E., Kel-Margoulis, et al. 2003. TRANSFAC: transcriptional regulation, from
patterns to profiles. Nucleic Acids Res 31(1): 374-378.
Schones, D.E., Smith, A.D., and Zhang, M.Q. 2007. Statistical significance of
cis-regulatory modules. BMC Bioinformatics 8: 19.
Smyth, G.K. 2004. Linear models and empirical bayes methods for assessing differential
expression in microarray experiments. Stat Appl Genet Mol Biol 3: Article3.
Xu, X., Bieda, M., Jin, V.X., Rabinovich, A., Oberley, M.J., Green, R., and Farnham, P.J.
2007. A comprehensive ChIP-chip analysis of E2F1, E2F4, and E2F6 in normal and tumor cells
reveals interchangeable roles of E2F family members. Genome Res 17(11): 1550-1561.