					Drosophila Microarray Analysis

       RNA isolation, Hybridization, Normalization, DE analysis

Eye imaginal disc total RNA was isolated with TRIzol reagent (Invitrogen) and further purified using an RNeasy kit (Qiagen). All samples were processed in accordance with the Affymetrix protocol (Affymetrix expression manual), and a total of 15 µg of fragmented and labeled cRNA was hybridized to Affymetrix GeneChip arrays (Drosophila genome 385K 2.0). The chips were then washed and stained using an Affymetrix Fluidics Station 450, and fluorescence was detected using the Affymetrix GS3000. There were three biological replicates for each genotype, giving a total of 12 Affymetrix raw files (*.CEL files). These were background corrected and normalized using the R-Bioconductor (Gentleman et al. 2004) package “affy” (Irizarry et al. 2003a; Gautier et al. 2004). Following normalization, the statistical technique RMA (Robust Multi-chip Average) (Irizarry et al. 2003a; Irizarry et al. 2003b) was used to estimate gene expression. The Bioconductor package “arrayQualityMetrics” (Kauffmann et al. 2009) was used to assess the quality and variability of the microarray experiment.

Differential expression (DE) analysis of the preprocessed data set was carried out with the Bioconductor package “limma” (Smyth 2004), which fits a linear model with an empirical Bayes method and was used to calculate the log2 fold change (logFC). The fold change (FC) is defined as the ratio of intensities between the two experimental conditions under comparison. Because these corresponded to two populations of RNA, each comprising three biological replicates, the ratio of the average intensities across replicates was calculated. For each gene, limma computes the t-test p-value and then the multiple-testing-corrected (FDR) p-value (Benjamini and Hochberg 1995) for the two experimental conditions. For all comparisons, a gene was considered differentially expressed if it had an FDR p-value < 0.05 (irrespective of the logFC). To generate a more refined set of DE genes, a more stringent filter of absolute logFC > 0.5 (approximately a >1.4-fold increase or decrease) was applied to all comparisons.
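
The fold-change arithmetic and the two-step filter described above can be illustrated with a minimal Python sketch. This is not the limma workflow itself, which was run in R; the gene names, intensities, and FDR values are made-up illustrations, and the helper names `log2_fold_change` and `is_de` are hypothetical.

```python
import math

def log2_fold_change(condition, reference):
    """log2 of the ratio of average intensities across replicates."""
    mean_condition = sum(condition) / len(condition)
    mean_reference = sum(reference) / len(reference)
    return math.log2(mean_condition / mean_reference)

def is_de(fdr, log_fc, fdr_cutoff=0.05, log_fc_cutoff=None):
    """Apply the FDR cutoff and, optionally, the absolute logFC cutoff."""
    if fdr >= fdr_cutoff:
        return False
    if log_fc_cutoff is not None and abs(log_fc) <= log_fc_cutoff:
        return False
    return True

# Hypothetical linear-scale intensities, three replicates per condition.
# The FDR values are placeholders, not computed from these intensities.
genes = {
    "geneA": {"mutant": [400.0, 420.0, 380.0], "wt": [200.0, 190.0, 210.0], "fdr": 0.001},
    "geneB": {"mutant": [110.0, 105.0, 115.0], "wt": [100.0, 100.0, 100.0], "fdr": 0.03},
    "geneC": {"mutant": [50.0, 55.0, 45.0], "wt": [200.0, 210.0, 190.0], "fdr": 0.20},
}

results = {}
for name, g in genes.items():
    lfc = log2_fold_change(g["mutant"], g["wt"])
    results[name] = {
        "logFC": lfc,
        "FC": 2 ** lfc,                                         # back to linear scale
        "de": is_de(g["fdr"], lfc),                             # FDR < 0.05 only
        "de_refined": is_de(g["fdr"], lfc, log_fc_cutoff=0.5),  # plus |logFC| > 0.5
    }

# An absolute logFC of 0.5 corresponds to a fold change of 2**0.5 (about 1.41)
min_fold = 2 ** 0.5

for name, r in results.items():
    print(name, round(r["logFC"], 3), round(r["FC"], 3), r["de"], r["de_refined"])
```

In this toy data set, geneB passes the FDR cutoff but not the refined filter, because its change is well below 1.4-fold. RMA reports expression on the log2 scale, so in practice the logFC is obtained as a difference of log2 means; the sketch follows the ratio definition given in the text.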

To annotate probes with the corresponding Ensembl gene ID, gene symbol, and description, we used the annotation from the Affymetrix website, Biomart, Ensembl v.55 (Drosophila melanogaster genes; BDGP 5.4) (Hubbard et al. 2007), and the FlyBase database (Drysdale 2008).

         Enrichment Analysis (EA) of Pathways

Functional annotation of genes is based on Gene Ontology (GO Consortium, 2006). EA was performed using Gitools to identify processes that might be enriched among up- or down-regulated genes. Statistical significance (p-value) was assessed with the binomial distribution, with the p-value calculated from the following quantities:

n = total number of genes in the category

x = number of differentially expressed genes in the category

p = frequency of upregulated or downregulated genes

The resulting p-values were adjusted for multiple testing using the Benjamini and Hochberg False Discovery Rate (FDR) method (Benjamini and Hochberg 1995).
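
As an illustration of the two steps above, the following Python sketch computes a binomial p-value from n, x, and p and then applies the Benjamini and Hochberg adjustment. It assumes an upper-tail test, P(X >= x) = sum over k from x to n of C(n, k) p^k (1 - p)^(n - k), which is a common choice for enrichment; the text does not state which tail was used. The category names and counts are invented, and the function names are hypothetical rather than part of Gitools.

```python
from math import comb

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values never decrease as the raw p-value increases
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical categories: (n genes in category, x DE genes in category)
categories = {"category_1": (20, 8), "category_2": (50, 5), "category_3": (10, 1)}
p_background = 0.1  # assumed overall frequency of up- or down-regulated genes

names = list(categories)
raw = [binomial_upper_tail(x, n, p_background) for n, x in categories.values()]
fdr = benjamini_hochberg(raw)

for name, p_raw, p_adj in zip(names, raw, fdr):
    print(f"{name}: p = {p_raw:.4g}, FDR = {p_adj:.4g}")
```

Each category is tested independently against the same background frequency, and only the adjusted values are compared with the significance cutoff.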

       Identification of putative Sd and dE2F1 binding sites for DE genes

We determined whether the DE genes found only in rbf/wt, warts/wt, or rbf wts (DM)/wt contain putative binding sites for Sd, dE2F1, or both Sd and dE2F1. To search for Sd binding sites in the promoter regions of these genes (600 bp of upstream sequence plus 100 bp of downstream sequence relative to the transcription start site (TSS)), the position frequency matrices (PFMs) for Drosophila (insect matrices) from the TRANSFAC database (Release 2009.1) (Matys et al. 2003) were used. The STORM algorithm (Schones et al. 2007) was used to scan the sequences for Sd binding sites represented in the TRANSFAC PFMs. Since the TRANSFAC database does not offer a PFM for a Drosophila-specific (or insect-specific) E2F transcription factor, we used the ChIP-on-chip data from Xu et al. (2007) for E2F1, E2F4, and E2F6 in five different human cell lines, including normal cells. We then used the Drosophila orthologs (Biomart, Ensembl v.55; Homo sapiens genes, GRCh37; Drosophila melanogaster genes, BDGP 5.4; Hubbard et al. 2007) of these human E2F target genes to determine which DE genes from our analysis were among the ortholog targets. All Drosophila ortholog targets of human E2F1, E2F4, and E2F6 were considered putative Drosophila dE2F1 targets. Following this, an examination of common targets determined which DE genes contained putative binding sites for both Sd and dE2F1.
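
The promoter window and the final intersection step can be sketched in Python as follows. This is only an illustration under stated assumptions: coordinates are taken to be 1-based and inclusive, the window is mirrored for genes on the minus strand, and the gene identifiers are invented. It does not reproduce the STORM scan or the ortholog mapping, and the function names are hypothetical.

```python
def promoter_window(tss, strand, upstream=600, downstream=100):
    """Return (start, end) genomic coordinates of the promoter window.

    Assumes 1-based inclusive coordinates; 'upstream' is measured against
    the direction of transcription, so the window flips on the minus strand.
    """
    if strand == "+":
        return max(1, tss - upstream), tss + downstream
    if strand == "-":
        return max(1, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")

def classify_targets(de_genes, sd_hits, e2f_orthologs):
    """Split DE genes into Sd-only, dE2F1-only, and shared putative targets."""
    de = set(de_genes)
    sd = de & set(sd_hits)          # DE genes with a putative Sd site
    e2f = de & set(e2f_orthologs)   # DE genes among the E2F ortholog targets
    return {"Sd_only": sd - e2f, "dE2F1_only": e2f - sd, "both": sd & e2f}

# Hypothetical inputs
print(promoter_window(1000, "+"))
print(promoter_window(1000, "-"))

groups = classify_targets(
    de_genes=["g1", "g2", "g3", "g4"],
    sd_hits=["g1", "g2", "g9"],
    e2f_orthologs=["g2", "g3"],
)
print(groups)
```

Genes outside the DE list (here the invented `g9`) are discarded before classification, so only DE genes can appear in any of the three groups.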


Hybridization was performed by the Functional Genomics Facility at the University of Chicago, and raw data analysis was performed by AI and NLB at the University of Pompeu Fabra.

Cloning of endogenous luciferase reporters

A minimal promoter taken from the heat shock protein-70 gene was cloned between the HindIII and BglII sites of the pGL3-basic luciferase vector (Promega). For each reporter, the following fragments were PCR amplified from genomic DNA isolated from Canton S flies and then subcloned (in the sense orientation) upstream of the hsp70 minimal promoter in the pGL3-basic luciferase vector. All position references are with respect to the transcriptional start site of each gene as annotated by FlyBase version 2010_04.

Gene                      Fragment              Sites subcloned in between
Cdc2c                     -620bp to +140bp      MluI-XhoI
dDP                       -701bp to +119bp      MluI-XhoI
Ex2kb                     -2kb to -1bp          MluI-XhoI
Ex1kb                     -1kb to +1kb          MluI-XhoI
CycB3                     -640bp to +120bp      KpnI-NheI
Dachs                     -560bp to +220bp      KpnI-NheI
DNA polymerase ε          -680bp to +280bp      KpnI-NheI
Mcm2                      -620bp to +101bp      KpnI-NheI
Mcm3                      -680bp to +149bp      KpnI-NheI
Mcm10                     -600bp to +159bp      KpnI-NheI
DNA polymerase ε 553      -680bp to -127bp      KpnI-NheI
DNA polymerase ε SdE2F    -427bp to -127bp      KpnI-NheI
dDP 709                   -701bp to +8bp        KpnI-NheI
dDP 273                   -428bp to +119bp      KpnI-NheI
dDP SdE2F                 -428bp to +8bp        KpnI-NheI

Following sequencing analysis, each plasmid was then used as described in the paper. Primer sequences used in the cloning and plasmid maps are available upon request.


Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57(1): 289-300.

Drysdale, R. 2008. FlyBase: a database for the Drosophila research community. Methods Mol Biol 420: 45-59.

Gautier, L., Cope, L., Bolstad, B.M., and Irizarry, R.A. 2004. affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20(3): 307-315.

Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10): R80.

GO Consortium. 2006. The Gene Ontology (GO) project in 2006. Nucleic Acids Res 34(Database issue): D322-D326.

Hubbard, T.J., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., et al. 2007. Ensembl 2007. Nucleic Acids Res 35(Database issue): D610-617.

Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L.M., Hobbs, B., and Speed, T.P. 2003a. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31(4): e15.

Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., and Speed, T.P. 2003b. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2): 249-264.

Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., and Yamanishi, Y. 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Res 36(Database issue): D480-484.

Kauffmann, A., Gentleman, R., and Huber, W. 2009. arrayQualityMetrics--a bioconductor package for quality assessment of microarray data. Bioinformatics 25(3): 415-416.

Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., et al. 2003. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31(1): 374-378.

Schones, D.E., Smith, A.D., and Zhang, M.Q. 2007. Statistical significance of cis-regulatory modules. BMC Bioinformatics 8: 19.

Smyth, G.K. 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3: Article3.

Xu, X., Bieda, M., Jin, V.X., Rabinovich, A., Oberley, M.J., Green, R., and Farnham, P.J. 2007. A comprehensive ChIP-chip analysis of E2F1, E2F4, and E2F6 in normal and tumor cells reveals interchangeable roles of E2F family members. Genome Res 17(11): 1550-1561.