Cancer Genetic Markers of Susceptibility (CGEMS)

Shared by: tony lindeman; posted 3/30/2008
Selecting Initial GWAS and replication studies

David Hunter
Harvard School of Public Health
Brigham and Women’s Hospital
Broad Institute of MIT and Harvard

Initial Study for GWAS
• Cases and controls well matched with respect to ancestry to minimize population stratification (restriction to one self-identified group)
• Genomic control or other methods, e.g. Eigenstrat (Price et al, 2006), may compensate for looser matching

Control of population stratification, e.g. hair color in Nurses’ Health Study (European ancestry)
[Figure: Chi-squared inflation factors and Q-Q plots of -log10 p-values with no adjustment for population stratification and adjusting for the top four and fifty eigenvectors (Price et al, 2006). 45, 19 and 19 SNPs (respectively) with p < 10^-7 not shown.]
Kraft P, unpublished

Article
Nature 447, 661-678 (7 June 2007) | doi:10.1038/nature05911; Received 26 March 2007; Accepted 11 May 2007
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
The Wellcome Trust Case Control Consortium

Conclusions
• Broad matching on ancestry and region adequate for discovery of strongest hits
• Statistical methods for control of population stratification (within populations of European ancestry) adequate to assist in discovery of strongest hits
• Will more rigorous designs permit discovery of weaker associations?
• When signal-noise is low, how does noise due to multiple comparisons compare with noise due to poor matching of controls?
• False negatives the biggest problem (can deal with false +ves via replication).
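
The chi-squared inflation factor mentioned on the population-stratification slide can be illustrated with a small sketch. This is not the analysis behind the slide: it assumes 1-df association statistics, uses the common median-based estimate of lambda, and runs on simulated (hypothetical) data. The function names `inflation_factor` and `genomic_control` are made up for illustration.

```python
import random
from statistics import NormalDist, median

# Median of a 1-df chi-squared distribution: the square of the 75th
# percentile of a standard normal (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Median-based lambda: observed median statistic / null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide every statistic by lambda (a lambda below 1 is treated as 1)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]


# Hypothetical data: null 1-df statistics inflated by 10% to mimic
# the effect of unadjusted population stratification.
random.seed(0)
simulated = [1.10 * random.gauss(0.0, 1.0) ** 2 for _ in range(20000)]

lam = inflation_factor(simulated)
corrected = genomic_control(simulated)
print(f"lambda before correction: {lam:.3f}")
print(f"lambda after correction:  {inflation_factor(corrected):.3f}")
```

A lambda close to 1 suggests little inflation; principal-component methods such as Eigenstrat address the same problem by adjusting for ancestry directly rather than rescaling the statistics.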
Criteria for follow-up of initial reports of genotype–phenotype associations
• Replication studies should be of sufficient sample size to convincingly distinguish the proposed effect from no effect
• Replication studies should preferably be conducted in independent data sets, to avoid the tendency to split one well-powered study into two less conclusive ones
• The same or a very similar phenotype should be analysed
• A similar population should be studied, and notable differences between the populations studied in the initial and attempted replication studies should be described
• Similar magnitude of effect and significance should be demonstrated, in the same direction, with the same SNP or a SNP in perfect or very high linkage disequilibrium with the prior SNP (r2 close to 1.0)
• Statistical significance should first be obtained using the genetic model reported in the initial study
• When possible, a joint or combined analysis should lead to a smaller P-value than that seen in the initial report
• A strong rationale should be provided for selecting SNPs to be replicated from the initial study, including linkage-disequilibrium structure, putative functional data or published literature
• Replication reports should include the same level of detail for study design and analysis plan as reported for the initial study
Chanock, Manolio et al. Nature, June 7th 2007

Initial Study for GWAS: technical issues
• Standard advice – case and control samples handled exactly the same at every stage
• Source of DNA
  – Blood/buffy coat mostly good results
  – Buccal cell variable results (Feigelson et al. CEBP, 2007 - encouraging)
  – Whole genome amplified DNA (Affy OK, Illumina in development)

Replication studies
For statistical replication, prefer:
• Similar phenotype
• Similar ancestry
For generalizability, prefer:
• Different populations
• Different ancestry backgrounds (may also help with fine mapping)

Study design?
Prospective
• Protect from survivor bias
• Protect from selection bias
• Interpretability of gene-environment analyses
• Possibility of interpretable biomarkers

Study quality?
Importance depends on strength of signal
• To date – little apparent relation between probability of replication and quality
• May matter more for weak signals
• Sample size may trump quality (within limits)

NCI BPC3 Results: 7909 cases, 8683 controls

Cohort             Genotype   Cases / Controls   OR (99% CI)        P-value
All (phet=0.483)   CC         5,566 / 6,666      Ref.               4.00x10^-19
                   AC         2,064 / 1,842      1.33 (1.20-1.46)
                   AA         279 / 175          1.87 (1.44-2.42)
ACS                CC         871 / 955          Ref.               2.63x10^-5
                   AC         238 / 166          1.56 (1.17-2.08)
                   AA         21 / 9             2.61 (0.92-7.37)
ATBC               CC         606 / 623          Ref.               0.012
                   AC         312 / 260          1.23 (0.95-1.60)
                   AA         45 / 25            1.81 (0.94-3.51)
EPIC               CC         551 / 869          Ref.               0.258
                   AC         169 / 233          1.17 (0.87-1.58)
                   AA         12 / 12            1.57 (0.53-4.59)
HPFS               CC         495 / 545          Ref.               3.63x10^-3
                   AC         157 / 114          1.53 (1.07-2.19)
                   AA         11 / 6             2.09 (0.56-7.80)
MEC                CC         1,426 / 1,565      Ref.               2.58x10^-7
                   AC         728 / 614          1.32 (1.11-1.58)
                   AA         146 / 88           1.89 (1.30-2.75)
PHS                CC         801 / 1,123        Ref.               0.013
                   AC         200 / 220          1.27 (0.96-1.69)
                   AA         21 / 15            2.06 (0.83-5.12)
PLCO               CC         816 / 986          Ref.               0.014
                   AC         260 / 235          1.33 (1.02-1.72)
                   AA         23 / 20            1.39 (0.63-3.10)

rs1447295: Overall p, trend 4 x 10^-19
Schumacher et al. Can Res, April 2007

[Figure: Forest plots of the per-allele odds ratios for each of the five SNPs reaching genome-wide significance for breast cancer. a, rs2981582 (FGFR2); b, rs3803662; c, rs889312; d, rs13281615; and e, rs3817198.]
Easton et al. Nature, May 2007

Cancer Genetic Markers of Susceptibility (CGEMS): http://cgems.cancer.gov
General Strategy for Multistage analysis of Prostate & Breast Cancer
Initial GWAS Study: 1150 cases/1150 controls, 540,000 Tag SNPs
~28,000 SNPs
Follow-up Study #1: 4500 cases/4500 controls
Follow-up Study #2: 3500 cases/3500 controls
at least 1,500 SNPs
Fine Mapping: 30 ±20 loci

Committed Studies CGEMS
Prostate Cancer: PLCO (GWAS), ACS, HPFS, PHS, ATBC, CeRePP, EPIC, MEC
Breast Cancer: NHS (GWAS), PLCO, WHI, Polish C/C, ACS, EPIC, MEC

CGEMS: caBIG Posting
http://cgems.cancer.gov/data
• Pre-computed Analysis (Association Tests): No Restrictions
• Raw Genotype, Case/control, Age (in 5 yrs), Family Hx (+/-): Registration
• Prostate 10/06, Breast 04/07
• ~528,000 SNPs, Illumina 550k

Instant Replication!
http://cgems.cancer.gov
Additional in silico replication possibilities
• dbGAP: ncbi.nlm.nih.gov/dbgap
• Framingham: nhlbi.nih.gov/about/framingham
• WTCCC: wtccc.org.uk
• DGI: broad.mit.edu/diabetes

[Figure: Log10(p-value) by chromosome (1-22 and X, p and q arms), with the FGFR2 signal labeled.]

The six SNPs with the smallest P values of the 528,173 tested among 1,145 cases of postmenopausal invasive breast cancer and 1,141 controls (full results available at http://cgems.cancer.gov ).

   SNP ID       Χ2*     P*          ORhet*   ORhomo*   Chromosome   Gene
1. rs10510126   25.37   0.0000031   0.59     0.62      10
2. rs1219648    23.56   0.0000076   1.24     1.81      10           FGFR2
3. rs17157903   23.39   0.0000083   1.60     0.79      7            RELN
4. rs2420946    23.17   0.0000095   1.25     1.81      10           FGFR2
5. rs7696175    22.40   0.0000137   1.38     0.86      4            TLR1,TLR6
6. rs12505080   21.99   0.0000168   1.21     0.52      4

*From analyses adjusting for age, matching factors (see Methods), and three eigenvectors of the principal components identified by Eigenstrat. P value obtained by a score test with 2df.
Hunter et al, Nat Gen, May 2007

[Fig 2: Scatterplot of P values for the FGFR2 locus from the GWAS (y-axis: log10(p-value)).]
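
The P* column of the six-SNP table can be checked against its chi-squared column, since the footnote states the test has 2 df and the upper-tail probability of a 2-df chi-squared statistic is exp(-x/2). The sketch below recomputes the P values and compares them with a Bonferroni threshold for the 528,173 SNPs tested. The 0.05 family-wise error rate is an assumption for illustration, not a threshold stated on the slides, and small differences from the reported values are expected because the chi-squared values are rounded.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared statistic with 2 df."""
    return math.exp(-x / 2.0)


# (SNP, chi-squared, reported P) as listed in the six-SNP table.
TOP_SNPS = [
    ("rs10510126", 25.37, 0.0000031),
    ("rs1219648", 23.56, 0.0000076),
    ("rs17157903", 23.39, 0.0000083),
    ("rs2420946", 23.17, 0.0000095),
    ("rs7696175", 22.40, 0.0000137),
    ("rs12505080", 21.99, 0.0000168),
]

N_TESTS = 528_173            # SNPs tested, from the table caption
FAMILY_WISE_ALPHA = 0.05     # assumed for illustration
bonferroni = FAMILY_WISE_ALPHA / N_TESTS

for snp, chi2, reported in TOP_SNPS:
    p = chi2_sf_2df(chi2)
    print(f"{snp:<12} chi2={chi2:5.2f}  P={p:.2e}  reported={reported:.2e}  "
          f"below Bonferroni threshold: {p < bonferroni}")
```

None of the six P values falls below this Bonferroni threshold, which is consistent with the emphasis the slides place on follow-up in replication studies.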
Results of associations of rs1219648 in the Nurses’ Health Study, Nurses’ Health Study 2, and the PLCO study.

Study Population (N cases/N controls)   Allele Frequency           ORhet (95% CI)     ORhomo (95% CI)    Ptrend
                                        Cases (%)   Controls (%)
Nurses’ Health Study (1,145/1,141)      45.54       38.47          1.24 (1.04-1.50)   1.81 (1.43-2.31)   2.0 x 10^-6
Nurses’ Health Study 2 (302/594)        48.18       40.57          1.29 (0.95-1.75)   1.93 (1.31-2.86)   0.002
PLCO (919/922)                          44.50       41.49          1.06 (0.86-1.30)   1.22 (0.94-1.58)   0.13
ACS CPS-II (555/556)                    44.95       37.41          1.32 (1.02-1.72)   2.06 (1.42-2.97)   0.0002
Pooled estimates (2,921/3,213)                                     1.20 (1.07-1.34)   1.64 (1.42-1.90)   1.1 x 10^-10

UNFINISHED AGENDA
• Where is the causal variant?
• What does this tell us about mechanisms of breast carcinogenesis?

THE HITS KEEP COMING….

UNFINISHED EPIDEMIOLOGIC/PUBLIC HEALTH AGENDA
• Gene-environment interaction, what do the genes tell us about environmental exposures?
• Gene-gene interaction
• Pathway analysis
• Clinical implications – risk stratification for screening? Intervention?
• Health policy implications?
Much of the substrate data – publicly available or relatively cheap.
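
The "Pooled estimates" row of the rs1219648 table can be approximated from the four study-specific heterozygote odds ratios. The sketch below uses fixed-effect inverse-variance weighting on the log odds ratio scale, with standard errors backed out of the reported 95% confidence intervals. The slides do not say how the pooled estimate was actually computed, so this method is an assumption, and rounding in the reported intervals means the result will only be close to the published 1.20 (1.07-1.34).

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

# Study: (ORhet, lower 95% limit, upper 95% limit), from the rs1219648 table.
STUDIES = {
    "Nurses' Health Study": (1.24, 1.04, 1.50),
    "Nurses' Health Study 2": (1.29, 0.95, 1.75),
    "PLCO": (1.06, 0.86, 1.30),
    "ACS CPS-II": (1.32, 1.02, 1.72),
}


def pool_fixed_effect(studies):
    """Inverse-variance pooled OR and 95% CI on the log scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies.values():
        # The CI width on the log scale is 2 * Z95 * SE.
        se = (math.log(upper) - math.log(lower)) / (2.0 * Z95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    log_or = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(log_or),
        math.exp(log_or - Z95 * se_pooled),
        math.exp(log_or + Z95 * se_pooled),
    )


pooled_or, ci_low, ci_high = pool_fixed_effect(STUDIES)
print(f"pooled ORhet = {pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

Fixed-effect pooling gives the larger, more precise studies more weight; a random-effects model would be a reasonable alternative if the study estimates were thought to differ by more than chance.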
NHS/HPFS/PHS GENETIC STUDIES
Immaculata De Vivo
NHS/HPFS: Peter Kraft, Hardeep Ranu, Crystal Arnone, Carolyn Guo, Pati Soule, Sue Hankinson, Shelley Tworoger, Eric Rimm, Frank Hu, Meir Stampfer, Craig Labadie, Carolyn Guo, Walt Willett, Frank Speizer, Jiali Han, Monica Macgrath, Chunyan He, Patrick Dennett, David Cox, Tim Niu, Aditi Hazra, Charles Fuchs, Ed Giovannucci, Andy Chan, Debra Schaumberg, Fran Grodstein, Jae Hee Kang
PHS: Jing Ma, Fred Schumacher, Mike Gaziano, P Ridker

[Diagram: Harvard cohorts, EPIC cohorts, CEPH, ACS cohort, Multiethnic Cohort, BROAD INSTITUTE, PLCO cohort, ATBC cohort, NCI Core Gen Facility]

NCI BPC3 STEERING COMMITTEE:
• Harvard: David Hunter, Michael Gaziano, Julie Buring, Graham Colditz, Walter Willett
• EPIC, CEPH, Cambridge: Elio Riboli, Rudolf Kaaks, Federico Canzian, Gilles Thomas
• ACS: Michael Thun, Heather Feigelson, Jeanne Calle
• NCI: Richard Hayes, Demetrius Albanes, Bob Hoover, Stephen Chanock; Program - Mukesh Verma
• MEC & Broad: Brian Henderson, Laurence Kolonel, David Altshuler, Malcolm Pike
SECRETARIAT: David Hunter, Elio Riboli
GENOMICS subgroup: David Altshuler (Chair), Steve Chanock, Gilles Thomas
Genotyping subgroup: Chris Haiman (Chair), Federico Canzian, Alison Dunning, Steve Chanock, David Cox, David Hunter, Loic LeMarchand, James Mackay
STATISTICS subgroup: Dan Stram (Chair), Peter Kraft, Rudolf Kaaks, Paul Pharoah, Malcolm Pike, Gilles Thomas, Sholom Wacholder
PUBLICATIONS COMMITTEE: Michael Thun (Chair), Elio Riboli, Brian Henderson, David Hunter, Graham Colditz, Richard Hayes, Demetrius Albanes

CGEMS Acknowledgements
• NCI: Stephen Chanock, Gilles Thomas, Robert Hoover, Joseph Fraumeni, Daniela Gerhard, Kevin Jacobs, Zhaoming Wang, Meredith Yeager, Robert Welch, Richard Hayes, Sholom Wacholder, Nilanjan Chatterjee, Kai Yu, Margaret Tucker, Marianne Rivera-Silva, NCICB
• HSPH: David Hunter, Peter Kraft, Fred Schumacher, David Cox
• ACS: Heather Feigelson, Carmen Rodriguez, Eugenia Calle, Michael Thun
• PLCO: Regina Ziegler, Chris Berg, Saundra Buys, Chris MacCarty

Selecting initial and replication samples from existing studies
I. What studies of the same phenotype exist?
II. Can a consortium or collaborative approach provide a study with adequate power for the initial GWAS, along with pre-planned replication studies?
III. Do any of these studies have pre-existing data that would increase power, e.g. “free” controls from a prior GWAS of another phenotype?
IV. Is the phenotype defined in the same or similar manner?
V. Are covariate data available, and defined similarly?
VI. Do any of the studies have additional phenotypic information, e.g. biomarkers, that would create opportunities for “added value” analyses, if these are the subjects of the GWAS?
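
Item II above asks whether a study has adequate power. A rough way to explore that question is a normal-approximation power calculation for a two-sided comparison of allele frequencies between cases and controls. The sketch below is an illustration under stated assumptions, not the method used by the studies on these slides: it takes the Nurses' Health Study allele frequencies for rs1219648 (45.54% in cases, 38.47% in controls) as if they were the true values, uses a significance level of 10^-7, and compares the initial CGEMS sample size with the first follow-up sample size. The function name `allelic_power` is hypothetical.

```python
from statistics import NormalDist


def allelic_power(p_controls, p_cases, n_cases, n_controls, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    norm = NormalDist()
    # Each subject contributes two alleles to the comparison.
    variance = (
        p_cases * (1.0 - p_cases) / (2 * n_cases)
        + p_controls * (1.0 - p_controls) / (2 * n_controls)
    )
    z_effect = abs(p_cases - p_controls) / variance ** 0.5
    z_critical = -norm.inv_cdf(alpha / 2.0)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return norm.cdf(z_effect - z_critical)


ALPHA = 1e-7  # assumed genome-wide significance level

initial = allelic_power(0.3847, 0.4554, n_cases=1145, n_controls=1141, alpha=ALPHA)
follow_up = allelic_power(0.3847, 0.4554, n_cases=4500, n_controls=4500, alpha=ALPHA)
print(f"power, initial GWAS size:    {initial:.3f}")
print(f"power, follow-up study size: {follow_up:.3f}")
```

Because observed effects from an initial scan tend to be overestimates, plugging them in as true values is optimistic; the sketch is meant only to show how sharply power depends on sample size at stringent significance levels.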
