Genome-Wide Association Studies in Breast and Prostate Cancer: Cancer Genetic Markers of Susceptibility (CGEMS)
Stephen Chanock, M.D.
Chief of Laboratory of Translational Genomics, DCEG, NCI
Director, Core Genotyping Facility, DCEG, NCI
October 20, 2007
Identifying Genetic Markers for Prostate & Breast Cancer
Genome-Wide Analysis
Public Health Problem: Prostate (1 in 8 Men), Breast (1 in 9 Women)
Analyze Long-Term Studies: NCI PLCO Study, Nurses’ Health Study
Initial Study
Follow-up #1
Follow-up #2
Establish Loci
Fine Mapping
Functional Studies
Validate Plausible Variants
Possible Clinical Testing
http://cgems.cancer.gov
CGEMS: Nurses Health Study
• Longitudinal Study of 121,700 women enrolled in 1976
• CGEMS Case-Control derived from 32,826 participants who provided a blood sample between 1989 & 1990
• Followed for incident disease until May 2004
• Post-menopausal invasive breast cancer
• Capture rate estimated to be 90%
• Controls matched on age, blood collection, ethnicity (self-described Caucasians) and use of hormones
Final Selection for Association Analysis
Starting Sample Set: 2,494
• 1,183 cases & 1,185 controls
• 93 duplicates, 5 triplicates
• 23 QC samples
Repeat if <94%

Summary of selection of cases and controls
                                   cases    controls
Initially attempted                1,183    1,185
- low completion rate                 30       29
- unclear identity                     5       13
- admixed origin                       3        1
= Used in scan for
  association analysis             1,145    1,142
Fingerprints:
1. Identifiler™ Kit
• Multiplex assay that amplifies 15 STR markers and the amelogenin locus for gender determination
• Robust PCR reaction requires less than 1 ng of DNA
• Matching probabilities that exceed one in one billion
• Experience with nearly 200,000 samples at CGF
2. Check with BPC3 SNPs
Dye | Marker | Chromosome | Possible Alleles
6-FAM | D8S1179 | 8 | 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
6-FAM | D21S11 | 21q11.2-q21 | 24, 24.2, 25, 26, 27, 28, 28.2, 29, 29.2, 30, 30.2, 31, 31.2, 32, 32.2, 33, 33.2, 34, 34.2, 35, 35.2, 36, 37, 38
6-FAM | D7S820 | 7q11.21-22 | 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
6-FAM | CSF1PO | 5q33.3-34 | 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
VIC | D3S1358 | 3p | 12, 13, 14, 15, 16, 17, 18, 19
VIC | TH01 | 11p15.5 | 4, 5, 6, 7, 8, 9, 9.3, 10, 11, 13.3
VIC | D13S317 | 13q22-31 | 8, 9, 10, 11, 12, 13, 14, 15
VIC | D16S539 | 16q24-qter | 5, 8, 9, 10, 11, 12, 13, 14, 15
VIC | D2S1338 | 2q35-37.1 | 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
NED | D19S433 | 19q12-13.1 | 9, 10, 11, 12, 12.2, 13, 13.2, 14, 14.2, 15, 15.2, 16, 16.2, 17, 17.2
NED | vWA | 12p12-pter | 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
NED | TPOX | 2p23-2pter | 6, 7, 8, 9, 10, 11, 12, 13
NED | D18S51 | 18q21.3 | 7, 9, 10, 10.2, 11, 12, 13, 13.2, 14, 14.2, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
PET | D5S818 | 5q21-31 | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
PET | FGA | 4q28 | 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 26.2, 27, 28, 29, 30, 30.2, 31.2, 32.2, 33.2, 42.2, 43.2, 44.2, 45.2, 46.2, 47.2, 48.2, 50.2, 51.2
PET | Amelogenin | X: p22.1-22.3; Y: p11.2 | X, Y
LIZ | SIZE STANDARD | |
SNP Call Rates in NHS
Attempted SNPs: 555,352
Failed (no calls or <90% call rate): -8,706
MAF < 1%: -18,473
Analysis SNPs: 528,173 (95.1%)
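The filtering tally on this slide can be checked with a short sketch. The 90% call-rate threshold comes from the slide; reading the 1% threshold as a minor allele frequency cutoff is an assumption, and the `SnpStats` record and `keep_for_analysis` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class SnpStats:
    name: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency among called genotypes (assumed meaning)

def keep_for_analysis(snp, min_call_rate=0.90, min_maf=0.01):
    """Return True when a SNP passes both filters described on the slide."""
    return snp.call_rate >= min_call_rate and snp.maf >= min_maf

# Counts reported on the slide
attempted = 555_352
failed_call_rate = 8_706
low_frequency = 18_473

retained = attempted - failed_call_rate - low_frequency
retained_fraction = retained / attempted
print(retained, f"{retained_fraction:.1%}")  # 528173 95.1%
```

The subtraction reproduces the 528,173 analysis SNPs and the 95.1% retention shown on the slide.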
Table 1. Summary of completion rate for NHS samples

Sample completion rate for     Scan 1 study    Scan 1 case    Scan 1 control
528,173 SNPs (retained)        99.754 %        99.756 %       99.773 %
546,646 SNPs (attempted)       95.754 %        95.704 %       95.799 %
Triangular plot for admixture vectors of cases and controls in NHS using STRUCTURE program
Based on 2 sets of SNPs (7,050 & 7,061) with r2 < 0.01
[Plot: triangle with vertices Asia, Europe and Africa]
Assessment of Discordance Rates
Participants: 142 duplicate pairs
Mean discordance rate: prostate 2.0 x 10^-4, breast 1.5 x 10^-4
CEPH-CGEMS: 74 duplicate pairs
CEPH-HapMap: 28 individuals (with 24 duplicates)
Mean discordance rate 1.4 x 10^-3
Mean discordance rate 2 x 10^-4
Concordance Rates: 99.985%
NHS: 50,820,003 of 50,827,468 comparisons
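As a quick check of the NHS figure, the concordance rate is simply the share of duplicate genotype comparisons that agree. The sketch below uses only the two counts on the slide; the function name is illustrative.

```python
def concordance_rate(concordant, total):
    """Fraction of duplicate genotype comparisons that agree."""
    if total <= 0:
        raise ValueError("total must be positive")
    return concordant / total

nhs_concordant = 50_820_003
nhs_total = 50_827_468

rate = concordance_rate(nhs_concordant, nhs_total)
discordance = 1 - rate
print(f"concordance {rate:.3%}, discordance {discordance:.1e}")
# concordance 99.985%, discordance 1.5e-04
```

The implied discordance of about 1.5 x 10^-4 agrees with the mean discordance rate quoted for the breast cancer scan.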
Overall SNP & DNA Success Rates: PLCO & NHS

Attempted SNPs (criterion: SNP call rate > 90%)
                     PLCO*       NHS
Number of SNPs
  attempted          561,494     555,352
  failed             1,490       8,706
Success rate         0.973       0.984
SNP x: 2.1%

Attempted DNAs (criterion: DNA completion rate > 94%)
Number of individuals attempted: 4,696
  failed: 66
  success rate: 0.986
DNA y: 1.4%

Working set of data: proportion of missing data 0.002
NHS GWAS Deviation from Hardy-Weinberg Proportions
Figure 1. Log scale p-value quantile plot for deviation from Hardy-Weinberg proportion
[Plot: log10(p-value) versus log10(quantile)]
p<0.05: 29,318 (5.55%)
p<0.001: 2,880 (0.55%); ~38% map to CNV regions
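To make the counts above concrete, here is a minimal sketch of one common way to test a SNP for deviation from Hardy-Weinberg proportions: a 1-df chi-square test on genotype counts. The slide does not state which test was used, so the choice of test is an assumption, and the function name is illustrative. The second part compares the observed counts with what a uniform null would give, taking the 528,173 analysis SNPs as the denominator (which matches the percentages on the slide).

```python
import math

def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of the 1-df chi-square test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

snps_tested = 528_173
expected_below_005 = 0.05 * snps_tested    # about 26,409 under the null
expected_below_0001 = 0.001 * snps_tested  # about 528 under the null
observed_fraction_005 = 29_318 / snps_tested
observed_fraction_0001 = 2_880 / snps_tested
```

Under these assumptions the excess is modest at p<0.05 (5.55% observed versus 5% expected) and much larger at p<0.001 (0.55% versus 0.1%).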
CGEMS Breast Cancer Scan in NHS
Log quantile plot of p-values for all SNPs
[Plot: log10(p-value) versus log10(quantile); series: corrected, uncorrected]
528,173 SNPs
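The coordinates of a log quantile plot like this one can be sketched as follows. The quantile of each p-value is taken here as rank/n, which is an assumption (other conventions such as rank/(n+1) are also common), and the function name is illustrative. The "corrected" series on the slide is not reproduced because the correction method is not stated.

```python
import math

def log_quantile_points(p_values):
    """Return (log10 quantile, log10 p) pairs, smallest p-value first."""
    ordered = sorted(p_values)
    if not ordered or ordered[0] <= 0:
        raise ValueError("p-values must be non-empty and strictly positive")
    n = len(ordered)
    return [
        (math.log10(rank / n), math.log10(p))
        for rank, p in enumerate(ordered, start=1)
    ]

# Evenly spread p-values, as expected when no SNP is associated
uniform_like = [0.25, 0.5, 0.75, 1.0]
points = log_quantile_points(uniform_like)
```

With evenly spread p-values every point lies on the diagonal; points falling below the diagonal at the left end of the plot correspond to p-values smaller than expected by chance.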
Log quantile plot of p-values for the 550 SNPs with lowest p values (0.1%)
[Two panels: Breast cancer GWA scan and Prostate cancer GWA scan. Axes: log10(p-value) versus log10(quantile); series: corrected, uncorrected]
GWAS in NHS Breast Cancer
Fig 1a. [Plot: log10(p-value) by position from pter to qter on chromosomes 1 to 22; incidence density sampling]
Labeled signals: TLR6 (38.5M), rs12505080 (37M), RELN, rs10510126 (125M), FGFR2 (123M), ATP10A, rs11150911
Heterogeneity in Signal: BCAC & CGEMS (NHS)

BCAC* best hits and their results in CGEMS:

SNP          GENE               CHR   TYPED IN CGEMS?   BEST IN 100kb REGION   p          RANK IN SCAN
rs1219648    FGFR2              10    2.00E-06          -                      -          1
rs889312     MAP3K1             5     no                rs726501               0.012      6,340
rs3817198    LSP1               11    0.51              -                      -          269,442
rs2107425    H19                11    no                rs217228               0.030      14,344
rs13281615   -                  8     no                rs10098985             0.017      12,048
rs981782     -                  5     no                rs4866929              7.30E-05   33
rs30099      -                  5     0.88              -                      -          462,781
rs4666451    -                  2     no                rs12710697             0.046      24,741
rs3803662    TNRC9/LOC643714    16    0.05              -                      -          27,025

* BCAC: Easton Nature 2007; 3 Stage Design; >28,000 cases/26,000 controls
CGEMS: http://cgems.cancer.gov; 1145 cases/1142 controls
Figure 2
FGFR2 SNPs in NHS GWAS: Intron 2
[Plot: log10(p-value) by position across FGFR2, 123.2 to 123.4]
Pooled: p = 1.1 x 10^-10; cases/controls 2,921/3,214
Hunter et al Nat Genet 2007
General Strategy for Breast Cancer GWAS
1. Initial Study: 1150 cases/1150 controls; 540,000 Tag SNPs
2. Follow-up Study #1: 4000 cases/4000 controls; >33,000 SNPs
3. Follow-up Study #2: 5000 cases/5000 controls; at least 7,600 SNPs
4. Fine Mapping: 10 ±5 loci
Two Strategies for Staged Follow-up
[Diagram: two panels, each a staged funnel from the 550k GWAS to 28000 (F/U #1) to 150 (F/U #2); labels "Cone of Truth" and "Truth"]
Aggressive Prostate Cancer
High priority to examine non-aggressive vs aggressive
Cohort based studies (screening)
• Bias towards early cases
Enrich primary scan with >55% aggressive cases
• Aggressive defined as: Gleason score ≥ 7 OR Stage C/D
Follow-up studies in cohorts
• Comparable distributions for early/advanced
• Approximately same ratio overall for follow-up studies
Multiple potential genetic models
[Diagram: models linking no cancer, non-aggressive cancer and aggressive cancer]
• Simple process: predisposition gene (no cancer to cancer)
• Independent processes: non-aggressive specific gene; aggressive specific gene
• Multistep process: initiation gene; progression gene
• Accelerator gene
For each SNP type, the mode of expression may be recessive, dominant, additive, multiplicative or even overdominant.
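The modes of expression listed above are often turned into regression covariates by recoding the number of risk alleles a person carries. The sketch below shows one conventional coding; it is an illustration rather than the coding used in CGEMS, and `encode_genotype` is a hypothetical helper. A multiplicative model is not given its own row because, in a log-linear model such as logistic regression, the additive coding already implies that each extra copy multiplies the odds by the same factor.

```python
def encode_genotype(risk_allele_count, model):
    """Map a 0/1/2 risk-allele count to a covariate for the chosen model."""
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("risk_allele_count must be 0, 1 or 2")
    codings = {
        "additive": (0, 1, 2),      # effect grows with each copy
        "dominant": (0, 1, 1),      # one copy is enough
        "recessive": (0, 0, 1),     # two copies are required
        "overdominant": (0, 1, 0),  # only heterozygotes differ
    }
    if model not in codings:
        raise ValueError(f"unknown model: {model}")
    return codings[model][risk_allele_count]

# Covariate values for a heterozygote under each model
heterozygote = {m: encode_genotype(1, m)
                for m in ("additive", "dominant", "recessive", "overdominant")}
```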
Inclusion of PLCO prostate cancer patients
[Figure: case accrual over time, axis 1994 to 2002; series: Aggressive Cancer, Non-aggressive Cancer; labeled values: 737 (Oct 2003), 624 (Oct 2001)]
28,521 eligible participants (1994)
Matching with controls was performed for 737 aggressive cases and 493 randomly selected non-aggressive cases.
Non-aggressive: stage <= 2 (non-invasive) and Gleason score <= 6
Aggressive: stage > 2 (invasive) or Gleason score > 6
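The case definition above translates directly into a small rule. The sketch assumes stage and Gleason score are available as numbers; the function name is illustrative. The last lines compute the share of aggressive cases among the 737 aggressive and 493 non-aggressive cases that were matched with controls.

```python
def classify_prostate_case(stage, gleason):
    """Apply the slide's cutoffs: aggressive if stage > 2 or Gleason > 6."""
    if stage > 2 or gleason > 6:
        return "aggressive"
    return "non-aggressive"

aggressive_cases = 737
non_aggressive_cases = 493
aggressive_fraction = aggressive_cases / (aggressive_cases + non_aggressive_cases)
print(f"{aggressive_fraction:.1%}")  # 59.9%
```

The resulting share of roughly 60% is consistent with the stated aim of enriching the primary scan with more than 55% aggressive cases.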
Log-Log Quantile Plot for p-values for the 4 Statistical Tests Used
[Plot: Log10(p value) versus Log10(quantile). Series: "Sing. Sampl. No cov", "Incid. Den. Sampl. No cov", "Sing. Sampl. with cov", "Incid. Den. Sampl. with cov"]
CGEMS Prostate Cancer GWS
[Plot: Log10(p-value) by position along the p and q arms of chromosomes 1 to 22 and X; incidence density sampling]
Labeled signal: 8q24
General Strategy for Prostate GWAS
1. Initial Study: 1150 cases/1150 controls; 540,000 Tag SNPs
2. Follow-up Study #1: 3700 cases/3900 controls; >28,000 SNPs
3. Follow-up Study #2: 5500 cases/5500 controls; at least 7,600 SNPs
4. Fine Mapping: 10 ±5 loci; Genotype, Haplotype, Sequence; Determine Causal Variant(s)
Studies: PLCO; ACS/ATBC/HPFS/FrCC/PHS; MEC/EPIC/JHU/SwCaP
Breakdown of Agnostic 26,890 SNPs in Prostate Cancer Follow-up #1
Single SNP Analysis: identify “best” 30,000 (i.e. lowest p values)
• Incidence Density Sampling
• Covariates (age in 5-year intervals and center)
• Score test (4 df)
Two-SNP analysis
• Stratified for each notable SNP
• Inclusion of those with improved p values
FILTER: Selection was sequential, such that any SNP with r2 > 0.8 with an already selected SNP was not selected.
Tally
• Single SNP = 24,988 with α < 0.063
• Two SNP = 1,902 (7.6%)
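The sequential filter described above can be sketched as a greedy pass over SNPs ranked by p-value. This is an illustration under stated assumptions, not the CGEMS implementation: the `select_snps` function, the `lookup_r2` helper and the toy SNP names and r2 values are all hypothetical.

```python
def select_snps(ranked_snps, r2, max_r2=0.8):
    """Walk SNPs from best to worst p-value and skip any SNP whose r2
    with an already selected SNP exceeds max_r2."""
    selected = []
    for snp in ranked_snps:
        if all(r2(snp, kept) <= max_r2 for kept in selected):
            selected.append(snp)
    return selected

# Hypothetical pairwise r2 values for three toy SNPs
toy_r2 = {
    frozenset(("snp_a", "snp_b")): 0.95,
    frozenset(("snp_a", "snp_c")): 0.10,
    frozenset(("snp_b", "snp_c")): 0.05,
}

def lookup_r2(first, second):
    """Return the toy r2 for a pair, or 0.0 when the pair is not listed."""
    return toy_r2.get(frozenset((first, second)), 0.0)

chosen = select_snps(["snp_a", "snp_b", "snp_c"], lookup_r2)
print(chosen)  # ['snp_a', 'snp_c']

# Tally from the slide: single-SNP plus two-SNP selections
agnostic_total = 24_988 + 1_902
```

Because the pass is ordered by p-value, the better-ranked member of each highly correlated pair is the one that survives. The two tallies add up to the 26,890 agnostic SNPs named in the slide title.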
iSelect Composition for Prostate Scan
Of 28,880 bead types designed, approximately 1,300 (4.5%) failed design and/or manufacturing
• Agnostic 1 SNP: 86.5%
• Agnostic 2 SNP: 6.6%
• Remaining categories, in slide order (Population Stratification Monitors 8q24 Illumina 550 SNPs + Candidate Coverage): 5.2%, 0.5%, 1.2%
Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men
Freedman et al. PNAS 2006;103:14068-73.
DECODE: rs1447295 + microsatellite
Amundadottir et al. Nat Genet. 2006;38:652-58.
Prostate Cancer & 8q24
• rs1447295 replication in BPC3 & 3 other studies
• Commonly amplified in prostate tumors
• “Gene-poor region”
• GWAS: multiple signals
Yeager et al Nat Genet 39:645-649, 2007
[Plot labels: -log10, r2]
Replication Studies in CGEMS Prostate Cancer GWAS
rs6983267
                Subjects             Predisposing allele frequency
Study     Cases     Controls     Cases     Controls     P-value
PLCO      1157      1172         0.55      0.49         2.4 x 10^-5
ACS       1151      1150         0.55      0.50         3.2 x 10^-3
ATBC      896       894          0.57      0.51         1.9 x 10^-3
FPCC      459       455          0.56      0.51         1.2 x 10^-1
HPFS      636       625          0.57      0.51         1.0 x 10^-2
ALL       4299      4296         0.56      0.50         9.4 x 10^-13

rs1447295
          Predisposing allele frequency
Study     Cases     Controls     P-value
PLCO      0.14      0.10         9.8 x 10^-5
ACS       0.12      0.08         2.7 x 10^-5
ATBC      0.21      0.17         2.9 x 10^-2
FPCC      0.12      0.07         4.4 x 10^-3
HPFS      0.13      0.09         2.7 x 10^-3
ALL       0.15      0.11         1.5 x 10^-14

Estimated Odds Ratios (overall)
                Heterozygotes     Homozygotes
rs6983267       1.26              1.58
rs1447295       1.43              2.23
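For orientation, a per-allele odds ratio can be computed directly from the pooled allele frequencies for rs6983267 (0.56 in cases, 0.50 in controls). This is a simple allelic calculation, not the genotype-specific model behind the odds ratios on the slide, so the two need not agree exactly; the function name is illustrative.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

or_all = allelic_odds_ratio(0.56, 0.50)
print(round(or_all, 2))  # 1.27
```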
Ancestral Recombination Graph Analysis* of the 8q24 region identifying independent regions of association flanking a hotspot of recombination.
Minichiello & Durbin AJHG 2006
Population Attributable Risk of Prostate Cancer with 8q24 Loci in Caucasians
                  ALL      ACS      ATBC     FPCC     HPFS     PLCO
Joint PAR         0.284    0.255    0.251    0.306    0.249    0.347
PAR rs1447295     0.085    0.094    0.052    0.096    0.085    0.086
PAR rs6983267     0.209    0.192    0.157    0.091    0.180    0.276

rs6983267 G: 21%
rs1447295 A: 7%
• Suggests that both SNPs contribute substantially to the population burden of prostate cancer.
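One textbook way to obtain a population attributable risk from genotype-specific odds ratios is a Levin-type formula, sketched below. It assumes Hardy-Weinberg genotype frequencies in the population and treats odds ratios as relative risks; the slides do not state how the PAR values were computed, so this is not necessarily the method used. The inputs are the overall odds ratios and approximate pooled control allele frequencies from the replication slide, and the function name is illustrative.

```python
def par_from_genotype_ors(risk_allele_freq, or_het, or_hom):
    """Levin-type PAR with Hardy-Weinberg genotype frequencies,
    treating odds ratios as relative risks (an approximation)."""
    p = risk_allele_freq
    f_het = 2 * p * (1 - p)  # heterozygote frequency
    f_hom = p * p            # risk-allele homozygote frequency
    excess = f_het * (or_het - 1) + f_hom * (or_hom - 1)
    return excess / (1 + excess)

# Approximate inputs read from the replication slide
par_rs6983267 = par_from_genotype_ors(0.50, 1.26, 1.58)
par_rs1447295 = par_from_genotype_ors(0.11, 1.43, 2.23)
```

Under these assumptions the sketch gives values close to, but not identical with, the pooled PAR entries in the table, which is expected given the rounded inputs and the simplified model.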
Follow-up to GWAS Studies
Fine Mapping of Notable Regions
• Genotyping & Sequencing
• Bio-informatics (exclude common CNV)
• Analysis of Population Genetics
Functional Determination of Causal Variant(s)
Design Issue for Analysis in Clinical Studies
• Population-based studies
Sequence of Clinical Studies
• Validation in Follow-up Studies
• Clinical Implementation
CGEMS: caBIG Posting Pre-Computed Analysis
Pre-computed Analysis: No Restrictions
Raw Genotype, Case/control, Age (in 5 yrs), Family Hx (+/-): Registered Access (SF424, Data Use Certificate)
http://cgems.cancer.gov/data
Access to CGEMS Analyses through caBIG Portal
Pre-computed Analyses with Methods PDF
• Prostate 1A Scan: 300,000 SNPs, Oct 2006
• Prostate 1 Scan: 530,000 SNPs, Feb 2007
• Breast 1 Scan: 530,000 SNPs, April 2007
Registered Access (individual and institutional access)
• Signed SF424 (modified) with Abstract
• Data Use Certificate
Association Tests 8q24 Scan 1A ~300,000 SNPs
http://cgems.cancer.gov Available 10/06
Committed Studies CGEMS
Prostate Cancer: PLCO (GWAS), ACS, HPFS, PHS, ATBC, CeRePP, EPIC, MEC
Breast Cancer: NHS (GWAS), PLCO, WHI, Polish C/C, ACS, EPIC, MEC
Acknowledgements
CGEMS & DCEG: Gilles Thomas, Kevin Jacobs, Meredith Yeager, Robert Hoover, Joseph Fraumeni, Daniela Gerhard, Zhaoming Wang, Xiang Deng, Nick Orr, Robert Welch, Richard Hayes, Sholom Wacholder, Nilanjan Chatterjee, Kai Yu, Margaret Tucker, Marianne Rivera-Silva
HSPH: David Hunter, Peter Kraft, David Cox, Sue Hankinson
CeRePP, France: Olivier Cussenot, Geraldine Cancel-Tassin, Antoine Valeri
ACS: Michael Thun, Heather Feigelson, Eugenia Calle
Wellcome Trust, UK: Mark Minichiello
NPHI, Finland: Jarmo Virtamo
NCICB: Ken Buetow, Carl Schaefer, Subhah Madhavan, Liming Yang
Wash. U., St Louis: Gerald Andriole