NIH Epigenomics Roadmap Data Analysis and Coordination Center

Exploring Monoallelic Methylation
Using High-throughput Sequencing

Cristian Coarfa, Ronald Harris,
Ting Wang, Aleksandar Milosavljevic, Joe Costello

Comparison of sequencing-based methods to
profile DNA methylation and identification of
monoallelic epigenetic modifications

Harris RA, Wang T, Coarfa C, Nagarajan RP, Hong C, Downey S, Johnson
BE, Delaney A, Zhao Y, Olshen A, Ballinger T, Zhou X, Fosberg KJ, Gu J,
Echipare L, O’Geen H, Lister R, Pelizzola M, Xi Y, Epstein CB, Bernstein BE,
Hawkins RD, Ren B, Chung WY, Gu H, Bock C, Gnirke A, Zhang MQ,
Haussler D, Ecker JR, Li W, Farnham PJ, Waterland RA, Meissner A, Marra
MA, Hirst M, Milosavljevic A, Costello JF.

In press, Nature Biotechnology
Biological importance of intermediate methylation levels


 1. Imprinting


 2. Non-imprinted monoallelic methylation


 3. Cell type-specific methylation


 4. Sites of inter-individual variation in methylation level

[Workflow diagram: two parallel assays feeding Illumina library construction and IGAII sequencing]

• Unmethylated CpGs: methylation-sensitive restriction digestion (MRE)
   – Combine parallel digests, ligate adapters, size-select 100-300 bp
   – ~20 million reads/sample
• Methylated CpGs: methyl DNA immunoprecipitation (MeDIP)
   – IP sonicated, adapter-ligated DNA, size-select 100-300 bp
   – ~100 million reads/sample



[Figure: data visualization with Methylated and Unmethylated tracks. The 5' CpG islands are unmethylated; the 3' CpG island is partially methylated.]
Unmethylated and Methylated patches within a CpG island
 1. High MeDIP, no or low MRE
 2. High MRE, no or low MeDIP
 3. High MRE and MeDIP (uniform)
 4. High MRE and MeDIP (patch methylation)
Intermediate methylation levels at imprinted genes
Initial catalogue of Intermediate methylation sites

 Chr      Start     Stop      MRE      MeDIP     Distance to nearest gene   Gene
 chr1     ...
 ...
 chr11    1533281   1536667   1.0342   91.9069   -205410                    HCCA2
 chr11    1946475   1948787   0.7769   58.5443   -18939                     LOC100133545
 chr11    1975141   1977439   1.2845   87.5516   0                          H19
 chr11    2245680   2250508   2.3451   99.4044   -29211                     C11orf21
 chr11    2420747   2423224   1.6565   29.5161   0                          KCNQ1
 ...
 chr22    ...
                                       Ting Wang, Washington University
 Using Genetic Variation to Detect Monoallelic
     Epigenomic and Transcription States


H1 cell line


1. Monoallelic DNA methylation (MRE and MeDIP)

2. Monoallelic expression (MethylC-seq and RNA-seq)

3. Monoallelic histone H3K4me3 (MethylC-seq and ChIP-seq)
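Monoallelic Epigenomic Marks and Expression

[Venn diagram of loci detected by three assay pairs. Counts are listed by their position in the figure.]

 MethylC-seq + RNA-seq (top circle): 21
 Between top and bottom-left circles: 0
 Between top and bottom-right circles: 1
 Center: 4
 MRE-seq + MeDIP-seq (bottom-left circle): 39
 Between bottom-left and bottom-right circles: 21
 MethylC-seq + ChIP-seq (bottom-right circle): 34
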
Intermediate methylation levels in POTEB

[Figure: tracks, top to bottom: CpG islands, MRE-seq 1, MeDIP-seq 1, MRE-seq 2, MeDIP-seq 2, Bisulfite, POTEB]


 Location                  MeDIP allele   Count   MRE allele   Count
 chr15:19346666-19350003   G              9       A            30

Validation of monoallelic DNA methylation in POTEB
Searching for Monoallelic Methylation
Using Shotgun Bisulfite Sequencing
• We expect streaks of methylation ratios near 50% (50±d%)
• Use 500 bp windows tiling CpG Islands
• Compute average CpG methylation
   – CpG Islands
   – 1000 loci
• Infer the distribution of methylation in the 1000 loci
• Select a subset of the 500 bp windows tiling CpG Islands
• In the selected windows, search for allele-specific
  methylation
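
The windowing steps above can be sketched roughly as follows. This is only an illustration, not the pipeline used for the talk: the `CpG` record, the function names, and the choice to average per-CpG ratios (rather than pooling reads) are all assumptions, and the `lower`/`upper` bounds are placeholders for whatever range the inferred distribution suggests.

```python
from dataclasses import dataclass

WINDOW = 500  # bp, window size named on the slide


@dataclass
class CpG:
    pos: int         # genomic coordinate of the CpG (hypothetical field)
    methylated: int  # bisulfite reads calling the CpG methylated
    total: int       # bisulfite reads covering the CpG


def window_means(island_start, island_end, cpgs, window=WINDOW):
    """Yield (start, end, mean percent methylation) for non-overlapping
    windows tiling one CpG island. Windows with no covered CpG are skipped."""
    for start in range(island_start, island_end, window):
        end = min(start + window, island_end)
        ratios = [
            100.0 * c.methylated / c.total
            for c in cpgs
            if start <= c.pos < end and c.total > 0
        ]
        if ratios:
            yield start, end, sum(ratios) / len(ratios)


def select_windows(windows, lower=30.0, upper=80.0):
    """Keep windows whose mean methylation lies within [lower, upper] percent."""
    return [w for w in windows if lower <= w[2] <= upper]


# Toy data, not real measurements.
cpgs = [CpG(100, 5, 10), CpG(300, 7, 10), CpG(600, 0, 10), CpG(1200, 10, 10)]
windows = list(window_means(0, 1300, cpgs))
selected = select_windows(windows)
```

In this toy input only the first window has an intermediate mean, so it is the only one kept for the allele-specific search.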
Average methylation over 500 bp windows
in CpG Islands and 1000 loci

[Chart: Average Methylation Scores over 500bp windows in CpG Islands and 1000 putative intermediate methylation loci. X axis: percent methylation (0 to 96). Y axis: % of windows (0.00% to 5.00%). Series: % of CpG Islands windows; % of windows in 1000 loci.]
                          Parameter Search
• Experimented with various lower and upper bounds for percent methylation
• Guidelines
    • Rediscover as many of the 1000 loci as possible
    • Reduce the overall number of 500bp windows


Lower   Upper   Number of 500bp   Number of 500bp windows   Fraction of 500bp windows   1000 loci
bound   bound   windows           overlapping 1000 loci     overlapping 1000 loci       overlapped
10      70      24793             2851                      0.114992135                 950
10      80      28060             3877                      0.138168211                 989
10      90      36677             5512                      0.15028492                  999
20      70      14084             2345                      0.166500994                 926
20      80      17351             3371                      0.19428275                  977
20      90      25968             5006                      0.192775724                 990
30      70      9403              1912                      0.20333936                  884
30      80      12670             2938                      0.231886346                 958
30      90      21287             4573                      0.21482595                  979

The 30-80% range rediscovers 958 of the 1000 loci, at the highest specificity.
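
The specificity column in the table is the number of overlapping windows divided by the total number of windows. A short sketch that recomputes it from the table's counts and picks the best bounds is shown here; the variable and function names are illustrative only, and ranking by this fraction alone is a simplification of the two guidelines.

```python
# (lower bound, upper bound, total 500bp windows,
#  windows overlapping the 1000 loci, loci overlapped), copied from the table.
rows = [
    (10, 70, 24793, 2851, 950),
    (10, 80, 28060, 3877, 989),
    (10, 90, 36677, 5512, 999),
    (20, 70, 14084, 2345, 926),
    (20, 80, 17351, 3371, 977),
    (20, 90, 25968, 5006, 990),
    (30, 70, 9403, 1912, 884),
    (30, 80, 12670, 2938, 958),
    (30, 90, 21287, 4573, 979),
]


def specificity(row):
    """Fraction of selected windows that overlap one of the 1000 loci."""
    _, _, n_windows, n_overlapping, _ = row
    return n_overlapping / n_windows


best = max(rows, key=specificity)
best_bounds = best[:2]
best_fraction = specificity(best)
```

Ranking by this fraction alone favors the 30-80 bounds; the number of loci rediscovered (last column) is the second criterion to weigh against it.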
 Incorporating Genetic Variation
• Search for allele-specific methylation
• Look only into the 30-80% methylation loci overlapping with CpG
  Islands
• Use heterozygous (het) SNPs
• Check for those that separate reads into different methylation states
    • One allele >20%
    • Other allele <20%
    • Other thresholding methods possible
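
A minimal sketch of the per-SNP check described above, assuming each bisulfite read overlapping a het SNP has already been reduced to a tuple of (allele, methylated CpG calls, total CpG calls). That input format and the function names are hypothetical; the 20% threshold is the one stated on the slide.

```python
from collections import defaultdict

THRESHOLD = 20.0  # percent, as stated on the slide


def per_allele_methylation(reads):
    """Pool CpG calls by SNP allele and return percent methylation per allele.

    `reads` is an iterable of (allele, methylated_calls, total_calls) tuples,
    a hypothetical pre-processed form of the reads overlapping one het SNP.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for allele, m, t in reads:
        methylated[allele] += m
        total[allele] += t
    return {a: 100.0 * methylated[a] / total[a] for a in total if total[a] > 0}


def is_allele_specific(levels, threshold=THRESHOLD):
    """True when exactly two alleles are seen, one above and one below threshold."""
    if len(levels) != 2:
        return False
    low, high = sorted(levels.values())
    return high > threshold and low < threshold


# Toy reads, not real data.
asm_reads = [("G", 9, 10), ("G", 8, 10), ("A", 0, 10), ("A", 1, 10)]
flat_reads = [("G", 5, 10), ("A", 5, 10)]

asm_levels = per_allele_methylation(asm_reads)
flat_levels = per_allele_methylation(flat_reads)
```

Here the first SNP separates reads into a mostly methylated allele and a mostly unmethylated one, while the second does not; other thresholding schemes could be substituted in `is_allele_specific`.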
                       Results
• Found 6295 heterozygous sites
• 586 sites have allele-specific methylation
• These overlap with 62 of the 1000 loci
   – 37 of the loci discovered using pairs of assays
   – 25 new loci
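Monoallelic Epigenomic Marks and Expression
Distribution of the 62 SBS-ASM loci

[Venn diagram of loci detected by three assay pairs. Counts are listed by their position in the figure.]

 MethylC-seq + RNA-seq (top circle): 1
 Between top and bottom-left circles: 0
 Between top and bottom-right circles: 0
 Center: 4
 MRE-seq + MeDIP-seq (bottom-left circle): 9
 Between bottom-left and bottom-right circles: 16
 MethylC-seq + ChIP-seq (bottom-right circle): 7
 Outside the circles: additional 25 loci
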
Breast Tissue
Allele-specific methylation
Determine informative heterozygous SNPs
Loci with monoallelic MRE-seq and MeDIP-seq
              Breast Tissue
•   Multiple cell types
     –   Different epigenotypes
     –   Same genotype
•   Identify monoallelic events
     –   Constitutional
     –   Tissue specific
•   Cell types for four individuals
     –   Conserved monoallelic marks
     –   Individual specific monoallelic marks
Integrate Array-based and Seq-based methods
 • Collaboration with Leo Schalkwyk and Jonathan Mill,
   King’s College, UK
 • Investigate same breast tissue samples

 • Insight
    – Cost
    – Results
        • # of ASM loci
        • Distribution of ASM loci identified by each method
    – Suggestions for designing future studies
                   Acknowledgements
NIEHS/NIDA: Joni Rutter, Tanya Barrett, Fred Tyson, Christine Colvis

EDACC: R. Alan Harris, Cristian Coarfa, Yuanxin Xi, Wei Li, Robert A. Waterland, Aleksandar
Milosavljevic

UCSF/GSC REMC: Raman Nagarajan, Chibo Hong, Sara Downey, Brett E. Johnson, Allen
Delaney, Yongjun Zhao, Marco Marra, Martin Hirst, Joseph Costello

 – UCSC: Tracy Ballinger, David Haussler

 – Washington University: Xin Zhou, Maximiliaan Schillebeeckx, Ting Wang

 – UCD: Lorigail Echipare, Henriette O’Geen, Peggy J. Farnham

UCSD REMC: Ryan Lister, Mattia Pelizzola, Bing Ren, Joseph Ecker

 – Cold Spring Harbor: Wen-Yu Chung, Michael Q. Zhang

Broad REMC: Hongcang Gu, Christoph Bock, Andreas Gnirke, Chuck Epstein, Brad Bernstein,
Alexander Meissner

								