					                        Supplementary Online Material

Open Chromatin Defined by DNaseI and FAIRE Identifies Regulatory Elements
                         That Shape Cell-Type Identity

Supplementary Methods

Cell culture: Vendor information and standard cell growth protocols for GM12878,
K562, HepG2, HeLa-S3, HUVEC, NHEK, and H1-ES cell lines can be found at the UCSC
ENCODE site. After growth, cells were divided and either prepared for DNase-seq
or fixed and frozen for FAIRE-seq and ChIP-seq. DNase-seq datasets from seven
lymphoblastoid lines were described previously (McDaniell et al. 2010). Due to
technical constraints, one GM12878 biological replicate for MYC ChIP and one K562
replicate for Pol II ChIP were not from the same growth used for DNaseI and FAIRE.
One biological growth of H1-ES was divided to generate two technical replicates
for DNase-seq. Cross-linking of H1-ES cells for FAIRE-seq was performed at
Cellular Dynamics using a standard ChIP protocol that differs in fixation time and
may have affected


DNase-seq/chip: DNase-seq was performed as described (Song and Crawford 2010),
with the only modification being that oligo 1b was synthesized with a 5′
phosphate to increase ligation efficiency. DNase-chip (Crawford et al. 2006;
Shibata and Crawford 2009) was performed on the same DNaseI-digested DNA using 1%
tiled arrays from NimbleGen/Roche.

FAIRE-seq/chip: FAIRE was performed as described (Giresi et al. 2007; Giresi
and Lieb 2009). DNA fragments were prepared for sequencing using the recommended
protocol, except that samples were amplified prior to gel extraction. Amplified
DNA fragments between 150 and 300 bp were excised from the gel, recovered using
the Qiagen Gel Extraction kit, and sequenced. FAIRE-chip was performed using 1%
tiled arrays from NimbleGen.

ChIP-seq/chip and input sequencing: ChIP was performed by cross-linking proteins
to DNA using 1% formaldehyde solution (Bhinge et al. 2007; The ENCODE Project
Consortium 2007). Immunoprecipitated DNA was sequenced on Illumina sequencers
(ChIP-seq) and hybridized to NimbleGen 1% Human ENCODE tiling arrays with input
DNA as the reference (ChIP-chip). Input sequencing data were generated for
GM12878, K562, HeLa-S3, HepG2, and HUVEC by cross-linking chromatin, shearing,
and reversing cross-links without immunoprecipitation. Input data were used to
create control models for F-Seq processing. For NHEK and H1-ES, a general control
model that corrected for sequencing and alignment biases, but not copy number
changes, was derived from the five input data sets.

Insulator assay: The insulator/enhancer-blocking assay was performed as
previously described (Bell et al. 1999). Briefly, the enhancer-blocking vector
was constructed by cloning an enhancer element, the beta-globin DNaseI HS2 site,
upstream of the neomycin resistance (NeoR) gene. The chicken beta-globin
insulator was used as a positive control. DNaseI HS sites were amplified from
human genomic DNA by PCR and cloned into the enhancer-blocking vector. CTCF
motifs were deleted using a site-directed mutagenesis kit. Enhancer blocking
value = -log2[(# colonies, DNaseI HS site vector)/(# colonies, chicken insulator
control)].
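A minimal Python sketch of the enhancer blocking value defined above. The function name and the colony counts in the usage line are hypothetical illustrations, not data from the assay.

```python
import math


def enhancer_blocking_value(hs_site_colonies, insulator_colonies):
    """-log2 of the colony ratio (DNaseI HS site vector / chicken insulator control).

    0 means the test site yields as many colonies as the chicken insulator control;
    positive values mean fewer colonies (stronger blocking), negative values more.
    """
    if hs_site_colonies <= 0 or insulator_colonies <= 0:
        raise ValueError("colony counts must be positive")
    return -math.log2(hs_site_colonies / insulator_colonies)


# Hypothetical counts: 150 colonies with the HS site vector vs 100 with the control.
print(round(enhancer_blocking_value(150, 100), 3))  # -0.585
```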

Exon arrays: RNA was extracted using TRIzol followed by cleanup on an RNeasy
column (Qiagen) that included a DNase step. RNA quality was checked using a
NanoDrop and an Agilent Bioanalyzer. RNA (1 µg) was then processed according to
the Affymetrix Whole Transcript (WT) Sense Target Labeling protocol, which
included a riboreduction step. Fragmented, biotin-labeled cDNA was hybridized for
16 hours to Affymetrix Exon 1.0 ST arrays and scanned on an Affymetrix Scanner
3000 7G. The resulting CEL files were normalized in Affymetrix Expression Console
using sketch quantile normalization, the "PM" and "RMA background correction"
settings, and median polish summarization. Probe sets without annotations at the
core level were discarded. For genes with multiple probe sets, the maximum
expression value was retained. The final expression level for each gene was the
average expression value (cell lines with two replicates) or the median
expression value (cell lines with one or three replicates).
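A minimal sketch of the per-gene summarization, assuming the maximum is taken across probe sets within each replicate before replicates are combined (the order of these two steps is an assumption). The nested-list layout and function name are hypothetical.

```python
from statistics import mean, median


def summarize_gene(probe_sets):
    """Collapse one gene's probe sets and replicates into a single log2 expression value.

    probe_sets[i][r] is the RMA value of probe set i in replicate r (hypothetical layout).
    """
    n_reps = len(probe_sets[0])
    # Keep the maximum probe-set value within each replicate.
    per_replicate = [max(ps[r] for ps in probe_sets) for r in range(n_reps)]
    # Two replicates: average; one or three replicates: median.
    return mean(per_replicate) if n_reps == 2 else median(per_replicate)


print(summarize_gene([[5.0, 7.0], [6.0, 6.5]]))  # 6.5
```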

Sequence data processing: Sequence data from all experiments were aligned to the
human reference genome (NCBI Build 36, March 2006) using MAQ (Li et al. 2008)
with default settings. Alignments were filtered to remove problematic repetitive
regions (alpha satellites and rRNA) and likely PCR artifacts, characterized by
many sequences mapped to small genomic locations. At most 5 sequences were
counted at any one genomic position. Replicate data for each cell
line/experiment were compared to ensure reproducibility and then combined. A
base-pair resolution signal reflecting open chromatin levels (DNase-seq and
FAIRE-seq) or binding events (ChIP-seq) was generated using F-seq (Boyle et al.
2008).
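A minimal sketch of the five-sequences-per-position cap, assuming reads are represented as hypothetical (chromosome, position) pairs and that the first five reads encountered at a position are kept (the source does not say which reads were retained).

```python
from collections import Counter


def cap_reads_per_position(reads, cap=5):
    """Yield reads in input order, dropping any beyond `cap` at the same genomic position.

    `reads` is an iterable of (chrom, position) pairs (a hypothetical representation).
    """
    seen = Counter()
    for chrom, pos in reads:
        seen[(chrom, pos)] += 1
        if seen[(chrom, pos)] <= cap:
            yield chrom, pos


# Eight reads pile up at chr1:100; only five of them are kept.
reads = [("chr1", 100)] * 8 + [("chr1", 250), ("chr2", 100)]
kept = list(cap_reads_per_position(reads))
print(len(kept))  # 7
```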

Discrete peaks were determined by fitting the signal data to a gamma
distribution and determining the signal value corresponding to a given p-value
cut-off (DNase-seq and ChIP-seq, p = 0.05; FAIRE-seq, p = 0.1). Gene-relative
categories were defined as follows: (i) promoter regions: overlap the 2 kb
upstream of any TSS; (ii) 5′ regions: overlap the first exon or first intron;
(iii) intragenic regions: overlap an internal exon or intron; (iv) 3′ regions:
overlap the last exon or the 2 kb downstream of the end of transcription; and
(v) intergenic regions: not within any previous category. Sites were assigned to
the first category whose criterion was satisfied. Cell-type selective and
ubiquitous open chromatin sites were identified using the top 100K sites from
each cell type. A union set of significant sites identified by DNase-seq and
FAIRE-seq was created, and the significance of combined peaks was calculated
using Fisher's combined probability test (Fisher 1925; Mosteller and Fisher
1948).
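The gamma-based peak cut-off and the combined significance step can be sketched in pure Python as follows. The method-of-moments gamma fit, the use of a site's gamma upper-tail probability as its p-value, and all function names are assumptions for illustration; F-seq's own fitting procedure may differ. The gamma tail uses the standard series and continued-fraction expansions of the regularized incomplete gamma function, and Fisher's test uses the closed-form chi-square tail for even degrees of freedom.

```python
import math
import statistics


def fit_gamma_moments(values):
    """Method-of-moments gamma fit to background signal values: returns (shape, scale)."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return mean * mean / var, var / mean


def gamma_sf(x, shape, scale):
    """Upper-tail probability P(X > x) for X ~ Gamma(shape, scale)."""
    z = x / scale
    if z <= 0.0:
        return 1.0
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    if z < shape + 1.0:
        # Series for the regularized lower incomplete gamma P(shape, z); Q = 1 - P.
        term = total = 1.0 / shape
        for n in range(1, 10_000):
            term *= z / (shape + n)
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - math.exp(log_prefactor) * total
    # Continued fraction for Q(shape, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - shape
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - shape)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_prefactor) * h


def gamma_threshold(shape, scale, p_cutoff):
    """Signal value t with P(X > t) = p_cutoff, found by bisection."""
    lo, hi = 0.0, shape * scale  # bracket starts at the distribution mean
    while gamma_sf(hi, shape, scale) > p_cutoff:
        lo, hi = hi, 2.0 * hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_sf(mid, shape, scale) > p_cutoff:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Even-df chi-square upper tail: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical background signal, peak cut-off, and one site's DNase/FAIRE p-values.
signal = [0.4, 1.1, 0.7, 2.5, 0.9, 3.8, 1.6, 0.2, 1.3, 0.8]
shape, scale = fit_gamma_moments(signal)
cutoff = gamma_threshold(shape, scale, 0.05)
combined = fisher_combined([gamma_sf(4.0, shape, scale), 0.01])
```

For the FAIRE-seq cut-off, the same `gamma_threshold` call would use 0.1 instead of 0.05.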

Cell Line    Assay         Reps   Lanes    Sequences (M)           ROC                       Cross-Rep
GM12878      DNaseHS       3      8+4+20   50.3+12.0+99.3=161.7    0.907/0.854/0.947/0.946   0.783/0.944/0.841
GM12878      FAIRE         3      5+5+9    21.5+27.6+41.4=90.5     0.906/0.899/0.871/0.947   0.649/0.632/0.603
K562         DNaseHS       2      2+7      10.3+29.2=39.5          0.863/0.859/0.881         0.876
K562         FAIRE         2      6+7      47.1+44.9=92.0          0.821/0.787/0.818         0.767
HepG2        DNaseHS       3      3+3+6    14.0+13.3+24.8=52.1     0.912/0.929/0.964/0.959   0.859/0.867/0.930
HepG2        FAIRE         3      6+6+6    51.3+41.7+48.1=141.1    0.900/0.920/0.896/0.939   0.693/0.677/0.770
HeLaS3       DNaseHS       3      2+3+6    11.1+15.1+19.1=55.3     0.934/0.932/0.945/0.962   0.788/0.817/0.937
HeLaS3       FAIRE         2      6+6      19.0+36.4=55.4          0.801/0.805               0.487
HUVEC        DNaseHS       2      4+3      14.9+16.8=31.7          0.906/0.921/0.926         0.935
HUVEC        FAIRE         2      8+8      40.0+34.0=74.0          0.773/0.753/0.775         0.818
NHEK         DNaseHS       2      4+4      19.2+22.1=41.3          0.885/0.901/0.910         0.942
NHEK         FAIRE         2      5+5      50.0+49.0=99.0          0.902/0.907/0.923         0.828
H1-ES        DNaseHS       2      6+6      47.6+53.7=101.3         N/A                       0.923
H1-ES        FAIRE         2      5+5      77.0+78.5=155.5         N/A                       0.620

            Supplementary Table 1. DNase-seq and FAIRE-seq cell line statistics. Cells
            were prepared in 2 or 3 separate growths, or replicates, for each cell line (column 3).
            DNase-seq and FAIRE-seq were performed independently on each replicate, with
            differing numbers of Illumina GAII sequencing lanes (column 4). Sequence counts
            are in millions of sequences (column 5). DNase-chip and FAIRE-chip were also
            performed and used for Receiver Operating Characteristic (ROC) analyses (column 6).
            Positive peaks from chip data were determined using ChIPOTle (Buck et al. 2005). A
            similar number of "negative" peaks was created by randomly selecting the same
            number of low-signal chip regions of comparable size. The last ROC score in each
            row is for the combined replicate dataset. Comparisons across replicates were
            performed to measure reproducibility (column 7). DNaseHS cross-replicate tests
            measured the proportion of the top 50K sites from one replicate found in the top
            100K sites of the second replicate. FAIRE cross-replicate tests measured the
            proportion of the top 50K sites from one replicate found in the top 200K sites of
            the second replicate. For 3 replicates, the cross-replicate values correspond to
            rep1 vs rep2, rep1 vs rep3, and rep2 vs rep3, respectively. Each value is the
            average of this analysis in both directions (i.e., rep1->rep2 and rep2->rep1).
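The cross-replicate statistic can be sketched as follows, assuming each replicate is a hypothetical list of non-overlapping (chrom, start, end, score) peaks and that a top-ranked peak counts as "in" the other replicate's top set when the two intervals overlap by at least 1 bp (the overlap rule and all function names are assumptions).

```python
import bisect
from collections import defaultdict


def top_peaks(peaks, n):
    """The n highest-scoring peaks; peaks are (chrom, start, end, score) tuples."""
    return sorted(peaks, key=lambda p: p[3], reverse=True)[:n]


def build_index(peaks):
    """Per-chromosome sorted starts/ends (peaks within one set assumed non-overlapping)."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _score in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def overlaps_any(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any indexed peak."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_left(starts, end) - 1  # last peak starting before `end`
    return i >= 0 and ends[i] > start


def one_way(rep_a, rep_b, n_query, n_target):
    """Fraction of rep_a's top n_query peaks that overlap rep_b's top n_target peaks."""
    index = build_index(top_peaks(rep_b, n_target))
    query = top_peaks(rep_a, n_query)
    hits = sum(overlaps_any(index, c, s, e) for c, s, e, _ in query)
    return hits / len(query)


def cross_replicate(rep_a, rep_b, n_query=50_000, n_target=100_000):
    """Average of both directions; use n_target=200_000 for the FAIRE comparison."""
    return 0.5 * (one_way(rep_a, rep_b, n_query, n_target)
                  + one_way(rep_b, rep_a, n_query, n_target))


a = [("chr1", 100, 200, 9.0), ("chr1", 500, 600, 8.0), ("chr2", 100, 200, 1.0)]
b = [("chr1", 150, 250, 5.0), ("chr2", 1000, 1100, 4.0)]
print(cross_replicate(a, b, n_query=2, n_target=2))  # 0.5
```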

                          2 kb upstream   1st exon and    Internal exons  Last exon and   Intergenic
                          of TSS          intron          and introns     2 kb downstream
DNase+FAIRE vs Random     1.83e-05        0.7826          0.08745         1.31e-04        4.73e-04
DNase-only vs Random      5.55e-05        6.77e-07        0.1155          2.92e-04        1.72e-06
FAIRE-only vs Random      0.8211          0.01306         0.4576          0.64            0.06542
DNase-only vs FAIRE-only  3.81e-06        0.9281          2.93e-06        6.33e-04        1.16e-05

Supplementary Table 2. Significance of distributions of sites categorized by
relationships to genes. The average percentages of sites found in each cell line
by both DNase-seq and FAIRE-seq, by DNase-seq only, and by FAIRE-seq only in each
of the gene-relative categories (see Methods) were compared to those of randomly
distributed regions (see Fig 2B). This table shows the significance of these
differences based on t-tests. In addition, the distributions of DNase-only and
FAIRE-only sites were compared with each other using t-tests.
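The first-match assignment of sites to gene-relative categories described in the Methods can be sketched as below. The `Transcript` record, half-open coordinates, category labels, and the assumption that all transcripts lie on the site's chromosome are hypothetical; the 2 kb flanks and the priority order follow the Methods.

```python
from dataclasses import dataclass

PRIORITY = ("promoter", "5prime", "intragenic", "3prime")


@dataclass
class Transcript:
    strand: str  # "+" or "-"
    exons: list  # [(start, end), ...], half-open genomic coordinates


def _overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return max(a[0], b[0]) < min(a[1], b[1])


def _regions(tx, flank):
    """Category intervals in 5'->3' orientation (minus strand is mirrored)."""
    exons = sorted(tx.exons)
    if tx.strand == "-":
        exons = sorted((-end, -start) for start, end in exons)
    tss, tes = exons[0][0], exons[-1][1]
    second = exons[1][0] if len(exons) > 1 else exons[0][1]
    return {
        "promoter": (tss - flank, tss),          # 2 kb upstream of the TSS
        "5prime": (tss, second),                 # first exon + first intron
        "intragenic": (second, exons[-1][0]),    # internal exons and introns
        "3prime": (exons[-1][0], tes + flank),   # last exon + 2 kb downstream
    }


def classify(site, transcripts, flank=2000):
    """Assign a (start, end) site to the first category whose criterion it meets."""
    start, end = site
    regions = [(tx.strand, _regions(tx, flank)) for tx in transcripts]
    for category in PRIORITY:
        for strand, reg in regions:
            query = (-end, -start) if strand == "-" else (start, end)
            if _overlaps(query, reg[category]):
                return category
    return "intergenic"


plus = Transcript("+", [(10_000, 10_500), (12_000, 12_200), (15_000, 16_000)])
print(classify((11_000, 11_100), [plus]))  # 5prime (first intron)
```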

             GM12878    K562      HeLaS3   HepG2     HUVEC     NHEK     H1-ES
GM12878      1          0.351     0.296    0.315     0.346     0.332    0.342
K562         0.257      1         0.322    0.324     0.345     0.311    0.398
HeLaS3       0.249      0.372     1        0.348     0.365     0.373    0.387
HepG2        0.312      0.439     0.413    1         0.416     0.389    0.479
HUVEC        0.230      0.315     0.294    0.280     1         0.326    0.344
NHEK         0.246      0.316     0.329    0.291     0.364     1        0.333
H1-ES        0.264      0.382     0.332    0.356     0.354     0.330    1

Supplementary Table 3. Pairwise overlap of open chromatin regions between
seven cell types.

              Top 25K Union        Top 50K Union        Top 100K Union       Permuted
No. of cell   Only Top   All       Only Top   All       Only Top   All       Top 100K
lines         25K        Peaks     50K        Peaks     100K       Peaks     Union
1             32566      2345      76225      13934     173999     66386     487836
2             8859       3812      20093      16482     45972      54908     65097
3             4791       4901      10341      16982     22586      44189     11358
4             3381       5706      6294       15940     13783      34444     2607
5             3069       6219      5033       14823     9920       27749     725
6             3575       7951      5383       16222     9092       24550     249
7             8169       33466     15764      44750     25883      49009     82
Total         64400      64400     139133     139133    301235     301235    567954

Supplementary Table 4. Number of cell lines in which open chromatin sites from
each union set are found. Union sets were built from the top 25K, 50K, and 100K
peaks of each cell line. For each peak in a union set, the number of cell lines in
which that peak is found was counted. For each union set, the first column counts
only peaks within that union set; the second column considers all peaks found to
be significant in each cell type. The last column considers a union set built from
100K permuted genomic locations in each cell line.
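One way to produce counts like those in the "Only Top" columns is sketched below. It assumes a union set is formed by merging overlapping peaks from all cell lines into single regions and counting the distinct cell lines contributing to each merged region; the merging rule and the function name are assumptions.

```python
from collections import Counter


def union_cell_counts(peaks_by_cell):
    """Histogram {number of cell lines: number of merged union regions}.

    peaks_by_cell: {cell: [(chrom, start, end), ...]} with half-open coordinates.
    """
    intervals = sorted(
        (chrom, start, end, cell)
        for cell, peaks in peaks_by_cell.items()
        for chrom, start, end in peaks
    )
    histogram = Counter()
    cur_chrom, cur_end, cur_cells = None, None, set()
    for chrom, start, end, cell in intervals:
        if chrom != cur_chrom or start >= cur_end:
            # Close the previous merged region and start a new one.
            if cur_cells:
                histogram[len(cur_cells)] += 1
            cur_chrom, cur_end, cur_cells = chrom, end, {cell}
        else:
            cur_end = max(cur_end, end)
            cur_cells.add(cell)
    if cur_cells:
        histogram[len(cur_cells)] += 1
    return histogram


peaks = {
    "A": [("chr1", 0, 100), ("chr1", 1000, 1100)],
    "B": [("chr1", 50, 150)],
    "C": [("chr1", 120, 130)],
}
print(sorted(union_cell_counts(peaks).items()))  # [(1, 1), (3, 1)]
```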

Supplementary Table 5. Coordinates of genomic regions used for the insulator
blocking assay (NCBI 36/hg18 assembly).

Rank  GM12878 TF       Enrichment / P-value  K562 TF        Enrichment / P-value  HeLaS3 TF   Enrichment / P-value
1     ISRE             3.34 / 0.00           EGR2           2.29 / 6.8e-07        ARP-1       2.68 / 0.0
2     IRF1             2.66 / 0.0            STAT5B         2.27 / 0.0            BACH1       2.61 / 0.0
3     IRF7             2.14 / 0.0            GATA1          2.22 / 0.0            ATF         2.12 / 0.0
4     MEF2A            2.00 / 0.0            GATA-X         2.18 / 0.0            PBX1b       2.11 / 1.78e-04
5     MEF2             1.92 / 0.0            LMO2           1.80 / 0.0            ATF2        2.01 / 0.0
6     NFKB1            1.73 / 0.0            GATA3          1.78 / 0.0            BACH2       1.99 / 0.0
7     OCT-x            1.66 / 0.0            MAX            1.75 / 0.0            CREB        1.79 / 0.0
8     RORalpha2        1.38 / 1.2e-04        ATF            1.68 / 7.0e-06        FOXL1       1.74 / 0.0
9     ELK1             1.36 / 0.0            PATZ1          1.63 / 0.0            FOXJ2       1.66 / 0.0
10    Tal-1alpha:TCF3  1.30 / 8.8e-05        MEIS1B:HOXA9   1.61 / 0.001          FOXF2       1.58 / 7.0e-06

Rank  HepG2 TF     Enrichment / P-value  HUVEC TF  Enrichment / P-value
1     HNF4         4.55 / 0.0            SOX9      2.05 / 0.0
2     NR2F,        3.67 / 0.0            ELK1      1.68 / 0.0
3     HNF1         3.60 / 0.0            NFAT      1.63 / 0.0
4     FOXD1        2.68 / 0.0            FOXL1     1.54 / 0.0
5     FOXC1        2.20 / 0.0            STAT5B    1.53 / 0.0
6     FOXF2        1.74 / 0.0            LHX3      1.45 / 0.0
7     ARP-1        1.73 / 6.8e-09        NF1       1.42 / 1.14e-08
8     FOXO3        1.64 / 0.0            FOXD3     1.42 / 0.0
9     HFH-3        1.45 / 0.0            TBP       1.42 / 0.0
10    RORalpha1    1.39 / 0.0            NKX3-1    1.38 / 0.0

Rank  NHEK TF        Enrichment / P-value  H1-ES TF  Enrichment / P-value
1     BACH2          1.85 / 0.0            GTF3A     2.67 / 0.0
2     BACH1          1.72 / 0.0            PATZ1     2.40 / 0.0
3     AP-1           1.57 / 0.0            RFX1      2.23 / 0.0
4     NFE2           1.55 / 0.0            OCT-x     2.17 / 0.0
5     TP53           1.52 / 0.0            E2F       2.03 / 0.0
6     NF1            1.23 / 5.0e-05        POU3F2    1.82 / 2.1e-07
7     MEIS1A:HOXA9   1.20 / 0.121          SP1       1.82 / 0.0
8     MYC:MAX        1.15 / 2.0e-06        TFAP2C    1.72 / 0.0
9     LHX3           1.12 / 6.6            TFAP2A    1.66 / 0.0
10    USF            1.11 / 0.0            MZF1      1.66 / 0.0

Supplementary Table 6. Top 10 enriched TRANSFAC motifs in cell-type
selective distal open chromatin. Motif enrichment of each transcription factor was
defined as the ratio of the predicted binding site frequency among peaks for that
cell type to the predicted binding site frequency among all cell-type selective
peaks from the other six cell lines. P-values were calculated from z-scores based
on a normal distribution.
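A sketch of the enrichment ratio and a z-score p-value, assuming "frequency" means the fraction of peaks containing at least one predicted binding site and that the z-score comes from a one-sided two-proportion z-test. Both are assumptions, since the exact test is not specified; the function name and counts are hypothetical.

```python
import math


def motif_enrichment(hits_cell, n_cell, hits_other, n_other):
    """Enrichment ratio plus a one-sided two-proportion z-test.

    hits_*: peaks containing >= 1 predicted site; n_*: total peaks in each set.
    """
    f_cell = hits_cell / n_cell
    f_other = hits_other / n_other
    enrichment = f_cell / f_other
    pooled = (hits_cell + hits_other) / (n_cell + n_other)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_cell + 1.0 / n_other))
    z = (f_cell - f_other) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return enrichment, z, p_value


enrichment, z, p = motif_enrichment(200, 1000, 100, 1000)
print(round(enrichment, 2))  # 2.0
```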

Supplementary Table 7. Expression values for transcription factors
corresponding to motifs discovered by the de novo algorithms CisFinder and
cERMIT, described in main text Figure 5A. Log2 RNA values were determined by
Affymetrix exon arrays for enriched transcription factors. Numbers in red
represent expression in the cell line in which the motif was enriched within
cell-type selective open chromatin sites. Cells highlighted in yellow are
transcription factors that are most highly expressed (1st or 2nd out of seven
cell lines) in the same cell type in which their motifs were enriched.

                 TSSs       Distal sites
Cell Line    (Figure S10)   (Figure 6)
GM12878           358         12324
K562              184          4706
HeLaS3            113          6267
HepG2             245          6461
HUVEC             161          5999
NHEK              168          8129
H1-ES             198          5523

Supplementary Table 8. Number of cell-type specific promoter and distal
regions detected in seven cell lines used for Figure 6 and Figure S10.

Open Chromatin (OC)                             Contains Annotated Gene      Contains No Annotated Genes
Correspondence                                  1 cell type  >1 cell type    1 cell type  >1 cell type
OC matches expression & Pol II                           21            31            N/A           N/A
OC matches expression only                                9            14            N/A           N/A
OC matches Pol II only                                   11             6             19            20
OC matches neither expression nor Pol II                 19             3             24             4
OC matches CTCF                                          40            45             32            23
OC matches none of expression, Pol II, or CTCF            8             1              9             0
Total                                                    60            54             43            24

Supplementary Table 9. Correspondence of cell types with open chromatin
enrichment in COREs and gene expression, Pol II binding, and CTCF binding.
Cell lines with significantly increased total open chromatin across each CORE
were identified by t-tests. For each CORE, we determined whether the cell type
with the top expressed gene (including genes within 10 kb upstream or downstream
of the CORE), the highest Pol II signal, and the highest CTCF signal matched the
cell line(s) with significant open chromatin. In eight instances, no cell type had
significantly more open chromatin; for these eight COREs, we used the single cell
type with the highest median open chromatin level and counted it in the
appropriate "1 cell type" column. The table thus shows the correspondence with
open chromatin for all 181 COREs. Some COREs did not contain any annotated genes
for which expression data were available. The Supplementary COREs Figures file
contains detailed plots of this information for each CORE.

 Supplementary Fig 1. Overlap between DNaseI and FAIRE sites for each of the
seven cell types. For each cell type, a table of DNaseI and FAIRE overlaps at each of
the four different signal category cut-offs is given followed by a pictorial
representation of these overlaps.

A. GM12878

                  FAIRE top 100K   FAIRE top 50K   FAIRE top 25K   FAIRE top 10K
DNase top 100K            43,790          32,831          20,672           9,133
DNase top 50K             32,497          26,206          17,890           8,548
DNase top 25K             18,792          16,076          12,003           6,627
DNase top 10K              8,215           7,237           5,768           3,611

B. K562

                  FAIRE top 100K   FAIRE top 50K   FAIRE top 25K   FAIRE top 10K
DNase top 100K            44,763          35,149          22,074           9,642
DNase top 50K             33,190          27,564          18,697           8,910
DNase top 25K             19,558          16,748          12,110           6,505
DNase top 10K              8,133           6,992           5,076           2,846

C. HeLaS3

                  FAIRE top 100K   FAIRE top 50K   FAIRE top 25K   FAIRE top 10K
DNase top 100K            39,582          30,059          19,706           9,132
DNase top 50K             31,552          25,372          17,652           8,642
DNase top 25K             19,912          17,166          13,046           7,217
DNase top 10K              8,972           8,200           6,879           4,517

D. HepG2

                  FAIRE top 100K   FAIRE top 50K   FAIRE top 25K   FAIRE top 10K
DNase top 100K            46,304          35,192          21,723           9,511
DNase top 50K             32,208          25,944          17,382           8,396
DNase top 25K             17,824          14,682          10,236           5,545
DNase top 10K              7,392           6,017           4,182           2,333

E. HUVEC

                  FAIRE top 100K   FAIRE top 50K   FAIRE top 25K   FAIRE top 10K
DNase top 100K            54,479          39,750          23,012           9,712
DNase top 50K             38,717          30,829          19,740           9,073
DNase top 25K             21,826          18,447          13,140           6,937
DNase top 10K              9,303           8,169           6,205           3,695


                       100K       50K            25K          10K      FAIRE
    100K                52,653    37,008          21,694       9,374
     50K                32,922    25,162          16,307       7,763
     25K                16,753    13,040           8,946       4,752
     10K                 6,623     5,032           3,407       1,855

            FAIRE top 100K       FAIRE top 50K         FAIRE top 25K      FAIRE top 10K
DNase top
DNase top
DNase top
DNase top

G. H1-ES

                  FAIRE top 100K   FAIRE top 50K   FAIRE top 25K   FAIRE top 10K
DNase top 100K            30,143          21,964          14,289           6,875
DNase top 50K             19,531          14,264           9,340           4,559
DNase top 25K             10,233           7,238           4,610           2,251
DNase top 10K              3,881           2,637           1,624             825

Supplementary Fig 2. DNase-only and FAIRE-only sites are often found in
multiple cell types. DNaseI sites that were not detected by FAIRE (A: DNase-only)
and FAIRE sites that were not detected by DNaseI (B: FAIRE-only) were identified
in each of the seven cell types, and we determined how often each site was
detected in one or more cell types.

Supplementary Fig 3. DNase-only and FAIRE-only sites are enriched for
multiple histone modifications. The x-axis represents positions relative to open
chromatin region midpoints, which were set to 0. The number of sequence tags for
three histone tail modifications (H3K4me1, H3K4me3, H3K9ac) was counted at
each base position relative to these midpoints for open chromatin identified by both
DNase-seq and FAIRE-seq (black), DNase-seq only (red), FAIRE-seq only (green),
and a set of 100,000 randomly defined regions (blue). DNase-only sites are enriched
for promoter-associated modifications (H3K4me3 and H3K9ac), while FAIRE-only
sites are enriched for a distal enhancer-associated modification (H3K4me1).
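The aggregate profiles can be sketched as a per-offset tally of tag positions around each region midpoint. This assumes tags are hypothetical single-base positions grouped by chromosome, and it ignores strand and fragment-length shifting, which a real analysis may apply; the function name and window size are also assumptions.

```python
import bisect


def aggregate_profile(midpoints, tags, window=1000):
    """Count tags at each offset in [-window, +window] around region midpoints.

    midpoints, tags: {chrom: [positions]}; returns a list indexed by offset + window.
    """
    profile = [0] * (2 * window + 1)
    for chrom, mids in midpoints.items():
        positions = sorted(tags.get(chrom, []))
        for mid in mids:
            lo = bisect.bisect_left(positions, mid - window)
            hi = bisect.bisect_right(positions, mid + window)
            for pos in positions[lo:hi]:
                profile[pos - mid + window] += 1
    return profile


print(aggregate_profile({"chr1": [100]}, {"chr1": [99, 100, 101, 500]}, window=2))
# [0, 1, 1, 1, 0]
```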

Supplementary Fig 4. Representative region of the genome showing ChIP-seq data
(CTCF, MYC, and Pol II) for seven cell types.

Supplementary Fig 5. Average ChIP-seq intensity scores for CTCF, MYC and Pol
II sites that overlap DNaseI and FAIRE (DNase+FAIRE), DNase-only, or FAIRE-
only peaks. "Neither" represents ChIP peaks that do not overlap DNase or FAIRE.
Box plots show all the sites in each category from seven cell lines. Outliers more than
1.5 times the interquartile range away from the box are not shown.

Supplementary Fig 6. Overlap of DNase-seq and FAIRE-seq data with
previously published ChIP-seq data. DNase-seq and FAIRE-seq identify the vast
majority of ChIP-seq-identified binding sites for most cell-specific factors.
ChIP-seq data are from Fujiwara et al. 2009 (GATA-1), Frietze et al. 2010
(ZNF263), Raha et al. 2010 (Pol III), Kouwenhoven et al. 2010 (TP63), and
Motallebipour et al. 2009 (FOXA1 and FOXA3).

Supplementary Fig 7. Clustering of seven cell lines based on open chromatin (A)
and expression (B). For open chromatin, we first defined regions that are open in at
least one cell type and then created a matrix for all seven cell types in which each
region was scored "1" in a cell type if it was open in that cell type and "0"
otherwise. We used the phylogenetic software MrBayes to output a consensus
posterior tree.
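A sketch of the presence/absence matrix, plus one way to write it as a NEXUS data block with `datatype=standard`, a data type MrBayes accepts for binary characters. Function names are hypothetical, the MrBayes run settings are not shown, and replacing hyphens in taxon names with underscores is an assumption made to keep NEXUS tokens simple.

```python
def binary_matrix(open_regions, union_regions):
    """Score each union region 1 if open in a cell type, else 0 (one string per cell type)."""
    return {
        cell: "".join("1" if region in regions else "0" for region in union_regions)
        for cell, regions in open_regions.items()
    }


def to_nexus(matrix):
    """Render a {taxon: '0101...'} matrix as a NEXUS data block (standard datatype)."""
    ntax = len(matrix)
    nchar = len(next(iter(matrix.values())))
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={ntax} nchar={nchar};",
        "  format datatype=standard;",
        "  matrix",
    ]
    for taxon, row in matrix.items():
        lines.append(f"    {taxon.replace('-', '_')} {row}")
    lines += ["  ;", "end;"]
    return "\n".join(lines)


# Hypothetical regions open in three cell types.
open_regions = {"GM12878": {"r1", "r2"}, "K562": {"r2", "r3"}, "H1-ES": {"r3"}}
union = sorted(set().union(*open_regions.values()))
print(to_nexus(binary_matrix(open_regions, union)))
```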

Supplementary Fig 8. Saturation plot showing the total number of DNaseI HS sites
discovered as a function of the number of cell types tested (x-axis). DNaseI HS site
peaks saturate much more quickly in seven similar GM lymphoblastoid lines
(triangles; see Methods) than in seven diverse cell types (circles). In general, the top
25k sites increase at a lower slope than the top 50k and 100k sites. However, for all
categories, the slope of the GM lymphoblastoid lines is lower than that of the diverse
cell lines.
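A sketch of how a saturation curve can be computed, assuming sites from different cell types have already been mapped to shared region identifiers. Averaging over every ordering of the cell types is an assumption (the ordering used for the figure is not stated); for seven cell types this is 5,040 orderings, so sampling random orderings would be cheaper on full-size site sets.

```python
from itertools import permutations


def saturation_curve(site_sets):
    """Cumulative number of distinct sites after adding each cell type in order."""
    seen = set()
    curve = []
    for sites in site_sets:
        seen |= sites
        curve.append(len(seen))
    return curve


def mean_saturation(site_sets):
    """Average the saturation curve over every ordering of the cell types."""
    orders = list(permutations(site_sets))
    totals = [0] * len(site_sets)
    for order in orders:
        for i, n in enumerate(saturation_curve(order)):
            totals[i] += n
    return [t / len(orders) for t in totals]


print(mean_saturation([{1}, {1, 2, 3}]))  # [2.0, 3.0]
```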

Supplementary Fig 9. Distributions of the sum of –log10 p-values for open chromatin
sites found in a single cell type (cell-type selective; x=1), in more than one cell
type (x=2-6), and in all seven cell types (ubiquitous; x=7).

Supplementary Fig 10. Deletions of CTCF motifs within three regions detected to
have strong insulator activity. Site-directed mutagenesis was used to remove the
CTCF motif from insulator regions #1, #23, and #44 (see Supplementary Table 5).
Removing the CTCF motif reduces the enhancer-blocking activity of region #1,
but not of regions #23 and #44. The chicken beta-globin insulator was used as a
positive control with known insulator activity. Approximately twice as many G418R
colonies result when using a vector with no insulator (Neg Cont) compared to the
chicken insulator or Ins 1, 23, and 44.

Supplementary Fig 11. Cell-type selective proximal open chromatin is linked to
cell-type selective expression. Cell-type selective open chromatin sites that map
within 2 kb of a transcription start site (TSS) were identified for each cell type
(x-axis), and expression values were determined for these sets of genes from that
cell type (blue box plots). Expression values were also calculated for matched gene
sets for each cell type that did not display an open chromatin site at the TSSs
(green box plots). Asterisks indicate significant differences in RNA levels
(pairwise t-tests).

Supplementary Fig 12. H1-ES specific open chromatin around NANOG. Cell-type
selective sites only found in H1-ES cells are highlighted in gray.

Supplementary Fig 13. K562 cell-type selective open chromatin around GATA1,
highlighted in gray.


Supplementary Fig 14. Cell types with significantly higher levels of open
chromatin show increased expression and higher Pol II and CTCF ChIP-seq signals
across all COREs. For each Cluster of Open Regulatory Elements (CORE), cell types
with significantly more open chromatin were identified. (A) Genes within 10 kb of
each CORE show higher overall expression in cell types with open chromatin (right)
than in those with more closed chromatin (left). Significantly open cell types
also have higher Pol II (B) and CTCF (C) ChIP-seq signals throughout the COREs.
All differences were statistically significant (one-sided t-tests).

Supplementary References

Bell AC, West AG, Felsenfeld G. 1999. The protein CTCF is required for the
       enhancer blocking activity of vertebrate insulators. Cell 98: 387-396.
Bhinge AA, Kim J, Euskirchen GM, Snyder M, Iyer VR. 2007. Mapping the
       chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic
       Enrichment (STAGE). Genome Res 17: 910-916.
Boyle AP, Guinney J, Crawford GE, Furey TS. 2008. F-Seq: a feature density
       estimator for high-throughput sequence tags. Bioinformatics 24: 2537-
Buck MJ, Nobel AB, Lieb JD. 2005. ChIPOTle: a user-friendly tool for the analysis
       of ChIP-chip data. Genome Biol 6: R97.
Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R,
       Meltzer PS, Wolfsberg TG, Collins FS. 2006. DNase-chip: a high-resolution
       method to identify DNase I hypersensitive sites using tiled microarrays.
       Nat Methods 3: 503-509.
Fisher RA. 1925. Statistical Methods for Research Workers. Oliver and Boyd,
Frietze S, Lan X, Jin VX, Farnham PJ. 2010. Genomic targets of the KRAB and
       SCAN domain-containing zinc finger protein 263. J Biol Chem 285: 1393-
Fujiwara T, O'Geen H, Keles S, Blahnik K, Linnemann AK, Kang YA, Choi K,
       Farnham PJ, Bresnick EH. 2009. Discovering hematopoietic mechanisms
       through genome-wide analysis of GATA factor chromatin occupancy. Mol
       Cell 36: 667-681.
Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. 2007. FAIRE (Formaldehyde-
       Assisted Isolation of Regulatory Elements) isolates active regulatory
       elements from human chromatin. Genome Res 17: 877-885.
Giresi PG, Lieb JD. 2009. Isolation of active regulatory elements from eukaryotic
       chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory
       Elements). Methods 48: 233-239.
Kouwenhoven EN, van Heeringen SJ, Tena JJ, Oti M, Dutilh BE, Alonso ME, de la
       Calle-Mustienes E, Smeenk L, Rinne T, Parsaulian L et al. 2010. Genome-
       wide profiling of p63 DNA-binding sites identifies an element that
       regulates gene expression during limb development in the 7q21 SHFM1
       locus. PLoS Genet 6: e1001065.
Li H, Ruan J, Durbin R. 2008. Mapping short DNA sequencing reads and calling
       variants using mapping quality scores. Genome Res 18: 1851-1858.
McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA,
       Kucera KS, Battenhouse A et al. 2010. Heritable individual-specific and
       allele-specific chromatin signatures in humans. Science 328: 235-239.
Mosteller F, Fisher RA. 1948. Questions and answers #14. The American
       Statistician 2: 30-31.
Motallebipour M, Ameur A, Reddy Bysani MS, Patra K, Wallerman O, Mangion J,
       Barker MA, McKernan KJ, Komorowski J, Wadelius C. 2009. Differential
       binding and co-binding pattern of FOXA1 and FOXA3 and their relation to
       H3K4me3 in HepG2 cells revealed by ChIP-seq. Genome Biol 10: R129.

Raha D, Wang Z, Moqtaderi Z, Wu L, Zhong G, Gerstein M, Struhl K, Snyder M.
       2010. Close association of RNA polymerase II and many transcription
       factors with Pol III genes. Proc Natl Acad Sci U S A 107: 3639-3644.
Shibata Y, Crawford GE. 2009. Mapping regulatory elements by DNaseI
       hypersensitivity chip (DNase-Chip). Methods Mol Biol 556: 177-190.
Song L, Crawford GE. 2010. DNase-seq: a high-resolution technique for mapping
       active gene regulatory elements across the genome from mammalian
       cells. Cold Spring Harb Protoc 2010: pdb.prot5384.
The ENCODE Project Consortium. 2007. Identification and analysis of functional
       elements in 1% of the human genome by the ENCODE pilot project.
       Nature 447: 799-816.

