DNA SEQUENCING
As published in BTi - October 2006

Breakthrough innovations in ultra-high throughput sequencing methods

The first high-throughput, low-cost alternative to standard sequencing systems based on Sanger chemistry has been introduced to the market and is taking the sequencing world by storm. Developed by the US-based company 454 Life Sciences, the new technology, the Genome Sequencer 20 System, is exclusively distributed worldwide by Roche Diagnostics. Using the new sequencing system, it is possible to address a broad variety of applications in the fields of whole genome sequencing, transcriptome and gene regulation studies, as well as amplicon analysis. This article describes the principles and some of the applications of the new system. Many such applications simply cannot, for either technical or economic reasons, be carried out using standard Sanger technology. The new system has already led to significant developments in genomic research, such as the identification of novel transcripts or unknown classes of small non-coding RNAs (sncRNAs).

The Genome Sequencer 20 System is an ultra-high-throughput automated DNA sequencing system capable of carrying out and monitoring sequencing reactions in a massively parallel fashion. Since the new system provides a complete solution for ultra-high-throughput DNA sequencing, individual researchers can now prepare samples and sequencing reactions, generate sequence reads, and assemble genome sequence data within days. The whole genome sequencing workflow from sample input to data output consists of DNA library preparation, emulsion-based clonal PCR amplification (emPCR), PicoTiterPlate device preparation, the sequencing run, and data analysis. The output of a single run is typically 20 × 10^6 nucleotides or more (for the 70 × 75 mm PicoTiterPlate device) at an average read length of 100 high-quality bases, and multiple runs can be pooled for off-line assembly/mapping. It is the combination of massive throughput and low cost per clonal read that enables applications which previously could not be carried out.

WHOLE GENOME SEQUENCING

The new system has already revolutionised whole genome sequencing of microorganisms. For instance, sequencing of a 3 megabase bacterial genome up to a high-quality draft is now possible within days, rather than months. Because high-quality reads are generated at an average read length of 100 bases, both de novo assembly and resequencing (mapping) of genomes are possible. The mapping application generates the consensus DNA sequence by mapping, or alignment, of the reads to a reference sequence, as well as a list of high-confidence mutations. The current version of the system software has the capacity to analyse genomes up to 50 Mbp in size at 15-25x depth of coverage. Examples of several bacterial genome assemblies are shown in Table 1.

The mapping application typically results in greater than 99.99% accuracy over 95% of the non-repeat parts of the genome (Q40+ bases) when the average genome coverage is at least 15-fold. The assembler application yields N50 contig sizes greater than 10 kb with higher than 99.99% accuracy over 95% of the non-repeat parts of the genome (Q40+ bases) when the average genome coverage is at least 25-fold. (Contigs are contiguous sequences of DNA created by assembling overlapping sequences.)

Since the Genome Sequencer 20 System uses neither cloning nor electrophoretic separation, the sequence coverage biases normally associated with these techniques are eliminated. Lack of sequence coverage bias has been confirmed by sequencing several bacterial genomes. The remaining gaps in assembled genome sequences are due largely to the presence of sequence repeats longer than ~75 bp. The absence of a cloning step also means that the Genome Sequencer 20 System is particularly useful for sequencing AT-rich organisms resistant to subcloning in E. coli. One example is the sequencing of the filamentous fungus Neurospora crassa. Using the new sequencing technique, 2.5% additional sequence information was identified compared with the Sanger sequencing approach. Not surprisingly, the GC content of this additional information was quite low (27%).

WHOLE GENOME SEQUENCING WITH PAIRED-END LIBRARIES

Recently, the developers of the new system, 454 Life Sciences of Branford, Connecticut, USA, have also developed a new protocol which makes whole genome sequencing using the Genome Sequencer 20 System even more efficient. Paired-end libraries are generated and sequenced in order to determine the orientation and relative positions of contigs produced by de novo shotgun sequencing and assembly [Figure 1]. Sequence data obtained from the paired-end libraries are combined with standard Genome Sequencer 20 whole genome shotgun sequencing reads in a new version of the assembler. The benefits of combining the shotgun reads with the paired-end reads have been tested on several bacterial genomes and on a Saccharomyces cerevisiae genome that had previously been sequenced at 454 Life Sciences.

For instance, the 4.6 Mbp genome of E. coli strain K12 was sequenced in three standard runs to a depth of 22-fold. The assembly performed with the Newbler assembly software resulted in 140 unoriented contigs. One additional sequencing run of a paired-end library yielded approximately 112,000 reads. The paired-end data improved the genome assembly to 20 multi-contig scaffolds covering 98.6% of the genome. The 12.2 Mbp genome of S. cerevisiae S288C (16 haploid chromosomes and one 86 kbp mitochondrion) was shotgun sequenced in nine sequencing runs, yielding approximately 23-fold oversampling. The assembly performed with the Genome Sequencer De Novo Assembler resulted in 821 unoriented contigs. Two additional sequencing runs of a paired-end library yielded approximately 395,000 reads. The paired-end data reduced the assembly to 153 scaffolds, covering 93.2% of the genome.

Figure 1. Paired-end library preparation scheme: genomic DNA is fragmented to yield average fragment sizes around 2.5 kb. The fragmented genomic DNA is methylated with Eco RI methylase to protect the Eco RI restriction sites. The ends of the fragments are blunt-ended and polished, and a biotinylated oligonucleotide adaptor is blunt-end ligated onto both ends of the digested DNA fragments. Subsequent digestion with Eco RI restriction enzyme cleaves a portion of the adaptor DNA, leaving sticky ends. The fragments are circularised and ligated, resulting in 2.5-kb circular fragments. The adaptor DNA contains two Mme I restriction sites, and after treatment with Mme I the circularised DNA is cleaved 20 nucleotides away from the restriction site. Digestion generates small DNA fragments that have the adaptor DNA in the middle and, on each end, 20 nucleotides of genomic DNA that were once approximately 2.5 kb apart. These small, biotinylated DNA fragments are purified from the rest of the genomic DNA by streptavidin beads.

AMPLICON ANALYSIS

Sequence reads from the new system are on average 100 bases long, but are tens of thousands-fold deep. These characteristics open up a unique opportunity to use the system in applications where the detection of rare variants of a known sequence in complex mixtures of sequences is crucial. Direct sequencing of mixed, non-clonal PCR products (amplicons) using standard Sanger dideoxy terminator chemistry is not sensitive enough to identify and quantify many of the sequence variants present in biological specimens. Bacterial cloning of amplicons into a vector prior to traditional sequencing of individual clones will increase the sensitivity, but at the cost of a large increase in time and expense, making this approach uneconomical in practice.

The 454 technology provides amplification of hundreds of thousands of molecules via the emulsion PCR step and highly accurate sequencing, as each fragment is sequenced to a depth of a hundred- or a thousand-fold.

Although there are many potential uses for amplicon sequencing, the molecular biology and software developments undertaken so far have focused initially on oncology research applications, more specifically on the detection of rare somatic mutations in complex cancer samples. The ability to sensitively detect somatic mutations in cancer cells promises to be of great help in understanding the development of cancer at the genetic level in much greater detail. Additionally, none of the existing high-throughput technologies offers the possibility of novel variant detection. To demonstrate the power of the new system, previously described single nucleotide polymorphisms from upstream of the HLA-DMA gene to the TAP2 gene in the Class II region of the MHC were chosen as a model system. It was possible to reproduce the already published data using the new system; allele frequencies down to 3% were easily detected [Figure 2]. The results of a recent study confirmed that the Genome Sequencer 20 System enabled detection of low-abundance oncogene mutations in complex samples with low tumour content for which conventional Sanger sequencing was not informative. Somatic EGFR mutations that were missed by the Sanger sequencing method were identified.

Figure 2. Genotyping results of three SNPs in the HLA-DMA gene region (class II MHC). Base changes along the fragment sequence (x-axis) are colour coded and their positions shown as bars. The primary y-axis denotes base change frequency. The secondary y-axis, as well as the black line above the mutation spectrogram, represents sequencing coverage. Both high-frequency alleles (top panel) and low-frequency alleles (bottom panel) are shown.

TRANSCRIPTOME AND GENE REGULATION STUDIES

The Genome Sequencer 20 enables the study of transcriptomes at a previously impossible depth of coverage and sensitivity. This is due to the system's massively parallel sequencing technology, which generates a high number of sequence reads (a minimum of 200,000 single reads per 5-hour run), thus facilitating the identification of previously unknown transcripts. Preliminary results from a short-tag sequencing project also revealed that the Genome Sequencer 20 System was very well suited for transcript quantification (data not shown).

In terms of gene regulation, the new technology has so far been shown to be well suited for the genome-wide identification of small non-coding RNAs (sncRNAs), for the identification of transcription factor binding sites and for the elucidation of DNA-methylation patterns. Compared with sequencing of sncRNAs using the Sanger approach, during which miRNA fragments are concatemerised in order to make sequencing more economical, the new approach is much more straightforward. The often difficult concatemerisation step can simply be skipped. Moreover, costs per clonal read are much lower using the Genome Sequencer 20 System, thus providing a real basis for screening for sncRNAs at a genome-wide level. As an example, Girard et al. used the system to characterise a new class of small RNAs, called piwi-interacting RNAs (piRNAs), in mouse testes. More than 87,000 reads were generated, around 53,000 of which would be classified as candidate piRNAs. Other examples of the characterisation of sncRNAs include the genome-wide analysis of an Arabidopsis thaliana dicer mutant and the characterisation of the piRNA complex from rat testes.

The identification of binding sites of DNA-binding proteins, such as those of the transcription factor p53, has recently been described. DNA fragments that include binding-site sequences can be isolated after immunoprecipitation with their protecting transcription factors and characterised using high-throughput sequencing. This study revealed that binding sites can be detected with unprecedented efficiency and sensitivity.

LOSS OF METHYLATION AND HYPERMETHYLATION

An extremely important regulation mechanism of many genes is the loss of methylation (and also hypermethylation) of CpG islands within promoter regions. Genome methylation occurs at cytosine residues located 5′ to a guanosine in a CpG dinucleotide. Dense areas of CpG dinucleotides within promoter regions are organised into CpG islands. Applying a known bisulphite treatment procedure, 454 Life Sciences has recently established a sequencing-based technology to determine quantitatively the methylation state of each CpG dinucleotide in a given target genomic sequence [Figure 3]. To better understand how the new chemistry performs on cancer research samples, eight samples from colorectal cancer (CRC) tumours were analysed, together with matched normal adjacent tissue (NAT). The results obtained in this experiment agreed with those in the published literature: a significant percentage of CRCs show methylation of the p16 CpG island [7, 8].

Figure 3. Following extraction from tissue or cells, genomic DNA is treated with sodium bisulphite, which serves to capture the methylation status of the sample. Treatment of DNA with sodium bisulphite results in the deamination of unmethylated cytosines to uracils, while methylated cytosines remain unchanged. PCR amplification of the converted DNA then results in the replacement of each uracil by thymine. Comparison of the sequence obtained from the bisulphite-treated amplicon with the published sequence using the Genome Sequencer 20 amplicon software allows identification of any differential methylation.

More details can be obtained from:
ROCHE DIAGNOSTICS
Mannheim, Germany.
Tel +49 621 759 8555
email@example.com

REFERENCES
1. http://www.le.ac.uk/gc/ajj/HLA/Polymorphism.html
2. Thomas RK et al. Nat Med 2006; 12: 852-855.
3. Ng P et al. Nucleic Acids Res 2006; 34: e84.
4. Girard A et al. Nature 2006; 442: 199-202.
5. Henderson IR et al. Nat Genet 2006; 38: 721-725.
6. Lau NC et al. Science 2006; 313: 363-367.
7. Kim BN et al. Int J Oncol 2005; 26: 1217-1226.
8. Herman JG et al. Proc Natl Acad Sci 1996; 93: 9821-9826.
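
The bisulphite logic summarised in the Figure 3 caption can be illustrated with a minimal Python sketch. This is not the Genome Sequencer 20 amplicon software: the function names, the toy reference sequence and the assumption that every read is full-length, top-strand and already aligned to the reference are all hypothetical, chosen only to show how a per-CpG methylation fraction follows from counting C versus T at each CpG cytosine.

```python
from collections import Counter


def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment followed by PCR on one strand.

    Unmethylated C is deaminated to U and read as T after PCR;
    methylated C (index listed in methylated_positions) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def cpg_positions(reference):
    """Return the index of the C in every CG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def methylation_fractions(reference, reads):
    """Estimate the methylated fraction at each CpG of the reference.

    Assumption: reads are the same length as the reference and aligned
    without gaps, so read[pos] corresponds to reference[pos].
    A C in a read means "was methylated", a T means "was unmethylated".
    """
    fractions = {}
    for pos in cpg_positions(reference):
        counts = Counter(read[pos] for read in reads)
        informative = counts["C"] + counts["T"]
        fractions[pos] = counts["C"] / informative if informative else None
    return fractions


if __name__ == "__main__":
    reference = "ACGTTCGAACCGT"  # toy sequence with CpGs at indices 1, 5 and 10
    # Three molecules methylated at CpGs 1 and 5, one methylated at CpG 1 only.
    reads = [bisulphite_convert(reference, {1, 5})] * 3
    reads.append(bisulphite_convert(reference, {1}))
    for pos, fraction in methylation_fractions(reference, reads).items():
        print(f"CpG at index {pos}: methylated fraction {fraction}")
```

Because each clonal read comes from a single template molecule, counting reads in this way yields a quantitative estimate per CpG, which a single mixed Sanger trace cannot provide as directly.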