The Current State of
Next Generation DNA Sequencing Technologies
53rd Annual American Osteopathic Association (AOA) Research Conference:
The Translation of Genomic Science into Osteopathic Clinical Practice and Research
New Orleans, LA
Nov. 1, 2009
• Currently composed of 9 life sciences core facilities
• Open to all investigators at Cornell University and to outside investigators
• Established as part of the Cornell University New Life Sciences Initiative
• Part of a Center for Advanced Technology sponsored by the New York State Office of Science, Technology, and Academic Research (NYSTAR)
• Administration: Institute for Biotechnology and Life Science Technologies
• Mission
– Promote research in the life sciences with advanced technologies in a shared resource environment
• The Center includes:
– fee-for-service research
– technology testing and development
– educational components
DNA Sequencing and Genotyping Facility
Illumina / Solexa Genome Analyzer, Sequenom Compact MassArray, Applied Biosystems 3730xl DNA Analyzer
High throughput DNA sequencing and genotyping
• Next generation sequencing (Illumina Solexa, Roche 454)
• Conventional DNA sequence detection and analysis (ABI 3730xl)
• Fragment analysis (microsatellite instability)
• SNP genotyping (ABI SNPlex, Illumina, Affymetrix, Sequenom)
• Real Time PCR (ABI 7900 HT)

Proteomics & Mass Spectrometry Facility
Mass spectrometers: Dionex MDLC and ABI 4000 Q Trap, ABI 4700 MALDI-TOF/TOF, Bruker Esquire Ion Trap
Protein preparation, detection, and identification
• Protein and Peptide Separation (2D gel and 2D HPLC)
• Sample Preparation (digestion, desalt, concentration)
• Protein Identification (gel and solution samples)
• Quantitative Proteomics (2D DIGE, iTRAQ)
• Protein Characterization (PTMs)
• Small Molecule Identification and Quantitation

Microscopy and Imaging Facility
Image data collection and analysis
Transmitted and fluorescence light microscopy, confocal microscopy, stereo microscopy, spectrofluorometry, molecular imaging (whole mouse luminescence and fluorescence imaging), and high-resolution ultrasonic imaging.
Instrument training and application expertise for digital imaging, image analysis, live cell microscopy, tissue and animal imaging and fluorescence techniques and measurements.

Computational Biology Service Unit
Bioinformatics and computational biology services
• Provide research, software and hardware support for biology research groups.
• Research: collaborate on research projects that require expertise in genomics, proteomics, quantitative genetics and structural biology.
• Computational resources: developed and maintain an extensive suite of computational biology applications on clusters (BioHPC), with web interface for easy access.
• Software development: develop and modify software tools for customized solutions; adapt computationally intensive applications into massively parallel environment.
• One of ten Microsoft High-Performance Computing Institutes worldwide.

DNA and Protein Microarrays Facility
Affymetrix GeneChip Microarrays, Illumina BeadStation BeadChip Microarrays, NimbleGen Microarrays Hybridization System
Gene expression and DNA analysis
Expression analysis: quantitation (coding level analysis, biomarker signatures, transcription mapping), regulation analysis (DNA methylation, origins of replication, mapping sites of transcription factor binding).
DNA analysis: whole genome analysis (linkage analysis, association studies, population genetics, chromosomal copy number), targeted genotyping (fine mapping, custom SNPs), sequence analysis (comparative sequencing, SNP identification, sequence analysis of small genomes).

Epigenomics Facility
Analysis of regulation of gene activity and expression
Use NimbleGen, Illumina, and Affymetrix microarrays, Sequenom MassArray, ABI 3730xl, Illumina Genome Analyzer for studies on:
• DNA methylation
• Protein-nucleic acid association
• Small RNA, CGH and CNV profiling
HELP assay (HpaII tiny fragment enrichment by ligation mediated PCR)
● Dual color assay with internal control for copy number or SNPs
● 2.1 M custom array capturing 1.3 million HpaII fragments
● Long oligo array with probes specific to this assay
● Allows measurement of methylation as a continuous variable
● Enriches for the hypomethylated fraction of the genome
MeDIP (methylation immunoprecipitation)
● Enriches methylated DNA
● Dual color assay test vs. input

Mouse / Animal Transgenic Facility
Transgenic organism production
• Generate gain-of-function and loss-of-function transgenic mice.
• Provide nuclear injection of DNA into fertilized oocytes.
• Provide ES cell injections into blastocysts.
• Provide cryopreservation services.
• Generation of transgenic organisms of other species possible.

Information Technology Services Facility
Information technology (IT) services
• Create and maintain IT infrastructure that supports research and operations within the core laboratories of the Center.
• Create and support enterprise LIMS used to place orders and receive results from multiple laboratories, to view sample submission history with associated data files, to provide data results quickly and securely, to track financials, and to view order status.
• Offer access to commercial software licenses.
• Provide IT support for academic units on campus outside of the Center as a fee-for-service.

High Throughput Screening
Corning Epic Biosystem Label-free Detection
Biochemical and cell based screening
[Figure: label-free detection schematics for biochemical assays (protein binding) and cell-based assays (cell mass redistribution); labels: surface, waveguide, substrate, broadband source, reflected wavelength]

Advanced Technology Assessment
• Identify, test, evaluate, acquire, implement, and optimize newly emerging technologies as shared resources in core facilities.
• Implement the Life Sciences Advanced Technology Seminar series.

Contact Information
George Grills
LSCLC_director@cornell.edu
http://cores.lifesciences.cornell.edu
Life Sciences Core Laboratories Center
Scope of Research Resources
Genes to Proteins to Phenotype
Imaging and Transgenics
Bioinformatics and Bio-IT
Collaborative Program in Personalized Medicine
• Integrated clinical phenotyping, SNP genotyping, gene expression,
epigenomics, and proteomics study of disease models.
• Includes the collaborative and integrated support of seven facilities
of the Life Sciences Core Laboratories Center:
– DNA Sequencing and Genotyping
– Microarrays
– Epigenomics
– Proteomics and Mass Spectrometry
– Protein Production and Characterization
– Information Technology
– Computational Biology
• Project on pulmonary disease.
– Collaborative program between Cornell Ithaca, Weill, and Qatar.
• Project on prostate cancer.
– Collaborative program between Cornell Ithaca, the University of Rochester Medical School, and the Mayo Clinic.
• Project on neural tube defects.
– Collaborative program between Cornell Ithaca, Cornell Weill, The Methodist Hospital in Texas, Texas A&M, and Stanford University.
Challenges to Implementation of Emerging Technologies
Decide on Appropriate Level of Implementing New Technologies
• Cutting edge
– very early stage emerging technology
– requires enough resources to tolerate losses during implementation
• Early adopter
– early stage development of new technology
– involves optimization of applications and development of new applications on the new platform
• Conventional user
– most important considerations are high throughput, high quality results, and a robust platform with standard operating protocols
Emerging DNA Sequencing Technologies
Life Sciences Core Laboratories Center support and applications
Applications
• whole genome sequencing
• de novo assembly or whole genome re-sequencing
• re-sequencing of PCR amplicons
• candidate region or gene re-sequencing
• low frequency mutation detection
• mutation identification and profiling
• genetic variation identification
• SNP genotyping
• copy number variation
• gene expression analysis
• whole genome gene expression profiling
• expression studies using degraded RNA from stored samples
(e.g., formalin-fixed, paraffin-embedded tissue samples)
• gene regulation studies
• whole genome small RNA discovery and quantitation
• genome wide measurements of protein-nucleic acid interactions
• genome wide DNA methylation profiling
Potential Projects Using
Emerging Sequencing Technologies
• Sequencing of individual human genomes as part of
personalized medicine.
• Cancer research: mutation identification and profiling
tumor types for diagnosis and prognosis.
• Identification of microbial diversity for agricultural,
environmental and therapeutic goals.
• metagenomics in the body and in the environment
• DNA computing (manipulating DNA libraries for
highly parallel computations).
DNA Sequencing Technologies
[Table comparing two sequencing platforms; row and column labels were not captured]
500 - 1000 bases | 44 - 88 - 100 bases
600,000 | 90 million
> 400 million | 5 - 8 billion (8 GB); > 10 billion
8 - 12 hours | 3 - 8 days
~ $10,000 | ~ $10,000
Conventional DNA Sanger Sequencing:
ABI 3730xl
454 Process:
Emulsion PCR
DNA Library Preparation emPCR Sequencing
4.5 h and 10.5 8h 6.5
h h
• Anneal sstDNA to an excess of DNA capture beads
• Emulsify beads and PCR reagents in water-in-oil microreactors
• Clonal amplification occurs inside microreactors
• Break microreactors, enrich for DNA-positive beads
sstDNA library -> clonally-amplified sstDNA attached to bead
Roche 454 Sequencing Technology
• Shear DNA, capture on beads, amplify by emulsion PCR
• Deposit beads in wells on plates
• Add pyrosequencing enzymes
• Place on sequence detection instrument
• Perform pyrosequencing reaction, sequence detection and analysis
454 Process: Sequencing
Roche 454 Sequencing Technology
Strengths
• Cost effective high throughput sequencing
• Just one sample preparation and one amplification for a genome
• Does not need large scale sample preparation robotics
Shortcomings
• Errors in homopolymer regions
• Shorter read lengths than Sanger sequencing
• make it difficult to align fragments reliably
• make it difficult to assemble across repeat regions
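The homopolymer shortcoming follows from how pyrosequencing reports bases: nucleotides are flowed one type at a time, and the light signal in each flow scales with the number of identical bases incorporated. The following is a minimal Python sketch of that idea, assuming a cyclic flow order of T, A, C, G and hypothetical signal values; the function names are illustrative and are not part of any Roche software.

```python
FLOW_ORDER = "TACG"  # assumed cyclic flow order, for illustration only


def to_flowgram(seq, flow_order=FLOW_ORDER):
    """Idealized signal per flow: how many bases are incorporated in that flow."""
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains a base outside the flow order")
    signals = []
    pos = 0
    flow = 0
    while pos < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A whole homopolymer run is incorporated in a single flow.
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
        flow += 1
    return signals


def from_flowgram(signals, flow_order=FLOW_ORDER):
    """Call bases by rounding each (possibly noisy) signal to a whole run length."""
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        called.append(base * int(round(signal)))
    return "".join(called)
```

With noise-free signals the round trip is exact: `to_flowgram("TTACCCG")` gives `[2, 1, 3, 1]`, which decodes back to `TTACCCG`. With a hypothetical noisy reading such as `[2.1, 0.9, 3.6, 1.0]`, the third signal rounds to 4 and the call becomes `TTACCCCG`, which is the kind of run-length miscall that becomes more likely as homopolymers get longer.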
Conventional DNA Sequencing Technology:
Rate of Production of Sequencing Data
[Figure: Pseudomonas production sequencing, cumulative number of reads per week. X-axis: week (1 to 11); y-axis: cumulative number of reads (0 to 120,000).]
Illumina Genome Analyzer
• sequencing-by-synthesis approach uses
4 fluorescently labeled modified nucleotides
• allows for reversible termination, which permits
each sequencing reaction cycle to occur in the
presence of all four nucleotides (A, C, T, G)
• sequence millions of DNA fragments in parallel,
a single base at a time.
Illumina Compared to Roche 454
Sequencing Technology
Strengths
• No problems with single base resolution in homopolymer tracts
• Lower cost per run than the 454
• No problems with potential emulsion PCR contamination
• Much more sequence data per sample run
Shortcomings
• Fewer runs per unit time than the 454
• Much more data load per run (data transfer and storage issues)
• Much shorter read lengths than the 454
However, unique portions of the genome often resolve themselves as
readily using 454 or Solexa reads as when longer Sanger reads are used.
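The closing point can be illustrated with a toy calculation: the fraction of windows in a sequence that occur exactly once rises with read length, and only repeats stay ambiguous. The following is a sketch in Python, assuming a made-up sequence and ignoring the reverse strand and sequencing errors; `unique_fraction` is a hypothetical helper, not a published tool.

```python
from collections import Counter


def unique_fraction(genome, k):
    """Fraction of k-length windows in `genome` whose sequence occurs exactly once."""
    if k <= 0 or k > len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    windows = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(windows)
    unique = sum(1 for window in windows if counts[window] == 1)
    return unique / len(windows)
```

For the toy sequence `ACGACGT`, three of the five 3-base windows are unique (`ACG` occurs twice), while all four 4-base windows are unique, so a slightly longer read is enough to place every window unambiguously in this example.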
Sequence Capture
Challenges to Implementation
of New Sequencing Technologies in Core Facilities
LARGE AMOUNTS OF DATA
Assessment of run quality and the analysis of data is an
IT data transfer and informatics analysis challenge
due to the large amount of data and significant time involved
in running data analysis.
SCHEDULING FULL RUNS
Collecting the appropriate number and type of samples for a full run is a
scheduling and cost efficiency challenge to operating the new instruments
effectively in a core facility environment.
DATA ANALYSIS APPLICATIONS
Manufacturer provided analysis tools are usable but not optimal.
Analysis tools for short reads are still in development by many groups.
Installation and comparison of informatics analysis options is a challenge.
Challenges to Implementation
of New Sequencing Technologies in Core Facilities
APPLICATION PROTOCOL DEVELOPMENT
Few standard protocols have been established yet for project design
of applications such as microRNA profiling.
The need for protocol development for many types of applications is a
challenge to running these types of projects in a core facility production
pipeline.
RAPIDLY DEVELOPING TECHNOLOGY
Frequent upgrades to instrumentation hardware and software present a
challenge to the standardization of operating protocols and quality metrics
even as they lead to significantly improved performance.
NEED FOR MULTIPLE PLATFORMS
Different new sequencing platforms are optimal for different applications.
Some projects are best done with a hybrid platform approach.
Potential need for multiple types of new sequencing instruments is a fiscal
and operational challenge to implementing these platforms in core facilities.
Challenges to Implementation
of New Sequencing Technologies in Core Facilities
Large and Rapidly Increasing Amounts of Data
Next Generation Sequencing Processing
Illumina Next Generation Sequencing
IT Pipeline
DATA GENERATION
• Project initiation and sample submission
• Control samples (phi X)
• Cluster Generation Station
• Illumina Genome Analyzer: image data generation
• Images transferred with Robocopy
• Pull for RT analysis (in development)
• Flowcell physical storage

DATA TRACKING
• Consumables ordering (in development)
• LIMS (Ordering, Order Tracking and Invoicing) AND WikiLIMS (Process Details, Analysis, Results and Reporting)
• Investigator group: place orders, retrieve results, review QC information, initiate CBSU interactions, receive invoices

DATA ANALYSIS
• Data analysis servers: 1 x 4 core, 16 GB RAM; 2 x 8 core, 32 GB RAM
• FTP / file server
• CBSU and CAC resources
• Intel Modular Server (cluster)
• Storage: iSCSI SAN (size+), server attached storage (performance+), NFS (flexibility+)
• Informatics Facility analysis pipeline and custom analysis

DATA ARCHIVE
• Archive image and run data (LTO-3/LTO-4 tape)
WikiLIMS Application
for Next Generation Sequencing at Cornell
Implemented WikiLIMS, a collection of customizable software tools for data management
that extends the open-source MediaWiki software.
Advantages of WikiLIMS:
• Structured information but also flexible as an electronic lab notebook.
• Based on open source components, therefore easy to extend and maintain.
• Implemented as a service from BioTeam, with no licensing or annual fees.
Milestones achieved in WikiLIMS implementation:
• Apache, MySQL, MediaWiki, and WikiLIMS installation and configuration.
• Automated migration of sequencing results from Illumina instruments to WikiLIMS.
• Facilitation of data collection from multiple Illumina sequencing runs into a common
storage system, annotation of data within a run, and establishment of relationships
between runs.
• Integration of WikiLIMS with existing LIMS for business processes (e.g., ordering and
invoicing).
Next goals for WikiLIMS development:
• Enhance reporting on sequencing production results statistics.
• Extension of WikiLIMS support for multiple core facility resources (e.g., microarrays,
protein production).
Challenges to Implementation of New Sequencing Technologies:
NEED FOR EXTENSIVE INFORMATICS INFRASTRUCTURE
Computational Biology Service Unit
Informatics Infrastructure Resources
• 677 node cluster (1,354 CPUs)
– 425-node Sun Linux cluster, 280 GB local HD space
– 252-node Windows cluster, 344 GB local HD space
• 6 web and general purpose servers, 1.1 TB total HD space
• 4 file and database servers, 15.5 TB total HD space
• Large shared-memory machine with 64 GB RAM
[Photos: Dell Windows Cluster; SUN Linux Cluster]
Illumina Next Generation Sequencing
Informatics Pipeline
Base Calling
• Firecrest image analysis
• Bustard base-caller

Production QA / QC
• Illumina QC analysis
• Error rate of each cycle based on control lane
• Average quality score at each cycle
• Illumina and custom made QA / QC tools

Sequence Alignment
• ELAND, BLAST, MOSAIK, MAQ, et al., project dependent

Data Visualization
• using GBROWSE or UCSC Genome Browser

DNA Profiling
• Genome and Targeted Region Re-sequencing
– Assembly and SNP detection and low frequency mutation detection using PolyBayes Short and custom algorithms

RNA Profiling
• RNA-Seq Transcription Profiling
– Profiling transcription activity at exon level
– Identification of alternative splicing
• MicroRNA Profiling
– Build and annotate TAG library
– Alignment to genome
– Stem-loop detection in flanking region
• Allele Specific Transcription Profiling
– Align to reference mRNA sequences using BLAST
– Identify and count allele specific reads
– Use Mosaik to build ace files to manually inspect selected genes
• Whole Genome Gene Expression Strand Discrimination in Prokaryotes

Epigenomics
• ChIP-Seq Protein-Nucleic Acid Association Profiling Analysis

Custom Apps
• Custom Projects
• Applications development

One of the recent major new developments in this informatics pipeline is that the growth in robustness
of some analysis packages allows some of the downstream analysis to be standardized for some applications.
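The production QA / QC step in this pipeline tracks the average quality score and the control-lane error rate at each cycle. The following is a minimal Python sketch of both calculations. It assumes that quality strings are ASCII-encoded Phred scores with an offset of 33 (older Illumina output used an offset of 64, so `offset` is a parameter) and that each control read is already paired with the reference sequence it should match. Both functions are hypothetical helpers, not the Illumina or Cornell tools.

```python
def mean_quality_per_cycle(quality_strings, offset=33):
    """Average Phred score at each cycle (read position) across all reads."""
    totals = []
    counts = []
    for qual in quality_strings:
        for cycle, char in enumerate(qual):
            if cycle == len(totals):  # first read long enough to reach this cycle
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / count for total, count in zip(totals, counts)]


def error_rate_per_cycle(reads, expected):
    """Mismatch fraction at each cycle for control reads vs. their expected sequence."""
    mismatches = []
    counts = []
    for read, ref in zip(reads, expected):
        for cycle, (called, true_base) in enumerate(zip(read, ref)):
            if cycle == len(counts):
                mismatches.append(0)
                counts.append(0)
            if called != true_base:
                mismatches[cycle] += 1
            counts[cycle] += 1
    return [m / c for m, c in zip(mismatches, counts)]
```

Plotting either list against cycle number gives the per-cycle view described above; a rising error rate or falling mean quality toward the last cycles is the pattern such a check is meant to expose.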
Using Informatics Pipeline to Assess the Quality of Results
of Next Generation Sequencing
WikiLIMS visualization of Summary Data
88 nt run GAIIx
Integrated Clinical and Research Database
Integrated Database
for
Phenotyping, SNP Genotyping, Gene Expression,
DNA Sequencing, and Proteomics Data
• Developed as a flexible data management and data analysis pipeline
for plant projects and extended to human clinical studies.
• Patient or plant and animal sample information is encrypted in
the database and is only accessible to authorized personnel.
• Data access to the relational database is possible
through either a web browser or flat file spreadsheets.
• Database schema is designed to accommodate both
basic research projects and clinical study protocols.
• Analysis of the resulting large scale data sets
done on massively parallel computer clusters.
Computational Biology Applications Suite
for High Performance Computing (BioHPC)
HPC portal that can facilitate
next generation sequencing analysis applications
• User accessibility is a major problem in life sciences high performance computing (HPC)
• Using a parallel cluster requires knowledge of the operating system, queuing system
and parallel programming
• BioHPC suite provides easy access to standardized life sciences applications on the
Windows HPC platform
• Web-based, point-and-click access to a variety of bioinformatics applications
with the underlying structure of the computational platform transparent to the user
• Enhancement of standard applications through parallelization, transparent to the user
• Web-based administration of users, jobs, applications, and clusters within the suite
• Standardized access to and maintenance of bioinformatics databases
• Allows remote HPC computing via JSDL (web service) connection to a remote cluster
Computational Biology Applications Suite
for High Performance Computing (BioHPC)
Applications
• Data mining / sequence analysis (BLAST, HMMER, InterProScan, RepeatFinder)
• Protein structure prediction and modeling (LOOPP, Modeller)
• Protein simulation (MOIL)
• Population genetics (BEAST, IM, IMa, InStruct, MDIV, Migrate, MKPRF, MSVAR,
Parentage, SFS_CODE, Structurama, Structure, TESS)
• Phylogenetics (MrBayes, ClustalW, T-COFFEE)
• Various other applications (R)
The system is flexible and can be easily customized to include additional software, including new analysis tools for next generation sequencing. The interface to each application is standardized. Users can choose the cluster and the number of nodes, or let the interface choose them based on load balance and node availability.
Challenges to Implementation of New Sequencing Technologies:
NEED FOR MULTIPLE PLATFORMS
High density scoreable markers for maize trait dissection
hybrid experimental strategy that involves a mix of instrument types
• Use 454 to sequence 8 reference lines to identify filtered fraction of the
genome.
• Follow this by 10X coverage of this methyl-fraction by Solexa runs on
each of the lines (86 total runs).
• The growing set of identified SNPs is being used to project genotypes of the 27 founder maize lines onto their 7,200 previously genotyped offspring.
“Length Matters” - Roche 454
“Coverage Size Matters” - Illumina Solexa
and ABI SOLiD
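The 10X coverage target in the strategy above is the usual depth-of-coverage arithmetic: coverage equals the total number of sequenced bases divided by the size of the target. The following is a sketch in Python; the function names are illustrative, and the numbers in the usage note are made up for the example, not figures from the maize project.

```python
import math


def coverage(num_reads, read_length, target_size):
    """Average depth of coverage: total sequenced bases / target size in bases."""
    return num_reads * read_length / target_size


def reads_needed(target_coverage, read_length, target_size):
    """Smallest whole number of reads that reaches the requested average coverage."""
    return math.ceil(target_coverage * target_size / read_length)
```

For example, one million 50-base reads over a hypothetical 5 Mb target give `coverage(1_000_000, 50, 5_000_000) == 10.0`. The same arithmetic shows why short-read platforms rely on very large read counts, while longer reads reach the same depth with fewer reads.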
Challenges to Implementation of New Sequencing Technologies:
RAPIDLY DEVELOPING FIELD
Next Generation Sequencing Technologies
$100,000 Genome
• example of approach:
– sequencing by synthesis
Roche 454, Illumina GAIIx, ABI SOLiD,
Helicos HeliScope, Dover Polonator
Revolutionary Sequencing Technologies
$1,000 Genome
• examples of approaches:
– zero-mode waveguides
– nanopores
Pacific Biosciences, Life Technologies, Oxford Nanopore
How soon will we have to buy our next new toy?
What new challenges will the new platforms present?
Acknowledgements
Cornell University Life Sciences Core Laboratories Center
Facility Directors: James VanEe, Peter Schweitzer, Wei Wang, Yushan Li, Sheng Zhang, Cynthia Kinsland, Rebecca Williams, Ke-Yu Deng, Jaroslaw Pillardy, Qi Sun
Faculty Advisors: Chip Aquadro, Andrew Clark, Steve Elick, Stephen Hamilton, David Lin, Ari Melnick, Jocelyn Rose, Adam Siepel, Paul Soloway, Warren Zipfel
Facility Staff: Barbara Hover, Adriano Manocchia, Jeffrey Mattison, Stefano Monni, Chris Myers, Elizabeth Paronett, Lalit Ponnala, Daniel Ripoll, Sabine Baumgart, Carol Bayles, Robert Bukowski, Becky Cameron, Linda Cote, Joseph Dick, Edward Dodge, John Flaherty, Jacob Kresovich, Robert Sherwood, Ken Smith, Jezra Spisak, Stefan Stefanov, Thomas Stelick, Guowei “Will” Xia, Yuan Xin, Huiming Yan, Jie “Georgia” Zhao
Faculty Advisory Board Members: Lawrence Bonassar, Thomas Brenna, Anthony Bretscher, Edward Buckler, Carlos Bustamante, Richard Cerione, Jeffrey Doyle, Daniel Dwyer, Steven Ealick, Olivier Elemento, Cornelia Farnum, Theresa Fulton, Maureen Hanson, Richard Harrison, John Helmann, David Holowka, Graham Kerslick, Sondra Lazarowitz, Brian Lazzaro, Irby Lovette, Alexander Nikitin, Colin Parrish, Andrea Quaroni, David Russell, Karel Schat, John Schimenti, Jeffrey Scott, Ram Sharma, Michael Shuler, Kenneth Simpson, Holger Sondermann, Michael Stanhope, Patrick Stover, Ted Thannhauser, Rory Todhunter, Klaas Van Wijk, Susi Varvayanis, Volker Vogt, Martin Wiedmann, David Wilson, Mariana Wolfner, Kelly Zamudio
Institute for Biotechnology and Life Science Technologies: Stephen Kresovich, Linda Carr, Dianna Smith, Michelle Cole, Emily Sampson
Weill Cornell Medical College: Harry Lander
USDA-ARS: Edward Buckler, Doreen Ware
Illumina: Joe Garsetti, Lara Toerien, Gary Schroth
Roche Applied Sciences: Timothy Harkins, Bruce Taillon, Lei Du
BioTeam: Stan Gloss, Brian Osborne
National Institutes of Health (NCRR / NIH)
National Science Foundation (NSF)
New York State (NYSTAR, NYSTEM)
ABRF Research Group Studies
on Next Generation Sequencing Technologies in Core Facilities
Association of Biomolecular Resource Facilities (ABRF)
• ABRF is an organization of over 700 scientists from shared research resource core facilities
and biotechnology laboratories.
• Members represent over 250 core labs in academic and research institutions, government,
and industry.
• ABRF Research Groups conduct multi-institutional studies to benchmark existing and
emerging biotechnologies and to optimize their implementation as shared research resources.
– Currently conducting a multi-disciplinary study on next generation sequencing instruments,
including the DNA Sequencing, Microarrays, and Genomic Variation Research Groups
• Web site is a comprehensive resource for biotechnology news and information about core
facility resources and services.
• Electronic discussion group facilitates sharing of technical advice and core facility networking.
• Official journal, the Journal of Biomolecular Techniques (JBT), covers genomics, proteomics,
imaging, and other biotechnologies, and core facility operational management.
• Participates in science policy in the US on the national level through affiliation with FASEB.
• Annual meeting presents the latest developments in biotechnologies and provides networking
opportunities for core facilities.