     How to store and visualize RNA-seq data

         Gabriella Rustici
      Functional Genomics Group

          gabry@ebi.ac.uk




                        EBI is an Outstation of the European Molecular Biology Laboratory.
                                  Talk summary


    • How we archive RNA-seq data in ArrayExpress

    • How we process RNA-seq data

    • How we display RNA-seq data in the Expression Atlas




2   26/08/2011   HTS data in ArrayExpress and Atlas
    Components of a functional genomics experiment




                                        ArrayExpress
                         www.ebi.ac.uk/arrayexpress/

     • Is a public repository for functional genomics data, mostly
       generated using microarray or high-throughput sequencing (HTS)
       assays
     • Serves the scientific community as an archive for data supporting
       publications, together with GEO at NCBI and CIBEX at DDBJ
     • Provides easy access to well-annotated data in a structured and
       standardized format
     • Facilitates the sharing of microarray designs, experimental
       protocols, …
     • Based on community standards: MIAME guidelines and MAGE-TAB
       format for microarrays, MINSEQE guidelines for HTS data
       (http://www.mged.org/minseqe/)

                              Standards for sequencing
                                            MINSEQE guidelines

     Minimum Information about a high-throughput Nucleotide
      SEQuencing Experiment

     The proposed MINSEQE guidelines (still a work in progress) are:

                 1. General information about the experiment
                 2. Essential sample annotation including experimental factors and their
                    values (e.g. compound and dose)
                 3. Experimental design including sample data relationships (e.g. which
                    raw data file relates to which sample, ….)
                 4. Essential experimental and data processing protocols
                 5. Sequence read data with quality scores, raw intensities and
                    processing parameters for the instrument
                 6. Final processed data for the set of assays in the experiment
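
The six proposed elements can be read as a simple checklist. The sketch below is a hypothetical illustration in Python: the keys in `MINSEQE_ELEMENTS`, the `missing_elements` helper and the example submission are all invented for this example and are not part of any ArrayExpress tool.

```python
# Hypothetical checklist keys, one per proposed MINSEQE element (names are assumptions).
MINSEQE_ELEMENTS = {
    "general_information": "General information about the experiment",
    "sample_annotation": "Essential sample annotation, including experimental factors",
    "experimental_design": "Experimental design, including sample data relationships",
    "protocols": "Essential experimental and data processing protocols",
    "sequence_reads": "Sequence read data with quality scores",
    "processed_data": "Final processed data for the set of assays",
}


def missing_elements(submission):
    """Return the descriptions of checklist elements that are absent or empty."""
    return [
        description
        for key, description in MINSEQE_ELEMENTS.items()
        if not submission.get(key)
    ]


# Made-up submission used only to exercise the check.
example = {
    "general_information": "Compound treatment time course",
    "sample_annotation": {"compound": "compound X", "dose": "10 uM"},
    "experimental_design": {"sample1": "sample1.fastq"},
    "protocols": ["extraction", "library preparation", "alignment"],
    "sequence_reads": ["sample1.fastq"],
    "processed_data": [],  # nothing supplied yet
}

for item in missing_elements(example):
    print("Missing:", item)
```

An empty value counts as missing here, which is a design choice of the sketch rather than something the guidelines specify.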


                 Standards for microarray & sequencing
                                              MAGE-TAB format
    MAGE-TAB is a simple spreadsheet format that uses a number of
    different files to capture information about a microarray experiment. We
    adapted it to handle HTS data:

     IDF            Investigation Description Format file; contains top-level information about the
                    experiment, including title, description, submitter contact details and protocols.

     SDRF           Sample and Data Relationship Format file; contains the relationships between
                    samples and arrays, as well as sample properties and experimental factors, as
                    provided by the data submitter.

     Data files     Raw and processed data files.
                    The ‘raw’ data files are the trace data files (.srf or .sff). FASTQ format files are also
                    accepted, but SRF format files are preferred. The trace data files that you submit to
                    ArrayExpress will be stored in the European Nucleotide Archive (ENA).
                    The processed data file is a ‘data matrix’ file containing processed values, e.g. files
                    in which the expression values are linked to genome coordinates.
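
Because the SDRF is a tab-delimited table, the link between samples and data files can be read with standard tools. The following is a minimal Python sketch under stated assumptions: the column headings, sample names and file names in `SDRF_TEXT` are illustrative only, and `read_sdrf` and `files_by_sample` are hypothetical helpers, not ArrayExpress software.

```python
import csv
import io

# Toy SDRF content; column headings and file names are illustrative assumptions.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[compound]\tRaw Data File\n"
    "sample1\tMus musculus\tcontrol\tsample1.fastq\n"
    "sample2\tMus musculus\ttreated\tsample2.fastq\n"
)


def read_sdrf(handle):
    """Read a tab-delimited SDRF table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def files_by_sample(rows, sample_column="Source Name", file_column="Raw Data File"):
    """Map each sample name to the data files listed for it."""
    mapping = {}
    for row in rows:
        mapping.setdefault(row[sample_column], []).append(row[file_column])
    return mapping


rows = read_sdrf(io.StringIO(SDRF_TEXT))
print(files_by_sample(rows))
```

A real SDRF usually has many more columns; the same approach applies once the actual headings are substituted.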




                 Types of data that can be submitted




                 ArrayExpress – two databases




         What is the difference between Archive
                        and Atlas?
    Archive
       • Query by experiment, sample and experimental
         factor annotations
       • Filter on species, array platform, molecule assayed
         and technology used

    Atlas
       • Gene and/or condition queries
       • Query across experiments and across platforms


                  ArrayExpress – two databases




                    How much data in AE Archive?




                  Browsing the AE Archive




                  Browsing the AE Archive

     [Screenshot of the Archive browse page, with these callouts:]
     • AE unique experiment ID
     • Curated title of experiment
     • Number of assays
     • Species investigated
     • The date when the data were loaded in the Archive
     • Loaded in Atlas flag
     • Raw sequencing data available in ENA
     • The total number of experiments and assays retrieved
     • The direct link to raw and processed data. An icon indicates that
       this type of data is available.
     • The list of experiments retrieved can be printed, saved in
       tab-delimited format, exported to Excel or exported as an RSS feed
                  Browsing the AE Archive




                  RNA-seq data in AE Archive




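                  HTS data in AE Archive

     [Three pie charts. Legend labels and percentages as given on the slide:]
     • HTS vs other technologies: "HTS experiments", "Other"; 3%, 97%
     • RNA-seq vs DNA-seq: "RNA seq", "DNA seq", "RNA seq, DNA-seq"; 7%, 38%, 55%
     • RNA-seq, coding vs non coding: "coding", "non coding"; 32%, 68%
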
                  AE Archive – experiment view




                         Link to raw data in ENA




                  RNA-seq processing pipeline

     [Data-flow diagram. Labels on the slide:]
     • Direct data submissions and GEO import
     • ArrayExpress Archive
     • Data acquisition: FASTQ files
     • Short reads (FASTQ files)
     • EGA, ENA, Ensembl
     • RNA-seq processing pipeline
     • Expression Atlas
     • Arrow labels: SDRF, FASTQ, RPKMs, BAMs
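
The diagram labels the expression values as RPKMs (reads per kilobase of transcript per million mapped reads). As a reminder of what that unit means, here is a minimal Python sketch of the plain definition. It is not how the pipeline's estimators compute their values, and the gene names, counts and lengths are made up for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 10^9 * C / (N * L), where C is the number of reads mapped to the
    gene, N the total number of mapped reads and L the gene length in bases.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up numbers: geneB has twice the reads of geneA but is twice as long.
counts = {"geneA": 500, "geneB": 1000}
lengths = {"geneA": 2000, "geneB": 4000}
total_mapped = 10_000_000

values = {gene: rpkm(counts[gene], lengths[gene], total_mapped) for gene in counts}
print(values)
```

The two toy genes end up with the same RPKM, which is the point of normalizing by length as well as by sequencing depth.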



                  RNA-seq processing pipeline:
                                    ArrayExpressHTS

     • ArrayExpressHTS is an R-based pipeline for pre-processing, expression
       estimation and data quality assessment of RNA-seq datasets

     • The pipeline can be used for analyzing:
          • private data
          • public data, available through ArrayExpress and ENA

     • It can be used:
          • on a local computer
          • remotely on the EBI R Cloud, www.ebi.ac.uk/tools/rcloud

                                                          Goncalves et al., Bioinformatics 2011

                  ArrayExpressHTS in Bioconductor




                       ArrayExpressHTS pipeline




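     [Pipeline flowchart. Annotations on the slide:]
     • Reference: transcriptome or genome
     • Alignment: Bowtie, BWA or TopHat
     • Filtering options (e.g., average base quality, read complexity, …)
     • Expression estimation: Cufflinks or MMSEQ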
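
One of the filtering options named on the slide is average base quality. The Python sketch below shows the idea on a toy FASTQ stream. It is not the ArrayExpressHTS implementation: the Phred+33 encoding, the threshold of 20 and the helper names are all assumptions made for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_quality(quality, offset=33):
    """Average Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records, min_mean_quality=20):
    """Keep reads whose average base quality reaches the (assumed) threshold."""
    return [rec for rec in records if mean_quality(rec[2]) >= min_mean_quality]


# Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
FASTQ_TEXT = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGT\n+\n########\n"
)

kept = filter_reads(read_fastq(io.StringIO(FASTQ_TEXT)))
print([name for name, _, _ in kept])
```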




                  Using ArrayExpressHTS
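     library("ArrayExpressHTS")
     # Process public experiment E-GEOD-16190; usercloud = FALSE runs the
     # pipeline on the local machine rather than on the EBI R Cloud.
     aehts <- ArrayExpressHTS("E-GEOD-16190", usercloud = FALSE)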




                  ArrayExpressHTS on the R cloud

     [Architecture diagram. Labels on the slide:]
     • R-cloud, running several R-servers
     • ArrayExpressHTS R package
     • Pipeline tools: tophat, bowtie, bwa, cufflinks, samtools
     • References, index & annotation
     • ArrayExpress and ENA, with data labels: SDRF, IDF, raw data,
       experiment metadata
     • User project storage, with data labels: ExpressionSet, quality reports



                  RNA-seq processing pipeline

     [The data-flow diagram from earlier in the talk is shown again.]



                  ArrayExpress – two databases




     Expression Atlas
     Experiment selection criteria
     The criteria we use for selecting experiments for inclusion in the Atlas
     are as follows:

                  • For microarray-based experiments, array designs must be
                    provided to enable re-annotation using Ensembl or UniProt (or
                    have the potential for this to be done)
                  • High MIAME/MINSEQE scores
                  • Experiment must have 6 or more assays
                  • Sufficient replication and large sample size
                  • Experimental factors (EF) and experimental factor values (EFV)
                    must be well annotated
                  • Adequate sample annotation must be provided
                  • Processed data must be provided, or raw data that can be
                    renormalized must be available

     Expression Atlas
     Atlas construction

     • Data is taken as normalized by the submitter
     • Gene-wise linear models (limma) and t-statistics are applied to
       identify the differentially expressed genes across all biological
       conditions, in all the experiments

     • The result is a two-dimensional matrix where rows correspond to
       genes and columns correspond to biological conditions

     • The matrix entries are p-values together with a sign, indicating the
       significance and direction of differential expression
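
The shape of that matrix can be illustrated with a short Python sketch. It assumes the t-statistics and p-values have already been computed elsewhere (in the Atlas, by the limma fit); the gene names, conditions, numbers, the 0.05 cut-off and the helper names are all made up for this example.

```python
# Hypothetical per-gene, per-condition results: (t-statistic, p-value).
results = {
    ("geneA", "liver"): (4.2, 0.001),
    ("geneA", "kidney"): (-0.3, 0.77),
    ("geneB", "liver"): (-5.1, 0.0004),
    ("geneB", "kidney"): (2.9, 0.01),
}


def build_matrix(results):
    """Arrange results as rows = genes, columns = conditions.

    Each entry is (direction, p_value), where direction is +1, -1 or 0
    according to the sign of the t-statistic.
    """
    genes = sorted({gene for gene, _ in results})
    conditions = sorted({condition for _, condition in results})
    matrix = []
    for gene in genes:
        row = []
        for condition in conditions:
            t_stat, p_value = results[(gene, condition)]
            direction = (t_stat > 0) - (t_stat < 0)
            row.append((direction, p_value))
        matrix.append(row)
    return genes, conditions, matrix


def call(entry, alpha=0.05):
    """Summarize one matrix entry; alpha is an assumed significance cut-off."""
    direction, p_value = entry
    if p_value >= alpha:
        return "no change"
    return "up" if direction > 0 else "down"


genes, conditions, matrix = build_matrix(results)
for gene, row in zip(genes, matrix):
    print(gene, {cond: call(entry) for cond, entry in zip(conditions, row)})
```

Keeping the sign separate from the p-value avoids ambiguity when a p-value is very close to zero.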




Expression Atlas
Atlas construction
                                Expression Atlas




                  Atlas home page
                  http://www.ebi.ac.uk/gxa/

     [Screenshot of the query form, with these callouts:]
     • Query for genes
     • Restrict query by direction of differential expression
     • Query for conditions
     • The ‘advanced query’ option allows building more complex queries




                  Atlas gene summary page




                              Atlas heatmap view




                  Atlas experiment page




                  View of RNA-seq data in Ensembl




                  Atlas gene-condition query




                     Data submission to AE




     Submission of HTS gene expression data
     • Submit via the MAGE-TAB submission route
     • Submit:
            • A MAGE-TAB spreadsheet containing details of the samples and
              protocols used
            • Trace data files for each sample (in SRF, FASTQ or SFF format)
            • Processed data files
     • For non-human species, we will supply your SRF or FASTQ files to
       the European Nucleotide Archive (ENA).
     • If you have human identifiable sequencing data, you need to submit
       to the European Genome-phenome Archive (EGA) and not to
       ArrayExpress. They will supply you with a suitable template for
       submission and store human identifiable data securely.
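
The routing rule and the accepted trace formats can be summarized in a short pre-submission check. This Python sketch is purely illustrative: `target_archive` and `unsupported_files` are hypothetical helpers, and matching on uncompressed file extensions is an assumption of the example, not a stated ArrayExpress requirement.

```python
import os

# Trace formats named on the slide, matched here by file extension (an assumption).
ACCEPTED_TRACE_EXTENSIONS = {".srf", ".fastq", ".sff"}


def target_archive(human_identifiable):
    """Return where the sequencing data should be submitted."""
    return "EGA" if human_identifiable else "ArrayExpress"


def unsupported_files(filenames):
    """Return the files whose extension is not one of the accepted trace formats."""
    return [
        name
        for name in filenames
        if os.path.splitext(name)[1].lower() not in ACCEPTED_TRACE_EXTENSIONS
    ]


print(target_archive(human_identifiable=False))
print(unsupported_files(["run1.srf", "run2.FASTQ", "notes.txt"]))
```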

     What happens after submission?

     • Email confirmation
     • Curation
        • The curation team will review your submission and will
          email you with any questions.
        • Possible reopening for editing
     • We will send you an accession number when all the
       required information has been provided.
     • We will load your experiment into ArrayExpress and
       provide you with a reviewer login for viewing the data
       before it is made public.



     To find out more
     Email questions regarding ArrayExpressHTS to:
     •    Angela Goncalves, filimon@ebi.ac.uk
     •    Andrew Tikhonov, andrew@ebi.ac.uk


     Read more at:

     •    Goncalves et al. (2011). A pipeline for RNA-seq data processing and quality
          assessment. http://www.ncbi.nlm.nih.gov/pubmed/21233166

     •    http://www.bioconductor.org/packages/2.9/bioc/html/ArrayExpressHTS.html

     •    R-cloud: http://www.ebi.ac.uk/Tools/rcloud/

     eLearning courses: http://www.ebi.ac.uk/training/online/

