Introduction to Bio301 by gregoria


									Introduction to Bio301
   linus@iis.sinica.edu.tw
          Introduction
Many biological studies using high-throughput
methods result in huge lists of sequences.
Many of these sequences are not associated
with known genes or functions, which blocks
further interpretation.
    Introduction (cont.)
Expressed sequence tag (EST) and
complementary DNA (cDNA) sequences
provide direct evidence for all the
sampled transcripts and they are
currently the most important resources
for transcriptome exploration.
ESTs are short (200–800 nucleotide
bases in length), unedited, randomly
selected single-pass sequence reads
derived from cDNA libraries.
    Introduction (cont.)
There are several steps in EST analysis
and an overwhelming number of tools
available for each step.
However, it is often unclear which tool
to choose for each step of EST analysis
and for the subsequent downstream
annotation at the DNA or protein level.
Generic steps involved in EST analysis




Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015
       Introduction (cont.)
EST GENERATION
  Messenger RNA (mRNA) sequences in the cell
  represent copies of expressed genes.
     RNA cannot be cloned directly, so it is
     reverse transcribed to double-stranded cDNA.
  The resultant cDNA is cloned to make libraries
  representing the set of transcribed genes of the
  original cell, tissue or organism.
Characteristics of EST sequences




  Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015
        EST DATA RESOURCES

The largest freely available repository of
EST data is dbEST (32 889 225 ESTs from 559
different organisms as of Feb 2006).
  Resource                                            Web site                                               Contents of EST resource     Organisms               Category*
  ApiEST-DB                                           http://www.cbil.upenn.edu/apidots/                     Raw                          Apicomplexan parasites  F, D
  dbEST at NCBI                                       http://www.ncbi.nlm.nih.gov/dbEST/                     Raw                          All                     F, D
  Diatom EST database                                 http://avesthagen.sznbowler.com.                       Raw and clusters             Diatoms                 F, D
  ESTree                                              http://www.itb.cnr.it/estree/                          Raw and clusters             Peach                   F, D
  Fungal genomics project                             https://fungalgenomics.concordia.ca/home/index.php     Raw                          Fungal                  F, D
  Honey bee brain EST project                         http://titan.biotec.uiuc.edu/bee/honeybee_project.htm  Raw and clusters             Honey Bee               F, D
  Nematode ESTs at the Sanger Institute               ftp://ftp.sanger.ac.uk/pub/pathogens/nem_ests/         Raw and clusters             Parasitic nematodes     F, D
  NEMBASE- parasitic nematode ESTs                    http://www.nematodes.org                               Raw and clusters             Parasitic nematodes     F, D
  Parasitic and free-living nematode EST resource     http://www.nematode.net/                               Raw and clusters             Nematodes               F, D
  Phytopathogenic Fungi and Oomycete EST database     http://cbr-rbc.nrc-cnrc.gc.ca/services/cogeme/         Plant pathogenic fungi ESTs  Fungi and oomycetes     F, D
  Plant Gene Research, Kazusa DNA Research Institute  http://www.kazusa.or.jp/en/plant/database.html         Raw                          Heterogeneous set       F, D
  Plant Genome database                               http://www.plantgdb.org/                               Raw and clusters             Plants                  F, D
  Rat EST data at University of Iowa                  http://ratest.eng.uiowa.edu                            Raw                          Rat                     F, D
  Sanger Institute Xenopus tropicalis EST project     http://www.sanger.ac.uk/Projects/X_tropicalis/         Raw and clusters             Xenopus                 F, D
  The TIGR Gene Indices                               http://www.tigr.org/tdb/tgi/                           Raw and gene indices         All                     F, D
  UniGene database at NCBI                            www.ncbi.nlm.nih.gov/UniGene                           Raw and clusters             All                     F, D

  *F, free for academic users; D, data available for download.
Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015
          Introduction
What does Bio301 provide?
  Management of sequences.
  Easier routine investigation of new
  functional annotations on ESTs.
  Recommendation of sequences for microarray
  design.
  Functional differences between
  libraries, obtained using the Gene Ontology.
            Bio301
http://bio301.iis.sinica.edu.tw/
      Register & Create library
http://bio301.iis.sinica.edu.tw




                      Click here to register
Register

         Fill in the form and click “Submit”




Success message
Register (cont.)
Check your email


     Click here to activate your account
   Create Library




Click “Create Library” to create a new library
Key in the library name and upload the EST
sequences
Check the
parameters
Click “Confirm” and Bio301 will put the job
into its schedule
 EST analysis processing
Preprocessing stage:
  Vector removal, poly-A/T trimming, removal of
  low-quality ESTs, removal of mitochondrial DNA
Clustering stage:
  TIGR assembler (TGICL)
Annotation stage:
  BLAST, BLAST result parsing, GO retrieval
Storing stage:
  Data file processing, SQL, WWW
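
Two of the preprocessing steps listed above (poly-A/T trimming and removal of low-quality ESTs) can be sketched as follows. This is a minimal illustration, not the code that Bio301 or SeqClean runs: the function names, the thresholds `MIN_LENGTH`, `MAX_N_FRACTION` and `min_run`, and the use of the fraction of `N` bases as a quality proxy are all assumptions. Vector and mitochondrial screening are not shown because they need reference sequences.

```python
import re

MIN_LENGTH = 100        # hypothetical minimum usable EST length
MAX_N_FRACTION = 0.05   # hypothetical cap on ambiguous (N) bases


def trim_poly_tails(seq, min_run=10):
    """Strip a trailing poly-A run and a leading poly-T run of at least min_run bases."""
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    return seq


def passes_quality(seq):
    """Keep reads that are long enough and contain few ambiguous bases."""
    if len(seq) < MIN_LENGTH:
        return False
    return seq.count("N") / len(seq) <= MAX_N_FRACTION


def preprocess(ests):
    """Trim and filter a dict of {name: sequence}; returns the surviving reads."""
    cleaned = {}
    for name, seq in ests.items():
        trimmed = trim_poly_tails(seq.upper())
        if passes_quality(trimmed):
            cleaned[name] = trimmed
    return cleaned
```

A read that is too short after trimming is dropped entirely rather than kept in shortened form, which mirrors the "delete low quality EST" step on the slide.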
[Figure: EST preprocessing and clustering]
  Raw EST: vector and low quality sequence at both ends, plus a poly A/T tract
  SeqClean: cleans the raw ESTs
  TGICL: clusters and assembles the cleaned ESTs by overlaps
     TUC = Tentative Unique Contig (TUC1, TUC2, ...)
     TUS = Tentative Unique Singleton (TUS1, TUS2, TUS3, ...)
  TUGs: the TUCs and TUSs taken together
     TUG = Tentative Unique Gene
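
The relationship between ESTs, TUCs, TUSs and TUGs can be illustrated with a small sketch. It is not TGICL's algorithm (TGICL relies on sequence alignment and assembly); it assumes that the pairwise overlaps have already been computed elsewhere, groups ESTs with a union-find structure, and treats each multi-member group as one contig. All function names are hypothetical.

```python
def cluster_ests(est_names, overlaps):
    """Group ESTs connected by overlaps using union-find.

    overlaps is a list of (name_a, name_b) pairs already judged to overlap.
    Returns (contig_clusters, singletons).
    """
    parent = {name: name for name in est_names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for name in est_names:
        groups.setdefault(find(name), []).append(name)

    contig_clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return contig_clusters, singletons


def count_tugs(contig_clusters, singletons):
    """Assuming one contig per cluster, the number of TUGs is TUCs plus TUSs."""
    return len(contig_clusters) + len(singletons)
```

In a real assembly one cluster can yield more than one contig, so the count above is only a simplification.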
                    Report
Bio301 will send you an e-mail when the job is
finished.
         Checking Results
Library List




         Click Detail to check result
Click “Statistics”
Statistics
Gene Ontology statistics at level two
Click “Summary view”
Summary View
          Click “Detail” to view
          annotations
Summary View




        GO annotation by blast


        GO annotation by E2D
                   What is E2D?
  Generally, we transfer the annotations of
  known genes to the query sequences that
  hit them significantly.
  For query sequences that have no
  significant hits, we use E2D to retrieve
  their potential functional domains.
       Functional annotations can be inferred
       from a functional domain.
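
The two-route annotation rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the E-value cutoff `E_VALUE_CUTOFF`, the tuple layout of a hit, and the `domain_lookup` callable (a stand-in for a domain predictor such as E2D) are hypothetical and are not taken from Bio301.

```python
E_VALUE_CUTOFF = 1e-5  # hypothetical significance threshold


def annotate(query_id, blast_hits, domain_lookup):
    """Return (source, go_terms) for one query sequence.

    blast_hits: list of (e_value, go_terms) tuples for the query.
    domain_lookup: callable standing in for a domain predictor such as E2D.
    """
    significant = [hit for hit in blast_hits if hit[0] <= E_VALUE_CUTOFF]
    if significant:
        # Transfer the annotation of the most significant known gene.
        best = min(significant, key=lambda hit: hit[0])
        return "blast", best[1]
    # No significant hit: fall back to domain-based annotation.
    return "domain", domain_lookup(query_id)
```

Returning the source alongside the terms keeps the two kinds of annotation distinguishable, in the same way the summary view separates GO annotation by BLAST from GO annotation by E2D.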
Guo-Hsing Lee, Nai-Yu Chuang, Wen-Dar Lin, Hahn-Ming Lee, Chung-Der Hsiao and Jan-Ming Ho,
"E2D: A Novel Tool for Annotating Protein Domains in Expressed Sequence Tags", IEEE
Symposium on Computational Intelligence in Bioinformatics and Computational Biology, IEEE
CIBCB, September 2006
                                 E2D
E2D performance
  E2D is about 69 times faster than
  InterProScan, which is the official tool
  provided by InterPro.

[Figure: F-measure (0 to 1) of E2D and InterProScan against EST length, in bins 0-300, 300-500, 500-700, 700-900 and >=900]
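
The slide does not say how the F-measure was computed. A minimal sketch, assuming the usual balanced F-measure (the harmonic mean of precision and recall) over sets of predicted and true domains, would be the following; the function name and the set-based inputs are assumptions.

```python
def f_measure(predicted, actual):
    """Balanced F-measure between predicted and true domain sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)
```

A score of 1 means that the predicted domains match the true ones exactly, and a score of 0 means that none of them do.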
Click “GOBU”
What is Gene Ontology?
GO (Gene Ontology) is a structured
terminology that describes gene products
in a consistent way.
GO Consortium members
  ZFIN (zebrafish), WormBase (worm), TAIR
  (arabidopsis), SGD (yeast), RGD (rat), MGD
  (mouse), HGNC (human), FlyBase (fly), …
Three ontologies are offered
  Cellular component
  Molecular function
  Biological process
          Gene Ontology
The structured terminology
[Figure: GO term hierarchy running from more general to more detailed terms; two nodes are labelled "the same term"]
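
One way to picture this structure is as a directed graph in which each term points to its more general parents. The sketch below uses a toy ontology with made-up labels (not real GO identifiers). It assumes that "the same term" in the figure refers to a term that can be reached along more than one path, and that the root sits at level 0; neither is confirmed by the slides.

```python
# Toy ontology: child -> list of parents. Labels are hypothetical, not real GO IDs.
PARENTS = {
    "root": [],
    "general_A": ["root"],
    "general_B": ["root"],
    "detailed_X": ["general_A", "general_B"],  # reachable by two paths
    "detailed_Y": ["general_A"],
}


def ancestors(term, parents=PARENTS):
    """Return every more general term reachable from term (excluding term itself)."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def depth(term, parents=PARENTS):
    """Shortest distance from the root (the root is at level 0 here)."""
    if not parents[term]:
        return 0
    return 1 + min(depth(p, parents) for p in parents[term])
```

Statistics at a fixed level can then be gathered by mapping each annotated term to its ancestors at that depth.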
               GOBU
Stands for Gene Ontology Browsing Utility.
A visualization tool based on GO terms.
The result of the EST pipeline is the input to
this visualization tool.
This tool helps biologists to obtain a
functional profile of their EST data.
Share your library




                Click “Share”
Share your library




   Key in the other user’s ID and click “Add”
Share your library (cont.)




      Shared ID list
Share your library (cont.)




    A library that someone has shared with you.
        Library Comparisons
                 Library Comparison

     What's the
difference between
these two libraries?




  ※ Bio301 provides a way to compare libraries and report
            how they differ in function.
     Library Comparison
Justifications
  Biologists often sequence more than one library,
  taken from
     different tissues of the same organism
     similar tissues belonging to different organisms
  To study functional differences between libraries,
  we need a standard approach
     It must be applicable to all kinds of EST data
  Bio301 uses the Gene Ontology as the single
  reference for functional comparisons.
     Library Comparison
Two library comparison modules
  Hierarchical clustering of libraries according to
  their expression patterns with respect to GO terms.
  Libraries that cluster closer together have more
  similar expression in terms of gene functions.
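
A minimal sketch of this kind of clustering is shown below. The slides do not say which distance or linkage Bio301 uses, so the choices here are assumptions: each library is reduced to a vector of GO term proportions, distances are Euclidean, and clusters are merged by average linkage. The function names are hypothetical.

```python
import math


def go_profile(counts, terms):
    """Turn raw GO term counts into proportions over a fixed term order."""
    total = sum(counts.get(t, 0) for t in terms)
    return [counts.get(t, 0) / total if total else 0.0 for t in terms]


def distance(p, q):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hierarchical_clustering(profiles):
    """Average-linkage agglomerative clustering.

    profiles: {library_name: profile_vector}
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_distances = [
                    distance(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                avg = sum(pair_distances) / len(pair_distances)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

The merge history can be read as a dendrogram: libraries merged early, at small distances, have similar GO term profiles.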
Library Comparison (cont.)
Two library comparison modules
  Ranking of GO terms according to how far their
  expression deviates from the expectation.
    Higher-ranked terms are likely to be expressed more
    differently between libraries than the other terms.
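
The slides do not state which statistic measures the deviation. As one possible reading, the sketch below scores each GO term with a chi-square style sum of (observed - expected)^2 / expected over the libraries, where the expected count assumes that a term is spread across libraries in proportion to each library's total count. The function name and the input layout are hypothetical.

```python
def rank_terms_by_deviation(counts_by_library):
    """Rank GO terms by a chi-square style deviation from expected counts.

    counts_by_library: {library: {term: count}}
    Expected counts assume each term is spread across libraries in
    proportion to the library totals.
    Returns [(term, score)] sorted from most to least deviating.
    """
    libraries = sorted(counts_by_library)
    terms = sorted({t for counts in counts_by_library.values() for t in counts})
    library_totals = {
        lib: sum(counts_by_library[lib].values()) for lib in libraries
    }
    grand_total = sum(library_totals.values())
    if grand_total == 0:
        return []

    scores = {}
    for term in terms:
        term_total = sum(counts_by_library[lib].get(term, 0) for lib in libraries)
        score = 0.0
        for lib in libraries:
            expected = term_total * library_totals[lib] / grand_total
            if expected > 0:
                observed = counts_by_library[lib].get(term, 0)
                score += (observed - expected) ** 2 / expected
        scores[term] = score
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Terms at the top of the returned list are the ones whose counts differ most from what the library sizes alone would predict.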
Library Comparison (cont.)



                                   2.Key in a name




    1.Click “Library Comparison”
Library Comparison (cont.)




         Select libraries and click ”submit”
Library Comparison (cont.)




        Click “Detail” to view result
Library Comparison (cont.)




        Click to view functional difference
        between libraries
Library Comparison (cont.)
Thank You!

								