Introduction to Bio301
email@example.com

Introduction
- Many biological studies that use high-throughput methods produce huge lists of sequences.
- Many of these sequences are not associated with known genes or functions, which blocks further interpretation.

Introduction (cont.)
- Expressed sequence tag (EST) and complementary DNA (cDNA) sequences provide direct evidence for all the sampled transcripts, and they are currently the most important resources for transcriptome exploration.
- ESTs are short (200–800 nucleotide bases in length), unedited, randomly selected single-pass sequence reads derived from cDNA libraries.

Introduction (cont.)
- EST analysis involves several steps, and an overwhelming number of tools is available for each step.
- As a result, it is often unclear which tool to choose for each step of EST analysis and for the subsequent downstream annotation at the DNA or protein level.

Generic steps involved in EST analysis
Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015

Introduction (cont.)
EST GENERATION
- Messenger RNA (mRNA) sequences in the cell represent copies of expressed genes.
- RNA cannot be cloned directly, so it is reverse transcribed to double-stranded cDNA.
- The resultant cDNA is cloned to make libraries representing a set of transcribed genes of the original cell, tissue or organism.

Characteristics of EST sequences
Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015

EST DATA RESOURCES
- The largest freely available repository of EST data is dbEST (32 889 225 ESTs from 559 different organisms as of Feb 2006).

Resource# | Web site | Contents of EST resource | Organisms | Category*
ApiEST-DB | http://www.cbil.upenn.edu/apidots/ | Raw | Apicomplexan parasites | F, D
dbEST at NCBI | http://www.ncbi.nlm.nih.gov/dbEST/ | Raw | All | F, D
Diatom EST database | http://avesthagen.sznbowler.com. | Raw and clusters | Diatoms | F, D
ESTree | http://www.itb.cnr.it/estree/ | Raw and clusters | Peach | F, D
Fungal genomics project | https://fungalgenomics.concordia.ca/home/index.php | Raw | Fungal | F, D
Honey bee brain EST project | http://titan.biotec.uiuc.edu/bee/honeybee_project.htm | Raw and clusters | Honey Bee | F, D
Nematode ESTs at the Sanger Institute | ftp://ftp.sanger.ac.uk/pub/pathogens/nem_ests/ | Raw and clusters | Parasitic nematodes | F, D
NEMBASE - parasitic nematode ESTs | http://www.nematodes.org | Raw and clusters | Parasitic nematodes | F, D
Parasitic and free-living nematode EST resource | http://www.nematode.net/ | Raw and clusters | Nematodes | F, D
Phytopathogenic Fungi and Oomycete EST database | http://cbr-rbc.nrc-cnrc.gc.ca/services/cogeme/ | Plant pathogenic fungi ESTs | Fungi and oomycetes | F, D
Plant Gene Research, Kazusa DNA Research Institute | http://www.kazusa.or.jp/en/plant/database.html | Raw | Heterogeneous set | F, D
Plant Genome database | http://www.plantgdb.org/ | Raw and clusters | Plants | F, D
Rat EST data at University of Iowa | http://ratest.eng.uiowa.edu | Raw | Rat | F, D
Sanger Institute Xenopus tropicalis EST project | http://www.sanger.ac.uk/Projects/X_tropicalis/ | Raw and clusters | Xenopus | F, D
The TIGR Gene Indices | http://www.tigr.org/tdb/tgi/ | Raw and gene indices | All | F, D
UniGene database at NCBI | www.ncbi.nlm.nih.gov/UniGene | Raw and clusters | All | F, D

*F, free for academic users; D, data available for download.
Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015

Introduction
What does Bio301 provide?
- Sequence management.
- Easier routine investigation of new functional annotations on ESTs.
- Recommendations of sequences for microarray design.
- Functional differences between libraries, obtained using the Gene Ontology.

Bio301 http://bio301.iis.sinica.edu.tw/
Register & Create library

http://bio301.iis.sinica.edu.tw
- Click here to register.

Register
- Fill in the form and click “Submit”.
- A success message is displayed.

Register (cont.)
- Check your email.
- Click the link in the message to activate your account.

Create Library
- Click “Create Library” to create a new library.
- Key in the library name and upload the EST sequences.
- Check the parameters.
- Click “Confirm”, and Bio301 will add the job to the schedule.

EST analysis processing
- Preprocessing stage: vector removal, poly A/T trimming, removal of low-quality ESTs, removal of mitochondrial DNA
- Clustering stage: TIGR assembler (TGICL)
- Annotation stage: BLAST, BLAST result parsing, GO retrieval
- Storing stage: data file processing, SQL, WWW

[Diagram: EST with vector and low-quality sequence and poly A/T -> SeqClean -> ESTs -> TGICL (clustering & assembling ESTs by overlaps) -> TUC1, TUC2, TUS1, TUS2, TUS3 -> TUGs]
- TUC = Tentative Unique Contig
- TUS = Tentative Unique Singleton
- TUG = Tentative Unique Gene

Report
- Bio301 will send you an e-mail when the job is finished.

Bio301 http://bio301.iis.sinica.edu.tw/
Checking Results

Library List
- Click “Detail” to check the results.
- Click “Statistics”.

Statistics
- Gene Ontology statistics at level two
- Click “Summary view”.

Summary View
- Click “Detail” to view annotations.
- GO annotation by BLAST
- GO annotation by E2D

What is E2D?
- Generally, we transfer the annotations of known genes to query sequences that hit them significantly.
- For query sequences that have no significant hits, we use E2D to retrieve their potential functional domains.
- A functional domain can be used to infer functional annotations.

Guo-Hsing Lee, Nai-Yu Chuang, Wen-Dar Lin, Hahn-Ming Lee, Chung-Der Hsiao and Jan-Ming Ho, "E2D: A Novel Tool for Annotating Protein Domains in Expressed Sequence Tags", IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, IEEE CIBCB, September 2006

E2D performance
- E2D is about 69 times faster than InterProScan, the official tool provided by InterPro.
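The accuracy of E2D and InterProScan is reported as an F-measure. As a reminder of what that score means, here is a minimal sketch using the standard definition (the harmonic mean of precision and recall). The slides do not say how the domain predictions were scored, so the function name `f_measure`, the set-based comparison, and the domain labels are assumptions for illustration only.

```python
def f_measure(predicted, reference):
    """Standard F-measure between a predicted and a reference set of labels.

    Assumption: each EST's domain annotations are compared as plain sets.
    The evaluation protocol behind the slides is not stated.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # No overlap (or an empty set): score 0 and avoid dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical domain labels, not real results.
predicted_domains = {"domain_a", "domain_b", "domain_c"}
reference_domains = {"domain_a", "domain_c", "domain_d", "domain_e"}
print(f_measure(predicted_domains, reference_domains))  # 4/7, about 0.571
```

A score of 1 means the predicted domains match the reference exactly, and a score of 0 means they share nothing.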
[Figure: F-measure (0 to 1) of E2D and InterProScan by EST length (0-300, 300-500, 500-700, 700-900, >=900)]

- Click “GOBU”.

What is Gene Ontology?
- GO (Gene Ontology) is a structured terminology that describes gene products in a consistent way.
- GO Consortium members: ZFIN (zebrafish), WormBase (worm), TAIR (arabidopsis), SGD (yeast), RGD (rat), MGD (mouse), HGNC (human), FlyBase (fly), …
- Three ontologies are offered:
  - Cellular component
  - Molecular function
  - Biological process

Gene Ontology
- The structured terminology
[Diagram labels: more general, more detailed, the same term, the same term]

GOBU
- Stands for Gene Ontology Browsing Utility.
- A visualization tool based on GO terms.
- The result of the EST pipeline is the input to this visualization tool.
- The tool helps biologists obtain a functional profile of their EST data.

Share your library
- Click “Share”.

Share your library
- Key in someone’s ID and click “Add”.

Share your library (cont.)
- Shared ID list

Share your library (cont.)
- Libraries that other users have shared with you.

Bio301 http://bio301.iis.sinica.edu.tw/
Library Comparison

Library Comparison
- What's the difference between these two libraries?
※ We provide a way to compare libraries and identify how they differ in function.

Library Comparison
Justifications
- Biologists commonly sequence more than one library:
  - from different tissues of the same organism
  - from similar tissues belonging to different organisms
- To study functional differences between libraries, we need a standard approach.
  - It must be appropriate to all kinds of EST data.
- Bio301 uses the Gene Ontology as the sole reference for functional comparisons.

Library Comparison
Two library comparison modules
- Hierarchical clustering of libraries according to their expression patterns with respect to GO terms.
- Libraries that cluster closer together are inferred to express similar gene functions.

Library Comparison (cont.)
Two library comparison modules
- Rank GO terms according to how far their expression deviates from the expectation.
- Terms ranked earlier may be expressed “more differently” between libraries than terms ranked later.

Library Comparison (cont.)
1. Click “Library Comparison”.
2. Key in a name.

Library Comparison (cont.)
- Select libraries and click “Submit”.

Library Comparison (cont.)
- Click “Detail” to view the result.

Library Comparison (cont.)
- Click to view the functional differences between libraries.

Library Comparison (cont.)

Thank You!
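The second library comparison module ranks GO terms by how far their expression deviates from the expectation. The slides do not state which statistic Bio301 uses, so the following is only a sketch under an assumed chi-square-style score. The expected count of a term in a library is taken to be proportional to the library's size, and each term is scored by its summed squared deviation from that expectation. The function name `rank_go_terms` and the toy counts are hypothetical.

```python
from collections import Counter


def rank_go_terms(library_counts):
    """Rank GO terms by deviation from expected counts across libraries.

    library_counts maps a library name to {GO term: number of annotated sequences}.
    Assumption: a chi-square-style score; Bio301's actual statistic is not stated.
    """
    library_totals = {lib: sum(counts.values()) for lib, counts in library_counts.items()}
    grand_total = sum(library_totals.values())

    term_totals = Counter()
    for counts in library_counts.values():
        term_totals.update(counts)

    scores = {}
    for term, term_total in term_totals.items():
        score = 0.0
        for lib, counts in library_counts.items():
            # Expected count if the term were spread evenly relative to library size.
            expected = term_total * library_totals[lib] / grand_total
            observed = counts.get(term, 0)
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores[term] = score

    # Largest deviation first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy counts for two hypothetical libraries, not real data.
toy_libraries = {
    "library_1": {"GO:0008152": 30, "GO:0006810": 10, "GO:0007165": 10},
    "library_2": {"GO:0008152": 10, "GO:0006810": 10, "GO:0007165": 30},
}
for go_term, score in rank_go_terms(toy_libraries):
    print(go_term, score)
```

In this toy input, GO:0006810 has the same count in both libraries, so it scores 0 and is ranked last, while the other two terms are skewed toward one library each and rank first.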