BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment
Youngik Yang, Jong Youl Choi, Kwangmin Choi, Marlon Pierce, Dennis Gannon, and Sun Kim
School of Informatics, Indiana University

CONTENTS
• Introduction
• Approach
• Related Works
• Microarray technology
• System Architecture
• Experiments
• Conclusion
• Demo

INTRODUCTION
• Analysis of high-throughput microarray experiments
• Performing microarray analysis is a demanding task for biologists and small research labs
• Computing infrastructure issue
  – Computationally intensive
  – Nontrivial to integrate various bioinformatics applications
• Exploratory data analysis issue
  – Multiple tasks in a single batch
  – Repetitive execution

APPROACH
• On-demand computing resources
• A suite of microarray analysis applications
• A reconfigurable GUI workflow composer can alleviate the technical burden
  – A well-defined workflow can be used repeatedly
• Web portal
• A reusable, reconfigurable, high-level workflow execution workbench, powered by computing clouds, for microarray gene expression analyses

RELATED WORKS
• Efficient and user-friendly workflow composers and execution engines
  – SIBIOS, BioWBI, KDE Bioscience
• Distributed and heterogeneous computing resources combined with a workflow system
  – Taverna, Triana, Kepler, GNARE, RENCI-Bioportal

MICROARRAY TECHNOLOGY
• A cell expresses a subset of its genes in response to environmental changes and its own changing needs
• Dynamics of cell activity
• Measure gene expression levels of hundreds of thousands of genes within a cell
• Usage
  – Function prediction: guilt by association
  – Interaction: co-expression of genes in transcription networks reveals how they interact
  – Drug discovery: identify genes related to a given disease and assess the effectiveness of new drugs

Source: www.liv.ac.uk/lmf/about_microarrays.htm

RESEARCH GOALS
• Gene expression analysis
  – Search for similar patterns of genes
    • Similar expression patterns may reveal the function of a gene whose function is unknown
  – Extraction of differentially expressed genes
    • Statistical evaluation
  – Clustering
    • Protein function prediction
    • Genes with similar expression may need to be studied as a group
  – Component analysis
    • Hidden structure of expression patterns may be revealed
• Expression network analysis
  – Expose hidden structures
  – Protein-protein interaction (PPI) network analysis
    • Central issue: key role in understanding how a cellular system works
    • Modularity in the structure of a network may reflect higher-level functional organization of cellular components
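The "search for similar patterns" goal can be illustrated with a small sketch. It assumes Pearson correlation as the similarity measure (the slides only say "correlation"), and the gene names and expression values are made up for illustration; it is not the BioVLAB implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm if norm else 0.0  # a flat profile has no defined correlation

def most_similar(query, profiles, top=2):
    """Rank all other genes by correlation with the query gene's profile."""
    scores = [(pearson(profiles[query], p), g)
              for g, p in profiles.items() if g != query]
    return sorted(scores, reverse=True)[:top]

# Hypothetical expression values over five time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 2.0, 1.0],
    "geneB": [2.0, 4.0, 6.0, 4.0, 2.0],  # same shape as geneA, scaled
    "geneC": [3.0, 2.0, 1.0, 2.0, 3.0],  # mirror image of geneA
    "geneD": [1.0, 1.5, 1.0, 1.5, 1.0],
}
ranked = most_similar("geneA", profiles)
```

Here `geneB` ranks first because its profile has the same shape as `geneA`, which is the "guilt by association" idea: a gene of unknown function that tracks a well-characterized gene is a candidate for a related function.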

MICROARRAY ANALYSIS COMMON TASK

• The output of one task can be plugged into another task
• The same set of tasks is repeated with small changes to the parameters
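As a minimal sketch of these two points, the snippet below chains two tasks and then repeats the chain while varying one parameter. The task names (`select_variable`, `center`), the threshold values, and the data are hypothetical and chosen only for illustration.

```python
def select_variable(profiles, min_range):
    """Task 1: keep genes whose expression range is at least min_range."""
    return {g: v for g, v in profiles.items() if max(v) - min(v) >= min_range}

def center(profiles):
    """Task 2: subtract each gene's mean from its own profile."""
    return {g: [x - sum(v) / len(v) for x in v] for g, v in profiles.items()}

def run_pipeline(profiles, min_range):
    """Plug the output of task 1 straight into task 2."""
    return center(select_variable(profiles, min_range))

# Made-up expression values for three genes.
profiles = {"g1": [1.0, 5.0, 3.0], "g2": [2.0, 2.5, 2.0], "g3": [0.0, 2.0, 4.0]}

# Repeat the same two tasks, changing only the threshold parameter.
results = {t: sorted(run_pipeline(profiles, t)) for t in (0.5, 1.0, 5.0)}
# results == {0.5: ['g1', 'g2', 'g3'], 1.0: ['g1', 'g3'], 5.0: []}
```

Writing the loop by hand like this is the repetitive step that a workflow composer is meant to take over.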

SYSTEM ARCHITECTURE
• Workflow composer and execution engine
• Application services
• Web portal

[Figure: architecture diagram. Components: Workflow Composer & Execution, Application Services, Web Portal. Labels: Create, Execute, Manage Data.]

WORKFLOW COMPOSER & EXECUTION ENGINE
• Introduced in scientific communities to execute multiple tasks as a batch
• Makes repetitive tasks easy
• Directed acyclic graph
  – Node: application to execute
    • Starting node: input
    • End node: output
  – Edge: a flow of data

[Figure: example workflow graph with nodes Input, Task A, Task B, Task C, and Output]
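The directed-acyclic-graph model can be sketched in a few lines. This is an illustration only, not XBaya's engine: it assumes Python 3.9+ for the standard-library `graphlib` module, and the task names and functions are hypothetical.

```python
from graphlib import TopologicalSorter

def run_workflow(tasks, edges, workflow_input):
    """Run every task after all of its upstream tasks have produced output.

    tasks: task name -> function taking a list of upstream outputs
    edges: task name -> set of upstream task names (empty set: reads the input)
    """
    outputs = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(edges).static_order():
        upstream = [outputs[p] for p in sorted(edges.get(name, ()))]
        outputs[name] = tasks[name](upstream or [workflow_input])
    return outputs

# Hypothetical three-task workflow: A feeds B, and both feed C.
tasks = {
    "A": lambda ins: ins[0] + 1,
    "B": lambda ins: ins[0] * 2,
    "C": lambda ins: sum(ins),
}
edges = {"A": set(), "B": {"A"}, "C": {"A", "B"}}

outputs = run_workflow(tasks, edges, workflow_input=3)
# outputs == {"A": 4, "B": 8, "C": 12}
```

Because the graph is acyclic, a topological order always exists, so each node's inputs are ready by the time it runs. `TopologicalSorter` raises `CycleError` if the edges contain a cycle.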

XBaya
• GUI workflow composer and execution engine
• Developed at IU
• Drag-and-drop composition from the workbench
• Monitors the status of workflow execution

[Figure: XBaya screenshot. Panels: Workflow Composer Panel, Workbench Panel, Application Information Panel, Monitor Panel. Annotation: Drag-and-drop.]

APPLICATION SERVICES
• Interoperability among applications can be achieved through application services
• Generic Service Toolkit (Gfac)
  – Gfac converts a command-line bioinformatics application into a web service
• On-demand computing resources
  – Amazon Elastic Compute Cloud (EC2)
• Remote storage services
  – Amazon Simple Storage Service (S3)
  – Microsoft Application-Based Storage
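The idea of exposing a command-line application behind a uniform request/response interface can be sketched as follows. This is not Gfac's API: the function `call_app`, the `--key value` argument convention, and the JSON response shape are all assumptions made for illustration, and a Python one-liner stands in for a real bioinformatics tool.

```python
import json
import subprocess
import sys

def call_app(command, request):
    """Run a command-line program with parameters taken from a request dict.

    Each key/value pair becomes a "--key value" argument. The exit status,
    stdout and stderr come back as a JSON string, as a service reply might.
    """
    args = list(command)
    for key, value in request.items():
        args += [f"--{key}", str(value)]
    done = subprocess.run(args, capture_output=True, text=True)
    return json.dumps({
        "status": done.returncode,
        "stdout": done.stdout.strip(),
        "stderr": done.stderr.strip(),
    })

# Stand-in "application": a Python one-liner that echoes its arguments.
echo_app = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]

response = json.loads(call_app(echo_app, {"method": "kmeans", "k": 4}))
# response["stdout"] == "--method kmeans --k 4"
```

Once every tool answers the same kind of request, a workflow engine can connect tools without knowing each one's command-line conventions, which is the interoperability point made above.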

BioVLAB APPLICATION DEVELOPMENT PROCEDURE
• Develop a command-line application
• Install the application on Amazon EC2
• Have the application store its output in Amazon S3 or Microsoft Application-Based Storage
• Make a virtual machine image
• Register the application using Gfac
• Instantiate EC2 and run the application using XBaya

[Figure: Gfac registration form (from the Gfac user manual). Role labels: User, Admin User.]

WEB PORTAL
• Administrator
  – Management of registered applications through the Gfac registry portlet
  – User management and access control
• User
  – Access to stored data
• Built with Open Grid Computing Environments (OGCE)

ANALYSIS RESOURCES
• R: statistical learning
• Bioconductor: microarray analysis
• Data acquisition: NCBI GEO Microarray DB
• Similar expression pattern: correlation
• Differentially expressed gene: limma package
• Clustering: K-means, hierarchical clustering, QT clustering, biclustering, self-organizing map (SOM)
• Component analysis: principal component analysis (PCA) and independent component analysis (ICA)
• Network: Database of Interacting Proteins (DIP), Perl Graph package, and GraphViz
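To make the clustering entry concrete, here is a minimal K-means sketch (Lloyd's algorithm). It is a stand-in written in plain Python rather than the R routines listed above. It takes the first k points as initial centroids, a simplification chosen to keep the example deterministic, and the data are made up.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-condition expression profiles forming two obvious groups.
points = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
labels, centroids = kmeans(points, k=2)
# labels == [0, 1, 0, 1, 0]
```

Production implementations typically use random or smarter initialization and several restarts, because the result of K-means depends on the starting centroids.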

EXPERIMENT
• Data set: GDS38
  – Remotely retrieved from the NCBI GEO database
  – Time-series gene expression data for observing the cell cycle in the yeast Saccharomyces cerevisiae
  – 7,680 spots in each of 16 samples
  – Samples were taken every 7 minutes as the cells went through the cell cycle
• Expression analysis
• PPI network analysis

EXPERIMENTS

CONCLUSION
• Microarray data analysis in virtual environment
• Coupling computing clouds and GUI workflow engine
• Effective system design for small research labs

FUTURE WORKS
• Integration of more packages and analyses
• A system of great flexibility
  – Integrate various high-throughput data
    • Microarray, mass spectrometry, massively parallel sequencing, etc.
  – Integrate various computing resources
    • Clouds, grids, and multi-core PCs
  – Integrate various public resources
    • NCBI, KEGG, PDB, etc.

SCREEN SHOTS

S3 BROWSER

EC2 ACTIVE INSTANCE

WORKFLOW FOR CLUSTERING

INPUT PARAMETERS

WORKFLOW EXECUTION

DATA ACQUISITION

SUBSET EXTRACTION

CLUSTERINGS

WORKFLOW TERMINATION

EXPERIMENT RESULT

DOWNLOAD FILE

HEATMAP FOR K-MEANS CLUSTERING

ACKNOWLEDGEMENT
• This work is partially supported by NSF MCB 0731950 and a MetaCyt Microbial Systems Biology grant from Lilly Foundations.
• Extreme Computing Group at IU
  – Suresh Marru, Srinath Perera, and Chathura Herath

Thank You