
					InnateDB – Facilitating Systems Level Analyses of the Mammalian Innate Immune Response

David Lynn M.Sc., Ph.D., Research Associate, Brinkman Lab., Simon Fraser University & Hancock Lab., University of British Columbia.

InnateDB & Data Analysis Workshop - UBC, Vancouver. April 2nd & 3rd 2008. Updated Sept. 2009.

Systems Biology Approaches to Investigating the Innate Immune Response:



Progress has been made in understanding the innate immune response, including the detailed dissection of some of the critical signaling pathways involved.



It is now becoming clear that the innate immune response does not involve simple linear pathways but rather complex networks of pathways and interactions, negative feedback loops and multifaceted transcriptional responses.



To better understand the complexities of the innate immune response and the crosstalk between its components, complementary systems-level analyses and more focused follow-up experimental approaches are now needed.

InnateDB Developed in the Context of Two Large International Systems Biology Projects
Mouse Model Datasets: Cerebral Malaria mouse model (IMR, Australia); Tuberculosis mouse model (AECM); Shigella xenograft model (Pasteur).
Human Clinical Datasets: Typhoid & Malaria, Vietnam (OUCRU/Stanford/Sanger); Non-Typhoidal Salmonella, Malawi (Sanger); Chronic/Acute Helminth, Ecuador (USF de Quito/Sanger).

+

Modulating the innate immune response via Host Defense peptides (Hancock lab, UBC); Mouse KOs (Sanger).

Novel insight into host response and mechanism of peptides. Common Pathways, networks and transcriptional regulation?

Why Systems Approaches are Needed:



Many layers of complexity.
Layers of regulation.
100s–1000s of differentially expressed (DE) genes.
Not simple pathways but networks of molecular interactions.
Gardy*, Lynn*, Brinkman, Hancock. Enabling a systems biology approach to immunology: focus on innate immunity. Trends in Immunology, June 2009.



The Need for InnateDB & the Manual Curation of Innate Immunity Relevant Molecular Interactions & Pathways.


It quickly became apparent that available resources provided poor coverage and detail of the molecular interactions and pathways relevant to innate immunity.
This information is essential for the systems-oriented interpretation of large-scale genomics data. TLR4, one of the most important molecules in the innate immune response, has relatively few molecular interactions annotated in the major publicly available interaction DBs: five of these DBs combined contained annotated molecular interactions between TLR4 and just 11 other proteins. Through a review of the literature we have curated, in detail, a further 16 unique interactions and annotated nearly 60 different lines of evidence supporting them. Relatively new pathways (the NLR and RLR pathways) were not annotated at all in the major pathway databases. Few resources accessible to a biologist were available for analyzing data in a pathway/network context, and none were specific to innate immunity.













Overview of InnateDB Project (www.innatedb.ca)


InnateDB (www.innatedb.ca) is a database of all experimentally verified human and mouse interactions and pathways (and their component molecules: genes, proteins and RNAs), with particular emphasis on the contextual manual curation of interactions involved in innate immunity (10,000+ interactions). InnateDB facilitates systems-level analyses of mammalian signaling through integrated bioinformatics and visualization tools: pathway & ontology analysis, network construction & analysis, orthologs, Cerebral, Cytoscape, CyOOg, etc. The manual curation project and the integration of publicly available databases into InnateDB greatly increase the coverage of innate immunity-relevant molecular interaction networks & pathways.









Enable biologists without a computational background to explore their data in a more systems-oriented, yet user-friendly, manner.

Contextually Curating Innate Immunity-Relevant Interactions


Manual curation of >10,000 innate immunity-relevant interactions (human and mouse), involving 2,700+ genes, from review of 2,600+ unique publications. Curation can often double the number of interactions annotated for a given gene. Pathways & interactions are curated with contextual annotations (supporting publication; participant molecules; species; interaction detection method; host system; interaction type; cell, cell-line and tissue types, etc.). We developed the InnateDB submission system software to allow submission of interaction annotations in an ontology-controlled, MIMIx & PSI-MI 2.5 compliant manner, and curator tool software to allow curators to modify existing annotations.











Going Beyond Innate Immunity – A Centralized Resource for Interactions & Pathways
Aside from the well-known signalling pathways, a range of other disparate processes, including apoptosis, ubiquitination, endocytosis, cell activation and recruitment, are all required to mount an effective innate immune response. Adding to this complexity, the borders between the innate and adaptive immune responses are becoming increasingly blurred. Furthermore, if we hope to identify new networks or pathways involved in innate immunity, analyses must include genes and proteins that are not yet known to play specific roles in the innate immune response. To address these issues, InnateDB also incorporates data on the entire human and mouse interactomes.









Going Beyond Innate Immunity – An Integrative Biology Resource


115,000+ human and mouse interactions extracted & loaded from the BIND, IntAct, DIP, BioGRID & MINT DBs. Genes are cross-referenced to >3,000 pathways from the KEGG, PID, BioCarta, INOH, NetPath & Reactome DBs, which allows one to visualize/analyze the interactions associated with a specific pathway and supports pathway over-representation analysis (ORA). Annotation from Ensembl provides details of human & mouse genes, transcripts and proteins; UniProt, Entrez and Gene Ontology provide rich protein & gene annotation.







Through manual curation & integration of existing data from publicly available databases we can greatly increase innate immunity relevant networks

TLR4 direct and secondary interactions annotated by MINT Database

TLR4 direct and secondary interactions annotated by InnateDB

Direct and Secondary Interactions of TLR4 in InnateDB (~20% of these interactions unique to InnateDB)


InnateDB – Advanced Yet User-Friendly Searching – Find & Analyze Relevant Interactions, Pathways & Genes/Proteins.

Search for interactions by: gene, pathway, interaction type (e.g. phosphorylation), detection method (e.g. GST pull down), cell/tissue type, protein-protein, protein-DNA/RNA, etc.

Rich contextual annotation for each interaction. Find orthologous interactions in other species. Download in variety of formats (e.g. PSI XML)

- Visualize & analyze interactions in Cerebral, a Cytoscape plugin that uses subcellular localization annotation to generate more biologically intuitive, pathway-like layouts of networks.

InnateDB – Facilitating Systems-Level Analyses of Gene Expression Data:
Upload your own gene expression data: up to 10 conditions/timepoints at one time.
Overlay gene expression data from multiple conditions on networks/pathways.
Pathway, Gene Ontology & TF ORA tools: find DE pathways, functionally related genes and TFs.

Go Beyond Pathway Analysis – Differentially Expressed Sub-networks – New Pathways? How Are DE Genes Actually Inter-connected? Central Regulators (Network Hubs)

Pathway Analysis – Any type of Quantitative Data.

InnateDB  3,000+ human & mouse pathway annotations  all processes not just immunity.

Orthologous Pathways

GWA candidate-associated genes: InnateDB pathway analysis can identify over-represented pathways and highlight potentially unknown relationships between markers on different chromosomes.

Constructing & Analyzing Networks Using InnateDB


Pathway analysis can be very powerful in determining which annotated pathways are most significantly associated with DE genes.



Network analysis allows a move from a simple view of the signaling response to a more comprehensive analysis of the molecular interactions between DE genes and their encoded proteins & RNAs.
Potentially uncover as yet unknown signaling cascades or pathways, functionally relevant subnetworks and the central molecules, or hubs, of these networks.
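As a rough illustration of the hub idea, the sketch below builds an undirected network from interaction pairs, keeps only interactions in which both partners are DE genes, and ranks genes by degree (number of partners). The function names, the toy interaction list and the use of degree as the hub measure are assumptions for illustration, not InnateDB's network construction algorithm.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (gene_a, gene_b) interaction pairs."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # self-interactions do not connect two different genes
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def de_subnetwork(adjacency, de_genes):
    """Keep only interactions in which both partners are DE genes.

    DE genes with no DE interaction partner are dropped from the result.
    """
    de = set(de_genes)
    return {g: adjacency[g] & de for g in de if g in adjacency and adjacency[g] & de}


def rank_hubs(subnetwork, top=5):
    """Rank genes by degree (number of partners); ties are broken alphabetically."""
    return sorted(subnetwork, key=lambda g: (-len(subnetwork[g]), g))[:top]


# Toy example with illustrative gene symbols (not curated data)
interactions = [("TLR4", "MYD88"), ("MYD88", "IRAK1"), ("MYD88", "IRAK4"),
                ("IRAK1", "TRAF6"), ("TRAF6", "MAP3K7")]
de_genes = ["TLR4", "MYD88", "IRAK1", "IRAK4", "TRAF6"]
hubs = rank_hubs(de_subnetwork(build_network(interactions), de_genes), top=2)
print(hubs)  # ['MYD88', 'IRAK1']
```

Restricting the sub-network to DE-DE interactions is only one possible design; a broader variant would also keep non-DE first neighbors that link DE genes together.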



Results: Visualize Gene Expression Data in an Interaction Network Context
Click here to visualize interactions in Cerebral with overlaid gene expression data.

Multi-experiment View in Cerebral

Robust Orthology & Gene Order Predictions – Facilitating Comparative Analysis



The majority of mammalian interaction data available in InnateDB and other interaction databases refers to human genes and proteins. To facilitate comparative network-based analysis of the human, mouse and bovine interactomes, detailed orthology predictions have been integrated into InnateDB. Orthology predictions are generated using an in-house method, Ortholuge, which provides accurate predictions of orthology using a phylogenetic distance-based approach. They are further supported by a human and mouse gene order and synteny browser developed for InnateDB.







A Guide to Using InnateDB

InnateDB – User Friendly Interface www.innatedb.ca

Not sure what you want to search for?… Browse InnateDB by Interaction Type, Pathway or Various Immune Gene Lists

All InnateDB Interactions Can be Downloaded in Proteomics Standards Initiative (PSI) 2.5 XML Format

Resources Page: Details of Relevant Software, Databases, and Immune gene Lists

Statistics on Curated Interactions & Interactions from other Databases

Use contact form or send email to innatedb-mail@sfu.ca to report bugs, errors or to get involved in curation.

Documentation, Tutorials & Help

Searching InnateDB

Do a simple search for genes, proteins or interactions of interest on the InnateDB homepage, e.g. IRAK genes.

Advanced Search for Genes & Proteins

Search using a wide variety of terms. Construct more complex queries using Boolean operators.

Advanced Search for Interactions

InnateDB contains detailed information for more than 115,000 human and mouse molecular interactions integrated from several of the major public interaction databases, along with 10,000+ manually curated innate immunity-relevant interactions. Return direct interactors or their secondary interacting neighbors. Allows users to search for interactions in particular cell/tissue types, of particular interaction types (e.g. phosphorylation), or by experiment type (e.g. co-immunoprecipitation).
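Returning direct interactors versus secondary interacting neighbors amounts to a breadth-first expansion of one or two steps from the query gene. The sketch below assumes interactions are given as undirected gene pairs; `neighbors` is a hypothetical helper, not part of InnateDB.

```python
from collections import defaultdict


def neighbors(interactions, gene, depth=1):
    """Return genes within `depth` interaction steps of `gene` (excluding `gene` itself).

    depth=1 gives direct interactors; depth=2 also includes secondary interacting neighbors.
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {gene}
    frontier = {gene}
    for _ in range(depth):
        # Expand one step outwards, ignoring genes already reached
        frontier = {n for g in frontier for n in adjacency[g]} - seen
        seen |= frontier
    return seen - {gene}
```

For example, with the toy pairs TLR4-MYD88, TLR4-LY96, MYD88-IRAK4 and IRAK4-IRAK1, depth 1 from TLR4 returns MYD88 and LY96, and depth 2 adds IRAK4.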

To reduce redundancy, interactions in InnateDB that have the same participants and interaction type are grouped together by default. Choose 'No' to return all redundant interactions separately.
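The default grouping can be pictured as keying each interaction on its unordered pair of participants plus its interaction type. The record fields used here ('a', 'b', 'type', 'pmid') are hypothetical and do not reflect InnateDB's internal schema.

```python
from collections import defaultdict


def group_interactions(records):
    """Group interaction records that share the same participants and interaction type.

    Participants are treated as unordered, so A-B and B-A fall into the same group.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (frozenset((rec["a"], rec["b"])), rec["type"])
        groups[key].append(rec)
    return groups
```

Each group then corresponds to one row in the grouped view, with its member records supplying the separate lines of evidence.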

Search for Particular Interactions or Genes that are in a Specific Pathway

Select a pathway from the list or search more than 3,000 pathways by typing the pathway name in the box. Pathways and the interactions in them are species specific. By default, the list of molecular interactions returned is restricted to interactions only between annotated members of the pathway. If "No" is selected, a more comprehensive list of interactions is returned, displaying interactions between pathway members and all other molecules with which they interact.
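The difference between the default and the "No" option comes down to whether one or both partners must be annotated pathway members. A minimal sketch with a hypothetical `pathway_interactions` helper, assuming interactions are given as gene pairs:

```python
def pathway_interactions(interactions, pathway_members, restrict=True):
    """Return the interactions shown for a pathway.

    restrict=True: keep only interactions in which both partners are pathway members.
    restrict=False: keep every interaction involving at least one pathway member.
    """
    members = set(pathway_members)
    if restrict:
        return [(a, b) for a, b in interactions if a in members and b in members]
    return [(a, b) for a, b in interactions if a in members or b in members]
```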

Search Results: searching for genes of interest e.g. IRAK
Allows users to sort large sets of results by available fields.
Allows users to download results to an Excel file or Tab/CSV text files.

Display columns not shown by default.

Search Results: searching for genes of interest e.g. IRAK

View interactions involving this gene and its encoded proteins

Interaction Results Page.

View evidence supporting this interaction & contextual details e.g. cell type etc


There may be multiple evidence references for an interaction.

Visualize Interactions in a subcellular localization-based layout using the Cerebral plugin for Cytoscape.
Click here to visualize interactions in Cerebral

You must have a recent version of Java installed.

How a biologist thinks of a pathway…

Pathway Visualization in Cytoscape

Pathway Visualization using Cerebral
(Bioinformatics 2007)

www.pathogenomics.ca/cerebral

A Quick Guide to Using Cerebral in InnateDB


Cerebral can be used to visualize interaction networks from a set of interactions from InnateDB. Cerebral uses subcellular localization annotations to provide more biologically intuitive, pathway-like layouts of interaction networks.





Note: the subcellular localizations in Cerebral should only be used as a guide. Many proteins have no annotated subcellular localization, and many others have multiple possible localizations. Only one is shown; nuclear, extracellular and membrane localizations take precedence over cytoplasm when there are multiple.
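The precedence rule for nodes with several localizations can be expressed as a small lookup. The sketch assumes simplified localization labels and an arbitrary relative order for nuclear, extracellular and membrane (the text only says these take precedence over cytoplasm); labels outside the list are treated as unannotated here.

```python
# Hypothetical labels and order: only "these three beat cytoplasm" comes from the text;
# the relative order of extracellular, membrane and nucleus is an assumption.
PRECEDENCE = ["extracellular", "membrane", "nucleus", "cytoplasm"]


def display_layer(localizations):
    """Return the single localization used to place a node, or None if none is usable."""
    annotated = set(localizations)
    for layer in PRECEDENCE:
        if layer in annotated:
            return layer
    return None  # no usable annotation for this node
```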
InnateDB batch searching allows users to upload a list of genes along with associated gene expression data from up to 4 different conditions. Gene expression data can be overlaid on network data and you can visualize this in Cerebral.





Opening Interaction Data in Cerebral from an Interaction Results page in InnateDB.


You will be prompted to open a .jnlp file. We recommend saving this file to your computer and then opening it, as this allows you to save a copy of the dataset. Opening the .jnlp file directly without saving sometimes causes Cerebral to hang when loading large datasets. Note: to use Cerebral you need to install Java version 6 or greater. You can get this from http://java.com/en/download/index.jsp









Opening Cerebral



Cerebral is a Java plugin for the Cytoscape Visualization software. When you open the .jnlp file Cytoscape will begin downloading. You will then be prompted – “Do you want to run the application” – click Run.





Cerebral is Now Open and Displays Interactions Based on Protein Subcellular Localizations

Re-size the Network

Click here to re-size the network display to fullscreen.

Navigating in Cerebral


Right click and push your mouse forward or back to zoom. Hold middle button of your mouse and drag to navigate around the network. Grey nodes do not have an annotated subcellular localization (from Gene Ontology data in InnateDB). Lines connecting nodes represent interactions. Dashed lines have only 1 supporting publication in InnateDB. The thicker the line the more publications support the interaction.
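The edge styling rule can be written as a tiny function: one supporting publication gives a dashed line, and more publications give a thicker line. The base width and per-publication increment are hypothetical values; Cerebral's actual scaling is not described here.

```python
def edge_style(n_publications, base_width=1.0, step=0.5):
    """Return (line_style, width) for an interaction edge.

    base_width and step are illustrative values, not Cerebral settings.
    """
    if n_publications < 1:
        raise ValueError("an interaction needs at least one supporting publication")
    style = "dashed" if n_publications == 1 else "solid"
    width = base_width + step * (n_publications - 1)  # thicker with more support
    return style, width
```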







Interactively Link back to InnateDB to Look up Information on Particular Genes/Interactions of Interest.



Right-click on a node (protein/gene) or edge (interaction line) to link to the relevant gene or interaction details page in InnateDB.

Nodes Can be Dragged to Other Layers as Desired.

Click on & drag nodes to other layers.

Click “Node Attribute Browser” and select from list to see node attributes.

Do a simple search for genes, proteins or interactions of interest on the InnateDB homepage, e.g. IRAK genes.

View Detailed Gene Annotation.

Gene Details Page.

Click here to see the protein details page with protein-specific annotation.

Gene Details Page – Molecular Interactions & Gene Ontology Annotation.

Integrated Orthology & Gene Order Information: Mouse & Cow Orthologous Genes. The Ortholuge method uses a phylogenetic distance-based approach to predict orthologs.

Click here to show human & mouse gene order for this gene. SSD orthologs are well-supported orthologs.

Human/Mouse Conserved Gene Order & Synteny Browser

Gene Details Page – Associated Pathways

Gene Details Page – Cross-references to other Databases

Integrating Gene Expression Data in a Molecular Interaction Network and Pathway Context

InnateDB – Integrating Gene Expression Data in a Molecular Interaction Network and Pathway Context
Integrate microarray gene expression data with molecular interaction data, pathway associations and rich gene annotation.

Batch Search of InnateDB

Differentially expressed genes

Orthologs in Human, Mouse or Cow

Orthologous Interaction Networks


Detailed protein/gene interaction data are mainly available for human, but InnateDB ortholog predictions in mouse and cow can be used to:


Build the hypothetical orthologous interaction network for genes of interest in these species. Find associations to pathways for orthologous genes, e.g. map pathways to mouse genes based on human orthology. Predict potential differences between species, e.g. a missing orthologous gene in one species may call into question that species' reliability as a model organism for the network of interest. Compare predicted orthologous networks to experimental data, e.g. in mouse.
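Building the hypothetical orthologous network amounts to mapping each human interaction through an ortholog table and recording genes that have no ortholog. The sketch assumes a simple one-to-one `ortholog_map` dictionary, a hypothetical structure: real Ortholuge predictions carry more detail, and one-to-many relationships are not handled here.

```python
def project_network(interactions, ortholog_map):
    """Map human interactions onto another species through a one-to-one ortholog map.

    Returns (projected_interactions, human_genes_without_ortholog). Missing genes flag
    parts of the network the model organism may not reproduce.
    """
    projected, missing = [], set()
    for a, b in interactions:
        for gene in (a, b):
            if gene not in ortholog_map:
                missing.add(gene)
        if a in ortholog_map and b in ortholog_map:
            projected.append((ortholog_map[a], ortholog_map[b]))
    return projected, missing
```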







Example Tab-delimited File:

Column 2: Ensembl Gene IDs. Use these as the cross-reference IDs to link your data into InnateDB.
Column 3: fold change in gene expression value for condition 1.
Column 4: P values associated with the gene expression values for condition 1.
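A minimal parser for a file laid out like this example might look as follows. It assumes there is no header line and ignores column 1, whose content is not described; only the condition-1 columns listed above are read.

```python
import csv
import io


def parse_expression_file(text):
    """Parse tab-delimited upload text laid out like the example file.

    Column 2: Ensembl Gene ID, column 3: fold change (condition 1),
    column 4: P value (condition 1). Column 1 is ignored.
    """
    rows = []
    for line_no, fields in enumerate(csv.reader(io.StringIO(text), delimiter="\t"), start=1):
        if not any(f.strip() for f in fields):
            continue  # skip blank lines
        if len(fields) < 4:
            raise ValueError(f"line {line_no}: expected at least 4 tab-separated columns")
        gene_id = fields[1].strip()
        fold_change = float(fields[2])
        p_value = float(fields[3])
        rows.append((gene_id, fold_change, p_value))
    return rows
```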

Upload Gene/Protein List to InnateDB Along with Any Associated Quantitative Data

Select a file to upload by clicking on the "Upload File" button - upload a tab-delimited file of protein/gene identifiers or accession numbers and obtain a list of all genes, proteins, pathways, interactors or interactions that they are associated with. Alternatively, click on the "Web Form" button and paste your tab-delimited data in the text box (max. 1000 lines)

Results: Visualize Gene Expression Data in an Interaction Network Context

Click here to visualize interactions in Cerebral with overlaid gene expression data.

Open Cerebral as previously shown.

Multi-experiment View in Cerebral

Mini-windows show overlaid gene expression data for each condition. Red = up-regulated; green = down-regulated. Set thresholds for differentially expressed genes. Click on one of the mini-windows to view data for that condition in the large window.

Click on these buttons in two different mini-windows to display changes in gene expression from one condition to another in the larger window.

Cerebral – Multi-Array Viewer

Click “Node Attribute Browser” and select from list to see node attributes.
Uncheck this box if changes are not shown correctly.

Cerebral – Multi-Array Viewer
Click here to display graphs of change in expression for each gene. Clicking a line highlights that gene in the network. K-means clusters of genes with similar patterns of gene expression.
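K-means clustering groups genes whose expression changes follow similar patterns across conditions. The plain-Python sketch below illustrates the technique on per-gene profiles (one value per condition). The deterministic initialisation from the first k genes and the squared Euclidean distance are assumptions, since Cerebral's own settings are not described here.

```python
def kmeans(profiles, k, iterations=100):
    """Cluster {gene: [value per condition]} profiles into k groups; returns {gene: cluster}."""
    names = list(profiles)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of genes")
    centroids = [list(profiles[n]) for n in names[:k]]  # simple deterministic start
    assignment = {}
    for _ in range(iterations):
        # Assign each gene to its nearest centroid (squared Euclidean distance)
        new_assignment = {}
        for n in names:
            distances = [sum((x - y) ** 2 for x, y in zip(profiles[n], c)) for c in centroids]
            new_assignment[n] = distances.index(min(distances))
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Move each centroid to the mean profile of its member genes
        for c in range(k):
            members = [profiles[n] for n in names if assignment[n] == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return assignment
```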

Interactively Link back to InnateDB to Look up Information on Particular Genes/Interactions of Interest.

Pathway Over-representation Analysis

Return Pathways Associated with Uploaded Gene List:



To do pathway over-representation analysis (ORA), you first need to upload a list of gene identifiers and the associated fold-change in gene expression values (and P values) as described above. InnateDB recommends that you upload all genes from your array dataset, not just the differentially expressed (DE) genes (probes mapping to multiple different genes should be removed). The pathway ORA tool uses the proportion of DE genes on the whole array to determine whether a particular pathway is significant.

As this method can be very conservative due to the large number of tests performed, InnateDB also provides the option of uploading a subset of genes and performing pathway ORA on it. This subset analysis uses a slightly different algorithm that does not take gene expression values into account; this is necessary because the algorithm does not know the proportion of DE genes on the array. Therefore, this analysis cannot handle data from multiple conditions. If you have multiple probes for the same gene, their values will be averaged for the purposes of the pathway ORA.

Because InnateDB sources its pathway data from multiple databases, each with its own interpretation of the components of a given pathway, you will observe some degree of duplication in the results; however, this is outweighed by the extra annotation that can be obtained from different data sources.
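The probe-handling rules above (drop probes that map to more than one gene, average multiple probes for the same gene) can be sketched as follows. The (probe_id, gene_ids, fold_change) tuple layout is a hypothetical input format, and only fold changes are averaged here for brevity.

```python
from collections import defaultdict


def collapse_probes(probe_rows):
    """Turn probe-level rows into one averaged fold change per gene.

    probe_rows: iterable of (probe_id, gene_ids, fold_change), where gene_ids lists the
    genes the probe maps to (hypothetical layout).
    """
    values = defaultdict(list)
    for probe_id, gene_ids, fold_change in probe_rows:
        if len(set(gene_ids)) != 1:
            continue  # ambiguous or unmapped probe: remove it
        values[next(iter(gene_ids))].append(fold_change)
    # Average multiple probes for the same gene
    return {gene: sum(v) / len(v) for gene, v in values.items()}
```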









Pathways Associated with Uploaded List

Click here to do pathway over-representation analysis.

Choose Parameters for Pathway ORA

Choose the fold-change in gene expression threshold (determines which genes are considered differentially expressed). Default = +/- 1.5.
Choose the P value threshold associated with each fold-change in gene expression value (determines which genes are considered differentially expressed). Default P < 0.05.
Several different statistical methods are available to determine whether pathways are significantly associated with DE genes: Hypergeometric, Fisher & Chi Square.
Two options to correct for multiple testing are included: the Benjamini & Hochberg correction for the false discovery rate (FDR) and the more conservative Bonferroni correction.
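To make these parameters concrete, the sketch below implements a default-style DE call, a one-sided hypergeometric test for pathway over-representation, and the Bonferroni and Benjamini & Hochberg adjustments, using only the Python standard library. It illustrates these standard methods and is not InnateDB's implementation; treating fold changes as signed values (e.g. -2.0 for two-fold down) and the threshold as inclusive (|FC| >= 1.5) are assumptions.

```python
from math import comb


def is_de(fold_change, p_value, fc_threshold=1.5, p_threshold=0.05):
    """DE call with the default thresholds: |fold change| >= 1.5 and P < 0.05."""
    return abs(fold_change) >= fc_threshold and p_value < p_threshold


def hypergeometric_p(k, n, K, N):
    """P(X >= k): at least k DE genes in a pathway of n genes, given K DE genes
    among the N genes on the array (one-sided over-representation test)."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def bonferroni(pvalues):
    """Bonferroni-adjusted P values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg FDR-adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a small worked case: with N = 4 genes on the array, K = 2 of them DE, and a pathway of n = 3 genes containing k = 2 DE genes, `hypergeometric_p(2, 3, 2, 4)` is (C(2,2)·C(2,1))/C(4,3) = 2/4 = 0.5. The one-sided Fisher's exact test for enrichment gives the same upper-tail probability.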

Pathways significantly associated with up-regulated genes.

Display Pathways significantly associated with downregulated genes.

Click here to see a summary of the pathway and what genes are DE.

Pathway Summary Page

KEGG pathway diagrams can be dynamically linked to, with gene expression data overlaid.

Click here to visualize pathway interactions in Cerebral with gene expression data overlaid.

Note: If a gene belongs in a pathway but has no interactions it will not be shown.

Acknowledgements – The Bioinformatics Team


Overall Project Management: Bob Hancock, Brett Finlay, Lorne Babiuk, Bernadette Mah
Bioinformatics & InnateDB Management: Fiona Brinkman, David Lynn
InnateDB Database Development/Data Loading: Matthew Laird, Nicolas Richard, Fiona Roche, Timothy Chan, Michael Acab

InnateDB Submission System & Curator tool: Calvin Chan, Naisha Shah
Cerebral – Pathway Visualization Software: Jennifer Gardy, Aaron Barsky, Tamara Munzner
Orthologs & Gene Order: Dan Tulpan, Matthew Whiteside, Mark Sun, Matthew Laird
Systems Administration: Matthew Laird (SFU), Timothy Chan (UBC)
Meta-analysis Team: David Lynn, Chris Fjell, Jennifer Gardy, Karsten Hokamp, Nicolas Richard, Avinash Chikatamarla.

InnateDB Search Engine & User Interface: Geoff Winsor

Manual Curation: David Lynn, Misbah Naseer, Jaimmie Que, Melissa Yau, Raymond Lo




				