GO 2010 REPLY to class exercise by H8tRHS8


GENE ONTOLOGY

The Gene Ontology (GO) project is a major bioinformatics initiative with the aim of
standardizing the representation of gene and gene product attributes across species and
databases. The project provides a controlled vocabulary of terms for describing gene product
characteristics and gene product annotation data from GO Consortium members. A large
number of tools have been developed to access and analyze these data.

In this class we will introduce you to the GO concept as presented by

Currently there are 30385 terms, 99.2% with definitions.

18932 biological_process
2734 cellular_component
8719 molecular_function

1. Why does each of the branches of GO have a very different number of terms?

This is related to the different levels of resolution in the ontology. The “cellular
component” branch refers to physical entities such as the mitochondrion or the nucleus. The
“biological process” branch is far more detailed, with terms such as “entry to mitochondrion”,
“negative regulation on entry to mitochondrion”, “positive regulation on entry to
mitochondrion” and so on.

As a result, the GO terms for cellular component form a shallow DAG (directed acyclic
graph), whereas the other branches are very deep (they may reach 15 levels).
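
To make the notion of depth concrete, here is a minimal Python sketch that measures the depth of each branch in a tiny DAG. The terms and parent links below are a made-up toy example (loosely based on the names mentioned above), not the real GO graph; depth is taken as the longest path from a term up to its root.

```python
from functools import lru_cache

# Toy DAG (hypothetical edges, NOT the real GO graph): term -> list of parent terms.
PARENTS = {
    "cellular_component": [],
    "organelle": ["cellular_component"],
    "mitochondrion": ["organelle"],
    "nucleus": ["organelle"],
    "biological_process": [],
    "transport": ["biological_process"],
    "regulation": ["biological_process"],
    "entry to mitochondrion": ["transport"],
    # A term may have more than one parent; this is what makes it a DAG, not a tree.
    "regulation of entry to mitochondrion": ["regulation", "entry to mitochondrion"],
    "negative regulation on entry to mitochondrion": ["regulation of entry to mitochondrion"],
}


@lru_cache(maxsize=None)
def depth(term):
    """Length of the longest path from `term` up to a root (a root has depth 0)."""
    parents = PARENTS[term]
    if not parents:
        return 0
    return 1 + max(depth(p) for p in parents)


def roots(term):
    """Set of root terms reachable from `term` by following parent links."""
    parents = PARENTS[term]
    if not parents:
        return {term}
    found = set()
    for p in parents:
        found |= roots(p)
    return found


def branch_depth(root):
    """Depth of the deepest term that belongs to the branch under `root`."""
    return max(depth(t) for t in PARENTS if root in roots(t))


for root in ("cellular_component", "biological_process"):
    print(root, branch_depth(root))
```

In this toy graph the cellular component branch is shallower than the biological process branch, which mirrors the explanation above on a very small scale.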

2. Let’s start… Go to AmiGO (from the list of tools in the right-hand section of the page)
and search for SNARE genes or proteins. Recall that SNAREs are a set of proteins that
function in vesicle trafficking in all organisms.

How many SNARE proteins did you identify? How many from mouse only (mus

122 results for snare in genes or proteins fields symbol, full name(s) and synonyms
Only 4 results for snare that is Exact (in genes or proteins fields symbol, full name(s) and

Mouse: 9 results for snare in genes or proteins

3. Did you find all SNARE-related proteins? Search for “snare” as a GO TERM (instead of
gene or protein). Can you explain why these lists are not identical?

27 results for snare in terms fields term accession, term name and synonyms

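An ontology term and a gene are not the same concept. A term search matches term
accessions, names and synonyms, whereas a gene search matches gene symbols, names and
synonyms. Some terms annotate tens of proteins and some terms annotate only one protein.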
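
The difference between the two searches can be illustrated with a small Python sketch. The gene records below (GeneA, GeneB, GeneC and their names) are invented for illustration and do not come from AmiGO; only the idea matters: a gene search looks at gene names, a term search looks at term names, and the genes behind a matching term need not mention the query in their own names.

```python
# Hypothetical toy records (NOT real AmiGO data): gene symbol -> name and annotated GO terms.
GENES = {
    "GeneA": {"name": "SNARE protein A", "terms": ["vesicle-mediated transport"]},
    "GeneB": {"name": "vesicle fusion factor B", "terms": ["SNARE binding"]},
    "GeneC": {"name": "SNARE-like protein C", "terms": []},
}


def search_genes(query):
    """Gene symbols whose symbol or full name contains the query (case-insensitive)."""
    q = query.lower()
    return sorted(
        symbol
        for symbol, record in GENES.items()
        if q in symbol.lower() or q in record["name"].lower()
    )


def search_terms(query):
    """GO term names that contain the query (case-insensitive)."""
    q = query.lower()
    names = {t for record in GENES.values() for t in record["terms"]}
    return sorted(t for t in names if q in t.lower())


def genes_for_term(term):
    """Gene symbols annotated with the given term."""
    return sorted(s for s, record in GENES.items() if term in record["terms"])


print(search_genes("snare"))            # genes matched by name
print(search_terms("snare"))            # terms matched by name
print(genes_for_term("SNARE binding"))  # genes reached through a matching term
```

Here the gene search and the term search return different gene sets, even though both start from the same query string.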

4. Select all ‘SNARE’ entries that you found in mouse by searching “genes or proteins” and
activate the “as input for enrichments” option. This activates one of many supported tools.

The tool reports the GO TERMS that are ENRICHED in the input list relative to a randomly
selected set of genes. For the calculation to be accurate, use MGI as the background.

When this is applied with the default parameters of GO and with MGI (the most complete
resource for mouse genes) as the background, the results are:

Database filter(s): MGI
The p-value cutoff was: 0.01
The minimum number of gene products: 2
There were 29 result(s) clearing the threshold values and 76 not.

What is the enrichment level of “vesicle-mediated transport”?

GO:0016192 vesicle-mediated transport
P value is 1.23e-07
The sample frequency is 6/9 (66.7%), and these genes are:
Use1 Gosr2 Napg Gosr1 Napa Napb

5. Use the graphical view and indicate the terms that are most significant in each of the
GO trees.
Graphics of the 3 branches

GO:0015031 (protein transport) has the most significant P value:
P value - 7.11e-14
All 9/9 input genes are annotated with it, but the strong statistic is a result of the
number of proteins that are annotated by this term. For example, for
GO:0006810 (transport) there are also 9/9 in the input list, but the P value is only

The reason is that ‘protein transport’ is more specific than ‘transport’: it is found in
706 proteins rather than in 2558 out of 34030 (all these numbers appear in the result table).
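
The effect of term specificity on the statistic can be sketched with a hypergeometric test, which is a common choice for this kind of enrichment analysis. Whether the tool uses exactly this calculation is an assumption here, and the function name enrichment_p is made up for this example. The raw probabilities below are not expected to equal the reported P values, since a tool may also apply a multiple-testing correction; the point is only the relative difference between the two terms.

```python
from math import comb


def enrichment_p(background_total, background_hits, sample_size, sample_hits):
    """Upper-tail hypergeometric probability P(X >= sample_hits).

    background_total: number of genes in the background (here 34030)
    background_hits:  background genes annotated with the term
    sample_size:      number of genes in the input list
    sample_hits:      input genes annotated with the term
    """
    upper = min(sample_size, background_hits)
    favourable = sum(
        comb(background_hits, i)
        * comb(background_total - background_hits, sample_size - i)
        for i in range(sample_hits, upper + 1)
    )
    return favourable / comb(background_total, sample_size)


# Numbers taken from the result table quoted above: 9/9 input genes in both cases.
p_protein_transport = enrichment_p(34030, 706, 9, 9)
p_transport = enrichment_p(34030, 2558, 9, 9)

print(f"protein transport: {p_protein_transport:.2e}")
print(f"transport:         {p_transport:.2e}")
```

With the same 9/9 sample frequency, the term that annotates fewer background proteins gives the much smaller probability, which is the reasoning given in the answer above.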

