A Gene3D Tutorial

(Tutorial running time: about 30-45 minutes for the complete tutorial, about 10 minutes for the reduced version)


This tutorial is designed to give you a preliminary introduction to Gene3D. At first glance
Gene3D can be a bit overwhelming: it is neither a simple domain database nor a simple
gene family database, and the website offers a wide range of features. This tutorial
should help you decide where to start and show you what Gene3D can do for you.

Background: the data in Gene3D.
Gene3D is built upon the BioMap sequence database. This sequence database consists
of UniProt (including the genome sets obtained from Integr8) and extra sequences from
various functional resources (including KEGG and GO).

These sequences are annotated using functional data from GO, COG and KEGG. We
scan the CATH domain database against the whole sequence database and add Pfam
domain family data for UniProt sequences. Also, where available, we add protein-protein
interaction data from BIND and MINT.

We have also clustered the sequences into whole-chain protein families. These families
should show good conservation of function and structural features.

The complete genomes that we present are obtained from Integr8 at the EBI. They are
mostly correct, but for some genomes (e.g. rat) you may want to check whether they
are actually complete.

Tutorial Structure.
This tutorial contains 10 brief lessons below. If you do these in the order presented, you
should gain a solid understanding of how to do nearly anything in Gene3D. If you're in a
rush or you just want to get started, stick to the 'Basic Query' lessons.

Querying Gene3D
Whilst you can begin your Gene3D investigation with almost any query term, for this
tutorial we are going to start with a single protein: VAV_HUMAN.

(1) A Basic Query I – A Single Protein.
* Type 'vav_human' into the topmost search box on the front page and press return to
get back a 'BioMap Protein View'.
* Scroll about the results page a bit and view the data.
* If the “mouse-over activated pop-up displays” aren't working too well in your browser,
don't worry. You can get this information elsewhere.
* All identifiers (e.g. UniProt codes) link back to the source database. Try one to see.
* You will also see blue search tags. These extract related information from Gene3D.

-> This is a 'default' search and returns some basic information on this protein. This
type of view is the 'Detailed Output' and lies at the heart of using Gene3D. As you will
now see, it is very flexible and feature-rich.

(2) Advanced Searching I – The Pink Search Box (PSB).
* Look at the pink search box to the left. As you can see it has been filled in with the
values of the search you just did.
* Click on Domains/Features. This is not a default option due to performance and HTML
* Press search again.
* Scroll down or click the 'Features' tag to view the domain architectures.

-> In general, you can control what information is displayed in the Detailed View. This
can help you get an uncluttered screen or speed up the query (slightly).

(3) Advanced Searching II – Protein Families.
* Looking at the results for VAV_HUMAN (P15498), you'll see that it has been assigned
a ten-part Gene3D protein family code ('G3DPF:'; for details see ...).
* To look at the whole family click on the first part of the code ('22568').

-> You can now see aggregate information for the family that includes VAV_HUMAN.

* Try viewing a sequence alignment.
* Scroll down to the bottom of the screen or click the 'Sequence' tag.
* Press the 'html' option and wait about 2 minutes. Press the back arrow on your browser
once you've had a look.
* You can select a subfamily from within a family by either (a) clicking on the
appropriate segment of the family code in the 'G3DPF' column or (b) entering the code to
the level you want in the PSB (e.g. 'G3DPF:22568.').
* You can also see all the family members with the same architecture by clicking the
'blue' search button next to the architecture you're interested in. Give it a try.

-> Important note: when entering the family code it MUST be preceded by the term 'G3DPF:'.
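If you script against family codes (for example, to filter a downloaded list of proteins), it helps to treat the code as a hierarchy rather than a plain string. The Python sketch below assumes the levels are separated by '.' (as the 'G3DPF:22568.' example suggests) and that a partial code selects every member whose code starts with the same levels; the function names and example codes are hypothetical, not part of Gene3D.

```python
# Sketch only: ASSUMES family codes are '.'-separated levels after the
# 'G3DPF:' prefix. Function names and example codes are hypothetical.
PREFIX = "G3DPF:"


def parse_family_code(code: str) -> list[str]:
    """Split a 'G3DPF:'-prefixed family code into its hierarchical parts."""
    code = code.strip()
    if not code.startswith(PREFIX):
        raise ValueError(f"family codes must start with {PREFIX!r}: {code!r}")
    # A trailing '.' (as in 'G3DPF:22568.') just marks a partial code.
    return code[len(PREFIX):].rstrip(".").split(".")


def in_subfamily(member_code: str, query_code: str) -> bool:
    """True if member_code lies under the (possibly partial) query_code."""
    member = parse_family_code(member_code)
    wanted = parse_family_code(query_code)
    # Compare whole levels, not raw string prefixes.
    return member[: len(wanted)] == wanted


if __name__ == "__main__":
    print(in_subfamily("G3DPF:22568.1.4", "G3DPF:22568."))   # True
    print(in_subfamily("G3DPF:225680.1.4", "G3DPF:22568."))  # False
```

Comparing whole levels rather than raw string prefixes stops a (hypothetical) family 225680 from being treated as a subfamily of 22568.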

(4) Advanced Searching III – By domain.
* In the PSB type 'G3DSA:', select all the data types and click search.

-> You can now see all the proteins and architectures in which this domain is found. You
can do this type of query with either Pfam accessions/IDs or CATH codes. CATH
codes must be preceded by 'G3DSA:' (for the same reason that our families must be
preceded by 'G3DPF:').

(5) Advanced Searching – By Functional Terms.
* By now I'm sure you've got the drift. Try a COG term (e.g. COG0515) on the front page.
Note this COG has a lot of domain architectures and so may take a while to load on

(6) A Basic Query II – Summary Views
* Type 'G3DPF:22568' into the PSB (pink search box).
* Tick the 'Summary Output' option.
* Set the drop down 'Cluster Level' menu to S90.
* Press 'Search'.

-> This returns a set of summary views for this family. The family has been split at the
S90, or 90% sequence identity level, for the purposes of generating the summary. By
varying the cluster level you can control the specificity of the summarised groups.
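To make the cluster level concrete, here is a minimal Python sketch of greedy, representative-based clustering at a sequence identity threshold (0.90 for S90). This is a generic technique chosen for illustration, not necessarily how Gene3D builds its S90 clusters, and it assumes the sequences are already aligned to equal length so that identity is simply the fraction of matching positions.

```python
# Sketch of greedy, representative-based clustering at an identity
# threshold. ASSUMES pre-aligned, equal-length sequences; this is an
# illustrative technique, not Gene3D's documented clustering method.


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("expected non-empty sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.90) -> list[list[str]]:
    """Put each sequence in the first cluster whose representative it matches
    at >= threshold identity; otherwise start a new cluster with it."""
    clusters: list[list[str]] = []
    reps: list[str] = []  # representative (first) sequence of each cluster
    for name, seq in seqs.items():
        for i, rep in enumerate(reps):
            if identity(seq, rep) >= threshold:
                clusters[i].append(name)
                break
        else:  # no representative was similar enough
            reps.append(seq)
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    toy = {  # hypothetical 10-residue sequences
        "a": "ACDEFGHIKL",
        "b": "ACDEFGHIKM",  # 9/10 identical to "a"
        "c": "MNPQRSTVWY",
    }
    print(greedy_cluster(toy, 0.90))  # [['a', 'b'], ['c']]
    print(greedy_cluster(toy, 0.95))  # [['a'], ['b'], ['c']]
```

Raising the threshold splits the family into more, tighter groups and lowering it merges them, which is the trade-off the 'Cluster Level' menu lets you control.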

(7) A Basic Query III – XML Views
* Type 'VAV_HUMAN' into the PSB.
* Select 'XML Output' and hit search.

-> You will now see the XML view of this protein. This view contains the complete
information from BioMap for the protein or set of proteins that you are querying. Use this
view if you have low bandwidth, wish to download the data or want some specific
information that is not being displayed by us.

-> XML views can be accessed directly from within the search pages by clicking the XML
button, or you can download the file directly by right-clicking on the button and
selecting 'Save link as'. This XML file will contain all information, not just what you
searched for or what is being displayed.
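Because the XML file carries everything, a useful first step after downloading it is to see which kinds of records it contains. The Python sketch below uses the standard-library xml.etree.ElementTree module and counts element tags without assuming anything about the Gene3D schema; the demo document is invented for illustration.

```python
# Schema-agnostic sketch: count element tags in a downloaded XML file.
# No Gene3D element names are assumed; the demo document is invented.
import xml.etree.ElementTree as ET
from collections import Counter


def summarise_xml(xml_text: str) -> Counter:
    """Return how many times each element tag occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())  # iter() includes the root


if __name__ == "__main__":
    demo = """<proteins>
      <protein id="P15498"><domain/><domain/></protein>
    </proteins>"""  # hypothetical structure, for illustration only
    print(summarise_xml(demo).most_common())
```

For a saved file, ET.parse(path).getroot() gives the same kind of root element. If the file declares XML namespaces, tags will appear in '{namespace}name' form.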

(8) A Basic Query IV – Taxon IDs.
* Type '9606' into the Taxon Query box and press return.
* Check out the Gene3D summary for human.

-> You can download two data files: (a) an XML file with all data for all proteins (in the
style described above in lesson 7) and (b) a domain feature file (“Feature list”) that has all the
domains assigned by CATH and Pfam for each protein.

-> You can also see all the genomes in Gene3D by clicking the “Browse Genomes” box
at the top of the site.

(9) A Basic Query V – BLASTing a sequence.
* Click the BLAST tab at the top.
* Enter a sequence in FASTA format in the main box. Try Q9XA16 from UniProt if you
don't have a sequence to hand.
* Select the max number of hits or the max E-value threshold (or leave them as they are; the
defaults should be fine).
* Press 'Search'.
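If your sequence is not already in FASTA format, it needs a single header line starting with '>' followed by the sequence itself. A minimal Python sketch (the 60-residue line width is a common convention, not a requirement stated by Gene3D, and the function name is hypothetical):

```python
# Sketch: wrap a raw protein sequence into a FASTA record. The header
# text and 60-character line width are choices, not Gene3D requirements.


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: '>' header line, then fixed-width sequence lines."""
    seq = "".join(sequence.split()).upper()  # remove spaces/newlines from a paste
    lines = [">" + header.strip()]
    lines += [seq[i : i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder residues; paste a real sequence (e.g. Q9XA16's) instead.
    print(to_fasta("my_query", "acdefghikl mnpqrstvwy", width=10), end="")
```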

-> If there is a perfect match you will be taken straight to the sequence.

* Now go back and delete a couple of letters from the protein sequence.
* Run the search again.
* Now you get a list of proteins. Pick the one you feel is most similar and investigate
(probably Q9XA16).
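The two BLAST settings and the 'perfect match' behaviour can be pictured as simple operations on a list of hits. The Python sketch below is a hypothetical model for illustration only: the Hit record, the function names and the example values are assumptions, and Gene3D's actual default cut-offs are not given in this tutorial.

```python
# Hypothetical model of the BLAST form's two cut-offs and the
# "perfect match" shortcut. Names and values are illustrative assumptions.
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    evalue: float


def filter_hits(hits: list[Hit], max_hits: int, max_evalue: float) -> list[Hit]:
    """Keep hits with E-value <= max_evalue, best first, at most max_hits."""
    kept = sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)
    return kept[:max_hits]


def is_exact_match(query: str, subject: str) -> bool:
    """True if the two sequences are identical, ignoring case and whitespace."""
    def clean(s: str) -> str:
        return "".join(s.split()).upper()
    return clean(query) == clean(subject)


if __name__ == "__main__":
    hits = [Hit("A", 1e-30), Hit("B", 5.0), Hit("C", 1e-5)]
    print(filter_hits(hits, max_hits=2, max_evalue=1.0))
    print(is_exact_match("ACDEFGHIKL", "ACDEGHIKL"))  # one letter deleted
```

Deleting letters from the query makes is_exact_match return False, which mirrors why the second search gives you a list of hits instead of taking you straight to the protein.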

(10) Advanced Searching IV – Batch searches
* Type 'Q9XA16, Q9XA19, VAV_HUMAN' into the PSB.
* Select 'Detailed Results' and press search.
* Have a look about and you can see the data for all three of these proteins.

-> List searches can be of any number of terms (though you probably don't want to go
crazy), but they MUST be of the same type. You can't mix domains with proteins and
COG terms ... yet.
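The same-type rule can be checked before you submit a batch. The Python sketch below classifies each comma-separated term by its prefix or format and rejects mixed lists. The patterns are simplified assumptions: the UniProt accession pattern follows UniProt's published format, the entry-name pattern (for names such as VAV_HUMAN) is approximate, Pfam IDs given as names rather than PF accessions are not covered, and Gene3D's own parser may behave differently.

```python
# Sketch: classify batch query terms and enforce "all the same type".
# ASSUMPTION: simplified patterns for illustration; Gene3D's parser may differ.
import re

PATTERNS = {
    "family": re.compile(r"^G3DPF:[\d.]+$"),
    "cath_domain": re.compile(r"^G3DSA:[\d.]+$"),
    "pfam": re.compile(r"^PF\d{5}$"),
    "cog": re.compile(r"^COG\d{4}$"),
    "go": re.compile(r"^GO:\d{7}$"),
    "taxon": re.compile(r"^\d+$"),
    # UniProt accession format, or an approximate entry-name pattern (VAV_HUMAN)
    "protein": re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
        r"|[A-Z0-9]{1,10}_[A-Z0-9]{1,5})$"
    ),
}


def classify(term: str) -> str:
    """Return the type of a single query term, or raise ValueError."""
    t = term.strip().upper()  # the search boxes accept e.g. 'vav_human'
    for kind, pattern in PATTERNS.items():
        if pattern.match(t):
            return kind
    raise ValueError(f"unrecognised query term: {term!r}")


def batch_type(query: str) -> str:
    """Split a comma-separated batch query and check all terms share one type."""
    terms = [t for t in query.split(",") if t.strip()]
    kinds = {classify(t) for t in terms}
    if len(kinds) != 1:
        raise ValueError(f"batch terms must all be the same type, got {sorted(kinds)}")
    return kinds.pop()


if __name__ == "__main__":
    print(batch_type("Q9XA16, Q9XA19, VAV_HUMAN"))  # protein
```

Both accessions (Q9XA16) and entry names (VAV_HUMAN) count as proteins here, so the lesson's example passes, while 'VAV_HUMAN, COG0515' is rejected.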

Hooray! If you've worked your way through all of this, you are probably a Gene3D expert.
Keep trying new things and playing around until you feel comfortable. Opening links in new
windows or tabs is useful for multi-layered querying.

You can copy out lists of interaction partners, paste them straight into the query box and
search for GO terms, thereby finding out which processes your protein's interaction
partners take part in. And that is just one of many useful things you can do.
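Lists copied from a results page often arrive one identifier per line. A small, hypothetical Python helper can turn such a paste into the comma-separated form used for batch searches, dropping blank entries and duplicates:

```python
# Hypothetical helper: normalise a pasted identifier list into the
# comma-separated form used for batch searches.
import re


def to_batch_query(pasted: str) -> str:
    """Split on whitespace, commas or semicolons; drop blanks and repeats."""
    terms = [t for t in re.split(r"[\s,;]+", pasted) if t]
    unique = list(dict.fromkeys(terms))  # removes duplicates, keeps first-seen order
    return ", ".join(unique)


if __name__ == "__main__":
    print(to_batch_query("Q9XA16\nQ9XA19\n\nVAV_HUMAN Q9XA16"))
```

The result still has to obey the same-type rule for batch searches.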

A quick note of warning: it's very easy to accidentally run an enormous query that will take
some time to process. Give it a chance; if it's taking too long send us an E-mail and we'll

If you've got any other problems or you want more, please get in touch and let us know.
E-mail: gene3d@biochem.ucl.ac.uk

Best wishes,
Gene3D - Corin Yeats
&& BioMap - Michael Maibaum
