

                                                                                          Created by Paul Yenerall, 2010




                           How to access gene records in NCBI
       The purpose of this tutorial is to teach students how to access different types of records in
       NCBI and to explain the important items these records display. NCBI contains multiple types of
       records, such as nucleotide sequences and protein sequences. This tutorial focuses on accessing
       nucleotide and protein sequences, but also briefly explains other features of NCBI.

       For questions, corrections, or help with this tutorial, please e-mail paulyenerall@gmail.com or
       lzhou1@pitt.edu

                          Accessing a nucleotide and/or protein sequence

1) Open up an internet browser, preferably Mozilla Firefox, and go to the URL:
http://www.ncbi.nlm.nih.gov/. A page such as below should be displayed:


2) For illustrative purposes, we will access the nucleotide sequence and protein sequence for the
human (Homo sapiens) gene BRCA1. First, type BRCA1 in the search bar located at the top of the page.
Next, click the dropdown box above the search bar. Because BRCA1 is a gene, select “Gene” from the
dropdown box and then click “Search”, as illustrated below:




The page below should be shown:


Note: If you do not specify the type of search (e.g., Gene, Genome, or Protein) from the dropdown
box, you will be taken to a general page that asks you what type of search you want to perform.
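
For readers who later want to script this step, the same kind of search can be written as a URL for NCBI's E-utilities ESearch service. The sketch below only builds the URL and does not contact NCBI. The function name search_url is hypothetical, and the database names "gene" and "protein" are assumed to correspond to the “Gene” and “Protein” entries of the dropdown box.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def search_url(term, db="gene"):
    """Build an ESearch URL; db plays the role of the dropdown box."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


# Searching the Gene database for BRCA1, as in step 2.
print(search_url("BRCA1"))
# The same term sent to a different database (assumed name: "protein").
print(search_url("BRCA1", db="protein"))
```

Opening the printed URL in a browser returns a list of matching record IDs rather than the record page shown in this tutorial.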

Before we continue, let us narrow our search. This isn’t always necessary, but a very broad search
can return more results than you need, and limits reduce that number. To set limits, click on
the “Limits” tab below the search bar, as illustrated below:




The following page should be displayed after clicking limits:


Take a moment to review this page and look over all the possible limits. Depending on how broad your
search is, you can combine several constraints to limit your results. For instance, if you are searching
for all genes on one chromosome, you can use the option “Limit by Chromosomal Region”, select the
species you want to search, and enter the nucleotide positions you want to search between. For this
tutorial, we will use some basic constraints relevant to our gene. Check off all the boxes illustrated in
the next picture:




Because we have the specific gene name (BRCA1), we select the field “Gene Name”. Because the
BRCA1 gene is heavily researched in humans, we select “Known” for RefSeq status: we only want the
known isoforms (those verified by people, as opposed to predicted isoforms, which are generated by
ab initio gene finders). For the Taxonomy (species classification) section, we want BRCA1 in humans
(Homo sapiens), as stated before, so we select this box as well.
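
The “Gene Name” and Taxonomy limits can also be typed straight into the search bar as field tags. The sketch below assembles such a query string. The helper name build_term is hypothetical, and it assumes the field tags [Gene Name] and [Organism]. The “Known” RefSeq status limit is left out because its exact query syntax is not given in this tutorial.

```python
def build_term(gene_name, organism):
    """Combine a gene name and a species into one fielded query."""
    # [Gene Name] restricts the first word to the gene-name field;
    # [Organism] restricts the second part to the species name.
    return f"{gene_name}[Gene Name] AND {organism}[Organism]"


term = build_term("BRCA1", "Homo sapiens")
print(term)  # BRCA1[Gene Name] AND Homo sapiens[Organism]
```

Pasting the printed text into the search bar with “Gene” selected should narrow the search in much the same way as checking the two boxes.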


Once you have selected all these limits, click the “Go” button to begin your search. Normally, a search
without specific limits returns a list of matching records to choose from, but because our limits were
specific we are taken directly to the record for BRCA1 in Homo sapiens. The following page should be
displayed:




The top of this page shows us some basic information about BRCA1 (the summary). Take a moment to
review this page. The most important items displayed at the top of this page are the “Official
Symbol” (the gene’s symbol), the “Official Full Name” (the full name of the gene), the “Lineage” (the
taxonomic groups this species belongs to), a summary of BRCA1’s functions (“Summary”), and other
names for BRCA1 (“Also known as”).




3) Before we view the nucleotide and protein sequences for BRCA1, we will spend a moment reviewing
other parts of this page, as it contains more crucial information about the gene BRCA1 beyond the
items listed in the Summary (such as the Full Name, Gene Type, and Lineage). Scroll down to the
portion of the page called “Genomic regions, transcripts, and products”, as shown below:




Each section highlighted is briefly explained below:

        1 – This part tells us which strand the gene is located on: the positive (forward) strand or
        the minus (reverse or negative) strand.
        2 – This part tells us the nucleotide positions at which the gene BRCA1 is located. In this
        case, BRCA1 lies between nucleotides 41196314 and 41277468 in the species Homo sapiens.
        3 – This part tells us how many protein isoforms there are, along with the name and type of
        each isoform. In this case, there are 11 different protein isoforms of BRCA1 in Homo sapiens.
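
As a quick check on the positions in item 2, the length of the region can be worked out from the two numbers. This is a minimal sketch that assumes both positions are counted as part of the gene (an assumption about the coordinate convention); the function name span_length is hypothetical.

```python
def span_length(start, end):
    """Number of nucleotides from start to end, counting both endpoints."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Positions quoted above for BRCA1 in Homo sapiens.
brca1_length = span_length(41196314, 41277468)
print(brca1_length)  # 81155
```

Under that assumption, the region spans a little over 81,000 nucleotides.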

Now, scroll down to the next section entitled “Genomic Context” as shown below. This area tells us
where our gene BRCA1 is located, as well as genes in close proximity to BRCA1.




Each section highlighted is briefly explained below:

        1 – This part tells us which chromosome our gene is located on (remember, eukaryotes
        generally have multiple linear chromosomes, whereas prokaryotes generally have one, but
        sometimes two, circular chromosomes).
        2 – These parts show genes close to the position of our gene and on the same strand as
        our gene (the minus or reverse strand). You can tell these genes are located on the minus
        (reverse) strand because the arrow points to the left.
        3 – This part shows genes close to the nucleotide position of our gene but on the positive
        (forward) strand. You can tell these genes are located on the positive strand because the
        arrow points to the right.
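
One practical consequence of a gene lying on the minus strand is that its sequence reads as the reverse complement of the positive-strand sequence over the same positions. The sketch below illustrates the operation. The six-letter input is a made-up example, not BRCA1 sequence, and the function name reverse_complement is hypothetical.

```python
# Pairing rules for DNA: A pairs with T, and C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the opposite strand, read in its own left-to-right direction."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


# Made-up positive-strand fragment, for illustration only.
print(reverse_complement("ATGCCG"))  # CGGCAT
```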

There is a lot of other relevant information listed on this page, but most of it is self-explanatory and
will not be highlighted in this tutorial. Note, however, that on this page you can find PubMed articles
related to this gene, GeneRIFs (the noted functions of the gene), genes that interact with this gene
(“Interactions”), phenotypes (observable characteristics or traits of an organism), genotypes,
pseudogenes, orthologs, and a description of all the isoforms (“NCBI reference sequences – RefSeq”).




4) Now, let’s obtain the nucleotide and protein sequence for BRCA1. Go back to the section entitled
“Genomic regions, transcripts, and products”. For this tutorial, we will view the nucleotide and protein
sequence for the first listed isoform, isoform 3. To obtain the nucleotide sequence for BRCA1 isoform 3,
click on the blue lettering labeled “NM_007297+2” and then from the pop-up box that appears, select
FASTA, as illustrated below:




The following page should be displayed after clicking on FASTA:




        1) The part labeled “>gi|63252874|ref|NM_007297.2| Homo sapiens breast cancer 1, early
        onset (BRCA1), transcript variant BRCA1-delta2-10, mRNA” is the header for this BRCA1
        record and does not contain any of the nucleotide sequence. It is used as an identifier and
        contains information about the gene, such as the gene’s full name. It should always be included.
        2) The part that contains “CTTAGCGGTAGCCCCTTGGTTTCCGTGGCAACG……” is the nucleotide
        sequence for BRCA1 isoform 3.
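
The two parts described above can be separated with a few lines of code. This is a minimal sketch that assumes the header follows the gi|number|ref|accession| description layout shown in item 1; headers in other layouts are rejected. The function names are hypothetical, and the sequence line holds only the opening fragment quoted in item 2, not the full record.

```python
def parse_fasta(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must begin with a '>' header line")
    # Everything after the header line is sequence, possibly wrapped.
    return lines[0][1:], "".join(lines[1:])


def parse_header(header):
    """Pull the GI number, accession and description out of the header."""
    fields = header.split("|", 4)
    if len(fields) != 5 or fields[0] != "gi":
        raise ValueError("header is not in the gi|...|ref|...| layout")
    return {"gi": fields[1], "accession": fields[3],
            "description": fields[4].strip()}


# Header from the tutorial; the sequence is only the quoted opening fragment.
record = """>gi|63252874|ref|NM_007297.2| Homo sapiens breast cancer 1, early onset (BRCA1), transcript variant BRCA1-delta2-10, mRNA
CTTAGCGGTAGCCCCTTGGTTTCCGTGGCAACG
"""

header, sequence = parse_fasta(record)
info = parse_header(header)
print(info["accession"])  # NM_007297.2
print(sequence)
```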




We now have the nucleotide sequence; next, let’s obtain the protein sequence for BRCA1 isoform 3. Click
the back arrow to return to the previous page. Make sure you’re still looking at the section labeled
“Genomic regions, transcripts, and products”. Click the red text on the right side of this section,
labeled “NP_009228+1 isoform 3”, then from the pop-up box that appears, select FASTA, as
illustrated below:






The following page should be displayed after clicking on the FASTA link:




        1) Just as before, the area containing “>gi|6552305|ref|NP_009228.1| breast cancer 1, early
        onset isoform BRCA1-delta2-10 [Homo sapiens]” is simply the header. It gives us a description
        of the protein BRCA1-delta2-10, not the protein sequence, but it should always be included.
        2) The part that contains “MNVEKAEFCNKSKQPGLARSQ…..” is the protein sequence for BRCA1
        isoform 3.
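
Both FASTA pages can also be requested directly through NCBI's E-utilities EFetch service, which is convenient when many sequences are needed. The sketch below builds the two URLs from the accession numbers used in this tutorial. The function names are hypothetical, and the database names "nuccore" (nucleotide) and "protein" are assumptions about where these records are held. Nothing is downloaded unless fetch_fasta is called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities EFetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(db, accession):
    """Build the URL that returns one record in FASTA format as plain text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(db, accession):
    """Download the record as text (requires an internet connection)."""
    with urlopen(fasta_url(db, accession)) as response:
        return response.read().decode("utf-8")


# Accession numbers from the tutorial: the mRNA and the protein for isoform 3.
print(fasta_url("nuccore", "NM_007297"))
print(fasta_url("protein", "NP_009228"))
```

Opening either printed URL in a browser should show a header line followed by the sequence, as on the FASTA pages above. The header and version number returned today may differ from the 2010 screenshots.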


5) Now that we have both the nucleotide and protein sequence, let’s save this page in case we wish to
access it quickly later. Note: this is optional, but in a large project, such as a gene annotation project, it
can be a huge time saver. To save a result, simply click the link “Save Search” to the right of the search
bar, as illustrated below:




On the right side of the page that comes up, click to create an account (if you don’t have one).
Follow the steps provided to create your account and then save the entry. If you already have an
account, log in (if you’re not already logged in) so you can save this entry. Saving entries such as this
allows for quick access later.

				