					Querying NCBI Entrez and Extracting Sequence Features

 Today's Topics:

   Review of NCBI Entrez database queries
   Bio::DB::GenBank methods
      Querying Entrez databases

   Bio::SeqFeature methods
      Parsing GenBank sequence files




 Review of BioPerl Objects

 Sequence File -> Bio::SeqIO stream -> Bio::Seq object

            Relationships between Objects

 GenBank Connection, Query -> Bio::DB::GenBank -> Bio::SeqIO stream
 Sequence File -> Bio::SeqIO stream
 Bio::SeqIO stream -> Bio::Seq object -> Bio::SeqFeature object

                     Bio::DB::GenBank
 We've used Bio::SeqIO->new() to connect to a sequence
  file and the SeqIO method next_seq() to retrieve sequence
  objects

 Another way to get a "stream" of sequences is to execute
  a GenBank query from a Perl script, and use the resulting
  Bio::SeqIO object

 The object hierarchy is:

 Bio::DB::GenBank has a method that returns a Bio::SeqIO object
 Bio::SeqIO has a method that returns a Bio::Seq object
 Bio::Seq has methods that return Bio::Seq objects, strings, or
  numbers
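
 The SeqIO -> Seq part of this hierarchy can be tried without a network
 connection. The following is a minimal sketch, assuming BioPerl is installed;
 the two FASTA records are made-up toy data, and the in-memory filehandle
 stands in for a real sequence file.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two toy FASTA records held in a string (made-up data for illustration).
my $fasta = ">seq1 first toy record\nATGCATGC\n"
          . ">seq2 second toy record\nGGGCCCTTTAAA\n";

# Open a read-only filehandle on the string and hand it to Bio::SeqIO,
# exactly as one would with a filehandle on a real sequence file.
open my $fh, '<', \$fasta or die "Cannot open in-memory filehandle: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @summary;
while (my $seq = $in->next_seq) {      # Bio::SeqIO returns Bio::Seq objects
    my $rc = $seq->revcom;             # a Bio::Seq method returning a Bio::Seq
    push @summary, join("\t",
        $seq->display_id,              # string
        $seq->length,                  # number
        $seq->seq,                     # string
        $rc->seq);
}
print "$_\n" for @summary;
```

 Each output line holds the ID, length, sequence, and reverse complement of one
 record, one example of each return type named above.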



                    Bio::DB::GenBank Example
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
use Bio::SeqIO;
# Initialize GenBank connection, set retrievaltype to 'tempfile' to avoid
# having all results loaded into memory
my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
# Output stream: write the downloaded records to Qry.gbk in GenBank format
my $outSeqIO = Bio::SeqIO->new(-file   => ">Qry.gbk",
                               -format => 'GenBank');
# Entrez query: [orgn] restricts the preceding term to the Organism field
my $query = 'saccharomyces cerevisiae[orgn] AND gene NOT patent';
my $qry = Bio::DB::Query::GenBank->new(-query => $query,
                                       -db    => 'nucleotide');
# Run the query; the result is a Bio::SeqIO stream of the matching records
my $inSeqIO = $gb->get_Stream_by_query($qry);
my $nseqs = 0;
while( defined (my $seqObj = $inSeqIO->next_seq )) {
  $nseqs++;
  $outSeqIO->write_seq($seqObj);
}
print STDERR "Downloaded $nseqs sequence records\n";
                  Info on Entrez Databases
 The NCBI Bookshelf contains the NCBI Help Manual, and within
   that is an Entrez Help document:
   http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpentrez&part=EntrezHelp
   This document describes the 40 Entrez databases, the
   Searching Options for use in forming queries, and the Indexed Fields
   (e.g. [orgn], [Publication Date]).
 Efetch documentation:
   http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html
 Entrez searches from the NCBI website can be saved in a "My
   NCBI" account:
   http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpmyncbi


            GenBank Sequence Features Review
In a GenBank file there are typically many attributes associated
with each sequence, such as Accession Number, Organism,
Keywords, Authors, Journal, and a List of Features.
Each Feature has a Key, a Location, and one or more Qualifier
Tag/Value Entries, e.g.:

     source          1..1191
                     /country="China: Guangzhou"
                     /mol_type="genomic DNA"
                     /organelle="mitochondrion"
                     /db_xref="taxon:10092"
                     /organism="Mus musculus domesticus"
     tRNA            <1..21
                     /product="tRNA-Glu"
     CDS             join(complement(11342..11776),complement(11844..12174))
                     /protein_id="AAQ08536.1"
                     /transl_table=2
                     /note="TAA stop codon is completed by the addition of 3' A
                     residues to the mRNA"
                        Bio::SeqFeature
Qualifier Tags can have multiple values, e.g.:

     CDS             complement(82..1143)
                     /db_xref="GI:109156643"
                     /db_xref="GeneID:4126916"

Bio::SeqFeature methods can extract Feature Keys and Qualifier
Tag/Value Entries.
The collection of features for a sequence can be returned as an
array of Feature objects by the Bio::Seq method get_SeqFeatures().
For each feature, the primary_tag() method returns the Feature
Key.
The get_all_tags() method returns an array of Tag names (plain
strings), and for each Tag name get_tag_values($tag) returns an
array of that Tag's Values.
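
To see how a multi-valued Tag comes back from these methods, the CDS feature
above can be built by hand. This is a minimal sketch, assuming BioPerl is
installed; constructing the feature with Bio::SeqFeature::Generic is an
illustration only, since features normally come from a parsed GenBank record.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hand-built stand-in for: CDS complement(82..1143) with two /db_xref values.
my $feat = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS',
    -start       => 82,
    -end         => 1143,
    -strand      => -1,          # complement(...) means the minus strand
);
$feat->add_tag_value('db_xref', 'GI:109156643');
$feat->add_tag_value('db_xref', 'GeneID:4126916');

my @lines;
for my $tag ($feat->get_all_tags) {                # tag names are plain strings
    for my $value ($feat->get_tag_values($tag)) {  # one entry per value
        push @lines, sprintf('%s %d..%d /%s="%s"',
            $feat->primary_tag, $feat->start, $feat->end, $tag, $value);
    }
}
print "$_\n" for @lines;
```

The inner loop is what handles a Tag with more than one Value: db_xref appears
once in get_all_tags() but yields two entries from get_tag_values().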
    Bio::SeqFeature Example: extracting feature info
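# Assumes $seqobj is a Bio::Seq object, e.g. one returned by next_seq
# get number of features in Seq object
my $fc = $seqobj->feature_count();
print "Number of features: $fc\n";
# Get all Features from the Seq object
my @f = $seqobj->get_SeqFeatures();
foreach my $feat (@f) {
   # Get the primary tag (the Feature Key)
   my $pt = $feat->primary_tag;
   # NOTE: standard GenBank records use the key "repeat_region";
   # adjust the string below to match the keys in your data
   if ($pt eq "repeat") {
       my @alltags = $feat->get_all_tags;
       # look for anything but "tandem" in alltags
       foreach my $tag (@alltags) {
          if ($tag ne "tandem") {
             # method calls are not interpolated inside double quotes,
             # so pass start and end as separate list items
             print "$pt $tag ", $feat->start, " ", $feat->end, "\n";
          }
       }
   } # end if "repeat"
} # end foreach feature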

   Bio::SeqFeature Example: extracting gene sequences
   # Assumes $seqobj is a Bio::Seq object and $outSeqIO is a
   # Bio::SeqIO output stream opened for writing
   # Get all of the Features from $seqobj
   my @f = $seqobj->get_SeqFeatures();
   foreach my $feat (@f) {
      # Get the primary tag
      my $pt = $feat->primary_tag;
      if ($pt eq "gene") {
         my @alltags = $feat->get_all_tags;
         # look for locus_tag in alltags
         foreach my $tag (@alltags) {
            if ($tag eq "locus_tag") {
               my @tv = $feat->get_tag_values($tag);
               # spliced_seq returns the feature's sequence, with the
               # parts of a join(...) location concatenated
               my $gsObj = $feat->spliced_seq;
               $gsObj->display_id($tv[0]);
               $gsObj->desc($gsObj->length . ' bases');
               $outSeqIO->write_seq($gsObj);
            }
         }
      } # end if "gene"
   } # end foreach feature
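
What spliced_seq does with a join(...) location can be checked on a toy
sequence. This is a minimal sketch, assuming BioPerl is installed; the 12-base
sequence, the two sub-locations, and the locus tag TOY_0001 are all invented
for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Toy parent sequence (made-up data).
my $seq = Bio::Seq->new(-seq        => 'AAACCCGGGTTT',
                        -display_id => 'toy',
                        -alphabet   => 'dna');

# Equivalent of the GenBank location join(1..3,7..9) on the plus strand.
my $loc = Bio::Location::Split->new();
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 3, -strand => 1));
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 7, -end => 9, -strand => 1));

my $gene = Bio::SeqFeature::Generic->new(
    -primary_tag => 'gene',
    -location    => $loc,
    -tag         => { locus_tag => 'TOY_0001' },   # hypothetical locus tag
);
$seq->add_SeqFeature($gene);   # attaches the parent sequence to the feature

# Same steps as the slide's loop body, applied to the one toy feature.
my ($locus) = $gene->get_tag_values('locus_tag');
my $spliced = $gene->spliced_seq;
$spliced->display_id($locus);
print $spliced->display_id, " ", $spliced->seq, "\n";
```

The spliced sequence contains only the bases covered by the two sub-locations,
in order, with the intervening bases left out.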
                 More help with BioPerl
See the BioPerl Tutorial for more discussion of
Bio::SeqFeature methods:
http://bioperl.org/wiki/BioPerl_Tutorial

The Feature-Annotation HOWTO explains SeqFeature and
Annotation objects:
http://bioperl.open-bio.org/wiki/HOWTO:Feature-Annotation

Jason Stajich is a major contributor to BioPerl:
http://jason.open-bio.org/Bioperl_Tutorials/
(especially the 2008 dirs)

