A Tour of the PSI-Nature
Structural Biology Knowledgebase
The PSI-Nature Structural Biology Knowledgebase (PSI SBKB) is designed to turn the products of the
Protein Structure Initiative into knowledge that is important for understanding living systems and
disease. This "one-stop shop" provides users with the available genetic, structural, functional and
experimental information about a particular protein of interest.
This walkthrough will introduce you to the features and search capabilities of the PSI SBKB.
Navigating the PSI SBKB
The PSI SBKB homepage makes many features available from one central place.
Features available on the PSI SBKB homepage
The central search box is the main entry point to find out more information about a protein. You can
search by protein or nucleotide sequence, by PDB ID (Protein Data Bank atomic 3D coordinates file ID), or
by text. These search types are described in the second half of this tutorial.
Central view: Structural Biology Update - a view of available research highlights this month.
Left Navigation Menu
Provides access to our scientific resources, the Structural Biology Update content, and information
about the Protein Structure Initiative.
Right navigation menu
E-alerts: subscribe to Nature's e-mail alert service or download RSS feeds, which list the monthly content
and weekly structure updates.
Functional Sleuth: shows all of the PSI structures solved by the large-scale centers that do not yet have
functional annotation information.
Propose Targets: a way for groups outside the PSI to submit targets and benefit from the PSI's high-
throughput structure determination pipeline.
Latest PSI Statistics: a count of the protein structures solved by the PSI efforts.
See Latest Structures: a list of the PSI structures released for public use, updated weekly. (Available also
as an RSS feed)
A closer look at features to browse
In Functional Sleuth, we present structures determined by the PSI efforts whose functions are still
unknown. Clicking on a structure in a gallery will perform a query of the Knowledgebase site to provide a
starting point to explore each structure.
The PSI Centers have developed high-throughput protein production and structure determination
pipelines and new experimental methods as solutions to a number of experimental bottlenecks. The
greater scientific community is invited to benefit from these efforts as well. Each PSI Center entertains
target nominations for structure determination, which are vetted for feasibility and consistency with the
overall PSI goals. Investigators can create an account and submit their target proposals using this feature
on the PSI SBKB, and a decision is usually received within one month. Proposals accepted for structure
determination must adhere to the PSI rules, most notably that structural data must be
deposited in the public database, the PDB, within 4 weeks of completion of the structure.
See Latest Structures
As a rule, the PSI centers must deposit their protein structures to the Protein Data Bank within 4 weeks
of their completion so that these structures can quickly reach the biological communities that use them
for clinical and basic studies. The "See Latest Structures" feature shows all structures released that
week. This list can also be delivered to your web browser in an RSS feed.
The Structural Biology Update page
The Structural Biology Update keeps readers current on advances made by the PSI and in the fields of
structural biology and structural genomics. It delivers editorials describing the latest research findings
and technical highlights, an Events Calendar, recent articles from Nature News, and a Research Library
categorized by experimental topic.
The Search box is also available on the right side of the page, so users can search for anything of
interest that they come across while browsing the articles and sites on the PSI SBKB.
Features available on the Structural Biology Update site
Editorials about recent protein structures and new techniques/methods are written each month by the
Nature Publishing Group, focusing on topics that could be of broader scientific interest to anyone.
Research highlights published in various Nature Journals that relate to proteins are also shared here.
Each month, the Featured PSI Molecule gives a detailed portrait of a biological molecule solved by the
PSI efforts. Using interactive illustrations and generalized explanations, each article describes the
features of these biologically significant targets for students of all ages.
The Research Library is a catalog of all PSI publications to date, in addition to recent structural results
and technological advances from the broader structural biology community. Updated monthly, this
specialized resource is organized by subject (see below) so that users can find papers on solutions to
problems throughout the protein pipeline, ranging from a new DNA vector for protein expression, to novel
NMR or X-ray structure determination methods, to new protein function prediction resources.
To present a broader view of the latest in science in general, we provide the latest news from the
Protein Structure Initiative, Nature News and other NPG publications. The monthly newsletter, "PSI in
the Spotlight", contains PSI and NIGMS news items such as funding announcements, press releases, new
(or changes to existing) policies, and conference reports.
Calendar of Events
An events calendar keeps the community in touch with upcoming conferences, events, and workshops
that promote a structural view of biology. We also invite the community to let us know about events
you would like to post here - write us at firstname.lastname@example.org.
Further Information about the Protein Structure Initiative
The PSI SBKB also contains site help and information regarding the PSI
program, its mission, and its policies, found in the left navigation
menu in the "About" links.
The "About this site" menu contains information about the SBKB - a
"getting started" tutorial and classroom exercises made by OpenHelix
and the SBKB group, contact information, site map, terms of use, and
references in case you wish to cite the SBKB.
The "About PSI" menu has information on the PSI's overall mission,
goals, and biomedical themes. It also shows active funding
opportunities to either become part of the PSI efforts, or to
collaborate with current consortia, with links to the NIH/NIGMS
announcements and notices.
The PSI Centers link gives information about each PSI center and its research projects.
Searching the PSI SBKB
The PSI-Nature SBKB can be searched by one-letter code protein sequence, nucleotide sequence, plain
text and Protein Data Bank identifier (PDB ID) code. The following section describes how to use these
The PSI SBKB consists of a main searchable database linked with modules (PSI resources) that provide
additional information about the query terms.
Searchable by sequence and PDB ID:
Experimental Data Tracking databases
TargetDB and PepcDB
Structures from the PDB
Annotations from external biological resources
Protein Model Portal - homology models
Materials Repository - DNA clones
Searchable by text:
Technology Portal - a repository of technical reports and methods provided by the PSI centers,
searchable by center and by experimental step.
Publications Portal - a list of all articles published by the PSI centers.
PSI Centers - search text from within the PSI centers websites
Next, we will discuss searching these features in detail.
Searching by Sequence or PDB ID
The PSI-Nature SBKB maintains a database of the sequences of PSI protein targets and the sequences of
all solved protein structures released by the Protein Data Bank. Sequence searches are performed using
the BLASTP program with an E-value cutoff of 10 for sequences of 50 amino acids (150 nucleotides) or
fewer, or an E-value cutoff of 0.001 for sequences of 51 amino acids or longer. To search for a
particular protein sequence, enter the one-letter amino acid sequence in the search form, select the by
Sequence radio button and press Search. Nucleotide sequence searches are also supported, using the
BLASTX program to determine possible reading frames and displaying closely matched protein
An example query is available by selecting the by Sequence radio button, pressing "example query", and
then pressing the Search button. These options are highlighted in the figure below.
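The program and cutoff rules described above can be summarized in a short Python sketch. It is illustrative only and is not the SBKB's own code: the function name blast_settings and the nucleotide-detection heuristic (a query made up solely of the letters A, C, G, T, U and N is treated as a nucleotide sequence) are assumptions made for this example, and a peptide composed only of those letters would be misclassified.

```python
# Letters assumed to indicate a nucleotide query (an assumption for this sketch).
NUCLEOTIDE_LETTERS = frozenset("ACGTUN")


def blast_settings(sequence):
    """Return (program, e_value_cutoff) following the rules in the text."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq:
        raise ValueError("empty sequence")
    if set(seq) <= NUCLEOTIDE_LETTERS:
        # Nucleotide query: BLASTX, short if 150 nucleotides or fewer.
        program, short_limit = "blastx", 150
    else:
        # Protein query: BLASTP, short if 50 amino acids or fewer.
        program, short_limit = "blastp", 50
    cutoff = 10.0 if len(seq) <= short_limit else 0.001
    return program, cutoff
```

For instance, a 60-residue protein query would be assigned BLASTP with the stricter cutoff of 0.001, while a 40-base nucleotide query would be assigned BLASTX with the permissive cutoff of 10.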
The PSI-Nature SBKB maintains a database of the identifier codes for all experimental structure entries
released by the Protein Data Bank. To search for a particular Protein Data Bank entry, enter the
structure's 4-character ID code in the search form, select the by PDB id radio button and press Search. An
example query (2BEI) is available on the site to explore these features.
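Before submitting a PDB ID query, it can help to check that the text has the right shape. The sketch below assumes the classic four-character PDB ID format (a digit from 1 to 9 followed by three letters or digits, as in 2BEI). The function name looks_like_pdb_id is hypothetical, and the check only tests the format; it does not confirm that an entry with that ID exists.

```python
import re

# Classic PDB IDs: one digit (1-9) followed by three alphanumeric characters.
_CLASSIC_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Return True if text has the shape of a classic 4-character PDB ID."""
    return _CLASSIC_PDB_ID.fullmatch(text.strip()) is not None
```

A query such as "2BEI" passes this check, whereas a protein name or a sequence fragment such as "ABCD" does not and is better submitted as a text or sequence search.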
Results of a Sequence or PDB ID Search
The results of sequence and PDB ID searches are first displayed as a summary of available records
relating to the input query. An example of a Results Summary is shown below.
To view query result details individually, select the DB REPORT tab at the top of the summary page.
From this summary, you can view the type of information you seek:
1. Structures - displays a list of experimental structures within the PDB. The structures tab will also show
all genetic, structural, and functional annotations attributed to a structure through a "notebook" view
2. Models - supplied by the Protein Model Portal (http://www.proteinmodelportal.org), displays
computational models related to the sequence
3. Targets - supplied by the experimental data tracking (EDT) database TargetDB
(http://targetdb.pdb.org), displays information on the experimental progress and status of targets
selected for structure determination. Target sequences will also have annotations, even in the absence
of a 3D structure.
4. Protocols - supplied by the EDT database PepcDB (http://pepcdb.pdb.org), displays status history,
stop conditions, reusable text protocols and contact information collected from the NIH PSI and other
structural genomics centers.
5. Materials - supplied by the PSI Materials Repository (http://psimr.asu.edu/), displays DNA clones
available for purchase.
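The relationship between the five report tabs and the resources behind them can be written down as a small lookup table. This is only a reader's summary of the list above, not part of the SBKB itself; the names REPORT_TABS and source_for are made up for this sketch, and the URLs are the ones quoted in this tutorial.

```python
# Each DB REPORT tab and the resource that supplies it, as listed in the text.
# None means the tutorial gives no URL for that resource in the list.
REPORT_TABS = {
    "Structures": ("Protein Data Bank (PDB)", None),
    "Models": ("Protein Model Portal", "http://www.proteinmodelportal.org"),
    "Targets": ("TargetDB", "http://targetdb.pdb.org"),
    "Protocols": ("PepcDB", "http://pepcdb.pdb.org"),
    "Materials": ("PSI Materials Repository", "http://psimr.asu.edu/"),
}


def source_for(tab):
    """Return (resource name, URL or None) for a tab name, ignoring case."""
    wanted = tab.strip().lower()
    for name, source in REPORT_TABS.items():
        if name.lower() == wanted:
            return source
    raise KeyError(f"unknown tab: {tab!r}")
```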
The Structures Tab
The Structures tab of the DB Report provides the essential details about any structures matching the
input query. If the query results for a sequence search are displayed, then the percent of sequence
identity (percent exact sequence similarity) with the input sequence is displayed for each matching
structure entry (I), as well as the E value (E).
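To make the identity value (I) concrete, the sketch below computes percent identity from a pair of already-aligned sequences by counting the columns in which both sequences carry the same residue and dividing by the alignment length, which is the convention BLAST reports use. The function name percent_identity and the toy alignment in the usage note are illustrative assumptions; the SBKB reports the value calculated by BLAST itself.

```python
def percent_identity(aligned_query, aligned_hit, gap="-"):
    """Percent of alignment columns where both sequences have the same residue."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for q, h in zip(aligned_query, aligned_hit)
        if q == h and q != gap  # a gap never counts as a match
    )
    return 100.0 * identical / len(aligned_query)
```

For the toy alignment of "MKT-AL" against "MKTQAV", four of the six columns are identical, so the function returns roughly 66.7.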
The Structures section presents:
a link to the RCSB PDB Structure Explorer Page,
a download option for the PDB format structure data file,
a thumbnail of the structure, which when clicked, will launch the interactive FirstGlance
molecular viewer application, and
a "post-it" with a list of possible annotation types, which when clicked, launches a rich
"notebook" view of all annotations connected to this structure (described later).
Other reference information includes:
PubMed and DOI for the primary citation (when available),
Title of the deposited structure (may not be the same as the related publication),
Structure entry deposition and release dates, and
Experimental method used to obtain the model.
If the structure was solved by a PSI project then this information is provided along with the associated
PSI Target identifier. There is also a glossary of terms available in the upper right hand corner which
defines these headings. A glossary is present for each tab.
To view the other reports, click on their tab headings (Models, Protocols, etc.).
The Annotations Notebook
Each protein target and protein structure has many biological descriptions, or annotations, attached to
it. The SBKB assembles the annotations from over 150 PSI and other genomic, structural, functional,
and evolutionary resources to provide you with most of the information available today about that
protein sequence. These annotations are organized into a "notebook", classified by scientific topic:
gene-level view, protein-level view, structural view, biological functions, cellular localization,
biochemical pathways, medicinal relationships and references.
First, you can quickly get a sense of how many annotations exist through the "quick table". By hovering
the mouse over a hotlinked chain ID, a quick table will appear showing you if annotations exist for ~35
popular resources. Every database that contains an annotation will be highlighted in green, and clicking
on the resource name will take you directly to that record in the main "notebook" view.
The full list of annotations is available in the Notebook view. In the figure of a typical protein-level
annotation notebook page below, links are provided to the databases UniProtKB (comprehensive
protein database), Pfam (a protein family and motifs database), InterPro (protein family assignment),
and Gene3D (predictive structural annotation).
From this view the user can see which annotation databases have data relating to the sequence, and can
go directly to the record by following the link.
The Glossary of Terms, available in the top-right corner, defines these headings; in this case, the glossary
describes what kind of information each linked database provides.
The Models Tab
Computational Models associated with a query sequence or structure are shown in this section.
In the case of a sequence query, the number of models that have been predicted for this sequence is
presented along with a link to the details for each model. In the case of a PDB ID query, the number of
computational models that are based on information from this experimental structure is presented.
All of these results are obtained by a remote query to the PSI Protein Model Portal, which collects and
maintains this information. In the example below, there are four models from three modeling databases
available. To explore, follow the "view" link to go to the PSI Protein Model Portal.
Example: using the same sequence search example,
Step 1: Once you see the results of your search, follow the "view" link.
Step 2: Explore the available pre-computed models. Included here is a graphical explanation of how the
similar sequences, structures, and models relate to each other, along with domain information in grey.
The list of protein IDs from UniProt that relate to the sequence is also shown, followed by the list of
models themselves, each with a traffic light icon that gives a pictorial clue to model reliability.
Part of a full Model report from the Protein Model Portal is as follows:
The Sequence Summary:
red: your query
blue: the model you are viewing.
this model consists of residues 27-357 of your query sequence.
Reports what protein domains are recognized in your query sequence, with a link to InterPro for further
information. In this example, the model is of the GDPD domain of the protein.
The computational model is presented, with information related to its creation. You can also display an
interactive view of the model and download its coordinates for further evaluation.
Protein structure models are computational predictions which may contain errors. Based on the
sequence identity to the template, a model is assigned to one of three categories of modeling
complexity (see PMP for more details).
The target-template alignments provided on the model info pages are generated dynamically by
structural superposition of model and template structures using the program MAMMOTH.
The Targets Tab
Information about matching protein targets is shown in the Targets tab of the DB report.
The information provides the user with a status summary of the work performed on the target already.
Information in this summary includes:
the TargetID, with a link to the record in TargetDB
the protein sequence alignment between your query sequence and similar sequences found in
reported target status
and PSI Target Category
The annotations "post-it", quick table, and notebook views described in the Structures section are also
available, as well as a Glossary of Terms in the top right corner that defines these headings.
You can read the full record by clicking on the TargetID in the report (e.g., GO.74365).
The full Targets report from TargetDB is as follows:
General information, such as when the latest update occurred, the responsible center, status
information, source organism and target sequence.
If the target's experimental structure was successfully determined, a link to the RCSB PDB Structure
Explorer page is also given.
Links to domain annotation and function prediction databases are provided, along with calculated
biochemical and biophysical parameters for the sequence.
The Protocols Tab
The Protocols section provides links to the Protein Expression Purification and Crystallization Database
(PepcDB).
The information provided in this tab expands upon the information listed in the Targets tab by providing
links to the experimental protocols. Information in this summary includes:
the TargetID, with a link to the record in PepcDB
the protein sequence alignment between your query sequence and similar sequences found in
links to the protocols used at each step of protein production and structure determination
Each experimental step is a link to a detailed protocol used by the structural determination center.
These protocols can suggest an experimental strategy that shortens the time needed to obtain protein
samples for further research.
A Glossary of Terms is available in the top right corner that defines these headings.
You can read the full report by clicking on the TargetID, or you can read the individual protocols used
during the production of this protein by clicking on the experimental step (e.g., expression).
The full Protocols report from PepcDB is as follows:
General information, such as the TargetID, responsible center, and UniProt entry name.
Other useful information includes the CloneID, and a link to purchase the target DNA clone, available
through the PSI Materials Repository. Then, it provides derived protein information that may elucidate
structure and function, as in the Targets tab.
The novel feature is the experimental summary of this target - number of trials attempted, how far the
trial progressed (and if work was stopped), as well as the protocols used during the protein production
process. Since the search query can begin from a protein sequence of interest, this database will show
which protocols were successful (or unsuccessful) on similar sequences. In this way, PepcDB can be used
as a tool for experimental design.
The Materials Tab
The Materials tab provides information about the availability of relevant target DNA clone materials at
the PSI Materials Repository (PSI MR). The PSI MR is a resource that provides an on-line searchable
database of archived PSI genetic materials, transfer, storage and maintenance of PSI plasmids in a highly
quality-controlled manner at centralized on-site and off-site locations, and the facilities to distribute PSI
plasmids and supporting information for research purposes within the U.S. and abroad.
From our initial search example, the PSI MR has 7 similar target clones available to order.
The information provided in this tab:
the TargetID, with a link to the record in TargetDB
A link to order the clone
A link to a detailed record about the target's DNA sequence (DNA insert).
A link to information about the DNA vector in which the target sequence resides.
Selecting one of the last three links will transfer you to the PSI-MR- DNASU website at
To see further information about this DNA clone and the vector, including antibiotic resistance for
positive selection, click on the Clone Details link. An example of a record is shown below.
Searching the PSI SBKB using plain text
The PSI-Nature SBKB maintains a 'plain text' index of all content in web pages and documents at the PSI
Center websites, the PSI Technology and Publications Portals, and the Annotations Module.
To search the PSI-Nature SBKB by plain text, enter the appropriate words in the search form, select
the by Text radio button and press Search. An example query (the word "membrane") is available by
selecting the by Text radio button, selecting the example query link, and pressing the Search button.
The results of the text search are presented as a list of pages containing the input search term (e.g.
membrane) as shown below.
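The idea of a plain-text index can be illustrated with a minimal inverted index, which maps each word to the set of pages that contain it. This is a conceptual sketch only: the tutorial does not describe how the SBKB index is built, and the function names build_index and search, as well as any page names and texts you pass in, are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, term):
    """Return the sorted names of pages containing the term (case-insensitive)."""
    return sorted(index.get(term.lower(), set()))
```

Given a dictionary of page names and their text, build_index is run once, and search then returns every page that mentions the query word, much as the text search lists every page containing "membrane".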
In the "Gateway" search, all instances of "membrane" that occur on the SBKB site are found, including 6
highlights written for the SBKB that discuss membranes and membrane proteins.
Clicking on the Structural Publication tab will show all structural articles that contain the query term; in
this case, all structural publications that contain the term membrane.
These records include links to protein structures that contain the search term as well. The PubMed
identifier, DOI number, and PubMed Central links to the article are provided when available, and by
selecting the "Read More" link, the full citation and abstract of the article will appear.
Clicking on the Methods tab will show all PSI-published articles and reports containing the search term
that focus on methodology. By selecting the "Read More" link, the full citation will be shown. In this
way, you can search for new methods developed by the PSI efforts to help your own research.
Lastly, explore the site on your own.
This tutorial has walked through the features of the PSI SBKB that you can use in your own
research. With this "one-stop shop", you can find various sorts of assistance, from structural and
annotation information about your protein, to reports and protocols about how to obtain it.
If you have any questions or comments, or would like to suggest future features for the PSI SBKB, please
contact us at email@example.com.