UniProt

Eric Jain
Swiss Institute of Bioinformatics, Geneva
W3C Workshop on Semantic Web for Life Sciences, October 2004
What is it?
ID   ATPB_CANFA       STANDARD;    PRT;     19 AA.
AC   P99504;
DT   15-JUL-1998 (Rel. 36, Created)
DT   15-JUL-1998 (Rel. 36, Last sequence update)
DT   05-JUL-2004 (Rel. 44, Last annotation update)
DE   ATP synthase beta chain, mitochondrial (EC 3.6.3.14) (Fragment).
GN   Name=ATP5B;
OS   Canis familiaris (Dog).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Carnivora; Fissipedia; Canidae; Canis.
OX   NCBI_TaxID=9615;
RN   [1]
RP   SEQUENCE.
RC   TISSUE=Heart;
RX   MEDLINE=98163340; PubMed=9504812;
RA   Dunn M.J., Corbett J.M., Wheeler C.H.;
RT   "HSC-2DPAGE and the two-dimensional gel electrophoresis database of
RT   dog heart proteins.";
RL   Electrophoresis 18:2795-2802(1997).
CC   -!- FUNCTION: Produces ATP from ADP in the presence of a proton
CC       gradient across the membrane. The beta chain is the catalytic
CC       subunit.
CC   -!- CATALYTIC ACTIVITY: ATP + H(2)O + H(+)(In) = ADP + phosphate +
CC       H(+)(Out).
CC   -!- SUBUNIT: F-type ATPases have 2 components, CF(1) - the catalytic
CC       core - and CF(0) - the membrane proton channel. CF(1) has five
CC       subunits: alpha(3), beta(3), gamma(1), delta(1), epsilon(1). CF(0)
CC       has three main subunits: a, b and c.
CC   -!- SUBCELLULAR LOCATION: Mitochondrial.
CC   -!- SIMILARITY: Belongs to the ATPase alpha/beta chains family.
DR   HSC-2DPAGE; P99504; DOG.
DR   InterPro; IPR000194; ATPase_a/bcentre.
DR   PROSITE; PS00152; ATPASE_ALPHA_BETA; PARTIAL.
KW   ATP synthesis; ATP-binding; CF(1); Direct protein sequencing;
KW   Hydrogen ion transport; Hydrolase; Mitochondrion.
FT   UNSURE         8       8
FT   UNSURE        17      19
FT   NON_TER       19      19
SQ   SEQUENCE    19 AA; 1871 MW; BB9C163FDC60BB42 CRC64;
     ATQTSPSPKG AAAXXXRVV
//
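The flat file is line-oriented. Each line starts with a two-letter code that says what the line holds. Below is a minimal Perl sketch that reads one entry into a hash keyed by line code. It assumes exactly the column layout shown above: the code sits in columns 1-2, the data starts at column 6, sequence lines have a blank code, and `//` ends the entry. `parse_entry` and `sequence_of` are hypothetical helpers, not part of any ExPASy module. The sketch also ignores the internal structure of multi-line fields such as CC blocks.

```perl
use strict;
use warnings;

# Hypothetical helper: group the lines of one flat-file entry by line code.
# Assumed layout: code in columns 1-2, data from column 6, blank code on
# sequence lines, "//" as the entry terminator.
sub parse_entry {
    my ($text) = @_;
    my %fields;
    for my $line (split /\n/, $text) {
        last if $line =~ m{^//};
        my $code = substr($line, 0, 2);
        my $data = length($line) > 5 ? substr($line, 5) : '';
        $code = 'SEQ' if $code eq '  ';    # sequence lines carry no code
        push @{ $fields{$code} }, $data;
    }
    return \%fields;
}

# Hypothetical helper: join the sequence lines and drop the spacing blanks.
sub sequence_of {
    my ($fields) = @_;
    my $seq = join '', @{ $fields->{SEQ} || [] };
    $seq =~ s/\s+//g;
    return $seq;
}

my $entry = <<'END';
ID   ATPB_CANFA       STANDARD;    PRT;     19 AA.
AC   P99504;
DE   ATP synthase beta chain, mitochondrial (EC 3.6.3.14) (Fragment).
OX   NCBI_TaxID=9615;
SQ   SEQUENCE    19 AA; 1871 MW; BB9C163FDC60BB42 CRC64;
     ATQTSPSPKG AAAXXXRVV
//
END

my $fields = parse_entry($entry);
my ($accession) = $fields->{AC}[0] =~ /^(\w+);/;
print "Accession: $accession\n";
print "Sequence:  ", sequence_of($fields), "\n";
```

Keeping every line under its code preserves the record as-is. Interpreting individual fields, such as splitting OC lines into a lineage, is left to the caller.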
What have we done so far?
[DIR]   Parent Directory          19-Jul-2004   13:02       -
[   ]   cellular-components.rdf   11-Oct-2004   19:15      5k
[   ]   databases.rdf             11-Oct-2004   19:15     45k
[   ]   databases.rdf.gz          13-Sep-2004   11:34      6k
[   ]   datasets.rdf              19-Oct-2004   16:32      4k
[   ]   enzymes.rdf.gz            11-Oct-2004   19:15    309k
[   ]   go.rdf.gz                 11-Oct-2004   19:15    839k
[   ]   intact.rdf.gz             11-Oct-2004   19:15    636k
[   ]   keywords.rdf.gz           11-Oct-2004   19:15     96k
[   ]   ontology.owl              19-Oct-2004   18:27     77k
[   ]   taxonomy.rdf.gz           11-Oct-2004   19:15    4.0M
[   ]   uniparc.rdf.gz            13-Oct-2004   10:54    762M
[   ]   uniprot.rdf.gz            11-Oct-2004   19:39    768M
[   ]   uniref.rdf.gz             01-Oct-2004   12:56   52.2M
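
use strict;
use warnings;
use Expasy::RDF;

# Read the RDF file and walk through the proteins it describes.
my $parser = Expasy::RDF::Parser->new('P12345.rdf');

while (my $protein = $parser->next)
{
   my $id   = $protein->id;
   my $mass = $protein->sequence->mass;
   print "Mass of $id is $mass.\n";

   # One line per annotation: its type followed by its text.
   print $_->type, ': ', $_->comment, "\n"
      foreach $protein->annotation;
}

$parser->close;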
Issues
XML Syntax

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="urn:lsid:uniprot.org:ontology:"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
>

  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
    <rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
    <mnemonic>HUMAN</mnemonic>
    <scientificName>Homo sapiens</scientificName>
    <commonName>Human</commonName>
    <rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
  </rdf:Description>

</rdf:RDF>
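For comparison, the same five statements can be written as N-Triples, with one triple per line and the namespace prefixes expanded by hand. The Perl sketch below is only an illustration. `ntriples` and `literal` are hypothetical helpers, and the literal escaping handles only backslash, double quote, newline and carriage return.

```perl
use strict;
use warnings;

# Namespaces expanded from the declarations in the RDF/XML example.
my $rdf   = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
my $rdfs  = 'http://www.w3.org/2000/01/rdf-schema#';
my $onto  = 'urn:lsid:uniprot.org:ontology:';
my $taxon = 'urn:lsid:uniprot.org:taxonomy:';

# Each triple is [subject, predicate, object, object_is_literal].
my @triples = (
    [ "${taxon}9606", "${rdf}type",            "${onto}Taxon", 0 ],
    [ "${taxon}9606", "${onto}mnemonic",       'HUMAN',        1 ],
    [ "${taxon}9606", "${onto}scientificName", 'Homo sapiens', 1 ],
    [ "${taxon}9606", "${onto}commonName",     'Human',        1 ],
    [ "${taxon}9606", "${rdfs}subClassOf",     "${taxon}9605", 0 ],
);

# Hypothetical helper: quote a literal, escaping a minimal set of characters.
sub literal {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;
    $s =~ s/"/\\"/g;
    $s =~ s/\n/\\n/g;
    $s =~ s/\r/\\r/g;
    return qq{"$s"};
}

# Hypothetical helper: serialize triples as N-Triples, one per line.
sub ntriples {
    my @lines;
    for my $t (@_) {
        my ($s, $p, $o, $is_literal) = @$t;
        my $obj = $is_literal ? literal($o) : "<$o>";
        push @lines, "<$s> <$p> $obj .";
    }
    return join("\n", @lines) . "\n";
}

print ntriples(@triples);
```

Every line stands alone, so the output can be streamed, split, sorted or concatenated with ordinary line-based tools. RDF/XML does not allow that.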
Triples, Quads and Quints

● What is the source of a triple?

● Compact reification.
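Standard RDF reification answers "what is the source of a triple?" by describing the triple through a statement node. That costs four triples (rdf:type, rdf:subject, rdf:predicate, rdf:object) plus one more triple for the source. A quad simply carries the source as a fourth field. The Perl sketch below contrasts the two approaches. Two things in it are assumptions for illustration, not what UniProt does: Dublin Core `dc:source` as the provenance property, and the `http://example.org/...` source URI.

```perl
use strict;
use warnings;

my $rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
# Assumption: dc:source records provenance; the slides name no property.
my $dc_source = 'http://purl.org/dc/elements/1.1/source';

# Standard reification: one fact described by a statement node, plus the
# source attached to that node, yields five triples.
sub reify {
    my ($triple, $node, $source) = @_;
    my ($s, $p, $o) = @$triple;
    return (
        [ $node, "${rdf}type",      "${rdf}Statement" ],
        [ $node, "${rdf}subject",   $s ],
        [ $node, "${rdf}predicate", $p ],
        [ $node, "${rdf}object",    $o ],
        [ $node, $dc_source,        $source ],
    );
}

# Quad alternative: append the source as a fourth field of the record.
sub as_quad {
    my ($triple, $source) = @_;
    return [ @$triple, $source ];
}

my $fact = [
    'urn:lsid:uniprot.org:taxonomy:9606',
    'http://www.w3.org/2000/01/rdf-schema#subClassOf',
    'urn:lsid:uniprot.org:taxonomy:9605',
];
my $source = 'http://example.org/sources/taxonomy';   # hypothetical source URI

my @reified = reify($fact, '_:stmt1', $source);
my $quad    = as_quad($fact, $source);
printf "reified: %d triples; quad: 1 record with %d fields\n",
    scalar(@reified), scalar(@$quad);
```

The fivefold blow-up per fact is the practical argument for a more compact form when every statement in a large dataset needs a source.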
Web Services

● Overkill for providing programmatic access to resources.

● Often impractical for performance reasons.
Life Science Identifiers

● Need a special resolver.

● Resolution tied to retrieval.

● Explicit version numbers.

● Not widely used.
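An LSID has the form `urn:lsid:authority:namespace:object`, with an optional `:revision` suffix that carries the explicit version number. Taking one apart is easy. Turning the authority into an actual data service is the step that needs the special resolver. Below is a minimal Perl sketch of the parsing step only. `parse_lsid` is a hypothetical helper, and its regex assumes that no part contains a colon.

```perl
use strict;
use warnings;

# Hypothetical helper: split urn:lsid:authority:namespace:object[:revision].
# Resolution (finding a service for the authority) is not attempted here.
sub parse_lsid {
    my ($lsid) = @_;
    my ($authority, $namespace, $object, $revision) =
        $lsid =~ /^urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?$/i
        or return;
    return {
        authority => $authority,
        namespace => $namespace,
        object    => $object,
        revision  => $revision,    # undef when no version is given
    };
}

my $taxon = parse_lsid('urn:lsid:uniprot.org:taxonomy:9606');
print "$taxon->{authority} $taxon->{namespace} $taxon->{object}\n";
```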
Embedded References

uniprot.rdf

<rdf:Description rdf:about="#_2F9A">
<rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Caution_Annotation"/>
<rdfs:comment>In mouse, 5 genes homologous to human CD209/DC-SIGN and
CD209L/DC-SIGNR have been identified. Mouse CD209A product was named
DC-SIGN by {citation 1} because of its similar expression pattern and
chromosomal location in juxtaposition to CD23, but despite of the low
sequence similarity.</rdfs:comment>
<citation rdf:resource="#_2F8A"/>
</rdf:Description>


cyc.rdf

<owl:Class rdf:ID="Antigen">
<rdfs:comment>The collection of substances that can stimulate immune
response. For example, bacteria [#$Bacterium], #$Viruses, proteins
[#$ProteinMolecule] can serve as #$Antigens.</rdfs:comment>
</owl:Class>
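In both files the cross-reference sits inside a string literal, so an RDF parser sees only text. Recovering the link means writing a separate parser for each convention. The Perl sketch below handles the two patterns visible in these examples, `{citation N}` and `#$Name`. `embedded_refs` is a hypothetical helper and not part of either dataset's tooling.

```perl
use strict;
use warnings;

# Hypothetical helper: extract references embedded in comment text.
# The patterns are read off the two examples above and are not a spec.
sub embedded_refs {
    my ($comment) = @_;
    my @refs;
    for my $n ($comment =~ /\{citation (\d+)\}/g) {
        push @refs, "citation $n";         # uniprot.rdf style
    }
    push @refs, $comment =~ /#\$(\w+)/g;   # cyc.rdf style
    return @refs;
}

my $uniprot = 'Mouse CD209A product was named DC-SIGN by {citation 1} '
            . 'because of its similar expression pattern';
my $cyc = 'For example, bacteria [#$Bacterium], #$Viruses, proteins '
        . '[#$ProteinMolecule] can serve as #$Antigens.';

print join(', ', embedded_refs($uniprot)), "\n";
print join(', ', embedded_refs($cyc)), "\n";
```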
Summary
People will adopt the technology if it provides immediate benefits and is simple to use.
Credits
<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
>

<foaf:Project>
    <foaf:name>UniProt</foaf:name>
    <foaf:homepage rdf:resource="http://uniprot.org/"/>
    <foaf:fundedBy>
        <foaf:Organization>
            <foaf:name>National Institutes of Health</foaf:name>
            <foaf:homepage rdf:resource="http://www.nih.gov/"/>
        </foaf:Organization>
    </foaf:fundedBy>
</foaf:Project>

<foaf:Organization>
    <foaf:name>Swiss Institute of Bioinformatics</foaf:name>
    <foaf:nick>SIB</foaf:nick>
    <foaf:homepage rdf:resource="http://www.isb-sib.ch/"/>
</foaf:Organization>

<foaf:Organization>
    <foaf:name>European Bioinformatics Institute</foaf:name>
    <foaf:nick>EBI</foaf:nick>
    <foaf:homepage rdf:resource="http://www.ebi.ac.uk/"/>
</foaf:Organization>

<foaf:Organization>
    <foaf:name>Georgetown University</foaf:name>
    <foaf:homepage rdf:resource="http://www.georgetown.edu/"/>
</foaf:Organization>

...