UniProt

Eric Jain
Swiss Institute of Bioinformatics, Geneva
W3C Workshop on Semantic Web for Life Sciences, October 2004
What is it?
ID   ATPB_CANFA       STANDARD;    PRT;     19 AA.
AC   P99504;
DT   15-JUL-1998 (Rel. 36, Created)
DT   15-JUL-1998 (Rel. 36, Last sequence update)
DT   05-JUL-2004 (Rel. 44, Last annotation update)
DE   ATP synthase beta chain, mitochondrial (EC 3.6.3.14) (Fragment).
GN   Name=ATP5B;
OS   Canis familiaris (Dog).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Carnivora; Fissipedia; Canidae; Canis.
OX   NCBI_TaxID=9615;
RN   [1]
RP   SEQUENCE.
RC   TISSUE=Heart;
RX   MEDLINE=98163340; PubMed=9504812;
RA   Dunn M.J., Corbett J.M., Wheeler C.H.;
RT   "HSC-2DPAGE and the two-dimensional gel electrophoresis database of
RT   dog heart proteins.";
RL   Electrophoresis 18:2795-2802(1997).
CC   -!- FUNCTION: Produces ATP from ADP in the presence of a proton
CC       gradient across the membrane. The beta chain is the catalytic
CC       subunit.
CC   -!- CATALYTIC ACTIVITY: ATP + H(2)O + H(+)(In) = ADP + phosphate +
CC       H(+)(Out).
CC   -!- SUBUNIT: F-type ATPases have 2 components, CF(1) - the catalytic
CC       core - and CF(0) - the membrane proton channel. CF(1) has five
CC       subunits: alpha(3), beta(3), gamma(1), delta(1), epsilon(1). CF(0)
CC       has three main subunits: a, b and c.
CC   -!- SUBCELLULAR LOCATION: Mitochondrial.
CC   -!- SIMILARITY: Belongs to the ATPase alpha/beta chains family.
DR   HSC-2DPAGE; P99504; DOG.
DR   InterPro; IPR000194; ATPase_a/bcentre.
DR   PROSITE; PS00152; ATPASE_ALPHA_BETA; PARTIAL.
KW   ATP synthesis; ATP-binding; CF(1); Direct protein sequencing;
KW   Hydrogen ion transport; Hydrolase; Mitochondrion.
FT   UNSURE         8       8
FT   UNSURE        17      19
FT   NON_TER       19      19
SQ   SEQUENCE    19 AA; 1871 MW; BB9C163FDC60BB42 CRC64;
     ATQTSPSPKG AAAXXXRVV
//
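The two-letter line codes make this format easy to split mechanically, although each code then has its own sub-syntax that still needs interpreting. As a rough illustration only (this is not UniProt's parser, and the helper name `group_by_line_code` is made up here), a minimal Python sketch that groups the lines of one entry by line code:

```python
from collections import defaultdict

def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code.

    Hypothetical helper for illustration. Sequence data lines, which start
    with spaces and carry no line code, are collected under the key 'SEQ'.
    """
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        if not line.strip():           # skip blank lines
            continue
        if line.startswith(" "):       # sequence data has no line code
            fields["SEQ"].append(line.strip())
        else:
            code, _, value = line.partition(" ")
            fields[code].append(value.strip())
    return dict(fields)

# A shortened copy of the entry shown above.
entry = """\
ID   ATPB_CANFA       STANDARD;    PRT;     19 AA.
AC   P99504;
OS   Canis familiaris (Dog).
OX   NCBI_TaxID=9615;
SQ   SEQUENCE    19 AA; 1871 MW; BB9C163FDC60BB42 CRC64;
     ATQTSPSPKG AAAXXXRVV
//
"""

fields = group_by_line_code(entry)
print(fields["AC"])   # ['P99504;']
print(fields["OS"])   # ['Canis familiaris (Dog).']
sequence = "".join(fields["SEQ"]).replace(" ", "")
print(len(sequence))  # 19
```

Note that the values still carry format-specific punctuation (trailing semicolons, `NCBI_TaxID=` prefixes), which is the part a structured representation would make explicit.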
What have we done so far?
[DIR]   Parent Directory          19-Jul-2004   13:02       -
[   ]   cellular-components.rdf   11-Oct-2004   19:15      5k
[   ]   databases.rdf             11-Oct-2004   19:15     45k
[   ]   databases.rdf.gz          13-Sep-2004   11:34      6k
[   ]   datasets.rdf              19-Oct-2004   16:32      4k
[   ]   enzymes.rdf.gz            11-Oct-2004   19:15    309k
[   ]   go.rdf.gz                 11-Oct-2004   19:15    839k
[   ]   intact.rdf.gz             11-Oct-2004   19:15    636k
[   ]   keywords.rdf.gz           11-Oct-2004   19:15     96k
[   ]   ontology.owl              19-Oct-2004   18:27     77k
[   ]   taxonomy.rdf.gz           11-Oct-2004   19:15    4.0M
[   ]   uniparc.rdf.gz            13-Oct-2004   10:54    762M
[   ]   uniprot.rdf.gz            11-Oct-2004   19:39    768M
[   ]   uniref.rdf.gz             01-Oct-2004   12:56   52.2M
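use strict;
use warnings;
use Expasy::RDF;

my $parser = Expasy::RDF::Parser->new('P12345.rdf');

# The parser returns one protein at a time.
while (my $protein = $parser->next)
{
   my $id = $protein->id;
   my $mass = $protein->sequence->mass;
   print "Mass of $id is $mass.\n";

   # Print each annotation as "type: comment".
   foreach my $annotation ($protein->annotation)
   {
      print $annotation->type, ': ', $annotation->comment, "\n";
   }
}

$parser->close;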
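Files of the size listed above (uniprot.rdf.gz is 768M compressed) are awkward to load whole, which is presumably why the parser hands back one protein at a time. A comparable streaming pattern in Python, as a sketch only: it assumes the flat layout in which every resource is a top-level `rdf:Description`, and the function name `iter_subjects` is invented for this example.

```python
import gzip
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def iter_subjects(stream):
    """Yield the rdf:about value of each rdf:Description, one at a time.

    Assumes a flat file in which descriptions are direct children of
    rdf:RDF; nested descriptions are not handled specially.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)            # first event: start of rdf:RDF
    for event, elem in context:
        if event == "end" and elem.tag == f"{{{RDF_NS}}}Description":
            yield elem.get(f"{{{RDF_NS}}}about")
            root.clear()               # drop processed children to bound memory

# Small in-memory stand-in for a large .rdf.gz file.
sample = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606"/>
  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9605"/>
</rdf:RDF>"""
compressed = io.BytesIO(gzip.compress(sample))

with gzip.open(compressed) as stream:
    subjects = list(iter_subjects(stream))
print(subjects)
```

For a real file, `gzip.open` would be given the file path instead of the in-memory buffer.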
Issues
XML Syntax

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="urn:lsid:uniprot.org:ontology:"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
>

  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
    <rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
    <mnemonic>HUMAN</mnemonic>
    <scientificName>Homo sapiens</scientificName>
    <commonName>Human</commonName>
    <rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
  </rdf:Description>

</rdf:RDF>
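To make the underlying statements visible, the description above can be flattened into subject/predicate/object triples. The following is a minimal sketch using only Python's standard XML parser; it handles just the simple form shown here (one `rdf:Description` with literal or `rdf:resource` properties), not RDF/XML in general, and the function name `triples` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="urn:lsid:uniprot.org:ontology:"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
>
  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
    <rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
    <mnemonic>HUMAN</mnemonic>
    <scientificName>Homo sapiens</scientificName>
    <commonName>Human</commonName>
    <rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
  </rdf:Description>
</rdf:RDF>"""

def triples(xml_text):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join the two parts.
            predicate = prop.tag[1:].replace("}", "")
            # Object is the referenced resource if present, else the literal text.
            obj = prop.get(f"{{{RDF}}}resource", prop.text)
            yield subject, predicate, obj

results = list(triples(rdf_xml))
for triple in results:
    print(triple)
```

The same five statements could be serialized in many different XML shapes, so generic XML tooling like this only works when the producer sticks to one predictable layout.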
Triples, Quads and Quints

● What is the source of a triple?

● Compact reification.
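The slide does not spell out a design, so the following is only one possible reading of the "source of a triple" question: extend each triple with a fourth component naming where the statement came from. Standard RDF reification needs four extra triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) just to name one statement, which is the overhead a more compact form would avoid. The class and function names below are made up for illustration, and the file names are used as sources purely as an example.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str   # where the statement came from (assumption: a file name)

def with_source(triple, source):
    """Attach a source to a triple, giving a quad."""
    return Quad(*triple, source)

def statements_from(quads, source):
    """Select the statements that came from one source."""
    return [q for q in quads if q.source == source]

human = Triple("urn:lsid:uniprot.org:taxonomy:9606",
               "urn:lsid:uniprot.org:ontology:commonName",
               "Human")
quads = [
    with_source(human, "taxonomy.rdf"),
    with_source(human, "uniprot.rdf"),   # same statement, different source
]

print(len(statements_from(quads, "taxonomy.rdf")))   # 1
```

With plain triples the two statements above would be indistinguishable; with a source component they can be kept apart, filtered, or traced back.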
Web Services

● Overkill for providing programmatic access to resources.

● Often impractical for performance reasons.
Life Science Identifiers

● Need special resolver.

● Resolution tied to retrieval.

● Explicit version numbers.

● Not widely used.
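For reference, an LSID has the form `urn:lsid:authority:namespace:object`, optionally followed by `:revision`. A minimal sketch of splitting one into its parts is given below; it covers parsing only, not resolution, which is the part that needs a special resolver. The names `LSID` and `parse_lsid` are invented for this example, and the identifier with a revision is hypothetical.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(urn):
    """Split an LSID into authority, namespace, object and optional revision."""
    parts = urn.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

taxon = parse_lsid("urn:lsid:uniprot.org:taxonomy:9606")
print(taxon.authority, taxon.namespace, taxon.object_id, taxon.revision)

# Hypothetical identifier with an explicit revision, for illustration only.
versioned = parse_lsid("urn:lsid:example.org:records:X1:2")
print(versioned.revision)
```

The explicit revision is part of the identifier itself, so two revisions of the same record are, as far as an RDF store is concerned, two unrelated resources unless they are linked by additional statements.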
Embedded References

uniprot.rdf

<rdf:Description rdf:about="#_2F9A">
<rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Caution_Annotation"/>
<rdfs:comment>In mouse, 5 genes homologous to human CD209/DC-SIGN and
CD209L/DC-SIGNR have been identified. Mouse CD209A product was named
DC-SIGN by {citation 1} because of its similar expression pattern and
chromosomal location in juxtaposition to CD23, but despite of the low
sequence similarity.</rdfs:comment>
<citation rdf:resource="#_2F8A"/>
</rdf:Description>


cyc.rdf

<owl:Class rdf:ID="Antigen">
<rdfs:comment>The collection of substances that can stimulate immune
response. For example, bacteria [#$Bacterium], #$Viruses, proteins
[#$ProteinMolecule] can serve as #$Antigens.</rdfs:comment>
</owl:Class>
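Both examples embed references inside literal text, so a consumer has to post-process the string to recover them. A minimal sketch for the `{citation N}` form follows. It assumes that N is a 1-based index into the `citation` properties of the same resource, which the slide does not state explicitly; the function name `expand_citations` is made up here.

```python
import re

def expand_citations(comment, citations):
    """Replace each {citation N} placeholder with the N-th citation reference.

    Assumption: N is a 1-based index into the resource's citation properties.
    Placeholders with no matching citation are left untouched.
    """
    def replace(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(citations):
            return f"[{citations[index]}]"
        return match.group(0)
    return re.sub(r"\{citation (\d+)\}", replace, comment)

comment = "Mouse CD209A product was named DC-SIGN by {citation 1}."
print(expand_citations(comment, ["#_2F8A"]))
# Mouse CD209A product was named DC-SIGN by [#_2F8A].
```

The need for this kind of string handling is the issue itself: the link between the comment and its citation is not visible to generic RDF tools.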
Summary
People will adopt the technology if it provides immediate benefits and is simple to use.
Credits
<?xml version="1.0" encoding="UTF-8"?>


<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
>


<foaf:Project>
  <foaf:name>UniProt</foaf:name>
  <foaf:homepage rdf:resource="http://uniprot.org/"/>
  <foaf:fundedBy>
    <foaf:Organization>
      <foaf:name>National Institutes of Health</foaf:name>
      <foaf:homepage rdf:resource="http://www.nih.gov/"/>
    </foaf:Organization>
  </foaf:fundedBy>
</foaf:Project>

<foaf:Organization>
  <foaf:name>Swiss Institute of Bioinformatics</foaf:name>
  <foaf:nick>SIB</foaf:nick>
  <foaf:homepage rdf:resource="http://www.isb-sib.ch/"/>
</foaf:Organization>

<foaf:Organization>
  <foaf:name>European Bioinformatics Institute</foaf:name>
  <foaf:nick>EBI</foaf:nick>
  <foaf:homepage rdf:resource="http://www.ebi.ac.uk/"/>
</foaf:Organization>

<foaf:Organization>
  <foaf:name>Georgetown University</foaf:name>
  <foaf:homepage rdf:resource="http://www.georgetown.edu/"/>
</foaf:Organization>


...
