UniProt
Eric Jain
Swiss Institute of Bioinformatics, Geneva
W3C Workshop on Semantic Web for Life Sciences, October 2004
What is it?
ID ATPB_CANFA STANDARD; PRT; 19 AA.
AC P99504;
DT 15-JUL-1998 (Rel. 36, Created)
DT 15-JUL-1998 (Rel. 36, Last sequence update)
DT 05-JUL-2004 (Rel. 44, Last annotation update)
DE ATP synthase beta chain, mitochondrial (EC 3.6.3.14) (Fragment).
GN Name=ATP5B;
OS Canis familiaris (Dog).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Carnivora; Fissipedia; Canidae; Canis.
OX NCBI_TaxID=9615;
RN [1]
RP SEQUENCE.
RC TISSUE=Heart;
RX MEDLINE=98163340; PubMed=9504812;
RA Dunn M.J., Corbett J.M., Wheeler C.H.;
RT "HSC-2DPAGE and the two-dimensional gel electrophoresis database of
RT dog heart proteins.";
RL Electrophoresis 18:2795-2802(1997).
CC -!- FUNCTION: Produces ATP from ADP in the presence of a proton
CC gradient across the membrane. The beta chain is the catalytic
CC subunit.
CC -!- CATALYTIC ACTIVITY: ATP + H(2)O + H(+)(In) = ADP + phosphate +
CC H(+)(Out).
CC -!- SUBUNIT: F-type ATPases have 2 components, CF(1) - the catalytic
CC core - and CF(0) - the membrane proton channel. CF(1) has five
CC subunits: alpha(3), beta(3), gamma(1), delta(1), epsilon(1). CF(0)
CC has three main subunits: a, b and c.
CC -!- SUBCELLULAR LOCATION: Mitochondrial.
CC -!- SIMILARITY: Belongs to the ATPase alpha/beta chains family.
DR HSC-2DPAGE; P99504; DOG.
DR InterPro; IPR000194; ATPase_a/bcentre.
DR PROSITE; PS00152; ATPASE_ALPHA_BETA; PARTIAL.
KW ATP synthesis; ATP-binding; CF(1); Direct protein sequencing;
KW Hydrogen ion transport; Hydrolase; Mitochondrion.
FT UNSURE 8 8
FT UNSURE 17 19
FT NON_TER 19 19
SQ SEQUENCE 19 AA; 1871 MW; BB9C163FDC60BB42 CRC64;
ATQTSPSPKG AAAXXXRVV
//
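The entry above is line-oriented: each line starts with a two-letter code (ID, AC, OS, ...), the residues follow the SQ line, and // ends the entry. A minimal sketch of reading that layout, assuming the whitespace-collapsed form shown on this slide; parse_flat_entry is a hypothetical helper, not part of any UniProt tool:

```python
from collections import defaultdict

def parse_flat_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # entry terminator
            break
        if in_sequence:                    # residue lines carry no line code
            fields["sequence"].append(line.replace(" ", ""))
            continue
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
        in_sequence = code == "SQ"         # everything after SQ is sequence data
    return dict(fields)

# A shortened copy of the entry shown above.
entry = """ID ATPB_CANFA STANDARD; PRT; 19 AA.
AC P99504;
OS Canis familiaris (Dog).
OX NCBI_TaxID=9615;
SQ SEQUENCE 19 AA; 1871 MW; BB9C163FDC60BB42 CRC64;
ATQTSPSPKG AAAXXXRVV
//"""

record = parse_flat_entry(entry)
accession = record["AC"][0].rstrip(";")
sequence = "".join(record["sequence"])
print(accession, len(sequence))
```

Every consumer of the flat file has to write something like this by hand, which is part of the motivation for a format with generic parsers.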
What have we done so far?
[DIR] Parent Directory 19-Jul-2004 13:02 -
[ ] cellular-components.rdf 11-Oct-2004 19:15 5k
[ ] databases.rdf 11-Oct-2004 19:15 45k
[ ] databases.rdf.gz 13-Sep-2004 11:34 6k
[ ] datasets.rdf 19-Oct-2004 16:32 4k
[ ] enzymes.rdf.gz 11-Oct-2004 19:15 309k
[ ] go.rdf.gz 11-Oct-2004 19:15 839k
[ ] intact.rdf.gz 11-Oct-2004 19:15 636k
[ ] keywords.rdf.gz 11-Oct-2004 19:15 96k
[ ] ontology.owl 19-Oct-2004 18:27 77k
[ ] taxonomy.rdf.gz 11-Oct-2004 19:15 4.0M
[ ] uniparc.rdf.gz 13-Oct-2004 10:54 762M
[ ] uniprot.rdf.gz 11-Oct-2004 19:39 768M
[ ] uniref.rdf.gz 01-Oct-2004 12:56 52.2M
use strict;
use warnings;
use Expasy::RDF;

# Iterate over the protein entries in an RDF file.
my $parser = Expasy::RDF::Parser->new('P12345.rdf');
while (my $protein = $parser->next)
{
    my $id = $protein->id;
    my $mass = $protein->sequence->mass;
    print "Mass of $id is $mass.\n";
    # List each annotation as "type: comment".
    foreach ($protein->annotation)
    {
        print $_->type, ': ', $_->comment, "\n";
    }
}
$parser->close;
Issues
XML Syntax
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="urn:lsid:uniprot.org:ontology:"
xmlns:owl="http://www.w3.org/2002/07/owl#"
>
<rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
<rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
<mnemonic>HUMAN</mnemonic>
<scientificName>Homo sapiens</scientificName>
<commonName>Human</commonName>
<rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
</rdf:Description>
</rdf:RDF>
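The XML above encodes only five statements about one taxon. A rough sketch of recovering them as plain triples with Python's standard ElementTree module; the triples function is a hypothetical helper that handles only the flat rdf:Description pattern used on this slide, not RDF/XML in general:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The taxonomy example from the slide, without the XML declaration.
doc = """<rdf:RDF
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="urn:lsid:uniprot.org:ontology:"
 xmlns:owl="http://www.w3.org/2002/07/owl#">
<rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
<rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
<mnemonic>HUMAN</mnemonic>
<scientificName>Homo sapiens</scientificName>
<commonName>Human</commonName>
<rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
</rdf:Description>
</rdf:RDF>"""

def triples(xml_text):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(xml_text)
    for node in root.findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        for prop in node:
            # ElementTree tags look like "{namespace}local"; join the two parts.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:                # literal value instead of a resource
                obj = prop.text
            yield subject, predicate, obj

result = list(triples(doc))
for triple in result:
    print(triple)
```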
Triples, Quads and Quints
● What is the source of a triple?
● Compact reification.
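One possible reading of the two bullets, sketched under assumptions: a quad adds a source to a triple, and a quint additionally gives the statement its own identifier, so that one record can stand in for the four triples that standard RDF reification needs. The Quint type, the "#_1" identifier and the choice of source are illustrative, not taken from the UniProt data:

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

class Quint(NamedTuple):
    statement_id: str   # hypothetical identifier for the statement itself
    subject: str
    predicate: str
    obj: str
    source: str         # where the statement came from (assumed meaning)

def reify(q):
    """Expand one quint into the four triples of standard RDF reification."""
    return [
        (q.statement_id, RDF + "type", RDF + "Statement"),
        (q.statement_id, RDF + "subject", q.subject),
        (q.statement_id, RDF + "predicate", q.predicate),
        (q.statement_id, RDF + "object", q.obj),
    ]

q = Quint(
    statement_id="#_1",
    subject="urn:lsid:uniprot.org:taxonomy:9606",
    predicate="urn:lsid:uniprot.org:ontology:commonName",
    obj="Human",
    source="taxonomy.rdf.gz",
)
expanded = reify(q)
print(len(expanded), "triples replace one quint")
```

Recording the source in plain RDF would take yet another triple about the statement node, which is why a compact form is attractive for large data sets.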
Web Services
● Overkill for providing programmatic access to resources.
● Often impractical for performance reasons.
Life Science Identifiers
● Need special resolver.
● Resolution tied to retrieval.
● Explicit version numbers.
● Not widely used.
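For reference, an LSID is a URN of the form urn:lsid:authority:namespace:object, optionally followed by a revision. A small sketch of splitting one into its parts; parse_lsid and the LSID tuple are hypothetical names, and the revision "2" in the usage is invented for illustration:

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(urn):
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = urn.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

human = parse_lsid("urn:lsid:uniprot.org:taxonomy:9606")
versioned = parse_lsid("urn:lsid:uniprot.org:taxonomy:9606:2")  # made-up revision
print(human)
print(versioned.revision)
```

Parsing is the easy part; turning the authority into a server that returns data is what requires the special resolver.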
Embedded References
uniprot.rdf
<rdf:Description rdf:about="#_2F9A">
<rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Caution_Annotation"/>
<rdfs:comment>In mouse, 5 genes homologous to human CD209/DC-SIGN and
CD209L/DC-SIGNR have been identified. Mouse CD209A product was named DC-
SIGN by {citation 1} because of its similar expression pattern and
chromosomal location in juxtaposition to CD23, but despite of the low
sequence similarity.</rdfs:comment>
<citation rdf:resource="#_2F8A"/>
</rdf:Description>
cyc.rdf
<owl:Class rdf:ID="Antigen">
<rdfs:comment>The collection of substances that can stimulate immune
response. For example, bacteria [#$Bacterium], #$Viruses, proteins
[#$ProteinMolecule] can serve as #$Antigens.</rdfs:comment>
</owl:Class>
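Both samples hide references inside literal text, so a consumer has to recover them by pattern matching. A sketch with two regular expressions written for exactly these two conventions; the patterns are assumptions based on the excerpts above, not a documented grammar:

```python
import re

# {citation N} placeholders as used in the uniprot.rdf comment.
CITATION = re.compile(r"\{citation (\d+)\}")
# #$Term references as used in the cyc.rdf comment.
CYC_TERM = re.compile(r"#\$(\w+)")

uniprot_comment = (
    "Mouse CD209A product was named DC-SIGN by {citation 1} because of "
    "its similar expression pattern"
)
cyc_comment = (
    "For example, bacteria [#$Bacterium], #$Viruses, proteins "
    "[#$ProteinMolecule] can serve as #$Antigens."
)

citations = CITATION.findall(uniprot_comment)
terms = CYC_TERM.findall(cyc_comment)
print(citations)
print(terms)
```

Generic RDF tools see neither kind of link, since to them the comment is an opaque string.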
Summary
People will adopt the technology if it provides
immediate benefits and is simple to use.
Credits
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
<foaf:Project>
<foaf:name>UniProt</foaf:name>
<foaf:homepage rdf:resource="http://uniprot.org/"/>
<foaf:fundedBy>
<foaf:Organization>
<foaf:name>National Institutes of Health</foaf:name>
<foaf:homepage rdf:resource="http://www.nih.gov/"/>
</foaf:Organization>
</foaf:fundedBy>
</foaf:Project>
<foaf:Organization>
<foaf:name>Swiss Institute of Bioinformatics</foaf:name>
<foaf:nick>SIB</foaf:nick>
<foaf:homepage rdf:resource="http://www.isb-sib.ch/"/>
</foaf:Organization>
<foaf:Organization>
<foaf:name>European Bioinformatics Institute</foaf:name>
<foaf:nick>EBI</foaf:nick>
<foaf:homepage rdf:resource="http://www.ebi.ac.uk/"/>
</foaf:Organization>
<foaf:Organization>
<foaf:name>Georgetown University</foaf:name>
<foaf:homepage rdf:resource="http://www.georgetown.edu/"/>
</foaf:Organization>
...