Open Innovation

Linking Open Drug Data

Susie Stephens, Principal Research Scientist, Eli Lilly

The Linked Data Cloud

[Slide graphic not reproduced. Source: Chris Bizer]

Linking Open Drug Data

• HCLSIG task started October 1, 2008

• Primary Objectives

• Survey publicly available data sets about drugs

• Publish and interlink these data sets on the Web

• Explore interesting questions in competitive intelligence that could be answered if the data sets are linked

• Participants: Bosse Andersson, Chris Bizer, Kei Cheung, Don Doherty, Oktie Hassanzadeh, Anja Jentzsch, Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Susie Stephens, Jun Zhao

Assessment of Data Sources

[Slide graphic not reproduced. Source: Mark Sharp et al. A Framework for Characterizing Drug Information Sources. AMIA 2008]

Published Data Sets

• LinkedCT (http://linkedct.org)

• Online registry of more than 60,000 clinical trials

• Published in XML

• 7,011,000 triples (290,000 interlinking)



• DrugBank (http://www4.wiwiss.fu-berlin.de/drugbank)

• A repository of almost 5,000 FDA-approved drugs

• Published as DrugBank DrugCards

• 1,153,000 triples (23,000 interlinking)



• DailyMed (http://www4.wiwiss.fu-berlin.de/dailymed/)

• High quality information about marketed drugs

• Flat file representation

• 124,000 triples (29,600 interlinking)



• Diseasome (http://www4.wiwiss.fu-berlin.de/diseasome)

• Information about 4,300 disorders and disease genes linked by known disorder-gene associations

• Published in XML

• 88,000 triples (23,000 interlinking)
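
The triple counts above can be compared more easily as proportions. The following is a minimal sketch that computes, for each data set, the share of triples that are interlinks. It assumes the "interlinking" figure on each slide line is a subset of the total triple count, which the slide suggests but does not state explicitly; the function and variable names are illustrative.

```python
# Triple counts as listed on the slide: (total triples, interlinking triples).
# Assumption: the interlinking count is included in the total.
DATASETS = {
    "LinkedCT": (7_011_000, 290_000),
    "DrugBank": (1_153_000, 23_000),
    "DailyMed": (124_000, 29_600),
    "Diseasome": (88_000, 23_000),
}


def interlink_share(total: int, interlinking: int) -> float:
    """Return the fraction of a data set's triples that are interlinks."""
    return interlinking / total


def summarize(datasets):
    """Return one (name, total, interlinking, share) row per data set."""
    return [
        (name, total, links, interlink_share(total, links))
        for name, (total, links) in datasets.items()
    ]


if __name__ == "__main__":
    for name, total, links, share in summarize(DATASETS):
        print(f"{name:<10} {total:>10,} {links:>8,} {share:6.1%}")
```

Under that assumption, LinkedCT is by far the largest data set but only about 4% of its triples are interlinks, whereas roughly a quarter of the DailyMed and Diseasome triples are.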

Classes of Links

• Based on common identifiers

• Links present in the source data sets



• Based on link discovery and record linkage techniques

• String matching

– E.g., “Alzheimer’s disease” in LinkedCT was matched with “Alzheimer_disease” in Diseasome

• Semantic matching

– E.g., “Varenicline” has the synonym “Varenicline Tartrate” and the brand names “Champix” and “Chantix”
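
The two discovery techniques above can be illustrated with a small sketch. It is not the linking method used by the LODD task, only a simplified stand-in: string matching is approximated by normalizing labels before comparing them, and semantic matching by a lookup in a synonym table. The function names and the table layout are assumptions made for this example; the labels and synonyms are the ones quoted on the slide.

```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Reduce a label to a comparison key: lowercase, no possessive, no punctuation."""
    text = unicodedata.normalize("NFKC", label).lower()
    text = text.replace("\u2019", "'")       # typographic apostrophe -> ASCII
    text = re.sub(r"'s\b", "", text)          # drop the possessive "'s"
    text = re.sub(r"[^a-z0-9]+", " ", text)   # underscores and punctuation -> space
    return " ".join(text.split())


# Synonym and brand names for Varenicline, as quoted on the slide.
SYNONYMS = {
    "Varenicline": ["Varenicline Tartrate", "Champix", "Chantix"],
}


def build_alias_index(synonyms):
    """Map every normalized name (canonical or alias) to its canonical name."""
    index = {}
    for canonical, aliases in synonyms.items():
        for name in [canonical, *aliases]:
            index[normalize(name)] = canonical
    return index


def same_entity(a: str, b: str, alias_index) -> bool:
    """True if the labels match as strings or resolve to the same canonical name."""
    key_a, key_b = normalize(a), normalize(b)
    if key_a == key_b:
        return True  # string match
    canonical_a = alias_index.get(key_a)
    return canonical_a is not None and canonical_a == alias_index.get(key_b)


if __name__ == "__main__":
    index = build_alias_index(SYNONYMS)
    print(same_entity("Alzheimer\u2019s disease", "Alzheimer_disease", index))
    print(same_entity("Champix", "Chantix", index))
```

Exact comparison of normalized keys is deliberately strict. Real record linkage usually adds approximate string similarity, which is the kind of functionality tools such as Silk and LinQuer are built to provide.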

Business Use Case

• A neuroscience-focused business manager is interested in seeing an update on new clinical trials by competitors on Alzheimer’s Disease (AD)

• A phase III trial by Pfizer for a drug called Varenicline has just been listed in LinkedCT

• More information of interest is found in DBpedia, DailyMed, and DrugBank

• DailyMed indicates the drug is already on the market for nicotine addiction and has minimal side effects

• DrugBank allows the manager to see the targets for Varenicline

• Diseasome, however, indicates that the corresponding genes are only implicated in nicotine addiction, rather than AD

• This suggests a more complex relationship between the diseases than just the drug target

• Extending the browsing to the SWAN Knowledgebase shows that there are hypotheses relating AD to nicotine receptors through amyloid beta
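
The browsing path in this use case amounts to following identity links from one data set into the others and collecting what each says about the drug. The sketch below shows that idea on a handful of in-memory triples. The prefixes, predicate names, and the two "placeholder" resources are hypothetical and chosen only for illustration; the slides do not name the actual targets or genes, so none are filled in here.

```python
from collections import defaultdict, deque

# (subject, predicate, object) statements paraphrasing the use case.
# All identifiers and predicate names are hypothetical.
TRIPLES = [
    ("linkedct:trial/example", "sponsor", "Pfizer"),
    ("linkedct:trial/example", "phase", "III"),
    ("linkedct:trial/example", "condition", "Alzheimer's Disease"),
    ("linkedct:trial/example", "intervention", "linkedct:varenicline"),
    ("linkedct:varenicline", "sameAs", "dailymed:varenicline"),
    ("dailymed:varenicline", "sameAs", "drugbank:varenicline"),
    ("dailymed:varenicline", "marketedFor", "nicotine addiction"),
    ("drugbank:varenicline", "target", "drugbank:target/placeholder"),
    ("drugbank:target/placeholder", "encodedBy", "diseasome:gene/placeholder"),
    ("diseasome:gene/placeholder", "implicatedIn", "nicotine addiction"),
]


def same_as_closure(start, triples):
    """All resources connected to `start` through sameAs links, in either direction."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == "sameAs":
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def facts_about(start, triples):
    """Non-identity statements whose subject is `start` or one of its sameAs equivalents."""
    group = same_as_closure(start, triples)
    return sorted((s, p, o) for s, p, o in triples if s in group and p != "sameAs")


if __name__ == "__main__":
    for fact in facts_about("linkedct:varenicline", TRIPLES):
        print(fact)
```

Starting from the LinkedCT resource, the traversal reaches the DailyMed and DrugBank descriptions even though LinkedCT links directly to only one of them, which is the practical benefit of interlinking that the use case relies on.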

Technical Challenges

• Life sciences data is difficult to connect due to inconsistent terminology and the prevalence of synonyms and homonyms

• Refinement of tools and techniques for enabling more automatic linking of entities across data sets

• Selection of ontologies to enable consistent mappings

• Development of a platform sufficiently robust to enable inferencing

• Provision of an interface to users that supports browsing, querying, and filtering data

• Persuading data providers to publish in RDF, which would alleviate the need for us to update data and would provide some of the interlinking
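
To make the last point concrete, an interlink published in RDF is just one more triple. The sketch below serializes such a link as a line of N-Triples using the standard owl:sameAs property. The two resource URIs are placeholders under example.org, not the real LinkedCT or Diseasome identifiers, and the helper names are assumptions made for this example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Characters that may not appear unescaped inside an N-Triples IRI.
_FORBIDDEN = set('<>"{}|^`\\')


def _check_iri(iri: str) -> None:
    """Reject empty strings and IRIs containing forbidden or control characters."""
    if not iri or any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"not usable as an N-Triples IRI: {iri!r}")


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple whose three terms are all IRIs as an N-Triples line."""
    for iri in (subject, predicate, obj):
        _check_iri(iri)
    return f"<{subject}> <{predicate}> <{obj}> ."


if __name__ == "__main__":
    # Placeholder URIs; the real data sets use their own URI schemes.
    print(ntriple(
        "http://example.org/linkedct/condition/alzheimers-disease",
        OWL_SAME_AS,
        "http://example.org/diseasome/disease/alzheimer_disease",
    ))
```

If providers emitted links like this themselves, the conversion and part of the link discovery work described above would no longer fall to the task participants.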

Next Steps

• Ensure that existing data are accurately and comprehensively linked



• Incorporate additional data sources into the LODD cloud that are of interest to competitive intelligence (e.g., Traditional Chinese Medicine)



• Use novel link discovery tools and frameworks including Silk and LinQuer



• Explore using SIOC to aggregate information about what patients are saying about drugs



• Submit paper to the iTriplify Challenge

Task Alignment

• LODD is looking to use Pharma Ontology’s work to help inform the mappings



• Data converted to RDF is also loaded into BioRDF’s HCLS KB

Conclusions

• Added 4 drug-related data sets into the cloud for competitive intelligence



• Will add further data sources to the LODD cloud to enable more insights to be gleaned



• Will continue to explore and test tools that are being developed for LOD


