How to Build an Ontology

Posted: 11/14/2008
Barry Smith
http://ontology.buffalo.edu/smith

Ontology
A classification of entities and the relations between them.
An ontology is a list of types structured by relations.
It is defined by a scientific field's vocabulary and by the canonical formulations of its theories.
Scientific theories consist of generalizations.
What I will not be talking about: XML, OWL, ..., data(types), information models, file formats ...

[Diagram labels: Top-Level; GO; OBO, OBO Core; NCBO; FMA; NCBC Roadmap Centers; NCI EVS; NECTAR (National Electronic Clinical Trials and Research) Network]

Instances are not included in an ontology
It is the generalizations that are important (but instances must still be taken into account).

[Table: spreadsheet with columns A, B, C containing the numbers 515287, 521683, 521682 and the items DC3300 Dust Collector Fan, Gilmer Belt, Motor Drive Belt]

[Diagram labels: Ontology; Types; Instances]

Ontology = A Representation of Types
Each node of an ontology consists of:
• preferred term (aka term)
• term identifier (TUI, aka CUI)
• synonyms
• definition, glosses, comments

Nodes in an ontology are connected by relations, primarily is_a (= is subtype of) and part_of, designed to support search, reasoning and annotation.

Rules for formatting terms
• Terms are names of types: if you prefix a term with 'the type ___', the term should still make sense
• Hence: terms should be in the singular
• Terms should be lower case
• Avoid abbreviations even when it is clear in context what they mean ('breast' for 'breast tumor')

Motivation: to capture reality
Inferences and decisions we make are based upon what we know of reality.
An ontology is a computable representation of this underlying bio(techno)logical reality.
It enables a computer to reason over the data in (some of) the ways that we do.
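The node structure and the term-formatting rules listed above can be sketched in a few lines of Python. This is only an illustration: the class name OntologyNode, the function lint_term, the identifier scheme and the plural heuristic are all assumptions made here, not part of the talk or of any ontology tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyNode:
    """One node of an ontology: a representation of a type, not an instance."""
    identifier: str                       # term identifier (hypothetical ID scheme)
    preferred_term: str                   # the preferred term
    synonyms: List[str] = field(default_factory=list)
    definition: str = ""
    is_a: List[str] = field(default_factory=list)      # identifiers of parent types
    part_of: List[str] = field(default_factory=list)   # identifiers of wholes


def lint_term(term: str) -> List[str]:
    """Return the formatting problems found in a preferred term."""
    problems = []
    if term != term.lower():
        problems.append("not lower case")
    words = term.split()
    last_word = words[-1] if words else ""
    # Naive plural heuristic (an assumption of this sketch): a final "s" is
    # flagged unless the word ends in "ss", "us" or "is" ("nucleus", "mitosis").
    if last_word.endswith("s") and not last_word.endswith(("ss", "us", "is")):
        problems.append("possibly plural")
    return problems


node = OntologyNode("X:0000001", "plasma membrane", is_a=["X:0000002"])
print(lint_term(node.preferred_term))    # no problems reported
print(lint_term("Plasma Membranes"))     # both rules are violated
```

The rule against abbreviations is left out on purpose: deciding whether 'breast' stands in for 'breast tumor' needs a curator, not a string check.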
Biomedical ontology integration / interoperability
Will never be achieved through integration of meanings or concepts.
The problem is precisely that different user communities use different concepts.
What's really needed is to have well-defined, commonly used relationships.

Concepts
Concepts are in your head and will change as our understanding changes.
Ontologies represent types: not concepts, meanings, ideas ...
Types exist, with their instances, in objective reality, including types of experimental process, design, method, ...

Most ontologies are execrable
But some good ontologies do already exist:
• as far as possible, don't reinvent
• use the power of combination and collaboration
• ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies

Why do we need rules/standards for good ontology?
Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking).
Unintuitive rules for classification lead to errors.
Intuitive rules facilitate training of curators and annotators.
Common rules allow alignment with other ontologies.
Logically coherent rules enhance harvesting of content through automatic reasoning systems.

Rules on types
Don't confuse types with concepts.
Don't confuse types with ways of getting to know types.
Don't confuse types with ways of talking about types.
Don't confuse types with data about types.

First Rule: Univocity
Terms (including those describing relations) should have the same meanings on every occasion of use.
In other words, they should refer to the same types in reality.

Second Rule: Positivity
There are no negative types.
Terms such as 'non-mammal' or 'non-membrane' do not designate genuine types.
(There are also no conjunctive or disjunctive types: 'rabbit and nailfile'; 'rabbit or nosewipe'.)

Third Rule: Objectivity
Which types exist is not a function of our biological knowledge.
Terms such as 'unknown', 'unclassified' or 'unlocalized' do not designate biological natural kinds.

Fourth Rule: Single Inheritance
No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level.

Rule of Single Inheritance
No diamonds.
[Diagram: nodes A, B and C linked by is_a1 and is_a2]

Problems with multiple inheritance
[Diagram: nodes A, B and C linked by is_a1 and is_a2]
'is_a' is no longer univocal:
• 'is_a' is pressed into service to mean a variety of different things
• shortfalls from single inheritance are often clues to incorrect entry of terms and relations
• the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

is_a overloading
Serves as an obstacle to integration with neighboring ontologies.
The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.

To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via force majeure.

Current Best Practice: The Foundational Model of Anatomy
[Diagram: fragment of the FMA hierarchy. Nodes: Anatomical Space, Anatomical Structure, Organ Cavity Subdivision, Organ Cavity, Organ, Organ Part, Serous Sac Cavity Subdivision, Serous Sac Cavity, Serous Sac, Organ Component, Organ Subdivision, Tissue, Pleural Cavity, Pleural Sac, Parietal Pleura, Pleura (Wall of Sac), Interlobar recess, Visceral Pleura, Mesothelium of Pleura, Mediastinal Pleura]

The FMA follows formal rules for definitions laid down by Aristotle.
When A is_a B, the definition of 'A' takes the form:
an A =def. a B which ...
a human being =def. an animal which is rational

FMA Example
Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane, with or without a cell nucleus
Plasma membrane =def. a cell part that surrounds the cytoplasm

The FMA regimentation
Brings the advantage that each definition reflects the position in the hierarchy to which a defined term belongs.
The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.
The entire information content of the FMA's term hierarchy can be translated very cleanly into a computer representation.

GO is now adopting structured definitions, which contain both genus and differentiae
Essence = Genus + Differentiae
neuron cell differentiation =
Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of ..)
Differentiae: acquires features of a neuron

Ontology alignment
One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:

GO term                            Cell Ontology term
cone cell fate commitment          retinal_cone_cell
keratinocyte differentiation       keratinocyte
adipocyte differentiation          fat_cell
dendritic cell activation          dendritic_cell
lymphocyte proliferation           lymphocyte
T-cell homeostasis                 T_lymphocyte
garland cell differentiation       garland_cell
heterocyst cell differentiation    heterocyst

Alignment of the two ontologies will permit the generation of consistent and complete definitions
GO + Cell type = new definition

Cell type:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

New definition:
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
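As a rough illustration of how a genus-plus-differentiae definition such as the osteoblast differentiation example could be assembled mechanically from a cell type record, here is a minimal Python sketch. The function names and the record layout are hypothetical, and the names of the develops_from parents are taken from the wording of the new definition rather than looked up from the CL identifiers.

```python
from typing import List


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a simplification)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def differentiation_definition(name: str, genus_phrase: str,
                               develops_from: List[str]) -> str:
    """Compose '<cell type> differentiation' from a cell type record.

    name           the cell type's preferred term
    genus_phrase   the cell type's own definition, written as 'a B which ...'
    develops_from  preferred terms of the types it develops from
    """
    origins = " or ".join(f"{article(n)} {n}" for n in develops_from)
    return (f"{name} differentiation: Processes whereby {origins} acquires "
            f"the specialized features of {article(name)} {name}, "
            f"{genus_phrase}.")


print(differentiation_definition(
    name="osteoblast",
    genus_phrase="a bone-forming cell which secretes extracellular matrix",
    develops_from=["osteoprogenitor cell", "cranial neural crest cell"],
))
```

The point of the sketch is that the process definition inherits its wording from the cell type entry, so a change to the cell type's definition propagates to every definition built from it.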
Other ontologies to be aligned with GO
• Chemical ontologies – 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
• Anatomy ontologies – metanephros development
• GO itself – mitochondrial inner membrane peptidase activity
→ OBO core

Eventually to comprehend all of OBO

Top Level OBO-UBO
continuants: objects, characteristics, spatial regions
occurrents: processes, temporal regions, spatio-temporal regions

Definitions should be intelligible to both machines and humans
Machines can cope with the full formal representation.
Humans need modularity.

Fifth Rule: Terms and relations should have clear definitions
These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality:
– actual cells, actual portions of cytoplasm, and so on

But
Some terms are primitive (cannot be defined).
AVOID CIRCULAR DEFINITIONS!
Avoid definitions of the forms:
• An A is an A which is B (person = person with identity documents)
• An A is the B of an A (heptolysis = the causes of heptolysis)

[Diagram labels: types; substance; organism; animal; mammal; cat; siamese; frog; leaf type; instances]

Benefits of well-defined relationships
If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C).
'Find all DNA binding proteins' should also find all transcription factor proteins, because transcription factor is_a DNA binding protein.

What happens when an ontology has no clear definition of A is_a B:
• cancer documentation is_a cancer
• disease prevention is_a disease
• living subject is_a information object representing an animal or complex organism
• individual allele is_a act of observation

[Diagram: the FMA hierarchy fragment from Anatomical Space and Anatomical Structure down to Mediastinal Pleura, repeated]

How to define A is_a B
A is_a B =def. all instances of A are as a matter of biological science also instances of B
Here A and B are names of types in reality.

How to define A is_a B
A is_a B =def. for all a, if a instance_of A, then a instance_of B

Kinds of relations
Between types:
– is_a, part_of, ...
Between an instance and a type:
– this explosion instance_of the type explosion
Between instances:
– Mary's heart part_of Mary

Part_of as a relation between types is more problematic than is standardly supposed
heart part_of human being?
human heart part_of human being?
human being has_part human testis?
testis part_of human being?
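The 'find all DNA binding proteins' query discussed earlier can be sketched as a walk up the is_a hierarchy. This is a minimal Python illustration, not a description of any existing tool: the annotation table, the item names and the extra link 'DNA binding protein is_a protein' are assumptions added for the example.

```python
from typing import Dict, List, Set

# Hypothetical is_a assertions between types (child -> parents).
# Only the first link comes from the slides; the second is an assumption
# added so that the search has more than one step to climb.
IS_A: Dict[str, Set[str]] = {
    "transcription factor": {"DNA binding protein"},
    "DNA binding protein": {"protein"},
}

# Hypothetical annotations: which type each data item is an instance of.
ANNOTATIONS: Dict[str, str] = {
    "protein_1": "transcription factor",
    "protein_2": "DNA binding protein",
    "protein_3": "protein",
}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """All types reachable from term by following is_a upwards."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_all(target: str, annotations: Dict[str, str],
             is_a: Dict[str, Set[str]]) -> List[str]:
    """Items annotated with target or with any subtype of target."""
    return sorted(item for item, term in annotations.items()
                  if term == target or target in ancestors(term, is_a))


print(find_all("DNA binding protein", ANNOTATIONS, IS_A))
```

The query returns the transcription factor item only because is_a is read in the all-instances sense; with an overloaded is_a the same traversal would return items that are not DNA binding proteins at all.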
Definition of part_of as a relation between types
A part_of B =def. all instances of A are instance-level parts of some instance of B
human testis part_of adult human being

Instance level
this nucleus is adjacent to this cytoplasm
implies: this cytoplasm is adjacent to this nucleus
Type level
nucleus adjacent_to cytoplasm
Not: cytoplasm adjacent_to nucleus
seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle

Definitions of the all-some form allow cascading inferences
If A R1 B and B R2 C, then we know that every A stands in R1 to some B, but we know also that, whichever B this is, it can be plugged into the R2 relation.

[Diagram: transformation_of. The same instance c instantiates C at t and C1 at t1 along a time axis. Examples: pre-RNA and mature RNA; child and adult]

transformation_of
A transformation_of B =def. every instance of A was at some earlier time an instance of B
adult transformation_of child

[Diagram: embryological development. c instantiates C at t and C1 at t1]

[Diagram: tumor development. c instantiates C at t and C1 at t1]

[Diagram: derives_from. Instances c of C and c' of C' at t; instance c1 of C1 at t1, along a time axis. Example: zygote derives_from ovum, sperm]

One main obstacle to integrating biological and experiment-generated data
Most ontologies have no facility for dealing with time and instances.

EXPO: Experiment Ontology
• representational style part_of experimental hypothesis
• experimental actions part_of experimental design
• tool part_of experimental design (confuses object with specification)
• hypothesis driven is_a Galilean
• physical is_a scientific experiment (avoid abbreviations)
• admin info about experiment is_a scientific experiment
• where is the top level? objects, processes, characteristics

is_a and part_of never cross categorial divides (cf. tripartite organization of GO)
If A is_a B, then:
• A is an object type iff B is an object type
• A is a process type iff B is a process type
• A is a characteristic type iff B is a characteristic type

Some thoughts on time
continuants vs. occurrents
objects, characteristics vs. processes
time, timeline, day, daytime, menstrual cycle, high tide

What is time?

Top Level OBO-UBO
continuants: objects, characteristics, spatial regions
occurrents: processes, temporal regions, spatio-temporal regions
Space = the largest spatial region
Time = the largest temporal region

Relative time, subjective time
Terms describing (regions of) time in special (qualitative, perspective-dependent, landmark-dependent) ways:
tomorrow, yesterday
uptown, downtown
phase A trial
Wednesday

Characteristics are continuants
Many characteristics have realizations, applications or executions, which are processes:
plan, design, method, menstrual cycle, function

GlaxoSmithKline*
What we need is "industrial-strength" ontologies with a consistent and rich representation formalism that are amenable for use as an integration framework, and support reasoning capabilities.
We anticipate that pharma's need to bring together mountains of data and information and to properly analyse that information all depend on having a stable, well-developed semantic framework that links information/data and that allows reasoning systems to perform some of our more "mundane" analysis work.
*Robin McEntire

OBO Relation Ontology
"Relations in Biomedical Ontologies", Genome Biology, Apr. 2005
Relations for continuants behave differently from relations for processes.

part_of for component types is time-indexed
A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t

part_of for process types is not time-indexed
A part_of B =def. given any particular a, if a is an instance of A, then there is some instance b of B such that a is an instance-level part_of b

Main Upper Level Ontologies
• CYC, Cycorp (Austin, TX): human being = partially tangible thing
• SUO (Suggested Upper Ontology), IEEE: monkey, body covering
• DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)
• BFO (Basic Formal Ontology)

SUO top level
Entity
  Physical
    Object
      SelfConnectedObject
        Substance
        CorpuscularObject
        Food
      Region
      Collection
      Agent
    Process
  Abstract
    SetOrClass
    Relation
    Quantity
      Number
      PhysicalQuantity
    Attribute
    Proposition

MIGS Specification Top Levels
Organism
Phenotype
Environment
Sample Process
Data Process
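The time-indexed, all-some definition of part_of given earlier can be checked against a set of instance-level facts. The Python below is a minimal sketch under stated assumptions: the tuples, the instance names and the function name are invented for illustration, and the sketch additionally requires b to be an instance of B at the same time t, which is one reading of the definition.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, int]

# Hypothetical instance-level facts.
# (particular, type, time): the particular instantiates the type at that time.
INSTANCE_OF: Set[Fact] = {
    ("testis_1", "human testis", 1),
    ("john", "adult human being", 1),
    ("mary", "adult human being", 1),
}
# (part, whole, time): instance-level part_of at that time.
PART_OF_AT: Set[Fact] = {
    ("testis_1", "john", 1),
}


def type_level_part_of(a_type: str, b_type: str,
                       instance_of: Set[Fact], part_of_at: Set[Fact]) -> bool:
    """A part_of B holds if every instance of A at t is part of some B at t.

    With no instances of A on record the check passes vacuously, so it can
    refute an assertion but cannot establish it from incomplete data.
    """
    for particular, typ, t in instance_of:
        if typ != a_type:
            continue
        # Candidate wholes: instances of B at the same time t (assumption).
        wholes = {b for b, b_typ, b_t in instance_of
                  if b_typ == b_type and b_t == t}
        if not any((particular, b, t) in part_of_at for b in wholes):
            return False
    return True


print(type_level_part_of("human testis", "adult human being",
                         INSTANCE_OF, PART_OF_AT))
print(type_level_part_of("adult human being", "human testis",
                         INSTANCE_OF, PART_OF_AT))
```

The first check succeeds and the second fails, which mirrors the asymmetry of the all-some reading: every testis on record is part of some adult human being, but not every adult human being has a testis as a part.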
