Modeling Issues Encountered from Building a Taxonomy from a Biology Textbook
A. Patrice Seyed (a), John Pacheco (b), Andrew Goldenkranz (b), and Vinay Chaudhri (b)
(a) Department of Computer Science and Engineering, University at Buffalo, Buffalo, New York; (b) SRI International, Menlo Park, CA

•   Task to create a taxonomy from an AP Biology textbook’s glossary terms (Campbell and Reece, 8th Edition)

Materials and Methods
•   Imported ~2400 glossary terms and definition strings from electronic glossary into Collaborative Protégé in OWL format, as classes and comment strings
•   Use of just the subclass-of relation
•   Team consisted of biologists and KR specialists
•   Biologists attempted initial classifications, where modeling issues were identified and solutions proposed

Results

Linnaean Classification
•   Biologists wanted to classify the different kingdoms under the class Kingdom
•   However, there are 5 instances of Kingdom
•   Biologists also wanted to relate the different levels of classification of the Linnaean taxonomy to each other in the familiar hierarchical way
    •   However, an instance of Phylum is not an instance of Kingdom
Solution
•   Built Linnaean taxonomy under organism
•   Used common English names for simplicity
    •   “Moss is a Plant” is clearer than “Moss is a Plantae”
•   Made the classes that represent the tiers in that hierarchy into “Classification-Units”

Entity/Process Dichotomy
•   Biologists wanted to classify Light-Microscope under the subclass Technology
•   Biologists also wanted to classify Technology under the subclass Inquiry
•   These two uses of the term “Technology” refer to two different senses
•   Glossary definition for Technology: “The application of scientific knowledge for a specific purpose, often involving industry or commerce but also including uses in basic research.”
Solution
•   In this case it was simple to notice the problem and refactor the taxonomy
•   Some terms are polysemous
    •   Definitions including “; also” are a warning sign
    •   Example: Wild Type is “An individual with the phenotype most commonly observed in natural populations; also refers to the phenotype itself.”

Subclass/Subevent
•   There was a strong initial tendency to use the hierarchy to organize sub-parts or sub-processes
•   E.g., Telophase is a subclass of Mitosis, instead of a sub-process
Solution
•   Move parts or sub-processes to appropriate locations whenever they are found
•   In training sessions, reinforce the subclass-of relationship

Entity/Role Dichotomy
•   Initial encoding of organic molecules included classes defined by structure and classes defined by function
    •   Proteins and steroids are defined by their chemical composition
    •   Hormones are defined by the function they perform
•   There is an overlap between Steroid and Hormone
    •   Some hormones are steroids, others are proteins
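The overlap described above can be made concrete with a minimal sketch. The member molecules below are illustrative assumptions chosen for this example (the poster does not list specific members), and `is_subclass` and `overlap_report` are hypothetical helper names. Modeling each class as a set shows that neither Steroid nor Hormone can be a subclass of the other, even though they share members.

```python
# Illustrative members only; these sets are assumptions for the sketch,
# not content taken from the glossary.
STEROIDS = frozenset({"cholesterol", "testosterone", "estradiol"})  # defined by structure
PROTEINS = frozenset({"insulin", "collagen"})                        # defined by structure
HORMONES = frozenset({"testosterone", "estradiol", "insulin"})       # defined by function


def is_subclass(sub: frozenset, sup: frozenset) -> bool:
    """Subclass-of holds only if every member of `sub` is also a member of `sup`."""
    return sub <= sup


def overlap_report() -> dict:
    """Summarize how the function-defined class cuts across the structure-defined ones."""
    return {
        "steroid_hormones": sorted(STEROIDS & HORMONES),
        "protein_hormones": sorted(PROTEINS & HORMONES),
        "hormone_subclass_of_steroid": is_subclass(HORMONES, STEROIDS),
        "steroid_subclass_of_hormone": is_subclass(STEROIDS, HORMONES),
    }


if __name__ == "__main__":
    for key, value in overlap_report().items():
        print(f"{key}: {value}")
```

Because both subclass checks fail while the intersections are non-empty, a hierarchy built from the subclass-of relation alone cannot place Hormone cleanly under either structural class.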

Solution (Entity/Role Dichotomy)
•   Hormones are defined as roles that certain chemicals play
•   Steroid-Hormone is still a class in the taxonomy which represents a useful class of biologists’ intuitive thinking

Potential Refinements (Linnaean Classification)
•   Add the Latin-named classes as instances of their Classification-Unit
    Examples:
    •   Animalia instance of Kingdom
    •   Plantae instance of Kingdom
•   Treat Classification Units as meta-classes
    •   Animal would be an instance of the meta-class Kingdom
    •   Animalia would be a synonym for that class

Classifying Areas of Research
•   Initial tendency to classify areas of research (Genetics, Anatomy, Ecology) under Inquiry
•   Areas of research are complex, involving research activities and educational institutions constituted of departments, faculty members, programs and curricula.
•   However, the definitions of the terms for each area of research are prefixed by “the scientific study of”
•   Given commitment to processes, classification under Inquiry is appropriate.

Conclusions
•   Initially, biologists relied on prior knowledge and definitions for organizing class hierarchy, with classes treated as organizational “buckets”
•   Ontological principles were applied to identify modeling issues and provide a foundation during the taxonomy-building process
•   The biologists now have a much better sense for these types of modeling issues

Acknowledgements
This work was funded by Vulcan Inc.
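The meta-class refinement proposed for the Linnaean classification can be pictured with a small sketch that keeps subclass-of and instance-of as separate relations. The `Taxonomy` class, its method names, and the Chordate entry are hypothetical and introduced only for illustration; this is not the encoding used in the project, which was built in Collaborative Protégé.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Taxonomy:
    """Toy taxonomy: subclass-of and instance-of are stored separately."""
    superclass: Dict[str, str] = field(default_factory=dict)  # class -> parent class
    unit: Dict[str, str] = field(default_factory=dict)        # class -> Classification-Unit
    synonym: Dict[str, str] = field(default_factory=dict)     # Latin name -> English name

    def add(self, name: str, parent: str,
            unit: Optional[str] = None, latin: Optional[str] = None) -> None:
        self.superclass[name] = parent
        if unit is not None:
            self.unit[name] = unit        # e.g. Animal is an instance of Kingdom
        if latin is not None:
            self.synonym[latin] = name    # e.g. Animalia is a synonym for Animal

    def resolve(self, name: str) -> str:
        """Map a Latin synonym to its English class name, if one is registered."""
        return self.synonym.get(name, name)

    def is_subclass(self, sub: str, sup: str) -> bool:
        """Walk the subclass-of chain upward from `sub` looking for `sup`."""
        current: Optional[str] = self.resolve(sub)
        target = self.resolve(sup)
        while current is not None:
            if current == target:
                return True
            current = self.superclass.get(current)
        return False

    def unit_of(self, name: str) -> Optional[str]:
        """Return the Classification-Unit the class is an instance of, if any."""
        return self.unit.get(self.resolve(name))


def build_example() -> Taxonomy:
    t = Taxonomy()
    t.add("Animal", "Organism", unit="Kingdom", latin="Animalia")
    t.add("Plant", "Organism", unit="Kingdom", latin="Plantae")
    t.add("Chordate", "Animal", unit="Phylum", latin="Chordata")  # illustrative entry
    return t
```

In this sketch Chordate is a subclass of Animal while being an instance of Phylum rather than Kingdom, which is the distinction a single subclass-of hierarchy could not express.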
