SYMining Solutions
Phenotype Database Merger
David Tupek Jie Chou
Department of Bioengineering University of Illinois at Urbana-Champaign
March 11, 2008
Problem Statement
• Recording phenotype classification has developed into a non-standardized and non-collective science. Currently there is no program allowing the merger of phenotypic databases. A project is needed to merge current databases into a single organized, standard-format, updatable database.
Mission Statement
• The project will produce software allowing the user, Lawrence Livermore National Laboratory (LLNL), to independently download the most current version of each user-designated database, parse biologically meaningful information into a standardized format, merge the data into a single database, and generate a summary report. We will work with LLNL to formulate customer requirements. Our team will review the targeted databases, code the software, design a practical user interface, and validate results. The resulting product will be a single comprehensive, standardized database compiling the biologically relevant information from the phenotype classifications of the targeted databases.
Background
• Phenotype
– “describes any observed quality of an organism, such as its morphology, development, or behavior, as opposed to its genotype” – “the phenotype is not simply a product of the genotype, but is influenced by the environment to a greater or lesser extent”
• Wikipedia.org; http://en.wikipedia.org/wiki/Phenotype
Current Databases
• Individual collections of phenotype data
• Phenotypes defined independently by separate databases
• Researchers choose database to use
• Add their results to one of these databases OR begin their own data collection
• No unified set of phenotype data currently
• Difficult to keep up with all of the most current databases across the internet
Customer Needs
• Parse relevant information out of database content
• Search special cases (LLNL defined)
• Contradictory information between DBs
• User-specified updates of the targeted databases
• Software determines phenotype definitions whenever possible
• Summary report / special cases
• Continuous log kept of merged DB changes
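The first need, parsing relevant information into a standardized format, might look roughly like the sketch below. It is written in Python for illustration only, not in the project's Perl/PHP. The tab-delimited input, the column names `entry` and `phenotype`, and the function name `parse_phenotype_records` are all assumptions; the real source formats are not specified here.

```python
import csv
import io


def parse_phenotype_records(text, source):
    """Parse tab-delimited text into standardized phenotype records.

    Assumed (hypothetical) input: a header row with 'entry' and
    'phenotype' columns. Rows missing either value are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        entry = (row.get("entry") or "").strip()
        phenotype = (row.get("phenotype") or "").strip()
        if not entry or not phenotype:
            continue  # nothing biologically meaningful to keep
        records.append({
            "entry": entry,
            # Normalize case so the same phenotype compares equal across sources.
            "phenotype": phenotype.lower(),
            "source": source,
        })
    return records
```

Keeping the source name on every record is one way to support the later report and change log, since each merged value can be traced back to the database it came from.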
Design Requirements
• Software will download remote databases from user-specified URLs
• Software will parse remote databases for phenotypic information from
• User determines similar phenotypes between remote databases
• User resolves phenotype naming conflicts
• User resolves conflicting phenotypic data entries
• Software will merge remote databases into a comprehensive phenotype database
– Based on user selections
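The merge requirement, with conflicts left to the user, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's Perl/PHP implementation: each database is modeled as a hypothetical mapping from an entry identifier to a phenotype string, and `user_selections` (also hypothetical) maps a conflicting entry to the name of the database whose value the user chose.

```python
def merge_databases(databases, user_selections=None):
    """Merge per-source phenotype data, deferring conflicts to the user.

    databases: {db_name: {entry: phenotype}}
    user_selections: {entry: db_name} choices for conflicting entries
    Returns (merged, conflicts), where conflicts holds unresolved entries.
    """
    user_selections = user_selections or {}

    # Collect every value reported for each entry, keyed by source database.
    seen = {}
    for db_name, entries in databases.items():
        for entry, phenotype in entries.items():
            seen.setdefault(entry, {})[db_name] = phenotype

    merged, conflicts = {}, {}
    for entry, by_db in seen.items():
        values = set(by_db.values())
        if len(values) == 1:
            # All sources agree (or only one source has the entry).
            merged[entry] = next(iter(values))
        elif user_selections.get(entry) in by_db:
            # The user picked which database wins for this entry.
            merged[entry] = by_db[user_selections[entry]]
        else:
            # Unresolved: report it instead of guessing.
            conflicts[entry] = by_db
    return merged, conflicts
```

A first call without selections returns the conflicts to show the user; a second call with the user's choices completes the merge.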
Concept Design
Alternative Concepts
Component Decomposition Diagram
Function Decomposition Diagram
Product Configuration
• Physical and Functional Elements
• Clustered Elements
• Clustered Layout
• Incidental Interaction Graph
• Configuration of Special Parts
Physical and Functional Elements
Website Data Grabber Parser New Data Storage
Latest Separate Databases Comparisons
Conflicting Entry Rules
Updated Databases
User Selections
Merged Database
User Selection Logs
Report Generator
Report Logs
Clustered Layout
Software Run Initiation
Website Data Grabber
Data Extraction
Parser
New Data Storage
Logic Decisions
Comparisons
Conflicting Entry Rules
Logs
Latest Separate Databases Updated Databases
User Selections
User Selection Log
Results
Merged Database Report Generator
Report Log
Clustered Elements
Cluster Data Flow
Client Side
Data Extraction
Logic Decisions
Results
Logs
Incidental Interaction Graph
Client Side
Connection to Server/ Targeted Database Error
Data Extraction
Logic Decisions
Results
Connection to Server Error
Logs
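The connection errors in the interaction graph suggest that a failed download of one targeted database should be logged rather than stopping the run. The Python sketch below illustrates that idea only; the project itself names Perl with LWP for this step. The function name `download_databases`, the injectable `fetch` parameter, and the log message format are all assumptions.

```python
import urllib.request


def download_databases(urls, fetch=None, log=None):
    """Download each targeted database, logging connection errors.

    urls: {db_name: url}
    fetch: optional callable(url) -> bytes; defaults to a urllib download
    Returns (results, log), where results holds only successful downloads.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()

    log = log if log is not None else []
    results = {}
    for name, url in urls.items():
        try:
            results[name] = fetch(url)
            log.append(f"OK {name}")
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            log.append(f"ERROR {name}: {exc}")
    return results, log
```

Passing the fetcher in as a parameter keeps the error handling testable without network access.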
Configuration of Special Parts
PHP
Perl
Targeted Databases
LWP
* Special Parts are non-standard Perl/PHP libraries