Proceedings of the 10th International Conference on Computer Supported Cooperative Work in Design
1-4244-0165-8/06/$20.00 ©2006 IEEE.

A System Architecture Design for Knowledge Management (KM) in Medical Genetic Testing (MGT) Laboratories

Yulong Gu1, James Warren2, Jan Stanek1, Graeme Suthers3
1 Advanced Computing Research Centre, University of South Australia, Australia
firstname.lastname@example.org; Jan.Stanek@unisa.edu.au
2 Department of Computer Science, University of Auckland, New Zealand
email@example.com
3 Familial Cancer Unit, the Women’s and Children’s Hospital, Adelaide, Australia
firstname.lastname@example.org

Abstract

Although genetic services have potential value for health care and disease control, there is no systematic solution for knowledge capture in Medical Genetic Testing (MGT) Laboratories. This paper addresses this knowledge management (KM) technology weakness in the genetics domain, then proposes an Information System (IS) architecture to establish process automation and content management of the distributed workflow of knowledge generation and knowledge management (KG&KM) during MGT result interpretation. The presented IS will validate the interpretation decision by using Information Systems / Information Technologies (IS/IT), especially KM tools such as Workflow Management System (WfMS), search engine and groupware. Once developed and implemented, our integrated system will significantly improve MGT lab researchers’ KG&KM performance through increasing knowledge capture, improving documentation quality and maintaining (if not improving) users’ information satisfaction.

Keywords: Knowledge Management (KM), Medical Genetic Testing (MGT), Workflow Management System (WfMS), search engine, groupware.

1. Introduction

Genetic services provide both clinical and laboratory services for those with, or concerned about, a disorder with a significant genetic component and their families. In daily genetic services, a 'genetic disorder', in the form of a human gene sequencing variant, could be detected in a Medical Genetic Testing (MGT) Laboratory. After studying the gene variant, MGT lab researchers would issue an MGT report with variant description and interpretation, then send it to the test requesting clinicians. During this process, laboratory genetics research is heavily supported by Information Systems / Information Technologies (IS/IT) for
1. data storage and information documentation, for example, Word Processing and Database (DB) tools;
2. data analysis, e.g. base sequence referencing via GenBank, the DNA Data Bank of Japan (DDBJ), and/or European Molecular Biology Laboratory (EMBL) Sequence DB; alignment analysis by DNASIS® MAX / Mutation Surveyor™ / CodeLink™ Bioarray Systems / SNPs3D / STRAP; phenotype-genotype correlation analysis by 'UMD (Universal Mutation Database) Central Phenotype-Genotype Analysis'; mutation analysis by 'UMD Central Mutation Analysis', et cetera;
3. information searching, e.g. National Center for Biotechnology Information (NCBI) search engine and Data Mining tools, UMD Central tool, Google search engine, etc.; and
4. information transferring, e.g. Internet, Email, etc.

In an ideal scenario, researchers in MGT labs will only issue an MGT report with a validated interpretation of their identified gene sequencing variant(s). To validate variant interpretation, a consensus about the variant significance (or otherwise) needs to be developed, often by checking accessible information resources and/or by discussion within the genetics research community. Variant significance may indicate the association strength between phenotype and genotype of the variant, i.e. the relevance between a certain disease and the gene variant. Therefore, a valid MGT report may add to human genetic variation (mutation) knowledge and to human disease knowledge as well. In other words, the knowledge generated during MGT result interpretation may have some implications for health care and disease control.

However, no existing IS/IT solution effectively supports the interpretation decision-making, especially regarding the variant significance. Also, there is no evidence showing an efficient, reliable and integrated information resource being used in MGT labs. Meanwhile, the variation details contained in MGT reports are seldom submitted to online DBs, despite the fact that the data is likely of high quality and of high relevance to others working in the diagnostic community. Thus, neither the knowledge generation (KG) performance nor the knowledge management (KM) performance in MGT labs is satisfactory, and both may be improved with adequate IS/IT support.

2. Theory base

Since the 1950s, the role and importance of IS/IT have undergone a substantial transition from data processing, through management services, to information processing. The ongoing fourth 'IS capability' Era, identified as 'to create and sustain IS/IT-based competitive advantage', requires explicit/tacit organizational knowledge to be constantly documented and managed, which extends the Knowledge Management (KM) trend from the 1990s. In this paper, KM refers to the technology that deals with the following issues: where to find knowledge, how to classify/store/maintain/use it, how to ensure its quality, and how to motivate people to contribute. KM tools include Email/messaging, Document management, Search Engines, Enterprise information portal, Data warehouse, Groupware, Workflow management, Web, etc. Some of these tools are applied in this project due to the users’ information requirements in MGT labs.

2.1. Workflow Management System (WfMS)

Workflow is the automation of a business process, while a Workflow Management System (WfMS) is a system that defines, creates and manages the execution of workflows through the use of software. WfMSs have been successfully utilized in a variety of organizations to increase efficiency and reduce cost. Specifically, scientific WfMSs are emerging from a number of academic grounds primarily for research process automation, e.g. BioMAX systems and the Taverna project in the life science domain. WfMS is regarded as an essential component of our system because it provides both process automation and activity audit-ability.

2.2. Search engine

A search engine is a program designed to help find information by allowing one to ask for content meeting specific criteria and retrieving a list of references that match those criteria. W3C has recommended several technologies to enhance search engines, such as Web Services Description Language (WSDL), Resource Description Framework (RDF), Web Ontology Language (OWL), Metadata, etc. These may be deployed in developing our search engine to establish communications with existing query service providers, such as NCBI and UMD.

2.3. Groupware

Groupware refers to the software products that provide collaborative support to groups; i.e., it offers a mechanism for teams to share opinions, data, information, knowledge, and other resources. The groupware element in our system is to meet the increasing requirements for collaboration and communication among the (distributed) genetics community.

2.4. IS/IT adoption status in MGT domain

Since most scientific research deliverables are precise knowledge, KM approaches that implement IS/IT are growing in the academic field. In the genetics domain, enormous informatics efforts are made especially by two associations: the Human Genome Variation Society (HGVS) and the National Genetics Reference Laboratories (NGRL) in Manchester. For instance, the HGVS-recommended mutation DB guidelines are regarded as standards for General DB, Locus Specific Database (LSDB) and Central DB. Meanwhile, NGRL summarizes the overall workflow in MGT labs and highlights general IS requirements for Genetic Services. However, so far there is no systematic solution supporting KG or KM in MGT labs, which inspires our project of technology development for KM issues in the domain.

3. System architecture design

Having addressed the KM technology weakness in the genetics domain, we developed an Information System (IS) architecture to establish process automation and content management of the distributed KG&KM workflow during MGT result interpretation. By adapting and extending KM tools, our portal system has two primary objectives: to maximize knowledge capture from MGT labs, and to achieve high quality in documentation and interpretation of MGT results. Based on the scenario mentioned before, MGT lab researchers will only issue an MGT report containing a validated gene variant interpretation. Aiming at producing such a valid MGT report, we decided to apply WfMS for automating and auditing the variant interpretation process (as the key KG&KM procedure), to develop an intelligent search engine for aiding information searching tasks, and to deploy groupware for assisting group work. By integrating these tools, our IS will have at least two properties:
• Managing knowledge capture and knowledge diffusion (knowledge 'flow') from MGT labs to the test requesting party, journals, (LS)DBs, etc.;
• Support for making and documenting highly-structured research decisions (e.g. MGT result interpretation), and support for regular revision of these decisions.

3.1. System analysis

Before designing a system architecture with the above functionalities, we analyzed the existing business processes in MGT labs. This began with choosing a workflow model among process-oriented models, object-oriented models, goal-focused models and task-based models. Considering the flexibility of these models versus the complexity of MGT research tasks (by applying a use case model), an object-oriented behavior-based workflow model was chosen to examine KG&KM-related research workflow in MGT labs. As a result, Figure 1 pictures an MGT report-producing workflow instance (for variant 'A').

[Figure 1: workflow diagram with five numbered nodes]
1 = Name the 'Variant A'
2 = Collect available documentation about Variant A
3 = Make the interpretation decision
4 = (Virtual) lab meeting to achieve 'consensus' for the interpretation of Variant A
5 = Submit the 'MGT report' of Variant A to the test requesting party, journals, LSDBs, Central DB, etc.

Figure 1. Behavior-based workflow instance diagram for producing a MGT Report of 'Variant A'

As a simple case in MGT labs, Figure 1 illustrates an optimal workflow of producing a valid MGT report for gene sequencing variant 'A'.

Step 1: a detected and identified variant is named by applying standard nomenclature, as recommended by HGVS.

Step 2: MGT lab researchers may need to collect existing research results about the variant as reference. According to Dr. Scott Grist, senior MGT lab researcher in Adelaide Flinders Medical Centre, 55% of the variants detected in his lab show strong negative (or positive) pathogenicity, which indicates a clear diagnosis (interpretation). For the remaining 45% of variants, the diagnosis depends mostly on previous knowledge about the variant-related DNA/RNA/Protein, including their conservation (through alignment analysis), pathological impact (by phenotype-genotype analysis) and significance. Such data is gathered from literature/DBs and is regarded as key evidence for the interpretation decision.

Step 3: the variant is interpreted (so called 'curated' or 'diagnosed') by MGT lab researchers.

Step 4: a (virtual) lab meeting across the genetics community could be conducted so as to achieve 'consensus' on the variant interpretation. During the discussion, further information searching may be needed to obtain an interpretation decision that all participants agree on.

Step 5: an MGT report is produced and submitted to the test requesting party, and possibly to journals/DBs. So far, easy submission to DBs has not been realized. As a matter of fact, Dr. Scott Grist states that he is sitting on at least 100 unsubmitted variants so far. Given the large number of MGT labs around the globe, there must be an enormous wealth of human genetic variation knowledge recognized locally, but not accessible to the community, and undeniably worth managing.

3.2. System design

Based on the Figure 1 workflow instance, a reasonable system architecture is designed, as shown in Figure 2.

[Figure 2: architecture diagram. Labels recoverable from the original: MGT Result; Base Sequence Reference; Variant identification; Sequence Variant; sources searched (literature, DBs such as local DB and LSDB, search engines such as NCBI); Interpretation Decision; Groupware; MGT Report of Variant A; submission to the test requesting party.]

Figure 2. Proposed system architecture (with data flow) to produce a valid MGT report

A raw MGT result is identified as a gene variant by applying sequence analysis software, such as Mutation Surveyor, and/or comparing with the base sequence, which is stored in base sequence reference DBs, such as GenBank. The identified variant has to be named appropriately as a 'Sequence Variant' using standard nomenclature. Then, information searching may be performed using a metasearch engine. The query results from multiple sources (e.g. literature, DBs, and even search engines) will be recorded along with a search audit trail. Afterwards, a full document consisting of the variant description, relevant records of the variant and the search audit trail will be sent to MGT lab researchers for the interpretation decision. Later, this decision may be discussed by a group of geneticists to reach interpretation 'consensus', through which the data quality of the MGT report will be improved. With geneticists in various locations performing group work, collaborating and communicating with each other regularly, several groupware functions would be used frequently, such as virtual conference, whiteboard and so forth.

At the last step, an MGT report of the 'Sequence Variant' with the consensus interpretation will be produced and submitted to the test requesting party (e.g. genetic clinicians) and possibly to journals/DBs. This dissemination task would become much easier with a standard variation data format. Fortunately, HGVS is proposing a 'mutation entry form', which may offer a typical messaging format to support systematic submission of variant details. Hopefully, submission automation, as one of the ultimate deliverables of our system, will be realized along with international standardization of the variation data format.

3.3. System developing tools

To facilitate our system architecture, Business Process Execution Language (BPEL) is proposed as the programming language, aiming to produce portable business processes and to enable programming in the large. Furthermore, Oracle BPEL Process Manager is a possible development tool in this project since it offers a comprehensive and easy-to-use infrastructure for creating, deploying and managing BPEL business processes. In addition, components from a range of open-source scientific WfMSs may be 'plugged' into our system, e.g. the Scientific Process Automation (SPA) program, the Taverna project, etc.

The expected search engine capabilities will support queries to multiple sources and will keep track of searches. Accordingly, metadata/RDF/WSDL/OWL techniques are proposed to facilitate communications between our system and multiple DBs/search engines.

Further participant studies later in this project will ensure that the proposed groupware tools (e.g. Whiteboard/Email/E-conference/Internet) match precise Time/Place communication requirements. It is also possible to add a Group Decision Support System (GDSS) feature to the system. Web services will be applied to assist communications among all system components.

4. Discussion

Firstly, our system architecture, once developed and implemented, will help capture human genetic variation knowledge from MGT labs, which is to be measured by MGT result submission attempts to journals and DBs. Secondly, the quality of the documentation and interpretation of MGT results will improve, which is to be evaluated based on the feedback from clinical geneticists, who request the test in the first place and receive the result eventually. Thirdly, the usability and acceptance of our system is expected to be satisfactory, which will be checked by a future user survey using Doll and Torkzadeh’s ‘User Satisfaction’ instrument.

With the WfMS feature in our system, the ability for business processes to be modeled and monitored in real time will increase, and those processes will become more easily changed in response to volatile market trends and technology. Therefore, our system will preserve the flexibility of the knowledge-intensive research process in MGT labs. Additionally, the audit-ability of WfMS will be explored through recording and sharing the track of MGT lab researchers’ job tasks. Meanwhile, the search engine capacity will offer more complete query results from multiple sources, such as literature, DBs (including those offering analysis services) and search engines (including NCBI and UMD tools). It will deliver a comprehensive document about the variant with molecular and clinical data, which may be of interest to clinicians (e.g. phenotype-genotype correlations), to geneticists (e.g. distribution and frequency of mutations) and to research biologists (e.g. structural domains and molecular epidemiology). As another essential element of our system, groupware will provide functionalities on all levels of the information flow axis, i.e., the management of connection, communication, content and process.

To ensure the usability and acceptance of our system, several actions are taken, including technology readiness assessment, user information requirement study and system implementation planning. Moreover, potential system users were involved during system design and will take part in system development, system implementation and system usage training. In later phases of the project, additional issues about system acceptance may be addressed, such as organizational involvement and commitment to managing the change process. On the other hand, to avoid duplicated engineering effort in the MGT domain, we will keep watching and collaborating closely with international project groups, e.g. NGRL’s DmuDB project. Last but not least, the current system architecture (Figure 2) could be expanded with future progress in the genetics field, for instance, if an analysis methodology for 'complex genetic disorders' involving interactive variants and multiple diseases is established.

5. Conclusion

This paper proposes an IS architecture designed to solve KM problems in MGT laboratories by utilizing WfMS, search engine and groupware technologies. It is expected that in a couple of years, the proposed system will be developed and implemented in several MGT labs
As a result, it will establish process  GE Healthcare (2005). "CodeLink™ Bioarray Systems", automation and content management of the distributed http://www4.amershambiosciences.com/aptrix/upp01077.nsf/Co workflow of KG&KM in MGT labs. Through validating ntent/Products?OpenDocument&ParentId=568694 (2005-11-14). MGT result interpretation, the system will significantly  Gokhale, DA., Devereau, AD and Taylor, GR. (2004). "A improve MGT lab researchers’ KG&KM performance, in new online mutation repository for UK diagnostic laboratories", http://www.genomic.unimelb.edu.au/mdi/pdfb.pdf (2005-11-03). terms of increasing knowledge capture, improving documentation quality and maintaining (if not improving)  Goland, Y.Y. (2005). "The Promise of Portable Business Processes", Web Services Journal, 2(12) users’ information satisfaction. http://ftpna2.bea.com/pub/downloads/BPEL4WS_WSJ.PDF (2005-11-14). Acknowledgement  Google (2005). "Google", http://www.google.com/ (2005- 11-18). The authors of this paper would like to thank many  HGVS (2005a). "Nomenclature for the description of people for their assistance during this project. Dr. Sui Yu sequence variations", at Adelaide Women’s and Children Hospital, Dr. http://www.genomic.unimelb.edu.au/mdi/mutnomen/ (2005-11- Jacqueline Carroll, Dr. Graeme Casey, and Dr. Glenice 14). Cheetham at the Institute of Medical and Veterinary  HGVS (2005b). "Allele Variant Entry Form", Science, as well as Dr. Scott Grist at Flinders Medical http://www.genomic.unimelb.edu.au/mdi/entry.html (2005-11- Centre, provided us with many expertise opinions and 14). valuable recommendations. We are also indebted to Prof.  Horaitis, O. and Cotton, R. G. H. (2004). "The challenge Paul Swatman and Prof. Markus Stumptner in the of collecting mutations across the genome: The Human Genome University of South Australia for their constructive Variation Society approach", Hum Mutat 23:447–452. comments. Here we want to express our deepest  ITPerspectives Inc. (2005). 
"NHS Genetic Service appreciation to every person who contributed with either Information Systems Output Based Specification", http://www.ngrl.org.uk/Manchester/Pages/Downloads/ITP- inspirational or actual work in the project! GEN3-OBSv05.pdf (2005-11-03).  Laboratory of Human Genetics Montpellier (France). References (2005a). "The UMD Central Mutation Analysis",  Allen, R. (2001). "Workflow: An Introduction", in http://188.8.131.52:2200/mutations.html (2005-11-14). Workflow Handbook 2001, Fischer L. (ed), WfMC (Workflow  Laboratory of Human Genetics Montpellier (France). Management Coalition), 15-38. (2005b). "The UMD Central Phenotype-Genotype Analysis",  Anderson, J. G. (1997). "Clearing the way for physicians' http://184.108.40.206:2200/CLIN.shtml (2005-11-14). use of clinical information systems", Association for Computing  Laboratory of Human Genetics Montpellier (France). Machinery. Communications of the ACM 40(8): 83. (2005c). "The UMD Central", http://220.127.116.11:2200/  Antonarakis, S.A. and the Nomenclature Working Group. (2005-11-18). (1998). "Recommendations for a nomenclature system for  Lau, F. and M. Herbert (2001). "Experiences from health human gene mutations", Hum. Mutat. 11: 1-3. information system implementation projects reported in Canada  Biomax. (2004). "Biomax Solutions: Integrated Life between 1991 and 1997", Journal of End User Computing 13(4): Science Software and Services", 17. http://www.biomaxsolutionsinc.com (2005-09-12).  Liebowitz, J. (1999). Knowledge management handbook,  Brown, S. A., A. P. Massey, et al. (2002). "Do I really Boca Raton, FL: CRC Press. have to? User acceptance of mandated technology", European  MiraiBio (2005). "DNASIS® MAX v2.5", Journal of Information Systems 11(4): 283. http://www.miraibio.com/products/cat_bioinformatics/view_dna  Browne, E. D., Schrefl, M. and Warren, J. R. (2004). sismax/ (2005-11-14). "Goal-Focused Self-Modifying Workflow in the Healthcare  NCBI (2005a). 
"GenBank Database", Domain", in Proceedings of the Proceedings of the 37th Annual http://www.psc.edu/general/software/packages/genbank/genban Hawaii international Conference on System Sciences (Hicss'04) k.html (2005-11-14). - Track 6 - Volume 6 (January 05 - 08, 2004). HICSS. IEEE  NCBI. (2005b). "Tools for Data Mining", Computer Society, Washington, DC, 60145.2. http://www.ncbi.nlm.nih.gov/Tools/ (2005-11-14).  BSCW. (2005). "Workflow",  NIG (2005). "DNA Data Bank of Japan", http://bscw.fit.fraunhofer.de/V42workflow.html (2005-09-12). http://www.ddbj.nig.ac.jp/ (2005-11-14).  Doll, W. J. and Torkzadeh, G. (1988). "The measurement  Oracle (2005). "Oracle BPEL Process Manager", of end-user computing satisfaction", MIS Quarterly, 12(2), 259- http://www.oracle.com/technology/products/ias/bpel/index.html 274. (2005-11-02).  EMBL-EBI (2005). "Nucleotide Sequence Database",  Preuner, G. and Schrefl, M. (2000). "A three-level schema http://www.ebi.ac.uk/embl/ (2005-11-14). architecture for the conceptual design of web-based information  Finnie, G. (2005). "Business Intelligence and Data systems: from web-data management to integrated web-data and Warehousing", web-process management", World Wide Web 3, 2 (Mar. 2000), http://www.it.bond.edu.au/inft323/043/Lectures/lecture%206%2 125-138. 0023.ppt (2005-11-03). Proceedings of the 10th International Conference on Computer Supported Cooperative Work in Design  Prior, C. (2003). "Workflow and Process Management", in  The Medical School Charite Berlin (2005). "Multiple Workflow Handbook 2003, Fischer L. (ed), FL: Future Sequence Alignment Interactive Program STRAPNT", Strategies Inc. http://www.charite.de/bioinf/strap/ (2005-11-14).  Schmidt, M.T. (1998). "Building Workflow Business  Turban, E. and Aronson, J.E. (2001). Decision Support Objects, Object-Oriented Programming Systems Languages Systems and Intelligent Systems (6th. 
ed.), Upper Saddle River, Applications", OOPSLA'98 Business Object Workshop, London, NJ: Prentice Hall. ISBN: 013-089465-6 (TA). Springer, 1998.  UMBI (2005). "SNPs3D", http://www.snps3d.org/ (2005-  Schrefl, M. and Stumptner, M. (2002). "Behavior- 11-14). consistent specialization of object life cycles", ACM Trans.  W3C. (2001a). "Web Services Description Language Softw. Eng. Methodol. 11, 1 (Jan. 2002), 92-148. (WSDL) 1.1", http://www.w3.org/TR/wsdl (2005-09-12).  Scriver, C.R., Nowacki, P.M. and Lehväslaiho, H. (1999).  W3C. (2001b). "Metadata and Resource Description", "Guidelines and recommendations for content, structure and http://www.w3.org/Metadata/ (2005-09-12). deployment of mutation databases", Hum. Mutat. 13: 344-350.  W3C. (2004a). "Resource Description Framework (RDF)",  Scriver, C.R., Nowacki, P.M. and Lehväslaiho, H. (2000). http://www.w3.org/RDF (2005-09-12). "Guidelines and recommendations for content, structure and  W3C. (2004b). "Web-Ontology (WebOnt) Working Group deployment of mutation databases II: Journey in progress", Hum. (Closed)", http://www.w3.org/2001/sw/WebOnt/ (2005-09-12). Mutat. 15: 13-15.  Ward, J. & Peppard, J. (2002). Strategic Planning for  SDM Center (2004). "Scientific Process Automation Information Systems, John Wiley & Sons: Chichester. (SPA)", http://www-casc.llnl.gov/sdm/ (2005-11-18).  Wheeler, D.L., Chappey, C., Lash, A.E., Leipe, D.D.,  SoftGenetics (2005). "Mutation Surveyor™ ... A Unique Madden, T.L., Schuler, G.D., Tatusova, T.A. and Rapp, B.A. Research Tool", http://www.softgenetics.com/ms/index.htm (2000). "Database resources of the National Center for (2005-11-14). Biotechnology Information", Nucleic Acids Res., 28, 10–14.  Somogyi E K and Galliers R D. (2002). "Developments in  Wikipedia. (2005a). "Search Engine", the Application of Information Technology in Business: from http://en.wikipedia.org/wiki/Search_engine (2005-09-12). Data Processing to Strategic Information Systems", in Galliers  Wikipedia. (2005b). 
"BPEL", RD and Leidner DE , Strategic Information Management, 3rd ed. http://en.wikipedia.org/wiki/BPEL (2005-09-12). Butterworth-Heineman 2002.  zur Muehlen, M. (1999). "Beyond Reengineering", in:  SourceForge.net (2005). "Taverna 1.2", Becker, J.; Kugeler, M.; Rosemann, M. (Eds.): Process http://taverna.sourceforge.net/ (2005-11-18). Management. Springer, Berlin et al. 1999, pp. 285-327. (also 2nd and 3rd Edition).
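
Supplementary sketch. As an illustration of the five-step workflow of Section 3.1 and of the search audit trail described in Section 3.2, the following minimal Python sketch models one variant case moving through the steps while every action is logged. It is an assumption-laden illustration, not the proposed implementation: the paper proposes BPEL, whereas Python is used here only for brevity, and all names (STEPS, VariantCase, metasearch, reopen_search) and the stub data sources are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical step names mirroring Steps 1-5 of the workflow in Section 3.1.
STEPS = ["name_variant", "collect_documents", "interpret",
         "consensus_meeting", "submit_report"]


@dataclass
class VariantCase:
    """One variant moving through the interpretation workflow."""
    variant_name: str
    step_index: int = 0
    documents: List[dict] = field(default_factory=list)
    audit_trail: List[dict] = field(default_factory=list)

    @property
    def current_step(self) -> str:
        return STEPS[self.step_index] if self.step_index < len(STEPS) else "done"

    def log(self, action: str, detail: str) -> None:
        # Every action is recorded so the process stays auditable.
        self.audit_trail.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "step": self.current_step,
            "action": action,
            "detail": detail,
        })

    def advance(self) -> None:
        """Mark the current step as completed and move to the next one."""
        if self.step_index >= len(STEPS):
            raise ValueError("workflow already finished")
        self.log("completed", self.current_step)
        self.step_index += 1

    def reopen_search(self) -> None:
        """Step 4 may require further searching: go back to Step 2."""
        if self.current_step != "consensus_meeting":
            raise ValueError("search can only be reopened during the meeting")
        self.log("reopened", "collect_documents")
        self.step_index = STEPS.index("collect_documents")


def metasearch(case: VariantCase,
               sources: Dict[str, Callable[[str], List[str]]]) -> None:
    """Query every source for the variant and log each query (Step 2)."""
    if case.current_step != "collect_documents":
        raise ValueError("metasearch belongs to the collect_documents step")
    for name, search in sources.items():
        hits = search(case.variant_name)
        case.log("search", f"{name}: {len(hits)} hit(s)")
        case.documents.extend({"source": name, "record": hit} for hit in hits)


if __name__ == "__main__":
    case = VariantCase("Variant A")
    case.advance()  # Step 1 done: the variant has been named
    # Stub sources stand in for literature, LSDBs or external search tools.
    metasearch(case, {"local_db": lambda q: [q + " record"],
                      "literature": lambda q: []})
    for entry in case.audit_trail:
        print(entry["step"], entry["action"], entry["detail"])
```

In this sketch, each source is a plain callable, so a real query service could be wrapped behind the same interface; the audit trail is what would accompany the variant description when the full document is passed to researchers for the interpretation decision.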