An Extensible Architecture for Enterprise-wide Automated Requirements Traceability

Jun Lin, Chan Chou Lin, Joseph Amaya, Massimo Ilario, Jane Cleland-Huang
Center for Requirements Engineering
School of Computer Science, Telecommunications and Information Systems
DePaul University

Abstract
Automated traceability uses information retrieval methods to dynamically generate traceability links on an as-needed basis. Although a significant body of research has demonstrated the feasibility of automated traceability, previous work has relied on a rather ad hoc approach, often involving significant human effort, to import traceable data from third-party CASE tools into the trace tool. This paper describes an architectural framework and corresponding prototype tool for providing "in-place" traceability, in which data residing in distributed third-party CASE tools is automatically parsed to extract the information needed to service trace queries. The framework provides extensibility for adding new types of traceable artifacts and new third-party CASE tools. An open source model is proposed in which organizations contribute to the development of specialized adapters for different CASE tools and artifact types.

1. Introduction
Traceability enables relationships between different software entities, such as requirements, design artifacts, code, and test cases, to be clearly defined. It therefore supports activities such as impact analysis, requirements validation, rationale management, and test case selection.
Although developers can explicitly define and maintain traceability links using spreadsheets, tables, databases, or requirements management tools, the practical difficulty and effort of keeping those links accurate mean that traceability can be prohibitively expensive, and many organizations have no systematic traceability process in place [4,5]. To address this problem, several researchers have investigated the feasibility of dynamically generating traceability links using information retrieval techniques [1,2,3,6,7,9,10,11,12,13]. In current methods, artifacts are typically extracted from their natural contexts, indexed for tracing, and then stored in a proprietary traceability database. For many types of traceable artifacts this step is labor intensive, and it means that only carefully prepared artifacts and documents can be traced by the tool. However, for automated trace retrieval to be useful in industry, artifacts must be traceable within the CASE tools in which they are created and maintained. This type of "in-place" traceability would minimize the manual work involved in preparing a document for tracing and would make automated traceability feasible in an industrial context.

Figure 1. An automated trace query: (a) query screen; (b) results screen.

This paper describes an architectural framework for providing enterprise-level connectivity between an automated trace retrieval tool and third-party packages such as Rational Rose, XDE, and DOORS. The framework is designed for extensibility, so that additional third-party CASE tools can be added at runtime. It also supports the use of different underlying trace engines. The framework improves on prior work by enabling traceability of artifacts within their native, and often distributed, third-party CASE tools. We call the resulting framework Poirot+; a framework of this kind is essential for deploying an automated trace tool in industry.
Section 2 of this paper provides a brief introduction to automated traceability, illustrated using the Poirot tool. Section 3 outlines the requirements for an enterprise-level traceability tool. Section 4 then describes the Poirot+ architectural framework for supporting enterprise-level traceability, Section 5 describes architectural support for four critical scenarios, and Section 6 concludes with an analysis of the framework and its application in industry.

2. Automated Traceability Tool
Automated traceability tools use an assortment of information retrieval techniques to dynamically generate traces between two or more artifacts. Researchers have developed tools based on the Vector Space Model [6,7,11], Latent Semantic Indexing (LSI), probabilistic network models [2,3,11], and other similar methods. In all of these approaches a probability value or similarity score is calculated that represents the likelihood of a link between a pair of artifacts. Links with scores above a certain threshold value are typically categorized as candidate links and presented to an analyst for evaluation.

The process is depicted in Figure 1, which illustrates a trace query using Poirot: Tracemaker. An analyst issues a query either by specifying an artifact ID or by typing in a free-text query representing a change request or a proposed new requirement. The analyst also specifies one or more target artifact groups. For example, in Figure 1a, a query is issued for requirement 9014 against all UML classes. The underlying trace retrieval algorithm then calculates the probability of a link between the query and each class and, as depicted in Figure 1b, displays the results to the analyst for evaluation. While evaluating the links, the analyst may request additional information about an artifact, which must then be retrieved and displayed.
For example, in Figure 1b, the analyst has requested additional information about a UML class.

3. Enterprise Level Trace Requirements
In a typical software development environment, artifacts from the various stages of the software development lifecycle are stored in different CASE tools distributed throughout the organization. For example, software requirements may be stored in some combination of Word files and requirements management tools such as DOORS or RequisitePro. Design models may be embedded as Visio diagrams in Word documents, or stored as UML diagrams within a CASE tool such as Rational Rose or Poseidon. Code may be created and stored within Eclipse, and test cases may be stored in tools such as TestDirector or ClearCase. An enterprise-level distributed traceability solution must interface with these types of tools in order to provide traceability for the artifacts residing in them.

3.1 Third-Party CASE Tools
One of the most daunting challenges to achieving enterprise-level traceability is the sheer variety of third-party CASE tools, together with the diversity of model types and ways of organizing artifacts within each tool. Each model type within each tool needs to be accessed and parsed in a unique way. For example, some third-party tools can create and export XML or XMI files; others provide APIs for directly accessing and retrieving data from their proprietary databases; some have an underlying database structure that is easily accessible and understandable; and others simply store their artifacts as readable text files in accessible directories. To provide traceability across artifacts stored in these tools, each tool must offer some form of external access to its artifacts.

3.2 Data Storage
To avoid massive data redundancy and stale data, only the minimum amount of data needed to service a trace query should be stored in the trace tool's proprietary database.
All other data should remain in native format within the third-party CASE tool.

3.3 Extensibility
In addition to supporting a broad set of third-party tools, the architecture must support the runtime addition and relocation of these tools without recompiling or restarting the traceability tool. This flexibility is essential because new CASE tools and artifact types may be introduced to a project at any time.

4. Poirot+: An Enterprise-Level Architecture for Automated Traceability
A high-level view of the Poirot+ architecture is given in Figure 2. The primary components are the trace GUI, trace manager, trace engine, controller, data cleanser, resource manager, resource broker, data manager and adapter, diagram manager and adapter, and diagram generator.

Figure 2. High level classes

These components are described in more detail below:
• The TraceGUI provides a browser-based component for issuing trace queries and displaying results; this functionality is depicted in Figure 1. The web client is also used for setting up a project, specifying the location of the third-party CASE tools to be managed by the project, and specifying how and when key index terms are to be retrieved from those tools.
• The TraceEngine provides the underlying functionality for generating traces between artifacts. It takes a query as input, uses a retrieval algorithm to calculate similarity scores, and returns a list of candidate links ranked by the likelihood of a link. The use of a standard interface allows different underlying trace engines to be plugged in. For example, our prototype tool implements both a probabilistic network and a vector space model.
• The TraceManager coordinates traceability activities. When it receives a request from the TraceGUI to process a query, it calls on the services of the trace engine and returns the results to the GUI. It can also provide the GUI with data directly from the database. Finally, if the GUI requests additional information that is not stored in the database, the trace manager forwards the request to the controller.
• The controller is one of the key components of the enterprise traceability architecture. Its primary function is to control the flow of information between the local trace manager and the third-party CASE tools. The controller manages requests for distributed information from the TraceGUI client, controls the process of receiving data, and uses the services of the data cleanser to extract the necessary index terms.
• The data cleanser accepts raw data structured as an XML document and prepares it for tracing by stemming words to their root forms, removing unimportant "stop" words, splitting variable names (for example, mLastName into the words 'last' and 'name'), identifying phrases, and performing other pre-parsing tasks.
• The resource manager uses the services of the resource broker both to locate the IP address and port of a remote adapter and to forward messages, through the appropriate data or diagram manager, to the adapter that connects to a third-party CASE tool.
• Requests for data are forwarded to the appropriate managed resource (MR) data manager. The MR data manager is responsible for instantiating an appropriate concrete data adapter and forwarding data requests to it.
• The concrete data adapter understands the specific interface needed to extract data from a specific model type within a specific CASE tool. A concrete data adapter must therefore be programmed for each model type within each CASE tool of interest. For example, separate adapters are needed to parse class diagrams in Rational Rose, class diagrams in XDE, and sequence diagrams in Rose. Constructing adapters represents the primary effort involved in adding a new traceable artifact type to a project; no other parts of the architecture need to be modified.
• Requests for diagrams are forwarded by the resource broker to the MR diagram manager, which is responsible for instantiating an appropriate concrete diagram adapter and forwarding diagram-related messages to it. The concrete diagram adapter forwards specific requests for the data it needs to the MR data manager.
• A concrete diagram generator is developed for each type of model. For example, one concrete diagram generator is programmed to draw class diagrams, another to draw sequence diagrams, and so on. Multiple concrete diagram adapters could potentially use the services of a single concrete diagram generator.
For example, two diagram adapters created to extract classes from Rational Rose and from XDE could both use a single diagram generator designed to generate class diagrams. Sharing diagram generators is one way in which this architectural framework supports extensibility and reuse. Each concrete diagram adapter therefore instantiates the concrete diagram generator it needs to produce its diagram. The diagram generator builds an SVG file containing the results, which is returned via the MR diagram manager, the resource broker, the controller, and the trace manager to the TraceGUI, where it is displayed in a browser using a generic SVG plug-in.

As depicted in Figure 3, the trace engine, trace manager, data cleanser, and data access component are all situated on the Poirot server. To provide enterprise-level traceability, the controller, resource manager, and resource broker are also situated on this server. Each CASE tool server hosts a managed resource data manager and diagram manager, the appropriate data and diagram adapters, one or more diagram generators, and one or more third-party CASE tools.

Figure 3. Deployment in Enterprise Environment (web client, Poirot server, and one case tool server per CASE tool, e.g. DOORS, Rational Rose, XDE)

5. Interactions
In this section we describe four primary scenarios supported by the framework: adding a new artifact type to a project, indexing artifacts residing in a third-party CASE tool, issuing a trace query, and requesting additional data from a CASE tool.
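Before walking through these scenarios, it may help to make the TraceEngine's "standard interface" from Section 4 concrete, since the trace-query scenario depends on it. The sketch below shows one possible shape for such an interface together with a vector space model implementation: the query and each artifact are weighted with tf-idf, compared by cosine similarity, and artifacts scoring above a threshold become ranked candidate links, as described in Section 2. Everything here is an illustrative assumption rather than Poirot+'s code: the class and method names are hypothetical, tf-idf weighting and the 0.1 threshold are example choices, artifacts are assumed to arrive as already-cleansed term lists, and the probabilistic network engine is not shown.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod
from collections import Counter


class TraceEngine(ABC):
    """Hypothetical standard interface: any retrieval algorithm can sit behind it."""

    @abstractmethod
    def calculate_similarity_scores(
        self, query_terms: list[str], artifacts: dict[str, list[str]]
    ) -> dict[str, float]:
        """Return a similarity score for every artifact ID."""


class VectorSpaceEngine(TraceEngine):
    """Vector space model: tf-idf term weights compared by cosine similarity."""

    def calculate_similarity_scores(self, query_terms, artifacts):
        n_docs = len(artifacts)
        # Document frequency: how many artifacts contain each term.
        df = Counter(t for terms in artifacts.values() for t in set(terms))

        def weights(terms):
            tf = Counter(terms)
            # Terms that occur in no artifact carry no weight and are skipped.
            return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t]}

        def norm(vec):
            return math.sqrt(sum(w * w for w in vec.values()))

        q = weights(query_terms)
        scores = {}
        for art_id, terms in artifacts.items():
            d = weights(terms)
            dot = sum(w * d.get(t, 0.0) for t, w in q.items())
            denom = norm(q) * norm(d)
            scores[art_id] = dot / denom if denom else 0.0
        return scores


def candidate_links(engine: TraceEngine, query_terms, artifacts, threshold=0.1):
    """Keep artifacts scoring above the threshold, most likely link first."""
    scores = engine.calculate_similarity_scores(query_terms, artifacts)
    hits = [(art_id, s) for art_id, s in scores.items() if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, already-cleansed term lists for three UML classes.
    classes = {
        "ImageRotator": ["image", "rotate", "angle", "degree"],
        "UserAccount": ["user", "account", "password"],
        "Renderer": ["display", "image", "screen"],
    }
    query = ["image", "rotate", "360", "degree"]
    print(candidate_links(VectorSpaceEngine(), query, classes))
```

With this data only ImageRotator clears the threshold; Renderer shares the common term "image" but scores well below 0.1. A probabilistic network engine would subclass TraceEngine in the same way, and because candidate_links depends only on the interface, the calling code would not change when engines are swapped.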
5.1 Adding a new artifact type
If an appropriate data adapter and diagram adapter are available, no additional programming is required to add a new artifact type to a project. The Poirot+ setup tools can be used to associate the new type with existing adapters, to install those adapters on the server hosting the CASE tool, and to specify the IP address and port at which the managed resource server will be situated. If no appropriate adapters are available, a programmer must investigate the possible ways of accessing data within the CASE tool and develop a data adapter and a diagram adapter. If an appropriate diagram generator is already available it can be reused; otherwise a diagram generator must also be constructed. Because developing a new adapter can be challenging and labor intensive, we plan to promote reuse by open sourcing the code for the architectural framework and all associated adapters and diagram generators.

5.2 Indexing artifacts
As stated previously, key terms and other structural data must be stored in the trace database in order to support trace queries. Although the framework could support both a pull model and a push model, the push model is riskier and depends on third-party CASE tools having push capabilities, such as the ability to trigger executable events when a state change occurs. Because this type of feature is available only in more sophisticated tools, our prototype implements a pull model. In the pull model, the project manager specifies a host URL, port number, and third-party CASE tool information for each artifact group that is to be integrated into the project. Such artifact groups are called managed resources (MRs). The project manager either sets up an automated import routine or manually requests updates on an as-needed basis. The sequence diagram in Figure 4 depicts the interactions that occur when an update request is issued.

Figure 4. Scenario for indexing artifacts

One of the key elements of the design is the way data is returned to the trace manager. Regardless of the data source and type, all data is formatted in exactly the same way using the following XML DTD (document type definition):

<!ELEMENT artifact (art_id, art_title, art_content, art_parent, art_type)>
<!ELEMENT art_id (#PCDATA)>
<!ELEMENT art_title (#PCDATA)>
<!ELEMENT art_content (#PCDATA)>
<!ELEMENT art_parent (#PCDATA)>
<!ELEMENT art_type (#PCDATA)>

For example, the raw data representation of a requirement might look as follows:

<artifact>
  <art_id>UC102</art_id>
  <art_title>The image shall rotate 360 degrees</art_title>
  <art_content>The correct image shall be displayed from any angle selected by the user</art_content>
  <art_parent>HC032</art_parent>
  <art_type>Requirement</art_type>
</artifact>

Figure 5. Trace Query

This XML representation is returned to the controller, which sends it through the data cleanser. Once terms are extracted and stemmed, the controller writes them to the trace database. A critical design element of the enterprise-level trace architecture is that all artifact data is represented using the same XML DTD and is therefore managed uniformly by the controller and the trace retrieval algorithm, regardless of its type or source.
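The indexing path just described can be sketched in a few functions: read an artifact element that follows the DTD above, then apply the cleansing steps listed for the data cleanser in Section 4 (splitting variable names, removing stop words, stemming, and counting terms). This is a minimal sketch under stated assumptions, not the Poirot+ cleanser: the function names are hypothetical, the stop list is a tiny illustrative set, the suffix stripper is a crude stand-in for a real stemmer such as Porter's, phrase identification is omitted, and ElementTree reads the element without validating it against the DTD.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from collections import Counter

# Element names from the uniform artifact DTD.
FIELDS = ("art_id", "art_title", "art_content", "art_parent", "art_type")

# Tiny illustrative stop list; a real cleanser would use a much larger one.
STOP_WORDS = {"a", "an", "and", "any", "be", "by", "from", "of", "shall", "the", "to"}

SAMPLE = """<artifact>
  <art_id>UC102</art_id>
  <art_title>The image shall rotate 360 degrees</art_title>
  <art_content>The correct image shall be displayed from any angle selected by the user</art_content>
  <art_parent>HC032</art_parent>
  <art_type>Requirement</art_type>
</artifact>"""


def parse_artifact(xml_text: str) -> dict[str, str]:
    """Read one <artifact> element (no DTD validation is performed)."""
    root = ET.fromstring(xml_text)
    return {f: (root.findtext(f) or "").strip() for f in FIELDS}


def split_identifier(token: str) -> list[str]:
    """Split camelCase names, dropping an 'm' member prefix: mLastName -> last, name."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)
    if len(parts) > 1 and parts[0] == "m":
        parts = parts[1:]
    return [p.lower() for p in parts]


def stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def cleanse(artifact: dict[str, str]) -> Counter:
    """Term frequencies for title and content: split, drop stop words, stem, count."""
    text = f"{artifact['art_title']} {artifact['art_content']}"
    terms = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        for word in split_identifier(token):
            if word not in STOP_WORDS:
                terms.append(stem(word))
    return Counter(terms)


if __name__ == "__main__":
    art = parse_artifact(SAMPLE)
    print(art["art_id"], dict(cleanse(art)))
```

For the sample requirement, "image" is counted twice and "degrees", "displayed", and "selected" reduce to "degree", "display", and "select". Because every adapter emits the same five elements, parse_artifact and cleanse need no knowledge of which CASE tool or model type produced the data.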
This uniform representation enables new third-party CASE tools or new artifact model types to be added without modifying or recompiling any of the components on the Poirot server, and without restarting the application.

5.3 Issue a trace query
For performance reasons, all trace queries are handled locally. The analyst issues a query through the TraceGUI, which passes it to the trace manager. The trace manager retrieves the appropriate index data from the database and passes it to the trace engine, which calculates the probabilities or similarity scores. The trace manager returns these scores to the TraceGUI, where they are displayed on the screen. This scenario is illustrated in Figure 5.

5.4 Request for supporting visual data
When analysts review and evaluate candidate links, they may request additional information about a specific artifact. That artifact must then be displayed to the user in its native format. For example, if an analyst requests additional information about a class within a UML diagram, that class must be displayed using standard UML notation. To facilitate the addition of new artifact types, the TraceGUI must be able to display any type of artifact without specific knowledge of that type. To support this requirement, upon receiving a request for a specific artifact to be displayed, the concrete diagram adapter invokes the services of the appropriate diagram generator, which generates an SVG (Scalable Vector Graphics) file represented in standard XML. This XML file is returned through the normal channels to the controller and forwarded to the web client, which displays it using a standard SVG plug-in. This scenario is depicted in Figure 6.
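A diagram generator of the kind described in Section 5.4 can be sketched as a function that turns class data into SVG markup, so that the browser only ever receives generic SVG. This is a minimal sketch assuming the concrete diagram adapter has already supplied the class name, attribute strings, and operation strings; the function name and its sizing parameters are hypothetical, and text width is estimated with a fixed per-character width rather than real font metrics.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def generate_class_svg(name, attributes, operations, char_w=7, line_h=16, pad=6):
    """Draw one UML class as a three-compartment box; return SVG markup as a string."""
    compartments = [[name], list(attributes), list(operations)]
    rows = [text for section in compartments for text in section]
    width = max(len(text) for text in rows) * char_w + 2 * pad
    # Each compartment is at least one row tall, even when it is empty.
    heights = [max(len(section), 1) * line_h + pad for section in compartments]

    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(sum(heights)))
    ET.SubElement(svg, "rect", x="0", y="0", width=str(width),
                  height=str(sum(heights)), fill="white", stroke="black")

    top = 0
    for index, (section, height) in enumerate(zip(compartments, heights)):
        if index > 0:
            # Horizontal rule separating this compartment from the one above.
            ET.SubElement(svg, "line", x1="0", y1=str(top), x2=str(width),
                          y2=str(top), stroke="black")
        for row, text in enumerate(section):
            baseline = top + pad + (row + 1) * line_h - 4
            attrs = {"x": str(pad), "y": str(baseline)}
            if index == 0:
                # Class name is centred and bold, as is conventional in UML.
                attrs.update({"x": str(width // 2), "text-anchor": "middle",
                              "font-weight": "bold"})
            ET.SubElement(svg, "text", attrs).text = text
        top += height
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(generate_class_svg("ImageRotator", ["- angle : double"],
                             ["+ rotate(degrees : int)"]))
```

A generator for sequence diagrams would draw different shapes but return the same kind of document, which is what allows the TraceGUI to remain unaware of artifact types and allows several diagram adapters to share one generator.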
Figure 6. Request for supporting visual data

6. Using the Framework
This enterprise-level traceability framework makes a significant contribution to ongoing work on dynamic traceability because it provides a practical solution for implementing dynamic traceability in industry. A team of researchers at the DePaul Center for Requirements Engineering has constructed a prototype tool known as Poirot+. Poirot+ is a Java-based application that uses a Tomcat server to host the trace manager and Java RMI to connect the various managed resource servers. The prototype demonstrates all of the functionality described in this paper. It has been built with two interchangeable trace engines, one based on the vector space model and one on a probabilistic network approach, and adapters and corresponding diagram generators have been constructed to access class diagrams and activity diagrams in Rational Rose and sequence diagrams in XDE.

6.1 Future Work
Future work will focus on extending the set of available adapters and diagram generators, and on using Poirot+ to support our ongoing pilot study within Siemens. We also intend to create an open source community for enhancing the architecture and for creating new artifact adapters.
Acknowledgments
The work described in this paper was jointly funded by NSF grant CCR-0306303, a grant from Siemens Corporate Research, and ongoing support from the Siemens Logistics and Automation plant in Grand Rapids, MI. We also thank the many students and faculty from DePaul University who contributed to the Poirot prototype. In addition to the authors of this paper, Raffaella Settimi, Xuchang Zou, Chuan Duan, Ossama Ben Khadra, Jigar Mody, Komtanoo Pinpimai, Grace Bedford, Frederick Strahl, Wiktor Lukasik, and Harold Streeter all made invaluable contributions to the programming or design of the prototype.

References
[1] G. Antoniol, G. Canfora, A. De Lucia, G. Casazza, "Information Retrieval Models for Recovering Traceability Links between Code and Documentation", Proc. of the International Conference on Software Maintenance, San Jose, California, USA, Oct. 2000, pp. 40-51.
[2] J. Cleland-Huang, R. Settimi, O. BenKhadra, E. Berezhanskaya, S. Christina, "Goal-Centric Traceability for Managing Non-Functional Requirements", Proc. of the 27th International Conference on Software Engineering, St. Louis, USA, May 2005, pp. 362-371.
[3] J. Cleland-Huang, R. Settimi, C. Duan, X. Zou, "Utilizing Supporting Evidence to Improve Dynamic Requirements Traceability", Proc. of the 13th IEEE International Requirements Engineering Conference, Paris, France, Aug. 2005, pp. 130-139.
[4] R. Domges and K. Pohl, "Adapting Traceability Environments to Project Specific Needs", Communications of the ACM, Vol. 41, No. 12, 1998, pp. 55-62.
[5] O. Gotel and A. Finkelstein, "An Analysis of the Requirements Traceability Problem", Proc. of the 1st International Conference on Requirements Engineering, Colorado Springs, USA, Apr. 1994, pp. 94-101.
[6] J. Huffman-Hayes, A. Dekhtyar, and J. Osborne, "Improving Requirements Tracing via Information Retrieval", Proc. of the 11th IEEE International Requirements Engineering Conference, Monterey, CA, USA, Sept. 2003, pp. 138-150.
[7] J. Huffman-Hayes, A. Dekhtyar, S. Karthikeyan Sundaram, S. Howard, "Helping Analysts Trace Requirements: An Objective Look", Proc. of the 12th International Requirements Engineering Conference, Kyoto, Japan, Sept. 2004, pp. 249-259.
[8] M. Jarke, "Requirements Traceability", Communications of the ACM, Vol. 41, No. 12, Dec. 1998, pp. 32-36.
[9] J. I. Maletic, E. V. Munson, A. Marcus, and T. N. Nguyen, "Using a Hypertext Model for Traceability Link Conformance Analysis", Proc. of the 2nd International Workshop on Traceability in Emerging Forms of Software Engineering, Montreal, Oct. 2003, pp. 47-54.
[10] A. Marcus and J. I. Maletic, "Recovering Documentation-to-Source-Code Traceability Links using Latent Semantic Indexing", Proc. of the 25th IEEE International Conference on Software Engineering, Portland, Oregon, USA, May 2003, pp. 125-137.
[11] R. Settimi, J. Cleland-Huang, O. BenKhadra, J. Mody, W. Lukasik, and C. DePalma, "Supporting Change in Evolving Software Systems through Dynamic Traces to UML", Proc. of the 7th IEEE International Workshop on Principles of Software Evolution, Kyoto, Japan, Sept. 2004, pp. 49-54.
[12] X. Zou, R. Settimi, J. Cleland-Huang, C. S. Miller, "Supporting Trace Evaluation with Confidence Scores", Proc. of Requirements Engineering Decision Support, Paris, Aug. 2005, pp. 1-6.
[13] S. K. M. Wong and Y. Y. Yao, "A Probabilistic Inference Model for Information Retrieval", Information Systems, Vol. 16, No. 3, pp. 301-321.