							               An Extensible Architecture for Enterprise-wide
                   Automated Requirements Traceability

   Jun Lin (1), Chan Chou Lin (2), Joseph Amaya (3), Massimo Ilario (4), Jane Cleland-Huang (5)
                            Center for Requirements Engineering
          School of Computer Science, Telecommunications and Information Systems
                                     DePaul University
     (1) linkjoin@163.com; (2) enzolin@gmail.com; (3) jamaya1@students.depaul.edu;
                  (4) sub0@speakeasy.com; (5) jhuang@cs.depaul.edu


                                            Abstract

Automated traceability utilizes information retrieval methods to dynamically generate
traceability links on an as-needed basis. Although a significant body of research has
demonstrated the feasibility of automated traceability, importing the traceable data from 3rd
party case tools into the trace tool has previously relied on a rather ad hoc approach that often
involves significant human effort. This paper describes an architectural framework and
corresponding prototype tool for providing “in-place” traceability in which data residing in
distributed 3rd party case tools is automatically parsed to extract information needed to service
trace queries. The framework provides extensibility for adding new types of traceable artifacts
and 3rd party case tools. An open source model is proposed in which organizations contribute to
the development of specialized adapters for different case tools and artifact types.


1. Introduction

    Traceability enables relationships between different software entities such as requirements,
design artifacts, code, and test cases to be clearly defined [8]. It therefore provides support for
activities such as impact analysis, requirements validation, rationale management, and test case
selection. Although developers can explicitly define and maintain traceability links using
spreadsheets, tables, databases or requirements management tools, the practical difficulties and
effort involved in keeping links accurate mean that traceability can be
prohibitively expensive and that many organizations have no systematic traceability process in
place [4,5].
      To address this problem, several researchers have investigated the feasibility of an
automated approach to dynamically generate traceability links using information retrieval
techniques [1,2,3,6,7,9,10,11,12,13]. In current methods, artifacts are typically extracted from
their natural contexts, indexed for tracing, and then stored within a proprietary traceability
database. For many types of traceable artifact, this part of the process is labor intensive and
means that only carefully prepared artifacts and documents can be traced by the tool. However,
for automated trace retrieval to be useful within industry, artifacts must be traceable within the
case tools in which they are created and maintained. This type of “in-place” traceability would
minimize the manual work involved in preparing a document for tracing and would make
automated traceability feasible within an industrial context.

        Figure 1. An automated trace query: (a) query screen, (b) results screen.

      This paper describes an architectural framework for providing enterprise level connectivity
between an automated trace retrieval tool and third party packages such as Rational Rose, XDE,
and DOORS. The framework is designed to provide extensibility so that additional 3rd party case
tools can be added at runtime. It also supports the use of different underlying trace engines.
The framework, named Poirot+, improves on prior work by enabling traceability of artifacts
within their native and often distributed 3rd party case tools, a capability that is essential
for deploying an automated trace tool in industry.
      Section 2 of this paper provides a brief introduction to automated traceability, illustrated
using the Poirot tool. Section 3 outlines the requirements for an enterprise level
traceability tool. Section 4 describes the Poirot+ architectural framework for supporting
enterprise level traceability, Section 5 describes architectural support for four critical features,
and Section 6 concludes with an analysis of the framework and its application in industry.

2. Automated traceability tool

    Automated traceability tools utilize an assortment of information retrieval techniques in order
to dynamically generate traces between two or more artifacts. Several researchers have
developed tools based upon the Vector Space Model [6,7,11], Latent Semantic Indexing (LSI) [10],
probabilistic network models [2,3,11], and other similar methods. In all of these approaches a
probability value or similarity score is calculated that represents the likelihood of a link
between a pair of artifacts. Links with scores over a certain threshold value are typically
categorized as candidate links and presented to an analyst for evaluation [12].
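
    The scoring and thresholding step can be illustrated with a minimal sketch of a vector
space style calculation. This is not the Poirot implementation: it assumes raw term counts
with no term weighting, and the artifact names, index terms, and the threshold of 0.3 are
hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, artifact_terms):
    """Cosine of the angle between two raw term-frequency vectors."""
    q, a = Counter(query_terms), Counter(artifact_terms)
    dot = sum(q[t] * a[t] for t in q.keys() & a.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in a.values())
    )
    return dot / norm if norm else 0.0


def candidate_links(query_terms, artifacts, threshold=0.3):
    """Return (artifact_id, score) pairs scoring over the threshold, best first."""
    scored = (
        (art_id, cosine_similarity(query_terms, terms))
        for art_id, terms in artifacts.items()
    )
    links = [(art_id, score) for art_id, score in scored if score > threshold]
    return sorted(links, key=lambda link: link[1], reverse=True)


# Hypothetical, already-cleansed index terms for three UML classes.
artifacts = {
    "ImageViewer": ["image", "rotate", "display", "angle"],
    "UserAccount": ["user", "login", "password"],
    "ImageStore": ["image", "store", "file"],
}
query = ["image", "rotate", "angle", "user"]
links = candidate_links(query, artifacts)
```

    With these made-up terms only ImageViewer scores over the threshold. Lowering the
threshold returns more candidate links for the analyst to evaluate, at the cost of more
incorrect ones.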
    The process is depicted in Figure 1, which illustrates a trace query using Poirot: Tracemaker.
An analyst issues a query either by specifying an artifact ID or by typing in a free-text query
representing a change request or proposed new requirement. The analyst also specifies one or
more target artifact groups. For example, in Figure 1a, a query is issued for requirement 9014
against all UML classes.
    The underlying trace retrieval algorithm then calculates the probability of a link between the
query and each class diagram and, as depicted in Figure 1b, displays these results to the analyst
for evaluation. During the course of evaluating the links, the analyst may request additional
information concerning an artifact, which must then be retrieved and displayed. For example, in
Figure 1b, the analyst has requested additional information about a UML class.

3. Enterprise Level Trace Requirements
    In a typical software development environment, artifacts from various stages of the software
development lifecycle are stored in different case tools distributed throughout the organization.
For example, software requirements may be stored in some combination of Word files and
requirements management tools such as DOORS or RequisitePro. Design models may be
embedded as Visio diagrams into Word documents, or stored as UML diagrams within a case tool
such as Rational Rose or Poseidon. Code may be created and stored within Eclipse, and test
cases may be stored in tools such as TestDirector or ClearCase. An enterprise level distributed
traceability solution must interface with these types of tools in order to provide traceability for
the artifacts residing in them.

3.1 Third Party Case Tools
    Clearly one of the daunting challenges to achieving enterprise level traceability is the sheer
variety of 3rd party case tools as well as the diversity of model types and organization of artifacts
possible within each tool. Each model type within each tool needs to be accessed and parsed in a
unique way. For example, some 3rd party tools have the ability to create and export XML or
XMI files; others provide APIs for directly accessing and retrieving data from their proprietary
databases; some have an underlying database structure that is easily accessible and
understandable; while others simply store their artifacts as readable text files in clearly accessible
directories. In order to provide traceability across artifacts stored in these 3rd party case tools,
each tool must provide some type of external access to its artifacts.

3.2 Data Storage
    To prevent massive data redundancy and stale data, the trace tool’s proprietary database
should store only the minimal amount of data needed to service a trace query.
All other data should be stored in native format within the 3rd party case tool.

3.3 Extensibility
    In addition to supporting a broad set of 3rd party tools, the architecture must support the
runtime addition and relocation of these tools without the need to recompile or restart the
traceability tool. This flexibility is essential because new case tools and artifact types may be
introduced to a project at any time.

4. Poirot+: An enterprise level architecture for automated traceability

    A high-level view of the Poirot+ architecture is given in Figure 2. The primary components
include the trace GUI, trace manager, trace engine, controller, data cleanser, resource manager,
resource broker, data manager and adapter, diagram manager and adapter, and the diagram
generator.

   TraceGUI: GetQuery(), DisplayResults(), DefineProjectArtifacts()
   Trace Engine / Concrete Trace Engine: CalculateSimilarityScores()
   TraceManager: GenerateTraceQuery(queryID : String, searchType : String),
      GenerateTraceQuery(FreeText : String, SearchType : String),
      GenerateRequestForArtifactDiagram(artifactID : String),
      StoreUserFeedback(artifactID : String, status : Boolean)
   Database Access: UpdateArtifact(), UpdateArtifactType(), UpdateTermFrequencyData(),
      RetrieveTermFrequencyData()
   DataCleanser: Cleanse(artifactXML : Document), StopTerms(Terms : String),
      StemTerms(Terms : String), SplitTerms(Terms : String), CountTerms(TermsXML : Document),
      FindPhrases(artifactXML : Document)
   Controller: RequestArtifacts(artifactType : String) : XML Document,
      RequestArtifactDiagram(artifactID : String), ReceiveArtifacts(artifactXML : Document),
      CleanseArtifact(artifactXML : Document), UpdateDatabase(artifactXML : Document)
   Resource Manager: RequestArtifacts(artifactType : String),
      RequestArtifactDiagram(artifactID : String), AcceptArtifacts(artifactXML : Document),
      LocateResource()
   Resource Broker
   MR Data Manager (XMLDocument): RequestArtifacts(), ReturnXMLDocument()
   MR Diagram Manager (SVGDocument): RequestArtifactDiagram(), ReturnSVGDocument()
   <<Data Adapter>> / Concrete Data Adapter: RequestData()
   <<Diagram Adapter>> / Concrete Diagram Adapter: RequestDiagram()
   <<Diagram Generator>> / Concrete Diagram Generator (SVGDocument): GenerateSVGDiagram()
   3rd Party Case Tool

                                            Figure 2. High level classes

    These components are described in more detail:
• The TraceGUI provides a browser based component for issuing trace queries and displaying
    results. This functionality is depicted in Figure 1. The web client is also used for setting up
    a project, specifying the location of 3rd party case tools to be managed by the project, and
    specifying how and when key index terms are to be retrieved from those projects.
•   The TraceEngine provides the underlying functionality for generating traces between
    artifacts. It takes a query as input, uses a retrieval algorithm to calculate similarity scores,
    and then returns a list of candidate links ranked according to likelihood of a link. A
    standard interface allows different underlying trace engines to be used. For example, our
    prototype tool implements both a probabilistic network and a vector space model.
•    The TraceManager coordinates traceability activities. When it receives requests from the
     Trace GUI to process a query it calls on the services of the trace engine and then returns the
     results to the GUI. It also can provide the GUI with data directly from the database. Finally,
     if the GUI requests additional information that is not stored in the database the trace manager
     forwards the request to the controller.
• The controller is one of the key components of the enterprise traceability architecture. The
     primary function it serves is to control the flow of information between the local trace
     manager and the 3rd party case tools. The controller manages requests for distributed
     information from the trace GUI client, controls the process of receiving data, and uses the
     services of the data cleanser to extract the necessary index terms.
• The data cleanser accepts raw data structured in an XML document and processes it by
     stemming words to their root forms, removing unimportant “stop” words, splitting variable
     names (such as mLastName into the words ‘last’ and ‘name’), identifying phrases, and
     performing other pre-parsing tasks needed to prepare the data for tracing.
• The resource manager uses the services of the Resource broker to locate the IP address and
     port of a remote adapter and to forward messages through the appropriate data or diagram
     manager to the adapter which connects to a 3rd party case tool.
• Requests for data are forwarded to the appropriate managed resource (MR) data manager.
     The MR data manager is responsible for instantiating an appropriate concrete data adapter
     and forwarding data requests to it.
• The concrete data adapter understands the specific interface needed to extract data from a
     specific model type within a specific case tool. A concrete data adapter must therefore be
     programmed for each model type within each case tool of interest. For example an adapter is
     needed to parse class diagrams in Rational Rose, or class diagrams in XDE, or sequence
     diagrams in Rose. Constructing adapters represents the primary effort involved in adding a
     new traceable artifact type to a project. No other parts of the architecture need to be
     modified.
• Requests for diagrams are forwarded by the resource broker to the MR diagram manager
     which is responsible for instantiating an appropriate concrete diagram adapter and
     forwarding diagram related messages. The concrete diagram adapter forwards specific
     requests for the data it needs to the MR data manager.
• A concrete diagram generator is developed for each type of model. For example, one
     concrete diagram generator is programmed to draw class diagrams, another to draw sequence
     diagrams, and so on. Multiple concrete diagram adapters could potentially utilize the services of
     a single concrete diagram generator. For example, two diagram adapters created to extract
     classes from Rational Rose and XDE could both utilize the services of a single diagram
     generator designed to generate class diagrams. The sharing of diagram generators is one way
     in which this architectural framework supports extensibility and reuse. Each concrete diagram
     adapter therefore creates an instance of the concrete diagram generator that it needs to
     generate its diagram. The diagram generator builds an SVG file containing the results, which is
     returned via the MR diagram manager, the resource broker, the controller, and the trace manager
     to the Trace GUI, where it is displayed in a browser using a generic SVG plug-in.
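
     The relationship between the MR data manager and its concrete data adapters can be
sketched as follows. This is a minimal illustration rather than the Poirot+ code (the
prototype is Java based): the class names, the (tool, model type) lookup key, and the canned
artifact data are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class DataAdapter(ABC):
    """Common interface that every concrete data adapter implements."""

    @abstractmethod
    def request_data(self):
        """Return a list of artifact records in the uniform format."""


class RoseClassDiagramAdapter(DataAdapter):
    # Hypothetical stand-in: a real adapter would call the case tool's API
    # or parse its export files instead of returning canned data.
    def request_data(self):
        return [{"art_id": "C1", "art_title": "ImageViewer", "art_type": "Class"}]


class XdeSequenceDiagramAdapter(DataAdapter):
    # Hypothetical stand-in for a second tool and model type.
    def request_data(self):
        return [{"art_id": "S7", "art_title": "RotateImage", "art_type": "Sequence"}]


class MRDataManager:
    """Instantiates the adapter registered for a (tool, model type) pair."""

    def __init__(self):
        self._adapters = {}

    def register(self, tool, model_type, adapter_cls):
        self._adapters[(tool, model_type)] = adapter_cls

    def request_artifacts(self, tool, model_type):
        adapter = self._adapters[(tool, model_type)]()
        return adapter.request_data()


manager = MRDataManager()
manager.register("Rose", "class", RoseClassDiagramAdapter)
manager.register("XDE", "sequence", XdeSequenceDiagramAdapter)
rose_classes = manager.request_artifacts("Rose", "class")
```

     Because callers depend only on the abstract interface, supporting another model type
in this sketch means writing one more adapter class and registering it; the manager itself
is unchanged.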
     As depicted in Figure 3, the trace engine, trace manager, and data cleanser are
all situated on the Poirot server. To provide enterprise level traceability, the controller, resource
manager, and resource broker are also situated on this server. Each case tool server hosts a
managed resource data manager and diagram manager, the appropriate data and diagram
adapters, one or more diagram generators, and one or more 3rd party case tools.

   Web Client
   Poirot Server: Trace Manager, Trace Engine, Data Access, Controller, Resource Manager,
      Resource Broker
   Case Tool Server (ex: DOORS): Resource Managers, Adapters, Diagram Generator, Case tool (DOORS)
   Case Tool Server (ex: Rat. Rose): Resource Managers, Adapters, Diagram Generator, Case tool (Rose)
   Case Tool Server (ex: XDE): Resource Managers, Adapters, Diagram Generator, Case tool (XDE)

                          Figure 3. Deployment in Enterprise Environment

5. Interactions

    In this section we describe four primary scenarios supported by the framework. These
include adding a new artifact type to a project, indexing artifacts residing in a 3rd party case tool,
issuing a trace query, and requesting additional data from a case tool.

5.1 Adding a new artifact type

    If an appropriate data adapter and diagram adapter are available then no additional
programming is required in order to add a new artifact type to a project. Poirot+ setup tools can
be used to associate the new type with existing adapters, to install those adapters on the server
hosting the case tool, and to specify the IP address and port at which the managed resource
server will be situated. If no appropriate adapters are available, then a programmer must
investigate the possible ways of accessing data within the case tool and develop a data and
diagram adapter. If an appropriate diagram generator is already available it can be re-used;
otherwise a diagram generator must also be constructed. Because developing a new adapter can
be challenging and labor intensive, we plan to promote reuse by open sourcing the code for
the architectural framework and all associated adapters and diagram generators.
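
    The setup step, in which an artifact type is associated with an adapter and with the
address of its managed resource server, can be pictured as a small registry that is updated
while the tool is running. The sketch below is only an illustration of that idea: the
ManagedResource fields, the artifact type names, the IP addresses, and the port are
hypothetical, and the remote call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedResource:
    artifact_type: str
    host: str
    port: int
    adapter_name: str


class ResourceBroker:
    """Maps each artifact type to the managed resource server that serves it."""

    def __init__(self):
        self._resources = {}

    def register(self, resource):
        # Registering a type again replaces its entry, which models relocation.
        self._resources[resource.artifact_type] = resource

    def locate(self, artifact_type):
        try:
            return self._resources[artifact_type]
        except KeyError:
            raise LookupError(f"no adapter registered for {artifact_type!r}") from None


broker = ResourceBroker()
broker.register(ManagedResource("rose-class", "10.0.0.5", 1099, "RoseClassAdapter"))
# Later, without a restart, a new type is added and an existing one is moved.
broker.register(ManagedResource("xde-sequence", "10.0.0.6", 1099, "XdeSequenceAdapter"))
broker.register(ManagedResource("rose-class", "10.0.0.9", 1099, "RoseClassAdapter"))
location = broker.locate("rose-class")
```

    Keeping the mapping in data rather than in code is what lets a new artifact type be
added or relocated in this sketch without touching any other component.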

5.2 Indexing artifacts

    As stated previously, key terms and other structural data must be stored in the trace database
in order to support trace queries. Although both a pull and a push model could be supported by the
framework, the push model is a riskier approach because it depends on 3rd party case tools having
push capabilities such as the ability to trigger executable events when a state change occurs. As
this type of feature is only available in more sophisticated tools, our prototype implements a pull
model. In the pull model, the project manager specifies a host URL, port number, and 3rd party
case tool information for each artifact group that is to be integrated into the project. Such artifact
groups are called managed resources (MRs). The project manager either sets up an automated
import routine, or manually requests updates to occur on an as-needed basis.

   Participants: TraceManager, Controller, Resource Manager, Resource Broker, MR Data Manager,
      Concrete Data Adapter, 3rd Party Case Tool, DataCleanser, Database Access
   Requests, in order: RequestArtifacts(String), RequestArtifacts(String),
      RequestArtifacts(String), RequestArtifacts(String, String), RequestData( ), RequestData( )
   Replies, in order: Proprietary data, Formatted data, XML Raw Data (three times)
   Then: Cleanse(Document), XML Cleansed Data, UpdateArtifact( )

                                          Figure 4. Scenario for indexing artifacts

    The sequence diagram shown in Figure 4 depicts the interactions that occur when an update
request is issued. One of the key elements of the design is in the way data is returned to the trace
manager. Regardless of the data source and type, all data is formatted in exactly the same way
using the following XML DTD (document type definition).

<!ELEMENT artifact    (art_id, art_title, art_content, art_parent, art_type)>
<!ELEMENT art_id      (#PCDATA)>
<!ELEMENT art_title   (#PCDATA)>
<!ELEMENT art_content (#PCDATA)>
<!ELEMENT art_parent  (#PCDATA)>
<!ELEMENT art_type    (#PCDATA)>
For example, the raw data representation of a requirement might be represented as follows:
<artifact>
      <art_id>UC102</art_id>
      <art_title>The image shall rotate 360 degrees</art_title>
      <art_content>The correct image shall be displayed from any angle selected by the user
      </art_content>
      <art_parent>HC032</art_parent>
      <art_type>Requirement</art_type>
</artifact>
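
A consumer of this uniform format can read any artifact with the same few lines of code,
whatever its source. The following sketch uses Python's standard xml.etree.ElementTree
module and is only an illustration: the function name is hypothetical, and the check that
all five child elements are present is a simple stand-in for DTD validation, which this
module does not perform.

```python
import xml.etree.ElementTree as ET

FIELDS = ("art_id", "art_title", "art_content", "art_parent", "art_type")


def parse_artifact(xml_text):
    """Parse one <artifact> element into a dict of its five child elements."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "artifact":
        raise ValueError(f"expected <artifact>, got <{root.tag}>")
    record = {}
    for name in FIELDS:
        child = root.find(name)
        if child is None:
            raise ValueError(f"missing <{name}>")
        # Collapse the line breaks and indentation left by pretty-printing.
        record[name] = " ".join((child.text or "").split())
    return record


raw = """
<artifact>
      <art_id>UC102</art_id>
      <art_title>The image shall rotate 360 degrees</art_title>
      <art_content>The correct image shall be displayed from any angle selected by the user
      </art_content>
      <art_parent>HC032</art_parent>
      <art_type>Requirement</art_type>
</artifact>
"""
artifact = parse_artifact(raw)
```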

   Participants: TraceGUI, TraceManager, Database Access, Concrete Trace Engine
   Messages, in order: GenerateTraceQuery(String, String), RetrieveTermFrequencyData( ),
      Frequency List, CalculateSimilarityScores(Frequency List), Similarity List, Similarity List

                                           Figure 5. Trace Query

    This XML representation is returned to the controller, which sends it through the data
cleanser. Once terms are extracted and stemmed, the controller stores them in the trace
database.
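
    The cleansing step can be sketched in a few lines. This is a simplified illustration,
not the Poirot+ data cleanser: the stop word list is deliberately tiny, the suffix stripping
is a naive stand-in for a real stemmer such as Porter's, phrase detection is omitted, and the
order of the steps is an assumption.

```python
import re
from collections import Counter

# Deliberately tiny stop list and suffix list; both are illustrative assumptions.
STOP_WORDS = {"the", "shall", "be", "from", "any", "by", "a", "an", "of", "to"}
SUFFIXES = ("ing", "ed", "es", "s")


def split_terms(text):
    """Split free text and identifiers such as mLastName into lowercase words."""
    words = []
    for token in re.findall(r"[A-Za-z]+", text):
        # Drop a leading 'm' member prefix, then split where the case changes.
        token = re.sub(r"^m(?=[A-Z])", "", token)
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)
    return [word.lower() for word in words]


def stem(word):
    """Naive suffix stripping that keeps at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def cleanse(text):
    """Return term frequencies after splitting, stop word removal, and stemming."""
    terms = [stem(word) for word in split_terms(text) if word not in STOP_WORDS]
    return Counter(terms)


counts = cleanse(
    "The correct image shall be displayed from any angle selected by the user"
)
```

    For the requirement text used earlier, this sketch keeps the terms correct, image,
display, angle, select, and user, each with a frequency of one; these counts are the kind of
index data that the controller would store.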
    A critical design element of the enterprise level trace architecture is that all artifact data is
represented using the same XML DTD and is therefore uniformly managed by the controller and
the trace retrieval algorithm regardless of its type or source. This enables new 3rd party case
tools or new artifact model types to be added without the need to modify or recompile any of the
components on the Poirot server, and without the need to restart the application.

5.3 Issuing a trace query

    For performance reasons, all trace queries are dealt with locally. The analyst issues a query
through the Trace GUI which is then passed to the trace manager. The trace manager retrieves
the appropriate index data from the database and passes it to the trace engine which performs the
calculations to determine probabilities or similarity scores. These scores are returned by the
trace manager to the trace GUI where they are displayed on the screen. This scenario is
illustrated in Figure 5.

5.4 Request for supporting visual data

    When an analyst reviews and evaluates candidate links, they may request additional
information about a specific artifact. That artifact must then be displayed to the user in its native
format. For example, if an analyst requests additional information about a class within a UML
diagram, then that class must be visually displayed using standard UML notation. To facilitate
the addition of new artifact types, the trace GUI must be able to display any potential type of
artifact without specific knowledge of the artifact type.
    To support this requirement, upon receiving a request for a specific artifact to be displayed,
the concrete diagram adapter invokes the services of the appropriate diagram generator which
generates an SVG (scalable vector graphics) file represented in standard XML. This XML file is
returned via the normal channels to the controller and forwarded to the web client, which
displays it using a standard SVG plug-in. This scenario is depicted in Figure 6.
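
    A diagram generator of this kind can be very small. The sketch below builds an SVG
document for a single UML class box using Python's standard xml.etree.ElementTree module.
It is an illustration only: the function name, the layout constants, and the class being
drawn are hypothetical, and operations, associations, and text measurement are ignored.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def generate_class_svg(name, attributes, width=200, row_height=20):
    """Draw one UML class as a box with a name row and one row per attribute."""
    height = row_height * (len(attributes) + 1)
    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(height))
    ET.SubElement(svg, "rect", x="0", y="0", width=str(width),
                  height=str(height), fill="white", stroke="black")
    # Line separating the class name from its attributes.
    ET.SubElement(svg, "line", x1="0", y1=str(row_height), x2=str(width),
                  y2=str(row_height), stroke="black")
    for row, text in enumerate([name] + list(attributes)):
        label = ET.SubElement(svg, "text", x="5", y=str(row_height * row + 15))
        label.text = text
    return ET.tostring(svg, encoding="unicode")


svg_document = generate_class_svg("ImageViewer", ["angle : int", "image : Image"])
```

    Because the result is plain SVG text, the trace GUI in this sketch needs no knowledge
of UML or of the case tool that produced the data; it only has to hand the document to an
SVG viewer.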

   Participants: TraceGUI, TraceManager, Controller, Resource Manager, Resource Broker,
      MR Diagram Manager, MR Data Manager, Concrete Data Adapter, Concrete Diagram Adapter,
      3rd Party Case Tool
   Requests, in order: GenerateRequestForArtifactDiagram(String), RequestArtifactDiagram(String)
      (three times), RequestArtifactDiagram( ), RequestArtifacts(String, String) (twice)
   Replies, in order: Artifact List (twice), GenerateSVGDiagram(DiagramData List),
      SVG Document (six times, back to the TraceGUI)

                                      Figure 6. Request for supporting visual data

6. Using the Framework

    This enterprise level traceability framework makes a significant contribution to the ongoing
work on dynamic traceability, because it provides a practical solution for implementing dynamic
traceability in industry. A team of researchers at the DePaul Center for Requirements
Engineering has constructed a prototype tool known as Poirot+. Poirot+ is a Java based
application that uses a Tomcat server to support the Trace Manager and RMI to connect the various
managed resource servers. The prototype demonstrates all of the functionality described in this
paper. It has been prototyped using two interchangeable trace engines based on the vector space
model and on a probabilistic network approach, and adapters and corresponding diagram
generators have been constructed to access class diagrams and activity diagrams in Rational Rose
and sequence diagrams in XDE.

6.1 Future Work
     Future work will focus on extending the set of available adapters and diagram generators and
utilizing Poirot+ to support our ongoing pilot study within Siemens. We also intend to create an
open source community for enhancing the architecture and for creating new artifact adapters.

Acknowledgments
   The work described in this paper was jointly funded by NSF grant CCR-0306303 and a grant
from Siemens Corporate Research, with ongoing support from the Siemens Logistics and
Automation plant in Grand Rapids, MI. We also thank the many students and faculty from
DePaul University who contributed to the Poirot prototype. In addition to the authors and
coauthors of this paper, Raffaella Settimi, Xuchang Zou, Chuan Duan, Ossama Ben Khadra,
Jigar Mody, Komtanoo Pinpimai, Grace Bedford, Frederick Strahl, Wiktor Lukasik, and Harold
Streeter all made invaluable contributions to the programming or design of the prototype.

References

[1]    G. Antoniol, G. Canfora, A. De Lucia, G. Casazza, “Information Retrieval Models for
       Recovering Traceability Links between Code and Documentation”, Proc. of the
       International Conference on Software Maintenance, San Jose, California, USA, Oct. 2000,
       pp. 40-51.
[2]    J. Cleland-Huang, R. Settimi, O. BenKhadra, E. Berezhanskaya, S. Christina, “Goal-Centric
       Traceability for Managing Non-Functional Requirements”, Proc. of the 27th
       International Conference on Software Engineering, St. Louis, USA, May 2005,
       pp. 362-371.
[3]    J. Cleland-Huang, R. Settimi, C. Duan, X. Zou, “Utilizing Supporting Evidence to Improve
       Dynamic Requirements Traceability”, Proc. of the 13th IEEE International Requirements
       Engineering Conference, Paris, France, Aug. 2005, pp. 130-139.
[4]    R. Domges and K. Pohl, “Adapting Traceability Environments to Project Specific Needs”,
       Communications of the ACM, Vol. 41, No. 12, 1998, pp. 55-62.
[5]    O. Gotel, A. Finkelstein, “An Analysis of the Requirements Traceability Problem”, 1st
       Intn’l Conference on Requirements Engineering, Colorado Springs, USA, Apr. 1994,
       pp. 94-101.
[6]    J. Huffman-Hayes, A. Dekhtyar, and J. Osborne, “Improving Requirements Tracing via
       Information Retrieval”, Proc. of the 11th IEEE International Requirements Engineering
       Conference, Monterey, CA, USA, Sept. 2003, pp. 138-150.
[7]    J. Huffman-Hayes, A. Dekhtyar, S. Karthikeyan Sundaram, S. Howard, “Helping Analysts
       Trace Requirements: An Objective Look”, Proc. of the 12th International Requirements
       Engineering Conference, Kyoto, Japan, Sept. 2004, pp. 249-259.
[8]    M. Jarke, “Requirements Traceability”, Communications of the ACM, Vol. 41, No. 12,
       Dec. 1998, pp. 32-36.
[9]    J. I. Maletic, E. V. Munson, A. Marcus, and T. N. Nguyen, “Using a hypertext model for
       traceability link conformance analysis”, Proc. of the 2nd Intn’l workshop on Traceability in
       Emerging Forms of Soft. Eng., Montreal, Oct. 2003, pp. 47-54.
[10]   A. Marcus, J. I. Maletic, “Recovering Documentation-to-Source-Code Traceability Links
       using Latent Semantic Indexing”, Proc. of 25th IEEE Intn’l Conference on Software
       Engineering, Portland, Oregon, USA, May 2003, pp. 125-137.
[11]   R. Settimi, J. Cleland-Huang, O. BenKhadra, J. Mody, W. Lukasik, and C. DePalma,
       “Supporting Change in Evolving Software Systems through Dynamic Traces to UML”,
       Proc. of the 7th IEEE International Workshop on Principles of Software Evolution, Kyoto,
       Japan, Sept. 2004, pp. 49-54.
[12]   X. Zou, R. Settimi, J. Cleland-Huang, C. S. Miller, “Supporting Trace Evaluation with
       Confidence Scores”, Proc. of Requirements Engineering Decision Support, Paris, Aug.
       2005, pp. 1-6.
[13]   S.K.M. Wong and Y.Y. Yao, “A probabilistic inference model for information retrieval”,
       Information Systems, Vol. 16, No. 3, pp. 301-321.