7201ba2a-66a6-4db6-8a41-a3cdae7e7010.doc 1
Static Support for Understanding SOA Descriptions:
Exploring the Requirements
Laura White, Thomas Reichherzer, Norman Wilde, John Coffey
{lwhite|treichherzer|nwilde|jcoffey}@uwf.edu
Douglas Leal, Joshua Dault, Juan Gil Restrepo, David Kaczynski,
{ddl12|jbd16|jg38|dak19}@students.uwf.edu
Executive Summary
Service Oriented Architecture (SOA) has emerged as a way of providing flexibility to
large-scale software systems. However, there may be problems in understanding and
maintaining software constructed using this new paradigm. This report summarizes some
of the issues that have been discussed and analyzes the requirements for static analysis
tools to aid SOA maintainers. It also describes ongoing work on SOAMiner, a text search
tool tuned to the analysis of SOA description files such as WSDL's, XSD's, and BPEL's.
SOAMiner is currently under development following a spiral model to clarify requirements
through repeated evaluations of prototypes.
This report may be cited as S2ERC-TR-303, Security and Software Engineering
Research Center (S2ERC), http://www.serc.net, July 1, 2010.
Table of Contents
1 Introduction and Motivation
2 Program Comprehension Tools and SOA
3 Static SOA Program Comprehension in Context
4 The SOAMiner tool
5 Initial Studies with SOAMiner
5.1 Case Study Sources
5.1.1 The Travel Reservations Service
5.1.2 A WSDL from MicroPAVER™
5.1.3 SOA Descriptions Harvested from the Web
5.2 Scalability Study
5.3 Basic Maintenance Scenario Study
5.4 Locating Data Type Usages
6 Conclusions
7 Acknowledgements
8 References
Appendix A - Results of the Basic Maintenance Scenario Study
1 INTRODUCTION AND MOTIVATION
In recent years many organizations have turned to Service Oriented Architectures (SOA) as a way
to structure large software systems. While there are many different views of SOA, most of them
describe applications structured as a collection of services, running on different nodes, and
loosely coupled by exchange of messages via a layer of SOA infrastructure, sometimes called an
Enterprise Service Bus (Figure 1).
Figure 1 – Structure of a Service Oriented Architecture Application
Since the emergence of the SOA architectural style in the early 2000's, some concern has been
expressed about how this new generation of computer applications will be maintained.
Maintenance has always been the most expensive phase of the software life cycle, primarily due
to the need to sustain understanding of complex code as it grows, often losing structure in the
process, and is handed off from one group of software engineers to another. Software changes, be
they bug fixes or enhancements, become much more risky and time-consuming as knowledge is
lost. There seems to be little reason why these same issues will not emerge over time with SOA.
In fact, some have described SOA systems as being in continuous evolution (or permanent beta)
as soon as they are deployed [KONT:2008].
We will explore possible requirements for static analysis tools to aid SOA software engineers and
specifically will describe ongoing work to create SOAMiner, a software engineer's search tool
that users might think of as a Google* for SOA.
An initial motivation for the development of SOAMiner came from working with students on an
introductory SOA tutorial distributed as part of the Netbeans development environment. The
Travel Reservations Service [KOVAL:2008] is intended to be a simple example of the use of
BPEL to orchestrate services, and consists of a BPEL module and three partner services. The
partners are simply stubs, designed to simulate reserving airline seats, hotel rooms and rental cars.
Yet the whole example once deployed consists of 129 files distributed across 49 directories, not
counting files actually deployed to the server (Table 1). While the tutorial went smoothly as a
demonstration of Netbeans' BPEL capabilities, as novices we found ourselves completely
bewildered by the multiplicity of components it used.
* Google is a trade mark of Google, Inc.
Table 1 - Size of the Travel Reservation Service Example
              Initially   After Deploy        After Deploy
                          Partner Services    Composite Application
Directories   26          32                  42
Files         76          105                 129
We hypothesized that any future maintainer would encounter equal bewilderment if faced with
the need to modify such a system. Obviously some aid to navigation through the mass of material
could be useful, and a SOAMiner search engine tuned to SOA requirements seemed to be a
relevant and understandable analogy.
Further study of the Travel Reservation Service and other examples indicated that much of the
complexity is in the files that serve to tie the application together and to deploy it to an
application server. For lack of a better term we call these SOA description files; they include:
- Web Service Definition Language (WSDL) files, which specify the interfaces of web
  services and the addresses on which they are deployed
- XML Schema Definition (XSD) files, which may define data types used in messages
- Business Process Execution Language (BPEL) files, used to specify the orchestration of
  services
- A variety of XML files of different types, apparently containing mappings used to
  provide information to the programming environment or to the application server where
  the services will be deployed.
These SOA description files provide a complex web of information that may provide essential
background for a software maintenance task. For example a data type may be described in an
XSD file, and then referred to in a message description within a WSDL file, which is then
mapped to a service operation within that same WSDL, which specifies the URL where the
service may be accessed, which is in turn mapped to specific EJB's by an XML deployment
descriptor. A software maintainer may need to comprehend this web of relationships to fully
understand the consequences of any change to the data type.
2 PROGRAM COMPREHENSION TOOLS AND SOA
There is a fairly extensive body of literature on the comprehension of pre-SOA styles of software
including reports of research that has been backed up by experiments or careful case studies. It is
clear from this literature that experienced software engineers use a pragmatic, as-needed strategy
in studying unfamiliar code. They rarely attempt to understand a large program in its entirety, but
rather seek out those parts that are essential for the specific task they have at hand [KOEN:1991].
This finding provides the motivation for the development of tools to help software engineers
locate and browse code using different criteria.
The actual mental processes used during comprehension are complex. For example von
Mayrhauser and Vans observed experienced software engineers as they worked and noted that
they switch back and forth between different perspectives: a program model (overall control flow
of the code), a situation model (functional and data flow abstraction), and a top-down model
(knowledge of the application domain) [VONM:1994]. The conclusions of this line of research
emphasize the rapid mental switching involved as engineers recognize "beacons" such as variable
names or code patterns, and extract information from multiple sources. This view puts a premium
on agile tools that can give answers quickly and play well within the engineer's development
environment.
For the specific case of SOA applications, there is little published work on program
comprehension that is based on experimental research but there have been a number of
discussions in the literature of the potential problems to be expected. In a panel discussion at the
2004 International Conference on Software Maintenance several of the panelists focused on
organizational and software process changes that may become necessary [KAJK:2004]. Kajko-Mattsson
and Tepczynski later elaborated further on these suggested organizational changes and
on the concept of "Service Centers" to specialize in the maintenance of web services
[KAJK:2005]. Gold et al. describe comprehension issues in scenarios in which applications are
composed dynamically, possibly differently on every invocation, using broker services that may
not disclose their inner workings [GOLD:2004b]. Wilde et al. discuss proposals for understanding
specific features in SOA applications, based on their experiences with earlier kinds of distributed
software [WILDE:2008].
Gold and Bennett provide some interesting experience based on the development of a prototype
health information service [GOLD:2004a]. This system involved integrating information from a
wide range of health service providers, and not surprisingly they found that the integration of
multiple changing data models and ontologies presents significant challenges. Either interfaces
must be tightly coordinated among participating organizations or code must be constructed to
cope with minor interface changes. Tracing of execution patterns via "audit services" could help
both program comprehension and debugging, especially if services are composed on-the-fly.
More recently, two papers at the 2008 Frontiers of Software Maintenance workshop addressed
SOA. Lewis and Smith [LEWIS:2008] discuss some of the issues related to the evolution of SOA
systems, notably the problems of dealing with distributed systems with multiple owners and the
comprehension difficulties of having expertise to deal with multiple languages and operating
environments. Kontogiannis also discussed the multi-language issues and the need for processes
to support the continuous incremental evolution characteristic of deployed SOA applications
[KONT:2008].
3 STATIC SOA PROGRAM COMPREHENSION IN CONTEXT
The history of static support for program comprehension shows an interesting evolution from
simpler to more complex tools (see Table 2).
Table 2 - The Evolution of Static Program Comprehension
Level  Tool Category            Characteristics
1      cross-referencing,       single source file, no user interface (hardcopy),
       indexing                 byproduct of the compilation process
2      text search, regular     multiple file search, initially command line interface,
       expression               later GUIs
3      graph model of impacts   database of whole software system, various query
       or dependencies          interfaces, tracing of chains of relationships
4      design recovery          specialized tools for recovery of specific
                                abstractions assumed to be useful to maintainers
From the 1960's, compilers have often provided a cross reference listing of source code as a
byproduct of the compilation process. This simple list of identifiers giving the line numbers
where each was used was a significant aid for software maintainers, especially at a time when the
use of global data was much more prevalent than it has now become. For example the cross
reference helped a maintainer to understand data flows since he could see all the places where a
particular variable was set and accessed. This kind of tracing became, of course, more and more
tedious as programs grew and separate compilation units became common.
In the 1970's programmers gained continuous access to source code through time sharing
terminals. Source code was now more often split across an increasing number of files so multi-
file search tools were needed. A classic example was the grep tool for regular expression search
which has continued to be available in most Unix environments [OPEN:2004]. Regular
expression matching provided more freedom in expressing queries, but at the cost of possible
false matches to irrelevant code and comments. Tracing effects through code continued to be
difficult, requiring the maintainer to locate each 'hit' in the code, evaluate its relevance, and then
possibly generate new queries based on the evaluation. Regular expression search tools are now
commonly built into programming environments such as Eclipse and Netbeans and continue to be
a maintenance programmer's favorite for many tasks.
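The trade-off described above can be illustrated with a minimal sketch of a multi-file regular expression search. It is not the grep implementation; the file names and contents are made up for the example, and the files are held in memory to keep the sketch self-contained.

```python
import re

def grep(pattern, files):
    """Yield (file name, line number, line) for each line matching `pattern`."""
    regex = re.compile(pattern)
    for name, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield name, number, line

# Hypothetical source files, invented for illustration.
FILES = {
    "billing.c": "int total;\n/* reset total here */\ntotal = 0;\n",
    "report.c": "print(subtotal);\nprint(total);\n",
}

# The word-boundary pattern skips "subtotal" but still reports the comment,
# which the maintainer must then judge to be relevant or not.
for hit in grep(r"\btotal\b", FILES):
    print(hit)
```

The comment line in billing.c is the kind of 'hit' that the maintainer has to evaluate by hand; the search tool itself has no notion of code versus comment.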
A third level of development emerged in the 1980's and 1990's with tools founded on more
sophisticated models of the source code. These models were typically based on graphs of some
sort, where the nodes represented different entities in the source code (e.g. variables, data types,
classes, functions or methods) and the arcs represented different kinds of relationships between
them. Tools for relationship extraction [CHEN:1990], program slicing [GALL:1991],
dependency analysis [LINO:1994] and impact analysis [QUEI:1994] fall into this category. Many
of the tools at this level aimed to reduce the tedium of tracing effects through the code by
allowing queries about chains of relationships, for example, "show all parts of the code that affect
the value of variable X".
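A query over chains of relationships can be sketched as reachability in a directed graph. The example below is a simplified illustration under stated assumptions, not a reconstruction of any of the cited tools: the graph and its node names are hypothetical, and a single "affects" relationship stands in for the several kinds of arcs such tools record.

```python
from collections import deque

# Hypothetical dependency graph: an arc A -> B means "A affects B".
AFFECTS = {
    "read_input": ["total"],
    "total": ["average", "X"],
    "count": ["average"],
    "average": ["X"],
    "X": ["report"],
}

def affecting(graph, target):
    """Return every node from which `target` can be reached."""
    # Reverse the arcs so the search can walk backwards from the target.
    reverse = {}
    for source, destinations in graph.items():
        for destination in destinations:
            reverse.setdefault(destination, []).append(source)
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for predecessor in reverse.get(node, []):
            if predecessor not in seen:
                seen.add(predecessor)
                queue.append(predecessor)
    return seen

# "Show all parts of the code that affect the value of variable X."
print(sorted(affecting(AFFECTS, "X")))
```

The tool, rather than the maintainer, follows each chain to its end, which is the tedium these Level 3 tools aimed to remove.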
The fourth level of sophistication is seen in tools that provide or support design recovery. At this
level the toolmaker attempts to abstract up from the code to a higher level representation and
provide a concise response to questions maintainers are presumed to ask about the code. There
are many such tools in the literature with specialized goals and approaches, e.g. [MURPH:1997],
[DILUC:2000]. Gueheneuc, Mens and Wuyts provide a classification scheme [GUEH:2006].
Perhaps because of this specialization, relatively few of these tools have passed into widespread
use.
In thinking about tool requirements, we see three interesting progressions as we move up this
scale of tools. At the lower levels the tool does much less work and more is left to the software
maintainer. That has benefits as well as costs since maintainers generally know quite a lot about
the code they are faced with. They are likely to be familiar with the problem domain, with the
run-time environments, and possibly with coding practices used in development. Even
maintainers lacking these advantages may have access to colleagues who can help them over the
rough spots. Thus in formulating queries and viewing results they have a substantial advantage
over a software tool, no matter how sophisticated it may be.
A second progression derives from the first. Since at the lower levels the maintainer does most of
the work, the requirements for the tool are relatively straightforward. If the maintainer needs to
understand a variable it is up to him to seek it in the cross reference or formulate a regular
expression query. At this level the tool designer is not directly concerned with the thought
processes or the work flow of the maintainer.
However as the tool tries to do more, it becomes important that the 'more' should actually map to
real maintainer tasks and thought processes. The tool designer needs to understand typical
maintainer problems and craft a user interface that will present solutions to these problems in an
easy-to-use and intuitive way. The user of grep does not need to understand how it processes
regular expressions. The user of more sophisticated tools should not need to understand either the
structure of the graph it uses or the subtleties of the output it displays.
Which brings us to our third progression: ease of use, or perhaps more important, ease of first use.
For over 20 years our program comprehension research group has been working with industrial
software engineers in the Security and Software Engineering Research Center - S2ERC
(http://www.serc.net) [WILDE:2007]. Our experience is that maintainers of real industrial
software are very busy, and have little leisure to explore new tools. Tools that take more than a
few hours to install and run, or that require extensive practice to use effectively are unlikely to
become part of their toolkit. The cross referencing and text search tools are favored because they
require little or no setup and can be used quickly across a range of maintenance tasks with little
training. More sophisticated tools tend to need more time-consuming setup, as well as more
expertise to use.
4 THE SOAMINER TOOL
Our current SOAMiner prototype is firmly located at Level 2 of Table 2. We feel that it is too
early to develop more elaborate tools since there is still little experience with the practical
maintenance of SOA systems. We simply do not know what tasks and questions maintainers will
encounter. We still need to explore the diversity of different SOA designs and interact with
industry software engineers before we can define the requirements for higher level tools.
So our initial goal is to support maintainers of SOA applications with an easy-to-use text search
tool for SOA description files while keeping within the well known mental model of a web search
engine. The tool should provide initial benefits quickly; our goal is less than one hour from tool
download to first results.
Our initial SOAMiner prototype is based on Apache Solr [APAC:2010]. Solr is a widely-used
open source text search system that runs within a servlet container and may be accessed from a
web browser. To use Solr, the administrator creates a schema describing the different fields in
each document he wishes to make searchable along with indexing and querying options. He then
parses the input documents to create the specified fields and posts the result into Solr's index.
Users may then make queries using a web browser interface.
The design of the Solr schema is important since it determines the kinds of queries that can be
made and the results that will be returned [SMIL:2009, chapter 2]. An important decision
concerns the granularity of the index. The grep tool searches text that is divided into lines and
echoes the lines that match the user's query. Most web search engines index files and return a link
to each complete file that matches the query. We noted that most of the SOA description files
(WSDL, XSD, etc.) have an XML format and that most of the information is given in attributes
within tags (See Figure 2). For such files line endings are arbitrary, and complete files may be
quite complicated so neither granularity is really appropriate. We thus hypothesized that the best
unit to index would be the XML tag and we decided to focus on queries that match or partially
match the values of tag attributes.
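The tag-level granularity can be sketched as follows. This is a minimal illustration and not the SOAMiner parser: the WSDL fragment, the record layout and the function names are assumptions made for the example, and a plain substring match stands in for the Solr query.

```python
import xml.etree.ElementTree as ET

# Made-up WSDL fragment, for illustration only.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="VehicleService">
  <message name="ReserveVehicleIn">
    <part name="itinerary" type="ota:TravelItinerary"/>
  </message>
  <message name="CancelVehicleOut">
    <part name="status" type="xs:boolean"/>
  </message>
</definitions>"""

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def tag_records(xml_text, file_name):
    """Produce one index record per XML tag, holding that tag's attributes."""
    root = ET.fromstring(xml_text)
    records = []
    for number, element in enumerate(root.iter()):
        records.append({
            "id": f"{file_name}#{number}",  # hypothetical id scheme
            "file": file_name,
            "tag": local_name(element.tag),
            "attributes": dict(element.attrib),
        })
    return records

def search(records, text):
    """Return records with an attribute value containing `text`, ignoring case."""
    needle = text.lower()
    return [record for record in records
            if any(needle in value.lower() for value in record["attributes"].values())]

records = tag_records(SAMPLE_WSDL, "vehicle.wsdl")
for hit in search(records, "vehicle"):
    print(hit["tag"], hit["attributes"])
```

Each hit is a single tag with its attributes, which is a smaller unit than a whole file and a more meaningful one than a line of text.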
Figure 2 - Part of a WSDL File Showing XML Structure
There are three tags, , and , with
matching closing tags and
SOASearch, our current user interface used for searching SOAMiner, is a JavaScript application
based on AJAX-Solr, an open source library [AJAX:2010]. SOASearch runs in a web browser
and provides a window divided into two panes (see Figure 3). Users make queries in the left pane
and the usual strategy is to make a general query and then narrow it down as needed. This pane
contains "tag clouds" showing the most common file types, file names, XML tag names, etc. in
the most recent query response. The user can click on these tags or enter search text to restrict the
query.
The right pane displays the current results, paged 10 at a time. At the moment it simply displays
the data stored in Solr's index for each matching XML tag.
Figure 3 - SOAMiner Search Interface
5 INITIAL STUDIES WITH SOAMINER
5.1 Case Study Sources
SOAMiner is still in early stages of a spiral development process so the studies done with it so far
are not evaluations of a finished tool, but rather intended to provide feedback on our design
decisions and to surface unanticipated requirements.
We have used three data sets in these initial evaluations, the Travel Reservation Service, a WSDL
from the MicroPAVER™ civil engineering application, and a collection of SOA description files
harvested from the Web.
5.1.1 The Travel Reservations Service
The Travel Reservations Service is the Netbeans example described earlier as a motivation of this
project [KOVAL:2008]. Disregarding duplicates and many miscellaneous XML files, we were
left with three distinct WSDL's (one each for airline, hotel and rental car reservations), one large
XSD with travel industry standard data types, and a BPEL file for the program to orchestrate the
services.
5.1.2 A WSDL from MicroPAVER™
MicroPAVER is a large software application widely used by civil engineers in managing the
maintenance of pavement installations such as roads and airport runways [APWA:2009]. It is
implemented as a large collection of services programmed using Windows Communication
Foundation (WCF). Most of these services are tightly coupled and WSDL's are not normally used
internally but they can be generated by the WCF software to allow external access. We used one
very large generated WSDL of over 1 MB that may be representative of SOA descriptions
generated automatically from large legacy components*.
5.1.3 SOA Descriptions Harvested from the Web
A third data set of WSDL's and XSD's was collected from the Web to provide both test data for
SOAMiner and a rough snapshot of the current state of practice in service design. A crawler was
written that generated automatic Web queries to select and download WSDL and linked XSD
files. The Web queries included various keywords and type specifications to select URLs that
match WSDL or XSD files. Keywords were selected from the vocabulary of WSDL and XSD
files as well as glossaries from different subjects, to select files from a wide range of applications.
Among matching results, up to 100 Web documents were downloaded using the URLs returned
by the search engines. The documents were subsequently filtered to match WSDL or XSD files.
In addition, WSDL files were analyzed to collect any XSD files describing data structures within
the WSDL files. The result was a data set with 1513 WSDL and XSD files.
5.2 Scalability Study
As has been mentioned, we believe that ease of first use is an important design goal for
SOAMiner. Since it may be used on large systems with many files we ran an initial scalability
study to make sure that the choice of Solr and our design of the Solr schema were not
compromising SOAMiner's ability to rapidly index and query data sets of various sizes and
complexity.
We made two timed tests, one with the single large WSDL from MicroPAVER to stress memory
use in our parser, and a second with the entire 1513 file data set harvested from the Web to stress
Solr's index. Both tests were run on a MacBook™ Pro with an Intel Core™ 2 Duo 2.8GHz
processor, 6MB of L2 cache and 4GB of RAM. We measured the clock time required to parse the
input files and the time to post the resulting data into the Solr index. The results are shown in
Table 3, along with the data set size, measured as the total number of input XML tags that were
indexed.
* We would very much like to thank Dr. Arthur Baskin of Intelligent Information Technologies, a S2ERC
affiliate company, for providing us with this data as well as with many insights into the way services are
used within MicroPAVER and associated programs.
Table 3 - Times to Parse and Load into SOAMiner
Data Set                         Total Files   Total Size (tags)   Parse Time (sec)   Post Time (sec)
MicroPaver WSDL                  1             12,818              166.63             20.13
Web Harvested WSDL's and XSD's   1513          529,127             1278.89            955.93
We made several test queries against each data set and found that the response time was only a
few seconds in all cases.
These results suggest that SOAMiner will scale well with both the size of data sets and the size of
individual files. Parse, load and query times are within acceptable limits.
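The figures in Table 3 can be restated as per-tag rates, which makes the two data sets easier to compare. The short sketch below does only that arithmetic; the numbers are copied from Table 3 and the function name is ours. On these figures, parsing ran at roughly 77 tags per second for the single large WSDL and roughly 414 tags per second for the harvested set.

```python
# Figures copied from Table 3: (tags indexed, parse seconds, post seconds).
TABLE_3 = {
    "MicroPaver WSDL": (12_818, 166.63, 20.13),
    "Web harvested WSDL's and XSD's": (529_127, 1278.89, 955.93),
}

def throughput(tags, seconds):
    """Tags processed per second of clock time."""
    return tags / seconds

for name, (tags, parse_seconds, post_seconds) in TABLE_3.items():
    print(f"{name}: parse {throughput(tags, parse_seconds):.0f} tags/s, "
          f"post {throughput(tags, post_seconds):.0f} tags/s")
```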
5.3 Basic Maintenance Scenario Study
A usability study was conducted with the initial prototype of SOAMiner to evaluate current
capabilities and to identify additional requirements. A think aloud protocol was used with two
participants engaging in two predefined software maintenance scenarios using the Travel
Reservations Service described earlier. Participants were encouraged to verbalize their thoughts
as they performed activities related to software maintenance, while observers recorded times,
comments, and participant behavior.
To help isolate usability factors related to text search tools in general as opposed to usability
issues with SOAMiner in particular, each participant performed one of the scenarios using grep
and the other using SOAMiner. Both participants were students with some reading experience
with WSDL's and XSD's, but without practical experience working with such documents.
Accordingly the study was preceded with approximately three hours of basic orientation related
to XML Schema, WSDLs, BPEL, SOAMiner, and grep.
The first maintenance scenario was fairly simple and involved locating where the URL for a
particular service was defined. The second scenario involved a hypothetical bug involving failed
cancellations of vehicle reservations; this was more difficult in that it required tracing through
several of the SOA description files to understand how message return data was defined.
Both participants, one using grep and one using SOAMiner, were able to get the correct answers
for the first scenario. The participant using grep was able to answer the questions within 15
minutes while the participant using SOAMiner took 25 minutes, the difference being entirely
attributable to the time required to go through SOAMiner's indexing procedure.
The second scenario was more challenging and neither participant was able to correctly answer
all of the questions. The main difficulty seemed to be that both participants only had a novice
level of familiarity with BPEL, WSDL, and XSD and neither tool was sufficient to substitute for
this lack of background knowledge. One specific problem was that the WSDL contained strings
such as "CancelVehicleOut" and so the participants searched on variants of "CancelVehicle".
However in the XSD the data type they were looking for was called "CancellationStatus" and so
was not found. A SOA expert would be able to trace the point in the WSDL where the
terminology changed but novices could not make the connection.
The most important result of this study is the list of suggested improvements to SOAMiner as
given in Appendix A.
5.4 Locating Data Type Usages
One final study explored a task that we think may be typical for users of SOAMiner -
understanding usages of data types. The WSDL 1.1 specification provides wide latitude to service
developers as to how data types are declared. There are three possible strategies which may be
combined within any WSDL. The data contained within a particular message may be declared:
(1) by reference in an optional section to an external XML Schema document (2) by
using the XML Schema namespace and coding XML Schema-formatted data types in the
section of the WSDL itself or (3) if only the 44 simple types in the XML Schema
recommendation are used, by coding them directly in tags. Unfortunately this
flexibility means that maintainers may often be faced with WSDL's written in an unfamiliar style.
For the Travel Reservation Service we imagined a scenario in which a maintainer needed to
understand the data used in the input message to reserve a vehicle. A WSDL expert would know
that data types are often declared in tags within a tag. In SOAMiner it was
easy to restrict to WSDL files and to tags and then search for "vehicle". This
immediately finds the four matching messages (Figure 3). However it was more difficult to
navigate to the tag contained within the ReserveVehicleIn message. The only solution was
to search for the "tag child Id", an arbitrary unique string generated during parsing. While that
method works, it would be desirable to have a more-intuitive way of navigating up and down the
hierarchy of tags.
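One way to support such navigation would be to record explicit parent and child identifiers for every tag at parse time. The sketch below is only an illustration of that idea, not the current SOAMiner parser: the WSDL fragment is made up, the sequential ids and record layout are assumptions, and it relies on the WSDL 1.1 layout in which a message element contains part elements.

```python
import xml.etree.ElementTree as ET

# Made-up WSDL fragment, for illustration only.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="ReserveVehicleIn">
    <part name="itinerary" type="ota:TravelItinerary"/>
  </message>
</definitions>"""

def index_with_parents(xml_text):
    """Return {id: record}; each record stores its parent's id and its children's ids."""
    records = {}

    def visit(element, parent_id):
        my_id = len(records)  # sequential ids, assigned in document order
        records[my_id] = {
            "tag": element.tag.rsplit("}", 1)[-1],  # drop the namespace prefix
            "attributes": dict(element.attrib),
            "parent": parent_id,
            "children": [],
        }
        if parent_id is not None:
            records[parent_id]["children"].append(my_id)
        for child in element:
            visit(child, my_id)

    visit(ET.fromstring(xml_text), None)
    return records

records = index_with_parents(SAMPLE)
message_id = next(i for i, r in records.items()
                  if r["attributes"].get("name") == "ReserveVehicleIn")

# Step down from the message to the tags it contains.
for child_id in records[message_id]["children"]:
    child = records[child_id]
    print(child["tag"], child["attributes"].get("type"))
```

With both directions stored, a user interface could offer "parent" and "children" links on each result instead of asking the user to search for a generated identifier.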
Once the tag for ReserveVehicleIn was found, SOAMiner showed immediately that its
input data type is “ota:TravelItinerary”. The obvious next step was to do an unrestricted search
for “TravelItinerary”, but that produced thousands of hits because all tags in the file named
OTA_TravelItinerary.XSD match the query! Thus the file containing the type definition was
quickly identified, but the current Solr search interface does not provide any easy way to search
for that specific string within that file. The Solr schema should probably be adjusted to avoid
matches to file names or to give low weight to such matches.
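As one possible way to give file name matches a low weight, Solr's DisMax query parser accepts a qf parameter listing the fields to search together with a boost for each. The sketch below only builds such a query string and sends nothing to a server; the field names attr_values and file_name and the boost values are hypothetical, and this is not necessarily the adjustment we will adopt.

```python
from urllib.parse import urlencode

def build_query(text, boosts):
    """Build Solr query-string parameters that weight fields differently."""
    # DisMax "qf" syntax: space-separated field^boost pairs.
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    return urlencode({"q": text, "defType": "dismax", "qf": qf})

# Hypothetical field names; a real schema would define its own.
params = build_query("TravelItinerary", {"attr_values": 10, "file_name": 0.1})
print(params)
```

Under such a weighting, tags whose attribute values match the query would be ranked ahead of tags that match only because of the name of the file they came from.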
6 CONCLUSIONS
This report has discussed some of the problems that software maintainers may face when trying
to understand the large SOA applications which are now coming into service. It also described the
ongoing development of SOAMiner, a proposed search tool that users might think of as a Google
for SOA. Since there is little documented experience in the maintenance of SOA applications, we
do not know clearly what maintainers will need, so SOAMiner is being developed following a
spiral process with repeated evaluation of prototypes.
The evaluations described in this report will be used to guide the development of the next version
of SOAMiner, which we hope may then be ready for trials at S2ERC industrial affiliates. The top
priorities identified for the next cycle are:
1) Provide a better user interface and a more agile setup and load procedure for indexing
SOA description files into SOAMiner.
2) Redesign the panel showing SOASearch output (the right panel in Figure 3) so that it
conveys more information about the tags that matched the query and the context in which
those tags exist. One possibility would be to integrate with a text display that would show
the query results highlighted on the original XML file.
3) Miscellaneous cleanups to our parser and to the Solr schema to avoid searching on file
names and paths and to provide better information for use in the redesigned output panel.
7 ACKNOWLEDGEMENTS
Work described in this paper was partially supported by the University of West Florida
Foundation under the Nystul Eminent Scholar Endowment. We would also like to thank Dr.
Arthur Baskin of IIT, a S2ERC affiliate, for his guidance.
8 REFERENCES
[AJAX:2010] Evolvingweb's AJAX-Solr, http://github.com/evolvingweb/ajax-solr, link
accessed June 2010.
[ALMOD:2010] Almonaies, Asil A., Cordy, James A. and Dean, Thomas R., Legacy System
Evolution towards Service-Oriented Architecture, International Workshop on
SOA Migration and Evolution SOAME 2010, Madrid, March 2010, pp. 53-62,
ISBN 978-3-00-030627-3.
[APAC:2010] Apache Software Foundation, Apache Solr. http://lucene.apache.org/solr/
link accessed June 2010.
[APWA:2009] APWA - American Public Works Association, MicroPAVER 6.1.5 Pavement
Maintenance Management System,
http://www.apwa.net/About/SIG/Micropaver/ link accessed June 2010.
[CHEN:1990] Chen, Yih-Farn; Nishimoto, Michael; Ramamoorthy, C. V., "The C
Information Abstraction System", IEEE Transactions on Software
Engineering, Vol. 16, No 1, pp. 325 - 334.
[DILUC:2000] Di Lucca, Giuseppe Antonio; Fasolino, Anna Rita; De Carlini, Ugo,
"Recovering Class Diagrams from Data-Intensive Legacy Systems"
Proceedings International Conference on Software Maintenance, ICSM-
2000, San Jose, October 2000, pp. 52 - 63.
[GALL:1991] Gallagher, Keith B. and Lyle, James R., "Using Program Slicing in Software
Maintenance" IEEE Transactions on Software Engineering, Vol. 17, No. 8,
August 1991, pp. 751 - 761.
[GOLD:2004a] Nicolas Gold, Keith Bennett, "Program Comprehension for Web Services,"
pp.151, 12th IEEE International Workshop on Program Comprehension
(IWPC'04), 2004
[GOLD:2004b] Nicolas Gold, Claire Knight, Andrew Mohan, Malcolm Munro,
"Understanding Service-Oriented Software," IEEE Software, vol. 21, no. 2,
pp. 71-77, Mar./Apr. 2004, doi:10.1109/MS.2004.1270766.
[GUEH:2006] Gueheneuc, Y.-G.; Mens, K.; Wuyts, R., "A comparative framework for
design recovery tools," Proceedings of the 10th European Conference on
Software Maintenance and Reengineering, 2006. CSMR 2006, pp.123 -134,
March 2006, doi: 10.1109/CSMR.2006.1.
[KAJK:2004] Mira Kajko-Mattsson, "Evolution and Maintenance of Web Service
Applications," pp.492-493, 20th IEEE International Conference on Software
Maintenance (ICSM'04), 2004.
[KAJK:2005] Mira Kajko-Mattsson, Michal Tepczynski, "A Framework for the Evolution
and Maintenance of Web Services," pp.665-668, 21st IEEE International
Conference on Software Maintenance (ICSM'05), 2005.
[KOEN:1991] Koenemann, J. and Robertson, S. P. 1991. Expert problem solving strategies
for program comprehension. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems: Reaching Through Technology (New
Orleans, Louisiana, United States, April 27 - May 02, 1991). S. P. Robertson,
G. M. Olson, and J. S. Olson, Eds. CHI '91. ACM, New York, NY, 125-130.
DOI= http://doi.acm.org/10.1145/108844.108863
[KONT:2008] Kostas Kontogiannis, Challenges and Opportunities Related to the Design,
Deployment and Operation of Web Services, Frontiers of Software
Maintenance (FoSM) 2008, Beijing, Sept. - Oct. 2008, pp. 11-20.
[KOVAL:2008] Anastasia Koval, Understanding the Travel Reservation Service,
http://netbeans.org/kb/61/soa/understand-trs.html, link accessed June, 2010.
[LEWIS:2008] Lewis, G. A. and Smith, D. B., Service-Oriented Architecture and its
implications for software maintenance and evolution, Frontiers of Software
Maintenance (FoSM) 2008, Beijing, Sept. - Oct. 2008, pp 1-10.
[LINO:1994] Linos, Panagiotis; Aubet, Philippe; Dumas, Laurent; Helleboid, Yann;
Lejeune, Patricia; Tulula, Philippe, "Visualizing Program Dependencies: An
Experimental Study" Software - Practice and Experience, Vol. 24, No. 4,
April 1994, pp. 387 - 403.
[MURPH:1997] Murphy, Gail and Notkin, David, "Reengineering with Reflexion Models: A
Case Study", IEEE Computer, Vol. 30, No. 8, August 1997, pp. 29 - 36.
[OPEN:2004] The Open Group, grep - The Open Group Base Specifications Issue 6,
http://www.opengroup.org/onlinepubs/009695399/utilities/grep.html, link
accessed June, 2010.
[PANCH:2007] Oleksandr Panchenko: Concept Location and Program Comprehension in
Service-Oriented Software. 23rd IEEE International Conference on Software
Maintenance (ICSM 2007), October 2-5, 2007, Paris, France, ICSM 2007:
513-514
[QUEI:1994] Queille, J.-P.; Voidrot, J.-F.; Wilde, N.; Munro, M., "The Impact Analysis
Task in Software Maintenance: a Model and a Case Study", Proc. IEEE
International Conference on Software Maintenance - 1994, Victoria, Canada,
September 1994, pp. 234 - 242.
[SMIL:2009] David Smiley and Eric Pugh, Solr 1.4 Enterprise Search Server, Packt
Publishing Ltd., Birmingham UK, 2009, ISBN 978-1-847195-88-3.
[VONM:1994] von Mayrhauser, A. and Vans, A. M., Dynamic Code Cognition Behaviors
for Large Scale Code, Proceedings Third Workshop on Program
Comprehension, November 14-15, 1994, Washington, DC, IEEE Computer
Society, pp. 74-81.
[WILDE:2007] Norman Wilde, Dennis Edwards, Sharon Simmons, "Software
Reconnaissance: Experiences with a Simple Requirements Traceability
Technique", International Symposium on Grand Challenges in Traceability,
TEFSE/GCT’07, March 22-23, 2007, Lexington, KY, USA, pp. 103 - 107.
[WILDE:2008] Wilde, N., Simmons, S., Pressel, M., and Vandeville, J. 2008. Understanding
features in SOA: some experiences from distributed systems. In Proceedings
of the 2nd international Workshop on Systems Development in SOA
Environments (Leipzig, Germany, May 11 - 11, 2008). SDSOA '08. ACM,
New York, NY, 59-62. DOI= http://doi.acm.org/10.1145/1370916.1370931
APPENDIX A - RESULTS OF THE BASIC MAINTENANCE SCENARIO STUDY
Observations from Scenario 1
Both participants, one using grep and one using SOAMiner, were able to correctly answer
questions related to the maintenance of the hotel reservation system. The participant using grep
was able to answer the questions within 15 minutes. The participant using SOAMiner was able to
answer the questions within 25 minutes. The observer attributed the difference in time to the
setup time for SOAMiner. Both users retraced steps in their attempts to derive the
sought-after information.
The participant using SOAMiner remarked that it was tedious to copy all of the SOA description
files into a special directory, clear the index, and type the SOAParser command n times, once for
each of n files. The participant using SOAMiner initially forgot to load the index but
fairly quickly realized the error when the system did not behave as expected. This participant
performed partial loads of the files needed to answer the first question and then performed a
second load later when additional files were needed for the second question.
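The per-file loading step that the participant found tedious could in principle be scripted. The
following is a minimal sketch, assuming that the description files sit under one directory tree
and that indexing a single file can be wrapped in a callable. The index_file callback and the
list of extensions are illustrative assumptions, not part of SOAMiner.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Extensions assumed to identify SOA description files (illustrative only).
SOA_EXTENSIONS = {".wsdl", ".xsd", ".bpel"}


def find_description_files(
    root: Path, extensions: Iterable[str] = SOA_EXTENSIONS
) -> List[Path]:
    """Return all files under root whose suffix matches, in sorted order."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


def load_all(root: Path, index_file: Callable[[Path], None]) -> int:
    """Call index_file once per description file and return the number loaded."""
    files = find_description_files(root)
    for path in files:
        # Hypothetical hook standing in for the per-file parser command.
        index_file(path)
    return len(files)
```

Passing the indexing step as a callable keeps the sketch independent of how the parser is actually
invoked, and the same function covers both a full load and a partial load of a subdirectory.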
At one point the participant using SOAMiner did not realize that the answer was already displayed
in SOAMiner's right panel. Because the display provided by SOAMiner lacks line numbers, a text
editor was needed as a supplement to answer some of the maintenance scenario questions; SOAMiner
does not integrate access to an editor that displays line numbers.
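To illustrate the kind of output the participant was missing, the following minimal sketch reports
each matching line together with its line number, much as grep -n does. It is an assumption about
what a line-numbered result could look like, not a description of SOAMiner's implementation; the
function name is hypothetical.

```python
from typing import List, Tuple


def find_with_line_numbers(text: str, term: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped line) for each line containing term."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if term in line:
            hits.append((number, line.strip()))
    return hits
```

A result list of this shape could be shown next to each hit or used to position an editor at the
matching line.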
Comments from the participant using SOAMiner were that an easier way to load each file into the
index is needed, that line numbers are needed, and that he liked the left panel browser with cloud
tags.
The participant using SOAMiner got to the appropriate location within the file structure fairly
quickly once the files were loaded. The participant using grep entered many irrelevant queries
before reaching answers.
Comments from the participant using grep were that grep was fast and provided good support
once he knew the commands; however, he did not like having to use the command line interface
and had difficulty keeping track of his location within the file structure.
Observations from Scenario 2
The second scenario was more challenging for each of the participants than the first scenario.
Neither participant was able to correctly answer all of the questions regarding the maintenance
scenario.
Both participants had only a novice level of familiarity with BPELs, WSDLs, and XSDs, which
made it difficult for them to find answers; the tools did not substitute for a lack of background
knowledge.
The participant using grep also made extensive use of the vi text editor. However, he did not
understand the relationships among the WSDL, BPEL, and XSD files well enough to navigate within
and between files to find answers. He eventually decided the answer had to be in the XSD and
examined that file in the vi editor, but name changes within those files stumped him.
The participant using grep entered many mistyped and irrelevant commands--never used a text
editor.
The SOAMiner participant spent 11 of 36 minutes creating the index. He also performed a partial
load of needed files.
Comments from the participant using SOAMiner were that the interface is easy to use and that
options in the left panel browser (e.g., filter capability) were very useful. However, deriving the
desired meaning from the results was the most difficult part. The tool was as easy to use as other
web searches but did not provide enough information in the results.
Conclusions - Suggested Improvements:
1) The amount of time that both participants spent setting up and loading files into
SOAMiner made evident that improving the way users set up and
load files into SOAMiner is a priority. The new setup and load procedures should
accommodate multiple load use cases.
2) The SOAMiner user-interface is not as intuitive as desired. The capability to undo/redo
activities would improve support for the natural thought process of users as demonstrated
by both participants’ attempts to retrace their steps as they progressed through the
scenarios. The integration of a text editor that can be launched from the SOAMiner
interface, possibly when clicking on links associated with files, and the display of files
with line numbers will better support users.
3) Enhancing SOAMiner with the means to represent domain knowledge about WSDL,
BPEL, XSD and relationships among these would be beneficial for users, especially for
the novice user.
4) Incorporating a built-in HELP system for SOAMiner, possibly one where a user can
hover over tags and get information about the meaning of the tag would be one way to
address the participant remarks about deriving meaning from the SOAMiner output.
5) A problem with text-based tools is that different labels may be used for the same concept.
In this case the WSDL contained strings such as CancelVehicleOut, whereas in the XSD the term was
CancellationStatus.
6) Further testing with users with more familiarity with the XML vocabularies involved is
needed to evaluate the usability of SOAMiner for more experienced maintainers.
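The label mismatch noted in item 5 can be made concrete with a small sketch. It splits camel-case
identifiers into words and pairs words that share a sufficiently long common prefix, so that
CancelVehicleOut and CancellationStatus are linked through "cancel" and "cancellation". This is a
crude heuristic offered only as an illustration: the function names and the min_prefix threshold
are assumptions, it is not a technique used in SOAMiner, and it would not catch labels that are
true synonyms with no shared spelling.

```python
import re
from typing import List, Set, Tuple


def split_identifier(name: str) -> List[str]:
    """Split a camel-case identifier into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def common_prefix_length(a: str, b: str) -> int:
    """Count the leading characters that a and b have in common."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count


def related_words(a: str, b: str, min_prefix: int = 5) -> Set[Tuple[str, str]]:
    """Pair words from two identifiers that share at least min_prefix leading characters."""
    pairs = set()
    for word_a in split_identifier(a):
        for word_b in split_identifier(b):
            if common_prefix_length(word_a, word_b) >= min_prefix:
                pairs.add((word_a, word_b))
    return pairs
```

A lower threshold links more labels but also produces more spurious pairs, so any real use would
need tuning against actual description files.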
Future Work
Two issues arose from the evaluation of these results that warrant further work. First, we need to
determine the effect of loading the same files into SOAMiner multiple times, and second we need
to look at configuration issues such as case sensitivity, partial string match, and explicit
namespace qualifiers to see how these are best handled in SOA maintenance scenarios.
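The configuration choices listed above can be pictured as switches on a name-matching function.
The sketch below is a minimal illustration under stated assumptions: the MatchOptions fields and
the name_matches function are hypothetical, and a namespace qualifier is assumed to be everything
before the last colon in a name. It does not describe how SOAMiner or Solr handle these settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchOptions:
    case_sensitive: bool = False          # compare names exactly as typed?
    partial: bool = True                  # accept a substring match?
    ignore_namespace_prefix: bool = True  # drop "prefix:" before comparing?


def name_matches(
    query: str, name: str, options: MatchOptions = MatchOptions()
) -> bool:
    """Decide whether query matches name under the given options."""
    if options.ignore_namespace_prefix:
        # Keep only the local part after the last colon, if any.
        query = query.rsplit(":", 1)[-1]
        name = name.rsplit(":", 1)[-1]
    if not options.case_sensitive:
        query, name = query.lower(), name.lower()
    return query in name if options.partial else query == name
```

With the defaults, a query of cancel matches tns:CancelVehicleOut; turning off partial matching or
turning on case sensitivity makes the same query fail, which is the kind of difference the planned
evaluation would need to examine.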