IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 4, APRIL 2001

Concept Analysis for Module Restructuring

Paolo Tonella

Abstract: Low coupling between modules and high cohesion inside each module are the key features of good software design. This is obtained by encapsulating the details about the internal structure of data and exporting only public functions with a clean interface. The only native support to encapsulation offered by procedural programming languages, such as C, is the possibility to limit the visibility of entities at the file level. Thus, modular decomposition is achieved by assigning functions and data structures to different files. This paper proposes a new approach to using concept analysis for module restructuring, based on the computation of extended concept subpartitions. Alternative modularizations, characterized by high cohesion around the internal structures that are being manipulated, can be determined by such a method. To assess the quality of the restructured modules, the trade-off between encapsulation violations and decomposition is considered and proper measures for both factors are defined. Furthermore, the cost of restructuring is evaluated through a measure of distance between original and new modularizations. Concept subpartitions were determined for a test suite of 20 programs of variable size, 10 public domain and 10 industrial applications. On the resulting module candidates, the trade-off between encapsulation and decomposition was measured, together with an estimate of the cost of restructuring. Moreover, the ability of concept analysis to determine meaningful modularizations was assessed in two ways. First, programs without encapsulation violations were used as oracles, assuming the absence of violations as an indicator of careful decomposition.
Second, the suggested restructuring interventions were actually implemented in some case studies to evaluate the feasibility of restructuring and to deeply investigate the code organization before and after the intervention. Concept analysis proved to be a powerful tool supporting module restructuring.

Index Terms: Concept analysis, modularization, encapsulation, abstract data type, legacy systems, reengineering, restructuring.

The author is with the ITC-irst Centro per la Ricerca Scientifica e Tecnologica, Povo (Trento), Italy. E-mail: email@example.com.
Manuscript received 4 Nov. 1998; revised 7 Mar. 2000; accepted 11 July 2000. Recommended for acceptance by H. Muller.
For information on obtaining reprints of this article, please send e-mail to: firstname.lastname@example.org, and reference IEEECS Log Number 108171.
0098-5589/01/$10.00 © 2001 IEEE

1 INTRODUCTION

Most complex man-made systems are designed and developed by breaking down their overall structure into smaller, relatively independent units. In many fields, one of which is software engineering, decomposition driven by abstraction is the key to managing complexity. A decomposed, modular computer program is easier to write, debug, maintain, and manage. A program consisting of modules that exhibit high internal cohesion and low coupling between each other is considered superior to a monolithic one.

Inadequate modularization often makes maintenance of old legacy systems expensive and difficult. In some instances, the original modular structure of the program may undergo degradation due to the violations introduced by successive maintenance interventions. In others, even the original design of the program was not conceived to be modular, resulting in an increasingly convoluted and, in the end, unmanageable system.

Improving the modular structure of a program is a form of preventive maintenance that is often necessary when the system undergoes new releases. In fact, modifying an intricate code base may not be feasible unless a preliminary restructuring step is performed. In other cases, restructuring becomes unavoidable if the system is to survive its growing entropy.

In languages such as C, the support intrinsically given to modularization is minimal. Data structures and functions can be made private to a file by exploiting the access specifier static. Therefore, in the following, the file will be considered the basic modular unit for C programs. The programmer can violate the encapsulation that was originally designed for a module, if one was, by means of pointers, accessing any field of a given data structure, and function pointers for the functions. Moreover, there are situations in which encapsulation of data structures is not enforced although it would be desirable to have it. Direct access to data structures is intermixed with the usage of interface functions, while a more disciplined interaction of client modules could result in an improved maintainability and understandability.

This paper presents a novel approach to module restructuring based on concept analysis. The notion of concept subpartition is introduced to obtain meaningful combinations of the concepts extracted by concept analysis which can be extended to become candidate modularizations of the original program. Concepts can be characterized as groupings of objects sharing common attributes (see footnote 1). Functions and data structure accesses instantiate the notions of objects and attributes for the present application of concept analysis. Therefore, concepts represent the basic elements that determine the borders encapsulating functions into modules. If the attributes are able to capture the internal structure accesses performed by the functions in the program, concepts and extended concept subpartitions result in highly cohesive module candidates, organized around the data structure being manipulated. The data structures around which modules are built may be static (e.g., global variables) or dynamic (heap allocated) and functions operating on them have their type in the signature if they are not globally accessible. Consequently, three kinds of attributes are considered: dynamic memory, signature types, and global variables. A module can encapsulate a set of operations manipulating a common dynamically allocated data structure (e.g., a list or a tree). Moreover, a module can group functions receiving a user-defined data structure as a parameter and operating on it. Finally, global variables can be the shared structures around which a module is built.

Footnote 1. Objects and attributes introduced in the framework of concept analysis should not be confused with objects and attributes of object-oriented programming.

When the modules of a program are restructured, two contrasting factors have to be controlled: encapsulation and decomposition. It is easy to obtain solutions to the restructuring problem if only one of these factors is considered. A program with all functions in a module has no encapsulation violations but has a low level of decomposition. On the contrary, assigning every function to a distinct module produces the maximum decomposition, but also the maximum encapsulation violations. A means of evaluating the trade-off between encapsulation and decomposition is suggested here and is based on proper measures of the two factors to be compared with the original levels. In fact, there is no absolute optimal value, but improvements can be defined with respect to the starting point. Moreover, additional criteria (e.g., work assignment) usually have to be accounted for, when modularizing or restructuring a system, related to the different perspectives that can drive its decomposition.

While the relative encapsulation and decomposition improvements determine the benefits of restructuring the program, a further element affecting the final decision is cost. Estimating the effort required to reorganize a program according to a new modular structure is a hard task. Nevertheless, a first coarse-grained indicator is given by the distance between the partition of the functions in the original modules and in the new ones. Such a notion of distance is defined in this paper and an algorithm for computing it is also provided. The encapsulation and decomposition measures, together with the distance from the original modularization, give a complete picture of the required intervention. It is possible to graphically represent the trade-off discussed above and to allow the programmer to choose among the available alternative modularizations computed from concept subpartitions.

Experimental results suggest that concept analysis is an effective tool to drive module restructuring. Ten public domain and 10 industrial programs were analyzed in the three contexts (dynamic memory access, function signature types, and global variable use). For all the considered programs, the retrieved extended concept subpartitions provide alternative modularizations which improve encapsulation and/or decomposition metrics with respect to the original programs. The cost associated with each candidate transformation was evaluated and used to guide the selection. Programs having no encapsulation violations at all in any of the three considered contexts were used as oracles, assuming that their actual modularization is a good one and corresponds to a common purpose. Concept analysis was able to exactly reconstruct the same modularization on about half of them and produced a very close modular structure on the remaining ones. In some case studies, the restructuring interventions suggested by concept analysis and selected by examining encapsulation, decomposition, and cost were actually implemented with the purpose of gaining knowledge about the real effort required. The results show that improving encapsulation can be effectively supported by concept analysis and that the initial directions obtained through it are extremely useful.

The paper is organized as follows: The next section presents the related work. Section 3 describes the basic elements of concept analysis, concept partitions, and the proposed concept combinations represented by concept subpartitions. The last topic of this section deals with two novel metrics for encapsulation and decomposition assessment. In Section 4, the notion of partition distance is introduced as a means of evaluating restructuring costs. Section 5 gives experimental results obtained for a test suite of public domain and industrial programs. Finally, Section 6 is devoted to the conclusions.

2 RELATED WORK

The related work deals with the identification of abstract data types and objects in the code. In [...], the main methods for object identification are classified as global-based or type-based, respectively, when functions are clustered around globally accessible objects or formal parameter and return types. A new identification method, based on the concept of receiver parameter type, is also proposed. The approach presented in [...], which considers accesses to global variables, uses an internal connectivity index to decide which functions should be clustered around the recognized object. Such a method is extended in [...] to include type-based relations and it is combined with the strong direct dominance tree to obtain a more refined result. The recovery technique described in [...] builds a graph showing the references of procedures to structure internal fields. Accesses to global variables drive the recognition of object instances.

Atomic components are detected and organized in a hierarchy of modules, according to the method described in [...]. Three kinds of atomic components are considered: abstract state encapsulations, grouping global variables and accessing procedures; abstract data types, grouping user-defined types and procedures with such types in their signature; and strongly connected components of mutually recursive procedures. Dominance analysis is used to hierarchically organize the retrieved components into subsystems.

A radically different group of approaches for extracting software components with high internal cohesion and low external coupling exploits the computation of software metrics. The ARCH tool [...] is one of the first examples of embedding the principle of information hiding turned into a measure of similarity between procedures within a semiautomatic clustering framework. Such a method incorporates a weight tuning algorithm to learn from the design decisions in disagreement with the proposed modularization. In [...], [...], the purpose of retrieving modular objects is reuse, while, in [...], metrics are used to refine the decomposition resulting from the application of formal and heuristic modularization principles. Another different application is presented in [...], where cohesion and coupling measures are used to determine clusters of processes. The problem of optimizing a modularity quality measure based on cohesion and coupling is approached by means of genetic algorithms in [...], which are able to determine a hierarchical clustering of the input modules. Such a technique is improved in [...] by the possibility of detecting and properly assigning omnipresent modules, of exploiting user provided clusters, and of adopting orphan modules. In [...], a complementary clustering mechanism is applied to the interconnections, resulting in the definition of tube edges between subsystems.

In [...], the star diagram is proposed as a support to help the programmer restructure a program by improving its encapsulation of abstract data types. Another decomposing and restructuring system is described in [...]. Both of them provide sophisticated interaction means to assist the user in the process of analyzing and restructuring a program.

The most relevant works to the presented approach are applications of concept analysis to the modularization problem. In [...], [...], [...], concept analysis is applied to the extraction of code configurations. Modules associated with specific preprocessor directive patterns are extracted and interferences are detected. The relation between procedures and global variables is analyzed by means of concept analysis in [...]. The resulting lattice is used to identify module candidates. Violations of encapsulation are represented in the lattice and can be automatically handled. The lattice can also be transformed so as to become more suitable for modularization by exploiting the block relations, additional procedure/global variable relations that extend the original ones. Concept analysis is used in [...] to identify modules by considering both positive and negative information about the types of the function arguments and of the return value. Concept partitions correspond to possible modularizations of the program. In this author's previous work [...], encapsulation around dynamically allocated memory locations is considered. Points-to analysis is used to determine dynamic memory accesses, while concept analysis permits grouping functions around the accessed dynamic locations. The resulting clusters are plotted on a new diagram, the O-A (Objects-Attributes) diagram, allowing for the selection of the concepts more suitable to drive the restructuring process. Concept analysis is exploited in [...] to reengineer class hierarchies. A context describing the usage of a class hierarchy is the starting point for the construction of a concept lattice from which redesign hints can be derived.

The main difference between module restructuring based on clustering and module restructuring based on concepts is that the latter is intrinsically able to characterize the restructured modules semantically, while the former builds modules according to cohesion and coupling metrics. In fact, a concept is a grouping of programming entities (e.g., functions) that share common attributes. Such attributes can be interpreted as a description of the commonalities within each module. On the contrary, modules recovered by means of clustering have to be inspected to trace metrics values back to the attributes originating them.

Module restructuring methods based on concepts suffer from the difficulty of determining partitions, i.e., nonoverlapping and complete groupings of program entities. In fact, concept analysis does not assure that the candidate modules it determines are disjoint and cover the whole entity set.

The novelty in the approach proposed in this paper is the use of concept subpartitions instead of concept partitions. The idea is that the overly restrictive constraint of partitions, requiring that the whole object set is covered, can be removed, thus exploiting all the information retrieved through concept analysis and otherwise lost with the concepts that are disregarded since they do not form a complete partition. In addition, this paper proposes two effective metrics for evaluating the benefits of restructuring and a proper distance measure to estimate restructuring costs. The graphical representation of all these factors drives the programmer in the selection of the subpartitions of interest.

3 CONCEPT ANALYSIS AND ITS SUPPORT FOR MODULARIZATION

3.1 Basics

In this paper, concept analysis is not presented in detail. For a primer, the interested reader can refer to [...]. Only the basic definitions are introduced and the results obtained for a small example are discussed to informally illustrate the general ideas. In the following, the reference problem is the decomposition of a procedural program into modules containing groups of functions. In C, this corresponds to the organization of functions within different files.

Concept analysis permits grouping objects that have common attributes. In the application of concept analysis to modularization, objects are functions, while attributes are properties of functions related to their encapsulation inside modules. Examples of such attributes are the accesses to global variables, the accesses to dynamic locations, and the presence of a user-defined structured type in the signature, including return type. Concept analysis is a general framework, rather than a specific modularization technique, that can be specialized by the particular choice of attributes that are considered in evaluating encapsulation. Combinations of different kinds of attributes and the negation of attributes can be used as well.

The starting point for concept analysis is a context $(O, A, R)$, consisting of a set of objects $O$, a set of attributes $A$, and a binary relation $R$ between objects and attributes, stating which attributes are possessed by each object. A concept is a maximal collection of objects that possess common attributes, i.e., it is a grouping of all the objects that share a common set of attributes. More formally, a concept is a pair of sets $(X, Y)$ such that:

$$X = \{ o \in O \mid \forall a \in Y : (o, a) \in R \}, \quad (1)$$

$$Y = \{ a \in A \mid \forall o \in X : (o, a) \in R \}. \quad (2)$$

$X$ is said to be the extent of the concept and $Y$ is said to be the intent. There are several algorithms for computing the concepts for a given context. The simple bottom-up algorithm described in [...] was used for this work.

The key observation for using concept analysis is that a module or abstract data object corresponds to a formal concept. Let us consider, for example, the accesses to dynamic memory. A concept consists of a set of functions operating on a set of dynamic locations, while such locations are not simultaneously accessed by a function outside the concept.

An example of context is given in Table 1. The set of objects consists of the three functions, $f_1, f_2, f_3$, and the attributes are the three dynamic locations, $HEAP_1, HEAP_2, HEAP_3$, representing three unnamed data structures that are dynamically created on the heap (e.g., via malloc, in C). Table 1 indicates (with a tick) the direct access of a function to some internal field of a dynamic location, thus, e.g., $f_1$ accesses $HEAP_1$ and $HEAP_2$, while $f_3$ accesses $HEAP_1$ and $HEAP_3$.

TABLE 1
Example of Context

        HEAP1   HEAP2   HEAP3
f1        x       x
f2        x       x
f3        x               x

The objects are the functions $f_1, f_2, f_3$ and the attributes are the accesses to the dynamic memories $HEAP_1, HEAP_2, HEAP_3$.

After applying concept analysis to this example, the following concepts are identified:

- $c_1 = (\{f_1, f_2, f_3\}, \{HEAP_1\})$
- $c_2 = (\{f_1, f_2\}, \{HEAP_1, HEAP_2\})$
- $c_3 = (\{f_3\}, \{HEAP_1, HEAP_3\})$
- $c_4 = (\{\}, \{HEAP_1, HEAP_2, HEAP_3\})$

Concept $c_1$ indicates that all the three functions share access to $HEAP_1$. $c_2$ states that $f_1$ and $f_2$ both access $HEAP_1$ and $HEAP_2$. $f_3$ is the only function accessing both $HEAP_1$ and $HEAP_3$ (concept $c_3$), while no function has the property of accessing all dynamic locations ($c_4$).

3.2 Concept Partitions

Concepts are good candidates for the organization of functions into modules. In fact, each concept is, by definition, characterized by a high cohesion of its objects around the chosen attributes. However, concepts may have extents with nonempty intersections and, thus, not every collection of concepts represents a potential modularization. To address this problem, the notion of concept partition was adopted (see, for example, [...]). A concept partition consists of a set of concepts whose extents are a partition of the object set $O$. $CP = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ is a concept partition iff:

$$\bigcup_{i=1}^{n} X_i = O \quad \text{and} \quad \forall i \neq j,\ X_i \cap X_j = \emptyset. \quad (3)$$

A concept partition allows assigning every function in the considered context to exactly one module. In the example discussed above, the two following concept partitions can be determined:

- $CP_1 = \{c_1\}$.
- $CP_2 = \{c_2, c_3\}$.

The first partition contains just one concept, $c_1$, and corresponds to modularizing the program by inserting all three functions, $f_1, f_2, f_3$, in the same module, on the basis of their shared access to $HEAP_1$. The second partition generates a proposal of modular organization in which $f_1$ and $f_2$ are inside a module, since they access both $HEAP_1$ and $HEAP_2$, while $f_3$ is put inside a second module for its access to $HEAP_1$ and $HEAP_3$. It should be noted that the second modularization permits a violation of encapsulation since functions of different modules access a shared dynamic location, namely $HEAP_1$. It ensures that no function outside $c_2$ accesses both $HEAP_1$ and $HEAP_2$, but $HEAP_1$ alone is accessible. This example gives a deeper insight into the modularization associated with a concept partition: Even in cases in which the only modularization that does not violate encapsulation is the trivial one, with all functions in a module, concept analysis can extract alternative modularizations that do not ensure that everything is encapsulated, but are based on common attributes. In such a case, the residual violations of encapsulation may be considered acceptable or may be removed with the introduction of proper accessor/modifier functions.

3.3 Concept Subpartitions

Concept partitions introduce an overly restrictive constraint on concept extents by requiring that their union covers all functions in the program. In many practical cases, the only concept partition able to satisfy such a constraint contains just one concept whose extent is the set of all program functions. Consider, for example, the case of a program with a function that possesses no attribute (in the example above, an additional function $f_4$ that does not access dynamic locations). Such a function can only be in the extent of a concept with empty intent, together with all other functions. The only associated concept partition is the trivial one, with all functions grouped in the extent of the only concept of the partition. More generally, when concepts are disregarded because they cannot be combined with other concepts to cover all functions, important information that was identified by concept analysis is lost without reason. The usefulness of a group of concepts in identifying meaningful organizations of functions around shared attributes should not be limited by the unnecessary requirement that all functions are covered. In this paper, the notion of concept subpartition in which the overly restrictive constraint is removed is proposed to replace concept partitions. A concept subpartition associated with a given context is a set of concepts with disjoint extents. $CSP = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ is a concept subpartition iff:

$$\forall i \neq j,\ X_i \cap X_j = \emptyset. \quad (4)$$

Concept partitions are particular cases of concept subpartitions where the union of the extents is the set $O$ of all objects.

3.4 Object Partitions

Partitions of the object set represent possible modularizations of a program (see footnote 2). The actual modules in a program can be regarded as an actual object partition of the program since they group the functions of the program according to the source file they belong to. Such an object partition will be referred to as the original object partition of the program and is associated with the original modularization of the program.

Footnote 2. Only object partitions not containing the empty set are considered since adding a fictitious module with nothing inside is meaningless.

A concept subpartition induces a subpartition of the object set, which in turn can be extended to an object partition. The object subpartition $SP$, induced by a concept subpartition $CSP = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, is the set of the extents, $\{X_1, \ldots, X_n\}$. It can be transformed into an object partition $Q$, with reference to the original partition $P$, by means of the partition subtraction (sub) operator:

Definition 1 (Partition Subtraction). The partition subtraction of an object subpartition $SP$ from an object partition $P$ gives the subpartition complementary to $SP$ with respect to $P$. It can be obtained by subtracting the union of the sets in $SP$ from each set in $P$.

$$P \ \mathrm{sub}\ SP = \{ M_k = M_i - \bigcup_{M_j \in SP} M_j \mid M_i \in P \}.$$

$P \ \mathrm{sub}\ SP$ is itself a subpartition because sets in $P$ are disjoint and remain such after the subtraction. The subtraction operator can be used to extend subpartitions to partitions:

Definition 2 (Subpartition Extension). An object subpartition $SP$ can be extended to an object partition $Q$, with reference to an original partition $P$, by the union of $SP$ and the subtraction of $SP$ from $P$. The empty set is not considered an element of $Q$.

$$Q = SP \cup (P \ \mathrm{sub}\ SP) - \{\emptyset\}.$$

If, for example, $P = \{\{f_1\}, \{f_2, f_3\}\}$ represents the original modularization of a program and $SP = \{\{f_1, f_2\}\}$ is the subpartition associated with a concept subpartition of the program, the subtraction of $SP$ from $P$ gives $\{\{\}, \{f_3\}\}$, i.e., it gives all the functions not covered by the subpartition and grouped according to the original modularization. The extension of $SP$ is therefore $Q = \{\{f_1, f_2\}, \{f_3\}\}$.

Extending subpartitions to partitions allows one to also obtain a modularization of all the functions in the program in cases in which concept subpartitions instead of partitions are used. The extension involves considering the original grouping of the functions into modules and using it to complete the subpartition.

3.5 Encapsulation Violations

A quality factor of a modularization is its ability to encapsulate functions around shared attributes. A measure of such ability is the count of the violations of encapsulation associated with a given modularization of a program. The considered modularization may be either the original one or the one proposed by concept analysis through concept subpartitions. To evaluate the number of violations of encapsulation, each attribute of the considered context has to be assigned to one of the object sets (modules) in the modularization. Then, the count of the attributes possessed by the objects in a module and assigned to a different module gives the number of violations.

Definition 3 (Attribute Assignment). Given a context $(O, A, R)$ and an object partition $P$, the attributes assigned to each module $M_k$ are those with the highest number of accesses from $M_k$. An attribute $a$ is assigned to the object set $M_k$ of the object partition $P$ iff $M_k$ is the set with the maximum number of objects possessing $a$.

$$a \in Attr(M_k) \ \text{iff} \ k = \arg\max_i |\{ (o, a) \in R \mid o \in M_i \}|,$$

where $Attr(M_k)$ is the set of attributes assigned to the object set $M_k$. The maximum cardinality of the considered subset of $R$ may be associated with multiple indexes $i$. In such cases, $\arg\max$ randomly chooses one of them. It will be shown that this arbitrary choice has no impact on the count of encapsulation violations.

Definition 4 (Encapsulation Violations). Given a context $(O, A, R)$ and an object partition $P$, the encapsulation violation count $E(P)$ is the total number of objects in each object set $M_i$ of $P$ that possess an attribute assigned to a different object set $M_j$ of $P$.

$$E(P) = |\{ (o, a) \in R \mid o \in M_i,\ a \in Attr(M_j),\ i \neq j \}|.$$

With reference to the example in Table 1, let us assume that the original modularization of the program is $P = \{M_1, M_2\} = \{\{f_1\}, \{f_2, f_3\}\}$. Attributes $HEAP_1, HEAP_2, HEAP_3$ can be assigned to the modules as follows: $Attr(M_1) = \{HEAP_2\}$ and $Attr(M_2) = \{HEAP_1, HEAP_3\}$. In fact, $HEAP_1$ and $HEAP_3$ are possessed, respectively, by two objects in $M_2$ vs. one object in $M_1$ and one object in $M_2$ vs. no object in $M_1$. The attribute $HEAP_2$ has one access from both $M_1$ and $M_2$, thus it was arbitrarily assigned to $M_1$. The resulting encapsulation violation number is two since $f_1 \in M_1$ accesses $HEAP_1$, assigned to $M_2$, and $f_2 \in M_2$ accesses $HEAP_2$, assigned to $M_1$. It should be noted that the choice of assigning $HEAP_2$ to $M_2$ would not change the encapsulation violation count since exactly one violation in the access to $HEAP_2$ would remain due to its access from $f_1 \in M_1$. More generally, if the same maximum number of accesses is detected in more than one module, all accesses are violations except those done by the chosen module with no regard to the particular choice of the module to which the attribute is assigned. If an extended object subpartition of the example above is $Q = \{M_1, M_2\} = \{\{f_1, f_2\}, \{f_3\}\}$ (it is the object subpartition associated with concept $c_2$), attributes can be assigned as follows: $Attr(M_1) = \{HEAP_1, HEAP_2\}$ and $Attr(M_2) = \{HEAP_3\}$. The encapsulation violation number becomes one and accounts for the access from $f_3 \in M_2$ to $HEAP_1$.

3.6 Decomposition

The number of violations of encapsulation cannot be the only measure that drives modularization. In fact, the trivial modularization with all functions in a single module has an encapsulation violation count of zero but is not acceptable. The second factor that affects the quality of a modularization is its ability to decompose the system into smaller, more manageable, and meaningful subsystems. Therefore, an evaluation of the quality of a modularization should include a measure of the decomposition associated with it. Given an object partition $P$, a simple decomposition measure is given by its size:

$$Dec(P) = |P|.$$

The number of modules in which the system is split is thus used to account for the level of decomposition of the program.

Having few encapsulation violations and high decomposition are opposite requirements in the choice of a modularization of the program. In extreme cases, it is possible to obtain $E(P) = 0$ by inserting all functions in a single module, but the corresponding decomposition is the minimal possible: one. On the other side, the highest decomposition is obtained by inserting a single function into each module. In this case, the decomposition metric is maximal and equal to the number of functions in the program: $Dec(P) = |Func|$, but the corresponding encapsulation violation number is also maximal: $E(P) = |R| - |A|$. In fact, every attribute is arbitrarily assigned to one of the modules accessing it since each module performs at most one access. All accesses are violations except for those made by the modules to which the attributes are assigned. Their number is equal to the number of attributes, $|A|$, since each such module performs just one access (under the hypothesis that no unaccessed attributes exist).

In real cases, the number of encapsulation violations should be limited and, at the same time, decomposition of the system should be encouraged. For a given program, it is possible to assess the actual decomposition and encapsulation levels through the metrics proposed above. A restructuring intervention aimed at improving the modularization of the program should compare the new decomposition and encapsulation levels with the original ones. An additional element to be considered is the cost of the modification. A way to obtain a raw indication of such cost is described in the next section.

4 DISTANCE BETWEEN OBJECT PARTITIONS

The actual modular structure of a program must be compared with the modularization proposals coming from concept analysis to gain indications on the cost of restructuring. For this reason, a notion of distance between object partitions is developed. In the following, the notion of elementary transformation is introduced. Then, it is used to define a measure of distance between partitions. Finally, an algorithm to compute such a distance is given. Partitions are assumed not to contain the empty set.

Definition 5 (Elementary Transformation). Given a partition $P$, a new partition $P'$ is produced from $P$ by applying an elementary transformation $t$ if one object is moved from a set of $P$ into another different set of $P$ or is removed from a set of $P$ and generates a new singleton set.

The three situations that can occur when an elementary transformation is applied to a partition are depicted in Fig. 1. In Case 1, the cardinality of $P$ is not changed. An object is removed from a set that does not become empty and is added to an already existing set. In Case 2, the cardinality of $P$ is incremented because an object is removed from a set that does not become empty and generates a singleton set. Finally, in Case 3, the cardinality of $P$ is decremented because an object is removed from a singleton set that becomes empty and is added to an already existing set. Note that the empty set that is generated by this move is not considered as belonging to the partition.

Fig. 1. Three possible transformations of a partition produced by an elementary move.

Definition 6 (Partition Distance). The distance between two partitions is the minimum number of elementary transformations that can be applied to the first partition to produce the second partition.

$$d(P, Q) = \min_{n \in \mathbb{N}} \{\, n \mid P \xrightarrow{t_1} P_1 \rightarrow \cdots \xrightarrow{t_n} Q \,\}.$$

The existence of such a measure for any pair of partitions descends from the possibility of transforming any partition into any other arbitrary partition through a sequence of elementary moves. A way to do this is to reduce the partition to a collection of singleton sets by means of the second move in Fig. 1. Then, such sets can be aggregated to obtain any desired partition by means of the third move in Fig. 1.

It is straightforward to show that the above definition satisfies the requirements of distance. The axioms of distance require that the following conditions hold for any partition $P$, $Q$, $S$:

1. $d(P, Q) \geq 0$ and $d(P, Q) = 0$ iff $P = Q$. Being a natural number, the partition distance is greater than or equal to zero. It is zero when a partition $P$ can be transformed into $Q$ with zero elementary moves, i.e., when $P$ and $Q$ do not differ; vice versa, if they do not differ they can be transformed into each other with zero moves.

Fig. 2. Pseudocode of an algorithm that computes the distance between two object partitions.

... since they originate longer sequences of elementary transformations in that no object can remain in the original set. Now, two new partitions are computed in which the transformation of the paired sets is completed. The number of elementary moves to accomplish this transformation is the cardinality of the symmetric difference (indicated with $\triangle$) between the selected sets. In fact, this is the number of objects that are moved from the first set to their final destination or from the second set into the first one. It has to be augmented with the number of moves necessary to transform the two new partitions one into the other, i.e., with the recursively computed distance between the two new partitions. Finally, the minimum is returned as the result of the computation.

Let us consider the object partition associated with the concept partition $CP_2$ of the example in the previous section, $P = \{p_1, p_2\} = \{\{f_1, f_2\}, \{f_3\}\}$. If the actual modules of the program are $q_1 = \{f_1\}$ and $q_2 = \{f_2, f_3\}$, the original object partition is $Q = \{q_1, q_2\} = \{\{f_1\}, \{f_2, f_3\}\}$. The distance between the two partitions can be computed by applying the algorithm in Fig. 2. The pairs of sets with nonempty intersection that are considered for transformation are $(p_1, q_1)$, $(p_1, q_2)$, $(p_2, q_2)$. When each of the three transformations is completed, the new partitions become equal and the recursive distance is zero. The symmetric
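The concepts of the small Table 1 example can be reproduced with a brute-force sketch. It is not the bottom-up algorithm used for this work: it simply closes every subset of attributes, which is only viable for toy contexts. The relation below is the one implied by the accesses and concepts described in the text; the labels `HEAP1` to `HEAP3` for the three dynamic locations and the names `RELATION`, `extent`, `intent`, and `all_concepts` are illustrative assumptions.

```python
from itertools import combinations

# Assumed context: which function accesses which dynamic location.
RELATION = {
    "f1": {"HEAP1", "HEAP2"},
    "f2": {"HEAP1", "HEAP2"},
    "f3": {"HEAP1", "HEAP3"},
}
ATTRIBUTES = {"HEAP1", "HEAP2", "HEAP3"}


def extent(attrs):
    """Objects that possess every attribute in attrs."""
    return frozenset(o for o, owned in RELATION.items() if attrs <= owned)


def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    common = set(ATTRIBUTES)
    for o in objs:
        common &= RELATION[o]
    return frozenset(common)


def all_concepts():
    """Close every attribute subset; exponential, so for toy contexts only."""
    found = set()
    attrs = sorted(ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            x = extent(set(subset))
            found.add((x, intent(x)))
    return found


concepts = all_concepts()
# Print concepts from the largest extent to the smallest.
for x, y in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(x), sorted(y))
```

Each pair produced this way satisfies both defining equations of a concept, because the intent is recomputed from the extent that the attribute subset selects.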
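As an illustration, partition subtraction and subpartition extension (Definitions 1 and 2) might be sketched as follows, using the example with original modules {f1} and {f2, f3} and the subpartition {{f1, f2}}. Sets of frozensets stand in for partitions; the function names `partition_sub` and `extend` are hypothetical.

```python
def partition_sub(partition, subpartition):
    """Subtract the union of the subpartition's sets from each set of the partition."""
    covered = frozenset().union(*subpartition)
    return {block - covered for block in partition}


def extend(subpartition, partition):
    """Union of the subpartition and the subtraction result, without the empty set."""
    result = set(subpartition) | partition_sub(partition, subpartition)
    result.discard(frozenset())
    return result


original = {frozenset({"f1"}), frozenset({"f2", "f3"})}
sub = {frozenset({"f1", "f2"})}

leftover = partition_sub(original, sub)   # the empty set and {f3}
extended = extend(sub, original)          # {f1, f2} and {f3}
print(leftover)
print(extended)
```

The subtraction keeps the empty set that arises from {f1}, as in the worked example; only the extension step discards it.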
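The attribute assignment and violation count of Definitions 3 and 4, together with the size-based decomposition measure, can be sketched for the Table 1 example as below. This is a minimal sketch under assumptions: the relation is the one implied by the text, ties in the arg max go to the first module (one of the arbitrary choices the definition allows), and all function names are hypothetical.

```python
# Assumed relation: (function, dynamic location) access pairs.
RELATION = {
    ("f1", "HEAP1"), ("f1", "HEAP2"),
    ("f2", "HEAP1"), ("f2", "HEAP2"),
    ("f3", "HEAP1"), ("f3", "HEAP3"),
}


def assign_attributes(modules, relation):
    """Map each attribute to the index of the module with most objects possessing it."""
    owner = {}
    for attr in sorted({a for _, a in relation}):
        counts = [sum(1 for o, a in relation if a == attr and o in m)
                  for m in modules]
        owner[attr] = counts.index(max(counts))  # ties: first module wins
    return owner


def violations(modules, relation):
    """Count accesses made from a module other than the attribute's owner."""
    owner = assign_attributes(modules, relation)
    return sum(1 for o, a in relation
               for i, m in enumerate(modules)
               if o in m and owner[a] != i)


def decomposition(modules):
    """Number of modules in the partition."""
    return len(modules)


original = [{"f1"}, {"f2", "f3"}]
extended = [{"f1", "f2"}, {"f3"}]
single = [{"f1", "f2", "f3"}]
singletons = [{"f1"}, {"f2"}, {"f3"}]

for name, mods in [("original", original), ("extended", extended),
                   ("single", single), ("singletons", singletons)]:
    print(name, violations(mods, RELATION), decomposition(mods))
```

The two extreme partitions illustrate the trade-off: one module gives no violations with decomposition one, while one function per module gives the largest decomposition and a violation count equal to the number of accesses minus the number of attributes.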
difference size is, respectively, one, two, one and, thus, the minimum is one. If a concept subpartition is considered 2. d Y d Y . Commutativity follows from the instead of a partition, it has to be extended to an object observation that every elementary move has an partition first. inverse. Move 1 in Fig. 1 has itself as an inverse The above notion of distance between object partitions is H because the object ok can be reinserted into nI by appealing in the context of module restructuring because H extracting it from nP , which does not become elementary transformations correspond to moving a func- empty since nP was not empty initially. Move 2 tion from a module into another module. This can be has Move 3 as its inverse and vice versa, Move 3 considered a unit of measure for the restructuring effort has Move 2 as its inverse. In fact, Move 2 extracts paid when the decision is to reorganize the modularization an object and generates a singleton set, while Move by moving some functions across modules. It is a coarse 3 inserts the object of a singleton set into an grain cost measure to be weighted with an estimate of the already existing set. Thus, any minimal sequence interventions required by the move, but it is a first that transforms into has an inverse of the indication giving the total number of such moves. On the same length and no shorter sequence can transform other hand, the distance between object partitions does not into because its inverse would otherwise be account for a second decision that can be taken: The the minimal sequence from to . functions can remain in their original module and the 3. d Y d Y d Y . The concatenation of violations of encapsulation are resolved by modifying the the minimal sequence from to and from to code of the functions or they are considered acceptable and is a legal sequence of elementary transformations no intervention is performed to remove them. Therefore, from to . 
Therefore, the minimal sequence from the cost of moving functions between modules is not the to can only be shorter than or equal to such a only factor to examine: The presence of residual violations concatenation. has to be evaluated. In addition, the new modularization of Fig. 2 shows the pseudocode of an algorithm that the program should not worsen the level of decomposition computes the distance between two object partitions. It is in order to gain in encapsulation. To summarize, to get the a recursive algorithm ending when the two input partitions whole picture of costs and benefits of a module restructur- are equal and, thus, their distance is zero. If the two ing intervention, the encapsulation and decomposition partitions are not equal, the minimum number of elemen- levels should be compared with the initial ones and the tary transformations to convert the first one into the second cost of each restructuring alternative should be estimated. one has to be determined. x, the total number of objects in each partition, is initially assigned to the support variable 5 EXPERIMENTAL RESULTS min. In fact, this is an upper bound for such a minimum. Then, for each pair of sets from the two partitions that are The proposed approach to module restructuring based on different and have a nonempty intersection, the elementary concept analysis was applied to 10 public domain and transformations to turn the first one into the second one are 10 industrial programs, written in C language. The front applied. Pairs with empty intersection can be disregarded end of CANTO  (Code and Architecture Analysis Tool) 358 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 4, APRIL 2001 TABLE 2 Test Suite of Public Domain (Top) and Industrial (Bottom) Programs The size of the programs in Lines Of Code (LOC) is given in Column 2. Columns 3 and 4 contain the number of functions and modules. 
The number of objects and attributes for each of the three considered contexts is shown in the next columns. was used to extract the information needed for concept Names of the programs in the industrial test suite are not analysis from the code. given for reasons of confidentiality. Their application CANTO  is composed of several subsystems: a front domain ranges from banking to telecommunications, end to parse C code, an architectural recovery environment, computer-aided design, and multimedia database manage- a static analyzer, an interface for graph displaying, and a ment. The table gives the size of each program in Lines Of customized editor. The user, in a closed loop, can analyze a Code (LOC). The next columns contain the number of system, navigate through different views by means of a functions and the number of modules. Then, for each of the graphical user interface, generate queries and new views, three considered contexts, the associated number of objects and add and remove components, subsystems, and code to and attributes is shown. accomplish maintenance tasks. Among the static analyses By considering the total size in LOC and the number of available from CANTO, the points-to analysis  is the modules (Table 2), it can be noted that programs in the test most important for the present work since it provides a suite exhibit a high variability in the granularity of modules. static solution to the problem of determining the accesses to More particularly, the size of each individual module in the dynamically allocated data structures. In fact, the result of public domain programs ranges from one LOC to 14,662 points-to analysis is a set of points-to pairs associating LOC, with an average of 702.5, while, in the industrial test pointers to the (possibly) pointed-to locations, where the suite, it is between 69 and 4,949 LOC (average 1,742.9). The locations may either be static or dynamic. 
Results are number of functions per module is also an indicator of high approximate (exact solutions are in general not computable) variability. In fact, in the public domain, code modules but safe, i.e., the pointed-to locations are possibly a superset contain one to 130 functions each (with an average of 9.1), but never a subset of the true set. while industrial modules contain a number of functions Three different kinds of attributes were considered for between one and 56 (average 4.9). This is an indicator of the encapsulation improvement: the accesses to dynamically strong dependence of module granularity on the applica- allocated memory locations, the structured types in the tion domain, the programming style, the development function signature, including the return type, and the software, and many other factors resulting in a high definitions and uses of global variables. Correspondingly, variability of module size and function number. three contexts were generated for each program and For the first considered context, the number of objects in restructuring directions were obtained by concept analysis, Table 2 is the number of functions accessing some dynamic aimed at improving the encapsulation, respectively, of memory, while the number of attributes is the number of dynamic memories, structured types, and global variables. dynamic locations. In the second context, only the functions with structured types in the signature are considered, and 5.1 Test Suite the number of such types is the number of attributes. Table 2 contains the public domain3 programs at the top, Finally, the last context relates functions to global variables. while the industrial programs are listed at the bottom. On average, the number of functions involved in the three contexts is, respectively, 57.8, 41.2, and 111.3, while the 3. 
Actually, most programs in the public domain test suite are distributed under the GNU General Public License as published by the number of attributes is 38.4, 9.0, and 313.7. Thus, the third Free Software Foundation. context involves about twice the number of functions in the TONELLA: CONCEPT ANALYSIS FOR MODULE RESTRUCTURING 359 TABLE 3 TABLE 4 Original Number of Encapsulation Violations Number of Concepts for the Public Domain and Industrial for the Public Domain and Industrial Programs Programs in the Three Considered Contexts in the Three Considered Contexts inserted. Such work is very expensive, especially if it has to first and second contexts, while the attributes in the three be replicated on every program in the test suite and for contexts are highly variable in number, reaching a max- every context. Thus, restructuring was evaluated in a blind imum in the third context again. way, considering all retrieved attributes as candidates for The organization of functions into modules, i.e., their encapsulation. In a more realistic use, a manual selection of distribution among source files, was considered in order to the relevant attributes is preliminarly performed and only assess the initial number of encapsulation violations for the related violations are considered. This approach was each of the three contexts. Table 3 contains such values, followed in some case studies taken from the presented test representing the number of functions accessing attributes of suite and discussed below. another module. Regarding the accesses to dynamic Concept analysis was performed for the 20 programs memory, all public domain programs show some violations considered on a Sun SPARC 20 with 64 Mb of internal of encapsulation, with only one exception, gdbm. Industrial memory and one Gb of swap area under normal load programs have many fewer violations in the access to conditions. Table 4 contains the number of concepts found dynamic memory. 
In fact, only two programs have modules for each program in each context. No concept was accessing dynamic locations not belonging to them. The determined for those programs with empty context. The second context is the one with the minimum number of third context, access to globals, which has the highest encapsulation violations in the programs considered. number of objects and attributes, is the one that generates Structured types of different modules are in the signature the highest number of concepts. Then, concepts were of some functions only in five public domain and three combined to form concept subpartitions. The number of industrial programs. In addition, the number of violations is possible combinations of k concepts taken from a set of generally low. Finally, the access to global variables by n concepts is the binomial coefficient of n and k. Therefore, external modules is very frequent in that all programs the total number of subpartitions to check could be exhibit some violations of this kind. This could indicate that exponential in the number of concepts. A timeout of global variables are commonly used as a means to exchange 10 hours was fixed to stop subpartition computation in information between modules, rather than a data structure cases in which the number of concepts is too high. around which to encapsulate the related computation. Subpartitions are formed in increasing order so that, when Encapsulation violations considered in Table 3 simply the computation is stopped, higher order subpartitions are obey the rule that a module has a function accessing an not determined. For the considered programs, it was attribute from another module. This is not always unde- possible to complete such a computation for all the contexts sired behavior. For example, global variables may be in which no more than 30 concepts were found. 
intentionally shared among modules, types could be in The average number of subpartitions determined with- the signature of functions that do not manipulate them but in the 10 hours timeout is 183,703.4, 52,912.9, and act as accessors returning the structure to be manipulated 161,426.9 in the three contexts, respectively. If concept by means of encapsulated functions, and dynamic strings partitions are considered instead, such average numbers may be accessed from anywhere without violating encap- dramatically decrease to 1.17, 1.61, and 1.05, respectively. sulation unless strings are themselves encapsulated. There- In fact, in many cases, the only disjoint concept combina- fore, a better starting point for restructuring is a context in tion that covers the whole object set is the top concept, which only attributes intended to be encapsulated are with all objects in the extent and typically empty intent. 360 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 4, APRIL 2001 Fig. 3. Program minicom: restructuring cost for different decomposition Fig. 4. Program minicom: restructuring cost as a function of the and violation relative levels, in the first context (dynamic memory decomposition (b I) and encapsulation violation (` I) relative levels, access). considered independently, in the first context (dynamic memory access). Considering subpartitions has an experimental validation encapsulation violations, the restructuring cost increases in its capability to extract many nontrivial concept with the decomposition level. Typically, an improvement in combinations that are otherwise missed. encapsulation violations can be obtained more easily (with Concept subpartitions were extended to be partitions of a lower cost) if a decrease in decomposition is accepted. the whole object set. To evaluate the resulting modulariza- Points on the horizontal axis represent solutions with no tion against the original one, the proposed measures of encapsulation violations at all. 
In such cases, all accesses to encapsulation violations, decomposition, and partition the selected attributes do not cross the boundaries of the distance have been employed. For subpartitions with the module. same encapsulation violations and decomposition, the one Fig. 4 shows the cost of reducing encapsulation viola- with the minimum distance was chosen. The diagram tions or increasing decomposition for the minicom representing this distance for each associated encapsulation program. The horizontal axis is divided into two intervals, violation and decomposition level was computed for every from 0 to 1 and from 1 to 2. Points in the 0, 1 range program in the test suite. The levels of encapsulation represent relative encapsulation violation levels for the violations and decomposition were considered relative to the original ones by computing the ratio between the two. concept subpartitions. The associated restructuring cost, Ratios also permit a comparison between restructuring estimated as partition distance from the original modular- actions on different programs. ization, is the vertical displacement. Encapsulation viola- An example of such a diagram for the minicom tion costs considered in this diagram are the minimum program, in the first considered context, is shown in Fig. 3. values with respect to the decomposition levels. Points in The shapes of the diagrams for the other programs are the 1, 2 range represent restructuring costs to improve slight variants of that in Fig. 3. A cost equal to zero is placed decomposition. Minimum values with respect to the at the coordinates (1, 1) since this is the initial level of different encapsulation violation levels are considered. This decomposition and encapsulation violation. 
Ratios between diagram is useful when restructuring is mainly focused the encapsulation violations in the restructured and in the only on encapsulation and decomposition can become original programs are low, thus indicating that the mod- worse or vice versa on decomposition, with the possibility ularizations determined by concept analysis are consistent of increasing encapsulation violations. The plot of the costs with the choices made by the programmers. They are associated with the restructured modularization found by comparable in granularity and organization. Furthermore, concept analysis suggests that low levels of encapsulation they often allow for the improvement of encapsulation violations and high levels of decomposition require ex- and/or decomposition. Points in the lowest region corre- pensive restructuring interventions. While reducing encap- spond to a reduction in the number of encapsulation sulation violations, the associated restructuring cost is not violations, while points in the rightmost region represent an monotone for most programs in the test suite, thus increased decomposition. The results depicted in Fig. 3 indicating that substantial improvement may be obtained show that, for the minicom program, there are opportu- at costs as low as those for minor improvements. On the nities for restructuring. In fact, several points are in the contrary, the cost for increasing decomposition has a more lowest rightmost region with fewer encapsulation viola- regular monotonic plot. Costs for decreasing encapsulation tions and increased decomposition. This is often true also violations are generally higher than costs for increasing for the other programs in the test suite, within the first and decomposition at the same relative improvement level, for third contexts, while, for the second context, a decrease most considered programs. 
The same kind of plot for both in encapsulation violations is often paid in terms of costs can be observed for all three considered contexts, but decreased decomposition. In addition, for a given level of the third context is characterized by a much higher cost TONELLA: CONCEPT ANALYSIS FOR MODULE RESTRUCTURING 361 TABLE 5 signature-type-based context with other attributes (dy- Relative Decomposition of the Extended Subpartitions and namic location or global variable accesses) by exploiting Distance from the Original Modules for the Programs with the knowledge of the relevance of the attribute for the No Encapsulation Violations searched modularization. By performing such an exten- sion on most of the programs in this category, it was possible to exactly reconstruct the original modularization. Two programs, gzip and flex, need special explana- tion. In gzip, the two functions _getopt_internal and getopt_long are extracted from their original module, getopt.c, by concept analysis, the reason for this being that the other three functions in this module do not manipulate struct option type data. Actually, two of them, namely my_strlen and my_index, are general string manipulation routines that do not share anything with _getopt_internal and getopt_long and are correctly taken separated. The other function, exchange, range. Such high costs are associated with eliminating shares the access to the command line string with the two global variable accesses from outside the modules defining encapsulated functions. If this access is modeled as an them, i.e., making all global variables static. additional attribute, concept analysis is able to group it with the other two extracted functions. 5.2 Assessing Concept Analysis Modularization In flex, the module sym.c implements a symbol table. In the first two contexts, there are some programs with no It exports several interface functions to manipulate the encapsulation violations at all. 
They can be used to assess symbol table, but it also contains the functions implement- the modularization capability of concept analysis: Attri- ing the hash table on which the symbol table is based. butes are already encapsulated in such cases and it is likely Concept analysis separates the hash table management that the encapsulation is based on a common purpose. functions from the more general symbol manipulation Therefore, concept analysis should determine a subpartition functions and assigns them to two distinct modules. By whichÐwhen extendedÐgives a modularization close to building an extended context based on the struct the original one. hash_entry type and the access to struct hash_entry Table 5 gives the list of all the programs in the first and type dynamic locations, all low level functions operating on second contexts without encapsulation violations and with a a hash table can be isolated and extracted. Symbol table nonempty context. For each of them, the subpartition manipulation functions use only interface functions to the without encapsulation violations with minimum distance hash table. from the original modularization was determined. Such a distance is shown in the next columns, after the decomposi- 5.3 Case Studies tion level, given as a fraction of the original decomposition. Some of the restructuring interventions suggested by On five of the 12 examined programs, concept analysis concept analysis were actally implemented on two of the was able to exactly reconstruct the same modularization as analyzed programs to obtain a deeper insight into the in the original programs by only exploiting information required actions and the resulting systems. about the attributes (dynamic memory or signature types) less is a UNIX utility to display a text file on a terminal of the involved functions. On nine of the remaining with the possibility of backward movements. 
In the programs, concept analysis modularization has a distance second context, shown in Table 6, it has just one of 1 from the original modularization and, in the last two encapsulation violation that can be eliminated by incre- cases, such distance becomes 2. Thus, when the modular- menting the decomposition level. The distance of this new ization extracted by concept analysis is not exactly the modularization from the original one is three. The original one, it is very close to it. Distance values of 1 or 2 detected encapsulation violation is due to the presence correspond to removing one or two functions from the of type struct scrpos in the signature of functions original module and inserting them into a new module, store_pos and get_pos from file ifile.c and thus increasing the decomposition level. function get_scrpos from file position.c. If all The cases with the remodularized program different computation on the struct scrpos type is encapsulated from the original one basically have two explanations. inside a separated file, two problems arise. As a field of Some modules group functions that are logically related but do not share any attribute mapped into a program- an ifile dynamic structure manipulated inside ifile.c ming construct. For example, modules manipulating is of type struct scrpos, the new module accesses its devices at a low level use a file descriptorÐrepresented private fields, thus violating encapsulation of dynamic as an integerÐto access the devices. Such a feature cannot memory (first context). Such a violation can be considered be represented by a proper attribute that can be auto- acceptable as the new module exports all operations on matically extracted from the code (checking accesses to struct scrpos data. Furthermore, an accessor function integer variables is too coarse a condition). 
In the other returning such a field is required in module ifile.c so basic situation, modules cannot be characterized by only that client modules can pass it to the new module without one kind of attribute. Typically, the user can extend the violating encapsulation. In the new module scrpos.c, 362 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 4, APRIL 2001 TABLE 6 part of the module window.c. Restructuring interventions Second Context (Structured Types) for the Program less found by concept analysis include a subpartition, with a cost equal to 42, allowing a 55 percent reduction of the violations and an increased decomposition. It was selected from the alternative subpartitions by exploiting the plots in Figs. 3 and 4. Among the points on the left plot, the one at coordinates (1.06, 0.45) with cost 42 exhibits an interesting trade-off between encapsulation, decomposition, and cost. If the minimum cost to improve encapsulation is considered with no regard to decomposition (Fig. 4) the same subpartitionÐpositioned at coordinates (0.45, 42)Ðappears as the best choice by giving a high improvement at minimum cost. The selected subpartition consists of one concept with 45 functions in the extent and one dynamic location of type WIN in the intent. By manually examining the statements inside those functions accessing the WIN type dynamic structure from outside window.c, it is evident that most of the accesses do not implement a meaningful and recogniz- the two functions get_scrpos and store_pos can be able operation on WIN data. Thus, such accesses can be merged and become function copy_scrpos. When replaced by simple get or set functions working on WIN encapsulated in scrpos.c, the ifile structure in their original signature is replaced by a second scrpos attributes. There are actually three functions performing a structure. 
As a consequence, the action performed is the more general operation, namely scrollback, drawhist, same, provided the actual parameters are exchanged in and getline from minicom.c. It is possible to incorpo- calls to store_pos, with respect to get_pos. To avoid rate them in the window.c module, thus extending its accesses to an internal dynamic structure of position.c, operations on WIN data structures. The three selected two accessor functions are added to this module, respec- functions also operate on a global location named us, of tively, returning an index in a dynamic table and the type WIN, which is static to their original module, value associated with an index. minicom.c. Therefore, to move and extract them from Alternative solutions to improve encapsulation are the original context, it is necessary to extend their signature considering the computation on struct scrpos as a part so as to include a pointer to the global WIN location that is of the computation performed inside position.c or manipulated. With some other minor changes, it was inside ifile.c. The first solution still has the disadvan- possible to encapsulate such operations inside the module tage that private fields of a dynamic structure belonging to window.c and to obtain a new version of the program with ifile.c are manipulated from inside position.c. The no encapsulation violation to the WIN type data structures second solution is probably the best one since it eliminates and with the same decomposition level. all undesired accesses. In fact, if the three considered The final solution for the minicom program is slightly functions are inserted into ifile.c, no external module different from the one associated with the selected sub- manipulates the ifile structure fields, no external func- partition. 
The reason is twofold: First, several violations tion manipulates scrpos structures, and the accesses to the were removed by simply providing get and set attribute dynamic table from position.c can be avoided by means manipulation operations; second, the functions recognized of the two accessor functions discussed above. as meaningful manipulations to be encapsulated were This example highlights that improving encapsulation is inserted, rather than becoming a new separated module, never a trivial task and substantial work is required on the in the module window.c since this is the natural site for part of the programmer to evaluate the alternative solutions them. As a consequence, the final decomposition level is and to take into account the whole picture. Nevertheless, the initial hints were determined through concept analysis unchanged, instead of increased. and shown to be very very useful. This second case study highlights the blind nature of minicom is a free communication program. Features concept analysis with respect to function semantics. All include a dialing directory with auto-redial, support for manipulations are considered equivalent, while a manual UUCP-style lock files on serial devices, a separate script inspection reveals that, for some of them, the availability of language interpreter, capture to file, and multiple users an accessor/modifier suffices, while others require a deeper with individual configurations. reworking, making them general encapsulated functions. In the first context, there are 22 encapsulation violations Nevertheless, concept analysis was a good starting point for associated with a dynamic location of type WIN. The data the identification of the interventions to be performed and structures of this type implement a portable character-based the selected subpartition contained useful restructuring window system for which all manipulating functions are suggestions. 
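The restructuring costs quoted in this section (for example, the distance of three for less and the cost of 42 for minicom) are partition distances in the sense of Section 4. As an illustration only, the recursive scheme described there can be sketched in Python. This is not the pseudocode of Fig. 2: the names make_partition and partition_distance are hypothetical, and the way the "two new partitions" are built (by dropping every object of the paired sets once its moves have been counted) is an assumption drawn from the textual description. Both partitions are assumed to cover the same set of objects.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable

# A partition is modeled as a set of non-empty, pairwise disjoint blocks.
Partition = FrozenSet[FrozenSet[str]]


def make_partition(blocks: Iterable[Iterable[str]]) -> Partition:
    """Build a partition, discarding empty blocks (partitions never contain the empty set)."""
    frozen = (frozenset(block) for block in blocks)
    return frozenset(block for block in frozen if block)


def _without(partition: Partition, settled: FrozenSet[str]) -> Partition:
    """Remove settled objects from every block and drop blocks that become empty."""
    return frozenset(block - settled for block in partition if block - settled)


@lru_cache(maxsize=None)
def partition_distance(first: Partition, second: Partition) -> int:
    """Minimum number of single-object moves turning `first` into `second`."""
    if first == second:
        return 0
    # The total number of objects is an upper bound for the minimum.
    best = sum(len(block) for block in first)
    for p in first:
        for q in second:
            # Only pairs that differ and share at least one object are tried.
            if p == q or not (p & q):
                continue
            # Every object in the symmetric difference costs one move.
            moves = len(p ^ q)
            # Assumption: once p has been turned into q, the objects of both
            # sets are settled and can be left out of the remaining problem.
            settled = p | q
            rest = partition_distance(_without(first, settled), _without(second, settled))
            best = min(best, moves + rest)
    return best


# Example from Section 4: {{f1, f2}, {f3}} versus {{f1}, {f2, f3}}.
proposed = make_partition([["f1", "f2"], ["f3"]])
original = make_partition([["f1"], ["f2", "f3"]])
print(partition_distance(proposed, original))  # 1
```

On the example of Section 4 the three candidate pairs yield symmetric differences of size one, two, and one, each with a recursive distance of zero, so the sketch returns one. The search is exponential in the worst case; memoization via lru_cache only avoids recomputing identical subproblems.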
6 CONCLUSION

This paper focused on the use of concept analysis for module identification. By extending concept subpartitions to cover the whole object set, a modularization candidate is determined for which the variations in encapsulation and decomposition are quantified. In addition, a measure of distance from the original modular structure of the program provides some indications of the cost of the restructuring interventions.

The proposed approach to module restructuring was applied to 10 public domain and 10 industrial programs. Alternatives with respect to the original modularizations were determined by concept analysis. The graphical plot of the restructuring cost for each encapsulation and decomposition relative level was a helpful tool when determining the selection of extended concept subpartitions. Concept analysis was also able to extract modularizations identical or very similar to those in the programs without encapsulation violations. This is a strong hint of the possibility of capturing the organization of functions around the manipulated data structures by analyzing proper access attributes through concept analysis. The execution of some complete restructuring interventions suggested by concept analysis highlighted the nontrivial nature of such interventions, but also reinforced the intuition that very useful suggestions can come from concept subpartition computation, especially when coupled with encapsulation and decomposition measures and restructuring cost estimates.

Paolo Tonella received the laurea degree cum laude in electronic engineering from the University of Padua, Italy, in 1992, and the PhD degree in software engineering from the same university, in 1999, with a thesis entitled "Code Analysis in Support to Software Maintenance." Since 1994, he has been a full time researcher of the Software Engineering Group at IRST (Institute for Scientific and Technological Research), Trento, Italy. He has participated in several industrial and European Community projects on software analysis and testing. His current research interests include software engineering, reverse engineering, object-oriented programming, and code analysis.