Patty Jablonski
Paper Summary
November 12, 2007

“Cloning Considered Harmful” Considered Harmful

Duplicated or cloned code is often considered harmful to software quality; however, it can also be a reasonable or even beneficial design option. In their background on cloning, the authors, Kapser and Godfrey, list reasons for “ill-intentioned cloning”, including when: 1) the cost of duplicating code is less than the cost of creating an abstraction in the short term, 2) the programmer doesn’t fully understand the problem or solution (and existing code provides some or all of the functionality), and 3) the programmer simply repeats a common solution from memory, which results in duplicated code. They also mention ways in which duplication can be introduced with “good intentions”, including when: 1) it keeps the code clean and understandable rather than introducing an unreadable, complicated abstraction and 2) the programming language lacks expressiveness, so a trusted solution is reused (for example, in COBOL).

Problems that the authors attribute to cloning include: 1) an increase in code size, 2) the presence of unused or “dead” code, 3) an increase in maintenance effort, and 4) a decrease in program comprehensibility. On the other hand, beneficial effects of clones include: 1) a reduction in complexity where abstractions are difficult to form and 2) avoiding risk to the stability of existing code, which is useful in exploratory development.

Kapser and Godfrey introduce a categorization of cloning patterns similar to the cataloging of design patterns or anti-patterns. They define eight patterns of cloning, found in case studies, and place each into one of three groups: forking, templating, or customization. They describe each pattern in terms of its name, motivation, advantages, disadvantages, management, long term issues, structural manifestations, and examples.
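The short-term trade-off described above, duplicating code versus building an abstraction, can be illustrated with a small sketch in Python. The function names and the invoice/refund scenario are hypothetical and do not come from the paper; they only show what a clone pair and a replacing abstraction might look like.

```python
# Hypothetical example: two formatters that started life as copy-paste clones.

def format_invoice_total(amount):
    # Original code: formats an invoice total with two decimal places.
    return "Invoice total: " + f"{amount:.2f}" + " USD"


def format_refund_total(amount):
    # Clone of format_invoice_total with only the label changed. Any fix to
    # the number formatting now has to be applied in both functions.
    return "Refund total: " + f"{amount:.2f}" + " USD"


def format_total(label, amount, currency="USD"):
    # Abstraction that could replace both clones. It removes the duplication,
    # but every caller now has to understand the extra parameters.
    return f"{label} total: {amount:.2f} {currency}"


if __name__ == "__main__":
    print(format_invoice_total(12.5))
    print(format_total("Invoice", 12.5))
```

For two small functions like these, the clones are arguably easier to read than the abstraction, which is the kind of “good intentions” case the summary mentions. The balance shifts as the number of copies or the size of the duplicated logic grows.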
Forking

Forking usually involves duplicating a large portion of code that will then evolve independently. Cloning patterns in this group include:

1. Hardware variations – This includes copying and pasting an existing driver to create a new driver in the same hardware family. An advantage of cloning is that the existing driver does not need to be retested. Disadvantages include maintenance issues and code growth. An example of this cloning pattern is seen in the Linux SCSI drivers.
2. Platform variations – This includes cloning when porting software to new platforms. It is often considered easier, faster, and safer to clone the code in this case. Advantages include avoided complexity and maintained stability. Disadvantages include maintenance issues such as propagating bug fixes and other changes. The authors suggest that variations between the clones should be well documented and that the visible behavior of each clone should remain consistent. An example of this cloning pattern is seen in the Apache Portable Runtime (APR) library.
3. Experimental variation – This includes duplication when experimenting (optimizing or extending existing code), where the programmer does not want to risk the stability of the existing system. The experimental fork can later be merged with, or replace, the stable version of the system. The authors point out that both forks need to be consistently maintained and that it is important to document the differences between them. An example of this cloning pattern is seen in the Apache httpd web server.

Templating

Templating involves directly copying existing code when an appropriate abstraction is unavailable. Cloning patterns in this group include:

4. Boiler-plating due to language in-expressiveness – This includes cases when the programming language lacks expressiveness, so a trusted solution is reused (for example, in COBOL). Cloning in this case provides consistent behavior and improves program comprehensibility.
A disadvantage can be increased maintenance; to address this, the authors suggest using synchronous editing, such as Linked Editing, or using “generated code at build time, making the duplicate exist only when the source code is compiled”. An example of this cloning pattern is seen in PostgreSQL.
5. API/library protocols – This includes situations where there is an ordered series of procedure calls or an order of activities, as in creating GUI buttons or network sockets. This cloning is beneficial for learning and reduces coding effort. Also, the size of the duplicated code is small. Problems occur when buggy code is duplicated or when a change has to be made in every copy individually. An example of this cloning pattern is seen in the mail client Columba (GUI buttons).
6. General language or algorithmic idioms – This includes clear and concise implementations of particular solutions that are structured and standardized. Cloning in this situation helps improve program comprehensibility; however, inconsistencies or faulty implementations of idioms may be overlooked, which could be a problem. An example of this cloning pattern is seen in the Apache APR library.

Customization

Customization involves copying existing code that solves a problem similar to the current one and modifying it accordingly. Cloning patterns in this group include:

7. Bug workarounds – This includes copying and pasting in order to fix a bug when the original code cannot be changed because of code ownership issues or unacceptable risk exposure. In this situation, once the original bug is fixed, any duplication should be removed. An example of this cloning pattern is seen when one of the authors wanted to fix a bug in the javac compiler: he did not own the code and could not modify it directly, so he modified a copy instead.
Another example is seen in PostgreSQL and the MinGW external libraries, where a bug in the library had not yet been fixed, but the PostgreSQL developers were able to work around it by duplicating code.
8. Replicate and specialize – This includes situations where existing code solves a problem similar to the current one and can be copied and modified accordingly. The authors identify this as the most common type of cloning. Advantages include minimizing the costs associated with risk and reducing the costs of testing, refactoring, and developing an abstraction. Disadvantages include the difficulty of finding and maintaining the duplicates over time. Solutions may include creating an abstraction or linking the clones. An example of this cloning pattern is seen when reusing complex logic by copying and pasting control structures (if statements, for/while loops, etc.). Another example is seen in Gnumeric (menu).

While many negative effects of code cloning have been cited as reasons to remove duplicated code from source code, code cloning can often be used in a positive way, according to Kapser and Godfrey. The authors suggest methods of managing these code clones and suggest that tools should be developed with the long term maintenance of duplicates in mind.
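The difficulty of finding duplicates over time, noted under replicate and specialize, is one place where tool support can help. The following is a minimal sketch in Python of exact-match duplicate detection, assuming a fixed window of lines compared after stripping whitespace. It is not the tooling used or proposed by Kapser and Godfrey, and `find_duplicate_blocks` and `SAMPLE` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def find_duplicate_blocks(source, window=3):
    """Return groups of 1-based line numbers where identical blocks start.

    Lines are compared after stripping whitespace, and blank lines are
    skipped. Only exact textual copies are found; clones that were renamed
    or edited after copying are missed.
    """
    lines = [(num, text.strip())
             for num, text in enumerate(source.splitlines(), start=1)
             if text.strip()]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        # Use the window's text as a dictionary key to group identical blocks.
        block = tuple(text for _, text in lines[i:i + window])
        seen[block].append(lines[i][0])
    return [starts for starts in seen.values() if len(starts) > 1]


SAMPLE = """\
a = read()
b = clean(a)
save(b)

x = 1

a = read()
b = clean(a)
save(b)
"""

if __name__ == "__main__":
    # The three-line block starting at line 1 is repeated at line 7.
    print(find_duplicate_blocks(SAMPLE))
```

A list of clone locations like this could be used to document the copies or to check that a bug fix reached each one. A practical tool would also need to handle clones that diverged after copying, which this sketch does not attempt.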