November 12, 2007
“Cloning Considered Harmful” Considered Harmful
Duplicated or cloned code is often considered harmful to software quality; however, it can
also be a reasonable or beneficial design option.
In their background on cloning, the authors (Kapser and Godfrey) mention reasons for
“ill-intentioned cloning,” including when: 1) the cost of duplicating code is less than the cost
of creating an abstraction in the short term, 2) the programmer doesn’t fully understand
the problem or solution (and existing code provides some or all of the functionality), and
3) the programmer simply repeats a common solution from memory, which results in
duplicated code. They also mention ways in which duplication can be introduced with
“good intentions” including when: 1) it keeps the code clean and understandable rather
than introducing an unreadable, complicated abstraction and 2) the programming
language lacks expressiveness, so a trusted solution is reused (for example, in COBOL).
Problems that the authors attribute to cloning include: 1) an increase in code size,
2) the presence of unused or “dead” code, 3) an increase in maintenance effort, and
4) a decrease in program comprehensibility. On the other hand, beneficial effects of
clones include: 1) a reduction in complexity where abstractions are difficult to form
and 2) avoidance of risk to code stability, which is useful in
Kapser and Godfrey introduce a categorization of cloning patterns similar to the
cataloging of design patterns or anti-patterns. They define eight patterns of cloning,
drawn from case studies, and place each into one of three groups: forking, templating, or
customization. They describe each pattern in terms of its name, motivation, advantages,
disadvantages, management, long-term issues, structural manifestations, and examples.
Forking usually involves duplicating a large portion of code to be evolved independently.
Cloning patterns in this group include:
1. Hardware variations – This includes copying and pasting an existing driver to
create a new driver in the same hardware family. An advantage of cloning is that
the existing driver is left untouched and so does not need to be retested.
Disadvantages include maintenance issues and code growth.
An example of this cloning pattern is seen in the Linux SCSI driver.
2. Platform variations – This includes cloning when porting software to new
platforms. It is often considered easier, faster, and safer to clone the code in this
case. Advantages include avoided complexity and maintained stability.
Disadvantages include maintenance issues like propagating bug fixes and other
changes. The authors suggest that variations between the clones should be well
documented and that visible behavior from each clone remain consistent. An
example of this cloning pattern is seen in the Apache Portable Runtime (APR)
3. Experimental variation – This includes duplication when experimenting
(optimizing or extending existing code), where the programmer does not want to
risk the stability of the existing system. The experimental fork can be merged with
or replace the stable version of the system later. The authors point out that both
forks need to be consistently maintained and that it is important to document the
differences between them. An example of this cloning pattern is seen in the
Apache httpd web server.
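
The advice to document the differences between forked clones and to keep their visible behavior consistent can be made concrete with a small check. The sketch below is illustrative only: the two path-joining functions and all names in it are hypothetical stand-ins for platform-specific clones, not code from the paper or from any of the systems it studies.

```python
# Hypothetical platform forks of one routine; all names are illustrative only.

def join_path_posix(parts):
    # Clone 1: POSIX variant, separator is "/".
    return "/".join(p.strip("/") for p in parts if p)


def join_path_windows(parts):
    # Clone 2: copied from the POSIX variant.
    # Documented difference: separator is "\\" and both slash kinds are stripped.
    return "\\".join(p.strip("/\\") for p in parts if p)


def same_visible_behavior(cases):
    # The forks should agree once the documented difference (the separator)
    # is normalized away.
    return all(
        join_path_posix(c) == join_path_windows(c).replace("\\", "/")
        for c in cases
    )
```

Running a check like `same_visible_behavior` whenever either fork changes is one possible way to notice when a bug fix has reached only one of the clones.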
Templating involves directly copying existing code when an appropriate abstraction is
unavailable. Cloning patterns in this group include:
4. Boiler-plating due to language in-expressiveness – This includes cases when
the programming language lacks expressiveness, so a trusted solution is reused
(for example, in COBOL). Cloning in this case provides consistent behavior and
improves program comprehensibility. A disadvantage can be increased
maintenance effort; to address this, the authors suggest using synchronous
editing, such as Linked Editing, or using “generated code at build time,
making the duplicate exist only when the source code is compiled”. An example
of this cloning pattern is seen in PostgreSQL.
5. API/library protocols – This includes situations where there is an ordered series
of procedure calls or an order of activities, as in creating GUI buttons or network
sockets. This kind of cloning helps with learning and reduces coding effort. Also,
the size of the duplicated code is small. Problems occur when buggy code is
duplicated or when changes need to be made in all places individually. An
example of this cloning pattern is seen in the mail client Columba (GUI buttons).
6. General language or algorithmic idioms – This includes clear and concise
implementations of particular solutions that are structured and standardized.
Cloning in this situation helps improve program comprehensibility; however,
inconsistencies or faulty implementations of idioms may be overlooked. An
example of this cloning pattern is seen in the Apache
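
To illustrate the build-time generation idea mentioned under pattern 4, the following is a minimal sketch in which near-identical functions are produced from a single template instead of being copied by hand. The template, the field names, and the `load_generated` helper are all hypothetical; the paper does not prescribe a particular generator, and `exec` here merely stands in for a real build step.

```python
# Hypothetical generator: field names and the getter template are illustrative.
GETTER_TEMPLATE = '''def get_{name}(record):
    """Return the '{name}' field, or None when it is missing."""
    return record.get("{name}")
'''


def generate_getters(field_names):
    # Emit one near-identical function per field from a single template,
    # so a fix to the template reaches every copy on regeneration.
    return "\n".join(GETTER_TEMPLATE.format(name=n) for n in field_names)


def load_generated(source):
    # Stand-in for a build step: compile the generated text into a namespace.
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace
```

With this arrangement the duplicates exist only in the generated output, so the maintained source holds a single copy of the boilerplate.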
Customization involves copying existing code that solves a similar problem to the current
problem and modifying it accordingly. Cloning patterns in this group include:
7. Bug workarounds – This includes copying and pasting code in order to work
around a bug when issues of code ownership or unacceptable risk exposure prevent
fixing it at its source. In this situation, once the original bug is fixed, any
duplication should be removed. An example of this cloning pattern is seen when one
of the authors wanted to fix a bug in the javac compiler: he did not have access to
or ownership of the code, so he could not modify it directly and instead had to
modify a copy of the code. Another example is seen
in PostgreSQL and the MinGW external libraries, where the bug in the library
wasn’t fixed yet, but the PostgreSQL developers were able to work around it by
8. Replicate and specialize – This includes situations where existing code
solves a problem similar to the current one and can be modified
accordingly. The authors identify this as the most common type of cloning.
Advantages include minimizing the costs associated with risk and reducing the
costs of testing, refactoring, and developing an abstraction. Disadvantages include
the difficulty of finding and maintaining the duplicates over time. Solutions may
include creating an abstraction or linking the clones. An example of this cloning
pattern is seen when reusing complex logic by copying and pasting control
structures (if statements, for/while loops, etc.). Another example is seen in
While many negative effects of code cloning have been cited as reasons to remove
duplicated code, code cloning can often be used in a positive way,
according to Kapser and Godfrey. The authors suggest methods of managing these code
clones and suggest that tools should be developed with the long-term maintenance of
duplicates in mind.
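
As a rough illustration of what the simplest form of tool support for managing duplicates might look like, the sketch below reports runs of identical lines that occur more than once in a piece of source text. It is an assumption-laden toy, not the authors' tooling: the function name and the `window` parameter are hypothetical, and it finds only exact textual copies, whereas the customization patterns above involve modified copies that it would miss.

```python
from collections import defaultdict


def find_exact_clones(source, window=3):
    # Normalize: strip whitespace and drop blank lines, remembering the
    # original 1-based line numbers.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    seen = defaultdict(list)
    # Slide a fixed-size window over the normalized lines and group
    # identical windows by their content.
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        seen[chunk].append(lines[i][0])
    # Keep only windows that occur more than once.
    return {chunk: starts for chunk, starts in seen.items() if len(starts) > 1}
```

A report of this kind only locates duplicates; deciding whether a given clone should be kept, linked, or refactored still depends on which of the patterns above it falls under.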