					Patty Jablonski
Paper Summary
November 12, 2007

          “Cloning Considered Harmful” Considered Harmful

Duplicated or cloned code is often considered harmful to software quality; however,
Kapser and Godfrey argue that it can also be a reasonable or beneficial design option.

First, in their background on cloning, the authors list reasons for “ill-intentioned”
cloning, including cases where: 1) duplicating code costs less in the short term than
creating an abstraction, 2) the programmer does not fully understand the problem or its
solution (and existing code already provides some or all of the functionality), and
3) the programmer simply repeats a common solution from memory, which results in
duplicated code. They also describe ways duplication can be introduced with “good
intentions,” including cases where: 1) it keeps the code clean and understandable instead
of introducing an unreadable, complicated abstraction, and 2) the programming language
lacks expressiveness, so a trusted solution is reused (for example, in COBOL).

Problems the authors attribute to cloning include: 1) increased code size, 2) unused or
“dead” code, 3) increased maintenance effort, and 4) reduced program comprehensibility.
On the other hand, beneficial effects of clones include: 1) reduced complexity where
abstractions are difficult to form, and 2) avoiding risk to the stability of existing
code, which is useful in exploratory development.

Kapser and Godfrey introduce a categorization of cloning patterns, similar to catalogs of
design patterns or anti-patterns. They define eight cloning patterns, observed in their
case studies, and place each into one of three groups: forking, templating, or
customization. Each pattern is described in terms of its name, motivation, advantages,
disadvantages, management, long-term issues, structural manifestations, and examples.

Forking
Forking usually involves duplicating a large portion of code to be evolved independently.
Cloning patterns in this group include:

   1. Hardware variations – This includes copying and pasting an existing driver to
      create a new driver in the same hardware family. An advantage of cloning is that
      the existing, already-tested driver is left unchanged and does not need to be
      retested. Disadvantages include maintenance issues and code growth. An example
      of this cloning pattern is seen in the Linux SCSI drivers.

   2. Platform variations – This includes cloning code when porting software to new
      platforms. In this case, cloning is often considered easier, faster, and safer.
      Advantages include avoided complexity and maintained stability. Disadvantages
      include maintenance issues such as propagating bug fixes and other changes to
      every clone. The authors suggest that variations between the clones be well
      documented and that the visible behavior of each clone remain consistent. An
      example of this cloning pattern is seen in the Apache Portable Runtime (APR)
      library.

   3. Experimental variation – This includes duplication when experimenting
      (optimizing or extending existing code), where the programmer does not want to
      risk the stability of the existing system. The experimental fork can be merged with
      or replace the stable version of the system later. The authors point out that both
      forks need to be consistently maintained and that it is important to document the
      differences between them. An example of this cloning pattern is seen in the
      Apache httpd web server.
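As an illustration of the experimental variation pattern, the following is a minimal
Python sketch that is not taken from the paper: a stable function is left untouched, a
forked copy is modified for the experiment, and a flag selects between them. The
function names and the USE_EXPERIMENTAL flag are hypothetical; the comment on the fork
records how it differs, following the authors' advice to document the differences.

```python
from collections import defaultdict


# Stable implementation: left untouched while the experiment runs.
def word_counts_stable(text: str) -> dict[str, int]:
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Experimental fork: copied from word_counts_stable and then modified.
# Documented difference: counts with a defaultdict instead of dict.get.
# Its visible behavior is intended to stay identical to the stable version.
def word_counts_experimental(text: str) -> dict[str, int]:
    counts = defaultdict(int)
    for word in text.lower().split():
        counts[word] += 1
    return dict(counts)


# Hypothetical switch: the fork can later replace the stable version
# or be merged back into it.
USE_EXPERIMENTAL = False


def word_counts(text: str) -> dict[str, int]:
    impl = word_counts_experimental if USE_EXPERIMENTAL else word_counts_stable
    return impl(text)
```

Because both forks are meant to show the same visible behavior, comparing their results
on shared inputs is one simple way to keep them consistent while both are maintained.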

Templating
Templating involves directly copying existing code when an appropriate abstraction is
unavailable. Cloning patterns in this group include:

   4. Boiler-plating due to language in-expressiveness – This includes cases where
      the programming language lacks expressiveness, so a trusted solution is reused
      (for example, in COBOL). Cloning in this case provides consistent behavior and
      improves program comprehensibility. A disadvantage can be increased
      maintenance; to address this, the authors suggest using synchronous editing,
      such as Linked Editing, or using “generated code at build time, making the
      duplicate exist only when the source code is compiled.” An example of this
      cloning pattern is seen in PostgreSQL.

   5. API/library protocols – This includes situations where an API requires an
      ordered series of procedure calls or activities, such as setting up GUI buttons
      or creating network sockets. This kind of cloning helps programmers learn the
      API and reduces coding effort, and the duplicated code is small. Problems occur
      when buggy code is duplicated or when a change must be made at every copy
      individually. An example of this cloning pattern is seen in the mail client
      Columba (GUI buttons).

   6. General language or algorithmic idioms – This includes clear and concise
      implementations of particular solutions that are structured and standardized.
      Cloning in this situation helps improve program comprehensibility; however,
      inconsistent or faulty implementations of an idiom may be overlooked. An
      example of this cloning pattern is seen in the Apache APR library.
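The API/library protocol pattern can be sketched in Python. The paper's examples are GUI
buttons and socket creation; this sketch instead assumes the standard sqlite3 module so
that it runs without a display or network. Each hypothetical helper (add_user,
add_order) repeats the same short call sequence: obtain a cursor, execute a statement,
commit, and close the cursor.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("CREATE TABLE orders (item TEXT)")
    return conn


# Cloned protocol, copy 1: cursor -> execute -> commit -> close.
def add_user(conn: sqlite3.Connection, name: str) -> None:
    cur = conn.cursor()
    cur.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    cur.close()


# Cloned protocol, copy 2: the same ordered calls, different statement.
def add_order(conn: sqlite3.Connection, item: str) -> None:
    cur = conn.cursor()
    cur.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    conn.commit()
    cur.close()
```

The copies are small and easy to write by following the protocol, but if the sequence
itself were wrong (for example, a missing commit), every copy would have to be fixed
individually.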

Customization
Customization involves copying existing code that solves a problem similar to the current
one and modifying it accordingly. Cloning patterns in this group include:

   7. Bug workarounds – This includes copying and pasting code in order to work
      around a bug that cannot be fixed directly because of code ownership issues or
      unacceptable risk exposure. Once the original bug is fixed, the duplication
      should be removed. One example arose when one of the authors wanted to fix a
      bug in the javac compiler: since he did not own the code and could not modify
      it directly, he had to modify a copy of it instead. Another example is seen in
      PostgreSQL and the MinGW external libraries, where a bug in the library had not
      yet been fixed, so the PostgreSQL developers worked around it by duplicating
      code.

   8. Replicate and specialize – This includes situations where existing code solves
      a problem similar to the current one and can be copied and modified
      accordingly. The authors identify this as the most common type of cloning.
      Advantages include minimizing the costs associated with risk and avoiding the
      costs of testing, refactoring, and developing an abstraction. Disadvantages
      include the difficulty of finding and maintaining the duplicates over time.
      Solutions may include creating an abstraction or linking the clones. An example
      of this cloning pattern is reusing complex logic by copying and pasting control
      structures (if statements, for/while loops, etc.). Another example is seen in
      Gnumeric (menu).
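A minimal Python sketch of replicate and specialize, using hypothetical functions that
are not drawn from the paper: mean_in_range is a copy of mean_positive whose loop and if
statement were reused, with only the filtering condition changed. The last function,
mean_where, shows one possible abstraction (passing the condition in as a parameter),
which corresponds to the "creating an abstraction" solution mentioned above.

```python
from typing import Callable, Iterable


# Original routine: mean of the positive values.
def mean_positive(values: Iterable[float]) -> float:
    total, count = 0.0, 0
    for v in values:
        if v > 0:
            total += v
            count += 1
    return total / count if count else 0.0


# Replicated and specialized copy: same control structure, new condition.
def mean_in_range(values: Iterable[float], lo: float, hi: float) -> float:
    total, count = 0.0, 0
    for v in values:
        if lo <= v <= hi:
            total += v
            count += 1
    return total / count if count else 0.0


# One possible abstraction: the varying condition becomes a parameter,
# so both clones could be rewritten as calls to this function.
def mean_where(values: Iterable[float], keep: Callable[[float], bool]) -> float:
    total, count = 0.0, 0
    for v in values:
        if keep(v):
            total += v
            count += 1
    return total / count if count else 0.0
```

For example, mean_where(values, lambda v: v > 0) behaves like mean_positive(values).
Whether to introduce such an abstraction is the trade-off the authors discuss: it removes
the duplicated control structure but adds development and testing cost of its own.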

Although many negative effects of code cloning have been cited as reasons to remove
duplicated code, Kapser and Godfrey argue that cloning can often be used in a positive
way. The authors suggest methods for managing clones and recommend that tools be
developed with the long-term maintenance of duplicates in mind.
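As a rough illustration of where clone-aware tooling might start, here is a naive
line-based duplicate finder in Python. It is a hypothetical sketch, not a tool from the
paper: it strips surrounding whitespace from each line, slides a fixed-size window over
the lines, and reports every window that occurs more than once along with its 1-based
starting line numbers. Many practical clone detectors go further, for example by
normalizing identifiers or comparing token sequences or syntax trees, which this sketch
does not attempt.

```python
from collections import defaultdict


def find_clones(lines: list[str], window: int = 3) -> dict[str, list[int]]:
    """Map each repeated window of `window` lines to its 1-based start lines."""
    normalized = [line.strip() for line in lines]
    seen = defaultdict(list)
    for i in range(len(normalized) - window + 1):
        chunk = tuple(normalized[i:i + window])
        # Skip windows that contain blank lines to reduce noise.
        if all(chunk):
            seen[chunk].append(i + 1)
    # Keep only windows that appear at least twice.
    return {"\n".join(chunk): starts for chunk, starts in seen.items() if len(starts) > 1}
```

Recording where each duplicate lives, rather than only flagging it for removal, is in the
spirit of the authors' call for tools that support long-term maintenance of clones.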