

Seven Deadly Sins of Slow Software Builds
Usman Muzaffar, VP Product Management | 5.2010

I. The Problem of Slow Software Builds
Software builds are one of the most time-consuming, resource-intensive and expensive phases of embedded
system development. The very attributes of software that make it naturally appealing for embedded
applications—trivially duplicated, easily modified, and readily composed from small parts into large products—
have contributed to an exponential increase in size, complexity, and number of targets. This explosive growth has
caused build times to jump correspondingly, decelerating the entire software production cycle. Without the ability
to quickly rebuild the product, even the smallest changes force developers and testers to waste valuable time
waiting. In many organizations, it is clear that slow software builds are the primary obstacle to shipping more
product of better quality on time.

In spite of this and the near universal agreement that faster builds are always better, in practice little is done
to directly address the problem. There are two fundamental reasons for the hesitation. First, the software build
system is a critical piece of infrastructure; any interruption, even one done in the name of improvement, can be
extremely costly to the organization. Too often, a preliminary analysis recommends that the only remedy is a
complete rewrite of the existing system; the team then concludes that undertaking a substantial infrastructure
project is imprudent “at this time” and not worth the potential risk.

Second, even when the mandate to improve build times has been escalated to a business priority, the technical
options available are few and frequently inapplicable. Solutions to accelerate build times, both open source and
commercial, can have an alarming number of restrictions on compatibility with operating systems, platforms,
compilers, and toolchains. The organic evolution of most build systems usually means they employ a wide variety
of tools and methodologies: the apparent impedance mismatch between the prerequisites of an acceleration
solution and the real-world specifics of the build system can make the problem appear still more intractable.

What we have found, however, is that there are several ways that build system speed can be addressed, and
large gains can often be achieved in even the most complex environments with minimal change and disruption to
the existing workflow.

It is worth reviewing how the industry has evolved solutions to the problem. Conceptually, there are three ways
to make improvements to a process transforming inputs into outputs:

     1.   Start later
     2.   Finish sooner
     3.   Run faster

In a software build system, “starting later” simply means re-using as much output as possible from previous
runs. This, of course, is precisely what tools like the classic Unix make facility and its most popular modern
descendant, GNU Make 3.81 [1], are designed to do: by comparing modification times on input and output files,
they will only re-run the commands to update those targets that are out of date with respect to their inputs.
More generally: incremental builds are an instance of a larger strategy usually called build avoidance. By making
sophisticated use of a pre-built object cache, systems can avoid compilation. Incremental builds and avoidance
strategies are critical to individual developer workflows, but they do not have widespread adoption within
production teams tasked with building the full product. First, the speed of modern compilers is often superior
to the lookup-and-retrieve mechanics of complicated cache systems; in these cases, paradoxically, it is faster
to rebuild than to re-use. Second, avoidance systems are inherently unpredictable: depending on the nature of
the change and the state of the object cache, build times can vary widely. Finally, and most importantly, release
teams are held strictly accountable for being able to rebuild any version they ship; a nondeterministic cache
means they cannot guarantee their ability to recreate, bit-for-bit, the contents of a release.

“Finishing sooner” is a less useful concept for our purposes. It manifests most prominently in large systems that
have a long-running nightly or continuous build that creates a large set of components, each with distinct owners.
While the entire product cannot be considered built until the very last link step is complete, any individual library
or subcomponent may be ready hours earlier. Designing a system that makes these intermediate outputs readily
available to their owners can dramatically reduce effective cycle time.

By far the most important strategy for addressing long build times is to make the build process itself run
faster. To some extent, this can be accomplished simply by using faster hardware: newer processors,
disks, and networks that push more bits through the cycle will yield correspondingly faster builds. In
practice, however, large organizations quickly extract as much as they realistically can from hardware upgrades.
What is really required to make substantial gains in performance, and what we will spend the rest of this paper
examining, is making effective, scalable use of parallel builds.

Builds are an excellent candidate for parallel (and distributed) computation, because they comprise a large
number of processes (typically, compiles and links) that are logically distinct. Engineers who first launch their
builds against a multiprocessor system (or, more ambitiously, a distributed compute cluster) are universally
disappointed to discover one or both of the following:

    1.   The build fails unexpectedly
    2.   The performance improvement is far less than expected

Underlying both of these conditions is a problem with dependencies. Implicit or missing dependencies mean
the parallel build has insufficient information to accurately order build steps. For example, a link consuming a
compiled object may inadvertently run before the compile producing that object is complete. Conversely (and
more evident in builds that are functional but slow), explicit or implicit overserialization can force the system
to use only one processing node even when more are available. The problems are analogous to others in
multithreaded programming: without effective synchronization, the system is vulnerable to races and deadlocks.

Most build tools do not have a good facility for completely guaranteeing (or even specifying) safe parallel
execution. Dependencies must be listed explicitly; for example in Make syntax, prerequisites and targets are
enumerated on a single line:

                                     foo.o: foo.c
                                       $(CC) -c foo.c -o foo.o

The system makes no requirements as to what can or cannot be listed as a prerequisite, nor does it demand that
the commands actually update the target. The tremendous flexibility tools like Make give their authors means
the opportunity for misuse is broad. In the next section, we will catalog the most common mistakes that impact
parallel performance.

II. Common Problems Impacting Build Parallelization
For each problem, we will explore the situation that leads to the tempting antipattern, followed by real examples
of the consequences. We then describe alternative strategies that are better suited to parallel (and thus faster)
builds.

Build Plots as a Diagnostic Tool

Because software builds can easily have tens of thousands of steps (larger builds can approach a million
compiles), it is extremely useful to have a facility to visually graph the behavior of a parallel build. The figure
below shows a build plot generated by instrumenting the Make process [2]. The X-axis is time; each row on the
Y-axis represents a compute node tasked with running some steps. Each block on the graph is a job running on
a compute node for some duration; different colors are used to indicate different kinds of jobs (compiles, links,
etc.). In an ideal build, the blocks are closely packed from start to finish on each compute node: wherever there
is whitespace, an explicit or implicit serialization has forced that node to wait until its required dependencies are
satisfied, negatively impacting performance.

1. Make on the Bottom

Unlike most imperative programming and scripting languages, Make is a declarative language [3]. Frustrations
with Make syntax and logic frequently lead to the creation of a “wrapper script”, typically written in a high-level
language like Perl or Python, which iterates over projects or components and invokes Make for each one:

Moving out the main iteration is problematic for two reasons. First, we have lost the implicit parallelism the Make
rules afforded us. The Perl script’s foreach loop is necessarily serial, synchronously forking each component’s
subprocess in turn. If we attempt to make it multithreaded, we must assume all the burden of synchronization,
which can make maintenance much more difficult. Second: the Perl script gives us a tempting but dangerous
facility to do component-specific pre- and post-processing outside of Make control. Using it will lead to
serialization and inefficient job packing, as this build plot from a Windows Mobile build [4] illustrates:

Fortunately, this is readily resolved by using the Perl script to do one-time setup and tear-down, then letting a
top-level Makefile recursively invoke submakes for each component:
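
A minimal sketch of such a top-level Makefile follows. It assumes GNU Make and that each component directory holds its own Makefile; the component names (libfoo, libbar, app) and the build-% naming are hypothetical, chosen only for illustration.

```make
# Hypothetical component list, maintained in a single place.
COMPONENTS := libfoo libbar app

# "make -j8 all" lets Make schedule independent components concurrently.
all: $(COMPONENTS:%=build-%)

# One pattern rule drives every component; $* is the component name.
# Invoking $(MAKE) (not a literal "make") lets submakes share the parent's job slots.
# The build-% targets are deliberately not listed in .PHONY, because GNU Make
# skips implicit (pattern) rule search for phony targets.
build-%:
	$(MAKE) -C $*

# Inter-component ordering in ordinary Make syntax:
# app links against both libraries, so they must finish first.
build-app: build-libfoo build-libbar

.PHONY: all
```

Any one-time setup or tear-down can stay in the wrapper script, which then calls this Makefile once.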

By leveraging Make’s pattern rule syntax, it is easy to preserve the data-driven brevity of the Perl script, and
maintain the list of subcomponents in a single location. We can also annotate explicit dependencies between
components using normal Make syntax, and we are absolved of dealing with any parallelization or synchronization.

2. Targets with Side Effects

A common problem in build systems is the need to pass information between commands. For example, one
target may need to compute a tool version or identification number that needs to be consumed by another.
Because Make (and similar tools) provide rich mechanics for macro and variable computation, it is tempting to
simply use those facilities to calculate a value when ready and dereference when necessary:

                  all: a b

                  a:
                        @do-stuff ...
                        @$(eval VERSION=$(shell gcc -v 2>&1))

                  b:
                        @echo "$(VERSION)"

This kind of usage is a side-effect in a declarative language, because the variable (here, VERSION) is global
system state that is mutated as a consequence of updating a Make target. There are good reasons to avoid
side-effects in general, as they make declarative systems much harder to understand and maintain [5]. For our
purposes, the side-effect is a particularly dangerous kind of implicit serialization, because Make has no visibility
of its existence. Running a build with this construct in parallel will fail unpredictably.

The simplest way to address the situation is to introduce an explicit dependency between the target that creates
the side-effect and those that consume it (or, in very simple cases like the one above, merge the commands
into a single target). While this will ensure a correct build, for a large number of targets, it will again introduce
crippling serializations. In that case, the side-effect needs to be factored out of as many targets as possible, so
that downstream consumer targets may be allowed to execute safely in parallel.
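
One way to factor the side-effect out of the targets, shown here as a minimal sketch, is to compute the value once while the Makefile is parsed, before any target runs. The sketch assumes GNU Make; gcc -dumpversion stands in for the version query used above, and the target names are illustrative.

```make
# Evaluated exactly once at parse time (":=" is a simply expanded variable),
# so no recipe mutates global state while the build is running.
VERSION := $(shell gcc -dumpversion)

all: a b

# a and b share no hidden state, so they may run in either order or in parallel.
a:
	@echo "building a"

b:
	@echo "compiler version: $(VERSION)"

.PHONY: all a b
```

This only works when the value does not depend on something the build itself produces. If it does, the explicit dependency described above is still required.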

3. Multiply Updated Files

The tools used in most build systems usually consume one or a few input files and produce a single output file. It
is this monohierarchical nature that lends itself so well to parallelization. Some build tools, however, have a need
to continually update a single file over multiple invocations. Examples include the compiled template repository in
the Sun Studio CC compiler [6], or the PDB debugging and symbol files produced by the Microsoft Visual Studio cl
compiler [7]:

                 %.obj: %.c
                     cl /c /Fo'$@' /Fd'vc80.pdb' $<

Here, all compiles are implicitly serialized against writes to a single vc80.pdb file. The same pattern is seen in
Makefile rules that update a single archive (for example, a ZIP or Jar file) as outputs are produced.

As this build plot shows, running this in parallel can lead to serious problems as multiple processes attempt to
update the same file simultaneously:

Here, the red bricks represent any job that was identified as attempting to read or modify a file before a job
that was serially earlier had finished writing it. This build was run with ElectricAccelerator, which detected this
condition and remedied it by re-running the affected target serially. While correct, this build lost any benefit of
parallelization due to the multiply updated files.

The best solution to this problem is to leverage the fact that almost all tools that do incremental updates of a
common file (including the compilers and archive tools noted above) are capable of merging partially built files.
We can use this property to restructure the invocation into two phases: first, let as many individual targets
update a unique archive or database file in parallel, then, as a serialized final step, merge the individual files into
the single destination:

Using this construct as a guide, we can re-write our Makefile rule as:

                 %.obj: %.c
                     cl /c /Fo'$@' /Fd'$*.pdb' $<

The compiler will now write foo.obj’s symbols into foo.pdb; when linking the final executable, the linker
automatically knows where to retrieve the partial debug information.
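
The same two-phase idea applies to the archive case mentioned earlier. The following is a minimal sketch, not a rule from any particular build: the file names are hypothetical, and it builds the archive once from all objects rather than merging partial archives.

```make
# Hypothetical source list.
SRCS := foo.c bar.c baz.c
OBJS := $(SRCS:.c=.o)

# Phase 1: each compile writes only its own object file,
# so all of these jobs can run in parallel.
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

# Phase 2: a single serialized step combines the outputs.
# No rule updates the archive incrementally as objects are produced.
libexample.a: $(OBJS)
	$(AR) rcs $@ $^
```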

4. Pass-Based Builds (and the myth of “cyclical dependencies”)

Another often-used paradigm is to structure the build such that it requires multiple passes over the entire
hierarchy. Typically, this strategy is employed because dependencies between compile and link or link and
packaging were too complex to disentangle and enumerate. Sometimes, the build is declared to have “cyclical
dependencies”, and not only runs multiple phases (compile, link, etc.) over the source tree, but also recurses into the
very same directories with identical targets repeatedly.

                  all: compile libs executables

                  compile:
                        $(MAKE) -C srcs compile
                  libs:
                        $(MAKE) -C srcs libs
                  executables:
                        $(MAKE) -C srcs executables

                  # Each pass then recurses into the same subdirectories again, e.g.:
                  #     $(MAKE) -C srcs/foo compile

This structure can lead to a tremendous amount of wasted work. Thousands of small “do-nothing” jobs consume
both processing power and network bandwidth to determine that nothing needs to be done for targets Make has
already updated. On a build plot, these jobs are so short and so closely packed together that they appear as black bands at
the start of each Make instance:

The solution here is to avoid writing Makefile rules that recurse into the same directory multiple times during
a single build. Rather than looping over the source tree repeatedly with a single directive on each iteration,
restructure to loop once with multiple targets in each directory.
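
As a minimal sketch of looping once with multiple targets, assuming GNU Make and hypothetical subdirectories foo and bar under srcs:

```make
# Hypothetical subdirectories under srcs/.
SUBDIRS := foo bar

all: $(SUBDIRS)

# Enter each directory once and request every phase in a single invocation.
# Each sub-Makefile is assumed to declare that libs depends on compile and
# executables depends on libs, so the phases stay correctly ordered under -j.
$(SUBDIRS):
	$(MAKE) -C srcs/$@ compile libs executables

.PHONY: all $(SUBDIRS)
```

Each directory is now visited by one Make instance per build instead of one per phase, which removes the repeated "do-nothing" checks.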

The need to support “cyclical dependencies” is interesting, because all Make-based systems easily detect and
reject Makefile constructs that explicitly contain cycles. The actual situation underlying what is usually called
a cyclical dependency is a component A that is only partially built on a first pass. A second component, B,
consumes the built part of A, and itself produces a new output that the remainder of A requires.

There is no cycle: rather, there are three components, A1, B, and A2 that have a simple serialization relationship
between them. Modeling the relationship this way is both more accurate and more efficient than resorting to
multiple passes over A.
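
The three-component relationship could be expressed directly, as in the sketch below. The directory names A and B and the sub-goals part1 and part2 are hypothetical; a real build would use whatever targets produce the two halves of A.

```make
all: A2

# A1: the part of component A that needs nothing from B.
A1:
	$(MAKE) -C A part1

# B consumes the output of A1.
B: A1
	$(MAKE) -C B

# A2: the remainder of A, which needs B's output.
A2: B
	$(MAKE) -C A part2

.PHONY: all A1 B A2
```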

5. Outputs in the Source Directory

Make’s default compilation rules will instruct the compiler to create object files in the same directory as the
sources:

                  %.o: %.c
                         $(CC) -c $< -o $@

This is unfortunate, because it means that when the same set of sources is used to produce different outputs,
the old objects must be deleted first. This situation is virtually assured in real-world projects: sources may
need to be built with different preprocessor options, or different debug or optimization levels. The most obvious
solution is to simply instruct make to clean the derived objects and then rebuild with the new options:

                           $(MAKE) CFLAGS=-g
                           $(MAKE) clean
                           $(MAKE) CFLAGS=-O2

This construct introduces a build-level serialization, with a build plot similar to the “Make on the bottom”
construct. All of the compilations in the optimization build are artificially serialized against the entire debug build.
More generally: letting outputs build into the source directory necessarily means that the system can only
support one build at a time. Once the project matures and requires independent builds on any number of axes
(variants, versions, architectures), this can become a severe obstacle to efficient parallel builds.

The solution is to avoid Make’s defaults and instead define rules that instruct the compiler to write outputs into
architecture or build-specific directories. By using GNU Make’s filename manipulation functions, we can write a
macro that transforms a list of sources in the current directory into targets in the output directory:

                  objname=$(addprefix $(OUTDIR)/, \
                          $(notdir $(1:.cpp=.o)))

                  SRCS = main.cpp foo.cpp bar.cpp

                  OBJS = $(call objname,$(SRCS))

                  $(OUTDIR)/%.o: %.cpp
                      $(COMPILE.cpp) -o '$@' '$<'

Now, simply by redefining the OUTDIR macro, multiple builds over the same source tree can proceed
simultaneously. This method has the added benefit that cleaning the build tree is as simple as deleting the single
OUTDIR directory.
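
To illustrate, the debug and optimized builds that were serialized earlier can be driven side by side. This is a minimal sketch assuming GNU Make and a file named Makefile in the source directory; the out/debug and out/release paths and the variants, debug, release and objs target names are hypothetical.

```make
OUTDIR ?= out/debug
SRCS := main.cpp foo.cpp bar.cpp
OBJS := $(addprefix $(OUTDIR)/,$(notdir $(SRCS:.cpp=.o)))

# Default goal: both variants. Under -j they may run concurrently,
# because neither writes into the other's output directory.
variants: debug release

debug:
	$(MAKE) objs OUTDIR=out/debug CXXFLAGS=-g

release:
	$(MAKE) objs OUTDIR=out/release CXXFLAGS=-O2

objs: $(OBJS)

# Order-only prerequisite (after "|"): the directory must exist before
# compiling, but its timestamp never forces a rebuild.
$(OUTDIR)/%.o: %.cpp | $(OUTDIR)
	$(COMPILE.cpp) -o '$@' '$<'

$(OUTDIR):
	mkdir -p $@

.PHONY: variants debug release objs
```

No clean step is needed between the two variants, so the build-level serialization disappears.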

6. Monoliths

A monolith is a single build step or Makefile target that takes a disproportionately long time to complete:

It is easy to see how a monolith forces a serialization: until it completes, all other dependent jobs must wait.
There are several reasons for a single long-running step: the build may be configured to recurse over the entire
source tree updating dependencies, it may be required to do a large file copy or checkout [8], or it may need to
run a test or analysis program that requires a lot of processing time. By far, the most common monolith in an
embedded software build is a large link, usually responsible for building the final image.

Sometimes monoliths can be partitioned effectively for parallelization. Aggregating links into libraries that are
then combined into a final target can help distribute the work. In other cases, the monolith exists simply because
a portion of the build was designed with a different (typically homegrown) tool or language that does not lend
itself to parallelization; a port to Make would both help standardize maintenance and accelerate performance.

In many cases, however, monoliths cannot be partitioned. Here, the best practice is not to directly accelerate
the long-running step, but rather to be aware of its impact and to challenge its existence in the first place. Is
the long-running step actually required in every build, or can it be made conditional and executed only when
necessary? Can its output be cached and re-used? If it must run, can it be pushed to the end of the build, where
it will serialize fewer jobs? Undertaking the experiments to measure monoliths usually yields strategies to
mitigate their impact.
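
One common way to make a long-running step conditional is a stamp file, sketched below as a fragment. The run-analysis tool, the src/*.c inputs and the check target are hypothetical stand-ins for whatever the monolith actually is.

```make
# Hypothetical inputs to a long-running analysis step.
ANALYSIS_INPUTS := $(wildcard src/*.c)

# The stamp file records the last successful run. Make re-runs the step
# only when at least one input is newer than the stamp.
analysis.stamp: $(ANALYSIS_INPUTS)
	./run-analysis $^
	touch $@

# Convenience name for requesting the step explicitly: "make check".
check: analysis.stamp

.PHONY: check
```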

7. Bad Dependencies

The last and most difficult problem to address in parallel builds is missing or inaccurate dependencies. The
simplest real-world example usually looks like this:

                  all: foo.o myprogram

                  foo.o: foo.c
                         gcc -c foo.c -o foo.o

                  myprogram:
                         gcc foo.o -o myprogram

This will always build correctly in serial: Make ensures that prerequisites are processed left-to-right, and
schedules the foo.o target before myprogram. In parallel, however, the lack of a dependency between myprogram
and foo.o means that both the link and compile will be executed simultaneously. Depending on the timing
and the starting conditions, this build may build myprogram correctly, it may fail unexpectedly with the linker
complaining that the object does not exist, or, most insidiously, if the object already exists (as it would if this
were an incremental run) the program would appear to build correctly, but the executable would no longer match
the sources. This last condition is so frustrating and dangerous that many developers are wary of parallel builds
in general, fearing incorrect builds.
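
For this small example the repair is simply to state the dependency. A corrected sketch:

```make
all: myprogram

# The link now names its input, so Make cannot start it before foo.o is built,
# regardless of how many jobs run in parallel.
myprogram: foo.o
	gcc foo.o -o myprogram

foo.o: foo.c
	gcc -c foo.c -o foo.o

.PHONY: all
```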

Unfortunately, careful inspection in very large builds can only solve part of this problem. Certainly looking for
and fixing missing dependencies when build steps fail unexpectedly is best practice. A more rigorous approach is
to centralize all rules and macros, to ensure, for example, that it is impossible to invoke the linker without listing
all arguments as dependencies. Good examples of this kind of technique can be found in articles that address
automatic dependency generation [9]. Finally, ElectricAccelerator from Electric Cloud was designed explicitly to
efficiently solve the problem of missing dependencies in parallel builds by introducing a custom filesystem that
detects and corrects dependencies automatically.
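
As a minimal sketch of the centralized-rule and automatic dependency generation approach mentioned above: it assumes GNU Make with GCC or a compatible compiler that supports the -MMD and -MP options, and the file names are hypothetical. It is one widely used arrangement, not necessarily the one the cited articles describe.

```make
# Hypothetical source list.
SRCS := foo.c bar.c
OBJS := $(SRCS:.c=.o)
DEPS := $(OBJS:.o=.d)

# The link rule lists every object it consumes as a prerequisite,
# and passes exactly that list ($^) to the linker.
myprogram: $(OBJS)
	$(CC) $^ -o $@

# -MMD writes foo.d next to foo.o, listing the headers foo.c includes.
# -MP adds empty targets for those headers so a deleted header does not break the build.
%.o: %.c
	$(CC) $(CFLAGS) -MMD -MP -c $< -o $@

# Pull in the generated dependency files; the leading "-" ignores
# files that do not exist yet (for example, on the first build).
-include $(DEPS)
```

This covers header dependencies of compiles and the inputs of the link. It cannot detect dependencies created by arbitrary tools that read or write files Make was never told about.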

Slow software builds can have a serious impact on the productivity of an organization. Distributed parallel builds
are the only way to completely address the problems of exponential code growth, but their effectiveness is
marred by Make and build constructs that introduce serializations. This paper covered seven common problem
patterns that naturally occur when engineers are looking for simple solutions to complexity but neglect to
take the impact on parallelization into account. In almost all cases, there are alternative constructs that are
functionally equivalent but allow far greater parallelization. Looking for and correcting these problems in
embedded software build systems can dramatically reduce the software production cycle time, which in turn can
have a tangible impact on business productivity.

About Electric Cloud
Electric Cloud is the leading provider of software production management (SPM) solutions. Electric Cloud
solutions automate, accelerate and analyze software build-test-deploy processes to optimize both physical and
virtual IT environments. The company’s patented and award-winning products help development organizations
to speed time to market, boost developer productivity, and improve software quality. Leading companies across
a variety of industries, including semiconductors, enterprise IT, ISVs, mobile devices, and transactional Web
sites rely on Electric Cloud’s Software Production Management solutions to transform software production from
a liability to a competitive advantage. For customer inquiries please contact Electric Cloud at (408) 419-4300.

1.   GNU Make documentation
2.   ElectricInsight build analyzer
3.   Notes on declarative programming
4.   Windows Mobile build phases
5.   Side Effects in Declarative Languages
6.   Sun CC Template Repositories
7.   Microsoft Visual Studio PDB files
8.   Strictly speaking, we do not include source configuration checkout as part of the build time; some systems,
     however, are designed to pull or query additional files from the SCM system on build start that can lead to a
     startup monolith.
9.   Automatic Dependency Generation

© 2003-2010 Electric Cloud, Inc. All rights reserved. Electric Cloud, ElectricCommander, ElectricInsight, ElectricAccelerator and Electric Make are
registered trademarks of Electric Cloud. Other company and product names may be trademarks of their respective owners.

