Seven Deadly Sins of Slow Software Builds
Usman Muzaffar, VP Product Management | May 2010

I. The Problem of Slow Software Builds

Software builds are one of the most time-consuming, resource-intensive, and expensive phases of embedded system development. The very attributes of software that make it naturally appealing for embedded applications—trivially duplicated, easily modified, and readily composed from small parts into large products—have contributed to an exponential increase in size, complexity, and number of targets. This explosive growth has caused build times to jump correspondingly, decelerating the entire software production cycle. Without the ability to quickly rebuild the product, even the smallest changes force developers and testers to waste valuable time waiting. In many organizations, slow software builds are the primary obstacle to shipping more product of better quality on time.

In spite of this, and despite near-universal agreement that faster builds are always better, in practice little is done to directly address the problem. There are two fundamental reasons for the hesitation. First, the software build system is a critical piece of infrastructure; any interruption, even one made in the name of improvement, can be extremely costly to the organization. Too often, a preliminary analysis concludes that the only remedy is a complete rewrite of the existing system; the team then decides that undertaking a substantial infrastructure project is imprudent "at this time" and not worth the potential risk. Second, even when the mandate to improve build times has been escalated to a business priority, the technical options available are few and frequently inapplicable. Solutions to accelerate build times, both open source and commercial, can have an alarming number of restrictions on compatibility with operating systems, platforms, compilers, and toolchains.
The organic evolution of most build systems means they employ a wide variety of tools and methodologies; the apparent impedance mismatch between the prerequisites of an acceleration solution and the real-world specifics of the build system can make the problem appear still more intractable. What we have found, however, is that there are several ways build system speed can be addressed, and large gains can often be achieved in even the most complex environments with minimal change and disruption to the existing workflow.

It is worth reviewing how the industry has evolved solutions to the problem. Conceptually, there are three ways to improve a process that transforms inputs into outputs:

1. Start later
2. Finish sooner
3. Run faster

In a software build system, "starting later" simply means re-using as much output as possible from previous runs. This, of course, is precisely what tools like the classic Unix make facility and its most popular modern descendant, GNU Make 3.81 [1], are designed to do: by comparing modification times on input and output files, they re-run only the commands that update targets that are out of date with respect to their inputs. More generally, incremental builds are an instance of a larger strategy usually called build avoidance: by making sophisticated use of a pre-built object cache, systems can avoid compilation altogether.

Incremental builds and avoidance strategies are critical to individual developer workflows, but they have not seen widespread adoption among production teams tasked with building the full product. First, the speed of modern compilers is often superior to the lookup-and-retrieve mechanics of complicated cache systems; in these cases, paradoxically, it is faster to rebuild than to re-use. Second, avoidance systems are inherently unpredictable—depending on the nature of the change and the state of the object cache, build times can vary widely.
Finally, and most importantly, release teams are held strictly accountable for being able to rebuild any version they ship; a nondeterministic cache means they cannot guarantee their ability to recreate, bit-for-bit, the contents of a release.

"Finishing sooner" is a less useful concept for our purposes. It manifests most prominently in large systems that have a long-running nightly or continuous build that creates a large set of components, each with distinct owners. While the entire product cannot be considered built until the very last link step is complete, any individual library or subcomponent may be ready hours earlier. Designing a system that makes these intermediate outputs readily available to their owners can dramatically reduce effective cycle time.

By far the most important strategy for addressing long build times is to make the build process itself run faster. This can be accomplished to some extent simply by using faster hardware: newer processors, disks, and networks that push more bits through the cycle will yield correspondingly faster builds. In practice, however, large organizations quickly extract as much as they realistically can from hardware upgrades. What is really required to make substantial gains in performance, and what we will spend the rest of this paper examining, is making effective, scalable use of parallel builds.

Builds are an excellent candidate for parallel (and distributed) computation, because they comprise a large number of processes (typically, compiles and links) that are logically distinct. Engineers who first launch their builds against a multiprocessor system (or, more ambitiously, a distributed compute cluster) are almost universally disappointed to discover one or both of the following:

1. The build fails unexpectedly
2. The performance improvement is far less than expected

Underlying both of these conditions is a problem with dependencies.
Implicit or missing dependencies mean the parallel build has insufficient information to accurately order build steps. For example, a link consuming a compiled object may inadvertently run before the compile producing that object is complete. Conversely (and more evident in builds that are functional but slow), explicit or implicit over-serialization can force the system to use only one processing node even when more are available. The problems are analogous to those in multithreaded programming: without effective synchronization, the system is vulnerable to races and deadlocks.

Most build tools do not have a good facility for completely guaranteeing (or even specifying) safe parallel execution. Dependencies must be listed explicitly; for example, in Make syntax, prerequisites and targets are enumerated on a single line:

```make
foo.o: foo.c
	$(CC) -c foo.c -o foo.o
```

The system makes no requirements as to what can or cannot be listed as a prerequisite, nor does it demand that the commands actually update the target. The tremendous flexibility tools like Make give their authors means the opportunity for misuse is broad. In the next section, we will catalog the most common mistakes that impact parallel performance.

II. Common Problems Impacting Build Parallelization

For each problem, we will explore the situation that leads to the tempting antipattern, followed by real examples of the consequences. We then describe alternative strategies that are better suited to parallel (and thus faster) builds.

Build Plots as a Diagnostic Tool

Because software builds can easily have tens of thousands of steps (larger builds can approach a million compiles), it is extremely useful to have a facility to visually graph the behavior of a parallel build. The figure below shows a build plot generated by instrumenting the Make process [2]. The X-axis is time; each row on the Y-axis represents a compute node tasked with running some steps.
Each block on the graph is a job running on a compute node for some duration; different colors indicate different kinds of jobs (compiles, links, etc.). In an ideal build, the blocks are closely packed from start to finish on each compute node: wherever there is whitespace, an explicit or implicit serialization has forced that node to wait until its required dependencies are satisfied, negatively impacting performance.

1. Make on the Bottom

Unlike most imperative programming and scripting languages, Make is a declarative language [3]. Frustrations with Make syntax and logic frequently lead to the creation of a "wrapper script", typically written in a high-level language like Perl or Python, which iterates over projects or components and invokes Make for each one.

Moving the main iteration out of Make is problematic for two reasons. First, we have lost the implicit parallelism the Make rules afforded us. The Perl script's foreach loop is necessarily serial, synchronously forking each component's subprocess in turn. If we attempt to make it multithreaded, we must assume all the burden of synchronization, which can make maintenance much more difficult. Second, the Perl script gives us a tempting but dangerous facility to do component-specific pre- and post-processing outside of Make's control. Using it will lead to serialization and inefficient job packing, as a build plot from a Windows Mobile build [4] illustrates.

Fortunately, this is readily resolved by using the Perl script to do one-time setup and tear-down, then letting a top-level Makefile recursively invoke submakes for each component. By leveraging Make's pattern rule syntax, it is easy to preserve the data-driven brevity of the Perl script and maintain the list of subcomponents in a single location. We can also annotate explicit dependencies between components using normal Make syntax, and we are absolved of dealing with any parallelization or synchronization issues.
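The recursive-submake structure described here can be sketched as follows. This is a minimal illustration under assumed names, not the actual Makefile from any build discussed in this paper; the component names and the inter-component dependency are hypothetical:

```make
# Hypothetical component list; this is the single location where
# the set of subcomponents is maintained.
COMPONENTS := audio video network

.PHONY: all $(COMPONENTS)

# The top-level target depends on every component; with `make -j`,
# Make is free to build independent components in parallel.
all: $(COMPONENTS)

# One rule covers every component: recurse into its directory and
# let its own Makefile do the work.
$(COMPONENTS):
	$(MAKE) -C $@

# Explicit inter-component dependencies are annotated in normal
# Make syntax; here, video must wait for audio to finish.
video: audio
```

Running `make -j8` against this sketch lets Make schedule the audio and network submakes concurrently, while still serializing video behind audio—exactly the implicit parallelism that the Perl wrapper's foreach loop gives up.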
2. Targets with Side Effects

A common problem in build systems is the need to pass information between commands. For example, one target may need to compute a tool version or identification number that is consumed by another. Because Make (and similar tools) provide rich mechanics for macro and variable computation, it is tempting to simply use those facilities to calculate a value when ready and dereference it when necessary:

```make
all: a b

a:
	@do-stuff ...
	@$(eval VERSION=$(shell gcc -v 2>&1))

b:
	@echo "$(VERSION)"
```

This kind of usage is a side effect in a declarative language, because the variable (here, VERSION) is global system state that is mutated as a consequence of updating a Make target. There are good reasons to avoid side effects in general, as they make declarative systems much harder to understand and maintain [5]. For our purposes, the side effect is a particularly dangerous kind of implicit serialization, because Make has no visibility into its existence. Running a build with this construct in parallel will fail unpredictably.

The simplest way to address the situation is to introduce an explicit dependency between the target that creates the side effect and those that consume it (or, in very simple cases like the one above, merge the commands into a single target). While this will ensure a correct build, for a large number of targets it will again introduce crippling serializations. In that case, the side effect needs to be factored out of as many targets as possible, so that downstream consumer targets may execute safely in parallel.

3. Multiply Updated Files

The tools used in most build systems usually consume one or a few input files and produce a single output file. It is this monohierarchical nature that lends itself so well to parallelization. Some build tools, however, need to continually update a single file over multiple invocations.
Examples include the compiled template repository in the Sun Studio CC compiler [6], or the PDB debugging and symbol files produced by the Microsoft Visual Studio cl compiler [7]:

```make
%.obj: %.c
	cl /c /Fo'$@' /Fd'vc80.pdb' $<
```

Here, all compiles are implicitly serialized against writes to a single vc80.pdb file. The same pattern is seen in Makefile rules that update a single archive (for example, a ZIP or Jar file) as outputs are produced. Running this in parallel can lead to serious problems as multiple processes attempt to update the same file simultaneously. In the corresponding build plot, red bricks represent jobs identified as attempting to read or modify a file before a serially earlier job had finished writing it. That build was run with ElectricAccelerator, which detected this condition and remedied it by re-running the affected targets serially. While correct, the build lost any benefit of parallelization due to the multiply updated files.

The best solution to this problem is to leverage the fact that almost all tools that do incremental updates of a common file (including the compilers and archive tools noted above) are capable of merging partially built files. We can use this property to restructure the invocation into two phases: first, let as many individual targets as possible update a unique archive or database file in parallel; then, as a serialized final step, merge the individual files into the single destination. Using this construct as a guide, we can rewrite our Makefile rule as:

```make
%.obj: %.c
	cl /c /Fo'$@' /Fd'$@.pdb' $<
```

The compiler will now write foo.obj's symbols into foo.obj.pdb; when linking the final executable, the linker automatically knows where to retrieve the partial debug information.

4. Pass-Based Builds (and the myth of "cyclical dependencies")

Another frequently used paradigm is to structure the build such that it requires multiple passes over the entire hierarchy.
Typically, this strategy is employed because dependencies between compile and link, or link and packaging, were too complex to disentangle and enumerate. Sometimes the build is declared to have "cyclical dependencies", and not only does it make multiple phases (compile, link, etc.) over the source tree, it recurses into the very same directories with identical targets repeatedly:

```make
all: compile libs executables

compile:
	$(MAKE) -C srcs compile

libs:
	$(MAKE) -C srcs libs

executables:
	$(MAKE) -C srcs executables

compile-component-foo:
	$(MAKE) -C srcs/foo compile
```

This structure can lead to a tremendous amount of wasted work. Thousands of small "do-nothing" jobs consume both processing power and network bandwidth to determine that nothing needs to be done for targets Make has already updated. On a build plot, these jobs are so short and so closely packed together that they appear as black bands at the start of each make instance.

The solution is to avoid writing Makefile rules that recurse into the same directory multiple times during a single build. Rather than looping over the source tree repeatedly with a single directive on each iteration, restructure the build to loop once with multiple targets in each directory.

The claimed need to support "cyclical dependencies" is interesting, because all Make-based systems easily detect and reject Makefile constructs that explicitly contain cycles. The actual situation underlying what is usually called a cyclical dependency is a component A that is only partially built on a first pass. A second component, B, consumes the built part of A and itself produces a new output that the remainder of A requires. There is no cycle: rather, there are three components, A1, B, and A2, that have a simple serialization relationship between them. Modeling the relationship this way is both more accurate and more efficient than resorting to multiple passes over A.
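The A1 → B → A2 relationship just described can be modeled directly in Make. The following is a sketch with hypothetical directory and phase-target names (phase1/phase2 stand in for "the part of A built first" and "the remainder of A"):

```make
.PHONY: all a1 b a2

# A1: the part of component A that can be built on its own.
a1:
	$(MAKE) -C A phase1

# B consumes A1's output, so it must wait for a1.
b: a1
	$(MAKE) -C B all

# A2: the remainder of A, which requires B's output.
a2: b
	$(MAKE) -C A phase2

all: a2
```

There is no cycle here, only a three-link chain; under `make -j`, any components unrelated to this chain remain free to build in parallel, and no directory is visited twice with identical targets.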
5. Outputs in the Source Directory

Make's default compilation rules instruct the compiler to create object files in the same directory as the sources:

```make
%.o: %.c
	$(CC) -c $< -o $@
```

This is unfortunate, because it means that when the same set of sources is used to produce different outputs, the old objects must be deleted first. This situation is virtually assured in real-world projects: sources may need to be built with different preprocessor options, or different debug or optimization levels. The most obvious solution is to simply instruct make to clean the derived objects and then rebuild with the new options:

```make
all:
	$(MAKE) CFLAGS=-g
	$(MAKE) clean
	$(MAKE) CFLAGS=-O2
```

This construct introduces a build-level serialization, with a build plot similar to the "Make on the bottom" construct: all of the compilations in the optimized build are artificially serialized against the entire debug build. More generally, letting outputs build into the source directory necessarily means that the system can only support one build at a time. Once the project matures and requires independent builds along any number of axes (variants, versions, architectures), this can become a severe obstacle to efficient parallel builds.

The solution is to avoid Make's defaults and instead define rules that instruct the compiler to write outputs into architecture- or build-specific directories. By using GNU Make's filename manipulation functions, we can write a macro that transforms a list of sources in the current directory into targets in the output directory:

```make
objname = $(addprefix $(OUTDIR)/,$(notdir $(1:.cpp=.o)))

SRCS = main.cpp foo.cpp bar.cpp
OBJS = $(call objname,$(SRCS))

$(OUTDIR)/%.o: %.cpp
	$(COMPILE.cpp) -o '$@' '$<'
```

Now, simply by redefining the OUTDIR macro, multiple builds over the same source tree can proceed simultaneously. This method has the added benefit that cleaning the build tree is as simple as deleting the single OUTDIR directory.
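With outputs keyed off OUTDIR, independent variants can be driven from a small top-level Makefile. The following sketch assumes a hypothetical `target` goal in the sub-makefile and an `out/` directory layout, neither of which comes from this paper:

```make
.PHONY: all debug release clean

# Build both variants; with `make -j2` they proceed simultaneously,
# since each submake writes into its own output directory.
all: debug release

debug:
	$(MAKE) OUTDIR=out/debug CFLAGS=-g target

release:
	$(MAKE) OUTDIR=out/release CFLAGS=-O2 target

# Cleaning is just removing the output trees; the source tree is
# never polluted with derived objects.
clean:
	rm -rf out
```

Contrast this with the clean-and-rebuild construct above: here nothing is deleted between variants, and the debug and optimized compilations can interleave freely across all available compute nodes.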
6. Monoliths

A monolith is a single build step or Makefile target that takes a disproportionately long time to complete. It is easy to see how a monolith forces a serialization: until it completes, all dependent jobs must wait. There are several reasons a single step may run long: the build may be configured to recurse over the entire source tree updating dependencies, it may be required to do a large file copy or checkout [8], or it may need to run a test or analysis program that requires a lot of processing time. By far the most common monolith in an embedded software build is a large link, usually responsible for building the final image.

Sometimes monoliths can be partitioned effectively for parallelization. Aggregating links into libraries that are then combined into a final target can help distribute the work. In other cases, the monolith exists simply because a portion of the build was designed with a different (typically homegrown) tool or language that does not lend itself to parallelization; a port to Make would both standardize maintenance and accelerate performance.

In many cases, however, monoliths cannot be partitioned. Here, the best practice is not to directly accelerate the long-running step, but rather to be aware of its impact and to challenge its existence in the first place. Is the long-running step actually required in every build, or can it be made conditional and executed only when necessary? Can its output be cached and re-used? If it must run, can it be pushed to the end of the build, where it will serialize fewer jobs? Undertaking experiments to measure monoliths usually yields strategies to mitigate their impact.

7. Bad Dependencies

The last and most difficult problem to address in parallel builds is missing or inaccurate dependencies.
The simplest real-world example usually looks like this:

```make
all: foo.o myprogram

foo.o: foo.c
	gcc -c foo.c -o foo.o

myprogram:
	gcc foo.o -o myprogram
```

This will always build correctly in serial: Make processes prerequisites left to right, and so schedules the foo.o target before myprogram. In parallel, however, the lack of a dependency between myprogram and foo.o means that the link and the compile will be executed simultaneously. Depending on the timing and the starting conditions, this build may produce myprogram correctly; it may fail unexpectedly with the linker complaining that the object does not exist; or, most insidiously, if the object already exists (as it would on an incremental run), the program will appear to build correctly, but the executable will no longer match the sources. This last condition is so frustrating and dangerous that many developers are wary of parallel builds in general, fearing incorrect builds. The immediate fix is to declare the missing prerequisite explicitly, so that myprogram depends on foo.o.

Unfortunately, careful inspection can only solve part of this problem in very large builds. Looking for and fixing missing dependencies when build steps fail unexpectedly is certainly best practice. A more rigorous approach is to centralize all rules and macros, to ensure, for example, that it is impossible to invoke the linker without listing all of its arguments as dependencies. Good examples of this kind of technique can be found in articles that address automatic dependency generation [9]. Finally, ElectricAccelerator from Electric Cloud was designed explicitly to solve the problem of missing dependencies in parallel builds efficiently, by introducing a custom filesystem that detects and corrects dependencies automatically.

III. Conclusion

Slow software builds can have a serious impact on the productivity of an organization. Distributed parallel builds are the only way to fully address the problems of exponential code growth, but their effectiveness is marred by Make and build constructs that introduce serializations.
This paper covered seven common problem patterns that naturally occur when engineers look for simple solutions to complexity but neglect to take the impact on parallelization into account. In almost all cases, there are alternative constructs that are functionally equivalent but allow far greater parallelization. Looking for and correcting these problems in embedded software build systems can dramatically reduce software production cycle time, which in turn can have a tangible impact on business productivity.

About Electric Cloud

Electric Cloud is the leading provider of software production management (SPM) solutions. Electric Cloud solutions automate, accelerate and analyze software build-test-deploy processes to optimize both physical and virtual IT environments. The company's patented and award-winning products help development organizations to speed time to market, boost developer productivity, and improve software quality. Leading companies across a variety of industries, including semiconductors, enterprise IT, ISVs, mobile devices, and transactional Web sites, rely on Electric Cloud's Software Production Management solutions to transform software production from a liability to a competitive advantage. For customer inquiries please contact Electric Cloud at (408) 419-4300 or www.electric-cloud.com.

References

1. GNU Make documentation: http://www.gnu.org/software/make/manual/make.html
2. ElectricInsight build analyzer: http://www.electriccloud.com/products/electricinsight.php
3. Notes on declarative programming: http://en.wikipedia.org/wiki/Declarative_programming
4. Windows Mobile build phases: http://msdn.microsoft.com/en-us/library/aa448367.aspx
5. Side effects in declarative languages: http://en.wikipedia.org/wiki/Side_effect_%28computer_science%29
6. Sun CC template repositories: http://docs.sun.com/app/docs/doc/819-5267/bkagr?a=view
7. Microsoft Visual Studio PDB files: http://msdn.microsoft.com/en-us/library/yd4f8bd1%28VS.71%29.aspx
8. Strictly speaking, we do not include source configuration checkout as part of the build time; some systems, however, are designed to pull or query additional files from the SCM system at build start, which can lead to a startup monolith.
9. Automatic dependency generation: http://make.paulandlesley.org/autodep.html

© 2003-2010 Electric Cloud, Inc. All rights reserved. Electric Cloud, ElectricCommander, ElectricInsight, ElectricAccelerator and Electric Make are registered trademarks of Electric Cloud. Other company and product names may be trademarks of their respective owners.