
© 2003 Matt Pharr and Greg Humphreys

DRAFT (4 November 2003) — Do Not Distribute

Contents

1 Introduction
  1.1 Approaching the System
  1.2 Rendering and the Ray-Tracing Algorithm
  1.3 System Overview
  1.4 How To Proceed Through This Book
2 Geometry and Transformations
  2.1 Vectors
  2.2 Points
  2.3 Normals
  2.4 Rays
  2.5 Three-Dimensional Bounding Boxes
  2.6 Transformations
  2.7 Applying Transforms
  2.8 Differential Geometry
3 Shapes
  3.1 Basic Shape Interface
  3.2 Spheres
  3.3 Cylinders
  3.4 Disks
  3.5 Other Quadrics
  3.6 Triangles and Meshes
  3.7 ***ADV***: Subdivision Surfaces
4 Primitives and Intersection Acceleration
  4.1 Geometric Primitives
  4.2 Aggregates
  4.3 Grid Accelerator
  4.4 Kd-Tree Accelerator
5 Color and Radiometry
  5.1 Spectral Representation
  5.2 Basic Radiometry
  5.3 Working with Radiometric Integrals
  5.4 Surface Reflection
6 Camera Models
  6.1 Camera Model
  6.2 Projective Camera Models
  6.3 Environment Camera
7 Sampling and Reconstruction
  7.1 Fourier Theory
  7.2 Sampling Theory
  7.3 Image Sampling Interface
  7.4 Stratified Sampling
  7.5 ***ADV***: Low-Discrepancy Sequences
  7.6 ***ADV***: Best-Candidate Sampling Patterns
  7.7 Image Reconstruction
8 Film and the Imaging Pipeline
  8.1 Film Interface
  8.2 Image Film
  8.3 ***ADV***: Perceptual Issues and Tone Mapping
  8.4 Device RGB Conversion and Output
9 Reflection Models
  9.1 Basic Interface
  9.2 Specular Reflection and Transmission
  9.3 Lambertian Reflection
  9.4 Microfacet Models
  9.5 Lafortune Model
  9.6 Fresnel Incidence Effects
10 Materials
  10.1 BSDFs
  10.2 Material Interface and Bump Mapping
  10.3 Matte
  10.4 Plastic
  10.5 Translucent
  10.6 Glass
  10.7 Mirror
  10.8 Shiny Metal
  10.9 Diffuse Substrate
  10.10 Measured Data
  10.11 Uber Material
11 Texture
  11.1 Texture Interface and Basic Textures
  11.2 Sampling and Anti-Aliasing
  11.3 Texture Coordinate Generation
  11.4 Interpolated Textures
  11.5 Image Texture
  11.6 Solid and Procedural Texturing
  11.7 Noise
12 ***ADV***: Volume Scattering
  12.1 ***ADV***: Volume Scattering Processes
  12.2 ***ADV***: Phase Functions
  12.3 ***ADV***: Volume Interface and Homogeneous Volumes
  12.4 ***ADV***: Varying-Density Volumes
  12.5 ***ADV***: Volume Aggregates
13 Light Sources
  13.1 Light Interface
  13.2 Point Lights
  13.3 Distant Lights
  13.4 Area Lights
  13.5 Infinite Area Lights
14 Monte Carlo Integration: Basic Concepts
  14.1 Background and Probability Review
  14.2 The Monte Carlo Estimator
  14.3 The Inversion Method for Sampling Random Variables
  14.4 Transforming Between Different Distribution Functions
  14.5 The Rejection Method
  14.6 Transformation in Multiple Dimensions
  14.7 2D Sampling with Multi-Dimensional Transformation
15 Monte Carlo Integration II: Variance Reduction
  15.1 Analytic Integration Techniques
  15.2 Careful Sample Placement
  15.3 Sampling Reflection Functions
  15.4 Sampling Light Sources
  15.5 Sampling Volume Scattering
  15.6 Russian Roulette
16 Light Transport
  16.1 Direct Lighting
  16.2 The Light Transport Equation
  16.3 Path Tracing
  16.4 ***ADV***: Bidirectional Path Tracing
  16.5 Irradiance Caching
  16.6 Particle Tracing and Photon Mapping
  16.7 ***ADV***: Volume Integration
17 Summary and Conclusion
  17.1 Design Retrospective
  17.2 Major Projects
A Utilities
  A.1 The C++ Standard Library
  A.2 Communicating with the User
  A.3 Memory Management
  A.4 Mathematical Routines
  A.5 Octrees
  A.6 Kd-Trees
  A.7 Image Input Output
  A.8 Main Include File
B Scene Description Interface
  B.1 Parameter Sets
  B.2 Global Options
  B.3 Scene Definition
  B.4 Scene Object Creation
C Input File Format
  C.1 Parameter Lists
  C.2 Statement Types
  C.3 Standard Plug-ins
D Dynamic Object Creation
  D.1 Reading Dynamic Libraries
  D.2 Object Creation Functions
E Index of Classes
F Index of Non-Classes
G Index of Members 1
H Index of Members 2
I Index of Code Chunks

[Just as] other information should be available to those who want to learn and understand, program source code is the only means for programmers to learn the art from their predecessors. It would be unthinkable for playwrights not to allow other playwrights to read their plays [and] only be present at theater performances where they would be barred even from taking notes. Likewise, any good author is well read, as every child who learns to write will read hundreds of times more than it writes. Programmers, however, are expected to invent the alphabet and learn to write long novels all on their own. Programming cannot grow and learn unless the next generation of programmers have access to the knowledge and information gathered by other programmers before them.
— Erik Naggum

Preface

Rendering is a fundamental component of computer graphics. At the highest level of abstraction, rendering describes the process of converting a description of a three-dimensional scene into an image. Algorithms for animation, geometric modeling, texturing, and other areas of computer graphics all must feed their results through some sort of rendering process so that the results of their work are made visible in an image. Rendering has become ubiquitous; from movies to games and beyond, it has opened new frontiers for creative expression, entertainment, and visualization.

In the early years of the field, research in rendering focused on solving fundamental problems such as determining which objects are visible from a given viewpoint. As these problems have been solved and as richer and more realistic scene descriptions have become available, modern rendering has grown to be built on ideas from a broad range of disciplines, including physics, astrophysics, astronomy, biology, psychology and the study of perception, and pure and applied mathematics. This interdisciplinary nature is one of the reasons rendering is such a fascinating area to study.

This book presents a selection of modern rendering algorithms through the documented source code for a complete rendering system. All of the images in this book, including the ones on the front and back covers, were rendered by this software. The system, lrt, is written using a programming methodology called literate programming that mixes prose describing the system with the source code that implements it. We believe that the literate programming approach is a valuable way to introduce ideas in computer science and computer graphics.
Often, some of the subtleties of an algorithm can be missed until it is implemented; seeing someone else's implementation is a good way to acquire a solid understanding of an algorithm's details. Indeed, we believe that deep understanding of a smaller number of algorithms provides a stronger base for further study of graphics than a superficial understanding of many.

Not only does reading an implementation help clarify how an algorithm is implemented in practice, but by showing these algorithms in the context of a complete and non-trivial software system we are also able to address issues in the design and implementation of medium-sized rendering systems. The design of the basic abstractions and interfaces of such a system has substantial implications for how cleanly algorithms can be expressed in it, as well as for how well it can support later addition of new techniques, yet the trade-offs in this design space are rarely discussed.

lrt and this book focus exclusively on so-called photorealistic rendering, which can be defined variously as the task of generating images that are indistinguishable from those a camera would capture photographing the scene, or as the task of generating an image that, when displayed, evokes the same response from a human observer as if the viewer were looking at the actual scene. There are many reasons to focus on photorealism. Photorealistic images are necessary for much of the rendering done by the movie special effects industry, where computer-generated imagery must be mixed seamlessly with footage of the real world. For other entertainment applications where all of the imagery is synthetic, photorealism is an effective tool to make the observer forget that he or she is looking at an environment that may not actually exist. Finally, photorealism gives us a reasonably well-defined metric for evaluating the quality of the rendering system's output.

A consequence of our approach is that this book and the system it describes do not exhaustively cover the state of the art in rendering; many interesting topics in photorealistic rendering will not be covered, either because they didn't fit well with the architecture of the software system (e.g., finite-element radiosity algorithms) or because we believed that the pedagogical value of explaining the algorithm was outweighed by the complexity of its implementation (e.g., Metropolis light transport). We will note these decisions as they come up and provide pointers to further resources so the reader can follow up on topics that are of interest. Many other areas of rendering, such as interactive rendering, visualization, and illustrative forms of rendering (e.g., pen-and-ink styles), aren't covered in this book at all.

Audience

Our primary intended audience is students in upper-level undergraduate or graduate-level computer graphics classes. This book assumes existing knowledge of computer graphics at the level of an introductory college-level course, though certain key concepts from such a course, such as basic vector geometry and transformations, will be presented again here. For students who do not have experience with programs that have tens of thousands of lines of source code, the literate programming style gives a gentle introduction to this complexity. We have paid special attention to explaining the reasoning behind some of the key interfaces and abstractions in the system in order to give these readers a sense of why the system was structured the way that it was.
Our secondary, but equally important, audiences are advanced graduate students and researchers, software developers in industry, and individuals interested in the fun of writing their own rendering systems. Though many of the ideas in this manuscript will likely be familiar to these readers, reading explanations of the algorithms we describe in the literate style may provide new perspectives. lrt also includes implementations of a number of newer and/or difficult-to-implement algorithms and techniques, including subdivision surfaces, Monte Carlo light transport, and volumetric scattering models; these should be of particular interest even to experienced practitioners in rendering. We hope that it will also be useful for this audience to see one way to organize a complete non-trivial rendering system.

Overview and Goals

lrt is based on the ray tracing algorithm. Ray tracing is an elegant technique that has its origins in lens-making; Gauss traced rays through lenses by hand in the 1800s. Ray tracing algorithms on computers follow the path of infinitesimal rays of light through the scene up to the first surface that they intersect. This gives a very basic method for finding the first visible object as seen from any particular position and direction, and it is the basis for many rendering algorithms.

lrt was designed and implemented with three main goals in mind: it should be complete, it should be illustrative, and it should be physically based.

Completeness implies that the system should not lack important features found in high-quality commercial rendering systems. In particular, it means that important practical issues, such as anti-aliasing, robustness, and the ability to efficiently render complex scenes, should be addressed thoroughly. It is important to face these issues from the start of the system's design, since it can be quite difficult to retrofit such functionality into a rendering system after it has been implemented; these features can have subtle implications for all components of the system.

Our second goal means that we tried to choose algorithms, data structures, and rendering techniques with care. Since their implementations will be examined by more readers than those in most rendering systems, we tried to select the most elegant algorithms that we were aware of and implement them as well as possible. This goal also implied that the system should be small enough for a single person to understand completely. We have implemented lrt with a plug-in architecture, with a core of basic glue that pushes as much functionality as possible out to external modules. The result is that one doesn't need to understand all of the various plug-ins in order to understand the basic structure of the system. This makes it easier to delve deeply into parts of interest and skip others, without losing sight of how the overall system fits together.

There is a tension between the goals of being both complete and illustrative. Implementing and describing every useful technique that would be found in a production rendering system would not only make this book extremely long, but would make the system more complex than most readers would be interested in. In cases where lrt lacks such a useful feature, we have attempted to design the architecture so that the feature could be added without altering the overall system design. Exercises at the end of each chapter suggest programming projects that add new features to the system.
The basic foundations for physically-based rendering are the laws of physics and their mathematical expression. lrt was designed to use the correct physical units and concepts for the quantities that it computes and the algorithms it is built from. When configured to do so, lrt can compute images that are physically correct; they accurately reflect the lighting as it would be in a real-world scene corresponding to the one given to the renderer. One advantage of the decision to use a physical basis is that it gives a concrete standard of program correctness: for simple scenes, where the expected result can be computed in closed form, if lrt doesn't compute the same result, we know that it must have a bug. Similarly, if different physically-based lighting algorithms in lrt give different results for the same scene, or if lrt doesn't give the same results as another physically-based renderer, there is certainly an error in one of them. Finally, we believe that this physically-based approach to rendering is valuable because it is rigorous. When it is not clear how a particular computation should be performed, physics gives an answer that guarantees a consistent result.

Efficiency was secondary to these three goals. Since rendering systems often run for many minutes or hours in the course of generating an image, efficiency is clearly important. However, we have mostly confined ourselves to algorithmic efficiency rather than low-level code optimization. In some cases, obvious micro-optimizations take a back seat to clear, well-organized code, though we did make some effort to optimize the parts of the system where most of the computation occurs. For this reason, as well as for portability, lrt is not presented as a parallel or multi-threaded application, although parallelizing lrt would not be very difficult.

In the course of presenting lrt and discussing its implementation, we hope to convey some hard-learned lessons from some years of rendering research and development. There is more to writing a good renderer than stringing together a set of fast algorithms; making the system both flexible and robust is the hard part. The system's performance must degrade gracefully as more geometry is added to it, as more light sources are added, or as any of the other axes of complexity are pushed. Numeric stability must be handled carefully; stable algorithms that don't waste floating-point precision are critical. The rewards for going through the process of developing a rendering system that addresses all of these issues are enormous: writing a new renderer, or adding a new feature to an existing renderer, and using it to create an image that couldn't be generated before is a great pleasure. Our most fundamental goal in writing this book was to bring this opportunity to a wider audience.

You are encouraged to use the system to render the example scenes on the companion CD as you progress through the book. Exercises at the end of each chapter suggest modifications to the system that will help you better understand its inner workings, as well as more complex projects to extend the system with new features. We have also created a web site to go with this book, located at www.pharr.org/lrt. There you will find errata and bug fixes, updates to lrt's source code, additional scenes to render, supplemental utilities, and new plug-in modules. If you come across a bug in lrt or an error in this text that is not listed at the web site, please report it to the e-mail address lrtbugs@pharr.org.
Additional Reading

Donald Knuth's article "Literate Programming" (Knuth 1984) describes the main ideas behind literate programming as well as his web programming environment. The seminal TeX typesetting system was written with this system and has been published as a series of books (Knuth 1993a; Knuth 1986). More recently, Knuth has published a collection of graph algorithms in The Stanford GraphBase (Knuth 1993b). These programs are enjoyable to read and are, respectively, excellent presentations of modern automatic typesetting and graph algorithms. The web site www.literateprogramming.com has pointers to many articles about literate programming, literate programs to download, and a variety of literate programming systems; many refinements have been made since Knuth's original development of the idea. The only other literate program we are aware of that has been published as a book is the implementation of the lcc C compiler, which was written by Fraser and Hanson and published as A Retargetable C Compiler: Design and Implementation (Fraser and Hanson 1995).

1 Introduction

This chapter provides a high-level, top-down description of lrt's basic architecture. It starts by explaining more about the literate programming approach and how to read a literate program. We then briefly describe our coding conventions before moving on to the high-level operation of lrt, where we describe what happens during rendering by walking through the process of how lrt computes the color at a single point on the image. Along the way we introduce some of the major classes and interfaces in the system. Subsequent chapters will describe these and other classes and their methods in detail.

1.1 Approaching the System

1.1.1 Literate Programming

In the course of the development of the TeX typesetting system, Donald Knuth developed a new programming methodology based on the simple (but revolutionary) idea that programs should be written more for people's consumption than for computers' consumption. He named this methodology literate programming. This book (including the chapter you're reading now) is a long literate program.

Literate programs are written in a meta-language that mixes a document formatting language (e.g., LaTeX or HTML) and a programming language (e.g., C++). The meta-language compiler can then transform the literate program either into a document suitable for typesetting (this process is generally called weaving, since Knuth's original literate programming environment was called web), or into source code suitable for compilation (so-called tangling, since the resulting source code is not generally as comprehensible to a human reader as the original literate program was).

The literate programming meta-language provides two important features. The first is a set of mechanisms for mixing English text with source code. This makes the description of the program just as important as its actual source code, encouraging careful design and documentation on the part of the programmer. Second, the language provides mechanisms for presenting the program code to the reader in an entirely different order than it is supplied to the compiler. This feature makes it possible to describe the operation of the program in a very logical manner.
Knuth named his literate programming system web since literate programs tend to have the form of a web: various pieces are defined and interrelated in a variety of ways, such that programs are written in a structure that is neither top-down nor bottom-up.

As a simple example, consider a function InitGlobals() that is responsible for initializing all of the program's global variables. If all of the variable initializations are presented to the reader at once, InitGlobals() might be a large collection of variable assignments, the meanings of which are unclear because they do not appear anywhere near the definition or use of the variables. A reader would need to search through the rest of the entire program to see where each particular variable was declared in order to understand the function and the meanings of the values it assigned to the variables. As far as the human reader is concerned, it would be better to present the initialization code near the code that actually declares and uses the global. In a literate program, then, one can instead write InitGlobals() like this:

    ⟨Function Definitions⟩ ≡
    void InitGlobals() {
        ⟨Initialize Global Variables⟩
    }

Here we have added text to a fragment called ⟨Function Definitions⟩. (This fragment will be included in a C++ source code file when the literate program is tangled for the compiler.) The fragment contains the definition of the InitGlobals() function. The InitGlobals() function itself includes another fragment, ⟨Initialize Global Variables⟩. At this point, no text has been added to the initialization fragment. However, when we introduce a new global variable ErrorCount somewhere later in the program, we can now write:

    ⟨Initialize Global Variables⟩ ≡
    ErrorCount = 0;

Here we have started to define the contents of ⟨Initialize Global Variables⟩. When our literate program is turned into source code suitable for compiling, the literate programming system will substitute the code ErrorCount = 0; inside the definition of the InitGlobals() function. Later on, we may introduce another global, FragmentsProcessed, and we can append it to the fragment:

    ⟨Initialize Global Variables⟩ +≡
    FragmentsProcessed = 0;

The +≡ symbol after the fragment name shows that we have added to a previously defined fragment. When tangled, the result of the above fragment definitions is the code:

    void InitGlobals() {
        ErrorCount = 0;
        FragmentsProcessed = 0;
    }

By making use of the text substitution that is made easy by fragments, we can decompose complex functions into logically distinct parts. This can make their operation substantially easier to understand. We can write a function as a series of fragments:

    ⟨Function Definitions⟩ +≡
    void func(int x, int y, double *data) {
        ⟨Check validity of arguments⟩
        if (x < y) {
            ⟨Swap parameter values⟩
        }
        ⟨Do precomputation before loop⟩
        ⟨Loop through and update data array⟩
    }

The text of each fragment is then expanded inline in func() for the compiler. In the document, we can introduce each fragment and its implementation in turn; these fragments may of course include additional fragments, and so on. This style of decomposition lets us write code in collections of just a handful of lines at a time, making it easier to understand in detail.
Another advantage of this style of programming is that by separating the function into logical fragments, each with a single and well-delineated purpose, each one can then be written and verified independently. In general, we will try to make each fragment less than ten or so lines of code, making it easier to understand its operation.

Of course, inline functions could be used to similar effect in a traditional programming environment, but using fragments to decompose functions has a few important advantages. The first is that all of the fragments can immediately refer to all of the parameters of the original function as well as any function-local variables that are declared in preceding fragments; it's not necessary to pass them all as parameters, as would need to be done with inline functions. Another advantage is that one generally names fragments with more descriptive and longer phrases than one gives to functions; this improves program readability and understandability. Because it's so easy to use fragments to decompose complex functions, one does more decomposition in practice, leading to clearer code.

In some sense, the literate programming language is just an enhanced macro substitution language tuned to the task of rearranging program source code provided by the user. The simplicity of this task can belie how different literate programming is from other ways of structuring software systems.

1.1.2 Coding Conventions

We have written lrt in C++. However, we have used a subset of the language, both to make the code easier to understand and to improve the system's portability. In particular, we have avoided multiple inheritance and run-time exception handling, and have used only a subset of C++'s extensive standard library. Appendix A.1 reviews the parts of the standard library that lrt uses in multiple places; otherwise we will point out and document unusual library routines as they are used.

Types, objects, functions, and variables are named to indicate their scope; classes and functions that have global scope all start with capital letters. (The system uses no global variables.) The names of small utility classes, module-local static variables, and private member functions start with lower-case letters.

We will occasionally omit short sections of lrt's source code from this document. For example, when there are a number of cases to be handled, all with nearly identical code, we will present one case and note that the code for the remaining cases has been elided from the text.

1.1.3 Code Optimization

As mentioned in the preface, we have tried to make lrt efficient through the use of well-chosen algorithms rather than through many low-level optimizations. However, we have used a profiler to find which parts of the system account for most of the execution time and have performed local optimization of those parts when doing so didn't make the code confusing. We kept a handful of basic optimization principles in mind while doing so:

- On current CPU architectures, the slowest mathematical operations are divides, square roots, and trigonometric functions. Addition, subtraction, and multiplication are generally ten to fifty times faster than those operations. Code changes that reduce the number of the slower mathematical operations can help performance substantially; for example, replacing a series of divides by a value v with computing the value 1/v once and then multiplying by that value (see the sketch after this list).
- Declaring short functions as inline can speed up code substantially, both by removing the run-time overhead of performing a function call (which may involve saving values in registers to memory) and by giving the compiler larger basic blocks to optimize.
- As the speed of CPUs continues to grow more quickly than the speed at which data can be loaded from main memory into the CPU, waiting for values from memory is becoming a major performance barrier. Organizing algorithms and data structures in ways that give good performance from memory caches can speed up program execution much more than reducing the total number of instructions to be executed. Appendix ?? discusses general principles for memory-efficient programming; these ideas are mostly applied in the ray-intersection acceleration structures of Chapter 4 and the image map representation in Section 11.5.2, though they influence many of the design decisions throughout the system.
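As a concrete illustration of the first of these principles, the following small sketch shows the divide-to-multiply rewrite; this is illustrative code, not a fragment from lrt, and the function names are hypothetical:

    // Hypothetical example of the divide-to-multiply rewrite.
    // Naive version: performs n (slow) divides.
    void ScaleNaive(float *vals, int n, float divisor) {
        for (int i = 0; i < n; ++i)
            vals[i] /= divisor;
    }
    // Rewritten version: one divide, then n (fast) multiplies.
    void ScaleFast(float *vals, int n, float divisor) {
        float invDivisor = 1.f / divisor;
        for (int i = 0; i < n; ++i)
            vals[i] *= invDivisor;
    }

Note that with floating-point arithmetic the two versions can differ in the last bits of precision, so this rewrite is only appropriate where that difference doesn't matter.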
1.1.4 Indexing and Cross-Referencing

There are a number of features of the text designed to make it easier to navigate. Indices in the page margins give the page number where the functions, variables, and methods used in the code on that page are defined (if they are not defined on the current or facing page). This makes it easier to refer back to their definitions and descriptions, especially when the book isn't read front-to-back. Indices at the end of the book collect all of these identifiers so that it's possible to find definitions starting from their names. Another index at the end collects all of the fragments and lists the page each was defined on and the pages where it was used.

1.2 Rendering and the Ray-Tracing Algorithm

What it is, why we're doing it, why you care.

1.3 System Overview

lrt is written using a plug-in architecture. The lrt executable consists of the core code that drives the main flow of control of the system, but it has no implementation of specific shape or light representations, and so on. All of its code is written in terms of the abstract base classes that define the interfaces to the plug-in types. At run-time, code modules are loaded to provide the specific implementations of these base classes needed for the scene being rendered. This method of organization makes it easy to extend the system; substantial new functionality can be added just by writing a new plug-in. We have tried to define the interfaces to the various plug-in types so that they make it possible to write many interesting and useful extensions. Of course, it's impossible to foresee all of the ways that a developer might want to extend the system, so more far-reaching projects may require modifications to the core system.

The source code to lrt is distributed across a small directory hierarchy. All of the code for the lrt executable is in the core/ directory. lrt supports twelve different types of plug-ins, summarized in the table in Figure 1.1, which lists the abstract base classes for the plug-in types, the directory in which the implementations of these types that we provide are stored, and a reference to the section where each interface is first defined. Low-level details of the routines that load these modules are discussed in Appendix D.1.

    Base Class           Directory       Section
    Shape                shapes/         3.1
    Primitive            accelerators/   4.1
    Camera               cameras/        6.1
    Film                 film/           8.1
    Filter               filters/        7.7
    Sampler              samplers/       7.3
    ToneMap              tonemaps/       8.3
    Material             materials/      10.2
    Light                lights/         13.1
    SurfaceIntegrator    integrators/    16
    VolumeIntegrator     integrators/    16
    VolumeRegion         volumes/        12.3

Figure 1.1: lrt supports twelve types of plug-in objects that are loaded at runtime based on which implementations of them are named in the scene description file. The system can be extended with new plug-ins, without needing to be recompiled itself.
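To give a feel for what writing the core against abstract base classes looks like, here is a schematic sketch of such a plug-in interface; it is not lrt's actual declarations (the real Shape interface appears in Section 3.1, and the loading machinery in Appendix D.1), and the factory function is a hypothetical example of the kind of entry point a dynamically loaded module might export:

    class Ray; class Intersection; class ParamSet;  // defined elsewhere in lrt

    // The core system is written against abstract interfaces like this one;
    // it never needs to know which concrete shape classes exist.
    class Shape {
    public:
        virtual ~Shape() { }
        // Does the given ray hit this shape? (Signature is illustrative only.)
        virtual bool Intersect(const Ray &ray, Intersection *isect) const = 0;
    };

    // A shape plug-in module could export a C-linkage factory function that
    // the core looks up at run-time (e.g., via dlopen()/dlsym() on UNIX-like
    // systems) to create instances of its concrete Shape subclass.
    extern "C" Shape *CreateShape(const ParamSet &params);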
1.3.1 Phases of Execution

lrt has three main phases of execution. First, it reads in the scene description text file provided by the user. This file specifies the geometric shapes that make up the scene, their material properties, the lights that illuminate them, where the virtual camera is positioned in the scene, and parameters to all of the other algorithms that configure the renderer's behavior. Each statement in the input file has a direct mapping to one of the routines in Appendix B that comprise the interface that lrt provides to allow the scene to be described. A number of example scenes are provided in the examples/ directory in the lrt distribution, and Appendix C has a reference guide to the scene description format.

Once the scene has been specified, the main rendering loop begins. This is the second main phase of execution, and the one where lrt usually spends the majority of its running time. Most of the chapters in this book describe code that will execute during this phase. This step is managed by the Scene::Render() method, which will be the focus of Section 1.3.3. lrt uses ray tracing algorithms to determine which objects are visible at particular sample points on the image plane, as well as how much light those objects reflect back to the image. Computing the light arriving at many points on the image plane gives us a representation of the image of the scene.

Finally, once the second phase has finished computing the image sample contributions, the third phase of execution handles post-processing the image before it is written to disk (for example, mapping pixel values to the range [0, 255] if necessary for the image file format being used). Statistics about the various rendering algorithms used by the system are then printed, and the data for the scene description in memory is de-allocated. The renderer will then resume processing statements from the scene description file until no more remain, allowing the user to specify another scene to be rendered if desired.

The cornerstone of the techniques used to do this is the ray tracing algorithm. Ray tracing algorithms take a geometric representation of a scene and a ray, which can be described by its 3D origin and direction. There are two main tasks that ray tracing algorithms perform: to determine the first geometric object that is visible along a ray, and to determine whether any geometric objects intersect a ray. The first task is useful for solving the hidden-surface problem: if at each pixel we trace a ray into the scene to find the closest object hit by a ray starting from that pixel, we have found the first visible object in the pixel. The second task can be used for shadow computations: if no other object is between a point in the scene and a point on a light source, then illumination from the light source reaches the receiving point; otherwise, it is in shadow. Figure 1.2 illustrates both of these ideas.

Figure 1.2: Basic ray tracing algorithm: given a ray starting from the image plane, the first visible object at that point can be found by determining which object first intersects the ray. Furthermore, visibility tests between a point on a surface and a light source can also be performed with ray tracing, giving an accurate method for computing shadows.
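These two queries correspond directly to the Scene::Intersect() and Scene::IntersectP() methods that will be introduced in Section 1.3.4. The following few lines of illustrative caller code sketch how a renderer typically uses them together; the construction of shadowRay from the hit point is assumed:

    Intersection isect;
    // Closest-hit query: find the first surface visible along the ray,
    // filling in isect with information about the intersection point.
    if (scene->Intersect(ray, &isect)) {
        // Any-hit query: shadowRay is assumed to span the segment from the
        // hit point to a point on a light source.
        bool blocked = scene->IntersectP(shadowRay);
        // If not blocked, the light source illuminates the hit point.
    }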
The ability to quickly perform exact visibility tests between arbitrary points in the scene, even in complex scenes, opens the door to many sophisticated rendering algorithms based on these queries. Because ray tracing only requires that a particular shape representation be able to determine if a ray has intersected it (and if so, at what distance along the ray the intersection occurred), a wide variety of geometric representations can naturally be used with this approach.

1.3.2 Scene Representation

The main() function of the program is in the core/lrt.cpp file. It uses the system-wide header lrt.h, which defines widely useful types, classes, and functions, and api.h, which defines routines related to processing the scene description.

    ⟨lrt.cpp*⟩ ≡
    #include "lrt.h"
    #include "api.h"
    ⟨main program⟩

lrt's main() function is pretty simple; after calling lrtInit(), which does system-wide initialization, it parses the scene input files specified by the filenames given as command-line arguments, leading to the creation of a Scene object that holds representations of all of the objects that describe the scene, and to the rendering of an image of the scene. After rendering is done, lrtCleanup() does final cleanup before the system exits.

    ⟨main program⟩ ≡
    int main(int argc, char *argv[]) {
        ⟨Print welcome banner⟩
        lrtInit();
        ⟨Process scene description⟩
        lrtCleanup();
        return 0;
    }

If the user ran lrt with no command-line arguments, then the scene description is read from standard input. Otherwise, we loop through the command-line arguments, processing each input filename in turn. No other command-line arguments are supported.

    ⟨Process scene description⟩ ≡
    if (argc == 1) {
        ⟨Parse scene from standard input⟩
    } else {
        ⟨Parse scene from input files⟩
    }

The ParseFile() function parses a text scene description file, either from standard input or from a file on disk; it returns false if it was unable to open the file. The mechanics of parsing scene description files will not be described in this book (it is done with straightforward lex and yacc files).

    ⟨Parse scene from standard input⟩ ≡
    ParseFile("-");

If a particular input file can't be opened, the Error() routine reports this information to the user. Error() is like the printf() function in that it first takes a format string that can include escape codes like %s, %d, %f, etc., which have values supplied for them via a variable argument list after the format string.

    ⟨Parse scene from input files⟩ ≡
    for (int i = 1; i < argc; i++)
        if (!ParseFile(argv[i]))
            Error("Couldn't open scene description file \"%s\"\n", argv[i]);

As the scene file is parsed, objects are created that represent the camera, lights, and the geometric primitives in the scene. Along with other objects that manage other parts of the rendering process, these are all collected together in the Scene object, which is allocated by the GraphicsOptions::MakeScene() method in Section B.4.
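For concreteness, here is a small, hypothetical scene description of the sort that ParseFile() consumes. The statement and parameter names below are illustrative assumptions in the spirit of the format; Appendix C is the authoritative reference:

    # Hypothetical lrt input file: one sphere lit by a single point light.
    LookAt 0 0 5   0 0 0   0 1 0      # camera position, look-at point, up vector
    Camera "perspective" "float fov" [45]
    WorldBegin
    LightSource "point" "point from" [2 2 4]
    Shape "sphere" "float radius" [1]
    WorldEnd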
The Scene class is declared in core/scene.h and defined in core/scene.cpp.

    ⟨Scene Declarations⟩ ≡
    class Scene {
    public:
        ⟨Scene Public Methods⟩
        ⟨Scene Data⟩
    };

We don't include the implementation of the Scene constructor here; it mostly just copies the pointers to the objects that were passed into it.

Each geometric object in the scene is represented by a Primitive, which collects a lower-level Shape that strictly specifies its geometry and a Material that describes how light is reflected at points on the surface of the object (e.g., the object's color, whether it has a dull or glossy finish, etc.). All of these geometric primitives are collected into a single aggregate Primitive, aggregate, that stores them in a 3D data structure that makes ray tracing faster by substantially reducing the number of unnecessary ray intersection tests.

    ⟨Scene Data⟩ ≡
    Primitive *aggregate;

Each light source in the scene is represented by a Light object. The shape of a light and the distribution of light that it emits have a substantial effect on the illumination it casts into the scene. lrt supports a single global light list that holds all of the lights in the scene using the vector class from the standard library. While some renderers support light lists that are specified per geometric object, allowing some lights to illuminate only some of the objects in the scene, this idea doesn't map well to the physically-based rendering approach taken in lrt, so we only have this global list.

    ⟨Scene Data⟩ +≡
    vector<Light *> lights;

The camera object controls the viewing and lens parameters such as camera position and orientation and field of view. A Film member variable inside the camera class handles image storage. The Camera and Film classes are described in Chapters 6 and 8, respectively. After the image has been computed, a sequence of imaging operations is applied by the film to make adjustments to the image before writing it to disk.

    ⟨Scene Data⟩ +≡
    Camera *camera;

The scene also holds a description of any participating media, such as fog or smoke, in its volumeRegion member; the VolumeRegion class is discussed with the integrators below.

    ⟨Scene Data⟩ +≡
    VolumeRegion *volumeRegion;

Integrators handle the task of simulating the propagation of light in the scene from the light sources to the primitives in order to compute how much light arrives at the film plane at image sample positions. Their name comes from the fact that their task is to evaluate the value of an integral equation that describes the distribution of light in an environment. SurfaceIntegrators compute reflected light from geometric surfaces, while VolumeIntegrators handle the scattering from participating media: particles like fog or smoke in the environment that interact with light. The properties and distribution of the participating media are described by VolumeRegion objects, which are defined in Chapter 12. Both types of integrators are described and implemented in Chapter 16.

    ⟨Scene Data⟩ +≡
    SurfaceIntegrator *surfaceIntegrator;
    VolumeIntegrator *volumeIntegrator;
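For reference, assembling the ⟨Scene Data⟩ fragments defined in this section (including the Sampler member added just below) gives a class declaration along these lines. This is only the fragments above tangled together, with summary comments added, not additional lrt code:

    class Scene {
    public:
        // ⟨Scene Public Methods⟩ are defined in Section 1.3.4.
        // ⟨Scene Data⟩, accumulated from the fragments above:
        Primitive *aggregate;        // all geometry, inside an acceleration structure
        vector<Light *> lights;      // global list of light sources
        Camera *camera;              // viewing parameters; also holds the Film
        VolumeRegion *volumeRegion;  // participating media, if any
        SurfaceIntegrator *surfaceIntegrator;
        VolumeIntegrator *volumeIntegrator;
        Sampler *sampler;            // image and integrator sample positions
    };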
Figure 1.3: Class relationships for the main rendering loop, which is in the Scene::Render() method in core/scene.cpp. The Sampler provides a sequence of sample values, one for each image sample to be taken. The Camera turns a sample into a corresponding ray from the film plane, and the Integrators compute the radiance along that ray arriving at the film. The sample and its radiance are given to the Film, which stores their contribution in an image. This process repeats until the Sampler has provided as many samples as are necessary to generate the final image.

The goals of the Sampler are subtle, but its implementation can substantially affect the quality of the images that the system generates. First, the sampler is responsible for choosing the points on the image plane from which rays are traced into the scene to compute final pixel values. Second, it is responsible for supplying sample positions that are used by the integrators in their light transport computations. For example, some integrators need to choose sample points on light sources as part of the process of computing illumination at a point. Generating good distributions of samples is an important part of the rendering process and is discussed in Chapter 7.

    ⟨Scene Data⟩ +≡
    Sampler *sampler;

1.3.3 Main Rendering Loop

After the Scene has been allocated and initialized, its Render() method is invoked, starting the second phase of lrt's execution: the main rendering loop. For each of a series of positions on the image plane, this method uses the camera and the sampler to generate a ray out into the scene and then uses the integrators to compute the light arriving along the ray at the image plane. This value is passed along to the film, which records its contribution. Figure 1.3 summarizes the main classes used in this method and the flow of data among them.

    ⟨Scene Methods⟩ ≡
    void Scene::Render() {
        ⟨Allocate and initialize sample⟩
        ⟨Allow integrators to do pre-processing for the scene⟩
        ⟨Get all samples from Sampler and evaluate contributions⟩
        ⟨Clean up after rendering and store final image⟩
    }

Before rendering starts, this method allocates a Sample object for the Sampler to use to store sample values for each image sample. Because the number and types of samples that need to be generated for each image sample are partially dependent on the integrators, the Sample constructor takes pointers to them so that they can inform the Sample object about their sample needs. See Section 7.3.1 for more information about how integrators request particular sets of samples at this point.

    ⟨Allocate and initialize sample⟩ ≡
    Sample *sample = new Sample(surfaceIntegrator, volumeIntegrator, this);

The only other task to complete before rendering can begin is to call the Preprocess() methods of the integrators, which gives them an opportunity to do any scene-dependent precomputation that they may need to do. Because information like the number of lights in the scene, their power, and the geometry of the scene isn't known when the integrators are originally created, the Preprocess() method gives them an opportunity to do final initialization that depends on this information. For example, the PhotonIntegrator in Section 16.6 uses this opportunity to create data structures that hold a representation of the distribution of illumination in the scene.

    ⟨Allow integrators to do pre-processing for the scene⟩ ≡
    surfaceIntegrator->Preprocess(this);
    volumeIntegrator->Preprocess(this);

The ProgressReporter object tells the user how far through the rendering process we are as lrt runs. It takes the total number of work steps as a parameter, so that it knows the total amount of work to be done.
After its creation, the main render loop begins. Each time through the loop, Sampler::GetNextSample() is called and the Sampler initializes sample with the next image sample value, returning false when there are no more samples. The fragments in the loop body find the corresponding camera ray, hand it off to the integrators to compute its contribution, and finally update the image with the result.

    ⟨Get all samples from Sampler and evaluate contributions⟩ ≡
    ProgressReporter progress(sampler->TotalSamples(), "Rendering");
    while (sampler->GetNextSample(sample)) {
        ⟨Find camera ray for sample⟩
        ⟨Evaluate radiance along camera ray⟩
        ⟨Add sample contribution to image⟩
        ⟨Free BSDF memory from computing image sample value⟩
        ⟨Report rendering progress⟩
    }

The main function of the Camera class is to provide a GenerateRay() method, which determines the appropriate ray to trace for a particular sample position on the image plane, given the particular image formation process that it is simulating. The sample and a ray are passed to this method, and the fields of the ray are initialized accordingly. An important convention that all Cameras must follow is that the direction components of the rays that they return must be normalized; most of the Integrators depend on this fact. The camera also returns a floating-point weight with the ray that can be used by Cameras that simulate realistic models of image formation, where some rays through a lens system carry more energy than others; for example, in a real camera, less light typically arrives at the edges of the film plane than at the center. This weight will be used later as a scale factor to be applied to this ray's contribution to the image.

    ⟨Find camera ray for sample⟩ ≡
    RayDifferential ray;
    Float rayWeight = camera->GenerateRay(*sample, &ray);
    ⟨Generate ray differentials for camera ray⟩

In order to get better results from some of the texture functions defined in Chapter 11, it is useful to determine the rays that the Camera would generate for samples offset one pixel in the x and y directions on the image plane. This information will later allow us to compute how quickly a texture is varying with respect to the pixel spacing when projected onto the image plane, so that we can remove detail from it that can't be represented in the image being generated. Doing so eliminates a wide class of image artifacts due to aliasing. While the Ray class just holds the origin and direction of a single ray, RayDifferential inherits from Ray so that it also has those member variables, and it additionally holds two more Rays, rx and ry, for these neighbors.

    ⟨Generate ray differentials for camera ray⟩ ≡
    ++sample->imageX;
    camera->GenerateRay(*sample, &ray.rx);
    --sample->imageX;
    ++sample->imageY;
    camera->GenerateRay(*sample, &ray.ry);
    ray.hasDifferentials = true;
    --sample->imageY;

Given a ray, the Scene::Render() method calls Scene::L(), which returns the amount of light arriving at the image along the ray. The implementation of this method will be shown in the next section. The physical unit that describes the strength of this light is radiance; it is described in detail in Section 5.2. The symbol for radiance is L, thus the name of the method.
These radiance values are represented with the Spectrum class, the abstraction that defines the representation of general energy distributions by wavelength; in other words, color.

In addition to returning the ray's radiance, Scene::L() sets the alpha variable passed to it to the alpha value for this ray. Alpha is an extra component beyond color that encodes opacity. If the ray hits an opaque object, alpha will be one, indicating that nothing behind the intersection point is visible. If the ray passed through something partially transparent, like fog, but never hit an opaque object, alpha will be between zero and one. If the ray didn't hit anything, alpha is zero. Computing alpha values here and storing an alpha value with each pixel can be useful for a variety of post-processing effects; for example, we can composite a rendered object on top of a photograph, using the pixels in the image of the photograph wherever the rendered image's alpha channel is zero, using the rendered image where its alpha channel is one, and using a mix of the two for the remaining pixels (in general, blending as C = α C_rendered + (1 − α) C_photograph).

Finally, an assertion checks that the returned spectral radiance value doesn't have any floating-point "not a number" components; these are a common side-effect of bugs in other parts of the system, so it's helpful to catch them immediately here.

    ⟨Evaluate radiance along camera ray⟩ ≡
    Float alpha;
    Spectrum Ls = 0.f;
    if (rayWeight > 0.f)
        Ls = rayWeight * L(ray, sample, &alpha);
    ⟨Issue warning if unexpected radiance value returned⟩

    ⟨Issue warning if unexpected radiance value returned⟩ ≡
    if (Ls.IsNaN())
        Error("Not-a-number radiance value returned for image sample");
    else if (Ls.y() < 0)
        Error("Negative luminance value, %f, returned for image sample",
              Ls.y());

After we have the ray's contribution, we can update the image. The Film::AddSample() method updates the pixels in the image given the results from this sample. The details of this process are explained in Section 7.7.

    ⟨Add sample contribution to image⟩ ≡
    camera->film->AddSample(*sample, ray, Ls, alpha);

BSDFs describe material properties at a single point on a surface; they will be described in more detail later in this section. In lrt, it's necessary to dynamically allocate memory to store the BSDFs used to compute the contribution of the sample value here. In order to avoid the overhead of calling the system's memory allocation and freeing routines multiple times for each of them, the BSDF class uses the MemoryArena class to manage pools of memory for BSDFs; Section 10.1.1 describes this in more detail. Now that the contribution for this sample has been computed, it's necessary to tell the BSDF class that all of the BSDF memory allocated for the sample we just finished is no longer needed, so that it can be reused for the next sample.

    ⟨Free BSDF memory from computing image sample value⟩ ≡
    BSDF::FreeAll();

So that it's easy for various parts of lrt to gather statistics on things that may be meaningful or interesting to the user, a handful of statistics-tracking classes are defined in Appendix A.2.3. StatsCounter overloads the ++ operator to indicate that the counter should be incremented. The ProgressReporter class indicates how many steps out of the total have been completed with a row of plus signs printed to the screen; a call to its Update() method indicates that one of the total number of steps passed to its constructor has been completed.
    ⟨Report rendering progress⟩ ≡
    static StatsCounter cameraRaysTraced("Camera", "Camera Rays Traced");
    ++cameraRaysTraced;
    progress.Update();

At the end of the main loop, Scene::Render() frees the sample memory and begins the third phase of lrt's execution with the call to Film::WriteImage(), where the imaging pipeline prepares a final image to be stored.

    ⟨Clean up after rendering and store final image⟩ ≡
    delete sample;
    camera->film->WriteImage();

1.3.4 Scene Methods

The Scene has only a handful of additional methods; it mostly just holds the variables that represent the scene. The methods it does have generally have little complexity and forward requests on to methods of the Scene's member variables.

First is the Scene::Intersect() method, which traces the given ray into the scene and returns a boolean value indicating whether it intersected any of the primitives. If so, it returns information about the closest intersection point in the Intersection structure defined in Section 4.1.

    ⟨Scene Public Methods⟩ +≡
    bool Intersect(const Ray &ray, Intersection *isect) const {
        return aggregate->Intersect(ray, isect);
    }

A closely related method is Scene::IntersectP(), which checks for any intersection along a ray, again returning a boolean result. Because it doesn't return information about the geometry at the intersection point, and because it doesn't need to search for the closest intersection, it can be more efficient than Scene::Intersect() for rays where this additional information isn't needed.

    ⟨Scene Public Methods⟩ +≡
    bool IntersectP(const Ray &ray) const {
        return aggregate->IntersectP(ray);
    }

Another useful geometric method, Scene::WorldBound(), returns a 3D box that bounds the extent of the geometry in the scene. We won't include its straightforward implementation here.

    ⟨Scene Public Methods⟩ +≡
    const BBox &WorldBound() const;

The Scene's method to compute the radiance along a ray, Scene::L(), uses a SurfaceIntegrator to compute reflected radiance from the first surface that the given ray intersects, storing the result in Ls. It then uses the volume integrator's Transmittance() method to compute how much of that light is extinguished between the point on the surface and the camera due to attenuation and scattering of light by participating media, if any. Participating media may also add light along the ray; the VolumeIntegrator's L() method computes how much light is added along the ray due to volumetric light sources and scattering from particles in the media. Section 16.7 describes the theory of attenuation and scattering from participating media in detail. The net effect of these interactions is returned by this method.
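In symbols: if L_s is the radiance reflected from the first surface the ray hits, T is the transmittance between that surface and the camera, and L_v is the radiance added along the ray by participating media, then the value returned is

    L = T L_s + L_v

which is exactly the expression computed on the last line of the method below.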
    ⟨Scene Methods⟩ +≡
    Spectrum Scene::L(const RayDifferential &ray, const Sample *sample,
            Float *alpha) const {
        Spectrum Ls = surfaceIntegrator->L(this, ray, sample, alpha);
        Spectrum T = volumeIntegrator->Transmittance(this, ray, sample, alpha);
        Spectrum Lv = volumeIntegrator->L(this, ray, sample, alpha);
        return T * Ls + Lv;
    }

It's also useful to compute the attenuation of a ray in isolation; the Scene's Transmittance() method returns the reduction in radiance along the ray due to participating media.

    ⟨Scene Methods⟩ +≡
    Spectrum Scene::Transmittance(const Ray &ray) const {
        return volumeIntegrator->Transmittance(this, ray, NULL, NULL);
    }

1.3.5 Whitted Integrator

Chapter 16 has the implementations of many different surface and volume integrators, giving differing levels of accuracy using a variety of algorithms to compute the results. Here we will present a classic surface integrator based on Whitted's ray tracing algorithm. This integrator accurately computes reflected and transmitted light from specular surfaces like glass, mirrors, and water, though it doesn't account for indirect lighting effects. The more complex integrators later in the book build on the ideas in this integrator to implement more sophisticated light transport algorithms.

The WhittedIntegrator is in the whitted.cpp file in the integrators/ directory.

    ⟨whitted.cpp*⟩ ≡
    #include "lrt.h"
    #include "transport.h"
    #include "scene.h"
    ⟨WhittedIntegrator Declarations⟩
    ⟨WhittedIntegrator Method Definitions⟩

Figure 1.4: Class relationships for surface integration: the main render loop passes a camera ray to the SurfaceIntegrator, which has the task of returning the radiance along that ray arriving at the ray's origin on the film plane. The integrator calls back to the Scene::Intersect() method to find the first surface that the ray intersects; the scene in turn passes the request on to an accelerator (which is itself a Primitive). The accelerator will perform ray–primitive intersection tests with the Primitives that the ray potentially intersects, and these will lead to the Shape::Intersect() routines for the corresponding shapes. Once the Intersection is returned to the integrator, it gets the material properties at the intersection point in the form of a BSDF and uses the Lights in the Scene to determine the illumination there. This gives the information needed to compute reflected radiance at the intersection point back along the ray.

    ⟨WhittedIntegrator Declarations⟩ ≡
    class WhittedIntegrator : public SurfaceIntegrator {
    public:
        ⟨WhittedIntegrator Public Methods⟩
    private:
        ⟨WhittedIntegrator Private Data⟩
    };

The key method that all integrators must provide is L(), which returns the radiance along a ray. Figure 1.4 summarizes the data flow among the main classes used during integration at surfaces.
⟨WhittedIntegrator Method Definitions⟩ ≡
    Spectrum WhittedIntegrator::L(const Scene *scene,
            const RayDifferential &ray, const Sample *sample,
            Float *alpha) const {
        Intersection isect;
        Spectrum L(0.);
        if (scene->Intersect(ray, &isect)) {
            if (alpha) *alpha = 1.;
            ⟨Compute emitted and reflected light at ray intersection point⟩
        }
        else {
            ⟨Handle ray with no intersection⟩
        }
        return L;
    }

For the integrator to determine what primitive is hit by a ray, it calls the Scene::Intersect() method. If the ray passed to the integrator's L() method intersects a geometric primitive, the reflected radiance is given by the sum of directly emitted radiance from the object if it is itself emissive, and the reflected radiance due to reflection of light from other primitives and light sources that arrives at the intersection point. This idea is formalized by the equation below, which says that outgoing radiance from a point p in direction ω_o, L_o(p, ω_o), is the sum of emitted radiance at that point in that direction, L_e(p, ω_o), plus the incident radiance from all directions on the sphere S² around p scaled by a function f(p, ω_o, ω_i) that describes how the surface scatters light from the incident direction ω_i to the outgoing direction ω_o, and a cosine term. We will show a more complete derivation of this equation later, in Sections 5.4.1 and 16.2.

\[
L_o(\mathrm{p}, \omega_o) = L_e(\mathrm{p}, \omega_o) + \int_{S^2} f(\mathrm{p}, \omega_o, \omega_i)\, L_i(\mathrm{p}, \omega_i)\, \cos\theta_i \,\mathrm{d}\omega_i
\]

Solving this integral analytically is in general not possible for anything other than the simplest of scenes, so integrators must either make simplifying assumptions or use numerical integration techniques. The WhittedIntegrator ignores incoming light from most of the directions and only evaluates L_i(p, ω_i) for the directions to light sources and for the directions of specular reflection and refraction. Thus, it turns the integral into a sum over a small number of directions.

The Whitted integrator works by recursively evaluating radiance along reflected and refracted ray directions. We keep track of the depth of recursion in the variable rayDepth, and after a predetermined recursion depth, maxDepth, we stop tracing reflected and refracted rays. By default, the maximum recursion depth is five. Otherwise, in a scene like a box where all of the walls were mirrors, the recursion might never terminate. These member variables are initialized in the trivial WhittedIntegrator constructor, which we will not include in the text.

⟨WhittedIntegrator Private Data⟩ ≡
    int maxDepth;
    mutable int rayDepth;

The ⟨Compute emitted and reflected light at ray intersection point⟩ fragment is the heart of the Whitted integrator.

⟨Compute emitted and reflected light at ray intersection point⟩ ≡
    ⟨Evaluate BSDF at hit point⟩
    ⟨Initialize common variables for Whitted integrator⟩
    ⟨Compute emitted light if ray hit an area light source⟩
    ⟨Compute reflection by integrating over the lights⟩
    if (rayDepth++ < maxDepth) {
        ⟨Trace rays for specular reflection and refraction⟩
    }
    --rayDepth;

To compute reflected light, the integrator must have a representation of the local light-scattering properties of the surface at the intersection point as well as a way to determine the distribution of illumination arriving at that point. To represent the scattering properties at a point on a surface, lrt uses a class called BSDF, which stands for "Bidirectional Scattering Distribution Function".
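Stated as a formula (a summary sketch of the approximation just described, not an equation reproduced from elsewhere in the book), the Whitted integrator effectively evaluates

\[
L_o(\mathrm{p}, \omega_o) \approx L_e(\mathrm{p}, \omega_o) + \sum_{\text{lights } j} f(\mathrm{p}, \omega_o, \omega_j)\, \mathrm{d}E_j + \sum_{s \in \{\mathrm{refl},\, \mathrm{trans}\}} f_s\, L_i(\mathrm{p}, \omega_s)\, |\cos\theta_s|,
\]

where the first sum is over the light sources and the second is over the specular reflection and transmission directions returned by the BSDF.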
Figure 1.5: Basic setting for the Whitted integrator: p is the ray intersection point and n is the surface normal there. The direction in which we'd like to compute reflected radiance is ω_o; it is the vector pointing in the opposite direction of the ray, -ray.d.

These functions take an incoming direction and an outgoing direction and return a value that indicates the amount of light that is reflected from the incoming direction to the outgoing direction (actually, BSDFs usually vary as a function of the wavelength of light, so they really return a Spectrum). lrt provides built-in BSDF classes for several standard scattering functions used in computer graphics. Examples of BSDFs include Lambertian reflection and the Torrance–Sparrow microfacet model; these and other BSDFs are implemented in Chapter 9.

The BSDF at a surface point provides all information needed to shade that point, but BSDFs may vary across a surface. Surfaces with complex material properties, such as wood or marble, have a different BSDF at each point. Even if wood is modeled as perfectly diffuse, for example, the diffuse color at each point will depend on the wood's grain. These spatial variations of shading parameters are described with Textures, which in turn may be described procedurally or stored in image maps; see Chapter 11.

The Intersection::GetBSDF() method returns a pointer to the BSDF at the intersection point on the object.

⟨Evaluate BSDF at hit point⟩ ≡
    BSDF *bsdf = isect.GetBSDF(ray);

There are a few quantities that we'll make use of repeatedly in the fragments to come. Figure 1.5 illustrates them. p is the world-space position of the ray–primitive intersection, and n is the surface normal at the intersection point. The normalized direction from the hit point back to the ray origin is stored in wo; because Cameras are responsible for normalizing the direction component of the rays they generate, there's no need to re-normalize it here. (Normalized directions in lrt are generally denoted by the symbol ω, so wo is a shorthand we will commonly use for ω_o, the outgoing direction of scattered light.)

⟨Initialize common variables for Whitted integrator⟩ ≡
    const Point &p = bsdf->dgShading.p;
    const Normal &n = bsdf->dgShading.nn;
    Vector wo = -ray.d;

If the ray happened to hit geometry that is itself emissive, we compute its emitted radiance by calling the Intersection's Le() method. This gives us the first term of the outgoing radiance equation above. If the object is not emissive, this method will return a black spectrum.

⟨Compute emitted light if ray hit an area light source⟩ ≡
    L += isect.Le(wo);

For each light, the integrator computes the amount of illumination falling on the surface at the point being shaded by calling the light's dE() method, passing it the position and surface normal for the point on the surface. E is the symbol for the physical quantity irradiance, and differential irradiance, dE, is the appropriate measure of incident illumination here; radiometric concepts such as energy and differential irradiance are discussed in Chapter 5. This method also returns the direction vector from the point being shaded to the light source, which is stored in the variable wi.
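As a concrete example of differential irradiance (a standard radiometric result given here for orientation; Chapters 5 and 13 develop it properly), an isotropic point light with total power Φ produces

\[
\mathrm{d}E = \frac{\Phi\, |\cos\theta_i|}{4\pi r^2}
\]

at a point a distance r away, where θ_i is the angle between the direction to the light and the surface normal.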
The Light::dE() method also returns a VisibilityTester object, which is a closure representing additional computation to be done to determine if any primitives block the light from the light source. Specifically, the Spectrum that is returned from Light::dE() doesn't account for any other objects blocking light between the light source and the surface. To verify that there are no such occluders, a shadow ray must be traced between the point being shaded and the point on the light to verify that the path is clear. Because ray tracing is relatively expensive, we would like to defer tracing the ray until we are sure that the BSDF indicates that some of the light from the direction ω_i will be scattered in the direction ω_o. For example, if the surface isn't transmissive, then light arriving at the back side of the surface doesn't contribute to reflection. The VisibilityTester encapsulates the state needed to record which ray needs to be traced to do this check. (In a similar manner, the attenuation along the ray to the light source due to participating media is ignored until explicitly evaluated via the Transmittance() method.)

To evaluate the contribution to the reflection due to the light, the integrator multiplies dE by the value that the BSDF returns for the fraction of light that is scattered from the light direction to the outgoing direction along the ray. This represents this light's contribution to the reflected light in the integral over incoming directions, which is added to the total of reflected radiance stored in L. After all lights have been considered, the integrator has computed total reflection due to direct lighting: light that arrives at the surface directly from emissive objects (as opposed to light that has reflected off other objects in the scene before arriving at the point).

⟨Compute reflection by integrating over the lights⟩ ≡
    Vector wi;
    for (u_int i = 0; i < scene->lights.size(); ++i) {
        VisibilityTester visibility;
        Spectrum dE = scene->lights[i]->dE(p, n, &wi, &visibility);
        if (dE.Black()) continue;
        Spectrum f = bsdf->f(wo, wi);
        if (!f.Black() && visibility.Unoccluded(scene))
            L += f * dE * visibility.Transmittance(scene);
    }

Figure 1.6: The law of mirror reflection: the angle that the reflected ray makes with the surface normal is equal to the angle made by the incident ray.

Before we finish, the integrator also accounts for the contribution of light scattered by perfectly specular surfaces like mirrors or glass. Consider a mirror, for example. The law of mirror reflection says that the angle the reflected ray makes with the surface normal is equal to the angle made by the incident ray (see Figure 1.6). Thus, to compute reflected radiance from a mirror in direction ω_o, we need to know the incident radiance at the surface point p in the direction ω_i. The key insight that Whitted had was that this could be found with a recursive call to the ray-tracing routine with a new ray from p in the direction ω_i. Therefore, when a specularly reflective or transmissive object is hit by a ray, new rays are also traced in the reflected and refracted directions, and the returned radiance values are scaled by the value of the surface's BSDF and added to the radiance scattered from the original point.
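For reference, the law of mirror reflection can be written directly in terms of the quantities above (a standard identity, not a fragment from lrt): given the outgoing direction ω_o and the surface normal n, the specularly reflected direction is

\[
\omega_i = -\omega_o + 2\,(\omega_o \cdot \mathbf{n})\,\mathbf{n}.
\]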
The BSDF has a method that returns an incident ray direction for a given outgoing direction and a given mode of light scattering at a surface. Here, we are only interested in perfect specular reflection and transmission, so we pass the flags BxDFType(BSDF_REFLECTION | BSDF_SPECULAR) and BxDFType(BSDF_TRANSMISSION | BSDF_SPECULAR) to BSDF::Sample_f() to indicate that glossy and diffuse reflection should be ignored here. Thus, the two calls to Sample_f() below check for specular reflection and transmission, initialize wi with the appropriate direction, and return the BSDF's value for the directions (ω_o, ω_i). If the value of the BSDF is non-zero, the integrator calls the Scene's radiance function L() to get the incoming radiance along the ray, which leads to a call back to the WhittedIntegrator's L() method. By continuing this process recursively, multiple levels of reflection and refraction are accounted for.

One important detail in this process is how ray differentials for the reflected and transmitted rays are found; just as having an approximation to the screen-space area of a directly visible object is crucial for anti-aliasing textures on the object, if we can approximate the screen-space area of objects that are seen through reflection or refraction, we can reduce aliasing in their textures as well. The fragments that implement the computations to find the ray differentials for these rays are described in Section 10.2.2.

To compute the cosine term of the reflection integral, the integrator calls the Dot() function, which returns the dot product between two vectors. If the vectors are normalized, as both wi and n are here, this is equal to the cosine of the angle between them.

⟨Trace rays for specular reflection and refraction⟩ ≡
    Spectrum f = bsdf->Sample_f(wo, &wi,
        BxDFType(BSDF_REFLECTION | BSDF_SPECULAR));
    if (!f.Black()) {
        ⟨Compute ray differential rd for specular reflection⟩
        L += scene->L(rd, sample) * f * AbsDot(wi, n);
    }
    f = bsdf->Sample_f(wo, &wi,
        BxDFType(BSDF_TRANSMISSION | BSDF_SPECULAR));
    if (!f.Black()) {
        ⟨Compute ray differential rd for specular transmission⟩
        L += scene->L(rd, sample) * f * AbsDot(wi, n);
    }

⟨Handle ray with no intersection⟩ ≡
    if (alpha) *alpha = 0.;
    return L;

And this concludes the WhittedIntegrator's implementation.

1.4 How To Proceed Through This Book

We have written this text assuming it will be read in roughly front-to-back order. We have tried to minimize the number of forward references to ideas and interfaces that haven't yet been introduced, but assume that the reader is acquainted with the content before any particular point in the text. Because of the modular nature of the system, the most important thing needed to understand an individual section of code is familiarity with the low-level classes like Point, Ray, Spectrum, etc., the interfaces defined by the abstract base classes listed in Figure 1.1, and the main rendering loop in Scene::Render().

Given that knowledge, for example, the reader who doesn't care about precisely how a camera model based on a perspective projection matrix maps samples to rays can skip over the implementation of that camera and can just remember that the Camera::GenerateRay() method somehow turns a Sample into a Ray. Furthermore, some sections go into depth about advanced topics that some readers may wish to skip over (particularly on a first reading); these sections are denoted by an asterisk.

The book is divided into four main sections of a few chapters each.
First, chapters two through four define the main geometric functionality in the system. Chapter two has the low-level classes like Point, Ray, and BBox; chapter three defines the Shape interface, has implementations of a number of shapes, and shows how to perform ray–shape intersection tests; and chapter four has the implementations of the acceleration structures for speeding up ray tracing by avoiding tests with primitives that a ray can be shown to definitely not intersect.

The second main section covers the image formation process. First, chapter five introduces the physical units used to measure light and the Spectrum class that represents wavelength-varying distributions (i.e., color). Chapter six defines the Camera interface and has a few different camera implementations. The Sampler classes that place samples on the image plane are the topic of chapter seven, and the overall process of turning the radiance values from camera rays into images suitable for display is explained in chapter eight.

The third section is about light and how light scatters from surfaces and participating media. Chapter nine defines a set of building-block classes that define a variety of types of reflection from surfaces. Materials, described in chapter ten, use these reflection functions to implement a number of different types of surface materials, such as plastic, glass, and metal. Chapter eleven introduces texture, which describes variation in material properties (color, roughness, etc.) over surfaces, and chapter twelve has the abstractions used to describe how light is scattered and absorbed in participating media. Finally, chapter thirteen has the interface for light sources and light source implementations.

The last section brings all of the ideas of the rest of the book together to implement a number of integrators. Chapters fourteen and fifteen introduce the theory of Monte Carlo integration, a statistical technique for estimating the value of complex integrals, and have low-level routines for applying Monte Carlo to illumination and light scattering. The surface and volume integrators of chapter sixteen use Monte Carlo integration to compute more accurate approximations of the light reflection equation defined above than the WhittedIntegrator did, using techniques like path tracing, bidirectional path tracing, irradiance caching, and photon mapping. The last chapter of the book has a brief retrospective and discussion of system design decisions along with a number of suggestions for more far-reaching projects than those in the exercises in previous chapters.

Further Reading

In a seminal early paper, Arthur Appel first described the basic idea of ray tracing to solve the hidden surface problem and to compute shadows in polygonal scenes (Appel 1968). Goldstein and Nagel later showed how ray tracing could be used to render scenes with quadric surfaces (Goldstein and Nagel 1971). (XXX first direct rendering of curved surfaces? XXX) Kay and Greenberg described a ray tracing approach to rendering transparency (Kay and Greenberg 1979), and Whitted's seminal CACM paper described the general recursive ray tracing algorithm we have outlined in this chapter, accurately simulating reflection and refraction from specular surfaces and shadows from point light sources (Whitted 1980).
Notable books on physically based rendering and image synthesis include Cohen and Wallace's Radiosity and Realistic Image Synthesis (Cohen and Wallace 1993) and Sillion and Puech's Radiosity and Global Illumination (Sillion and Puech 1994), which primarily describe the finite-element radiosity method; Glassner's Principles of Digital Image Synthesis (Glassner 1995), an encyclopedic two-volume summary of theoretical foundations for realistic rendering; and Illumination and Color in Computer Generated Imagery (Hall 1989), one of the first books to present rendering in a physically based framework. XXX Advanced Globillum Book XXX

Many papers have been written that describe the design and implementation of other rendering systems. One type of renderer that has been written about is renderers for entertainment and artistic applications. The REYES architecture, which forms the basis for Pixar's RenderMan renderer, was first described by Cook et al. (Cook, Carpenter, and Catmull 1987); a number of improvements to the original algorithm are summarized in (Apodaca and Gritz 2000). Gritz and Hahn describe the BMRT ray tracer (Gritz and Hahn 1996), though they mostly focus on the details of implementing a ray tracer that supports the RenderMan interface. The renderer in the Maya modeling and animation system is described by Sung et al. (Sung, Craighead, Wang, Bakshi, Pearce, and Woo 1998).

Kirk and Arvo's paper on ray tracing system design was the first to suggest many design principles that have now become classic in renderer design (Kirk and Arvo 1988). The renderer was implemented as a core kernel that encapsulated the basic rendering algorithms and interacted with primitives and shading routines via a carefully constructed object-oriented interface. This approach made it easy to extend the system with new primitives and acceleration methods. The Introduction to Ray Tracing book, which describes the state of the art in ray tracing in 1989, has a chapter by Heckbert that sketches the design of a basic ray tracer (?). Finally, Shirley's recent book gives an excellent introduction to ray tracing and includes the complete source code to a basic ray tracer. XXX cite XXX

Researchers at Cornell University have developed a rendering testbed over many years; its overall structure is described by Trumbore et al. (Trumbore, Lytle, and Greenberg 1993). Its predecessor was described by Hall and Greenberg (Hall and Greenberg 1983). This system is a loosely coupled set of modules and libraries, each designed to handle a single task (ray–object intersection acceleration, image storage, etc.), and written in a way that makes it easy to combine appropriate modules together to investigate and develop new rendering algorithms. This testbed has been quite successful, serving as the foundation for much of the rendering research done at Cornell.

Another category of renderer focuses on physically based rendering, like lrt. One of the first renderers based fundamentally on physical quantities is Radiance, which has been used widely in lighting simulation applications. Ward describes its design and history in a paper and a book (Ward 1994b; Larson and Shakespeare 1998). Radiance is designed in the Unix style, as a set of interacting programs, each handling a different part of the rendering process. (This type of rendering architecture, interacting separate programs, was first described by Duff (Duff 1985).)
Glassner's Spectrum rendering architecture also focuses on physically based rendering (Glassner 1993), approached through a signal-processing-based formulation of the problem. It is an extensible system built with a plug-in architecture; lrt's approach of using parameter/value lists for initializing plug-in objects is similar to Spectrum's. One notable feature of Spectrum is that all parameters that describe the scene can be animated in a variety of ways.

Slusallek and Seidel describe the architecture of the Vision rendering system, which is also physically based and was designed to be extensible to support a wide variety of light transport algorithms (Slusallek and Seidel 1995; Slusallek and Seidel 1996; Slusallek 1996). In particular, it has the ambitious goal of supporting both Monte Carlo and finite-element based light transport algorithms. Because lrt was designed with the fundamental expectation that Monte Carlo algorithms would be used, its design could be substantially more straightforward.

The RenderPark rendering system also supports a variety of physically based rendering algorithms, including both Monte Carlo and finite element approaches. It was developed by Philippe Bekaert, Frank Suykens de Laet, Pieter Peers, and Vincent Masselus, and is available from http://www.cs.kuleuven.ac.be/cwis/research/graphics/RENDERPARK/.

The source code to a number of other ray tracers and renderers is available on the web. Notable ones include Mark VandeWettering's MTV, which was the first widely distributed freely available ray tracer; it was posted to the comp.sources.unix newsgroup in 1988. Craig Kolb's rayshade had a number of releases during the 1990s; its current homepage is http://graphics.stanford.edu/~cek/rayshade/rayshade.html. The radiance system is available from http://radsite.lbl.gov/radiance/HOME.html. POV-Ray is used by a large number of individuals, primarily for personal purposes; it is available from http://www.povray.org. XXX Photon, 3Dlight, Aqusis. XXX

A good introduction to the C++ programming language and C++ standard library is the third edition of Stroustrup's The C++ Programming Language (Stroustrup 1997).

Exercises

1.1 A good way to gain understanding of the system is to follow the process of computing the radiance value for a single ray in the debugger. Build a version of lrt with debugging symbols and set up your debugger to run lrt with the XXXX.lrt scene. Set a breakpoint in the Scene::Render() method and trace through the process of how a ray is generated, how its radiance value is computed, and how its contribution is added to the image. As you gain more understanding of how the details of the system work, return to this and more carefully trace through particular parts of the process.

Geometry and Transformations

We now present the fundamental geometric primitives around which lrt is built. Our representation of actual scene geometry (triangles, etc.) is presented in Chapter 3; here we will discuss fundamental building blocks of 3D graphics, such as points, vectors, rays, and transformations. Most of this code is stored in core/geometry.h and core/geometry.cpp, though transformation matrices, defined in Section 2.6, are implemented in separate source files.

2.0.1 Affine Spaces

As is typical in computer graphics, lrt represents 3D points, vectors, and normal vectors with three floating-point coordinate values: x, y, and z.
Of course, these values are meaningless without a coordinate system that defines the origin of the space and gives three non-parallel vectors for the x, y, and z axes of the space. Together, the origin and three vectors are called the frame that defines the coordinate system. Given an arbitrary point or direction in 3D, its (x, y, z) coordinate values depend on its relationship to the frame. Figure 2.1 shows an example that illustrates this idea in 2D.

A frame's origin P_o and its n linearly independent basis vectors define an n-dimensional affine space. All vectors v in the space can be expressed as a linear combination of the basis vectors. Given a vector v and the basis vectors v_i, we can compute scalar values s_i such that

\[
v = s_1 v_1 + \cdots + s_n v_n.
\]

The scalars s_i are the representation of v with respect to the basis {v_1, v_2, ..., v_n}, and are the coordinate values that we store with the vector.

Figure 2.1: In 2D, the (x, y) coordinates of a point p are defined by the relationship of the point to a particular 2D coordinate system. Here, two coordinate systems are shown; the point might have coordinates (8, 8) with respect to the coordinate system with its coordinate axes drawn in solid lines, but have coordinates (2, −4) with respect to the coordinate system with dashed axes. In either case, the 2D point p is at the same "absolute" position in space.

Similarly, for all points p, we can compute scalars s_i such that

\[
p = P_o + s_1 v_1 + \cdots + s_n v_n.
\]

Thus, although points and vectors are both represented by x, y, and z coordinates in 3D, they are clearly distinct mathematical entities, and are not freely interchangeable.

This definition of points and vectors in terms of coordinate systems reveals a paradox: to define a frame we need a point and a set of vectors. But we can only meaningfully talk about points and vectors with respect to a particular frame. Therefore, we need a standard frame with origin (0, 0, 0) and basis vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). All other frames will be defined with respect to this canonical coordinate system. We will call this coordinate system world space.

2.0.2 Coordinate System Handedness

XXX Left and right handed coordinate systems: basic idea of what the difference is. lrt uses left handed. XXX

There are two different ways that the three coordinate axes can be arranged: having chosen perpendicular x and y axes, the perpendicular z axis can go in one of two directions. These two choices have been called left-handed and right-handed. Figure XXX shows the two possibilities. The idea is that if you take your thumb, index, and middle finger, arrange them as shown in figure XXX, then for a left-handed coordinate system, XXX. This choice has a number of implications in how some of the geometric operations in this chapter are defined...

2.1 Vectors

⟨Geometry Declarations⟩ ≡
    class COREDLL Vector {
    public:
        ⟨Vector Methods⟩
        ⟨Vector Public Data⟩
    };

A Vector in lrt represents a direction in 3D space. As described above, we represent vectors with a three-tuple of components that give its representation in terms of the x, y, and z axes of the space it is defined in. The individual components of a vector v will be written v_x, v_y, and v_z.

⟨Vector Public Data⟩ ≡
    Float x, y, z;
Readers who are experienced in object-oriented design might object to our decision to make the Vector data publicly accessible. Typically, data members are only accessible inside the class, and external code that wishes to access or modify the contents of a class must do so through a well-defined API of selector and mutator functions. While we generally agree with this design principle, it is not appropriate here. The purpose of selector and mutator functions is to hide the class's internal implementation details. In the case of Vectors, this hiding gains nothing, and adds bulk to the class usage.

By default, the (x, y, z) values are set to zero.

⟨Vector Methods⟩ ≡
    Vector(Float _x=0, Float _y=0, Float _z=0)
        : x(_x), y(_y), z(_z) {
    }

2.1.1 Arithmetic

Adding and subtracting vectors is done component-wise. The usual geometric interpretation of vector addition and subtraction is shown in Figures 2.2 and 2.3.

⟨Vector Methods⟩ +≡
    Vector operator+(const Vector &v) const {
        return Vector(x + v.x, y + v.y, z + v.z);
    }
    Vector& operator+=(const Vector &v) {
        x += v.x; y += v.y; z += v.z;
        return *this;
    }

The code for subtracting two vectors is similar, and therefore not shown here.

Figure 2.2: Vector addition. Notice that the sum v + w forms the diagonal of the parallelogram formed by v and w. The figure on the right shows the commutativity of vector addition.

Figure 2.3: Vector subtraction. The difference v − w is the other diagonal of the parallelogram formed by v and w.

2.1.2 Scaling

We can also multiply a vector component-wise by a scalar, thereby changing its length. Three functions are needed in order to cover all of the different ways that this operation may be written in source code (i.e., v*s, s*v, and v *= s).

⟨Vector Methods⟩ +≡
    Vector operator*(Float f) const {
        return Vector(f*x, f*y, f*z);
    }
    Vector &operator*=(Float f) {
        x *= f; y *= f; z *= f;
        return *this;
    }

⟨Geometry Inline Functions⟩ +≡
    inline Vector operator*(Float f, const Vector &v) {
        return v*f;
    }

Similarly, a vector can be divided component-wise by a scalar. The code for scalar division is similar to scalar multiplication, though division of a scalar by a vector is not well-defined, so is not permitted.

In these methods, we use a single division to compute the scalar's reciprocal, then perform three component-wise multiplications. This is a useful trick for avoiding expensive division operations. It is a common misconception that these sorts of optimizations are unnecessary because the compiler will perform the necessary analysis. Compilers are frequently unable to perform optimizations that require symbolic manipulation of expressions. For example, given two floating-point numbers, the quantities a+b and b+a are not candidates for common subexpression elimination, because the IEEE floating-point representation cannot guarantee that the two sums will be identical. In fact, some programmers carefully order their floating-point additions so as to minimize roundoff error, and it would be a shame for the compiler to undo all that hard work by rearranging a summation.

⟨Vector Methods⟩ +≡
    Vector operator/(Float f) const {
        Float inv = 1.f / f;
        return Vector(x * inv, y * inv, z * inv);
    }
    Vector &operator/=(Float f) {
        Float inv = 1.f / f;
        x *= inv; y *= inv; z *= inv;
        return *this;
    }
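A brief illustration of the scaling and division operators defined above (hypothetical calling code, not from lrt):

    Vector v(1, 2, 3);
    Vector a = 2.f * v;     // (2, 4, 6)
    Vector b = v / 2.f;     // (0.5, 1, 1.5): one division, three multiplies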
The Vector class also provides a unary negation operator. This returns a new vector pointing in the opposite direction of the original one.

⟨Vector Methods⟩ +≡
    Vector operator-() const {
        return Vector(-x, -y, -z);
    }

Some routines will find it useful to be able to easily loop over the components of a Vector; the Vector class also provides a C++ indexing operator so that, given a vector v, v[0] == v.x and so forth. For efficiency, it doesn't check that the offset i is within the range [0, 2], but trusts the calling code to get this right. This non-check is an example of a tradeoff between convenience and performance. While it places an additional burden on the caller, correct code will run faster. One possibility to avoid having to make this tradeoff would be to wrap the range check in a macro that disables the check when lrt is compiled with optimizations enabled. Why not just use assert() here? These get turned off when you compile in optimized mode. Seems wrong. Thoughts?

⟨Vector Methods⟩ +≡
    Float operator[](int i) const { return (&x)[i]; }
    Float &operator[](int i) { return (&x)[i]; }

2.1.3 Normalization

It is often necessary to normalize a vector; that is, to compute a new vector pointing in the same direction but with unit length. A normalized vector is often called a unit vector. The method to do this is called Vector::Hat(), after a common mathematical notation for a normalized vector: v̂ is the normalized version of v. Vector::Hat() simply divides each component by the length of the vector, denoted in text by ‖v‖. Note that Vector::Hat() returns a new vector; it does not normalize the vector in place.

⟨Vector Methods⟩ +≡
    Float LengthSquared() const { return x*x + y*y + z*z; }
    Float Length() const { return sqrtf(LengthSquared()); }
    Vector Hat() const { return (*this)/Length(); }

2.1.4 Dot and Cross Product

Two other useful operations on vectors are the dot product (also known as the scalar or inner product) and the cross product. For two vectors v and w, their dot product (v · w) is defined as

\[
(v \cdot w) = v_x w_x + v_y w_y + v_z w_z.
\]

⟨Geometry Inline Functions⟩ +≡
    inline Float Dot(const Vector &v1, const Vector &v2) {
        return v1.x * v2.x + v1.y * v2.y + v1.z * v2.z;
    }

The dot product has a simple relationship to the angle between the two vectors maybe a figure here?:

\[
(v \cdot w) = \|v\| \, \|w\| \cos\theta, \tag{2.1.1}
\]

where θ is the angle between v and w. It follows from this that (v · w) is zero if and only if v and w are perpendicular, provided that neither v nor w is degenerate (equal to (0, 0, 0)). A set of two or more mutually perpendicular vectors is said to be orthogonal. An orthogonal set of unit vectors is called orthonormal.

It immediately follows from Equation 2.1.1 that if v and w are unit vectors, their dot product is exactly the cosine of the angle between them. As the cosine of the angle between two vectors often needs to be computed in computer graphics, we will frequently make use of this property.

A few basic properties directly follow from the definition. If u, v, and w are vectors and s is a scalar value, then

\[
\begin{aligned}
(u \cdot v) &= (v \cdot u) \\
(su \cdot v) &= s (v \cdot u) \\
(u \cdot (v + w)) &= (u \cdot v) + (u \cdot w)
\end{aligned}
\]

We will frequently need to compute the absolute value of the dot product as well; the AbsDot() function does this for us so that we don't need a separate call to fabsf().

⟨Geometry Inline Functions⟩ +≡
    inline Float AbsDot(const Vector &v1, const Vector &v2) {
        return fabsf(v1.x * v2.x + v1.y * v2.y + v1.z * v2.z);
    }
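A short usage sketch (hypothetical calling code, not from lrt) that recovers the angle between two directions via Equation 2.1.1:

    Vector a(1, 0, 0), b(1, 1, 0);
    Float cosTheta = Dot(a.Hat(), b.Hat());  // cos(45 degrees), about 0.707
    Float theta = acosf(cosTheta);           // robust code should clamp
                                             // cosTheta to [-1, 1] first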
The cross product is another useful vector operation. Given two vectors in 3D, the cross product v × w is a vector that is perpendicular to both of them. Note that this new vector can point in one of two directions; the coordinate system's handedness decides which is appropriate (recall the discussion in Section 2.0.2). Given orthogonal vectors v and w, v × w should return a vector such that (v, w, v × w) form a coordinate system of the appropriate handedness.

In a left-handed coordinate system, the cross product is defined as:

\[
\begin{aligned}
(v \times w)_x &= v_y w_z - v_z w_y \\
(v \times w)_y &= v_z w_x - v_x w_z \\
(v \times w)_z &= v_x w_y - v_y w_x
\end{aligned}
\]

An easy way to remember this is to compute the determinant of the matrix:

\[
v \times w = \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
v_x & v_y & v_z \\
w_x & w_y & w_z
\end{vmatrix}
\]

where i, j, and k represent the axes (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. Note that this equation is merely a memory aid and not a rigorous mathematical construction, since the matrix entries are a mix of scalar and vector entries.

⟨Geometry Inline Functions⟩ +≡
    inline Vector Cross(const Vector &v1, const Vector &v2) {
        return Vector((v1.y * v2.z) - (v1.z * v2.y),
                      (v1.z * v2.x) - (v1.x * v2.z),
                      (v1.x * v2.y) - (v1.y * v2.x));
    }

From the definition of the cross product, we can derive:

\[
\|v \times w\| = \|v\| \, \|w\| \, |\sin\theta|, \tag{2.1.2}
\]

where θ is the angle between v and w. An important implication of this is that the cross product of two perpendicular unit vectors is itself a unit vector. Note also that the result of the cross product is a degenerate vector if v and w are parallel.

This definition also shows a convenient way to compute the area of a parallelogram; see Figure 2.4. If the two edges of the parallelogram are given by vectors v_1 and v_2, and it has height h, the area is given by ‖v_2‖ h. Since h = ‖v_1‖ sin θ, we can use Equation 2.1.2 to see that the area is ‖v_1 × v_2‖.

Figure 2.4: The area of a parallelogram with edges given by vectors v_1 and v_2 is equal to ‖v_2‖ h. The cross product can easily compute this value as ‖v_1 × v_2‖.

2.1.5 Coordinate system from a vector

We will frequently want to construct a local coordinate system given only a single vector. Because the cross product of two vectors is orthogonal to both, we can simply apply it twice to get a set of three orthogonal vectors for our coordinate system. Note that the two vectors generated by this technique are only unique up to a rotation about the given vector.

This function assumes that the vector passed in, v1, has already been normalized. We first construct a perpendicular vector by zeroing one of the two components of the original vector and swapping the remaining two. Inspection of the two cases should make clear that v2 will be normalized and that the dot product (v1 · v2) will be equal to zero. Given these two perpendicular vectors, a single cross product gives us the third, which by definition will be perpendicular to the first two.

⟨Geometry Inline Functions⟩ +≡
    inline void CoordinateSystem(const Vector &v1, Vector *v2, Vector *v3) {
        if (fabsf(v1.x) > fabsf(v1.y)) {
            Float invLen = 1.f / sqrtf(v1.x*v1.x + v1.z*v1.z);
            *v2 = Vector(-v1.z * invLen, 0.f, v1.x * invLen);
        }
        else {
            Float invLen = 1.f / sqrtf(v1.y*v1.y + v1.z*v1.z);
            *v2 = Vector(0.f, v1.z * invLen, -v1.y * invLen);
        }
        *v3 = Cross(v1, *v2);
    }
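A short usage sketch (hypothetical calling code, not from lrt) showing how CoordinateSystem() might be used to build a basis around a direction:

    Vector w = Vector(1, 2, 3).Hat();   // must be normalized first
    Vector u, v;
    CoordinateSystem(w, &u, &v);
    // w, u, and v are now mutually perpendicular unit vectors;
    // Dot(w, u), Dot(w, v), and Dot(u, v) are all zero, up to
    // floating-point error.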
Figure 2.5: Obtaining the vector between two points. The vector p − q is the component-wise subtraction of the points p and q.

2.2 Points

⟨Geometry Declarations⟩ +≡
    class COREDLL Point {
    public:
        ⟨Point Methods⟩
        ⟨Point Public Data⟩
    };

A point is a zero-dimensional location in 3D space. The Point class in lrt represents points in the obvious way: using x, y, and z coordinates with respect to their coordinate system. Although the same (x, y, z) representation is used for vectors, the fact that a point represents a position, whereas a vector represents a direction, leads to a number of important differences in how they are treated.

⟨Point Public Data⟩ ≡
    Float x, y, z;

⟨Point Methods⟩ ≡
    Point(Float _x=0, Float _y=0, Float _z=0)
        : x(_x), y(_y), z(_z) {
    }

There are certain Point methods which either return or take a Vector. For instance, one can add a vector to a point, offsetting it in the given direction and obtaining a new point. Alternately, one can subtract one point from another, obtaining the vector between them, as shown in Figure 2.5.

⟨Point Methods⟩ +≡
    Point operator+(const Vector &v) const {
        return Point(x + v.x, y + v.y, z + v.z);
    }
    Point &operator+=(const Vector &v) {
        x += v.x; y += v.y; z += v.z;
        return *this;
    }

⟨Point Methods⟩ +≡
    Vector operator-(const Point &p) const {
        return Vector(x - p.x, y - p.y, z - p.z);
    }
    Point operator-(const Vector &v) const {
        return Point(x - v.x, y - v.y, z - v.z);
    }
    Point &operator-=(const Vector &v) {
        x -= v.x; y -= v.y; z -= v.z;
        return *this;
    }

The distance between two points is easily computed by subtracting to compute a vector and then finding the length of that vector.

⟨Geometry Inline Functions⟩ +≡
    inline Float Distance(const Point &p1, const Point &p2) {
        return (p1 - p2).Length();
    }
    inline Float DistanceSquared(const Point &p1, const Point &p2) {
        return (p1 - p2).LengthSquared();
    }

Although it doesn't make sense mathematically to weight points by a scalar or add two points together, the Point class still allows these operations in order to be able to compute weighted sums of points, which is mathematically meaningful as long as the weights used all sum to one. The code for scalar multiplication and addition with Points is identical to Vectors, so it is not shown here.
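A short usage sketch (hypothetical calling code, not from lrt) exercising point–vector arithmetic and the distance functions defined above:

    Point p1(0, 0, 0), p2(3, 4, 0);
    Float d = Distance(p1, p2);          // 5
    Point mid = p1 + .5f * (p2 - p1);    // (1.5, 2, 0), a weighted sum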
2.3 Normals

⟨Geometry Declarations⟩ +≡
    class COREDLL Normal {
    public:
        ⟨Normal Methods⟩
        ⟨Normal Public Data⟩
    };

A surface normal (or just normal) is a vector that is perpendicular to a surface at a particular position. It can be defined as the cross product of any two non-parallel vectors that are tangent to the surface at a point. Although normals are superficially similar to vectors, it is important to distinguish between the two of them: because normals are defined in terms of their relationship to a particular surface, they behave differently than vectors in some situations, particularly when applying transformations. This difference is discussed in Section 2.7.

The implementations of Normals and Vectors are very similar: like vectors, normals are represented by three Floats x, y, and z; they can be added and subtracted to compute new normals; and they can be scaled and normalized. However, a normal cannot be added to a point, and one cannot take the cross product of two normals. Note that, in an unfortunate turn of terminology, normals are not necessarily normalized.

The Normal provides an extra constructor that initializes a Normal from a Vector. Because Normals and Vectors are different in subtle ways, we want to make sure that this conversion doesn't happen when we don't intend it to. Fortunately, the C++ explicit keyword ensures that conversion between two compatible types only happens when that is the intent. The Vector also provides a constructor that converts the other way.

⟨Normal Methods⟩ +≡
    explicit Normal(const Vector &v)
        : x(v.x), y(v.y), z(v.z) {}

⟨Vector Methods⟩ +≡
    explicit Vector(const Normal &n);

⟨Geometry Inline Functions⟩ +≡
    inline Vector::Vector(const Normal &n)
        : x(n.x), y(n.y), z(n.z) { }

Thus, given the declarations Vector v; Normal n;, the assignment n = v is illegal, so it is necessary to explicitly convert the vector, as in n = Normal(v).

The Dot() and AbsDot() functions are also overloaded to compute dot products between the various possible combinations of normals and vectors. This code won't be included in the text. We won't include implementations of all of the various other Normal methods here, since they are otherwise similar to those for vectors.

2.4 Rays

⟨Geometry Declarations⟩ +≡
    class COREDLL Ray {
    public:
        ⟨Ray Public Methods⟩
        ⟨Ray Public Data⟩
    };

A ray is a semi-infinite line specified by its origin and direction. We represent a Ray with a Point for the origin and a Vector for the direction. A ray is denoted r; it has origin o(r) and direction d(r), as shown in Figure 2.6. Because we will be referring to these variables often throughout the code, the origin and direction members of a Ray are named simply o and d.

⟨Ray Public Data⟩ ≡
    Point o;
    Vector d;

Notice that we again choose to make the data publicly available for convenience. The parametric form of a ray expresses it as a function of a scalar value t, giving the set of points that the ray passes through:

\[
r(t) = o(r) + t\, d(r), \qquad 0 \le t < \infty. \tag{2.4.3}
\]

Figure 2.6: A ray is a semi-infinite line defined by its origin o(r) and its direction d(r).

The Ray also includes fields to restrict the ray to a particular segment along its infinite extent. These fields, called mint and maxt, allow us to restrict the ray to a potentially finite segment of points [r(mint), r(maxt)]. Notice that these fields are declared as mutable, meaning that they can be changed even if the Ray structure that contains them is const. why is this useful?

⟨Ray Public Data⟩ +≡
    mutable Float mint, maxt;

For simulating motion blur, each ray may have a unique time value associated with it. The rest of the renderer is responsible for constructing a representation of the scene at the appropriate time for each ray.

⟨Ray Public Data⟩ +≡
    Float time;

Constructing Rays is straightforward. The default constructor relies on the Point and Vector constructors to set the origin and direction to (0, 0, 0). Alternately, a particular point and direction can be provided. Also note that mint is initialized to a small constant rather than 0. The reason for this is discussed in Section XXX; it is a classic ray-tracing hack to avoid false self-intersections due to floating-point precision limitations. It's weird that the default ray constructor makes a degenerate ray with direction (0,0,0). Should we either fix this or say something?

⟨Ray Public Methods⟩ ≡
    Ray(): mint(RAY_EPSILON), maxt(FLT_MAX), time(0.f) {}
    Ray(const Point &origin, const Vector &direction,
            Float start = RAY_EPSILON, Float end = FLT_MAX, Float t = 0.f)
        : o(origin), d(direction), mint(start), maxt(end), time(t) {
    }

The constant to use for the initial mint is arbitrary; no single constant will solve the false self-intersection problem. It needs to be small enough not to miss true intersections, but large enough to overcome most precision errors.
For any given constant, it is easy to construct a scene that will not work. There are more sophisticated techniques for solving the false self-intersection problem; see BLAH AND BLAH.

⟨Global Constants⟩ ≡
    #define RAY_EPSILON 1e-3f

Because a ray can be thought of as a function of a single parameter t, the Ray class overloads the function application operator for rays. This way, when we need to find the point at a particular position along a ray, we can write code like:

    Ray r(Point(0,0,0), Vector(1,2,3));
    Point p = r(1.7);

⟨Ray Public Methods⟩ +≡
    Point operator()(Float t) const { return o + d * t; }

2.4.1 Ray differentials

In order to be able to perform better anti-aliasing with the texture functions defined in Chapter 11, lrt keeps track of some additional information with each camera ray that it traces. In Section 11.2, this information will be used by the Texture class to estimate the projected area on the image plane of a part of the scene. From this, the Texture can compute the texture's average value over that area, leading to a better final image.

A RayDifferential is a subclass of Ray that merely carries along additional information about two auxiliary rays. These two extra rays represent camera rays offset one pixel in the x and y direction from the main ray. By determining the area that these three rays project to on an object being shaded, the Texture can estimate an area to average over for proper anti-aliasing.

Because the RayDifferential class inherits from Ray, geometric interfaces in the system are written to take const Ray & values, so that either a Ray or a RayDifferential can be passed to them and the routines can just treat either as a Ray. Only the routines related to anti-aliasing and texturing require RayDifferential parameters.

⟨Geometry Declarations⟩ +≡
    class COREDLL RayDifferential : public Ray {
    public:
        ⟨RayDifferential Methods⟩
        ⟨RayDifferential Public Data⟩
    };

⟨RayDifferential Methods⟩ ≡
    RayDifferential() { hasDifferentials = false; }
    RayDifferential(const Point &org, const Vector &dir)
        : Ray(org, dir) {
        hasDifferentials = false;
    }

Note that we again use the explicit keyword to prevent Rays from accidentally being converted to RayDifferentials. The constructor sets hasDifferentials to false initially, because the neighboring rays are not yet known. These fields are initialized by the renderer's main loop, in the code fragment ⟨Generate ray differentials for camera ray⟩, on page 12.

⟨RayDifferential Methods⟩ +≡
    explicit RayDifferential(const Ray &ray) : Ray(ray) {
        hasDifferentials = false;
    }

⟨RayDifferential Public Data⟩ ≡
    bool hasDifferentials;
    Ray rx, ry;

2.5 Three-dimensional Bounding Boxes

Why isn't the naming in 3D consistent with the naming in 2D? We should fix this.

⟨Geometry Declarations⟩ +≡
    class COREDLL BBox {
    public:
        ⟨BBox Public Methods⟩
        ⟨BBox Public Data⟩
    };

The scenes that lrt will render will often contain objects that are computationally expensive to process. For many operations, it is often useful to have a three-dimensional bounding volume that encloses an object. If a ray does not pass through a particular bounding volume, lrt can avoid processing all of the objects inside of it. The measurable benefit of this technique is related to two factors: the expense of processing the bounding volume compared to the expense of processing the objects inside of it, and the tightness of the fit.
If we have a very loose bound around an object, we will often incorrectly determine that its contents need to be examined further. However, in order to make the bounding volume a closer fit, it may be necessary to make the volume a complex object itself, and the expense of processing it increases.

There are many choices for bounding volumes; we will be using axis-aligned bounding boxes (AABBs). Other popular choices are spheres and oriented bounding boxes (OBBs). An AABB can be described by one of its vertices and three lengths, each representing the distance spanned along the x, y, and z coordinate axes. Alternatively, two opposite vertices of the box describe it. We chose the two-point representation for lrt's BBox class (WHY); it stores the positions of the vertex with minimum x, y, and z values, and the one with maximum x, y, and z. A 2D illustration of a bounding box and its representation is shown in Figure 2.7.

Figure 2.7: An example axis-aligned bounding box. The BBox stores only the coordinates of the minimum and maximum points of this box; all other box corners are implicit in this representation.

The default BBox constructor sets the extent to be degenerate; by violating the invariant that pMin.x <= pMax.x, etc., it ensures that any operations done with this box will have the correct result for a completely empty box.

⟨BBox Public Methods⟩ ≡
    BBox() {
        pMin = Point( INFINITY,  INFINITY,  INFINITY);
        pMax = Point(-INFINITY, -INFINITY, -INFINITY);
    }

⟨BBox Public Data⟩ ≡
    Point pMin, pMax;

It is also useful to be able to initialize a BBox to enclose a single point.

⟨BBox Public Methods⟩ +≡
    BBox(const Point &p) : pMin(p), pMax(p) { }

If the caller passes two corner points (p1 and p2) to define the box, since p1 and p2 are not necessarily chosen so that p1.x <= p2.x, etc., we need to find their minimum and maximum values component-wise.

⟨BBox Public Methods⟩ +≡
    BBox(const Point &p1, const Point &p2) {
        pMin = Point(min(p1.x, p2.x), min(p1.y, p2.y), min(p1.z, p2.z));
        pMax = Point(max(p1.x, p2.x), max(p1.y, p2.y), max(p1.z, p2.z));
    }

Given a bounding box and a point, the BBox::Union() method computes and returns a new bounding box that encompasses that point as well as the space that the original box encompassed.

⟨BBox Method Definitions⟩ ≡
    BBox Union(const BBox &b, const Point &p) {
        BBox ret = b;
        ret.pMin.x = min(b.pMin.x, p.x);
        ret.pMin.y = min(b.pMin.y, p.y);
        ret.pMin.z = min(b.pMin.z, p.z);
        ret.pMax.x = max(b.pMax.x, p.x);
        ret.pMax.y = max(b.pMax.y, p.y);
        ret.pMax.z = max(b.pMax.z, p.z);
        return ret;
    }

We can similarly construct a new box that bounds the space encompassed by two other bounding boxes. The definition of this function is similar to the Union() method above that takes a Point; the difference is that the pMin and pMax of the second box are used for the min() and max() tests, respectively.

⟨BBox Public Methods⟩ +≡
    friend BBox Union(const BBox &b, const BBox &b2);

We can easily determine if two BBoxes overlap by seeing if their extents overlap in x, y, and z.

⟨BBox Public Methods⟩ +≡
    bool Overlaps(const BBox &b) const {
        bool x = (pMax.x >= b.pMin.x) && (pMin.x <= b.pMax.x);
        bool y = (pMax.y >= b.pMin.y) && (pMin.y <= b.pMax.y);
        bool z = (pMax.z >= b.pMin.z) && (pMin.z <= b.pMax.z);
        return (x && y && z);
    }
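A short usage sketch (hypothetical calling code, not from lrt; the pts array and nPts count are assumed): because the default BBox is degenerate, a bound over a set of points can be built up with repeated calls to the Union() function defined above:

    BBox bound;                       // degenerate (empty) to start
    for (int i = 0; i < nPts; ++i)
        bound = Union(bound, pts[i]);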
Three simple 1D containment tests tell us if a given point is inside the bounding box.

⟨BBox Public Methods⟩ +≡
    bool Inside(const Point &pt) const {
        return (pt.x >= pMin.x && pt.x <= pMax.x &&
                pt.y >= pMin.y && pt.y <= pMax.y &&
                pt.z >= pMin.z && pt.z <= pMax.z);
    }

The BBox::Expand() method pads the bounding box by a constant factor, and BBox::Volume() returns the volume of the space inside the box.

⟨BBox Public Methods⟩ +≡
    void Expand(Float delta) {
        pMin -= Vector(delta, delta, delta);
        pMax += Vector(delta, delta, delta);
    }

⟨BBox Public Methods⟩ +≡
    Float Volume() const {
        Vector d = pMax - pMin;
        return d.x * d.y * d.z;
    }

The BBox::MaximumExtent() method tells the caller which of the three axes is longest. This is very useful, for example, when deciding along which axis to subdivide when building a kd-tree.

⟨BBox Public Methods⟩ +≡
    int MaximumExtent() const {
        Vector diag = pMax - pMin;
        if (diag.x > diag.y && diag.x > diag.z)
            return 0;
        else if (diag.y > diag.z)
            return 1;
        else
            return 2;
    }

Finally, the BBox provides a method that returns the center and radius of a sphere that bounds the bounding box. In general, this may give a far looser fit than a sphere that bounded the original contents of the BBox directly, though it's a useful method to have available. For example, in chapter 15, we use this method to get a sphere that completely bounds the scene in order to generate a random ray that is likely to intersect the scene geometry. Maybe this method should move to that chapter?

⟨BBox Method Definitions⟩ +≡
    void BBox::BoundingSphere(Point *c, Float *rad) const {
        *c = .5f * pMin + .5f * pMax;
        *rad = Distance(*c, pMax);
    }

2.6 Transformations

In general, a transformation T can be described as a mapping from points to points and from vectors to vectors:

\[
p' = T(p) \qquad v' = T(v)
\]

The transformation T may be an arbitrary procedure. However, we will consider a subset of all possible transformations in this chapter. In particular, they will be:

Linear: If T is an arbitrary linear transformation and s is an arbitrary scalar, then T(sv) = sT(v) and T(v_1 + v_2) = T(v_1) + T(v_2). These two properties can greatly simplify reasoning about transformations.

Continuous: roughly speaking, T maps the neighborhoods around p and v to ones around p' and v'.

One-to-one and invertible: for each p, T maps p to a single unique p'. Furthermore, there exists an inverse transform T⁻¹ that maps p' back to p.

We will often want to take a point, vector, or normal defined with respect to one coordinate frame and find its coordinate values with respect to another frame. Using basic properties of linear algebra, a 4x4 matrix can express the linear transformation of a point or vector from one frame to another. Furthermore, such a 4x4 matrix suffices to express all linear transformations of points and vectors within a fixed frame, such as translation in space or rotation around a point. Therefore, there are two different (and incompatible!) ways that a matrix can be interpreted:

1. Transformation of the frame: given a point, the matrix could express how to compute a new point in the same frame that represents the transformation of the original point (e.g., by translating it in some direction).

2. Transformation from one frame to another: a matrix can express how a point in a new frame is computed given a point in an original frame.

In general, transformations make it possible to work in the most convenient coordinate space.
For example, we can write routines that define a virtual camera assuming that the camera is located at the origin, looks down the z axis, and has the y axis pointing up and the x axis pointing right. These assumptions greatly simplify the camera implementation. Then to place the camera at any point in the scene looking in any direction, we just construct a transformation that maps points in the scene's coordinate system to the camera's coordinate system.

2.6.1 Homogeneous coordinates

Given a frame defined by (p, v_1, v_2, v_3), there is ambiguity between the representation of a point (p_x, p_y, p_z) and a vector (v_x, v_y, v_z) with the same (x, y, z) coordinates. Using the representations of points and vectors introduced at the start of the chapter, we can write the point as the inner product

\[
[\,s_1\; s_2\; s_3\; 1\,]\,[\,v_1\; v_2\; v_3\; p\,]^T
\]

and the vector as the inner product

\[
[\,s'_1\; s'_2\; s'_3\; 0\,]\,[\,v_1\; v_2\; v_3\; p\,]^T.
\]

These four-vectors of three s_i values and a zero or one are homogeneous representations of the point and the vector. The fourth coordinate of the homogeneous representation is sometimes called the weight. For a point, its value can be any scalar other than zero: the homogeneous points (1, 3, −2, 1) and (−2, −6, 4, −2) describe the same Cartesian point (1, 3, −2). In general, homogeneous points obey the identity:

\[
(x, y, z, w) = \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}, 1\right)
\]

We will use these facts to see how a transformation matrix can describe how points and vectors in one frame can be mapped to another frame. Consider a matrix M that describes the transformation from one coordinate system to another:

\[
M = \begin{pmatrix}
m_{00} & m_{01} & m_{02} & m_{03} \\
m_{10} & m_{11} & m_{12} & m_{13} \\
m_{20} & m_{21} & m_{22} & m_{23} \\
m_{30} & m_{31} & m_{32} & m_{33}
\end{pmatrix}
\]

Then if the transformation represented by M is applied to the x axis (1, 0, 0), we have:

\[
M\,[\,1\; 0\; 0\; 0\,]^T = [\,m_{00}\; m_{10}\; m_{20}\; m_{30}\,]^T.
\]

Directly reading the columns of the matrix shows how the basis vectors and the origin of the current coordinate system are transformed by the matrix:

\[
x = [\,1\; 0\; 0\; 0\,]^T \quad
y = [\,0\; 1\; 0\; 0\,]^T \quad
z = [\,0\; 0\; 1\; 0\,]^T \quad
p = [\,0\; 0\; 0\; 1\,]^T
\]

In general, by characterizing how the basis is transformed, we know how any point or vector specified in terms of that basis is transformed. Because points and vectors in the current coordinate system are expressed in terms of the current coordinate system's frame, applying the transformation to them directly is equivalent to applying the transformation to the current coordinate system's basis and finding their coordinates in terms of the transformed basis.

We will not use homogeneous coordinates explicitly in our code; there is no Homogeneous class. However, the various transformation routines in the next section will implicitly convert points, vectors, and normals to homogeneous form, transform the homogeneous points, and then convert them back before returning the result. This isolates the details of homogeneous coordinates in one place (namely, the implementation of transformations), and leaves the rest of the system clean.

⟨Transform Declarations⟩ ≡
    class COREDLL Transform {
    public:
        ⟨Transform Public Methods⟩
    private:
        ⟨Transform Private Data⟩
    };

A transformation is represented by the elements of the matrix m[4][4], a Reference<> to a Matrix4x4 object. The automatic reference-counting template class Reference<> is described in Appendix A.3.2; it tracks how many objects hold a reference to the reference-counted object and automatically frees its memory when no more references are held. The low-level Matrix4x4 class is defined in Appendix A.4.2.
m is stored in row-major form, so element m[i][j] corresponds to $m_{i,j}$, where $i$ is the row number and $j$ is the column number. For convenience, the Transform also stores the inverse of the matrix m in the Transform::m_inv member; it will be handy to have the inverse easily available. Note that it would be possible to compute the inverse of the matrix lazily, in case it is not needed. We don't do this because in practice we find that the inverse of the matrix is almost always needed. Also, most of the transformations in lrt explicitly provide their inverse, so actual matrix inversion is rarely required.

Transform stores references to matrices rather than storing them directly, so that multiple Transforms can point to the same matrices. This means that any instance of the Transform class takes up very little memory on its own. If a huge number of shapes in the scene have the same object-to-world transformation, then they can all have their own Transform objects but share the same Matrix4x4s. Since we only store a pointer to the matrices and a reference count, a second transform that can re-use an existing matrix saves 72 bytes of storage over an implementation where each shape has its own Matrix4x4. This savings can be substantial in large scenes. However, we lose a certain amount of flexibility by allowing matrices to be shared between transformations. Specifically, the elements of a Matrix4x4 cannot be modified after it is created. This isn't a problem in practice, since the transformations in a scene are typically created when lrt parses the scene description file and don't need to change later at rendering time.

⟨Transform Private Data⟩ ≡
    Reference<Matrix4x4> m, m_inv;

2.6.2 Basic operations

When a new Transform is created, it will default to the identity transformation: the transformation that maps each point and each vector to itself. This is represented by the identity matrix:

$$I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Note that we rely on the default Matrix4x4 constructor to fill in the identity matrix.

⟨Transform Public Methods⟩ ≡
    Transform() {
        m = m_inv = new Matrix4x4;
    }

We can also construct a Transform from a given matrix. In this case, we must explicitly invert the given matrix.

⟨Transform Public Methods⟩ +≡
    Transform(Float mat[4][4]) {
        m = new Matrix4x4(mat[0][0], mat[0][1], mat[0][2], mat[0][3],
                          mat[1][0], mat[1][1], mat[1][2], mat[1][3],
                          mat[2][0], mat[2][1], mat[2][2], mat[2][3],
                          mat[3][0], mat[3][1], mat[3][2], mat[3][3]);
        m_inv = m->Inverse();
    }

⟨Transform Public Methods⟩ +≡
    Transform(const Reference<Matrix4x4> &mat) {
        m = mat;
        m_inv = m->Inverse();
    }

Finally, the most commonly used constructor takes a reference to the transformation matrix along with an explicitly provided inverse. This is far superior to always computing the inverse in the constructor, because many geometric transformations have very simple inverses and we can avoid the expense of computing a generic 4×4 matrix inverse. Of course, this places the burden on the caller to make sure that the supplied inverse is correct.

⟨Transform Public Methods⟩ +≡
    Transform(const Reference<Matrix4x4> &mat,
              const Reference<Matrix4x4> &minv) {
        m = mat;
        m_inv = minv;
    }
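As an illustrative sketch (not from the book's text), the explicit-inverse constructor is exactly what the creation functions in the following subsections rely on; a translation's inverse, for example, is just the opposite translation, so no general matrix inversion is needed:

    // Hypothetical example; the translation matrices themselves are derived
    // in Section 2.6.3 below.
    Matrix4x4 *m    = new Matrix4x4(1, 0, 0,  5,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1);
    Matrix4x4 *minv = new Matrix4x4(1, 0, 0, -5,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1);
    Transform t(m, minv);   // no call to Matrix4x4::Inverse() needed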
2.6.3 Translations

Figure 2.8: Translation in 2D.

One of the simplest transformations is the translation $T(\Delta x, \Delta y, \Delta z)$. When applied to a point $\mathbf{p}$, it translates $\mathbf{p}$'s coordinates by $\Delta x$, $\Delta y$, and $\Delta z$, as shown in Figure 2.8. As an example, $T(2, 2, 1)(x, y, z) = (x+2, y+2, z+1)$.

The translation has some simple properties:

$$T(0, 0, 0) = I$$
$$T(x_1, y_1, z_1)\, T(x_2, y_2, z_2) = T(x_1 + x_2,\ y_1 + y_2,\ z_1 + z_2)$$
$$T(x_1, y_1, z_1)\, T(x_2, y_2, z_2) = T(x_2, y_2, z_2)\, T(x_1, y_1, z_1)$$
$$T^{-1}(x, y, z) = T(-x, -y, -z)$$

Translation only affects points, leaving vectors unchanged. In matrix form, the translation transformation is:

$$T(\Delta x, \Delta y, \Delta z) = \begin{pmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

When we consider the operation of a translation matrix on a point, we see the value of homogeneous coordinates. Consider the product of the matrix for $T(\Delta x, \Delta y, \Delta z)$ with a point $\mathbf{p}$ in homogeneous coordinates $(x, y, z, 1)$:

$$\begin{pmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x + \Delta x \\ y + \Delta y \\ z + \Delta z \\ 1 \end{pmatrix}$$

As expected, we have computed a new point with its coordinates offset by $(\Delta x, \Delta y, \Delta z)$. However, if we apply $T$ to a vector $\mathbf{v}$, we have:

$$\begin{pmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}$$

The result is the same vector $\mathbf{v}$. This makes sense, because vectors represent directions, so a translation leaves them unchanged.

We will define a routine that creates a new Transform to represent a given translation; it is a straightforward application of the translation matrix equation. These routines fully initialize the Transform that is returned, also initializing the matrix that represents the inverse of the translation.

⟨Transform Method Definitions⟩ +≡
    Transform Translate(const Vector &delta) {
        Matrix4x4 *m, *minv;
        m = new Matrix4x4(1, 0, 0, delta.x,
                          0, 1, 0, delta.y,
                          0, 0, 1, delta.z,
                          0, 0, 0, 1);
        minv = new Matrix4x4(1, 0, 0, -delta.x,
                             0, 1, 0, -delta.y,
                             0, 0, 1, -delta.z,
                             0, 0, 0, 1);
        return Transform(m, minv);
    }

2.6.4 Scaling

Another basic transformation is the scale transform. This has the effect of taking a point or vector and multiplying its components by scale factors in x, y, and z: $S(2, 2, 1)(x, y, z) = (2x, 2y, z)$. It has the following basic properties:

$$S(1, 1, 1) = I$$
$$S(x_1, y_1, z_1)\, S(x_2, y_2, z_2) = S(x_1 x_2,\ y_1 y_2,\ z_1 z_2)$$
$$S^{-1}(x, y, z) = S\!\left(\frac{1}{x}, \frac{1}{y}, \frac{1}{z}\right)$$

We can differentiate between uniform scaling, where all three scale factors have the same value, and non-uniform scaling, where they may have different values. The general scale matrix is

$$S(x, y, z) = \begin{pmatrix} x & 0 & 0 & 0 \\ 0 & y & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

⟨Transform Method Definitions⟩ +≡
    Transform Scale(Float x, Float y, Float z) {
        Matrix4x4 *m, *minv;
        m = new Matrix4x4(x, 0, 0, 0,
                          0, y, 0, 0,
                          0, 0, z, 0,
                          0, 0, 0, 1);
        minv = new Matrix4x4(1.f/x, 0,     0,     0,
                             0,     1.f/y, 0,     0,
                             0,     0,     1.f/z, 0,
                             0,     0,     0,     1);
        return Transform(m, minv);
    }
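A short usage sketch (not from the book's text; applying transformations with the function-call syntax is covered in Section 2.7):

    Transform s = Scale(2.f, 2.f, 1.f);
    Point p(1, 1, 1);
    Point ps = s(p);      // (2, 2, 1)
    Vector v(1, 1, 1);
    Vector vs = s(v);     // (2, 2, 1): scales affect vectors, unlike translations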
2.6.5 X, Y, and Z axis rotations

Another useful type of transformation is rotation. In general, we can define an arbitrary axis from the origin in any direction and then rotate around that axis by a given angle. The most common rotations of this type are around the x, y, and z coordinate axes. We will write these rotations as $R_x(\theta)$, $R_y(\theta)$, etc. The rotation around an arbitrary axis $(x, y, z)$ is denoted by $R_{(x,y,z)}(\theta)$.

Rotations also have some basic properties:

$$R_a(0) = I$$
$$R_a(\theta_1)\, R_a(\theta_2) = R_a(\theta_1 + \theta_2)$$
$$R_a(\theta_1)\, R_a(\theta_2) = R_a(\theta_2)\, R_a(\theta_1)$$
$$R_a^{-1}(\theta) = R_a(-\theta) = R_a^T(\theta)$$

where $R^T$ is the matrix transpose of $R$. This last property, that the inverse of $R$ is equal to its transpose, stems from the fact that $R$ is an orthonormal matrix; its upper 3×3 components are all normalized and orthogonal to each other. Fortunately, the transpose is much easier to compute than a full matrix inverse.

For a left-handed coordinate system, the matrix for rotation around the x axis is:

$$R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Figure 2.9 gives an intuition for how this matrix works. It's easy to see that it leaves the x axis unchanged:

$$R_x(\theta)\,[1\ 0\ 0\ 0]^T = [1\ 0\ 0\ 0]^T$$

Figure 2.9: Rotation by an angle θ about the x axis leaves the x coordinate unchanged. The y and z axes are mapped to the vectors given by the dashed lines; y and z coordinates move accordingly.

It maps the y axis $(0, 1, 0)$ to $(0, \cos\theta, \sin\theta)$ and the z axis to $(0, -\sin\theta, \cos\theta)$. In general, by reading the columns of $R_x(\theta)$ we can easily find the vectors that the original coordinate axes transform to. The y and z axes remain in the same plane, perpendicular to the x axis, but are rotated by the given angle. An arbitrary point in space is similarly rotated about x while staying in the same yz plane as it was originally.

The implementation of the RotateX() creation function is straightforward.

⟨Transform Method Definitions⟩ +≡
    Transform RotateX(Float angle) {
        Float sin_t = sinf(Radians(angle));
        Float cos_t = cosf(Radians(angle));
        Matrix4x4 *m = new Matrix4x4(1,     0,      0, 0,
                                     0, cos_t, -sin_t, 0,
                                     0, sin_t,  cos_t, 0,
                                     0,     0,      0, 1);
        return Transform(m, m->Transpose());
    }

Similarly, for rotation around y and z, we have

$$R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The implementations of RotateY() and RotateZ() follow directly and are not included here.
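A quick numeric check (a sketch, not from the book's text; vector transformation via operator() is defined in Section 2.7): per the matrix above, a 90-degree rotation about x should map the y axis to $(0, \cos 90°, \sin 90°) = (0, 0, 1)$.

    Transform rx = RotateX(90.f);
    Vector y(0.f, 1.f, 0.f);
    Vector z = rx(y);   // approximately (0, 0, 1)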
2.6.6 Rotation around an arbitrary axis

We also provide a routine to compute the transformation that represents rotation around an arbitrary axis. The usual derivation of this matrix is based on computing rotations that map the given axis to a fixed axis (e.g., z), performing the rotation there, and then rotating the fixed axis back to the original axis. A more elegant derivation can be constructed with vector algebra.

Figure 2.10: Rotation about an arbitrary axis a.

Consider a normalized direction vector $\mathbf{a}$ that gives the axis to rotate around by angle θ, and a vector $\mathbf{v}$ to be rotated (see Figure 2.10). First, we can compute the point $\mathbf{p}$ along the axis $\mathbf{a}$ that is in the plane through the end-point of $\mathbf{v}$ and is perpendicular to $\mathbf{a}$. Assuming $\mathbf{v}$ and $\mathbf{a}$ form an angle α, we have:

$$\mathbf{p} = \mathbf{a}\, \|\mathbf{v}\| \cos\alpha = \mathbf{a}\, (\mathbf{v} \cdot \mathbf{a})$$

We now compute a pair of basis vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in this plane. Trivially, one of them is

$$\mathbf{v}_1 = \mathbf{v} - \mathbf{p}$$

and the other can be computed with a cross product

$$\mathbf{v}_2 = \mathbf{v}_1 \times \mathbf{a}$$

Because $\mathbf{a}$ is normalized, $\mathbf{v}_1$ and $\mathbf{v}_2$ have the same length, equal to the distance from $\mathbf{v}$ to $\mathbf{p}$. To now compute the rotation by θ degrees about $\mathbf{p}$ in the plane of rotation, the rotation formulas above give us

$$\mathbf{v}' = \mathbf{p} + \mathbf{v}_1 \cos\theta + \mathbf{v}_2 \sin\theta$$

To convert this to a rotation matrix, we apply this formula to the basis vectors $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (0, 1, 0)$, and $\mathbf{v}_3 = (0, 0, 1)$ to get the values of the rows of the matrix. The result of all this is encapsulated in the function below. Should this say columns, not rows?

⟨Transform Method Definitions⟩ +≡
    Transform Rotate(Float angle, const Vector &axis) {
        Vector a = axis.Hat();
        Float s = sinf(Radians(angle));
        Float c = cosf(Radians(angle));
        Float m[4][4];

        m[0][0] = a.x * a.x + (1.f - a.x * a.x) * c;
        m[0][1] = a.x * a.y * (1.f - c) - a.z * s;
        m[0][2] = a.x * a.z * (1.f - c) + a.y * s;
        m[0][3] = 0;

        m[1][0] = a.x * a.y * (1.f - c) + a.z * s;
        m[1][1] = a.y * a.y + (1.f - a.y * a.y) * c;
        m[1][2] = a.y * a.z * (1.f - c) - a.x * s;
        m[1][3] = 0;

        m[2][0] = a.x * a.z * (1.f - c) - a.y * s;
        m[2][1] = a.y * a.z * (1.f - c) + a.x * s;
        m[2][2] = a.z * a.z + (1.f - a.z * a.z) * c;
        m[2][3] = 0;

        m[3][0] = 0;
        m[3][1] = 0;
        m[3][2] = 0;
        m[3][3] = 1;

        Matrix4x4 *mat = new Matrix4x4(m);
        return Transform(mat, mat->Transpose());
    }

2.6.7 The look-at transformation

The look-at transformation is particularly useful for placing a camera in the scene. The caller specifies the desired position of the camera, a point the camera is looking at, and an "up" vector that orients the camera along the viewing direction implied by the first two parameters. All of these values are given in world-space coordinates. The look-at construction then gives a transformation between camera space and world space (see Figure 2.11).

Figure 2.11: Given a camera position, the position being looked at from the camera, and an "up" direction, the look-at transformation describes a transformation from a viewing coordinate system where the camera is at the origin looking down the z axis and the y axis is along the up direction.

In order to find the entries of the look-at transformation, we use principles described earlier in this section: the columns of a transformation matrix give the effect of the transformation on the basis of a coordinate system.

⟨Transform Method Definitions⟩ +≡
    Transform LookAt(const Point &pos, const Point &look, const Vector &up) {
        Float m[4][4];
        ⟨Initialize fourth column of viewing matrix⟩
        ⟨Initialize first three columns of viewing matrix⟩
        Matrix4x4 *camToWorld = new Matrix4x4(m);
        return Transform(camToWorld->Inverse(), camToWorld);
    }

The easiest column is the fourth one, which gives the point that the camera-space origin, $[0\ 0\ 0\ 1]^T$, maps to in world space. This is clearly just the coordinates of the camera position, supplied by the user.

⟨Initialize fourth column of viewing matrix⟩ ≡
    m[0][3] = pos.x;
    m[1][3] = pos.y;
    m[2][3] = pos.z;
    m[3][3] = 1;

The other three columns aren't much more difficult. First, LookAt() computes the normalized direction vector from the camera location to the look-at point; this gives the vector coordinates that the z axis should map to and thus the third column of the matrix. (Camera space is defined with the viewing direction down the z axis.) The first column, giving the world-space direction that the x axis in camera space maps to, is found by taking the cross product of the user-supplied "up" vector with the recently computed viewing direction vector. Finally, the "up" vector is recomputed by taking the cross product of the viewing direction vector with the x axis vector, thus ensuring that the y and z axes are perpendicular and we have an orthonormal viewing coordinate system.
⟨Initialize first three columns of viewing matrix⟩ ≡
    Vector dir = (look - pos).Hat();
    Vector right = Cross(dir, up.Hat());
    Vector newUp = Cross(right, dir);
    m[0][0] = right.x;
    m[1][0] = right.y;
    m[2][0] = right.z;
    m[3][0] = 0.;
    m[0][1] = newUp.x;
    m[1][1] = newUp.y;
    m[2][1] = newUp.z;
    m[3][1] = 0.;
    m[0][2] = dir.x;
    m[1][2] = dir.y;
    m[2][2] = dir.z;
    m[3][2] = 0.;

2.7 Applying Transforms

We can now define routines that perform the appropriate matrix multiplications to transform points and vectors. We will overload the function application operator to describe these transformations; this lets us write code like:

    Point P = ...;
    Transform T = ...;
    Point new_P = T(P);

2.7.1 Points

The point transformation routine takes a point $(x, y, z)$ and implicitly represents it as the homogeneous column vector $[x\ y\ z\ 1]^T$. It then transforms the point by premultiplying this vector with its transformation matrix. Finally, it divides by $w$ to convert back to a non-homogeneous point representation. For efficiency, it skips the divide by the homogeneous weight $w$ when $w = 1$, which is common for most of the transformations that we'll be using; only the projective transformations defined in Chapter 6 will require this divide.

⟨Transform Inline Functions⟩ ≡
    inline Point Transform::operator()(const Point &pt) const {
        Float x = pt.x, y = pt.y, z = pt.z;
        Float xp = m->m[0][0]*x + m->m[0][1]*y + m->m[0][2]*z + m->m[0][3];
        Float yp = m->m[1][0]*x + m->m[1][1]*y + m->m[1][2]*z + m->m[1][3];
        Float zp = m->m[2][0]*x + m->m[2][1]*y + m->m[2][2]*z + m->m[2][3];
        Float wp = m->m[3][0]*x + m->m[3][1]*y + m->m[3][2]*z + m->m[3][3];
        if (wp == 1.) return Point(xp, yp, zp);
        else          return Point(xp, yp, zp)/wp;
    }

We also provide transformation methods that let the caller pass in a pointer to an object for the result. This saves the expense of returning structures by value on the stack. Note that we copy the original $(x, y, z)$ coordinates to local variables in case the result pointer points to the same location as pt. This way, these routines can be used even if a point is being transformed in place. This is known as argument aliasing. why do we do this copy in the non-in-place version?

⟨Transform Inline Functions⟩ +≡
    inline void Transform::operator()(const Point &pt, Point *ptrans) const {
        Float x = pt.x, y = pt.y, z = pt.z;
        ptrans->x = m->m[0][0]*x + m->m[0][1]*y + m->m[0][2]*z + m->m[0][3];
        ptrans->y = m->m[1][0]*x + m->m[1][1]*y + m->m[1][2]*z + m->m[1][3];
        ptrans->z = m->m[2][0]*x + m->m[2][1]*y + m->m[2][2]*z + m->m[2][3];
        Float w   = m->m[3][0]*x + m->m[3][1]*y + m->m[3][2]*z + m->m[3][3];
        if (w != 1.) *ptrans /= w;
    }
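A brief sketch of both overloads in use (not from the book's text):

    Transform t = Translate(Vector(1, 0, 0));
    Point p(0, 0, 0);
    Point q = t(p);   // returned by value: (1, 0, 0)
    t(p, &p);         // written in place; safe because of the argument
                      // aliasing precaution described above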
2.7.2 Vectors

We compute the transformations of vectors in a similar fashion. However, the multiplication of the matrix and the column vector is simplified since the implicit homogeneous $w$ coordinate is zero.

⟨Transform Inline Functions⟩ +≡
    inline Vector Transform::operator()(const Vector &v) const {
        Float x = v.x, y = v.y, z = v.z;
        return Vector(m->m[0][0]*x + m->m[0][1]*y + m->m[0][2]*z,
                      m->m[1][0]*x + m->m[1][1]*y + m->m[1][2]*z,
                      m->m[2][0]*x + m->m[2][1]*y + m->m[2][2]*z);
    }

There is also a method allowing the caller to pass a pointer to the result object. The code to do this has a similar design to the Point transformation code, and is not shown here. This code will also be omitted for subsequent transformation methods.

2.7.3 Normals

Figure 2.12: Transforming surface normals: (a) original object; (b) scaled object with incorrect normal; (c) scaled object with correct normal. The circle in (a) is scaled by 50% in the y direction. Note that simply treating the normal as a direction and scaling it in the same manner, as shown in (b), will lead to incorrect results.

Normals do not transform in the same way that vectors do, as shown in Figure 2.12. Although tangent vectors transform in the straightforward way, normals require special treatment. Because the normal vector $\mathbf{n}$ and any tangent vector $\mathbf{t}$ are orthogonal by construction, we know that

$$\mathbf{n} \cdot \mathbf{t} = \mathbf{n}^T \mathbf{t} = 0$$

When we transform a point on the surface by some matrix $M$, the new tangent vector $\mathbf{t}'$ at the transformed point is simply $M\mathbf{t}$. The transformed normal $\mathbf{n}'$ should be equal to $S\mathbf{n}$ for some 4×4 matrix $S$. To maintain the orthogonality requirement, we must have:

$$0 = (\mathbf{n}')^T \mathbf{t}' = (S\mathbf{n})^T (M\mathbf{t}) = \mathbf{n}^T S^T M\, \mathbf{t}$$

This condition holds if $S^T M = I$, the identity matrix. Therefore, $S^T = M^{-1}$, so $S = (M^{-1})^T$, and we see that normals must be transformed by the inverse transpose of the transformation matrix. This is the main reason why Transforms maintain their inverses.

⟨Transform Public Methods⟩ +≡
    Transform GetInverse() const {
        return Transform(m_inv, m);
    }

Note that we do not explicitly compute the transpose of the inverse when transforming normals; we simply iterate through the inverse matrix in a different order (compare to the code for transforming Vectors).

⟨Transform Inline Functions⟩ +≡
    inline Normal Transform::operator()(const Normal &n) const {
        Float x = n.x, y = n.y, z = n.z;
        return Normal(m_inv->m[0][0] * x + m_inv->m[1][0] * y + m_inv->m[2][0] * z,
                      m_inv->m[0][1] * x + m_inv->m[1][1] * y + m_inv->m[2][1] * z,
                      m_inv->m[0][2] * x + m_inv->m[1][2] * y + m_inv->m[2][2] * z);
    }
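A concrete sketch of why this matters (hypothetical values, not from the book's text): consider the 45-degree plane normal $(1, 1, 0)$ under the scale $S(1, \tfrac{1}{2}, 1)$. A tangent $(1, -1, 0)$ maps to $(1, -\tfrac{1}{2}, 0)$; transforming the normal as a plain vector gives $(1, \tfrac{1}{2}, 0)$, which is no longer perpendicular to the transformed tangent, while the Normal overload above uses the inverse transpose and yields $(1, 2, 0)$, which is.

    Transform s = Scale(1.f, 0.5f, 1.f);
    Vector tangent = s(Vector(1.f, -1.f, 0.f));   // (1, -0.5, 0)
    Vector wrong   = s(Vector(1.f,  1.f, 0.f));   // (1,  0.5, 0): Dot(tangent, wrong) != 0
    Normal right   = s(Normal(1.f,  1.f, 0.f));   // (1,  2,   0): Dot(tangent, right) == 0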
2.7.4 Rays

Transforming rays is straightforward: we just transform the constituent origin and direction.

⟨Transform Inline Functions⟩ +≡
    inline Ray Transform::operator()(const Ray &r) const {
        Ray ret;
        (*this)(r.o, &ret.o);
        (*this)(r.d, &ret.d);
        ret.mint = r.mint;
        ret.maxt = r.maxt;
        ret.time = r.time;
        return ret;
    }

2.7.5 Bounding Boxes

The easiest way to transform an axis-aligned bounding box is to transform all eight of its corner vertices and then compute a new bounding box that encompasses those points. We will present code for this method below; one of the exercises for this chapter is to find a way to do this more efficiently.

⟨Transform Method Definitions⟩ +≡
    BBox Transform::operator()(const BBox &b) const {
        const Transform &M = *this;
        BBox ret(        M(Point(b.pMin.x, b.pMin.y, b.pMin.z)));
        ret = Union(ret, M(Point(b.pMax.x, b.pMin.y, b.pMin.z)));
        ret = Union(ret, M(Point(b.pMin.x, b.pMax.y, b.pMin.z)));
        ret = Union(ret, M(Point(b.pMin.x, b.pMin.y, b.pMax.z)));
        ret = Union(ret, M(Point(b.pMin.x, b.pMax.y, b.pMax.z)));
        ret = Union(ret, M(Point(b.pMax.x, b.pMax.y, b.pMin.z)));
        ret = Union(ret, M(Point(b.pMax.x, b.pMin.y, b.pMax.z)));
        ret = Union(ret, M(Point(b.pMax.x, b.pMax.y, b.pMax.z)));
        return ret;
    }

2.7.6 Composition of Transformations

Having defined how the matrices representing individual types of transformations are constructed, we can now consider an aggregate transformation resulting from a series of individual transformations. Finally, we can see the real value of representing transformations with matrices.

Consider a series of transformations ABC. We'd like to compute a new transformation T such that applying T gives the same result as applying each of A, B, and C in order; i.e., $A(B(C(\mathbf{p}))) = T(\mathbf{p})$. Such a transformation T can be computed by multiplying the matrices of the transformations A, B, and C together. In lrt, we can write:

    Transform T = A * B * C;

Then we can apply T to Points p as usual, Point pp = T(p), instead of applying each transformation in turn: Point pp = A(B(C(p))).

We use the C++ * operator to compute the new transformation that results from post-multiplying the current transformation with a new transformation t2. In matrix multiplication, the $(i, j)$th element of the resulting matrix ret is the inner product of the $i$th row of the first matrix with the $j$th column of the second. The inverse of the resulting transformation is equal to the product of t2.m_inv * m_inv; this is a result of the matrix identity

$$(AB)^{-1} = B^{-1} A^{-1}$$

⟨Transform Method Definitions⟩ +≡
    Transform Transform::operator*(const Transform &t2) const {
        Reference<Matrix4x4> m1 = Matrix4x4::Mul(m, t2.m);
        Reference<Matrix4x4> m2 = Matrix4x4::Mul(t2.m_inv, m_inv);
        return Transform(m1, m2);
    }

2.7.7 Transformations and Coordinate System Handedness

Certain types of transformations change a left-handed coordinate system into a right-handed one, or vice versa. Some routines will need to know if the handedness of the source coordinate system is different from that of the destination. In particular, routines that want to ensure that a surface normal always points "outside" of a surface might need to invert the normal after transformation if the handedness changes. See Section 2.8 for an example.

Fortunately, it is easy to tell if handedness changes. This happens only when the determinant of the transformation's upper-left 3×3 submatrix is negative.

⟨Transform Method Definitions⟩ +≡
    bool Transform::SwapsHandedness() const {
        Float det = ((m->m[0][0] *
                      (m->m[1][1] * m->m[2][2] -
                       m->m[1][2] * m->m[2][1])) -
                     (m->m[0][1] *
                      (m->m[1][0] * m->m[2][2] -
                       m->m[1][2] * m->m[2][0])) +
                     (m->m[0][2] *
                      (m->m[1][0] * m->m[2][1] -
                       m->m[1][1] * m->m[2][0])));
        return det < 0.f;
    }
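A small usage sketch (not from the book's text):

    Transform mirror = Scale(1.f, 1.f, -1.f);
    bool flips = mirror.SwapsHandedness();         // true: determinant is -1
    bool keeps = RotateX(30.f).SwapsHandedness();  // false: rotations preserve handedness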
2.8 Differential Geometry

We will wrap up this chapter by developing a self-contained representation for the geometry of a particular point on a surface (typically the point of a ray intersection). This abstraction needs to hide the particular type of geometric shape the point lies on, allowing the shading and geometric operations in the rest of lrt to be implemented generically, without the need to distinguish between different shape types such as spheres and triangles. The information needed to do this includes:

- The 3D point p.
- The surface normal n at the point.
- (u, v) coordinates from the parameterization of the surface.
- The parametric partial derivatives ∂p/∂u and ∂p/∂v.
- The partial derivatives of the change in surface normal, ∂n/∂u and ∂n/∂v.
- A pointer to the Shape that the differential geometry lies on; the shape class will be introduced in the next chapter.

See Figure 2.13 for a depiction of these values.

Figure 2.13: The local differential geometry around a point p. The tangent vectors s and t are orthogonal vectors in the plane that is tangent to the surface at p. The parametric partial derivatives of the surface, ∂p/∂u and ∂p/∂v, also lie in the tangent plane but are not necessarily orthogonal. The surface normal n is given by the cross product of ∂p/∂u and ∂p/∂v. The vectors ∂n/∂u and ∂n/∂v (not shown here) record the differential change in surface normal as we move in u and v along the surface.

This representation assumes that shapes have a parametric description; i.e., that for some range of $(u, v)$ values, points on the surface are given by some function $f$ such that $\mathbf{p} = f(u, v)$. Although this isn't true for all shapes, all of the shapes that lrt supports do have at least a local parametric description, so we will stick with the parametric representation since this assumption will be helpful to us elsewhere (e.g., for anti-aliasing of textures in Chapter 11).

⟨DifferentialGeometry Declarations⟩ ≡
    struct DifferentialGeometry {
        DifferentialGeometry() { u = v = 0.; shape = NULL; }
        ⟨DifferentialGeometry Public Methods⟩
        ⟨DifferentialGeometry Public Data⟩
    };

DifferentialGeometry::nn is a unit-length version of the surface normal.

⟨DifferentialGeometry Public Data⟩ ≡
    Point p;
    Normal nn;
    Float u, v;
    const Shape *shape;

We also need to store the partial derivatives of the surface parameterization and the surface normal.

⟨DifferentialGeometry Public Data⟩ +≡
    Vector dpdu, dpdv;
    Vector dndu, dndv;

The DifferentialGeometry constructor only needs a few parameters: the point of interest, the partial derivatives, and the $(u, v)$ coordinates. It computes the normal as the cross product of the partial derivatives.

⟨DifferentialGeometry Method Definitions⟩ ≡
    DifferentialGeometry::DifferentialGeometry(const Point &P,
            const Vector &DPDU, const Vector &DPDV,
            const Vector &DNDU, const Vector &DNDV,
            Float uu, Float vv, const Shape *sh)
        : p(P), dpdu(DPDU), dpdv(DPDV), dndu(DNDU), dndv(DNDV) {
        ⟨Initialize DifferentialGeometry from parameters⟩
        ⟨Adjust normal based on orientation and handedness⟩
    }

⟨Initialize DifferentialGeometry from parameters⟩ ≡
    nn = Normal(Cross(dpdu, dpdv)).Hat();
    u = uu;
    v = vv;
    shape = sh;

The surface normal has special meaning to lrt; we assume that for closed shapes, the normal is oriented such that it points to the "outside" of the shape.
For example, this assumption will be used later when we need to decide if a ray is entering or leaving the volume enclosed by a shape. Furthermore, for geometry used as an area light source, light is emitted only from the side of the two-sided surface where the normal points; the other side is black.

Because normals have this special meaning, lrt provides a way for the user to reverse the orientation of the normal, flipping it to point in the opposite direction. The ReverseOrientation directive in lrt's input file flips the normal to point in the opposite, non-default direction. Therefore, we need to check if the given Shape has this flag set, and switch the normal's direction if so.

One other factor plays into the orientation of the normal and must be accounted for here. If the Shape's transformation matrix has switched the handedness of the object coordinate system from lrt's default left-handed coordinate system to a right-handed one, we need to switch the orientation of the normal as well. To see this, consider a scale matrix $S(1, 1, -1)$. We would naturally expect this scale to switch the direction of the normal, though because we compute the normal by $\mathbf{n} = \partial\mathbf{p}/\partial u \times \partial\mathbf{p}/\partial v$, it can be shown that

$$S(1,1,-1)\,\frac{\partial\mathbf{p}}{\partial u} \times S(1,1,-1)\,\frac{\partial\mathbf{p}}{\partial v} = S(-1,-1,1)\left(\frac{\partial\mathbf{p}}{\partial u} \times \frac{\partial\mathbf{p}}{\partial v}\right) = S(-1,-1,1)\,\mathbf{n} \neq S(1,1,-1)\,\mathbf{n}.$$

Therefore, we also need to manually flip the normal's direction if the transformation switches the handedness of the coordinate system, since the flip won't be accounted for by the computation of the normal's direction using the cross product. We only swap the normal's direction if one but not both of these two conditions is met; if both were met, their effect would cancel out. The exclusive-or operation lets us easily test this condition.

⟨Adjust normal based on orientation and handedness⟩ ≡
    if (shape->reverseOrientation ^ shape->transformSwapsHandedness)
        nn *= -1.f;

The functionality described in the text below is gone. This explanation needs to be moved to the BSDF code, which DOES do this stuff...

It is useful to be able to transform direction vectors from world space to the coordinate frame defined by the three basis directions s, t, and n. This maps the object's surface normal to the direction $(0, 0, 1)$, and can help to simplify shading computations by letting us think of them in a standard coordinate system. It is easy to show that given three orthogonal vectors s, t, and n in world space, the matrix M that transforms vectors in world space to the local differential geometry space is:

$$M = \begin{pmatrix} s_x & s_y & s_z \\ t_x & t_y & t_z \\ n_x & n_y & n_z \end{pmatrix} = \begin{pmatrix} \mathbf{s} \\ \mathbf{t} \\ \mathbf{n} \end{pmatrix}$$

To confirm this yourself, consider the value of $M\mathbf{n} = (\mathbf{s} \cdot \mathbf{n},\ \mathbf{t} \cdot \mathbf{n},\ \mathbf{n} \cdot \mathbf{n})$. Since s, t, and n are all orthonormal, the x and y components of $M\mathbf{n}$ are zero. Since n is normalized, $\mathbf{n} \cdot \mathbf{n} = 1$. Thus, $M\mathbf{n} = (0, 0, 1)$. In this case, we don't need to compute the inverse transpose of M to transform normals (recall the discussion of transforming normals in Section 2.7.3 on page 54). Because M is an orthonormal matrix (its rows and columns are mutually orthogonal and normalized), its inverse is equal to its transpose, so it is its own inverse transpose already. The function that takes vectors back from local space to world space just implements the transpose to invert M and does the appropriate dot products:
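The draft omits that code here; the following reconstruction sketch (the real version lives with the BSDF code, per the note above) shows both directions of the mapping:

    // Sketch (reconstructed, not the book's code): world -> local is
    // multiplication by M, i.e. three dot products against the rows s, t, n.
    Vector WorldToLocal(const Vector &v,
                        const Vector &s, const Vector &t, const Vector &n) {
        return Vector(Dot(v, s), Dot(v, t), Dot(v, n));
    }
    // local -> world is multiplication by M's transpose, whose columns are s, t, n.
    Vector LocalToWorld(const Vector &v,
                        const Vector &s, const Vector &t, const Vector &n) {
        return Vector(s.x * v.x + t.x * v.y + n.x * v.z,
                      s.y * v.x + t.y * v.y + n.y * v.z,
                      s.z * v.x + t.z * v.y + n.z * v.z);
    }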
Further Reading

DeRose, Goldman, and their collaborators have argued for an elegant "coordinate-free" approach to describing vector geometry for graphics, where the fact that positions and directions happen to be represented by $(x, y, z)$ coordinates with respect to a particular coordinate system is de-emphasized and where points and vectors themselves record which coordinate system they are expressed in terms of (Goldman 1985; DeRose 1989; Mann, Litke, and DeRose 1997). This makes it possible for a software layer to ensure that common errors like adding a vector in one coordinate system to a point in another coordinate system are transparently handled by transforming them to a common coordinate system first.

Schneider and Eberly's Geometric Tools for Computer Graphics is strongly influenced by the coordinate-free approach and covers the topics of this chapter in much greater depth. It is also full of useful geometry for graphics (Schneider and Eberly 2003).

A classic introduction to the topics of this chapter is Mathematical Elements for Computer Graphics by Rogers and Adams (Rogers and Adams 1990). Note that they use a row-vector representation of points and vectors, though, which means that our matrices would be transposed when expressed in their framework, and that they multiply points and vectors by matrices to transform them, pM, rather than multiplying matrices by points as we do, Mp.

There are many excellent books on linear algebra and vector geometry. We have found Lang's (Lang 1986) and Buck's (Buck 1978) to be good references on these respective topics.

Homogeneous coordinates lead to projective geometry, an elegant framework for XXX. Stolfi's book XXX (Stolfi 1991).

Akenine-Möller and Haines provide a graphics-based introduction to linear algebra (Akenine-Möller and Haines 2002), along with extensive material on ray-bounds and ray-oriented-bounding-box intersection.

The subtleties of how normal vectors are transformed were first widely understood after articles by Wallis and Turkowski (Wallis 1990; Turkowski 1990c).

Exercises

2.1 (Jim Arvo) Find a more efficient way to transform axis-aligned bounding boxes by taking advantage of the symmetries of the problem: because the eight corner points are linear combinations of three axis-aligned basis vectors and a single corner point, their transformed bounding box can be found much more efficiently than by the method we presented (Arvo 1990).

2.2 Instead of boxes, we could compute tighter bounds by using the intersections of many non-orthogonal slabs. Extend our bounding box class to allow the user to specify a bound comprised of arbitrary slabs. (Figure: axis-aligned bounding box; non-axis-aligned bounding box; arbitrary bounding slabs.)

2.3 Derive the matrices for rotation in a right-handed coordinate system.

3 Shapes

Shapes in lrt are the basic representations of geometry in a scene. Each specific shape in lrt (sphere, triangle, cone, etc.) is a subclass of the Shape base class. Thus, we can provide a generic interface to shapes that hides details about the specific type of shape. This abstraction makes extending the geometric capabilities of the system quite straightforward; the rest of lrt doesn't need to know what specific shape it is dealing with. The Shape class is purely geometric; it contains no information about the appearance of an object.
The Primitive class, introduced in Chapter 1, holds additional information about a shape, such as its material properties. The basic interface for Shapes is in the source file core/shape.h, and various common shape function definitions are in core/shape.cpp.

3.1 Basic Shape Interface

The Shape class in lrt is reference counted. This means that lrt keeps track of the number of outstanding pointers to a particular shape, and only deletes the shape when that reference count goes to zero. Although not foolproof or completely automatic, this is a form of garbage collection which relieves us from having to worry about freeing memory at the wrong time. The ReferenceCounted class handles all of this for us; its implementation is presented in Section A.3.2.

⟨Shape Declarations⟩ ≡
    class COREDLL Shape : public ReferenceCounted {
    public:
        ⟨Shape Interface⟩
        ⟨Shape Public Data⟩
    };

All shapes are defined in object coordinate space; for example, all spheres are defined in a coordinate system where the center of the sphere is at the origin. In order to place a sphere at another position in the scene, a transformation that describes the mapping from object space to world space must be provided. The Shape class stores both this transformation and its inverse. Shapes also take a boolean parameter, reverseOrientation, that records whether their surface normal directions should be reversed from their default orientations. This is useful because the orientation of the surface normal is used to determine which side of a shape is "outside". Its value is set via the ReverseOrientation statement in lrt input files. Shapes also store the result of the Transform::SwapsHandedness() call for their object-to-world transformation; this value is needed by the DifferentialGeometry constructor each time a ray intersection is found, so lrt computes it once and stores the result.

⟨Shape Method Definitions⟩ ≡
    Shape::Shape(const Transform &o2w, bool ro)
        : ObjectToWorld(o2w), WorldToObject(o2w.GetInverse()),
          reverseOrientation(ro),
          transformSwapsHandedness(o2w.SwapsHandedness()) {
    }

⟨Shape Public Data⟩ ≡
    const Transform ObjectToWorld, WorldToObject;
    const bool reverseOrientation, transformSwapsHandedness;

3.1.1 Bounding

Each Shape subclass must be capable of bounding itself with a bounding box. There are two different bounding methods. The first, ObjectBound(), returns a bounding box in the shape's object space, and the second, WorldBound(), returns a bounding box in world space. The implementation of the first method is left up to each individual shape, but there is a default implementation of the second method which simply transforms the object bound to world space. Shapes that can easily compute a tighter world-space bound should override this method, however. An example of such a shape is a triangle; see Figure 3.1.

Figure 3.1: If we compute a world-space bounding box of a triangle by transforming its object-space bounding box to world space and then finding the bounding box that encloses the resulting bounding box, a sloppy bound may result (top). However, if we first transform the triangle's vertices from object space to world space and then bound those vertices (bottom), we can do much better.

⟨Shape Interface⟩ +≡
    virtual BBox ObjectBound() const = 0;

⟨Shape Interface⟩ +≡
    virtual BBox WorldBound() const {
        return ObjectToWorld(ObjectBound());
    }
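As a sketch of what a minimal subclass looks like (hypothetical; not a shape that appears in lrt), a new shape need only chain to the Shape constructor and provide ObjectBound(); the inherited WorldBound() then transforms that bound to world space:

    class UnitCube : public Shape {
    public:
        UnitCube(const Transform &o2w, bool ro) : Shape(o2w, ro) { }
        BBox ObjectBound() const {
            return BBox(Point(-1, -1, -1), Point(1, 1, 1));
        }
        // A real shape would also override the intersection methods
        // introduced in Section 3.1.3 below.
    };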
3.1.2 Refinement

Not every shape needs to be capable of determining whether a ray intersects it. For example, a complex surface might first be tessellated into triangles, which can then be intersected directly. Another possibility is a shape that is a place-holder for a large amount of geometry that is stored on disk. We could store just the filename of the geometry file and the bounding box of the geometry inside of it in memory, and read the geometry in from disk only if a ray pierces the bounding box.

The default implementation of the Shape::CanIntersect() function indicates that a shape can provide an intersection, so only shapes that are non-intersectable need to override this method.

⟨Shape Interface⟩ +≡
    virtual bool CanIntersect() const { return true; }

If the shape cannot be intersected directly, it must provide a Shape::Refine() method that splits the shape into a group of new shapes, some of which may be intersectable and some of which may need further refinement. The default implementation of the Shape::Refine() method issues an error message; thus, shapes that are intersectable (which is the common case) do not have to provide an empty instance of this method. lrt will never call Shape::Refine() if Shape::CanIntersect() returns true.

⟨Shape Interface⟩ +≡
    virtual void Refine(vector<Reference<Shape> > &refined) const {
        Severe("Unimplemented Shape::Refine() method called");
    }

I think there should be more here, but not sure what to say.

3.1.3 Intersection

The Shape class provides two different intersection routines. The first, Shape::Intersect(), returns information about a single ray-shape intersection corresponding to the first intersection in the [mint, maxt] parametric range along the ray. The other, Shape::IntersectP(), is a predicate function that determines whether or not an intersection occurs, without returning any details about the intersection itself. Some shapes may be able to provide a very efficient implementation for IntersectP() that can determine whether an intersection exists without computing it at all.

There are a few important things to keep in mind when reading (and writing) intersection routines:

- The Ray structure contains Ray::mint and Ray::maxt variables which define a ray segment. Intersection routines should ignore any intersections that do not occur along this segment.
- If an intersection is found, its parametric distance along the ray should be stored in the pointer t_hitp that is passed into the intersection routine. If multiple intersections are present, the closest one should be returned.
- Information about an intersection position is stored in the DifferentialGeometry structure, which completely captures the local geometric properties of a surface. This type will be used heavily throughout lrt, and it serves to cleanly isolate the geometric portion of the ray tracer from the shading and illumination portions. The differential geometry class was defined in Section 2.8 on page 57.¹
- The rays passed into intersection routines will be in world space, so shapes are responsible for transforming them to object space if needed for intersection tests. The differential geometry returned should be in world space.
Rather than making the intersection routines pure virtual functions, the Shape class provides default implementations of the intersect routines that print an error message if they are called. All Shapes that return true from Shape::CanIntersect() must provide implementations of these functions; those that return false can depend on lrt to not call these routines on non-intersectable shapes. If these were pure virtual functions, then each non-intersectable shape would have to implement a similar default function. Why not the obvious default IntersectP that just calls Intersect and throws away the resulting DifferentialGeometry?

⟨Shape Interface⟩ +≡
    virtual bool Intersect(const Ray &ray, Float *t_hitp,
                           DifferentialGeometry *dg) const {
        Severe("Unimplemented Shape::Intersect() method called");
        return false;
    }
    virtual bool IntersectP(const Ray &ray) const {
        Severe("Unimplemented Shape::IntersectP() method called");
        return false;
    }

¹ Almost all ray tracers use this general idiom for returning geometric information about intersections with shapes. As an optimization, many will only partially initialize the intersection information when an intersection is found, storing just enough information so that the rest of the values can be computed later when actually needed. This approach saves work in the case where a closer intersection is later found with another shape. In our experience, the extra work to compute all the information isn't substantial, and for renderers that have complex scene data management algorithms (e.g. discarding geometry from main memory when too much memory is being used and writing it to disk), the deferred approach may fail because the shape is no longer in memory.
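For reference, the default that the question above contemplates might look like the following sketch; it would eliminate the duplicated error stub at the cost of computing a full DifferentialGeometry for what is logically a yes/no query:

    // Sketch of the alternative default, not lrt's actual code.
    bool Shape::IntersectP(const Ray &ray) const {
        Float t_hitp;
        DifferentialGeometry dg;
        return Intersect(ray, &t_hitp, &dg);
    }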
3.1.4 Shading Geometry

Some shapes (notably triangle meshes) support the idea of having two types of differential geometry at a point on the surface: the true geometry, which accurately reflects the local properties of the surface, and the shading geometry, which may have normals and tangents that are different from those of the true differential geometry. For triangle meshes, the user can provide normal vectors and tangents at the vertices of the mesh which are interpolated to give normals and tangents at points across the faces of triangles.

The GetShadingGeometry() method of the Shape returns the shading geometry for a DifferentialGeometry returned by the Intersect() routine. By default, the shading geometry matches the true geometry, so the default implementation just copies the true geometry. One subtlety is that an object-to-world transformation is passed to this routine; if the method needs to transform data from its object space to world space as part of computing the shading geometry, it must use this transformation rather than the Shape::ObjectToWorld transformation. This is an artifact of how object instancing is implemented in lrt (see Section 4.1.2).

⟨Shape Interface⟩ +≡
    virtual void GetShadingGeometry(const Transform &obj2world,
            const DifferentialGeometry &dg,
            DifferentialGeometry *dgShading) const {
        *dgShading = dg;
    }

3.1.5 Surface Area

In order to properly use Shapes as area lights, we need to be able to compute the surface area of a shape in object space. As with the intersection methods, this method will only be called for intersectable shapes.

⟨Shape Interface⟩ +≡
    virtual Float Area() const {
        Severe("Unimplemented Shape::Area() method called");
        return 0.;
    }

3.1.6 Sidedness

Many rendering systems, particularly those based on scan-line or z-buffer algorithms, support the concept of shapes being "one-sided": the shape is visible if seen from the front, but disappears when viewed from behind. In particular, if a geometric object is closed and always viewed from the outside, then the back-facing shapes can be discarded without changing the resulting image. This optimization can substantially improve the speed of these types of algorithms. The potential for improved performance is substantially reduced when using this technique with ray tracing, however, since we would need to perform the ray-object intersection before determining the surface normal to do the backfacing test. Furthermore, it can lead to a physically inconsistent scene description if one-sided objects are not in fact closed. (For example, a surface might block light when a shadow ray is traced from a light source to a point on another surface, but not if the shadow ray is traced in the other direction.) Therefore, lrt doesn't support this feature.

3.2 Spheres

⟨Sphere Declarations⟩ ≡
    class COREDLL Sphere : public Shape {
    public:
        ⟨Sphere Public Methods⟩
    private:
        ⟨Sphere Private Data⟩
    };

Spheres are a special case of a general type of surface called quadrics. Quadrics are surfaces described by quadratic polynomials in x, y, and z. They are the simplest type of curved surface that is useful to a ray tracer, and are an interesting introduction to more general ray intersection routines. lrt supports six types of quadrics: spheres, cones, disks (a special case of a cone), cylinders, hyperboloids, and paraboloids.

Most mathematical surfaces can be described in one of two main ways: in implicit form and in parametric form. An implicit function describes a 3D surface as:

$$f(x, y, z) = 0$$

The set of all points $(x, y, z)$ that fulfill this condition defines the surface. For a unit sphere at the origin, the familiar implicit equation is $x^2 + y^2 + z^2 - 1 = 0$. Only the set of $(x, y, z)$ one unit from the origin satisfies this constraint, giving us the unit sphere's surface.

Many surfaces can also be described parametrically using a function to map the 2D plane to 3D points on the surface. For example, a sphere can be described as a function of 2D spherical coordinates $(\theta, \phi)$, where θ ranges from 0 to π and φ ranges from 0 to 2π:

$$x = r \sin\theta \cos\phi$$
$$y = r \sin\theta \sin\phi$$
$$z = r \cos\theta$$

We can transform this function $f(\theta, \phi)$ into a function $f(u, v)$ over $[0, 1]^2$ with the substitution

$$\phi = u\,\phi_{max} \qquad \theta = \theta_{min} + v\,(\theta_{max} - \theta_{min})$$

This form is particularly useful for texture mapping, where we can directly use the $(u, v)$ values to map a texture defined over $[0, 1]^2$ to the sphere. As we describe the implementation of the sphere shape, we will make use of both the implicit and parametric descriptions of the shape, depending on which is a more natural way to approach the particular problem we're facing.

Figure 3.2: Basic setting for the sphere shape. It has a radius of r and XXX. A partial sphere may be described by specifying a maximum φ value.

3.2.1 Construction

Our Sphere class specifies a shape that is centered at the origin in object space; to place it elsewhere in the scene, the user must apply an appropriate transformation when specifying the sphere in the input file.
The radius of the sphere can have an arbitrary value, and the sphere's extent can be truncated in two different ways. First, minimum and maximum z values may be set; the parts of the sphere below and above these, respectively, are cut off. Second, if we consider the parameterization of the sphere in spherical coordinates, we can set a maximum φ value. The sphere sweeps out φ values from 0 to the given φmax such that the section of the sphere with spherical φ values above φmax is also removed.

⟨Sphere Method Definitions⟩ ≡
    Sphere::Sphere(const Transform &o2w, bool ro, Float rad,
                   Float z0, Float z1, Float pm)
        : Shape(o2w, ro) {
        radius = rad;
        zmin = Clamp(min(z0, z1), -radius, radius);
        zmax = Clamp(max(z0, z1), -radius, radius);
        thetaMin = acosf(zmin/radius);
        thetaMax = acosf(zmax/radius);
        phiMax = Radians(Clamp(pm, 0.0f, 360.0f));
    }

⟨Sphere Private Data⟩ ≡
    Float radius;
    Float phiMax;
    Float zmin, zmax;
    Float thetaMin, thetaMax;

3.2.2 Bounding

Computing a bounding box for a sphere is straightforward. We will use the values of zmin and zmax provided by the user to tighten up the bound when less than an entire sphere is being rendered. However, we won't do the extra work to look at φmax and see if we can compute a tighter bounding box when that is less than 2π. This is left as an exercise.

⟨Sphere Method Definitions⟩ +≡
    BBox Sphere::ObjectBound() const {
        return BBox(Point(-radius, -radius, zmin),
                    Point( radius,  radius, zmax));
    }

3.2.3 Intersection

The task of deriving an intersection test is simplified by the fact that the sphere is centered at the origin. However, if the sphere has been transformed to another position in world space, then we need to transform rays to object space before intersecting them with the sphere, using the world-to-object transformation. Once we have a ray in object space, we can go ahead and perform the intersection computation in object space.²

The entire intersection method is shown below.

⟨Sphere Method Definitions⟩ +≡
    bool Sphere::Intersect(const Ray &r, Float *t_hitp,
                           DifferentialGeometry *dg) const {
        Float phi;
        Point phit;
        ⟨Transform Ray to object space⟩
        ⟨Compute quadratic sphere coefficients⟩
        ⟨Solve quadratic equation for t values⟩
        ⟨Compute sphere hit position and φ⟩
        ⟨Test sphere intersection against clipping parameters⟩
        ⟨Find parametric representation of sphere hit⟩
        ⟨Initialize DifferentialGeometry from parametric information⟩
        ⟨Update t_hitp for quadric intersection⟩
        return true;
    }

We start by transforming the given world-space ray to the sphere's object space. The remainder of the intersection test will take place in that coordinate system.

⟨Transform Ray to object space⟩ ≡
    Ray ray;
    WorldToObject(r, &ray);

² This is something of a classic theme in computer graphics: by transforming the problem to a particular restricted case, we can more easily and efficiently do an intersection test (i.e., lots of math cancels out since the sphere is always at $(0, 0, 0)$). No overall generality is lost, since we can just apply an appropriate translation to the ray to account for spheres at other positions.
If we have a sphere centered at the origin with radius r, its implicit representation is

$$x^2 + y^2 + z^2 - r^2 = 0$$

By substituting the parametric representation of the ray (Equation 2.4.3) into the implicit sphere equation, we have:

$$(o_x + t d_x)^2 + (o_y + t d_y)^2 + (o_z + t d_z)^2 = r^2$$

Note that all elements of this equation besides t are known values. The t values where the equation holds give the parametric positions along the ray where the implicit sphere equation holds and thus the points along the ray where it intersects the sphere. We can expand this equation and gather the coefficients for a general quadratic in t:

$$A t^2 + B t + C = 0$$

where³

$$A = d_x^2 + d_y^2 + d_z^2$$
$$B = 2\,(d_x o_x + d_y o_y + d_z o_z)$$
$$C = o_x^2 + o_y^2 + o_z^2 - r^2$$

This directly translates to this fragment of source code.

⟨Compute quadratic sphere coefficients⟩ ≡
    Float A = ray.d.x*ray.d.x + ray.d.y*ray.d.y + ray.d.z*ray.d.z;
    Float B = 2 * (ray.d.x*ray.o.x + ray.d.y*ray.o.y + ray.d.z*ray.o.z);
    Float C = ray.o.x*ray.o.x + ray.o.y*ray.o.y +
              ray.o.z*ray.o.z - radius*radius;

We know there are two possible solutions to this quadratic equation, giving zero, one, or two non-imaginary t values where the ray intersects the sphere:

$$t_0 = \frac{-B - \sqrt{B^2 - 4AC}}{2A} \qquad t_1 = \frac{-B + \sqrt{B^2 - 4AC}}{2A}$$

We provide a Quadratic() utility function that solves a quadratic equation, returning false if there are no real solutions and returning true and setting t0 and t1 appropriately if there are solutions.

⟨Solve quadratic equation for t values⟩ ≡
    Float t0, t1;
    if (!Quadratic(A, B, C, &t0, &t1))
        return false;
    ⟨Compute intersection distance along ray⟩

⟨Global Inline Functions⟩ ≡
    inline bool Quadratic(Float A, Float B, Float C,
                          Float *t0, Float *t1) {
        ⟨Find quadratic discriminant⟩
        ⟨Compute quadratic t values⟩
    }

If the discriminant ($B^2 - 4AC$) is negative, then there are no real roots and the ray must miss the sphere.

⟨Find quadratic discriminant⟩ ≡
    Float discrim = B * B - 4.f * A * C;
    if (discrim < 0.) return false;
    Float rootDiscrim = sqrtf(discrim);

The usual version of the quadratic equation can give poor numeric precision when $B$ is nearly equal to $\pm\sqrt{B^2 - 4AC}$, due to cancellation error. It can be rewritten algebraically to a more stable form:

$$t_0 = \frac{q}{A} \qquad t_1 = \frac{C}{q}$$

where

$$q = \begin{cases} -\tfrac{1}{2}\left(B - \sqrt{B^2 - 4AC}\right) & \text{if } B < 0 \\ -\tfrac{1}{2}\left(B + \sqrt{B^2 - 4AC}\right) & \text{otherwise} \end{cases}$$

⟨Compute quadratic t values⟩ ≡
    Float q;
    if (B < 0) q = -.5f * (B - rootDiscrim);
    else       q = -.5f * (B + rootDiscrim);
    *t0 = q / A;
    *t1 = C / q;
    if (*t0 > *t1) swap(*t0, *t1);
    return true;

³ Some ray tracers require that the direction vector of a ray be normalized, meaning $A = 1$. This can lead to subtle errors, however, if the caller forgets to normalize the ray direction. Of course, these errors can be avoided by normalizing the direction in the ray constructor, but this wastes effort when the provided direction is already normalized. To avoid this needless complexity, lrt never insists on vector normalization unless it is mathematically necessary.
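The expression $t_1 = C/q$ in the stable form above deserves a word of justification, since the text leaves it implicit: by Vieta's formulas, the roots of $At^2 + Bt + C = 0$ satisfy $t_0\, t_1 = C/A$, so given the numerically robust root $t_0 = q/A$,

$$t_1 = \frac{C/A}{t_0} = \frac{C/A}{q/A} = \frac{C}{q}.$$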
Given the two intersection t values, we need to check them against the ray segment from mint to maxt. Since t0 is guaranteed to be less than t1 (and mint less than maxt), if t0 is greater than maxt or t1 is less than mint, then it is certain that both hits are out of the range of interest. Otherwise, t0 is the tentative hit distance. It may be less than mint, however, in which case we ignore it and try t1. If that is also out of range, we have no valid intersection. If there is an intersection, thit holds the distance to the hit.

⟨Compute intersection distance along ray⟩ ≡
    if (t0 > ray.maxt || t1 < ray.mint)
        return false;
    Float thit = t0;
    if (t0 < ray.mint) {
        thit = t1;
        if (thit > ray.maxt) return false;
    }

3.2.4 Partial Spheres

Now that we have the distance along the ray to the intersection with a full sphere, we need to handle partial spheres, specified with clipped z or φ ranges. Intersections that are in clipped areas need to be ignored. We start by computing the object-space position of the intersection, phit, and the φ value for the hit point. Taking the parametric equations for the sphere,

$$\frac{y}{x} = \frac{r \sin\theta \sin\phi}{r \sin\theta \cos\phi} = \tan\phi$$

so $\phi = \arctan(y/x)$.

⟨Compute sphere hit position and φ⟩ ≡
    phit = ray(thit);
    phi = atan2f(phit.y, phit.x);
    if (phi < 0.) phi += 2.f*M_PI;

We remap the result of the C standard library's atan2f function to a value between 0 and 2π, to match the sphere's original definition. We can now test the hit point against the specified minima and maxima for z and φ. If the t0 intersection wasn't actually valid, we try again with t1.

⟨Test sphere intersection against clipping parameters⟩ ≡
    if (phit.z < zmin || phit.z > zmax || phi > phiMax) {
        if (thit == t1) return false;
        if (t1 > ray.maxt) return false;
        thit = t1;
        ⟨Compute sphere hit position and φ⟩
        if (phit.z < zmin || phit.z > zmax || phi > phiMax)
            return false;
    }

At this point, we are sure that the ray hits the sphere, and we can fill in the DifferentialGeometry structure. We compute parametric u and v values by scaling the previously computed φ value for the hit to lie between 0 and 1 and by computing a v value between 0 and 1 for the hit point, based on the range of θ values for the given sphere. Then, we compute the parametric partial derivatives ∂p/∂u and ∂p/∂v.

⟨Find parametric representation of sphere hit⟩ ≡
    Float u = phi / phiMax;
    Float theta = acosf(phit.z / radius);
    Float v = (theta - thetaMin) / (thetaMax - thetaMin);
    ⟨Compute sphere ∂p/∂u and ∂p/∂v⟩
    ⟨Compute sphere ∂n/∂u and ∂n/∂v⟩

Computing the partial derivatives of a point on the sphere is a short exercise in algebra. Using the parametric definition of the sphere, we have:

$$x = r \sin\theta \cos\phi = r \sin\!\big(\theta_{min} + v(\theta_{max} - \theta_{min})\big) \cos(\phi_{max}\, u)$$

Consider the first component of ∂p/∂u, ∂x/∂u. Here θ depends only on v, so the factor $r\sin\theta$ is constant with respect to u and can be pulled out of the partial derivative; differentiating $\cos(\phi_{max} u)$ then pulls out a factor of $\phi_{max}$:

$$\frac{\partial x}{\partial u} = \frac{\partial}{\partial u}\big(r \sin\theta \cos\phi\big) = r \sin\theta \frac{\partial}{\partial u}\cos(\phi_{max}\, u) = r \sin\theta\, (-\phi_{max} \sin\phi)$$

Using a substitution based on the parametric definition of the sphere's y coordinate, $y = r\sin\theta\sin\phi$, this simplifies to

$$\frac{\partial x}{\partial u} = -\phi_{max}\, y$$

Similarly,

$$\frac{\partial y}{\partial u} = \phi_{max}\, x$$

and

$$\frac{\partial z}{\partial u} = 0$$

A similar process gives us ∂p/∂v. Altogether:

$$\frac{\partial \mathbf{p}}{\partial u} = (-\phi_{max}\, y,\ \phi_{max}\, x,\ 0)$$

$$\frac{\partial \mathbf{p}}{\partial v} = (\theta_{max} - \theta_{min})\,(z \cos\phi,\ z \sin\phi,\ -r \sin\theta)$$

⟨Compute sphere ∂p/∂u and ∂p/∂v⟩ ≡
    Float invzradius = 1.f / sqrtf(phit.x*phit.x + phit.y*phit.y);
    Float cosphi = phit.x * invzradius, sinphi = phit.y * invzradius;
    Vector dpdu(-phiMax * phit.y, phiMax * phit.x, 0);
    Vector dpdv = (thetaMax-thetaMin) *
        Vector(phit.z * cosphi, phit.z * sinphi,
               -radius * sinf(thetaMin + v * (thetaMax - thetaMin)));
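The invzradius computation in the fragment above is worth spelling out (the text leaves it implicit): on the sphere, $x = r\sin\theta\cos\phi$ and $y = r\sin\theta\sin\phi$, so $\sqrt{x^2 + y^2} = r\sin\theta$ and

$$\cos\phi = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin\phi = \frac{y}{\sqrt{x^2 + y^2}},$$

which lets the code recover $\cos\phi$ and $\sin\phi$ directly from the hit point without calling trigonometric functions.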
3.2.5 ***ADV***: Partial Derivatives of Normal Vectors

It is useful to determine how the normal changes as we move along the surface in the u and v directions. For example, some of the anti-aliasing techniques in Chapter 10 will use this information. The differential changes in normal, ∂n/∂u and ∂n/∂v, are given by the Weingarten equations from differential geometry:

$$\frac{\partial n}{\partial u} = \frac{fF - eG}{EG - F^2}\, \frac{\partial p}{\partial u} + \frac{eF - fE}{EG - F^2}\, \frac{\partial p}{\partial v}$$
$$\frac{\partial n}{\partial v} = \frac{gF - fG}{EG - F^2}\, \frac{\partial p}{\partial u} + \frac{fF - gE}{EG - F^2}\, \frac{\partial p}{\partial v}$$

where E, F, and G are coefficients of the first fundamental form, given by

$$E = \left|\frac{\partial p}{\partial u}\right|^2 \qquad F = \frac{\partial p}{\partial u} \cdot \frac{\partial p}{\partial v} \qquad G = \left|\frac{\partial p}{\partial v}\right|^2.$$

These are easily computed with the ∂p/∂u and ∂p/∂v values found above. e, f, and g are coefficients of the second fundamental form,

$$e = N \cdot \frac{\partial^2 p}{\partial u^2} \qquad f = N \cdot \frac{\partial^2 p}{\partial u\, \partial v} \qquad g = N \cdot \frac{\partial^2 p}{\partial v^2}.$$

The two fundamental forms have basic connections with the local curvature of a surface; see a differential geometry textbook such as Gray's (Gray 1993) for details. To find e, f, and g, we need to compute the second-order partial derivatives ∂²p/∂u², and so on. For spheres, a little more algebra gives the required second derivatives:

$$\frac{\partial^2 p}{\partial u^2} = -\phi_{max}^2\, (x, y, 0)$$
$$\frac{\partial^2 p}{\partial u\, \partial v} = (\theta_{max} - \theta_{min})\, z\, \phi_{max}\, (-\sin\phi, \cos\phi, 0)$$
$$\frac{\partial^2 p}{\partial v^2} = -(\theta_{max} - \theta_{min})^2\, (x, y, z)$$

The mixed partial derivative follows from differentiating ∂p/∂u = (−φmax y, φmax x, 0) with respect to v, using the ∂p/∂v expressions found earlier.

⟨Compute sphere ∂n/∂u and ∂n/∂v⟩ ≡
    Vector d2Pduu = -phiMax * phiMax * Vector(phit.x, phit.y, 0);
    Vector d2Pduv = (thetaMax - thetaMin) * phit.z * phiMax *
        Vector(-sinphi, cosphi, 0.);
    Vector d2Pdvv = -(thetaMax - thetaMin) * (thetaMax - thetaMin) *
        Vector(phit.x, phit.y, phit.z);
    ⟨Compute coefficients for fundamental forms⟩
    ⟨Compute ∂n/∂u and ∂n/∂v from fundamental form coefficients⟩

⟨Compute coefficients for fundamental forms⟩ ≡
    Float E = Dot(dpdu, dpdu);
    Float F = Dot(dpdu, dpdv);
    Float G = Dot(dpdv, dpdv);
    Vector N = Cross(dpdu, dpdv);
    Float e = Dot(N, d2Pduu);
    Float f = Dot(N, d2Pduv);
    Float g = Dot(N, d2Pdvv);

⟨Compute ∂n/∂u and ∂n/∂v from fundamental form coefficients⟩ ≡
    Float invEGF2 = 1.f / (E*G - F*F);
    Vector dndu = (f*F - e*G) * invEGF2 * dpdu +
                  (e*F - f*E) * invEGF2 * dpdv;
    Vector dndv = (g*F - f*G) * invEGF2 * dpdu +
                  (f*F - g*E) * invEGF2 * dpdv;

3.2.6 DifferentialGeometry Initialization

Now that we have computed the surface parameterization and all the relevant partial derivatives, we can construct the DifferentialGeometry structure for this intersection.

⟨Initialize DifferentialGeometry from parametric information⟩ ≡
    *dg = DifferentialGeometry(ObjectToWorld(phit),
        ObjectToWorld(dpdu), ObjectToWorld(dpdv),
        ObjectToWorld(dndu), ObjectToWorld(dndv),
        u, v, this);

Since there is an intersection, we update the value pointed to by t_hitp to hold the hit distance along the ray, which was stored in thit. This will allow subsequent intersection tests to terminate early if a potential hit would be farther away than the existing intersection.

⟨Update t_hitp for quadric intersection⟩ ≡
    *t_hitp = thit;

The sphere's IntersectP() routine is almost identical to Sphere::Intersect(), but it does not fill in the DifferentialGeometry structure. Because Intersect() and IntersectP() are always so closely related, we will not show IntersectP() for the remaining shapes.
⟨Sphere Method Definitions⟩ +≡
    bool Sphere::IntersectP(const Ray &r) const {
        Float phi;
        Point phit;
     -> ⟨Transform Ray to object space⟩
        ⟨Compute quadratic sphere coefficients⟩
        ⟨Solve quadratic equation for t values⟩
        ⟨Compute sphere hit position and φ⟩
        ⟨Test sphere intersection against clipping parameters⟩
        return true;
    }

3.2.7 Surface Area

To compute the surface area of quadrics, we use a standard formula from integral calculus. If we revolve a curve y = f(x) from x = a to x = b around the x axis, the surface area of the resulting swept surface is

$$2\pi \int_a^b f(x) \sqrt{1 + \left(f'(x)\right)^2}\; dx,$$

where f'(x) denotes the derivative df/dx.⁴ Since most of our surfaces of revolution are only partially swept around the axis, we will instead use the formula

$$\phi_{max} \int_a^b f(x) \sqrt{1 + \left(f'(x)\right)^2}\; dx.$$

Our sphere is a surface of revolution of a circular arc, so the function that defines its profile curve is

$$f(x) = \sqrt{r^2 - x^2},$$

and its derivative is

$$f'(x) = -\frac{x}{\sqrt{r^2 - x^2}}.$$

Recall that the sphere is clipped at zmin and zmax. The surface area is therefore

$$A = \phi_{max} \int_{z_0}^{z_1} \sqrt{r^2 - x^2}\, \sqrt{1 + \frac{x^2}{r^2 - x^2}}\; dx
   = \phi_{max} \int_{z_0}^{z_1} \sqrt{r^2 - x^2 + x^2}\; dx
   = \phi_{max} \int_{z_0}^{z_1} r\; dx
   = \phi_{max}\, r\, (z_1 - z_0).$$

For the full sphere, φmax = 2π, zmin = −r, and zmax = r, so we recover the standard formula A = 4πr², confirming that our formula is correct.

⟨Sphere Method Definitions⟩ +≡
    Float Sphere::Area() const {
        return phiMax * radius * (zmax-zmin);
    }

⁴See Anton for a derivation (Anton, Bivens, and Davis 2001).

3.3 Cylinders

Figure 3.3: Basic setting for the cylinder shape. It has a radius of r and covers a range of heights along the z-axis. A partial cylinder may be swept by specifying a maximum φ value.

cylinder.cpp* ≡
    #include "shape.h"
    ⟨Cylinder Declarations⟩
    ⟨Cylinder Method Definitions⟩

⟨Cylinder Declarations⟩ ≡
    class COREDLL Cylinder : public Shape {
    public:
        ⟨Cylinder Public Methods⟩
    protected:
        ⟨Cylinder Private Data⟩
    };

3.3.1 Construction

Another useful quadric is the cylinder; lrt provides cylinder Shapes that are centered around the z axis. The user supplies a minimum and maximum z value for the cylinder, as well as a radius and a maximum φ sweep value (see Figure 3.3). In parametric form, a cylinder is described by the equations:

$$\phi = u\, \phi_{max}$$
$$x = r \cos\phi$$
$$y = r \sin\phi$$
$$z = z_{min} + v\, (z_{max} - z_{min})$$

⟨Cylinder Method Definitions⟩ ≡
    Cylinder::Cylinder(const Transform &o2w, bool ro, Float rad,
            Float z0, Float z1, Float pm)
        : Shape(o2w, ro) {
        radius = rad;
        zmin = min(z0, z1);
        zmax = max(z0, z1);
        phiMax = Radians(Clamp(pm, 0.0f, 360.0f));
    }

⟨Cylinder Private Data⟩ ≡
    Float radius;
    Float zmin, zmax;
    Float phiMax;

3.3.2 Bounding

As we did with the sphere, we compute a conservative bounding box for the cylinder using the z range, but without taking the maximum φ into account.

⟨Cylinder Method Definitions⟩ +≡
    BBox Cylinder::ObjectBound() const {
        Point p1 = Point(-radius, -radius, zmin);
        Point p2 = Point( radius,  radius, zmax);
        return BBox(p1, p2);
    }

3.3.3 Intersection

We can intersect a ray with a cylinder by substituting the ray equation into the cylinder's implicit equation, similarly to the sphere case.
The implicit equation for an infinitely long cylinder centered on the z axis with radius r is

$$x^2 + y^2 - r^2 = 0.$$

Substituting the parametric ray equation, 2.4.3, we have

$$(o_{r,x} + t\,d_{r,x})^2 + (o_{r,y} + t\,d_{r,y})^2 = r^2.$$

When we expand this and find the coefficients of the quadratic equation At² + Bt + C = 0, we get:

$$A = d_{r,x}^2 + d_{r,y}^2$$
$$B = 2\,(d_{r,x} o_{r,x} + d_{r,y} o_{r,y})$$
$$C = o_{r,x}^2 + o_{r,y}^2 - r^2$$

⟨Compute quadratic cylinder coefficients⟩ ≡
    Float A = ray.d.x*ray.d.x + ray.d.y*ray.d.y;
    Float B = 2 * (ray.d.x*ray.o.x + ray.d.y*ray.o.y);
    Float C = ray.o.x*ray.o.x + ray.o.y*ray.o.y - radius*radius;

The solution process for the quadratic equation is similar for all quadric shapes, so some fragments from the Sphere intersection method will be re-used below. The fragments that are re-used from Sphere::Intersect() are marked with an arrow.

⟨Cylinder Method Definitions⟩ +≡
    bool Cylinder::Intersect(const Ray &r, Float *t_hitp,
            DifferentialGeometry *dg) const {
        Float phi;
        Point phit;
     -> ⟨Transform Ray to object space⟩
        ⟨Compute quadratic cylinder coefficients⟩
     -> ⟨Solve quadratic equation for t values⟩
        ⟨Compute cylinder hit point and φ⟩
        ⟨Test cylinder intersection against clipping parameters⟩
        ⟨Find parametric representation of cylinder hit⟩
     -> ⟨Initialize DifferentialGeometry from parametric information⟩
     -> ⟨Update t_hitp for quadric intersection⟩
        return true;
    }

3.3.4 Partial Cylinders

As with the sphere, we compute a φ value at the hit point by inverting the x and y parametric equations. In fact, the result is the same as for the sphere.

⟨Compute cylinder hit point and φ⟩ ≡
    phit = ray(thit);
    phi = atan2f(phit.y, phit.x);
    if (phi < 0.) phi += 2.f*M_PI;

We now make sure that the hit is in the specified z range and that the angle φ is acceptable. If not, we reject the hit and try t1, if we haven't already.

⟨Test cylinder intersection against clipping parameters⟩ ≡
    if (phit.z < zmin || phit.z > zmax || phi > phiMax) {
        if (thit == t1) return false;
        thit = t1;
        if (t1 > ray.maxt) return false;
        ⟨Compute cylinder hit point and φ⟩
        if (phit.z < zmin || phit.z > zmax || phi > phiMax)
            return false;
    }

Again, the u value is computed by scaling φ to lie between 0 and 1. Straightforward inversion of the parametric equation for the cylinder's z value gives us the v parametric coordinate.

⟨Find parametric representation of cylinder hit⟩ ≡
    Float u = phi / phiMax;
    Float v = (phit.z - zmin) / (zmax - zmin);
    ⟨Compute cylinder ∂p/∂u and ∂p/∂v⟩
    ⟨Compute cylinder ∂n/∂u and ∂n/∂v⟩

The partial derivatives for a cylinder are quite easy to derive:

$$\frac{\partial p}{\partial u} = (-\phi_{max}\, y,\ \phi_{max}\, x,\ 0)$$
$$\frac{\partial p}{\partial v} = (0,\ 0,\ z_{max} - z_{min})$$

⟨Compute cylinder ∂p/∂u and ∂p/∂v⟩ ≡
    Vector dpdu(-phiMax * phit.y, phiMax * phit.x, 0);
    Vector dpdv(0, 0, zmax - zmin);

We again use the Weingarten equations to compute the parametric change in the cylinder's normal.
The relevant second partial derivatives are

$$\frac{\partial^2 p}{\partial u^2} = -\phi_{max}^2\, (x, y, 0)$$
$$\frac{\partial^2 p}{\partial u\, \partial v} = (0, 0, 0)$$
$$\frac{\partial^2 p}{\partial v^2} = (0, 0, 0)$$

⟨Compute cylinder ∂n/∂u and ∂n/∂v⟩ ≡
    Vector d2Pduu = -phiMax * phiMax * Vector(phit.x, phit.y, 0);
    Vector d2Pduv(0, 0, 0), d2Pdvv(0, 0, 0);
    ⟨Compute coefficients for fundamental forms⟩
    ⟨Compute ∂n/∂u and ∂n/∂v from fundamental form coefficients⟩

3.3.5 Surface Area

A cylinder is just a rolled-up rectangle. If you unroll the rectangle, its height is zmax − zmin and its width is r φmax:

⟨Cylinder Method Definitions⟩ +≡
    Float Cylinder::Area() const {
        return (zmax-zmin)*phiMax*radius;
    }

3.4 Disks

Figure 3.4: Basic setting for the disk shape. The disk has radius r and is located at height h along the z-axis. A partial disk may be swept by specifying a maximum φ value.

disk.cpp* ≡
    #include "shape.h"
    ⟨Disk Declarations⟩
    ⟨Disk Method Definitions⟩

⟨Disk Declarations⟩ ≡
    class COREDLL Disk : public Shape {
    public:
        ⟨Disk Public Methods⟩
    private:
        ⟨Disk Private Data⟩
    };

The disk is an interesting quadric since it has a particularly straightforward intersection routine that avoids solving the quadratic equation. In lrt, a Disk is a circular disk of radius r at height h along the z axis. In order to make partial disks, the caller may specify a maximum φ value beyond which the disk is cut off (Figure 3.4). In parametric form, it is described by:

$$\phi = u\, \phi_{max}$$
$$x = r\, (1 - v) \cos\phi$$
$$y = r\, (1 - v) \sin\phi$$
$$z = h$$

3.4.1 Construction

⟨Disk Method Definitions⟩ ≡
    Disk::Disk(const Transform &o2w, bool ro, Float ht,
            Float r, Float tmax)
        : Shape(o2w, ro) {
        height = ht;
        radius = r;
        phiMax = Radians(Clamp(tmax, 0.0f, 360.0f));
    }

⟨Disk Private Data⟩ ≡
    Float height, radius, phiMax;

3.4.2 Bounding

The bounding method is quite straightforward: we create a bounding box centered at the height of the disk along z, with an extent of radius in both the x and y directions.

⟨Disk Method Definitions⟩ +≡
    BBox Disk::ObjectBound() const {
        return BBox(Point(-radius, -radius, height),
                    Point( radius,  radius, height));
    }

3.4.3 Intersection

Intersecting a ray with a disk is also quite easy. We intersect the ray with the z = h plane that the disk lies in and then see if the intersection point lies inside the disk. Again, the re-used fragments are marked with an arrow.

⟨Disk Method Definitions⟩ +≡
    bool Disk::Intersect(const Ray &r, Float *t_hitp,
            DifferentialGeometry *dg) const {
     -> ⟨Transform Ray to object space⟩
        ⟨Compute plane intersection for disk⟩
        ⟨See if hit point is inside disk radius and φmax⟩
        ⟨Find parametric representation of disk hit⟩
     -> ⟨Initialize DifferentialGeometry from parametric information⟩
     -> ⟨Update t_hitp for quadric intersection⟩
        return true;
    }

The first step is to compute the parametric t value where the ray intersects the plane that the disk lies in. Using the same approach as we did for intersecting rays with boxes, we want to find t such that the z component of the ray's position is equal to the height of the disk. Thus,

$$h = o_{r,z} + t\, d_{r,z}$$

and

$$t = \frac{h - o_{r,z}}{d_{r,z}}.$$

We first check whether the ray is parallel to the disk's plane, in which case we report no intersection. We then see if t is inside the legal range of values [mint, maxt]. If not, we can return false.
⟨Compute plane intersection for disk⟩ ≡
    if (fabsf(ray.d.z) < 1e-7) return false;
    Float thit = (height - ray.o.z) / ray.d.z;
    if (thit < ray.mint || thit > ray.maxt)
        return false;

The comparison against the small constant 1e-7, rather than against exactly zero, conservatively treats rays that are nearly parallel to the disk's plane as parallel, avoiding numerical trouble from dividing by a very small dr,z.

We now compute the point phit where the ray intersects the plane. Once the plane intersection is known, we return false if the distance from the hit to the center of the disk is more than radius. We optimize this process by computing the squared distance to the center, taking advantage of the fact that the x and y coordinates of the center point (0, 0, height) are zero, and that the z coordinate of phit is equal to height.

⟨See if hit point is inside disk radius and φmax⟩ ≡
    Point phit = ray(thit);
    Float dist2 = phit.x * phit.x + phit.y * phit.y;
    if (dist2 > radius * radius)
        return false;
    ⟨Test disk φ value against φmax⟩

If the distance check passes, we perform the final test, making sure that the φ value of the hit point is between zero and the φmax specified by the caller. Inverting the disk's parameterization gives us the same expression for φ as for the other quadric shapes.

⟨Test disk φ value against φmax⟩ ≡
    Float phi = atan2f(phit.y, phit.x);
    if (phi < 0) phi += 2. * M_PI;
    if (phi > phiMax)
        return false;

If we've gotten this far, we know that there is an intersection with the disk. The parameter u is scaled to reflect the partial disk specified by φmax, and v is computed by inverting the parametric equation. The equations for the partial derivatives at the hit point can be derived with a process similar to that used for the previous quadrics. Because the normal of a disk is the same everywhere, the partial derivatives ∂n/∂u and ∂n/∂v are both trivially (0, 0, 0).

⟨Find parametric representation of disk hit⟩ ≡
    Float u = phi / phiMax;
    Float v = 1.f - (sqrtf(dist2) / radius);
    Vector dpdu(-phiMax * phit.y, phiMax * phit.x, 0.);
    Vector dpdv(-phit.x / (1-v), -phit.y / (1-v), 0.);
    Vector dndu(0,0,0), dndv(0,0,0);

3.4.4 Surface Area

Disks have trivial surface area, since they are just portions of a circle:

$$A = \frac{\phi_{max}}{2}\, r^2$$

⟨Disk Method Definitions⟩ +≡
    Float Disk::Area() const {
        return phiMax * 0.5f * radius * radius;
    }

3.5 Other Quadrics

lrt supports three more quadrics: cones, paraboloids, and hyperboloids. They are implemented in the source files shapes/cone.cpp, shapes/paraboloid.cpp, and shapes/hyperboloid.cpp. We won't include their full implementations here, since the techniques used to derive their quadratic intersection coefficients, parametric coordinates, and partial derivatives should now be familiar. However, we will briefly describe the implicit and parametric forms of these shapes.

3.5.1 Cones

The implicit equation of a cone centered on the z axis with radius r and height h is

$$\left(\frac{hx}{r}\right)^2 + \left(\frac{hy}{r}\right)^2 - (z - h)^2 = 0.$$

Cones are also described parametrically:

$$\phi = u\, \phi_{max}$$
$$x = r\, (1 - v) \cos\phi$$
$$y = r\, (1 - v) \sin\phi$$
$$z = v\, h$$

The partial derivatives at a point on a cone are

$$\frac{\partial p}{\partial u} = (-\phi_{max}\, y,\ \phi_{max}\, x,\ 0)$$
$$\frac{\partial p}{\partial v} = \left(-\frac{x}{1 - v},\ -\frac{y}{1 - v},\ h\right)$$

and the second partial derivatives are

$$\frac{\partial^2 p}{\partial u^2} = -\phi_{max}^2\, (x, y, 0)$$
$$\frac{\partial^2 p}{\partial u\, \partial v} = \frac{\phi_{max}}{1 - v}\, (y, -x, 0)$$
$$\frac{\partial^2 p}{\partial v^2} = (0, 0, 0).$$
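Although the full cone implementation isn't shown here, its quadratic coefficients follow from the same substitution used for the sphere and cylinder. As a sketch (the actual code in shapes/cone.cpp may organize this differently): dividing the implicit equation through by (h/r)² gives x² + y² − k (z − h)² = 0 with k = (r/h)², and substituting the parametric ray equation yields

    // Sketch of the cone's quadratic coefficients; the real
    // implementation in shapes/cone.cpp may differ in details.
    // Here k = (r/h)^2, from the cone's implicit equation.
    Float k = radius / height;
    k = k * k;
    Float A = ray.d.x*ray.d.x + ray.d.y*ray.d.y - k*ray.d.z*ray.d.z;
    Float B = 2.f * (ray.d.x*ray.o.x + ray.d.y*ray.o.y -
                     k*ray.d.z*(ray.o.z - height));
    Float C = ray.o.x*ray.o.x + ray.o.y*ray.o.y -
              k*(ray.o.z - height)*(ray.o.z - height);

The rest of the intersection test proceeds exactly as for the sphere and cylinder: solve the quadratic for t, compute the hit point, and test it against the clipping parameters.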
3.5.2 Paraboloids

The implicit equation of a paraboloid centered on the z axis with radius r and height h is

$$\frac{h x^2}{r^2} + \frac{h y^2}{r^2} - z = 0,$$

and its parametric form is

$$\phi = u\, \phi_{max}$$
$$z = z_{min} + v\, (z_{max} - z_{min})$$
$$r' = r_{max} \sqrt{z / z_{max}}$$
$$x = r' \cos\phi$$
$$y = r' \sin\phi$$

The partial derivatives are

$$\frac{\partial p}{\partial u} = (-\phi_{max}\, y,\ \phi_{max}\, x,\ 0)$$
$$\frac{\partial p}{\partial v} = (z_{max} - z_{min})\left(\frac{x}{2z},\ \frac{y}{2z},\ 1\right)$$

and

$$\frac{\partial^2 p}{\partial u^2} = -\phi_{max}^2\, (x, y, 0)$$
$$\frac{\partial^2 p}{\partial u\, \partial v} = \phi_{max} (z_{max} - z_{min})\left(-\frac{y}{2z},\ \frac{x}{2z},\ 0\right)$$
$$\frac{\partial^2 p}{\partial v^2} = -(z_{max} - z_{min})^2 \left(\frac{x}{4z^2},\ \frac{y}{4z^2},\ 0\right)$$

3.5.3 Hyperboloids

Finally, the implicit form of the hyperboloid is

$$x^2 + y^2 - z^2 = 1,$$

and the parametric form is

$$\phi = u\, \phi_{max}$$
$$x_r = (1 - v)\, x_1 + v\, x_2$$
$$y_r = (1 - v)\, y_1 + v\, y_2$$
$$x = x_r \cos\phi - y_r \sin\phi$$
$$y = x_r \sin\phi + y_r \cos\phi$$
$$z = (1 - v)\, z_1 + v\, z_2$$

The partial derivatives are

$$\frac{\partial p}{\partial u} = (-\phi_{max}\, y,\ \phi_{max}\, x,\ 0)$$
$$\frac{\partial p}{\partial v} = \left((x_2 - x_1)\cos\phi - (y_2 - y_1)\sin\phi,\ (x_2 - x_1)\sin\phi + (y_2 - y_1)\cos\phi,\ z_2 - z_1\right)$$

and

$$\frac{\partial^2 p}{\partial u^2} = -\phi_{max}^2\, (x, y, 0)$$
$$\frac{\partial^2 p}{\partial u\, \partial v} = \phi_{max} \left(-\frac{\partial y}{\partial v},\ \frac{\partial x}{\partial v},\ 0\right)$$
$$\frac{\partial^2 p}{\partial v^2} = (0, 0, 0)$$

3.6 Triangles and Meshes

trianglemesh.cpp* ≡
    #include "shape.h"
    #include "paramset.h"
    ⟨TriangleMesh Declarations⟩
    ⟨TriangleMesh Method Definitions⟩

⟨TriangleMesh Declarations⟩ ≡
    class COREDLL TriangleMesh : public Shape {
    public:
        ⟨TriangleMesh Public Methods⟩
    protected:
        ⟨TriangleMesh Data⟩
    };

The triangle is one of the most commonly used shapes in computer graphics. lrt supports triangle meshes, where a number of triangles are stored together so that their per-vertex data can be shared among multiple triangles. Single triangles are simply treated as degenerate meshes. The arguments to the TriangleMesh constructor are as follows:

nt  Number of triangles.

nv  Number of vertices.

vi  Pointer to an array of vertex indices. For the ith triangle, its three vertex positions are P[vi[3*i]], P[vi[3*i+1]], and P[vi[3*i+2]].

P   Array of nv vertex positions.

N   An optional array of normal vectors, one per vertex in the mesh. If present, these are interpolated across triangle faces to compute the triangles' shading differential geometry.

S   An optional array of tangent vectors, one per vertex in the mesh. These are also used to compute shading geometry.

uv  An optional array of parametric (u, v) values, one for each vertex.

We just copy the relevant information and store it in the TriangleMesh object. In particular, we must make our own copies of vi and P, since the caller retains ownership of the data being passed in.

Triangles have a dual role among the primitives in lrt: not only are they a user-specified primitive, but other primitives may tessellate themselves into triangle meshes. For example, subdivision surfaces end up creating a mesh of triangles to approximate the smooth limit surface; ray intersections are performed against these triangles, rather than directly against the subdivision surface.

Because of this second role, it's important that a routine that is creating a triangle mesh be able to specify the parameterization of the triangles. If a triangle was created by evaluating the position of a parametric surface at three particular (u, v) coordinate values, for example, those (u, v) values should be interpolated to compute the (u, v) value at ray intersection points inside the triangle; hence the uv parameter.
⟨TriangleMesh Method Definitions⟩ ≡
    TriangleMesh::TriangleMesh(const Transform &o2w, bool ro,
            int nt, int nv, const int *vi, const Point *P,
            const Normal *N, const Vector *S, const Float *uv)
        : Shape(o2w, ro) {
        ntris = nt;
        nverts = nv;
        vertexIndex = new int[3 * ntris];
        memcpy(vertexIndex, vi, 3 * ntris * sizeof(int));
        ⟨Copy uv, N, and S vertex data, if present⟩
        ⟨Transform mesh vertices to world space⟩
    }

The ⟨Copy uv, N, and S vertex data, if present⟩ fragment just allocates the appropriate amount of space and copies the data directly, if it is present. Its implementation isn't included here.

⟨TriangleMesh Data⟩ ≡
    int ntris, nverts;
    int *vertexIndex;
    Point *p;
    Normal *n;
    Vector *s;
    Float *uvs;

Unlike the other shapes, which leave the primitive description in object space and then transform incoming rays from world space to object space, triangle meshes transform the shape into world space. This saves the work of transforming incoming rays into object space and of transforming the intersection's differential geometry back out to world space, and it is a good idea because the transformation can be performed once at startup rather than many times during rendering. Doing this for quadrics would be more complicated, though it is possible; see the exercises for hints on how to do it. (The normal and s tangent vectors for shading geometry are left in object space, since GetShadingGeometry() must transform them to world space with the transformation matrix supplied to that method, which may not necessarily be the one stored by the Shape.)

⟨Transform mesh vertices to world space⟩ ≡
    for (int i = 0; i < nverts; ++i)
        p[i] = ObjectToWorld(P[i]);

The object-space bound of a triangle mesh is easily found by computing a bounding box that encompasses all of the vertices of the mesh. Because the vertex positions p were transformed to world space in the constructor, the implementation here has to transform them back to object space before computing their bound.

⟨TriangleMesh Method Definitions⟩ +≡
    BBox TriangleMesh::ObjectBound() const {
        BBox bobj;
        for (int i = 0; i < nverts; i++)
            bobj = Union(bobj, WorldToObject(p[i]));
        return bobj;
    }

The TriangleMesh shape is one of the shapes that can usually compute a better world-space bound than the one found by transforming its object-space bounding box to world space. Its world-space bound can be computed directly from the world-space vertices.

⟨TriangleMesh Method Definitions⟩ +≡
    BBox TriangleMesh::WorldBound() const {
        BBox worldBounds;
        for (int i = 0; i < nverts; i++)
            worldBounds = Union(worldBounds, p[i]);
        return worldBounds;
    }

The TriangleMesh shape does not directly compute intersections. Instead, it splits itself into many separate Triangles, each representing a single triangle. All of the individual triangles reference the shared set of vertices in p, avoiding per-triangle replication of the shared data. The TriangleMesh overrides the Shape::CanIntersect() method to indicate that TriangleMeshes cannot be intersected directly.

⟨TriangleMesh Public Methods⟩ +≡
    bool CanIntersect() const { return false; }

When lrt encounters a shape that cannot be intersected directly, it calls its Refine() method.
Shape::Refine() is expected to produce a list of simpler shapes in the refined vector. The implementation here is simple: we just make a new Triangle for each of the triangles in the mesh.

⟨TriangleMesh Method Definitions⟩ +≡
    void TriangleMesh::Refine(vector<Reference<Shape> > &refined) const {
        for (int i = 0; i < ntris; ++i)
            refined.push_back(new Triangle(ObjectToWorld,
                reverseOrientation, (TriangleMesh *)this, i));
    }

3.6.1 Triangle

⟨TriangleMesh Declarations⟩ +≡
    class COREDLL Triangle : public Shape {
    public:
        ⟨Triangle Public Methods⟩
    private:
        ⟨Triangle Data⟩
    };

The Triangle doesn't store much data: just a pointer to the parent TriangleMesh that it came from and a pointer to its three vertex indices in the mesh.

⟨Triangle Public Methods⟩ ≡
    Triangle(const Transform &o2w, bool ro, TriangleMesh *m, int n)
        : Shape(o2w, ro) {
        mesh = m;
        v = &mesh->vertexIndex[3*n];
    }

Note that the implementation stores a pointer to the first vertex index, instead of storing three pointers to the vertices themselves. This significantly reduces the amount of storage required for each Triangle.

⟨Triangle Data⟩ ≡
    Reference<TriangleMesh> mesh;
    int *v;

As with TriangleMeshes, it is possible to compute better world-space bounding boxes for individual triangles by bounding the world-space vertices directly.

⟨TriangleMesh Method Definitions⟩ +≡
    BBox Triangle::ObjectBound() const {
        ⟨Get triangle vertices in p1, p2, and p3⟩
        return Union(BBox(WorldToObject(p1), WorldToObject(p2)),
                     WorldToObject(p3));
    }

⟨TriangleMesh Method Definitions⟩ +≡
    BBox Triangle::WorldBound() const {
        ⟨Get triangle vertices in p1, p2, and p3⟩
        return Union(BBox(p1, p2), p3);
    }

⟨Get triangle vertices in p1, p2, and p3⟩ ≡
    const Point &p1 = mesh->p[v[0]];
    const Point &p2 = mesh->p[v[1]];
    const Point &p3 = mesh->p[v[2]];

3.6.2 Triangle Intersection

Figure 3.5: Transforming the ray into a more convenient coordinate system for intersection. First, a translation is applied to make a corner of the triangle coincide with the origin. Then, the triangle is rotated and scaled to a unit right triangle.

An algorithm for ray–triangle intersection can be derived using barycentric coordinates. Barycentric coordinates provide a way to parameterize a triangle in terms of two variables, b1 and b2:

$$p(b_1, b_2) = (1 - b_1 - b_2)\, p_0 + b_1\, p_1 + b_2\, p_2$$

The conditions on b1 and b2 are that b1 ≥ 0, b2 ≥ 0, and b1 + b2 ≤ 1. This is the parametric form of a triangle. The barycentric coordinates are also a natural way to interpolate across the surface of the triangle: given values a0, a1, and a2 defined at the vertices and the barycentric coordinates for a point on the triangle, the interpolated value of a at that point is

$$(1 - b_1 - b_2)\, a_0 + b_1\, a_1 + b_2\, a_2.$$

(See Section ?? on page ?? for a texture that interpolates shading values over a triangle mesh in this manner.)

To derive an algorithm for intersecting a ray with a triangle, we insert the parametric ray equation into the triangle equation:

$$o_r + t\, d_r = (1 - b_1 - b_2)\, p_0 + b_1\, p_1 + b_2\, p_2. \qquad (3.6.1)$$

Following the technique described by Möller and Trumbore (Möller and Trumbore 1997), we use the shorthand notation E1 = p1 − p0, E2 = p2 − p0, and T = or − p0.
We can now rearrange the terms of Equation 3.6.1 to obtain the matrix equation

$$\begin{pmatrix} -d_r & E_1 & E_2 \end{pmatrix} \begin{pmatrix} t \\ b_1 \\ b_2 \end{pmatrix} = T. \qquad (3.6.2)$$

Solving this linear system will give us both the barycentric coordinates of the intersection point (which can easily be used to compute the 3D intersection point) and the distance along the ray. Geometrically, we can interpret this system as a translation of the triangle to the origin, and a transformation of the triangle to a unit triangle in y and z, keeping the ray direction aligned with x, as shown in Figure 3.5.

We can easily solve Equation 3.6.2 using Cramer's rule. Note that we introduce a bit of notation for brevity here: we write |a b c| to mean the determinant of the matrix having a, b, and c as its columns. Cramer's rule gives

$$\begin{pmatrix} t \\ b_1 \\ b_2 \end{pmatrix} = \frac{1}{\left| -d_r \;\; E_1 \;\; E_2 \right|} \begin{pmatrix} \left| T \;\; E_1 \;\; E_2 \right| \\ \left| -d_r \;\; T \;\; E_2 \right| \\ \left| -d_r \;\; E_1 \;\; T \right| \end{pmatrix}. \qquad (3.6.3)$$

These determinants can be rewritten using the scalar triple product identity

$$|a\ b\ c| = -(a \times c)\cdot b = -(c \times b)\cdot a.$$

We can thus rewrite Equation 3.6.3 as

$$\begin{pmatrix} t \\ b_1 \\ b_2 \end{pmatrix} = \frac{1}{(d_r \times E_2)\cdot E_1} \begin{pmatrix} (T \times E_1)\cdot E_2 \\ (d_r \times E_2)\cdot T \\ (T \times E_1)\cdot d_r \end{pmatrix}. \qquad (3.6.4)$$

If we use the substitutions s1 = dr × E2 and s2 = T × E1, we can make the common subexpressions more explicit:

$$\begin{pmatrix} t \\ b_1 \\ b_2 \end{pmatrix} = \frac{1}{s_1 \cdot E_1} \begin{pmatrix} s_2 \cdot E_2 \\ s_1 \cdot T \\ s_2 \cdot d_r \end{pmatrix}. \qquad (3.6.5)$$

In order to compute E1, E2, and T we need nine subtractions. To compute s1 and s2, we need two cross products, a total of 12 multiplications and six subtractions. Finally, to compute t, b1, and b2, we need four dot products (12 multiplications and eight additions), one reciprocal, and three multiplications. Thus, the total cost of ray–triangle intersection is one divide, 27 multiplies, and 23 adds (counting additions and subtractions together). Note that some of these operations can be avoided if it is determined mid-calculation that the ray does not intersect the triangle.

⟨TriangleMesh Method Definitions⟩ +≡
    bool Triangle::Intersect(const Ray &ray, Float *t_hitp,
            DifferentialGeometry *dg) const {
        ⟨Compute s1⟩
        ⟨Compute first barycentric coordinate⟩
        ⟨Compute second barycentric coordinate⟩
        ⟨Compute t to intersection point⟩
        ⟨Fill in DifferentialGeometry from triangle hit⟩
        *t_hitp = t;
        return true;
    }

First, we compute the divisor from Equation 3.6.5. We find the three mesh vertices that make up this particular Triangle and then compute the edge vectors and the divisor. Note that if the divisor is zero, the triangle is degenerate and therefore cannot intersect a ray.

⟨Compute s1⟩ ≡
    ⟨Get triangle vertices in p1, p2, and p3⟩
    Vector E1 = p2 - p1;
    Vector E2 = p3 - p1;
    Vector S_1 = Cross(ray.d, E2);
    Float divisor = Dot(S_1, E1);
    if (divisor == 0.)
        return false;
    Float invDivisor = 1.f / divisor;

We can now compute the desired barycentric coordinate b1. Recall that barycentric coordinates that are less than zero or greater than one represent points outside the triangle, so those are non-intersections.

⟨Compute first barycentric coordinate⟩ ≡
    Vector T = ray.o - p1;
    Float b1 = Dot(T, S_1) * invDivisor;
    if (b1 < 0. || b1 > 1.)
        return false;

The second barycentric coordinate, b2, is computed in a similar way:

⟨Compute second barycentric coordinate⟩ ≡
    Vector S_2 = Cross(T, E1);
    Float b2 = Dot(ray.d, S_2) * invDivisor;
    if (b2 < 0. || b1 + b2 > 1.)
        return false;

Now that we know the ray intersects the triangle, we compute the distance along the ray at which the intersection occurs.
This gives us one last opportunity to exit the procedure early, in the case where the t value falls outside our Ray::mint and Ray::maxt bounds.

⟨Compute t to intersection point⟩ ≡
    Float t = Dot(E2, S_2) * invDivisor;
    if (t < ray.mint || t > ray.maxt)
        return false;

We now have all the information we need to compute the DifferentialGeometry structure for this intersection. In contrast to the previous shapes, we don't need to transform the partial derivatives to world space, since the triangle's vertices were already transformed to world space. Like the disk, the triangle's normal partial derivatives are also both (0, 0, 0).

⟨Fill in DifferentialGeometry from triangle hit⟩ ≡
    ⟨Compute triangle partial derivatives⟩
    ⟨Interpolate (u,v) triangle parametric coordinates⟩
    *dg = DifferentialGeometry(ray(t), dpdu, dpdv,
        Vector(0,0,0), Vector(0,0,0), tu, tv, this);

In order to generate consistent tangent vectors over triangle meshes, it is necessary to compute the partial derivatives ∂p/∂u and ∂p/∂v using the parametric (u, v) values at the triangle vertices, if provided. Although the partial derivatives are the same at all points on the triangle, the implementation here just recomputes them each time an intersection is found. Although this results in redundant computation, the storage savings for large triangle meshes can be substantial.

The triangle is the set of points

$$p_P + u\, \frac{\partial p}{\partial u} + v\, \frac{\partial p}{\partial v}$$

for some base point pP, where u and v range over the parametric coordinates of the triangle. We also know the three vertex positions pi, i = 0, 1, 2, and the texture coordinates (ui, vi) at each vertex. From this it follows that

$$p_i = p_P + u_i\, \frac{\partial p}{\partial u} + v_i\, \frac{\partial p}{\partial v}.$$

This can be written in matrix form:

$$\begin{pmatrix} u_0 & v_0 & 1 \\ u_1 & v_1 & 1 \\ u_2 & v_2 & 1 \end{pmatrix} \begin{pmatrix} \partial p/\partial u \\ \partial p/\partial v \\ p_P \end{pmatrix} = \begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix}$$

In other words, there is a unique affine mapping from the two-dimensional (u, v) space to points on the triangle. (Such a mapping exists even though the triangle is specified in 3D space, because it is planar.) To compute expressions for ∂p/∂u and ∂p/∂v, we just need to solve the matrix equation. Subtracting the first row of each matrix from the other two rows eliminates pP and leaves a 2×2 system:

$$\begin{pmatrix} u_1 - u_0 & v_1 - v_0 \\ u_2 - u_0 & v_2 - v_0 \end{pmatrix} \begin{pmatrix} \partial p/\partial u \\ \partial p/\partial v \end{pmatrix} = \begin{pmatrix} p_1 - p_0 \\ p_2 - p_0 \end{pmatrix}$$

So

$$\begin{pmatrix} \partial p/\partial u \\ \partial p/\partial v \end{pmatrix} = \begin{pmatrix} u_1 - u_0 & v_1 - v_0 \\ u_2 - u_0 & v_2 - v_0 \end{pmatrix}^{-1} \begin{pmatrix} p_1 - p_0 \\ p_2 - p_0 \end{pmatrix}.$$

Inverting a 2×2 matrix is straightforward, so we just inline the computation directly in the code.
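For reference, the closed form that the following fragment inlines is the standard 2×2 matrix inverse,

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

where ad − bc is the determinant; in the code's variables, the determinant is du1*dv2 - dv1*du2.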
⟨Compute triangle partial derivatives⟩ ≡
    Vector dpdu, dpdv;
    Float uvs[3][2];
    GetUVs(uvs);
    ⟨Compute deltas for triangle partial derivatives⟩
    Float determinant = du1 * dv2 - dv1 * du2;
    if (determinant == 0) {
        ⟨Handle zero determinant for triangle partial derivative matrix⟩
    }
    else {
        Float invdet = 1.f / determinant;
        dpdu = Vector((dx1 * dv2 - dv1 * dx2) * invdet,
                      (dy1 * dv2 - dv1 * dy2) * invdet,
                      (dz1 * dv2 - dv1 * dz2) * invdet);
        dpdv = Vector((du1 * dx2 - dx1 * du2) * invdet,
                      (du1 * dy2 - dy1 * du2) * invdet,
                      (du1 * dz2 - dz1 * du2) * invdet);
    }

The deltas below are all taken with respect to the first vertex, exactly as in the derivation above: for example, du1 = u1 − u0, and dx1 is the x component of p1 − p0. (Recall that the code names its vertices p1, p2, and p3, corresponding to p0, p1, and p2 in the equations.)

⟨Compute deltas for triangle partial derivatives⟩ ≡
    Float du1 = uvs[1][0] - uvs[0][0];
    Float du2 = uvs[2][0] - uvs[0][0];
    Float dv1 = uvs[1][1] - uvs[0][1];
    Float dv2 = uvs[2][1] - uvs[0][1];
    Float dx1 = p2.x - p1.x;
    Float dx2 = p3.x - p1.x;
    Float dy1 = p2.y - p1.y;
    Float dy2 = p3.y - p1.y;
    Float dz1 = p2.z - p1.z;
    Float dz2 = p3.z - p1.z;

Finally, it is necessary to handle the case when the matrix is singular and therefore cannot be inverted. Note that this only happens when the user-supplied per-vertex parameterization values are degenerate. In this case, the Triangle just chooses an arbitrary coordinate system, making sure that it is orthonormal:

⟨Handle zero determinant for triangle partial derivative matrix⟩ ≡
    CoordinateSystem(Cross(E2, E1).Hat(), &dpdu, &dpdv);

To compute the (u, v) parametric coordinates at the hit point, the barycentric interpolation formula is applied to the (u, v) parametric coordinates at the vertices.

⟨Interpolate (u,v) triangle parametric coordinates⟩ ≡
    Float b0 = 1 - b1 - b2;
    Float tu = b0*uvs[0][0] + b1*uvs[1][0] + b2*uvs[2][0];
    Float tv = b0*uvs[0][1] + b1*uvs[1][1] + b2*uvs[2][1];

The utility routine GetUVs() returns the (u, v) coordinates for the three vertices of the triangle, either from the TriangleMesh, if it has them, or defaults if none were specified with the mesh.

⟨TriangleMesh Method Definitions⟩ +≡
    void Triangle::GetUVs(Float uv[3][2]) const {
        if (mesh->uvs) {
            uv[0][0] = mesh->uvs[2*v[0]];
            uv[0][1] = mesh->uvs[2*v[0]+1];
            uv[1][0] = mesh->uvs[2*v[1]];
            uv[1][1] = mesh->uvs[2*v[1]+1];
            uv[2][0] = mesh->uvs[2*v[2]];
            uv[2][1] = mesh->uvs[2*v[2]+1];
        }
        else {
            uv[0][0] = 0.; uv[0][1] = 0.;
            uv[1][0] = 1.; uv[1][1] = 0.;
            uv[2][0] = 1.; uv[2][1] = 1.;
        }
    }

3.6.3 Surface Area

Figure 3.6: The area of a triangle with two edges given by vectors v1 and v2 is one half of the area of the parallelogram. The parallelogram area is given by the length of the cross product of v1 and v2.

Recall from Section 2.1 that the area of a parallelogram is given by the length of the cross product of the two vectors along its sides. From this, it's easy to see that, given the vectors for two edges of a triangle, the triangle's area is half the area of the parallelogram given by those two vectors (see Figure 3.6).

⟨TriangleMesh Method Definitions⟩ +≡
    Float Triangle::Area() const {
        ⟨Get triangle vertices in p1, p2, and p3⟩
        return 0.5f * Cross(p2-p1, p3-p1).Length();
    }

3.6.4 Shading Geometry

Meshes may include per-vertex normals (n) and tangents (s). When they are present, GetShadingGeometry() uses them to compute a shading differential geometry at the intersection point that differs from the purely geometric one computed by Intersect().
⟨Triangle Public Methods⟩ +≡
    virtual void GetShadingGeometry(const Transform &obj2world,
            const DifferentialGeometry &dg,
            DifferentialGeometry *dgShading) const {
        if (!mesh->n && !mesh->s) {
            *dgShading = dg;
            return;
        }
        ⟨Initialize Triangle shading geometry with n and s⟩
    }

⟨Initialize Triangle shading geometry with n and s⟩ ≡
    ⟨Compute barycentric coordinates for point⟩
    ⟨Use n and s to compute shading tangents for triangle, ss and ts⟩
    Vector dndu, dndv;
    ⟨Compute ∂n/∂u and ∂n/∂v for triangle shading geometry⟩
    *dgShading = DifferentialGeometry(dg.p, ss, ts,
        dndu, dndv, dg.u, dg.v, dg.shape,
        dg.dudx, dg.dvdx, dg.dudy, dg.dvdy);

Recall that the (u, v) parametric coordinates in the DifferentialGeometry for a triangle are computed with barycentric interpolation of the parametric coordinates at the triangle vertices:

$$u = b_0 u_0 + b_1 u_1 + b_2 u_2$$
$$v = b_0 v_0 + b_1 v_1 + b_2 v_2$$

Because the bi are barycentric coordinates, b0 = 1 − b1 − b2. Here u, v, ui, and vi are all known: u and v from the DifferentialGeometry, and ui and vi from the Triangle. We can substitute for the b0 term and rewrite the above equations, giving a linear system in the two unknowns b1 and b2:

$$\begin{pmatrix} u_1 - u_0 & u_2 - u_0 \\ v_1 - v_0 & v_2 - v_0 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} u - u_0 \\ v - v_0 \end{pmatrix}$$

This is a linear system of the basic form Ab = C. We can solve for b by inverting A, giving the two barycentric coordinates

$$b = A^{-1} C.$$

The closed-form solution for this is implemented in the utility routine SolveLinearSystem2x2().

⟨Compute barycentric coordinates for point⟩ ≡
    Float b[3];
    ⟨Initialize A and C matrices for barycentrics⟩
    if (!SolveLinearSystem2x2(A, C, &b[1])) {
        ⟨Handle degenerate parametric mapping⟩
    }
    else
        b[0] = 1.f - b[1] - b[2];

⟨Initialize A and C matrices for barycentrics⟩ ≡
    Float uv[3][2];
    GetUVs(uv);
    Float A[2][2] =
        { { uv[1][0] - uv[0][0], uv[2][0] - uv[0][0] },
          { uv[1][1] - uv[0][1], uv[2][1] - uv[0][1] } };
    Float C[2] = { dg.u - uv[0][0], dg.v - uv[0][1] };

If the determinant of A is zero, the solution is undefined and SolveLinearSystem2x2() returns false. This happens, for example, if all three triangle vertices have the same texture coordinates. In that case, the barycentric coordinates are all arbitrarily set to 1/3.

⟨Handle degenerate parametric mapping⟩ ≡
    b[0] = b[1] = b[2] = 1.f/3.f;

⟨Use n and s to compute shading tangents for triangle, ss and ts⟩ ≡
    Normal ns;
    Vector ss, ts;
    if (mesh->n)
        ns = (b[0] * mesh->n[v[0]] + b[1] * mesh->n[v[1]] +
              b[2] * mesh->n[v[2]]).Hat();
    else ns = dg.nn;
    if (mesh->s)
        ss = (b[0] * mesh->s[v[0]] + b[1] * mesh->s[v[1]] +
              b[2] * mesh->s[v[2]]).Hat();
    else ss = dg.dpdu.Hat();
    ts = obj2world(Cross(ss, ns)).Hat();
    ss = obj2world(Cross(ts, ns)).Hat();
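The SolveLinearSystem2x2() routine itself is defined with lrt's other utility functions rather than in this chapter. As a rough sketch of what such a routine might look like (the closed form is Cramer's rule; the epsilon threshold here is an illustrative assumption, not necessarily the value lrt uses):

    // Hedged sketch of a 2x2 linear solver via Cramer's rule.
    // Returns false for (near-)singular systems; the 1e-5
    // threshold is illustrative only.
    bool SolveLinearSystem2x2(const Float A[2][2],
                              const Float B[2], Float x[2]) {
        Float det = A[0][0]*A[1][1] - A[0][1]*A[1][0];
        if (fabsf(det) < 1e-5)
            return false;
        Float invDet = 1.f / det;
        x[0] = (A[1][1]*B[0] - A[0][1]*B[1]) * invDet;
        x[1] = (A[0][0]*B[1] - A[1][0]*B[0]) * invDet;
        return true;
    }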
3.7 ***ADV***: Subdivision Surfaces

We will wrap up this chapter by defining a shape that implements subdivision surfaces, which are particularly well-suited to describing complex smooth shapes. The subdivision surface for a particular mesh is defined by repeatedly subdividing the faces of the mesh into smaller faces, then finding the new vertex locations using weighted combinations of the old vertex positions.

For appropriately chosen subdivision rules, this process converges to give a smooth limit surface as the number of subdivision steps goes to infinity. In practice, just a few levels of subdivision typically suffice to give a good approximation of the limit surface. Figure 3.7 shows the effect of applying one set of subdivision rules to a tetrahedron: the original control mesh and one, two, three, and four levels of subdivision are shown from left to right.

Figure 3.7: Tetrahedron control mesh and four levels of subdivision.

Though originally developed in the 1970s, subdivision surfaces have recently received a fair amount of attention in computer graphics thanks to some key advantages over polygonal and spline-based representations of surfaces. The advantages of subdivision include:

• Subdivision surfaces are smooth, as opposed to polygon meshes, which appear faceted when viewed close up, regardless of how finely they are modeled.

• A lot of existing infrastructure in modeling systems can be retargeted to subdivision. The classic toolbox of techniques for modeling polygon meshes can be applied to modeling subdivision control meshes.

• Subdivision surfaces are well-suited to describing objects with complex topology, since we can start with a control mesh of arbitrary (manifold) topology. Parametric surface models generally don't handle complex topology well.

• Subdivision methods are often generalizations of spline-based surface representations, so spline surfaces can often just be run through general subdivision surface renderers.

• It is easy to add detail to a localized region of a subdivision surface, simply by adding faces to appropriate parts of the control mesh. This is much less easily done with spline representations.

Here, we will describe an implementation of Loop subdivision surfaces.⁵ The Loop rules are based on triangular faces in the control mesh; faces with more than three vertices are simply triangulated at the start. At each subdivision step, all faces split into four child faces (Figure 3.8). New vertices are added along all of the edges of the original mesh, with positions computed using weighted averages of nearby vertices. Furthermore, the position of each original vertex is updated with a weighted average of its position and its new neighbors' positions.

⁵Don't be fooled by the name. These surfaces are not "loopy"; they are named after the inventor of the subdivision rules, Charles Loop.

3.7.1 Mesh Representation

Figure 3.8: Basic refinement process for Loop subdivision: the control mesh on the left has been subdivided once to create the new mesh on the right. Each triangular face of the mesh has been subdivided into four new faces by splitting each of the edges and connecting the new vertices with new edges.
These data structures need to be carefully designed in order to support all of the operations necessary to cleanly implement the subdivision algorithm. The parameters to the LoopSubdiv constructor specify a triangle mesh in exactly the same format used in the TriangleMesh constructor (see Section 3.6 on page 87.): each face is described by three integer vertex indices, giving offsets into the vertex array P for the face’s three vertices. We will need to process this data to determine which faces are adjacent to each other, which faces are adjacent to which vertices, etc. LoopSubdiv Method Deﬁnitions ¢ £¡ LoopSubdiv::LoopSubdiv(const Transform &o2w, bool ro, int nfaces, int nvertices, const int *vertexIndices, const Point *P, int nl) : Shape(o2w, ro) { nLevels = nl; Allocate LoopSubdiv vertices and faces ¡ Set face to vertex pointers ¡ Set neighbor pointers in faces ¡ Finish vertex initialization ¡ } We will shortly deﬁne SDVertex and SDFace structures, which hold data for vertices and faces in the subdivision mesh. We start by allocating one instance of the SDVertex class for each vertex in the mesh and an SDFace for each face. For now, these are mostly uninitialized. Sec. 3.7] ***ADV***: Subdivision Surfaces 101 Allocate LoopSubdiv vertices and faces ¢ £¡ int i; SDVertex *verts = new SDVertex[nvertices]; for (i = 0; i < nvertices; ++i) { verts[i] = SDVertex(P[i]); vertices.push_back(&verts[i]); } SDFace *fs = new SDFace[nfaces]; for (i = 0; i < nfaces; ++i) faces.push_back(&fs[i]); The LoopSubdiv destructor, which we won’t include here, just deletes all of the faces and vertices allocated above. LoopSubdiv Private Data ¢ £¡ int nLevels; vector<SDVertex *> vertices; vector<SDFace *> faces; The Loop subdivision scheme, like most other subdivision schemes, assumes that the control mesh is manifold, i.e. no more than two faces share any given edge. Such a mesh may be closed or open: a closed mesh has no boundary, and all faces have adjacent faces across each of their edges. An open mesh has some faces 100 LoopSubdiv 33 Point that do not have all three neighbors. The LoopSubdiv implementation supports 102 SDFace both closed and open meshes. 658 vector In the interior of a triangle mesh, most vertices are adjacent to six faces and have six neighbor vertices directly connected to them with edges. On the boundaries of an open mesh, most vertices are adjacent to three faces and four vertices. The number of vertices directly adjacent to a vertex is called the vertex’s valence. In- terior vertices with valence other than six, or boundary vertices with valence other than four are called extraordinary vertices; otherwise they are called regular. Loop subdivision surfaces are smooth everywhere except at their extraordinary vertices. Each SDVertex stores its position P, a boolean that indicates whether it is a regular or extraordinary vertex, and a boolean that records if it lies on the boundary of the mesh. It also holds a pointer to one of the faces adjacent to it; later we will use this pointer to start an iteration over all of the faces adjacent to the vertex by following pointers stored in each SDFace to record which faces are adjacent. ¡— - This sentence is pretty garbled. Finally, we have a pointer to store the new SDVertex for the next level of subdivision, if any. 
⟨LoopSubdiv Local Structures⟩ +≡
    struct SDVertex {
        ⟨SDVertex Constructor⟩
        ⟨SDVertex Methods⟩
        Point P;
        SDFace *startFace;
        SDVertex *child;
        bool regular, boundary;
    };

The constructor for SDVertex does the obvious initialization; note that SDVertex::startFace is initialized to NULL.

Figure 3.9: Each triangular face stores three pointers to SDVertex objects v[i] and three pointers to neighboring faces f[i]. Neighboring faces are indexed using the convention that the ith edge is the edge from v[i] to v[(i+1)%3], and the neighbor across the ith edge is in f[i].

⟨SDVertex Constructor⟩ ≡
    SDVertex(Point pt = Point(0,0,0))
        : P(pt), startFace(NULL), child(NULL),
          regular(false), boundary(false) {
    }

The SDFace structure is where we maintain most of the topological information about the mesh. Because all faces are triangular, we always store three pointers to the vertices for this face and three pointers to the faces adjacent to this one. (The face neighbor pointers will be NULL if the face is on the boundary of an open mesh.) The face neighbor pointers are indexed such that if we label the edge from v[i] to v[(i+1)%3] as the ith edge, then the neighbor face across that edge is stored in f[i]; see Figure 3.9. This labeling convention is important to keep in mind: later, when we are updating the topology of a newly subdivided mesh, we will make extensive use of it to navigate around the mesh. Similarly to the SDVertex class, we also store pointers to child faces at the next level of subdivision.

⟨LoopSubdiv Local Structures⟩ +≡
    struct SDFace {
        ⟨SDFace Constructor⟩
        ⟨SDFace Methods⟩
        SDVertex *v[3];
        SDFace *f[3];
        SDFace *children[4];
    };

The SDFace constructor is straightforward (it simply sets pointers to NULL), so it is not shown here.

Figure 3.10: All of the faces in the input mesh must be specified so that each shared edge is given once in each direction. Here, the edge from v0 to v1 is traversed from v0 to v1 by face number one, and from v1 to v0 by face number two. Another way to think of this is in terms of face orientation: all faces' vertices should be given consistently in either clockwise or counter-clockwise order, as seen from outside the mesh.

In order to simplify navigation of the SDFace data structure, we'll provide macros that make it easy to determine the vertex and face indices before or after a particular index. These macros add appropriate offsets and compute the result modulo three to handle cycling around. To compute the previous index, we add 2 instead of subtracting 1, which avoids taking the modulus of a negative number, the result of which is implementation-dependent in C++.

⟨LoopSubdiv Macros⟩ ≡
    #define NEXT(i) (((i)+1)%3)
    #define PREV(i) (((i)+2)%3)

In addition to requiring a manifold mesh, the LoopSubdiv class expects that the control mesh specified by the user will be consistently ordered: each directed edge in the mesh can be present only once. An edge that is shared by two faces should be specified in a different direction by each face. Consider two vertices, v0 and v1, with an edge between them. We expect that one of the triangular faces that has this edge will specify its three vertices so that v0 is before v1, and that the other face will specify its vertices so that v1 is before v0 (Figure 3.10).
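As a concrete illustration, here is a hypothetical, consistently ordered control mesh for a tetrahedron like the one in Figure 3.7 (the point coordinates are made up for the example; this data is not part of lrt). Each of the six undirected edges is traversed exactly once in each direction by the two faces that share it:

    // Hypothetical tetrahedron control mesh.  Each shared edge
    // appears once in each direction; for example, face 0
    // traverses 0->2 while face 2 traverses 2->0.
    Point P[4] = { Point( 1,  1,  1), Point(-1, -1,  1),
                   Point(-1,  1, -1), Point( 1, -1, -1) };
    int vertexIndices[4 * 3] = {
        0, 2, 1,   // face 0
        0, 1, 3,   // face 1
        0, 3, 2,   // face 2
        1, 2, 3    // face 3
    };

These arrays could be passed directly as the vertexIndices and P parameters of the LoopSubdiv constructor, with nfaces = 4 and nvertices = 4.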
A Möbius strip is one example of a surface that cannot be consistently ordered, but such surfaces come up rarely in rendering, so in practice this restriction is not troublesome.

Given this assumption about the input data, we will initialize the mesh's topological data structures. We first loop over all of the faces and set their v pointers to point to their three vertices. We also set each vertex's SDVertex::startFace pointer to point to one of the vertex's neighboring faces. It doesn't matter which of its adjacent faces we choose, so we just keep resetting it each time we come across another face that the vertex is incident to, ensuring that all vertices have some non-NULL face pointer by the time we're done.

⟨Set face to vertex pointers⟩ ≡
    const int *vp = vertexIndices;
    for (i = 0; i < nfaces; ++i) {
        SDFace *f = faces[i];
        for (int j = 0; j < 3; ++j) {
            SDVertex *v = vertices[vp[j]];
            f->v[j] = v;
            v->startFace = f;
        }
        vp += 3;
    }

Now we need to set each face's f pointer to point to its neighboring faces. This is a bit trickier, since face adjacency information isn't directly specified by the user. We'll loop over the faces and store an SDEdge object for each of their three edges; when we come to another face that shares the same edge, we can update both faces' neighbor pointers.

⟨LoopSubdiv Local Structures⟩ +≡
    struct SDEdge {
        ⟨SDEdge Constructor⟩
        ⟨SDEdge Comparison Function⟩
        SDVertex *v[2];
        SDFace *f[2];
        SDFace **fptr;
    };

The constructor takes pointers to the two vertices at each end of the edge. It orders them so that v[0] holds the one that comes first in memory. This code may seem strange, but it simply relies on the facts that pointers can be compared and ordered like integers and that the ordering of the two vertices on an edge is arbitrary. By sorting the vertices on their pointer values, we guarantee that the edge (va, vb) is properly recognized as the same edge as (vb, va), regardless of the order in which the vertices are given.

⟨SDEdge Constructor⟩ ≡
    SDEdge(SDVertex *v0 = NULL, SDVertex *v1 = NULL) {
        v[0] = min(v0, v1);
        v[1] = max(v0, v1);
        f[0] = f[1] = NULL;
        fptr = NULL;
    }

We also define an ordering operation for SDEdge objects so that they can be used by data structures that rely on a well-defined ordering.

⟨SDEdge Comparison Function⟩ ≡
    bool operator<(const SDEdge &e2) const {
        if (v[0] == e2.v[0]) return v[1] < e2.v[1];
        return v[0] < e2.v[0];
    }

Now we can get to work, looping over the edges in all of the faces and updating the neighbor pointers as we go. We use an STL set<> to store the edges that have only one adjacent face so far. The set<> allows us to search for a particular edge in O(log n) time, using the comparison function above.

⟨Set neighbor pointers in faces⟩ ≡
    set<SDEdge> edges;
    for (i = 0; i < nfaces; ++i) {
        SDFace *f = faces[i];
        for (int edge = 0; edge < 3; ++edge) {
            ⟨Update neighbor pointer for edge⟩
        }
    }

For each edge in each face, we create an edge object and see if the same edge was seen previously. If so, we initialize both faces' neighbor pointers across the edge. If not, we add the edge to the set of edges.
Recall the indexing convention of Figure 3.9: the ith edge of a face runs from vertex v[i] to vertex v[NEXT(i)]. Thus, in the fragment below, v0 and v1 are the indices within the face of the current edge's two vertices; the index of an edge coincides with the index of its first vertex.

⟨Update neighbor pointer for edge⟩ ≡
    int v0 = edge, v1 = NEXT(edge);
    SDEdge e(f->v[v0], f->v[v1]);
    if (edges.find(e) == edges.end()) {
        ⟨Handle new edge⟩
    }
    else {
        ⟨Handle previously-seen edge⟩
    }

Given an edge that we haven't seen before, we store the current face's pointer in the edge object's f[0] member. When we come across the other face that shares this edge (if any), we will thus know what the neighboring face is. We also store fptr, a pointer to the location in the current SDFace that should point to the neighboring face once we find it.

⟨Handle new edge⟩ ≡
    e.f[0] = f;
    e.fptr = &(f->f[edge]);
    edges.insert(e);

When we find the second face on an edge, we can set the neighbor pointers for each of the two faces; the stored fptr tells us where in the first face to store the second face's pointer. We then remove the edge from the edge set, since no edge can be shared by more than two faces.

Figure 3.11: Given a vertex v[i] and a face that it is incident to, f, we define the next face as the face adjacent to f across the edge from v[i] to v[NEXT(i)]. The previous face is defined analogously.

⟨Handle previously-seen edge⟩ ≡
    e = *edges.find(e);
    *e.fptr = f;
    f->f[edge] = e.f[0];
    edges.erase(e);

Any edges remaining in the set when the loop finishes have only one adjacent face and therefore lie on the boundary of an open mesh; they are simply discarded along with the set when it goes out of scope, since boundaries are detected below by looking for NULL neighbor pointers.

Now that all faces have proper neighbor pointers, we can set the boundary and regular flags in each of the vertices. In order to determine whether a vertex is a boundary vertex, we'll define an ordering of faces around a vertex (Figure 3.11). For a vertex v[i] on a face f, we define the vertex's next face as the face across the edge from v[i] to v[NEXT(i)], and the previous face as the face across the edge from v[PREV(i)] to v[i].

We will frequently need to know the valence of a vertex, so we provide the method SDVertex::valence().

⟨LoopSubdiv Inline Functions⟩ ≡
    inline int SDVertex::valence() {
        SDFace *f = startFace;
        if (!boundary) {
            ⟨Compute valence of interior vertex⟩
        }
        else {
            ⟨Compute valence of boundary vertex⟩
        }
    }

Figure 3.12: We can determine whether a vertex is a boundary vertex by starting from the adjacent face startFace and following next-face pointers around the vertex. If we come to a face that has no next neighbor face, then the vertex is on a boundary. If we return to startFace, it's an interior vertex.

To compute the valence of a non-boundary vertex, we count the number of adjacent faces by following each face's neighbor pointers around the vertex until we reach the starting face. The valence is equal to the number of faces visited.

⟨Compute valence of interior vertex⟩ ≡
    int nf = 1;
    while ((f = f->nextFace(this)) != startFace)
        ++nf;
    return nf;

For boundary vertices we use the same approach, though in this case the valence is one more than the number of adjacent faces. The loop over adjacent faces is slightly more complicated here: we follow pointers to the next face around the vertex until we reach the boundary, counting the number of faces seen.
We then start again at startFace and follow previous-face pointers until we encounter the boundary in the other direction.

⟨Compute valence of boundary vertex⟩ ≡
    int nf = 1;
    while ((f = f->nextFace(this)) != NULL)
        ++nf;
    f = startFace;
    while ((f = f->prevFace(this)) != NULL)
        ++nf;
    return nf+1;

By successively going to the next face around v, we can iterate over the faces adjacent to it. If we eventually return to the face we started at, then we are at an interior vertex; if we come to an edge with a NULL neighbor pointer, then we're at a boundary vertex (see Figure 3.12). Once we've determined whether we have a boundary vertex, we compute the valence of the vertex and set the regular flag if the valence is 6 for an interior vertex or 4 for a boundary vertex.

⟨Finish vertex initialization⟩ ≡
    for (i = 0; i < nvertices; ++i) {
        SDVertex *v = vertices[i];
        SDFace *f = v->startFace;
        do {
            f = f->nextFace(v);
        } while (f && f != v->startFace);
        v->boundary = (f == NULL);
        if (!v->boundary && v->valence() == 6)
            v->regular = true;
        else if (v->boundary && v->valence() == 4)
            v->regular = true;
        else
            v->regular = false;
    }

Here is the utility function that finds the index of a given vertex for one of the faces adjacent to it. It's a fatal error to pass a pointer to a vertex that isn't part of the current face; this case would represent a bug elsewhere in the subdivision code.

⟨SDFace Methods⟩ ≡
    int vnum(SDVertex *vert) const {
        for (int i = 0; i < 3; ++i)
            if (v[i] == vert) return i;
        Severe("Basic logic error in SDFace::vnum()");
        return -1;
    }

Since the next face for a vertex v[i] on a face f is across the ith edge (recall the mapping of edge neighbor pointers from Figure 3.9), we can find the appropriate face neighbor pointer easily given the index i for the vertex, which the vnum() utility function provides. The previous face is across the edge from PREV(i) to i, so we return f[PREV(i)] for the previous face.

⟨SDFace Methods⟩ +≡
    SDFace *nextFace(SDVertex *vert) {
        return f[vnum(vert)];
    }

⟨SDFace Methods⟩ +≡
    SDFace *prevFace(SDVertex *vert) {
        return f[PREV(vnum(vert))];
    }

It will be very useful to be able to get the next and previous vertices around a face starting at any vertex. The SDFace::nextVert() and SDFace::prevVert() methods do just that (Figure 3.13).

⟨SDFace Methods⟩ +≡
    SDVertex *nextVert(SDVertex *vert) {
        return v[NEXT(vnum(vert))];
    }

Figure 3.13: Given a vertex v on a face f, the method f->prevVert(v) returns the previous vertex around the face from v, and f->nextVert(v) returns the next vertex.

⟨SDFace Methods⟩ +≡
    SDVertex *prevVert(SDVertex *vert) {
        return v[PREV(vnum(vert))];
    }

3.7.2 Bounds

Loop subdivision surfaces have the convex hull property: the limit surface is guaranteed to be inside the convex hull of the original control mesh. Thus, for the bounding methods, we can just bound the original control vertices. The bounding methods are essentially equivalent to those in TriangleMesh, so we won't include them here.

⟨LoopSubdiv Public Methods⟩ +≡
    BBox ObjectBound() const;
    BBox WorldBound() const;

3.7.3 Subdivision

Now we can show how subdivision proceeds with the Loop rules.
3.7.3 Subdivision

Now we can show how subdivision proceeds with the Loop rules. The LoopSubdiv shape doesn't support intersection directly, but will apply subdivision a fixed number of times to generate a TriangleMesh for rendering. An exercise at the end of the chapter discusses adaptive subdivision, where each original face is subdivided just enough that the result looks smooth from a particular viewpoint.

⟨LoopSubdiv Method Definitions⟩ +≡
    bool LoopSubdiv::CanIntersect() const {
        return false;
    }

The Refine() method handles all of the subdivision. We repeatedly apply the subdivision rules to the mesh, each time generating a new mesh to be used as the input to the next step. After each subdivision step, the f and v arrays in the Refine() method are updated to point to the faces and vertices from the level of subdivision just computed. When we are done subdividing, a TriangleMesh representation of the surface is created and returned to the caller.

Figure 3.14: Basic Loop subdivision of a single face: four child faces are created, ordered such that the ith child face is adjacent to the ith vertex of the original face and the fourth child face is in the center of the subdivided face. Three edge vertices need to be computed; they are numbered so that the ith edge vertex is along the ith edge of the original face.

The ObjectArena class (defined on page 668) is a specialized allocator: it hands out objects of a single type very quickly, and all of the memory it manages is freed at once when the arena itself is destroyed.

⟨LoopSubdiv Method Definitions⟩ +≡
    void LoopSubdiv::Refine(vector<Reference<Shape> > &refined) const {
        vector<SDFace *> f = faces;
        vector<SDVertex *> v = vertices;
        ObjectArena<SDVertex> vertexArena;
        ObjectArena<SDFace> faceArena;
        for (int i = 0; i < nLevels; ++i) {
            ⟨Update f and v for next level of subdivision⟩
        }
        ⟨Push vertices to limit surface⟩
        ⟨Compute vertex tangents on limit surface⟩
        ⟨Create TriangleMesh from subdivision mesh⟩
    }
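For readers unfamiliar with arena allocation, the following rough sketch conveys the idea behind ObjectArena; it is not lrt's actual implementation. Storage for objects of one type is carved out of large blocks, and everything is released at once when the arena is destroyed:

    // Note: destructors of the allocated objects are never run, so
    // this pattern is only appropriate for simple types.
    template <class T> class ArenaSketch {
    public:
        ArenaSketch() : nAvailable(0), next(0) { }
        T *Alloc() {
            if (nAvailable == 0) {
                // Grab storage for many objects in one allocation.
                blocks.push_back((T *)malloc(BLOCK_COUNT * sizeof(T)));
                next = blocks.back();
                nAvailable = BLOCK_COUNT;
            }
            --nAvailable;
            return next++;  // caller constructs with placement new
        }
        ~ArenaSketch() {
            for (u_int i = 0; i < blocks.size(); ++i)
                free(blocks[i]);
        }
    private:
        enum { BLOCK_COUNT = 4096 };
        int nAvailable;
        T *next;
        vector<T *> blocks;
    };

This is why the calls above take the form new (vertexArena) SDVertex: the arena supplies the memory, and placement new constructs the object in it.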
The main loop of a subdivision step proceeds as follows: we create vectors for all of the vertices and faces at this level of subdivision and then proceed to compute new vertex positions and update the topological representation for the refined mesh. Figure 3.14 shows the basic refinement rules for faces in the mesh. Each face is split into four child faces, such that the ith child face is next to the ith vertex of the input face and the final face is in the center. Three new vertices are then computed along the split edges of the original face.

⟨Update f and v for next level of subdivision⟩ ≡
    vector<SDFace *> newFaces;
    vector<SDVertex *> newVertices;
    ⟨Allocate next level of children in mesh tree⟩
    ⟨Update vertex positions and create new edge vertices⟩
    ⟨Update new mesh topology⟩
    ⟨Prepare for next level of subdivision⟩

First, we allocate storage for the updated values of the vertices in the input mesh. We also allocate storage for the child faces. We don't yet do any initialization of the new vertices and faces other than setting the regular and boundary flags for the vertices: subdivision leaves boundary vertices on the boundary and interior vertices in the interior, and it doesn't change the valence of vertices in the mesh.

⟨Allocate next level of children in mesh tree⟩ ≡
    for (u_int j = 0; j < v.size(); ++j) {
        v[j]->child = new (vertexArena) SDVertex;
        v[j]->child->regular = v[j]->regular;
        v[j]->child->boundary = v[j]->boundary;
        newVertices.push_back(v[j]->child);
    }
    for (u_int j = 0; j < f.size(); ++j)
        for (int k = 0; k < 4; ++k) {
            f[j]->children[k] = new (faceArena) SDFace;
            newFaces.push_back(f[j]->children[k]);
        }

Computing new vertex positions

Before we worry about the topology of the subdivided mesh, we compute positions for all of the vertices in the mesh. First, we will consider the problem of computing updated positions for all of the vertices that were already present in the mesh; these vertices are called even vertices. We will then compute the new vertices on the split edges; these are called odd vertices.

⟨Update vertex positions and create new edge vertices⟩ ≡
    ⟨Update vertex positions for even vertices⟩
    ⟨Compute new odd edge vertices⟩

Different techniques are used to compute the updated positions for each of the different types of even vertices: regular and extraordinary, boundary and interior. This gives four cases to handle.

⟨Update vertex positions for even vertices⟩ ≡
    for (u_int j = 0; j < v.size(); ++j) {
        if (!v[j]->boundary) {
            ⟨Apply one-ring rule for even vertex⟩
        }
        else {
            ⟨Apply boundary rule for even vertex⟩
        }
    }

Figure 3.15: The new position v' for a vertex v is computed by weighting the adjacent vertices v_i by a weight β and weighting v by (1 - nβ), where n is the valence of v. The adjacent vertices v_i are collectively referred to as the one-ring around v.

For both types of interior vertices, we take the set of vertices adjacent to each vertex (called the one-ring around it, reflecting the fact that it's a ring of neighbors) and weight each of the neighbor vertices by a weight β (Figure 3.15). The vertex we are updating, in the center, is weighted by 1 - nβ, where n is the valence of the vertex. Thus, the new position v' for a vertex v is:

    v' = (1 - n\beta)\,v + \sum_{i=1}^{n} \beta\, v_i

This formulation ensures that the sum of the weights is one, which guarantees the convex hull property we used above for bounding the surface. The position of the vertex being updated is only affected by vertices that are nearby; this is known as local support. Loop subdivision is particularly efficient to implement because its subdivision rules all have this property.

The particular weight β used for this step is a key component of the subdivision method and must be chosen carefully in order to ensure smoothness of the limit surface, among other desirable properties; see Loop's thesis, cited in the Further Reading section, for a derivation. The LoopSubdiv::beta() method below computes a β value based on the vertex's valence that ensures smoothness. For regular interior vertices (valence 6), LoopSubdiv::beta() returns 1/16. Since this is a common case, we use the value 1/16 directly instead of calling LoopSubdiv::beta() every time.

⟨Apply one-ring rule for even vertex⟩ ≡
    if (v[j]->regular)
        v[j]->child->P = weightOneRing(v[j], 1.f/16.f);
    else
        v[j]->child->P = weightOneRing(v[j], beta(v[j]->valence()));
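As a quick worked check of the rule for the common regular case, substituting n = 6 and β = 1/16 gives

    v' = \Bigl(1 - 6\cdot\tfrac{1}{16}\Bigr)v + \tfrac{1}{16}\sum_{i=1}^{6} v_i
       = \tfrac{5}{8}\,v + \tfrac{1}{16}\sum_{i=1}^{6} v_i,

and the weights sum to 5/8 + 6/16 = 1, as required for the convex hull property.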
⟨LoopSubdiv Private Methods⟩ ≡
    static Float beta(int valence) {
        if (valence == 3) return 3.f/16.f;
        else return 3.f / (8.f * valence);
    }

These values, 3/16 for valence 3 and 3/(8n) otherwise, are a widely used simplification of Loop's original weights. Note that for the regular valence of 6 this formula gives 3/48 = 1/16, the value used directly above.

The LoopSubdiv::weightOneRing() function loops over the one-ring of adjacent vertices and applies the given weight to compute a new vertex position. It uses the SDVertex::oneRing() function, defined below, which returns the positions of the vertices around the vertex vert.

⟨LoopSubdiv Method Definitions⟩ +≡
    Point LoopSubdiv::weightOneRing(SDVertex *vert, Float beta) {
        ⟨Put vert one-ring in Pring⟩
        Point P = (1 - valence * beta) * vert->P;
        for (int i = 0; i < valence; ++i)
            P += beta * Pring[i];
        return P;
    }

⟨Put vert one-ring in Pring⟩ ≡
    int valence = vert->valence();
    Point *Pring = (Point *)alloca(valence * sizeof(Point));
    vert->oneRing(Pring);

Note that the valence is recomputed (by walking the faces around the vertex) each time this fragment is used; caching valences would avoid these repeated mesh traversals at some cost in storage.

⟨LoopSubdiv Method Definitions⟩ +≡
    void SDVertex::oneRing(Point *P) {
        if (!boundary) {
            ⟨Get one ring vertices for interior vertex⟩
        }
        else {
            ⟨Get one ring vertices for boundary vertex⟩
        }
    }

It's relatively easy to get the one-ring around an interior vertex: we loop over the faces adjacent to the vertex, and for each face grab the vertex following the center vertex.

⟨Get one ring vertices for interior vertex⟩ ≡
    SDFace *face = startFace;
    do {
        *P++ = face->nextVert(this)->P;
        face = face->nextFace(this);
    } while (face != startFace);

The one-ring around a boundary vertex is a bit more tricky. We will carefully store the one-ring in the given Point array so that the first and last entries in the array are the two adjacent vertices along the boundary. This ordering matters even though each entry is multiplied by the same weight β here: the boundary subdivision rule and the boundary tangent rules below weight the two boundary neighbors differently from the rest of the ring, and storing them in known positions lets those routines find them directly. This requires that we first loop around the neighboring faces until we reach a face on the boundary and then loop around the other way, storing vertices one by one.

Figure 3.16: Subdivision on a boundary edge: the new position for the vertex in the center is computed by weighting it and its two neighbor vertices by the weights shown.

⟨Get one ring vertices for boundary vertex⟩ ≡
    SDFace *face = startFace, *f2;
    while ((f2 = face->nextFace(this)) != NULL)
        face = f2;
    *P++ = face->nextVert(this)->P;
    do {
        *P++ = face->prevVert(this)->P;
        face = face->prevFace(this);
    } while (face != NULL);

For vertices on the boundary, the new vertex's position is based only on the two neighboring boundary vertices (Figure 3.16). By not depending on interior vertices, we ensure that two abutting surfaces that share the same vertices on the boundary will have abutting limit surfaces. The weightBoundary() utility function applies the given weight β to the two neighbor vertices v_1 and v_2 to compute the new position v' as:

    v' = (1 - 2\beta)\,v + \beta v_1 + \beta v_2

The same weight of 1/8 is used for both regular and extraordinary vertices.

⟨Apply boundary rule for even vertex⟩ ≡
    v[j]->child->P = weightBoundary(v[j], 1.f/8.f);
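Plugging the weight of 1/8 used here into the boundary rule gives

    v' = \Bigl(1 - \tfrac{2}{8}\Bigr)v + \tfrac{1}{8}v_1 + \tfrac{1}{8}v_2
       = \tfrac{3}{4}\,v + \tfrac{1}{8}(v_1 + v_2),

so the weights again sum to one, and only the vertex itself and its two boundary neighbors contribute.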
The weightBoundary() function applies the given weights at a boundary vertex. Because the oneRing() function orders the boundary vertex's one-ring such that the first and last entries are the boundary neighbors, the implementation here is particularly straightforward.

Figure 3.17: Subdivision rule for edge split: the position of the new odd vertex, marked with an "x," is found by weighting the two vertices at the ends of the edge and the two vertices opposite it on the adjacent triangles. On the left are the weights for an interior vertex; on the right are the weights for a boundary vertex.

⟨LoopSubdiv Method Definitions⟩ +≡
    Point LoopSubdiv::weightBoundary(SDVertex *vert, Float beta) {
        ⟨Put vert one-ring in Pring⟩
        Point P = (1-2*beta) * vert->P;
        P += beta * Pring[0];
        P += beta * Pring[valence-1];
        return P;
    }

Now we'll compute the positions of the odd vertices, the new vertices along the split edges of the mesh. We loop over each edge of each face in the mesh, computing the new vertex that splits the edge (Figure 3.17). For interior edges, the new vertex is found by weighting the two vertices at the ends of the edge (v_0 and v_1) and the two vertices across from the edge on the adjacent faces (v_2 and v_3). We loop through all three edges of each face, and each time we see an edge that hasn't been seen before, we compute and store the new odd vertex in the splitEdges associative array.

⟨Compute new odd edge vertices⟩ ≡
    map<SDEdge, SDVertex *> splitEdges;
    for (u_int j = 0; j < f.size(); ++j) {
        SDFace *face = f[j];
        for (int k = 0; k < 3; ++k) {
            ⟨Compute odd vertex on kth edge⟩
        }
    }

As we did when setting the face neighbor pointers in the original mesh, we create an SDEdge object for the edge and see if it is in the set of edges we've already visited. If it isn't, we compute the new vertex on this edge and add it to the map. The map is an associative array that supports efficient lookups; because the standard library implements map as a balanced tree, each lookup takes O(log n) time rather than constant time.

⟨Compute odd vertex on kth edge⟩ ≡
    SDEdge edge(face->v[k], face->v[NEXT(k)]);
    SDVertex *vert = splitEdges[edge];
    if (!vert) {
        ⟨Create and initialize new odd vertex⟩
        ⟨Apply edge rules to compute new vertex position⟩
        splitEdges[edge] = vert;
    }

In Loop subdivision, the new vertices added by subdivision are always regular. (This means that the proportion of extraordinary vertices to regular vertices decreases with each level of subdivision.) We can therefore immediately initialize the regular member of the new vertex. The boundary member can also be easily initialized, by checking whether there is a neighbor face across the edge that we're splitting. Finally, we'll go ahead and set the vertex's startFace pointer here. For all odd vertices on the edges of a face, the center child (child face number three) is guaranteed to be adjacent to the new vertex.

⟨Create and initialize new odd vertex⟩ ≡
    vert = new (vertexArena) SDVertex;
    newVertices.push_back(vert);
    vert->regular = true;
    vert->boundary = (face->f[k] == NULL);
    vert->startFace = face->children[3];
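An aside on the edge lookups above (and in the mesh-construction code earlier): they work only because SDEdge compares equal regardless of the order in which its two vertices are given. The actual SDEdge class is defined on page 104; the following hypothetical stand-in sketches the canonical-ordering idea it relies on:

    // Store the smaller pointer first, so EdgeKey(a, b) and
    // EdgeKey(b, a) produce identical keys for map and set lookups.
    struct EdgeKey {
        EdgeKey(SDVertex *a, SDVertex *b) {
            v[0] = min(a, b);
            v[1] = max(a, b);
        }
        bool operator<(const EdgeKey &e2) const {
            if (v[0] == e2.v[0]) return v[1] < e2.v[1];
            return v[0] < e2.v[0];
        }
        SDVertex *v[2];
    };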
For odd boundary vertices, the new vertex is just the average of the two adjacent vertices. For odd interior vertices, the two vertices at the ends of the edge are given weight 3/8, and the two vertices opposite the edge are given weight 1/8 (Figure 3.17). These last two vertices can be found using the SDFace::otherVert() utility, which returns the vertex opposite a given edge of a face.

⟨Apply edge rules to compute new vertex position⟩ ≡
    if (vert->boundary) {
        vert->P = 0.5f * edge.v[0]->P;
        vert->P += 0.5f * edge.v[1]->P;
    }
    else {
        vert->P = 3.f/8.f * edge.v[0]->P;
        vert->P += 3.f/8.f * edge.v[1]->P;
        vert->P += 1.f/8.f * face->otherVert(edge.v[0], edge.v[1])->P;
        vert->P += 1.f/8.f * face->f[k]->otherVert(edge.v[0], edge.v[1])->P;
    }

The SDFace::otherVert() method is straightforward:

Figure 3.18: Each face is split into four child faces, such that the ith child is adjacent to the ith vertex of the original face, and such that the ith child face's ith vertex is the child of the ith vertex of the original face. The vertices of the center child are oriented such that the ith vertex is the odd vertex along the ith edge of the parent face.

⟨SDFace Methods⟩ +≡
    SDVertex *otherVert(SDVertex *v0, SDVertex *v1) {
        for (int i = 0; i < 3; ++i)
            if (v[i] != v0 && v[i] != v1)
                return v[i];
        Severe("Basic logic error in SDFace::otherVert()");
        return NULL;
    }

Updating mesh topology

In order to keep the details of the topology update as straightforward as possible, the numbering scheme for the subdivided faces and their vertices has been chosen carefully; see Figure 3.18 for a summary. Review the figure carefully; these conventions are key to the next few pages. There are four main tasks required to update the topological pointers of the refined mesh:

1. The odd vertices' SDVertex::startFace pointers need to store a pointer to one of their adjacent faces.

2. Similarly, the even vertices' SDVertex::startFace pointers must be set.

3. The new faces' neighbor f[i] pointers need to be set to point to the neighboring faces.

4. The new faces' v[i] pointers need to point to the incident vertices.

We already initialized the startFace pointers of the odd vertices when we first created them; we'll handle the other three tasks in order here.

⟨Update new mesh topology⟩ ≡
    ⟨Update even vertex face pointers⟩
    ⟨Update face neighbor pointers⟩
    ⟨Update face vertex pointers⟩

We will first set the startFace pointer for the children of the even vertices. If a vertex is the ith vertex of its startFace, then it is guaranteed to be adjacent to the ith child face of startFace. Therefore, we just need to loop through all the parent vertices in the mesh, and for each one find its vertex index in its startFace. This index can then be used to find the child face adjacent to the new even vertex.

⟨Update even vertex face pointers⟩ ≡
    for (u_int j = 0; j < v.size(); ++j) {
        SDVertex *vert = v[j];
        int vertNum = vert->startFace->vnum(vert);
        vert->child->startFace = vert->startFace->children[vertNum];
    }

Next we update the face neighbor pointers for the newly created faces. We break this into two steps: one to update neighbors among children of the same parent, and one to handle neighbors across children of different parents. This involves some tricky pointer manipulation.
⟨Update face neighbor pointers⟩ ≡
    for (u_int j = 0; j < f.size(); ++j) {
        SDFace *face = f[j];
        for (int k = 0; k < 3; ++k) {
            ⟨Update children f pointers for siblings⟩
            ⟨Update children f pointers for neighbor children⟩
        }
    }

For the first step, recall that the interior child face is always stored in children[3]. Furthermore, the (k+1)st child face (for k = 0, 1, 2) is across the kth edge of the interior face, and the interior face is across the (k+1)st edge of the kth face.

⟨Update children f pointers for siblings⟩ ≡
    face->children[3]->f[k] = face->children[NEXT(k)];
    face->children[k]->f[NEXT(k)] = face->children[3];

We'll now update the children's face neighbor pointers that point to children of other parents. Only the first three children need to be addressed here; the interior child's neighbor pointers have already been fully initialized. Inspection of Figure 3.18 reveals that the kth and PREV(k)th edges of the kth child need to be set. To set the kth edge of the kth child, we first find the kth edge of the parent face, then the neighboring parent f2 across that edge. If f2 exists (meaning we aren't on a boundary), we find f2's index for the vertex v[k]. That index is equal to the index of the neighbor child we are searching for. We then repeat this process to find the child across the PREV(k)th edge.

⟨Update children f pointers for neighbor children⟩ ≡
    SDFace *f2 = face->f[k];
    face->children[k]->f[k] =
        f2 ? f2->children[f2->vnum(face->v[k])] : NULL;
    f2 = face->f[PREV(k)];
    face->children[k]->f[PREV(k)] =
        f2 ? f2->children[f2->vnum(face->v[k])] : NULL;

Finally, we handle the fourth step in the topological updates: setting the children's v[i] vertex pointers.

⟨Update face vertex pointers⟩ ≡
    for (u_int j = 0; j < f.size(); ++j) {
        SDFace *face = f[j];
        for (int k = 0; k < 3; ++k) {
            ⟨Update child vertex pointer to new even vertex⟩
            ⟨Update child vertex pointer to new odd vertex⟩
        }
    }

For the kth child face (for k = 0, 1, 2), the kth vertex corresponds to the even vertex that is adjacent to it. (For the non-interior child faces, there is one even vertex and two odd vertices; for the interior child face, there are three odd vertices.) We can find this vertex by following the child pointer of the parent vertex, available from the parent face.

⟨Update child vertex pointer to new even vertex⟩ ≡
    face->children[k]->v[k] = face->v[k]->child;

To update the rest of the vertex pointers, we re-use the splitEdges associative array to find the odd vertex for each split edge of the parent face. Three child faces have that vertex as an incident vertex. Fortunately, the vertex indices for the three faces are easily found, again based on the numbering scheme established in Figure 3.18.

⟨Update child vertex pointer to new odd vertex⟩ ≡
    SDVertex *vert = splitEdges[SDEdge(face->v[k], face->v[NEXT(k)])];
    face->children[k]->v[NEXT(k)] = vert;
    face->children[NEXT(k)]->v[k] = vert;
    face->children[3]->v[k] = vert;
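Given how intricate this pointer manipulation is, a consistency check can be valuable during development. The following hypothetical check (not part of lrt, using the standard assert macro) could be run after the topology update; it verifies that neighbor links are mutual, as they must be in a manifold mesh:

    // If face A lists B as a neighbor, then B must list A back.
    for (u_int j = 0; j < newFaces.size(); ++j) {
        SDFace *face = newFaces[j];
        for (int k = 0; k < 3; ++k) {
            SDFace *neighbor = face->f[k];
            if (!neighbor) continue;  // boundary edge, no neighbor
            bool mutual = false;
            for (int m = 0; m < 3; ++m)
                if (neighbor->f[m] == face) mutual = true;
            assert(mutual);
        }
    }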
After the geometric and topological work has been done for a subdivision step, we move the newly created vertices and faces into the v and f arrays, deleting the old ones, since we no longer need them. We only do these deletions after the first time through the loop, however; the original faces and vertices of the control mesh are left intact.

Figure 3.19: To push a boundary vertex onto the limit surface, we apply the weights shown to the vertex and its neighbors along the edge.

⟨Prepare for next level of subdivision⟩ ≡
    #if 0
    if (i != 0) {
        for (u_int j = 0; j < f.size(); ++j)
            delete f[j];
        for (u_int j = 0; j < v.size(); ++j)
            delete v[j];
    }
    #endif
    f = newFaces;
    v = newVertices;

(The explicit deletions are disabled here: because the intermediate vertices and faces are allocated from vertexArena and faceArena, their memory is reclaimed all at once when the arenas go out of scope at the end of Refine(), making the per-level deletes unnecessary.)

To the limit surface and output

One of the remarkable properties of subdivision surfaces is that there are special subdivision rules that let us compute the positions that the vertices of the mesh would have if we continued subdividing infinitely. We apply these rules here to initialize an array of limit surface positions, Plimit. Note that it's important to temporarily store the limit surface positions somewhere other than in the vertices while the computation is taking place: because the limit surface position of each vertex depends on the original positions of its surrounding vertices, the original positions of all vertices must remain unchanged until the computation is done.

The limit rule for a boundary vertex weights the two neighbor vertices by 1/5 and the center vertex by 3/5 (Figure 3.19); the rule for interior vertices is based on a function gamma(), which computes appropriate vertex weights based on the valence of the vertex.

⟨Push vertices to limit surface⟩ ≡
    Point *Plimit = new Point[v.size()];
    for (u_int i = 0; i < v.size(); ++i) {
        if (v[i]->boundary)
            Plimit[i] = weightBoundary(v[i], 1.f/5.f);
        else
            Plimit[i] = weightOneRing(v[i], gamma(v[i]->valence()));
    }
    for (u_int i = 0; i < v.size(); ++i)
        v[i]->P = Plimit[i];

⟨LoopSubdiv Private Methods⟩ +≡
    static Float gamma(int valence) {
        return 1.f / (valence + 3.f / (8.f * beta(valence)));
    }

In order to generate a smooth-looking triangle mesh with per-vertex surface normals, we'll also compute a pair of non-parallel tangent vectors at each vertex. As with the limit rule for positions, this is an analytic computation that gives the precise tangents on the actual limit surface.

⟨Compute vertex tangents on limit surface⟩ ≡
    vector<Normal> Ns;
    Ns.reserve(v.size());
    for (u_int i = 0; i < v.size(); ++i) {
        SDVertex *vert = v[i];
        Vector S(0,0,0), T(0,0,0);
        ⟨Put vert one-ring in Pring⟩
        if (!vert->boundary) {
            ⟨Compute tangents of interior face⟩
        }
        else {
            ⟨Compute tangents of boundary face⟩
        }
        Ns.push_back(Normal(Cross(S, T)));
    }

Figure 3.20 shows the setting for computing tangents in the mesh interior. The center vertex is given a weight of zero and the neighbors are given weights w_i. To compute the first tangent vector S, the weights are

    w_i = \cos\!\left(\frac{2\pi i}{n}\right),

where n is the valence of the vertex. The second tangent, T, is computed with weights

    w_i = \sin\!\left(\frac{2\pi i}{n}\right).

Figure 3.20: To compute tangents for interior vertices, the one-ring vertices are weighted with weights w_i. The center vertex, where the tangent is being computed, always has a weight of 0.

⟨Compute tangents of interior face⟩ ≡
    for (int k = 0; k < valence; ++k) {
        S += cosf(2.f*M_PI*k/valence) * Vector(Pring[k]);
        T += sinf(2.f*M_PI*k/valence) * Vector(Pring[k]);
    }
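One way to see that these weights are sensible (an observation, not part of lrt's code): for any valence n ≥ 2,

    \sum_{k=0}^{n-1} \cos\frac{2\pi k}{n} \;=\; \sum_{k=0}^{n-1} \sin\frac{2\pi k}{n} \;=\; 0,

since these are the real and imaginary parts of the sum of the nth roots of unity. Because each rule's weights sum to zero, the weighted combination of points is translation invariant and therefore defines a true vector, which is exactly what a tangent should be.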
Tangents on boundary vertices are a bit trickier; Figure 3.21 shows the ordering of vertices in the one-ring expected in the discussion below. The first tangent, known as the across tangent, is given by the vector between the two neighboring boundary vertices:

    S = v_{n-1} - v_0.

The second tangent, known as the transverse tangent, is computed based on the vertex's valence. The center vertex is given a weight w_c, which can be zero. The one-ring vertices are given weights specified by a vector (w_0, w_1, \ldots, w_{n-1}). The transverse tangent rules we will use are:

    valence        w_c    (w_0, ..., w_{n-1})
    2              -2     (1, 1)
    3              -1     (0, 1, 0)
    4 (regular)    -2     (-1, 2, 2, -1)

For valences of 5 and higher, w_c = 0 and

    w_0 = w_{n-1} = \sin\theta, \qquad w_i = (2\cos\theta - 2)\,\sin(\theta i),

where

    \theta = \frac{\pi}{n-1}.

Figure 3.21: Tangents at boundary vertices are also computed as weighted averages of the adjacent vertices. However, some of the boundary tangent rules incorporate the value of the center vertex.

⟨Compute tangents of boundary face⟩ ≡
    S = Pring[valence-1] - Pring[0];
    if (valence == 2)
        T = Vector(Pring[0] + Pring[1] - 2 * vert->P);
    else if (valence == 3)
        T = Pring[1] - vert->P;
    else if (valence == 4) // regular
        T = Vector(-1*Pring[0] + 2*Pring[1] + 2*Pring[2] +
            -1*Pring[3] + -2*vert->P);
    else {
        Float theta = M_PI / float(valence-1);
        T = Vector(sinf(theta) * (Pring[0] + Pring[valence-1]));
        for (int k = 1; k < valence-1; ++k) {
            Float wt = (2 * cosf(theta) - 2) * sinf((k) * theta);
            T += Vector(wt * Pring[k]);
        }
        T = -T;
    }

Finally, the fragment ⟨Create TriangleMesh from subdivision mesh⟩ creates the triangle mesh object and adds it to the refined vector passed to the LoopSubdiv::Refine() method. We won't include it here, since it's just a straightforward transformation of the subdivided mesh into an indexed triangle mesh.

Further Reading

Introduction to Ray Tracing has an extensive survey of algorithms for ray–shape intersection (Glassner 1989). Heckbert has written a technical report that discusses the mathematics of quadrics for graphics applications in detail, with many citations to the literature in mathematics and other fields (Heckbert 1984). Hanrahan describes a system that automates the process of deriving a ray intersection routine for surfaces defined by implicit polynomials; his system emits C source code to perform the intersection test and normal computation for a surface described by a given equation (Hanrahan 1983). Other notable early papers include Kajiya's work on computing intersections with surfaces of revolution and procedurally generated fractal terrains (Kajiya 1983) and his technique for computing intersections with parametric patches (Kajiya 1982). More recently, Stürzlinger and others have done work on more efficient techniques for direct ray intersection with patches (Stürzlinger 1998). The ray–triangle intersection test in Section 3.6 was developed by Möller and Trumbore (Möller and Trumbore 1997).

The notion of shapes that repeatedly refine themselves into collections of other shapes until ready for rendering was first introduced in the REYES renderer (Cook, Carpenter, and Catmull 1987). Pharr et al. applied a similar approach to a ray tracer (Pharr, Kolb, Gershbein, and Hanrahan 1997).

An excellent introduction to differential geometry is Gray's book (Gray 1993); Section 14.3 of it presents the Weingarten equations. Turkowski's technical report has expressions for first and second derivatives of a handful of parametric primitives (Turkowski 1990a).

The Loop subdivision method was originally developed by Charles Loop (Loop 1987).
Our implementation uses the improved rules for subdivision and tangents along boundary edges developed by Hoppe et al. (Hoppe, DeRose, Duchamp, Halstead, Jin, McDonald, Schweitzer, and Stuetzle 1994). There has been extensive work in subdivision recently; the SIGGRAPH course notes give a good summary of the state of the art and also have extensive references (Zorin, Schröder, DeRose, Kobbelt, Levin, and Sweldens 2000). Procedural stochastic models were introduced by Fournier et al. (Fournier, Fussel, and Carpenter 1982).

Exercises

3.1 One nice property of mesh-based shapes like triangle meshes and subdivision surfaces is that we can transform the shape's vertices into world space, so that it isn't necessary to transform rays into object space before performing ray intersection tests. Interestingly enough, it is possible to do the same thing for ray–quadric intersections. The implicit forms of the quadrics in this chapter were all of the form

    Ax^2 + Bxy + Cxz + Dy^2 + Eyz + Fz^2 + G = 0,

where some of the constants A through G were zero. More generally, we can define quadric surfaces by the equation

    Ax^2 + By^2 + Cz^2 + 2Dxy + 2Eyz + 2Fxz + 2Gx + 2Hy + 2Iz + J = 0,

where most of the parameters A through J don't directly correspond to the A through G above. In this form, the quadric can be represented by a 4 × 4 symmetric matrix Q:

    p^T Q\, p =
    \begin{pmatrix} x & y & z & 1 \end{pmatrix}
    \begin{pmatrix}
    A & D & F & G \\
    D & B & E & H \\
    F & E & C & I \\
    G & H & I & J
    \end{pmatrix}
    \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = 0.

(For example, a sphere of radius r centered at the origin, x^2 + y^2 + z^2 - r^2 = 0, corresponds to A = B = C = 1, J = -r^2, and all other entries zero.) Given this representation, first show that the matrix Q' representing a quadric transformed by the matrix M is

    Q' = (M^{-1})^T\, Q\, M^{-1}.

To do so, show that for any point p where p^T Q p = 0, if we apply a transformation M to p and compute p' = Mp, we'd like to find Q' so that (p')^T Q' p' = 0. Next, substitute the ray equation into the more general quadric equation above to compute coefficients a, b, and c for the quadratic equation in terms of entries of the matrix Q to pass to the Quadratic() function. Now implement this approach in lrt and use it instead of the original quadric intersection routines. Note that you will still need to transform the resulting world-space hit points into object space to test against θ_max, if it is not 2π, and so on. How does performance compare to the original scheme?

3.2 Improve the object-space bounding box routines for the quadrics to properly account for θ_max < 2π.

3.3 There is room to optimize the implementations of the various quadric primitives in lrt in a number of ways. For example, for complete spheres (i.e., not partial spheres with limited z and φ ranges), some of the tests in the intersection routine are unnecessary. Furthermore, many of the quadrics have excess calls to trigonometric functions that could be turned into simpler expressions using insight about the geometry of the particular primitives. Investigate ways to speed up these methods. How much does this improve the overall runtime of lrt?

3.4 Currently lrt recomputes the partial derivatives ∂p/∂u and ∂p/∂v for triangles every time they are needed, even though they are constant for each triangle. Precompute these vectors and analyze the speed/storage tradeoff, especially for large triangle meshes. How does the depth complexity of the scene affect this tradeoff?

3.5 Implement a general polygon primitive. lrt currently transforms polygons with more than three vertices into a collection of triangles by XXX. This is actually only correct for convex polygons without holes. Support all kinds of polygons as a first-class primitive. One approach: compute the polygon's plane equation from a normal and a point on the plane, then intersect the ray with the plane the polygon sits in. Project that hit point and the polygon vertices to 2D, and then apply a 2D point-in-polygon test. An easy such test is to essentially ray trace in 2D: intersect a 2D ray with each of the edge segments and count how many it goes through; if the number is odd, the point is inside the polygon and there is an intersection (Figure 3.22). Haines has written an article that surveys a number of approaches for efficient point-in-polygon tests (Haines 1994); some of the techniques described there may be helpful for optimizing this test. Schneider and Eberly discuss strategies for getting all the corner cases right, e.g. for when the 2D ray is aligned precisely with an edge of the polygon (Schneider and Eberly 2003, Section XX).

Figure 3.22: Polygon projection onto plane for intersection.
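A minimal sketch of the 2D crossing test just described, assuming the polygon vertices and the plane hit point have already been projected to 2D (all names here are hypothetical, not part of lrt):

    // Even-odd test: cast a ray in the +x direction from p and count
    // how many polygon edges it crosses; an odd count means inside.
    bool InsidePolygon2D(const float (*poly)[2], int n,
                         const float p[2]) {
        bool inside = false;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Only edges that straddle the line y = p[1] can cross.
            if ((poly[i][1] > p[1]) != (poly[j][1] > p[1])) {
                // x coordinate where the edge crosses y = p[1].
                float t = (p[1] - poly[j][1]) /
                          (poly[i][1] - poly[j][1]);
                float x = poly[j][0] + t * (poly[i][0] - poly[j][0]);
                if (p[0] < x) inside = !inside;
            }
        }
        return inside;
    }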
3.6 Extend the subdivision surface implementation to support "crease" and "hole" features. A crease is specified by a chain of edges, given as n integer vertex indices, together with a single float giving its sharpness (possibly infinity); along creased edges, use the boundary subdivision rules so that a sharp feature is created there. A "hole" is a per-face property that is inherited by a face's children during subdivision; faces marked as holes are simply not output in the final mesh.

3.7 Implement adaptive subdivision for the subdivision surface Shape. A weakness of the basic implementation is that each face is always refined a fixed number of times: this may mean that some faces are under-refined, leading to visible faceting in the triangle mesh, and some faces are over-refined, leading to excessive memory use and rendering time. Instead, stop subdividing faces once a particular error threshold has been reached. An easy error threshold to implement computes the face normals of each face and its directly adjacent faces; if they are sufficiently close to each other (e.g. as tested via dot products, as sketched below), then the limit surface for that face will be reasonably flat. The trickiest part of this exercise is that some faces that don't need subdivision according to the flatness test will still need to be subdivided in order to provide vertices so that neighboring faces that do need to subdivide can get their vertex one-rings. In particular, adjacent faces can differ by no more than one level of subdivision.
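As a concrete starting point for the flatness test in exercise 3.7, here is a hypothetical sketch; the FaceNormal() helper is not part of lrt, and cosThreshold is the cosine of the maximum allowed angle between neighboring face normals:

    // Geometric normal of a triangular face from two edge vectors.
    static Vector FaceNormal(SDFace *f) {
        return Normalize(Cross(f->v[1]->P - f->v[0]->P,
                               f->v[2]->P - f->v[0]->P));
    }
    // A face is "flat enough" if its normal is nearly parallel to
    // the normals of all of its existing neighbors.
    static bool FlatEnough(SDFace *face, Float cosThreshold) {
        Vector n = FaceNormal(face);
        for (int k = 0; k < 3; ++k) {
            if (!face->f[k]) continue;  // boundary edge, no neighbor
            if (Dot(n, FaceNormal(face->f[k])) < cosThreshold)
                return false;
        }
        return true;
    }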
3.8 Use the triangular face refinement infrastructure from the LoopSubdiv shape to implement displacement mapping. Displacement mapping is a technique related to bump mapping, where an offset function is defined over the entire surface. Rather than just adjusting the surface normal as in bump mapping, the actual surface shape is modified by displacement mapping. The usual approach to displacement mapping is to finely tessellate the geometric shape and to then evaluate the displacement function at its vertices, moving each vertex the given distance along its normal. Because displacement mapping may make the extent of the shape larger, the bounding box of the un-displaced shape will need to be expanded by the maximum displacement distance that a particular displacement function will ever generate. Refine each face of the mesh until, when projected onto the image, it is roughly the size of the separation between pixels. To do this, you will need to be able to estimate the image pixel-based length of an edge in the scene when it is projected onto the screen. After you have done this, use the texturing infrastructure in Chapter 11 to evaluate displacement functions.

3.9 Implement constructive solid geometry (CSG).

3.10 Ray tracing point-sampled geometry: extending methods for rendering complex models represented as collections of point samples (Levoy and Whitted 1995; Pfister, Zwicker, van Baar, and Gross 2000; Rusinkiewicz and Levoy 2000), Schaufler and Jensen recently described a method for intersecting rays with collections of oriented point samples in space (Schaufler and Jensen 2000). They probabilistically determine that an intersection has occurred when a ray approaches a sufficient local density of point samples and compute a surface normal with a weighted average of the nearby samples. Read their paper and extend lrt to support a point-sampled geometry shape. Do any of lrt's basic interfaces need to be extended or generalized to support a shape like this?

3.11 Ray tracing ribbons: hair is often modeled as a collection of generalized cylinders, which are defined as the cylinder that results from sweeping a disk along a given curve. Because there are often a large number of individual hairs, an efficient method for intersecting rays with generalized cylinders is needed for ray tracing hair. A number of methods have been developed to compute ray intersections with generalized cylinders (Bronsvoort and Klok 1985; de Voogt, van der Helm, and Bronsvoort 2000); investigate these algorithms and extend lrt to support a fast hair primitive with one of them. Alternatively, investigate the generalization of Schaufler and Jensen's approach for probabilistic point intersection (Schaufler and Jensen 2000) to probabilistic line intersection and apply this to fast ray tracing of hair.

3.12 Implicit functions: more general implicit functions, and sums of them, can define complex surfaces; they are well suited to modeling molecules, water drops, and the like. This approach was introduced by Blinn (Blinn 1982a). Wyvill and Wyvill give a new falloff function with a number of advantages (Wyvill and Wyvill 1989). Kalra and Barr (Kalra and Barr 1989) and Hart (Hart 1996) give methods for ray tracing them. Investigate these methods and extend lrt to support implicit surfaces.

3.13 Procedurally described parametric surfaces: write a Shape that takes an expression of the form f(u, v) → (x, y, z) that describes a parametric surface as a function of (u, v) position. Evaluate the given function at a grid of (u, v) positions to create a TriangleMesh that approximates the given surface when the Shape::Refine() method is called.

3.14 Generative modeling: Snyder and Kajiya have described an elegant mathematical framework for procedurally described geometric shapes (Snyder and Kajiya 1992; Snyder 1992); investigate this approach and apply it to procedural shape description in lrt.

3.15 L-systems: a very successful technique for procedurally describing plants was first introduced to graphics by Alvy Ray Smith (Smith 1984), who applied Lindenmayer systems (L-systems) to describing branching plant structures. L-systems describe the branching structure of these types of shapes via a grammar. Prusinkiewicz and collaborators have generalized this approach to encompass XXX (Prusinkiewicz, Mündermann, Karwowski, and Lane 2001; Deussen, Hanrahan, Lintermann, Mech, Pharr, and Prusinkiewicz 1998; Prusinkiewicz, James, and Mech 1994; Prusinkiewicz 1986).

4 Primitives and Intersection Acceleration

The classes described in the last chapter focus exclusively on representing the geometric properties of 3D objects.
Although the Shape class is a convenient abstraction for geometric operations such as intersection and bounding, it is insufficient for direct use in a rendering system. To construct a scene, we must be able to place individual primitives at specific locations in world coordinates. In addition, we need to bind material properties to each primitive so that we can specify their appearance. To accomplish these goals, we introduce the Primitive class and provide three separate implementations.

Shapes to be rendered directly are represented by the GeometricPrimitive class. This class, in addition to placing the shape within the scene, also contains a description of the shape's appearance properties. So that the geometric and shading portions of lrt can be cleanly separated, these appearance properties are encapsulated in the Material class, which is described in Chapter 10.

Some scenes contain many instances of the same geometry at different locations. Direct support for instancing can greatly reduce the memory requirements for such scenes, since we only need to store a pointer to the geometry for each primitive. lrt provides the InstancePrimitive class for this task; each InstancePrimitive has a separate Transform to place it in the scene, but can share geometry with other InstancePrimitives. This allows us to render extremely complex scenes such as the ecosystem scene in Figure ??.

Finally, we provide the Aggregate class, which can hold many Primitives. Although this can be just a convenient way to group geometry, lrt uses this class to implement acceleration structures, which are techniques for avoiding the O(n) linear complexity of testing a ray against all n objects in a scene. Since a ray through a scene will typically intersect only a handful of the primitives and will be nowhere near most of the others, there is substantial room for improvement compared to naively performing a ray intersection test with each primitive. Another benefit of re-using the Primitive interface for these acceleration structures is that lrt can support hybrid approaches where an accelerator of one type holds accelerators of another type. This chapter will describe the implementation of two accelerators, one (GridAccel) based on overlaying a uniform grid over the scene, and the other (KdTreeAccel) based on recursive spatial subdivision.

4.1 Geometric Primitives

The abstract Primitive base class is really the bridge between the geometry processing and shading subsystems of lrt. In order to avoid complex logic about when Primitives can be destroyed, it inherits from the ReferenceCounted base class, which automatically tracks how many references there are to an object, freeing its storage when the last reference goes out of scope. Rather than storing pointers to these primitives, holding a Reference<Primitive> ensures that the reference counts are computed correctly. The Reference class otherwise behaves as if it were a pointer to a Primitive.

⟨Primitive Declarations⟩ ≡
    class Primitive : public ReferenceCounted {
    public:
        ⟨Primitive Interface⟩
    };
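A hypothetical snippet (not lrt code) illustrating the reference-counting convention, using the GeometricPrimitive class that will be introduced in Section 4.1.1:

    void Example(const Reference<Shape> &shape,
                 const Reference<Material> &material) {
        // The Reference takes ownership of the new primitive.
        Reference<Primitive> prim =
            new GeometricPrimitive(shape, material, NULL);
        Reference<Primitive> second = prim;  // count is now two
        // When both references go out of scope at the end of this
        // function, the GeometricPrimitive is freed automatically.
    }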
Because the Primitive class connects geometry and shading, its interface contains methods related to both. There are five geometric routines, all of which are similar to a corresponding Shape method. The first, Primitive::WorldBound(), returns a box that encloses the primitive's geometry in world space. There are many uses for such a bound; we use it to place the Primitive in the acceleration data structures.

⟨Primitive Interface⟩ +≡
    virtual BBox WorldBound() const = 0;

Similarly to the Shape class, all primitives must be able either to determine whether a given ray intersects their geometry, or else to refine themselves into one or more new primitives. Like the Shape interface, we provide the Primitive::CanIntersect() method so lrt can determine whether the underlying geometry is intersectable or not.

One difference from the Shape interface is that the Primitive intersection methods return Intersection structures rather than DifferentialGeometry. These Intersection structures hold more information about the intersection than just the local coordinate frame, such as a pointer to the material properties at the hit point. Another difference is that Shape::Intersect() returns the parametric distance along the ray to the intersection in a Float * output variable, while Primitive::Intersect() is responsible for updating Ray::maxt with this value if an intersection is found. In this way, the geometric routines from the last chapter do not need to know how the parametric distance will be used by the rest of the system.

⟨Primitive Interface⟩ +≡
    virtual bool CanIntersect() const;
    virtual bool Intersect(const Ray &r, Intersection *in) const = 0;
    virtual bool IntersectP(const Ray &r) const = 0;
    virtual void Refine(vector<Reference<Primitive> > &refined) const;

The Intersection structure holds information about a ray–primitive intersection, including the differential geometry of the point on the surface, a pointer to the Primitive that the ray hit, and its world-to-object-space transformation.

⟨Primitive Declarations⟩ +≡
    struct Intersection {
        ⟨Intersection Public Methods⟩
        DifferentialGeometry dg;
        const Primitive *primitive;
        Transform WorldToObject;
    };

It may be necessary to repeatedly refine a primitive until all of the primitives it has returned are themselves intersectable. The Primitive::FullyRefine() utility method handles this task. Its implementation is straightforward: we maintain a queue of primitives to be refined (called todo in the code below) and invoke the Primitive::Refine() method repeatedly on entries in that queue. Intersectable Primitives returned by Primitive::Refine() are placed on the output queue, while non-intersectable ones are placed on the todo list.

⟨Primitive Interface⟩ +≡
    void FullyRefine(vector<Reference<Primitive> > &refined) const;

⟨Primitive Method Definitions⟩ +≡
    void Primitive::FullyRefine(
            vector<Reference<Primitive> > &refined) const {
        vector<Reference<Primitive> > todo;
        todo.push_back(const_cast<Primitive *>(this));
        while (todo.size()) {
            ⟨Refine last primitive in todo list⟩
        }
    }

⟨Refine last primitive in todo list⟩ ≡
    Reference<Primitive> prim = todo.back();
    todo.pop_back();
    if (prim->CanIntersect())
        refined.push_back(prim);
    else
        prim->Refine(todo);
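A hypothetical usage of this method, e.g. in code that is about to build an acceleration structure:

    void PrepareForAccelerator(const Reference<Primitive> &prim,
                               vector<Reference<Primitive> > &out) {
        // FullyRefine() keeps refining until every primitive that it
        // appends to out reports CanIntersect() == true.
        prim->FullyRefine(out);
    }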
In addition to the geometric methods, a Primitive object has two methods related to its material properties. The first, Primitive::GetAreaLight(), returns a pointer to the AreaLight that describes the primitive's emission distribution, if the primitive is itself a light source. If the primitive is not emissive, this method returns NULL. The second method, Primitive::GetBSDF(), returns a representation of the light-scattering properties of the material at the given point on the surface in a BSDF object. In addition to the differential geometry at the hit point, it takes the world-to-object-space transformation as a parameter. This information will be required by the InstancePrimitive class, described later in this chapter.

⟨Primitive Interface⟩ +≡
    virtual const AreaLight *GetAreaLight() const = 0;
    virtual BSDF *GetBSDF(const DifferentialGeometry &dg,
        const Transform &WorldToObject) const = 0;

Given the Primitive::GetAreaLight() method, we will add a method to the Intersection class that makes it easy to compute the emitted radiance at a surface point.

⟨Intersection Method Definitions⟩ ≡
    Spectrum Intersection::Le(const Vector &w) const {
        const AreaLight *area = primitive->GetAreaLight();
        return area ? area->L(dg.p, dg.nn, w) : Spectrum(0.);
    }

4.1.1 Geometric Primitive

The GeometricPrimitive class represents a single shape (e.g. a sphere) in the scene. One GeometricPrimitive is allocated for each shape in the scene description provided by the user.

⟨Primitive Declarations⟩ +≡
    class GeometricPrimitive : public Primitive {
    public:
        ⟨GeometricPrimitive Public Methods⟩
    private:
        ⟨GeometricPrimitive Private Data⟩
    };

Each GeometricPrimitive holds a reference to a Shape and its Material. In addition, because primitives in lrt may be area light sources, we store a pointer to an AreaLight object that describes its emission characteristics (this pointer is set to NULL if the primitive does not emit light).

⟨GeometricPrimitive Private Data⟩ ≡
    Reference<Shape> shape;
    Reference<Material> material;
    AreaLight *areaLight;

The GeometricPrimitive constructor just initializes these variables from the parameters passed to it; its implementation is omitted.

⟨GeometricPrimitive Public Methods⟩ ≡
    GeometricPrimitive(const Reference<Shape> &s,
        const Reference<Material> &m, AreaLight *a);

Most of the methods of the Primitive interface related to geometric processing are simply forwarded to the corresponding Shape method. For example, GeometricPrimitive::Intersect() calls the Shape::Intersect() method of its enclosed Shape to do the actual geometric intersection, and initializes an Intersection object to describe the hit found, if any. We also use the returned parametric hit distance to update the Ray::maxt member. The primary advantage of storing the distance to the closest hit in Ray::maxt is that we may be able to quickly reject any intersections that lie farther along the ray than any already found.
⟨GeometricPrimitive Method Definitions⟩ +≡
    bool GeometricPrimitive::Intersect(const Ray &r,
            Intersection *isect) const {
        Float thit;
        if (shape->Intersect(r, &thit, &isect->dg)) {
            isect->primitive = this;
            isect->WorldToObject = shape->WorldToObject;
            r.maxt = thit;
            return true;
        }
        return false;
    }

We won't include the implementations of GeometricPrimitive::WorldBound(), GeometricPrimitive::IntersectP(), GeometricPrimitive::CanIntersect(), or GeometricPrimitive::Refine() here; they just forward these requests on to the Shape in a similar manner. GeometricPrimitive::GetAreaLight() just returns the GeometricPrimitive::areaLight member. GeometricPrimitive::GetBSDF() is implemented in Section 10.2, after the Texture and BSDF classes have been described.

4.1.2 Object Instancing

Object instancing is a classic technique in rendering that re-uses multiple transformed copies of a single collection of geometry at multiple positions in a scene. For example, in a model of a concert hall with thousands of identical seats, the scene description can be compressed substantially if all of the seats refer to a shared geometric representation of a single seat. The ecosystem scene in Figure ?? has over four thousand individual plants of various types, though only sixty-one unique plant models. Because each plant model is instanced multiple times, the complete scene has 19.5 million triangles total, though only 1.1 million triangles are stored in memory. Thanks to primitive reuse through object instancing, lrt uses only approximately 300 MB for rendering this scene.

Object instancing is handled by the InstancePrimitive class. It takes a reference to the shared Primitive that represents the instanced model, and the instance-to-world-space transformation that places it in the scene. If the geometry to be instanced is contained in multiple Primitives, the calling code is responsible for placing them in an Aggregate. The InstancePrimitive also requires that the primitive be intersectable; it would be a waste of time and memory for all of the instances to individually refine the primitive. (One limitation of this design is that refinement can no longer happen lazily: the shared primitive must be refined up front, even if no ray ever comes near any of its instances.) See the lrtObjectInstance() function in Appendix B.3.5 for the code that creates instances based on the scene description file, refining and creating aggregates as described here.

⟨Primitive Declarations⟩ +≡
    class InstancePrimitive : public Primitive {
    public:
        ⟨InstancePrimitive Public Methods⟩
    private:
        ⟨InstancePrimitive Private Data⟩
    };

⟨InstancePrimitive Public Methods⟩ ≡
    InstancePrimitive(Reference<Primitive> &i, const Transform &i2w) {
        instance = i;
        InstanceToWorld = i2w;
        WorldToInstance = i2w.GetInverse();
    }

⟨InstancePrimitive Private Data⟩ ≡
    Reference<Primitive> instance;
    Transform InstanceToWorld, WorldToInstance;

Here, "instance space" refers to the coordinate system in which the shared primitive's geometry is defined; the InstanceToWorld transformation maps instance space into the scene's world space.
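A hypothetical example of how instancing might be used, placing two copies of one shared model in a scene; the Translate() function used here is lrt's standard transformation helper, while the surrounding code is illustrative only:

    void PlaceChairs(Reference<Primitive> &chairModel,
                     vector<Reference<Primitive> > &scenePrims) {
        // Both instances share chairModel's geometry; only the
        // transformations differ.
        Transform t1 = Translate(Vector(5, 0, 0));
        scenePrims.push_back(new InstancePrimitive(chairModel, t1));
        Transform t2 = Translate(Vector(-5, 0, 0));
        scenePrims.push_back(new InstancePrimitive(chairModel, t2));
    }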
The InstancePrimitive::Intersect() and InstancePrimitive::IntersectP() methods transform the given ray from world space to instance space before passing it along to the shared primitive. If an intersection is found, the routines transform the returned differential geometry back out into "true" world space and update the Intersection::WorldToObject transformation to be the correct full transformation from world space to object space. This way, the instanced primitive is unaware that its concept of "world space" is actually not the real scene world space; the InstancePrimitive does the necessary work so that instances behave as expected.

⟨InstancePrimitive Method Definitions⟩ ≡
    bool InstancePrimitive::Intersect(const Ray &r,
            Intersection *isect) const {
        Ray ray = WorldToInstance(r);
        if (instance->Intersect(ray, isect)) {
            r.maxt = ray.maxt;
            isect->WorldToObject = isect->WorldToObject * WorldToInstance;
            ⟨Transform instance's differential geometry to world space⟩
            return true;
        }
        return false;
    }

⟨InstancePrimitive Public Methods⟩ +≡
    BBox WorldBound() const {
        return InstanceToWorld(instance->WorldBound());
    }

Finally, the InstancePrimitive::GetAreaLight() and InstancePrimitive::GetBSDF() methods should never be called; the corresponding methods of the primitive that the ray actually hit will be called instead. Their implementations (not shown here) simply result in a runtime error.

4.2 Aggregates

Only the most trivial ray tracing systems do not contain some sort of acceleration structure. Without one, tracing a ray through a scene would take O(n) time, since the ray would need to be tested against each primitive in turn, looking for the closest intersection. However, in most scenes this is extremely wasteful, since the ray passes nowhere near the vast majority of primitives. The goal of acceleration structures is to allow the quick, simultaneous rejection of groups of primitives, and also to order the search process so that nearby intersections are likely to be found first.

The Aggregate class provides an interface for grouping multiple Primitive objects together. Because Aggregates themselves support the Primitive interface, no special support is required elsewhere in lrt for acceleration. In fact, by implementing acceleration in this way, it is easy to experiment with new acceleration techniques by simply adding a new Aggregate primitive to lrt.

Like InstancePrimitives, the Aggregate intersection routines set the Intersection::primitive pointer to the primitive that the ray actually hit, not the aggregate that holds the primitive. Because lrt will use this pointer to obtain information about the primitive being hit (its reflection and emission properties), the Aggregate::GetAreaLight() and Aggregate::GetBSDF() methods should never be called, so those methods (not shown here) will simply cause a runtime error.

Broadly speaking, acceleration structures fall into two categories. Object subdivision techniques, such as hierarchies of bounding volumes, progressively partition the scene's objects into smaller groups, each of which can be rejected with a single bounding test. Spatial subdivision techniques, such as the grid and kd-tree accelerators described in this chapter, instead partition space itself and record which primitives overlap each region. With either approach there is a general trade-off between the time spent building the structure to improve its quality and the number of ray intersection tests saved during rendering: structures that are more expensive to build typically cull more effectively.
⟨Primitive Declarations⟩ +≡
    class Aggregate : public Primitive {
    public:
        ⟨Aggregate Public Methods⟩
    };

4.2.1 Ray–Box Intersections

Both the GridAccel and the KdTreeAccel in the next two sections store a BBox that surrounds all of their primitives. This box can be used to quickly determine whether a ray could possibly intersect any of the primitives: if the ray misses the box, it must also miss all of the primitives inside it. Furthermore, both of these accelerators use the point at which the ray enters the bounding box and the point at which it exits as part of the input to their traversal algorithms. Therefore, we will add a BBox method, BBox::IntersectP(), that checks for a ray–box intersection and returns the two parametric t values of the intersection if there is one. Note that BBox is not a Primitive, which is a benefit here, since its IntersectP() method needs to return the two parametric hit distances rather than just a boolean result.

Figure 4.1: Intersecting a ray with an axis-aligned bounding box: we compute intersection points with each pair of slabs in turn, progressively narrowing the parametric interval. Here in 2D, the intersection of the x and y extents along the ray gives the extent where the ray is inside the box.

Figure 4.2: Intersecting a ray with a pair of axis-aligned slabs: the two slabs shown here are planes described by x = c, for some constant value c. The normal of each slab is (1, 0, 0).

Finding these intersections is fairly simple. One way to think of bounding boxes is as the intersection of three slabs, where a slab is simply the region of space between two parallel planes. To intersect a ray against a box, we intersect the ray against each of the box's three slabs in turn. Because the slabs are aligned with the three coordinate axes, a number of optimizations can be made in the ray–slab tests.

The basic ray–bounding box intersection algorithm works as follows: we start with a parametric interval that covers the range of positions t along the ray where we're interested in finding intersections; typically, this is [0, ∞). We will then successively compute the two parametric t positions where the ray intersects each pair of axis-aligned slabs. We compute the set-intersection of the per-slab intersection interval with our BBox intersection interval, returning failure if we find that the resulting interval is degenerate. If, after checking all three slabs, the interval is non-degenerate, we have the parametric range of the ray that is inside the box. Figure 4.1 illustrates this process.

Finding the intersection of a ray with an axis-aligned plane only requires a few computations; see the discussion of ray–disk intersections in Section 3.4.3 for a review of this process. Figure 4.2 shows the basic geometry of a ray and a pair of slabs.

If the BBox::IntersectP() method returns true, the intersection's parametric range is returned in the optional arguments hitt0 and hitt1. Intersections outside of the Ray::mint/Ray::maxt range of the ray are ignored.

⟨BBox Method Definitions⟩ +≡
    bool BBox::IntersectP(const Ray &ray,
            Float *hitt0, Float *hitt1) const {
        Float t0 = ray.mint, t1 = ray.maxt;
        for (int i = 0; i < 3; ++i) {
            ⟨Update interval for ith bounding box slab⟩
        }
        if (hitt0) *hitt0 = t0;
        if (hitt1) *hitt1 = t1;
        return true;
    }

For each pair of slabs, this routine needs to compute two ray–plane intersections, giving the parametric t values where the intersections occur.
For each pair of slabs, this routine needs to compute two ray–plane intersections, giving the parametric t values where the intersections occur. Consider the pair of slabs along the x axis: they can be described by the two planes through the points $(x_1, 0, 0)$ and $(x_2, 0, 0)$, each with normal $(1, 0, 0)$. There are two t values to compute, one for each plane. Consider the first one, $t_1$. From the ray–plane intersection equation, we have

$$t_1 = \frac{\left((x_1, 0, 0) - \mathbf{o}_r\right) \cdot (1, 0, 0)}{\mathbf{d}_r \cdot (1, 0, 0)}.$$

Because the y and z components of the normal are zero, we can use the definition of the dot product to simplify this substantially:

$$t_1 = \frac{x_1 - \mathbf{o}_{r,x}}{\mathbf{d}_{r,x}}.$$

The code to compute these values starts by computing the reciprocal of the corresponding component of the ray direction, so that it can multiply by this factor instead of performing multiple expensive divisions. Note that although the code divides by this component, it is not necessary to verify that it is non-zero! If it is zero, then invRayDir will hold an infinite value, either $+\infty$ or $-\infty$, and the rest of the algorithm still works correctly.¹

¹ This assumes that the architecture being used supports IEEE floating-point arithmetic, which is universal on modern systems. The relevant properties of IEEE floating-point arithmetic are that for all $v > 0$, $v/0 = \infty$, and for all $w < 0$, $w/0 = -\infty$, where $\infty$ is a special value such that any positive number multiplied by $\infty$ gives $\infty$, any negative number multiplied by $\infty$ gives $-\infty$, etc. See XXX for information about IEEE floating point.

⟨Update interval for ith bounding box slab⟩ ≡
    Float invRayDir = 1.f / ray.d[i];
    Float tNear = (pMin[i] - ray.o[i]) * invRayDir;
    Float tFar  = (pMax[i] - ray.o[i]) * invRayDir;
    ⟨Update parametric interval from slab intersection ts⟩

The two distances are reordered so that $t_{near}$ holds the closer intersection and $t_{far}$ the farther one. This gives a parametric range $[t_{near}, t_{far}]$, which is intersected with the current range $[t_0, t_1]$ to compute a new range. If this new range is empty (i.e., $t_0 > t_1$), then the code can immediately return failure.

There is another floating-point-related subtlety here, pointed out to us by Evan Parker: in the case where the ray origin is in the plane of one of the bounding box slabs and the ray lies in the plane of the slab, it is possible that tNear or tFar will be computed by an expression of the form $0/0$, which results in an IEEE floating-point "not a number" (NaN) value. Like infinity values, NaNs have well-specified semantics: for example, any logical comparison involving a NaN always evaluates to false. Therefore, the code that updates the values of t0 and t1 was carefully written so that if tNear or tFar is NaN, then t0 or t1 won't ever take on a NaN value but will always remain unchanged. When we first wrote this code, we wrote t0 = max(t0, tNear), which might assign NaN to t0 depending on how max() was implemented.

⟨Update parametric interval from slab intersection ts⟩ ≡
    if (tNear > tFar) swap(tNear, tFar);
    t0 = tNear > t0 ? tNear : t0;
    t1 = tFar  < t1 ? tFar  : t1;
    if (t0 > t1) return false;
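The distinction matters because of how NaN propagates through comparisons. A small sketch of the two alternatives (illustrative only; not part of lrt):

    // The ternary used above is NaN-safe: any comparison with a NaN is
    // false, so if tNear is NaN, the assignment keeps t0's old value.
    t0 = tNear > t0 ? tNear : t0;
    // With t0 = max(t0, tNear), the result depends on how max() is
    // written: "return a < b ? b : a;" happens to return t0 here, but
    // "return b < a ? a : b;" returns tNear, propagating the NaN.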
4.3 Grid Accelerator

The GridAccel class is an accelerator that divides an axis-aligned region of space into equal-sized chunks (called "voxels"). Each voxel stores references to the primitives that overlap it (see Figure 4.3).

Figure 4.3: (Figure missing from this draft: the grid's voxels, with each voxel storing references to the primitives that overlap it.)

Given a ray, the grid steps through each of the voxels that the ray passes through in order, checking for intersections with only the primitives in each voxel. Useless ray intersection tests are reduced substantially because primitives far away from the ray aren't considered at all. Furthermore, because the voxels are considered from near to far along the ray, it is possible to stop performing intersection tests once we have found an intersection and know that it is not possible for any closer intersections to exist.

The GridAccel structure can be initialized quickly, and it takes only a simple computation to determine the sequence of voxels through which a given ray will pass. However, this simplicity is a double-edged sword; GridAccel can suffer from poor performance when the data in the scene aren't distributed evenly throughout space. If there's a small region of space with a lot of geometry in it, all that geometry might fall in a single voxel, and performance will suffer when a ray passes through that voxel, as many intersection tests will be performed. This is sometimes referred to as the "teapot in a stadium" problem. The basic problem is that the data structure cannot adapt well to the distribution of the data: if we use a very fine grid, we spend too much time stepping through empty space, and if our grid is too coarse, we gain little benefit from the grid at all. The KdTreeAccel in the next section adapts to the distribution of geometry such that it doesn't suffer from this problem.

lrt's grid accelerator is defined in accelerators/grid.cpp.

⟨GridAccel Declarations⟩ ≡
    class GridAccel : public Aggregate {
    public:
        ⟨GridAccel Public Methods⟩
    private:
        ⟨GridAccel Private Methods⟩
        ⟨GridAccel Private Data⟩
    };

4.3.1 Creation

The GridAccel constructor takes a vector of Primitives to be stored in the grid. It automatically determines the number of voxels to store in the grid based on the number of primitives.

One factor that adds to the complexity of the grid's implementation is the fact that some of these primitives may not be directly intersectable (they may return false from Primitive::CanIntersect()) and need to refine themselves into sub-primitives before intersection tests can be performed. This is a problem because when we build the grid, we might have a scene with a single primitive in it and choose to build a coarse grid with few voxels. However, if the primitive is later refined for intersection tests, it might turn into millions of primitives, and the original grid resolution would be far too small to efficiently find intersections. lrt addresses this problem in one of two ways:

- If the refineImmediately flag to the constructor is true, all of the Primitives are refined until they have turned into intersectable primitives. This may waste time and memory for scenes where some of the primitives wouldn't have ever been refined since no rays approached them.

- Otherwise, primitives are refined only when a ray enters one of the voxels they are stored in. If they create multiple Primitives when refined, the new primitives are stored in a new instance of a GridAccel that replaces the original Primitive in the top-level grid. This allows us to handle primitive refinement without needing to re-build the entire grid each time another primitive is refined.
We keep track of whether a grid was constructed explicitly by lrt or implicitly by a Refine() method for bookkeeping purposes.

⟨GridAccel Method Definitions⟩ ≡
    GridAccel::GridAccel(const vector<Reference<Primitive> > &p,
            bool forRefined, bool refineImmediately)
        : gridForRefined(forRefined) {
        ⟨Initialize prims with primitives for grid⟩
        ⟨Initialize mailboxes for grid⟩
        ⟨Compute bounds and choose grid resolution⟩
        ⟨Compute voxel widths and allocate voxels⟩
        ⟨Add primitives to grid voxels⟩
    }

⟨GridAccel Private Data⟩ ≡
    bool gridForRefined;

First, the constructor determines the final set of Primitives to store in the grid, either directly using the primitives passed in or refining all of them until they are intersectable.

⟨Initialize prims with primitives for grid⟩ ≡
    vector<Reference<Primitive> > prims;
    if (refineImmediately)
        for (u_int i = 0; i < p.size(); ++i)
            p[i]->FullyRefine(prims);
    else
        prims = p;

Because primitives may overlap multiple grid voxels, there is the possibility that a ray will be tested multiple times against the same primitive as it passes through those voxels (Figure 4.4). A technique called mailboxing makes it possible to quickly determine if a ray has already been tested against a particular primitive, so these extra tests can be avoided. In this technique, each ray is assigned a unique integer id. The id of the most recent ray that was tested against a primitive is stored along with the primitive itself. As the ray passes through voxels in the grid, the ray's id is compared with the primitives' ids; if they are different, the ray–primitive intersection test is performed and the primitive's id is updated to match the ray's. If the ray encounters the same primitive in later voxels, the ids will match and the test is trivially skipped.²

² This approach depends on the fact that the grid finds the intersection for a ray and returns before any other rays are passed to GridAccel::Intersect(); if this were not the case, the grid would still find the right ray–primitive intersections, though unnecessary tests might be performed as multiple rays overwrote the mailbox ids in primitives that they passed by. In particular, if lrt were multi-threaded, the mailboxing scheme would need to be revisited, as rays from different threads would sometimes be passing through the grid simultaneously. In general, parallel ray tracing makes mailboxing much more complicated.

Figure 4.4: (Figure missing from this draft: a primitive that overlaps multiple voxels may be redundantly tested against the same ray; this is why mailboxing is needed.)

The GridAccel constructor creates a MailboxPrim structure for each primitive. Grid voxels store pointers to the MailboxPrims of the primitives that overlap them. The MailboxPrim stores both a reference to the primitive as well as the integer tag that identifies the last ray that was tested against it. All of the mailboxes are allocated in a single contiguous cache-aligned block for improved memory performance.

⟨Initialize mailboxes for grid⟩ ≡
    nMailboxes = prims.size();
    mailboxes = (MailboxPrim *)AllocAligned(nMailboxes *
        sizeof(MailboxPrim));
    for (u_int i = 0; i < nMailboxes; ++i)
        new (&mailboxes[i]) MailboxPrim(prims[i]);
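The ⟨Initialize mailboxes for grid⟩ fragment uses C++ placement new to run the MailboxPrim constructor in pre-allocated, cache-aligned storage. A minimal sketch of the idiom, using lrt's AllocAligned()/FreeAligned() utilities; note that objects constructed this way must have their destructors invoked explicitly before the raw storage is released:

    #include <new>   // declares placement operator new

    void *mem = AllocAligned(sizeof(MailboxPrim));      // raw storage
    MailboxPrim *mp = new (mem) MailboxPrim(prims[0]);  // construct in place
    // ... use mp ...
    mp->~MailboxPrim();                                 // destroy explicitly
    FreeAligned(mem);                                   // release the storage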
⟨MailboxPrim Declarations⟩ ≡
    struct MailboxPrim {
        MailboxPrim(const Reference<Primitive> &p) {
            primitive = p;
            lastMailboxId = -1;
        }
        Reference<Primitive> primitive;
        int lastMailboxId;
    };

⟨GridAccel Private Data⟩ +≡
    u_int nMailboxes;
    MailboxPrim *mailboxes;

After the overall bounds have been computed, the grid needs to determine how many voxels to create along each of the x, y, and z axes. The voxelsPerUnitDist variable is set in the fragment below, giving the average number of voxels that should be created per unit distance in each of the three directions. Given that value, multiplication by the grid's extent in each direction gives the number of voxels to make. We cap the number of voxels in any direction to 64, to avoid creating enormous data structures for complex scenes.

⟨Compute bounds and choose grid resolution⟩ ≡
    for (u_int i = 0; i < prims.size(); ++i)
        bounds = Union(bounds, prims[i]->WorldBound());
    Vector delta = bounds.pMax - bounds.pMin;
    ⟨Find voxelsPerUnitDist for grid⟩
    for (int axis = 0; axis < 3; ++axis) {
        NVoxels[axis] = Round2Int(delta[axis] * voxelsPerUnitDist);
        NVoxels[axis] = Clamp(NVoxels[axis], 1, 64);
    }

⟨GridAccel Private Data⟩ +≡
    int NVoxels[3];
    BBox bounds;

As a first approximation to choosing a grid size, the total number of voxels should be roughly proportional to the total number of primitives; if the primitives were uniformly distributed, this would mean that a constant number of primitives were in each voxel. Though the primitives won't be uniformly distributed in general, this is a reasonable approximation. While increasing the number of voxels improves efficiency by reducing the average number of primitives per voxel (and thus reducing the number of ray–object intersection tests that need to be performed), doing so also increases memory use, hurts cache performance, and increases the time spent tracing the ray's path through the greater number of voxels it overlaps. On the other hand, too few voxels obviously leads to poor performance, due to an increased number of ray–primitive intersection tests.

Given the goal of having the number of voxels be proportional to the number of primitives, the cube root of the number of objects is an appropriate starting point for the grid resolution in each direction. In practice, this value is typically scaled by an empirically-chosen factor; in lrt we use a scale of three. Whichever of the x, y, or z dimensions has the largest extent will have exactly $3\sqrt[3]{N}$ voxels for a scene with N primitives. The number of voxels in the other two directions is set in an effort to create voxels that are as close to regular cubes as possible. The voxelsPerUnitDist variable is the foundation of these computations; it gives the number of voxels to create per unit distance. Its value is set such that cubeRoot voxels will be created along the axis with the largest extent.

⟨Find voxelsPerUnitDist for grid⟩ ≡
    int maxAxis = bounds.MaximumExtent();
    Float invMaxWidth = 1.f / delta[maxAxis];
    Float cubeRoot = 3.f * powf(Float(prims.size()), 1.f/3.f);
    Float voxelsPerUnitDist = cubeRoot * invMaxWidth;
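As a concrete example of this heuristic (with made-up numbers): for a scene with $N = 1000$ primitives, cubeRoot is $3\sqrt[3]{1000} = 30$. If the scene bounds have extents $(10, 5, 2)$, the x axis has the largest extent, so voxelsPerUnitDist $= 30/10 = 3$, and the grid resolution works out to NVoxels $= (30, 15, 6)$ after rounding, each value comfortably inside the $[1, 64]$ clamp. The resulting voxels are roughly cubical, as intended.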
Given the number of voxels in each dimension, the constructor next sets the GridAccel::Width vector, which holds the world-space widths of the voxels in each direction. It also precomputes the GridAccel::InvWidth values, so that routines that would otherwise divide by the Width value can perform a multiplication rather than a division. Finally, it allocates an array of pointers to Voxel structures for each of the voxels in the grid. These pointers are set to NULL initially; a Voxel will only be allocated for a grid cell once one or more overlapping primitives are found.³

³ Some grid implementations try to save even more memory by using a hash table from (x, y, z) voxel numbers to voxel structures. This saves the memory for the voxels array, which may be substantial if the grid has very small voxels and the vast majority of them are empty. However, this approach increases the computational expense of finding the Voxel structure for each voxel that a ray passes through.

⟨Compute voxel widths and allocate voxels⟩ ≡
    for (int axis = 0; axis < 3; ++axis) {
        Width[axis] = delta[axis] / NVoxels[axis];
        InvWidth[axis] = (Width[axis] == 0.f) ? 0.f :
            1.f / Width[axis];
    }
    int nVoxels = NVoxels[0] * NVoxels[1] * NVoxels[2];
    voxels = (Voxel **)AllocAligned(nVoxels * sizeof(Voxel *));
    memset(voxels, 0, nVoxels * sizeof(Voxel *));

⟨GridAccel Private Data⟩ +≡
    Vector Width, InvWidth;
    Voxel **voxels;

Once the voxels themselves have been allocated, primitives can be added to the voxels that they overlap. The GridAccel constructor adds each primitive's corresponding MailboxPrim to the voxels that its bounding box overlaps.

⟨Add primitives to grid voxels⟩ ≡
    for (u_int i = 0; i < prims.size(); ++i) {
        ⟨Find voxel extent of primitive⟩
        ⟨Add primitive to overlapping voxels⟩
    }

First, the world-space bounds of the primitive are converted to the integer voxel coordinates that contain its two opposite corners. This is done by the utility function GridAccel::PosToVoxel(), which turns a world-space (x, y, z) position into the voxel that contains that point.

⟨Find voxel extent of primitive⟩ ≡
    BBox pb = prims[i]->WorldBound();
    int vmin[3], vmax[3];
    for (int axis = 0; axis < 3; ++axis) {
        vmin[axis] = PosToVoxel(pb.pMin, axis);
        vmax[axis] = PosToVoxel(pb.pMax, axis);
    }

⟨GridAccel Private Methods⟩ ≡
    int PosToVoxel(const Point &P, int axis) const {
        int v = Float2Int((P[axis] - bounds.pMin[axis]) *
            InvWidth[axis]);
        return Clamp(v, 0, NVoxels[axis]-1);
    }
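A quick worked example of PosToVoxel(), with made-up values and assuming Float2Int() truncates its argument to an integer: if bounds.pMin[0] = -10 and InvWidth[0] = 0.5 (voxels two units wide in x), a point with x = -3.2 gives Float2Int((-3.2 - (-10)) * 0.5) = Float2Int(3.4) = 3, so the point lies in x voxel 3. The Clamp() call then guards against points that lie exactly on the grid's upper boundary (or slightly past it, due to floating-point error) being mapped to an out-of-range voxel index.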
The primitive is now added to all of the voxels that its bounds overlap. This is a conservative test for voxel overlap; at worst it will overestimate the voxels that the primitive overlaps. Figure 4.5 shows an example of two cases where this method leads to primitives being stored in more voxels than necessary. An exercise at the end of this chapter describes a more accurate method for associating primitives with voxels.

Figure 4.5: Two examples of cases where using the bounding box of a primitive to determine which grid voxels it should be stored in will cause it to be stored in a number of voxels unnecessarily: on the left, a long skinny triangle has a lot of empty space inside its axis-aligned bounding box, and it is inaccurately added to the shaded voxels. On the right, the surface of the sphere doesn't intersect many of the voxels inside its bound, and they are also inaccurately included in the sphere's extent. While this error slightly degrades the grid's performance, it doesn't lead to incorrect ray-intersection results.

⟨Add primitive to overlapping voxels⟩ ≡
    for (int z = vmin[2]; z <= vmax[2]; ++z)
        for (int y = vmin[1]; y <= vmax[1]; ++y)
            for (int x = vmin[0]; x <= vmax[0]; ++x) {
                int offset = Offset(x, y, z);
                if (!voxels[offset]) {
                    ⟨Allocate new voxel and store primitive in it⟩
                }
                else {
                    ⟨Add primitive to already-allocated voxel⟩
                }
            }

The GridAccel::Offset() utility function gives the offset into the voxels array for a particular (x, y, z) voxel. This is a standard technique for encoding a multi-dimensional array in a 1D array.

⟨GridAccel Private Methods⟩ +≡
    inline int Offset(int x, int y, int z) const {
        return z*NVoxels[0]*NVoxels[1] + y*NVoxels[0] + x;
    }

To further reduce memory used for dynamically-allocated voxels and to improve their memory locality, the grid constructor uses an ObjectArena to hand out memory for voxels. An arena allocates objects of a single type out of large contiguous blocks of memory, which reduces per-allocation bookkeeping overhead and keeps the voxels close together in memory; all of its memory is released at once when the arena is destroyed.

⟨Allocate new voxel and store primitive in it⟩ ≡
    voxels[offset] = new (voxelArena) Voxel(&mailboxes[i]);

⟨GridAccel Private Data⟩ +≡
    ObjectArena<Voxel> voxelArena;

If this isn't the first primitive to overlap this voxel, the Voxel has already been allocated and the primitive is handed off to the Voxel::AddPrimitive() method.

⟨Add primitive to already-allocated voxel⟩ ≡
    voxels[offset]->AddPrimitive(&mailboxes[i]);

Now we will define the Voxel structure, which records the primitives that overlap its extent. Because many Voxels may be allocated for a grid, we use a few simple techniques to keep the size of a Voxel small: variables that record its basic properties are packed into a single 32-bit word, and we use a union to overlap a few pointers of various types, only one of which will actually be used depending on the number of overlapping primitives.

⟨Voxel Declarations⟩ ≡
    struct Voxel {
        ⟨Voxel Public Methods⟩
        union {
            MailboxPrim *onePrimitive;
            MailboxPrim **primitives;
        };
        u_int allCanIntersect:1;
        u_int nPrimitives:31;
    };

When a Voxel is first allocated, only a single primitive has been found that overlaps it, so Voxel::nPrimitives is one, and Voxel::onePrimitive is used to store a pointer to its MailboxPrim. As more primitives are found to overlap, Voxel::nPrimitives will be greater than one, and Voxel::primitives is set to point to a dynamically-allocated array of pointers to MailboxPrim structures. Because these conditions are mutually exclusive, the pointer to the single primitive and the pointer to the array of pointers to primitives can share the same memory by being stored in a union. Voxel::allCanIntersect is used to record whether all of the primitives in the voxel are intersectable or if some need refinement. For starters, it is conservatively set to false.

⟨Voxel Public Methods⟩ ≡
    Voxel(MailboxPrim *op) {
        allCanIntersect = false;
        nPrimitives = 1;
        onePrimitive = op;
    }
When Voxel::AddPrimitive() is called, this must mean that two or more primitives overlap the voxel, so the primitives' MailboxPrim pointers will be stored in its Voxel::primitives array. Memory for this array must be allocated in two cases: if the voxel currently holds a single primitive and we need to store a second, or if the allocated array is full. Rather than using more space in the voxel structure to store the current size of the array, the code here follows the convention that the array size will always be a power of two. Thus, whenever the Voxel::nPrimitives count is a power of two, the array has been filled and more memory is needed.

⟨Voxel Public Methods⟩ +≡
    void AddPrimitive(MailboxPrim *prim) {
        if (nPrimitives == 1) {
            ⟨Allocate initial primitives array in voxel⟩
        }
        else if (IsPowerOf2(nPrimitives)) {
            ⟨Increase size of primitives array in voxel⟩
        }
        primitives[nPrimitives] = prim;
        ++nPrimitives;
    }

Recall that Voxel::onePrimitive and Voxel::primitives are stored in a union. Therefore, it is important to store the memory for the array of pointers in a local variable on the stack and initialize its first entry from Voxel::onePrimitive before Voxel::primitives is initialized with the array pointer. Otherwise, the value of Voxel::onePrimitive would be clobbered before it was added to the new array, since Voxel::onePrimitive and Voxel::primitives share the same memory.

⟨Allocate initial primitives array in voxel⟩ ≡
    MailboxPrim **p = (MailboxPrim **)AllocAligned(
        2 * sizeof(MailboxPrim *));
    p[0] = onePrimitive;
    primitives = p;

Similarly, it's necessary to be careful when setting Voxel::primitives to point to the expanded array of MailboxPrim pointers.

⟨Increase size of primitives array in voxel⟩ ≡
    int nAlloc = 2 * nPrimitives;
    MailboxPrim **p = (MailboxPrim **)AllocAligned(nAlloc *
        sizeof(MailboxPrim *));
    for (u_int i = 0; i < nPrimitives; ++i)
        p[i] = primitives[i];
    FreeAligned(primitives);
    primitives = p;
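The power-of-two test that triggers these reallocations can be implemented with a standard bit trick; a minimal sketch, assuming lrt's u_int typedef (the actual IsPowerOf2() utility is defined elsewhere in lrt):

    // Illustrative power-of-two test: v & (v - 1) clears the lowest
    // set bit of v, so the result is zero exactly when v has a single
    // bit set (i.e., v is 1, 2, 4, 8, ...).
    inline bool IsPowerOf2Sketch(u_int v) {
        return v > 0 && (v & (v - 1)) == 0;
    }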
We won't show the straightforward implementations of the GridAccel::WorldBound() and GridAccel::CanIntersect() methods or the grid's destructor.

4.3.2 Traversal

The GridAccel::Intersect() method handles the task of determining which voxels a ray passes through and calling the appropriate ray–primitive intersection routines.

⟨GridAccel Method Definitions⟩ +≡
    bool GridAccel::Intersect(const Ray &ray,
            Intersection *isect) const {
        ⟨Check ray against overall grid bounds⟩
        ⟨Get ray mailbox id⟩
        ⟨Set up 3D DDA for ray⟩
        ⟨Walk ray through voxel grid⟩
    }

The first task is to determine where the ray enters the grid, which gives the starting point for traversal through the voxels. If the ray's origin is inside the grid's bounding box, then clearly it begins there. Otherwise, the GridAccel::Intersect() method finds the intersection of the ray with the grid's bounding box. If it hits, the first intersection along the ray is the starting point. If the ray misses the grid's bounding box, there can be no intersection with any of the geometry in the grid, so GridAccel::Intersect() returns immediately.

⟨Check ray against overall grid bounds⟩ ≡
    Float rayT;
    if (bounds.Inside(ray(ray.mint)))
        rayT = ray.mint;
    else if (!bounds.IntersectP(ray, &rayT))
        return false;
    Point gridIntersect = ray(rayT);

Once we know that there is work to do, the next task is to find a unique ray identifier for mailboxing. We simply use a monotonically increasing sequence of ray identifiers, stored in the GridAccel::curMailboxId member.

⟨Get ray mailbox id⟩ ≡
    int rayId = ++curMailboxId;

⟨GridAccel Private Data⟩ +≡
    static int curMailboxId;

We now compute the initial (x, y, z) integer voxel coordinates for this ray as well as a number of auxiliary values that will make it very efficient to incrementally compute the set of voxels that the ray passes through. The ray–voxel traversal computation is similar in spirit to Bresenham's classic line drawing algorithm, where the series of pixels that a line passes through are found incrementally using just additions and comparisons to step from one pixel to the next. The main difference between the ray marching algorithm and Bresenham's is that we would like to find all of the voxels that the ray passes through, while Bresenham's algorithm does not provide this guarantee.

The values the ray–voxel stepping algorithm needs to keep track of are:

1. The coordinates of the voxel currently being considered, Pos.

2. The parametric t position along the ray where it makes its next crossing into another voxel in each of the x, y, and z directions, NextCrossingT (Figure 4.6). For example, for a ray with a positive x direction component, the parametric value along the ray where it crosses into the next voxel in x, NextCrossingT[0], is the parametric starting point rayT plus the x distance to the next voxel divided by the ray's x direction component. (This is similar to the ray–plane intersection formula.)

3. The change in the current voxel coordinates after a step in each direction (1 or -1), stored in Step.

4. The distance along the ray between voxels in each direction, DeltaT. These values are found by dividing the width of a voxel in a particular direction by the ray's corresponding direction component, giving the parametric distance along the ray that we have to travel to get from one side of a voxel to the other in the particular direction.

5. The coordinates of the last voxel the ray passes through before it exits the grid, Out.

The first two items will be updated as we step through the grid, while the last three are constant per ray.

⟨Set up 3D DDA for ray⟩ ≡
    Float NextCrossingT[3], DeltaT[3];
    int Step[3], Out[3], Pos[3];
    for (int axis = 0; axis < 3; ++axis) {
        ⟨Compute current voxel for axis⟩
        if (ray.d[axis] >= 0) {
            ⟨Handle ray with positive direction for voxel stepping⟩
        }
        else {
            ⟨Handle ray with negative direction for voxel stepping⟩
        }
    }

Computing the voxel address that the ray starts out in is easy, since this method has already determined the position where the ray enters the grid. We simply use the utility routine GridAccel::PosToVoxel() defined earlier.

Figure 4.6: Stepping a ray through a voxel grid: rayT is the distance along the ray to the first intersection with the grid. The parametric distance along the ray until it crosses into the next voxel in the x direction is stored in NextCrossingT[0], and similarly for the y and z (not shown) directions. When we cross into the next x voxel, for example, we can immediately update the value of NextCrossingT[0] by adding a fixed value, the voxel width in x divided by the ray's x direction, DeltaT[0].
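To make the initialization concrete, here is a worked example for the x axis with made-up numbers. Suppose the voxels are one unit wide in x (Width[0] = 1), with boundaries at integer x positions, and the ray enters the grid at gridIntersect.x = 2.3 with direction component ray.d.x = 0.5. Then Pos[0] = 2, the next x boundary is at x = 3, and the fragments below compute NextCrossingT[0] = rayT + (3 - 2.3)/0.5 = rayT + 1.4 and DeltaT[0] = 1/0.5 = 2, with Step[0] = 1 and Out[0] = NVoxels[0].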
⟨Compute current voxel for axis⟩ ≡
    Pos[axis] = PosToVoxel(gridIntersect, axis);

If the ray's direction component is zero for a particular axis, then the NextCrossingT value for that axis will be initialized to the IEEE floating-point $\infty$ value by the computation below. The voxel stepping logic later in this section will always decide to step in one of the other directions, and we will correctly never step in this direction. This is convenient because we can handle rays that are perpendicular to any axis without any special code to test for division by zero.

⟨Handle ray with positive direction for voxel stepping⟩ ≡
    NextCrossingT[axis] = rayT +
        (VoxelToPos(Pos[axis]+1, axis) - gridIntersect[axis]) /
        ray.d[axis];
    DeltaT[axis] = Width[axis] / ray.d[axis];
    Step[axis] = 1;
    Out[axis] = NVoxels[axis];

The GridAccel::VoxelToPos() method is the inverse of GridAccel::PosToVoxel(); it returns the world-space position of a particular voxel's lower corner.

⟨GridAccel Private Methods⟩ +≡
    Float VoxelToPos(int p, int axis) const {
        return bounds.pMin[axis] + p * Width[axis];
    }

⟨GridAccel Private Methods⟩ +≡
    Point VoxelToPos(int x, int y, int z) const {
        return bounds.pMin +
            Vector(x * Width[0], y * Width[1], z * Width[2]);
    }

Similar computations set these values for rays with negative direction components:

⟨Handle ray with negative direction for voxel stepping⟩ ≡
    NextCrossingT[axis] = rayT +
        (VoxelToPos(Pos[axis], axis) - gridIntersect[axis]) /
        ray.d[axis];
    DeltaT[axis] = -Width[axis] / ray.d[axis];
    Step[axis] = -1;
    Out[axis] = -1;

Once all the preprocessing is done for the ray, we can step through the grid. Starting with the first voxel that the ray passes through, we check for intersections with the primitives inside that voxel. If we find a hit, the boolean flag hitSomething is set to true. We must be careful, however, because the intersection point may be outside the current voxel, since primitives may overlap multiple voxels. Therefore, the method doesn't immediately return when done processing a voxel where an intersection was found. Instead, we use the fact that the primitive's intersection routine will update the Ray::maxt member. When stepping through voxels, we will return only when we enter a voxel at a point that is beyond the closest found intersection.

⟨Walk ray through voxel grid⟩ ≡
    bool hitSomething = false;
    for (;;) {
        Voxel *voxel = voxels[Offset(Pos[0], Pos[1], Pos[2])];
        if (voxel != NULL)
            hitSomething |= voxel->Intersect(ray, isect, rayId);
        ⟨Advance to next voxel⟩
    }
    return hitSomething;

The work of intersecting the ray with the primitives in a single voxel is handled by the Voxel::Intersect() method.

⟨GridAccel Method Definitions⟩ +≡
    bool Voxel::Intersect(const Ray &ray, Intersection *isect,
            int rayId) {
        ⟨Refine primitives in voxel if needed⟩
        ⟨Loop over primitives in voxel and find intersections⟩
    }

The boolean Voxel::allCanIntersect member tells us whether all of the primitives in the voxel are known to be intersectable. If this value is false, we loop over all primitives, calling their refinement routines as needed until only intersectable geometry remains.
The logic for finding the ith MailboxPrim in the loop over primitives is slightly complicated by a level of pointer indirection, since a single primitive and multiple primitives are stored differently in voxels. Handling this case in the way done below is worthwhile, since it moves the test for whether we should be using the Voxel::onePrimitive item for a single primitive or the Voxel::primitives array for multiple primitives outside the body of the loop.

⟨Refine primitives in voxel if needed⟩ ≡
    if (!allCanIntersect) {
        MailboxPrim **mpp;
        if (nPrimitives == 1) mpp = &onePrimitive;
        else mpp = primitives;
        for (u_int i = 0; i < nPrimitives; ++i) {
            MailboxPrim *mp = mpp[i];
            ⟨Refine primitive in mp if it's not intersectable⟩
        }
        allCanIntersect = true;
    }

Primitives that need refinement are refined until only intersectable primitives remain, and a new GridAccel is created to hold the returned primitives if more than one was returned. One reason to always make a GridAccel for multiple refined primitives is that doing so simplifies primitive refinement; a single Primitive always turns into a single object that represents all of the new Primitives, so it's never necessary to increase the number of primitives in the voxel. If this primitive overlaps multiple voxels, then because all of them hold a pointer to a single MailboxPrim for it, it suffices to just update the primitive reference in the shared MailboxPrim directly, and there's no need to loop over all of the voxels.⁴

⁴ The bounding box of the original unrefined primitive must encompass the refined geometry as well, so there's no danger that the refined geometry will overlap more voxels than before. On the other hand, it also may overlap many fewer voxels, which would lead to unnecessary intersection tests.

⟨Refine primitive in mp if it's not intersectable⟩ ≡
    if (!mp->primitive->CanIntersect()) {
        vector<Reference<Primitive> > p;
        mp->primitive->FullyRefine(p);
        if (p.size() == 1)
            mp->primitive = p[0];
        else
            mp->primitive = new GridAccel(p, true, false);
    }

Once we know that we have only intersectable primitives, the loop over MailboxPrims for performing intersection tests here again has to deal with the difference between voxels with one primitive and voxels with multiple primitives, in the same manner as the primitive refinement code did.

⟨Loop over primitives in voxel and find intersections⟩ ≡
    bool hitSomething = false;
    MailboxPrim **mpp;
    if (nPrimitives == 1) mpp = &onePrimitive;
    else mpp = primitives;
    for (u_int i = 0; i < nPrimitives; ++i) {
        MailboxPrim *mp = mpp[i];
        ⟨Do mailbox check between ray and primitive⟩
        ⟨Check for ray–primitive intersection⟩
    }
    return hitSomething;

Here now is the mailbox check; if this ray was previously intersected against this primitive in another voxel, the redundant intersection test can be trivially skipped.

⟨Do mailbox check between ray and primitive⟩ ≡
    if (mp->lastMailboxId == rayId)
        continue;

Finally, if we determine that a ray–primitive intersection test is necessary, the primitive's mailbox needs to be updated.
⟨Check for ray–primitive intersection⟩ ≡
    mp->lastMailboxId = rayId;
    if (mp->primitive->Intersect(ray, isect)) {
        hitSomething = true;
    }

After doing the intersection tests for the primitives in the current voxel, it is necessary to step to the next voxel in the ray's path. We need to decide whether to step in the x, y, or z direction. Fortunately, the NextCrossingT variable tells us the distance to the next crossing for each direction, and we can simply choose the smallest one. Traversal can be terminated if this step goes outside of the voxel grid, or if the selected NextCrossingT value is beyond the t distance of an already-found intersection. Otherwise, we step to the chosen voxel and increment the chosen direction's NextCrossingT by its DeltaT value, so that future traversal steps will know how far to go before stepping in this direction again.

⟨Advance to next voxel⟩ ≡
    ⟨Find stepAxis for stepping to next voxel⟩
    if (ray.maxt < NextCrossingT[stepAxis])
        break;
    Pos[stepAxis] += Step[stepAxis];
    if (Pos[stepAxis] == Out[stepAxis])
        break;
    NextCrossingT[stepAxis] += DeltaT[stepAxis];

Choosing the axis along which to step basically requires finding the smallest of three numbers, an extremely straightforward task. However, in this case an optimization is possible because we don't care about the value of the smallest number, just its index in the NextCrossingT array. We can compute this index without any branching, which can lead to substantial performance improvements on a modern CPU.

The tricky bit of code below determines which of the three NextCrossingT values is the smallest and sets stepAxis accordingly. It encodes this logic by setting each of the three low-order bits in an integer to the results of three comparisons between pairs of NextCrossingT values. We then use a table (cmpToAxis) to map the resulting integer to the direction with the smallest value. This kind of optimization is frequently available when trying to find the minimum or maximum of a very small group of numbers. An exercise at the end of the chapter asks you to explore the benefits of this approach.

⟨Find stepAxis for stepping to next voxel⟩ ≡
    int bits = ((NextCrossingT[0] < NextCrossingT[1]) << 2) +
               ((NextCrossingT[0] < NextCrossingT[2]) << 1) +
               ((NextCrossingT[1] < NextCrossingT[2]));
    const int cmpToAxis[8] = { 2, 1, 2, 1, 2, 2, 0, 0 };
    int stepAxis = cmpToAxis[bits];
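The cmpToAxis table can be checked by brute force: for every ordering of three distinct values, the table lookup must agree with a direct computation of the index of the minimum. A small illustrative test (not part of lrt):

    #include <cassert>

    void VerifyCmpToAxis() {
        const int cmpToAxis[8] = { 2, 1, 2, 1, 2, 2, 0, 0 };
        // All six orderings of three distinct values.
        const float perms[6][3] = { {1,2,3}, {1,3,2}, {2,1,3},
                                    {2,3,1}, {3,1,2}, {3,2,1} };
        for (int p = 0; p < 6; ++p) {
            const float *t = perms[p];
            int bits = ((t[0] < t[1]) << 2) + ((t[0] < t[2]) << 1) +
                       (t[1] < t[2]);
            // Directly compute the index of the smallest value.
            int direct = (t[0] < t[1] && t[0] < t[2]) ? 0 :
                         ((t[1] < t[2]) ? 1 : 2);
            assert(cmpToAxis[bits] == direct);
        }
    }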
The grid also provides a special GridAccel::IntersectP() method that is optimized for checking for intersection along shadow rays, where we are only interested in the presence of an intersection, rather than the details of the intersection itself. It is almost identical to the GridAccel::Intersect() routine, except that it calls the Primitive::IntersectP() method of the primitives rather than Primitive::Intersect(), and it immediately stops traversal when any intersection is found. Because of the small number of differences, we won't include the implementation here.

⟨GridAccel Public Methods⟩ +≡
    bool IntersectP(const Ray &ray) const;

4.4 Kd-Tree Accelerator

Binary space partitioning (BSP) trees adaptively subdivide space into irregularly-sized regions. The most important consequence of this design is that they can be much more effective than a regular grid for irregular collections of geometry. A BSP tree starts with a bounding box that encompasses the entire scene. If the number of primitives in the box is greater than some threshold, the box is split in half by a plane. Primitives are then assigned to whichever half they overlap; primitives that lie in both halves are assigned to both. This process continues recursively until either each sub-region contains a sufficiently small number of primitives, or a maximum splitting depth is reached. Because the splitting planes can be placed at arbitrary positions inside the overall bound, and because different parts of 3D space can be refined to different degrees, BSP trees can easily handle uneven distributions of geometry.

Two variations of BSP trees are kd-trees and octrees. A kd-tree simply restricts the splitting plane to be perpendicular to one of the coordinate axes; this makes traversal and construction of the tree more efficient. The octree uses three axis-perpendicular planes simultaneously, splitting the box into eight regions at each step. In this section, we will implement a kd-tree for ray intersection acceleration in the KdTreeAccel class. Source code for this class can be found in the file accelerators/kdtree.cpp.

Figure 4.7: The kd-tree is built by recursively splitting the bounding box of the scene geometry along one of the coordinate axes. Here, the first split is along the x axis; it is placed so that the triangle is precisely alone in the right region and the rest of the primitives end up on the left. The left region is then refined a few more times with axis-aligned splitting planes. The details of the refinement criteria (which axis is used to split space at each step, at which position along the axis the plane is placed, and at what point refinement terminates) can all substantially affect the performance of the tree in practice.

⟨KdTreeAccel Declarations⟩ +≡
    class KdTreeAccel : public Aggregate {
    public:
        ⟨KdTreeAccel Public Methods⟩
    private:
        ⟨KdTreeAccel Private Data⟩
    };

In addition to the primitives to be stored, the KdTreeAccel constructor takes a few values that will be used to guide the decisions made as the tree is built; these parameters are just stored in member variables for later use. For simplicity of implementation, the KdTreeAccel requires that all of the primitives it stores are intersectable. We leave as an exercise the task of improving the implementation to do lazy refinement as the GridAccel does. Therefore, the constructor starts out by refining all primitives until all are intersectable before building the tree; see Figure 4.7 for an overview of how the tree is built.

⟨KdTreeAccel Method Definitions⟩ ≡
    KdTreeAccel::KdTreeAccel(const vector<Reference<Primitive> > &p,
            int icost, int tcost, Float ebonus, int maxp,
            int maxDepth)
        : isectCost(icost), traversalCost(tcost),
          emptyBonus(ebonus), maxPrims(maxp) {
        vector<Reference<Primitive> > prims;
        for (u_int i = 0; i < p.size(); ++i)
            p[i]->FullyRefine(prims);
        ⟨Initialize mailboxes for KdTreeAccel⟩
        ⟨Build kd-tree for accelerator⟩
    }

⟨KdTreeAccel Private Data⟩ ≡
    int isectCost, traversalCost, maxPrims;
    Float emptyBonus;

As with the GridAccel, the kd-tree uses mailboxing to avoid repeated intersections with primitives that straddle splitting planes and overlap multiple regions of the tree.
In fact, it uses the exact same MailboxPrim structure.

⟨Initialize mailboxes for KdTreeAccel⟩ ≡
    curMailboxId = 0;
    nMailboxes = prims.size();
    mailboxPrims = (MailboxPrim *)AllocAligned(nMailboxes *
        sizeof(MailboxPrim));
    for (u_int i = 0; i < nMailboxes; ++i)
        new (&mailboxPrims[i]) MailboxPrim(prims[i]);

⟨KdTreeAccel Private Data⟩ +≡
    u_int nMailboxes;
    MailboxPrim *mailboxPrims;
    mutable int curMailboxId;

4.4.1 Tree Representation

The kd-tree is a binary tree, where each interior node always has two children and where the leaves of the tree store the primitives that overlap them. Each interior node must provide access to three pieces of information:

- Split axis: which of the x, y, or z axes we split along at this node
- Split position: the position of the splitting plane along the axis
- Children: information about how to reach the two child nodes beneath it

Each leaf node needs only to record which primitives overlap it.

It is worth going through a bit of trouble to ensure that all interior nodes and many leaf nodes use just 8 bytes of memory (assuming 4-byte Floats and pointers), because doing so ensures that four nodes will fit into a 32-byte cache line. Because there are many nodes in the tree and because many nodes are accessed for each ray, minimizing the size of the node representation substantially improves cache performance. Our initial implementation used a 16-byte node representation; when we reduced the size to 8 bytes we obtained an almost 20% speed increase. Both leaf and interior nodes are represented by the KdAccelNode structure below; the comments after each union member indicate whether a particular field is used for interior nodes, leaf nodes, or both.

⟨KdAccelNode Declarations⟩ +≡
    struct KdAccelNode {
        ⟨KdAccelNode Methods⟩
        union {
            u_int flags;    // Both
            Float split;    // Interior
            u_int nPrims;   // Leaf
        };
        union {
            u_int aboveChild;          // Interior
            MailboxPrim *onePrimitive; // Leaf
            MailboxPrim **primitives;  // Leaf
        };
    };

The two low-order bits of the KdAccelNode::flags variable are used to differentiate between interior nodes with x, y, and z splits (where these bits hold the values 0, 1, and 2, respectively) and leaf nodes (where these bits hold the value 3). It is relatively easy to store leaf nodes in 8 bytes: since the low two bits of KdAccelNode::flags are used to indicate that this is a leaf, the upper 30 bits of KdAccelNode::nPrims are available to record how many primitives overlap it. This should be plenty, because if the tree was built properly there should be just a handful of primitives in each leaf. As with the GridAccel, if just a single primitive overlaps a KdAccelNode leaf, its MailboxPrim pointer is stored directly in the KdAccelNode::onePrimitive field. If more primitives overlap, memory is dynamically allocated for an array of them, pointed to by KdAccelNode::primitives.

Leaf nodes are easy to initialize; the number of primitives must be shifted two bits to the left before being stored so that the low two bits of KdAccelNode::flags can be set to binary 11 to indicate that this is a leaf node.

⟨KdAccelNode Methods⟩ ≡
    void initLeaf(int *primNums, int np,
            MailboxPrim *mailboxPrims, MemoryArena &zone) {
        nPrims = np << 2;
        flags |= 3;
        ⟨Store MailboxPrim *s for leaf node⟩
    }
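A small sketch of the round trip through this leaf encoding (illustrative only, with an explicit variable rather than the union; the corresponding accessors are shown a bit further below):

    inline void LeafEncodingExample() {
        u_int np = 5;                     // overlapping primitive count
        u_int packed = (np << 2) | 3;     // what initLeaf() effectively stores
        bool isLeaf = (packed & 3) == 3;  // true: the low bits mark a leaf
        u_int count = packed >> 2;        // recovers np == 5
    }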
For leaf nodes with zero or one overlapping primitives, no dynamic memory allocation is necessary, thanks to the KdAccelNode::onePrimitive field. For the case where multiple primitives overlap, the caller passes in a MemoryArena for allocating memory for the arrays of MailboxPrim pointers. This helps to reduce wasted space for these allocations and improves cache efficiency by placing all of these arrays together in memory.

⟨Store MailboxPrim *s for leaf node⟩ ≡
    if (np == 0)
        onePrimitive = NULL;
    else if (np == 1)
        onePrimitive = &mailboxPrims[primNums[0]];
    else {
        primitives = (MailboxPrim **)zone.Alloc(np *
            sizeof(MailboxPrim *));
        for (int i = 0; i < np; ++i)
            primitives[i] = &mailboxPrims[primNums[i]];
    }

Getting interior nodes down to 8 bytes takes a bit more work. As explained above, the lowest two bits of KdAccelNode::flags are used to record which axis the node was split along. Yet the split position along that axis is stored in KdAccelNode::split, a Float value that occupies the same memory as KdAccelNode::flags. This seems impossible: we can't just ask the compiler to use the top 30 bits of KdAccelNode::split as a Float. It turns out that as long as the lowest two bits of KdAccelNode::flags are set after KdAccelNode::split, this technique works thanks to the layout of Floats in memory. For IEEE floating point, the two bits used by KdAccelNode::flags are the least-significant bits of the floating-point mantissa value, so changing their original value only minimally affects the floating-point value that is stored. Figure 4.8 illustrates the layout in memory. Although this trick is fairly complicated, it is worth it for the performance benefits. In addition, all of the complexity is hidden behind a small number of KdAccelNode methods, so the rest of the system is insulated from our special representation.

Figure 4.8: Layout of Floats and ints in memory. (Figure incomplete in this draft.)

So that we don't need memory to store pointers to the two child nodes of an interior node, all of the nodes are allocated in a single contiguous block of memory, and the child of an interior node that is responsible for space "below" the splitting plane is always stored in the array position immediately after its parent (this also improves cache performance, by keeping at least one child close to its parent in memory). The other child, representing space above the splitting plane, will end up somewhere else in the array; KdAccelNode::aboveChild stores its position.

Given all those conventions, the code to initialize an interior node is straightforward. The split position is stored before the split axis is written into KdAccelNode::flags. Rather than directly assigning the axis to KdAccelNode::flags, which would clobber KdAccelNode::split as well, it's necessary to carefully set just the low two bits of the flags with the axis's value.

⟨KdAccelNode Methods⟩ +≡
    void initInterior(int axis, Float s) {
        split = s;
        flags &= ~3;
        flags |= axis;
    }

Finally, we'll provide a few methods to extract various values from the node, so that callers don't have to be aware of the admittedly complex details of its representation.

⟨KdAccelNode Methods⟩ +≡
    Float SplitPos() const { return split; }
    int nPrimitives() const { return nPrims >> 2; }
    int SplitAxis() const { return flags & 3; }
    bool IsLeaf() const { return (flags & 3) == 3; }
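A sketch of why the interior-node trick is safe, assuming a 32-bit IEEE Float sharing storage with a u_int, as in KdAccelNode's union (illustrative only, not lrt code):

    union Bits { Float split; u_int flags; };

    Bits b;
    b.split = 0.25f;   // store the splitting plane position first
    b.flags |= 2;      // then tag the low bits with the split axis (z)
    // b.split now differs from 0.25f only in its two low-order
    // mantissa bits, i.e., by at most a few ulps, which is far below
    // any precision that matters for a splitting plane position.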
4.4.2 Tree Construction

The kd-tree is built with a recursive top-down algorithm. At each step, we have an axis-aligned region of space and a set of primitives that overlap that region. Each region is either split into two sub-regions and turned into an interior node, or a leaf node is created with the overlapping primitives, terminating the recursion.

As mentioned in the discussion of KdAccelNodes, all tree nodes are stored in a contiguous array; KdTreeAccel::nextFreeNode records the next node in this array that is available, and KdTreeAccel::nAllocedNodes records the total number that have been allocated. By setting both of them to zero and not allocating any nodes at startup, we ensure that an allocation will be done immediately when the first node of the tree is initialized.

It is also necessary to determine a maximum tree depth if one wasn't given to the constructor. Though the tree construction process will normally terminate naturally at a reasonable depth, we cap the maximum depth so that the amount of memory used for the tree cannot grow without bound in pathological cases. We have found that the expression $8 + 1.3 \log_2 N$ gives a good maximum depth for a variety of scenes.

⟨Build kd-tree for accelerator⟩ ≡
    nextFreeNode = nAllocedNodes = 0;
    if (maxDepth <= 0)
        maxDepth = Round2Int(8 + 1.3f * Log2Int(prims.size()));
    ⟨Compute bounds for kd-tree construction⟩
    ⟨Allocate working memory for kd-tree construction⟩
    ⟨Initialize primNums for kd-tree construction⟩
    ⟨Start recursive construction of kd-tree⟩
    ⟨Free working memory for kd-tree construction⟩

⟨KdTreeAccel Private Data⟩ +≡
    KdAccelNode *nodes;
    int nAllocedNodes, nextFreeNode;

Because the construction routine will be repeatedly using the bounding boxes of the primitives along the way, we store them in a vector so that the potentially slow Primitive::WorldBound() methods don't need to be called repeatedly.

⟨Compute bounds for kd-tree construction⟩ ≡
    vector<BBox> primBounds;
    primBounds.reserve(prims.size());
    for (u_int i = 0; i < prims.size(); ++i) {
        BBox b = prims[i]->WorldBound();
        bounds = Union(bounds, b);
        primBounds.push_back(b);
    }

⟨KdTreeAccel Private Data⟩ +≡
    BBox bounds;

One of the parameters to the tree construction routine is an array of integers indicating which primitives overlap the current node. For the root node, we just need an array with prims.size() entries set up such that the ith entry has the value i.

⟨Initialize primNums for kd-tree construction⟩ ≡
    int *primNums = new int[prims.size()];
    for (u_int i = 0; i < prims.size(); ++i)
        primNums[i] = i;

KdTreeAccel::buildTree() is called for each tree node; it is responsible for deciding if the node should be an interior node or a leaf and for updating the data structures appropriately. The edges, prims0, and prims1 parameters are pointers to data from the ⟨Allocate working memory for kd-tree construction⟩ fragment, which will be defined in a few pages.

⟨Start recursive construction of kd-tree⟩ ≡
    buildTree(0, bounds, primBounds, primNums, prims.size(),
        maxDepth, edges, prims0, prims1);

Once the tree has been built, the working memory can be freed.

⟨Free working memory for kd-tree construction⟩ ≡
    delete[] primNums;

The main parameters to KdTreeAccel::buildTree() are the offset into the array of KdAccelNodes to use for the node that it creates, the bounding box that gives the region of space that the node covers, and the indices of the primitives that overlap it.
⟨KdTreeAccel Method Definitions⟩ +≡
    void KdTreeAccel::buildTree(int nodeNum, const BBox &nodeBounds,
            const vector<BBox> &allPrimBounds, int *primNums,
            int nPrims, int depth, BoundEdge *edges[3],
            int *prims0, int *prims1, int badRefines) {
        ⟨Get next free node from nodes array⟩
        ⟨Initialize leaf node if termination criteria met⟩
        ⟨Initialize interior node and continue recursion⟩
    }

If all of the allocated nodes have been used, node memory is reallocated with twice as many entries and the old values are copied over. The first time KdTreeAccel::buildTree() is called, KdTreeAccel::nAllocedNodes will be zero and an initial block of tree nodes will be allocated.

⟨Get next free node from nodes array⟩ ≡
    if (nextFreeNode == nAllocedNodes) {
        int nAlloc = max(2 * nAllocedNodes, 512);
        KdAccelNode *n = (KdAccelNode *)AllocAligned(nAlloc *
            sizeof(KdAccelNode));
        if (nAllocedNodes > 0) {
            memcpy(n, nodes, nAllocedNodes * sizeof(KdAccelNode));
            FreeAligned(nodes);
        }
        nodes = n;
        nAllocedNodes = nAlloc;
    }
    ++nextFreeNode;

A leaf node is created (stopping the recursion) either if there are a sufficiently small number of primitives in the region or if the maximum depth has been reached. The depth parameter starts out as the tree's maximum depth and is decremented at each level.

⟨Initialize leaf node if termination criteria met⟩ ≡
    if (nPrims <= maxPrims || depth == 0) {
        nodes[nodeNum].initLeaf(primNums, nPrims, mailboxPrims, zone);
        return;
    }

As described above, KdAccelNode::initLeaf() uses a memory zone to allocate space for variable-sized arrays of primitives. Because the zone used here is a member variable, the memory it allocates will naturally all be freed when the KdTreeAccel is destroyed.

⟨KdTreeAccel Private Data⟩ +≡
    MemoryArena zone;

If we are building an interior node, it is necessary to choose a splitting plane, classify the primitives with respect to that plane, and recurse.

⟨Initialize interior node and continue recursion⟩ ≡
    ⟨Choose split axis position for interior node⟩
    ⟨Create leaf if no good splits were found⟩
    ⟨Classify primitives with respect to split⟩
    ⟨Recursively initialize children nodes⟩

Our implementation chooses a split based on a cost model that estimates the computational expense of performing ray intersection tests, including the time spent traversing nodes of the tree and the time spent on ray–primitive intersection tests. Its goal is to minimize the total cost; we implement a greedy algorithm that minimizes the cost for each node individually. The estimated cost is computed for several candidate splitting planes in the node, and the split that gives the lowest cost is chosen.

The idea behind the cost model is straightforward: at any node of the tree, we could just create a leaf node for the current region and geometry. In that case, any ray that passes through this region will be tested against all of the overlapping primitives and will incur a cost of

$$\sum_{i=1}^{N} t_i(i),$$

where N is the number of primitives in the region and $t_i(i)$ is the time to compute a ray–object intersection with the ith primitive. The other option is to split the region.
In that case, rays will incur the cost

$$t_t + p_0 \sum_{i=1}^{N_b} t_i(b_i) + p_1 \sum_{i=1}^{N_a} t_i(a_i),$$

where $t_t$ is the time it takes to traverse the interior node and determine which of the children the ray passes through, $p_0$ and $p_1$ are the probabilities that the ray passes through each of the two regions, $b_i$ and $a_i$ are the indices of the primitives below and above the splitting plane, and $N_b$ and $N_a$ are the number of primitives that overlap the regions below and above the splitting plane, respectively. The choice of splitting plane affects both the two probabilities as well as the number of primitives on each side of the split.

In our implementation, we will make the simplifying assumption that $t_i(i)$ is the same for all of the primitives; this is probably not too far from reality, and any error that it introduces doesn't seem to affect the performance of this accelerator very much. Another possibility would be to add a method to Primitive that returns an estimate of the number of CPU cycles its intersection test requires. The intersection cost $t_i$ and the traversal cost $t_t$ can be set by the user; their default values are 80 and 1, respectively. Ultimately, it is the ratio of these two values that primarily determines the behavior of the tree-building algorithm.

Finally, it is worth giving a slight preference to choosing splits where one of the children has no primitives overlapping it, since rays passing through these regions can immediately advance to the next kd-tree node without any ray–primitive intersection tests. Thus, the revised costs for unsplit and split regions are, respectively,

$$t_i N$$

$$t_t + (1 - b_e)(p_b N_b t_i + p_a N_a t_i),$$

where $b_e$ is a "bonus" value that is zero unless one of the two regions is completely empty, in which case it takes on a value between zero and one.

The probabilities $p_0$ and $p_1$ are easily computed using ideas from geometric probability. It can be shown that for a convex volume A contained in another convex volume B, the conditional probability that a random ray passing through B will also pass through A is the ratio of their surface areas, $s_A$ and $s_B$:

$$p(A|B) = \frac{s_A}{s_B}.$$

Because we are interested in the cost for rays passing through the interior node, we can use this result directly. Thus, given a split of a region A into two sub-regions B and C (see Figure 4.9), the probability that a ray passing through A will also pass through either of the sub-regions is easily computed.

Figure 4.9: Splitting the region A into sub-regions B and C.

The last problem to address is how to generate candidate splitting positions and how to efficiently compute the cost for each candidate. It can be shown that the minimum cost with this model will be attained at a split that is coincident with one of the faces of one of the primitives' bounding boxes; there's no need to consider splits at intermediate positions. (To convince yourself of this, consider what happens to the cost function between the edges of the faces.) Here, we will consider the bounding box faces inside the region along the axis chosen for the split as the candidates. The cost for checking all of these candidates can thus be kept relatively low with a carefully-structured algorithm.
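Before turning to the implementation, a short worked example of this cost model with illustrative numbers: consider a node with surface area 100 containing 8 primitives, and a candidate split whose below child has surface area 60 with 6 overlapping primitives, while the above child has surface area 70 with 2. With $t_i = 80$, $t_t = 1$, and no empty bonus, $p_b = 0.6$ and $p_a = 0.7$, so the cost of not splitting is $80 \cdot 8 = 640$, while the cost of splitting is $1 + 80(0.6 \cdot 6 + 0.7 \cdot 2) = 401$; the split is clearly worthwhile. (It is expected that $p_b + p_a > 1$, since the two children share the splitting face, which is counted in both of their surface areas.)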
The last problem to address is how to generate candidate splitting positions and how to efficiently compute the cost for each candidate. It can be shown that the minimum cost with this model will be attained at a split that is coincident with one of the faces of one of the primitives' bounding boxes; there's no need to consider splits at intermediate positions. (To convince yourself of this, consider what happens to the cost function between the edges of the faces.) Here, we will consider all bounding box faces inside the region for a single axis. The cost for checking all of these candidates can be kept relatively low with a carefully structured algorithm. To compute these costs, we will sweep across the projections of the bounding boxes onto the axis and keep track of which candidate gives the lowest cost (Figure 4.10).

Figure 4.10: Projections of bounding box edges onto the axis.

Each bounding box has two edges on each axis, each of which is represented by a BoundEdge structure. This structure records the position of the edge along the axis, whether it represents the start or end of a bounding box (going from low to high along the axis), and which primitive it is associated with.

⟨KdAccelNode Declarations⟩ +≡
struct BoundEdge {
    ⟨BoundEdge Public Methods⟩
    Float t;
    int primNum;
    enum { START, END } type;
};

⟨BoundEdge Public Methods⟩ ≡
BoundEdge(Float tt, int pn, bool starting) {
    t = tt;
    primNum = pn;
    type = starting ? START : END;
}

We will need at most 2 * prims.size() BoundEdges when computing costs for any tree node, so we allocate the memory for the edges for all three axes once and then reuse it for each node that is created. The fragment ⟨Free working memory for kd-tree construction⟩, not included here, frees this space after the tree has been built.

⟨Allocate working memory for kd-tree construction⟩ ≡
BoundEdge *edges[3];
for (int i = 0; i < 3; ++i)
    edges[i] = new BoundEdge[2*prims.size()];

After determining the estimated cost for creating a leaf, KdTreeAccel::buildTree() computes the cost function for each candidate split along a chosen axis. bestAxis and bestOffset record the axis and bounding box edge index that gave the lowest cost so far, bestCost. invTotalSA is initialized to the reciprocal of the node's surface area; its value will be used when computing the probabilities of rays passing through each of the candidate children nodes.

⟨Choose split axis position for interior node⟩ ≡
int bestAxis = -1, bestOffset = -1;
Float bestCost = INFINITY;
Float oldCost = isectCost * nPrims;
Vector d = nodeBounds.pMax - nodeBounds.pMin;
Float invTotalSA = 1.f / (2.f * (d.x*d.y + d.x*d.z + d.y*d.z));
⟨Choose which axis to split along⟩
⟨Initialize edges for axis⟩
⟨Compute cost of all splits for axis to find best⟩

Although the discussion above considered candidate splits from all three axes, the implementation does not loop over the axes: it always splits along the axis with the largest extent. This works well in practice and saves the work of checking all of the candidates along the other two axes.

⟨Choose which axis to split along⟩ ≡
int axis;
if (d.x > d.y && d.x > d.z) axis = 0;
else axis = (d.y > d.z) ? 1 : 2;

First the edges array for the current axis is initialized using the bounding boxes of the overlapping primitives. The array is then sorted from low to high along the axis so that we can sweep over the box edges from first to last.

⟨Initialize edges for axis⟩ ≡
for (int i = 0; i < nPrims; ++i) {
    int pn = primNums[i];
    const BBox &bbox = allPrimBounds[pn];
    edges[axis][2*i  ] = BoundEdge(bbox.pMin[axis], pn, true);
    edges[axis][2*i+1] = BoundEdge(bbox.pMax[axis], pn, false);
}
sort(&edges[axis][0], &edges[axis][2*nPrims]);

The C++ standard library routine sort() requires that the structure being sorted define an ordering; this is easily done with the BoundEdge::t values. However, one subtlety is that if the BoundEdge::t values match, it is necessary to break the tie by comparing the edges' types; this is needed because sort() requires a strict weak ordering, so for any pair of edges at most one of a < b and b < a may be true. Such a tie arises, for example, when one primitive's bounding box ends at exactly the position along the axis where another's begins.

⟨BoundEdge Public Methods⟩ +≡
bool operator<(const BoundEdge &e) const {
    if (t == e.t)
        return (int)type < (int)e.type;
    else return t < e.t;
}

Given the sorted array of edges, we'd like to quickly compute the cost function for a split at each one of them.
The probabilities for a ray passing through each child node are easily computed, and the number of primitives on each side of the split is tracked by nBelow and nAbove. At the first edge, all primitives must be above that edge by definition, so nAbove is initialized to nPrims and nBelow is zero. When we encounter a starting edge of a bounding box, we know that the enclosed primitive will overlap the volume below the potential split at that edge. When we encounter an ending edge, the enclosed primitive must be above the edge. The tests at the start and end of the loop body update the primitive counts for these cases.

⟨Compute cost of all splits for axis to find best⟩ ≡
int nBelow = 0, nAbove = nPrims;
for (int i = 0; i < 2*nPrims; ++i) {
    if (edges[axis][i].type == BoundEdge::END) --nAbove;
    Float edget = edges[axis][i].t;
    if (edget > nodeBounds.pMin[axis] &&
            edget < nodeBounds.pMax[axis]) {
        ⟨Compute cost for split at ith edge⟩
    }
    if (edges[axis][i].type == BoundEdge::START) ++nBelow;
}

Given all of this information, the cost for a particular split is easily computed. belowSA and aboveSA hold the surface areas of the two candidate child bounds; they are easily computed by adding up the areas of the six faces. Given an axis number, we can use the otherAxis array to quickly compute the indices of the other two axes without branching.

⟨Compute cost for split at ith edge⟩ ≡
int otherAxis[3][2] = { {1,2}, {0,2}, {0,1} };
int otherAxis0 = otherAxis[axis][0], otherAxis1 = otherAxis[axis][1];
Float belowSA = 2 * (d[otherAxis0] * d[otherAxis1] +
    (edget - nodeBounds.pMin[axis]) *
    (d[otherAxis0] + d[otherAxis1]));
Float aboveSA = 2 * (d[otherAxis0] * d[otherAxis1] +
    (nodeBounds.pMax[axis] - edget) *
    (d[otherAxis0] + d[otherAxis1]));
Float pBelow = belowSA * invTotalSA, pAbove = aboveSA * invTotalSA;
Float eb = (nAbove == 0 || nBelow == 0) ? emptyBonus : 0.f;
Float cost = traversalCost + isectCost * (1.f - eb) *
    (pBelow * nBelow + pAbove * nAbove);
⟨Update best split if this is lowest cost so far⟩

⟨Update best split if this is lowest cost so far⟩ ≡
if (cost < bestCost) {
    bestCost = cost;
    bestAxis = axis;
    bestOffset = i;
}
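To make the computation concrete, suppose the node is a unit cube, so d = (1, 1, 1) and invTotalSA = 1/6, and that it holds four primitives. Consider a candidate split along the x axis at edget = 0.25 that puts one primitive below and three above. Then belowSA = 2(1 + 0.25 · 2) = 3 and aboveSA = 2(1 + 0.75 · 2) = 5, so pBelow = 1/2 and pAbove = 5/6. With the default isectCost = 80 and traversalCost = 1 and no empty bonus, cost = 1 + 80(0.5 · 1 + (5/6) · 3) = 241, well below the leaf cost oldCost = 80 · 4 = 320, so this candidate would be recorded as the best split found so far.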
It may happen that there are no possible splits found in the tests above (Figure 4.11 illustrates a case where this may happen). In this case, there isn't a single candidate position at which to split the node. Refining such a node doesn't do any good, since both children will still have the same number of overlapping primitives. When we detect this condition, we give up and make a leaf node.

It is also possible that the best split will have a cost that is still higher than the cost for not splitting the node at all. If it is substantially worse and there aren't too many primitives, a leaf node is made immediately. Otherwise, badRefines keeps track of how many bad splits have been made so far above the current node of the tree. It's worth allowing a few slightly poor refinements, since later splits may be able to find much better ones given a smaller subset of primitives to consider.

Figure 4.11: No useful splits possible due to overlap. Three bounding boxes overlap the node, yet none of their edges are inside it.

⟨Create leaf if no good splits were found⟩ ≡
if (bestCost > oldCost) ++badRefines;
if ((bestCost > 4.f * oldCost && nPrims < 16) ||
        bestAxis == -1 || badRefines == 3) {
    nodes[nodeNum].initLeaf(primNums, nPrims, mailboxPrims, zone);
    return;
}

Having chosen a split position, the edges can be used to quickly classify the primitives as being above, below, or on both sides of the split, in the same way as was done to keep track of nBelow and nAbove in the code above.

⟨Classify primitives with respect to split⟩ ≡
int n0 = 0, n1 = 0;
for (int i = 0; i < bestOffset; ++i)
    if (edges[bestAxis][i].type == BoundEdge::START)
        prims0[n0++] = edges[bestAxis][i].primNum;
for (int i = bestOffset+1; i < 2*nPrims; ++i)
    if (edges[bestAxis][i].type == BoundEdge::END)
        prims1[n1++] = edges[bestAxis][i].primNum;

Recall that the node number of the "below" child of this node is the current node number plus one. After the recursion has returned from that side of the tree, the nextFreeNode offset is used for the "above" child. The only other important detail here is that the prims0 memory is passed directly for reuse by both children, while the prims1 pointer is advanced forward first. This is necessary since the current invocation of KdTreeAccel::buildTree() depends on its prims1 values being preserved over the first recursive call to KdTreeAccel::buildTree() below, since they must be passed as a parameter to the second call. However, there is no corresponding need to preserve the edges values or to preserve prims0 beyond its immediate use in the first recursive call.

⟨Recursively initialize children nodes⟩ ≡
Float tsplit = edges[bestAxis][bestOffset].t;
nodes[nodeNum].initInterior(bestAxis, tsplit);
BBox bounds0 = nodeBounds, bounds1 = nodeBounds;
bounds0.pMax[bestAxis] = bounds1.pMin[bestAxis] = tsplit;
buildTree(nodeNum+1, bounds0, allPrimBounds, prims0, n0,
    depth-1, edges, prims0, prims1 + nPrims, badRefines);
nodes[nodeNum].aboveChild = nextFreeNode;
buildTree(nodes[nodeNum].aboveChild, bounds1, allPrimBounds,
    prims1, n1, depth-1, edges, prims0, prims1 + nPrims, badRefines);

Thus, much more space is needed for the prims1 array of integers for storing the overlapping primitive numbers than for the prims0 array, which only needs to handle the primitives at a single level at a time: because each level of the recursion must preserve its prims1 contents across the first recursive call, each of the up to maxDepth+1 simultaneously active invocations needs its own nPrims-sized slice, which is why prims1 + nPrims is passed down to the children.

⟨Allocate working memory for kd-tree construction⟩ +≡
int *prims0 = new int[prims.size()];
int *prims1 = new int[(maxDepth+1) * prims.size()];

4.4.3 Traversal

Figure 4.12 shows the basic process of ray traversal through the tree. Intersecting the ray with the tree's overall bounds gives initial tmin and tmax values, marked with "x"s in the figure. As with the grid accelerator, if the ray misses the scene bounds, we can quickly return false. Otherwise, we begin to descend into the tree, starting at the root. At each interior node, we determine which of the two children the ray enters first and process both children in order. Traversal ends either when the ray exits the tree or when the closest intersection is found.
⟨KdTreeAccel Method Definitions⟩ +≡
bool KdTreeAccel::Intersect(const Ray &ray,
        Intersection *isect) const {
    ⟨Compute initial parametric range of ray inside kd-tree extent⟩
    ⟨Prepare to traverse kd-tree for ray⟩
    ⟨Traverse kd-tree nodes in order for ray⟩
}

The algorithm starts by finding the overall parametric range [tmin, tmax] of the ray's overlap with the tree, exiting immediately if there is no overlap.

⟨Compute initial parametric range of ray inside kd-tree extent⟩ ≡
Float tmin, tmax;
if (!bounds.IntersectP(ray, &tmin, &tmax))
    return false;

Before tree traversal starts, a new mailbox id is found for the ray, and the reciprocals of the components of the direction vector are precomputed so that divides can be replaced with multiplies in the main traversal loop. The array of KdToDo structures is used to record the nodes yet to be processed for the ray; it is ordered so that the last active entry in the array is the next node that should be considered. The maximum number of entries needed in this array is the maximum depth of the kd-tree; the array size used below should be more than enough in practice.

Figure 4.12: Traversal of a ray through the kd-tree: the ray is intersected with the bounds of the tree, giving an initial parametric range [tmin, tmax] to consider. Because this range is non-empty, we need to consider the two children of the root node here. The ray first enters the child on the right, labeled "near", where it has a parametric range [tmin, tsplit]. If the near node is a leaf with primitives in it, we intersect the ray with the primitives; otherwise we process its children nodes. If no hit is found, or if a hit is found beyond [tmin, tsplit], then the far node, on the left, is processed. This sequence continues, processing tree nodes in a depth-first, front-to-back traversal, until the closest intersection is found or the ray exits the tree.

⟨Prepare to traverse kd-tree for ray⟩ ≡
int rayId = curMailboxId++;
Vector invDir(1.f/ray.d.x, 1.f/ray.d.y, 1.f/ray.d.z);
#define MAX_TODO 64
KdToDo todo[MAX_TODO];
int todoPos = 0;

⟨KdTreeAccel Declarations⟩ +≡
struct KdToDo {
    const KdAccelNode *node;
    Float tmin, tmax;
};

The traversal continues through the nodes, processing a single leaf or interior node each time through the loop.

⟨Traverse kd-tree nodes in order for ray⟩ ≡
bool hit = false;
const KdAccelNode *node = &nodes[0];
while (node != NULL) {
    ⟨Bail out if we found a hit closer than the current node⟩
    if (!node->IsLeaf()) {
        ⟨Process kd-tree interior node⟩
    }
    else {
        ⟨Check for intersections inside leaf node⟩
        ⟨Grab next node to process from todo list⟩
    }
}
return hit;

An intersection may have been previously found in a primitive that overlaps multiple nodes. If the intersection was outside the current node when first detected, it is necessary to keep traversing the tree until we come to a node where tmin is beyond the intersection; only then is it certain that there is no closer intersection with some other primitive.
⟨Bail out if we found a hit closer than the current node⟩ ≡
if (ray.maxt < tmin) break;

For interior tree nodes, the first thing to do is to intersect the ray with the node's splitting plane and determine whether one or both of the children nodes needs to be processed and in what order the ray passes through them.

⟨Process kd-tree interior node⟩ ≡
⟨Compute distance along ray to split plane⟩
⟨Get node children pointers for ray⟩
⟨Advance to next child node, possibly enqueue other child⟩

Figure 4.13: The position of the origin of the ray with respect to the splitting plane can be used to determine which of the node's children should be processed first. If a ray like r1 is on the "below" side of the splitting plane, we should process the below child before the above child, and vice versa.

The parametric distance to the split plane is computed in the same manner as was done in computing the intersection of a ray and an axis-aligned plane for the ray–bounding box test.

⟨Compute distance along ray to split plane⟩ ≡
int axis = node->SplitAxis();
Float tplane = (node->SplitPos() - ray.o[axis]) * invDir[axis];
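For example, given a ray with origin (0.25, 0, 0) and direction (1, 0, 0) passing through a node that splits at x = 0.5, invDir[axis] is 1 and tplane = (0.5 - 0.25) · 1 = 0.25. Because the ray's origin is on the low side of the splitting plane, the below child should be processed first; the fragments below make exactly this determination.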
Now it is necessary to determine the order in which the ray encounters the children nodes, so that the tree is traversed in front-to-back order along the ray. Figure 4.13 shows the geometry of this computation. The position of the ray's origin with respect to the splitting plane is enough to distinguish between the two cases, ignoring for now the case where the ray doesn't actually pass through one of the two nodes.

⟨Get node children pointers for ray⟩ ≡
const KdAccelNode *firstChild, *secondChild;
int belowFirst = ray.o[axis] <= node->SplitPos();
if (belowFirst) {
    firstChild = node + 1;
    secondChild = &nodes[node->aboveChild];
}
else {
    firstChild = &nodes[node->aboveChild];
    secondChild = node + 1;
}

It may not be necessary to process both children of this node. Figure 4.14 shows some configurations where the ray only passes through one of the children. The ray will never miss both children, since otherwise the current interior node should never have been traversed. The first if test in the code below corresponds to the left side of the figure: only the near node needs to be processed if the ray faces away from the far node or doesn't overlap it because tsplit > tmax. The right side of the figure shows the similar case tested in the second if test: the near node may not need processing if the ray doesn't overlap it. Otherwise, the else clause handles the case of both children needing processing; the near node will be processed next, and the far node goes on the todo list.

Figure 4.14: Two cases where both children of a node don't need to be processed because the ray doesn't overlap them. On the left, the top ray intersects the splitting plane beyond the ray's tmax position and thus doesn't enter the far child. The bottom ray is facing away from the splitting plane, indicated by a negative tsplit value. On the right, the ray intersects the plane before the ray's tmin value, indicating that the near child doesn't need processing.

⟨Advance to next child node, possibly enqueue other child⟩ ≡
if (tplane > tmax || tplane < 0)
    node = firstChild;
else if (tplane < tmin)
    node = secondChild;
else {
    ⟨Enqueue secondChild in todo list⟩
    node = firstChild;
    tmax = tplane;
}

⟨Enqueue secondChild in todo list⟩ ≡
todo[todoPos].node = secondChild;
todo[todoPos].tmin = tplane;
todo[todoPos].tmax = tmax;
++todoPos;

If the current node is a leaf, intersection tests are performed against the primitives in the leaf, though the mailbox test makes it possible to avoid re-testing primitives that have already been considered for this ray.

⟨Check for intersections inside leaf node⟩ ≡
u_int nPrimitives = node->nPrimitives();
if (nPrimitives == 1) {
    MailboxPrim *mp = node->onePrimitive;
    ⟨Check one primitive inside leaf node⟩
}
else {
    MailboxPrim **prims = node->primitives;
    for (u_int i = 0; i < nPrimitives; ++i) {
        MailboxPrim *mp = prims[i];
        ⟨Check one primitive inside leaf node⟩
    }
}

Finally, we check the mailbox id of the ray and call the Primitive::Intersect() routine.

⟨Check one primitive inside leaf node⟩ ≡
if (mp->lastMailboxId != rayId) {
    mp->lastMailboxId = rayId;
    if (mp->primitive->Intersect(ray, isect))
        hit = true;
}

After doing the intersection tests at the leaf node, the next node to process is loaded from the todo array. If no more nodes remain, then we know that the ray passed through the tree without hitting anything.

⟨Grab next node to process from todo list⟩ ≡
if (todoPos > 0) {
    --todoPos;
    node = todo[todoPos].node;
    tmin = todo[todoPos].tmin;
    tmax = todo[todoPos].tmax;
}
else break;

Like the GridAccel, the KdTreeAccel has a specialized intersection method for shadow rays which is not shown here. It is largely similar to the KdTreeAccel::Intersect() method, just calling the Primitive::IntersectP() method and returning true as soon as it finds any intersection, without worrying about finding the closest one.

⟨KdTreeAccel Public Methods⟩ +≡
bool IntersectP(const Ray &ray) const;
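Since that method isn't included in the text, the following sketch shows how its per-primitive leaf test plausibly differs from the one above; this is an illustration of the idea rather than lrt's actual code.

    if (mp->lastMailboxId != rayId) {
        mp->lastMailboxId = rayId;
        // Any intersection at all suffices for a shadow ray, so the
        // traversal can return immediately rather than recording the
        // hit and continuing to search for a closer one.
        if (mp->primitive->IntersectP(ray))
            return true;
    }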
Further Reading

After the introduction of the ray tracing algorithm, an enormous amount of research was done to try to find effective ways to speed it up, primarily by developing improved ray tracing acceleration structures. Arvo and Kirk's chapter in An Introduction to Ray Tracing summarizes the state of the art as of 1989. Ray Tracing News, www.acm.org/tog/resources/RTNews/, is an excellent resource for general ray tracing information and has particularly useful discussion about implementation issues and tricks of the trade.

Clark first suggested using bounding volumes to cull collections of objects for standard visible-surface determination algorithms (Clark 1976). Building on this work, Rubin and Whitted developed the first hierarchical data structures for scene representation for fast ray tracing (Rubin and Whitted 1980). Weghorst et al.'s paper discussed the trade-offs of using various shapes for bounding volumes and suggested projecting objects to the screen and using a z-buffer rendering to accelerate eye rays (Weghorst, Hooper, and Greenberg 1984).

Fujimoto et al. were the first to introduce uniform voxel grids, similar to what we describe in this chapter (Fujimoto, Tanaka, and Iwata 1986). Snyder and Barr described a number of key improvements to this approach and showed their use for rendering extremely complex scenes (Snyder and Barr 1987). Hierarchical grids were first described by Jevans and Wyvill (Jevans and Wyvill 1989). More recent techniques for hierarchical grids were developed by Cazals et al. and by Klimaszewski and Sederberg (Cazals, Drettakis, and Puech 1995; Klimaszewski and Sederberg 1997).

Glassner introduced the use of octrees for ray intersection acceleration (Glassner 1984); this approach was more robust to scenes with non-uniform distributions of geometry. The kd-tree was first described by Kaplan (?). Kaplan's tree construction algorithm always split nodes down their middle. A better approach for building trees, and the basis for the method used in the KdTreeAccel, was introduced by MacDonald and Booth (MacDonald and Booth 1990), who estimated ray–node traversal probabilities using relative surface areas. Naylor has also written on general issues of constructing good kd-trees (Naylor 1993). Havran and Bittner (Havran and Bittner 2002) have recently revisited many of these issues and introduced some useful improvements. Adding a bonus factor for tree nodes that are completely empty was suggested by Hurley et al. (Hurley, Kapustin, Reshetov, and Soupikov 2002).

Jansen first described the efficient ray traversal algorithm for kd-trees (Jansen 1986); Arvo has also investigated these issues (Arvo 1988). Sung and Shirley describe a ray traversal algorithm's implementation for a BSP-tree accelerator (Sung and Shirley 1992); our KdTreeAccel traversal code is loosely based on theirs.

An early object subdivision approach was the hierarchical bounding volumes of Goldsmith and Salmon (Goldsmith and Salmon 1987). They were also the first to introduce techniques for estimating the probability of a ray intersecting a bounding volume based on the volume's surface area. Kay and Kajiya developed an approach based on hierarchies of slab-based bounding volumes (Kay and Kajiya 1986). Arvo and Kirk accelerated ray tracing by subdividing the five-dimensional space of ray positions and directions (Arvo and Kirk 1987). Arnaldi et al. and Amanatides and Woo came up with mailboxing (Arnaldi, Priol, and Bouatouch 1987; Amanatides and Woo 1987).

Kirk and Arvo introduced the unifying principle of meta-hierarchies (Kirk and Arvo 1988); they showed that by implementing acceleration data structures that conform to the same interface as is used for primitives in the scene, it's easy to mix and match multiple intersection schemes in a scene without needing to have particular knowledge of it. Smits has written about fast ray–box intersection and general issues of efficient ray tracing (Smits 1998). Papers by Woo, Pearce, and others describe additional clever implementation tricks.

Figure 4.15: If a bounding box of the overlapping geometry is stored in each voxel for fast rejection of unnecessary ray–primitive intersection tests, an alternative to checking for ray–bounding box intersection is to find the bounding box of the ray inside the voxel (shown here with a dashed line) and test to see if that overlaps the geometry bound.

Exercises

4.1 Try using bounding box tests to improve the grid's performance: inside each grid voxel, store the bounding box of the geometry that overlaps the voxel. Use this bounding box to quickly skip intersection tests with geometry if the ray doesn't intersect the bound. Develop criteria based on the number of primitives in a voxel and the size of their bound with respect to the voxel's bound to only do the bounding box tests for voxels where doing so is likely to improve performance. When is this extra work worthwhile?
4.2 Rather than computing a ray–bounding box intersection for the technique described in the previous exercise, it can be more efficient to compute the bounding box of the ray's segment inside each voxel and then check whether that box overlaps the world-space bound of the objects in the voxel first; this is a very cheap test (see Figure 4.15). Implement this variant and compare its performance to the previous approach.

4.3 Rewrite the ⟨Find stepAxis for stepping to next voxel⟩ fragment to compute the stepping axis in the obvious way (with a few comparisons and branches). Evaluate the performance benefit of lrt's table-based approach on several CPUs. What do the results tell you about the architecture of each CPU? How do your results vary if you compare 4 or 5 numbers instead of just 3?

4.4 Generalize the grid implementation in this chapter to be hierarchical: refine voxels that have an excessive number of primitives overlapping them to instead hold a finer sub-grid to store their geometry. (See for example Jevans and Wyvill's paper for a basic approach to this problem (Jevans and Wyvill 1989).)

4.5 Develop a more complex hierarchical grid implementation, following the approach of either Cazals et al. (Cazals, Drettakis, and Puech 1995) or Klimaszewski and Sederberg (Klimaszewski and Sederberg 1997). How does it compare to hierarchical grids based on Jevans and Wyvill's approach?

4.6 Implement a primitive list "accelerator" that just stores an array that holds all of the primitives and loops over all of them for every intersection test. How much does using this accelerator make the system slow down? Is this accelerator ever faster than the GridAccel or KdTreeAccel? Describe a contrived example where the primitive list would be faster than a grid or kd-tree even for a complex scene.

4.7 Implement smarter overlap tests for building accelerators. Using objects' bounding boxes to determine which grid cells and which sides of a kd-tree split they overlap can hurt the performance of the accelerators and cause unnecessary intersection tests. (Recall Figure 4.5.) Add a bool Shape::Overlaps(const BBox &) const method to the shape interface that takes a world-space bounding box and determines if the shape truly overlaps the given bound. A default implementation could get the world bound from the shape and use that for the test, and specialized versions could be written for frequently used shapes. Implement this method for Spheres and Triangles and modify the accelerators to call it. Measure the change in lrt's performance.

4.8 Fix the KdTreeAccel so that it doesn't always immediately refine all primitives before building the tree. For example, one approach is to build additional kd-trees as needed, storing these sub-trees in the hierarchy where the original unrefined primitive was. Implement this approach, or come up with a better technique to address this problem, and measure the change in running time and memory use for a variety of scenes.

4.9 Investigate alternative cost functions for building kd-trees for the KdTreeAccel. How much can a poor cost function hurt its performance? How much improvement can be had compared to the current one?

4.10 Geometry caching: modify lrt to hold only a limited amount of geometry in memory, discarding geometry as needed and calling Primitive::Refine() later if it is needed again, for example with an LRU replacement scheme. Measure the effect on running time and memory use.

4.11 The grid and kd-tree accelerators both take a 3D region of space, subdivide it into cells, and record which primitives overlap each cell.
Hierarchical bounding volumes (HBVs) approach the problem in a different way, starting with all of the primitives and progressively partitioning them into smaller, spatially nearby subsets. This process gives a hierarchy of primitives. The top node of the hierarchy holds a bound that encompasses all of the primitives in the scene (see Figure 4.16). It has two or more children nodes, each of which bounds a subset of the scene. This continues recursively until the bottom of the tree, at which point the bound around a single primitive is stored. Read Goldsmith and Salmon's paper about building HBV hierarchies and implement their approach as an accelerator in lrt. Compare its performance against the grid and kd-tree accelerators.

Figure 4.16: A hierarchy of bounding volumes. (This figure is missing from the draft.)

4.12 Meta-hierarchies: the idea of using spatial data structures can be generalized to include spatial data structures that themselves hold other spatial data structures, rather than just primitives. Not only could we have a grid that has sub-grids inside the grid cells that have many primitives in them (thus partially solving the adaptive refinement problem), but we could also have the scene organized into an HBV where the leaf nodes are grids that hold smaller collections of spatially nearby primitives. Such hybrid techniques can bring together the best of a variety of spatial data structure-based ray intersection acceleration methods. In lrt, because both geometric primitives and intersection accelerators inherit from the Primitive base class and thus provide the same interface, it's easy to mix and match in this way.

4.13 Disable the mailbox test in the grid or kd-tree accelerator and measure how much lrt slows down when rendering various scenes. How effective is mailboxing? How many redundant intersection tests are performed without it? One alternative to mailboxing is to update the ray's [tmin, tmax] range for the accelerator cell that it is currently in, so that the primitives will ignore intersections outside that range and may be able to avoid performing a complete intersection test if no intersection is possible in the current range. How does the performance of that approach compare to mailboxing?

4.14 There is a subtle bug in the mailboxing schemes for both the grid and the kd-tree that may cause intersections to be missed after a few billion rays have been traced. Describe a scenario where this might happen and suggest how this bug could be fixed. How likely is this bug to cause an incorrect result to be returned by the accelerator?

5 Color and Radiometry

In order to describe how light is represented and sampled to compute images, we will first establish some background in radiometry. Radiometry is the study of the propagation of electromagnetic radiation in environments. The wavelengths of electromagnetic radiation between (approximately) 370 nm and 730 nm account for light visible to the human visual system and are of particular interest in rendering. The lower wavelengths (λ ≈ 400 nm) are the blue-ish colors, the middle wavelengths (λ ≈ 550 nm) are the greens, and the upper wavelengths (λ ≈ 650 nm) are the reds.

We will introduce four key radiometric quantities (flux, intensity, irradiance, and radiance) that describe electromagnetic radiation. By evaluating the amount of radiation arriving on the camera's image plane, we can accurately model the process of image formation.
These radiometric quantities generally vary according to wavelength and are described by a spectral power distribution (SPD), which is a function of wavelength, λ. This chapter starts by describing the Spectrum class that lrt uses to represent SPDs. We will then introduce basic concepts of radiometry and some theory behind light scattering from surfaces. For now, we will ignore the effects of smoke, fog, and all other atmospheric phenomena and assume that the scene is a collection of surfaces in a vacuum. These restrictions will be relaxed in Chapter 12.

5.1 Spectral Representation

color.h* ≡
#include "lrt.h"
⟨Spectrum Declarations⟩

color.cpp* ≡
#include "color.h"
⟨Spectrum Method Definitions⟩

Figure 5.1: Spectral power distributions of a fluorescent light (top) and the reflectance of lemon skin (bottom). Wavelengths around 400 nm are blue-ish colors, greens and yellows are in the middle range of wavelengths, and reds have wavelengths around 700 nm. The fluorescent light's SPD is even spikier than shown here, where the SPDs have been binned into 10 nm ranges; it emits much of its illumination at single frequencies.

The SPDs of real-world objects can be quite complicated; Figure 5.1 shows a graph of the spectral distribution of emission from a fluorescent light and the spectral distribution of the reflectance of lemon skin. Given such functions, we would like a compact, efficient, and accurate way to represent them. A number of approaches have been developed that are based on finding good basis functions to represent SPDs. The idea behind basis functions is to map the infinite-dimensional space of possible SPD functions to a low-dimensional space of coefficients $c_i$. For example, a trivial basis function is the constant function $B(\lambda) = 1$. An arbitrary SPD would be represented by a single coefficient c equal to its average value, so that its basis function approximation would be $cB(\lambda) = c$. This is obviously a poor approximation, since it has no chance to account for the SPD's possible complexity.

It is often convenient to limit ourselves to linear basis functions. This means that the basis functions are predetermined functions of wavelength and aren't themselves parameterized. For example, if we were using Gaussians as basis functions and wanted to have a linear basis, we would need to set their respective widths and central wavelengths ahead of time. If we allowed the widths and center positions to vary based on the SPD we were trying to fit, we would be performing non-linear approximation. Though non-linear basis functions can naturally adapt to the complexity of SPDs, they tend to be less computationally efficient. Also, the theory of non-linear approximation is very difficult, and even an introduction would be beyond the scope of this book. Because it is not a primary goal of lrt to provide the most comprehensive spectral representations, we will only implement infrastructure for linear basis functions.
Given a set of linear basis functions $B_i$, coefficients $c_i$ for an SPD $S(\lambda)$ can be computed by

$$c_i = \int_\lambda B_i(\lambda)\, S(\lambda)\, d\lambda, \tag{5.1.1}$$

so that

$$S(\lambda) \approx \sum_i c_i B_i(\lambda).$$

Measured SPDs of real-world objects are often given in 10 nm increments; this corresponds to a step-function basis:

$$B_{ab}(\lambda) = \begin{cases} 1 & a \le \lambda < b \\ 0 & \text{otherwise.} \end{cases}$$

Another common basis function is the delta function that evaluates the SPD at single wavelengths. Others that have been investigated include polynomials and Gaussians.

Given an SPD and its associated set of linear basis function coefficients, a number of operations on the spectral distributions can be easily expressed directly in terms of the coefficients. For example, to compute the coefficients $c_i'$ for the SPD given by multiplying a scalar k with an SPD $S(\lambda)$, where the coefficients for $S(\lambda)$ are $c_i$, we have

$$c_i' = \int_\lambda B_i(\lambda)\, k S(\lambda)\, d\lambda = k \int_\lambda B_i(\lambda)\, S(\lambda)\, d\lambda = k c_i.$$

Such a multiplication might be used to adjust the brightness of a light source. Similarly, for two SPDs $S_1(\lambda)$ and $S_2(\lambda)$ represented by coefficients $c_{1,i}$ and $c_{2,i}$, respectively, the coefficients of the sum $S_1(\lambda) + S_2(\lambda)$ can be shown to be

$$c_i' = c_{1,i} + c_{2,i}.$$

Thus, by converting to a basis function representation, a number of otherwise potentially tricky operations with SPDs are made straightforward.

We will often need to multiply two SPDs together. For example, the product of the SPD of light arriving at a surface with the SPD of the surface's reflectance gives the SPD of light reflected from the surface. In general, the coefficients for the SPD representing the product of two SPDs don't work out so cleanly, even with linear basis functions:

$$c_i = \int_\lambda B_i(\lambda)\, S_1(\lambda)\, S_2(\lambda)\, d\lambda
     = \int_\lambda B_i(\lambda) \Big(\sum_j c_{1,j} B_j(\lambda)\Big) \Big(\sum_k c_{2,k} B_k(\lambda)\Big)\, d\lambda
     = \sum_j \sum_k c_{1,j}\, c_{2,k} \int_\lambda B_i(\lambda)\, B_j(\lambda)\, B_k(\lambda)\, d\lambda.$$

The integrals of the product of the three basis functions can be precomputed and stored in n matrices of size $n^2$ each, where n is the number of basis functions. Thus, $n^3$ multiplications are necessary to compute the new coefficients. Alternatively, if one of the SPDs is known ahead of time (e.g., a surface's reflectance), we can precompute a matrix S defined so that its (i, j) element is

$$S_{ij} = \int_\lambda S_1(\lambda)\, B_i(\lambda)\, B_j(\lambda)\, d\lambda.$$

Then, multiplication with another SPD is just a matrix-vector multiply with S and the vector of coefficients $c_{2,i}$, requiring $n^2$ multiplications.

In lrt, we will choose computational efficiency over generality and further limit the supported basis functions to be orthonormal. This means that for $i \ne j$,

$$\int_\lambda B_i(\lambda)\, B_j(\lambda)\, d\lambda = 0,$$

and

$$\int_\lambda B_i(\lambda)\, B_i(\lambda)\, d\lambda = 1.$$

Under these assumptions, the coefficients for the product of two SPDs are computed as just the products of their coefficients,

$$c_i = c_{1,i}\, c_{2,i},$$

requiring only n multiplications. Note, though, that the coefficients computed this way for the product of two SPDs will not in general have exactly the same values as the coefficients found by projecting the true product into the basis; this component-wise product is an approximation.

Other than requiring that the basis functions used be linear and orthonormal, lrt places no further restriction on them. In fact, lrt operates purely on basis function coefficients: colors are specified in input files and texture maps as coefficients, and lrt can write out images of coefficients; almost no knowledge of the particular basis functions being used is needed.
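As a concrete illustration of Equation (5.1.1), the following standalone sketch (not part of lrt) computes the coefficient for one box of a step-function basis by numerically integrating a tabulated SPD; the SPD table and bin boundaries here are hypothetical.

    #include <cstdio>

    // Hypothetical SPD, tabulated in 10nm steps from 400nm to 700nm.
    const int nSamples = 31;
    float spd[nSamples];

    // Equation (5.1.1) for the (unnormalized) step basis that is 1 on
    // [a, b) and 0 elsewhere: the coefficient is just the integral of
    // S over [a, b), estimated here with the rectangle rule.
    float StepBasisCoefficient(float a, float b) {
        float integral = 0.f;
        for (int i = 0; i < nSamples; ++i) {
            float lambda = 400.f + 10.f * i;
            if (lambda >= a && lambda < b)
                integral += spd[i] * 10.f;
        }
        return integral;
    }

    int main() {
        for (int i = 0; i < nSamples; ++i)
            spd[i] = 1.f;  // a flat spectrum
        // Integrates S over [500, 600); prints 100 for the flat SPD.
        printf("%f\n", StepBasisCoefficient(500.f, 600.f));
        return 0;
    }

Scaling each such basis function by $1/\sqrt{b-a}$ would make this basis orthonormal in the sense required above, since the boxes have disjoint support.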
5.1.1 Spectrum Class

The Spectrum class holds a compile-time fixed number of basis function coefficients, given by COLOR_SAMPLES.

⟨Global Constants⟩ +≡
#define COLOR_SAMPLES 3

⟨Spectrum Declarations⟩ ≡
class Spectrum {
public:
    ⟨Spectrum Public Methods⟩
    ⟨Spectrum Public Data⟩
private:
    ⟨Spectrum Private Data⟩
};

⟨Spectrum Private Data⟩ ≡
Float c[COLOR_SAMPLES];

Two Spectrum constructors are provided, one initializing a spectrum with the same value for all coefficients, and one initializing it from an array of coefficients.

⟨Spectrum Public Methods⟩ ≡
Spectrum(Float intens = 0.) {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        c[i] = intens;
}

⟨Spectrum Public Methods⟩ +≡
Spectrum(Float cs[COLOR_SAMPLES]) {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        c[i] = cs[i];
}

A variety of arithmetic operations on Spectrum objects are supported; the implementations are all quite straightforward. First are operations to add pairs of spectral distributions.

⟨Spectrum Public Methods⟩ +≡
Spectrum &operator+=(const Spectrum &s2) {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        c[i] += s2.c[i];
    return *this;
}

⟨Spectrum Public Methods⟩ +≡
Spectrum operator+(const Spectrum &s2) const {
    Spectrum ret = *this;
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        ret.c[i] += s2.c[i];
    return ret;
}

Subtraction, multiplication, and division of spectra are defined similarly, component-wise, as are multiplication and division by scalar values; we won't include the code for those cases, since there's little additional value in seeing it all.

While the AddWeighted() method is redundant given the operators defined so far, in performance-critical sections of code where one would like to update a Spectrum with a weighted value of another Spectrum (s += w * s2), it can do the same computation more efficiently. Many compilers are not able to optimize the computation as well if it's written using the operators above, since they lead to the creation of a temporary Spectrum to hold the product, which is then assigned to the result.

⟨Spectrum Public Methods⟩ +≡
void AddWeighted(Float w, const Spectrum &s) {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        c[i] += w * s.c[i];
}

We also provide the obvious equality test.

⟨Spectrum Public Methods⟩ +≡
bool operator==(const Spectrum &sp) const {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        if (c[i] != sp.c[i]) return false;
    return true;
}

We frequently want to know if a spectrum is "black". If, for example, a surface has zero reflectance, we can avoid casting reflection rays that will eventually be multiplied by zeroes.

⟨Spectrum Public Methods⟩ +≡
bool Black() const {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        if (c[i] != 0.) return false;
    return true;
}

Also useful are functions that take the square root of a spectrum or raise the components of a Spectrum to a given power (note that the power is itself given as a Spectrum, to allow component-wise powers). Because the product of two spectra is computed with products of their coefficients, taking the square root of the coefficients gives the square root of the SPD. The square root of a spectrum is used to approximate Fresnel phenomena in Chapter 9.

⟨Spectrum Public Methods⟩ +≡
Spectrum Sqrt() const {
    Spectrum ret;
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        ret.c[i] = sqrtf(c[i]);
    return ret;
}

⟨Spectrum Public Methods⟩ +≡
Spectrum Pow(const Spectrum &e) const {
    Spectrum ret;
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        ret.c[i] = c[i] > 0 ? powf(c[i], e.c[i]) : 0.f;
    return ret;
}
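A brief usage sketch of the methods so far (the coefficient values are arbitrary, and COLOR_SAMPLES is assumed to be three, as above):

    Float cs[COLOR_SAMPLES] = { 0.25f, 0.5f, 1.f };
    Spectrum a(cs);             // per-coefficient initialization
    Spectrum b(0.5f);           // all coefficients set to 0.5
    Spectrum sum = a + b;       // component-wise addition
    sum.AddWeighted(0.25f, a);  // sum += 0.25 * a, with no temporary
    if (!sum.Black())
        sum = sum.Sqrt();       // component-wise square root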
For volume rendering, negation and an exponential function are also provided.

⟨Spectrum Public Methods⟩ +≡
Spectrum operator-() const;
friend Spectrum Exp(const Spectrum &s);

Some portions of the image-processing pipeline will want to clamp a spectrum to ensure that its coefficients are within some allowable range.

⟨Spectrum Public Methods⟩ +≡
Spectrum Clamp(Float low, Float high) const {
    Spectrum ret;
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        ret.c[i] = ::Clamp(c[i], low, high);
    return ret;
}

Finally, we provide a useful debugging routine to check if any of the coefficients of an SPD is NaN. This frequently happens when code accidentally divides by zero.

⟨Spectrum Public Methods⟩ +≡
bool IsNaN() const {
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        if (isnan(c[i])) return true;
    return false;
}

5.1.2 XYZ Color

A remarkable property of the human visual system makes it possible to represent colors with just three floating-point numbers. The tristimulus theory of color perception says that all visible SPDs can be accurately represented for human observers with three values, $x_\lambda$, $y_\lambda$, and $z_\lambda$. Given an SPD $S(\lambda)$, these values are computed by convolving it with the spectral matching curves $X(\lambda)$, $Y(\lambda)$, and $Z(\lambda)$:

$$x_\lambda = \int_\lambda S(\lambda)\, X(\lambda)\, d\lambda$$
$$y_\lambda = \int_\lambda S(\lambda)\, Y(\lambda)\, d\lambda$$
$$z_\lambda = \int_\lambda S(\lambda)\, Z(\lambda)\, d\lambda.$$

These curves were determined by the Commission Internationale de l'Éclairage (CIE) standards body after a series of experiments with human test subjects, and are graphed in Figure 8.5. It is believed that these matching curves are generally similar to the responses of the three types of color-sensitive cones in the human retina. Remarkably, SPDs with substantially different distributions may have very similar $x_\lambda$, $y_\lambda$, and $z_\lambda$ values. To the human observer, such SPDs actually appear the same visually. Pairs of such spectra are called metamers.

This brings us to a subtle point about color spaces and spectral power distributions. Most color spaces attempt to model colors that are visible to humans and therefore use only three coefficients, exploiting the tristimulus theory of color perception. Although XYZ works well to represent a given SPD to be displayed for a human observer, it is not a particularly good set of basis functions for spectral computation. For example, though XYZ values would work well to describe the perceived color of lemon skin or a fluorescent light individually (recall Figure 5.1, which graphs these two SPDs), the product of their respective XYZ values is likely to give a noticeably different XYZ color than the XYZ value computed by multiplying more accurate representations of their SPDs and then computing the XYZ value.

With that in mind, we will add a method to the Spectrum class that returns the XYZ values for its SPD. It turns out that when converting a spectrum described by basis function coefficients in one basis to another basis, the new basis function coefficients can be written as weighted sums of the old basis function coefficients. For example, for $x_\lambda$,

$$x_\lambda = \int_\lambda S(\lambda)\, X(\lambda)\, d\lambda
           = \int_\lambda \Big(\sum_i c_i B_i(\lambda)\Big) X(\lambda)\, d\lambda
           = \sum_i c_i \int_\lambda B_i(\lambda)\, X(\lambda)\, d\lambda
           = \sum_i c_i w_{x,i}.$$

Thus, the weight values $w_{x,i}$, $w_{y,i}$, and $w_{z,i}$ can be precomputed and stored in an array for whatever particular basis functions are being used. The Spectrum::XYZ() method uses these arrays to return the spectrum's XYZ representation.
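For example, with the HDTV RGB weights given below, the spectrum with coefficients (1, 1, 1) converts to xyz ≈ (0.9505, 1.0000, 1.0888); each XYZ component is just the sum of the corresponding row of weights. This is the XYZ representation of the D65 white point.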
⟨Spectrum Public Methods⟩ +≡
void XYZ(Float xyz[3]) const {
    xyz[0] = xyz[1] = xyz[2] = 0.;
    for (int i = 0; i < COLOR_SAMPLES; ++i) {
        xyz[0] += XWeight[i] * c[i];
        xyz[1] += YWeight[i] * c[i];
        xyz[2] += ZWeight[i] * c[i];
    }
}

Therefore, we now finally need to settle on the default set of SPD basis functions for lrt. Though not sufficient for high-quality spectral computations, an expedient choice is to use the spectra of standard red, green, and blue phosphors for televisions and CRT display tubes. A standard set of these RGB spectra has been defined for high-definition television; the weights to convert from these RGBs to XYZ values are below. (Note that this default is closely tied to the choice of COLOR_SAMPLES == 3; a more general spectral renderer would need to convert arbitrary input color data, whether RGB, XYZ, or sampled spectra, into its own internal representation.)

⟨Spectrum Method Definitions⟩ +≡
Float Spectrum::XWeight[COLOR_SAMPLES] = {
    0.412453f, 0.357580f, 0.180423f
};
Float Spectrum::YWeight[COLOR_SAMPLES] = {
    0.212671f, 0.715160f, 0.072169f
};
Float Spectrum::ZWeight[COLOR_SAMPLES] = {
    0.019334f, 0.119193f, 0.950227f
};

For convenience in computing values of XWeight, YWeight, and ZWeight for other spectral basis functions, we will also provide the values of the standard $X(\lambda)$, $Y(\lambda)$, and $Z(\lambda)$ response curves sampled at 1 nm increments from 360 nm to 830 nm.

⟨Spectrum Public Data⟩ ≡
static const int CIEstart = 360;
static const int CIEend = 830;
static const int nCIE = CIEend-CIEstart+1;
static const Float CIE_X[nCIE];
static const Float CIE_Y[nCIE];
static const Float CIE_Z[nCIE];

The y coordinate of the XYZ color is closely related to luminance, which measures the perceived brightness of a color. (Luminance is discussed in more detail in Section 8.3.1.) For the convenience of the methods there, we provide a separate utility method to compute it alone.

⟨Spectrum Public Methods⟩ +≡
Float y() const {
    Float v = 0.;
    for (int i = 0; i < COLOR_SAMPLES; ++i)
        v += YWeight[i] * c[i];
    return v;
}

The y coordinate also gives a convenient way to order Spectrum instances from dark to bright.

⟨Spectrum Public Methods⟩ +≡
bool operator<(const Spectrum &s2) const {
    return y() < s2.y();
}

5.2 Basic Radiometry

Radiometry gives us a set of ideas and mathematical tools to describe light propagation and reflection in environments; it forms the basis of the derivation of the rendering algorithms that will be used throughout the rest of this book. Interestingly enough, radiometry wasn't originally derived from first principles using the basic physics of light, but was instead based on an abstraction of light in terms of particle
It is not uncommon to in- corporate results from wave optics models, but these results need to be expressed in the language of radiative transfer’s basic abstractions. 1 In this manner, it is possible to describe interactions of light with objects whose size is close to the wavelength of the light and thereby model effects like dispersion and interference. At an even ﬁner level of detail, quantum mechanics is needed to describe light’s interaction with atoms. Fortunately, direct simulation of quantum mechanical prin- ciples is unnecessary for solving rendering problems in computer graphics, so the intractability of such an approach is avoided. In lrt, we will assume that geometric optics is an adequate model for the de- scription of light and light scattering. This leads to a few assumptions about the behavior of light: Linearity: the combined effect of two inputs to an optical system is always equal to the sum of the effects of each of the inputs individually. Energy conservation: more energy is never produced by a scattering event than there was to start with. No polarization: we will ignore polarization of the electromagnetic ﬁeld; as such, the only relevant property of light particles is their wavelength (or frequency). While the radiative transfer framework has been extended to include the effects of polarization, we will ignore this effect for simplicity. No ﬂuorescence or phosphorescence: the behavior of light at one wavelength is completely independent of light’s behavior at other wavelengths. As with polarization, it is not too difﬁcult to include these effects, but they would add little practical value to our system. Steady state: light in the environment is assumed to have reached equlibrium, so its radiance distribution isn’t changing over time. This happens nearly instantaneously with light in realistic scenes. The most signiﬁcant loss from assuming geometric optics is that diffraction and interference effects cannot easily be accounted for. As noted by Preisendorfer, this is hard to ﬁx given these assumptions because, for example, the total ﬂux over two areas isn’t necessarily equal to sum of ﬂux over each individually (Preisendorfer 1965, p. 24). 1 Preisendorfer has connected radiative transfer theory to Maxwell’s classical equations describ- ing electromagnetic ﬁelds (Preisendorfer 1965, Chapter 14); his framework both demonstrates their equivalence and makes it easier to apply results from one world-view to the other. More recent work was done in this area by Fante (Fante 1981). Sec. 5.2] Basic Radiometry 187 Figure 5.2: Radiant ﬂux, Φ, measures energy passing through a surface or region of space. Here, ﬂux from a point light source is being measured at a sphere that surrounds the light. 5.2.1 Basic quantities There are four radiometric quantities that are central to rendering: Flux Irradiance Intensity Radiance All of these quantities are generally functions of wavelength. For the remainder of this chapter, we will not make this dependence explicit, but it is important to keep in mind. Radiant ﬂux, also known as power, is the total amount of energy passing through a surface or region of space per unit time. Its units are J (more commonly “Watts”) s and it is normally signiﬁed by the symbol Φ. Total emission from light sources is generally described in terms of ﬂux; Figure 5.2 shows ﬂux from a point light mea- sured by the total amount of energy passing through the imaginary sphere around the light. 
Note that the amount of flux measured on either of the two spheres in Figure 5.2 is the same: although less energy is passing through any local part of the large sphere than the small sphere, the greater area of the large sphere accounts for this.

Irradiance (E) is the area density of flux, measured in W/m². For the point light example in Figure 5.2, irradiance on the outer sphere is less than the irradiance on the inner sphere, since the area of the outer sphere is larger. In particular, for a sphere in this configuration that has radius r,

$$E = \frac{\Phi}{4\pi r^2}.$$

This explains why received energy from a light falls off with the squared distance from the light.

Figure 5.3: Irradiance (E) arriving at a surface varies according to the cosine of the angle of incidence of illumination, since illumination is spread over a larger area at lower incident directions. This effect was first described by Lambert; it is known as Lambert's Law.

The irradiance equation can also help us understand the origin of Lambert's Law, which says that the amount of light arriving at a surface is related to the cosine of the angle between the light direction and the surface normal; see Figure 5.3. Consider a light source with area A and flux Φ that is shining on a surface. If the light is shining directly down on the surface (left side of the figure), then the area on the surface receiving light, A₁, is equal to A, and the irradiance at any point inside A₁ is

$$E_1 = \frac{\Phi}{A}.$$

However, if the light is at an angle to the surface (right side), the total area on the surface receiving light is larger. If the area of the light source is small, then the area receiving flux, A₂, is roughly A/cos θ. For points inside A₂, the irradiance is therefore

$$E_2 = \frac{\Phi \cos\theta}{A}.$$

This is the origin of Lambert's cosine law. More formally, to cover cases where the emitted flux distribution isn't constant, irradiance at a point is actually defined as

$$E = \frac{d\Phi}{dA}, \tag{5.2.2}$$

where the differential flux from the light is computed over a differential area receiving flux.
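Pulling the last two results together, here is a small self-contained sketch (not lrt code) that computes the irradiance at a surface point due to an isotropic point light emitting total flux Phi; it combines the 1/(4πr²) falloff with Lambert's cosine law.

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3 Sub(const Vec3 &a, const Vec3 &b) {
        return { a.x - b.x, a.y - b.y, a.z - b.z };
    }
    static float Dot(const Vec3 &a, const Vec3 &b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // E = Phi * cos(theta) / (4 * pi * r^2), where theta is the angle
    // between the (unit) surface normal n and the direction to the light.
    float PointLightIrradiance(float Phi, const Vec3 &p,
            const Vec3 &lightPos, const Vec3 &n) {
        const float Pi = 3.14159265f;
        Vec3 d = Sub(lightPos, p);           // vector from p to the light
        float r2 = Dot(d, d);                // squared distance
        float cosTheta = Dot(n, d) / std::sqrt(r2);
        if (cosTheta < 0.f) cosTheta = 0.f;  // light below the horizon
        return Phi * cosTheta / (4.f * Pi * r2);
    }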
In order to define the radiometric quantity intensity, we first need to define the notion of the solid angle. Solid angles are just the extension of two-dimensional angles in a plane to angles on a sphere. The plane angle is the total angle subtended by some object with respect to some position; see Figure 5.4. Consider the unit circle around the point p; if we project the shaded object onto that circle, some length of the circle s will be covered by its projection. The arc length of s (which is the same as the angle θ) is the angle subtended by the object. Plane angles are measured in radians.

Figure 5.4: The plane angle of an object c as seen from a point p is equal to the angle it subtends as seen from p, or equivalently the length of the arc s on the unit circle.

The solid angle extends the 2D unit circle to a 3D unit sphere (Figure 5.5). The total area s is the solid angle subtended by the object. Solid angles are measured in steradians. The entire sphere subtends a solid angle of 4π, and a hemisphere subtends 2π.

Figure 5.5: The solid angle s subtended by an object c in three dimensions is computed similarly, by projecting c onto the unit sphere and measuring its area there.

We will use the symbol ω to describe directions on the unit sphere centered around some point p. (These directions can also be thought of as points on the unit sphere around p; we will therefore use the convention that ω is always a normalized vector.)

We can now define intensity, which is flux density per solid angle:

$$I = \frac{d\Phi}{d\omega}. \tag{5.2.3}$$

Intensity is generally only used when describing the distribution of light by direction from point light sources.

Finally, radiance (L) is the flux density per unit area, per unit solid angle. In terms of flux, it is

$$L = \frac{d^2\Phi}{d\omega\, dA^{\perp}}, \tag{5.2.4}$$

where $dA^{\perp}$ is the projected area of dA on a hypothetical surface perpendicular to ω; see Figure 5.6. All those differential terms don't need to be as confusing as they initially appear: just think of radiance as the limit of the measurement of incident light at the surface as a small cone of incident directions of interest dω becomes very small, and as the local area of interest on the surface dA also becomes very small.

Figure 5.6: Radiance L is defined at a point by the ratio of the differential flux incident along a direction ω to the differential solid angle dω times the differential projected area of the receiving point.

Figure 5.7: Irradiance at a point p is given by the integral of radiance times the cosine of the incident direction over the entire upper hemisphere above the point.

Now that we have defined these various units, it's easy to derive relations between them. For instance, the irradiance at a point p due to radiance over a set of directions Ω is

$$E(p) = \int_\Omega L(p, \omega) \cos\theta\, d\omega, \tag{5.2.5}$$

where $L(p, \omega)$ denotes the arriving radiance at position p as seen along direction ω (see Figure 5.7). (The cos θ term in this integral is due to the $dA^{\perp}$ term in the definition of radiance.) We are often interested in irradiance over the hemisphere of directions about a given surface normal n, $H^2(n)$, or over the entire sphere of directions, $S^2$.

5.3 Working with Radiometric Integrals

One of the main tasks in rendering is integrating information about the values of particular radiometric quantities to compute information about other radiometric quantities. There are a few important tricks that can be used to make this task easier.

5.3.1 Integrals over projected solid angle

The various cosine terms in integrals for radiometric quantities can clutter things up and distract from what is being expressed in the integral. There is a different way that the integrals can be written that removes this distraction. The projected solid angle subtended by an object is determined by projecting the object onto the unit sphere, as is done for the solid angle, but then projecting the resulting shape down onto the unit disk; see Figure 5.8. Integrals over hemispheres of directions with respect to solid angle can equivalently be written as integrals over projected solid angle. The projected solid angle measure is related to the solid angle measure by

$$d\omega^{\perp} = \cos\theta\, d\omega,$$

so the irradiance-from-radiance integral can be written more simply as

$$E(p, n) = \int_{H^2(n)} L(\omega)\, d\omega^{\perp}.$$

Figure 5.8: The projected solid angle subtended by an object c is the cosine-weighted solid angle that it subtends. It can be computed by finding the object's solid angle s, projecting it down to the plane, and measuring its area there. Thus, the projected solid angle depends on the surface normal where it is being measured, since the normal orients the plane of projection.

For the rest of this book, we will write integrals over directions in terms of solid angle, rather than projected solid angle. When reading rendering integrals in other contexts, however, be sure to be aware of the measure of the domain of integration.
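One immediate payoff of this form: the projected solid angle of the entire hemisphere is the area of the unit disk, π, so if the radiance L is constant over all incident directions, the irradiance is simply E = πL. The same result is derived again with spherical coordinates in the next section.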
Figure 5.9: A given direction vector can be written in terms of spherical coordinates $(\theta, \phi)$ if the x, y, and z basis vectors are given as well. The spherical angle formulae make it easy to convert between the two representations.

5.3.2 Integrals over spherical coordinates

It is often convenient to transform integrals over solid angle into integrals over spherical coordinates $(\theta, \phi)$. Recall that an $(x, y, z)$ direction vector can alternatively be written in terms of spherical angles (see Figure 5.9):
$$x = \sin\theta\cos\phi \qquad y = \sin\theta\sin\phi \qquad z = \cos\theta.$$
For convenience, we'll define two functions that turn $\theta$ and $\phi$ values into $(x, y, z)$ direction vectors. The first applies the equations above directly. Notice that these functions are passed the sine and cosine of $\theta$, but the angle $\phi$ itself. This is because the sine and cosine of $\theta$ are frequently available directly to the calling function (through a vector dot product, for example).

⟨Geometry Inline Functions⟩ +≡
  inline Vector SphericalDirection(Float sintheta, Float costheta, Float phi) {
      return Vector(sintheta * cosf(phi),
                    sintheta * sinf(phi),
                    costheta);
  }

The second function takes three basis vectors to replace the x, y, and z axes and returns the appropriate direction vector with respect to the coordinate frame that they define.

⟨Geometry Inline Functions⟩ +≡
  inline Vector SphericalDirection(Float sintheta, Float costheta, Float phi,
          const Vector &x, const Vector &y, const Vector &z) {
      return sintheta * cosf(phi) * x + sintheta * sinf(phi) * y +
          costheta * z;
  }

The spherical angles for a direction can be found by inverting these equations:
$$\theta = \arccos z \qquad \phi = \arctan\frac{y}{x}.$$
The corresponding functions are below. Note that SphericalTheta() assumes that the vector v has been normalized before being passed in, and that SphericalPhi() remaps the result of atan2f(), which lies in $[-\pi, \pi]$, into the range $[0, 2\pi)$.

⟨Geometry Inline Functions⟩ +≡
  inline Float SphericalTheta(const Vector &v) {
      return acosf(v.z);
  }

⟨Geometry Inline Functions⟩ +≡
  inline Float SphericalPhi(const Vector &v) {
      Float p = atan2f(v.y, v.x);
      return (p < 0.f) ? p + 2.f * M_PI : p;
  }

In order to write an integral over solid angle in terms of an integral over $(\theta, \phi)$, we need to be able to express the relationship between the differential area of a set of directions $d\omega$ and the differential area of a $(\theta, \phi)$ pair; see Figure 5.10. The differential area $d\omega$ is the product of the differential lengths of its sides, $\sin\theta \, d\phi$ and $d\theta$. Therefore,
$$d\omega = \sin\theta \, d\theta \, d\phi.$$
This relationship is the key to converting between integrals over solid angles and integrals over spherical angles.

Figure 5.10: The differential area $d\omega$ subtended by a differential solid angle is the product of the differential lengths of its two edges, $\sin\theta \, d\phi$ and $d\theta$.

We can thus see that the irradiance integral over the hemisphere (Equation 5.2.5 with $\Omega = H^2(\mathbf{n})$) can equivalently be written
$$E(p, \mathbf{n}) = \int_0^{2\pi} \int_0^{\pi/2} L(p, \theta, \phi) \cos\theta \sin\theta \, d\theta \, d\phi.$$
So if the radiance is the same from all directions, this simplifies to $E = \pi L$.

Just as we found irradiance in terms of incident radiance, we can also compute the total flux emitted from some object over the hemisphere about the normal by integrating over the object's surface area A:
$$\Phi = \int_A \int_{H^2(\mathbf{n})} L(p, \omega) \cos\theta \, d\omega \, dA.$$
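As a quick check of these conversions, here is a standalone sketch (plain double-precision C++ rather than lrt's Float and Vector; the Vec3 struct is a hypothetical stand-in) that verifies that a unit direction survives a round trip through SphericalTheta()/SphericalPhi() and SphericalDirection():

  #include <cmath>
  #include <cstdio>

  struct Vec3 { double x, y, z; };

  Vec3 SphericalDirection(double sinTheta, double cosTheta, double phi) {
      return { sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta };
  }
  double SphericalTheta(const Vec3 &v) { return std::acos(v.z); }  // v must be normalized
  double SphericalPhi(const Vec3 &v) {
      double p = std::atan2(v.y, v.x);
      return (p < 0.) ? p + 2. * M_PI : p;     // remap to [0, 2 pi)
  }

  int main() {
      Vec3 v = { 0.48, -0.60, 0.64 };          // already unit length
      double theta = SphericalTheta(v), phi = SphericalPhi(v);
      Vec3 w = SphericalDirection(std::sin(theta), std::cos(theta), phi);
      std::printf("(%f, %f, %f) -> theta=%f phi=%f -> (%f, %f, %f)\n",
                  v.x, v.y, v.z, theta, phi, w.x, w.y, w.z);
  }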
5.3.3 Integrals over area

One last transformation of integrals that can simplify computation is to turn integrals over directions into integrals over area. Consider the irradiance integral again (Equation 5.2.5), and suppose there is a quadrilateral with constant outgoing radiance; we'd like to compute the resulting irradiance at a point p. The easiest way to write this integral is over the area of the quadrilateral; writing it as an integral over directions is less straightforward, since given a particular direction, the computation to determine whether the quadrilateral is visible in that direction is non-trivial.

Differential area is related to differential solid angle by
$$d\omega = \frac{dA \cos\theta}{r^2}, \qquad (5.3.6)$$
where $\theta$ is the angle between the surface normal of $dA$ and the vector to the point p, and $r$ is the distance from p to $dA$ (see Figure 5.11). We will not derive this result here, but it can be understood intuitively: if $dA$ is at distance 1 from p and is aligned exactly so that it is facing down $d\omega$, then $d\omega = dA$, $\theta = 0$, and Equation 5.3.6 holds. As $dA$ moves farther away from p, or as it rotates so that it's not aligned with the direction of $d\omega$, the $r^2$ and $\cos\theta$ terms compensate accordingly to reduce $d\omega$.

Therefore, we can write the irradiance integral for the quadrilateral source as
$$E(p) = \int_A L \cos\theta_i \, \frac{\cos\theta_o \, dA}{r^2},$$
where $\theta_i$ is the angle between the surface normal at p and the direction from p to the point p′ on the light, and $\theta_o$ is the angle between the surface normal at p′ on the light and the direction from p′ to p (see Figure 5.12).

Figure 5.11: The differential solid angle subtended by a differential area $dA$ is equal to $dA \cos\theta / r^2$, where $\theta$ is the angle between $dA$'s surface normal and the vector to the point p, and $r$ is the distance from p to $dA$.

Figure 5.12: To compute irradiance at a point p from a quadrilateral source, it's easier to integrate over the surface area of the source than over the irregular set of directions that it subtends. The relationship between solid angles and areas given by Equation 5.3.6 lets us go back and forth between the two approaches.
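To see Equation 5.3.6 at work, the following standalone sketch (our own; the function and type names are hypothetical, not lrt's) evaluates the quadrilateral-source irradiance integral above with a midpoint-rule Riemann sum, for a square emitter of constant radiance parallel to the receiving surface:

  #include <cmath>
  #include <cstdio>

  // Irradiance at the origin (normal +z) from a square emitter of constant
  // outgoing radiance L in the plane z = h, facing back down at the receiver.
  // Evaluates E = Integral_A L cos(theta_i) cos(theta_o) / r^2 dA by a
  // midpoint-rule Riemann sum over an n x n grid of emitter patches.
  double QuadIrradiance(double L, double h, double side, int n) {
      double dA = (side * side) / (n * (double)n);
      double E = 0.;
      for (int i = 0; i < n; ++i) {
          for (int j = 0; j < n; ++j) {
              double x = -side/2. + (i + 0.5) * side / n;
              double y = -side/2. + (j + 0.5) * side / n;
              double r2 = x*x + y*y + h*h;
              // For this geometry cos(theta_i) = cos(theta_o) = h / r.
              double cosTheta = h / std::sqrt(r2);
              E += L * cosTheta * cosTheta / r2 * dA;
          }
      }
      return E;
  }

  int main() {
      // Unit square, 1 unit above the receiving point, L = 10 W/(m^2 sr).
      std::printf("E = %f\n", QuadIrradiance(10., 1., 1., 512));
  }

Compare exercise 5.5 at the end of this chapter, which asks for exactly this computation.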
5.4 Surface Reflection

When light in an environment is incident on a surface, the surface scatters the light, re-reflecting some of it back into the environment. For example, the skin of a lemon mostly absorbs light in the blue wavelengths, but reflects most of the light in the red and green wavelengths (recall the lemon skin reflectance SPD in Figure 5.1). Therefore, when it is illuminated with white light, its color is yellow. The skin has pretty much the same color no matter what direction it's being observed from, although for some directions a highlight is visible, where it is more white than yellow. In contrast, the color seen in a mirror depends almost entirely on the viewing direction. At a fixed point on the mirror, as the viewing angle changes, the object that is reflected in the mirror changes accordingly. Furthermore, mirrors generally don't change the color of the object they are reflecting.

5.4.1 The BRDF

There are a few concepts in radiometry that give formalisms for describing these types of reflection. One of the most important is the bidirectional reflectance distribution function (BRDF). Consider the setting in Figure 5.13: we'd like to know how much radiance is leaving the surface in the direction $\omega_o$ toward the viewer, $L_o(p, \omega_o)$, as a result of incident radiance along the direction $\omega_i$, $L_i(p, \omega_i)$. The reader is warned not to be misled by diagrams like Figure 5.13, however. These kinds of diagrams frequently show the scattering situation from a side view, but we always need to be aware that the vectors $\omega_o$ and $\omega_i$ are not necessarily co-planar with the surface normal N.

Figure 5.13: The bidirectional reflectance distribution function (BRDF) is a four-dimensional function over pairs of directions $\omega_i$ and $\omega_o$ that describes how much incident light along $\omega_i$ is scattered from the surface in the direction $\omega_o$.

If the direction $\omega_i$ is considered as a differential cone of directions, we can compute the resulting differential irradiance at p by
$$dE(p, \omega_i) = L_i(p, \omega_i) \cos\theta_i \, d\omega_i. \qquad (5.4.7)$$
A differential amount of radiance will be reflected in the direction $\omega_o$. An important assumption made in radiometry is that the system is linear: doubling the amount of energy going into it will lead to a doubling of the amount going out of it. This is a reasonable assumption as long as energy levels are not extreme. Therefore, the reflected differential radiance is
$$dL_o(p, \omega_o) \propto dE(p, \omega_i).$$
The constant of proportionality for the particular pair of directions $\omega_i$ and $\omega_o$ is defined to be the surface's BRDF:
$$f_r(p, \omega_o, \omega_i) = \frac{dL_o(p, \omega_o)}{dE(p, \omega_i)} = \frac{dL_o(p, \omega_o)}{L_i(p, \omega_i) \cos\theta_i \, d\omega_i}. \qquad (5.4.8)$$
Physically based BRDFs have two important qualities:

1. Reciprocity: for all pairs of directions $\omega_i$ and $\omega_o$, $f_r(p, \omega_i, \omega_o) = f_r(p, \omega_o, \omega_i)$.

2. Energy conservation: the total energy of light reflected is less than or equal to the energy of incident light. For all directions $\omega_o$,
$$\int_{H^2(\mathbf{n})} f_r(p, \omega_o, \omega) \cos\theta \, d\omega \leq 1.$$

The surface's bidirectional transmittance distribution function (BTDF) can be defined in a similar manner to the BRDF. The BTDF is generally denoted by $f_t(p, \omega_o, \omega_i)$, where $\omega_i$ and $\omega_o$ are in opposite hemispheres around p. Interestingly, the BTDF does not obey reciprocity; we will discuss this in detail in Section 9.2.

For convenience in equations, we will denote the BRDF and BTDF considered together as $f(p, \omega_o, \omega_i)$; we will call this the bidirectional scattering distribution function (BSDF). Chapter 9 is entirely devoted to describing BSDFs that are used in graphics.

Using the definition of the BSDF, we have
$$dL_o(p, \omega_o) = f(p, \omega_o, \omega_i) \, L_i(p, \omega_i) \cos\theta_i \, d\omega_i.$$
We can integrate this over the sphere of incident directions around p to compute the outgoing radiance in direction $\omega_o$ due to the incident illumination at p:
$$L_o(p, \omega_o) = \int_{S^2} f(p, \omega_o, \omega_i) \, L_i(p, \omega_i) \, |\cos\theta_i| \, d\omega_i. \qquad (5.4.9)$$
(The absolute value signs appear here because, over the full sphere of directions, $\cos\theta_i$ is negative for directions in the hemisphere below the surface; the absolute value keeps the measure positive there.) This is a fundamental equation in rendering; it describes how an incident distribution of light at a point is transformed into an outgoing distribution, based on the scattering properties of the surface. It is often called the scattering equation when the sphere $S^2$ is the domain (as it is here), or the reflection equation when just the upper hemisphere $H^2(\mathbf{n})$ is being integrated over.
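The simplest BRDF satisfying both properties is the Lambertian model, the constant $f_r = R/\pi$ for reflectance $R \leq 1$; lrt's implementation appears in Chapter 9, but the energy conservation condition is easy to check numerically in a standalone sketch (our own, plain C++):

  #include <cmath>
  #include <cstdio>

  int main() {
      // A Lambertian surface scatters light equally in all directions; its
      // BRDF is the constant f_r = R / pi, where R <= 1 is the reflectance.
      double R = 0.75;
      double fr = R / M_PI;
      // Energy conservation requires
      //   Integral_{H^2(n)} f_r cos(theta) dOmega <= 1.
      // Evaluate the integral in spherical coordinates by a Riemann sum.
      const int nTheta = 256, nPhi = 512;
      double dTheta = (M_PI / 2.) / nTheta, dPhi = (2. * M_PI) / nPhi;
      double sum = 0.;
      for (int i = 0; i < nTheta; ++i) {
          double theta = (i + 0.5) * dTheta;   // midpoint rule
          sum += nPhi * fr * std::cos(theta) * std::sin(theta) * dTheta * dPhi;
      }
      std::printf("integral = %f (should equal R = %f)\n", sum, R);
      // Reciprocity holds trivially here: f_r is constant, so
      // f_r(p, w_i, w_o) == f_r(p, w_o, w_i) for any pair of directions.
  }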
Further Reading

Hall's book summarizes the state of the art in spectral representations through 1989 (Hall 1989), and Glassner's Principles of Digital Image Synthesis covers the topic through the mid-1990s (Glassner 1995). Meyer was one of the first researchers to closely investigate spectral representations in graphics. [TODO: add citation.] Later, Raso and Fournier proposed a polynomial representation for spectra (Raso and Fournier 1991).

Our discussion of SPD representation with basis functions is based on Peercy's 1993 SIGGRAPH paper (Peercy 1993). In that paper, Peercy chose particular basis functions in a scene-dependent manner: by looking at the SPDs of the lights and reflecting objects in the scene, a small number of basis functions that could accurately represent the scene's SPDs were found using characteristic vector analysis. Another approach to spectral representation was investigated by Sun et al., who partitioned SPDs into a smooth base SPD and a set of spikes (Sun, Fracchia, Drew, and Calvert 2001). Each part was represented differently, using basis functions that worked well for that particular type of function.

He and Stam have applied wave optics in graphics (He, Torrance, Sillion, and Greenberg 1991; Stam 1999). [TODO: also cite the appropriate parts of Preisendorfer and Chandrasekhar, and the non-linear approximation paper cited in Ren's paper.]

Arvo has investigated the connection between rendering algorithms in graphics and previous work in transport theory, which applies classical physics to particles and their interactions to predict their overall behavior (Arvo 1993; Arvo 1995). [TODO: where to get real-world SPD data.]

McCluney's book on radiometry (McCluney 1994) is an excellent introduction to the topic. Preisendorfer also covers radiometry in an accessible manner and delves into the relationship between radiometry and the physics of light (Preisendorfer 1965). Moon and Spencer's books (Moon and Spencer 1936; Moon and Spencer 1948) and Gershun's article (Gershun 1939) are classic early introductions to radiometry. Lambert's seminal early writings about photometry from the mid-18th century were recently translated by DiLaura (Lambert 2001).

Exercises

5.1 Experiment with different basis functions for spectral representation. How many coefficients are needed for accurate rendering of tricky situations like fluorescent lighting? How much does the particular choice of basis affect the number of coefficients needed?

5.2 Generalize the Spectrum class so that it's not limited to orthonormal basis functions. Implement Peercy's approach of choosing basis functions based on the main SPDs in the scene. Does the improvement in accuracy make up for the additional computational expense of computing the products of spectra?

5.3 Generalize the Spectrum class further to support non-linear basis functions. Compare the results to more straightforward spectral representations.

5.4 Compute the irradiance at a point due to a unit-radius disk h units directly above its normal with constant outgoing radiance of 10 W/(m² sr). Do the computation twice, once as an integral over solid angle and once as an integral over area. (Hint: if the results don't match and you write the integral over the disk's area as an integral over radius r and an integral over angle θ, see Section XXX in the Monte Carlo chapter for a hint about XXXXXX.)

5.5 Similarly, compute the irradiance at a point due to a square quadrilateral with outgoing radiance of 10 W/(m² sr) that has sides of length 1 and is 1 unit directly above it along its surface normal.

6 Camera Models

In addition to describing the objects that make up the scene, we also need to describe how the scene is viewed and how its three-dimensional representation is mapped to a two-dimensional image. This chapter describes the Camera class and its implementations, which generate primary rays to sample the scene and generate the image.
By generating these rays in different ways, lrt can create many types of images of the same 3D scene. We will show a few implementations of the Camera interface, each of which generates rays in a different way.

6.1 Camera Model

⟨camera.h*⟩ ≡
  #include "lrt.h"
  #include "color.h"
  #include "sampling.h"
  #include "geometry.h"
  #include "transform.h"
  ⟨Camera Declarations⟩

⟨camera.cpp*⟩ ≡
  #include "lrt.h"
  #include "camera.h"
  #include "film.h"
  #include "mc.h"
  ⟨Camera Method Definitions⟩

We will define an abstract Camera base class that holds generic camera options and defines the interface that all camera implementations must provide.

Figure 6.1: The camera's clipping planes give the range of space along the z axis that will be imaged; objects in front of the hither plane or beyond the yon plane will not be visible in the image. Setting the clipping planes to tightly encompass the objects in the scene is important for many scanline algorithms, but is less important for ray tracing.

⟨Camera Declarations⟩ ≡
  class Camera {
  public:
      ⟨Camera Interface⟩
      ⟨Camera Public Data⟩
  protected:
      ⟨Camera Protected Data⟩
  };

The main method that camera subclasses need to implement is Camera::GenerateRay(), which generates a ray for a given image sample. It is important that the camera normalize the direction component of the returned ray; many other parts of the system will depend on this behavior. This method also returns a floating-point value that gives a weight for the effect that light arriving at the film plane along the generated ray will have on the final image. Most cameras will always set this to one, although cameras that simulate real physical lens systems might need to set this value based on the optics and geometry of the lens system being simulated.

⟨Camera Interface⟩ ≡
  virtual Float GenerateRay(const Sample &sample, Ray *ray) const = 0;

The base Camera constructor takes a number of parameters that are appropriate for all camera types. They include the transformation that places the camera in the scene, and the near and far clipping planes, which give distances along the camera-space z axis that delineate the scene being rendered. Any geometric primitives in front of the near plane or beyond the far plane will not be rendered; see Figure 6.1. (Although the names "near" and "far" make clear intuitive sense for these planes, graphics systems frequently refer to them as "hither" and "yon", respectively. Although there is probably a historical reason for this, a practical reason is that near and far are reserved keywords in Microsoft's C and C++ compilers.)

Real-world cameras have a shutter that opens for a short period of time to expose the film to light; one result of this non-zero exposure time is that objects that move during the film exposure are blurred; this effect is called motion blur. To model this effect in lrt, each ray has a time value associated with it; by sampling the scene over a range of times, motion can be captured. Thus, all Cameras store shutter open and shutter close times. Note, however, that lrt does not currently support motion blur; we provide a properly sampled time value to allow for this future expansion.

Finally, Cameras contain an instance of the Film class to represent the final image to be computed. Film will be described in Chapter 8.
⟨Camera Method Definitions⟩ +≡
  Camera::Camera(const Transform &world2cam, Float hither, Float yon,
          Float sopen, Float sclose, Film *f) {
      WorldToCamera = world2cam;
      CameraToWorld = WorldToCamera.GetInverse();
      ClipHither = hither;
      ClipYon = yon;
      ShutterOpen = sopen;
      ShutterClose = sclose;
      film = f;
  }

⟨Camera Protected Data⟩ ≡
  Transform WorldToCamera, CameraToWorld;
  Float ClipHither, ClipYon;
  Float ShutterOpen, ShutterClose;

⟨Camera Public Data⟩ ≡
  Film *film;

6.1.1 Camera Coordinate Spaces

We have already made use of two important modeling coordinate spaces, object space and world space. We will now introduce three more useful coordinate spaces that have to do with the camera and imaging. Including object and world space, we now have the following (see Figure 6.2).

Figure 6.2: A handful of camera-related coordinate spaces help to simplify the implementation of Cameras; the Camera class holds transformations between them. Scene objects in world space are viewed by the camera, which sits at the origin of camera space and looks down the z axis. Objects between the hither and yon planes are projected onto the image plane at z = hither in camera space. The image plane is at z = 0 in raster space, where x and y range from (0, 0) to (xResolution − 1, yResolution − 1). Normalized device coordinate (NDC) space normalizes raster space so that x and y range from (0, 0) to (1, 1).

Object space: This is the coordinate system in which geometric primitives are defined. For example, spheres in lrt are defined to be centered at the origin of object space.

World space: While each primitive may have its own object space, there is a single world space that the objects in the scene are placed in relation to. Each primitive has an object-to-world transformation that determines how it is located in world space. World space is the standard frame that all other spaces are defined in terms of.

Camera space: A virtual camera is placed in the scene at some world-space point with a particular viewing direction and "up" vector. This defines a new coordinate system around that point, with the origin at the camera's location, the z axis mapped to the viewing direction, and the y axis mapped to the up direction (see Section ?? on page ??). This is a handy space for reasoning about which objects are potentially visible to the camera. For example, if an object's camera-space bounding box is entirely behind the z = 0 plane (and the camera doesn't have a field of view wider than 180 degrees), the object will not be visible to the camera; a sketch of this test follows the list.

Screen space: Screen space is defined on the image plane. The camera projects objects in camera space onto the image plane; the parts inside the screen window are visible in the image that is generated. In x and y, screen space extends over the user-specified screen window (for square images, typically [−1, 1] in each dimension; see Section 6.2). Depth z values in screen space range from zero to one, corresponding to points at the near and far clipping planes, respectively. Note that although this is called "screen" space, it is still a 3D coordinate system, since z values are meaningful.

Normalized device coordinate (NDC) space: This is the coordinate system for the actual image being rendered. In x and y, this space ranges from (0, 0) to (1, 1), with (0, 0) being the upper left corner of the image. Depth values are the same as in screen space, and a linear transformation converts from screen to NDC space.

Raster space: This is almost the same as NDC space, except that the x and y coordinates range from (0, 0) to (xResolution − 1, yResolution − 1).
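The camera-space culling test mentioned above is a one-liner. The sketch below is hypothetical (a minimal stand-in bounding-box type, not lrt's own class) and assumes its input has already been transformed into camera space:

  // Camera space: the camera sits at the origin looking down the +z axis.
  struct CameraSpaceBounds { float zMin, zMax; /* x, y extents omitted */ };

  bool PotentiallyVisible(const CameraSpaceBounds &b) {
      // If every point of the box has z <= 0, the box is entirely behind the
      // camera (assuming a field of view under 180 degrees).
      return b.zMax > 0.f;
  }

A full implementation would also test against the hither and yon planes and the sides of the viewing volume.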
All cameras store a world-space to camera-space transformation; this can be used to transform primitives in the scene into camera space. The origin of camera space is the camera's position, and the camera looks down the camera-space z axis. The projective cameras in the next section will use matrices to transform between all of these spaces as needed, but cameras with unusual imaging characteristics can't necessarily represent these transformations with 4×4 matrices.

6.2 Projective Camera Models

One of the fundamental parts of 3D computer graphics is the 3D viewing problem: how a three-dimensional scene is projected onto a two-dimensional image for display. Most of the classic approaches can be expressed by a 4×4 projective transformation matrix. Therefore, we will introduce a projection matrix camera class and then define two simple camera models. The first implements an orthographic projection, and the other implements a perspective projection; these are two classic and widely used projections.

⟨Camera Declarations⟩ +≡
  class ProjectiveCamera : public Camera {
  public:
      ⟨ProjectiveCamera Public Methods⟩
  protected:
      ⟨ProjectiveCamera Protected Data⟩
  };

In addition to the world-to-camera transformation and the projective transformation matrix, the ProjectiveCamera takes the screen-space extent of the image, the clipping plane distances, a pointer to the Film class for the camera, and additional parameters for motion blur and depth of field. Depth of field, the implementation of which will be shown at the end of this section, simulates the blurriness of out-of-focus objects in real lens systems.

⟨Camera Method Definitions⟩ +≡
  ProjectiveCamera::ProjectiveCamera(const Transform &w2c,
          const Transform &proj, const Float Screen[4],
          Float hither, Float yon, Float sopen, Float sclose,
          Float lensr, Float focald, Film *f)
      : Camera(w2c, hither, yon, sopen, sclose, f) {
      ⟨Initialize depth of field parameters⟩
      ⟨Compute projective camera transformations⟩
  }

The ProjectiveCamera implementations pass the projective transformation up to the base class constructor here. This transformation gives the camera-to-screen projection; from that we can compute most of the other transformations that we need.

⟨Compute projective camera transformations⟩ ≡
  CameraToScreen = proj;
  WorldToScreen = CameraToScreen * WorldToCamera;
  ⟨Compute projective camera screen transformations⟩
  RasterToCamera = CameraToScreen.GetInverse() * RasterToScreen;

⟨ProjectiveCamera Protected Data⟩ ≡
  Transform CameraToScreen, WorldToScreen, RasterToCamera;

The only non-trivial one of the precomputed transformations is ProjectiveCamera::ScreenToRaster. Note the composition of transformations where (reading from bottom to top) we start with a point in screen space, translate so that the upper left corner of the screen is at the origin, and then scale by the reciprocal of the screen width and height, giving us a point with x and y coordinates between zero and one (these are NDC coordinates). Finally, we scale by the raster resolution, so that we end up covering the raster range from (0, 0) up to the overall raster resolution.
⟨Compute projective camera screen transformations⟩ ≡
  ScreenToRaster = Scale(film->xResolution, film->yResolution, 1.f) *
      Scale(1.f / (Screen[1] - Screen[0]),
            1.f / (Screen[2] - Screen[3]), 1.f) *
      Translate(Vector(-Screen[0], -Screen[3], 0.f));
  RasterToScreen = ScreenToRaster.GetInverse();

⟨ProjectiveCamera Protected Data⟩ +≡
  Transform ScreenToRaster, RasterToScreen;

6.2.1 Orthographic Camera

Figure 6.3: The orthographic view volume is an axis-aligned box in camera space, defined such that objects inside the region are projected onto the z = hither face of the box.

⟨orthographic.cpp*⟩ ≡
  #include "camera.h"
  #include "film.h"
  #include "paramset.h"
  ⟨OrthographicCamera Declarations⟩
  ⟨OrthographicCamera Definitions⟩

⟨OrthographicCamera Declarations⟩ ≡
  class OrthoCamera : public ProjectiveCamera {
  public:
      ⟨OrthoCamera Public Methods⟩
  };

The orthographic transformation takes a rectangular region of the scene and projects it onto the front face of the box that defines the region. It doesn't give the effect of foreshortening (objects becoming smaller on the image plane as they get farther away), but it does leave parallel lines parallel and preserves relative distances between objects. Figure 6.3 shows how this rectangular volume gives the visible region of the scene. The orthographic camera constructor generates the orthographic transformation matrix with the Orthographic() transformation function, which will be defined shortly.

⟨OrthographicCamera Definitions⟩ ≡
  OrthoCamera::OrthoCamera(const Transform &world2cam,
          const Float Screen[4], Float hither, Float yon,
          Float sopen, Float sclose, Float lensr, Float focald, Film *f)
      : ProjectiveCamera(world2cam, Orthographic(hither, yon), Screen,
          hither, yon, sopen, sclose, lensr, focald, f) {
  }

The orthographic viewing transformation leaves x and y coordinates unchanged, but maps z values at the hither plane to 0 and z values at the yon plane to 1 (see Figure 6.3). It is easy to derive: first, the scene is translated along the z axis so that the near clipping plane is aligned with z = 0. Then, the scene is scaled in z so that the far clipping plane maps to z = 1. The composition of these two transformations gives the overall transformation.

⟨Transform Method Definitions⟩ +≡
  Transform Orthographic(Float znear, Float zfar) {
      return Scale(1.f, 1.f, 1.f / (zfar - znear)) *
          Translate(Vector(0.f, 0.f, -znear));
  }

Figure 6.4: [TODO: orthographic ray generation: raster space to ray.]

We can now write the code to take a sample point in raster space and turn it into a camera ray. The Sample::imageX and Sample::imageY components of the camera sample are raster-space x and y coordinates on the image plane (the contents of the Sample structure are described in detail in Chapter 7). We use the following process: first, we transform the raster-space sample position into a point in camera space, giving us the origin of the camera ray, a point located on the near clipping plane. Because the camera-space viewing direction points down the z axis, the camera-space ray direction is (0, 0, 1). The ray's maxt value is set so that intersections beyond the far clipping plane will be ignored; this is easily computed since the ray's direction is normalized. Finally, the ray is transformed into world space before this method returns.
If depth of field has been enabled for this scene, the fragment ⟨Modify ray for depth of field⟩ takes care of modifying the ray so that depth of field is simulated; depth of field will be explained later in this section. [TODO: need to ensure there is no scaling in the camera-to-world transformation.]

⟨OrthographicCamera Definitions⟩ +≡
  Float OrthoCamera::GenerateRay(const Sample &sample, Ray *ray) const {
      ⟨Generate raster and camera samples⟩
      ray->o = Pcamera;
      ray->d = Vector(0, 0, 1);
      ⟨Set ray time value⟩
      ⟨Modify ray for depth of field⟩
      ray->mint = 0.;
      ray->maxt = ClipYon - ClipHither;
      ray->d = ray->d.Hat();
      CameraToWorld(*ray, ray);
      return 1.f;
  }

The Sample structure tells us what "time" this ray should be traced at (again, this is for a future motion blur expansion). The Sample's time value ranges between 0 and 1, so we simply use it to linearly interpolate between the provided shutter open and close times.

⟨Set ray time value⟩ ≡
  ray->time = Lerp(sample.time, ShutterOpen, ShutterClose);

Once all of the transformation matrices have been set up, we just set up the raster-space sample point and transform it to camera space.

⟨Generate raster and camera samples⟩ ≡
  Point Pras(sample.imageX, sample.imageY, 0);
  Point Pcamera;
  RasterToCamera(Pras, &Pcamera);

6.2.2 Perspective Camera

⟨perspective.cpp*⟩ ≡
  #include "camera.h"
  #include "film.h"
  #include "paramset.h"
  ⟨PerspectiveCamera Declarations⟩
  ⟨PerspectiveCamera Method Definitions⟩

The perspective projection is similar to the orthographic projection in that it projects a volume of space onto a 2D image plane. However, it includes the effect of foreshortening: objects that are far away are projected to be smaller than objects of the same size that are closer. Furthermore, unlike the orthographic projection, the perspective projection doesn't preserve distances or angles in general, and parallel lines no longer remain parallel. The perspective projection is a reasonably close match for how the eye and camera lenses generate images of the three-dimensional world.
Figure 6.5: The perspective transformation matrix projects points in camera space onto the image plane. The x′ and y′ coordinates of the projected points are equal to the unprojected x and y coordinates divided by the z coordinate. The projected z′ coordinate is computed so that points on the hither plane map to z′ = 0 and points on the yon plane map to z′ = 1.

⟨PerspectiveCamera Declarations⟩ ≡
  class PerspectiveCamera : public ProjectiveCamera {
  public:
      ⟨PerspectiveCamera Public Methods⟩
  };

⟨PerspectiveCamera Method Definitions⟩ ≡
  PerspectiveCamera::PerspectiveCamera(const Transform &world2cam,
          const Float Screen[4], Float hither, Float yon,
          Float sopen, Float sclose, Float lensr, Float focald,
          Float fov, Film *f)
      : ProjectiveCamera(world2cam, Perspective(fov, hither, yon),
          Screen, hither, yon, sopen, sclose, lensr, focald, f) {
  }

The perspective projection describes perspective viewing of the scene. Points in the scene are projected onto a viewing plane at z = 1, one unit away from the virtual camera at z = 0; see Figure 6.5.

⟨Transform Method Definitions⟩ +≡
  Transform Perspective(Float fov, Float n, Float f) {
      ⟨Perform projective divide⟩
      ⟨Scale to canonical viewing volume⟩
  }

The process is most easily understood in two steps.

First, points p in camera space are projected onto the viewing plane. A little algebra shows that the projected x′ and y′ coordinates on the viewing plane can be computed by dividing x and y by the point's z coordinate value. The projected z depth is remapped so that z values at the hither plane go to 0 and z values at the yon plane go to 1. The computation we'd like to do is
$$x' = \frac{x}{z} \qquad y' = \frac{y}{z} \qquad z' = \frac{f(z - n)}{z(f - n)}.$$
Fortunately, all of this can easily be encoded in a four-by-four matrix using homogeneous coordinates:
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{f}{f-n} & -\frac{fn}{f-n} \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

⟨Perform projective divide⟩ ≡
  Matrix4x4 *persp = new Matrix4x4(1, 0,       0,          0,
                                   0, 1,       0,          0,
                                   0, 0, f/(f-n), -f*n/(f-n),
                                   0, 0,       1,          0);

Second, we account for the angular field of view specified by the user and scale the (x, y) values on the projection plane so that points inside the field of view project to coordinates in [−1, 1] on the view plane. (For square images, both x and y will lie in [−1, 1] in screen space. Otherwise, the direction in which the image is narrower will map to [−1, 1], and the wider direction will map to an appropriately larger range of screen-space values.) The scale that is applied after the projective transformation takes care of this. (Recall that the tangent is equal to the ratio of the opposite side of a right triangle to the adjacent side. Here the adjacent side is defined to have a length of 1, so the opposite side has length tan(fov/2). Scaling by the reciprocal of this length maps the field of view to the range [−1, 1].)

⟨Scale to canonical viewing volume⟩ ≡
  Float invTanAng = 1.f / tanf(Radians(fov) / 2.f);
  return Scale(invTanAng, invTanAng, 1) * Transform(persp);

[TODO: this is confusing. Doesn't the orthographic camera have the same raster-to-screen transformation as the perspective camera? Where is the [−1, 1] transformation happening in the orthographic camera?]
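As a sanity check on the matrix (a restatement of the equations above, nothing new): applying it to a camera-space point $(x, y, z, 1)$ gives the homogeneous point $(x, y, f(z-n)/(f-n), z)$, and the homogeneous divide by the last coordinate yields
$$x' = \frac{x}{z}, \qquad y' = \frac{y}{z}, \qquad z' = \frac{f(z-n)}{z(f-n)}.$$
In particular, $z = n$ gives $z' = 0$, and $z = f$ gives $z' = f(f-n)/(f(f-n)) = 1$, as required.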
For a perspective projection, rays originate from the sample position on the hither plane and have the direction given by the vector from (0, 0, 0) through the sample position. Therefore, we could compute the ray's direction by subtracting (0, 0, 0) from the sample's camera-space position; in other words, the ray's vector direction is component-wise equal to its point position. Rather than doing a useless subtraction to convert the point to a direction, we just component-wise initialize the vector ray->d from the point Pcamera. Because the generated ray's direction may be quite short, we scale it up by the inverse of the near clip plane location; although this isn't strictly necessary (there's no particular need for the ray direction to be normalized at this stage), it can be more intuitive when debugging if the ray's direction has a magnitude somewhat close to one. As with the OrthoCamera, the ray's maxt value is set to lie on the far clipping plane.

⟨PerspectiveCamera Method Definitions⟩ +≡
  Float PerspectiveCamera::GenerateRay(const Sample &sample, Ray *ray) const {
      ⟨Generate raster and camera samples⟩
      ray->o = Pcamera;
      ray->d = Vector(Pcamera.x, Pcamera.y, Pcamera.z);
      ⟨Set ray time value⟩
      ⟨Modify ray for depth of field⟩
      ray->d = ray->d.Hat();
      ray->mint = 0.;
      ray->maxt = (ClipYon - ClipHither) / ray->d.z;
      CameraToWorld(*ray, ray);
      return 1.f;
  }

6.2.3 Depth of Field

Real cameras have lens systems that focus light through a finite-sized aperture onto the film plane. Because the aperture has finite area, a single point in the scene may be projected onto an area on the film plane. (Correspondingly, a single point on the film plane may see different parts of the scene, depending on which part of the lens it's receiving light from.) Figure 6.6 shows this effect. The point p1 doesn't lie on the plane of focus, so it is projected through the lens onto an area p1′ on the film plane. The point p2 does lie on the plane of focus, so it projects to a single point p2′ on the image plane. Therefore, p1 will be blurred on the image plane while p2 will be in sharp focus.

Figure 6.6: Real-world cameras have a lens with finite aperture and lens controls that adjust the lens position with respect to the film plane. Because the aperture is finite, objects in the scene aren't all imaged onto the film in perfect focus. Here, the point p1 doesn't lie on the plane of points in perfect focus, so it images to an area p1′ on the film and is blurred. The point p2 does lie on the focal plane, so it images to a point p2′ and is in focus. Both increasing the aperture size and increasing an object's distance from the focal plane increase its blurriness.

Figure 6.7: Cross-section of a spherical lens.

To understand how to compute the proper ray through an arbitrary point on the lens, we make the simplifying assumption that we are using a single spherical lens. Figure 6.7 shows such a lens, as well as the quantities used in the following derivation:

  U    angle of the ray with respect to the lens axis
  U′   angle of the refracted ray with respect to the lens axis
  Φ    angle of the sphere normal at the hit point with respect to the lens axis
  I    angle of incidence of the primary ray
  I′   angle of incidence of the refracted ray
  z    distance from the ray's sphere intersection to the ray's axis intersection
  z′   distance from the ray's sphere intersection to the refracted ray's axis intersection
  h    height of the intersection
  ε    depth of the intersection into the sphere

Note that $I = \Phi + U$ and $I' = \Phi - U'$. Now, in order to determine the relationships between these variables, we first make two simplifying assumptions: the sphere radius R is big, and the incoming ray is nearly parallel to the lens axis. These assumptions are called the paraxial assumptions. Two immediate consequences are that $\varepsilon \approx 0$ and that, for small angular quantities $\alpha$, $\sin\alpha \approx \tan\alpha \approx \alpha$. Now, we simply apply Snell's law and simplify with our approximations.
Snell's law gives
$$\eta \sin I = \eta' \sin I',$$
which under the paraxial approximation becomes $\eta I = \eta' I'$, or
$$\eta (\Phi + U) = \eta' (\Phi - U').$$
Now we make use of the small-angle approximations
$$U \approx \tan U = \frac{h}{z} \qquad U' \approx \tan U' = \frac{h}{z'} \qquad \Phi \approx \sin\Phi = \frac{h}{R}.$$
Substituting these approximations, we have
$$\eta \left( \frac{h}{R} + \frac{h}{z} \right) = \eta' \left( \frac{h}{R} - \frac{h}{z'} \right).$$
Cancelling h and rearranging terms gives
$$\frac{\eta'}{z'} + \frac{\eta}{z} = \frac{\eta' - \eta}{R}.$$
Notice that the form of this equation looks like $1/z' + 1/z = C$, which leads directly to the perspective transformation. Also note that the relationship between z and z′ does not depend on the angle of incidence: all rays through z refract to z′. This is how lenses are able to focus!

Of course, a real (thin) lens has two spherical surfaces through which rays refract. Each surface contributes an $(\eta' - \eta)/R$ term, so the refractive power of a thin lens is given by
$$\frac{1}{f} = (\eta' - \eta)\left( \frac{1}{R_1} - \frac{1}{R_2} \right).$$
Here f is called the focal length of the lens, and 1/f is measured in units of m⁻¹, which are sometimes called diopters. Note that the focal length is not the same as the focal distance. Focal length is an inherent property of a lens and does not change. (Readers familiar with traditional photography have probably used a zoom lens; these are special kinds of lenses that do in fact allow the focal length to change. This is accomplished by using multiple glass lenses and moving them relative to each other. The full lens system then has a focal length that can be adjusted, even though the focal lengths of the individual glass elements remain fixed.) Focal distance, however, is not fixed and can almost always be changed in any camera system.

Figure 6.8: To adjust a camera ray for depth of field, we first compute the distance along the ray, ft, where it intersects the focal plane. We then shift the ray's origin from the center of the lens to the sampled lens position and construct a new ray (dashed line) from the new origin that still goes through the same point on the focal plane. This ensures that points on the focal plane remain in focus but that other points are blurred appropriately.

A point in space will image through the lens to a finite area on the film plane, as shown in Figure ??. This area is typically circular and is called the circle of confusion. The size of the circle of confusion determines how out of focus the point is. Note that this size depends on the size of the aperture; larger apertures yield larger circles of confusion. This is why pinhole cameras render the entire scene perfectly in focus: the infinitesimally small aperture results in extremely small circles of confusion. The size of the circle of confusion is also affected by the distance between the object and the lens. The focal distance is the distance from the lens to the plane where objects project to a circle of confusion with zero radius; these points appear to be perfectly in focus.

It is crucial to understand, however, that all types of film (analog or digital) will tolerate a certain amount of blur. This means that objects do not have to be exactly on the focal plane to appear in sharp focus. In the case of computer graphics, this corresponds (roughly) to the circle of confusion being smaller than a pixel. There will be some minimum and maximum distances from the lens at which objects will appear in focus; this range is called the lens's depth of field.
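As a worked numeric example (our own, with typical photographic values): a lens with focal length $f = 50\,\mathrm{mm}$ has refractive power $1/f = 20$ diopters. For an object at distance $z = 1\,\mathrm{m}$, the relation $1/z' + 1/z = 1/f$ gives
$$\frac{1}{z'} = \frac{1}{f} - \frac{1}{z} = 20\,\mathrm{m}^{-1} - 1\,\mathrm{m}^{-1} = 19\,\mathrm{m}^{-1}, \qquad z' \approx 52.6\,\mathrm{mm},$$
so the film must sit about 2.6 mm farther from the lens than the focal length to hold that object in focus. As $z \to \infty$, $z' \to f$, which is why very distant scenes are focused at exactly the focal length.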
Projective cameras take two extra parameters for depth of field: one sets the size of the lens aperture, and the other sets the focal distance.

⟨Initialize depth of field parameters⟩ ≡
  LensRadius = lensr;
  FocalDistance = focald;

⟨ProjectiveCamera Protected Data⟩ +≡
  Float LensRadius, FocalDistance;

The math behind computing circles of confusion and depth of field boundaries is not difficult; it mostly involves repeated application of similar triangles. Even so, we can simulate focus in a ray tracer without working through any of these constructions, and in just a few lines of code:

⟨Modify ray for depth of field⟩ ≡
  if (LensRadius > 0.) {
      ⟨Sample point on lens⟩
      ⟨Compute point on plane of focus⟩
      ⟨Update ray for effect of lens⟩
  }

To see why this is so simple, consider how the projective cameras simulate a pinhole camera: the rays generated for a pinhole camera must all pass through the pinhole (i.e., the center of the lens). For a lens of non-zero radius, however, we would like the ray to be able to pass through an arbitrary point on the lens. Since the camera is pointing down the z axis, we only need to modify the x and y coordinates of the ray origin to accomplish this.

The ConcentricSampleDisk() function, defined in Chapter 14, takes a (u, v) sample position in [0, 1)² and maps it to the 2D disk with radius 1. To get a point on the lens, we scale these coordinates by the lens radius. The Sample provides the (u, v) lens-sampling parameters in the Sample::lensX and Sample::lensY fields.

⟨Sample point on lens⟩ ≡
  Float lens_x, lens_y;
  ConcentricSampleDisk(sample.lensX, sample.lensY, &lens_x, &lens_y);
  lens_x *= LensRadius;
  lens_y *= LensRadius;

Once we have adjusted the origin of the ray away from the center of the lens, we need to determine the proper direction for the new ray. We could compute this using Snell's law, but the paraxial approximation and our knowledge of focus make this much simpler. We know that all rays from our given image sample through the lens must converge somewhere on the focal plane. Finding this point of convergence is extremely simple: we just compute it directly for the ray through the center of the lens. Since rays through the lens center remain straight, no refraction is required!

Since we know that the focal plane is perpendicular to the z axis and the ray originates on the near clipping plane, intersecting the ray through the lens center with the plane of focus is particularly simple. The t value of the intersection is given by
$$t = \frac{\mathrm{focalDistance} - \mathrm{hither}}{d_z},$$
where $d_z$ is the z component of the ray's direction.

⟨Compute point on plane of focus⟩ ≡
  Float ft = (FocalDistance - ClipHither) / ray->d.z;
  Point Pfocus = (*ray)(ft);

Now we can compute the ray. The origin is shifted to the sampled point on the lens, and the direction is set so that the ray still passes through the point on the plane of focus, Pfocus.

⟨Update ray for effect of lens⟩ ≡
  ray->o.x += lens_x;
  ray->o.y += lens_y;
  ray->d = Pfocus - ray->o;
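Pulling the three fragments together, here is a standalone sketch of the whole depth-of-field adjustment for a camera-space ray. The types are minimal stand-ins for lrt's classes, and SampleDiskUniform replaces ConcentricSampleDisk() with the simpler polar mapping (a different mapping onto the same unit disk):

  #include <cmath>

  struct Vec3 { float x, y, z; };
  struct CameraRay { Vec3 o, d; };   // camera-space ray; d points down +z

  static void SampleDiskUniform(float u1, float u2, float *dx, float *dy) {
      float r = std::sqrt(u1), theta = 2.f * (float)M_PI * u2;
      *dx = r * std::cos(theta);
      *dy = r * std::sin(theta);
  }

  void ModifyRayForDepthOfField(CameraRay *ray, float u1, float u2,
                                float lensRadius, float focalDistance,
                                float clipHither) {
      if (lensRadius <= 0.f) return;           // pinhole camera: no blur
      // Sample a point on the lens.
      float lensX, lensY;
      SampleDiskUniform(u1, u2, &lensX, &lensY);
      lensX *= lensRadius; lensY *= lensRadius;
      // Find where the unperturbed ray pierces the plane of focus.
      float ft = (focalDistance - clipHither) / ray->d.z;
      Vec3 pFocus = { ray->o.x + ft * ray->d.x,
                      ray->o.y + ft * ray->d.y,
                      ray->o.z + ft * ray->d.z };
      // Shift the origin to the lens sample and re-aim at the focus point,
      // so points on the plane of focus stay sharp and others blur.
      ray->o.x += lensX; ray->o.y += lensY;
      ray->d.x = pFocus.x - ray->o.x;
      ray->d.y = pFocus.y - ray->o.y;
      ray->d.z = pFocus.z - ray->o.z;
  }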
6.3 Environment Camera

⟨environment.cpp*⟩ ≡
  #include "camera.h"
  #include "film.h"
  #include "paramset.h"
  ⟨EnvironmentCamera Declarations⟩
  ⟨EnvironmentCamera Definitions⟩

⟨EnvironmentCamera Declarations⟩ ≡
  class EnvironmentCamera : public Camera {
  public:
      ⟨EnvironmentCamera Public Methods⟩
  private:
      ⟨EnvironmentCamera Private Data⟩
  };

One advantage of ray tracing compared to scanline or rasterization rendering methods is that it's easy to have unusual image projections; we have great freedom in how the image sample positions are mapped into ray directions, since the rendering algorithm doesn't depend on properties such as straight lines in the scene always projecting to straight lines in the image.

In this section, we will describe a camera model that traces rays in all directions around a point in the scene, giving a two-dimensional view of everything that is visible from that point. Consider a sphere around the camera position in the scene; choosing points on that sphere gives directions to trace rays in. If we parameterize the sphere with spherical coordinates, each point on the sphere is associated with a $(\theta, \phi)$ pair, where $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi]$ (see Section 5.3.2 on page 192 for more details on spherical coordinates). This type of image is particularly useful because it compactly captures a representation of all of the incident light at a point in the scene. It will be useful later when we discuss environment mapping and environment lighting, two rendering techniques that are based on image-based representations of light in a scene.

Notice that the EnvironmentCamera derives directly from the Camera class, not the ProjectiveCamera class. This is because the environmental projection is non-linear and cannot be captured by a single 4×4 matrix. An image generated with this kind of projection is shown in Figure 6.9. θ values range from 0 at the top of the image to π at the bottom of the image, and φ values range from 0 to 2π, moving from left to right across the image.

Figure 6.9: An image rendered with the EnvironmentCamera, which traces rays in all directions from the camera position. The resulting image gives a representation of all light arriving at that point in the scene, and can be used for the image-based lighting techniques that will be described in Chapters 13 and 16.

All rays generated by this camera have the same origin; for efficiency we compute the world-space position of the camera once in the constructor.

⟨EnvironmentCamera Definitions⟩ ≡
  EnvironmentCamera::EnvironmentCamera(const Transform &world2cam,
          Float hither, Float yon, Float sopen, Float sclose,
          Film *film)
      : Camera(world2cam, hither, yon, sopen, sclose, film) {
      rayOrigin = CameraToWorld(Point(0,0,0));
  }

⟨EnvironmentCamera Private Data⟩ ≡
  Point rayOrigin;

Note that the EnvironmentCamera still uses the near and far clipping planes to restrict the value of the ray's parameter t. In this case, however, these are really clipping spheres, since all rays originate at the same point and radiate outward. [TODO: given the above, don't we really want to ignore ClipHither and ClipYon if we're trying to sample the environment for lighting?]
⟨EnvironmentCamera Definitions⟩ +≡
  Float EnvironmentCamera::GenerateRay(const Sample &sample,
          Ray *ray) const {
      ray->o = rayOrigin;
      ⟨Generate environment camera ray direction⟩
      ⟨Set ray time value⟩
      ray->mint = ClipHither;
      ray->maxt = ClipYon;
      return 1.f;
  }

To compute the $(\theta, \phi)$ coordinates for this ray, we first compute NDC coordinates from the raster image sample position. These are then scaled up to cover the $(\theta, \phi)$ range, and the spherical coordinate formula is used to compute the ray direction.

⟨Generate environment camera ray direction⟩ ≡
  Float theta = M_PI * sample.imageY / film->yResolution;
  Float phi = 2 * M_PI * sample.imageX / film->xResolution;
  Vector dir(sinf(theta) * cosf(phi), cosf(theta),
      sinf(theta) * sinf(phi));
  CameraToWorld(dir, &ray->d);

Readers familiar with cartography will recognize this as the classic equirectangular (latitude-longitude) projection.

Further Reading

Akenine-Möller and Haines have a particularly well-written derivation of the orthographic and perspective projection matrices in Real-Time Rendering (Akenine-Möller and Haines 2002). Other good references for projections are Rogers and Adams' Mathematical Elements for Computer Graphics (Rogers and Adams 1990), Watt and Watt (Watt and Watt 1992), Foley et al. (Foley, van Dam, Feiner, and Hughes 1990), and Eberly's book on game engine design (Eberly 2001). [TODO: originally Sutherland's Sketchpad work?]

Potmesil and Chakravarty did early work on depth of field and motion blur in computer graphics (Potmesil and Chakravarty 1981; Potmesil and Chakravarty 1982; Potmesil and Chakravarty 1983). Cook and collaborators developed a more accurate model for these effects based on distribution ray tracing; this is the approach we have implemented in this chapter (Cook, Porter, and Carpenter 1984; Cook 1986).

Kolb et al. investigated simulating complex camera lens systems with ray tracing in order to model the imaging effects of real cameras (Kolb, Hanrahan, and Mitchell 1995). Another unusual projection method was used by Greene and Heckbert for generating images for Omnimax theaters (Greene and Heckbert 1986a). The EnvironmentCamera in this chapter is similar to the camera model described by Musgrave (Musgrave 1992). [TODO: more about map projections.]

Exercises

6.1 Moving camera. [TODO: flesh out this exercise.]

6.2 Kolb, Hanrahan, and Mitchell have described a camera model for ray tracing based on simulating the lens system of a real camera, which is comprised of a set of glass lenses arranged to form an image on the film plane (Kolb, Hanrahan, and Mitchell 1995). Read their paper and implement a camera model in lrt that implements their algorithm for following rays through lens systems. Test your implementation with some of the lens description data from their paper.

7 Sampling and Reconstruction

We will now describe how the Sampler chooses the points at which the image should be sampled, and how the pixels in the output image are computed from the radiance values computed for the samples. We saw a glimpse of how these sample points were used by lrt's camera model in the previous chapter. The mathematical background for this process is given by sampling theory: the theory of taking discrete sample values from continuous signals and then reconstructing new signals from those samples. Most of the previous development of sampling theory has been for encoding and compressing audio
(e.g., over the telephone) and for television signal encoding and transmission. In rendering, we face the two-dimensional instance of this problem, where we're sampling an image at particular positions by tracing rays into the scene and then using the reconstructed approximation of the image function to compute values for the output pixels that form an image when displayed. It is important to carefully address the sampling problem in a renderer; a relatively small amount of work in improving sampling can substantially improve the images that the system generates.

A closely related problem is reconstruction: how to use the samples and the values that were computed for them to compute values for the pixels in the final image. Many samples around each pixel may contribute to its final value; the way in which they are blended together to compute the pixel's value can noticeably affect the quality of the final image.

Figure 7.1: [TODO: include the famous Seurat painting with a magnified inset; do we need permission?] Notice how the overall painting appears to be a smoothly varying image, while the magnified inset reveals the pointillistic style.

7.1 Fourier Theory

What is an image? Although this might seem like a simple question, it belies a rich field of theory called signal processing. Consider first the case of human vision. The signal leaving the retina is a collection of dots, one for each cone or rod. (Of course, the human visual system is substantially more complex than this.) But we do not perceive the world as an up-close Seurat painting (see Figure 7.1); we see a continuous signal! The rods and cones in the eye sample the world, and our brain reconstructs the world from those samples.

Digital image synthesis is not so different. A renderer also produces a collection of individual colored dots. These dots are a discrete approximation to a continuous signal. Of course, this approximation can introduce errors, and the field of signal processing helps us understand, quantify, and lessen this error.

Everyone would probably agree that we would like as much spatial resolution as possible. However, in practice, our ability to generate ultra-high resolution images is limited by things like screen resolution, computational expense, or, in the case of photography, film grain. At first glance, this situation seems pretty hopeless: how can we hope to capture tiny phenomena in a coarse image? Of course, we know that the situation is not hopeless; we look at images every day without trouble. The answer lies in certain properties of the human visual system, such as area-averaging and insensitivity to noise. We will not give an overview of human vision in this book; see Glassner (Glassner 1995) for an introduction. [TODO: better reference for human vision.]

One simple way to make pictures look their best is just not to display anything that is wrong. While this might seem obvious, it is not simple to define "wrong" in this context. In practice, this goal is unattainable because of the aforementioned discrete approximation. The sampling and reconstruction process introduces error known as aliasing, which can manifest itself in a variety of ways, including jagged edges, strobing, flashing, flickering, or popping. These errors come up because the sampling process discards information from the continuous domain. To make matters worse, the visual system tends to fill in data where there is none, so we also need to worry about how the missing data will be interpreted by the viewer.
In the one-dimensional case, consider a signal given by a function $f(x)$, where we can evaluate f at any x′ value we choose. Each such x′ is a sample position, and the value of $f(x')$ is the sample value. The left half of Figure 7.2 shows a set of samples taken with uniform spacing (indicated by black dots) of a smooth 1D function. From a set of such samples, $(x', f(x'))$, we'd like to reconstruct a new signal $\tilde{f}$ that approximates the original function f. On the right side of Figure 7.2 is a piecewise-linear reconstructed function that approximates $f(x)$ by linearly interpolating neighboring sample values (readers already familiar with sampling theory will recognize this as reconstruction with a hat function). Because the only information we have about f comes from the sample values at the positions x′, $\tilde{f}$ is likely not to match f perfectly, since we have no knowledge of f's behavior between the sample values that we have. We would like the reconstructed function to match the original function as closely as possible.

Figure 7.2: By taking a set of point samples of $f(x)$, we determine its value at those positions (left). From the sample values, we can reconstruct a function $\tilde{f}(x)$ that is an approximation to $f(x)$ (right). The sampling theorem, introduced in Section ??, makes a precise statement about the conditions on $f(x)$ and the number of samples taken under which $\tilde{f}(x)$ is exactly the same as $f(x)$. That the exact function can sometimes be reconstructed exactly from mere point samples is remarkable.
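The following standalone sketch (our own; the function being sampled is an arbitrary choice) makes the 1D example concrete: it point-samples a smooth function at uniform spacing and reconstructs an approximation by linearly interpolating between neighboring samples, i.e., reconstruction with a hat function:

  #include <cmath>
  #include <cstdio>

  // The smooth 1D signal we will sample; any smooth function would do.
  static double f(double x) { return std::sin(x) + 0.5 * std::cos(3. * x); }

  int main() {
      const int N = 16;                       // number of uniform samples
      const double x0 = 0., x1 = 6.28318530718;
      double dx = (x1 - x0) / (N - 1);
      double samples[N];
      for (int i = 0; i < N; ++i)             // sampling: evaluate f at x_i
          samples[i] = f(x0 + i * dx);

      // Reconstruction with a hat function: linearly interpolate between
      // the two samples that bracket x to get the approximation f~(x).
      for (int k = 0; k <= 64; ++k) {
          double x = x0 + (x1 - x0) * k / 64.;
          int i = (int)((x - x0) / dx);
          if (i >= N - 1) i = N - 2;
          double t = (x - (x0 + i * dx)) / dx;
          double fApprox = (1. - t) * samples[i] + t * samples[i + 1];
          std::printf("x=%5.2f  f=%8.4f  f~=%8.4f\n", x, f(x), fApprox);
      }
  }

Increasing N shrinks the gap between f and $\tilde{f}$; the rest of this chapter makes precise how fast, and for which functions, that happens.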
The tool we will use to understand the quality of the match is Fourier analysis. Although we review the basic concepts here, we hope that most readers have had some exposure to Fourier analysis already; a full treatment would be a book in itself. Glassner provides a few useful chapters with a computer graphics angle (Glassner 1995), and Bracewell gives a more complete treatment (Bracewell 2000). Readers already very familiar with Fourier analysis are still encouraged to read this summary, since each presentation (unfortunately) seems to introduce its own notation.

Fourier analysis is based around the Fourier transform, which represents a signal in "frequency space." This representation helps us understand what happens when a signal is turned into a set of samples. In essence, the Fourier transform decomposes a signal into a weighted sum of sinusoids.

Just as a vector in $\mathbb{R}^n$ can be projected onto an arbitrary basis, so too can functions be projected onto new bases. To see what this means, first consider the example of vectors in $\mathbb{R}^3$. The typical reference frame is an origin $O$ and three perpendicular unit vectors $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$. A vector in this space is written as $\mathbf{V} = (V_x, V_y, V_z)$. If we think of these components as weights, we can write $\mathbf{V} = V_x\mathbf{X} + V_y\mathbf{Y} + V_z\mathbf{Z}$. Notice, however, that $V_x$ is just the inner (dot) product of $\mathbf{V}$ and $\mathbf{X}$, so we can alternately write

$$\mathbf{V} = (\mathbf{V}\cdot\mathbf{X})\,\mathbf{X} + (\mathbf{V}\cdot\mathbf{Y})\,\mathbf{Y} + (\mathbf{V}\cdot\mathbf{Z})\,\mathbf{Z}.$$

This idea of taking the inner product of an object with each basis member in turn generalizes nicely to other spaces, such as the space of functions. For example, given a 1D function $y = f(x)$, we may want to represent it as a weighted sum of basis functions $\phi_i(x)$: $f(x) = \sum_i c_i \phi_i(x)$. We would also like our basis functions to have the same "perpendicular" property as our vector basis; for functions, the corresponding notion is called orthogonality. Two functions $\phi_i(x)$ and $\phi_j(x)$ are orthogonal over an interval $\Gamma = [a, b]$ if

$$\int_a^b \phi_i(x)\,\overline{\phi_j(x)}\,dx = \begin{cases} c \neq 0 & i = j \\ 0 & \text{otherwise.} \end{cases}$$

Notice that we have taken the complex conjugate $\overline{\phi_j(x)}$, which matters when $\phi_j$ is a complex-valued function.

Now, suppose we want to project a function $f(x)$ onto a given orthogonal basis $\{\phi_i\}$; that is, we would like to write $f(x) \approx \sum_i c_i \phi_i(x)$. By minimizing the mean squared error of the projection over the interval $[a, b]$, it can be shown that the desired coefficients are

$$c_i = \frac{\int_a^b f(x)\,\overline{\phi_i(x)}\,dx}{\int_a^b \phi_i(x)\,\overline{\phi_i(x)}\,dx}.$$

7.1.1 Complex Exponentials

In Fourier analysis, the basis set we choose is the complex exponentials $e^{i\omega t}$, where $i = \sqrt{-1}$. There are two important things to understand about the complex exponentials. First, these functions are periodic. By Euler's formula,

$$e^{i\omega t} = \cos(\omega t) + i\sin(\omega t),$$

from which it is easy to see that the complex exponentials are periodic with period $T = 2\pi/\omega$.

Second, the complex exponentials form an orthogonal basis for the space of 1D functions. Take the infinite family $\Psi_n(t) = e^{in\omega t}$, where $n \in \mathbb{Z}$. These functions are orthogonal over a full period $\Gamma = [t_0, t_0 + 2\pi/\omega]$, as can be seen from the definition of orthogonality:

$$\int_\Gamma \Psi_n(t)\,\overline{\Psi_m(t)}\,dt = \int_\Gamma e^{in\omega t}\,e^{-im\omega t}\,dt = \int_\Gamma e^{i(n-m)\omega t}\,dt.$$

If $n = m$, the integrand is 1 and the integral is just $2\pi/\omega$. If not,

$$\int_\Gamma e^{i(n-m)\omega t}\,dt = \left[\frac{e^{i(n-m)\omega t}}{i(n-m)\omega}\right]_{t_0}^{t_0 + 2\pi/\omega} = \frac{e^{i(n-m)\omega t_0}\left(e^{i2\pi(n-m)} - 1\right)}{i(n-m)\omega},$$

and the right-hand factor is zero because of the periodicity of the complex exponentials ($e^{i2\pi(n-m)} = 1$ for any integer $n - m$), so we have orthogonality.

At this point we will skip some details regarding the projection of periodic versus aperiodic functions and simply assert that we can project any function $x(t)$ onto the complex exponentials with the formula

$$X(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} x(t)\,e^{-i\omega t}\,dt. \tag{7.1.1}$$

This new function $X$ is a function of frequency: it tells us how much of each frequency $\omega$ is present in the original signal. For example, the sinusoid $x(t) = \sin(2\pi t)$ contains the single angular frequency $\omega = 2\pi$, and its Fourier transform consists of delta functions translated to $\omega = \pm 2\pi$.

Table 7.1: Some of the more important functions in rendering, and their Fourier transforms.

    Spatial domain        Frequency space
    --------------        ----------------
    Box                   Sinc
    Gaussian              Gaussian
    Constant              Delta
    Sinusoid              Translated delta
    Shah                  Shah

Equation 7.1.1 is called the Fourier analysis equation, or sometimes just the Fourier transform. We can also transform from the frequency domain back to the spatial domain using the Fourier synthesis equation, or inverse Fourier transform:

$$x(t) = \int_{-\infty}^{\infty} X(\omega)\,e^{i\omega t}\,d\omega. \tag{7.1.2}$$

The reader should be warned that the constants in front of these integrals are not always the same in different derivations; some authors (notably Glassner (Glassner 1995) and many in the physics community) prefer to multiply each integral by $1/\sqrt{2\pi}$ to emphasize the symmetry between the two equations. Table 7.1 shows some of the more important functions in rendering and their Fourier transforms.
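The delta-function claim for the sinusoid can be checked numerically. The sketch below (illustrative C++, not lrt code) approximates Equation 7.1.1 with a Riemann sum over a finite window: the magnitude at $\omega = 2\pi$ dominates and grows with the window size, as a delta function should, while other frequencies stay bounded.

    #include <cmath>
    #include <complex>
    #include <cstdio>

    // Approximate X(omega) = 1/(2 pi) * integral x(t) e^{-i omega t} dt
    // for x(t) = sin(2 pi t), using a Riemann sum over [-L, L].
    std::complex<double> FourierAt(double omega) {
        const double PI = 3.14159265358979323846;
        const double L = 50.;    // half-width of the integration window
        const int N = 200000;    // number of Riemann-sum intervals
        const double dt = 2 * L / N;
        std::complex<double> sum = 0.;
        for (int i = 0; i < N; ++i) {
            double t = -L + (i + .5) * dt;
            sum += std::sin(2 * PI * t) *
                   std::exp(std::complex<double>(0., -omega * t)) * dt;
        }
        return sum / (2 * PI);
    }

    int main() {
        const double PI = 3.14159265358979323846;
        // |X(2 pi)| is large and grows linearly with L; the others are tiny.
        printf("|X(3)|    = %g\n", std::abs(FourierAt(3.)));
        printf("|X(2 pi)| = %g\n", std::abs(FourierAt(2 * PI)));
        printf("|X(10)|   = %g\n", std::abs(FourierAt(10.)));
        return 0;
    }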
7.2 Sampling Theory

Representing general continuous functions in a computer is impossible. When faced with this task, the typical solution is to carve the function domain into small regions and associate a number with each region; informally, this is the sampling process. In graphics, we could just choose values for $x$, $\omega$, $t$, and any other parameters we need, trace a ray, and call the result a sample. But this is not how the real world works: sensors such as a CCD cell integrate over some finite area; they don't point sample. This approximation can lead to many different kinds of error.

Of course, this is only half the story. In order to produce something we can see, we eventually have to recreate a continuous intensity function. This is typically the job of the display. In a CRT, for example, each phosphor glows with a spatial distribution not unlike a Gaussian. Pixel intensity has an angular distribution as well; it is mostly uniform for CRTs but, as anyone who has tried to view a laptop from an angle knows, can be quite directional on LCD displays. Informally, the process of taking a collection of numbers and converting them back to a continuous signal is the reconstruction process.

If we aren't careful about each of these two processes, all kinds of artifacts can result. It is sometimes useful to distinguish between artifacts due to sampling and those due to reconstruction; when we wish to be precise, we will call sampling artifacts "prealiasing" and reconstruction artifacts "postaliasing." Any attempt to fix these errors is broadly classified as "antialiasing," although the distinction between "antiprealiasing" and "antipostaliasing" is typically not made. The best way to understand, analyze, and eliminate these errors is through Fourier analysis.

Figure 7.3: The convolution operator.

7.2.1 Convolution

Formally, the convolution operation is defined as

$$f(t) \otimes h(t) = \int_{-\infty}^{\infty} f(\tau)\,h(t - \tau)\,d\tau.$$

The convolution of two functions is a new function. Intuitively, to evaluate this new function at some value $t$, we center the function $h$ (typically referred to as the "kernel" or "filter") at $t$ and integrate the product of this shifted version of $h$ and the function $f$. Figure 7.3 shows this operation in action.

This operation is crucial in sampling theory, mainly due to two important facts. The first is that the convolution of a function $f$ with a delta function is simply the original function $f$ (this is easy to prove; see the exercises). The other is the convolution theorem, which gives us an easy way to compute the Fourier transform of a convolved signal. In particular, it answers the question: given two functions $f$ and $h$ with associated Fourier transforms $F$ and $H$, respectively, what is the Fourier transform $Y(\omega)$ of $f \otimes h$? From the definitions, we have

$$Y(\omega) = \int_{-\infty}^{\infty} (f \otimes h)(t)\,e^{-i\omega t}\,dt = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\tau)\,h(t - \tau)\,d\tau\; e^{-i\omega t}\,dt.$$

Changing the order of integration, we get

$$Y(\omega) = \int_{-\infty}^{\infty} f(\tau) \left[\int_{-\infty}^{\infty} h(t - \tau)\,e^{-i\omega t}\,dt\right] d\tau.$$

A property of the Fourier transform is that $\mathcal{F}\{g(t - t_0)\} = e^{-i\omega t_0}\,\mathcal{F}\{g(t)\}$, so

$$Y(\omega) = \int_{-\infty}^{\infty} f(\tau)\,e^{-i\omega\tau}\,H(\omega)\,d\tau = F(\omega)\,H(\omega).$$

This shows that convolution in the spatial domain is equivalent to multiplication in the frequency domain. We leave the proof of the shift property used above to the reader (see the exercises). We can similarly show that multiplication in the spatial domain is equivalent to convolution in the frequency domain.
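For discretely sampled data, the convolution integral becomes a sum. Here is a small illustrative sketch (not lrt code) that makes the kernel flipping in the definition explicit; convolving with a discrete delta kernel such as {0, 0, 1, 0, 0} returns f unchanged, matching the first fact above.

    #include <vector>

    // Discrete convolution: out[t] = sum over tau of f[tau] * h[t - tau].
    // The kernel h is assumed to have odd length, with its center at
    // h.size()/2; values of f outside its bounds are treated as zero.
    std::vector<float> Convolve(const std::vector<float> &f,
                                const std::vector<float> &h) {
        int r = (int)h.size() / 2;
        std::vector<float> out(f.size(), 0.f);
        for (int t = 0; t < (int)f.size(); ++t)
            for (int j = -r; j <= r; ++j) {
                int tau = t + j;
                if (tau >= 0 && tau < (int)f.size())
                    out[t] += f[tau] * h[r - j];  // h[r - j] holds h(t - tau)
            }
        return out;
    }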
7.2.2 Back to Sampling

What does all this have to do with sampling and reconstruction? Recall the informal definition of the sampling process: we just evaluate a function at some regular interval and store the values. Formally, this corresponds to multiplying the function by a "shah," or "impulse train," function. The shah $\text{III}_T(x)$ is defined as

$$\text{III}_T(x) = \sum_{n=-\infty}^{\infty} \delta(x - nT),$$

where $T$ defines the period, and hence the sampling rate $1/T$. This formal way of thinking about sampling is shown in Figure 7.4.

Figure 7.4: Formalizing the sampling process. The function $f(x)$ is multiplied by the shah function $\text{III}_T(x)$, leaving an infinite sequence of scaled delta functions.

Of course, this multiplication is happening in the spatial domain; let's consider what's going on in the frequency domain. We will assume that the function $f(x)$ is bandlimited, so its Fourier transform has compact support. (A function $f(x)$ is bandlimited if there exists some frequency $\omega_0$ such that $f(x)$ contains no frequencies greater than $\omega_0$.) A representative spectrum $F(\omega)$ is shown in Figure 7.5.

Figure 7.5: A representative bandlimited spectrum $F(\omega)$. Notice that all bandlimited functions must have spectra with compact support.

We also know the spectrum of the shah function $\text{III}_T(x)$ from Table 7.1: the Fourier transform of a shah function with period $T$ is another shah function with period $2\pi/T$. This reciprocal period is crucial: it means that if the samples are farther apart in the spatial domain, they are closer together in the frequency domain.

By the convolution theorem, the Fourier transform of our sampled signal is just the convolution of $F(\omega)$ with this new shah function. But remember that convolving a function with a delta function yields a copy of that function. Therefore, convolving with a shah function yields an infinite series of copies of the original spectrum, with spacing equal to the period of the shah, as shown in Figure 7.6. This is the spectrum of our series of samples.

Figure 7.6: The convolution of $F(\omega)$ and the transform of the shah function, resulting in infinitely many copies of the function $F$.

7.2.3 Reconstruction

So how do we get back to our original function? Looking at Figure 7.6, the answer is obvious: just throw away all but one of the copies of our spectrum, obtaining the original spectrum $F(\omega)$. Then we can easily compute the original function by means of the inverse Fourier transform.

In order to throw away all but the center copy of the spectrum, we multiply (in the frequency domain, remember) our copies by a box function of the appropriate width, as shown in Figure 7.7. The box function acts as a perfect low-pass filter.

Figure 7.7: Multiplying a series of copies of $F(\omega)$ by an appropriate box function yields the original spectrum.

Figure 7.8: Undersampled 1D function: when the original function has undulations at a higher frequency than half the sampling frequency, it is not possible to reconstruct the original function accurately. The result of undersampling a high-frequency function is aliasing, where low-frequency errors that aren't present in the original function appear. Here, the reconstructed function $\tilde{f}(x)$ on the right has a much larger value over much of the left side of the graph than the original function $f(x)$ did.

This seems great! We started with $F(\omega)$; the sampling process yielded an infinite set of copies; we throw all but one away with a low-pass filter, and we're back to the original function.
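In the spatial domain, multiplying by a box in frequency corresponds to convolving the samples with a sinc function, giving the ideal reconstruction formula $\tilde{f}(x) = \sum_n f(nT)\,\mathrm{sinc}((x - nT)/T)$, where $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$. A minimal sketch (illustrative C++, not lrt code):

    #include <cmath>
    #include <vector>

    double Sinc(double u) {
        const double PI = 3.14159265358979323846;
        if (std::abs(u) < 1e-9) return 1.;
        return std::sin(PI * u) / (PI * u);
    }

    // Ideal reconstruction from samples taken at x = n*T, n = 0..size-1.
    // A true signal has infinitely many samples; truncating the sum to a
    // finite window is itself an approximation.
    double ReconstructSinc(const std::vector<double> &samples,
                           double T, double x) {
        double sum = 0.;
        for (size_t n = 0; n < samples.size(); ++n)
            sum += samples[n] * Sinc((x - n * T) / T);
        return sum;
    }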
Does this mean that we can always sample any function and perfectly reconstruct it? Surely there must be a catch.

7.2.4 Aliasing

The key to successful reconstruction is the ability to exactly recover the original spectrum $F(\omega)$ by simply multiplying the sampled spectrum by a box function of the appropriate width. Notice that in Figure 7.6, the copies of the spectrum are separated by empty space, so perfect reconstruction is possible. Consider what happens, however, if the original function is sampled at a lower rate. Recall that the Fourier transform of a shah function $\text{III}_T$ with period $T$ is a new shah function with period $2\pi/T$. This means that if the spacing between samples increases in the spatial domain, the sample spacing decreases in the frequency domain, pushing the copies of the original spectrum $F(\omega)$ closer together. If the copies get too close together, they overlap. Because the copies are added together, the resulting spectrum no longer looks like many copies of the original. When we multiply this new spectrum by a box function, we obtain a spectrum that is similar but not equal to our original $F(\omega)$: high-frequency details in the original signal have manifested themselves as low-frequency details in the new signal. These new low-frequency artifacts are called aliases (because high frequencies are "masquerading" as low frequencies), and the resulting signal is said to be aliased.

Figure 7.9: Aliasing from point sampling the function $\cos(x^2 + y^2)$. At the left side of the image, the function has a low frequency (tens of pixels per cycle), so it is represented accurately. Moving to the right, however, aliasing artifacts appear in the top image, since the sampling rate doesn't keep up with the function's highest frequency. If high-frequency elements of the signal are removed with filtering before sampling, as was done in the bottom image, the right side of the image takes on a constant grey color. (Example due to Don Mitchell.) Some aliasing errors remain in both images, due to the book printing process.

Figure 7.9 shows the effect of undersampling the two-dimensional function $f(x, y) = \cos(x^2 + y^2)$; the origin $(0, 0)$ is at the center of the left edge of the image. At the left side of the top image, the reconstructed image accurately represents the original signal, but as we move farther to the right and $f$ has higher and higher frequency content, aliasing starts: the circular patterns that appear in the center and right of the image are severe aliasing artifacts.

A naive solution to this problem would be to increase the sampling rate until the copies of the spectrum are sufficiently far apart that they do not overlap, thereby eliminating aliasing completely. In fact, the sampling theorem tells us exactly what rate is required: as long as the frequency of uniform sample points $\omega_s$ is greater than twice the maximum frequency present in the signal, $\omega_m$, it is possible to reconstruct the original signal perfectly from the samples. This minimum sampling frequency, $2\omega_m$, is called the Nyquist frequency.

Unfortunately, this result assumes that $\omega_m$ is finite and is therefore only relevant for bandlimited signals. Non-bandlimited signals have spectra with infinite support, so no matter how far apart the copies of their spectra are (i.e., how high a sampling rate we use), there will always be overlap.
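A concrete one-dimensional example makes the overlap visible. If we sample a 0.9 Hz sinusoid once per second (its Nyquist rate would be 1.8 Hz), the samples we record are exactly the samples of a 0.1 Hz sinusoid: the high frequency masquerades as the low one. The sketch below (illustrative only, not lrt code) verifies this:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double PI = 3.14159265358979323846;
        for (int n = 0; n < 5; ++n) {
            double highFreq = std::sin(2 * PI * 0.9 * n);  // what we sampled
            double alias    = -std::sin(2 * PI * 0.1 * n); // what we reconstruct
            printf("n=%d: %+f vs %+f\n", n, highFreq, alias);
        }
        // The two columns match exactly: sin(2 pi 0.9 n) = -sin(2 pi 0.1 n)
        // for integer n, so the two signals cannot be told apart from
        // their samples.
        return 0;
    }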
This naturally leads to the question: which signals are bandlimited? Unfortunately, it turns out that in computer graphics there are almost no bandlimited signals that we would want to sample. In particular, any function containing a discontinuity cannot be bandlimited, and we can therefore not perfectly sample and reconstruct it. This makes sense: the function's discontinuity will always fall between two samples, and the samples provide no information about its exact location.

If, in spite of all this, we still want to compute an un-aliased set of samples that represent the function, but our sampling rate isn't high enough to eliminate aliasing, sampling theory offers two options:

1. Sample the function at a higher frequency (super-sample it). If we can achieve a high enough sampling rate that the Nyquist limit is respected, the resulting sample values can be used to reconstruct the original function perfectly. Even when this is not possible and aliasing is still present, increasing the sampling rate always reduces the error.

2. Filter (i.e., blur) the function so that no high frequencies remain that can't be captured accurately by the original sampling rate. While filtering modifies the original function by blurring it, an alias-free sampled representation of the blurred function is generally preferable to an aliased representation of the original function. On the bottom of Figure 7.9, high frequencies were removed from the function before sampling; the result is that the image takes on the average value of the function where previously there was aliasing error.

For the functions we need to sample in rendering, it's often either impossible or very difficult to know the frequency content of the signal being sampled. Nevertheless, the sampling theorem is still useful. First, it tells us the effect of increasing the sampling frequency: the point at which aliasing starts is pushed out to a higher frequency. Second, given some particular sampling frequency, it tells us the frequency beyond which we should try to remove high-frequency data from the signal; if the function can filter itself according to the rate at which it is being sampled, aliasing can also be reduced. (This idea will be revisited in Chapter 11 when we introduce texture filtering.)
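When, as in ray tracing, the function can only be point sampled, the second option can still be approximated by the first: averaging many point samples over a pixel-sized region approximates a box-prefiltered value. A minimal sketch (illustrative C++; the function names are invented for this example):

    #include <cmath>
    #include <cstdlib>

    // A chirp-like test signal whose frequency content grows with x.
    double f(double x) { return std::cos(x * x); }

    // Approximate the box-prefiltered value of f over a region of the
    // given width centered at x by averaging nSamples point samples.
    double BoxPrefilteredSample(double x, double width, int nSamples) {
        double sum = 0.;
        for (int i = 0; i < nSamples; ++i) {
            double u = (std::rand() / (double)RAND_MAX - .5) * width;
            sum += f(x + u);
        }
        return sum / nSamples;
    }

Where f oscillates many times within the region, this average tends toward the function's local mean, which is why the right side of the filtered image in Figure 7.9 becomes a constant grey.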
7.2.5 Application to image synthesis

The basic application of these ideas to the two-dimensional case of sampling and reconstructing images of rendered scenes is straightforward. We have an image, which we can think of as a function from two-dimensional $(x, y)$ image locations to radiance values $L$:

$$f(x, y) \to L.$$

The good news is that, with our ray tracer, we can evaluate this function at any $(x, y)$ point that we choose. The bad news is that we can only point sample the image function $f$: it is not generally possible to pre-filter $f$ to remove the high frequencies from it before sampling. (The pre-filtering approach to antialiasing in image synthesis has been tried: given a filter $F$ and a signal $S$, the final pixel value can be written as $\int_A \delta(x, y)\,(S \otimes F)\,dx\,dy$, which can be rewritten as $\int_A (\delta \otimes F)\,S\,dx\,dy$; here the sampler's delta function is pre-filtered instead of the signal. This leads to the idea of generalized rays (e.g., cone tracing or pyramid tracing), which was explored in the mid-1980s. The approach was eventually abandoned due to the complexity of intersecting a generalized ray with a primitive, and due to the errors of repeated approximations introduced by secondary rays.)

It is useful to generalize the definition of the scene function to a higher-dimensional function that also depends on the time $t$ and the $(u, v)$ lens sample position at which it is sampled. Because the rays from the Camera are based on these five quantities, varying any of them gives a different potential ray and thus a potentially different value of $f$. For a particular image position, the radiance at that point will generally vary across time and position on the lens (if there are moving objects in the scene and a finite-aperture camera, respectively).

Even more generally, because many of the integrators defined in Chapter 16 use statistical techniques to estimate the radiance along a given ray, they may return a different radiance value when repeatedly given the same ray. If we further extend the scene radiance function to include the sample values used by the integrator (e.g., to choose points on area light sources for illumination computations), we have an even higher-dimensional image function

$$f(x, y, t, u, v, i_1, i_2, \ldots) \to L.$$

Sampling all of these dimensions well is an important part of generating high-quality imagery efficiently. For example, if we ensure that nearby $(x, y)$ positions on the image don't tend to have similar $(u, v)$ positions on the lens, we get better coverage of the sample space and better images for a given number of samples. The Sampler classes in the next few sections will address the issue of sampling all of these dimensions as well as possible.

7.2.6 Sources of Aliasing in Rendering

Geometry is one of the biggest causes of aliasing in rendered images. When projected onto the image plane, an object's boundary introduces a step function: the image function's value jumps discontinuously from one value to another. A one-dimensional example of a step function is shown in Figure 7.10. Unfortunately, step functions have infinite frequency content, which means that no sampling density is sufficiently high to capture them correctly. Even worse, the perfect reconstruction filter causes artifacts when applied to aliased samples: ringing artifacts appear in the reconstructed image, an effect known as the Gibbs phenomenon. Figure 7.11 shows an example of this effect for a 1D function.

Figure 7.10: 1D step function: the function discontinuously jumps from one value to another. Such functions have infinitely high frequency content, so a finite number of point samples can never adequately capture their behavior well enough for perfect reconstruction to apply.

Figure 7.11: Illustration of the Gibbs phenomenon. When a function that hasn't been sampled at the Nyquist rate is reconstructed with the sinc reconstruction filter, the reconstructed function has "ringing" artifacts: it oscillates around the true function. Here a 1D step function (dashed line) has been sampled with a sample spacing of 0.125. When reconstructed with the sinc, the ringing appears (solid line).

Choosing an effective reconstruction filter in the face of aliasing requires a mix of science, artistry, and personal taste, as we will see later in this chapter.
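The ringing of Figure 7.11 is easy to reproduce numerically. This sketch (illustrative only, not lrt code) samples the step function with spacing $T = 0.125$ and reconstructs with the sinc filter; the printed values near the discontinuity overshoot beyond $\pm 1$ rather than settling at the step's values:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        const double PI = 3.14159265358979323846, T = 0.125;
        // Step function samples: -1 for x < 0, +1 for x >= 0.
        std::vector<double> samples;
        for (int n = -40; n <= 40; ++n)
            samples.push_back(n * T < 0. ? -1. : 1.);
        // Sinc-reconstruct near the discontinuity and print the values.
        for (int k = -8; k <= 8; ++k) {
            double x = k * 0.05, sum = 0.;
            for (int n = -40; n <= 40; ++n) {
                double u = (x - n * T) / T;
                sum += samples[n + 40] *
                       (std::abs(u) < 1e-9 ? 1. : std::sin(PI * u) / (PI * u));
            }
            printf("ftilde(%+.2f) = %+.3f\n", x, sum);
        }
        return 0;
    }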
Another source of geometric aliasing is very small objects in the scene: if geometry is small enough to fall between samples on the image plane, it can unpredictably disappear and reappear over multiple frames of an animation.

Aliasing can also come from the textures and materials on an object. Shading aliasing can come from texture maps that haven't been filtered correctly (addressing this problem is the topic of much of Chapter 11) or from small highlights on shiny surfaces; if the sampling rate is not high enough to sample these features adequately, aliasing will result. Furthermore, a sharp shadow cast by an object introduces another step function into the final image, and while it is possible to identify the position of step functions from geometric edges on the image plane, detecting step functions from shadow boundaries is much more difficult. The key insight is that we can never remove all sources of aliasing from our images; instead, we must develop techniques for mitigating their impact.

7.2.7 Non-uniform sampling

Although the image functions that we'll be sampling are known to have infinite-frequency components and thus can't be perfectly reconstructed, not all is lost. It turns out that varying the spacing between samples in a non-uniform way can reduce the visual impact of aliasing. For a fixed sampling rate that isn't sufficient to capture the function, both uniform and non-uniform sampling produce incorrect reconstructed signals. However, non-uniform sampling tends to turn the regular aliasing artifacts into noise, which is less objectionable to the human visual system.

Figure 7.12 shows this effect with the same cosine function used as an example previously. On top, the function is sampled at a fixed rate using uniform samples. Below, each sample location has been jittered: a small random number is added to its position in $x$ and $y$ before the cosine function is evaluated. The aliasing patterns have been broken up and transformed into high-frequency noise artifacts.

This is an interesting result, since it shows that the sampling patterns that are best according to the signal processing view (which only argues for increasing the uniform sampling frequency) don't always give the best results perceptually: some image artifacts are more visually acceptable than others. This observation will guide our development of good image sampling patterns through the rest of this chapter.

Figure 7.12: Jittered sampling of the aliased cosine function (top) changes the regular, low-frequency aliasing artifacts from undersampling the signal into high-frequency noise (bottom).
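Jittered sample positions like those in the bottom of Figure 7.12 are simple to generate: each cell of a uniform grid gets one sample at a uniformly random offset within the cell. A minimal sketch (illustrative C++; lrt's own stratified sampler appears in Section 7.4):

    #include <cstdlib>

    inline float RandomFloat() {
        // A stand-in for a proper RNG; rand() is used only for brevity.
        return std::rand() / ((float)RAND_MAX + 1);
    }

    // Write nx * ny jittered sample positions in [0,1)^2 into out,
    // one per grid cell, as interleaved (x, y) pairs.
    void JitteredSamples(int nx, int ny, float *out) {
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                *out++ = (x + RandomFloat()) / nx;
                *out++ = (y + RandomFloat()) / ny;
            }
    }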
7.2.8 Adaptive sampling

One approach that has been suggested to combat aliasing is adaptive super-sampling: if we can identify the regions of the signal with frequencies higher than the Nyquist limit, we can take additional samples in those regions without incurring the computational expense of increasing the sampling frequency everywhere. It is hard to get this approach to work well in practice, however, because it is hard to find all of the places where super-sampling is needed. Most schemes for doing so are based on examining adjacent sample values and finding places where there is a significant change between the two, the hypothesis being that the signal may have high frequencies in that region.

In general, though, adjacent sample values cannot tell us with certainty what is really happening between them: the function may have huge variation between the two but just happen to return to the same value at each; or adjacent samples may have substantially different values without any aliasing being present. For example, the texture filtering algorithms in Chapter 11 work hard to eliminate aliasing due to image maps and procedural textures; we would not want an adaptive sampling routine to take extra samples in an area where texture values are changing quickly but without excessively high frequencies being present.

In general, adaptive approaches will always miss some areas that need super-sampling, leaving the only recourse to be increasing the basic sampling rate anyway. Adaptive antialiasing works well at turning a very aliased image into a less aliased image, but it is usually not able to produce a visually flawless image more efficiently.

7.2.9 Understanding Pixels

There are two subtleties related to the pixels that constitute a discrete image that are important to keep in mind throughout the remainder of this chapter. First, it is crucial to remember that pixels are point samples of the image function at discrete points on the image plane; there is no "area" associated with a pixel. As Alvy Ray Smith has emphatically pointed out, thinking of pixels as small squares with finite area is an incorrect mental model that leads to a series of errors (Smith 1995). By introducing the topics of this chapter with a signal processing approach, we have tried to lay the groundwork for the right mental model.

Figure 7.13: Discrete versus continuous representations of pixel coordinates.

The second subtlety is in how pixel coordinates are computed. The pixels in the final image are naturally defined at discrete integer $(x, y)$ coordinates on a pixel grid, but the Samplers in this chapter will be generating image samples at continuous floating-point $(x, y)$ positions. Heckbert has written a short note that explains possible pitfalls in the interaction between these two domains. The natural way to map between them is to round continuous coordinates to the nearest discrete coordinate; this is appealing since it maps continuous coordinates that happen to have the same value as discrete coordinates to those discrete coordinates. However, the result is that, given a range of discrete coordinates spanning $[x_0, x_1]$, the set of continuous coordinates that covers that range is $[x_0 - 0.5, x_1 + 0.5)$. Thus, code that generates continuous sample positions for a given discrete pixel range is littered with 0.5 offsets, and it is easy to forget some of them, leading to subtle errors. If instead we truncate continuous coordinates $c$ to discrete coordinates $d$ by

$$d = \lfloor c \rfloor$$

and convert from discrete to continuous by

$$c = d + 0.5,$$

then the range of continuous coordinates for the discrete range $[x_0, x_1]$ is naturally $[x_0, x_1 + 1)$ and our code is much simpler. This convention, which we will adopt in lrt, is shown graphically in Figure 7.13.
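Written out as code, the convention amounts to one floor and one half-pixel offset (a sketch of the idea; lrt applies the convention inline rather than through helper functions like these):

    #include <cmath>

    // Truncate a continuous coordinate to the discrete pixel it lies in.
    inline int ContinuousToDiscrete(float c) {
        return (int)std::floor(c);
    }

    // The continuous position corresponding to a discrete pixel coordinate.
    inline float DiscreteToContinuous(int d) {
        return d + 0.5f;
    }

With these definitions, the continuous coordinates that map to the discrete range $[x_0, x_1]$ are exactly $[x_0, x_1 + 1)$, with no 0.5 offsets scattered through the sampling code.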
7.3 Image Sampling Interface

The core sampling declarations and functions are in the files sampling.h and sampling.cpp. Each of the specific sample generation plug-ins is in its own source file in the samplers/ directory.

⟨sampling.h*⟩ ≡
    #include "lrt.h"
    #include "geometry.h"
    ⟨Sampling Declarations⟩
    ⟨Sampling Inline Functions⟩

⟨sampling.cpp*⟩ ≡
    #include "lrt.h"
    #include "sampling.h"
    #include "transport.h"
    #include "volume.h"
    ⟨Sampler Method Definitions⟩
    ⟨Sample Method Definitions⟩
    ⟨Sampling Function Definitions⟩

We can now start to describe the operation of a few classes that generate good image sampling patterns. It may be surprising that some of them have a significant amount of complexity behind them. In practice, creating high-quality sample patterns can substantially improve a ray tracer's efficiency, allowing it to create a high-quality image with fewer rays than a lower-quality pattern would require. Because the run-time expense of using the best sampling patterns is approximately the same as that of lower-quality patterns, and because evaluating each image sample is expensive, this work pays dividends.

All of the sampler implementations inherit from an abstract Sampler class that defines their interface. Samplers have two main tasks:

1. Generating a sequence of multi-dimensional sample positions. Two dimensions give the raster-space image sample position, and another gives the time at which the sample should be taken; this ranges from zero to one and is scaled by the camera to cover the time period that the shutter is open. Two more sample values give a $(u, v)$ lens position to sample for depth of field; these also vary from zero to one. Furthermore, most of the light transport algorithms in Chapter 16 use sample points for tasks such as choosing positions on area light sources when estimating illumination, and they are most efficient when those sample points are well chosen. We also make choosing these points the job of the Sampler, since the best ways to select them take into account the sample points chosen for adjacent image samples.

2. Taking the radiance values computed for particular image samples, reconstructing and filtering them, and computing the final values for the output pixels, which are usually located at different positions than any of the samples taken. We will describe this part of their operation in Section 7.7.

⟨Sampling Declarations⟩ ≡
    class Sampler {
    public:
        ⟨Sampler Interface⟩
        ⟨Sampler Public Data⟩
    };

All Samplers take a few common parameters in their constructors that must be passed on to the base class's constructor: the overall image resolution in the $x$ and $y$ dimensions, the number of samples per pixel to take in each direction, the image crop window in normalized device coordinates $[0, 1]^2$, and a pointer to a Filter to be used to filter the image samples when computing the final pixel values. We store these values in member variables for later use.

⟨Sampler Method Definitions⟩ ≡
    Sampler::Sampler(int xstart, int xend, int ystart, int yend,
                     int xs, int ys) {
        xPixelSamples = xs;
        yPixelSamples = ys;
        xPixelStart = xstart;
        xPixelEnd = xend;
        yPixelStart = ystart;
        yPixelEnd = yend;
    }

The constructor just initializes variables that give the range of pixels in $x$ and $y$ for which samples need to be generated: samples for pixels ranging from xPixelStart to xPixelEnd-1, inclusive, in $x$ (and analogously in $y$) are generated by the Sampler.
⟨Sampler Public Data⟩ ≡
    int xPixelSamples, yPixelSamples;
    int xPixelStart, xPixelEnd, yPixelStart, yPixelEnd;

Samplers must implement the Sampler::GetNextSample() method, which is declared here as a pure virtual function. The Scene::Render() method calls this function until it returns false; as long as it keeps returning true, it should fill in the Sample that is passed in with the values for the next sample to be taken. All of the dimensions of the sample values it generates are in the range $[0, 1]$, except for imageX and imageY, which are given with respect to the image size in raster coordinates.

⟨Sampler Interface⟩ ≡
    virtual bool GetNextSample(Sample *sample) = 0;

So that the main rendering loop can easily report what percentage of the scene has been rendered after some number of samples has been processed, the Sampler::TotalSamples() method returns the total expected number of samples that the Sampler will generate. (The low-discrepancy and best-candidate samplers, described later in the chapter, may actually return a few more or fewer samples than TotalSamples() reports. Since computing the actual number they will generate can't be done quickly, and since an exact count isn't needed for this purpose, returning the expected number suffices.)

⟨Sampler Interface⟩ +≡
    int TotalSamples() const {
        return xPixelSamples * yPixelSamples *
               (xPixelEnd - xPixelStart) * (yPixelEnd - yPixelStart);
    }
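Put together, this interface is driven by a loop like the following (a schematic sketch of how Scene::Render() uses a Sampler, not the actual lrt implementation):

    void RenderSamples(Sampler *sampler, Sample *sample) {
        // TotalSamples() can be used to report the fraction completed.
        while (sampler->GetNextSample(sample)) {
            // Use sample->imageX, imageY, lensX, lensY, and time to build
            // a camera ray, compute the radiance it carries, and hand the
            // result to the reconstruction step described in Section 7.7.
        }
    }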
7.3.1 Sample representation and allocation

The Sample structure is used by Samplers to store a single sample. A single Sample is allocated in the Scene::Render() method. For each camera ray to be generated, this Sample pointer is passed to the Sampler to have its values initialized. It is then passed to the camera and integrators, which read values from it to construct the camera ray and to perform lighting calculations, respectively.

Cameras use the fixed fields in the Sample (imageX, imageY, etc.) to generate an appropriate camera ray, but various integrators have different needs, depending on the details of the light transport algorithm they implement. For example, the basic Whitted integrator doesn't do any random sampling, so it doesn't need any additional sample values, but the direct lighting integrator uses sample values both to randomly choose which light source to sample and to randomly choose positions on area light sources. Therefore, the integrators are given an opportunity to request various sample types. Information about what they ask for is stored in the Sample object; when it is later passed to the particular Sampler implementation, it is the Sampler's responsibility to generate all of the requested types of samples.

⟨Sampling Declarations⟩ +≡
    struct Sample {
        ⟨Sample Public Methods⟩
        ⟨Camera Sample Data⟩
        ⟨Integrator Sample Data⟩
    };

⟨Camera Sample Data⟩ ≡
    Float imageX, imageY;
    Float lensX, lensY;
    Float time;

The Sample constructor immediately calls the Integrator::RequestSamples() methods of the surface and volume integrators, asking them what samples they will need. The integrators can ask for 1D and/or 2D patterns, each with an arbitrary number of entries. For example, in a scene with two area light sources, where the integrator traces four shadow rays to the first source and eight to the second, the integrator would ask for two 2D sample patterns for each image sample, with four and eight samples each. (A 2D pattern is needed because two dimensions are required to parameterize the surface of a light.) If the integrator wanted to randomly select a single light source out of many, it could request a 1D sample pattern for this purpose.

By informing the Sampler of as much of the random sampling to be done during integration as possible, the Integrator makes it possible for the Sampler to carefully construct sample points that cover the high-dimensional sample space well. For example, if nearby image samples tend to use different parts of the area lights for their illumination computations, the resulting images are generally better, since more information has been discovered.

In lrt, we don't allow the integrator to request sample patterns of three or more dimensions; these are much less commonly needed for rendering than one- and two-dimensional patterns. If necessary, the integrator can combine points from lower-dimensional patterns to form higher-dimensional sample points (e.g., a 1D and a 2D pattern to form a 3D pattern). This may not give quite as good a set of sample points as a direct 3D sample generation process would, but the shortcomings aren't too bad in practice. (This is also the reason we provide 2D patterns instead of expecting integrators to request pairs of 1D patterns.)

The integrators' implementations of Integrator::RequestSamples() will in turn call the Sample::Add1D() and Sample::Add2D() methods below, which request another sample sequence in one or two dimensions, respectively, with a given number of sample values. After they are done calling these methods, the Sample constructor can continue, allocating storage for the requested sample values.

⟨Sample Method Definitions⟩ ≡
    Sample::Sample(SurfaceIntegrator *surf, VolumeIntegrator *vol,
                   const Scene *scene) {
        surf->RequestSamples(this, scene);
        vol->RequestSamples(this, scene);
        ⟨Allocate storage for sample pointers⟩
        ⟨Compute total number of sample values needed⟩
        ⟨Allocate storage for sample values⟩
    }

The Sample::Add1D() and Sample::Add2D() methods let the integrators ask for 1D and 2D sets of samples; the implementations of these methods just record the number of samples asked for and return an integer tag that the integrator can later use to access the sample values in the Sample.

⟨Sample Public Methods⟩ ≡
    u_int Add1D(u_int num) {
        n1D.push_back(num);
        return n1D.size()-1;
    }

⟨Sample Public Methods⟩ +≡
    u_int Add2D(u_int num) {
        n2D.push_back(num);
        return n2D.size()-1;
    }

It is the Sampler's responsibility to store the samples it generates for the integrators in the Sample::oneD and Sample::twoD arrays. For 1D sample patterns, it needs to generate n1D.size() independent patterns, where the $i$th pattern has n1D[i] sample values. These values are stored in oneD[i][0] through oneD[i][n1D[i]-1]. To access the samples, the integrator stores the sample tag returned by Add1D() in a member variable (for example, sampleOffset) and can then access the sample values in a loop like:

    for (i = 0; i < sample->n1D[sampleOffset]; ++i) {
        Float s = sample->oneD[sampleOffset][i];
        ...
    }

The 2D case is equivalent, except that the $i$th sample is given by the two values sample->twoD[offset][2*i] and sample->twoD[offset][2*i+1].

⟨Integrator Sample Data⟩ ≡
    vector<u_int> n1D, n2D;
    Float **oneD, **twoD;
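As an illustration of how an integrator might use this interface, here is a hypothetical RequestSamples() implementation; the integrator and its member variables are invented for this example and are not part of lrt:

    void MyIntegrator::RequestSamples(Sample *sample, const Scene *scene) {
        // Four 2D values per image sample for picking points on an area
        // light, and four 1D values for picking which light to sample.
        // The returned tags are saved so the sample values can be read
        // back later with loops like the one shown above.
        lightPosOffset = sample->Add2D(4);
        lightNumOffset = sample->Add1D(4);
    }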
The Sample constructor first allocates memory to store the pointers. Rather than allocating memory twice, it does a single allocation that provides enough memory for both the 1D and 2D pointers into the oneD and twoD sample arrays. twoD is then set to point at an appropriate offset into this memory, after the last pointer for oneD. Splitting up a single allocation like this is useful because it ensures that oneD and twoD point to nearby locations in memory, which is likely to reduce cache misses.

⟨Allocate storage for sample pointers⟩ ≡
    int nPtrs = n1D.size() + n2D.size();
    if (!nPtrs) {
        oneD = twoD = NULL;
        return;
    }
    oneD = (Float **)AllocAligned(nPtrs * sizeof(Float *));
    twoD = oneD + n1D.size();

We then use the same trick to allocate memory for the actual sample values. First we find the total number of Float values needed:

⟨Compute total number of sample values needed⟩ ≡
    int totSamples = 0;
    for (u_int i = 0; i < n1D.size(); ++i)
        totSamples += n1D[i];
    for (u_int i = 0; i < n2D.size(); ++i)
        totSamples += 2 * n2D[i];

And then we do a single allocation, handing it out in pieces to the various collections of samples:

⟨Allocate storage for sample values⟩ ≡
    Float *mem = (Float *)AllocAligned(totSamples * sizeof(Float));
    for (u_int i = 0; i < n1D.size(); ++i) {
        oneD[i] = mem;
        mem += n1D[i];
    }
    for (u_int i = 0; i < n2D.size(); ++i) {
        twoD[i] = mem;
        mem += 2 * n2D[i];
    }

The Sample destructor, not shown here, just frees the dynamically allocated memory.

7.4 Stratified Sampling

The first sample generator that we will introduce divides the image plane into rectangular regions and generates a single sample inside each region. These regions are commonly called strata, and this sampler is called the StratifiedSampler. The key idea behind stratification is that by subdividing the sampling domain into non-overlapping regions and taking a single sample from each one, we are less likely to miss important features of the image entirely, since the samples are guaranteed not to all be bunched together. Put another way, for a given number of samples it does us no good if multiple samples are taken from nearby points in the sample space, since they don't give us much new information about the behavior of the image function. Better is to take samples far away from the ones we've already taken, and stratification accomplishes this. From a signal processing viewpoint, we are implicitly defining an overall sampling rate: the smaller the strata are, the more of them we have, and thus the higher the sampling rate. In Chapter 15 we will develop the mathematics needed to properly analyze the benefit of stratified sampling; for now we will simply assert that it is better.

The stratified sampler places each sample by choosing a random point inside each stratum; this is done by jittering the center point of the stratum by a random amount up to half its width and height. The non-uniformity that results from this jittering helps turn aliasing into noise, as described earlier in the chapter. The sampler also offers an unjittered mode, which gives uniform sampling in the strata; this mode is mostly useful for comparisons between different sampling techniques rather than for rendering final images.

Figure 7.14 shows a comparison of a few sampling patterns. The first is a completely random pattern: we have chosen a number of image samples to take and have computed that many random image locations, without using the strata at all.
The result is a terrible sampling pattern: some regions of the image have few samples, while others have clumps of many samples. The second is a stratified pattern without jittering (i.e., uniform super-sampling). For the third, the uniform pattern has been jittered, adding a random offset to each sample's location while keeping it inside its cell. This gives a better overall distribution than the purely random pattern while preserving the benefits of stratification, though some clumps of samples and some under-sampled regions remain. We will present more sophisticated image sampling methods in the next two sections that ameliorate some of these remaining problems. Figure 7.15 shows images rendered with the StratifiedSampler and demonstrates how jittered sample positions turn aliasing artifacts into less objectionable noise.

Figure 7.14: Three 2D sampling patterns (random, uniform, and jittered). The random pattern is a poor pattern, with many clumps of samples that leave large sections of the image poorly sampled. The uniform stratified pattern is better distributed but can exacerbate aliasing artifacts. The stratified jittered pattern turns the aliasing of the uniform pattern into high-frequency noise while still maintaining the benefits of stratification.

⟨stratified.cpp*⟩ ≡
    #include "sampling.h"
    #include "paramset.h"
    #include "film.h"
    ⟨StratifiedSampler Declarations⟩
    ⟨StratifiedSampler Method Definitions⟩

⟨StratifiedSampler Declarations⟩ ≡
    class StratifiedSampler : public Sampler {
    public:
        ⟨StratifiedSampler Public Methods⟩
    private:
        ⟨StratifiedSampler Private Data⟩
    };

The StratifiedSampler generates samples by scanning over the pixels from left to right and top to bottom, generating all of the samples for the strata in each pixel before advancing to the next pixel. The sampler holds the offset of the current pixel in the xPos and yPos member variables, which are initialized to point at the first pixel in the upper left of the image's pixel extent. (Both the crop window and the sample filtering process can cause this corner to be somewhere other than $(0, 0)$.)

⟨StratifiedSampler Method Definitions⟩ ≡
    StratifiedSampler::StratifiedSampler(int xstart, int xend,
            int ystart, int yend, int xs, int ys, bool jitter)
        : Sampler(xstart, xend, ystart, yend, xs, ys) {
        jitterSamples = jitter;
        xPos = xPixelStart;
        yPos = yPixelStart;
        ⟨Allocate storage for a pixel's worth of stratified samples⟩
        ⟨Generate stratified camera samples for (xPos,yPos)⟩
    }

⟨StratifiedSampler Private Data⟩ ≡
    bool jitterSamples;
    int xPos, yPos;

The StratifiedSampler computes the image, time, and lens samples for an entire pixel's worth of image samples all at once; this allows it to compute better-distributed patterns for the time and lens samples than it could if each sample's values were computed independently. Here we allocate enough memory to store all of the sample values for a pixel.
⟨Allocate storage for a pixel's worth of stratified samples⟩ ≡
    imageSamples = (Float *)AllocAligned(5 * xPixelSamples *
        yPixelSamples * sizeof(Float));
    lensSamples = imageSamples +
        2 * xPixelSamples * yPixelSamples;
    timeSamples = lensSamples +
        2 * xPixelSamples * yPixelSamples;

⟨StratifiedSampler Private Data⟩ +≡
    Float *imageSamples, *lensSamples, *timeSamples;

Naive application of stratification to high-dimensional sampling quickly leads to an intractable number of samples. For example, if we divided the five-dimensional image, lens, and time sample space into four strata in each dimension, the total number of samples per pixel would be $4^5 = 1024$. We could reduce this impact by taking fewer samples in some dimensions (or by not stratifying some dimensions, corresponding to a single stratum), but we would then lose the benefit of well-stratified samples in those dimensions.

Figure 7.15: Comparisons of image sampling methods with a checkerboard texture: this is a difficult image to render well, since the checkerboard's frequency with respect to the pixel sample spacing increases toward infinity as we approach the horizon. On the top is a reference image, rendered