POSTED ON: 7/28/2011
CUDA LDI Project Report

1. Abstract

2. Introduction

2.1. LDI

Efficient collision detection is a fundamental problem in physically-based simulation and computer animation. In order to accelerate collision detection for rigid bodies, many approaches based on pre-computed bounding-volume hierarchies have been proposed. In the case of deformable objects, however, these hierarchical data structures cannot be pre-computed but have to be updated frequently, which is prohibitively expensive in dynamic environments. Recently, image-space techniques have been proposed for collision detection. These approaches process projections of the objects, require no pre-processing, and employ graphics hardware. Therefore, they are especially appropriate for dynamic environments.

Layered Depth Images (LDI), a new image-space technique for collision detection between arbitrarily shaped, deforming objects, is presented in a recent paper [3]. An LDI is a view of the scene from a single input camera view, but with multiple pixels along each line of sight. A layered depth pixel stores a set of depth pixels along one line of sight, sorted in front-to-back order. The front element of the layered depth pixel samples the first surface seen along that line of sight, the next element samples the next surface seen along that line of sight, and so on.

The LDI approach detects collisions and self-collisions of 3D objects of manifold geometry. Although the approach is not confined to triangular meshes, a watertight object surface is required in order to perform volumetric collision queries. The algorithm computes an approximate volumetric representation of the object, which is used for three different collision queries. The algorithm proceeds in three stages:

2.1.1. Stage 1: Compute the Volume-of-Interest (VoI)

The VoI is an axis-aligned bounding box (AABB) representing the volume where collision queries are performed.
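Stage 1 can be sketched with a few lines of host code. This is a minimal illustration; the Vec3 and Aabb types and the function names are assumptions made for this sketch, not our actual data structures.

```cpp
#include <vector>
#include <algorithm>
#include <cfloat>

struct Vec3 { float x, y, z; };   // hypothetical vertex type
struct Aabb { Vec3 lo, hi; };     // axis-aligned bounding box

// AABB of a vertex list (used directly as the VoI of a self-collision test).
Aabb computeAabb(const std::vector<Vec3>& verts) {
    Aabb b{{FLT_MAX, FLT_MAX, FLT_MAX}, {-FLT_MAX, -FLT_MAX, -FLT_MAX}};
    for (const Vec3& v : verts) {
        b.lo.x = std::min(b.lo.x, v.x); b.hi.x = std::max(b.hi.x, v.x);
        b.lo.y = std::min(b.lo.y, v.y); b.hi.y = std::max(b.hi.y, v.y);
        b.lo.z = std::min(b.lo.z, v.z); b.hi.z = std::max(b.hi.z, v.z);
    }
    return b;
}

// Intersection of two AABBs (the VoI of a pairwise collision test).
// Returns false when the boxes do not overlap, i.e. the VoI is empty.
bool intersectAabb(const Aabb& a, const Aabb& b, Aabb& voi) {
    voi.lo = {std::max(a.lo.x, b.lo.x), std::max(a.lo.y, b.lo.y), std::max(a.lo.z, b.lo.z)};
    voi.hi = {std::min(a.hi.x, b.hi.x), std::min(a.hi.y, b.hi.y), std::min(a.hi.z, b.hi.z)};
    return voi.lo.x <= voi.hi.x && voi.lo.y <= voi.hi.y && voi.lo.z <= voi.hi.z;
}
```

Here computeAabb yields the VoI for a single object, while intersectAabb combines two object AABBs into the VoI of a pairwise test and reports whether it is empty.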
In the case of a self-collision test, the VoI is chosen to be the AABB of the object. When a collision test between a pair of objects is performed, the VoI is the intersection of the two objects' AABBs. If the VoI is not empty, the objects and the VoI are further processed in stage 2.

2.1.2. Stage 2: Compute an LDI for the objects inside the VoI

Note that LDI generation is restricted to the VoI; object primitives outside the VoI are discarded. The LDI consists of images or layers of depth values representing object surfaces. The depth values of the LDI can be interpreted as intersections of parallel rays or 3D scan lines entering or exiting the object. Thus, an LDI classifies the VoI into inside and outside regions with respect to an object. Additional information on face orientation is stored in the LDI, so depth and front-face / back-face classification are known for each entry in the LDI data structure. Stage 2 results in an LDI with sorted depth values and explicitly labeled entry (front-face) and exit (back-face) points within the VoI.

2.1.3. Stage 3: Perform one of the possible collision queries

a) Self-collisions are detected by analyzing the order of entry (front-face) and exit (back-face) points within the LDI. If they alternate correctly, there is no self-collision. If invalid sequences of front-faces and back-faces are detected, the operation provides an explicit representation of the self-intersection volume.

b) Collisions between pairs of objects are detected by combining their LDIs using a Boolean intersection. If the intersection of all inside regions is empty, no collision is detected. If it is not, the operation provides an explicit representation of the intersection volume.

c) Individual vertices are tested against the volume of the object. Each vertex is transformed into the local coordinate system of the LDI. If a transformed vertex lies within an inside region, a collision is detected.

Figure 1 The LDI process. (a) AABB computation, (b) Rasterization of each object, (c) Volume intersection.
2.2. Depth Peeling

Depth peeling is the underlying technique that makes order-independent transparency possible. The standard depth test gives us the nearest fragment at each pixel, but there are also fragments that are second nearest, third nearest, and so on. Standard depth testing gives us the nearest fragment without imposing any ordering restrictions; however, it provides no straightforward way to render the second-nearest or n-th-nearest surface. Depth peeling solves this problem. The essence of the technique is that with n passes over a scene, we can peel n layers deep into the scene. For example, with 2 passes over the scene, we can extract the nearest and second-nearest surfaces. We obtain both the depth and the color (RGBA) information for each layer. Since what we are concerned with is the depth value of each pixel, the depth peeling algorithm can resolve the depth values of the fragments on the object surfaces in a GPU-friendly way.

Figure 2 These images illustrate simple depth peeling. Layer 0 shows the nearest depths, layer 1 the second-nearest depths, and so on.

2.3. CUDA-based LDI generation

The VoI computation in stage 1 and the collision queries in stage 3 do not contribute significantly to the computational cost of the algorithm; the LDI generation in stage 2 is comparatively expensive. CUDA provides more flexible control over GPU memory, so multiple fragments can be captured in a single pass. With a CUDA rasterizer, many graphics applications can benefit from this free control of GPU memory, especially multi-fragment effects. F. Liu et al. present two efficient schemes to capture and sort multiple fragments per pixel in a single geometry pass via the atomic operations of CUDA, without read-modify-write (RMW) hazards [1]. Experimental results show a significant speed-up over classical depth peeling, especially for large scenes.
3. Algorithm Overview

The diagram below gives an overview of the CUDA LDI generation algorithm.

Figure 3 Algorithm Overview

Our CUDA rasterizer packs the triangles into a texture and passes it to a CUDA kernel; each thread projects a single triangle onto the screen and rasterizes it with the scan-line algorithm. At each pixel location covered by the projected triangle, a fragment is generated with interpolated attributes such as depth. The LDI generation algorithm is then applied to the pixel. Because stage 2 of the LDI generation algorithm is processed in parallel with the help of the CUDA framework, the overall performance can be improved greatly. For the LDI generation kernel, the diagram below gives a detailed depiction:

Figure 4 Detailed diagram of the LDI generation kernel

4. Algorithm Detail

4.1. Triangle Mapping

Because a layered depth image is a 2D plane, all points in 3D space have to be projected onto that plane. As the surfaces of the objects in 3D space are composed of thousands of triangles, all triangles have to be projected onto the 2D plane. In most cases, the 2D plane is the screen.

Figure 5 Triangle Mapping

As shown in the illustration above, all vertices of the object are projected onto the plane. In our case, an orthographic projection is applied.

4.2. Triangle Rasterisation

Because vertex coordinates are continuous floating-point numbers while the data unit of the LDI is the pixel, all triangles projected onto the 2D plane have to be rasterized into pixels for further processing.

Figure 6 Triangle Rasterisation

The scan-line algorithm was adopted to implement the triangle rasterisation. All scan-lines are horizontal lines. For every scan-line with a certain y value, we check whether it intersects any of the edges of the triangle. If so, linear interpolation is applied to calculate the x values of the left endpoint x1 and the right endpoint x2.
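A simplified CPU version of this scan-line loop might look as follows; clipping and exact fill-rule handling are omitted, and the function names are illustrative.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

struct Pixel { int x, y; };

// Rasterize one projected 2D triangle with a horizontal scan-line loop:
// for every scan-line y, intersect it with the three edges, take the
// leftmost and rightmost hits as x1 and x2, and emit the pixels between.
std::vector<Pixel> rasterizeTriangle(const float px[3], const float py[3]) {
    std::vector<Pixel> out;
    int yMin = (int)std::ceil(std::min({py[0], py[1], py[2]}));
    int yMax = (int)std::floor(std::max({py[0], py[1], py[2]}));
    for (int y = yMin; y <= yMax; ++y) {
        float x1 = 1e30f, x2 = -1e30f;
        for (int e = 0; e < 3; ++e) {          // the three triangle edges
            int a = e, b = (e + 1) % 3;
            float ya = py[a], yb = py[b];
            if (y < std::min(ya, yb) || y > std::max(ya, yb) || ya == yb)
                continue;                      // scan-line misses this edge
            // Linear interpolation gives the x of the intersection.
            float t = (y - ya) / (yb - ya);
            float x = px[a] + t * (px[b] - px[a]);
            x1 = std::min(x1, x);
            x2 = std::max(x2, x);
        }
        if (x1 > x2) continue;
        // Step from x1 through x2, emitting one fragment per pixel.
        for (int x = (int)std::ceil(x1); x <= (int)std::floor(x2); ++x)
            out.push_back({x, y});
    }
    return out;
}
```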
Then, starting from x1 and stepping through to x2 by increasing the x value one by one, we check whether each pixel is inside the triangle.

4.3. Bilinear Interpolation

For LDI processing, we need to know the depth, i.e. the z value, of an arbitrary pixel inside the triangle. Bilinear interpolation is applied to reach this goal.

Figure 7 Bilinear Interpolation

In mathematics, bilinear interpolation is an extension of linear interpolation for interpolating functions of two variables on a regular grid. The key idea is to perform linear interpolation first in one direction, and then again in the other direction. Suppose that we want to find the z value at the point P = (x, y), and that we know the z values at the four points Q11 = (x1, y1), Q21 = (x2, y1), Q12 = (x1, y2) and Q22 = (x2, y2). We first do linear interpolation in the x-direction. This yields:

z(x, y1) ≈ ((x2 - x) / (x2 - x1)) z(Q11) + ((x - x1) / (x2 - x1)) z(Q21)
z(x, y2) ≈ ((x2 - x) / (x2 - x1)) z(Q12) + ((x - x1) / (x2 - x1)) z(Q22)

We proceed by interpolating in the y-direction:

z(x, y) ≈ ((y2 - y) / (y2 - y1)) z(x, y1) + ((y - y1) / (y2 - y1)) z(x, y2)

This gives us the desired estimate of z(x, y).

5. Memory Hierarchy

5.1. Placement of vertex information

For one object, the triangles of the surface are placed in contiguous memory, as demonstrated below:

Triangle 1 | Triangle 1 | Triangle 1 | Triangle 2 | Triangle 2 | ...
Vertex 1   | Vertex 2   | Vertex 3   | Vertex 1   | Vertex 2   | ...

Each vertex has three coordinates (x, y, z):

Vertex 1 x coord. | Vertex 1 y coord. | Vertex 1 z coord. | Vertex 2 x coord. | ...

Each CUDA thread function processes one triangle. Part of the pseudo-code is listed below:

For each thread:
    VertexIndex1 = *(TriangleIndexMat + 1 + threadId * 3);
    VertexIndex2 = *(TriangleIndexMat + 1 + threadId * 3 + 1);
    VertexIndex3 = *(TriangleIndexMat + 1 + threadId * 3 + 2);
    Vertex1x = *(VertexMat + VertexIndex1 * 3);
    Vertex1y = *(VertexMat + VertexIndex1 * 3 + 1);
    Vertex1z = *(VertexMat + VertexIndex1 * 3 + 2);
    ...
End

The z depth value and face direction are resolved by the LDI kernel function. The fragment information calculated by the CUDA LDI thread function is stored in global memory in a large 2D array (ldiMat).
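One possible concrete realization of such an array is a flat buffer with a fixed stride per pixel, a leading counter element, and a (depth, direction) pair per fragment. MAX_FRAGS and the helper names below are illustrative assumptions, not our actual code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative constant; the real maximum depth complexity is a
// predefined property of the 3D scene.
constexpr int MAX_FRAGS = 8;
// Per-pixel stride: one counter element plus a (depth, direction)
// pair for each possible fragment.
constexpr int PIXEL_STRIDE = 1 + 2 * MAX_FRAGS;

// Index of the counter element of pixel (x, y) in ldiMat.
size_t pixelBase(int x, int y, int width) {
    return (static_cast<size_t>(y) * width + x) * PIXEL_STRIDE;
}

// Gather the (depth, direction) pairs stored for pixel (x, y).
std::vector<std::pair<float, float>> readPixel(
        const std::vector<float>& ldiMat, int x, int y, int width) {
    size_t base = pixelBase(x, y, width);
    int n = static_cast<int>(ldiMat[base]);   // leading fragment counter
    std::vector<std::pair<float, float>> frags;
    for (int k = 0; k < n; ++k)
        frags.push_back({ldiMat[base + 1 + 2 * k], ldiMat[base + 2 + 2 * k]});
    return frags;
}
```

The stage 3 collision queries could then read back the fragments of each pixel with a helper of this kind.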
All the fragments of one pixel are placed in a one-dimensional array whose length is the predefined maximum depth complexity of the 3D space. The first element of the array is a counter recording how many fragments the pixel holds; the following entries hold the z depth value and face direction of each fragment. The detailed layout is demonstrated below:

Figure 8 Memory layout of the generated LDI data

6. Result Evaluation

6.1. Correctness

To evaluate the correctness of our algorithm, we compare the result with the depth-peeling-based LDI generation software from NVIDIA [2].

6.2. Efficiency

We compare the processing time and frame rate with the paper of F. Liu et al. [1] to evaluate the efficiency of our algorithm. The results from that paper are listed below:

Model   | Dragon  | Buddha  | Lucy    | Statue | Bunny
Tri No. | 871K    | 1.0M    | 2.0M    | 10.0M  | 70K
CUDP 1  | 321 fps | 287 fps | 153 fps | 59 fps | 459 fps
CUDP 2  | 363 fps | 309 fps | 165 fps | 61 fps | 573 fps
SRAB    | 116f/2g | 106f/2g | 49f/2g  | 8f/2g  | 334f/2g
DP      | 29f/13g | 26f/13g | 14f/13g | 2f/15g | 128f/13g

7. References

[1] F. Liu, M. Huang, X. Liu, E. Wu. Single Pass Depth Peeling via CUDA Rasterizer. SIGGRAPH 2009: Talks.
[2] Everitt, C. 2001. Interactive Order-Independent Transparency. Tech. rep., NVIDIA Corporation.
[3] B. Heidelberger, M. Teschner, M. Gross. Detection of Collisions and Self-collisions Using Image-space Techniques. Journal of WSCG, Vol. 12, No. 1-3, WSCG 2004, Feb 2-6, 2004, Plzen, Czech Republic.
[4] F. Faure, S. Barbier, J. Allard, F. Falipou. Image-based Collision Detection and Response between Arbitrary Volume Objects. Eurographics / ACM SIGGRAPH Symposium on Computer Animation 2008.