Combining Computer Vision and Physics Simulations Using GPGPU
Justin Hensley∗ John Isidoro† Arcot J. Preetham‡
Graphics Product Group, AMD Inc
We present a system that uses the immense processing capabilities
of graphics processors (GPUs) to enable a computer vision algorithm,
such as stereo depth extraction, to drive a physics simulation in an
interactive environment. This combination of processing has the
potential to dramatically alter the way that people interact with
computers through novel user interfaces and in interactive gaming.
2 Depth Extraction
[Figure 1, panels (a) and (b). (a) Screenshot of model constructed
from a single stereo image pair. (b) Screenshot of model being
reconstructed from a simulated vortex.]

Before depth extraction is performed, a stereo image pair is
converted from its raw Bayer pattern data, a format commonly used
by digital cameras to pack color images into a single-channel image,
to RGB data. After the input images are converted to RGB, they are
filtered to remove sensor noise, and then background subtraction
is performed. Next the images are rectified using a lookup-table
based warping function. Finally depth extraction is performed; our
system currently uses a variation of Wang et al.'s technique
[Wang et al. 2006].
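The lookup-table based rectification step can be illustrated with a small CPU-side sketch. It is not the GPU implementation used in the system: it only shows the idea that each output pixel reads a precomputed source coordinate and samples the input image there. The names `bilinear_sample`, `rectify`, and `shift_lut` are hypothetical, and `shift_lut` is a stand-in for a table that would in practice come from camera calibration.

```python
from typing import List, Tuple

Image = List[List[float]]               # grayscale image, rows of floats
LUT = List[List[Tuple[float, float]]]   # per output pixel: (src_x, src_y)


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a fractional position, clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(img: Image, lut: LUT) -> Image:
    """Warp img: every output pixel samples the source position in lut."""
    return [[bilinear_sample(img, sx, sy) for (sx, sy) in row] for row in lut]


def shift_lut(width: int, height: int, dx: float) -> LUT:
    """Hypothetical table: a pure horizontal shift (assumption, not a
    calibrated rectification map)."""
    return [[(x + dx, float(y)) for x in range(width)] for y in range(height)]


image = [[0.0, 10.0, 20.0, 30.0],
         [40.0, 50.0, 60.0, 70.0]]
rectified = rectify(image, shift_lut(4, 2, 0.5))
```

Because the table is computed once and only read per frame, the per-pixel work reduces to one table fetch and one filtered image fetch, which is the kind of access pattern that maps naturally onto GPU texture hardware.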
The depth extraction takes place in three separate passes. In the
first pass, a disparity cost volume is constructed. In the second
pass, the cost volume is updated using a spatially varying filter.
This pass differs from Wang et al.'s method: whereas they generate
two auxiliary weight volumes before updating the cost volume, we
directly compute the aggregate cost volume from the input images and
the initial cost volume. This modification of their algorithm allows
us to generate a depth map with 64 distinct disparity levels in the
same amount of time that Wang et al. generated a depth map with only
16 disparity levels. In the final pass, the minimum-cost disparity
is found for each pixel (often referred to as "winner-takes-all").
It is important to stress that once the images are captured from the
input sources, the entire image-processing and depth extraction
pipeline runs on the GPU, with no need to copy the data back to the
CPU for further processing.

(c) Screenshot of model crumbling under the effects of gravity.
(d) Screenshot of model being impacted by a projectile.
Figure 1: Screenshots from our system showing the combination of
depth extraction and rigid-body physics. Aside from image capture,
the entire depth extraction process is performed on the GPU. Once
a depth map is created, the model, which consists of toy building
bricks, is generated on the GPU, and then rigid-body physics are
simulated using a combination of the CPU and GPU. Along with
rendering, the entire process runs at video frame rates.

3 Model Generation and Physics Simulation
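As orientation for this section, the central idea (depth values become per-brick target positions, and an attraction force pulls each brick toward its target) can be sketched in a few lines. This is a CPU-side illustration only and says nothing about how the Havok FX behavior shader is written. The damped-spring force model, the parameters `stiffness` and `damping`, the unit brick mass, and the one-target-per-pixel grid layout are all assumptions made for the example, and collisions are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Brick:
    position: Vec3
    velocity: Vec3


def targets_from_depth(depth: List[List[float]], spacing: float = 1.0) -> List[Vec3]:
    """One target per depth-map pixel: x, y from the grid, z from the depth."""
    return [(x * spacing, y * spacing, d)
            for y, row in enumerate(depth)
            for x, d in enumerate(row)]


def attraction_force(brick: Brick, target: Vec3,
                     stiffness: float = 4.0, damping: float = 4.0) -> Vec3:
    """Damped spring pulling the brick toward its target (assumed model)."""
    return tuple(stiffness * (t - p) - damping * v
                 for p, v, t in zip(brick.position, brick.velocity, target))


def step(bricks: List[Brick], targets: List[Vec3], dt: float = 1.0 / 60.0) -> None:
    """Advance all bricks by one semi-implicit Euler step (unit mass)."""
    for brick, target in zip(bricks, targets):
        force = attraction_force(brick, target)
        brick.velocity = tuple(v + f * dt for v, f in zip(brick.velocity, force))
        brick.position = tuple(p + v * dt
                               for p, v in zip(brick.position, brick.velocity))


# Bricks start scattered at the origin and are pulled into place.
depth_map = [[1.0, 2.0],
             [3.0, 4.0]]
targets = targets_from_depth(depth_map)
bricks = [Brick((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)) for _ in targets]
for _ in range(600):  # ten simulated seconds at 60 steps per second
    step(bricks, targets)
```

Recomputing `targets` from a fresh depth map before each call to `step` would correspond to the stereo input being updated while the bricks are still moving.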
One powerful feature of the GPU depth extraction system is that the
resulting depth map is GPU resident and immediately available as
input for subsequent processing on the GPU. One possible use is
interactive model preview and construction. In our application, the
depth values are used to position 20,000 individual toy building
bricks every frame in order to visualize the depth map in 3D in real
time. The blocks are simulated using a rigid-body GPU physics system,
which allows each block to collide with any of the other blocks. The
physics system, part of the Havok FX framework, takes advantage of
the GPU by using it to compute narrow-phase collision detection,
collision resolution, integration, and the behavior shader. The
behavior shader is able to use the depth map as input to compute
forces on the blocks so that they can be pulled into place from any
configuration.

Figure 1(b) shows how the system can transition from a physical
effect (a vortex of toy bricks generated by wind forces) to a model
generated from the depth map by exerting attraction forces on the
blocks toward depth-map-based target positions. During this
transition the stereo is also being updated, so the blocks that are
currently in their depth-map-based target positions are able to
collide with the other bricks moving into position. Our current
system runs in two possible configurations: a single GPU for
rendering, physics, and stereo, or multiple GPUs with the first GPU
for rendering and the second GPU for physics and stereo.

4 Conclusion

This paper has briefly presented a system that leverages the GPU to
meld computer vision and physics simulation in a single application,
which allows the user's actual movements to affect a physical
simulation running in a virtual environment. Depth from stereo itself
is a powerful technique that can be used as an intuitive 3D input
device. One can imagine future games using this technology for many
applications, such as generating a virtual 3D avatar in real time or
interacting with objects in the game world.

References

WANG, L., LIAO, M., GONG, M., YANG, R., AND NISTÉR, D. 2006.
High-quality real-time stereo using adaptive cost aggregation and
dynamic programming. In 3DPVT.
∗ e-mail: email@example.com