Lecture 1.2 COMP 646 Jan. 10, 2008

Stereo disparity

Let's now reinterpret the model we developed in lecture 1.1. Rather than moving the camera over time, suppose we have two cameras that are in a fixed position relative to each other. Assume the optical axes of the two cameras are parallel, i.e. the cameras have the same Z direction, but that one camera is displaced to the right (which we'll call the positive x direction) of the other by a distance Tx. This distance is sometimes called the baseline of the camera pair. In terms of the previous model, the two cameras correspond to t = 0 and t = 1, and Ty = Tz = 0.

A 3D point with coordinates (X0, Y0, Z0) in the left camera's coordinate system would have coordinates (X0 − Tx, Y0, Z0) in the right camera's coordinate system. As such, this 3D point would project to a different x value, but the same y value, in the left and right images. The difference in x position is called the binocular disparity,

d ≡ xl − xr = f X0/Z0 − f (X0 − Tx)/Z0 = f Tx/Z0.

From this equation, we can make two key observations:

1. A 3D point in the world projects to the same row (y value) in the left and right image planes. This implies that to match corresponding points in the two images, the vision system only needs to search within corresponding rows of the two images.

2. Once the visual system has found corresponding points in the two images, it can use the disparity to calculate the depth of the scene point, assuming Tx and f are known.

The first problem is often called the correspondence problem. This is the problem we discuss today. To develop an intuition, hold up your finger in front of your face and look at something far away behind the finger, so that the two eyes are pointing in roughly the same direction. You should notice that you see two copies of your finger (double vision). If this is not obvious, then alternately close the left and right eyes, back and forth.
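The second observation, recovering depth from disparity, can be sketched directly from the formula d = f Tx/Z0. The focal length, baseline, and pixel coordinates below are illustrative values, not numbers from the lecture.

```python
def depth_from_disparity(x_left, x_right, f, Tx):
    """Recover depth Z from the disparity d = x_left - x_right,
    using d = f * Tx / Z (parallel cameras, baseline Tx)."""
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * Tx / d

# Illustrative numbers: f = 500 (in pixel units), baseline Tx = 0.1 m.
Z = depth_from_disparity(x_left=120.0, x_right=110.0, f=500.0, Tx=0.1)
# Z is about 500 * 0.1 / 10 = 5 metres
```

Note that depth is inversely proportional to disparity: halving the disparity doubles the estimated depth.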
You'll notice that the image of your finger jumps back and forth relative to the background scene.

The above discussion highlights the basic features of stereo vision:

• disparity: The two eyes see the world from two different 3D positions. As a result, the images seen by the two eyes are slightly different.

• correspondence: To avoid "double vision," the brain must match points in the left and right eyes' images.

• depth perception: The position difference of points in the left and right eye images depends on the 3D geometry of the scene. A vision system can use the position differences to infer depth.

We could generalize the geometry by allowing for arbitrary translation and rotation between the cameras. This would follow a derivation similar to the one we saw for motion in the last lecture. However, since we are mainly concerned with correspondence today, we keep the geometry simple and consider only translation in X.

History: random dot stereograms and the correspondence problem

How does the eye/brain match corresponding points in the left and right images? Until the early 1960s, it was commonly believed that this correspondence problem was "easy": the eye/brain found some familiar pattern in the left image and matched it to the same familiar pattern in the right image, or vice-versa. This makes sense intuitively – you can see quite well with just one eye. In the 1960s, engineers and psychologists became interested in exactly how this process worked and started doing experiments with digital stereo image pairs. One important type of image used was the random dot stereogram (RDS), invented by Bela Julesz at Bell Labs. The RDS is a pair of images (a "stereo pair"), each of which is merely a random collection of white and dark dots. As such, it does not contain any familiar features. Although each image on its own is a set of random dots, there is a relation between the random dots in the two images.
The random dots in the left eye's image are related to the random dots in the right eye's image by shifting a patch of the left eye's image relative to the right eye's image.

Julesz carried out many experiments with RDSs. These are described in detail in his classic book from 1971 and in a paper [1]. His results are very important for understanding how stereo vision works. They strongly suggest that the human visual system (HVS) does not rely on matching familiar monocular features to solve the correspondence problem. (Each image of a random dot stereogram is maximally random; there are no familiar patterns in it, except with extremely small probability.)

The construction of the random dot stereograms is illustrated in the figure above. First, one image (say the left) is created by setting each pixel ("picture element") randomly to either black or white. Then a copy of this image is made. Call this copy the right image. The right image is then altered by taking a square patch [2] and shifting that patch horizontally by d pixels to the left, writing over any pixel values there. The pixels vacated by shifting the patch are filled in with random values. This procedure yields four types of regions in the two images:

• the shifted pixels (visible in both left and right images)
• the pixels in the left image that were erased from the right image because of the shift-and-write (left only)
• the pixels in the right image that were vacated by the shift (right only)
• any other pixels in the two images (both left and right)

To view a stereogram such as the one above, your left eye should look at the left image and your right eye should look at the right image. (This is difficult to do without training.) If you do it correctly, you will see a square floating in front of a background.

Disparity space

Let's now relate the above example to a 3D scene geometry that could give rise to it.
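The construction procedure above can be sketched in a few lines of Python. The image size, patch size, shift d, and the centered patch position are illustrative choices, not values from the lecture.

```python
import numpy as np

def make_rds(n=128, patch=40, d=6, seed=0):
    """Construct a random dot stereogram: make a random binary left
    image, copy it, shift a central square patch of the copy d pixels
    to the left (overwriting what was there), and refill the vacated
    pixels with fresh random values."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(n, n))
    right = left.copy()
    r0 = (n - patch) // 2          # top-left corner of the patch
    c0 = (n - patch) // 2
    # shift the patch d pixels to the left, writing over pixel values
    right[r0:r0 + patch, c0 - d:c0 - d + patch] = left[r0:r0 + patch, c0:c0 + patch]
    # the vacated strip on the right edge of the patch gets new random dots
    right[r0:r0 + patch, c0 + patch - d:c0 + patch] = rng.integers(0, 2, size=(patch, d))
    return left, right

left, right = make_rds()
```

Viewed so that the left eye sees `left` and the right eye sees `right`, the shifted patch carries disparity d while the surround carries disparity 0, so the patch should appear to float in front of the background.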
Suppose the optical axes of two cameras are parallel; the right camera is displaced from the left in the X direction only. This allows us to assume the same projection plane Z = f for the two cameras.

[1] B. Julesz, "Binocular depth perception without familiarity cues", Science, 145:356-362 (1964).
[2] It doesn't have to be a square.

[Figure: the left eye image and right eye image of the stereogram, each showing a square against a background, with left-eye-only and right-eye-only regions marked; a third panel shows the resulting perception.]

The 3D scene is a small square in front of a background. This scene yields a disparity d between the small square in the left and right images. Consider a single horizontal line y = y0 in the image projection plane such that this line cuts across the displaced square. We wish to understand the disparity along this line. The figure above on the right represents this line in the two images using a coordinate system (xl, xr). For each 3D scene point that projects to this line, there is a unique xl and xr value. Moreover, each depth value Z corresponds to a unique disparity value, since d = xl − xr = f Tx/Z.

Notice that the set of rays that arrive at the left eye are vertical lines in the figure on the right, and the set of rays that arrive at the right eye are horizontal lines in the figure on the right. Similarly, each horizontal line in the figure on the left represents a line of constant depth (constant disparity), and each diagonal line in the figure on the right represents a line of constant disparity (constant depth). Make sure you understand where each point on the foreground square and each point on the background appears in the left and right figures. (I talked through it in class.)

Note that xl > xr for all points with Z > 0; thus, disparity d is positive. Because of the geometry of the projection, certain points on the background surface are visible to one eye only; others are visible to both eyes; still others are visible to neither eye.
Points that are visible to one eye only are called monocular points.

[Figure: top view of the scene, showing the foreground square, the background, the projection plane, and the left and right eyes, with the regions visible to both eyes, to the left eye only, to the right eye only, and to neither eye marked.]

Formulating the stereo correspondence problem

Now that we better understand the geometry of stereo, let's try to formulate the stereo correspondence problem as a computational problem. How could a vision system solve the correspondence problem, say, for a random dot stereogram? The correspondence problem was first addressed in a formal way by David Marr and Tomaso Poggio [3] in the mid 1970s. They proposed a specific algorithm that matches white pixels in the left image to white pixels in the right image, and similarly black to black.

Marr and Poggio try to derive an algorithm from first principles. They introduce two "constraints" on the 3D world that make the correspondence problem easier to solve:

• C1 (uniqueness): A given pixel in each image is typically the projection of one 3D point in the world, namely a point on an opaque surface.

• C2 (continuity): Surfaces in the world are typically continuous.

The two constraints often hold, but not always. C1 fails when we have a transparent surface such as a window, since we then have objects visible at different depths at the same image position. C2 fails when the scene contains many small surfaces at different depths, for example, a bush or tree. In this case, there will be lots of discontinuities in the depth function, and hence lots of discontinuities in disparity. Anyhow, the two constraints suggest two rules that can be used to solve the correspondence problem:

• R1 (uniqueness): For any position in either the left or right eye's image, there exists exactly one disparity value.

• R2 (continuity): Disparity varies continuously across the left and right image.

[3] D. Marr and T. Poggio, "Cooperative Computation of Stereo Disparity",
Science, 194:282-287 (1976).

Note that the second constraint does not hold everywhere in the above example, since there is a depth discontinuity.

Using these constraints/rules, Marr and Poggio propose an iterative algorithm to solve the correspondence problem. Assuming N pixels in each image row, they discretize the space (xl, xr) using an N × N square grid. Each row/column in the grid corresponds to a single position in the left/right image, respectively. The goal is then to compute a binary labeling of the nodes in this N × N grid: a 1-label means that the node corresponds to a visible surface at that (xl, xr) value, and a 0-label otherwise. The rules R1 and R2 constrain the labeling solution:

• The uniqueness rule R1 implies that at most one node can have the label 1 in any row and in any column.

• The continuity rule R2 implies that the disparity d should vary slowly as a function of xl and as a function of xr. Any fixed value of disparity d defines a line xl − xr = d in (xl, xr) space; this is a diagonal line with slope 1. Thus, the continuity constraint implies that nodes with label 1 should cluster along such diagonal lines.

Marr and Poggio proposed an iterative algorithm for computing such a labeling, given binary rows Il and Ir from a random dot stereogram. Their paper is significant as it is one of the first papers to propose a computational model of a problem solved in visual perception.

How binocular stereo depends on depth

It is often claimed that stereo vision is good for near objects but poor for distant objects. What does this mean? For the visual system to use stereo vision to perceive the relative depth of two points, it needs to detect a change in disparity ∆d between these two points. Similarly, for the visual system to distinguish a surface that has constant depth from one that does not, it needs to detect a change in disparity ∆d along the surface.
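The cooperative idea, excitation along constant-disparity diagonals (R2) and inhibition along rows and columns (R1), can be sketched for a single image row. This is a simplified rendering of the idea, not the exact update rule or parameter values of the 1976 paper; the window size, inhibition weight, and threshold below are illustrative.

```python
import numpy as np

def marr_poggio_row(Il, Ir, n_iter=10, window=2, eps=2.0, theta=1.0):
    """Simplified cooperative stereo update on the (xl, xr) grid for
    one image row. A node starts active when the left and right pixels
    match. Each iteration, a node gets excitation from active nodes
    along its constant-disparity diagonal (continuity, R2) and
    inhibition from active nodes in its row and column (uniqueness, R1)."""
    N = len(Il)
    C = (np.asarray(Il)[:, None] == np.asarray(Ir)[None, :]).astype(float)
    C0 = C.copy()  # the initial matches keep feeding the update
    for _ in range(n_iter):
        excite = np.zeros_like(C)
        for k in range(-window, window + 1):
            if k == 0:
                continue
            # neighbour (l - k, r - k) on the same diagonal xl - xr = const
            shifted = np.zeros_like(C)
            if k > 0:
                shifted[k:, k:] = C[:-k, :-k]
            else:
                shifted[:k, :k] = C[-k:, -k:]
            excite += shifted
        # inhibition: active nodes sharing a row or column (excluding self)
        inhibit = C.sum(axis=1, keepdims=True) + C.sum(axis=0, keepdims=True) - 2 * C
        C = ((excite - eps * inhibit + C0) >= theta).astype(float)
    return C

labels = marr_poggio_row([1, 0, 1, 1, 0, 1, 0, 1], [0, 1, 1, 0, 1, 0, 1, 1])
```

After a few iterations, surviving 1-labels tend to cluster along diagonals, which is exactly what R2 demands, while R1 suppresses multiple matches in the same row or column.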
Consider two points, one at depth Z and the other at depth Z + ∆Z. From d = f Tx/Z, we approximate [4]:

∆d = Tx f ∆(1/Z) ≈ Tx f (−1/Z²) ∆Z

or

∆d ≈ −(Tx f / Z)(∆Z / Z).

How do we interpret this? If we consider two nearby points in the scene, we have a difference in depth and a difference in disparity. The difference in disparity is proportional to two terms: the ratio Tx f / Z, and the relative change in depth ∆Z / Z. The first term, Tx f / Z, is the disparity itself. The second term, ∆Z / Z, is the relative depth of the two points.

For example, suppose we are looking at a ball and comparing the nearest point on the ball with a point near the edge of the ball (as seen by the left eye; see figure below). The term ∆Z / Z is equal to the (half-)angle subtended by the ball on the image plane. (Actually, it is the tangent of this angle, but for small angles, tan θ ≈ θ.) Notice that this relative depth is the same for the ball as it is for any sphere that subtends the same angle, for example, the moon. Thus, two spheres that subtend the same angle but are at different depths have a different disparity range ∆d. If one of the spheres is very far away (the moon), the absolute depth Z is large and so the disparity range is near zero. Such a sphere will look flat to a human observer, since Tx f / Z ≈ 0. By contrast, a nearby sphere that subtends the same visual angle as the moon will not look flat, since the change in disparity along the surface is large enough to be detected by the visual system.

[4] If g(Z) = 1/Z, its derivative is g′(Z) = −1/Z², and so ∆g(Z) ≈ g′(Z) ∆Z.

[Figure: a ball of depth extent ∆Z at distance Z, viewed by two eyes separated by baseline Tx.]

Vergence and disparity

I have set up the mathematics of stereo such that the two eyes point in the same direction, namely perpendicular to the vector that separates the two eyes. This is not the most general situation.
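The ball-versus-moon argument can be checked numerically with the linearization above. Tx = 0.065 m is a typical human interocular distance; the other numbers are illustrative.

```python
def disparity_change(Z, dZ, f=1.0, Tx=0.065):
    """Exact and approximate change in disparity between depths Z and
    Z + dZ, using d = f*Tx/Z and the linearization
    dd ≈ -(f*Tx/Z)*(dZ/Z)."""
    exact = f * Tx / (Z + dZ) - f * Tx / Z
    approx = -(f * Tx / Z) * (dZ / Z)
    return exact, approx

# Same relative depth dZ/Z = 0.1 at two very different distances:
near = disparity_change(Z=0.5, dZ=0.05)       # a ball at arm's length
far = disparity_change(Z=384e6, dZ=384e5)     # a moon-sized, moon-distance sphere
```

Both objects subtend the same relative depth, but the disparity change for the nearby ball is enormous compared to that of the distant sphere, which is why the moon looks flat while a nearby ball of the same visual angle does not.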
In particular, in human vision the eyes rotate, and the optical axes (the depth directions) of the two eyes can converge on an arbitrary point in the scene. The geometry of stereo gets a bit more complicated when the eyes converge. I will delay presenting the details of this geometry until much later. For now, let's keep the discussion qualitative.

Consider the 2D plane defined by three points: the two eyes, and the point in space where the optical axes of the eyes converge. This 3D point is called the vergence point; it is the point the eyes are looking at. The depth direction Z is not the same in the left and right eyes when the eyes converge on a point a finite distance away. Thus, if we keep the definition of disparity as before, d = xl − xr, then the disparity value will be different from the case when the optical axes of the eyes were parallel. Roughly speaking, as the eyes converge, the xl value of any scene point will decrease and the xr value will increase, and so the disparity will decrease.

Here are a few common facts/definitions you should know:

• The disparity of the vergence point is 0. The vergence point is not the only point with zero disparity; define the horopter to be the locus of points with zero disparity.

• The horopter is roughly identical to the Vieth-Müller circle, which is the unique circle containing the two eyes and the vergence point.

• A point P has crossed disparity if dP > 0. This happens if P is closer than the horopter. Crossed disparity means that you would have to "cross your eyes" in order to fixate on the object.

• P has uncrossed disparity if dP < 0. This happens if P is farther than the horopter.

When the eyes are parallel (verged at infinity), disparity is always crossed or zero. But when the eyes converge, disparity can be crossed or uncrossed.
[Figure: the left and right eyes converging on a vergence point; points nearer than the horopter have disparity > 0 (crossed), points farther have disparity < 0 (uncrossed).]

Stereoblindness

As an aside, you may be interested to know that many people are "stereoblind" or partly "stereoblind". This doesn't mean that they can see with one eye only. Rather, it means they cannot get any depth information from stereo vision. Surprisingly, some people are stereoblind for crossed disparities but have stereo vision for uncrossed disparities, and vice-versa.

Stereoblindness is analogous to color blindness. As I will briefly discuss later in the course, color blindness arises when a person is missing certain photoabsorbing pigments in their eyes. There are different types of color blindness, depending on which pigments are missing. Ten percent of human males are missing one of the three pigments, and hence are partly color blind. Color blindness doesn't mean that you cannot see color; rather, it means that you see only a reduced range of colors.