CPSC 425
Sample Midterm Exam
October 6, 2008
Estimated Time: 60 minutes
Name:
Student Number:
Signature:
SPECIAL INSTRUCTIONS:
1. Print your name, student number, and sign above, as indicated.
2. This exam is closed book. Answer all questions in the space provided. Keep your answers short. If you run out of space for a question, you probably have written too much. If you need more space, use the back of a page and clearly indicate the location of your answer.
3. You may use pencil or pen.
4. Use the number of marks allocated to each question to help you determine how much time you should spend on that question.

UNIVERSITY REGULATIONS:
– Each candidate should be prepared to produce, upon request, his/her library card.
– No candidate shall be permitted to enter the examination room after the expiration of one half hour, or to leave during the first half hour of the examination.
– CAUTION: Candidates guilty of any of the following, or similar, dishonest practices shall be immediately dismissed from the examination and shall be liable to disciplinary action.
  1. Having at the place of writing, or making use of, any books, papers or memoranda, electronic equipment, or other memory aid or communication devices, other than those authorized by the examiners.
  2. Speaking or communicating with other candidates.
  3. Purposely exposing written papers to the view of other candidates. The plea of accident or forgetfulness shall not be received.
– Candidates must not destroy or mutilate any examination material; must hand in all examination papers; and must not take any examination material from the examination room without permission of the invigilator.

THIS EXAMINATION CONSISTS OF 7 PAGES. CHECK TO ENSURE THAT THIS PAPER IS COMPLETE.
Part A: Multiple Choice Questions. Note that some questions can have more than one correct answer (where indicated). Circle the letter (or letters) corresponding to your answer.

Question 1: Which of the following statements are true of a pinhole camera? (Select ALL answers that apply).
(A) A pinhole camera is a box with a small hole in it.
(B) Images in a pinhole camera are upside down.
(C) A pinhole camera has a fixed focal length, f.
(D) Images in a pinhole camera are a perspective projection.
Solution: (A) True, (B) True, (C) False, (D) True

Question 2: What must be true for a set of 3D lines so that they all share the same horizon line when projected into an image?
(A) The lines are all parallel in 3D.
(B) The lines all lie on the same 3D plane.
(C) The lines are both parallel and on the same 3D plane.
Solution: (B)

Question 3: What type of image distortion is caused by lens vignetting? (Select ALL of the answers that apply).
(A) The slight curvature of straight lines away from the center of the image.
(B) The shift in colour caused by the varying refraction of light at different wavelengths.
(C) The darkening of an image towards its edges.
(D) The difficulty in bringing all parts of an image into focus at the same time.
Solution: (C)

Question 4: We can detect an object in an image by performing normalized cross-correlation of a template with the image and selecting the best match. In order to detect the object at a range of different sizes, one approach would be to correlate templates of increasing size with the image. However, a more efficient approach is to build a Gaussian pyramid and convolve a fixed-size template with each level of the pyramid. Why is this more efficient? (Select ALL of the answers that apply).
(A) The higher levels of the pyramid have fewer pixels, which reduces the cost of cross-correlation compared to using larger templates on the original image.
(B) The cost of creating new levels of the pyramid is less than the cost of creating larger versions of the template.
(C) It takes fewer operations to correlate templates with fixed sizes than ones with increasing sizes.
(D) The higher levels of the pyramid avoid aliasing.
Solution: (A) and (C)

Question 5: Why are two thresholds used when linking edge points in Canny edge detection?
(A) Different thresholds are needed to select edge points when linking edges forward or backward from the starting location.
(B) The detection of edge points is more accurate when two thresholds are used.
(C) The use of two thresholds prevents gaps that would otherwise appear in the linked edge points.
(D) The X and Y directional derivatives each require a threshold when linking to new edge points.
Solution: (C)

Question 6: Under what image transformations does the Harris corner detector select stable features? Features are considered stable if the same locations on an object are typically selected in the transformed image. (Select ALL answers that apply).
(A) Image scaling.
(B) Image translation.
(C) Image rotation.
(D) Image blur.
Solution: (B) and (C)

Question 7: The Efros and Leung texture synthesis method works best for a particular window size that depends on each texture. What can happen if the window size is too small? (Select ALL answers that apply).
(A) For textures that consist of repeated sub-elements, like “bricks” or “rings,” the shape of the sub-elements may not be captured.
(B) For textures that consist of repeated sub-elements, like “bricks” or “rings,” the correct spacing between and among the sub-elements may not be captured.
(C) There will be too few samples in the texture that are close to the window being matched. Therefore, the synthesized texture will not closely follow the sample texture.
(D) The computation will become much less efficient since many more iterations will be required to synthesize a texture patch of a given size.
Solution: (A) True, (B) True, (C) False, (D) False

Question 8: Texture representation is hard. Which of the following statements are true of texture? (Select ALL answers that apply).
(A) Texture depends on scale, illumination and viewpoint.
(B) To date, texture analysis has proven more tractable than texture synthesis.
(C) The “spots” and (oriented) “bars” approach to texture representation described in Forsyth and Ponce is motivated, in part, by properties of human vision.
(D) The Laplacian pyramid provides no explicit representation of orientation. But, if we process each layer of the Laplacian pyramid further with a set of oriented filters then we can represent energy at distinct scales and orientations as an “oriented pyramid.”
Solution: (A) True, (B) False, (C) True, (D) True

Question 9: Under what conditions does the epipolar constraint used in stereo matching hold between the images from two cameras? You can assume that the cameras perform standard perspective projection.
(A) The two cameras must have coplanar projection planes.
(B) The two cameras must face in the same direction (i.e., have parallel optical axes).
(C) The two images must be rectified.
(D) There are no restrictions on camera locations or orientations; an epipolar constraint always applies.
Solution: (D)

Question 10: Stereo matching can be performed by correlating windows of pixels between the two images. But, it is difficult to know what window size to use. What is the major problem that would arise if the selected window size is too small?
(A) There would be fewer matches.
(B) Places where depth is discontinuous would be poorly matched.
(C) There would be more false matches due to ambiguity and image noise.
(D) The epipolar constraint would not be as effective to limit the number of matches.
Solution: (C)
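
The cost argument behind Question 4 can be illustrated with a rough operation count. The sketch below is not part of the exam: it assumes each pyramid level halves the image width and height, counts one multiplication per template pixel per image pixel, and ignores both boundary handling and the cost of building the pyramid. The function names and the values of n, t and levels are hypothetical.

```python
def fixed_template_on_pyramid(n, t, levels):
    """Multiplications to correlate one t x t template with every pyramid level.

    Assumption: level k holds (n / 2**k) x (n / 2**k) pixels.
    """
    return sum((n / 2**k) ** 2 * t**2 for k in range(levels))


def growing_template_on_image(n, t, levels):
    """Multiplications to correlate templates of side t * 2**k with the full n x n image."""
    return sum(n**2 * (t * 2**k) ** 2 for k in range(levels))


n, t, levels = 512, 16, 4  # illustrative values only
pyramid_cost = fixed_template_on_pyramid(n, t, levels)
template_cost = growing_template_on_image(n, t, levels)
print(f"pyramid approach:  {pyramid_cost:.0f} multiplications")
print(f"growing templates: {template_cost:.0f} multiplications")
print(f"ratio: {template_cost / pyramid_cost:.1f}")
```

Under these assumptions the pyramid cost shrinks by a factor of 4 per level while the growing-template cost grows by a factor of 4 per scale, which is consistent with answers (A) and (C).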
Part B: Short Answer Questions. Answer each question concisely and clearly. Points will be deducted for overly long or unclear answers.

Question 11: Give a 3 × 3 linear filter that shifts an image 1 pixel downwards and also reduces the image brightness by 50%. Be sure to indicate whether you intend your filter to be implemented as a correlation or as a convolution.
Solution:
As correlation
\[
\begin{bmatrix} 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
As convolution
\[
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{bmatrix}
\]

Question 12: In this question, we compare the computational costs of convolution with and without using separable filters. We just count the number of multiplication operations required, as they typically dominate the cost of convolution.
(A) Let S be the number of multiplication operations required to convolve a Gaussian linear filter with an image of size n × n pixels. Assume that we use two separable 1D filters, each of length 6σ as in Assignment 2. Give an expression for S in terms of σ and n. For simplicity, express S as a real number without accounting for integer roundoff in filter length or any special treatment near image boundaries.
(B) Let R be the number of multiplication operations required to convolve the equivalent single 2D Gaussian filter with an image of size n × n pixels, rather than using the two separable 1D filters. Give an expression for R in terms of σ and n. Again, express R as a real number without accounting for integer roundoff in filter length or any special treatment near image boundaries.
(C) For what values of σ is S less than R (i.e., is it always true that the separable approach is faster)?
Solution:
(A) There are $n^2$ pixels and $2 \times 6\sigma$ multiplications at each pixel. Therefore, $S = 2 \times 6\sigma n^2 = 12\sigma n^2$.
(B) There are $n^2$ pixels and $(6\sigma)^2$ multiplications at each pixel. Therefore, $R = (6\sigma)^2 n^2$.
(C) S is less than R when
\begin{align*}
12\sigma n^2 &< (6\sigma)^2 n^2 \\
12\sigma &< (6\sigma)^2 \\
12 &< 36\sigma \\
\sigma &> 1/3
\end{align*}
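
The two cost expressions from Question 12 can be checked numerically. This is a minimal sketch, not part of the exam: it evaluates S = 12σn² and R = (6σ)²n² as real numbers, exactly as the question asks, with no rounding of the filter length. The function names and the sample values of σ and n are chosen only for illustration.

```python
def separable_cost(sigma, n):
    """S: two 1D passes of length 6*sigma over n*n pixels."""
    return 2 * (6 * sigma) * n**2


def full_2d_cost(sigma, n):
    """R: one 2D filter of size (6*sigma) x (6*sigma) over n*n pixels."""
    return (6 * sigma) ** 2 * n**2


n = 256  # illustrative image size
for sigma in (0.25, 0.5, 1.0, 2.0, 4.0):
    s = separable_cost(sigma, n)
    r = full_2d_cost(sigma, n)
    print(f"sigma={sigma:<5} S={s:>12.0f} R={r:>12.0f} R/S={r / s:.2f}")
```

The ratio R/S equals 3σ and does not depend on n, so the separable filter needs fewer multiplications exactly when σ > 1/3, and its advantage grows linearly with σ.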
Aside: This means that S is less than R for all σ in practice since σ ≤ 1/3 results in a filter that is too small to be implemented in the spatial domain.

Question 13: It is common to use normalization of image patches when they are being matched for stereo correspondence. For the Efros and Leung texture synthesis method (as implemented in Assignment 3) would it further improve the results to also do normalization of patches in the matching step? Explain your answer with just one or two sentences.
Solution: No, normalization would allow patches with differing brightness to match, and this would create sharp lines at their boundaries. [However, if some compensating brightness adjustment was performed when adding the patch, then it is possible a way could be found to improve the results.]

Question 14: As we have seen, determining corresponding points in the left image and the right image is the hardest part of stereo vision. A variety of things can go wrong in stereo matching. In a sentence or two for each, give a specific example of a scene where
(A) there are not enough locally distinct features that match
(B) there are too many locally distinct features that match
(C) locally distinct features match incorrectly
Hint: Remember that this is a question about the problem of stereo vision, not a question about the properties of a particular algorithm or technique used to do stereo matching.
Solution:
(A) Any scene containing extended smooth, featureless regions suffices. A stereo pair obtained by viewing a blank grey wall is one good example.
(B) Any scene containing many closely spaced, visually similar features suffices. A random dot stereogram is a canonical example.
(C) Any scene containing visually similar features that do not correspond to the same object point suffices. A surface, like that of a sphere, that curves smoothly away from view is the example cited in the text. In class, we added highlights/specularities on a smooth surface, like a sphere, as another example.
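
The difference between the correlation and convolution forms of the Question 11 filter can be checked on a small image. The sketch below is not part of the exam. It assumes that the row index increases downwards and that pixels outside the image are treated as zero; the helper names `correlate3x3` and `convolve3x3` and the test image are hypothetical.

```python
def correlate3x3(image, kernel):
    """Cross-correlate a 2D list with a 3x3 kernel; out-of-range pixels count as 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for u in (-1, 0, 1):
                for v in (-1, 0, 1):
                    r, c = i + u, j + v
                    if 0 <= r < rows and 0 <= c < cols:
                        total += kernel[u + 1][v + 1] * image[r][c]
            out[i][j] = total
    return out


def convolve3x3(image, kernel):
    """Convolution is correlation with the kernel flipped vertically and horizontally."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate3x3(image, flipped)


# 1/2 in the top-centre cell for correlation, bottom-centre cell for convolution.
correlation_filter = [[0, 0.5, 0], [0, 0, 0], [0, 0, 0]]
convolution_filter = [[0, 0, 0], [0, 0, 0], [0, 0.5, 0]]

image = [[2, 4, 6],
         [8, 10, 12],
         [14, 16, 18]]

shifted_by_correlation = correlate3x3(image, correlation_filter)
shifted_by_convolution = convolve3x3(image, convolution_filter)
for row in shifted_by_correlation:
    print(row)
```

Both calls give the same result: each row of the image moves down by one, the values are halved, and the top row becomes zero under the zero-padding assumption.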