
Chapter 3: Segmentation

Image Acquisition

Recording of the image information by projection of intensities onto the recording medium: photographic film, CCD sensor, etc.

In the continuous case (photographic film) the intensity is given by a continuous function f(x,y). Here x and y denote the coordinates on the film (origin at the lower left corner); f(x,y) is the brightness or light intensity value at the point with coordinates x, y (e.g. 0 ... 1).

In the discrete case (the image consists of 'pixels', or 'picture elements') the image is a matrix with M rows (lines) and N columns. Let i and k be the row and column indices, so i = 1 ... M, k = 1 ... N. In digital image processing the intensity function can only assume discrete values (e.g. 0, 1, ..., 255). Thus denote the intensity function by f(i,k).

Image preprocessing

Problem: Images delivered by the camera often have insufficient quality: noise, distortions, bad contrast, illumination problems, motion blurring.

Image preprocessing: low-level processing to make the image usable for further processing steps (e.g. object recognition); 'image filtering'.

Image filtering

a) Averaging

Local calculation of the average intensity for a neighborhood of adjacent pixels (e.g. a cross-shaped neighborhood of five pixels E1 ... E5 around the center pixel E3):

   E3' = (E1 + E2 + E3 + E4 + E5) / 5

or spreading the averaging process onto regions of any size (N pixels):

   g(i,k) = (1/N) * sum of f(i,k) over the region

Effect of averaging: noise reduction; the image is "flattened" by removal of small disturbances or data gaps. Averaging is a linear filter operator.

Disadvantage: Edges in the image (important structural features) are also flattened by averaging.

b) Median filtering

Creation of the local median value: sort a 3x3 pixel matrix around E5 in ascending order into a list.
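The local averaging of subsection (a) can be sketched as a short NumPy example. This is a minimal sketch: the function name, the square (rather than cross-shaped) neighborhood, and the edge-replication border handling are my own choices, not taken from the text.

```python
import numpy as np

def average_filter(img, size=3):
    """Replace each pixel by the mean of its size x size neighborhood.
    Border pixels are handled by edge replication."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for k in range(img.shape[1]):
            out[i, k] = padded[i:i + size, k:k + size].mean()
    return out

noisy = np.array([[100, 100, 100],
                  [100,   0, 100],   # single data gap (value 0)
                  [100, 100, 100]], dtype=float)
print(average_filter(noisy)[1, 1])   # -> 88.88..., the gap pulls the mean down
```

Note how a single data gap already distorts the averaged value, which is exactly the weakness the median filter below addresses.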
The gray level of E5 is then replaced by the median value of the list. [Figure: 3x3 neighborhood E1 ... E9 around the center pixel E5]

Example: list (1, 44, 77, 140, 190). Median is 77; average is (1+44+77+140+190)/5 = 90.4. The median is a non-linear operator.

Advantage of the median: less sensitive with respect to data gaps and random disturbances.

Example: pixel intensity values (99, 100, 0, 101, 100), where the value 0 is a data gap.
   Median: 100 (correct)
   Average: 400/5 = 80

Example (original image of a hammer): random noise is added to the original image, then removed by averaging with a 3x3 and a 9x9 matrix, and by median filtering. [Figure: original, noisy, 3x3 mean, 9x9 mean, and median-filtered images]
-> The median suppresses noise with little smoothing.

c) Fourier transform

Universal method for image preprocessing; can be used globally and locally:
- contrast enhancement, noise filtering, edge detection
- filter characteristics can be adjusted in many ways

Principle: decompose a (periodic) function f into partial frequencies.

E.g. a horizontal line in the image (a row of pixels) can be regarded as a function f of one parameter:
   x: pixel number along the image line
   f(x): gray level of the x-th pixel

f is decomposed into individual frequencies. Theorem: any (periodic) function can be decomposed in this way. f is only defined on a small sector (the image range); f can thus be regarded as a section of a periodic function.

Result: f is represented as a weighted sum of component functions sin(nx) and cos(nx); for short, sn and cn denote sin(nx) and cos(nx). For large n, the function sin(nx) is a function with a high frequency.

Application: Consider a (horizontal) line in the image.

Noise reduction: calculate the Fourier decomposition and delete all partial functions with a high frequency, i.e.
all partial functions sin(nx), cos(nx) for large n.

Edge enhancement: delete all partial functions with a low frequency. Edges are sudden changes of intensity within one line; abrupt changes correspond to steep ascents and can only be represented with high frequencies -> deleting the low frequencies enhances the sharp ascents.

Reverse transformation: after deleting certain components of the form sin(nx), cos(nx), reconstruct f from the remaining components only.

A similar procedure is possible with a 2D image matrix instead of a single line (2D Fourier transform). Advantage: the 2D Fourier transform is independent of direction.

Example: [Figures: original image and the same image as a function; its Fourier transform as power spectrum and as a function; low-pass and high-pass filtering; the image after low pass, obtained by reverse transform, with a vertical line marked; the image along this vertical line drawn as a function for the low pass, the high pass, and the original]

Edge detection

An edge appears in an image as a sudden change of gray levels (intensities).

Goal of edge detection: find line segments or curves where such sudden changes occur in the image.

Fundamental method for most edge-finding procedures: mathematical differentiation, i.e.
regard an image line as a function f(x), where x is the index (pixel number), and calculate the derivative df/dx or f'(x) from f(x).

As long as the gray level remains constant, the derivative is zero. At edges, the derivative is distinct from zero.

For two-dimensional images (the whole image instead of a single line), the derivative of the intensity function f(x,y) depends on the direction.

Example: derivative along a line of the intensity matrix. [Figure: intensity profile f(x, y0) along a line y = y0, its first derivative f'(x, y0), and its second derivative f''(x, y0) in x-direction]

The absolute value of the first derivative corresponds to the magnitude of the gray level change.

Edge detection with a threshold value: whenever the value of the first derivative exceeds the threshold, report an edge at this pixel. Problem: How can the threshold value be determined?

The second derivative is more sensitive to changes than the first derivative (steeper ascent). It is also highly sensitive towards noise and is therefore typically applied only after noise filtering. Important advantage: at each edge f''(x,y) has a zero-crossing -> no threshold value is necessary for the second derivative.

Derivatives deliver different results for different directions. Desirable: direction-less edge detection, i.e. edge detection without having to specify any direction in advance.

Gradient method: a procedure based on the first derivative of the image matrix.

Definition of the gradient for a function of two variables f(x,y):

   G(x,y) = (df(x,y)/dx, df(x,y)/dy) =: (fx, fy)

The direction of this gradient vector is the direction of the greatest ascent/descent, with starting point (x,y). The absolute value of the gradient

   |G(x,y)| = sqrt(fx^2 + fy^2)

is a measure for the amount of change, independent of direction.
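The one-dimensional reasoning above (first derivative exceeds a threshold at an edge; second derivative has a zero-crossing there) can be sketched with simple differences. A minimal sketch; the line values and the threshold of 50 are illustrative assumptions.

```python
import numpy as np

# Gray levels along one image line: constant, then a step edge at index 4.
line = np.array([10, 10, 10, 10, 200, 200, 200], dtype=float)

# First derivative via simple differences f'(x) ~ f(x) - f(x-1):
d1 = np.diff(line)          # -> [0, 0, 0, 190, 0, 0]
# Second derivative:
d2 = np.diff(d1)            # -> [0, 0, 190, -190, 0, ...]

# Edge by thresholding |f'| (threshold 50 chosen by hand):
edges = np.where(np.abs(d1) > 50)[0] + 1
print(edges)                # -> [4]

# Zero-crossing of f'' (index is offset by the two difference operations):
zc = np.where(np.sign(d2[:-1]) * np.sign(d2[1:]) < 0)[0]
print(zc)                   # -> [2], i.e. a sign change right at the edge
```

The zero-crossing is found without any threshold, illustrating the advantage of the second derivative noted above.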
The absolute value of the gradient is such a direction-less edge detector. If a preferred direction is desired: instead of taking the absolute value, multiply the gradient with a vector perpendicular to the preferred edge direction.

Calculation of the gradient

Approximation of the partial derivatives by simple differences:

   fx = (f(x,y) - f(x - dx, y)) / dx
   fy = (f(x,y) - f(x, y - dy)) / dy

Here dx = dy = 1 can be assumed. For the case of discrete image matrices:

   fi = f(i,k) - f(i-1, k)
   fk = f(i,k) - f(i, k-1)

Implementation of both formulas as operators ('operator masks'):

   Gx = [-1  +1]      Gy = [-1]
                           [+1]

Application of the operator: move the mask over the entire image matrix and multiply each intensity value with the weighting factors (here +1 and -1).

Improvement: use more than two pixels and give extra weight to some pixels (reduces noise sensitivity). Example of such operators:

   Gx:  -1  0  +1       Gy:  +1  +2  +1
        -2  0  +2             0   0   0
        -1  0  +1            -1  -2  -1

The combination of these two operators is known as the 'Sobel operator':

   GS(x,y) = sqrt( Gx(x,y)^2 + Gy(x,y)^2 )

There are many other possibilities for defining operators based on the first derivative or gradients. Example ("Roberts Cross"):

   G(x,y) = max( |fx|, |fy| )

Common feature of nearly all gradient-based operators: the maximum at the position of the edge has a width of several pixels. Desirable: a width of only one pixel -> line-thinning procedures are necessary -> erosion, dilation (based on the Minkowski sum).

Examples (Sobel operator): [Figures: original image; Sobel magnitude GS; vertical edges (Gx); horizontal edges (Gy)]

The operators can be tested online at www9.in.tum.de:8000.

Segmentation

Partitioning of the image into regions (segments or 'semantic units') according to appropriate homogeneity criteria.

Example: an image with one large object.
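The Sobel operator described above can be sketched directly from its two masks. A minimal sketch; the helper names and the unpadded output (which shrinks by two pixels) are my own choices.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=float)

def apply_mask(img, mask):
    """Slide the 3x3 mask over the image and sum the weighted intensities
    (no padding, so the output shrinks by 2 in each dimension)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for k in range(w - 2):
            out[i, k] = np.sum(img[i:i + 3, k:k + 3] * mask)
    return out

def sobel(img):
    gx = apply_mask(img, SOBEL_X)
    gy = apply_mask(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)   # GS = sqrt(Gx^2 + Gy^2)

# Vertical step edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 100.0
print(sobel(img))   # each row -> [0, 400, 400, 0]
```

Note that the response is non-zero at two adjacent columns: the maximum has a width of several pixels, which is exactly why line-thinning procedures are needed afterwards.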
Instead of finding the object's edges, find the region representing the object in the image. Result: a more compact description of the scene, a higher abstraction level.

Two procedures can be distinguished:

a) Homogeneity-oriented segmentation: construction of regions of similar image elements until a discontinuity is encountered (region-oriented).

b) Discontinuity-oriented segmentation: search for edges first, then connect the edges to region boundaries with appropriate criteria (boundary-oriented).

Construction of connected regions (search for homogeneities)

Simplest method: threshold value. Starting from a starting point (x,y), all surrounding image points are checked for their gray level. A new image point at position (x',y') is included in the region if:

   |f(x,y) - f(x',y')| <= T

Very simple procedure. Problems: Where is the best starting point for a region? How to choose the threshold value for a region?

Frequently used method: region-oriented segmentation through partitioning and reassembling ("split and merge"). Starting from the entire image, the current region is partitioned into four square sub-regions. Each square sub-region is partitioned further (into four smaller quadrants) if the homogeneity threshold inside the region is exceeded (splitting); neighboring sub-regions that together satisfy the homogeneity criterion are reassembled (merging). The algorithm stops if no further splitting or merging is possible.

[Figure: a sequence of split and merge steps on an example image, with quadrants numbered 1-4 and sub-quadrants 21, 22, ..., 234, 321, ... down to the finest level]

Representation of the solution as a tree:
- expansion of a node, if the threshold value is exceeded
- non-expanded nodes are merged to a node

"Split and merge" avoids the problem of searching for a good starting point. However: a threshold value is still needed.
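The splitting half of "split and merge" can be sketched as a short recursion. A minimal sketch under stated assumptions: the homogeneity criterion (gray-level range max - min <= T within a block) and the function names are my own choices, and the merge step is omitted.

```python
import numpy as np

def split(img, x, y, size, T, regions):
    """Recursively split a square block until it is homogeneous.
    Homogeneity criterion (an assumption here): max - min gray level <= T."""
    block = img[y:y + size, x:x + size]
    if size == 1 or block.max() - block.min() <= T:
        regions.append((x, y, size))          # homogeneous leaf region
        return
    h = size // 2                             # split into four quadrants
    for dx, dy in [(0, 0), (h, 0), (0, h), (h, h)]:
        split(img, x + dx, y + dy, h, T, regions)

img = np.zeros((4, 4))
img[2:, 2:] = 200                             # bright object in one quadrant
regions = []
split(img, 0, 0, 4, T=10, regions=regions)
print(regions)   # -> four 2x2 quadrants; no quadrant needs further splitting
```

A full implementation would then merge adjacent leaf regions that jointly satisfy the criterion, yielding the quadtree described above.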
For scenes with known lighting and approximately known image content: determination of the threshold value "by educated guessing" of the user.

In appropriate cases: use the gray level histogram to find the threshold value. [Figure: histogram N(r) over gray level r, with several maxima and minima]

   x-axis: gray levels (e.g. 0..255)
   y-axis: number of pixels with gray level 0, number of pixels with gray level 1, ...

A large region contains many points of the same gray level. Thus, if there are several regions with distinct gray levels, the histogram of the image has several distinct maxima and minima. The threshold value limits are then placed at the minima. This reduces the danger of splitting regions that actually belong together.

Threshold determination is particularly simple in the case of a histogram with two distinct maxima (a so-called "bimodal histogram"): the output image has only two gray levels and can be processed as a binary image. Image elements of one gray level value are assigned to objects; the other gray level value represents the background.

Advantages of pure binary image processing:
- very simple hardware
- fast ("real-time segmentation")
- straightforward analysis

These advantages are partially offset by the necessity of special lighting.

Edge-oriented segmentation

Construction of contours by following sequences of object edges in the image:
- requires edge detection procedures (derivatives, Sobel operator, etc.)
- connect adjacent edges to regions

Problem: How to connect edges, particularly in the case of alternative solutions?

One possibility: contour detection is reduced to the search for a path through a graph. The graph describes all transitions between an image point and its "neighbors". As soon as the path has been found, the region is described by its contour.

Advantage of this graph-based method: it is possible to use prior knowledge about the expected contours; by changing the heuristic used for the search, one can adapt to the respective problem. But the determination of this heuristic can be difficult.
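The histogram-based threshold selection for a bimodal image, described above, can be sketched as follows. A minimal sketch: the peak-suppression window, the synthetic image, and the function name are my own assumptions, and robust methods (e.g. Otsu's) exist.

```python
import numpy as np

def bimodal_threshold(img, bins=256):
    """Place the threshold at the histogram minimum (valley) between the
    two modes. Assumes the histogram really is bimodal."""
    hist, _ = np.histogram(img, bins=bins, range=(0, bins))
    p1 = int(np.argmax(hist))                 # highest peak
    rest = hist.copy()
    rest[max(0, p1 - 10):p1 + 10] = 0         # suppress a window around it
    p2 = int(np.argmax(rest))                 # second peak
    lo, hi = sorted((p1, p2))
    return lo + int(np.argmin(hist[lo:hi + 1]))   # valley between the peaks

# Synthetic bimodal image: dark background (~50), bright object (~200).
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([rng.normal(50, 5, 500),
                              rng.normal(200, 5, 500)]), 0, 255)
T = bimodal_threshold(img)
binary = img > T          # binary image: object pixels vs background pixels
print(T)                  # a threshold in the valley between the two modes
```

The resulting binary image can then be processed with the simple, fast binary machinery described above.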
Procedure: In discrete images an edge consists of 'edge elements'. An edge element is the segment of an edge between two adjacent pixels.

Example: pixels A and B have one edge element in common, denoted by (A, B). [Figure: pixels A, B, C / D, E, and the graph whose nodes are the edge elements (A,B), (A,D), (B,C), (B,E), (D,E), with costs k(.,.) on the transitions]

Thus:
- Nodes in the graph are edge elements, e.g. the edge element between pixels A and B, denoted by (A, B).
- Successors of a node: the edge element (A, B) has two end points; the successors of the node (A, B) are the edge elements touching these end points. Example: (B, E) is a successor of (A, B).

Starting from a point S, adding a new edge element to an existing contour causes a certain cost. The cost function g(n) describes the cost of a path from the starting point S to node n. The cost between any two nodes ni and nj is denoted by k(ni, nj).

Thus edge detection is reduced to the search for the shortest path in a graph. The set of all nodes (edge elements) along the shortest path describes the desired contour.

The cost function depends on the intensity difference between the image points P1 and P2 that determine the edge element:
- Large intensity difference between the pixels A and B: edge element (A, B) receives low cost.
- Small intensity difference: pixels A and B belong to the same region; thus (A, B) should not be part of the contour -> (A, B) receives a high cost (penalty).

Example for a contour: [Figure: a 3x3 pixel grid A ... I with edge-element costs such as k(A,B) = 10, k(B,C) = 5, ..., and the corresponding search graph starting at START]

Finding a shortest path in a graph is equivalent to finding a path with the lowest total cost. Several methods for finding shortest paths in graphs are known.

Graph searching techniques: given a start node and one or more goal nodes (goal states).
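The reduction of contour detection to shortest-path search can be sketched with uniform-cost search (Dijkstra's algorithm). A minimal sketch: the node names and cost values in the example graph are illustrative assumptions, not the exact figure from the text.

```python
import heapq

def shortest_path(graph, start, goal):
    """Uniform-cost search (Dijkstra): always expand the cheapest path so far.
    graph maps a node to a list of (successor, cost) pairs."""
    frontier = [(0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for succ, k in graph.get(node, []):
            if succ not in visited:
                heapq.heappush(frontier, (cost + k, succ, path + [succ]))
    return None

# Hypothetical edge-element graph (names and costs are illustrative only):
graph = {
    "START": [("AB", 10), ("AD", 5)],
    "AB":    [("BC", 5), ("BE", 10)],
    "AD":    [("DE", 5)],
    "DE":    [("BE", 10), ("EH", 5)],
    "EH":    [("GH", 9), ("HI", 5)],
    "HI":    [("GOAL", 0)],
    "BC":    [("GOAL", 0)],
}
print(shortest_path(graph, "START", "GOAL"))   # -> (15, ['START', 'AB', 'BC', 'GOAL'])
```

The nodes along the returned path are the edge elements describing the contour.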
Wanted: an optimal (low-cost) path connecting the start node to a goal node.

In the case of contour detection: a start node (= starting image point) and several target nodes:
a) If a closed contour is completely visible in the image: starting point = end point.
b) If the contour lies only partially inside the image: the end point is a pixel on the image boundary.

Breadth-first search

Breadth-first search always finds the shortest path, if one exists. But: the number of nodes searched is exponential -> impractical in most cases.

Thus: include prior knowledge via the cost function; always follow only the one path currently having the lowest cost, instead of all paths as in breadth-first search.

Application of the A* algorithm to segmentation

- Nodes again correspond to edge elements, as above.
- The cost function depends on the gradient, as above, i.e. the lower the intensity difference between the two pixels of an edge element, the higher the cost of this edge element.
- The heuristic function depends on the expected contour: the greater the deviation from the expected contour, the higher the heuristic cost of this path.

Examples for the calculation of the heuristic function h(n) for a node n:
a) Assume squares must be detected in the image. Then choose h such that costs increase with distance from the starting point. Additionally, any deviation from a straight line that does not amount to +/-90 degrees should cause high cost.
b) Likewise, to detect circles of known radius, the function h can be chosen proportional to the difference between the measured radius of curvature and the default radius.

Thus h is only useful if the shape of the objects to be detected is known in advance.

Object recognition

The goal of object recognition in images is to extract assertions of the form: "Image region X with the properties Y is an apple (a dog, an assembly part, ...), if projected with method Z onto the image area."

To make such assertions, a model is needed, i.e.
a) knowledge of all objects potentially occurring in the image, and
b) means for the description of the current image contents.

Approach:
a) Comparison between characteristics of prototypes and the observed features (characteristics) in the image on the basis of statistical methods (decision theory).
b) Classification with neural networks.

Statistical approach

An object in the image is characterized by n descriptors xi (e.g. length, area, color, circularity, ...). Assemble the descriptors into a so-called characteristic vector:

   x = (x1, ..., xn)

The decision maps a presented object to a certain object class i and is made by evaluating a decision function di(x). If M different object classes have to be distinguished, M decision functions are needed, which map the characteristic vector to the different classes.

The decision is made by inserting the characteristic vector x of the object into all decision functions. The object belongs to the class whose decision function value is smaller (respectively greater) than all other decision function values, i.e. xa belongs to class i if

   di(xa) < dj(xa)   for all j != i

In the simplest case, the decision function is the Euclidean distance between the characteristic vector x and the prototype vector mi of the class:

   di(x) = |x - mi|

The prototype vector for each class is computed by averaging over a large number N of actual characteristic vectors:

   mi = (1/N) * sum of xk for k = 1 ... N

Very fast decision making is possible (parallelism!), but the process is relatively rigid.

Interpretation with neural networks

Characteristic vector as above, but first an adequate model is computed by 'training'; training through back-propagation.

Approach: internal system parameters (weight values) are adapted to best fit the training data; adaptation in small steps, similar to numeric minimization.

Basic model: [Figure: inputs x1, x2, ..., with weights g1, g2, ..., a summation node, and comparison with a threshold S]
The 'perceptron':
- binary input values x1, ..., xn (only 0 or 1)
- output also binary
- threshold value S
- weights g1, ..., gn

Alternative 1: state storage. A more general function f replaces the summation (in the above model f(x1,...,xn) = x1 + ... + xn). The function f depends on a state Z, and Z is stored. [Figure: inputs x1, ..., xn with weights g1, ..., gn, function f, state Z, threshold S]

Alternative 2: combination with a logic gate. Logic gates transform many binary inputs into one output. [Figure: logic inputs l1, l2, l3 with weights g1, g3, summation, and threshold S]

Goal: determine the values gi from a number of training data sets. Each training data set has a known classification; determine the values gi such that a new, previously unclassified data set can be classified correctly. Training instances are pre-classified, i.e. each training instance is marked as a positive or a negative example for the feature to be learned.

Neural network

[Figure: input cells (layer I), a hidden layer, and output cells (layer O); each cell >S compares its weighted input sum against a threshold (threshold values transformed into weights)]

- binary input in the cells of layer I
- binary output in layer O

Procedure for the calculation of an output for a given input:
1. Multiplication of the weights gij with the input parameters.
2. Threshold value comparison in the hidden layer.
3. Propagation into the output layer, multiplication with the weights hij.
4. Threshold value comparison in the output layer.

Training

Given the training instances, the weights gij and hij must be calculated. Method: error back-propagation.

Thus: in the beginning all weights are random values. The first input vector (x1,...,xn) of the training set is propagated through the network in the forward direction (i.e. from left to right). This delivers the output vector (y1,...,ym). The desired output is given in the training data set!
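The forward propagation through the two weight layers can be sketched in a few lines. A minimal sketch: the shapes, the random initialization, and the use of a sigmoid in place of the hard 0-1 threshold (the substitution the training procedure relies on) are assumptions for illustration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, G, H):
    """Two-layer forward pass: weights g_ij into the hidden layer, then
    weights h_ij into the output layer; the hard threshold of each cell
    is replaced by the differentiable sigmoid."""
    hidden = sigmoid(G @ x)      # step 1 + 2: weighted sum, 'threshold'
    return sigmoid(H @ hidden)   # step 3 + 4: weighted sum, 'threshold'

rng = np.random.default_rng(1)
G = rng.normal(size=(3, 2))      # random initial weights, as in the text
H = rng.normal(size=(1, 3))
x = np.array([1.0, 0.0])         # binary input vector
y = forward(x, G, H)
print(y)                         # actual output, to be compared with the desired one
```

Training then consists of comparing y with the desired output and adjusting G and H, as derived in the following paragraphs.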
The actual output (y1,...,ym) is compared to the desired output (d1,...,dm). Call the difference between actual and desired output the error F. Adapt the weights gij and hij in such a way that the error F is minimized.

Thus: consider the value (y1 - d1)^2 at the first output cell a1, and adapt the weight value h11 such that this error value is reduced. As a shortcut, write h instead of h11, and consider the function F(h,x) in the rectangular sector of the network leading through h to the cell a1. [Figure: the network with this sector marked; zoom of the sector with the error calculation (y - d)^2 added at the end]

The error function F(x,h) is transformed into a differentiable function by substituting the threshold value function at a1: the previous simple 0-1 threshold computation is replaced by a differentiable function. Specifically, replace the step function by the sigmoid function

   s(u) = 1 / (1 + e^(-u))

so that node a1 transforms its input u to the value s(u). (Different choices for the latter function are possible.)

How to adapt h? Determine the one-dimensional derivative F'(x,h) of F(x,h) with respect to h (x is regarded as a constant here!). Since x is constant, we can write Fx(h) instead of F(x,h). To reduce the error, simply subtract the value F'x(h) from h. Effect: if F'x(h) is negative (i.e. at h2), the error is reduced; if F'x(h) is positive (i.e. at h1), the error is also reduced.

Calculation of the derivative of Fx(h). Set

   r(h,x) = h*x
   s(u) = 1 / (1 + e^(-u))
   t(y) = (y - d)^2

As above, write rx(h) = r(h,x) = h*x. Thus Fx(h) = t(s(rx(h))), where x and d are constant. Then

   s'(u) = s(u) * (1 - s(u))      (Exercise: check the latter formula!)

Therefore, by the one-dimensional chain rule,

   F'x(h) = t'(s(rx(h))) * s'(rx(h)) * r'x(h)
          = 2 * (s(rx(h)) - d) * s'(rx(h)) * x
          = 2 * (s(u) - d) * s(u) * (1 - s(u)) * x,   where u = rx(h) = x*h

Visualization (one-dimensional): [Figure: graph of Fx(h) with two points h1 and h2] At h1 the derivative F'x(h) is positive, thus subtracting F'x(h) reduces the error; at h2 the derivative F'x(h) is negative, thus subtracting F'x(h) also reduces the error.

Extension to the overall network:
- Extend the error function F to the entire network.
- Take the derivative of F(x1,...,xn)(g11,...,gkl, h11,...,hrs) with respect to each individual gij (resp. hij): the partial derivatives.
- Change the weight vector in the direction of the negative gradient, i.e. 'in the direction of'

   -grad F = -(dF/dg11, ..., dF/dhrs)

i.e. subtract grad F from the weight vector G = (g11,...,gkl, h11,...,hrs). The derivative of F with respect to the weights gij can be calculated as above.

Recall the derivative of a two-dimensional function F mapping R^2 to R: the vector grad F of the partial derivatives is perpendicular to the isocurves (level curves) and points in the direction of the steepest increase. Example: for the function F: (x,y) -> x, the graph of the function is a plane in R^3, and the vector of the partial derivatives is (1,0).

Adaptation of the weight vector G through subtraction:

   G <- G - η * grad F

Here η is a constant (the step size), which must be determined experimentally.

Caution: the individual weight changes calculated above must not be applied before all remaining weight changes have been calculated and stored (otherwise a typical programming bug would arise: the calculated weight changes also depend on the current values of the remaining weights).

Alternatives:
- gradient descent with a momentum term
- more layers
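The chain-rule derivative and the update rule G <- G - η * grad F can be checked numerically on the single-weight case. A minimal sketch; the training pair (x, d), the start weight, and the step size η are illustrative assumptions.

```python
import math

def s(u):                          # sigmoid; s'(u) = s(u) * (1 - s(u))
    return 1.0 / (1.0 + math.exp(-u))

def F(h, x, d):                    # error for one weight: (s(h*x) - d)^2
    return (s(h * x) - d) ** 2

def dF_dh(h, x, d):                # analytic derivative from the chain rule
    u = h * x
    return 2 * (s(u) - d) * s(u) * (1 - s(u)) * x

x, d = 1.0, 1.0                    # one training pair: input x, desired output d
h, eta = 0.0, 1.0                  # start weight and step size eta
for _ in range(200):
    h -= eta * dF_dh(h, x, d)      # subtract eta * F'x(h), as in the text

print(F(h, x, d))                  # the error has been driven close to zero
```

Comparing dF_dh against a finite-difference quotient of F confirms the closed-form derivative, and iterating the subtraction shows the error shrinking regardless of the derivative's sign, as the visualization above argues.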
