Chapter 3

Segmentation
Image Acquisition


Recording of the image information by projection of
   intensities onto the recording medium:
   photographic film, CCD sensor, etc.


For the continuous case (photographic film) the
   intensity is given by a continuous function f(x,y).
   Here x and y indicate the coordinates on the film
   (i.e. the origin is at the lower left corner).
f(x,y) is the brightness or light intensity value at the
    point with coordinates x,y
   (e.g. 0 ... 1).
In the discrete case (the image consists of 'pixels', or
    'picture elements') the image is a matrix with M
    rows (lines) and N columns.
   Let i and k be the row and column indices,
   so i = 1 ... M, k = 1 ... N.
    In digital image processing the intensity function
    can only assume discrete values (e.g. 0, 1, ...,
    255).
    Thus denote the intensity function by f(i,k):


[Figure: coordinate systems — continuous image coordinates (x, y); discrete matrix indices (i, k)]

Image preprocessing
Problem : Images delivered by the camera often
   have insufficient quality: noise, distortions, bad
   contrast, illumination problems, motion blurring


Image preprocessing: low-level processing to make
   the image usable for further processing steps (e.g.
   object recognition)
   'Image filtering'




Image filtering


a) Averaging
Local calculation of the average intensity for a
   neighborhood of adjacent pixels:


       E3 = (1/5) (E1 + E2 + E3 + E4 + E5)

[Figure: cross-shaped neighborhood in the (i,k) grid — E1 above, E2 E3 E4 in the middle row, E5 below]

or spreading the averaging process onto regions of
    any size:


           g(i,k) = (1/N) Σ f(i,k),   summed over (i,k) ∈ Region
           (N = number of pixels in the region)


Effect of averaging: noise reduction; the image is
"flattened" by the removal of small disturbances or
data gaps
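The region averaging g(i,k) above can be sketched in a few lines of numpy (the function name and the 5x5 test image with a single noise spike are illustrative):

```python
import numpy as np

def mean_filter(img, size=3):
    """Average each pixel over a size x size neighborhood (edges replicated)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for di in range(size):
        for dk in range(size):
            out += padded[di:di + img.shape[0], dk:dk + img.shape[1]]
    return out / (size * size)

# A flat image with one noisy pixel: the spike is spread out and damped.
img = np.zeros((5, 5))
img[2, 2] = 9.0
smoothed = mean_filter(img)
print(smoothed[2, 2])  # 1.0 -- the spike of 9 is reduced to 9/9
```

The spike is not removed but distributed over the 3x3 neighborhood, which illustrates the "flattening" effect.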


Linear filter operator


Disadvantage: edges in the image (important
   structural features) are also flattened by averaging

b) Median filtering
Creation of the local median value: sort a 3x3 pixel
   matrix around E5 in ascending order into a list.
   Gray level of E5 is then replaced by the median
   value of the list


Example:
     List (1, 44, 77, 140, 190)


     Median is 77


     Average is (1+44+77+140+190)/5 = 90.4




Non-linear operator




[Figure: 3x3 neighborhood in the (i,k) grid — E1 E2 E3 / E4 E5 E6 / E7 E8 E9]
Advantage of the median: less sensitive with respect
   to data gaps and random disturbances




Example:
   Pixel intensity values: (99,100,0,101,100)


   (value 0 is a data gap)


   Median: 100 (correct)
   Average: 400/5 = 80
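The example above can be checked directly with Python's standard library:

```python
from statistics import median, mean

pixels = [99, 100, 0, 101, 100]   # 0 is a data gap
print(median(pixels))  # 100 -- the gap is ignored
print(mean(pixels))    # 80  -- the gap drags the average down
```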
Example (original image of a hammer):
Random noise added to the original image;
averaging with a 3x3 and a 9x9 matrix.

[Figures: noisy image, 3x3 average, 9x9 average]

Median filtering:

[Figures: original image, median-filtered image, 3x3 mean vs. median]

-> the median suppresses noise with little smoothing
c) Fourier transform


Universal method for image preprocessing; can be
   used globally and locally: contrast enhancement,
   noise filtering, edge detection


Filter characteristics can be adjusted in many ways


Principle


   decompose a (periodic) function f into partial
   frequencies


   e.g. a horizontal line in the image (row of pixels)
   can be regarded as a function f of one parameter:

   x: pixel number along the image line

   f(x): gray level of pixel x
f is decomposed into individual frequencies


theorem: any (periodic) function can be decomposed
   in this way


f is only defined on a small sector (the image range); f
     can thus be regarded as a section of a periodic function




Result: f is represented as a weighted sum of
   component functions sin(nx) and cos(nx),
   Short: sn, cn denote sin(nx), cos(nx).


For large n, the function sin(nx) is a function with a
   high frequency


Application
   Consider a (horizontal) line in the image


   Noise reduction: calculate fourier decomposition
   and delete all partial functions with a high
   frequency, i.e. all partial functions sin(nx), cos(nx)
   for large n
   Edge enhancement: delete all partial functions
   with low frequency:
         edges are sudden changes of intensity
         within one line


         abrupt changes correspond to steep
         ascends, can only be represented with high
         frequencies,


         -> deleting the low frequencies enhances
         the sharp ascends




Reverse transformation: after deleting certain
   components of the form sin(nx), cos(nx),
   reconstruct f from the remaining components only
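This low-/high-pass filtering of a single image line can be sketched with numpy's FFT (the step profile and the cutoff of 4 frequency components are illustrative assumptions):

```python
import numpy as np

# One image line with an abrupt intensity change (an edge) at x = 32.
x = np.arange(64)
f = np.where(x < 32, 10.0, 200.0)

F = np.fft.fft(f)                # decompose the line into frequencies
cutoff = 4

low = F.copy()
low[cutoff:-cutoff] = 0          # delete all high frequencies ...
f_low = np.fft.ifft(low).real    # ... and reverse-transform: smoothed line

high = F - low                   # keep only the high frequencies
f_high = np.fft.ifft(high).real  # reverse transform: the edge is emphasized
```

Because the low and high frequency sets are complementary, f_low + f_high reconstructs the original line exactly; the low-pass version has no abrupt jump anymore.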


Similar procedure is possible with a 2D image matrix
   instead of a single line


(2D fourier transform)
Advantage: FT is independent of direction


Example

[Figures: original image; original image as function; Fourier transform (power spectrum); Fourier transform (as function)]

[Figures: low-pass and high-pass filters applied to the transform]

[Figure: image after low pass (obtained by reverse transform); a vertical line in the image is marked]

[Figure: image after high pass]

[Figure: gray levels along the marked vertical line, drawn as function — original, after low pass, after high pass]
Edge detection




An edge appears in an image as a sudden change of
   gray levels (intensities)


Goal of edge detection: find line segments or curves
  where such sudden changes occur in the image


Fundamental method for most edge-finding
   procedures: mathematical differentiation


   I.e. regard image line as function f(x), where x is
   the index (pixel number)


   Calculate derivative df/dx or f ’(x) from f(x)




As long as the gray level remains constant, the
    derivative is zero
At edges, the derivative is distinct from zero


For two-dimensional images (whole image instead of
   single line in image), the derivative of the
   intensity function f(x,y) depends on the direction




Example: derivative along a line y = y0 of the intensity
  matrix:

[Figure: image with marked line y0; intensity profile f(x, y0) along the line]

First derivative in x-direction: f'(x, y0)

[Figure: f'(x, y0) — peaks at the edges]

Second derivative in x-direction: f''(x, y0)

[Figure: f''(x, y0) — zero-crossings at the edges]




The absolute value of the first derivative corresponds to
   the magnitude of the gray level change:


    Edge detection with threshold value:
    Whenever the value of the first derivative
    exceeds threshold: report an edge at this pixel


    Problem: How can the threshold value be
    determined?
The second derivative is more sensitive to changes than
   the first derivative (steeper ascent)
Second derivative is also highly sensitive towards
   noise. Therefore it is typically applied only after
   noise filtering.


   Important advantage:
   At each edge f''(x,y) has a zero-crossing.
   ->No threshold value is necessary for second
   derivative
Derivatives deliver different results for different
   directions.




Desirable:
    Direction-less edge detection
    Edge detection without having to specify any
    direction in advance


Gradients method: a procedure based on the first
   derivative of the image matrix


Definition of the gradient for a function of two
   variables f(x,y):




G(x,y) = ( ∂f(x,y)/∂x , ∂f(x,y)/∂y ) =: (fx, fy)
The direction of this gradient vector is the direction of
   the greatest ascent/descent, with starting point
   (x,y).


The absolute value of the gradient

      |G(x,y)| = √( fx² + fy² )

is a measure for the amount of change,
independent of direction.


The absolute value of the gradient is such a
   direction-less edge detector

If a preferred direction is desired:
   instead of taking the absolute value,
   multiply the gradient with a vector perpendicular
   to the preferred edge direction




Calculation of the gradient
Approximation of the partial derivatives by simple
   differences :


   fx = ( f(x,y) - f(x - Δx, y) ) / Δx

   fy = ( f(x,y) - f(x, y - Δy) ) / Δy

Here Δx = Δy = 1 can be assumed. For the case of
   discrete image matrices:
   discrete image matrices:

              Δfi = f(i,k) - f(i - 1, k)

              Δfk = f(i,k) - f(i, k - 1)
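Along a single line, this simple difference is exactly what numpy's diff computes (the line with one abrupt edge is an illustrative test case):

```python
import numpy as np

f = np.array([10., 10., 10., 200., 200., 200.])  # one image line with an edge
dfx = np.diff(f)                                  # f(x) - f(x-1), with Δx = 1
print(dfx)  # nonzero exactly at the edge
```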


Implementation of both formulas as operators
(''operator masks''):

   Gx:   -1  +1          (horizontal difference)

   Gy:   +1
         -1              (vertical difference)
Application of the operator:
     move the mask over the entire image matrix and,
     at each position, multiply the intensity values with
     the weighting factors (here +1 and -1)




Improvement: Use of more than two pixels and extra
   weighting of some pixels:


     (reduces noise sensitivity)
     Example for such operators:
          Gx:                 Gy:

        -1   0  +1         +1  +2  +1
        -2   0  +2          0   0   0
        -1   0  +1         -1  -2  -1
The combination of these two operators is known as
   ''Sobel-Operator''.
      GS(x,y) = √( Gx²(x,y) + Gy²(x,y) )
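A minimal numpy sketch of the Sobel operator (the small synthetic image with one vertical edge and the helper names are illustrative):

```python
import numpy as np

GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
GY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def correlate3(img, mask):
    """Slide a 3x3 mask over the image (valid region only)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for k in range(w - 2):
            out[i, k] = np.sum(img[i:i+3, k:k+3] * mask)
    return out

def sobel(img):
    gx = correlate3(img, GX)
    gy = correlate3(img, GY)
    return np.sqrt(gx**2 + gy**2)   # GS = sqrt(Gx^2 + Gy^2)

# Vertical edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 100.0
gs = sobel(img)
```

The response gs is large only in the two columns straddling the edge, which also illustrates the "edge several pixels wide" problem mentioned below.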




There are many other possibilities for defining
operators based on the first derivative or gradients
Example:


"Roberts-Cross"
   G(x,y) = MAX(abs(fx, absfy))
Common feature of nearly all gradient-based
  operators:
   the maximum at the position of the edge has a
   width of several pixels
   Desirable: a width of only one pixel
   -> line thinning procedures necessary

   -> erosion, dilation
   (based on the Minkowski sum)
Examples (Sobel operator)

[Figures: original image; Sobel magnitude (GS); vertical edges (Gx); horizontal edges (Gy)]
operators can be tested online under

www9.in.tum.de:8000
Segmentation



Partitioning of the image into regions (segments or
   ''semantic units'') according to appropriate
   homogeneity criteria



Example: Image with one large object. Instead of
   finding the object’s edges, find the region
   representing the object in the image



Result: more compact descriptions of the scene,

        a higher abstraction level



Two procedures can be distinguished :



a) Homogeneity-oriented segmentation:
    construction of regions of similar image elements
    until a discontinuity is encountered

   (region-oriented)
b) Discontinuity-oriented segmentation: search for
    edges first, then connect the edges to region
    boundaries with appropriate criteria

   (boundary-oriented)



Construction of connected regions (search for
  homogeneities)



Simplest method : threshold value



Starting from a starting point (x,y), all surrounding
   image points are checked for their gray level.

   A new image point at position (x',y') is
   included in the region if:

   |f(x,y) - f(x',y')| <= T


Very simple procedure. Problems :

Where is the best starting point for a region?

How to choose the threshold value for a region?
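A minimal sketch of this threshold-based region growing (4-neighborhood; the small test image, the seed, and the comparison against the seed's gray level are illustrative assumptions):

```python
import numpy as np
from collections import deque

def grow_region(img, seed, T):
    """Collect all pixels reachable from seed whose gray level differs
    from the seed's by at most T (4-neighborhood)."""
    h, w = img.shape
    region = {seed}
    queue = deque([seed])
    while queue:
        i, k = queue.popleft()
        for ni, nk in ((i-1, k), (i+1, k), (i, k-1), (i, k+1)):
            if 0 <= ni < h and 0 <= nk < w and (ni, nk) not in region \
               and abs(img[ni, nk] - img[seed]) <= T:
                region.add((ni, nk))
                queue.append((ni, nk))
    return region

img = np.array([[10, 12, 90],
                [11, 13, 95],
                [10, 12, 92]], dtype=float)
print(len(grow_region(img, (0, 0), T=5)))  # 6 -- the dark left block
```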
Frequently used method: region-oriented
   segmentation through partitioning and
   reassembling ("Split-and-Merge")




Starting from the entire image, the current region is
   partitioned into four square sub-regions

   Each square sub-region is partitioned further
   (into four smaller quadrants) if the threshold value
   inside the region is exceeded (splitting).



Algorithm stops, if no further splitting or merging is
   possible.
[Figure: split-and-merge example — the image is first split into quadrants 1-4; quadrant 2 is split into 21-24, quadrants 3 and 4 into 31-34 and 41-44; homogeneous neighbors such as 23, 24 and 31, 32 are merged; further splits (231-234, 321-324) and merges follow. The corresponding quadtree grows below the node START.]
Representation of the solution as a tree:



  - Expansion of a node if the threshold value is
    exceeded

  - Non-expanded nodes are merged into a node



"Split - And - Merge" avoids problem of searching for
   a good starting point.

   However: Threshold value still needed.
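The splitting half of the procedure can be sketched as a short recursion (the merge step is omitted; the max-min spread as homogeneity criterion and the 4x4 test image are illustrative assumptions):

```python
import numpy as np

def split(img, i, k, size, T, regions):
    """Recursively split a square block until its gray-level spread is <= T."""
    block = img[i:i+size, k:k+size]
    if size == 1 or block.max() - block.min() <= T:
        regions.append((i, k, size))          # homogeneous leaf
        return
    half = size // 2
    for di, dk in ((0, 0), (0, half), (half, 0), (half, half)):
        split(img, i + di, k + dk, half, T, regions)

img = np.zeros((4, 4))
img[:2, :2] = 100.0                            # one bright quadrant
regions = []
split(img, 0, 0, 4, T=10, regions=regions)
print(len(regions))  # 4 -- one split, all four quadrants homogeneous
```

The recursion corresponds to expanding a quadtree node whenever the threshold is exceeded; a merge pass would then re-join homogeneous neighbors.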




For scenes with known lighting and approximately
  known image content:

   Determination of the threshold value "by
   educated guessing" of the user



In appropriate cases: use grey level histogram to find
    threshold value:
[Figure: gray level histogram N(r) with several maxima; regions 1-4 marked along the gray level axis r]

x-axis: gray levels (e.g. 0..255)

y-axis: number of pixels with grey level 0, number of
   pixels with grey level 1, ...

A large region contains many points of the same grey
   level. Thus, if there are several regions with
   distinct grey levels, the histogram of the image
   has several distinct maxima and minima.



The threshold values are then placed at the
   minima. This reduces the danger of splitting
   regions which actually belong together.

Threshold value determination is particularly simple
in case of a histogram with two distinct maxima (so-
called “bimodal histogram”):



   output image has only two gray levels
   image can be processed as binary image:



        Image elements of one gray level value are
        assigned to objects; the other gray level
        value represents the background
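Binarization via the valley of a bimodal histogram can be sketched as follows (the synthetic gray levels, the 128 split between the peak search ranges, and the variable names are illustrative assumptions):

```python
import numpy as np

# Gray levels of a hypothetical bimodal image:
# background around level 30, object around level 200.
vals = [29]*3 + [30]*10 + [31]*3 + [199]*3 + [200]*10 + [201]*3
img = np.array(vals)

hist = np.bincount(img, minlength=256)
p1 = int(np.argmax(hist[:128]))          # background maximum
p2 = 128 + int(np.argmax(hist[128:]))    # object maximum
t = p1 + int(np.argmin(hist[p1:p2]))     # threshold at the minimum in between
binary = img >= t                        # binary image: True = object
```

Every pixel above t is assigned to the object, the rest to the background, i.e. the image is reduced to two gray levels.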




Advantages of pure binary image processing:

  - very simple hardware

  - fast ("real-time segmentation")

  - straightforward analysis



These advantages are partially compensated by the
   necessity of special lighting
Edge-oriented segmentation



Construction of contours by following sequences of
   object edges in the image



Requires edge detection procedures (derivatives,
   Sobel operator, etc.)



Connect adjacent edges to regions



Problem : How to connect edges, particularly in the
   case of alternative solutions?



Two possibilities :
Contour detection is reduced to the search of a path
   through a graph. Graph describes all transitions
   between an image point and its “neighbor”.



As soon as the path has been found, the region is
   described by its contour.
Advantage of the second method: It is possible to
  use prior knowledge about the expected contours



By changing the heuristic used for the search one
   can adapt to the respective problem. But the
   determination of this heuristic can be difficult.




Procedure :

In discrete images an edge consists of 'edge
    elements'.

   An edge element is the segment of an edge
   between two adjacent pixels.

   Example: pixels A and B have one edge element
   in common, denoted by (A, B)
[Figure: pixel grid A B C / D E. The edge elements are represented as nodes in a graph with starting point A — nodes (A,D), (A,B), (D,E), (B,E), (B,C), connected with costs k(A,D), k(A,B), k(D,E), k(B,E), k(B,C)]


Thus:

   Nodes in the graph are edge elements

   e.g. edge element between pixel A and B
   denoted by (A,B)
Successor of a node:

   Edge element (A, B) has two end points.

   The successors of the node (A, B) are the
   edge elements touching these end points.

   Example: (B, E) is a successor of (A, B).

Starting from a point S, adding a new edge element to an
   existing contour causes a certain cost.



The cost function g(n) describes the cost for a path
   from the starting point S to node n


The costs between any two nodes ni and nj are
   denoted by k(ni, nj)

Thus edge detection is reduced to the search of the
   shortest path in a graph. The set of all nodes
   (edge elements), which are along the shortest
   path, describes the desired contour



The cost function depends on the intensity difference
   between the two image points that determine the
   edge element:
   Large intensity difference between the pixels A
   and B:

   edge element (A, B) has low cost



   Small intensity difference: pixels A and B belong
   to the same region. Thus (A, B) should not be
   part of the contour -> (A, B) receives high cost
   (penalty)

Example for a contour:

   Pixel grid with intensities:

        A    B    C
        10   10   5
        D    E    F
        10   5    5
        G    H    I
        10   5    4

[Figure: search graph from START over the edge elements (A,B), (B,C), (A,D), (C,F), (D,E), (B,E), (E,F), (E,H), (G,H), (H,I) with the corresponding costs]


Finding a shortest path in a graph is equivalent to
   finding a path with lowest total cost.



Several methods for finding shortest paths in graphs
   are known
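One such method, Dijkstra's algorithm, can be sketched with Python's heapq; the node names follow the edge-element example above, but the costs here are illustrative assumptions (the original graph figure is not fully legible):

```python
import heapq

def shortest_path(graph, start, goals):
    """Dijkstra: cheapest path from start to any goal node.
    graph maps node -> list of (successor, cost)."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node in goals:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for succ, k in graph.get(node, []):
            if succ not in seen:
                heapq.heappush(frontier, (cost + k, succ, path + [succ]))
    return None

# Hypothetical edge-element graph (low cost = strong edge).
graph = {
    "START": [("A,B", 10), ("B,C", 5)],
    "B,C":   [("C,F", 10), ("B,E", 5)],
    "A,B":   [("A,D", 5), ("B,E", 5)],
    "B,E":   [("E,F", 10), ("E,H", 5)],
    "E,H":   [("G,H", 5), ("H,I", 9)],
}
print(shortest_path(graph, "START", {"H,I"}))
```

The returned node sequence is the desired contour: the cheapest chain of edge elements from the start to a goal.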

Graph searching techniques:



Given:
   a start node and one or more goal nodes (goal
   states). Wanted: an optimal (low-cost) path
   connecting the start node to a goal node




In case of contour detection: a start node (=starting
    image point) and several target nodes



   a) the closed contour is completely visible in the image:

       starting point = end point

   b) the contour lies only partially inside the image: the
   end point is a pixel on the image boundary

Breadth first search



Breadth first search always finds the shortest path, if
   one exists



But:



   Number of nodes searched is exponential

   -> impractical in most cases
Thus:

   Include previous knowledge via the cost function,

   ‘always only follow the one path currently having
   lowest cost’ instead of all paths as in breadth-first
   search
Application of the A*-algorithm to segmentation



Nodes again correspond to edge elements, as above



Cost function depends on the gradient as above, i.e.
   the lower the intensity difference between the two
   pixels of an edge element, the higher the costs of
   this edge element



Heuristic function depends on the expected contour
   The greater the deviations from the expected
   contour, the higher the heuristic cost of this path

Example for the calculation of the heuristic function
   h(n) for a node n:

a) Assume squares must be detected in the image.
    Then choose h such that the costs increase with
    the distance to the starting point. Additionally, any
    deviation from a straight line which does not
    amount to ±90° should cause high cost.

b) Likewise, to detect circles of known radius, the
    function h can be chosen proportional to the
    difference from a default radius of curvature.
Thus h is only useful, if the shape of objects to be
   detected is known in advance



Object recognition




The goal of object recognition in images is to extract
   assertions of the form:



   "Image region X with the properties Y it is an
   apple (a dog, an assembly part…), if projected
   with method Z onto the image area."



To make such assertions, a model is needed, i.e.



   a) Knowledge of all objects potentially occurring
   in the image and

   b) Means for the description of the current image
   contents.
Approach :




a) Comparison between the characteristics of prototypes
    and the observed features (characteristics) in the
    image on the basis of statistical methods (decision
    theory).



b) Classification with neural networks



Statistical approach:



An object in the image is characterized by n
   descriptors xi (e.g. length, area, color,
   circularity,…).



Assemble the descriptors into a so-called
   characteristic vector:

      x = (x0, x1, ..., xn)^T


The decision maps a presented object to a certain
   object class i.




Decision is made by evaluating a decision function
   di(x).

If M different object classes have to be distinguished,
    M decision functions are needed, which map the
    characteristic vector to the different classes.
The decision is made in such a way that the
   characteristic vector x of the object is inserted
   into all decision functions. The object belongs to
   that class whose decision function is smaller
   (greater) than all other values of decision
   functions, i.e. xa belongs to class i if

      di(xa) < dj(xa)   for all j ≠ i

In the simplest case the decision function is the
    Euclidean distance between the characteristic
    vector x and the prototype vector of the class


di(x) = absx – miThe prototype vector for each
    class is computed by "averaging" over a large
    number N of actual characteristic vectors:


                             N
                   mi =  xk
                       1
                       Nk = 1
Very fast (parallelism !) decision making is possible,
   but the process is relatively rigid
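This minimum-distance classifier can be sketched in a few lines (the two-class training data, the descriptor choice "length, area", and the function names are illustrative assumptions):

```python
import numpy as np

def train_prototypes(samples_per_class):
    """mi = (1/N) * sum of the characteristic vectors of class i."""
    return [np.mean(s, axis=0) for s in samples_per_class]

def classify(x, prototypes):
    """Assign x to the class with the nearest prototype (Euclidean distance)."""
    d = [np.linalg.norm(x - m) for m in prototypes]
    return int(np.argmin(d))

# Hypothetical descriptors (length, area) for two object classes.
class0 = np.array([[1.0, 2.0], [1.2, 2.1], [0.9, 1.8]])
class1 = np.array([[5.0, 9.0], [5.2, 8.8], [4.8, 9.1]])
protos = train_prototypes([class0, class1])
print(classify(np.array([1.1, 2.0]), protos))  # 0
print(classify(np.array([5.1, 9.0]), protos))  # 1
```

The "rigidity" noted above shows here: the model is fully determined by the class means and cannot adapt its decision boundary further.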



Interpretation with neural networks
Characteristic vector as above

    But first of all an adequate model is computed by
    'training'



    Training through back-propagation

Approach


Internal system parameters (weight values) are
adapted to best fit the training data


adaptation in small steps, similar to numeric
minimization



Basic model

[Figure: perceptron — inputs x1, ..., xn with weights g1, ..., gn, summation, threshold comparison > S]

    'Perceptron'
    Binary input values x1,...,xn (only 0 or 1)

    Output also binary

    Threshold value S

    Weights g1,...,gn
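The model is a one-liner in code; the weights and threshold below (realizing a logical AND) are an illustrative choice:

```python
def perceptron(x, g, S):
    """Binary output: 1 if the weighted sum of binary inputs exceeds S."""
    return 1 if sum(xi * gi for xi, gi in zip(x, g)) > S else 0

# AND of two inputs: weights 1, 1 and threshold 1.5.
g = [1.0, 1.0]
print(perceptron([1, 1], g, 1.5))  # 1
print(perceptron([1, 0], g, 1.5))  # 0
```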



Alternatives

    State storage

[Figure: perceptron with internal state Z — a general function f replaces the summation]

    A more general function f replaces the summation;
    in the above model f(x1,...,xn) = x1 + ... + xn

    The function f depends on the state Z; Z is stored

Alternative 2
    Combination with logic gates

[Figure: logic gates l1, l2, l3 feeding a perceptron with weights g1, g2, g3 and threshold S]

Logic gates transform many binary inputs into one
output


Goal: determine values gi from a number of training
data sets

each training data set has known classification

determine values gi such that a new, previously
unclassified data set can be classified correctly


Training instances are pre-classified, i.e. each
training instance is marked as a positive or a
negative example for the feature to be learned
Neural network

[Figure: network with input layer, hidden cells, and output layer; each hidden cell compares against a threshold Si, each output cell against Si']

resp. (threshold values transformed into weights):

[Figure: the same network with weights gij between input layer I and the hidden layer, weights hij between the hidden layer and output layer O; thresholds replaced by constant-1 inputs, all comparisons against 0]

Binary input in the cells of layer I

Binary output in layer O


Procedure for the calculation of an output for a given
input:

    1.  Multiplication of the weights gij with the input
        parameters

    2.  Threshold value comparison in the hidden
        layer

    3.  Propagation into the output layer,
        multiplication with the weights hij

    4.  Threshold value comparison in the output
        layer

Training

    Given instances

    weights gij and hij must be calculated


    Method:
        Error Back-propagation

           Thus:

           In the beginning all weights are random
           values

           The first input vector (x1,...,xn) of the training
           set is propagated through the network in
           forward direction (i.e. from left to right)

           This delivers an output vector (y1,...,ym)

           The desired output is given in the training data
           set!

           The actual output (y1,...,ym) is compared to the
           desired output (ȳ1,...,ȳm)
Call the difference between actual and desired
output the error F

Adapt the weights gij and hij in such a way that the
error F is minimized


Thus: consider the value
(y1 - ȳ1)² at the first output cell a1


Adaptation of the weight value h11 such that the
error value (y1 - ȳ1)² is reduced

Shortcut: write h instead of h11

Consider the function F(h,x) in the rectangular
sector shown

[Figure: network with the path from input x over weight h to output cell a1 marked as a rectangular sector]
Zoom of the above sector, with the error calculation
added at the end:

[Figure: input x, weight h, threshold cell a1, error (y - ȳ)²]

The error function F(x,h) is transformed into a
differentiable function by substituting the threshold
value function at a1


I.e. previous simple 0-1-threshold value computation
is substituted by a differentiable function


Specifically: replace the step function (jump from 0 to 1
at the threshold)

[Figure: step function]

by

[Figure: sigmoid function, smoothly increasing from 0 to 1]

node a1 transforms the input u to the value

     s(u) = 1 / (1 + e^(-u))




(different choices of the latter function are possible)


How to adapt h ?


     Determine one-dimensional derivative F’(x, h) of
     F(x,h) with respect to h (x is regarded as a
     constant here!!)


[Figure: graph of Fx(h) with two points h1 and h2 marked]

x is constant, thus we can write Fx(h)
instead of F(x,h)

To reduce the error, simply subtract the value F'x(h)
from h

    Effect:
        if F'x(h) is negative (e.g. at h2), the error is
        reduced
        if F'x(h) is positive (e.g. at h1), the error is also
        reduced


Calculation of the derivative of Fx(h):

    Set    r(h, x) = h·x

           s(u) = 1 / (1 + e^(-u))

           t(y) = (y - ȳ)²


as above, write

          rx(h) = r(h,x) = h·x
Thus:


    Fx(h) = t(s(rx(h)))
    (ȳ and x are constant!)

Then


    s'(u) = s(u) (1 - s(u))


(Exercise: check the latter formula!)
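The exercise can be checked numerically with a central difference (the evaluation point u = 0.7 is an arbitrary illustrative choice):

```python
import math

def s(u):
    return 1.0 / (1.0 + math.exp(-u))

# Check s'(u) = s(u) * (1 - s(u)) against a numerical derivative.
u, eps = 0.7, 1e-6
numeric = (s(u + eps) - s(u - eps)) / (2 * eps)
analytic = s(u) * (1 - s(u))
print(abs(numeric - analytic) < 1e-8)  # True
```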

Therefore


    F'x(h) = t'( s( rx(h) ) ) · s'( rx(h) ) · r'x(h)

    (one-dimensional chain rule!)

           = 2( s( rx(h) ) - ȳ ) · s'( rx(h) ) · x


           = 2( s(u) - ȳ ) · s(u) (1 - s(u)) · x



    where u = rx(h) = x·h
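A one-dimensional sketch of this update for the single weight h (the choices x = ȳ = 1, step size η = 1, starting value h = 0, and 200 iterations are illustrative assumptions; ybar stands for ȳ):

```python
import math

def s(u):
    return 1.0 / (1.0 + math.exp(-u))

def F(h, x, ybar):
    return (s(h * x) - ybar) ** 2            # error at output cell a1

def dF_dh(h, x, ybar):
    u = h * x
    return 2 * (s(u) - ybar) * s(u) * (1 - s(u)) * x   # chain rule from above

# Repeatedly subtract eta * F'x(h): the error shrinks toward 0.
x, ybar, h, eta = 1.0, 1.0, 0.0, 1.0
for _ in range(200):
    h -= eta * dF_dh(h, x, ybar)
print(F(h, x, ybar) < 0.01)  # True
```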
Visualization (one-dimensional):

[Figure: graph of Fx(h) with points h1 (positive slope) and h2 (negative slope) marked]

at h1 the derivative F'x(h) is positive,
thus subtracting F'x(h) reduces the error


at h2 F'x(h) is negative,
thus subtracting F'x(h) also reduces the error


Extension to the overall network:

    Extend the error function F to the entire network


    Take derivative of

         F(x1,...,xn) (g11,...,gkl, h11,...,hrs)


    with respect to each individual gij (resp. hij)


         (partial derivatives)
    Change the weight vector in the direction of the
    negative gradient, i.e. subtract the gradient



    F0 = ( ∂F/∂g11 , ... , ∂F/∂hrs )


    from the weight vector

         G = (g11,...,gkl, h11,...,hrs)


    The derivative of F with respect to the weights gij
    (resp. hij) can be calculated as above

Recall the derivative of a two-dimensional function

    F : mapping R2 to R

    The vector dF of the partial derivatives is
    perpendicular to the isocurves
    dF points in the direction of the steepest
    increase

    Example:

         Function
            F: (x,y) -> x

         (Graph of the function is the plane in R3,
         vector of the partial derivative is the vector
         (1,0))
    Adaptation of the weight vector G through
    subtraction:

         G - η·F0

    Here η is a constant (the step size), which must be
    determined experimentally

Caution: individual weights calculated above must
not be changed, before all remaining weights have
been calculated and stored


(otherwise a typical programming bug would arise:
the calculated weight changes also depend on the
current values of the remaining weights)

Alternatives

    -    gradient descent with a momentum

    -    more layers

				