# Neural network


Chapter 3

Segmentation
Image Acquisition

Recording of the image information by projection of
intensities onto the recording medium:
photographic film, CCD sensor, etc.

For the continuous case (photographic film) the
intensity is given by a continuous function f(x,y).
Here x and y indicate the coordinates on the film
(i.e. the origin is at the lower left corner).
f(x,y) is the brightness or light intensity value at the
point with coordinates x,y
(e.g. 0 ... 1).
In the discrete case the image consists of 'pixels'
(or 'picture elements'):
the image is a matrix with M rows (lines) and N columns;
let i and k be the row and column indices,
so i = 1 ... M, k = 1 ... N.
In digital image processing the intensity function
can only assume discrete values (e.g. 0, 1, ...
255).
Thus denote the intensity function with f(i,k):

[Figure: continuous image coordinates (x, y) and discrete matrix coordinates (i, k)]
Image preprocessing
Problem : Images delivered by the camera often
have insufficient quality: noise, distortions, bad
contrast, illumination problems, motion blurring

Image preprocessing : Low level processing to make
the image usable for further processing steps (i.e.
object recognition)
‘Image filtering’

Image filtering

a) Averaging
Local calculation of the average intensity for a
neighborhood of adjacent pixels:

E3 = (E1 + E2 + E3 + E4 + E5) / 5

[Figure: cross-shaped neighborhood — E1 above, E2 and E4 left and right of the center pixel E3, E5 below; i is the row direction, k the column direction]

or spreading the averaging process onto regions of
any size:

g(i,k) = (1/N) · Σ f(i,k)    (sum over all (i,k) in the region, N = number of pixels in the region)

Effect of averaging: noise reduction; the image is
"flattened" by the removal of small disturbances or data
gaps

Linear filter operator

Disadvantage: Edges in the image (important
structural features) are also flattened by averaging
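
To make the averaging step concrete, here is a minimal sketch (Python with NumPy; it is not part of the original notes, and the function name `mean_filter_3x3` and the edge-replication padding are assumptions) of a 3x3 mean filter built by summing shifted copies of a padded image:

```python
import numpy as np

def mean_filter_3x3(img):
    """Replace every pixel by the average of its 3x3 neighborhood (edge-padded)."""
    img = img.astype(float)
    padded = np.pad(img, 1, mode="edge")          # replicate border pixels
    out = np.zeros_like(img)
    for di in (-1, 0, 1):                         # shift over the 3x3 neighborhood
        for dk in (-1, 0, 1):
            out += padded[1 + di : 1 + di + img.shape[0],
                          1 + dk : 1 + dk + img.shape[1]]
    return out / 9.0                              # g(i,k) = (1/N) * sum of f over the region

# Example: a noisy constant region is flattened, but the step edge is blurred too.
f = np.array([[10, 10, 10, 80, 80],
              [10, 14, 10, 80, 80],
              [10, 10,  6, 80, 80]])
print(mean_filter_3x3(f).round(1))
```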

b) Median filtering
Creation of the local median value: sort a 3x3 pixel
matrix around E5 in ascending order into a list.
Gray level of E5 is then replaced by the median
value of the list

Example:
List (1, 44, 77, 140, 190)

Median is 77

Average is (1+44+77+140+190)/5 = 90.4

Non-linear operator

E1   E2   E3
E4   E5   E6
E7   E8   E9

(3x3 neighborhood; i is the row direction, k the column direction)
Advantage of the median: less sensitive with respect
to data gaps and random disturbances

Example:
Pixel intensity values: (99,100,0,101,100)

(value 0 is a data gap)

Median: 100 (correct)
Average: 400/5 = 80
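
A minimal sketch (Python/NumPy, names assumed, not from the original notes) reproducing this example: the median ignores the data gap, while the mean is pulled down by it.

```python
import numpy as np

values = np.array([99, 100, 0, 101, 100])   # pixel intensities, 0 is a data gap

print(np.median(values))   # 100.0  -> the gap does not influence the result
print(values.mean())       # 80.0   -> the gap pulls the average down
```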
Example (original image of a hammer):
Random noise added to the original image; averaging with a 3x3 and a 9x9 matrix, and median filtering for comparison.

[Figures: original image, noisy image, 3x3 average, 9x9 average, median-filtered image]

-> the median filter suppresses noise with little smoothing
c) Fourier transform

Universal method for image preprocessing; can
be used globally and locally: contrast
enhancement, noise filtering, edge detection

Filter characteristics can be adjusted in many ways

Principle

decompose a (periodic) function f into partial
frequencies

e.g. a horizontal line in the image (row of pixels)
can be regarded as a function f of one parameter:

x: pixel number along the image line

f(x): gray level of the x-th pixel
f is decomposed into individual frequencies

theorem: any (periodic) function can be decomposed
in this way

f is only defined in a small sector (the image range); f can
thus be regarded as one section of a periodic function.

Result: f is represented as a weighted sum of
component functions sin(nx) and cos(nx).
Short: sn, cn denote sin(nx), cos(nx).

For large n, the function sin(nx) is a function with a
high frequency

Application
Consider a (horizontal) line in the image

Noise reduction: calculate the Fourier decomposition
and delete all partial functions with a high
frequency, i.e. all partial functions sin(nx), cos(nx)
for large n.
Edge enhancement: delete all partial functions
with low frequency:
edges are sudden changes of intensity
within one line;

abrupt changes correspond to steep
rises and can only be represented with high
frequencies,

-> deleting the low frequencies enhances
the sharp rises

Reverse transformation: After deleting certain
components of the form sin(nx), cos(nx),
reconstruct f from the remaining components only
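
The following sketch (Python/NumPy; the function name and the cutoff value are assumptions, not from the notes) applies the described procedure to a single image line: decompose it with the FFT, delete the high (or low) frequency components, and reconstruct by the reverse transform.

```python
import numpy as np

def filter_line(f, cutoff, mode="low"):
    """Fourier-filter one image line f.

    mode="low"  : keep only frequencies below `cutoff` (noise reduction)
    mode="high" : delete them instead (edge enhancement)
    """
    F = np.fft.rfft(f)                 # decomposition into frequency components
    if mode == "low":
        F[cutoff:] = 0                 # delete high-frequency components
    else:
        F[:cutoff] = 0                 # delete low-frequency components
    return np.fft.irfft(F, n=len(f))   # reverse transformation

# Example line: a step edge with some noise
x = np.arange(64)
line = np.where(x < 32, 10.0, 100.0) + np.random.normal(0, 2, 64)

smoothed = filter_line(line, cutoff=8, mode="low")    # noise reduced, edge softened
edges    = filter_line(line, cutoff=8, mode="high")   # large response near the edge
```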

Similar procedure is possible with a 2D image matrix
instead of a single line

(2D fourier transform)
Advantage: FT is independent of direction

Example:

[Figures: original image; original image plotted as a function; Fourier transform (power spectrum and plotted as a function); low-pass and high-pass filter masks; images after low-pass and high-pass filtering (obtained by the reverse transform); a vertical line is marked in the image and the intensities along this line are plotted as functions for the original, the low-pass and the high-pass result]
Edge detection

An edge appears in an image as a sudden change of
gray levels (intensities)

Goal of edge detection: find line segments or curves
where such sudden changes occur in the image

Fundamental method for most edge finding
procedures:
Mathematical function differentiation

I.e. regard image line as function f(x), where x is
the index (pixel number)

Calculate derivative df/dx or f ’(x) from f(x)

As long as the gray level remains constant, the
derivative is zero.
At edges, the derivative is distinct from zero.

For two-dimensional images (whole image instead of
single line in image), the derivative of the
intensity function f(x,y) depends on the direction

Example: derivative along a line y = y0 of the intensity matrix.

[Figures: intensity profile f(x, y0) along the line, its first derivative f'(x, y0), and its second derivative f''(x, y0), each plotted over x]

The absolute value of the first derivative corresponds to the
magnitude of the gray level change:

Edge detection with threshold value:
Whenever the value of the first derivative
exceeds threshold: report an edge at this pixel

Problem: How can the threshold value be
determined?
The second derivative is more sensitive to changes than
the first derivative (steeper slope).
The second derivative is also highly sensitive to
noise. Therefore it is typically applied only after
noise filtering.

At each edge f''(x,y) has a zero-crossing.
->No threshold value is necessary for second
derivative
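
As a small illustration (Python/NumPy sketch, names and values assumed), the first and second derivative of one image line can be approximated by finite differences; edges are reported where the first derivative exceeds a threshold, or where the second derivative has a zero-crossing:

```python
import numpy as np

line = np.array([10, 10, 11, 10, 60, 110, 110, 109, 110], dtype=float)

d1 = np.diff(line)            # first derivative: f(x) - f(x-1)
d2 = np.diff(line, n=2)       # second derivative

T = 20.0                      # threshold (must be chosen somehow)
edges_threshold = np.where(np.abs(d1) > T)[0] + 1          # pixels with a strong change
zero_crossings  = np.where(np.diff(np.sign(d2)) != 0)[0]   # sign changes of f''

print(edges_threshold)   # [4 5] : the rising edge
print(zero_crossings)
```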
Derivatives deliver different results for different
directions.

Desirable:
Direction-less edge detection
Edge detection without having to specify a threshold value
Gradient method: a procedure based on the first
derivative of the image matrix

Definition of the gradient for a function of two
variables f(x,y):

G(x,y) = ( ∂f(x,y)/∂x , ∂f(x,y)/∂y ) =: (fx, fy)
The direction of this gradient vector is the direction of
the greatest ascent/descent, with starting point
(x,y).

The absolute value of the gradient

|G(x,y)| = sqrt( fx² + fy² )

is a measure for the amount of change,
independent of direction.

The absolute value of the gradient is such a direction-less
edge detector.

If a preferred direction is desired:
instead of taking the absolute value,
multiply the gradient with a vector perpendicular
to the preferred edge direction.

Calculation of the gradient
Approximation of the partial derivatives by simple
differences:

fx = ( f(x,y) - f(x - Δx, y) ) / Δx

fy = ( f(x,y) - f(x, y - Δy) ) / Δy

Here Δx = Δy = 1 can be assumed. For the case of
discrete image matrices:

fi = f(i,k) - f(i - 1, k)

fk = f(i,k) - f(i, k - 1)

Implementation of both formulas as operators (masks):

Gx:   [ -1   +1 ]

Gy:   [ +1 ]
      [ -1 ]

(k is the column direction, i the row direction)
Application of the operator
move the mask over the entire image matrix and
multiply each intensity value with the weighting
factors (here +1 and –1)

Improvement: Use of more than two pixels and extra
weighting of some pixels:

(reduces noise sensitivity)
Example for such operators:

Gx:                        Gy:

-1   0   +1                +1   +2   +1
-2   0   +2                 0    0    0
-1   0   +1                -1   -2   -1

(x: column direction, y: row direction)
The combination of these two operators is known as
''Sobel-Operator''.
GS(x,y) = sqrt( Gx(x,y)² + Gy(x,y)² )
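
A minimal sketch of the Sobel operator (Python; SciPy's `convolve2d` is used here for the 2D convolution as a convenience, and the example image is made up — this is an illustration, not the original course code):

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel masks as given above
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Gy = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]])

def sobel(img):
    """Return the gradient magnitude GS = sqrt(Gx^2 + Gy^2) of an image."""
    img = img.astype(float)
    gx = convolve2d(img, Gx, mode="same", boundary="symm")   # responds to vertical edges
    gy = convolve2d(img, Gy, mode="same", boundary="symm")   # responds to horizontal edges
    return np.sqrt(gx**2 + gy**2)

# Example: a vertical step edge gives a strong response in gx, none in gy.
img = np.tile(np.where(np.arange(8) < 4, 10.0, 100.0), (6, 1))
print(sobel(img).round(1))
```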

There are many other possibilities for defining
operators based on the first derivative or gradients
Example:

"Roberts-Cross"
G(x,y) = MAX( abs(fx), abs(fy) )
Common feature of nearly all gradient-based
operators:
the maximum at the position of the edge is several
pixels wide.
Desirable: width of only one pixel
-> line thinning procedures necessary

-> Erosion, dilatation
(based on the Minkowski-sum)
Examples (Sobel operator):

[Figures: original image, Sobel magnitude (GS), vertical edges (Gx), horizontal edges (Gy)]

The operators can be tested online at

www9.in.tum.de:8000
Segmentation

Partitioning of the image into regions (segments or
''semantic units'') according to appropriate
homogeneity criteria

Example: Image with one large object. Instead of
finding the object’s edges, find the region
representing the object in the image

Result: more compact descriptions of the scene,

a higher abstraction level

Two procedures can be distinguished :

a) Homogeneity orientated segmentation:
construction of regions of similar image elements
until a discontinuity is encountered

(region-oriented)
b) Discontinuity orientated segmentation: search for
edges first, then connect the edges to region
boundaries with appropriate criteria

(boundary-oriented)

Construction of connected regions (search for
homogeneities)

Simplest method: threshold value

Starting from a starting point (x,y), all surrounding
image points are checked for their gray level.

A new image point at the position (x',y') is
included in the region, if:

abs( f(x,y) - f(x',y') ) <= T

Very simple procedure. Problems :

Where is the best starting point for a region?

How to choose the threshold value for a region?
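
A minimal region-growing sketch (Python; names assumed, not from the notes) for the threshold criterion above: starting from a seed pixel, 4-connected neighbors are added while |f(x,y) - f(x',y')| <= T. Here the comparison is made with the seed's grey level; comparing with the current pixel instead would be another common variant.

```python
from collections import deque
import numpy as np

def grow_region(img, seed, T):
    """Grow a region from `seed` = (i, k): include a 4-neighbor (i', k')
    whenever abs(img[seed] - img[i', k']) <= T."""
    M, N = img.shape
    region = {seed}
    frontier = deque([seed])
    while frontier:
        i, k = frontier.popleft()
        for di, dk in ((-1, 0), (1, 0), (0, -1), (0, 1)):       # 4-neighborhood
            p = (i + di, k + dk)
            if (0 <= p[0] < M and 0 <= p[1] < N and p not in region
                    and abs(int(img[seed]) - int(img[p])) <= T):
                region.add(p)
                frontier.append(p)
    return region

img = np.array([[10, 12, 90, 91],
                [11, 13, 92, 90],
                [12, 11, 89, 88]])
print(sorted(grow_region(img, seed=(0, 0), T=5)))   # the left (dark) region
```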
Frequently used method: region orientated
segmentation through partitioning and
reassembling ("Split - And - Merge")

Starting from the entire image, the current region is
partitioned into four square (quadratic) sub-regions.

Each square sub-region is partitioned further
(into four smaller quadrants) if the threshold value
inside the region is exceeded (splitting).

Algorithm stops, if no further splitting or merging is
possible.
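
A sketch of the splitting step (Python; the recursion scheme and the homogeneity test max - min <= T are assumptions made for illustration — the merge step, which joins neighboring homogeneous leaves again, is omitted for brevity):

```python
import numpy as np

def split(img, i0, k0, size, T, regions):
    """Recursively split the square block [i0:i0+size, k0:k0+size] into quadrants
    until each block is homogeneous (max - min <= T)."""
    block = img[i0:i0 + size, k0:k0 + size]
    if size == 1 or block.max() - block.min() <= T:
        regions.append((i0, k0, size))          # homogeneous leaf of the quadtree
        return
    h = size // 2
    for di, dk in ((0, 0), (0, h), (h, 0), (h, h)):   # the four quadrants
        split(img, i0 + di, k0 + dk, h, T, regions)

img = np.zeros((8, 8), dtype=int)
img[2:6, 2:6] = 100                              # one bright square object
regions = []
split(img, 0, 0, 8, T=10, regions=regions)
print(regions)                                   # quadtree leaves; merging would follow
```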
[Figures: example of Split-and-Merge — the start image is split into quadrants 1-4; quadrants that are not homogeneous are split further into sub-quadrants (21-24, 31-34, 41-44, and again 231-234, 321-324); neighboring homogeneous sub-regions are then merged again; next to each step the corresponding quadtree is shown]
Representation of the solution as a tree:

- Expansion of a node, if threshold value
exceeded

- Non-expanded nodes are merged to a node

"Split - And - Merge" avoids problem of searching for
a good starting point.

However: Threshold value still needed.

For scenes with known lighting and approximately
known image content:

Determination of the threshold value "by
educated guessing" of the user

In appropriate cases: use grey level histogram to find
threshold value:
[Figure: grey level histogram N(r) over the grey level r, with several maxima numbered 1-4]

x-axis: grey levels (e.g. 0..255)

y-axis: number of pixels with grey level 0, number of
pixels with grey level 1, ...

A large region contains many points of the same grey level. Thus, if there
are several regions with distinct grey levels, the
histogram of the image has several distinct
maxima and minima.

The threshold value limits are then placed at the
minima. This reduces the danger of splitting regions
which actually belong together.

Threshold value determination is particularly simple
in case of a histogram with two distinct maxima (so-
called “bimodal histogram”):

output image has only two gray levels
image can be processed as binary image:

Image elements of one gray level value are
assigned to objects, the other gray level
value represents the background
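
A small sketch (Python/NumPy; the function name, the peak-search heuristic and the example data are assumptions) of the histogram-based choice: compute the grey level histogram and place the threshold at the minimum between the two maxima of a bimodal histogram.

```python
import numpy as np

def bimodal_threshold(img, levels=256):
    """Place the threshold at the histogram minimum between the two largest peaks."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    # crude peak search: highest bin, then highest bin sufficiently far away
    p1 = int(np.argmax(hist))
    masked = hist.copy()
    masked[max(0, p1 - 20):p1 + 20] = 0
    p2 = int(np.argmax(masked))
    lo, hi = sorted((p1, p2))
    return lo + int(np.argmin(hist[lo:hi + 1]))    # minimum between the two maxima

# Example: dark background around grey level 30, bright object around 200
img = np.concatenate([np.random.normal(30, 5, 4000),
                      np.random.normal(200, 10, 1000)]).clip(0, 255)
T = bimodal_threshold(img)
binary = img > T            # binary image: object vs. background
```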

Advantages of pure binary image processing:

- very simple hardware

- fast ("real-time segmentation")

- straightforward analysis

These advantages are partially compensated by the
necessity of special lighting
Edge-orientated segmentation

Construction of contours by following sequences of
object edges in the image

Requires edge detection procedures (derivatives,
Sobel operator, etc.)

Connect adjacent edges to regions

Problem : How to connect edges, particularly in the
case of alternative solutions?

Two possibilities :
Contour detection is reduced to the search of a path
through a graph. Graph describes all transitions
between an image point and its “neighbor”.

As soon as the path has been found, the region is
described by its contour.
Advantage of the second method: It is possible to
use prior knowledge about the expected contours

By changing the heuristic used for the search one
can adapt to the respective problem. But the
determination of this heuristic can be difficult.

Procedure :

In discrete images an edge consists of ‘edge
elements’,

An edge element is the segment of an edge
between two adjacent pixels

Example: Pixels A and B have one edge element
in common, denoted by (A, B)
A   B   C
D   E

The edge elements are represented as nodes in a graph (starting point: A):

[Figure: search graph — the nodes (A,D) and (A,B) are reached from the start with costs k(A,D) and k(A,B); their successors are (D,E) with cost k(D,E), and (B,E), (B,C) with costs k(B,E), k(B,C)]

Thus:

Nodes in the graph are edge elements

e.g. edge element between pixel A and B
denoted by (A,B)
Successor of a node:

Edge element (A, B) has two end points.

The successors of the node (A, B) are the
edge elements touching these end points.

Example: (B, E) is a successor of (A, B).

Starting from a point S, adding a new edge element to an
existing contour causes a certain cost.

The cost function g(n) describes the cost for a path
from the starting point S to node n

The costs between any two nodes ni and nj are
denoted by k(ni, nj)

Thus edge detection is reduced to the search of the
shortest path in a graph. The set of all nodes
(edge elements), which are along the shortest
path, describes the desired contour

The cost function depends on the intensity difference
between the two image points P1 and P2 that
determine the edge element.
Large intensity difference between the pixels
(A,B):

Edge element (A, B) has low costs

Small intensity difference: pixels A and B belong
to the same region. Thus (A, B) should not be
part of the contour -> (A, B) receives high cost
(penalty)

Example for a contour:

Pixel grid with intensity values:

A=10   B=10   C=5
D=10   E=5    F=5
G=10   H=5    I=4

Graph:

[Figure: search graph from START over the edge elements (A,B), (A,D), (B,C), (C,F), (B,E), (D,E), (E,F), (E,H), (G,H), (H,I); each transition is labeled with its cost (values 5, 9 and 10 in the figure), derived from the intensity differences]

Finding a shortest path in a graph is equivalent to
finding a path with lowest total cost.

Several methods for finding shortest paths in graphs
are known
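
As an illustration, here is a compact Dijkstra-style sketch (Python; the graph, the cost values and the goal set are example assumptions inspired by the figure above, not an exact reproduction of it): contour detection then amounts to finding the cheapest path from the start node to a goal node.

```python
import heapq

# Nodes are edge elements, e.g. "A,B"; the costs k(ni, nj) are assumed example values.
graph = {
    "START": {"A,B": 10, "A,D": 5},
    "A,B":   {"B,C": 5, "B,E": 10},
    "A,D":   {"D,E": 5},
    "D,E":   {"E,H": 5, "E,F": 10},
    "B,C":   {"C,F": 5},
    "B,E":   {"E,F": 10},
    "E,H":   {"G,H": 5, "H,I": 9},
    "C,F":   {}, "E,F": {}, "G,H": {}, "H,I": {},
}

def cheapest_path(graph, start, goals):
    """Dijkstra: return the lowest-cost path from start to any goal node."""
    queue = [(0, start, [start])]        # (accumulated cost g(n), node, path so far)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in goals:
            return cost, path            # the path is the desired contour
        if node in visited:
            continue
        visited.add(node)
        for succ, k in graph[node].items():
            heapq.heappush(queue, (cost + k, succ, path + [succ]))
    return None

print(cheapest_path(graph, "START", goals={"H,I", "C,F"}))
```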

Graph searching techniques:

Given:
A start node and one or more goal nodes (goal
states). Wanted: an optimal (low-cost) path
connecting the start node to a goal node.

In the case of contour detection: a start node (= starting
image point) and several target nodes:

a) the closed contour is completely visible in the image:

starting point = end point

b) the contour lies only partially inside the image: the end
point is a pixel on the image boundary

Breadth first search always finds the shortest path, if
one exists

But:

Number of nodes searched is exponential

-> impractical in most cases
Thus:

Include previous knowledge via the cost function,

‘always only follow the one path currently having
lowest cost’ instead of all paths as in breadth-first
search
Application of the A*-algorithm to segmentation

Nodes again correspond to edge elements, as above

Cost function depends on the gradient as above, i.e.
the lower the intensity difference between the two
pixels of an edge element, the higher the costs of
this edge element

The heuristic function depends on the expected contour:
the greater the deviations from the expected
contour, the higher the heuristic cost of this path.

Example for the calculation of the heuristic function
h(n) for a node n:

a) Assume squares must be detected in the image.
Then choose h such that the costs increase with the
distance to the starting point. Additionally, any
deviation from a straight line which does not
amount to ±90° should cause high cost.

b) Likewise, to detect circles of known radius, the
function h can be chosen proportional to the
deviation from a given radius of curvature.

Thus h is only useful if the shape of the objects to be
detected is known in advance.

Object recognition

The goal of object recognition in images is to extract
assertions of the form:

"Image region X with the properties Y it is an
apple (a dog, an assembly part…), if projected
with method Z onto the image area."

To make such assertions, a model is needed, i.e.

a) Knowledge of all objects potentially occurring
in the image and

b) Means for the description of the current image
contents.
Approach :

a) Comparison of the characteristics of prototypes
with the observed features (characteristics) in the
image on the basis of statistical methods (decision
theory).

b) Classification with neural networks

Statistical approach:

An object in the image is characterized by n
descriptors xi (e.g. length, area, color,
circularity,…).

Assemble the descriptors into a so-called
characteristic vector:

x = (x1, x2, ..., xn)ᵀ

The decision is made in such a way, that a presented
object is mapped to a certain object class i.

Decision is made by evaluating a decision function
di(x).

If M different object classes have to be distinguished,
M decision functions are needed, which map the
characteristic vector to the different classes.
The decision is made in such a way, that
characteristic vector x of the object is inserted
into all decision functions. The object belongs to
that class whose decision function value is smaller
(greater) than all other decision function values,
i.e. xa belongs to the class i, if

di(xa) < dj(xa)   for all j ≠ i

In the simplest case the decision function is the
Euclidean distance between the characteristic vector x
and the prototype vector mi of the class:

di(x) = absx – miThe prototype vector for each
class is computed by "averaging" over a large
number N of actual characteristic vectors:

N
mi =  xk
1
Nk = 1
Very fast (parallelism !) decision making is possible,
but the process is relatively rigid
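
A minimal sketch (Python/NumPy; class names, feature choices and values are made up for illustration) of this minimum-distance classification: the prototype vectors mi are the averages of the training feature vectors of each class, and a new feature vector is assigned to the class with the smallest Euclidean distance di(x) = |x - mi|.

```python
import numpy as np

# training feature vectors per class (e.g. length, area, circularity) - made-up values
training = {
    "screw":  np.array([[5.1, 0.8, 0.2], [4.9, 0.7, 0.3], [5.3, 0.9, 0.25]]),
    "washer": np.array([[1.0, 0.6, 0.95], [1.1, 0.5, 0.9], [0.9, 0.55, 0.92]]),
}

# prototype vector m_i = (1/N) * sum of the N training vectors of class i
prototypes = {c: vecs.mean(axis=0) for c, vecs in training.items()}

def classify(x):
    """Assign x to the class with the smallest distance d_i(x) = |x - m_i|."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

print(classify(np.array([5.0, 0.85, 0.22])))   # -> "screw"
```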

Interpretation with neural networks
Characteristic vector as above

But first of all an adequate model is computed by
'training'

Training through back-propagation

Approach

Internal system parameters (weight values) are
adapted to best fit the training data

adaptation in small steps, similar to numeric
minimization

Basic model

[Figure: perceptron — binary inputs x1, ..., xn with weights g1, ..., gn are summed and compared with the threshold S]

'Perceptron'
Binary input values x1,..,xn (only 0 or 1)

Output also binary

Threshold value S

Weights g1,...,gn
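
A minimal sketch of this basic model (Python; the weights, threshold and inputs are example values chosen here, not from the notes):

```python
def perceptron(x, g, S):
    """Binary output: 1 if the weighted sum of the binary inputs exceeds threshold S."""
    weighted_sum = sum(gi * xi for gi, xi in zip(g, x))
    return 1 if weighted_sum > S else 0

# Example: two inputs with weights 1 and threshold 1.5 realize a logical AND
print(perceptron([1, 1], g=[1, 1], S=1.5))   # 1
print(perceptron([1, 0], g=[1, 1], S=1.5))   # 0
```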

Alternatives

State storage

[Figure: as above, but with an internal state Z and a general function f in place of the summation]

More general function f replaces summation, in
the above model f(x1,...,xn) = x1 + ...+xn

Function f depends on the state Z, Z is stored

Alternative 2
Combination with a logic gate

[Figure: binary inputs l1, l2, l3 are combined by logic gates; the gate outputs are weighted (g1, g3), summed and compared with the threshold S]

Logic gates transform many binary inputs to one
output

Goal: determine values gi from a number of training
data sets

each training data set has known classification

determine values gi such that a new previously un-
classified data set can be classified correctly

Training instances are pre-classified, i.e. each
training instance is marked as a positive or a
negative example for the feature to be learned
Neural network

[Figure: two-layer network — input layer, hidden cells with thresholds S1, ..., Sn, output cells with thresholds S1', ..., Sm']

resp. (threshold values transformed into weights):

[Figure: the same network with all thresholds set to 0; the former thresholds appear as weights on an extra constant input 1 in each layer; gij are the weights between the input layer I and the hidden layer, hij the weights between the hidden layer and the output layer O]

Binary input in cells in layer I

Binary output in layer O

Procedure for the calculation of an output for a given
input

1. Multiplication of the weights gij with the input values
2. Threshold value comparison in the hidden layer
3. Propagation into the output layer, multiplication with the weights hij
4. Threshold value comparison in the output layer

Training

Given instances

weights gij and hij must be calculated

Method:
Error Back-propagation

Thus:

In the beginning all weights are random
values

First input vector (x1,...,xn) of the training
set is propagated through the network in
forward direction (i.e. from left to right)

Delivers output vector (y1,...,ym)

Desired output is given in the training data
set!

Actual output (y1,...,ym) is compared to the
desired output (δ1,...,δm)
Call the difference between actual and desired
output the error F

Adapt weights gij and hij in such a way, that the
error F is minimized

Thus: consider the value
(y1 - δ1)² at the first output cell a1

Adaptation of the weight value h11 such that the
error value (y1 - δ1)² is reduced

Short cut: write h instead of h11

Consider the function F(h,x) in the rectangular
sector shown

[Figure: the network with the rectangular sector that feeds the first output cell a1 marked]

Zoom of the above sector with the error calculation added at the end:

[Figure: input x, weight h, threshold cell a1, output y, error (y - δ)²]

The error function F(x,h) is transformed into a
differentiable function by substituting the threshold
value function at a1.

I.e. the previous simple 0-1 threshold computation
is substituted by a differentiable function.

Specifically: replace the step function

[Figure: step function jumping from 0 to 1]

by

[Figure: smooth sigmoid function rising from 0 to 1]

so that node a1 transforms its input u into the value

s(u) = 1 / (1 + e^(-u))

(different choices for the latter function are possible)

How to adapt h ?

Determine one-dimensional derivative F’(x, h) of
F(x,h) with respect to h (x is regarded as a
constant here!!)

[Figure: plot of Fx(h) over h, with two sample points h1 and h2]
x is constant, thus we can write Fx(h)

To reduce the error, simply subtract the value F’x(h)
from h

Effect:
if F’x(h) is negative (i.e. at h2), error is
reduced
If F’x(h) is positive (i.e. at h1), error is also
reduced

Calculation of the derivative of Fx(h):

Set   r(h, x) = h·x

      s(u) = 1 / (1 + e^(-u))

      t(y) = (y - δ)²

As above, write

      rx(h) = r(h,x) = h·x

Thus:

      Fx(h) = t( s( rx(h) ) )

(δ and x are constant!)

Then

s'(u) = s(u) (1-s(u))

(Exercise: check the latter formula!!)

Therefore

F'x(h) = t'( s( rx(h) ) ) · s'( rx(h) ) · r'x(h)

(one-dimensional chain rule!)

       = 2 ( s( rx(h) ) - δ ) · s'( rx(h) ) · x

       = 2 ( s(u) - δ ) · s(u) (1 - s(u)) · x

where u = rx(h) = x·h
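
Putting the formulas above into a small sketch (Python; the step-size value, the initial weight and the concrete numbers are assumptions): the weight h is repeatedly reduced by the derivative F'x(h), which drives the error (s(xh) - δ)² down.

```python
import math

def s(u):                      # differentiable threshold replacement (sigmoid)
    return 1.0 / (1.0 + math.exp(-u))

def dF_dh(h, x, delta):
    """F'_x(h) = 2 (s(u) - delta) * s(u) * (1 - s(u)) * x, with u = x*h."""
    u = x * h
    return 2.0 * (s(u) - delta) * s(u) * (1.0 - s(u)) * x

x, delta = 1.0, 1.0            # input and desired output of cell a1
h = -0.5                       # random initial weight
eta = 1.0                      # step size (must be chosen experimentally)

for step in range(200):
    h -= eta * dF_dh(h, x, delta)   # subtract the derivative -> error decreases

print(h, (s(x * h) - delta) ** 2)   # error is now close to 0
```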
Visualization (one-dimensional)
[Figure: plot of Fx(h) over h, with the points h1 (positive slope) and h2 (negative slope) marked]

at h1 the derivative F'x(h) is positive,
thus subtracting F'x(h) reduces the error

in h2 F'x(h) is negative
thus subtracting F'x(h) also reduces the error

Extension to the overall network:

Extend the error function F to the entire network

Take derivative of

F(x1,...,xn) (g11,...,gkl, h11,...,hrs)

with respect to each individual gij (resp. hij)

(partial derivatives)
Change the weight vector in the direction of the
negative gradient, i.e. subtract the gradient

F0 = ( ∂F/∂g11, ..., ∂F/∂hrs )

from the weight vector

G = (g11,...,gkl, h11,...,hrs)

The derivative of F with respect to the weights gij
can be calculated as above.

Recall the derivative of a two-dimensional function

F : mapping R2 to R

The vector dF of the partial derivatives is
perpendicular to the isocurves (level curves);
dF points in the direction of the steepest
increase.

Example:

Function
F: (x,y) -> x

(Graph of the function is the plane in R3,
vector of the partial derivative is the vector
(1,0))
Adaptation of the weight vector G through
subtraction:

G → G - η·F0

Thereby η is a constant (the step size), which must be
determined experimentally.

Caution: the individual weight changes calculated above must
not be applied before all remaining weight changes have
been calculated and stored.

(Otherwise a typical programming bug would arise:
the calculated weight changes also depend on the
current values of the remaining weights.)

Alternatives

-    gradient descent with a momentum

-    more layers
