Associative Networks for Vision

James Austin and Aaron Turner

Advanced Computer Architecture Group
Department of Computer Science
University of York

1. Introduction
The work described here forms part of the basis of a collaborative research project,
"Vision by Associative Reasoning", supported by the SERC and DTI, just starting at the
University of York, UK. The project aims to provide methodologies to integrate high and
low level computer vision and aims to exploit parallel methods to achieve this. The
demonstrator for the project is the matching of ground-based images to maps, to be used in
guidance of airborne vehicles. The aim of this paper is to report recent work on a neural
network-based recogniser that will be developed further in the project. The design of the
recogniser is reported in detail elsewhere. This paper reports initial evaluations of an
enhancement to the recogniser that permits recognition invariant to selected image
transformations.

The original recogniser was aimed at object recognition and recall of complete prototype
images of the object recognised. As such it acts as a distributed associative memory
(ADAM). As its architecture is similar to multi-layer neural networks, it suffers from the
problems inherent in many networks of this type, namely an inability to recognise objects
other than in the position in which they were taught. The most recent work on ADAM has been
to enhance the network with a front-end processor that makes the recognition abilities of
the system invariant to any selected set of image transforms.

The following sections briefly present an overview of the ADAM system and the front-end
processor. Initial results on the robustness of the recogniser are then presented.

2. The ADAM memory
For brevity only a part of the ADAM memory is described here; this consists of the recog-
nition engine of the network. The architecture of this part of the network has been used in
the front-end processor to enable dedicated implementations to be efficiently utilized. As
shown in fig. 1 this part of the network is very similar to the N tuple recognisers used in
many applications. The operation of the network follows other neural networks, in that the
network has a training phase and a testing phase. However, these two phases can be
intermixed without losing the ability to recognise patterns previously taught, and training
does not require repeated application of the data.
ESPRIT-BRA Workshop on natural and artificial vision, 29-30 Jan 1991, France.

Whenever a pattern is presented to the network, a set of tuples is constructed from the
image, each N in size, and each is then individually applied to an encoding function. This
function computes a state for each tuple. The decoders shown in fig. 1 achieve this operation.

[Fig. 1 shows the input image feeding tuple decoders (inputs A, B), whose output lines
drive a correlation matrix memory that produces the class output.]

Decoder function (each 2-bit tuple activates exactly one of four output lines):

    Inputs A B     Outputs P Q R ·
      0 0            1 0 0 0
      0 1            0 1 0 0
      1 0            0 0 1 0
      1 1            0 0 0 1

                           Fig. 1. The front-end processor to ADAM.

The output of the decoders is fed to a correlation matrix memory, which associates this
output pattern with a ‘class’ pattern. The class is used as an identifier for each pattern class
and in the full ADAM memory is sent to a second stage for associative recall of a prototype
example belonging to the class. The correlation matrix learns by setting a link at
cross-points where both lines are active during training. During testing, the outputs of the
correlation matrix form a pattern which is thresholded to recover the class pattern origi-
nally taught. To permit both simple recall and efficient use of memory, the class pattern is
made up of a unique K bit pattern for each class. Because each class is constructed in this
way, thresholding during testing can be achieved by selecting the K highest elements of the
response from the correlation matrix memory.
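As an illustrative sketch of this train-then-threshold cycle, the following Python is our own toy reconstruction, not the original ADAM implementation; all class names, sizes, and patterns are invented for illustration:

```python
import numpy as np

class CorrelationMatrixMemory:
    """Binary correlation matrix: learns by OR-ing in outer products,
    recalls by summing active links and keeping the K highest sums."""
    def __init__(self, n_inputs, q, k):
        self.M = np.zeros((n_inputs, q), dtype=np.uint8)
        self.k = k

    def train(self, in_pattern, class_pattern):
        # Set a link wherever an input line and a class line are both active.
        self.M |= np.outer(in_pattern, class_pattern).astype(np.uint8)

    def recall(self, in_pattern):
        response = in_pattern.astype(int) @ self.M   # integer sum per class bit
        out = np.zeros(response.shape, dtype=np.uint8)
        out[np.argsort(response)[-self.k:]] = 1      # keep the K highest
        return out

# Usage: associate a sparse input with a 2-of-16 class code, then recall
# the code even after one input bit has been lost.
n, q, k = 64, 16, 2
cmm = CorrelationMatrixMemory(n, q, k)
x = np.zeros(n, dtype=np.uint8); x[[3, 17, 40, 41, 58]] = 1
c = np.zeros(q, dtype=np.uint8); c[[5, 11]] = 1
cmm.train(x, c)

noisy = x.copy(); noisy[3] = 0                       # drop one input bit
assert np.array_equal(cmm.recall(noisy), c)          # class still recovered
```

Note that K-highest thresholding needs no tuned threshold value: because every class code has exactly K bits set, selecting the K strongest responses is always the correct decision rule.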

The thresholding process used allows the original class to be recovered even if the input
pattern is very noisy. Because the class is a distributed pattern, the number of distinct
classes that can be represented in a q bit class array is

    patterns = q! / (K! (q − K)!)

where K is the number of bits set in each class.

However, due to saturation problems this limit cannot be reached.
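As a quick check of the formula, the count is the binomial coefficient, which Python's standard library computes directly (the values q = 116 and K = 2 are taken from the experiments in section 4):

```python
from math import comb

# Distinct K-of-q class patterns: q! / (K! (q - K)!)
q, K = 116, 2          # class array of 116 elements, 2 bits set per code
print(comb(q, K))      # -> 6670 distinct codes
```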

While many other associative memories have been developed (e.g. Kanerva's Sparse Dis-
tributed Memory, Gardner-Medwin and Willshaw's non-holographic memory),
none has used such a simple and effective method for recovering stored patterns.

The essential feature of the ADAM memory is its use of the K point class pattern. The fol-
lowing will show how this coding method has been used in the front-end processor.

3. Addition of Invariance
The theoretical aspects of the pre-processor used to allow invariance have been described in
Austin. Although the pre-processor was designed to be a front-end to ADAM, the descrip-
tion here is limited to its use as a pre-processor to a simple distance classifier. This has
allowed us to evaluate the general use of the method, which could be applied to many networks.

The pre-processor uses a very direct approach to achieve invariance. The description here
concentrates on testing invariance to rotation and scale transformations. However, in prin-
ciple, the method can be used to gain invariance to any transformation of the image.

The pre-processor works by efficiently encoding a set of transformation matrices which
represent the transformations to which the recogniser is to be insensitive. In use, when a
pattern is presented to the system, all the encoded transforms are applied to the image. The
resultant image represents the image under all encoded transformations. This image is then
passed to the classifier, which identifies the image and which transformation has been
applied to it to enable a match to be found.

The system encodes the transformations to both reduce computation and reduce the
memory needed to store the transformations. For example, a typical system may require
insensitivity to 20 rotations and 100 scale changes of the image. To allow any transforma-
tion to be stored, the transformation matrices would occupy 1.6 × 10^13 bits of storage.
Furthermore, the application of these 2000 transformations to obtain a best match would
require large computation resources if the operation were to be achieved in real time.

The system reduces the memory use by storing the transformation matrices in a memory
using the K point code. Briefly, each sparse transformation matrix is encoded using a
unique K point code that identifies the transformation. This is achieved by replacing each
location in the transformation matrix at logical 1 with the code, and all other locations with
zero. When all matrices are encoded they are logically ORed together, thus encoding them
into one memory. Application of this matrix to the unknown pattern results in a representation
of the pattern under each stored transformation, but where each pattern is now coded.
A stored template of the pattern under transformation T can then be compared to the coded
input pattern. At each point in the template, an element of the template pattern will
coincide with one of the coded points of the input pattern. Typically, all the coincidences
will contain the code for the transformation T. Other codes will of course be combined
with this code. However, a simple summing operation over these matches, followed
by K point thresholding, will recover the code.
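To make the mechanism concrete, here is a small sketch in Python. It is our own toy reconstruction, not the original implementation: cyclic shifts of a 1-D pattern stand in for the rotation and scale transforms, and all names and sizes are invented for illustration.

```python
import numpy as np

n, q, k = 32, 16, 2            # image points, code length, bits per code

# Hypothetical transforms (cyclic shifts), each with a unique 2-of-16 code.
shifts = [0, 5, 11]
codes  = [(0, 1), (4, 9), (2, 13)]

# Encode: at every (out, in) cross-point, OR in the code of each
# transform that maps input position `in` to output position `out`.
M = np.zeros((n, n, q), dtype=np.uint8)
for s, c in zip(shifts, codes):
    for i in range(n):
        M[(i + s) % n, i, list(c)] = 1

# A sparse test pattern (autocorrelation <= 1 at every nonzero shift).
x = np.zeros(n, dtype=np.uint8)
x[[0, 1, 3, 7, 12]] = 1

# Apply all encoded transforms at once: for each output point, OR the
# codes arriving from every active input point.
coded = M[:, x.astype(bool), :].max(axis=1)

# Compare with the stored template of x under the shift-5 transform:
sums = coded[np.roll(x, 5).astype(bool)].sum(axis=0)  # sum codes at template points
recovered = sorted(int(v) for v in np.argsort(sums)[-k:])  # K point threshold
print(recovered)   # -> [4, 9], the code of the shift-5 transform
```

Every template point contributes the full shift-5 code, while the other codes appear only at chance coincidences, so summing and keeping the K highest bits isolates the correct transform identifier.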

In effect, the result of applying the coded transformations is a virtual ADAM memory
which stores all the transformed representations of the unknown input pattern within it.
This can then be accessed in the same way as an ADAM memory is, to recover the
transformation code.
A full description of this can be found in Austin.

The coding results in a dramatic reduction in memory use. If the input is sampled in a Car-
tesian coordinate system, a reduction factor of 10^4 is achieved; if polar sampling is used, a
factor of 10^8 is achieved. This is also reflected in the reduced computational overhead,
although this comparison is difficult, as many ways of achieving what we have done here
are possible.

The following sections present initial results on a system which has the form described above.

4. Initial Evaluations of the Pre-processor

The invariant pre-processor was evaluated for a log-polar sampling window of 20 rotations
and 100 scaling increments. The system was loaded with transformations to make recogni-
tion invariant to all 20 rotations and 100 scales. Using the reductions described in Austin
the memory used by such a system was 7.76 x 10 5 bits. For all the evaluations randomly
generated patterns were generated and the ability to recognise the images in the 20 possible
rotations was assessed. To obtain reliable interpretations many runs were performed, and
the results averaged. This took a considerable amount of CPU time (months).

4.1. Experiment 1
The first evaluation examined the effect of K, the number of bits set to one in each code.
The size of the code array was set to 116 elements. The results are given in graphs 1 and 2.

These graphs clearly show that a sparse code is most effective. Typically, the number of
elements set to one in the code should be set such that all the transformations can be given
a unique code. Furthermore, optimal assignment is preferable (see Austin for one way to
achieve this).

The results also show how the method degrades gradually as the number of elements in the
code increases.

4.2. Experiment 2
This evaluated the effect that the image density had on recognition performance. Between
0 and 500 bits were set in the 2000 bit image (0-0.25 saturation). Graph 2 shows results for
a code size of 135 elements and 2 bits set, and for a code size of 60 and 3 bits set.

With a 135 element code, the performance drops sharply for image densities greater than
0.15 (300 bits set). For a 60 element code this occurs at 0.1 (200 bits set). These results
show the system performing very well, and in line with expectations. The fall-off in the
response is thought to be due to excessive autocorrelations occurring within the image.

4.3. Experiment 3
This final set of evaluations investigated the effect of the size of the code array. Evalua-
tions were performed for 2 bits set in each code over code array sizes 30 to 60. The image densities
were 0.075 and 0.25 for graphs 3 and 4 respectively. With an image density of 0.25, failure
to recognise the pattern is overwhelming. The results for 0.075 density show good recogni-
tion success (average 0.0175). The principal aim of these evaluations was to test the
hypothesis developed in Austin that variations in code size have little effect on perfor-
mance. This is borne out by the results.

5. Summary
This paper has presented initial results showing that the invariance pre-processor performs
well on clean data. The next stage of the work is to present noisy data to the system and
test its performance. Initial results bear out theoretical predictions, in that the scheme is
very robust to additive noise.

Graph 1
The number of rotations correctly identified (as a proportion of the total number of rota-
tions) as a function of the density of the class array.
Class size: 116
Image size: 20 x 100 = 2000 points
Image density: 0.075
Number of rotations: 20
