“The term texture generally refers to a repetition of basic texture elements called
texels. The texel contains several pixels, whose placement could be periodic, quasi-
periodic or random. Natural textures are generally random, whereas artificial textures
are often deterministic or periodic. Texture may be coarse, fine, smooth, granulated,
rippled, regular, irregular, or linear.”
(A. K. Jain: Fundamentals of Digital Image Processing)
My independent study on the texture analysis project is a continuation of a series
of experiments that I started last semester. In the fall, I investigated black and
white photographs depicting mostly natural scenes. My goal was to understand whether
the analysis of texture or the roughness of surfaces could allow me to interpret an array of
gray-scale intensities, and separate the pictures into different regions. For my analysis, I
mainly used statistical methods and the Fast Fourier Transform to describe the input.
This semester, I concentrated my attention on five new images that I acquired from a
medical laboratory. [Figure 1] The black and white images depict rat abdomen segments
with serious lesions on the lower portion of the (medically prepared) animal organ.
Besides continuing my previous studies in describing and interpreting the notion and
characteristics of “texture”, my plan was to give an interpretation of the above-mentioned
clinical images. One example of the latter goal would be to determine an estimate of the
area of the damaged tissues.
To distinguish the heavily damaged portions of the tissue from the rest of the
environment, the following sequence of steps has to be completed:
separation of the object from the background,
separation of the upper and lower parts of the organ (the lesion is visible only on
the lower portion!),
the recognition and characterization of the small dark entities of lesions.
These procedures all involve several interesting problems related to computer vision. For
example: line searching, image preprocessing, and intensity histogram evaluation.
It was towards the end of the fall semester when I started analyzing the output of
the Fast Fourier Transform. As the experiments seemed promising, but I had not yet
become familiar with all aspects of the mathematical formula, I spent almost two months
this spring creating test cases for my Fast Fourier Transform implementation. I wanted
to know how powerful this method could be as the first step of an image processing
routine.
An image is a spatially varying function. The Fast Fourier Transform examines these
spatial variations by decomposing the image function into sinusoidal (Fourier)
components. It is an efficient algorithm for computing the discrete Fourier transform,
which converts an intensity image (an image expressed by the gray-scale values of its
pixels) into the domain of spatial frequencies. In order to assure myself of the
correctness of the algorithm, I created some black and white images of sine waves and
tested the outcome of the FFT code on them [Figure 2]. For example, in the case of these
vertical sine waves, the image of the FFT output is a horizontal line. The latter always
indicates the direction orthogonal to the dominant direction of the original image. This
approach allowed me to understand the transform deeply and
to realize what effect a change of periods and function coefficients has on the displayed
images. Decreasing the period of the function increases the frequency (the two are
inversely related), and changing the coefficients merely modifies the amplitude range of
the function image. These tests also helped me form justifiable estimates of what
outcomes I could expect in the case of real photographs.
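The behavior described above can be reproduced with a tiny, self-contained sketch. This is pure Python using a naive discrete Fourier transform rather than a true FFT, and an 8 x 8 synthetic image in place of my real inputs; the image size and frequency are illustrative assumptions:

```python
import cmath
import math

N = 8   # tiny test image, N x N pixels
f = 2   # spatial frequency of the sine pattern (cycles per image width)

# Vertical sine stripes: intensity varies along x only
image = [[math.sin(2 * math.pi * f * x / N) for x in range(N)] for y in range(N)]

def dft2(img):
    """Naive 2D discrete Fourier transform (O(N^4), fine for tiny tests)."""
    n = len(img)
    out = [[0j] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            s = 0j
            for y in range(n):
                for x in range(n):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
            out[v][u] = s
    return out

F = dft2(image)
mag = [[abs(F[v][u]) for u in range(N)] for v in range(N)]

# The only significant responses lie on the v = 0 row (the "horizontal line"),
# at u = f and its mirror u = N - f.
peaks = [(v, u) for v in range(N) for u in range(N) if mag[v][u] > 1e-6]
```

Running this confirms the observation in the text: a vertical sine pattern produces responses only along the horizontal frequency axis, at the pattern's frequency and its mirror.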
In my project, the FFT pictures consist mainly of the values 0 and 255. The
explanation is the following: the output values of a Fourier transform vary over a huge
range; in my images, the lowest value is approximately –9000 and the highest is in the
20,000s. It is not possible simply to discard these extreme values and calculate with the
rest of the data, because too much information would be lost. As a result, after
normalizing the gray-scale values and typecasting the results into integers, the majority
of the data becomes 0 or 255. That is why usually only two values are visible on the
graphs.
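A simplified model of this normalization step shows the collapse. The magnitudes below are made up; only the huge dynamic range matters. A linear mapping to 0..255 followed by an integer typecast drives almost everything to 0 while the dominant peaks land at 255:

```python
# Mostly tiny magnitude values, a few huge peaks (illustrative numbers)
magnitudes = [0.5] * 96 + [20000.0] * 4

lo, hi = min(magnitudes), max(magnitudes)
# Linear normalization to the 0..255 gray-scale range, then integer typecast
scaled = [int((m - lo) / (hi - lo) * 255) for m in magnitudes]

zeros = scaled.count(0)      # the bulk of the data collapses to 0
whites = scaled.count(255)   # the few peaks map to 255
```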
I implemented the Fast Fourier Transform in a somewhat unusual way. Instead of
processing one big portion of the input photograph as a whole, I divided it into 300
sub-blocks, each 32 x 32 pixels in size. I was curious whether comparing the output of
the transform of these sub-images would let me describe similarities and differences
between different objects and surfaces. The results of this multiple FFT method clearly
separated the background from the object, and also indicated variances inside the
observed tissue. [Figure 3] The background, with its low-frequency content, is
represented merely by a white dot in the middle of the x-y coordinate system. All other
regions, with higher variance, produce a different FFT output.
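The block decomposition itself is simple. A minimal sketch, assuming a 640 x 480 input and a dummy image in place of the rat photographs:

```python
WIDTH, HEIGHT, B = 640, 480, 32   # image dimensions and block size (assumed)

# Dummy grayscale image, stored as a list of pixel rows
image = [[(x + y) % 256 for x in range(WIDTH)] for y in range(HEIGHT)]

def split_blocks(img, b):
    """Cut a grayscale image (list of rows) into b x b sub-blocks."""
    return [[row[bx:bx + b] for row in img[by:by + b]]
            for by in range(0, len(img), b)
            for bx in range(0, len(img[0]), b)]

blocks = split_blocks(image, B)   # 15 block rows x 20 block columns = 300 blocks
```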
The next problem to attack was finding a way to compare these results and describe
their relationship to each other. At that time, I had just read about Pyramid Processing,
and I thought that approach might be useful in my attempts.
A resolution pyramid is a generalized image data structure consisting of the same
image at several successively increasing levels of resolution. The value of each pixel at
level j can be regarded as a weighted sum of the values of a small number of pixels at the
level below it. The higher the resolution, the more samples are required to represent the
increased amount of information; hence, the higher the level, the larger its size
becomes. The finest level of the pyramid is the input image itself. The dimensions of
the representation arrays typically increase by a factor of 2 between adjacent levels of the
pyramid. (In the traditional version of the pyramid, the pixel-count relationship between
adjacent levels is strictly one-to-four.) The idea behind this procedure is to complete
complex calculations at a lower level and then refine the results while moving toward the
higher levels.
There are two ways of implementing this approach. One can reduce the image to a
lower level by sub-sampling the filtered image (e.g., selecting one representative pixel
out of every four at random) and then carry out the complex computations on the smaller
image; or this order can be reversed. The choice of implementation should depend on the
weight of the computations at each level.
I experimented with the first method. I took the original 480 x 640 black and
white image and arrived at a 120 x 160 representative by twice applying the following
averaging process: for each 2 x 2 block of pixels in the current image, I calculated the
average intensity, and this value represented that block with a single pixel on the “first-
step” of the pyramid. I repeated the same method to obtain the final, second-level image.
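The two reduction steps can be sketched as a simple 2 x 2 average-pooling routine. The dummy image below stands in for the original photograph; the dimensions match the report (480 rows x 640 columns):

```python
def halve(img):
    """One pyramid reduction step: replace every 2 x 2 pixel block by its average."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

# Dummy 480 x 640 (rows x columns) image standing in for the original photograph
image = [[(x * y) % 256 for x in range(640)] for y in range(480)]

level1 = halve(image)    # first-step pyramid image: 240 x 320
level2 = halve(level1)   # second-level image: 120 x 160
```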
Before trying to analyze the low-resolution pyramid picture, I started implementing my
algorithms at the highest level, where even the fine changes were visible. I submitted the
regular-sized image, with its 300 image blocks, to the FFT and displayed the results on
the screen. Then I applied one of my three comparison algorithms to describe the
characteristics of the output. I tried to find all the image blocks that demonstrated
similar properties and to color them with the same gray-scale value. The different
matching procedures are described in the following.
1. My first method is essentially region growing: it finds all the image blocks that
belong together. I began my operation in the upper left corner of the
image. This image block is referred to as the “comparison base”. It is assigned the
highest possible gray-scale value, 255. After calculating the “distance” between the base
and another image block, a color value is assigned to the second one. If the calculated
distance exceeds a predetermined threshold, the color attribute of the second block
becomes the base color minus a constant; otherwise it remains unchanged. This
procedure is repeated until the base block has been compared to all the remaining blocks.
Then I take the first image block possessing a lower intensity value than the current base,
name it the new “base” and its color the new “base color”, and apply this process
recursively until no block with a lower intensity value (compared to the current base) can
be found.
[Figure 5] Although the result of this procedure is satisfactory, and it does not depend on
the location of the first base block, it demands a lot of CPU time. This inefficiency,
however, can be corrected. Instead of decreasing the value of the compared block only
by a constant value, it should be decreased proportionally to the computed difference. In
this manner, it is enough to parse the input image only once (and not 255/constant times
in the worst case).
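The proportional, one-pass variant can be sketched in a few lines. The per-block feature values below are hypothetical stand-ins; in the project, the “distance” is computed from the FFT output of each block:

```python
BASE_COLOR = 255   # the base block gets the highest gray-scale value

def color_blocks(features):
    """One-pass proportional coloring: darken each block by an amount
    proportional to its distance from the first (base) block, instead of
    by a fixed constant per pass."""
    base = features[0]
    return [max(0, BASE_COLOR - abs(f - base)) for f in features]

# Hypothetical per-block feature values: two visually similar groups of blocks
colors = color_blocks([200, 198, 60, 55, 199, 58])
```

Blocks similar to the base keep colors near 255, while dissimilar blocks end up proportionally darker, all in a single pass over the image.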
This enhancement has its drawbacks. If the first base block is not selected properly,
then the output of the comparison is less meaningful. As the title of “base block” is
assigned only once, the starting base block has to reside in an area of “interest” to our
experiments. Therefore, one should attempt to pick regions that are potentially rich in
information (e.g.: highly varying surfaces or borderlines of different objects) and try to
avoid areas of low frequency (such as backgrounds, or plain fields.) It is extremely
difficult to determine a completely general method for finding an adequate starting block
on a yet unseen image.
2. My second algorithm, similarly to the above-described one, assigns a specific
color value to a whole image block. But, instead of arbitrarily selecting this value, the
color attribute is computed. The algorithm focuses on intensity distribution. It calculates
the average intensity of the block, and estimates whether the FFT outputs are more
dominant along the x or y coordinate axis. This code provides a crude approximation for
calculating similarities between images. Its output is meaningful; nonetheless, it does not
contain much information for further analysis of the image regions. [Figure 6]
3. The third method also calculates the distance between two image blocks. Here,
however, each block is compared to only two other blocks: the neighbor to the right and
the neighbor below. If their distance exceeds 15% of the sub-image content, then a
borderline is drawn between the blocks. (The percentage rate can be calculated as 15%
of the total number of pixels in the image block, as the graph essentially consists of two
gray-scale values: 0 and 255.) This method again clearly separates the object of primary
interest from the background, but a finer comparison step is still missing. [Figure 7]
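The third method reduces to a simple pixel-difference count against the 15% threshold. The two blocks below are hypothetical binary (0/255) FFT outputs standing in for real ones:

```python
B = 32
THRESHOLD = 0.15 * B * B   # draw a borderline if more than 15% of the pixels differ

def distance(a, b):
    """Count differing pixels between two equally sized binary (0/255) blocks."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

# Hypothetical FFT-output blocks: uniform background vs. a block with a response
background = [[0] * B for _ in range(B)]
object_block = [[255 if x < 8 else 0 for x in range(B)] for y in range(B)]

draw_border = distance(background, object_block) > THRESHOLD
```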
At this stage of the experiments, it seemed difficult to make progress by merely
applying the Fast Fourier Transform. Therefore, I started looking for other methods.
The simplest procedure that could quickly assist me in drawing a borderline around
the examined object was determining a threshold value ad hoc and painting every pixel
below that value to “x” and every other pixel to “y” (where x and y are non-negative
integers between 0 and 255). This technique can provide bits of useful information, but a
lot of detail remains hidden. The fixed threshold value is also a serious limitation.
To be able to use a more general algorithm, it is necessary that the threshold value be
calculated based on the actual/current image characteristics. In order to gain more
information about the original image and its intensity distribution, one can construct a
gray-scale histogram. A gray-level histogram is a function that gives the frequency of
occurrence of each intensity value in the input image. Concretely, the intensity histogram
is a 256-element integer array whose element at index v contains the number of pixels
holding the value v in the image.
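Computing such a histogram is a single pass over the pixels. A minimal sketch with a toy image:

```python
def gray_histogram(img):
    """Return a 256-element array; hist[v] counts the pixels with intensity v."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    return hist

# Toy 2 x 3 image standing in for a real photograph
hist = gray_histogram([[10, 10, 200], [200, 200, 255]])
```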
In the case of the rat images, one can observe that the majority of the picture elements
form clusters around a small percentage of intensity values. (Chart 1) This means that
the images could be, in fact, described by merely using a limited set of gray-scale values.
By examining the rat image intensity histograms, it is clearly visible that the images
could be represented by merely three values (as a general characterization of the input
photograph). The two local minima that separate the most “favored” intensity values are
clearly visible (around 162 and 225 on Chart 1), but it is a great challenge to find them
computationally.
To ease the calculation task, and emphasize the high frequency (more “interesting”)
portions of the input array, I applied an image enhancement technique. Image
enhancement techniques are used to focus on and sharpen image features for display and
analysis.
They can be applied during pre- or post-processing. In computer vision, specifically,
they are used as a pre-processing tool, for example, to strengthen edges of an object. By
nature, these operations are application specific and require problem domain knowledge.
Therefore, they have to be developed empirically.
Enhancement methods operate in the spatial domain by manipulating the pixel data
or/and in the frequency domain by modifying spectral components. The operations can
be grouped into three major groups:
point operations: each pixel is modified according to an algorithm that is not
dependent on any other picture element’s intensity value
mask operations: the pixels are modified according to the intensity values of
the neighboring pixels
global operations: the pixel values are changed by taking all the pixel values
of the image into account
Gray-level scaling procedures belong to the category of point operations and operate
by changing the pixel values by a mapping equation. A mapping equation is generally
linear and maps the original gray-scale values to other, specified intensities. Some of the
most frequent operations are shrinking, stretching and intensity-scale slicing. They are
most frequently used for feature and contrast enhancement.
An alternate procedure to gray level scaling is histogram modification. Stretching
and shrinking operations also exist in this case, and they are often referred to as
histogram scaling. In general, an image with a wide spread of intensities has high
contrast; an image with its histogram clustered at the low end of the range is dark, while
one with values gathered at the high end of the range is bright. The rat pictures belong to
the latter category: because of the high percentage of white background, a great number
of pixels gather at the high end of the range. Some histogram modification algorithms
are the histogram slide (a modification that retains the original relationship between the
picture elements), histogram equalization (a technique for making the intensity
distribution as uniform as possible) and histogram specification (an interactive way of
manipulating the histogram). These operations usually improve the detectability of
picture features by expanding the contrast.
In my project, I defined a histogram modification algorithm myself. It is called
"color spreading", and it combines both a clipping and a stretching operation. Stretching
a histogram has the effect of increasing the contrast of a low contrast image. (The shape
of the original histogram, the relative distribution of the gray-scale intensities remains the
same.) The clipping method is also necessary, as there are some so-called “outliers”.
This expression refers to a small set of values that forces the histogram to span the entire
range. Clipping a small percentage of picture element values at the low and high end of
the range, in this case, can be extremely effective. My mapping equation is one that
allows stretching the values of a subinterval to the whole interval. To make the
distribution more even over the whole interval of intensity values, though, I apply the
clipping algorithm first. I trim off a certain percentage of the gray-scale values from the
two ends. I declare the min and max values for the algorithm at the gray-scale values
where the sum of the pixel elements, counted from the beginning of the interval, exceeds
the given percentage of the total number of pixels. Then the gray-scale intensities are
recalculated to cover the 0-255 range. [Figure 8 & Chart 2] This mapping should
theoretically be continuous; however, because of truncation errors, roughly every third
value is evaluated as zero. This makes the already difficult local min-max search even
more convoluted. [Figure 10] shows the output of the color spreading operation followed by
the 3_pixel_comparison to draw the borderlines. Examining this image and [Figure 7],
one can see that the drastic difference between some of the sub-image blocks decreased
due to the histogram manipulation, and the borderlines between those blocks are not
depicted any more.
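The color spreading operation described above (clip, then stretch) can be sketched as follows. The 5% clip rate is an assumption; the report does not fix the exact percentage:

```python
CLIP = 0.05   # fraction of pixels clipped at each end (assumed rate)

def color_spread(img, clip=CLIP):
    """Clip a small fraction of outlier pixels at both ends of the histogram,
    then linearly stretch the remaining [lo, hi] range to [0, 255]."""
    pixels = [p for row in img for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    limit = clip * len(pixels)
    cum = 0
    for lo in range(256):            # min: first bin where the cumulative count exceeds the clip limit
        cum += hist[lo]
        if cum > limit:
            break
    cum = 0
    for hi in range(255, -1, -1):    # max: the same search from the top end
        cum += hist[hi]
        if cum > limit:
            break
    span = max(hi - lo, 1)
    # Stretch [lo, hi] to [0, 255]; clipped outliers are clamped to the ends
    return [[min(255, max(0, (p - lo) * 255 // span)) for p in row] for row in img]

# Toy image: two intensity clusters at 100 and 150 get spread to 0 and 255
stretched = color_spread([[100] * 50, [150] * 50])
```

The integer division in the mapping is what introduces the truncation gaps mentioned above: the stretched histogram becomes sparse, with many empty bins.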
After many trials, I found a way to approximate the two local minima on the
histogram that gives sufficient results in all 5 cases of the rat images. Thus, I managed to
paint my images with three colors: black, gray and white. [Figure 9] Then I applied my
old 3_pixel_comparison algorithm to paint a dividing borderline between all image
pixels of different values. The last step is to color all the borderlines black and the rest of
the picture white. One of the resulting images can be seen in [Figure 11]. With this
method, not only can I distinguish the rat abdomen from the background, but I can also
find an approximately continuous borderline around it.
From this stage, the next step is to separate the lower portion from the upper part with
a line and eliminate everything else in the picture. Then one should analyze the original
pixel array for that restricted data set to locate the lesions and calculate their area.
The most recent problem that I was working on is noise removal. The need for
this operation occurred to me after finally obtaining the rough outline of the examined
objects. [Figure 11] This operation is necessary as individual pixels or pixel groups can
often satisfy the coloring criteria, but they do not hold relevant information to my
analysis. Hence, my intention is to clear them from my output display. It is a
complicated problem to describe and locate these noise values, because it is impossible to
predict their shape, size and relationship to the “true” values in advance. I have two
functions written already: clearImage and clearSpots. In both cases, I use a window to
parse the whole image that is under analysis. The “clearImage” function handles a 3 x 3
window. The examined pixel is located in the top left corner. If that particular pixel is
black (indicating a borderline between different objects/regions) but all the other pixels in
the window belong to the background, the examined pixel is deleted. It is assigned the
background intensity value. The “clearSpots” algorithm applies a bigger, 7 x 7 window.
If any pixels in the 3 x 3 center of this block vary from the background color while all the
surrounding picture elements belong to the background, the middle portion is cleared.
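A sketch of the “clearImage” idea, with the examined pixel in the top-left corner of its window as described above (the window placement and the 0/255 color convention are taken from the text; everything else is illustrative):

```python
BG, FG = 255, 0   # white background, black borderline pixels

def clear_image(img):
    """Delete isolated black pixels: if a pixel is black but every other
    pixel in its 3 x 3 window (examined pixel at the top-left corner) is
    background, reset it to the background value."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h - 2):
        for x in range(w - 2):
            if img[y][x] == FG and all(
                    img[y + dy][x + dx] == BG
                    for dy in range(3) for dx in range(3)
                    if (dy, dx) != (0, 0)):
                out[y][x] = BG
    return out

noisy = [[BG] * 5 for _ in range(5)]
noisy[1][1] = FG                 # an isolated black pixel: noise
cleaned = clear_image(noisy)     # the lone pixel is reset to background
```

“clearSpots” follows the same pattern with a 7 x 7 window and a 3 x 3 center region instead of a single pixel.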
Both of these methods attack only small-scale noise. They are good tools for final
output refinement, but they are not sufficient for deleting bigger, isolated blocks of
picture elements. In my texture analysis, these two methods are called one after the
other: first the bigger window is applied, and then the “clearImage” function deletes the
leftovers.
[Figure 12] I am satisfied with their performance. However, I am still looking for an
operation to separate the borderline of the area covered with the lesions.
As I have already hinted in the above sections, throughout this semester, I
encountered a lot of difficult problems. In the following, I will describe two of these, the
threshold and the min-max search in particular.
I always had concerns about using threshold values in my computations. It seems
that applying a threshold value just weakens an algorithm, as it implies some background
knowledge about the image being processed. If I intend to write general code (a program
that can be used with photographs different from my 5 clinical images), counting on
image-specific features is extremely limiting. For example, in the case
of my rat images I could break down the histogram into three different sections that
represented the regions fairly well. However, applying the same operation to another
image, for example to a natural landscape, does not function properly. The latter scene
could have (and probably does have) a lot more variations overall, and the majority of the
picture is of higher frequency. Hence, it is not correct for me to assume that there exist
three main gray-scale values, which could represent the image at a base level. Therefore,
I have intended to create some algorithms that could be applied to a wide range of
images. These functions compute the threshold and limit values from features of the
original input photograph. One example of that is the comparison method called
compareFFTs. Here, I account for the similarities between image blocks implicitly: I
calculate three characteristic features of an image (average value, row and column
distribution of white colored pixels), and then assign a gray-scale value computed from
these results to each image block. [Figure 6]
On the other hand, I have to admit that if I am working with only a small set of
images, it is unreasonable to give up any extra information that could improve my results.
The other problem that occupied my mind for a long time was the local min-max
search on a histogram. How can I determine the critical values on a given interval after I
have found one local minimum or maximum? How can I be assured that a “high” value
does not belong to the same sub-curve where the other maximum was found? My search
became even more difficult, as after the image pre-processing technique (which modifies
the histogram distribution) the histogram became discrete. Hence, I had to disregard
certain values in my comparison, and could not assume properties of a continuous
function any more. I tried to characterize certain intervals instead of just one array
element in the search, but this method did not turn out to be a lot more useful either.
Because of truncation and typecasting, the values belonging to adjacent array elements
can change drastically (not smoothly), so the characteristics of an interval are also
complicated to predict. Finally, my solution to the problem was to use some heuristics.
I implemented some test cases examining the neighborhood of the potential critical
points, and then decided whether a candidate is likely to qualify (at least theoretically) as
a local minimum or maximum on the given interval. That method performs well on the
set of images that I possess right now; however, I have not had a chance to test it
thoroughly on other types of photographs.
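My heuristic can be sketched roughly as follows. This is pure Python; the neighborhood radius and the toy histogram are illustrative, not the exact tests I run:

```python
def local_minima(hist, radius=3):
    """Heuristic local-minimum search on a histogram riddled with empty bins
    (artifacts of truncation): ignore zero bins, and accept bin i only if its
    count is below every non-empty bin within `radius` on both sides."""
    minima = []
    for i in range(radius, len(hist) - radius):
        if hist[i] == 0:
            continue   # an empty bin is a truncation artifact, not a real minimum
        left = [hist[j] for j in range(i - radius, i) if hist[j] > 0]
        right = [hist[j] for j in range(i + 1, i + radius + 1) if hist[j] > 0]
        if left and right and hist[i] < min(left) and hist[i] < min(right):
            minima.append(i)
    return minima

# Toy histogram: a descending slope, a sparse dip around 162, then a rise
hist = [0] * 256
hist[158], hist[159], hist[160] = 60, 55, 50
hist[162] = 5
hist[164], hist[165] = 40, 45

dips = local_minima(hist)   # finds the dip despite the surrounding zero bins
```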
In summary, although I have not been able to fulfill all my plans this semester, I
became aware of many new image-processing methods and managed to obtain some
essential information after processing the input photographs. The final steps of my image
interpretation process are still missing, nevertheless, I have created a collection of
techniques (image preprocessing, the Pyramid Process, histogram manipulation, etc.) that
could prove to be useful in my further studies and experiments, too.
If I were to continue the texture analysis project, I know exactly what I would do
next. As the 3 color histogram analysis and the noise clearing functions already provide a
good approximation of the lower abdomen, I would carry on the project from their output
images. I would try to use the FFT results to get rid of the bigger unwanted groups of
pixels. (This is because the FFT proved to be a good estimator of the boundaries.)
Also, applying the Fast Fourier Transform to smaller sub-images (e.g.: 8 x 8), might
result in even finer details. After obtaining the continuous boundary line of the lower
abdomen, I would restrict my image processing window size to one that just covers the
indicated region. (In this way I could save a lot of unnecessary CPU calculations on the
background.) Then I would start analyzing the inner region using the very initial values.
Throughout that process I might be able to incorporate the 2nd step pyramid image, too.
Carrying out calculations on that level, especially with the reduced region, would be
extremely fast. On the higher resolution image representatives, I would only have to
refine the results. After locating the lesions on the examined area, I would have to count
the pixels belonging to these damaged areas.