WAVELET-BASED TEXTURE RETRIEVAL AND MODELING VISUAL TEXTURE PERCEPTION
ZHOU SHAOHUA
NATIONAL UNIVERSITY OF SINGAPORE 2000
WAVELET-BASED TEXTURE RETRIEVAL AND MODELING VISUAL TEXTURE PERCEPTION
BY
ZHOU SHAOHUA (B. Eng., Univ. of Sci. & Tech. of China) DEPARTMENT OF ELECTRICAL ENGINEERING
A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF ENGINEERING NATIONAL UNIVERSITY OF SINGAPORE 2000
Acknowledgment
I am indebted to my supervisor, Professor Y. V. Venkatesh, for his valuable advice on my research and my life as well. I received significant help during my pleasant stay at the Computer Vision Laboratory, Indian Institute of Science (IISc), Bangalore, from him and his colleagues, to whom I extend my sincere thanks.
I am also grateful to my supervisor, Professor C. C. Ko, for his sustained encouragement throughout my research, and the frequent seminars arranged by him, which stimulated my thoughts and enhanced my knowledge. I wish to thank my friends at the National University of Singapore (NUS), who helped me in innumerable ways during my study.
This is an occasion for me to express my special gratitude to my wife, Zhang Chunhui, for her patient and everlasting support and love.
Table of Contents
Acknowledgment
Table of Contents
Abbreviations and Symbols
List of Figures
List of Tables
Summary
Chapter 1  Introduction
    1.1  Texture Retrieval
    1.2  Visual Texture Perception
    1.3  Organization of the Thesis
Chapter 2  Mathematical Preliminaries
    2.1  The Fourier Transform
    2.2  The Gabor Transform
    2.3  The Wavelet Transform
        2.3.1  Wavelet Decomposition of Images
Chapter 3  Texture Retrieval Using the Tree-Structured Wavelet Transform
    3.1  Tree-Structured Wavelet Transform
    3.2  Retrieving Issues
        3.2.1  Distance function for retrieval
        3.2.2  Retrieval efficiency
    3.3  Experimental Results
Chapter 4  Spectral Interpretations of Visual Texture Perception
    4.1  Visual Texture with Complete Periodicity
    4.2  Visual Texture with Positional Jitter
    4.3  Visual Texture with Orientation Randomness
    4.4  Visual Texture with Positional Jitter and Orientation Randomness
Chapter 5  Experiments and Discussions on Visual Texture Perception
    5.1  Experiment I: Periodicity and Randomness
    5.2  Experiment II: Positional Jitter
    5.3  Experiment III: Orientation Randomness
    5.4  Experiment IV: Perceptual Asymmetry I
    5.5  Experiment V: Perceptual Asymmetry II
Chapter 6  Conclusions
References
Appendix A  Filter Coefficients of Wavelet Transforms
Appendix B  The Notation of Subband ID in Table 3.2
Abbreviations and Symbols
Abbreviations
HVS     Human Visual System
TSWT    Tree-Structured Wavelet Transform
MRF     Markov Random Field
GPD     Gibbs Probability Distribution
GMRF    Gaussian Markov Random Field
SAR     Simultaneous Autoregressive
CWT     Continuous Wavelet Transform
MRA     Multiresolution Analysis
QMF     Quadrature Mirror Filter
PDF     Probability Density Function
SNR     Signal-to-Noise Ratio
Important symbols
$f(m, n)$    image function
$F(u, v)$    Fourier transform of $f(m, n)$
$g(x, y)$    Gabor function
$G(\mu, \nu)$    Fourier transform of $g(x, y)$
$\sigma_x, \sigma_y, f_c, \theta, \alpha$    parameters of $g(x, y)$
$\psi(t)$    mother wavelet function
$\phi(t)$    scaling function
$\{\psi_{k,l}(t)\}$    orthonormal wavelet basis
$\{\phi_{k,l}(x)\}$    orthonormal basis of scaling functions
$\{W_n\}_{0 \le n}$    wavelet packets basis
$h(n)$, $g(n)$    impulse responses of QMF $H_Q$ and $G_Q$
$E_s$    subband energy
$R_s$    ranking order of subband
(m, n)    single stimulus
$f_0(m, n)$    visual texture with complete periodicity
$f_1(m, n)$    visual texture with positional jitter
$f_2(m, n)$    visual texture with orientation randomness
$f_3(m, n)$    visual texture with positional jitter and orientation randomness
$SNR_u$, $SNR_v$    newly defined SNR measures
List of Figures
Fig. 1.1  Examples of surface and visual textures. Surface textures (Brodatz's textures): (a) D15. (b) D84. (c) D100. Visual textures: (d) Random dots. (e) L-shaped stimuli embedded in +-shaped stimuli. (f) L-shaped stimuli embedded in T-shaped stimuli.
Fig. 1.2  Examples of visual textures with perceptual asymmetry and symmetry. (a) Random +-shaped stimuli embedded in random L-shaped stimuli. (b) Nonrandom L-shaped stimuli embedded in nonrandom +-shaped stimuli. (c) Nonrandom +-shaped stimuli embedded in nonrandom L-shaped stimuli.
Fig. 2.1  Diagram of wavelet decomposition of an image.
Fig. 2.2  Examples of wavelet decomposition. (a) Lena. (b) Wavelet decomposition of Lena. (c) D21. (d) Wavelet decomposition of D21. (e) The arrangement of the four subbands.
Fig. 3.1  3-level tree-structured wavelet decomposition of D21 with different parameter values: (a) 0.25. (b) 0.5. (c) 0.75.
Fig. 3.2  Examples of textures used in the database. From left to right: D6-1, D6-2, D24-2, and D95-1.
Fig. 3.3  Performance curves of Methods I and II.
Fig. 4.1  Generation of visual textures. (a) Single stimulus. (b) Complete periodicity. (c) Positional jitter. (d) Orientation randomness. (e) Both positional jitter and orientation randomness.
Fig. 4.2  Examples of visual textures with complete periodicity. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the centres of the spectra. (a) Single +-shaped stimulus. (b) Periodic +-shaped stimuli. (c) Periodic line-segment stimuli.
Fig. 4.3  The curves of versus u , with b 0 , v 0 , and N = 256.
Fig. 4.4  Examples of visual textures with positional jitter, orientation randomness, and both, using the +-shaped stimuli. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. (a) Positional jitter (a=b=4). (b) Orientation randomness. (c) Positional jitter (a=b=4) and orientation randomness.
Fig. 4.5  Examples of visual textures with positional jitter, orientation randomness, and both, using the vertical line-segment stimulus. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. (a) Positional jitter (a=b=3). (b) Orientation randomness. (c) Positional jitter (a=b=3) and orientation randomness.
Fig. 5.1  Examples of visual textures used in Experiment I. Positional jitter parameters a=b=4. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. Random region size: (a) r=2; (b) r=8; (c) r=14.
Fig. 5.2  The curves of SNR_u and SNR_v versus region size r obtained in Experiment I. The four stimuli used in the experiment are +-shape, vertical line-segment ('|'), T-shape, and L-shape.
Fig. 5.3  Examples of visual textures demonstrating horizontal and vertical visual perception. Positional jitter parameters a=b=4. Random region size r=8. (a) Vertical line-segment stimuli. (b) T-shaped stimuli. (c) L-shaped stimuli. See Fig. 5.1b for +-shaped stimuli.
Fig. 5.4  Examples of visual textures demonstrating SNR declining trend. Positional jitter parameters a=b=4. (a) Completely periodic +-shaped stimuli as base image. (b)-(f) +-shaped stimuli with r=1, 2, 8, 9, and 10, respectively.
Fig. 5.5  Examples of visual textures used in Experiment II. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. Positional jitter parameters: (a) a=b=2 (b) a=b=6 (c) a=b=10.
Fig. 5.6  The curves of SNR_u and SNR_v versus positional jitter parameter a (a=b) obtained in Experiment II. The four stimuli used in the experiment are +-shape, vertical line-segment ('|'), T-shape, and L-shape.
Fig. 5.7  Examples of visual textures used in Experiment III. The texture stimuli in the images (a)-(h) are '+', '+1', '+2', '+3', 'L', 'T', 'H', and '5'-shaped, respectively.
Fig. 5.8  Examples of visual textures used in Experiment IV. (a) Base image. (b)-(e) The first set of four test images used in Experiment IVa. (e)-(h) The second set of four test images used in Experiment IVb. The embedded stimuli in the center of (a)-(h) are '+', '+1', '+2', '+3', 'L', 'T', 'H', and '5'-shaped, respectively.
Fig. 5.9  The curves of SNR_u and SNR_v obtained in Experiment IV. Left: (a) The curves of SNR_u and SNR_v for Experiment IVa. Right: (b) The curves of SNR_u and SNR_v for Experiment IVb.
Fig. 5.10  Examples of visual texture used in Experiment IV. (a)(b) +/L combinations, with region size r=2 and 14, respectively; (c)(d)(e) L/+ combinations, with region size r=2, 8, and 14, respectively; (f)(g) L/+ combinations, with region size r=2 and 14, respectively; (h)(i) S/mS combinations, with region size r=2 and 8, respectively.
Fig. 5.11  The curves of SNR_u and SNR_v versus region size r obtained in Experiment V.
Fig. 5.12  Examples of the Gabor magnitude images of different images shown above: (a) Fig. 5.4c; (b) Fig. 5.4d; (c) Fig. 5.4e; (d) Fig. 5.7b; (e) Fig. 5.7h; (f) Fig. 1.1e; (g) Fig. 5.10d; (h) Fig. 1.1f; (i) Fig. 5.10i.
List of Tables
Table 3.1  The subband energy values of Lena and D21.
Table 3.2  Subband energy values, ranking orders, and retrieval distances of textures D6-1, D6-2, D24-2, and D95-1.
Table 3.3  Retrieval efficiency of Methods I and II.
Table 5.1  Examples of values in Experiment III. (Unit of data: dB.)
Summary
Two topics related to texture images are examined in the thesis: texture retrieval and visual texture perception.
Texture (or surface texture) is an important visual cue in a wide range of natural images and serves as a reasonable and reliable tool for content-based retrieval. In this thesis, a new algorithm for texture retrieval using the tree-structured wavelet transform (TSWT) is presented. Wavelet transform-based texture analysis, as found in the literature, uses subband energy values as features, but not the order of the energy values. In fact, a textured image, after wavelet decomposition, yields an energy distribution that can be rank-ordered with respect to the subbands. It has been found that combining the subband energy values with their ranking orders leads to a more efficient texture retrieval mechanism.
Visual texture, on the other hand, is different from the aforementioned surface texture, the latter being more often referred to in the literature on image processing. In visual texture perception, a research area germane to both psychophysical and computational analysis, a class of visual textures consisting of repetitively placed micro-patterns or texture stimuli is frequently used for analysis. In an attempt to quantify visual texture perception, a new, unified framework is presented for extracting information from visual textures in the spectral domain. This framework enables us not only to explain many phenomena of visual texture perception (as characterized by the parameters, viz., periodicity, positional jitter, orientation randomness, and perceptual asymmetry) but also to establish plausible relationships between the Signal-to-Noise Ratio (SNR) and these parameters, and simultaneously to partition visual texture perception into three classes: spontaneous, semi-, and non-segregation. The perceptual boundaries are then extracted in the spatial domain using the Gabor transform.
The two problems considered in the thesis are related as follows: In many instances of image analysis, there arises a need to associate an unknown texture in a given image with a texture stored in an image library for purposes of classification. To this end, the human perceptual interpretation (like periodicity, randomness, coarseness, graininess, etc.) of a given texture is to be quantified by certain measures that can then be used to search for the matching texture in the image library.
In comparison with the conventional methods found in the literature, the proposed texture retrieval algorithm is more efficient because it exploits the ranking orders of subbands. Also, the proposed model for visual texture perception, which is quantitative and computationally simple, interprets the psychophysical performance of the human visual system (HVS) successfully, and demonstrates its practical advantage over others in explaining the experimental results on periodicity, positional jitter, orientation randomness, and perceptual asymmetry.
CHAPTER 1
Introduction
Texture-oriented research has aroused considerable interest in the past few decades. In the literature, essentially two different kinds of textures [1], surface and visual, are examined. Surface texture (or, simply, texture), as a primary visual cue, is abundantly observed in natural images: for instance, satellite images obtained by remote sensing, and Brodatz's texture album [2]. Texture analysis, which is commonly found in the literature on image processing and computer vision, is expected to provide appropriate features for subsequent studies, such as content-based image retrieval, pattern recognition, and image segmentation. Visual texture, on the other hand, is synthetically generated, and has 'an isolated perceptual quality, simplified for purpose of study' [1]. Research on visual texture, which is referred to as visual texture perception or visual texture discrimination, attempts to extract or simulate, by psychophysical experiments or computational approaches, the mechanism of pre-attentive vision of the human visual system (HVS), i.e., effortless (or instantaneous) perception without local scrutiny. Fig. 1.1 gives some examples of natural (more specifically, Brodatz's) and synthetic (visual) textures.
Figure 1.1: Examples of surface and visual textures. Surface textures (Brodatz’s textures): (a) D15. (b) D84. (c) D100. Visual textures: (d) Random dots. (e) L-shaped stimuli embedded in +-shaped stimuli. (f) L-shaped stimuli embedded in T-shaped stimuli.
There are many types of texture stimuli characterizing the visual texture. Some typical examples include: random dots in line with differing probability distributions or stochastic processes [3-6], discs of varying sizes [7], sums of sinusoidal gratings [8-10], small squares of different intensities on various backgrounds [11, 12], and, most frequently studied, micro-patterns with geometric shapes [12-19].
In this thesis, both natural and visual textures are studied. In particular, a selected set of Brodatz's textures is studied in the part on texture retrieval, which is reported in [20]. Visual textures with geometric shapes, mainly made out of line segments, are studied in the part on visual texture perception, which is reported in [19].
1.1 Texture Retrieval
A tremendous number of images are being created and captured in this era of multimedia technology. How to search for user-specified images and retrieve them from such a huge image database are open and challenging questions. While normal literal descriptions provide some facilities, they are ineffective in most cases. Thus a retrieval mechanism based on pictorial content, namely content-based retrieval, as a natural and inevitable approach, is called for [21][22]. Typically, “contents” would include color, shape and texture (in natural and synthetic images), and motion (in a video sequence). Texture-based retrieval of an image database (or texture retrieval) is currently an active research branch of texture analysis, which has a long history itself.
A large number of approaches have been proposed in the literature on texture retrieval [20-26]. The strategy adopted in most cases is as follows: the given textural images are first analyzed and their representative features are extracted. These features are matched with the features in the library, thereby leading to the retrieval of the corresponding image(s). Pentland et al. [22] employ a set of interactive photobook tools for browsing and searching the face, shape, and texture databases. The features for textural images are extracted by a 2-D Wold decomposition model [23] (see also [27-29] for its principle and some other applications), according to which, if a textural image is assumed to be a homogeneous 2-D discrete random field, it can be decomposed into three mutually orthogonal components, namely a purely nondeterministic field, a generalized evanescent field and a harmonic field. Interestingly, this decomposition can be implemented via spectral operations (see Section 4.2 for detail).
Gimel'farb and Jain [25] model the textural images using a Markov random field (MRF) with a Gibbs probability distribution (GPD). In [26], Manjunath and Ma filter the textural images with a bank of Gabor functions (or 'Gabor wavelets'), and then extract, from each filtered image, the statistical features for retrieval.
In principle, any approach to texture analysis can be applied for image retrieval because texture analysis, to some extent, means the extraction of certain features from the textural images. Relevant to this context, some representative approaches are presented in [30-46]. While earlier endeavors (see [30-35] for reviews) mostly concentrate on statistical, geometrical, and structural approaches, more recently model-based [25, 36-39] and signal processing techniques [20, 22-24, 40-46] have also been explored. The common texture models found in the literature are the simultaneous autoregressive (SAR) model [36], the Markov random field (MRF) [25][37][38], and the Gaussian Markov random field (GMRF) [39].
Signal processing approaches have attracted considerable attention in the literature. For example, three different approaches using the Fourier, Gabor, and wavelet transforms have been proposed to analyze textural images. The Fourier transform has a well-developed mathematical framework, and has been widely applied to many areas, including texture analysis [23][27][29][30][40]. The Gabor function, which is, in effect, a localized spatial filter, is generally acknowledged to mimic some characteristics of human cortical simple cells, and has been applied successfully in texture analysis [41-43].
Due to its properties of space-frequency localization and multifrequency decomposition, the wavelet transform [47-50] has been applied to texture analysis [20][44-46]. In [44], a tree-structured wavelet transform (TSWT) has been developed corresponding to the characteristics of textural images. The features used for classification are the energy values of subbands but an important aspect of these energy values, namely their ranking order, has been left out. In this thesis, one of the main results is to include it for texture image retrieval from a library.
1.2 Visual Texture Perception
What is visual texture perception? Fig. 1.1 gives two examples of visual textures to illustrate this phenomenon. In Fig. 1.1e, the region filled with L-shaped stimuli segregates itself spontaneously (in pre-attentive vision) from the background of +-shaped stimuli, thereby implying that the texture stimuli are viewed instantaneously or effortlessly. However, in Fig. 1.1f, the region filled with L-shaped stimuli does not segregate itself spontaneously from the background of T-shaped stimuli, thereby demonstrating the need for a careful, stimulus-by-stimulus inspection.
Julesz [3] first conjectured that textures with the same first- and second-order statistics could not be visually discriminated. Counter-examples [4] to this conjecture have been generated. In brief, the first- and second-order statistics are inadequate for discrimination because textures that have the same first- and second-order statistics can still be visually discriminated. Attempts are made in this thesis to interpret some of these findings in the spectral domain, thereby opening up the possibility of a new approach to modeling human (pre-attentive) texture perception.
Subsequently, Julesz [14][15] proposed the texton theory to explain this psychophysical phenomenon. In this theory, textons include elongated blobs (e.g. rectangles, ellipses, line segments with specific colors, angular orientations, widths, and lengths), terminators (e.g. ends of line segments), and crossings of line segments. According to Julesz [14], it is the first-order statistics of textons that are significant for pre-attentive perception. However, it appears that there is only a literal description of textons, and it has therefore been found difficult [51-53] to extract textons using computational approaches. More importantly, this theory has been subjected to scrutiny, especially with reference to the terminators and crossings [18][52], without any satisfactory quantitative results. Beck's explanation [16][17] is quite distinct from the above: he groups texture stimuli in terms of physically defined properties, such as brightness, color, movement, size, and slopes of contours and lines of figures. In their recent paper [11], Beck et al. hypothesize that texture segregation arises from the different outputs of spatial frequency channels obtained from textures with inhomogeneity.
How to detect a texture boundary, motivated by the properties of human visual perception, is thus a crucial problem. There are many computational models in the literature [54-59]. In these approaches, the Gabor function [11][56-59] is widely used because of its jointly optimal resolution in both the spatial and spectral domains (including orientation). Moreover, the Gabor function is believed to mimic the characteristics of cortical simple cells. In the thesis, the power of the Gabor function is also exploited to arrive at a perceptual boundary and so provide a spatial interpretation.
Figure 1.2: Examples of visual textures with perceptual asymmetry and symmetry. (a) Random +-shaped stimuli embedded in random L-shaped stimuli. (b) Nonrandom L-shaped stimuli embedded in nonrandom +-shaped stimuli. (c) Nonrandom +-shaped stimuli embedded in nonrandom L-shaped stimuli.
Perceptual asymmetry is reported in [60-66]. One typical instance, pointed out by Gurnsey and Browse [64], is demonstrated in Figs. 1.1e and 1.2a. When a region filled with L-shaped stimuli is embedded in a region filled with +-shaped stimuli, as shown in Fig. 1.1e, its discriminability of 0.93 (see [64] for its definition and computation) is much greater than that (0.53) of the inverse case, Fig. 1.2a.
Surprisingly, this asymmetry nearly vanishes with the replacement of the random stimuli in Figs. 1.1e and 1.2a with the nonrandom ones in Figs. 1.2b and 1.2c [60]. There are several approaches in the literature to explain this perceptual asymmetry:
1) The texton theory [14] can also be invoked to explain this phenomenon. The stimuli with a larger number of textons will be more salient, thereby indicating that a human is excited by a stronger visual interaction.
2) This asymmetry is attributed to the stimulus characteristics, including the element intensities, positional jitter, and orientations (vertical vs. near-vertical lines [62, 63]). In the case of positional jitter [64, 65], the more jittery the stimulus is, the more salient it appears.
3) Rubenstein and Sagi [60,61] explain this asymmetry using the Gabor filter, and extract (what they call) 'wavelength-dependent noise' [61] for illustration. (In contrast, the results presented in the thesis are more quantitative and simpler to derive.)
However, one common fact can be observed in the literature: most of the visual phenomena are described literally and supported psychophysically, but computational evidence that successfully justifies them is not often available. How to interpret these visual phenomena, using appropriate features extracted from the texture stimuli, is an important and challenging research problem. In this context, one successful result is Malik and Perona's algorithm [54]. The features extracted by their model match the psychophysical data obtained by Gurnsey and Browse [64], and Krose [67]. On the other hand, the second-order statistics used in [3] are derived through global operations in the spectral domain, which seem to throw away a lot of useful information.
In the thesis, a new mechanism is proposed to extract quantitative features in the spectral domain and to explain some phenomena of visual texture perception, such as periodicity, positional jitter, orientation randomness, and perceptual asymmetry. Significantly, the features so extracted interpret these psychophysical phenomena well. Since any modeling of human perception entails quantification of such phenomena (like boundaries), the perceptual boundary is detected, using the Gabor transform, in the spatial domain.
1.3 Organization of the Thesis
The thesis is organized as follows. In Chapter 2, the Fourier, Gabor and wavelet transforms and some of their (spatial and spectral) properties relevant to this study are briefly reviewed. The main results of the thesis are found in Chapters 3-5. In Chapter 3, the tree-structured wavelet transform (TSWT) algorithm and its experimental results on retrieving a selected set of Brodatz's textures are presented. In Chapter 4, some visual phenomena, such as periodicity, positional jitter, and orientation randomness, including combinations of the latter two, are quantitatively interpreted using Fourier spectral analysis. Chapter 5 contains the experimental results and discussions on visual texture perception, along with the detection of perceptual boundaries using the Gabor function. Finally, Chapter 6 concludes the thesis.
CHAPTER 2
Mathematical Preliminaries
In this chapter, the theory of the Fourier, Gabor, and wavelet transforms is briefly reviewed. Also, some properties of the Fourier and Gabor transforms relevant to the thesis, and the application of the wavelet transform to image decomposition, are presented.
2.1 The Fourier Transform
For an image $\{f(m, n)\}$, $m = 0, 1, \ldots, M-1$; $n = 0, 1, \ldots, N-1$ (without loss of generality, and for simplicity, $M = N$), its discrete 2-D Fourier transform is defined by

$$F(u, v) = \frac{1}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \exp\left(-j\,\frac{2\pi (mu + nv)}{N}\right), \quad u, v = 0, 1, \ldots, N-1. \tag{2.1}$$
Its power spectrum is defined by

$$|F(u, v)|^2 = F(u, v)\,\overline{F(u, v)}, \tag{2.2}$$

where the overbar denotes the complex conjugate.
Its inverse Fourier transform is given by

$$f(m, n) = \frac{1}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} F(u, v) \exp\left(j\,\frac{2\pi (mu + nv)}{N}\right). \tag{2.3}$$
Property 1: Periodicity

$$F(N + u, N + v) = F(u, v), \qquad f(N + m, N + n) = f(m, n). \tag{2.4}$$

Here it is assumed that the original domain is periodically extended, and this extension is constrained by Eq. (2.4). Also, it is worthwhile to point out that, taking the 1-D discrete case for simplicity, the frequency range of the whole spectrum is $(-\pi, \pi]$, with 0 at the low-frequency end and $\pi$ at the high-frequency end. Herein the argument $(u, v)$ in $F(u, v)$ corresponds to the frequency component $(2\pi u / N, 2\pi v / N)$. The larger $N$ is, the better the discrete spectrum approximates the continuous case. In practice, the DC component $F(0, 0)$ is shifted to the center $[N/2, N/2]$ in order to obtain a better view of the spectrum, with low frequencies near the center and high frequencies near the edges, and with symmetry (Property 2, see below) as well.
Property 2: Symmetry (for real images)

For real images satisfying $f(m, n) = \overline{f(m, n)}$, the following holds:

$$F(N - u, N - v) = \overline{F(u, v)}. \tag{2.5}$$

Thus, the power spectrum satisfies

$$|F(N - u, N - v)|^2 = |F(u, v)|^2. \tag{2.6}$$
This means that the Fourier spectrum of a real image is symmetric about the center of the spectrum. Also, for even $N$, this property constrains $F(0, 0)$, $F(0, N/2)$, $F(N/2, 0)$, and $F(N/2, N/2)$ to be real.
Property 3: Parseval theorem

$$\sum_{m=0}^{N-1} \sum_{n=0}^{N-1} |f(m, n)|^2 = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} |F(u, v)|^2. \tag{2.7}$$

This relates the energy of the image in the spatial domain to that in the spectral domain; with the $1/N$ normalization of Eq. (2.1), the two energies are equal. Alternatively, the Fourier transform can be interpreted as a rotation of the basis coordinates, and the value $F(u, v)$ as the projection of $f(m, n)$ onto the new basis.
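The three properties above can be checked numerically. The following is a minimal sketch, assuming Python with NumPy (the thesis does not name an implementation) and a small random test image; `norm="ortho"` is used because it reproduces the $1/N$ factor that appears in both the forward and the inverse transform.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
f = rng.standard_normal((N, N))            # a real N x N test "image" (illustrative data)

# norm="ortho" scales a 2-D transform of N*N samples by 1/N, as in the definition above.
F = np.fft.fft2(f, norm="ortho")

# Inverse transform: recovers the image (up to rounding).
f_back = np.fft.ifft2(F, norm="ortho")

# Symmetry for real images: F(N-u, N-v) is the conjugate of F(u, v),
# with indices taken modulo N (periodicity).
idx = (-np.arange(N)) % N
F_reflected = F[np.ix_(idx, idx)]

# Parseval: energy in the spatial domain equals energy in the spectral domain.
energy_space = np.sum(np.abs(f) ** 2)
energy_freq = np.sum(np.abs(F) ** 2)

# Centered power spectrum for viewing: the DC term moves to index [N/2, N/2].
P = np.fft.fftshift(np.abs(F) ** 2)
```

With an unnormalized forward transform (NumPy's default), the energy identity would instead carry a factor $1/N^2$ on the spectral side, so the normalization convention should be fixed before comparing energies.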
2.2 The Gabor Transform
The Gabor function is defined by:

$$g(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left(-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2} + \frac{y'^2}{\sigma_y^2}\right)\right) \exp\left(2\pi j f_c (x' \cos\theta + y' \sin\theta)\right), \tag{2.8}$$

where $(x', y')$ is the rotated coordinate of $(x, y)$, i.e.,

$$x' = x \cos\alpha + y \sin\alpha, \qquad y' = -x \sin\alpha + y \cos\alpha. \tag{2.9}$$

The parameters $\sigma_x$ and $\sigma_y$ control the spatial and spectral widths of the function; the parameters $f_c$ and $\theta$ control the frequency shift in the spectral domain (see Eq. 2.10 below); and the parameter $\alpha$ controls the rotation of the coordinate.
Its Fourier transform is given by:

$$G(\mu, \nu) = \exp\left(-\frac{1}{2}\left(\frac{(\mu' - f_c \cos\theta)^2}{\sigma_\mu^2} + \frac{(\nu' - f_c \sin\theta)^2}{\sigma_\nu^2}\right)\right), \tag{2.10}$$

where

$$\sigma_\mu = \frac{1}{2\pi \sigma_x}, \qquad \sigma_\nu = \frac{1}{2\pi \sigma_y}, \qquad \mu' = \mu \cos\alpha + \nu \sin\alpha, \qquad \nu' = -\mu \sin\alpha + \nu \cos\alpha. \tag{2.11}$$

Thus, the Gabor function has a finite effective width in both the spatial and spectral domains. This property is relevant to texture analysis, especially texture segmentation, since different textures tend, in many cases, to concentrate their significant energies in certain narrow ranges of frequencies [41][42].
Here only the continuous case has been discussed. For the discrete case, it should be noted that sampling should be dense enough to avoid aliasing. In the experiments reported in the thesis, sampling is dense enough since only a slowly varying (i.e. low) frequency $f_c$ is chosen as the parameter. Because $g(x, y)$ is a complex function, its convolution with a real image yields a complex image. For simplicity, only the magnitude of this result is presented.
How to choose the parameters (viz., $\sigma_x$, $\sigma_y$, $f_c$, $\theta$, and $\alpha$) to get a better (discrimination) result is an important problem [69][70]. In the experiments on visual texture perception reported in the thesis, only one simple Gabor function is used for illustration, and its parameters are $\sigma_x = \sigma_y = 16$, $f_c = 0.125$, and $\theta = \alpha = 0$. The parameters $\sigma_x = \sigma_y = 16$ are chosen in order to cover the width of one stimulus; the parameter $f_c = 0.125$, i.e., the second harmonic frequency component, is chosen because the first harmonic frequency component ($f_c = 0.0625$) is too strong, making it difficult to highlight the boundary; and the parameters $\theta = \alpha = 0$ are chosen without loss of generality. (Examples of the Gabor filtered images are provided in Fig. 5.12 in Chapter 5.)
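As an illustration of how such a filter might be applied, the following is a minimal sketch, assuming Python with NumPy. The function names, the kernel size of 97 samples (about three standard deviations on each side), the circular (FFT-based) convolution, and the cosine-grating test image are all assumptions of this sketch, not details given in the thesis; `theta` denotes the frequency-shift angle and `alpha` the coordinate rotation, which are labels chosen here.

```python
import numpy as np

def gabor_kernel(size, sigma_x, sigma_y, f_c, theta=0.0, alpha=0.0):
    """Sample a complex Gabor function on a size x size grid (size odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotated coordinates.
    xr = x * np.cos(alpha) + y * np.sin(alpha)
    yr = -x * np.sin(alpha) + y * np.cos(alpha)
    # Gaussian envelope with unit integral, times a complex sinusoidal carrier.
    envelope = np.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
    envelope /= 2.0 * np.pi * sigma_x * sigma_y
    carrier = np.exp(2j * np.pi * f_c * (xr * np.cos(theta) + yr * np.sin(theta)))
    return envelope * carrier

def gabor_magnitude(image, kernel):
    """Magnitude of the circular convolution of a real image with the kernel."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape, dtype=complex)
    padded[:kh, :kw] = kernel
    # Move the kernel center to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.abs(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

# Parameters quoted in the text: sigma_x = sigma_y = 16, f_c = 0.125, zero angles.
kernel = gabor_kernel(97, 16.0, 16.0, 0.125)

# Illustrative test images: horizontal cosine gratings at 0.125 and 0.0625 cycles/pixel.
x = np.arange(128)
on_band = np.tile(np.cos(2 * np.pi * 0.125 * x), (128, 1))
off_band = np.tile(np.cos(2 * np.pi * 0.0625 * x), (128, 1))
resp_on = gabor_magnitude(on_band, kernel)
resp_off = gabor_magnitude(off_band, kernel)
```

In this sketch the grating at the filter's center frequency yields a magnitude of roughly one half everywhere (only the positive-frequency half of the cosine is passed), whereas the grating at 0.0625 is almost completely suppressed, which reflects the narrow spectral width implied by a spatial width of 16.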
2.3 The Wavelet Transform
Given a real mother wavelet function $\psi(t) \in L^2(R)$ that satisfies the admissibility condition,

$$C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\, d\omega < \infty, \tag{2.12}$$

where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$, the scaling parameter $p$ and the shifting parameter $q$ are introduced to construct a family of basis functions or wavelets $\{\psi_{p,q}(t)\}$, $p, q \in R$, $p \neq 0$, where

$$\psi_{p,q}(t) = \frac{1}{\sqrt{|p|}}\, \psi\left(\frac{t - q}{p}\right). \tag{2.13}$$

If $\Psi(\omega)$ has sufficient decay, the admissibility condition reduces to the constraint that $\Psi(0) = 0$, or

$$\int_{-\infty}^{\infty} \psi(t)\, dt = \Psi(0) = 0. \tag{2.14}$$

Because the Fourier transform is zero at the origin, and the spectrum decays at high frequencies, the wavelet behaves as a bandpass filter.
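The continuous wavelet transform (CWT) of a function or a signal $x(t) \in L^2(R)$ is then defined as

$$X_{CWT}(p, q) = \int_{-\infty}^{\infty} x(t)\, \psi_{p,q}(t)\, dt, \tag{2.15}$$

and the inverse wavelet transform is given by

$$x(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X_{CWT}(p, q)\, \psi_{p,q}(t)\, \frac{dp\, dq}{p^2}. \tag{2.16}$$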
Since $\{\psi_{p,q}(t)\}$ is a redundant set, the scaling and shifting parameters can be discretized as follows:

$$p = 2^k, \quad q = 2^k l, \quad k, l \in Z, \tag{2.17}$$

so that, for a suitably chosen $\psi$, an orthonormal basis $\{\psi_{k,l}(t)\}$ is obtained,

$$\psi_{k,l}(t) = 2^{-k/2}\, \psi(2^{-k} t - l), \quad k, l \in Z, \tag{2.18}$$

which complies with the condition

$$\int_{-\infty}^{\infty} \psi_{k,l}(t)\, \psi_{k',l'}(t)\, dt = \delta(k - k')\, \delta(l - l'), \quad k, k', l, l' \in Z. \tag{2.19}$$
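The orthonormality condition can be illustrated numerically. The sketch below assumes Python with NumPy, the Haar wavelet as the mother wavelet (chosen here only because it has a closed form; the thesis's own filters are listed in its Appendix A), and the dyadic sampling $p = 2^k$, $q = 2^k l$; the function names are hypothetical.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    first_half = ((t >= 0.0) & (t < 0.5)).astype(float)
    second_half = ((t >= 0.5) & (t < 1.0)).astype(float)
    return first_half - second_half

def psi_kl(t, k, l):
    """Dyadic wavelet psi_{k,l}(t) = 2^(-k/2) psi(2^(-k) t - l)."""
    return 2.0 ** (-k / 2.0) * haar(2.0 ** (-k) * t - l)

# Midpoint-rule grid on [0, 4); the step is dyadic, so every breakpoint of the
# wavelets used below falls on a cell edge and the sums are exact up to rounding.
n = 4096
dt = 4.0 / n
t = (np.arange(n) + 0.5) * dt

def inner(k1, l1, k2, l2):
    """Approximate the integral of psi_{k1,l1}(t) * psi_{k2,l2}(t) over [0, 4)."""
    return np.sum(psi_kl(t, k1, l1) * psi_kl(t, k2, l2)) * dt

mean_value = np.sum(haar(t)) * dt   # zero-mean (admissibility) check
```

For example, `inner(0, 0, 0, 0)` and `inner(1, 0, 1, 0)` evaluate to 1, while `inner(0, 0, 1, 0)` and `inner(1, 0, 1, 1)` evaluate to 0, matching the Kronecker-delta pattern of the orthonormality condition.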
The wavelet transform is an appropriate tool for multiresolution analysis (MRA) [47]. In MRA, a scaling function $\phi(x)$ and the associated mother wavelet function $\psi(x)$, which are needed in the construction of a complete basis (see below), must satisfy the two-scale difference equations:

$$ \phi(x) = \sqrt{2} \sum_n h(n)\, \phi(2x - n), \qquad \psi(x) = \sqrt{2} \sum_n g(n)\, \phi(2x - n), \tag{2.20} $$

where the coefficients $h(n)$ and $g(n)$ satisfy the following:

$$ g(n) = (-1)^n\, h(1 - n). \tag{2.21} $$
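As a quick numerical check of Eqs. (2.20) and (2.21), the following Python sketch builds $g(n)$ from a lowpass filter $h(n)$ and tests the usual orthonormality conditions. The 4-tap Daubechies filter and the helper name `highpass_from_lowpass` are illustrative assumptions; they are not the filters used in the experiments of this thesis.

```python
import numpy as np

# Daubechies 4-tap lowpass filter h(0..3), chosen here only for illustration
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

def highpass_from_lowpass(h):
    """Return (n, g) with g(n) = (-1)^n h(1 - n), following Eq. (2.21)."""
    L = len(h)
    n = np.arange(2 - L, 2)  # all n for which 0 <= 1 - n <= L - 1
    g = np.array([(-1.0) ** int(k) * h[1 - k] for k in n])
    return n, g

n_g, g = highpass_from_lowpass(h)

sum_h = h.sum()                              # should equal sqrt(2)
norm_h = np.sum(h ** 2)                      # should equal 1
shift_orth = h[0] * h[2] + h[1] * h[3]       # sum_n h(n) h(n + 2), should be 0
sum_g = g.sum()                              # zero mean of the highpass filter
```

For this filter, `g` is supported on $n = -2, \ldots, 1$, and the four quantities evaluate to $\sqrt{2}$, 1, 0, and 0 up to rounding error.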
Similar to the construction of $\{\psi_{k,l}(t)\}$, a family of orthonormal basis functions $\{\phi_{k,l}(x)\}$ can be obtained through translation and dilation of the kernel $\phi(x)$:

$$ \phi_{k,l}(x) = 2^{k/2}\, \phi(2^k x - l), \quad k, l \in \mathbb{Z}. \tag{2.22} $$

A series of nested subspaces $V_j$, each of which is spanned by the orthonormal basis $\{\phi_{j,k}(x)\}$, $k \in \mathbb{Z}$, forms a multiresolution space. The subspace $W_j$, which is spanned by the orthonormal basis $\{\psi_{j,k}(x)\}$, $k \in \mathbb{Z}$, is the complementary space of $V_j$ in the subspace $V_{j+1}$:

$$ V_{j+1} = V_j \oplus W_j. \tag{2.23} $$

The subspaces $V_j$ and $W_j$ are called the approximate and residue spaces, respectively, at resolution $j$.
For the discrete case, the coefficients $h(n)$ and $g(n)$ also play an important role because the continuous forms of $\phi(x)$ and $\psi(x)$ can be neglected, and the coefficients can be directly applied to the discrete signal using the following iterations:

$$ a_{j+1}(n) = \sum_k a_j(k)\, h(k - 2n), \qquad r_{j+1}(n) = \sum_k a_j(k)\, g(k - 2n), \tag{2.24} $$

where $a_j(n)$ and $r_j(n)$ are the coefficients at resolution $j$.
Thus, for a $J$-level discrete wavelet decomposition of the given coefficients $a_0(n)$, a series of coefficients, $\{a_J(n), r_J(n), \ldots, r_1(n)\}$, is obtained. Because of the orthonormality of the wavelet transform, the number of coefficients after the decomposition equals that before the decomposition. The synthesis of the signal from the wavelet coefficients obeys the following:

$$ a_j(n) = \sum_k a_{j+1}(k)\, h(n - 2k) + \sum_k r_{j+1}(k)\, g(n - 2k). \tag{2.25} $$
If $\tilde{h}(n)$ and $\tilde{g}(n)$ are defined as

$$ \tilde{h}(n) = h(-n), \qquad \tilde{g}(n) = g(-n), \tag{2.26} $$
and regarded as the respective impulse responses of the quadrature mirror filters (QMF), H Q and GQ , which correspond to the halfband lowpass and highpass filter respectively, Eq. (2.24) can be conveniently implemented using convolution and downsampling (here the downsampling rate is 2 which implies that every other sample is omitted) techniques in the signal processing literature. Refer to Fig. 2.1 below for the diagram of the implementation for the 2-D case. If the wavelet decomposition is recursively applied to the output of the lowpass filter, a pyramid-structured decomposition is obtained.
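The analysis and synthesis iterations can be illustrated with a minimal Python sketch. It assumes the two-tap Haar filters, filters indexed from 0, and periodic extension of the signal; the function names `analyze` and `synthesize` are hypothetical and serve only this illustration.

```python
import numpy as np

# Haar filters indexed n = 0, 1; g follows g(n) = (-1)^n h(1 - n)
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([h[1], -h[0]])

def analyze(a, h, g):
    """One analysis level: a1(n) = sum_k a(k) h(k - 2n), r1(n) = sum_k a(k) g(k - 2n)."""
    N = len(a)
    a1 = np.zeros(N // 2)
    r1 = np.zeros(N // 2)
    for n in range(N // 2):
        for i in range(len(h)):          # i = k - 2n runs over the filter support
            a1[n] += a[(2 * n + i) % N] * h[i]
            r1[n] += a[(2 * n + i) % N] * g[i]
    return a1, r1

def synthesize(a1, r1, h, g):
    """One synthesis level: a(n) = sum_k a1(k) h(n - 2k) + r1(k) g(n - 2k)."""
    N = 2 * len(a1)
    a = np.zeros(N)
    for k in range(len(a1)):
        for i in range(len(h)):          # i = n - 2k runs over the filter support
            a[(2 * k + i) % N] += a1[k] * h[i] + r1[k] * g[i]
    return a

a0 = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a1, r1 = analyze(a0, h, g)               # half-length approximation and residue
a0_rec = synthesize(a1, r1, h, g)        # reproduces a0 up to rounding error
```

The inner loops are exactly a convolution evaluated only at even shifts, which is why the decomposition can be implemented as filtering followed by downsampling by 2. The total number of coefficients, `len(a1) + len(r1)`, equals `len(a0)`.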
It is interesting to note that the wavelet framework also links the tree-structured wavelet transform with wavelet packets. The library of wavelet packet basis functions $\{W_n\}_{n=0}^{\infty}$ can be obtained from a given $W_0$ as follows:

$$ W_{2n}(x) = \sqrt{2} \sum_k h(k)\, W_n(2x - k), \qquad W_{2n+1}(x) = \sqrt{2} \sum_k g(k)\, W_n(2x - k), \tag{2.27} $$

where the functions $W_0$ and $W_1$ are set to the scaling function $\phi(x)$ and the mother wavelet function $\psi(x)$, respectively. Then, the functions $W_n(2^k x - l)$, $k, l \in \mathbb{Z}$, $n \in \mathbb{N}$, form the orthogonal wavelet packet basis. The implementation of the wavelet packets naturally leads to a tree-structured decomposition, in which the outputs of both the lowpass and highpass filters are recursively decomposed.
2.3.1 Wavelet Decomposition of Images
It is easy to extend the above 1-D derivations to the 2-D case. Thus, for an image, the 2-D filter coefficients can be expressed as

$$ h_{LL}(m, n) = h(m)\, h(n), \quad h_{LH}(m, n) = h(m)\, g(n), \quad h_{HL}(m, n) = g(m)\, h(n), \quad h_{HH}(m, n) = g(m)\, g(n), \tag{2.28} $$

where the first and second subscripts denote the type of filtering (L for lowpass, H for highpass) along the row and column directions of the image, respectively.
As far as computation is concerned, due to the separability of the filters, the wavelet transform can be implemented (convolution and downsample) along the rows and columns separately. Fig. 2.1 shows how to implement the wavelet decomposition of an image. After the decomposition, four subbands, LL, LH, HL and HH subbands, which represent the average, horizontal, vertical, and diagonal information respectively, are obtained. Fig. 2.2 gives two practical examples of wavelet decomposition where the Haar wavelet is chosen. (For the sake of completeness, the coefficients of the Haar transform are given in Appendix A.)
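The separable implementation can be sketched in a few lines of Python. The sketch assumes the Haar filters and an image with even side lengths; the function names and the labeling convention (first letter for the filter applied along each row, second letter for the filter applied along each column) are assumptions made for this illustration.

```python
import numpy as np

def haar_split(x):
    """Haar lowpass/highpass filtering along the last axis, then downsampling by 2."""
    lo = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)
    hi = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)
    return lo, hi

def haar_decompose_2d(img):
    """One-level separable decomposition into the LL, LH, HL, and HH subbands."""
    img = np.asarray(img, dtype=float)
    lo_r, hi_r = haar_split(img)      # filter along each row
    ll, lh = haar_split(lo_r.T)       # filter the row-lowpass output along each column
    hl, hh = haar_split(hi_r.T)       # filter the row-highpass output along each column
    return ll.T, lh.T, hl.T, hh.T

# A constant image has all of its energy in the LL subband
flat = np.full((8, 8), 3.0)
ll, lh, hl, hh = haar_decompose_2d(flat)

# An image that alternates along each row excites only the HL subband here
stripes = np.tile(np.array([1.0, -1.0]), (8, 4))
s_ll, s_lh, s_hl, s_hh = haar_decompose_2d(stripes)
```

Each subband has half the side length of the input, so the four subbands together hold as many coefficients as the original image.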
The mathematical preliminaries have been reviewed in this chapter. The next chapter (Chapter 3) will cover the experiments on texture retrieval using the tree-structured wavelet transform, and Chapters 4 and 5 will deal with the interpretations of visual texture perception based on the Fourier and Gabor transforms.
[Figure 2.1 diagram: the image f(m,n) is passed through the filters h(-n) and g(-n) along the rows and columns, each filter being followed by a downsampler (↓2), which yields the LL, LH, HL, and HH subbands.]
Figure 2.1: Diagram of wavelet decomposition of an image. The block (↓2) denotes a downsampler with a sampling rate equal to 2.
Figure 2.2: Examples of wavelet decomposition. (a) Lena. (b) Wavelet decomposition of Lena. (c) D21. (d) Wavelet decomposition of D21. (e) The arrangement of the four subbands. (The respective coefficients of the LH, HL, and HH subbands are linearly scaled to the range [0,255] for display.)
CHAPTER 3
Texture Retrieval using the Tree-Structured Wavelet Transform (TSWT)
In this chapter, the motivation for the application of TSWT to a textural image, as well as the corresponding algorithm of TSWT, is presented, and two issues regarding texture image retrieval, namely distance function and retrieval efficiency, are analyzed. Experimental results are also given in order to demonstrate the superiority of the proposed scheme.
3.1 Tree-Structured Wavelet Transform (TSWT)
The motivation for the application of TSWT to a textural image arises from the contents of Table 3.1, which lists the subband energy values of the ordinary image (e.g. Fig. 2.2a Lena) and the textural image (e.g. Fig. 2.2c D21). For a subband
$f_s(x, y)$, with $1 \le x \le X$ and $1 \le y \le Y$, its energy, $E_s$, is defined as follows:

$$ E_s = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} |f_s(x, y)|. \tag{3.1} $$
Energy Es    Lena (1)    Lena (2)    D21 (1)    D21 (2)
LL           134.37      33.57       75.78      40.76
LH           2.52        2.52        17.69      17.69
HL           3.72        3.72        23.22      23.22
HH           1.95        1.95        8.01       8.01
Ratio (*)    2.77%       11.08%      30.64%     56.97%
(*): This is the ratio of the second maximum subband energy to the maximum one.
(1): No image mean removal before wavelet decomposition.
(2): Image mean removal before wavelet decomposition.
Table 3.1: The subband energy values of Lena and D21.
From the energy values and the ratios in Table 3.1, the following can be observed. For the ordinary image, the energy is mainly concentrated in the LL subband after wavelet decomposition. This is not the case for the textural image, whose energy is distributed over all the subbands because high-frequency components abound in it. Based on this observation, it is reasonable to decompose (i) the ordinary image using the pyramid structure, and (ii) the textural image using the tree structure.
The algorithm of the tree-structured wavelet transform (TSWT) consists of the following steps:
1) Decompose a given image into four subbands.
2) Calculate the energy of each subband, using Eq. (3.1), and find the maximum energy value $E_{s,\max}$ of the same scale.
3) Given a threshold constant in the range [0,1], if the energy of a subband is insignificantly small, i.e., $E_s$ is less than the threshold constant times $E_{s,\max}$, then stop the decomposition of this subband; otherwise continue the decomposition in this subband, and return to Step 2).
If the threshold constant is equal to 0, lies strictly between 0 and 1, or is equal to 1, then what is implemented is, respectively, a wavelet-packet, tree-structured, or pyramid-structured wavelet decomposition.
In practice, when to stop the decomposition is determined not only by the subband energy but also by the level of decomposition. An image of size $2^n \times 2^n$ can be decomposed to at most $n$ levels. However, if a decomposed subband is very small, its location and energy value may change greatly from sample to sample, which makes this feature less robust. A three- or four-level decomposition is expected to yield more robust features.
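The steps of the algorithm, together with the limit on the decomposition level, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Haar filters replace the Daubechies filters used in the experiments, the maximum energy is taken over all subbands produced at the same level, and the names `tswt`, `decompose`, and `subband_energy` as well as the path notation for subbands are hypothetical.

```python
import numpy as np

def haar_split(x):
    """Haar lowpass/highpass filtering along the last axis, then downsampling by 2."""
    lo = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)
    hi = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)
    return lo, hi

def decompose(band):
    """Step 1: split a band into four subbands (first letter: rows, second: columns)."""
    lo, hi = haar_split(band)
    ll, lh = haar_split(lo.T)
    hl, hh = haar_split(hi.T)
    return {"LL": ll.T, "LH": lh.T, "HL": hl.T, "HH": hh.T}

def subband_energy(band):
    """Eq. (3.1): mean absolute value of the subband coefficients."""
    return float(np.mean(np.abs(band)))

def tswt(img, c, max_level):
    """Return a dict mapping the path of every terminal subband to its energy."""
    leaves = {}
    active = {"": np.asarray(img, dtype=float)}
    for level in range(1, max_level + 1):
        children = {}
        for path, band in active.items():
            for name, sub in decompose(band).items():
                children[path + "/" + name] = sub
        energies = {p: subband_energy(b) for p, b in children.items()}
        e_max = max(energies.values())           # Step 2: maximum at this scale
        active = {}
        for p, b in children.items():
            if level < max_level and energies[p] >= c * e_max:
                active[p] = b                    # Step 3: significant, keep decomposing
            else:
                leaves[p] = energies[p]          # insignificant or deepest level: stop
        if not active:
            break
    return leaves

rng = np.random.default_rng(0)
img = rng.random((8, 8)) + 10.0                  # energy dominated by the LL subband
packet_leaves = tswt(img, 0.0, 2)                # threshold 0: full wavelet-packet tree
pyramid_leaves = tswt(img, 1.0, 2)               # threshold 1: only the strongest band splits
```

With two levels, the threshold 0 yields all 16 second-level subbands, whereas the threshold 1 yields 7 terminal subbands (three at the first level and four at the second), which is the pyramid structure for this LL-dominated image.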
Figure 3.1: 3-level tree-structured wavelet decomposition of D21 with different values of the threshold constant: (a) 0.25, (b) 0.5, (c) 0.75.
Fig. 3.1 shows a 3-level tree-structured wavelet decomposition of D21 with different values of the threshold constant. The smaller the value, the greater the number of subbands. In order to arrive at a wider energy distribution over subbands, a reasonable choice is a value less than 0.3.
3.2 Retrieval Issues
As stated in the introduction, texture retrieval is a matching procedure in the feature space after the features are extracted. Therefore, how to measure the similarity (distance) in the feature domain and how to assess the efficiency of the retrieval are significant issues requiring careful consideration. To the best of the author's knowledge, measures identical to those presented in this thesis do not exist in the literature.
3.2.1 Distance function for retrieval
With the energy values and ranking order of subbands available, the retrieval distance function $df$ is defined as follows:

$$ df = dEs \times dRs, \tag{3.2} $$

where $dEs$ is the Euclidean distance between the energy values of the subbands, and $dRs$ is the Euclidean distance between the ranks of the subbands.
For example, given an input image, select the subbands with the top $m$ energy values as features, and denote the energy vector by $Es = (Es_1, Es_2, \ldots, Es_m)$ and the rank vector by $Rs = (Rs_1, Rs_2, \ldots, Rs_m)$. For the input image, $Rs_i = i$, i.e., $Rs = (1, 2, \ldots, m)$. For image $k$ in the database, obtain the energy values of the same subbands and their corresponding ranks, and denote the energy vector by $Es_k = (Es_{k,1}, Es_{k,2}, \ldots, Es_{k,m})$ and the rank vector by $Rs_k = (Rs_{k,1}, Rs_{k,2}, \ldots, Rs_{k,m})$. The definitions of $dEs$ and $dRs$ are as follows:

$$ dEs = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (Es_i - Es_{k,i})^2} \tag{3.3} $$

$$ dRs = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(|Rs_i - Rs_{k,i}| + 1\right)^2} \tag{3.4} $$
If the ranking orders of the two images are identical, the rank distance dRs reaches its minimum value of 1.
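A minimal Python sketch of the distance function is given below. The root-mean-square form of $dEs$ and $dRs$ is assumed here because it reproduces the values tabulated for D6-1 and D6-2 in Table 3.2; the function name `retrieval_distance` is hypothetical.

```python
import numpy as np

def retrieval_distance(es_query, es_cand, rs_cand):
    """Return (dEs, dRs, df) for one database image, with df = dEs * dRs."""
    es_query = np.asarray(es_query, dtype=float)
    es_cand = np.asarray(es_cand, dtype=float)
    rs_cand = np.asarray(rs_cand, dtype=float)
    m = len(es_query)
    rs_query = np.arange(1, m + 1)                 # ranks of the input image: 1, 2, ..., m
    d_es = np.sqrt(np.mean((es_query - es_cand) ** 2))
    d_rs = np.sqrt(np.mean((np.abs(rs_query - rs_cand) + 1) ** 2))
    return d_es, d_rs, d_es * d_rs

# Energies of the top five subbands of D6-1 (input) and D6-2, and the ranks of
# those subbands in D6-2, as listed in Table 3.2
es_d6_1 = [190.7, 190.2, 166.2, 153.0, 143.8]
es_d6_2 = [132.7, 104.5, 156.1, 126.7, 178.0]
rs_d6_2 = [3, 7, 2, 4, 1]

d_es, d_rs, d_f = retrieval_distance(es_d6_1, es_d6_2, rs_d6_2)
d_self = retrieval_distance(es_d6_1, es_d6_1, [1, 2, 3, 4, 5])
```

For this pair the sketch gives values that round to the tabulated $dEs = 50.3$ and $dRs = 3.9$, and comparing D6-1 with itself gives $dEs = 0$ and the minimum $dRs = 1$.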
3.2.2 Retrieval Efficiency
In [23][24], a recognition rate is presented as follows. Suppose there are $n_c \times d$ images, where $n_c$ is the number of textures and $d$ is the number of images belonging to the same texture. Provided an input image $t_1$ belongs to texture $t$, an ideal retrieval result would be that all the images in $(t_1, t_2, \ldots, t_d)$ are the first $d$ retrieved images, i.e., those closest to the input image in terms of a distance function [23][24]. It is then concluded that 100% retrieval is obtained if all images in $(t_1, t_2, \ldots, t_d)$ fall within the first $n_f$ ($n_f \ge d$) retrieved images.
Thus, two parameters can now be used to characterize the retrieval efficiency:
1) The ratio $n_{in}/d$ at a given $n_f$, where $n_{in}$ is the number of images from $(t_1, t_2, \ldots, t_d)$ falling in the first $n_f$ retrieved images. The closer the ratio $n_{in}/d$ is to 100% at a given $n_f$, the more efficient the retrieval is. In particular, when $n_f = d$, this measure indicates the initial retrieval performance.
2) The minimum value of $n_f$ ($n_f \ge d$), denoted $n_{f,\min}$, that guarantees 100% retrieval. Since the search is then confined to the smallest number of images, this is also a measure of the speed of 'convergence'. The smaller $n_{f,\min}$ is, the better, and its minimum value is $d$.
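The two efficiency measures can be computed from a ranked retrieval list, as in the following Python sketch. It assumes that the database labels have already been sorted by increasing distance to the input image and that exactly $d$ of them carry the target label; the function name `retrieval_efficiency` and the toy labels are hypothetical.

```python
def retrieval_efficiency(ranked_labels, target, d, n_f):
    """Return (n_in / d at the given n_f, n_f_min) for one input image."""
    # Number of same-texture images among the first n_f retrieved images
    n_in = sum(1 for label in ranked_labels[:n_f] if label == target)
    # 1-based positions at which same-texture images are retrieved
    positions = [i + 1 for i, label in enumerate(ranked_labels) if label == target]
    n_f_min = positions[d - 1]      # position of the d-th same-texture image
    return n_in / d, n_f_min

# Toy ranking for an input image of texture "A" with d = 4 images per texture
ranked = ["A", "A", "B", "A", "B", "A", "B", "B"]
ratio, n_f_min = retrieval_efficiency(ranked, "A", d=4, n_f=4)
```

In this toy ranking, three of the four images of texture "A" fall in the first four retrieved images, so the ratio is 0.75, and the last one appears at position 6, so `n_f_min` is 6.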
3.3 Experimental Results
In the image database used for experimentation, there are 480 images of size 128×128, containing 30 textures [44] (refer to Table 3.3 for details), each of which is represented by 16 images. The mean of each image is removed before applying the TSWT. Daubechies' 16-tap filter coefficients (see Appendix A for their values) have been adopted in the 4-level wavelet packet decomposition, and the top 5 subbands have been used as features.
Some images of these textures are displayed in Fig. 3.2, and their corresponding subband energies, ranking orders, and distances are presented in Table 3.2. Here D6-1 is used as the input image. If only dEs is used as the distance function, the retrieval sequence is {D6-1, D24-2, D6-2, D95-1}. What is unsatisfactory here is that D24-2 is retrieved before D6-2, which belongs to the same texture as D6-1. However, if df is used as the distance function, the sequence becomes {D6-1, D6-2, D24-2, D95-1}, which is more compatible with human visual interpretation.
Figure 3.2: Examples of textures used in the database. From left to right: D6-1, D6-2, D24-2, and D95-1.
Subband ID ($)    D6-1             D6-2             D24-2            D95-1
                  Energy   Rank    Energy   Rank    Energy   Rank    Energy   Rank
1331              190.7    1       132.7    3       181.2    4       92.8     10
1134              190.2    2       104.5    7       145.5    8       65.9     14
1131              166.2    3       156.1    2       181.2    3       94.2     9
1314              153.0    4       126.7    4       108.5    19      17.7     46
1114              143.8    5       178.0    1       121.9    16      102.9    8
dEs | dRs         0.0      1.0     50.3     3.9     31.0     9.6     100.2    20.9
df = dEs * dRs    0.0              194.9            299.0            2093.0
Table 3.2: Subband energy values, ranking orders and retrieval distances of textures D6-1, D6-2, D24-2, and D95-1.
($) Refer to Appendix B for the notation of subband ID.
Texture ID    Method-I              Method-II
              n_in/d    n_f,min     n_in/d    n_f,min
D3            56.6%     170.4       46.1%     153.0
D4            59.4%     67.6        64.1%     69.2
D6            98.4%     16.3        66.0%     40.6
D9            59.8%     58.2        51.2%     70.9
D11           99.6%     16.1        96.1%     18.1
D16           100%      16.0        100%      16.0
D19           74.6%     35.8        68%       53.0
D21           100%      16.0        84.4%     35.2
D24           78.9%     38.1        70.7%     45.8
D29           82.4%     28.9        68.4%     37.0
D34           94.1%     17.8        51.2%     86.7
D36           77.3%     159.2       76.2%     58.8
D52           67.6%     100.6       42.6%     240.7
D53           100%      16.0        99.6%     16.1
D55           90.2%     38.3        78.5%     36.9
D57           80.1%     48.2        73.4%     58.6
D65           56.3%     31.4        56.3%     45.5
D68           72.7%     38.0        60.6%     101.3
D74           85.6%     21.9        82.8%     23.6
D77           99.6%     16.1        100%      16.0
D78           54.7%     66.4        25.4%     228.8
D79           73.4%     28.6        47.7%     95.5
D82           97.7%     17.1        94.9%     22.1
D83           95.3%     20.8        75.0%     49.0
D84           98.4%     16.4        97.3%     16.6
D92           54.7%     68.3        45.7%     79.8
D95           78.1%     26.0        77.0%     38.9
D102          69.5%     29.7        75.8%     28.5
D103          99.6%     16.3        100%      16.0
D105          100%      16.0        100%      16.0
Overall       81.8%     41.9        72.5%     60.5
Table 3.3: Retrieval efficiency of Methods I and II.
Two methods for texture image retrieval are proposed: In the first method (Method-I), the distance function, df , which is defined in Eq. (3.2), is used. In the second method (Method-II), only the Euclidean distance of energy values, dEs, which is defined in Eq. (3.3), is used. Table 3.3 shows the retrieval efficiency of both the methods. It should be noted here that, for each texture, the values of nin / d and n f m in are the averages of 16 images belonging to this texture. It is evident that the overall performance of Method-I is much better than that of Method-II.
In Fig. 3.3, the abscissa represents the number n f , and the ordinate, the corresponding ratio nin / d . Method-I approaches 100% much more quickly than Method-II.
[Figure 3.3 plot: ratio n_in/d (ordinate, 70% to 100%) versus n_f (abscissa, 10 to 100) for Method-I and Method-II.]
Figure 3.3: Performance curves of Methods I and II.
However, for some textures, such as D4 and D102, Method-I turns out to be no better than Method-II. A detailed study shows that, after the wavelet decomposition of these textures, the energy is almost evenly distributed over the top 10 to 15 subbands. In many cases, the energy of the 10th subband is even more than 70% of that of the 1st subband. This means that a small variation between the samples can give rise to large differences between the energy distributions of their top subbands.
This chapter has presented the results of some experiments on texture retrieval whose efficiency is much better than that of conventional methods in the literature. From the next chapter on, the modeling of visual texture perception will be addressed.
CHAPTER 4
Spectral Interpretations of Visual Texture Perception
In this chapter, experiments (related to the modeling of human perception) are conducted on certain visual textures that are generated synthetically. The corresponding images include visual textures with complete periodicity (Section 4.1), with positional jitter (Section 4.2), with orientation randomness (Section 4.3), and with the combination of the latter two (Section 4.4). Attempts are made to establish a relationship between preattentive perception and spectral properties. The generation of these visual textures is illustrated in Fig. 4.1.
The sizes of all the synthesized images displayed below, including visual textures, Fourier spectra, and others, are 256 by 256. (To conserve space, they are printed here at 50% zoom.) All the Fourier spectra are normalized to the range [0,255] for better viewing, and, for display convenience, the power spectrum value $|F(u, v)|^2$ is replaced by its log-processed value, $\log_{10}(1 + |F(u, v)|)$, before normalization. As far as the figures of images and their Fourier spectra are concerned,
it is to be noted here that the original images (of size 256 by 256) have been reduced by half, and printed using a laser printer of 650 dots per inch resolution. Therefore, in many cases it is found that the bright spots in the Fourier spectrum are not explicitly visible in the printouts.
Figure 4.1: Generation of visual textures. (a) Single stimulus. (b) Complete periodicity. (c) Positional jitter. (d) Orientation randomness. (e) Both positional jitter and orientation randomness.
4.1 Visual Texture with Complete Periodicity
A completely periodic visual texture is fully defined by its stimulus and its placement rule, which must be periodic. Suppose there is a texture stimulus defined by $\xi(m, n)$, $m = 0, 1, \ldots, \frac{N}{S} - 1$; $n = 0, 1, \ldots, \frac{N}{T} - 1$, and its corresponding Fourier transform defined by $\Xi(u, v)$, $u, v = 0, 1, \ldots, N - 1$. As shown in Fig. 4.1b, a periodic visual texture $f_0(m, n)$ can be created by repetitively placing $S$ stimuli along the $m$ direction, and $T$ stimuli along the $n$ direction in the spatial domain. The new visual texture $f_0(m, n)$ can be written as follows:

$$ f_0(m, n) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \xi\!\left(m - \frac{sN}{S},\; n - \frac{tN}{T}\right). \tag{4.1} $$

Its Fourier transform $F_0(u, v)$ can be shown to be:

$$ F_0(u, v) = ST\, \Xi(u, v)\, \delta\!\left(u - \frac{sN}{S},\; v - \frac{tN}{T}\right), \quad s = 0, 1, \ldots, S-1;\; t = 0, 1, \ldots, T-1. \tag{4.2} $$

Due to the sampling problem, Eq. (4.2) holds only if $S$ and $T$ are integral factors of $N$. In all the images of visual textures used in this thesis, $S = T = 16$.
The Fourier transform of a periodic visual texture is a combination of 2-D $\delta$ functions, and the impulse intensity at each meaningful point $(u, v)$ is equal to $ST\, \Xi(u, v)$. The Fourier transform is observed to contain only bright spots (harmonic components, which are formed by the repetitive placement of stimuli in the spatial domain). Fig. 4.2 gives examples of a single stimulus and stimuli with periodicity, along with their corresponding Fourier spectra.
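The dotted structure of the spectrum of a periodic texture can be checked numerically with the Python sketch below. The "+"-shaped stimulus drawn here is a hypothetical example, and NumPy's `fft2` is used for the discrete Fourier transform. With $N = 256$ and $S = T = 16$, the locations $sN/S$ and $tN/T$ are simply the multiples of 16.

```python
import numpy as np

N, S, T = 256, 16, 16
P, Q = N // S, N // T                 # size of one stimulus cell along m and n

# Hypothetical "+"-shaped stimulus inside one P x Q cell
stim = np.zeros((P, Q))
stim[P // 2, 4:12] = 1.0
stim[4:12, Q // 2] = 1.0

f0 = np.tile(stim, (S, T))            # periodic placement of S x T stimuli
F0 = np.fft.fft2(f0)                  # spectrum of the periodic texture

# N-point transform of a single stimulus, zero-padded to N x N
Xi = np.fft.fft2(stim, s=(N, N))

# Harmonic locations: u and v at multiples of 16 for these parameters
on_grid = np.zeros((N, N), dtype=bool)
on_grid[::S, ::T] = True

max_off_grid = np.abs(F0[~on_grid]).max()                     # should vanish
max_err = np.abs(F0[on_grid] - S * T * Xi[on_grid]).max()     # should vanish
```

Both `max_off_grid` and `max_err` are zero up to rounding error, i.e., the spectrum is nonzero only at the harmonic locations, where it equals $ST$ times the transform of the single stimulus.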
In Fig. 4.2a, it can be observed that there exist dark visual contours formed by the frequency elements with low values, while these contours disappear in Figs. 4.2b and 4.2c because of the dotted distribution. In order to observe the corresponding relationship between the single stimulus and the completely periodic stimuli, two
cross-section lines along u and v directions, passing through the center of the spectra, are introduced. It is noticed that, with an increase in frequency, the values of the components decrease, thereby making the first and second harmonic elements more important. Also it is noticed that there is a resemblance between two cross-section lines, arising from the symmetry of the +-shaped stimulus in Fig. 4.2b. If the +-shaped stimulus is changed into another stimulus without symmetry, e.g., the line-segment stimulus in Fig. 4.2c, this resemblance is lost subsequently.
[Figure 4.2 images omitted. Values marked on the cross-section plots: (a) 35.9, 35.9; (b) 9180, 6147, 9180, 6417; (c) 5100, 5100, 5002, 2415.]
Figure 4.2: Examples of visual textures with complete periodicity. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the centres of the spectra. (a) Single +-shaped stimulus. (b) Periodic +-shaped stimuli. (c) Periodic line-segment stimuli.
4.2 Visual Texture with Positional Jitter
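How to generate a visual texture with positional jitter is shown in Fig. 4.1c. Two random variables $r_m$ and $r_n$ are introduced to denote the jittered shift along the $m$ and $n$ directions, respectively. Suppose that $r_m$ and $r_n$ obey uniform distributions:

$$ p(r_m) = \begin{cases} \dfrac{1}{2a+1}, & r_m = -a, -a+1, \ldots, 0, \ldots, a-1, a \\ 0, & \text{otherwise} \end{cases} \tag{4.3} $$

$$ p(r_n) = \begin{cases} \dfrac{1}{2b+1}, & r_n = -b, -b+1, \ldots, 0, \ldots, b-1, b \\ 0, & \text{otherwise} \end{cases} \tag{4.4} $$

The new visual texture $f_1(m, n)$ and its Fourier transform $F_1(u, v)$ can be written as follows:

$$ f_1(m, n) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \xi\!\left(m - \frac{sN}{S} - r_m,\; n - \frac{tN}{T} - r_n\right) \tag{4.5} $$

$$ F_1(u, v) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \exp\!\left(-j\left(\frac{2\pi s u}{S} + \frac{2\pi t v}{T}\right)\right) \exp\!\left(-j\left(\frac{2\pi r_m u}{N} + \frac{2\pi r_n v}{N}\right)\right) \Xi(u, v) \tag{4.6} $$

The expected value of $F_1(u, v)$ is:

$$ E[F_1(u, v)] = \lambda \cdot F_0(u, v) \tag{4.7} $$

where

$$ \lambda = \frac{\sin\dfrac{(2a+1)\pi u}{N}}{(2a+1) \sin\dfrac{\pi u}{N}} \cdot \frac{\sin\dfrac{(2b+1)\pi v}{N}}{(2b+1) \sin\dfrac{\pi v}{N}} \tag{4.8} $$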
The value of $\lambda$ is directly related to the values of $a$ and $b$. If $a = b = 0$, then $\lambda = 1$, and the texture reduces to the periodic visual texture. The parameters $a$ and $b$ are called positional jitter parameters. Theoretically, the above derivation does not strictly hold for the following two cases of the shifted stimuli:
1) Colliding stimuli. This means that the shifted stimuli partially or fully collide with each other inside the image range.
2) Vanishing stimuli. This means that the shifted stimuli are partially or fully out of the image range, especially for stimuli near the image edge and for the case of large values of a and b .
If controlled properly, these situations rarely happen in practice, or even when they happen, their influence is rather limited (see Fig. 4.4). (This can be more clearly seen in Experiment II in Chapter 5, and there the extreme situations are examined.) Another point is that random sample images from a large image data set are used to simulate the problem, thus unavoidably bringing in some perturbations.
Because of the dotted distribution of $F_0(u, v)$, local maxima (harmonic components) can still be observed in the Fourier spectrum, but their intensity values are scaled by a factor $|\lambda|$, which is not greater than 1. Fig. 4.3 shows some values of $\lambda$. It is interesting to notice that the $\lambda$ curve (in Fig. 4.3) serves as a low-pass filter with large values near 0 and 256 (passing low frequencies), and small values near 128 (suppressing high frequencies). At the same time, due to the Parseval theorem, the reduction of harmonic frequency components introduced by positional jitter is consequently accompanied by the generation of non-harmonic frequency components, which are distributed in the spectral domain. Figs. 4.4a and 4.5a show this phenomenon clearly.
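The scaling factor of the harmonic components can be evaluated with a short Python sketch. It compares the closed form along one axis (a ratio of sines, as in Eq. (4.8)) with the direct expectation of $\exp(-j 2\pi r u / N)$ for a jitter $r$ that is uniform on $\{-a, \ldots, a\}$. The function names are hypothetical, and the points where $\sin(\pi u / N) = 0$ are set to the limiting value 1.

```python
import numpy as np

def jitter_factor_closed(u, a, N):
    """Ratio-of-sines form along one axis; the limit 1 is used where sin(pi*u/N) = 0."""
    u = np.asarray(u, dtype=float)
    num = np.sin((2 * a + 1) * np.pi * u / N)
    den = (2 * a + 1) * np.sin(np.pi * u / N)
    out = np.ones_like(u)
    nonzero = np.abs(den) > 1e-12
    out[nonzero] = num[nonzero] / den[nonzero]
    return out

def jitter_factor_direct(u, a, N):
    """Direct expectation E[exp(-j*2*pi*r*u/N)] for r uniform on {-a, ..., a}."""
    u = np.asarray(u, dtype=float)
    r = np.arange(-a, a + 1)
    phases = np.exp(-2j * np.pi * np.outer(u, r) / N)
    return phases.mean(axis=1).real       # imaginary parts cancel by symmetry

N = 256
u = np.arange(N)
closed = jitter_factor_closed(u, 4, N)
direct = jitter_factor_direct(u, 4, N)

# Factor at the first harmonic (u = 16) for a = 4
first_harmonic = closed[16]
```

The two evaluations agree for all $u$, the factor equals 1 at $u = 0$, and its magnitude never exceeds 1. For $a = 4$, $N = 256$, the factor at the first harmonic $u = 16$ is roughly 0.56.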
[Figure 4.3 plot: $\lambda$ (ordinate, -0.5 to 1) versus $u$ (abscissa, 0 to 256) for a = 1, 2, 3, 4, 5, 6.]
Figure 4.3: The curves of $\lambda$ versus $u$, with $b = 0$, $v = 0$, and $N = 256$.
In Figs. 4.4a and 4.5a, three points are worthy of notice:
1) Due to the existence of vanishing stimuli, the DC component is slightly reduced, but this has little effect.
2) It is observed that the first harmonic components (bright spots in the spectra) approximate the expected value, while high-order harmonic components decrease more rapidly than before. In order to see this clearly, compare (i) Figs. 4.2b and 4.4a, and (ii) Figs. 4.2c and 4.5a. It can be easily imagined that if bigger values of a and b are chosen in practice in order to introduce more positional jitter, harmonic frequency components decrease more rapidly.
3) The dark contours may be misleading because a human observer tends to notice these contours at first sight. In fact, these contours are inherited from $\Xi(u, v)$. As mentioned above, the contours are formed by frequency components with low values. From Eq. (4.6), if $\Xi(u, v)$ is small enough (in fact, the measured values are all near zero), then $F_1(u, v)$ will also be very small, thus forming the contour again. (For more images, see Fig. 5.5 of Experiment II in Chapter 5.)
Naturally, if the local maxima are regarded as signal, and their neighboring frequencies as noise, the term Signal-to-Noise Ratio ( SNR ) can be used to describe this phenomenon more clearly. The higher the SNR value is, the higher the periodicity. For a completely periodic image, SNR tends to infinity because there is no noise at all with all the dots as pure signal. The lower the SNR value is, the greater the randomness, and the more jittery the stimuli are. (The detailed relationship will be revealed in Experiments I and II in Chapter 5.)
Here only the simple case of the positional jitter obeying a uniform distribution has been considered. It is easy to generalize the results to an arbitrary distribution. However, it is natural to ask whether there is a possibility of devising distributions for a specific stimulus, or devising stimuli for a specific distribution, such that all the frequency components are equalized over the complete spectral domain (except the DC component). The answer seems to be affirmative but difficult. The difficulty comes from the following. First, note that the $\lambda$ curve has an intrinsic low-pass nature. More specifically, for jitter distributions that are symmetric about zero,

$$ \lambda = \left[ p_m(0) + 2 \sum_{k=1}^{a} p_m(k) \cos\frac{2\pi k u}{N} \right] \cdot \left[ p_n(0) + 2 \sum_{l=1}^{b} p_n(l) \cos\frac{2\pi l v}{N} \right] \tag{4.9} $$

where $0 \le p_m(k) \le 1$, $k = 0, 1, \ldots, a$, and $0 \le p_n(l) \le 1$, $l = 0, 1, \ldots, b$, are the corresponding probabilities of the positional jitter of the stimuli.
It is easy to show that, when $u = v = 0$, $\lambda = 1$, which means that the DC component is not affected at all. Now if $u$ is gradually increased from 1 (keeping $v$ fixed for simplicity), the term $\cos\frac{2\pi k u}{N}$ ($a, b \ll N$) decreases, thereby eventually decreasing $\lambda$.
Now only a stimulus can be chosen to preferably have strong high frequencies and weak low frequencies. However, for a visual stimulus with specific shape, such as +, L, T, or line-segment, strong low frequencies and weak high frequencies are usually obtained. However, it seems to be possible (e.g., using a random texture stimulus) to construct a stimulus satisfying the above criteria. The construction of such a stimulus is not considered here, since it constitutes a separate subject by itself for an independent study.
In the above derivations, there are no strict restrictions on the generation of textures except the problems of vanishing and colliding stimuli, whose influence, however, can be controlled appropriately. It therefore appears reasonable to extend these derivations to natural textural images, because periodicity is one of the three most important perceptual dimensions in natural texture discrimination [68]. In fact, as stated above, if a textural image is assumed to be a homogeneous 2-D discrete random field, the Wold-decomposition model [22][23][27-29] decomposes it into three mutually orthogonal components (purely non-deterministic, generalized evanescent, and harmonic), as already explained in the introduction, and this decomposition is implemented via operations in the spectral domain. For example, the harmonic components are extracted by finding local maxima in the spectral domain, a procedure that is consistent with the above observations.
4.3 Visual Texture with Orientation Randomness
How to generate a visual texture with orientation randomness is shown in the block schematic of Fig. 4.1d. A random variable $\theta$ is introduced to denote the orientation, with a uniform probability density function (PDF) over $[0, 2\pi)$:

$$ p(\theta) = \frac{1}{2\pi}, \quad \theta \in [0, 2\pi). \tag{4.10} $$

The new visual texture $f_2(m, n)$ can be written as follows:

$$ f_2(m, n) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \xi\!\left(m' - \frac{sN}{S},\; n' - \frac{tN}{T}\right) \tag{4.11} $$

where $(m', n')$ is the rotated coordinate of $(m, n)$, i.e.,

$$ m' = m \cos\theta + n \sin\theta, \qquad n' = -m \sin\theta + n \cos\theta. \tag{4.12} $$

Its Fourier transform $F_2(u, v)$ is given by

$$ F_2(u, v) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \exp\!\left(-j\left(\frac{2\pi s u}{S} + \frac{2\pi t v}{T}\right)\right) \Xi(u', v') \tag{4.13} $$

where $(u', v')$ is the rotated coordinate of $(u, v)$, i.e.,
$$ u' = u \cos\theta + v \sin\theta, \qquad v' = -u \sin\theta + v \cos\theta. \tag{4.14} $$

The expected value of $F_2(u, v)$ is:

$$ E[F_2(u, v)] = ST\, \delta\!\left(u - \frac{sN}{S},\; v - \frac{tN}{T}\right) E_\Xi(u, v), \quad s = 0, 1, \ldots, S-1;\; t = 0, 1, \ldots, T-1 \tag{4.15} $$

where $E_\Xi(u, v)$ is the expected value of $\Xi(u, v)$ on the circle $r_c = \sqrt{u^2 + v^2}$:

$$ E_\Xi(u, v) = \frac{1}{2\pi} \oint_{u^2 + v^2 = r_c^2} \Xi(u, v)\, d\theta. \tag{4.16} $$
In the above theoretical derivation, (0,0) is considered as the origin of rotation for each stimulus, while, in practice, the origin of rotation is its own center for each stimulus. In this case also, however, a similar result can be obtained.
What, then, happens to the spectra? First, local maxima can still be observed in the spectra because of the repetitive placement of stimuli. However, from the above derivation, $E_\Xi(u, v)$ is, theoretically, circularly uniform over the spectrum. In practice, the introduction of orientation randomness balances all frequency components on the same circle. Figs. 4.4b and 4.5b provide visual textures with orientation randomness to illustrate this phenomenon.
In Figs. 4.4b and 4.5b, it is easy to observe the first harmonic components and also the circular shape in the Fourier spectrum. Especially for the vertical line-segment stimuli in Fig. 4.5b, the original horizontal and vertical first harmonic components are different (see the data marked in Fig. 4.2c), but now they become equalized too because of the balancing properties introduced by orientation randomness.
4.4 Visual Texture with Positional Jitter and Orientation Randomness
As shown in Fig. 4.1e, positional jitter and orientation randomness can be combined to create a visual texture with both positional jitter and orientation randomness. With the help of the random variables $r_m$, $r_n$, and $\theta$ introduced above, the new visual texture $f_3(m, n)$, its Fourier transform $F_3(u, v)$, and the expected value $E[F_3(u, v)]$ can be written as follows:

$$ f_3(m, n) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \xi\!\left(m' - \frac{sN}{S} - r_m,\; n' - \frac{tN}{T} - r_n\right) \tag{4.17} $$

$$ F_3(u, v) = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} \exp\!\left(-j\left(\frac{2\pi s u}{S} + \frac{2\pi t v}{T}\right)\right) \exp\!\left(-j\left(\frac{2\pi r_m u}{N} + \frac{2\pi r_n v}{N}\right)\right) \Xi(u', v') \tag{4.18} $$

$$ E[F_3(u, v)] = \lambda\, ST\, \delta\!\left(u - \frac{sN}{S},\; v - \frac{tN}{T}\right) E_\Xi(u, v), \quad s = 0, 1, \ldots, S-1;\; t = 0, 1, \ldots, T-1 \tag{4.19} $$

where $m'$, $n'$, $u'$, $v'$, $\lambda$, and $E_\Xi(u, v)$ are as defined before.
As explained above, positional jitter reduces harmonic components and, as a result, introduces 'noisy' frequency components, while orientation randomness tends to circularly balance the frequency components. If positional jitter and orientation randomness are combined together in the spatial domain, their properties are also integrated in the spectral domain because they are two independent processing procedures. Hence more uniformly distributed frequency components all over the spectral domain are obtained. Figs. 4.4c and 4.5c give examples of visual textures with both positional jitter and orientation randomness.
In Figs. 4.4c and 4.5c, it is easy to observe both the reduced first harmonic frequency components and the circular equalization in the Fourier domain. It can also be noted that the horizontal and vertical first harmonic components are close to each other. In addition, if the value of the first harmonic component marked in Fig. 4.4c is compared with that in Fig. 4.4b, or that in Fig. 4.5c with that in Fig. 4.5b, the ratio approximates $\lambda$. This further supports the derivations in Section 4.2.
Figure 4.4: Examples of visual textures with positional jitter, orientation randomness, and both, using the +-shaped stimuli. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. (a) Positional jitter (a=b=4). (b) Orientation randomness. (c) Positional jitter (a=b=4) and orientation randomness. Values marked in the panels: (a) 9166, 9166, 3787, 3829; (b) 9164, 6081, 9164, 6088; (c) 9162, 9162, 3531, 3621.
Figure 4.5: Examples of visual textures with positional jitter, orientation randomness, and both, using the vertical line-segment stimulus. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. (a) Positional jitter (a=b=3). (b) Orientation randomness. (c) Positional jitter (a=b=3) and orientation randomness. Values marked in the panels: (a) 5100, 3689, 5100, 1785; (b) 5112, 3512, 5112, 3580; (c) 5106, 5106, 2660, 2668.
In this chapter, some phenomena of visual texture perception have been interpreted using spectral analysis. Based on these interpretations, five experiments on visual texture perception (for modeling human perception) are devised in the next chapter.
CHAPTER 5
Experiments and Discussions on Visual Texture Perception
As stated in Chapter 4, in order to explain the visual phenomena more clearly, horizontal and vertical cross-sections of the spectral functions, passing through the center, have been utilized. More significantly, the term SNR is employed to describe the randomness, or noisy state, introduced by positional jitter. In the following experiments, the value of a new SNR is again used as a quantitative measure. Two questions deserve attention: how should this SNR be calculated, and should the whole spectrum or only the cross-section lines be used? The cross-section lines are preferred for the following reasons:
1) Avoidance of global operations in the spectral domain. The global operations in the spectral domain to extract second-order statistics have been proved to be unsuccessful in explaining visual phenomena [4]. This limitation has been attributed to the global nature of the Fourier transform. However, if the cross-section lines are used, global operations in the spectral domain, and therefore the possible inaccuracies caused by the operations, are avoided.
2) Energy concentration. T-shaped, +-shaped, L-shaped and other texture stimuli are widely used in experiments on visual texture perception. For these stimuli, the energy concentration on cross-section lines passing through the center of the spectra is very high. As pointed out before, the low-order harmonic components play a more important role than the other frequency components, and the cross-section lines include them.
3) Computational efficiency. It is obvious that a significant amount of computation can be saved, using two cross-section lines instead of the whole spectra.
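The computational saving can be made concrete with a short sketch. For the standard discrete Fourier transform, the line F(u, 0) equals the one-dimensional transform of the image summed along n, and F(0, v) equals that of the image summed along m, so neither cross-section requires the full two-dimensional spectrum. The helper names below are hypothetical, the use of the magnitude is an assumption, and nothing here is claimed to be how the thesis computed its cross-sections.

```python
import cmath

def _dft_magnitude(seq):
    """Magnitude of the 1-D DFT of a real sequence (direct O(N^2) evaluation)."""
    N = len(seq)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / N)
                    for i, x in enumerate(seq)))
            for k in range(N)]

def cross_section_u(img):
    """|F(u, 0)|: DFT of the projection obtained by summing each row over n."""
    return _dft_magnitude([sum(row) for row in img])

def cross_section_v(img):
    """|F(0, v)|: DFT of the projection obtained by summing each column over m."""
    return _dft_magnitude([sum(col) for col in zip(*img)])
```

For an N x N image this costs two projections and two one-dimensional transforms instead of a full two-dimensional transform.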
In the experiments described below, the following steps are taken. First, one base image is used as the original signal. Then certain parameters are changed to (i) control the degree of 'distortion' exerted on the original image or (ii) replace the stimuli in the base image with other stimuli, so as to acquire a series of test images. The details are given in the individual experiments. The new SNR values (defined below) are then calculated for all the test images and employed, in plotted or tabulated format, to interpret the corresponding visual phenomena. Note that, owing to the limited number of samples in the following experiments, the SNR values obtained are only approximate. They are nevertheless sufficient to reveal the inherent trends and to illustrate the problems of perception.
The SNR value is defined as follows. Suppose there is one horizontal cross-section line, $h(u)$, $u = 0, 1, \ldots, N-1$, from the Fourier spectrum $F(u, v)$. Thus $h(u) = F(u, 0)$ holds. In what follows, the notation SNRu means that the measure is taken along the u direction, distinguishing it from the vertical one, SNRv. Given the base and the distorted images, there are the corresponding functions $h_b(u)$ and $h_d(u)$. Then

$$ \mathrm{SNR}_u = 10 \log_{10} \frac{\sigma_0^2}{\sigma_n^2} \tag{5.1} $$

where

$$ \sigma_0^2 = \sum_{u=0}^{N-1} \left| h_b(u) - m_b \right|^2 \tag{5.2} $$

$$ m_b = \frac{1}{N} \sum_{u=0}^{N-1} h_b(u) \tag{5.3} $$

$$ \sigma_n^2 = \sum_{u=0}^{N-1} \left| h_d(u) - h_b(u) \right|^2 \tag{5.4} $$
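The definition in Eqs. (5.1) to (5.4) translates into a few lines of code. The sketch below is a minimal illustration: the function name `snr_db` is hypothetical, and returning infinity when the distorted line equals the base line is an added convention, since the equations leave that case undefined.

```python
import math

def snr_db(h_base, h_dist):
    """SNR in dB between a base cross-section h_b and a distorted one h_d."""
    N = len(h_base)
    m_b = sum(h_base) / N                                   # Eq. (5.3)
    sigma0_sq = sum(abs(hb - m_b) ** 2 for hb in h_base)    # Eq. (5.2)
    sigman_sq = sum(abs(hd - hb) ** 2
                    for hd, hb in zip(h_dist, h_base))      # Eq. (5.4)
    if sigman_sq == 0:
        return math.inf   # added convention: identical lines, no noise
    return 10 * math.log10(sigma0_sq / sigman_sq)           # Eq. (5.1)
```

As a small check, a base line [4, 0, 0, 0] has mean 1 and signal energy 12; a distorted line [4, 1, 1, 1] has noise energy 3, so the SNR is 10 log10(4), roughly 6.02 dB. The same function yields SNRv when given the vertical cross-sections.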
It turns out that the localization of perceptual boundaries is not satisfactory using Fourier analysis. However, using the Gabor function, they can be extracted successfully. In the following experiments, wherever applicable, additional results of perceptual boundaries are presented.
Below, five experiments are reported on the relationships between the SNR values and the various parameters of the generated textured images. The parameters are respectively: periodicity and randomness (Experiment I), positional jitter (Experiment II), orientation randomness (Experiment III), and perceptual asymmetry (Experiments IV and V).
5.1 Experiment I: Periodicity and Randomness
From the derivations in Section 4.1, it is known that the dotted distribution (for instance, in Fig. 4.2) in the spectral domain comes from the periodicity in the spatial domain. Any impairment of this periodicity, for example, arising from the introduction of randomness in the spatial domain, will destroy the dotted distribution in the spectral domain, thereby introducing the 'noisy' frequency components. In this experiment, the relationship between the extent of randomness and the SNR level will be tested.
The experiment starts from a completely periodic visual texture as the base image. Then, using the technique of positional jitter (note that the positional jitter parameters a and b are fixed for all textures), randomness is introduced in the center and is gradually expanded. Let the variable r denote the size of the random region, so that the region contains $r^2$ random stimuli. Fig. 5.1 shows some examples of textures used in the experiment, along with the Fourier spectrum and the scan-line cross-sections of the spectrum.
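The construction of the random region can be sketched as follows. The lattice size of 16 x 16 stimuli is inferred from the range of r used in the plots, and the helper names, the centring rule for odd and even r, and the uniform integer jitter are assumptions made purely for illustration.

```python
import random

def central_block(S, T, r):
    """Lattice indices (s, t) of the central r x r block of an S x T lattice."""
    s0, t0 = (S - r) // 2, (T - r) // 2
    return {(s, t) for s in range(s0, s0 + r) for t in range(t0, t0 + r)}

def jitter_for(s, t, block, a, b, rng):
    """Jitter (r_m, r_n): uniform integers inside the block (assumed), zero outside."""
    if (s, t) in block:
        return rng.randint(-a, a), rng.randint(-b, b)
    return 0, 0

# Example: with r = 2 only the four central stimuli of a 16 x 16 lattice are jittered.
block = central_block(16, 16, 2)
rng = random.Random(0)
corner_jitter = jitter_for(0, 0, block, 4, 4, rng)   # (0, 0): outside the block
```

The block always contains exactly r^2 lattice positions, which is the count of random stimuli stated above.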
From Fig. 5.1, it is obvious that, as the random region grows, the harmonic frequency components ('signal') decrease whereas the non-harmonic frequency components ('noise') increase. In Fig. 5.1a, the noise level is very small compared with the large DC component (although noise is present; see the Fourier spectrum in the figure for detail), and it is therefore difficult to display the noise because of the problem of scaling.
Figure 5.1: Examples of visual textures used in Experiment I. Positional jitter parameters a=b=4. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. Random region size: (a) r=2; (b) r=8; (c) r=14.
Fig. 5.2 provides the curves of SNRu and SNRv versus region size r for four different texture stimuli: +-shape, T-shape, L-shape, and vertical line-segment. In order to make the SNRu and SNRv curves distinct, 40 dB is added to SNRv for display. Thus the upper four curves correspond to SNRv while the lower four curves correspond to SNRu.
Figure 5.2: The curves of SNRu and SNRv versus region size r obtained in Experiment I. The four stimuli used in the experiment are +-shape, vertical line-segment ('|'), T-shape, and L-shape. Horizontal axis: r = 1 to 16. Vertical axis: 0 to 80 dB. Legend: +u, +v, Tu, Tv, Lu, Lv, |u, |v.
For all four stimuli, the SNRu curves intertwine, suggesting that humans observe similar vertical randomness in the spatial domain, since vertical information in the spatial domain appears horizontal in the spectral domain. However, the SNRv curve for the vertical line-segment stimulus is distinct from the other three intertwined curves. This is because, for vertical line-segment stimuli, the vertical information dominates human visual perception, making it difficult to identify horizontal randomness. For the other three stimuli, both vertical and horizontal information are prominent in human visual perception. Fig. 5.3 provides examples for comparison.
Figure 5.3: Examples of visual textures demonstrating horizontal and vertical visual perception. Positional jitter parameters a=b=4. Random region size r=8. (a) Vertical line-segment stimuli. (b) T-shaped stimuli. (c) L-shaped stimuli. See Fig. 5.1b for +-shaped stimuli.
The initial SNR is about 35 dB. The SNR typically falls below 20 dB when r reaches 5-6, and below 10 dB when r reaches 9-11. In general, the decreasing trend of the SNR gradually weakens as the region size r increases, while the number of jittered stimuli is equal to $r^2$. Visually, this means that a human is very sensitive to the randomness initially introduced by positional jitter. However, once this randomness has increased to some extent, the sensitivity gradually decreases. The images in Fig. 5.4 illustrate this in detail. In comparison with Fig. 5.4a, the randomness can be easily detected in Figs. 5.4b and 5.4c. When Figs. 5.4d, 5.4e and 5.4f are compared, the difference in the sizes of the perceptual boundaries is ambiguous on the basis of pre-attentive vision, and a careful inspection of the images is therefore needed in order to estimate the differences.
Figure 5.4: Examples of visual textures demonstrating the declining trend of the SNR. Positional jitter parameters a=b=4. (a) Completely periodic +-shaped stimuli as base image. (b)-(f) +-shaped stimuli with r=1, 2, 8, 9, and 10, respectively.
The Gabor function is now employed to detect the perceptual boundaries. Some examples are provided in Figs. 5.12a – 5.12c. In these figures, it is noticed there is a dark area in the center, which simulates the perceptual boundary one can observe. Note that there are gray intermediate regions between the dark and light areas, which indicate that the perceptual boundary cannot be definitely decided in preattentive vision. Also note that it is these gray regions that make it difficult to distinguish the difference between the boundaries of Figs. 5.4d and 5.4e, as explained above.
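For orientation, the kind of computation behind a Gabor magnitude image can be sketched as follows. This is a generic complex Gabor kernel (an isotropic Gaussian envelope multiplied by a complex sinusoid), not the specific filter or parameter set used for Fig. 5.12; the function names, the parameters `size`, `sigma`, `freq` and `theta`, the square image, and the wrap-around border handling are all assumptions.

```python
import cmath
import math

def gabor_kernel(size, sigma, freq, theta):
    """Complex Gabor kernel of odd side length `size` (illustrative parameters)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            phase = 2 * math.pi * freq * (x * math.cos(theta) + y * math.sin(theta))
            row.append(envelope * cmath.exp(1j * phase))
        kernel.append(row)
    return kernel

def gabor_magnitude_at(img, kernel, m, n):
    """Magnitude of the kernel correlated with a square image, centred on (m, n)."""
    N, half = len(img), len(kernel) // 2
    acc = 0j
    for dy, row in enumerate(kernel):
        for dx, k in enumerate(row):
            acc += k * img[(m + dy - half) % N][(n + dx - half) % N]
    return abs(acc)
```

Evaluating `gabor_magnitude_at` at every pixel yields a magnitude image; regions whose local frequency content differs from the tuned frequency and orientation respond with a different magnitude, which is what makes a boundary visible.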
5.2 Experiment II: Positional Jitter
From the derivations in Section 4.2, it is known that positional jitter suppresses the harmonic components ('signal'), and introduces non-harmonic components ('noise').
In this experiment, the relationship between the extent of positional jitter (described by parameters a and b ) and the SNR level will be tested.
Once again, the experiment starts from a completely periodic visual texture as the base image. Then the parameters a and b (with a = b for convenience) are gradually increased in order to control the extent of positional jitter. Fig. 5.5 includes, along with the visual textures, some examples of the Fourier spectrum and cross-section lines used in the experiment.
Figure 5.5: Examples of visual textures used in Experiment II. From left to right: visual texture, the Fourier spectrum, and horizontal and vertical cross-sectional lines passing through the center of the spectra. Positional jitter parameters: (a) a=b=2; (b) a=b=6; (c) a=b=10. Values marked in the panels: (a) 9180; (b) 9052; (c) 8543.
Fig. 5.6 shows curves of SNRu and SNRv versus positional jitter parameter a for four different stimuli: +-shape, T-shape, L-shape, and vertical line-segment. Note that 20 dB is added to the value of SNRv merely for display.
Figure 5.6: The curves of SNRu and SNRv versus positional jitter parameter a (b=a) obtained in Experiment II. The four stimuli used in the experiment are +-shape, vertical line-segment ('|'), T-shape, and L-shape. Horizontal axis: a = 1 to 16. Vertical axis: 0 to 40 dB. Legend: +u, +v, Tu, Tv, Lu, Lv, |u, |v.
The initial SNR value for a = b = 1 is about 10 dB, which means a rapid decrease in periodicity. Accordingly, the SNR decreases rapidly as a increases. Refer to the textures shown in Fig. 5.5 for detail. In Fig. 5.5a, something like a periodic texture is still observed, while in Fig. 5.5b very little periodicity remains, and it is totally lost in Fig. 5.5c, which corresponds to positional jitter parameters a = b = 10. If the positional jitter parameters are greater than 10, there is no great difference in the visual appearance of the patterns because a totally random distribution of +-shaped stimuli is expected. Also note that the SNRv curve of the vertical line-segment stimuli lies above the other three curves. The reason is the same as that set forth in Experiment I.
Another point is that with the increase in positional jitter parameters, the vanishing and colliding stimuli are also increased. See the DC component values marked in Fig. 5.5. This may affect the results presented here. In this context, see the irregular perturbation in the SNR curves when the positional jitter parameters are in the range 10 - 16. However, this effect is limited due to the reasons mentioned above.
5.3 Experiment III: Orientation Randomness
From the derivations in Section 4.3, it is known that orientation randomness has the ability to circularly equalize all frequency components in the spectral domain. As a consequence, the harmonic components ('signal') are reduced and the non-harmonic components ('noise') are increased. However, the noise levels introduced by this technique differ from stimulus to stimulus, thus yielding different SNR values. In this experiment, the relationship between the stimuli and their corresponding SNR levels will be tested.
The experiment starts, as before, from a visual texture with orientation randomness as the base image. Then, for the same stimulus, eight other visual textures, also with orientation randomness, are created, i.e., eight other random samples are taken. The SNR value is calculated for each sample texture, and the mean of all eight values is used for illustration.
The eight different stimuli are shown in Figs. 5.7a - 5.7h. For simplicity, they are symbolized by '+', '+1', '+2', '+3', 'L', 'T', 'H', and '5'. The SNR values are provided in Table 5.1.
Figure 5.7: Examples of visual textures used in Experiment III. The texture stimuli in images (a)-(h) are '+', '+1', '+2', '+3', 'L', 'T', 'H', and '5'-shaped, respectively.
There are two observations from Table 5.1. (1) Within the range of the variance, the average SNRu is very similar to the average SNRv for all stimuli. This matches the derivations of the function of orientation randomness. (2) The L-shaped stimulus has the lowest SNR value, while the +-shaped stimulus has the highest SNR value. In other words, the L-shaped stimulus appears noisiest, in contrast with the +-shaped stimulus.
Stimulus  Measure      1      2      3      4      5      6      7      8   Mean   Var.
'+'       SNRu     17.93  17.63  19.22  18.90  17.54  18.93  18.00  16.98  18.14   0.74
'+'       SNRv     17.74  19.83  17.34  17.50  18.94  17.48  18.55  18.00  18.17   0.82
'+1'      SNRu     16.23  16.08  15.35  16.21  15.51  16.22  16.05  15.83  15.93   0.32
'+1'      SNRv     17.66  16.66  17.75  15.69  15.28  17.45  16.70  16.10  16.66   0.86
'+2'      SNRu     14.29  14.42  15.45  15.01  14.49  15.13  14.98  15.33  14.89   0.41
'+2'      SNRv     15.19  15.76  14.82  15.67  15.43  14.14  14.90  15.51  15.18   0.50
'+3'      SNRu     12.49  13.63  12.50  13.01  11.85  13.75  14.21  12.37  12.98   0.76
'+3'      SNRv     12.35  14.13  13.06  13.77  13.02  13.97  13.69  13.10  13.39   0.56
'L'       SNRu     13.89  12.42  12.68  13.10  12.26  11.78  12.77  13.27  12.77   0.61
'L'       SNRv     11.43  12.50  10.84  11.41  13.12  12.03  12.13  11.09  11.82   0.72
'T'       SNRu     12.46  12.34  13.59  11.99  13.80  12.81  12.84  13.62  12.93   0.63
'T'       SNRv     12.74  12.59  13.12  14.05  12.41  13.86  12.46  11.87  12.89   0.70
'H'       SNRu     14.23  14.25  13.60  13.65  13.76  14.04  14.53  14.50  14.07   0.34
'H'       SNRv     13.68  14.54  13.66  14.43  12.79  14.41  12.97  14.03  13.81   0.63
'5'       SNRu     16.19  16.87  17.17  16.49  15.62  17.12  17.76  16.69  16.74   0.61
'5'       SNRv     16.78  16.10  15.89  16.06  15.68  16.44  16.10  16.03  16.14   0.32

Table 5.1: Examples of SNR values in Experiment III. Columns 1-8 are the eight random samples. (Unit of data: dB.)
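The summary columns of Table 5.1 can be rechecked from the eight samples of a row. The sketch below does this for the SNRu row of the '+' stimulus, using only the numbers printed in the table. One observation from the arithmetic, not a statement made in the text: for this row the entry in the "Var." column (0.74) coincides with the population standard deviation of the eight samples, whereas their population variance is about 0.55.

```python
import statistics

# Eight SNRu samples for the '+' stimulus, copied from Table 5.1 (dB).
plus_snr_u = [17.93, 17.63, 19.22, 18.90, 17.54, 18.93, 18.00, 16.98]

mean = statistics.mean(plus_snr_u)
pop_std = statistics.pstdev(plus_snr_u)      # population standard deviation
pop_var = statistics.pvariance(plus_snr_u)   # population variance

print(f"mean = {mean:.2f} dB, std = {pop_std:.2f} dB, var = {pop_var:.2f} dB^2")
```

The same check can be repeated for any other row by replacing the list of samples.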
5.4 Experiment IV: Perceptual Asymmetry I
In this experiment, two sets of images have been examined, and each set consists of one base image and four test images. Fig. 5.8 shows all eight images used in this experiment. (The same base image and one common test image in both the sets have been used.)
Figure 5.8: Examples of visual textures used in Experiment IV. (a) Base image. (b)-(e) The first set of four test images used in Experiment IVa. (e)-(h) The second set of four test images used in Experiment IVb. The embedded stimuli in the center of (a)-(h) are '+', '+1', '+2', '+3', 'L', 'T', 'H', and '5'-shaped, respectively.
In all the test images shown in Fig. 5.8, different stimuli have been embedded in the center while keeping the +-shaped stimuli as background. Orientation randomness and positional jitter with a = b = 1 are introduced to make perception more difficult. For all the stimuli embedded in the center, the gray levels have been set to the gray value of the +-shaped stimulus in order to avoid a first-order difference. Because of perceptual asymmetry, a human will get different visual impulses from these stimuli. The more salient one appears noisier [61]; therefore a lower SNR value can be expected in the experiment.
Fig. 5.9 provides SNRu and SNRv curves for the two sets of test images. In order to make the SNRv curve distinct from the SNRu curve, 10 dB is added to SNRv for display.
Figure 5.9: The curves of SNRu and SNRv obtained in Experiment IV. Left: (a) The curves of SNRu and SNRv for Experiment IVa. Right: (b) The curves of SNRu and SNRv for Experiment IVb. Horizontal axes: test images b, c, d, e in (a) and e, f, g, h in (b). Vertical axes: 10 to 30 dB. Legend: snru, snrv.
From Fig. 5.9a, it is observed that the SNR decreases monotonically from b to e, which means that the noise is increasing and, therefore, that it should become visually easier to distinguish the embedded center from the background. Indeed, if the four test images (Fig. 5.8) used in Experiment IVa are observed, it is difficult to discriminate the stimuli in the center from the +-shaped background in Fig. 5.8b, whereas it is easy to detect the L-shaped stimuli in the center in Fig. 5.8e.
From Fig. 5.9b, it is observed that the SNR increases monotonically from e to h. Thus, visually, the L-shaped stimuli stand out more distinctly than the other three stimuli. Though a strict psychophysical experiment is needed to justify this result, a finding reported in the literature [67] is that the L-shaped stimulus is more visually salient than the T-shaped one, which corresponds to the result presented here. Also, psychophysically, the size of the stimulus (defined as the minimal enclosing ellipse in [61]) is an important factor, and is often considered in explaining psychophysical phenomena. The bigger the size is, the more salient the stimulus appears [66]. For the four stimuli embedded in Figs. 5.8e - 5.8h, there is an increasing order of such a size. This also supports the effectiveness of the method presented here.
It is easy to explain this by referring to the results obtained in Experiment III. For the first set of four test images, Table 5.1 demonstrates that, if the SNR values are ranked, then SNR('+1') > SNR('+2') > SNR('+3') > SNR('L'). Since the L-shaped stimulus is the noisiest stimulus, visually it is the most salient. The same reasoning applies to the second set, since SNR('L') < SNR('T') < SNR('H') < SNR('5') holds in this case.
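Both rankings can be read off the "Mean" column of Table 5.1. The following sketch simply sorts the tabulated means; the dictionary layout and the function name `ranked` are illustrative choices, and the ranking is evaluated separately along u and v.

```python
# Mean SNR values (SNRu, SNRv) in dB, copied from the "Mean" column of Table 5.1.
table_means = {
    "+":  (18.14, 18.17),
    "+1": (15.93, 16.66),
    "+2": (14.89, 15.18),
    "+3": (12.98, 13.39),
    "L":  (12.77, 11.82),
    "T":  (12.93, 12.89),
    "H":  (14.07, 13.81),
    "5":  (16.74, 16.14),
}

def ranked(stimuli, axis):
    """Sort stimuli from highest to lowest mean SNR along one axis (0 = u, 1 = v)."""
    return sorted(stimuli, key=lambda s: table_means[s][axis], reverse=True)

first_set = ["+1", "+2", "+3", "L"]    # stimuli embedded in Experiment IVa
second_set = ["L", "T", "H", "5"]      # stimuli embedded in Experiment IVb
print(ranked(first_set, 0), ranked(second_set, 0))
```

Along both axes the first set sorts as '+1', '+2', '+3', 'L' and the second as '5', 'H', 'T', 'L', so the L-shaped stimulus has the lowest mean SNR in each set.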
Fig. 5.12 provides examples of the Gabor magnitude images of Figs. 5.8b, 5.8e and 5.8g. Note that since Fig. 5.8e is similar to Fig. 1.1e, except for a slight positional jitter (a = b = 1), Fig. 5.12f is regarded as an approximation to the perceptual boundary of Fig. 5.8e. In these three images, dark borders are observed in the center. However, in Fig. 5.12d, this dark border hardly exists, thereby indicating the difficulty of detecting the perceptual boundary. When the dark borders in Figs. 5.12e and 5.12f are compared, a wider border is observed in Fig. 5.12f, thereby indicating that the region of the L-shaped stimuli in Fig. 5.8e is more salient than that of the H-shaped stimuli in Fig. 5.8g.
5.5 Experiment V: Perceptual Asymmetry II
This experiment starts from a base image with orientation randomness. Then the stimuli in the center are replaced with different stimuli, and this region is increased at the same time. The following four combinations have been tested.
1) +/L combination. The +-shaped stimuli are replaced by the L-shaped stimuli. Examples are provided in Figs. 5.10a, 5.10b and 1.1e.
2) L/+ combination. The L-shaped stimuli are replaced by the +-shaped stimuli. Examples are provided in Figs. 5.10c – 5.10e.
3) T/L combination. The T-shaped stimuli are replaced by the L-shaped stimuli. Examples are provided in Figs. 5.10f, 5.10g and 1.1f.
4) S/mirror-S (S/mS) combination. The S-shaped stimuli are replaced by the mS-shaped stimuli. Examples are provided in Figs. 5.10h and 5.10i.
Figure 5.10: Examples of visual textures used in Experiment V. (a)(b) +/L combinations, with region size r=2 and 14, respectively; (c)(d)(e) L/+ combinations, with region size r=2, 8, and 14, respectively; (f)(g) T/L combinations, with region size r=2 and 14, respectively; (h)(i) S/mS combinations, with region size r=2 and 8, respectively.
Fig. 5.11 provides curves of SNRu and SNRv versus region size r for the four combinations. In order to make the SNRv curves distinct from the SNRu curves, 20 dB is added to SNRv for display.
Figure 5.11: The curves of SNRu and SNRv versus region size r obtained in Experiment V. Horizontal axis: r = 1 to 16. Vertical axis: 0 to 45 dB. Legend: +/L u, +/L v, L/+ u, L/+ v, T/L u, T/L v, S/mS u, S/mS v.
For the +/L combination, after the replacement, the center region filled with the L-shaped stimuli segregates spontaneously from the background filled with the +-shaped stimuli; a human is thus very sensitive to this replacement. Refer to Figs. 5.10a, 1.1e, and 5.10b. This explains the rapid decline of the SNR curve. As the center region grows with r, the L-shaped stimuli gradually dominate human vision. Hence, the SNR value reaches its nadir when r equals 16, at which point the texture is totally different from the base image.
For the L/+ combination, the same decreasing trend is observed as the +/L combination, thereby implying that the center region filled with the +-shaped stimuli also segregates from the background filled with the L-shaped stimuli. However, it is obvious that, compared with that of the +/L combination, the SNR level is reduced by about 2 - 5dB. In fact, perceptual asymmetry arises from this reduction.
For the T/L combination, a human finds it difficult to distinguish the center region from the background in preattentive vision. However, there exist some doubts about whether they (i.e., background and foreground) are identical. Thus the SNR curve is very steady before r reaches a value in the range 10 - 12, but, as can be noticed, decreases slightly. However, with an increase in r , especially when the number of the L-shaped stimuli outweighs that of the T-shaped stimuli, this difference can be detected, which results in a rapid decrease of the curve. Compare Figs. 5.10f and 5.10g for details.
For the S/mS combination, a human has no way of preattentively detecting the difference between the S-shape and the mS-shape. Even when the image is viewed stimulus by stimulus, a cue is needed for detecting this difference. Thus, the corresponding SNR curve should be steady everywhere. However, there is an inevitable variation from sample to sample. Fig. 5.11 shows such a varying curve, but no obvious decreasing trend can be observed in it. This observation also matches the texton theory, according to which the S-shaped and the mS-shaped stimuli have the same number of textons.
The above phenomena can be explained as follows. For each combination, the noise (i.e., the non-harmonic frequency components) arises from two parts: (i) the background with orientation randomness (note that if positional jitter is also introduced, as in Experiment IV, then it introduces noise too), and (ii) the replacement of the stimuli. Note that the first part is fixed (in practice, it varies from sample to sample), and only the second part varies with the region size r. When r is very small, the noise arising from the background is dominant. But with the increase in r, the role that the second part plays depends on the following three situations.
1) If the noise introduced by stimulus replacement is very large, this role becomes dominant since the number of the replaced stimuli is equal to $r^2$. Thus a rapid decline in the SNR value is observed. Replacement of the +-shaped stimulus with the L-shaped stimulus, or vice versa, belongs to this situation because their SNR levels have a distinct difference of about 5 - 7 dB, according to Table 5.1. Perceptual asymmetry arises in this situation because of the different 'noisy' extent of the background. The 'noisier' L-shaped stimuli, as the background, make the detection of the +-shaped stimuli at the center more difficult than the inverse case.
2) If the noise introduced by stimulus replacement is very weak, the role of the second part is also very weak at the beginning. Thus a steady SNR curve with a slight decrease is observed. However, as this noise accumulates, its effect can no longer be neglected, and a salient SNR decline is then observed. Replacement of the T-shaped stimulus with the L-shaped stimulus, or vice versa, belongs to this situation because their SNR levels have, according to Table 5.1, a slight difference of about 1 - 3 dB. Perceptual asymmetry is also weak in this situation.
3) If the noise introduced by stimulus replacement is near zero, then the replacement has no effect. Thus a steady SNR curve without decline is observed. Replacement of the S-shaped stimulus with the mS-shaped stimulus, or vice versa, belongs to this situation. Their SNR levels can be expected to be similar. If the order S/mS is changed to mS/S, i.e., by embedding the S-shaped stimuli into the mS-shaped stimuli, no perceptual asymmetry will exist.
From the above observation, the phenomena of preattentive texture segregation can be roughly divided into three classes (only the cases with the same first-order statistics are considered here):
1) spontaneous segregation. The +/L and L/+ combinations belong to this class. Both perceptual asymmetry and perceptual boundary exist for this class. This corresponds to the difference between the SNR values of two stimuli, which is above 5dB.
2) semi-segregation. The T/L combination belongs to this class. Perceptual asymmetry may exist, or exist but may not be so apparent, for this class. It is the same for perceptual boundaries. This corresponds to the difference between the SNR values of two stimuli, which is about 1-3dB.
3) non-segregation. The S/mS combination belongs to this class. No perceptual asymmetry and no perceptual boundary exist for this class. There is no difference between the SNR values of two stimuli, or the difference is below 1dB.
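The three classes amount to a simple decision rule on the difference between the SNR values of the two stimuli. The sketch below encodes the thresholds stated above; the function name `segregation_class` is hypothetical, and because the text gives no rule for differences between 3 dB and 5 dB, the function returns `None` there instead of guessing.

```python
def segregation_class(snr_difference_db):
    """Map an SNR difference (dB) between two stimuli to a segregation class."""
    d = abs(snr_difference_db)
    if d > 5.0:
        return "spontaneous segregation"
    if 1.0 <= d <= 3.0:
        return "semi-segregation"
    if d < 1.0:
        return "non-segregation"
    return None  # 3-5 dB: no rule is stated for this range

print(segregation_class(6.0), segregation_class(2.0), segregation_class(0.5))
```

The absolute value makes the rule independent of which stimulus forms the background, so it describes the class only and does not capture perceptual asymmetry.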
If the Gabor function is employed to detect the perceptual boundary, these three classes can be revealed more clearly. Figs. 5.12f – 5.12i provide such examples. In Figs. 5.12f and 5.12g, clear boundaries can be observed but the size of the boundaries is different. Perceptual asymmetry exists because of this difference. It is obvious that Fig. 5.12f has a wider boundary than Fig. 5.12g, indicating that the center filled with the L-shaped stimuli against the background filled with +-shaped stimuli is more salient than the converse case. However, both the cases belong to the class of spontaneous segregation. In Fig. 5.12h, no clear boundary can be observed, but there is the luminance difference between the center region and the background, indicating a semi-segregation situation. In Fig. 5.12i, neither the boundary nor the luminance differences exist, and, therefore, there is non-segregation. In comparison with the results of Voorhees and Poggio [52], it is believed that the perceptual boundaries obtained from the presented model are more representative (than those of [52]) of the actual ones.
Figure 5.12: Examples of the Gabor magnitude images of different images shown above: (a) Fig. 5.4c; (b) Fig. 5.4d; (c) Fig. 5.4e; (d) Fig. 5.7b; (e) Fig. 5.7h; (f) Fig. 1.1e; (g) Fig. 5.10d; (h) Fig. 1.1f; (i) Fig. 5.10i. Images (a)-(c) are normalized to the range [0-128], while images (d)-(i) are normalized to the range [0-255].
In this chapter, an attempt has been made to examine certain models of visual texture perception, which are unified in one framework. It is surprising that our simple framework interprets all these visual phenomena successfully. The perceptual boundaries, wherever applicable, have also been presented. The next chapter will conclude the thesis.
CHAPTER 6
Conclusions
The thesis deals with the problems of (i) texture retrieval using a tree-structured wavelet transform; and (ii) the modeling of visual texture perception in the preattentive vision of the human visual system (HVS). The leitmotiv of the thesis is the existence of an intimate relationship between these two apparently different problems. It is acknowledged that the human ability to visually characterize textures is astonishing, but attempts (in the literature) to quantify it have so far been barely satisfactory. Therefore, if a machine is to detect textures in the image of a scene, and identify them as belonging to (or retrieve them from) the contents of an image library, it would be very useful to develop a mathematical model for imitating the human ability to discriminate textures.
To this end, an attempt has been made to characterize textures using wavelet transforms, and, subsequently, sets of experiments have been conducted in order to relate perceptual attributes to spectral properties.
In texture retrieval, it has been demonstrated that a combination of rank and energy distances yields better texture image retrieval than the energy distance alone, since this amounts to a better utilization of the information provided by wavelet decomposition. Some illustrative examples are also given.
In visual perception modeling, a new framework has been presented for a spectral interpretation of visual texture perception. The visual textures considered here are completely periodic and with positional jitter, orientation randomness, and both. Then a mechanism has been presented to extract quantitative measures, SNRu and SNRv , from the Fourier spectrum. These measures have been demonstrated to be very efficient in interpreting some perceptual phenomena of visual textures (illustrated in five different experiments), which have been synthesized to have periodicity and randomness, positional jitter, orientation randomness, perceptual asymmetry I, and II, respectively. Finally, the Gabor function has been employed to localize perceptual boundaries to provide spatial interpretations.
From these experimental results on visual texture perception, the main findings are as follows:
1) Julesz's texton theory [14][15] points out the importance of the geometric shape of the micropattern, which serves as the stimulus to the human visual system, though in preattentive vision this shape cannot be extracted exactly. From a spectral point of view, stimuli with different shapes have different frequency distributions. When this difference is large enough, human visual perception can discover it and, furthermore, employ it to detect the perceptual boundary or other phenomena. The proposed mechanism helps us to discover this difference, and the results presented above are believed to be more quantitative than those in the literature.
2) The proposed SNRu and SNRv measures are semi-local statistics extracted from the Fourier spectrum. They can therefore be used efficiently as an interpretation tool, but on their own they cannot locate the perceptual boundary in the spatial domain. Beck [11] suggests exploiting the multi-frequency channel nature of the human visual system, and employs the Gabor function as the multichannel filter. The experimental results presented here also confirm the utility of the Gabor function for this purpose. However, since the difference between the frequency components of various stimuli varies in the spectral domain, the choice of an appropriate parameter set for the Gabor function turns out to be somewhat critical.
3) These findings can also be applied to other texture-related research areas without any difficulty since visual texture is nothing but a specially simplified texture for study purposes. For example, in the first part of texture-based image retrieval, the SNRu and SNRv measures exhibit the visual differences between various textures, thereby demonstrating their competence for this task. In the case of texture segmentation, they can be utilized as an indicator of the existence or otherwise of diverse textures in the image field.
Some possible future areas for investigation include:
1) Though the SNRu and SNRv measures fit the visual phenomena presented in the experiments very well, critical psychophysical experiments are needed to test the robustness of these measures.
2) As indicated above, these measures are still semi-local statistics extracted from the Fourier spectrum, which is a global representation of all frequency components, while texture perception is a complex combination of both global and local properties. Therefore, it will be helpful if these measures can be combined with features extracted from the phase spectrum, which records the spatial information in a texture.
3) It would be interesting to explore these measures in a multi-scale context such as wavelet decomposition. While the wavelet transform is a localized, hierarchical filtering in the spatial domain, the proposed model is supposed to operate globally in the spatial domain. Hence, this integration of local filtering and global operation is believed to simulate the human perception of textures more precisely, since texture perception, including visual texture perception, is a sophisticated procedure combining local, global, and hierarchical information extracted from the texture field.
REFERENCES
[1] J. R. Bergen, "Theories of visual texture perception," in Spatial Vision, ed. D. Regan, pp. 114-134, CRC Press, New York, 1991.
[2] P. Brodatz, Textures: A photographic album for artists and designers. New York: Dover, 1966.
[3] B. Julesz, "Visual texture discrimination," IRE Trans. on Information Theory, Vol. 8, pp. 84-92, 1962.
[4] B. Julesz, H. L. Frisch, E. N. Gilbert, and L. A. Shepp, "Inability of humans to discriminate between visual textures that agree in second-order statistics," Biological Cybernetics, Vol. 31, pp. 137-140, 1973.
[5] W. K. Pratt, O. D. Faugeras, and A. Gagalowicz, "Visual discrimination of stochastic texture fields," IEEE Trans. on Systems, Man, and Cybernetics, Vol. 8, No. 11, November 1978.
[6] A. Gagalowicz, "A new method for texture fields synthesis: some applications to the study of human vision," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 3, No. 5, pp. 520-533, September 1978.
[7] J. D. Victor and S. E. Brodie, "Discriminable textures with identical Buffon needle statistics," Biological Cybernetics, Vol. 31, pp. 231-234, 1978.
[8] L. O. Harvey and M. J. Gervais, "Visual texture perception and Fourier analysis," Perception & Psychophysics, Vol. 24, pp. 534-542, 1978.
[9] W. Richards and A. Polit, "Texture matching," Kybernetik, Vol. 16, pp. 155-162, 1974.
[10] S. A. Klein and C. W. Tyler, "Phase discrimination of compound gratings: generalized autocorrelations analysis," Journal of Optical Society of America A, Vol. 3, pp. 868-879, 1986.
[11] J. Beck, A. Sutter, and R. Ivry, "Spatial frequency channels and perceptual grouping in texture segregation," Computer Vision, Graphics and Image Processing, Vol. 37, pp. 299-325, 1987.
[12] H. C. Nothdurft, "Sensitivity for structure gradient in texture discrimination," Vision Research, Vol. 25, pp. 1957-1968, 1985.
[13] B. Julesz, "Experiments in the visual perception of texture," Scientific American, Vol. 232, pp. 34-43, April 1975.
[14] B. Julesz, "Textons, the elements of texture perception, and their interactions," Nature, Vol. 290, pp. 91-97, March 1981.
[15] B. Julesz, "Texton gradients, the texton theory revisited," Biological Cybernetics, Vol. 54, pp. 245-251, 1986.
[16] J. Beck, "Similarity grouping and peripheral discrimination under uncertainty," American Journal of Psychology, Vol. 85, pp. 1-19, 1972.
[17] J. Beck, "Textural segmentation," Organization and Representation in Perception, ed. J. Beck, Hillsdale, NJ: Erlbaum.
[18] H. C. Nothdurft, "Texton segregation by associated differences in global and local luminance distribution," Proc. of Royal Society London B, Vol. 239, pp. 295-320, 1990.
[19] Shaohua Zhou, Y. V. Venkatesh, and C. C. Ko, "Spectral and spatial analysis of visual texture perception," submitted to IEEE Trans. on Pattern Analysis and Machine Intelligence, March 2000.
[20] Shaohua Zhou, Y. V. Venkatesh, and C. C. Ko, "Texture retrieval using tree-structured wavelet transform," accepted by Int'l. Conf. Computer Vision, Pattern Recognition and Image Processing (CVPRIP'2000), February 2000.
[21] J. K. Wu, "Content-based indexing of multimedia databases," IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 6, pp. 978-990, September 1997.
[22] A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Tools for content-based manipulation of image databases," Proc. SPIE Conf. on Storage and Retrieval of Image and Video Databases, Vol. 2, No. 2, pp. 33-47, February 1994.
[23] F. Liu and R. W. Picard, "Periodicity, directionality, and randomness: Wold features for image modeling and retrieval," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, No. 7, pp. 722-733, July 1996.
[24] R. W. Picard, T. Kabir, and F. Liu, "Real-time recognition with the entire Brodatz texture database," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 638-639, New York, June 1993.
[25] G. L. Gimelfarb and A. K. Jain, "On retrieving textured images from an image database," Pattern Recognition, Vol. 29, No. 9, pp. 1461-1483, 1996.
[26] B. S. Manjunath and W. Y. Ma, "Texture feature for browsing and retrieval of images from a database," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, No. 8, pp. 837-841, August 1996.
[27] J. M. Francos, A. Z. Meiri, and B. Porat, "A unified texture model based on a 2D Wold-like decomposition," IEEE Trans. on Signal Processing, Vol. 41, No. 8, pp. 2665-2677, August 1995.
[28] J. M. Francos, A. N. Narasimhan, and J. W. Woods, "Maximum likeliness parameter estimation of texture using a Wold-decomposition based model," IEEE Trans. on Image Processing, Vol. 4, No. 12, pp. 1655-1666, December 1995.
[29] R. Sriram, J. M. Francos, and W. Pearlman, "Texture coding using a Wold decomposition model," IEEE Trans. on Image Processing, Vol. 5, No. 9, pp. 1382-1386, September 1996.
[30] J. S. Weszka, C. R. Dyer, and A. Rosenfeld, "A comparative study of texture measures for terrain classification," IEEE Trans. on Systems, Man, and Cybernetics, Vol. 6, pp. 269-285, 1976.
[30] H. Tamura, S. Mori, and T. Yamawaki, "Textural features corresponding to visual perception," IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-8, No. 6, June 1978.
[31] R. M. Haralick, "Statistical and structural approaches to texture," Proc. of IEEE, Vol. 67, No. 5, pp. 786-804, May 1979.
[32] A. Rosenfeld and L. Davis, "Image segmentation and image models," Proc. of IEEE, Vol. 67, No. 5, pp. 764-772, May 1979.
[33] R. W. Conners and C. A. Harlow, "A theoretical comparison of texture algorithms," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 3, pp. 204-222, May 1980.
[34] M. Tuceryan and A. K. Jain, "Texture Analysis," Handbook of Pattern Recognition and Computer Vision, eds. C. H. Chen, L. F. Pau, and P. S. P. Wang, pp. 235-276. Singapore: World Scientific, 1993.
[35] T. R. Reed and J. M. H. du Buf, "A review of recent texture segmentation and feature extraction techniques," CVGIP: Image Understanding, Vol. 57, pp. 359-372, May 1993.
[36] J. Mao and A. K. Jain, "Texture classification and segmentation using multiresolution simultaneous autoregressive models," Pattern Recognition, Vol. 25, No. 2, pp. 173-188, 1992.
[37] G. R. Cross and A. K. Jain, "Markov random field texture models," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 5, No. 1, pp. 25-39, January 1983.
[38] C. Bouman and Bede Liu, "Multiple resolution segmentation of textured images," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 13, No. 2, pp. 99-113, February 1991.
[39] S. Krishnamachari and R. Chellappa, "Multiresolution Gaussian-Markov random field models for texture segmentation," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp. 251-267, February 1997.
[40] J. M. Coggins and A. K. Jain, "A spatial filtering approach to texture analysis," Pattern Recognition Letters, Vol. 3, pp. 195-203, 1985.
[41] A. C. Bovik, M. Clark, and W. S. Geisler, "Multichannel texture analysis using localized spatial filter," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 55-73, January 1990.
[42] A. C. Bovik, "Analysis of multichannel narrow-band filters for image texture segmentation," IEEE Trans. on Signal Processing, Vol. 39, No. 9, pp. 2025-2043, September 1996.
[43] A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," Pattern Recognition, Vol. 24, No. 12, pp. 1167-1186, 1991.
[44] T. Chang and C.-C. J. Kuo, "Texture analysis and classification with tree-structured wavelet transform," IEEE Trans. on Image Processing, Vol. 2, No. 4, pp. 429-441, October 1993.
[45] A. Laine and J. Fan, "Texture Classification by Wavelet Packet Signatures," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 15, No. 11, pp. 1186-1191, November 1996.
[46] M. Unser, "Texture classification and segmentation using wavelet frames," IEEE Trans. on Image Processing, Vol. 4, No. 11, pp. 1549-1560, November 1995.
[47] S. G. Mallat, "A theory of multiresolution signal decomposition: the wavelet representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, pp. 674-693, July 1989.
[48] S. G. Mallat, "Multifrequency channel decompositions of images and wavelet models," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 11, No. 7, pp. 674-693, July 1989.
[49] I. Daubechies, "The wavelet transform, time-frequency localization and signal analysis," IEEE Trans. on Information Theory, Vol. 36, No. 9, pp. 961-1005, September 1990.
[50] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Communications on Pure and Applied Mathematics, Vol. 41, No. 11, pp. 909-996, November 1990.
[51] J. R. Bergen and E. H. Adelson, "Early vision and texture perception," Nature, Vol. 333, pp. 363-364, May 1988.
[52] H. Voorhees and T. Poggio, "Computing texture boundaries from images," Nature, Vol. 333, pp. 364-367, May 1988.
[53] E. Barth, C. Zetzsche, and I. Rentschler, "Intrinsic two-dimensional features as textons," J. Opt. Soc. Am. A, Vol. 15, No. 7, pp. 1723-1732, July 1998.
[54] J. Malik and P. Perona, "Preattentive texture discrimination with early vision mechanisms," J. Opt. Soc. Am. A, Vol. 7, No. 5, pp. 923-932, May 1990.
[55] J. R. Bergen and M. S. Landy, "Computational modeling of visual texture segregation," in Computational Models of Visual Processing, eds. M. S. Landy and J. A. Movshon, pp. 253-271, MIT Cambridge, Mass. 1991.
[56] D. Sagi, "The psychophysics of texture segmentation," in Early Vision and Beyond, ed. T. V. Papathomas, pp. 70-78, MIT Cambridge, Mass. 1995.
[57] M. R. Turner, "Texture Discrimination by Gabor functions," Biological Cybernetics, Vol. 55, pp. 77-82, 1986.
[58] I. Fogel and D. Sagi, "Gabor filters as texture discriminators," Biological Cybernetics, Vol. 61, pp. 103-113, 1989.
[59] M. Porat and Y. Y. Zeevi, "Localized texture processing in vision: analysis and synthesis in the Gaborian space," IEEE Trans. on Biomedical Engineering, Vol. 36, No. 1, pp. 115-129, January 1989.
[60] B. S. Rubenstein and D. Sagi, "Spatial variability as a limiting factor in texture-discrimination tasks: implications for performance asymmetries," J. Opt. Soc. Am. A, Vol. 7, No. 9, pp. 1632-1643, September 1990.
[61] B. S. Rubenstein and D. Sagi, "Preattentive texture segmentation: the role of line terminations, size, and filter wavelength," Perception & Psychophysics, Vol. 58, No. 4, pp. 489-509, 1996.
[62] A. Treisman, "Preattentive processing in vision," Computer Vision, Graphics and Image Processing, Vol. 31, pp. 156-177, 1985.
[63] A. Treisman, "Features and objects in visual processing," Sci. Am., Vol. 255, pp. 106-125, 1986.
[64] R. Gurnsey and R. Browse, "Micropattern properties and presentation conditions influencing visual texture discrimination," Perception & Psychophysics, Vol. 41, pp. 239-252, 1987.
[65] R. Gurnsey and R. Browse, "Aspects of visual texture discrimination," in Computational Processes in Human Vision: An Interdisciplinary Perspective, ed. Z. Pylyshyn, Ablex, Norwood, NJ, 1988.
[66] R. Gurnsey and R. Browse, "Asymmetries in visual texture discrimination," Spatial Vision, Vol. 4, pp. 31-44, 1989.
[67] B. J. Krose, A description of visual structure, Ph.D. dissertation, Delft University of Technology, Delft, The Netherlands, 1986.
[68] A. R. Rao and G. L. Lohse, "Towards a texture naming system: identifying relevant dimensions of texture," Proc. IEEE Conf. Visualization, pp. 220-227, San Jose, California, October 1993.
[69] D. Dunn and W. E. Higgins, "Optimal Gabor filters for texture segmentation," IEEE Trans. on Image Processing, Vol. 4, No. 7, pp. 947-964, July 1995.
[70] T. Weldon, W. E. Higgins, and D. Dunn, "Gabor filter design for multiple texture segmentation," Optical Engineering, Vol. 35, No. 10, pp. 2852-2863, October 1995.
APPENDIX A
Filter Coefficients of Wavelet Transforms
For the sake of completeness, this appendix lists, in C language format, the filter coefficients h(n) of the Haar transform and of Daubechies' 16-tap wavelet transform [50], which have been used in the experiments on texture retrieval. Note that the Haar transform is the special case of Daubechies' 2-tap wavelet transform.
double daub2[2] = { 0.707106781186547, 0.707106781186547 };
double daub16[16] = {
    0.05441584224300, 0.3128715909140, 0.6756307362970, 0.5853546836540,
    -0.01582910525600, -0.2840155429620, 0.0004724845740000, 0.1287474266200,
    -0.01736930100200, -0.04408825393100, 0.01398102791700, 0.008746094047000,
    -0.004870352993000, -0.0003917403730000, 0.0006754494060000, -0.0001174767840000
};
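As a sanity check on the listing above, the following sketch (not part of the original experiments) tests two standard properties of an orthonormal low-pass filter h(n): the coefficients sum to approximately sqrt(2), and their squares sum to approximately 1. It also derives a high-pass filter by the alternating-flip rule g(n) = (-1)^n h(L-1-n). The function names are hypothetical, and the sign convention for g(n) is one common choice assumed here; the thesis may use a different one.

```c
#include <assert.h>
#include <stddef.h>

/* Low-pass coefficients h(n), copied from the listing in this appendix. */
const double daub2[2] = { 0.707106781186547, 0.707106781186547 };
const double daub16[16] = {
    0.05441584224300, 0.3128715909140, 0.6756307362970, 0.5853546836540,
    -0.01582910525600, -0.2840155429620, 0.0004724845740000, 0.1287474266200,
    -0.01736930100200, -0.04408825393100, 0.01398102791700, 0.008746094047000,
    -0.004870352993000, -0.0003917403730000, 0.0006754494060000, -0.0001174767840000
};

/* Sum of the coefficients; close to sqrt(2) for an orthonormal low-pass filter. */
double filter_sum(const double *h, size_t len)
{
    assert(h != NULL);
    double s = 0.0;
    for (size_t n = 0; n < len; n++)
        s += h[n];
    return s;
}

/* Sum of the squared coefficients; close to 1 for an orthonormal filter. */
double filter_energy(const double *h, size_t len)
{
    assert(h != NULL);
    double e = 0.0;
    for (size_t n = 0; n < len; n++)
        e += h[n] * h[n];
    return e;
}

/* Hypothetical helper (assumed convention): g(n) = (-1)^n h(len - 1 - n). */
void highpass_from_lowpass(const double *h, double *g, size_t len)
{
    assert(h != NULL && g != NULL);
    for (size_t n = 0; n < len; n++) {
        double sign = (n % 2 == 0) ? 1.0 : -1.0;
        g[n] = sign * h[len - 1 - n];
    }
}

/* Returns 1 when |a - b| <= tol; written without <math.h> on purpose. */
int nearly_equal(double a, double b, double tol)
{
    double d = a - b;
    if (d < 0.0)
        d = -d;
    return d <= tol;
}
```

Because the listed coefficients are truncated, the checks should be made with a tolerance (for instance 1e-9) rather than exact equality. The derived high-pass filter is expected to sum to approximately zero, which reflects the fact that it rejects the constant (DC) component of the signal.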
APPENDIX B
The Notation of Subband ID in Table 3.2
The notation of a subband ID obeys the following rules: (1) the number of digits indicates the decomposition level; (2) after one decomposition, the corresponding four subbands, namely the LL, LH, HL, and HH subbands, are denoted by the digits '1', '2', '3', and '4', respectively; and (3) the leftmost digit denotes the subband obtained after the first-level decomposition, and the rightmost digit denotes the subband obtained after the last-level decomposition.
Take subband '3214' as an example. It results from a 4-level decomposition. After the 1st-level decomposition, the HL subband ('3') is further decomposed at the 2nd level; then the LH subband ('32') of subband 3 is further decomposed at the 3rd level; then the LL subband ('321') of subband 32 is further decomposed at the 4th level, and the HH subband ('3214') of subband 321 is the final subband. The same logic applies to the other subbands.
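The rules above can be sketched in C as follows. This is only an illustrative decoder under the stated digit mapping; the function names `subband_name`, `subband_level`, and `subband_path`, as well as the '>' separator in the output string, are hypothetical and do not come from the thesis software.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one digit of a subband ID to its subband name (rule 2), or NULL if invalid. */
const char *subband_name(char digit)
{
    switch (digit) {
    case '1': return "LL";
    case '2': return "LH";
    case '3': return "HL";
    case '4': return "HH";
    default:  return NULL;
    }
}

/* Decomposition level = number of digits (rule 1).
   Returns -1 for an empty ID or an ID containing a digit outside '1'..'4'. */
int subband_level(const char *id)
{
    assert(id != NULL);
    size_t n = strlen(id);
    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (subband_name(id[i]) == NULL)
            return -1;
    return (int)n;
}

/* Write the decomposition path, first level on the left (rule 3),
   e.g. "HL>LH>LL>HH" for the ID "3214".
   Returns 0 on success, -1 on an invalid ID or a buffer that is too small. */
int subband_path(const char *id, char *out, size_t out_size)
{
    assert(id != NULL && out != NULL);
    int level = subband_level(id);
    if (level < 0)
        return -1;
    /* Each level needs two letters plus either a separator or the final NUL. */
    if (out_size < (size_t)level * 3)
        return -1;
    size_t pos = 0;
    for (int i = 0; i < level; i++) {
        const char *name = subband_name(id[i]);
        if (i > 0)
            out[pos++] = '>';
        out[pos++] = name[0];
        out[pos++] = name[1];
    }
    out[pos] = '\0';
    return 0;
}
```

For the example in the text, `subband_level("3214")` gives 4, and `subband_path` fills the buffer with the string HL>LH>LL>HH, which reads left to right as the sequence of subbands selected at levels 1 to 4.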