                              The University of Monofiya
                          Faculty of Computers and Information



Gray Image Coloring Using Texture Similarity Measures

                Thesis Submitted in accordance with the requirements of
                     The University of Monofiya for the degree of
                         Master of Computers and Information
                              (Information Technology)



           Eng. Noura Abd El-Moez Al Sebaey Semary
                         Department of Information Technology
                         Faculty of Computers and Information
                                  Monofiya University



                                        Supervised by

 Prof. Mohiy Mohamed Hadhoud
 Department of Information Technology
 Faculty of Computers and Information
 Monofiya University

       [                      ]

 Prof. Nabil Abdel Wahid Ismail
 Department of Computer Science
 Faculty of Computers and Information
 Monofiya University

       [                      ]


                                  Dr. Wail Shawki El-Kilani

                             Department of Information Technology
                             Faculty of Computers and Information
                                      Monofiya University

                                    [                     ]



                                          Year 2007
                                The University of Monofiya
                             Faculty of Computers and Information



Gray Image Coloring Using Texture Similarity Measures

                 Thesis Submitted in accordance with the requirements of
                      The University of Monofiya for the degree of
                           Master of Computers and Information
                                (Information Technology)



            Eng. Noura Abd El-Moez Al Sebaey Semary
                           Department of Information Technology
                           Faculty of Computers and Information
                                    Monofiya University



                                  Examiners Committee:

 Prof. Dr. Said El Saied El-Khamy
 Department of Electrical Engineering
 Faculty of Engineering
 Alexandria University

       [                         ]

 Prof. Dr. Moawad Ibrahim Dessoky
 Department of Electronics and Electrical Communications
 Faculty of Electronic Engineering
 Monofiya University

       [                         ]


 Prof. Dr. Mohiy Mohamed Hadhoud
 Department of Information Technology
 Faculty of Computers and Information
 Monofiya University

       [                         ]

 Prof. Dr. Nabil Abdel Wahid Ismail
 Department of Computer Science
 Faculty of Computers and Information
 Monofiya University

       [                         ]



                                          Year 2007

[Arabic title page and Arabic examiners committee page, Year 2007]

                         BIOGRAPHY

The researcher, Noura Abd El-Moez Semary, received the Bachelor of Information
Technology degree from the Faculty of Computers and Information, Cairo University,
in 2001, with the grade of Very Good with Honors.

In 2002 she received the diploma of the Information Technology Institute (ITI) of
the Ministry of Communications, with a grade of Very Good.

In 2003, she joined the Information Technology Department, Faculty of Computers and
Information, Monofiya University, as a demonstrator. She registered for the master's
degree in Information Technology in November 2004.




                     ACKNOWLEDGMENT
   First of all, I thank God for enabling me to achieve this work and giving me the ability
to finish it in this distinct and satisfactory form. Whatever is good in it is from God, and
whatever is bad is from myself and from the devil.

   I would like to express my appreciation to my supervisor, Prof. Mohiy Mohamed
Hadhoud, for his continuous support and encouragement during the research study of
this thesis. He greatly influenced my way of thinking and of developing the research ideas
adopted in this thesis. I am very grateful for his strong effort with me and his highly
useful advice throughout the development of this work.

   I would also like to express my great and special thanks and gratitude to my
supervisors Prof. Nabil Abdel Wahed Ismail and Dr. Wail Shawki El Kilani for their great
support and useful advice until the end of this thesis. Special thanks to Dr. Fathi El-Said
for his care and help in solving some research problems, and great thanks to all my
colleagues who have helped me during the research period.

   Finally, this work is especially dedicated to my dear father, who has encouraged me
along my way to success from preschool until now, and also to my husband for his
patience, support and encouragement all the time. I would also like to give special thanks
to my dear mother, brothers and sister. Without their prayers, encouragement,
patience and support I would not have been able to finish this work in this manner.

   And now, I thank the distinguished examiners for their revision and their care in
guiding me toward completing this thesis in a satisfactory form.


                                    ♥♥ Thank You ♥♥




                                ABSTRACT
       The trend of coloring gray-scale images has become a very interesting research
point nowadays, since it is used to increase the visual appeal of images such as old
black and white photos, movies, and scientific illustrations, and different techniques for
doing so exist in the literature. There are many traditional ways of coloring, such as
hand-coloring and pseudocoloring, but automatic computer-based coloring is an
important goal for many researchers in this field. From our point of view, all of these
attempts still depend on human interaction in the coloring process.


In this thesis, we propose a new system for automatic gray image coloring. The
proposed system is built around a texture-based recognition system that recognizes the
objects in textural images, such as natural scenes, and then retrieves their actual colors
from prior knowledge about their classes. This procedure is an attempt to simulate
human vision in this area.


The proposed Texture Recognition based Image Coloring System (TRICS) is
composed of three stages: a segmentation stage, which segments the image into
different regions according to their textural characteristics; a classification stage,
which classifies or recognizes each region as one of a predefined set of texture classes;
and a coloring stage. For each class, a hue color is attached, which is transferred to the
regions of that class in this final coloring stage.
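
As a rough illustration of how these three stages fit together, the following sketch (not
the thesis implementation; the helper functions segment_by_texture and classify_region
and the CLASS_HUES table are hypothetical placeholders) chains segmentation,
classification, and HSV-based coloring for a gray image whose intensities lie in [0, 1]:

    # Conceptual sketch of the three-stage pipeline (segmentation, classification,
    # coloring). segment_by_texture, classify_region and CLASS_HUES are hypothetical
    # placeholders, not part of the TRICS implementation.
    import numpy as np

    CLASS_HUES = {"sky": 0.58, "grass": 0.30, "sand": 0.12}  # assumed hue table, range [0, 1]

    def color_gray_image(gray, segment_by_texture, classify_region, saturation=0.5):
        """gray: 2-D array of intensities in [0, 1]; returns an HSV image."""
        labels = segment_by_texture(gray)             # stage 1: texture-based segmentation
        hsv = np.zeros(gray.shape + (3,))
        hsv[..., 2] = gray                            # keep the gray values as the V channel
        for region_id in np.unique(labels):
            mask = labels == region_id
            class_name = classify_region(gray, mask)  # stage 2: texture classification
            hsv[mask, 0] = CLASS_HUES[class_name]     # stage 3: attach the class hue
            hsv[mask, 1] = saturation
        return hsv

Converting the returned HSV array back to RGB (for example with
matplotlib.colors.hsv_to_rgb) yields the colored image while leaving the original
luminance untouched.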


This thesis studies different gray image features, segmentation and classification
techniques, and different color models in order to select the techniques most suitable
for achieving the best coloring results.




                                SUMMARY
Since automatic computer-based coloring is an important goal for many image
processing researchers in this field, we have proposed an intelligent, fully automatic
coloring system for textural images such as natural images. It works by segmenting the
image into different textured regions, finding the most suitable class for each region
from a set of textures stored in a special database, and then recoloring each image part
with the color attached to the recognized class.

The texture features used in this thesis vary between wavelet coefficients, Laws'
kernels, co-occurrence matrix measures, and Tamura measures. Both mean shift and
fast K-mean clustering techniques were used in the segmentation stage, and a KNN
classifier is used in the classification stage. Finally, the coloring process was performed
in the HSV color space to preserve the realism of the image. We have also proposed
simple modifications that improve the system results and make the system fully automatic.
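
For concreteness, a minimal sketch of the KNN decision used in the classification stage
is given below; the feature vectors and class names are invented for illustration, and
plain Euclidean distance stands in for whichever similarity measure is actually chosen.

    # Minimal k-nearest-neighbor classification of one region's texture feature vector.
    # The training features and labels below are toy values for illustration only.
    import numpy as np
    from collections import Counter

    def knn_classify(query, train_features, train_labels, k=3):
        """Return the majority class label among the k nearest training samples."""
        dists = np.linalg.norm(train_features - query, axis=1)   # Euclidean distances
        nearest = np.argsort(dists)[:k]                          # indices of the k closest samples
        return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

    # Toy training set: each row is a texture feature vector (e.g. wavelet energies).
    train_features = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]])
    train_labels = ["grass", "grass", "sky", "sky"]

    region_features = np.array([0.15, 0.85])   # features of one segmented region
    print(knn_classify(region_features, train_features, train_labels))   # -> sky

In such a scheme, the hue attached to the winning class would then be transferred to the
region in the HSV coloring stage, so the stored gray values are never modified.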

The results are presented for a large number of gray natural images. We have also
introduced a comparison between our results and those obtained by other techniques,
and our system's results are found to be acceptable and very satisfactory.

The thesis is organized as follows:

Chapter 1 introduces the problem and the traditional solutions, such as pseudocoloring
and the different ways of coloring offered by image processing software such as
PaintShop and Photoshop. It also gives a brief introduction to automatic coloring and
the proposed technique.

Chapter 2 surveys the previous work contributed to automatic coloring, which we
classify into three trends. The chapter presents all of these approaches with examples
that illustrate the disadvantages and problems of each.




Chapter 3 covers the basic principles of texture analysis and the different algorithms
used in the literature for extracting texture features. The chapter also presents the main
databases used as benchmarks for evaluating textured image analysis techniques.

Chapter 4 presents the most common segmentation and classification techniques used
in our system, such as mean shift, K-mean, and the K nearest neighbor classifier. The
different techniques used to enhance the results are also presented.

Chapter 5 presents the proposed coloring system with its three main stages in detail.
The chapter also presents the main improvements we have made to enhance the results
of each stage. Comparisons between the different techniques used in each stage are
also presented.

Chapter 6 presents the system results, conclusions, and future work. The chapter also
presents comparisons between our system and two well-known coloring systems.

Finally, Appendix A presents the different color models (RGB, HSV, HLS, etc.) to
help readers understand the specification of each color model mentioned.

Keywords:
Colorization, Texture features, Segmentation, Classification, Wavelets, Laws, Color
models




                     TABLE OF CONTENTS

Subject                                                                                                       Page

   List of Tables..........................................................................................   x
   List of Figures......................................................................................... xi
   List of Abbreviations..............................................................................        xvi
   List of Symbols....................................................................................... xxii
   List of Publications................................................................................       xxviii
Chapter 1. Introduction                                                                                       1
   1.1. Digital image principles..................................................................            1
   1.2. Coloring Problem............................................................................ 5
   1.3. Coloring Trials................................................................................       6
        1.3.1. Hand coloring......................................................................... 6
        1.3.2. Semi automatic coloring......................................................... 8
        1.3.3. Automated coloring................................................................ 10
   1.4. Research Objective.......................................................................... 11
   1.5. Thesis Organization......................................................................... 11

Chapter 2. Previous works                                                                                     12
   2.1. Transformational coloring................................................................ 12
   2.2. Matched image coloring.................................................................. 13
   2.3. User selection coloring.................................................................... 23
   2.4. Video coloring trials........................................................................ 27

Chapter 3. Texture                                                                                            30
   3.1. What is texture?..............................................................................        30
   3.2. Texture analysis .............................................................................        32
   3.3. Texture features .............................................................................. 34



      3.3.1. Statistical methods.................................................................         34
                − Co-occurrence Matrix....................................................... 35
                − Tamura.............................................................................. 36
      3.3.2. Geometrical method............................................................... 39
      3.3.3. Model-based methods.............................................................             41
      3.3.4. Signal processing methods..................................................... 42
           − Laws’ texture energy.............................................................. 42
         − Wavelets Analysis.................................................................. 44
  3.4. Texture Databases........................................................................... 53
           − Brodatz...................................................................................   53
           − VisTex.................................................................................... 53
           − MeasTex................................................................................. 55
           − CUReT.................................................................................... 55
           − OUTex.................................................................................... 56
           − PANN..................................................................................... 58


Chapter 4. Image Analysis Techniques                                                                      60
  4.1. Segmentation techniques................................................................. 60
      4.1.1. Mean Shift algorithm ............................................................. 60
                − What is it?......................................................................... 61
                − Mean shift filtering...........................................................         64
                − Mean shift Segmentation..................................................               65
                − Choice of bandwidth......................................................... 67
                − Use of variable bandwidth................................................ 67
                − Calculation of bandwidth using k-nearest neighbors........ 68
                − Speeding up meanshift computations...............................                       69
                − Kd-Trees...........................................................................     70
                − Locality-sensitive hashing................................................              72



        4.1.2. K-Mean clustering algorithm.................................................               75
                 − Accelerated k-mean..........................................................           78
                 − Applying the triangle inequality.......................................                79
                 − K-mean example............................................................... 81
    4.2. Classification Techniques ..............................................................         82
        4.2.1. Similarity Measurement ........................................................ 83
                 − Heuristic histogram distances........................................... 83
                 − Non-parametric test statistics............................................ 83
                 − Ground distance measures................................................               84
                 − Information-theoretic divergence.....................................                  84
        4.2.2. Nearest neighbor classifier..................................................... 85
                 − K-Nearest Neighbor Algorithm........................................                   86
                 − Strength and Weakness of K-Nearest Neighbor
                  Algorithm.......................................................................... 88
        4.2.3. Validation Algorithms............................................................ 89



Chapter 5.       Texture Recognition Based Coloring System                                                91
    5.1. Segmentation Stage......................................................................... 92
        5.1.1. Feature extraction phase......................................................... 92
        5.1.2. Clustering phase / Mean Shift ............................................... 97
        5.1.3. Clustering phase / K-mean..................................................... 99
             − Adaptive k-mean....................................................................        102
             − Minimum region size estimation............................................ 104
    5.2. Classification Stage........................................................................ 106
    5.3. Coloring Stage..............................................................................    115


Chapter 6.       Results And Conclusions                                                                  119
        6.1. Experiments ..............................................................................   119



      6.2. System results............................................................................    121
      6.2.1. Accurate Results ...................................................................        121
      6.2.2. Misclassified results..............................................................         126
      6.3. Comparisons..............................................................................     128
      6.4. Conclusion ................................................................................   132
      6.5. Future Work..............................................................................      134


Appendix A. Color Models                                                                                 135
      A.1. Color Fundamentals...............................................................             135
      A.2. Color Models.......................................................................... 137
          A.2.1. RGB.................................................................................    137
          A.2.2. CMY/CMYK...................................................................             137
          A.2.3. HSI/HLS..........................................................................       139
          A.2.4. HSV/HSB........................................................................         140
          A.2.5. YIQ..................................................................................   145
          A.2.6. YUV................................................................................. 146
          A.2.7. CIE Lab/ lαβ....................................................................        147


References                                                                                               151




                        LIST OF TABLES
   Table                                             Title                                            Page

Table 1-1   A comparison between a colored and gray image……..……….                                      4

Table 1-2   Examples of different colors with the same gray value………..                                 7

Table 3-1   Texture databases comparison.................................................             58

Table 4-1   Similarity measures comparisons.............................................              85

Table 5-1   Mean shift results....................................................................    98

Table 5-2   K-mean results.........................................................................   105

Table A.1   Hue and luminance distribution................................................            141

Table A.2   Change in saturation................................................................      141

Table A.3   RGB values from HSV..............................................................         144




                          LIST OF FIGURES
    Figure                                                Title                                          Page

Figure 1-1   Normalized RGB cube model.....................................................                 2

Figure 1-2   Binary image example.................................................................          2

Figure 1-3   Intensity image example..............................................................          2

Figure 1-4   Indexed image example ………………………………………..                                                        4

Figure 1-5   True color image example……………………………………                                                         4

Figure 1-6   Hand coloring examples………………………………………..                                                        7

Figure 1-7   Pixel based Pseudocoloring …………………………………....                                                   9

Figure 1-8   Group based Pseudocoloring.......................................................              9

Figure 1-9   Coloring using different color map.............................................                9

Figure 2-1   Al-Gindy coloring system diagram and results.........................                          14

Figure 2-2   Global matching coloring result..................................................              17

Figure 2-3   Night vision image coloring........................................................            17

Figure 2-4   Local color transfer results..........................................................         17

Figure 2-5   Welsh swatches...........................................................................      20

Figure 2-6   Matting coloring algorithm sequence..........................................                  20

Figure 2-7   H. Nodaa Coloring system result.................................................               22




Figure 2-8    Fully automatic coloring system.................................................                    22

Figure 2-9    A.Levin system results................................................................              26

Figure 2-10   V. Konushin coloring results.......................................................                 26

Figure 2-11   S. Premože aerial orthoimagery coloring..................................                          26

Figure 2-12   D. Sýkora cartoon coloring system............................................                      29

Figure 2-13   The animation results for the canyon model...............................                           29

Figure 3-1    Example samples of Brodatz texture database............................                             31

Figure 3-2    Comparison between different texture types..............................                            31

Figure 3-3    Co-occurrence matrix calculations..............................................                     36

Figure 3-4    Comparison between a sine wave and wavelet wave.................                                    45

Figure 3-5    Time-frequency tiling for sampled data, FFT, and DWT basis
              functions.....................................................................................      47

Figure 3-6    1D signal discrete Wavelet Transform for 1 level......................                              49

Figure 3-7    Three levels discrete wavelets decomposition............................                            49

Figure 3-8    The analysis filter bank for 2D wavelet decomposition..............                                 51

Figure 3-9    The reconstruction filter bank for 2D wavelet decomposition....                                     51

Figure 3-10   2D wavelets decomposition coefficients.....................................                         52

Figure 3-11   Some samples of wavelets families.............................................                      52

Figure 3-12   Samples from Brodatz database benchmark...............................                              54




Figure 3-13   Samples from Vistex database benchmark.................................                                     54

Figure 3-14   Sample from MeasTex database benchmark...............................                                       57

Figure 3-15   Samples from Columbia-Utrecht database benchmark...............                                             57

Figure 3-16   Outex database example..............................................................                        57

Figure 3-17   Samples from PANN database benchmark.................................                                       59

Figure 4-1    Successive computations of the meanshift define a path leading to a
              local density maximum........................................................                               66

Figure 4-2    Mean shift filtering and segmentation.........................................                              66

Figure 4-3    Mean shift segmentation result....................................................                          66

Figure 4-4    kd-tree example...........................................................................                  71

Figure 4-5    The difference between linear and data-driven partitioning.......                                           74

Figure 4-6    The locality-sensitive hashing data structure..............................                                 74

Figure 4-7    K-mean procedure block diagram.............................................                                 76

Figure 4-8    K-mean example.........................................................................                     81

Figure 4-9    Nearest neighbor classifier in action on a two class problem.....                                           87

Figure 5-1    TRICS diagram.....................................................................................          91

Figure 5-2    Interpolated wavelets coefficients vs. previous level reconstruction
              coefficients.............................................................................................   94

Figure 5-3    Modified wavelets feature extraction....................................................                    94




Figure 5-4    Wavelets approximation levels...............................................................            95

Figure 5-5    Three levels modified DWT subbands..................................................                    95

Figure 5-6    Laws kernels coefficients......................................................................         96

Figure 5-7    FAMS results ........................................................................................   98

Figure 5-8    k-mean with spatial features ................................................................           100

Figure 5-9    Segmentation results before and after splitting disjoint regions………                                     100

Figure 5-10   Elimination of small regions..................................................................          101

Figure 5-11   Segmentation results by Laws and wavelets.........................................                      101

Figure 5-12   Example to illustrate the significance of the VRC index.....................                            103

Figure 5-13   Training set1 generation.......................................................................         107

Figure 5-14   Training set2 samples...........................................................................        107

Figure 5-15   samples of the artificial images used in set1 classification..................                          109

Figure 5-16   Largest rectangle extraction example...................................................                 109

Figure 5-17   64×64 rectangle extraction example.....................................................                 109

Figure 5-18   Padding results by ZR, MR, and OBP...................................................                   112

Figure 5-19   Classification results for GLCM, Tamura features...............................                         112

Figure 5-20   Natural scenes classification results......................................................             114

Figure 5-21   HSV/HSB and HSI/HLS channels........................................................                    116

Figure 5-22   Change in saturation coloring results....................................................               118




Figure 5-23   Replacing saturation with the complement of lightness.......................                          118

Figure 6.1    Brodatz textures experiment results.....................................................              120

Figure 6.2    Training set images................................................................................   120

Figure 6.3    Segmentation with spatial features......................................................              122

Figure 6.4    Segmentation without spatial features.................................................                123

Figure 6.5    Other test set images..............................................................................   124

Figure 6.6    PANN Database, segmentation with splitting disjoint regions.............                              125

Figure 6.7    Misclassified results............................................................................     127

Figure 6.8    Comparison between LCT, GIM techniques and TRICS system.........                                      129

Figure 6.9    Comparison with GIM technique..........................................................               130

Figure 6.10   HSI- HSV coloring comparison......................................................                    131

Figure A.1    Absorption of light by the red, green, and blue cones in the human
              eye as a function of wavelength........................................................               136

Figure A.2    RGB color cube.....................................................................................   137

Figure A.3    Additive colors and subtractive colors.................................................               138

Figure A.4    HSV,HSL color models.......................................................................           140

Figure A.5    Different models representation...........................................................            141

Figure A.6    Hue, Saturation and Lightness/Brightness channels............................                         142

Figure A.7    XYZ and Lab color spaces...................................................................           148




             LIST OF ABBREVIATIONS

     Abbreviation                                  Meaning                                          Page

1D         One Dimension......................................................................       42

2D         Two Dimension.....................................................................        19

3D         Three Dimension...................................................................        32

ACA        Adaptive Clustering Algorithm...............................................              81

AMISE      Asymptotic Mean Integrated Squared Error.........................                         62

AR         Autoregressive........................................................................      41

ARMA       Autoregressive-Moving-Average..........................................                   41

ASM        Angular Second Moment.......................................................              35

B          Blue Channel..........................................................................   142

BCSS       Between Clusters Sum Of Squares.........................................                 102

BW         Black-to-White Axis…………………………………………                                                      139

CBIR       Content-Based Image Retrieval………………………………..                                              111

CCIR       International Radio Consultative Committee…………………….                                      145

CEC        Combined Estimation Criterion……………………………….                                               103

CIE        Commission Internationale de l'Eclairage……………………..                                       145

CIR        Color Infrared Image………………………………………….                                                     21

CML        Coupled Map Lattices…………………………………………                                                      25

CMY        Cyan, Magenta and Yellow Color Model…………………….                                            137




CMYK    Cyan, Magenta, Yellow and Black Color Model…………….     137

Con     Contrast………………………………………………………..                        35

Cor     Correlation…………………………………………………….                       35

CRT     Cathode Ray Tube……………………………………………..                    135

CUReT   Columbia University And Utrecht University…………………      55

CV      Cross Validation……………………………………………….                    90

CVC     Color Variation Curve…………………………………………                  28

CVM     Cramer/Von Mises Distance Measure…………………………            84

CWT     Continuous Wavelet Transform……………………………….              45

DB      Database………………………………………………………..                        116

DCT     Discrete Cosine Transform…………………………………….               27

DWT     Discrete Wavelet Transform…………………………………..              49

E5      5 Elements Edge Kernel Mask Of Laws……………………….          43

EMD     Earth Mover Distance………………………………………….                  84

Ent     Entropy………………………………………………………..                         35

FAMS    Fast Adaptive Mean Shift……………………………………..               69

FCM     Fuzzy C-Mean………………………………………………….                       81

G       Green Channel…………………………………………………                       142

GIM     Global Image Matching Technique……………………………             128

GLCM    Gray Level Co-Occurrence Matrix…………………………….            35

GMM     Gaussian Mixture Model………………………………………                  16

H       Hue Channel……………………………………………………                        139




HLS     Hue, Lightness And Saturation Color Model………………….         118

HF      High Frequency………………………………………………..                        14

HSB     Hue, Saturation And Brightness Color Model…………………          5

HSI     Hue, Saturation And Intensity Color Model…………………..         5

HSV     Hue, Saturation And Value Color Model……………………             118

I       Intensity/Lightness Channel In HSI Model…………………..         142

I       1st Chromatic Channel In YIQ color Model…………………..         145

IDCT    Inverse Discrete Cosine Transform……………………………              14

ISCT    Image Sequence Color Transfer……………………………….                28

JD      Jeffrey-Divergence…………………………………………….                      85

KL      Kullback-Leibler Divergence………………………………….                 85

KNN     K-Nearest Neighbor……………………………………………                       86

KS      Kolmogorov-Smirnov Distance…………………………………                   84

l       Luminance Channel In Lαβ Color Model……………………….             148

L       L Channel In LMS Model/ Or Lightness Channel In HSL…….    148

L5      5 Elements Level Kernel Mask Of Laws………………………             43

L5S5'   25 Elements Matrix Of L5 * S5’ Of Laws……………………..          43

LCT     Local Color Transfer Technique……………………………….               128

LMS     Color Space Used For Conversion……………………………..              149

LOOCV   Leave-One-Out Cross Validation……………………………...              89

Lp      Minkowski-Form Distance…………………………………….                    83

LSH     Locality-Sensitive Hashing…………………………………….                  69




LUT       Look Up Table…………………………………………………                             8

lαβ       Lαβ Color Model………………………………………………                            5

M         M Channel In LMS Channel…………………………………..                     148

MA        Moving-Average……………………………………………….                           41

MAP       Maximum A Posteriori………………………………………...                      21

MeasTex   The Measurement Of Texture Classification Algorithms……..    55

MISE      Mean Integrated Squared Error………………………………..                 67

MR        Mirroring Padding……………………………………………..                        110

MRF       Markov Random Field Model………………………………….                     21

MRI       Magnetic Resonance Imaging………………………………….                    10

MRS       Minimum Region Size…………………………………………                         65

NTSC      National Television System Committee Video Standard ……..    145

O(N)      Order Of N……………………………………………………..                            71

OBP       Object-Based Padding………………………………………….                       110

OUTex     University Of Oulu Texture Database…………………………               56

PAL       Phase Alternating Line Video Standard……………………….             145

PBF       Pixel Based Feature…………………………………………….                       92

Q         2nd Chromatic Channel In YIQ Color Model…………………...          145

QF        Quadratic Form………………………………………………...                         84

R         Red Channel……………………………………………………                             142

R5        5 Elements Ripple Kernel Mask Of Laws……………………..             43

RBF       Region Based Features…………………………………………                       108




RGB      Red , Green, And Blue Color Model…………………………..              1

S        Saturation Channel/ Or S Channel In LMS Channel…………..     142

S5       5 Elements Spot Kernel Mask Of Laws……………………….             43

SECAM    ‘Sequential Couleur Avec Memoire’ Video Standard………..     145

SEM      Scanning Electron Microscopy………………………………...               10

SOFM     Self-Organizing Feature Map Neural Network………………        12

TEM      Texture Energy Measure………………………………………                     43

TRICS    Texture Recognition Based Image Coloring System…………..     91

TSS      Total Sum Of Squares…………………………………………                      102

U        1st Chromatic Channel In YUV Color Model…………………….          146

USA      United States Of America …………………………………….                  145

V        Brightness/Value Channel/ Or 2nd Chromatic Channel In

         YUV Color Model………………………………………………..                        144

VisTex   Vision And Modeling Group At The MIT Media Lab………...      53

VRC      Variance Ratio Criterion……………………………………….                  102

W5       5 Elements Wave Kernel Mask Of Laws………………………              43

WCSS     Within Cluster Sum Of Squares……………………………….                102

WMV      Weighted-Mean-Variance Distance……………………………                83

X        X Component Of XYZ Color Model…………………………..                148

X2       X2 Distance Measure…………………………………………..                     84

XYZ      Device Independent Color Space………………………………                147




Y       Y Component Of XYZ Color Model/ Or Luminance

        Component In YIQ And YUV Models………………………..               145

YCbCr   Color Model Used In Digital Component Video ……………...     146

YIQ     YIQ Color Model………………………………………………                         5

YPbPr   Color Model Used In Analog Component Video ……………..       146

YUV     YUV Color Model……………………………………………..                       145

Z       Z Component Of XYZ Color Model…………………………..               148

ZR      Zero Padding…………………………………………………..                        110

α       1st Chromatic Channel In Lαβ Color Model……………………          148

β       2nd Chromatic Channel In Lαβ Color Model…………………...        148




                        LIST OF SYMBOLS
      Symbol                                           Meaning                                              Page

A              Asymmetric matrix.................................................................           25

acj            Approximation coefficient for scale j.....................................                   47

ad             Directional angle....................................................................        38

adp            Position of pth peak................................................................         39

Av             Average..................................................................................    36

B1, B2, B3     Basis images...........................................................................      28

b              Block size as a power of base 2.............................................                 37

c              Coefficient..............................................................................    62

C              Category or class....................................................................        86

cj             Cluster center of index j.........................................................           75

Cj             Overall cluster center.............................................................         102

Cli            Class index.............................................................................     87

CLp            Image clusters.........................................................................      65

coP            Co-occurrence probability density function...........................                        35

D              Diagonal matrix whose diagonal elements are the sum of

               the affinities...........................................................................    25

d              Dimension.............................................................................

dcj            Detail/wavelet coefficient for scale j.....................................                  47




dco    Co-occurrence distance..........................................................              35

dij    Dissimilarity between i,j........................................................             84

E      Distance / error......................................................................        18

e      Number of iterations..............................................................            78

ed     Edge strength..........................................................................       38

f(.)   Mapping function...................................................................           15

ƒˆ     Probability density function...................................................               62

fc     Cost function..........................................................................       23

Fcon   Contrast..................................................................................    37

Fdir   Directionality..........................................................................      39

Flin   Line likeness...........................................................................      39

Freg   Regularity...............................................................................     39

Frgh   Roughness...............................................................................      39

g      Gradient function....................................................................         63

G,K    Kernels....................................................................................   64

Gi     Gaussian component...............................................................             16

Hd     Histogram................................................................................     38

Hdir   Histogram of quantized directions............................................                   38

hn     Edges length/window width....................................................                 61

hr     Range bandwidth....................................................................           65

hs     Spatial bandwidth...................................................................          65

hw     Wavelets features bandwidth.................................................                  98



hφ    Wavelet filter for scale function.............................................                51

hψ    Wavelet filter for base function..............................................                51

i,j   Counters..................................................................................    ...

I,J   Images.....................................................................................   ...

Ic    Color image.............................................................................      8

Ick   K channel color component....................................................                 12

Ig    Gray image..............................................................................      8

Is    Source color image.................................................................           13

It    Target image...........................................................................       16

k     K-mean clusters number.........................................................               75

ki    Number of samples for class i.................................................                87

knn   K nearest neighbors................................................................           86

l     Luminance of a pixel..............................................................            14

L2    Euclidean Distance metric.....................................................                18

Li    Pixel label..............................................................................     65

low   Lower bound..........................................................................         80

M     Image height...........................................................................       5

m     Meanshift function..................................................................          64

Mc    Total number of classes.........................................................              87

mco   Co-occurrence matrix width step...........................................                    36

N     Image width...........................................................................        5

n     Number of samples................................................................             62



Nco          Co-occurrence matrix size (gray levels) ................................                        35

nco          Co-occurrence matrix height step...........................................                     36

np           Neighborhood pixels............................................................                 18

np           Number of peaks..................................................................               39

Nt           Number of training samples...................................................                   87

p            Pixel........................................................................................   15

pi           Pixel of index i........................................................................        75

Pn           Probability density..................................................................           61

ps           Source pixel............................................................................        23

pt           Target pixel.............................................................................       23

q            A query point..........................................................................         74

r            Normalizing factor..................................................................            39

R            Set of real numbers.................................................................            47

R∩           Intersection region.................................................................            74

RU           Union region...........................................................................         74

ri           Segmented region i.................................................................             16

Rn           D-dimensional hypercube.......................................................                  61

s            Scale factor.............................................................................       45

Si           Image sequence......................................................................            28

t            Time........................................................................................    15

T1, T2, T3   Three target images.................................................................            28

Tk           Transformational function on channel k.................................                         12



                                                   xxv
                                                                                       List of Symbols




u        Unit norm vector.....................................................................      25

upp      Upper bound.............................................................. ............     80

Vn       Volume of region Rn and samples Kn falls in Rn...... ............                          61

vx, vy   Optical flow.............................................................. ............    24

W        (n pixels × n pixels) matrix whose elements are the pairwise

         affinities between pixels.......................................... ..............         25

wp       A range of angles around the p-th peak....................................                  39

wts      A weighting function that sums to one..........................................                19

Wφ       Wavelets coefficients for scale function.................................                  48

Wψ       Wavelets coefficients for base function..................................                  48

x,y      Pixel coordinates....................................................................       1

x1 q     Query sample feature 1...........................................................          86

x1 t     Training sample feature 1.......................................................           86

xi       Input points/samples...............................................................        64

yi       Meanshift basins of attraction points......................................                64

yi,c     Mean shift mean point............................................................          64

Z        Set of integers.........................................................................   47

zi       Output filtered points..............................................................       64

α    1st chromatic channel in the lαβ color model...............................                    148

α4   The kurtosis / fourth moment.................................................                  37

β    2nd chromatic channel in the lαβ color model..............................                    148




                                             xxvi
                                                                                   List of Symbols




ζ    Inequalities used to separate the buckets................................                  72

η    The number of partitions.........................................................          72

θ    Co-occurrence matrix angle....................................................             35

λ    Proportionality constant..........................................................         68

µ    Mean value..............................................................................   15

ρ    The power of Minkowski........................................................             83

σ    Standard deviation..................................................................       15

τ    Translation factor....................................................................     45

φ    Wavelet scaling function........................................................           48

ψ    Wavelet base function.............................................................         48

ψD   Diagonal wavelets...................................................................       48

ψH   Horizontal wavelets................................................................        48

ψV   Vertical wavelets....................................................................      48

ω    A unit hypercube centered at the origin.................................                   61

     Differentiation.........................................................................   63




                                        xxvii
                                                    List of Publications




             LIST OF PUBLICATIONS
[1] Noura A.Semary, Mohiy M. Hadhoud, W. S. El-Kilani, and Nabil A. Ismail,
    “Texture Recognition Based Gray Image Coloring”, The 24th National
    Radio Science Conference (NRSC2007), pp. C22, March 13-15, 2007, Faculty
    of Engineering, Ain-Shams Univ., Egypt.




                                   xxviii
Chapter 1                                                                  Introduction




                                CHAPTER ONE

                           INTRODUCTION
       In this chapter we present an introduction to the gray image coloring problem,
starting with a short description of digital image principles and the difference between
colored and gray images. The chapter also introduces the different techniques used for
coloring. Following that, a brief introduction to our proposed coloring system is
presented.


1.1 Digital image principles

A digital image can be considered a discrete function I(x,y), where x and y are spatial
(plane) coordinates, and the amplitude of I at any pair of coordinates (x,y) is called the
intensity or the gray level of the image at that point. Each point in the image is called a
pixel and is denoted by its position and intensity. For a colored image, the color of a
pixel is defined by a triple of values, each varying from 0 to 255 (or from 0 to 1 for
normalized values), describing the amount of the red, green and blue color components
of this pixel. This type of image is sometimes called a full color image, an RGB image,
or alternatively a contone image. The term contone comes from the fact that these
images give the appearance of a continuous range of color tones. The RGB cube color
model (figure 1.1) is the simplest color model describing the color space. Other color
models are presented in Appendix A. Typically we can consider four types of images:

       -    Binary image. In a binary image, each pixel is represented by a single bit,
            so its color is either black or white. Figure 1.2 shows an example of a
            binary image.

       -    Intensity image or gray scale image. A grayscale image contains only
            shades of gray, much like a black and white photograph or TV movie. It is sometimes




                                            1
Chapter 1                                                               Introduction








                          Figure 1.1: Normalized RGB cube model




                                 Figure 1.2: Binary image






                                    Figure 1.3: Intensity image




                                            2
Chapter 1                                                                   Introduction




       called a monochrome image. When the red, green, and blue values are equal
       (the marked diagonal in figure 1.1), the pixel is considered gray, so each pixel
       in a grayscale image can be represented by a single value giving the gray level,
       from 0 for black through to 255 for white. Typically each pixel is represented
       by a single byte, giving 256 levels of gray (figure 1.3).
       (Note that the red, green and blue values in the figure are normalized.)

       -    Indexed image. In this type, each pixel holds an index referring to a triple
            RGB color value in an attached color map. The color depth of the image is
            measured by the number of triple values in the color map. Figure 1.4
            presents an indexed image with a 256-level color map.

       -    True color image. In this type, each color component (also called a channel)
            is represented by a single byte (8 bits), giving 256 discrete levels per
            color channel. Each pixel is therefore represented by 3 bytes, or 24 bits,
            giving a total of about 16 million different combinations of red, green and
            blue. The quality of a 24-bit RGB image is adequate for almost all
            purposes, as presented in figure 1.5.

Table 1.1 shows the difference between an RGB image and the gray version of the
same image. Notice that the total size of the image includes the header of the image
file.




                                             3
Chapter 1                                                                 Introduction




                              Figure 1.4: Indexed image




                             Figure 1.5: True color image


       Image              Image Size (pixel)   Possible No. of colors   Actually used No. of colors   Total size

       (color version)       170 × 120 px           16,777,216                   15,998                60.0 kB

       (gray version)        170 × 120 px               256                        246                 21.2 kB


               Table 1.1: A comparison between a colored and gray image




                                          4
Chapter 1                                                                    Introduction




1.2 Coloring Problem:

Gray image coloring is a relatively new research area, since it can be used to increase
the visual appeal of images such as old black and white photos, movies or scientific
illustrations. In addition, the information content of some scientific images can be
perceptually enhanced with color by exploiting variations in chromaticity as well as
luminance. To illustrate the coloring problem, the next paragraphs show how a color
image is converted to gray and why the reverse conversion is not uniquely defined.

There are two common definitions of the gray value as a function of the three basic
components of the RGB color model (red, green, and blue):

1: Intensity (the most commonly used):
                          Gray = (Red + Green + Blue) /3                            (1.1)

2: Luminance (NTSC standard):
                   Gray = 0.299 Red + 0.587 Green + 0.114 Blue                      (1.2)

These two equations are not reversible; that means if we have a gray value we cannot
convert it back to its red, green, and blue components. Each gray value from 0 to 255
corresponds to roughly 256×256 combinations of completely different colors. For
instance, table 1.2 shows some samples of different colors that have the same gray value.

As a further illustration, when a color image is converted to a gray scale one using
other color models such as HSB, HSI, YIQ and lαβ (Appendix A), only the intensity
channel (B, I, Y and l respectively) is transferred to the gray image, while the chromatic
channels (HS, IQ and αβ respectively) are set to zero. That means that for an M×N gray
image there are 2×M×N missing chromatic values needed to convert it back to a
colored image.
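
As a concrete illustration of this irreversibility, the short Python sketch below (the
sample triples are chosen purely for illustration and are not taken from table 1.2)
evaluates both gray definitions of equations (1.1) and (1.2) and shows that several
distinct RGB triples collapse to the same intensity gray level:

def gray_intensity(rgb):
    # Equation (1.1): simple average of the red, green and blue components.
    r, g, b = rgb
    return (r + g + b) / 3.0

def gray_luminance(rgb):
    # Equation (1.2): NTSC weighted sum of the red, green and blue components.
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

# Three different (illustrative) colors that share the same intensity gray level.
samples = [(128, 128, 128), (100, 156, 128), (50, 200, 134)]
for rgb in samples:
    print(rgb, "intensity:", round(gray_intensity(rgb)),
          "luminance:", round(gray_luminance(rgb)))

Running the sketch prints the same intensity gray (128) for all three triples while their
luminance values differ, underlining that a single gray value cannot identify the
original color.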




                                           5
Chapter 1                                                                  Introduction




1.3 Coloring Trials:

The coloring problem has attracted researchers, especially in the image processing
area, and coloring trends have appeared since the 1980s. We classify these trends into
three categories: hand coloring, semi-automatic coloring, and automatic coloring.
The next subsections illustrate these trends in detail.

1.3.1 Hand coloring:

Hand coloring has long been used by artists as a way of showing their talent, and many
attempts at hand coloring gray images to enhance their visual appearance can be found.
For instance, the coloring of the classic movie Casablanca in 1988 took more than two
months and nearly US$ 450,000 to complete, in a process that involved, among many
other labor intensive steps, researching historical wardrobe notes from the original
movie's set to discover the specific colors that the actors and actresses wore in most of
the scenes [1].

One of the commercial software packages, BlackMagic [2] (figure 1.6 (a)), which is
designed for still image colorization, provides the user with a range of useful brushes
and color palettes. Its main drawback is that the segmentation task is done completely
manually. Many attempts also use common image editing software such as Adobe
Photoshop and Paintshop Pro. Most of these are based on segmenting the image
manually and duplicating the image into another layer; one layer is colored with
suitable solid colors, the other is made partially transparent, and finally the two layers
are merged to produce the final colored image, as in figure 1.6(b). Another way is to
change the hue and the saturation of the different image parts, as in figure 1.6(c).




                                             6
Chapter 1                                                                            Introduction




             RGB Color          R, G, B values         Gray value          Gray Color

                                 100, 150, 87             128


                                 147, 87, 149             128


                                 149, 147, 87             128


                  Table 1.2: Examples of different colors with the same gray value




                         (a)                                              (b)




                                                 (c)

            Figure 1.6: (a) BlackMagic (b) Photoshop layers (c) Paintshop Pro colorization




                                                 7
Chapter 1                                                                   Introduction




1.3.2 Semi automatic coloring:

Pseudocoloring is a common example of semi-automatic coloring for adding color to
grayscale images; the mapping of luminance values to color values is done by
converting the gray image to an indexed image. The choice of the colormap is
commonly determined by a human decision. This colormap is also called a
'Look Up Table' (LUT).

Pseudocoloring can be performed per pixel, so that the pixel value is used as an index
into a triple color value (figure 1.7). Alternatively, quantization can be used to group
similar pixels so that they take the same color value; the quantization can be done on
the intensity value or on any other pixel feature (figure 1.8). We can therefore represent
the coloring process by the following equation:

                                  Ic(x,y) = colormap( Ig(x,y) )                      (1.3)

Where Ic is the color image, Ig is the gray image and (x,y) are the pixel coordinates. It is
difficult to select a suitable colormap, especially for images that have real colors such
as natural scenes or photos, so this method is usually used for images without real
colors, such as medical or industrial images. Figure 1.9 illustrates this with examples.
Figure 1.9 (a) is a normal gray scale image presented with its color map on the left;
palette index 0 is in the upper left corner and palette index 255 is in the lower right
corner. Figure 1.9 (b) is the same image using a spectrum color palette of 256 different
colors, with the mapping done by pixel index. Figure 1.9(c) uses another 256-entry
custom color palette with black highlights, blue darker mid tones and green shadows
running to black. Figure 1.9(d) is a group based pseudocoloring example using a
16-level custom color palette with black to white tones running through red.
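
The following minimal Python sketch (using NumPy; the ramp-style palette is an
illustrative stand-in for the palettes of figure 1.9) shows how equation (1.3) is applied
in the pixel based case: the gray value itself indexes a 256-entry look-up table.

import numpy as np

def pseudocolor(gray, colormap):
    # Equation (1.3): each 8-bit gray value is used as an index into a
    # 256-entry RGB look-up table (colormap has shape (256, 3)).
    return colormap[gray]            # fancy indexing applies the LUT per pixel

# Illustrative 256-entry LUT: black through red to white.  A group based
# palette would simply repeat one entry over a whole range of gray levels.
ramp = np.linspace(0.0, 1.0, 256)
colormap = np.stack([np.clip(2 * ramp, 0, 1),       # red rises first
                     np.clip(2 * ramp - 1, 0, 1),   # then green
                     np.clip(2 * ramp - 1, 0, 1)],  # and blue
                    axis=1)

gray = np.random.randint(0, 256, size=(120, 170))   # stand-in gray image
colored = pseudocolor(gray, colormap)
print(colored.shape)                                 # (120, 170, 3)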




                                            8
Chapter 1                                                                         Introduction




                        Figure 1.7: Pixel based pseudocoloring (each gray value g is used
                        directly as an index into the RGB look-up table)


                        Figure 1.8: Group based pseudocoloring (a group of similar gray
                        values shares one look-up table index)




                  (a)                                            (b)




                  (c)                                            (d)
                  Figure 1.9: Coloring using different color maps




                                          9
Chapter 1                                                                   Introduction




Pratt [1991] describes this method for medical images such as X-ray, Magnetic
Resonance Imaging (MRI), Scanning Electron Microscopy (SEM) and other imaging
modalities as an "image enhancement" technique, because it can be used to "enhance
the detectability of detail within the image" [3].

However, when a colormap that does not increase monotonically in luminance is used,
pseudocolored images may introduce perceptual distortions. Studies have found a
strong correlation between the perceived "naturalness" of face images and the degree
to which the luminance values increase monotonically in the colormap [3].

1.3.3 Automatic coloring:

Fully automated gray image coloring is an important trend in this field and many
attempts at it can be found, each using a different methodology; their techniques will
be covered in the next chapter. In this thesis we try to simulate human intelligence in
this area: we believe that making the computer recognize the image parts or objects,
and recolor them using previous knowledge of their real colors, leads to better results.

We have proposed an intelligent, fully automatic coloring system for textural images
such as natural images. It works by segmenting the image into differently textured
regions and then finding the most suitable class for each region from a set of different
textures stored in a special database. Each class is associated with a specific color
value. We make use of well known techniques for gray image texture segmentation
and classification. The colorization is done using the HSV/HSB (Hue, Saturation,
Value/Brightness) color model. Since in this model only the brightness channel carries
the gray image intensity, while the hue and saturation channels are zero, it is simple to
colorize the image by adding hue and saturation values for each pixel. The hue for any
region is taken from the color value of its texture class, and the saturation channel is
computed by inverting the original gray image intensity. Our system is compared with two




                                            10
Chapter 1                                                               Introduction




common coloring techniques in the literature. Our system's results are very satisfying
and outperform the other techniques.
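
To make the coloring step of the proposed system concrete, the sketch below assumes
that segmentation and classification have already produced a per-pixel hue map; the
colorsys-based conversion and the array names are illustrative rather than the actual
thesis implementation.

import numpy as np
import colorsys

def colorize_hsv(gray, hue_map):
    # Combine the gray image with per-region hues in the HSV model:
    # V = gray intensity, S = inverted intensity, H = texture class hue.
    h, w = gray.shape
    out = np.zeros((h, w, 3))
    for y in range(h):
        for x in range(w):
            v = gray[y, x]                      # value/brightness from the gray image
            s = 1.0 - v                         # saturation: inverted intensity
            out[y, x] = colorsys.hsv_to_rgb(hue_map[y, x], s, v)
    return out

gray = np.random.rand(60, 80)                   # stand-in normalized gray image
hue_map = np.full_like(gray, 0.33)              # e.g. an illustrative "grass" class hue
rgb = colorize_hsv(gray, hue_map)

Because only the hue and saturation channels are filled in while the value channel
keeps the original intensity, the luminance of the gray image is preserved exactly.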


1.4 Research Objectives

The main goal of this thesis is to improve the process of gray image coloring by using
recognition systems to simulate human vision in retrieving the missing colors of a
gray image. To achieve this goal, we concentrate our research on finding suitable
techniques for building a powerful texture based recognition system that recognizes
natural scene objects and colorizes them. This recognition system should be fully
automated and should not depend on any user support. As another important aim of our
research, we have sought to minimize the execution time of the whole coloring
process, which is a basic requirement in video coloring. By the end of this thesis, we
have reached our goals successfully.



1.5 Thesis Organization

The thesis is organized as follows: Chapter 2 surveys the previous work contributed
to automatic coloring. Chapter 3 covers the basic principles of texture analysis. In
chapter 4, the most common segmentation and classification techniques used in our
system are presented. The proposed coloring system is presented in chapter 5.
Results, conclusions and future work are presented in chapter 6, with comparisons
between our system and two well known coloring systems. At the end of this thesis,
appendix A presents the different color models to help readers understand the
specifications of each mentioned color model.




                                         11
Chapter 2                                                                 Previous works




                                CHAPTER TWO

                        PREVIOUS WORKS
       In this chapter, previous attempts at automatic coloring are discussed.
Automatic coloring has been proposed in the literature by many authors, but there is
no published work that surveys all the work in this area in one paper, and no
classification of the approaches has been listed. From our point of view, we classify
the related work, according to how the colors transferred to the gray image pixels are
obtained, into three categories: transformational coloring, matched image coloring
and user selected coloring. Each of these types will be presented in detail.


2.1 Transformational coloring:

In transformational coloring, the coloring is done by applying a function, which we
call a transformation function Tk, to the intensity value of each pixel Ig(x,y), resulting
in the chromatic value Ick(x,y) for channel k. Hence we can write:

                            Ick(x, y) = Tk[ Ig(x, y) ]                               (2.1)

Where Ig is the intensity value of the pixel at (x,y), and Ick is the chromatic value of
channel k for the same pixel. Since any color model needs three channels for the new
colored pixel (Appendix A), there must be three transformation functions that
transform the only available value, the intensity, into three channel values. Sometimes
more parameters are added to the equation, such as the pixel position or other features.
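
A minimal sketch of equation (2.1) is given below; the three sine/clip-based mapping
functions are invented for illustration only and do not correspond to any of the systems
cited in this chapter.

import numpy as np

def transformational_coloring(gray, t_r, t_g, t_b):
    # Apply three per-channel transformation functions T_k (equation 2.1)
    # to a normalized grayscale image and stack the results as an RGB image.
    return np.dstack([t_r(gray), t_g(gray), t_b(gray)])

# Illustrative transformation functions: a simple "rainbow"-style mapping.
t_r = lambda g: np.clip(1.5 * g, 0.0, 1.0)
t_g = lambda g: np.sin(np.pi * g)
t_b = lambda g: np.clip(1.5 * (1.0 - g), 0.0, 1.0)

gray = np.random.rand(120, 170)           # stand-in for a normalized gray image
colored = transformational_coloring(gray, t_r, t_g, t_b)
print(colored.shape)                      # (120, 170, 3)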

Pseudocoloring can be considered a special case of this type, where the transformation
function simply maps the gray levels directly through the color map (Chapter 1). J. Yoo et
al. [4] proposed a system that generalizes pseudocoloring. They used a SOFM
neural network for codebook construction from actual images. Variants of intensity


                                                12
Chapter 2                                                                  Previous works




are classified into code vectors for color encoding using a back-propagation neural
network. The drawback of their system is that they used the RGB color space
(Appendix A), which decreased the quality of the colored results.

The work of Al-Gindy et al. [5] is another attempt of this type. Although it is not
widely cited, it can be considered such a trial: they proposed a coloring system that
uses the frequency domain to extract new chromatic values for each gray pixel. The
technique is based on associating colors with spatial frequencies. The gray level image
is converted into the frequency domain using the discrete cosine transform, and then
two filters are used to generate the red and blue components of the RGB image. The
gray scale image is used to represent the green component. The RGB components are
combined to generate the color image. High frequencies are then removed from the
colored image to produce a more refined result. The technique diagram and results
are shown in figure 2.1.

This type of coloring lacks realism in the obtained colors. Besides, it may cause
inconsistencies or cut-off channel artifacts between the colors. It can still be useful for
images that do not have real colors, such as medical images.


2.2 Matched image coloring

In this technique, a colored image is selected manually to be the reference for the color
palette used in the coloring process. The pixels of the gray image Ig are matched with
the pixels of a source colored image Is; the color of the most similar pixel is transferred
to the corresponding gray one using the color transfer technique proposed by
E. Reinhard et al. [6]. The process can be described as follows:

For each gray pixel Ig(x,y) there exists a colored pixel Is(x2,y2) such that the distance E
(some similarity measure) between them is minimal. The chromatic values of Is are
transferred to Ig while the achromatic value of Ig remains unchanged.




                                            13
Chapter 2                                                                           Previous works




            [Block diagram of the Al-Gindy coloring technique and example gray/colored image pairs.]

                        Figure 2.1: Al-Gindy coloring system diagram and results




                                                       14
Chapter 2                                                                Previous works




From that description, the color model used for this type of coloring should have a
separate luminance channel. Authors differ in how the pixels of the two images are
matched; most of them use only the luminance value of the pixel, which means that
each pixel in the grayscale image receives the chromatic components of the pixel with
the nearest luminance in the reference color image. In the "global matching procedure"
of T. Welsh et al. [3], the entire color "mood" of the source is transferred to the target
image by matching luminance and texture information between the images. The
technique then transfers only the chromatic information and retains the original
luminance values of the target image.

First, each image is converted into the lαβ color space. Jittered sampling is used to
select a small subset of pixels in the color image as samples. Then, for each pixel in the
grayscale image in scan-line order, the best matching sample in the color image is
selected. The best match is determined using a weighted average of pixel luminance
and the neighborhood statistics (the standard deviation of the luminance values in the
pixel neighborhood). The chromaticity values (α and β channels) of the best matched
pixel are then transferred to the grayscale image, while the original luminance value is
retained to form the final image. The technique can therefore also be used for
recoloring color images. A sample of the results of this procedure is shown in figure 2.2.
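
The following simplified Python sketch outlines the global matching idea (match on
luminance plus neighborhood standard deviation, then copy the α and β channels); the
lαβ arrays are assumed to be precomputed, and the 50/50 weighting, sample count and
window radius are illustrative choices rather than the published settings.

import numpy as np

def neighborhood_std(lum, radius=2):
    # Standard deviation of the luminance in a (2r+1)x(2r+1) window per pixel.
    h, w = lum.shape
    out = np.zeros_like(lum)
    for y in range(h):
        for x in range(w):
            win = lum[max(0, y - radius):y + radius + 1,
                      max(0, x - radius):x + radius + 1]
            out[y, x] = win.std()
    return out

def global_match(target_l, source_l, source_a, source_b, n_samples=200, seed=0):
    # For every target pixel, find the jittered source sample with the closest
    # weighted (luminance, neighborhood std) signature and copy its alpha/beta.
    rng = np.random.default_rng(seed)
    ys = rng.integers(0, source_l.shape[0], n_samples)
    xs = rng.integers(0, source_l.shape[1], n_samples)
    src_std = neighborhood_std(source_l)
    samples = np.stack([source_l[ys, xs], src_std[ys, xs]], axis=1)

    tgt_std = neighborhood_std(target_l)
    out_a = np.zeros_like(target_l)
    out_b = np.zeros_like(target_l)
    for y in range(target_l.shape[0]):
        for x in range(target_l.shape[1]):
            # 50/50 weighting of luminance and neighborhood statistics.
            d = 0.5 * np.abs(samples[:, 0] - target_l[y, x]) \
                + 0.5 * np.abs(samples[:, 1] - tgt_std[y, x])
            k = int(np.argmin(d))
            out_a[y, x] = source_a[ys[k], xs[k]]
            out_b[y, x] = source_b[ys[k], xs[k]]
    return out_a, out_b          # the luminance of the target is kept unchanged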

The Welsh algorithm works best when the luminance distributions of the target and
source images are locally similar. Its performance degrades when the pixel histograms
of the target and source luminance images are substantially different. Therefore, in [7]
a preconditioning step precedes the matching step: a linear remapping of the luminance
distribution of the source image is computed first, such that the first and second order
statistics of the luminance distribution of the source image become equal to those of
the target image. This helps to create a better correspondence between the luminance
ranges of the target and source images, but does not alter the luminance values of the
target image. More concretely, if l(p) is the luminance of a pixel in the source image,
then the remapping is done as follows:




                                           15
Chapter 2                                                                    Previous works




                              l(p) = (σt / σs) · ( l(p) − µs ) + µt                      (2.2)

Where µt and µs are the mean luminances, and σt and σs are the standard deviations of
the luminance distributions, taken over the target and source images respectively. With
this improvement, the system has been used for coloring night-vision images, as in
figure 2.3.
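
A minimal sketch of the remapping of equation (2.2), assuming the luminance channels
are available as NumPy arrays:

import numpy as np

def remap_luminance(source_l, target_l):
    # Equation (2.2): shift and scale the source luminance so that its mean
    # and standard deviation match those of the target image.
    mu_s, sigma_s = source_l.mean(), source_l.std()
    mu_t, sigma_t = target_l.mean(), target_l.std()
    return (sigma_t / sigma_s) * (source_l - mu_s) + mu_t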


Yu-Wing Tai et al. [8] described a general color transfer algorithm (figure 2.4). They
perform probabilistic segmentation of both images using a Gaussian Mixture Model
(GMM): given an input image pair, Is for the source color image and It for the target
image (gray or color), a probabilistic segmentation of each is constructed so that the
colors in every segmented region ri can be fitted by a Gaussian component Gi of a
GMM. This guarantees that regions in the two input images are described by
comparable statistical models, so that a natural mapping can be achieved between
them; hence region ri and Gaussian component Gi are closely related. After performing
the probabilistic segmentation on the two input images Is and It, they obtain a set of
segmented regions, and a mapping function f(.) is established which maps each
Gaussian component Gtj (related to region rtj) in image It to some Gaussian in image
Is.

The function f(.) is estimated directly from color statistics. In practice, humans are
more sensitive to the luminance channel, where a monotonic constraint should be
maintained. Specifically, for a pair of Gaussian components Gti(ti; µti, σti) and Gtj(tj;
µtj, σtj) where µti ≥ µtj in the luminance channel, the transferred Gaussian means µ'
should also satisfy µ'ti ≥ µ'tj. In their method, Gtj is mapped to the Gsi whose mean in
the luminance channel is closest while the monotonic constraint is enforced.




                                                 16
Chapter 2                                                Previous works




            Figure 2.2 Global matching coloring result




             Figure 2.3 Night vision image coloring




              Figure 2.4 Local color transfer results




                                17
Chapter 2                                                                                       Previous works




After obtaining the mapping function f(.), which maps Gaussian component Gtj(tj; µtj, σtj)
to Gsi(si; µsi, σsi), the final transferred color at pixel It(x, y) is computed as:

               g( It(x, y) ) = Σj  tjPxy · ( (σsi / σtj) · ( It(x, y) − µtj ) + µsi )            (2.3)


Where tjPxy is the probability that It(x, y) belongs to region tj of the target image. This
guarantees smoothness after the transfer, especially in regions rich in colors.
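
The blending of equation (2.3) can be sketched as follows, assuming the GMM fitting
and the mapping f(.) have already produced per-pixel membership probabilities and
per-Gaussian luminance statistics (all array names and shapes are illustrative):

import numpy as np

def soft_color_transfer(target_l, probs, mu_t, sigma_t, mu_s, sigma_s):
    # Equation (2.3): blend one Gaussian-to-Gaussian remapping per region,
    # weighted by the per-pixel membership probabilities.
    #   probs          : (H, W, J) membership probability of each pixel in region j
    #   mu_t, sigma_t  : (J,) luminance statistics of the target Gaussians G_tj
    #   mu_s, sigma_s  : (J,) statistics of the mapped source Gaussians f(G_tj)
    h, w, n_regions = probs.shape
    out = np.zeros((h, w))
    for j in range(n_regions):
        remapped = (sigma_s[j] / sigma_t[j]) * (target_l - mu_t[j]) + mu_s[j]
        out += probs[:, :, j] * remapped
    return out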

Welsh et al. [3] also proposed another technique to improve the coloring results when
the matching results are not satisfying. It is achieved by asking the user to identify
and associate small rectangles, called "swatches" (figure 2.5), in both the source and
destination images to indicate how certain key colors should be transferred. This is
performed by luminance remapping as in the global procedure, but only between
corresponding swatches. Again, random jittered sampling with approximately 50
samples per swatch is used. The second step is similar to the texture synthesis
algorithms of Efros and Leung 1999 and Efros and Freeman 2001, in which the L2
(Euclidean) distance is used to find texture matches. The error distance E is defined
using the L2 metric between a neighborhood npg in the grayscale image and a
neighborhood nps in the colorized swatch as:

                              E(npg, nps) = Σp∈N [ Ig(p) − ls(p) ]²                      (2.4)


Where Ig is the grayscale image, ls is the luminance channel of the colorized swatch
and p ranges over the pixels in these neighborhoods. Note that at this stage the color
image is no longer searched for texture matches; the search is done only within the
colorized swatches of the target image. The advantage of the approach is that in the
first stage colors are transferred to the swatches selectively, which prevents pixels with
similar neighborhood statistics from the wrong part of the image from corrupting the
target swatch colors. It also allows the user to transfer colors from any part of the image
to a selected region, even if the two corresponding regions vary largely from one another



                                                         18
Chapter 2                                                               Previous works




in texture and luminance levels. Secondly, since more texture coherence is expected
within an image than between two different images, pixels which are similar in texture
to the colorized target swatches are expected to be colorized similarly.
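
A small sketch of the swatch matching error of equation (2.4); the patch extraction and
swatch bookkeeping are left out, and the function names are illustrative:

import numpy as np

def swatch_error(gray_patch, swatch_lum_patch):
    # Equation (2.4): squared L2 distance between a grayscale neighborhood and
    # the luminance of a colorized swatch neighborhood of the same size.
    return float(np.sum((gray_patch - swatch_lum_patch) ** 2))

def best_swatch_match(gray_patch, swatch_patches):
    # Return the index of the swatch neighborhood with the smallest error.
    errors = [swatch_error(gray_patch, p) for p in swatch_patches]
    return int(np.argmin(errors))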


T. Chen et al. [9] also built their coloring system on the idea of swatches (figure 2.6).
The first part of their approach is an efficient grayscale image matting algorithm in a
Bayesian framework. The foreground and background color distributions, and the
distribution of alpha, are modeled with spatially varying sets of Gaussians. The major
novelties of this matting algorithm are the introduction of the alpha distribution and
gradient into the Bayesian framework and an efficient optimization scheme. The
grayscale image matting algorithm can effectively handle objects with intricate and
visually sensitive boundaries, such as hair strands or facial features. In the second part,
by combining the grayscale image matting algorithm with swatch-based color transfer,
an efficient colorization scheme is proposed, which provides a great improvement over
existing techniques for some difficult cases, such as human faces or images with
confusing luminance distributions.


U. Lipowezky's [10] system uses swatches for aerial image coloring. The proposed
method is based on a simple premise: similar textures should have similar color
distributions. This premise is formalized by Bayesian texture classification of the
grayscale destination imagery with a set of textural prototypes, or swatches, from
source color imagery. Only the 2D distributions of the chrominance channels are
transferred, and the original luminance value is retained. A special segmentation
technique is proposed in order to increase the performance of the algorithm. Examples
on real images show high quality colorization, neither requiring low-resolution color
imagery nor any user intervention. Appropriate prototype selection enables modeling
of seasonal changes of vegetation.




                                          19
Chapter 2                                                                     Previous works




                                 Figure 2.5 : Welsh swatches




     Grayscale Image     Masks      Objects   Examples    Colorized Objects   Colorized Image
                       Figure 2.6 Matting coloring algorithm sequence




                                              20
Chapter 2                                                               Previous works




Other attempts [11, 12] depend on the selection of seed points from the original colored
image as reference points for coloring the corresponding gray image. T. Horiuchi [11]
assumed a local Markov property on the images; by minimizing the total of RGB
pixel-wise differences, the problem was treated as a combinatorial optimization
problem and solved using probabilistic relaxation. The algorithm of H. Noda et al. [12]
was based on the idea that the colorization problem could be formulated as the
maximum a posteriori (MAP) estimation of a color image given a monochrome image,
where the color image is modeled by a Gaussian Markov random field (MRF) model.
The global MAP estimation problem for a whole image was decomposed into local
MAP estimation problems for each pixel, and the local MAP estimation was reduced
to a simple quadratic programming problem with constraints. Their results are shown
in figure 2.7.



Since this trend depends completely on the user's selection of the source image, some
authors have proposed techniques to alleviate this constraint. L. Vieira et al. [1,13]
proposed what could be called a fully automated coloring system (figure 2.8), by using
a database of colored images as a source of implicit prior knowledge about color
statistics in natural images. They used content-based image retrieval methods to find
suitable source images in an image database. To assess the merit of their methodology,
they performed a survey in which volunteers were asked to rate the plausibility of the
colorings generated automatically for grayscale images. In most cases, automatically
colored images were rated either as totally plausible or as mostly plausible. They
studied four techniques, "luminance histogram, luminance subsampling, luminance +
gradient histogram, and luminance + gradient subsampling", to construct the signature
vector used to automatically select the source image for color transfer.




                                          21
Chapter 2                                                                             Previous works




                (a)                                   (b)                         (c)

Figure 2.7 (a) Original color images; (b) Estimated color images using evenly spaced 5×5 references;
                     and (c) Estimated ones using evenly spaced 20 × 20 references.


            [Diagram: a 128-bit signature vector of the gray image is compared for similarity against a
            database index to select a source image, whose chrominance is transferred to the gray image.]

                              Figure 2.8: Fully automatic coloring system




                                                      22
Chapter 2                                                                     Previous works




2.3 User selection coloring

In user selection coloring, the user marks some sample points in the gray image and
selects a color for each. This step is done using a brush-like tool. The computer is then
responsible for spreading the color over the bounded region containing the colored
samples. Different techniques have been proposed for determining the similar
neighbors or the bounded region. A. Levin et al. [14] and V. Konushin et al. [15] are
examples of this type. They rely on the assumption that neighboring pixels in
space-time with similar intensities should have similar colors.

- A. Levin et al. [14] proposed a technique based on a unified framework applicable
to both still images and image sequences. The user indicates how each region should
be colored by scribbling the desired color in the interior of the region (figure 2.9),
instead of tracing out its precise boundary. Using these user supplied constraints, their
technique automatically propagates colors to the remaining pixels of the image
sequence. They used the YUV color space, which is common in video processing,
where Y is the monochromatic luminance channel, which they refer to simply as
intensity, while U and V are the chrominance channels encoding the color. The
algorithm is given as input an intensity volume Y(x; y; t) and outputs two color
volumes U(x; y; t) and V(x; y; t). The target pixel pt and source pixel ps should have
similar colors if their intensities are similar, so the goal is to minimize the difference
between the color U(pt) at pixel pt and the weighted average of the colors at
neighboring pixels:

                        fc(U) = Σpt ( U(pt) − Σps∈np(pt) wts · U(ps) )²                      (2.5)

where fc is the cost function and wts is a weighting function that sums to one; it is large
when Y(pt) is similar to Y(ps), and small when the two intensities are different. The
simplest weighting function, commonly used by image segmentation algorithms, is
based on the squared difference between the two intensities:



                                               23
Chapter 2                                                                                   Previous works




                              wts ∝ exp( −( Y(pt) − Y(ps) )² / 2σt² )                        (2.6)

A second weighting function is based on the normalized correlation between the two
intensities:

                        wts ∝ 1 + (1 / σt²) · ( Y(pt) − µt ) · ( Y(ps) − µs )                (2.7)

where µt and σt are the mean and variance of the intensities in a window around pt.
The correlation affinity assumes that the color at a pixel U(pt) is a linear function of
the intensity Y(pt), U(pt) = ai Y(pt) + bi, and that the linear coefficients ai, bi are the
same for all pixels in a small neighborhood around pt. This means that when the
intensity is constant the color should be constant, and when the intensity is an edge the
color should also be an edge (although the values on the two sides of the edge can be
any two numbers). While this model adds a pair of variables per image window to the
system, a simple elimination of the ai, bi variables yields an equation equivalent to
equation (2.5) with a correlation based affinity function. The notation pt ∈ N(ps)
denotes the fact that pt and ps are neighboring pixels.

In a single frame, two pixels are defined as neighbors if their image locations are
nearby. Between two successive frames, two pixels are defined as neighbors if their
image locations, after accounting for motion, are nearby. More formally, let vx(x; y),
vy(x; y) denote the optical flow calculated at time t. Then the pixel (x0; y0; t) is a
neighbor of pixel (x1; y1; t + 1) if:

                      ‖ ( x0 + vx(x0), y0 + vy(y0) ) − ( x1, y1 ) ‖ < T                      (2.8)


The flow field vx(x0), vy(y0) is calculated using a standard motion estimation algorithm.
Note that the optical flow is only used to define the neighborhood of each pixel, not
to propagate colors through time. Their algorithm is closely related to algorithms
proposed for other tasks in image processing. In image segmentation algorithms based
                                                       24
Chapter 2                                                                   Previous works




on normalized cuts, one attempts to find the second smallest eigenvector of the matrix
D − W, where W is an (n pixels × n pixels) matrix whose elements are the pairwise
affinities between pixels (i.e., the (pt, ps) entry of the matrix is wts) and D is a diagonal
matrix whose diagonal elements are the sums of the affinities. The second smallest
eigenvector of any symmetric matrix A is a unit norm vector u that minimizes uTAu
and is orthogonal to the first eigenvector. By direct inspection, the quadratic form
minimized by normalized cuts is exactly the cost function fc, that is, uT(D − W)u = fc(u).
Thus, the algorithm minimizes the same cost function but under different constraints.
In image denoising algorithms based on the anisotropic diffusion of Perona and Malik
1989 and Tang et al. 2001, one often minimizes a function similar to equation (2.5),
but the function is applied to the image intensity as well.

- V. Konushin et al. [15] present a colorization method based on the GrowCut image
segmentation algorithm. In their approach the user just marks some pixels with the
desired colors, and the algorithm propagates these colors to the remainder of the image
(figure 2.10). After the initial colorization is computed, the user can interactively
adjust and refine the colors of the image. The main advantage of their method over
existing ones is that modifying and refining the image colors does not lead to a full
recomputation of the image colorization and is performed very fast, thanks to the
evolutionary nature of the coupled map lattices (CML).

- S. Premože et al. [16] proposed another approach for the automatic coloring of
panchromatic aerial orthoimagery (figure 2.11). The method is able to remove shading
and shadowing effects in the original image, so that shading and shadowing
appropriate to variable times of day and year can be added. The method they present is
based on pattern recognition principles.




                                            25
Chapter 2                                                                          Previous works




                                Figure 2.9 : A.Levin system results




 Figure 2.10 V. Konushin coloring results (a) Original black-and-white photo with scribbled colors.
                                    (b) Result of colorization




   Figure 2.11: S. Premože aerial orthoimagery coloring: gray image, classified image and colored
                                               image




                                                26
Chapter 2                                                                   Previous works




It classifies regions in the original panchromatic orthoimagery into classes that are
subsequently colored with a user selected color palette. The method requires very little
user interaction and is robust. The user only needs to select a few training points for
each class, which are later used in the pattern recognition and classification step.

User selection coloring gives high quality colorization, but it is rather time-consuming
and, more importantly, it requires the colorization to be fully recomputed after even
the slightest change of the initially marked pixels.


2.4 Video coloring trials

Finally, grayscale movies can be colored by coloring the first frame using any of the
previous techniques and then using motion detection algorithms to continue colorizing
the remaining frames.

Many attempts at colorizing black and white movies can be found in the literature.
D. Sýkora et al. [17] proposed a coloring system for black and white cartoon movie
colorization. They proposed a color-by-example technique which combines image
segmentation, patch-based sampling and probabilistic reasoning. This method is able
to automate colorization when new color information is applied to an already designed
black-and-white cartoon (figure 2.12). Their technique is especially suitable for
cartoons digitized from classical celluloid films, which were originally produced
by a paper or cel based method. In this case, the background is usually a static image
and only the dynamic foreground needs to be colored frame by frame. They also
assume that objects in the foreground layer consist of several well visible outlines
which emphasize the shape of homogeneous regions.

E. Ritwik et al. [18] proposed a colorization system designed for video compression.
Compression is achieved by first discarding the chrominance information for all but
selected reference frames and then using motion prediction and DCT based




                                            27
Chapter 2                                                               Previous works




quantization techniques. During decoding, luminance-only frames are colored using
chrominance information from the reference frames via the proposed color transfer
techniques.

C. Wang et al. [19] presented a novel image sequence color transfer algorithm (ISCT).
It is able to render an image sequence with color characteristics borrowed from three
user-given target images. The input of this algorithm consists of a single input image
(I1) and three target images (T1, T2, T3). The output of the algorithm is an image
sequence {Si} of N images that embeds color mood variations. A user selects the
necessary parameters and a color variation curve CVC (linear or parabolic) through a
user friendly interface the authors developed. The algorithm completes the task in
three steps. First, it executes a color transfer algorithm using the input image and the
three target images, producing three basis images (B1, B2, B3); their color features are
thus borrowed from the three target images (T1, T2, T3). Given the user-defined CVCs
and the three basis images generated in the first step, the ISCT algorithm then
interpolates the mean and variance values for the N images. Finally, referring to the
input image and the values interpolated in the second step, the algorithm automatically
generates the image sequence. This is achieved in several seconds by applying the
color transfer algorithm individually for every interpolated value. In addition, they
developed a user interface which makes it possible to view the rendered image
sequence. Results are shown in figure 2.13.




                                          28
Chapter 2                                                                         Previous works




   Figure 2.12: D. Sýkora coloring. Color has been applied to the gray-scale image in the middle
                  without user intervention, using the left color image as an example.




     Figure 2.13: The animation results for the canyon model (basis images B1, B2, B3); the color
                                      variation changes gradually.




                                                29
Chapter 3                                                                        Texture




                                CHAPTER THREE

                                    TEXTURE
       In this chapter we present an introduction to texture analysis problems, starting
with a simple definition of image texture as presented in the literature. Many
techniques for describing image texture are presented, and only the techniques used in
our proposed system are described in detail. This chapter also presents brief
information about the different texture database benchmarks that are used in texture
analysis research.



3.1 What is texture?


There is no exact definition of image texture, but we can describe texture as the
external skin that covers the image or image parts. Usually a texture has many
interleaved gray level intensities or different color arrangements that give the feel of
its touch (e.g. soft or coarse). We recognize texture when we see it, but it is very
difficult to define. This difficulty is demonstrated by the number of different texture
definitions attempted by vision researchers; a catalogue of texture definitions in the
computer vision literature has been compiled in [20].


In any case, texture can be considered an arrangement of primitives such as dots, lines,
curves and sometimes shapes. The smallest unit in the texture is called a texel (texture
element). These texels produce the texture by repetition, rotation, flipping or other
transformations. Figure 3.1 presents some of the Brodatz textures [21]. Texture is the
visual cue due to the repetition of image patterns, which may be perceived as being
directional or non-directional, smooth or rough, coarse or fine, regular or irregular,
etc.; figure 3.2 illustrates this.



                                           30
Chapter 3                                                                        Texture




                      Figure 3.1: Example samples from the Brodatz texture database




      (a) directional vs. non-directional                       (b) smooth vs. rough




              (c) coarse vs. fine                           (d) regular vs. irregular
                       Figure 3.2: Comparison between different texture types




                                                 31
Chapter 3                                                                        Texture




3.2 Texture analysis


There are four main research fields in texture analysis [22]:


1. Texture segmentation: partitioning of an image into regions which have
   homogeneous properties with respect to texture.


2. Texture synthesis: to build a model of image texture that can be used for
   generating the texture.


3. Texture classification: to assign an unknown sample image to one of a set of
   known texture classes.


4. Shape from texture: a 2D image is considered to be a projection of a 3D scene and
   apparent texture distortions in the 2D image are used to estimate surface
   orientations in the 3D scene.


Texture analysis is important in many applications of computer image analysis for
classification or segmentation of images based on local spatial variations of intensity
or color. Important applications include industrial and biomedical surface inspection,
for example for defects and disease, ground classification and segmentation of satellite
or aerial imagery, segmentation of textured regions in document analysis, and content-
based access to image databases.


However, despite many potential areas of application for texture analysis in industry,
there are only a limited number of successful examples. A major problem is that
textures in the real world are often not uniform, due to changes in orientation, scale or







other aspects of visual appearance. In addition, the computational complexity of many
of the proposed texture measures is very high. When choosing a texture analysis
algorithm, a number of aspects should be considered [23]:


   1. Illumination (gray scale) invariance; how sensitive the algorithm is to
      changes in gray scale. This is particularly important for example in industrial
      machine vision, where lighting conditions may be unstable.


   2. Spatial scale invariance; can the algorithm cope if the spatial scale of
      unknown samples to be classified is different from that of the training data?


   3. Rotation invariance; does the algorithm cope if the rotation of the images
      changes with respect to the viewpoint?


   4. Projection invariance (3D texture analysis); in addition to invariance with
      respect to spatial scale and rotation, the algorithm may have to cope with
      changes in tilt and slant angles.


   5. Robustness with respect to noise; how well the algorithm tolerates noise in the
      input images.


   6. Robustness with respect to parameters; the algorithm may have several built-in
      parameters; is it difficult to find the right values for them, and does a given set
      of values apply to a large range of textures?


   7. Computational complexity; many algorithms are so computationally intensive
      that they cannot be considered for applications with high throughput
      requirements, e.g. real-time visual inspection and retrieval from large databases.








   8. Generativity; does the algorithm facilitate texture synthesis, i.e. regenerating
       the texture that was captured using the algorithm.


   9. Window/sample size; how large a sample the algorithm requires to be able to
        produce a useful description of the texture content.


3.3 Texture features


Successful texture analysis requires an efficient description of image texture; this
description takes the form of texture features. Feature extraction is a phase that
precedes any texture analysis system and is concerned with quantifying texture
characteristics in terms of a collection of descriptors or quantitative feature
measurements, often referred to as a feature vector. The choice of appropriate
descriptive parameters radically influences the reliability and effectiveness of the
subsequent stages of the system.


A wide variety of techniques for describing image texture (texture features) have been
proposed. Tuceryan and Jain (1993) divided texture analysis methods into four
categories: statistical, structural/geometrical, model-based, and spectral/signal
processing methods.



3.3.1 Statistical methods


Statistical methods analyze the spatial distribution of grey values, by computing local
features at each point in the image, and deriving a set of statistics from the
distributions of the local features. With this method, the textures are described by
statistical measures. Depending on the number of pixels defining the local feature, the








statistical methods can be further classified into first-order (one pixel), second-order
(two pixels) and higher-order (three or more pixels) statistics.


   -   Co-occurrence Matrix


One commonly applied and referenced method is the co-occurrence method [24],
introduced by Haralick (1973). In this method, the relative frequencies of grey-level
pairs of pixels separated by a distance dco in the direction θ, i.e. by a relative
displacement vector (dco, θ), are computed and stored in a matrix (Figure 3.3),
referred to as the gray-level co-occurrence matrix (GLCM) COOC. This matrix is used to
extract second-order statistical texture features. Haralick suggested 14 features
describing the two-dimensional probability density function coP. Four of the most
commonly used are ASM (Angular Second Moment), Con (Contrast), Cor
(Correlation) and Ent (Entropy):


                    ASM = \sum_{i=0}^{N_{CO}-1} \sum_{j=0}^{N_{CO}-1} coP_{ij}^{2}                                      (3.1)

                    Con = \sum_{i=0}^{N_{CO}-1} \sum_{j=0}^{N_{CO}-1} (i - j)^{2}\, coP_{ij}                            (3.2)

                    Cor = \frac{1}{\sigma_x \sigma_y} \sum_{i=0}^{N_{CO}-1} \sum_{j=0}^{N_{CO}-1} \left[ (ij)\, coP_{ij} - \mu_x \mu_y \right]        (3.3)

                    Ent = -\sum_{i=0}^{N_{CO}-1} \sum_{j=0}^{N_{CO}-1} coP_{ij} \log coP_{ij}                           (3.4)



where µx, µy, σx, and σy are the means and standard deviations of the corresponding
marginal distributions, and NCO is the number of grey levels.
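
To make these definitions concrete, the following Python sketch (an illustration only, not
the implementation used in this thesis) builds a co-occurrence matrix for a given pixel
displacement and derives the four features of equations (3.1)-(3.4); the image array
`img` (assumed 8-bit) and the quantization to `n_levels` grey levels are illustrative
assumptions.

    import numpy as np

    def glcm_features(img, dx=1, dy=0, n_levels=16):
        """Grey-level co-occurrence matrix for displacement (dx, dy) and four Haralick features."""
        # Quantize the grey levels (8-bit assumed) to n_levels to keep the matrix small.
        q = (img.astype(np.float64) / 256.0 * n_levels).astype(int).clip(0, n_levels - 1)

        cooc = np.zeros((n_levels, n_levels), dtype=np.float64)
        rows, cols = q.shape
        for y in range(rows - dy):
            for x in range(cols - dx):
                cooc[q[y, x], q[y + dy, x + dx]] += 1      # count each grey-level pair
        coP = cooc / cooc.sum()                            # normalized joint probabilities

        i, j = np.indices(coP.shape)
        mu_x, mu_y = (i * coP).sum(), (j * coP).sum()      # marginal means
        sd_x = np.sqrt(((i - mu_x) ** 2 * coP).sum())
        sd_y = np.sqrt(((j - mu_y) ** 2 * coP).sum())

        asm = (coP ** 2).sum()                                       # eq. (3.1)
        con = ((i - j) ** 2 * coP).sum()                             # eq. (3.2)
        cor = ((i * j * coP).sum() - mu_x * mu_y) / (sd_x * sd_y)    # eq. (3.3), standard form
        ent = -(coP[coP > 0] * np.log(coP[coP > 0])).sum()           # eq. (3.4)
        return asm, con, cor, ent

Calling, for example, glcm_features(patch, dx=1, dy=0) gives the features for the
horizontal direction (θ = 0°); other (dx, dy) pairs correspond to other displacement
vectors (dco, θ).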


Other statistical approaches include the autocorrelation function, which has been used for
analyzing the regularity and coarseness of texture (Kaizer 1955), and gray level run







lengths (Galloway 1975), but their performance has been found to be relatively poor
(Conners & Harlow 1980).


                         Figure 3.3 Co-occurrence matrix calculations: pixel pairs of the image G
                         separated by the relative displacement (dco, θ) (the example uses
                         mco = 1, nco = 3) are accumulated into the matrix COOC.



   -    Tamura


Tamura et al. took the approach of devising texture features that correspond to human
visual perception [24]. They defined six textural features (coarseness, contrast,
directionality, line-likeness, regularity and roughness) and compared them with
psychological measurements for human subjects. The first three attained very
successful results and are used in our evaluation as joint values.


• Coarseness has a direct relationship to scale and repetition rates and was seen by
Tamura et al as the most fundamental texture feature. An image will contain textures
at several scales; coarseness aims to identify the largest size at which a texture exists,








even where a smaller micro-texture exists. Computationally, one first takes averages at
every point over neighborhoods whose linear sizes are powers of 2. The average
over the neighborhood of size 2^b × 2^b at the point (x, y) is


                  Av_b(x, y) = \sum_{i=x-2^{b-1}}^{x+2^{b-1}-1} \; \sum_{j=y-2^{b-1}}^{y+2^{b-1}-1} I(i, j) \,/\, 2^{2b}                  (3.5)




Then at each point one takes differences between pairs of averages corresponding to
non-overlapping neighborhoods on opposite sides of the point in both horizontal and
vertical orientations. In the horizontal case this is



                      E_{b,h}(x, y) = \left| Av_b(x + 2^{b-1}, y) - Av_b(x - 2^{b-1}, y) \right|            (3.6)



At each point, one then picks the size b_opt that gives the highest output value, i.e. the
b that maximizes E in either direction. The coarseness measure Fcrs is then the average
of S_opt(x, y) = 2^{b_opt} over the picture.
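
The coarseness computation can be sketched in a few lines of Python; this is an
illustration only (not the thesis implementation), and the use of
scipy.ndimage.uniform_filter as a stand-in for the neighborhood averaging of eq. (3.5),
the image array `img` and the maximum scale `max_b` are all assumptions.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def tamura_coarseness(img, max_b=5):
        """Tamura coarseness: average over the image of the best window size 2**b."""
        img = img.astype(np.float64)
        e_stack = []
        for b in range(1, max_b + 1):
            avg = uniform_filter(img, size=2 ** b)   # local mean over a 2^b x 2^b window, eq. (3.5)
            half = 2 ** (b - 1)
            # Differences of neighborhoods on opposite sides of each pixel, eq. (3.6).
            e_h = np.abs(np.roll(avg, -half, axis=1) - np.roll(avg, half, axis=1))
            e_v = np.abs(np.roll(avg, -half, axis=0) - np.roll(avg, half, axis=0))
            e_stack.append(np.maximum(e_h, e_v))
        best_b = np.argmax(np.stack(e_stack), axis=0) + 1   # b that maximizes E at each pixel
        return (2.0 ** best_b).mean()                       # mean of S_opt(x, y) = 2^b_opt

Note that np.roll wraps around at the image borders; a careful implementation would
exclude a border of width 2^max_b from the final average.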


• Contrast aims to capture the dynamic range of grey levels in an image, together with
the polarization of the distribution of black and white. The first is measured using the
standard deviation of grey levels and the second using the kurtosis α4. The contrast
measure is therefore defined as



                                F_{con} = \sigma / (\alpha_4)^{n}, \qquad \text{where } \alpha_4 = \mu_4 / \sigma^{4}                (3.7)



Here µ4 is the fourth moment about the mean and σ² is the variance. Experimentally,
Tamura found n = 1/4 to give the closest agreement to human measurements; this is
the value we used in our experiments.
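
A direct transcription of equation (3.7) might look as follows (a sketch only; the
grey-level image `img` is assumed to be a numpy array).

    import numpy as np

    def tamura_contrast(img, n=0.25):
        """Tamura contrast, eq. (3.7): sigma / (kurtosis ** n) with n = 1/4."""
        g = img.astype(np.float64).ravel()
        mu = g.mean()
        sigma2 = ((g - mu) ** 2).mean()      # variance
        mu4 = ((g - mu) ** 4).mean()         # fourth moment about the mean
        alpha4 = mu4 / (sigma2 ** 2)         # kurtosis, alpha4 = mu4 / sigma^4
        return np.sqrt(sigma2) / (alpha4 ** n)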







• Directionality is a global property over a region. The feature described does not aim
to differentiate between different orientations or patterns, but measures the total
degree of directionality. Two simple masks are used to detect edges in the image. At
each pixel the angle and magnitude are calculated. A histogram, Hd, of edge
probabilities is then built up by counting all points with magnitude greater than a
threshold and quantizing by the edge angle. The histogram will reflect the degree of
directionality. To extract a measure from Hd, the sharpness of its peaks is computed
from their second moments. In other words:


1.   The edge strength ed(x,y) and the directional angle ad(x,y) are computed from
     approximate pixel-wise derivatives obtained with the following Sobel-like masks in a
     3×3 moving window:


                -1 0 1                                1  1  1
                -1 0 1      ∆x(x, y)                  0  0  0      ∆y(x, y)
                -1 0 1                               -1 -1 -1
                                                                                              (3.8)
                e_d(x, y) = 0.5\,\big( |\Delta_x(x, y)| + |\Delta_y(x, y)| \big)
                a_d(x, y) = \tan^{-1}\!\big( \Delta_y(x, y) / \Delta_x(x, y) \big)



2.   A histogram Hdir(ad) of the quantized directional values ad is built by counting the
     edge pixels (x, y) whose edge strength ed(x, y) exceeds a predefined threshold,
     binned by their directional angle ad(x, y):


     – the histogram Hdir(ad) is relatively uniform for images without strong orientation;

     – the histogram Hdir(ad) exhibits peaks for highly directional images.

3.   The degree of directionality relates to the sharpness of the peaks:








                      F_{dir} = 1 - r \cdot n_p \sum_{p=1}^{n_p} \sum_{a_d \in w_p} (a_d - a_{dp})^{2}\, H_{dir}(a_d)          (3.9)


               n_p  - the number of peaks

               a_dp - the position of the p-th peak

               w_p  - the range of angles around the p-th peak

               r    - a normalizing factor related to the quantization levels of a_d

               a_d  - the quantized directional angle (modulo 180°)
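
Steps 1-2 above can be illustrated with the short Python sketch below (not the thesis
implementation; scipy.ndimage.convolve, the number of bins and the edge-strength
threshold are assumptions chosen for illustration). Fdir would then be obtained from the
returned histogram by locating its peaks and applying eq. (3.9).

    import numpy as np
    from scipy.ndimage import convolve

    def direction_histogram(img, n_bins=16, threshold=12.0):
        """Histogram of quantized edge directions for strong edge pixels (steps 1-2)."""
        img = img.astype(np.float64)
        kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)
        ky = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], dtype=np.float64)
        dx, dy = convolve(img, kx), convolve(img, ky)

        strength = 0.5 * (np.abs(dx) + np.abs(dy))     # e_d(x, y), eq. (3.8)
        angle = np.mod(np.arctan2(dy, dx), np.pi)      # a_d(x, y), folded modulo 180 degrees

        strong = strength > threshold                  # keep only clear edge pixels
        hist, _ = np.histogram(angle[strong], bins=n_bins, range=(0.0, np.pi))
        return hist / max(hist.sum(), 1)               # normalized H_dir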

The other three Tamura features are highly correlated with the first three:


• Line-likeness Flin: the average coincidence of the coded directional angles in pairs
of pixels separated by a distance d along the edge direction at every pixel.


• Regularity Freg = 1 − r(σcrs + σcon + σdir + σlin), where r is a normalizing factor and each
σ is the standard deviation of the corresponding feature F computed in each sub-image
the texture is partitioned into.


• Roughness Frgh = Fcrs + Fcon.



3.3.2 Geometrical methods


The geometrical models of texture are based on the view that textures are made up of
primitives with geometrical properties. In these models, it is common either to
compute statistical features, or to identify the placement rules that describe the texture.
These structural methods model textures as consisting of primitives that appear in







certain patterns. A texture is then defined by these primitives and some displacement
rules [Rosenfeld70]. In general, it is difficult to extract these elements from a real
texture. Structural methods may also be used for texture synthesis.


The primitives may be extracted by edge detection with a Laplacian-of-Gaussian or
difference-of-Gaussian filter (Marr 1982, Voorhees & Poggio 1987, Tuceryan & Jain
1990), by adaptive region extraction (Tomita & Tsuji 1990), or by mathematical
morphology (Matheron 1967, Serra 1982). Once the primitives have been identified,
the analysis is completed either by computing statistics of the primitives (e.g.
intensity, area, elongation, and orientation) or by deciphering the placement rule of the
elements (Zucker 1976, Fu 1982). The structure and organization of the primitives can
also be presented using Voronoi tessellations (Ahuja 1982, Tuceryan & Jain 1990).
Image edges are an often used primitive element. Davis et al. (1979) and Davis et al.
(1981) defined generalized co-occurrence matrices, which describe second-order
statistics of edges. Dyer et al. (1980) extended the approach by including the gray
levels of the pixels near the edges into the analysis. An alternative to generalized
co-occurrence matrices is to look for pairs of edge pixels that fulfill certain
conditions regarding edge magnitude and direction. Hong et al. (1980) assumed that
edge pixels form a closed contour, and primitives were extracted by searching for
edge pixels with opposite directions (i.e. they are assumed to be on the opposite sides
of the primitive), followed by a region-growing operation. Properties of the
primitives (e.g. area and average intensity) were used as texture features. Pietikäinen
and Rosenfeld (1982) did not require edges to form closed contours, but statistics
computed for pairs of edge pixels, which had opposite directions and were within a
predetermined distance of each other, were used as texture features (e.g. distance
between the edge pixels, average gray level on the line between the edge pixels).








3.3.3 Model-based methods


Model-based texture methods try to capture the process that generated the texture.
With model-based features, some image model is assumed, its parameters are estimated
for a subimage, and the model parameters, or attributes derived from them, are used
as features. There are currently three major model-based approaches: Markov Random
Fields (MRF) (Dubes and Jain 1989), fractals (Pentland 1984), and the multi-
resolution autoregressive (AR) features introduced by Mao and Jain (1992). For
detailed discussions of image models see Kashyap (1986) and Chellappa et al. (1993).


Features extracted from Markov random fields are both descriptive and generative.
Thus they have been found to be very useful for texture classification, image
segmentation, and other applications by Khotanzad and Kashyap (1987). Random
field models analyze spatial variations in two dimensions. Global random field models
treat the entire image as a realization of a random field, whereas local random field
models assume relationships of intensities in small neighborhoods. A widely used
class of local random field models is the Markov random field model. The Markov
random field model for texture assumes that the texture field is stochastic and
stationary and satisfies a conditional independence assumption.


The auto-regression model provides a way to use linear estimates of a pixel’s grey
level, given the grey levels in the neighborhood containing it. For coarse textures, the
coefficients will all be similar. For fine textures, the coefficients will vary widely.
Various types of models can be obtained with different neighborhood systems; one-
dimensional time-series models such as autoregressive (AR), moving-average (MA), and
autoregressive-moving-average (ARMA) models capture statistical relationships
(McCormick & Jayaramamurthy 1974, Box 1976). A general class of 2D
autoregressive models has been applied to describing textures and their subsequent








recognition and classification (Sarkar 1997). The power of the auto-regressive
linear estimator approach is that it is easy to use the estimator in a mode that
synthesizes texture from any initially given linear estimator. Its weakness is that the
textures it can characterize are likely to consist mostly of micro textures.


Mandelbrot (1983) proposed describing images with fractals, a set of self-similar
functions characterized by so-called fractal dimension, which is correlated to the
perceived roughness of image texture. In contrast to autoregressive and Markov
models, fractals have high power in low frequencies, which enables them to model
processes with long periodicities. An interesting property of this model is that fractal
dimension is scale invariant. Several methods have been proposed for estimating the
fractal dimension of an image, e.g. Keller (1989) and DuBuf (1990). Fractal models of
surfaces have been employed in image analysis where the objects are rough, irregular,
and multi-scale, such as clouds, trees and natural textures (Huang 1990, Peli 1990).



3.3.4 Signal processing methods


Signal processing methods perform frequency analysis of the textures. This is
achieved using the filters commonly employed in this approach, such as spatial-domain
filters, Fourier-domain filtering, Gabor filters, and wavelet models. In our system we
have used both Laws' (1980) spatial-domain filters and discrete wavelet transform
coefficients.


       - Laws’ texture energy

Some well-known signal processing methods are based on Laws' filters (1980). Laws
observed that certain gradient operators, such as the Laplacian and Sobel operators,
accentuate the underlying microstructure of texture within an image. This was the







basis for a feature extraction scheme based on a series of pixel impulse response
arrays obtained from combinations of the following 1D vectors.




                          Level     L5 =    [  1  4  6  4  1 ]

                          Edge      E5 =    [ -1 -2  0  2  1 ]

                          Spot      S5 =    [ -1  0  2  0 -1 ]

                          Wave      W5 =    [ -1  2  0 -2  1 ]

                          Ripple    R5 =    [  1 -4  6 -4  1 ]

In our system we use the method presented in [25] to compute the Laws energy. Each
of these vectors is convolved with itself and with each of the others rotated by 90° to
produce 25 two-dimensional masks. An example would be convolving L5 with S5' to
produce L5S5', which has the form:

                                   -1 -4 -6 -4     -1
                                   0 0 0 0          0
                                   2 8 12 8         2
                                   0 0 0 0          0
                                   -1 -4 -6 -4     -1


We now have 25 different feature extractors. If each is convolved with the grey-scale
image being worked on, then 25 new grayscale images will be formed, each with
different texture energy information based on the convolution kernel used. These new
images will form the basis of the textural analysis. All of the kernels generated will
measure the texture energy in a particular direction; if directionality is not important
to the particular process then they can be combined to reduce the number of
dimensions being processed. For example L5E5' detects vertical edges and the reverse
(E5L5') detects horizontal edges. If we were to simply add them together an 'edge








content' measure would be found. However, for texture feature extraction and analysis
it is important that directionality is taken into account, so the full 25 kernels will be
used.

Laws then suggests the use of a windowing operation that converts each pixel in each
image to its respective Texture Energy Measure (TEM). This operation is simply the
use of a large window (15x15 for example), placed over the data with its centre at a
particular pixel. This pixel is then replaced with the sum of the absolute values of the
neighborhood pixels. The above technique is called ‘absolute value windowing’. A
slight modification also suggested by Laws involves squaring the values in the
windows before summing, then using the square root of the answer, rather like
calculating a norm. All of the kernels used to produce the new grayscale images were
zero-mean (i.e. the sum of the elements is zero) with the exception of the L5L5 kernel.
The produced L5L5 image can therefore be used as a normalization image, where all
the other Texture Energy Measure images are divided pixel-by-pixel with the L5L5
image and thus are normalized for contrast. The L5L5 image can then be discarded
unless a contrast feature is required. This leaves us with 24 different values for every
pixel of the original image which can then be used for clustering.
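
The whole procedure can be sketched in Python as follows (an illustration under the
assumptions stated in the comments, not the exact code of [25]): the 25 masks are
generated from the 1D vectors, the image is filtered with each, absolute-value windowing
produces the texture energy measures, and the L5L5 image is used for contrast
normalization and then discarded.

    import numpy as np
    from scipy.ndimage import convolve, uniform_filter

    # The five 1D Laws vectors.
    VECTORS = {
        "L5": np.array([1, 4, 6, 4, 1], dtype=np.float64),
        "E5": np.array([-1, -2, 0, 2, 1], dtype=np.float64),
        "S5": np.array([-1, 0, 2, 0, -1], dtype=np.float64),
        "W5": np.array([-1, 2, 0, -2, 1], dtype=np.float64),
        "R5": np.array([1, -4, 6, -4, 1], dtype=np.float64),
    }

    def laws_energy(img, window=15):
        """Return a dict of 24 contrast-normalized Laws texture energy images."""
        img = img.astype(np.float64)
        energies = {}
        for name_a, a in VECTORS.items():
            for name_b, b in VECTORS.items():
                mask = np.outer(a, b)                     # 5x5 separable mask (outer product)
                filtered = convolve(img, mask)
                # Absolute-value windowing: sum of |values| over a window x window neighborhood.
                tem = uniform_filter(np.abs(filtered), size=window) * window * window
                energies[name_a + name_b] = tem
        l5l5 = energies.pop("L5L5")                       # the only non-zero-mean kernel
        return {k: v / (l5l5 + 1e-12) for k, v in energies.items()}

The 15x15 window matches the example given above; it is a tunable parameter rather
than a fixed part of the method.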


Another class of spatial filters is moments, which correspond to filtering the image
with a set of spatial masks. The resulting images are then used as texture features.
Tuceryan (1992) used moment-based features successfully in texture segmentation.


        - Wavelet analysis


It is well known from Fourier theory that a signal can be expressed as the sum of a,
possibly infinite, series of sines and cosines. This sum is also referred to as a Fourier
expansion. The big disadvantage of a Fourier expansion is that it has only frequency
resolution and no time resolution. This means that although we might be able to







determine all the frequencies present in a signal, we do not know when they are
present. The wavelet transform or wavelet analysis is probably the most recent
solution to overcome the shortcomings of the Fourier transform. In multi-resolution
analysis with the so-called wavelet transform, a window is shifted along the signal and
the spectrum is calculated for every position. Then this process is repeated many times
with a slightly shorter (or longer) window for every new cycle. In the end the result
will be a collection of time-frequency representations of the signal, all with different
resolutions. Compare wavelets with sine waves, which are the basis of Fourier
analysis. Sinusoids do not have limited duration — they extend from minus to plus
infinity. And where sinusoids are smooth and predictable, wavelets tend to be
irregular and asymmetric.


Fourier analysis consists of breaking up a signal into sine waves of various
frequencies (figure 3.4a). Similarly, wavelet analysis is the breaking up of a signal
into shifted and scaled versions of the original (or mother) wavelet (figure 3.4b). It
takes two arguments: time and scale. Just looking at pictures of wavelets and sine
waves, you can see intuitively that signals with sharp changes might be better
analyzed with an irregular wavelet than with a smooth sinusoid. It also makes sense
that local features can be described better with wavelets that have local extent.




             Figure 3.4 Comparison between (a) a sine wave and (b) a wavelet (db10)



Wavelet analysis can be described by the continuous wavelet transform (CWT). More
formally, it is written as:








                                    \gamma(s, \tau) = \int f(t)\, \psi^{*}_{s,\tau}(t)\, dt                              (3.10)

where * denotes complex conjugation. This equation shows how a function f(t) is
decomposed into a set of basis functions ψs,τ(t), called the wavelets. The variables s
and τ, scale and translation, are the new dimensions after the wavelet transform. For
completeness' sake, (3.11) gives the inverse wavelet transform.


                                    f(t) = \int\!\!\int \gamma(s, \tau)\, \psi_{s,\tau}(t)\, d\tau\, ds                  (3.11)
The wavelets are generated from a single basic wavelet ψ(t), the so-called mother
wavelet, by scaling and translation:

                                    \psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - \tau}{s}\right)        (3.12)

In (3.12), s is the scale factor, τ is the translation factor and the factor s^{-1/2} provides
energy normalization across the different scales.

Discrete wavelets are not continuously scalable and translatable but can only be scaled
and translated in discrete steps. This is achieved by modifying the wavelet
representation (3.12) to create

                                    \psi_{j,k}(t) = \frac{1}{\sqrt{s_0^{\,j}}}\, \psi\!\left(\frac{t - k\,\tau_0\, s_0^{\,j}}{s_0^{\,j}}\right)        (3.13)


Although it is called a discrete wavelet, it normally is a (piecewise) continuous
function. In (3.13) j (scale) and k (index of the finite or infinite sum) are integers and
s0 > 1 is a fixed dilation step. The translation factor τ0 depends on the dilation step.
The effect of discretizing the wavelet is that the time-scale space is now sampled at
discrete intervals. We usually choose s0 = 2 so that the sampling of the frequency axis
corresponds to dyadic sampling. This is a very natural choice for computers, the
human ear and music for instance. For the translation factor we usually choose τ0 = 1
so that we also have dyadic sampling of the time axis (Figure 3.5).








             Figure 3.5: Time-frequency tiling (time on the horizontal axis, frequency on the vertical axis)
             for (a) sampled data, (b) FFT, and (c) DWT basis functions




Now consider the set of expansion functions composed of integer translations and
binary scaling of the real, square-integrable function φ(x); that is, the set {φj,k(x)}
where

                                                \varphi_{j,k}(x) = 2^{j/2}\, \varphi(2^{j}x - k)                         (3.14)

for all j, k ∈ Z (the set of integers) and φ(x) ∈ L²(R), where R is the set of real numbers
and L²(R) denotes the set of measurable, square-integrable one-dimensional functions.
Here, k determines the position of φj,k(x) along the x-axis, j determines the width of
φj,k(x) (how broad or narrow it is along the x-axis), and 2^{j/2} controls its height or
amplitude. Because the shape of φj,k(x) changes with j, φ(x) is called a scaling function.
By choosing φ(x) wisely, {φj,k(x)} can be made to span L²(R), the set of all measurable,
square-integrable functions.

We can now define the wavelet series expansion of a function f(x) ∈ L²(R) relative to the
wavelet ψ(x) and scaling function φ(x):

                              f(x) = \sum_{k} ac_{j_0}(k)\, \varphi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_{k} dc_{j}(k)\, \psi_{j,k}(x)            (3.15)



where j0 is an arbitrary starting scale and the acj0(k)'s are called approximation or
scaling coefficients; and dcj(k)'s are referred to as the detail or wavelet coefficients.
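
In practice the coefficients of (3.15) are computed with a discrete wavelet transform
rather than by evaluating the sums directly. The short sketch below uses the PyWavelets
package as one possible implementation; the choice of the db10 wavelet, the three
decomposition levels and the toy signal are illustrative assumptions.

    import numpy as np
    import pywt

    # A toy 1D signal: a low-frequency wave with a sharp local change.
    t = np.linspace(0.0, 1.0, 512)
    signal = np.sin(2 * np.pi * 5 * t)
    signal[256:260] += 2.0

    # Three-level DWT: coeffs = [approximation A3, details D3, D2, D1].
    coeffs = pywt.wavedec(signal, "db10", level=3)
    a3, d3, d2, d1 = coeffs

    # The signal can be reconstructed from its coefficients (the transform is invertible).
    reconstructed = pywt.waverec(coeffs, "db10")
    print(np.allclose(signal, reconstructed[:len(signal)]))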








The one-dimensional transformations presented above are easily extended to two-
dimensional functions like images. In two dimensions, a two-dimensional scaling
function, φ(x,y), and three two-dimensional wavelets, ψH(x,y), ψV(x,y), and ψD(x,y),
are required. Each is the product of a one-dimensional scaling function φ and a
corresponding wavelet ψ. Excluding the products that produce one-dimensional
results, like φ(x)φ(x), the remaining four products produce the separable scaling
function
                                                  φ(x,y) = φ(x)φ(y)

and separable, "directionally sensitive" wavelets

                                                ψH(x,y) = ψ(x)φ(y)

                                                ψV(x,y) = φ(x)ψ(y)

                                                ψD(x,y) = ψ(x)ψ(y)

With these definitions, eq. (3.14) generalizes to

             \varphi_{j,m,n}(x, y) = 2^{j/2}\, \varphi(2^{j}x - m,\; 2^{j}y - n),
                                                                                                                  (3.16)
             \psi^{i}_{j,m,n}(x, y) = 2^{j/2}\, \psi^{i}(2^{j}x - m,\; 2^{j}y - n), \qquad i = \{H, V, D\}


Finally, the wavelet transform coefficients are defined as follows:


             W_{\varphi}(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \varphi_{j_0,m,n}(x, y)
                                                                                                                  (3.17)
             W^{i}_{\psi}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^{i}_{j,m,n}(x, y), \qquad i = \{H, V, D\}




One can think of this process as filtering with two filters: a low-pass filter that produces
the scaling (approximation) coefficients, and a high-pass filter that produces the wavelet
(detail) coefficients. The filtering process at its most basic level looks like Figure 3.6.








The decomposition process can be iterated, with successive approximations being
decomposed in turn, so that one signal is broken down into many lower resolution
components. This is called the discrete wavelet transform ‘DWT’ tree (figure 3.7).



                 Figure 3.6: One-level discrete wavelet transform of a 1D signal S: a low-pass filter
                 produces the approximation A and a high-pass filter produces the detail D




                  Figure 3.7: Three-level discrete wavelet decomposition tree: the signal is split into
                  A1 and D1, A1 into A2 and D2, and A2 into A3 and D3








For a 2D wavelet decomposition, the transposes of both the low-pass and the high-pass
filters are applied again to each filtered image, so the decomposition yields four filtered
images of wavelet coefficients. The decomposition filters are called the analysis filters.
Figure 3.8 illustrates the process and Figure 3.9 illustrates the inverse process, which
reconstructs the image from its wavelet coefficients.


The result of the decomposition process is shown in Figure 3.10a. As shown in the
figure, a one-level decomposition results in four filtered images, also called subbands,
each a quarter of the size of the original image due to the downsampling step. In Figure
3.10b, the approximation subband is used as the base image for the next iteration.


Usually the three subbands of detail coefficients are used for texture features. In our
system, we use both the decomposition and the reconstruction wavelet phases to obtain
our texture features, as will be discussed later in Chapter 5. The main advantage of
the wavelet description of an image is that it is an invertible and very detailed
multi-resolution decomposition. For more details about wavelets, readers may refer
to [26].
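
The 2D decomposition and reconstruction described above can be reproduced with
PyWavelets; the sketch below is an illustration only (the Haar wavelet and the use of the
mean absolute value of each detail subband as a feature are our illustrative choices, not
necessarily those of the proposed system).

    import numpy as np
    import pywt

    def wavelet_detail_features(img, wavelet="haar"):
        """One-level 2D DWT: mean absolute energy of the three detail subbands."""
        ca, (ch, cv, cd) = pywt.dwt2(img.astype(np.float64), wavelet)
        # ch, cv, cd are the horizontal, vertical and diagonal detail subbands.
        return [np.mean(np.abs(c)) for c in (ch, cv, cd)]

    def roundtrip(img, wavelet="haar"):
        """Decompose and reconstruct, demonstrating the invertibility of the transform."""
        coeffs = pywt.dwt2(img.astype(np.float64), wavelet)
        return pywt.idwt2(coeffs, wavelet)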

There are many wavelet families. The different families make different trade-offs
between how compactly the basis functions are localized in space and how smooth they
are. Figure 3.11 shows samples of some wavelet families. Many other wavelets exist,
such as the Biorthogonal, Morlet, Mexican Hat, and Meyer wavelets.


Other spatial/spatial-frequency methods include the pseudo-Wigner distribution
introduced by Jacobson and Wechsler (1982). Texture description with these methods
is performed by filtering the image with a bank of filters, each filter having a specific
frequency (and orientation). Texture features are then extracted from the filtered
images. For a detailed discussion on spatial/spatial-frequency methods see Reed and
Wechsler (1990).







                   Figure 3.8: The analysis filter bank for 2D wavelet decomposition: Wϕ(j+1, m, n) is
                   filtered along the columns (n) by hψ(-n) and hφ(-n) and downsampled by 2, then each
                   branch is filtered along the rows (m) by hψ(-m) and hφ(-m) and downsampled again,
                   yielding WψD(j, m, n), WψV(j, m, n), WψH(j, m, n) and Wϕ(j, m, n)



                Figure 3.9: The reconstruction (synthesis) filter bank for 2D wavelet decomposition: the
                subbands WψD, WψV, WψH and Wϕ(j, m, n) are upsampled by 2 and filtered along the rows
                by hψ(m) and hφ(m), summed in pairs, then upsampled and filtered along the columns by
                hψ(n) and hφ(n) and summed to recover Wϕ(j+1, m, n)








            Figure 3.10: 2D wavelet decomposition coefficients: (a) one-level decomposition into four
            subbands, (b) iterated decomposition of the approximation subband




              Figure 3.11: Samples of some wavelet families








3.4 Texture Databases


Several texture benchmark databases are found in the literature, such as Brodatz,
VisTex, MeasTex, CUReT, OUTex, PhoTex, and more. These benchmarks are usually used
for texture analysis research. Most of them use real textures captured with specific
cameras under controlled conditions. The next paragraphs give a brief description of
some of these databases.

Brodatz

The “Brodatz texture database” is derived from the Brodatz album [21]. It has a
relatively large number of classes (112) and a small number of examples per class
(figure 3.12). There are several advantages to using this texture data. First, it is the
de facto standard for texture analysis; as we have seen in our literature survey, several
studies use Brodatz images, so there is a common understanding of how difficult these
images are for classification. However, one problem with this data is that several
different digitized collections exist and a number of authors have been very selective
about which images they used for analysis, which makes comparison difficult. Another
disadvantage is that the album was never designed for image analysis in the first place
and as such does not contain all textures that are relevant for vision research.


VisTex


The VisTex collection (Vision and Modeling Group at the MIT Media Lab) [27] has been
assembled and maintained to assist in the development of more robust computer
vision algorithms and their comparison on a common set of data. The database
contains images of 18 different natural-object categories (figure 3.13).








                    Figure 3.12 Samples from Brodatz database benchmark




                   Figure 3.13: Samples from the VisTex database benchmark (bark01, fabric01, food1,
                   metal1, sand1, brick01, building01, cloud01, flowers01, wood1)








The goal of VisTex is to provide texture images that are representative of real world
conditions. This database is a standard framework for measuring the accuracy of
texture classification algorithms. The database consists of real world images in
different lighting conditions for analysis of various texture feature classification
algorithms. There are a total of 67 images of different scenes with different lighting
conditions. The images are divided into 18 different categories, namely Bark, Brick,
Buildings, Clouds, Fabric, Flowers, Food, Grass, Leaves, Metal, Miscellaneous,
Painting, Sand, Stone, Terrain, Tile, Water, Whereswaldo and Wood.


MeasTex


MeasTex (the MEASurement of TEXture classification algorithms) [28] is an image
database and quantitative measurement framework for image texture analysis
algorithms. It is not only a texture database, but also a framework for the quantitative
measurement of texture algorithms on a number of texture test suites, together with an
implementation of some major well-known texture classification paradigms. The
database contains images of natural objects with homogeneous texture. There are a
total of 67 images of synthetic homogeneous textures, divided into 5 different
categories: Asphalt, Concrete, Grass, Miscellaneous and Rock. Various examples of
images from the MeasTex database are presented in figure 3.14.


CUReT


Researchers at Columbia University and Utrecht University have put together a
database that contains reflectance measurements for over 60 samples each observed
with 200 different combinations of viewing and illumination conditions [29]. As a part
of this, 60 different real world surfaces have been used. The categories include
specular surfaces (aluminum foil, artificial grass), diffuse surfaces (plaster, concrete),








isotropic surfaces (cork, leather, and styrofoam), anisotropic surfaces (straw,
corduroy, corn husk), surfaces with large height variations (sandpaper, quarry tile,
brick), pastel surfaces (paper, cotton), colored surfaces (velvet, rug), natural surfaces
(moss, lettuce, fur), and man made surfaces (sponge, terrycloth). The appearance of
texture is considered as a function of the viewing and illumination conditions. This is
a relatively new data set compared to other texture benchmarks and not enough
studies on its analysis are available. Also, the database has been developed more for
the graphics community for texture rendering studies; it acts as a starting point for
exploring 3D texture rendering algorithms. As such, it remains to be seen whether it
will prove useful for texture discrimination studies. Figure 3.15 presents some CUReT
samples.


OUTex


OUTex (the University of Oulu Texture database) [30] is a framework for the empirical
evaluation of texture classification and segmentation algorithms. At this time, the
collection contains 319 surface textures captured under well-defined variations of a
given reference in terms of illumination direction, surface rotation and spatial
resolution. The OUTex database also includes image data consisting of a sequence of 22
outdoor scene images of 2272x1704 pixels. A ground-truth segmented image is provided
for each image for the purpose of evaluating any classification algorithm. For our
system, and as will be shown later, we are concerned with clear natural scenes that do
not contain any shadows, reflections, or refractions. We are also not concerned with
outdoor scenes that include buildings. Most of the OUTex database images have these
drawbacks; see Figure 3.16.








                       Figure 3.14 Samples from the MeasTex database benchmark (grass01, misc01,
                       rock01, asphalt01, concrete01)




                 Figure 3.15: Samples from Columbia-Utrecht database benchmark.




            Figure 3.16 Outex database example (a) the original image (b) ground truth








A comparison between the different databases is given in Table 3.1:


                                          CUReT   OUTex   VisTex   MeasTex   Brodatz
      Image rotation                       Yes     Yes     Yes       Yes       Yes
      3D surface rotation                  Yes     Yes     No        No        No
      Controlled illumination              Yes     Yes     Yes       No        No
      Registered photometric stereo        No      No      No        No        No
      No. classes                          61      29      19        4        100-112
      No. images/class                     205     1-47    1-20      4-25      1-4
      Evaluation framework                 No      Yes     No        Yes       No
      Defined test/training data           No      Yes     No        Yes       No
      Unique copy                          Yes     Yes     Yes       Yes       No
                           Table 3.1 Texture databases comparison


PANN database


As a part of Sharma's [31] project, outdoor data was collected from the University of
Exeter campus using a Panasonic digital video camera (Model NV-DS77B) with a
resolution of 720x576 pixels. The data was collected in the form of 448 colored digital
stills. Images of different natural scenes containing grass, trees, sky, clouds, pebbles,
roads and different samples of bricks were collected. The camera was stabilized using a
tripod. All the images were taken during daytime to give a realistic view of the real
environment, and shadow effects were minimized. A selection of images from this
database is shown in Figure 3.17.


Among natural image collections, we found the PANN database very suitable for testing
our system. Coloring results for this database are shown in Chapter 5.








            Figure 3.17 Samples from PANN database benchmark








                             CHAPTER FOUR

         IMAGE ANALYSIS TECHNIQUES
      In this chapter, different segmentation and classification techniques that have
been used in our system will be presented.


4.1 Segmentation techniques

Image segmentation is one of the primary steps in image analysis for object
identification. The main aim is to recognize homogeneous regions within an image as
distinct and belonging to different objects. Segmentation stage does not worry about
the identity of the objects. They can be labeled later. The segmentation process can be
based on finding the maximum homogeneity in grey levels within the identified
regions. One of the common problems encountered in image segmentation is choosing
a suitable approach for isolating different objects from the background. The
segmentation doesn’t perform well if the grey levels of different objects are quite
similar. Image enhancement techniques seek to improve the visual appearance of an
image. They emphasize the salient features of the original image and simplify the task
of image segmentation. The type of operator chosen has a direct impact on the quality
of the resultant image. It is expected that an ideal operator will enhance the boundary
differences between the objects and their background making the image segmentation
task easier. Issues related to segmentation involve choosing good segmentation
algorithms, measuring their performance, and understanding their impact on the scene
analysis system.

4.1.1 Mean Shift algorithm

The mean shift algorithm was proposed by Fukunaga and Hostetler in 1975, although
it was largely forgotten until Cheng revisited it in 1995. The procedure can be used as a








computational module for feature space analysis (i.e. clustering) and has a variety of
applications in vision tasks including filtering and segmentation [32]. It can be applied
to both grayscale and color images and the only parameter that needs to be supplied is
the resolution or bandwidth that is to be considered.

- What is it?

Kernel density estimation is the most popular density estimation method. As explained
in [33], it assumes that the region Rn is a d-dimensional hypercube with edge length hn.
The number of points that fall in the hypercube can be counted by defining the following
window function:

                                    1
                                    
                            ω (u ) =  u j ≤ 12 ;          j = 1,...d 
                                                                                             (4.1)
                                    0 otherwise
                                                                     


ω(u) defines a unit hypercube centered at the origin. This means that if the data set is
{xi} and the current point of interest is x, then ω((x − xi)/hn) is 1 if xi falls within the
hypercube of side hn centered at x, and 0 otherwise. Using this, the number of samples
that fall in the hypercube centered at x is:

                                        k_n = \sum_{i=1}^{n} \omega\!\left(\frac{x - x_i}{h_n}\right)                    (4.2)

It is already known that, when Vn is the volume of Rn, kn is the number of samples falling
in Rn, and pn(x) is the nth estimate of p(x), the density estimate has the form:

                                        p_n(x) = \frac{k_n / n}{V_n}                                                     (4.3)


So substituting (4.2) into (4.3) gives a density function (4.4):

                            p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \omega\!\left(\frac{x - x_i}{h_n}\right)                 (4.4)








If we allow other classes of windowing functions to be chosen we can get a more
general approach to estimating density function. It is required that whatever window is
chosen is non-negative and integrates to one, but apart from these restrictions we are
free to use whatever window we wish. The window function is really being used for
interpolation, with each sample contributing to the estimate in accordance with its
distance from x. It is usual to choose a Gaussian function as the window due to it
satisfying the restrictions and being so mathematically useful.

The window width hn acts as a resolution parameter. If hn is large, then xi must be far
from x before it loses its effect on the density estimate. In that case pn(x) is the
superposition of n slowly changing functions and is a smooth, out-of-focus estimate of
p(x). If hn is small, pn(x) is the superposition of n sharp pulses centered at the samples.
hn obviously has an important role in determining pn(x): too large a value gives too
smooth a density with no modes, and too small a value gives too much statistical
variability. A compromise is needed for a given number of samples, but with many
samples we should let Vn approach zero as n increases so that pn(x) converges to p(x).
Using equation (4.4), adapting it slightly by setting K = ω/Vn and using a single
bandwidth value h, we arrive at the following estimate of the probability density
function:


                            \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)                   (4.5)


A high-quality kernel density estimator is needed, i.e. one with a low mean squared error
between the density and its estimate, integrated over the domain of definition (this
value is called the AMISE). The AMISE is minimized by the Epanechnikov kernel; another
common choice, used below, is the multivariate (x being a vector) normal kernel:


                            K_N(x) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\lVert x \rVert^{2}\right)                   (4.6)








The kernel (4.6) can then be used to rewrite the kernel density estimator:

                            \hat{f}_{h,K}(x) = \frac{c_{k,d}}{n h^{d}} \sum_{i=1}^{n} k\!\left(\left\lVert \frac{x - x_i}{h} \right\rVert^{2}\right)                 (4.7)

We now have a formula that represents the underlying density function. The next step
is to find the modes of this density. These will be located at the points where the
gradient of the density function is zero and the function is at a local maximum. The
mean shift procedure calculates these zeroes without estimating the density. In other
words, it is a hill climbing technique that finds the nearest stationary point of the
density. First, eq(4.7) is differentiated to give a gradient function:

              \nabla \hat{f}_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (x - x_i)\, k'\!\left(\left\lVert \frac{x - x_i}{h} \right\rVert^{2}\right)        (4.8)

Let g(v) = -k'(v) and substitute into equation (4.8) to give:

                            \nabla \hat{f}_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (x_i - x)\, g\!\left(\left\lVert \frac{x - x_i}{h} \right\rVert^{2}\right)        (4.9)




which, when manipulated algebraically, gives the following:

    $\hat{\nabla} f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \left[ \sum_{i=1}^{n} g\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right) \right] \left[ \frac{\sum_{i=1}^{n} x_i\, g\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)}{\sum_{i=1}^{n} g\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)} - x \right]$                                        (4.10)

In equation (4.10), the first term in square brackets, together with the constant, is proportional
to the density estimate at x computed with kernel G. The second term is the quantity we are
interested in: the mean shift.








    $m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)}{\sum_{i=1}^{n} g\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)} - x$                                        (4.11)

The mean shift vector computed with kernel G is proportional to the normalized density
gradient estimate computed with kernel K; the normalization is by the density estimate at x
computed with kernel G. Therefore the mean shift vector always points in the direction of the
maximum increase in density. Hence the local mean shifts toward the region containing the
majority of the points, so we follow a path leading to a stationary point of the estimated
density. These stationary points are the modes of the density, and the mean shift procedure is
guaranteed to converge to a nearby point where equation (4.7) has zero gradient [32,33]. To use
the mean shift procedure for clustering, choose a point xi and iterate equation (4.11), each time
replacing xi with the previously calculated mean value; this is the 'mean shift'. When
subsequent iterations produce the same mean value, we know that we have reached a mode,
which must be a cluster center, and that the point we started with belongs to that cluster. The
filtering and segmentation procedures can be described algorithmically as follows:

- Mean shift filtering

Let xi and zi, i=1,…,n be the d-dimensional input and filtered image pixels in the joint
spatial-range domain. For each pixel,

       1. Initialize j = 1 and yi,1 = xi.
       2. Compute yi,j+1 according to eq. (4.11) until convergence, y = yi,c.
       3. Assign zi = (xi^s, yi,c^r).
The superscripts s and r denote the spatial and range components of a vector, respectively.
The assignment specifies that the filtered data at the spatial location xi^s will have the range
component of the point of convergence yi,c^r.
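
To make the procedure concrete, the following is a minimal sketch in Python/NumPy of mean
shift filtering over the joint spatial-range domain, using a flat (uniform) kernel so that the
weighted mean of eq. (4.11) reduces to a plain average of the neighbours. The function and
parameter names are illustrative and this is not the thesis implementation.

    import numpy as np

    def mean_shift_filter(gray, hs=8, hr=4, max_iter=20, tol=1e-3):
        # Minimal mean shift filtering sketch in the joint spatial-range domain.
        # gray: 2-D array of gray levels; hs, hr: spatial and range bandwidths.
        h, w = gray.shape
        rows, cols = np.mgrid[0:h, 0:w]
        # joint feature space: one (row, col, gray) vector per pixel
        X = np.column_stack([rows.ravel(), cols.ravel(), gray.ravel()]).astype(float)
        filtered = np.empty(h * w)

        for i, x in enumerate(X):
            y = x.copy()
            for _ in range(max_iter):
                # flat kernel: neighbours within hs spatially and hr in gray level
                near = ((np.abs(X[:, 0] - y[0]) <= hs) &
                        (np.abs(X[:, 1] - y[1]) <= hs) &
                        (np.abs(X[:, 2] - y[2]) <= hr))
                mean = X[near].mean(axis=0)          # sample mean, cf. eq. (4.11)
                if np.linalg.norm(mean - y) < tol:   # converged to a mode y_{i,c}
                    y = mean
                    break
                y = mean
            filtered[i] = y[2]                       # keep the range component only

        return filtered.reshape(h, w)

The brute-force neighbour scan inside the loop is exactly the cost that the speed-up structures
discussed later in this section (k-d trees, locality-sensitive hashing) are meant to reduce.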








- Mean shift Segmentation

Let xi and zi , i=1,…,n be the d-dimensional input and filtered image pixels in the joint
spatial-range domain and Li the label of the ith pixel in the segmented image.

       1. Run the mean shift filtering procedure for the image and store all the
            information about the d-dimensional convergence point in zi ,i.e., zi=yi,c.

       2. Delineate in the joint domain the clusters {CLp}p=1,…,m by grouping together
            all zi which are closer than hs in the spatial domain and hr in the range
            domain, i.e., concatenate the basins of attraction of the corresponding
            convergence points.

       3. For each i=1,…,n assign Li = {p | zi ∈ CLp}.

       4. Eliminate spatial regions of size less than minimum region size MRS
            threshold.
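
A rough sketch of the grouping in step 2, assuming the convergence points zi = (row, col, gray)
produced by a filtering routine such as the one above; the greedy one-pass merging used here
is only one simple way of concatenating the basins of attraction and is not taken from the
thesis code.

    import numpy as np

    def group_modes(Z, hs=8, hr=4, min_region=20):
        # Z: (n, 3) array of convergence points (row, col, gray).
        # Points closer than hs spatially and hr in range share a label.
        n = Z.shape[0]
        labels = np.empty(n, dtype=int)
        reps = []                                    # one representative point per cluster
        for i, z in enumerate(Z):
            for p, rep in enumerate(reps):
                if (abs(z[0] - rep[0]) <= hs and abs(z[1] - rep[1]) <= hs
                        and abs(z[2] - rep[2]) <= hr):
                    labels[i] = p
                    break
            else:                                    # no nearby cluster: open a new one
                labels[i] = len(reps)
                reps.append(z)

        # step 4: eliminate regions smaller than the minimum region size (MRS)
        counts = np.bincount(labels, minlength=len(reps))
        labels[np.isin(labels, np.where(counts < min_region)[0])] = -1
        return labels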

Figure 4.1 illustrates the filtering process. As seen in the figure, the circle around the pixel
marks the neighbours of that pixel, and the basin-of-attraction pixels are the pixels tracing the
path toward the mean of this pixel. Figure 4.2 visualizes mean shift based filtering and
segmentation for gray-level data. Figure 4.2b presents the mean shift paths for the pixels on
the plateau and on the line; the black dots are the points of convergence. Figure 4.2c shows
the filtering result with (hs, hr) = (8, 4), while figure 4.2d presents the segmentation result.
Figure 4.3 presents an example of a final segmented gray image with its region boundaries.








Figure 4.1: Successive computations of the mean shift define a path leading to a local
                                density maximum




  Figure 4.2: Mean shift filtering and segmentation (a) Input. (b) Mean shift paths.
                    (c) Filtering result (d) Segmentation result.




   Figure 4.3: (a) Input. (b) Mean shift segmentation result. (c) Region boundaries.







- Choice of bandwidth

Bandwidth h should be chosen such that it 'achieves an optimal compromise between the bias
and the variance over all x ∈ Rd' [34]. The criterion for this compromise is the mean integrated
squared error (MISE), defined as follows:


    $\mathrm{MISE} = E \int \left( \hat{f}(x) - f(x) \right)^{2} dx$                                        (4.12)


Our aim is to minimize this value. The technique for doing so directly is not practical, though,
because it 'depends on the Laplacian of the unknown density being estimated' [34]. The best
known method of bandwidth calculation is the plug-in rule of Sheather and Jones (1991). There
is one remaining problem, however, related to using a single, fixed bandwidth: it is often
difficult to calculate a global bandwidth value that satisfactorily applies to the entire data set,
because the local characteristics change across the feature space. Points in high density
regions require a different value from points in sparse regions. This cannot be represented in
the basic mean shift algorithm, so a slightly modified version, called the variable bandwidth
mean shift (also known as the adaptive mean shift), is used. This is explained in detail in [34],
but it leads to more computations.

- Use of variable bandwidth

In the standard mean shift procedure a fixed bandwidth value is used; that is, 'h is held
constant across x ∈ Rd' [34]. Hence the fixed bandwidth procedure estimates the density at
each point x by taking the average of identically scaled kernels centered at each of the data
points. The bandwidth h should be chosen so that it minimizes the mean integrated squared
error (MISE), and the plug-in rule is thought to be the best way to calculate this. Recall
equation (4.5), the multivariate fixed bandwidth kernel density estimate:


    $\hat{f}(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right)$                                        (4.13)








There are two possible ways to vary the bandwidth. We could select a different
bandwidth value for each estimation point x or for each data point xi. These
respectively lead to the 'balloon density estimator' (4.14) where the estimate of f at x is
the average of identically scaled kernels centered at each data point and the 'sample
point density estimator' (4.15) where the estimate of f at x is the average of differently
scaled kernels centered at each data point.


    $\hat{f}_1(x) = \frac{1}{n\, h(x)^{d}} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h(x)} \right)$                                        (4.14)

    $\hat{f}_2(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h(x_i)^{d}} K\left( \frac{x - x_i}{h(x_i)} \right)$                                        (4.15)




The sample point estimator is the better of the two: for a suitable choice of h(xi), the bias is
greatly reduced, which in turn lowers the MISE value and hence improves the effectiveness of
the bandwidth. The recommended method of bandwidth calculation is the plug-in rule, which
provides a pilot (initial) estimate. Formula (4.16) is then used to calculate the bandwidth for
each point from the pilot estimate, a proportionality constant λ, and the density estimate:

    $h(x_i) = h_0 \left[ \frac{\lambda}{f(x_i)} \right]^{1/2}$                                        (4.16)

- Calculation of bandwidth using k-nearest neighbors

An alternative method of bandwidth calculation is to use nearest neighbors. This can be used
to form the pilot estimate for the formula above, from which each point's bandwidth is then
calculated, or it can be used to set each point's bandwidth directly. Let xi,k be the k-nearest
neighbor of point xi. We can then use the L1 norm of the distance to this neighbor as the
bandwidth estimate, which also leads to a convenient data structure for later use, i.e.:








    $h_i = \left\| x_i - x_{i,k} \right\|_{1}$                                        (4.17)


The number of neighbors k needs to be large enough 'to assure there is an increase in density
within the support of most kernels having bandwidth hi' [35]. This means that k should increase
with the number of dimensions d, although this is not of critical importance for performance. If
all hi = h, we have a global bandwidth and the procedure reduces to the fixed bandwidth mean
shift method. It is essential that all hi > 0, otherwise errors such as division by zero will occur.
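
A minimal sketch of this bandwidth rule (eq. 4.17), assuming the data points are stored
row-wise in a NumPy array; the brute-force distance computation is for illustration only.

    import numpy as np

    def knn_bandwidths(X, k):
        # Per-point bandwidths h_i = ||x_i - x_{i,k}||_1, cf. eq. (4.17).
        # X: (n, d) data matrix; k: which nearest neighbour to use (k >= 1).
        n = X.shape[0]
        h = np.empty(n)
        for i in range(n):
            d1 = np.abs(X - X[i]).sum(axis=1)        # L1 distances from x_i to all points
            d1[i] = np.inf                           # exclude the point itself
            h[i] = np.partition(d1, k - 1)[k - 1]    # distance to the k-th nearest neighbour
        # guard against h_i = 0 (duplicate points) to avoid later division by zero
        if np.any(h > 0):
            h[h == 0] = h[h > 0].min()
        else:
            h[:] = 1.0
        return h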

- Speeding up meanshift computations

The problem with using the mean shift algorithm (or the adaptive mean shift
algorithm) is its computational complexity; this is increased further when using many
dimensions in what is known as 'the curse of dimensionality'. There are two forms of
this problem, the statistical curse and the computational curse. The statistical form
occurs because of the sparseness of the data, where mode detection will now require
variable bandwidth to maintain accuracy.

Speed-up methods have been studied, such as using a k-d tree structure or LSH (Locality
Sensitive Hashing) to find neighbors quickly instead of scanning the image sequentially.
B. Georgescu, I. Shimshoni, and P. Meer [35] called this technique 'Fast Adaptive Mean Shift'
(FAMS).

We therefore need a way to reduce the number of points that must be considered when
performing these high-dimensional calculations. One way to achieve this is to partition the
data into different regions and only look at the points in a particular region when performing
the calculations. There are many techniques for doing this; first k-d trees will be discussed, as
they are commonly used in computer applications, and then locality-sensitive hashing will be
considered as an alternative.








- Kd-Trees

The algorithm for constructing the k-d tree [25] is as follows. For each node we choose the
discriminator to be the attribute whose spread of values (measured by some statistic such as
the variance or the distance from maximum to minimum) is largest over the subcollection
represented by the node. The partitioning value is chosen as the median value of this attribute.
We then recursively apply this step to each child of the partitioned node, and continue until
the size of the subcollection is less than some predetermined parameter, at which point the
algorithm terminates. We are left with a coordinate space divided into a number of buckets,
each containing roughly the same number of points and each approximately cubical in shape
(each partition in each dimension is a rectangle of some kind; these are tessellated to give a
straight-edged shape).

As with the one-dimensional binary search algorithm, range searching with k-d trees is
straightforward. We start at the root of the tree and search recursively in the following manner.
When we visit a node that discriminates by the jth key, we compare the jth range of the query
with the discriminator value. If the query range lies entirely above (or below) that value, we
only need to search the right (or left) subtree; the other no longer needs to be considered. If
the query range overlaps both, then both child nodes must be searched, in the same recursive
manner. To reduce this situation we should try to build the structure so that it does not arise
in the first place. In the k-d tree introduced by Bentley, the 'discriminators are chosen
cyclically (that is, the root is discriminated by the first key, its children by the second and
so on)'.
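
The construction and range search just described can be sketched as follows. This simplified
k-d tree splits on the dimension with the largest spread (maximum minus minimum) at the
median value and stops when a bucket is small enough; the bucket size and the degenerate-split
guard are implementation choices, not prescriptions from the text.

    import numpy as np

    class KDNode:
        def __init__(self, points, bucket_size=8):
            self.points = None                       # set only for leaf buckets
            self.left = self.right = None
            if len(points) <= bucket_size:
                self.points = points
                return
            # discriminator: dimension with the maximum spread over this subcollection
            spread = points.max(axis=0) - points.min(axis=0)
            self.dim = int(np.argmax(spread))
            self.value = float(np.median(points[:, self.dim]))   # partitioning value
            left = points[:, self.dim] <= self.value
            if left.all() or not left.any():         # degenerate split: keep as a leaf
                self.points = points
                return
            self.left = KDNode(points[left], bucket_size)
            self.right = KDNode(points[~left], bucket_size)

        def range_search(self, lo, hi, out):
            # Collect every point inside the axis-aligned box [lo, hi].
            if self.points is not None:              # leaf: test the bucket directly
                inside = np.all((self.points >= lo) & (self.points <= hi), axis=1)
                out.extend(self.points[inside])
                return
            if lo[self.dim] <= self.value:           # query overlaps the left half
                self.left.range_search(lo, hi, out)
            if hi[self.dim] > self.value:            # query overlaps the right half
                self.right.range_search(lo, hi, out)

    # usage sketch:
    #   tree = KDNode(np.random.rand(1000, 3))
    #   hits = []
    #   tree.range_search(np.array([0.2, 0.2, 0.2]), np.array([0.4, 0.4, 0.4]), hits)

Only the buckets whose region overlaps the query box are visited, which is exactly the
behaviour traced in the Figure 4.4 walk-through that follows.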

Figure 4.4 shows an example k-d tree. At the root, point A, it is determined that the search key
lies in the right-hand branch and not the left, so the left branch is removed from consideration.
We then look at the first node on the right branch of A, which is C. The same test is applied
again, but this time the search key is found to overlap both of C's children, so neither F nor G
can be removed from consideration. We therefore








have to look recursively at both F and G. First F is considered; again the search key overlaps
both children, so we look at each in turn and find that two of the buckets under F are in the
range of our search key. After F has been considered in detail, we look at G and in a similar
way find that one bucket is in range. We have therefore had to search only three buckets, a
small number compared to the 16.


                Figure 4.4: k-d tree example (root A with children B and C;
                     C's children F and G; leaf buckets H–O)

The main reason for using k-d trees is to introduce a more ordered, sensible structure for
storing multidimensional data and to allow fast querying to be performed accurately and
reliably. In comparison to other multidimensional searching techniques the k-d tree structure
performs favorably, with many other algorithms requiring O(N) time for both construction and
searching. It is most effective in situations where little is known about the kinds of queries
that will be issued, or where a wide variety of queries is expected; it is therefore well suited
to feature space analysis with many dimensions and many points. It can also be used for
searches other than range queries if required.








- Locality-sensitive hashing:

Locality-sensitive hashing allows us to reduce the computational complexity of the mean shift
algorithm by decreasing the number of points considered and manipulated at each stage. The
aim is to partition the data set into buckets using hash functions in a preprocessing phase;
then, as in the nearest neighbor methods, only the points found by querying (those that fall
into the same buckets as the current point) are used for the calculations. Using a slightly
different structure from the aforementioned k-d trees, we proceed as follows.

Preprocessing:

The preprocessing takes in a set of data points of size n with dimensionality d and returns a
data structure that allows us to easily query which points are closest to a particular point. We
have a parameter η, which determines the number of partitions used for each dimension, and
another parameter ζ, which defines the inequalities used to separate the buckets. For each
dimension we work through all of the points, hashing them into buckets based on their value
along that dimension. We end up with a set of d hash tables, each containing all n points, and
are left with a tree structure with d branches, each broken into ζ further branches, with
individual points as leaves. Of course, ζ does not necessarily need to be the same for each
dimension.

Querying is a form of approximate nearest neighbor query. The input is a particular data point;
for each dimension we hash the value of the data point and return the list of points contained
in that hash table bucket, using for each dimension the η random partitions defined by the
inequalities ζ. We then take the union of the returned sets of points to obtain a larger set
(still a subset of the whole data set), which can then be used for whatever application requires
it. Another operation that can be performed is the intersection of all of the subsets; this gives
only those points that are in the same partition as the considered point in every dimension. It
is very likely that whatever calculation is performed on one of the intersection points








will lead to the same answer as if that calculation were performed on another point in the
same intersection. This is a further way to save complexity, as we can store the answer and
recall it for all of the points in a particular intersection. However, the intersection often
contains only one or two points, so it may not help a great deal.
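
The preprocessing and query steps could be sketched as follows, assuming data-independent
(linear) cut values for simplicity: η partitions are built, each hashing a point through ζ random
single-coordinate inequalities, and a query returns the union (or intersection) of the matching
buckets. Class and parameter names are illustrative, not the thesis implementation.

    import numpy as np
    from collections import defaultdict

    class SimpleLSH:
        def __init__(self, X, eta=10, zeta=4, seed=0):
            # Build eta random partitions, each defined by zeta inequalities (d_k, v_k).
            rng = np.random.default_rng(seed)
            self.cuts, self.tables = [], []
            n, d = X.shape
            for _ in range(eta):
                dims = rng.integers(0, d, size=zeta)              # d_k: random coordinates
                vals = np.array([rng.uniform(X[:, k].min(), X[:, k].max()) for k in dims])
                table = defaultdict(list)
                for i, row in enumerate(X):
                    table[tuple(row[dims] <= vals)].append(i)     # zeta-bit bucket key
                self.cuts.append((dims, vals))
                self.tables.append(table)

        def query(self, q, intersect=False):
            # Indices of points sharing a bucket with q in the union (or intersection)
            # of the eta partitions; this approximates the neighbourhood of q.
            sets = []
            for (dims, vals), table in zip(self.cuts, self.tables):
                sets.append(set(table.get(tuple(q[dims] <= vals), [])))
            return sorted(set.intersection(*sets) if intersect else set.union(*sets))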


Types of partitioning:


There are two options for partitioning: linear partitioning and data-driven partitioning. With
linear partitioning we set equally spaced cut values between the lowest and highest values of
the data set and simply drop the points into whichever partition they fall between. This works
well when 'the density of the data is approximately uniform in the entire space' [35], but this
is often not the case, as vision feature spaces are usually multimodal and have clusters of
data. Data-driven partitioning tries to create partitions so that each contains roughly the same
number of points, which means that high density areas receive more cuts and sparse areas
fewer. Data-driven partitioning makes the locality-sensitive hashing k-nearest neighbor queries
much more efficient, which leads to performance and accuracy gains. It is also quicker to build
the hash tables in the preprocessing stage.


To improve the efficiency of the neighborhood queries, the following data structure is
constructed. The data is tessellated η times with random partitions, each defined by ζ
inequalities (Figure 4.6). In each partition, ζ pairs of random numbers, dk and υk, are used.
First dk, an integer between 1 and d, is chosen, followed by υk, a value within the range of the
data along the dk-th coordinate.








    Figure 4.5: The difference between (a) linear and (b) data-driven partitioning.
    Notice that the number of points per partition in (a) varies wildly, whereas (b)
                        maintains a roughly constant number.
   Figure 4.6: The locality-sensitive hashing data structure. For the query point q, the overlap
   of the η cells (t = 1, …, η) yields the region R∪ that approximates the desired neighborhood;
   R∩ denotes the intersection of the cells.








4.1.2 K-means clustering algorithm

Suppose the member pixels of an image are {pi : i = 1, …, MN}. The goal of the k-means
algorithm is to partition the observations into k groups with cluster centers c1, c2, …, ck that
minimize the square error E, defined below:

    $E(k) = \sum_{i=1}^{MN} \min_{1 \le j \le k} \left( p_i - c_j \right)^{2}$                                        (4.18)



The advantage of the K-means algorithm is that it works well even when clusters are not well
separated from each other, a situation frequently encountered in images.

In the standard k-means algorithm [36], we know there are k clusters (i.e., this has been given
as an input parameter), each having a center at some point in the space. The algorithm can be
viewed as iterating the following two steps:

   • Assume that the cluster centers are known and allocate each point to the closest
       cluster center.

   • Assume the allocation is known and choose a new set of cluster centers where
       each center is the mean of the points allocated to that cluster.

This is a simple technique that can be iterated until either a set number of iterations has
elapsed (a running-time compromise) or successive iterations produce the same mean values,
at which point we know they are the required centers. However, it still requires a parameter
value to be supplied, which is exactly what we are trying to avoid. It may be reasonable to run
a test algorithm beforehand to predetermine this parameter automatically: T. Kasparis and
D. Charampidis [37] developed an iterative k-means algorithm, based on the algorithm of
T. Calinski et al. [38], and we use it as will be presented in the next chapter.

There are many applications of K-means clustering, ranging from unsupervised learning of
neural networks, pattern recognition, classification analysis, and artificial







intelligence to image processing and machine vision. The step-by-step K-means clustering
algorithm is as follows:


       Figure 4.7: K-means procedure block diagram (start → choose the number of clusters k →
       compute centroids → compute distances of the objects to the centroids → group by
       minimum distance → repeat until no object changes group → end)

Step 1: Begin with a decision on the value of k = number of clusters

Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the
training samples randomly, or systematically as follows:

   1. Take the first k training samples as single-element clusters.

   2. Assign each of the remaining (n−k) training samples to the cluster with the nearest
       centroid; after each assignment, recompute the centroid of the gaining cluster.

Step 3: Take each sample in sequence and compute its distance from the centroid of
each of the clusters. If a sample is not currently in the cluster with the closest centroid,
switch this sample to that cluster and update the centroid of the cluster gaining the
new sample and the cluster losing the sample.







Step 4: Repeat step 3 until convergence is achieved, that is until a pass through the
training sample causes no new assignments.

If the number of data points is less than the number of clusters, we assign each data point as
a cluster centroid, and each centroid is given a cluster number. If the number of data points is
larger than the number of clusters, then for each data point we calculate the distance to every
centroid and take the minimum; the data point is said to belong to the cluster at minimum
distance.

Since we are not sure about the locations of the centroids, we adjust them based on the
current assignment and then re-assign all the data points to the new centroids. This process is
repeated until no data point moves to another cluster. Mathematically this loop can be proved
to converge. Convergence will always occur if the following conditions are satisfied:

   1. The sum of distances from each training sample to that training sample's group
       centroid is decreased.

   2. There are only finitely many partitions of the training examples into k clusters.
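
The steps above can be condensed into the following sketch, assuming the Euclidean distance,
initialization by the first k samples (the systematic option of Step 2), and a maximum iteration
count as a running-time fallback; this is an illustrative outline rather than the implementation
used in the thesis.

    import numpy as np

    def kmeans(X, k, max_iter=100):
        # Standard k-means sketch: assign each point to the nearest centroid,
        # then recompute each centroid as the mean of its assigned points.
        centroids = X[:k].astype(float).copy()        # Step 2: first k samples as centroids
        labels = None
        for it in range(max_iter):
            # Step 3: distance of every sample to every centroid, keep the closest
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break                                 # Step 4: no sample changed cluster
            labels = new_labels
            for j in range(k):
                members = X[labels == j]
                if len(members):                      # keep the old centroid if a cluster empties
                    centroids[j] = members.mean(axis=0)
        return labels, centroids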

Like other algorithms, K-means clustering has several weaknesses:

   •   When the number of data points is small, the initial grouping determines the clusters
       significantly.

   •   The number of clusters, k, must be determined beforehand.

   •   We never know the true clusters: the same data, presented in a different order, may
       produce different clusters when the number of data points is small.

   •   The algorithm is sensitive to the initial conditions; different initial conditions may
       produce different clusterings, and the algorithm may be trapped in a local optimum.








   •    We never know which attribute contributes more to the grouping process, since we
        assume that each attribute has the same weight.

   •    The arithmetic mean is not robust to outliers; data points very far from a centroid may
        pull it away from its true location.

   •    The resulting clusters tend to be spherical in shape because the grouping is based on
        distance.

One way to mitigate these weaknesses is to use K-means clustering only when many data
points are available. To overcome the outlier problem, the median can be used instead of the
mean.

It is sometimes claimed that k-means clustering cannot be used for any data other than
quantitative data. This is not true: the key to using other types of dissimilarity lies in the
choice of distance measure.


- Accelerated k-means:

However, the k-means algorithm is still slow in practice for large datasets. For an image of n
points the number of distance operations is of order nkde, where k is the number of clusters to
be found, e is the number of iterations required, and d is the dimensionality of the data.
Empirically, e grows sublinearly with k, n, and d.

C. Elkan [39] developed a technique for accelerating k-means using the triangle inequality.
The accelerated algorithm avoids unnecessary distance calculations by applying the triangle
inequality in two different ways and by keeping track of lower and upper bounds on the
distances between points and centers.

The algorithm is based on the fact that most distance calculations in standard k-means
are redundant. If a point is far away from a center, it is not necessary to calculate the
exact distance between the point and the center in order to know that the point should
not be assigned to this center. Conversely, if a point is much closer to one center than







to any other, calculating exact distances is not necessary to know that the point should
be assigned to the first center. We show below how to make these intuitions concrete.

Therefore, the accelerated algorithm needs to satisfy three properties. First, it should be able
to start with any initial centers, so that all existing initialization methods can continue to be
used. Second, given the same initial centers, it should always produce exactly the same final
centers as the standard algorithm. Third, it should be able to use any black-box distance
metric, so it should not rely, for example, on optimizations specific to the Euclidean distance.


- Applying the triangle inequality


For any three points p1, p2, and p3, E(p1,p3) ≤ E(p1,p2) + E(p2,p3). This is the only 'black box'
property that all distance metrics E possess. The difficulty is that the triangle inequality gives
upper bounds, while lower bounds are needed to avoid calculations. Let p be a point and let c1
and c2 be centers; we need to know that E(p,c1) ≤ E(p,c2) in order to avoid calculating the
actual value of E(p,c2).


The following two lemmas show how to use the triangle inequality to obtain useful
lower bounds.


       Lemma 1: Let p be a point and let c1 and c2 be centers.
       If E(c1,c2) ≥ 2E(p,c1) then E(p,c2) ≥ E(p,c1).


       Lemma 2: Let p be a point and let c1 and c2 be centers.
       Then E(p,c2) ≥ max{0,E(p,c1) – E(c1,c2)}

The proofs of these lemmas are presented in ref [39]. Putting the observations above
together, the accelerated k-means algorithm is as follows.








   • First, pick initial centers. Set the lower bound low(p,c) = 0 for each point p and
      center c. Assign each p to its closest initial center c(p) = argminc E(p,c), using
      Lemma 1 to avoid redundant distance calculations. Each time E(p,c) is
      computed, set low(p,c) = E(p,c). Assign upper bounds upp(p) = minc E(p,c).

   • Next, repeat until convergence:

   1. For all centers c and c′, compute E(c,c′). For all centers c, compute s(c), where
      s(c) = ½ min over c′ ≠ c of E(c,c′).

   2. Identify all points p such that upp(p) ≤ s(c(p)).

   3. For all remaining points p and centers c such that:

       (i). c≠ c(p) and

       (ii). upp(p) > low(p,c) and

      (iii). upp(p) > ½E(c(p),c):

                   iii-a    If r(p) then compute E(p,c(p)) and assign r(p)=false.
                        Otherwise, E(p,c(p))=upp(p).

                   iii-b    If E(p,c(p)) > low(p,c) or E(p,c(p)) > ½E(c(p),c), then compute E(p,c);

                            if E(p,c) < E(p,c(p)), assign c(p) = c.

      4. For each center c, let m(c) be the mean of the points assigned to c.

      5. For each point p and center c assign

             low(p,c) = max{low(p,c) – E(c,m(c)),0}

      6. For each point p, assign

             upp(p)=upp(p) + E(m(c(p)),c(p))

             r(p) = true.

      7. Replace each center c by m(c).
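
The heart of the acceleration is the pruning test derived from Lemma 1: if the distance between
a point's current best center and another center is at least twice the distance from the point to
its best center, the other center cannot be closer and its distance need not be computed. A
minimal sketch of that test alone, without the full bookkeeping of the lower and upper bounds
listed above, might look as follows.

    import numpy as np

    def assign_with_pruning(X, centers):
        # Assign each point to its nearest center, skipping centers ruled out by Lemma 1.
        # Returns the labels and the number of exact distance computations performed.
        cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
        labels = np.empty(len(X), dtype=int)
        exact = 0
        for i, p in enumerate(X):
            best, d_best = 0, np.linalg.norm(p - centers[0])
            exact += 1
            for c in range(1, len(centers)):
                # Lemma 1: if E(c_best, c) >= 2 E(p, c_best), then E(p, c) >= E(p, c_best)
                if cc[best, c] >= 2.0 * d_best:
                    continue                          # center c cannot be closer; skip it
                d = np.linalg.norm(p - centers[c])
                exact += 1
                if d < d_best:
                    best, d_best = c, d
            labels[i] = best
        return labels, exact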








- K-means example

As an example of the standard k-means versus the accelerated k-means, a random sample of
100 points in a 2D feature space is shown in figure 4.8a; the results of clustering the data into
two clusters are shown in figures 4.8b and 4.8c.




       Figure 4.8: K-means example. (a) Input data. (b) Standard k-means (elapsed time: 0.5470).
       (c) Fast k-means (elapsed time: 0.1100).
There are many other clustering techniques, such as split-and-merge, the adaptive clustering
algorithm (ACA), fuzzy c-means (FCM), and the self-organizing feature map (SOFM) neural
network. We did not investigate them in depth; only the algorithms presented here are tested
in our system. In the next chapter, the results of image clustering by these algorithms are
presented.








4.2 Classification Techniques

Classification refers to assigning a physical object or incident to one of a set of predefined
categories [40]. Any classification process involves two phases: the learning phase and the
recognition phase.

In the learning phase, the target is to build a model of the object content of each object class
present in the training data, which generally consists of images with known class labels. The
object content of the training images is captured with the chosen feature analysis method,
which yields a set of features for each image. These features, which can be scalar numbers,
discrete histograms, or empirical distributions, characterize given properties of the images.

In the recognition phase the object content of the unknown sample is first described
with the same feature analysis method. Then the features of the sample are compared
to those of the training images with a classification algorithm, and the sample is
assigned to the category with the best match. Optionally, if the best match is not
sufficiently good according to some predefined criteria, the unknown sample can be
rejected instead.

The performance of an image description method is often demonstrated using a classification
experiment, which typically comprises the following steps (note that not all steps may always
be needed and their order may vary):

   1. Selection of image data.

   2. Partitioning of the image data into subimages.

   3. Preprocessing of the (sub) images.

   4. Partitioning of the (sub) images data into training and testing sets.

   5. Selection of the classification algorithm.








   6. Definition of the performance criterion: in the case of class assignments, the items in
       the testing set are classified and the proportion of correctly (classification accuracy) or
       erroneously (classification error) classified items is used as the performance criterion.

4.2.1 Similarity Measurement

After having extracted features, the next task is to find a similarity measure E (not necessarily
a distance) such that E(I,J) is small if and only if I and J are close. There are many different
distance calculations, as mentioned in [41]. They fall into four categories:

   1. Heuristic histogram distances:

        (i). The Minkowski-form distance (Lρ):

    $E(I,J) = \left( \sum_i \left| I(i) - J(i) \right|^{\rho} \right)^{1/\rho}$                                        (4.19)

As examples of this form, when ρ = 1 the distance is called the Manhattan distance, and when
ρ = 2 it is called the Euclidean distance.

       (ii). The Weighted-Mean-Variance (WMV):

    $E^{r}(I,J) = \frac{\left| \mu_r(I) - \mu_r(J) \right|}{\left| \sigma(\mu_r) \right|} + \frac{\left| \sigma_r(I) - \sigma_r(J) \right|}{\left| \sigma(\sigma_r) \right|}$                                        (4.20)

A superscript r indicates that the respective measure is applied only to the marginal
distribution along dimension r.

   2. Non-parametric test statistics:

        (i). The Kolmogorov-Smirnov distance (KS):








    $E^{r}(I,J) = \max_i \left| F_r(i;I) - F_r(i;J) \right|$                                        (4.21)



       (ii). A statistic of the Cramer/von Mises (CVM):


    $E^{d}(I,J) = \sum_i \left( I^{d}(i) - J^{d}(i) \right)^{2}$                                        (4.22)


       (iii). The χ²-statistic:

    $E(I,J) = \sum_i \frac{\left( I(i) - \Omega(i) \right)^{2}}{\Omega(i)}, \quad \text{where } \Omega(i) = \frac{I(i) + J(i)}{2}$                                        (4.23)
   3. Ground distance measures:

        (i). The Quadratic Form (QF):

    $E(I,J) = (\mathbf{I} - \mathbf{J})^{T} A\, (\mathbf{I} - \mathbf{J})$                                        (4.24)

where $\mathbf{I}$ and $\mathbf{J}$ are vectors that list all the entries in I(i) and J(i), respectively, and A is a
matrix whose entries encode the similarity between bins.

       (ii). The Earth Mover Distance (EMD):



    $E(I,J) = \frac{\sum_{i,j} v_{ij}\, d_{ij}}{\sum_{i,j} v_{ij}}$                                        (4.25)

where $d_{ij}$ denotes the dissimilarity between bins i and j, and $v_{ij} \ge 0$ is the optimal flow between
the two distributions such that the total cost $\sum_{i,j} v_{ij} d_{ij}$ is minimized.


   4. Information-theoretic divergence:

        (i). The Kullback-Leibler divergence (KL):


    $E(I,J) = \sum_i I(i) \log \frac{I(i)}{J(i)}$                                        (4.26)








          (ii). The Jeffrey-divergence (JD):


    $E(I,J) = \sum_i \left( I(i) \log \frac{I(i)}{\Omega(i)} + J(i) \log \frac{J(i)}{\Omega(i)} \right)$                                        (4.27)

Table 4.1 compares the properties of the different measures.


                        Lp        WMV       KS/CvM    χ²        KL        JD        QF        EMD
 Symmetric              yes       yes       yes       yes       no        yes       yes       yes
 Triangle inequality    valid     valid     valid     invalid   invalid   invalid   see text  see text
 Computational
 complexity             medium    low       medium    medium    medium    medium    high      high
 Exploits ground
 distance               no        no        yes       no        no        no        yes       yes
 Individual binning     no        yes       no        no        no        no        no        yes
 Multiple dimension     yes       yes       no        yes       yes       yes       yes       yes
 Partial matches        see text  no        no        no        no        no        no        yes
 Non-parametric         yes       no        yes       yes       yes       yes       yes       yes


                        Table 4.1: Similarity measures comparison
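
Several of the bin-by-bin measures above translate directly into code, assuming the histograms
I and J are NumPy arrays of equal length; the small eps terms guard the logarithms and
divisions, and the ground-distance measures (QF, EMD) are omitted because they need a
bin-similarity matrix or a flow solver.

    import numpy as np

    def minkowski(I, J, rho=2):
        # Minkowski-form distance, eq. (4.19); rho=1 is Manhattan, rho=2 Euclidean.
        return np.sum(np.abs(I - J) ** rho) ** (1.0 / rho)

    def chi_square(I, J, eps=1e-12):
        # Chi-square statistic, eq. (4.23), with Omega the bin-wise average.
        omega = (I + J) / 2.0
        return np.sum((I - omega) ** 2 / (omega + eps))

    def kullback_leibler(I, J, eps=1e-12):
        # Kullback-Leibler divergence, eq. (4.26); note that it is not symmetric.
        return np.sum(I * np.log((I + eps) / (J + eps)))

    def jeffrey(I, J, eps=1e-12):
        # Jeffrey divergence, eq. (4.27): a symmetric, smoothed relative of KL.
        omega = (I + J) / 2.0
        return (np.sum(I * np.log((I + eps) / (omega + eps))) +
                np.sum(J * np.log((J + eps) / (omega + eps))))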

4.2.2 Nearest neighbor classifier

K-nearest neighbor is a supervised learning algorithm in which a new query instance is
classified based on the majority category of its K nearest neighbors. The purpose of the
algorithm is to classify a new object based on its attributes and the training samples. The
classifier does not fit any model and is based only on memory. Given a query point, we find
the K objects (training points) closest to it, and the classification is decided by majority vote
among the classes of these K objects. Any ties







can be broken at random. The K-nearest neighbor algorithm thus uses the neighborhood
classification as the prediction value for the new query instance.

- K-Nearest Neighbor (KNN) Algorithm:

The K-nearest neighbor algorithm is very simple. It determines the K nearest neighbors from
the minimum distances between the query instance and the training samples, and then takes
a simple majority of these K nearest neighbors as the prediction for the query instance.

Suppose we set knn = 8 (we will use 8 nearest neighbors) as the parameter of the algorithm.
We then calculate the distance between the query instance and all the training samples.
Because we use only quantitative attributes Xi, we can use the Euclidean distance. If the query
instance has coordinates (x1q, x2q) and a training sample has coordinates (x1t, x2t), the
squared Euclidean distance is E = (x1t − x1q)² + (x2t − x2q)².

The next step is to find the K nearest neighbors. A training sample is included as a nearest
neighbor if its distance to the query instance is less than or equal to the K-th smallest
distance. In other words, we sort the distances of all training samples to the query instance
and determine the K-th minimum distance; if the distance of a training sample does not exceed
this K-th minimum, we gather the category C of that neighboring training sample.

The KNN prediction for the query instance is based on the simple majority of the categories of
the nearest neighbors. In our example, if the category is binary, the majority can be found
simply by counting the numbers of '+' and '−' signs. If there are more pluses than minuses, we
predict the query instance as plus, and vice versa; if the numbers are equal, we can choose
arbitrarily. If the training samples contain a categorical class C, take the simple majority
among these neighbors. If C is quantitative, take the average or any other measure of central
tendency, such as the median or the geometric mean.
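
A compact sketch of the procedure just described, assuming quantitative features, the
Euclidean distance, and majority voting with arbitrary tie-breaking; the function is illustrative
only.

    import numpy as np
    from collections import Counter

    def knn_predict(query, X_train, y_train, knn=8):
        # Classify one query instance by majority vote among its knn nearest neighbours.
        # query: (d,) feature vector; X_train: (n, d) samples; y_train: (n,) labels.
        d2 = np.sum((X_train - query) ** 2, axis=1)   # squared Euclidean distances
        nearest = np.argsort(d2)[:knn]                # indices of the knn smallest distances
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]             # ties are broken arbitrarily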








Nearest neighbor methods have been used as an important pattern recognition tool. In
such methods, the aim is to find the nearest neighbors of an unidentified test pattern
within a hypersphere of pre-defined radius in order to determine its true class. The
traditional k nearest neighbor rule has been described as follows.

   • Out of the Nt training vectors, identify the knn nearest neighbors, irrespective of class
       label; knn is chosen to be odd.

   • Out of these knn samples, identify the number of vectors, ki, that belong to
       class Cli, i=1, 2,…Mc where Mc is the total number of classes.

   • Assign the query sample q to the class Cli with the maximum number ki of
       samples.



       Figure 4.9: Nearest neighbor classifier in action on a two class problem

As shown in Figure 4.9, assuming knn = 3, the query point is classified as class 1, since two of
the nearest three samples belong to class 1. Nearest neighbor methods can use a single nearest
neighbor or several. A single nearest neighbor method is primarily suited to recognizing data
where we have sufficient confidence in








the fact that the class distributions are non-overlapping and the features used are
discriminatory. In most practical applications, however, the data distributions of the various
classes overlap, and more than one nearest neighbor is used for majority voting.

In k-nearest neighbor methods, certain implicit assumptions about data are made in
order to achieve a good recognition performance.

       The first assumption requires that individual feature vectors for the various classes are
       discriminatory, i.e., that feature vectors are statistically different across classes. This
       ensures that a given test sample is more likely to be surrounded by data of its true class
       than by data of other classes.

       The second assumption requires that the unique characteristic of a pattern that
       defines its signature, and ultimately its class, is not significantly dependent on
       the interaction between various features. In other words, nearest neighbor
       methods work better with data where features are statistically independent.
       This is because nearest neighbor methods are based on some form of distance
       measure and nearest neighbor detection of test data is not dependent on their
       feature interaction.

- Strength and Weakness of K-Nearest Neighbor Algorithm

Advantage

   •   Robust to noisy training data (especially if we use the inverse square of the distance as
       the vote weight).

   •   Effective if the training data is large.

Disadvantage

   •   Need to determine value of parameter K (number of nearest neighbors)








   •   With distance-based learning it is not clear which type of distance to use and which
       attributes to use to produce the best results. Should we use all attributes or only
       certain attributes?

   •   The computation cost is quite high because we need to compute the distance from each
       query instance to all training samples. Some indexing (e.g. a k-d tree) may reduce this
       computational cost.




4.2.3 Validation Algorithms

It is important to evaluate the features, the classifier, and the distance measure used in any
classification system, to measure their compatibility and to validate their usage in the required
recognition system. This evaluation procedure is what we mean by validation algorithms [12].
Three common validation algorithms follow:

• Leave-one-out cross validation (LOOCV)


       1. For each training example (xi, yi)



              • Train a classifier with all training data except (xi, yi)


              • Test the classifier’s accuracy on (xi, yi)


       2. LOOCV accuracy = average of all n accuracies



• The holdout set method


       1. Randomly choose, say, 30% of the training data and set it aside








       2. Train a classifier with the remaining 70%

       3. Test the classifier’s accuracy on the 30%

• K-fold cross validation (CV)

       1. Randomly divide training data into k chunks (folds).

       2. For each fold:

            • Train a classifier with all training data except the fold

            • Test the classifier’s accuracy on the fold

       3. K-fold CV accuracy = average of all k accuracies
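
For example, K-fold cross validation can be sketched as follows, assuming X and y are NumPy
arrays and a caller-supplied train/predict pair of functions (the names train and predict are
hypothetical placeholders, not part of any particular library):

    import numpy as np

    def k_fold_cv(X, y, train, predict, k=5, seed=0):
        # train(X_tr, y_tr) -> model; predict(model, X_te) -> predicted labels.
        idx = np.random.default_rng(seed).permutation(len(X))   # random division into folds
        folds = np.array_split(idx, k)
        accuracies = []
        for fold in folds:
            train_idx = np.setdiff1d(idx, fold)                  # all data except this fold
            model = train(X[train_idx], y[train_idx])
            accuracies.append(np.mean(predict(model, X[fold]) == y[fold]))
        return float(np.mean(accuracies))                        # K-fold CV accuracy

Leave-one-out cross validation is the special case k = n, and the holdout method corresponds
to a single random split.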

It is obvious that the final outcome of any classification experiment depends on numerous
factors, both the possible built-in parameters of the feature description algorithm and the
various choices in the experimental setup. Results of classification experiments have always
been suspected of depending on individual choices in image acquisition, preprocessing,
sampling, etc., since no performance characterization has been established in the image
analysis literature.








                                            CHAPTER FIVE

   TEXTURE RECOGNITION BASED IMAGE
          COLORING SYSTEM
In this chapter we present our proposed intelligent, fully automated coloring system, which we
have named the "Texture Recognition based Image Coloring System" (TRICS). As shown in
figure 5.1, the system is composed of three main stages: a segmentation stage, a classification
stage, and a coloring stage. In the following sections we describe the construction of each of
these stages in detail.
       Figure 5.1: TRICS diagram. Gray image → Segmentation stage (pixel-based feature
       extraction: joint domain, wavelets, Laws, …; clustering: mean shift, K-means, FCM, …) →
       segmented image and clusters → Classification stage (feature extraction: co-occurrence,
       Tamura, wavelet energies; K-NN classifier using the sample features and classes-hues
       database) → class labels → Coloring stage (convert the image to HSV channels, set hue,
       saturation, and brightness, convert to RGB) → colored image








5.1 Segmentation Stage:


In this stage the image pixels have to be clustered, or segmented, according to their features
into different regions of similar texture. Any segmentation technique must be preceded by a
feature extraction phase. In fact, some features are already attached to each pixel, such as its
position (spatial features) and its gray level (range features), and many more features can be
computed from them.


Any feature is of one of two basic types: pixel based (attached to each pixel) or region based
(attached to a group of pixels). Since, by definition, segmentation is usually done by attaching
each pixel to a specific cluster, the features needed for our segmentation stage must be pixel
based features (PBF). That means a feature vector must be constructed for each pixel; in other
words, we need a matrix of the same size as the image for each feature.



5.1.1 Feature extraction phase


One of the most common pixel based texture features is the set of Discrete Wavelet Transform
(DWT) coefficients (chapter 3). As is known, a DWT decomposition results in four subbands:
three subbands of detail coefficients and one subband of approximation coefficients. Usually
the detail coefficients are used to generate pixel features, while the approximation subband is
used for the next decomposition level. Each of these subbands is a quarter of the original
image size. To make use of these coefficients as pixel features, each subband must have the
same size as the original image.








One solution is to resize each subband to double its size. This is usually done by upsampling
or by some type of interpolation, but these methods cause bad boundary segmentation results.
Many researchers recommend using Discrete Wavelet Frames (DWF) coefficients instead [43].
We have overcome this problem by reconstructing the previous-level coefficients from the
approximation subband, followed by an averaging operation, so that four images of the original
image size are obtained [25]. The results of our modified wavelet features are better than the
interpolation results (figure 5.2).


Figure 5.3 illustrates this modification. First, a one-level decomposition is performed. The
approximation subband is then used to reconstruct the previous-level subbands; the
reconstructed approximation subband has the same size as the original image. An averaging
filter is applied to each new subband, and the new approximation subband is used again in the
next decomposition. We found that a 9×9 averaging filter is suitable for most cases. Figure 5.4
shows the approximation subbands for both the decomposition and the reconstruction levels.
The reader can notice that the approximation subband after reconstruction is a blurred version
of the original image; this is due to the averaging step after the decomposition. Figure 5.5
shows an example with three levels of coefficients.
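
One possible reading of this procedure, sketched with PyWavelets and SciPy, is shown below:
each subband of a one-level DWT is reconstructed back to the size of the input by an inverse
transform with the other subbands zeroed, a 9×9 averaging filter is applied, and the averaged
reconstructed approximation is fed to the next level. This is an interpretation of the description
above under stated assumptions, not the exact thesis implementation.

    import numpy as np
    import pywt
    from scipy.ndimage import uniform_filter

    def modified_wavelet_features(gray, levels=3, wavelet='db1'):
        # Returns an H x W x (3 * levels) stack of full-size detail features.
        feats = []
        current = gray.astype(float)
        for _ in range(levels):
            cA, (cH, cV, cD) = pywt.dwt2(current, wavelet)
            zeros = np.zeros_like(cA)
            # reconstruct each subband separately to the previous-level (full) size
            bands = [
                pywt.idwt2((cA, (zeros, zeros, zeros)), wavelet),   # approximation
                pywt.idwt2((zeros, (cH, zeros, zeros)), wavelet),   # horizontal details
                pywt.idwt2((zeros, (zeros, cV, zeros)), wavelet),   # vertical details
                pywt.idwt2((zeros, (zeros, zeros, cD)), wavelet),   # diagonal details
            ]
            # 9x9 averaging filter, then crop to the current image size if idwt2 padded
            bands = [uniform_filter(b, size=9)[:current.shape[0], :current.shape[1]]
                     for b in bands]
            feats.extend(bands[1:])          # three detail features for this level
            current = bands[0]               # blurred approximation feeds the next level
        return np.dstack(feats)

Stacking the joint domain features (x, y, and the gray level) on top of this volume would then
give the full PBF vector described next.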


Finally, our PBF vector contains the spatial features combined with the pixel intensity,
what D. Comaniciu & P. Meer [32] called ‘Joint domain’ feature space, and L × 3
modified wavelets coefficients as texture features.

Another set of PBFs are the Laws kernel energy coefficients (chapter 3). Laws' masks are
25 different kernels; convolving the image with them yields 25 different images of the
same size (figure 5.6). In this case, our PBF vector has 28 elements (the 25 Laws
responses plus the three joint-domain features).

Note that figures 5.5 and 5.6 are drawn using the gray color map rather than the color
map of the actual image, to distinguish between the different subbands.
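For concreteness, a hedged Python sketch of the 25 Laws masks follows; the five 1-D
vectors (L5, E5, S5, W5, R5) are the classic Laws vectors, while the local energy window
is an assumption on our part and not taken from the thesis.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# Laws' five 1-D vectors: Level, Edge, Spot, Wave, Ripple.
VECTORS = {
    'L5': np.array([ 1,  4, 6,  4,  1], float),
    'E5': np.array([-1, -2, 0,  2,  1], float),
    'S5': np.array([-1,  0, 2,  0, -1], float),
    'W5': np.array([-1,  2, 0, -2,  1], float),
    'R5': np.array([ 1, -4, 6, -4,  1], float),
}

def laws_feature_images(gray, energy_window=15):
    """Convolve with the 25 outer-product kernels and return energy images."""
    gray = gray.astype(float)
    images = []
    for a in VECTORS.values():
        for b in VECTORS.values():
            kernel = np.outer(a, b)              # one of the 25 5x5 masks
            response = convolve(gray, kernel)    # same-size filtered image
            # Local absolute-energy map (the window size is an assumption).
            images.append(uniform_filter(np.abs(response), size=energy_window))
    return images                                # 25 images, same size as gray
```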







     Figure 5.2: Interpolated wavelet coefficients vs. previous-level reconstruction coefficients




[Block diagram omitted: the joint-domain features (x, y, g) and, for each decomposition
level, the previous-level reconstructions of the A, H, V and D subbands (after the
averaging filter) are assembled into the pixel-based feature (PBF) vector; the averaged
approximation feeds the next one-level DWT.]

                              Figure 5.3: Modified wavelet feature extraction








                             Figure 5.4: Wavelet approximation subbands
      (left: one level down, after decomposition; right: one level up, after reconstruction and averaging)




                          Figure 5.5: Three levels of modified DWT subbands
                     (A: approximation, H: horizontal, V: vertical, D: diagonal)








            Figure 5.6: Laws kernels coefficients








5.1.2 Clustering phase / Mean Shift

After extracting the image features, a clustering technique must be chosen. Mean shift
was tried first, since it is a recent and powerful clustering technique in the
literature. Its strength is that it determines the data clusters without any prior
knowledge of their number or shape, and its clustering quality is often better than that
of other techniques because it groups the data by the density around their modes. Table
5.1 shows the results of using mean shift on three different images with two types of
features:

   1. Clustering using only the joint space (spatial + range domains).
   2. Clustering using one-level wavelet detail coefficients together with the joint space.

The results also show the time spent on each image. Mean shift clearly takes a long time
to execute, even for small images. The main drawback of this technique is its running
time, especially when the system is coded in Matlab: clustering n points costs O(n²de),
where n is the number of data points, d is the space dimension (feature vector length)
and e is the number of iterations, which grows with n, while clustering the same data
points with the k-means technique costs only O(knde) for k clusters.
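A minimal sketch of the joint-domain clustering step is shown below, using scikit-learn's
MeanShift as a stand-in for the thesis' Matlab implementation; note that this estimator
uses a single bandwidth rather than the separate hs/hr/hw window lengths of Table 5.1,
and the bandwidth value here is purely illustrative.

```python
import numpy as np
from sklearn.cluster import MeanShift

def joint_domain_meanshift(gray, bandwidth=16.0):
    """Cluster pixels in the joint (x, y, gray) feature space."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.column_stack([xs.ravel(), ys.ravel(),
                                gray.astype(float).ravel()])
    # bin_seeding speeds things up; this is still slow for large images.
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(features)
    return labels.reshape(h, w)
```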

Another drawback of mean shift is the window length (bandwidth) associated with each
feature of the space, which usually means the procedure has to be run several times to
select the best window sizes. The adaptive mean shift (chapter 4), proposed to solve
this problem, increases the execution time even further. Fast mean shift (FAMS) also
suffers from the large number of parameters needed to run it, and its execution time is
still too long for our system requirements. Figure 5.7 shows an example of fast mean
shift with a fixed bandwidth of 16 applied to the third image in table 5.1 using the
code of M. Harbour [25]; the execution time is still long and the results are not good
enough.








Table 5.1: Mean shift results (the original and segmented images are omitted here)

Image 1 (64 × 64)
  Joint domain only:        hs=16, hr=16, MRS=20            Time: 0 1 14    Classes: 2
  Joint domain + texture:   hs=16, hr=16, hw=128, MRS=20    Time: 0 3 46    Classes: 2

Image 2 (64 × 64)
  Joint domain only:        hs=16, hr=16, MRS=20            Time: 0 1 14    Classes: 3
  Joint domain + texture:   hs=16, hr=16, hw=16, MRS=20     Time: 0 13 15   Classes: 2

Image 3 (170 × 256)
  Joint domain only:        hs=16, hr=16, MRS=500           Time: 0 34 15   Classes: 9
  Joint domain + texture:   hs=8, hr=8, hw=4, MRS=500       Time: 0 39 54   Classes: 7




 Figure 5.7: FAMS results with Laws features (hs = hr = 16), elapsed time 18 min; (a) 2 clusters, (b) 9 clusters








5.1.3 Clustering phase / K-mean

Because of these drawbacks, we looked for another clustering technique and used the fast
k-mean algorithm (chapter 4) to accelerate the traditional k-mean algorithm. However,
the k-mean technique still has some drawbacks; one of them is that the spatial features
cause the segmentation to be spatially structured, which is not the case for natural
scenes.

One solution to this problem is to increase the number of segments: in figure 5.8, the
left segmented images use the actual number of clusters while the middle ones use a
larger number. Another solution is to run k-mean without the spatial features, which
gives more accurate results, except that a single cluster may end up scattered over
several disjoint regions. We overcome this by splitting any cluster that consists of
several disjoint regions into more than one region (figure 5.9), since our system is
concerned with clustering regions rather than classes. With this modification, the
number of clusters passed to the k-mean algorithm is only an initial value, and the
final number of regions may exceed it.

Finally, any segmentation technique may produce small scattered regions that appear as
noise in the segmented image. In our system these small regions usually cause bad
results and delays in the classification stage, so we eliminate every region whose size
is below a threshold MRS (minimum region size). Figure 5.10 shows segmentation results
before and after the small-region elimination step. It also shows that Laws and wavelet
features give similar segmentation results, but this is not always the case: some images
are segmented better with wavelets than with Laws features, as in figure 5.11, although
Laws boundaries are sometimes better than wavelet boundaries, as seen in the marked
region of that figure. A sketch of the region splitting and elimination steps is given
below.
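The following minimal sketch covers the two post-processing steps described above,
splitting disjoint clusters into regions and removing regions smaller than MRS; it is
our own re-implementation, and the choice of marking eliminated pixels with label 0 is
an assumption.

```python
import numpy as np
from scipy.ndimage import label

def split_and_eliminate(cluster_map, mrs=500):
    """cluster_map: 2-D array of k-mean cluster indices per pixel."""
    regions = np.zeros_like(cluster_map, dtype=int)
    next_id = 1
    for c in np.unique(cluster_map):
        # Connected-component labelling splits one cluster into its
        # spatially disjoint regions.
        components, n = label(cluster_map == c)
        for comp in range(1, n + 1):
            regions[components == comp] = next_id
            next_id += 1
    # Eliminate small regions: mark them 0 so a later step can merge them
    # into a neighbouring region (the merge policy is an assumption).
    for r in np.unique(regions):
        mask = regions == r
        if mask.sum() < mrs:
            regions[mask] = 0
    return regions
```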








                          Figure 5.8: k-mean with spatial features, k = {3, 9}




            Figure 5.9: Segmentation results before and after splitting disjoint regions (k=3)








      Figure 5.10: Small-region elimination. (a) original gray image; (b) segmented image with k=2;
   (c) segmented image after eliminating small regions (MRS=500); (d, e) segmentation with k=3 using
                   Laws features; (f, g) segmentation with k=3 using wavelet features




       (a) Original gray image          (b) Laws features              (c) wavelet features
           Figure 5.11: Segmentation results with Laws and wavelet features (k=3, MRS=500)








- Adaptive k-mean
From this section we can see that two parameters are needed to perform the k-mean
segmentation: k and MRS. To reduce the number of system parameters, we searched for an
adaptive algorithm that generates the best number of clusters, k, for the k-mean
clustering technique, and we propose a new adaptive k-mean clustering algorithm.
The algorithm selects the number of clusters based on a modified version of the variance
ratio criterion (VRC) index, introduced first by Calinski et al. [38] and used in the
adaptive algorithm of D. Charalampidis et al. [44]:
                            VRC = \frac{(n - k)\, BCSS}{(k - 1)\, WCSS}                    (5.1)

where BCSS is the "Between Clusters Sum of Squares" and WCSS is the "Within Cluster Sum
of Squares". Let n be the total number of pixels and k the number of clusters. The
within-cluster, total, and between-cluster sums of squares are defined, respectively, as
                            WCSS = \sum_{i=1}^{n} \sum_{j=1}^{k} (x_{ij} - c_{ij})^2        (5.2)

                            TSS  = \sum_{i=1}^{n} \sum_{j=1}^{k} (x_{ij} - C_{j})^2         (5.3)

                            BCSS = TSS - WCSS                                               (5.4)


where TSS is the "Total Sum of Squares". In (5.2) and (5.3), x_ij is the value of an
element of the feature vector (FV), c_ij is the corresponding value of its cluster
center, and C_j is the corresponding value of the overall FV center. The number of
clusters is incremented starting at 1, and the estimated number of clusters is the one
that maximizes the VRC index.

Some insight about the significance of the VRC index is given in figure 5.12, where
two-cluster examples are presented. Essentially, the WCSS is the total distance of the
feature vectors from the center of their associated cluster and it indicates the clusters’
extent.







[Figure panels omitted: three two-cluster configurations (a), (b) and (c); the plots mark
the cluster 1 and cluster 2 points, the two cluster centers, the center of all points,
and the distances between points and their cluster centers.]

               Figure 5.12: Example to illustrate the significance of the VRC index.

In figure 5.12, the distance of each vector from the corresponding center is indicated
with a line. The TSS is the total distance of all vectors from the overall center, and it
is an indicative measure of cluster separability. In figures 5.12 (a) and (b) the WCSS is
identical [one can notice that both clusters 1 and 2 of figure 5.12 (a) are identical to
the ones in figure 5.12 (b)]. On the other hand, the TSS for figure 5.12 (b) is smaller
(the two clusters are closer to the overall center). Therefore, the VRC index is smaller,
as indicated by (5.1) and (5.4), which signifies that the separation of the two clusters
in figure 5.12 (a) is larger than in figure 5.12 (b) with respect to the cluster extent.
The example presented in figure 5.12 (c) is a downscaled version of the example in figure
5.12 (a). The WCSS and TSS in figure 5.12 (c) are smaller than the ones in figure 5.12
(a) by the same ratio; thus the VRC is the same in both examples.

Instead of simply maximizing the VRC, we use the criterion presented by D. Charalampidis
et al. [44], which they call the Combined Estimation Criterion (CEC):


   1. If the VRC index for k clusters is smaller than 98% of the VRC index for k-1
       clusters, the CEC is not satisfied.








   2. If the VRC index for k clusters is larger than 102% of the VRC index for k-1
      clusters, or if k = 1, the CEC is satisfied.

   3. If the VRC index for k clusters is between 98% and 102% of the VRC index for k-1
      clusters, the CEC is satisfied only if the TSS for k-1 clusters is smaller than
      70% of the TSS for k clusters.

Finally, the adaptive segmentation algorithm proceeds as follows (a sketch is given after
the steps):

   1. Initialize k = 1.

   2. Perform the fast k-mean segmentation with k clusters.

   3. Calculate the CEC. If the CEC is satisfied, set k = k + 1 and go to step 2;
      otherwise stop, with the number of clusters equal to k.
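The sketch below illustrates the adaptive loop, with scikit-learn's KMeans standing in
for the fast k-mean of chapter 4; the convention used for VRC at k = 1 and the maximum
number of clusters are our own assumptions, while the 98%, 102% and 70% thresholds
follow the rules above.

```python
import numpy as np
from sklearn.cluster import KMeans

def vrc_tss(features, labels, centers):
    n, k = len(features), len(centers)
    overall = features.mean(axis=0)
    wcss = ((features - centers[labels]) ** 2).sum()
    tss = ((features - overall) ** 2).sum()
    bcss = tss - wcss
    # VRC is undefined for k = 1; returning 0 there is our own convention.
    vrc = 0.0 if k == 1 else (n - k) * bcss / ((k - 1) * wcss)
    return vrc, tss

def adaptive_kmeans(features, max_k=20):
    prev_vrc, prev_tss, best = None, None, None
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=5).fit(features)
        vrc, tss = vrc_tss(features, km.labels_, km.cluster_centers_)
        if k == 1:
            satisfied = True                        # rule 2: k = 1 always passes
        elif vrc > 1.02 * prev_vrc:
            satisfied = True                        # rule 2
        elif vrc < 0.98 * prev_vrc:
            satisfied = False                       # rule 1
        else:
            satisfied = prev_tss < 0.70 * tss       # rule 3
        if not satisfied:
            break
        best, prev_vrc, prev_tss = km, vrc, tss
    return best
```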

The results of, and the comparison between, the fast k-mean with a fixed number of
clusters and the adaptive k-mean are presented in table 5.2. A-time stands for the
adaptive fast k-mean time, F-time for the fixed-k fast k-mean time, T-time for the
traditional (non-accelerated) k-mean time, E-time for the elimination time, and EM-time
for the elimination time when the minimum region size (MRS) is estimated automatically.

- Minimum region size estimation

Finally, we generated a simple routine to estimate the MRS. The routine can be described
as follows:


      1. Split the disjoint regions.
      2. Calculate the size of every region.
      3. Sort the regions by size.
      4. Compute the size differences between consecutive regions.
      5. Select the region size at which the difference exceeds the largest image dimension.
      6. Consider that size to be the MRS.








This algorithm works very well and gives accurate results, although it increases the
execution time of the elimination step significantly. For example, for image 1 in the
table the estimated MRS is 324 and the elimination time increases to 59 sec.
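A hedged sketch of this routine follows; the exact gap rule in steps 4-6 is our reading
of the text, so the implementation below should be taken as an illustration rather than
the thesis code.

```python
import numpy as np

def estimate_mrs(regions, image_shape):
    """regions: 2-D array of positive region labels; returns an estimated MRS."""
    sizes = np.sort(np.bincount(regions.ravel().astype(int))[1:])  # region sizes, ascending
    gaps = np.diff(sizes)
    threshold = max(image_shape)                                   # largest image dimension
    big_gap = np.nonzero(gaps > threshold)[0]
    # Take the size just below the first large gap as the MRS; if no gap is
    # large enough, fall back to the smallest region size (our assumption).
    return int(sizes[big_gap[0]]) if len(big_gap) else int(sizes[0])
```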




Table 5.2: k-mean results

Image 1 (170 × 256), MRS = 500
  Wavelets   Clusters: 4   Regions: 8   A-time: 19 sec   F-time: 12 sec   T-time: 42 min          E-time: 31 sec
  Laws       Clusters: 4   Regions: 7   A-time: 11 sec   F-time: 3 sec    T-time: 10 min 36 sec   E-time: 31 sec

Image 2 (240 × 320), MRS = 500
  Wavelets   Clusters: 3   Regions: 5   A-time: 11 sec   F-time: 3 sec    T-time: 17 min 14 sec   E-time: 21 sec
  Laws       Clusters: 3   Regions: 5   A-time: 13 sec   F-time: 5 sec    T-time: 13 min 44 sec   E-time: 32 sec








5.2 Classification Stage


Classification assigns each region to a specific class name saved in a database, so it
can be described as a recognition stage. Any recognition system passes through two basic
phases:


   1. Training phase: a training set is defined and its features are extracted.
   2. Testing phase: a test image is defined, its features are extracted and compared
      with the training set features using a classifier to determine which class it
      belongs to.

For the training phase we generated two training sets. The first was taken from the
Brodatz database (chapter 1): about 32 different Brodatz images of size 256 × 256 are
used. To prepare several images per class, each image was mirrored horizontally and
vertically to produce a 512 × 512 image, which was then split into 16 different images
of size 128 × 128, as shown in figure 5.13.


The other training set was taken from real natural images as random 64 × 64 blocks.
Nine classes 'cloud, sky, sea, sand, tree, grass, stone, water, and wood' were gathered,
each with about 12-25 samples. Figure 5.14 shows example samples from this training set.








                         Figure 5.13: Training set 1 generation
   (a 256 × 256 original is mirrored to 512 × 512 and split into 16 tiles of 128 × 128)




                 Figure 5.14: Training set 2 samples (tree, sand, sky, grass, sea, cloud)








In the testing phase, a group of test images is selected for each training set. For the
first set, artificial multi-texture images were used, as shown in figure 5.15. For the
second set, different natural scenes, together with the images from which the training
set was extracted, were used for testing; the PANN database collection was also used.


To classify each region of the segmented image, region-based features (RBF) must be
extracted from it. Since texture is by definition sensitive to the arrangement of
pixels, a rectangular region must be considered instead of a non-uniform shape, or what
Y. Liu et al. [45] describe as arbitrary shaped regions.


For training set 1, we developed a routine that searches for the largest rectangle in
each region. It can be described briefly as follows (a sketch is given after the steps):


   1. Determine 9 points in each region: the center of mass of the region and 8
       boundary points (north, east, south, west, north-east, north-west, south-east,
       south-west).
   2. For each point, start with a 1×1 pixel rectangle and extend it while it is still
       completely inside the region.
   3. Keep the rectangle of maximum size; in most cases this rectangle covers the
       largest compact collection of pixels in the region.
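The following sketch illustrates the greedy rectangle growth this routine describes; it
is our own re-implementation, for brevity only five of the nine seed points are
generated, and the growth order is an assumption.

```python
import numpy as np
from scipy.ndimage import center_of_mass

def grow_rectangle(mask, seed):
    """Greedily expand a rectangle around seed while it stays inside mask."""
    r0 = r1 = int(seed[0]); c0 = c1 = int(seed[1])
    if not mask[r0, c0]:
        return None                                   # seed fell outside the region
    grew = True
    while grew:
        grew = False
        for dr0, dr1, dc0, dc1 in ((-1, 0, 0, 0), (0, 1, 0, 0),
                                   (0, 0, -1, 0), (0, 0, 0, 1)):
            nr0, nr1, nc0, nc1 = r0 + dr0, r1 + dr1, c0 + dc0, c1 + dc1
            if (0 <= nr0 and nr1 < mask.shape[0] and 0 <= nc0 and
                    nc1 < mask.shape[1] and mask[nr0:nr1 + 1, nc0:nc1 + 1].all()):
                r0, r1, c0, c1 = nr0, nr1, nc0, nc1
                grew = True
    return (r0, r1, c0, c1)

def largest_rectangle(mask):
    rows, cols = np.nonzero(mask)
    seeds = [center_of_mass(mask),
             (rows.min(), cols[rows.argmin()]),       # north point
             (rows.max(), cols[rows.argmax()]),       # south point
             (rows[cols.argmin()], cols.min()),       # west point
             (rows[cols.argmax()], cols.max())]       # east point (corner seeds omitted)
    best, best_area = None, 0
    for s in seeds:
        rect = grow_rectangle(mask, s)
        if rect:
            area = (rect[1] - rect[0] + 1) * (rect[3] - rect[2] + 1)
            if area > best_area:
                best, best_area = rect, area
    return best
```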

As shown in figure 5.16, for each region of the segmented image there are at most 9
different points, marked with blue asterisks for the boundary points and a red asterisk
for the central point. For set 2, we developed a routine that searches for the most
suitable 64×64 rectangle in each region. It can be described briefly as follows:


   1. Find the largest rectangle in the region, as described above.
   2. Extract a central 64×64 rectangle from this rectangle.

In figure 5.17, the largest rectangle in each region appears in green while the final
extracted rectangle appears in red.








       Figure 5.15: Samples of the artificial images used in set 1 classification, panels (a) to (e)




                       Figure 5.16: Largest rectangle extraction example




                       Figure 5.17: 64×64 rectangle extraction example








In many cases the final rectangle may exceed the region and include pixels of other
regions inside its boundaries, like regions 3, 4 and 5 in the figure. This happens when
the largest rectangle of the region has a dimension smaller than 64 pixels. We overcame
this problem by padding the region inside the rectangle. Different padding techniques
can be used. Zero padding (ZR) is the simplest; however, it introduces discontinuities
at the region boundary, yielding spurious high-frequency coefficients that degrade the
quality of the extracted texture features. To relieve this problem we used the padding
techniques presented by Y. Liu et al. [45]: a simple mirroring (MR) padding and an
object-based padding (OBP), which provides smooth extrapolation. MR extends the region
with its 'mirror image' outside the boundary; since the support of the region is usually
arbitrary with respect to the extended rectangle, the mirroring may have to be repeated
several times up to the rectangle boundary.


The OBP technique has the following steps:


         I. The region A is first extended over the rectangular block L by zero-padding.
        II. L is divided into 8×8 or 16×16 blocks, and each block is classified into
            one of three types:
                 (1) all pixels are in A;
                 (2) some pixels belong to A;
                 (3) no pixels belong to A.
       III. Only blocks of types (2) and (3) need padding.


For type (2) blocks, pixels that do not belong to A are replaced with the mean value of
the pixels that are in A. For type (3) blocks, the pixel values are computed by referring
to the adjacent blocks around the current block; if none of the adjacent blocks has
already been padded, the current block is skipped and the operation moves on to the next
block. Figure 5.18 shows padding results using ZR, MR, and OBP.
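A hedged sketch of OBP as described above is given below; the block size and the "mean
of already-padded neighbours" rule for type (3) blocks are our reading of the text, not
a verbatim port of the thesis code.

```python
import numpy as np

def object_based_padding(rect, mask, block=8):
    """rect: 2-D gray block; mask: True where pixels belong to region A."""
    out = np.where(mask, rect.astype(float), 0.0)         # step I: zero padding
    padded = mask.copy()                                   # pixels already valid
    h, w = rect.shape
    for r in range(0, h, block):
        for c in range(0, w, block):
            m = mask[r:r+block, c:c+block]
            if m.all():
                continue                                   # type (1): nothing to do
            if m.any():                                    # type (2): partial block
                mean = out[r:r+block, c:c+block][m].mean()
                out[r:r+block, c:c+block][~m] = mean
                padded[r:r+block, c:c+block] = True
            else:                                          # type (3): empty block
                nb = []
                for rr, cc in ((r-block, c), (r+block, c), (r, c-block), (r, c+block)):
                    if (0 <= rr < h and 0 <= cc < w and
                            padded[rr:rr+block, cc:cc+block].all()):
                        nb.append(out[rr:rr+block, cc:cc+block].mean())
                if nb:                                     # skip if no padded neighbour yet
                    out[r:r+block, c:c+block] = np.mean(nb)
                    padded[r:r+block, c:c+block] = True
    return out
```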








As noticed, the padding cost depends on the number of pixels to be padded, so this
technique is time consuming. A simple check can reduce this cost by distinguishing three
cases:


   • If the region pixels cover more than 85% of the rectangle and the difference
       between the mean gray level of the region pixels and that of the intersecting
       regions' pixels is greater than 50, padding is performed.


   • If the region pixels cover more than 85% of the rectangle and this difference is
       less than 50, the rectangle is taken without padding.


   • If the region pixels cover less than 85% of the rectangle, the largest rectangle
       of the region is stretched to a 64×64 rectangle.


Our classification (matching) stage can be considered a CBIR (Content-Based Image
Retrieval) system [24], since the matching process searches the database for the pattern
image most similar to the rectangle extracted from each region. For this purpose an RBF
vector must be used. Statistical features, namely the standard GLCM (Gray Level
Co-occurrence Matrix) features 'energy, entropy, contrast (inertia), and homogeneity',
are used together with the Tamura features 'coarseness, contrast and directionality'
(chapter 3) in the feature extraction phase.
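By way of illustration, the following sketch computes the four GLCM measures with
scikit-image; the offset distances, angles and quantization level are assumptions, the
Tamura features are not shown, and the thesis itself used its own Matlab implementation.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops   # scikit-image >= 0.19 naming

def glcm_features(patch, levels=64):
    """Return energy, entropy, contrast and homogeneity of an 8-bit gray patch."""
    q = (patch.astype(float) / 256 * levels).astype(np.uint8)      # quantize gray levels
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    p = glcm.mean(axis=(2, 3))                                      # average over offsets
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([graycoprops(glcm, 'energy').mean(),
                     entropy,
                     graycoprops(glcm, 'contrast').mean(),
                     graycoprops(glcm, 'homogeneity').mean()])
```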


From experiments we found that these features are scale sensitive, so they are suitable
for cases like figure 5.15 (c), where the extracted rectangle has the same scale as the
pattern, as seen in figure 5.19.








                 Figure 5.18: Padding results by ZR, MR, and OBP




            Figure 5.19: Classification results for GLCM, Tamura features








For natural scenes, scale-invariant features are more suitable, and wavelet coefficients
satisfy this requirement. To obtain region-based wavelet features, the wavelet means and
variances were tested first [46], but the values were very scattered and the results
were inaccurate in most cases. Instead, wavelet energies [47] for a 5-level
decomposition are used; they give an accuracy of 92% with the LOOCV validation method
(each (sub)image is classified in turn while the remaining (sub)images serve as training
data, chapter 4). Finally, a 15-feature RBF vector is taken from the extracted rectangle
and compared with the training set features by a classifier. For simplicity, and to test
our TRICS system, a KNN (K-Nearest Neighbors) classifier is used with k = 1, 5, 10 and
20. Experiments show that k = 5 gives the best accuracy, up to 94%, with the N-fold
validation method (the collection of (sub)images is divided into N disjoint sets, of
which N-1 serve in turn as training data and the remaining set is used for testing,
chapter 4). Figure 5.20 shows natural scene classification results.
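A hedged sketch of this region feature and classifier pair is given below; the wavelet
family and the energy normalization are assumptions, and scikit-learn's KNN and
leave-one-out utilities stand in for the thesis' Matlab code.

```python
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def wavelet_energy_features(patch, levels=5, wavelet='db1'):
    """Energy of each detail subband over `levels` decompositions (3*levels = 15 values)."""
    coeffs = pywt.wavedec2(patch.astype(float), wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:                 # skip the final approximation
        for band in detail:                   # (cH, cV, cD)
            feats.append(np.sum(band ** 2) / band.size)
    return np.array(feats)

# X: one 15-element feature vector per training patch, y: class labels.
# acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y,
#                       cv=LeaveOneOut()).mean()
```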


Some errors may occur in the classification stage because of the similarity between
different texture samples. Some of these confusions are acceptable, such as cloud and
water, which are close in color, but others (e.g. sea and grass) are not, since the
classes are completely different in color. To reduce this problem, more dissimilar
textures can be added to the database, or a SOFM neural network can be used to cluster
the samples instead of grouping them by hand. To prevent this type of error as far as
possible, we apply a simple technique that classifies the objects at two levels. For
instance, if the KNN returns five classes such as "grass, sea, water, grass, sea", the
traditional decision would be the grass class. But we consider water and sea to be one
class, so in level one the result excludes grass and retains the water and sea votes,
and in level two the algorithm selects sea among them since it occurs most often. In the
same way, trees and grass are treated as one class in level one, as are sky and clouds,
and wood and stone.
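The following small sketch illustrates this two-level voting rule on the example above;
the class groupings follow the text, while the dictionary-based implementation is our
own.

```python
from collections import Counter

SUPER_CLASS = {'water': 'water-sea', 'sea': 'water-sea',
               'tree': 'tree-grass', 'grass': 'tree-grass',
               'sky': 'sky-cloud',  'cloud': 'sky-cloud',
               'wood': 'wood-stone', 'stone': 'wood-stone',
               'sand': 'sand'}

def two_level_vote(knn_labels):
    """knn_labels: the k class names returned by the KNN classifier."""
    # Level one: vote among the grouped (super) classes.
    groups = [SUPER_CLASS.get(l, l) for l in knn_labels]
    winning_group = Counter(groups).most_common(1)[0][0]
    # Level two: vote among the original classes inside the winning group.
    members = [l for l in knn_labels if SUPER_CLASS.get(l, l) == winning_group]
    return Counter(members).most_common(1)[0][0]

# two_level_vote(['grass', 'sea', 'water', 'grass', 'sea'])  ->  'sea'
```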








Figure 5.20: Natural scene classification results. (a) Example 1: region 1 classified as
cloud, region 2 as tree, region 3 as grass. (b) Example 2: region 1 classified as sky,
region 2 as sea, region 3 as sand.








5.3 Coloring Stage:


Returning to figure 5.1: in the coloring stage, the image is converted to the HSV/HSB
color space, which also has three channels, two chromatic ones (hue and saturation) and
one achromatic channel (value/brightness) (Appendix A). We selected this color model for
its meaningful definition: the hue (the color of the pixel) runs from 0 for red to 1 for
violet, the saturation (the purity of the color) varies from 0 for an unsaturated color
(white) to 1 for a fully saturated color, and the value/brightness (the amount of light)
varies from 0 for black (dark) to 1 for white (light). HSV has a more natural definition
than the other color models: if we want to describe a red color, red is its hue, light
or dark is its brightness, and the purity of the red is its saturation. Thus the hue is
the basic value describing the color, and the other two channels affect the color
quality independently.


An alternative color model, HSI/HLS (Appendix A), has the same definition except that
the maximum saturation occurs in the middle of the lightness channel. There is another
important difference between the two models: the lightness channel of the HSI/HLS model
is the same as the intensity of the gray image, while in HSV/HSB the brightness channel
is computed as the maximum of the red, green and blue channels, so it is not the same as
the image intensity, as shown in figure 5.21. We used both models, replacing the
lightness channel of the HLS model and the brightness channel of the HSB model by the
gray image.








            Figure 5.21: HSV/HSB and HSI/HLS channels








For both models, we still have to obtain the remaining chromatic values, hue and
saturation. Our experiments show that most natural textures have a single hue, or a very
narrow range of hues, per texture, as shown in figure 5.21; therefore one hue value is
enough to describe each texture. We selected a suitable hue value for each texture class
and attached it to that class in the database.


Finally, saturation is a matter of color taste and there is no mathematical formula to
recover it. Many experiments were carried out to find a suitable value for the
saturation channel. A fixed saturation over the whole image gives an unnatural look in
many scenes, while selecting a specific saturation per region may cause cross-channel
artifacts. On the other hand, the figure shows that the saturation channel appears to be
the complement of the lightness channel, and our experiments confirmed this observation,
as seen in figure 5.22, where a fixed saturation of 128 is used in the middle and the
estimated saturation on the right. Figure 5.23 presents another example of using the
estimated saturation as the complement of the luminance channel. Because of the
definition of the HLS model, in which saturation gives better coloring in the middle of
the lightness range, the saturation channel is there substituted by half the complement
of the lightness channel. There is no big difference between the results of the two
models, except that the HLS conversion time is longer than that of HSV.
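A minimal sketch of the HSV variant of this coloring step follows; the dictionaries
mapping regions to classes and classes to hue values are hypothetical inputs, and the
full saturation complement is used here, whereas the HLS variant described above would
use half of it.

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def colorize(gray, regions, region_class, class_hue):
    """gray in [0, 1]; regions: label map; region_class/class_hue: lookup dicts."""
    h, w = gray.shape
    hsv = np.zeros((h, w, 3))
    for r in np.unique(regions):
        hue = class_hue.get(region_class.get(r), 0.0)   # fallback hue is an assumption
        hsv[regions == r, 0] = hue
    hsv[..., 1] = 1.0 - gray          # estimated saturation: complement of the value
    hsv[..., 2] = gray                # value/brightness = the gray image itself
    return hsv_to_rgb(hsv)
```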








                    Figure 5.22: Change in saturation coloring results




            Figure 5.23: Replacing saturation with the complement of lightness








                               CHAPTER SIX

              RESULTS AND CONCLUSIONS

6.1 Experiments:


The system was written in Matlab v6.5 (Release 13) and run on a Pentium 4 PC with 256 MB
RAM. Its running time varies from less than 1 second up to 14 minutes depending on the
segmentation technique, the image size, the choice of region features (padding increases
the execution time) and the number of clusters to be classified.


First, we tested the TRICS system on Brodatz (chapter 3) texture-mixture images. As
previously mentioned, the segmentation features used with this training set included the
spatial features, while the classification features were the GLCM measures together with
the Tamura features. The attached hues were selected randomly, simply to validate the
classification stage and the coloring quality. As seen in figure 6.1, the spatial
features are suitable for well-structured images such as figure 6.1a (without
small-region elimination) and figure 6.1b, but lead to bad segmentation in some cases
(diagonally segmented regions) such as figure 6.1c; this depends on the initial centers
selected by the k-mean procedure.

After many experiments we found that Brodatz textures are not suitable for testing our
system because of their unnatural look, their high contrast and their high sensitivity
to scale and rotation, which is not the case for most natural scenes (the core of this
research). We therefore concentrated on the second, natural training set and recorded
those results. Some of the images used to collect the training set are presented in
figure 6.2.







            (a) K=3             (b) K=5, M=100           (c) K=5, M=100
                 Figure 6.1: Brodatz textures experiment results




                        Figure 6.2: Training set images








All the natural images used in our system are simple daylight images; 'simple' means
there are no shadows, reflections, refractions or man-made objects in the scene. The
training set samples were taken from the gray versions of these images. We used the same
images for testing and also tested the system on about 50 natural scenes, including the
PANN database benchmark (chapter 1); the results are satisfying in most cases.


6.2 System results:


6.2.1 Accurate Results:


First we show the results of testing the images used for training set collection. The
results are grouped according to the conditions of each set; each set is labelled with
the segmentation technique used, followed by two numbers: the number of clusters and the
minimum region size.

Figure 6.3 shows the results of using k-mean with spatial features; notice that the
number of clusters is larger than the actual number in order to overcome the
structured-segmentation problem. Figure 6.4 shows the results of using k-mean without
spatial features, where the number of clusters equals the number of regions.

Results for other test images are shown in figure 6.5, while PANN database results using
k-mean with splitting of disjoint regions are presented in figure 6.6. Note that the
PANN scenes contain outdoor objects (e.g. walls) that are not predefined in our system,
so they were not colored naturally.








                          (a)KM (10, 1000)




                           (b) KM(8,1000)

             Figure 6.3: Segmentation with spatial features








                           (a) KM (2, 1000)




                            (b) KM (3, 500)




                             (c) KM (4, 1000)

            Figure 6.4: Segmentation without spatial features








                    (a) KM(2,100)




                    (b)KM(3,1000)

            Figure 6.5: Other test set images








            Figure 6.6: PANN Database, segmentation with splitting disjoint regions








6.2.2 Misclassified results:


Some errors may occur in the classification stage because of the similarity between
different texture samples. Some of these confusions are acceptable, such as cloud and
water, which are close in color, but others (e.g. sea and grass) are not, since the
classes are completely different in color.

Figure 6.7 shows some examples of these cases: sometimes a rough sea is recognized as
stones, figure 6.7 (a-b); on the contrary, if the sea is very quiet and has no clear
waves, it is recognized as grass, figure 6.7 (c); and highly contrasted trees are
recognized as stones, figure 6.7 (d-e). To reduce this problem, more dissimilar textures
can be added to the database; using machine-learning clustering techniques such as SOFM
neural networks to cluster the samples, instead of grouping them by hand, could also
help.

To prevent this type of error as far as possible, we applied a simple technique that
classifies the objects at two levels. For instance, if the KNN returns five classes such
as "grass, sea, water, grass, sea", the traditional decision would be the grass class.
But we consider water and sea to be one class, so in level one the result excludes grass
and retains the water and sea votes, and in level two the algorithm selects sea among
them since it occurs most often. In the same way, trees and grass are treated as one
class in level one, as are sky and clouds, and wood and stone.








            (a)                       (b)                           (c)




                  (d)                                        (e)

                        Figure 6.7: Misclassified results








6.3 Comparisons:
Comparisons were made between our TRICS system and the 'global image matching' technique
(GIM) of T. Walsh and the 'local color transfer' technique (LCT) of Y. Tai, as described
in chapter 2. We selected the same images presented in their papers for the comparisons;
for both techniques, the source color image is shown in the figure below the gray image.

The GIM technique has become the basis of all coloring systems that rely on a matching
procedure, but in many cases, especially when the source image contains different colors
with the same intensities, its results are not acceptable. As seen in figure 6.8, we
compare our system with both GIM and LCT in coloring images (a) and (b) using the
colored images (c) and (d) respectively. For the first image (a), coloring with LCT
produced a black part in the sky (figure 6.8 (e)); the presence of black means that the
lightness channel has been changed. Coloring with GIM gave an unnatural look due to
wrong pixel mapping. In contrast, our result is clear, retains the actual lightness and,
moreover, does not depend on any user interaction.

Other comparisons with GIM are presented in figure 6.9 for coloring the images shown in
T. Walsh's paper. It is clear that GIM is not suitable for all images, even when they
are similar or acquired at the same place. Although figure 6.9 (i) seems to be colored
correctly, a close look reveals many blue parts in the green leaves; this is why
T. Walsh proposed the swatches solution to overcome such problems (chapter 2).

Finally, we present a comparison between the HSV and HSI coloring models as described in
chapter 5; the results are shown in figure 6.10. HSI coloring gives a lighter look than
HSV, while HSV gives, in our view, a more natural color distribution.








                             (a)                               (b)
                                     Original gray images




                                   (c)                   (d)
                                     Source colored images




              (e)                             (f)                           (g)




              (h)                             (i)                           (j)
             LCT.                            GIM                          TRICS

            Figure 6.8: Comparison between LCT, GIM techniques and TRICS system








            (a)                       (b)                                  (c)
                             Original gray images




            (d)                       (e)                                  (f)
                              Source color images




             (g)                   (h)                               (i)
                                 GIM Results




             (j)                   (k)                               (l)
                                TRICS Results

                   Figure 6.9: Comparison with GIM technique








            (a) HSI Coloring                (b) HSV Coloring
                 Figure 6.10 HSI- HSV coloring comparison








6.4 Conclusion:


      In conclusion, we have shown that using recognition systems for coloring grayscale
images results in more natural coloring than the traditional techniques. This thesis
concentrated on a texture recognition based coloring system as an example, so our
system, TRICS, is proposed specifically for coloring natural scenes.


To color a natural scene with TRICS, the image passes through three stages. First it is
segmented according to its texture features into different regions. Next, each region of
the segmented image is classified, again according to its texture features, as one of
the predefined classes stored in a dedicated database. Finally, the image is converted
to the HSV color model, where the hue values are obtained from the database, the
brightness channel is the gray image itself, and the saturation is obtained by inverting
the brightness channel.


From this research, and as presented in detail in the thesis chapters, we draw the
following conclusions:


      Discrete wavelet transform coefficients are very suitable texture features for
      both pixel-based and region-based image analysis.


      Using the accelerated k-mean algorithm together with the cluster-number
      generation and minimum-region-size estimation algorithms gives better results
      than using the mean-shift clustering technique.








      The HSV color model is the best coloring model for recognition based coloring
      systems, due to the natural look of the coloring results and the simplicity of
      obtaining the color channels.


By the end of this thesis we have achieved our research objectives, which can be
summarized in the following points:


      The proposed system (TRICS) is a new computer-based coloring system that
      simulates human vision in this task.


      TRICS is a fully unsupervised, intelligent recognition system designed for
      textural images such as natural scenes.


      The execution time of the TRICS system has been minimized as far as possible to
      satisfy the basic requirements of video coloring.


      The TRICS structure can serve as an abstract framework for building more
      intelligent coloring systems for other types of images.


      TRICS outperforms the other coloring systems.








6.5 Future Work:


   The proposed system has many advantages over the other techniques introduced in the
literature, as illustrated in the previous chapters. Although the proposed techniques
overcome most of the problems of the other approaches, there are still some further
research points that can be suggested to make them more effective. These points can be
summarized as follows:

      To apply the system to other types of images, such as outdoor scenes, indoor
      scenes or photos of people, different features and a different training set are
      needed for each type; an intelligent system should also precede the segmentation
      stage to recognize the image type.


      TRICS succeeds in colorizing simple natural scenes; for more complex scenes,
      intelligent systems are needed to recognize reflections, refractions, shadows,
      fog, etc.


      To improve the classification results, the training set clustering can be done
      automatically using a SOFM (self-organizing feature map). Moreover, images can be
      histogram-equalized before training set extraction and before gray image
      coloring, to eliminate the effect of different histogram distributions within the
      same class.


      Other classification techniques and distance measures may give better results
      than the K-NN classifier with Euclidean distance presented in the previous
      chapter.


      Using motion tracking and scene-cut detection techniques would improve video
      coloring, instead of coloring every frame of the movie.








                                APPENDIX A

                         COLOR MODELS
       The purpose of a color model (or color space) is to facilitate the specification
of colors in some standard, generally accepted way. In essence, a color model is a
specification of a coordinate system and of a subspace within that system in which each
color is represented by a single point. Different image processing systems use different
color models for different reasons. The color picture publishing industry uses the CMY
color model. Color CRT monitors and most computer graphics systems use the RGB color
model. Systems that must manipulate hue, saturation, and intensity separately use the
HSI color model. All color space discussions here assume that the colors are normalized
(values lie between 0 and 1.0), which is easily accomplished by dividing each color by
its maximum value.


A.1 Color Fundamentals

Chromatic light spans the electromagnetic spectrum from approximately 400 to 700 nm.
Cones are the sensors in the eye responsible for color vision. Detailed experimental
evidence has established that the 6 to 7 million cones in the human eye can be divided
into three principal sensing categories, corresponding roughly to red, green, and blue,
as presented in figure A.1. Due to these absorption characteristics of the human eye,
colors are seen as variable combinations of the so-called primary colors red (R), green
(G), and blue (B).








Figure A.1: Absorption of light by the red, green, and blue cones in the human eye as a
function of wavelength (absorption peaks near 445 nm for blue, 535 nm for green, and
575 nm for red; the horizontal axis spans 400 to 700 nm).

The characteristics generally used to distinguish one color from another are brightness,
hue, and saturation. Brightness embodies the achromatic notion of intensity: it is the
amount of light in the color, graded from black, through a dark color (e.g. dark red)
and a light color (e.g. pink), to white. Hue is an attribute associated with the
dominant wavelength in a mixture of light waves; it represents the dominant color as
perceived by an observer, so when we call an object red, orange, or yellow we are
specifying its hue. Saturation refers to the relative purity, or the amount of gray in
proportion to the hue, of the color. The pure spectrum colors are fully saturated;
saturation is graded from white (saturation = 0) to the pure color. Hue and saturation
taken together are called chromaticity, so a color may be characterized by its
brightness and chromaticity. The amounts of red, green, and blue needed to form any
color are called the tristimulus values [26].








A.2 Color Models

A.2.1 RGB

The RGB model is represented by a three-dimensional cube with red, green and blue at the
corners of the three axes (figure A.2). Black is at the origin and white at the opposite
corner of the cube; the gray scale follows the line from black to white. In a 24-bit
color graphics system with 8 bits per channel, red is (255, 0, 0); on the normalized
color cube it is (1, 0, 0) [26, 48].




                 Figure A.2: RGB color cube (the gray scale runs along the diagonal from black to white).


The RGB model simplifies the design of computer graphics systems, but it is not ideal
for all applications, since the red, green, and blue components are highly correlated.
This makes it difficult to apply some image processing algorithms. Many processing
techniques, such as histogram equalization, work only on the intensity component of an
image; these operations are more easily implemented using the HSI or YIQ color models.

A.2.2 CMY/CMYK

The CMY color space consists of cyan, magenta, and yellow. It is the complement of
the RGB color space since cyan, magenta, and yellow are the complements of red,
green, and blue respectively. Cyan, magenta, and yellow are known as the subtractive
primaries. These primaries are subtracted from white light to produce the desired
color. Cyan absorbs red, magenta absorbs green, and yellow absorbs blue. You could
then increase the green in an image by increasing the yellow and cyan or by
decreasing the magenta (green's complement). Figure A.3 describes the model.

Because RGB and CMY are complements, it is easy to convert between the two color spaces.
To go from RGB to CMY, subtract each component from white:

    C = 1.0 - R,   M = 1.0 - G,   Y = 1.0 - B

and to go from CMY to RGB:

    R = 1.0 - C,   G = 1.0 - M,   B = 1.0 - Y



      Figure A.3: Additive colors (red, green and blue mixing to white) and subtractive colors
                        (cyan, magenta and yellow mixing to black)


Another color model is CMYK. Black (K) is added in the printing process because it is a
purer black than the combination of the other three colors, pure black provides greater
contrast, and black ink is also cheaper than colored ink. The conversion from CMY to
CMYK is:

   K = min(C, M, Y),   C = C - K,   M = M - K,   Y = Y - K

To convert from CMYK back to CMY, just add the black component to the C, M, and Y
components.
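These conversions are simple enough to state directly in code; the small sketch below
implements them exactly as given above, for normalized values in [0, 1].

```python
def rgb_to_cmy(r, g, b):
    return 1.0 - r, 1.0 - g, 1.0 - b

def cmy_to_cmyk(c, m, y):
    k = min(c, m, y)
    return c - k, m - k, y - k, k

def cmyk_to_cmy(c, m, y, k):
    # Add the black component back to each channel.
    return c + k, m + k, y + k
```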








A.2.3 HSI/HLS

Since hue, saturation, and intensity (luminance) are three properties used to describe
colors, it seems logical to create a corresponding color model, HSI/HLS. When using the
HSI color space, the hue is simply adjusted to obtain the desired color; to change a
deep red to pink, the saturation is adjusted; to make the color darker or lighter, the
intensity is altered.

Many applications use the HSI color model. Machine vision uses the HSI color space to
identify the colors of different objects. Image processing operations such as histogram
operations, intensity transformations, and convolutions often operate only on an image's
intensity; these operations are performed much more easily on an image in the HSI color
space.

The HSI model uses cylindrical coordinates, see figure A.4a. The hue (H) is represented
by the angle θ, varying from 0° to 360°. Saturation (S) corresponds to the radius,
varying from 0 to 1. Intensity (I) varies along the z axis, with 0 being black and 1
being white.

When S = 0, the color lies on the black-to-white axis (BW axis) and is a shade of gray
determined by the intensity value. When S = 1, the color lies on the boundary of the top
cone base. The greater the saturation, the farther the color is from the BW axis
(depending on the intensity).

Adjusting the hue varies the color from red at 0°, through green at 120° and blue at
240°, back to red at 360°. When I = 0 the color is black and H is therefore undefined;
when S = 0 the color is a gray level and H is again undefined.

By adjusting the intensity, a color is made darker or lighter; by keeping S = 1 and
adjusting I, shades of that color are created.








A.2.4 HSV/HSB

HSV stands for Hue, Saturation, Value (Brightness). The concept behind this model is
similar to that of the HSI model, except that the three dimensions together give a
volume shaped like a cone, which some reduce to a hexacone (figure A.4b). The
saturation has its maximum value at brightness = 1, while in the HSI model it has its
maximum value at lightness = 0.5.




             Figure A.4: (a) A double cone of the HSI/HLS color model (b) The HSV/HSB model

Table A.1 illustrates the difference between the HSI and HSV hue and luminance
distributions, and Table A.2 illustrates the change in the saturation value in both
models. An example is presented in figure A.5 to illustrate the difference between
them for a color image converted to both models. The hue, saturation, and
lightness/brightness channels of the image are presented in figure A.6. As shown in
the figure, the lightness channel of the HSL model is the same as the gray version of
the image.







                     Change in Lum/Bright              Change in Hue
                     (Hue = 0, Saturation = 1)         (Sat = 1, Brightness = 1)
         HSI         [image strip]                     [image strip]
         HSB         [image strip]                     [image strip]

                        Table A.1: Hue and luminance distributions

             HSI saturation change                   HSB saturation change
             (H = 0, L = 1, 0.5, 0.25)               (H = 0, B = 1, 0.5, 0.25)
             [image strips]                          [image strips]

                                 Table A.2: Change in saturation




    Figure A.5: Different model representations (a) RGB (b) Gray (c) HSB (d) HSL images








             Figure A.6: Hue, Saturation and Lightness/Brightness channels








- Converting from RGB to HSI:


Given an image in RGB color format, the HSI components can be obtained using the
following equations [26, 48]:

    I = (R + G + B) / 3

    S = 1 - [3 / (R + G + B)] min(R, G, B)

    H = cos^-1 { [(R - G) + (R - B)] / 2  /  sqrt[(R - G)^2 + (R - B)(G - B)] }

with H = 360° - H when B > G (as in the HSV case below).



To convert from HSI to RGB, the process depends on which color sector H lies in. There
are three sectors for H:


For the RG sector (0° ≤ H < 120°):

    B = (1/3) (1 - S)

    R = I [1 + S cos H / cos(60° - H)]

    G = 1 - (R + B)



For the GB sector (120° ≤ H < 240°):

    H = H - 120°

    R = (1/3) (1 - S)

    G = I [1 + S cos H / cos(60° - H)]

    B = 1 - (R + G)








For the BR sector (240° ≤ H ≤ 360°):

    H = H - 240°

    G = (1/3) (1 - S)

    B = I [1 + S cos H / cos(60° - H)]

    R = 1 - (B + G)
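
To show how the sector rules are applied in practice, the following Python sketch
implements an RGB-HSI pair following the standard formulation in Gonzalez and Woods
[26]; the inverse works with the normalized r, g, b components and then scales by 3I,
so its intermediate expressions look slightly different from the equations printed
above. The function names, the degree-based hue, and the clamping of the arccosine
argument are illustrative choices, not taken from the thesis.

    import math

    def rgb_to_hsi(r, g, b):
        """RGB in [0, 1] -> (H in degrees, S, I)."""
        i = (r + g + b) / 3.0
        if i == 0.0:                          # black: H and S are undefined
            return 0.0, 0.0, 0.0
        s = 1.0 - 3.0 * min(r, g, b) / (r + g + b)
        if s == 0.0:                          # gray: H is undefined
            return 0.0, 0.0, i
        num = 0.5 * ((r - g) + (r - b))
        den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
        h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:                             # lower half of the hue circle
            h = 360.0 - h
        return h, s, i

    def hsi_to_rgb(h, s, i):
        """Sector-based inverse: H in degrees, S and I in [0, 1]."""
        h = h % 360.0
        if h < 120.0:                         # RG sector: x -> b, y -> r, z -> g
            shift, order = 0.0, "brg"
        elif h < 240.0:                       # GB sector: x -> r, y -> g, z -> b
            shift, order = 120.0, "rgb"
        else:                                 # BR sector: x -> g, y -> b, z -> r
            shift, order = 240.0, "gbr"
        hh = math.radians(h - shift)
        x = (1.0 - s) / 3.0
        y = (1.0 + s * math.cos(hh) / math.cos(math.radians(60.0) - hh)) / 3.0
        z = 1.0 - (x + y)
        rgb = dict(zip(order, (x, y, z)))     # normalized r, g, b components
        return tuple(3.0 * i * rgb[c] for c in "rgb")

    # Round trip on an arbitrary color.
    print(hsi_to_rgb(*rgb_to_hsi(0.8, 0.3, 0.1)))   # ~(0.8, 0.3, 0.1)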



- Converting from RGB to HSV

As noted in figure A.6, the difference between the two models appears in the
saturation and intensity/brightness channels. The following equations describe how to
convert from RGB to HSV:

    V = max(R, G, B)

    S = [max(R, G, B) - min(R, G, B)] / max(R, G, B)

    H = cos^-1 { [(R - G) + (R - B)] / 2  /  sqrt[(R - G)^2 + (R - B)(G - B)] }

    if (B > G) then H = 2π - H

From HSV to RGB: Set the following parameters [49]:

    Hi = floor(H / 60) mod 6,        f = H / 60 - Hi,
    p = V (1 - S),       q = V (1 - f S),       t = V (1 - (1 - f) S)

Consider the following cases and calculate the RGB components from the table:

                          Hi=0 Hi=1 Hi=2 Hi=3 Hi=4 Hi=5
                    R       V        q         p         p     t          V
                   G        t       V         V          q     p          p
                    B       p        p         t         V     V          q
                                Table A.3 RGB values from HSV
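
The lookup in Table A.3 translates directly into a small function. The sketch below is
an illustrative helper (H in degrees, S and V in [0, 1]) that builds p, q, and t as
defined above and indexes the table by Hi; Python's standard colorsys module offers an
equivalent, well-tested hsv_to_rgb (with hue scaled to [0, 1]) if a library routine is
preferred.

    import math

    def hsv_to_rgb(h, s, v):
        """HSV -> RGB via the parameters above and the lookup of Table A.3."""
        hi = int(math.floor(h / 60.0)) % 6      # hue sector index
        f = h / 60.0 - math.floor(h / 60.0)     # fractional position inside the sector
        p = v * (1.0 - s)
        q = v * (1.0 - f * s)
        t = v * (1.0 - (1.0 - f) * s)
        # One (R, G, B) column of Table A.3 per value of Hi.
        table = [(v, t, p), (q, v, p), (p, v, t), (p, q, v), (t, p, v), (v, p, q)]
        return table[hi]

    # Example: pure red.
    print(hsv_to_rgb(0.0, 1.0, 1.0))   # (1.0, 0.0, 0.0)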








A.2.5 YIQ

YIQ is another color space that separates the luminance from the color information.
The YIQ model is used in U.S. commercial color television broadcasting and is a
recoding of RGB for transmission efficiency and for downward compatibility with
black-and-white television. Compatibility here means that black-and-white receivers
simply use the Y component of the transmission, which carries the relative luminance
information and is defined to be the same as the CIE Y component. The Y component is
allocated the majority of the broadcast bandwidth and is therefore transmitted more
precisely, which is why black-and-white television pictures usually appear sharper
than color pictures in the U.S. The Y parameter is given more of the bandwidth because
the human visual system is more sensitive to changes in luminance than to changes in
hue or saturation.

The YIQ model is a 3D Cartesian coordinate system, with the visible subset being a
convex polyhedron that maps into the RGB cube. The Y component represents the
luminance information and is the only component used by black-and-white television
receivers. I and Q represent the chrominance information; they can be thought of as a
second pair of axes on the same graph, rotated by 33°. I stands for in-phase, while Q
stands for quadrature, referring to the components used in quadrature amplitude
modulation. There are several ways to convert to/from YIQ. The following uses the
CCIR (International Radio Consultative Committee) Recommendation 601-1 coefficients
and is the typical method used in JPEG compression.

       From RGB to YIQ:                                      From YIQ to RGB:
Y = 0.299 R + 0.587 G + 0.114 B                           R = Y + 0.9563 I + 0.6210 Q
I = 0.5957 R − 0.2744 G − 0.3212 B                        G = Y − 0.2721 I − 0.6473 Q
Q = 0.2114 R − 0.5226 G + 0.3111 B                        B = Y − 1.1070 I + 1.7046 Q
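
Because both directions are linear, they can be applied with a single matrix product
per pixel. The NumPy sketch below simply copies the coefficients listed above; treating
the image as an H x W x 3 floating-point array is an illustrative assumption, not a
requirement of the model.

    import numpy as np

    # Coefficients copied from the equations above.
    RGB_TO_YIQ = np.array([[0.299,   0.587,   0.114 ],
                           [0.5957, -0.2744, -0.3212],
                           [0.2114, -0.5226,  0.3111]])
    YIQ_TO_RGB = np.array([[1.0,  0.9563,  0.6210],
                           [1.0, -0.2721, -0.6473],
                           [1.0, -1.1070,  1.7046]])

    def rgb_to_yiq(image):
        """image: H x W x 3 float array with RGB in [0, 1] -> Y, I, Q planes."""
        return image @ RGB_TO_YIQ.T

    def yiq_to_rgb(image):
        return image @ YIQ_TO_RGB.T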








A.2.6 YUV

The YUV model defines a color space in terms of one luminance (luma) component and
two chrominance components. The YUV color model is used in the PAL, NTSC, and
SECAM composite color video standards. Earlier black-and-white systems used only the
luma (Y) information; the color information (U and V) was added so that a
black-and-white receiver can still display a color transmission as a normal
black-and-white picture.

Y stands for the luma component (the brightness), and U and V are the chrominance
(color) components. The YPbPr color model used in analog component video, and its
digital counterpart YCbCr used in digital video, are more or less derived from it
(Cb/Pb and Cr/Pr are deviations from gray on the blue-yellow and red-cyan axes,
whereas U and V are blue-luminance and red-luminance differences), and they are
sometimes inaccurately called "YUV". The YIQ color space used in analog NTSC
television broadcasting is related to it as well, although in a more complex way. To
convert from RGB to YUV:

                      Y =  0.299    R + 0.587   G + 0.114   B
                      U = -0.14713  R - 0.28886 G + 0.436   B
                      V =  0.615    R - 0.51499 G - 0.10001 B

And to convert back to RGB:

                      R = Y + 1.13983 V
                      G = Y - 0.39465 U - 0.58060 V
                      B = Y + 2.03211 U
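
The YUV pair can be coded in the same way. In the sketch below the inverse matrix is
computed numerically from the forward one, which reproduces (up to rounding) the
coefficients listed above; the per-pixel interface is an illustrative choice.

    import numpy as np

    RGB_TO_YUV = np.array([[ 0.299,    0.587,    0.114  ],
                           [-0.14713, -0.28886,  0.436  ],
                           [ 0.615,   -0.51499, -0.10001]])
    YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)      # numerical inverse of the forward matrix

    def rgb_to_yuv(pixel):
        return RGB_TO_YUV @ np.asarray(pixel, dtype=float)

    def yuv_to_rgb(pixel):
        return YUV_TO_RGB @ np.asarray(pixel, dtype=float)

    # A mid-gray pixel has (almost) zero chrominance, and the round trip recovers the input.
    print(rgb_to_yuv([0.5, 0.5, 0.5]))                 # ~[0.5, 0.0, 0.0]
    print(yuv_to_rgb(rgb_to_yuv([0.2, 0.4, 0.6])))     # ~[0.2, 0.4, 0.6]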








A.2.7 CIE Lab/lαβ

The CIE (Commission Internationale de l'Éclairage, the International Commission on
Illumination) defined three standard primaries (X, Y, and Z) in 1931 to replace red,
green, and blue, because not all visible colors can be specified with positive amounts
of the red, green, and blue primaries. With the X, Y, and Z primaries, every visible
color can be specified using only positive values. The Y primary was intentionally
defined to match closely the luminance of a color. Note that arbitrarily combining X,
Y, and Z values can easily lead to a "color" outside the visible spectrum. Figure A.7a
shows the cone of visible colors, as well as the plane X + Y + Z = 1.

Lowercase x, y, and z denote the normalized X, Y, and Z values [normalized such that
x + y + z = 1, so x = X/(X+Y+Z), y = Y/(X+Y+Z), and z = Z/(X+Y+Z)]. Thus x, y, and z
lie on the (X + Y + Z = 1) plane of the solid of visible colors in XYZ space. These
lowercase values are called chromaticity values, and the projection of the
(X + Y + Z = 1) plane onto the (X, Y) plane of figure A.7a is called the chromaticity
diagram. On this diagram, all perceivable colors with the same chromaticity but
different luminance (brightness) map to the same point. The chromaticity diagram is
shown in figure A.7b, with the dot marking the "C illuminant" (white point) and with
the numbers along the boundary corresponding to wavelengths in nanometers.
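
As a small illustration of this normalization (a hypothetical helper, not part of the
thesis), two XYZ triples that differ only by a luminance scale factor map to the same
chromaticity point:

    def chromaticity(X, Y, Z):
        """Normalize XYZ tristimulus values to chromaticity coordinates (x, y, z)."""
        total = X + Y + Z
        if total == 0.0:
            return 0.0, 0.0, 0.0      # degenerate case: no light at all
        return X / total, Y / total, Z / total

    # Two colors differing only in overall luminance share one chromaticity point.
    print(chromaticity(0.3, 0.4, 0.3))   # (0.3, 0.4, 0.3)
    print(chromaticity(0.6, 0.8, 0.6))   # (0.3, 0.4, 0.3)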

The lαβ transform has been derived from a principal component transform of a large
ensemble of hyperspectral images that represents a good cross-section of natural
scenes. The resulting data representation is compact and symmetrical, and provides
automatic decorrelation to higher than second order. Figure A.7c illustrates the
resulting color space.








        Figure A.7: (a) XYZ space (b) XYZ chromaticity diagram (c) Lab color space

The actual transform is as follows [7].

       First the RGB tristimulus values are converted to device independent XYZ
       tristimulus values. This conversion depends on the characteristics of the display
       on which the image was originally intended to be displayed. Because that
      information is rarely available, it is common practice to use a device-independent
      conversion that maps white in the chromaticity diagram to white in RGB space and
      vice versa:

                 [X]   [0.5141  0.3239  0.1604] [R]
                 [Y] = [0.2651  0.6702  0.0641] [G]              (A.1)
                 [Z]   [0.0241  0.1228  0.8444] [B]

      The device independent XYZ values are then converted to LMS space by

                 [L]   [ 0.3897  0.6890  -0.0787] [X]
                 [M] = [-0.2298  1.1834   0.0464] [Y]            (A.2)
                 [S]   [ 0.0000  0.0000   1.0000] [Z]

      Combination of (A.1) and (A.2) results in

                 [L]   [0.3811  0.5783  0.0402] [R]
                 [M] = [0.1967  0.7244  0.0782] [G]              (A.3)
                 [S]   [0.0241  0.1288  0.8444] [B]

      The data in this color space shows a great deal of skew, which is largely
      eliminated by taking a logarithmic transform:

                 L = log L
                 M = log M                                       (A.4)
                 S = log S


The inverse transform from LMS cone space back to RGB space is as follows:

      First, the LMS pixel values are raised to the power ten to go back to linear
      LMS space.

      Then, the data can be converted from LMS to RGB using the inverse transform
      of Equation (A.3):

                 [R]   [ 4.4679  -3.5873   0.1193] [L]
                 [G] = [-1.2186   2.3809  -0.1624] [M]           (A.5)
                 [B]   [ 0.0497  -0.2439   1.2045] [S]

The following simple transform decorrelates the axes of the LMS space:

                 [l]   [1/√3     0       0   ] [1   1   1] [L]
                 [α] = [ 0     1/√6      0   ] [1   1  -2] [M]   (A.6)
                 [β]   [ 0       0     1/√2  ] [1  -1   0] [S]

If we think of the L channel as red, the M as green, and S as blue, we see that this is a
variant of a color opponent model:

                 Achromatic       ∝   r + g + b
                 Yellow – blue    ∝   r + g - b                  (A.7)
                 Red – green      ∝   r - g
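
Collecting the steps so far (Equation (A.3), the logarithmic transform (A.4), and the
decorrelation (A.6)), the NumPy sketch below converts an RGB triple to lαβ and back;
the inverse path follows the description given after this point, raising the values to
the power ten and then inverting (A.3), cf. Equations (A.5) and (A.8). The base-10
logarithm matches the power-of-ten inverse described below, while the function names
and the small epsilon that guards against log(0) are illustrative additions.

    import numpy as np

    RGB_TO_LMS = np.array([[0.3811, 0.5783, 0.0402],
                           [0.1967, 0.7244, 0.0782],
                           [0.0241, 0.1288, 0.8444]])              # Equation (A.3)
    LMS_TO_LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ \
                 np.array([[1.0,  1.0,  1.0],
                           [1.0,  1.0, -2.0],
                           [1.0, -1.0,  0.0]])                     # Equation (A.6)

    def rgb_to_lab(rgb, eps=1e-6):
        """RGB -> lαβ: apply (A.3), take a base-10 log (A.4), then decorrelate with (A.6)."""
        lms = RGB_TO_LMS @ np.asarray(rgb, dtype=float)
        return LMS_TO_LAB @ np.log10(lms + eps)    # eps avoids log(0) for pure black

    def lab_to_rgb(lab):
        """Inverse path: undo (A.6), raise to the power ten, then invert (A.3)."""
        lms = np.power(10.0, np.linalg.inv(LMS_TO_LAB) @ np.asarray(lab, dtype=float))
        return np.linalg.inv(RGB_TO_LMS) @ lms

    # Round trip on an arbitrary color (small error due to eps).
    print(lab_to_rgb(rgb_to_lab([0.5, 0.2, 0.7])))   # ~[0.5, 0.2, 0.7]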


After processing the color signals in the lαβ space the inverse transform of Equation
(A.6) can be used to return to the LMS space:

                 [L]   [1   1   1] [√3/3    0      0  ] [l]
                 [M] = [1   1  -1] [ 0    √6/6     0  ] [α]      (A.8)
                 [S]   [1  -2   0] [ 0      0    √2/2 ] [β]








                          REFERENCES
1.   Luiz Filipe M. Vieira, Rafael D. Vilela, Erickson R. do Nascimento, Fernando
     A. Fernandes Jr., Rodrigo L. Carceroni, and Arnaldo de A. Araujo,
     "Automatically Choosing Source Color Images for Coloring Grayscale
     Images," sibgrapi, XVI Brazilian Symposium on Computer Graphics and
     Image Processing (SIBGRAPI'03), 2003, p. 151

2.   Neuraltek, BlackMagic photo colorization software, version 2.8.2003 in URL:
     http://www.timebrush.com/blackmagic

3.   T. Welsh, M. Ashikhmin, and K. Mueller, “Transferring color to greyscale
     images,” In Proceedings of the 29th Annual Conference on Computer Graphics
     and interactive Techniques, San Antonio, Texas, July 23-26, 2002, pp 277–
     280.

4.   J. Yoo and S.Oh , “A Coloring Method of Gray-Level Image using Neural
     Network,” Proceedings of the 1997 International Conference on Neural
     Information Processing and Intelligent Information Systems, Vol. 2,
     Singapore,1997, pp 1203-1206.

5.   A. N. Al-Gindy, H. Al Ahmad, R. A. Abd Alhameed, M. S. Abou Naaj and P.
     S. Excell , “Frequency Domain Technique For Colouring Gray Level Images,”
     2004. In URL: www.abhath.org/html/modules/pnAbhath/download.php?fid=32

6.   E.Reinhard, M.Ashikhmin, B.Gooch And P.Shirley , “Color Transfer between
     Images,” IEEE Computer Graphics and Applications, Sep/Oct, 2001, Vol.
     21, Issue. 5, pp 34-41








7.    A. Toet, “Transferring Color To Single-Band Intensified Night Vision Images,”
      In proceedings of SPIE, Vol. 5424, Enhanced and Synthetic Vision 2004,
      Jacques G. Verly, Editor, August 2004, pp. 40-49

8.    Y. Tai, J. Jia, and C. Tang, “Local Color Transfer via Probabilistic
      Segmentation by Expectation- maximization,” In procedding of Computer
      Vision and Pattern Recognition (CVPR'05). IEEE Computer Society
      Conference on, Vol. 1, 20-25 June 2005, pp. 747-754

9.    T.Chen, Y.Wang, V.Schillings, and C.Meinel, “Grayscale Image Matting And
      Colorization,” In Proceedings of ACCV2004 , Jeju Island, Korea, Jan 27-30,
      2004, pp 1164-1169

10.   U. Lipowezky, “Grayscale aerial and space image colorization using texture
      classification,” Pattern Recognition Letters, Vol. 27,No. 4, March 2006, pp
      275–286

11.   T. Horiuchi, “Colorization algorithm using probabilistic relaxation,” Image and
      Vision Computing, Vol. 22, 2004, pp 197–202

12.   H. Noda, J. Korekuni, and M. Niimi, “A colorization algorithm based on
      local MAP estimation,” Pattern Recognition, Vol. 39, 2006, pp 2212-2217

13.   L. Vieira, R. Vilela, E. do Nascimento, F. Fernandes Jr., R. Carceroni, and A.
      Araújo, “Fully automatic coloring of grayscale images,” Image and Vision
      Computing, Vol. 25, 2007, pp 50-60

14.   A. Levin, D. Lischinski, and Y. Weiss, “Colorization using optimization,”
      ACM Transactions on Graphics, Vol. 23, Issue 3, 2004, pp.689–694

15.   V. Konushin, V. Vezhnevets, “Interactive Image Colorization and Recoloring
      based on Coupled Map Lattices,” Graphicon'2006 conference proceedings, ,
      Novosibirsk Akademgorodok, Russia, July 2006 pp.231-234








16.   S. Premože and W. B. Thompson, “Automated Coloring of Panchromatic
      Orthoimagery,” in 3rd ICA Mountain Cartography Workshop, Mt. Hood,
      Oregon, USA, May 2002.

17.   Daniel Sýkora, Jan Buriánek and Jiří Žára, “Colorization of Black-and-White
      Cartoons,” Image and Vision Computing, Vol. 23, No. 9, pp 767-852, Elsevier,
      September 2005

18.   Ritwik K. Kumar and Suman K. Mitra, “Color Transfer Using Motion
      Estimations and Its Application to Video Compression,” in the proceedings of
      the 11th International Conference on Computer Analysis of Images and Patterns
      (CAIP 2005), Versailles, France, September 5-8, 2005, pp 313-320

19.   C.Wang and Y.Huang, “A Novel Color Transfer Algorithm for Image
      Sequences,” Journal Of Information Science And Engineering, Vol. 20, 2004,
      pp 1039-1056

20.   C. H. Chen, L. F. Pau, and P. S. P. Wang (eds.), book, Pattern Recognition
      and Computer Vision, World Scientific Publishing Co., 2nd Edition 1998,
      Chapter 2.1, pp. 207-248

21.   Phil Brodatz, Textures: A Photographic Album for Artists & Designers, Dover
      publications Inc, New York, 1966.

22.   Bryan S. Morse, Brigham Young University, 1998–2000 Lecture 22

23.   T. Ojala and M. Pietikäinen, Machine Vision and Media Processing Unit,
      University of Oulu, Finland. In URL:
      http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OJALA1/texclas.htm

24.   P.Howarth, and S.Ruger, “Evaluation of texture features for content-based
      image retrieval,” In proceedings of the International Conference on Image and
      Video Retrieval, Springer-Verlag ,2004,pp 326–324








25.   M.Harbour, “Development of a suite of tools for Texture-based Segmentation
      of images in MATLAB,” Dissertation for Computer Science Bachelor degree,
      University of Bath 2004, in URL:
      www.cs.bath.ac.uk/~amb/CM30076/projects.bho/2003-/MatthewHarbourDissertation.pdf

26.   Rafael C.Gonzalez and Richard E.Woods, Book, Digital Image Processing,
      Prentice Hall, Second Edition 2002.

27.   VISTEX texture database:
      http://wwwwhite.media.mit.edu/vismod/imagery/VisionTexture/vistex.html

28.   Meastex texture database:

      http://www.cssip.elec.uq.edu.au/~guy/meastex/meastex.html

29.   CURet texture database: http://www.curet.cs.columbia.edu/curet/data

30.   T. Ojala, T. Maenpaa, M. Pietikainen, J. Viertola, J. Kyllonen and S. Huovinen,
      “Outex – New        framework for empirical evaluation of texture analysis
      algorithms,” In proceeding of ICPR, Quebec, Canada, 2002. Vol. 1, pp 701-
      706 In URL: http://www.outex.oulu.fi

31.   Mona Sharma, “Performance Evaluation of Image Segmentation and Texture
      Extraction Methods in Scene analysis,” Master thesis of Philosophy in
      Computer Science University of Exeter, 2001

32.   D. Comaniciu and P. Meer, “Mean shift Analysis and Applications,” In
      proceeding of Seventh Int’l Conf. Computer vision , Kerkyra, Greece, Sep
      1999, pp.1197-1203,

33.   D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature
      space analysis,” PAMI, Vol. 24, No. 5, pp 603–619, May 2002








34.   Comaniciu, Ramesh, and Meer, “The variable bandwidth mean shift and data-
      driven scale selection,” Proceedings of the 8th IEEE International Conference
      of Computer Vision. 7-14 July 2001, Vol. 1, pp 438-445.

35.   B. Georgescu, I. Shimshoni, and P. Meer, “Mean shift based clustering in high
      dimensions: A texture classification example,” In the proceedings of the 9th
      IEEE International Conference on Computer Vision (ICCV), 14-17 October
      2003, Nice, France,p.456

36.   Kardi Teknomo, K-Means Clustering Tutorial, free online tutorial, 2004. In
      URL: http://people.revoledu.com/kardi/tutorial/kMean/

37.   T. Kasparis, D. Charalampidis, M. Georgiopoulos, and J. Rolland,
      “Segmentation of textured images based on fractals and image filtering,”
      Pattern Recognition, Vol. 34, Issue 10, October 2001, pp 1963-1973

38.   T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,”
      Communications in Statistics, Vol. 3, 1974, pp 1-27

39.   C. Elkan, “Using the triangle inequality to accelerate k-Means,” In Proceedings
      of the 20th ICML, Washington, DC, 2003, pp 147-153

40.   T. Ojala and M. Pietikäinen, ‘Texture classification,’ Machine Vision and Media
      Processing Unit, University of Oulu, Finland. In URL:
      http://www.ee.oulu.fi/research/imag/texture

41.   Jan Puzicha, Yossi Rubner, Carlo Tomasi, and Joachim M. Buhmann,
      “Empirical Evaluation of Dissimilarity Measures for Color and Texture,” IEEE
      International Conference on Computer Vision, Kerkira, Greece, September
      1999, pp 1165-1172

42.   Xiaojin Zhu, “K-nearest-neighbor: an introduction to machine learning,”
      lecture notes for ‘Introduction to Artificial Intelligence’, Fall 2005, CS
      Department, University of Wisconsin, Madison. In URL:
      www.cs.wisc.edu/~jerryzhu/cs540/knn.pdf

43.   M.F.A. Fauzi, and P.H. Lewis, “A Fully Unsupervised Texture Segmentation
      Algorithm,” In Proceedings of British Machine Vision Conference, Norwich,
      UK. Harvey, 2003, pp. 519-528

44.   D. Charalampidis and T. Kasparis, “Wavelet-Based Rotational Invariant
      Roughness Features for Texture Classification and Segmentation,” IEEE
      Transactions on Image Processing, Vol. 11, No.8 August 2002, pp 825-837

45.   Ying Liu, Xiaofang Zhou, and Wei-Ying Ma, “Extracting Texture Features
      from Arbitrary-shaped Regions for Image Retrieval,” IEEE International
      Conference on Multimedia and Expo., Taipei, Jun. 2004 ,pp 1891-1894

46.   O. Commowick, C. Lenglet, and C. Louchet, “Wavelet-Based Texture
      Classification and Retrieval,” 2003 in URL:
            http://www.tsi.enst.fr/tsi/enseignement/ressources/mti/classif-textures/

47.   Eka Aulia, “Hierarchical Indexing For Region Based Image Retrieval,” Master
      thesis of Science in Industrial Engineering, Louisiana State University and
      Agricultural and Mechanical College, May 2005

48.   Jacques Paris, web site, “Color Models for MapInfo,” April 2002 in URL:
      http://www.paris-pc-gis.com/MI_Enviro/Colors/color_models.htm

49.   Wikipedia web site, http://en.wikipedia.org/wiki/HSV_color_space

50.   Wikipedia web site, http://en.wikipedia.org/wiki/Lab_color_space

51.   Wikipedia web site, http://en.wikipedia.org/wiki/YUV

                 “All web sites listed above were accessible at the time of writing”




(Arabic summary, acknowledgements, and author biography)

Keywords: Colorization, Texture features, Segmentation, Classification, Wavelets, Laws,
Color models

				