A Knowledge-based Framework for Automatic Semantic Image Retrieval


R.C.F. Wong
Hong Kong Baptist University

Copyright is held by the author/owner(s).
ACM-HK Student Research and Career Day, 2009

ABSTRACT
With the rapid increase in the volume of digital image collections, image retrieval has become one of the most important research areas. The effectiveness of image retrieval depends on meaningful indexing. In this paper, we present a semantic annotation technique based on the use of image parametric dimensions and metadata. In addition, we propose an extension of image indexing models which utilizes knowledge-based expansion and contextual feature-based index expansion. Our system is evaluated quantitatively using more than 100,000 web images. Experimental results indicate that this approach is able to deliver highly competent performance.

Keywords
feature extraction, image annotation, image retrieval, image semantics, metadata

1. INTRODUCTION AND RELATED WORK
The number of web images is increasing at a rapid rate, and searching them semantically presents a significant challenge. Many raw images are constantly uploaded with little meaningful direct annotation of semantic content, limiting their search and discovery. While some sites encourage tags or keywords to be included manually, this practice is far from universal and applies to only a small proportion of images on the Web.

Research in image annotation has reflected the dichotomy inherent in the semantic gap, and is divided between two main categories: concept-based image retrieval and content-based image retrieval. The former focuses on retrieval by image objects and high-level concepts, while the latter focuses on the low-level visual features of the image.

In order to determine image objects, the image often has to be segmented into parts. Common approaches to image segmentation include segmentation by region and segmentation by image objects. Segmentation by region aims to separate image parts into different regions sharing common properties. These methods compute general similarity between images based on statistical image properties; common examples of such properties are texture and color, for which these methods are found to be robust and efficient. Some systems use color, texture and shape as attributes and apply them to characterize the entire image. Segmentation by object is widely regarded as a hard problem which, if solved, would replicate the object recognition function of the human vision system [2]. There has been some effort to relate low-level features and regions to higher-level perception, but these efforts are limited to isolated words and also require substantial training samples and statistical considerations.

Information for finding images on the Web can come from the associated text and from the image itself. Some studies include users in a search loop with a relevance feedback mechanism to adapt the search parameters based on user feedback, while other research focuses on implicit image annotation, which involves an implicit, rather than an explicit, indexing scheme and, in consequence, augments the original indexes with additional concepts that are related to the query [1], necessitating the use of some probabilistic weighting schemes.

We propose an integrated framework for image retrieval based on generative modelling approaches. In [3, 4], a semantic indexing technique named the Automatic Semantic Annotation (ASA) approach is developed, which is based on the use of image parametric dimensions and metadata. Using decision trees and rule induction, a rule-based approach is developed to formulate explicit annotations for images fully automatically, so that a semantic query such as "sunset by the sea in autumn in New York" can be answered and indexed purely by machine. In this paper, we propose an extension of such image indexing models by using knowledge-based expansion and contextual feature-based index expansion. Experimental evidence on more than 100,000 web images and over 990,000 tags shows that semantically meaningful retrieval results can be inferred and that the approach is able to deliver highly competent performance.

2. CORRELATING SCENE CHARACTERISTICS WITH IMAGE DIMENSIONS AND FEATURES

2.1 Scenes of Image
In relation to image acquisition, many images may be broken down into a few basic scenes, such as nature and wildlife, portrait, landscape and sports. A landscape scene comprises the visible features of an area of land, including physical elements such as landforms, living elements of flora and fauna, and abstract elements such as lighting and weather conditions. Landscape photography normally aims to keep as many objects in focus as possible, and therefore commonly adopts a small aperture setting. The equipment used
by a professional photographer usually includes a fast telephoto lens and a camera with an extremely fast exposure time that can take pictures rapidly. Definite relationships exist between the type of scene and the image acquisition parameters. Some typical scene categories and scenes are given in Table 1, where the scenes are subsets of the corresponding categories.

Table 1: scenes of images

  Categories         Scenes
                     Day scenes (S_d)
  Landscape (S_l)    Night scenes (S_n)
                     Sunrises and sunsets (S_ss)
                     Indoor events (S_ie)
                     Indoor portraits (S_ip)
  Portraits (S_p)    Outdoor events (S_oe)
                     Outdoor portraits (S_op)
                     Sports (S_s)
  Nature (S_na)      Macro (S_m)
                     Wildlife (S_w)

[Figure 1: Image distribution in three-dimensional space. Axes: focal length (L), subject distance (d) and exposure time (t); the labelled clusters include S_w, S_op and S_s.]

An image I_i may be characterized by a number of dimensions d_i1, ..., d_ik which correspond to the image acquisition parameters; i.e. I_i = (d_i1, ..., d_ik). Each dimension has a certain domain D_j; i.e. d_ij ∈ D_j, for all i. Each image therefore corresponds to a point in k-dimensional space. Fig. 1 shows the clustering of images from the training set: each particular type of image scene tends to cluster in a group, which forms the basis of our rule-induction algorithm.

Our annotation system has been developed using an image database which consists of a collection of 3,231 unlabelled images obtained at random from a photograph album on the Web. All images in the database are metadata-embedded and stored in JPEG format. We manually label all images with a semantic concept (the scene of the image) before arbitrarily dividing this image database into a training set and a test set. Secondly, to numerically assess the accuracy and effectiveness of our new annotation approach, we have retrieved more than 100,000 sets of image metadata from [...], which are used as an evaluation set. After the training process on the entire training set, we obtain the set of rules discussed in [3].

2.2 Knowledge-based expansion
Our approach provides operations to perform image retrieval with knowledge-based expansion enabled. It introduces knowledge-based expansion into the image retrieval problem and uses sub-objects as surrogate terms for general queries to improve precision since, in certain applications, the presence of particular objects in an image often implies the occurrence of other objects. The application of such inferences allows the concept of an image to be expanded automatically.

Aggregation hierarchical expansion is a particularly useful technique, which relates to the aggregation hierarchy of sub-objects that constitute an object. Such expansions can be classified into two categories: concrete and abstract hierarchical expansion. In concrete hierarchical expansion, the relevant objects are well-defined (Fig. 2).

[Figure 2: Hierarchical expansion. A tree T_1 with sub-tree reliability factor r has sub-trees T_11, T_12, ..., T_1k, which are reached with traversal probabilities t_11, ..., t_1k. The traversal probabilities of different object classes exhibit different characteristics, with t_ij > t'_mn for t_ij belonging to the concrete object class and t'_mn belonging to the abstract object class.]

2.3 Contextual feature-based Expansion
In order to perform direct extraction of high-level semantic content automatically, we establish associations between low-level features and high-level concepts, and such associations take the following forms. In contextual feature-based expansion, the presence of certain low-level features F may suggest a finite number m of object possibilities.

The presence of certain basic features alone may not be sufficient to infer the presence of specific objects, but such features, if augmented by additional information, may lead to meaningful inferences.

2.3.1 Global color distribution
We convert all images from RGB color space to an indexed image value and then extract the global color feature based on the color histogram. Histogram search is sensitive to intensity variation, color distortion and cropping.

The color distribution does not include any spatial information, and this problem is especially acute for large databases and large appearance changes. The correlogram of an image captures the spatial correlation between identical colors and measures the distribution of features such as colors in the image as well as their spatial relationships. Another type of correlogram, the shape context correlogram, takes into account the statistics of the distribution of points in shape matching and retrieval. These descriptors are well suited to capturing the spatial distribution of colors and the spatial distribution of binary shapes.

2.3.2 HSV Color Space
A color space is defined as a model for representing color in terms of intensity values. Technology based on color tone is most widely used due to its compactness of calculation and information expression. Methods using color tone are robust with respect to object movement, rotation and changes such as distortion within an image, and may be implemented easily. Color is defined by three characteristics: hue, saturation and value. HSV is an expression of color tones that can be sensed by humans using these characteristics. Many histogram distances have been used to define the similarity of two color histogram representations; in this work, the Bhattacharyya Distance (BD), Chi-squared Distance (CD) and Euclidean Distance (ED) are used.
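
As an illustration of the three histogram distances named above, the following is a minimal Python sketch rather than the implementation used in this work. Several details are assumptions made only for the example: the histogram is built over hue alone with 8 bins, the Bhattacharyya Distance is taken as the negative logarithm of the Bhattacharyya coefficient, the Chi-squared Distance is the symmetric form with a factor of 1/2, and all function names are hypothetical.

```python
import colorsys
import math


def hue_histogram(pixels, bins=8):
    """Count pixels per hue bin. `pixels` is an iterable of (R, G, B) in 0..255.

    Assumption for this sketch: only the hue channel of HSV is histogrammed.
    """
    hist = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1
    return hist


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [count / total for count in hist]


def _pair(h1, h2):
    """Validate and normalize two histograms of equal length."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return normalize(h1), normalize(h2)


def bhattacharyya_distance(h1, h2):
    """BD as -ln(sum_i sqrt(p_i * q_i)); infinite if the histograms do not overlap."""
    p, q = _pair(h1, h2)
    coefficient = min(sum(math.sqrt(a * b) for a, b in zip(p, q)), 1.0)
    if coefficient == 0.0:
        return float("inf")
    return max(0.0, -math.log(coefficient))


def chi_squared_distance(h1, h2):
    """CD as 0.5 * sum_i (p_i - q_i)^2 / (p_i + q_i), skipping empty bins."""
    p, q = _pair(h1, h2)
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def euclidean_distance(h1, h2):
    """ED as sqrt(sum_i (p_i - q_i)^2)."""
    p, q = _pair(h1, h2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Usage: compare a mostly red patch with a mostly orange patch.
reddish = hue_histogram([(255, 0, 0)] * 3 + [(255, 128, 0)])
orangish = hue_histogram([(255, 128, 0)] * 3 + [(255, 0, 0)])
for name, fn in [("BD", bhattacharyya_distance),
                 ("CD", chi_squared_distance),
                 ("ED", euclidean_distance)]:
    print(name, fn(reddish, orangish))
```

All three functions return 0 for identical histograms and grow as the color distributions diverge, so a smaller value indicates a closer match between the query and a candidate image.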

2.3.3 Edge-detection
Associations between basic low-level features and high-level semantic concepts may be established for arbitrary images for inclusion in the semantic index. Edge detection is a term used in image processing and computer vision, particularly in the areas of feature detection and feature extraction, to refer to algorithms which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. Here, we adapt edge detection algorithms to extract high-level concepts from low-level features.

Our system is evaluated quantitatively in order to compute the effectiveness of our approach. The quality assessment of the machine-inferred boundaries between parts of the depicted scenes is based on precision. A set of standard evaluation queries is used for experimentation. Comparison is made between base-level indexing and expanded-level indexing, and the widely accepted retrieval performance measures of precision and recall are used to assess system performance. To numerically assess the accuracy and effectiveness of our annotation approach, we have retrieved 103,521 sets of images with 991,074 associated tags from [...]. In our evaluation, we decide that a relevant image must include a representation of the category in such a manner that a human should be able to immediately associate it with the assessed concept.

3.1 Results
In [3, 4], a rule-based approach that uses decision trees and rule induction to formulate explicit annotations for images fully automatically has been developed. In relation to image acquisition, many images may be broken down into a few basic scenes, such as nature and wildlife, portrait, landscape and sports. In the case of aggregation hierarchical expansion, we decided to test our system using the aggregation hierarchy of the basic category "night scenes" and to extend the image hierarchy to find the sub-scene "night scene of downtown". Here "downtown" can be expanded to "business district", "commercial district", "city center" and "city district", while "city district" can be expanded to "road", "building", "architecture", "highway" and "hotel".

We performed experiments to show the optimality and convergence of our approach. Firstly, we randomly selected one "downtown" image from the test set and carried out an evaluation comparing the original Automatic Semantic Annotation (ASA) approach, our approach, which combines the original ASA approach with adaptive annotation using the HSV color space and distance algorithms, and the use of human tags.

Fig. 3 shows that human tags deliver an excellent precision rate of 100%, but this tagging approach relies heavily on human involvement. Annotation without the contextual feature-based index expansion enabled achieves a precision of around 52.8%. Significantly better results are obtained by the ASA approach combined with HSV color space histogram distances, where the precision rate grows to 67.2% (BD), 71.4% (CD) and 77.9% (ED). From the joint application of these techniques, we can formulate semantic annotations for a specific image fully automatically and index images purely by machine without any human involvement.

[Figure 3: Experimental results on contextual feature-based index expansion. Vertical axis: precision rate, in percent.]

4. CONCLUSION
In this paper, a knowledge-based framework for image retrieval is combined with contextual feature-based expansion. Our method combines the advantages of the original ASA approach and of contextual feature-based expansion while preserving the necessary image and knowledge coherence. Our system is evaluated quantitatively, and experimental results indicate that this approach is able to deliver highly competent performance.

5. REFERENCES
[1] I. A. Azzam, C. H. C. Leung, and J. F. Horwood. Implicit concept-based image indexing and retrieval. In Proceedings of the IEEE International Conference on Multi-media Modeling, pages 354–359, Brisbane, Australia, January 2004.
[2] P. Over, C. H. C. Leung, H. Ip, and M. Grubinger. Multimedia retrieval benchmarks. IEEE Multimedia, 11:80–84, April 2004.
[3] R. C. F. Wong and C. H. C. Leung. Automatic semantic annotation of real-world web images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1933–1944, November 2008.
[4] R. C. F. Wong and C. H. C. Leung. Knowledge-based expansion for image indexing. In International Computer Symposium, volume 1, pages 161–165, November 2008.
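
The aggregation hierarchical expansion applied to the "downtown" query in Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the hierarchy is stored as a plain dictionary holding only the terms quoted in Section 3.1, the traversal is a breadth-first walk limited by a hypothetical `max_depth` parameter, and the traversal probabilities and sub-tree reliability factor of Fig. 2 are not modelled.

```python
# Aggregation hierarchy taken from the "downtown" example in Section 3.1.
# The dictionary layout itself is an assumption made for this sketch.
AGGREGATION_HIERARCHY = {
    "downtown": [
        "business district",
        "commercial district",
        "city center",
        "city district",
    ],
    "city district": ["road", "building", "architecture", "highway", "hotel"],
}


def expand_term(term, hierarchy, max_depth=2):
    """Return `term` followed by its sub-objects, level by level.

    `max_depth` (hypothetical) limits how many levels of the hierarchy are
    followed; duplicates are dropped so each index term appears only once.
    """
    expanded = [term]
    frontier = [term]
    for _ in range(max_depth):
        next_frontier = []
        for parent in frontier:
            for child in hierarchy.get(parent, []):
                if child not in expanded:
                    expanded.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return expanded


# Usage: index terms that an image annotated "downtown" would also receive.
print(expand_term("downtown", AGGREGATION_HIERARCHY))
```

Keeping the original term first means that a query on any of the expanded terms can still be traced back to the concept from which it was inferred.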