
Weighted Link Analysis for Logo and Trademark Image Retrieval


Searching for Logo and Trademark Images on the Web

            Euripides G.M. Petrakis*
             Epimenidis Voutsakis*
               Evangelos Milios**
    *Technical University of Crete, Chania, Greece
       **Dalhousie University, Halifax, Canada
Retrieval of Logo & Trademarks
 Important characteristic signs of
  corporate Web sites and of products
  presented there
 Comprise 32.6% of the total number of
  images on the Web
     Retrieval of logo & trademarks is of
      significant commercial interest
     E.g., detection of unauthorized usage

CIVR'07   2
Image Retrieval on the Web
 Text queries: keywords, free text
 Answers: images in Web pages with similar
     Images not always relevant or relevant but not
     Important: From corporate web sites,
     Less important: From individuals and small
 Link analysis: assign higher ranking to
  answers from important web sites

 Enhance accuracy of retrievals
     Support queries by image example
     Preference to images from important
      Web sites
 Evaluation of state-of-art methods
         Retrieval by Text
         Retrieval by Image content
         Retrieval by importance
         Combination of the above

Image Content Representation
 Text surrounding images in Web pages
     Image filename, Alternate text, Page title,
 Image features computed on Intensity &
  Energy histograms
     Mean & Variance on histograms
     Moment invariants on raw images
     Count of number of distinct intensity levels
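
The histogram-based features listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a gray-level image given as a flat list of integers, reads "mean & variance on histograms" as the mean and variance of the distribution the normalized histogram describes, and the function names are hypothetical.

```python
from collections import Counter


def intensity_histogram(pixels, levels=256):
    """Normalized histogram of gray levels for a flat list of ints in [0, levels)."""
    counts = Counter(pixels)
    n = len(pixels)
    return [counts.get(level, 0) / n for level in range(levels)]


def histogram_mean_variance(hist):
    """Mean and variance of the distribution described by a normalized histogram.

    Assumption: the slides do not say which statistic is meant.
    """
    mean = sum(level * p for level, p in enumerate(hist))
    variance = sum(p * (level - mean) ** 2 for level, p in enumerate(hist))
    return mean, variance


def distinct_levels(pixels):
    """Number of distinct intensity levels that occur in the image."""
    return len(set(pixels))


# Toy 8-pixel image with three gray levels, as a logo-like image might have.
pixels = [0, 0, 255, 255, 0, 255, 128, 0]
hist = intensity_histogram(pixels)
mean, variance = histogram_mean_variance(hist)
n_levels = distinct_levels(pixels)
```

A small value of `n_levels` relative to the image size is the kind of signal the "few intensity levels" cue relies on.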

 Intensity histogram: distribution of
  intensity values
  [Figure: "Intensity" histogram plot, horizontal axis from 1 to 241]

 Energy Spectrum:
  distribution of
  average energy
  over concentric
  rings of the DFT
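
One way to compute such an energy histogram is sketched below. It is only an illustration under stated assumptions: a naive 2-D DFT in pure Python (practical only for tiny images), equal-width rings around the zero frequency, and a hypothetical `n_rings` parameter; the slides do not give the ring layout actually used.

```python
import cmath
import math


def dft2(image):
    """Naive 2-D DFT of a small gray-level image (list of equal-length rows)."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def ring_energy(image, n_rings=4):
    """Average |F(u, v)|^2 inside concentric rings around the zero frequency."""
    rows, cols = len(image), len(image[0])
    spectrum = dft2(image)
    max_radius = math.hypot(rows // 2, cols // 2) or 1.0
    sums = [0.0] * n_rings
    counts = [0] * n_rings
    for u in range(rows):
        for v in range(cols):
            # Map array indices to signed frequencies so radius 0 is the DC term.
            fu = u if u <= rows // 2 else u - rows
            fv = v if v <= cols // 2 else v - cols
            radius = math.hypot(fu, fv)
            ring = min(int(n_rings * radius / max_radius), n_rings - 1)
            sums[ring] += abs(spectrum[u][v]) ** 2
            counts[ring] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# A constant image has all of its energy in the innermost ring (the DC term).
flat = [[1] * 4 for _ in range(4)]
flat_energy = ring_energy(flat)
```

Images with sharp edges, as logos often have, would instead spread energy into the outer rings.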
Logo & Trademark Detection
 Distinguish from images of other categories
     Small images
     Few intensity levels
     Rich frequency content
 Image features form vectors which are
  used to train a decision tree
     Accuracy: 85%
 Each image is assigned a probability of
  being a logo or trademark
     Retrieval gives more emphasis to images with
      high logo-trademark probability

Logo-Trademark Similarity
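 $S_{\text{image-similarity}}(Q,D) = S_{\text{features}} + S_{\text{text}}$
 $S_{\text{features}} = S_{\text{moment-invariants}} + S_{\text{intensity-histogram}} + S_{\text{energy-histogram}}$
 $S_{\text{text}} = S_{\text{image-caption}} + S_{\text{file-name}} + S_{\text{alt-text}} + S_{\text{page-title}}$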

Image Retrieval by Text
 Compute text similarity between
  Image and Query text descriptions
  using Vector Space Model (VSM)
     Text is represented by vectors of tf.idf
      term weights
     $Q=(q_1,q_2,\ldots,q_N)$, $D=(d_1,d_2,\ldots,d_N)$
     Similarity:
      $$S(Q,D) = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2}\,\sqrt{\sum_i d_i^2}}$$
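
The vector space model described on this slide can be illustrated as follows. This is a rough sketch, assuming the similarity is the usual cosine between tf.idf vectors and using a plain `tf * log(N / df)` weighting; the exact weighting scheme, the tokenization, and the function names are assumptions, not taken from the slides.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each tokenized document to a dict of tf.idf term weights."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return vectors


def cosine(q, d):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    norm = (math.sqrt(sum(w * w for w in q.values()))
            * math.sqrt(sum(w * w for w in d.values())))
    return dot / norm if norm else 0.0


# Hypothetical text descriptions attached to three images.
descriptions = [
    ["acme", "logo"],
    ["acme", "annual", "report"],
    ["holiday", "photo"],
]
vectors = tfidf_vectors(descriptions)
query = {"logo": 1.0}
scores = [cosine(query, vec) for vec in vectors]
```

Here only the first description shares a term with the query, so it is the only one with a nonzero score.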

Retrieval by Image features
 The similarity between histograms is
  computed by their intersection
 The similarity between moment
  invariants is computed as vector
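
Histogram intersection, as named above, is commonly defined as the sum of bin-wise minima. The sketch below assumes both histograms are normalized to sum to 1 and have the same number of bins; the function name is hypothetical.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


query_hist = [0.5, 0.5, 0.0]
image_hist = [0.5, 0.25, 0.25]
overlap = histogram_intersection(query_hist, image_hist)
```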

Link Analysis
 Assign importance to Web pages,
 Main idea: co-cited and co-contained
  images are likely to be related
 PageRank and HITS for text retrieval
 PicASHOW for Web pages with
  images using links alone
 WPicASHOW handles image and text
  content in queries and Web pages
Focused graph F
 Retrieve initial set F of images
 Stop images (banners, buttons) are filtered
 Non-logo/trademarks are filtered out
  (based on probability)
 Expand F with pages pointing to images in F
 Expand F with pages and images pointed to
  by pages in F
 Repeat until F sufficiently large
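
The construction steps above can be sketched as a simple set-expansion loop. Everything here is illustrative: the in-memory dictionaries stand in for a crawled Web graph, `min_prob` and `max_pages` are hypothetical thresholds, and applying the stop-image and logo-probability filters to newly added images (not only to the initial set) is an assumption.

```python
def build_focused_graph(seed_images, page_images, page_links, logo_prob,
                        stop_images=frozenset(), min_prob=0.5, max_pages=100):
    """Grow a focused graph from query-relevant seed images.

    page_images: page -> set of images it contains or points to
    page_links:  page -> set of pages it links to
    logo_prob:   image -> probability of being a logo or trademark
    """
    def keep(img):
        return img not in stop_images and logo_prob.get(img, 0.0) >= min_prob

    images = {img for img in seed_images if keep(img)}
    pages = set()
    while len(pages) < max_pages:
        # Pages pointing to images already in the graph.
        new_pages = {p for p, imgs in page_images.items() if imgs & images}
        # Pages pointed to by pages in the graph.
        for p in pages | new_pages:
            new_pages |= page_links.get(p, set())
        # Images pointed to by pages in the graph, after filtering.
        new_images = {img for p in pages | new_pages
                      for img in page_images.get(p, set()) if keep(img)}
        if new_pages <= pages and new_images <= images:
            break  # nothing left to add
        pages |= new_pages
        images |= new_images
    return pages, images


# Hypothetical three-page Web graph.
page_images = {"a": {"logo1"}, "b": {"logo2", "banner"}, "c": {"photo"}}
page_links = {"a": {"b"}, "b": set(), "c": {"a"}}
logo_prob = {"logo1": 0.9, "logo2": 0.8, "banner": 0.9, "photo": 0.1}
pages, images = build_focused_graph({"logo1"}, page_images, page_links,
                                    logo_prob, stop_images={"banner"})
```

In this toy graph the banner is dropped as a stop image and the photo is dropped for its low logo probability.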

CIVR'07   12
Example of Focused Graph

 Create the focused graph F
 Weighted links: image similarity between
  queries and images is used for regulating
  the influence of links in F
 Authorities: principal eigenvector of
     W: page to page relationships in F
     M: page to image relationships in F
 Answers: Rank answers by authority
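
A rough sketch of how authority scores might be computed is given below. The slide does not state the matrix whose principal eigenvector is taken; this example assumes the co-citation form [(W+I)M]^T[(W+I)M] known from PicASHOW, and it assumes that link weighting means scaling each page-to-image entry of M by the query similarity of that image. Both choices, and all function names, are assumptions for illustration only.

```python
def mat_mul(a, b):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def principal_eigenvector(matrix, iterations=100):
    """Power iteration with L1 normalization on a square non-negative matrix."""
    n = len(matrix)
    vec = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in nxt)
        if norm == 0:
            return vec
        vec = [x / norm for x in nxt]
    return vec


def image_authorities(W, M, similarity):
    """Authority score per image from page links W, page-image links M,
    and a query similarity per image (assumed weighting scheme)."""
    n_pages = len(W)
    weighted_m = [[M[p][i] * similarity[i] for i in range(len(similarity))]
                  for p in range(n_pages)]
    w_plus_i = [[W[p][q] + (1.0 if p == q else 0.0) for q in range(n_pages)]
                for p in range(n_pages)]
    reach = mat_mul(w_plus_i, weighted_m)
    cocitation = mat_mul(transpose(reach), reach)
    return principal_eigenvector(cocitation)


# Page 0 links to pages 1 and 2; image 0 is on pages 1 and 2, image 1 on page 2.
W = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
M = [[0, 0], [1, 0], [1, 1]]
unweighted = image_authorities(W, M, [1.0, 1.0])
weighted = image_authorities(W, M, [0.1, 1.0])
```

With equal similarities the more widely linked image 0 ranks first; once image 0 is a poor match for the query, the weighting lets image 1 overtake it, which is the trade-off between importance and relevance the slides describe.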

 Database assembled locally by
     1.5M pages with images
 Text queries: VSM, PicASHOW,
 Image queries (example image +
  text): VSM, WPicASHOW
 Average Precision/Recall on top 30
Text Queries

Queries by text and image

 VSM: Relevant but not always
  important answers
 PicASHOW retrieves important but
  not always relevant answers
 WPicASHOW: good compromise
  between relevance and importance
 The size of the data set is a problem

Web Implementation
 Try the system at
 Selection of retrieval method
 Link analysis methods
 And more...

