
Weighted Link Analysis for Logo and Trademark Image Retrieval


Searching for Logo and Trademark Images on the Web

            Euripides G.M. Petrakis*
             Epimenidis Voutsakis*
               Evangelos Milios**
    *Technical University of Crete, Chania, Greece
       **Dalhousie University, Halifax, Canada
Retrieval of Logo & Trademarks
 Important characteristic signs of
  corporate Web sites and of products
  presented there
 Comprise 32.6% of the total number of
  images on the Web
     Retrieval of logos & trademarks is of
      significant commercial interest
     E.g., detection of unauthorized usage

CIVR'07       http://www.intelligence.tuc.gr/intellisearch   2
Image Retrieval on the Web
 Text queries: keywords, free text
 Answers: images in Web pages with similar
  text
     Images not always relevant or relevant but not
      important
     Important: From corporate web sites,
      organizations
     Less important: From individuals and small
      companies
 Link analysis: assign higher ranking to
  answers from important web sites

Contributions
 Enhance accuracy of retrievals
     Support queries by image example
     Preference to images from important
      Web sites
 Evaluation of state-of-the-art methods
         Retrieval by Text
         Retrieval by Image content
         Retrieval by importance
         Combination of the above

Image Content Representation
 Text surrounding images in Web pages
     Image filename, Alternate text, Page title,
      Caption
 Image features computed on Intensity &
  Energy histograms
     Mean & Variance on histograms
     Moment invariants on raw images
     Count of number of distinct intensity levels
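
A minimal sketch of the histogram-based features listed above, assuming 8-bit grayscale pixels supplied as a non-empty flat list. The function names are hypothetical, and reading "mean & variance on histograms" as the mean and variance of the intensity distribution is an assumption; the slides do not spell out the definition.

```python
def intensity_histogram(pixels, levels=256):
    """Normalized histogram: fraction of pixels at each intensity level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_features(pixels, levels=256):
    """Return (mean, variance, distinct_levels) for a flat list of intensities."""
    hist = intensity_histogram(pixels, levels)
    # Mean and variance of the intensity distribution (assumed interpretation).
    mean = sum(i * h for i, h in enumerate(hist))
    variance = sum((i - mean) ** 2 * h for i, h in enumerate(hist))
    # Logos tend to use few intensity levels, so count the non-empty bins.
    distinct_levels = sum(1 for h in hist if h > 0)
    return mean, variance, distinct_levels
```

For a two-tone image such as `[0, 0, 255, 255]` the count of distinct levels is 2, which is the kind of low value the detection step later relies on.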




Histograms
 Intensity Spectrum: distribution of
  intensity values
  [Figure: intensity histogram; x-axis: Bins (1 to 241), y-axis: 0 to 0.025]
 Energy Spectrum: distribution of
  average energy over concentric
  rings on the DFT
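
The energy spectrum can be sketched as follows, under stated assumptions: a small grayscale image given as a list of rows, a naive (slow) 2D DFT written out for clarity, and rings of equal width measured from the zero-frequency term. The ring count, the ring-width rule, and the function names are all illustrative choices, not taken from the slides.

```python
import cmath
import math


def dft2(image):
    """Naive 2D discrete Fourier transform, O(N^4); fine for tiny images only."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def ring_energy_histogram(image, n_rings=4):
    """Average energy |F(u,v)|^2 over concentric rings around the DC term."""
    rows, cols = len(image), len(image[0])
    spectrum = dft2(image)
    max_radius = math.hypot(rows / 2, cols / 2)
    sums = [0.0] * n_rings
    counts = [0] * n_rings
    for u in range(rows):
        for v in range(cols):
            # Map indices above the midpoint to negative frequencies.
            fu = u if u <= rows // 2 else u - rows
            fv = v if v <= cols // 2 else v - cols
            radius = math.hypot(fu, fv)
            ring = min(int(radius / max_radius * n_rings), n_rings - 1)
            sums[ring] += abs(spectrum[u][v]) ** 2
            counts[ring] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

A flat image puts all its energy in the innermost ring, while an image with sharp edges spreads energy to the outer rings. In practice a fast transform from a numerical library would replace `dft2`.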
Logo & trademark Detection
 Distinguish from images of other categories
     Small images
     Few intensity levels
     Rich frequency content
 Image features form vectors which are
  used to train a decision tree
     Accuracy: 85%
 Each image is assigned a probability of
  being a logo or trademark
     Retrieval gives more emphasis to images with
      high logo-trademark probability

Logo-Trademark Similarity
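 $S_{\text{image-similarity}}(Q,D) = S_{\text{features}} + S_{\text{text}}$
 $S_{\text{features}} = S_{\text{moment-invariants}} + S_{\text{intensity-histogram}} + S_{\text{energy-histogram}}$
 $S_{\text{text}} = S_{\text{image-caption}} + S_{\text{file-name}} + S_{\text{alt-text}} + S_{\text{page-title}}$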


Image Retrieval by Text
 Compute text similarity between
  Image and Query text descriptions
  using Vector Space Model (VSM)
     Text is represented by vectors of tf.idf
      term weights
     $Q = (q_1, q_2, \ldots, q_N)$, $D = (d_1, d_2, \ldots, d_N)$
     Similarity:
      $$S(Q, D) = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2 \, \sum_i d_i^2}}$$
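
The Vector Space Model similarity is the cosine between two tf.idf vectors. A minimal sketch, assuming texts are already tokenized and using the plain `tf * log(N / df)` weighting (the slides do not say which tf.idf variant is used); the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_similarity(q, d):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)
```

With this weighting, a term that appears in every document gets weight zero, so two descriptions that share only such a term score 0.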



Retrieval by Image features
 The similarity between histograms is
  computed by their intersection
 The similarity between moment
  invariants is computed as vector
  similarity
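
Histogram intersection can be sketched in a few lines. This assumes both histograms have the same number of bins and are normalized to sum to 1, so the result lies in [0, 1]; the function name is hypothetical.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))
```

For example, `histogram_intersection([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])` gives 0.5, since only half of the mass overlaps.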




Link Analysis
 Assign importance to Web pages,
  images
 Main idea: co-cited and co-contained
  images are likely to be related
 PageRank and HITS for text retrieval
 PicASHOW for Web pages with
  images using links alone
 WPicASHOW handles image and text
  content in queries and Web pages
Focused graph F
 Retrieve initial set F of images
 Stop images (banners, buttons) are filtered
  out
 Non-logo/trademarks are filtered out
  (based on probability)
 Expand F with pages pointing to images in
  F
 Expand F with pages and images pointed to
  by pages in F
 Repeat until F sufficiently large
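
The expansion steps above can be sketched as a loop over two link tables. Everything named here is an assumption for illustration: `page_images` maps a page to the images it contains, `page_links` maps a page to the pages it links to, `logo_prob` holds the detector's probability per image, and the 0.5 threshold, size limit, and round limit are placeholders. Applying the same filter to images added during expansion is also an assumption.

```python
def build_focused_graph(seed_images, page_images, page_links, logo_prob,
                        stop_images=frozenset(), min_prob=0.5,
                        max_size=1000, max_rounds=3):
    """Return (pages, images) of the focused graph F grown from seed_images."""

    def keep(image):
        # Drop stop images (banners, buttons) and unlikely logos.
        return image not in stop_images and logo_prob.get(image, 0.0) >= min_prob

    images = {img for img in seed_images if keep(img)}
    pages = set()
    for _ in range(max_rounds):
        size_before = len(pages) + len(images)
        # Pages pointing to (containing) images already in F.
        pages |= {p for p, imgs in page_images.items() if set(imgs) & images}
        # Pages and images pointed to by pages in F.
        for page in list(pages):
            pages |= set(page_links.get(page, ()))
            images |= {img for img in page_images.get(page, ()) if keep(img)}
        size = len(pages) + len(images)
        if size >= max_size or size == size_before:
            break
    return pages, images
```

The loop stops when F reaches `max_size`, when a round adds nothing new, or after `max_rounds` rounds, which is one way to read "repeat until F sufficiently large".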

Example of Focused Graph




WPicASHOW
 Create the focused graph F
 Weighted links: image similarity between
  queries and images is used for regulating
  the influence of links in F
 Authorities: principal eigenvector of
  $[(W+I)M]^T (W+I)M$
     W: page to page relationships in F
     M: page to image relationships in F
 Answers: Rank answers by authority
  (eigen)value
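
A minimal sketch of the authority computation, assuming the matrix is read as $[(W+I)M]^T (W+I)M$ and that any similarity weights are already folded into the entries of `W` and `M` (the slide does not say exactly how the weights enter). Power iteration stands in for a full eigensolver, and all names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def image_authorities(W, M, iterations=100):
    """Principal eigenvector of [(W+I)M]^T (W+I)M, one score per image.

    W: pages x pages link matrix; M: pages x images containment matrix.
    """
    n = len(W)
    w_plus_i = [[W[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    a = matmul(w_plus_i, M)          # pages x images
    c = matmul(transpose(a), a)      # images x images co-citation matrix
    v = [1.0] * len(c)
    for _ in range(iterations):
        nxt = [sum(cij * x for cij, x in zip(row, v)) for row in c]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        v = [x / norm for x in nxt]
    return v
```

With two pages where page 0 links to page 1 and each page holds one image, the image on the linked-to page receives the higher authority score, so it would be ranked first.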

Evaluation
 Database assembled locally by
  crawler
     1.5M pages with images
 Text queries: VSM, PicASHOW,
  WPicASHOW
 Image queries (example image +
  text): VSM, WPicASHOW
 Average Precision/Recall on top 30
  answers
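
Precision and recall over the top 30 answers can be sketched as below. This assumes each query comes with a ranked answer list and a set of relevant items judged by hand; dividing precision by the number of answers actually returned (rather than by 30) is an illustrative choice, and the function names are hypothetical.

```python
def precision_recall_at_k(ranked, relevant, k=30):
    """Precision and recall computed on the first k ranked answers."""
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(results, k=30):
    """results: list of (ranked_answers, relevant_set), one pair per query."""
    pairs = [precision_recall_at_k(ranked, rel, k) for ranked, rel in results]
    n = len(pairs)
    if n == 0:
        return 0.0, 0.0
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)
```

Averaging both values over all test queries gives one precision and one recall figure per retrieval method, which is what allows the methods to be compared.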
Text Queries




Queries by text and image




Conclusions
 VSM: Relevant but not always
  important answers
 PicASHOW retrieves important but
  not always relevant answers
 WPicASHOW: good compromise
  between relevance and importance
 The size of the data set is a problem


Web Implementation
 Try the system at
  http://www.intelligence.tuc.gr/intellisearch
 Selection of retrieval method
 Link analysis methods
 And more


