

Proceedings of the 7th WSEAS International Conference on Multimedia, Internet & Video Technologies, Beijing, China, September 15-17, 2007 233
XIRS: an XML-based Image Retrieval System
G. N. FANZOU TCHUISSANG (1), XU DE (2), WANG N. (3)
School of Computer & Information Technology
Beijing Jiaotong University
P.O. Box: 100044 Beijing
P.R. CHINA
1) fanzounar2002@yahoo.fr 2) dxu@bjtu.edu.cn 3) nwang@bjtu.edu.cn

FRANÇOIS SIEWE
Team Research Group
De Montfort University
The Gateway
Leicester LE1 9BH
UNITED KINGDOM
fsiewe@yahoo.fr
Abstract: This paper presents a formalization of an image retrieval system based on a notion of similarity between images in a multimedia database (namely an XML-Enabled Database), where a user request can be an image file or a keyword. CBIR (Content Based Image Retrieval) systems and current search engines (e.g. Google, Yahoo…) make image search possible only when the query is a keyword. This type of search is limited because keywords are not expressive enough to describe all important characteristics of an image. For example, an exact match request cannot be formulated in such systems. Thus, we propose a search system in which a request might be an image file or a keyword. The MPEG-7 standard is used for describing an image as an XML document. We define a similarity distance between images, which is used to compare the request image with the images of a database. We also propose an algorithm to calculate a similarity distance between two XML nodes with a given precision ‘k’ (k is defined by the user, who can set ‘k’ to 100% for exact match retrieval of features) so as to provide accurate information in response to a user request. The statistics show that our system is more efficient than leading content based image retrieval systems such as ERIC7 and current search engines.
Key-Words: XML, Image, Multimedia databases, MPEG-7, similarity search
1 Introduction
This paper falls in the field of information retrieval, in particular the search for images in a database when the request is an image or a keyword. The purpose of the search process is to obtain the information the user needs from a database by comparing the user’s requirements with the information available in the database. This comparison is carried out by a System of Search for Information (SSI) [2], which is a set of programs whose goal is to return to the user as many relevant documents as possible that meet the user's needs. These needs are expressed by the user in a structured way, in the form of requests. Since the concept of relevance is difficult to automate, the goal of the SSI is to make the correspondence between the system relevance and the user relevance as accurate as possible.
CBIR (Content Based Image Retrieval) systems and current search engines (CSE) (Google, Yahoo…) make image search possible only when the query is a keyword. This type of search is limited because keywords are not expressive enough to describe all important characteristics of an image. To resolve this problem, ERIC7 [3], a CBIR system compatible with the MPEG-7 [4] multimedia standard, lets the user search images by features. In ERIC7 the user can choose between 15 features by navigating within XML files using a tool that generates UML diagrams. However, ERIC7 is limited because the user must be an expert in image search to recognize these features, and must also be able to read and understand XML files and UML diagrams. The image search offered by ERIC7 is therefore tedious. We also observe that an exact match request cannot be formulated in such systems.
In this work, we propose a search system in which a request might be an image file or a keyword. We describe an image as an XML document using the MPEG-7 [4] standard. We define a similarity distance between images, which is used to compare the features of a request image to those of the images stored in a database. We also propose an algorithm to calculate a similarity distance between two XML nodes with a given precision ‘k’ (k is defined by the user, who can set ‘k’ to 100% for exact match retrieval of features) so as to provide accurate information in response to a user request. The statistics show that our system is more efficient than leading content based image retrieval systems such as ERIC7 and the current search engines.
This paper is organized as follows: Section 2 presents an outline of MPEG-7; Section 3 describes the XIRS system; Section 4 is devoted to the implementation and the discussion; Section 5 concludes the paper and outlines future work.
2 MPEG-7
The ISO subcommittee SC29, working group WG11, MPEG (Moving Picture Experts Group), published in February 2002 another standard called "Multimedia Content Description Interface" (in short 'MPEG-7'). The goal of MPEG-7 is to enable fast and effective search and filtering of multimedia content. MPEG-7 is a standardization of XML metadata structures called Descriptors (D) and Description Schemes (DS), which are used to describe and annotate multimedia information [4].
The Ds and DSs are defined using the MPEG-7 Description Definition Language (DDL), which is based on the XML Schema Language. Many technologies still need to be developed around MPEG-7 for extracting, searching and querying multimedia databases, which involves similarity matching with fuzzy constraints including features, content and semantics [6].

3 XIRS (XML Image Retrieval System)
XIRS is a set of 3 components: the XIRS Mediator, the interrogation module, and the XIRS Server. Starting from the feature extraction and annotation process of a multimedia asset, the XML documents are generated and stored in a repository. One can distinguish two scenarios: pull and push.
In the pull scenario, a user submits queries to the system. In the push scenario, the system selects a set of results satisfying the user query constraints (Fig. 1).
[Fig. 1 XIRS Architecture. Components shown: Users (Search / Browser, Query, Result, XSLT + CSS); XIRS MEDIATOR (MPEG-7 features extraction from images, producing an XML Doc / query node); INTERROGATION MODULE (Pull and Push paths, Filter, Applications); XIRS SERVER (Indexation, Storage Repository, XML-Enabled DataBase holding images as Clob, Blob,… and an XML Forest).]
3.1 XIRS Mediator
An image is represented as a set of descriptors (features) which are structured as XML nodes and stored in an XML document (Fig. 2). The image and the XML document are then stored in the Database (an XML-Enabled Database, for example). The XML document used by our system is obtained by combining a part of the MPEG-7 document (VisualDescriptors) with other information coming from the tables of the database where the images are stored (MetadataDescriptors).

[Fig. 2 XIRS Mediator Scope. (1) A <MetadataDescriptor> (<title>, <Keyword>, <Date>, …) taken from the index of the storage repository (Clob, Blob…) and (2) a <VisualDescriptor> of MPEG-7 descriptors (<ScalableColor>, <ColorLayout>, <DominantColor>, …) are combined (1+2=3) into (3) an <Image id=001> XML node, used for encoding and delivery.]

Given an image, the XIRS Mediator extracts the two description levels of interest:
- «Visual Descriptors», extracted from the image by MPEG-7;
- «Metadata descriptors»: the XML document is completed with information describing the semantics and contents of the image (free keywords, its author, its size, its creation date...).
We present below the DTD of the XML documents constructed by XIRS Mediator.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Images [
<!ELEMENT Images (Image*)>
<!ELEMENT Image (MetadataDescriptor, VisualDescriptor)>
<!ATTLIST Image id CDATA #REQUIRED>
<!ELEMENT MetadataDescriptor (ContentDescriptor, SemanticDescriptor)>
<!ELEMENT ContentDescriptor (keyword*, identifier, date, link, size)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT identifier (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ELEMENT SemanticDescriptor (title*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT VisualDescriptor (DayNight, Orientation, ShotType, IntExt, ScalableColor, ColorLayout, DominantColor)>
<!ELEMENT DayNight (#PCDATA)>
<!ELEMENT Orientation (#PCDATA)>
<!ELEMENT ShotType (#PCDATA)>
<!ELEMENT IntExt (#PCDATA)>
<!ELEMENT ScalableColor (Coefficient*)>
<!ATTLIST ScalableColor NumberOfCoefficients CDATA #REQUIRED>
<!ELEMENT Coefficient (#PCDATA)>
<!ELEMENT ColorLayout (Ycoeff, CbCoeff, CrCoeff)>
<!ATTLIST ColorLayout NumOfYCoef CDATA #REQUIRED>
<!ELEMENT Ycoeff (YDCCoef, YACCoef)>
<!ELEMENT YDCCoef (#PCDATA)>
<!ELEMENT YACCoef (#PCDATA)>
<!ELEMENT CbCoeff (CbDCCoef, CbACCoef)>
<!ELEMENT CbDCCoef (#PCDATA)>
<!ELEMENT CbACCoef (#PCDATA)>
<!ELEMENT CrCoeff (CrDCCoef, CrACCoef)>
<!ELEMENT CrDCCoef (#PCDATA)>
<!ELEMENT CrACCoef (#PCDATA)>
<!ELEMENT DominantColor (ColorSpaceType, SpatialCoherency, Percentage, ColorValueIndex, ColorVariance)>
<!ATTLIST DominantColor size CDATA #REQUIRED>
<!ELEMENT ColorSpaceType (#PCDATA)>
<!ELEMENT SpatialCoherency (#PCDATA)>
<!ELEMENT Percentage (#PCDATA)>
<!ELEMENT ColorValueIndex (#PCDATA)>
<!ELEMENT ColorVariance (#PCDATA)>
]>

Example of XML Document
As an example, let us concentrate on the image of the gate of Beijing Jiaotong University.

<?xml version="1.0" encoding="UTF-8"?>
<Images>
<Image id="001">
<MetadataDescriptor>
<ContentDescriptor>
<identifier>bjtu001</identifier>
<keyword>Jiaotong University</keyword>
<keyword>north Gate</keyword>
<link>....\bjtu001.jpg</link>
<size>6k</size>
<date>02/06/2007</date>
</ContentDescriptor>
<SemanticDescriptor>
<title>Beijing Jiaotong University north gates</title>
</SemanticDescriptor>
…
</MetadataDescriptor>
<VisualDescriptor>
<DayNight>day</DayNight>
<Orientation>vertical</Orientation>
<ShotType>general</ShotType>
<IntExt>out</IntExt>
<ScalableColor NumberOfCoefficients="63">
<Coefficients> 1 1 0 1 1 0 0 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0 1 0 0 </Coefficients>
</ScalableColor>
<ColorLayout NumOfYCoef="64">
<Ycoeff>
<YDCCoef>13</YDCCoef>
<YACCoef> 27 23 2 16 10 14 16 9 9 17 14 13 16 16 16 16 14 15 17 17 16 16 17 16 14 15 </YACCoef>
</Ycoeff>
…
</ColorLayout>
<DominantColor size="8">
<ColorSpace type="RGB"/>
<SpatialCoherency>0.3722258333336</SpatialCoherency>
<Values>
<Percentage>0.0838</Percentage>
<ColorVariance>23161.6189638.56.291</ColorVariance>
</Values>
</DominantColor>
</VisualDescriptor>
</Image> <!-- End of first image: Image of id '001' -->
</Images>

ColorLayout and DominantColor are low-level color features. IntExt indicates whether the image was taken outdoors or indoors; DayNight indicates whether the image was taken during the day or during the night; ShotType characterizes the framing of the subjects of the image. Together with Orientation, these are high-level features. For each of these features, a similarity distance is defined, which makes it possible to measure the similarity of two images.
Once the XIRS Mediator has described an image as an XML node, the node is categorized (to prevent overly bulky XML documents) and stored in one of the XML documents of the collection. The role of the XIRS Mediator is thus to map an image to XML and vice versa. The reverse direction is straightforward using the node: <link>...\bjtu001.jpg</link>

3.2 Interrogation Module
The data model of the XIRS interrogation module is a simplification of the XPath data model presented in [1], where a structured document is a tree composed of simple nodes, leaf nodes and attributes. A node can be a document, an element, a text, a namespace, a processing instruction or a comment. Two cases of request arise.

3.2.1 The request is a keyword
A request is a combination of sub-requests joined by Boolean operators, as illustrated by:
Query → sub-request AND sub-request | sub-request OR sub-request | NOT sub-request.
The Boolean model introduced in [2] defines the similarity between an image I and a request Q as:
\[
d(Q, I) =
\begin{cases}
1 & \text{if } I \in \text{the set described by the request } Q \\
0 & \text{otherwise}
\end{cases}
\]
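As a minimal sketch of the Boolean model of Section 3.2.1, assume each image is reduced to the set of its keyword texts. The snippet below parses a small `<Images>` document with Python's standard `xml.etree.ElementTree`, collects the `<keyword>` values of each `<Image>` node, and evaluates d(Q, I) for a query written as a nested tuple. The tuple encoding of queries, the helper names (`keyword_set`, `matches`, `d_boolean`) and the sample data are illustrative assumptions and are not part of XIRS.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely following the structure of the example document.
SAMPLE = """
<Images>
  <Image id="001">
    <MetadataDescriptor>
      <ContentDescriptor>
        <keyword>Jiaotong University</keyword>
        <keyword>north gate</keyword>
      </ContentDescriptor>
    </MetadataDescriptor>
  </Image>
  <Image id="002">
    <MetadataDescriptor>
      <ContentDescriptor>
        <keyword>Tsinghua University</keyword>
      </ContentDescriptor>
    </MetadataDescriptor>
  </Image>
</Images>
"""


def keyword_set(image_node):
    """Return the lower-cased keywords of one <Image> node."""
    return {kw.text.strip().lower() for kw in image_node.iter("keyword") if kw.text}


def matches(query, keywords):
    """Evaluate a query: a keyword string, ("AND"|"OR", q1, q2), or ("NOT", q)."""
    if isinstance(query, str):
        return query.lower() in keywords
    op = query[0]
    if op == "AND":
        return matches(query[1], keywords) and matches(query[2], keywords)
    if op == "OR":
        return matches(query[1], keywords) or matches(query[2], keywords)
    if op == "NOT":
        return not matches(query[1], keywords)
    raise ValueError(f"unknown operator: {op!r}")


def d_boolean(query, image_node):
    """Boolean model: 1 if the image belongs to the set described by the query, else 0."""
    return 1 if matches(query, keyword_set(image_node)) else 0


root = ET.fromstring(SAMPLE)
query = ("AND", "north gate", ("NOT", "Tsinghua University"))
scores = {img.get("id"): d_boolean(query, img) for img in root.iter("Image")}
print(scores)
```

With this sample, only the image whose keywords satisfy every sub-request receives a score of 1; the model gives no partial credit, which is why the image-based request of the next subsection relies on a graded distance instead.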
3.2.2 The request is an image
The comparison between an image and a request amounts to calculating a score. The relevance of an image with respect to the request is calculated by a similarity function denoted d(Q, I), where Q is the request image and I is an image of the Database. This leads to calculating a similarity distance between two XML nodes. We use the following notation: I = (I_1, I_2, …, I_m) for an image set and T = (t_1, t_2, …, t_n) for a keyword set. We describe the image I_j as a vector:
\[ \vec{I}_j = (w_{1,j}, w_{2,j}, \ldots, w_{i,j}, \ldots, w_{n,j}) \]
where w_{i,j} ∈ {0, 1} is the term weight. Let f_i denote the function that returns the weight associated with the term t_i: f_i(\vec{I}_j) = w_{i,j}.
The XML node produced by the XIRS Mediator and corresponding to the request image is regarded as a block of requests (like a system of equations with several unknowns), in which each sub-node (feature) is seen as a request. The task is thus to check, for a node coming from an XML document of the Database, that its sub-nodes are similar to those of the request node.
If a feature of an image is indexed by t_j and if t_j < t_k, then it is also indexed by t_k. Therefore, one can extend the vector I_i so that: ∀ j, k ∈ [1; n], w_{k,i} = 1 if w_{j,i} = 1 and t_j < t_k, otherwise w_{k,i} = 0. The usual similarity measure in the vector model [2] is the cosine:
\[
\cos(I_k; I_l) = \frac{I_k \ast I_l}{|I_k| \ast |I_l|}
= \frac{\sum_{j=1}^{n} w_{j,k} \times w_{j,l}}{\sqrt{\sum_{j=1}^{n} w_{j,k}^{2}} \times \sqrt{\sum_{j=1}^{n} w_{j,l}^{2}}}
\]
Hence, d(I_k; I_l) = 1 - cos(I_k; I_l) is a similarity distance between two images I_k and I_l.
The following grammar gives a complete description of the request language used. The axiom of the grammar is Query; non-terminal symbols are in bold, terminal symbols (tokens) are in italic, and the production rules are described below:
Query → r1 | r2
r1 → ExpressionA ExpressionB
ExpressionA → keyword SuiteExpressionA | ( keyword ) SuiteExpressionA
SuiteExpressionA → ExpressionA | ε
ExpressionB → BooleenOperator r1 | ε
BooleenOperator → OR | AND | NOT | ε
r2 → ExpressionStructure SuiteExpressionStructure
ExpressionStructure → elementName[ Condition ]
Condition → @attributName = keyword | r1 | ε
SuiteExpressionStructure → BooleenOperator ExpressionStructure | ε
Caption:
ε denotes an empty string
keyword: terminal symbol representing a keyword
elementName: terminal symbol representing the name of a tag
attributName: terminal symbol representing the name of an attribute

3.3 XIRS Zone Server
3.3.1 XIRS principle: Search for images by similarity

[Fig. 3 XIRS principle. A request image is described as an XML node (for example <Image id = 1> with <Title>bjtu</Title> and <Color> sub-nodes such as <Red> 12 10 23</Red> and <black> 06 11 30</black>), giving the query sub-nodes (q_{n0}, q_{n1}, ..., q_{nn}). This node is compared with the <Image> nodes of the XML documents (ids 1 to n, with titles such as Ferrari, bmw, tsinghua, pekin) to return the K-close neighbors. K can be 60%, 80% or 100% for exact match.]

The image request is a node; the task is to return all the nodes of the XML documents of the collection which are similar to the request node according to a precision ‘k’. A similarity distance ‘d’ between two nodes is defined by:
\[
d : N \to D, \qquad
\begin{pmatrix} s_{n0} \\ s_{n1} \\ \vdots \\ s_{nn} \end{pmatrix}
\mapsto
d\begin{pmatrix} s_{n0} \\ s_{n1} \\ \vdots \\ s_{nn} \end{pmatrix}
=
\begin{pmatrix} w_{0q} \\ w_{1q} \\ \vdots \\ w_{nq} \end{pmatrix}
\]
where s_{n0}, s_{n1}, …, s_{nn} are variables (sub-nodes representing the features of the image) coming from the XML documents of our Database, N is a set of nodes and D is a set of distances. The description of the image request (Fig. 3) being an XML node, q_{n0}, q_{n1}, …, q_{nn} are fixed and are the query sub-nodes; w_{0q}, w_{1q}, …, w_{nq} are the weights (similarity distances between features) associated with the sub-node s_{nl} compared to the request sub-node q_{nl}, with l ∈ [0, n], where l is the index of a sub-node within a node.

3.3.2 Construction of S: set of results
Definition 1: Two XML nodes are k-similar if ‘k’ percent of their sub-nodes (features) are identical.
Definition 2: A node belongs to S if and only if this node is k-similar to the node described by the request image, i.e. if AVG(w_{0q}, w_{1q}, …, w_{nq}) ≥ k.
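Definition 2 can be read as a simple filter over candidate nodes. The sketch below is a minimal illustration, assuming the per-feature weights w_0q ... w_nq have already been computed as percentages in [0, 100]; the function names and the sample weights are hypothetical.

```python
def is_k_similar(weights, k):
    """Definition 2: the node is kept when AVG(w_0q, ..., w_nq) >= k (percent)."""
    if not weights:
        return False  # assumption: a node without features never matches
    return sum(weights) / len(weights) >= k


def build_result_set(candidates, k):
    """Return the ids of the candidate nodes whose average weight reaches k.

    `candidates` maps a node id to its list of feature weights (percent).
    """
    return [node_id for node_id, weights in candidates.items() if is_k_similar(weights, k)]


# Hypothetical weights for three database nodes compared with one request node.
candidates = {
    "001": [100, 100, 100],  # all features identical
    "002": [100, 90, 70],    # average about 86.7
    "003": [40, 55, 60],     # average about 51.7
}

print(build_result_set(candidates, 80))   # nodes that are at least 80% similar
print(build_result_set(candidates, 100))  # exact match only
```

Setting k to 100 keeps only nodes whose features all carry the maximal weight, which corresponds to the exact match request described in the introduction.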
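The cosine-based distance d(I_k; I_l) = 1 - cos(I_k; I_l) of Section 3.2.2 can be sketched in a few lines. This is an illustrative sketch only, assuming each image is given as a vector of binary term weights over the same keyword set T; the function name and the convention chosen for an all-zero vector are assumptions.

```python
import math


def cosine_distance(weights_k, weights_l):
    """Return 1 - cos(I_k; I_l) for two equal-length weight vectors."""
    if len(weights_k) != len(weights_l):
        raise ValueError("both vectors must cover the same keyword set")
    dot = sum(a * b for a, b in zip(weights_k, weights_l))
    norm_k = math.sqrt(sum(a * a for a in weights_k))
    norm_l = math.sqrt(sum(b * b for b in weights_l))
    if norm_k == 0 or norm_l == 0:
        return 1.0  # assumption: an image with no indexed term is maximally distant
    return 1.0 - dot / (norm_k * norm_l)


# Binary term weights over a hypothetical keyword set T = (t1, t2, t3, t4).
image_a = [1, 1, 0, 0]
image_b = [1, 1, 0, 0]
image_c = [0, 0, 1, 1]
image_d = [1, 0, 0, 0]

print(cosine_distance(image_a, image_b))  # identical vectors: distance close to 0
print(cosine_distance(image_a, image_c))  # no shared term: largest distance
print(cosine_distance(image_a, image_d))  # one shared term out of two
```

Because of floating-point rounding, the distance between identical vectors may come out as a tiny positive number rather than exactly zero, so comparisons against a threshold should allow a small tolerance.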
- Construction of S
For Each node s_n of an XML document of the Database
    If (w_{0q} + w_{1q} + ... + w_{nq}) / (n + 1) ≥ k Then
        S ← S + s_n
    Else
        S ← S + { } ;
        take another node
    EndIf
End For
w_{iq} is the similarity distance between features.

- Calculation of w_{iq}
If tag(ssn_i, qsn_i) = true then Keyword = content-qn_i (1)
    /* the weights between the contents of the tags of ssn and those of qsn must now be calculated */
    if dB(ssn, qsn) = 1 then w_{iq} = 100;
    else
\[
w_{iq} = \frac{\sqrt{\sum_{j=1}^{n} w_{j,k}^{2}} \times \sqrt{\sum_{j=1}^{n} w_{j,l}^{2}} - \sum_{j=1}^{n} w_{j,k} \times w_{j,l}}{\sqrt{\sum_{j=1}^{n} w_{j,k}^{2}} \times \sqrt{\sum_{j=1}^{n} w_{j,l}^{2}}} \times 100
\]
else take another sub-node

Here ssn is a sub-node (sub-feature), qsn is a query sub-node, and dB(ssn, qsn) is the Boolean model distance.
(1) fixes the contents of qn_i as a keyword, i.e. <qn_i> content-qn_i </qn_i>

tag(a, b) is a function which returns true when the tags of its arguments are similar (according to the precision k).
Example: if k = 80%
tag(<name> fanzous </name>, <names> fanzoug </names>) == true
d(a, b) is the function which returns the percentage of similarity between the data of two sub-nodes.
Example:
d(<name> fanzous </name>, <names> fanzoug </names>) == 90%

4 Implementation and discussion
We used PHP 5.0 to build an interrogation interface (Fig. 4). An Oracle 8i database was used for storage of the images and the XML documents. We used an MPEG-7 library to implement the XIRS Mediator.
We created a Database of 1 500 images. After categorization of the images, we obtained 15 XML documents in our collection. The evaluation was conducted on a computer with a 1.6 GHz processor, an 80 GB hard disk and 512 MB of RAM. The operating system was Windows XP SP2.

[Fig. 4 XIRS interface]

To validate our system, we measured the precision of retrieval (PR), i.e. the percentage of similarity between the query and the result. We believe that a better and more accurate measure could be achieved by using this metric.

- The precision of retrieval (PR)
To choose an appropriate set of queries for the evaluation, we considered the types of queries used in basic search operations:
Exact match search (Fig. 5 a): when the value of k is equal to 100%, XIRS returns only the XML nodes identical to the XML node of the request image, and thus the returned images are those identical to the request image. Out of 100 images returned by ERIC7, 40 are totally different from the request image, depending on the features given by the user. With CSE, 70% of the returned images are not similar.
Full text search (Fig. 5 b): the PR of ERIC7 is 88.4% while that of XIRS is 88.3%, due to the database clustering done by ERIC7.
Semantic search (Fig. 5 c): XIRS is about 35% more efficient than ERIC7, due to the semantic descriptors inserted in the XML nodes by the XIRS Mediator.

[Fig. 5 Experiment results. Precision of retrieval PR(%) of XIRS, ERIC7 and CSE in three panels: (a) exact match search, (b) full text search, (c) semantic search.]

- Applications
Submitting an image for similarity search in a Database can be valuable in hospitals, to help find the diagnosis associated with radiographic images. It can also be used to implement iconic communication systems such as those described in [5].
5 Conclusion
In this paper, we have defined a search system for images in which the request is an image or a keyword. The user can formulate information requirements with a given precision K. The similarity between two images is defined by the similarity between the two XML nodes representing the two images. An evaluation of XIRS shows the effectiveness of this system compared with CBIR systems and the Current Search Engines (CSE) (Google, Yahoo…) for image search.
The reformulation of requests, the use of several images as the request (iconic sentences, for example) and the handling of heterogeneous sources of images are directions for the continuation of this work.

References:
[1] S. Boag et al. (Eds), XQuery 1.0: An XML Query Language, W3C Working Draft, 2003.
[2] G. N. Fanzou T., XIRL: XML Information Retrieval, Mémoire de DEA, University of Yaounde 1, Cameroon, 2006.
[3] L. Gagnon, S. Foucher, V. Gouaillier, ERIC7: An Experimental Tool for Content-Based Image Encoding and Retrieval under the MPEG-7 Standard, R&D Department, Computer Research Institute of Montreal, 2004.
[4] H. Kosch, MPEG-7 and Multimedia Database Systems, SIGMOD Record, Vol. 31, No. 2, June 2002.
[5] N. C. Kuicheu, P. L. Fotso, F. Siewe, Iconic Communication System by XML Language (SCILX), in Proceedings of the 2007 ACM International Cross-Disciplinary Conference on Web Accessibility, Banff, Canada, 7-8 May 2007.
[6] J. M. Martínez, MPEG-7 Overview (version 10), Palma de Mallorca, 2004.