


        IV MSc CST
                                                                   Multimedia Databases

                                    CHAPTER 1


1.1 Introduction
Multimedia data typically means digital images, audio, video, animation and graphics
together with text data. There are a number of data types that can be characterized as
multimedia data types. The basic types can be described as follows:

       1. Text: The form in which text is stored can vary greatly. In addition to
          ASCII-based files, text is typically stored in word-processor files, spreadsheets,
          databases and annotations on more general multimedia objects. With the
          availability and proliferation of GUIs and text fonts, the job of storing text is
          becoming more complex, as special effects (color, shading, etc.) must be
          supported.
       2. Images: There is great variance in the quality and size of storage for still
          images. Digitized images are sequences of pixels that represent a region of
          the user's graphical display. The space overhead for still images varies on the
          basis of resolution, size, complexity, and the compression scheme used to
          store the image. Popular image formats include JPEG, PNG, BMP and TIFF.
       3. Audio: An increasingly popular data type being integrated into most
          applications is audio. It is quite space intensive: one minute of sound can take
          up to 2-3 MB of space. Several techniques are used to compress it into a
          suitable, smaller form.
       4. Video: One of the most space-consuming multimedia data types is digitized
          video. Digitized video is stored as a sequence of frames. Depending on its
          resolution and size, a single frame can consume up to 1 MB. In addition,
          realistic video playback requires a continuous transfer rate for the
          transmission, compression, and decompression of the digitized data.
       5. Graphic objects: These consist of special data structures used to define 2D
          and 3D shapes, from which multimedia objects can be defined. They include
          the various formats used by image and video editing applications.
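The storage figures quoted above can be made concrete with a short sketch. The constants below (3 bytes per pixel, roughly 2.5 MB per minute of audio, roughly 1 MB per raw video frame) are the ballpark numbers from this section, not exact values for any particular format or codec:

```python
# Rough storage estimates based on the figures quoted in the text.
# All constants are illustrative ballpark values, not codec-accurate.

MB = 1024 * 1024

def image_bytes(width, height, bytes_per_pixel=3):
    """Raw (uncompressed) size of a still image."""
    return width * height * bytes_per_pixel

def audio_bytes(minutes, mb_per_minute=2.5):
    """Approximate size of uncompressed audio (~2-3 MB per minute)."""
    return int(minutes * mb_per_minute * MB)

def video_bytes(seconds, fps=30, frame_mb=1.0):
    """Raw video as a sequence of frames, ~1 MB per frame."""
    return int(seconds * fps * frame_mb * MB)

print(image_bytes(1920, 1080) / MB)   # one Full HD frame, ~5.9 MB raw
print(video_bytes(60) / MB)           # one raw minute of video, ~1800 MB
```

A single uncompressed Full HD frame already takes about 6 MB, which is why compression schemes matter so much for images and video.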


1.1.1 Need for Multimedia Data

        The acquisition, generation, storage and processing of multimedia data in
computers, and its transmission over networks, have grown tremendously in the recent
past. This astonishing growth has been made possible by three factors. First, personal
computers have become widespread and their computational power has increased;
technological advances have produced high-resolution devices that can capture and
display multimedia data (digital cameras, scanners, monitors, and printers), along with
high-density storage devices. Second, high-speed data communication networks are now
available: the Web has proliferated widely, and software for manipulating multimedia
data is readily obtainable. Third, certain existing and future applications need to work
with multimedia data. This trend is expected to continue in the days to come.

        Multimedia data has a number of exciting features. It can provide more
effective dissemination of information in science, engineering, medicine, modern
biology, and the social sciences. It also facilitates the development of new paradigms in
distance learning, and in interactive personal and group entertainment.

        The huge amount of data in different multimedia-related applications warrants
the use of databases, as databases provide consistency, concurrency, integrity, security
and availability of data. From a user perspective, databases provide functionality for the
easy manipulation, query and retrieval of highly relevant information from huge
collections of stored data.

        Multimedia Databases (MMDBs) have to cope with the increased usage of a
large volume of multimedia data in various software applications. The applications
include digital libraries, manufacturing and retailing, art and entertainment, journalism
and so on. Some inherent qualities of multimedia data have both direct and indirect
influence on the design and development of a multimedia database. MMDBs are
expected to provide almost all the functionality a traditional database provides. Apart
from that, a MMDB has to provide some new and enhanced functionality and features.
MMDBs are required to provide unified frameworks for storing, processing, retrieving,
transmitting and presenting a variety of media data types in a wide variety of formats. At
the same time, they must adhere to numerical constraints that are normally not found in
traditional databases.


1.1.2 How is Multimedia Data Different?
   1. The content of multimedia data is often captured with different “capture”
      techniques (e.g., image processing) that may be rather unreliable. Multimedia
      processing techniques need to be able to handle different ways of content capture
      including automated ways and/or manual methods.
   2. Queries posed by the user in multimedia databases often cannot be answered
      with a textual response. Rather, the answer to a query may be a complex
      multimedia presentation that the user can browse at his/her leisure. Queries to
      multimedia databases may thus be used to generate multimedia presentations
      that satisfy the user’s needs.
   3. Multimedia data is large, and this affects the storage, retrieval and transmission
      of multimedia data.
   4. In case of video and audio databases time to retrieve information may be critical
      (Video on demand).
   5. Automatic feature extraction and indexing: In conventional databases, the user
      explicitly supplies the attribute values of objects inserted into the database. In
      contrast, multimedia databases rely on advanced tools, such as image processing
      and pattern recognition tools for images, to extract the various features and
      content of multimedia objects. As the size of the data is very large, special data
      structures are needed for storage and indexing.
   6. Multimedia objects are context dependent.
   7. Queries looking for multimedia objects are fuzzy in nature.
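To illustrate point 5, here is a minimal, self-contained sketch of automatic feature extraction: a coarse colour histogram computed from raw pixel data. Real systems use image processing libraries; here an image is simply a list of (r, g, b) tuples, and the bin count is an arbitrary illustrative choice:

```python
# Minimal sketch of content-based feature extraction: quantize each
# colour channel into a few bins and count pixel occurrences. The
# normalized histogram can then serve as an index for similarity search.

def color_histogram(pixels, bins=4):
    """Quantize each channel into `bins` levels and count occurrences."""
    step = 256 // bins
    hist = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    # Normalize so that images of different sizes stay comparable.
    total = len(pixels)
    return {k: v / total for k, v in hist.items()}

# Two red-ish pixels and one blue pixel.
h = color_histogram([(250, 10, 10), (240, 20, 5), (10, 10, 250)])
print(h)
```

Such a histogram is one example of "media feature data": it is derived automatically from the content rather than typed in by the user.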

1.2 Applications
Many applications can make use of multimedia information to enhance the quality of
their products. The broadcast companies create and broadcast television programs to the
viewers. Cable television companies such as iCable and OptusVision in Australia
transmit their encrypted audio and video programs via dedicated network cables to the
set-top box. The set-top box then decrypts and transmits these television signals to the
television. The viewer can thus watch them on the television.

Television can also be provided via the Internet. Some Web sites containing live radio
and live television programs are available for listeners and viewers. Audience members
who have missed some programs may select to watch them again via browsers.

Movie producers create digital movies using computers and allow paid viewers to watch
them. They may allow everyone to watch the advertising materials to attract more
viewers. The music companies may produce song albums for artists. Amateur artists may
directly produce their songs and publish them to increase their personal fame.
Video-on-demand, or interactive TV, systems show videos to viewers who have
subscribed to watch them. They transmit selected video and audio objects according
to the user’s choice. Education-on-demand systems provide video of course lectures to
students enrolled in the course. They help students learn at their own pace. News-
on-demand and sports-on-demand systems can provide instantaneous news and sports
information to interested viewers.

Remote communication and cooperation can be achieved by transmitting video and audio
information. Video telephones transmit voice and a small video image over broadband
networks. Microsoft NetMeeting® and CUSeeMe® provide video conferencing over
computers connected to the network. Collaborative computing can be achieved by
synchronizing working tasks over remote communications. Video e-mails may also
enhance asynchronous communications. Voice-over-IP software reduces international
telephone calls charges by using the Internet.
Commercial companies may install security monitoring systems that provide around-the-
clock monitoring for the office and factory areas. Advanced systems may provide
automatic alerts when too many video cameras are being watched by a few security
officers. Multimedia information can also provide automatic quality control to enhance
production. Video cameras can take images of products. Products with significant defects
will be filtered and removed from the production line.

Visual information systems interactively search the multimedia databases using image
and audio information. Many libraries have digitized their books and journals. With the
support of government, many digital libraries have been built, and they are available to
visitors around the world. Some museums have created an online version of some of their
collections. These virtual museums allow virtual visitors to watch their collections online.
Hospitals install patient monitoring systems to monitor patients who are staying in
intensive care units. The Earth Observatory System records and stores video information
from satellites. The system produces petabytes (10^15 bytes) of scientific data per year.
Multimedia information has always been used in the entertainment industry. Interactive
video games can be enriched by high resolution graphics. Interactive stories can become
a reality for story readers who may make their choice on how a story proceeds and ends.

   A multimedia database is capable of handling huge volumes of multimedia objects,
which a general database fails to do effectively. For example:

      A multimedia database will help to create virtual museums.

      It will surely help in developing multimedia applications in various fields like
       teaching, medical sciences and libraries.

      It helps in preserving decaying photographs, maps and films of historical
       evidence or national importance.

      Using a multimedia database, we can develop excellent teaching packages.

      It helps multi-user operations.

   Educational Multimedia Services:

   Useful in distance education, where classes may be recorded and sent to the students,
   or in online audio/video/document notes, where students can access the records
   required by querying the database.

   Movie Industry:

   Video-on-demand has become the latest and a fast-evolving concept in the movie
   industry. The user could query the database for an English movie released in 2004
   which is a thriller but does not have Nicolas Cage in it.

   Expert Advice:

   In situations where we are stuck trying to operate a device, or a mechanic is not
   within reach to fix our machine or car, we can query the database from the net using
   a portable device and ask for the expert’s advice on that particular problem instead
   of reading through an entire manual.

   Travel Industry:

   Multimedia databases prove to be intelligent travel agents: they can show you places
   of travel that may suit your budget, your field of interest and your requirements.

   Home Shopping:

   The customer can browse through a company’s online facility, or any other available
   facility, and query for listed items with specifications like brand, cost, size, color, etc.

1.2.1 Major System Configurations
A multimedia application system has to consider the data storage and distribution
system, the data delivery network, and the delivery scheduling.


1.2.2 Data Storage and Distribution
Several data storage and distribution systems have been researched. These include the
centralized system, the storage area network (SAN), the content distribution network
(CDN), and the serverless or peer-to-peer (P2P) network.
The centralized system stores all the multimedia objects in one location. The storage area
network stores the multimedia objects on several servers. These storage servers are
connected over a local area network using optical fibres. The content distribution network
distributes the multimedia objects on servers that are spread over a wide area network.
Client requests are sent to the nearest server that contains the object to serve the request.
The serverless systems or peer-to-peer networks do not permanently store the objects on
the servers. The server containing the object will only serve the first few requests for the
object. Afterwards, the nodes that have the object will become the seed and serve other
clients. Thus, the server can become free, and it can be disconnected from the network.

1.2.3 Delivery Network and Scheduling
The data delivery network can be built by laying dedicated cables or by the Internet. The
multimedia objects can be delivered via broadcasting or video-on-demand (VOD)
systems. Depending on the delivery scheduling and the delivery network, at least four
types of system architectures can be built.
The interactive television (ITV) companies build their systems by broadcasting over
dedicated cables (Figure 1.1). In these systems, the users subscribe to an ITV company.
The ITV company broadcasts a number of channels of video content via a cable to a
dedicated set-top box (STB). The STB is then connected to the television set. The user
selects a channel to watch via the STB’s remote control unit. The ITV companies may
also provide video-on-demand via dedicated cables (Figure 1.2). In these systems, the
users subscribe to an ITV company. The ITV company downloads a movie list to the
set-top box. The user then selects a movie from the list using the remote control of the
set-top box. The ITV company broadcasts the movie on a new channel to the user. Some
users may join an existing channel to watch.

Figure 1.1: Broadcasting over dedicated cables


Figure 1.2: Video-on-demand over dedicated cables.

The content providers may deliver multimedia objects by broadcasting over the Internet
(Figure 1.3). Users first subscribe to a content provider on the Internet. They are then
allowed to join a live video/audio channel. The content provider then delivers the live
multimedia objects from the streaming servers to all users. Users then use their browsers
to receive and play the streams. The content providers may also provide video-on-demand
services over the Internet (Figure 1.4). Users first subscribe to a content provider on the
Internet, and the user may select a multimedia object from the content provider’s Web
site. The content provider then tests the streaming ability to the user’s computer. The
streaming server delivers the low or high resolution object suitable for delivery to the
user. The browser on the user’s computer receives and plays the streaming object.

Figure 1.3: Broadcasting over the internet


1.2.4 Video-on-Demand Systems
Four different types of video-on-demand systems have been investigated. These include
the near video-on-demand (NVOD) systems, true video-on-demand (TVOD) systems,
partitioned video-on-demand (PVOD) systems, and dynamically allocated video-on-
demand (DAVOD) systems. In the true video-on-demand systems, the user has complete
control of a multimedia program. The user can perform normal play, reverse play, fast
forward, random positioning, pause, and resume. In this system, each user is allocated a
unique channel for the total duration, which allows complete user interactivity. The
number of concurrent users is however limited by the number of available channels. As a
result, many potential viewers may not be able to access the system during the busy
period of time.

Figure 1.4: Video-on-demand over the internet

The near video-on-demand system (Figure 1.5) provides video distribution at relatively
low cost. This system however provides only limited user interactivity. A popular video
is broadcast using several streams or channels. Each channel is separated from the
previous channel by a fixed interval. When the user requests this video, the user’s
access is delayed until the start of the next stream.
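The staggered-channel idea can be quantified with a one-line sketch: if a video of length L is broadcast on c evenly offset channels, a new viewer waits at most L / c for the next stream to start. The figures below are illustrative:

```python
# Worst-case wait in an NVOD system with evenly staggered channels:
# the viewer waits at most one stagger interval, i.e. length / channels.

def max_wait_minutes(video_length_min, channels):
    """Maximum delay before the next broadcast stream of the video starts."""
    return video_length_min / channels

# A 2-hour movie broadcast on 10 staggered channels.
print(max_wait_minutes(120, 10))   # 12.0 minutes
```

Adding channels shortens the maximum wait but consumes more bandwidth, which is exactly the NVOD cost/interactivity trade-off described above.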
The partitioned video-on-demand system (Figure 1.6) combines the advantages of both
NVOD and TVOD systems. User interactivity is provided within the capacity of the
system. Digital channels are partitioned into two groups: NVOD and TVOD services.
NVOD channels broadcast the most popular videos with limited user control. TVOD
channels provide complete user control functions. For example, the digital channels
might be divided into 50 broadcast channels and 450 interactive channels.


Figure 1.5: Near video-on-demand systems

Figure 1.6: Partitioned video-on-demand systems

The dynamically allocated video-on-demand system is an extension of the PVOD
scheme. A user watching a video from the NVOD list of most popular videos can
request interactivity with the video at any time. If a channel is available, the user is
switched to the TVOD group of channels, which allows complete control. The
split-and-merge (SAM) protocol provides a mechanism to split user streams for
interactive functions and merge streams when possible.


1.2.5 Video Conference System
In video conference systems (Figure 1.7), computers are each equipped with a video
camera and a microphone and connected to the network. A user initiates and hosts a
conference meeting. Other users then join the meeting. All of them send their own video
and audio signals to all the other users. Users may speak, type, or draw on a whiteboard.
In these systems, the network needs to deliver the captured video stream from
every user to all the other users. The number of video streams is n(n-1) for n
concurrent users. Thus, the network needs to support a very large number of streams.

Figure 1.7: Video conference systems
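The n(n-1) growth can be sketched directly; it shows why full-mesh delivery quickly becomes impractical and why large conferences typically rely on a central mixing server instead:

```python
# Stream count in a full-mesh conference: every user sends a stream
# to every other user, giving n(n-1) streams in total.

def mesh_streams(n):
    """Number of video streams for n concurrent full-mesh users."""
    return n * (n - 1)

for n in (2, 5, 10):
    print(n, mesh_streams(n))   # grows quadratically: 2, 20, 90
```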

1.2.6 Applications: A Scenario
Let us consider a scenario where multimedia databases play a major role in a police
investigation:
         Image Query: Where the police have a photograph of a victim or the
      criminal, they can query the database for an image match, which helps them find
      the identity of the person.

         Video Query: An accused might be caught on tape in a roadside mart and the
     police may not have accurate information about his location. By querying the
     surveillance video of the stores in that area for a match of the person, his recorded
     actions could be obtained. The video could also be split into frames to see any
     transactions between the accused and any other person.

         Audio Query: In the case of an audio surveillance tape, the police can record
      the conversation, match it for key words like “bomb”, “Osama”, “White House”,
      “Mafia”, etc., and save the voice in vector format. This vector format can be used
      to query the database for similar voices and to check for other “marked” words to
      raise a red flag.
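As a hedged sketch of how a voice stored "in vector format" might be matched, the snippet below compares two feature vectors with cosine similarity. The 4-dimensional vectors and the 0.95 threshold are made up for illustration; real voice features are much higher-dimensional:

```python
# Illustrative similarity-based retrieval: compare a query voice vector
# against a stored one with cosine similarity. Vectors and threshold
# are invented for the example, not real voice features.

import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_voice = [0.9, 0.1, 0.4, 0.2]
stored_voice = [0.85, 0.15, 0.38, 0.22]

if cosine_similarity(query_voice, stored_voice) > 0.95:
    print("possible match - flag for review")
```

This kind of inexact, threshold-based match is what distinguishes multimedia queries from the exact lookups of traditional databases.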


    Text Query: These may include browsing through old newspapers or other
documents of importance to view similar incidents, or searching official records for a
person’s past criminal record.

    Simple Heterogeneous Query: A possible query might be: find all murder
convicts of Delhi who have recently had electronic fund transfers made into their
bank accounts in ABC Corp.

   Complex Heterogeneous Query: A possible query might be: find all individuals
who have been photographed with Simon Claude, who have been convicted of murder
in Delhi, and who have recently had electronic fund transfers made into their bank
accounts in ABC Corp.


                                    CHAPTER 2

         MULTIMEDIA DATABASES
A multimedia database is a database that hosts one or more primary media file types such
as text documents, images, videos, audio, etc. These loosely fall into three main categories:

    Static media (time-independent, e.g. images and handwriting)

    Dynamic media (time-dependent, e.g. video and audio)

    Dimensional media (e.g. 3D games or computer-aided design (CAD) programs)

      All primary media files are stored in binary strings of zeros and ones, and are
encoded according to file type.

2.1 Contents of Multimedia Databases
A MMDB needs to manage several different types of information pertaining to the actual
multimedia data. They are:

    Media data - This is the actual data representing images, audio and video that are
     captured, digitized, processed, compressed and stored.

    Media format data - This contains information pertaining to the format of the
     media data after it goes through the acquisition, processing, and encoding phases.
     For instance, this consists of information such as the sampling rate, resolution,
     frame rate, encoding scheme etc.

    Media keyword data - This contains the keyword descriptions, usually relating
     to the generation of the media data. For example, for a video, this might include
     the date, time and place of recording, the person who recorded it, the scene that is
     recorded, etc. This is also called content-descriptive data.

    Media feature data - This contains the features derived from the media data. A
     feature characterizes the media contents. For example, this could contain
     information about the distribution of colors, the kinds of textures and the different
     shapes present in an image. This is also referred to as content dependent data.

The last three types are called meta data as they describe several different aspects of the
media data. The media keyword data and media feature data are used as indices for
searching purpose. The media format data is used to present the retrieved information.
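The four kinds of information above can be sketched as a single record per media object. The field names and values below are illustrative, not a standard schema:

```python
# Sketch of the information a MMDB keeps for one media object,
# following the media data / format / keyword / feature classification.

from dataclasses import dataclass, field

@dataclass
class VideoRecord:
    media_data: bytes                    # the encoded video itself
    media_format: dict                   # frame rate, resolution, codec, ...
    media_keywords: dict                 # content-descriptive: date, place, scene
    media_features: dict = field(default_factory=dict)  # content-dependent: histograms, shapes

clip = VideoRecord(
    media_data=b"...",                   # actual bytes elided
    media_format={"frame_rate": 25, "resolution": (1280, 720), "codec": "h264"},
    media_keywords={"date": "2004-07-01", "place": "Delhi", "scene": "interview"},
)
print(clip.media_format["codec"])
```

The last three fields are the metadata: keywords and features serve as search indices, while the format data tells the system how to present the retrieved object.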


2.2 Designing Multimedia Databases
         Many inherent characteristics of multimedia data have direct and indirect impacts
on the design of MMDBs. These include: the huge size of MMDBs, temporal nature,
richness of content, complexity of representation and subjective interpretation. The major
challenges in designing multimedia databases arise from several requirements they need
to satisfy such as the following:

   1. Manage different types of input, output, and storage devices. Data input can be
      from a variety of devices such as scanners, digital camera for images,
      microphone, MIDI devices for audio, video cameras. Typical output devices are
      high-resolution monitors for images and video, and speakers for audio.
   2. Handle a variety of data compression and storage formats. The data encoding has
       a variety of formats even within a single application. For instance, in medical
       applications, the MRI images of the brain require lossless coding or a very
       stringent lossy coding technique, while the X-ray images of bones can be less
       stringent.
      Also, the radiological image data, the ECG data, other patient data, etc. have
      widely varying formats.
   3. Support different computing platforms and operating systems. Different users
      operate computers and devices suited to their needs and tastes. But they need the
      same kind of user-level view of the database.
   4. Integrate different data models. Some data such as numeric and textual data are
      best handled using a relational database model, while some others such as video
      documents are better handled using an object-oriented database model. So these
       two models should coexist in MMDBs.
   5. Offer a variety of user-friendly query systems suited to different kinds of media.
      From a user point of view, easy-to-use queries and fast and accurate retrieval of
      information is highly desirable. The query for the same item can be in different
      forms. For example, a portion of interest in a video can be queried by using either

              1) a few sample video frames as an example,

               2) a clip of the corresponding audio track or

               3) a textual description using keywords.

   6. Handle different kinds of indices. The inexact and subjective nature of multimedia
      data has rendered keyword-based indices and exact and range searches used in
      traditional databases ineffective. For example, the retrieval of records of persons
      based on social security number is precisely defined, but the retrieval of records
       of persons having certain facial features from a database of facial images requires
        content-based queries and similarity-based retrievals. This requires indices that
       are content dependent, in addition to key-word indices.
   7. Develop measures of data similarity that correspond well with perceptual
       similarity. Measures of similarity for different media types need to be quantified
       to correspond well with the perceptual similarity of objects of those data types.
        These need to be incorporated into the search process.
   8. Provide a transparent view of geographically distributed data. MMDBs are likely
        to be of a distributed nature. The media data resides in many different storage
        units, possibly spread out geographically. This is partly due to the changing
        nature of computation and computing resources from centralized to networked
        and distributed computing.
   9. Adhere to real-time constraints for the transmission of media data. Video and
       audio are inherently temporal in nature. For example, the frames of a video need
       to be presented at the rate of at least 30 frames/sec. for the eye to perceive
       continuity in the video.
   10. Synchronize different media types while presenting to user. It is likely that
       different media types corresponding to a single multimedia object are stored in
       different formats, on different devices, and have different rates of transfer. Thus
       they need to be periodically synchronized for presentation.
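The real-time constraint in item 9 translates directly into a required transfer rate of frame size × frame rate. The 20 KB compressed frame size below is an illustrative assumption, not a property of any particular codec:

```python
# Continuous-playback bandwidth requirement: frame_size x frame_rate.
# A 20 KB compressed frame at 30 fps is an illustrative example.

def required_rate_mbps(frame_kb, fps=30):
    """Bits per second needed for continuous playback, in Mbit/s."""
    return frame_kb * 1024 * 8 * fps / 1_000_000

print(required_rate_mbps(20))   # ~4.9 Mbit/s must be sustained end to end
```

If the network cannot sustain this rate continuously, frames arrive late and the viewer perceives stalls, which is why these constraints are absent from traditional databases but central to MMDBs.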

Multimedia databases are essential for efficient management and effective use of huge
amounts of data. The diversity of applications using multimedia data, the rapidly
changing technology, and the inherent complexities in the semantic representation,
interpretation and comparison for similarity pose many challenges.

2.3 Multimedia database types
There are generally two types of multimedia databases:

      Linked Multimedia Databases

      Embedded Multimedia Databases

2.3.1 Linked multimedia databases
A linked multimedia database is organized as a database of metadata. This metadata links
to the actual data such as graphics, images, animation, audio, sound, etc. The data may
be stored offline, on hard disc, CD-ROM or DVD, or online. In this database, multimedia
elements are organized as image, audio/MP3, video, etc. One great advantage of this type
of database is that the database stays small, because the multimedia elements are not
embedded in the database but only linked to it.



Figure 2.1: Multimedia Linked Meta database

2.3.2 Embedded multimedia database
An embedded multimedia database contains the multimedia objects themselves, stored
in binary form within the database. The main advantage of this kind of database is that
retrieval of data is faster because of the reduced data access time. However, the size of
the database will be very large.
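The linked vs. embedded trade-off can be sketched with sqlite3: the linked table stores only a path (small database, extra fetch step), while the embedded table stores the object itself as a BLOB (single access, large database). Table and column names, and the file path, are illustrative:

```python
# Linked vs. embedded storage of a media object, sketched in sqlite3.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE linked (id INTEGER PRIMARY KEY, path TEXT)")
con.execute("CREATE TABLE embedded (id INTEGER PRIMARY KEY, data BLOB)")

# Linked: only a reference to the external file lives in the database.
con.execute("INSERT INTO linked (path) VALUES (?)", ("/media/clip001.mp4",))

# Embedded: the media bytes themselves live in the database row.
fake_video = b"\x00\x01\x02" * 1000
con.execute("INSERT INTO embedded (data) VALUES (?)", (fake_video,))

path, = con.execute("SELECT path FROM linked").fetchone()
blob, = con.execute("SELECT data FROM embedded").fetchone()
print(path, len(blob))
```

The linked row costs a few dozen bytes regardless of media size; the embedded row grows with the object, which is exactly the size-versus-access-time trade-off described above.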

2.4 MMDBMS (Multimedia database management systems)
Multimedia database management systems are frameworks that manage different types
of data, potentially represented in a wide diversity of formats, on a wide array of media
sources. In order to work successfully, an MMDBMS should have the following abilities:

    Uniformly query data represented in different formats

    Query data represented in diverse media

    Retrieve media objects from a local storage device in a smooth, jitter-free manner.

    Take the answer generated by a query and develop a presentation of that answer
     in terms of audiovisual media.

    Deliver this presentation in a way that satisfies various quality-of-service
     requirements.

2.4.1 Characteristics of MMDBMS
An MMDBMS (Multimedia Database Management System) can be characterized based
on its objectives when handling multimedia objects:









2.4.2 Prerequisites for MMDBMS Synchronization

Multimedia data refers to the simultaneous use of data in different media forms, including
images, audio, text and numerical data. Many multimedia applications, such as recording
and playback of motion video, video conferencing and slide presentations, require
continuous presentation of media data streams. Such synchronization requirements are
specified by either spatial or temporal relationships among multiple data streams. For
example, a motion video and its caption must be synchronized spatially at the appropriate
position in a movie, and in a slide presentation, a sequence of images and speech
fragments must be temporally combined and presented to compose unified and
meaningful data streams. Current database systems are not equipped to represent the
entire multimedia data flow. There are generally two main types of synchronization:

Intra-synchronization and Inter-synchronization

    Intra -synchronization

In order to present the original data stream to users, synchronization constraints among
media objects must be specified and maintained. Such synchronization is called Intra-
synchronization.
    Inter-synchronization:

If the data stream is composed of media objects from different media streams, additional
complications may arise with the timing relationships. Such media data streams may not
be merged prior to storage in a database. Thus, the synchronization of multiple media
data streams is known as Inter-synchronization, which becomes an essential prerequisite
to any successful multimedia database application. For this reason, synchronization is one
the important factors that should be taken into consideration in order to provide
multimedia applications.

Time dimension

A multimedia data stream consists of a set of data upon which some time constraints are
imposed. The time constraints may specify a discrete, continuous or step-wise constant
time-flow relationship among the data. For example, some multimedia streams, such as
audio and video, are continuous in nature in that they flow across time; other data
streams, such as slide presentations and animations, have discrete or step-wise
constraints. Therefore, multimedia streams may not have convenient boundaries for
data presentation.

2.4.3 Structure of multimedia database
Multimedia database structure can best be explained with the following components:

Data analysis

Data can be stored in the database in either unformatted (unstructured) form or formatted
(structured) form. Unstructured data are stored as a single unit whose content cannot be
retrieved by accessing any structural details. Structured data are stored in variables,
fields or attributes with corresponding values. Multimedia data can be stored in a
database as raw, registering and descriptive data types. Raw data are generally
represented by pixels in the form of bytes and bits. For example, an image can be
represented as pixels, and to interpret the image it is essential to know its size.
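The point about raw and registering data can be illustrated with a short sketch (the byte values and dimensions are invented): the same raw bytes only become an image once its size is known:

```python
# Sketch: raw data is just a byte sequence; the registering data (here the
# image dimensions) is needed before the pixels can be interpreted.
raw = bytes([0, 255, 128, 64, 32, 16])   # six 8-bit grayscale pixel values
width, height = 3, 2                      # registering data: image size

# Slice the flat byte string into rows of `width` pixels.
rows = [list(raw[r * width:(r + 1) * width]) for r in range(height)]
print(rows)   # [[0, 255, 128], [64, 32, 16]]
```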

Data modeling

The data model deals with multimedia objects, which have already been explained in the
previous section. The data model concentrates on the conceptual design of the
multimedia database in order to execute operations such as media object selection,
insertion, querying and retrieval. Time-based media like video, audio and animation
involve notions of data flow, timing, temporal composition and synchronization. These
notions are quite different from those of conventional data such as text. One of the
gravest problems of a multimedia database system is describing the structure of time-
constrained media for querying, updating, retrieval and presentation.


Data storage

Multimedia data objects are stored in the database. These are of two types: non-
continuous media, such as the static media text and images; and continuous media, such
as dynamic media. Continuous media data have a real-time property, while non-
continuous data have not. Therefore, the storage mechanism differs for these types of
data. Most continuous media data are stored using a separate storage server to meet the
real-time constraint requirements. Non-continuous data are stored in the database with
meta-information about the files. In general, data can be stored on hard disc, CD-ROM,
DVD or online. A storage server that stores a large number of long multimedia
documents must manage a huge volume of storage, constructed in a hierarchical fashion
using storage devices of various types as described earlier.

Data retrieval

The ultimate objective of any multimedia database is to access multimedia information
effectively. With respect to access, multimedia objects can be classified into two kinds:
active and passive objects. Objects which participate in the retrieval process are called
active objects; objects which do not participate in the retrieval process are called passive
objects. In a true multimedia database environment, all objects should be active objects.

Query language

In order to retrieve multimedia data from a database system, a query language is
provided. User queries are processed by a query language defined as part of the DBMS;
it is an inseparable part of the DBMS. A multimedia query language must be able to
handle complex, spatial and temporal relationships. A powerful query language has to
deal with keywords, indexes to keywords and the contents of multimedia objects. A
traditional DBMS deals with exact-match queries. Generally, there are two types of
queries used in databases: the well-defined query and the fuzzy query. In a well-defined
query, users must know what they intend to search for. In a fuzzy query, the properties of
the query objects are ambiguous. In this situation, multimedia data queries can be
divided into sub-groups such as keyword querying, semantic querying and visual
querying. Keyword querying remains popular because of its simplicity. Semantic
querying is the most difficult query method in terms of its indexing and pattern
matching. Visual querying is used in QBIC (Query By Image Content) through icons,
leading to content-based search.
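A toy sketch of the two query styles may help (the object identifiers and tags below are invented): a well-defined query matches a keyword exactly, while a fuzzy query tolerates ambiguity by accepting partial matches:

```python
# Sketch: well-defined (exact keyword) vs fuzzy queries over tagged objects.
objects = {
    "img1": {"dog", "park"},
    "img2": {"dogs", "beach"},
    "img3": {"cat", "park"},
}

def keyword_query(keyword):
    """Well-defined query: the user must know the exact keyword."""
    return {oid for oid, tags in objects.items() if keyword in tags}

def fuzzy_query(term):
    """Fuzzy query: the term is ambiguous, so match any tag containing it."""
    return {oid for oid, tags in objects.items()
            if any(term in tag for tag in tags)}

print(keyword_query("dog"))   # {'img1'}
print(fuzzy_query("dog"))     # img1 and img2 both match
```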


Multimedia communication

Communication is the sole objective of any information system. Distributed multimedia
systems with sophisticated features are capable of serving a multi-user environment,
allowing more than one user to communicate with each other simultaneously.

2.5 Evolving types of Multimedia Databases
There are numerous different types of multimedia databases, including:

      The Authentication Multimedia Database (also known as a Verification
       Multimedia Database, e.g. retina scanning) performs a 1:1 data comparison.
      The Identification Multimedia Database performs a one-to-many data
       comparison (e.g. passwords and personal identification numbers).
      A newly emerging type of multimedia database is the Biometrics Multimedia
       Database, which specializes in automatic human verification based on
       algorithms over a behavioral or physiological profile.

       This method of identification is superior to traditional multimedia database
methods requiring the input of personal identification numbers and passwords: although
the person being identified must be physically present where the identification check
takes place, it removes the need for the person being scanned to remember a PIN or
password. Fingerprint identification technology is also based on this type of multimedia
database.

2.6 Other Multimedia Databases
MediaDB is a Multimedia Database Management System for building MMDB
applications that manage large volumes of multimedia objects. MediaDB is an object-
oriented system that supports encapsulation, inheritance hierarchies and object
management. It also supports one-to-many and many-to-many relationships between
objects.

InterBase is a relational database that has built-in support for BLOBs (binary large
objects). It has its own interface to the relational database that it manages. InterBase
stores BLOBs as a collection of segments. A segment in InterBase can be thought of as a
fixed-length "page" or I/O block. InterBase provides special API calls to retrieve and
modify segments. It allows the user to open a BLOB (open-BLOB), create a BLOB
(create-BLOB), read a segment (get-segment) and write a segment to a BLOB
(put-segment).
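The segment idea can be sketched as follows; note that this is an illustration only, the function names mimic but are not InterBase's actual API, and the tiny segment size is chosen for readability:

```python
# Sketch of segment-based BLOB storage: a BLOB is a list of fixed-length
# segments ("pages") that can be read and written individually.
SEGMENT_SIZE = 4  # real systems use much larger fixed-length I/O blocks

def create_blob(data):
    """Split a byte string into fixed-length segments."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]

def get_segment(blob, n):
    """Read segment n without touching the rest of the BLOB."""
    return blob[n]

def put_segment(blob, n, segment):
    """Overwrite segment n in place."""
    blob[n] = segment

blob = create_blob(b"multimedia!!")
print(get_segment(blob, 1))   # b'imed'
put_segment(blob, 2, b"??!!")
print(b"".join(blob))         # b'multimed??!!'
```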

Sybase's SQL Server supports TEXT and IMAGE data types. TEXT and IMAGE values
can be very large, up to 2 GB. The pages are stored as a linked list. The pages of TEXT
and IMAGE columns are stored separately from the tables of the database.


XDP from Plexus is an imaging database engine that provides support for BYTES and
TEXT data types. XDP supports a hierarchical storage system, managing magnetic disks,
optical disks and optical jukeboxes, with online, near-line and offline shelf-stored
devices.

UniSQL/X is an object-relational database management system that supports most SQL
constructs. For multimedia data types, UniSQL/X supports a class hierarchy rooted at the
generalized large object (GLO) class. This class serves as the root of the multimedia data
types and provides a number of built-in attributes and methods for instantiating and
inheriting classes. For the content of GLO objects, the user can create either a Large
Object (LO) or a File-Based Object (FBO). An interesting feature is that recovery is
supported for FBOs but consistency is not, since external applications can also access
them. LOs are similar to BLOBs in relational databases. FBOs are stored in the host file
system. Two other subclasses are the Audio and Image classes. Audio serves as the root
class of the various audio classes. Image is the root of the image types, including 2D and
3D images. It supports a number of attributes and methods (also inherited from parent
classes) for storing and manipulating data.

2.7 Multimedia Database Systems: Where are we now?
A Multimedia Database Management System (MMDBMS) must support multimedia data
types in addition to providing facilities for traditional DBMS functions like database
creation, data modeling, data retrieval, data access and organization, and data
independence. The area and its applications have experienced tremendous growth.
Especially with the rapid development of network technology, multimedia database
systems have developed rapidly, and multimedia information exchange has become very
important.

The first wave

The first MMDBMSs relied mainly on the operating system for storing and querying
files. These were ad hoc systems that served mostly as repositories. The mid-90s saw a
first wave of commercial, implemented-from-scratch, full-fledged MMDBMSs. Among
them were MediaDB (now MediaWay), JASMINE and ITASCA. They were all able to
handle diverse kinds of data and provided mechanisms for querying, retrieving, inserting
and updating data. Most of these products disappeared from the market after some years
of existence, and only some of them continued and adapted themselves successfully to
hardware and software advances as well as to application changes. For instance,
MediaWay provided early, very specific support for a wide variety of different media
types. Specifically, different media file formats, varying from images and video to
PowerPoint documents, can be managed, segmented, linked and searched.


The second wave

In a second wave, commercial systems were proposed which handle multimedia content
by providing complex object types for various kinds of media. The object-oriented style
provides the facility to define new data types and operators appropriate for the new kinds
of media, such as video, image and audio. Therefore, the broadly used commercial
MMDBMSs are extensible Object-Relational DBMSs (ORDBMSs), a direction started
by Informix. The current releases are significantly improved in performance and
integration into the core systems. The most advanced solutions are marketed by Oracle
10g, IBM DB2 and IBM Informix. The IBM DB2 Universal Database Extenders extend
ORDBMS management to images, video, audio and spatial objects. All these data types
are modeled, accessed and manipulated in a common framework. Features of the
multimedia extenders include importing and exporting multimedia objects and their
attributes into and out of a database, controlling access to non-traditional types of data
with the same level of protection as traditional data, and browsing or playing objects
retrieved from the database.

The third wave

The third wave includes currently running projects. They mainly address the needs of
applications for richer semantic content. Most of them rely on the new MPEG standards
MPEG-7 and MPEG-21. MPEG-7 is the ISO/IEC 15938 standard for multimedia
descriptions. It is an XML-based multimedia metadata standard, which proposes
description elements for the multimedia processing cycle from capture (e.g., logging
descriptors), through analysis/filtering (e.g., descriptors of the MDS, Multimedia
Description Schemes), to delivery (e.g., media variation descriptors) and interaction
(e.g., user preference descriptors). MPEG-21 is the ISO/IEC 21000 standard defining an
open multimedia framework. Most of its parts are still in progress. The driving force for
MPEG-21 was the observation that many elements exist to build an infrastructure for the
delivery and consumption of multimedia content, but that there exists no "big picture" to
describe how these elements relate to each other. The vision for MPEG-21 is to define
an open multimedia framework that will enable transparent and augmented use of
multimedia resources across a wide range of networks and devices used by different
communities.

2.8 Difficulties Involved with Multimedia Databases

       The difficulties in making these different types of multimedia databases readily
accessible to humans include:

      The tremendous amount of bandwidth they consume;
      Creating globally accepted data-handling platforms, such as Joomla, and the
       special considerations that these new multimedia database structures require;


      Creating a globally accepted operating system, including applicable storage and
       resource management programs, to accommodate the vast global hunger for
       multimedia information;
      Accommodating the various human interfaces needed to handle 3D interactive
       objects in a logically perceived manner;
      Accommodating the vast resources required to utilize artificial intelligence to its
       fullest potential, including computer sight and sound analysis methods;
      The fact that historic relational databases (i.e. the Binary Large Objects, BLOBs,
       developed for SQL databases to store multimedia data) do not conveniently
       support content-based searches for multimedia content.

        This is because a relational database cannot recognize the internal structure of a
Binary Large Object, and therefore internal multimedia data components cannot be
retrieved.

         Basically, a relational database is an "everything or nothing" structure, with files
retrieved and stored as a whole, which makes a relational database quite inefficient at
making multimedia data easily accessible to humans.

       The flip side of the coin is that, unlike non-multimedia data stored in relational
databases, multimedia data cannot be easily indexed, retrieved or classified, except by
way of social bookmarking and ranking/rating by actual humans.

        This is made possible by metadata retrieval methods, commonly referred to as
tags and tagging. This is why you can search for "dogs", as an example, and a picture
comes up based on your text search term.

        Searching by a text term is also referred to as the schematic mode, whereas
searching with a picture of a dog to locate other dog pictures is referred to as the
paradigmatic mode.

        However, metadata retrieval, search and identification methods are severely
lacking in the ability to properly define uniform space and texture descriptions, such as
the spatial relationships between 3D objects.

        The Content-Based Retrieval (CBR) multimedia database search method,
however, is specifically based on these types of searches. In other words, if you were to
search with an image or sub-image, you would then be shown other images or sub-
images related in some way to your particular search, by way of color ratio or pattern,
etc.
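A minimal sketch of CBR by color ratio follows (an assumption for illustration: "images" are reduced to lists of color names rather than real pixels; real systems build quantized histograms over pixel values):

```python
# Sketch: content-based retrieval by comparing color-ratio histograms.
from collections import Counter

def histogram(pixels):
    """Normalized color histogram: color -> fraction of the image."""
    n = len(pixels)
    counts = Counter(pixels)
    return {color: counts[color] / n for color in counts}

def similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color ratios."""
    return sum(min(h1.get(c, 0), h2.get(c, 0)) for c in set(h1) | set(h2))

query = histogram(["brown", "brown", "white", "green"])
db = {
    "dog_photo": histogram(["brown", "brown", "brown", "white"]),
    "sea_photo": histogram(["blue", "blue", "white", "white"]),
}
best = max(db, key=lambda name: similarity(query, db[name]))
print(best)   # dog_photo
```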


                                     CHAPTER 3

3.1 Relational Databases
Codd proposed the relational data model in 1970. At that time most database systems
were based on one of two older data models (the hierarchical model and the network
model); the relational model revolutionized the database field and largely supplanted
these earlier models. Prototype relational database management systems were developed
in pioneering research projects at IBM and UC-Berkeley by the mid-70s, and several
vendors were offering relational database products shortly thereafter. Today, the
relational model is by far the dominant data model and is the foundation for the leading
DBMS products, including IBM's DB2 family, Informix, Oracle, Sybase, Microsoft's
Access and SQLServer, FoxBase, and Paradox. Relational database systems are
ubiquitous in the marketplace and represent a multibillion dollar industry. The relational
model is very simple and elegant; a database is a collection of one or more relations,
where each relation is a table with rows and columns. This simple tabular representation
enables even novice users to understand the contents of a database, and it permits the use
of simple, high-level languages to query the data. The major advantages of the relational
model over the older data models are its simple data representation and the ease with
which even complex queries can be expressed.

A relational database management system is a system that manipulates tables. A table
consists of rows and columns. A relational database groups data using common attributes
found in the data set. A relation is defined as a set of tuples that have the same attributes.
A tuple usually represents an object and information about that object. Objects are
typically physical objects or concepts. A relation is usually described as a table, which is
organized into rows and columns. All the data referenced by an attribute are in the same
domain and conform to the same constraints.

        The relational model specifies that the tuples of a relation have no specific order
and that the tuples, in turn, impose no order on the attributes. Applications access data by
specifying queries, which use operations such as select to identify tuples, project to
identify attributes, and join to combine relations. Relations can be modified using the
insert, delete, and update operators. New tuples can supply explicit values or be derived
from a query. Similarly, queries identify tuples for updating or deleting.
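These operations can be sketched over relations modeled as sets of tuples (the schema and tuples below are illustrative, in the style of the Students relation used later in this chapter):

```python
# Sketch: a relation as a set of tuples with a fixed schema; select keeps
# tuples, project keeps attributes, and the set removes duplicate rows.
schema = ("sid", "name", "gpa")
students = {
    ("50000", "Dave", 3.3),
    ("53666", "Jones", 3.4),
}

def select(rel, pred):
    """Keep the tuples satisfying a predicate over named attributes."""
    return {t for t in rel if pred(dict(zip(schema, t)))}

def project(rel, attrs):
    """Keep only the named attributes; duplicates collapse in the set."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}

print(select(students, lambda r: r["gpa"] > 3.3))   # {('53666', 'Jones', 3.4)}
print(project(students, ["name"]))                   # the two name tuples
```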


Relational Term                                       SQL Equivalent

Relation                                              Table

Derived relation                                      View, query result, result set

Tuple                                                 Row

Attribute                                             Column

Table 3.1: Relational databases terminology

We use the example of student information in a university database to illustrate the parts
of a relation schema:

Students(sid: string, name: string, login: string, age: integer, gpa: real)

This says, for instance, that the field named sid has a domain named string. The set
of values associated with domain string is the set of all character strings.

Figure 3.1: An instance S1 of the students relation

The instance S1 contains six tuples and has, as we expect from the schema, five fields.
Note that no two rows are identical. This is a requirement of the relational model - each
relation is defined to be a set of unique tuples or rows.

Base and derived relations

        In a relational database, all data are stored and accessed via relations. Relations
that store data are called "base relations", and in implementations are called "tables".


Other relations do not store data, but are computed by applying relational operations to
other relations. These relations are sometimes called "derived relations". In
implementations these are called "views" or "queries". Derived relations are convenient
in that though they may grab information from several relations, they act as a single
relation. Also, derived relations can be used as an abstraction layer.


Domain

        A domain describes the set of possible values for a given attribute. Because a
domain constrains the attribute's values and name, it can be considered a constraint.
Mathematically, attaching a domain to an attribute means that "all values for this
attribute must be an element of the specified set."

        The character data value 'ABC', for instance, is not in the integer domain. The
integer value 123 satisfies the domain constraint.


Constraints

        Constraints allow you to further restrict the domain of an attribute. For instance, a
constraint can restrict a given integer attribute to values between 1 and 10. Constraints
provide one method of implementing business rules in the database. SQL implements
constraint functionality in the form of check constraints.

         Constraints restrict the data that can be stored in relations. These are usually
defined using expressions that result in a boolean value, indicating whether or not the
data satisfies the constraint. Constraints can apply to single attributes, to a tuple
(restricting combinations of attributes) or to an entire relation.
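The 1-to-10 range check mentioned above can be expressed directly as a boolean expression over the attribute, in the spirit of an SQL check constraint:

```python
# Sketch: a check constraint is a boolean expression over attribute values;
# tuples that evaluate to False are rejected.
def check_range(value, low=1, high=10):
    """Constraint: the integer attribute must lie between low and high."""
    return low <= value <= high

print(check_range(7))    # True: satisfies the constraint
print(check_range(42))   # False: violates the constraint
```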

        Since every attribute has an associated domain, there are constraints (domain
constraints). The two principal rules for the relational model are known as entity
integrity and referential integrity.

Foreign keys

        A foreign key is a reference to a key in another relation, meaning that the
referencing tuple has, as one of its attributes, the values of a key in the referenced tuple.
Foreign keys need not have unique values in the referencing relation. Foreign keys
effectively use the values of attributes in the referenced relation to restrict the domain of
one or more attributes in the referencing relation.
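A referential-integrity check over foreign keys can be sketched as follows (the relation contents are invented for illustration):

```python
# Sketch: every foreign-key value in the referencing relation must appear
# as a key value in the referenced relation.
departments = {"CS", "EE"}                        # keys of referenced relation
employees = [("e1", "CS"), ("e2", "EE"), ("e3", "ME")]  # (id, dept FK)

# Collect referencing tuples whose foreign key has no referenced tuple.
violations = [emp for emp, dept in employees if dept not in departments]
print(violations)   # ['e3']: 'ME' is not a key in departments
```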


Stored procedures

        A stored procedure is executable code that is associated with, and generally stored
in, the database. Stored procedures usually collect and customize common operations,
like inserting a tuple into a relation, gathering statistical information about usage patterns,
or encapsulating complex business logic and calculations. Frequently they are used as an
application programming interface (API) for security or simplicity. Implementations of
stored procedures on SQL DBMSs often allow developers to take advantage of
procedural extensions (often vendor-specific) to the standard declarative SQL syntax.

     Stored procedures are not part of the relational database model, but all
commercial implementations include them.


Indices

        An index is one way of providing quicker access to data. Indices can be created
on any combination of attributes of a relation. Queries that filter using those attributes
can find matching tuples directly using the index, without having to check each tuple in
turn. Relational databases typically supply multiple indexing techniques, each of which
is optimal for some combination of data distribution, relation size, and typical access
pattern; common examples are B+ trees, R-trees, and bitmaps.

       Indices are usually not considered part of the database, as they are considered an
implementation detail, though indices are usually maintained by the same group that
maintains the other parts of the database.
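A simple single-attribute index can be sketched as a map from attribute value to matching tuples (using a few tuples in the style of Table 3.2); equality queries then avoid scanning every tuple:

```python
# Sketch: a hash index on the City attribute of a small emp relation.
emp = [
    (23538, "Smith", "Kansas"),
    (36996, "Paul", "Baltimore"),
    (36969, "John", "Massachusetts"),
]

# Build the index once: city value -> list of tuples carrying that value.
city_index = {}
for t in emp:
    city_index.setdefault(t[2], []).append(t)

# An equality query becomes a single lookup instead of a full scan.
print(city_index["Baltimore"])   # [(36996, 'Paul', 'Baltimore')]
```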

Relational algebra is one of the two formal query languages associated with the relational
model. Queries in algebra are composed using a collection of operators. A fundamental
property is that every operator in the algebra accepts (one or two) relation instances as
arguments and returns a relation instance as the result. This property makes it easy to
compose operators to form a complex query: a relational algebra expression is
recursively defined to be a relation, a unary algebra operator applied to a single
expression, or a binary algebra operator applied to two expressions. We describe the
basic operators of the algebra (selection, projection, union, cross-product, and
difference), as well as some additional operators that can be defined in terms of
the basic operators but arise frequently enough to warrant special attention, in the
following sections. Each relational query describes a step-by-step procedure for
computing the desired answer, based on the order in which operators are applied in the
query. The procedural
nature of the algebra allows us to think of an algebra expression as a recipe, or a plan, for
evaluating a query, and relational systems in fact use algebra expressions to represent
query evaluation plans.


3.2.1 Relational operations

        Queries made against the relational database, and the derived relations in the
database, are expressed in a relational calculus or a relational algebra. In his original
relational algebra, Codd introduced eight relational operators in two groups of four
operators each. The first four operators were based on the traditional mathematical set
operations: union, intersection, difference and Cartesian product.
    Selection

   The selection condition is an expression constructed using the predicate symbols =, <,
   >, <=, >= and !=. The arguments of these predicates are either constants from the
   attribute domain or attribute names. The selection, or restriction, operation retrieves
   tuples from a relation, limiting the results to only those that meet a specific criterion,
   i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT
   statement with a WHERE clause.

   The following is a table named emp with these details:

EmpNo              FName            LName              City                Zip            Salary
23538              Smith            Slander            Kansas              209369         40000

36996              Paul             Peregrin           Baltimore           206984         20000

36969              John             Baggins            Massachusetts       208499         35000

38884              Nick             Greyback           Richmond            204400         50000

Table 3.2: emp


   Selecting, for example, the tuple with EmpNo 36969 will yield the result:

36969              John             Baggins            Massachusetts       208499         35000

Table 3.3: selection result

    Projection

   The projection operation extracts one or more columns from a relation; duplicate
   tuples are removed from the result. The SQL GROUP BY clause, or the DISTINCT
   keyword implemented by some SQL dialects, can be used to remove duplicates from
   a result set. Projection simply projects out entire columns. In order to accomplish
   this, we specify a relation and one or more columns from that relation that we are
   interested in.

City
Kansas
Baltimore
Massachusetts
Richmond

Table 3.4: City(emp)

 Cartesian Product

The Cartesian product of two relations is a join that is not restricted by any criteria,
resulting in every tuple of the first relation being matched with every tuple of the
second relation. The Cartesian product is implemented in SQL as the CROSS JOIN
operator. Suppose we have two relations R1 and R2 having schemes (A1…An) and
(A1′…Am′). The Cartesian product R1 × R2 has scheme (A1…An, A1′…Am′) and
consists of all tuples (t1…tn, t1′…tm′) such that (t1…tn) is a tuple in R1 and
(t1′…tm′) is a tuple in R2.

Consider 2 tables:

                     Company               Employees
                     ABC                   15000

                     XYZ                   20000

Table 3.5: employee

                     Comp                  City
                     ABC                   Kansas

                     XYZ                   Richmond

                     ABC                   Baltimore

Table 3.6: city


The cartesian product of employee and city is:

         Company Employees               Comp            City
         ABC           15000             ABC             Kansas

         ABC           15000             XYZ             Richmond

         ABC           15000             ABC             Baltimore

         XYZ           20000             ABC             Kansas

         XYZ           20000             XYZ             Richmond

         XYZ           20000             ABC             Baltimore

Table 3.7: result of Cartesian product
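The product above can be reproduced in code, treating each relation as a list of tuples:

```python
# Sketch: the Cartesian product of Tables 3.5 and 3.6 pairs every tuple of
# the first relation with every tuple of the second.
employee = [("ABC", 15000), ("XYZ", 20000)]
city = [("ABC", "Kansas"), ("XYZ", "Richmond"), ("ABC", "Baltimore")]

product = [e + c for e in employee for c in city]
print(len(product))   # 6 tuples, as in Table 3.7
print(product[0])     # ('ABC', 15000, 'ABC', 'Kansas')
```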

 Union
The union operator combines the tuples of two relations and removes all duplicate
tuples from the result. The relational union operator is equivalent to the SQL UNION
operator.

Two relations R1 and R2 are union compatible if they have the same scheme. The
union R1 ∪ R2 contains a tuple t if t is in either R1 or R2.

                    Comp                    City
                    ABC                     Kansas

                    XYZ                     Richmond

                    ABC                     Baltimore

Table 3.8: comp1

                    Comp                    City
                    DDL                     New York

                    QPR                     Piccadilly

                    ABC                     Kansas

Table 3.9: comp2


                    Comp                    City
                    ABC                     Kansas

                    XYZ                     Richmond

                    ABC                     Baltimore

                    DDL                     New York

                    QPR                     Piccadilly

Table 3.10: comp1 U comp2

 Difference
The difference operator acts on two relations and produces the set of tuples from the
first relation that do not exist in the second relation. Difference is implemented in
SQL in the form of the EXCEPT or MINUS operator.

The difference of two relations is denoted by R1 − R2 and contains a tuple t if t is in
R1 but not in R2.

Difference of comp1 and comp2 gives:

                    Comp                    City
                    XYZ                     Richmond

                    ABC                     Baltimore

Table 3.11: comp1 – comp2
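Both operations can be checked in code on the union-compatible relations comp1 and comp2 (Tables 3.8 and 3.9), modeling each relation as a set of tuples:

```python
# Sketch: union and difference of two union-compatible relations.
comp1 = {("ABC", "Kansas"), ("XYZ", "Richmond"), ("ABC", "Baltimore")}
comp2 = {("DDL", "New York"), ("QPR", "Piccadilly"), ("ABC", "Kansas")}

union = comp1 | comp2        # duplicates removed, as in Table 3.10
difference = comp1 - comp2   # tuples in comp1 but not in comp2 (Table 3.11)

print(len(union))    # 5: ('ABC', 'Kansas') appears only once
print(difference)    # the XYZ/Richmond and ABC/Baltimore tuples
```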

 Join
The join operation defined for relational databases is often referred to as a natural
join. In this type of join, two relations are connected by their common attributes.
SQL's approximation of a natural join is the INNER JOIN join operator. A
fundamental operation that we perform on tables is the join operation. The join
operation takes two tables as input and also a Boolean condition C. The Boolean
condition C links an attribute from the first table with an attribute in the second table.


EmpNo              FName           LName             City                Zip
23538              Smith           Slander           Kansas              209369

36996              Paul            Peregrin          Baltimore           206984

36969              John            Baggins           Massachusetts       208499

38884              Nick            Greyback          Richmond            204400

Table 3.12: employee1

EmpNo              FName           LName             City                Zip
3688               Leela           Skeeter           New York            409579

4698               Paula           Bombadil          Derry               804744

3689               Mary            Galadriel         Philippines         573457

9356               Sheena          Black             Chicago             658357

Table 3.13: employee2

   Since the two tables share no common EmpNo values, an inner join on EmpNo would
   yield no matching tuples; combining all tuples of the two tables instead gives:

EmpNo              FName           LName             City                Zip
23538              Smith           Slander           Kansas              209369

36996              Paul            Peregrin          Baltimore           206984

36969              John            Baggins           Massachusetts       208499

38884              Nick            Greyback          Richmond            204400

3688               Leela           Skeeter           New York            409579

4698               Paula           Bombadil          Derry               804744

3689               Mary            Galadriel         Philippines         573457

9356               Sheena          Black             Chicago             658357

Table 3.14: result of join


         Intersection
        The intersection operator produces the set of tuples that two relations share in
        common. Intersection is implemented in SQL in the form of the INTERSECT
        operator.

   The intersection of two relations R1 and R2 can be expressed as R1 − (R1 − R2).

  EmpNo                FName                 LName                City                Zip
  23538                Smith                 Slander              Kansas              209369

  36996                Paul                  Peregrin             Baltimore           206984

  36969                John                  Baggins              Massachusetts       208499

  38884                Nick                  Greyback             Richmond            204400

        Table 3.15: emp1

  EmpNo               BP              PF                          IT                 MA
  38838               29000           1000                        3000               2000

  36996               19400           500                         2000               2050

  36969               10600           300                         1500               3060

  38884               25750           900                         2200               2000

  36969               46500           1600                        3600               2900

  32467               36700           1400                        3200               4500

        Table 3.16: emp2

        Since emp1 and emp2 have different attributes, they are intersected on the common
        attribute EmpNo: the employees that appear in both tables are kept, together with
        the attributes from both tables:

EmpNo FName             LName         City                  Zip        BP           PF         IT        MA
36996        Paul       Peregrin      Baltimore             206984     19400        500        2000      2050

36969        John       Baggins       Massachusetts         208499     10600        300        1500      3060

38884        Nick       Greyback      Richmond              204400     25750        900        2200      2000

        Table 3.17: result of intersection
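
The identity R1 - ( R1 - R2 ) can be verified directly with Python sets, and the same idea applied to the EmpNo values of the example above picks out the three employees of Table 3.17:

```python
# Set intersection via the identity R1 ∩ R2 = R1 - (R1 - R2).
R1 = {("a", 1), ("b", 2), ("c", 3)}
R2 = {("b", 2), ("c", 3), ("d", 4)}
assert R1 - (R1 - R2) == R1 & R2          # both give {("b", 2), ("c", 3)}

# The emp1/emp2 example matches tuples on the shared EmpNo attribute:
emp1_nos = {23538, 36996, 36969, 38884}
emp2_nos = {38838, 36996, 36969, 38884, 32467}
common = emp1_nos - (emp1_nos - emp2_nos)  # EmpNos present in both tables
```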

3.3 Relational Calculus

Relational calculus is an alternative to relational algebra. In contrast to the algebra, which
is procedural, the calculus is nonprocedural, or declarative, in that it allows us to describe
the set of answers without being explicit about how they should be computed. Relational
calculus has had a big influence on the design of commercial query languages such as
SQL and, especially, Query-by-Example (QBE). The variant of the calculus that we
present in detail is called the tuple relational calculus (TRC). Variables in TRC take on
tuples as values. In another variant called the domain relational calculus (DRC), the
variables range over field values. TRC has had more of an influence on SQL, while DRC
has strongly influenced QBE.

3.3.1 Tuple Relational Calculus

A tuple variable is a variable that takes on tuples of a particular relation schema as
values. That is, every value assigned to a given tuple variable has the same number and
type of fields. A tuple relational calculus query has the form { T | p(T) }, where T is a
tuple variable and p(T) denotes a formula that describes T; we will shortly define
formulas and queries rigorously. The result of this query is the set of all tuples t for which
the formula p(T) evaluates to true with T = t. The language for writing formulas p(T) is
thus at the heart of TRC and is essentially a simple subset of first-order logic. As a
simple example, consider the following query.

Find all sailors with a rating above 7.

{ S | S ∈ Sailors ∧ S.rating > 7 }

When this query is evaluated on an instance of the Sailors relation, the tuple variable S is
instantiated successively with each tuple, and the test S.rating>7 is applied. The answer
contains those instances of S that pass this test. On instance S3 of Sailors, the answer
contains Sailors tuples with sid 31, 32, 58, 71, and 74.
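
Over a finite instance, this evaluation strategy can be sketched as a comprehension: instantiate the tuple variable with each tuple in turn and keep those that pass the test. The instance below is a toy example, not the instance S3 referenced above.

```python
# A toy Sailors instance as (sid, sname, rating) tuples; hypothetical values.
Sailors = [
    (22, "Dustin", 7),
    (31, "Lubber", 8),
    (58, "Rusty", 10),
]

# { S | S ∈ Sailors ∧ S.rating > 7 } as a comprehension:
answer = [S for S in Sailors if S[2] > 7]
```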

3.3.2 Syntax of TRC Queries
We now define these concepts formally, beginning with the notion of a formula. Let
Rel be a relation name, R and S be tuple variables, a an attribute of R, and b an
attribute of S. Let op denote an operator in the set { < ; > ; = ; ≥ ; ≤ ; ≠ }. An atomic
formula is one of the following:

      R ∈ Rel
      R.a op S.b
      R.a op constant, or constant op R.a

A formula is recursively defined to be one of the following, where p and q are
themselves formulas, and p(R) denotes a formula in which the variable R appears: any
atomic formula

¬p, p ∧ q, p ∨ q, or p ⇒ q
∃R(p(R)), where R is a tuple variable
∀R(p(R)), where R is a tuple variable

In the last two clauses above, the quantifiers ∃ and ∀ are said to bind the variable
R. A variable is said to be free in a formula or subformula (a formula contained in a
larger formula) if the (sub)formula does not contain an occurrence of a quantifier that
binds it.

We observe that every variable in a TRC formula appears in a subformula that is atomic,
and every relation schema specifies a domain for each field; this observation ensures that
each variable in a TRC formula has a well-defined domain from which values for the
variable are drawn. That is, each variable has a well-defined type, in the programming
language sense. Informally, an atomic formula R ∈ Rel gives R the type of tuples in Rel,
and comparisons such as R.a op S.b and R.a op constant induce type restrictions on the
field R.a. If a variable R does not appear in an atomic formula of the form R ∈ Rel (i.e., it
appears only in atomic formulas that are comparisons), we will follow the convention that
the type of R is a tuple whose fields include all (and only) fields of R that appear in the
formula. We will not define types of variables formally, but the type of a variable should be clear
in most cases, and the important point to note is that comparisons of values having
different types should always fail. (In discussions of relational calculus, the simplifying
assumption is often made that there is a single domain of constants and that this is the
domain associated with each field of each relation.)

A TRC query is defined to be an expression of the form { T | p(T) }, where T is the only
free variable in the formula p.

3.3.3 Semantics of TRC Queries

What does a TRC query mean? More precisely, what is the set of answer tuples for a
given TRC query? The answer to a TRC query { T | p(T) }, as we noted earlier, is the
set of all tuples t for which the formula p(T) evaluates to true with variable T assigned
the tuple value t. To complete this definition, we must state which assignments of tuple
values to the free variables in a formula make the formula evaluate to true.

A query is evaluated on a given instance of the database. Let each free variable in a
formula F be bound to a tuple value. For the given assignment of tuples to variables, with
respect to the given database instance, F evaluates to (or simply `is') true if one of the
following holds:

   o F is an atomic formula R ∈ Rel, and R is assigned a tuple in the instance of
     relation Rel.

   o F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the tuples
     assigned to R and S have field values R.a and S.b that make the comparison true.

   o F is of the form ¬p, and p is not true; or of the form p ∧ q, and both p and q are
     true; or of the form p ∨ q, and one of them is true; or of the form p ⇒ q, and q is
     true whenever p is true.

   o F is of the form ∃R(p(R)), and there is some assignment of tuples to the free
     variables in p(R), including the variable R, that makes the formula p(R) true.

   o F is of the form ∀R(p(R)), and there is some assignment of tuples to the free
     variables in p(R) that makes the formula p(R) true no matter what tuple is
     assigned to R.
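
Over a finite instance these semantics map directly onto Python: ∃R(p(R)) and ∀R(p(R)) become any() and all(), and p ⇒ q is read as "not p, or q". A minimal sketch:

```python
# Evaluating quantified formulas over a finite (toy) instance.
Rel = [3, 7, 9]

exists_gt_5 = any(r > 5 for r in Rel)   # ∃R (R > 5): true, 7 and 9 qualify
forall_gt_5 = all(r > 5 for r in Rel)   # ∀R (R > 5): false, 3 fails

def implies(p, q):
    # p ⇒ q is true whenever q is true or p is false.
    return (not p) or q
```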

3.3.4 Domain Relational Calculus
A domain variable is a variable that ranges over the values in the domain of some
attribute (e.g., the variable can be assigned an integer if it appears in an attribute whose
domain is the set of integers). A DRC query has the form { (x1, x2, …, xn) | p(x1, x2, …,
xn) }, where each xi is either a domain variable or a constant and p(x1, x2, …, xn) denotes a
DRC formula whose only free variables are the variables among the xi, 1 ≤ i ≤ n. The
result of this query is the set of all tuples (x1, x2, …, xn) for which the formula evaluates to
true.

A DRC formula is defined in a manner that is very similar to the definition of a TRC
formula. The main difference is that the variables are now domain variables. Let op
denote an operator in the set { < ; > ; = ; ≥ ; ≤ ; ≠ }and let X and Y be domain variables.
An atomic formula in DRC is one of the following:

   o   (x1, x2, …, xn) ∈ Rel, where Rel is a relation with n attributes; each
       xi, 1 ≤ i ≤ n, is either a variable or a constant.
   o X op Y

   o X op constant, or constant op X

A formula is recursively defined to be one of the following, where p and q are
themselves formulas, and p(X) denotes a formula in which the variable X appears:

Any atomic formula

¬p, p ∧ q, p ∨ q, or p ⇒ q

∃X(p(X)), where X is a domain variable

∀X(p(X)), where X is a domain variable
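
The contrast with TRC can be sketched in the same style: in DRC the variables range over individual field values, so a query binds one variable per attribute. A toy example, with a hypothetical two-attribute relation:

```python
# DRC query { (x, y) | (x, y) ∈ Rel ∧ y > 7 } over a toy relation;
# x and y are domain variables ranging over field values.
Rel = [(31, 8), (22, 7), (58, 10)]
answer = [(x, y) for (x, y) in Rel if y > 7]
```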

3.4 Normalization
Normalization was first proposed by Codd as an integral part of the relational model. It
encompasses a set of best practices designed to eliminate the duplication of data, which
in turn prevents data manipulation anomalies and loss of data integrity. The most
common forms of normalization applied to databases are called the normal forms.
Normalization trades reducing redundancy for increased information entropy.

                                     CHAPTER 4


       Multimedia data types include numbers, text, graphics, animations, image, audio,
and video. However, a computer can only handle digital data represented as 0s and 1s.

Computer graphics are represented using the coordinates on the displaying screen.
Computer animations are performed by updating changes to the frame buffers and these
changes are then drawn on the displaying screen. Images are represented as two-
dimensional pixels of colors. Each color pixel can be described using color
representations RGB, YUV, YCbCr, or CMYK.

Sound waves need to be accepted and digitized into digital signals for computer
processing. The digitization of analog waves is done by taking sample values at a fixed
sampling rate. The quality of the digitized sound is mainly determined by the number of
sample values and the sampling rate.

Videos are represented as an array of image frames. 24 to 30 frames should be displayed
per second to show full continuous motion. High definition televisions use a very high
frame rate of around 60 frames per second.

4.1 Numbers and Text
In computers, positive integers are represented as a number with the base of 2 instead of a
base of 10. Negative integers are represented in the 2s complement form. Real numbers
are divided into mantissa and exponent such that the significant digits are represented in
the mantissa.

Each text character is represented by eight bits called a byte in the computer. For ASCII
representation, the binary byte of “0100 0001” represents an A, and “0100 0010” means
B, and so on. An English word is thus represented by a string of bytes.
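
Both representations can be checked directly in Python:

```python
# 8-bit two's complement: a negative integer -n is stored as 256 - n.
assert (-5) & 0xFF == 251 == 0b11111011

# ASCII: "A" is 0100 0001 and "B" is 0100 0010.
assert ord("A") == 0b01000001
assert ord("B") == 0b01000010

# An English word is a string of such bytes.
word_bytes = "AB".encode("ascii")
```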

4.2 Graphics
Each position on the screen is specified as a coordinate (x, y), x-axis from left to right
and y-axis from top to bottom. For example, in an 800 x 600 screen, the top left corner is
(0, 0), the top right corner is (800, 0), the bottom right corner is (800, 600), and the
bottom left corner is (0 ,600).

A line is represented by a pair of coordinates. A curve is represented by a list of
coordinates of the starting point, several turning points, and the end point. A circle can be
represented by the coordinate of the centre and the length of the radius
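
These coordinate-based descriptions can be sketched as plain data records; the field names below are illustrative, not a standard graphics API:

```python
# Coordinate-based representations of simple graphics (illustrative).
line = ((10, 10), (100, 10))                     # a pair of endpoint coordinates
curve = [(0, 0), (40, 80), (120, 30), (200, 0)]  # start, turning points, end
circle = {"centre": (400, 300), "radius": 50}    # centre coordinate plus radius
```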

Figure 4.1: Simple Graphics

4.3 Animations
Computers use graphic tools to provide visual effects in a frame buffer. The frame buffer
is changed continuously by the animation program. It scans, converts, erases, and redraws
the graphic image. These changes are repeatedly drawn on the display to appear like
continuous motions. Normally, the animation program should make 15-20 changes per
second.

That is, the program has around 50 milliseconds to update the frame buffer. If the
animation updates are running too fast, the viewer may not be able to see the changes
clearly. If the animation updates are running too slowly, the display may become jerky.

Figure 4.2: Animation

Figure 4.3: The animation program scans, converts, erases, and redraws the graphic
image within 50 msec.

Figure 4.4: Double Buffering

When the processes of scan, convert, erase, and redraw operations take longer than 50
msec, the program may use the double buffering technique. The frame buffer is divided
into two parts. Each part is used to store half the bits per pixel of the overall frame buffer.
In this way, the erase and redraw process of the first half frame buffer overlaps with the
scan and convert process of the second half frame. Each process may then have more
time to modify the frame buffer.
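
A minimal model of the overlap, with buffer contents simplified to frame labels and the erase/redraw step happening in the part of the frame buffer that is not being displayed:

```python
# Double buffering sketch: redraw off-screen, then swap (illustrative model).

def redraw(buffer, frame_no):
    """Erase the buffer and redraw the next frame into it."""
    buffer.clear()
    buffer.append(f"frame {frame_no}")

front = ["frame 0"]   # currently displayed (being scanned and converted)
back = []             # off-screen (being erased and redrawn)

redraw(back, 1)              # the update does not disturb the display
front, back = back, front    # swap: frame 1 is now displayed
```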

4.4 Images
An image is represented as a two-dimensional array of sample points called pixels.
Different from the coordinates used in mathematics, the Y coordinate increases in the
downward direction. For example, Figure 4.5 shows a 320 x 200 image that has 320
pixels on each horizontal line and 200 pixels on each vertical line. The coordinates of the
top left corner are (1, 1). The coordinates of the top right corner are (320, 1). The
coordinates of the bottom left corner are (1, 200). The coordinates of the bottom right
corner are (320, 200).

4.4.1 Image Bits per Pixel
Different images may use a different number of bits per pixel. The black and white image
(B&W) format uses only one bit per pixel. This B&W image format is widely used in
facsimile images. In elementary computer graphics with 16 different colors, four bits are
required to describe each pixel.

Figure 4.5: A 320 X 200 image

Figure 4.6: RGB Representation

The grey-scale image format uses 8 bits per pixel, and it can describe 256 different levels
of color intensity in each pixel. This image format can be used in black and white photos.

Full color images are described using 16 to 24 bits per pixel. They can be represented
using three different representations: RGB, YUV, and YCbCr format. These
representations are described in more detail in the paragraphs below.

4.4.2 RGB Representation
Our eyes have three classes of color receptors called cones. Each type of cone has a
different sensitivity to three colors: red, green, and blue. The trichromatic theory states
that the sensation of color is produced by selectively exciting the different types of cones.
Thus, each pixel is represented by the intensity of red, green, and blue. Each intensity
value is usually coded with eight bits, covering the range [0, 255].

4.4.3 YUV Representation
The rods in our eyes are very sensitive to brightness in a dark environment. Human
perception is more sensitive to brightness than any color information. YUV separates
brightness information (luminance Y) from the color information (chrominance U and V)

Y = 0.3R + 0.6G + 0.1B

U = B - Y

V = R - Y

The luminance value can be coded with more bits than the chrominance values; for
example, the number of bits may be in the ratio of (4: 2: 2).

4.4.4 YCbCr Representation
The YCbCr representation is similar to the YUV representation. It is used in the JPEG
compression. In the YCbCr representation:

Y = 0.3R + 0.6G + 0.1B

Cb = U/2 + 0.5

Cr = V/1.6 +0.5

Each of these values is scaled and zero shifted to the range [0, 1].
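
The mapping can be sketched in a few lines, assuming normalized RGB in [0, 1] and the common chrominance definitions U = B - Y and V = R - Y (the luminance weights are the ones given above):

```python
def rgb_to_ycbcr(r, g, b):
    """Normalized RGB in [0, 1] to (Y, Cb, Cr), using the weights above.
    The definitions U = B - Y and V = R - Y are assumed here."""
    y = 0.3 * r + 0.6 * g + 0.1 * b
    u = b - y
    v = r - y
    return y, u / 2 + 0.5, v / 1.6 + 0.5

# A pure grey pixel (R = G = B) carries no chrominance: Cb = Cr = 0.5.
y, cb, cr = rgb_to_ycbcr(1.0, 1.0, 1.0)
```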

4.4.5 Representation for Printing - CMYK
When images are being printed, the CMYK representation is used to print the images in
colors. The four colors are Cyan, Magenta, Yellow, and Black. Each dot is printed as the
combination of these four colors at different intensities.

4.5 Sound and Audio

Figure 4.7: Sound wave

4.5.1 Concept
Sound is a longitudinal wave of air pressure. Sound is characterized by the pitch and
loudness. Like other waveforms, sound can be represented by a combination of waves
with frequency and amplitude. Frequency of the wave measures the pitch of the sound,
and the amplitude of the wave measures the loudness of the sound.

Wavelength is the distance between repeating units of a waveform. Frequency is the
number of occurrences of a repeating event per unit time, and it is inversely related to the
wavelength. While wavelengths are measured in units of meters, frequency is measured
in units of Hertz (Hz), where 1Hz = 1/second. The frequencies of some common ranges
of sound waves are:

• Infrasound: 0 – 20 Hz

• Human hearing: 20 Hz – 20 kHz

• Ultrasound: 20 kHz – 1 GHz

• Hypersound: 1 GHz – 10 THz

The amplitude of a sound wave measures the loudness of the sound. Amplitude is
measured in units of bels or decibels (dB). Different sound amplitudes have different
effects on us:

• The background noise usually has low sound amplitude. It is difficult to hear clearly,
and we ignore these low amplitude sounds.

• The speaking level is normal amplitude sound.

• When the sound amplitude is too high, it is uncomfortable to our ears.

Figure 4.8: amplitude of sound wave

Figure 4.9: Computer processing of sound

4.5.2 Computer Processing of Sound
Computers cannot process sound waves without converting the sound waves into digital
signals. In order to be processed, sound must first be digitized. Sound waves are
accepted from the microphone as analog electronic signals. An analog-to-digital (A/D)
converter converts the analog electronic signals to digital signals in binary representation.
The computer can thus store and process the binary data. After processing, the computer
may output binary data as digital signals.

A digital-to-analog (D/A) converter does the reverse operation of changing the digital
signals back to analog electronic signals. The speakers can then output the signals as
sound waves to be heard.

Figure 4.10: Digitization of Sound Wave

4.5.3 Digitization of Sound Wave
A digitization process is used to convert the analog signals into digital signals. The A/D
converter takes sample values at different times of the analog wave according to a
sampling rate. The sampling rate in number of samples per second is usually fixed. The
amplitude values of the analog signal of a cycle are then taken as the digital data.

When the digitized sound is output, the analog wave is generated at the same sampling
rate. The amplitude of the analog wave is adjusted according to the values of the digital
data. The reproduced analog wave resembles the original analog wave before digitization.

The reproduced analog wave may not be the same as the original analog wave before
digitization. In order to reproduce the analog wave faithfully, the sampling rate must be
at least twice the highest frequency of the analog wave. If the sampling rate is lower than
twice the frequency, then some information would be lost.

4.5.4 Sample Values
The sample values can be encoded with more or fewer bits. If more bits are used to
describe each sample value, the amplitude of the analog wave is described in finer details.
If the sample values are encoded in 8 bits, then 256 different amplitude values can be
described. If the sample values are encoded in 16 bits, then 65,536 different amplitude
values can be described. When more bits are used to describe the sample values, the
sound quality would be higher. However, more data would need to be stored and
processed.

4.5.5 Standard Audio/Sound Formats
Different sounds need to be represented at different quality levels. A high sound quality
level needs to be described with more sample values at a high sampling rate. Thus, high
quality sound is described with more bits per second.

For example, telephone quality is sufficient for normal speech, and CD quality is required
for audio music and songs:

• Telephone quality speech takes 8,000 samples/second and 8 bits/sample.

• CD-quality audio has 2 channels, a left channel and a right channel, taking 44,100
samples/second and 16 bits/sample.

• DVD quality audio has 6 channels, including a front-left channel, a front-right channel,
a front-centre channel, a back-left channel, a back-right channel, and a subwoofer channel.
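
The figures above imply uncompressed data rates that are easy to verify, and the CD sampling rate can be checked against the Nyquist requirement for the human hearing range:

```python
# Uncompressed audio bit rates implied by the figures above.
telephone_bps = 8_000 * 8          # 64,000 bits per second
cd_bps = 2 * 44_100 * 16           # two channels: 1,411,200 bits per second

# Nyquist: to capture frequencies up to 20 kHz, at least 2 * 20,000
# samples/second are needed; CD audio's 44,100 samples/second suffices.
nyquist_ok = 44_100 >= 2 * 20_000
```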

MIDI Format

Apart from the encoding of natural and recorded sound, audio may be encoded using
music scores. The musical instrument digital interface (MIDI) is a digital encoding
format of musical information. In the MIDI format, the sound data are not necessary.
Only the commands, that is, music scores, that describe how the music should be played
are encoded.

Figure 4.11: MIDI Format

The MIDI format uses the smallest number of bits/second to describe the music. If
recorded audio can be compressed into the MIDI format, it would have achieved the
highest compression ratio. Since the music score describes how the music is played, a
music score file in MIDI format can be edited easily. However, the MIDI format only
describes the music score that can be easily understood by human beings. It requires a
music synthesizer to generate music.

4.6 Video
This section describes the data representation of video. The video frame rates
and the aspect ratio determine the quality of the video. The viewer should watch video at
the most suitable viewing distance. Lastly, the video formats that are used in computers
are described.

Figure 4.12: Video Format

4.6.1 Video Concept
Since the creation of movies, a video has been represented as a list of images called frames.
Each image frame is separated from the previous frame by a time interval. Each frame
may have an image that has some difference from the image in the previous frame. The
images of consecutive frames are usually slightly different. The video would then exhibit
some continuous motion over time.

At camera cuts, the images of two consecutive frames may be completely different.
Before and after a camera cut, the consecutive frames should again differ only slightly.
If all consecutive frames have completely different images, the video exhibits a
chaotic scene which can be unpleasant to view.

4.6.2 Video Frame Rates
The video frame rate is the number of frames that are displayed per unit time. It is usually
described in number of frames per second. The video frame rate has an important impact
on the video quality. Our human eyes hold the captured vision for a very short period of
time. If the frame rate is high enough, the viewer would observe a continuous motion. If
the frame rate is too low, the viewer would observe freezes in the video. In order to show
continuous motion, the video frame rate should have at least 15 frames per second. For
full motion video, 30 frames per second are necessary.

Some video frame rates have been standardized. For movies in the cinema, 24 frames are
displayed per second. The PAL TV standard in the UK, Australia, and Hong Kong
displays 25 frames per second. The NTSC TV standard in Japan and the U.S. uses 29.97
frames per second. The High Definition Television (HDTV) displays 59.94 frames per
second.

4.6.3 Computer Video Formats
Computers display their output to the screen. The graphic cards inside the computer
control the video format of the display screen. Some common standard computer video
formats are CGA, EGA, VGA, XGA, and SVGA. The color graphics adapter (CGA) format
uses the resolution of 320 x 200 pixels. Each pixel has 4 colors and it is described with 2
bits. Thus, each image is described with (320x200) pixels x 2 bits/pixel = 15.625
kilobytes (KB).

The enhanced graphics adapter (EGA) format uses the resolution of 640 x 350 pixels per
image. Each pixel uses 4 bits to describe the 16 colors. Thus, each image is described
with (640x350) pixels x 4 bits/pixel = 109.375 KB.

The video graphics array (VGA) format uses the resolution of 640 x 480 pixels. Each
pixel requires 8 bits to show 256 different colors. Thus, each image is described with
(640x480) pixels x 8 bits/pixel = 300 KB.

The extended graphics array (XGA) format uses the two different resolutions with
different numbers of color. It may use the resolution of 640 x 480 pixels with 65,536
colors, and each pixel is described in 16 bits. Thus, each image is described with
(640x480) pixels x 16 bits/pixel = 600 KB. Alternatively, it may use the resolution of
1024 x 768 pixels with 256 colors, and each pixel is described with 8 bits. Each image is
then described using (1024x768) pixels x 8 bits/pixel = 768 KB.

The super video graphics array (SVGA) format also uses two different resolutions. It may
use the resolution of 800 x 600 pixels with 16,777,216 colors, and each pixel is described
in 24 bits. Thus, each image is described with (800x600) pixels x 24 bits/pixel = 1.37
megabytes (MB). Alternatively, it may use the resolution of 1024 x 768 pixels with
65,536 colors, and each pixel is described with 16 bits. Each image is then described
using (1024x768) pixels x 16 bits/pixel = 1.5 MB.
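
The per-frame arithmetic above follows one formula, which can be captured in a helper:

```python
def frame_kb(width, height, bits_per_pixel):
    """Uncompressed image size in kilobytes (1 KB = 1024 bytes)."""
    return width * height * bits_per_pixel / 8 / 1024

cga = frame_kb(320, 200, 2)   # 15.625 KB
ega = frame_kb(640, 350, 4)   # 109.375 KB
vga = frame_kb(640, 480, 8)   # 300.0 KB
xga = frame_kb(640, 480, 16)  # 600.0 KB
```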

                                    CHAPTER 5


        Multimedia systems are similar to traditional computer systems in terms of their
architectures. Both types of systems have central processing unit (CPU), random access
memory, hard disks, and so forth. The CPU connects to the memory and other
components via the memory bus, and it connects to the peripherals via the input/output
(I/O) bus.

In order to process continuous multimedia streams, multimedia computer systems are
built with stringent processing time requirements. Each component of the computer
system needs to be able to process large amounts of data, process data in parallel, and
finish the processing within a guaranteed time frame. Otherwise, undesirable effects
would appear to lower the quality of the multimedia streams.

 When storage servers are designed to handle multimedia streams, the architecture of the
storage servers also needs to handle the processing time requirements. The storage server
needs to access and deliver data continuously to the clients according to their requests.

Multimedia objects are large, and the magnetic hard disks need to access segments of the
objects within a short time. These requirements lead to the emergence of constant
recording density disks and zoned disks.

5.1 Server Architectures
Multimedia servers need to provide continuous delivery of multimedia objects to the
clients. The remote clients are usually connected through a local area network or several
networks. The Internet today is a best-effort network, and it does not provide any service
guarantees to multimedia streams. Thus, the present technology uses dedicated networks
to deliver the streams. The dedicated networks, such as cable TV, are able to deliver
multimedia streams in a controllable environment.

Multimedia servers store many objects in their storage. They need to access the objects
and deliver the objects according to the requests from many clients. The storage server
should access and deliver the objects efficiently in order to maintain the quality of the
streams.

5.1.1 Simple Multimedia Server System
The storage server or storage system is composed of a storage subsystem and a processor
subsystem. The processor subsystem serves requests from the clients via the network. It
maintains the quality of streams that are delivered to the clients. When data are required,
it sends requests to the storage subsystem. The main responsibility of the storage
subsystem is to store the multimedia objects. All the multimedia objects are stored on the
storage devices in the storage subsystem. The storage subsystem serves data requests
from the processor subsystem. The main reason to separate the storage subsystem from
the processor subsystem is because of the workload. Since the objects are large, the
storage subsystem must transfer large amounts of data for every stream.

Figure 5.1: A simple multimedia server system

The workload on the storage subsystem is thus heavy. If the storage subsystem and the
processor subsystem are running on the same server, the application server’s ability to
respond interactively to the users will be adversely affected. The user may need to wait a
long time for a very simple mouse click.

The processor subsystem is composed of three servers: the application server, the
scheduling server, and the data server. The application server receives requests from the
clients and provides a response back to them. The scheduling server divides a request
stream into a number of requests. It then schedules the requests in a timely manner. It
sends the requests to the data server. The data server searches for the location of the
requested object and forwards the requests to the storage subsystem.

When the storage subsystem serves a read request from the data server, it reads the object
from the storage device, and passes the accessed data to the data server. When the storage
subsystem serves a write request, it writes the object to the storage devices. Most
multimedia clients access the objects for viewing purposes only. Since multimedia
objects are often read and played to users, most requests would only read the object from
the server. Thus, the main concern on the storage subsystem is on the read operations
only even though the storage subsystem provides both read and write operations.

When the data server receives data from the storage subsystem, it directly passes the data
to the clients via the network. It will then send another data request to the storage
subsystem at the time controlled by the scheduling server.
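
The request flow described above can be sketched as a toy pipeline; the function names and the per-segment granularity are illustrative assumptions, not a real server API:

```python
# Toy sketch of the processor subsystem's request flow (illustrative).

def scheduling_server(object_name, segment_count):
    """Divide a request stream into a timed sequence of per-segment requests."""
    return [(object_name, seg) for seg in range(segment_count)]

def data_server(request, storage_subsystem):
    """Locate the requested object and fetch one segment of it."""
    object_name, seg = request
    return storage_subsystem[object_name][seg]

storage_subsystem = {"movie-1": ["seg0", "seg1", "seg2"]}
requests = scheduling_server("movie-1", 3)
delivered = [data_server(r, storage_subsystem) for r in requests]
```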

5.1.2 Distributed Multimedia Server System
A single multimedia server system may be able to serve 1 to 2,000 client streams. When
more streams need to be served or more objects need to be stored, a large multimedia
server system consisting of multiple servers is required. A distributed multimedia server
system has five objectives:

1. To store more objects

2. To serve more clients

3. To reduce the network contention

4. To spread out the network contention

5. To balance the server workloads

A multimedia server that has the accessed object may not be able to serve a client stream
for two reasons. First, if the server is overloaded, the server does not have disk bandwidth
to access the object from the storage subsystems. Second, if the network around the
server is already congested, the server does not have network bandwidth to deliver the
object to the client.

In either situation, the server shall reject the client stream even though it has the object on
its storage devices.

The first objective is to store more objects. The servers in a distributed multimedia
server system together have more disks, and can thus store more objects, than a simple
multimedia server. To store the greatest number of objects, the storage space on the servers should be used
carefully. Extra copies of objects may be created according to their access popularity.
When a new object is stored, the extra copies of objects may be deleted to release storage
space for the new object.

The second objective is to serve more client streams. Unless all the requests are served by
only one server, a distributed server system can serve more client streams than a single
server. In order to serve the greatest number of streams, the objects should be distributed so
that the requests are evenly spread to the servers. Therefore, the workloads on the servers
should be well balanced.

The third objective is to reduce the network workload. The workload on the network also
depends on the distance from the servers to their requesting clients. If the server is far
from a requesting client, the data need to be transmitted over a long distance from the
server to the client. The workload imposed on the network is then heavy. If the server is
close to the requesting client, the data can be transmitted over the smallest number of
hops from the server to the client. The workload imposed on the network is then light.

In distributed systems, the server that is closest to the requesting client may be chosen to
deliver the requested stream. Thus, the workload imposed on the network would be light.

The fourth objective is to spread out the network contention. If the servers are close to
each other, they would send packets from nearby routers on the network. When the
servers are busily serving clients, the workload on the network around these routers
becomes heavy. If the servers are far from each other, then the routes from these servers
to their serving clients may not overlap.

Thus, the workload on the network can be spread out to more routes. The fifth objective
is to balance the server workloads. While a server is busily serving some streams, it may
not have sufficient resources, such as disk bandwidth, to serve an additional new stream. New
streams will then need to wait. If other servers are available to serve this stream, the new
stream can be served immediately. The workload on the busy server is then transferred to
the other servers. The server workloads can thus be balanced.

In general, a distributed multimedia server system is composed of multimedia servers,
clients, and the network. Multimedia objects are stored on the simple multimedia servers.
The servers are connected to the network. Clients send requests to the multimedia servers
over the network.

Figure 5.2: A distributed multimedia server system

The multimedia servers then serve the client requests and deliver object streams to the clients.

Several options are available to build a distributed multimedia server system.


Here are some choices in building the system architecture:

1. Multiple independent servers share their storage to store the objects

2. A depot system to direct requests to the appropriate server

3. A reverse proxy server in front to balance the workload

4. A storage area network to spread the workload

5. A distributed server system to balance the server workload and spread out the network workload

6. A content distribution system to balance server and network workload

If the server system is simply a list of independent multimedia servers, then the clients
need to know which server a particular object resides in. In addition, some servers
containing hot objects may be overloaded while other servers containing cold objects are
idle. Thus, some mechanisms need to be applied so that these servers operate like a single
server system to the users.

A depot system may be placed in front of the servers to direct the client requests to the
appropriate server. Such a depot server may deliver a new client request to an idle server
or the least busy server. The servers would then serve the requests directly.
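The depot's dispatch decision can be sketched in a few lines. This is a hypothetical illustration, assuming the depot tracks active stream counts per server; the server names and loads are invented for the example.

```python
# A minimal sketch of a depot server's dispatch policy: track the number
# of active streams per server and send each new request to the least
# busy server. Server names and loads here are illustrative assumptions.

def choose_server(server_loads):
    """Return the server with the fewest active streams."""
    return min(server_loads, key=server_loads.get)

loads = {"server-A": 120, "server-B": 45, "server-C": 80}
target = choose_server(loads)   # "server-B" is the least busy
loads[target] += 1              # the depot records the new stream
```

A real depot would also consider network distance and object placement, but the least-busy rule captures the basic idea.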

A reverse proxy server is placed in front of the multimedia servers to receive client
requests. It may redirect requests to the appropriate server containing the accessed object.
If the accessed object resides in more than one server, the reverse proxy server may
redirect requests to the most lightly loaded server. When data are delivered from the
server, the reverse proxy server may create a local cache copy. When the same object is
accessed again by the same client or other clients, the reverse proxy server may then
serve the repeated accesses from its local cache.

A storage area network has several servers that are connected to each other via fibre
channels. These servers together operate like a single server with higher capacity. The
storage area network redirects requests to the appropriate storage device. The storage
device then serves the data access request.

A distributed server system has several storage servers and these servers reside at
different geographical locations. The objects are divided into segments and these segments
are distributed over several servers. It operates like a single multimedia server system to
the users. A client may send requests to the application server. The application server
would then identify the segments and the server containing the segments of the object.
The appropriate server then delivers data segments directly to the requesting clients. Each
segment of the object may be delivered from a different server. The distributed server
system thus balances the workload among servers and spreads out the workload on the
network.

A content distribution system is composed of several storage servers. A client may send
requests to one of the servers. The server system then chooses the server that is closest to
the client to deliver the object to the client. If this closest server does not have the
required object, it will access the object from other servers and keep a copy in its storage.
After some time, each storage server will store the objects that are recently or frequently
accessed by its neighboring clients.

In multimedia database systems, a client who is connected to the network sends queries
to the database system. The database system then looks up the index tables and finds the
objects that can satisfy the query. The data server then sends a few most relevant objects
to the user for preview.

The user may then select the most relevant objects for display. The multimedia server or
multimedia server system then delivers the chosen object to the user.

Figure 5.3: Multimedia database server system

5.2 Input/Output Processors
Inside the computer system, data are stored on different storage devices depending on
their usage requirements. Permanent data are stored on the hard magnetic disks.
Temporary data are stored on the random access memory (RAM) or memory. Frequently
accessed data are temporarily stored on the cache memory for quick accesses. Data are
either read or written to these storage devices by the running user programs or operating
system programs.

Traditional computer systems run programs when they are invoked by users or timer
events. A job task is a fragment of code belonging to a running program, and it is
executed by the CPU. A program may invoke one or more job tasks. Many tasks
belonging to different programs are concurrently executed by the CPU. Since the CPU
can serve only one job task at any one particular moment, the tasks are served in a time-
slice manner. After the CPU serves a task for one unit of time, it switches to another task.
The order of service is determined by the job scheduling policy.

When a task reaches an instruction to receive input from the keyboard, output to the
screen, read from the hard disk, or perform other input/output operations, the running task is suspended
and put into the waiting queue until the I/O instruction is finished. The CPU then resumes
the suspended task and continues the task after the I/O operation. Inside the computer
system, the memory bus connects all the main components, including the CPU and the memory.

Figure 5.4: I/O Processor

Other peripheral devices and the hard disks are connected to the I/O bus. An input/output
processor (IOP) connects the I/O bus to the memory bus. Since the input and output
devices are very slow compared to the memory and CPU, the memory bus would be
slowed down if the I/O devices were connected to it directly. With the help of the I/O
processor, the I/O devices can communicate with the CPU and memory without slowing
them down.

When the CPU executes a line of code that performs an I/O instruction, it works with the
I/O processor to execute the I/O instruction in four steps:

1. The CPU issues an I/O instruction to the I/O processor.

2. The I/O processor reads a command from memory.

3. The I/O processor transfers data to/from memory directly.

4. The I/O processor sends an interrupt to CPU when done.
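The four steps above can be illustrated with a toy simulation. The memory layout, command fields, and values below are simplified assumptions for illustration, not an actual controller interface.

```python
# Toy simulation of the four-step CPU/IOP interaction. Memory is modelled
# as a dict mapping addresses to command records; all values are invented.

memory = {0x100: {"OP": "READ", "Addr": 0x200, "Cnt": 512, "Other": None}}
data_buffer = {}

def io_processor(instruction):
    # Step 2: the IOP fetches the command from the address in the instruction.
    command = memory[instruction["address"]]
    # Step 3: the IOP transfers data to/from memory directly (simulated here
    # by filling a buffer at the command's target address).
    if command["OP"] == "READ":
        data_buffer[command["Addr"]] = b"\x00" * command["Cnt"]
    # Step 4: the IOP raises an interrupt to tell the CPU it has finished.
    return "interrupt"

# Step 1: the CPU issues an I/O instruction (OP, device number, command address).
result = io_processor({"op": "START_IO", "device": 3, "address": 0x100})
```

The CPU is free between steps 1 and 4; only the interrupt at the end involves it again.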


In the first step, the CPU issues an I/O instruction to the I/O processor. The I/O
instruction is composed of the operation code (OP), the target device number (device),
and the command address (address).

The operation code specifies which command to execute. The device specifies the target
device number. The address field contains the location of the I/O command inside the memory.

Figure 5.5: Step 1 - The CPU issues an I/O instruction to the I/O processor

In the second step, the I/O processor looks in the memory for the command. The
command is composed of four fields: the OP field, the Addr field, the Cnt field, and
Other field. The OP field specifies what to do. The Addr field specifies where to put data.
The Cnt field specifies the count of how much data can be accessed by the command.
The Other field specifies further details of the command. The I/O processor then reads the
command from memory and executes the command.


Figure 5.6: Step 2 - The I/O processor reads a command from memory

In the third step, the I/O processor executes the command. Most I/O commands need to
access memory. When data are transferred, the I/O processor directly transfers data to
and from the memory without interfering with the CPU. When a sector is read from the
disk, a sector of data (512 bytes) is read from the disk and directly transferred to the memory.

When the I/O command has finished, the I/O processor executes the last step. It sends an
interrupt to the CPU. When the CPU receives this interrupt, it executes the interrupt in a
preemptive manner. The CPU suspends the currently running task even though the task
has not been executed for one time unit. It then performs the OS routine for the I/O
interrupt. The job task that issued the I/O instruction is resumed. The task is removed
from the list of suspended tasks and placed in the list waiting for CPU. The CPU then
resumes the previously suspended task and continues to serve it.


Figure 5.7: Step 3 - The I/O processor transfers data to/from memory directly.

Figure 5.8: Step 4 - The I/O processor sends an interrupt to CPU when done.


                                      CHAPTER 6

                      STORAGE DEVICES
6.1 Magnetic Disks
Magnetic disks are inexpensive storage devices. They are inexpensive because data are
stored on two-dimensional circular disk platters, and the platters are stacked up along
the third dimension. Magnetic disks are composed of disk platters and read/write heads.
The disk platters are connected together at the centre on a spindle. When the spindle
rotates, all the disk platters move at the same speed.

Figure 6.1: Magnetic disks

The read/write heads are supported by disk arms. The assembly looks like a hair comb
in which each read/write head is a tip of the comb. Each read/write head is
placed above the top surface of a disk platter. When the disk platters rotate, the heads hover
at a very thin layer of air above the disk surface. While the read/write heads are fixed and
the disk platters are rotating, each head forms a circle on the corresponding disk platter
surface. These circles are the tracks when data are written onto the disk surface. These
tracks are circular in shape. The shorter tracks that are closer to the centre of the disks are
called inner tracks. The longer tracks that are near the circumference of the disks are
called outer tracks. All the tracks on different surfaces with the same radius together form
a cylinder.

When data are accessed, the disk takes the following steps:

1. All read/write heads move together in a direction perpendicular to the circumference of
the circular tracks until the heads reach the required cylinder.

2. The control servo waits for the read/write heads to settle above the required cylinder
after the movement.

3. The head above the required tracks within the cylinder is chosen.


4. The heads then wait for the rotation of the disk until the beginning of the required data
on the track comes under the head.

5. The I/O path from the disk controller to the memory is established.

6. When the beginning of the required data comes under the head, data are immediately
transferred between the disk and the memory.

Data are written in units of 512 bytes. Each unit of 512 bytes is called a sector.
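Steps 1 to 6 above translate into the usual service-time estimate: seek time plus rotational latency plus transfer time. A minimal sketch follows; the seek time, rotation speed, and transfer rate are illustrative assumptions, not figures from the text.

```python
# Estimate disk access time from the steps above:
# access time = seek time + rotational latency + transfer time.
# On average the head waits half a revolution for the data to arrive.

def disk_access_time_ms(seek_ms, rpm, bytes_requested, transfer_mb_s):
    rotation_ms = 60_000 / rpm            # time for one full revolution
    avg_latency_ms = rotation_ms / 2      # on average, half a revolution
    transfer_ms = bytes_requested / (transfer_mb_s * 1_000_000) * 1000
    return seek_ms + avg_latency_ms + transfer_ms

# Reading one 512-byte sector from a 7,200 RPM disk with a 4 ms seek
# and a 100 MB/s transfer rate (assumed values):
t = disk_access_time_ms(seek_ms=4.0, rpm=7200, bytes_requested=512,
                        transfer_mb_s=100)
```

For small requests the seek and rotational latency dominate; the 512-byte transfer itself takes only microseconds.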

When the read/write head is above a track, it can access all the data on this track by
waiting for the disk to rotate. At any moment, only one of the read/write heads can
transfer data. When the read/write head is fixed, it can access all the data on the cylinder
by choosing the appropriate read/write heads.

Traditionally, the magnetic disks rotate at a fixed angular speed and the read/write heads
transfer data at a fixed speed. All the tracks store the same number of bytes. When the
heads are close to the disk centre, the length of the circular tracks is short and data bits on
the tracks are densely written.

When the heads are far from the disk centre, the tracks are longer in length and data bits
on the tracks are sparsely written. Thus, the recording density varies when the heads are
close to or far from the centre of disks. Thus, the traditional disk recording format is
called variable density recording. In these traditional magnetic disks, the disk platters
simply rotate at fixed speed. However, it does not fully utilize the storage capacity of the
long outer tracks. In order to store more data on the outer tracks, the constant recording
density method is widely accepted in recent years. The constant density recording format
stores more data on the longer outer tracks and less data on the shorter inner tracks. This
constant density recording is applied in two layouts: the zoned disk layout and the spiral
track layout. These two layouts are described in the paragraphs below.

For mobile devices, the storage devices need to be small, compact, and light. The
millipede disks and the nanodisks are products that address these requirements.

6.2 Zoned Disks
Magnetic disks use the zoned disk format to increase their storage capacities. The disk
surface of magnetic zoned disks is divided into zones. Each zone is a group of
neighboring tracks within a range of radii. Thus, each zone is a ring-shaped region on the
disk surface. Within a zone, the disks operate like a variable density recording disk. The
disks rotate at a fixed angular speed. Thus, all the tracks within a zone store the same
number of sectors and the number of sectors per track is fixed within a zone. To store the
maximum number of sectors within a zone, the innermost track within the zone should
store the most sectors. Other tracks in the same zone then store the same number of

Since the innermost tracks of the inner zones are shorter than the innermost tracks of the
outer zones, tracks of the inner zones store less data than the tracks of the outer zones.


Although the number of sectors per track is fixed within a zone, each zone may have a
different number of tracks. The storage capacity of a zone is found as the product of the
storage capacity of a track and the number of tracks within the zone.
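The capacity rule stated above can be checked with a short calculation. The zone parameters (sectors per track, tracks per zone) are illustrative assumptions.

```python
# Zoned-disk capacity, as stated above:
# zone capacity = (sectors per track) x (tracks in zone) x (bytes per sector).

SECTOR_BYTES = 512

def zone_capacity(sectors_per_track, tracks_in_zone):
    return sectors_per_track * tracks_in_zone * SECTOR_BYTES

# Outer zones hold more sectors per track than inner zones (assumed values):
zones = [(1200, 5000), (1000, 5000), (800, 5000)]  # (sectors/track, tracks)
total = sum(zone_capacity(s, t) for s, t in zones)
```

Because the outer zone packs 1,200 sectors per track against the inner zone's 800, it contributes half again as much capacity from the same number of tracks.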

Figure 6.2: Zoned Disk format

In addition, the I/O path transfers data at a fixed number of bits per second and the disks
rotate at a fixed speed. All the data on one track can be accessed by one disk revolution.
Thus, the data transfer rate within a zone is fixed. Since the track capacity of outer zones
is larger than the track capacity of the inner zones, data are transferred faster when the
heads are above the outer zones. Thus, the outer zones have higher data transfer rate than
the inner zones. Magnetic zoned disks have two main advantages over traditional
magnetic disks. First, they have higher storage capacity than traditional magnetic disks

of the same size. Second, data on the outer tracks of zoned disks can be accessed more
quickly. In traditional magnetic disks, the motor speed is fixed, whereas in zoned disks
the motor speed changes when the heads move from one zone to another. Since
changing the motor speed is simple, zoned disks are not difficult to implement.

6.3 Spiral Track Layout
Optical disks, such as compact disk (CD) and digital versatile disks (DVD) use the spiral
track layout to increase their storage capacities. The optical disks can record data at a
fixed speed continuously for a very long time. On the surface of the
optical disks, data are recorded on a long spiral track in sectors. The spiral track runs
continuously from the inside near the centre of the disk to outside near the rim. Dual
layer DVD may have a second spiral track at the second layer that runs in the same or
opposite direction. The motor changes the disk rotation speed according to the position of
the optical read/write head. The servo controls the motor speed and changes it
automatically. While the optical head is near the centre of the disk, the optical disk
speeds up. While the optical head is near the rim of the disk, the disk slows down. The
motor speed is maintained so that the data on the track pass the optical head at a fixed
linear speed.
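The motor-speed adjustment can be expressed as a formula: to keep a fixed linear speed v, the rotation rate must be v divided by the track circumference 2πr. The CD figures below (1.2 m/s scanning speed, roughly 25 mm to 58 mm track radius) are commonly quoted values, used here as illustrative assumptions.

```python
# Constant linear velocity: rotation rate (rev/s) = v / (2 * pi * r),
# so the disk must spin faster when the head is near the centre.
import math

def rpm_for_clv(linear_speed_m_s, radius_m):
    revs_per_second = linear_speed_m_s / (2 * math.pi * radius_m)
    return revs_per_second * 60

inner = rpm_for_clv(1.2, 0.025)   # head near the centre: disk spins fast
outer = rpm_for_clv(1.2, 0.058)   # head near the rim: disk spins slowly
```

The inner position comes out near 460 RPM and the outer near 200 RPM, matching the roughly two-to-one speed change of a CD drive.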

Figure 6.3: CD and DVD layout

6.4 Millipede Project
The millipede project creates a new type of disk. The size and shape of the millipede disk
looks like a postage stamp. The disk is composed of silicon tips above a polymer. Data
are written on the polymer by punching holes on the polymer with a silicon tip. The holes
are separated at a distance of around 10 nanometers or 50 atoms. The disk can record data
at a density of 1 trillion bits per square inch, which is 20 times denser than
magnetic disks. The disk is rewritable. Data on the polymer can be read or written by
changing the temperature of the silicon tips. Data on the polymer are written with hot tips
at 400°C. Data are read from the polymer with warm tips at 300°C. In addition, data on
the polymer can be erased using hot tips. Since the time to conduct heat to the polymer is
rather long, the data recording speed is 1,000 times slower than hard disks. In order to
compensate for the long access latency, the disk uses 1,024 silicon tips working in parallel.

6.5 Nano RAM
Another new disk is the Nano-RAM disk. Nano random access memory (NRAM) is one
of the first storage devices that use nanotechnology. It is small and compact. The
NRAM is composed of carbon nanotubes that are a
billionth of a meter in size. The disk head sends differing electrical charges into the
nanotube and swings the tubes into one of the two positions. One of the two positions
represents a binary digit 0 while the other position represents a binary digit 1.

Inside the NRAM, the nanotubes only move a very short distance, and it takes a very
short time to finish this movement. Thus, the read/write operations can be finished very
quickly. This short latency feature makes the NRAM suitable for high performance
systems. The position of the nanotubes is nonvolatile. The nanotubes do not need
power to maintain their current positions, unlike random access memory. Thus, the NRAM
is suitable for permanent storage of information. In addition, the NRAM does not need to
maintain continuous rotations like magnetic disks and optical disks. It saves power, and
the NRAM can be used in mobile devices.

The NRAM is 50 times stronger than steel. The nanotubes can swing into positions many
times in order to support a large number of write cycles. Recent developments on quality
control help to select only nanotubes that are growing properly.

In summary, the nanotube is a durable, compact, low power, high capacity, and
low latency storage device. The NRAM can be used in mobile and high performance
systems in which the system requirements are stringent.

6.6 Disk Array
In order to store more data on a storage system, multiple disks can be used. The disks
may serve requests in parallel or independently. When multiple disks are used as a disk
array, data are divided into data stripes. Each data stripe is a fixed number of bytes, and it
is stored on multiple disks. When data are accessed, each disk is issued a request. All the
requests are then served simultaneously. Each request retrieves a fraction of the data
stripe. Hence, more data are transferred and large data transfers are served efficiently.

Mean time to disk failure is the average time before a disk fails. When more disks are
used, the expected time before the first disk failure shortens. For example, assume that the mean time to
disk failure is 5 years. If we use only one disk, then we may expect to encounter a disk
failure in around 5 years. If we use 10 disks, then we may expect to encounter a disk
failure in around 6 months. If we use one hundred disks, then we may expect to encounter
a disk failure in around 18 days. If we use 2,000 disks, then we may expect to encounter a
disk failure every day.
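The arithmetic behind this example is simply that, with N identical disks, the expected time to the first failure is roughly the mean time to disk failure divided by N:

```python
# Expected time to the first disk failure among N disks,
# given a per-disk mean time to failure (MTTF) in years.

def expected_first_failure_days(mttf_years, num_disks):
    return mttf_years * 365 / num_disks

for n in (1, 10, 100, 2000):
    print(n, round(expected_first_failure_days(5, n), 1))
# 1 disk -> 1825 days (5 years), 10 disks -> ~6 months,
# 100 disks -> ~18 days, 2000 disks -> under a day
```

This is why redundancy becomes essential in large disk arrays: at 2,000 disks, a failure becomes an everyday event.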

In order to recover data after disk failure, some redundant data are encoded and stored.
Data on the failed disks can then be recovered from the data stored on other disks. This
arrangement of disks forms a redundant disk array.

Redundant array of inexpensive disks (RAID) is an array of small and inexpensive disks
that store encoded redundant data to increase data reliability and data security. When a
single disk fails, data on the failed disk is recovered from data on the remaining disks.
Seven RAID levels are described below.

RAID 0: No redundancy: Data are simply stored on disks without any redundant
information. Data can be lost when disk fails.

RAID 1: Mirrored disks: The disks are arranged in pairs. Each disk in the pair contains
the same data. This is the most expensive option, as only half of the available disk
capacity is utilized for data storage.


RAID 2: Bit interleaved array: Several correction disks are added to the group of data
disks, similar to the error correction used in memory chips. A single parity disk can detect a single error, but at least
three disks are needed to correct an error. More parity disks in a group means more
overheads for fault tolerance, but fewer data disks in a group means fewer I/O
events/second. Since the whole group must be accessed to validate the correction codes,
this is inefficient for small transfers.

RAID 3: Parity disk: Data are interleaved bit-wisely or byte-wisely across the data
disks. The disk controller can detect the failed disk, and a parity disk contains the
parity of the data disks. It is possible to recover data on any single lost disk by reading
the contents from the surviving disks, and recomputing the parity. The disk array
performance is similar to that of a RAID 2 array with a single correction disk.

RAID 4: Block interleaved: Each individual block is stored on a single disk. Data are
interleaved between disks at the block level instead of the bit level or byte level. The new
parity is calculated as (old data XOR new data XOR old parity). A small write
request uses two disks to perform four accesses. Since all write requests access the parity
disk, contentions at the parity disk would result.
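The parity update rule above can be verified with a short sketch; the byte values are illustrative.

```python
# RAID 4/5 small-write parity update:
# new parity = old data XOR new data XOR old parity.
# The four accesses are: read old data, read old parity,
# write new data, write new parity.

def update_parity(old_data, new_data, old_parity):
    return bytes(od ^ nd ^ op for od, nd, op in
                 zip(old_data, new_data, old_parity))

data_disks = [b"\x0f\x0f", b"\xf0\xf0", b"\x33\x33"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*data_disks))

# Overwrite the block on the first data disk, updating parity incrementally:
new_block = b"\xff\x00"
parity = update_parity(data_disks[0], new_block, parity)
data_disks[0] = new_block

# The incrementally updated parity matches a full recomputation:
assert parity == bytes(a ^ b ^ c for a, b, c in zip(*data_disks))
```

The incremental update touches only two disks, which is exactly why every small write funnels through the parity disk in RAID 4 and why RAID 5 rotates the parity blocks instead.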

RAID 5: Rotated parity: Parity blocks are interleaved among the disks in a rotating
manner called left-symmetric. Two writes can take place in parallel as long as the data
and parity blocks use different disks. This disk array performs better for small and large
transfers, making it the most widely accepted level for transaction processing workloads.
RAID5 tolerates single disk failure in each parity group of disks. Data are lost only when
multiple disks in the same group of disks fail. Gibson used mean-time-to-data-loss to
measure the reliability of disk arrays and showed that RAID5 can increase data
reliability.

RAID 6: Two-dimensional parity: The disks are arranged into a two dimensional
matrix and a parity disk is added to each row and each column of the matrix array. This
disk array can survive the loss of any two disks and most losses of three disks. The only
three-disk loss it cannot survive is when a data disk and both its row parity disk and its
column parity disk fail at the same time. Since every logical write needs three disks and
six accesses, the impact on I/O performance is significant. Hence, this disk array is
acceptable only when the fault-tolerant requirement is very high.

In most data storage on disks, data are not differentiated into read-write or read-only
types. Read-only data are static and cannot be modified by the applications. Read-write
data are dynamic and are frequently modified by the application. Read-only data are
easily recoverable from elsewhere, such as tertiary storage. RAID addresses the problem
of losing data when disks fail. Since read-only data are easily recoverable from other
sources, storing redundant information for read-only data may waste storage capacity
and bandwidth.


                                    CHAPTER 7

                  DATA COMPRESSION

A vast number of compression techniques have been designed since the 1950s. To
understand different compression techniques, we use a general model to describe
data compression. Data compression is performed using two processing components.
The first component is the encoder and the second component is the decoder. The
encoder and the decoder components convert input data into output data according to the
compression rules being specified in the compression method.

The encoder accepts some original data as input and generates a new encoded
representation of these symbols. These encoded symbols are sometimes called
codewords. The encoded symbols are created following the rules being specified by the
compression method. Very often, the encoded symbols are intentionally designed to be
shorter than the original input symbols. Conversely, the decoder accepts the encoded
symbols as input and outputs the restored symbols. In order to restore the original data,
the decoder must use the same set of rules as the encoder, and these rules are specified by
the compression method. If the decoder uses a different set of compression rules,
it would not be able to restore the original data from the codewords. In addition, the
codewords must be delivered unaltered from the encoder to the decoder. If any parts of
the codewords are altered, the decoder also cannot restore the original data from the
altered codewords. To measure the performance of a compression technique, it is
necessary to compare the size of the encoded symbols with the size of the original
symbols. If the size of the encoded symbols is only one-third of the size of the original
symbols, the compression ratio is said to be 3:1. Sometimes, the processing time to
perform the encoding and decoding algorithms is also considered. These three metrics,
the compression ratio, the encoding time, and the decoding time, together give a good
measure of the performance of a compression technique.


Figure 7.1: Compression model

7.1 Text Compression
The Huffman coding method was created in the 1950s. The Ziv-Lempel compression and
the arithmetic coding were created in the 1970s. Several popular compression algorithms,
such as LZ77, LZ78, LZW, and gzip, are variants of the Ziv-Lempel compression
method. Later, the prediction by partial matching (PPM) method was designed in the 1980s.
Most of the state-of-the art compression techniques are variants of these fundamental
compression methods.

In text compression, the encoder accepts some input text symbols and generates
codewords. The codewords are created according to the rules being specified by the
compression method in Figure 7.2. For example, we may use “a” to represent “apple,” “b”
to represent “boy,” and “c” to represent “cat.” We then represent “apple, boy, cat” with
the codewords “a, b, c.” These codewords are much shorter than the original input data
symbols. Conversely, the decoder restores the original data from the codewords
according to the rules specified by the compression method. In the above example, the
decoder converts the codewords “a, b, c” back to “apple, boy, cat” according to the
compression rules.
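The "apple, boy, cat" example amounts to a minimal encoder/decoder pair sharing one dictionary; a sketch:

```python
# Encoder and decoder must share the same rules (here, one dictionary),
# or the decoder cannot restore the original symbols.

rules = {"apple": "a", "boy": "b", "cat": "c"}
inverse = {code: word for word, code in rules.items()}

def encode(words):
    return [rules[w] for w in words]

def decode(codes):
    return [inverse[c] for c in codes]

codewords = encode(["apple", "boy", "cat"])   # ['a', 'b', 'c']
restored = decode(codewords)                  # ['apple', 'boy', 'cat']
```

Note that the codewords are useless without the shared dictionary, which is exactly why the compression rules must be agreed on (or transmitted) between encoder and decoder.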

Before applying any data compressions, text symbols are represented by a fixed number
of bits or bytes. In the ASCII code used in personal computers, each text character
is represented by a fixed length of eight bits.

Figure 7.2: Text compression

Data compression changes the number of bits in the codewords that represent the text
symbols.

The techniques used in the compression methods can be grouped into symbolwise
methods, dictionary methods, and hybrid methods. The symbolwise methods, sometimes
referred to as statistical methods, estimate the probabilities of occurrence of symbols and
use shorter codewords for the more likely symbols. The dictionary methods replace
words and contiguous words with an index to an entry in a “dictionary.” The decoder
then uses the indexes to look up the corresponding words from the same dictionary. The
hybrid methods combine the two techniques of both the symbolwise methods and the
dictionary methods within the same compression model. We shall explain these
techniques with more details below. Afterwards, we describe the LZ77 and arithmetic
coding compression techniques.

7.1.1 Symbolwise Methods
In a paragraph or text document, each different word or symbol usually occurs for a
different number of times. Some words, such as “to,” “is,” and “at,” occur very
frequently. Other words, such as “incorrecttypo,” occur rarely. If we choose a shorter
codeword for the frequently occurring symbols and the longer codeword for the rarely
occurring symbols, the short codewords occur more frequently, and the long codewords
occur less frequently. The average length of codewords in the compressed text would
then be short. When the estimation of symbol occurrence is good, the symbolwise
methods usually lead to better compression. Although the average length of codewords is
usually shorter using the symbolwise method, the actual compression ratio depends on
the number of occurrences of each symbol in the original text document. If the less likely
symbols occur frequently in the text document, the average length of codewords in the
compressed text can become long.

It is commonly known that the number of occurrences of each word in a file often
depends on the context of the file. While the word “byte” appears frequently in a
computer book, it may appear only rarely in a tourist guide book. Therefore, it is unlikely
that a single set of compression rules works for all types of data.

We use an example to show how variable length codewords can reduce the average
length of codewords in the compression. If we have a list of names “Paul John John
Johanna John John Joshua John John Joshua John John John Peter” as the input symbols,
the list of uncompressed symbols are “Paul,” “John,” “Johanna,” “Joshua,” and “Peter.”

Before compression, each character occupies one byte, and we ignore the spaces
between names.

The length of the list of input symbols = 10*4 + 1*7 + 2*6 + 1*5 bytes = 64 bytes.

We represent the input symbols using fixed length codewords. For five different names,
we need at least a 3-bit codeword to represent each name without ambiguity. By
convention, we write a -> b to show that codeword a represents symbol b. We choose
“000” -> “Paul,” “001” -> “John,” “010” -> “Johanna,” “011” -> “Joshua,” “100” ->
“Peter.” As there are 14 names in the list of symbols, the total length of the symbols
using fixed length codewords is = 14*3 bits = 42 bits.

                                                                    Multimedia Databases

We represent the symbols using variable length codewords. For five different symbols,
we only need to create five different codewords with one codeword for each symbol. We
choose “0” -> “Paul,” “10”-> “John,” “110” -> “Johanna,” “1110” -> “Joshua,” “1111” -
> “Peter.” As there is only one occurrence of “Paul,” nine occurrences of “John,” only
one occurrence of “Johanna,” two occurrences of “Joshua,” and only one occurrence of
“Peter,” the total length of the symbols using variable length codewords is

= 1*1+9*2+1*3+2*4+1*4 bits

= 34 bits

The compression ratio due to using variable length codewords is thus

= 42 bits / 34 bits

= 1.235 : 1.

We have seen that the use of variable length codewords may change the average length of
codewords. The amount of change actually depends on the choice of codewords to
represent the symbols. We can easily observe that the names appear a different number of
times. The average length of codewords is minimized when the shorter codewords are
chosen to represent the more frequent symbols. That is, we arrange the list of symbols
according to their occurrences in descending order. We have the ordered list of symbols
as “John,” “Joshua,” “Paul,” “Johanna,” and “Peter.” Let “0” -> “John,” “10” ->
“Joshua,” “110” -> “Paul,” “1110” -> “Johanna,” “1111” -> “Peter.” The total length of
the symbols using this set of variable length codewords is

= 9*1 + 2*2 + 1*3 + 1*4 + 1*4 bits

= 24 bits

The compression ratio of this set of variable length codewords is thus

= 42 bits / 24 bits

= 1.75 : 1.

Therefore, better compression ratio can be achieved by using shorter codewords for the
more frequent symbols.
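The arithmetic in the worked example above can be checked with a short sketch (plain Python, not part of any compression library):

```python
# Name counts from the list "Paul John John Johanna John John Joshua
# John John Joshua John John John Peter".
counts = {"John": 9, "Joshua": 2, "Paul": 1, "Johanna": 1, "Peter": 1}

# Fixed length coding: five distinct symbols need 3 bits each.
fixed_bits = sum(counts.values()) * 3          # 14 names * 3 bits

# Frequency-ordered variable length (prefix) code from the text.
code = {"John": "0", "Joshua": "10", "Paul": "110",
        "Johanna": "1110", "Peter": "1111"}
var_bits = sum(n * len(code[name]) for name, n in counts.items())

print(fixed_bits, var_bits, round(fixed_bits / var_bits, 2))
```

Swapping the assignment so that a rare name receives the shortest codeword lengthens the total, which is why the frequency ordering matters.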

7.1.2 Dictionary Methods
The dictionary methods replace symbols and text with an index to an entry in a
“dictionary.” They use simple representations to code references to entries in the
dictionary. Instead of specifying one index for each symbol, an index can represent
several matching symbols in the dictionary to achieve higher compressions. This is useful
when several symbols often occur together. The compression methods use a static
dictionary, a semistatic dictionary, or an adaptive dictionary. A static dictionary simply
uses a fixed dictionary to compress different sets of symbols. It is simple to use, but the
uses a fixed dictionary to compress different sets of symbols. It is simple to use, but the
compression ratio is not optimal in general. While a dictionary is optimal for one set of
symbols, it may be suboptimal for a different set of symbols.
Some methods may use a semistatic dictionary to compress different sets of symbols.
These methods construct a new dictionary or codebook for each text being compressed.
This helps to optimize the compression ratio for the text or set of symbols being
compressed. However, the overheads of transmitting or storing the constructed codebook
are significant. As the same codebook has to be used by both the encoder and the
decoder, the encoder needs to transmit the newly constructed codebook to the decoder.
Some methods use the adaptive dictionary approach. These methods use all the text prior
to the current position as the codebook. While the text is reconstructed at the decoder, the
codebook is reconstructed at the same time from the decompressed text. The decoder thus
creates the same codebook as the encoder without the need to receive the codebook from
the encoder. The dictionary is transmitted or stored implicitly at no extra cost. This
codebook also makes a very good dictionary because it shares the style and language of
the upcoming text after the current position. In the dictionary methods, longer matching
symbols lead to higher compression. For example, a single index to the two words “to
be” is more efficient than two separate indexes to “to” and “be.”
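A semistatic dictionary coder of the kind described above can be sketched in a few lines (a toy illustration; `encode` and `decode` are hypothetical names, and real methods use far more compact index representations):

```python
# Toy semistatic dictionary coder: build the codebook from the text
# itself, replace each word with its index, then invert at the decoder.
def encode(text):
    words = text.split()
    dictionary = sorted(set(words))            # the shared codebook
    indexes = [dictionary.index(w) for w in words]
    return dictionary, indexes

def decode(dictionary, indexes):
    # The decoder looks each index up in the same dictionary.
    return " ".join(dictionary[i] for i in indexes)

book, idx = encode("to be or not to be")
assert decode(book, idx) == "to be or not to be"
```

Note that the codebook must travel with the compressed indexes, which is exactly the transmission overhead the adaptive dictionary approach avoids.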

7.2 Image Compression
The main objective of image compression is to reduce the amount of data in representing
an image. As uncompressed images are large in size, images are often kept in a
compressed format. This helps to save storage space in keeping the images and time to
retrieve the images from the storage media. The main approach in image compression
methods is to reduce redundancy in encoding images. The images may be decompressed
and retrieved in parallel to hide the processing time in decompression. Image
compression methods can be roughly divided into lossless compression methods, lossy
compression methods, and hybrid compression methods. The most well-known image
compression standards include the Joint Photographic Expert Group (JPEG) and
JPEG2000 methods.
Lossless compression, or noiseless compression, encodes data in a form that represents
the original images with fewer bits. The original representation can be perfectly
recovered. If the original images must not be lost, the images should be compressed using
lossless compression methods only. The Huffman coding, arithmetic coding, Ziv-Lempel,
and run length encoding belong to this category.
Lossy compression methods encode images into a form that can be decoded into a
representation that humans find similar to the original image. The difference between the
original images and restored images should be unnoticeable or not important to the
human viewer. Lossy compression methods can be applied on image, audio, and video
objects. The main advantage of lossy compression methods is that they can usually
compress images at a much higher compression ratio. Using lossy compression
techniques, JPEG can compress images to just noticeable quality at a compression ratio
of 15:1. The Motion Picture Expert Group (MPEG) standard can compress video at a
compression ratio of 200:1. The H.261 or px64 compression methods can compress
video at compression ratios of up to 2000:1.
The hybrid compression methods use both lossless and lossy compression techniques.
These include most compression standards, including JPEG, JPEG 2000, MPEG-1, and
MPEG-2. Compression standards help to avoid the complexity of handling heterogeneous
compression formats.
Lossy compression methods compress images and video objects by predictive, frequency
oriented, and importance oriented techniques. The motion compensation method is a
predictive technique. The transform coding and subband coding are frequency oriented
techniques. The filtering, bit allocation, subsampling, and quantization methods are
importance oriented techniques. These techniques are used in JPEG, JPEG2000, and
MPEG compressions.

Figure 7.3: Lossy compression technique

7.2.1 JPEG2000 Compression
The JPEG compression methods encode images using one of the four modes of
operations. The four modes of operations include sequential encoding, progressive
encoding, lossless encoding, and layered encoding. In the lossless encoding, the images
are encoded to guarantee exact recovery of every source image sample value. In the
sequential encoding, each image component is encoded in a single left-to-right, top-to-
bottom scan. In the progressive encoding, the images are encoded in multiple scans for
applications in which transmission time is long. In the layered encoding, also called
hierarchical encoding, the images are encoded at multiple resolutions. The lower
resolution versions of the images may be accessed without first having to decompress the
image at its full resolution.

The JPEG2000 compression method is a hybrid compression method which uses both
lossless and lossy compression techniques (Adams, 2002). It achieves compression at
low bit rates, and it is designed for images transmitted over low bandwidth connections.
Each image is divided into several image components.
Each image component is subdivided into tiles that cover less than or equal to 4096
pixels. It performs color transform, wavelet/subband coding, quantization, progression,
and rate control on the images.

Figure 7.4: Processing components of the JPEG encoder

A source image can be composed of several overlapping components. JPEG supports 1 to
255 image components. Each image component consists of one color channel or spectral
band. The color of a full color image can be decomposed into three color components,
such as red, green, and blue. The image formed by the red component is the red
component image. Similarly, the images formed by the green or blue component are the
green or blue component image. Thus, a color image is decomposed into three
overlapping images.
In the preprocessing step, the encoder adjusts the pixel values so that the nominal
dynamic range is approximately centered at zero. This is done by subtracting a bias of
2^(P-1) to move the samples to the range [-2^(P-1), 2^(P-1) - 1]. JPEG2000 defines two
intercomponent transforms, the irreversible color transform (ICT) and the reversible
color transform (RCT), to change the color representation of the images. The irreversible
color transform (ICT) is a real-to-real transform used for lossy compression. The
reversible color transform (RCT) is an integer-to-integer transform used for lossless
compression. The ICT converts image colors from the RGB representation to the YCbCr
representation. The RCT approximates the ICT with a reversible integer-to-integer
transform.
Afterwards, the encoder performs an intracomponent transform using the 2D
wavelet/subband coding as illustrated in Figure 7.5. A low (L) subband image of half
resolution of the original image is formed by using the mean of the sample values in the
higher resolution. Then, another subband, the high (H) subband, image of half resolution
is formed by using the difference of the subband image with the original image. Thus, an
image is transformed into two subband images. This subband coding is applied in both
horizontal and vertical directions to form four subband images, including the LL, LH,
HL, and HH subbands. The subband images are formed recursively on the LL subband of
the previous level to generate the wavelet image.

Figure 7.5: Wavelet intra-component transform
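The mean/difference split described above can be sketched in one dimension (a Haar-like toy filter, not the actual JPEG2000 wavelet; applying it along rows and then columns produces the LL, LH, HL, and HH subbands):

```python
# One level of the mean/difference subband split: the low (L) subband
# holds pairwise means, the high (H) subband holds pairwise differences.
def subband_1d(samples):
    low  = [(a + b) / 2 for a, b in zip(samples[::2], samples[1::2])]
    high = [(a - b) / 2 for a, b in zip(samples[::2], samples[1::2])]
    return low, high

def inverse_1d(low, high):
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]                  # perfect reconstruction
    return out

L, H = subband_1d([10, 12, 8, 4])
assert inverse_1d(L, H) == [10, 12, 8, 4]
```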

The transformed wavelet image is then quantized. Mathematically, the quantization
process is

    V(x, y) = sgn(U(x, y)) * floor(|U(x, y)| / Δ)

where Δ is the quantization step size, U(x, y) is the value of the pixel at position (x, y)
before quantization, and sgn() is the sign function returning either +1 or -1.

Conversely, the dequantization process is

    U'(x, y) = sgn(V(x, y)) * (|V(x, y)| + r) * Δ   for V(x, y) ≠ 0, and 0 otherwise,

where r = 0.5 is the bias parameter.
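The quantization and dequantization steps can be sketched with scalar samples (a minimal sketch following the symbols above, where `delta` stands for the step size Δ; the real codec operates on whole subbands):

```python
import math

# Deadzone quantizer sketch: map a sample u to an integer index,
# then reconstruct with the bias parameter r = 0.5.
def quantize(u, delta):
    s = 1 if u >= 0 else -1
    return s * math.floor(abs(u) / delta)

def dequantize(q, delta, r=0.5):
    if q == 0:
        return 0.0
    s = 1 if q > 0 else -1
    return s * (abs(q) + r) * delta

q = quantize(-7.3, 2.0)                        # index -3
restored = dequantize(q, 2.0)                  # reconstructs to -7.0
assert abs(restored - (-7.0)) < 1e-9
```

The reconstruction error (0.3 here) is the information discarded by the lossy step; a smaller Δ shrinks the error at the cost of more bits.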

In the Tier-1 coding, the image is divided into rectangular tiles of at most 4096 pixels
per tile. Thus, the largest square tile covers 64x64 pixels. The pixel values are retrieved
at a scan height of four samples per vertical column. Three passes are made per bit plane
to obtain the sample values in the scan order. In the first pass, only the most significant
bits of the sample values are obtained. In the second pass, the refinement bits are used.
In the third cleanup pass, all other least significant bits are used. The sample values are
obtained in this scanning order to support multiple-pass encoding. This is particularly
suitable for images being transmitted at a low transmission rate. The resolution of
images increases progressively as more passes of data are received.
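The progressive refinement across bit planes can be illustrated with a simplified sketch (real Tier-1 coding makes three context-modeled passes per plane followed by arithmetic coding, all omitted here):

```python
# Split sample values into bit planes, most significant bit first, and
# show how each received plane refines the reconstructed values.
def bit_planes(samples, bits=4):
    planes = []
    for p in range(bits - 1, -1, -1):          # MSB plane first
        planes.append([(v >> p) & 1 for v in samples])
    return planes

def refine(planes):
    value = [0] * len(planes[0])
    for plane in planes:                       # each plane adds one bit
        value = [(v << 1) | b for v, b in zip(value, plane)]
        yield value                            # progressively better

stages = list(refine(bit_planes([5, 12, 9])))
assert stages[-1] == [5, 12, 9]                # fully reconstructed
```

Truncating the stream after any stage still yields a coarse approximation, which is what makes the encoding suitable for low transmission rates.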

The Tier-2 Coding builds packets with passes. Each packet is comprised of two parts,
header and body. The encoded data for each tile is organized into a number of layers.
Five sorting orders of packets called progressions are specified in JPEG2000. The five
sorting orders are layer-resolution-component-position, resolution-layer-component-
position, resolution-position-component-layer, position-component-resolution-layer, and
component-position-resolution-layer. The encoder may choose the most suitable sorting
order for the image or application.
JPEG2000 supports bit rate controlling. The bit rates can be controlled by choosing
suitable quantization step sizes or including only a suitable subset of coding passes.
JPEG2000 allows the region of interest (ROI) coding. Different regions of an image may
be coded with differing fidelity. While synthesized from its transformed coefficients in
the decompression process, each coefficient contributes only to a specific region. The
encoder may identify the coefficient contributing to the ROI. It can then encode some or
all of these coefficients with greater precision than the others. JPEG2000 defines a
structure for the encoded data. A code stream is a sequence of marker segments. Each
marker segment has three fields, including type, length, and parameter. The code stream
has one main header, a number of tile-part header body pairs, and one main trailer. The
JPEG2000 files use the .JP2 file extension. A JP2 file contains a number of boxes. Each
box has box length, box type, the true length of box when the box length is 1, and box
data. The JPEG2000 decoder reverses the process of the JPEG2000 encoder. It goes
through the following processing components:

Figure 7.6: Processing components of the JPEG decoder

1. The compressed code stream is fed through Tier-2 and Tier-1 decoders.

2. The decoded values go through the dequantizer.

3. The dequantized values go through the inverse intracomponent transform and the
inverse intercomponent transform.

4. The postprocessing restores the pixel values.

5. The image component is reconstructed from the pixel tiles.

6. The image is reconstructed from the color components.

7.3 Video Compression
Several video compression standards were developed by the International
Telecommunications Union (ITU) and the Motion Picture Expert Group (MPEG). ITU
alone developed the video compression standards H.261, H.263, H.263+, and H.263++.
MPEG alone developed MPEG1, MPEG4, MPEG7, and MPEG21. The ITU and MPEG
worked together to develop MPEG2 and H.264.
MPEG1 is the first video compression standard from MPEG, and it was released in 1993.
Its main purpose is to compress a video into a sequence of image frames. MPEG2 is an
enhanced video compression standard, and it was released in 1994. MPEG4 is an object
based video compression standard released in 1999, and it compresses video into
composing objects. MPEG7 is a multimedia content description standard based on the
eXtensible Markup Language (XML). MPEG21 is an open multimedia framework
standard. MPEG-1 compresses source input format (SIF) video. The characteristics of a
SIF format video are 4:2:0 subsampling, progressive scan, and a raw data rate of about
30 Mbps. A SIF format video may display either 352×240 pixels/frame at 30
frames/second or 352×288 pixels/frame at 25 frames/second. MPEG-1 compresses SIF
video with a raw data rate of 30 Mbps to about 1.1 Mbps at VHS VCR quality. MPEG
compression is suitable for digital storage media and channels. Different types of
applications may compress and decompress video a different number of times. Some
applications may compress an object only once and decompress it several times. Other
applications may compress and decompress objects a similar number of times.
Depending on the frequency of compressions and decompressions being performed, a
compression technique can be classified as symmetric and asymmetric. The symmetric
compression methods compress objects and decompress objects with similar processing
times. They are more suitable for use in applications such as video e-mail and video
conferencing. In these applications, video objects are compressed and decompressed a
similar number of times. The asymmetric compression methods compress video objects
with varying processing times. They are more suitable for use in applications including
movies, video-on-demand, education-on-demand, and e-commerce. In these applications,
the video objects are compressed only once at production of the objects. The compressed
objects are decompressed more frequently, usually once when the objects are being
viewed or displayed.

7.3.1 MPEG2 Compression
MPEG2 is an asymmetric compression method. It strikes a balance between intraframe and
interframe coding. For interframe coding, it performs block based motion compensations
to reduce temporal redundancy. For intraframe coding, it performs DCT based
transformations to reduce spatial redundancy.
An MPEG stream consists of many group-of-pictures (GOPs). Each GOP consists of
three types of frames, including I-frame, P-frame, and B-frame. I-frames are
intrapictures, and they are compressed using JPEG. They are independently compressed,
and they can be used as the starting points for random access. P-frames are predicted
pictures, and they are coded by referring to past pictures. They may also be used as
reference pictures for future predicted pictures. B-frames are bidirectional predicted
pictures, and they are coded by interpolating from the past and future pictures.
The MPEG I-frame encoders compress pictures using the JPEG compression. The
encoder first converts the color space of
the picture from RGB to YUV. The encoder then performs a forward discrete cosine
transform (FDCT). The transformed sample values are quantized. After that, the
quantized values are encoded using Huffman coding. MPEG achieves moderate
compression on the I-frames. The MPEG P-frames and B-frames are compressed by
referring to other frames. P-frames only refer to the previous I-frame or P-frame, whereas
B-frames refer to the previous I-frame or P-frame as well as the future I-frame or P-
frame. P-frames are encoded using motion estimations, and B-frames are encoded using
interpolations.
Motion estimation uses block matching techniques to compensate for the interframe
differences due to motion, as shown in Figure 7.7. For each block inside the current
frame, the encoder finds the best matching block from the reference frame. If this block is
found, it encodes the location of the matching block and the difference between this
block and the matching block. P-frames are thus compressed at higher compression ratio
than the I-frames. MPEG uses interpolations to perform motion compensations on B-
frames. For each block in the current frame, the encoder finds the best matching block in
the previous reference frame and the best matching block in the future reference frame.
These two blocks are interpolated to generate the interpolated block. The difference
between the current block and this interpolated block is then encoded. MPEG thus
achieves the highest compression on the B-frames.
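The block matching search can be sketched as follows (1-D sample rows for brevity; the sum-of-absolute-differences cost, as commonly used, is an assumption here, and real encoders search 2-D blocks):

```python
# Find the best matching position of `block` inside `reference` by
# minimizing the sum of absolute differences (SAD).
def best_match(block, reference, search_range):
    best_pos, best_sad = 0, float("inf")
    last = min(search_range, len(reference) - len(block))
    for pos in range(last + 1):
        sad = sum(abs(b - r) for b, r in
                  zip(block, reference[pos:pos + len(block)]))
        if sad < best_sad:
            best_pos, best_sad = pos, sad
    return best_pos, best_sad

# The encoder stores only the match position and the residual
# differences, instead of the raw block.
pos, sad = best_match([7, 8, 9], [1, 2, 7, 8, 9, 3], search_range=3)
assert (pos, sad) == (2, 0)
```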

As the B-frames depend on the previous and future frames, the sequence of storing and
retrieving frames is different from the sequence of displaying frames. Each B-frame is
stored after the previous and future pictures that it depends on.
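The storage-order rule for B-frames can be sketched as follows (a hypothetical helper; frame labels are illustrative):

```python
# Convert display order to storage order: each B-frame is held back
# until the next reference frame (I or P) it depends on is emitted.
def storage_order(display):
    stored, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)            # wait for the reference
        else:                                  # I-frame or P-frame
            stored.append(frame)
            stored += pending_b
            pending_b = []
    return stored + pending_b

assert storage_order(["I1", "B2", "B3", "P4", "B5", "P6"]) == \
       ["I1", "P4", "B2", "B3", "P6", "B5"]
```

The decoder thus always receives both reference frames before the B-frame that interpolates between them.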

Figure 7.7: MPEG Motion estimation

Figure 7.8: MPEG Interpolation

                                      CHAPTER 8

8.1 Data placement on disks
Many intelligent storage organizations, or data placement methods, have been designed for
traditional data files and database systems. Traditional file placement methods are grouped into
the following:

1. Random placement. Each data file is split into file blocks, and the file blocks are
   randomly placed on any storage locations. This is the simplest strategy to handle
   random accesses to file blocks.
2. Contiguous placement. Each data file is stored to contiguous physical locations.
   This strategy performs best when the entire file is accessed by consecutive requests.
   However, fragmentation may prevent the placement of large files.
3. Type based placement. Files containing the same type of data are grouped to a
   category. Files belonging to the same category are placed close to each other. This
   strategy trades off the seek distance of consecutive requests on data of the same type
   with that on data of different types.
4. Frequency based placement. Files are sorted according to the stationary
   probabilities of their accesses. Frequently accessed files are placed in the locations
   with low average access overheads. This strategy needs to record the access
   frequency of files in order to reorganize the data files.
5. Markovian placement. The pattern of consecutive accesses to data files is
   investigated. Two data files that are accessed by consecutive accesses are correlated.
   The data files with the highest correlation probabilities are stored to consecutive
   locations. This strategy optimizes the seek distance of requests according to the
   access history.

Many data placement methods are specifically designed for multimedia data. These data
placement methods can be grouped according to their strategies into the following
categories:
   1. Random placement. Data stripes are stored randomly. Although this simple
       method is often used as a baseline for comparison, practical systems also use it
       due to its simplicity and flexibility.
   2. Statistical placement. Objects are stored according to the stationary or transition
       probabilities of their accesses.
   3. Striping. Objects are divided into stripes to allow round robin or parallel
       retrieval.
   4. Replication. Objects are fully or partially replicated to increase availability of
       data, or redundant codes are encoded and stored to increase data reliability and
       fault tolerance.
   5. Constraint allocations. The physical storage locations to store consecutive data
       stripes are restricted so that the maximum overheads between consecutive
       requests are reduced.
These data placement strategies, except the random placement strategy, are described in
the following chapters. The random placement strategy is skipped because it is simple
and it does not provide any performance guarantees for the storage systems.
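Frequency based placement from the list above can be sketched as a toy model (location and file names are hypothetical; locations are assumed ordered fastest first):

```python
# Assign the most frequently accessed files to the locations with the
# lowest average access overheads.
def place(files, locations):
    """files: {name: access probability}; locations: fastest first."""
    ranked = sorted(files, key=files.get, reverse=True)
    return dict(zip(ranked, locations))

layout = place({"a.mpg": 0.1, "b.mpg": 0.6, "c.mpg": 0.3},
               ["middle_tracks", "inner_tracks", "outer_tracks"])
assert layout["b.mpg"] == "middle_tracks"      # hottest file, fastest spot
```

As the text notes, this only works if the system records access frequencies and periodically reorganizes the files.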
8.2 Multimedia Access Streams
In traditional client/server computer systems, the types of data being accessed are usually
textual and binary data. Binary data are often stored in database files, and textual data are
stored in document files. In multimedia systems, multimedia data such as video, audio,
and images are stored in data files. These data may be accessed in a pull-based manner or
a push-based manner.

Figure 8.1: Request streams

Traditional data are usually accessed in a pull-based approach. The client programs send
discrete requests to load data from the server. The request may look like this to human
beings: “Give me the 10th block of data in file A.” Upon receiving this request, the server
accesses the block of data, encloses it in a data packet, and passes it to the client. The
client then opens the data packet and accesses the data inside the packet. After serving
this request, the server program would wait for another request from the client.
Multimedia data are often accessed in a push-based approach. The client program sends a
request to the server asking for the multimedia file starting at a particular block. The
request may look like this to human beings: “Give me the file M starting from the 10th
block.” Upon receiving this request, the server accesses the 10th block of data in file M,
encloses it in a data packet, and passes it to the client. The server then accesses the 11th
block of data in file M and passes it to the client, and so on. The server would continue to
access the next block of data in file M and pass it to the client until it receives another
request from the client. When the client receives a data packet, it opens the packet and
accesses the data inside.
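The two access styles can be contrasted with a toy sketch (`block_store` is a hypothetical in-memory stand-in for the server's file blocks):

```python
# A hypothetical store of numbered file blocks.
block_store = {n: f"data-{n}" for n in range(20)}

def pull(block_number):
    # Pull-based: one request returns exactly one block.
    return block_store[block_number]

def push(start_block):
    # Push-based: one request starts a stream of consecutive blocks
    # that continues until the client intervenes.
    n = start_block
    while n in block_store:
        yield block_store[n]
        n += 1

assert pull(10) == "data-10"
stream = push(10)
assert next(stream) == "data-10" and next(stream) == "data-11"
```

The generator captures the key difference: in the push model the server keeps sending the next block without waiting for further requests.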

Due to the continuous nature of the multimedia data, many data requests would be sent to
the server in the pull-based approach. All the requests and the returned data packets flow
continuously like water in a river stream. In the push-based approach, the data packets
also flow continuously through the communication path like a stream. Thus, the
multimedia objects are accessed via request streams.
8.2.1 Classification of Streams
Depending on the time interval between consecutive packets, a stream can be classified
as strongly periodic, weakly periodic, or aperiodic streams.
Depending on the variation of data size between consecutive packets, a stream can be
strongly regular, weakly regular, or irregular. Depending on the continuity of consecutive
packets, a stream can be continuous or discrete (Furht, 1996).
   Strongly Periodic Stream: If the time interval between any two consecutive packets
    is constant, then the stream is called a strongly periodic stream. In the ideal case, the
    jitter is zero. Figure 8.2 shows a strongly periodic stream that has a fixed time
    interval between consecutive data packets. For example, the pulse code modulation
    (PCM) coded speech is a strongly periodic stream.

        Figure 8.2: Strongly periodic stream

       Weakly Periodic Stream: If the time intervals between two consecutive packets
        are not constant but only periodic, then the stream is called a weakly periodic
        stream. Figure 8.3 shows a weakly periodic stream in which the time interval
        between consecutive packets oscillates between T1 and T2. When we merge two
        strongly periodic streams with different periods, the resultant stream is a weakly
        periodic stream.

Figure 8.3: Weakly periodic stream

   Aperiodic Stream: Aperiodic streams are streams such that the time intervals
    between consecutive packets are neither constant nor periodic. An aperiodic stream
    with different time intervals between consecutive packets is shown in the figure.

    Figure 8.4: Aperiodic stream

   Strongly Regular Stream: If all data packets are of the same constant size, then the
    stream is called a strongly regular stream (Figure 8.5). An uncompressed video
    stream created from a capturing video camera is usually a strongly regular stream.

    Figure 8.5: Strongly regular stream

   Weakly Regular Stream: If the data size of packets changes periodically, then the
    stream is called a weakly regular stream.

    Figure 8.6: Weakly regular stream

   Irregular Stream: When the data sizes of packets are not constant and they are not
    periodic, the stream is called an irregular stream. Since the data sizes of packets
    change, it complicates the transmission and processing of the data packets. When
    temporary buffers are allocated, their size should be large enough to accommodate
    the largest data packet. Thus, the buffers cannot be utilized to their full capacity.
    Therefore, some efficiency is lost in handling irregular streams.

    Figure 8.7: Irregular Stream

   Continuous Stream: When the large objects are accessed and the multimedia data
    need to be returned in small data packets, the data packets would be sent continuously
    over a long time. The data stream is called a continuous stream. Video and audio
    objects are usually accessed by continuous streams.
    When the data packets are transmitted over the communication path, the data packets
    occupy some capacity of the communication path for a long period of time. If the data
    packets are transmitted without any intermediate gaps, they may fully occupy the
    communication path. The system resources may not be able to serve other users of the
    system.

    Figure 8.8: Continuous stream

   Discrete stream: Some multimedia objects such as images are not continuous in
    nature. These objects may be large, but they can be accessed with discrete requests. A
    packet is not connected to its preceding and following packets. The data stream is
    thus discrete. For example, a large image object may be accessed by a request, and
    the object is returned via a discrete stream.

    Figure 8.9: Discrete stream
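The periodicity classes above can be sketched as a small classifier over packet inter-arrival times (a toy version; real streams would need jitter tolerances rather than exact equality):

```python
# Classify a stream from the time intervals between consecutive packets.
def classify(intervals):
    if len(set(intervals)) == 1:
        return "strongly periodic"             # constant interval
    # Weakly periodic: the interval sequence repeats with some period p.
    for p in range(1, len(intervals) // 2 + 1):
        if all(intervals[i] == intervals[i % p]
               for i in range(len(intervals))):
            return "weakly periodic"
    return "aperiodic"                         # neither constant nor periodic

assert classify([40, 40, 40, 40]) == "strongly periodic"
assert classify([30, 50, 30, 50]) == "weakly periodic"
assert classify([25, 40, 31, 44]) == "aperiodic"
```

The same structure applies to packet sizes, yielding the strongly regular, weakly regular, and irregular classes.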

                                    CHAPTER 9

                   OBJECT DATABASES

        Relational database systems support a small, fixed collection of data types (e.g.,
integers, dates, strings), which has proven adequate for traditional application domains
such as administrative data processing. In many application domains, however, much
more complex kinds of data must be handled. Typically this complex data has been
stored in OS file systems or specialized data structures, rather than in a DBMS. Examples
of domains with complex data include computer-aided design and modeling
(CAD/CAM), multimedia repositories, and document management.

As the amount of data grows, the many features offered by a DBMS (for example,
reduced application development time, concurrency control and recovery, indexing
support, and query capabilities) become increasingly attractive and, ultimately, necessary.

In order to support such applications, a DBMS must support complex data types. Object-
oriented concepts have strongly influenced efforts to enhance database support for
complex data and have led to the development of object-database systems. Object-
database systems have developed along two distinct paths:

Object-oriented database systems: Object-oriented database systems are proposed as an
alternative to relational systems and are aimed at application domains where complex
objects play a central role. The approach is heavily influenced by object-oriented
programming languages and can be understood as an attempt to add DBMS functionality
to a programming language environment.

Object-relational database systems: Object-relational database systems can be thought
of as an attempt to extend relational database systems with the functionality necessary to
support a broader class of applications and, in many ways, provide a bridge between the
relational and object-oriented paradigms.

Figure 9.1: Object Oriented Database


Figure 9.2: Example of an Object-Oriented Model.

Object databases are generally recommended when there is a business need for high
performance processing on complex data.

When database capabilities are combined with object programming language capabilities,
the result is an object database management system (ODBMS). An ODBMS makes
database objects appear as programming language objects in one or more object
programming languages. An ODBMS extends the programming language with
transparently persistent data, concurrency control, data recovery, associative queries, and
other capabilities.

Some object-oriented databases are designed to work well with object-oriented
programming languages such as Python, Java, C#, Visual Basic .NET, C++, Objective-C
and Smalltalk; others have their own programming languages. ODBMSs use exactly the
same model as object-oriented programming languages.

Object databases based on persistent programming acquired a niche in application areas
such as engineering and spatial databases, telecommunications, and scientific areas such
as high energy physics and molecular biology. They have made little impact on
mainstream commercial data processing, though there is some usage in specialized areas
of financial services. It is also worth noting that object databases held the record for the
world's largest database (being the first to hold over 1,000 terabytes, at the Stanford
Linear Accelerator Center) and the highest ingest rate ever recorded for a commercial
database, at over one terabyte per hour. Another group of object databases focuses on embedded
use in devices, packaged software, and real-time systems.


Most object databases also offer some kind of query language, allowing objects to be
found by a more declarative programming approach. It is in the area of object query
languages, and the integration of the query and navigational interfaces, that the biggest
differences between products are found. An attempt at standardization was made by the
ODMG with the Object Query Language, OQL.

Access to data can be faster because joins are often not needed (as in a tabular
implementation of a relational database). This is because an object can be retrieved
directly without a search, by following pointers. Another area of variation between
products is in the way that the schema of a database is defined. A general characteristic,
however, is that the programming language and the database schema use the same type
definitions. Multimedia applications are facilitated because the class methods associated
with the data are responsible for its correct interpretation.

Many object databases, for example VOSS, offer support for versioning. An object can
be viewed as the set of all its versions. Also, object versions can be treated as objects in
their own right. Some object databases also provide systematic support for triggers and
constraints which are the basis of active databases.

The efficiency of such a database is also greatly improved in areas that demand
massive amounts of data about one item. For example, a banking institution could
retrieve a user's account object and efficiently provide extensive information such as its
transactions and account entries. The big-O complexity of such a lookup drops from
O(n) to O(1), greatly increasing efficiency in these specific cases.
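The pointer-following access pattern described above can be sketched in a few lines of Python; the Account and Transaction classes and field names are illustrative, not from any particular product:

```python
# Sketch: in an object database, an account holds direct references to its
# transactions, so retrieval follows a pointer instead of scanning a
# transaction table for matching foreign keys.

class Transaction:
    def __init__(self, amount, memo):
        self.amount = amount
        self.memo = memo

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []   # direct object references, no join needed

    def add(self, txn):
        self.transactions.append(txn)

# Relational-style lookup: scan every row for a matching foreign key (O(n)).
def txns_for_account_relational(txn_rows, account_id):
    return [row for row in txn_rows if row["account_id"] == account_id]

acct = Account("alice")
acct.add(Transaction(100, "deposit"))
acct.add(Transaction(-30, "groceries"))

# Object-style: one pointer dereference retrieves the whole collection (O(1)).
print(len(acct.transactions))  # → 2
```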

Figure 9.3: Relational database of a cat
Figure 9.4: Object-oriented database of a cat


This enables:

      complex data types to be stored (e.g. CAD applications)
      a wide range of data types in the same database (e.g. multimedia applications)
      objects to be followed through time more easily (e.g. "evolutionary applications")

9.1 Object-Oriented Database Systems Manifesto

Figure 9.5: Object-Oriented Database Systems Manifesto.

9.1.1 Objects
Objects may be complex: images, sound, video, and so on. A complex object is formed
from simpler ones by constructors; the constructors include record, set, bag, list and
array. The constructors are orthogonal, that is, any constructor can be applied to any
object.

Object identity and equality: Every object has a unique and immutable object identifier
(OID) that it retains throughout its lifetime. Even after an object is destroyed, its OID is
not reassigned to another object. OIDs also allow objects to be shared through references.
Two objects are identical if they have the same OID, and equal if they have the same
state; equality may accordingly be shallow or deep.
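The distinction between identity (same OID) and equality (same state) can be sketched in Python; the counter-based OID scheme here is purely illustrative:

```python
# Sketch: each object receives an immutable OID at creation; identity
# compares OIDs, equality compares state.
import itertools

_oid_counter = itertools.count(1)

class DBObject:
    def __init__(self, state):
        self.oid = next(_oid_counter)   # unique, never reused or reassigned
        self.state = state

def identical(a, b):
    return a.oid == b.oid       # same object

def equal(a, b):
    return a.state == b.state   # same value, possibly different objects

x = DBObject({"name": "sunrise.jpg"})
y = DBObject({"name": "sunrise.jpg"})

assert identical(x, x) and not identical(x, y)
assert equal(x, y)              # equal in state, yet distinct objects
```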

Encapsulation: An object consists of an interface, through which it interacts with other
objects, and an implementation, which realizes its behavior. The interface defines the
signatures of the public methods and follows a common standard. The implementation
includes the object's data and methods. The object's state is modified only through its
public methods and cannot be changed implicitly; the object is protected against direct
manipulation from outside its class, though the object data structure may be exposed for
declarative queries.
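A minimal Python sketch of encapsulation, with hypothetical names: state is changed only through the public interface.

```python
# Sketch: the _width/_height fields are conventionally private; resize()
# and dimensions() form the public interface through which all state
# changes and reads happen.

class Image:
    def __init__(self, width, height):
        self._width = width      # private state, not touched from outside
        self._height = height

    # public interface: the only sanctioned way to modify the state
    def resize(self, factor):
        if factor <= 0:
            raise ValueError("factor must be positive")
        self._width = int(self._width * factor)
        self._height = int(self._height * factor)

    def dimensions(self):
        return (self._width, self._height)

img = Image(800, 600)
img.resize(0.5)
print(img.dimensions())  # → (400, 300)
```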

Figure 9.6: Objects

9.1.2 Types and classes

A data type defines an object's properties. The static part of the definition describes the
object's structure, while the dynamic part describes the object's behavior; interface and
implementation are kept separate. Data types are used to check the correctness of
programs at compile time.

Object classes are containers for objects of the same type. A class serves as a "black
box" that keeps its objects safe from outside interference: objects can be added and
removed only through the class, which is used to create and manipulate objects at run
time.

Figure 9.7: Types and classes


9.1.3 Generalization Hierarchies

Generalization hierarchies classify classes into subclasses and superclasses according to
which structures and behaviors are inherited by other classes. Generalization is a
powerful modeling tool that gives a precise and concise description of the application.
Its main purpose is the reuse of class specifications and implementations, which reduces
rewriting of the same code. Inherited classes can also be modified to provide a variety of
functionality.

Inheritance creates a hierarchy in which objects of a subclass automatically belong to
the superclass, from which their attributes and methods are inherited. The subclass can
introduce new attributes and methods in addition to those of the superclass. Objects may
also migrate between classes, moving between levels of the hierarchy.
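A short Python sketch of subclass inheritance (class names are hypothetical): the subclass automatically inherits the superclass's attributes and methods and adds its own.

```python
# Sketch: TheaterCafe inherits everything from Theater and introduces
# one new attribute, menu.

class Theater:
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def describe(self):
        return f"{self.name} at {self.address}"

class TheaterCafe(Theater):          # inherits name, address, describe()
    def __init__(self, name, address, menu):
        super().__init__(name, address)
        self.menu = menu             # new attribute introduced by the subclass

tc = TheaterCafe("Majestic", "1 Main St", ["pizza"])
print(tc.describe())            # inherited behavior works unchanged
print(isinstance(tc, Theater))  # → True: subclass objects belong to the superclass
```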

Object specialization moves down the hierarchy whilst generalization goes up.

Figure 9.8: Generalization Hierarchies

Substitution inheritance is a hierarchy in which the subtype has more operations than
the supertype and can be substituted wherever the supertype is expected to operate. This
inheritance is based on behavior rather than values.


Inclusion inheritance is a hierarchy in which every object of the subtype is also an
object of the supertype. It is based on structure rather than operations.

Constraint inheritance is a special case of inclusion inheritance. Here, the subtype is
expressed by constraint on supertype.

Specialization inheritance has subtype objects that contain more specific information.

9.1.4 Overriding, Overloading and Late Binding
Method overriding: Here the method is redefined in the subtype. Overriding guarantees
specialization of methods while preserving a uniform method interface.

Method overloading: This is the effect produced by method overriding: several versions
of a method can exist in parallel under the same name.

Late binding: The appropriate version of an overloaded method is selected at run time
and bound to the call. This is also known as virtual method dispatch.
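The three mechanisms can be sketched together in Python: display() is redefined in the subtype (overriding), the two versions coexist under one name (overloading), and the version actually executed is chosen from the object's runtime type (late binding). The type names are illustrative.

```python
# Sketch of overriding and late binding with hypothetical image types.

class Image:
    def display(self):
        return "rendering raw image"

class JpegImage(Image):
    def display(self):               # overrides the supertype method
        return "decompressing JPEG, then rendering"

def show(img):
    # late binding: which display() runs depends on img's runtime type,
    # not on the static type the caller had in mind
    return img.display()

assert show(Image()) == "rendering raw image"
assert show(JpegImage()) == "decompressing JPEG, then rendering"
```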

Figure 9.9: Overriding, Overloading and Late Binding


9.1.5 Computational Completeness and Extensibility
Computational completeness is a requirement on the method implementation
language: any computable function can be expressed in it. It can be realized through a
connection with an existing programming language.

Extensibility: A database has a set of predefined types, and developers can define new
types according to their requirements. There is no usage distinction between system and
user types, which provides great flexibility and makes the system extensible.

9.1.6 Durability and Efficiency
Persistence is the longevity of objects or data, since data has to survive program
execution. Objects are made persistent by storing them on disk and retrieving them
when needed. Persistence is of two types: orthogonal persistence and implicit
persistence.

Secondary storage management promotes efficiency and durability for the objects. The
objects can be managed and accessed using the following: index management, data
clustering, data buffering, access path selection, query optimization.

9.1.7 Concurrency Control and Recovery
Concurrency is the management of multiple users interacting with the system at the
same time. It offers atomicity, consistency, isolation and durability, and allows
serializability of operations.

Reliability is the resiliency to user, software and hardware failures. It assures that
transactions can be committed or aborted without undesirable effects. If the transaction
was unable to complete, it restores the previous coherent state of data. It allows redoing
and undoing of transactions. It keeps a log of operations performed.

9.1.8 Declarative Query Language
It is a high-level language used to express non-trivial queries concisely. It has a
text-based or graphical interface. It offers efficient execution because it lends itself to
query optimization. It is application independent and hence works on any possible
database irrespective of the product. There is no need for additional methods on
user-defined types.


9.1.9 Optional Characteristics and Open Choices
Optional characteristics of object-oriented databases include

      multiple inheritance
      type checking and inference
      distribution
      design transactions, long transactions, nested transactions

Open choices being

      programming paradigm
      representation system
      type system
      uniformity

9.2 New Data Types
      User-defined abstract data types (ADTs): These include images, voice, video
       footage, and other data types defined by the user, and they must be stored in the
       database. Further, we need special functions to manipulate these objects. For
       example, we may want to write functions that produce a compressed version of an
       image or a lower-resolution image.

      Structured types: In this application, as indeed in many traditional business data
       processing applications, we need new types built up from atomic types using
       constructors for creating sets, tuples, arrays, sequences, and so on.

      Inheritance: As the number of data types grows, it is important to recognize the
       commonality between different types and to take advantage of it. For example,
       compressed images and lower-resolution images are both, at some level, just
       images. It is therefore desirable to inherit some features of image objects while
       defining (and later manipulating) compressed image objects and lower-resolution
       image objects.

How might we address these issues in an RDBMS? We could store images, videos, and
so on as BLOBs in current relational systems. A binary large object (BLOB) is just a
long stream of bytes, and the DBMS's support consists of storing and retrieving BLOBs
in such a manner that a user does not have to worry about the size of the BLOB; a BLOB
can span several pages, unlike a traditional attribute. All further processing of the BLOB
has to be done by the user's application program, in the host language in which the SQL
code is embedded. This solution is not efficient because we are forced to retrieve all
BLOBs in a collection even if most of them could be filtered out of the answer by
applying user-defined functions (within the DBMS). It is not satisfactory from a data


consistency standpoint either, because the semantics of the data is now heavily dependent
on the host language application code and cannot be enforced by the DBMS.

Large objects in SQL: SQL:1999 includes a new data type called LARGE OBJECT or
LOB, with two variants called BLOB (binary large object) and CLOB (character large
object). This standardizes the large object support found in many current relational
DBMSs. LOBs cannot be included in primary keys, GROUP BY, or ORDER BY clauses.
They can be compared using equality, inequality, and substring operations. A LOB has a
locator that is essentially a unique id and allows LOBs to be manipulated without
extensive copying.

LOBs are typically stored separately from the data records in whose fields they appear.
IBM DB2, Informix, Microsoft SQL Server, Oracle 8, and Sybase ASE all support LOBs.

The following example illustrates how such a database can be used:

Clog produces a cereal called Delirios, and it wants to lease an image of Herbert the
Worm in front of a sunrise to incorporate in the Delirios box design. A SQL query to
present a collection of frames with their lease prices is shown below.

The thumbnail method is used to produce a small-size version of the full-size input
image. The is_sunrise method returns true if the picture shows a sunrise, and false
otherwise; the same holds for is_herbert. The query produces the frame code number,
image thumbnail, and price for all frames that contain images of sunrises or Herbert.


The second challenge comes from Dinky's executives. They know that Delirios is
exceedingly popular in the tiny country of Andorra, so they want to make sure that a
number of Herbert films are playing at theaters near Andorra when the cereal hits the
shelves. To check on the current state of affairs, the executives want to _nd the names of
all theaters showing Herbert films within 100 kilometers of Andorra.

The theater attribute of the Nowshowing table is a reference to an object in another table,
which has attributes name, address, and location. This object referencing allows for the
notations N.theater->name and N.theater->address, each of which refers to an attribute of
the theater_t object referenced in the Nowshowing row N. The stars attribute of the Films
table is a set of the names of each film's stars. The radius method returns a circle centered
at its first argument with radius equal to its second argument.

The overlaps method tests for spatial overlap. Thus, Nowshowing and Films are joined
by the equijoin clause, while Nowshowing and Countries are joined by the spatial overlap
clause. The selections on `Andorra' and films containing `Herbert the Worm' complete the
query.

These two object-relational queries are similar to SQL-92 queries but have some unusual
features:

User-defined methods: User-defined abstract types are manipulated via their methods,
for example, is_herbert.

Operators for structured types: Along with the structured types available in the data
model, ORDBMSs provide the natural methods for those types. For example, set
types have the standard set methods ∈, ∋, ∪, ∩, −, and so on.

Operators for reference types: Reference types are dereferenced via an arrow notation.

9.3 User-Defined Abstract Data Types
Consider the Frames table given earlier. It has a column image of type jpeg_image,
which stores a compressed image representing a single frame of a film. The jpeg_image type is
not one of the DBMS's built-in types and was defined by a user for the Dinky application
to store image data compressed using the JPEG standard. As another example, the
Countries table has a column boundary of type polygon, which contains representations
of the shapes of countries' outlines on a world map.


Allowing users to define arbitrary new data types is a key feature of ORDBMSs. The
DBMS allows users to store and retrieve objects of type jpeg_image, just like an object of
any other type, such as integer. New atomic data types usually need to have type-specific
operations defined by the user who creates them. For example, one might define
operations on an image data type such as compress, rotate, shrink, and crop. The
combination of an atomic data type and its associated methods is called an abstract data
type, or ADT. Traditional SQL comes with built-in ADTs, such as integers (with the
associated arithmetic methods), or strings (with the equality, comparison, and LIKE
methods). Object-relational systems include these ADTs and also allow users to define
their own ADTs.

The label `abstract' is applied to these data types because the database system does not
need to know how an ADT's data is stored nor how the ADT's methods work. It merely
needs to know what methods are available and the input and output types for the
methods. Hiding of ADT internals is called encapsulation. Note that even in a relational
system, atomic types such as integers have associated methods that are encapsulated into
ADTs. In the case of integers, the standard methods for the ADT are the usual arithmetic
operators and comparators. To evaluate the addition operator on integers, the database
system need not understand the laws of addition - it merely needs to know how to invoke
the addition operator's code and what type of data to expect in return.

In an object-relational system, the simplification due to encapsulation is critical because
it hides any substantive distinctions between data types and allows an ORDBMS to be
implemented without anticipating the types and methods that users might want to add.
For example, adding integers and overlaying images can be treated uniformly by the
system, with the only significant distinctions being that different code is invoked for the
two operations and differently typed objects are expected to be returned from that code.

Some ORDBMSs actually refer to ADTs as opaque types because they are encapsulated
and hence one cannot see their details.

Packaged ORDBMS extensions: Developing a set of user-defined types and methods
for a particular application - say image management - can involve a significant amount of
work and domain-specific expertise. As a result, most ORDBMS vendors partner with
third parties to sell prepackaged sets of ADTs for particular domains. Informix calls these
extensions DataBlades, Oracle calls them Data Cartridges, IBM calls them DB2
Extenders, and so on. These packages include the ADT method code, DDL scripts to
automate loading the ADTs into the system, and in some cases specialized access
methods for the data type. Packaged ADT extensions are analogous to class libraries that
are available for object-oriented programming languages: They provide a set of objects
that together address a common task.


9.3.1 Defining Methods of an ADT
At a minimum, for each new atomic type a user must define methods that enable the
DBMS to read in and to output objects of this type and to compute the amount of storage
needed to hold the object. The user who creates a new atomic type must register the
following methods with the DBMS:

Size: Returns the number of bytes of storage required for items of the type or the special
value variable, if items vary in size.

Import: Creates new items of this type from textual inputs (e.g., INSERT statements).

Export: Maps items of this type to a form suitable for printing, or for use in an
application program (e.g., an ASCII string or a file handle).

In order to register a new method for an atomic type, users must write the code for the
method and then inform the database system about the method. The code to be written
depends on the languages supported by the DBMS, and possibly the operating system in
question. For example, the ORDBMS may handle Java code in the Linux operating
system. In this case the method code must be written in Java and compiled into a Java
bytecode file stored in a Linux file system. Then an SQL-style method registration
command is given to the ORDBMS so that it recognizes the new method.

CREATE FUNCTION is_sunrise(jpeg_image) RETURNS boolean

AS EXTERNAL NAME '/a/b/c/dinky.class' LANGUAGE 'java';

Once the method is registered, the DBMS uses a Java virtual machine to execute the
code. The following are a number of method registration commands for our Dinky
database.

1. CREATE FUNCTION thumbnail(jpeg_image) RETURNS jpeg_image

AS EXTERNAL NAME '/a/b/c/dinky.class' LANGUAGE 'java';

2. CREATE FUNCTION is_sunrise(jpeg_image) RETURNS boolean

AS EXTERNAL NAME '/a/b/c/dinky.class' LANGUAGE 'java';

3. CREATE FUNCTION is_herbert(jpeg_image) RETURNS boolean

AS EXTERNAL NAME '/a/b/c/dinky.class' LANGUAGE 'java';

4. CREATE FUNCTION radius(polygon, float) RETURNS polygon

AS EXTERNAL NAME '/a/b/c/dinky.class' LANGUAGE 'java';

5. CREATE FUNCTION overlaps(polygon, polygon) RETURNS boolean

AS EXTERNAL NAME '/a/b/c/dinky.class' LANGUAGE 'java';
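As a rough analogy in a conventional embeddable DBMS, SQLite lets a host-language function be registered with the engine and then invoked from SQL, much like the registration commands above. The is_sunrise predicate below is a stand-in for real image analysis; the table and values are invented:

```python
# Sketch: register a Python function with SQLite so SQL queries can call it.
import sqlite3

def is_sunrise(filename):
    # placeholder predicate; a real method would inspect image contents
    return 1 if "sunrise" in filename else 0

conn = sqlite3.connect(":memory:")
conn.create_function("is_sunrise", 1, is_sunrise)  # register with the engine

conn.execute("CREATE TABLE frames (frameno INTEGER, image TEXT)")
conn.execute("INSERT INTO frames VALUES (1, 'sunrise_over_hills.jpg')")
conn.execute("INSERT INTO frames VALUES (2, 'herbert_closeup.jpg')")

# The registered function is now usable inside SQL, as if built in:
rows = conn.execute(
    "SELECT frameno FROM frames WHERE is_sunrise(image) = 1").fetchall()
print(rows)  # → [(1,)]
```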


These are the method registration commands for the Dinky database. Type definition
statements for the user-defined atomic data types in the Dinky schema are given below.


CREATE ABSTRACT DATA TYPE jpeg_image
(internallength = VARIABLE, input = jpeg_in, output = jpeg_out);

CREATE ABSTRACT DATA TYPE polygon
(internallength = VARIABLE, input = poly_in, output = poly_out);

9.4 Structured Types
Atomic types and user-defined types can be combined to describe more complex
structures using type constructors. Structured types are:

Lists: Traditional list operations include head, which returns the first element; tail, which
returns the list obtained by removing the first element; prepend, which takes an element
and inserts it as the first element in a list; and append, which appends one list to another.
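The four traditional list operations can be written directly, for instance in Python:

```python
# head: first element; tail: list without the first element;
# prepend: insert an element at the front; append: concatenate two lists.

def head(lst):
    return lst[0]

def tail(lst):
    return lst[1:]

def prepend(item, lst):
    return [item] + lst

def append(lst1, lst2):
    return lst1 + lst2

assert head([1, 2, 3]) == 1
assert tail([1, 2, 3]) == [2, 3]
assert prepend(0, [1, 2]) == [0, 1, 2]
assert append([1], [2, 3]) == [1, 2, 3]
```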

Arrays: Array types support an `array index' method to allow users to access array items
at a particular offset. A postfix `square bracket' syntax is usually used; for example,
foo[5] returns the element of array foo at offset 5.

Other: The operators listed above are just a sample. We also have the aggregate
operators count, sum, avg, max, and min, which can (in principle) be applied to any object
of a collection type. Operators for type conversions are also common. For example, we
can provide operators to convert a multiset object to a set object by eliminating
duplicates.

9.5 Objects, Object Identity, and Reference Types
In object-database systems, data objects can be given an object identifier (oid), which is
some value that is unique in the database across time. The DBMS is responsible for
generating oids and ensuring that an oid identifies an object uniquely over its entire
lifetime. In some systems, all tuples stored in any table are objects and are automatically
assigned unique oids; in other systems, a user can specify the tables for which the tuples
are to be assigned oids. Often, there are also facilities for generating oids for larger
structures (e.g., tables) as well as smaller structures (e.g., instances of data values such as
a copy of the integer 5, or a JPEG image). An object's oid can be used to refer (or `point')
to it from elsewhere in the data. Such a reference has a type (similar to the type of a
pointer in a programming language), with a corresponding type constructor:


ref(base): a type representing a reference to an object of type base.

The ref type constructor can be interleaved with the type constructors for structured
types; for example, ROW(ref(ARRAY(integer))).

9.5.1 Notions of Equality
The distinction between reference types and reference-free structured types raises another
issue: the definition of equality. Two objects having the same type are defined to be deep
equal if and only if:

      The objects are of atomic type and have the same value, or
      The objects are of reference type, and the deep equals operator is true for the two
       referenced objects, or
      The objects are of structured type, and the deep equals operator is true for all the
       corresponding subparts of the two objects.

Two objects that have the same reference type are defined to be shallow equal if they
both refer to the same object (i.e., both references use the same oid).
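A Python sketch of the two notions, using a hypothetical Ref wrapper in place of real oid-based references:

```python
# Sketch: shallow equality compares references (oids); deep equality
# recurses through structure and follows references to the objects themselves.

class Ref:
    def __init__(self, oid, target):
        self.oid = oid
        self.target = target

def shallow_equal(a, b):
    return isinstance(a, Ref) and isinstance(b, Ref) and a.oid == b.oid

def deep_equal(a, b):
    if isinstance(a, Ref) and isinstance(b, Ref):
        return deep_equal(a.target, b.target)   # follow the references
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            deep_equal(x, y) for x, y in zip(a, b))
    return a == b                               # atomic values

r1 = Ref(1, [1, 2])
r2 = Ref(2, [1, 2])      # different object, same state
assert not shallow_equal(r1, r2)   # not the same oid
assert deep_equal(r1, r2)          # but deep-equal in state
```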

9.5.2 Dereferencing Reference Types
An item of reference type ref(foo) is not the same as the foo item to which it points. In
order to access the referenced foo item, a built-in deref() method is provided along with
the ref type constructor.

9.6 Inheritance
In object-database systems, unlike relational systems, inheritance is supported directly
and allows type definitions to be reused and refined very easily. It can be very helpful
when modeling similar but slightly different classes of objects. In object-database
systems, inheritance can be used in two ways: for reusing and refining types, and for
creating hierarchies of collections of similar but not identical objects.

9.6.1 Defining Types with Inheritance
In the Dinky database, we model movie theaters with the type theater_t. Dinky also
wants its database to represent a new marketing technique in the theater business: the
theater-cafe, which serves pizza and other meals while screening movies. Theater-cafes
require additional information to be represented in the database. In particular, a
theater-cafe is just like a theater, but has an additional attribute representing the
theater's menu.


Inheritance allows us to capture this `specialization' explicitly in the database design with
the following DDL statement:

CREATE TYPE theatercafe_t UNDER theater_t (menu text);

This statement creates a new type, theatercafe_t, which has the same attributes and
methods as theater_t, along with one additional attribute menu of type text. Methods
defined on theater_t apply to objects of type theatercafe_t, but not vice versa. We say
that theatercafe_t inherits the attributes and methods of theater_t.

Note that the inheritance mechanism is not merely a `macro' to shorten CREATE
statements. It creates an explicit relationship in the database between the subtype
(theatercafe_t) and the supertype (theater_t): An object of the subtype is also considered
to be an object of the supertype. This treatment means that any operations that apply to
the supertype (methods as well as query operators such as projection or join) also apply
to the subtype. This is generally expressed in the Substitution Principle: given a
supertype A and a subtype B, it is always possible to substitute an object of type B into a
legal expression written for objects of type A, without producing type errors.

This principle enables easy code reuse because queries and methods written for the
supertype can be applied to the subtype without modification. Note that inheritance can
also be used for atomic types, in addition to row types. Given a supertype image_t with
methods title(), number_of_colors(), and display(), we can define a subtype
thumbnail_image_t for small images that inherits the methods of image_t.
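The principle can be illustrated in Python with hypothetical stand-ins for the image supertype and thumbnail subtype: a function written for the supertype accepts subtype objects unchanged.

```python
# Sketch of the Substitution Principle: caption() is written with only the
# supertype in mind, yet subtype objects substitute without type errors.

class ImageT:
    def __init__(self, title, colors):
        self._title, self._colors = title, colors
    def title(self):
        return self._title
    def number_of_colors(self):
        return self._colors

class ThumbnailImageT(ImageT):       # inherits every ImageT method
    pass

def caption(img):                    # written for ImageT objects
    return f"{img.title()} ({img.number_of_colors()} colors)"

# A subtype object substitutes for the supertype, enabling code reuse:
assert caption(ThumbnailImageT("Herbert", 16)) == "Herbert (16 colors)"
assert caption(ImageT("Sunrise", 256)) == "Sunrise (256 colors)"
```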

9.6.2 Binding of Methods
In defining a subtype, it is sometimes useful to replace a method for the supertype with a
new version that operates differently on the subtype. Consider the image_t type and the
subtype jpeg_image_t from the Dinky database. Unfortunately, the display() method for
standard images does not work for JPEG images, which are specially compressed. Thus,
in creating type jpeg_image_t, we write a special display() method for JPEG images and
register it with the database system using the CREATE FUNCTION command:

CREATE FUNCTION display(jpeg_image) RETURNS jpeg_image

AS EXTERNAL NAME '/a/b/c/jpeg.class' LANGUAGE 'java';

Registering a new method with the same name as an old method is called overloading
the method name.

Because of overloading, the system must understand which method is intended in a
particular expression. For example, when the system needs to invoke the display()
method on an object of type jpeg_image_t, it uses the specialized display method.


When it needs to invoke display on an object of type image_t that is not otherwise
subtyped, it invokes the standard display method. The process of deciding which method
to invoke is called binding the method to the object. In certain situations, this binding
can be done when an expression is parsed (early binding), but in other cases the most
specific type of an object cannot be known until runtime, so the method cannot be bound
until then (late binding). Late binding facilities add flexibility, but can make it harder for
the user to reason about the methods that get invoked for a given query expression.

9.6.3 Collection Hierarchies, Type Extents, and Queries
Type inheritance was invented for object-oriented programming languages, and our
discussion of inheritance up to this point differs little from the discussion one might find
in a book on an object-oriented language such as C++ or Java.

However, because database systems provide query languages over tabular datasets, the
mechanisms from programming languages are enhanced in object databases to deal with
tables and queries as well. In particular, in object-relational systems we can define a table
containing objects of a particular type, such as the Theaters table in the Dinky schema.
Given a new subtype such as theatercafe_t, we would like to create another table,
Theater_cafes, to store the information about theater-cafes. But when writing a query over
the Theaters table, it is sometimes desirable to ask the same query over the Theater_cafes
table; after all, if we project out the additional columns, an instance of the Theater_cafes
table can be regarded as an instance of the Theaters table.

Rather than requiring the user to specify a separate query for each such table, we can
inform the system that a new table of the subtype is to be treated as part of a table of the
supertype, with respect to queries over the latter table. In our example, we can say:

CREATE TABLE Theater_cafes OF TYPE theatercafe_t UNDER Theaters;

This statement tells the system that queries over the Theaters table should actually be run
over all tuples in both the Theaters and Theater_cafes tables. In such cases, if the subtype
definition involves method overloading, late-binding is used to ensure that the
appropriate methods are called for each tuple.

In general, the UNDER clause can be used to generate an arbitrary tree of tables, called a
collection hierarchy. Queries over a particular table T in the hierarchy are run over all
tuples in T and its descendants. Sometimes, a user may want the query to run only on T,
and not on the descendants; additional syntax, for example, the keyword ONLY, can be
used in the query's FROM clause to achieve this effect.

Some systems automatically create special tables for each type, which contain references
to every instance of the type that exists in the database. These tables are called type
extents and allow queries over all objects of a given type, regardless of where the objects
actually reside in the database. Type extents naturally form a collection hierarchy that
parallels the type hierarchy.

                                                                     Multimedia Databases

9.7 Database Design For an ORDBMS
The rich variety of data types in an ORDBMS offers a database designer many
opportunities for a more natural or more efficient design. In this section we illustrate the
differences between RDBMS and ORDBMS database design through several examples.

9.7.1 Structured Types and ADTs
Our first example involves several space probes, each of which continuously records a
video. A single video stream is associated with each probe, and while this stream was
collected over a certain time period, we assume that it is now a complete object
associated with the probe. During the time period over which the video was collected, the
probe's location was periodically recorded (such information can easily be `piggy-backed'
onto the header portion of a video stream conforming to the MPEG standard). Thus, the
information associated with a probe has three parts: (1) a probe id that identifies a probe
uniquely, (2) a video stream, and (3) a location sequence of ⟨time, location⟩ pairs. What
kind of a database schema should we use to store this information?

An RDBMS Database Design
In an RDBMS, we must store each video stream as a BLOB and each location sequence
as tuples in a table. A possible RDBMS database design is illustrated below:

Probes (pid: integer, time: timestamp, lat: real, long: real, camera: string,
      video: BLOB)

There is a single table called Probes, and it has several rows for each probe. Each of these
rows has the same pid, camera, and video values, but different time, lat, and long values.
(We have used latitude and longitude to denote location.) The key for this table can be
represented as a functional dependency: PTLN → CV, where N stands for longitude. There
is another dependency: P → CV. This relation is therefore not in BCNF; indeed, it is not
even in 3NF. We can decompose Probes to obtain a BCNF schema:

Probes Loc (pid: integer, time: timestamp, lat: real, long: real)

Probes Video (pid: integer, camera: string, video: BLOB)

This design is about the best we can achieve in an RDBMS. However, it suffers from
several drawbacks.

First, representing videos as BLOBs means that we have to write application code in an
external language to manipulate a video object in the database. Consider this query: "For
probe 10, display the video recorded between 1:10 p.m. and 1:15 p.m. on May 10 1996."
We have to retrieve the entire video object associated with probe 10, recorded over
several hours, in order to display a segment recorded over 5 minutes.

Next, the fact that each probe has an associated sequence of location readings is
obscured, and the sequence information associated with a probe is dispersed across
several tuples. A third drawback is that we are forced to separate the video information
from the sequence information for a probe. These limitations are exposed by queries that
require us to consider all the information associated with each probe; for example, "For
each probe, print the earliest time at which it recorded, and the camera type." This query
now involves a join of Probes Loc and Probes Video on the pid field.
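As a sketch, the decomposed design and this join query can be reproduced with SQLite from Python; the table names are compressed (ProbesLoc, ProbesVideo) and the sample rows are invented:

```python
# A sketch of the BCNF-decomposed RDBMS design using SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ProbesLoc (pid INTEGER, time TEXT, lat REAL, long REAL)")
con.execute("CREATE TABLE ProbesVideo (pid INTEGER, camera TEXT, video BLOB)")
con.executemany("INSERT INTO ProbesLoc VALUES (?, ?, ?, ?)",
                [(10, "1996-05-10 13:00", 1.0, 2.0),
                 (10, "1996-05-10 13:05", 1.1, 2.1),
                 (11, "1996-05-10 12:00", 5.0, 6.0)])
con.executemany("INSERT INTO ProbesVideo VALUES (?, ?, ?)",
                [(10, "infrared", b""), (11, "optical", b"")])

# "For each probe, print the earliest time at which it recorded, and the
# camera type" -- the decomposition forces a join on pid.
rows = con.execute("""
    SELECT L.pid, MIN(L.time), V.camera
    FROM ProbesLoc L JOIN ProbesVideo V ON L.pid = V.pid
    GROUP BY L.pid
""").fetchall()
print(rows)
```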

An ORDBMS Database Design
An ORDBMS supports a much better solution. First, we can store the video as an ADT
object and write methods that capture any special manipulation that we wish to perform.
Second, because we are allowed to store structured types such as lists, we can store the
location sequence for a probe in a single tuple, along with the video information! This
layout eliminates the need for joins in queries that involve both the sequence and video
information. An ORDBMS design for our example consists of a single relation called
Probes AllInfo:

Probes AllInfo(pid: integer, locseq: location seq, camera: string,
      video: mpeg stream)

This definition involves two new types, location seq and mpeg stream. The mpeg stream
type is defined as an ADT, with a method display() that takes a start time and an end time
and displays the portion of the video recorded during that interval. This method can be
implemented efficiently by looking at the total recording duration and the total length of
the video and interpolating to extract the segment recorded during the interval specified
in the query.
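A minimal sketch of this interpolation idea, assuming a fixed-rate stream so that byte offsets are proportional to time (the function name and units are invented, not a real API):

```python
# Hypothetical sketch: locate the byte range of a video segment by linear
# interpolation over the total duration and total length of the stream.
def segment_byte_range(total_seconds, total_bytes, start_s, end_s):
    """Map the interval [start_s, end_s] onto a byte range of the stream."""
    bytes_per_second = total_bytes / total_seconds
    return int(start_s * bytes_per_second), int(end_s * bytes_per_second)

# A 2-hour (7200 s), 900 MB recording; extract the 5 minutes from 70 to 75 min.
lo, hi = segment_byte_range(7200, 900_000_000, 70 * 60, 75 * 60)
print(lo, hi)  # 525000000 562500000
```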

Our first query, using this display method, is shown below in extended SQL syntax:

SELECT display(, 1:10 p.m. May 10 1996, 1:15 p.m. May 10 1996)
FROM Probes AllInfo P
WHERE P.pid = 10

We now retrieve only the required segment of the video, rather than the entire video.

Now consider the location seq type. We could define it as a list type, containing a
list of ROW type objects:

CREATE TYPE location seq listof (row (time: timestamp, lat: real, long: real))

Consider the locseq field in a row for a given probe. This field contains a list of rows,
each of which has three fields. If the ORDBMS implements collection types in their full
generality, we should be able to extract the time column from this list to obtain a list of
timestamp values, and to apply the MIN aggregate operator to this list to find the earliest
time at which the given probe recorded. Such support for collection types would enable
us to express our second query as shown below:

SELECT, MIN(P.locseq.time)

FROM Probes AllInfo P

Current ORDBMSs are not as general and clean as this example query suggests. For
instance, the system may not recognize that projecting the time column from a list of
rows gives us a list of timestamp values; or the system may allow us to apply an
aggregate operator only to a table and not to a nested list value.
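If collections were fully supported, the projection-then-MIN that the query relies on is just the following list manipulation (a sketch with invented sample data):

```python
# Sketch: locseq as a list of (time, lat, long) rows; project the time
# column and apply MIN to find the earliest recording time.
locseq = [("1996-05-10 13:00", 1.0, 2.0),
          ("1996-05-10 12:45", 1.1, 2.1),
          ("1996-05-10 13:05", 1.2, 2.2)]

times = [t for (t, lat, lon) in locseq]  # project the time column
earliest = min(times)                    # MIN over the resulting list
print(earliest)  # 1996-05-10 12:45
```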

Continuing with our example, we may want to do specialized operations on our location
sequences that go beyond the standard aggregate operators. For instance, we may want to
define a method that takes a time interval and computes the distance traveled by the
probe during this interval. The code for this method must understand details of a probe's
trajectory and geospatial coordinate systems. For these reasons, we might choose to
define location seq as an ADT.
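A toy sketch of such an ADT method, using planar (Euclidean) distance between consecutive readings; a real implementation would account for the geospatial coordinate system, as noted above. All names and data here are invented:

```python
from math import hypot

def distance_traveled(locseq, start, end):
    """Sum straight-line hops between consecutive readings inside [start, end]."""
    pts = [(lat, lon) for (t, lat, lon) in locseq if start <= t <= end]
    return sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:]))

seq = [(0, 0.0, 0.0), (1, 3.0, 4.0), (2, 3.0, 8.0), (3, 9.0, 9.0)]
print(distance_traveled(seq, 0, 2))  # 9.0: a hop of length 5, then one of 4
```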

Clearly, an ideal ORDBMS gives us many useful design options that are not available in
an RDBMS.

9.7.2 Object Identity
We now discuss some of the consequences of using reference types or oids. The use of
oids is especially significant when the size of the object is large, either because it is a
structured data type or because it is a big object such as an image.

Although reference types and structured types seem similar, they are actually quite
different.

There are important differences in the way that database updates affect these two types:

Deletion: Objects with references can be affected by the deletion of objects that they
reference, while reference-free structured objects are not affected by deletion of other
objects. For example, if the Theaters table were dropped from the database, an object of
type theater might change value to null, because the theater t object that it refers to has
been deleted, while a similar object of type my theater would not change value.

Update: Objects of reference types will change value if the referenced object is updated.
Objects of reference-free structured types change value only if updated directly.

Sharing versus copying: An identified object can be referenced by multiple reference
type items, so that each update to the object is reflected in many places. To get a similar
effect in reference-free types requires updating all `copies' of an object.


There are also important storage distinctions between reference types and non-reference
types, which might affect performance:

Storage overhead: Storing copies of a large value in multiple structured type objects
may use much more space than storing the value once and referring to it elsewhere
through reference type objects. This additional storage requirement can affect both disk
usage and buffer management (if many copies are accessed at once).

Clustering: The subparts of a structured object are typically stored together on disk.
Objects with references may point to other objects that are far away on the disk, and the
disk arm may require significant movement to assemble the object and its references
together. Structured objects can thus be more efficient than reference types if they are
typically accessed in their entirety.
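The sharing-versus-copying distinction can be sketched with ordinary Python references (the theater data is invented):

```python
# One identified object, referenced twice versus copied twice.
shared = {"name": "Majestic", "price": 5}

ref_a = {"theater": shared}          # reference-type fields: both point at
ref_b = {"theater": shared}          # the same object

copy_a = {"theater": dict(shared)}   # reference-free "structured" values:
copy_b = {"theater": dict(shared)}   # independent copies

shared["price"] = 7                  # a single update to the shared object
print(ref_a["theater"]["price"], ref_b["theater"]["price"])    # 7 7
print(copy_a["theater"]["price"], copy_b["theater"]["price"])  # 5 5
```

The update is visible through every reference but through none of the copies, which is exactly the trade-off described above.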

Many of these issues also arise in traditional programming languages such as C or Pascal,
which distinguish between the notions of referring to objects by value and by reference.
In database design, the choice between using a structured type or a reference type will
typically include consideration of the storage costs, clustering issues, and the effect of
updates.

Object Identity versus Foreign Keys

Using an oid to refer to an object is similar to using a foreign key to refer to a tuple in
another relation, but not quite the same: An oid can point to an object of theater t that is
stored anywhere in the database, even in a field, whereas a foreign key reference is
constrained to point to an object in a particular referenced relation. This restriction makes
it possible for the DBMS to provide much greater support for referential integrity than for
arbitrary oid pointers. In general, if an object is deleted while there are still oid-pointers
to it, the best the DBMS can do is to recognize the situation by maintaining a reference
count. (Even this limited support becomes impossible if oids can be copied freely.) Thus,
the responsibility for avoiding dangling references rests largely with the user if oids are
used to refer to objects. This burdensome responsibility suggests that we should use oids
with great caution and use foreign keys instead whenever possible.

9.7.3 Extending the ER Model
The ER model as we described it in Chapter 2 is not adequate for ORDBMS design. We
have to use an extended ER model that supports structured attributes (i.e., sets, lists,
arrays as attribute values), distinguishes whether entities have object ids, and allows us to
model entities whose attributes include methods. We illustrate these comments using an
extended ER diagram to describe the space probe data in the Figure below.

The definition of Probes has two new aspects. First, it has a structured type attribute
listof(row(time, lat, long)); each value assigned to this attribute in a Probes entity is a list
of tuples with three fields. Second, Probes has an attribute called videos that is an abstract
data type object, which is indicated by a dark oval for this attribute with a dark line
connecting it to Probes. Further, this attribute has an `attribute' of its own, which is a
method of the ADT.

Figure 9.10: Space Probe entity set

Alternatively, we could model each video as an entity by using an entity set called
Videos. The association between Probes entities and Videos entities could then be
captured by defining a relationship set that links them. Since each video is collected by
precisely one probe, and every video is collected by some probe, this relationship can be
maintained by simply storing a reference to a probe object with each Videos entity.

If we also make Videos a weak entity set in this alternative design, we can add a
referential integrity constraint that causes a Videos entity to be deleted when the
corresponding Probes entity is deleted. More generally, this alternative design illustrates
a strong similarity between storing references to objects and foreign keys; the foreign key
mechanism achieves the same effect as storing oids, but in a controlled manner.

If oids are used, the user must ensure that there are no dangling references when an object
is deleted, with very little support from the DBMS.

9.8 New Challenges in Implementing an ORDBMS
The enhanced functionality of ORDBMSs raises several implementation challenges.

Some of these are well understood and solutions have been implemented in products;
others are subjects of current research. In this section we examine a few of the key
challenges that arise in implementing an efficient, fully functional ORDBMS.

9.8.1 Storage and Access Methods
The system must efficiently store ADT objects and structured objects and
provide efficient indexed access to both.


Storing Large ADT and Structured Type Objects

Large ADT objects and structured objects complicate the layout of data on disk.

This problem is well understood and has been solved in essentially all ORDBMSs and
OODBMSs. We present some of the main issues here.

User-defined ADTs can be quite large. In particular, they can be bigger than a single disk
page. Large ADTs, like BLOBs, require special storage, typically in a different location
on disk from the tuples that contain them. Disk-based pointers are maintained from the
tuples to the objects they contain.

Structured objects can also be large, but unlike ADT objects they often vary in size
during the lifetime of a database. For example, consider the stars attribute of the films
table. As the years pass, some of the `bit actors' in an old movie may become famous.
When a bit actor becomes famous, Dinky might want to advertise his or her presence in
the earlier films. This involves an insertion into the stars attribute of an individual tuple
in films. Because these bulk attributes can grow arbitrarily, flexible disk layout
mechanisms are required.

An additional complication arises with array types. Traditionally, array elements are
stored sequentially on disk in a row-by-row fashion; for example, A11, ..., A1n, A21, ...,
A2n, ..., Am1, ..., Amn. However, queries may often request subarrays that are not stored
contiguously on disk (e.g., A11, A21, ..., Am1). Such requests can result in a very high
I/O cost for retrieving the
subarray. In order to reduce the number of I/Os required in general, arrays are often
broken into contiguous chunks, which are then stored in some order on disk. Although
each chunk is some contiguous region of the array, chunks need not be row-by-row or
column-by-column. For example, a chunk of size 4 might be A11, A12, A21, A22, which is a
square region if we think of the array as being arranged row-by-row in two dimensions.
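The benefit of chunking can be sketched by counting how many contiguous storage units a single column touches (the array and chunk sizes are invented):

```python
# A 4x4 array: row-major storage keeps each row contiguous, so reading
# column 0 touches 4 separate runs; 2x2 chunking keeps square tiles
# contiguous, so the same column lies in only 2 chunks.
def chunks_for_column(n, c, col):
    """IDs of the c-by-c chunks that contain cells of column `col`."""
    return {(r // c, col // c) for r in range(n)}

n = 4
row_runs = {r for r in range(n)}     # row-major: one contiguous run per row
chunks = chunk_ids = chunks_for_column(n, 2, 0)
print(len(row_runs), len(chunks))  # 4 2
```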

Indexing New Types

One important reason for users to place their data in a database is to allow for efficient
access via indexes. Unfortunately, the standard RDBMS index structures support only
equality conditions (B+ trees and hash indexes) and range conditions (B+ trees). An
important issue for ORDBMSs is to provide efficient indexes for ADT methods and
operators on structured objects.

Many specialized index structures have been proposed by researchers for particular
applications such as cartography, genome research, multimedia repositories, Web search,
and so on. An ORDBMS company cannot possibly implement every index that has been
invented. Instead, the set of index structures in an ORDBMS should be user extensible.


Extensibility would allow an expert in cartography, for example, to not only register an
ADT for points on a map (i.e., latitude/longitude pairs), but also implement an index
structure that supports natural map queries.

One way to make the set of index structures extensible is to publish an access method
interface that lets users implement an index structure outside of the DBMS. The index
and data can be stored in a file system, and the DBMS simply issues the open, next, and
close iterator requests to the user's external index code. Such functionality makes it
possible for a user to connect a DBMS to a Web search engine, for example. A main
drawback of this approach is that data in an external index is not protected by the
DBMS's support for concurrency and recovery. An alternative is for the ORDBMS to
provide a generic `template' index structure that is sufficiently general to encompass most
index structures that users might invent. Because such a structure is implemented within
the DBMS, it can support high concurrency and recovery. The Generalized Search Tree
(GiST) is such a structure. It is a template index structure based on B+ trees, which
allows most of the tree index structures invented so far to be implemented with only a
few lines of user-defined ADT code.

9.8.2 Query Processing
ADTs and structured types call for new functionality in processing queries in ORDBMSs.

They also change a number of assumptions that affect the efficiency of queries. In this
section we look at two functionality issues (user-defined aggregates and security) and two
efficiency issues (method caching and pointer swizzling).

User-Defined Aggregation Functions

Since users are allowed to define new methods for their ADTs, it is not unreasonable to
expect them to want to define new aggregation functions for their ADTs as well. For
example, the usual SQL aggregates - COUNT, SUM, MIN, MAX, AVG - are not
particularly appropriate for the image type in the Dinky schema.

Most ORDBMSs allow users to register new aggregation functions with the system. To
register an aggregation function, a user must implement three methods, which we will
call initialize, iterate, and terminate. The initialize method initializes the internal state for
the aggregation. The iterate method updates that state for every tuple seen, while the
terminate method computes the aggregation result based on the final state and then cleans
up. As an example, consider an aggregation function to compute the second-highest value
in a field. The initialize call would allocate storage for the top two values, the iterate call
would compare the current tuple's value with the top two and update the top two as
necessary, and the terminate call would delete the storage for the top two values,
returning a copy of the second-highest value.
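The three-method protocol for the second-highest example can be sketched as follows (a hypothetical rendering, not any vendor's registration API):

```python
class SecondHighest:
    """User-defined aggregate: initialize / iterate / terminate protocol."""

    def initialize(self):
        self.top2 = []                 # allocate storage for the top two values

    def iterate(self, value):
        # Compare the current value with the top two and update as necessary.
        self.top2 = sorted(self.top2 + [value], reverse=True)[:2]

    def terminate(self):
        result = self.top2[1] if len(self.top2) == 2 else None
        self.top2 = []                 # clean up the internal state
        return result                  # a copy of the second-highest value

agg = SecondHighest()
agg.initialize()
for v in [3, 9, 1, 7]:
    agg.iterate(v)
print(agg.terminate())  # 7
```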


Method Security

ADTs give users the power to add code to the DBMS; this power can be abused. A buggy
or malicious ADT method can bring down the database server or even corrupt the
database. The DBMS must have mechanisms to prevent buggy or malicious user code
from causing problems. It may make sense to override these mechanisms for efficiency in
production environments with vendor-supplied methods. However, it is important for the
mechanisms to exist, if only to support debugging of ADT methods; otherwise method
writers would have to write bug-free code before registering their methods with the DBMS.

One mechanism to prevent problems is to have the user methods be interpreted rather
than compiled. The DBMS can check that the method is well behaved either by
restricting the power of the interpreted language or by ensuring that each step taken by a
method is safe before executing it. Typical interpreted languages for this purpose include
Java and the procedural portions of SQL:1999.

An alternative mechanism is to allow user methods to be compiled from a general
purpose programming language such as C++, but to run those methods in a different
address space than the DBMS. In this case the DBMS sends explicit interprocess
communications (IPCs) to the user method, which sends IPCs back in return. This
approach prevents bugs in the user methods (e.g., stray pointers) from corrupting the state
of the DBMS or database and prevents malicious methods from reading or modifying the
DBMS state or database as well. The user code can be linked with a `wrapper' that turns
method invocations and return values into IPCs.

Method Caching

User-defined ADT methods can be very expensive to execute and can account for the
bulk of the time spent in processing a query. During query processing it may make sense
to cache the results of methods, in case they are invoked multiple times with the same
argument. Within the scope of a single query, one can avoid calling a method twice on
duplicate values in a column by either sorting the table on that column or using a hash-
based scheme much like that used for aggregation. An alternative is to maintain a cache
of method inputs and matching outputs as a table in the database. Then to find the value
of a method on particular inputs, we essentially join the input tuples with the cache table.
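The idea can be sketched with an in-memory cache keyed on the method's input (here a dict; the text's alternative stores the cache as a database table instead):

```python
calls = 0

def expensive_method(x):
    """Stand-in for a costly user-defined ADT method."""
    global calls
    calls += 1
    return x * x

cache = {}

def cached(x):
    if x not in cache:                 # call the method only on a cache miss
        cache[x] = expensive_method(x)
    return cache[x]

column = [4, 7, 4, 4, 7]               # duplicate values in a column
results = [cached(v) for v in column]
print(results, calls)  # [16, 49, 16, 16, 49] 2
```

Five invocations on the column cost only two actual method calls, one per distinct value.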

Pointer Swizzling

In some applications, objects are retrieved into memory and accessed frequently through
their oids; dereferencing must be implemented very efficiently. Some systems maintain a
table of oids of objects that are (currently) in memory. When an object O is brought into
memory, they check each oid contained in O and replace oids of in-memory objects by
in-memory pointers to those objects. This technique is called pointer swizzling and
makes references to in-memory objects very fast. The downside is that when an object is
paged out, in-memory references to it must somehow be invalidated and replaced with its oid.
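Pointer swizzling can be sketched as follows; the oid format and object layout are invented for illustration:

```python
in_memory = {}                            # oid -> object already in memory

def load(oid, disk):
    """'Read' an object and swizzle any oids of in-memory objects it contains."""
    obj = dict(disk[oid])
    for field, value in obj.items():
        if isinstance(value, str) and value.startswith("oid:") and value in in_memory:
            obj[field] = in_memory[value]   # replace oid by in-memory pointer
    in_memory[oid] = obj
    return obj

disk = {"oid:1": {"name": "Majestic"},
        "oid:2": {"name": "Casablanca", "theater": "oid:1"}}
t = load("oid:1", disk)
m = load("oid:2", disk)
print(m["theater"] is t)  # dereferencing is now a plain pointer access
```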


9.8.3 Query Optimization
New indexes and query processing techniques widen the choices available to a query
optimizer. In order to handle the new query processing functionality, an optimizer must
know about the new functionality and use it appropriately. In this section we discuss two
issues in exposing information to the optimizer (new indexes and ADT method
estimation) and an issue in query planning that was ignored in relational systems
(expensive selection optimization).

Registering Indexes with the Optimizer

As new index structures are added to a system - either via external interfaces or built in
template structures like GiSTs - the optimizer must be informed of their existence, and
their costs of access. In particular, for a given index structure the optimizer must know
(a) what WHERE-clause conditions are matched by that index, and (b) what the cost of
fetching a tuple is for that index. Given this information, the optimizer can use any index
structure in constructing a query plan. Different ORDBMSs vary in the syntax for
registering new index structures. Most systems require users to state a number
representing the cost of access, but an alternative is for the DBMS to measure the
structure as it is used and maintain running statistics on cost.

Reduction Factor and Cost Estimation for ADT Methods

For user-defined conditions such as is_herbert(), the optimizer also needs to be able to
estimate reduction factors. Estimating reduction factors for user-defined conditions is a
difficult problem and is being actively studied. The currently popular approach is to leave
it up to the user: a user who registers a method can also register an auxiliary function to
estimate the method's reduction factor. If such a function is not registered, the optimizer
uses an arbitrary value such as 1/10.

ADT methods can be quite expensive and it is important for the optimizer to know just
how much these methods cost to execute. Again, estimating method costs is open
research. In current systems users who register a method are able to specify the method's
cost as a number, typically in units of the cost of an I/O in the system. Such estimation is
hard for users to do accurately. An attractive alternative is for the ORDBMS to run the
method on objects of various sizes and attempt to estimate the method's cost
automatically, but this approach has not been investigated in detail and is not
implemented in commercial ORDBMSs.

Expensive selection optimization

In relational systems, selection is expected to be a zero-time operation. For example, it
requires no I/Os and few CPU cycles to test if emp.salary < 10. However, conditions
such as is_herbert(Frames.image) can be quite expensive because they may fetch large
objects off the disk and process them in memory in complicated ways.


ORDBMS optimizers must consider carefully how to order selection conditions. For
example, consider a selection query that tests tuples in the Frames table with two conditions:

Frames.frameno < 100 ∧ is_herbert(Frames.image)

 It is probably preferable to check the frameno condition before testing is_herbert. The
first condition is quick and may often return false, saving the trouble of checking the
second condition. In general, the best ordering among selections is a function of their
costs and reduction factors. It can be shown that selections should be ordered by
increasing rank, where

rank = (reduction factor − 1) / cost.

If a selection with very high rank appears in a multi-table query, it may even make sense
to postpone the selection until after performing joins.
The details of optimally placing expensive selections among joins are somewhat
complicated, adding to the complexity of optimization in ORDBMSs.
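The rank rule can be sketched as follows; the conditions, costs, and reduction factors are invented numbers chosen to match the discussion:

```python
selections = [                                   # (condition, reduction factor, cost)
    ("is_herbert(Frames.image)", 0.10, 100.0),   # expensive ADT method
    ("Frames.frameno < 100",     0.50, 0.01),    # cheap comparison
]

def rank(sel):
    _, reduction_factor, cost = sel
    return (reduction_factor - 1) / cost

ordered = sorted(selections, key=rank)   # apply selections in increasing rank
print([name for (name, _, _) in ordered])
# ['Frames.frameno < 100', 'is_herbert(Frames.image)']
```

The cheap, selective frameno test sorts first, matching the intuition above.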


                                   CHAPTER 10


In the introduction of this chapter, we defined an OODBMS as a programming language
with support for persistent objects. While this definition reflects the origins of
OODBMSs accurately, and to a certain extent the implementation focus of OODBMSs,
the fact that OODBMSs support collection types (see Section 25.3) makes it possible to
provide a query language over collections. Indeed, a standard has been developed by the
Object Database Management Group (ODMG) and is called Object Query Language, or OQL.

OQL is similar to SQL, with a SELECT-FROM-WHERE style syntax (even GROUP
BY, HAVING, and ORDER BY are supported) and many of the proposed SQL:1999
extensions. Notably, OQL supports structured types, including sets, bags, arrays, and
lists. The OQL treatment of collections is more uniform than SQL:1999 in that it does not
give special treatment to collections of rows; for example, OQL allows the aggregate
operation COUNT to be applied to a list to compute the length of the list. OQL also
supports reference types, path expressions, ADTs and inheritance, type extents, and SQL-
style nested queries. There is also a standard Data Definition Language for OODBMSs
(Object Data Language, or ODL) that is similar to the DDL subset of SQL, but supports
the additional features found in OODBMSs, such as ADT definitions.

10.1.1 The ODMG Data Model and ODL
The ODMG data model is the basis for an OODBMS, just like the relational data model
is the basis for an RDBMS. A database contains a collection of objects, which are similar
to entities in the ER model. Every object has a unique oid, and a database contains
collections of objects with similar properties; such a collection is called a class.

The properties of a class are specified using ODL and are of three kinds: attributes,
relationships, and methods. Attributes have an atomic type or a structured type.


ODL supports the set, bag, list, array, and struct type constructors; these are just setof,
bagof, listof, ARRAY, and ROW.

Relationships have a type that is either a reference to an object or a collection of such
references. A relationship captures how an object is related to one or more objects of the
same class or of a different class. A relationship in the ODMG model is really just a
binary relationship in the sense of the ER model. A relationship has a corresponding
inverse relationship; intuitively, it is the relationship `in the other direction'. For
example, if a movie is being shown at several theaters, and each theater shows several
movies, we have two relationships that are inverses of each other: shownAt is associated
with the class of movies and is the set of theaters at which the given movie is being
shown, and nowShowing is associated with the class of theaters and is the set of movies
being shown at that theater.

Methods are functions that can be applied to objects of the class. There is no analog to
methods in the ER or relational models. The keyword interface is used to define a class.
For each interface, we can declare an extent, which is the name for the current set of
objects of that class. The extent is analogous to the instance of a relation, and the
interface is analogous to the schema. If the user does not anticipate the need to work with
the set of objects of a given class - it is sufficient to manipulate individual objects - the
extent declaration can be omitted.

The following ODL definitions of the Movie and Theater classes illustrate the above
concepts. (While these classes bear some resemblance to the Dinky database schema, the
reader should not look for an exact parallel, since we have modified the example to
highlight ODL features.)

interface Movie

       (extent Movies key movieName)

       { attribute date start;

       attribute date end;

       attribute string movieName;

       relationship Set<Theater> shownAt inverse Theater::nowShowing;
       };

The collection of database objects whose class is Movie is called Movies. No two objects
in Movies have the same movieName value, as the key declaration indicates.

Each movie is shown at a set of theaters and is shown during the specified period. (It
would be more realistic to associate a different period with each theater, since a movie is
typically played at different theaters over different periods. While we can define a class
that captures this detail, we have chosen a simpler definition for our discussion.)


A theater is an object of class Theater, which is defined below:

interface Theater

       (extent Theaters key theaterName)

       { attribute string theaterName;

       attribute string address;

       attribute integer ticketPrice;

       relationship Set<Movie> nowShowing inverse Movie::shownAt;

       float numshowing() raises(errorCountingMovies);
       };

Each theater shows several movies and charges the same ticket price for every movie.
Observe that the shownAt relationship of Movie and the nowShowing relationship of
Theater are declared to be inverses of each other. Theater also has a method
numshowing() that can be applied to a theater object to find the number of movies being
shown at that theater. ODL also allows us to specify inheritance hierarchies, as the
following class definition illustrates:

interface SpecialShow extends Movie

       (extent SpecialShows)

       { attribute integer maximumAttendees;

       attribute string benefitCharity;
       };

An object of class SpecialShow is an object of class Movie, with some additional
properties.

10.1.2 OQL
The ODMG query language OQL was deliberately designed to have syntax similar to
SQL, in order to make it easy for users familiar with SQL to learn OQL. Let us begin
with a query that finds pairs of movies and theaters such that the movie is shown at the
theater and the theater is showing more than one movie:

SELECT mname: M.movieName, tname: T.theaterName

FROM Movies M, M.shownAt T

WHERE T.numshowing() > 1


The SELECT clause indicates how we can give names to fields in the result; the two
result fields are called mname and tname. The part of this query that differs from SQL is
the FROM clause. The variable M is bound in turn to each movie in the extent Movies.
For a given movie M, we bind the variable T in turn to each theater in the collection
M.shownAt. Thus, the use of the path expression M.shownAt allows us to easily express a
nested query. The following query illustrates the grouping construct in OQL:

SELECT T.ticketPrice,
       avgNum: AVG(SELECT P.T.numshowing() FROM partition P)
FROM Theaters T
GROUP BY T.ticketPrice

For each ticket price, we create a group of theaters with that ticket price. This group of
theaters is the partition for that ticket price and is referred to using the OQL keyword
partition. In the SELECT clause, for each ticket price, we compute the average number of
movies shown at theaters in the partition for that ticketPrice. OQL supports an interesting
variation of the grouping operation that is missing in SQL:

SELECT low, high,
       avgNum: AVG(SELECT P.T.numshowing() FROM partition P)
FROM Theaters T
GROUP BY low: T.ticketPrice < 5, high: T.ticketPrice >= 5

The GROUP BY clause now creates just two partitions called low and high. Each theater
object T is placed in one of these partitions based on its ticket price. In the SELECT
clause, low and high are boolean variables, exactly one of which is true in any given
output tuple; partition is instantiated to the corresponding partition of theater objects. In
our example, we get two result tuples. One of them has low equal to true and avgNum
equal to the average number of movies shown at theaters with a low ticket price. The
second tuple has high equal to true and avgNum equal to the average number of movies
shown at theaters with a high ticket price.
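The semantics of this boolean grouping can be illustrated outside the DBMS. The following Python sketch emulates the query over an invented in-memory sample (the theater names, prices, and numshowing counts are hypothetical; the real query executes inside the ODMG-compliant DBMS):

```python
from statistics import mean

# Hypothetical stand-in for the Theaters extent:
# (theaterName, ticketPrice, result of numshowing()).
theaters = [
    ("Odeon", 4, 2),
    ("Rex", 3, 4),
    ("Plaza", 8, 1),
    ("Grand", 9, 3),
]

def boolean_group(theaters):
    """Emulate GROUP BY low: price < 5, high: price >= 5."""
    rows = []
    for low in (True, False):
        # The partition holds every theater for which the predicate
        # (ticketPrice < 5) evaluates to `low`.
        partition = [t for t in theaters if (t[1] < 5) == low]
        if partition:
            rows.append({"low": low, "high": not low,
                         "avgNum": mean(t[2] for t in partition)})
    return rows
```

As in the OQL query, exactly one of `low` and `high` is true in each output row, and `avgNum` is computed over that row's partition.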

The next query illustrates OQL support for queries that return collections other than set
and multiset:

(SELECT T.theaterName
FROM Theaters T
ORDER BY T.ticketPrice DESC) [0:4]

The ORDER BY clause makes the result a list of theater names ordered by ticket price.


The elements of a list can be referred to by position, starting with position 0. Thus, the
expression [0:4] extracts a list containing the names of the five theaters with the highest
ticket prices. OQL also supports DISTINCT, HAVING, explicit nesting of subqueries,
view definitions, and other SQL features.

10.2 Comparing RDBMS with OODBMS and ORDBMS
Now that we have covered the main object-oriented DBMS extensions, it is time to
consider the two main variants of object-databases, OODBMSs and ORDBMSs, and to
compare them with RDBMSs. Although we have presented the concepts underlying
object-databases, we still need to define the terms OODBMS and ORDBMS.

An OODBMS is a programming language with a type system that supports the features
discussed in this chapter and allows any data object to be persistent, that is, to survive
across different program executions. An ORDBMS is a relational DBMS that has been
extended to support these features. Many current systems conform to neither definition
entirely but are much closer to one or the other, and can be classified accordingly.

10.2.1 RDBMS versus ORDBMS
Comparing an RDBMS with an ORDBMS is straightforward. An RDBMS does not
support the extensions discussed in this chapter. The resulting simplicity of the data
model makes it easier to optimize queries for efficient execution, for example. A
relational system is also easier to use because there are fewer features to master. On the
other hand, it is less versatile than an ORDBMS.

10.2.2 OODBMS and ORDBMS: Similarities
OODBMSs and ORDBMSs both support user-defined ADTs, structured types, object
identity and reference types, and inheritance. Both support a query language for
manipulating collection types. ORDBMSs support an extended form of SQL, and
OODBMSs support ODL/OQL. The similarities are by no means accidental: ORDBMSs
consciously try to add OODBMS features to an RDBMS, and OODBMSs in turn have
developed query languages based on relational query languages. Both OODBMSs and
ORDBMSs provide DBMS functionality such as concurrency control and recovery.

10.2.3 OODBMS versus ORDBMS: Differences
The fundamental difference is really a philosophy that is carried all the way through:
OODBMSs try to add DBMS functionality to a programming language, whereas
ORDBMSs try to add richer data types to a relational DBMS. Although the two kinds of
object-databases are converging in terms of functionality, this difference in their
underlying philosophies (and for most systems, their implementation approach) has
important consequences in terms of the issues emphasized in the design of these DBMSs,
and the efficiency with which various features are supported, as the following
comparison indicates:


OODBMSs aim to achieve seamless integration with a programming language such as
C++, Java or Smalltalk. Such integration is not an important goal for an ORDBMS.
SQL:1999, like SQL-92, allows us to embed SQL commands in a host language, but the
boundary between the two languages remains clearly visible to the programmer.

An OODBMS is aimed at applications where an object-centric viewpoint is appropriate;
that is, typical user sessions consist of retrieving a few objects and working on them for
long periods, with related objects (e.g., objects referenced by the original objects) fetched
occasionally. Objects may be extremely large, and may have to be fetched in pieces; thus,
attention must be paid to buffering parts of objects. It is expected that most applications
will be able to cache the objects they require in memory, once the objects are retrieved
from disk. Thus, considerable attention is paid to making references to in-memory objects
efficient. Transactions are likely to be of very long duration and holding locks until the
end of a transaction may lead to poor performance; thus, alternatives to Two Phase
locking must be used.

An ORDBMS is optimized for applications where large data collections are the focus,
even though objects may have rich structure and be fairly large. It is expected that
applications will retrieve data from disk extensively, and that optimizing disk accesses is
still the main concern for efficient execution. Transactions are assumed to be relatively
short, and traditional RDBMS techniques are typically used for concurrency control and
recovery.

The query facilities of OQL are not supported efficiently in most OODBMSs, whereas
the query facilities are the centerpiece of an ORDBMS. To some extent, this situation is
the result of different concentrations of effort in the development of these systems. To a
significant extent, it is also a consequence of the systems' being optimized for very
different kinds of applications.

10.3 Spatial Data Management
We use the term spatial data in a broad sense, covering multidimensional points, lines,
rectangles, polygons, cubes, and other geometric objects. A spatial data object occupies a
certain region of space, called its spatial extent, which is characterized by its location
and boundary.

10.3.1 Types of Spatial Data and Queries
From the point of view of a DBMS, we can classify spatial data as being either point data
or region data.

Point data: A point has a spatial extent characterized completely by its location;
intuitively, it occupies no space and has no associated area or volume. Point data consists
of a collection of points in a multidimensional space. Point data stored in a database can
be based on direct measurements or be generated by transforming data obtained through
measurements for ease of storage and querying. Raster data is an example of directly
measured point data and includes bit maps or pixel maps such as satellite imagery. Each
pixel stores a measured value (e.g., temperature or color) for a corresponding location in
space. Another example of such measured point data is medical imagery such as three-
dimensional magnetic resonance imaging (MRI) brain scans. Feature vectors extracted
from images, text, or signals such as time series are examples of point data obtained by
transforming a data object. As we will see, it is often easier to use such a representation
of the data, instead of the actual image or signal, to answer queries.

Region data: A region has a spatial extent with a location and a boundary. The location
can be thought of as the position of a fixed `anchor point' for the region, e.g., its centroid.
In two dimensions, the boundary can be visualized as a line (for finite regions, a closed
loop), and in three dimensions it is a surface. Region data consists of a collection of
regions. Region data stored in a database is typically a simple geometric approximation
to an actual data object. Vector data is the term used to describe such geometric
approximations, constructed using points, line segments, polygons, spheres, cubes, etc.
Many examples of region data arise in geographic applications. For instance, roads and
rivers can be represented as a collection of line segments, and countries, states, and lakes
can be represented as polygons. Other examples arise in computer-aided design
applications. For instance, an airplane wing might be modeled as a wire frame using a
collection of polygons (that intuitively tile the wire frame surface approximating the
wing), and a tubular object may be modeled as the difference between two concentric
cylinders.

Spatial queries, or queries that arise over spatial data, are of three main types: spatial
range queries, nearest neighbor queries, and spatial join queries.

Spatial range queries: In addition to multidimensional queries such as, "Find all
employees with salaries between $50,000 and $60,000 and ages between 40 and 50," we
can ask queries such as, "Find all cities within 50 miles of Madison," or, "Find all rivers
in Wisconsin." A spatial range query has an associated region (with a location and
boundary). In the presence of region data, spatial range queries can return all regions that
overlap the specified range or all regions that are contained within the specified range.
Both variants of spatial range queries are useful, and algorithms for evaluating one
variant are easily adapted to solve the other. Range queries occur in a wide variety of
applications, including relational queries, GIS queries, and CAD/CAM queries.

Nearest neighbor queries: A typical query is, "Find the 10 cities that are nearest to
Madison." We usually want the answers to be ordered by distance to Madison, i.e., by
proximity. Such queries are especially important in the context of multimedia databases,
where an object (e.g., an image) is represented by a point, and `similar' objects are found
by retrieving objects whose representative points are closest to the point representing the
query object.

Spatial join queries: Typical examples include "Find pairs of cities within 200 miles of
each other" and "Find all cities near a lake." These queries can be quite expensive to
evaluate. If we consider a relation in which each tuple is a point representing a city or a
lake, the above queries can be answered by a join of this relation with itself, where the
join condition specifies the distance between two matching tuples. Of course, if cities and
lakes are represented in more detail and have a spatial extent, both the meaning of such
queries (are we looking for cities whose centroids are within 200 miles of each other or
cities whose boundaries come within 200 miles of each other?) and the query evaluation
strategies become more complex. Still, the essential character of a spatial join query is
retained.

These kinds of queries are very common and arise in most applications of spatial data.
Some applications also require specialized operations such as interpolation of
measurements at a set of locations to obtain values for the measured attribute over an
entire region.

10.4 Applications Involving Spatial Data
There are many applications that involve spatial data. Even a traditional relation with k
fields can be thought of as a collection of k-dimensional points, and as we will see,
certain relational queries can be executed faster by using indexing
techniques designed for spatial data. In this section, however, we concentrate on
applications in which spatial data plays a central role and in which efficient handling of
spatial data is essential for good performance.

Geographic Information Systems (GIS) deal extensively with spatial data, including
points, lines, and two- or three-dimensional regions. For example, a map contains
locations of small objects (points), rivers and highways (lines), and cities and lakes
(regions).

A GIS system must efficiently manage two-dimensional and three-dimensional datasets.
All the classes of spatial queries that we described arise naturally, and both point data and
region data must be handled. Commercial GIS systems such as ArcInfo are in wide use
today, and object database systems aim to support GIS applications as well.

Computer-aided design and manufacturing (CAD/CAM) systems and medical imaging
systems store spatial objects such as surfaces of design objects (e.g., the fuselage of an
aircraft). As with GIS systems, both point and region data must be stored. Range queries
and spatial join queries are probably the most common queries, and spatial integrity
constraints such as "There must be a minimum clearance of one foot between the wheel
and the fuselage" can be very useful. (CAD/CAM was a major motivation for the
development of object databases.)

Multimedia databases, which contain multimedia objects such as images, text, and
various kinds of time-series data (e.g., audio), also require spatial data management. In
particular, finding objects similar to a given object is a common kind of query in a
multimedia system, and a popular approach to answering similarity queries involves first
mapping multimedia data to a collection of points called feature vectors. A similarity
query is then converted to the problem of finding the nearest neighbors of the point that
represents the query object.

In medical image databases, we have to store digitized two-dimensional and three-
dimensional images such as X-rays or MRI images. Fingerprints (together with
information identifying the fingerprinted individual) can be stored in an image database,
and we can search for fingerprints that match a given fingerprint. Photographs from
driver's licenses can be stored in a database, and we can search for faces that match a
given face. Such image database applications rely on content-based image retrieval
(e.g., find images similar to a given image). Going beyond images, we can store a
database of video clips and search for clips in which a scene changes, or in which there is
a particular kind of object. We can store a database of signals or timeseries, and look for
similar time-series. We can store a collection of text documents and search for similar
documents (i.e., dealing with similar topics).

Feature vectors representing multimedia objects are typically points in a high-
dimensional space. For example, we can obtain feature vectors from a text object by
using a list of keywords (or concepts) and noting which keywords are present; we thus
get a vector of 1s (the corresponding keyword is present) and 0s (the corresponding
keyword is missing in the text object) whose length is equal to the number of keywords in
our list. Lists of several hundred words are commonly used. We can obtain feature
vectors from an image by looking at its color distribution (the levels of red, green, and
blue for each pixel) or by using the first several coefficients of a mathematical function
(e.g., the Hough transform) that closely approximates the shapes in the image. In general,
given an arbitrary signal, we can represent it using a mathematical function having a
standard series of terms, and approximate it by storing the coefficients of the most
significant terms.
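The keyword-presence representation described above can be sketched in a few lines of Python. The keyword list here is a hypothetical four-word example (real systems, as noted, use several hundred), and the tokenization is a deliberately naive whitespace split:

```python
def keyword_vector(text, keywords):
    """Binary feature vector: 1 if the keyword occurs in the text, else 0."""
    words = set(text.lower().split())   # naive tokenization for illustration
    return [1 if kw in words else 0 for kw in keywords]

# Hypothetical keyword list; real lists contain several hundred entries.
keywords = ["database", "index", "spatial", "audio"]
vec = keyword_vector("a spatial index for a database", keywords)
```

Here `vec` is `[1, 1, 1, 0]`: the first three keywords appear in the text and "audio" does not.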

When mapping multimedia data to a collection of points, it is important to ensure that
there is a measure of distance between two points that captures the notion of similarity
between the corresponding multimedia objects. Thus, two images that map to two nearby
points must be more similar than two images that map to two points far from each other.
Once objects are mapped into a suitable coordinate space, finding similar images, similar
documents, or similar time-series can be modeled as finding points that are close to each
other: We map the query object to a point and look for its nearest neighbors. The most
common kind of spatial data in multimedia applications is point data, and the most
common query is nearest neighbor. In contrast to GIS and CAD/CAM, the data is of high
dimensionality (usually 10 or more dimensions).
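Once objects are feature vectors, "find similar objects" becomes "find nearby points." The sketch below uses Euclidean distance and a plain linear scan; the function and parameter names are ours, and the spatial indexes introduced in the next section exist precisely to avoid scanning every point like this:

```python
import math

def euclidean(p, q):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbors(query, points, k):
    """Return the k points closest to the query, ordered by proximity."""
    return sorted(points, key=lambda p: euclidean(query, p))[:k]
```

For example, `nearest_neighbors((0, 0), [(5, 5), (1, 0), (0, 2)], 2)` returns `[(1, 0), (0, 2)]`, the two points nearest the origin, ordered by distance.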


10.5 Introduction to Spatial Indexes
A multidimensional or spatial index, in contrast to a B+ tree, utilizes some kind of
spatial relationship to organize data entries, with each key value seen as a point (or
region, for region data) in a k-dimensional space, where k is the number of fields in the
search key for the index.

In a B+ tree index, the two-dimensional space of ⟨age, sal⟩ values is linearized (i.e.,
points in the two-dimensional domain are totally ordered) by sorting on age first and
then on sal. In the figure, the dotted line indicates the linear order in which points are
stored in a B+ tree. In contrast, a spatial index stores data entries based on their proximity
in the underlying two-dimensional space. The boxes indicate how points are stored in a
spatial index.

Figure 10.1: Clustering of data entries in B+ trees vs. spatial indexes

Let us compare a B+ tree index on the key ⟨age, sal⟩ with a spatial index on the space of age
and sal values, using several example queries:

1. age < 12: The B+ tree index performs very well. As we will see, a spatial index will
handle such a query quite well, although it cannot match a B+ tree index in this case.

2. sal < 20: The B+ tree index is of no use since it does not match this selection. In
contrast, the spatial index will handle this query just as well as the previous selection on
age.

3. age < 12 ∧ sal < 20: The B+ tree index effectively utilizes only the selection on age. If
most tuples satisfy the age selection, it will perform poorly. The spatial index will fully
utilize both selections and return only tuples that satisfy both the age and sal conditions.
To achieve this with B+ tree indexes, we have to create two separate indexes on age and
sal, retrieve rids of tuples satisfying the age selection by using the index on age and
retrieve rids of tuples satisfying the sal condition by using the index on sal, intersect
these rids, then retrieve the tuples with these rids.
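The rid-intersection plan for the two-index case can be shown concretely. The rid values below are invented sample data; in the DBMS each set would come from a range scan of the corresponding B+ tree:

```python
# Hypothetical rid sets returned by two single-attribute B+ tree indexes.
rids_age = {101, 102, 105, 109}   # rids of tuples with age < 12
rids_sal = {102, 103, 109, 110}   # rids of tuples with sal < 20

# Intersect the two rid sets; only the surviving rids need to be fetched.
matching_rids = rids_age & rids_sal
```

Only the tuples with rids in `matching_rids` (here 102 and 109) satisfy both conditions, so only those pages are read, at the cost of two index scans and an intersection.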


Spatial indexes are ideal for queries such as, "Find the 10 nearest neighbors of a given
point," and, "Find all points within a certain distance of a given point." The drawback
with respect to a B+ tree index is that if (almost) all data entries are to be retrieved in age
order, a spatial index is likely to be slower than a B+ tree index in which age is the first
field in the search key.

10.5.1 Overview of Proposed Index Structures
Many spatial index structures have been proposed. Some are primarily designed to index
collections of points although they can be adapted to handle regions, and some handle
region data naturally. Examples of index structures for point data include Grid files, hB
trees, KD trees, Point Quad trees, and SR trees. Examples of index structures that handle
regions as well as point data include Region Quad trees, R trees, and SKD trees. The
above lists are far from complete; there are many variants of the above index structures
and many entirely distinct index structures.

There is as yet no consensus on the `best' spatial index structure. However, R trees have
been widely implemented and found their way into commercial DBMSs. This is due to
their relative simplicity, their ability to handle both point and region data, and the fact
that their performance is at least comparable to more complex structures. We will discuss
three approaches that are distinct and, taken together, illustrative of many of the proposed
indexing alternatives. First, we discuss index structures that rely on space-filling curves
to organize points. We begin by discussing Z-ordering for point data, and then discuss Z-
ordering for region data, which is essentially the idea behind Region Quad trees. Region
Quad trees illustrate an indexing approach based on recursive subdivision of the
multidimensional space, independent of the actual dataset. There are several variants of
Region Quad trees.

Second, we discuss Grid files, which illustrate how an Extendible Hashing style directory
can be used to index spatial data. Many index structures such as Bang files, Buddy trees,
and Multilevel Grid files have been proposed, refining the basic idea. Finally, we discuss
R trees, which also recursively subdivide the multidimensional space.

In contrast to Region Quad trees, the decomposition of space utilized in an R tree
depends upon the indexed dataset. We can think of R trees as an adaptation of the B+ tree
idea to spatial data. Many variants of R trees have been proposed, including Cell trees,
Hilbert R trees, Packed R trees, R* trees, R+ trees, TV trees, and X trees.

10.6 Indexing Based on Space-Filling Curves
Space-filling curves are based on the assumption that any attribute value can be
represented with some fixed number of bits, say k bits. The maximum number of values
along each dimension is therefore 2^k. We will consider a two-dimensional dataset for
simplicity although the approach can handle any number of dimensions.


Figure 10.2: Space-filling curves

A space-filling curve imposes a linear ordering on the domain, as illustrated above. The
first curve shows the Z-ordering curve for domains with two-bit representations of
attribute values. A given dataset contains a subset of the points in the domain, and these
are shown as filled circles in the figure. Domain points that are not in the given dataset
are shown as unfilled circles. Consider the point with X = 01 and Y = 11 in the first curve.
The point has Z-value 0111, obtained by interleaving the bits of the X and Y values; we
take the first X bit (0), then the first Y bit (1), then the second X bit (1), and finally the
second Y bit (1). In decimal representation, the Z-value 0111 is equal to 7, and the point X
= 01 and Y = 11 has the Z-value 7 shown next to it in the second figure. This is the eighth
domain point `visited' by the space-filling curve, which starts at point X = 00 and Y = 00
(Z-value 0).
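The bit interleaving described above is mechanical and easy to implement. The following Python function (a sketch with names of our choosing) computes the Z-value of a point whose coordinates each fit in k bits:

```python
def z_value(x, y, k):
    """Interleave the k bits of x and y, taking an X bit first."""
    z = 0
    for i in range(k - 1, -1, -1):      # most significant bit first
        z = (z << 1) | ((x >> i) & 1)   # next X bit
        z = (z << 1) | ((y >> i) & 1)   # next Y bit
    return z
```

With k = 2, the point X = 01 and Y = 11 from the text gives `z_value(0b01, 0b11, 2)`, i.e., binary 0111, decimal 7, matching the figure.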

The points in a dataset are stored in Z-value order and indexed by a traditional indexing
structure such as a B+ tree. That is, the Z-value of a point is stored together with the point
and is the search key for the B+ tree. (Actually, we do not have to store the X and Y
values for a point if we store the Z-value, since we can compute them from the Z-value
by extracting the interleaved bits.) To insert a point, we compute its Z-value and insert it
into the B+ tree. Deletion and search are similarly based on computing the Z-value and
then using the standard B+ tree algorithms.

The advantage of this approach over using a B+ tree index on some combination of the X
and Y fields is that points are clustered together by spatial proximity in the X-Y space.
Spatial queries over the X-Y space now translate into linear range queries over the
ordering of Z-values and are efficiently answered using the B+ tree on Z-values.
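The translation into a linear range query can be sketched as follows. A sorted Python list searched with `bisect` stands in for the B+ tree on Z-values, and the four sample points are hypothetical:

```python
import bisect

def z_value(x, y, k=2):
    """Interleave the k bits of x and y (X bit first)."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

# A sorted list of (Z-value, point) pairs stands in for the B+ tree leaves.
points = [(0, 0), (1, 3), (2, 2), (3, 3)]
entries = sorted((z_value(x, y), (x, y)) for x, y in points)
zs = [z for z, _ in entries]

def z_range(lo, hi):
    """All points with Z-value in [lo, hi]: a single B+ tree range scan."""
    i = bisect.bisect_left(zs, lo)
    j = bisect.bisect_right(zs, hi)
    return [p for _, p in entries[i:j]]
```

For example, with k = 2 the quadrants 00 and 01 together cover Z-values 0 through 7, so `z_range(0, 7)` retrieves exactly the points stored in those two quadrants with one contiguous scan.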

The spatial clustering of points achieved by the Z-ordering curve is seen more clearly in
the second curve in Figure 10.2, which shows the Z-ordering curve for domains with
three-bit representations of attribute values. If we visualize the space of all points as four
quadrants, the curve visits all points in a quadrant before moving on to another quadrant.
This means that all points in a quadrant are stored together. This property holds
recursively within each quadrant as well - each of the four subquadrants is completely
traversed before the curve moves to another subquadrant. Thus, all points in a
subquadrant are stored together.

The Z-ordering curve achieves good spatial clustering of points, but it can be improved
upon. Intuitively, the curve occasionally makes long diagonal `jumps,' and the points
connected by the jumps, while far apart in the X-Y space of points, are nonetheless close
in Z-ordering. The Hilbert curve, shown as the third curve, addresses this problem.

10.6.1 Region Quad Trees and Z-Ordering: Region Data
Z-ordering gives us a way to group points according to spatial proximity. What if we
have region data? The key is to understand how Z-ordering recursively decomposes the
data space into quadrants and subquadrants.

Figure 10.3: Z-ordering and Region Quad trees

The Region Quad tree structure corresponds directly to the recursive decomposition of
the data space. Each node in the tree corresponds to a square-shaped region of the data
space. As special cases, the root corresponds to the entire data space, and some leaf nodes
correspond to exactly one point. Each internal node has four children, corresponding to
the four quadrants into which the space corresponding to the node is partitioned: 00
identifies the bottom left quadrant, 01 identifies the top left quadrant, 10 identifies the
bottom right quadrant, and 11 identifies the top right quadrant.

In the figure, consider the children of the root. All points in the quadrant corresponding to
the 00 child have Z-values that begin with 00, all points in the quadrant corresponding to
the 01 child have Z-values that begin with 01, and so on. In fact, the Z-value of a point
can be obtained by traversing the path from the root to the leaf node for the point and
concatenating all the edge labels.
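The prefix property described above (a node's Z-values all begin with its edge-label path) is a simple bit test. This sketch, with function names of our choosing, checks whether a Z-value lies under a given Quad tree node:

```python
def z_value(x, y, k):
    """Interleave the k bits of x and y (X bit first)."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def in_node(z, label, label_len, k):
    """True if the 2k-bit Z-value z lies under the Quad tree node whose
    root-to-node edge labels concatenate to `label` (label_len bits)."""
    return (z >> (2 * k - label_len)) == label
```

For example, with k = 2 the point (1, 3) has Z-value 7 (binary 0111), so it lies under the 01 child of the root but not under the 00 child.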


Consider the region represented by the rounded rectangle. Suppose that the rectangle
object is stored in the DBMS and given the unique identifier (oid) R. R includes all points
in the 01 quadrant of the root as well as the points with Z-values 1 and 3, which are in the
00 quadrant of the root. In the figure, the nodes for points 1 and 3 and the 01 quadrant of
the root are shown with dark boundaries. Together, the dark nodes represent the rectangle
R. The three records ⟨0001, R⟩, ⟨0011, R⟩, and ⟨01, R⟩ can be used to store this
information. The first field of each record is a Z-value; the records are clustered and
indexed on this column using a B+ tree. Thus, a B+ tree is used to implement a Region
Quad tree, just as it was used to implement Z-ordering. Note that a region object can
usually be stored using fewer records if it is sufficient to represent it at a coarser level of
detail. For example, rectangle R can be represented using two records ⟨00, R⟩ and ⟨01,
R⟩. This approximates R by using the bottom left and top left quadrants of the root.

The Region Quad tree idea can be generalized beyond two dimensions. In k dimensions,
at each node we partition the space into 2^k subregions; for k = 2, we partition the space
into four equal parts (quadrants). We will not discuss the details.

10.6.2 Spatial Queries Using Z-Ordering
Range queries can be handled by translating the query into a collection of regions, each
represented by a Z-value. (We saw how to do this in our discussion of region data and
Region Quad trees.) We then search the B+ tree to find matching data items.

Nearest neighbor queries can also be handled, although they are a little trickier because
distance in the Z-value space does not always correspond well to distance in the original
X-Y coordinate space (recall the diagonal jumps in the Z-order curve). The basic idea is
to first compute the Z-value of the query and find the data point with the closest Z-value
by using the B+ tree. Then, to make sure we are not overlooking any points that are
closer in the X-Y space, we compute the actual distance r between the query point and the
retrieved data point and issue a range query centered at the query point and with radius r.
We check all retrieved points and return the one that is closest to the query point. Spatial
joins can be handled by extending the approach to range queries.
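The two-step nearest neighbor procedure just described can be sketched in Python. The sorted list with `bisect` stands in for the B+ tree lookup, and a linear scan stands in for the radius-r range query that the text would answer via Z-value ranges; the function names and sample data are ours:

```python
import bisect
import math

def z_value(x, y, k=3):
    """Interleave the k bits of x and y (X bit first)."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest(query, points):
    """First candidate from the Z-order, then a radius-r refinement pass."""
    entries = sorted((z_value(x, y), (x, y)) for x, y in points)
    zs = [z for z, _ in entries]
    # Step 1: the data point(s) with the closest Z-value (one B+ tree lookup).
    i = bisect.bisect_left(zs, z_value(*query))
    near = [entries[j][1] for j in (i - 1, i) if 0 <= j < len(entries)]
    r = min(dist(query, p) for p in near)
    # Step 2: range query of radius r around the query point; a linear scan
    # stands in here for the real range query, and catches any point that is
    # close in X-Y space but far away in Z order.
    in_range = [p for p in points if dist(query, p) <= r]
    return min(in_range, key=lambda p: dist(query, p))
```

The refinement pass is what makes the answer correct even when the Z-order candidate is not the true nearest neighbor.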

10.7 Grid Files
In contrast to the Z-ordering approach, which partitions the data space independently of
any one dataset, the Grid file partitions the data space in a way that reflects the data
distribution in a given dataset. The method is designed to guarantee that any point query
(a query that retrieves the information associated with the query point) can be answered
in at most two disk accesses.

Grid files rely upon a grid directory to identify the data page containing a desired point.
The grid directory is similar to the directory used in Extendible Hashing. When searching
for a point, we first find the corresponding entry in the grid directory. The grid directory
entry, like the directory entry in Extendible Hashing, identifies the page on which the
desired point is stored, if the point is in the database.


To understand the Grid file structure, we need to understand how to find the grid
directory entry for a given point. We describe the Grid file structure for two-dimensional
data. The method can be generalized to any number of dimensions, but we restrict
ourselves to the two-dimensional case for simplicity. The Grid file partitions space into
rectangular regions using lines that are parallel to the axes. Thus, we can describe a Grid
file partitioning by specifying the points at which each axis is `cut.' If the X axis is cut
into i segments and the Y axis is cut into j segments, we have a total of i × j partitions.
The grid directory is an i by j array with one entry per partition. This description of the
partitioning is maintained in an array called a linear scale; there is one linear scale per
axis.
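The lookup through the linear scales can be sketched as follows. The cut points, directory contents, and page names below are hypothetical sample data; binary search over each linear scale finds the segment a coordinate falls into:

```python
import bisect

# Hypothetical linear scales: the cut points chosen along each axis.
x_scale = [1000, 2000, 3000]    # X axis cut into 4 segments
y_scale = [10, 20]              # Y axis cut into 3 segments

# A 4-by-3 grid directory; each entry names the data page for its partition.
directory = [
    ["A", "A", "B"],
    ["A", "C", "B"],
    ["D", "C", "C"],
    ["D", "E", "E"],
]

def lookup_page(x, y):
    """Find the data page for point (x, y). The linear scales are searched
    in main memory; fetching the directory entry and then the data page
    are the two I/Os the Grid file guarantees."""
    i = bisect.bisect_right(x_scale, x)   # X segment of the point
    j = bisect.bisect_right(y_scale, y)   # Y segment of the point
    return directory[i][j]
```

Note that several directory entries may name the same page (here page "A" covers three partitions), which is exactly the sharing discussed below.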

The following figure illustrates how we search for a point using a Grid file index. First,
we use the linear scales to find the X segment to which the X value of the given point
belongs and the Y segment to which the Y value belongs. This identifies the entry of the
grid directory for the given point. We assume that all linear scales are stored in main
memory, and therefore this step does not require any I/O. Next, we fetch the grid
directory entry. Since the grid directory may be too large to fit in main memory, it is
stored on disk. However, we can identify the disk page containing a given entry and fetch
it in one I/O because the grid directory entries are arranged sequentially in either row-
wise or column-wise order. The grid directory entry gives us the id of the data page
containing the desired point, and this page can now be retrieved in one I/O.

Thus, we can retrieve a point in two I/Os: one I/O for the directory entry and one for the
data page.

Figure 10.4: Searching for a point in a Grid file


Range queries and nearest neighbor queries are easily answered using the Grid file. For
range queries, we use the linear scales to identify the set of grid directory entries to fetch.
For nearest neighbor queries, we first retrieve the grid directory entry for the given point
and search the data page it points to. If this data page is empty, we use the linear scales to
retrieve the data entries for grid partitions that are adjacent to the partition that contains
the query point. We retrieve all the data points within these partitions and check them for
nearness to the given point.

The Grid file relies upon the property that a grid directory entry points to a page that
contains the desired data point (if the point is in the database). This means that we are
forced to split the grid directory and therefore a linear scale along the splitting dimension
if a data page is full and a new point is inserted into that page. In order to obtain good space
utilization, we allow several grid directory entries to point to the same page. That is,
several partitions of the space may be mapped to the same physical page, as long as the
set of points across all these partitions fits on a single page.

Insertion of points into a Grid file is illustrated in the next figure, which has four parts, each
illustrating a snapshot of a Grid file. Each snapshot shows just the grid directory and the
data pages; the linear scales are omitted for simplicity. Initially (the top left part of the
figure) there are only three points, all of which fit into a single page (A). The grid
directory contains a single entry, which covers the entire data space and points to page A.

Figure 10.5: Inserting points into a Grid file

In this example, we assume that the capacity of a data page is three points. Thus, when a
new point is inserted, we need an additional data page. We are also forced to split the grid
directory in order to accommodate an entry for the new page. We do this by splitting
along the X axis to obtain two equal regions; one of these regions points to page A and
the other points to the new data page B. The data points are redistributed across pages A
and B to reflect the partitioning of the grid directory.


The result is shown in the top right part. The next part (bottom left) illustrates the Grid
file after two more insertions. The insertion of point 5 forces us to split the grid directory
again because point 5 is in the region that points to page A, and page A is already full.
Since we split along the X axis in the previous split, we now split along the Y axis, and
redistribute the points in page A across page A and a new data page, C. (Choosing the
axis to split in a round-robin fashion is one of several possible splitting policies.) Observe
that splitting the region that points to page A also causes a split of the region that points
to page B, leading to two regions pointing to page B. Inserting point 6 next is
straightforward because it is in a region that points to page B, and page B has space for
the new point.

Next, consider the bottom right part of the figure. It shows the example file after the
insertion of two additional points, 7 and 8. The insertion of point 7 causes page C to
become full, and the subsequent insertion of point 8 causes another split. This time, we
split along the X axis and redistribute the points in page C across C and the new data page
D. Observe how the grid directory is partitioned the most in those parts of the data space
that contain the most points; the partitioning is sensitive to data distribution, like the
partitioning in Extendible Hashing, and handles skewed distributions well.

Finally, consider the potential insertion of points 9 and 10, which are shown as light
circles to indicate that the result of these insertions is not reflected in the data pages.
Inserting point 9 fills page B, and subsequently inserting point 10 requires a new data
page. However, the grid directory does not have to be split further: points 6 and 9 can
remain in page B, points 3 and 10 can go to a new page E, and the second grid directory
entry that points to page B can be reset to point to page E.
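The insertion procedure just illustrated can be sketched as a toy Python class. This is an illustrative sketch under simplifying assumptions, not production code: it assumes distinct coordinates, ignores degenerate cuts, and splits an overflowing page at the median of the round-robin axis. Page capacity is three points, as in the example:

```python
import bisect

class GridFile:
    """Toy 2-D Grid file: linear scales, a grid directory of page ids,
    and fixed-capacity data pages (a sketch, not production code)."""
    PAGE_CAP = 3                          # data page capacity, as in the example

    def __init__(self):
        self.xcuts, self.ycuts = [], []   # linear scales: cut points per axis
        self.directory = [[0]]            # directory[i][j] -> page id
        self.pages = {0: []}              # page id -> list of points
        self.next_pid = 1
        self.axis = 0                     # round-robin split axis: 0 = X, 1 = Y

    def _cell(self, x, y):
        # The linear scales map a coordinate to a directory index in memory.
        return (bisect.bisect_right(self.xcuts, x),
                bisect.bisect_right(self.ycuts, y))

    def search(self, x, y):
        i, j = self._cell(x, y)           # one "I/O" for the directory entry,
        return (x, y) in self.pages[self.directory[i][j]]  # one for the page

    def insert(self, x, y):
        i, j = self._cell(x, y)
        pid = self.directory[i][j]
        if len(self.pages[pid]) < self.PAGE_CAP:
            self.pages[pid].append((x, y))
            return
        # Page overflow: split the directory (and one linear scale) along the
        # round-robin axis at the median coordinate of the overflowing points.
        pts = self.pages[pid] + [(x, y)]
        cut = sorted(p[self.axis] for p in pts)[len(pts) // 2]
        new_pid, self.next_pid = self.next_pid, self.next_pid + 1
        if self.axis == 0:                # split X: duplicate directory row i;
            self.xcuts.insert(i, cut)     # duplicated entries still alias the
            self.directory.insert(i + 1, list(self.directory[i]))  # same pages
            self.directory[i + 1][j] = new_pid
        else:                             # split Y: duplicate column j
            self.ycuts.insert(j, cut)
            for row in self.directory:
                row.insert(j + 1, row[j])
            self.directory[i][j + 1] = new_pid
        # Redistribute the points of the full page between old and new page.
        self.pages[pid] = [p for p in pts if p[self.axis] < cut]
        self.pages[new_pid] = [p for p in pts if p[self.axis] >= cut]
        self.axis ^= 1                    # round-robin splitting policy
```

Note how a split duplicates a whole row or column of the directory, so some unrelated regions end up with two entries pointing to the same page, exactly as in the figure.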

Deletion of points from a Grid file is complicated. When a data page falls below some
occupancy threshold, e.g., less than half-full, it must be merged with some other data
page in order to maintain good space utilization. We will not go into the details beyond
noting that in order to simplify deletion, there is a convexity requirement on the set of
grid directory entries that point to a single data page: the region defined by this set of grid
directory entries must be convex.

10.7.1 Adapting Grid Files to Handle Regions
There are two basic approaches to handling region data in a Grid file, neither of which is
satisfactory. First, we can represent a region by a point in a higher dimensional space. For
example, a box in two dimensions can be represented as a four-dimensional point by
storing two diagonal corner points of the box. This approach does not support nearest
neighbor and spatial join queries since distances in the original space are not reflected in
the distances between points in the higher-dimensional space. Further, this approach
increases the dimensionality of the stored data, which leads to various problems.
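The problem with the first approach can be made concrete with a small computation (the coordinates are invented for illustration): two boxes that overlap, and hence are at spatial distance zero, map to four-dimensional points that are far apart:

```python
import math

def box_as_point(box):
    """Represent a 2-d box (x0, y0, x1, y1) by its two diagonal corners,
    i.e., as a single point in 4-d space."""
    return tuple(box)

# A small box nested inside a large one: spatial distance between them is 0.
small = (4.0, 4.0, 5.0, 5.0)
large = (0.0, 0.0, 100.0, 100.0)

# Yet their 4-d point representations are very far apart (> 100 units),
# so nearness in the original space is not reflected in point space.
math.dist(box_as_point(small), box_as_point(large))
```

This is why nearest neighbor and spatial join queries cannot be supported by the point-mapping approach.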

The second approach is to store a record representing the region object in each grid
partition that overlaps the region object. This is unsatisfactory because it leads to a lot of
additional records and makes insertion and deletion expensive.


In summary, the Grid file is not a good structure for storing region data.

10.8 R TREES: Point and Region Data
The R tree is an adaptation of the B+ tree to handle spatial data; like the B+ tree, it is a
height-balanced data structure. The search key for an R tree is a collection of intervals,
with one interval per dimension. We can think of a search key value as a box that is
bounded by the intervals; each side of the box is parallel to an axis. We will refer to
search key values in an R tree as bounding boxes.

A data entry consists of a pair ⟨n-dimensional box, rid⟩, where rid identifies an object and
the box is the smallest box that contains the object. As a special case, the box is a point if
the data object is a point instead of a region. Data entries are stored in leaf nodes. Non-
leaf nodes contain index entries of the form ⟨n-dimensional box, pointer to a child node⟩.
The box at a non-leaf node N is the smallest box that contains all boxes associated with
child nodes; intuitively, it bounds the region containing all data objects stored in the
subtree rooted at node N.

Figure 10.6: Two views of an example R-Tree

The figure shows two views of an example R tree. In the first view, we see the tree
structure. In the second view, we see how the data objects and bounding boxes are
distributed in space.

There are 19 regions in the example tree. Regions R8 through R19 represent data objects
and are shown in the tree as data entries at the leaf level. The entry R8*, for example,
consists of the bounding box for region R8 and the rid of the underlying data object.
Regions R1 through R7 represent bounding boxes for internal nodes in the tree. Region
R1, for example, is the bounding box for the space containing the left subtree, which
includes data objects R8, R9, R10, R11, R12, R13, and R14.

The bounding boxes for two children of a given node can overlap; for example, the boxes
for the children of the root node, R1 and R2, overlap. This means that more than one leaf
node could accommodate a given data object while satisfying all bounding box
constraints. However, every data object is stored in exactly one leaf node, even if its
bounding box falls within the regions corresponding to two or more higher-level nodes.


For example, consider the data object represented by R9. It is contained within both R3
and R4 and could be placed in either the first or the second leaf node (going from left to
right in the tree). We have chosen to insert it into the left-most leaf node; it is not inserted
anywhere else in the tree.

10.8.1 Queries
To search for a point, we compute its bounding box B (which is just the point itself) and
start at the root of the tree. We test the bounding box for each child of the root to see if it
overlaps the query box B, and if so we search the subtree rooted at the child. If more than
one child of the root has a bounding box that overlaps B, we must search all the
corresponding subtrees. This is an important difference with respect to B+ trees: The
search for even a single point can lead us down several paths in the tree.

When we get to the leaf level, we check to see if the node contains the desired point. It is
possible that we do not visit any leaf node; this happens when the query point is in a
region that is not covered by any of the boxes associated with leaf nodes. If the search
does not visit any leaf pages, we know that the query point is not in the indexed dataset.

Searches for region objects and range queries are handled similarly by computing a
bounding box for the desired region and proceeding as in the search for an object. For a
range query, when we get to the leaf level we must retrieve all region objects that belong
there and test to see if they overlap (or are contained in, depending upon the query) the
given range. The reason for this test is that even if the bounding box for an object
overlaps the query region, the object itself may not.

As an example, suppose that we want to find all objects that overlap our query region,
and the query region happens to be the box representing object R8. We start at the root
and find that the query box overlaps R1 but not R2. Thus, we search the left subtree, but
not the right subtree. We then find that the query box overlaps R3 but not R4 or R5. So
we search the left-most leaf and find object R8. As another example, suppose that the
query region coincides with R9 rather than R8. Again, the query box overlaps R1 but not
R2 and so we search (only) the left subtree. Now we find that the query box overlaps both
R3 and R4, but not R5. We therefore search the children pointed to by the entries for R3
and R4.
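The search procedure can be sketched in Python. The region labels echo the figure, but the coordinates and tree shape below are invented for illustration; the point query deliberately overlaps the bounding boxes of both leaves, showing how an R tree search can descend multiple paths:

```python
from dataclasses import dataclass

@dataclass
class Node:
    box: tuple             # bounding box (x0, y0, x1, y1)
    children: list = None  # non-leaf node: list of child Nodes
    rids: list = None      # leaf node: list of (box, rid) data entries

def overlaps(a, b):
    """True iff boxes a and b intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, query):
    """Return the rids of all data entries whose box overlaps `query`.
    Unlike a B+ tree search, several subtrees may have to be visited."""
    if node.rids is not None:             # leaf level
        return [rid for box, rid in node.rids if overlaps(box, query)]
    hits = []
    for child in node.children:
        if overlaps(child.box, query):    # descend every overlapping child
            hits += search(child, query)
    return hits

# Hypothetical tree: two leaves whose bounding boxes overlap around x = 4..5.
leaf1 = Node(box=(0, 0, 5, 5), rids=[((1, 1, 2, 2), "R8"), ((4, 4, 5, 5), "R9")])
leaf2 = Node(box=(4, 0, 9, 5), rids=[((6, 1, 8, 3), "R10")])
root = Node(box=(0, 0, 9, 5), children=[leaf1, leaf2])

search(root, (4, 4, 4, 4))   # point query; both leaves are visited -> ["R9"]
```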

As a refinement to the basic search strategy, we can approximate the query region by a
convex region defined by a collection of linear constraints, rather than a bounding box,
and test this convex region for overlap with the bounding boxes of internal nodes as we
search down the tree. The benefit is that a convex region is a tighter approximation than a
box, and therefore we can sometimes detect that there is no overlap although the
intersection of bounding boxes is nonempty. The cost is that the overlap test is more
expensive, but this is a pure CPU cost and is negligible in comparison to the potential I/O
savings.

Note that using convex regions to approximate the regions associated with nodes in the R
tree would also reduce the likelihood of false overlaps (the bounding regions overlap, but
the data object does not overlap the query region), but the cost of storing convex region
descriptions is much higher than the cost of storing bounding box descriptions.

To search for the nearest neighbors of a given point, we proceed as in a search for the
point itself. We retrieve all points in the leaves that we examine as part of this search and
return the point that is closest to the query point. If we do not visit any leaves, then we
replace the query point by a small box centered at the query point and repeat the search.
If we still do not visit any leaves, we increase the size of the box and search again,
continuing in this fashion until we visit a leaf node. We then consider all points retrieved
from leaf nodes in this iteration of the search and return the point that is closest to the
query point.

10.8.2 Insert and Delete Operations
To insert a data object with rid r, we compute the bounding box B for the object and
insert the pair ⟨B, r⟩ into the tree. We start at the root node and traverse a single path
from the root to a leaf (in contrast to searching, where we could traverse several such
paths). At each level, we choose the child node whose bounding box needs the least
enlargement (in terms of the increase in its area) to cover the box B. If several children
have bounding boxes that cover B (or that require the same enlargement in order to cover
B), from these children we choose the one with the smallest bounding box.

At the leaf level, we insert the object and if necessary we enlarge the bounding box of the
leaf to cover box B. If we have to enlarge the bounding box for the leaf, this must be
propagated to ancestors of the leaf; after the insertion is completed, the bounding box for
every node must cover the bounding box for all descendants. If the leaf node does not
have space for the new object, we must split the node and redistribute entries between the
old leaf and the new node. We must then adjust the bounding box for the old leaf and
insert the bounding box for the new leaf into the parent of the leaf.

Again, these changes could propagate up the tree.
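The least-enlargement rule for choosing a child can be sketched as follows; boxes are (x0, y0, x1, y1) tuples and the coordinates in the example are invented:

```python
def area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)

def enlarge(box, b):
    """Smallest box covering both `box` and `b`."""
    return (min(box[0], b[0]), min(box[1], b[1]),
            max(box[2], b[2]), max(box[3], b[3]))

def choose_child(children, b):
    """Pick the child box needing the least area enlargement to cover `b`;
    ties are broken by the smallest area, as described in the text."""
    return min(children,
               key=lambda c: (area(enlarge(c, b)) - area(c), area(c)))

# The first child already covers B (enlargement 0); the second would need
# a large enlargement, so the first is chosen.
choose_child([(0, 0, 4, 4), (3, 3, 9, 9)], (1, 1, 2, 2))   # -> (0, 0, 4, 4)
```

`min` with a tuple key implements the two-level criterion (enlargement first, then area) in one pass over the children.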

Figure 10.7: Alternate redistributions in a node split


It is important to minimize the overlap between bounding boxes in the R tree because
overlap causes us to search down multiple paths. The amount of overlap is greatly
influenced by how entries are distributed when a node is split. The figure illustrates two
alternative redistributions during a node split. There are four regions R1, R2, R3, and R4
to be distributed across two pages. The first split (shown in broken lines) puts R1 and R2
on one page and R3 and R4 on the other page. The second split (shown in solid lines)
puts R1 and R4 on one page and R2 and R3 on the other page. Clearly, the total area of
the bounding boxes for the new pages is much less with the second split.
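The effect of the redistribution choice on total bounding-box area can be checked with a small computation. The four regions below use hypothetical coordinates (not the figure's): grouping nearby regions yields a much smaller total area than grouping distant ones:

```python
def area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)

def mbb(boxes):
    """Minimum bounding box of a collection of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

# Two pairs of vertically stacked regions, far apart horizontally:
a, b = (0, 0, 2, 2), (0, 3, 2, 5)
c, d = (6, 0, 8, 2), (6, 3, 8, 5)

good = area(mbb([a, b])) + area(mbb([c, d]))   # groups nearby regions -> 20
bad  = area(mbb([a, c])) + area(mbb([b, d]))   # groups distant regions -> 32
```

The larger total area of the bad split means more dead space, hence more overlap with future queries and more multi-path searches.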

Minimizing overlap using a good insertion algorithm is very important for good search
performance. A variant of the R tree called the R* tree introduces the concept of forced
reinserts to reduce overlap: When a node overflows, rather than split it immediately we
remove some number of entries (about 30 percent of the node's contents works well) and
reinsert them into the tree. This may result in all entries fitting inside some existing page
and eliminate the need for a split. The R* tree insertion algorithms also try to minimize
box perimeters rather than box areas.

To delete a data object from an R tree, we have to proceed as in the search algorithm and
potentially examine several leaves. If the object is in the tree, we remove it. In principle,
we can try to shrink the bounding box for the leaf containing the object and the bounding
boxes for all ancestor nodes. In practice, deletion is often implemented by simply
removing the object.

There is a variant called the R+ tree that avoids overlap by inserting an object into
multiple leaves if necessary. Consider the insertion of an object with bounding box B at a
node N. If box B overlaps the boxes associated with more than one child of N, the object
is inserted into the subtree associated with each such child. For the purposes of insertion
into child C with bounding box BC, the object's bounding box is considered to be the
overlap of B and BC. The advantage of the more complex insertion strategy is that
searches can now proceed along a single path from the root to a leaf.

10.8.3 Concurrency Control
The cost of implementing concurrency control algorithms is often overlooked in
discussions of spatial index structures. This is justifiable in environments where the data
is rarely updated and queries are predominant. In general, however, this cost can greatly
influence the choice of index structure.

Searches proceed from root to a leaf obtaining shared locks on nodes; a node is unlocked
as soon as a child is locked. Inserts proceed from root to a leaf obtaining exclusive locks;
a node is unlocked after a child is locked if the child is not full. This algorithm can be
adapted to R trees by modifying the insert algorithm to release a lock on a node only if
the locked child has space and its region contains the region for the inserted entry (thus
ensuring that the region modifications will not propagate to the node being unlocked).

Index locking in B+ trees locks a range of key values and prevents new entries in this
range from being inserted into the tree. This technique is used to avoid the phantom
problem. Now let us


consider how to adapt the index locking approach to R trees. The basic idea is to lock the
index page that contains or would contain entries with key values in the locked range. In
R trees, overlap between regions associated with the children of a node could force us to
lock several (non-leaf) nodes on different paths from the root to some leaf. Additional
complications arise from having to deal with changes, in particular enlargements due to
insertions in the regions of locked nodes. Without going into further detail, it should be
clear that index locking to avoid phantom insertions in R trees is both harder and less
efficient than in B+ trees. Further, ideas such as forced reinsertion in R* trees and
multiple insertions of an object in R+ trees make index locking prohibitively expensive.

10.8.4 Generalized Search Trees
The B+ tree and R tree index structures are similar in many respects: they are both
height-balanced trees in which searches start at the root and proceed toward the
leaves, each node covers a portion of the underlying data space, and the children of a
node cover a subregion of the region associated with the node. There are important
differences, of course (e.g., the space is linearized in the B+ tree representation but not in
the R tree), but the common features lead to striking similarities in the algorithms for
insertion, deletion, search, and even concurrency control.

The generalized search tree (GiST) abstracts the essential features of tree index
structures and provides 'template' algorithms for insertion, deletion, and searching. The
idea is that an ORDBMS can support these template algorithms and thereby make it easy
for an advanced database user to implement specific index structures, such as R trees or
variants, without making changes to any system code. The effort involved in writing the
extension methods is much less than that involved in implementing a new indexing
method from scratch, and the performance of the GiST template algorithms is
comparable to specialized code. (For concurrency control, more efficient approaches are
applicable if we exploit the properties that distinguish B+ trees from R trees.

However, B+ trees are implemented directly in most commercial DBMSs, and the GiST
approach is intended to support more complex tree indexes.) The template algorithms call
upon a set of extension methods that are specific to a particular index structure and must
be supplied by the implementer. For example, the search template searches all children of a
node whose region is consistent with the query. In a B+ tree the region associated with a
node is a range of key values, and in an R tree the region is spatial. The check to see
whether a region is consistent with the query region is specific to the index structure and
is an example of an extension method. As another example of an extension method,
consider how to choose the child of an R tree node to insert a new entry into. This choice
can be made based on which candidate child's region needs to be expanded the least; an
extension method is required to calculate the required expansions for candidate children
and choose the child to insert the entry into.
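The separation between a template algorithm and its extension methods can be sketched as follows. The tree representation and the two `consistent` predicates are invented for illustration; a real GiST also requires extension methods for insertion (penalty, union, pick-split), which are omitted here:

```python
class GiST:
    """Sketch of the GiST idea: one template search algorithm, parameterized
    by an index-specific `consistent` extension method."""
    def __init__(self, consistent):
        self.consistent = consistent

    def search(self, node, query):
        if node["leaf"]:
            return [rid for key, rid in node["entries"]
                    if self.consistent(key, query)]
        hits = []
        for key, child in node["entries"]:
            if self.consistent(key, query):   # descend consistent children only
                hits += self.search(child, query)
        return hits

# B+ tree-like instance: keys and queries are 1-d ranges (lo, hi).
range_overlap = lambda k, q: k[0] <= q[1] and q[0] <= k[1]
# R tree-like instance: keys and queries are 2-d boxes (x0, y0, x1, y1).
box_overlap = lambda k, q: (k[0] <= q[2] and q[0] <= k[2] and
                            k[1] <= q[3] and q[1] <= k[3])

# A tiny range-keyed tree searched with the B+ tree-like extension method:
tree = {"leaf": False, "entries": [
    ((0, 10),  {"leaf": True, "entries": [((2, 2), "a"), ((8, 8), "b")]}),
    ((11, 20), {"leaf": True, "entries": [((15, 15), "c")]})]}

GiST(range_overlap).search(tree, (7, 12))   # -> ["b"]
```

Swapping `range_overlap` for `box_overlap` (and boxes for ranges in the tree) turns the same template into an R tree-style search, which is precisely the reuse GiST is after.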


10.9 Issues in High-Dimensional Indexing
The spatial indexing techniques that we have discussed work quite well for two- and
three-dimensional datasets, which are encountered in many applications of spatial data. In
some applications such as content-based image retrieval or text indexing, however, the
number of dimensions can be large (tens of dimensions are not uncommon). Indexing
such high-dimensional data presents unique challenges, and new techniques are required.
For example, sequential scan becomes superior to R trees even when searching for a
single point for datasets with more than about a dozen dimensions.

High-dimensional datasets are typically collections of points, not regions, and nearest
neighbor queries are the most common kind of queries. Searching for the nearest
neighbor of a query point is meaningful when the distance from the query point to its
nearest neighbor is less than the distance to other points. At the very least, we want the
nearest neighbor to be appreciably closer than the data point that is farthest from the
query point. There is a potential problem with high-dimensional data: For a wide range of
data distributions, as dimensionality d increases, the distance (from any given query
point) to the nearest neighbor grows closer and closer to the distance to the farthest data
point! Searching for nearest neighbors is not meaningful in such situations.

In many applications, high-dimensional data may not suffer from these problems and
may be amenable to indexing. However, it is advisable to check high-dimensional
datasets to make sure that nearest neighbor queries are meaningful. Let us call the ratio of
the distance (from a query point) to the nearest neighbor to the distance to the farthest
point the contrast in the dataset. We can measure the contrast of a dataset by generating
a number of sample queries, measuring distances to the nearest and farthest points for
each of these sample queries and computing the ratios of these distances, and taking the
average of the measured ratios. In applications that call for the nearest neighbor, we
should first ensure that datasets have good contrast by empirical tests of the data.
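The empirical contrast test can be sketched directly from its definition. The point counts and the uniform distribution below are arbitrary choices for illustration; the run shows contrast moving toward 1 as dimensionality grows:

```python
import math
import random

def contrast(points, queries):
    """Average, over sample query points, of (distance to nearest point) /
    (distance to farthest point); values near 1 mean nearest neighbor
    queries carry little information."""
    ratios = []
    for q in queries:
        dists = sorted(math.dist(q, p) for p in points)
        ratios.append(dists[0] / dists[-1])
    return sum(ratios) / len(ratios)

random.seed(42)

def cloud(dim, n):
    """n uniformly random points in the unit cube of dimension `dim`."""
    return [[random.random() for _ in range(dim)] for _ in range(n)]

# For uniform data, contrast degrades as dimensionality grows:
low_d  = contrast(cloud(2, 200),  cloud(2, 10))    # small ratio: good contrast
high_d = contrast(cloud(50, 200), cloud(50, 10))   # much closer to 1
```

A dataset whose measured contrast is close to 1 is a poor candidate for nearest neighbor indexing, regardless of the index structure used.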


                                  CHAPTER 11


      The concept of a transaction has wide applicability for a variety of distributed
computing tasks, such as airline reservations, inventory management, and electronic
commerce.

11.1.1 Transaction Processing Monitors
Complex applications are often built on top of several resource managers, such as
database management systems, operating systems, user interfaces, and messaging
software. A transaction processing monitor glues together the services of several
resource managers and provides application programmers with a uniform interface for
developing transactions with the ACID properties. In addition to providing a uniform
interface to the services of different resource managers, a TP monitor also routes
transactions to the appropriate resource managers. Finally, a TP monitor ensures that an
application behaves as a transaction by implementing concurrency control, logging, and
recovery functions, and by exploiting the transaction processing capabilities of the
underlying resource managers.

TP monitors are used in environments where applications require advanced features such
as access to multiple resource managers; sophisticated request routing (also called
workflow management); assigning priorities to transactions and doing priority based
load-balancing across servers; and so on. A DBMS provides many of the functions
supported by a TP monitor in addition to processing queries and database updates
efficiently. A DBMS is appropriate for environments where the wealth of transaction
management capabilities provided by a TP monitor is not necessary and, in particular,
where very high scalability (with respect to transaction processing activity) and
interoperability are not essential.

The transaction processing capabilities of database systems are improving continually.
For example, many vendors offer distributed DBMS products today in which a
transaction can execute across several resource managers, each of which is a DBMS.
Currently, all the DBMSs must be from the same vendor; however, as transaction-
oriented services from different vendors become more standardized, distributed,
heterogeneous DBMSs should become available. Eventually, perhaps, the functions of
current TP monitors will also be available in many DBMSs; for now, TP monitors
provide essential infrastructure for high-end transaction processing environments.


11.1.2 New Transaction Models
Consider an application such as computer-aided design, in which users retrieve large
design objects from a database and interactively analyze and modify them. Each
transaction takes a long time (minutes or even hours), whereas TPC benchmark
transactions take under a millisecond, and holding locks this long affects performance.
Further, if a crash occurs, undoing an active transaction completely is unsatisfactory,
since considerable user effort may be lost. Ideally we want to be able to restore most of
the actions of an active transaction and resume execution. Finally, if several users are
concurrently developing a design, they may want to see changes being made by others
without waiting until the end of the transaction that changes the data.

To address the needs of long-duration activities, several refinements of the transaction
concept have been proposed. The basic idea is to treat each transaction as a collection of
related sub transactions. Sub transactions can acquire locks, and the changes made by a
sub transaction become visible to other transactions after the sub transaction ends (and
before the main transaction of which it is a part commits). In multilevel transactions,
locks held by a sub transaction are released when the sub transaction ends.

In nested transactions, locks held by a sub transaction are assigned to the parent
(sub)transaction when the sub transaction ends. These refinements to the transaction
concept have a significant effect on concurrency control and recovery algorithms.

11.1.3 Real-Time DBMSs
Some transactions must be executed within a user-specified deadline. A hard deadline
means the value of the transaction is zero after the deadline. For example, in a DBMS
designed to record bets on horse races, a transaction placing a bet is worthless once the
race begins. Such a transaction should not be executed; the bet should not be placed. A
soft deadline means the value of the transaction decreases after the deadline, eventually
going to zero. For example, in a DBMS designed to monitor some activity (e.g., a
complex reactor), a transaction that looks up the current reading of a sensor must be
executed within a short time, say, one second. The longer it takes to execute the
transaction, the less useful the reading becomes. In a real-time DBMS, the goal is to
maximize the value of executed transactions, and the DBMS must prioritize transactions,
taking their deadlines into account.
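The distinction between hard and soft deadlines can be captured by a value-versus-completion-time function. The linear decay below is one illustrative assumption; real-time schedulers use a variety of value functions:

```python
def transaction_value(base, deadline, soft_window, t):
    """Toy value model: full value `base` if the transaction completes by
    `deadline`. A hard deadline (soft_window == 0) drops the value to zero
    immediately; a soft deadline decays it linearly over `soft_window`
    time units (the linear shape is an assumption for illustration)."""
    if t <= deadline:
        return base
    if soft_window == 0:
        return 0.0   # hard deadline: the late transaction is worthless
    return max(0.0, base * (1 - (t - deadline) / soft_window))

transaction_value(10, 5, 0, 6)    # hard deadline missed -> 0.0
transaction_value(10, 5, 10, 10)  # soft deadline, halfway through decay -> 5.0
```

A real-time DBMS would schedule so as to maximize the sum of such values over all executed transactions.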

11.2 Integrated Access to Multiple Data Sources
As databases proliferate, users want to access data from more than one source. For
example, if several travel agents market their travel packages through the Web, customers
would like to look at packages from different agents and compare them. A more
traditional example is that large organizations typically have several databases, created
(and maintained) by different divisions such as Sales, Production, and Purchasing. While


these databases contain much common information, determining the exact relationship
between tables in different databases can be a complicated problem.

For example, prices in one database might be in dollars per dozen items, while prices in
another database might be in dollars per item. The development of XML DTDs offers the
promise that such semantic mismatches can be avoided if all parties conform to a single
standard DTD. However, there are many legacy databases and most domains still do not
have agreed-upon DTDs; the problem of semantic mismatches will be frequently
encountered for the foreseeable future.

Semantic mismatches can be resolved and hidden from users by defining relational views
over the tables from the two databases. Defining a collection of views to give a group of
users a uniform presentation of relevant data from multiple databases is called semantic
integration. Creating views that mask semantic mismatches in a natural manner is a
difficult task and has been widely studied. In practice, the task is made harder by the fact
that the schemas of existing databases are often poorly documented; thus, it is difficult to
even understand the meaning of rows in existing tables, let alone define unifying views
across several tables from different databases.
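A semantic-integration view that masks the pricing mismatch from the earlier example can be sketched as a row-mapping function. The column names and the per-dozen convention are assumptions invented for this illustration:

```python
def per_item_view(row):
    """An integrating 'view' over two catalogs: map every row to dollars
    per item, hiding the per-dozen vs. per-item mismatch from users.
    Column names and unit labels are hypothetical."""
    price = row["price"]
    if row.get("unit") == "per_dozen":
        price /= 12                      # normalize to dollars per item
    return {"item": row["item"], "price_per_item": round(price, 2)}

# Rows from two sources with different conventions map to the same view row:
per_item_view({"item": "widget", "price": 24.0, "unit": "per_dozen"})
per_item_view({"item": "widget", "price": 2.0,  "unit": "per_item"})
```

In a relational setting the same mapping would be a view definition (e.g., dividing the price column by 12 for one source) unioned over the two tables.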

If the underlying databases are managed using different DBMSs, as is often the case,
some kind of 'middleware' must be used to evaluate queries over the integrating views,
retrieving data at query execution time by using protocols such as Open Database
Connectivity (ODBC) to give each underlying database a uniform interface.
Alternatively, the integrating views can be materialized and stored in a data warehouse.
Queries can then be executed over the warehoused data without accessing the source
DBMSs at run-time.

11.3 Mobile Databases
The availability of portable computers and wireless communications has created a new
breed of nomadic database users. At one level these users are simply accessing a database
through a network, which is similar to distributed DBMSs. At another level the network
as well as data and user characteristics now have several novel properties, which affect
basic assumptions in many components of a DBMS, including the query engine,
transaction manager, and recovery manager: Users are connected through a wireless link
whose bandwidth is ten times less than Ethernet and 100 times less than ATM networks.
Communication costs are therefore significantly higher in proportion to I/O and CPU
costs.

Users' locations are constantly changing, and mobile computers have a limited battery
life. Therefore, the true communication costs reflect connection time and battery usage in
addition to bytes transferred, and change constantly depending on location. Data is
frequently replicated to minimize the cost of accessing it from different locations.


As a user moves around, data could be accessed from multiple database servers within a
single transaction. The likelihood of losing connections is also much greater than in a
traditional network. Centralized transaction management may therefore be impractical,
especially if some data is resident at the mobile computers. We may in fact have to give
up on ACID transactions and develop alternative notions of consistency for user
programs.

11.4 Main Memory Databases
The price of main memory is now low enough that we can buy enough main memory to
hold the entire database for many applications; with 64-bit addressing, modern CPUs also
have very large address spaces. Some commercial systems now have several gigabytes of
main memory. This shift prompts a reexamination of some basic DBMS design decisions,
since disk accesses no longer dominate processing time for a memory resident database:

Main memory does not survive system crashes, and so we still have to implement logging
and recovery to ensure transaction atomicity and durability. Log records must be written
to stable storage at commit time, and this process could become a bottleneck. To
minimize this problem, rather than commit each transaction as it completes, we can
collect completed transactions and commit them in batches; this is called group commit.
Recovery algorithms can also be optimized since pages rarely have to be written out to
make room for other pages.
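The group commit idea described above can be illustrated with a minimal sketch: instead of forcing a log write to stable storage for every transaction, completed transactions accumulate in a buffer and a single write flushes the whole batch. The class and batch size below are hypothetical, for illustration only.

```python
import threading

class GroupCommitLog:
    """Minimal sketch of group commit: transactions queue their commit
    log records and one write to stable storage covers the whole batch,
    amortizing the per-commit I/O cost."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.pending = []            # log records awaiting a flush
        self.flushed_batches = []    # stands in for stable storage
        self.lock = threading.Lock()

    def commit(self, log_record):
        with self.lock:
            self.pending.append(log_record)
            if len(self.pending) >= self.batch_size:
                self._flush()

    def _flush(self):
        # One stable-storage write for the entire batch of commits.
        self.flushed_batches.append(list(self.pending))
        self.pending.clear()

log = GroupCommitLog(batch_size=3)
for txn_id in range(6):
    log.commit(f"COMMIT T{txn_id}")
# Six transactions commit with only two stable-storage writes.
```

A real DBMS would also flush on a timeout so that a lone transaction is not delayed indefinitely waiting for a full batch; that refinement is omitted here for brevity.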

The implementation of in-memory operations has to be optimized carefully since disk
accesses are no longer the limiting factor for performance. A new criterion must be
considered while optimizing queries, namely the amount of space required to execute a
plan. It is important to minimize the space overhead because exceeding available physical
memory would lead to swapping pages to disk (through the operating system's virtual
memory mechanisms), greatly slowing down execution.

Page-oriented data structures become less important (since pages are no longer the unit of
data retrieval), and clustering is not important (since the cost of accessing any region of
main memory is uniform).

11.5 Geographic Information Systems
Geographic Information Systems (GIS) contain spatial information about cities, states,
countries, streets, highways, lakes, rivers, and other geographical features, and support
applications to combine such spatial information with non-spatial data. As discussed in
Chapter 26, spatial data is stored in either raster or vector formats. In addition, there is
often a temporal dimension, as when we measure rainfall at several locations over time.
An important issue with spatial data sets is how to integrate data from multiple sources,

since each source may record data using a different coordinate system to identify
locations.

Now let us consider how spatial data in a GIS is analyzed. Spatial information is most
naturally thought of as being overlaid on maps. Typical queries include "What cities lie
on I-94 between Madison and Chicago?" and "What is the shortest route from Madison to
St. Louis?" These kinds of queries can be addressed using spatial query processing
techniques. An emerging application is in-vehicle navigation aids. With Global
Positioning System (GPS) technology, a car's location can be pinpointed, and by
accessing a database of local maps, a driver can receive directions from his or her current
location to a desired destination; this application also involves mobile database access!
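The shortest-route query above is typically answered with a graph search such as Dijkstra's algorithm over a road network. The sketch below uses a hypothetical road graph with invented mileages, purely for illustration.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over a road network given as
    {city: [(neighbor, miles), ...]}; returns (distance, path)."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, city, path = heapq.heappop(queue)
        if city == goal:
            return dist, path
        if city in visited:
            continue
        visited.add(city)
        for nxt, miles in graph.get(city, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + miles, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical road segments and mileages, for illustration only.
roads = {
    "Madison":     [("Chicago", 147), ("Rockford", 72)],
    "Rockford":    [("Chicago", 88), ("Springfield", 175)],
    "Chicago":     [("Springfield", 200)],
    "Springfield": [("St. Louis", 97)],
}
dist, path = shortest_route(roads, "Madison", "St. Louis")
```

An in-vehicle system would run such a search over map data retrieved from a spatial database, with the start vertex supplied by the GPS fix.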

In addition, many applications involve interpolating measurements at certain locations
across an entire region to obtain a model, and combining overlapping models. For
example, if we have measured rainfall at certain locations, we can use the TIN approach
to triangulate the region with the locations at which we have measurements being the
vertices of the triangles. Then, we use some form of interpolation to estimate the rainfall
at points within triangles. Interpolation, triangulation, map overlays, visualizations of
spatial data, and many other domain-specific operations are supported in GIS products
such as ESRI Systems' ARC-Info. Thus, while spatial query processing techniques are an
important part of a GIS product, considerable additional functionality must be
incorporated as well. How best to extend RDBMS systems with this additional
functionality is an important problem yet to be resolved. Agreeing upon standards for
data representation formats and coordinate systems is another major challenge facing the
field.
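The TIN-based interpolation step described above can be sketched concretely: within each triangle, a value at an interior point is estimated as a weighted (barycentric) combination of the values measured at the three vertices. The station coordinates and rainfall figures below are invented for illustration.

```python
def interpolate_in_triangle(p, tri, values):
    """Barycentric interpolation of a measurement (e.g. rainfall) at
    point p inside a triangle whose vertices carry known values.
    tri = [(x1, y1), (x2, y2), (x3, y3)], values = [v1, v2, v3]."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    # Weighted average of the vertex measurements.
    return w1 * values[0] + w2 * values[1] + w3 * values[2]

# Rainfall measured at three hypothetical stations; estimate at an
# interior point of the triangle they form.
stations = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
rainfall = [30.0, 60.0, 90.0]
estimate = interpolate_in_triangle((2.0, 2.0), stations, rainfall)
```

A GIS product such as ARC-Info applies this kind of computation over every triangle of the TIN; linear barycentric weighting is only the simplest choice of interpolant.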

11.6 Temporal and Sequence Databases
Currently available DBMSs provide little support for queries over ordered collections of
records, or sequences, and over temporal data. Typical sequence queries include "Find the
weekly moving average of the Dow Jones Industrial Average" and "Find the first five
consecutively increasing temperature readings" (from a trace of temperature
observations). Such queries can be easily expressed and often efficiently executed by
systems that support query languages designed for sequences. Some commercial SQL
systems now support such SQL extensions.
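The moving-average query above is the kind of computation a sequence-aware query language can evaluate in a single pass over the ordered records. A minimal sketch, using a five-reading window as a stand-in for one trading week and invented closing prices:

```python
from collections import deque

def moving_average(readings, window=5):
    """One-pass sliding-window average over an ordered sequence,
    as a sequence query engine would evaluate it."""
    win, total, out = deque(), 0.0, []
    for value in readings:
        win.append(value)
        total += value
        if len(win) > window:
            total -= win.popleft()   # slide the window forward
        if len(win) == window:
            out.append(total / window)
    return out

# Hypothetical daily index closes, for illustration only.
closes = [100, 102, 101, 103, 104, 106, 105]
averages = moving_average(closes)
```

The point is that ordering is intrinsic to the query: a plain relational engine, which treats a table as an unordered set of rows, cannot express "the previous four readings" without awkward self-joins, whereas a sequence system evaluates this in one scan.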

The first example is also a temporal query. However, temporal queries involve more than
just record ordering. For example, consider the following query: "Find the longest interval
in which the same person managed two different departments." If the period during
which a given person managed a department is indicated by two fields from and to, we
have to reason about a collection of intervals, rather than a sequence of records.
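The interval reasoning the query requires can be sketched directly: group the management records by person, intersect each pair of intervals for different departments, and keep the longest intersection. The record layout and sample data below are hypothetical.

```python
from itertools import combinations

def longest_dual_management(records):
    """records: (manager, dept, frm, to) tuples over integer time
    points. Returns (manager, start, end) for the longest interval in
    which one person managed two different departments at once."""
    by_mgr = {}
    for mgr, dept, frm, to in records:
        by_mgr.setdefault(mgr, []).append((dept, frm, to))
    best = (None, 0, 0)
    for mgr, spans in by_mgr.items():
        for (d1, f1, t1), (d2, f2, t2) in combinations(spans, 2):
            if d1 == d2:
                continue                      # must be two departments
            start, end = max(f1, f2), min(t1, t2)  # interval intersection
            if end - start > best[2] - best[1]:
                best = (mgr, start, end)
    return best

# Hypothetical management records, for illustration only.
records = [
    ("ann", "toys",  1995, 2001),
    ("ann", "shoes", 1998, 2004),   # overlaps toys during 1998-2001
    ("bob", "toys",  2001, 2003),
    ("bob", "books", 2002, 2008),   # overlaps toys during 2002-2003
]
who, start, end = longest_dual_management(records)
```

In SQL this requires comparing every pair of rows on their from/to fields; a temporal extension with a built-in interval type and an overlap predicate expresses the same query far more directly.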

Further, temporal queries require the DBMS to be aware of the anomalies associated with
calendars (such as leap years). Temporal extensions are likely to be incorporated in future
versions of the SQL standard. A distinct and important class of sequence data consists of

DNA sequences, which are being generated at a rapid pace by the biological community.
These are in fact closer to sequences of characters in text than to time sequences as in the
above examples.

The field of biological information management and analysis has become very popular in
recent years, and is called bioinformatics. Biological data, such as DNA sequence data,
is characterized by complex structure and numerous relationships among data elements,
many overlapping and incomplete or erroneous data fragments (because experimentally
collected data from several groups, often working on related problems, is stored in the
databases), a need to frequently change the database schema itself as new kinds of
relationships in the data are discovered, and the need to maintain several versions of data
for archival and reference.

11.7 Information Visualization
As computers become faster and main memory becomes cheaper, it becomes increasingly
feasible to create visual presentations of data, rather than just text-based reports. Data
visualization makes it easier for users to understand the information in large complex
datasets. The challenge here is to make it easy for users to develop visual presentations of
their data and to interactively query such presentations. Although a number of data
visualization tools are available, efficient visualization of large datasets presents many
challenges.

The need for visualization is especially important in the context of decision support;
when confronted with large quantities of high-dimensional data and various kinds of data
summaries produced by using analysis tools such as SQL, OLAP, and data mining
algorithms, the information can be overwhelming. Visualizing the data, together with the
generated summaries, can be a powerful way to sift through this information and spot
interesting trends or patterns. The human eye, after all, is very good at finding patterns. A
good framework for data mining must combine analytic tools to process data, and bring
out latent anomalies or trends, with a visualization environment in which a user can
notice these patterns and interactively drill down to the original data for further analysis.
