BRINGING IT ALL TOGETHER: A COMPARISON OF TWO MODERN
MULTIMEDIA CONTAINER FORMATS
University of Southampton
ABSTRACT
Container formats are used in multimedia applications to provide storage and synchronisation of different media types, and there are many to choose from. This paper provides a general overview of what a container format comprises before comparing two widely used modern container formats: Xiph.Org’s free and open Ogg format and the Moving Picture Experts Group’s (MPEG) MPEG-4 Part 12 container. It concludes that Ogg is a more lightweight format, suitable for simple synchronisation and streaming of multimedia, whereas MPEG-4 is a more expansive and extensible format in terms of features, especially those involving metadata related to audio-visual items.

Keywords
Container files, MPEG-4, Ogg, Metadata, Multimedia Streaming and Multimedia Storage

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
6th Annual Multimedia Systems, Electronics and Computer Science, University of Southampton
© 2006 Electronics and Computer Science, University of Southampton

1. INTRODUCTION
This paper aims to give the reader an idea of what multimedia container file formats are before comparing two widely used modern multimedia container formats: Xiph.Org’s Ogg format and the Moving Picture Experts Group’s (MPEG) MPEG-4 format. Multimedia container formats are useful in multimedia systems for the interchange of data during content creation. They also synchronise different types of multimedia stream into a single stream for network distribution or local storage.

The container formats described in detail in this paper were chosen because they represent an industry and academic standard format (MPEG-4) and an established open and free-for-use format (Ogg), although other options exist. Microsoft’s ASF (Advanced Systems Format) was considered to be narrower in scope than MPEG-4, because the MPEG working group contains many partners, which means that the standards it develops are likely to be more widely applicable. Additionally, much more information is readily available regarding MPEG-4, both in terms of specification and in academia. Another open standard container format is Matroska; however, it is as yet immature and not in as widespread use as Ogg.

Section 2 of this paper provides an overview of what container formats are, giving a brief history and motivating their existence in general before introducing MPEG-4 and Ogg. Section 3 describes and motivates the criteria that will be used to compare the two formats, and Sections 4 through 7 give the actual comparison. Section 8 draws the paper to a conclusion, summarising the findings for each criterion. The criteria themselves are:
- the formats’ suitability for distribution and streaming of content;
- their level of support for decoding multiple multimedia formats;
- their support for metadata;
- a brief consideration of the philosophy behind their licensing and usage.
This paper deliberately focuses on the technical aspects of these formats.

Multimedia information can be found in many different formats and types. It is often useful to be able to synchronise two different types of data, for example to play audio simultaneously with video and perhaps add subtitles to a video. This is where the container format comes in: container formats bring together different types of multimedia into a single file and have the capacity to synchronise playback of disparate types of media. They provide the information that multimedia storage and playback systems require in order to, for example, play back a presentation from a file in permanent local storage or transmit it sequentially across a network. Container formats also generally provide metadata facilities that give additional information to users or help with common tasks such as seeking
within a file for particular points in time or specific events.

2.1 History of Container Formats
The first widely used portable multimedia container format was IFF (Interchange File Format), a standard released by Electronic Arts (EA) in 1985. It introduced the basic idea of “chunks” within a file, each with its own type identifier that hints at how to decode it. This meant that complex files containing different types of information could be moved seamlessly between machines and applications in a standard manner during development. IFF solved the problems that motivate a standardised container format so well that it has served as the basis for most container formats since. In particular, the standard gave as one of its main motivations the ability to add new types to the format without having to ask a central administration group. Extensibility and modularity are important if a format is to gain widespread acceptance.

Other formats based upon the original IFF specification include RIFF (Resource Interchange File Format), released by IBM and Microsoft in 1991. The main difference between IFF and RIFF is that RIFF stores multi-byte values in x86 little-endian order. IFF instead uses the big-endian order of the Motorola 68000, which the original format was designed to accommodate. The commonly used AVI (Audio Video Interleave) audio-visual format and the WAV (short for wave) audio format are both derived from RIFF.

2.2 Multimedia Metadata
Given that chunks and chunk type identification are still core ideas in multimedia container formats, what makes a format modern? The differentiating factor is generally the amount and type of metadata each format can store. Metadata can be described as “data about data”. It may describe fundamental properties of that data, such as the screen resolution of a video stream or the codec (coder/decoder) required to decode a particular data stream. This is in contrast with “content essence”, which is the multimedia material itself. Important aspects of metadata implementations include the ability to add more metadata at a future date. Especially in a container format, it is also important to be able to decode a file even if the application does not understand particular parts of the associated metadata.

2.3 MPEG-4 Parts 12 and 14
The Moving Picture Experts Group (MPEG) is an ISO (International Organization for Standardization) working group set up to develop standards for video and audio. It previously oversaw the specification of the widely used MPEG-1 and MPEG-2 standards for video and audio compression and decompression. The most recent standard in this series is the MPEG-4 specification. In particular, this paper is interested in Part 14 of the specification, which defines a container format based upon Apple Computer, Inc.’s QuickTime container format. Unfortunately, Part 14 of the MPEG-4 standard is not freely available. This paper will therefore refer instead to Part 12, the ISO base media file format, which forms the basis for Part 14.

Also of interest is the MPEG-7 standard, which specifies a standard method for storing metadata for both generic and multimedia-specific content. The MPEG-4 standard permits the inclusion of MPEG-7 metadata instead of the OCI (Object Content Information) metadata type specified in the original standard.

2.4 Xiph.Org Ogg
Xiph.Org is an organisation formed to create and maintain freely available open standards for internet multimedia applications, including the encoding and distribution of video and audio data. This includes the Ogg container format, on which this paper focuses. The Ogg Vorbis (audio) and Theora (video) codec standards provide information on packaging data into multiplexed Ogg files [8, 9]; however, they do not provide metadata for features such as chapter indexing or subtitles. These are instead provided by Tobias Waldvogel’s Ogg Media (OGM) extension, which is distributed as part of the Ogg software repository on the Xiph.Org website.

3. METHODOLOGY
The container formats will be compared on both technical merit and philosophical suitability for purpose, using several criteria. The first is their theoretical ability to be used as both streaming and distributable file formats. This includes a discussion of their internal structure and of their support for different audio and video codecs. There will then be a discussion of the metadata each format supports natively. Philosophically, this paper will briefly discuss the licensing options for each format and the implications for their use.

3.1 Distribution and Streaming
Container formats aid the streaming of audio and visual data by providing a standard method of serial transport. The receiver can demultiplex this transport in order to play back many different streams simultaneously. For example, a stream may consist of a video channel and several selectable audio channels, each in a different language, as well as subtitles. A similar principle is used to store the data as a file on physical storage
media and reconstruct the streams for local playback, aiding distribution of multimedia. The comparison will consider how well each format supports this storage, including error correction and synchronisation support.

3.2 Support for Codecs
Different container formats support different audio and video codecs depending on their internal structure and available implementations. The comparison will consider the formats’ ability to carry different codecs. It will therefore also consider how easily they may be extended to take advantage of, for example, variable quality video and audio or future formats that have not yet been created. If an application can use a relatively codec-agnostic container format, then the transport and distribution layer can easily be decoupled from the multiplexing and decoding layer.

3.3 Metadata
Direct metadata can be used in container files for multimedia applications to provide the user with information such as the track name and artist. One example is the ID3 tags that can be packaged with MPEG-1 Audio Layer III (MP3) files. Metadata may also be used within the file format itself, either to index the contents and allow faster seeking, or to tell applications which codecs to use to decode the data. Modern container formats allow advanced metadata that lets users quickly search for specific multimedia, for example melody information that can be used in a “query by humming” system. The file formats’ methods of storing metadata will be discussed, as well as the implications for applications and users.

3.4 Philosophical Considerations
The licensing options available for each format can dictate how it fares in different areas of the market. This paper will briefly discuss the philosophical issues surrounding the licensing of each of the formats.

4. DISTRIBUTION AND STREAMING

4.1 Xiph.Org Ogg
The technical evidence presented here comes from the official Xiph.Org Ogg bit stream container format specification.

Ogg has been designed both for file storage on a local system and for streaming over, for example, a TCP connection. One of the major design goals was to be able to construct a complete stream without any seeking. Files can therefore be read or written in a single pass, which makes the format ideal for streaming applications such as internet radio.

Each instance of a decoding codec is responsible for a single logical bit stream. A logical bit stream consists of consecutively numbered pages, which contain the data within the stream. Each logical bit stream carries a serial number that must be unique within the physical bit stream. The physical bit stream is constructed from interleaved logical bit stream pages (Figure 1); it is this which forms the file or streamed data.

Logical bit streams may be concatenated sequentially (known as chaining), so that, for example, one audio stream may end and another begin immediately afterwards. Each logical bit stream has an initial and a terminating page. If more data is to follow, the terminating page of one logical bit stream must be immediately followed by the initial page of the next.

FIGURE 1: Xiph.Org Ogg physical bit stream featuring multiplexed logical bit streams, with beginning and terminating pages.

Logical bit streams may also be multiplexed in parallel (known as grouping). Pages must follow one another sequentially within their own logical bit stream but may be interleaved in any order within the group. All logical bit streams in a group must begin together: each sends its initial page before any data is sent. A new group may not begin until all logical bit streams within the previous group have ended.

The pages within a group contain the codec data in packets. Each packet is split into consecutive 255-byte segments followed by a shorter final segment. This allows variable-sized pages and packets, with minimal processing needed to discover the beginning and end of each. A packet ends at the first segment shorter than 255 bytes, so a zero-length segment must be given when a packet’s length is an exact multiple of 255.

The page header contains the size of each packet segment within the page. A page may hold at most 255 segments, which bounds the page body to 255 × 255 = 65,025 bytes and prevents corrupt data from producing runaway streams. A header flag indicates when a packet has been split over several pages. Error recovery for corrupt packets is therefore straightforward: the decoder can pick up the start of the next packet and ignore the partial data within a page.
The page header also contains CRC (Cyclic Redundancy Check) checksum data for each page in order to verify data integrity.

Ogg bit streams are therefore well suited to streaming. The Ogg bit stream does not mandate a packet size (though the codec may), and page sizes may vary, so it is easy to store and send variable bit rate data. Logical bit streams containing audio and video data may be sent simultaneously at differing bit rates, since their pages are not obliged to appear at the same frequency within a group in the physical bit stream.

An Ogg stream describes the data it is sending in the initial packets of each logical stream, so an application can act accordingly. For example, if a subtitle logical stream has begun, the application might optionally provide subtitles; if none is sent initially, it knows that no such support is required for this group of logical bit streams.

The overhead of page information is kept to a minimum, since the total size of a page may be deduced from its “lacing values” (the list of packet segment sizes).

4.2 MPEG-4 Part 12
All technical data in this subsection is from the cited reference unless otherwise specified.

The MPEG-4 Part 12 ISO base media file format is hierarchical in nature and can be much more complex than an Ogg bit stream. It consists of objects known as “boxes” (or “atoms”), whose structure is inferred from their type, given by a four-character code (Figure 2).

FIGURE 2: MPEG-4 audio visual file with hint track ready for streaming.

Boxes may contain other boxes; for example, the Movie Box (“moov”) contains metadata boxes for playback of the tracks (“trak” boxes). In an Ogg bit stream, the logical stream types are defined by the Ogg-derived format in use and by the initial pages given by the stream group. In an MPEG-4 file, they are given by the “moov” box (of which there can be only one in a file) and its children.

Network streaming support is provided indirectly by another kind of track known as the hint track. This may be stored as a track within the MPEG-4 file itself or added using a “hinter” tool before transmission. MPEG-4 may be transmitted over RTP (the Real-time Transport Protocol) using a variety of different encoding techniques, made possible by the flexibility of the box structure. However, the variety of media types that may be present within an MPEG-4 file makes it more complex to partition and order the data for streaming across a network. Ogg is specifically built so that the overhead of its headers scales with the size of the packets, whereas the size information in a basic MPEG-4 box header has a constant length.

In general, the MPEG-4 Part 12 format is much more flexible than the Ogg container. It specifies edit lists that allow the data within the file to be out of temporal order, whereas the Ogg standard specifies that logical bit stream pages must be time-ordered. This makes in-place editing of MPEG-4 file contents easier than with Ogg. It also means that MPEG-4 is potentially much more useful than Ogg as an interchange format during development, owing to the larger variety of data it may hold as standard. This data may also be much richer, the implications of which are described later. In addition, MPEG-4 may reference data outside the file itself, for which Ogg has no native support. This aids content creation, as such media data need not be embedded in the file until distribution.

5. SUPPORT FOR CODECS

5.1 Xiph.Org Ogg
The native Ogg audio format is Vorbis. Vorbis itself is specified as a container-agnostic format and may be encoded at a variable bit rate. It is designed so that the least important information is contained at the end of each packet. Packets may therefore be truncated on demand to make more efficient use of bandwidth, at the expense of audio quality.
Given the complexity of this native format and the flexibility of the Ogg container format, it is reasonable to assume that Ogg is flexible enough to contain any audio or video codec whose data is in continuous temporal order. This is supported by Theora, the native Ogg video codec, whose specification describes interleaving various kinds of audio and video data as part of the standard.

5.2 MPEG-4 Part 12
The MPEG-4 suite of standards contains specifications for the H.264/AVC (Advanced Video Coding) video standard (Part 10). It also covers natural audio coding, which spans the range from “16 kbit/s per channel up to bit-rates higher than 64 kbit/s per channel”. Given sufficient processing power to decode the large data rates in real time, this covers audio from speech quality to CD quality and variable bit rate video up to HDTV quality. It could therefore be argued that the data rates supported by the MPEG-4 video and audio standards are by themselves sufficient for new and interesting formats. On that view, there is less need for extensibility within MPEG-4 than there may be in Ogg.

Ogg’s Theora is based upon On2’s VP3 format and is more suited to low bit rate streaming video, whereas MPEG-4 native video is well suited to variable bit rate distribution.

MPEG-4 provides MSDL (MPEG-4 Systems and Description Languages) in order to define new objects. The extensibility of the format means that these could easily be included in the format itself. However, for a new codec that already has a container-independent implementation, this would be a relatively lengthy process compared with Ogg. The Ogg wrapper would almost write itself, owing to the flexibility of Ogg’s packet and page mechanism, whereas in MPEG-4 the new format would need an object written specifically for it in order to be usable within the MPEG-4 file.

6. METADATA

6.1 Xiph.Org Ogg
The generic Ogg container format does not itself declare any form of metadata beyond that provided by the initial stream tags in the Ogg stream. Even this is information implied by the file type rather than concrete metadata. However, the Vorbis audio codec specifies a header within its Ogg stream that contains simple metadata such as artist name and track title. It also suggests that arbitrary metadata associated with the streams within an Ogg file should be given its own logical bit stream, based on XML or a similar information declaration technology.

6.2 MPEG-4 Part 12
The MPEG-4 Part 12 specification gives two possible choices for attaching arbitrary metadata to a stream using the original Object Content Information (OCI):
- descriptors attached to specific objects within the file;
- a stream attached to an object, much like a track, for information that changes over time (subtitles, for example).

A much better method of attaching metadata to files is to use the MPEG-7 framework, provision for which has already been included in the MPEG-4 standard. MPEG-7 can provide all of the functionality of OCI and much more. MPEG-7 uses XML-based data structures to store information about audio-visual data within scenes. It is designed to be wide-ranging and extensible, and provides standards for, for example:
- visual objects’ colour, shape and location within a scene;
- audio data’s melodic signature or instrumental timbre;
- seeking data provided by generic indexing.

Applications of MPEG-7 data include search within MPEG-4 data files, representation of metadata regarding internet streams, and storing the trajectory of objects within scenes such as sports videos. This shows that the MPEG-4 format is ready to use MPEG-7 descriptor data in many different applications and research areas. Technically, nothing prevents someone from using MPEG-7 data or XML streams within an Ogg file, but there is little reason to do so if MPEG-4 technology is available. The number of papers on MPEG-4 and MPEG-7 research topics also shows that they are widespread within the academic community as a standard for research purposes.

7. PHILOSOPHICAL CONSIDERATIONS
The Ogg container format and other Xiph.Org software specifications and products are completely free for use and free from patents. This provides a major advantage over MPEG-4 in terms of cost, since using the MPEG-4 suite of tools requires permission from the patent holders and possibly payment of royalties. This is a tricky area, since many patent holders have contributed to MPEG-4 and there is no central point of contact. MPEG-4 will become (and has likely already become) a research and marketplace standard format for audio-visual data. Even so, open formats such as Ogg will always be around, owing to pressure from the community for open formats.

8. CONCLUSIONS
This paper has described the technical merits of the open Xiph.Org Ogg container format standard and the industry-standard MPEG-4 (in particular Parts 12 and 14, which describe its container format). It has shown how both are able
to store their data on disk and process it sequentially for network streaming. Ogg seems to have a slight edge on that front, being more lightweight and streamlined, whereas MPEG-4 data is more complex to partition and order for storage and streaming. Ogg uses a simple sequential stream whilst MPEG-4 uses a hierarchy of objects. This makes MPEG-4 better for editing than Ogg, since the component objects may be edited in place rather than having to rewrite the whole file.

Likewise, Ogg’s lightweight, flexible specification seems to lend itself well to extension to new media formats. However, MPEG-4 provides the tools for formally specifying new objects in a standard fashion; with Ogg, this would have to be done externally to achieve the same effect. MPEG-4 has much more built-in support for metadata than Ogg, since it can easily take advantage of MPEG-7.

The biggest advantage of Ogg appears to be that it is a free and open standard as well as flexible and lightweight. MPEG-4, by contrast, is designed by industry experts to tackle many tasks in addition to simple streaming and synchronisation.

9. REFERENCES
Motion Picture Expert Group (MPEG) Achievements. http://www.chiariglione.org/mpeg/achievements.htm last accessed 6th November 2006.

Koenen, R. (2002) Overview of the MPEG-4 Standard, N4668, ISO/IEC JTC1/SC29/WG11.

Martínez, J. M. (2004) MPEG-7 Overview, N6828, ISO/IEC JTC1/SC29/WG11.

MPEG-4 File Format, Version 2. http://www.digitalpreservation.gov/formats/fdd/fdd000155.shtml last accessed 6th November

Motion Pictures Expert Group (2006) Introduction to MPEG-4 Object Content Information, N8148, ISO/IEC JTC1/SC29/WG11.

ISO/IEC 14496-12:2005(E); Information technology - Coding of audio-visual objects Part 12: ISO base media file format. http://standards.iso.org/ittf/PubliclyAvailableStan
12_2005(E).zip last accessed 7th November

About Xiph. http://www.xiph.org/about/ last accessed 17th November 2006.

Vorbis I specification. http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html last accessed 17th November 2006.

Theora I Specification. http://www.xiph.org/theora/doc/Theora_I_spec.pdf last accessed 17th November 2006.

Supported codecs and format of their CodecPrivate blocks. http://haali.cs.msu.ru/mkv/codecs.pdf last accessed 17th November 2006.

Diepold, K., Pereira, F., Chang, W. (2005) MPEG-A: multimedia application formats. IEEE MultiMedia; Vol. 12, no. 4; pp. 34-41.

Quackenbush, S., Lindsay, A. (2001) Overview of MPEG-7 audio. IEEE Transactions on Circuits and Systems for Video Technology; Vol. 11, issue 6; pp. 725-729.

Ogg logical and physical bit stream overview. http://www.xiph.org/ogg/doc/oggstream.html last accessed 17th November 2006.

Ogg logical bit stream framing. http://www.xiph.org/ogg/doc/framing.html last accessed 17th November 2006.

Basso, A., Varakliotis, S. (2000) Transport of MPEG-4 over IP/RTP. IEEE International Conference on Multimedia and Expo, 2000; Vol. 2; pp. 1067-1070.

Wiegand, T., Sullivan, G.J., Bjøntegaard, G., Luthra, A. (2003) IEEE Transactions on Circuits and Systems for Video Technology; Vol. 13, issue 7; pp. 560-576.

Brandenburg, K., Kunz, O., Sugiyama, A. (2000) MPEG-4 natural audio coding. Signal Processing: Image Communication; Vol. 15, issues 4-5; pp. 423-444.

Moseler, K., Fang, J. (2000) Real-time Performance Analysis of MPEG-4 Systems. Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems; Vol. 3; pp.

Radha, H., Chen, Y., Parthasarathy, K., Cohen, R. (1999) Scalable Internet Video Using MPEG-4. Signal Processing: Image Communication; Vol. 15; pp. 95-126.

Eleftheriadis, A. (1997) The MPEG-4 system and description languages: from practice to theory. Proceedings of 1997 IEEE International Symposium on Circuits and Systems; Vol. 2; pp.

Microsoft’s Advanced Systems Format (ASF). http://www.microsoft.com/windows/windowsmedia/forpros/format/asfspec.aspx last accessed 17th November 2006.

Matroska File Format. http://www.matroska.org/technical/specs/matroska.pdf last accessed 17th November 2006.

Morrison, J. (1985) Standard for Interchange Format Files. Electronic Arts. http://www.szonye.com/bradd/iff.html last accessed 17th November 2006.

Seebach, P. (2006) Standards and specs: The Interchange File Format (IFF). http://www-128.ibm.com/developerworks/power/library/pa-spec16/?ca=dgr-lnxw07IFF last accessed 17th November 2006.

Resource Interchange File Format. http://en.wikipedia.org/wiki/RIFF last accessed 17th November 2006.

Wilkinson, J. H., Morgan, O. F. (1997) International Broadcasting Convention; Vol. 1, Issue 447; pp. 374-379.

RFC 3533: The Ogg Encapsulation Format Version 0. ftp://ftp.rfc-editor.org/in-notes/rfc3533.txt last accessed 17th November 2006.

Ki, M., Kim, K. (2006) MPEG-7 over MPEG-4 Systems Decoder for Using Metadata. International Conference on Consumer Electronics, 2006. 2006 Digest of Technical Papers; pp. 245-246.

Rehm, E. (2000) Representing internet streaming media metadata using MPEG-7 multimedia description schemes. Proceedings of the 2000 ACM workshops on Multimedia; pp. 93-98.

Haoran, Y., Rajan, D., Liang-Tien, C. (2003) Automatic Generation of MPEG-7 Compliant XML Document for Motion Trajectory Descriptor in Sports Video. Proceedings of the 1st ACM international workshop on Multimedia databases; pp. 10-17.

MPEG Licensing Information (2005). http://www.mpegif.org/patents/index.php last accessed 17th November 2006.