					MULTIMEDIA NETWORKS
 Multimedia, Quality of Service: What is it?

                             Multimedia applications:
                             network audio and video
                             (“continuous media”)




QoS
the network provides the application with the level of performance it requires in order to function

What is Multimedia?

Multimedia can have several definitions:
Multimedia means that computer information can be represented through audio, video, and animation in addition to the traditional media (i.e. text, graphics, images).
A good general definition is:

Multimedia is the field concerned with the computer-controlled integration of text, graphics, still and moving images (video), animation, audio, and any other media where every type of information can be represented, stored, transmitted and processed digitally.

A multimedia application is an application which uses a collection of multiple media sources: text, graphics, images, sound/audio, animation and/or video.
Hypermedia can be considered as one of the multimedia applications.
What are Hypertext and HyperMedia?

Definition of Hypertext

Hypertext is text which contains links to other texts. The term was coined by Ted Nelson around 1965. Hypertext is therefore usually non-linear.

Definition of HyperMedia

HyperMedia is not constrained to be text-based. It can include other media, e.g. graphics, images, and especially the continuous media: sound and video. Apparently, Ted Nelson was also the first to use this term.

The World Wide Web (WWW) is the best example
of a hypermedia application.
Examples of hypermedia applications:
 World Wide Web (WWW)
 PowerPoint
 Adobe Acrobat
 others?

Multimedia System

 A multimedia system is a system capable of
processing multimedia data and applications

 A multimedia system is characterised by the
processing, storage, generation, manipulation and
rendition of multimedia information
Characteristics of a Multimedia System
 A multimedia system has four characteristics:
o Multimedia systems must be computer controlled
o Multimedia systems are integrated
o The information they handle must be represented
  digitally
o The interface to the final presentation of media is
  usually interactive.

Challenges for Multimedia Systems
 Distributed networks
 Temporal relationships between data

o Render different data at the same time, continuously
o Sequencing within the media
   playing the frames in the correct order/time in video
o Synchronisation: inter-media scheduling

(e.g. lip synchronisation is clearly important for humans
watching playback of video, animation and audio)

Ever watched a movie with the lip sync out for a long stretch?


Key issues multimedia systems need to deal with:

o How to represent and store temporal information.
o How to strictly maintain the temporal relationships on
playback/retrieval.
o What processes are involved in the above.
o Data must be represented digitally: analog-to-digital
conversion, sampling, etc.
o A large volume of data requires bandwidth, storage,
compression
 Desirable features for a multimedia system
 Given the challenges above, the following features are desirable (if not
    necessary) for a multimedia system:
    Very high processing power
        -- needed to deal with large volumes of data and real-time delivery of media. Special hardware is commonplace.
    Multimedia-capable file system
        -- needed to deliver real-time media, e.g. video/audio streaming. Special hardware/software is needed,
         e.g. RAID technology.
    Data representations/file formats that support multimedia
        -- data representations/file formats should be easy to handle yet allow for real-time compression/decompression.
    Efficient and high I/O
        -- input and output to the file subsystem needs to be efficient and fast. It must allow for real-time recording as
         well as playback of data, e.g. direct-to-disk recording systems.
    Special Operating System (OS)
        -- to allow access to the file system and to process data efficiently and quickly. Needs to support direct
         transfers to disk, real-time scheduling, fast interrupt processing, streaming I/O, etc.
    Storage and memory
        -- large storage units (of the order of 50-100 GB or more) and large memory (50-100 MB or more). Large
         caches are also required for efficient management.
    Network support
        -- client-server systems are common, as are distributed systems.
    Software
        -- user-friendly tools are needed to handle media, design and develop applications, and deliver media.

 Components of a Multimedia System
 We now consider the components (hardware and software) required
  for a multimedia system:
 Capture devices
       -- video camera, video recorder, audio microphone, keyboards, mice,
        graphics tablets, 3D input devices, tactile sensors, VR devices.
        Digitising/sampling hardware
 Storage devices
    -- hard disks, CD-ROMs, Jaz/Zip drives, DVD, etc.
 Communication networks
    -- Ethernet, Token Ring, FDDI, ATM, intranets, the Internet.
 Computer systems
    -- multimedia desktop machines, workstations, MPEG/video/DSP hardware
 Display devices
    -- CD-quality speakers, HDTV, SVGA, hi-res monitors, colour printers, etc.
Goals

Principles
 Classify multimedia applications
 Identify the network services the applications need
 Making the best of best-effort service
 Mechanisms for providing QoS
Protocols and Architectures
 Specific protocols for best effort
 Architectures for QoS
    History of Multimedia Systems
 Newspapers were perhaps the first mass medium to use multimedia
    -- they used mostly text, graphics, and images.
    In 1895, Guglielmo Marconi sent his first wireless radio
     transmission at Pontecchio, Italy. A few years later (in 1901)
     he detected radio waves beamed across the Atlantic Ocean.
     Initially invented for the telegraph, radio is now a major
     medium for audio broadcasting.
    Television was the new medium for the 20th century. It
     brought video and has since changed the world of communications.
    Some of the important events relating to multimedia in
     computing include:
    1945 - Bush wrote about Memex
    1967 - Negroponte formed the Architecture Machine Group at MIT
    1969 - Nelson & Van Dam hypertext editor at Brown
    Birth of the Internet
    1971 - Email
 1976 - Architecture Machine Group proposal to DARPA:
     Multiple Media
    1980 - Lippman & Mohl: Aspen Movie Map
    1983 - Backer: electronic book
    1985 - Negroponte, Wiesner: Media Lab opened
    1989 - Tim Berners-Lee proposed the World Wide Web to
     CERN (European Council for Nuclear Research)
    1990 - K. Hooper Woolsey, Apple Multimedia Lab, 100
     people, education
    1991 - Apple Multimedia Lab: Visual Almanac, Classroom MM
     Kiosk
   1992 - the first M-bone audio multicast on the Net
   1993 - U. Illinois National Center for Supercomputing
    Applications: NCSA Mosaic
   1994 - Jim Clark and Marc Andreessen: Netscape
   1995 - JAVA for platform-independent application
    development. Duke is the first applet.
   1996 - Microsoft, Internet Explorer.
   PART I
 MULTIMEDIA NETWORKING
APPLICATIONS
Outline

 Introduction
 Multimedia: Technology and Data
 Streaming stored audio and video
       RTSP
  Applications: MM Networking Applications

Classes of MM applications:
1) Streaming stored audio and video
2) Streaming live audio and video
3) Real-time interactive audio and video

Fundamental characteristics:
 Typically delay sensitive
     end-to-end delay
     delay jitter
 But loss tolerant

   Jitter is the variation of packet inter-arrival times
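As a small illustration of this definition, jitter can be measured from packet arrival timestamps; a minimal sketch, with hypothetical timestamps for packets nominally sent every 20 ms:

```python
# Sketch: measuring delay jitter from packet arrival times.
# Arrival timestamps (in seconds) are hypothetical.
arrivals = [0.00, 0.021, 0.039, 0.062, 0.080]  # packets sent every 20 ms

# Inter-arrival gaps between consecutive packets
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]

# Jitter: deviation of each gap from the nominal 20 ms spacing
nominal = 0.020
jitter = [abs(g - nominal) for g in gaps]

print([round(g * 1000, 1) for g in gaps])   # gaps in ms: 21, 18, 23, 18
print(round(max(jitter) * 1000, 1))         # worst deviation: 3.0 ms
```

A constant-bit-rate source emits perfectly spaced packets; the spread of the gaps around the nominal spacing is exactly the jitter a playout buffer must absorb.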
Applications: Streaming Stored Multimedia

Streaming:
 media stored at source
 transmitted to client
 streaming: client playout begins
  before all of the data has arrived
     timing constraint for still-to-be-transmitted
      data: in time for playout
Applications: Streaming Stored Multimedia: What is it?

1. video recorded
2. video sent (network delay)
3. video received, played out at client

streaming: at this time, the client is playing out the early part of
the video while the server is still sending the later part.
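The playout timing constraint can be checked numerically; a minimal sketch, assuming hypothetical chunk arrival times and a fixed initial playout delay:

```python
# Sketch of the streaming timing constraint: with an initial playout
# delay, each chunk must arrive before the client needs to play it.
# Arrival times (seconds) are hypothetical.
arrival = [0.0, 1.3, 1.9, 3.4, 4.1]   # chunk i arrives at arrival[i]
playout_delay = 2.0                   # client buffers for 2 s before starting
chunk_interval = 1.0                  # one chunk played per second

# Chunk i is scheduled to play at playout_delay + i * chunk_interval
on_time = [arrival[i] <= playout_delay + i * chunk_interval
           for i in range(len(arrival))]
print(on_time)   # all True: every chunk arrives in time for playout
```

With these arrivals the 2-second initial delay is enough: every chunk beats its scheduled playout instant, which is why a modest startup delay makes streaming robust to network delay variation.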
 Applications: Streaming Stored Multimedia: Interactivity




    VCR-like functionality: client can
     pause, rewind, FF, push slider bar
       10 sec initial delay OK
       1-2 sec until command effect OK
       RTSP often used (more later)

 timing constraint for still-to-be
  transmitted data: in time for playout
   Applications: Streaming Live Multimedia

Examples:
 Internet radio talk show
 Live sporting event
Streaming
 playback buffer
 playback can lag tens of seconds behind
  the transmission
 still has timing constraints
Interactivity
 fast forward impossible
 rewind, pause possible!
 Applications: Interactive, Real-Time
 Multimedia


 applications: IP telephony, video
   conference, distributed interactive
   worlds

 end-end delay requirements:
      audio: < 150 msec good, < 400 msec OK
        • includes application-level (packetization) and network delays
        • higher delays noticeable, impair interactivity
 session initialization
       how does a callee advertise its IP address, port number
        and encoding algorithms?
Applications: Multimedia Over Today's Internet

TCP/UDP/IP: “best-effort service”
   no guarantees on delay or loss

              But you said multimedia apps require
              QoS and a level of performance to be
              effective!

              Today's multimedia applications on the Internet
              use application-level techniques to mitigate
              (as best as possible) the effects of delay and loss
   Applications: How should the Internet
   evolve to better support multimedia?
Integrated services philosophy:
 Fundamental changes in the Internet so that apps can
   reserve end-to-end bandwidth
 Requires new, complex software in hosts & routers
Differentiated services philosophy:
 Fewer changes to the Internet infrastructure, yet
   provide 1st and 2nd class service
Laissez-faire
 No major changes
 More bandwidth when needed
 Content distribution, application-layer multicast
      Applications Multimédia: Technology and Data




 Introduction
 Multimedia Technology
 Multimedia Data
      Technology and Data : Introduction

Applications
Examples of Multimedia Applications include:
World Wide Web
Hypermedia courseware
Video conferencing
Video-on-demand
Interactive TV
Groupware
Home shopping
Games
Virtual reality
Digital video editing and production systems
Multimedia Database systems
                Technology and Data : Introduction

Trends in Multimedia
The current big application areas in multimedia include:
      -- World Wide Web
      -- Hypermedia systems
      -- embracing nearly all multimedia technologies and application
          areas
      -- ever-increasing popularity
MBone
-- Multicast Backbone: the equivalent of conventional TV and radio on the Internet
Enabling Technologies
    -- developing at a rapid rate to support the ever-increasing
    need for multimedia.
    -- carrier, switching, protocol, application, coding/compression, database,
    processing, and system integration technologies
Further reading/exploration
Try some good sources to locate Internet multimedia examples, e.g.:
WebMuseum.paris
Audio.Net
BBC.Web.Site
                     Multimedia Technology

Discrete vs continuous media

Multimedia systems deal with the generation, manipulation, storage,
presentation, and communication of information in digital form.
The data may be in a variety of formats: text, graphics, images, audio, video.
Most of this data is large, and the different media may need synchronisation
-- the data may have temporal relationships as an integral property.
Some media are time independent (static or discrete):
 normal data, text, single images, graphics.
Video, animation and audio are examples of continuous media.
Analog and Digital Signals
• Analog-to-digital conversion is needed before a computer can process the media
• Playback converts the digital form back to analog
 Note that text, graphics and some images are generated directly by computer and
do not require digitising: they are produced directly in binary format. Handwritten
text must be digitised, either by electronic pen sensing or by scanning the written form on paper.
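A minimal sketch of the digitising step (sampling plus quantisation); the 440 Hz tone, the 8 kHz rate and the 8-bit depth are illustrative choices, not values from the course:

```python
import math

# Sketch of analog-to-digital conversion: sample a 440 Hz tone at
# 8 kHz and quantise each sample to 8 bits (integer values 0..255).
sample_rate = 8000    # samples per second (illustrative)
freq = 440.0          # "analog" tone frequency in Hz (illustrative)
n_samples = 8

samples = []
for n in range(n_samples):
    t = n / sample_rate                            # sampling instant
    amplitude = math.sin(2 * math.pi * freq * t)   # analog value in [-1, 1]
    quantised = round((amplitude + 1) / 2 * 255)   # map to 0..255 (8 bits)
    samples.append(quantised)

print(samples)   # eight 8-bit sample values
```

Playback reverses the mapping: each stored integer is converted back to a voltage level, reconstructing an approximation of the original analog signal.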


Input Devices and Storage

o Text and Static Data
o Graphics
o Images
o Audio
o Video
            Applications Multimédia: Technology

Text and Static Data
o Source: keyboard, floppies, disks and tapes.
o Stored and input character by character:
 - Storage of text is 1 byte per character (text or format character)
 - For other forms of data, e.g. spreadsheet files, some formats may
   store the format as text (with formatting), others may use binary
   encoding
o Files may contain raw text or formatted text
   e.g. HTML, Rich Text Format (RTF) or program language source (C, Pascal, etc.)
o Not temporal: BUT may have a natural implied sequence, e.g. HTML format
   sequence, sequence of C program statements.
o Storage requirements are relatively low

Graphics
o Format: constructed by the composition of primitive objects
such as lines, polygons, circles, curves and arcs.
o Input: graphics are usually generated by a graphics editor
 program (e.g. Freehand) or automatically by a program
(e.g. PostScript).
o Graphics are usually editable or revisable (unlike images).
o Graphics input devices include: keyboard (for text and cursor
control), mouse, trackball or graphics tablet.
o Graphics files may adhere to a graphics standard
 (OpenGL, PHIGS, GKS). Text may need to be stored as well.
o Graphics files usually store the primitive assembly
o Relatively low overhead.


Images
o Images are represented (uncompressed) as a bitmap (a grid of pixels)
o Input: generated by programs similar to graphics or animation programs
o Input: scanned for photographs or pictures using a digital scanner
  or captured from a digital camera.
o Analog sources require digitising
o Stored at 1 bit per pixel (black and white), 8 bits per pixel (grey scale, colour
  map) or 24 bits per pixel (true colour)
o Size: a 512x512 grey scale image takes up 1/4 MB; a 512x512 24-bit image takes
3/4 MB with no compression.
o The storage overhead grows with image size
o Compression is usually applied
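The size figures quoted above can be verified with a few lines of arithmetic:

```python
# Sketch: uncompressed bitmap sizes for the figures quoted above.
def image_bytes(width, height, bits_per_pixel):
    """Raw (uncompressed) size of a bitmap in bytes."""
    return width * height * bits_per_pixel // 8

grey = image_bytes(512, 512, 8)          # grey scale: 8 bits per pixel
true_colour = image_bytes(512, 512, 24)  # true colour: 24 bits per pixel

print(grey / 2**20)         # 0.25 MB, i.e. the quoted 1/4 MB
print(true_colour / 2**20)  # 0.75 MB, i.e. the quoted 3/4 MB
```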

Audio
o Audio signals are continuous analog signals
o Input: captured by microphones, then digitised and stored
o Usually compressed, as CD-quality audio requires 16-bit
   sampling at 44.1 kHz
o 1 minute of mono CD-quality audio requires 60*44100*2 bytes, which is
 approximately 5 MB.

Video
o Input: analog video is usually captured by a video camera and then digitised
o There are a variety of video (analog and digital) formats
o Raw video can be regarded as a series of single images. There are typically
 25, 30 or 50 frames per second.
o 512x512 monochrome video takes 25*0.25 = 6.25 MB per second to store
   uncompressed
o Digital video clearly needs to be compressed.
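The audio and video figures above can likewise be checked:

```python
# Sketch: verify the storage figures quoted for CD audio and raw video.
MB = 2**20

# 1 minute of mono CD-quality audio: 2-byte (16-bit) samples at 44.1 kHz
audio_bytes = 60 * 44100 * 2
print(round(audio_bytes / MB, 2))    # ~5 MB, as quoted

# Monochrome 512x512 video at 25 frames/s, 1 byte per pixel
frame_bytes = 512 * 512              # 0.25 MB per frame
video_bytes_per_sec = 25 * frame_bytes
print(video_bytes_per_sec / MB)      # 25 * 0.25 = 6.25 MB per second
```

At 6.25 MB/s, one uncompressed minute is already 375 MB, which is why digital video is always compressed in practice.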

Output Devices
o The output devices for a basic multimedia system include:
o A high-resolution colour monitor
o CD-quality audio output
o Colour printer
o Video output to save multimedia presentations to (analog) video tape, CD-ROM, DVD
o Audio recorder (DAT, DVD, CD-ROM, (analog) cassette)
o Storage medium (hard disk, removable drives, CD-ROM)



Storage Media
The major issues that affect storage media are:
o Large volume of data
o Real-time delivery
o Data format
o Storage medium
o Retrieval mechanisms
High performance I/O
Four factors influence I/O performance:
- Data:
 + high volume, continuous, contiguous storage
 + direct relationship between the size of the data and how long it takes to handle
 + compression and also distributed storage


High performance I/O

- Data Storage
    + Depends on the storage hardware
    + The nature of the data
    + The following storage parameters affect how data is stored:
       Storage capacity
       Read and write operations of the hardware
       Unit of transfer of read and write
       Physical organisation of the storage units
       Read/write heads, cylinders per disk, tracks per cylinder, sectors per track
       Read time
       Seek time

High performance I/O
- Data Transfer
   + Depends on how the data is generated
   + How it is written to disk
   + In what sequence it needs to be retrieved
   + Writing/generation of multimedia data is usually sequential, e.g. streaming
     digital audio/video direct to disk
    + Individual data (e.g. an audio/video file) is usually streamed
    + RAID architectures can be employed to accomplish high I/O rates by exploiting
     parallel disk access
- Operating System Support
    + Scheduling of processes when I/O is initiated
    + Time-critical operations can adopt special procedures
    + Direct disk transfer operations free up CPU/operating system resources
Basic Storage
Basic storage units have problems handling large volumes of
multimedia data:
- Single Hard Drives -- SCSI/IDE Drives
- AV (Audio-Visual) drives
 + avoid thermal recalibration between read/writes
 + suitable for desktop multimedia
- New drives are fast enough for direct to disk audio and video capture
- Not adequate for commercial/professional Multimedia
- Removable Media
 + Jaz/Zip Drives
 + CD-ROM, DVD
 + Floppies not adequate

RAID -- Redundant Array of Inexpensive Disks

Needed:
o To meet the demands of current multimedia
 and other data-hungry applications
o Fault tolerance built into the storage device
o Parallel processing exploiting the arrangement of the hard disks

Raid technology offers some significant advantages as a storage medium:
•Affordable (accessible) alternative to mass storage (mémoire de masse)
•High throughput and reliability (fiabilité)


The main components of a RAID system are:
•Set of disk drives, disk arrays, viewed by user as one or more logical
 drives.
•Data may be distributed across drives
•Redundancy added in order to allow for disk failure
•Disk arrays can be used to:
  - store large amounts of data
  - have high I/O
  - less power per megabyte (when compared to high end disks)
  - but they have very poor reliability
  - N devices generally have 1/N the reliability of a single device
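The 1/N remark refers to mean time between failures: a non-redundant array fails as soon as any one drive fails. A minimal sketch, assuming independent failures and a hypothetical single-drive MTBF:

```python
# Sketch: a non-redundant array of N drives fails when ANY drive fails.
# With independent failures, the array's mean time between failures
# (MTBF) is roughly the single-drive MTBF divided by N.
def array_mtbf(single_mtbf_hours, n_drives):
    """Approximate MTBF of an N-drive array with no redundancy."""
    return single_mtbf_hours / n_drives

print(array_mtbf(500_000, 1))    # one drive: 500,000 hours (hypothetical)
print(array_mtbf(500_000, 10))   # ten drives: only 50,000 hours
```

This is exactly why the redundancy techniques below (mirroring, Hamming codes, parity) are needed to make large arrays usable.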


Storage across multiple disks (arrays) (same file)
Four main techniques to overcome the poor reliability of arrays:
o Mirroring or shadowing of the disk contents, which affects capacity
 - write to two disks
 - reads from the disks may however be optimised
o Horizontal Hamming Codes: a special means of reconstructing information
  using an error-correcting coding technique
o Parity and Reed-Solomon Codes: also an error-correcting coding
  mechanism. The parity can be computed in a number of ways
o Failure Prediction: there is no capacity overhead in this technique
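A minimal sketch of how XOR parity recovers a lost disk (the block values are hypothetical):

```python
# Sketch of XOR parity (the mechanism behind parity-based RAID):
# the parity drive stores the XOR of the data drives, so any single
# lost drive can be rebuilt from the survivors.
data_drives = [0b1011, 0b0110, 0b1100]   # hypothetical data blocks

parity = 0
for block in data_drives:
    parity ^= block                      # parity = XOR of all blocks

# Suppose drive 1 fails: XOR the parity with the surviving blocks
rebuilt = parity ^ data_drives[0] ^ data_drives[2]
print(rebuilt == data_drives[1])         # True: the lost data is recovered
```

Because XOR is its own inverse, XOR-ing the parity with every surviving block cancels them out and leaves exactly the missing block.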
RAID Architecture
Each disk in the array needs to have its own I/O controller,
 but interaction with a host computer can be mediated by an
 array controller

Orthogonal RAID
It is possible to combine the
   disks together to
   produce a collection of
   devices, where
o each vertical array is
   now the unit of data
   redundancy
o such an arrangement
   is called orthogonal
   RAID
o other arrangements
   of the disks are
   also possible


RAID Levels
There are now 8 levels of RAID technology, with each level
providing a greater amount of resilience than the lower levels:
Level 0: Disk Striping
     -- distributing data across multiple drives. Level 0 is an independent array;
     the complete set of data could equally be held on a single drive, though
     with a lower access rate.
Level 1: Disk Mirroring
     -- Level 1 focuses on fault tolerance and involves a second, duplicate write
       to a mirror disk each time a write request is made
Level 2: Bit Interleaving and HEC Parity
         -- Level 2 stripes data to a group of disks using a bit stripe. A Hamming
            code symbol for each data stripe is stored on the check disk.
Level 3: Bit Interleaving with XOR Parity
         -- Level 3 is a striped parallel array where data is distributed by bit or byte.
           One drive in the array provides data protection by storing a parity
      check byte for each data stripe.
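The Level 0 striping layout can be sketched as a simple round-robin mapping of logical blocks to drives (the drive count is illustrative):

```python
# Sketch of RAID Level 0 disk striping: logical blocks are written
# round-robin across the drives in the array.
n_drives = 4   # illustrative array size

def locate(block_number, n_drives):
    """Map a logical block to (drive index, block offset on that drive)."""
    return block_number % n_drives, block_number // n_drives

# The first eight logical blocks land on drives 0,1,2,3,0,1,2,3
placement = [locate(b, n_drives) for b in range(8)]
print(placement)
```

Consecutive blocks sit on different drives, so a large sequential read can be served by all four drives in parallel, which is where striping's throughput gain comes from.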
 Level 4: Block Interleaving with XOR Parity
   -- In Level 4 parity is interleaved at the sector or transfer level. As
      with Level 3, a single drive is used to store redundant data using a
      parity check byte for each data stripe
 Level 5: Block Interleaving with Parity Distribution
   -- Level 5 combines the throughput of block interleaved data striping of
      Level 0 with the parity reconstruction mechanism of Level 3 without
      requiring an extra parity drive
 Level 6: Fault Tolerant System
   -- additional error recovery. This is an improvement of Level 5. Disks
      are considered to be in a matrix formation and parity is generated
      for each row and column of the matrix
 Level 7: Heterogeneous System
   -- Fast access across whole system. Level 7 allows each individual drive
      to access data as fast as possible by incorporating a few crucial
      features:
            – Each I/O drive is connected to a high-speed data bus which possesses a
              central cache store capable of supporting multiple host I/O paths.
            – A real time process-oriented OS is embedded into the disk array
              architecture -- frees up drives, allowing independent drive head
              operation. Substantial improvement.
            – All parity checking and check/microprocessor/bus/cache control logic
              embedded in this OS.
            – OS designed to support multiple host interfaces -- other RAID levels
              support only one.
            – The ability to reconstruct data in the event of drive failure is
              increased thanks to separate cache/device control and secondary,
              tertiary and beyond parity calculation -- up to four simultaneous disk
              failures supported.
            – Dynamic Mapping is used. In conventional storage a block of data, once
              created, is written to a fixed location; all operations then rewrite
              data back to this location. With Dynamic Mapping this constraint is
              lifted: new write locations are logged and mapped. This frees up
              additional disk accesses and removes a potential bottleneck.
Optical Storage
 Optical storage has been the most popular storage medium in the
  multimedia context due to its compact size, high-density recording,
  easy handling and low cost per MB.
 CD is the most common and we discuss this below. Laser disc and
  recently DVD are also popular.
CD Storage
There are now various formats of CD:
 CD-DA (Compact Disc-Digital Audio)
 CD-I (Compact Disc-Interactive)
 CD-ROM/XA (eXtended Architecture)
 Photo CD - digital pictures on compact disk
 The capacity of a CD-ROM is 620-700 MB, depending on the CD material and the
   drives that read and write the CD-ROMs. 650 MB (74 mins) is a typical write-once CD-ROM size.
CD Standards
There are several CD standards for different types of media:
 Red Book
        -- Digital Audio: Most Music CDs.
   Yellow Book
        -- CD-ROM: Model 1 - computer data, Model 2 - compress audio/video data.
   Green Book
        -- CD-I
   Orange Book
        -- write once CDs
   Blue Book
        -- LaserDisc

DVD (Digital Video Disc, Digital Versatile Disc)
 DVD has become a major new medium for a whole host
  of multimedia systems:
 DVD-Video and DVD-ROM
 DVD-Audio format
The main features of DVD include:
 Over 2 hours of high-quality digital video (over 8 on a double-sided, dual-
  layer disc).
 Support for widescreen movies on standard or widescreen TVs (4:3 and
  16:9 aspect ratios).
 Up to 8 tracks of digital audio (for multiple languages), each with as many
  as 8 channels.
 Up to 32 subtitle/karaoke tracks.
 Automatic seamless branching of video (for multiple story lines or
  ratings on one disc).
 Up to 9 camera angles (different viewpoints can be selected during
  playback).
 Menus and simple interactive features (for games, quizzes, etc.).
 Multilingual identifying text for title name, album name, song name, cast,
  crew, etc.
 Instant rewind and fast forward, including search to title, chapter,
  track, and timecode.
 Durability (no wear from playing, only from physical damage).
 Not susceptible to magnetic fields. Resistant to heat.
 Compact size (easy to handle and store, players can be portable,
  replication is cheaper).
Quality of DVD-Video
 DVD has the capability to produce near-studio-quality video and
  better-than-CD-quality audio.
 DVD is vastly superior to videotape and generally better than
  laserdisc.
 However, quality depends on many production factors. Until
  compression experience and technology improves we will
  occasionally see DVDs that are inferior to laserdiscs.
 DVD video is compressed from digital studio master tapes to
  MPEG-2 format.
 DVD audio quality is excellent. One of DVD's audio formats is
  LPCM (linear pulse code modulation) with sampling sizes and rates
  higher than audio CD.
 The final assessment of DVD quality is in the hands of consumers.
  Most initial reports consistently rate it better than laserdisc.
What are the disadvantages of DVD?
 It will take years for movies and software to become widely
  available.
 It can't record (yet).
 It has built-in copy protection and regional lockout.
 It uses digital compression. Poorly compressed audio or video may
  be blocky, fuzzy, harsh, or vague.
 The audio downmix process for stereo/Dolby Surround can
  reduce dynamic range.
 It doesn't fully support HDTV.
 Some DVD players and drives may not be able to read CD-Rs.
 First-generation DVD players and drives can't read DVD-RAM
  discs.
 Current players can't play in reverse at normal speed.
Compatibility of DVD
 DVD is compatible with most other optical media storage (but
  there is a distinction between DVD and DVD-ROM, below):
 CD audio (CD-DA) -- All DVD players and drives will read audio
  CDs (Red Book). This is not actually required by the DVD spec,
  but so far all manufacturers have stated that their DVD
  hardware will read CDs. On the other hand, you can't play a DVD
  in a CD player. (The pits are smaller, the tracks are closer
  together, the data layer is a different distance from the surface,
  the modulation is different, the error correction coding is new,
  etc.)
Compatibility of DVD

 CD-R may be compatible with DVD-ROM -- The problem is that
  CD-Rs (Orange Book Part II) are invisible to the DVD laser.
 CD-RW may be compatible with DVD -- CD-Rewritable (Orange Book
  Part III). The new MultiRead standard addresses this, and some
  DVD manufacturers have already suggested they will support it.
 Video CD may be compatible with DVD -- It's not required by the
  DVD spec, but it's trivial to support the White Book standard
  since any MPEG-2 decoder can also decode MPEG-1 from a Video
  CD. Panasonic, RCA, Samsung, and Sony models play Video CDs.
Sizes and capacities of DVD
 There are many variations on the DVD theme. There are two
   physical sizes: 12 cm (4.7 inches) and 8 cm (3.1 inches), both 1.2
   mm thick. These are the same form factors as CD. A DVD disc
   can be single-sided or double-sided. Each side can have one or two
   layers of data. The amount of video a disc can hold depends on
   how much audio accompanies it and how heavily the video and
   audio are compressed. The oft-quoted figure of 133 minutes is
   apocryphal: a DVD with only one audio track easily holds over 160
   minutes, and a single layer can actually hold up to 9 hours of video
   and audio if it's compressed to VHS quality.
 At a rough average rate of 4.7 Mbps (3.5 Mbps for video, 1.2
   Mbps for three 5.1-channel soundtracks), a single-layer DVD
   holds around 135 minutes. A two-hour movie with three
   soundtracks can average 5.2 Mbps. A dual-layer disc can hold a
   two-hour movie at an average of 9.5 Mbps (very close to the 10.08
   Mbps limit).
      Applications Multimédia: Technology
Capacities of DVD:
 For reference, a CD-ROM holds about 650 MB (megabytes). In the list
  below, SS/DS means single-/double-sided, SL/DL means single-/dual-layer,
DVD-5 (12cm, SS/SL): 4.38 GB (4.7 G) of data, over 2 hours of video
DVD-9(12cm, SS/DL): 7.95 GB (8.5 G), about 4 hours
DVD-10 (12cm, DS/SL): 8.75 GB (9.4 G), about 4.5 hours
DVD-18 (12cm, DS/DL): 15.90 GB (17 G), over 8 hours
DVD-1? (8cm, SS/SL): 1.36 GB (1.4 G), about half an hour
DVD-2? (8cm, SS/DL): 2.48 GB (2.7 G), about 1.3 hours
DVD-3? (8cm, DS/SL): 2.72 GB (2.9 G), about 1.4 hours
DVD-4? (8cm, DS/DL): 4.95 GB (5.3 G), about 2.5 hours
DVD-R (12cm, SS/SL): 3.68 GB (3.95 G)
DVD-R (12cm, DS/SL): 7.38 GB (7.9 G)
DVD-R (8cm, SS/SL): 1.15 GB (1.23 G)
DVD-R (8cm, DS/SL): 2.29 GB (2.46 G)
DVD-RAM (12cm, SS/SL): 2.40 GB (2.58 G)
DVD-RAM (12cm, DS/SL): 4.80 GB (5.16 G)
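As a rough cross-check of the capacities listed above, the play time of each variant can be estimated from its capacity and an assumed average bitrate (the 4.7 Mbps figure quoted earlier). A minimal sketch:

```python
# Rough play-time estimate for the disc variants listed above,
# assuming the 4.7 Mbps average rate quoted earlier (video plus
# three 5.1-channel soundtracks).

def play_minutes(capacity_gb, avg_mbps=4.7):
    """Minutes of audio/video at a given average bitrate (GB = 2^30 bytes)."""
    capacity_bits = capacity_gb * 2**30 * 8
    return capacity_bits / (avg_mbps * 1e6) / 60

print(round(play_minutes(4.38)))   # DVD-5 -> 133 (the oft-quoted figure)
print(round(play_minutes(7.95)))   # DVD-9 -> 242, i.e. about 4 hours
```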
  Applications Multimédia: Technology

The increase in capacity from
  CD-ROM is due to:
 smaller pit length (~2.08x)
 tighter tracks (~2.16x)
 slightly larger data area (~1.02x)
 more efficient channel bit modulation (~1.06x)
 more efficient error correction (~1.32x)
 less sector overhead (~1.06x)
        Applications Multimédia: Technology
DVD vs CD-ROM Pit Length
 Discs can be single- or double-sided
 Another data layer can be added to each side, creating a potential for
  four layers of data per disc
    Applications Multimédia: Technology
DVD video details
 DVD-Video is an application of DVD-ROM. DVD-Video is also an
  application of MPEG-2. This means the DVD format defines
  subsets of these standards to be applied in practice as DVD-
  Video. DVD-ROM can contain any desired digital information, but
  DVD-Video is limited to certain data types designed for television
  reproduction.
 A disc has one track (stream) of MPEG-2 constant bit rate (CBR)
  or variable bit rate (VBR) compressed digital video. A limited
  version of MPEG-2 Main Profile at Main Level is used. MPEG-1
  CBR and VBR video is also allowed.
    Applications Multimédia: Technology
DVD video details
 Picture dimensions are at most 720x480 (29.97 frames/sec) or
  720x576 (25 frames/sec), allocating an average of 12 bits/pixel.
  (Colour depth is still 24 bits, since colour samples are shared across
  4 pixels.) The uncompressed source rate is 124.416 Mbps for video
  source (720x480x12x30 or 720x576x12x25), or either 99.533 or
  119.439 Mbps for film source (720x480x12x24 or
  720x576x12x24). Using the traditional television measurement of
  lines of horizontal resolution DVD can have 540 lines on a
  standard TV (720/(4/3)) and 405 on a widescreen TV
  (720/(16/9)). In practice, most DVD players provide about 500
  lines because of filtering. VHS has about 230 (172 w/s) lines and
  laserdisc has about 425 (318 w/s).
 Different players use different numbers of bits for the video
  digital-to-analog converter.
 Maximum video bitrate is 9.8 Mbps. The average bitrate is 3.5
  but depends entirely on the length, quality, amount of audio, etc.
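The uncompressed source rates quoted above follow directly from the frame geometry (width x height x bits/pixel x frames/sec); for example:

```python
# Uncompressed video source bitrate in Mbps.

def uncompressed_mbps(width, height, bits_per_pixel, fps):
    return width * height * bits_per_pixel * fps / 1e6

print(uncompressed_mbps(720, 480, 12, 30))  # NTSC video source -> 124.416
print(uncompressed_mbps(720, 480, 12, 24))  # NTSC film source  -> 99.5328
print(uncompressed_mbps(720, 576, 12, 24))  # PAL film source   -> 119.43936
```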
       Applications Multimédia: Technology
DVD video details
 Still frames (encoded as MPEG-2 I-frames) are supported and
  can be displayed for a specific amount of time or indefinitely.
 A disc also can have up to 32 subpicture streams that overlay the
  video for subtitles,
 Video can be stored on a DVD in 4:3 format (standard TV shape)
  or 16:9 (widescreen).
 DVD players can output video in four different ways:
      full frame (4:3 video for 4:3 display)
      letterbox (16:9 video for 4:3 display)
      pan and scan (16:9 video for 4:3 display)
      widescreen (16:9 video for 16:9 display)
 Video stored in 4:3 format is not changed by the player. It will
  appear normally on a standard 4:3 display. Widescreen systems
  will either enlarge it or add black bars to the sides. 4:3 video may
  have been formatted in various ways before being transferred to
  DVD.
    Applications Multimédia: Technology
DVD audio
 The DVD-Audio format is not yet specified. The International
  Steering Committee announced it expects to have a final draft
  specification by December 1997. This means DVD-Audio products
  may show up around 1999.
 The following details are for audio tracks on DVD-Video. Some
  DVD manufacturers such as Pioneer are developing audio-only
  players using the DVD-Video format.
 A disc can have up to 8 audio tracks (streams). Each track can be
  in one of three formats:
 Linear PCM: 1 to 8 channels
 Dolby Digital (formerly AC-3): 1 to 5.1 channels
 MPEG-2 audio: 1 to 5.1 or 7.1 channels
 Two additional optional formats are supported: DTS and SDDS.
  Both require external decoders.
    Applications Multimédia: Technology
DVD audio
 All five audio formats support karaoke mode, which has two
  channels for stereo (L and R) plus an optional guide melody
  channel (M) and two optional vocal channels (V1 and V2).
 Discs containing 525/60 (NTSC) video must use PCM or Dolby
  Digital on at least one track. Discs containing 625/50
  (PAL/SECAM) video must use PCM or MPEG audio on at least one
  track. Additional tracks may be in any format. The DVD Forum
  has clarified that only stereo MPEG audio is mandatory for
  625/50 discs, while multichannel MPEG-2 audio is recommended.
  Since multichannel MPEG-2 decoders are not yet available, most
  625/50 discs include Dolby Digital audio.
 For stereo output (analog or digital), all NTSC players and all PAL
  players (so far) have a built-in Dolby Digital decoder.
    Applications Multimédia: Technology
DVD audio
 The downmix process does not include the LFE channel and may
  compress the dynamic range in order to improve dialog audibility
  and keep the sound from becoming muddy on average home audio
  systems.
 Linear PCM is uncompressed (lossless) digital audio, the same
  format used on CDs. It can be sampled at 48 or 96 kHz with 16,
  20, or 24 bits/sample. (Audio CD is limited to 44.1 kHz at 16 bits.)
 Dolby Digital is multi-channel digital audio, compressed using AC-3
  coding technology from original PCM with a sample rate of 48 kHz
  at 16 bits.
    Applications Multimédia: Technology
DVD audio
 MPEG audio is multi-channel digital audio, compressed from
  original PCM format with sample rate of 48 kHz at 16 bits. Both
  MPEG-1 and MPEG-2 formats are supported.
 DTS is an optional multi-channel (5.1) digital audio format,
  compressed from PCM at 48 kHz. The data rate is from 64 kbps
  to 1536 kbps.
 SDDS is an optional multi-channel (5.1 or 7.1) digital audio format,
  compressed from PCM at 48 kHz. The data rate can go up to 1280
  kbps.
 A DVD-5 with only one surround stereo audio stream (at 192
  kbps) can hold over 55 hours of audio. A DVD-18 can hold over
  200 hours.
    Applications Multimédia: Technology
DVD and computers
 So far we have focused on the media representation and
  standard DVD players. DVD, and DVD-ROM in particular, is
  beginning to have a huge impact on computers.
 For a computer to employ DVD it must have the following
  features:
 In addition to a DVD-ROM drive, you must have extra hardware
  to decode MPEG-2 video and Dolby Digital/MPEG-2/PCM audio.
  The computer operating system or playback system must support
  regional codes and be licensed to decrypt copy-protected movies.
  You may also need software that can read the MicroUDF format
  used to store DVD data files and interpret the DVD control
  codes. It's estimated that 10-30% of new computers with DVD-
  ROM drives will include decoder hardware, and that most of the
  remaining DVD-ROM computers will include movie playback
  software.
   Applications Multimédia: Technology
DVD and computers
 Some DVD-Videos and many DVD-ROMs will use video encoded
  using MPEG-1 instead of MPEG-2. Many existing computers have
  MPEG-1 hardware built in or are able to decode MPEG-1 with
  software.
 CompCore Multimedia and Mediamatics make software to play
  DVD-Video movies (SoftDVD, DVD Express). Both require at least
  a 233 MHz Pentium MMX with AGP and an IDE/SCSI DVD-ROM
  drive with bus mastering DMA support to achieve about 20
  frame/sec film rates (or better than 300 MHz for 30 frame/sec
  video), and can decrypt copy-protected movies. Oak's software
  requires hardware support. The software navigators support most
  DVD-Video features (menus, subpictures, etc.) and can emulate a
  DVD-Video remote control.
    Applications Multimédia: Technology
DVD and computers
 CompCore, Mediamatics, and Oak Technology have defined
  standards to allow certain MPEG decoding tasks to be performed
  by hardware on a video card and the remainder by software.
  Video graphics controllers with this feature are being called DVD
  MPEG-2 accelerated. (The Mediamatics standard is called
  MVCCA.)
 If you have at least a 433 MHz Alpha workstation you'll be able
  to play DVD movies at full 30 fps in software.
DVD-ROM Drives:
 Most DVD-ROM drives have a seek time of 150-200 ms, access
  time of 200-250 ms, and data transfer rate of 1.3 MB/s
  (11.08 x 10^6 / 8 / 2^20) with burst transfer rates of up to 12 MB/s or
  higher. The data transfer rate from DVD-ROM discs is roughly
  equivalent to a 9x CD-ROM drive. DVD spin rate is about 3 times
  faster than CD, so when reading CD-ROMs, some DVD-ROM
  drives transfer data at 3x speed while others are faster. 2x and
  3x DVD-ROM drives are already in the works. Hitachi is shipping
  samples of a 2x DVD-ROM drive which also reads CDs at 20x.
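The 1.3 MB/s figure comes straight from the 11.08 Mbps channel rate, and the 9x CD comparison from the 150 KB/s rate of a 1x CD-ROM drive:

```python
# 1x DVD data rate: 11.08 x 10^6 bits/s, expressed in binary megabytes/s,
# then compared against the 150 KB/s of a 1x CD-ROM drive.

bits_per_second = 11.08e6
mb_per_second = bits_per_second / 8 / 2**20
print(round(mb_per_second, 2))                      # -> 1.32
print(round(mb_per_second * 2**20 / (150 * 1024)))  # -> 9 (i.e. ~9x CD speed)
```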
   Applications Multimédia: Technology
DVD and computers
 Connectivity is similar to that of CD-ROM drives: EIDE (ATAPI),
  SCSI-2, etc. All DVD-ROM drives have audio connections for
  playing audio CDs. No DVD-ROM drives have been announced with
  DVD audio or video outputs (which would require internal
  audio/video decoding hardware).
 DVD-ROMs use a MicroUDF/ISO 9660 bridge file system. The
  OSTA UDF file system will eventually replace the ISO 9660
  system of CD-ROMs, but the bridge format provides backwards
  compatibility until operating systems support UDF.
Recordable DVD-ROM: DVD-R and DVD-RAM:
 There are two recordable versions of DVD-ROM: DVD-R (record
  once) and DVD-RAM (erase and record many times), with
  capacities of 3.95 and 2.58 G bytes. Both specifications have
  been published. DVD-R and DVD-RAM are not currently usable for
  home video recording.
 DVD-R uses organic dye polymer technology like CD-R and is
  compatible with almost all DVD drives. The technology will
  improve to support 4.7 G bytes in 1 to 2 years
                      Multimédia Data

 Multimedia Data Representations
o   Basics of Digital Audio
o   Synthetic Sounds
o   Introduction to MIDI (Musical Instrument Digital Interface)
o   Graphic/Image File Formats
o   Colour in Image and Video
o   Basics of Video
   Video and Audio Compression
                    Multimédia Data

Multimedia Data Representations
The topics we consider here are specifically:
 Digital Audio
 Sampling/Digitisation
 Compression (Details of Compression algorithms Next Chapter)
 Graphics/Image Formats
 Digital Video
                Applications Multimédia: Data

Application of Digital Audio
 Music Production
   
            –   Hard Disk Recording
            –   Sound Synthesis
            –   Samplers
            –   Effects Processing
 Video
      - Audio Important Element: Music and Effects
 Web
      -- Many uses on Web
            – Spice up Web Pages
            – Listen to CDs
            – Listen to Web Radio
                Applications Multimédia: Data

Digitization of Sound
Let us first analyse what a sound actually is:
 Sound is a continuous wave that travels through the air
 The wave is made up of pressure differences. Sound is detected by
   measuring the pressure level at a location.
 Sound waves have normal wave properties (reflection, refraction,
   diffraction, etc.).

A variety of sound sources:
 Source Generates Sound
               – Air Pressure changes
               – Electrical -- Loud Speaker
               – Acoustic -- Direct Pressure Variations
 Destination Receives Sound
            – Electrical -- Microphone produces electric signal
            – Ears -- Responds to pressure hear sound
 To input sound into a computer, it needs to be sampled or digitised:
 Microphones, video cameras produce analog signals (continuous-valued
   voltages)
         Applications Multimédia: Data

Digitizing Audio
That is the basic idea of digitizing a sound; unfortunately, things are
   (practically speaking) not so simple.
Questions for producing digital audio (Analog-to-Digital Conversion):
1. How often do you need to sample the signal?
2. How good is the signal?
3. How is audio data formatted?
           Applications Multimédia: Data

Computer Manipulation of Sound
Once digitised, processing the digital sound is essentially straightforward, although it
   depends on the processing you wish to do (e.g. volume is easier to code than
   accurate reverb).
Essentially they all operate on the 1-D array of digitised samples, typical examples
   include:
 Volume
 Cross-Fading
 Looping
 Echo/Reverb/Delay
 Filtering
 Signal Analysis
Soundedit Demos
 Volume
 Cross-Fading
 Looping
 Echo/Reverb/Delay
 Filtering
          Applications Multimédia: Data

Sample Rates and Bit Size
How do we store each sample value ( Quantisation)?
8 Bit Value
      (0-255)
16 Bit Value
      (Integer) (0-65535)
How many Samples to take?
11.025 KHz
      -- Speech (Telephone 8KHz)
22.05 KHz
      -- Low Grade Audio (WWW Audio, AM Radio)
44.1 KHz
      -- CD Quality
       Applications Multimédia: Data

Nyquist's Sampling Theorem
 Suppose we are sampling a
  sine wave (Fig. 6.3). How
  often do we need to sample
  it to figure out its
  frequency?




           A Sine Wave
       Applications Multimédia: Data

 If we sample at 1 time per
  cycle, we can think it's a
  constant




       Sampling at 1 time per cycle
        Applications Multimédia: Data

If we sample at 1.5 times per
   cycle, we can think it's a
   lower frequency sine wave




        Sampling at 1.5 times per cycle
        Applications Multimédia: Data

Now if we sample at twice the
   signal frequency, i.e. the
   Nyquist rate, we start to
   make some progress. An
   alternative way of viewing
   the waveform
   (re)generation is to think
   of straight lines joining up
   the peaks of the samples.
   In this case (at these
   sample points) we see we
   get a sawtooth wave that       Sampling at 2 times per cycle
   begins to crudely
   approximate a sine wave
        Applications Multimédia: Data

Nyquist rate -- For lossless
  digitization, the sampling
  rate should be at least
  twice the maximum
  frequency component in the
  signal; indeed, the more
  times over, the better.




                               Sampling at many times per cycle
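The aliasing behaviour sketched in these figures can be checked numerically: a tone above half the sample rate produces exactly the same samples as its alias below it. A small sketch (the frequencies here are chosen arbitrarily for illustration):

```python
import math

def sample_sine(freq_hz, rate_hz, n=8):
    """First n samples of a sine of the given frequency at the given rate."""
    return [round(math.sin(2 * math.pi * freq_hz * i / rate_hz), 6)
            for i in range(n)]

# Sampling at 8 kHz, the Nyquist limit is 4 kHz: a 9 kHz tone is
# indistinguishable from the 1 kHz tone it aliases down to.
print(sample_sine(1000, 8000) == sample_sine(9000, 8000))  # -> True
```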
                      Applications Multimédia: Data
Implications of Sample Rate and Bit Size
Affects Quality of Audio
   Ears do not respond to sound in a linear fashion
   Decibel (dB) a logarithmic measurement of sound
    16-Bit has a signal-to-noise ratio of 98 dB -- quantisation noise virtually inaudible
    8-bit has a signal-to-noise ratio of 50 dB
    Therefore, 8-bit is noticeably noisier: 48 dB worse, i.e. 8 steps of 6 dB
          a 6 dB increment is twice as loud
Signal to Noise Ratio (SNR)
   In any analog system, some of the voltage is what you want to measure ( signal), and some of it is random
    fluctuations (noise).
   Ratio of the power of the two is called the signal to noise ratio (SNR). SNR is a measure of the quality of
    the signal.
   SNR is usually measured in decibels (dB).

     SNR = 20 log10(V_signal / V_noise) dB

    Typically 8 bits or 16 bits.
    Each bit adds about 6 dB of resolution, so 16 bits => 96 dB.
    Samples are typically stored as raw numbers (linear format), or as logarithms
     (u-law (or A-law in Europe)).
   o Logarithmic representation approximates perceptual uniformity.
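The 98 dB / 50 dB figures quoted above follow from the standard rule of thumb for an ideal n-bit quantiser, SNR ≈ 6.02n + 1.76 dB:

```python
def quantisation_snr_db(bits):
    """Theoretical peak SNR of an ideal n-bit quantiser (rule of thumb)."""
    return 6.02 * bits + 1.76

print(round(quantisation_snr_db(16)))  # -> 98 dB (16-bit audio)
print(round(quantisation_snr_db(8)))   # -> 50 dB (8-bit audio)
```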
            Applications Multimédia: Data

  Affects Size of Data




There is therefore a trade-off between Audio Quality vs. Data Rate
Some typical applications of sample bit size and sample rate are listed below:

Quality    Sample Rate    Bits per      Mono/       Data Rate         Frequency
               (KHz)      Sample       Stereo       (Uncompressed)      Band
--------- -----------     -------- -------- -------- ---------        ------------
Telephone         8        8           Mono            8 KBytes/sec 200-3,400 Hz
AM Radio         11.025    8           Mono          11.0 KBytes/sec
FM Radio         22.050    16          Stereo        88.2 KBytes/sec
CD               44.1      16          Stereo         176.4 KBytes/sec 20-20,000 Hz
DAT             48          16         Stereo         192.0 KBytes/sec 20-20,000 Hz
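Each row's data rate in the table above is just sample rate x bytes per sample x number of channels; for instance:

```python
def data_rate_kbytes(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio data rate in KBytes/sec (decimal K, as in the table)."""
    return sample_rate_hz * (bits_per_sample / 8) * channels / 1000

print(data_rate_kbytes(44100, 16, 2))  # CD        -> 176.4
print(data_rate_kbytes(8000, 8, 1))    # Telephone -> 8.0
print(round(data_rate_kbytes(44100, 16, 2) * 60 / 1000, 1))  # -> 10.6 MB/min
```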
               Applications Multimédia: Data
AUDIO DEMO: Comparison of Sample Rate and Bit Size
Click on the file links below to audibly hear the difference in the sampling
rates/bit sizes indicated for each file type:
File Type                 File Size (all mono)
44 KHz 16 bit                      3.5 Mb
44 KHz 08 bit                      1.3 Mb
22KHz 16 bit                       740 Kb
22KHz 08 bit                       424 Kb
11KHz 08 bit                       120 Kb

Telephone uses u-law encoding, others use linear. So the dynamic range of digital
  telephone signals is effectively 13 bits rather than 8 bits.
CD quality stereo sound -> 10.6 MB / min.
            Applications Multimédia: Data

Typical Audio Formats
 Popular audio file formats include .au (Unix workstations), .aiff
  (MAC, SGI), .wav (PC, DEC workstations)
 A simple and widely used audio compression method is Adaptive
  Delta Pulse Code Modulation (ADPCM). Based on past samples, it
  predicts the next sample and encodes the difference between
  the actual value and the predicted value
 Compression formats:
  Soundblaster: .voc
  Protools/Sound: .sd2
  Realaudio: .ra
  Ogg Vorbis: .ogg
 MPEG AUDIO: More Later (MP3 and MPEG-4 audio)
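ADPCM itself involves adaptive prediction, but the core idea of coding differences rather than raw samples can be shown with a toy (non-adaptive) delta coder:

```python
# Toy delta coding, a much simplified cousin of ADPCM: store the
# difference between each sample and its predecessor instead of the
# sample itself. Real ADPCM adds prediction and adaptive step sizes.

def delta_encode(samples):
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)   # small numbers if the signal changes slowly
        prev = s
    return out

def delta_decode(deltas):
    prev, out = 0, []
    for d in deltas:
        prev += d              # rebuild each sample from the running sum
        out.append(prev)
    return out

pcm = [0, 3, 7, 8, 6, 2]
print(delta_encode(pcm))                    # -> [0, 3, 4, 1, -2, -4]
assert delta_decode(delta_encode(pcm)) == pcm
```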
             Applications Multimédia: Data

Delivering Audio over a Network
 Trade off between desired fidelity and file size
 Bandwidth Considerations for Web and other media.
 Compress Files:
     Could affect live transmission on Web
Streaming Audio
 Buffered Data:
      Trick: get data to the destination before it's needed
      Temporarily store it in memory (a buffer)
      Server keeps feeding the buffer
      Client application reads from the buffer
 Needs a reliable, moderately fast connection.
 Specialised client, Streaming Audio Protocol (PNM for RealAudio).
            Applications Multimédia: Data

Synthetic Sounds
 FM (Frequency Modulation) Synthesis - used in low-end Sound
   Blaster cards and the OPL-4 chip; the Yamaha DX synthesiser range was
   popular in the early 1980's.
 Wavetable synthesis - the wavetable is generated from sound waves of
   real instruments
 Modern synthesisers use a mixture of sampling and synthesis
Introduction to MIDI (Musical Instrument Digital Interface)
Definition of MIDI: a protocol that enables computers, synthesizers,
   keyboards, and other musical devices to communicate with each
   other.
             Applications Multimédia: Data

Digital Audio and MIDI
 There are many applications of Digital Audio and MIDI being used
   together:
 Modern Recording Studio -- Hard Disk Recording and MIDI
      Analog Sounds (Live Vocals, Guitar, Sax etc) -- DISK
      Keyboards, Drums, Samples, Loops Effects -- MIDI
 Sound Generators: use a mix of
      Synthesis
      Samples
 Samplers -- Digitise (Sample) Sound then
      Playback
      Loop (beats)
      Simulate Musical Instruments
            Applications Multimédia: Data

Digital Audio, Synthesis, Midi and Compression -- MPEG 4
   Structured Audio
 We have seen the need for compression already in Digital Audio --
   Large Data Files
 Basic Ideas of compression (see next Chapter) used as integral
   part of audio format -- MP3, real audio etc.
 MPEG-4 audio -- actually combines compression, synthesis and MIDI
   to have a massive impact on compression.
 MIDI and synthesis encode what note to play and how to play it with a
   small number of parameters -- a much greater reduction than
   simply having some encoded bits of audio.
 Responsibility to create audio delegated to generation side.
             Applications Multimédia: Data
MPEG 4 Structured Audio
MPEG-4 covers the whole range of digital audio:
from very low bit rate speech to full bandwidth high quality audio
built in anti-piracy measures
 Structured Audio
 Structured Audio Tools
MPEG-4 comprises 6 Structured Audio tools:
 SAOL, the Structured Audio Orchestra Language
 SASL, the Structured Audio Score Language
 SASBF, the Structured Audio Sample Bank Format
 a set of MIDI semantics, which describes how to control SAOL with MIDI
 a scheduler, which describes how to take the above parts and create sound
 the AudioBIFS part of BIFS, which lets you make audio soundtracks in MPEG-4 using
   a variety of tools and effects-processing techniques
 Very briefly, each of the above tools has a specific function
              Applications Multimédia: Data
   Graphic/Image File Formats
This section introduces some of the most common graphics and image file formats.
 Some of them are restricted to particular hardware/operating system platforms,
others are cross-platform independent formats. While not all formats are
 cross-platform, there are conversion applications that will recognize and translate
formats from other systems.
The document (http://www.cica.indiana.edu/graphics/image.formats.html) by
CICA at Indiana Univ. provides a fairly comprehensive listing of various formats.
Most image formats incorporate some variation of a compression technique due to
 the large storage size of image files. Compression techniques can be classified into
either lossless or lossy. We will study various video and audio compression techniques
in the Next Chapter.

o Monochrome/Bit-Map Images
o Gray-scale Images
o 8-bit Colour Images
o 24-bit Colour Images
        Applications Multimédia: Data

Monochrome/Bit-Map Images
An example 1 bit monochrome image is illustrated in Fig. where:




Sample Monochrome Bit-Map Image
 Each pixel is stored as a single bit (0 or 1)
 A 640 x 480 monochrome image requires 37.5 KB of
  storage.
 Dithering is often used for displaying monochrome images
           Applications Multimédia: Data
Gray-scale Images
An example gray-scale image is illustrated in Fig. 6.12 where:




Example of a Gray-scale Bit-map Image
 Each pixel is usually stored as a byte (value between 0 to 255)
 A 640 x 480 greyscale image requires over 300 KB of storage.
            Applications Multimédia: Data
8-bit Colour Images
An example 8-bit colour image is illustrated in Fig. 6.13 where:




Example of 8-Bit Colour Image
 One byte for each pixel
 Supports 256 out of the millions of possible colours; acceptable colour
  quality
 Requires Colour Look-Up Tables (LUTs)
 A 640 x 480 8-bit colour image requires 307.2 KB of storage (the
  same as 8-bit greyscale)
             Applications Multimédia: Data
 24-bit Colour Images
 An example 24-bit colour image is illustrated in Fig. 6.14 where:




Example of 24-Bit Colour Image
 Each pixel is represented by three bytes (e.g., RGB)
 Supports 256 x 256 x 256 possible combined colours (16,777,216)
 A 640 x 480 24-bit colour image would require 921.6 KB of storage
 Most 24-bit images are 32-bit images, the extra byte of data for each pixel is
 used to store an alpha value representing special effect information
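The storage figures quoted in this and the preceding slides are simply width x height x bits per pixel, converted to bytes:

```python
def image_size_kb(width, height, bits_per_pixel):
    """Uncompressed image size in (decimal) KBytes."""
    return width * height * bits_per_pixel / 8 / 1000

print(image_size_kb(640, 480, 8))   # 8-bit colour/greyscale -> 307.2
print(image_size_kb(640, 480, 24))  # 24-bit colour          -> 921.6
print(image_size_kb(640, 480, 1))   # monochrome -> 38.4 (= 37.5 KB binary)
```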
              Applications Multimédia: Data
Standard System Independent Formats
The following brief format descriptions are the most commonly used
  formats. Follow some of the document links for more descriptions.

 o   GIF (GIF87a, GIF89a)
 o   JPEG
 o   TIFF
 o   Graphics Animation Files
 o Postscript/Encapsulated Postscript
         Applications Multimédia: Data
GIF (GIF87a, GIF89a)
 Graphics Interchange Format (GIF) devised by CompuServe (using
  the LZW algorithm patented by Unisys), initially for transmitting
  graphical images over phone lines via modems
 Uses the Lempel-Ziv-Welch algorithm (a dictionary-based
  compression method), modified slightly for image scan line packets (line
  grouping of pixels)
 Limited to only 8-bit (256) colour images, suitable for images with
  few distinctive colours (e.g., graphics drawing)
 Supports interlacing
         Applications Multimédia: Data
JPEG
 A standard for photographic image compression created by the
  Joint Photographics Experts Group
 Takes advantage of limitations in the human vision system to
  achieve high rates of compression
 Lossy compression which allows user to set the desired level of
  quality/compression
 Detailed discussions in next chapter on compression.
         Applications Multimédia: Data
TIFF
 Tagged Image File Format (TIFF), stores many different types of
  images (e.g., monochrome, greyscale, 8-bit & 24-bit RGB, etc.) ->
  tagged
 Developed by the Aldus Corp. in the 1980's and later supported
  by Microsoft
 TIFF is a lossless format (when not utilizing the new JPEG tag
  which allows for JPEG compression)
 It does not provide any major advantages over JPEG and is not as
  user-controllable; it appears to be declining in popularity
        Applications Multimédia: Data
Graphics Animation Files
 FLC - main animation or moving picture file format, originally
  created by Autodesk Animator Pro
 FLI - similar to FLC
 GL - better quality moving pictures, usually large file sizes
        Applications Multimédia: Data
Postscript/Encapsulated Postscript
 A typesetting language which includes text as well as
   vector/structured graphics and bit-mapped images
 Used in several popular graphics programs (Illustrator,
   FreeHand)
 Does not provide compression, files are often large
         Applications Multimédia: Data
System Dependent Formats
Many graphical/imaging applications create their own file format
  particular to the systems they are executed upon. The following
  are a few popular system dependent formats:
o Microsoft Windows: BMP
o Macintosh: PAINT and PICT
o X-windows: XBM
         Applications Multimédia: Data
Microsoft Windows: BMP
 A system standard graphics file format for Microsoft Windows
 Used in PC Paintbrush and other programs
 It is capable of storing 24-bit bitmap images
Macintosh: PAINT and PICT
 PAINT was originally used in MacPaint program, initially only for
  1-bit monochrome images.
 PICT format is used in MacDraw (a vector based drawing
  program) for storing structured graphics
X-windows: XBM
 Primary graphics format for the X Window system
 Supports 24-bit colour bitmap
 Many public domain graphic editors, e.g., xv
 Used in X Windows for storing icons, pixmaps, backdrops, etc.
        Applications Multimédia: Data
Colour in Image and Video
 Basics of Colour
   o   Light and Spectra
   o   The Human Retina
   o   Cones and Perception
 CIE Chromaticity Diagram
 Colour Image and Video Representations
 Summary of Colour
         Applications Multimédia: Data

Light and Spectra
 Visible light is an electromagnetic wave in the 400nm - 700 nm
   range.
 Most light we see is not one wavelength, it's a combination of
   many wavelengths (Fig. 6.15).




Light Wavelengths
 The profile above is called a spectrum.
             Applications Multimédia: Data
The Human Retina
 The eye is basically just a camera
 Each neuron is either a rod or a cone. Rods are not sensitive to colour
Cones and Perception
 Cones come in 3 types: red, green and blue. Each responds
  differently to various frequencies of light. The following figure
  shows the spectral-response functions of the cones and the
   luminous-efficiency function of the human eye (Fig. 6.16).
            Applications Multimédia: Data
Cones and Luminous-efficiency Function of the Human Eye
 The colour signal to the brain comes from the response of the 3 cones to the
   spectra being observed (Fig 6.17). That is, the signal consists of 3 numbers:

     R = ∫ E(λ) S_R(λ) dλ,  G = ∫ E(λ) S_G(λ) dλ,  B = ∫ E(λ) S_B(λ) dλ

where E is the light spectrum and S are the sensitivity functions of the cones.
 A colour can be specified as the sum of three colours. So colours form a 3
 dimensional vector space.
 The following figure shows the amounts of three primaries needed to match
all the wavelengths of the visible spectrum (see the figure below)
          Applications Multimédia: Data
Wavelengths of the Visible Spectrum
 The negative value indicates that some colours cannot be exactly
  produced by adding up the primaries.
                      Applications Multimédia: Data

Colour Image and Video Representations
    A black and white image is a 2-D array of integers.
    A colour image is a 2-D array of (R,G,B) integer triplets. These triplets encode how much the corresponding
     phosphor should be excited in devices such as a monitor
Beside the RGB representation, YIQ and YUV are the two commonly used in video
YIQ Colour Model
    YIQ is used in colour TV broadcasting; it is downward compatible with B/W TV.
    Y (luminance) is the CIE Y primary:
     Y = 0.299R + 0.587G + 0.114B
    The other two axes:
     I = 0.596R - 0.275G - 0.321B
     Q = 0.212R - 0.528G + 0.311B
    The YIQ transform:

     | Y |   | 0.299  0.587  0.114 | | R |
     | I | = | 0.596 -0.275 -0.321 | | G |
     | Q |   | 0.212 -0.528  0.311 | | B |

    I is the red-orange axis, Q is roughly orthogonal to I.
    Eye is most sensitive to Y, next to I, next to Q. In NTSC, 4 MHz is allocated to Y, 1.5 MHz to I, 0.6 MHz to
     Q.
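The three YIQ equations above translate directly into code; as a sanity check, pure white should have full luminance and (essentially) zero chrominance:

```python
def rgb_to_yiq(r, g, b):
    """RGB in [0, 1] to YIQ, using the coefficients given above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.275 * g - 0.321 * b
    q = 0.212 * r - 0.528 * g + 0.311 * b
    return y, i, q

print(rgb_to_yiq(1.0, 1.0, 1.0))  # white: Y = 1, I and Q essentially 0
```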
          Applications Multimédia: Data

YUV (CCIR 601 or YCrCb) color Model
 Established in 1982 to build digital video standard
 Video is represented by a sequence of fields (odd and
  even lines). Two fields make a frame.
 Works in PAL (50 fields/sec) or NTSC (60 fields/sec)
 Uses the Y, Cr, Cb colour space (also called YUV):
     Y = 0.299R + 0.587G + 0.114B
     Cr = R - Y
     Cb = B - Y
 The YCrCb (YUV) transform:
               Applications Multimédia: Data

CCIR 601 also defines other image parameters, e.g. for NTSC:
  Luminance (Y) image size = 720 x 243 at 60 fields per second
  Chrominance image size = 360 x 243 at 60 fields per second
An example YCrCb Decomposition
                 Applications Multimédia: Data
The CMY Colour Model
 Cyan, Magenta, and Yellow (CMY) are complementary colours of RGB.
They can be used as Subtractive Primaries.
 CMY model is mostly used in printing devices where the colour pigments on the
paper absorb certain colours (e.g., no red light reflected from cyan ink).
             Applications Multimédia: Data
Conversion between RGB and CMY:
     C = 1 - R,  M = 1 - G,  Y = 1 - B
- e.g., convert White from (1, 1, 1) in RGB to (0, 0, 0) in CMY.
  Sometimes, an alternative CMYK model (K stands for Black) is used
in colour printing (e.g., to produce darker black than simply mixing
CMY), where
     K = min(C, M, Y),  C' = C - K,  M' = M - K,  Y' = Y - K
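A minimal sketch of both conversions, using the simple K = min(C, M, Y) form of CMYK:

```python
def rgb_to_cmy(r, g, b):
    """Subtractive complement: C = 1 - R, M = 1 - G, Y = 1 - B."""
    return 1 - r, 1 - g, 1 - b

def cmy_to_cmyk(c, m, y):
    """Pull the common grey component out into the black (K) channel."""
    k = min(c, m, y)
    return c - k, m - k, y - k, k

print(rgb_to_cmy(1, 1, 1))                 # white -> (0, 0, 0): no ink at all
print(cmy_to_cmyk(*rgb_to_cmy(0, 0, 0)))   # black -> (0, 0, 0, 1): K ink only
```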
            Applications Multimédia: Data

Summary of Colour
 Colour images are encoded as triplets of values.
 Three common systems of encoding in video are RGB, YIQ, and
  YCrCb.
 Besides the hardware-oriented colour models (i.e., RGB, CMY,
  YIQ, YUV), HSB (Hue, Saturation, and Brightness, e.g., used in
  Photoshop) and HLS (Hue, Lightness, and Saturation) are also
  commonly used.
 YIQ uses properties of the human eye to prioritize information. Y
  is the black and white (luminance) image, I and Q are the colour
  (chrominance) images. YUV uses similar idea.
 CCIR 601 is a standard for digital video that specifies image size,
  and decimates the chrominance images
          Applications Multimédia: Data
Basics of Video
 Types of Colour Video Signals
 Analog Video
 Digital Video
   Chroma Subsampling
      CCIR Standards for Digital Video
      ATSC Digital Television Standard
          Applications Multimédia: Data
Types of Colour Video Signals
 Component video - each primary is sent as a separate video signal.
    The primaries can either be RGB or a luminance-chrominance
      transformation of them (e.g., YIQ, YUV).
    Best colour reproduction
    Requires more bandwidth and good synchronization of the three
      components
 Composite video - colour (chrominance) and luminance signals are mixed
  into a single carrier wave. Some interference between the two signals is
  inevitable.
 S-Video (Separated video, e.g., in S-VHS) - a compromise between
  component analog video and the composite video. It uses two lines, one
  for luminance and another for composite chrominance signal.

Analog Video
 The following figures are
  from A.M. Tekalp, Digital
  video processing
NTSC Video
 525 scan lines per frame, 30 frames per second (or, to be exact, 29.97 fps,
  i.e. 33.37 msec/frame)
 Aspect ratio 4:3
 Interlaced, each frame is divided into 2 fields, 262.5 lines/field
 20 lines reserved for control information at the beginning of each field
    So a maximum of 485 lines of visible data
    Laserdisc and S-VHS have actual resolution of 420 lines
    Ordinary TV - 320 lines
 Each line takes 63.5 microseconds to scan. Horizontal retrace takes 10
  microseconds (with 5 microseconds horizontal synch pulse embedded), so
  the active line time is 53.5 microseconds.

 Digital Video Rasters
 Colour representation:
       NTSC uses YIQ colour model.
       composite = Y + I cos(Fsc t) + Q sin(Fsc t), where Fsc is the frequency of
        colour subcarrier
       Eye is most sensitive to Y, next to I, next to Q. In NTSC, 4 MHz is allocated
        to Y, 1.5 MHz to I, 0.6 MHz to Q.
 PAL Video
 625 scan lines per frame, 25 frames per second (40 msec/frame)
 Aspect ratio 4:3
 Interlaced, each frame is divided into 2 fields, 312.5 lines/field
 Colour representation:
       PAL uses YUV (YCbCr) colour model
       composite = Y + 0.492 x U sin(Fsc t) + 0.877 x V cos(Fsc t)
       In component analog video, U and V signals are lowpass filtered to about half
        the bandwidth of Y.

 Digital Video
 Advantages:
      Direct random access -> good for nonlinear video editing
      No problem for repeated recording
      No need for blanking and sync pulse
 Almost all digital video uses component video
Chroma Subsampling is a method that stores colour
information at a lower resolution than intensity (luminance) information.
How should the chrominance be decimated?
Chroma Subsampling
 4:2:2 -> Horizontally subsampled colour signals by a factor of 2. Each pixel is
 two bytes, e.g., (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)(Cb4, Y4) ...
 4:1:1 -> Horizontally subsampled by a factor of 4
 4:2:0 -> Subsampled in both the horizontal and vertical axes by a factor of 2
    between pixels
 4:1:1 and 4:2:0 are mostly used in JPEG and MPEG
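A sketch of 4:2:0 decimation by 2x2 block averaging (averaging is one common choice; some systems simply drop every other sample instead):

```python
def subsample_420(chroma):
    """Average each 2x2 block of a chrominance plane into one value
    (4:2:0: subsampled by a factor of 2 both horizontally and vertically)."""
    h, w = len(chroma), len(chroma[0])  # assumes even dimensions
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

The subsampled plane has a quarter as many samples, so storing Y at full resolution plus two 4:2:0 chroma planes halves the total data relative to full-resolution YCrCb.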
CCIR Standards for Digital Video
(CCIR - Consultative Committee for International Radio)

                       CCIR 601     CCIR 601     CIF         QCIF
                       525/60       625/50
                       NTSC         PAL/SECAM
-------------------    ----------   ----------   ---------   ---------
Luminance resolution   720 x 485    720 x 576    352 x 240   176 x 120
Chrominance resolut.   360 x 485    360 x 576    176 x 120   88 x 60
Colour Subsampling     4:2:2        4:2:2
Fields/sec             60           50           30          30
Interlacing            Yes          Yes          No          No

 CCIR 601 uses interlaced scan, so each field only has half as much vertical resolution
  (e.g., 243 lines in NTSC). The CCIR 601 (NTSC) data rate is 165 Mbps.
 CIF (Common Intermediate Format) was introduced as an acceptable temporary
  standard. It delivers about VHS quality. CIF uses progressive (non-interlaced) scan.
ATSC Digital Television Standard
 (ATSC - Advanced Television Systems Committee) The ATSC Digital
  Television Standard was recommended to be adopted as the Advanced TV
  broadcasting standard by the FCC Advisory Committee on Advanced
  Television Service on November 28, 1995. It covers the standard for HDTV
  (High Definition TV).
 Video Format
 The video scanning formats supported by the ATSC Digital Television
  Standard are shown in the following table
       Vertical Lines   Horizontal Pixels   Aspect Ratio   Picture Rate
       --------------   -----------------   ------------   ----------------
       1080             1920                16:9           60I 30P 24P
       720              1280                16:9           60P 30P 24P
       480              704                 16:9 & 4:3     60I 60P 30P 24P
       480              640                 4:3            60I 60P 30P 24P
 The aspect ratio for HDTV is 16:9 as opposed to 4:3 in NTSC, PAL, and
  SECAM. (A 33% increase in horizontal dimension.)
 In the picture rate column, the "I" means interlaced scan, and
  the "P" means progressive (non-interlaced) scan.
 Both NTSC rates and integer rates are supported (i.e., 60.00, 59.94,
  30.00, 29.97, 24.00, and 23.98).
Video and Audio Compression
 Classifying Compression Algorithms
 Lossless Compression Algorithms (Repetitive Sequence Suppression)
 Lossless Compression Algorithms (Pattern Substitution)
 Lossless Compression Algorithms (Entropy Encoding)
 Source Coding Techniques
 JPEG Compression
 Video Compression
 H. 261 Compression
 MPEG Compression
 Audio Compression
Video and Audio Compression
Video and audio files are very large beasts. Unless we develop and maintain very
high bandwidth networks (gigabytes per second or more) we have to compress the
data. Relying on higher bandwidths is not a good option -- the M25 Syndrome: traffic
needs ever increase and will adapt to swamp the current limit, whatever that is.
As we will see, compression becomes part of the representation or coding schemes
that have become popular audio, image and video formats. We will first study basic
compression algorithms and then go on to study some actual coding formats.

Classifying Compression Algorithms
We can classify compression by the way it exploits redundancy or by the method
it uses to compress the data.
What is Compression?
Compression basically exploits redundancy in the data:
Temporal -- in 1D data, 1D signals, audio, etc.
Spatial -- correlation between neighbouring pixels or data items
Spectral -- correlation between colour or luminance components. This uses the
frequency domain to exploit relationships between frequency of change in data.
Psycho-visual -- exploits perceptual properties of the human visual system.
Compression can be categorised in two broad ways:

Lossless Compression
     -- where data is compressed and can be reconstituted (uncompressed) without
     loss of detail or information. These are also referred to as bit-preserving or
     reversible compression systems.
Lossy Compression
     -- where the aim is to obtain the best possible fidelity for a given bit-rate, or
     to minimize the bit-rate to achieve a given fidelity measure. Video and audio
     compression techniques are most suited to this form of compression.
If an image is compressed, it clearly needs to be uncompressed (decoded) before it can be
viewed/listened to. Some processing of data may be possible in encoded form, however.
Lossless compression frequently involves some form of entropy encoding and is based
on information theoretic techniques (Fig. 7.1).
Lossy compression uses source encoding techniques that may involve transform encoding,
differential encoding or vector quantisation (Fig. 7.1).

Classification of Coding Techniques
We now address common coding methods of each type in turn:

Lossless Compression Algorithms (Repetitive Sequence Suppression)
These methods are fairly straightforward to understand and implement.
Their simplicity is their downfall in terms of attaining the best compression ratios.
However, the methods have their applications, as mentioned below:

o Simple Repetition Suppression
o Run-length Encoding
Simple Repetition Suppression
If in a sequence a series of n successive tokens appears, we can replace these with
the token and a count of the number of occurrences. We usually need a special flag to
denote when the repeated token appears.
For example
89400000000000000000000000000000000
we can replace with
894f32
where f is the flag for zero.
Compression savings depend on the content of the data.
Applications of this simple compression technique include:
     o Suppression of zeros in a file (Zero Length Suppression)
           Silence in audio data, Pauses in conversation etc.
           Bitmaps
           Blanks in text or program source files
           Backgrounds in images
     o Other regular image or data tokens
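The zero-suppression example above can be sketched as follows; the flag character f is just the one chosen in the text:

```python
def zero_suppress(s, flag="f"):
    """Replace each run of '0' characters with flag + run length."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "0":
            # measure the run of zeros starting at i
            j = i
            while j < len(s) and s[j] == "0":
                j += 1
            out.append(flag + str(j - i))
            i = j
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

E.g. "894" followed by 32 zeros compresses to "894f32", as in the text.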
Run-length Encoding
This encoding method is frequently applied to images (or pixels in a scan line). It is a small
compression component used in JPEG compression.
In this instance, sequences of image elements (c1, c2, ..., cn) are mapped to pairs
(c1, l1), (c2, l2), ..., (cn, ln),
where ci represents the image intensity or colour and li the length of the ith run of pixels
(not dissimilar to zero length suppression above).
For example:
Original Sequence:
111122233333311112222
can be encoded as:
(1,4),(2,3),(3,6),(1,4),(2,4)
The savings are dependent on the data. In the worst case (random noise) the encoding is
heavier than the original file: 2 integers rather than 1 integer if the data is represented
as integers.
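A minimal run-length encoder/decoder sketch reproducing the example above:

```python
def rle_encode(seq):
    """Map a sequence to a list of (value, run-length) pairs."""
    runs = []
    for x in seq:
        if runs and runs[-1][0] == x:
            # extend the current run
            runs[-1] = (x, runs[-1][1] + 1)
        else:
            # start a new run
            runs.append((x, 1))
    return runs

def rle_decode(runs):
    """Expand (value, run-length) pairs back into the original sequence."""
    return [x for x, n in runs for _ in range(n)]
```

Encoding "111122233333311112222" gives (1,4)(2,3)(3,6)(1,4)(2,4) as in the text; decoding the pairs restores the original sequence.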

Lossless Compression Algorithms (Pattern Substitution)
This is a simple form of statistical encoding.
Here we substitute a frequently repeating pattern(s) with a code. The code is
shorter than the pattern, giving us compression.
A simple Pattern Substitution scheme could employ predefined codes
(for example, replace all occurrences of `The' with the code '&').
More typically, tokens are assigned according to the frequency of occurrence of
patterns:
o Count occurrences of tokens
o Sort in descending order
o Assign codes to the highest-count tokens
A predefined symbol table may be used, i.e. assign code i to token i.
However, it is more usual to dynamically assign codes to tokens. The entropy
encoding schemes below basically attempt to decide the optimum assignment of
 codes to achieve the best compression.

Lossless Compression Algorithms (Entropy Encoding)
Lossless compression frequently involves some form of entropy encoding and is
based on information theoretic techniques. Shannon is the father of information
theory, and we briefly summarise information theory below before looking at
specific entropy encoding methods.

o Basics of Information Theory
o The Shannon-Fano Algorithm
o Huffman Coding
o Huffman Coding of Images
o Adaptive Huffman Coding
o Arithmetic Coding
o Lempel-Ziv-Welch (LZW) Algorithm
o Entropy Encoding Summary
o Further Reading/Information
Basics of Information Theory
According to Shannon, the entropy of an information source S is defined as

    H(S) = SUM_i p_i * log2(1 / p_i)

where p_i is the probability that symbol S_i in S will occur.
log2(1 / p_i) indicates the amount of information contained in S_i, i.e., the number of
bits needed to code S_i.
For example, in an image with uniform distribution of gray-level intensity,
i.e. pi = 1/256, then the number of bits needed to code each gray level is 8
bits. The entropy of this image is 8.
Q: How about an image in which half of the pixels are white (I = 220) and
half are black (I = 10)?
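A small sketch that evaluates the entropy formula directly; it confirms the uniform 256-level example gives 8 bits, and answers the question: a two-level image with p = 1/2 for each level has entropy 1 bit.

```python
import math

def entropy(probs):
    """H(S) = sum of p * log2(1/p) over symbols with p > 0, in bits/symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)
```

`entropy([1/256] * 256)` gives 8.0, and `entropy([0.5, 0.5])` gives 1.0.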

The Shannon-Fano Algorithm
This is a basic information theoretic algorithm. A simple example will be used to
illustrate the algorithm:
           Symbol   A    B    C    D    E
           ------------------------------
           Count    15   7    6    6    5
Encoding for the Shannon-Fano Algorithm:
A top-down approach
1. Sort symbols according to their frequencies/probabilities, e.g., ABCDE.
2. Recursively divide into two parts, each with approx. same number of counts.


Symbol   Count   log(1/p)   Code   Subtotal (# of bits)
------   -----   --------   ----   --------------------
A        15      1.38       00     30
B        7       2.48       01     14
C        6       2.70       10     12
D        6       2.70       110    18
E        5       2.96       111    15
                     TOTAL (# of bits): 89
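A sketch of the top-down Shannon-Fano procedure; at each level the split point is chosen to make the two halves' counts as equal as possible, which reproduces the code table above:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) sorted by descending count.
    Returns a dict mapping symbol -> bit string."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    # choose the split that makes the two parts' counts as equal as possible
    best_i, best_diff = 1, None
    running = 0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    # recurse: prefix '0' for the first part, '1' for the second
    codes = {}
    for sym, code in shannon_fano(symbols[:best_i]).items():
        codes[sym] = "0" + code
    for sym, code in shannon_fano(symbols[best_i:]).items():
        codes[sym] = "1" + code
    return codes
```

For the ABCDE example this yields A=00, B=01, C=10, D=110, E=111, for a total of 89 bits.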

Huffman Coding
Huffman coding is based on the frequency of occurrence of a data item
(pixel in images). The principle is to use a lower number of bits to encode the data
that occurs more frequently. Codes are stored in a Code Book which may be
constructed for each image or a set of images. In all cases the code book plus
encoded data must be transmitted to enable decoding.
The Huffman algorithm is now briefly summarised:
            A bottom-up approach
1. Initialization: Put all nodes in an OPEN list, keep it sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest frequencies/probabilities, create
   a parent node of them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node
    and insert it into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and delete the children from
   OPEN.

   Symbol   Count   log(1/p)   Code   Subtotal (# of bits)
   ------   -----   --------   ----   --------------------
   A        15      1.38       0      15
   B        7       2.48       100    21
   C        6       2.70       101    18
   D        6       2.70       110    18
   E        5       2.96       111    15
                        TOTAL (# of bits): 87
The following points are worth noting about the above algorithm:
Decoding for the above two algorithms is trivial as long as the coding table
 (the statistics) is sent before the data. (There is a bit overhead for sending this,
 negligible if the data file is big.)
Unique Prefix Property: no code is a prefix to any other code (all symbols are at
the leaf nodes) -> great for decoder, unambiguous.
If prior statistics are available and accurate, then Huffman coding is very good.
In the above example:
The average number of bits per symbol for Huffman coding is: 87 / 39 = 2.23
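A compact sketch of the bottom-up Huffman construction, using a heap in place of the sorted OPEN list (tie-breaking among equal counts may change individual codes, but not the code lengths or the total bit count):

```python
import heapq
import itertools

def huffman_codes(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> bit string."""
    tiebreak = itertools.count()  # so the heap never compares the dicts
    heap = [(count, next(tiebreak), {sym: ""}) for sym, count in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # pick the two nodes with the lowest frequencies and merge them
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, next(tiebreak), merged))
    return heap[0][2]
```

For the ABCDE example this assigns a 1-bit code to A and 3-bit codes to the rest, 87 bits in total, matching the table above.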
Huffman Coding of Images
In order to encode images:
    Divide the image up into 8x8 blocks
    Each block is a symbol to be coded
    Compute Huffman codes for the set of blocks
    Encode blocks accordingly

Adaptive Huffman Coding
The basic Huffman algorithm has been extended, for the following reasons:
(a) The previous algorithms require statistical knowledge which is often not
     available (e.g., live audio, video).
(b) Even when it is available, it could be a heavy overhead, especially when many
    tables had to be sent when a non-order-0 model is used, i.e. taking into account
    the influence of previous symbols.
The solution is to use adaptive algorithms. As an example, Adaptive Huffman
Coding is examined below. The idea is however applicable to other adaptive
compression algorithms.
ENCODER                                  DECODER
-------                                  -------
Initialize_model();                      Initialize_model();
while ((c = getc(input)) != eof)         while ((c = decode(input)) != eof)
{                                        {
    encode(c, output);                       putc(c, output);
    update_model(c);                         update_model(c);
}                                        }

 The key is to have both encoder and decoder use exactly the same initialization
and update_model routines.
 update_model does two things: (a) increment the count, (b) update the Huffman
  tree (Fig 7.2).
    o During the updates, the Huffman tree maintains its sibling property, i.e.
      the nodes (internal and leaf) are arranged in order of increasing weights
      (see figure).
    o When swapping is necessary, the farthest node with weight W is swapped with
      the node whose weight has just been increased to W+1. Note: If the node with
      weight W has a subtree beneath it, then the subtree goes with it.
    o The Huffman tree could look very different after node swapping (Fig 7.2),
      e.g., in the third tree, node A is again swapped and becomes the #5 node.
      It is now encoded using only 2 bits.
Arithmetic Coding
Huffman coding and the like use an integer number (k) of bits for each symbol, hence
k is never less than 1. Sometimes, e.g., when sending a 1-bit image, compression
becomes impossible.
 Idea: Suppose the alphabet is
     X, Y
   and prob(X) = 2/3, prob(Y) = 1/3.
If we are only concerned with encoding length-2 messages, then we can map all
  possible messages to intervals in the range [0..1]:

To encode message, just send enough bits of a binary fraction that uniquely
 specifies the interval.


 Similarly, we can map all possible length 3 messages to intervals in the range
  [0..1]:

•Q: How to encode X Y X X Y X ?
   Q: What about an alphabet with 26 symbols, or 256 symbols, ...?
•In general, number of bits is determined by the size of the interval.
Examples:
     • first interval is 8/27, needs 2 bits -> 2/3 bit per symbol (X)
     • last interval is 1/27, need 5 bits
•In general, we need -log2(p) bits to represent an interval of size p. This approaches
 optimal encoding as the message length goes to infinity.
•Problem: how to determine probabilities?
     • Simple idea is to use adaptive model: Start with guess of symbol frequencies. Update
        frequency with each new symbol.
     • Another idea is to take account of intersymbol probabilities, e.g., Prediction by Partial
       Matching.
•Implementation Notes: Can be CPU and memory intensive; patented.
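The interval-narrowing step can be sketched with exact fractions; this only computes the message's interval (whose width is the product of the symbol probabilities), not the final bit output:

```python
from fractions import Fraction

def message_interval(message, probs):
    """Compute the [low, high) interval that identifies `message`.
    probs: ordered list of (symbol, probability) pairs."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        cum = Fraction(0)  # cumulative probability of symbols before sym
        for s, p in probs:
            if s == sym:
                low, high = low + width * cum, low + width * (cum + p)
                break
            cum += p
    return low, high
```

With prob(X) = 2/3 and prob(Y) = 1/3, the length-3 message XXX occupies an interval of width 8/27 (2 bits suffice) and YYY one of width 1/27 (5 bits), matching the examples above.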
Lempel-Ziv-Welch (LZW) Algorithm
The LZW algorithm is a very common compression technique.
Suppose we want to encode the Oxford Concise English dictionary which contains
 about 159,000 entries. Why not just transmit each word as an 18 bit number?
Problems:
 Too many bits,
 everyone needs a dictionary,
 only works for English text.
 Solution: Find a way to build the dictionary adaptively.
 Original methods due to Ziv and Lempel in 1977 and 1978. Terry Welch improved
   the scheme in 1984 (called LZW compression).
 It is used in UNIX compress -- 1D token stream (similar to below)
 It is used in GIF compression -- 2D window tokens (treating the image as with
   Huffman Coding above)
The LZW Compression Algorithm can be summarised as follows:
w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        output the code for w;
        add wk to the dictionary;
        w = k;
    }
}
 Original LZW used dictionary with 4K entries, first 256 (0-255) are ASCII codes.
Example:
Input string is "^WED^WE^WEE^WEB^WET".
w      k     output   index   symbol
------------------------------------
NIL    ^
^      W     ^        256     ^W
W      E     W        257     WE
E      D     E        258     ED
D      ^     D        259     D^
^      W
^W     E     256      260     ^WE
E      ^     E        261     E^
^      W
^W     E
^WE    E     260      262     ^WEE
E      ^
E^     W     261      263     E^W
W      E
WE     B     257      264     WEB
B      ^     B        265     B^
^      W
^W     E
^WE    T     260      266     ^WET
T      EOF   T

A 19-symbol input has been reduced to 7-symbol plus 5-code output. Each
 code/symbol will need more than 8 bits, say 9 bits.
Usually, compression doesn't start until a large number of bytes (e.g., > 100) are
 read in.
The LZW Decompression Algorithm is as follows:
read a character k;
output k;
w = k;
while ( read a character k )
{
    /* k could be a character or a code. */
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}
Example (continued):
Input string is "^WED<256>E<260><261><257>B<260>T".
w       k       output   index   symbol
---------------------------------------
        ^       ^
^       W       W        256     ^W
W       E       E        257     WE
E       D       D        258     ED
D       <256>   ^W       259     D^
<256>   E       E        260     ^WE
E       <260>   ^WE      261     E^
<260>   <261>   E^       262     ^WEE
<261>   <257>   WE       263     E^W
<257>   B       B        264     WEB
B       <260>   ^WE      265     B^
<260>   T       T        266     ^WET
 Problem: What if we run out of dictionary space?
     o Solution 1: Keep track of unused entries and use LRU
     o Solution 2: Monitor compression performance and flush dictionary when
        performance is poor.
 Implementation Note: LZW can be made really fast; it grabs a fixed number of
  bits from input stream, so bit parsing is very easy. Table lookup is automatic.
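Both LZW directions above can be sketched directly (codes are kept as Python ints rather than fixed-width bit fields, and the dictionary is never flushed):

```python
def lzw_compress(text):
    """LZW compression; dictionary seeded with single characters 0-255."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for k in text:
        wk = w + k
        if wk in dictionary:
            w = wk
        else:
            out.append(dictionary[w])   # output the code for w
            dictionary[wk] = next_code  # add wk to the dictionary
            next_code += 1
            w = k
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    """Rebuild the dictionary on the fly while decoding."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        # special case: the code may have been added on the previous step
        entry = dictionary[k] if k in dictionary else w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)
```

Compressing "^WED^WE^WEE^WEB^WET" yields the 12 codes traced in the tables above (seven literals plus five dictionary codes), and decompressing them restores the input.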
Entropy Encoding Summary
 Huffman maps fixed length symbols to variable length codes. Optimal only when
 symbol probabilities are powers of 2.
 Arithmetic maps entire message to real number range based on statistics.
  Theoretically optimal for long messages, but optimality depends on data model.
  Also can be CPU/memory intensive.
 Lempel-Ziv-Welch is a dictionary-based compression method. It maps a variable
  number of symbols to a fixed length code.
 Adaptive algorithms do not need a priori estimation of probabilities, they are
  more useful in real applications.
Source Coding Techniques
Source coding is based on the content of the original signal and is also called
semantic-based coding.
Compression rates may be high, but at the price of a loss of information. Good
compression rates may be achieved with source encoding with lossless or little
loss of information.
 Transform Coding
    o A simple transform coding example
 Frequency Domain Methods
    o 1D Example
    o 2D (Image) Example
    o What do frequencies mean in an image?
    o How can transforms into the Frequency Domain help?
 Fourier Theory
    o 1D Case
    o 2D Case
    o The Discrete Fourier Transform (DFT)
    o Compression
    o Relationship between DCT and FFT
 The Discrete Cosine Transform (DCT)
 Differential Encoding
 Vector Quantisation
A simple transform coding example
A Simple Transform Encoding procedure may be described by the following steps
for a 2x2 block of monochrome pixels:
1. Take top left pixel as the base value for the block, pixel A.
2. Calculate three other transformed values by taking the difference
   between these (respective) pixels and pixel A, i.e. B-A, C-A, D-A.
3. Store the base pixel and the differences as the values of the transform.
Given the above we can easily form the forward transform:
    X0 = A,  X1 = B - A,  X2 = C - A,  X3 = D - A
and the inverse transform is:
    A = X0,  B = X1 + X0,  C = X2 + X0,  D = X3 + X0
The above transform scheme may be used to compress data by exploiting
 redundancy in the data:
Any redundancy in the data has been transformed to the values Xi, so we
can compress the data by using fewer bits to represent the differences.
I.e. if we use 8 bits per pixel, then the 2x2 block uses 32 bits. If we keep
8 bits for the base pixel, X0, and assign 4 bits for each difference, then
we only use 20 bits -- an average of 5 bits/pixel.
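The 2x2 forward and inverse transforms can be sketched as:

```python
def forward_transform(block):
    """block = [A, B, C, D], a 2x2 pixel block in row order.
    Base value plus three differences from the base."""
    a, b, c, d = block
    return [a, b - a, c - a, d - a]

def inverse_transform(coeffs):
    """Exact inverse: add the base value back to each difference."""
    x0, x1, x2, x3 = coeffs
    return [x0, x0 + x1, x0 + x2, x0 + x3]
```

For the block 120, 130, 125, 120 used in the example below, the transform gives base 120 and differences 10, 5, 0, and the inverse recovers the block exactly (the loss only appears once the differences are quantised to fewer bits).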
 Example
 Consider the following 2x2 image block:
     120   130
     125   120
Applying the forward transform gives the base value 120 and the differences 10, 5, 0.
We can then compress these values by taking fewer bits to represent the data.
However, for practical purposes such a simple scheme as outlined above
is not sufficient for compression:
•It is too simple
•It needs to operate on larger blocks (typically 8x8 minimum)
•The calculation is also too simple: from the above we see that simple encoding
 of differences for large values will result in loss of information -- very poor
 losses are possible here, since 4 bits per pixel gives values 0-15 unsigned
 (-7 to 7 signed), so we must either quantise in multiples of 255/max value
 or suffer massive overflow!
However, more advanced transform encoding techniques are very common
(see JPEG/MPEG below). Frequency domain methods such as the Fourier
Transform and (more commonly) Discrete Cosine Transform (DCT)
compression techniques fall into this category. We now consider these
methods in general and then specifically.
Frequency Domain Methods
A frequency domain representation can be obtained through a transformation
from one (time or spatial) domain to the other (frequency) domain via the
 Discrete Cosine Transform,
 Fourier Transform, etc.
1D Example
Let's consider a 1D (e.g. audio) example to see what the different domains mean:
Consider a complicated sound such as the noise of a car horn. We can
describe this sound in two related ways:
•sample the amplitude of the sound many times a second, which gives
  an approximation to the sound as a function of time.
•analyse the sound in terms of the pitches of the notes, or frequencies,
  which make the sound up, recording the amplitude of each frequency.
In the example below (Fig ) we have a signal that consists of a
sinusoidal wave at 8 Hz. 8 Hz means that the wave is completing 8 cycles
in 1 second, and this is the frequency of the wave. From the frequency
domain we can see that the composition of our signal is one wave
(one peak) occurring with a frequency of 8 Hz with a magnitude/fraction
of 1.0, i.e. it is the whole signal.

2D (Image) Example
Now images are no more complex really:
Similarly brightness along a line can be recorded as a set of values measured at
equally spaced distances apart, or equivalently, at a set of spatial frequency values.
Each of these frequency values is referred to as a frequency component.
An image is a two-dimensional array of pixel measurements on a uniform grid.
This information can be described in terms of a two-dimensional grid of spatial
frequencies.
A given frequency component now specifies what contribution is made by data
which is changing with specified x and y direction spatial frequencies.


What do frequencies mean in an image?
If an image has large values at high frequency components then the data is
changing rapidly on a short distance scale. e.g. a page of text
If the image has large low frequency components then the large scale features
of the picture are more important. e.g. a single fairly simple object which occupies
most of the image.
For colour images, The measure (now a 2D matrix) of the frequency content is with
 regard to colour/chrominance: this shows if values are changing rapidly or slowly.
Where the fraction, or value in the frequency matrix is low, the colour is changing
gradually. Now the human eye is insensitive to gradual changes in colour and
 sensitive to intensity. So we can ignore gradual changes in colour and throw away
data without the human eye noticing, we hope.

How can transforms into the Frequency Domain help?
Any function (signal) can be decomposed into purely sinusoidal components
(sine waves of different size/shape) which when added together make up our
original signal.
In the example below (Fig 7.4) we have a square wave signal that has been
decomposed by the Fourier Transform to render its sinusoidal components.
Only the first few sine wave components are shown here. You can see that the
square wave form will be roughly approximated if you add up the sinusoidal
components.
Thus Transforming a signal into the frequency domain allows us to see what
sine waves make up our signal e.g. One part sinusoidal wave at 50 Hz and two
parts sinusoidal waves at 200 Hz.
More complex signals will give more complex graphs but the idea is exactly the
 same. The graph of the frequency domain is called the frequency spectrum.
An easy way to visualise what is happening is to think of a graphic equaliser on
a stereo.

The bars on the left are the frequency spectrum of the sound that you are
listening to. The bars go up and down depending on the type of sound that you
are listening to. It is pretty obvious that the accumulation of these make up
the whole. The bars on the right are used to increase and decrease the sound
at particular frequencies, denoted by the numbers (Hz). The lower frequencies,
 on the left, are for bass and the higher frequencies on the right are treble.
This is directly related to our example before. The bars show how much of the
 signal is made up of sinusoidal waves at that frequency. When all the waves are
added together in their correct proportions that original sound is regenerated.

Fourier Theory
In order to fully comprehend the DCT we will do a basic study of Fourier theory
and the Fourier transform first.
Whilst the DCT is ultimately used in multimedia compression, it is perhaps easier
to comprehend how such compression methods work by studying Fourier theory, from
which the DCT is actually derived.
The tool which converts a spatial (real space) description of an image into one in
terms of its frequency components is called the Fourier transform
The new version is usually referred to as the Fourier space description of the
image.
The corresponding inverse transformation which turns a Fourier space description
back into a real space one is called the inverse Fourier transform.


1D Case
Considering a continuous function f(x) of a single variable x representing distance.
The Fourier transform of that function is denoted F(u), where u represents spatial
frequency is defined by



Note: In general F(u) will be a complex quantity even though the original data
 is purely real.
The meaning of this is that not only is the magnitude of each frequency present
important, but that its phase relationship is too.
The inverse Fourier transform for regenerating f(x) from F(u) is given by

    f(x) = \int_{-\infty}^{\infty} F(u) e^{2\pi i u x} du

which is rather similar, except that the exponential term has the opposite sign.
Let's see how we compute a Fourier transform: consider a particular function f(x)
defined as

    f(x) = 1 for |x| <= 1, and 0 otherwise

So its Fourier transform is:

    F(u) = \int_{-1}^{1} e^{-2\pi i u x} dx = \frac{\sin 2\pi u}{\pi u}

In this case F(u) is purely real, which is a consequence of the original
data being symmetric in x and -x. This function is often referred to as the
Sinc function.
2D Case
If f(x,y) is a function of two spatial variables, for example the brightness in an
image, its Fourier transform is given by

    F(u,v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) e^{-2\pi i (ux + vy)} dx dy

and the inverse transform, as might be expected, is

    f(x,y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(u,v) e^{2\pi i (ux + vy)} du dv
The Discrete Fourier Transform (DFT)
Images and Digital Audio are digitised !!
Thus, we need a discrete formulation of the Fourier transform, which takes
such regularly spaced data values, and returns the value of the Fourier transform
for a set of values in frequency space which are equally spaced.
This is done quite naturally by replacing the integral by a summation, to give
the discrete Fourier transform or DFT for short.
In 1D it is convenient now to assume that x goes up in steps of 1, and that there
are N samples, at values of x from 0 to N-1:

    F(u) = \frac{1}{N} \sum_{x=0}^{N-1} f(x) e^{-2\pi i u x / N}
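A direct implementation of the summation above makes the definition concrete (a sketch of the definition itself, not an optimised FFT):

```python
import cmath

def dft(f):
    """Direct evaluation of the DFT summation with the 1/N normalisation
    in front -- just the definition, no fast-transform optimisation."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / N) for x in range(N)) / N
            for u in range(N)]

F = dft([1.0, 1.0, 1.0, 1.0])   # a constant signal: only the DC term survives
```

For a constant input, every frequency component other than F(0) cancels out, which is exactly the "flat signal, all energy at zero frequency" intuition from the graphic equaliser example.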
The Discrete Cosine Transform (DCT)
  The discrete cosine transform (DCT) helps separate the image into parts
 (or spectral sub-bands) of differing importance (with respect to the image's
visual quality). The DCT is similar to the discrete Fourier transform: it transforms
a signal or image from the spatial domain to the frequency domain (Fig 7.8).
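As an illustrative sketch (assuming the standard orthonormal DCT-II definition; real codecs use fast factorisations, not this naive quadruple loop), the 2-D DCT can be written directly from its defining sum. For a flat block, all the energy collapses into the DC coefficient:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II for an NxN block, with orthonormal scale factors."""
    N = len(block)
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

flat = [[100.0] * 8 for _ in range(8)]   # a flat 8x8 image block
F = dct2(flat)                           # all energy lands in F[0][0] (the DC term)
```

This concentration of energy into a few low-frequency coefficients is precisely what makes the later quantisation and entropy-coding stages effective.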
Differential Encoding
Differential encoding is a simple example of the transform coding mentioned earlier
and an instance of this approach.
Here:
The difference between the actual value of a sample and a prediction of that
value is encoded.
Also known as predictive encoding.
Examples of the technique include differential pulse code modulation, delta modulation
and adaptive pulse code modulation -- they differ in the prediction part.
Suitable where successive signal samples do not differ much, but are not zero.
 E.g. Video -- difference between frames, some audio signals.
Differential pulse code modulation (DPCM) uses the simplest prediction: each sample
is predicted to be the previous one, so the encoded difference is Δf(t_i) = f(t_i) - f(t_{i-1}).
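A minimal sketch of DPCM with the previous-sample predictor (function names are illustrative):

```python
def dpcm_encode(samples):
    """DPCM: predict each sample as the previous one and transmit only
    the prediction error (the difference)."""
    prev, errors = 0, []
    for s in samples:
        errors.append(s - prev)
        prev = s
    return errors

def dpcm_decode(errors):
    """Rebuild the signal by accumulating the transmitted differences."""
    prev, out = 0, []
    for e in errors:
        prev += e
        out.append(prev)
    return out

sig = [100, 102, 101, 104, 104]
enc = dpcm_encode(sig)   # small differences are cheaper to entropy-code
```

Because successive samples differ little, the error stream consists mostly of small values, which a later entropy coder can represent in few bits.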
Vector Quantisation
The basic outline of this approach is:
 Data stream divided into (1D or 2D square) blocks -- vectors
 A table or code book is used to find a pattern for each block
 Code book can be dynamically constructed or predefined
 Each pattern for a block is encoded as a look-up value in the table
 Compression achieved as data is effectively subsampled and coded at this level
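The outline above can be sketched as follows, assuming a tiny predefined code book and a squared-error distance (both illustrative choices):

```python
def vq_encode(blocks, codebook):
    """Vector quantisation: replace each block with the index of the
    nearest codebook pattern (squared-error distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist(b, codebook[i]))
            for b in blocks]

def vq_decode(indices, codebook):
    """Decoding is a pure table lookup."""
    return [codebook[i] for i in indices]

codebook = [(0, 0), (10, 10), (20, 20)]     # a tiny predefined code book
data = [(1, 2), (9, 11), (19, 18), (0, 1)]  # 2-sample "blocks"
idx = vq_encode(data, codebook)             # only these indices are sent
```

Compression comes from sending a short index per block instead of the block itself; the quality depends entirely on how well the code book covers the data.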
JPEG Compression
 What is JPEG?
 "Joint Photographic Expert Group" -- an international standard in 1992.
 Works with colour and greyscale images; many applications, e.g., satellite, medical, ...
JPEG compression involves the following:
 Encoding
 Decoding - Reverse the order for encoding
The Major Steps in JPEG Coding involve:
 DCT (Discrete Cosine Transformation)
 Quantization
 Zigzag Scan
 DPCM on DC component
 RLE on AC Components
 Entropy Coding

o Quantization
         Uniform quantization
         Quantization Tables
o Zig-zag Scan
o Differential Pulse Code Modulation (DPCM) on DC component
o Run Length Encode (RLE) on AC components
o Entropy Coding
o Summary of the JPEG bitstream
o Practical JPEG Compression
Quantization
Why do we need to quantise:
 To throw out bits
 Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11.
Truncate to 3 bits: 101 = 5.
 Quantization error is the main source of loss in Lossy Compression
Uniform quantization
 Divide by constant N and round result (N = 4 or 8 in examples above).
 Non-powers-of-two give finer control (e.g., N = 6 loses about 2.5 bits)
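The truncation example above, expressed as divide-and-truncate (a sketch; note the example truncates, where a real coder would normally round):

```python
def uniform_quantise(value, step):
    """Uniform quantisation: divide by a constant step and keep the
    integer part, as in the 45 -> 11 -> 5 example above."""
    return value // step

def dequantise(q, step):
    """Reconstruction; the discarded remainder is the quantisation error."""
    return q * step

q4 = uniform_quantise(45, 4)   # drop 2 bits: 101101 -> 1011
q8 = uniform_quantise(45, 8)   # drop 3 bits: 101101 -> 101
err = 45 - dequantise(q8, 8)   # information lost by quantisation
```

The quantisation error (here 5) is irrecoverable, which is why quantisation is the lossy step in the whole pipeline.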
Quantization Tables
 In JPEG, each F[u,v] is divided by a constant q(u,v).
 Table of q(u,v) is called quantization table.
----------------------------------------------
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
----------------------------------------------
 Eye is most sensitive to low frequencies (upper left corner), less sensitive
to high frequencies (lower right corner)
 Standard defines 2 default quantization tables, one for luminance (above),
one for chrominance.
 Q: How would changing the numbers affect the picture (e.g., if I doubled
them all)?
Quality factor in most implementations is the scaling factor for default
quantization tables.
 Custom quantization tables can be put in image/scan header.
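A sketch of how a quantization table is applied (the table is the default luminance table shown above; the input DCT block is a made-up example):

```python
def quantise_block(F, q):
    """JPEG-style quantisation: divide each DCT coefficient F[u][v] by the
    table entry q[u][v] and round. Large high-frequency table entries
    drive most AC coefficients to zero."""
    return [[round(F[u][v] / q[u][v]) for v in range(8)] for u in range(8)]

# default luminance quantization table from the standard (as shown above)
Q_LUM = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

# a hypothetical DCT block: a large DC term plus a few small AC terms
F = [[0.0] * 8 for _ in range(8)]
F[0][0], F[0][1], F[7][7] = 800.0, 33.0, 40.0
Fq = quantise_block(F, Q_LUM)   # DC survives; small AC terms shrink or vanish
```

Doubling every table entry would halve the quantised values, discarding more detail, which is exactly what a lower "quality factor" does.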
 Zig-zag Scan
What is the purpose of the Zig-zag Scan:
 to group the low frequency coefficients at the top of the vector.
 Maps 8 x 8 to a 1 x 64 vector
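One way to generate the zig-zag order (a sketch; implementations usually use a precomputed table):

```python
def zigzag_order(n=8):
    """Generate the zig-zag scan order for an n x n block: coefficients
    are visited along anti-diagonals (constant u + v), alternating
    direction, so low-frequency entries come first."""
    order = []
    for s in range(2 * n - 1):                       # s = u + v
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

order = zigzag_order()   # maps the 8x8 block to a 1x64 list of (u, v) indices
```

After quantisation, reading the block in this order tends to put all the non-zero coefficients first and a long run of zeros last, which is what makes the run-length stage effective.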
Differential Pulse Code Modulation (DPCM) on DC component
Here we see that besides DCT another encoding method is employed:
DPCM on the DC component at least. Why is this strategy adopted:
 DC component is large and varied, but often close to previous value
 (like lossless JPEG).
 Encode the difference from previous 8x8 blocks - DPCM



Run Length Encode (RLE) on AC components
 Yet another simple compression technique is applied to the AC component:
1x64 vector has lots of zeros in it
 Encode as (skip, value) pairs, where skip is the number of zeros and value is
 the next non-zero component.
 Send (0,0) as end-of-block sentinel value.
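The (skip, value) scheme can be sketched as:

```python
def rle_ac(coeffs):
    """Run-length encode AC coefficients of a zig-zagged block as
    (skip, value) pairs, where skip counts the zeros before each
    non-zero value; (0, 0) marks end of block."""
    pairs, skip = [], 0
    for c in coeffs:
        if c == 0:
            skip += 1
        else:
            pairs.append((skip, c))
            skip = 0
    pairs.append((0, 0))   # end-of-block sentinel
    return pairs

ac = [12, 0, 0, -3, 0, 1] + [0] * 57   # a typical sparse 1x63 AC vector
enc = rle_ac(ac)
```

The long tail of zeros costs nothing beyond the single end-of-block pair, which is where most of the gain in this stage comes from.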
Entropy Coding
DC and AC components finally need to be represented by a smaller number of bits:
 Categorize DC values into SSS (number of bits needed to represent) and actual bits

-----------------------------
Value              SSS
0                   0
-1,1                1
-3,-2,2,3           2
-7..-4,4..7         3
-----------------------------
 Example: if DC value is 4, 3 bits are needed.
Send off SSS as Huffman symbol, followed by actual 3 bits.
 For AC components (skip, value), encode the composite symbol (skip,SSS) using the
 Huffman coding.
 Huffman Tables can be custom (sent in header) or default.
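The SSS category is simply the number of bits in the magnitude, which can be sketched as:

```python
def sss(value):
    """Size category for a DC/AC value: the number of bits needed to
    represent |value|, matching the table above (0->0, +/-1->1,
    +/-2..3->2, +/-4..7->3, ...)."""
    return abs(value).bit_length()

categories = [sss(v) for v in (0, 1, -1, 3, 4, 7, -8)]
```

A DC value of 4 is therefore sent as the Huffman code for SSS = 3, followed by the 3 raw bits identifying which of the eight values in that category it is.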
Summary of the JPEG bitstream
Figure 7.12 and the above JPEG components have described how compression is
 achieved at several stages. Let us conclude by summarising the overall compression
process:
 A "Frame" is a picture, a "scan" is a pass through the pixels (e.g., the red component),
a "segment" is a group of blocks, and a "block" is an 8x8 group of pixels.
 Frame header: sample precision; (width, height) of image; number of components;
unique ID (for each component); horizontal/vertical sampling factors (for each
component); quantization table to use (for each component)
 Scan header: number of components in scan; component ID (for each component);
 Huffman table (for each component)
 Misc. (can occur between headers): quantization tables, Huffman tables, arithmetic
 coding tables, comments, application data
Practical JPEG Compression
JPEG compression algorithms fall into one of several categories depending
on how the compression is actually performed:
 Baseline/Sequential - the one that we described in detail
 Lossless
 Progressive
 Hierarchical
 "Motion JPEG" - Baseline JPEG applied to each image in a video.
Briefly, this is how each above approach is encoded:
1. Lossless Mode
          o      A special case of JPEG where there is indeed no loss
o Take the difference from previous pixels (not blocks as in the Baseline mode) as
  a "predictor"

The predictor uses a linear combination of previously encoded neighbours.
It can be one of seven different predictors based on neighbouring pixels.




 o Since it uses only previously encoded neighbors, first row always uses P2,
   first column always uses P1.
 o Effect of Predictor (test with 20 images)




Comparison with Other Lossless Compression Programs (compression ratio):
---------------------------------------------------------------------
 Compression Program             Lena   football   F-18   flowers
---------------------------------------------------------------------
 lossless JPEG                   1.45     1.54     2.29    1.26
 optimal lossless JPEG           1.49     1.67     2.71    1.33
 compress (LZW)                  0.86     1.24     2.21    0.87
 gzip (Lempel-Ziv)               1.08     1.36     3.10    1.05
 gzip -9 (optimal Lempel-Ziv)    1.08     1.36     3.13    1.05
 pack (Huffman coding)           1.02     1.12     1.19    1.00
---------------------------------------------------------------------
2. Progressive Mode
    o Goal: display low quality image and successively improve.
    o Two ways to successively improve image:
(a) Spectral selection : Send DC component, then first few AC, some more AC, etc.
(b) Successive approximation : send DCT coefficients MSB (most significant bit) to
  LSB (least significant bit).
3. Hierarchical Mode
    A Three-level Hierarchical JPEG
    Encoder


    o Down-sample by factors of 2 in each direction.
Example: map 640x480 to 320x240
    o Code smaller image using another method (Progressive, Baseline, or Lossless)
    o Decode and up-sample encoded image
    o Encode difference between the up-sampled and the original using Progressive,
    Baseline, or Lossless
    o Can be repeated multiple times.
    o Good for viewing high resolution image on low resolution display.
4. JPEG-2
        o Big change was to use adaptive quantization
Video Compression
We have studied the theory of encoding now let us see how this is applied in
practice.
We need to compress video (and audio) in practice since:
1.   Uncompressed video (and audio) data are huge. In HDTV, the bit rate easily
exceeds 1 Gbps. -- big problems for storage and network communications. For
example:
     One of the formats defined for HDTV broadcasting within the United States
      is 1920 pixels horizontally by 1080 lines vertically, at 30 frames per second.
     If these numbers are all multiplied together, along with 8 bits for each of the
      three primary colors, the total data rate required would be approximately
     1.5 Gb/sec. Because of the 6 MHz channel bandwidth allocated, each channel
     will only support a data rate of 19.2 Mb/sec, which is further reduced to 18 Mb/sec
      by the fact that the channel must also support audio, transport, and ancillary data
     information. As can be seen, this restriction in data rate means that the original
     signal must be compressed by a figure of approximately 83:1. This number seems
     all the more impressive when it is realized that the intent is to deliver very high
     quality video to the end user, with as few visible artifacts as possible.
2.   Lossy methods have to be employed, since the compression ratio of lossless methods
     (e.g., Huffman, Arithmetic, LZW) is not high enough for image and video compression,
     especially when the distribution of pixel values is relatively flat.
The following compression types are commonly used in Video compression:
Spatial Redundancy Removal - Intraframe coding (JPEG)
Spatial and Temporal Redundancy Removal - Intraframe and Interframe coding (H.261, MPEG)
H.261 Compression
H.261 compression has been specifically designed for video telecommunication
 applications:
 Developed by CCITT in 1988-1990
 Meant for videoconferencing, videotelephone applications over ISDN telephone
  lines.
 Baseline ISDN is 64 kbits/sec, and integral multiples (px64)
o Overview of H.261
o Intra Frame Coding
o Inter-frame (P-frame) Coding
o The H.261 Bitstream Structure
o Hard Problems in H.261
          Motion Vector Search
          Propagation of Errors
          Bit-rate Control
Overview of H.261
The basic approach to H.261 compression is summarised as follows:

   Decoded Sequence




 Frame formats are CCIR 601 CIF (352x288) and QCIF (176x144) images with 4:2:0
  subsampling.
 Two frame types: Intraframes ( I-frames) and Interframes (P-frames)
 I-frames use basically JPEG
 P-frames use pseudo-differences from previous frame (predicted), so frames
  depend on each other.
 I-frames provide us with access points.
Intra Frame Coding
The term intra frame coding refers to the fact that the various lossless and
lossy compression techniques are performed relative to information that is
contained only within the current frame, and not relative to any other frame in
the video sequence. In other words, no temporal processing is performed outside
of the current picture or frame. This mode will be described first because it is
simpler, and because non-intra coding techniques are extensions to these basics.
Figure 1 shows a block diagram of a basic video encoder for intra frames only.
It turns out that this block diagram is very similar to that of a JPEG still image
encoder, with only slight implementation detail differences.
The potential ramifications of this similarity will be discussed later. The basic processing
 blocks shown are the video filter, discrete cosine transform, DCT coefficient quantizer,
and run-length amplitude/variable length coder. These blocks are described individually in the
 sections below or have already been described in JPEG Compression.
The basic intra frame coding scheme is as follows:
Macroblocks are 16x16 pixel areas on Y plane of original image.
A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block.
In the example HDTV data rate calculation shown previously, the pixels were represented as
8-bit values for each of the primary colors - red, green, and blue. It turns out that while this
may be good for high performance computer generated graphics, it is wasteful in most video
compression applications. Research into the Human Visual System (HVS) has shown that the
eye is most sensitive to changes in luminance, and less sensitive to variations in chrominance.
Since absolute compression is the name of the game, it makes sense that MPEG should operate
 on a color space that can effectively take advantage of the eye's different sensitivity to
 luminance and chrominance information. As such, H.261 (and MPEG) uses the YCbCr color space
 to represent the data values instead of RGB, where Y is the luminance signal, Cb is the blue
 color difference signal, and Cr is the red color difference signal.
A macroblock can be represented in several different manners when referring to the YCbCr
 color space. Figure 7.13 below shows 3 formats known as 4:4:4, 4:2:2, and 4:2:0 video. 4:4:4
is full bandwidth YCbCr video, and each macroblock consists of 4 Y blocks, 4 Cb blocks, and
4 Cr blocks. Being full bandwidth, this format contains as much information as the data would
 if it were in the RGB color space. 4:2:2 contains half as much chrominance information as
4:4:4, and 4:2:0 contains one quarter of the chrominance information. Although MPEG-2 has
 provisions to handle the higher chrominance formats for professional applications, most
consumer level products will use the normal 4:2:0 mode.




Macroblock Video Formats
Because of the efficient manner of luminance and chrominance representation,
 the 4:2:0 representation allows an immediate data reduction from
12 blocks/macroblock to 6 blocks/macroblock, or 2:1 compared to full bandwidth
representations such as 4:4:4 or RGB. To generate this format without generating
color aliases or artifacts requires that the chrominance signals be filtered.
The Macroblock is coded as follows:
o Many macroblocks will be exact matches (or close enough). So send address of
  each block in image -> Addr
o Sometimes no good match can be found, so send INTRA block -> Type
o Will want to vary the quantization to fine tune compression, so send quantization
  value -> Quant
o Motion vector -> vector
o Some blocks in macroblock will match well, others match poorly. So send bitmask
   indicating which blocks are present (Coded Block Pattern, or CBP).
o Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.
 Quantization is by constant value for all DCT coefficients (i.e., no quantization
 table as in JPEG).
Inter-frame (P-frame) Coding
The previously discussed intra frame coding techniques were limited to processing the video
signal on a spatial basis, relative only to information within the current video frame.
Considerably more compression efficiency can be obtained however, if the inherent temporal,
or time-based redundancies, are exploited as well. Anyone who has ever taken a reel of the
old-style super-8 movie film and held it up to a light can certainly remember seeing that most
consecutive frames within a sequence are very similar to the frames both before and after
the frame of interest. Temporal processing to exploit this redundancy uses a technique known
as block-based motion compensated prediction, using motion estimation. A block diagram of the
 basic encoder with extensions for non-intra frame coding techniques is given in Figure 7.14.
 Of course, this encoder can also support intra frame coding as a subset.


P-Frame Coding
Starting with an intra, or I frame, the encoder can forward predict a future frame.
Each P frame in this sequence is predicted from the frame immediately preceding it.
P-coding can be summarised as follows:


 A Coding Example (P-frame)
 Previous image is called reference image.
 Image to code is called target image.
 Actually, the difference is encoded.
Subtle points:
1. Need to use the decoded image as the reference image, not the original. Why?
   Because the decoder only has the decoded (lossy) frames; predicting from the
   original would make the encoder and decoder drift apart.
2. We're using "Mean Absolute Difference" (MAD) to decide best block.
Can also use "Mean Squared Error" (MSE) = sum(E*E)

The H.261 Bitstream Structure
The H.261 Bitstream structure may be summarised as follows:




 Need to delineate boundaries between pictures, so send Picture Start Code -> PSC
 Need timestamp for picture (used later for audio synchronization), so send
 Temporal Reference -> TR
 Is this a P-frame or an I-frame? Send Picture Type -> PType
 Picture is divided into regions of 11x3 macroblocks called Groups of Blocks -> GOB
 Might want to skip whole groups, so send Group Number ( Grp #)
 Might want to use one quantization value for whole group, so send Group
 Quantization Value -> GQuant
 Overall, the bitstream is designed so we can skip data whenever possible while still
 remaining unambiguous.

Hard Problems in H.261
There are however a few difficult problems in H.261:

•Motion vector search
•Propagation of Errors
•Bit-rate Control


Motion Vector Search

•T(x+k, y+l) - pixels in the macro block with upper left corner (x,y) in the
  Target.
•R(x+i+k, y+j+l) - pixels in the macro block with upper left corner (x+i, y+j)
 in the Reference.
Cost function is:

    MAE(i,j) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} |T(x+k, y+l) - R(x+i+k, y+j+l)|

where MAE stands for Mean Absolute Error.
•Goal is to find a vector (u, v) such that MAE(u, v) is minimum.
•Full Search Method:
1. Search the whole (2p+1) x (2p+1) searching region.
2. Cost is:

    (2p+1)^2 \cdot N^2 \cdot 3 operations,

   assuming that each pixel comparison needs 3 operations (Subtraction,
   Absolute value, Addition).
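A sketch of the full search over a [-p, p] displacement window (array layout and names are illustrative; real encoders work on luminance macroblocks):

```python
def full_search(target, ref, x, y, M, p):
    """Exhaustive block matching: evaluate MAE(i, j) for every displacement
    (i, j) in [-p, p] x [-p, p] and return the vector with minimum MAE."""
    def mae(i, j):
        total = 0
        for k in range(M):
            for l in range(M):
                total += abs(target[y + k][x + l] - ref[y + j + k][x + i + l])
        return total / (M * M)
    return min(((i, j) for i in range(-p, p + 1) for j in range(-p, p + 1)),
               key=lambda v: mae(*v))

# Hypothetical data: the reference is a gradient, and the target block is the
# reference shifted right by 2 and down by 1, so the true vector is (2, 1).
ref = [[13 * r + 7 * c for c in range(12)] for r in range(12)]
target = [[ref[r + 1][c + 2] for c in range(10)] for r in range(10)]
best = full_search(target, ref, x=4, y=4, M=4, p=2)
```

This directly exhibits the (2p+1)^2 · M^2 comparison cost above, which is why the logarithmic and hierarchical searches below exist.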
•Two-Dimensional Logarithmic Search:
Similar to a binary search. The MAE function is initially computed within a
 window of [-p, p] x [-p, p] at nine locations as shown in the figure.
Repeat until the size of the search region is one pixel wide:
1. Find one of the nine locations that yields the minimum MAE.
2. Form a new searching region with half of the previous size and
  centered at the location found in step 1.
 Hierarchical Motion Estimation:
1. Form several low resolution versions of the target and reference pictures
2. Find the best match motion vector in the lowest resolution version.
3. Modify the motion vector level by level when going up.
•Performance comparison:
--------------------------------------------------------------------
Search Method           Operations for 720x480 at 30 fps
                           p = 15              p = 7
--------------------------------------------------------------------
Full Search             29.89 GOPS           6.99 GOPS
Logarithmic              1.02 GOPS         777.60 MOPS
Hierarchical           507.38 MOPS         398.52 MOPS
--------------------------------------------------------------------

Propagation of Errors
 Send an I-frame every once in a while
Make sure you use decoded frame for comparison
Bit-rate Control
 Simple feedback loop based on "buffer fullness"
 If buffer is too full, increase the quantization scale factor to reduce the data
MPEG Compression
The acronym MPEG stands for Moving Picture Expert Group, which worked to
generate the specifications under ISO, the International Organization for
Standardization and IEC, the International Electrotechnical Commission.
What is commonly referred to as "MPEG video" actually consists at the present
time of two finalized standards, MPEG-1 and MPEG-2, with a third standard,
MPEG-4, finalized in 1998 for Very Low Bitrate Audio-Visual Coding. The
MPEG-1 and MPEG-2 standards are similar in basic concepts. They both are based
on motion compensated block-based transform coding techniques, while MPEG-4
deviates from these more traditional approaches in its usage of software image
construct descriptors, for target bit-rates in the very low range, < 64Kb/sec.
Because MPEG-1 and MPEG-2 are finalized standards and are both presently
being utilized in a large number of applications, this section concentrates on
compression techniques relating only to these two standards. Note that there is
 no reference to MPEG-3. This is because it was originally anticipated that this
 standard would refer to HDTV applications, but it was found that minor extensions
to the MPEG-2 standard would suffice for this higher bit-rate, higher resolution
application, so work on a separate MPEG-3 standard was abandoned.
The current thrust is MPEG-7 "Multimedia Content Description Interface" whose
completion is scheduled for July 2001. Work on the new standard MPEG-21
"Multimedia Framework" has started in June 2000 and has already produced a
MPEG-1 was finalized in 1991, and was originally optimized to work at video
resolutions of 352x240 pixels at 30 frames/sec (NTSC based) or 352x288 pixels at
25 frames/sec (PAL based), commonly referred to as Source Input Format (SIF)
video. It is often mistakenly thought that the MPEG-1 resolution is limited to the
above sizes, but it in fact may go as high as 4095x4095 at 60 frames/sec. The
bit-rate is optimized for applications of around 1.5 Mb/sec, but again can be used
at higher rates if required. MPEG-1 is defined for progressive frames only, and has
no direct provision for interlaced video applications, such as in broadcast television
applications.
MPEG-2 was finalized in 1994, and addressed issues directly related to digital
television broadcasting, such as the efficient coding of field-interlaced video and
scalability. Also, the target bit-rate was raised to between 4 and 9 Mb/sec, resulting
 in potentially very high quality video. MPEG-2 consists of profiles and levels.
The profile defines the bitstream scalability and the colorspace resolution, while
the level defines the image resolution and the maximum bit-rate per profile.
 Probably the most common descriptor in use currently is Main Profile, Main
Level (MP@ML) which refers to 720x480 resolution video at 30 frames/sec, at
bit-rates up to 15 Mb/sec for NTSC video. Another example is the HDTV resolution
 of 1920x1080 pixels at 30 frame/sec, at a bit-rate of up to 80 Mb/sec. This is an
 example of the Main Profile, High Level (MP@HL) descriptor. A complete table of
the various legal combinations can be found in reference [2].

•MPEG Video
   •MPEG Video Layers
   •B-Frames
   •Motion Estimation
   •Coding of Predicted Frames:Coding Residual Errors
   •Differences from H.261
•The MPEG Video Bitstream
•Decoding MPEG Video in Software
   •Intra Frame Decoding
   •Non-Intra Frame Decoding
   •MPEG-2, MPEG-3, and MPEG-4

MPEG Video
MPEG compression essentially attempts to overcome some shortcomings of
H.261 and JPEG:
 Recall H.261 dependencies:




 The problem here is that many macroblocks need information that is not in the
reference frame.
For example:
The MPEG solution is to add a third frame type, the bidirectional frame,
 or B-frame.
 B-frames search for macroblocks in past and future frames.
 Typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB
The actual pattern is up to the encoder, and need not be regular.




o MPEG Video Layers
o B-Frames
o Motion Estimation
o Coding of Predicted Frames:Coding Residual Errors
o Differences from H.261

MPEG Video Layers
MPEG video is broken up into a hierarchy of layers to help with error handling,
random search and editing, and synchronization, for example with an audio
bitstream. From the top level, the first layer is known as the video sequence layer,
and is any self-contained bitstream, for example a coded movie or advertisement.
The second layer down is the group of pictures, which is composed of 1 or more
groups of intra (I) frames and/or non-intra (P and/or B) pictures that will be
defined later. Of course the third layer down is the picture layer itself, and the
next layer beneath it is called the slice layer. Each slice is a contiguous sequence
of raster ordered macroblocks, most often on a row basis in typical video
applications, but not limited to this by the specification. Each slice consists of
macroblocks, which are 16x16 arrays of luminance pixels, or picture data elements,
with 2 8x8 arrays of associated chrominance pixels. The macroblocks can be
further divided into distinct 8x8 blocks, for further processing such as transform
coding. Each of these layers has its own unique 32 bit start code defined in the
syntax to consist of 23 zero bits followed by a one, then followed by 8 bits for the
actual start code. These start codes may have as many zero bits as desired
preceding them.
B-Frames
The MPEG encoder also has the option of using forward/backward interpolated
prediction. These frames are commonly referred to as bi-directional interpolated
 prediction frames, or B frames for short. As an example of the usage of I, P, and B
frames, consider a group of pictures that lasts for 6 frames, and is given as
I,B,P,B,P,B, I,B,P,B,P,B, ... As in the previous I and P only example, I frames are
coded spatially only and the P frames are forward predicted based on previous
I and P frames. The B frames however, are coded based on a forward prediction
from a previous I or P frame, as well as a backward prediction from a succeeding
I or P frame. As such, the example sequence is processed by the encoder such that
the first B frame is predicted from the first I frame and first P frame, the second
B frame is predicted from the first and second P frames, and the third B frame is
predicted from the second P frame and the first I frame of the next group of pictures.
From this example, it can be seen that backward prediction requires that the future
frames that are to be used for backward prediction be encoded and transmitted first,
out of order. This process is summarized in Figure 7.16. There is no defined limit to
the number of consecutive B frames that may be used in a group of pictures, and of
course the optimal number is application dependent. Most broadcast quality
applications however, have tended to use 2 consecutive B frames (I,B,B,P,B,B,P,...) as
the ideal trade-off between compression efficiency and video quality.
B-Frame Encoding
The main advantage of the usage of B frames is coding efficiency. In most cases, B frames will
 result in less bits being coded overall. Quality can also be improved in the case of moving objects
that reveal hidden areas within a video sequence. Backward prediction in this case allows the
encoder to make more intelligent decisions on how to encode the video within these areas. Also,
since B frames are not used to predict future frames, errors generated will not be propagated
further within the sequence.
One disadvantage is that the frame reconstruction memory buffers within the encoder and decoder
 must be doubled in size to accommodate the 2 anchor frames. This is almost never an issue for the
relatively expensive encoder, and in these days of inexpensive DRAM it has become much less of
an issue for the decoder as well. Another disadvantage is that there will necessarily be a delay
throughout the system as the frames are delivered out of order, as was shown earlier. Most
one-way systems can tolerate these delays; they are more objectionable in two-way applications
such as video conferencing systems.
Motion Estimation
The temporal prediction technique used in MPEG video is based on motion
estimation. The basic premise of motion estimation is that in most cases,
consecutive video frames will be similar except for changes induced by objects
 moving within the frames. In the trivial case of zero motion between frames
(and no other differences caused by noise, etc.), it is easy for the encoder to
efficiently predict the current frame as a duplicate of the prediction frame.
When this is done, the only information necessary to transmit to the decoder
becomes the syntactic overhead necessary to reconstruct the picture from the
original reference frame. When there is motion in the images, the situation is not
as simple.
Figure 7.17 shows an example of a frame with 2 stick figures and a tree. The second
half of this figure is an example of a possible next frame, where panning has
resulted in the tree moving down and to the right, and the figures have moved
farther to the right because of their own movement outside of the panning.
The problem for motion estimation to solve is how to adequately represent the
changes, or differences, between these two video frames.
Motion Estimation Example
The way that motion estimation goes about solving this problem is that a comprehensive 2-dimensional
spatial search is performed for each luminance macroblock. Motion estimation is not applied directly to
chrominance in MPEG video, as it is assumed that the color motion can be adequately represented with
the same motion information as the luminance. It should be noted at this point that MPEG does not define
how this search should be performed. This is a detail that the system designer can choose to implement in
one of many possible ways. This is similar to the bit-rate control algorithms discussed previously, in the
respect that complexity vs. quality issues need to be addressed relative to the individual application.
It is well known that a full, exhaustive search over a wide 2-dimensional area yields the best matching
results in most cases, but this performance comes at an extreme computational cost to the encoder.
As motion estimation usually is the most computationally expensive portion of the video encoder, some lower
cost encoders might choose to limit the pixel search range, or use other techniques such as telescopic
searches, usually at some cost to the video quality.
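MPEG leaves the search itself to the designer, so the exhaustive approach described above can be illustrated with a toy full search. This is a sketch under assumptions of our own (list-of-lists frames, a Sum of Absolute Differences criterion, a small search range); real encoders pick their own cost function and range.

```python
# Illustrative sketch (not from the course): exhaustive block-matching motion
# estimation over a +/- search range, using the Sum of Absolute Differences
# (SAD) criterion. Frame layout, block size, and range are assumptions.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def full_search(ref_frame, cur_frame, top, left, size=16, search_range=7):
    """Find the motion vector (dy, dx) into ref_frame that best predicts the
    size x size macroblock of cur_frame at (top, left)."""
    height, width = len(ref_frame), len(ref_frame[0])
    target = get_block(cur_frame, top, left, size)
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(ref_frame, y, x, size))
                if cost < best[1]:
                    best = ((dy, dx), cost)
    return best  # ((dy, dx), SAD of the best match)
```

Limiting `search_range`, or replacing the exhaustive double loop with a telescopic or hierarchical search, is exactly the complexity-versus-quality trade-off mentioned above.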
Figure 7.18 shows an example of a particular macroblock from Frame 2 of Figure 7.17, relative to various
macroblocks of Frame 1. As can be seen, the top candidate has a bad match with the macroblock to be coded.
The middle candidate has a fair match, as there is some commonality between the 2 macroblocks. The bottom
candidate has the best match, with only a slight error between the 2 macroblocks. Because a relatively good match
has been found, the encoder assigns motion vectors to the macroblock, which indicate how far horizontally and
vertically the macroblock must be moved so that a match is made. As such, each forward- and backward-
predicted macroblock may contain 2 motion vectors, so true bidirectionally predicted macroblocks will utilize 4.
Motion Estimation Macroblock Example
Figure 7.19 shows how a potential predicted Frame 2 can be generated from
Frame 1 by using motion estimation. In this figure, the predicted frame is
subtracted from the desired frame, leaving a (hopefully) less complicated residual
 error frame that can then be encoded much more efficiently than before motion
estimation. It can be seen that the more accurate the motion is estimated and
matched, the more likely it will be that the residual error will approach zero, and
the coding efficiency will be highest. Further coding efficiency is accomplished by
 taking advantage of the fact that motion vectors tend to be highly correlated
between macroblocks. Because of this, the horizontal component is compared to
the previously valid horizontal motion vector and only the difference is coded.
This same difference is calculated for the vertical component before coding.
These difference codes are then described with a variable length code for maximum
 compression efficiency.
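The differential coding of motion vectors described above can be sketched as follows. This is an illustration, not the normative MPEG syntax; the reset-to-zero predictor stands in for MPEG's per-slice predictor reset, and the variable-length coding stage is omitted.

```python
# Sketch (an assumption, not the normative MPEG bitstream syntax):
# differential coding of motion vectors across a row of macroblocks.
# Only the difference from the previously coded vector is transmitted;
# the decoder integrates the differences back into absolute vectors.

def encode_mv_differences(vectors):
    """vectors: list of (dx, dy) motion vectors, one per macroblock."""
    prev = (0, 0)                       # predictor reset (per slice in MPEG)
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs

def decode_mv_differences(diffs):
    prev = (0, 0)
    vectors = []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        vectors.append(prev)
    return vectors
```

Because neighboring macroblocks tend to move together, most differences cluster near zero, which is what makes the subsequent variable-length code effective.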
Final Motion Estimation Prediction
Of course not every macroblock search will result in an acceptable match. If the
 encoder decides that no acceptable match exists (again, the "acceptable"
criterion is not MPEG defined, and is up to the system designer) then it has the
option of coding that particular macroblock as an intra macroblock, even though
it may be in a P or B frame. In this manner, high quality video is maintained at a
 slight cost to coding efficiency.
Coding of Predicted Frames:Coding Residual Errors
After a predicted frame is subtracted from its reference and the residual error
 frame is generated, this information is spatially coded as in I frames, by coding
8x8 blocks with the DCT, DCT coefficient quantization, run-length/amplitude
coding, and bitstream buffering with rate control feedback. This process is
basically the same with some minor differences, the main ones being in the DCT
 coefficient quantization. The default quantization matrix for non-intra frames is
 a flat matrix with a constant value of 16 for each of the 64 locations. This is very
 different from that of the default intra quantization matrix which is tailored for
 more quantization in direct proportion to higher spatial frequency content. As in
 the intra case, the encoder may choose to override this default, and utilize another
 matrix of choice during the encoding process, and download it via the encoded
bitstream to the decoder on a picture basis. Also, the non-intra quantization step
function contains a dead-zone around zero that is not present in the intra version.
This helps eliminate any lone DCT coefficient quantization values that might reduce
 the run-length amplitude efficiency. Finally, the motion vectors for the residual
 block information are calculated as differential values and are coded with a
variable length code according to their statistical likelihood of occurrence.
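The effect of the dead zone can be illustrated with a toy quantizer pair. The flat value of 16 comes from the text above; the exact rounding rules here are simplified assumptions, not the normative MPEG formulas.

```python
# Illustrative sketch of the difference between intra-style quantization
# (round to nearest) and non-intra quantization with a dead zone around zero.
# The flat value 16 is the non-intra default named above; the rounding rules
# are simplified assumptions for illustration.

def quantize_non_intra(coeff, quant=16):
    """Truncate toward zero: coefficients with |coeff| < quant fall in the
    dead zone and quantize to 0, suppressing lone small coefficients."""
    return int(coeff / quant)           # int() truncates toward zero

def quantize_intra(coeff, quant=16):
    """Round to the nearest level: small coefficients can survive as +/-1."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + quant // 2) // quant)
```

With the dead zone, an isolated coefficient of magnitude below 16 quantizes to 0 instead of surviving as a lone +/-1, preserving the long zero runs that run-length/amplitude coding depends on.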
Differences from H.261
•Larger gaps between I and P frames, so expand motion vector search range.
•To get better encoding, allow motion vectors to be specified to a fraction of a pixel (half-pixel accuracy).
•Bitstream syntax must allow random access, forward/backward play, etc.
•Added notion of slice for synchronization after loss/corrupt data.
 Example: picture with 7 slices:
•B frame macroblocks can specify two motion vectors (one to the past and
 one to the future), indicating the result is to be averaged.
•Compression performance of MPEG-1
-------------------------------------------------------------------
Type                 Size                        Compression
-------------------------------------------------------------------
I                    18 KB                            7:1
P                     6 KB                           20:1
B                     2.5 KB                         50:1
Avg                   4.8 KB                         27:1
-------------------------------------------------------------------
The MPEG Video Bitstream
 The MPEG Video Bitstream is summarised as follows:
 The public-domain tools mpeg_stat and mpeg_bits will analyze a bitstream.
 Sequence Information
1.Video Params include width, height, aspect ratio of pixels, picture rate.
2.Bitstream Params are bit rate, buffer size, and constrained parameters flag (means
   bitstream can be decoded by most hardware)
3.Two types of QTs: one for intra-coded blocks (I-frames) and one for inter-coded blocks
  (P-frames).
 Group of Pictures (GOP) information
1.Time code: bit field with SMPTE time code (hours, minutes, seconds, frame).
2.GOP Params are bits describing the structure of the GOP: is the GOP closed? is there a
  broken (dangling) prediction link?
 Picture Information
1.Type: I, P, or B-frame?
2.Buffer Params indicate how full decoder's buffer should be before starting decode.
3.Encode Params indicate whether half pixel motion vectors are used.
 Slice information
1.Vert Pos: what line does this slice start on?
2.QScale: How is the quantization table scaled in this slice?
 Macroblock information
1.Addr Incr: number of MBs to skip.
2.Type: Does this MB use a motion vector? What type?
3.QScale: How is the quantization table scaled in this MB?
4.Coded Block Pattern (CBP): bitmap indicating which blocks are coded.
Decoding MPEG Video in Software
 Software Decoder goals: portable, multiple display types
 Breakdown of time

   ---------------------------------------------------
   Function                                % Time
   Parsing Bitstream                        17.4%
   IDCT                                     14.2%
   Reconstruction                           31.5%
   Dithering                                24.5%
   Misc. Arith.                             9.9%
   Other                                    2.7%
    --------------------------------------------------
Intra Frame Decoding
To decode a bitstream generated from the encoder of Figure 7.20, it is
necessary to reverse the order of the encoder processing. In this manner, an
I frame decoder consists of an input bitstream buffer, a Variable Length
Decoder (VLD), an inverse quantizer, an Inverse Discrete Cosine Transform
(IDCT), and an output interface to the required environment (computer hard
drive, video frame buffer, etc.). This decoder is shown in the figure below.
Intra Frame Decoding
The input bitstream buffer consists of memory that operates in the inverse fashion of the buffer in the encoder.
For fixed bit-rate applications, the constant rate bitstream is buffered in the memory and read out at a variable rate
depending on the coding efficiency of the macroblocks and frames to be decoded.
The VLD is probably the most computationally expensive portion of the decoder because it must operate on a bit-wise
basis (VLD decoders need to look at every bit, because the boundaries between variable length codes are random
and non-aligned) with table look-ups performed at speeds up to the input bit-rate. This is generally the only function in the
receiver that is more complex to implement than its corresponding function within the encoder, because of the extensive
high-speed bit-wise processing necessary.
The inverse quantizer block multiplies the decoded coefficients by the corresponding values of the quantization matrix
and the quantization scale factor. Clipping of the resulting coefficients is performed to the region -2048 to +2047, then an
IDCT mismatch control is applied to prevent long term error propagation within the sequence.
The IDCT operation is given in Equation 2, and is seen to be similar to the DCT operation of Equation 1. As such,
these two operations are very similar in implementation between encoder and decoder.
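Equations 1 and 2 are referred to but not reproduced in these notes. For reference, the standard 8x8 DCT/IDCT pair, as commonly stated for MPEG-style coders, can be written as:

```latex
% Equation 1 (forward DCT) and Equation 2 (IDCT), standard 8x8 form
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7}\sum_{y=0}^{7}
         f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}

f(x,y) = \frac{1}{4} \sum_{u=0}^{7}\sum_{v=0}^{7}
         C(u)\, C(v)\, F(u,v)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}

\text{where } C(k) = 1/\sqrt{2} \text{ for } k = 0,\ \text{and } C(k) = 1 \text{ otherwise.}
```

The symmetry of the two sums is what makes the encoder and decoder implementations of this stage so similar.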

Non-Intra Frame Decoding

It was shown previously that the non-intra frame encoder built upon the basic
building blocks of the intra frame encoder, with the addition of motion estimation
and its associated support structures. This is also true of the non-intra frame
decoder, as it contains the same core structure as the intra frame decoder with
the addition of motion compensation support. Again, support for intra frame
decoding is inherent in the structure, so I, P, and B frame decoding is possible.
The decoder is shown in Figure 24.
MPEG-2, MPEG-3, and MPEG-4
• MPEG-2 target applications
---------------------------------------------------------------------------------------
Level         Size              Pixels/sec      Bit-rate        Application
                                                (Mbit/s)
---------------------------------------------------------------------------------------
Low           352 x 240           3 M              4             consumer tape equiv.
Main          720 x 480          10 M             15             studio TV
High 1440    1440 x 1152         47 M             60             consumer HDTV
High         1920 x 1080         63 M             80             film production
---------------------------------------------------------------------------------------
• Differences from MPEG-1
1. Search on fields, not just frames.
2. 4:2:2 and 4:4:4 macroblocks
3. Frame sizes as large as 16383 x 16383
4. Scalable modes: Temporal, Progressive,...
5. Non-linear macroblock quantization factor
6. A bunch of minor fixes (see MPEG FAQ for more details)
• MPEG-3: Originally for HDTV (1920 x 1080), got folded into MPEG-2
• MPEG-4: Originally targeted at very low bit-rate communication (4.8 to 64 kb/sec)
   Now addressing video processing...

Audio Compression
As with video a number of compression techniques have been applied to audio.


 •Simple Audio Compression Methods
 •Psychoacoustics
      •Human hearing and voice
      •Frequency Masking
      •Critical Bands
      •Temporal masking
      •Summary
 •MPEG Audio Compression
      •Some facts
      •Steps in algorithm:
      •Example:
      •MPEG Layers
      •Effectiveness of MPEG audio
 •Streaming Audio (and video)

Simple Audio Compression Methods
Traditional lossless compression methods (Huffman, LZW, etc.) usually don't work
well for audio compression, for the same reason as in image compression: neighboring
samples are correlated but rarely identical, so exact-match coding gains little.
The following are some of the Lossy methods applied to audio compression:
• Silence Compression - detect the "silence", similar to run-length coding
• Adaptive Differential Pulse Code Modulation (ADPCM)
e.g., in CCITT G.721 - 16 or 32 Kbits/sec.
(a) encodes the difference between two consecutive signals,
(b) adapts at quantization so fewer bits are used when the value is smaller.
      o It is necessary to predict where the waveform is headed -> difficult
      o Apple has proprietary scheme called ACE/MACE. Lossy scheme that tries to
        predict where wave will go in next sample. About 2:1 compression.
• Linear Predictive Coding (LPC) fits signal to speech model and then transmits
  parameters of model. Sounds like a computer talking, 2.4 kbits/sec.
• Code Excited Linear Predictor (CELP) does LPC, but also transmits error
  term - audio conferencing quality at 4.8 kbits/sec.
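ADPCM's two ideas, difference coding and step-size adaptation, can be sketched in a few lines. This is an illustration only, not CCITT G.721; the 4-bit code range and the step-adaptation rule are assumptions made for the example.

```python
# Minimal sketch of ADPCM-style coding (an illustration, not CCITT G.721):
# encode each sample as a small quantized difference from a prediction, and
# adapt the step size: larger after big codes, smaller after zero codes.
# The adaptation rule and code range here are assumptions.

def adapt(step, code):
    if abs(code) >= 3:
        return step * 2                 # big differences: widen the step
    if code == 0:
        return max(1, step // 2)        # signal is flat: narrow the step
    return step

def adpcm_encode(samples, init_step=4):
    step, predicted, codes = init_step, 0, []
    for s in samples:
        code = max(-8, min(7, round((s - predicted) / step)))  # 4-bit code
        codes.append(code)
        predicted += code * step        # decoder can track this same value
        step = adapt(step, code)
    return codes

def adpcm_decode(codes, init_step=4):
    step, predicted, out = init_step, 0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = adapt(step, code)
    return out
```

Because encoder and decoder update `predicted` and `step` identically from the transmitted codes, no side information is needed, which is the key property of ADPCM-style schemes.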
Psychoacoustics
These methods are related to how humans actually hear sounds:

- Human hearing and voice
- Frequency Masking
- Critical Bands
- Temporal masking
- Summary
Human hearing and voice
• Range is about 20 Hz to 20 kHz; most sensitive at 2 to 4 kHz.
• Dynamic range (quietest to loudest) is about 96 dB.
• Normal voice range is about 500 Hz to 2 kHz.
    o Low frequencies are vowels and bass
    o High frequencies are consonants
Question: How sensitive is human hearing?
• Experiment: Put a person in a quiet room. Raise level of 1 kHz tone until just
barely audible. Vary the frequency and plot
Frequency Masking
Question: Do receptors interfere with each other?
• Experiment: Play a 1 kHz tone (the masking tone) at a fixed level (60 dB). Play a test
  tone at a different frequency (e.g., 1.1 kHz), and raise its level until just distinguishable.
• Vary the frequency of the test tone and plot the threshold when it becomes
   audible:
• Repeat for various frequencies of masking tones
Critical Bands
• A perceptually uniform measure of frequency: the ear groups frequencies into bands,
  and masking extends over roughly one band rather than a fixed number of Hz.
• The bands are about 100 Hz wide for masking frequencies below 500 Hz, and grow
  larger and larger above 500 Hz.
• This width is called the size of the critical band.
Barks
• Introduce a new unit for frequency called a Bark (after Barkhausen):
  1 Bark = width of one critical band
• For frequency f < 500 Hz:   critical band rate = f / 100 Bark
• For frequency f >= 500 Hz:  critical band rate = 9 + 4 log2(f / 1000) Bark
• Masking thresholds can then be plotted on the critical-band (Bark) scale.
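The Hz-to-Bark mapping can be sketched directly from the piecewise approximation used in these notes (f/100 below 500 Hz, and 9 + 4 log2(f/1000) at and above 500 Hz):

```python
import math

# Sketch of the frequency -> critical-band (Bark) mapping, using the
# piecewise approximation from the notes. Note the two pieces meet
# continuously at 500 Hz (both give 5 Bark).

def to_bark(freq_hz):
    if freq_hz < 500:
        return freq_hz / 100.0
    return 9 + 4 * math.log2(freq_hz / 1000.0)
```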
Temporal masking
• If we hear a loud sound, then it stops, it takes a little while until we can hear a
  soft tone nearby
• Question: how to quantify?
• Experiment: Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at
  40 dB. Test tone can't be heard (it's masked).
   Stop masking tone, then stop test tone after a short delay.
   Adjust delay time to the shortest time that test tone can be heard (e.g., 5 ms).
    Repeat with different level of the test tone and plot:
• Try other frequencies for test tone (masking tone duration constant). Total
   effect of masking
Summary
• If we have a loud tone at, say, 1 kHz, then nearby quieter tones are masked.
• Best compared on critical band scale - range of masking is about 1 critical band
• Two factors for masking - frequency masking and temporal masking
• Question: How to use this for compression?

MPEG Audio Compression


•Some facts
•Steps in algorithm:
•Example:
•MPEG Layers
•Effectiveness of MPEG audio

Some facts
• MPEG-1: 1.5 Mbits/sec for audio and video
About 1.2 Mbits/sec for video, 0.3 Mbits/sec for audio
(Uncompressed CD audio is 44,100 samples/sec * 16 bits/sample * 2 channels >
  1.4 Mbits/sec)
• Compression factor ranging from 2.7 to 24.
• With Compression rate 6:1 (16 bits stereo sampled at 48 KHz is reduced to
  256 kbits/sec) and optimal listening conditions, expert listeners could not
   distinguish between coded and original audio clips.
• MPEG audio supports sampling frequencies of 32, 44.1 and 48 KHz.
• Supports one or two audio channels in one of the four modes:
1. Monophonic - single audio channel
2. Dual-monophonic - two independent channels (similar to stereo)
3. Stereo - for stereo channels that share bits, but not using joint-stereo coding
4. Joint-stereo - takes advantage of the correlations between stereo channels

Steps in algorithm:
1. Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into
     frequency subbands that approximate the 32 critical bands -> sub-band
     filtering.
2. Determine amount of masking for each band caused by nearby band using the
    results shown above (this is called the psychoacoustic model).
3. If the power in a band is below the masking threshold, don't encode it.
4. Otherwise, determine number of bits needed to represent the coefficient such
   that noise introduced by quantization is below the masking effect (Recall that 1
   bit of quantization introduces about 6 dB of noise).
5. Format bitstream
Example:
• After analysis, the first levels of 16 of the 32 bands are these:
----------------------------------------------------------------------
Band        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Level (dB)  0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
----------------------------------------------------------------------
• The level of the 8th band is 60 dB;
it gives a masking of 12 dB in the 7th band and 15 dB in the 9th.
The level in the 7th band is 10 dB ( < 12 dB ), so ignore it.
The level in the 9th band is 35 dB ( > 15 dB ), so send it.
-> It can be encoded with up to 2 bits (= 12 dB) of quantization noise.
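The decision rule in this worked example can be sketched as follows. The masking amounts (12 dB one band below the masker, 15 dB one band above) are the example's numbers, not a full psychoacoustic model, and the "~6 dB per bit" rule of thumb is used to turn masking headroom into saved bits.

```python
# Sketch of the per-band masking decision from the worked example above.
# Thresholds and the 6-dB-per-bit rule are the example's simplifications.

def band_decisions(levels, mask):
    """For each band return None if it is masked (don't encode it),
    otherwise the number of quantization bits that can be saved because
    noise up to the masking threshold is inaudible."""
    out = []
    for level, m in zip(levels, mask):
        if level <= m:
            out.append(None)            # below the mask: skip this band
        else:
            out.append(int(m // 6))     # each 6 dB of mask hides 1 bit of noise
    return out

levels = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]  # bands 1..16
mask = [0.0] * 16
mask[6] = 12.0   # band 7 gets 12 dB of masking from the 60 dB tone in band 8
mask[8] = 15.0   # band 9 gets 15 dB of masking
```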
MPEG Layers
• MPEG defines 3 layers for audio. Basic model is same, but codec complexity
  increases with each layer.
• Divides data into frames, each of them contains 384 samples, 12 samples from
  each of the 32 filtered subbands as shown below.
• Layer 1: DCT type filter with one frame and equal frequency spread per band.
 Psychoacoustic model only uses frequency masking.
• Layer 2: Use three frames in filter (before, current, next, a total of 1152
  samples). This models a little bit of the temporal masking.
• Layer 3: Better critical band filter is used (non-equal frequencies),
  psychoacoustic model includes temporal masking effects, takes into account
   stereo redundancy, and uses Huffman coder.
Effectiveness of MPEG audio
-------------------------------------------------------------------------------------
Layer       Target        Ratio     Quality @      Quality @      Theoretical
            bitrate                 64 kbit/s      128 kbit/s     min. delay
-------------------------------------------------------------------------------------
Layer 1     192 kbit/s     4:1         ---            ---           19 ms
Layer 2     128 kbit/s     6:1      2.1 to 2.6         4+           35 ms
Layer 3      64 kbit/s    12:1      3.6 to 3.8         4+           59 ms
-------------------------------------------------------------------------------------
• 5 = perfect, 4 = just noticeable, 3 = slightly annoying, 2 = annoying,
  1 = very annoying
• Real delay is about 3 times theoretical delay

Streaming Audio (and video)

Popular new delivery medium for the Web and other Multimedia networks


 Examples of streamed audio (and video)
 •Real Audio
 •Shockwave
 •.wav files (not video, obviously)
Plan

 Multimedia Networking Applications
 Streaming stored audio and video
      RTSP
 Real-time Multimedia: Internet Phone Case Study
 Protocols for Real-Time Interactive Applications
      RTP, RTCP
      SIP
 Beyond Best Effort
 Scheduling and Policing Mechanisms
 Integrated Services
 RSVP
 Differentiated Services
 Streaming Stored Multimedia
Application-level streaming techniques for making the best out of
  best-effort service:
     client-side buffering
     use of UDP versus TCP
     multiple encodings of multimedia
Media Player functions:
    jitter removal
    decompression
    error concealment
    graphical user interface with controls for interactivity
Internet multimedia: simplest approach

                       audio or video stored in file
                       files transferred as HTTP object
                           received in entirety at client
                           then passed to player

audio, video not streamed:
 no "pipelining", long delays until playout!
Internet multimedia: streaming approach




 browser GETs metafile
 browser launches player, passing metafile
 player contacts server
 server streams audio/video to player
  Streaming from a streaming server




 This architecture allows for non-HTTP protocol between
  server and media player
 Can also use UDP instead of TCP.
Streaming Multimedia: Client Buffering

[Figure: constant-bit-rate video transmission crosses a variable network
 delay; the client buffers received video and begins constant-bit-rate
 playout only after a client playout delay]
 Client-side buffering, playout delay compensate
  for network-added delay, delay jitter
Streaming Multimedia: Client Buffering



[Figure: client buffer filling at variable rate x(t) and draining at
 constant playout rate d]
  Client-side buffering, playout delay compensate
   for network-added delay, delay jitter
Streaming Multimedia: UDP or TCP?
UDP
 server sends at rate appropriate for client (oblivious to
  network congestion !)
    often send rate = encoding rate = constant rate
    then, fill rate = constant rate - packet loss
 short playout delay (2-5 seconds) to compensate for network
  delay jitter
 error recovery: time permitting
TCP
 send at maximum possible rate under TCP
 fill rate fluctuates due to TCP congestion control
 larger playout delay: smooth TCP delivery rate
 HTTP/TCP passes more easily through firewalls
Streaming Multimedia: client rate(s)
[Figure: the same video encoded at 1.5 Mbps and at 28.8 Kbps]
Q: how to handle different client receive rate
  capabilities?
    28.8 Kbps dialup
    100Mbps Ethernet
A: server stores, transmits multiple copies
   of video, encoded at different rates
User Control of Streaming Media: RTSP

HTTP
 Does not target multimedia content
 No commands for fast forward, etc.
RTSP: RFC 2326
 Client-server application-layer protocol.
 For user to control display: rewind, fast forward, pause, resume,
  repositioning, etc.
What it doesn't do:
 does not define how audio/video is encapsulated for streaming over the
  network
 does not restrict how streamed media is transported; it can be transported
  over UDP or TCP
 does not specify how the media player buffers audio/video
RTSP Example

Scenario:
 metafile communicated to web browser
 browser launches player
 player sets up an RTSP control connection, data
  connection to streaming server
            Telephony over the Internet
 6.1 Multimedia Networking Applications
 6.2 Streaming stored audio and video
     RTSP
 6.3 Real-time, Interactive Multimedia: Internet Phone Case Study
     VoIP
 6.4 Protocols for Real-Time Interactive Applications
     RTP, RTCP
     SIP
 6.5 Beyond Best Effort
 6.6 Scheduling and Policing Mechanisms
 6.7 Integrated Services
 6.8 RSVP
 6.9 Differentiated Services
  Real-time interactive applications

 PC-2-PC phone
     instant messaging services are providing this
 PC-2-phone
     Dialpad
     Net2phone
 videoconference with Webcams

We are now going to look at a PC-2-PC Internet phone example in detail.
Interactive Multimedia: Internet Phone
 Introduce Internet Phone by way of an example
  speaker's audio: alternating talk spurts, silent
   periods.
       64 kbps during talk spurt
  pkts generated only during talk spurts
       20 msec chunks at 8 Kbytes/sec: 160 bytes data
  application-layer header added to each chunk.
  Chunk+header encapsulated into UDP segment.

  application sends UDP segment into socket every
   20 msec during talkspurt.
Internet Phone: Packet Loss and Delay

 network loss: IP datagram lost due to network
  congestion (router buffer overflow)
 delay loss: IP datagram arrives too late for
  playout at receiver
      delays: processing, queueing in network; end-system
       (sender, receiver) delays
      typical maximum tolerable delay: 400 ms
 loss tolerance: depending on voice encoding, losses
  concealed, packet loss rates between 1% and 10%
  can be tolerated.
Delay Jitter

[Figure: constant-bit-rate transmission crosses a variable network delay
 (jitter); the client buffers data and begins constant-bit-rate playout
 only after a client playout delay]
  Consider the end-to-end delays of two consecutive
   packets: difference can be more or less than 20
   msec
Internet Phone: Fixed Playout Delay

 Receiver attempts to playout each chunk exactly q
  msecs after chunk was generated.
    chunk has time stamp t: play out chunk at t+q .
    chunk arrives after t+q: data arrives too late
     for playout, data “lost”
 Tradeoff for q:
    large q: less packet loss
    small q: better interactive experience
Fixed Playout Delay
• Sender generates packets every 20 msec during a talk spurt.
• First packet received at time r
• First playout schedule: begins at p
• Second playout schedule: begins at p'
  Adaptive Playout Delay, I
   Goal: minimize playout delay, keeping late loss rate low
   Approach: adaptive playout delay adjustment:
         Estimate network delay, adjust playout delay at beginning of
          each talk spurt.
         Silent periods compressed and elongated.
         Chunks still played out every 20 msec during talk spurt.




Dynamic estimate of the average delay at the receiver:

   d_i = (1 - u) * d_{i-1} + u * (r_i - t_i)

where t_i is the timestamp of packet i, r_i is the time it is received, and
u is a fixed constant (e.g., u = .01).
                Adaptive playout delay II
Also useful to estimate the average deviation of the delay, v_i:

   v_i = (1 - u) * v_{i-1} + u * |r_i - t_i - d_i|

The estimates d_i and v_i are calculated for every received packet,
although they are only used at the beginning of a talk spurt.

For the first packet in a talk spurt, the playout time is:

   p_i = t_i + d_i + K * v_i

where K is a positive constant.

Remaining packets in talkspurt are played out periodically
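The per-packet estimate updates and the per-talk-spurt playout decision can be sketched as follows. The smoothing constant u comes from the text; the safety factor K = 4 is a common choice but an assumption here.

```python
# Sketch of adaptive playout delay: exponentially weighted averages of the
# network delay and its deviation, updated for every packet and applied at
# the start of each talk spurt. K = 4 is an assumed safety factor.

U = 0.01   # smoothing constant u from the notes
K = 4      # safety factor (a common choice; an assumption here)

def update_estimates(d, v, t_i, r_i):
    """t_i: sender timestamp, r_i: receive time. Returns updated (d, v)."""
    delay = r_i - t_i
    d = (1 - U) * d + U * delay                 # average delay estimate
    v = (1 - U) * v + U * abs(delay - d)        # average deviation estimate
    return d, v

def playout_time(t_first, d, v):
    """Playout time for the first packet of a talk spurt: p = t + d + K*v."""
    return t_first + d + K * v
```

Later packets in the same spurt are then played 20 msec apart, so the whole spurt inherits the delay chosen for its first packet.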
 Adaptive Playout, III

Q: How does receiver determine whether packet is
  first in a talkspurt?
 If no loss, receiver looks at successive timestamps.
       difference of successive stamps > 20 msec --> talk spurt begins.
 With loss possible, receiver must look at both timestamps and sequence
  numbers.
       difference of successive stamps > 20 msec and sequence numbers
        without gaps --> talk spurt begins.
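The rule above can be sketched directly; the field names are assumptions for illustration, and timestamps are taken to be in milliseconds:

```python
# Sketch of the talk-spurt detection rule: with sequence numbers available,
# a new talk spurt is declared only when no packet was lost (consecutive
# sequence numbers) AND the timestamp gap exceeds one 20-msec interval.

CHUNK_MS = 20

def starts_talkspurt(prev_seq, prev_ts, seq, ts):
    consecutive = (seq == prev_seq + 1)          # no gap in sequence numbers
    return consecutive and (ts - prev_ts) > CHUNK_MS
```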
         Recovery from packet loss (1)
Forward error correction (FEC): a simple scheme
 for every group of n chunks, create a redundant chunk by exclusive OR-ing
  the n original chunks
 send out n+1 chunks, increasing the bandwidth by a factor of 1/n
 can reconstruct the original n chunks if at most one of the n+1 chunks
  is lost
Playout delay needs to be fixed to the time to receive all n+1 packets.
Tradeoff:
    increase n: less bandwidth waste
    increase n: longer playout delay
    increase n: higher probability that 2 or more chunks will be lost
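The XOR scheme can be sketched in a few lines; chunk contents and group size are arbitrary here:

```python
# Sketch of the simple XOR FEC scheme: one parity chunk per group of n
# chunks; any single lost chunk in the group can be rebuilt by XOR-ing
# the n chunks that did arrive.

def xor_chunks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(chunks):
    """chunks: list of n equal-length byte strings -> n+1 chunks to send."""
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_chunks(parity, c)
    return chunks + [parity]

def recover(received):
    """received: the n+1 chunks, with at most one None marking a loss."""
    missing = [i for i, c in enumerate(received) if c is None]
    if not missing:
        return received[:-1]                 # nothing lost: drop the parity
    present = [c for c in received if c is not None]
    rebuilt = present[0]
    for c in present[1:]:
        rebuilt = xor_chunks(rebuilt, c)     # XOR of the survivors = lost chunk
    out = list(received)
    out[missing[0]] = rebuilt
    return out[:-1]
```

The receiver must wait for all n+1 packets of a group before playout, which is exactly the playout-delay cost listed in the tradeoff above.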
          Recovery from packet loss (2)
2nd FEC scheme: "piggyback lower-quality stream"
• send a lower-resolution audio stream as the redundant information
• for example, nominal stream PCM at 64 kbps and redundant stream GSM
  at 13 kbps
• whenever there is non-consecutive loss, the receiver can conceal the loss
• can also append the (n-1)st and (n-2)nd low-bit-rate chunks
         Recovery from packet loss (3)




Interleaving
 chunks are broken up into smaller units
 for example, 4 five-msec units per chunk
 a packet contains small units from different chunks
 if a packet is lost, the receiver still has most of every chunk
 no redundancy overhead, but adds to playout delay
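The interleaving idea can be sketched as a transpose of chunks and packets; the 4-units-per-chunk layout follows the example above, and unit contents are arbitrary placeholders:

```python
# Sketch of interleaving: each packet carries one small unit from each of
# several consecutive chunks, so one lost packet removes only a fraction of
# every chunk instead of one whole chunk.

def interleave(chunks):
    """chunks: list of equal-length lists of units -> list of packets.
    Packet i carries unit i of every chunk (a transpose)."""
    return [list(packet) for packet in zip(*chunks)]

def deinterleave(packets, lost=()):
    """Rebuild chunks; units carried by lost packet indices become None."""
    units_per_chunk = len(packets)
    n_chunks = len(packets[0])
    chunks = [[None] * units_per_chunk for _ in range(n_chunks)]
    for p_idx, packet in enumerate(packets):
        if p_idx in lost:
            continue                         # this packet never arrived
        for c_idx, unit in enumerate(packet):
            chunks[c_idx][p_idx] = unit
    return chunks
```

Each chunk's missing unit can then be concealed (e.g., repeated from a neighbor), which is far less audible than losing a whole 20-msec chunk.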
Summary: Internet Multimedia: bag of tricks

 use UDP to avoid TCP congestion control (delays) for time-
  sensitive traffic
 client-side adaptive playout delay: to compensate for delay
 server side matches stream bandwidth to available client-to-
  server path bandwidth
       choose among pre-encoded stream rates
      dynamic server encoding rate
 error recovery (on top of UDP)
      FEC, interleaving
      retransmissions, time permitting
      conceal errors: repeat nearby data
 PART II
Voice over IP
  Voice Over IP: PLAN
 Introduction
 VoIP: introduction
VoIP is an end-to-end architecture which exploits processing in the end
points (and, optionally, a VoIP server), unlike the traditional Public
Switched Telephone Network, where processing is done inside the network.

Network Convergence:

In the past there were many different networks, each optimized for a specific
use: POTS, data networks (such as X.25), broadcast radio and television, etc.,
and each of these in turn often had specific national, regional, or
proprietary implementations.

Now we think about a converged network, which is a global network.
    VoIP: introduction
Potential Networks

We will focus on VoIP, largely independently of the underlying network,
i.e., LAN, Cellular, WLAN, PAN, Ad hoc, ...
Internetworking is
• based on the interconnection (concatenation) of multiple networks
• accommodates multiple underlying hardware technologies by
   providing a way to interconnect heterogeneous networks and makes
   them inter-operate.
The Public Switched Telephone Network (PSTN) uses a fixed sampling rate,
typically 8 kHz, with 8-bit coding; this results in 64 kbps voice coding.
VoIP, however, is not limited to this coding and could have higher or lower
data rates depending on the CODEC(s) used, the available bandwidth between
the end points, and the user's preference(s).
One of the interesting possibilities which VoIP offers is quality which is:
• better than "toll grade" telephony, or
• worse than "toll grade" telephony (but perhaps still acceptable).
This is unlike the fixed quality of traditional phone systems.
VoIP: introduction
VoIP a major market
Voice over IP has developed as a major market, which began with H.323 and has
now moved to SIP. There are increasing numbers of users and a large variety of
VoIP hardware and software on the market. With increasing numbers of vendors,
the competition is heating up - is it a maturing market?
“Cisco began selling its VoIP gear to corporations around 1997, but
until the past year, sales were slow. Cisco notes that it took more than three
   years to sell its first 1 million VoIP phones, but the next 1 million
took only 12 months.”
As of their fiscal year 2005 (ending July 30, 2005), they had shipped their
6 millionth IP phone[9]. (This is 3 million more than one year earlier.)
    VoIP: introduction
    VOIP Modes of Operation

•   PC to PC
•   PC-to-Telephone calls
•   Telephone-to-PC calls
•   Telephone-to-Telephone calls via the Internet
•   Premises to Premises
        use IP to tunnel from one PBX/Exchange to another
        see Time Warner‟s “Telecom One Solution”
• Premises to Network
        use IP to tunnel from one PBX/Exchange to a gateway
        of an operator
• Network to Network
        from one operator to another or from one operator‟s
         regional/national network to the same operator in
         another region or nation
VoIP: introduction
IP based data+voice infrastructure
   VoIP: introduction
   Voice Gateway




 Use access servers filled with digital modems (currently, or perhaps formerly,
   used for analog modem pools) as voice gateways, or special-purpose gateways
   such as that of Li Wei [4].
  VoIP: introduction
Voice over IP (VOIP) Gateways
Gateways not only provide basic telephony and fax services, but can
   also enable lots of value-added services, e.g., call-centers,
   integrated messaging, least-cost routing, … .
 Such gateways provide three basic functions:
• Interface between the PSTN network and the Internet
  Terminate incoming synchronous voice calls, compress the voice,
   encapsulate it into packets, and send it
   as IP packets. Incoming IP voice packets are unpacked,
   decompressed, buffered, and then sent out as synchronous voice to
   the PSTN connection.
• Global directory mapping
  Translate between the names and IP addresses of the Internet
   world and the E.164 telephone numbering scheme of the PSTN
   network.
• Authentication and billing
Voice representation
  VoIP: introduction
Signaling
Signaling is based on the H.323 standard on the LAN, while conventional
   signaling is used on telephone networks.
NB: In conventional telephony networks signalling only happens at the
   beginning and end of a call. See Theo Kanter‟s dissertation for what
   can be enabled via SIP so that you can react to other events.
Fax Support
Both store-and-forward and real-time fax modes.
• In store-and-forward the system records the entire FAX before
   transmission.
Management
Full SNMP management capabilities via MIBs (Management
   Information Base)
• provided to control all functions of the Gateway
• Extensive statistical data will be collected on dropped calls,
   lost/resent packets, and network delays.
  VoIP: introduction
Compatibility
De jure standards:
• ITU G 723.1/G.729 and H.323
• VoIP Forum IA 1.0
De facto standards:
• Netscape‟s Cooltalk
• Microsoft‟s NetMeeting (formerly H.323,
  now SIP)
Session Initiation Protocol (SIP) [RFC 2543]
  is much simpler than H.323
  VoIP: introduction
Cisco’s Voice Over IP
Enables Cisco 3600 series routers to carry live voice
   traffic (e.g., telephone calls and faxes) over an IP
   network.
 They state that this could be used for:
• Toll bypass
• Remote PBX presence over WANs
• Unified voice/data trunking
• POTS-Internet telephony gateways
 Uses Real-Time Transport Protocol (RTP) for carrying
   packetized audio and video traffic over an IP network.
 Cisco 3600 supports a selection of CODECs:
• G.711 A-Law 64,000 bits per second (bps)
• G.711 u-Law 64,000 bps
• G.729 8000 bps
    VoIP: introduction
 Cisco 3800 supports even more CODECs:
•   ITU G.726 standard, 32k rate
•   ITU G.726 standard, 24k rate
•   ITU G.726 standard, 16k rate
•   ITU G.728 standard, 16k rate (default)
•   ITU G.729 standard, 8k rate
   By using Voice Activity Detection (VAD) you only need to send traffic
    if there is something to send {Note: telecom operators like this
    because it enables even higher levels of statistical multiplexing}.
   An interesting aspect is that users worry when they hear absolute
    silence, so to help make them comfortable it is useful to play noise
    when there is nothing useful to output. Cisco provides a
    "comfort-noise" command to generate background noise to fill silent
    gaps during calls.
    VoIP: introduction
Intranet Telephone System
 On January 19, 1998, Symbol Technologies and Cisco
   Systems announced that
 they had combined the Symbol Technologies‟ NetVision™
   wireless LAN handset
 and Cisco 3600 to provide a complete wireless local area
   network telephone system based on Voice-Over-IP
   technology.
 The handset uses a wireless LAN (IEEE 802.11)
   infrastructure and a voice gateway via Cisco 3600 voice/
   fax modules. The system conforms to H.323.
 "I believe that this is the first wireless local area network
   telephone based on this technology" -- Jeff Pulver
 Seamless roaming via Symbol‟s pre-emptive roaming
   algorithm with load balancing.
 Claims each cell can accommodate ~25 simultaneous, full-
   duplex phone calls. Ericsson partnered with Symbol, using
  VoIP: introduction
 Wireless LANs
 “The wireless workplace will soon be upon us1
 Telia has strengthened its position within the area of
  radio-based data solutions through the acquisition of
  Global Cast Internetworking. The company will
  primarily enhance Telia Mobile‟s offering in wireless
  LANs and develop solutions that will lead to the
  introduction of
 the wireless office. A number of different alternatives to
  fixed data connections are currently under development
  and, later wireless IP telephony will also be introduced.
 …
 The acquisition means that Telia Mobile has secured the
  resources it needs to maintain its continued expansion
  and product development within the field of radio-based
  LAN solutions. Radio LANs are particularly suitable for
  use by small and medium-sized companies as well as by
VoIP: introduction
 Telia’s HomeRun
 http://www.homerun.telia.com/
 A subscription based service to link you to your corporate network
  from airports,
 train stations, ferry terminals, hotels, conference centers, etc. via
  WLAN.
 Look for Telia‟s HomeRun logo:
  VoIP: introduction
Ericsson’s "GSM on the Net"
• Provide communication services over an integrated GSM- IP
   (Internet Protocol) network
• support local and global mobility
• support multimedia capabilities and IP-based applications
• uses small radio base stations to add local-area GSM coverage to
   office LANs
• provides computer-telephony integration: applications include
   web-initiated telephony, directory-assisted dialing, unified
   messaging and advanced conferencing and application-sharing
   using voice datacoms and video.
  VoIP: introduction
VOIP vs. traditional telephony
 As of 2003 approx. 14% of international traffic to/from
  the US is via VoIP, based on 24 billion minutes vs. 170.7
  billion minutes via PSTN [10] (the article cites the source
  of data as TeleGeography Research Group/Primetrica Inc.)
 As of December 2004, commercial VoIP calling plans for
  unlimited North American traffic cost ~US$20-30/month.
 There is a move for traditional operators to replace their
  exchanges with IP
  VoIP: introduction
Economics
 “Can Carriers Make Money On IP Telephony?” by Bart Stuck and
  Michael
 "What is the reality in the battle over packet-versus-circuit
  telephony, and what is hype?
 Looking at the potential savings by cost element, it is clear that in
  1998, access arbitrage is
 the major economic driver behind VOIP. By 2003, we
  anticipate that switched-access arbitrage will diminish in
  importance, as the ESP exemption disappears and/or access rates
 drop to true underlying cost.
 However, we believe that the convergence between voice and data
  via packetized networks
 will offset the disappearance of a gap in switched access
  costs. As a result, VOIP will continue to enjoy a substantial
  advantage over circuit-switched voice. Indeed, as voice/data
  VoIP: introduction
VoIP vs. traditional telephony
 Henning Schulzrinne in a slide entitled “Why should carriers
   worry?”1 nicely states the threats to traditional operators:
• Evolution from application-specific infrastructure ⇒ content-
   neutral bandwidth delivery mechanism - takes away the large
   margins which the operators are used to (and want!): – "GPRS:
   $4-10/MB, SMS: >$62.50/MB, voice (mobile and landline):
   $1.70/MB"
• Only operators can offer services ⇒ anybody can offer phone
   services
• SIP only needs to handle signaling, not media traffic
• High barriers to entry ⇒ no regulatory hurdles2
In addition to this we can add:
• Only vendors can create services ⇒ anybody can create a service
 NB. These new services can be far broader than traditional
   telephony services.
  VoIP: introduction
Dérégulation Trends
• replacing multiplexors with Routers/Switches/… << 1/10 circuit swi.
  cost
• Standard telco interfaces being replaced by datacom interfaces
• New Alliances:
  • HP/AT&T Alliance, 3Com/Siemens, Bay/Ericsson,
  Cabletron/Nortel, Alcatel integrating Cisco IOS software
  technology, Ericsson Radio Systems & Cisco Systems collaborate
  wireless Internet services, …
• future developments building on VOIP
      Fax broadcast, Improved quality of service, Multipoint audio bridging,
       Text-to-speech conversion and Speech-to-Text conversion, Voice
       response systems, …
       Replacing the wireless voice network‟s infrastructure with IP:
    U. C. Berkeley‟s ICEBERG: Internet-based core for Cellular
  networks BEyond the thiRd Generation
  VoIP: introduction
Carriers offering VOIP
 “Equant, a network services provider, will announce tomorrow
  that it is introducing voice-over-frame relay service in 40
  countries, ... The company says customers can save 20% to 40%
  or more by sending voice traffic over its frame relay network.
  "This is the nearest you‟re going to get to free voice," says
  Laurence Huntley, executive VP of marketing for Equant
  Network Service. … Equant isn‟t alone in its pursuit to send
  voice traffic over data networks. Most of the major carriers
  are testing services that would send voice over data networks.
  ... .”1
• October 2002:
   • Verizon offering managed IP telephony via IPT Watch for
  US$3-4/month
   • WorldCom offering SIP based VoIP for DSL customers for
  US$50-60/month for unlimited local, domestic long distance,
  and data support {price does not include equipment at
  VoIP: introduction
MCI (formerly WorldCom) Connection
 Previously
 • 3 or more separate networks (often each had its own staff!)
 • Duration/geography-based pricing
 • Expensive moves, adds, and changes (typically 1+
   move/person/year)
  • Standalone applications - generally expensive
  • Closed PBX architecture
 Today
  • via gateway to the PSTN, service expands beyond the LAN to the
   WAN
  • centralized intelligence is offered; customers utilize a Web
   browser to control and manage their network
  • MCI incurs the costs of buying major equipment, thus limiting
   customer‟s risk and capital investment
  VoIP: introduction
Level 3 Communications Inc.
Introduced (3)VoIP Toll Free service: “a toll-free calling
   service across the United
States, rounding out its local and long distance voice over
   Internet protocol offerings.”
 Antone Gonsalves, E-BUSINESS: Level 3 Rounds Out VoIP
   Offerings, Internetweek.com, January 13, 2004,
 http://www.internetweek.com/e-
   business/showArticle.jhtml?articleID=17300739
 Level 3 sells services to carriers, who then offer VoIP and
   data services to their customers.
 Uses softswitch networking technology to convert voice
   signals from the PSTN
 to IP packets and conversely converts packets to voice
   signals when a call is routed
 to the public switched network. (>30 x 109 minutes of calls
  VoIP: introduction
TeliaSonera Bredbandstelefoni
February 5th, 2004 TeliaSonera announced their
  residential broadband telephony service using server
  and client products from Hotsip AB (www.hotsip.com).
  In addition to telephony, the service includes: video
  calls, presence, and instant messaging.[6]
• The startup cost is 250 kr and the monthly cost 80
  kr.
• Calls to the fixed PSTN network are the same price
  as if you called from a fixed telephone in their
  traditional network.
• Customers get a telephone number from the
  “area/city” code 075 (i.e.,+46 75-15xxxxxxx)
• They do not support calls to “betalsamtal” (0900-
  numbers)
  VoIP: introduction
Emulating the PSTN
Many people feel that VoIP will really only “take off” when it can
  really emulate all the functions which users are used to in the
  PSTN:
• Integration with the web via: Click-to-connect
• “Dialing” an e-mail address or URL {digits vs. strings}
• Intelligent network (IN) services:
   •   Call forward, busy
   •   Call forward, no ans.
   •   Call forward, uncond.
   •   Call hold
   •   Call park
   •   Call pick-up
   •   Call waiting
   •   Consultation hold
   •   Do not disturb
  VoIP: introduction
• additional PBX features (which in Sweden means providing
  functions such as “I‟m on vacation and will not return until 31
  August 2005”)
• Computer-Telephony Integration (CTI), including Desktop call
  management, integration with various databases, etc.
• PSTN availability and reliability (thus the increasing use of
  Power over Ethernet for ethernet attached IP phones - so the
  wall outlet does not have to provide power for the phone to
  work)
• Roaming - both personal and device mobility
• Phone number portability
• E911 service {How do you handle geographic location of the
  station?}
  VoIP: introduction
Calling and Called Features
• Calling feature - activated when placing a call
  • e.g., Call Blocking and Call Return
• Called feature - activated when this entity would be the
   target of a call
   • Call Screening and Call Forward
  VoIP: introduction
Beyond the PSTN: Presence & Instant Messaging
• Presence, i.e., Who is available?
• Location, i.e., Where are they?: office, home, traveling, …
• Call state: Are they busy (in a call) or not?
• Willingness: Are they available or not?
• Preferred medium: text message, e-mail, voice, video, …
• Preferences (caller and callee preferences)
See Sinnreich and Johnston‟s Chapter 11 (Presence and
  Instant Communications) & course 2G5565 Mobile
  Presence: Architectures, Protocols, and Applications.
• Reuters has deployed a SIP-based instant-messaging
  platform for the financial services industry that has
  50,000 users each week.
• IBM‟s NotesBuddy application for ~315k employees - an
  experimental messaging client that integrates instant
  messaging (IM), email, voice, and other communication.
  VoIP: introduction
Presence-Enabled Services
• Complex call screening
  • Location-based: home vs. work
  • Caller-based: personal friend or business colleague
  • Time-based: during my “working hours” or during my
   “personal time”
• Join an existing call ⇒ instant conferencing, group chat
   sessions, …
• Creating a conference when a specific group of people are
   all available and willing to be called
• New services that have yet to be invented! (This is a good
   area for projects in 2G5565 Mobile Presence:
   Architectures, Protocols, and Applications)
• SIP Messaging and Presence Leveraging Extensions
   (SIMPLE) Working Group was formed in March 2001
   VoIP: introduction
Three major alternatives for VoIP
Concept                                                             Implementation
Use signalling concepts from the traditional telephony industry           H.323
Use control concepts from the traditional telephony industry          Softswitches
Use an internet-centric protocol                        Session Initiation Protocol (SIP)


SIP  a change from telephony‟s “calls” between handsets controlled
  by the network to “sessions” which can be between processes on
  any platform anywhere in the Internet and with both control and
  media content in digital form and hence can be easily manipulated.
  • thus a separate voice network is not necessary
  • open and distributed nature enables lots of innovation
    – since both control and media can be manipulated and
    – “events” are no longer restricted to start and end of calls
  VoIP: introduction
Negatives
Although VoIP equipment costs less than PBXs:
• the technology is new and thus upgrades are frequent (this
   takes time and effort)
• PBXs generally last ~10 years and public exchanges ~30 years,
   while VoIP equipment is mostly computer equipment with a ~3
   year amortization
  VoIP: introduction
Deregulation ⇒ New Regulations
“I am preparing legislation to preserve the free regulatory
   framework that has allowed VoIP applications to reach
   mainstream consumers,” Sununu, Republican from New
   Hampshire, said in a statement. "VoIP providers should be free
   from state regulation, free from the complexity of FCC
   regulations, free to develop new solutions to address social
   needs, and free to amaze consumers."
  VoIP: introduction
Programmable “phone”
 Programming environments
• Symbian
• Java
• Linux
• …
Avoids lock-in driven by operators and telecom equipment
   vendors
Greatly increases numbers of developers
 more (new) services
 more security problems
  VoIP: introduction
Conferences: Voice on the Net (VON), http://www.von.com/
Interoperability testing:
• SIP development community‟s interoperability testing event
   is called SIPit http://www.sipit.net/ 1. Note: The SIPit event
   is closed to the public and press, and no information is
   released about which products fail to comply with the
   standard.
  • Why have it closed? So that the testing can be done
   without risk of public embarrassment.
• Interoperability is one of the most important aspects of wide
   deployment using multiple vendors products[5].
• Proper handling of server failover is considered by some to be
   the most critical interoperability issue at present[5].
  VoIP: introduction
Not without problems
The transition to VoIP is not necessarily smooth. Numerous
   organizations have faced problems [14] and there remain
   vast areas where further work is needed.
Potential for Spam over Internet Telephony (SPIT), Denial
   of Service, …
     VoIP: introduction/ Deployment
[Figure: deployment scenario - a corporate LAN (H.323, SIP) with a gatekeeper,
a gateway, and a multipoint control unit; a router connects the LAN to the
Internet, and the gateway connects to the switched circuit network (POTS and
ISDN), reaching H.320 terminals (over ISDN), H.324 terminals (over POTS), and
speech-only telephones.]
VoIP: introduction/ Desktop and
Room Videoconferencing Systems
 VoIP: introduction/ 3 Ways to
Videoconference over the Internet




           1. Point-to-Point
     VoIP: introduction/ 3 Ways to
Videoconference over the Internet (Contd.)




           2. Multi-Point Star Topology
     VoIP: introduction/ 3 Ways to
Videoconference over the Internet (Contd.)




         3. Multi-Point Multi-Star Topology
  VoIP: introduction/ Signaling Protocols
                   Terminology
 Call Establishment and Teardown
 Call Control and Supplementary Services
    Call waiting, Call hold, Call transfer

 Capability Exchange
 Admission Control
 Protocol Encoding (ASN.1, HTTP)
         VoIP: Details
Traditional Telecom vs. Datacom
Circuit-switched:
• standardized interfaces
• lots of internal state (i.e., in each switch and other network nodes)
• long setup times - since the route (with QoS) has to be set up from
  end-to-end before there is any further traffic
• services built into the network ⇒ hard to add new services
  • operators decide what services users can have
  • all elements of the net have to support the service before it can be
    introduced
  • Application programming interfaces (APIs) are often vendor specific
    or even proprietary
• centralized control
• "carrier class" equipment and specifications
  • target: very high availability, 99.999% (5 min./year of unavailability)
  • all equipment, links, etc. must operate with very high availability
• long tradition of slow changes: PBXs ~10 years; public exchanges ~30 years
• clear operator role (well enshrined in public law)

Packet-switched:
• standardized protocols and packet formats
• very limited internal state
  • caches and other state are soft state, dynamically built based on traffic
  • no session state in the network
• End-to-End Argument ⇒ integrity of communications is the responsibility of
  the end nodes, not the network
• services can be added by anyone
  • since they can be provided by any node attached to the network
  • users control their choice of services
• no central control ⇒ no one can easily turn it off
• a mix of "carrier class", business, and consumer equipment
  • backbone target: high availability >99.99% (50 min./year of unavailability)
  • local networks: availability >99% (several days/year of unavailability)
  • in aggregate there is extremely high availability, because most of the
    network elements are independent
• short tradition of very fast change: Moore's Law doublings at 18 or 9 months!
• unclear what the role of operators is (or even who is an operator)
     VoIP: Details
VoIP details: Protocols and Packets


Carry the speech frame inside an RTP packet




Typical packetization time of 10-20 ms per audio frame.

This should be compared to the durations relevant to speech phenomena:
• 10 ms: smallest difference detectable by the auditory system (localization)
• 3 ms: shortest phoneme (plosive burst)
• 10 ms: glottal pulse period
• 100 ms: average phoneme duration
• 4 s: exhale period during speech
(from a slide titled "What is a 'short' window of time?")
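A rough calculation of what those packetization times mean for packet sizes. This sketch assumes G.711 at 64 kbps and typical IPv4 + UDP + RTP headers (20 + 8 + 12 bytes, with no options, extensions, or CSRC entries):

```python
# Audio payload per packet for a given packetization time, and the
# fraction of each packet consumed by headers.
def payload_bytes(ptime_ms, bit_rate_bps=64_000):
    return bit_rate_bps // 8 * ptime_ms // 1000

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed header

for ptime in (10, 20):
    p = payload_bytes(ptime)
    overhead = HEADER_BYTES / (HEADER_BYTES + p)
    print(f"{ptime} ms -> {p} byte payload, headers are {overhead:.0%} of the packet")
```

Short packetization times keep the mouth-to-ear delay down but push the header overhead up - one reason 20 ms is such a common choice.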
    VoIP: Details
RTP and H.323 for IP Telephony


[Figure: H.323 protocol stack - audio/video applications (video codec, audio
codec, RTCP) run over RTP and UDP; signaling and control (H.225 registration,
H.225 signaling, H.245 control) and data applications (T.120) run over TCP;
everything runs over IP.]

 H.323         framework of a group of protocols for IP telephony (from ITU)
 H.225         Signaling used to establish a call
 H.245         Control and feedback during the call
 T.120         Exchange of data associated with a call
 RTP           Real-time data transfer
 RTCP          Real-time Control Protocol
       VoIP: introduction
   RTP, RTCP, and RTSP
[Figure: protocol stack - audio/video applications (video and audio CODECs,
RTP, RTCP) run over UDP; signaling and control (SDP, SIP) run over UDP or TCP;
streaming applications (CODECs, RTSP) run over TCP or UDP; everything runs
over IP.]
  VoIP: introduction
Real-Time Delivery
 In a real-time application, data must be delivered with the same
  time relationship as it was created (but with some delay)
 Two aspects of real-time delivery (for protocols):
   Order: data should be played in the same order as it was created
   Time: the receiver must know when to play the packets, in order to
    reproduce the same signal as was input
 We keep these separate by using a sequence number for order and
  a time stamp for timing.
 Consider an application which transmits audio by sending datagrams
  every 20ms, but does silence detection and avoids sending packets
  of only silence. Thus the receiver may see that the time stamp
  advances by more than the usual 20ms, but the sequence number
  will be the expected next sequence number. Therefore we
  can tell the difference between missing packets and silence.
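The receiver-side logic described above can be sketched directly: compare how the sequence number and timestamp advance between consecutive packets. The 20 ms packet size on an 8 kHz RTP clock (hence 160 timestamp units per packet) is an illustrative assumption:

```python
# Telling packet loss apart from silence suppression, using the RTP
# sequence number and timestamp together.
UNITS_PER_PACKET = 160  # 20 ms of audio at an 8 kHz RTP clock

def classify_gap(prev_seq, prev_ts, seq, ts):
    seq_step = seq - prev_seq
    ts_step = ts - prev_ts
    if seq_step == 1 and ts_step == UNITS_PER_PACKET:
        return "normal"      # next packet, contiguous audio
    if seq_step == 1 and ts_step > UNITS_PER_PACKET:
        return "silence"     # timestamp jumped but no packet is missing
    if seq_step > 1:
        return "loss"        # sequence numbers are missing
    return "unexpected"      # e.g., reordering or duplication
```

(A real receiver would also handle 16-bit sequence-number wraparound; that bookkeeping is omitted here.)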
  VoIP: introduction
Packet delay
 A stream of sampled audio packets are transmitted from the source
   (sn), received at the destination (rn), and played (pn), thus each packet
   experiences a delay before playout (dn)

 If a packet arrives too late (r3 arrives after we should have started to
   play at p3), then there is a problem (for some or all of the third packet's
   audio)
   VoIP: introduction
Dealing with Delay jitter
 Unless packets are lost, if we wait long enough they will come, but
  then the total delay may exceed the threshold required for
  interactive speech! (~180ms)
    VoIP: introduction
Delay and delay variance (jitter)
 The end-to-end delay (from mouth to ear - for audio), includes:
  encoding, packetization, (transmission, propagation,
   switching/routing, receiving,)+ dejittering, decoding, playing
 To hide the jitter we generally use a playout buffer in the final
   receiver only. Note: this playout buffer adds additional delay
   in order to hide the delay variations (this is called delayed
   playback); playback delay > delay variance
 There are very nice studies of the effects of delay on perceived
   voice quality
• the delay impairment has roughly two linear behaviors, thus
  Id = 0.024·d + 0.11·(d − 177.3)·H(d − 177.3)
  where d is the one-way delay in ms, and H(x) = 0 if x < 0,
  H(x) = 1 if x ≥ 0
• for delays less than 177 ms conversation is very natural, while
  above this it becomes more strained (eventually breaking down)
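The impairment formula is easy to evaluate directly; this is a plain transcription, with the step function H implemented as a conditional:

```python
# Delay impairment Id from the formula above.
def delay_impairment(d_ms):
    """Id = 0.024*d + 0.11*(d - 177.3)*H(d - 177.3); d is one-way delay in ms."""
    h = 1.0 if d_ms >= 177.3 else 0.0
    return 0.024 * d_ms + 0.11 * (d_ms - 177.3) * h

print(delay_impairment(100))  # only the gentle linear term applies (≈ 2.4)
print(delay_impairment(300))  # past the 177 ms knee the penalty grows much faster
```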
  VoIP: introduction
Playout delay
• Playout delay should track the network delay as it varies during a
   session
• This delay is computed for each talk spurt based on observed
   average delay and deviation from this average delay -- this
   computation is similar to estimates of RTT and deviation in TCP
• Beginning of a talk spurt is identified by examining the
   timestamps and/or sequence numbers (if silence detection is
   being done at the source)
• The intervals between talk spurts give you a chance to catch-up
    • without this, if the sender‟s clock were slightly faster than
   the receiver‟s clock the queue would build without limit! This is
   important as the 8kHz sampling in PC‟s codecs is rarely exactly
   8kHz.
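A minimal sketch of such an estimator, analogous to TCP's RTT and deviation estimators. The smoothing constant and the safety factor k below are illustrative choices, not values mandated by any standard:

```python
# Per-talk-spurt playout delay estimation: smooth the observed network
# delay and its deviation, then pick the playout delay for the next
# talk spurt as mean + k * deviation.
class PlayoutEstimator:
    def __init__(self, alpha=0.998, k=4.0):
        self.d = 0.0       # smoothed estimate of network delay
        self.v = 0.0       # smoothed estimate of delay deviation
        self.alpha = alpha
        self.k = k

    def observe(self, delay):
        """Update the estimates with the delay measured for one packet."""
        self.d = self.alpha * self.d + (1 - self.alpha) * delay
        self.v = self.alpha * self.v + (1 - self.alpha) * abs(delay - self.d)

    def playout_delay(self):
        """Playout delay chosen at the beginning of each talk spurt."""
        return self.d + self.k * self.v
```

Adjusting the delay only between talk spurts means the stretch or compression happens during silence, where the listener cannot hear it.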
    VoIP: introduction
When to play
 The actual playout time is not a function of the arrival time, but only
  of the end-to-end delay.
  VoIP: introduction
Retransmission, Loss, and Recovery
 For interactive real-time media we generally don't have time to
  request the source to retransmit a packet and to receive the new
  copy ⇒ live without it or recover it using Forward Error
  Correction (FEC), i.e., send sufficient redundant data to enable
  recovery.
 However, for non-interactive media we can use retransmission at
  the cost of a longer delay before starting playout
 If you do have to generate output, but don't have any samples to
  play:
• audio
   • Comfort noise: play white noise, or play noise like that in the last
  samples {as humans get uncomfortable with complete silence - they
  think the connection is broken!}
   • if you are using highly encoded audio, even a BER of 10^-5 will
  produce very noticeable errors
  VoIP: introduction
• video
   • show the same (complete) video frame again
   • you can drop every 100th frame (for a BER of
  10^-2), but the user will not notice!
 There may also be compression applied to RTP


Patterns of Loss
 With simple FEC you could lose every other packet
   and still not be missing content, but if pairs of
   packets are lost then you lose content.
 To understand temporal patterns of speech, various
   models have been developed.
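The "simple FEC" mentioned above can be illustrated with a parity packet: after every group of media packets, send the XOR of the group, which lets the receiver rebuild any single lost packet. This is only a sketch (real schemes, such as RFC 2733-style generic FEC, add headers and handle packets of unequal length):

```python
# Parity FEC: XOR a group of equal-length packets byte by byte.
def xor_packets(packets):
    out = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)

def recover_missing(survivors, parity):
    """Rebuild the single missing packet of a group from the survivors
    plus the parity packet (XOR of everything received)."""
    return xor_packets(survivors + [parity])
```

Losing the parity packet itself costs nothing, but losing two media packets of the same group is unrecoverable - which is exactly why the temporal pattern of losses matters.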
    VoIP: introduction
VoIP need not be “toll quality”
 Public Switched Telephony System (PSTN) uses a fixed sampling
  rate, typically 8kHz and coding to 8 bits, this results in 64 kbps
  voice coding
 However, VoIP is not limited to using this coding and could have
  higher or lower data rates depending on the CODEC(s) used, the
  available bandwidth between the end points, and the user‟s
  preference(s).
 One of the interesting possibilities which VoIP offers is quality
  which is:
  • better than "toll grade" telephony, or
  • worse than “toll grade” telephony (but perhaps still
    acceptable)
 This is unlike the fixed quality of traditional phone systems.
 VoIP: introduction
RTP: Real-Time Transport Protocol
• First defined by RFC 1889, now defined by RFC
  3550
• Designed to carry a variety of real-time data:
  audio and video.
• Provides two key facilities:
   • Sequence number for order of delivery
      (initial value chosen randomly)
   • Timestamp (of first sample) - used for
      control of playback Provides no mechanisms
      to ensure timely delivery
   VoIP: introduction




• VER - version number (currently 2)
• P - whether zero padding follows the payload
• X - whether a header extension is present
• M - marker for the beginning of each frame (or talk spurt if doing silence detection)
• PTYPE - type of payload - first defined as profiles in RFC 1890, now defined in RFC 3551
We will address the other fields later.
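The fixed 12-byte part of the header (RFC 3550) can be unpacked as follows. This is a minimal sketch: CSRC entries, header extensions, and padding removal are not handled.

```python
import struct

def parse_rtp_header(data):
    """Unpack the fixed 12-byte RTP header from raw packet bytes."""
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": b0 >> 6,            # VER, currently 2
        "padding": bool(b0 & 0x20),    # P: zero padding follows the payload
        "extension": bool(b0 & 0x10),  # X: header extension present
        "cc": b0 & 0x0F,               # number of CSRC identifiers that follow
        "marker": bool(b1 & 0x80),     # M: start of frame / talk spurt
        "payload_type": b1 & 0x7F,     # PTYPE, see RFC 3551
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```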
   VoIP: introduction
Payload types
[Table of RTP payload types, as defined in RFC 3551, omitted.]
Audio Encodings
[Table of audio encodings omitted.]
  VoIP: introduction
Timestamps
 The initial timestamp is to be chosen randomly (just as the
  initial sequence number is selected randomly):
  • to avoid replays
  • to increase security (this assumes that the intruder does
  not have access to all the packets flowing to the destination)
 The timestamp granularity (i.e., the units) is determined by
  the payload type {often based on the sampling rate}
Stream translation and mixing




• Each source has a unique 32 bit Synchronization Source Identifier.
• When several sources are mixed the new stream gets its own unique Synchronization Source Identifier
and the IDs of the contributing sources are included as Contributing Source IDs, the number of which
is indicated in the 4-bit CC field of the header.
Proposed RTCP Reporting Extensions
 See RFC 3611 RTP Control Protocol Extended Reports (RTCP XR)
 VoIP Metrics Report Block - provides metrics for monitoring VoIP
   calls.



 (figure: VoIP Metrics Report Block format)
   block type (BT)   the constant 64 = 0x40
   reserved          8 bits - MUST be set to zero unless otherwise defined.
   length            length of this report block in 32-bit words minus one, including
                       the header; constant 6.
   loss rate         fraction of RTP data packets from the source lost since the
                       beginning of reception, as a fixed point number with the binary
                        point at the left edge of the field
   discard rate       fraction of RTP data packets from the source that have been
                       discarded since the beginning of reception, due to late or early
                       arrival, under-run or overflow at the receiving jitter buffer, in
                       binary fixed point
   burst duration     mean duration of the burst intervals, in milliseconds
   burst density      fraction of RTP data packets within burst intervals since the
                        beginning of reception that were either lost or discarded, in
                        binary fixed point
   gap duration       mean duration, expressed in milliseconds, of the gap intervals
                       that have occurred
   gap density         fraction of RTP data packets within inter-burst gaps since the
                         beginning of reception that were either lost or discarded, in
                         binary fixed point
   round trip delay     most recently calculated round trip time between RTP
                         interfaces, in milliseconds
   end system delay   most recently estimated end system delay, in milliseconds
   signal level        voice signal relative level is defined as the ratio of the signal
                         level to overflow signal level, expressed in decibels as a signed
                           integer in two's complement form
   doubletalk level    defined as the proportion of voice frame intervals during which
                        speech energy was present in both sending and receiving
                        directions
   noise level        defined as the ratio of the silent period back ground noise level
                       to overflow signal power, expressed in decibels as a signed
                         integer in two's complement form
   R factor        a voice quality metric describing the segment of the call that is
                    carried over this RTP session, expressed as an integer in the
                     range 0 to 100, with a value of 94 corresponding to "toll
                     quality" and values of 50 or less regarded as unusable;
                    consistent with ITU-T G.107 and ETSI TS 101 329-5
   ext. R factor   a voice quality metric describing the segment of the call that is
                    carried over an external network segment, for example a cellular
                    network
   MOS-LQ          estimated mean opinion score for listening quality (MOS-LQ) is a
                     voice quality metric on a scale from 1 to 5, in which 5
                    represents excellent and 1 represents unacceptable
   MOS-CQ          estimated mean opinion score for conversational quality (MOS-
                    CQ) defined as including the effects of delay and other effects
                    that would affect conversational quality
   Gmin            gap threshold, the value used for this report block to determine
                    if a gap exists
   RX Config               PLC - packet loss concealment: standard (11) / enhanced (10) /
                            disabled (01) / unspecified (00); JBA - Jitter Buffer Adaptive:
                            adaptive (11) / non-adaptive (10) / reserved (01) / unknown (00);
                            if the Jitter Buffer is adaptive then its size is being
                            dynamically adjusted to deal with varying levels of jitter;
                            JB Rate - Jitter Buffer Rate (0-15)
 Jitter Buffer             nominal size in frames (8 bit)
 Jitter Buffer Maximum     size in frames (8 bit)
 Jitter Buffer Absolute    size in frames
  Maximum
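Several XR fields above (loss rate, discard rate, burst density, gap density) are 8-bit fixed-point fractions with the binary point at the left edge of the field, i.e., the least significant bit is worth 1/256. A small conversion sketch (function names are my own):

```python
def to_fixed_q0_8(fraction: float) -> int:
    # 8-bit fraction, binary point at the left edge: LSB = 1/256.
    # Values are capped at 255 since the field cannot represent 1.0 exactly.
    return min(255, int(round(fraction * 256)))

def from_fixed_q0_8(raw: int) -> float:
    return raw / 256

print(to_fixed_q0_8(0.25))    # 64
print(from_fixed_q0_8(64))    # 0.25
```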
RTP translators/mixers
Translator     changes transport (e.g., IPv4 to IPv6) or changes media coding
              (i.e., transcoding)
Mixer         combines multiple streams to form a combined stream

Connect two or more transport-level “clouds”, each cloud is defined by a common
network and transport protocol (e.g., IP/UDP), multicast address or pair of unicast
addresses, and transport level destination port.

To avoid creating a loop the following rules must be observed:
• “Each of the clouds connected by translators and mixers participating in one RTP
    session either must be distinct from all the others in at least one of these
     parameters (protocol, address, port), or must be isolated at the network level
     from the others.
• A derivative of the first rule is that there must not be multiple translators
   or mixers connected in parallel unless by some arrangement they partition the set
   of sources to be forwarded.”
Synchronizing Multiple Streams

One of the interesting things which RTP supports is synchronization of multiple streams
(e.g., audio with a video stream)




• Unfortunately since the time stamps of each stream started at a random number we
  need some other method to synchronize them!
• Thus use Network Time Protocol (NTP) based time stamps → an absolute timestamp
• Since we now include the stream timestamps we can correlate these to absolute time
  (and hence from one stream to another)
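Since each RTCP sender report pairs an RTP timestamp with an absolute (NTP) time, a receiver can map any RTP timestamp in that stream to wall-clock time and thereby line up two streams. A sketch under the assumption of no timestamp wraparound between the sender report and the packet (function and parameter names are illustrative):

```python
def rtp_ts_to_wallclock(ts: int, sr_ntp_seconds: float,
                        sr_rtp_ts: int, clock_rate_hz: int) -> float:
    # Map an RTP timestamp to absolute time using the (NTP time, RTP timestamp)
    # pair carried in the most recent RTCP sender report for this stream.
    return sr_ntp_seconds + (ts - sr_rtp_ts) / clock_rate_hz

# An audio packet 2160 ticks after the SR reference, at an 8 kHz clock,
# was sampled 0.27 s after the SR's wall-clock time.
print(rtp_ts_to_wallclock(162_160, sr_ntp_seconds=1000.0,
                          sr_rtp_ts=160_000, clock_rate_hz=8000))
```

Doing the same mapping for the video stream (with its own SR pair and clock rate) puts both streams on a common time axis for synchronized playout.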
RTP Transport and Many-to-many Transmission
RTP uses a connectionless transport (usually UDP):
  • Retransmission is undesirable (generally it would be too late)
  • Since RTP handles sequencing and timing, we don't need
     this from the transport protocol
  • RTP is packet oriented
  • Enables us to easily use multicast (when there are many
      endpoints that want the same source stream)
    • a multicast address identifies a group
    • these multicast groups can be dynamic
Sessions, Streams, Protocol Port, and Demultiplexing
Session          All traffic that is sent to a given IP
                 address, port
Stream           a sequence of RTP packets that are from a
                 single synchronization source
 Demultiplexing:
Session demultiplexing    occurs at the transport layer
                          based on the port number
Stream demultiplexing     occurs once the packet is passed to
                          the RTP software, based on the
                          synchronization source identifier -
                          then the sequence number and
                          timestamp are used to order the
                           packet at a suitable time for playback
   Protocols for Interactive Real-Time Applications
 6.1 Multimedia Networking Applications
 6.2 Streaming stored audio and video
     RTSP
 6.3 Real-time, Interactive Multimedia: Internet Phone Case Study
 6.4 Protocols for Real-Time Interactive Applications
     RTP, RTCP, H.323
     SIP
 6.5 Beyond Best Effort
 6.6 Scheduling and Policing Mechanisms
 6.7 Integrated Services
 6.8 RSVP
 6.9 Differentiated Services
          Real-Time Protocol (RTP)
 RTP specifies a packet structure for packets carrying audio
  and video data (RFC 1889, since obsoleted by RFC 3550).
 An RTP packet provides:
      payload type identification
      packet sequence numbering
      timestamping
 RTP runs in the end systems.
 RTP packets are encapsulated in UDP segments.
 Interoperability: if two Internet phone applications run RTP,
  then they may be able to work together.
         RTP runs on top of UDP
RTP libraries provide a transport-layer interface
that extends UDP with:
   • port numbers, IP addresses
   • payload type identification
   • packet sequence numbering
   • time-stamping
                     RTP Example
 Consider sending 64 kbps PCM-encoded voice over RTP.
 The application collects the encoded data in chunks,
  e.g., every 20 msec = 160 bytes in a chunk.
 The audio chunk along with the RTP header form the
  RTP packet, which is encapsulated into a UDP segment.
 The RTP header indicates the type of audio encoding in
  each packet
      the sender can change the encoding during a conference.
 The RTP header also contains sequence numbers and timestamps.
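The 20 msec / 160 byte arithmetic above can be sketched as a packetizer: at 8000 samples/s and 8 bits/sample, each 20 ms chunk is 160 bytes, the sequence number advances by one per packet, and the timestamp by one per sample (the function name is illustrative):

```python
SAMPLE_RATE = 8000                                    # Hz, 8-bit PCM
CHUNK_MS = 20                                         # packetization interval
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_MS // 1000    # 160 samples = 160 bytes

def packetize(pcm: bytes, seq0: int = 0, ts0: int = 0):
    """Split a PCM byte stream into (seq, timestamp, payload) tuples."""
    packets = []
    for i in range(0, len(pcm) - SAMPLES_PER_CHUNK + 1, SAMPLES_PER_CHUNK):
        seq = seq0 + i // SAMPLES_PER_CHUNK   # +1 per packet
        ts = ts0 + i                          # +160 per packet (one tick per sample)
        packets.append((seq, ts, pcm[i:i + SAMPLES_PER_CHUNK]))
    return packets

pkts = packetize(bytes(8000))                 # one second of silence
print(len(pkts))                              # 50 packets per second
print(pkts[1][0], pkts[1][1])                 # 1 160
```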
                      RTP and QoS

 RTP does not provide any mechanism to ensure
  timely delivery of data or provide other quality of
  service guarantees.
 RTP encapsulation is only seen at the end systems:
  it is not seen by intermediate routers.
      Routers providing best-effort service do not make any
       special effort to ensure that RTP packets arrive at the
        destination in a timely manner.
                           RTP Header



Payload Type (7 bits): Indicates type of encoding currently being
used. If sender changes encoding in middle of conference, sender
informs the receiver through this payload type field.

    •Payload type 0: PCM mu-law, 64 kbps
    •Payload type 3, GSM, 13 kbps
    •Payload type 7, LPC, 2.4 kbps
    •Payload type 26, Motion JPEG
    •Payload type 31, H.261
    •Payload type 33, MPEG2 video

Sequence Number (16 bits): Increments by one for each RTP packet
sent, and may be used to detect packet loss and to restore packet
sequence.
                   RTP Header (2)

 Timestamp field (32 bits long). Reflects the sampling
  instant of the first byte in the RTP data packet.
    For audio, timestamp clock typically increments by one
       for each sampling period (for example, each 125 µsec
       for an 8 kHz sampling clock)
    if application generates chunks of 160 encoded samples,
      then timestamp increases by 160 for each RTP packet
      when source is active. Timestamp clock continues to
      increase at constant rate when source is inactive.

 SSRC field (32 bits long). Identifies the source of the RTP
  stream. Each stream in a RTP session should have a distinct
  SSRC.
    Real-Time Control Protocol (RTCP)

 Works in conjunction with RTP.
 Each participant in an RTP session periodically
  transmits RTCP control packets to all other participants.
 Each RTCP packet contains sender and/or receiver reports
      reporting statistics useful to the application
 Statistics include number of packets sent, number of
  packets lost, interarrival jitter, etc.
 Feedback can be used to control performance
      the sender may modify its transmissions based on feedback
                       RTCP - Continued




- For an RTP session there is typically a single multicast address; all RTP
and RTCP packets belonging to the session use the multicast address.

- RTP and RTCP packets are distinguished from each other through the use of
distinct port numbers.

- To limit traffic, each participant reduces his RTCP traffic as the number
of conference participants increases.
                 RTCP Packets

Receiver report packets:
 fraction of packets lost, last sequence number,
  average interarrival jitter.
Sender report packets:
 SSRC of the RTP stream, the current time, the number
  of packets sent, and the number of bytes sent.
Source description packets:
 e-mail address of sender, sender's name, SSRC of
  associated RTP stream.
 Provide mapping between the SSRC and the user/host name.
         Synchronization of Streams

 RTCP can synchronize different media streams within
  an RTP session.
 Consider a videoconferencing app for which each sender
  generates one RTP stream for video and one for audio.
 Timestamps in RTP packets are tied to the video and
  audio sampling clocks
      not tied to the wall-clock time
 Each RTCP sender-report packet contains (for the most
  recently generated packet in the associated RTP stream):
      timestamp of the RTP packet
      wall-clock time for when the packet was created.
 Receivers can use this association to synchronize the
  playout of audio and video.
            RTCP Bandwidth Scaling
 RTCP attempts to limit its traffic to 5% of the
  session bandwidth.
Example
 Suppose one sender, sending video at a rate of 2 Mbps.
  Then RTCP attempts to limit its traffic to 100 kbps.
 RTCP gives 75% of this rate to the receivers; the
  remaining 25% to the sender.
 The 75 kbps is equally shared among the receivers:
      with R receivers, each receiver gets to send RTCP
       traffic at 75/R kbps.
 The sender gets to send RTCP traffic at 25 kbps.
 A participant determines its RTCP packet transmission period
  by calculating the average RTCP packet size (across the
  entire session) and dividing by its allocated rate.
              H.323 – ITU Standard

 H.323 is an umbrella standard that defines how
  real-time multimedia communications such as
  Videoconferencing can be supported on packet
  switched networks (Internet)
 Devices: Terminals, Gateways, Gatekeepers and
  MCUs
 Codecs:
      Video: H.261, H.263
      Audio: G.711, G.723.1
 Signaling: H.225, H.245
 Transport Mechanisms: TCP, UDP, RTP and RTCP
 Data collaboration: T.120
 Many others…
H.323 Protocol Stack
H.323 Call setup and teardown
H.323 Call setup and teardown (Contd.)
                SIP - IETF Standard

 Session Initiation Protocol (SIP)
 SIP Elements: User Agent Client (UAC), User
  Agent Server (UAS)
      Ease of location of users due to the flexibility in SIP to
       contact external location servers to determine user or
       routing policies (url, email ID, e.g. pcalyam@oar.net)
 Server Types: Redirect Server, Proxy Server and
  Registrar
      SIP Proxy: perform application layer routing of SIP
       requests and responses.
      SIP Registrar: UAC sends a registration message and the
       Registrar stores registration information in a location
       service using a non-SIP protocol
SIP Deployment Architecture
       SIP (Session Initiation Protocol)
SIP long-term vision
 All telephone calls and video conference calls take
  place over the Internet
 People are identified by names or e-mail
  addresses, rather than by phone numbers.
 You can reach the callee, no matter where the
  callee roams, no matter what IP device the callee
  is currently using.
SIP, RTP, and RTSP

The protocol stack (figure): audio/video applications use CODECs on top of
RTP; signaling and control uses RTCP, SDP, and SIP; streaming applications
use CODECs with RTSP. RTP runs over UDP, while SIP and RTSP can run over
either UDP or TCP; everything runs over IP.
SIP actors
SIP Methods and Status Codes


 INVITE         Invites a user to join a call.
 ACK            Confirms that a client has received a final response to an INVITE.
 BYE            Terminates the call between two of the users on a call.
 OPTIONS        Requests information on the capabilities of a server.
 CANCEL         Ends a pending request, but does not end the call.
 REGISTER       Provides the map for address resolution, this lets a server know the location of a user.

 SIP Status codes - patterned on and similar to HTTP's status codes:

Code    Meaning

1xx     Informational or Provisional    - request received, continuing to process the request
2xx     Final - the action was successfully received, understood, and accepted
3xx     Redirection - further action needs to be taken in order to complete the request
4xx     Client Error - the request contains bad syntax or cannot be fulfilled at this server
5xx     Server Error - server failed to fulfill an apparently valid request (Try another server!)
6xx     Global Failure - the request cannot be fulfilled at any server (Give up!)
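The class of a response is just its leading digit, and everything except 1xx is a final response. A small sketch of the table above (names are my own, not from any SIP library):

```python
SIP_CLASSES = {1: "Provisional", 2: "Success", 3: "Redirection",
               4: "Client Error", 5: "Server Error", 6: "Global Failure"}

def classify(code: int) -> str:
    # The hundreds digit selects the response class.
    return SIP_CLASSES[code // 100]

def is_final(code: int) -> bool:
    # Only 1xx responses are provisional; all others are final.
    return code >= 200

print(classify(180), is_final(180))  # Provisional False
print(classify(486), is_final(486))  # Client Error True
```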
SIP Uniform Resource Indicators (URIs)
Two URI schemes - similar to the form of an e-mail address: user@domain
• SIP URI - introduced in RFC 2543
   • example: sip:maguire@kth.se
• Secure SIP URI - introduced in RFC 3261
  • example: sips:maguire@kth.se
  • Requires TLS over TCP as transport for security
Three types of SIP URIs:
• Address of Record (AOR) (identifies a user)
  • example: sip:maguire@kth.se
  • Need DNS SRV records to locate SIP Servers for kth.se domain
• Fully Qualified Domain Name (FQDN) (identifies a specific device)
  • examples: sip:maguire@130.237.212.2 or sip:maguire@chipsphone.it.kth.se
  • sip:+46-8-790-6000@kth.se; user=phone the main KTH phone number in E.164
    format via a gateway; note that the visual separators in a phone number (dashes, dots, etc.)
    are ignored by the protocol
• Globally Routable UA URIs (GRUU) (identifies an instance of a user at a given UA, for the duration of
    the registration of the UA to which it is bound)
Issues to be considered
• Address Resolution
• Session Setup
• Media Negotiation
• Session Modification
• Session Termination
• Session Cancellation
• Mid-call Signaling
• Call Control
• QoS Call setup
Address Resolution
The first step in routing the SIP request is to compute the mapping
  between the URI and a specific user at a specific host/address.
This is a very general process and the source of much of SIP's power.
• providing support for mobility and portability
• Can utilize:
  • DNS SRV lookup
  • ENUM
  • Location Server lookup
We will look at this in detail later (see the discussion of DNS and ENUM),
  but for now will assume a simple DNS lookup based on the URI.
SIP timeline
 Simple version of Alice invites Bob to a SIP session:




 We begin by examining the details of session setup. For many examples of
 basic call flows, see RFC 3665 (SIP Basic Call Flow Examples).
SIP Invite
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
 (Alice's SDP not shown)
SIP is a text-based protocol and uses ISO 10646 character set in UTF-8
   encoding (RFC 2279). The message body uses MIME and can use S/MIME for
   security.
The generic form of a message is:
   generic-message = start-line
                      message-header*
                      CRLF
                      [ message-body ]
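The generic-message grammar above — a start line, header lines, a blank CRLF, then an optional body — can be sketched as a tiny message builder (the function name is my own; this is not a full SIP stack and skips mandatory headers such as Via):

```python
def sip_message(start_line: str, headers, body: bytes = b"") -> bytes:
    # generic-message = start-line *message-header CRLF [ message-body ]
    lines = [start_line] + [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {len(body)}")   # 0 means no message body
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body

msg = sip_message("ACK sip:bob@biloxi.com SIP/2.0",
                  [("To", "Bob <sip:bob@biloxi.com>"),
                   ("From", "Alice <sip:alice@atlanta.com>;tag=1928301774"),
                   ("Call-ID", "a84b4c76e66710"),
                   ("CSeq", "314159 ACK")])
print(msg.decode("utf-8").splitlines()[0])  # ACK sip:bob@biloxi.com SIP/2.0
```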
Bob’s response to Alice’s INVITE
SIP/2.0 200 OK
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bKnashds8
To: Bob <sip:bob@biloxi.com>;tag=a6c85cf
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:bob@192.0.2.8>
Content-Type: application/sdp
Content-Length: 131
ACK
ACK sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 ACK
Content-Length: 0
A successful set-up sequence was: INVITE/200/ACK
A set-up failure would be a sequence such as: INVITE/4xx/ACK
NB: INVITE is the only method in SIP that involves a 3-way handshake with ACK
The further setup of the call can proceed directly between Alice and Bob, based
   on the information (especially that in SDP) which they have exchanged.
Now we will examine the details of these initial SIP messages!
SIP Invite (method/URI/version)
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142

Start Line is the first line of a SIP message which contains:
• method or Request type: INVITE
• Request-URI which indicates who the request is for:
      sip:bob@biloxi.com
• SIP version number: SIP/2.0
SIP Via
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP proxy.stockholm.se:5060;branch=82.1
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
• Via headers show the path the request has taken in the SIP network
   • A Via header is inserted by the User Agent which initiated the request (this will be
    last in the list of Via headers)
   • Via headers are inserted above this by proxies in the path (i.e., this details the
    path taken by the request)
• Via headers are used to route responses back the same way the request came
  • this allows stateful proxies to see both the requests and responses
  • each such proxy adds the protocol, hostname/IP address, and port number
• The “branch” parameter is used to detect loops
Dialog (Call leg) Information
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
• Dialog (formerly “call leg”) information is in headers:
   • To tag, From tag, and Call-ID All requests and responses in this call will use this
    same Dialog information.
   • “To” specifies the logical recipient of the message, “From” the logical sender
    – the string “Bob” is called a “display name”
• Call-ID is a unique identifier
  • Call-ID number is arbitrary, but it uniquely identifies this call (i.e., session), hence all
    future references to this session refer to this Call-ID
  • usually composed of pseudo-random string @ hostname or IP
     Address
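A Call-ID of the form "pseudo-random string @ hostname or IP address" can be generated with a few lines (the function name is mine; a real user agent would use its own configured host):

```python
import secrets

def make_call_id(host: str) -> str:
    # Pseudo-random local part "@" hostname, as described above.
    return f"{secrets.token_hex(8)}@{host}"

print(make_call_id("pc33.atlanta.com"))  # e.g. 9f1c2ab34d5e6f70@pc33.atlanta.com
```

Because the local part is random, independently generated Call-IDs collide with negligible probability, which is what lets the Call-ID uniquely identify the session.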
SIP CSeq
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142

• Command Sequence (CSeq) Number
  • Initialized at the start of a call (314159 in this example)
  • Incremented for each subsequent request
  • Used to distinguish a retransmission from a new request
• Followed by the request type (i.e., SIP method)
SIP Contact
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
• Contact header contains a SIP URL for direct communication between
   User Agents
  • If Proxies do not Record-Route, they can be bypassed
  • Contact header is also present in 200 OK response
SIP Content Type and Length
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
• Content-Type indicates the type of message body attachment (others
   could be text/plain, application/cpl+xml, etc.)
  • Here “application/sdp” indicates that it is SDP
• Content-Length indicates length of the message body in octets (bytes)
  • 0 indicates that there is no message body.
SIP Max-Forwards
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com:5060;branch=z9hG4bK776asdhds
Max-Forwards: 30
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
• Max-Forwards is decremented by each proxy that forwards the
   request.
• When count goes to zero, request is discarded and 483 Too Many
   Hops response is sent.
• Used for stateless loop detection.
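The Max-Forwards mechanism is a simple decrement-and-check at each proxy; a sketch (the function name and return shape are illustrative):

```python
def proxy_forward(max_forwards: int):
    """One proxy hop: decrement Max-Forwards, or reject with 483 at zero.

    Returns (new_max_forwards, None) on success,
    or (None, (status_code, reason)) when the request must be discarded.
    """
    if max_forwards <= 0:
        return None, (483, "Too Many Hops")
    return max_forwards - 1, None

print(proxy_forward(30))  # (29, None)
print(proxy_forward(0))   # (None, (483, 'Too Many Hops'))
```

Because the counter lives in the request itself, a looping request dies after a bounded number of hops even if every proxy on the path is stateless.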
Other header fields
• Content-Encoding:
• Allow:
• Expires:
• In-Reply-To:
• Priority: indicates the priority of displaying a message to a user
  • Normal
  • Urgent
  • Non-Urgent
  • Emergency
• Require: contains a list of options which the server is expected to
   support in order to process a request
• Retry-After: number of seconds after which a requestor should try again
• Supported: enumerates all the extensions supported by the sender (NB: this
   differs from “Require”, which requires that a destination supports the
   given extension)
Several types of SIP Servers
• User agent server runs on a SIP terminal (could be a SIP phone, a PDA, laptop, …) - it
    consists of two parts:
  • User Agent Client (UAC): initiates requests
  • User Agent Server (UAS): responds to requests
• SIP proxy - interprets (if necessary, rewrites specific parts of a SIP request message)
    before forwarding it to a server closer to the destination:
  • SIP stateful proxy server - remembers its queries and answers; can also forward
    several queries in parallel (can be Transaction Stateful or Call Stateful).
  • SIP stateless proxy server
  • They ignore SDP and don't handle any media (content)
  • Outgoing proxy: used by a user agent to route an outgoing request
  • Incoming proxy: proxy server which supports a domain (receives incoming requests)
• SIP redirect server - directs the client to contact an alternate URI
• Registrar server - receives SIP REGISTER requests and updates the Location Server
• Location server (LS) - knows the current bindings and is queried by
      proxies to do their routing
   • SIP can also use DNS SRV (Service) Records to locate an
       (inbound) proxy.
   • note: in RFC 2543, a location server is a generic term for a database
SIP Trapezoid

SIP Call Setup

                  SIP Call Setup - when B has not registered

SIP Call Setup Attempt




              SIP Call Setup Attempt - when B has not registered
SIP Call Setup Attempt

        SIP Call Setup Attempt - when B has not registered (continued)

SIP Presence

                SIP Presence: A asks to be told when B registers

SIP B not Present

                     NOTIFY A that B has <Not Signed In>

SIP Registration Example
Purpose of registration
User B registers in order to establish their current device and location
• Only their location server need know
  • The location server need not disclose this location to "just
   anyone", but can apply various policies to decide who can learn of it,
   i.e., their location server can decide who can ask for B's location and
   when they can ask (perhaps even limiting it to where they can ask
   from).
  • This has significant privacy implications.
• This scales well - as B only has to update their location server,
   rather than having to inform all possible callers.
REGISTERing




           REGISTER request includes one or more Contact headers:
                   Contact: <sip:UserA@4.3.2.1>;class=personal
   Contact: <sip:UserA-msg-depot@voicemail.provider.com>;feature=voicemail
     Contact: <sip:+13145551212@gateway.com;user=phone>;class=business
    Contact: <sip:+13145553333@cellphone.com;user=phone>;mobility=mobile
                          Contact: <tel:+13145551212>
                     Contact: <mailto:UserA@hotmailer.com>
SIP Call Setup Attempt




           SIP Call Setup Attempt - when B has registered
SIP Session Termination using BYE




 BYE causes the media session to be torn down.
Note: BYE like INVITE is an end-to-end method.
SIP Session Termination using CANCEL




CANCEL causes the pending session to be cancelled. Note: If a reply is 481 Transaction
Unknown, then the user agent may need to send a BYE, since the CANCEL was
received after the final response was sent (there was a race condition).
CANCEL and OPTIONS
• In addition to canceling a pending session
• CANCEL can also be sent by a proxy or user agent
  • for example, when a parallel fork has been done, once
   you have a successful match, then you can cancel the
   others
OPTIONS
• Used to query a server or user agent for its capabilities
• sometimes used for very simple presence information
Unsuccessful final responses are hop-by-hop
Unsuccessful final responses (3xx, 4xx, 5xx, 6xx) are always
  acknowledged on a hop-by-hop basis.
Only 200 OK is end-to-end.
Authentication
Builds upon the authentication schemes developed for HTTP
   (see RFC 2617), for example challenge/response and
   digest, …
Two forms:
• user agent-to-user agent
  • 401 Unauthorized ⇒ Authentication Required
• user agent-to-server
  • 407 Proxy Authentication Required ⇒ Authentication
   Required (response sent by a proxy/server)
Note: Any SIP request can be challenged for
   authentication.
Note: There is no integrity protection.
SIP Method Extensions in other RFCs
• INFO - Call signaling information during a call
  • RFC 2976: The SIP INFO Method, October 2000.
• PRACK - Reliable ACK
  • RFC 3262: Reliability of Provisional Responses in Session Initiation Protocol (SIP),
    June 2002
• SUBSCRIBE/NOTIFY
  • RFC 3265: Session Initiation Protocol-Specific Event Notification, June 2002.
• REFER
  • RFC 3515: The Session Initiation Protocol (SIP) Refer Method, April 2003
  • “The SIP Referred-By Mechanism”, Internet-Draft, March 22, 2004
• MESSAGE
   • RFC 3428: Session Initiation Protocol Extension for Instant Messaging, December
    2002
• UPDATE - Early media and preconditions
   • RFC 3311: The Session Initiation Protocol (SIP) UPDATE Method. October 2002
SIP Extensions and Features
• Method Extensions
  • Unknown methods rejected by User Agent using 405 or 501
   response
  • Listed in Allow header field
  • Proxies treat unknown methods as non-INVITE requests
• Header Field Extensions
  • Unknown header fields are ignored by user agents and proxies
  • Some have feature tags registered, these can be declared in a
   Supported or Require header field
• Message Body Extensions
  • Unknown message body types are rejected with a 406 response
  • Supported types can be declared with an Accept header field
  • Content-Disposition indicates what to do with it
• Extensions must define fallback to the base SIP specification.
SIP Presence - Signed In

[Figure: NOTIFY informs A that User Agent B has signed in]
SUBSCRIBE and NOTIFY

[Figure: SUBSCRIBE/NOTIFY message exchange between user agents A and B]

If user B's agent does not wish to provide user A's agent with a notification, it sends
a 603 Decline response.
SIP Instant Messaging Example

[Figure: A sends a MESSAGE to User Agent B]
SIP Instant Messaging Example (continued)

[Figure: B sends a MESSAGE to A]
Message example
A simple Instant Message (IM) as SIP:
MESSAGE im:UserB@there.com SIP/2.0
Via: SIP/2.0/UDP 4.3.2.1
To: User B <im:UserB@there.com>
From: User A <im:UserA@here.com>
Call-ID: a5-32-43-12@4.3.2.1
CSeq: 1 MESSAGE
Content-type: text/plain
Content-Length: 16
Hi, How are you?
The response will be a 200 OK from B.
Note: the example uses IM URIs instead of SIP URIs.
A MESSAGE request can be sent at any time (even without a session).
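The MESSAGE request shown above can be assembled mechanically. The following Python sketch (illustrative only, not a complete SIP stack) builds the same request, computing Content-Length from the body:

```python
def build_message(to_uri, from_uri, call_id, cseq, body, via_host="4.3.2.1"):
    """Assemble a SIP MESSAGE request as a CRLF-separated string.

    SIP mandates CRLF line endings and a blank line before the body;
    Content-Length counts the body bytes only.
    """
    headers = [
        f"MESSAGE {to_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host}",
        f"To: <{to_uri}>",
        f"From: <{from_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} MESSAGE",
        "Content-Type: text/plain",
        f"Content-Length: {len(body.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body

msg = build_message("im:UserB@there.com", "im:UserA@here.com",
                    "a5-32-43-12@4.3.2.1", 1, "Hi, How are you?")
```

The 16-character body "Hi, How are you?" yields Content-Length: 16, matching the example above.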
For further information, see the activities of the IETF SIMPLE working group.
Midcall signalling
 Midcall signalling is used when the session parameters don't
   change, to exchange information between two user
   agents via the body of an INFO message. If the
   session parameters did change, then you would use a re-INVITE.

[Figure: midcall signalling via INFO messages carrying ISUP information]

   Note the ISUP messages in the figure: IAM (initial address message), ANM
                  (answer message), and USR (user-to-user message).
Call Control
SIP is peer-to-peer -- thus a proxy can't issue a BYE; only end
    devices (UAs) can.
Two methods for third-party call control:
• A proxy passes the INVITE on, but stays in the signaling path
• Use REFER to initiate third-party control (the third party is no
    longer in the signaling path).
Useful for:
• click-to-call
• Automatic Call Distribution (ACD)
• web call center
• …
  Example of using REFER
  Third-party call control: User A sets up a
     session between Users B and C.

[Figure: REFER-based call flow between A, B, and C]

Note: A uses an INVITE with a Refer-To header, and
  B uses an INVITE with a Referred-By header.
QoS and Call Setup
The path which SIP takes may be different from the media path, thus
  new extensions were added to enable more handshaking:
• Early Media - by allowing SDP to be included in the 183 Session
   Progress response (allows establishment of QoS requirements before the
  call is answered) - may also enable one-way RTP {hence the name
  “early media”}; formally: “media during early dialog”
• Reliable Provisional Responses - an extension allowing detection of a lost
  183 Session Progress response, by using a Provisional
  Response Acknowledgement (PRACK)
• UAs can use the preconditions-met (COMET) method to indicate that the
  QoS requirements can be met and that the user can be alerted by
  ringing the phone.
The SDP in the INVITE contains an attribute-value pair: "a=qos:mandatory".
For further details see RFC 3312 and RFC 3262.
SIP Message retransmission
Timer   Default   Purpose
T1      500 ms    set when a SIP request is first sent
T2      4 s       longer timeout used once a provisional response has been
                  received
If a request is lost, then timeout T1 will generate a retransmission of
   the request.
If a request is received and a provisional response is received, then
   sender switches to timeout T2 (to wait for the final response).
INVITE is different:
• receiving a provisional response stops all re-transmissions of the
   INVITE;
• however, the sender of the provisional response starts a T1 timer
   when it sends its final response and if it does not get an ACK in
   time it retransmits the final response.
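The back-off behaviour above can be sketched in a few lines. This is a simplified model of the non-INVITE case only; the timer names and default values come from the table, the function itself is illustrative:

```python
# SIP retransmission back-off sketch (non-INVITE case):
# start at T1 = 500 ms, double each retransmission, cap at T2 = 4 s.
T1 = 0.5   # seconds, initial retransmission interval
T2 = 4.0   # seconds, cap on the interval

def retransmission_schedule(max_retransmits=7):
    """Return the list of intervals (in seconds) between successive
    retransmissions of a non-INVITE request."""
    intervals = []
    interval = T1
    for _ in range(max_retransmits):
        intervals.append(interval)
        interval = min(2 * interval, T2)  # exponential back-off, capped at T2
    return intervals

# -> [0.5, 1.0, 2.0, 4.0, 4.0, 4.0, 4.0]
```

Receiving a provisional response (for non-INVITE) is what switches waiting onto the longer T2 value; for INVITE, a provisional response stops retransmission entirely, as noted above.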
RFC 3261 - New Services
• Customized ringing
  • A trusted proxy can insert an Alert-Info header field into an
  INVITE
• Screen pops
  • A trusted proxy can insert a Call-Info header field into an
  INVITE
  • The URI can be HTTP and can contain call control “soft keys”
• Callback
  • Reply-To and In-Reply-To headers - to assist in returning calls
• Announcement handling
  • A UAS or proxy need not make a decision about playing an early
  media announcement
     – the error response contains a new Error-Info header field
         which contains the URI of the announcement
   • the UAC makes a decision based on the user's interface
Compression of SIP
As textual protocols, some might think that SIP and SDP are too
  verbose; hence RFC 3486 describes how SIP and SDP can be
  compressed. RFC 3485 describes a static dictionary which can be
  used with Signaling Compression (SigComp) to achieve even higher
  efficiency.
Intelligent Network service using SIP
The ITU has defined a set of service features (think of them as primitives
   which can be used to construct more complex services). These are
   divided into two sets: Capability Set 1 and Capability Set 2.
Capability Set 1: Service Features
Abbreviated dialing (ABD), Call queueing (QUE), Off-net calling (ONC),
Attendant (ATT), Call transfer (TRA), One number (ONE),
Authentication (AUTC), Call waiting (CW), Origin-dependent routing (ODR),
Authorization code (AUTZ), Closed user group (CUG),
Originating call screening (OCS), Automatic callback (ACB),
Consultation calling (COC), Originating user prompter (OUP),
Call distribution (CD), Customer profile management (CPM),
Personal numbering (PN)
Capability Set 1: Services
Abbreviated dialling (ABD), Account card calling (ACC),
Automatic alternative billing (AAB), Call distribution (CD),
Call forwarding (CF), Call rerouting distribution (CRD),
Completion of calls to busy subscriber (CCBS), Conference calling (CON),
Credit card calling (CCC), Originating call screening (OCS),
Premium rate (PRM), Security screening (SEC),
Selective call forwarding on busy/don't answer (SCF),
Split charging (SPL)
   Capability Set 2
   Wireless services
   Inter-network services
   Multimedia
   Call pick-up
   Calling name delivery


Gateways
• Gateway Location Protocol (GLP) - a protocol used between
Location Servers (LSs) {similar to BGP}
• Signaling Gateway - to convert from the signaling used in one network
to that of the other
• Media Gateway - to convert the media format from that used in one network to that of the other
Significance
• In July 2002, 3GPP adopted SIP as their signalling protocol (Release 5)
• 3GPP adopted SIMPLE as its instant messaging/presence mechanism
   (Release 6)
There are some differences between the 3GPP and IETF points of view:
 3GPP                                     IETF
 network does not trust the user          user only partially trusts the network
 layer 1 and layer 2 specific             generic
 walled garden                            open access

 Not surprisingly, the 3GPP system for using SIP is rather complex, with a
   number of new components: Proxy Call Session Control Function (P-CSCF),
   Interrogating Call Session Control Function (I-CSCF), Serving Call Session
   Control Function (S-CSCF), Home Subscriber Server (HSS), Application
   Server (AS), Subscription Locator Function (SLF), Breakout Gateway
   Control Function (BGCF)
Session Description Protocol
   (SDP)
Session Description Protocol (SDP)
Defined by RFC 2327 (http://www.ietf.org/rfc/rfc2327.txt)
• describes a media session
• a text-based protocol
• carried in MIME as a message body in SIP messages
• uses RTP/AVP profiles for common media types [72]
Note: It is more a session description format than a protocol.
Internet drafts related to SDP:
• Session Description Protocol (SDP) Source Filters
  http://www.ietf.org/internet-drafts/draft-ietf-mmusic-sdp-srcfilter-06.txt
• SDP: Session Description Protocol (new)
  http://www.ietf.org/internet-drafts/draft-ietf-mmusic-sdp-new-24.txt
• Connection-Oriented Media Transport in SDP
  http://www.ietf.org/internet-drafts/draft-ietf-mmusic-sdp-comedia-10.txt
SDP Message Details
v=0
o=Tesla 289084526 28904526 IN IP4 lab.high-voltage.org
s=-
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
• Version number (ignored by SIP)
• Origin (not used by SIP)
• Session name (ignored by SIP)
• Connection Data
  • connection: network (IN == Internet), Address type (IPv4),
   and Address
• Time (ignored by SIP): start stop
• Media (type, port, RTP/AVP Profile)
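Because SDP is line-oriented text with single-character field names, it is easy to parse. A minimal sketch (no error handling, no support for repeated time descriptions) that splits the example above into session-level fields and media sections:

```python
def parse_sdp(text):
    """Parse an SDP body into session-level (type, value) pairs and a
    list of media sections, each grouping the lines after its m= line."""
    session, media = [], []
    current = None
    for line in text.strip().splitlines():
        if "=" not in line:
            continue  # skip blank/malformed lines
        key, _, value = line.partition("=")
        if key == "m":
            current = {"m": value, "attributes": []}
            media.append(current)
        elif current is not None:
            current["attributes"].append((key, value))
        else:
            session.append((key, value))
    return session, media

sdp = """v=0
o=Tesla 289084526 28904526 IN IP4 lab.high-voltage.org
s=-
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000"""
session, media = parse_sdp(sdp)
```

Here `session` holds the v/o/s/c/t lines and `media` holds one entry for the audio stream, with its a=rtpmap attribute attached to it.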
Session description
v= protocol version
o= owner/creator and session identifier
s= session name
[i= session information]              { [xx] ⇒ xx is optional}
[e= email address]
[p= phone number]
[u= URI of description]
[c= connection information - not required if included in all media]
[b= bandwidth information]
<Time description>+                   { <xx>+ ⇒ one or more times}
[z= time zone adjustments]
[k= encryption key]
[a= zero or more session attribute lines]*  { [xx]* ⇒ zero or more times}
SDP Offer Example
v=0                         Version of SDP (0)
o=                          Origin - not used by SIP
c=IN IP4 130.237.212.6      Connection: INternet, IPv4,
                            address=130.237.212.6
t=                          Time - not used by SIP
m=video 4004 RTP/AVP 14 26  Media: video, port=4004, type=RTP/AVP
                            profile, profiles: 14 and 26
a=rtpmap:14 MPA/90000       Attribute for profile 14, codec=MPA,
                            sampling rate=90000
a=rtpmap:26 JPEG/90000      Attribute for profile 26, codec=JPEG,
                            sampling rate=90000
m=audio 4006 RTP/AVP 0 4    Media: audio, port=4006, type=RTP/AVP
                            profile, profiles: 0 and 4
a=rtpmap:0 PCMU/8000        Attribute for profile 0, codec=PCMU
                            (PCM µ-law), sampling rate=8000
SDP Response Example
v=0                        Version of SDP (0)
o=                         Origin - not used by SIP
c=IN IP4 130.237.21.87     Connection: INternet, IPv4,
                           address=130.237.21.87
t=                         Time - not used by SIP
m=video 0 RTP/AVP 14       Media: video, port=0, type=RTP/AVP
                           profile, profiles: 14
                           The receiver declines the video, indicated by
                           port = 0
m=audio 6002 RTP/AVP 4     Media: audio, port=6002, type=RTP/AVP
                           profile, profiles: 4
                           The receiver declines the PCM-coded audio and
                           selects the GSM-coded audio
a=rtpmap:4 GSM/8000        Attribute for profile 4, codec=GSM,
                           sampling rate=8000
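The decline-by-port-0 convention in the answer above can be expressed as a small helper. This sketches only the payload-type intersection step, not the full offer/answer procedure of RFC 3264:

```python
def answer_media(offered_profiles, supported_profiles, local_port):
    """Build the answer's m= line parameters from an offered media line.

    Keep only the offered RTP/AVP payload types we support; if none
    match, decline the stream by answering with port 0 (at least one
    payload type must still be listed on the declined m= line).
    """
    accepted = [p for p in offered_profiles if p in supported_profiles]
    if not accepted:
        return 0, offered_profiles[:1]   # port 0 = stream declined
    return local_port, accepted

# Video offered with profiles 14 and 26; we support neither -> declined
video = answer_media([14, 26], supported_profiles=set(), local_port=6000)
# Audio offered with 0 (PCMU) and 4 (GSM); we support only GSM (4)
audio = answer_media([0, 4], supported_profiles={4}, local_port=6002)
```

With these inputs the video answer comes back with port 0 and the audio answer keeps only profile 4 on port 6002, mirroring the example above.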
Session Modification
Session modification (continued)
• The re-INVITE could have been done by either party - it uses the
  same To, From, and Call-ID as the original INVITE.
• Note that the re-INVITEs do not cause a 180 Ringing or other
  provisional messages, since communication between Alice and Bob
  is already underway.
• Note that the first media session continues despite the SIP
  signalling, until a new agreement has been reached - at which time
  the new media session replaces the former session.
• The re-INVITE can propose changes of any of the media
  characteristics, including adding or dropping a particular media
  stream.
• this adding or dropping may be because the user has moved from
  one wireless cell to another, from one network to another, from
  one interface to another, from one device to another, …
Start and Stop Times
Enable a user to join a broadcast session while the
  broadcast is in progress.
Grouping of Media Lines in the Session
Description Protocol (SDP)
Defines two SDP attributes:
• "group" and
• "mid" - media stream identification
Allows grouping several media ("m") lines together. This is to support:
• Lip Synchronization (LS) and
• Flow Identification (FID) - a single flow (with several media
   streams) that are encoded in different formats (and may be
   received on different ports and host interfaces)
   • Changing between codecs (for example based on current error
   rate of a wireless channel)
Note FID does not cover the following:
   • Parallel encoding using different codecs
   • Layered coding
Lip Synchronization
A session description for a conference that is being multicast. The first
   and second media streams MUST be synchronized.
v=0
o=Laura 289083124 289083124 IN IP4 one.example.com
t=0 0
c=IN IP4 224.2.17.12/127
a=group:LS 1 2
m=audio 30000 RTP/AVP 0
i=voice of the speaker who speaks in English
a=mid:1
m=video 30002 RTP/AVP 31
i=video component
a=mid:2
m=audio 30004 RTP/AVP 0
i=This media stream contains the Spanish translation
Next generation of SDP (SDPng)
• Designed to address SDP's 'flaws':
 • Limited expressiveness
        – for individual media and combinations of media
        – often only very basic media descriptions are available –
            there is a desire for more complex media
 • No real negotiation functionality - SDP today is a “take it or
     leave it” proposal
 • Limited extensibility (not nearly as easy to extend as SIP)
 • No semantics for media sessions! Sessions are only implicit.
• SDPng should avoid "second system syndrome"
 • Hence it should be simple, easy to parse, extensible, and have
    limited scope
 • Session Description and Capability Negotiation
   http://www.ietf.org/internet-drafts/draft-ietf-mmusic-sdpng-08.txt
SIP Mobility
• Terminal mobility ⇒ the terminal moves between subnets
  • Note: Mobile IP supports this at the network layer, while
  SIP supports this at the application layer (without requiring
  Mobile IP underneath)
• Personal mobility ⇒ the person moves between terminals
• Service mobility ⇒ the person has access to the same
  services despite their movement between terminals and/or networks
  • note: the service may be reduced in quality or capabilities
  subject to the current network's capabilities -- but it is the
  same service
  • this implies that personalization of services must be
  distributed to the various terminals that
  the user wishes to use - see the dissertation of Roch Glitho
• Session mobility ⇒ the same session is maintained despite
  the user changing from one device to another
Local Number Portability
 In the PSTN this means a complex set of lookups for the number,
   since the number is no longer tied to an exchange.
 In SIP the portability occurs because of the lookup of
   name@domain, which can be mapped to wherever the user wants!
   (i.e., fully qualified domain names are unique, but
   are not tied to an underlying network address – it is the name-to-
   address mapping which establishes this portability, and it is always
   dynamic)
SIP Service Creation
 It is the increased opportunities for the exchange of
   signaling information via SIP which enable many new
   features and services.
 Services implemented by x
Where x is:
• proxy server,
• called user agent,
• calling user agent, or
• Back-to-Back User Agent (B2BUA)
Services implemented by Extensions
i.e., new methods and headers
See the activities of the IETF SIP, SIPPING, and SIMPLE working
    groups
Proxy servers simply treat unknown methods as an OPTIONS request,
    unless there is a Proxy-Require header.
User agents return:
405 Method Not Allowed         if the method is recognized, but not
                                supported
501 Not Implemented            if it does not recognize the method
420 Bad Extension              if the UAS does not support the
                                requested feature
• All SIP extensions which use the Require or Supported header
     must be documented as an RFC - to prevent interoperability
     problems
• All standardized SIP extensions must document how the extension
    interacts with elements that don't understand this extension
SIP Service Logic


 • Call Processing Language (CPL)
 • SIP Common Gateway Interface (CGI)
 • SIP Java Servlets
Call Processing Language (CPL)
RFC 2824: Call Processing Language (CPL) [103] and [104]
An XML-based scripting language for describing and controlling
   call services.
CPL is a very simple language without variables, loops, or the
   ability to run external programs! {Hence non-trusted end users
   can upload services to their SIP server} However, it has
   primitives for making decisions and acting based on call
   properties (e.g., time of day, caller, called party, …).
There is a Document Type Definition (DTD), “cpl.dtd”, and strict
   parsing is done based on this DTD.
See also Chapter 13 of Practical VoIP: Using VOCAL[1], this
   includes an example of developing a feature in CPL
SIP Common Gateway Interface (CGI)
RFC 3050: Common Gateway Interface for SIP
Similar to HTTP CGI, a SIP CGI script resides on the server and
   passes message parameters via environment variables to a
   separate process. This process sends instructions back to the
   server through its standard output file descriptor.
Scripts can be written in Perl, Tcl, C, C++, Java, …
Of course these scripts (being based on general purpose
programming languages) do not have the limitations of CPL and
   hence only trusted users can be allowed to provide such
   scripts.
CGI scripts have access to both the request headers and the
   body and can therefore do general computations based on all
   this information.
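A toy handler in this CGI style: the server passes message data in environment variables and reads instructions from the script's standard output. The variable names and the action syntax below are simplified illustrations, not the exact RFC 3050 metavariables:

```python
import io
import os
import sys

def handle_request(env=None, out=None):
    """Toy SIP-CGI-style handler: redirect INVITEs outside office hours
    to voicemail with a 302; otherwise emit nothing and let the server
    apply its default behaviour.

    SIP_REQUEST_METHOD and HOUR_OF_DAY are illustrative names, not the
    metavariables defined in RFC 3050.
    """
    env = os.environ if env is None else env
    out = sys.stdout if out is None else out
    method = env.get("SIP_REQUEST_METHOD", "")
    hour = int(env.get("HOUR_OF_DAY", "12"))
    if method == "INVITE" and not (9 <= hour < 17):
        out.write("SIP/2.0 302 Moved Temporarily\r\n")
        out.write("Contact: <sip:voicemail@example.com>\r\n")

# Simulate a CGI invocation: an INVITE arriving at 20:00 gets redirected
buf = io.StringIO()
handle_request({"SIP_REQUEST_METHOD": "INVITE", "HOUR_OF_DAY": "20"}, buf)
```

The decision logic here (time-of-day routing) is the same kind of service CPL can express, but written in a general-purpose language, which is why only trusted users may install CGI scripts.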
SIP Java Servlets
Extends the functionality of a SIP server by passing messages to SIP
   servlets.
Servlets are similar to the CGI concept, but instead of using a
   separate process, the messages are passed to a class that runs
   within a Java Virtual Machine (JVM) inside the server.
Servlets are portable between servers and operating systems, due to
   the portability of the Java code.
For details see: K. Peterbauer, J. Stadler, et al., “SIP Servlet API
   Extensions”, February 2001, (an expired internet draft)
http://www.cs.columbia.edu/sip/drafts/draft-peterbauer-sip-servlet-ext-00.txt
SIP Servlets were defined in A. Kristensen and A. Byttner, “The SIP
   Servlet API”, IETF Draft, September 1999,
http://www.cs.columbia.edu/sip/drafts/draft-kristensen-sip-servlet-00.txt
• Unfortunately this draft expired and was not carried forward, but
   is referenced (and large parts included) in subsequent work.
JAIN APIs
Providing a level of abstraction for service creation across circuit
   switched and packet networks, i.e., bridging IP and IN protocols.
   Goal is provisioning of telecom services by:
• Service Portability: - Write Once, Run Anywhere. (via Java
   portability)
• Network Convergence: (Integrated Networks) - Any Network
• Service Provider Access - By Anyone!
   • to allow services direct access to network resources and
   devices
SIP APIs - especially those within the JAIN™ initiative
(http://java.sun.com/products/jain/index.jsp) :
• JAIN SIP (JSR-000032) - a low level API that maps directly to
   RFC2543 - http://jcp.org/en/jsr/detail?id=32
• JAIN SIP Lite (JSR-000125)- a high-level API, to allow
   application developers to create applications that have SIP as
• SDP API (JSR-000141) - to enable users to manipulate SDP
  messages http://jcp.org/en/jsr/detail?id=141
• JAIN SIP Servlet API (JSR-000116) -
  http://jcp.org/en/jsr/detail?id=116
• SIMPLE related APIs
  • JAIN SIMPLE Instant Messaging (JSR-000165) - to exchange
  messages between SIMPLE
   clients http://jcp.org/en/jsr/detail?id=165
  • JAIN Instant Messaging (JSR-000187) - to control, manage and
  manipulate instant messages between clients through the use of
  presence servers http://jcp.org/en/jsr/detail?id=187
  • JAIN SIMPLE Presence (JSR-000164 ) - to manipulate presence
  information between a SIMPLE client (watcher) and a presence
  server (presence agent) http://jcp.org/en/jsr/detail?id=164
  • JAIN Presence and Availability Management (PAM) API (JSR-
  000123) - http://jcp.org/en/jsr/detail?id=123
  • JAIN Presence (JSR-000186) - to control, manage and manipulate
   • JAIN User Location and Status (ULS) (JSR-000194) - to
        interrogate the location and status of a user‟s mobile
        device http://jcp.org/en/jsr/detail?id=194
• JAIN OAM API Specification v2.0 (JSR-000132) -
  http://jcp.org/en/jsr/detail?id=132
• JAIN ENUM API Specification (JSR-000161) - API to query
  and provision E.164 telephone numbers and their service-
  specific Uniform Resource Identifiers (URI)
  http://jcp.org/en/jsr/detail?id=161
• JAIN 3G MAP Specification (JSR-000137) - to enable mobile
  applications in the 3G domain to talk to each other
  http://jcp.org/en/jsr/detail?id=137
US National Institute of Standards and
Technology - SIP and Jain
http://www-x.antd.nist.gov/proj/iptel/
• NIST-SIP 1.2
• JAIN-SIP Proxy
• JAIN-SIP Instant Messaging Client
• JsPhone - a JAIN-SIP Video Phone
• NIST-SIP traces viewer
• JAIN-SIP gateway
• JAIN-SIP Third Party Call Controller
Parlay
Parlay Group formed (1998) to specify and promote open APIs
   that “intimately link IT applications with the capabilities of
   the communications world”.
Goal: to allow applications to access the functionality of the
   telecoms network in a secure way.
Parlay APIs:
• Service interfaces - provide access to network capabilities
   and information
• Framework interfaces - provide the underlying support
   necessary for the service interfaces to be secure and
   manageable.
The APIs are defined in the Unified Modeling Language (UML).
SIP Request-URIs for Service Control
B. Campbell and R. Sparks, “Control of Service Context
   using SIP Request-URI”, IETF RFC 3087, April 2001,
   proposes a mechanism to communicate
context information to an application (via the use of a
   distinctive Request-URI).
Different URIs provide both state information
   and information about what led to this state transition
   (for example, you were forwarded to the voicemail
   system because the user did not answer vs. being
   forwarded to the voicemail system because the user is
   busy with another call).
Reason Header
Since it is (often) useful to know why a Session
   Initiation Protocol (SIP) request was issued, the
   Reason header was introduced. It encapsulates a
   final status code in a provisional response.
This functionality was needed to resolve the
   "Heterogeneous Error Response Forking Problem"
   (HERFP).
 User Preferences
• Caller preference
  • allows caller to specify how a call should be handled
  • to specify media types: audio, video, whiteboard, …
  • to specify languages (of the callee -- consider for example a help desk call where you want
    to get help in your choice of language)
  • do you want to reach the callee at home or only at work?, via a landline or on their mobile
     phone? …
  • examples: should the call be forked or recurse, do you want to use a proxy or redirect, do
     you want to CANCEL 200 messages or not,
• Called party preference
  • accepting or rejecting calls: based on time of day, day of week, location of called party, from
    unlisted numbers, …

Caller and callee differ:
  • Callee is passive, caller is active
     – Thus the callee's preferences must be defined ahead of time (for example by CPL)
     – However, the caller's preferences can be carried in the request
 • Services (usually) run on the callee's server
 • A given caller might contact any of a large number of servers (each of which will
have to decide how to process this caller's request)
Conclusion: include caller preferences in the request
Contact parameters
Values are either pre-set or indicated when a user REGISTERs:
Parameter  Values                     Example                   Explanation of example(s)
class      personal, business         class=personal            call should go to the "home" not the office
duplex     full, half, send-only,     duplex=full               should be a full-duplex call
           receive-only
feature    voicemail, attendant       feature=voicemail         caller wants to be connected to a voicemail server
language   language tag               language="en,de,se,!fi"   connect the caller to someone who speaks
                                                                English, German, or Swedish, but not Finnish
media      MIME types                 media="text/html"         use HTML as the media type
mobility   fixed, mobile              mobility=fixed            connect to the callee's fixed rather than
                                                                mobile terminal
priority   urgent, emergency,         priority=urgent           call is urgent (as seen by the caller)
           non-urgent
service    fax, IP, ISDN,             service=IP                use IP rather than fax/ISDN/PSTN/…
           PSTN, text
Contact header example
Contact: maguire <sip:maguire@it.kth.se>
   ;language="en,de,se,!es"
   ;media="audio,video,application/chat"
Accept/Reject-Contact header(s)
A SIP request contains Accept-Contact and Reject-Contact
   headers.
Reject-Contact indicates URIs that are not acceptable.
Accept-Contact indicates an ordered list of acceptable URIs.
Indication is by means of rules:
  • set intersection and non-intersection of parameters
  • string match of URIs
Example:
Accept-Contact: sip:sales@acme.com ;q=0,
;media="!video" ;q=0.1,
;mobility="fixed" ;q=0.6,
;mobility="!fixed" ;q=0.4
In this example, the caller does not want to talk to
   sales@acme.com, has a preference for video, and somewhat prefers the user's
   fixed terminal (q=0.6) over the mobile one (q=0.4).
Callee (i.e., called party) Parameter processing
• The proxy obtains the list of URIs, and the parameters for each, for the
   callee
• Those that match a rule in Reject-Contact are discarded
• The matching set of URIs is determined
• q parameters are merged
• The result is split into sets of q-equivalence classes
• Parallel search of the highest-preference q-equivalence class
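The grouping into q-equivalence classes can be sketched as follows. The URIs are made up for illustration; the proxy tries each class in order, contacting everything within a class in parallel:

```python
from itertools import groupby

def preference_classes(contacts):
    """Split (uri, q) pairs into q-equivalence classes, highest q first.

    Contacts sharing a q value form one class; the proxy searches the
    highest-preference class in parallel before moving to the next."""
    ordered = sorted(contacts, key=lambda c: c[1], reverse=True)
    return [[uri for uri, _ in group]
            for _, group in groupby(ordered, key=lambda c: c[1])]

contacts = [("sip:a@home.example.com", 0.6),
            ("sip:a@work.example.com", 0.6),
            ("sip:a@mobile.example.com", 0.4)]
classes = preference_classes(contacts)
# -> [[home, work] tried in parallel first, then [mobile]]
```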
Accept-Contact Example
example from http://www.ietf.org/proceedings/99nov/I-D/draft-ietf-mmusic-sip-caller-00.txt
sip:mjh@aciri.org;language=en;media=audio,video;q=.8
sip:m.handley@acm.org;class=business;q=0.3
Request-Disposition
Defines services desired from proxy servers
Feature values        Meaning
proxy, redirect       whether to proxy or redirect
cancel, no-cancel     whether to return just the first 200-class response, or all
                      2xx responses
fork, no-fork         whether to fork or not (i.e., proxy to only a single address)
recurse, no-recurse   whether a proxy server, upon receiving a 3xx-class response,
                      should recurse (i.e., send requests to the addresses listed in
                      the response) or not (i.e., simply forward the list of
                      addresses upstream towards the caller)
parallel, sequential  for a forking proxy server, whether it should send the request
                      to all known addresses at once (parallel) or go through them
                      sequentially, i.e., contacting the next address only after
                      receiving a non-2xx or non-6xx final response
queue, no-queue       if the called party is temporarily unreachable, the caller can
                      indicate that it wants to be queued rather than rejected
SIP Service Examples
Some examples of SIP services are listed below:
Call Hold, Consultation Hold, Music On Hold, Unattended Transfer,
Attended Transfer, Call Forwarding - Unconditional, Call Forwarding - Busy,
Call Forwarding - No Answer, 3-way Conference - Third Party is Added,
3-way Conference - Third Party Joins, Single Line Extension, Find-Me,
Call Management (Incoming Call Screening), Call Management (Outgoing
Call Screening), Call Park, Call Pickup, Automatic Redial
Privacy-Conscious Personalization
Bell Labs has developed software designed to give cell
   phone users greater control over the disclosure of their
   location.
Preferences could depend on:
• who is requesting the location data,
• what time of day it is,
• or the caller's activities,
• ….
Requests for location are then filtered through these
   preferences, and are permitted or blocked accordingly.
Operators might provide users with a selection of
   “preference palettes” to start with, the user could then
   customize their preferences over time.
SIP Security
SIP Security - RFC 3261 [120]
If you want to secure both the SIP and RTP traffic, then you should
   probably be using an IPSec VPN.
SIP's rich signalling means that the traffic reveals:
• the caller's and called party's IP addresses
• contact lists
• traffic patterns
For further details concerning how complex it is to protect such
   personal information see the dissertation by Alberto Escudero-
   Pascual, “Privacy in the next generation Internet, Data Protection
   in the context of European Union Data Protection Policy”
 A call anonymizer service can be built using a back-to-back
   user agent (B2BUA).
SIP Digest Authentication
Built upon HTTP's challenge/response mechanism.
Challenges:
• 401 Unauthorized, or
• 407 Proxy Authorization Required
Header fields:
Digest                                      the schema name
username="A"                                The user name as specified in the credentials
realm="sip:proxy.com"                       realm - copied from the challenge; indicates the
                                            domain for the authentication
nonce="e288df84f1cec4341ade6e5a359"          nonce - copied from the challenge; a unique string, typically
                                             generated from a timestamp (and possibly a seed), then
                                             encrypted with the user's private key
opaque="63632f41"                            opaque string which should be returned unchanged to be
                                             matched against the challenge (allows for a stateless
                                             system)
uri="sip:UserB@there.com"                     URI from the Request-URI
response="1d19580cd833064324a787ecc"          message digest computed using user‟s credentials and
                                              the nonce
SIP and S/MIME
RFC 3261 describes the use of Secure MIME (S/MIME) message
   bodies:
• SIP header fields can be encrypted in an S/MIME message body
• see RFC 2633
Provides:
• Message integrity
  • Allows detection of any modification of message contents
• Message privacy
  • Private headers protected by S/MIME
• Identity
  • Certificates can be verified to validate identity
SDP & RTP security
As noted earlier, SDP enables you to say that you will
encrypt the media stream which is sent via RTP - such as
   with DES in CBC Mode (DES-CBC) or AES in f8-mode
This is done by adding a key field to the SDP for each media
   description: k=<method>:<encryption key>
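For example (a sketch; the base64 key material below is a made-up placeholder), a media description carrying a key line could look like:

```
m=audio 49170 RTP/AVP 0
k=base64:aGlkZGVuLWtleS1tYXRlcmlhbA==
```

The k= methods defined for SDP are clear, base64, uri, and prompt.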
User identity
J. Peterson and C. Jennings in an IETF draft [122] define mechanisms and
   practices to assure the identity of the end user that originates a SIP
   request (it does not cover identity for responses).
Their identity mechanism derives from the following principle:
If you can prove you are eligible to register in a domain under a
   particular address-of-record (AoR), then you are also proving
   that you are capable of receiving requests for that AoR. ∴ when you
   place that AoR in the From header field of a SIP request other than a
   registration (e.g., INVITE), you are providing a “return address” where
   you can legitimately be reached.
Introduces:
(a) authentication service (at either a user agent or a proxy server) and
(b) two new SIP headers, Identity & Identity-Info headers
miniSIP
miniSIP supports pluggable CODECs:
• each RTP packet says which codec was used
• SDP can specify multiple codecs each with different properties (including
    better than toll quality)
• tests used PCM ⇒ sending 50 packets of 160 byte RTP payload length
    (packet size is 176 bytes) per second (i.e. 64 Kbps), i.e., 20 ms between
    packets
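The arithmetic behind these figures can be checked directly (a small sketch; 176 bytes is the on-wire RTP packet size quoted above):

```python
# PCM (G.711) test configuration from the slide
payload_bytes = 160        # RTP payload per packet
packets_per_sec = 50

payload_bitrate = payload_bytes * 8 * packets_per_sec  # audio bits/s
interval = 1 / packets_per_sec                         # seconds between packets
wire_bitrate = 176 * 8 * packets_per_sec               # incl. RTP packet overhead

print(payload_bitrate)  # 64000 (i.e., 64 Kbps)
print(interval)         # 0.02  (i.e., 20 ms)
print(wire_bitrate)     # 70400
```

So the codec rate alone is 64 kbps; the RTP packet overhead raises the rate above the UDP/IP layer to about 70.4 kbps, before UDP and IP headers are added.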
• Configuration used in the test described next:
   • time to transmit/receive a packet ~55-60 ms
   • Laptop ASUS 1300B with Pentium III processor, 700 MHz
   • 112 MB RAM (no swapping)
   • Operating System: SuSE Linux 7.1 Personal Edition
   • Security Services: confidentiality and message authentication
     (with Replay Protection)
  • Cryptographic Algorithms: AES in Counter Mode for the confidentiality
    and HMAC SHA1 for the message authentication
  • Lengths: master key: 16 bytes; salting key: 14 bytes; authentication key:
    16 bytes; encryption key: 16 bytes; block: 128 bits
Secure Real Time Protocol (SRTP)
Described in IETF RFC 3711 [131], provides confidentiality,message authentication, and
    replay protection for RTP and RTCP traffic.
Sender behavior
1. Determine the cryptographic context to use
2. Derive session keys from the master key (via MIKEY)
3. Encrypt the RTP payload
4. If message authentication is required, compute the authentication tag and append it
5. Send the SRTP packet to the socket

Receiver behavior
1. Read the SRTP packet from the socket
2. Determine the cryptographic context to be used
3. Determine the session keys from the master key (via MIKEY)
4. If message authentication and replay protection are provided, check for
   possible replay and verify the authentication tag
5. Decrypt the Encrypted Portion of the packet; if present, remove the authentication tag
6. Pass the RTP packet up the stack
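The tag computation and check in the steps above can be sketched with Python's stdlib HMAC (a simplification of RFC 3711: a hypothetical all-zero session auth key, the default 80-bit truncated HMAC-SHA1 tag, and the AES-CTR encryption step omitted):

```python
import hmac, hashlib

AUTH_KEY = bytes(20)  # hypothetical 160-bit session authentication key

def srtp_auth_tag(packet: bytes, roc: int, key: bytes = AUTH_KEY) -> bytes:
    # SRTP authenticates the packet plus the 32-bit rollover counter (ROC),
    # then truncates HMAC-SHA1 to 80 bits (the default tag length)
    mac = hmac.new(key, packet + roc.to_bytes(4, "big"), hashlib.sha1)
    return mac.digest()[:10]

def verify(packet_with_tag: bytes, roc: int, key: bytes = AUTH_KEY) -> bool:
    packet, tag = packet_with_tag[:-10], packet_with_tag[-10:]
    return hmac.compare_digest(tag, srtp_auth_tag(packet, roc, key))

pkt = bytes(12) + b"voice-frame"          # fake RTP header + payload
tagged = pkt + srtp_auth_tag(pkt, roc=0)  # sender appends the tag
print(verify(tagged, roc=0))              # receiver verifies: True
```

Replay protection (a sliding window over sequence numbers) and key derivation from the MIKEY master key are omitted here.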
Multimedia Internet KEYing (MIKEY) as the key management protocol
Secure Call Setup
Total delay (in ms)              Calling Delay      Answering Delay
No security                       19.5                     9.5
MIKEY, shared key                 20.9                     10.5
MIKEY, Diffie-Hellman             52.5 (UDP)               47.6 (UDP)
                                  58.9 (TCP)               48.9 (TCP)
• name-servers (BIND 8.2 on Linux 2.4, 500 MHz Pentium 3 laptops)
• root name-server ns.lab manages the delegation of minisip.com and
  ssvl.kth.se to their respective name server
• two routers (1.1 GHz Celeron desktops) perform static routing, and
  each router also runs a SIP server, SIP Express Router (SER v0.8.11)
• Alice and Bob use minisip, running on 1.4 GHz Pentium laptops, running
  Linux 2.4
Timed Efficient Stream Loss-tolerant Authentication (TESLA)
SRTP TESLA was designed to provide efficient data origin
   authentication for multicast and broadcast sessions.
This is needed since we don't want to create all possible pairwise
   authentications for the participants in a conference.
NATs and Firewalls
Because Network Address Translation (NAT) devices change addresses
and sometimes port numbers, and because addresses and port numbers are
inside both SIP and SDP, there can be a problem!
Note: CNAMEs in RTCP may need to be updated by the Network Address
Translation (NAT) to hide private network addresses.
Two protocols being developed to help deal with NATs:
    • Simple Traversal of User Datagram Protocol Through Network
   Address Translators (STUN)
   • Globally Routable User Agent URI (GRUU)
• a URI which can be used by anyone on the Internet to
  route a call to a specific UA instance
 See the internet drafts:
 • Interactive Connectivity Establishment (ICE): A Methodology for
   Network Address Translator (NAT) Traversal for Multimedia Session
   Establishment Protocols [147]
 • How to Enable Real-Time Streaming Protocol (RTSP) traverse Network
   Address Translators (NAT) and interact with Firewalls
   (<draft-ietf-mmusic-rtsp-nat-03> has expired)
Types of NAT
Source NAT                All callers look like they come from the same
                         IP address
Destination NAT           Which internal address should traffic to a
                         given port be forwarded to?
Four types of NATs:
Type                Description
Full Cone           maps a specific internal IP address and port
                    number to a given external IP address and port
                    number
                    This is the only type of NAT that allows an
                    external host to contact an internal host
                   (i.e., behind the NAT) without having previously
                   received packets from this internal host.
Restricted Cone     external hosts must have the IP address of an
                   internal host prior to communicating with this
                   internal host
Port Restricted
Cone              external hosts must have the IP address and port
                  number of an internal host prior to communicating
                  with this internal host
Symmetric        assigns a unique external IP address and port
                 mapping for each specific destination, so different
                 destinations see different mappings
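The key difference between cone and symmetric NATs is whether the external mapping depends on the destination. A toy model (the port range 20000+ is an arbitrary choice for illustration):

```python
import itertools

class NAT:
    """Toy model contrasting cone and symmetric port mapping."""
    def __init__(self, symmetric: bool):
        self.symmetric = symmetric
        self.mappings = {}
        self.ports = itertools.count(20000)  # pool of external ports

    def map(self, internal, dest):
        # Cone: one external port per internal (ip, port), reused
        #       for every destination.
        # Symmetric: a fresh external port per (internal, destination) pair.
        key = (internal, dest) if self.symmetric else internal
        if key not in self.mappings:
            self.mappings[key] = next(self.ports)
        return self.mappings[key]

cone, sym = NAT(symmetric=False), NAT(symmetric=True)
src = ("10.0.0.5", 5060)
print(cone.map(src, "hostA") == cone.map(src, "hostB"))  # True
print(sym.map(src, "hostA") == sym.map(src, "hostB"))    # False
```

This is why STUN works poorly behind symmetric NATs: the mapping a STUN server observes is not the mapping the eventual peer will see.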
Cone vs. Symmetric NAT
 (a) Cone NAT vs. (b) Symmetric NAT - figure inspired by
   figures 1 and 2
NAT traversal methods
• Symmetric media streams
• STUN protocol
  • also: Extended STUN for Symmetric NAT
• rport SIP extension
  • See RFC 3581 - defines a new parameter for the Via header
     field, called "rport", this “allows a client to request that the
     server send the response back to the source IP address and
     port from which the request originated.”
• OPTIONS request registration refresh
  • Causes the UA to send traffic out - thus refreshing the NAT
     bindings
• Outgoing INVITE transaction refresh
• Traversal using Relay NAT (TURN)
  • insert a server in the media and signalling path (to deal with
   Symmetric NATs)
• Application Layer Gateway (ALG)
  • Here the NAT knows about SIP and “does the right thing”
• Universal Plug and Play (UPnP)
  • Use UPnP to control the NAT to open a specific “pinhole” in
      the firewall
• Manual Configuration
  • manually configure a set of addresses and ports for SIP to use
• Tunnel
  • Tunnel the traffic - inside IPsec, HTTP (i.e., act like HTTP), …
A NAT supports “hairpinning” if it can route packets coming from the
  private network addressed to a public IP address back into the
  private network. For example, a mobile user might actually be
  connected to the private network - thus packets to this user don't
  actually need to be sent out and then sent back into the private
  network!
STUN (Simple Traversal of UDP through NATs
  (Network Address Translation))
STUN, defined in RFC 3489 [139], assists devices behind a
  NAT firewall or router with their packet routing.
• enables a device to find out its public IP address and the
  type of NAT service it is sitting behind
   • By querying a STUN server with a known public address,
  the STUN client learns the public IP and port address that
  were allocated by this client's NAT
• operates on TCP and UDP port 3478
• uses DNS SRV records to find STUN servers attached to a
  domain. The service name is _stun._udp or _stun._tcp
• Unfortunately, it is not (yet) widely supported by VOIP
  devices
Note: The STUN RFC states: This protocol is not a cure-all for
  the problems associated with NAT.
Open source STUN servers from vovida.org, larry.gloo.net,
  stun.fwdnet.net, stun.softjoys.com, and others.
STUN steps
1 Client queries a STUN server for a shared secret username and
   password
2 Server responds with a unique username/password combination for this
   client
3 Client sends a binding request using this username/password to the
   server via UDP
4 Server copies the source IP and port number into a binding response,
  and sends this response back to the client
5 Client compares the IP address and port number received from the
  server with its local IP address and port number. If they do not match,
   then the client is behind some type of NAT.
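The wire format behind steps 3-5 is simple enough to sketch. This toy example (hypothetical values; RFC 3489 framing, which lacks the magic cookie later added by RFC 5389) builds a Binding Request header and parses a MAPPED-ADDRESS attribute out of a canned response body:

```python
import os, struct

def binding_request() -> bytes:
    # RFC 3489 header: 16-bit type (0x0001 = Binding Request),
    # 16-bit message length, 128-bit transaction ID
    return struct.pack("!HH16s", 0x0001, 0, os.urandom(16))

def parse_mapped_address(attrs: bytes):
    # Walk the TLV attributes looking for MAPPED-ADDRESS (type 0x0001)
    i = 0
    while i < len(attrs):
        atype, alen = struct.unpack_from("!HH", attrs, i)
        if atype == 0x0001:
            _, family, port = struct.unpack_from("!BBH", attrs, i + 4)
            ip = ".".join(str(b) for b in attrs[i + 8:i + 12])
            return ip, port
        i += 4 + alen
    return None

# Canned MAPPED-ADDRESS attribute carrying 192.0.2.1:3478
attr = struct.pack("!HHBBH4B", 0x0001, 8, 0, 0x01, 3478, 192, 0, 2, 1)
print(parse_mapped_address(attr))  # ('192.0.2.1', 3478)
```

A real client would send binding_request() over UDP to the server's port 3478 and compare the parsed mapped address against its local socket address (step 5).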
 UDP and TCP Firewall Traversal problems
 UDP and TCP NAT Traversal problems
SIP Application Level Gateway (ALG) for Firewall
  Traversal
 Use a proxy within the (possibly private) network:
 Firewall permits SIP and RTP traffic to/from the
  Application Level Gateway (ALG) proxy.
  Middlebox communications (MIDCOM)



The generic problem of enabling complex applications through middleboxes
is being addressed by the Middlebox Communications (MIDCOM) Working
Group; they do so via MIDCOM agents which perform ALG functions,
logically external to a middlebox.
Application aware Middlebox
Newport Networks' Automatic Channel Mapping™ (ACM) [145]:
• SignallingProxy™ acts as a high-performance B2BUA (Back-to-Back
  User Agent)
• MediaProxy™ provides a transit point for RTP and RTCP media
  streams between User Agents
Intercept architecture


 • The existence of Intercepts should be transparent to both the subject and other LEAs!
 • The dotted links (probably SNMPv3) must be secured to prevent Unauthorized Creation
   and Detection of intercepts - while solid red links must be secured to protect intercept
   related information (IRI)
 • Intercept [Access] Point (IAP): router, PSTN gateway, SIP proxy, RADIUS server, …
 Voice over IP Security Alliance
 The Voice over IP Security Alliance (http://www.voipsa.org/) was
    formed February 7, 2005
 They have a moderated mailing list: VOIPSEC

Spam over Internet Telephony (SPIT)
There is rising concern that misconfigured voice gateways, … will lead to increased IP telephony SPAM.
One solution is using speaker recognition and then checking to see if this speaker
is on:
• a white list (automatic accept),
• a black list (automatic reject), or
• unknown (the message could be recorded and the user listens to it later,
then adds the caller to their white or black list).
SIP Telephony
SIP Telephony (SIP-T) -- for details see RFC 3204
A gateway between the SIP world and the PSTN world looks like a SIP
   user agent to other SIP entities and like a terminating telephone
   switch to the PSTN.
Advantages         Provides ISUP transparency (by carrying ISUP
                      message as multipart MIME messages in the
                      SIP messages between SIP-T gateways)
Disadvantages      Does not interwork with plain SIP; perpetuates ISUP!
Telephony Routing over IP (TRIP)
• TRIP is a gateway to Location Server (LS) protocol
• Designed for an interdomain gateway
• Allows the gateway to advertise what PSTN number range it is a
   gateway for
For use within a domain there is a version for between a gateway and a
   proxy: TRIP-lite
A Location Server is responsible for an Internet Telephony
   Administrative Domain (ITAD).
Call Control Services
Generally include advanced telephony services such as:
• Call Transfer, both Attended and Unattended
• Call Park/Un-Park
• Multistage Dialling
• Operator Services
• Conference Call Management
• Call Mobility
• Call Pickup
Call Center Redesign using SIP
• Replace the call center switch via VoIP
• Interactive Voice Response (IVR) - using a media
   server (for pre-recorded clips) and SIP signalling
• Automatic Call Distribution (ACD) - replace with
   scripts using Call Processing Language (CPL)
• Agent Workstation - a PC with a SIP client
• The agent has access via Web and various databases
   to information, which can be indexed by the agent
   using information from the SIP request.
Additional SIP Telephony services
• SIP for the Hearing Impaired
• Emergency Services
• Precedence signalling (military, government, emergency
  services, …)
   • RFC 3487 gives the requirements for resource priority
  mechanisms for SIP
• Message Waiting, Voice Mail, and Unified Messaging
• Call Waiting
• SIP continuing presence service
  • The I-Am-Alive (IAA) database [168] is a distributed
  database system that users can query after-the-event to
  determine the status of a person - it does not require the
  session properties of SIP
• Is there a SIP corollary - for continuing presence?
Emergency Telecommunication Service (ETS)
Telephony signaling, when used in Internet-based telephony services,
   needs to support a number of requirements in addition to the general
   requirements:
• Telephony signaling applications (used with Internet-based telephony) must
   be able to carry labels.
• The labels must be extensible
  • to support various types and numbers of labels.
• These labels should have a mapping to the various emergency related
   labels/markings used in other telephony based networks, e.g., PSTN
  • To ensure that a call placed over a hybrid infrastructure (i.e.,
   PSTN+Internet) can carry the labels end-to-end with appropriate
   translation at PSTN/Internet boundaries.
  • Only authorized users or operators should be able to create non-ordinary
   Labels (i.e., labels that may alter the default best effort service).
  • Labels should be associated with mechanisms providing strong end-to-
   end integrity
  • Operators should have the capability of authenticating the label
• Application layer IP telephony capabilities must not
  preclude the ability to do application layer
  accounting.
• Application layer mechanisms in gateways and
  stateful proxies that are specifically in place to
  recognize ETS type labels must be able to support
  “best available” service (i.e., better than “best
  effort”).
Emergency Services (E911)
We need to support 3 things:
• There must exist an emergency address (similar to 911, 112, help, …)
• find Public Safety Answering Point (PSAP)
  • outbound proxy -- only if there is a well bounded geographic area served by this
    proxy
  • use DNS where the user or device enters a relevant name: e.g.,
    pittsburgh.pa.911.arpa
  • SLP - but scope not likely to coincide with ESR
  • call volume:
  – Sweden: SOSAlarm.se has 20 call centers distributed around Sweden with ~18
    million calls/year with ~20% of them calls to 112; the rest are automatic alarms
  – US: National Emergency Number Association (NENA) reports >500,000 calls/day
    or 190 million a year (more than 80% are not emergencies ⇒ 311 non-emergency
    number)
• obtain caller‟s identity and geographical address
   • this is done to minimize prank calls
   • caller provides in request
      – Geographic position: N 59° 24.220' E017° 57.029' +/- 77m
          and/or
      – Geographic Location: "5th floor, Isafjordsgatan 22, Kista,
Vonage 911 service
http://www.vonage.com/no_flash/features.php?feature=911
• User must pre-designate the physical location of their Vonage
   line and update Vonage when the user moves
• 911 dialing is not automatically a feature of having a line
  • users must pre-activate 911 dialing
  • user may decline 911 dialing
• A 911 dialed call will be connected to a general access line at the
     Public Safety Answering Point (PSAP)
  • thus they will not know your phone number or location
• Service may not be available due to
  • a local power failure (your IP phone needs power)
  • your local ISP not being able to offer service
  • one of the transit networks not being able to offer service
  • the voice gateway to the PSTN not being in service
  • …
Vonage equips PSAPs with VoIP
Vonage Equips Over 100 New Counties and 400 Calling Centers
   With E911 in Just One Month, Vonage Press Release, March
   7, 2006
• http://www.vonage.com/corporate/press_index.php?PR=2006_03_07_0
• "Nearly 65 Percent of Vonage Customers Now Have E911"
• "In February alone, Vonage equipped an additional 400 calling
   centers in over 100 new counties with E911 -- bringing the
   total number of calling centers across the nation with E911
   service to over 3,400, which is more than half of the nation's
   calling centers. While it took Vonage less than a year to turn
   on E911 in more than one-half of the nation's PSAPs, it took
   the wireless industry 10 years to accomplish the same feat."
• "In the event Vonage is unable to connect to the 911 system
   or for customers who are using mobile devices such as wifi
   phones or softclients, Vonage offers a national emergency
   call center which enables customers to get local help when
Geographic Location/Privacy Working Group (GEOPRIV)
 GEOPRIV ( http://www.ietf.org/html.charters/geopriv-charter.html ) is an
  IETF working group tasked with establishing a means of
  disseminating geographic data that is subject to the same sorts
  of privacy controls as presence is today.
 Jon Peterson, “A Presence-based GEOPRIV Location Object Format”,
  IETF draft (original version 14-Jan-04 - current version September 9, 2004),
  based on earlier work done in formulating the basic requirements for
  presence data -- the Presence Information Data Format (PIDF).
                      Conferencing
 Conferencing Models [176]
 Type of Conference                 Description                                      Scale
 Endpoint mixing                    One end point acts as a mixer for all            small
                                    the other end points
 SIP server and distributed media   Central SIP server establishes a full mesh
                                    between all participants - each participant
                                    does their own mixing
 Dial-in conference                 All participants connect to a conference
                                    bridge which does the mixing for each
                                    participant
 Ad hoc centralized conference      Two users transition to a multiparty
                                    conference, by one of them using third-party
                                    signaling to move the call to a conference
                                    bridge
 Large multicast conference         Users join the multicast group
 Commercial conference bridges authenticate the
  users joining the conference.
 SIP Conferencing
 RFC 4353 [177] defines SIP procedures for the following
    common operations:
   • Creating Conferences
   • Adding Participants
   • Removing Participants
   • Destroying Conferences
   • Obtaining Membership Information
   • Adding and Removing Media
   • Conference Announcements and Recordings
   How to realize conferences
   • Centralized Server
   • Endpoint Server
   • Media Server Component
   • Distributed Mixing
 Mixed Internet-PSTN Services
 • PSTN and Internetworking (PINT)
 • Servers in the PSTN Initiating Requests to
  Internet Servers (SPIRITS)
 • Telephony Routing over IP (TRIP)
 PSTN and Internetworking (PINT)
 PSTN and Internetworking (PINT) [185] - an action from the Internet
   invokes a PSTN service (note: this is one-way invocation), examples:
   • Request to Call ⇒ “Click to Connect” from a web page
   • Request to Fax Content ⇒ “Click to FAX”
   • Request to Speak/Send/Play Content
   • …
   Based on SIP extensions (SIPext), which in actuality are SDP
    extensions (i.e., the body of SIP messages). Redefines some methods
    (INVITE, REGISTER, and BYE) and introduces three new methods:
   • Subscribe - request completion status of a request
   • Notify - receive status updates
   • Unsubscribe - cancel monitoring of a request
 Servers in the PSTN Initiating Requests to Internet Servers (SPIRITS)
 SPIRITS protocol [188] - implementing a family of IN services via an
   internet server (rather than in the PSTN)
 For example, internet call waiting (ICW): calling a busy phone in the
   PSTN network could pop up a call-waiting panel on the client that is
   using this telephone line. This replaces earlier solutions such as:
   • Ericsson's PhoneDoubler, Ericsson Review, No. 04, 1997
     http://www.ericsson.com/about/publications/review/1997_04/article55.shtml
   • PDF of the entire article:
     http://www.ericsson.com/about/publications/review/1997_04
 Telephony Routing over IP (TRIP)
 Telephony Routing over IP (TRIP) [194] - finding a route from the
   Internet to a gateway nearest to where the call should be terminated
 The Telephony Routing Protocol is modeled after the Border Gateway
   Protocol (BGP)
 Authentication, Authorization, Accounting
 (AAA)
 This becomes a major issue especially in conjunction with QoS since,
   for better than best effort service, someone probably has to pay for
   this high QoS - AAA is necessary to decide who you are, if you are
   allowed to ask for this service, and how much you should be charged.
   See [202] and “Authentication, Authorization and Accounting
   Requirements for the Session Initiation Protocol” [197].
SIP Accounting
 For definition of terms see RFC 2975
 Purposes:
 • controlling resource usage (for example, gateways to the PSTN from
   which someone could place a very expensive international call)
 • real-time:
   • fraud detection
   • pre-paid subscriptions
 • off-line:
   • monthly/quarterly billing
   • deriving usage patterns ⇒ planning upgrades (resource
     dimensioning), input for fraud detection, …
 Resources to account for:
 Open Settlement Protocol (OSP)
 (mostly) off-line settlement between operators based on Call Detail
   Records
 Open Settlement Protocol developed as part of ETSI project TIPHON
   (Telecommunications and Internet Protocol Harmonization Over
   Networks) [200]
 Based on exchange of Extensible Markup Language (XML) messages via HTTP
   <!DOCTYPE Message [
   <!ELEMENT Message (( PricingIndication | PricingConfirmation |
    AuthorisationRequest | AuthorisationResponse |
    AuthorisationIndication | AuthorisationConfirmation |
    UsageIndication | UsageConfirmation | ReauthorisationRequest |
    ReauthorisationResponse )+ ) >
   ... ]>
 Achieving QoS
 • Over provision!
 • If this fails, then use TOS field or Diffserv
 • Much of the problem is on the access network - hence
  TOS or Diffserv even only on these links may be enough
 • If this fails, then use RSVP
 • Much more complex - especially when done over several
  operators' domains
Some measured delays
 Actual performance of SIP phone to SIP phone and software
  applications over a LAN shows that the performance of SIP
  phones is well within acceptable delay.
                       Some statistics from Qwest for POP to POP
                        measurements
 Voice Quality
 • Mean Opinion Score (MOS) - defined in ITU-T P.800
   • ITU test based on using 40 or more people from different ethnic or
     language backgrounds listening to audio samples of several seconds
     each
   • Human listeners rate the quality from 1 to 5; 5 being perfect,
     4 “toll-quality”, …
 • Perceptual Speech Quality Measurement (PSQM) - ITU-T P.861
   • A computer algorithm - so it is easy to automate
   • scale of 0 to 6.5, with 0 being perfect
   • Designed for testing codecs
   • test tools from Agilent [204], Empirix, Finisar, … - cost US$50k
     and up
 • PSQM+
   • Developed by Opticom
 Rating voice quality in practice
 One approach is to occasionally ask IP phone users to indicate how
   the quality of their call was at the end of the call ⇒ MOS scoring!
 Another is exemplified by Susan Knott, global network architect for
   PricewaterhouseCoopers:
   “But I've found that if my vice president of finance can talk to my
    CIO [over a VoIP connection], and they both say the quality of the
    connection is OK, then I say that's good enough.”
   Phil Hochmuth, “Quality question remains for VoIP”, NetworkWorld,
    Vol. 19, Number 40, October 7, 2002, pp. 1 and 71; quote is from
    page 71.
 QoS Proprietary vs. Standards based
 Agere Systems, Inc.'s VoIP “Phone-On-A-Chip” used a proprietary voice
   packet prioritization scheme called Ethernet Quality of Service using
   BlackBurst (EQuB), an algorithm (implemented in hardware) that ensures
   that voice packets are given the highest priority in their collision
   domain.
 As of 2002, their Phone-On-A-Chip solution implements a software-based
   IEEE 802.1q tagging protocol (i.e., Virtual Local Area Network (VLAN)
   tagging) for outgoing Ethernet frames.
 QoS for SIP
 SDP can be used to convey conditions which must be met:
 • direction for QoS support: send, receive, or
  bidirectional
 • along with a “strength” parameter: optional or mandatory
 If conditions can be met then a COMET is sent. See also
  [217].
 VoIP traffic and Congestion Control
 RFC 3714: IAB Concerns Regarding Congestion Control for Voice Traffic
   in the Internet [209] - describes the concerns of the IAB due to the
   persistence of VoIP clients which continue to send RTP streams despite
   high packet loss rates, with respect to:
   • the risks of congestion collapse (along the end-to-end route) and
   • fairness for congestion-controlled TCP traffic sharing the links.
   When a steady-state packet drop rate >> a specified drop rate, the
    flow should be terminated or suspended. Thus:
   • RFC 3551: RTP Profile for Audio and Video Conferences with Minimal
     Control - should be changed to
 Delay and Packet Loss effects
 The effect of delay and packet loss on VoIP when using FEC has
   been studied by many researchers [211], [212], [213], [214].
 A rule of thumb: When the packet loss rate exceeds 20%, the audio
   quality of VoIP is degraded beyond usefulness (cited as [S03] in
   [209]).
 Normally in telephony, when the quality falls below a certain level
   users give up (i.e., they hang up). Does this occur in the absence
   of a cost associated with not hanging up?
 ∴ according to [209]:
   if the loss rate is persistently unacceptably high relative to the
    current sending rate
   & the best-effort application is unable to lower its sending rate:
   ⇒ the flow must discontinue:
 When to continue (try again)
 Probabilistic Congestion Control (PCC) [215] is based on:
 • calculating a probability for the two possible states (on/off) so
   that the expected average rate of the flow is TCP-friendly,
 • performing a random experiment that succeeds with the above
   probability to determine the new state of the non-adaptable flow,
   and
 • repeating the previous steps frequently to account for changes in
   network conditions.
 The off periods need to be fairly distributed among users and the on
   periods need to be long enough to be useful.
 When to try again is determined by: probing the network while in the
   off state
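The probability calculation in the first step can be sketched as follows (a toy model; in a real implementation the TCP-friendly rate would come from a TCP throughput equation fed by measured loss and RTT, which is omitted here):

```python
import random

def on_probability(flow_rate: float, tcp_friendly_rate: float) -> float:
    # Choose p so that the expected rate, p * flow_rate,
    # equals the TCP-friendly rate (capped at 1.0)
    return min(1.0, tcp_friendly_rate / flow_rate)

def next_state(flow_rate, tcp_friendly_rate, rng=random.random):
    # Random experiment: flow stays "on" with probability p,
    # otherwise it goes "off" for this period
    p = on_probability(flow_rate, tcp_friendly_rate)
    return "on" if rng() < p else "off"

# A 64 kbps voice flow where TCP would only get 32 kbps:
print(on_probability(64_000, 32_000))  # 0.5 -> on half the time
```

Averaged over many periods, the flow then consumes no more bandwidth than a TCP flow would on the same path.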
 VoIP quality over IEEE 802.11b
 Two MSc thesis (“exjobb”) reports:
 Juan Carlos Martín Severiano, “IEEE 802.11b MAC layer's influence on
   VoIP quality: Measurements and Analysis” [219]
 Victor Yuri Diogo Nunes, “VoIP quality aspects in 802.11b networks”
   [220]
 Application Policy Server (APS)
 Gross, et al. proposed the use of an Application Policy Server (APS)
   [203]
 Customer Application Policy Server
 ISP or enterprise policy domain
 Non-voice Services and IP Phones
 Phone Services: built using scripts which the IP phone
    executes to acquire information and display it
   For example, some of the Cisco IP telephones (7940
    and 7960) have a web browser which understands XML
    and a 133x65 pixel-based LCD display to display output.
   Sample services:
   • Conference room scheduler
   • E-mail and voice-mail messages list
   • Daily and weekly schedule and appointments
   • Personal address book entries (⇒ any phone can
    become “your” phone)
   • Weather reports, Stock information, Company news,
    Flight status, Transit schedules, …
   • Viewing images from remote camera (for security,
    for a remote receptionist, …)
 XML
 XML objects include: CiscoIPPhoneMenu,
  CiscoIPPhoneText, CiscoIPPhoneInput,
  CiscoIPPhoneDirectory, CiscoIPPhoneImage,
 CiscoIPPhoneGraphicMenu, CiscoIPPhoneIconMenu,
  CiscoIPPhoneExecute, CiscoIPPhoneError and
  CiscoIPPhoneResponse.
 Cisco IP Phone Services Software Developer's Kit:
 http://cisco.com/warp/public/570/avvid/voice_ip/cm_xml/index.html
 Invoking RTP streams
 On the Cisco phones it is possible to invoke RTP streaming (transmit
  or receive) via URIs in the above services. RTP information for the
  stream types must be of the form:
 Network Appliances
   VOCAL System Overview
     VOCAL Servers
•   Marshal server (MS)
•   User Agent (UA) Marshal server
–    interface to/from IP phones connected to this network
–    can do different types of authentication on a per-user basis
•   (PSTN) Gateway Marshal servers
–    provides interworking with PSTN
•   Internet Marshal server
–    interface to/from a SIP proxy server on another IP network
–    authenticate calls via Open Settlement Protocol (OSP)
–    can request QoS via Common Open Policy Service (COPS)
•   Conference Bridge Marshal server
–    interface to/from third party conference servers
•   Feature server (FS)- to provide advanced telephony services
•   Redirect server (RS) - keep track of registered users and provide routing to/from them
•   Provisioning server (PS) - for configuration
•   Call Detail Record (CDR) server - stores start/end information about calls for billing and other purposes
 Scaling of a VOCAL
  system
 From table 3-1 of
  Practical VoIP: Using
  VOCAL
 • Note that unlike a PBX
  or Public Exchange, the
  capacity in calls per second
  (or BHCA) is independent
  of the call durations, since
  the call traffic is carried
  directly between the
  endpoints via RTP and does
  not use the VOCAL
  system!
 For comparison with a PBX
 • NEC's PBX: NEAX2400 IMX - Integrated Multimedia
    eXchange, model ICS IMGdxh uses a Pentium control
    processor; the claimed BHCA is 25,600.
   • Tekelec's softswitch "VXi™ Media Gateway
    Controller" claims a capacity which scales
    from 250,000 to over 1 million BHCA - a Class 5
    exchange.
   • Lucent's 5E-XC™ High Capacity Switch -
    supports 4 million BHCA, 250K trunks, and
    99.9999% availability [238]
   • Frank D. Ohrtman Jr. says that a Class 4 softswitch
    should handle 800,000 BHCA, support
    100,000 DS0s (i.e., 100K 64 kbps channels), with a
    reliability of 99.999%, and a MOS of 4.0
 Marshal server (MS)
 A SIP proxy server which:
 • authenticates users
 • generates call detail records (CDRs)
 • provides an entry point for SIP messages into the
    VOCAL system
   • thus the other elements of the VOCAL system don't
    need to authenticate each message
   • monitors heartbeats - can use these for load balancing
    across RSs
   • SIP transaction stateful, but not call (dialog) stateful
   Allows better scaling, since these servers can be
    replicated as needed, while allowing the redirect server to
    focus just on keeping registration information.
 Redirect Server (RS)
 • receives SIP REGISTER messages from User Agents
    (UAs)
   • keeps track of registered users and their locations (i.e.,
    registrations)
   • provides routing information for SIP INVITE messages
   • based on caller, callee, and registration information (for
    either or both parties)
   • based on where the INVITE message has already been
   • Supports redundancy
   • Utilizes multicast heartbeat
    – starts by listening for 2s for another RS
    – if one is found, it synchronizes with this RS and,
     following synchronization, acts as a redundant backup RS
 Feature Server (FS)
 • Implements Call Forward, Call Screening, Call
    Blocking
    • The “Core Features” are implemented “within the
     network”
    – for example, you can't implement features in a
     phone which is not there!
    – you can't give an end system the caller's ID, but
     guarantee that they don't display it, …
    • Execute arbitrary Call Processing Language (CPL)
     scripts written by users
    • CPL is parsed into eXtensible Markup Language
     (XML) document object model (DOM) trees; these are
     then turned into state machines (in C++) and executed.
 Residential Gateway (RG)
 A residential gateway (RG) provides “… Internet access
    throughout the home and remote management of common
    household appliances such as lights, security systems,
    utility meters, air conditioners, and entertainment
    systems.”
   Open Services Gateway Initiative (OSGi™) Alliance
    http://www.osgi.org/ is attempting to define a standard
    framework and API for network delivery of managed services to
    local networks and devices.
    An alternative to using a residential gateway to attach analog
     phones is a device such as the Cisco Analog Telephone
     Adaptor (ATA) 186 [239].
   In VOCAL: “SIP Residential Gateway is an IP Telephony gateway
    based on SIP
   which allows a SIP user agent to make/receive SIP call to/from
 SIP Express Router (SER)
 http://www.iptel.org/ser/
 An open-source
    implementation which can
    act as SIP registrar, proxy
    or redirect server. SER
    features:
   • an application-server
    interface,
   • presence support,
   • SMS gateway,
   • SIMPLE2Jabber
    gateway,
   • RADIUS/syslog
    accounting and
    authorization,
 SipFoundry
 http://www.sipfoundry.org/
 Formed on March 29, 2004, with the goal of improving and
    adopting open source projects related to SIP
   Pingtel Corp. contributed their sipX family of projects
    (distributed under the LGPL). This includes:
    sipXphone      SIP soft phone
    sipXproxy      pair of applications that together form
     a configurable SIP router
    sipXregistry   SIP Registry/Redirect server
    sipXpublisher  server to handle SIP SUBSCRIBE/NOTIFY,
     with a flexible plugin architecture for different event types
    sipXvxml       VXML scripting engine supporting
     creation of IVR and other VXML applications (including
 Other SIP Proxies
 • JAIN-SIP Proxy
 • http://snad.ncsl.nist.gov/proj/iptel/
 • JAIN-SIP proxy, JAIN-SIP IM client, SIP communicator, SIP
    trace viewer, JAIN-SIP gateway,
   JAIN-SIP 3PCC, …
   • SaRP SIP and RTP Proxy
   • http://sarp.sourceforge.net/
   • written in Perl
   • sipd SIP Proxy
   • http://www.sxdesign.com/index.php?page=developer&submnu=sipd
   • Siproxd SIP and RTP Proxy
   • http://sourceforge.net/projects/siproxd/
    • a proxy/masquerading daemon for the SIP protocol
   • partysip
 SIP Tools
 • Callflow
 • http://callflow.sourceforge.net/
 • Generates SIP call flow diagrams from an Ethereal
    capture file
   • SIPbomber
   • http://www.metalinkltd.com/downloads.php
   • a SIP proxy testing tool for server implementations (i.e.,
    proxies, user agent servers, redirect
   servers, and registrars)
   • Sipsak
   • http://sipsak.berlios.de/
    • sipsak is a command-line tool for developers and
     administrators of SIP applications
   • PROTOS Test-Suite
 SIP Clients
 • kphone
 • http://www.wirlab.net/kphone/
 • IPv4 and IPv6 UA for Linux, also supports Presence and
    Instant Messaging
   • UA for Linux - for KDE
   • Linphone
   • http://www.linphone.org/?lang=us&rubrique=1
   • UA for Linux - for GNOME
    • Xten's X-Lite - free demo version for Windows and Linux
    • http://www.xten.net/
    • They also have a "Business-class SIP Softphone" called
     X-PRO
    • Xten's ineen http://www.ineen.com/index.html
   • Instant Messenger & Buddy List
 Zultys Technologies SIP Soft Phone
 • http://www.lipz4.com/lipz4.htm
 • for Linux
 • Do Not Disturb
 • Open Standards Compliant
 • …
         Comparison of H.323 and SIP

 Evolution
      H.323 evolved from the Telecommunications Community
      SIP evolved from Internet Community
 Protocols
      The differences are in the signaling and control procedures
      Off-the-record: SIP is equivalent to H.225 and RAS of H.323
 Feature sets
      Functionality
      Quality of Service
      Manageability
      Scalability
      Flexibility
      Interoperability
      Ease of Implementation
 PART III
QoS IN IP NETWORKS
         Beyond the Best Effort service
 6.1 Multimedia Networking Applications
 6.2 Streaming stored audio and video (RTSP)
 6.3 Real-time, Interactive Multimedia: Internet Phone Case Study
 6.4 Protocols for Real-Time Interactive Applications (RTP, RTCP, SIP)
 6.5 Beyond Best Effort
 6.6 Scheduling and Policing Mechanisms
 6.7 Integrated Services
 6.8 RSVP
 6.9 Differentiated Services
Improving QOS in IP Networks
Thus far: “making the best of best effort”
Future: next generation Internet with QoS guarantees
    RSVP: signaling for resource reservations
    Differentiated Services: differential guarantees
    Integrated Services: firm guarantees
 simple model
  for sharing and
  congestion
  studies:
Principles for QOS Guarantees

 Example: a 1 Mbps IP phone and FTP share a 1.5 Mbps link.
    bursts of FTP can congest router, cause audio loss
    want to give priority to audio over FTP




    Principle 1
    packet marking needed for router to distinguish
    between different classes; and new router policy
    to treat packets accordingly
Principles for QOS Guarantees (more)
 what if applications misbehave (audio sends higher
   than declared rate)
       policing: force source adherence to bandwidth allocations
 marking and policing at network edge:
   similar to ATM UNI (User Network Interface)




Principle 2
provide protection (isolation) for one class from others
 Principles for QOS Guarantees (more)

 Allocating fixed (non-sharable) bandwidth to a flow:
  inefficient use of bandwidth if the flow doesn't use
  its allocation




       Principle 3
     While providing isolation, it is desirable to use
     resources as efficiently as possible
Principles for QOS Guarantees (more)

   Basic fact of life: can not support traffic demands
    beyond link capacity




     Principle 4
    Call Admission: flow declares its needs, network may
    block call (e.g., busy signal) if it cannot meet needs
Summary of QoS Principles




 Let's next look at mechanisms for achieving this …
Quality of
 Service
 Support
            QOS in IP Networks
 IETF groups are working on proposals to provide
  QOS control in IP networks, i.e., going beyond
  best effort to provide some assurance for QOS
 Work in Progress includes RSVP, Differentiated
  Services, and Integrated Services
 Simple model
  for sharing and
  congestion
  studies:
       Principles for QOS Guarantees

 Consider a phone application at 1Mbps and an FTP
  application sharing a 1.5 Mbps link.
      bursts of FTP can congest the router and cause audio packets
       to be dropped.
      want to give priority to audio over FTP
 PRINCIPLE 1: Marking of packets is needed for
  router to distinguish between different classes; and
  new router policy to treat packets accordingly
 Principles for QOS Guarantees (more)

 Applications misbehave (audio sends packets at a rate higher
  than 1Mbps assumed above);
 PRINCIPLE 2: provide protection (isolation) for one class
  from other classes
 Require Policing Mechanisms to ensure sources adhere to
  bandwidth requirements; Marking and Policing need to be
  done at the edges:
 Principles for QOS Guarantees (more)

 Alternative to Marking and Policing: allocate a set
  portion of bandwidth to each application flow; can
  lead to inefficient use of bandwidth if one of the
  flows does not use its allocation
 PRINCIPLE 3: While providing isolation, it is
  desirable to use resources as efficiently as
  possible
 Principles for QOS Guarantees (more)

 Cannot support traffic beyond link capacity
    Two phone calls, each requesting 1 Mbps

 PRINCIPLE 4: Need a Call Admission Process;
  application flow declares its needs, network may
  block call if it cannot satisfy the needs
Summary
  Scheduling And Policing Mechanisms

 Scheduling: choosing the next packet for transmission
    FIFO
    Priority Queue
    Round Robin
    Weighted Fair Queuing

 We had a lecture on that!
               Discussion of RED

 Advantages
 Early drop
    TCP congestion

 Fairness in drops
    Bursty versus non-bursty

 Disadvantages
    Many additional parameters
    Increasing the loss
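The core RED drop decision can be sketched in a few lines; this is a simplified illustration (the threshold values are arbitrary, and the count-based correction and "gentle" mode of full RED are omitted):

```python
import random

def red_drop(avg_queue, min_th, max_th, max_p):
    """Random Early Detection drop decision (illustrative sketch).

    avg_queue is the EWMA of the queue length: below min_th nothing
    is dropped, at or above max_th everything is dropped, and in
    between the drop probability grows linearly up to max_p.
    """
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p
```

Early, probabilistic drops signal TCP senders to slow down before the queue overflows, and randomness spreads the drops fairly across bursty and non-bursty flows.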
                Policing Mechanisms

 (Long term) Average Rate
    100 packets per sec or 6000 packets per min??
       • crucial aspect is the interval length
 Peak Rate:
    e.g., 6000 packets per minute average and 1500 packets
     per second peak
 (Max.) Burst Size:
    max. number of packets sent consecutively, i.e., over a
     short period of time
 Units of measurement
    Packets versus bits
                Policing Mechanisms
 The token bucket mechanism provides a means for
  limiting input to a specified Burst Size and Average
  Rate.
 Bucket can hold b tokens;
 tokens are generated at a rate of r token/sec
      unless bucket is full of tokens.
 Over an interval of length t, the number of
  packets that are admitted is less than or equal to
  (r t + b).
              Token bucket example
parameters: b = 5, r = 3

time   arrival   queue   bucket   sent
 1     p1 (5)    -        0       -
 2     p2 (2)    p1       3       -
 3     p3 (1)    p2       1       p1
 4     -         -        1       p2, p3
 5     -         -        4       -
 6     -         -        5       -

(bucket shows the tokens remaining after sending; packet sizes in parentheses)
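The example's bookkeeping can be reproduced with a short simulation; this is an illustrative sketch (slot-by-slot token accounting, FIFO service, sizes in tokens):

```python
def token_bucket(arrivals, b, r):
    """Simulate the token-bucket example above (a sketch): packets
    queue FIFO; every slot after the first adds r tokens; a packet of
    size s departs once s tokens are available; leftover tokens are
    capped at the bucket size b."""
    tokens, queue, log = 0, [], []
    for slot, pkts in enumerate(arrivals):
        if slot > 0:
            tokens += r
        queue.extend(pkts)
        sent = []
        while queue and queue[0][1] <= tokens:
            name, size = queue.pop(0)
            tokens -= size
            sent.append(name)
        tokens = min(tokens, b)   # bucket never holds more than b
        log.append((sent, tokens))
    return log

# the example above: p1..p3 of sizes 5, 2, 1 arriving in slots 1..3
arrivals = [[("p1", 5)], [("p2", 2)], [("p3", 1)], [], [], []]
log = token_bucket(arrivals, b=5, r=3)
```

`log` follows the bucket column: p1 leaves in slot 3 once enough tokens have accumulated, p2 and p3 leave together in slot 4, and the bucket then refills to its cap of 5.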
            Integrated Services

 An architecture for providing QOS guarantees in
  IP networks for individual application sessions
 relies on resource reservation, and routers need
  to maintain state info (Virtual Circuit??),
  maintaining records of allocated resources and
  responding
  to new Call
  setup
   requests
  on that
  basis
                  Call Admission

 Session must first declare its QOS requirement
    and characterize the traffic it will send through
    the network
   R-spec: defines the QOS being requested
   T-spec: defines the traffic characteristics
   A signaling protocol is needed to carry the R-spec
    and T-spec to the routers where reservation is
    required;
   RSVP is a leading candidate for such signaling
    protocol
        RSVP request (T-Spec)

 A token bucket specification
    bucket size, b
    token rate, r
    the packet is transmitted onward only if the number of
     tokens in the bucket is at least as large as the packet
 peak rate, p
    p>r

 maximum packet size, M
 minimum policed unit, m
   All packets less than m bytes are considered to be m bytes
   Reduces the overhead to process each packet
   Bound the bandwidth overhead of link-level headers
                 Call Admission

 Call Admission: routers will admit calls based on
  their R-spec and T-spec and on the current
  resources allocated at the routers to other calls.
       Integrated Services: Classes

 Guaranteed QOS: this class is provided with firm
  bounds on queuing delay at a router; envisioned for
  hard real-time applications that are highly
  sensitive to end-to-end delay expectation and
  variance
 Controlled Load: this class is provided a QOS
  closely approximating that provided by an unloaded
  router; envisioned for today's real-time IP network
  applications which perform well in an unloaded
  network
                            R-spec

 An indication of the QoS control service
  requested
      Controlled-load service and Guaranteed service
 For Controlled-load service
    Simply a Tspec

 For Guaranteed service
    A Rate (R) term, the bandwidth required
        • R  r, extra bandwidth will reduce queuing delays
      A Slack (S) term
        • The difference between the desired delay and the delay
          that would be achieved if rate R were used
        • With a zero slack term, each router along the path must
          reserve R bandwidth
        • A nonzero slack term offers the individual routers greater
          flexibility in making their local reservation
        • The slack is decreased by routers along the path.
QoS Routing: Multiple constraints

 A request specifies the desired QoS requirements
    e.g., BW, Delay, Jitter, packet loss, path reliability etc

 Two types of constraints:
    Additive: e.g., delay
    Maximum (or Minimum): e.g., Bandwidth

 Task
    Find a (min cost) path which satisfies the constraints
    if no feasible path found, reject the connection
[Figure: example network topology between nodes A and B]

Constraints: Delay (D) < 25, Available Bandwidth (BW) > 30
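One way to sketch the two constraint types: prune links that violate the min/max constraint (bandwidth), then run Dijkstra on the additive metric (delay) and reject the request if the total exceeds the bound. The topology below is hypothetical:

```python
import heapq

def qos_route(graph, src, dst, max_delay, min_bw):
    """Constrained-routing sketch. graph[u] = list of (v, delay, bw).
    Bandwidth (min/max constraint) prunes links; delay (additive
    constraint) is accumulated by Dijkstra and checked at the end."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, delay, bw in graph.get(u, []):
            if bw <= min_bw:              # bandwidth constraint: prune
                continue
            nd = d + delay                # delay adds up along the path
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist or dist[dst] >= max_delay:
        return None                       # no feasible path: reject
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[dst]

# hypothetical topology: A-D is cheap in delay but too thin in bandwidth
g = {"A": [("C", 10, 50), ("D", 5, 20)],
     "C": [("B", 10, 40)],
     "D": [("B", 5, 90)]}
route = qos_route(g, "A", "B", max_delay=25, min_bw=30)
```

With the constraints of the example (D < 25, BW > 30) the A-D-B path is pruned and A-C-B is chosen with total delay 20.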
          Differentiated Services

 Intended to address the following difficulties
  with Intserv and RSVP;
 Scalability: maintaining states by routers in high
  speed networks is difficult due to the very large
  number of flows
 Flexible Service Models: Intserv has only two
  classes; want to provide more qualitative service
  classes; want to provide 'relative' service
  distinction (Platinum, Gold, Silver, …)
 Simpler signaling: (than RSVP) many applications
  and users may only want to specify a more
  qualitative notion of service
            Differentiated Services

 Approach:
    Only simple functions in the core, and relatively complex
     functions at edge routers (or hosts)
    Do not define service classes, instead provides functional
     components with which service classes can be built
    Edge Functions at DiffServ (DS)

 At DS-capable host or first DS-capable router
 Classification: edge node marks packets according
  to classification rules to be specified (manually by
  admin, or by some TBD protocol)
 Traffic Conditioning: edge node may delay and
  then forward or may discard
                Core Functions

 Forwarding: according to “Per-Hop-Behavior” or
  PHB specified for the particular packet class; such
  PHB is strictly based on class marking (no other
  header fields can be used to influence PHB)

 BIG ADVANTAGE:
     No state info to be maintained by routers!
      Classification and Conditioning

 Packets are marked in the Type of Service (TOS)
  field in IPv4, and the Traffic Class field in IPv6
 6 bits are used for the Differentiated Services Code
  Point (DSCP) and determine the PHB that the packet
  will receive
 2 bits are currently unused
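On platforms that expose the `IP_TOS` socket option, an application can set its own DSCP by writing the upper six bits of the TOS byte; a sketch (46 is the standard Expedited Forwarding code point, but whether the marking survives depends on each DS domain's edge policy):

```python
import socket

# DSCP occupies the upper 6 bits of the (former) TOS byte, so the
# byte value on the wire is DSCP << 2.  46 is the standard Expedited
# Forwarding (EF) code point; 0 is best effort.
EF_DSCP = 46
tos = EF_DSCP << 2                     # 0xB8

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
# datagrams sent on s now carry DSCP 46; edge routers may re-mark it
print(hex(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
s.close()
```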
      Classification and Conditioning

 It may be desirable to limit the traffic injection rate
  of some class; the user declares a traffic profile (e.g.,
  rate and burst size); traffic is metered, and
  shaped if non-conforming
                  Forwarding (PHB)

 A PHB results in a different observable (measurable)
  forwarding performance behavior
 PHB does not specify what mechanisms to use to
  ensure required PHB performance behavior
 Examples:
      Class A gets x% of outgoing link bandwidth over time
       intervals of a specified length
      Class A packets leave first before packets from class B
                  Forwarding (PHB)

 PHBs under consideration:
    Expedited Forwarding: departure rate of packets from a
     class equals or exceeds a specified rate (logical link with
     a minimum guaranteed rate)
    Assured Forwarding: 4 classes, each guaranteed a
     minimum amount of bandwidth and buffering; each with
     three drop preference partitions
     Differentiated Services Issues

 AF and EF are not even on a standards track yet…
  research ongoing
 “Virtual Leased lines” and “Olympic” services are
  being discussed
 Impact of crossing multiple ASs and routers that
  are not DS-capable
                 DiffServ Routers

DiffServ edge router: Classifier → Marker → Meter → Policer

DiffServ core router: extract the DSCP → select the PHB
(according to local conditions) → apply the packet treatment
           IntServ vs. DiffServ

[Figure: an IntServ network (per-flow “call blocking”
approach) interconnected with a DiffServ network
(aggregate “prioritization” approach)]
Comparison of Intserv & Diffserv
         Architectures
 Diffserv
Theoretical
  Model
           Basic Theoretical Model
 Single FIFO queue.
 Bounded capacity: holds up to B packets
    All packets have same size

 Packet Arrival: arbitrary
 Packet Send: 1 packet/time unit
 Actions:
    Non-Preemptive model: accept or reject
    Preemptive model: also preempt



                              FIFO
                     Packet Values

 Goal:
    Each packet has an intrinsic value
    maximize the total value of the packets sent!
 Cheap and expensive packets (two values):
    low value of 1 and high value of α
 Continuous packet values
    any value in [1, α]
           Competitive Analysis

 Analysis for online algorithms

                  packets → algorithm → decisions

 For a given sequence S: VA(S) / VOPT(S)
 Competitive Ratio: minS { VA(S) / VOPT(S) }
 Worst-case guarantee
           Non-Preemptive Policies


 Fixed Partition(x)
    At most xB low value and (1-x)B high value.

 Flexible Partition (x)
    At most xB low value and any high value.

 Round Robin(x):
    Like fixed partition.
    send x low and (1-x) high [fractional!]
    Simulate it using FIFO queue.
           Implementing Round Robin

 Implementation:
    Maintain two variables:
        • high
        • low
      If a low-value packet arrives, test low + 1 ≤ xB
        • IF YES ACCEPT
        • IF NO REJECT
      High packets the same
      Sending:
        • low = low –x
        • high = high – (1-x)
 Main observation:
   once a packet is accepted it will be sent eventually.
   Sending order not important!
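The accept logic above can be sketched as a small class; the counters and fractional drain amounts follow the slide, while the ≤ boundary and the value encoding (1 = low, anything else = high) are assumptions of this sketch.

```python
class RoundRobin:
    """RR(x) admission sketch for a FIFO buffer of B same-size packets.

    Counters low/high track how much of the buffer each class holds;
    a packet is admitted only while its class stays inside its share
    (xB for low, (1-x)B for high), and each send drains x from the
    low counter and 1-x from the high counter, so counters may be
    fractional.
    """

    def __init__(self, B, x):
        self.B, self.x = B, x
        self.low = self.high = 0.0

    def arrive(self, value):
        if value == 1:                                   # low-value class
            if self.low + 1 <= self.x * self.B:
                self.low += 1
                return True
        elif self.high + 1 <= (1 - self.x) * self.B:     # high-value class
            self.high += 1
            return True
        return False                                     # reject

    def send(self):
        # one packet leaves per time unit; x of it "belongs" to the
        # low class and 1-x to the high class
        self.low = max(0.0, self.low - self.x)
        self.high = max(0.0, self.high - (1 - self.x))
```

Once a packet is accepted it will eventually be sent, so the sending order does not affect the total value delivered.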
             Analysis of Round Robin

 Consider the case that all packet values are 1.
 Claim:
    For any input sequence,
    the number of packets a buffer of size B/2 accepts
    is at least half of that of a buffer of size B

 Let x= ½
 Consider Low and High packets separately
 RR(½) :
    Accepts at least half High and half Low
    Benefit at least half
              Preemptive Policies


 Greedy:
    Always accept if the buffer is not full
    Preempt a low value packet to accept a high one
    Competitive ratio ½
 β-Preemptive:
    Drop from the head low-value packets with total value α/β
    An active queue management (AQM) scheme
       Preemptive Model: √α-Preemptive
   We consider the √α-Preemptive Policy
   There are two packet values: 1 and α
   For α = 9, each high value packet preempts 3 low
    value packets (pro-active preemptions)

[Figure: a high-value arrival; low-value packets preempted]
              √α-Preemptive: Theorem

 Claim 1: VA(Slow) ≥ VOPT(Slow) − (1/√α) VOPT(Shigh)

 Claim 2: VA(Shigh) ≥ VOPT(Shigh) − (1/√α) VOPT(Shigh)

 Theorem:
            VA(S) ≥ VOPT(S) − (2/√α) VOPT(S)
                 Optimal Offline

 Process the packets in decreasing order of value.
 Accept a packet if possible,
    otherwise reject it.



 Two values:
    Maximizes   the number of high value packets
      • Given a buffer of size B
    Maximizes   the total number of packets
      • Using the remaining buffer space.
             Proof Outline: Claim 2

We partition the schedule into intervals:
• An interval ends when the buffer is empty.

• Overloaded intervals: some high value packet is lost
and only high value packets are scheduled.

• Underloaded intervals: no high value packet is lost
                    Proof (Claim 1):

 We show: VA(Slow) ≥ VOPT(Slow) − (1/√α) VA(Shigh)
 Low packets are lost to overflow or to preemption
 Low packet lost in overflow:
       OPT also lost a packet.
 Low packet preempted by a high packet:
    the value of the high packet is α
    total value preempted ≤ √α
    i.e., the preempted value is (1/√α) V(high)
 Recall VA(Shigh) ≤ VOPT(Shigh)
         Proof Outline (Claim2):

We divide the HIGH packet loss into two subsets:

• The packets lost by OPT (easy case)

• The packets scheduled by OPT
            Proof Outline (Claim 2):
Observation 1:

When some high value packet is lost, the buffer is
full of high value packets

[Figure: a buffer of size B full of high-value packets]
         Proof Outline (Claim 2):

Observation 2:

If there are at least B/√α high value packets in the
buffer, then the next packet to be scheduled is a high
value packet.
         Proof Outline (Claim 2):

• Observation 1 ⇒ the length of an overloaded
interval is at least B

• Observation 2 ⇒ an optimal offline policy could
have scheduled at most B/√α additional high value
packets

• The ratio between the additional loss and the benefit
of the overloaded interval is bounded by 1/√α

• VA(Shigh) ≥ VOPT(Shigh) − (1/√α) VOPT(Shigh)
        Lower bound (Non-Preemptive)

 Scenario:
    B low value packets
    [maybe] B high value packets

 Online: accepts xB low value packets
    Case I: only low values arrive
         • online: xB, offline: B
    Case II: both low and high values arrive
         • online: xB + (1 − x)αB, offline: αB
 Competitive ratio ≤ α/(2α − 1)
 For large values of α we have α/(2α − 1) → ½
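The case analysis can be checked numerically; this sketch assumes high-value packets are worth α and that the online policy fills an x fraction of the buffer with low-value packets:

```python
def best_accept_fraction(alpha):
    """Adversary argument above: the online policy accepts xB of the
    B low-value packets. Case I (no more arrivals) gives ratio x;
    case II (B high-value packets follow) gives
    (x + (1 - x) * alpha) / alpha. The best x equalizes the two,
    x = alpha / (2*alpha - 1), which tends to 1/2 for large alpha."""
    x = alpha / (2 * alpha - 1)
    case1 = x
    case2 = (x + (1 - x) * alpha) / alpha
    return x, case1, case2
```

For α = 9 the best fraction is 9/17 ≈ 0.53, and both adversary cases yield the same ratio, confirming that no non-preemptive online policy can do better here.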
       Lower bound: Preemptive model

 Scenario:
    B low value packets
    for z′B time units:
         • one high value packet arrives each time unit
    [Maybe] B high value packets
 Let zB be the time the online sends the last low packet
    (1) No more packets arrive
    (2) B high value packets arrive
 Online benefit:  (1) zB + zαB   (2) zB + αB
 Offline benefit: (1) B + zαB    (2) zαB + αB
 Solving for the best z gives a lower bound (about 0.8)
           Fixed vs. Flexible Partition


Arrival event        time   Flexible accepts   Fixed accepts

B high, B/2 low      1      B high             B/2 high, B/2 low
B/2 low, B/2 high    B/2    B/2 low            B/2 high
B/2 low              B      -                  B/2 low
                       Summary of Results: Non-preemptive

      Two values

      Policies                    α=∞           α=2           α=1
      Always accept               0             ½             1
      Round Robin                 ½             ½             ½
      Fixed Partition             [1/3, 0.41]   [0.41, 1/2]   ½
      Flexible Partition          0.41          [0.41, 0.56]  1
      Dynamic Flexible Partition  ½             [0.53, 0.56]  1
      Impossibility               ½             2/3           1
      Optimal online              ½             2/3           1
                       Multiple Values

      Policies        Competitive ratio
      cont RR         1/(2 + ln α)
      Impossibility   1/(1 + ln α)
Summary of Results: Preemptive

              Multiple Values
    Policies          Competitive ratio
    Greedy            ½
    Better Than G     1/(1.98..)
    Impossibility     0.8

                 2 Values
    Policies          Competitive ratio
    √α-Preemptive     1 − 2/√α
    Impossibility     1 − 1/(2√α)
 Scheduling and policing mechanisms

 6.1 Multimedia Networking Applications
 6.2 Streaming stored audio and video (RTSP)
 6.3 Real-time, Interactive Multimedia: Internet Phone Case Study
 6.4 Protocols for Real-Time Interactive Applications (RTP, RTCP, SIP)
 6.5 Beyond Best Effort
 6.6 Scheduling and Policing Mechanisms
 6.7 Integrated Services
 6.8 RSVP
 6.9 Differentiated Services
Scheduling And Policing Mechanisms

 scheduling: choose next packet to send on link
 FIFO (first in first out) scheduling: send in order of
  arrival to queue
      real-world example?
      discard policy: if packet arrives to full queue: who to discard?
        • Tail drop: drop arriving packet
        • priority: drop/remove on priority basis
        • random: drop/remove randomly
Scheduling Policies: more
Priority scheduling: transmit highest priority queued
  packet
 multiple classes, with different priorities
      class may depend on marking or other header info, e.g. IP
       source/dest, port numbers, etc..
      Real world example?
       Scheduling Policies: still more
round robin scheduling:
 multiple classes
 cyclically scan class queues, serving one from each
  class (if available)
 real world example?
Scheduling Policies: still more
Weighted Fair Queuing:
 generalized Round Robin
 each class gets weighted amount of service in each
  cycle
 real-world example?
Policing Mechanisms

Goal: limit traffic to not exceed declared parameters
Three commonly-used criteria:
   (Long term) Average Rate: how many pkts can be sent
    per unit time (in the long run)
       crucial question: what is the interval length: 100 packets per
        sec or 6000 packets per min have same average!
   Peak Rate: e.g., 6000 pkts per min. (ppm) avg.; 1500
    ppm peak rate
   (Max.) Burst Size: max. number of pkts sent
    consecutively (with no intervening idle)
Policing Mechanisms
Token Bucket: limit input to specified Burst Size and Average Rate.




 bucket can hold b tokens
   tokens generated at rate r token/sec unless bucket full
   over interval of length t: number of packets admitted less than or
    equal to (r t + b).
Policing Mechanisms (more)

 token bucket, WFQ combine to provide guaranteed
  upper bound on delay, i.e., QoS guarantee!

arriving traffic → token bucket (rate r, size b) → WFQ
(guaranteed per-flow rate R)

maximum delay Dmax = b/R
    Scheduling:
Buffer Management
The setting
         Buffer Scheduling
 Who to send next?
 What happens when buffer is full?
 Who to discard?
         Requirements of scheduling

 An ideal scheduling discipline
    is easy to implement
    is fair and protective
    provides performance bounds

   Each scheduling discipline makes a different
    trade-off among these requirements
                Ease of implementation


 Scheduling discipline has to make a decision once
  every few microseconds!
 Should be implementable in a few instructions or
  hardware
      for hardware: critical constraint is VLSI space
      Complexity of enqueue + dequeue processes
 Work per packet should scale less than linearly with
  number of active connections
                    Fairness


 Intuitively
    each  connection should get no more than its
     demand
    the excess, if any, is equally shared
 But it also provides protection
    traffic hogs cannot overrun others
    automatically isolates heavy users
               Max-min Fairness:
                Single Buffer

   Allocate bandwidth equally among all users
    If anyone doesn't need its share, redistribute
   maximize the minimum bandwidth provided to any flow not
    receiving its request
   Ex: Compute the max-min fair allocation for a set of four
    sources with demands 2, 2.6, 4, 5 when the resource has a
    capacity of 10.
     • s1= 2;
     • s2= 2.6;
     • s3 = s4= 2.7
   More complicated in a network.
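The progressive-filling computation in the example can be sketched as:

```python
def max_min_fair(demands, capacity):
    """Max-min fairness by progressive filling: repeatedly split the
    remaining capacity equally among unsatisfied flows; any flow whose
    demand is below its share gets exactly its demand, and the
    leftover is redistributed among the rest."""
    alloc = [0.0] * len(demands)
    active = list(range(len(demands)))
    remaining = capacity
    while active:
        share = remaining / len(active)
        satisfied = [i for i in active if demands[i] <= share]
        if not satisfied:
            for i in active:          # everyone left gets an equal share
                alloc[i] = share
            break
        for i in satisfied:           # small demands are met in full
            alloc[i] = demands[i]
            remaining -= demands[i]
        active = [i for i in active if i not in satisfied]
    return alloc

print(max_min_fair([2, 2.6, 4, 5], 10))   # ≈ [2, 2.6, 2.7, 2.7]
```

This reproduces the example's answer: s1 = 2, s2 = 2.6, s3 = s4 = 2.7.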
             FCFS / FIFO Queuing

 Simplest Algorithm, widely used.
 Scheduling is done using first-in first-out
  (FIFO) discipline
 All flows are fed into the same queue
              FIFO Queuing (cont’d)

 First-In First-Out (FIFO) queuing
    First Arrival, First Transmission
    Completely dependent on arrival time
    No notion of priority or allocated buffers
    No space in queue, packet discarded

      Flows can interfere with each other; No isolation;
       malicious monopolization;
      Various hacks for priority, random drops,...
                       Priority Queuing
 A priority index is assigned to each packet upon arrival
 Packets transmitted in ascending order of priority index.
       Priority 0 through n-1
       Priority 0 is always serviced first
 Priority i is serviced only if    0 through i-1 are empty
 Highest priority has the
       lowest delay,
       highest throughput,
       lowest loss
 Lower priority classes may be starved by higher priority
 Preemptive and non-preemptive versions.
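The non-preemptive version of this discipline can be sketched with a heap (an illustrative sketch; the class name is mine, and ties within a class are broken FIFO):

```python
import heapq

class PriorityScheduler:
    """Non-preemptive priority queuing sketch: priority 0 is served
    first; priority i is served only when classes 0..i-1 are empty."""
    def __init__(self):
        self._heap = []
        self._seq = 0            # FIFO tie-break within a priority class

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        # Pops the lowest (priority, arrival) pair, i.e. the packet at
        # the head of the highest-priority non-empty class.
        return heapq.heappop(self._heap)[2] if self._heap else None
```

Note that nothing here bounds how long a low-priority packet can wait, which is exactly the starvation problem mentioned above.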
                     Priority Queuing

(Figure: high-priority and low-priority packets enter separate queues
feeding one transmission link; low-priority packets are transmitted only
when the high-priority queue is empty; packets are discarded when their
queue is full.)
          Round Robin: Architecture
 Round Robin: scan class queues, serving one from
  each class that has a non-empty queue

(Figure: flows 1, 2 and 3 each have their own queue; a round-robin
scheduler feeds the transmission link.)

    Hardware requirement:
    jump to the next non-empty queue
               Round Robin (cont'd)
 Characteristics:
    Classify incoming traffic into flows (source-destination
     pairs)
    Round-robin among flows

 Problems:
    Ignores packet length (GPS, Fair queuing)
    Inflexible allocation of weights (WRR,WFQ)

 Benefits:
    protection against heavy users (why?)
              Weighted Round-Robin
 Weighted round-robin
      Different weight wi (per flow)
       Flow j can send wj packets in a period.
       Period of length Σj wj
 Disadvantage
      Variable packet size.
      Fair only over time scales longer than a period time.
        • If a connection has a small weight, or the number of connections is
          large, this may lead to long periods of unfairness.
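One common way to realize a WRR period is to interleave the scans so a flow's slots are spread across the period rather than sent back to back (an illustrative sketch, assuming fixed-size packets; the function name is mine):

```python
def wrr_period(weights):
    """One weighted round-robin period: flow j may send w_j packets.
    Each scan lets every flow with remaining credit send one packet,
    until all credits for the period are used up."""
    credit = list(weights)
    order = []
    while any(credit):
        for j, c in enumerate(credit):
            if c > 0:
                order.append(j)      # flow j sends one packet
                credit[j] -= 1
    return order
```

For example, `wrr_period([1, 2, 3])` serves the flows in the order 0, 1, 2, 1, 2, 2; the period length is Σ wj = 6.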
                      DRR algorithm
 Each connection has a deficit counter (to store credits), with
  initial value zero.
 Choose a quantum of bits to serve from each connection in order.
 For each HOL (Head of Line) packet,
       if its size is <= (quantum + credit), send it and save the excess,
       otherwise save the entire quantum as credit.
       If a queue has no packet to send, reset its counter (to remain fair).
 Easier implementation than other fair policies
       e.g., WFQ
              Deficit Round-Robin
 DRR can handle variable packet sizes.

Example (quantum = 1000 bytes; head of queue at the right):
   Flow A: packet of 1500 bytes
   Flow B: packets of 500 and 300 bytes (300 at the head)
   Flow C: packet of 1200 bytes

1st round: A's count: 1000; B's count: 200 (served twice); C's count: 1000
2nd round: A's count: 500 (served); B's count: 0; C's count: 800 (served)
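These rules can be written as a short runnable sketch (illustrative code, names mine; the queue contents follow the example above):

```python
from collections import deque

def drr(flows, quantum):
    """Deficit round robin: each pass adds `quantum` to a backlogged
    flow's deficit counter; head-of-line packets are sent while the
    counter covers them; an idle flow's counter is reset to zero.
    Yields (flow, packet_size) in service order."""
    queues = [deque(f) for f in flows]
    deficit = [0] * len(queues)
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0          # idle queue loses its credit
                continue
            deficit[i] += quantum
            while q and q[0] <= deficit[i]:
                deficit[i] -= q[0]
                yield (i, q.popleft())
```

With quantum 1000 and queues A=[1500], B=[300, 500], C=[1200], the service order is B:300, B:500, A:1500, C:1200, matching the counter values in the example.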
                DRR: performance

 Handles variable length packets
 Backlogged sources share bandwidth equally
 Preferably, packet size < Quantum
 Simple to implement
    Similar to round robin
Generalized Processor Sharing
        Generalized Processor Sharing (GPS)

 The methodology:
    Assume we can send infinitesimal packets
        • single bit
      Perform round robin.
        • At the bit level
 Idealized policy to split bandwidth
 GPS is not implementable
 Used mainly to evaluate and compare real approaches.
 Has weights that give relative frequencies.
            GPS: Example 1

Packets of size 10, 20 and 30 arrive at time 0.
Under bit-level round robin they complete at times 30, 50 and 60
respectively.
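The completion times can be checked with a small fluid simulation (illustrative code, equal weights, all arrivals at t=0; the function name is mine):

```python
def gps_finish_times(sizes, rate=1.0):
    """Bit-level (fluid) GPS with equal weights: the n backlogged
    flows each receive rate/n; repeatedly advance to the next
    completion event."""
    remaining = dict(enumerate(sizes))
    finish, t = {}, 0.0
    while remaining:
        n = len(remaining)
        i = min(remaining, key=remaining.get)   # next flow to empty
        dt = remaining[i] * n / rate            # time until it finishes
        t += dt
        served = dt * rate / n                  # amount served per flow
        for j in list(remaining):
            remaining[j] -= served
            if remaining[j] <= 1e-12:
                finish[j] = t
                del remaining[j]
    return finish
```

`gps_finish_times([10, 20, 30])` gives completions at t = 30, 50 and 60, as in the figure.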
         GPS: Example 2

Packets: time 0, size 15; time 5, size 20; time 15, size 10.
Completion times: 30, 40 and 45 (the figure marks events at
times 5, 15, 30, 40 and 45).
            GPS: Example 3

Packets: time 0, size 15; time 5, size 20; time 15, size 10;
         time 18, size 15.
(The figure marks events at times 5, 15, 30, 45 and 60.)
            GPS : Adding weights

 Flow j has weight wj
 The output rate of flow j, Rj(t), obeys:

    Rj(t) / Ri(t) = wj / wi   for any two backlogged flows i and j

 For the un-weighted case (wj = 1), all backlogged flows receive
  equal rates.
            Fairness using GPS
 Non-backlogged connections, receive what they ask
  for.
 Backlogged connections share the remaining
  bandwidth in proportion to the assigned weights.
 Every backlogged connection i receives a service
  rate of:

    Ri(t) = wi / ( Σj∈Active(t) wj ) × C

  where Active(t) is the set of backlogged flows at time t
  and C is the link capacity.
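The backlogged-flow rate formula translates directly into code (an illustrative helper, names mine):

```python
def gps_rates(weights, backlogged, capacity):
    """Instantaneous GPS service rates: each backlogged flow i gets
    R_i = w_i / (sum of weights of backlogged flows) * C."""
    total = sum(weights[i] for i in backlogged)
    return {i: weights[i] / total * capacity for i in backlogged}
```

For example, with weights [1, 3] and both flows backlogged on a unit-capacity link, flow 0 gets rate 1/4 and flow 1 gets 3/4; when only flow 1 is backlogged it gets the whole link.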
            GPS: Measuring unfairness

   No packet discipline can be as fair as GPS
      while a packet is being served, we are unfair to others
   Degree of unfairness can be bounded
   Define: workA (i,a,b) = # bits transmitted for flow i in time [a,b] by
    policy A.
   Absolute fairness bound for policy S
      Max (workGPS(i,a,b) - workS(i, a,b))
   Relative fairness bound for policy S
      Max (workS(i,a,b) - workS(j,a,b))
     assuming both i and j are backlogged in [a,b]
          GPS: Measuring unfairness

 Assume fixed packet size and round robin
 Relative bound: 1
 Absolute bound: < 1
 Challenge: handle variable size packets.
Weighted Fair Queueing
                 GPS to WFQ

 We can’t implement GPS
 So, let's see how to emulate it
 We want to be as fair as possible
 But also have an efficient implementation
               GPS vs WFQ (equal length)

Two packets of size 1 arrive at t=0, one in queue 1 and one in queue 2.

GPS: both packets are served at rate 1/2; both complete service at t=2.

Packet-by-packet system (WFQ): queue 1 is served first at rate 1 while
the queue-2 packet waits; then queue 2 is served at rate 1.
Completions at t=1 and t=2.
           GPS vs WFQ (different length)

Queue 1 holds a packet of size 2 and queue 2 a packet of size 1,
both arriving at t=0.

GPS: both packets are served at rate 1/2; the queue-2 packet completes
at t=2, after which the queue-1 packet is served at rate 1 and
completes at t=3.

Packet-by-packet system: one packet is transmitted at a time at rate 1;
in the figure the queue-1 packet is sent first while the queue-2 packet
waits, then queue 2 is served at rate 1.
                            GPS vs WFQ

Weights: queue 1 = 1, queue 2 = 3; one packet of size 1 arrives in each
queue at t=0.

GPS: the packet from queue 1 is served at rate 1/4 and the packet from
queue 2 at rate 3/4; once queue 2's packet completes, queue 1's packet
is served at rate 1.

WFQ: queue 2 is served first at rate 1 while queue 1's packet waits;
then queue 1 is served at rate 1.
                  Completion times

 Emulating a policy:
    Assign each packet p a value time(p).
    Send packets in order of time(p).

 FIFO:
    Arrival of a packet p from flow j:
     last = last + size(p);
     time(p)=last;
    perfect emulation...
              Round Robin Emulation

 Round Robin (equal size packets)
    Arrival of packet p from flow j:
    last(j) = last(j)+ 1;
    time(p)=last(j);

 An idle queue is not handled properly!
     Fix: on sending packet q: round = time(q)
     on arrival: last(j) = max{round, last(j)} + 1;
                 time(p) = last(j);

 What kind of low-level scheduling?
             Round Robin Emulation

 Round Robin (equal size packets)
    Sending packet q:
    round = time(q); flow_num = flow(q);
    Arrival:
    last(j) = max{round,last(j)}
    IF (j < flow_num) & (last(j)=round)
      THEN last(j)=last(j)+1
    time(p)=last(j);

 What kind of low-level scheduling?
               GPS emulation (WFQ)

 Arrival of p from flow j:
     last(j) = max{last(j), round} + size(p);
     using weights:
     last(j) = max{last(j), round} + size(p)/wj;
 How should we compute the round?
    We'd like to simulate GPS:
    round(t+x) = round(t) + x/B(t)
    B(t) = number of active flows

 A flow j is active while round(t) < last(j)
         WFQ: Example (equal size)
Time 0: packets arrive for flows 1 & 2.
  last(1) = 1; last(2) = 1; Active = 2
  round(0) = 0; send 1
Time 1: a packet arrives for flow 3.
  round(1) = 1/2; Active = 3
  last(3) = 3/2; send 2
Time 2: a packet arrives for flow 4.
  round(2) = 5/6; Active = 4
  last(4) = 11/6; send 3
Time 2+2/3: round = 1; Active = 2
Time 3:     round = 7/6; send 4
Time 3+2/3: round = 3/2; Active = 1
Time 4:     round = 11/6; Active = 0
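The round computation can be reproduced with a sketch that advances virtual time piecewise between arrivals (illustrative code, equal weights wj = 1, unit link rate; serving strictly in finish-tag order at the end is a simplification of true packet-by-packet WFQ):

```python
def wfq_tags(arrivals):
    """Compute WFQ finish tags.  arrivals is a time-ordered list of
    (time, flow, size).  The virtual time 'round' grows at rate
    1/B(t), where B(t) counts flows with last(j) > round; a flow
    deactivates when the round reaches its tag."""
    last, order = {}, []
    round_, now = 0.0, 0.0
    for t, flow, size in arrivals:
        while now < t:                       # advance round up to time t
            active = [f for f, tag in last.items() if tag > round_]
            if not active:
                now = t
                break
            b = len(active)
            to_next_tag = (min(last[f] for f in active) - round_) * b
            dt = min(to_next_tag, t - now)   # stop at the next deactivation
            round_ += dt / b
            now += dt
        tag = max(last.get(flow, 0.0), round_) + size
        last[flow] = tag
        order.append((flow, tag))
    order.sort(key=lambda p: p[1])           # serve in finish-tag order
    return order
```

For the example's arrivals (flows 1 and 2 at time 0, flow 3 at time 1, flow 4 at time 2, all of size 1) the tags come out 1, 1, 3/2 and 11/6, giving the service order 1, 2, 3, 4.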
Worst-Case Fair Weighted Fair Queuing (WF2Q)
  Worst Case Fair Weighted Fair Queuing (WF2Q)


 WF2Q fixes an unfairness problem in WFQ.
    WFQ: among the packets waiting in the system, pick the one that
     will finish service first under GPS
    WF2Q: among the packets waiting in the system that have already
     started service under GPS, select the one that will finish
     service first under GPS
 WF2Q provides service closer to GPS
   difference in packet service time bounded by max. packet
    size.
Multiple Buffers
          Buffers

(Figure: buffers can be placed around the switching fabric.)

                Buffer locations
 Input ports
 Output ports
 Inside the fabric
 Shared memory
 Combination of all
          Input Queuing

(Figure: input-buffered switch - a FIFO queue at each input port feeds
the switching fabric, which connects to the outputs.)
            Input Buffer : properties

•   Input speed of queue – no more than input line
•   Need arbiter (running N times faster than input)
•   FIFO queue
•   Head Of Line (HOL) blocking .
•   Utilization:
    • with random (uniform) destinations, throughput saturates at
      2 − √2 ≈ 59%
    • due to HOL blocking
Head of Line Blocking
      Overcoming HOL blocking:
            look-ahead

 The fabric looks ahead into the input buffer for
  packets that may be transferred if they were not
  blocked by the head of line.
 Improvement depends on the depth of the look
  ahead.
 This corresponds to virtual output queues where
  each input port has buffer for each output port.
Input Queuing: Virtual Output Queues
(Figure: each input port maintains a separate queue per output port.)
         Overcoming HOL blocking:
             output expansion

    Each output port is expanded to L output ports
    The fabric can transfer up to L packets to the
     same output at a time, instead of just one.




Karol and Morgan,
IEEE transaction on communication, 1987: 1347-1356
Input Queuing with Output Expansion

(Figure: input queues feed a fabric that can deliver up to L packets
to each output port.)
      Output Queuing: The “ideal”

(Figure: arriving packets cross the fabric immediately and queue at
their destination output ports.)
         Output Buffer : properties


 No HOL problem
 Output queue needs to run faster than the input lines
 Must provide for up to N packets arriving to the same queue at once
 Solution: limit the number of input lines that can be destined to
  the same output.
               Shared Memory

(Figure: packets enter through the fabric into a common memory and
leave through the fabric.)

A common pool of buffers divided into
linked lists indexed by output port number.
         Shared Memory: properties



• Packets stored in memory as they arrive
• Resource sharing
• Easy to implement priorities
• Memory must be accessed at a speed equal to the sum of the
  input (or output) line speeds
• How should the space be divided among the sessions?
                 Integrated Services

 6.1 Multimedia Networking Applications
 6.2 Streaming stored audio and video (RTSP)
 6.3 Real-time, Interactive Multimedia: Internet Phone Case Study
 6.4 Protocols for Real-Time Interactive Applications (RTP, RTCP, SIP)
 6.5 Beyond Best Effort
 6.6 Scheduling and Policing Mechanisms
 6.7 Integrated Services
 6.8 RSVP
 6.9 Differentiated Services
IETF Integrated Services

 architecture for providing QOS guarantees in IP
  networks for individual application sessions
 resource reservation: routers maintain state info
  (a la VC) of allocated resources and QoS requirements
 admit/deny new call setup requests:

   Question: can a newly arriving flow be admitted
   with performance guarantees without violating the
   QoS guarantees made to already-admitted flows?
Intserv: QoS guarantee scenario
 Resource reservation:
    call setup, signaling (RSVP)
    traffic, QoS declaration
    per-element admission control

(Figure: the setup request/reply travels hop by hop; each element
performs admission control and QoS-sensitive scheduling, e.g., WFQ.)
                 Call Admission

Arriving session must :
 declare its QoS requirement
     R-spec: defines the QoS being requested
 characterize traffic it will send into network
    T-spec: defines traffic characteristics
 signaling protocol: needed to carry R-spec and T-
  spec to routers (where reservation is required)
    RSVP
          Intserv QoS: Service models [RFC 2211, RFC 2212]

Guaranteed service:
 worst-case traffic arrival: leaky-bucket-policed source
 simple (mathematically provable) bound on delay [Parekh 1992,
  Cruz 1988]

Controlled-load service:
 "a quality of service closely approximating the QoS that same flow
  would receive from an unloaded network element."

(Figure: arriving traffic is policed by a token bucket with rate r and
bucket size b, then served by WFQ at per-flow rate R; the worst-case
delay is Dmax = b/R.)
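The delay bound from the figure is plain arithmetic (illustrative helper, name mine):

```python
def wfq_delay_bound(bucket_bits, reserved_rate_bps):
    """Single-node worst-case queueing delay for a source policed by a
    token bucket of depth b bits and served by WFQ at reserved rate
    R bits/s: D_max = b / R [Parekh 1992]."""
    return bucket_bits / reserved_rate_bps
```

For example, b = 10 kbit and R = 1 Mbit/s bound the queueing delay at 10 ms.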
                           RSVP
 6.1 Multimedia Networking Applications
 6.2 Streaming stored audio and video (RTSP)
 6.3 Real-time, Interactive Multimedia: Internet Phone Case Study
 6.4 Protocols for Real-Time Interactive Applications (RTP, RTCP, SIP)
 6.5 Beyond Best Effort
 6.6 Scheduling and Policing Mechanisms
 6.7 Integrated Services
 6.8 RSVP
 6.9 Differentiated Services
         RSVP
            Signaling in the Internet

   connectionless (stateless) forwarding by IP routers
   + best-effort service
   = no network signaling protocols in the initial IP design

 New requirement: reserve resources along end-to-end
  path (end system, routers) for QoS for multimedia
  applications
 RSVP: Resource Reservation Protocol [RFC 2205]
      “ … allow users to communicate requirements to network in
       robust and efficient way.” i.e., signaling !
 earlier Internet Signaling protocol: ST-II [RFC 1819]
               RSVP Design Goals

1.   accommodate heterogeneous receivers (different
     bandwidth along paths)
2.   accommodate different applications with different
     resource requirements
3.   make multicast a first class service, with adaptation
     to multicast group membership
4.   leverage existing multicast/unicast routing, with
     adaptation to changes in underlying unicast,
     multicast routes
5.   control protocol overhead that grows (at worst) linearly
     with the number of receivers
6.   modular design for heterogeneous underlying
     technologies
                 RSVP: does not…

 specify how resources are to be reserved
      rather: a mechanism for communicating needs
 determine routes packets will take
       that's the job of routing protocols
      signaling decoupled from routing
 interact with forwarding of packets
      separation of control (signaling) and data
       (forwarding) planes
RSVP: overview of operation
 senders, receiver join a multicast group
    done outside of RSVP
    senders need not join group

 sender-to-network signaling
    path message: make sender presence known to routers
     path teardown: delete sender's path state from routers

 receiver-to-network signaling
    reservation message: reserve resources from sender(s) to
     receiver
    reservation teardown: remove receiver reservations

 network-to-end-system signaling
    path error
    reservation error
Path msgs: RSVP sender-to-network signaling

  path message contents:
     address:  unicast destination, or multicast group
     flowspec: bandwidth requirements spec.
      filter flag: if set, record identities of upstream
       senders (to allow packet filtering by source)
     previous hop: upstream router/host ID
     refresh time: time until this info times out
  path message: communicates sender info, and reverse-
   path-to-sender routing info
     later upstream forwarding of receiver reservations
RSVP: simple audio conference

 H1, H2, H3, H4, H5 are both senders and receivers
 multicast group m1
 no filtering: packets from any sender are forwarded
 audio rate: b
 only one multicast routing tree possible

(Figure: H1 and H2 attach to router R1 via links L1 and L2; H5 to R2
via L5; H3 and H4 to R3 via L3 and L4; R1-R2 are joined by link L6 and
R2-R3 by link L7.)
RSVP: building up path state

 H1, …, H5 all send path messages on m1:
  (address=m1, Tspec=b, filter-spec=no-filter, refresh=100)
 Suppose H1 sends the first path message; resulting state:
    R1:  m1: in L1;  out L2, L6
    R2:  m1: in L6;  out L5, L7
    R3:  m1: in L7;  out L3, L4
RSVP: building up path state

 next, H5 sends its path message, creating more state in the routers:
    R1:  m1: in L1, L6;  out L1, L2, L6
    R2:  m1: in L5, L6;  out L5, L6, L7
    R3:  m1: in L7;      out L3, L4
RSVP: building up path state

 H2, H3, H4 send their path msgs, completing the path state tables:
    R1:  m1: in L1, L2, L6;  out L1, L2, L6
    R2:  m1: in L5, L6, L7;  out L5, L6, L7
    R3:  m1: in L3, L4, L7;  out L3, L4, L7
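The no-filter path state in these tables can be generated mechanically (a toy model, identifiers mine): a path message arriving at a router on link l is recorded under "in" for l and forwarded on every other link, which therefore appears under "out".

```python
def path_state(router_links, arrival_links):
    """Toy RSVP path-state builder for one multicast group: for each
    link on which a path message arrives, record it as 'in' and mark
    all the router's other links as 'out'."""
    state = {"in": set(), "out": set()}
    for l in arrival_links:
        state["in"].add(l)
        state["out"].update(k for k in router_links if k != l)
    return state
```

For R1, whose links are L1, L2 and L6 and which eventually hears path messages on all three, this yields in = out = {L1, L2, L6}, matching the completed table.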
reservation msgs: receiver-to-network signaling

  reservation message contents:
       desired bandwidth:
       filter type:
          • no filter: any packet addressed to the multicast group can
            use the reservation
          • fixed filter: only packets from a specific set of senders
            can use the reservation
          • dynamic filter: the senders whose packets can be forwarded
            across the link will change (by receiver choice) over time
        filter spec
  reservations flow upstream from receiver-to-senders,
   reserving resources, creating additional, receiver-
   related state at routers
RSVP: receiver reservation example 1

H1 wants to receive audio from all other senders
 H1's reservation msg flows up-tree to the sources
 H1 only reserves enough bandwidth for 1 audio stream
 reservation is of type “no filter” – any sender can use the
  reserved bandwidth
RSVP: receiver reservation example 1

 H1's reservation msg flows up-tree to the sources
 routers and hosts reserve bandwidth b on the downstream links
  towards H1:
    R1:  m1: in L1, L2, L6;  out L1(b), L2, L6
    R2:  m1: in L5, L6, L7;  out L5, L6(b), L7
    R3:  m1: in L3, L4, L7;  out L3, L4, L7(b)
RSVP: receiver reservation example 1 (more)

 next, H2 makes a no-filter reservation for bandwidth b
 H2 forwards to R1; R1 forwards to H1 and R2 (?)
 R2 takes no action, since b is already reserved on L6
    R1:  m1: in L1, L2, L6;  out L1(b), L2(b), L6
    (the tables at R2 and R3 are unchanged)
RSVP: receiver reservation: issues

What if multiple senders (e.g., H3, H4, H5) send over one link (e.g., L6)?
 arbitrary interleaving of packets
 the L6 flow is policed by a leaky bucket: if the combined
  H3+H4+H5 sending rate exceeds b, packet loss will occur
RSVP: example 2

 H1, H4 are the only senders
    they send path messages as before, indicating a filtered reservation
    routers store the upstream sender reachable via each upstream link
 H2 will want to receive from H4 (only)
      RSVP: example 2
 H1, H4 are the only senders
    they send path messages as before, indicating a filtered
     reservation; the resulting per-router state:
  R1: in L1, L6;  out L2(H1-via-H1; H4-via-R2), L6(H1-via-H1),
                      L1(H4-via-R2)
  R2: in L6, L7;  out L6(H4-via-R3), L7(H1-via-R1)
  R3: in L4, L7;  out L3(H4-via-H4; H1-via-R2), L4(H1-via-R2),
                      L7(H4-via-H4)
      RSVP: example 2
 receiver H2 sends a reservation message for source H4
  at bandwidth b
      propagated upstream towards H4, reserving b on each hop:
  R1: out L2(H1-via-H1; H4-via-R2(b)), L6(H1-via-H1), L1(H4-via-R2)
  R2: out L6(H4-via-R3(b)), L7(H1-via-R1)
  R3: out L3(H4-via-H4; H1-via-R2), L4(H1-via-R2), L7(H4-via-H4(b))
RSVP: soft-state
 senders periodically resend path msgs to refresh (maintain) state
 receivers periodically resend resv msgs to refresh (maintain) state
 path and resv msgs have a TTL field, specifying the refresh interval
      RSVP: soft-state
 suppose H4 (sender) goes away ("gone fishing!") without performing
  teardown
 eventually the state in the routers will time out and disappear!
The many uses of reservation/path refresh

 recover from an earlier lost refresh message
     expected time until the next refresh is received must be shorter
      than the timeout interval (so a short refresh interval is desired)
 Handle receiver/sender that goes away without
  teardown
      Sender/receiver state will timeout and disappear
 Reservation refreshes will cause new reservations
  to be made to a receiver from a sender who has
  joined since the receiver's last reservation refresh
      E.g., in previous example, H1 is only receiver, H3 only
       sender. Path/reservation messages complete, data flows
      H4 joins as sender; nothing happens until H1 refreshes its
       reservation, causing R3 to forward the reservation to H4,
       which allocates bandwidth
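The timeout/refresh mechanics above can be sketched in a few lines of Python. This is a toy illustration: the class and key names (`SoftStateTable`, `"H4-via-R3"`) are invented for the example and not taken from any RSVP implementation.

```python
import time

# Sketch of RSVP-style soft state: a reservation survives only as long as
# refreshes keep arriving; a host that leaves without teardown just stops
# refreshing, and its state times out.
TIMEOUT = 3.0  # state lifetime; must be longer than the refresh interval

class SoftStateTable:
    def __init__(self, timeout=TIMEOUT):
        self.timeout = timeout
        self.entries = {}            # key (e.g. "H4-via-R3") -> expiry time

    def refresh(self, key, now=None):
        """Install or renew a reservation; a refresh simply resets the timer."""
        now = time.monotonic() if now is None else now
        self.entries[key] = now + self.timeout

    def expire(self, now=None):
        """Drop entries whose timer ran out (sender/receiver gone w/o teardown)."""
        now = time.monotonic() if now is None else now
        self.entries = {k: t for k, t in self.entries.items() if t > now}
        return self.entries

table = SoftStateTable()
table.refresh("H1-via-R1", now=0.0)
table.refresh("H4-via-R3", now=0.0)
table.refresh("H1-via-R1", now=2.0)   # H1 keeps refreshing; H4 "goes fishing"
table.expire(now=3.5)                  # H4's state has timed out by now
print(sorted(table.entries))           # ['H1-via-R1']
```

Note the ordering constraint from the slide: the refresh interval (here implied by refreshing at t=0 and t=2) must be shorter than the 3-second timeout, or live state would expire between refreshes.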
RSVP: reflections

 multicast as a “first class” service
 receiver-oriented reservations
 use of soft-state
Multimedia Networking: Summary
  multimedia applications and requirements
  making the best of today's best effort
   service
  scheduling and policing mechanisms
  next generation Internet: Intserv, RSVP,
   Diffserv
                Differentiated Services

 6.1 Multimedia Networking      6.5 Beyond Best Effort
  Applications                   6.6 Scheduling and Policing
 6.2 Streaming stored audio      Mechanisms
  and video                      6.7 Integrated Services
    RTSP                        6.8 RSVP
 6.3 Real-time, Interactive     6.9 Differentiated Services
  Multimedia: Internet Phone
  Case Study
 6.4 Protocols for Real-Time
  Interactive Applications
    RTP,RTCP
    SIP
IETF Differentiated Services
Concerns with Intserv:
 Scalability: signaling, maintaining per-flow router state difficult
   with large number of flows
 Flexible Service Models: Intserv has only two classes. Also want
   “qualitative” service classes
       “behaves like a wire”
       relative service distinction: Platinum, Gold, Silver

Diffserv approach:
 simple functions in network core, relatively complex functions at
   edge routers (or hosts)
 don't define service classes; provide functional components
   to build service classes
Diffserv Architecture

[Figure: edge router meters each flow with a token bucket (rate r, size b), marks packets, and feeds a scheduling stage toward the core]

Edge router:
- per-flow traffic management
- marks packets as in-profile and out-profile

Core router:
- per-class traffic management
- buffering and scheduling based on marking at edge
- preference given to in-profile packets
- Assured Forwarding
   Edge-router Packet Marking
 profile: pre-negotiated rate A, bucket size B
 packet marking at edge based on per-flow profile


                                    Rate A

                                        B

                    User packets

Possible usage of marking:
  class-based marking: packets of different classes marked differently
  intra-class marking: conforming portion of flow marked differently than
    non-conforming one
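Intra-class marking against a profile can be sketched as a token bucket that tags each packet in-profile or out-of-profile. This is a minimal illustration assuming byte-based accounting; `TokenBucketMarker` is an invented name, not a real router API.

```python
class TokenBucketMarker:
    """Mark packets against a pre-negotiated profile: rate A (tokens/sec)
    and bucket size B. Illustrative sketch only."""

    def __init__(self, rate_a, size_b):
        self.rate = rate_a
        self.size = size_b
        self.tokens = size_b      # bucket starts full
        self.last = 0.0           # time of the previous packet

    def mark(self, pkt_len, now):
        # accumulate tokens since the last packet, capped at bucket size B
        self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return "in-profile"
        return "out-of-profile"   # non-conforming part of the flow

marker = TokenBucketMarker(rate_a=1000, size_b=2000)  # 1000 B/s, 2000 B deep
print(marker.mark(1500, now=0.0))   # in-profile (bucket full)
print(marker.mark(1500, now=0.1))   # out-of-profile (only ~600 tokens left)
print(marker.mark(1500, now=2.0))   # in-profile (bucket has refilled)
```

The conforming portion of a burst drains the bucket; packets arriving faster than the bucket refills get the out-of-profile mark, which core routers can then use for drop preference.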
      Classification and Conditioning

 Packets are marked in the Type of Service (TOS) field in
  IPv4, and the Traffic Class field in IPv6
 6 bits are used for the Differentiated Services Code Point
  (DSCP) and determine the PHB that the packet will
  receive
 2 bits are currently unused
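Extracting and setting the DSCP within the TOS/Traffic Class byte is plain bit manipulation; a small sketch (the helper names are illustrative; 46 is the standard Expedited Forwarding codepoint):

```python
# The DS field reuses the IPv4 TOS byte / IPv6 Traffic Class byte:
# the upper 6 bits carry the DSCP, the lower 2 bits are unused here.
def dscp_from_tos(tos_byte):
    return (tos_byte >> 2) & 0x3F            # upper 6 bits

def tos_from_dscp(dscp, low2=0):
    return ((dscp & 0x3F) << 2) | (low2 & 0x3)

EF = 0b101110                                # Expedited Forwarding DSCP (46)
tos = tos_from_dscp(EF)
print(tos, dscp_from_tos(tos))               # 184 46
```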
      Classification and Conditioning

it may be desirable to limit the traffic injection rate of
  some class:
 user declares traffic profile (e.g., rate, burst size)
 traffic metered, shaped if non-conforming
Forwarding (PHB)

 PHBs result in a different observable (measurable)
  forwarding performance behavior
 PHB does not specify what mechanisms to use to
  ensure required PHB performance behavior
 Examples:
      Class A gets x% of outgoing link bandwidth over time
       intervals of a specified length
      Class A packets leave first before packets from class B
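The first example, giving class A a fixed share of the link, can be approximated by weighted round robin. A toy sketch (function and queue names are invented; a real scheduler would also account for packet sizes):

```python
from collections import deque

def weighted_round_robin(queues, weights):
    """Serve each class's queue in proportion to its weight: a minimal
    sketch of giving class A x% of the link per scheduling round."""
    out = []
    while any(queues.values()):
        for cls, w in weights.items():
            q = queues[cls]
            for _ in range(w):        # up to `w` packets per round
                if q:
                    out.append(q.popleft())
    return out

queues = {"A": deque(["A1", "A2", "A3", "A4"]), "B": deque(["B1", "B2"])}
order = weighted_round_robin(queues, {"A": 3, "B": 1})  # A gets ~75% of slots
print(order)   # ['A1', 'A2', 'A3', 'B1', 'A4', 'B2']
```

The second example (class A always leaves first) would instead be strict priority: serve A's queue to exhaustion before touching B's.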
                  Forwarding (PHB)

PHBs being developed:
 Expedited Forwarding: pkt departure rate of a
  class equals or exceeds specified rate
      logical link with a minimum guaranteed rate
 Assured Forwarding: 4 classes of traffic
    each guaranteed minimum amount of bandwidth
    each with three drop preference partitions
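The Assured Forwarding codepoints encode the class in the upper three DSCP bits and the drop precedence in the next two (per RFC 2597, which defines AF); a small sketch (the helper name is illustrative):

```python
def af_dscp(cls, drop):
    """Assured Forwarding codepoint per RFC 2597: class 1-4, drop
    precedence 1-3. Bit pattern cccdd0, i.e. 8*class + 2*drop."""
    assert 1 <= cls <= 4 and 1 <= drop <= 3
    return (cls << 3) | (drop << 1)

# AF11, AF23, AF41:
print(af_dscp(1, 1), af_dscp(2, 3), af_dscp(4, 1))   # 10 22 34
```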

				