									    International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
       Web Site: Email:,
Volume 1, Issue 3, September – October 2012                                    ISSN 2278-6856

    Mathematical Comparison of Important Speech
    Recovery Methods in Communication Networks
                                        Rohit Srivastava1, Dheeraj Kumar Singh2
                  Assistant Professor, Computer Science and Engineering Department, Parul Institute of Engineering
                                                 & Technology, Limda, Vadodara
                  Assistant Professor, Information Technology, Parul Institute of Engineering & Technology , Limda,

Abstract: The paper presents a mathematical comparison of prominent speech recovery methods used in digital communication networks. Digital speech interpolation (DSI) has increased the utilization efficiency of a digital link for transmitting voice packets along with data packets by use of Voice over Internet Protocol (VoIP). To deal with delayed voice packets, the receiving end stores recently arrived packets in a jitter buffer before playout. When the delay is greater than the jitter buffer length, the packet is considered lost and is compensated by a reconstruction method. Various methods are employed, namely the zero substitution method, the method of repeating the preceding packet, the waveform substitution method, the Sine Model, and the Analysis-by-Synthesis / Overlap-Add method. A late retransmitted packet can also be used to restore the concealed reference frame by use of the late frame method.

Keywords: Digital Communication, Speech interpolation, Voice over Internet Protocol, jitter buffer, Zero substitution, Waveform substitution, Sine Model, Analysis-by-Synthesis / Overlap-Add, late frame.

INTRODUCTION
Packet speech communication plays an important role in the evolution of combined voice and data services. The advantages of communicating computer data in packets are discussed by Rosner [15], and the merits of packet speech are well documented by Decina and V’lack [13]. In contrast to packet data transmission, where delays are allowed to build up as traffic increases, speech communication requires prompt packet delivery. The transit time of a packet through the network varies due to queuing effects along the transmission path. A “jitter” or “playout” buffer, which allows the receiver to wait for all packets arriving within an acceptable time limit, is used to control the effects of such variability. However, some packets may still arrive too late to be decoded. Beyond some time limit (150 ms as recommended by CCITT), delayed speech packets are useless at the receiving terminal and are discarded by the system. Missing or late packets are usually considered “lost”, and a concealment procedure has to be applied to replace the missing audio samples. Unfortunately, concealment is not perfect, and errors introduced in the concealed frame propagate into the following ones. Packet loss, therefore, has a major effect on speech quality, and the consequent constraints on packet dropping rates affect system costs.

A conversation consists of two independent, identical, simplex voice paths. At the source, vocoded speech is produced at a constant rate. The transmitter collects samples until a packet is filled and transmits it over the network. Transmission on the network is bit-serial, beginning with 64 sync bits, followed by the packet. A packet contains 112 bits of header, a data field of between 368 and 12,000 bits, and a 32-bit CRC field. A single frame holding 20-30 ms of speech is assumed stationary. Overall, a packet can hold from 5.75 ms to 187.5 ms of speech. To control the time variability between adjacent packets, a “jitter buffer” is used at the receiver. Packets arriving after the time limit of 150 ms are discarded by the system and are not made available to the decoder. This necessitates a concealment procedure to replace the “lost” samples, frames or packets in order to improve the speech quality. The missing packets are reconstructed by substitution of past waveform speech segments already available at the receiver.

This paper presents a performance comparison of important recovery methods, viz. waveform substitution, the sinewave speech model, analysis-by-synthesis using the overlap-add model, and the late frame method, in order of improving performance. The methods, applied at the receiver end, operate on a conventional pulse code modulated (PCM) speech signal and incur negligible processing delay. More robust methods attain better speech quality at the level of frame recovery, but make the process more exacting: they are quite sophisticated and involve a large computation delay. The signal is further encoded or compressed so as to reduce the number of bits and hence the bit-rate. A trade-off between bit-rate and bandwidth has to be established as per the application to obtain the information with low delay and better quality respectively.

Waveform substitution methods are straightforward to implement. The term “waveform substitution” refers to
reconstruction of missing voice packets by substitution of past waveform speech segments already available at the receiver or, if the speech has been processed by a vocoder, by synthesizing new speech from previously received analysis data, as suggested by Weinstein and Forgie [12]. The performance of such methods is governed by essentially two characteristics of speech. The first is the maximum period over which speech can be considered stationary, and the second concerns the transitions which occur between voiced and unvoiced or silent speech segments.

A simple formula for predicting the probability of waveform substitution failure as a function of packet duration and packet loss rate is evaluated as follows.

Let
     Tp = packet duration (ms)
     p  = missing packet probability
     C  = maximum number of packet intervals over which the speech signal is considered stationary
     t  = probability of no category transition over the duration of a packet that lasts for Tp ms
     Pf = probability of unsuccessful waveform substitution

Since speech parameters can be assumed stationary for no more than about 30 ms, according to Jayant and Christensen [16], the number of contiguous missing packets that can be tolerated before the condition of Eq. (1) occurs is approximately 30/Tp.

Therefore we have

\[ C = \frac{32}{T_p} \tag{1} \]

\[ t = e^{-0.0052\, T_p} \tag{2} \]

and Pf is given by:

\[ P_f = 1 - (1 - p)\,\frac{1 - (pt)^{C+1}}{1 - pt} \tag{3} \]

Assuming that the occurrences of category transitions per second follow a Poisson probability model, t in Eq. (2) is the probability of no transition over the duration of a packet lasting Tp ms. If there is a sequence of r consecutive missing packets, the waveform substitution fails if r > C, which means that, regardless of transitions, the duration of the missing speech is too long to allow accurate substitution from past waveform segments. If r ≤ C, the substitution will fail if there is a transition somewhere in the sequence of r missing packets. Assuming that transitions occur independently from packet to packet, the probability of no transition in a sequence of r packets is t^r, and the probability of failure is

\[ P_f(r) = \begin{cases} 1 - t^{r}, & r \le C \\ 1, & r > C \end{cases} \tag{4} \]

If the probability of packet loss is p, independent of packet position, the probability of a sequence of r missing packets is

\[ P_m(r) = p^{r}\,(1 - p) \tag{5} \]

Thus, the total probability of failure of the waveform substitution method is

\[ P_f = \sum_{r=0}^{\infty} P_f(r)\, P_m(r) \tag{6} \]

which, combined with Eqs. (4) and (5), produces Eq. (3).

The waveform substitution using sample interpolation method is applicable when there is no consecutive omission of samples. The analysis and worked problem in Sethi [22] explain the method in detail. The method, however, introduces distortion (due to discontinuities) at packet edges.

The waveform substitution methods fail if:
1. The missing segment is long (> 32 ms), so that the speech signal is no longer stationary over it.
2. There is a transition of energy or pitch within the missing segment.

A sinusoidal model of successfully received speech is utilized to recover lost frames using both extrapolation and interpolation techniques. The model, explained by Lindblom and Hedelin [2], offers a low-delay packet loss concealment method for speech communication and can be used to repair an incoming voice stream which has been subjected to frame erasures of the order of 10%. The model operates on the source-filter components of the speech signal and attempts to preserve the original waveform shape in the modified speech.

In the source-filter model, the speech signal is considered as the output of a time-varying vocal tract system excited with a quasi-periodic or random sequence of pulses. The source-filter model forms an essential part of low-rate compression algorithms. Modern speech compression algorithms reduce a 128 kbps PCM signal to rates of the order of 2-4 kbps while maintaining acceptable quality. Besides obtaining a low bit rate and a high speech quality, the possibility of packet loss suggests that packets should be decoded independently. This motivates the use of sinusoidal speech coding, as it provides a high speech quality at a low bit rate without using inter-frame information. After speech has been modeled efficiently as a sum of sinusoidal components, the Sine Model follows these steps:
1.      Analysis: In analysis, one takes apart the speech waveform to extract the parameters of a time-varying model. For each sinusoidal component of a signal, its parameters, viz. frequency, amplitude, and phase, are estimated by one of the following:
a.            Short Time Fourier Transform (STFT) applied to track sinusoids.
b.            A set of band-pass filters used for the purpose.
2.      Data Reduction (optional)
3.      Modification (optional)
4.      Synthesis: In synthesis, the signal is reconstructed from a summation of sinusoids. Based on the parameter estimates and model output, the parameters are processed to produce a synthetic waveform that is perceptually indistinguishable from the original, by one of the following:
a.            Application of the Inverse Fourier Transform.
b.            A bank of oscillators used for the purpose.

Given the analyzed sinusoidal model parameters, a set of continuously variable parameter tracks representing the speech signal is constructed by matching nearest-neighbor components from one frame to the next and interpolating the matched parameters over time using polynomial functions. More details can be found in the work of Sethi [22].

Illustration of the method with a generalized problem:

Let us consider the transform of a windowed complex sinusoid with complex amplitude a and frequency ω_x:

\[ x_w(n) = w(n)\, a\, e^{j\omega_x nT} \]

The transform of this windowed signal is the convolution of a frequency-domain delta function at ω_x, δ(ω - ω_x), and the transform of the window function, W(ω), resulting in a shifted version of the window transform. Assuming that the window length M is odd, the following results may be obtained:

\[ X_w(\omega) = \sum_{n=-\infty}^{\infty} w(n)\, x(n)\, e^{-j\omega nT} \]

\[ \phantom{X_w(\omega)} = \sum_{n=-(M-1)/2}^{(M-1)/2} \left[ w(n)\, a\, e^{j\omega_x nT} \right] e^{-j\omega nT} \]

\[ \phantom{X_w(\omega)} = a \sum_{n=-(M-1)/2}^{(M-1)/2} w(n)\, e^{-j(\omega - \omega_x) nT} \]

\[ \phantom{X_w(\omega)} = a\, W(\omega - \omega_x) \]

[Fig. 3.6: Shifted version of windowed signal]

The fundamental frequency or pitch, ω_0, of speech is found by a pitch detector. Using the pitch, the complex amplitude (amplitude and phase) of each sine-wave can be estimated using weighted least squares or the Levinson-Durbin recursion algorithm.

At ω_x,

\[ |X_w(\omega_x)| = |a| \cdot |W(0)| \]

\[ \angle X_w(\omega_x) = \angle a + \angle W(0) \]

If the window is scaled to have a DC gain of 1, then the peak magnitude equals the amplitude of the sinusoid:

\[ |X_w(\omega_x)| = |a| \]

By using a zero-phase (even) window, the phase at the peak equals the phase of the sinusoid:

\[ \angle X_w(\omega_x) = \angle a \]

The signal is reconstructed by a finite sum of sinusoids using oscillators (Linear Predictive Coders) as:

\[ r_n = \sum_{i=1}^{L} A_i \sin(\omega_i n + \phi_i) \]

where L is the number of sine-waves used. E.g. for L = 5, the equation becomes

\[ \hat{r}_n = \sum_{i=1}^{5} A_i \sin(\omega_i n + \phi_i) \]

For the case of a 2.4 kbps LPC vocoder, the LPC coefficients are represented as line spectrum pair (LSP) parameters having the vector form:

     A = (a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, G, V/UV, T)
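The peak relations derived in this section (the magnitude at ω_x equals |a| when the window has unit DC gain, and the phase at ω_x equals the phase of a when the window is even) can be checked numerically, together with the sum-of-sinusoids reconstruction. The following is only a minimal sketch under stated assumptions: the Hann-shaped window, the choice T = 1, and the function names `zero_phase_window`, `windowed_dtft` and `synthesize` are illustrative and do not come from the paper.

```python
import cmath
import math

def zero_phase_window(M):
    """Even (zero-phase) Hann-shaped window of odd length M, scaled so W(0) = sum(w) = 1.
    The window shape is an assumption of this sketch."""
    half = (M - 1) // 2
    w = [0.5 + 0.5 * math.cos(2.0 * math.pi * n / (M + 1)) for n in range(-half, half + 1)]
    total = sum(w)
    return [v / total for v in w]

def windowed_dtft(a, wx, M, w_eval):
    """X_w(w_eval) for x_w(n) = w(n) * a * exp(j*wx*n), n = -(M-1)/2 .. (M-1)/2, T = 1."""
    half = (M - 1) // 2
    win = zero_phase_window(M)
    return sum(win[n + half] * a * cmath.exp(1j * (wx - w_eval) * n)
               for n in range(-half, half + 1))

def synthesize(components, num_samples):
    """r[n] = sum_i A_i * sin(w_i * n + phi_i) for a list of (A_i, w_i, phi_i) triples."""
    return [sum(A * math.sin(w * n + phi) for A, w, phi in components)
            for n in range(num_samples)]

a = cmath.rect(0.8, 0.6)                 # complex amplitude: |a| = 0.8, angle = 0.6 rad
peak = windowed_dtft(a, 0.9, 31, 0.9)    # evaluate the transform at the sinusoid frequency
print(abs(peak), cmath.phase(peak))      # approximately 0.8 and 0.6
```

Evaluating `windowed_dtft` away from the sinusoid frequency gives a smaller magnitude, which is why the peak location and its complex value can serve as the frequency and complex-amplitude estimates.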


The vector A changes every 20 ms. At a sampling rate of 8000 samples/sec, one 20 ms frame holds 160 samples, and 50 frames/sec are transmitted from the encoder. The model says that the 160 sample values in one frame of the speech signal are compactly represented by the 13 values of A.

Also, 2400 bps at 50 frames/sec gives 48 bits/frame. These are to be allocated as follows:

Parameter Name              | Parameter Notation                                   | Rate (bits/frame)
----------------------------|------------------------------------------------------|------------------
LPC (LSP)                   | $\{a_k\}_{k=1}^{10}$ ($\{\omega_k\}_{k=1}^{10}$)     | 34
Gain                        | G                                                    | 7
Voiced/Unvoiced and Period  | V/UV, T                                              | 7
Total                       |                                                      | 48

These bits are transmitted to the decoder as 13 3-bit sample values and a 6-bit scaling factor that turns the PCM encoding into an APCM. In the decoder, the algorithm multiplies the 13 3-bit samples by the scaling factor and expands them back into 40 samples, which then become part of the source for the next three predictions. The signal finally passes through the tubes of the simulated vocal tract (synthesis filter) and emerges as speech. The 2.4 kbps LPC vocoder can be represented with the block diagram of Fig. 1.

            Figure 1 2.4 kbps speech vocoder

To reduce synthesis computation in the sinusoidal method, the combination of an overlap-add sinusoidal model with an analysis-by-synthesis procedure has been developed to determine model parameters. An overlap-add model using the inverse FFT algorithm was proposed by McAulay and Quatieri [10]. The model uses a successive approximation-based analysis-by-synthesis procedure rather than peak-picking to determine model parameters. The analysis-by-synthesis / overlap-add method (ABS/OLA) enhances modeling accuracy. It is capable of automatically analyzing input speech signals, synthesizing perceptually identical replicas of speech from the analyzed parameters, and modifying speech signals to alter their time, frequency, and pitch scale. It allows for joint time and frequency modifications as well. The method attempts to determine the optimal parameters for each component without regard to stationarity and achieves very high synthetic speech quality by accurately estimating component frequencies, eliminating sidelobe interference effects, and effectively dealing with nonstationary speech events. The ABS algorithm works as follows.

First define “component sequences” as

\[ \hat{s}_{jk}[n] = \sigma[n + kN_s]\, A_{jk} \cos(\omega_{jk} n + \phi_{jk}) \tag{7} \]

Assuming that the parameters of the first l-1 components have been determined previously, they generate a successive approximation to s[n] in the range kNs - Na ≤ n ≤ kNs + Na, expressed as

\[ \tilde{s}_k^{\,l-1}[n] = \sum_{j=1}^{l-1} \hat{s}_{jk}[n] = \sigma[n + kN_s] \sum_{j=1}^{l-1} A_{jk} \cos(\omega_{jk} n + \phi_{jk}) \tag{8} \]

and a successive error sequence

\[ e_k^{\,l-1}[n] = s[n + kN_s] - \tilde{s}_k^{\,l-1}[n] \tag{9} \]

Given the initial conditions

\[ \tilde{s}_k^{\,0}[n] = 0 \quad \text{and} \quad e_k^{\,0}[n] = s[n + kN_s], \]

these sequences are updated recursively (for l ≥ 1) by

\[ \tilde{s}_k^{\,l}[n] = \tilde{s}_k^{\,l-1}[n] + \sigma[n + kN_s]\, A_{lk} \cos(\omega_{lk} n + \phi_{lk}) \]

\[ e_k^{\,l}[n] = e_k^{\,l-1}[n] - \sigma[n + kN_s]\, A_{lk} \cos(\omega_{lk} n + \phi_{lk}) \tag{10} \]

The goal of analysis-by-synthesis is to update the approximation to s[n] by adding a single component such that the updated approximation is as good as possible. This is achieved by minimizing the successive error norm $E_k^{\,l}$ of $e_k^{\,l}[n]$, which is given by

\[ E_k^{\,l} = \sum_{n=-N_a}^{N_a} w_a[n] \left\{ e_k^{\,l}[n] \right\}^2 = \sum_{n=-N_a}^{N_a} w_a[n] \left\{ e_k^{\,l-1}[n] - \sigma[n + kN_s]\, A_{lk} \cos(\omega_{lk} n + \phi_{lk}) \right\}^2 \tag{11} \]
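The successive-approximation recursion of Eqs. (9)-(11) can be sketched as a greedy search over a grid of candidate frequencies: for each candidate, the amplitude and phase that minimize the weighted error norm are found in closed form, the best component is subtracted from the error sequence, and the loop repeats. This is a simplified illustration and not the implementation of the cited works: it assumes a unit envelope σ[n] = 1, a single frame (k = 0), a Hann-shaped analysis window, and a uniform frequency grid; the name `abs_analyze` and its parameters are hypothetical.

```python
import math

def abs_analyze(s, num_components, num_freqs=64):
    """Greedy analysis-by-synthesis on one frame s[n], n = -Na..Na (len(s) = 2*Na + 1).
    Assumptions of this sketch: sigma[n] = 1, Hann-shaped w_a[n], uniform frequency grid."""
    Na = (len(s) - 1) // 2
    ns = range(-Na, Na + 1)
    w = [0.5 + 0.5 * math.cos(math.pi * n / (Na + 1)) for n in ns]  # analysis window w_a[n]
    e = list(s)                        # e^0[n] = s[n]: the approximation starts at zero
    components = []
    for _ in range(num_components):
        best = None
        for k in range(1, num_freqs):
            omega = math.pi * k / num_freqs
            c = [math.cos(omega * n) for n in ns]
            d = [math.sin(omega * n) for n in ns]
            # Weighted normal equations for e[n] ~ alpha*cos(omega*n) + beta*sin(omega*n).
            cc = sum(wi * ci * ci for wi, ci in zip(w, c))
            dd = sum(wi * di * di for wi, di in zip(w, d))
            cd = sum(wi * ci * di for wi, ci, di in zip(w, c, d))
            ec = sum(wi * ei * ci for wi, ei, ci in zip(w, e, c))
            ed = sum(wi * ei * di for wi, ei, di in zip(w, e, d))
            det = cc * dd - cd * cd
            if abs(det) < 1e-12:
                continue
            alpha = (ec * dd - ed * cd) / det
            beta = (ed * cc - ec * cd) / det
            reduction = alpha * ec + beta * ed   # decrease of the weighted error norm
            if best is None or reduction > best[0]:
                best = (reduction, omega, alpha, beta)
        _, omega, alpha, beta = best
        amp = math.hypot(alpha, beta)
        phase = math.atan2(-beta, alpha)  # A*cos(wn + phi) = alpha*cos(wn) + beta*sin(wn)
        components.append((amp, omega, phase))
        # Error update of Eq. (10) with sigma = 1.
        e = [ei - amp * math.cos(omega * n + phase) for ei, n in zip(e, ns)]
    error_norm = sum(wi * ei * ei for wi, ei in zip(w, e))   # Eq. (11) after the last step
    return components, error_norm

Na = 40
w1, w2 = math.pi * 10 / 64, math.pi * 25 / 64
frame = [math.cos(w1 * n + 0.5) + 0.4 * math.cos(w2 * n - 1.0) for n in range(-Na, Na + 1)]
components, err = abs_analyze(frame, 2)
print(components, err)
```

Because each component is removed from the error sequence before the next one is estimated, the stronger sinusoid is picked first and no longer interferes with the estimate of the weaker one, which is the sidelobe-interference property the text attributes to analysis-by-synthesis.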


Fig. 2 shows a functional block diagram of the analysis procedure, illustrating the “closed-loop” successive approximation structure reminiscent of linear predictive analysis-by-synthesis coders, as described by Spanias [7]. From Eq. (9) and Fig. 2, it is clear that by minimizing in terms of the parameters of the lth component, we are in fact approximating the residual error left after approximating the segment of s[n] by the previous l-1 components.

  Fig. 3.8 Block diagram of analysis-by-synthesis procedure applied to overlap-add sinusoidal modeling.

A further distinction of analysis-by-synthesis is its ability to model speech using an indeterminate number of sinusoids. This ability allows the ABS/OLA system to synthesize very accurate unvoiced speech, without the tonality often associated with sinusoidal analysis. Another advantage of analysis-by-synthesis is its ability to deal with sidelobe interference effects. Analysis-by-synthesis removes each component after determining its parameters. Due to this, the slight tonality in synthetic voiced speech that results from sidelobe effects is reduced.

Finally, a basic issue in sinusoidal speech modeling is the representation of unvoiced speech or other events that are not naturally modeled using narrowband deterministic processes. The ABS/OLA system models such events well using only sinusoids. This configuration yields a 5 dB increase in average segmental SNR, primarily due to ABS generating better estimates of component frequencies, as per George [8]. An illustration of the method can be seen in the work of Sethi [22].

The computational load required to implement the ABS/OLA system has been significantly reduced by exploiting frequency-domain interpretations of the analysis and synthesis algorithms and their relation to the FFT algorithm. As a result, the ABS/OLA system achieves a combination of modification flexibility, high-quality output, and reasonable implementation requirements not found in existing speech analysis/synthesis systems.

The late frame method proposes to use late frame information to update the internal state of the decoder, rather than considering a late frame as “lost”. The method suggested by Gourney et al. [1] improves the recovery of a speech decoder after the reception of one or several late frames. This limits, and in some cases stops, the error propagation caused by the concealment. Late frames can thus be used to improve the robustness of the decoder without increasing the overall end-to-end delay. A late retransmitted packet can be used to restore the concealed reference frame, which stops error propagation among predicted frames.

A comparison of all the above methods is tabulated as follows:

METHODS / PARAMETERS | SILENCE SUBSTITUTION | WAVEFORM SUBSTITUTION BY SAMPLE INTERPOLATION | SINUSOIDAL MODEL | ABS/OLA | LATE FRAME
---------------------|----------------------|-----------------------------------------------|------------------|---------|-----------
OPERATE ON | PCM | PCM packets containing consecutive speech samples, μ-law | PCM, μ-law encoded, 64/128 kbps, 8-16 bits/sample | PCM | PCM
OPERATE AT | Sample level | Sample level | Frame level; frame length = 20 ms; frame = 160 samples = 32 sine waves in LP | Frame level | Frame level
SAMPLING RATE | 8 kHz | 8 kHz | 8 kHz | 8000 samples/sec | 8000 samples/sec
ALLOWED PACKET LOSS RATE, p / FAILURE PROB., Pf | Pf = p; p ≤ 1% | There should be no consecutive omission of samples for high p. ≤ 5%, up to 10% if pkt. loss is controlled (5, 10 and 20% for Tp = 32, 16 and 8 ms) | Up to 10% frame loss | Ability to model speech using indeterminate sinusoids | If used in conjunction with predictive coders, their Pf can be significantly reduced

METHODS / PARAMETERS compared: SILENCE SUBSTITUTION, WAVEFORM SUBSTITUTION BY [...]LATION, SINUSOIDAL MODEL, ABS/OLA, LATE FRAME

PACKET SIZE
  Silence substitution: 16-32 ms packets are most tolerant to packet loss.
  Waveform substitution: 64-256 samples; 8 ms / 64 samples is most tolerant to packet loss (not greater than 30 ms).
  Sinusoidal model: Spectral part = 20 bits/frame [25,52]; 20 ms frame = 160 samples = 32 sine waves.
  ABS/OLA: Spectral part = 20 bits/frame [25,52]; 20 ms frame = 160 samples = 32 sine waves.
  Late frame: Packet duration not greater than 30 ms or 256 samples.

SEARCH WINDOW SIZE
  Silence substitution: -NA-
  Waveform substitution: 16 ms / 128 samples ( LW [...] SNR [...] ) [...] 2Lt
  Sinusoidal model: 64 ms length.
  ABS/OLA: Tapered, ws[n [...] k [...]
  Late frame: -NA-

TEMPLATE SIZE
  Silence substitution: -
  Waveform substitution: (14-31 samples); 2-4 ms.
  Sinusoidal model: 20 ms.
  ABS/OLA: 5 < Ns < 20 ms.
  Late frame: -NA-

METHODS / MATH
  Silence substitution: -NA-
  Waveform substitution: Pattern matching (template & search window used); waveform difference measure.
  Sinusoidal model: Weighted least squares or the Levinson-Durbin algorithm / discrete FT / inverse DFT / LP analysis / autocorrelation method.
  ABS/OLA: Successive approximation / FFT algorithm / STFT.
  Late frame: BFI activates the frame loss concealment procedure; memory in predictive quantizers where late frames are received.

PARAMETERS REQUIRED
  Silence substitution: -NA-
  Waveform substitution: St; second order divided difference.
  Sinusoidal model: f, a, [...]; frequencies determined by peak-picking method; 10 coefficients in LPC.
  ABS/OLA: f, a, [...], [...](n); provides better estimation of frequency.
  Late frame: Depends on coder properties; generally, parameters of the previous frame are extrapolated for the lost frame.

SNR
  Silence substitution: [...]s
  Waveform substitution: 19-20 dB > [...]s
  Sinusoidal model: 10 dB (OPTIMAL).
  ABS/OLA: 5 dB increase on average segmental SNR.
  Late frame: SNR increases comparatively.

DISTORTION
  Silence substitution: Let D1 be the amount of distortion found in silence substitution (highest).
  Waveform substitution: Let distortion in waveform substitution be D2; D2 < D1. Discontinuity at packet edges; degrade[...] clipping & phoneme loss.
  Sinusoidal model: Let distortion in sinewave modeling be D3; D3 [...] D2.
  ABS/OLA: Let distortion in AbS/OLA method be D4; D4 [...] D3.
  Late frame: Much reduced, & in some cases error propagation caused by concealment is [...].

MERGING PROCEDURE & ALLOWED MERGING DURATION
  Silence substitution: Tm = 1 ms.
  Waveform substitution: Packet interleaving (N = 8), packet recovery simplifies to recovery of samples, Tm = 1 ms; raised cosine weight method & addition of overlap packet segments.
  Sinusoidal model: Extrapolation/interpolation with 50% overlap (meaning 8 ms overlaps between frames).
  ABS/OLA: Overlap-add with time-varying gain.
  Late frame: Weighting signals with fade-in, fade-out windows in excitation domain.

COMPUTATION DELAY
  Silence substitution: NIL.
  Waveform substitution: Negligible processing delay.
  Sinusoidal model: YES (high synthesis delay).
  ABS/OLA: Reasonably high due to inverse FFT algorithm.
  Late frame: Depends on speech coder used.

TRANSMISSION DELAY
  Silence substitution: Least.
  Waveform substitution: Low.
  Sinusoidal model: Real-time applications (e.g. voice response) possible, depends on trade-off between bit-rate & bandwidth.
  ABS/OLA: More.
  Late frame: No increase in end-to-end delay. Normal play-out time & no additional delay introduced.


OTHER FAILURE CONDITION
  Silence substitution: Large number of consecutive packet losses.
  Waveform substitution: When T_pitch [...] LW: 1. Too long a missing segment makes the speech signal non-stationary (> 32 ms). 2. During transitions in energy/pitch levels.
  Sinusoidal model: Too long a missing segment makes the speech signal non-stationary (> 32 ms).
  ABS/OLA: Non-stationarity not allowed with quasi-harmonic signal.
  Late frame: When a reference frame is lost, the receiver asks for its retransmission. If the retransmission is too late, concealment is applied, but the late retransmission is used to restore the reference frame.

APPLICATIONS
  Silence substitution: Voice store and forward systems; voice messages for recorded announcements.
  Waveform substitution: Land and portable phones; voice store and forward systems; voice storage for recorded message announcements.
  Sinusoidal model: Speech processing problems, speaker recognition, speech & language recognition, coding, music, real-time applications of speech analysis/synthesis & modifications; DCM equipment, digital satellite systems.
  ABS/OLA: Used by T-f & LP speech coders. Well suited to the non-linear nature of analysis. Allows joint t & f modification and pitch scaling, and time-varying t & f scaling. The computational load of AbS remains an obstacle for real-time applications.
  Late frame: For prediction-based speech coders.

REFERENCES

[1] Philippe Gournay, Francois Rousseau, and Roch Lefebvre,
     “Improved packet loss recovery using late frames for
     prediction-based speech coders”, Proc. IEEE Int. Conf.
     Acoust., Speech, Signal Processing, 2003.
[2] Jonas Lindblom and Per Hedelin, “Packet loss
     concealment based on sinusoidal modeling”, Proc.
     IEEE Int. Conf. Acoust., Speech, Signal Processing,
     2002.
[3] S.V. Andersen, et al., “ILBC - A linear predictive
     coder with robustness to packet losses”, IEEE
     Workshop on Speech Coding, pp. 23-25, October 2002.
[4] B.W. Wah, X. Su, and D. Lin, “A survey of error
     concealment schemes for real-time audio and video
     transmission over the Internet”, IEEE International
     Symposium on Multimedia Software Engineering,
     December 2000.
[5] N. Naka and T. Ohya (NTT Mobile
     Communications), “Updating internal states of a speech
     decoder after errors have occurred”, U.S. Patent
     US006085158A, 4 July 2000.
[6] E. Bryan George, Mark J. T. Smith, “Speech
     Analysis/Synthesis and Modification Using an
     Analysis-by-Synthesis/Overlap-Add Sinusoidal Model”,
     IEEE Transactions on Speech and Audio Processing,
     vol. 5, no. 5, September 1997.
[7] A. Spanias, “Speech coding: A tutorial review,”
     Proc. IEEE, vol. 82, pp. 1541–1582, October 1994.
[8] E. B. George, “An analysis-by-synthesis approach
     to sinusoidal modeling applied to speech and music
     signal processing,” Ph.D. dissertation, Georgia Inst.
     Technol., Atlanta, GA, 1991.
[9] Mitsuhiro Yuito, Naoki Matsuo, “A new sample
     interpolation method for recovering missing speech
     samples in packet voice commns.,” NTT
     Telecommunication Networks Laboratories, Yokosuka-
     shi, pp. 381–384, Japan, 1989.
[10] R. J. McAulay and T. F. Quatieri,
     “Computationally efficient sinewave synthesis and its
     application to sinusoidal transform coding,” in Proc.
     IEEE Int. Conf. Acoust., Speech, Signal Processing,
     Apr. 1988, pp. 370–373.
[11] David J. Goodman, Gordon B. Lockhart, Ondria
     J. Wasem and Wai-choong Wong, “Waveform
     Substitution Techniques for Recovering Missing
     Speech Segments in Packet Voice Commns.,” IEEE
     Transactions on Acoust., Speech, Signal
     Processing, vol. ASSP-34, no. 6, pp. 1440–1448,
     December 1986.
[13] C. J. Weinstein and J. W. Forgie, “Experience with
     speech communication in packet networks,” IEEE J.
     Selected Areas Comm., vol. SAC-1, pp. 963-980,
     December 1983.
[14] M. Decina and D. Vlack, “Voice by the packet?”
     IEEE J. Selected Areas Comm., vol. SAC-1, pp. 961-
     962, December 1983.
[15] Andrew J. Mackie, Salah E. Aidarous, Samy A.
     Mahmoud, and J. Spruce Riordon, “Design and
     Performance Evaluation of a Packet Voice System,”
     Trans. of Vehicular Technology, vol. VT-32, no. 2,
     pp. 158-168, May 1983.
[16] R. D. Rosner, “Packet Switching Tomorrow’s
     Communications Today,” Belmont, CA: Lifetime
     Learning Publications, 1982.
[17] N. S. Jayant and S. W. Christensen, “Effects of
     packet losses in waveform coded speech and
     improvements due to an odd-even sample
     interpolation procedure,” IEEE Trans. Comm., vol.
     COM-29, pp. 101-109, February 1981.
[18] L. R. Rabiner and R. W. Schafer, Digital
     Processing of Speech Signals. Englewood Cliffs, NJ:
     Prentice-Hall, 1978.
[19] Ronald W. Schafer and Lawrence R. Rabiner,
     “Digital Representations of Speech Signals,”
     Proc. IEEE, vol. 63, no. 4, pp. 662–679, April
[20] Nuggehally S. Jayant, “Digital Coding of Speech
     Waveforms: PCM, DPCM, and DM Quantizers,”
     Proc. IEEE, vol. 62, no. 5, pp. 611-633, May 1974.
[21] P. Elias, “Predictive coding,” IRE Trans. Inform.
     Theory, vol. IT-1, pp. 16-33, March 1955.
[22] “Packetised Speech Communication using Sinewave
     Analysis/Synthesis System,” Proc. ECTTA, Oct 22-
     23, Dept. of Electrical Engg, MBM Engg College,
     JNV University, Jodhpur, pp. 235-240, October
[23] “Performance Comparison of Different Speech
     Recovery Methods in Packet Based Transmission”,
     M.E. Dissertation Thesis, Dept. of Electronics &
     Comm Engg, MBM Engg College, JNV University,
     Jodhpur, 2007.


Rohit Srivastava received the M.Tech degree from
Jodhpur National University in 2011 and the B.E. from
MECRC Jodhpur in 2007. Areas of interest are Network
Security, Compiler Design, Theory of Computation,
Algorithms, and AI.

Dheeraj Kumar Singh received the M.E. degree in
Computer Science and Engineering in 2012 and the B.Tech
degree in Computer Engineering in 2009 from RGPV,
Bhopal. Areas of interest are Web Security, Web Service,
SOA, Cloud Computing, Distributed Computing and soft
