research papers

Description

Research papers for communication engineering

Posted: 5/24/2009
Language: English
Prototype of an Adaptive Voice Coder for IP Telephony

A. L. Robustelli (1), S. Loreto (1), A. Fresa (1), M. Longo (2), D. Spinelli (2)
(1) CoRiTeL Italy, Via Ponte Don Melillo, 1 - 84084 Fisciano (SA), Italy
(2) University of Salerno, DIIIE, Via Ponte Don Melillo, 1 - 84084 Fisciano (SA), Italy
E-mail: robustelli@coritel.it, loreto@coritel.it, fresa@coritel.it, longo@unisa.it

Abstract: The purpose of this paper is to describe the theoretical basis, design and implementation of a communication system based on an adaptive voice coder tuned for real-time voice communication over IP. The voice coder uses an adaptive algorithm that automatically switches coding according to the evolving congestion conditions of the underlying IP network, so as to keep the speech quality of the communication as high as possible in each situation.

1. INTRODUCTION

With the evolution of multimedia applications and network technologies, there is a growing need to communicate over the Internet with ever deeper integration of data, voice, animations and (short) films, leading to multimedia network products such as IP telephony, Internet TV and videoconferencing [1]. All of these are real-time applications, i.e. applications in which the execution time of the various operations is a crucial parameter. In particular, real-time audio and video stream data must reach the final user within an established time interval, otherwise they lose all usefulness [3]. For such data, therefore, timely delivery matters more than reliability [6].

2. VOICE OVER IP

The real-time transmission of voice over an IP network (VoIP) is attracting much interest among emerging Internet and telecommunications technologies [2]. To understand the IP telephony phenomenon, let us briefly describe the differences between the Internet and the public switched telephone network (PSTN).
The PSTN is a circuit-switched (CS) network optimised for voice communications with a guaranteed quality of service (QoS). When a session is established, a full-duplex circuit is set up between the two sides and remains allocated regardless of the voice activity (speech or silence) of the two parties. Thus the cost of a PSTN call is strictly tied to the call's duration and distance. The Internet, by contrast, is a packet-switched (PS) network historically used for non-real-time applications. PS networks do not allocate permanent circuits; instead, the information is fragmented into packets, each containing both user data and the control information the network needs to deliver it. At every node along the path, the packet is received, stored and then routed to the next node, independently of the other packets. PS networks are more efficient than CS networks for data transmission: in a typical data communication the line is mostly unused, so dedicating a fixed bandwidth wastes resources.

The objective of VoIP is to provide the efficiency of a PS network together with the quality of a CS network. The resulting advantages would be cost savings, versatility and ease of integration. The possible drawback is insufficient signal quality.

3. VOICE QUALITY

For a VoIP application, the final speech quality delivered by the service is a crucial parameter [14]. Several factors affect it [15]:

• Quality of Service: QoS refers to the ability of a network to provide a service of an agreed-upon quality to a given traffic flow across various underlying technologies. It relies on protocols and technologies such as the Resource Reservation Protocol (RSVP, RFC 2205) and Differentiated Services (RFC 2474).
• Packet loss: VoIP applications use UDP rather than TCP as the transport protocol, because TCP does not meet the real-time requirements. Unlike TCP, UDP does not retransmit lost packets, and it guarantees neither that every packet sent reaches the destination nor that packets arrive in order.
• Jitter: A typical feature of a PS network is that it delays the arrival of a packet in an unpredictable way; if a source injects packets at regular time intervals, there is no assurance that they reach the destination with the same time pattern [9]. The jitter is the difference between the expected arrival time of a packet and the actual one. It appears only in PS networks and is a function of the network's transmission delay. To compensate for it, a buffer is used at the receiving side.
• Latency: Latency is the total time needed for the speaker's voice to reach the listener's ear. It is made up of three delays: serialization delay, propagation delay and management delay.
• Echo: Another factor that degrades voice quality is echo. It can have several origins; typically it appears because of imperfectly matched impedances along the analog line.

4. SPEECH QUALITY ESTIMATION

Voice quality can be estimated in two ways: subjectively or objectively [10]. Subjective tests are performed by human listeners, whereas objective tests are performed by computers, which, unlike the human ear, are very unlikely to be deceived by compression algorithms [13]. Codec systems are developed and tuned with reference to subjective voice quality measurements. A very common subjective measure of the performance of voice codecs is the MOS (Mean Opinion Score) [12]. A MOS test is carried out on a group of listeners; since voice quality is a subjective judgment, a good test requires a large set of listeners and of coded material. Each listener rates the speech quality with a score between 1 (poor) and 5 (excellent). The scores are then averaged to obtain the mean value.
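As a minimal illustration of the averaging step just described, the sketch below computes a MOS from a set of listener scores. The function name `mean_opinion_score` and the sample panel are hypothetical and are not taken from the paper; only the 1 to 5 scale and the averaging come from the text.

```python
from statistics import mean


def mean_opinion_score(scores):
    """Average listener scores given on the 1 (poor) to 5 (excellent) scale."""
    if not scores:
        raise ValueError("at least one listener score is required")
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} is outside the 1-5 scale")
    return mean(scores)


# Hypothetical panel of eight listeners rating one coded sample.
panel = [4, 5, 3, 4, 4, 5, 3, 4]
print(mean_opinion_score(panel))
```

A real test would use a much larger panel and several coded samples, as the text points out.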
A way to estimate speech quality without interviewing people is to use the E-model [4]. This model uses transmission parameters to predict the subjective voice quality, thus building a bridge between the subjective-evaluation domain and the objective-evaluation one. The purpose of the E-model [11] is to compute an R factor, which takes values between 0 and 100 and uniquely determines the corresponding MOS. The relation is:

For R < 0:        MOS = 1
For 0 < R < 100:  MOS = 1 + 0.035 R + R (R - 60) (100 - R) x 7 x 10^-6
For R > 100:      MOS = 4.5

The R factor is given by the following equation:

R = Ro - Is - Id - Ie + A    (1)

• Ro (Basic signal-to-noise ratio) takes into account noise effects, for example circuit noise or room noise (at both the sender's and the receiver's side).
• Is (Simultaneous impairment factor) takes into account connections that are too loud or too quiet, and quantization noise.
• Id (Delay impairment factor) takes into account talker and listener echoes and the various delays.
• Ie (Equipment impairment factor) takes into account packet loss and codec-related distortion.
• A (Advantage factor), also called "expected factor", represents the degradation a user is ready to tolerate in exchange for particular advantages.

Usually A is ignored, on the assumption that the user requires optimal communication. Finally, the G.107 standard defines a table listing default values for all the parameters needed to compute the R factor. Using those values, an optimal value of R is obtained:

R = 93.2    (2)

This value results from the first two terms in (1) alone, since the table sets the other terms to zero. When those terms cannot be assumed to be zero, the following equation can be used for the R factor:

R = 93.2 - Id(delay) - Ie(packet loss)    (3)

5. ROBUST AUDIO TOOL

Robust Audio Tool (RAT) [8] is a tool developed at University College London that supports both unicast and multicast Internet audio conferences; it uses RTP/RTCP [7] as a real-time transport protocol on top of UDP/IP. RTP identifies the data format and provides sequence numbers, timestamps and transmission monitoring, but gives no guarantee about data delivery, QoS or ordered packet delivery. RTCP provides feedback statistics and other information. RAT offers a wide range of transmission techniques based on several codecs, among them GSM and µ-law, which are easy to set.

Like other audio applications based on RTP, RAT provides statistics on reception quality and information about the participants in a conference session. Its user-friendly graphical interface shows information about the session participants and communication statistics, and allows manual switching among the supported codecs and transmission techniques. RAT is written in C, with Tcl-Tk for the graphical interface, and runs on the Linux operating system.

6. THE ADAPTIVE MODEL

The purpose of the present paper is to show the results obtained by appropriately modifying RAT: an automatic, dynamic coding switch triggered by excessive packet loss, and continuous monitoring of the communication's MOS, have been introduced in RAT. This dynamic switching of the coding is justified through the E-model [12] which, as said previously, makes it possible to estimate the MOS of a communication from an R factor computed as shown in (3):

R = 93.2 - Id - Ie

The first impairment, which depends on the mouth-to-ear delay, can be computed using the results in [6]; in particular, its experimentally obtained graph, which plots Id as a function of the mouth-to-ear delay D, has been used:

[Figure: Id as a function of the mouth-to-ear delay D (ms)]

The delay D has been computed as:

D = RTT/2 + d_pac + d_cod    (4)

• RTT is the round-trip time, one of the parameters RAT provides to the user.
• d_pac is the packetization delay [9], computed as:

d_pac = (k - 1) * f    (5)

where f is the frame duration in msec and k is the number of frames in a packet.
• d_cod is the coding delay [9], given by the sum of three terms:

d_cod = f + look-ahead + d_processing    (6)

where f is the frame duration in msec, look-ahead is the delay a process spends determining whether or not an input signal is a voice signal, and d_processing is the processing delay.

Once the mouth-to-ear delay D was obtained, the value of Id was read from the graph above.

The second impairment, which depends on the percentage of lost packets, was evaluated on the basis of [5]. That paper provides graphs of Ie as a function of the lost-packet percentage for various codecs; for the purposes of the present paper only the GSM and µ-law codecs have been considered. For any Ploss < 0.4% the Ie value of G.711 (µ-law) is lower than that of GSM, while for Ploss > 0.4% the opposite holds. This suggests beginning the communication with µ-law coding and, if during the session the lost-packet percentage exceeds 0.4% because of network congestion, switching to GSM coding; this keeps Ie as low as possible and, consequently, the R factor as high as possible according to eq. (3). As said previously, maximising R means maximising the speech quality.
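The chain of computations in equations (3) to (6), followed by the G.107 mapping from R to MOS, can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, and because the paper reads Id and Ie from graphs in [6] and [5], the sketch takes them as inputs rather than deriving them. The numeric values in the example are placeholders, not measurements from the paper.

```python
def mouth_to_ear_delay(rtt_ms, frame_ms, frames_per_packet,
                       look_ahead_ms, processing_ms):
    """Mouth-to-ear delay D in msec, following eqs. (4), (5) and (6)."""
    d_pac = (frames_per_packet - 1) * frame_ms        # eq. (5)
    d_cod = frame_ms + look_ahead_ms + processing_ms  # eq. (6)
    return rtt_ms / 2 + d_pac + d_cod                 # eq. (4)


def r_factor(i_d, i_e, r_default=93.2):
    """R factor from the delay and equipment impairments, eq. (3)."""
    return r_default - i_d - i_e


def mos_from_r(r):
    """Estimated MOS for a given R factor (G.107 relation quoted in Section 4)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Placeholder inputs: 40 ms RTT, 20 ms frames, 2 frames per packet,
# no look-ahead, 5 ms processing delay.
d = mouth_to_ear_delay(rtt_ms=40, frame_ms=20, frames_per_packet=2,
                       look_ahead_ms=0, processing_ms=5)

# Placeholder impairments; in the paper Id is read from the graph in [6]
# for the delay D, and Ie from the graphs in [5] for the packet loss.
r = r_factor(i_d=2.0, i_e=5.0)
print(d, r, round(mos_from_r(r), 2))
```

With the default R = 93.2 of eq. (2), `mos_from_r` returns about 4.41, which is the best MOS the model predicts for an unimpaired connection.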
Then, in order to avoid oscillations around the chosen threshold value (0.4%), it is better to use two different thresholds and implement the coding switch with the hysteresis mechanism depicted in the figure:

[Figure: hysteresis of the coding switch; coding (GSM or µ-law) versus packet loss (%), with thresholds at 0.2 and 0.4]

This way, if Ploss < 0.2% the µ-law coding is used and if Ploss > 0.4% GSM is used; if 0.2 < Ploss (%) < 0.4, that is, if Ploss enters the critical region, the previous coding remains in use until Ploss reaches the opposite margin of the region, at which point the codec switch takes place. The value 0.2 was obtained empirically: with this value, the oscillations due to codec switching were observed not to be excessive. Furthermore, the second threshold was chosen lower than 0.4 rather than higher (which gives a slight bias toward GSM) because GSM appears to be a more robust coding than µ-law. It is then possible to compute the R factor, from which the estimated MOS can be obtained using the equations provided by the G.107 standard and shown previously.

To summarise, the communication begins with µ-law coding and, whenever RTT and Ploss (%) information is available from the "Sender Report" RTCP packets, the Id and Ie values are computed from the graphs seen above. The R factor and the MOS are then obtained.

7. THE TESTBED

Before describing the test scenario, recall that the graphs above give, for the GSM and µ-law codecs, the value of Ie for any given percentage of lost packets, so they can be used as a reference for implementing the dynamic codec switching. In particular, the two curves, for the GSM and G.711 (i.e. µ-law) codecs, intersect at the point corresponding to a Ploss of 0.4%.
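The two-threshold switching rule of Section 6 can be sketched as follows. This is a minimal illustration, not the code added to RAT: the class name, the codec labels and the sample loss reports are hypothetical, and keeping the previous codec when the loss is exactly 0.2% or 0.4% is an assumption, since the text does not specify the boundary cases.

```python
MU_LAW = "mu-law"
GSM = "GSM"


class HysteresisCodecSwitch:
    """Two-threshold codec selector driven by the reported packet loss (%)."""

    def __init__(self, low=0.2, high=0.4, initial=MU_LAW):
        if low >= high:
            raise ValueError("low threshold must be below high threshold")
        self.low = low
        self.high = high
        self.codec = initial  # the session starts with mu-law, as in the text

    def update(self, loss_percent):
        """Return the codec to use after a new packet-loss report."""
        if loss_percent > self.high:
            self.codec = GSM
        elif loss_percent < self.low:
            self.codec = MU_LAW
        # Between the thresholds the previous codec is kept; the same is
        # assumed for the boundary values themselves.
        return self.codec


switch = HysteresisCodecSwitch()
reports = [0.1, 0.3, 0.5, 0.3, 0.1]  # hypothetical loss percentages
print([switch.update(p) for p in reports])
# prints ['mu-law', 'mu-law', 'GSM', 'GSM', 'mu-law']
```

The loss value 0.3% appears twice in the example and yields a different codec each time, which is exactly the memory effect that prevents oscillation around a single threshold.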
In the unicast audio-conference tests carried out, the adaptive algorithm was used on only one of the two communicating sides, while the other side transmitted with a fixed coding (GSM or µ-law). The reason for this choice was to compare the speech quality of a user with adaptive coding against that of a user with fixed coding. For the comparison to be significant, the tests were arranged so that the two sides "saw" the same network at the same instant and in the same congestion conditions; for the same reason, the two communicating computers sent each other the same audio file.

To perform the tests, the local network had to be congested, so as to test the modified RAT and evaluate the performance of the dynamic algorithm under the intended network conditions. This was achieved by means of a Traffic Generator installed on three other computers in the LAN; each of them opened eight TCP connections towards the other two, sending a 65,535-byte packet every 10 ms.

Finally, several test sessions were performed for each of the following three network conditions:

• Communication with short congestion intervals: 3 minutes without congestion, then the traffic generator is activated for 1 minute; this is repeated three times for a total test session duration of 12 minutes.
• Communication with congestion intervals equal in duration to the intervals with the traffic generator off: 2 minutes without congestion, then 2 minutes with the generator on; this is repeated three times for a test session of 12 minutes.
• Communication with long congestion intervals: 1 minute without congestion, then the generator on for 3 minutes; this is repeated three times for a test session of 12 minutes.

8. RESULTS OBTAINED

The results confirmed the expected behaviour: the user with adaptive coding showed a higher average MOS than the other user with fixed coding.

• Communication with short congestion intervals

[Figure: MOS over time t (0 to 720 sec), Adaptive coding vs GSM coding]
[Figure: MOS over time t (0 to 720 sec), Adaptive coding vs µ-law coding]

Average MOS          Adaptive coding   Fixed coding   Improvement
Adaptive vs GSM      4.14              3.99           3.8 %
Adaptive vs µ-law    4.13              3.90           5.9 %

Comparing the Adaptive coder against GSM, it can be noted that when few packets are lost the former, which is then using µ-law coding, shows a better MOS than the latter. This can be explained by the different ways the two codecs operate:

  • The µ-law coding belongs to the Waveform Coding category of voice coders: the analog signal is sampled at a frequency of 8 kHz and every sample is then represented by 8 bits, so a µ-law coder generates a bit-rate of 64 kbps.
  • The GSM full-rate coding used, by contrast, is based on the RPE-LTP (Regular Pulse Excitation - Long Term Prediction) algorithm, which belongs to the Linear Predictive Coding category of voice coders. Such coders create a model of the human voice, from which significant parameters are extracted, coded and transmitted. This compression mechanism leads to a lower bit-rate (13.2 kbps) but also to a worse communication quality than µ-law coders.

If the network is not heavily loaded, it is therefore convenient to transmit with µ-law. When the network is heavily loaded or congested, it is better to use GSM coding which, apart from having a lower bit-rate that avoids worsening the congestion further, uses the interleaving transmission technique, which reduces the number of consecutive audio units lost in case of packet loss. Furthermore, in GSM coding every 20 msec frame contains some parameters of the modelled voice signal that refer to the previous frame; so if some packets are lost, their content can be partially reconstructed from the subsequent packets received (PCM, i.e. µ-law, on the other hand leaves a "gap" in the final audio reproduction).

In the high-loss intervals, the Adaptive and GSM codecs show comparable MOS values; this is because under heavy loss the adaptive coder uses GSM coding, which is exactly the coding used by the other side.

On the other hand, comparing the Adaptive and the µ-law coders, it can be observed that with a small lost-packet percentage both show the same MOS values, whereas with high loss the Adaptive coder has a higher MOS; in that situation it uses GSM coding which, as discussed above, is more robust and efficient under congestion.

• Communication with congestion intervals equal in duration to unloaded intervals (i.e. with generator off)

In this case too, for the reasons pointed out previously, the Adaptive coder behaved better than both the µ-law and the GSM coders. The comparison against µ-law is more favourable to the Adaptive coder than in the previous congestion pattern because µ-law is particularly affected by packet loss.

[Figure: MOS over time t (0 to 720 sec), Adaptive coding vs GSM coding]
[Figure: MOS over time t (0 to 720 sec), Adaptive coding vs µ-law coding]

Average MOS          Adaptive coding   Fixed coding   Improvement
Adaptive vs GSM      4.05              3.90           3.9 %
Adaptive vs µ-law    4.00              3.54           13.0 %

• Communication with long congestion intervals

Under heavy congestion too, the adaptive coder performs better than both µ-law and GSM. In particular, the longer the congested intervals, the greater the improvement with respect to µ-law.

[Figure: MOS over time t (0 to 720 sec), Adaptive coding vs GSM coding]
[Figure: MOS over time t (0 to 720 sec), Adaptive coding vs µ-law coding]

Average MOS          Adaptive coding   Fixed coding   Improvement
Adaptive vs GSM      3.84              3.70           3.8 %
Adaptive vs µ-law    3.75              3.30           13.6 %

9. CONCLUSIONS

By using a dynamic coding switch, the quality of a real-time audio communication has been optimised, i.e. its degradation in the presence of network congestion has been minimised, by adapting the coding to the current condition of the underlying network; overall, encouraging results have been obtained.

Further developments could consist in expanding the set of codecs used by the dynamic algorithm; in this regard, the inclusion of codecs based on variable-bit-rate techniques could prove particularly promising.

REFERENCES

[1] P. C. Mehta, S. Udani: "Overview of voice over IP", University of Pennsylvania, 2001; www.cis.upenn.edu/~udani/papers/overviewVoIP.pdf
[2] A. Raikar: "Voice over IP Network", 2000; www.launchingpad.com/industrylinks.htm
[3] H. Marjamäki: "Delay characteristics of an IP voice terminal", Tampere University of Technology, 1999; www.tct.hut.fi/tutkimus/ipana/paperit/harrimsc.pdf
[4] ITU-T Recommendation G.107: "The E-Model, a computational model for use in transmission planning", 2002.
[5] J. Janssen, D. De Vleeschauwer, M. Buchli, G. H. Petit: "Assessing voice quality in packet-based telephony", IEEE Multimedia, 2002.
[6] R. G. Cole, J. H. Rosenbluth: "Voice over IP performance monitoring", AT&T Laboratories, Middletown NJ, ACM SIGCOMM Computer Communication Review, April 2001.
[7] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson: "RTP: A transport protocol for real-time applications", RFC 1889, 1996.
[8] University College London: Robust Audio Tool (RAT); www-mice.cs.ucl.ac.uk/multimedia/software/rat
[9] M. J. Karam, F. A. Tobagi: "Analysis of the delay and jitter of voice traffic over the internet", IEEE Multimedia, 2000.
[10] J. Janssen, D. De Vleeschauwer, F. Poppe, G. H. Petit: "Quality bounds for packetized voice transport", 2000; http://atr.alcatel.de/hefte/00i_1/gb/pdf_gb/06vleegb.pdf
[11] J. Horrocks, Chairman: "The E-Model", 2001; http://portal.etsi.org/stq/presentations/emodel.pdf
[12] J. Q. Walker: "Assessing VoIP call quality using the E-Model", 2001; http://download.netiq.com
[13] T. A. Hall: "Objective speech quality measures for internet telephony", 2000; w3.ant.nist.gov/wctg/manet/speechq.pdf
[14] H. Schulzrinne, W. Jiang: "QoS measurement of real-time multimedia services in the internet", Technical Report CUCS-015-99, Columbia University.
[15] "VoIP QoS Issues", International Engineering Consortium, 2003; www.iec.org/online/tutorials/vfoip
