Impact of Statistical Multiplexing on Voice Quality in Cellular Networks

Mobile Networks and Applications 7, 153–161, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.




THOMAS ENDERES Global Wireless Systems Research, Bell Labs, Lucent Technologies, Optimus, Swindon SN5 6PP, UK

SWEE CHERN KHOO Division of Engineering, King’s College London, Strand, London WC2R 2LS, UK

CLARE A. SOMERVILLE Global Wireless Systems Research, Bell Labs, Lucent Technologies, Optimus, Swindon SN5 6PP, UK

KOSTAS SAMARAS Raycap Corporation, Telou & Petroutsou 14, Maroussi, Athens 15124, Greece

Abstract. This paper examines the quality of transmission of voice over cellular, packet-switched networks. The medium access mechanism in the uplink is simulated under various statistical multiplexing scenarios in order to assess the effect of front-end clipping on voice quality. Moreover, the simulation is implemented in a real-time demonstration platform utilized to acquire subjective indicators of voice quality by performing Mean Opinion Score (MOS) tests. Results from the MOS tests are reported, and an analysis of the obtained speech samples is presented. Finally, the results are summarized and potential further directions for the simulation tool and the speech models are discussed.

Keywords: voice quality, cellular networks, statistical multiplexing, speech pattern

1. Introduction

There is a strong trend toward a single network technology capable of supporting all kinds of data types simultaneously. Such a network infrastructure has to support the individual requirements of each data type and, at the same time, use its communication resources as efficiently as possible. One way to achieve this is by exploiting the statistical multiplexing offered by packet-switched transmission techniques. However, different services have different requirements. Voice, being a real-time service, needs to be delivered from the source to the destination without exceeding a certain delay. Downloading a data file does not need to meet the same delay requirements. On the other hand, speech codecs can cope with some transmission errors, while for file transfers a single error could be unacceptable. In order to support such diverse qualities of service, the design of all protocol layers in the bearer network is affected and to some extent inter-layer interaction is necessary. This represents a major challenge for the design of integrated networks.

This paper concentrates on the provision of packet-switched voice, since voice represents the most common service in today’s cellular networks. In the Voice over IP (VoIP) field, much work has been done to facilitate voice transmission using datagrams. However, many important issues are still unresolved when it comes to the support of voice over cellular, packet-switched networks. Some of these issues are addressed herein.

For cellular networks, statistical multiplexing over the air interface is particularly appealing, since the radio spectrum is an extremely scarce and expensive resource and, therefore, should be used efficiently. However, whenever statistical multiplexing is applied, situations of temporary overload may occur, which have to be taken into account. In code division multiple access (CDMA) systems, a peak only degrades the carrier-to-interference ratio, which might be acceptable up to a certain level. For time division multiple access (TDMA) systems, such as the General Packet Radio Service (GPRS), a number of parallel and independent data/voice flows share a common pool of channels (timeslots), and a temporary lack of channels may cause some voice packets to be dropped or delayed. Consequently, statistical multiplexing will have an impact on the quality of service (QoS).

The medium access control (MAC) protocol, which represents a key part of every system employing statistical multiplexing, determines the QoS to a large degree. The simulation of the medium access mechanism used in this project is described in section 2.1. Simulating objective QoS parameters such as delay and packet dropping alone is not sufficient to find the best tradeoff between gaining capacity through statistical multiplexing and degrading speech quality. Instead, subjective assessment of the speech quality is required. A platform designed to carry out an experimental study on voice quality using mean opinion score (MOS) tests is described in section 2.2. The test procedure is outlined in section 3. The results, as well as an analysis of the speech patterns obtained from the samples, are presented in sections 4 and 5.


2. Simulation and demonstration system

For the purposes of assessing the quality of speech under statistical multiplexing environments, a simulation tool called NetSim has been designed and implemented at Bell Labs. The software is written in C++ [13] and it is original in the sense that it does not depend on any other existing MAC simulation platform. NetSim is capable of operating in two different modes: real-time mode and non-real-time mode. In non-real-time mode, NetSim operates as a stand-alone application on a single PC; it does not affect a real two-way conversation but only simulates the traffic in a network cell and stores the results in a log file. In demonstration (real-time) mode, two persons have a full duplex conversation, the quality of which is directly influenced by the simulation taking place simultaneously in NetSim.

2.1. Simulation of the uplink

The vast majority of commercially deployed cellular networks use a central coordination unit per cell, in GSM/GPRS referred to as a Base Station Controller (BSC), leading to a star topology. Therefore, the task of distributing downlink resources is centralized at the BSC and is much less complex than on the uplink, which requires coordination among the Mobile Stations (MS). For most TDMA systems, MAC policies are based on packet reservation multiple access (PRMA) techniques [11,12,17]. The time that elapses between the arrival of a packet at the MAC layer and the time it can actually be passed on to the physical layer is referred to as the medium access delay. For this study, it represents a crucial parameter, as it is significantly affected by statistical multiplexing.

For the transmission of voice on the uplink, the MS starts to contend for a channel (timeslot) as soon as the voice activity detector (VAD) detects the start of a talkspurt. For the whole duration of the talkspurt, speech packets are continuously generated. If the medium access delay remains below a certain threshold, which depends on the internal buffer available, packets are stored and transmitted later when channel resources are released. However, if the medium access delay exceeds the threshold, packets have to be dropped. The dropping strategy is to discard the oldest packet in the buffer when a new packet arrives. This leads to what is referred to as front-end clipping: the beginning of a talkspurt is not transmitted and the first packets of that talkspurt are lost. Due to the contention process, front-end clipping will be more severe on the uplink than on the downlink. Therefore, the study presented in this paper focuses on the uplink.

2.1.1. Contention procedure

Each GPRS cell has a number of radio frequency channels available. For each of those channels, the time axis is divided into TDMA frames, each having a duration of 4.62 ms. Each TDMA frame is further subdivided into 8 timeslots [7]. On the uplink, it is assumed that at least one timeslot is reserved for a packet random access channel (PRACH), while the others are used to transport voice packets. This is illustrated in figure 1. A detailed description of this arrangement is given in [4,5,7].

Figure 1. Uplink radio frequency channel.

Due to delay constraints, real-time services such as voice should use an unacknowledged transmission mode, requiring GPRS to utilize a two-phase access [5]. As downlink messages have a time granularity of 20 ms, a two-phase access means a delay of 80 ms in the best case. It is easily concluded that GPRS, as standardized today, is not suitable for real-time services [10]. Furthermore, statistical multiplexing might impose additional delay. There are proposals on methods to reduce the delay to less than 20 ms by abolishing the 20 ms granularity [9]. As the main goal is to evaluate how statistical multiplexing would affect the uplink transmission, both the access scheme described in those proposals and the simulation of the cell are incorporated into NetSim.

In the proposals for introducing real-time service support in GPRS, the MAC mechanism on the uplink requires mobile stations to contend for a channel by transmitting requests on the PRACH. When the BSC successfully receives a request, it responds immediately with either a channel allocation or a queueing notification message. If the first channel allocation message is corrupted on the downlink, the BSC retransmits it to the user. As soon as the MS receives a queueing notification, it stops contending and waits for a channel allocation message. If the MS does not receive any notification, it sends a second request according to a backoff algorithm. To prevent a buildup of collisions that would render the system useless, the backoff algorithm randomly delays the retransmission of requests from unsuccessful mobile stations.
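The contention and backoff behaviour described above can be illustrated with a small sketch. This is not NetSim code; it is a simplified model under stated assumptions: every mobile station sends its first request in the same PRACH slot, an unsuccessful station retries in each later slot with a fixed retry probability, and a request is received only if it is the sole transmission in its slot. Block errors, queueing notifications and channel allocation are left out. The function name `contend` and its parameters are hypothetical.

```python
import random


def contend(num_stations, retry_prob=0.5, max_slots=10_000, rng=None):
    """Simulate PRACH contention; return {station: slot of successful request}.

    Assumptions (illustrative only): all stations start contending in slot 0,
    a collision destroys every request in that slot, and each unsuccessful
    station retries in every later slot with probability `retry_prob`.
    """
    rng = rng or random.Random()
    waiting = set(range(num_stations))   # stations still without a channel
    fresh = set(waiting)                 # stations that have not sent yet
    success_slot = {}

    for slot in range(max_slots):
        if not waiting:
            break
        # First requests are sent immediately; later ones are randomly delayed.
        senders = [ms for ms in sorted(waiting)
                   if ms in fresh or rng.random() < retry_prob]
        fresh.clear()
        # Only a lone request in the slot can be decoded by the BSC.
        if len(senders) == 1:
            success_slot[senders[0]] = slot
            waiting.remove(senders[0])
    return success_slot


if __name__ == "__main__":
    slots = contend(5, rng=random.Random(1))
    for station, slot in sorted(slots.items()):
        print(f"station {station}: request received in PRACH slot {slot}")
```

The slot index at which a station succeeds is a rough stand-in for its medium access delay, which makes it easy to see how the delay grows as more stations start talkspurts at the same time.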
Additional management functionality of the BSC includes the assignment of free channels/timeslots to mobile stations that could not be served before, as well as monitoring the activity on channels assigned to mobile stations. If no activity is detected on an assigned channel, its resources are deallocated and added to the pool of free channels/timeslots.

If more than one request is sent during the same slot, the signals are corrupted and the BSC cannot resolve the requests. However, if the BSC can lock onto the stronger signal, or use adaptive antenna array processing techniques, one or more requests can still be captured in case of collision. Such capture techniques based on spatio-temporal signal processing are described in detail in [14]. In the present paper the receiver’s ability to capture one or more messages in case of collision is modeled by the so-called capture probability matrix P, where the element P_ij represents the probability of capturing i packets/messages when there are j packets colliding. A “no capture” model would have all elements of the matrix zero, except P_11 = 1.

2.1.2. Speech modeling

In order to study the effects of statistical multiplexing, the simulation of the speech patterns for the users sharing the same wireless facilities (channels) becomes a crucial modeling consideration. For the sake of simplicity, a discrete-time version of Brady’s Markovian two-state model of speech [3] has been implemented to simulate the packet generation characteristics of the mobile users. According to that model, a user can be either in a talkspurt state S_t or in a silence state S_s. It is understood that in the silence state any resources are released and have to be reclaimed through uplink contention at the beginning of the next talkspurt. The two-state model is fully described by the transition probabilities between the states. Talkspurt and silence lengths are assumed to be exponentially distributed, with mean values T_ON and T_OFF, respectively. The discrete version is appropriate, as transitions between the two states only occur on a frame-by-frame basis, resulting in a granularity of T = 20 ms. The transition probabilities are given by

\[
P(s \to t) = 1 - \exp\left(-\frac{T}{T_{OFF}}\right), \qquad
P(t \to s) = 1 - \exp\left(-\frac{T}{T_{ON}}\right).
\]

This speech model describes only the dynamics of the voice activity. However, in cellular networks speech codecs are used. Thus, it becomes important to model the output of such a coder. Being the most common codec in second generation digital cellular networks, the GSM Enhanced Full Rate (EFR) codec has been chosen for this study. It includes a voice activity detector (VAD), which assigns either the state active or silent to the digital speech input for every sampling interval T.
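The discrete-time two-state speech model can be sketched in a few lines of Python. The sketch below is an illustration, not the NetSim implementation: it assumes exponentially distributed talkspurt and silence lengths with means `t_on` and `t_off`, a frame duration of T = 20 ms, and a user who starts in the silence state. The function names and the example value of 1.35 s for the mean silence length are assumptions chosen for illustration only.

```python
import math
import random

FRAME_S = 0.02  # frame duration T in seconds


def transition_probs(t_on, t_off, frame=FRAME_S):
    """Return (P(s->t), P(t->s)) for mean talkspurt/silence lengths in seconds."""
    p_silence_to_talk = 1.0 - math.exp(-frame / t_off)
    p_talk_to_silence = 1.0 - math.exp(-frame / t_on)
    return p_silence_to_talk, p_talk_to_silence


def simulate_activity(t_on, t_off, num_frames, rng=None):
    """Generate an on/off pattern (one boolean per 20 ms frame)."""
    rng = rng or random.Random()
    p_st, p_ts = transition_probs(t_on, t_off)
    talking = False  # assumption: the user starts in the silence state
    pattern = []
    for _ in range(num_frames):
        if talking:
            if rng.random() < p_ts:
                talking = False
        elif rng.random() < p_st:
            talking = True
        pattern.append(talking)
    return pattern


if __name__ == "__main__":
    pattern = simulate_activity(t_on=1.0, t_off=1.35, num_frames=200_000,
                                rng=random.Random(0))
    print("simulated activity ratio:", sum(pattern) / len(pattern))
    print("t_on / (t_on + t_off):   ", 1.0 / (1.0 + 1.35))
```

For frame durations that are short compared with the mean state lengths, the long-run fraction of active frames approaches T_ON / (T_ON + T_OFF), which links the two model parameters to the mean activity ratio.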
As it would be very confusing for the participant at the receiving side not to hear anything whenever the other party is silent, comfort noise is generated at the receiving side during those periods to reassure the listener that the connection is still intact. To make it sound similar to the actual background noise at the transmitting side, special data describing this noise are transmitted as well. These data are contained in silence descriptor (SID) frames. Whereas packets containing speech frames are created continuously while the VAD detects a talkspurt, the mechanisms during the silence period include the addition of hangover periods as well as the generation and transmission of SID frames [6]. In this paper three different levels of “speech” activity are considered:

(1) the actual voice activity as detected by the VAD,
(2) for the EFR case, the voice activity including hangover periods (SP bit = 1) [6],
(3) the coder-output activity; active meaning a speech or SID frame is scheduled for transmission.


Table 1
Literature references for T_ON.

T_ON (s)    Reference
0.35        [2]
0.40        [18]
0.42        [3]
1.00        [1,16]
1.41        [8]

Unfortunately, the ITU recommendations on speech modeling [1] do not describe a model that would account for the coder-output activity. Lacking a better alternative, Brady’s model is used. The question remains how to select the values for T_ON and T_OFF. During the course of development and in the first versions of NetSim, these two parameters were fixed. However, in a realistic situation it is highly unlikely that all users in a cell generate speech with the same average activity ratio and average talkspurt length. According to the ITU standards, the activity ratio can have any value between 0.29 and 0.93 [6]. Therefore, NetSim was extended to pick those values dynamically on a user-by-user basis. As no probability distribution function was given in the source, the values were picked uniformly within that interval. In the literature, the values reported for T_ON vary significantly. Due to a lack of more precise information, and for the purposes of this project, the values are selected uniformly between the smallest and the largest values reported in the open literature, listed in table 1. These values are compared with the results that were obtained as part of the subjective quality assessment.

2.2. Real-time demonstration mode

Before explaining the test system used for the subjective voice quality assessment, it has to be outlined what is meant by subjective voice quality. For communication tasks, voice quality is a subjective or objective evaluation of the similarity between a voice signal and the attempt to rebuild it with technical means. However, there are different scopes for that evaluation. First, for the broadest scope, from now on called user voice quality or voice quality at user scope, the original is the voice signal at the mouth reference point of the speaker while the signal to be evaluated is the sound level at the ear reference point, as shown in figure 2.
This is by far the most complex scope, as any noise contributions are taken into account. Unfortunately, this is the relevant scope for subjective testing and, therefore, would need to be accounted for when performing the MOS tests.

Figure 2. Various scopes for voice quality.

For the design of a demonstration platform, the relevant scope is system voice quality, which is determined by technical processing in the end nodes as well as the capabilities of the networks involved. The objective for a demonstration system is to perform as closely to real implementations as possible; consequently, it would have to account for all factors at system scope. However, the constraint for the demonstration system is to use inexpensive and commonly available hardware components, which is the reason why multimedia laptops and common LAN technologies have been used. Neither laptops nor standard LAN technology can be regarded as real-time systems, in contrast to mobile station hardware and cellular networks, which have been built to work in real-time conditions specifically for voice transmission. Thus, it will never be possible to simulate their behavior exactly; however, it is possible to approximate it sufficiently well to obtain QoS predictions. An overview of the overall setup for demonstration mode is presented in figure 3.

Figure 3. Setup of the demonstration system.

Due to its availability for any LAN technology, the end-to-end protocol utilized here is the Transmission Control Protocol (TCP). It offers guaranteed delivery and is therefore given preference over the User Datagram Protocol (UDP). In general, the end-to-end protocol is of minor importance for the demonstration system, as the two laptops involved are supposed to be the only hosts connected to the LAN in order to avoid collisions and retransmissions.

Maintaining real-time synchronization in a distributed application is only possible up to a certain accuracy, and some properties of TCP make it an even more difficult task. One example is the Nagle algorithm, which optimizes throughput by buffering data to avoid sending several small packets. However, that algorithm can be bypassed either by switching it off, if the system offers that option (TCP_NODELAY), or by padding the packets to be transmitted to the maximum segment size, which is 1460 bytes for the Ethernet case [19].

Another problem encountered is that it is not possible to calculate the system delay experienced by the speech signal between the time it enters one PC via the microphone and the time it leaves the other PC via the loudspeaker. That delay consists of the processing delay of the PC sound system as well as the delay caused by the network transport and the processing by the NetSim and “voice application” components. In order to estimate the delay, a packet is sent back and forth. Still, as it was not possible to measure the time that passes from the moment a voice data unit is handed to a device driver until it is actually transformed into an acoustic signal, and vice versa, some uncertainty remains. The system-induced delays were in the range between 90 and 140 ms, depending mainly on the LAN technology utilized.

3. Subjective quality assessment with MOS tests

The objective of the MOS tests is to determine what level of statistical multiplexing can be achieved while maintaining adequate subjective voice quality. A further goal is to correlate front-end clipping with subjective voice quality. For the purposes of this project, MOS tests are conducted during conversations. The quality of the conversations is graded by subjects with one of the following marks: 1 = bad, 2 = poor, 3 = fair, 4 = good and 5 = excellent.

3.1. Scenarios

In order to correlate MOS with levels of statistical multiplexing gain, certain network scenarios corresponding to certain settings in NetSim are considered. The following scenarios were selected:

1. Circuit switched (reference).
2. Statistical multiplexing gain 1.2.
3. Statistical multiplexing gain 1.5.
4. Statistical multiplexing gain 1.8.

For scenarios 2, 3 and 4, it is assumed that there are three radio frequency channels available in the uplink, i.e. 24 TDMA timeslots. One out of the 24 timeslots accommodates the PRACH whilst the rest are traffic timeslots. The number of simultaneous voice flows in these three scenarios is 29, 36 and 43, respectively, i.e. roughly 1.2, 1.5 and 1.8 times the 24 timeslots. In scenario 1, the number of voice flows equals the number of available timeslots.

The overall end-to-end quality at user scope is simulated as realistically as possible by selecting NetSim parameter values that are typical in cellular networks. For the Block Error Rate (BLER) on the traffic channel, which corresponds to dropped blocks due to physical layer impairments, a typical value of 1% is assumed [20]. The BLER for access bursts on the PRACH is 1%, while the BLER for downlink messages is considered to be 10%. The higher BLER on the downlink is due to the fact that there are no diversity techniques applied in the downlink (at least none considered in this work). The no-capture case is applied, with a retry probability for the backoff algorithm of 0.5. The end-to-end delay, excluding the part imposed by statistical multiplexing, is set to a fixed value of 150 ms. Furthermore, the mean talkspurt length and the mean activity ratio for the uplink simulation were selected dynamically as described in section 2.1.2.

3.2. Tasks

In order to assess the impact of delay on the quality of a conversation over a packet-switched link, conversation-opinion tests have been conducted. ITU-T recommendation P.800 [15] describes the method to be applied, while the content is left unspecified. One possibility for the experiment design is to have a task resolved by the two subjects. Consequently, for this project, a task should be suitable for triggering typical conversations in a phone call. Obviously, it is not possible to choose a single task that would capture the whole variety of situations in which mobile phones are used. A variety of factors determine how a conversation takes place: for what reason the call is initiated, how familiar the talkers are, where the participants are located, etc. Therefore, four different tasks are used.

In the first task, the participants were asked to solve a simple game requiring constant interaction between the participants. Those interactions mainly consist of rather short commands (questions and answers) with a high density of information, so it is expected that even short clippings would degrade intelligibility.
In the second task, the participants have to find differences between two nearly identical pictures by describing them. This task does not prescribe a structure for the conversation; the participants are free to organize it by themselves. The third task is supposed to stimulate a conversation as it would occur with an information service, e.g., a customer telephoning a movie theater to ask about performance hours of the movies. The last task recreates a situation where one participant has to describe a route on a map, while the partner has to redraw the route from a starting point to a destination according to the directions given by the other party.

3.3. Selection of subjects

To average out the influence of gender, 6 female and 7 male subjects are selected, totaling 13. They are all students living in Britain, aged between 18 and 24 years. Each subject was invited to take part in this MOS test in two sessions, one with a female and the other one with a male partner, except for subjects No. 1 and No. 13, who only participated in one session each. In each session, the four (statistical multiplexing) scenarios were applied in a random order. Altogether, the MOS for each scenario is averaged over 24 grades (11 subjects with two sessions each plus 2 subjects with one session each).

4. Test results

Figure 4 shows the results of the experimental study for the four scenarios. As expected, the MOS drops as the statistical multiplexing gain increases, apart from a slight increase between 1.0 and 1.2. This small discrepancy is due to the limited number of subjects and the fact that the actual differences in subjective conversation quality between the circuit-switched case and the statistical multiplexing gains of 1.2 and 1.5 are not significant. For statistical multiplexing gains above 1.5, the impact of front-end clipping starts to become significant. In order to obtain a better understanding of that “good quality” region (up to a statistical multiplexing gain of 1.5), more points, i.e. more subjects and more tests, are required.

Figure 4. MOS as a function of the statistical multiplexing gain.

4.1. Correlating front-end clipping with MOS

In this section the MOS scores are correlated with front-end clipping. For each task executed, two records are obtained, one from each subject. A record contains the information on clipping as well as the grade given by the subject. The records are partitioned into clusters according to the MOS grade given. For each cluster a representative vector is calculated by averaging over all records belonging to that cluster. The representative vector contains a large number of elements, of which only the first are of interest. Each element of the vector represents a point value of the cumulative distribution function (cdf) of front-end clipping.

Front-end clipping is tolerable to some extent. According to figure 5, good quality can be achieved with as much as 35% of all talkspurts affected by front-end clipping. The differences between the representatives of “bad”, “poor” and “fair” are more significant than the ones between the representatives of “excellent” and “good”, which even overlap. Consequently, the better the quality gets, the more difficult it becomes to detect small increases in front-end clipping, which supports the previous statements. Besides the uncertainty due to the rather small number of subjects, another explanation would be that front-end clipping does not have any effect in that “good quality” region, but that the perceived quality is limited by the test system itself; that would need further investigation with better-quality facilities and more subjects.

Figure 5. The cdf of the number of front-end clips.

5. Speech patterns

When designing the simulation part of NetSim, two of the problems faced were how to assign the mean activity ratio and the mean talkspurt length to the simulated users. Therefore, 440 minutes of unidirectional voice data, stored in 96 speech files obtained from the 48 bidirectional conversations of the MOS tests, were analyzed; the results are presented in this section. First, the pdfs of the mean values are plotted in figures 6–8; each mean value is obtained by averaging over one of the 96 speech files. We also compare how closely the statistics of these quantities approach the uniform distribution assumption. From figures 6 and 7, it is clear that for the coder output neither the mean voice activity nor the mean talkspurt length can be described by one fixed value or a uniform distribution. The mean talkspurt length as well as the mean silence length of the coder output seem to follow an exponential distribution.

Figure 6. Pdf of the mean activity ratio.

Figure 7. Pdf of the mean talkspurt length.

After having examined the mean values, another point of interest is to find out how close the real speech patterns are to the model chosen. If Brady’s model were suitable for simulating the coder output, i.e. the packets actually transmitted over the air interface as part of the voice service, figures 9 and 10 would have to show exponential distributions for that graph. At least for the pdf of the silence length, figure 10, there is an exponential shape, which, however, is interrupted by a maximum peak at 0.46 s. That, as well as the fact that after this peak any probability measured is zero, is perfectly logical, due to the transmission of SID frames after 0.46 s of silence. For the talkspurt lengths in figure 9, the shape is less obvious to interpret. Comparing the speech files and the on/off patterns generated, it is concluded that a large percentage of the talkspurts with a length of 0.02 s for the VAD output and 0.04 s for the coder output is due to noise triggering the VAD of the EFR. However, from the description of the VAD in the ITU specifications, no explanation could be found why there are virtually no “VAD talkspurts” with a length in the range between 0.06 and 0.24 s. In conclusion, an exponential model alone is not sufficient to describe the coder output.
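The kind of analysis behind the talkspurt-length and silence-length pdfs can be sketched as a run-length extraction over an on/off pattern. The code below is an assumed reconstruction for illustration, not the tool used in the study: it takes one activity flag per 20 ms frame, measures the length of each run in frames, and normalizes the counts into an empirical pdf. The names `run_lengths` and `empirical_pdf` are hypothetical.

```python
from collections import Counter
from itertools import groupby

FRAME_S = 0.02  # one activity flag per 20 ms frame


def run_lengths(pattern):
    """Split an on/off pattern into talkspurt and silence lengths (in frames).

    Note: the first and last run of a file are cut off by the file boundaries,
    so their measured lengths are lower bounds.
    """
    talkspurts, silences = [], []
    for active, run in groupby(pattern):
        length = sum(1 for _ in run)
        (talkspurts if active else silences).append(length)
    return talkspurts, silences


def empirical_pdf(lengths):
    """Return {length in frames: relative frequency}."""
    counts = Counter(lengths)
    total = len(lengths)
    return {length: count / total for length, count in sorted(counts.items())}


if __name__ == "__main__":
    # Toy pattern: 1 = active frame, 0 = silent frame.
    pattern = [0, 1, 0, 0, 1, 1, 1, 0]
    talk, silence = run_lengths(pattern)
    for frames, prob in empirical_pdf(talk).items():
        print(f"talkspurt of {frames * FRAME_S:.2f} s: probability {prob:.2f}")
```

Applied to a coder-output pattern, a concentration of silence lengths at 23 frames would correspond to the 0.46 s peak discussed above, and single-frame talkspurts would show up in the 0.02 s bin.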


Figure 8. Pdf of the mean silence length.

Figure 9. Pdf of the talkspurt length.

Figure 10. Pdf of the silence length.

Figure 11. “Short-error” misdetection.

To get a deeper insight about the behavior of the demonstration system, the waveform stored into voice files was plotted. It has been overlaid with the (normalized) on–off patterns for the VAD (amplitude 0.8), the speech including hangover (amplitude 0.6) and the coder-output (amplitude 0.4). Figure 11 illustrates an example of such a waveform, in which a short word is spoken. At the beginning t = 100.2 s, both the VAD, the speech including hangover and the coder output are in “off” state. At t = 100.34 s, t = 100.62 s and t = 100.82 s, the coder output is triggered to schedule SID frame for transmission. The VAD is triggered two times, the first time without any audible or visible reason at t = 100.62 s. Such “short-error” misdetections were found often in the recorded files and are responsible for the high probabilities for talkspurt lengths 0.02 and 0.04 s in figure 9. Another question that needs further investigation is why, besides those “short-errors”, there are no talkspurts shorter

than 0.26 s in figure 9. A possible explanation could be that due to the nature of the human speech tract, human originated sounds cannot be shorter than a certain threshold, which theoretically could be 0.26 s. However, this assumption still does not explain why there are not even misdetections between 0.06 and 0.24 s. The plots of the waveforms recorded yield the same result as the statistics, i.e. speech activity lasts at least 0.26 s while misdetections are either very short as described in the preceding section, or longer than 0.24 s. Figure 12 illustrates a situation where the end of such a long period of misdetection, “long-error”, is followed by a period of speech activity (talkspurt), which is correctly identified by the VAD. 6. Conclusions From the subjective results gathered in this experimental study, it is seen that there is clearly a potential of employing statistical multiplexing on cellular packet-switched networks.

Figure 12. “Long-error” misdetection.

If a MOS of 3.5 is taken as the acceptable level for communication quality, this experimental study indicates that a statistical multiplexing gain of 1.54 would be possible (figure 4). Compared to the circuit-switched case, the capacity of the network could thus be increased by 54%. However, this is only an indication that the potential of statistical multiplexing exists. Further testing in a speech laboratory with higher-quality equipment, calibrated noise injection and a larger pool of subjects is required. It is also shown that a certain amount of front-end clipping does not affect the perceived speech quality.

Although the software used for this study, NetSim, is a suitable framework for simulating objective QoS as well as for performing subjective assessment, several aspects could still be improved to give a more exact prediction. First, the voice application so far uses only the EFR codec, although it would be interesting to know how alternatives, such as the Adaptive Multi Rate (AMR) codec, cope with phenomena like clipping, BLER, jitter and delay. Furthermore, as shown in figure 3, NetSim is implemented separately so that it allows performance assessment for any application that uses packet-switched transmission. So far, a memoryless channel model is applied to simulate the physical layer, while a channel with memory might yield more realistic results. Admission control, which would take into consideration factors such as the number of users, the amount of front-end clipping and the carrier-to-interference ratio when deciding whether to accept incoming calls, is an essential system component and needs to be implemented.

The mean activity ratio and the mean talkspurt length, which are among the main issues addressed in this study, could be selected more realistically using the data presented in section 5. In that context, another improvement would be to replace Brady’s exponential model by one that is closer to the talkspurt length and silence length distributions of the actual coder output. For all those examinations, the recording level of the microphone as well as different levels of background noise need to be taken into consideration, as they significantly influence the behavior of the VAD. Furthermore, those factors are also likely to have an impact on front-end clipping: for a high input level, the VAD is switched on when the speaker breathes in before starting to talk, in which case the clipping might not affect actual speech at all. If simulation results for front-end clipping were available in the same format as described in section 4.1, it might be suitable to apply a distance measure (such as least mean squares) or fuzzy logic to make statements on their subjective quality without having to perform costly MOS tests.

Acknowledgements

The authors wish to thank Louis G. Samuel and Ran Yan for useful discussions. The support of several individuals at Bell Labs, Lucent Technologies, at the Department of Electrical Engineering at King’s College, UK, and at the Department of Microwaves and Electronic Engineering, University of Karlsruhe, Germany, is appreciated.

References

[1] Artificial conversational speech, ITU-T Recommendation P.59 (March 1993).
[2] P.T. Brady, A statistical analysis of on–off patterns in 16 conversations, Bell Systems Technical Journal 47 (September 1967) 73–91.
[3] P.T. Brady, A model for generating on–off speech patterns in two-way conversations, Bell Systems Technical Journal 48 (September 1969) 2445–2472.
[4] Digital cellular telecommunications system (Phase 2+), General Packet Radio Service (GPRS), Overall description of the GPRS radio interface, Stage 2, GSM 03.64 version 6.0.1 (1997).
[5] Digital cellular telecommunications system (Phase 2+), General Packet Radio Service (GPRS), Mobile Station (MS)–Base Station System (BSS) interface, Radio Link Control/Medium Access Control (RLC/MAC) protocol, GSM 04.60 version 6.3.0 (1997).
[6] Digital cellular telecommunications system (Phase 2+), Discontinuous Transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels, GSM 06.81 version 7.0.0 (1998).
[7] Digital cellular telecommunications system (Phase 2+), Multiplexing and multiple access on the radio path, GSM 05.02 version 8.0.1 (1999).
[8] J. Dunlop, D. Robertson, P. Cosimini and J. De Vile, Development and optimization of a statistical multiplexing mechanism for ATDMA, in: Proc. IEEE VTC 1994 (1994) pp. 1040–1044.
[9] ETSI SMG2 EDGE Tdoc 2e99-588, Performance of burst-based access and assignment for EGPRS Phase II, Lucent Technologies (December 13–16, 1999).
[10] S. Fabri, Proposed evolution of GPRS for the support of multimedia services, M.Sc. thesis, University of Surrey, UK (1998).
[11] D.J. Goodman, R.A. Valenzuela, K.T. Gayliard and B. Ramamurthi, Packet reservation multiple access for local wireless communications, IEEE Transactions on Communications 37(8) (August 1989) 885–890.
[12] D.J. Goodman and S.X. Wei, Efficiency of packet reservation multiple access, IEEE Transactions on Vehicular Technology 40(1) (February 1991) 170–176.
[13] S.C. Khoo, T. Enderes, C.A. Somerville and K. Samaras, Description of a simulation platform for voice quality assessment over wireless packet-switched systems, in: Proc. CSNDSP, Bournemouth, UK (July 2000) pp. 120–125.
[14] A. Kuzminskiy, K. Samaras, C. Luschi and P. Strauch, Enhanced space–time capture processing for random-access channels, in: Proceedings of 10th IEEE Workshop on Statistical Signal and Array Processing (SSAP) (August 2000) pp. 156–160.


[15] Methods for subjective determination of transmission quality, ITU-T Recommendation P.800 (August 1996).
[16] P. Narasimhan and R.D. Yates, A new protocol for the integration of voice and data over PRMA, IEEE Journal on Selected Areas in Communications 14(4) (May 1996) 623–631.
[17] J. Schiller, Mobile Communications (Addison-Wesley, 2000).
[18] K. Sriram, T.G. Lyons and Y.-T. Wang, Anomalies due to delay and loss in AAL2 packet voice systems: Performance models and methods of mitigation, IEEE Journal on Selected Areas in Communications 17(1) (January 1999) 4–17.
[19] W.R. Stevens, TCP/IP Illustrated, Vol. 3 (Addison-Wesley, 1996).
[20] Voice over GPRS, Technical memorandum, Lucent Technologies (1999).

Thomas Enderes received his Diplom-Ingenieur degree in communication engineering from the University of Karlsruhe, Germany. He is currently employed at the Global Wireless Systems Research Department of Bell Labs, Lucent Technologies, UK. Previously he worked for the Mobile Communication Department of IBM, Heidelberg, Germany, and for the Software Department of Humphrey Systems, California, USA. His interests are adaptive applications for next-generation mobile devices. He is a member of OSGi and SyncML and supports the Java2 Micro Edition and XHTML efforts.

S.C. Khoo received the B.Eng. degree in electrical and electronic engineering from King’s College, London, in 1998. He is currently pursuing his Ph.D. at King’s College, London. His research interests include noise suppression and the detection and correction of errors in speech signals.


Kostas Samaras received a BSc in physics and an MSc in telecommunications, both from the University of Athens, Athens, Greece. He received his doctorate in electrical engineering from the University of Oxford, Oxford, UK, where he studied as a scholar of the Greek State Scholarships Foundation. He is currently the Head of the New Technologies Department of Raycap Corporation in Athens, Greece. Prior to this he was with the Global Wireless Systems Research Department of Bell Laboratories, Lucent Technologies in Swindon, UK. He has held academic positions in Greece and in the UK. His interests include radio communications, multi-access protocols, wireless infrared and optical fiber communications.

Clare Somerville (née Brooks) received the M.Eng. and Ph.D. degrees in electronic engineering in 1995 and 1999, respectively, from the University of Southampton, Southampton, UK. From 1995 to 1998 she performed research into low bit rate speech coders for wireless communications. She is currently with the Global Wireless Systems Research Department, Bell Laboratories, Swindon, UK. Her current research involves the protocol layers in wireless communications systems such as GPRS and UMTS. The focus is on techniques for improving transmission of real-time services and efficient transfer of TCP data.