Downlink Scheduling in CDMA Data Networks

Niranjan Joshi, Srinivas R. Kadaba, Sarvar Patel, and Ganapathy S. Sundaram
Wireless Technology Laboratory, Lucent Technologies, Whippany, NJ
{nsjoshi, skadaba, sarvar, ganeshs}@bell-labs.com

ABSTRACT

Packet data is expected to dominate third generation wireless networks, unlike current generation voice networks. This opens up new and interesting problems. Physical and link layer issues have been studied extensively, while resource allocation and scheduling issues have not been addressed satisfactorily.

In this work, we address resource management on the downlink of CDMA packet data networks. Network performance (for example, capacity) has been addressed, but user centric performance has not received much attention. Recently, various non-traditional scheduling schemes based on new metrics have been proposed that target user performance (mostly without reference to wireless). We adapt these metrics to the CDMA context, and establish some new results for the offline scheduling problem. In addition, we modify a large class of online algorithms to work in our setup and conduct a wide range of experiments. Based on detailed simulations, we infer that:

• Algorithms which exploit "request sizes" seem to outperform those that do not. Among these, algorithms that also exploit channel conditions provide significantly higher network throughput.

• Depending on continuous or discretized bandwidth conditions, either pure time multiplexing or a combination of time and code multiplexing strikes an excellent balance between user satisfaction and network performance.

• Discrete bandwidth conditions can lead to degraded user level performance without much impact on network performance. We argue that the discretization needs to be fine tuned to address this shortcoming.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MOBICOM 2000 Boston MA USA. Copyright ACM 2000 1-58113-197-6/00/08...$5.00

1. INTRODUCTION

Packet data services over wireless communication links are a reality, and have been standardized in third generation (3G) wireless communication systems. Sprint PCS offers access to the web over wireless phones, Metricom offers wireless data service, and Palm Computing now markets a wireless internet access capable device, the Palm VII-W. Importantly, the major and most widely deployed 3G systems are expected to be based on code division multiple access (CDMA) [27], with the time division multiple access (TDMA) based European GSM system evolving towards UMTS [1]. The transmission of packet data over 3G cellular systems requires new algorithms and protocols. In particular, the issues of resource allocation and quality of service (QoS) management for wireless packet data on CDMA are both interesting and important.

Packet data traffic is highly bursty in nature, and error rate requirements are far more stringent than for voice, which dominates 2G systems. Unlike voice, which exhibits fairly regular flow characteristics, tolerance to errors, and strict delay requirements, packet data is more tolerant of delay, enabling the use of retransmission to recover erroneous frames. Further, the volume of information to be transported in a packet data system (for example, internet access) can be much larger than on voice networks. Protocols related to the physical and link layers have been standardized, but issues of resource management have not been worked out. All this calls for a fresh approach to resource allocation. In this paper we address the problem of power and rate scheduling in the context of data transmission over CDMA downlinks. Succinctly put, we examine algorithms that decide which user(s)'s packets go out on the air, when, for how long, and at what rate. Such a study is not entirely new to wireless, but our point of departure is the exploitation of "request size" information as well as rate information to perform scheduling in CDMA downlinks. We experiment with appropriate algorithms and study a broad range of performance measures which reflect both network performance and user level satisfaction.

Related work on scheduling. The focus of research in resource allocation for CDMA has been on the uplink. This is due to the perception that the uplink, due to the potentially large number of users competing for simultaneous channel use, is the bottleneck. This usually involves some sort of demand assignment at the base station, with information


from the multiple mobiles competing for uplink bandwidth. Some recent work includes [26, 22] and references therein. On the other hand, there has been relatively little activity in resource allocation and scheduling on the downlink. This is due to the predominance of voice traffic in current generation networks. In 3G systems, data traffic for internet applications is expected to dominate, which would make the downlink more heavily loaded than the uplink. Simple algorithms for interference management at the intercell level have been proposed in [22, 29]. A fair scheduling approach, which treats the resource allocation problem on general wireless networks (with no reference to CDMA) as one of fair packet scheduling, was considered in [24]. However, no distinction is made between uplink and downlink data flows. Rate processor sharing was proposed in [11] for scheduling downlink transmissions, followed by further work [14]. In a different regime, job-based scheduling has been considered in [23, 15, 17, 16] without reference to specific wireless systems.

Outline of the paper and contributions. We describe the CDMA downlink system model in Section 2. This is done to identify the key characteristics of CDMA which are useful in resource allocation. We discuss issues related to physical and upper layer protocols to establish that request size information can be extracted in wireless downlinks. Most of the information related to upper layers can be found in [23], but the details are included here to make the exposition self contained. Our contributions include the following. In Section 3, we identify metrics pertaining to user level performance; one metric, called flow (or delay or response time), is traditional, while another, called stretch (or normalized delay or slowdown), is relatively new. The latter explicitly uses the size of the request along with the response time.
We present some old and new results related to the optimality of these metrics. The new results pertain to scheduling to optimize stretch in a CDMA context. One provides an algorithm to optimize stretch when user bandwidths are continuous, and another is an intractability result under discretized user bandwidth conditions. We then detail algorithms based on both metrics, which address rate allocation and scheduling in CDMA.

- If bandwidth is discretized, then a combination of time and code multiplexing, with a dominant time multiplexing component, provides better performance overall.

- The granularity of the discretization may have an impact on user centric measures, without much decrease in network performance. We argue that this can be tuned to largely offset any negative impact.

We conclude the paper in Section 5.

2. CDMA DOWNLINK MODEL

In this section, we discuss the CDMA downlink model, including physical layer and transport layer issues. This is done to elaborate on the salient features of the CDMA scheduling problem. These include: (i) for a given resource (power in the case of CDMA), users in different locations in the cell experience different bandwidths, and (ii) every user's data flows through and from a common anchor point (e.g., a base station or switch), as opposed to the traditional router scenario, which is distributed. We use the terms "user", "job", and "mobile" interchangeably.

2.1 Physical layer resources: power and rate

In the physical layer, the primary resources of interest are power and rate. The relation between them provides some insight into the CDMA downlink resource management problem. A simplified, but broadly applicable, relationship can be derived from physical layer considerations. At any location with a given signal to interference plus noise ratio (SINR), the fraction φ of base station (BS) power required to support a data rate R is given by [3]

φ = R / R^max, 0 ≤ φ ≤ 1,

where R^max is the maximum rate supportable at that location with the full BS power; equivalently, for a fixed SINR, the required power grows linearly with the desired rate. Each job i may further be subject to a power cap P_i^max, with a corresponding maximum rate R_i^max.

THEOREM 4. There exists a near optimal, offline, polynomial time algorithm which minimizes max-stretch under the assumption of continuum rates.

PROOF. The proof is constructive and provides an algorithm to minimize the max-stretch offline. The number of jobs, their arrival times, sizes, and SINRs (or geometries) are known in advance. So also is the maximum power allocated to any job, and consequently the corresponding maximum rate. The algorithm proceeds in three steps. First, we determine if a certain value of stretch is a feasible upper bound for all jobs. Next, using an iterative search, we converge on the min-max-stretch of all jobs in the system to within any specified tolerance. Finally, we use the min-max-stretch to determine a schedule.

Feasibility. Let S be any real number. In order to determine if there exists a schedule such that the stretch of any job is less than or equal to S, it is enough to check if every job completes before its deadline D_i = S · t_i + a_i. We proceed by making use of the MAXFLOW problem on directed graphs [4] as follows. We create a graph which is an instance of the MAXFLOW problem, and solve it using the Goldberg-Tarjan algorithm [4]. This algorithm runs in time polynomial in the number of vertices and edges in the

graph. The feasibility is tested by checking if the maximum flow on the graph is equal to the total energy required to serve all the jobs.

Graph: Consider the set {a_1, a_2, ..., a_N, D_1, D_2, ..., D_N} of arrival times and deadlines. Sort these instants of time in ascending order. Let I_j = (u_j, v_j) denote the jth interval, between the jth and (j+1)th instants in this set. Next, we construct a bipartite graph of vertices and edges, as follows. Create a vertex for every job J_i in the system along the left column. Create a vertex for every interval I_j in the right column. If the arrival time a_i ≤ u_j and the deadline D_i ≥ v_j, then we create an edge between the ith vertex in the left column and the jth vertex in the right column with capacity P_i^max · (v_j - u_j). This is the amount of energy that can be allocated to J_i in the interval I_j. In addition to this bipartite graph, we create a source vertex and a sink vertex. From the source vertex, create edges to every vertex in the left column with capacities P_i^max · t_i. This is the amount of energy needed to serve job J_i. From the jth vertex in the right column, create an edge to the sink with capacity P_BS · (v_j - u_j). This is the amount of energy available in the system in interval I_j.

Test: Compute the maximum flow in this graph. It is easily observed that S is feasible (or equivalently, all deadlines are met) if and only if the maximum flow in the graph is equal to Σ_i P_i^max · t_i. Since the maximum flow algorithm runs in polynomial time, the feasibility of S can be determined in polynomial time.

Computing min-max-stretch. The algorithm to compute the min-max-stretch, to arbitrary precision, works as follows. Let S_test denote the test value of stretch. Begin with S_test = 1. If the feasibility test on S_test is negative, keep doubling S_test until a feasible value is reached. Denote this value by S_feasible; the most recently tested (infeasible) value is now S_old = S_feasible/2. Now, perform a binary search between S_old and S_feasible until their difference is within the desired tolerance. This algorithm runs in polynomial time. We denote this stretch by S_opt.
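As an illustration, the feasibility test and the doubling-plus-binary-search can be sketched as follows. This is our reconstruction, not the authors' code: for simplicity it uses an Edmonds-Karp max-flow (the proof invokes the Goldberg-Tarjan algorithm), and all function and variable names are ours. Inputs are the arrival times a_i, the job times t_i at maximum rate, the per-job power caps P_i^max, and the BS power budget.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on an adjacency matrix of capacities."""
    n = len(cap)
    total = 0.0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total
        # bottleneck along the path, then update residual capacities
        push = float("inf")
        v = t
        while v != s:
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= push
            cap[v][parent[v]] += push
            v = parent[v]
        total += push

def feasible(S, a, t, p_max, P_BS):
    """Test whether deadlines D_i = S*t_i + a_i can all be met."""
    N = len(a)
    D = [S * t[i] + a[i] for i in range(N)]
    pts = sorted(set(a) | set(D))
    ivals = list(zip(pts, pts[1:]))
    M = len(ivals)
    n = N + M + 2                                  # source, N jobs, M intervals, sink
    cap = [[0.0] * n for _ in range(n)]
    for i in range(N):
        cap[0][1 + i] = p_max[i] * t[i]            # energy needed by job i
        for j, (u, v) in enumerate(ivals):
            if a[i] <= u and D[i] >= v:            # job i is live throughout I_j
                cap[1 + i][1 + N + j] = p_max[i] * (v - u)
    for j, (u, v) in enumerate(ivals):
        cap[1 + N + j][n - 1] = P_BS * (v - u)     # system energy available in I_j
    need = sum(p_max[i] * t[i] for i in range(N))
    return max_flow(cap, 0, n - 1) >= need - 1e-9

def min_max_stretch(a, t, p_max, P_BS, tol=1e-3):
    """Doubling phase, then binary search, as in the proof of Theorem 4."""
    S = 1.0
    while not feasible(S, a, t, p_max, P_BS):
        S *= 2.0
    lo, hi = S / 2.0, S
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(mid, a, t, p_max, P_BS):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, two identical jobs arriving together, each needing the full BS power for one second, converge to a min-max-stretch of 2.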
Schedule. In order to compute the offline schedule, we repeat the graph creation process once again with stretch S_opt. This time, we pay attention to the solution of the MAXFLOW problem in detail, by noting the flows along the various edges. Clearly, the flows from the source to the left hand vertices should correspond to P_i^max · t_i. Let the flow along the edge from J_i to I_j be C_ij. Then we allot C_ij/(v_j - u_j) units of power to J_i during the interval I_j. This determines the entire schedule. □

Remark. When P_i^max = P_BS for all i, this theorem follows from results in [15]. The feasibility test and schedule are both obtained using an EDF strategy.

Next we consider the discrete rate case. In this setup, we do not translate power into rate directly. Instead, we are constrained to transmitting data using a rate that belongs to a predefined "discrete rate set." Now, given P_i^max, the translation to rates in the rate set proceeds as follows. First we translate the power directly into the rate r_i^max; this rate may not correspond to any rate in the rate set. So we choose the largest rate in


the rate set not exceeding r_i^max; let this rate be R_1i. Now calculate the power corresponding to R_1i, say P_1i. Compute the difference between P_i^max and P_1i. Repeat the above procedure, using the left over power in each step, until there is not enough power left to assign. Let the corresponding rates be R_2i, ..., R_mi. Then R_i = Σ_j R_ji. Now each rate R_ji corresponds to a specific code. So, under this model, any job can be served simultaneously using multiple codes. In addition, the scheduling instants are dictated by the frame boundaries. Let us denote the size of a frame by Δ. We call this setup the discrete rate set case. Under this model, we prove the following hardness result, which indicates that a polynomial time algorithm for minimizing max-stretch is unlikely to exist.
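The greedy decomposition just described can be sketched as follows, under the linear power-rate relation assumed throughout the paper; the function and parameter names are illustrative, not from the paper.

```python
def discrete_rates(p_max, rate_set, power_per_unit_rate):
    """Greedily decompose a power budget into rates from a discrete rate set.

    power_per_unit_rate is the power needed per unit of rate at this user's
    SINR (a linear power-rate relation is assumed).  Returns the rates
    R_1i, R_2i, ..., one per code; their sum is the served rate R_i.
    """
    rates = []
    left = p_max
    sorted_rates = sorted(rate_set, reverse=True)
    while True:
        r_max = left / power_per_unit_rate       # rate supportable with leftover power
        # largest rate in the set not exceeding r_max
        pick = next((r for r in sorted_rates if r <= r_max), None)
        if pick is None:                         # not enough power left to assign
            return rates
        rates.append(pick)
        left -= pick * power_per_unit_rate       # power consumed by this code
```

For instance, with a rate set of {9.6, 19.2, 38.4, 76.8} kbps, a unit power-per-rate, and a budget of 100 power units, the greedy pass assigns 76.8 kbps, then 19.2 kbps, and stops because the remaining power cannot support even 9.6 kbps.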

THEOREM 5. Minimizing max-stretch for the discrete rate set case is an NP-hard problem.

PROOF. The proof proceeds by making successive reductions and relating the problem to the Bin Packing Problem [19]. Recall that the Bin Packing Problem seeks an algorithm to partition a set of L numbers into k sets where the sum of all elements in each partition is less than or equal to a specified quantity b. It has been shown that this problem is NP-hard. In this proof, we show that if there were an algorithm to minimize max-stretch for the discrete rate set case (for any set of rates, and any frame size), then that algorithm could be used to solve any instance of the bin packing problem. Observe that it is enough to show that there is no polynomial time algorithm to test feasibility of a certain stretch value. Suppose there were a polynomial time algorithm (polynomial in the number of jobs) to test feasibility for any discrete rate set, N jobs, and any test stretch value S. Then this algorithm would also solve the instance when the rate set consists of exactly one rate R. In particular, it would solve the instance when all jobs have the same arrival time and job time. Without loss of generality, we may assume a_i = 0 and t_i = t for all jobs. Recall that the deadline for job J_i is D_i = S · t_i + a_i. Hence the deadlines of all jobs are S · t. Note that even though the rate set consists of only one element, R_i^max need not be equal across jobs. Let p_i be the amount of power needed to support a rate of R for J_i. Hence the corresponding quantum of energy that is used to send one frame along one code is p_i · Δ. Let M_i be the number of such quanta needed to service J_i. Then overall, meeting every deadline amounts to packing the set of numbers consisting of the union of all these quanta (over all i) with the constraint that in any frame the total energy does not exceed P_BS · Δ. Note that any instance of the Bin Packing Problem can be converted into such a set for suitable values of R, |J_i|, and S. The result now follows from the intractability of this problem. □

3.2 Online algorithms

In the online scenario, available information is limited to jobs that have arrived so far. Hence, we suggest an online heuristic RMAX which applies to both continuum and discrete rate cases. From here onwards, we assume that P_i^max = P_BS, and allow only one code per job in the discrete rate case. We consider two classes of algorithms: deadline-based (EDF) and processor sharing (PS), which are listed in Table 1. Not all algorithms are simulated in both continuum and discrete rate cases. Among the algorithms listed in the table, we only describe RMAX. The others are either self explanatory or the appropriate reference is indicated. Motivated by results in [15], we use an earliest deadline first (EDF) strategy to design an online heuristic that attempts to minimize max-stretch. Since the optimal value of the max-stretch for all jobs so far is not available, we maintain an estimate, the max-stretch-so-far S, and adjust it dynamically.

Algorithm RMAX

1. Assign deadlines D_i = S · t_i + a_i for queued jobs.

2. For continuum rates, execute the job with the earliest deadline.

3. For discrete rates, sort the deadlines and allocate power in order of earliest deadline first using a greedy algorithm.

4. At a job completion, update the max-stretch-so-far to the maximum of the current value and the stretch of the completed job.

Variants of RMAX are possible wherein different estimates of the min-max-stretch of all jobs so far are used. Some simple ones are discussed in [15].
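A minimal sketch of RMAX for the continuum rate branch follows; the Job fields and the class interface are our own illustrative choices, not from the paper.

```python
class Job:
    """A queued job: arrival time a_i and job time t_i at its maximum rate."""
    def __init__(self, arrival, job_time):
        self.a = arrival
        self.t = job_time

class RMax:
    """Online EDF heuristic for max-stretch with a dynamic stretch estimate S."""
    def __init__(self):
        self.S = 1.0        # max-stretch-so-far estimate
        self.queue = []

    def add(self, job):
        self.queue.append(job)

    def pick(self):
        # Step 1: deadlines D_i = S*t_i + a_i; Step 2: earliest deadline first.
        if not self.queue:
            return None
        return min(self.queue, key=lambda j: self.S * j.t + j.a)

    def complete(self, job, now):
        # Step 4: update max-stretch-so-far with the finished job's stretch.
        self.queue.remove(job)
        self.S = max(self.S, (now - job.a) / job.t)
```

For example, with a large job arriving at time 0 and a small job at time 1, the small job has the earlier deadline under the initial estimate S = 1 and is served first; its observed stretch then raises S, which reshapes subsequent deadlines.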

4. EXPERIMENTAL RESULTS

Here, we present results of extensive simulations performed with real data from a proxy server.

4.1 Simulation environment

The simulation consists of three major components: (i) data traffic provided by the proxy server trace, (ii) the cellular system including gross propagation effects, which we simulate, and (iii) link level data which considers detailed air interface conditions, which we obtain from [25]. Detailed descriptions of these components follow.

Data traffic. A recent report [13] indicates that internet data traffic is largely a mix of web based traffic (HTTP and FTP), dominated by HTTP in particular. Such traffic is highly bursty in nature. Various analytic characterizations and models of internet data traffic exist in the literature. Mah [5], and Barford and Crovella [21], proposed HTTP simulators. But a drawback of this simulated approach is the risk of over-dependence on certain parameters. In fact, Willinger and Paxson [28] show that internet traffic patterns in general, and web traffic in particular, are very sensitive to modeling parameters. Also, in a wireless scenario, it seems natural to model the BS as a proxy server, as opposed to an isolated web server (recall the arguments from Section 2.2). Therefore, we choose a proxy server log (trace) [20] as the traffic source to our system. The chosen trace is a 24 hour record of job sizes, timestamps, and destinations (IP addresses) of responses to HTTP requests. The job sizes (in bytes) indicate the sizes of the HTML/FTP files requested. The timestamps (in milliseconds) record the instants at which the jobs arrived at the proxy. Table 2 summarizes some of the key characteristics of the trace. Also, for a given file size, different user geometries would result in different job times. This requires that multiple SINR configurations be simulated in order to gain more insight into the performance of the scheduling algorithms. We present results for up to 50 different configurations of job SINRs.

Table 2: Trace statistics

                                 Mean    Median   Max
Inter-arrival time (ms)          233     128      4612
Job size (bytes)                 14197   1945     54319947
Job arrivals/sec                 4.3              25
Data arrival rate (bytes/sec)    93631   14760    50899675

Table 1: EDF and Processor Sharing Algorithms

RMAX: based on max-stretch-so-far (described in Section 3.2)
MAX: algorithm from [15]; an equal unit rate is used in computing deadlines
SRPT: Shortest Remaining Processing Time first
SRJF: Shortest Remaining Job (in bytes) First; an equal unit rate is used to compute processing times
FIFO: First In First Out
HRUF: Highest Rate User First; the job with the highest R^max (or highest SINR) is served first
BitProp: Bit Proportional; the power assigned to a job is proportional to the job size in bits (or bytes)
WorkProp: Work Proportional; the power assigned to a job is proportional to its energy requirement
WorkProc: Work Processor Sharing; the power assigned to each job is proportional to its energy per bit
UPS: Uniform Processor Sharing; power is equally shared among all jobs
RouRob: Round Robin; serve jobs in cyclical order, a frame at a time
EqRate: Equal Rate; allocate power to support 153.6 kbps regardless of position in cell

Cellular system. A cluster of 19 hexagonal cells, with a BS at the center of each cell, forms our cellular system setup. The center cell is the one of interest, with the other 18 cells forming two tiers of interferers around it. Typically, two tiers are necessary and sufficient to generate significant interference to the users in the center cell. Since CDMA allows universal frequency reuse, it is reasonable to assume that all surrounding cells transmit at full power. Consequently, we only perform intracell scheduling in the center cell. User locations are uniformly distributed over the cluster area. A path loss exponent (γ = 4) and standard deviation (σ_s = 8 dB) for the log-normal shadowing model gross propagation effects. Once a user is placed, the BS from which the strongest signal is received acts as the home BS for the user. The home BS can be different from the one to which the user is geographically closest. A job from the trace is assigned only if the user is connected to the center cell. This results in concentrating the traffic source in the center cell, while assuming that all interfering cells transmit at full power. This is equivalent to assuming that the interfering cells are always loaded with traffic. Such high loading presents a conservative interference scenario, due to which our results may be interpreted as lower bounds in a CDMA network. A user is assumed to be connected to the home BS throughout its sojourn. Switching between BSs (handoff) is not simulated.
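The placement and home-BS rule can be sketched as follows, with BS coordinates supplied directly instead of a hexagonal grid; the transmit power value and all names are illustrative assumptions, while the path loss exponent γ = 4 and shadowing deviation σ = 8 dB follow the text.

```python
import math
import random

GAMMA = 4.0      # path loss exponent, from the text
SIGMA_DB = 8.0   # log-normal shadowing standard deviation (dB), from the text

def received_power_db(tx_power_db, distance, rng):
    """Received power under d^-gamma path loss plus log-normal shadowing."""
    path_loss_db = 10.0 * GAMMA * math.log10(distance)
    shadowing_db = rng.gauss(0.0, SIGMA_DB)
    return tx_power_db - path_loss_db + shadowing_db

def home_bs(user_xy, bs_positions, tx_power_db=40.0, rng=None):
    """Pick the BS whose signal is received strongest: the home BS.

    Because of shadowing, this need not be the geographically closest BS.
    """
    rng = rng or random.Random(0)
    best_idx = None
    best_p = float("-inf")
    for idx, (bx, by) in enumerate(bs_positions):
        d = max(math.hypot(user_xy[0] - bx, user_xy[1] - by), 1e-3)
        p = received_power_db(tx_power_db, d, rng)
        if p > best_p:
            best_idx, best_p = idx, p
    return best_idx
```

With shadowing suppressed, the rule reduces to nearest-BS selection; the 8 dB shadowing term is what occasionally makes a farther BS the home BS.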

Link level data. We assume that the SINR is instantaneously conveyed to the BS by the mobile. The BS translates the SINR to the corresponding maximum rate R_i^max using a lookup table created from link level data in [25]. The data provides the required power fraction φ, as a function of SINR, to support a rate of 76.8 kbps at a frame error probability of 5% regardless of position in the cell. This was obtained by performing detailed CDMA air interface simulations in a slow Rayleigh fading environment (3 kmph vehicle speed) and included a power control algorithm. Since slow fading was already considered in the link level simulations, we do not simulate it at the system level. Further, the presence of power control in the link level simulations ensures that the user's SINR is more or less maintained throughout the sojourn. Therefore, in our simulations, the user location and channel conditions are assumed to remain constant throughout the trial. For a given SINR, to obtain the corresponding φ for a rate other than 76.8 kbps, the numbers are scaled in proportion to the desired rate. This is an approximation, and assumes that channel models, mobile speeds, frame error rates, etc. are the same for all rates. However, the approximation applies equally to all the algorithms. Though the physical channels are not error free, we do not simulate error recovery protocols. Consequently, the throughputs that we obtain should be interpreted as upper bounds under the given conditions.
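The lookup-and-scale step can be sketched as follows. The table values here are placeholders, not the actual link level data from [25]; only the linear scaling in rate mirrors the approximation described above.

```python
# Hypothetical lookup: SINR (dB) -> power fraction phi needed for 76.8 kbps.
# Real values come from the link level data in [25]; these are placeholders.
PHI_AT_76_8 = {-3.0: 0.80, 0.0: 0.40, 3.0: 0.20, 6.0: 0.10}
BASE_RATE_KBPS = 76.8

def phi_for_rate(sinr_db, rate_kbps):
    """Scale the tabulated power fraction linearly with the desired rate."""
    phi_base = PHI_AT_76_8[sinr_db]
    return phi_base * (rate_kbps / BASE_RATE_KBPS)

def max_rate_kbps(sinr_db):
    """Largest rate supportable with the full power budget (phi = 1)."""
    return BASE_RATE_KBPS / PHI_AT_76_8[sinr_db]
```

For instance, a user whose tabulated φ at 76.8 kbps is 0.2 can, under this linear approximation, be driven at up to 384 kbps with the full BS power.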

Performance measures and algorithms. We consider a wide range of performance measures (see Table 3) to evaluate the performance of the various scheduling algorithms. They are intended to capture important attributes of both network and user centric performance. The choice of performance measures may not be exhaustive, but is representative of the wireless scenario. Given our choice of performance measures, we simulate algorithms which are optimal, or near-optimal, with respect to these measures. Table 1 summarizes the algorithms we simulate.

Table 3: Various performance measures

Avg. Agg. Thru: average aggregate throughput of the system (kbps)
Avg. Str: average of job stretches
Max. Str: maximum of job stretches
Min. Job Thru: minimum of job throughputs (kbps)
Avg. Job Thru: average of job throughputs (kbps)
Avg. Jobs in queue: average number of jobs present in the system at any instant (averaged over time)
Avg. Resp. Time: average response time of a job (seconds)
Avg. Unsat (x%): average number of jobs which achieve less than x% of their maximum rate R^max; equivalently, the average number of jobs whose stretch is greater than 1/x
Jobs Complete: number of completed jobs in a given time window
Jobs Complete < N: number of completed jobs among the first N jobs

Our choice of EDF algorithms is explained as follows. RMAX and MAX are online heuristics for minimizing max-stretch when rate information is and is not available, respectively. SRPT minimizes avg-flow. SRJF is its rate-independent variant. FIFO is optimal for minimizing non-preemptive max-flow, while HRUF maximizes the instantaneous system throughput. Processor sharing is natural to CDMA, and is therefore worthy of consideration. Voice transmission in CDMA at 9.6 kbps is a classic example of EqRate. The other flavors of processor sharing were chosen from [14]. Another popular algorithm, RouRob (Round Robin), is equivalent to processor sharing in time. An important result to note (Lemma 3, [15]) is that processor sharing turns out to be very unfair: it can have an Ω(n) competitive ratio for max-stretch. In other words, the observed max-stretch for processor sharing algorithms can be far from optimal. Experiments are performed with both continuum and discrete rate sets. Not all algorithms are simulated in both settings. The results are presented in detail in the following two sections. In addition to distributions of some of the key measures, we also provide statistics gathered over different configurations of job SINRs.

4.2 Scheduling with continuum rates

In this section we discuss the results of our simulation when the rate assigned to a job is taken from an interval determined purely by power and SINR. Further, time is a continuous variable without any restrictions on scheduling instants. In essence, we do not impose constraints on bandwidth allocation and frame size granularity. This allows us to judge the relative performance of the algorithms purely with respect to the chosen performance measures. In the EDF algorithms, the BS simply calculates a rate using the entire power P_BS and grants it to the job with the earliest deadline. On the other hand, in processor sharing, all jobs in queue are assigned some share of the base station transmit power, thus serving them simultaneously.

The simulation is event driven, where events are job arrivals or departures. A simulation realization stops at the end of a 200 minute time window.

Observations. The numbers in bold indicate the best performance along the columns in Table 4. The average aggregate throughput, presented in the first column of the table, varies between 420 kbps and 630 kbps. Expectedly, the larger number corresponds to HRUF, which gives preference to closer jobs and boosts the network throughput. But this amounts to restricting the coverage, since jobs with unfavorable SINR would tend to experience large delays and may eventually drop the connection.

Figure 2: CDF of job throughput for continuum rates (single realization).

Let us now compare the algorithms with respect to the applicable performance measures chosen earlier. Referring to the distributions of individual job throughputs in Fig. 2, we see that they are comparable for all the EDF algorithms except FIFO. However, this does not provide enough information regarding individual job satisfaction. Table 4 separates the algorithms in this regard: RMAX, SRPT, and their rate-independent variants MAX and SRJF, are far superior to HRUF with respect to minimum job throughputs and response times. Further, close to 90% of the jobs achieve at least 75% of their maximum throughput. On the other hand, for FIFO and HRUF, almost every job has a throughput less than 75% and 65% of their maximum throughput, respectively. Interestingly, using just job size information but no rate information, MAX and SRJF show very good overall job level performance. This is indicative of the advantages of using job size information. The low number of


Algorithms (rows of Table 4): RMAX, MAX, SRPT, SRJF, FIFO, HRUF, WorkProp, BitProp, UPS, WorkProc, EqRate.

Table 4: Statistics for various scheduling algorithms: continuum rates. Columns: Avg. Agg. Thru, Avg. Str, Max. Str, Min. Job Thru, Avg. Job Thru, Avg. Jobs in queue, Avg. Resp. Time, Avg. Unsat (75%), Total Jobs Comp.

RMAX: 11.91 51468 25.8 2.51 554.38 1.35 58.1 0.81 1.67e3
MAX: 51444 427.58 1.73 4.8e3 3.86 1.5e3 36.93 1.47 13.99
SRPT: 20.85 1.04 12.44 51481 573.1 1.13 71.77 2.15 1.63e3
SRJF: 36.85 1.59 17.84 51445 425.07 2.18 4.9e3 3.79 1.4e3
FIFO: 29624 1.08e4 2.6e3 99.87 420.66 6.9e5 9.7e7 9e-5 2.37
HRUF: 2.88e3 9.6e2 34.3 45125 634.1 3.8e3 1.2e6 1e-4 1.7e3
WorkProp: 2.49e4 5.2e2 66.3 907 434.65 3.5e5 4.4e7 8e-5 650.1
BitProp: 8327 2.08e4 3.2e3 99.5 557 1.9e6 4.3e7 6e-5 10.25
UPS: 92.41 12.4 51305 534.90 92.84 384.5 0.12 45.04 99.9
WorkProc: 51183 136.5 16.7 99.9 424.3 550.7 3.4e4 0.65 9.74
EqRate: 35970 515.5 1.7e5 3.6e7 1e-4 16.55 6.39e4 1.2e3 99.9

Jobs Comp. < 50000: RMAX 49954, MAX 49925, SRPT 49961, SRJF 49926, FIFO 29624, HRUF 43888, WorkProp 907, BitProp 8327, UPS 49790, WorkProc 49671, EqRate 35970.



Figure 3: CDF of job stretch for continuum rates (single realization).

Figure 4: Comparison of average stretch for continuum rates (across realizations).

jobs in the queue, on average, indicates superior queue control; the queue is efficiently managed and large buildups do not occur. This can potentially benefit buffer management at the BS. On the other hand, in the FIFO (HRUF) schemes, small jobs (farther jobs) tend to queue up behind large jobs (closer jobs), which results in starvation. This is evidenced by the total number of jobs completed in the 200 minute window: HRUF completes almost 6000 fewer jobs than RMAX, SRPT, and variants. Many of these jobs do not even belong to the first 50,000 jobs in the trace (in order of their arrival), in keeping with the tendency of HRUF to unduely delay jobs with unfavorable SINR (likely the jobs farther from the BS). This is exemplified by the last column in Table 4: among the first 50,000 jobs, HRUF completes about 6000 fewer jobs compared to the other EDF algorithms. In effect, these jobs are blocked, while jobs which arrive much later (but have better SINR) get served. This serves to reinforce the observation that HRUF provides high network throughput and good average job throughput at the cost of significantly higher blocking.

With respect to max-stretch and avg-stretch, RMAX, MAX, SRPT, and SRJF show far superior performance to the other algorithms. This is expected, since they are designed to work well with respect to these metrics. Comparison between RMAX and MAX, as well as between SRPT and SRJF, demonstrates that rate information can significantly improve network performance. Although our average job-centric numbers do not show any significant advantage for RMAX and SRPT (over MAX and SRJF, respectively), they consistently performed better, as evidenced in Figs. 4 and 5. In addition, the maximum value of max-stretch for RMAX and SRPT is orders of magnitude lower than that for MAX and SRJF. This demonstrates that rate information enhances scheduling performance when job size information is used. Processor sharing algorithms provide thinner data pipes (smaller rates) to all the jobs, so the stretch and individual job throughput suffer. For example, the average job throughput ranges between 9.74 kbps and 650.1 kbps, which is orders of magnitude less than for all EDF schemes except FIFO. In parallel, the avg-stretch and max-stretch are also significantly higher. Note that WorkProp and BitProp perform the worst with respect to the quantities measured, since they give higher preference to larger jobs. This indicates that prioritizing jobs with smaller "sizes" (explicitly in the case of SRPT and SRJF, and implicitly in the case of RMAX and MAX) leads to distinct performance advantages. This inference is further emphasized by the observation that UPS and WorkProc do not perform as badly as WorkProp and BitProp.

In general, assigning discrete rates would further aggravate the problems of processor sharing algorithms: some jobs may not get any rate at all (which in turn may lead to inefficiencies in power utilization), and it may not be possible to accommodate all jobs simultaneously. One way of applying a processor sharing scheme in a discrete rate setup is to provide an equal, predetermined rate from the rate set to all jobs (when they receive data). The problem is that this EqRate algorithm (in an attempt to guarantee, say, 153.6 kbps to the jobs) ends up wasting power: at times the available power at the base station is insufficient to provide the predetermined rate to any further job, and hence remains unutilized. As a result, all the performance measures suffer drastically (as evidenced in Table 4). The processor sharing algorithms are therefore not considered in the later sections.

Figure 5: Comparison of max stretch for continuum rates (across realizations).

Inferences. Based on the results presented, and the intuition behind processor sharing algorithms as well as EDF algorithms, we note the following.

• Time multiplexing jobs in the downlink has distinct advantages from the perspective of the scheduling metrics that we have observed. Processor sharing algorithms, which attempt to "fair share" the BS transmit power, only end up choking the jobs uniformly, and almost all the performance metrics observed suffer greatly. In other words, multiplexing jobs in time space is better than multiplexing jobs in code space in CDMA downlinks. This is not unexpected, considering the sub-optimality of processor sharing with respect to max-stretch (Lemma 3, [15]).

• EDF algorithms which exploit job size information, and which give preference to smaller jobs (either explicitly or implicitly), outperform all other algorithms. Figs. 2 and 3 further corroborate this observation.

• Capturing job size information is very desirable for improving job-level performance, but the absence of rate information results in lower network throughput. Algorithms which capture rate information in addition to job size information, via the stretch metric, strike an excellent balance between network throughput and individual job performance.

4.3 Scheduling with discrete rates

A continuum rate set is difficult to achieve in practice. However, it allows us to judge the performance of the various algorithms in a relatively unconstrained setting, emphasizing their principal characteristics rather than their reaction to practical constraints. Based on the results of the previous section, we identified the EDF algorithms as the best performers. To reiterate, processor sharing algorithms are likely to suffer drastically when the practicalities of a discrete rate set and scheduling instants are imposed. In this section, we examine the performance of the chosen EDF algorithms under such constraints, and compare them with HRUF and RouRob. We choose RouRob since it is equivalent to processor sharing in the time domain; we simulate only uniform round robin and do not consider weighted round robin schemes. Unlike the unquantized (in time and rate), event-driven simulation of the continuum rate case, the following simulation uses 20 ms frames (as in cdma2000) for transmitting data, with the frame boundaries also forming the scheduling instants. We assign only one code per user, and hence the maximum rate is an element of the discrete rate set. The discrete rate set we employ is {9.6, 19.2, 38.4, 76.8, 153.6, 307.2, 614.4} kbps, as specified in [27]. Under these conditions, we simulate 20 different job SINR configurations. The results are presented in Table 5 and Figs. 6-8.

Observations. The numbers in bold indicate the best performance along the columns of Table 5. The average aggregate throughput, presented in the first column of the table, varies between 440 kbps and 630 kbps. Expectedly, the largest number corresponds to HRUF, which gives preference to closer users and boosts the network throughput. But, as observed earlier, this restricts coverage.

Assigning rates to users from a discrete rate set can lead to residual transmit power, which can be utilized by greedily filling the power bin. We therefore schedule jobs using an EDF algorithm for computing schedule priorities, followed by greedy power allocation. It is important to note that this combination is only an online heuristic, which may be


Table 5: Statistics for various scheduling algorithms: discrete rates. (Rows: RMAX, MAX, SRPT, SRJF, RouRob, HRUF. Columns: Avg. Agg. Thru, Avg. Str, Max. Str, Min. User Thru, Avg. User Thru, Total Jobs Comp., Avg. Jobs in queue, Avg. Resp. Time, Unsat (%), Jobs Comp. < 50000.)


Figure 7: CDF of job stretch (discrete rates).

Figure 6: CDF of job throughput (discrete rates).

decidedly sub-optimal for RMAX and SRPT. Greedy allocation reduces power wastage, which results in the network throughput being comparable to that in the continuum rate case. Indeed, the network throughputs for all the EDF algorithms in Table 5 are commensurate with those in Table 4.
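The priority-then-greedy-fill step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's simulator: the power model (sustaining rate r for a job of signal quality sinr costs r / sinr units of power) and the function names are ours; only the cdma2000 rate set comes from the text.

```python
# cdma2000 rate set from the text, tried highest-first when filling the bin.
RATES_KBPS = (614.4, 307.2, 153.6, 76.8, 38.4, 19.2, 9.6)

def greedy_fill(jobs, total_power):
    """Grant each job, in priority order, the highest discrete rate the
    leftover power can support.  `jobs` is a list of SINR values already
    sorted by the chosen EDF rule (e.g. RMAX or SRPT); assumes the toy
    power model power(rate, sinr) = rate / sinr."""
    grants, remaining = [], total_power
    for sinr in jobs:
        # Highest discrete rate this job can be given with the residual power.
        rate = next((r for r in RATES_KBPS if r / sinr <= remaining), 0.0)
        remaining -= rate / sinr
        grants.append(rate)
    return grants

# Highest-priority job first; the residue then serves the weaker user.
print(greedy_fill([10.0, 2.0], total_power=100.0))  # [614.4, 76.8]
```

Because the priority order is frozen before the bin is filled, a lower-priority job with good SINR can be left with unusable residue, which is one way the RMAX-plus-greedy combination can end up sub-optimal, as noted above.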

In addition, as expected from the results of the previous section, RouRob performs very poorly compared to the other algorithms. More results on the performance of all the algorithms (for a chosen realization), with respect to throughput and stretch, are presented in Figs. 6 and 7. Fig. 8 outlines the differences between RMAX and SRPT and their rate-free variants MAX and SRJF, as a function of the realization. In some realizations, the max-stretch for RMAX is noticed to be higher than that of SRPT. This is due to the likely sub-optimality of the combination of RMAX and greedy power allocation. Moreover, the updates to the working value of stretch are restricted to the 20 ms frame boundaries, and are not done at shorter intervals. Over the course of our experiments, we have observed that updating the working value of stretch more frequently tends to improve the performance of both RMAX and MAX.

Regarding the user-related performance measures, two important observations may be made.

1. The EDF algorithms which use job size information outperform the other two algorithms. This observation is consistent with earlier results. In addition, the relative performance among the EDF algorithms remains unchanged when compared to the simulations of the previous section.

2. As expected, the individual numbers for all the user-level performance measures suffer when compared to the continuum rate case. Specifically, the average user throughput numbers are lower. This is because the maximum rate of any user is constrained by the discrete rate set, while the actual rate that the available power could support may be much higher. However, if multiple codes can be assigned to each job, then the maximum rates can be much higher, leading to improved job-level performance.
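Observation 2 can be made concrete with a small sketch (ours and hypothetical; the paper simulates only the single-code case): the power-feasible rate is clipped to the largest element of the discrete set, and only aggregating several codes raises that cap.

```python
RATES_KBPS = (9.6, 19.2, 38.4, 76.8, 153.6, 307.2, 614.4)

def granted_rate(feasible_kbps, n_codes=1):
    """Largest total rate <= the power-feasible rate, built from at most
    n_codes channels each carrying the same rate from the set (a
    simplifying assumption about how codes would be aggregated)."""
    options = [r * k for r in RATES_KBPS for k in range(1, n_codes + 1)
               if r * k <= feasible_kbps]
    return max(options, default=0.0)

print(granted_rate(1300.0))             # 614.4  -- single code: clipped
print(granted_rate(1300.0, n_codes=2))  # 1228.8 -- two codes: 2 x 614.4
```

The gap between the power-feasible 1300 kbps and the 614.4 kbps grant is exactly the per-user throughput loss that observation 2 attributes to the single-code constraint.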

Inferences. Based on the results presented, we note the following:

• The use of a greedy power utilization scheme in every frame ensures that power wastage is reduced. This is reflected in the fact that the average aggregate throughputs of the various algorithms are comparable to their corresponding numbers in the continuum case.

