Buffer Management for Wireless Media Streaming

Aditya Dua and Nicholas Bambos
Department of Electrical Engineering, Stanford University
350 Serra Mall, Stanford, CA 94305
Phone: 650-725-5525, Fax: 650-723-4107
Email: {dua,bambos}@stanford.edu

Abstract— We study playout buffer management at the receiver for supporting multimedia streaming services over unreliable wireless links. On one hand, memory is a precious resource on portable wireless devices and must be used judiciously. On the other hand, allocating a big playout buffer to a media streaming application reduces the probability of a playout interruption/freeze due to buffer underflow, thus improving the user's perceived experience. Thus, inherent in the wireless media streaming scenario is a tradeoff between memory utilization and quality-of-service (QoS) delivered to the application layer. We study this buffer-QoS tradeoff in a dynamic programming (DP) framework. Within the same setting, we also address the issue of determining the optimal rebuffering level when the buffer underflows. Leveraging closed form solutions obtained for a special case of our formulation, we propose BuM, a very low complexity heuristic dynamic buffer management algorithm. The near optimality of BuM, as demonstrated by experimental results, in conjunction with its ease of implementation, makes it an attractive option from a practical perspective.

Index Terms— Multimedia streaming, QoS, wireless networks, buffer management, dynamic programming.

I. INTRODUCTION

Next generation wireless networks promise to deliver a variety of multimedia services to subscribers. Multimedia streaming in particular is projected to be a key application in the near future. However, streaming high quality media to mobile wireless users with portable devices is fraught with technical challenges. These challenges are attributed to the high bandwidth and strict delay/jitter requirements of multimedia applications, the inherently unreliable nature of the wireless medium, and the limited processing power and memory available on portable wireless devices. Significant research effort has been devoted in recent years toward issues related to media streaming over wireless links, ranging from content encoding at the sender to adaptive playout schemes at the receiver. See [1], [2] and references therein for an overview.

On the receiver side, the primary objective is to ensure uninterrupted media playout, or equivalently, to ensure that the playout buffer which stores incoming media packets does not underflow, in spite of the unreliability of the wireless channel over which these packets are delivered. An empty playout buffer causes a playout freeze or interruption and degrades the user's multimedia experience. Frequent playout freezes are prevented by buffering packets in advance at the receiver as well as through adaptive playout schemes. Kalman et al. [3] proposed an adaptive media playout algorithm (AMP), which varies the playout speed of media frames depending on channel conditions, allowing the user to

buffer less data, thus introducing less delay, for a given buffer underflow probability. Li et al. adopted a similar approach in [4] and studied the adaptive playout rate control problem in conjunction with transmit power control in a dynamic programming setting. Stockhammer et al. analytically showed in [5] that if the video source characteristics and channel behavior are known a priori at the transmitter, then there exist minimum values for initial playout delay and decoder buffer size which guarantee successful playout. In recent work, Liang and Liang [6] proposed an analytical framework to find the distribution of the number of playout freezes while streaming a video over randomly varying channels.

In the works cited above, the main focus was on playout management at the receiver in order to reduce the frequency of interruptions in playout and/or reduce initial playout delay. However, memory is a premium resource on portable wireless devices, which places a stringent limit on the number of media packets which can be buffered in advance. Moreover, this limited memory is shared by multiple applications running concurrently. Thus, memory usage must be explicitly modeled and accounted for when optimizing the performance of multimedia streaming over wireless links. This is the primary motivation for our work.

A small playout buffer results in frequent playout freezes, or in other words, poor quality-of-service (QoS) for the multimedia application. On the other hand, a large playout buffer ensures smoother playout, but potentially at the expense of starving other applications of memory space. Thus, inherent in the wireless media streaming scenario is a buffer-QoS tradeoff, which we quantify in this paper. To the best of our knowledge, this tradeoff has not been examined in the existing literature.

In our approach, the evolution of a user is split into two alternating phases — the playout phase and the rebuffering phase. In the playout phase, the user has media packets readily available in the buffer for playout. Since memory is an expensive resource, the user incurs a "cost" for storing packets locally, which deters her from buffering too many packets. The playout phase terminates whenever the buffer underflows for the first time (resulting in a playout freeze). Our objective is to maximize the expected duration of a typical playout phase, while keeping buffering costs small. Given the playout buffer level and downlink wireless channel state, the user, in every time-slot, must select one of two conflicting options — request packet(s) from the source (to reduce the probability of a freeze in the near future, thereby elongating the expected duration of the current playout phase), or wait until some

Fig. 1. Downlink multimedia streaming over a time-varying wireless link. (The original figure shows an access point feeding variable rate arrivals over an unreliable wireless channel into the playout buffer of a wireless receiver, which drains it at a constant playout rate.)

Fig. 2. Playout phase and rebuffering phase. (The original figure plots the backlog of the playout buffer against time, with playout phases separated from rebuffering phases by playout freezes.)

more buffer content has been played out (to reduce buffering costs). We study this dilemma in a dynamic programming (DP) framework [7] and establish the optimality of a threshold type policy.

The end of a playout phase marks the beginning of a rebuffering phase. In this phase, playout is stalled and the user continues to request packets from the source until the buffer level has reached a certain preset level. A new playout phase commences thereafter, and the cycle repeats. We leverage properties of the optimal control of the DP formulated for the playout phase to determine the optimal rebuffering level.

In our formulation, the downlink wireless channel is modeled as a finite-state Markov chain. For a special case of this model, viz. a Bernoulli channel, we compute the optimal solution in closed form. The closed form solution enables us to design a very low complexity heuristic buffer management algorithm, namely BuM, which closely matches the performance of the optimal DP based solution, as corroborated by experimental results.

The remainder of this paper is organized as follows: The buffer management problem is described precisely in Section II, where a mathematical model is constructed to study the problem in an analytical framework. The optimal solution to the problem is presented in Section III, with special emphasis on the Bernoulli channel model. The optimal solution is then leveraged to devise a simple buffer management algorithm (BuM) in Section IV. The performance of BuM is experimentally evaluated in Section V. Finally, concluding remarks and directions for future work are furnished in Section VI.

II. MODEL CONSTRUCTION AND PROBLEM FORMULATION

Consider the downlink of a time-slotted wireless communication system (Fig. 1). We focus on a particular user receiving a multimedia stream from the access point (AP). We assume that the media stream is either pre-cached at the AP or can be accessed on demand from a backbone network over a wired link, which operates much faster than the wireless link from the AP to the user. The user "consumes" (plays out) received packets at a fixed (normalized) rate of one packet per time-slot. The user alternates between two possible phases — the playout phase and the rebuffering phase (see Fig. 2).

A. Playout phase

During the playout phase, at the beginning of every time-slot, the user either requests a fixed amount of data from the AP or does not request any data at all. We refer to these two actions as REQ and NO-REQ, respectively. The user chooses

her action based on the current buffer occupancy b ∈ N and the state i of the downlink wireless channel. We model the channel state evolution as a homogeneous finite-state Markov chain (FSMC) with I states, indexed by I. Each state i ∈ I maps to a unique success probability through the mapping s : I → [0, 1]. The state transitions of the FSMC are governed by the transition matrix P = [Pij]. Thus, if the current channel state is i, the probability of successful transmission (if the user requests data) is s(i), and the channel state transitions to j ∈ I w.p. Pij in the next time-slot.

Remark: Apart from channel conditions, s(·) depends on technology specific parameters such as modulation, coding, receiver structure, etc. Since we are interested in packet level performance (rather than bit level performance), we abstract all details of wireless channel behavior and physical layer characteristics into the success probability function, rather than modeling them explicitly.

Every time the user requests data (REQ), the AP transmits a fixed number K ≥ 1 of packets on the downlink. Given current channel state i, the transmission is successful with probability (w.p.) s(i) and the backlog changes from its current level b to b + K − 1 (K packets received, 1 packet consumed). The transmission fails w.p. 1 − s(i) and the backlog reduces from b to b − 1. If the user does not request data (NO-REQ), the backlog reduces from b to b − 1 w.p. 1. A backlog cost of c > 0 units per packet per time-slot is incurred by the user (equivalently, a negative reward rate of −c). Thus, if data is requested when the backlog is b and the channel state is i, a reward of −c(b + K − 1) is earned w.p. s(i) and a reward of −c(b − 1) is earned w.p. 1 − s(i). If data is not requested, a reward of −c(b − 1) is earned w.p. 1. The cost rate c captures the buffer-QoS tradeoff discussed in Section I. A small c encourages the user to request data from the AP more aggressively and fill up the playout buffer more quickly, and vice-versa.

Starting with a non-empty buffer, our goal is to compute the optimal sequence of user actions (REQ or NO-REQ) which jointly maximizes the expected time to the first freeze and minimizes buffering costs. To realize this objective, a fixed positive reward of 1 unit is earned by the user in every time-slot during the playout phase. This positive reward is offset by the negative backlog reward discussed above.
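To make these per-slot dynamics concrete, the following sketch (a Python illustration with hypothetical helper names, not part of the original formulation) simulates one time-slot of the playout phase.

```python
import random

def playout_step(b, s_i, K, c, request):
    """One time-slot of the playout phase.

    b: current backlog, s_i: success probability s(i) of the current
    channel state, K: packets per successful request, c: backlog cost
    rate, request: True for REQ, False for NO-REQ.
    Returns (next backlog, reward earned in this slot).
    """
    if request and random.random() < s_i:
        b_next = b + K - 1   # K packets received, 1 played out
    else:
        b_next = b - 1       # nothing received, 1 played out
    # Fixed playout reward of 1, offset by the backlog cost at the
    # resulting level, i.e., 1 - c*b_next, per the model above.
    return b_next, 1 - c * b_next
```

The playout phase ends at the first slot for which the returned backlog is zero.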

B. Rebuffering phase

Once the buffer underflows, playout is stalled and the user rebuffers b0 (a design parameter) packets before resuming media playout. A large b0 implies a long rebuffering phase, which is annoying to the user because she has to wait a long time for playout to resume. On the other hand, a short rebuffering phase is not desirable either, because starting the playout phase with a small playout buffer reduces the expected passage time to the next freeze. The rebuffering phase is alluded to in the literature as the "initial playout delay" (e.g., see [5]).

During this phase, the user requests data from the AP in every time-slot. In channel state i, a request is successful w.p. s(i) and fails w.p. 1 − s(i). Thus, the backlog increases from b to b + K w.p. s(i), and stays the same otherwise. The phase terminates as soon as b ≥ b0. No backlog cost is incurred during the rebuffering phase. However, a waiting cost of ζ > 0 units per time-slot is incurred during this phase (equivalently, a negative reward rate of −ζ). A terminal reward is earned upon completion of this phase, which depends on b0 as well as the optimal control policy in the (next) playout phase. Our goal is to compute the optimal choice of b0 which maximizes the reward earned during the rebuffering phase.

III. OPTIMAL CONTROL

A. Playout phase — DP formulation

The optimal actions of the user in the playout phase can be computed using the methodology of dynamic programming. To this end, define the state of the system (in the current time-slot) by the two-tuple (b, i). We use the term policy to denote a set of user actions as a function of the system state. More formally, a policy π is a mapping π : N × I → {REQ, NO-REQ}. Let Π denote the set of all admissible policies. Recall that our goal is to determine a policy which maximizes the expected time elapsed before the buffer empties for the first time, subject to a (negative) backlog reward in every time-slot. Let π⋆ ∈ Π denote the optimal policy which achieves the desired objective, and let V(b, i) denote the expected reward earned by π⋆, starting the system in state (b, i). Then V(b, i), called the value function in DP terminology, is computed from the following recursive, non-linear Bellman equations:

V(b, i) = max{ Σ_{j∈I} Pij [V(b−1, j) + 1 − c(b−1)],                       (NO-REQ)
               s(i) Σ_{j∈I} Pij [V(b+K−1, j) + 1 − c(b+K−1)]               (REQ, successful transmission)
               + (1 − s(i)) Σ_{j∈I} Pij [V(b−1, j) + 1 − c(b−1)] }         (REQ, failed transmission)
                                                          ∀ i ∈ I,          (1)

along with the boundary conditions V(0, i) = 0 ∀ i ∈ I.

B. A special and insightful case — The Bernoulli channel

The DP equations in (1) do not admit a closed form solution and therefore must be solved numerically using techniques like value iteration or policy iteration [7]. The numerical solution provides limited insight into the fundamental buffer-QoS tradeoff embodied by the problem. However, a special and insightful case of the above general formulation is more amenable to analysis. This special case is that of a Bernoulli channel, where each requested transmission is successful w.p. s and fails w.p. 1 − s, independently of all other events.

Remark: Apart from being amenable to analysis, the Bernoulli channel model is of independent interest in the context of static channel scenarios (e.g., a fixed user connected to a wireless LAN), where the channel either stays constant or varies over a time-scale significantly larger than the time-scale of buffer dynamics.

For this special case, I = {0} and s(0) ≡ s, i.e., there is only one channel state, and the two-dimensional system state collapses to a single dimension. The DP equations (1) for this case simplify to:

V(b) = max{0, s [V(b+K−1) − V(b−1) − cK]} + V(b−1) + 1 − c(b−1),   (2)

with V(0) = 0.
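A minimal value-iteration sketch for (1) is given below (an illustration, not the authors' code; the buffer is truncated at a hypothetical bound b_max, which is an approximation outside the model).

```python
import numpy as np

def value_iteration(P, s, K, c, b_max, iters=5000):
    """Approximate the value function V(b, i) of Eq. (1).

    P: I x I channel transition matrix, s: success probabilities s(i),
    K: packets per request, c: backlog cost rate. Rows of V are indexed
    by backlog b; V[0, :] = 0 is the boundary condition.
    """
    P, s = np.asarray(P, float), np.asarray(s, float)
    V = np.zeros((b_max + 1, len(s)))
    for _ in range(iters):
        Vn = np.zeros_like(V)
        for b in range(1, b_max + 1):
            up = min(b + K - 1, b_max)  # backlog after a successful REQ (clamped)
            no_req = P @ (V[b - 1] + 1 - c * (b - 1))
            req = s * (P @ (V[up] + 1 - c * up)) + (1 - s) * no_req
            Vn[b] = np.maximum(no_req, req)
        V = Vn
    return V
```

With a single channel state (P = [[1.0]], s = [s]) this reduces to (2), and recording which term attains the maximum at each b recovers the threshold structure established next.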

Quite remarkably, the optimal policy in this case can be computed in closed form!

Theorem 1 (Optimality of a threshold policy): The optimal policy under the Bernoulli channel model is of threshold type, i.e., ∃ b⋆ such that π⋆ chooses action REQ ∀ b ≤ b⋆ and NO-REQ ∀ b > b⋆. In particular, b⋆ = ⌊1/c + (1−K)/2⌋.

Sketch of Proof: The proof follows directly by combining the results of Lemma 1 and Lemma 2, presented below.

Remark: Even though we do not provide a proof here, the optimal policy under the FSMC channel model is also of threshold type, with a different threshold b⋆(i) in each channel state i. The optimal policy chooses action REQ in state (b, i) if b ≤ b⋆(i), and chooses action NO-REQ else. While the key ideas involved in the proof are similar to those used in proving Theorem 1, the algebra is (much) more cumbersome. Further, unlike Theorem 1, it is not possible to compute the thresholds in closed form, even for a simple two-state channel model.

C. Policy iteration

We need the notion of policy iteration to prove Theorem 1. Policy iteration is a widely used numerical technique for solving DP equations. The policy iteration algorithm comprises two steps — policy evaluation and policy improvement. We briefly describe the two steps in the context of our problem. In the policy evaluation step, starting with any policy π ∈ Π, its expected reward Vπ(b) is evaluated using

Vπ(b) = Vπ(b−1) + 1 − c(b−1)                                  if π(b) = NO-REQ,
Vπ(b) = sVπ(b+K−1) + (1−s)Vπ(b−1) + 1 − c(b−1+sK)             if π(b) = REQ,      (3)

with Vπ(0) = 0. Equation (3) is a set of linear equations which can be solved efficiently. In the policy improvement step, π is "improved" to generate a new policy π′ ∈ Π using

Vπ′(b) = max{0, s [Vπ(b+K−1) − Vπ(b−1) − cK]} + Vπ(b−1) + 1 − c(b−1).   (4)

π′ is called a one-step improvement of π since it is guaranteed that Vπ′(b) ≥ Vπ(b) ∀ b.
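For the Bernoulli case, the two steps can be sketched as follows (illustrative only; the evaluation step assembles and solves the linear system (3) on a truncated buffer range, with backlogs above a hypothetical bound b_max clamped).

```python
import numpy as np

def policy_evaluation(policy, s, K, c, b_max):
    """Solve the linear equations (3) for V_pi on 0..b_max, with V_pi(0) = 0.

    policy[b] is True for REQ, False for NO-REQ.
    """
    n = b_max + 1
    A, r = np.eye(n), np.zeros(n)
    for b in range(1, n):
        if policy[b]:
            up = min(b + K - 1, b_max)   # clamped successor (truncation assumption)
            A[b, up] -= s
            A[b, b - 1] -= 1 - s
            r[b] = 1 - c * (b - 1 + s * K)
        else:
            A[b, b - 1] -= 1
            r[b] = 1 - c * (b - 1)
    return np.linalg.solve(A, r)

def policy_improvement(V, s, K, c, b_max):
    """One-step improvement per (4): REQ wherever the bracketed gain is positive."""
    return [b >= 1 and s * (V[min(b + K - 1, b_max)] - V[b - 1] - c * K) > 0
            for b in range(b_max + 1)]
```

Starting from the never-request policy and applying one improvement step reproduces the threshold of Lemma 2 below.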

Lemma 1: If policy π is a threshold type policy, then its one-step improvement π′ computed using (3) and (4) is also a threshold type policy. Moreover, the threshold for π′ is located at b⋆π′ = ⌊1/c + (1−K)/2⌋, independent of π.

Sketch of Proof: See Appendix VII-A.

Lemma 1 states that a one-step improvement of any threshold type policy π ∈ Π results in another threshold type policy π′, whose threshold b⋆π′ is independent of π. The implication is that starting with a threshold type policy, the policy iteration algorithm converges in just one step (to the optimal threshold policy). All we need to do now is find an initial policy π which is of threshold type. To this end, consider the policy π̄, under which the user never requests data from the AP, i.e., π̄(b) = NO-REQ ∀ b. First, we perform the policy evaluation step to compute the expected reward Vπ̄(b) under policy π̄. Since the action NO-REQ is chosen by π̄ in all states, we have

Vπ̄(b) = Vπ̄(b−1) + 1 − c(b−1) ∀ b > 0.   (5)

Recursively, we obtain

Vπ̄(b) = −cb²/2 + (1 + c/2) b.   (6)

Next, we compute a one-step improvement of policy π̄ to obtain a new policy π̄′.

Lemma 2: The policy π̄′ obtained by one-step improvement of policy π̄ is a threshold type policy with threshold b⋆π̄′ = ⌊1/c + (1−K)/2⌋.

Sketch of Proof: See Appendix VII-B.

Remark: Note that Lemma 2 can be treated as a special case of Lemma 1 by considering π̄ as a threshold type policy with its threshold at b⋆π̄ = 0. However, establishing the two lemmas separately is a more general approach (used to establish optimality of a threshold type policy under the general FSMC channel model) and is also more insightful.

Remark: The statement of Theorem 1 follows easily from Lemmas 1 and 2. It is indeed interesting to observe that the threshold b⋆ of the optimal policy is independent of the success probability s (which captures the channel state). Intuition is unlikely to suggest such a result at first glance!
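This channel independence can be checked numerically with the policy-iteration sketch above (assumed helper names; an illustrative check, not a proof): the improved threshold does not move as s varies.

```python
# One-step improvement of the never-request policy for several success
# probabilities s; the threshold stays at floor(1/c + (1-K)/2) (Lemma 2).
import math

K, c, b_max = 2, 0.1, 50
for s in (0.2, 0.5, 0.9):
    pi_bar = [False] * (b_max + 1)   # NO-REQ everywhere
    V = policy_evaluation(pi_bar, s, K, c, b_max)
    improved = policy_improvement(V, s, K, c, b_max)
    threshold = max(b for b in range(1, b_max + 1) if improved[b])
    assert threshold == math.floor(1 / c + (1 - K) / 2)   # = 9 here
```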

D. Rebuffering phase — Optimal rebuffering level

As discussed in Section II-B, playout is stalled during the rebuffering phase and the user requests data from the AP in every time-slot. This phase lasts until the buffer occupancy reaches level b0, a design parameter. We will leverage properties of the optimal policy π⋆ for the playout phase to determine b⋆0, the optimal choice for b0. More concretely, if the rebuffering phase ends when the buffer occupancy reaches level b0 and the channel state is i ∈ I, a phase termination reward of V(b0, i) is earned, where V is the value function for the DP associated with the playout phase, as computed in (1). The terminal reward is offset by the waiting cost incurred at a rate ζ per time-slot. In a typical time-slot during the rebuffering phase, if the channel state is

i, the buffer occupancy increases by K w.p. s(i) and remains constant w.p. 1 − s(i). Let J(b, i) denote the expected reward earned until the end of the rebuffering phase, starting with b packets in the buffer and channel state i. Clearly, we are interested in computing (and maximizing as a function of b0) J(0, i) ∀ i ∈ I, subject to J(b0, i) = V(b0, i) ∀ i ∈ I. We have the following recursion ∀ b < b0:

J(b, i) = s(i) Σ_{j∈I} Pij J(b+K, j) + (1 − s(i)) Σ_{j∈I} Pij J(b, j) − ζ.   (7)

Since the buffer level may exceed b0 (by up to K − 1) if K > 1, we also impose J(b, i) = V(b, i) ∀ b > b0, i ∈ I. Our objective is to choose b0 such that min_{i∈I} J(0, i) is maximized. Alternatively, we can choose b0 to maximize the weighted average Σ_{i∈I} λi J(0, i) for some λi ≥ 0, Σ_{i∈I} λi = 1.
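A sketch of this computation (illustrative; V is the playout-phase value function of (1), e.g., as returned by the value-iteration sketch above): for each candidate b0, solve (7) backward from the terminal condition, then pick the maximizer of the worst-case reward.

```python
import numpy as np

def rebuffer_reward(V, P, s, K, zeta, b0):
    """Compute J(0, i) of Eq. (7) for a candidate rebuffering level b0.

    V: playout-phase value function (shape (b_max+1, I), terminal reward),
    P: channel transition matrix, s: success probabilities, zeta: waiting
    cost rate. Requires b0 + K <= b_max.
    """
    P, s = np.asarray(P, float), np.asarray(s, float)
    J = np.array(V, dtype=float)               # J(b, i) = V(b, i) for b >= b0
    A = np.eye(len(s)) - (1 - s)[:, None] * P  # J(b, .) appears on both sides of (7)
    for b in range(b0 - 1, -1, -1):            # solve (7) backward in b
        J[b] = np.linalg.solve(A, s * (P @ J[b + K]) - zeta)
    return J[0]

def best_rebuffer_level(V, P, s, K, zeta):
    """Pick b0 maximizing the worst-case reward min_i J(0, i)."""
    b_max = len(V) - 1
    return max(range(1, b_max - K + 1),
               key=lambda b0: rebuffer_reward(V, P, s, K, zeta, b0).min())
```

Replacing the `.min()` with a λ-weighted average implements the alternative objective.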

Remark: A more "complicated" formulation assigns each channel state a different rebuffering level. Thus, the rebuffering phase would end when the backlog reaches b⋆0(i), if the channel state at the beginning of the rebuffering phase was i.

The Bernoulli channel case: The recursion in (7) does not admit a closed form solution, just like its counterpart in (1). We therefore once again turn to the Bernoulli channel model for some explicit and insightful computations.

Lemma 3: For the Bernoulli channel model with K = 1, b⋆0 = ⌈1/c − s + (ζ/c)(1 − 1/s)⌉.

Sketch of Proof: See Appendix VII-C.

Note that b⋆0 is a decreasing function of ζ, as intuition suggests. Unfortunately, we cannot obtain closed form expressions for the general case of K > 1.

Remark: In closely related work, Li et al. [4] studied rebuffering in a DP framework, in the context of adaptive playout control. They argued the existence of a threshold policy, where the user stalls playout and rebuffers packets if the playout buffer occupancy is below a certain threshold, and resumes media playout else. By splitting the evolution of the user into two alternating phases, we imposed this thresholding behavior a priori. We then went on to compute the optimal rebuffering threshold for a special case of our model.

IV. A HEURISTIC BUFFER MANAGEMENT ALGORITHM

Based on key properties of the optimal control under the Bernoulli channel model studied in Section III, we now propose BuM, a low complexity heuristic dynamic buffer management algorithm. BuM operates as follows: (a) During the playout phase, set the threshold b⋆ as dictated by Theorem 1 (regardless of the channel conditions). (b) During the rebuffering phase, given the instantaneous channel success probability s, choose the rebuffering level based on Lemma 3 (regardless of K).

Note that BuM has very low implementation complexity since it does not explicitly need to solve any DP equations. In contrast, the optimal buffer management algorithm (OPT) bases its decisions on solutions to the recursive equations in (1) and (7), and thereby has a significantly higher computational complexity.
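A sketch of BuM as a per-slot controller follows (hypothetical interface; the estimate of the instantaneous success probability s would come from whatever channel tracking the receiver already performs).

```python
import math

class BuM:
    """Heuristic buffer manager of Section IV (illustrative sketch)."""

    def __init__(self, K, c, zeta):
        self.K, self.c, self.zeta = K, c, zeta
        # Playout-phase threshold from Theorem 1 (channel independent).
        self.b_star = math.floor(1 / c + (1 - K) / 2)

    def playout_action(self, b):
        """REQ iff the backlog is at or below the Theorem 1 threshold."""
        return b <= self.b_star

    def rebuffer_level(self, s):
        """Rebuffering target from Lemma 3, applied regardless of K.

        The max(1, ...) guard (an assumption, not from the paper) keeps
        the target positive when the Lemma 3 expression dips below 1.
        """
        return max(1, math.ceil(1 / self.c - s
                                + (self.zeta / self.c) * (1 - 1 / s)))
```

In each playout slot the receiver calls playout_action(b); after a freeze it rebuffers up to rebuffer_level(s) before resuming.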

Fig. 3. Experiment A: Buffer-QoS tradeoff for different values of α. (Two plots: average backlog bave vs. playout freeze frequency fF, and bave vs. fraction of time spent in rebuffering phase ηR, each showing BuM and OPT curves for α = 0.1, 0.25, 0.5.)

However, as we show experimentally in Section V, BuM closely matches the performance of OPT, while retaining the advantage of low complexity.

V. PERFORMANCE EVALUATION

In this section, we experimentally evaluate the performance of our proposed buffer management policy, BuM. We focus on the buffer-QoS tradeoff for a single link. We investigate the performance of BuM as a function of different system parameters and contrast it with the optimal control policy OPT. For our experiments, the downlink wireless channel was modeled as a two-state Markov chain with states LO and HI, corresponding to success probabilities sLO and sHI respectively (sLO < sHI). The evolution of the Markov chain was governed by a transition matrix of the form:

P = [ 1−α    α
       α    1−α ].   (8)

We report two sets of experiments. In the first set, α and c were varied, keeping other parameters fixed. In the second set, sLO and c were varied, keeping other parameters fixed. All fixed parameters and their respective values are enumerated in Table I.

Parameter                                   Value
Length of simulation (N)                    50,000 time-slots
Maximum buffer size (bmax)                  20 packets
Waiting cost rate (ζ)                       0.1 per time-slot
Success probability in HI state (sHI)       0.9
Packets per time-slot (K)                   2

TABLE I. Fixed simulation parameters and their values.

We considered three performance metrics — average buffer occupancy (bave), frequency of playout freezes, i.e., number of freezes per second (fF), and the fraction of time spent in the rebuffering phase (ηR). We would like all three metrics to attain values as small as possible. Note that these are competing objectives.
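For reference, all three metrics can be computed from a simulation trace as follows (a sketch; `trace` is a hypothetical per-slot record, and the slot duration is taken as the time unit, so fF here is freezes per slot).

```python
def metrics(trace):
    """Compute (b_ave, f_F, eta_R) from a per-slot simulation trace.

    trace: list of (backlog, in_rebuffering) tuples, one per time-slot.
    """
    N = len(trace)
    b_ave = sum(b for b, _ in trace) / N
    # A freeze occurs at each transition from playout into rebuffering.
    freezes = sum(1 for k in range(1, N)
                  if trace[k][1] and not trace[k - 1][1])
    eta_R = sum(1 for _, r in trace if r) / N
    return b_ave, freezes / N, eta_R
```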

Fig. 4. Experiment B: Buffer-QoS tradeoff for different values of sLO. (Two plots: average backlog bave vs. playout freeze frequency fF, and bave vs. fraction of time spent in rebuffering phase ηR, each showing BuM and OPT curves for sLO = 0.1, 0.3, 0.5.)

For each set of experiments, we generated tradeoff curves bave vs. fF and bave vs. ηR by varying c from 0.05 to 0.25, in steps of 0.05. A small value of c results in a large bave and small fF and ηR, and vice-versa.

A. Experiment A — varying α

The results are depicted in Fig. 3. The top plot shows bave vs. fF, while the lower plot shows bave vs. ηR. Results for three different values of α are reported. A small value of α corresponds to a slower variation in channel conditions, and vice-versa. Note that α only determines the rate of channel fluctuations; the average success probability is the same for all three choices of α (equal to 0.5(sLO + sHI) = 0.6). Observe that both BuM and OPT yield similar performance for all three choices of α. The performance of both BuM and OPT improves with α because the channel visits its good state (HI) more frequently.

B. Experiment B — varying sLO

The results are depicted in Fig. 4. The top plot shows bave vs. fF, while the lower plot shows bave vs. ηR. For this experiment, α was fixed at 0.1, while the average channel conditions were varied by changing sLO. Not surprisingly, the performance of both BuM and OPT improves as sLO increases. Observe that, once again, the two perform quite similarly to each other.

We also conducted several other experiments by varying K and ζ, and with asymmetric channel transition rates. In each of these cases, BuM closely matched the performance of the optimal policy OPT. The experimental results conclusively demonstrate that BuM provides an easily tunable buffer-QoS tradeoff for wireless media streaming. Moreover, BuM delivers near optimal performance (relative to the optimal DP based solution) at very low complexity.

VI. CONCLUSIONS

This paper addressed the problem of optimal buffer management on portable wireless devices for supporting multimedia streaming applications. The basic premise of the study was the fact that memory is an expensive resource on portable devices and hence must be used judiciously. On the other hand, allocating a large playout buffer for media streaming reduces the probability of buffer underflows (playout freezes), thereby improving application layer QoS. The optimal tradeoff between these two competing objectives was studied in a dynamic programming framework. Closed form solutions derived for a special case (the Bernoulli channel model) were leveraged to devise BuM, a low complexity, near optimal heuristic buffer management algorithm. The key finding of this paper is that "simple" rules of thumb for buffer management can be used to optimize the performance of multimedia streaming over unreliable wireless links, precluding the need for any complicated online computations. Ongoing research involves a study of buffer management in a multiuser scenario, where a user's request for data may not be served immediately due to resource contention at the AP.

REFERENCES

[1] J.G. Apostolopoulos, W. Tan, and S.J. Wee, "Video streaming: concepts, algorithms, and systems", Technical Report HPL-2002-260, HP Laboratories, Sep. 2002.
[2] M. Etoh and T. Yoshimura, "Advances in wireless video delivery", Proceedings of the IEEE, vol. 93, no. 1, pp. 111-122, Jan. 2005.
[3] M. Kalman, E. Steinbach, and B. Girod, "Adaptive media playout for low-delay video streaming over error-prone channels", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 6, pp. 841-851, Jun. 2004.
[4] Y. Li, A. Markopoulou, N. Bambos, and J.G. Apostolopoulos, "Joint power-playout control for media streaming over wireless links", IEEE Transactions on Multimedia, vol. 8, no. 4, pp. 830-843, Aug. 2006.
[5] T. Stockhammer, H. Jenkac, and G. Kuhn, "Streaming video over variable bit-rate wireless channels", IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 268-277, Apr. 2004.
[6] G. Liang and B. Liang, "Balancing interruption frequency and buffering penalties in VBR video streaming", Proceedings of IEEE INFOCOM, to appear, 2007.
[7] D. Bertsekas, Dynamic Programming and Optimal Control, vols. 1 & 2, 2nd ed., Athena Scientific, 2000.

VII. APPENDIX

A. Sketch of proof of Lemma 1

Sketch of Proof: For clarity of exposition and due to limited space, we present the proof only for the case K = 1. The proof for K > 1 follows along similar lines, albeit with more cumbersome algebra. Note that for K = 1, b⋆π′ = ⌊1/c⌋.

Consider a policy π which is of threshold type, with threshold b⋆π. Thus, under policy π, the user requests data from the AP whenever her backlog b ≤ b⋆π, and does not request data else. We first perform the policy evaluation step to compute the expected reward Vπ(b) for policy π. For any b > b⋆π, policy π chooses action NO-REQ, implying

Vπ(b) = Vπ(b−1) + 1 − c(b−1) ∀ b > b⋆π.   (9)

Recursively, we obtain

Vπ(b) = Vπ(b⋆π) + (b − b⋆π) [1 − (c/2)(b + b⋆π − 1)] ∀ b > b⋆π.   (10)

For any b ≤ b⋆π, policy π chooses action REQ, implying

Vπ(b) = Vπ(b−1) − cb/(1−s) + c + 1/(1−s) ∀ 1 ≤ b ≤ b⋆π.   (11)

Recursively, we obtain

Vπ(b) = [−cb² + (c + 2(1 − sc)) b] / (2(1−s)) ∀ 1 ≤ b ≤ b⋆π.   (12)

Now we compute policy π′, which is a one-step improvement of policy π. Setting K = 1 in (4), we have

Vπ′(b) = max{0, s [Vπ(b) − Vπ(b−1) − c]} + Vπ(b−1) + 1 − c(b−1).   (13)

Three possible cases can arise, namely, b > b⋆π + 1, b = b⋆π + 1, and b < b⋆π + 1. For these three cases, it follows from (10) and (12) that

Vπ(b) − Vπ(b−1) = −cb + 1 + c              for b > b⋆π + 1,
Vπ(b) − Vπ(b−1) = −cb⋆π + 1                for b = b⋆π + 1,        (14)
Vπ(b) − Vπ(b−1) = (−cb + 1)/(1−s) + c      for b < b⋆π + 1.

Note from (13) that the decision of policy π′ in state b is determined by the sign of Vπ(b) − Vπ(b−1) − c. It then easily follows from (14) that π′ is a threshold policy with threshold b⋆π′ = ⌊1/c⌋. Since π was chosen arbitrarily, the desired result follows (for K = 1).

B. Sketch of proof of Lemma 2

Sketch of Proof: Setting π = π̄ in (4) and substituting from (6),

Vπ̄′(b) = max{0, sK [−cb + 1 + c(1−K)/2]} − cb²/2 + (1 + c/2) b.   (15)

Thus, π̄′ chooses action REQ if −cb + 1 + c(1−K)/2 ≥ 0, and action NO-REQ else. Rearranging terms, it follows that π̄′ is a threshold type policy with its threshold at b⋆π̄′ = ⌊1/c + (1−K)/2⌋, as desired.

C. Sketch of proof of Lemma 3

Sketch of Proof: In this case, (7) reduces to

J(b) = sJ(b+1) + (1−s)J(b) − ζ.   (16)

It is now easily seen that J(0) = V(b0) − ζb0/s, where V(·) is given by (2). It follows that, as a function of b0, J(0) achieves its maximum value at b⋆0 = min{b0 : V(b0+1) − V(b0) < ζ/s}. Now, Theorem 1 and Lemma 1 imply that V(b0+1) − V(b0) is a monotone decreasing and piecewise linear function of b0. Also, for b0 > 1/c, necessarily V(b0+1) − V(b0) < 0. The implication is that b⋆0 < 1/c, the threshold of the optimal policy in the playout phase. For b0 < 1/c, it follows from (14) that V(b0+1) − V(b0) = [−c(b0+1) + 1]/(1−s) + c. Thus, by definition, b⋆0 = ⌈1/c − s + (ζ/c)(1 − 1/s)⌉, as desired.
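The closed form (as reconstructed above) can be checked against the definition numerically (a sketch, assuming V has been computed from (2), e.g., by the value-iteration sketch in Section III with a single channel state; the two values should agree up to truncation effects).

```python
import math

def check_lemma3(V, s, c, zeta):
    """Compare b0* = min{b0 : V(b0+1) - V(b0) < zeta/s} with the closed form.

    V: value function of (2) for K = 1, indexed by backlog. The closed
    form is the Lemma 3 expression as reconstructed in this paper.
    """
    b0_def = next(b for b in range(len(V) - 1)
                  if V[b + 1] - V[b] < zeta / s)
    b0_form = math.ceil(1 / c - s + (zeta / c) * (1 - 1 / s))
    return b0_def, b0_form
```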
