The Adaptive Knapsack Problem with Stochastic Rewards

Taylan İlhan, Seyed M. R. Iravani, Mark S. Daskin
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208, USA

Abstract

Given a set of items with associated deterministic weights and random rewards, the Adaptive Stochastic Knapsack Problem (Adaptive SKP) maximizes the probability of reaching a predetermined target reward level when items are inserted sequentially into a capacitated knapsack before the reward of each item is realized. This model arises in resource allocation problems which permit or require sequential allocation decisions in a probabilistic setting. One particular application is in obsolescence inventory management. In this paper, the Adaptive SKP is formulated as a Dynamic Programming (DP) problem for discrete random rewards. The paper also presents a heuristic that mixes adaptive and static policies to overcome the "curse of dimensionality" in the DP. The proposed heuristic is extended to problems with Normally distributed random rewards. The heuristic can solve large problems quickly and its solution always outperforms a static policy. The numerical study indicates that a near-optimal solution can be obtained by using an algorithm with limited look-ahead capabilities.

1 Introduction

Since the seminal paper of Dantzig (1957), the knapsack problem (i.e., finding a subset of items that maximizes the total reward while satisfying the knapsack capacity constraint) and its extensions have been widely studied due to their practical applications. One such extension is the class of Stochastic Knapsack Problems (SKP) with random rewards. In the Static SKP (e.g., Henig 1990 and Morton and Wood 1998), all items are initially present and the solution is based on selecting a subset of items that maximizes the probability of reaching a predetermined target level for the total reward. In the Dynamic SKP (e.g., Slyke and Young, 2000; Kleywegt and Papastavrou, 2001), items with random rewards and/or random sizes arrive at the system over time, and the decision is whether to accept the arrived item (i.e., insert it into the knapsack) or to reject it after observing its actual reward and/or size.

This paper presents an adaptive version of the Stochastic Knapsack Problem (Adaptive SKP), which is defined as the problem of selecting items sequentially, one after another, with the objective of maximizing the probability of reaching a target reward level without exceeding the knapsack capacity. The probability distributions of the return levels for each selection are known; however, the actual return from a selection is only revealed after making that selection. The Adaptive SKP is similar to the Static SKP in that both problems have the same probabilistic objective function, and in both all items are present initially. Unlike the Static SKP, however, items are selected sequentially depending on the realizations of the random rewards in the previous stages. Another SKP that requires sequential decision making, with the objective of maximizing the

expected profit, is the Dynamic SKP. However, in the Dynamic SKP, the items arrive at the system one at a time and the reward realizations are observed before the accept/reject decision is made. Thus, the Dynamic SKP and the Adaptive SKP differ with respect to these assumptions. We believe that our study of the Adaptive SKP fills an important gap between the Static SKP and the Dynamic SKP.

The problem is motivated by obsolescence inventory management. When a manufacturing firm makes a significant design change that eliminates a product line, or replaces an old part with a newly designed part, the inventory of the old parts at the firm's suppliers becomes obsolete. Since the design change often happens before the contracts between the firm and its suppliers end, the firm is obligated to reimburse its suppliers for their obsolete inventory. The firm audits the inventory at the suppliers' sites before paying them their claimed amounts. In most cases, the actual reimbursement amount differs from the amount claimed by the supplier. A recovered claim is defined to be the difference between the value of the inventory claimed by the supplier and the actual reimbursement paid to the supplier. These recovered claims are random and their values are not known until the suppliers are visited. The firm must select which supplier (or set of suppliers) among many to audit in the next audit period. The objective is to make sure that a certain number of suppliers' claims are audited and a certain target recovery level is reached. After auditing a supplier and realizing the recovery amount, the supplier to be visited in the next period must be selected. The sequential decision of which supplier to audit next so as to increase the probability of reaching the target recovery level corresponds to the sequential decision of which item to select in the Adaptive SKP. The inventory audit problem is explained in greater detail in Online Appendix A, where other applications of the Adaptive SKP can also be found.

The benefit of an adaptive solution over a static one can be shown by a simple example. Suppose there are only 3 items to choose from and the target level is $3. The return on item 1 is deterministic and is $1. The return on item 2, however, is stochastic and is equally likely to be $1 or $2. Similarly, the return on item 3 is stochastic and is equally likely to be $0 or $2. We can select only 2 items. A simple inspection shows that an optimal static solution is to select items 1 and 2, for which the probability of reaching the target level is 0.5. On the other hand, an adaptive policy specifies item 2 as the first selection, since it stochastically dominates the other two items. Then, depending on its realization, the policy picks either item 1 or item 3. The resulting probability of reaching the target level of $3 under the adaptive policy is 0.75. As this example shows, there may be a significant increase in the optimal value of the objective function if one uses an adaptive solution instead of a static one.
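To make the gap concrete, the following short script enumerates this three-item example exhaustively. It is an illustrative sketch, not part of the paper's method; the item data and function names are ours. It reproduces the probabilities 0.5 and 0.75 quoted above.

```python
from itertools import combinations, product

# Three items; each maps to a list of (value, probability) outcomes.
items = {
    1: [(1, 1.0)],             # deterministic $1
    2: [(1, 0.5), (2, 0.5)],   # $1 or $2, equally likely
    3: [(0, 0.5), (2, 0.5)],   # $0 or $2, equally likely
}
TARGET, PICKS = 3, 2

def static_best():
    # Fix a 2-item subset up front; sum probability over joint outcomes.
    best = 0.0
    for subset in combinations(items, PICKS):
        p_hit = 0.0
        for outcome in product(*(items[i] for i in subset)):
            total = sum(v for v, _ in outcome)
            prob = 1.0
            for _, p in outcome:
                prob *= p
            if total >= TARGET:
                p_hit += prob
        best = max(best, p_hit)
    return best

def adaptive_best(remaining=frozenset(items), need=TARGET, picks_left=PICKS):
    # Expectimax: pick the item whose outcome-weighted success prob is highest.
    if need <= 0:
        return 1.0
    if picks_left == 0:
        return 0.0
    return max(
        sum(p * adaptive_best(remaining - {i}, need - v, picks_left - 1)
            for v, p in items[i])
        for i in remaining
    )

print(static_best())    # 0.5  (e.g., items 1 and 2)
print(adaptive_best())  # 0.75 (item 2 first, then item 1 or item 3)
```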

Dean et al. (2004) considered another class of the Adaptive SKP, in which the items are all present at the beginning and have random weights with known distributions and deterministic rewards. The problem is to find a policy that maximizes the total reward collected by inserting items one at a time until the capacity of the knapsack is exceeded. The Adaptive SKP is a special case of multi-stage stochastic programming, which is built on adaptivity in stochastic problems (Charnes and Cooper 1959, Prékopa 1995, and Birge and Louveaux 1997). These problems are generally difficult. We believe that this is the first study that analyzes adaptivity for the stochastic knapsack problem with a probabilistic objective function.

The most widely utilized tool for solving finite-horizon stochastic optimization problems is Stochastic Dynamic Programming (SDP). For a general treatment of this subject, we refer the reader to Bertsekas (1987). For some small problems, it is possible to obtain a closed-form solution by solving the Bellman equations directly. However, in many realistic problems, the large state space, especially in discrete optimization problems, makes this impossible. Therefore, several look-ahead approximation methods, such as the one presented in this paper, have been developed (see Bertsekas and Castanon (1999), Kusic and Kandasamy (2007), and Novoa and Storer (2009) for examples).

The remainder of the paper is organized as follows. Section 2 formulates a Dynamic Programming (DP) model for problems with discrete random rewards and provides analytical characteristics of the optimal solution for a special case of the Adaptive SKP with unit weights and an unlimited number of items of each type. Section 3 presents a heuristic method for problems with discrete rewards, which is guaranteed to yield a solution that is at least as good as the optimal static solution. Section 4 outlines lower and upper bounding schemes on the optimal solutions of the Adaptive SKP with Normally distributed rewards, together with a heuristic that integrates the previous heuristic method with a parametric solution algorithm. Finally, Section 5 presents a numerical study to analyze the performance of the heuristic algorithms and to identify the conditions under which the value of adaptivity is high.

2 Problem Description

Consider a knapsack with a capacity W > 0 and a set of $N_i$ items of each item type i ∈ N that can be placed in the knapsack. All items of type i ∈ N have the same deterministic weight $w_i$. Item j of type i ∈ N is associated with a random reward $X_i^j$. Random rewards within each type are i.i.d. (i.e., $X_i^j \sim X_i$), and random rewards across all types are independently distributed. The objective is to maximize the probability of reaching a target reward level R by sequentially inserting items into the knapsack. Suppose that the knapsack has already been partially filled with items $S_i$ for each i ∈ N, and let $x_i^j$ be the realization of $X_i^j$. Define (r, w, n) to be the state of the problem, where $r = R - \sum_{i\in N}\sum_{j\in S_i} x_i^j$ is the remaining target level, $w = W - \sum_{i\in N} |S_i| w_i$ is the remaining capacity, and n is a 1 × |N| vector whose ith entry, $n_i = N_i - |S_i|$, is the number of items of type i remaining in the system.

In the beginning of the problem, when no item has been selected, $\mathbf{n} = \mathbf{N} = [N_1, N_2, \ldots, N_{|N|}]$. The problem is to find an optimal policy that inserts items sequentially into the knapsack starting from the initial state (R, W, N). The optimal adaptive policy chooses the next item that maximizes P(r, w, n), the probability of reaching the target reward level at state (r, w, n). Assume that the reward distributions are discrete and that a type-i item can take on $C_i$ values, i ∈ N. Specifically, the probability that any item j of type i takes on the value $x_{ik}$ is $p_{ik}$, i.e., $p_{ik} = P(X_i^j = x_{ik})$, for $k = 1, 2, \ldots, C_i$ and i ∈ N. Also, define $e_i$ to be a row vector composed entirely of 0s, except for a 1 in the ith position. Let $P_{\max}(r, w, \mathbf{n})$ be the optimal objective function value of the Adaptive SKP when the system starts from state (r, w, n). The problem can be formulated as a dynamic programming problem as follows:

$$P_{\max}(r, w, \mathbf{n}) = \max_{i \in N} \Big\{ \sum_{k=1}^{C_i} P_{\max}(r - x_{ik},\, w - w_i,\, \mathbf{n} - e_i)\, p_{ik} \Big\} \qquad (1)$$

with the following boundary conditions:

$$P_{\max}(r, w, \mathbf{n}) = 0 \quad \text{if } w < 0, \text{ or } n_i < 0 \text{ for at least one } i, \text{ or } (r > 0 \text{ and } w = 0), \qquad (2)$$

$$P_{\max}(r, w, \mathbf{n}) = 1 \quad \text{if } r \le 0,\ w \ge 0,\ \mathbf{n} \ge 0. \qquad (3)$$
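For small instances, recursion (1) with boundary conditions (2)-(3) can be implemented directly with memoization. The following Python sketch is ours (the input format `types` is a hypothetical convenience, not from the paper) and is intended only to make the state transitions concrete.

```python
from functools import lru_cache

# Hypothetical input format: one entry per item type with its weight and
# discrete reward distribution as (value, probability) pairs.
types = [
    {"weight": 1, "outcomes": [(0, 0.8), (12, 0.2)]},
    {"weight": 1, "outcomes": [(0, 0.6), (8, 0.4)]},
]

@lru_cache(maxsize=None)
def p_max(r, w, n):           # n: tuple of remaining item counts per type
    if w < 0 or any(ni < 0 for ni in n) or (r > 0 and w == 0):
        return 0.0            # boundary condition (2)
    if r <= 0:
        return 1.0            # boundary condition (3): target already met
    best = 0.0
    for i, t in enumerate(types):
        if n[i] == 0 or t["weight"] > w:
            continue          # type exhausted, or the item does not fit
        n_next = n[:i] + (n[i] - 1,) + n[i + 1:]
        succ = sum(p * p_max(r - x, w - t["weight"], n_next)
                   for x, p in t["outcomes"])
        best = max(best, succ)   # recursion (1): best next selection
    return best

# Example: target 12, capacity 3, two copies of each type.
print(p_max(12, 3, (2, 2)))
```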

If the $x_{ik}$'s are all positive integers, then the complexity of the DP solution algorithm is $O(RW|N|C\prod_{i=1}^{|N|}(N_i+1))$, where $C = \max_{i\in N} C_i$, since the total number of states is $RW\prod_{i=1}^{|N|}(N_i+1)$ in the worst case and the number of calculations performed at each state is bounded by $|N|C$. Also, note that even if C = 2 and we have already decided which items to pick at each period (as in the static version), calculating the objective function value is #P-hard, as the problem reduces to that studied by Kleinberg et al. (1997).

The Adaptive SKP is analogous to the bounded integer knapsack problem, which reduces to the 0/1 knapsack problem when $N_i = 1$, ∀i ∈ N, and becomes the unbounded knapsack problem when $N_i = \infty$, ∀i ∈ N. Although the bounded version may not show any particular structure, some special cases of the unbounded version display interesting properties. For instance, if all weights are equal in the unbounded version of the Adaptive SKP (w.l.o.g. $w_i = 1$ ∀i ∈ N), the problem can be thought of as the sequential selection of W items from N item types. Since the number of available items of each type is unlimited, the quantity terms can be eliminated from the definition of the state (r, w, n), and we can simply use (r, w) as the state variable. As an example of the state space, consider the unbounded and equal-weight version of the Adaptive SKP with 4 item types. Their reward distributions are given in Figure 1, along with the corresponding optimal selections for each state.

item   x1   p1    x2   p2
1      0    0.8   12   0.2
2      0    0.6   8    0.4
3      0    0.4   5    0.6
4      0    0.1   2    0.9

Figure 1: (a) The reward distributions for the 4 item types with unit weights, tabulated above; the random reward of each item has two possible outcomes x1 and x2 with probabilities p1 and p2, respectively. (b) The DP table giving the optimal selections at all possible remaining target levels and capacity states for an unbounded SKP with the items defined in panel (a). Each cell corresponds to a possible problem state and contains the optimal selection at that state; multiple items in a cell indicate alternative selections, while "any" means any one of the 4 items can be selected. [Panel (b) is an image in the original and is not reproduced here.]
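The table in Figure 1(b) can be regenerated from the DP above. The following sketch is our illustration (it re-enters the reward table of Figure 1(a) in the hypothetical format used earlier) and prints, for each state (r, w), the set of items attaining the maximum, which is how the non-monotone pattern discussed below can be observed.

```python
from functools import lru_cache

# Unbounded, unit-weight case: the state reduces to (r, w).
ITEMS = {1: [(0, 0.8), (12, 0.2)], 2: [(0, 0.6), (8, 0.4)],
         3: [(0, 0.4), (5, 0.6)], 4: [(0, 0.1), (2, 0.9)]}

@lru_cache(maxsize=None)
def p(r, w):
    if r <= 0:
        return 1.0
    if w == 0:
        return 0.0
    return max(sum(q * p(r - x, w - 1) for x, q in ITEMS[i]) for i in ITEMS)

def best_items(r, w):
    vals = {i: sum(q * p(r - x, w - 1) for x, q in ITEMS[i]) for i in ITEMS}
    top = max(vals.values())
    return [i for i, v in vals.items() if abs(v - top) < 1e-12]

for r in range(1, 13):          # remaining target levels
    print(r, [best_items(r, w) for w in range(1, 5)])
```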

Even for this special case, there is no monotone threshold level that separates the state space into regions in which a particular item is optimal. Monotone threshold levels are often desirable since they are easier to implement in practice. When items with different weights and/or bounds on the number of items are introduced, the decision space becomes even more complex; a monotone threshold-based decision scheme may not be available for the general case of the problem. Moreover, the DP formulation quickly becomes computationally intractable as the dimension of the problem increases. Thus, efficient and accurate heuristic methods are needed to obtain near-optimal selections. Consequently, in the next section, we propose such a heuristic method, which gives the next item to select at each decision epoch.

3 A Heuristic Algorithm for Discrete Reward Distributions

To get around the “curse of dimensionality” in the DP algorithm, we propose an algorithm that uses the solution of a number of Static SKPs to find a near-optimal solution for the Adaptive SKP. The next two subsections introduce the Static SKP and then describe how to use it to solve the Adaptive SKP.

3.1 The Formulation of the Static Stochastic Knapsack Problem

Let $y_{ij}$ be a binary decision variable that is 1 if the jth item of type i is selected, and 0 otherwise. Then, the Static SKP for state (r, w, n) can be formulated as follows (Morton and Wood, 1998):

Static SKP(r, w, n):

$$\text{maximize} \quad P\Big( \sum_{i \in N} \sum_{j=1}^{n_i} X_{ij} y_{ij} \ge r \Big)$$

$$\text{subject to:} \quad \sum_{i \in N} w_i \sum_{j=1}^{n_i} y_{ij} \le w$$

$$y_{ij} \in \{0, 1\}, \quad j = 1, \ldots, n_i,\ \forall i \in N$$

The objective is to maximize the probability that the total reward collected is at least the target reward level. The first constraint bounds the total weight of the items that can be selected; the quantity $\sum_{j=1}^{n_i} y_{ij}$ gives the number of selected items of type i. The last set of constraints defines the binary variables. Note that the objective function value of the Static SKP is a lower bound on the objective function value of the Adaptive SKP, since any static solution is also a solution to the Adaptive SKP; i.e., even if the problem permits adaptive decision making, we can decide on all the items to insert into the knapsack initially and not change our decisions later. Morton and Wood (1998) and Henig (1990) develop different solution methods for the Static SKP with Normal rewards. Our study utilizes the method of Henig (1990), since it fits well with the Adaptive SKP with Normally distributed random rewards.

3.2 An L-level Heuristic

We propose an L-level heuristic that, instead of generating all possible sample paths of all feasible item selections, branches only L levels deep (each level corresponding to an item selection) and approximates the probability at each end node of the partial tree with the value obtained by solving the corresponding Static SKP for the remaining part of the tree. Thus, the proposed heuristic is only partially adaptive; to decide which item to select next, we assume L items are to be selected adaptively, after which the remaining knapsack capacity is to be filled in a single static decision. However, after each selection, the algorithm reruns the heuristic to select the next item. This continues until the knapsack is filled. In essence, the algorithm provides look-ahead capability for only L periods instead of the entire decision horizon. The L-period look-ahead approximation can be represented by adding a new boundary condition to the DP model in Section 2. If the current state is $(r_0, w_0, \mathbf{n}_0)$ and $P^S$ is the objective function value of the Static SKP, then the boundary condition is

$$P_{\max}(r, w, \mathbf{n}) = P^S(r, w, \mathbf{n}) \qquad \text{for } \|\mathbf{n}_0 - \mathbf{n}\|_1 = L,\ w \ge 0,\ \mathbf{n} \ge 0, \qquad (4)$$

where $\|\mathbf{n}_0 - \mathbf{n}\|_1 = \sum_{i \in N} |(\mathbf{n}_0)_i - n_i|$ and $(\mathbf{n}_0)_i$ is the ith element of the vector $\mathbf{n}_0$. The value of this norm gives the number of items selected until state (r, w, n) is reached, starting from state $(r_0, w_0, \mathbf{n}_0)$. Note that the Static SKP corresponds to the heuristic with L = 0. Thus, the objective function value obtained by solving the Adaptive SKP with the proposed algorithm is always at least as good as that obtained by solving the Static SKP. Moreover, as the level of approximation increases, the objective function value converges to the optimal value, since for a large enough value of L the boundary condition (4) becomes ignorable (or loose) and the heuristic algorithm becomes equivalent to the original DP algorithm. The following theorem, which holds for problems with continuous as well as discrete reward distributions, proves this result. The proofs of Theorem 1 and Theorem 2 are presented in Online Appendix B.

Theorem 1. For any state (r, w, n), $P^S(r, w, \mathbf{n}) \le P^1(r, w, \mathbf{n}) \le P^2(r, w, \mathbf{n}) \le \ldots \le P^L(r, w, \mathbf{n}) \le P^A(r, w, \mathbf{n})$, where $P^A$, $P^S$, and $P^l$, $l = 1, 2, \ldots, L$, represent the solutions of the optimal adaptive, optimal static, and l-level heuristic algorithms, respectively.

The L-level heuristic gives the next item to select and yields $P^L(r, w, \mathbf{n})$, an estimate of the probability of attaining the target, each time it is run. This probability is calculated by assuming that the algorithm will not be rerun with the remaining set of items and the updated target value after the reward realization of the selected item. Thus, the actual probability of reaching the target level by rerunning the algorithm after every selection, $P^L_{actual}(r, w, \mathbf{n})$, can differ from the probability estimated by the algorithm, $P^L(r, w, \mathbf{n})$. The following theorem proves a key relationship between these two probabilities.

Theorem 2. $P^L_{actual}(r, w, \mathbf{n}) \ge P^L(r, w, \mathbf{n})$.

Our computational experience suggests that the difference between the probability of attaining the target reward from a state using the optimal adaptive algorithm and that obtained using an L-level approximation decreases rapidly with L. In fact, in a small example, for L = 1, over 95% of the possible states result in no difference, and the average difference between the optimal probability and the heuristic probability for the remaining 5% of the states is less than 0.01. For L = 2, almost 99.5% of the states produce the same probability estimates, and the average difference for the remaining states is 0.0001. These and other similar examples in the numerical study suggest that the L-level heuristic works quite well, and values of 1 or 2 for L seem to be adequate for many practical purposes. Figure 2 summarizes the 1-level heuristic for problems in which the rewards have discrete distributions. When the distribution of rewards is continuous, we cannot directly use the methods developed up to this point. However, in the case of Normally distributed random rewards, the 1-level heuristic can be extended in a way that makes its application to this problem efficient and easy to implement.

Step 1. For each i ∈ N do:
    Step 1.1. For all $j = 1, \ldots, C_i$ (i.e., all possible realizations of $X_i$), solve StaticSKP$(r_0 - x_{ij}, w_0 - w_i, \mathbf{n} - e_i)$ and let the optimal objective function value be $P^S_{ij}$.
    Step 1.2. Set $P_i = \sum_{j=1}^{C_i} p_{ij} P^S_{ij}$.
Step 2. The next item to select is $i^\star = \arg\max\{P_i \mid i \in N\}$.

Figure 2: Outline of the 1-level heuristic for the Adaptive SKP where rewards have discrete distributions.
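The following Python sketch is one possible rendering of Figure 2 for small instances. The brute-force `static_skp` subroutine and the data format are our assumptions, standing in for the specialized Static SKP algorithms cited above.

```python
from itertools import product

# Same hypothetical format as before: weight plus a discrete reward
# distribution per type, with n[i] remaining copies of type i.

def static_skp(r, w, n, types):
    """Best probability of reaching target r with a single up-front selection."""
    counts = [range(ni + 1) for ni in n]          # how many of each type to take
    best = 0.0
    for pick in product(*counts):
        if sum(q * t["weight"] for q, t in zip(pick, types)) > w:
            continue                              # violates capacity
        # Enumerate joint reward realizations of the chosen multiset.
        chosen = [t["outcomes"] for q, t in zip(pick, types) for _ in range(q)]
        p_hit = 0.0
        for outcome in product(*chosen):
            if sum(x for x, _ in outcome) >= r:
                prob = 1.0
                for _, p in outcome:
                    prob *= p
                p_hit += prob
        best = max(best, p_hit)
    return best

def one_level_next_item(r, w, n, types):
    """Return (best type index, its estimated probability), per Figure 2."""
    best_i, best_p = None, -1.0
    for i, t in enumerate(types):                 # Step 1
        if n[i] == 0 or t["weight"] > w:
            continue
        n_next = n[:i] + (n[i] - 1,) + n[i + 1:]
        p_i = sum(p * static_skp(r - x, w - t["weight"], n_next, types)
                  for x, p in t["outcomes"])      # Steps 1.1-1.2
        if p_i > best_p:
            best_i, best_p = i, p_i
    return best_i, best_p                         # Step 2
```

After each selection and reward realization, the caller would update (r, w, n) and invoke `one_level_next_item` again, mirroring the rerunning described in Section 3.2.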

4 The Adaptive SKP with Normally Distributed Random Rewards

Most studies on the Static SKP in the literature concentrate on Normally distributed rewards (Steinberg and Parks, 1979; Sniedovich, 1980; Henig, 1990; Carraway et al., 1993; Morton and Wood, 1998; Goel and Indyk, 1999) because the Normality assumption covers a wide range of practical applications and, at the same time, makes the static problem more tractable. In this section, we consider the Adaptive SKP, in which the reward of type i items follows a Normal distribution with mean µi and variance σi2 . The computational concern in a system with continuous rewards is that there are infinitely many possible reward realizations and the algorithms of the previous section are not directly applicable (since they assume a finite state space). Thus, Section 4.1 explains how to obtain lower and upper bounds on the optimal solutions of the Adaptive SKP with Normally distributed random rewards by using a discretization scheme. These bounds are used in the numerical study. Section 4.2 shows how the 1-level heuristic can be extended to the Adaptive SKP with Normally distributed rewards. The rest of the paper uses SKP-NOR to refer to the SKP with Normally distributed rewards.

4.1 Bounds on the Adaptive SKP-NOR

The DP algorithm of Section 2 can be used with a discretization scheme to obtain lower and upper bounds on the optimal solutions of the Adaptive SKP-NOR. Consider a new discrete random reward (r.r.) $\overline{X}_i$ that stochastically dominates the r.r. $X_i$, ∀i ∈ N (i.e., $P(\overline{X}_i > x) \ge P(X_i > x)$ for all $x \in \mathbb{R}$, with at least one strict inequality). Then, the probability obtained by solving the problem with the r.r. $\overline{X}_i$'s is an upper bound on the probability obtained by solving the problem with the r.r. $X_i$'s. Similarly, a lower bound can be obtained by solving the problem with a new r.r. $\underline{X}_i$, i ∈ N, which is stochastically dominated by the r.r. $X_i$, i ∈ N. The following proposition identifies such discretizations for the problem with Normally distributed random rewards. The proof is straightforward and is therefore omitted.

Proposition. Let $X_i \sim N(\mu_i, \sigma_i^2)$ and $\{x_{i0}, x_{i1}, \ldots, x_{i(C_i+1)}\}$ be a finite set of ordered points such that $x_{i0} = -\infty$ and $x_{i(C_i+1)} = \infty$. Then,

(i) the discrete r.r. $\overline{X}_i$ with probability mass function $P(\overline{X}_i = x_{ij}) = \Phi\big(\tfrac{x_{i(j+1)} - \mu_i}{\sigma_i}\big) - \Phi\big(\tfrac{x_{ij} - \mu_i}{\sigma_i}\big)$, $j = 0, 1, \ldots, C_i$, stochastically dominates the continuous r.r. $X_i$, and

(ii) the discrete r.r. $\underline{X}_i$ with probability mass function $P(\underline{X}_i = x_{ij}) = \Phi\big(\tfrac{x_{ij} - \mu_i}{\sigma_i}\big) - \Phi\big(\tfrac{x_{i(j-1)} - \mu_i}{\sigma_i}\big)$, $j = 1, 2, \ldots, C_i + 1$, is stochastically dominated by the continuous r.r. $X_i$,

where Φ is the cumulative distribution function of the Standard Normal distribution.

Now, using this proposition, we define a valid discretization scheme as follows. For a fixed α value (α < 0.5), we set $x_{i1} = \lfloor \mu_i + z_\alpha \sigma_i \rfloor$, $x_{iC_i} = \lfloor \mu_i + z_{1-\alpha} \sigma_i \rfloor$, and $x_{ij} = x_{i(j-1)} + 1$, $j = 2, 3, \ldots, (C_i - 1)$, where $z_\alpha$ denotes the αth quantile of the Standard Normal distribution. One problem with this scheme is that the new r.r. $\overline{X}_i$'s and $\underline{X}_i$'s must take the values −∞ and +∞ at the points $x_{i0}$ and $x_{i(C_i+1)}$, respectively, to satisfy the dominance relationships. However, to represent the stochastic dominance in the problem, it suffices to set $x_{i0} = -(W/w_{\min})|x_{\max}|$ with $P(\overline{X}_i = x_{i0}) = \Phi\big(\tfrac{x_{i1} - \mu_i}{\sigma_i}\big)$, and $x_{i(C_i+1)} = (W/w_{\min})|x_{\min}|$ with $P(\underline{X}_i = x_{i(C_i+1)}) = 1 - \Phi\big(\tfrac{x_{iC_i} - \mu_i}{\sigma_i}\big)$, where $w_{\min} = \min_{i \in N} w_i$, $x_{\min} = \min_{i \in N} x_{i1}$, and $x_{\max} = \max_{i \in N} x_{iC_i}$. The reason is that observing a reward value of −∞ in case (i) of the proposition means that the target level can never be reached, and any value of $x_{i0}$ less than or equal to $-(W/w_{\min})|x_{\max}|$ ensures this. Similar reasoning holds for $x_{i(C_i+1)}$. Applying this discretization scheme to Normally distributed random rewards and solving the corresponding Adaptive SKP problems using the DP algorithm in Section 2 provides lower and upper bounds on the optimal solutions of the Adaptive SKP-NOR.
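As an illustration, the following sketch builds the two discrete distributions of the proposition on the integer grid described above, following the proposition as stated. The function name, the input format, and the finite stand-in `big` for the infinite end points are our assumptions.

```python
from math import floor
from statistics import NormalDist

def discretize(mu, sigma, alpha, big):
    """Return the two bracketing pmfs of the proposition as dicts.
    `big` is a finite stand-in for the infinite end points, playing the
    role of (W / w_min)|x_max| and (W / w_min)|x_min| in the text."""
    z = NormalDist().inv_cdf
    lo = floor(mu + z(alpha) * sigma)          # x_i1
    hi = floor(mu + z(1 - alpha) * sigma)      # x_iCi
    grid = list(range(lo, hi + 1))             # unit-spaced interior points
    cdf = [NormalDist(mu, sigma).cdf(x) for x in grid]

    # Case (i): mass of the interval above each point sits on that point.
    upper = {-big: cdf[0]}                     # stand-in for x_i0 = -inf
    for j in range(len(grid) - 1):
        upper[grid[j]] = cdf[j + 1] - cdf[j]
    upper[grid[-1]] = 1.0 - cdf[-1]

    # Case (ii): mass of the interval below each point sits on that point.
    lower = {grid[0]: cdf[0]}
    for j in range(1, len(grid)):
        lower[grid[j]] = cdf[j] - cdf[j - 1]
    lower[big] = 1.0 - cdf[-1]                 # stand-in for x_i(Ci+1) = +inf
    return upper, lower

# Example usage: both pmfs sum to 1.
up, low = discretize(mu=10.0, sigma=3.0, alpha=0.01, big=1000)
print(sum(up.values()), sum(low.values()))
```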

4.2 The 1-Level Heuristic for the Adaptive SKP-NOR

One way to extend the 1-level heuristic to the Adaptive SKP-NOR is to discretize the Normal distribution of item i into $C_i$ discrete points and directly use the algorithm in Figure 2. In this case, we can use existing algorithms, such as that presented in Morton and Wood (1998), to solve the corresponding Static SKP in Step 1.1. The other way is to modify the 1-level heuristic so that it can utilize parametric search algorithms, such as the one proposed by Henig (1990), to deal with the Static SKP in Step 1.1. By utilizing the algorithm of Henig (1990), the 1-level heuristic avoids discretizing the Static SKP solved at each stage and stores intermediate calculations that can be reused, which may speed up the overall solution procedure. We use the 1-level heuristic algorithm to perform the numerical study in Section 5. Online Appendix C explains in detail how the 1-level heuristic for discrete reward distributions can be revised to solve the Adaptive SKP-NOR using the parametric search approach of Henig (1990).

5 Numerical Study

This section summarizes a numerical study designed to investigate the following three issues: (1) the accuracy and efficiency of the 1-level heuristic, (2) the value of the optimal adaptive policy over a static policy, and (3) the success of the 1-level heuristic in capturing the value of adaptivity. The test problems consist of instances with |N| = 6 to 15 where $N_i = 1$ for all i ∈ N, and instances with |N| = 20 to 30 where $N_i \ge 1$ for all i ∈ N. In all these problem instances, the random rewards are Normally distributed. Instances differed in terms of the item weights, the means and variances of the item rewards, the target reward level, and the knapsack capacity. Online Appendix E provides a detailed explanation of the factorial experiment used in the study as well as the results of the study. Below, we summarize the highlights of those results, all of which correspond to problem instances with Normally distributed random rewards.

Accuracy of the heuristic

To measure the accuracy of the heuristic, let $E = P^A - P^1_{actual}$ be the difference between the optimal adaptive solution obtained by the DP algorithm (1), $P^A$, and that of the 1-level heuristic algorithm, $P^1_{actual}$. Since we do not have an algorithm to obtain the optimal adaptive solution $P^A$ for problems with Normally distributed rewards, we used the discretization scheme described in Section 4.1 together with the optimal DP algorithm to obtain $\overline{P}^A$, the upper bound on the optimal adaptive solution. The difference $\overline{E} = \overline{P}^A - P^1_{actual}$ is therefore an overestimate of the error E (i.e., an underestimate of the accuracy of the 1-level heuristic). For problems with 6 to 15 items, the average value of $\overline{E}$ is 0.02, averaged over 530 problem instances. The maximum value is 0.06 (large values of $\overline{E}$ occur when the target level and/or the level of diversity in the variability of the stochastic returns is relatively high; see Online Appendix F for details). These results suggest that the 1-level heuristic approximates the Adaptive SKP objective function very well. Furthermore, we observe that $\overline{E}$ does not vary as the problem size N increases from 6 to 8, 9, 10, 12, and 15. Therefore, we expect the heuristic to remain accurate for larger problem instances as well.

Efficiency of the heuristic

The numerical analysis results in the following regression equation for the runtime of the DP algorithm when the Normal distributions are discretized based on the method in Section 4.1: runtime = exp(−7.01 + 0.48|N| + 1.24W). The regression equation for the runtime of the 1-level heuristic is runtime = 0.06|N|^1.2 W^1.1 for cases with equal-weight items, and runtime = 0.028|N|^2.63 W^0.46 for cases with items of unequal weights. For example, consider a problem with |N| = 40 and W = 7, which is the size of the inventory audit problem we observed in practice. Based on the regression models, the DP algorithm would take roughly 37 years to complete, while the maximum time for a run of the 1-level heuristic would be approximately 43 seconds for the equal-weight cases and less than 19 minutes for the unequal-weight cases. Thus, the 1-level heuristic is very time-efficient. For details, please see Online Appendix F.

Bounds on the value of adaptivity

The value of adaptivity (VoA) is the gap between the objective function values of the optimal adaptive and the optimal static solutions ($VoA = P^A - P^S$). This concept is important for both the optimal and heuristic algorithms since, if the VoA is small, there is no value in using an algorithm designed to solve the Adaptive SKP. Again, since we do not have an algorithm to obtain the optimal adaptive solution $P^A$ for problems with Normally distributed rewards, we use $\underline{P}^A$, the lower bound on $P^A$, and calculate $\underline{P}^A - P^S$, the lower bound on the value of adaptivity. Figure 3(a) gives the histogram of the lower bound on the value of adaptivity for the 530 problem instances with 6 to 15 items. The average lower bound is 0.062 and the maximum is 0.20. In 75% of the instances, the value of adaptivity exceeds 0.03. The complete set of experiments indicates that having items with diverse risk levels (different coefficients of variation of the rewards) increases the value of adaptivity. For details, see Online Appendix G.

Capturing the value of adaptivity

Finally, the ratio $(P^1_{actual} - P^S)/(\overline{P}^A - P^S)$ provides an estimate of the extent to which the 1-level heuristic captures the value of adaptivity. This ratio is between 0 and 1; the closer it is to 1, the greater the fraction of the value of adaptivity that is captured by the heuristic. Figure 3(b) provides a scatter plot of this ratio for the 530 problem instances with 6 to 15 items. The average ratio is 0.72. Furthermore, the ratio increases with the value of adaptivity itself. This implies that for those problem instances with the largest value of adaptivity, the heuristic does a good job of capturing most of the benefit associated with using the adaptive model.

6 Conclusions

This paper introduced an adaptive version of the stochastic knapsack problem in which all items are initially present with deterministic weights and random rewards, and the objective is to maximize the probability of reaching a target reward level. We developed a DP model and an L-level heuristic that can be used to decide which items to insert into the knapsack sequentially. The heuristic always performs at least as well as the optimal static algorithm and yields solutions very close to those obtained by the optimal adaptive algorithm. The numerical results also suggest that much of the benefit of a fully adaptive policy can be obtained by using a policy with limited look-ahead capabilities. A possible future research direction is to analyze the Adaptive SKP in which items have random weights.

Figure 3: Summary of the numerical study for 530 test cases. (a) The Value of Adaptivity: the x-axis of the histogram represents the difference $\underline{P}^A - P^S$ and the y-axis gives the count of test cases that fell into each difference range; the mean of the differences over the 530 test cases, $\bar{x}$, and its 95% confidence interval are reported below the histogram. (b) Capturing the Value of Adaptivity: each dot in the scatter plot corresponds to a test case, where the x-axis represents the difference $\underline{P}^A - P^S$ and the y-axis represents the ratio $(P^1_{actual} - P^S)/(\overline{P}^A - P^S)$. [The two panels are images in the original and are not reproduced here.]

Also, the scheme developed in this study makes no assumption about the constraint set. It may be interesting to extend this scheme to other probabilistic problems whose constraint sets are more complex than the knapsack constraint, such as scheduling and routing problems.

References

Bertsekas, D. P. 1987. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, NJ.

Bertsekas, D. P., D. A. Castanon. 1999. Rollout algorithms for stochastic scheduling problems. Journal of Heuristics 5, 89–108.

Birge, J. R., F. Louveaux. 1997. Introduction to Stochastic Programming. Springer, New York.

Charnes, A., W. W. Cooper. 1959. Chance-constrained programming. Management Science 6, 73–79.

Carraway, R., R. Schmidt, L. Weatherford. 1993. An Algorithm for Maximizing Target Achievement in the Stochastic Knapsack Problem with Normal Returns. Naval Res. Logist. 40, 161–173.

Dantzig, G. B. 1957. Discrete-variable Extremum Problems. Oper. Res. 5, 266–277.

Dean, B. C., M. X. Goemans, J. Vondrak. 2004. Approximating the Stochastic Knapsack Problem: The Benefit of Adaptivity. FOCS, 208–217.

Goel, A., P. Indyk. 1999. Stochastic Load Balancing and Related Problems. FOCS, 579–586.

Henig, M. 1990. Risk Criteria in a Stochastic Knapsack Problem. Oper. Res. 38, 820–825.

Ilhan, T., S. M. Iravani, M. Daskin. 2007. The Orienteering Problem with Stochastic Profits. IIE Transactions 40(4), 406–421.

Kleinberg, J., Y. Rabani, E. Tardos. 1997. Allocating bandwidth for bursty connections. Proc. 29th ACM Sympos. Theory Comput., 664–673.

Kleywegt, A. J., J. D. Papastavrou. 2001. The Dynamic and Stochastic Knapsack Problem with Random Sized Items. Oper. Res. 49, 26–41.

Kusic, D., N. Kandasamy. 2007. Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems. Cluster Computing 10(4), 395–408.

Morton, D. P., R. Wood. 1998. On a Stochastic Knapsack Problem and Generalizations. In Advances in Computational and Stochastic Optimization, Logic Programming and Heuristic Search, D. Woodruff, Ed. Kluwer, Boston, MA, 149–168.

Prékopa, A. 1995. Stochastic Programming. Kluwer, Boston, MA.

Novoa, C., R. Storer. 2009. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands. European Journal of Operational Research 196(2), 509–515.

Sniedovich, M. 1980. Preference Order Stochastic Knapsack Problems: Methodological Issues. J. Oper. Res. Soc. 31, 1025–1032.

Slyke, V. R., Y. Young. 2000. Finite Horizon Stochastic Knapsacks with Applications in Yield Management. Oper. Res. 48, 155–172.

Steinberg, E., M. S. Parks. 1979. A Preference Order Dynamic Program for a Knapsack Problem with Stochastic Rewards. J. Oper. Res. Soc. 30, 141–147.


Online Appendix A: Applications of the Adaptive Stochastic Knapsack Problem

Inventory Audit Problem: The problem in our paper was motivated by obsolescence inventory management that one of the authors encountered during his visit to the Supply Chain Management Department of a large U.S. manufacturer. What often occurs is that manufacturing firms make a significant design change that eliminates a product line, or replaces an old part with a newly designed part. This leads to inventory of the old parts at the firm's suppliers becoming obsolete. Since the design change often happens before the contracts between the firm and its suppliers end, the firm is obligated to reimburse its suppliers for the obsolete inventory on hand. It is clearly in the firm's best interest to audit the inventory at the suppliers' sites before paying them their claimed amounts, since suppliers generally overestimate the amount and the quality of the remaining inventory (with the hope of getting more money for their obsolete inventory). To make sure that the claims are legitimate and accurate, the Obsolescence group sends an auditor to visit each supplier's site, and after checking the quantity and quality of the supplier's remaining inventory, the auditor determines the value of the supplier's inventory (according to the firm's regulations and the auditor's experience). The auditor and the supplier then negotiate the final reimbursement amount, which is later paid to the supplier.

Available statistics at the Obsolescence group showed us that, in most cases, the actual reimbursement amount is different from the amount claimed by the supplier. A recovered claim is the difference between the value of the inventory claimed by the supplier and the actual reimbursement paid to the supplier. These recovered claims are random and their values are not known until the suppliers are visited. When the claim is settled, the recovery amount is realized; note that the recovery amount is only revealed after the inventories are audited. Based on the history of the recovery amounts of different suppliers, the Obsolescence group of the firm (with the help of one of the authors) was able to identify the distribution of recovered claims for different classes of parts and materials (e.g., bolts and nuts, plastics). Most of the distributions were approximated very well by the Normal distribution.

The Obsolescence group of the firm has a limited number of auditors who must select which supplier or suppliers (among a large list) to audit in the next audit period (e.g., two weeks). The objective of the Obsolescence group is to make sure that a certain number of suppliers' claims are audited and a certain target recovery level is reached. In fact, one of the measures by which the performance of the Obsolescence group is evaluated is whether they reach the target recovery level specified by the Obsolescence group and the manager of the Supply Chain Department for the given review period (e.g., three months). After visiting a supplier (or a set of suppliers) and realizing the recovery amount, the auditor (i.e., the Obsolescence group) must decide which supplier (or set of suppliers) s/he should visit in the next period.

The Inventory Audit problem motivated our study of the "Adaptive Knapsack Problem with Stochastic Rewards (Adaptive SKP)." In the Inventory Audit problem:

• A supplier (or a group of suppliers that are close to each other and can therefore be visited in one trip) corresponds to an item in the Adaptive SKP.
• There are a limited number of suppliers (i.e., a limited number of items in the Adaptive SKP).


• The time to visit a supplier (or a group of suppliers that are close to each other and can therefore be visited in one trip) corresponds to the weight of an item in the Adaptive SKP.

• The maximum time that an auditor can spend visiting suppliers in an audit period corresponds to the capacity of the knapsack in the Adaptive SKP.

• The sequential decision of which supplier (or set of suppliers) to visit next corresponds to the sequential decision of which item to choose in the Adaptive SKP.

• The realization of the random recovered claim after a supplier is audited corresponds to the realization of the value of an item after the item is put in the knapsack.

• The primary objective of reaching the recovery target level in a quarter corresponds to the objective of maximizing the probability of reaching a certain reward in the Adaptive SKP.

The Adaptive SKP in our paper is the first step in solving this class of problems and in gaining insights into developing simple but effective policies.

Other Potential Applications: Another application of the Adaptive SKP is the fishing ground selection problem (e.g., see Millar and Kiragu 1997). In this problem, high-yield fishing grounds need to be selected for a fishing ship; these correspond to the items to be selected and put in the knapsack. The yields of fishing grounds can be estimated from near-real-time oceanographic analysis and historical figures. However, the yield of a fishing ground is a random variable that is not realized until the end of harvesting at that location. There are constraints such as capacity, legal limits, and target levels on the revenue that limit the number of fish caught. For example, the storage capacity of the ship (i.e., the number of fish that can be stored in the ship) is an important constraint that limits the revenue; the storage capacity corresponds to the capacity of the knapsack in the Adaptive SKP. Thus, one of the objectives of the problem is to maximize the chance of reaching the target yield level, i.e., maximizing the probability of filling up the storage.

The "select-search-collect" nature of the Inventory Audit problem and the Fishing Ground Selection problem, as well as their feature of realizing a random reward after a decision is made, can also be observed in many other situations, including civil or military reconnaissance over a large geography (see Kress and Royset 2008, Kress et al. 2006, Moser 1990), scheduling technicians for planned maintenance of geographically distributed equipment (see Tang et al. 2007), and the talent search of a sports club (see Butt and Ryan 1999). We believe that these problems can be formulated as variants of the Adaptive SKP described in this paper.

References

Butt, S. E., D. M. Ryan. 1999. An optimal solution procedure for the multiple tour maximum collection problem using column generation. Computers and Operations Research 26(4), 427–441.

Millar, H. H., M. Kiragu. 1997. A time based formulation and upper bounding scheme for the selective traveling salesperson problem. J. Oper. Res. Soc. 48(5), 511–518.

Kress, M., J. O. Royset. 2008. Aerial Search Optimization Model (ASOM) for UAVs in Special Operations. Military Oper. Res. 13(1), 23–33.

Kress, M., A. Baggesen, E. Gofer. 2006. Probability Modeling of Autonomous Unmanned Combat Aerial Vehicles (UCAVs). Military Oper. Res. 11(4), 5–24.

Moser, H. D. Jr. 1990. Scheduling and routing tactical aerial reconnaissance vehicles. Master's thesis, Naval Postgraduate School, Monterey, CA.

Stroh, L. K., G. B. Northcraft, M. A. Neale. 2002. Organizational Behavior: A Management Challenge. Lawrence Erlbaum Associates, Mahwah, NJ.

Tang, H., E. Miller-Hooks, R. Tomastik. 2007. Scheduling Technicians for Planned Maintenance of Geographically Distributed Equipment. Transportation Research Part E 43, 591–609.

Online Appendix B: Proofs of the Analytical Results

PROOF OF THEOREM 1. The proof is by induction.

Case L = 1: We will prove that $P^S(r, w, \mathbf{n}) \le P^1(r, w, \mathbf{n})$. Let q be the vector giving the number of items of each type selected in the optimal solution of a static problem. We can state the objective function of the static problem by conditioning on an item type k with $q_k > 0$ (i.e., item k is chosen in the optimal static solution) and setting $\hat{\mathbf{q}} = \mathbf{q} - e_k$, as follows:

$$P^S(r, w, \mathbf{n}) = \sum_{j=1}^{C_k} P\Big( \sum_{i \in N} \sum_{l=1}^{\hat{q}_i} X_{il} > r - x_{kj} \Big)\, p_{kj}$$

The probability term inside the sum gives the probability of reaching the target level for a particular realization $x_{kj}$ of item type k under the solution $\hat{\mathbf{q}}$ (i.e., the items to be chosen are those in the vector $\hat{\mathbf{q}}$ at state $(r - x_{kj}, w - w_k, \mathbf{n} - e_k)$). We know that

$$P^S(r - x_{kj}, w - w_k, \mathbf{n} - e_k) \ge P\Big( \sum_{i \in N} \sum_{l=1}^{\hat{q}_i} X_{il} > r - x_{kj} \Big)$$

since the left-hand side is the value of the optimal objective function of the static problem at state $(r - x_{kj}, w - w_k, \mathbf{n} - e_k)$, which is not restricted to selecting the items in $\hat{\mathbf{q}}$. Multiplying by $p_{kj}$ and summing both sides over all possible realizations of item type k, we obtain

$$\sum_{j=1}^{C_k} P^S(r - x_{kj}, w - w_k, \mathbf{n} - e_k)\, p_{kj} \;\ge\; \sum_{j=1}^{C_k} P\Big( \sum_{i \in N} \sum_{l=1}^{\hat{q}_i} X_{il} > r - x_{kj} \Big)\, p_{kj} \;\ge\; P^S(r, w, \mathbf{n}). \qquad (5)$$

On the other hand, for the 1-level heuristic:

$$P^1(r, w, \mathbf{n}) = \max_{i \in N} \Big\{ \sum_{j=1}^{C_i} P^S(r - x_{ij}, w - w_i, \mathbf{n} - e_i)\, p_{ij} \Big\}$$

$$P^1(r, w, \mathbf{n}) \ge \sum_{j=1}^{C_k} P^S(r - x_{kj}, w - w_k, \mathbf{n} - e_k)\, p_{kj} \qquad (6)$$

Thus, based on (5) and (6), we obtain $P^1(r, w, \mathbf{n}) \ge P^S(r, w, \mathbf{n})$.

Case L = l: Now assume that for L = l − 1 the following holds (recall that $P^0 = P^S$):

$$P^{l-1}(r, w, \mathbf{n}) \ge P^{l-2}(r, w, \mathbf{n}) \quad \forall (r, w, \mathbf{n}) \qquad (7)$$

and prove that $P^l(r, w, \mathbf{n}) \ge P^{l-1}(r, w, \mathbf{n})$. For $P^l(r, w, \mathbf{n})$ and $P^{l-1}(r, w, \mathbf{n})$ we have

$$P^l(r, w, \mathbf{n}) = \max_{i \in N} \Big\{ \sum_{j=1}^{C_i} P^{l-1}(r - x_{ij}, w - w_i, \mathbf{n} - e_i)\, p_{ij} \Big\}, \qquad (8)$$

$$P^{l-1}(r, w, \mathbf{n}) = \max_{i \in N} \Big\{ \sum_{j=1}^{C_i} P^{l-2}(r - x_{ij}, w - w_i, \mathbf{n} - e_i)\, p_{ij} \Big\}. \qquad (9)$$

Based on the induction assumption (7), the right-hand side of (8) is at least as large as that of (9). Therefore, $P^l(r, w, \mathbf{n}) \ge P^{l-1}(r, w, \mathbf{n})$ for all $(r, w, \mathbf{n})$, which completes the induction.

PROOF OF THEOREM 2. Let T be the set of all (w, n) pairs that satisfy the following inequality:

$$L \ge \max \Big\{ \sum_{i \in N} \sum_{j=1}^{n_i} y_{ij} \;\Big|\; \sum_{i \in N} \sum_{j=1}^{n_i} y_{ij} w_i \le w,\ y_{ij} \in \{0, 1\}\ \forall i, \forall j \Big\} \qquad (10)$$

Case (w, n) ∈ T: T defines a fundamental set of states into which any recursive probability calculation process eventually falls. Note that if (w, n) ∈ T and (w′, n′) ≤ (w, n), i.e., all elements of (w′, n′) are less than or equal to the corresponding elements of (w, n), then (w′, n′) ∈ T. In this case, the L-level heuristic is equivalent to the optimal DP algorithm, since inequality (10) indicates that L is greater than the maximum number of items that can be inserted into the knapsack. Thus, $P^L_{actual}(r, w, \mathbf{n}) = P^L(r, w, \mathbf{n}) = P^A(r, w, \mathbf{n})$.

Case (w, n) = (ω, η): Now assume that $P^L_{actual}(\bar{r}, \bar{\omega}, \bar{\eta}) \ge P^L(\bar{r}, \bar{\omega}, \bar{\eta})$ for any state $(\bar{r}, \bar{\omega}, \bar{\eta})$ such that $(\bar{\omega}, \bar{\eta}) \le (\omega, \eta)$ with at least one strict inequality. Note that making a decision in state (ω, η) moves the problem to a state in the set of states $(\bar{\omega}, \bar{\eta})$. Also, let u be the next item to be selected using the L-level heuristic. We can write the probability obtained by the algorithm as follows:

$$P^L(r, \omega, \eta) = \sum_{k_u=0}^{C_u} P^{L-1}(r - x_{uk_u}, \omega - w_u, \eta - e_u)\, p_{uk_u}.$$

Also, the actual probability of reaching the target level by repeatedly rerunning the L-level heuristic can be written as follows:

$$P^L_{actual}(r, \omega, \eta) = \sum_{k_u=0}^{C_u} P^L_{actual}(r - x_{uk_u}, \omega - w_u, \eta - e_u)\, p_{uk_u}$$

$$\ge \sum_{k_u=0}^{C_u} P^L(r - x_{uk_u}, \omega - w_u, \eta - e_u)\, p_{uk_u} \qquad (11)$$

$$\ge \sum_{k_u=0}^{C_u} P^{L-1}(r - x_{uk_u}, \omega - w_u, \eta - e_u)\, p_{uk_u} = P^L(r, \omega, \eta) \qquad (12)$$

Online Appendix C The 1-Level Heuristic Algorithm for the Adaptive SKP-NOR For a general Static SKP, the parametric search algorithm proposed by Henig (1990) can be used to produce a set of static solutions which contains all the optimal or close-to-optimal solutions for all possible target profit values. We first summarize this algorithm and then present the proposed 1-level heuristic. The deterministic equivalent of the objective function of the Static SKP-NOR is as follows,   P  µ y − r i i qi∈N maximize  P σ2y  i∈N

where yi =

Pni

j=1 yij .

i i

Now, consider the following subproblem, X QA± (w, n, α) : maximize (αµi ± (1 − α)σi2 )yi

(13)

i∈N

subject to: X

wi y i ≤ w

i∈N

yi ≤ ni , yi ∈ Z +

∀i ∈ N

where QA− and QA+ are the problems with the (−) and (+) signs between the mean and variance terms in the objective function (13), respectively. This is the classical bounded integer knapsack problem and can be solved very efficiently. Solving QA− (QA+ ) for all α ∈ [0, 1] values generates a finite set of solutions, named EF − (EF + ). Each of these solutions also represents a pair of total mean P P reward and total variance; i.e., ∀s ∈ EF ± , (ms , vs ) = ( i∈s µi , i∈s σi2 ). The EF − and EF + together constitute a convex efficient frontier of high-mean profit values in the mean-variance domain. Also, if P there is a feasible solution s with i∈s µi ≥ r, this efficient frontier is guaranteed to contain the optimal point for any target reward level r; otherwise, the optimal solution is not necessarily contained in the convex efficient frontier. Henig (1990) shows that the points on the convex frontier contain solutions which approximate the optimal solutions very well and, in general, the number of points is quite small. In Online Appendix D, we develop a polynomial-time algorithm for the static SKP with equal 18

weights, wi = w ∀i ∈ N . Using this algorithm, the efficient frontier can be constructed in polynomial time. Solving the subproblems with such a polynomial-time algorithm increases the effectiveness of the 1-level heuristic substantially since the complexity of the 1-level heuristic is dependent on the complexity of the solution method used for the subproblems. To solve QA± (w, n, α) for all α ∈ [0, 1], we start from two pairs of mean-variance points and recursively calculate new α values and solve new subproblems. This is based on the fact that, if the objective function values for α1 and α2 are equal, that is QA± (w, n, α1 ) = QA± (w, n, α2 ), then QA± (w, n, α) = QA± (w, n, α1 ) for all α ∈ [α1 , α2 ] (see Proposition 3 in Ilhan et al. 2007). Thus, the algorithm itself chooses the α values which are needed to construct the subproblems. Now, we can transform the algorithm for rewards with discrete distributions given in Figure 2 into the 1-level heuristic for Normally distributed rewards. The idea of the 1-level heuristic is as follows. First, assuming we have selected item i, we find the best solutions in terms of objective function (13) for all possible values of the new target level rˆ = r − Xi , or equivalently for all possible realizations of the random reward of the selected item i, Xi = x ∈ (−∞, ∞). This is achieved by using the parametric search algorithm above. Second, by using the solutions obtained in the first step, we calculate the probability of reaching the target level r by selecting item i. After repeating these steps for all items in N , we select the item with the highest probability. Below, we elaborate on each step of the algorithm. Step1. Finding the best (m,v) points: Solve the subproblems QA± (w − wi , n − ei , α) for all possible α values to construct a set of mean-variance pairs. Then, construct a solution set S of the following form:  (m∗1 , v1∗ ) is the best for      (m∗ , v ∗ ) is the best for 2 2 S = . ..      ∗ ) is the best for (m∗|S| , v|S|

rˆ ∈ [−∞, a1 ) rˆ ∈ [a1 , a2 ) rˆ ∈ [a|S−1| , ∞)

Step 2. Probability calculation: The probability of reaching the target level value r if item i is selected can be obtained as follows: Z ∞ 1 Pi (r, w, n) = PS (r − x, w − wi , n − ei )fi (x)dx (14) −∞

The previous step provides the optimal or close to the optimal solutions to the static problem for all possible target values. For each interval of x and corresponding (m∗s , vs∗ ) pair in S, PS (r − x, w − wi , n − ei ) can be calculated by using the Normal distribution. Thus, equation (14) can be approximated as follows: P1i (r, w, n) ≈

s=|S| Z as X s=1

as−1

 Φ

m∗s + x − r √ ∗ vs

 fi (x)dx

where Φ(x) is the c.d.f. of the Standard Normal and fi (x) is the p.d.f. of Normal distribution with parameters µi and σi2 . In this study, we approximate this integral by discretizing fi (x) with equal-probability values, pdscrt = 0.001. Letting D = 1/pdscrt , xd = µi + Φ−1 (pdscrt d)σi and (m ˜ ∗d , v˜d∗ ) be the mean-variance pair to rˆ = r − xd in S, then P1i (r, w, n) can be   that corresponds P m ˜ ∗d −r+xd √ ∗ approximated by pdscrt D . d=0 Φ v˜d

19

After repeating Steps 1 and 2 for all i ∈ N , the next item to select is i? = argmax{P1i |i ∈ N }.

Online Appendix D The Static SKP with Equal Weights – A Polynomial-Time Algorithm The algorithm presented here is based on the study by Henig (1990). When all weights are identical (e.g., wi = 1, ∀i ∈ N ), the Static SKP-NOR reduces to the following formulation: P µi yi − R qi∈N P 2 i∈N σi yi

M aximize subject to:

X

yi ≤ W

(15)

i∈N

yi ∈ {0, 1}

∀i ∈ N

(16)

This problem is a special case of the SKP, for which DP algorithms exist (see Morton and Wood 1998). However, we develop a polynomial algorithm by using the idea that, if there is a feasible solution s P such that i∈s µi > R, the optimal solution is one of the solutions obtained by solving the following problem for all α ∈ [0, 1] (see Henig 1990): M aximize

X

[αµi − (1 − α)σi2 ]yi

(17)

i∈N

subject to:

constraints (15) and ( 16)

The solution to this subproblem for any given α value can easily be obtained. We calculate the coefficients of the items; i.e., zi = αµi − (1 − α)σi2 . If zi < 0, then we set the corresponding yi ’s to 0. If there are less than W positive zi values, then we set all corresponding yi to 1; otherwise, we sort the zi ’s in non-ascending order and set the first W yi ’s to 1. The particular values of zi ’s are not important. The solution is dependent only on the signs and the relative values of the zi ’s. Thus, if the signs and the order of the coefficients do not change for an interval of α values, then the optimal solution stays the same for that particular interval. We can generate all the intervals of α by evaluating all pairs of coefficients and signs of individual coefficients, as follows. 1. The relative positions of zi and zj change if the α value passes the following point where zi = zj for j > i: αµi − (1 − α)σi2 = αµj − (1 − α)σj2 =⇒ α =

σj2 − σi2 µj − µi + σj2 − σi2

2. The sign of zi changes if α passes the following point: αµi − (1 − α)σi2 = 0 =⇒ α =

σi2 µi + σi2

There are |N |2 /2 values of α for the first case and |N | values of α for the second case. Considering α = 0 and α = 1, we have at most |N |2 /2 + |N | + 2 different α values. Moreover, some of them may 20

not be in the interval [0, 1] since there may be cases with µi > µj and σi < σj . If we generate all α values and combine them in an ordered set, then each successive pair of α values forms an interval in which the optimal solution stays the same. The complexity of finding all α values is O(|N |2 ) . The complexity of solving the problem with the objective function (17) for each α value is O(|N | log |N |) if we use the Quicksort algorithm. It follows that the complexity of the algorithm which solves the P original problem is O(|N |3 log |N |). Finally, if there is no feasible solution s such that i∈s µi > R, then we can still apply this algorithm by simply changing the (−) sign to a (+) sign between the terms in objective function (17). However, in this case, the algorithm does not guarantee the optimality of the solution to the problem.

Online Appendix E The Design of the Numerical Analysis Our first objective is to test the efficiency and accuracy of the 1-level heuristic by comparing its solution with the optimal static solution and with bounds on the optimal adaptive solutions. Our second objective is to identify the conditions under which it is valuable to employ an adaptive policy over a static one. All algorithms used in this study are coded in C++ and run on a PC with a Pentium 4 CPU and 512 MB of RAM. The test problems are composed of items with Normally distributed rewards. Thus, only the means and variances of the rewards are needed to characterize the randomness in the problem. We concentrate on Normally distributed rewards since the Normal distribution has a wide range of applications in practice and, since there are efficient algorithms for the Static SKP-NOR that enable us to carry out the tests effectively. We designed two sets of problems. The first set consists of problems containing 6 to 15 item types, for which we can obtain the optimal adaptive solution using our DP algorithm. The second set consists of problems containing 20 to 30 types for which we cannot obtain the optimal adaptive solution but for which we can still find the optimal static solution and compare it against the 1-level heuristic. The detailed description of the construction of these sets is given in the following two subsections.

E.1. The Design of Problems with 6 to 15 Item Types One of the objectives of the numerical study is to analyze the effect of the different levels of problem parameters. We first examine a set of problems with equal weight items (i.e., wi = 1). This is motivated by the fact that in the inventory audit problem, the time that auditors spend in a supplier is usually one day. Furthermore, the case with equally-weighted items allows us to analyze the effect of the stochasticity of rewards, without the confounding impact of different item weights. E.1.1. The Design of the Equal-Weight Cases We created test problems with N = 6, 8, 9, 10, 12, and 15. In all test problems, each type contained one item only; i.e., Ni = 1, i ∈ N . Thus, the number of item types and the total number of items available were the same. We used items with Normally distributed combining different sets of coefficients of These sets are given in Figure 4, where values, resulting in 100 possible mean/c.v

rewards. We created 53 different groups of item types by variation (c.v.) and different sets of mean reward levels. there are 10 sets of c.v. values, 10 sets of mean reward combinations. For example, the combination of two mean 21

(a)

(b)

Figure 4: The scheme to create reward distributions of item types for the Adaptive SKP. (a) Each coordinate point defines the distribution of an item type. (b) Each combination of mean-level and c.v.-level gives a group of item types.

values {m3, m4} and three c.v. values {c1, c3, c5} generates a problem with 2 × 3 = 6 item types, corresponding to coordinate points numbered 14, 16, 18, 20, 22 and 24 in Figure 4. Excluding the 47 possible combinations for which the number of items is either more than 15 or less than 6 (e.g., the combination of {c1, c2, c5, c6} and {m1, m2, m3, m4, m5} has 20 item types) results in a total of 53 different scenarios with different numbers of items and different reward distributions. We used three different target reward levels: the low-level, Rl = b0.70rc; the medium level, Rm = P P b0.85rc; and the high level, Rh = brc, where r = max{ i µi yi : i wi yi ≤ W }. Also, we used two levels of the knapsack capacity to analyze the effect of the number of items to be inserted into P P knapsack: the low level, Wl = b i∈N wi /3c, and the high level, Wh = b i∈N wi /2c. In total, we created 318 (= 53 × 3 × 2) problem instances combining 53 sets of reward distributions, 3 levels of the target reward and 2 levels of the knapsack capacity. E.1.2. The Design of the Unequal-Weight Cases We used the same experimental design outlined above except that we fixed the target level to its high level and assigned different weights to the items. After sorting the items by their mean rewards in descending order, we created weights by using the following 2 levels: in Level-1, we set wi = 2 for i = 1, . . . , b|N |/2c, and wi = 1 for i = (b|N |/2c + 1), . . . , |N |; in Level-2, we set wi = 4 for i = 1, . . . , b|N |/2c, and wi = 1 for i = (b|N |/2c + 1), . . . , |N |. Fixing the target reward factor and adding 2 weight levels, we created 212 (=53 × 2 × 2) problems.
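As referenced in E.1.1, the sketch below shows how the target and capacity levels can be computed for a unit-weight instance. The item means are hypothetical; with unit weights, the inner maximization reduces to summing the W largest means:

import math

# Hypothetical unit-weight instance: one item per type, w_i = 1.
mu = [12.0, 10.0, 9.0, 7.0, 5.0, 4.0]

def max_mean_reward(mu, W):
    # With w_i = 1, max{sum mu_i y_i : sum w_i y_i <= W} is simply
    # the sum of the W largest mean rewards.
    return sum(sorted(mu, reverse=True)[:W])

W_low = len(mu) // 3    # W_l = floor(sum w_i / 3)
W_high = len(mu) // 2   # W_h = floor(sum w_i / 2)

for W in (W_low, W_high):
    r = max_mean_reward(mu, W)
    R_low, R_mid, R_high = math.floor(0.70 * r), math.floor(0.85 * r), math.floor(r)
    print(f"W={W}: r={r}, targets=({R_low}, {R_mid}, {R_high})")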

E.2. The Design of Problems with 20 to 30 Item Types

To study larger instances, we created problems with N = 20, 22, 24, 26, 28, and 30 items. We generated 1000 problems for each level of N (6000 problems in total). For each of the 6000 problems, we selected the means and variances of the item rewards from the 36 coordinate points given in Figure 4, each with probability 1/36. In other words, we randomly picked N pairs of mean reward and c.v. values from the set of 36 points (allowing the same point to be selected more than once). Also, item weights were set to 1, and for each problem the capacity and the target reward levels were generated by setting W = N/2 and R = 0.9 × max{∑i µi yi : ∑i yi ≤ W}.
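A minimal generator for this second problem set might look as follows. The 36 coordinate points of Figure 4 are not reproduced in this excerpt, so the grid below is a hypothetical stand-in:

import random

# Hypothetical stand-in for the 36 (mean, c.v.) coordinate points of Figure 4.
POINTS = [(m, c) for m in (10, 20, 30, 40, 50, 60)
                 for c in (0.2, 0.4, 0.6, 0.8, 1.0, 1.2)]

def generate_instance(n, rng=random.Random(0)):
    """Draw n (mean, c.v.) pairs uniformly with replacement, set unit
    weights, W = n/2, and R = 0.9 * max{sum mu_i y_i : sum y_i <= W}."""
    drawn = [rng.choice(POINTS) for _ in range(n)]
    mu = [m for m, _ in drawn]
    sigma = [m * c for m, c in drawn]            # c.v. = sigma / mu
    W = n // 2
    R = 0.9 * sum(sorted(mu, reverse=True)[:W])  # unit weights: W largest means
    return mu, sigma, W, R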

Online Appendix F The Evaluation of the 1-Level Heuristic

In this appendix, we evaluate the efficiency and accuracy of the 1-level heuristic. Since we did not have an optimal algorithm for solving the Adaptive SKP-NOR, we used the DP algorithm in Section 2 along with the discretization scheme in Section 4.1 to obtain upper bounds on the optimal adaptive probabilities. To solve the Static SKP-NOR with equal weights, we used the polynomial-time algorithm presented in Online Appendix D for the unit-weight version of this problem. To solve the Static SKP-NOR with unequal weights, we used the algorithm by Henig (1990).

F.1. Equal-Weight Case

This subsection evaluates the accuracy and the efficiency of the 1-level heuristic for test problems with equally-weighted items.

F.1.1. The Accuracy of the Heuristic for Problems with 6 to 15 Items

As mentioned before, the probability obtained by the 1-level heuristic at each step is not the true probability of reaching the target reward, since we use the probability values approximated by the static sub-problems instead of the actual ones. Thus, to estimate the true probabilities of the policy obtained by the 1-level heuristic (P1^actual), we used Monte Carlo simulation. For each problem, we simulated 2000 runs to estimate the probability of reaching the target value under the policy obtained by the 1-level heuristic. The result of each run was a success or a failure in attaining the target reward level; the total number of successes divided by the total number of runs is an estimate of the true probability of reaching the target under the policy obtained by our 1-level heuristic. A sketch of this simulation loop is given at the end of Section F.1.

In the absence of an optimal algorithm for solving the Adaptive SKP-NOR, we used the discretization scheme and obtained P̄A, the upper bound on the optimal adaptive probability (PA). The difference between P̄A and P1^actual is an overestimate of PA − P1^actual (i.e., an underestimate of the accuracy of the 1-level heuristic).

The average gap between the upper bound on the optimal adaptive solution and the 1-level heuristic (i.e., P̄A − P1^actual) is 0.019, with a standard deviation of 0.015. Figure 5(a) shows that the maximum deviation of the 1-level heuristic from the optimal adaptive solution is 0.06. Large deviations occur when the target level and/or the level of diversity in the c.v.s is relatively high. These results show that the 1-level heuristic produces solutions that are very close to the optimal adaptive solutions.

F.1.2. The Accuracy of the Heuristic for Problems with 20 to 30 Items

Due to the large state space of the DP model, we could not test the proposed 1-level heuristic against the optimal adaptive policy for problems with more than 15 unit-weight items. However, in the experiments with 6 to 15 items, the gap between the 1-level heuristic solution and the upper bound on the optimal adaptive solution did not grow as N increased from 6 to 8, 9, 10, 12, and 15. Thus, we expect that the 1-level heuristic also performs well for larger problems.
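The simulation referenced in F.1.1 can be sketched as follows. The policy and sampling interfaces are hypothetical stand-ins (the paper specifies no code): choose_next replays the 1-level heuristic's decision at the current state, and sample_reward draws one realization of an item's random reward:

import random

def estimate_success_probability(choose_next, weights, sample_reward, R, W,
                                 runs=2000, rng=random.Random(0)):
    """Monte Carlo estimate of P_1^actual: the fraction of simulated runs
    in which the adaptive policy reaches the target reward R."""
    successes = 0
    for _ in range(runs):
        remaining = set(range(len(weights)))
        capacity, shortfall = W, R
        while capacity > 0 and remaining:
            i = choose_next(remaining, capacity, shortfall)
            if i is None or weights[i] > capacity:
                break
            remaining.discard(i)
            capacity -= weights[i]
            shortfall -= sample_reward(i, rng)  # reward observed only after insertion
        if shortfall <= 0:
            successes += 1
    return successes / runs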


Figure 5: Histograms showing the deviation of the 1-level heuristic from the upper bound on the optimal solution for problems with (a) equal-weight items and (b) unequal-weight items. The first case contains 318 test problems and the second contains 212 test problems. The x-axis represents P̄A − P1^actual, and the y-axis shows the frequency of those values. At the bottom of each chart (just above the x-axis), the mean of these differences, x̄, and its 95% confidence interval are provided.

To further investigate the accuracy of our 1-level heuristic on large problems, we compared the gap between the solution obtained by our 1-level heuristic and the optimal static solution (P1^actual − PS) for problems with 6 to 15 items to the corresponding gap for problems with 20 to 30 items. On average, the probability of reaching the target using the 1-level heuristic is 0.077 higher than the probability when the static algorithm is employed, with a standard deviation of 0.030. For each value of N, the average gap between the heuristic and the optimal static solution is at least 0.065. Figure 6 shows that neither the gap between the 1-level heuristic and the optimal adaptive solutions nor the gap between the 1-level heuristic and the optimal static solutions changes much as the number of items increases. Thus, we conclude that the 1-level heuristic performs very well in terms of the probabilistic objective function of the SKP.

F.1.3. The Efficiency of the Heuristic

The runtime of the DP algorithm increases exponentially in the problem size. The numerical analysis yields the following regression equation for the runtime of the DP algorithm when the Normal reward distributions are discretized using the method in Section 4.1: runtime = exp(−7.01 + 0.48|N| + 1.24W), with R² = 0.98 and p-values < 0.0005 for the coefficients. The regression equation for the runtime of the 1-level heuristic is runtime = 0.06|N|^1.2 W^1.1, with R² = 0.98 and p-values < 0.0005 for the coefficients. The large runtime difference between the two algorithms is due to the fact that the optimal DP algorithm solves the problem for all states, while the heuristic solves it only for the current state. Thus, as long as we have an efficient method for solving the static subproblems, the 1-level heuristic runs much faster than the optimal adaptive algorithm; the sketch below illustrates the scale of the difference.
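Plugging representative sizes into the two fitted regressions makes the gap concrete (units are assumed to be seconds, consistent with the runtimes reported in Section F.2.2):

from math import exp

def dp_runtime(n, W):
    # Fitted regression for the DP algorithm: exp(-7.01 + 0.48|N| + 1.24W).
    return exp(-7.01 + 0.48 * n + 1.24 * W)

def heuristic_runtime(n, W):
    # Fitted regression for the 1-level heuristic: 0.06 |N|^1.2 W^1.1.
    return 0.06 * n ** 1.2 * W ** 1.1

# At |N| = 15, W = 7 the fitted DP runtime is already about 7,100 seconds,
# versus roughly 13 seconds for the heuristic.
for n, W in [(10, 5), (15, 7)]:
    print(n, W, round(dp_runtime(n, W), 1), round(heuristic_runtime(n, W), 1))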


Figure 6: The plot of the number of items versus the average probabilities corresponding to the optimal static solutions, the upper bounds on the optimal adaptive solutions, and the 1-level heuristic solutions (i.e., PS, the upper bound on PA, and P1^actual, respectively).

F.2. Unequal-Weight Case

This subsection evaluates the accuracy and the efficiency of the 1-level heuristic for test problems with unequally-weighted items.

F.2.1. The Accuracy of the Heuristic

Figure 5(b) shows the differences between the solution values obtained by the 1-level heuristic and the upper bounds on the optimal adaptive solution for test problems with unequally-weighted items. The average difference was 0.028 with a standard deviation of 0.009. The maximum deviation of the 1-level heuristic from the optimal adaptive solution was 0.08 in our test problems. Large deviations occur when the level of diversity in the c.v.s is relatively high. These results confirm our earlier observation that the 1-level heuristic produces solutions whose objective function values are close to those of the optimal adaptive algorithm.

F.2.2. The Efficiency of the Heuristic

The regression equation for the runtime of the 1-level heuristic for problems with unequal weights is runtime = 0.028|N|^2.63 W^0.46, with R² = 0.86 and p-values < 0.0005 for the coefficients. A single run of the heuristic for problems with 6 to 15 items took at most 14 seconds, with an average of 3.2 seconds. For the inventory audit problem with |N| = 40 and W = 7 and unequal weights, the estimated runtime of the heuristic is less than 19 minutes. Thus, the 1-level heuristic works well for problems with both equal and unequal weights.
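As a quick check of the estimate quoted above, evaluating the fitted unequal-weight regression at |N| = 40 and W = 7 gives:

# 0.028 * 40^2.63 * 7^0.46 ≈ 1120 seconds ≈ 18.7 minutes, i.e., under 19 minutes.
runtime_seconds = 0.028 * 40 ** 2.63 * 7 ** 0.46
print(runtime_seconds / 60)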

Online Appendix G The Value of Adaptivity

The value of adaptivity (VoA) is the gap between the objective function values of the optimal adaptive and the optimal static solutions (VoA = PA − PS). This quantity is important for both the optimal and the heuristic algorithms since, if the VoA is small, there is no value in using an algorithm designed to solve the Adaptive SKP. Using the DP algorithm with the lower bounding scheme in Section 4.1, we obtain lower bounds on the VoA (i.e., the lower bound on PA minus PS) for the 530 problem instances with 6 to 15 items (which include the equal- and unequal-weight cases). The next two subsections analyze how the stochastic properties of the item rewards and the different item weight levels affect the VoA. The third subsection explores how well the 1-level heuristic captures the value of adaptivity.

G.1. The Effect of Stochasticity on the VoA

The design of the reward distributions in Figure 4 enables us to test the value of using the adaptive model over the static model under many different conditions, in addition to several target reward and knapsack capacity levels. Specifically, the design in Figure 4 enables us to observe the effects of the dispersion, variety, and average of the c.v.s and mean values of the random rewards on the value of the adaptive policy over the static one.

• Dispersion of c.v.s: This factor provides insight into how the VoA changes as the difference between the c.v. values of the items (i.e., the variability in the item rewards) increases while their average stays the same. To examine this, we compare 120 problems created with {c3, c4}, {c2, c5} and {c1, c6} (the low, medium, and high levels of dispersion in the c.v.s, respectively). For these cases, the average c.v. value remains at 0.7 while the range of c.v.s increases from 0.02 (low level) to 1 (high level). For each level of dispersion in the c.v.s, we have 40 instances, i.e., 24 equal-weight instances plus 16 unequal-weight instances. We performed a two-sample t-test of the null hypothesis "H0: the dispersion level of c.v.s has no effect on the VoA" for every pair of levels of this factor and calculated the corresponding confidence intervals (a sketch of this test appears after this list). If the confidence interval for the difference includes zero, there is no statistically significant difference in the average VoA under the different c.v. scenarios {c3, c4}, {c2, c5} and {c1, c6}.

• Variety of c.v.s: As the variety level increases, there are more items with different c.v. values, although their average stays the same. Comparing problems created with {c1, c6}, {c1, c2, c5, c6} and {c1, c2, c3, c4, c5, c6} enables us to analyze the effect of low, medium, and high levels of variety in the c.v.s. The difference between "dispersion" and "variety" is that in the latter case there are more items with different levels of c.v.s. The low, medium, and high levels of this factor correspond to instances generated by 4, 7, and 6 mean levels (counting the valid combinations of c.v. and mean-level sets), respectively, and 10 different target, weight, and capacity levels. Thus, the low, medium, and high levels contain 40, 70, and 60 instances, respectively. The null hypothesis "H0: the variety level of c.v.s has no effect" was again tested for every pair of levels of this factor.

• Average of c.v.s: We compared problems created from the following three scenarios for the c.v.s, {c1, c3}, {c2, c4} and {c3, c5}, which correspond to gradually increasing average c.v. values (0.4, 0.6, and 0.8, respectively) while the difference between the c.v. values stays the same. This enabled us to observe the impact of increasing the c.v.s (or risk levels) of all item rewards together. Note that in the previous two cases (i.e., dispersion and variety) the average c.v. level stays the same. There are 40 instances for each level of the average of the c.v.s.

• We made similar comparisons for the mean rewards. Problems created with {m3, m4}, {m2, m5} and {m1, m6} were used to analyze the effect of the dispersion in mean rewards; problems created with {m1, m6}, {m1, m2, m5, m6} and {m1, m2, m3, m4, m5, m6} were used to analyze the effect of the variety in mean rewards; and problems created with {m1, m3}, {m2, m4} and {m3, m5} were used to analyze the effect of the average of the mean rewards.
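As referenced in the first bullet above, the comparison of two factor levels can be sketched as a two-sample test on the per-instance VoA values. The Normal approximation below stands in for exact t critical values, which is reasonable at these sample sizes (40 to 70 instances per level); the input arrays are hypothetical:

from math import sqrt
from statistics import mean, stdev, NormalDist

def level_difference_ci(voa_low, voa_high, level=0.95):
    """Difference in mean VoA between two factor levels with a
    Welch-style confidence interval (Normal approximation). H0 of
    'no effect' is rejected when the interval excludes zero."""
    d = mean(voa_high) - mean(voa_low)
    se = sqrt(stdev(voa_low) ** 2 / len(voa_low)
              + stdev(voa_high) ** 2 / len(voa_high))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d, (d - z * se, d + z * se)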
Table 1 summarizes the results for the difference between the lower bound on the probability of reaching the target reward using the optimal adaptive policy and the probability of reaching the target using the optimal static policy. The average and maximum differences are 0.05 and 0.14, respectively. For about 75% of the problems, the benefit of the adaptive policy over the static policy (i.e., the VoA) is more than 0.03.

Table 1: Summary of the experimental results on the difference between the lower bound on the objective function value of the optimal adaptive solution and the objective function value of the optimal static solution with respect to different factor levels. Bold values correspond to differences that are statistically significantly different from zero.

The tests show that the target reward level is an important factor affecting the value of adaptive solutions. If the target level is low, the average lower bound on the difference between the static and adaptive solutions is 0.04; if it is high, the corresponding average is 0.07. The estimated increase in the difference from the low target level to the high target level is 0.036, with a 95% confidence interval of (0.028, 0.044), which does not contain 0. When the target level is low, the probability of reaching the target is close to 1, so the possible improvement from using the adaptive model is small.

The dispersion and variety of the c.v.s also affect the value of adaptivity. When the c.v. dispersion level changes from low to high, the value of adaptivity increases by an average of 0.045, with a 95% confidence interval of (0.032, 0.058). When we have items with more diverse variance levels, the ability to select between items with high and low variances becomes more valuable. Also, when the c.v. variety level changes from low to high, the value of adaptivity increases by an average of 0.028, with a 95% confidence interval of (0.012, 0.044). The problem instance that shows the maximum value of adaptivity (a 0.15 difference between objective function values) has the high level of c.v. variety and the high level of the target reward. None of the remaining conditions exhibits a statistically significant effect on the value of the adaptive policy.

Clearly, a stochastic model may be more valuable than a deterministic model whenever there is randomness in a knapsack problem. However, our experiments indicate that the mean values of the item rewards, the average level of the c.v.s, and the knapsack capacity do not have a direct effect on the value of an adaptive model over a static model. An adaptive stochastic model appears to have value when the problem exhibits diversity in risk beyond mere randomness, since this diversity provides the ability to adjust the amount of risk taken depending on the target level.

G.2. Equal-Weight Cases vs. Unequal-Weight Cases

We compared the value of adaptivity obtained for each unequal-weight case with that of the corresponding equal-weight case. The 95% confidence interval of (−0.018, 0.005) around the mean of the differences between the cases with unit weights and the cases with Level-1 weights contains 0. Thus, we cannot conclude that the value of adaptivity is higher or lower for the cases with Level-1 weights. On the other hand, the average difference between the VoA for the cases with unit weights and the VoA for the cases with Level-2 weights was −0.021, with a 95% confidence interval of (−0.032, −0.011). These results suggest that, when items are not equally weighted, the difference between item weights can have a statistically significant effect on the value of adaptivity (compared to cases with equally-weighted items).

G.3. Capturing the Value of Adaptivity

We compared the 1-level heuristic results for the 530 problems with 6 to 15 items against both the optimal adaptive and optimal static solutions to see whether the 1-level heuristic performs closer to the optimal adaptive or to the optimal static algorithm. For this purpose, we use the Ratio of VoAs, defined as (P1^actual − PS)/(P̄A − PS). If the ratio is close to 1, then the 1-level heuristic performs close to the optimal adaptive solution. If the upper bound on the VoA is 0, we set the ratio to 0. In general, the ratios are closer to 1 than to 0, with a mean value of 0.72. The ratio consistently increases as the VoA increases, and most of the problems for which the ratio is close to 0 are those with almost no value of adaptivity. These results show that the 1-level heuristic successfully captures most of the value of solving the SKP in an adaptive manner.
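The ratio and its zero-denominator convention can be written compactly; the three probabilities are the quantities defined above:

def ratio_of_voas(p1_actual, p_static, p_adaptive_ub):
    # Ratio of VoAs: fraction of the (upper bound on the) value of
    # adaptivity captured by the 1-level heuristic. By the convention
    # in the text, the ratio is 0 when the upper bound on the VoA is 0.
    voa_ub = p_adaptive_ub - p_static
    return 0.0 if voa_ub == 0 else (p1_actual - p_static) / voa_ub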
