Neural Networks based Data Mining and Knowledge Discovery in Inventory Applications

Kanti Bansal, Sanjeev Vadhavkar, Amar Gupta

1. Introduction

Large organizations, especially geographically dispersed ones, are usually obliged to carry large inventories of products ready for delivery on customer demand. The problem is how much of each product should be kept in the inventory at each store and each warehouse. If too little inventory is carried relative to demand, unsatisfied customers frequently turn to competing organizations. On the other hand, a financial cost is incurred for carrying excessive inventory. In addition, some products have short expiration periods and must be replaced periodically. Inventories are expensive to maintain. The best way to manage an inventory is to develop better techniques for predicting customer demand and managing stock accordingly; in this way, the size and the constitution of the inventory can be optimized with respect to changing demand. In this paper, we describe our neural network based efforts to optimize operations at an organization, which we call Retailcorp.

2. Highlights of Application Domain

Historically, Retailcorp has maintained an inventory of approximately a billion dollars on a continuing basis and has used traditional regression models to determine inventory levels for each item. The corporate policy of Retailcorp is governed by two competing principles: minimize total inventory and achieve the highest level of customer satisfaction. The former principle is not quantified in numerical terms. On the latter issue, Retailcorp strives to achieve a 95% fulfillment level: if a random customer walks into a random store, the probability that a particular item is available is 95%. The figure of 95% is based on the type of goods that Retailcorp carries and the service levels offered by its competitors for the same items. Retailcorp has about 1000 stores and maintains information on what was sold, at what price, and to whom. The last piece of data has not been utilized in any inventory-modeling endeavor at Retailcorp. After reviewing various options, Retailcorp adopted a "three weeks of supply" approach. This approach involves a regression study of historical data to compute a seasonally adjusted estimate of the forecasted demand for the next three-week period. This estimated demand is the inventory level that Retailcorp keeps, or strives to keep, on a continuing basis. Each store within the Retailcorp chain orders replenishments on a weekly basis and receives the ordered items 2-3 days later. Overall, this model yields the 95% target for customer satisfaction.

3. Neural Network based Data Mining and Modeling

There is no general theory that specifies the type of neural network, the number of layers, the number of nodes (at various layers), or the learning algorithm for a given problem. As such, today's network builder must experiment with a large number of neural networks before converging upon the appropriate one for the problem at hand. In order to evaluate the relative performance of each neural network, we made use of three different coefficients: the Pearson Correlation Coefficient (P. Correlation), the Normalized Mean Square Error (NMSE), and the Absolute Error (AE). The Pearson Correlation Coefficient shows how well trends, i.e., bumps and valleys, are picked up. It is a number ranging between -1 and 1; if the simulation predicts bumps and valleys perfectly, the corresponding Pearson Correlation is 1. The NMSE compares the predictions against the mean of the series. If the NMSE is greater than 1, the predictions are doing worse than the series mean; if it is less than 1, the forecasts are doing better than the series mean.

The Absolute Error (AE) indicates, as a percentage, the average difference between the predicted and actual values. For instance, an AE of 0.40 means that the neural network provides predictions that are, on average, within plus or minus 40% of the actual values. The AE is not a measure of how well the model predicts trends; for that, one must use the Pearson Correlation.
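As an illustration, the three measures can be computed as follows. This is our sketch rather than code from the original study; in particular, the exact normalizations used for the NMSE and AE are assumptions inferred from the descriptions above.

```python
import numpy as np

def pearson_correlation(actual, predicted):
    # Measures how well bumps and valleys are tracked; ranges from -1 to 1.
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.corrcoef(a, p)[0, 1]

def nmse(actual, predicted):
    # Squared error normalized by the error of always predicting the
    # series mean: values above 1 mean "worse than the series mean".
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sum((a - p) ** 2) / np.sum((a - a.mean()) ** 2)

def absolute_error(actual, predicted):
    # Assumed here to be the mean absolute deviation as a fraction of the
    # actual values (e.g., 0.40 means within +/- 40% on average).
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(a - p) / np.abs(a))

actual    = [100, 120, 90, 110, 130]
predicted = [105, 115, 95, 120, 125]
print(pearson_correlation(actual, predicted))  # close to 1: trends captured
print(nmse(actual, predicted))                 # below 1: beats the series mean
print(absolute_error(actual, predicted))       # average relative deviation
```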

To build the neural networks, a freeware product called SNNS (Stuttgart Neural Network Simulator) version 4.0 was used. Most major neural network architectures and learning algorithms were tested using sample data patterns from Retailcorp. Investigations into recurrent neural networks, topographic learning, and Hebbian learning were discontinued because of poor preliminary forecasting results. Multi Layer Perceptron (MLP) models and Time Delay Neural Network (TDNN) models yielded promising results and were studied in greater detail.

In terms of modeling, short time intervals require a greater number of forecast points, show greater variability in sales demand, and exhibit less dependence on previous sales history. As such, good short-interval predictions can be very difficult to obtain. Using MLP architectures and sales data for one class of products, we initially attempted to forecast sales demand on a daily basis. The results were unsatisfactory: the networks produced predictions with very low Pearson Correlation (generally below 20%) and very high Absolute Error values (above 80%). Such large errors rendered the forecast values useless. Therefore, modeling for larger time intervals was attempted. Forecasting for a week proved more accurate than for a day, and forecasting for a month proved more accurate than for a week. Indeed, when predicting aggregate annual sales demand, we obtained average Absolute Error values of only 2%. Keeping a weekly prediction interval provided the best compromise between the accuracy of the predictions and the usefulness of the predicted information for Retailcorp. The weekly forecasts are useful for designing inventory management systems for individual stores within the Retailcorp chain, while the yearly forecasts are useful for determining the performance of a particular item in a market and the overall financial performance of the organization.
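The trade-off between interval length and accuracy suggests aggregating the raw daily series before training. A minimal sketch of such an aggregation step follows; the seven-day grouping and the variable names are ours, not taken from the paper.

```python
def aggregate(series, interval):
    """Sum a daily sales series into coarser buckets (e.g., 7 -> weekly).

    Trailing days that do not fill a complete bucket are dropped, so every
    training target reflects a full interval.
    """
    n = len(series) - len(series) % interval
    return [sum(series[i:i + interval]) for i in range(0, n, interval)]

daily_sales = [3, 0, 5, 2, 4, 1, 6, 2, 3, 0, 7, 1, 2, 4]  # two weeks of toy data
weekly_sales = aggregate(daily_sales, 7)    # [21, 19]: smoother, fewer points
monthly_sales = aggregate(daily_sales, 28)  # []: not enough data for a month
```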
The neural network was trained with historic sales data using two methods: the standard method and the rolling method. The difference between the two is best explained with an example. Assume that the weekly sales data (in units sold) were 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, and so on. In the standard method, we would present the data "10, 20, 30" and ask the network to predict the fourth value, "40"; then we would present "40, 50, 60" and ask it to predict the next value, "70"; and we would continue this process until all the training data were exhausted. In the rolling method, we would present "10, 20, 30" and ask the network to predict the fourth value, "40"; then we would present "20, 30, 40" and ask it to predict the fifth value, "50"; and so on until all the training data were exhausted. Either method can be used to produce the training sets. The rolling method has the advantage over the standard method of producing a greater quantity of training examples, but at the expense of training data quality: it can confuse the network because of the close similarity between training examples. In the example above, the rolling method produces "10, 20, 30", "20, 30, 40", "30, 40, 50", where each training example differs from the next by a single number. This minuscule difference may confuse the network and destroy its ability to forecast. The standard method, on the other hand, favors quality of training examples over quantity. In our case, this differentiation problem was never encountered: the training sets, though few in number, were adequately different from each other.

With respect to neural networks, there are two types of memory: implicit and explicit. Implicit memory is stored in the connections within the neural network itself (as in Time Delay Neural Networks), whereas explicit memory is presented to the network as part of its input (as in Multi Layer Perceptron networks). The size of this memory can be varied considerably by changing the size of the window of historical data used in each case. For Retailcorp, we utilized history windows as large as 14 weeks and as small as 0 weeks. The optimal configuration for the MLP models was determined to be seven weeks of historical data.
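To make the two training-set constructions concrete, here is a small sketch; it is our illustration, not code from the study. The default window length of seven matches the optimal MLP history window reported above, and the function names are hypothetical.

```python
def standard_windows(series, window=7):
    # Non-overlapping presentation: with window=3, "10, 20, 30" -> 40,
    # then "40, 50, 60" -> 70. Fewer examples, but mutually dissimilar.
    return [(series[i:i + window], series[i + window])
            for i in range(0, len(series) - window, window)]

def rolling_windows(series, window=7):
    # Overlapping windows sliding one step: "10, 20, 30" -> 40, then
    # "20, 30, 40" -> 50. More examples, but adjacent ones nearly identical.
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

weekly = list(range(10, 210, 10))      # 10, 20, ..., 200: toy weekly sales
print(len(standard_windows(weekly)))   # 2 training pairs
print(len(rolling_windows(weekly)))    # 13 pairs, one per window position
```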
At Retailcorp, some items sell infrequently; some may sell only twice or thrice a year at a particular store. This lack of sales data is a major problem in training neural networks. To solve the data scarcity problem, methods for transforming, reusing, and aggregating the data had to be employed. The one we found most effective involved blending each data set with a known fraction of the preceding one. If X'[i] represents the i-th modified data set, X[i] the i-th original data set, X[i-1] the (i-1)-th original data set, and μ a numerical factor, then the new time series can be computed as

X'[i] = X[i] + μ · X[i-1], with X'[0] = X[0].

The modified time series thus has data elements that retain a fraction of the information of past elements. By modifying the actual time series with this scheme, the memory of non-zero sales is retained for a longer period of time, which makes the neural networks easier to train.
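A direct transcription of this smoothing transformation follows; this is our sketch, and the value of μ is an arbitrary choice for illustration, since the paper does not report the factor it used.

```python
def blend_series(series, mu):
    # X'[i] = X[i] + mu * X[i-1], with X'[0] = X[0]: each element keeps a
    # fraction of its (original) predecessor, so isolated non-zero sales
    # leave a trace in the following data set as well.
    blended = [series[0]]
    for i in range(1, len(series)):
        blended.append(series[i] + mu * series[i - 1])
    return blended

sparse_sales = [0, 0, 5, 0, 0, 0, 3, 0]    # an item that rarely sells
print(blend_series(sparse_sales, mu=0.5))  # [0, 0, 5, 2.5, 0, 0, 3, 1.5]
```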
Another solution to the scarcity problem is to recycle old data, that is, to train the neural network over the same data set several times. In the Retailcorp project, we allowed the neural networks to cycle through the data approximately 3000 times. The greatest advantage of recycling is that it allows for more accurate neural network solutions without increasing the amount of data fed to the network (thereby saving space). However, recycling has two disadvantages: the risk of over-training and the overhead time. As such, we carefully balanced the amount of recycling, via experimentation, to obtain the optimized set of models and results. We used data from 1994 and 1995 for training the neural network, and data from 1996 for testing the validity of the model.

4. Interpreting Results

As mentioned before, the policies at Retailcorp are governed by two competing principles: minimize drug inventories and enhance customer satisfaction via high availability of items in stock. As such, we calibrated the different inventory models using two parameters: "undershoots" and "days of supply". The number of undershoots denotes the number of times a customer would have been turned away if a particular inventory model had been used over the test period. The days-of-supply statistic is the number of days the stock of a particular item is expected to last. The latter parameter reduces complexity and allows for equitable comparisons across different categories of items. Items in an inventory are measured in different ways, by weight, by volume, or by count; raw amounts would require taking these different units of measure into account, whereas days-of-supply expresses every item in a single unit, days. The popularity of the item is also factored into the days-of-supply parameter.
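As an illustration of how the two calibration parameters might be computed from a forecast-driven inventory simulation (our sketch; the paper does not give its exact procedure, and both definitions below are our reading of the parameters):

```python
def evaluate_policy(demand, stock_levels):
    """Count undershoots and average days of supply over a test period.

    demand[t]       -- actual units demanded on day t
    stock_levels[t] -- units the model chose to hold on day t
    """
    # An undershoot: a day on which demand exceeds the stock held.
    undershoots = sum(1 for d, s in zip(demand, stock_levels) if d > s)
    avg_daily_demand = sum(demand) / len(demand)
    avg_stock = sum(stock_levels) / len(stock_levels)
    # Days of supply: how long the average stock lasts at the average
    # demand rate, putting all items on the common unit of "days".
    days_of_supply = avg_stock / avg_daily_demand if avg_daily_demand else float("inf")
    return undershoots, days_of_supply

demand = [4, 6, 3, 8, 5, 7, 2]
stock  = [5, 5, 5, 5, 5, 5, 5]         # a flat "hold 5 units" policy
print(evaluate_policy(demand, stock))  # (3, 1.0): 3 turned-away days, 1 day of supply
```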

To compare with traditional statistical methods, a flat sales model was designed from the same set of data. The flat sales model, using the 1994-1995 data as predictive of 1996 sales, assumes that customer demand follows a normal statistical curve from January 1996 to September 1996. Table 1 shows a comparison of the results from the various models. For slow-moving items, the MLP model is considerably better than the flat model (for example, file numbers #78, #82, and #1235 in Table 1). While maintaining a 95% probability of customer satisfaction, the MLP model reduces the days-of-supply for items in the inventory by 66%. On average, the neural network undershoots only three times (keeping the 95% customer satisfaction policy of Retailcorp). If one analyzes the results, one finds that our models suggest that, as compared to the "three weeks of supply" rule of thumb, the level of inventory is reduced for popular items and increased for less popular or unpopular items. This inference appears counter-intuitive at first glance. However, since fast-moving items are already carried in large amounts, and since they can be replenished at weekly intervals, one can reduce their inventory level without adversely impacting the likelihood of availability when needed. This is the factor that permits a significant reduction in the size of the total inventory.

5. Conclusions

The rapid growth of business databases has overwhelmed the traditional, interactive approaches to data analysis and created a need for a new generation of tools for intelligent and automated discovery in data. Knowledge discovery in databases presents many interesting challenges within the context of providing computer tools for exploring large data archives. Inventory control is a nascent application for neural network based data mining and knowledge discovery techniques. After studying the constraints that characterize the distribution arena, a new inventory optimization system has been developed using an ultra-sparse single layer neural network. By deploying this neural network based model, the inventory at Retailcorp, consisting of over a billion dollars worth of drugs, can be reduced by 50% to about one-half billion dollars while maintaining the original customer satisfaction level (95% availability).

Table 1 Comparison of Results for Item #12345

For Further Reading:

[1] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (Eds.) (1996). Advances in Knowledge Discovery and Data Mining. MIT Press.
[2] Brachman, R.J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E. (1996). Mining Business Databases. Communications of the ACM, vol. 39, no. 11, November 1996.
[3] Murata, N. et al. (1991). Artificial Neural Networks, vol. 1, 9-14.
[4] Soulie, F. (1991). Artificial Neural Networks, vol. 1, 605-615.
[5] Becker, S. et al. (1988). Improving the convergence of back-propagation learning with second-order methods. Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 29-37.
[6] Hebb, D.O. (1949). The Organization of Behavior. Wiley.
[7] Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
[8] Mozer, M.C. (1993). Neural Net Architectures for Temporal Sequence Processing. In Predicting the Future and Understanding the Past (A. Weigend and N. Gershenfeld, Eds.), Addison-Wesley.
[9] Hush, D.R. and Horne, B.G. (1993). Progress in supervised neural networks: What's new since Lippmann? IEEE Signal Processing Magazine, 8-38.
