A New Species of Modelling System

IMA Journal of Mathematics in Management (1987) 1, 225-236 A New Species of Modelling System CLIFFORD JONES Department of Management Science and Sta...
18 downloads 0 Views 657KB Size
IMA Journal of Mathematics in Management (1987) 1, 225-236

A New Species of Modelling System CLIFFORD JONES

Department of Management Science and Statistics, University College of Swansea, Swansea, UK AND JOAQUIM CARMONA

Faculdade de Economia do Porto, Porto, Portugal

Modelling, rather than solution, support is generally recognized as a critical problem area inhibiting far greater use of mathematical programming. This paper makes use of spreadsheets to propose a new species of (mathematical programming dependent) modelling system which could overcome that inhibition.

1. Introduction

WE CAN solve fax bigger linear programming models than those whose algorithmoriented descriptions we can conveniently generate and those whose behaviour we can at times be sure to understand (Section 2). This paper is not concerned primarily with the understanding, but with the generation, barrier. One line of attack on this barrier is becoming out-of-date: the provision of a translator from a high-level modelling language, similar to algebraic notation, to the algorithm's format of the optimizers. One way of enhancing this approach will be briefly suggested here (Section 3). However, it leaves some major problems unsolved and some challenging opportunities (Section 4). The main purpose of this paper is, however, that of proposing an altogether different line of attack: by redefining some of a spreadsheet's basic concepts, a new species of (optimization) modelling system comes to life. SIMP (spreadsheet interface to mathematical programming), which may be regarded as a forerunner of this species, is then presented to illustrate the new approach (Section 6). The paper ends with an assessment of the strong and weak points of this approach (Section 7).

2. Obstacles to linear programming

Williams (1984), Darby-Dowman et al. (1984), Lockett (1985), and Jack (1985) are some of the authors who support the view that the nonexistence of solution 225 © Oxford University Prcn 1987

226

CLIFFORD JONES AND JOAQUIM CARMONA

methods or of capable packages is no longer the main obstacle to the use of linear programming. One major difficulty pointedly felt over the years by real-life modellers was the huge gap between the forms in which linear programming models originated and were developed ('modeller's forms') and the ways in which they eventually had to be described to the optimizers ('algorithm's forms'). The typical modeller's form is algebraic notation and the typical algorithm's form is the mathematical programming system (MPS) format. The traditional approach to the necessary conversion between the two forms, known as 'matrix generation', suffers from a number of important drawbacks (nongenerality of the conversion tool, requirement of computer programming expertise, and difficulty of verifying, documenting, and modifying models), which have been well discussed by Fourer (1983).

3. The new generation of systems

The logical way of solving the difficulty, as stated in these terms, consists of devising a general-purpose tool able to convert from algebraic notation to MPS (or some other optimizer's format) without requiring any other programming by the modeller—a translator for a high-level, declarative modelling language, closely resembling algebraic notation. A model could then be verified, documented, or modified as expressed in such a language, a big step forward by all counts. Fourer (1983) suggested precisely this. Implementations of this idea are now in existence (or about to be so): MAGIC (Day & Williams, 1986), CAMPS (Lucas & Mitra, 1985), and PLASMA II (Schmidt 1985). One possible shortcoming of these systems would be that they tend to enforce on the modelling process a strict discipline based on the structure of algebraic notation: define all sets first, then all variables and coefficients in terms of the sets, then all the linear constraints in terms of the sets, variables, and coefficients (Darby-Dowman et al., 1984). They therefore tend to become impersonal, noninteractive model communication tools (translators used after the model has been developed and written), rather than non-normative, interactive, personalized, intelligent modelling support systems of which the translator is only a part. The latter are not exclusive of the former, but rather represent a further enhancement along the same line of work, which couples the structuring influence of algebraic notation with a more flexible view of the modelling activity. One implementation at this approach is PORT (Carmona, 1986). Such systems are fine when people are involved who deliberate the use of linear programming, think up large, complex models in algebraic notation terms, and face the problem of generating an equivalent description the optimizers can accept. That is, they are fine for what probably is the current pattern of use of linear programming, and in so doing must tend to crystallize rather than break it. Part of the current pattern is that end users do not possess the necessary programming and modelling abilities to develop, implement, or even to explore models, experts being brought in for such purposes. The net effect of the new generation of systems based on algebraic notation is, then, that the experts are

A NEW SPECIES OF MODELLING SYSTEM

227

relieved of the need to get involved in computer programming and become more productive modellers. 4. From the new generation to the new species

A difficulty more fundamental than the mere conversion between algebraic notation and MPS can be seen in this state of affairs. It reinforces allegations of mathematical complexity, as in Lawrence (1973) (quoted by Lockett, 1985). Lockett himself refers to the handicap of having to use the 'language of mathematics'. Zahedi (1985) corroborates Lockett's observations. Roy et al. (1986) highlight the communication problems that arise between manager and the model developer, and the ensuing impossibility of responding quickly to the management's real needs. It is not that the automatic conversion, by the computer system, between the modeller's form of representation (or one as close to it as is achievable by electronic means) and the algorithm's is not the correct idea. It is undoubtedly essential. But we must not let ourselves be absorbed by one particular modeller's form (algebraic notation), even if it is the one that is actually being used in practice, for in so doing we may be shutting out of linear programming a potentially immense user base, and further inhibit the fulfilment of the promise held by the technique for decades now. Are we suggesting that brand new modeller's forms of representation be dreamt up ex nihilol Not quite, as will soon be made apparent. But we are concerned about possible users who are scared away by the need to use algebraic notation to create models, or even to explore models created for them, or to talk in such terms about (possible) models. We are concerned about possible users who ignore optimization techniques like linear programming, or who instinctively avoid deliberately using it. We do regard the big and increasing market for small, deterministic models of the simulation variety (refer to Grinyer & Wooller, 1975, and Jack, 1985, as well as to common daily observation for more recent times) as a great opportunity for linear programming too. There is a modeller's form, which, although new as regards linear programming, is proven for the development of other, arguably related, applications (e.g. financial simulation)—a form we can describe as a report, with column and row labels, comments, and notes incorporated around its body and space for numerical entries, either standing on its own or derived from others through a usually small number of simple arithmetical relationships, often pertaining to whole rows and/or columns, which underlie the report. This form is known by the term 'spreadsheet' in the jargon of accounting, where the first electronic implementation originated; prior to such implementation, all calculations had to be done 'manually' (i.e. by punching a calculator). This discussion has leapfrogged a description of earlier forms of modelling aids—those of specially designed 'matrix editors', which are aimed at users with a lower level of expertise. These non-normative systems relieve the user from a need to use not only computer programming but also algebraic notation. An example of such an editor for an IBM PC is found in IMPS (Jones, 1986).

228

CLIFFORD JONES AND J0AQU1M CARMONA

5. The new species of systems 5.1

Current Spreadsheets

In electronic spreadsheets, as we (still?) know them today, a formula is a calculation rule consisting of an expression, usually involving a number of cells, which once evaluated provides a number to be displayed in a cell that owns the formula. The user has the choice of whether such formulae are all to be re-evaluated whenever any change is made to the spreadsheet, or whether such generalized re-evaluation has to be specifically requested. He or she also usually has some degree of procedural control. For example, it can be specified whether formulae are to be successively evaluated in row or column order; this can be important if, for instance, a cell owns a formula referring to another cell which owns a formula referring to the former cell—a case of 'circular references', or one might say simultaneous equations, although they are not generally handled as such. If one formula being evaluated involves an empty cell in its expression, a zero is assumed. Since formulae are calculation rules, there would clearly be no point in any cell owning more than one formula. There would also be no point in a cell holding a number and simultaneously a formula; so cells hold at most one item of information. 5.2

Redefinition^)

The basic redefinition consists of regarding formulae not as calculation rules but as relationships between the values of the cells involved or conditions on the values which may be deposited in those cells. A number of follow-up redefinitions can be imagined. • Besides equality 'formulae' (relationships), one could have inequalities as well. • It does make sense for one cell to be involved in more than one relationship, and therefore for a cell to own more than one formula. It also makes sense for a cell which owns formulae to have a number in it too. • It would not be possible always to 'evaluate' formulae as soon as they are input. Inequality formulae can at best be checked for obvious infeasibilities, but it does not make sense to talk of evaluating them. Equality formulae might sometimes be evaluated on their own, if all (including the owner cell) but one of the cells involved had numbers at the time the formula is input, but not in general. In general, all formulae, equality and inequality ones, would have to be 'simultaneously evaluated', which would normally require a simplex run. (This redefinition is not far removed from the way in which large spreadsheet models are used. It is typical for the recalculation mode to be set to 'manual'. Many changes can be made and recalculation requested by the user as and when required, greatly reducing computation, time.) As a consequence, calculations of the entire spreadsheet would need to be specifically requested, via one of the spreadsheet's commands; there is no question of control of the order of

A NEW SPECIES OF MODELLING SYSTEM

229

evaluation (by row or by column); empty cells occurring in formulae would not be taken to mean zero, but would be interpreted as having to be filled by the command used to calculate the entire spreadsheet, and in such a way as to meet all the conditions expressed by formulae.

5.3 Degrees of Freedom We have just presented a simple and consistent core of ideas, but not all of the follow-up redefinitions are indispensable to the new species of mathematical programming modelling systems. For instance, it might still be stipulated that, as in current spreadsheets, each cell could not hold more than one item of information (a piece of text, a number, or a formula). Extra rows/columns of cells would then have to be used just to hang the multiplicity of conditions to impose on the spreadsheet. Or it might still be assumed that empty cells mean zero. However, if the fact that a cell is empty cannot be interpreted as meaning that those cells are free for the system to use as it wishes, then the calculation phase would have to include a special sub-phase in which the user specifies which cells the system can use, disregarding whatever values they possess, if any. If, on top of the assumption that empty cell means zero, only equality formulae were allowed outside the calculation phase, then it would be possible to keep the automatic calculation mode of the spreadsheets; but the calculation phase would have to include another special sub-phase in which the user specifies inequality conditions, for instance as simple numerical upper and lower bounds on some of the cells. Indeed, anyone designing a system to run on top of one of the current spreadsheets would probably stipulate one item of information per cell at most, default empty cells to zero, and only admit equality formulae (i.e. tell the user to develop the basic model in the current spreadsheet), letting the 'add-on' program that runs on top of that spreadsheet handle all the interaction, with the user to define which cells correspond to variables and specify the constraints superimposed on the spreadsheet model, and manage the interface between optimizer and spreadsheet package. An example of such a design is "What's Best', an add-on to Lotus 1-2-3; other systems, like IFPS/O and VINO, basically follow this design too. But, because of the mixing of two different terminologies, sets of concepts and systems—the spreadsheet's (cells and formulae) and the optimization's (constraints and variables)—a price would have to be paid, in simplicity, elegance, and integration. Even sticking to all the follow-up redefinitions, other degrees of freedom are still available. For instance, some of the equality formulae may still be usable as calculation rules; so, instead of making them into constraints in a linear programming problem, one might well use them to calculate a value for the owner cell, or even another cell involved in the formula, instead. It might be left to the system to deduce which formulae could be used in this way, preserving in the eyes of the user the uniformity and nonprocedurality of the formulae; or,

230

CLIFFORD JONES AND JOAQUIM CARMONA

alternatively, it might be decided to make clear to him or her a distinction between procedural calculation rules (to which the term 'formula' would be associated) and declarative entities (to which some other term, like 'condition', 'relationship', or 'constraint', would be associated). In the latter case, two different commands might be available to calculate each type of 'formulae'. Even in the former case, there might be two different commands, one being presented as a restricted calculation facility which only examines equality formulae, attempting to use each on its own to calculate a value for a cell. A further stop down the road, one may wonder whether a straightforward linear programming problem should be generated from the spreadsheet model (one constraint for each formula, one variable for each blank cell involved in any formulae); perhaps it would be convenient to do a reformulation in the process. The fact is that in this new species of systems, in contrast to the traditional and even the new generation of systems, there is no separation between the 'matrix generation' and the report writing phases. So, the practice cannot be followed of feeding an 'economical' model to the optimizer and later using the report writer to generate further information which is 'redundant', in the sense that the information is not strictly necessary for the optimizer to produce a sufficient solution (i.e. from which that extra information can later be computed), for instance, partial sums, averages, even interperiod balances in multiperiod models, etc., which are all useful for understanding the solution. All this 'redundant' information is included in the spreadsheet model: this is even a virtue of the new species. Anyone who has had experience of constructing spreadsheet models, large or small, will have implicitly, or explicitly, appreciated and welcomed this characteristic feature. (The additional features, such as being able to copy or move formulae, numbers, or text and insert or delete columns or rows, permit a very 'pragmatic' approach to model building.) The reformulation interface would automatically schedule the workload between the optimizer and a 'report writer' routine. Incidentally, this is a topic of more general interest. The subject of redundancy has been discussed in another sense (that of discarding nonbinding constraints and so on), but the kind of redundancy we are talking about here is at another level, and the problem reduction to be achieved is obtained via reformulation (and posterior recovery of the original problem and corresponding solution) rather than by simply discarding information. A similar 'demarche' has been discussed, for instance, when some types of nonlinearities have to be processed into more amenable forms. But the motivation here is different: not to arrive at a formulation which is computable, but to generate one which is smaller than the original, but nevertheless computable in the present state of algorithmic knowledge, and therefore feasible even in a modest hardware and software configuration. In the context of the new species of systems, this question is more acute, partly because of the modelling redundancy they encourage and partly because of the relatively modest hardware ambitions they inherit from the spreadsheet packages which inspired them.

A NEW SPECIES OF MODELLING SYSTEM

231

6. SIMP: The beginnings of a 'superspreadsheet'

SIMP (spreadsheet interface to mathematical programming), a prototype which will run on any IBM PC with a colour graphics adaptor, could be regarded as a forerunner of the new species (Carmona & Jones, 1986). It implements the basic and follow-up redefinitions suggested above; it regards all formulae uniformly, as relationships; but, besides the full CALCULATE command (which generally requires a simplex run), there is also another, more restricted, facility (a command called XECUTE) which only looks at the equality formulae in order to use them as calculation rules. The CALCULATE command does a naive reformulation: whenever an equality formula comes up, it is used to eliminate a variable (the first blank cell occurring in the formula); a constraint to ensure the non-negativity of the discarded variable may have to be generated to replace the ' = ' constraint. Nowadays, launching a computer which does not run a spreadsheet would be a distinct oddity, and in many cases the spreadsheet comes bundled in with the hardware and operating system. Systems of the new species would be an important milestone, if not very nearly the final stage, in the general direction loosely, and a bit grandly, foreseen by Vazsonyl (1982): 'We will go even further, because still more powerful modelling capability will emerge. Mathematical models of Management Science, like linear programming, integer and goal programming, waiting line and Markov Theory will all be built in the command structure of new computers.'

An example is now presented to illustrate the way in which SIMP works. The example, which may be familiar to some readers, is the factory planning problem from Williams (1984). It is a production planning problem with seven products, five machines, and a six month horizon. There are various ways in which the problem could be 'laid-out' on the spreadsheet: indeed, students who have used SIMP have come up with different layouts. Figure 1 shows the arrangement for Product 1'. The formulae in a sample of the cells are as follows: inB9 in BIO inC7

B9< = B12 Sales < = Market Forecast for January BIO = B7 + B8 - B9 Stock Balance Equation for January C7< = 100 Maximum Stock Level of 100 and C7 = BIO Opening Stock in February = Closing Stock in January.

The formulae can be 'echoed' across for the other months for this product and the whole block 'duplicated' to create the formulae for the other six products, as all the above formulae use 'relative referencing'. (The format used for the spreadsheet is based on the Abacus system on the ill-fated Sinclair QL, but which is now available on the IBM PC.) The machine capacity constraints are shown in Fig. 2. (In this figure the spreadsheet is shown at the COMMANDS level.) The formulae in cell B131 are

232

CLIFFORD JONES AND JOAQUIM CARMONA

ESCAPE : ESC HELP: press Fl COMMANDS press F3

I

CURSOR GOTO CELL press F5

A

CELL Al

FORMULAE

I

B

I

2 3 4 5 6 7 Initial stock 8 Production 9 Sales 10 Final stock 11 12 Market forecast 13 14-- Total sales 15 16-- Total stock 17 18

FORMULAE TEXT type press ' or " then formula followed by text and ENTER and ENTER D I E | F | G

NUMBERS type directly and press ENTER

press*-'* •»

>>> Product 1 >

MACHINE TIME USED

FEB

f MAR

^

FORMULAE type < , = or > then formula and ENTER

Save Xecute

1

1

E

1

FF

1




JAN

FEB


Grinding Vertical drilling Horizontal drilling Boring Planing

|

G

>>> P r o d u c t 1
Grinding Vertical drilling Horizontal drilling Boring Planing

C

233

MAR 0 240 0 0 0

E

MAR 1436 768 1152 0 384

APR 1436 384 1152 384 384

|

F

|

G

Suggest Documents