Linear Types for Cashflow Reengineering

Torben Æ. Mogensen
DIKU, University of Copenhagen
Universitetsparken 1, DK2100 Copenhagen O, Denmark
Phone: +45 35321404, Fax: +45 35321401, email: [email protected]

Abstract. A while back a major Danish bank approached the programming language group at DIKU for help on designing a language for modelling cash flow reengineering: The process of issuing customised bonds based on income from existing bonds. The idea was to have a simple language that allows non-programmers to describe such reengineering and run statistical simulations of the structures. We describe the problem and present the design of a cashflow-reengineering language based on the dataflow paradigm and linear types. This language has formed the basis of further development by the bank in question and a variant of it is now in use there.

1 Introduction

In the context of this paper, a cashflow is a bond or other financial obligation characterised by a sequence of payments of interest and principal on a sequence of specified dates. Both interest and principal payments may vary over time and can be influenced by outside forces such as interest rates, currency rates or stock market value. A cashflow may or may not have a balance that specifies the sum of the outstanding principal payments.

Banks often sell or buy cashflows tailor-made to specific customers. These are usually financed by buying and selling other cashflows and redirecting the payments between those that are bought and those that are sold. This is called cashflow reengineering (footnote 1). Bank workers do this by combining and splitting cashflows in various ways such that the outcome is the desired cashflows and possibly some “residual” cashflows that the bank may keep for itself or try to sell. Typical operations on cashflows are:

Add: Two cashflows are combined into one, which gets the combined payments of the components.
Cleanup: When a certain condition occurs, all the remaining balance is paid as principal and no further payments are made.
Divide: The cashflow is divided into two, each of which gets a specified fraction of the principal and/or interest payments.
Sequential: Until a certain condition occurs, all payments go to one cashflow, and subsequent payments go to another.

Footnote 1: The term is used in other contexts for the process of restructuring the finances of a company.

These operations are applied at every time step (i.e., at every payment date), but may have some degree of memory (e.g., of whether the specified condition has occurred in the past).

Since various parameters (such as future interest rates) can be unknown, it may not be possible to guarantee that the bank will make a profit from a deal. However, the bank can guess at these parameters and try out various scenarios to gain confidence that it is likely to make a profit. This can be done by Monte Carlo simulation [5]. To support this, the deal must be encoded in a program that does the simulation.

Some time ago, a Danish bank (which shall remain anonymous) used the practice of letting programmers code the deals in a standard programming language using some library functions for the simulation. The bank found this process tedious and prone to error, so they wanted to design a domain-specific language [12] that could be used directly by the bank workers without involving the programmers. The responsible people had no experience with language design, so after making a rough sketch of a possible language and finding the result unsatisfactory, they approached the programming language group at DIKU for help.

The sketch the bank provided was based on object-oriented notation, with variables having methods applied with dot-notation. There was no conscious consideration of whether this was suitable; it was just what the programmers in the bank were used to working with. A very simple example using this notation is

    C = A + B;
    D = C.divide(0.7);

In the first line, the cashflows A and B are added to form the cashflow C. In the second line, C is split such that 70% goes to the new cashflow D and the remaining 30% stays in C. There were several problems with this design:

– The notation is not intuitive for non-programmers.
– It is difficult to check for cashflow preservation.
Cashflow preservation is the notion that no cashflow is used twice or not at all, i.e., that you don’t spend the same dollar more than once or forget you have it. To ensure this, the bank people suggested that side effects empty cashflows as they are used, e.g., setting A and B to zero when they are added to form C in the example above. At the end, it is checked that all cashflows except the output cashflows are zero. While this strategy does ensure cashflow preservation, it doesn’t catch all errors in a deal. For example, suppose a user writes a specification like this:

    C = A + B;
    :
    :
    E = D + A;

The last line just silently adds zero to D, where it would be appropriate to give a warning. Note that you can’t just check for zero values when doing operations, as this would flag errors for sequential splits, where some payments are meant to be zero.

The rest of this paper will describe the design that was proposed by DIKU and briefly summarise the experiences of using the language.
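The weakness of the side-effect strategy can be made concrete with a minimal Python sketch. The `Cashflow` class, its `payments` list and the `add` helper are hypothetical names chosen for illustration and are not the bank's library; a cashflow is modelled, as an assumption, simply as a list of payment amounts with one entry per time step.

```python
class Cashflow:
    """A cashflow modelled as a list of payments, one per time step (an assumption)."""

    def __init__(self, payments):
        self.payments = list(payments)

    def is_zero(self):
        return all(p == 0 for p in self.payments)


def add(x, y):
    """Add two cashflows and empty both operands (the proposed side-effect scheme)."""
    result = Cashflow(px + py for px, py in zip(x.payments, y.payments))
    x.payments = [0] * len(x.payments)
    y.payments = [0] * len(y.payments)
    return result


a = Cashflow([10, 10, 110])
b = Cashflow([5, 5, 105])
d = Cashflow([1, 1, 1])

c = add(a, b)   # a and b are now empty
e = add(d, a)   # a is used a second time: no error, zero is silently added

# The end-of-deal check on the non-output cashflows a, b and d passes ...
leftover_ok = all(x.is_zero() for x in (a, b, d))
# ... although e merely holds the old payments of d.
print(leftover_ok, e.payments)
```

In this sketch the final check succeeds even though the second use of `a` is almost certainly a mistake, which is the kind of error a static check on names can report.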

2 Embedded vs. stand-alone domain-specific languages

Domain-specific languages generally come in two flavours: They can be stand-alone languages with their own special syntax and compilers/interpreters, like the language described here, or they can be embedded languages. Embedded languages are, as the name indicates, implemented as a set of library functions, classes, macros or some other abstraction mechanism in an existing language. An example of an embedded domain-specific language is the Lava hardware description language [4], while PostScript [1] is a stand-alone domain-specific language. It is possible to implement a DSL as both an embedded and a stand-alone language. This is the case for, e.g., the query language SQL [6], which is available both as a set of library functions and as a stand-alone notation used in interactive front ends to databases. Each of the two flavours has advantages and disadvantages:

Embedded languages: An embedded language inherits the full expressiveness of the host language, so constructs for conditionals, looping, recursion, etc. need not be explicitly added to the language. Furthermore, implementation and interoperability with other languages are more easily accomplished.

Stand-alone languages: The syntax can be chosen so it is natural for the problem domain without any limitations imposed by a host language. Furthermore, domain-specific consistency checks can be made at compile time, and error messages can be expressed in terms of the problem domain.

It is clear that the latter is most suitable for the problem presented here, both because cashflow preservation can be tested at compile time and because the language is intended for non-programmers. The differences between embedded and stand-alone DSLs and their (dis)advantages are elaborated further in [12, 11].

3 From dataflow to cashflow

The cashflow preservation property is similar to the property found in dataflow languages [9]: When a value is used, it is no longer available for further use (unless it is explicitly copied). In a dataflow language, a computation is represented as an acyclic directed graph, where the edges represent values and the nodes represent operations. Each operation has a fixed number of input and output edges, so you need an explicit copy node to use a value more than once.

The same idea can be applied to cashflow reengineering: Each edge in the graph is a cashflow and the nodes are operations that combine or split cashflows. There are no copy nodes, as these would not preserve cashflow. Edges that aren’t connected at both ends specify input or output cashflows.

Graphical notation is far from compact and it requires special tools for production and editing. So we define a textual notation for the graph, similar to a traditional programming language: Edges are named and each name is used exactly twice: Once for the place where the edge is given a value and once for the place where the value is used. Input and output edges are explicitly specified to preserve this definition/use property. The example from the introduction looks like this in the new notation:

    declarations
      fraction = 0.7
    input
      cashflow a, b
    structure
      c = a + b
      (d,e) = Divide(c, fraction)
    output
      cashflow d,e

The fraction 0.7 is now declared as a named constant. It could also have been specified as a numerical input parameter. Note that the Divide operation has two output values and that its cashflow parameter is completely consumed. Cashflow preservation is checked only for cashflow edges, so you can use numerical or boolean constants or parameters multiple times or not at all.

Several operations can be combined on one line, so the structure part of the above example can be abbreviated to

    (d,e) = Divide(a + b, fraction)

Names always refer to single edges/values, never to tuples. When expressions are nested, they may build tuples with tuples as elements. Such a nested tuple is flattened out to a single non-nested tuple. For example, the tuple ((2,3),4,(5,6,7)) is flattened to the tuple (2,3,4,5,6,7).

This is the basic core language, which is more or less what was implemented by the bank (see section 7 for more on this), but we can make various extensions that make the language more interesting and generally useful.
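The definition/use property of the textual notation can be checked with a single pass over the statements. The following Python sketch is only an illustration under stated assumptions: the function name `check_preservation` and the representation of a statement as a pair of (defined names, used cashflow names) are hypothetical, and numerical or boolean names such as fraction are assumed to have been filtered out beforehand, since they are not subject to the check.

```python
def check_preservation(inputs, statements, outputs):
    """Check that every cashflow edge is defined once and used once.

    inputs     -- names of the input cashflows
    statements -- list of (defined_names, used_names) pairs in program order
    outputs    -- names of the output cashflows
    Returns a list of error messages; the list is empty if cashflow is preserved.
    """
    errors = []
    available = set()  # cashflows that are defined but not yet consumed
    for name in inputs:
        if name in available:
            errors.append(f"input {name} is declared twice")
        available.add(name)
    for defined, used in statements:
        for name in used:
            if name in available:
                available.remove(name)  # using an edge consumes it
            else:
                errors.append(f"{name} is used but not available")
        for name in defined:
            if name in available:
                errors.append(f"{name} is redefined before being used")
            available.add(name)
    for name in outputs:
        if name in available:
            available.remove(name)
        else:
            errors.append(f"output {name} is not available")
    for name in sorted(available):
        errors.append(f"{name} is never used")
    return errors


# c = a + b; (d,e) = Divide(c, fraction)
good = check_preservation(
    ["a", "b"], [(["c"], ["a", "b"]), (["d", "e"], ["c"])], ["d", "e"])

# c = a + b; e = d + a   (a is used twice, c is forgotten)
bad = check_preservation(
    ["a", "b", "d"], [(["c"], ["a", "b"]), (["e"], ["d", "a"])], ["e"])

print(good)
print(bad)
```

Unlike the run-time zeroing scheme, this check reports the second use of `a` by name. The sketch lets a name be defined again once it has been consumed, which is a design choice of the sketch rather than something required by the notation.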

4 Function definitions

These are trivial to add: A function declaration has exactly the same structure as a program: It consists of declarations, input, structure and output. A function is called exactly like a predefined operation, taking a tuple of parameters as input and returning a tuple as output. Cashflow preservation for declared functions is checked in the same way as above: Each cashflow is defined and used exactly once in the function.

5 Conditionals

It is useful to construct cashflows depending on some condition. This condition can test current or past events. In the first case, the condition can change arbitrarily often over time. In the second case, the condition tests whether a certain event has happened at any point in the past, so it can change status at most once (from false to true). We use two different conditionals for these two cases:

    exp → if cond then exp1 else exp2
    exp → until cond use exp1 thereafter exp2

Note that the conditionals are expressions, so they are used to the right of an assignment. It is a requirement that the two branches (exp1 and exp2) have identical cashflows, i.e., that they use the same cashflow variables and produce identical tuples. This isn’t difficult to check; it gets more interesting when we consider the condition.

A condition may want to check properties of cashflows, e.g., to see which has the highest interest payment. But this check doesn’t actually use the money in the cashflows, so counting it as a use would go against cashflow preservation. Hence, we must distinguish between uses of a cashflow that build new cashflows and uses that inspect the cashflow without consuming it. We can do this by using linear types [7].

6 A linear type system

The basic idea is to use two different types for cashflows: cashflow for cashflows that are consumed when used and cashflow0 for cashflows that aren’t. Additionally, we have types for booleans and numbers and tuples of these atomic types. We denote an atomic type by σ (possibly subscripted) and atomic or tuple types by τ (also subscripted):

\[
\begin{array}{rcl}
\sigma &=& \mathtt{bool} \mid \mathtt{number} \mid \mathtt{cashflow} \mid \mathtt{cashflow0} \\
\tau &=& \sigma \mid (\sigma_0, \ldots, \sigma_n)
\end{array}
\]

We will describe the type system by showing rules for type-correctness. For brevity, we show rules only for the subset of the language shown in figure 1. The rules for the rest of the language follow the same general structure. We have chosen + as an example of a basic operation on cashflows and < as an inspection operation. Note that + can be used on pairs of consumable cashflows, pairs of inspectable cashflows and pairs of numbers. < compares two numbers or the total payments (interest + principal) of two cashflows. This is usually used in combination with an operator (not shown) that splits interest and principal payments into two different cashflows (so interest or principal can be compared separately). Note that the compared cashflows are not consumed, so the arguments can’t be of type cashflow. We have omitted the constant declaration part of the program, so Decl refers to declarations of input and output variables.

\[
\begin{array}{lcl}
\mathit{Program} &\rightarrow& \mathtt{input}\ \mathit{Decl}\ \mathtt{structure}\ \mathit{Stat}\ \mathtt{output}\ \mathit{Decl} \\
\mathit{Stat} &\rightarrow& (x_1, \ldots, x_n) = \mathit{Exp} \;\mid\; \mathit{Stat}\ ;\ \mathit{Stat} \\
\mathit{Decl} &\rightarrow& \tau\ x_1, \ldots, x_n \;\mid\; \mathit{Decl}\ ;\ \mathit{Decl} \\
\mathit{Exp} &\rightarrow& x \;\mid\; \mathit{Exp} + \mathit{Exp} \;\mid\; \mathit{Exp} < \mathit{Exp} \;\mid\; \mathtt{if}\ \mathit{Exp}\ \mathtt{then}\ \mathit{Exp}\ \mathtt{else}\ \mathit{Exp}
\end{array}
\]

Fig. 1. Syntax for a subset of the cashflow-reengineering language

We use a notation for linear types similar to [10]. Type environments (denoted by ∆, Γ and Θ) are lists of pairs of names and types (each written as x : τ and separated by commas). When typing expressions, we manipulate type environments with structural rules that can reorder, copy and delete pairs. We ensure that pairs where the type is cashflow cannot be copied or deleted. The structural rules are shown in figure 2. Note that we (in rule CopC) can make an inspect-only copy of a cashflow variable. We add rules for expressions, declarations and programs in figure 3.

\[
\textsc{Weak}\quad
\frac{\Gamma \vdash e : \tau \qquad \tau' \neq \mathtt{cashflow}}
     {\Gamma,\, x : \tau' \vdash e : \tau}
\]

\[
\textsc{Exch}\quad
\frac{\Gamma,\, x : \tau',\, y : \tau'',\, \Delta \vdash e : \tau}
     {\Gamma,\, y : \tau'',\, x : \tau',\, \Delta \vdash e : \tau}
\]

\[
\textsc{Copy}\quad
\frac{\Gamma,\, x : \tau',\, x : \tau' \vdash e : \tau \qquad \tau' \neq \mathtt{cashflow}}
     {\Gamma,\, x : \tau' \vdash e : \tau}
\]

\[
\textsc{CopC}\quad
\frac{\Gamma,\, x : \mathtt{cashflow0},\, x : \mathtt{cashflow} \vdash e : \tau}
     {\Gamma,\, x : \mathtt{cashflow} \vdash e : \tau}
\]

Fig. 2. Structural rules

Program:

\[
\textsc{Prog}\quad
\frac{\vdash d_i : \Gamma \qquad \Gamma \vdash s : \Delta \qquad \vdash d_o : \Delta}
     {\vdash \mathtt{input}\ d_i\ \mathtt{structure}\ s\ \mathtt{output}\ d_o}
\]

Stat:

\[
\textsc{Assg}\quad
\frac{\Gamma \vdash e : (\tau_1, \ldots, \tau_n) \qquad \{x_1, \ldots, x_n\} \cap \mathrm{dom}(\Delta) = \emptyset}
     {\Gamma,\, \Delta \vdash (x_1, \ldots, x_n) = e : (\Delta,\, x_1 : \tau_1, \ldots, x_n : \tau_n)}
\]

\[
\textsc{Seq}\quad
\frac{\Gamma \vdash s_1 : \Delta \qquad \Delta \vdash s_2 : \Theta}
     {\Gamma \vdash s_1\ s_2 : \Theta}
\]

Decl:

\[
\textsc{Decl}\quad
\frac{}{\vdash \tau\ x_1, \ldots, x_n : (x_1 : \tau, \ldots, x_n : \tau)}
\]

\[
\textsc{Dseq}\quad
\frac{\vdash d_1 : \Gamma \qquad \vdash d_2 : \Delta \qquad \mathrm{dom}(\Gamma) \cap \mathrm{dom}(\Delta) = \emptyset}
     {\vdash d_1\ d_2 : \Gamma,\, \Delta}
\]

Exp:

\[
\textsc{Var}\quad
\frac{}{x : \tau \vdash x : \tau}
\]

\[
\textsc{AddC}\quad
\frac{\Gamma \vdash e_1 : \mathtt{cashflow} \qquad \Delta \vdash e_2 : \mathtt{cashflow}}
     {\Gamma,\, \Delta \vdash e_1 + e_2 : \mathtt{cashflow}}
\]

\[
\textsc{Add0}\quad
\frac{\Gamma \vdash e_1 : \mathtt{cashflow0} \qquad \Delta \vdash e_2 : \mathtt{cashflow0}}
     {\Gamma,\, \Delta \vdash e_1 + e_2 : \mathtt{cashflow0}}
\]

\[
\textsc{AddN}\quad
\frac{\Gamma \vdash e_1 : \mathtt{number} \qquad \Delta \vdash e_2 : \mathtt{number}}
     {\Gamma,\, \Delta \vdash e_1 + e_2 : \mathtt{number}}
\]

\[
\textsc{LesC}\quad
\frac{\Gamma \vdash e_1 : \mathtt{cashflow0} \qquad \Delta \vdash e_2 : \mathtt{cashflow0}}
     {\Gamma,\, \Delta \vdash e_1 < e_2 : \mathtt{bool}}
\]

\[
\textsc{LesN}\quad
\frac{\Gamma \vdash e_1 : \mathtt{number} \qquad \Delta \vdash e_2 : \mathtt{number}}
     {\Gamma,\, \Delta \vdash e_1 < e_2 : \mathtt{bool}}
\]

\[
\textsc{If}\quad
\frac{\Gamma \vdash c : \mathtt{bool} \qquad \Delta \vdash e_1 : \tau \qquad \Delta \vdash e_2 : \tau}
     {\Gamma,\, \Delta \vdash \mathtt{if}\ c\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 : \tau}
\]

Fig. 3. Typing rules

Most of the rules are straightforward. Note that the rule for assignment (Assg) splits the environment into the part that is consumed by the right-hand side and the part that is left alone. The latter is extended with the bindings made by the assignment, after checking that none of the defined names already occur in it. This actually allows reassignment of a name after it is consumed, but we don’t find this a problem. In fact, the users of the language like it, as they don’t have to invent new intermediate names all the time.

Note how cashflow preservation is checked by verifying that the declaration of the output variables of a program corresponds to the final environment. The rule is slightly simplified, as it doesn’t allow non-cashflow variables to remain in the environment without being output, even though this is harmless. This weakness can be handled by applying variants of the structural rules to the final environment.
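The rules are declarative, so a checker has to decide where to apply the structural rules. The Python sketch below shows one possible algorithmic reading for the subset in figure 1. It is an assumption-laden illustration, not the checker described in the paper or used by the bank: expressions are encoded as hypothetical tagged tuples, each statement defines a single name, inspect-only copies (rule CopC) are made only for the operands of <, and the sketch is therefore more restrictive than the rules in some cases.

```python
CASHFLOW, CASHFLOW0, NUMBER, BOOL = "cashflow", "cashflow0", "number", "bool"


class LinearTypeError(Exception):
    pass


def disjoint_union(c1, c2):
    """Combine two sets of consumed cashflows; no cashflow may be in both."""
    both = c1 & c2
    if both:
        raise LinearTypeError(f"used twice: {sorted(both)}")
    return c1 | c2


def check_exp(env, exp, inspect=False):
    """Return (type, consumed), where consumed is the set of cashflow
    variables the expression uses up. With inspect=True, cashflow variables
    are read as cashflow0 and nothing is consumed."""
    kind = exp[0]
    if kind == "var":
        name = exp[1]
        if name not in env:
            raise LinearTypeError(f"{name} is not available")
        if env[name] == CASHFLOW:
            return (CASHFLOW0, set()) if inspect else (CASHFLOW, {name})
        return env[name], set()
    if kind == "add":
        t1, c1 = check_exp(env, exp[1], inspect)
        t2, c2 = check_exp(env, exp[2], inspect)
        if t1 != t2 or t1 not in (CASHFLOW, CASHFLOW0, NUMBER):
            raise LinearTypeError(f"cannot add {t1} and {t2}")
        return t1, disjoint_union(c1, c2)
    if kind == "less":
        t1, _ = check_exp(env, exp[1], inspect=True)
        t2, _ = check_exp(env, exp[2], inspect=True)
        if t1 != t2 or t1 not in (CASHFLOW0, NUMBER):
            raise LinearTypeError(f"cannot compare {t1} and {t2}")
        return BOOL, set()
    if kind == "if":
        tc, cc = check_exp(env, exp[1], inspect)
        t1, c1 = check_exp(env, exp[2], inspect)
        t2, c2 = check_exp(env, exp[3], inspect)
        if tc != BOOL:
            raise LinearTypeError("condition must have type bool")
        if t1 != t2 or c1 != c2:
            raise LinearTypeError("branches must have identical cashflows")
        return t1, disjoint_union(cc, c1)
    raise LinearTypeError(f"unknown expression {kind}")


def check_program(inputs, statements, outputs):
    """inputs, outputs: dict from name to type; statements: list of (name, exp)."""
    env = dict(inputs)
    for name, exp in statements:
        ty, consumed = check_exp(env, exp)
        for used in consumed:
            del env[used]  # consumed cashflows leave the environment
        if name in env:
            raise LinearTypeError(f"{name} is already defined")
        env[name] = ty
    for name, ty in outputs.items():
        if env.get(name) != ty:
            raise LinearTypeError(f"output {name} : {ty} is not available")
    unused = sorted(n for n, t in env.items() if t == CASHFLOW and n not in outputs)
    if unused:
        raise LinearTypeError(f"never used: {unused}")


def is_well_typed(inputs, statements, outputs):
    try:
        check_program(inputs, statements, outputs)
        return True
    except LinearTypeError:
        return False


a, b, d = ("var", "a"), ("var", "b"), ("var", "d")
# c = if a < b then a + b else b + a
ok_prog = [("c", ("if", ("less", a, b), ("add", a, b), ("add", b, a)))]
# c = a + b; e = d + a   (a is consumed twice)
bad_prog = [("c", ("add", a, b)), ("e", ("add", d, a))]

print(is_well_typed({"a": CASHFLOW, "b": CASHFLOW}, ok_prog, {"c": CASHFLOW}))
print(is_well_typed({"a": CASHFLOW, "b": CASHFLOW, "d": CASHFLOW}, bad_prog, {"e": CASHFLOW}))
```

In the first program the condition inspects a and b without consuming them, while both branches consume the same two cashflows, so the program is accepted. The second program is rejected because a is no longer in the environment when it is used the second time. As in the remark on the simplified Prog rule, the sketch lets non-cashflow variables remain in the final environment.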

7 Conclusion

The bank has only implemented the core language as described in section 3, minus nested expressions. Their experiences are shown in the following fragment of a letter sent to DIKU a while after the language was put into use (footnote 2).

“After the initial pilot phase we can draw some conclusions about [the language]. The language works well in the intended context. We have achieved separation of the abstract description of structuring of bonds and the initial parameters, such as interest rates and principal balances, which later can be fine-adjusted to find optimal business opportunities. This has made the work considerably more efficient. We have not implemented the complicated structures, such as conditionals, but the basic dataflow model corresponds well to how people think about structuring of bonds. Using only the basic language primitives has led to quite large and complex programs, but by making a library of the most used operations, we have simplified the use considerably.”

Footnote 2: Translated from Danish and abbreviated slightly.

The bank implemented the language entirely in-house from the design document. The choice to implement only the most basic subset of the language may be due to the bank first wanting to try out the basic design and finding that sufficient, perhaps combined with not having people with a background in type systems, which is required for the extensions.

The bank found it necessary to have a large set of primitive operations. It is my belief that the full version of the language would have allowed most of these operations to be defined in the language itself using only a few basic operations combined with conditionals and function declarations.

It is the opinion of the author that the financial world offers many opportunities for using small, well-defined domain-specific languages, as also evidenced in the section on related work below.

Related work

The language Risal [11] is intended for the same purpose as our language: Describing and manipulating interest-rate products. Like our language, it is actively in use in the banking world. The main new feature of our language is the linear type system for ensuring cashflow preservation.

Peyton Jones et al. [8] describe a language embedded in Haskell for describing financial and insurance contracts and estimating their worth through an evaluation semantics. The purpose is quite similar to what is done here, and one option for the evaluation semantics is, indeed, Monte Carlo simulation. The main difference, apart from the difference between the contracts modelled by the language in [8] and the cashflows modelled by our language, is that the language in [8] is implemented as a set of combinators in Haskell, where we have chosen a stand-alone language. Furthermore, there is no notion of cashflow preservation or similar in [8]. According to the paper, the language may be implemented as a stand-alone language in the future.

Anand et al. [2] describe a domain-specific language for a quite different kind of financial application: Tracking and predicting stock-market values.

Linear types and related systems have been used for many purposes, for example converting call-by-need to call-by-name [10] and ensuring single-threadedness for updatable structures in functional languages [3, 13]. The first of these examples uses linear types descriptively, i.e., for identifying linear uses, while the latter uses the types prescriptively by disallowing non-linear uses of certain variables. As such, the latter is closest to our use of linear types, which is also intended to forbid non-linear uses of cashflows.

References

1. Adobe PostScript 3 home page. http://www.adobe.com/products/postscript/.
2. Saswat Anand, Wei-Ngan Chin, and Siau-Cheng Khoo. Charting patterns on price history. In International Conference on Functional Programming, pages 134–145, 2001.
3. Erik Barendsen and Sjaak Smetsers. Uniqueness typing for functional languages with graph rewriting semantics. In Mathematical Structures in Computer Science 6, pages 579–612, 1997.
4. Per Bjesse, Koen Claessen, Mary Sheeran, and Satnam Singh. Lava: Hardware design in Haskell. In ICFP 1998, 1998.
5. P. Boyle, M. Broadie, and P. Glasserman. Monte Carlo methods for security pricing. Journal of Economic Dynamics and Control, 21(1267), 1997.
6. C.J. Date and Hugh Darwen. A Guide to The SQL Standard. Addison-Wesley, third edition, 1993.
7. Jean-Yves Girard. Linear logic. Theoretical Computer Science, (50):1–102, 1987.
8. Simon Peyton Jones, Jean-Marc Eber, and Julian Seward. Composing contracts: an adventure in financial engineering. In ICFP’00. ACM Press, 2000.
9. P.C. Treleaven and R.P. Hopkins. Data-driven and demand-driven computer architecture. ACM Computing Surveys, 14(1), 1982.
10. David N. Turner, Philip Wadler, and Christian Mossin. Once upon a type. In FPCA’95, pages 1–11. ACM Press, 1995.
11. A. van Deursen. Domain-specific languages versus object-oriented frameworks: A financial engineering case study. In Smalltalk and Java in Industry and Academia, STJA’97, pages 35–39, 1997.
12. Arie van Deursen, Paul Klint, and Joost Visser. Domain-specific languages: An annotated bibliography. SIGPLAN Notices, 35(6):26–36, 2000.
13. Philip Wadler. Linear types can change the world! In Programming concepts and methods. North Holland, 1990.