Types and Persistence in Database Programming Languages

MALCOLM P. ATKINSON
Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland

O. PETER BUNEMAN
Department of Computer Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Traditionally, the interface between a programming language and a database has either been through a set of relatively low-level subroutine calls, or it has required some form of embedding of one language in another. Recently, the necessity of integrating database and programming language techniques has received some long-overdue recognition. In response, a number of attempts have been made to construct programming languages with completely integrated database management systems. These languages, which we term database programming languages, are the subject of this review. The design of these languages is still in its infancy, and the purpose of writing this review is to identify the areas in which further research is required. In particular, we focus on the problems of providing a uniform type system and mechanisms for data to persist. Of particular importance in solving these problems are issues of polymorphism, type inheritance, object identity, and the choice of structures to represent sets of similar values. Our conclusion is that there are areas of programming language research (modules, polymorphism, persistence, and inheritance) that must be developed and applied to achieve the goal of a useful and consistent database programming language. Other research areas of equal importance, such as implementation, transaction handling, and concurrency, are not examined here in any detail.

Categories and Subject Descriptors: D.1.4 [Programming Techniques]: Sequential Programming; D.2.2 [Software Engineering]: Tools and Techniques; D.3.3 [Programming Languages]: Language Constructs

General Terms: Design, Languages

Additional Key Words and Phrases: Conceptual languages, databases, data models, data types, embedded languages, integrated languages, object-oriented programming, persistence, persistent languages, polymorphism, programming languages, type inheritance


This work was carried out at the Universities of Glasgow, Pennsylvania, and St. Andrews.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1988 ACM 0360-0300/87/0600-0105 $01.50

ACM Computing Surveys, Vol. 19, No. 2, June 1987

M. P. Atkinson and O. P. Buneman

CONTENTS
INTRODUCTION
  Outline of This Review
  Provision of Independent Persistence
  Data Types and Data Definition
  Type Completeness and Polymorphism
  Expressive Power
  Targets and Requirements for DBPLs
  Problems and Research Issues
1. A TEST CASE AND SOME BASIC APPROACHES
  1.1 The Programming Language Approach Illustrated with Pascal
  1.2 The Relational Approach
  1.3 Preview of Variations Encountered While Coding the Tasks
2. EXISTING DATABASE PROGRAMMING LANGUAGES
  2.1 The CODASYL Approach: The Database as External Subroutines
  2.2 Pascal/R: A True Database Programming Language
  2.3 Other Languages That Attempted Integration with Relations
  2.4 Embedded Languages
3. LANGUAGES INCORPORATING ADVANCED DATA MODELS
  3.1 Daplex and Adaplex
  3.2 Taxis
  3.3 Galileo
4. POLYMORPHISM AND DATABASE PROGRAMMING
  4.1 ML
  4.2 Poly
5. PERSISTENT LANGUAGES
  5.1 PS-Algol
  5.2 Amber
  5.3 Persistence and Object-Oriented Programming
  5.4 Persistence and Large Databases
  5.5 Object-Oriented Databases
6. RELATED LANGUAGES AND INTERFACES
  6.1 Approaches Based on Logic and Algebra
  6.2 Persistence and Workspaces
7. CONCLUSIONS
  7.1 Expressive Power
  7.2 Data Type Completeness
  7.3 Polymorphism
  7.4 Persistence
  7.5 Modularity: A Construct for Persistence and Organization
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

Databases and programming languages have developed almost independently of one another for the past 20 years. Since nearly every program we write requires access to some form of permanent data, enormous resources have been expended on developing particular databases for specific applications or on using an interface between a database and a programming language, something that requires considerable expertise. Worse still are the impediments placed on system development by the lack of adequate programming tools for creating databases. It has frequently been the case that the presence of an existing database, maintained under an established database management system, has put an effective stop to the use of advanced programming tools, simply because an interface does not exist and because it would be practically impossible to restructure the data to conform to the programming environment. Thus there are organizations that cannot adapt their systems to changing needs because of these limits on database development.

The practical need to develop an integrated programming environment for databases has stimulated substantial recent research on combining databases and programming languages. Earlier, both subjects had accumulated a few good prototypes, so that research in the separate areas could be fairly easily classified by reference to one of a limited family of languages or data models. Only recently has the need to produce an integrated system for programming and data management been recognized, and this has led to a number of attempts to produce such systems, either by writing a completely new database programming language or by enhancing an existing language with some form of database management. In either case the usual approach has been to combine an existing language with an existing data model.

Our purpose in writing this paper is to give an account of these attempts to integrate programming languages and databases in the hope of extracting certain principles for use in future attempts to design programming languages or databases. In fact, it is our hope that all future research projects in either area will aim at developing a unified solution. The attainment of such a unified solution requires some guiding principles and should take note of what has already been attempted. We therefore present a survey of past and current efforts with reference to some test cases. The three principles we discuss in this paper are persistence, type completeness, and expressive power.
To a limited extent, each of these can be given some formal semantic definition; however, much theory remains to be developed. We are more concerned here with the practical consequences of these principles and will attempt to clarify them and justify their importance through examples, in the hope that this will encourage a better formal understanding and, consequently, a more uniform approach to the problem of designing database programming languages.

To illustrate persistence, imagine a second-level programming course in which students are taught to write B-tree type declarations and manipulation routines in Pascal. This well-known exercise is an excellent illustration of how Pascal’s data types may be used both in formulating the problem and in preventing the kinds of errors that plague pointer manipulation. Unfortunately, the student’s code serves only as a model for the code that is involved in a real implementation. It cannot be used to exploit secondary storage (the raison d’être of B-trees), since the only persistent data type in Pascal, the file, will not accommodate this structure. To implement a working B-tree, the student will have to resort to a lower level language for implementation or play tricks on the Pascal compiler that violate the type rules, in both cases again leaving open the possibility of pointer errors.

Type completeness is the requirement that all data types enjoy equal status within a language. A well-known example of failure of type completeness is again to be found in Pascal, in which functions can only return values of a limited set of data types: Functions, for example, cannot return arrays or other functions. Another well-recognized problem with Pascal’s types is the impossibility of expressing a new parameterized type. The student’s B-tree code should surely work for a large number of index types and result types, yet if we use Pascal’s type system strictly, a new set of procedures for lookup and update must be written for every pair of index and result types. To enable us to write just one set of procedures, the language requires some form of polymorphism, which we shall shortly discuss.¹

¹ It should be emphasized that we are using, and criticizing, Pascal for purely pedagogic reasons. It is precisely because the language achieved so much in making practical use of data types that we are in a position to discuss the next steps.

The lack of expressive power is well known to anyone who has used a database query language. There are programs, often involving numerical computations or complicated manipulation of the database, that cannot be written in the language. The traditional solution to this problem has been to use two languages, sometimes embedding the query language in a computationally more powerful “lower level” (sic) language. Although nearly all the programming languages we investigate provide adequate computational power, they have often been embellished with new programming constructs for manipulating databases. The generality with which these constructs can be applied within the language is then an issue.

We have written this paper as a review of a number of languages that, we hope, will provide ideas for future language designers. We have chosen certain languages because we believe they have made contributions to the development of database programming languages (DBPLs). The fact that we are often critical of these languages should not detract from their importance. They were chosen because, although programming language research is extensive, only a handful of language designers have recognized the importance of database programming. Constraints of space do not allow us to describe all the relevant languages or all the details of the languages that are described; we have chosen those that are relevant to the themes we discuss. There are undoubtedly unintentional omissions and inaccuracies, and the authors welcome corrections and information on other systems that are being developed.

Outline of This Review

A review such as this is necessarily lengthy, and it is appropriate at this point to describe the structure of the paper and the reason for this structure. It is unlikely that anyone will wish to read the paper from beginning to end, and some guidance follows.

This Introduction continues with a description of the themes of the review and should be of interest to anyone who is unfamiliar with terms such as “persistence” and “polymorphism” or who requires further motivation for study. It concludes with an attempt to formulate a set of design goals for database programming languages.

Section 1 introduces a set of four tasks that we believe are illustrative of problems in database programming. The tasks are meant both to motivate further our discussion of DBPL design and to serve as an informal yardstick with which to measure the various database technologies and the details of the various languages. A second and important reason for picking a fixed set of tasks is that it should make it somewhat easier to follow the numerous examples of code that are given throughout this paper in different languages, since the various programs are designed to fulfill the same purpose and, as far as possible, the same identifiers are used in the various languages. The second part of Section 1 shows two partial attempts to code these tasks, using first a programming language approach with Pascal and then a database approach with SQL. Since neither approach produces a tractable solution to all of these tasks, this provides a further argument for the need to study database programming languages. However, many of the techniques introduced in these partial solutions are used later on, and it may be advisable for all readers to study these programs in order to understand the data structures and algorithms involved. Section 2 shows how one might accomplish these tasks given existing software technology. It includes the use of a CODASYL system, a relational system embedded in a programming language, and Pascal/R, the only integrated database programming language that, to our knowledge, has so far received any widespread practical use. Section 3 describes three database programming languages: Adaplex, Taxis, and Galileo, all of which are currently being developed. They are grouped together because of their type systems, which all involve some form of inheritance, and because of their connection with “object-oriented” languages that we discuss later.
Polymorphism is discussed in detail in Section 4, in which two languages, ML and Poly, are introduced. Although neither language has adequate forms of persistence for database work, their type systems are sufficiently important for us to believe they are worth studying for their potential influence on future DBPL designs. In Section 5 we discuss languages that incorporate a general notion of persistence. These are PS-algol and Amber, the latter deriving its type system in part from ML and Galileo. This is also an appropriate place to discuss persistence and its relation to object-oriented languages. Section 6 contains a brief summary of other work that is related to the themes of this paper, in particular logic and algebraic programming, but that is not treated in detail here because the connections with types and persistence have not yet been properly explored. Our conclusions and suggestions for further work are contained in Section 7.

From Section 2 onward, it is likely that readers will wish to be selective. Readers unfamiliar with a wide range of languages may wish to concentrate on those sections that share a common linguistic base, whereas readers who want to explore one of the themes, such as persistence, may wish to concentrate on a different subset of languages. Space does not permit a tutorial on each language, and consequently some difficulty may be encountered in reading the example programs. It is, however, unfair to criticize a language for unfamiliar syntax, since many of the languages have incorporated new ideas that are difficult to cast in the syntax of, say, the Pascal family. Our personal experience with several of the languages described here is that, after a short period of intensive use, the syntax is not a major issue. More important to the ease of understanding are the underlying simplicity, the parsimony of fundamental concepts, and the semantic consistency of the language.
For readers familiar with the ALGOL/Pascal family of languages, we recommend first that they review Section 2, in which all the approaches are based on Pascal, then move to Adaplex (3.1) and PS-algol (5.1), and then to Galileo (3.3) for an introduction to the (related) syntax of ML. Inheritance is treated in the three languages of Section 3 and later in Amber (5.2) and the ensuing discussion. As we have noted, persistence is treated mainly in Section 5, but readers interested in this topic should also look at Galileo (3.3) and the discussion of workspaces in Section 6.2.

Provision of Independent Persistence
The three principles introduced earlier are now further elaborated, beginning with persistence. In procedural languages, we normally think of objects as having a well-defined lifetime. A variable declared in a block or procedure will “persist” during the activation of that segment of code and thereafter be inaccessible. If an object is created as part of a data structure, its persistence, from the user’s point of view, is the duration for which it remains accessible. Some languages allow the explicit (and dangerous) control of persistence through the use of storage deallocation procedures such as free or dispose. Without explicit deallocation, the logical persistence of objects during program execution is well understood. Moreover, these mechanisms (scope rules, allocation, and deallocation) can apply to the full range of types.

However, when data are required to last longer than the duration of one program execution, the treatment is much less uniform. For example, the set of types that may persist from one program execution to the next is often a small subset of, or even different from, the types available during the execution. In most languages the only objects endowed with long-term persistence are files; and whereas, in Pascal, files may be parameterized by other types and therefore can be used as a vehicle for the persistence of other objects, the parameterization is incomplete and cannot be used on, for example, pointer types. Thus the programmer is constrained either to use a possibly unsatisfactory subset of the available types or to resort to loopholes through which the physical representation of types may be manipulated. The dangers of the latter option are self-evident.

We have argued elsewhere that this discontinuity in the treatment of data associated with a change in persistence is both deleterious and avoidable [Atkinson et al. 1983b, 1984; Atkinson and Morrison 1984, 1986]. The languages presented here display the discontinuity to a varying degree: The well-known programming languages merely support a very small subset of their types as persistent;² some languages provide a different sublanguage for their long-term data storage; others introduce new types for data storage; some extend the range of persistence beyond program execution by having a mechanism for storing their workspaces. Generally, in the languages we examine, we find persistence presented as a binary division between near permanence and transient existence, although it is possible that intermediate forms could be useful.

² This subset is often further limited by the language’s implementation.

Although we cannot yet give precise semantics for persistence, we hope to convince the reader that there are certain principles of persistence that should govern language design:

(1) Persistence should be a property of arbitrary values and not limited to certain types.
(2) All values should have the same rights to persistence.
(3) While a value persists, so should its description (type).

The first two of these principles state that a value’s persistence should be independent of its type; persistence should therefore be regarded as a property of data orthogonal to its type. A corollary is that the code used to manipulate a value should not depend on its persistence. Failure to comply with the third principle is a common source of system error; it should not be possible to write out a value as one type and subsequently read it in as another. This may be regarded more as a property of program environments than of programming languages. We believe that for database programming languages the issues of programming language and programming environment cannot be separated. Indeed, one of the authors [Atkinson and Morrison 1985a] has suggested that program linking (normally associated with programming environments) can be neatly represented by persistent functions.

A language in which the programmer has to include explicit statements to initiate or organize transfer of data objects does not comply with the requirement that the code for manipulating an object not depend on its persistence. The required transfers between stores can, and should, be inferred from the operations on the data. A programmer is distracted from the essence of an algorithm when reading or inserting explicit transfer statements. Thus we use the phrase persistent programming language for languages that provide for longevity for values of all types and that do not require explicit organization of, or even mention of, data movement by the programmer.

A counterargument to providing persistence is that it is difficult to find good engineering techniques to support an arbitrary persistent structure. Certainly, the mechanisms for some types, such as those constructed as relations, are better understood at present, possibly because of the substantial research effort expended on these types. To compete with these existing technologies, the general-purpose methods will need to be equally efficient at placing data and at avoiding the transfer of irrelevant data. This, in turn, requires adequate mathematical models of data and program behavior, and interpretation of these models to control the data collection and select representations, access methods, and so on. Additional problems may arise over implementing concurrency, transactions, and failure recovery.
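As a rough illustration of the three principles of persistence stated above, the following Python sketch keeps any picklable value in a simulated stable store together with the name of its type, and refuses to read a value back as a different type. Everything here is an assumption made for illustration: `PersistentStore`, `put`, and `get` are hypothetical names, an in-memory dictionary stands in for secondary storage, and the type check covers only the outermost type (a list of employees and a list of parts look the same), which is weaker than what principle (3) asks for. Note also that `put` and `get` are precisely the kind of explicit transfer statements that a persistent programming language would infer on the programmer’s behalf.

```python
import pickle
from typing import Any, Dict, Tuple, Type, TypeVar

T = TypeVar("T")


class PersistentStore:
    """Hypothetical store in which any picklable value persists with its type name."""

    def __init__(self) -> None:
        # Simulated stable storage: name -> (type name, serialized value).
        self._stable: Dict[str, Tuple[str, bytes]] = {}

    @staticmethod
    def _type_name(cls: type) -> str:
        return f"{cls.__module__}.{cls.__qualname__}"

    def put(self, name: str, value: Any) -> None:
        # Principles (1) and (2): no restriction is placed on the value's type.
        # Principle (3): the type's name is stored alongside the value.
        self._stable[name] = (self._type_name(type(value)), pickle.dumps(value))

    def get(self, name: str, expected: Type[T]) -> T:
        stored, data = self._stable[name]
        wanted = self._type_name(expected)
        if stored != wanted:
            # Refuse to read a value back as a type other than the one it was written as.
            raise TypeError(f"{name!r} was stored as {stored}, not {wanted}")
        return pickle.loads(data)


store = PersistentStore()
store.put("staff", [(1, "Ada"), (2, "Alan")])
store.put("index", {7: "Jones", 1042: "Smith"})

# The retrieved list is manipulated with exactly the code used for transient lists.
staff = store.get("staff", list)
print([name for _, name in staff])  # ['Ada', 'Alan']
```

Calling `store.get("staff", dict)` raises `TypeError`. The sketch also shows the limits of adding persistence to an existing language from the outside: `pickle` cannot serialize every value (a lambda, for instance, is rejected with an error), so principle (1) is only partly met.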
The only valid refutation of these counterarguments will be the advent of practical persistent systems. Perhaps operating systems, which already (assisted by suitable hardware) perform these functions for the arbitrary collections of data they run, are a hint that the refutation may be constructed.

Data Types and Data Definition

An examination of type definitions in Pascal (1.1) and of the specifications in data definition languages (see, e.g., Figures 12 and 5) shows more than a superficial similarity in the methods used in programming languages and databases to specify the kinds of data that the program is to deal with. The database definition (or schema) resembles a type declaration, whereas the contents of the database can be described as a value (see Section 2.2). There has been considerable discussion about the relationship between the two methods of specifying data (see, e.g., Brodie et al. [1983]), and this has been recognized in integrated DBPLs, all of which enable the database schema to be specified by a type declaration in the programming language itself.

A precise semantics for the notion of data type is still the subject of much research [Kahn et al. 1984], and it is not appropriate to deal with the issues involved here. We adopt the simple-minded notion that a type is a (possibly infinite) set of values and that a variable of that type is constrained to assume values of that type. A type definition in a programming language serves to identify such sets by defining the operations that are valid for such sets. Early programming languages such as FORTRAN and LISP were based on a fixed set of types, whereas later languages, notably those of the ALGOL/Pascal family, contain type constructors, such as array and record, that allow the programmer to construct new types from old. Type constructors such as record are clearly needed for database programming.

There is, however, an important distinction between the use of types in programming languages and in databases. The type integer denotes the set of all values that are integers, but this set is not actually available to the programmer as a value. If the programmer wants this set, or more usually a finite subset, a specific data structure for holding sets of integer values must be constructed. Compare this with the situation in database programming where, in general, the name person not only describes a type (the set of all possible persons) but is also used to describe a specific subset of values of this type: the set of all values of type person that is currently in the database. This notion of pairing a type with a specific extent arises in several database programming languages. Whether or not this is desirable is discussed later, as well as in more detail in Buneman and Atkinson [1986].

Apart from the conceptual convenience of type declarations, a programming language compiler or interpreter can use them to prevent a common class of errors. A language is strongly typed if it prevents an operation from being applied to a value of an inappropriate type. Most languages are strongly typed, but strong typing is especially valuable where type constructors are used. LISP, for example, is strongly typed (an application of an arithmetic operation to a list will be caught at run time), but since there are no type constructors, there is no way of extending this mechanism to work on user-defined types such as person. A language is statically typed if all type errors can be detected by the compiler, so that it is impossible for a type error to occur when the program runs. As much static checking as possible is clearly desirable, but whether it can always be maintained for DBPLs is discussed in this paper and in Atkinson et al. [1986].
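The pairing of a type with an extent described above can be sketched in Python as follows. This is only an illustration under our own assumptions: `Person` and its class-level `extent` list are hypothetical, and here the extent is simply every `Person` constructed so far, whereas a DBPL would mean the persons currently in the database. By contrast, Python’s `int`, like the type integer in the text, names the set of all integers without making that set available as a value.

```python
from dataclasses import dataclass
from typing import ClassVar, List


@dataclass
class Person:
    """The name `Person` denotes a type; `Person.extent` is a current set of its values."""

    name: str
    age: int
    # Hypothetical extent: every Person constructed so far is recorded here.
    # ClassVar keeps it out of the per-instance fields generated by @dataclass.
    extent: ClassVar[List["Person"]] = []

    def __post_init__(self) -> None:
        Person.extent.append(self)


Person("Ann", 34)
Person("Bob", 51)

# A query that ranges over the extent, much as a DBPL ranges over "all persons".
over_40 = [p.name for p in Person.extent if p.age > 40]
print(over_40)  # ['Bob']
```

Even this toy exposes one of the design questions the pairing raises: because the extent holds a reference to every instance, no `Person` can ever be garbage collected, and there is no way to create a person that is not also "in the database".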

Type Completeness and Polymorphism

Data type completeness is only meaningful in languages that include type constructors, that is, types that are parameterized by other types. For example, if the language allows an array of α construct, then we should be able to substitute any type for α. Thus array of char, array of array of record ..., and so on, should all be permissible types. Since procedures and functions may themselves be regarded as having types (determined by the types of their arguments and results), completeness demands that any type should be admissible as the value of such type parameters, including procedure or function types.

The concept received a boost with its practical demonstration in the languages of the early seventies, especially ALGOL 68 [van Wijngaarden 1969] and Pascal [Wirth 1971]. The feasibility of specifying languages with a high level of data type completeness and then implementing them was demonstrated. However, engineering considerations have tended to deter complete adherence to the principle, as we have already noted for Pascal.

Although languages in the ALGOL/Pascal family allow the user to construct new data types by use of predefined type constructors, the constructors themselves are fixed, so the programmer cannot define new type constructors. If, for example, the language does not have a type constructor stack, it is desirable to be able to describe a data type stack[α], together with the associated procedures for pushing and popping. We noted earlier that some form of indexing mechanism is essential, at least in the support of a database system. Consider again the implementation of the code for B-trees. We want to write generic code that can be used to generate several kinds of B-tree that would, for example, map Employee-Number to Employee, Name to Address, Part-Number to Part, and so on. We need, therefore, to write code for a new type constructor B-tree[α, β] with associated procedures such as

    procedure insert(index: B-tree[α, β]; key: α; data: β)
    function lookup(index: B-tree[α, β]; key: α): β

(we ignore the possibility of lookup failing). Such types and procedures are called polymorphic and are implemented in languages such as ML [Milner 1984], CLU [Liskov 1981], Miranda [Turner 1986], Poly [Matthews 1985a], Russell [Demers and Donahue 1979], and Ponder [Fairbairn 1982, 1985]. A somewhat similar facility is available in Ada [Ichbiah et al. 1979], in which one would use a “generic package” for this purpose. For a comprehensive survey of type polymorphism and its relationship to inheritance, the reader is referred to Cardelli and Wegner [1985].

We should note that in the B-tree example the choice of type for β, the data stored in the tree, is arbitrary, but that the choice of type for the key, α, is not. The implementation of B-trees requires that we have a mechanism for comparing two elements of type α. We need to describe α further by specifying an algebraic signature: a set of named operations together with their types, in this case a single comparison function. The ability to specify a signature distinguishes Poly (4.2) from ML (4.1). The obvious question is whether the added power and possible complexity of having this degree of type parameterization are necessary for database programming languages. It can be argued that all that is needed is to find the “right” set of type constructors for supporting a given data model and that this is sufficient for database work. Indeed, we shall see that languages such as Pascal/R (2.2), Adaplex (3.1), and Galileo (3.3) have adopted this approach, and these languages certainly provide extremely powerful mechanisms for describing data. However, it is always desirable to be able to define new generic procedures, even over existing structures. This is especially true in cases in which these languages provide bulk structures, such as sets or relations. If they provide such structures (as they do), do they provide an adequate set of procedures for manipulating them? Can one take unions? Can one sort the structure? Are arbitrary statistical computations available for numeric fields contained in these structures? It is not at all clear that one can give an exhaustive list, and if this cannot be done, then the ability to write generic code for such structures, although not essential, is extremely convenient. The authors are convinced that some form of polymorphic type system is essential for database programming.
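A minimal sketch of the generic insert and lookup procedures discussed above, written in Python with type hints. It is a stand-in under our own assumptions, not a B-tree: entries are kept in sorted arrays and located by binary search, which is enough to show the typing point. `Index`, `less`, `insert`, and `lookup` are illustrative names; the type variables `K` and `V` play the roles of α and β, and the `less` argument plays the role of the algebraic signature on the key type. Python does not check these type parameters at run time; a static checker such as mypy would.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, Optional, TypeVar

K = TypeVar("K")  # key type: alpha in the text
V = TypeVar("V")  # data type: beta in the text


@dataclass
class Index(Generic[K, V]):
    """Ordered index parameterized by a key type K and a data type V.

    `less` stands in for the algebraic signature on K: it is the only
    operation this code needs on keys. Nothing is required of V.
    """

    less: Callable[[K, K], bool]
    _keys: List[K] = field(default_factory=list, init=False)
    _data: List[V] = field(default_factory=list, init=False)

    def _position(self, key: K) -> int:
        # Binary search for the first stored key that is not less than `key`.
        lo, hi = 0, len(self._keys)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.less(self._keys[mid], key):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def _found(self, i: int, key: K) -> bool:
        # keys[i] is not less than key, so they are equal unless key < keys[i].
        return i < len(self._keys) and not self.less(key, self._keys[i])

    def insert(self, key: K, data: V) -> None:
        i = self._position(key)
        if self._found(i, key):
            self._data[i] = data  # replace the existing entry
        else:
            self._keys.insert(i, key)
            self._data.insert(i, data)

    def lookup(self, key: K) -> Optional[V]:
        # Unlike the text, an absent key yields None instead of being ignored.
        i = self._position(key)
        return self._data[i] if self._found(i, key) else None


by_number: Index[int, str] = Index(less=lambda a, b: a < b)
by_number.insert(1042, "Smith")
by_number.insert(7, "Jones")

by_name: Index[str, str] = Index(less=lambda a, b: a.lower() < b.lower())
by_name.insert("Smith", "12 High St")

# Type completeness: the data type may itself be a function type.
ops: Index[str, Callable[[float], float]] = Index(less=lambda a, b: a < b)
ops.insert("double", lambda x: 2 * x)

print(by_number.lookup(7), by_name.lookup("SMITH"))  # Jones 12 High St
```

One body of code serves integer keys, case-insensitive string keys, and, because `V` may be any type, even functions stored as data, which is exactly what Pascal’s restriction on function results rules out. Passing the comparison as an explicit argument is only one way to express the signature; Poly-style signature constraints or Haskell-style type classes are alternatives.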
More generally, the authors feel that an ideal DBPL would not limit the user to a particular data model and that a sufficiently flexible type system would allow a programmer to implement one or more data models or to experiment with new ones. What is needed is the ability to specify the data model itself (not just the database schema) as a suitably generic data type. This calls for a degree of type manipulation beyond anything we shall see in this paper and may be regarded as an alchemist’s dream rather than a practical goal. Nevertheless, we believe that this is an interesting challenge for programming language designers.

Expressive Power

To a programming language designer it is unthinkable that a language should not be able to perform arbitrary computations, but in database query languages there are usually severe limits to the kinds of computations that can be performed. In this paper we examine an example of a common class of computation that defeats most query languages. Another failure of many of these languages is that they cannot evaluate, at the top level, simple arithmetic expressions, even though such expressions may be permissible as part of a larger relational expression.

Several arguments have been given in favor of restricting the power of a query language. One is that it simplifies the language for the user, who is not usually a programming language expert. It is certainly true of many of the languages that we survey that the languages that allow arbitrary computations are considerably more forbidding than the query languages. We do not yet know whether this is necessarily the case for all future database languages. We hope not, and we are encouraged by the simplicity of logic programming [Clocksin and Mellish 1981] and of some of the applicative languages such as SASL [Turner 1981] and Miranda [Turner 1986], which allow the user to start by evaluating very simple queries but do not impose any limit on the power of computations that can be performed. Another reason for limiting expressive power is to provide a subset of operations that can be efficiently implemented or a language that can be effectively optimized. Again, the limits are arbitrary. Although most query languages will compute an average of a set of numbers, few will compute the variance, even though any database machine or management system that can efficiently compute an average should be able to do so for a variance. Moreover, many of these languages will not allow the user to define a function that computes the variance.

Since most of the languages we deal with are database programming languages as opposed to query languages, they do provide adequate expressive power. However, the way this is provided is not always uniform. For example, several of the languages allow an iterator of the form for x in S do ... but limit this to cases in which S is a sequence or set in the database. Whether this is a failure of expressive power or of type completeness is arguable; nevertheless, a shift of computational strategy is often forced upon the programmer who must perform more complicated tasks.

Many relational database systems fail to provide adequate expressive power, and the current practical solution to this problem is in “fourth-generation” systems. These provide a variety of user interfaces and development packages intended to cover most of the applications that are usually associated with a database. However, the design of such systems is more often based on a set of user interfaces than on a more general programming environment. When none of these interfaces is adequate for a given task, the user has to fall back on an embedded language, if this is available, and the added complexity of using a hybrid system may exceed the initial simplicity of the predefined interfaces. A good DBPL will, we believe, form a basis on which to build considerably improved fourth-generation systems.

and Requirements

they have emerged during the preparation of this survey.

(1) The majority






for DBPLs

The complete set of requirements for database programming languages is certainly not yet clear; much more experimentation and experience are required. Similarly, we do not yet have sufficient knowledge to specify requirements that are mutually independent. The list below is therefore the authors’ speculation on requirements as


(7) (8)

of features commonly accepted as good design in programming languages will be inherited by DBPLs, but there will some important exceptions, such as the next item. Static and strong type checking is particularly important, not only to assist programmers in avoiding errors, but also to protect valuable long-term data from corruption. But some dynamic binding and incremental type checking are necessary. Even here strong type checking is required, and should be implemented eagerly, to indicate errors before wasteful, and possibly damaging, operations are applied to the data. The type system should be entirely consistent (data type complete) for data of any longevity, and there should be no exception to the rights to transience or longevity on the basis of type. A consistent naming system is required for data values, program components, and types, independent of longevity. As in languages such as Ada with modules, the interpretation of some names necessarily spans both program units and time. A set of simple, consistent rules, including the rules for type matching, should exist to control the binding of names between independently developed components, such as programs and data. A bulk type is required to describe regular structures. The necessary support for indexes may be provided by the same bulk type or another. Some mechanism is needed to iterate over the elements of bulk objects, and this is more useful if there is a method of predicting or controlling the order of iteration when necessary. Some form of inheritance or specialization mechanism is called for. The type system should support polymorphism, which may then

ACM Computing

Surveys, Vol. 19, No. 2, June 1987







M. P. Atkinson

and 0. P. Bun.ernan

subsume requirements (6) and (7) above, and, if it does, a well-developed programming environment will have numerous polymorphic procedures and types in its library. The type system and naming system need adequate expressive power so that the system builder may be precise in specifying a set of values or operations. For example, it should be possible to distinguish between the type of the elements of a set and the extent of that set. Persistence should be orthogonal to type, and it should be possible to write programs without taking into account the longevity of data. The programmer should not have to organize placement or movement of data but should be provided with operations to copy, protect, and secure data. A particular common case is the control of whether a location (e.g., field of a record) can be written to, after the initial value is assigned at record creation. Where possible, there should be succinct textual representations for data values (e.g., literals, record constructors).
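Requirement (10) is easier to appreciate by contrast with what an ordinary language offers. The sketch below is only a rough present-day point of comparison: Python is not a DBPL, and this is not a mechanism proposed in this survey. It uses the standard `shelve` module, which stores any picklable value under a string key, whatever its type. The parts data and the function name `build_parts` are hypothetical. The sketch also shows where such a mechanism falls short of the requirements above. The program must still open the store and decide explicitly what to save. Values are copied in and out rather than shared in place. Sharing between objects survives only within a single stored value.

```python
import os
import shelve
import tempfile


def build_parts():
    """Hypothetical parts data: part D is shared by assemblies B and C."""
    d = {"name": "D", "cost": 5.0, "mass": 2.0}
    b = {"name": "B", "made_from": [(d, 2)]}
    c = {"name": "C", "made_from": [(d, 3)]}
    a = {"name": "A", "made_from": [(b, 1), (c, 1)]}
    return {"A": a, "B": b, "C": c, "D": d}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parts_store")

    # First "program run": the whole structure is saved under one root name.
    # Any picklable value could be stored here, regardless of its type.
    with shelve.open(path) as store:
        store["parts"] = build_parts()

    # Second "program run": reopen the store and navigate the same structure.
    with shelve.open(path) as store:
        parts = store["parts"]

# Sharing survives because the structure was pickled as one value;
# values stored under different keys would not share objects.
shared_d = parts["B"]["made_from"][0][0] is parts["C"]["made_from"][0][0]
print(shared_d)  # True
```

The explicit `shelve.open` calls are exactly the “organize placement or movement of data” work that requirement (11) would remove from the programmer.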

More speculatively, we believe the following will emerge as requirements:

(13) Notations and support for nested transactions may be developed [Liskov et al. 1983], but these may need to be augmented by other transactional forms [Krablin 1985; Weihl 1985b].
(14) The mechanisms for both concurrency and distribution will probably be derived from developments in programming languages, although the existing implementations developed through database research will be needed to support them.
(15) Polymorphic type systems may become sufficiently powerful to permit generic application code and data models to be declared in the language. Then data model research will merge with, and contribute to, the design of data structures and modeling methods.
(16) Incremental change should be supported so that not only may new programs and new data be bound to existing compositions of programs and data, but early definitions may be corrected or refined without loss of access to the data to which they are bound.

It is interesting to compare this list with earlier attempts to define the goals for a DBPL, for example, those by Wasserman [1978] and Wasserman et al. [1981]. The need for type abstraction, bulk types, and modularity was already noted there; however, recent research into types now leads us to give type systems (particularly those involving modules and polymorphism) more prominence, and we are now convinced that a consistent treatment of persistence is possible.

Problems and Research Issues

Many problems remain to be overcome before DBPLs as imagined above can be realized. The research that may lead to such systems divides naturally into three areas:

(a) What are the theoretical underpinnings of integrated DBPLs?
(b) How are integrated DBPLs to be implemented?
(c) If we have such systems, what programming methodology is most appropriate for them?

These three questions can be approached in light of the requirements listed in the preceding section. For example, what type theory will enable us to understand both bulk data types and inheritance? Considering all three questions together informs the choice of design for each aspect of the eventual DBPL. For example, difficulties in finding an efficient implementation of some bulk type might militate against its use, an inadequate theory to support the matching of values that emerge from independent contexts might rule out a form of inheritance or polymorphism, and the lack of an acceptable method of programming might eliminate a model of concurrency. But this paper is about integration, which will not be achieved by considering the individual attributes of DBPLs in isolation; ultimately, it is the more challenging task of understanding their interaction and composition that will require the most serious research.



1. A TEST CASE AND SOME BASIC APPROACHES

In order to illustrate the issues presented in the previous section in a concrete fashion, throughout this paper we use a specific set of database programming examples. In doing this, we must obviously make some compromises. Database schemas can be extremely complicated (hundreds of record types or relations may be involved), and programs can be lengthy, involving widespread interaction with the user and with many parts of the database. Given that space prohibits the presentation of large bodies of code, the database example presented here is intended to suggest realistic aspects of some large system. The example we use is an illustrative fragment of a manufacturing company’s parts database. The reader is invited to imagine the rest of this database and the other processes and programs that would use it. The database represents, among other things, the inventory of a manufacturing company. In particular, it represents the way certain parts are manufactured out of other parts: the subparts that are involved in the manufacture of a part, the cost of manufacturing a part from its subparts, and the mass increment or decrement that occurs when the subparts are assembled. Note that manufactured parts may themselves be subparts in a further manufacturing process. The relationship between parts is therefore hierarchical, but it is a directed acyclic graph rather than a tree, for part D may be used in the manufacture of parts B and C, which are both used in the manufacture of part A. In addition, certain information must be held on the parts themselves: their name and, if they are imported (i.e., manufactured externally), the supplier and purchase cost.

The first task is therefore

Task 1: Describe the database.

In traditional terms we are asking for a database schema; however, in the languages we examine, this description is sometimes part of a type definition. Not all the descriptions will be equivalent (in the sense that they could be automatically transformed one into another). For example, there are conditions on the database that in some data models are implicit; in others they can be represented explicitly in the type declarations (schema definition) or as automatically enforced integrity constraints; in still other systems it may be the responsibility of programmers to ensure that all updating programs preserve these conditions.

Having defined the database, we want to write three programs against it. The first, Task 2, is simple and is chosen to provide an introduction to each language: it retrieves the name, cost, and mass of every imported (base) part that costs more than $100. Database query languages are designed to make the expression of queries like this extremely simple, and a measure of programming languages in general is the simplicity with which the code for queries such as this can be expressed.

The next example, Task 3, is somewhat more complicated and defeats many query languages: it computes the total cost and total mass of a composite part. Since a part may be made from other parts, including parts that themselves are composite, this calls for at least one recursive traversal of the parts hierarchy of the database. The inability of relational query languages to perform recursive traversal or transitive closure has long been recognized [Aho and Ullman 1979].³ This example poses some additional efficiency problems: avoiding repeated computation of the costs and masses of subparts, and computing both the cost and the mass in parallel. In our solutions we develop a procedure that will derive the result for any part and then apply that procedure to the particular part with name “mast.” Occasionally, locating a part requires such effort that a separate routine, findPart, is defined. Thus, further efficiency issues are raised: Is there index support for the search for the part, and does the method evaluate the cost and mass of all the trees in the forest of part explosions, even though only one is wanted? Finally, we would like to demonstrate update by adding some information to the database: Task 4 installs a new composite part, together with its subparts and their quantities.

The point of this last example is to examine where, in the program or type system, integrity constraints are implemented. The implementation of such updates is dominated by user interface construction and by the need to isolate the dialogue with the user from the consequent action of changing the database. In our examples we have largely ignored the user interface but have shown how the isolation may be achieved. The coding of Task 4 is therefore always incomplete. What support should be given to organizing the dialogue with the user is a separate issue. There are also complex issues, not explored in this paper, about access to the database during this dialogue. If checks are to be made eagerly, which is most satisfactory for the user, then either locks will be obtained and held for the duration of a possibly long dialogue or the results of those checks may be invalidated by concurrent updates. The choice of a strategy is outside the domain of the language designer, and primitives that enable an applications programmer to specify a strategy should be provided [Krablin 1985; Weihl 1985b].

As we have already remarked, these examples represent only a small fragment of what would actually be involved. The need to represent other data would set the context of this example better. For instance, it would be desirable to provide details of suppliers, their reliability, current stock, current orders, delivery times, assembly times, manufacturing processes, part strengths, shapes, and so on. Similarly, many processes, such as design, testing, maintenance, and production, would normally use such data. With the trend to computer-aided engineering and integrated factories, such a database would support not only CAD/CAM but also job scheduling, financial control, order processing, and staff administration. Any such extensive system must be designed and implemented to support change. Therefore, we are interested in the mechanisms for extending and revising the program, data description, and data in the composite system. It is also likely that short examples like these will fail to do justice to certain features of the languages that we review. Although we comment on these where possible, we must again excuse ourselves from a detailed presentation on the grounds of lack of space.

³ Considerable work has recently been done to extend query languages to include some form of transitive closure, but it is not clear whether efficiency issues have been addressed [Bancilhon and Ramakrishnan 1986; Rosenthal et al. 1986]. Moreover, a solution to the transitive closure operation itself does not necessarily solve all the problems of expressive power; in particular, it does not solve our third task, since we need to interleave arithmetic with recursion.

1.1 The Programming Language Illustrated with Pascal


To present the technical issues in terms that are familiar to most people, we attempt these tasks in Pascal and in a relational database query language. Neither of these languages can be regarded as a DBPL: Pascal has inadequate persistence, and most query languages have inadequate computational power. However, many of the attempts to produce DBPLs are based on a fusion of the two approaches. These two examples will therefore serve as a basis for many of the later comparisons. They also show clearly that both systems are inadequate alone and, consequently, that integration is necessary. Both Pascal and relations have their origins in the late 1960s. New concepts may promise improvement, but as yet they do not have the advantage of 16 years of development.

Many applications are written using a programming language alone, either managing without permanent data or using some programmer-contrived mechanism to store the data in files between program runs. Pascal has seriously limited forms of persistent data, so, like the B-tree example cited earlier, our code is no more than an illustration of what needs to be done. But we can use the types of Pascal to provide a representation of the part assembly as it might be implemented in secondary storage. We could, of course, code a real (secondary storage) database in Pascal, but we would have to resort to our own pointer management and would not gain the benefits of Pascal’s type description and checking. In any database in which the number of records is large or the schema is at all complicated, this approach is so error prone that it is not viable.

Figure 1 shows a type declaration corresponding to Task 1.⁴ The data structure we have chosen is a standard linked-list implementation of a many-to-many relationship, as described in Date [1983a]. It is also similar to the CODASYL implementation that we describe later. Invariably, some intermediate structure, a record, which we call Use, or a relation, which we call MadeFrom, has to be introduced to represent the many-to-many relationship between parts and subparts and to carry the Quantity information. In Pascal and similar languages, these Use records must be explicitly linked together in two lists: those that contribute to the same assembly, through NextUses, and those that use the same component, through NextUsedIn. These lists are held in the MadeFrom and UsedIn fields of a Part record. When traversing these lists, the pointers Uses and UsedIn allow us to move down and up the part hierarchy. To some extent, Pascal’s type system helps to prevent confusion over these pointers, but the support of sets or sequences in later languages much reduces the number of references the programmer has to manage. Unfortunately, the only “bulk” data types available in Pascal (array, set, and file) are inappropriate for this task. Moreover, the programmer cannot construct an appropriate parameterized data type such as list-of, as demonstrated in ML later. Had this been possible, we could have declared types such as list-of(part) to model the collective objects in the schema explicitly.

    type PPType = (Base, Composite);
         PartList = ^PartCell;
         UsesList = ^Use;
         UsedInList = ^Use;
         SuppList = ^SuppCell;
         PartRef = ^Part;
         String = packed array [1..16] of char;
         Grams = real;
         Dollars = real;
         Part = record {to represent a part}
                  Name: String;
                  UsedIn: UsedInList;
                  case PartType: PPType of
                    Composite: (AssemblyCost: Dollars;
                                MassIncrement: Grams;
                                MadeFrom: UsesList);
                    Base: (Cost: Dollars;
                           Mass: Grams;
                           Suppliers: SuppList)
                end;
         Use = record {to represent the assembly-component relationship}
                 Uses, UsedIn: PartRef;
                 Quantity: integer;
                 NextUses: UsesList;
                 NextUsedIn: UsedInList
               end;
         PartCell = record {to represent the set of all parts}
                      Cont: PartRef;
                      Next: PartList
                    end;
         SuppCell = record ... end;
    var Database: record
                    Parts: PartList;
                    Suppliers: SuppList
                  end;

Figure 1. Task 1: describing the data in Pascal.

⁴ When we first worked through the various examples, we tried to adopt a uniform type font and case convention for the various languages. However, we could not find any convention that was consistent with the cultures of all the languages in which we were interested. Instead, we tried to work each example in the style of the descriptions and reference manuals at our disposal. We have, however, tried to maintain a uniform naming convention; thus Part will usually refer to a data type and aPart to a value. Even maintaining a naming convention is not always possible.

On the positive side, the variant record mechanism of Pascal neatly captures the specialization hierarchy of this example and the requirement that every part be either a CompositePart or a BasePart. It models this case well since the specializations are exclusive, but it cannot deal so readily with specializations that are allowed to overlap (later, in Section 3, we examine languages in which overlap is permitted). Another advantage of Pascal (as opposed, say, to PL/I) is the typing of pointers. This is of substantial benefit in writing applications against structures representing many-to-many relationships, where it is extremely easy to confuse two pointers. We shall see this problem of type checking again in many of the traditional methods of programming against CODASYL databases. There are in fact two kinds of type checking that are used (often simultaneously) in DBPLs: static type checking, like that in Pascal, insists on all objects being typed in advance of program execution; dynamic checking prevents type errors at run time by raising exceptions. In the implicit pointer manipulation that is performed for some databases, one frequently has neither.

The main omission of this type declaration is that there is no provision for efficient lookup of, say, a Part record given its Name. We need some sort of indexing mechanism, such as a B-tree or hash table, that would implement an efficient lookup, but we could not make it persistent and, as already explained, would have to write separate implementations for Name to Part and Sname to Supplier.

Figure 2 presents a Pascal solution to Task 2. The list, constructed as parts are added (see Figure 4), is located from the Database variable and then scanned. Each part is tested on the variant field to determine whether it is a base part and whether its cost exceeds $100. This explicit iteration, the total scan, and the explicit write statement are all unnecessary in some of the later systems we review.

    program Task2;
    {The type declarations of Figure 1}
    var pl: PartList;
    {Code to build the database must go here since we have no persistence}
    begin
      pl := Database.Parts;
      while pl <> nil do
      begin
        with pl^.Cont^ do
          if (PartType = Base) and (Cost > 100) then
            writeln(Name, Cost, Mass);
        pl := pl^.Next
      end
    end.

Figure 2. Task 2: a Pascal program to retrieve expensive parts.

Figure 3 sketches a Pascal solution to Task 3. Organizing a depth-first recursive traversal of the part hierarchy presents no logical problems; it is achieved here by recursive calls of costAndMass. The introduction of parameters and of local variables to represent partial totals permits both values to be calculated in one traversal (something that cannot be done in some other languages). The structure of costAndMass is easy to recognize. In contrast, the intent of the relatively simple function findPart is much less obvious. Again, what is required here is an efficient (search tree or hash table) indexing type.

    program Task3;
    {The type declarations of Figure 1}

    function findPart(pn: String): PartRef;
    {Returns a reference to the part named pn; nil otherwise}
    var pl: PartList;
        result: PartRef;
    begin
      result := nil;
      pl := Database.Parts;
      while (result = nil) and (pl <> nil) do
        if pl^.Cont^.Name = pn then result := pl^.Cont
        else pl := pl^.Next;
      findPart := result
    end;

    procedure costAndMass(p: Part; var resultCost: Dollars;
                          var resultMass: Grams);
    var subTotalCost: Dollars;
        subTotalMass: Grams;
        ul: UsesList;
    begin
      with p do
        if PartType = Base then
        begin
          resultCost := Cost;
          resultMass := Mass
        end
        else
        begin {recursively compute costs and masses of subparts}
          resultCost := AssemblyCost;
          resultMass := MassIncrement;
          ul := MadeFrom;
          while ul <> nil do
            with ul^ do
            begin
              costAndMass(Uses^, subTotalCost, subTotalMass);
              resultCost := resultCost + Quantity * subTotalCost;
              resultMass := resultMass + Quantity * subTotalMass;
              ul := NextUses
            end
        end
    end;

    procedure main;
    var pref: PartRef;
        itsCost: Dollars;
        itsMass: Grams;
    begin
      pref := findPart('Mast            ');
      if pref <> nil then
      begin
        costAndMass(pref^, itsCost, itsMass);
        writeln(itsCost, itsMass)
      end
    end;

    begin
      {code to build the database as for Figure 2}
      main
    end.

Figure 3. Task 3: Pascal code to compute cost and mass simultaneously.
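Both remarks above can be made concrete with a short sketch: findPart is obscure, and costAndMass in Figure 3 recomputes shared subassemblies. It is written in Python rather than Pascal and is not the authors’ code. An in-memory dictionary stands in for the persistent hash-table index the text asks for. A memo table lets a subassembly used by several assemblies be evaluated only once. The names `BasePart`, `CompositePart`, `build_index`, and `cost_and_mass`, and all the numbers, are hypothetical. As in the example above, part D is used in B and C, which are both used in the final assembly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(eq=False)  # eq=False: parts compare and hash by identity
class BasePart:
    name: str
    cost: float
    mass: float


@dataclass(eq=False)
class CompositePart:
    name: str
    assembly_cost: float
    mass_increment: float  # may be negative (a mass decrement)
    made_from: List[Tuple["Part", int]] = field(default_factory=list)


Part = Union[BasePart, CompositePart]


def build_index(parts: List[Part]) -> Dict[str, Part]:
    """Hash-table index from name to part, replacing findPart's linear scan."""
    return {p.name: p for p in parts}


def cost_and_mass(part: Part, memo: Dict[Part, Tuple[float, float]]) -> Tuple[float, float]:
    """Total cost and mass of a part, computed in one depth-first traversal.

    The memo table records each part's totals, so a subassembly shared by
    several assemblies is evaluated only once.
    """
    if part in memo:
        return memo[part]
    if isinstance(part, BasePart):
        totals = (part.cost, part.mass)
    else:
        total_cost = part.assembly_cost
        total_mass = part.mass_increment
        for sub, quantity in part.made_from:
            sub_cost, sub_mass = cost_and_mass(sub, memo)
            total_cost += quantity * sub_cost
            total_mass += quantity * sub_mass
        totals = (total_cost, total_mass)
    memo[part] = totals
    return totals


# Hypothetical data: D is used in both B and C, which are both used in the mast.
d = BasePart("D", cost=10.0, mass=2.0)
b = CompositePart("B", assembly_cost=5.0, mass_increment=1.0, made_from=[(d, 2)])
c = CompositePart("C", assembly_cost=3.0, mass_increment=0.5, made_from=[(d, 3)])
mast = CompositePart("mast", assembly_cost=7.0, mass_increment=-1.0,
                     made_from=[(b, 1), (c, 2)])

index = build_index([mast, b, c, d])
memo: Dict[Part, Tuple[float, float]] = {}
print(cost_and_mass(index["mast"], memo))  # (98.0, 17.0)
```

After the call, the memo holds one entry per part (four in all), even though D is reached along two paths. Like the Pascal version, the traversal is still only evaluated for the part requested.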

    {The type declarations of Figure 1}
    {Code to interact with the user to get the name, assembly cost, and mass
     of a new part together with its subparts and their quantities. We assume
     that a PartRef pr has been constructed together with a MadeFrom list, but
     that this has not yet been installed in the database; i.e., pointers exist
     from its elements to structures in the database but not vice versa}
    {load existing database}
    var ul: UsesList;
        pl: PartList;
        pr: PartRef;
    begin {install the part referenced by pr}
      ul := pr^.MadeFrom;
      while ul <> nil do
        with ul^ do
        begin {add each component to uses list}
          NextUsedIn := Uses^.UsedIn;
          Uses^.UsedIn := ul;
          ul := NextUses
        end;
      new(pl);
      pl^.Cont := pr;
      pl^.Next := Database.Parts;
      Database.Parts := pl
      {unload DB into some stored form}
    end.

Figure 4. Task 4: pointer manipulation required to install a part.
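The bookkeeping in Figure 4 can be mirrored in a short Python sketch, given only to make the two directions of the many-to-many link explicit. Python lists stand in for the linked MadeFrom and UsedIn chains. The names `Part`, `Use`, `install_part`, and `inverse_consistent` are hypothetical. As in Figure 4, nothing makes the installation a transaction: a failure part-way through would leave some inverse links in place and others missing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity semantics: two Use records are never "equal"
class Part:
    name: str
    made_from: List["Use"] = field(default_factory=list)  # components of this assembly
    used_in: List["Use"] = field(default_factory=list)    # assemblies that use this part


@dataclass(eq=False)
class Use:
    assembly: Part
    component: Part
    quantity: int


def install_part(database: List[Part], new_part: Part) -> None:
    """Install a part whose made_from list is built but not yet linked back.

    As in Figure 4: add each Use to its component's used_in list (the
    inverse pointers), then add the part to the extent of all parts.
    """
    for use in new_part.made_from:
        use.component.used_in.append(use)
    database.insert(0, new_part)  # Figure 4 also adds the new cell at the front


def inverse_consistent(database: List[Part]) -> bool:
    """Check that part A uses part B exactly when B is used in A."""
    for part in database:
        for use in part.made_from:
            if use.assembly is not part or use not in use.component.used_in:
                return False
        for use in part.used_in:
            if use.component is not part or use not in use.assembly.made_from:
                return False
    return True


# Hypothetical usage: a frame assembled from four bolts.
bolt = Part("bolt")
database = [bolt]
frame = Part("frame")
frame.made_from.append(Use(assembly=frame, component=bolt, quantity=4))
before = inverse_consistent(database + [frame])  # False: no inverse link yet
install_part(database, frame)
after = inverse_consistent(database)             # True
```

The check in `inverse_consistent` is the only consistency property this kind of installation code maintains. Constraints such as acyclicity still have to be checked separately.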

    Part(_Pno_: Pnum, Name: String)
    BasePart(_Pno_: Pnum, Cost: Dollars, Mass: Grams)
    CompositePart(_Pno_: Pnum, AssemblyCost: Dollars, MassIncrement: Grams)
    MadeFrom(_Assembly_: Pnum, _Component_: Pnum, Quantity: PosInt)
    SuppliedBy(_Pno_: Pnum, _Sno_: Snum)

Figure 5. A relational schema of part of the database (primary keys underscored).

The question posed by these examples is whether powerful higher level types such as index or relation can be introduced (making Task 2 and findPart succinct) without the loss of regularity that makes costAndMass straightforward to code. There is also an inefficiency in the code of costAndMass arising from the recalculation of data about common subassemblies; we later show that this can be addressed without obscuring the algorithm when such higher level types are available. Figure 4 shows how Task 4 can be coded in Pascal. No facilities exist in the language for identifying this as a transaction⁵ or for dealing with concurrent access. The only consistency ensured by the above example is that if part A Uses part B then B is UsedIn part A. This program forms the sequences of Uses, inserts the inverse pointers, and constructs the new extent of Parts.

⁵ That is, a collection of operations that (a) appear to run to completion or not to run at all and (b) are indivisible in that no other program can run against the same data during the transaction [Date 1983a].

1.2 The Relational


Since the relational model’s initial formulation [Codd 1970], it has been a basis for discussing data design, and as techniques have been developed for the efficient representation of relations and relational operators, relational systems have become increasingly marketable. Schmidt and Brodie [1983] give a good survey of this species of database system. Figure 5 shows how the data could be described in a prototypical relational system. It shows the column and domain names of each of five relations with the primary keys underscored.⁶ But it does not define the interdependence of the relations or the interpretation of the values within them. Some would contend that this is a symptom of a general shortcoming of the basic relational model: it fails to define adequately the interrelationships between relations [Date 1981a] and carries insufficient indication of the semantics of a database [Codd 1979; Kent 1978, 1979]. In contrast, Merrett [1983] contends that this lack of semantics is to the relational model’s advantage, making it a better basis for organizing data storage and a better kernel for data manipulation languages. He illustrates this with a demonstration of how the model may represent line drawings [Merrett and Duchting 1984] and text [Merrett 1985a]. We return to Merrett’s work in Section 6.1.2. It should also be noted that relational systems do not usually provide, or allow the definition of, the domains used in Figure 5.

⁶ For tutorial material on relations, the reader is directed to Date [1981b, 1983b], Merrett [1984], Tsichritzis and Lochovsky [1977], and Ullman [1982].

    {Each base and composite part is also a part}
    for each b in BasePart there exists p in Part such that p.Pno = b.Pno
    for each c in CompositePart there exists p in Part such that p.Pno = c.Pno

    {A part must be a composite part or a bought-in part}
    for each p in Part
        there exists b in BasePart such that p.Pno = b.Pno
        or there exists c in CompositePart such that p.Pno = c.Pno

    {Exclusion - we don’t make parts we buy}
    for each c in CompositePart
        for each b in BasePart, b.Pno ≠ c.Pno

    {Only composite parts can be assembled}
    for each m in MadeFrom
        there exists c in CompositePart such that m.Assembly = c.Pno
        {and the parts they’re made from must exist}
        and there exists p in Part such that p.Pno = m.Component

    {We only buy in base parts}
    for each s in SuppliedBy there exists b in BasePart such that s.Pno = b.Pno

    {All base parts have a supplier}
    for each b in BasePart there exists s in SuppliedBy such that b.Pno = s.Pno

Figure 6. Some integrity constraints on the parts relations.

Figure 6 indicates some of the constraints needed in addition to those imposed by the keys in the relational schema. These have been stated in predicate or “tuple” calculus [Ullman 1982] and in English in the comments, but they could equally well have been stated as equations or inclusion relationships on relations derived using relational algebra. In the Pascal schema (type declaration) we saw that reference types can be used to capture some of these constraints. In relational databases these constraints can also be stated in terms of foreign keys, which have been discussed [Date 1981a; Codd 1979] but not, to our knowledge, implemented as constraints in any practical relational database management system. Another important constraint is that the parts explosion diagram be acyclic, that is, that no part be directly or indirectly a subpart of itself. Figure 7 attempts to express this constraint in the same style as the previous figures. What we need to do is define the transitive closure, Above, of the part-subpart relationship from the relation MadeFrom. Logic programming provides one of the most succinct methods of specifying the transitive closure if we actually wanted to implement this. The constraint at the bottom of Figure 7 says that no two parts may each be above the other in the parts explosion diagram; in particular, no part can be above itself.
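The constraints of Figure 6 and the acyclicity constraint of Figure 7 can also be checked mechanically. The sketch below is written in Python, with relations represented as sets of tuples; the function names and part numbers are hypothetical. It evaluates the two Above rules of Figure 7 by naive fixpoint iteration, one standard way to compute a transitive closure, and then tests the acyclicity constraint and the exclusion constraint of Figure 6. It illustrates what the constraints mean, not how any of the surveyed systems enforce them.

```python
from typing import Set, Tuple

MadeFromRow = Tuple[str, str, int]  # (Assembly, Component, Quantity)


def above(made_from: Set[MadeFromRow]) -> Set[Tuple[str, str]]:
    """Transitive closure Above of MadeFrom, by naive fixpoint iteration."""
    # First rule: Above(x, y) if some MadeFrom row has Assembly x, Component y.
    closure = {(asm, comp) for asm, comp, _ in made_from}
    while True:
        # Second rule: Above(x, z) and a row with Assembly z, Component y
        # give Above(x, y).
        derived = {(x, comp) for x, z in closure
                   for asm, comp, _ in made_from if asm == z}
        if derived <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= derived


def acyclic(made_from: Set[MadeFromRow]) -> bool:
    """Figure 7: for all (x, y), if Above(x, y) then not Above(y, x)."""
    closure = above(made_from)
    return all((y, x) not in closure for x, y in closure)


def exclusion_holds(base_pnos: Set[str], composite_pnos: Set[str]) -> bool:
    """Figure 6: no part number is both a base part and a composite part."""
    return base_pnos.isdisjoint(composite_pnos)


# Hypothetical part numbers: D is used in B and C, which are used in A.
made_from = {("A", "B", 1), ("A", "C", 2), ("B", "D", 2), ("C", "D", 3)}
print(("A", "D") in above(made_from))           # True
print(acyclic(made_from))                       # True
print(acyclic(made_from | {("D", "A", 1)}))     # False: A would be above itself
print(exclusion_holds({"D"}, {"A", "B", "C"}))  # True
```

The loop terminates because the closure can only grow and is bounded by the set of all pairs of part numbers. Note that the Quantity column plays no part here; Task 3 differs precisely in needing arithmetic interleaved with this recursion.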

    Above(x, y) if there exists m in MadeFrom such that
                     m.Assembly = x and m.Component = y
                or there exist z and m in MadeFrom such that
                     Above(x, z) and m.Assembly = z and m.Component = y

    for all (x, y) if Above(x, y) then not Above(y, x)

Figure 7. Constraint requiring the parts explosion diagram to be acyclic.

    SELECT Part.Pname, BasePart.Cost, BasePart.Mass
    FROM   Part, BasePart
    WHERE  Part.Pno = BasePart.Pno
    AND    BasePart.Cost > 100

Figure 8. Task 2: retrieve details of expensive parts in SQL.

The ability to express this constraint in an abstract data type will be of interest to us in later sections, especially Section 4.1.

The difficulty of programming with pointers was one of the strongest arguments for the use of the relational model. At the time that the relational model was introduced, much database programming was done through low-level packages that manipulated pointers on secondary storage. To imagine what these early database management systems were like, we could try removing the type information from our Pascal program, thereby laying open the possibility of confusing the pointers to different types of objects. However, the introduction of typed languages and data definition tools (as well as partially typed systems such as CODASYL) has done much to alleviate problems of pointer error. Yet Pascal types are still not nearly as simple as the relational schema. The problem here is that references are used both to maintain referential integrity and to build the data structures required in the database. The importance of referential integrity has been argued by Atkinson [1978] and Date [1981a]. The first two constraints of Figure 6 also suggest a type hierarchy, which was modeled as a variant record in the previous Pascal example. We shall see later that these ideas of generalization and specialization hierarchies [Smith and Smith 1977] are fundamental to the semantic data models [Brodie et al. 1983; Borgida 1983]. A number of constraints could have been made unnecessary by eliding the first three relations; this would leave a less precise description of less compact data, and the problems posed above would not have gone away, since they would have reappeared in the treatment of not-applicable or null values.

Most relational systems offer a simple query language and data manipulation language. This would usually suffice for Task 2, as we illustrate by coding the query in SQL [Date 1983b], as shown in Figure 8. This relational calculus query avoids much of the complication of the Pascal program (Figure 2), since iteration over the set of base parts and output of the result subset are both implicit. However, the Pascal program had one advantage: It did not need a join (the first line of the WHERE clause) to deal with the inheritance implicit in the specialization hierarchy.

By contrast, Task 3 is impossible in most relational query languages. As we have already noted, Aho and Ullman [1979] have shown that a general transitive closure operation cannot be expressed with a relational algebra expression. Since relational query languages are based indirectly on relational algebra, and since this task calls for something at least as powerful as a transitive closure operation, the failure of query languages at this level of complexity is not surprising. A user confronted with a problem of this kind has three options:

(1) Unload the relevant portions of the database into a file and use a separate language, or even hand calculation, to complete the task.
(2) Use a more powerful language that can call directly upon the query language for partial computations. We discuss this under embedded languages (see Section 2.4).
(3) Write a query that performs the computation to some finite depth. For example, one might assume that no assembly has a subassembly that itself has subassemblies.

The first option is probably the one most frequently used in the commercial world. Analysts or “end users” are unaware of what software may be available to perform more complicated queries, and systems programmers and designers are unaware of the amount of hand calculation that follows a query or set of reports. This has three consequences: The requirement for more general computational power continues to be underestimated, the volume of data transferred or printed is unnecessarily high, and there is a possibility of introducing errors during transcription and interpretation. This mode of problem solving contributes to the popularity of fourth-generation systems, which may avoid the transcription but not the interpretation. The first two options require the user to understand a separate and more complicated programming system and place an effective barrier on the sophistication of queries that users can pose for themselves. The difficulty of acquiring skills in the more complicated language (due more to lack of time than to lack of intellect) often accounts for the rather artificial distinction between “applications” and “systems” programmers, and for the common frustrations of data analysts who do not see themselves as computer programmers.

Following the third option, an SQL solution to Task 3 can be written if we assume that the parts explosion diagram has only two levels; that is, composite parts are only constructed directly from base parts. The solution described here may not be the shortest or the most efficient. Views (essentially function definitions) are used to construct the intermediate relations used in this computation. First, rel1 is defined so that it contains the part number of each composite part and the total cost and total mass for each of its subparts (which must be in BasePart); rel2 essentially renames CompositePart so that its column names are compatible with those of rel1; rel3 is then constructed to contain the total cost and total mass of each composite part. By further relabeling of columns, we include the base parts with their costs and masses to get the final result, a relation of all part numbers with total cost and mass, in rel5. Note that much of the complexity of this code lies in the need to relabel relations in order to take unions. Even at this point, the SQL code starts to look considerably more forbidding than our previous Pascal solution. There is a further drawback. As it stands, the evaluation is conducted for every part, even though the result was required for only one. A clever optimization strategy [Jarke and Koch 1984] might avoid this, but not without some cost. It is not hard [Nikhil 1984] to combine the operators of the relational algebra with some form of function definition in a simple interactive language to provide sufficient computational power to solve Task 3, and one wonders why this is not done more often. We shall also see (in Section 6.1.2) that the relational algebra can be extended to provide a solution to this task. We mentioned in the Introduction that two reasons are often given for the failure of most relational query languages to provide adequate power to solve Task 3: One is that end users will (for whatever reason) be unable to learn a more complicated language; the other is that a relational language should be limited to the operations that some database processor can perform efficiently. It would seem that this is the
Figure 9 shows an attempt to apply the sort of program that an end user might third option on the unrealistic assumption reasonably want to write and that it should lem such as this would therefore be confronted with the following options:

be possible for such a program to be formulated. Moreover, this is precisely the sort of task that a database machine can perform efficiently, and the only reason for excluding it is that by allowing general recursive programs one also allows certain computations that might not be efficiently implemented.

    DEFINE TEMPORARY VIEW rel1(Pno, Vc, Vm) AS
        SELECT CompositePart.Pno, Quantity * Cost, Quantity * Mass
        FROM CompositePart, BasePart, MadeFrom
        WHERE CompositePart.Pno = MadeFrom.Assembly
          AND BasePart.Pno = MadeFrom.Component

    DEFINE TEMPORARY VIEW rel2(Pno, Vc, Vm) AS
        SELECT Pno, AssemblyCost, MassIncrement
        FROM CompositePart

    . . .

    SELECT itscost, itsmass
    FROM rel5, Part
    WHERE Part.Name = 'Mast' AND Part.Pno = rel5.Pno

Figure 9. A partial solution for Task 3 in SQL.

Performing the update, Task 4, in SQL demonstrates the need for transactions and integrity constraints. For example, we need to ensure that the part-subpart relationship remains acyclic and that the condition that a part is either base or composite is maintained. Moreover, it is up to the user to invent an appropriate part number (the need for which is engendered by the relational model). Figure 10 shows the interaction that might take place if SQL were to be used for this task. We should note that this update would probably be preceded by a query that found the relevant part numbers given, say, their names. The fact that this whole process is so prone to constraint violation and transcription errors means that in any working environment this update would be implemented using an embedded language if it were to be performed at all frequently. Further discussion of query languages may be found in Jarke and Vassiliou [1985].

Figure 10.

ACM Computing Surveys, Vol. 19, No. 2, June 1987

The purpose of this section was to introduce four tasks that we believe are characteristic of database programming and to show two approaches that failed to meet the challenge for different reasons: Pascal because of inadequate persistence and the lack of appropriate data types, and SQL because of its lack of computational power and, arguably, the lack of an adequate data model.
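The contrast between a fixed-depth query and a recursive program can be made concrete. The following Python sketch is illustrative only. The dictionaries stand in for the BasePart, CompositePart, and MadeFrom relations. The part numbers, costs, and masses are hypothetical, as are the function names cost_and_mass and two_level_cost. cost_and_mass follows subassemblies to any depth and memoizes each part's totals. two_level_cost imitates the third option by assuming that every component is a base part.

```python
from functools import lru_cache

# Hypothetical sample data shaped like the relations in the text.
BASE_PART = {1: (10, 5), 2: (3, 1)}          # Pno -> (Cost, Mass)
COMPOSITE_PART = {3: (20, 2), 4: (50, 10)}   # Pno -> (AssemblyCost, MassIncrement)
# MadeFrom rows: (Assembly, Component, Quantity). Part 4 uses composite part 3,
# so the explosion is three levels deep: 4 -> 3 -> base parts.
MADE_FROM = [(3, 1, 2), (3, 2, 4), (4, 3, 2), (4, 1, 1)]


@lru_cache(maxsize=None)  # memo: each part's totals are computed at most once
def cost_and_mass(pno: int) -> tuple[int, int]:
    """Total cost and mass of a part, following subassemblies to any depth."""
    if pno in BASE_PART:
        return BASE_PART[pno]
    cost, mass = COMPOSITE_PART[pno]
    for assembly, component, quantity in MADE_FROM:
        if assembly == pno:
            sub_cost, sub_mass = cost_and_mass(component)
            cost += quantity * sub_cost
            mass += quantity * sub_mass
    return cost, mass


def two_level_cost(pno: int) -> int:
    """Cost as a fixed-depth query computes it: only direct base-part components count."""
    if pno in BASE_PART:
        return BASE_PART[pno][0]
    cost = COMPOSITE_PART[pno][0]
    for assembly, component, quantity in MADE_FROM:
        if assembly == pno and component in BASE_PART:
            cost += quantity * BASE_PART[component][0]
    return cost


print(cost_and_mass(4), two_level_cost(4))  # prints (164, 47) 60
```

With this sample data the two functions agree for part 3, whose components are all base parts. For part 4, however, two_level_cost returns 60 instead of 164 because it silently drops the composite subpart 3. A fixed-depth query fails in the same way as soon as the data are deeper than the query assumes.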

Whether some combination of these two approaches can overcome these obstacles is the subject of the next section.

1.3 Preview of Variations Coding the Tasks



The following list previews the variations that emerged as these tasks were coded in different languages. The numbers in parentheses refer to the sections in which the corresponding languages are discussed. Note that the variations are correlated, either because they depend on the same underlying design decisions in a language or because several of the languages are derived from others. These notes are intended to help readers who are already familiar with the general area and who wish to locate or contrast languages for a specific characteristic.



TASK 1
• Available type constructors: relation (1.2, 6.1.2); records, arrays, and pointers (5.1), with renaming and enumeration (1.1), with limited type-parameterized relations/sets (2.2, 3.1); entity classes with limited type parameterization (3.1); fully type-parameterized entity classes (3.2, 3.3); abstract data types (4.1, 4.2, 3.3); polymorphic types (4.1, 4.2).
• Description of aggregates (e.g., for Parts): records (1.1, 2.1, 2.2, 5.2); structures (4.2, 5.1); tuples (1.2, 2.2, 4.1); extensionally defined functions (3.1, 5.1, 6.1.3); entities (3.1, 3.2, 3.3).
• Support for uniqueness constraint: none (1.1, 4.1, 4.2); associated implicitly with a set construct (1.2, 2.2); supported via indexes (2.1, 5.1); explicit (3.1, 3.2, 3.3).
• Referential integrity: no support (1.2, 2.2); implicitly supported (1.1, 2.1, 3.1, 3.2, 3.3, 4.1, 4.2, 5.1).
• Representation of part specialization hierarchy: explicit pointer (2.1, 5.1); common key value, not explicit (1.2, 2.2); variant record (1.1, 4.1, 4.2); subtype/subset of extent (3.1, 3.2, 3.3); inferred subtype (5.2).
• Representation of many-to-many relationship: explicit introduction of intermediate object using explicit pointers (1.1, 3.1, 3.2, 3.3, 5.1), using foreign keys (1.2, 2.2), using implicit pointers (2.1).
• Support for set/sequence of components: none (1.1); none but can be generically coded (4.1, 4.2, 5.1); sets provided (3.1); sequences provided (2.1, 3.3); not explicit (1.2).
• Pointer support: all delegated to programmer, misuse prevented (5.1); delegated to programmer but referend type statically constrained (1.1, 4.1, 4.2); inverse automated (2.1).
• Representation of set of entities (e.g., Parts): no support (1.1); self-coded polymorphic set (4.1, 4.2, 5.1); explicit index (4.2, 5.1); extent associated with type (1.2, 2.1, 3.1, 3.2, 3.3).
• Availability of indexing: not available (2.1, 4.1); presumed or implicit (1.2, 2.2, 2.1, 3.1, 3.2, 3.3); definable (4.2); provided as an explicit object (5.1).
• Description of base types/domains (e.g., grams): no provision (1.2, 2.1, 4.1, 5.1); renaming (1.1, 2.2, 3.1, 3.2); renaming and restricted operations (3.3); parametric (4.2).
• Naming power: types only (1.2, 2.1); values and types (1.1, 4.1, 4.2, 5.1); values, types, and extents (2.2, 3.3).
• Control of update of attribute values: none (1.1, 1.2, 2.1, 2.2, 3.1); modification preventable (3.2, 3.3, 4.1, 5.1).
• Treatment of persistence: no persistence (1.1, 4.1); only persistent (1.2, 2.1); persistence for subset of types (2.2, 3.1); binding by generated procedures (2.1); uniform persistence via preserved workspace with static binding (3.3, 4.2); uniform persistence with dynamic binding (5.1, 5.2).
• Procedures in the database (3.2, 3.3, 5.1).

TASK 2
• Identification of database: implicit (1.2); as a parameter (2.2, 5.1); no option due to static binding (3.3, 4.2).
• Iteration: implicit (1.2, 3.3, 4.1); explicit without control of order (2.2, 3.2, 4.2); explicit with control of order (1.1, 4.1, 4.2, 5.1); explicit with order determinable by declared sort (3.1).
• Recognition of subclass (BasePart): test value in record (1.1, 4.1, 4.2); test record/class name (5.1); scan separate extent (1.2, 2.1, 2.2, 3.1, 3.2, 3.3).
• Method of obtaining data from supertype (Part): explicitly following pointer (2.1, 5.1); in same record (1.1, 4.1, 4.2); explicit join (1.2, 2.2); by inference (3.1, 3.2, 3.3); by nested iteration (2.2).
• Printing: explicitly stated (1.1, 2.1, 2.2, 3.1, 3.2, 3.3, 4.1, 4.2, 5.1); implicit (1.2, 3.3).

TASK 3
• Recursion support: fixed depth only possible (1.2); declaration of transitive closure operator (4.1, 4.2); function calls (the rest).
• Transient set support (for memo): none (1.1, 2.1); generically coded (4.1, 4.2); locally declarable and initialized (2.2).
• Support for locating a part: serial search (1.1); set selection (2.2); set selection and desetting (3.1).

TASK 4
• Grouping operations into atomic transaction: no provision (1.1, 4.1, 4.2); synonymous with a program (2.2); synonymous with operations on top level data (3.3); transaction operations provided (e.g., start transaction) (1.2, 5.1); syntactic identification of transactions (3.1, 3.2).
• Allocation of unique scalar values (e.g., Pno): not available (1.2); needs encapsulating to store (2.2, 3.1); stored in a bound variable (3.3, 5.1).
• Separation of dialogue and update: depends on user transcription (1.2); database values cannot be constructed (2.1); partial intermediate values can be constructed, but insert into extents not controlled (3.1, 3.2, 3.3); intermediate values explicitly inserted (2.2, 5.1).
• Support for verifying constraints: left to programmer (1.1, 2.2); some automated (2.1, 3.1, 3.2, 3.3); an ADT can be constructed, preventing others from violating constraints checked by that ADT (4.1, 4.2, 5.1).⁷

⁷ Note there are difficulties for long-term data with this approach. If the operations provided by the ADT are inadequate, redefinition of the ADT loses the existing instances. New methods of encapsulation are therefore needed.
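The last Task 4 entry describes an ADT that prevents others from violating the constraints it checks. That idea can be sketched as follows. This Python class is an illustration under assumptions, not code from any of the surveyed languages. PartsStore and its method names are hypothetical, and Python enforces the encapsulation only by the underscore naming convention. The class allocates part numbers itself, so every part is either base or composite by construction. It also rejects any new component use that would make the part-subpart graph cyclic.

```python
class PartsStore:
    """Hypothetical ADT: every update goes through a method that checks Task 4 constraints."""

    def __init__(self) -> None:
        self._base: dict[int, tuple[int, int]] = {}       # Pno -> (Cost, Mass)
        self._composite: dict[int, tuple[int, int]] = {}  # Pno -> (AssemblyCost, MassIncrement)
        self._made_from: dict[int, dict[int, int]] = {}   # Assembly -> {Component: Quantity}
        self._next_pno = 1

    def _allocate_pno(self) -> int:
        # The store, not the user, invents unique part numbers.
        pno = self._next_pno
        self._next_pno += 1
        return pno

    def add_base_part(self, cost: int, mass: int) -> int:
        pno = self._allocate_pno()
        self._base[pno] = (cost, mass)
        return pno

    def add_composite_part(self, assembly_cost: int, mass_increment: int) -> int:
        pno = self._allocate_pno()
        self._composite[pno] = (assembly_cost, mass_increment)
        self._made_from[pno] = {}
        return pno

    def _contains(self, part: int, target: int) -> bool:
        """True if target is part itself or a (transitive) subpart of it."""
        stack, seen = [part], set()
        while stack:
            p = stack.pop()
            if p == target:
                return True
            if p not in seen:
                seen.add(p)
                stack.extend(self._made_from.get(p, {}))  # iterates component numbers
        return False

    def add_use(self, assembly: int, component: int, quantity: int) -> None:
        if assembly not in self._composite:
            raise ValueError("only a composite part can have components")
        if component not in self._base and component not in self._composite:
            raise ValueError("unknown component part")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self._contains(component, assembly):
            raise ValueError("use would make the part-subpart graph cyclic")
        self._made_from[assembly][component] = quantity
```

In the sketch, making part 2 a component of part 3 succeeds. After that, making part 3 a component of part 2 is rejected as a cycle, and any attempt to give base part 1 a component is rejected as well.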


We now turn to solutions to our problem set that actually work. We shall discuss several approaches. The first is the traditional solution of communicating with the database through a set of subroutines. CODASYL database management systems are almost always used in this fashion, and much database programming is done by using CODASYL subroutines from languages such as COBOL, PL/I, and FORTRAN. In fact, preprocessors or modified compilers have been developed for these languages, COBOL in particular, that provide a more consistent surface syntax for these subroutine calls. However, we do not want to call COBOL-CODASYL a database programming language because the CODASYL schema declaration, written in the Data Definition Language, is not part of the “type” declaration of COBOL. The honor of being the first successful integrated database programming language must go to Pascal/R [Schmidt 1977]. Pascal/R is Pascal with an added relational data type. It has been used fairly extensively and is the second language presented in this section. We shall then examine another commonly used method of database programming, that of embedded relational languages.

2.1 The CODASYL Approach: The Database as External Subroutines

Providing access to the database through a predefined set of subroutines is the usual method of database programming for all prerelational database management systems and is again common in microcomputer-based database management. We shall use CODASYL as an illustration since it allows one of the most sophisticated interfaces. Since the host languages have inadequate type systems for describing the database, this must be done in a separate language, the Data Definition Language (DDL). A description written in the CODASYL DDL, called a schema, describes the logical structure (with physical undertones) of the permitted values for the database. From this, subschemata, or views, may be generated through further use of a DDL. Most database textbooks (e.g., Date [1981b]) describe how this is done. In general, the quality of languages used with CODASYL databases has been somewhat worse than that of languages used with relational databases, their development having been retarded by the complexity of the CODASYL specifications. A survey is to be found in Samet [1981].

The CODASYL interface is most commonly used from COBOL, PL/I, or FORTRAN. None of these languages has appropriately typed functions or adequate control structures to permit a good assimilation of the CODASYL database model (even if it were simplified) into the language. We do not illustrate these existing interfaces, since this would consume space without shedding much light on types or persistence. The interested reader is referred to Atkinson et al. [19&Q] for an example of such an interface. However, we do show how an efficient and convenient interface can be constructed using the types of Pascal. To prepare for this, Figure 11 shows the Bachman diagram for our example database, and Figure 12 shows the corresponding schema.⁸ Note that some of the referential constraints in Figure 6 are implicit in CODASYL, and others are made explicit with MANDATORY set membership. Consider how to represent the specialization of Part (we are not aware of a CODASYL mechanism to express the exclusion hierarchy):

(1) Place all the fields in one record to represent all parts, encoding not applicable with some appropriate null value. Note that this is similar to an alternative we described for relations.
(2) Use a set to relate the most general type to other records that contain the additional information appropriate to its specialization. Note that this allows specialization to be overlapping or exclusive.

The second option is taken in the example given here by using the set Extra. Since reference via sets is supported, it has not been necessary to introduce Pno, an artificial key, as in the relational example. Readers familiar with Bachman diagrams will recognize the structure formed by the record types CompositePart and Use and the three sets UsedIn, Extra, and Uses. It usually indicates the presence of some kind of graph structure (in this case a directed acyclic graph (DAG) on the parts) and is quite common in the database designs we have observed. Its presence generally implies that there are meaningful recursive programs associated with the database, such as the recursive solution to Task 3.

Figure 11. The Bachman diagram for Task 1.

    SCHEMA name is PartsDB.
    AREA PartsAssembly.
    AREA PartSupply.

    RECORD type Part;                               {one for every part}
        WITHIN area PartsAssembly;                  {name an extent}
        LOCATION mode is CALC on Name USING PartName;   {specify an index}
        DUPLICATES are NOT allowed;                 {specify uniqueness constraint}
        02 Name; PICTURE A(16).

    RECORD type BasePart;                           {one for each imported part}
        WITHIN area PartsAssembly;
        LOCATION mode is VIA Extra;
        02 Cost; PICTURE 9(4).
        02 Mass; PICTURE 9(8).

    RECORD type CompositePart;                      {one for each assembled part}
        WITHIN area PartsAssembly;
        LOCATION mode is VIA Extra;
        02 AssemblyCost; . . .
        02 MassIncrement; . . .

    RECORD type Use;                                {one for each use of a part}
        WITHIN area PartsAssembly;
        LOCATION mode is VIA Uses;
        02 Quantity; PICTURE 9(4).

    RECORD type Supply;
        WITHIN area PartSupply;
        LOCATION mode is VIA SuppliedBy;

    RECORD type Supplier;
        LOCATION mode is CALC on . . .
    . . .

Figure 12. CODASYL DDL describing the parts data.

The considerable investment in existing databases and in existing programming languages means that developing better techniques for their combined use remains important. The use of Pascal with CODASYL is illustrated here, using an interface described in Buneman et al. [1982b], which has a number of advantages. It exploits the strong static typing of Pascal programs so that the Pascal compiler itself prevents many of the errors common in programming with CODASYL databases. The technique is to generate automatically a set of named types and procedures appropriate to manipulating data described by a particular CODASYL schema or subschema, in contrast with other interfaces that use a predefined set of procedures. The form of the generated types can be seen in Figure 13, which is part of the data description that would be generated from our sample schema (see Figures 11 and 12). Note that the names of the types are constructed from the names in the schema in a systematic way. Record type names in Pascal are the same as the corresponding schema record type names. Field names within the record are the same as the names in the corresponding schema record and, as near as possible, have types equivalent to those in the original schema. A variant record has been automatically introduced to accommodate the set Extra, which has more than one type of member. Thus our representation of specialization automatically maps to Pascal's corresponding construct. An extra set of types, to be used as tokens equivalent to database currencies in CODASYL, is introduced with the postfix Ref.

    type
        PartRef          = record found: boolean; . . . {control information} end;
        BasePartRef      = record found: boolean; . . . {control information} end;
        CompositePartRef = record found: boolean; . . . {control information} end;
        UseRef           = record found: boolean; . . . {control information} end;
        ExtraClass = record
                         case RecordType: (BasePartType, CompositePartType) of
                             BasePartType:      (BasePartVal: BasePartRef);
                             CompositePartType: (CompositePartVal: CompositePartRef)
                     end;
        BasePart      = record Cost: 0..9999; Mass: 0..99999999 end;
        CompositePart = record AssemblyCost: 0..99999; MassIncrement: 0..9999999 end;
        Part          = record Name: packed array [1..16] of Char end;
        Use           = record Quantity: 0..9999 end;

Figure 13. Pascal types automatically generated from a CODASYL schema.

A sample of the corresponding procedures appears as Figure 14. The procedure name (e.g., FindFirstInAreaPart) encodes both the operation and the operand (record type and area or set type). The second part is the name of the set or record type copied from the schema.⁹ The types of the parameters ensure that the appropriate type of currency is supplied and the appropriate type of currency or record is yielded. This set of generated types and procedures has linked the data definition in the program in two ways: It has defined the mapping of types and data between database and programming language, and it has made the names introduced in the schema available in the programming language's name space in such a way that they can only be used for operations that are appropriate to their values. This relieves the programmer of the need to set up a mapping and therefore reduces the number of mistakes that can be made. However, the programmer still has to understand the CODASYL schema, which is significantly different from the notation and structures familiar from Pascal.

⁸ For those who need to know more about CODASYL DBMS, tutorial material and discussion can be found in Olle [1978].
⁹ In the actual implementation there is some abbreviation.
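The idea of per-record-type currency tokens can be illustrated with a hand-written Python analogue of a small piece of such a generated interface. Everything in it is hypothetical. BASE_PART_AREA is an in-memory list standing in for the database area. The snake_case functions mimic FindFirstInAreaBasePart, FindNextInAreaBasePart, and GetBasePart, and the control information is reduced to a list index. Pascal catches a wrong currency type at compile time; Python can only check it at run time, so the sketch makes that check explicitly.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the BasePart records in an area.
BASE_PART_AREA = [{"Cost": 150, "Mass": 7}, {"Cost": 40, "Mass": 2}, {"Cost": 100, "Mass": 3}]


@dataclass
class BasePartRef:
    """Currency token for BasePart: a found flag plus control information."""
    found: bool = False
    position: int = -1  # control information, reduced here to an index


@dataclass
class PartRef:
    """Currency token for Part: deliberately a distinct type from BasePartRef."""
    found: bool = False
    position: int = -1


@dataclass
class BasePart:
    Cost: int
    Mass: int


def _require(ref: object, ref_type: type) -> None:
    # The generated Pascal interface rejects a wrong currency type at compile time;
    # Python can only check it when the call happens.
    if not isinstance(ref, ref_type):
        raise TypeError(f"expected {ref_type.__name__}, got {type(ref).__name__}")


def find_first_in_area_base_part() -> BasePartRef:
    return BasePartRef(found=len(BASE_PART_AREA) > 0, position=0)


def find_next_in_area_base_part(in_ref: BasePartRef) -> BasePartRef:
    _require(in_ref, BasePartRef)
    nxt = in_ref.position + 1
    return BasePartRef(found=nxt < len(BASE_PART_AREA), position=nxt)


def get_base_part(in_ref: BasePartRef) -> BasePart:
    _require(in_ref, BasePartRef)
    return BasePart(**BASE_PART_AREA[in_ref.position])


def expensive_base_parts(threshold: int = 100) -> list[tuple[int, int]]:
    """Task 2 loop shape: test found, get the record, then find the next one."""
    result = []
    ref = find_first_in_area_base_part()
    while ref.found:
        rec = get_base_part(ref)
        if rec.Cost >= threshold:
            result.append((rec.Cost, rec.Mass))
        ref = find_next_in_area_base_part(ref)
    return result
```

Passing a PartRef to get_base_part raises TypeError here. This is the run-time counterpart of the error that a strongly typed generated interface reports before the program ever runs.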

    {procedures for the Part class}
    procedure GetPart(InRef: PartRef; var OutRec: Part);
    procedure FindFirstInAreaPart(var OutRef: PartRef);
    procedure FindNextInAreaPart(InRef: PartRef; var OutRef: PartRef);
    procedure FindByKeyPart(KeyVal: PartName; var OutRef: PartRef);
    . . . {similarly for other record classes and other Codasyl verbs}

    {procedures for the Uses set}
    procedure FindFirstInSetUses(InRef: CompositePartRef; var OutRef: UseRef);
    procedure FindNextInSetUses(InRef: UseRef; var OutRef: UseRef);
    procedure FindOwnerInSetUses(InRef: UseRef; var OutRef: CompositePartRef);
    . . . {etcetera for other Codasyl verbs and sets}

    . . . {standard procedures for open, close for each area etc.} . . .
    {standard procedures startTransaction, commitTransaction}

Figure 14. Pascal declarations generated from the CODASYL schema.

    . . . {Type declarations from figure 13}
    . . . {Declarations of external procedures listed in figure 14}
    var aBasePartRef: BasePartRef;
        aBasePart: BasePart;
        aPartRef: PartRef;
        aPart: Part;
    begin
        FindFirstInAreaBasePart(aBasePartRef);
        while aBasePartRef.found do
        begin
            GetBasePart(aBasePartRef, aBasePart);
            if aBasePart.Cost >= 100 then
            begin
                FindOwnerInSetExtra(aBasePartRef, aPartRef);
                GetPart(aPartRef, aPart);
                writeln(aPart.Name, aBasePart.Cost, aBasePart.Mass)
            end; {if}
            FindNextInAreaBasePart(aBasePartRef, aBasePartRef)
        end {while loop}
    end.

Figure 15. Finding expensive parts with the Pascal-CODASYL interface.

The code for Tasks 2 and 3 in Figures 15 and 16 has the same structure as that of the (pure) Pascal code given earlier in Figures 2 and 3. However, pointer manipulation has been replaced by various calls to external subroutines. In Task 2 the variable aBasePartRef is a Pascal record that contains currency information needed by the CODASYL data manipulation subroutines and a field named found that indicates, after any Find operation, whether an appropriate database record was found. Thus, the test aBasePartRef.found in Figure 15 corresponds to the test p1 <> nil in Figure 2, and the call to GetPart corresponds to dereferencing in the same figure. Again, Task 3 exhibits the same complexity as the Pascal program. It is in this kind of task that other CODASYL interfaces become intractable because the programmer must explicitly control currency changes during recursion and must take care not to confuse currencies of different “types,” since they are usually all typed the same way, for example, as integers.

    program Task3;
    . . . {Type declarations from figure 13}
    . . . {Declarations of external procedures listed in figure 14}

    procedure costAndMass(p: PartRef; var resultCost, resultMass: integer);
    var eRec: ExtraClass;
        bRec: BasePart;
        cRec: CompositePart;
        uRef: UseRef;
        uRec: Use;
        pRef: PartRef;
        subTotalCost, subTotalMass: integer;
    begin
        FindFirstInSetExtra(p, eRec);
        case eRec.RecordType of
            BasePartType:
                begin
                    GetBasePart(eRec.BasePartVal, bRec);
                    resultCost := bRec.Cost;
                    resultMass := bRec.Mass
                end; {base part case}
            CompositePartType:
                begin
                    GetCompositePart(eRec.CompositePartVal, cRec);
                    resultCost := cRec.AssemblyCost;
                    resultMass := cRec.MassIncrement;
                    FindFirstInSetUses(eRec.CompositePartVal, uRef);
                    while uRef.found do
                    begin
                        GetUse(uRef, uRec);
                        FindOwnerInSetUsedIn(uRef, pRef);
                        costAndMass(pRef, subTotalCost, subTotalMass);
                        resultCost := resultCost + uRec.Quantity * subTotalCost;
                        resultMass := resultMass + uRec.Quantity * subTotalMass;
                        FindNextInSetUses(uRef, uRef)
                    end {while more subcomponents}
                end {dealing with CompositePartType}
        end {case of either specialisation}
    end; {procedure costAndMass}

    procedure main;
    var thePart: PartRef;
        itsCost, itsMass: integer;
    begin
        FindByKeyPart('Mast            ', thePart);
        if not thePart.found then writeln('cannot find it')
        else begin
            costAndMass(thePart, itsCost, itsMass);
            writeln(itsCost, itsMass)
        end
    end; {main}

    begin
        main
    end.

Figure 16. Program to compute cost and mass.

The code for creating and storing a new composite part (Figure 17) is comparable to that for traditional CODASYL interfaces. In those interfaces the procedures for storing records are essentially unparameterized and operate by inspecting and modifying a global state. For this example, in contrast to Task 3, the side effects created by one database operation would produce the correct global state for the next operation; thus some of the variables in Figure 17 would not be needed. In the solution presented here, the first parameter of any Store... procedure is the record to be stored; the last is a database reference that is set to refer to the record once it is stored. The intermediate parameters are references to owner records, one for each set in which the record being stored is a member. Each set is assigned a parameter position based on the order in which the sets were declared in the schema from which the interface was generated. Thus the program can only be understood with knowledge of the schema. This is slightly unsatisfactory but is less cumbersome than constructing a record type in which the sets could be named. A better solution is to use keyword parameters, as in the Ada interface for CODASYL that was also specified in Buneman et al. [1982c]. This approach is considerably less cumbersome because one can use keywords and exploit overloading so that, for example, all the store procedures are called Store. This technique is practical and widely available. We reiterate the two reasons for its importance: It permits existing databases to be used by any strongly typed language,¹⁰ and it exploits the type checking to eliminate a class of common mistakes. A recent application of the same technique has been reported [Bever and Lockemann 1985] with a different data model and language. There the names are grouped using an abstract data type (ADT), and some of the potential for type checking and polymorphic types (there called PADTs) is recognized, but the interface is not generated automatically.

    program Task4;
    . . . {Type declarations from figure 13}
    . . . {Declarations of external procedures listed in figure 14}

    procedure doTask4;
    type inputListType = ^inputCell;
         inputCell = record
                         Name: packed array [1..16] of Char;
                         Quantity: 0..9999;
                         Next: inputListType
                     end;
    var inputList: inputListType;
            {inputList is a list of names of the subparts of the new part}
            {together with their quantities}
        newName: packed array [1..16] of Char;
        newAssemblyCost: 0..99999;
        newMassIncrement: 0..9999999;
        aPart: Part;
        aPartRef, aSubPartRef: PartRef;
        aCompositePart: CompositePart;
        aCompositeRef: CompositePartRef;
        aUse: Use;
        aUseRef: UseRef;
    begin
        {code to interact with the user and get values for newName,}
        {newAssemblyCost, newMassIncrement and inputList}

        {Create and store the new part}
        with aPart do Name := newName;
        StorePart(aPart, aPartRef);
        with aCompositePart do
        begin
            AssemblyCost := newAssemblyCost;
            MassIncrement := newMassIncrement
        end;
        StoreCompositePart(aCompositePart, aPartRef, aCompositeRef);

        {Now establish links to sub-parts}
        while inputList <> nil do
        begin
            FindByKeyPart(inputList^.Name, aSubPartRef);
            aUse.Quantity := inputList^.Quantity;
            StoreUse(aUse, aCompositeRef, aSubPartRef, aUseRef);
            inputList := inputList^.Next
        end
    end;

    begin
        doTask4
    end.

Figure 17. Pascal program to update a CODASYL database.

¹⁰ More than one database with different schemata, and even different data models, could be bound to one program.
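The keyword-parameter alternative can be sketched in Python, whose keyword arguments play the role of Ada's named parameters. The sketch rests on stated assumptions. MEMBER_OF is a hypothetical schema fragment that records, for each record type, the sets it belongs to and the owner record type of each set. Database.store and Ref are illustrative names only. A single store function handles every record type, owners are named by set rather than by position, and an owner of the wrong record type is rejected. Python has no overloading in Ada's sense, so passing the record type name stands in for it.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical schema fragment: record type -> {set name: owner record type}.
MEMBER_OF = {
    "CompositePart": {"Extra": "Part"},
    "Use": {"Uses": "CompositePart", "UsedIn": "Part"},
}


@dataclass(frozen=True)
class Ref:
    """Database reference returned by store; record_type stands in for a Ref type."""
    record_type: str
    ident: int


@dataclass
class Database:
    records: dict = field(default_factory=dict)      # Ref -> stored field values
    memberships: list = field(default_factory=list)  # (set name, owner Ref, member Ref)
    _ids: count = field(default_factory=lambda: count(1))

    def store(self, record_type: str, values: dict, **owners: Ref) -> Ref:
        """One store for every record type; owners are named by set, not by position."""
        expected = MEMBER_OF.get(record_type, {})
        if set(owners) != set(expected):
            raise TypeError(f"{record_type} needs owners for sets {sorted(expected)}")
        for set_name, owner in owners.items():
            if owner.record_type != expected[set_name]:
                raise TypeError(f"owner in set {set_name} must be a {expected[set_name]}")
        ref = Ref(record_type, next(self._ids))
        self.records[ref] = dict(values)
        for set_name, owner in owners.items():
            self.memberships.append((set_name, owner, ref))
        return ref


db = Database()
mast = db.store("Part", {"Name": "Mast"})
comp = db.store("CompositePart", {"AssemblyCost": 20, "MassIncrement": 2}, Extra=mast)
bolt = db.store("Part", {"Name": "Bolt"})
use = db.store("Use", {"Quantity": 4}, UsedIn=bolt, Uses=comp)  # keyword order is irrelevant
```

Because owners are passed by set name, the program no longer depends on the order in which the schema declared the sets. A call such as store("Use", ..., Uses=bolt, UsedIn=bolt) is rejected because the owner of Uses must be a CompositePart.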

2.2 Pascal/R: A True Database Language

In this section Pascal/R is used to illustrate a group of languages often called integrated DBPLs [Pirotte and Lacroix 1980; Atkinson et al. 1984], which attempt to combine a general-purpose programming language with a data model¹¹ as consistently as possible. The level of integration depends on the degree to which the designers and implementers were prepared to change the language (most chose to start with Pascal). Pascal/R has been fairly widely used in universities for research and teaching and has also been used for some commercial applications. It is also the starting point for Modula/R [Koch et al. 1983] and for the present work of Schmidt's group, DBPL [Schmidt and Mall 1983].

¹¹ Rather than with an existing database, as in Sections 2.1 and 2.4.

The first problem facing the integrated language designer is how to relate the two type systems, that of the data model and that of the programming language. In Pascal/R the tuple is identified with the record, and relation of is introduced as a constructor similar to set of. It takes two parameters: the type of the record, which may be a tuple of the relation, and the subset of that record's fields that constitutes the primary key (there are restrictions on the types allowed as such a record's fields; we discuss the impact of these restrictions later). Other problems to be overcome are how to identify databases and how to introduce names from the database into a program. Pascal/R added the database constructor to Pascal. This constructor is parameterized in a manner similar to the record constructor, except that all fields have to be of type relation (another restriction considered later). A database variable may then be declared in the program and appear in the program's parameters (cf. file parameters in Pascal). This allows the same program to be run with different databases of the same type and introduces the relation names for that database into the program, a dynamic binding of the program to the database. The necessary type description of those relations ensures that their column names are available in the program.¹² At the moment of binding the system must check that the type of the database variable is an allowed view (perhaps a subset) of the recorded type (schema) of the database. This is an example of the eager incremental type checking necessary in DBPLs.

¹² The implementation restricts a program to operating on only one DB and only allows it to be selected as the program is started, two fairly inconvenient restrictions.

The designer also has to decide what manipulations to provide. In Pascal/R relations may be arguments of procedures, and there are relational operators that correspond to relational calculus. The transition between these bulk operations (selected because of their potential for efficient implementations [Bragger et al. 1983; Jarke and Koch 1982, 1983, 1984]) and the iterative control structures of Pascal is made using the new iteration construct for each. Unfortunately this construct is not consistently available for the other repetitive structures in Pascal/R, such as files and arrays. Another mechanism, based on the four procedures low, next, high, and eor, allows explicit control over the traversal of a relation.

Figure 18 shows the data definition corresponding to Task 1. Note the similarity with the relational description in Figure 5. This similarity extends to the inability to describe the referential and inheritance semantics shown in Figure 6. However, the notation here allows the component types to be named. This is not necessary, but it is advantageous since it introduces names for types that are needed for programming with these data. Note the difference from the Pascal example in Figure 1, where variants are used, capturing the semantics of the required and exclusive specializations of Part. The version of Pascal/R available did not adhere to data type completeness: It did not allow a relation to be constructed over variant records, presumably because it is then more difficult to define the operations on relations in all circumstances. The best that could be done was to add a few comments. The program in Figure 19 shows how to obtain the cost and mass of each expensive base part without printing the name. This is relatively simple. The program is extended in Figure 20 to find the names of those parts. Here the selection mechanism on the key of PartRec is used to find the appropriate parts. This could also be performed by a nested for each, which represents an explicit (natural) join, as is done in the SQL code in Figure 8.
This is an example of a foreign key being used to represent the inheritance between PartRec and BasePartRec.
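As a rough illustration of the two access paths just described, the following Python sketch contrasts a key lookup in Part (the Part[bPart.Pno] selection of Figure 20) with a nested loop that forms the natural join explicitly. It is only a sketch under stated assumptions: each keyed relation is modelled as a Python dict from key to tuple, and the names PartRec, BasePartRec, expensive_by_key, expensive_by_join and the sample data are hypothetical, not Pascal/R.

```python
from typing import Dict, List, NamedTuple, Tuple

class PartRec(NamedTuple):
    pno: int
    name: str

class BasePartRec(NamedTuple):
    pno: int
    cost: int  # Dollars
    mass: int  # Grams

# Illustrative data (not from the paper). Assumption: a keyed relation is
# modelled as a dict from its key (Pno) to the tuple, so part[k] ~ Part[k].
part: Dict[int, PartRec] = {
    1: PartRec(1, "Mast"),
    2: PartRec(2, "Bolt"),
    3: PartRec(3, "Sail"),
}
base_part: Dict[int, BasePartRec] = {
    2: BasePartRec(2, 5, 10),
    3: BasePartRec(3, 250, 4000),
}

def expensive_by_key(threshold: int = 100) -> List[Tuple[str, int, int]]:
    """Select base parts by cost, then look up each name in Part by key."""
    return [
        (part[b.pno].name, b.cost, b.mass)
        for b in base_part.values()
        if b.cost >= threshold
    ]

def expensive_by_join(threshold: int = 100) -> List[Tuple[str, int, int]]:
    """The same rows from a nested loop over both relations: a natural join on Pno."""
    return [
        (p.name, b.cost, b.mass)
        for b in base_part.values()
        if b.cost >= threshold
        for p in part.values()
        if p.pno == b.pno
    ]

print(expensive_by_key())  # [('Sail', 250, 4000)]
```

Both functions return the same rows; the keyed version avoids scanning Part once for every selected base part, which is one reason the key selection is attractive.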

ACM Computing Surveys, Vol. 19, No. 2, June 1987



M. P. Atkinson and O. P. Buneman

    type
      Pnum = 1..maxint;                       {domains}
      Snum = 1..maxint;
      Dollars = integer;                      {numbers}
      Grams = integer;
      String = packed array [1..16] of char;

      {records destined to become tuple types}
      PartRec = record                        {one per part, base or composite}
        Pno: Pnum;
        Name: String
      end;
      BasePartRec = record                    {one per bought-in part} {in PartRec}
        Pno: Pnum;
        Cost: Dollars;
        Mass: Grams
      end;
      CompositePartRec = record               {one per constructed part}
        Pno: Pnum;                            {in PartRec, not in BasePartRec}
        AssemblyCost: Dollars;
        MassIncrement: Grams
      end;
      MadeFromRec = record                    {one per Part/Subpart} {in CompositePart}
        Assembly, Component: Pnum;
        Quantity: integer
      end;
      SuppliedByRec = record                  {one for each supplier of each part}
        Pno: Pnum;                            {in BasePart}
        Sno: Snum                             {in Suppliers}
      end;
      {suppliers etc}

      {relation types}
      PartRel = relation <Pno> of PartRec;
      BasePartRel = relation <Pno> of BasePartRec;
      CompositePartRel = relation <Pno> of CompositePartRec;
      MadeFromRel = relation <Assembly, Component> of MadeFromRec;
      SuppliedByRel = relation <Pno, Sno> of SuppliedByRec;

      {the data base type}
      PartsDB = database
        Part: PartRel;                        {known parts}
        BasePart: BasePartRel;                {parts bought in}
        CompositePart: CompositePartRel;      {constructed parts}
        MadeFrom: MadeFromRel;                {parts to make a part}
        SuppliedBy: SuppliedByRel;            {who can supply each base part}
        ...
      end;

    Figure 18. Task 1: describing the data in Pascal/R.
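The relation types in Figure 18 (relation <Pno> of PartRec, and so on) pair a set of records with a key. A minimal Python sketch of that idea follows. KeyedRelation and its methods are hypothetical names; insert only loosely mirrors the Pascal/R :+ operator, and raising KeyError when two different records share a key is a choice made for this sketch, not a description of how Pascal/R itself reacts.

```python
from typing import Callable, Dict, Hashable, Iterator, List, NamedTuple

class PartRec(NamedTuple):
    pno: int
    name: str

class KeyedRelation:
    """Hypothetical model of 'relation <key> of T': a set of records in which
    no two distinct records share a key value."""

    def __init__(self, key: Callable[[tuple], Hashable]) -> None:
        self._key = key
        self._rows: Dict[Hashable, tuple] = {}

    def insert(self, rows: List[tuple]) -> None:
        """Loosely analogous to 'rel :+ [...]'; a key clash raises KeyError here."""
        for row in rows:
            k = self._key(row)
            if k in self._rows and self._rows[k] != row:
                raise KeyError(f"key {k!r} is already used by {self._rows[k]!r}")
            self._rows[k] = row  # re-inserting an identical record changes nothing

    def __getitem__(self, k: Hashable) -> tuple:
        """Analogue of 'rel[k]': the relation used as a sparse array on its key."""
        return self._rows[k]

    def __contains__(self, row: object) -> bool:
        return row in self._rows.values()

    def __iter__(self) -> Iterator[tuple]:
        return iter(self._rows.values())

    def __len__(self) -> int:
        return len(self._rows)

part_rel = KeyedRelation(key=lambda r: r.pno)  # like PartRel = relation <Pno> of PartRec
part_rel.insert([PartRec(1, "Mast"), PartRec(2, "Bolt")])
print(part_rel[2].name)                # Bolt
print(PartRec(1, "Mast") in part_rel)  # True
```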

    program Task2(aPartDB);
    var aPartDB: PartsDB;
      {the type decls from fig 18}
    begin
      with aPartDB do   {relation names into scope}
        for each bPart in BasePart: bPart.Cost >= 100 do
          writeln(bPart.Cost, bPart.Mass)
    end.

    Figure 19. Pascal/R: partial solution to Task 2.
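The findPart function of Figure 21 follows a common pattern: select a subrelation, check that it is a singleton, then "deset" it to reach the one record. The Python sketch below contrasts that style with the coercion that Adaplex applies automatically (Section 3.1), where a non-singleton raises an exception. The names find_part, the and MAX_INT are hypothetical; MAX_INT merely stands in for the MaxInt error value.

```python
from typing import Iterable, List, NamedTuple, TypeVar

T = TypeVar("T")

class PartRec(NamedTuple):
    pno: int
    name: str

MAX_INT = 2**31 - 1  # hypothetical error value standing in for Pascal's MaxInt

def find_part(parts: Iterable[PartRec], part_name: str) -> int:
    """Pascal/R style: select a subrelation, test its size, then 'deset' it."""
    the_parts: List[PartRec] = [p for p in parts if p.name == part_name]
    if len(the_parts) == 1:
        (a_part,) = the_parts  # desetting: unpack the only member
        return a_part.pno
    print("Ambiguous or unknown part name")
    return MAX_INT  # the caller has to test for this error value

def the(members: Iterable[T]) -> T:
    """Adaplex style: coerce a singleton to its member, raising otherwise."""
    items = list(members)
    if len(items) != 1:
        raise ValueError(f"expected exactly one member, found {len(items)}")
    return items[0]

parts = [PartRec(1, "Mast"), PartRec(2, "Bolt"), PartRec(3, "Bolt")]
print(find_part(parts, "Mast"))                       # 1
print(the(p for p in parts if p.name == "Mast").pno)  # 1
```

The exception-raising version keeps the error path out of the normal result type, which is the convenience the text attributes to a built-in construct.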

The automatic typing of intermediate expressions is convenient, and their use avoids ambiguity in the selection clause and the iterated statement. In the example of a function to locate a part, shown in Figure 21, both the relation and the record type are used. That example also shows that obtaining a singleton set and then desetting it to obtain the one item it holds (a common operation in the applications tried) is not particularly convenient in Pascal/R. Later, in Daplex and Adaplex (Section 3.1), we see a construct, the, that serves this purpose.

Types and Persistence in Database Programming Languages

    program Task2(aPartDB);
    var aPartDB: PartsDB;
      {the type decls from fig 18}
    begin
      with aPartDB do   {relation names into scope}
        for each bPart in BasePart: bPart.Cost >= 100 do
          writeln(Part[bPart.Pno].Name, bPart.Cost, bPart.Mass)
    end.

    Figure 20. Task 2: printing the name of expensive parts in Pascal/R.

    program Task3(aPartDB);
      {type decls from fig. 18}
    var aPartDB: PartsDB;

    function findPart(partName: String): Pnum;   {returns part number}
    var aPart: PartRec;
        theParts: PartRel;
    begin
      with aPartDB do   {relation names now available}
      begin
        theParts := [each prt in Part: prt.Name = partName];
        if size(theParts) = 1 then
        begin   {desetting}
          low(theParts, aPart);
          findPart := aPart.Pno
        end
        else
        begin   {error}
          writeln('Ambiguous or unknown part name');
          findPart := MaxInt   {representing the error}
        end
      end
    end;   {of findPart}

    Figure 21. Pascal/R function to locate a part.

The Task 3 computation of cost and mass is shown in Figure 22. The effect of this code is to form the natural join of the relations CompositePart and MadeFrom, although no join operator is directly available to the programmer. Instead, the programmer has kept track of one of the contributing tuples and uses its values directly. This also saved the programmer from having to define the type that would result from the join. It is difficult to provide a single operator that can be parameterized to perform the combination of addition, multiplication, and join required for one recursive step of this computation; we shall see a partial solution in Section 6.1.2. The operation CompositePart[p] used in this example treats the relation as a sparse array rather than as a set. This can also be seen as treating it as an extensionally defined function, a useful idea we shall see again in Daplex (Section 3.1).

To avoid repeating the computation for common substructures, a memo data structure¹³ can be introduced and used, as shown in Figure 23. Note that it is easy to declare and initialize the necessary set of values. The :+ operator is used to add a new result to memo each time a composite part is calculated. The Boolean expression some m in memo (m.ForPart = p) determines whether costAndMass has already been computed for a part with number p and is also used to determine whether a part was a composite part. Another way of doing this would be to use memo[p] in memo, where memo[p] returns the empty tuple ( ) if the index is not present, and this tuple is not considered to be present in any relation.¹⁴ It is convenient to be able to declare in the program a memo relation, of a kind of type initially specific to the database, but to have the instance exist only for the duration of each program run. All the same mechanisms are available for dealing with temporary relations, such as memo, as are available for persistent relations.

13 A mechanism originally suggested by Michie [1968].
14 Consider the type of the empty tuple: since in presumably has type τ × relation(τ) → Boolean, either the empty tuple must be polymorphic or the first argument of in must be a union type.

    program Task3(aPartDB);
    var aPartDB: PartsDB;
      {type decls from fig. 18}
      {findPart from fig. 21}

    procedure costAndMass(p: Pnum; var resultCost: Dollars; var resultMass: Grams);
    var subTotalCost: Dollars;
        subTotalMass: Grams;
        mf: MadeFromRec;
    begin
      with aPartDB do   {relation names into scope}
        if some bPart in BasePart (bPart.Pno = p) then
          with BasePart[p] do
          begin   {base part case}
            resultCost := Cost;
            resultMass := Mass
          end   {of base part case}
        else
          with CompositePart[p] do
          begin   {composite part case}
            resultCost := AssemblyCost;
            resultMass := MassIncrement;
            for each mf in MadeFrom: mf.Assembly = p do
              with mf do
              begin
                costAndMass(Component, subTotalCost, subTotalMass);
                resultCost := resultCost + Quantity * subTotalCost;
                resultMass := resultMass + Quantity * subTotalMass
              end   {of loop over components}
          end   {of composite part case}
      {end of scope of aPartDB relation names}
    end;   {of costAndMass}

    procedure doTask3;
    var itsCost: Dollars;
        itsMass: Grams;
    begin
      costAndMass(findPart('Mast'), itsCost, itsMass);
      writeln(itsCost, itsMass)
    end;   {of doTask3}

    begin
      doTask3
    end.

    Figure 22. Task 3 in Pascal/R: a procedure to obtain cost and mass.

    program Task3(aPartDB);
      {type decls from figure 18}
    type
      memoRec = record   {to hold previously computed cost etc} {in CompositePart}
        ForPart: Pnum;
        TotCost: Dollars;
        TotMass: Grams
      end;
      memoRel = relation <ForPart> of memoRec;
    var aPartDB: PartsDB;
        memo: memoRel;

      {findPart procedure from fig. 21}

    procedure costAndMass(p: Pnum; var resultCost: Dollars; var resultMass: Grams);
    var subTotalCost: Dollars;
        subTotalMass: Grams;
        mf: MadeFromRec;
    begin
      with aPartDB do
        if BasePart[p] in BasePart then   {is it a base part?}
          {as for fig 22}
        else if some m in memo (m.ForPart = p) then   {memoized?}
          with memo[p] do
          begin   {here if already calculated}
            resultCost := TotCost;
            resultMass := TotMass
          end
        else   {not previously computed}
          with CompositePart[p] do
          begin   {here a composite part}
            resultCost := AssemblyCost;
            resultMass := MassIncrement;
            {as for fig 22}
            memo :+ [(p, resultCost, resultMass)]   {memoize values for this part}
          end   {of composite part case}
      {of scope of aPartDB relation names}
    end;   {of costAndMass}

      {procedure doTask3 as for fig 22}

    begin
      memo := [];   {nothing calculated yet}
      doTask3
    end.

    Figure 23. Task 3 in Pascal/R using a memo to calculate the cost and mass.

In order to perform the update task, it is necessary to modify the database definition of Figure 18 to include a record of the part numbers used. This is shown in Figure 24, where the required integers are encapsulated in a dummy relation Id. It is unfortunate that type independence was not maintained to the extent that we could write

    PartsDB = database
      Part: ...;
      NextPno: Pnum;   {the next identifier to use}
      NextSno: Snum
    end;

directly recording the integers in the database.

    type
      Pnum ...               {domains as in fig 18}
      PartRec ...            {records as in fig 18}
      IdRec = record         {to package a Pnum}
        Nom: (NextPno, ...);
        Value: integer
      end;
      PartRel ...            {as in fig 18}
      IdRel = relation <Nom> of IdRec;
      PartsDB = database
        Part: PartRel;       {as in fig 18}
        ...
        Id: IdRel            {next identifier to use}
      end;

    Figure 24. Pascal/R: revising the description for update.

The update program implementing Task 4 is shown in Figure 25. This code is not typical of an update program, which would be largely concerned with providing a suitable user interface and checking that the user's input made sense. For example, there is no check here that cycles are not being introduced into the intended DAG. If concurrent use of the database were anticipated, it would be important not to lock too much during user input. However, this problem does not arise with Pascal/R, since there is no indication of the extent of a transaction: the implementation actually treats a whole program run as a transaction.

One deficiency of Pascal/R not exhibited in the examples is the constraint that only one database may be used in any given program. This seems serious, since people often need to combine data from different databases, produce copies, and so on. The main impact would be on the introduction of names. Since Pascal/R treats databases like records, there would be no problem of disambiguating names, and so we assume that a change to permit multiple database use would not be difficult.

Many of the restrictions present in Pascal/R have been removed in DBPL [Eckhardt et al. 1985], a successor to Pascal/R and Modula/R. In particular, the restriction to relation types has been removed: arbitrary values may persist, a program can manipulate databases, and the database definition (the type declarations) need not be repeated in the application program. In addition, DBPL supports transactions and has an interesting method of "predicate locking" [Mall et al. 1984] that allows multiple users simultaneously to share and update a common relation, provided the subsets of tuples that each user needs (or will generate) can be shown to be disjoint and not to violate key constraints.

    program Task4(aPartDB);
      {type definitions as in fig 24}
      {findPart as in fig 22}
    var aPartDB: PartsDB;

    procedure makeSub(assemblyPno, componentPno: Pnum; q: PosInt);
    begin
      with aPartDB do
      begin
        if not (Part[componentPno] in Part) then
          {stop the transaction with an "Unknown ..." message}
        else
          MadeFrom :+ [(assemblyPno, componentPno, q)]
      end
    end;   {of makeSub}

    procedure Main;
    var newPart: PartRec;
        newCompositePart: CompositePartRec;
        cmpntList: relation <cPno> of record cPno: Pnum; qty: PosInt end;
    begin
      {code to obtain from the user and verify the details of a part, the
       composite part information, and the list of components, cmpntList,
       used in its manufacture together with quantities}
      with aPartDB do
      begin
        with Id[NextPno] do   {see fig 24}
        begin
          newPart.Pno := Value;
          Part :+ [newPart];
          newCompositePart.Pno := Value;
          CompositePart :+ [newCompositePart];
          for each mf in cmpntList do makeSub(Value, mf.cPno, mf.qty);
          Value := Value + 1
        end   {of with Id[NextPno]}
      end   {of with aPartDB}
    end;   {of Main}

    begin
      Main
    end.

    Figure 25. Pascal/R: Task 4, recording how a part is made.

2.3 Other Languages That Attempted Integration with Relations

Many other languages have been proposed to combine a standard language with relations. A few comments are offered on some that appear in Figure 26. In the late 1970s, there was considerable progress reported with the languages Plain and Rigel. Plain was a particularly ambitious project, tackling a number of issues recognized as relevant to the implementation of interactive information systems. Its relations are similar to those in Pascal/R, with similar restrictions as to the attribute types. One extension is to introduce a marking, which refers to a subset of some base relation(s) derived by evaluating expressions in the relational algebra over relations and markings. Associative access on the key fields is supported in a fashion equivalent to that in Pascal/R. The provision of exceptions should, in principle, make handling relation access errors easier than in Pascal/R, but standard exceptions for relational operations are not defined in the report [Wasserman et al. 1981]. The for-each statement is consistently defined to be applicable to all composite objects aggregating sets or sequences of values of the same type, rather than just to relations as in Pascal/R. In addition, there is a limited way of specifying the order of iteration, similar to that we see later in Adaplex (Section 3.1). Relational update operators almost identical with those in Pascal/R are available. Persistence greater than the duration of a program is achieved by binding a declaration to an external object in the environment. It appears that this was only intended for a limited subset of types, such as procedures, modules, and relations. The authors understand that the underlying database handler intended for Plain has been widely used [Kersten and Wasserman 1981].

    Figure 26. Languages combining existing languages with the relational model (table). Systems: Pascal/R [Schmidt 1977], Aldat/Mrds [Merrett 1984], Astral [Amble et al. 1979], Rigel [Rowe and Shoens 1979], Theseus [Shopiro 1979], Plain [Wasserman et al. 1981], Modula/R [Koch et al. 1983], Adarel [Horowitz and Kemper 1983]. Base languages: Euclid [Lampson et al. 1977], Extended Pascal [Wasserman and Booster 1981], Modula-2 [Wirth 1983], Ada [Ichbiah et al. 1979]. Status entries include "Working prototype widely used" and "Product in wide use".

Using the relational calculus rather than algebra, Rigel was built on top of INGRES [Stonebraker et al. 1976]. Its type system is reported in Rowe and Shoens [1979]. It had a consistent form of iteration over relations, arrays, and files; user-defined aggregation functions; an interesting treatment of views; and a use of abstraction to encapsulate the database interface, an idea recently resuscitated [Bever and Lockemann 1985]. Interest has since moved from the database facilities to means of facilitating programming interaction against relations via forms [Rowe 1985].

The languages Theseus [Shopiro 1979] and Astral [Amble et al. 1979] were designed, but we understand that their implementation has never been completed. Various plans for adding database facilities to Ada have been abandoned in favor of Adaplex, reported later. The suggestion for Ada and its relation, Adarel [Horowitz and Kemper 1983], is so recent that an implementation is unlikely to be finished yet. It is not known whether the proposal for a persistent Ada [Hall 1983], with a similar notion of persistence to PS-algol (see Section 5.1), is being implemented, but an interest in a similar idea is reported at Computer Corporation of America, implementers of Adaplex.

2.4 Embedded Languages

This section, describing existing methods of database programming, would not be complete without some reference to embedded languages. Confronted with the failure of a query language to handle a problem such as Task 3, or with the difficulty of performing a safe update, the user of a relational database would normally turn to an embedding of the query language in some more powerful language such as COBOL, PL/I [Date 1983a], or C [Stonebraker et al. 1976]. In such systems, queries can be interspersed with host-language code in a way that makes them recognizable by a precompiler, which generates the appropriate external subroutine calls. Since we have been using Pascal as our reference language, the solution to Task 3 is shown in Figure 27 using a hypothetical embedding of SQL in Pascal. It follows closely the code that would be used for the embedding of SQL in PL/I in System-R [Astrahan et al. 1976]. In this example, we have only shown the code for the procedure costAndMass; the code to drive this procedure would also have to be written in the host language. A special class of variables, prefixed with a $, serves as communication between the host and query language. Since the host language has no data type appropriate for representing the result of a query when this is a relation with more than one tuple, a cursor (X) is used to traverse the relation in a fashion similar to reading a file.¹⁵ The code for this task combines two languages that we have already seen and, although logically straightforward, is awkward because the programmer must be bilingual. The main problem with embedded languages is that data may only be conveyed across the interface via variables of a very limited set of data types, such as integer, character string, and real. To our knowledge, no attempt has been made to establish a higher level interface by establishing, say, a tuple-to-record or relation-to-array correspondence.

    procedure costAndMass($Pno: integer; var resultCost, resultMass: integer);
    var $Cost, $Mass, $Component, $Quantity, subTotalCost, subTotalMass: integer;
    begin
      $SELECT Cost, Mass INTO $Cost, $Mass
        FROM BasePart WHERE Pno = $Pno;
      if ERRORSTATUS = 0 then   {we found a base part}
      begin
        resultCost := $Cost;
        resultMass := $Mass
      end
      else
      begin   {we assume we have a composite part}
        $SELECT AssemblyCost, MassIncrement INTO $Cost, $Mass
          FROM CompositePart WHERE Pno = $Pno;
        resultCost := $Cost;
        resultMass := $Mass;
        $LET X BE   {define a new relation of subparts}
          SELECT Quantity, Component INTO $Quantity, $Component
          FROM MadeFrom WHERE Assembly = $Pno;
        $OPEN X;   {set cursor to start of relation}
        while ERRORSTATUS = 0 do
        begin
          $FETCH X;
          costAndMass($Component, subTotalCost, subTotalMass);
          resultCost := resultCost + $Quantity * subTotalCost;
          resultMass := resultMass + $Quantity * subTotalMass;
          $NEXT X
        end
      end
    end;

    Figure 27. Part of Task 3 in a hypothetical embedded language.

15 Note that the cursor is not an object in the host language and therefore cannot be declared with an appropriate type.

3. LANGUAGES INCORPORATING ADVANCED DATA MODELS

To this point we have looked only at the relational and network data models. The failure of these models to capture adequate database semantics is well understood [Codd 1979], and other models, notably the entity-relationship model [Chen 1976], have been widely used as design tools for databases. Surprisingly, the entity-relationship model does not appear to have had a very direct effect on the design of database programming languages. There are, however, two data relationships that have been found essential in database design and that have been captured within the type system of several experimental database programming languages (DBPLs).

The notion of inheritance has received attention in various relevant fields and goes by various names. In artificial intelligence, ISA hierarchies have been expressed in semantic networks such as KL-One [Brachman 1978, 1983]; type (or class) inheritance is an important feature of object-oriented languages such as Simula [Dahl and Nygaard 1966] and Smalltalk [Goldstein and Bobrow 1980]; and generalization (or specialization) was suggested for databases by Smith and Smith [1977] and used as the basis for the Semantic Data Model [Hammer and McLeod 1980, 1981]. In our test case, an example of inheritance is to be seen in the relationship between BasePart and Part. A BasePart is a special kind of Part and therefore inherits all the attributes of Part. In the relational model, the only way to capture this relationship is by a "foreign key" constraint, which is not usually available in relational schema declarations. We used a variant record in Pascal to express this relationship, but this is, in general, too restrictive, since it constrains (a) a part to be a base part or a composite part and (b) no part to be both base and composite. In our example, these are exactly the constraints we require; but compare this with a database that has persons, employees, and students. Both an employee and a student are special kinds of person, but we do not necessarily want either (a) or (b) to hold, since a person could be neither a student nor an employee or could be both a student and an employee. In Pascal we can relax constraint (a) by adding a third (empty) variant to the record type, but there is nothing we can do to relax (b).
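Constraints (a) and (b) can be stated as checks over sets of entity identifiers, which makes the difference between the parts example and the persons example concrete. The sketch below uses a hypothetical Python helper, check_specializations, which is not a feature of any language discussed here. With both flags set it enforces what a Pascal variant record imposes; with both cleared it permits the overlapping, non-exhaustive specializations wanted for employees and students.

```python
from typing import Dict, List, Set

def check_specializations(
    general: Set[str],
    specs: Dict[str, Set[str]],
    covering: bool,  # constraint (a): every member lies in some specialization
    disjoint: bool,  # constraint (b): no member lies in two specializations
) -> List[str]:
    """Return the constraint violations found (an empty list if none)."""
    problems: List[str] = []
    for spec_name, members in specs.items():
        # Inheritance itself: each specialization is a subset of the general class.
        if not members <= general:
            problems.append(f"{spec_name} has members outside the general class")
    if covering:
        uncovered = general - set().union(*specs.values())
        if uncovered:
            problems.append(f"not covered: {sorted(uncovered)}")
    if disjoint:
        names = sorted(specs)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                shared = specs[a] & specs[b]
                if shared:
                    problems.append(f"{a} and {b} overlap on {sorted(shared)}")
    return problems

# Parts: every part is exactly one of base or composite, as in the variant record.
parts = {"mast", "bolt", "sail"}
part_specs = {"BasePart": {"bolt", "sail"}, "CompositePart": {"mast"}}
print(check_specializations(parts, part_specs, covering=True, disjoint=True))  # []

# Persons: bob is neither, ann is both, so (a) and (b) must both be relaxed.
people = {"ann", "bob", "cy"}
people_specs = {"Employee": {"ann", "cy"}, "Student": {"ann"}}
print(check_specializations(people, people_specs, covering=True, disjoint=True))
print(check_specializations(people, people_specs, covering=False, disjoint=False))  # []
```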
In CODASYL we can use a set that contains two member classes to suggest discriminated unions, as we did in Figure 11; but we need to constrain each set occurrence to have at most one member, and this can only be done by making sure that each updating program respects this rule. There is no checking mechanism built into CODASYL to enforce this constraint.

The other new data relationship is simply that of an extensionally defined function. The relationship of a member to an owner class in CODASYL is many to one, i.e., functional, as is the relationship of a nonkey to a key field in a relation. This leads to the possibility of expressing the database as a collection of functions so that, for example, Name can be thought of as a function from Part to String.¹⁶ Among the advantages of the functional data model is that extensional (database) functions can be given semantics similar to that of other functions. This makes the functional data model a natural adjunct for functional programming languages, in which functions are values and can be manipulated by other (higher order) functions. Requiring a language to be functional is therefore a natural step toward type completeness.

3.1 Daplex and Adaplex

The functional data model was first exploited in two languages: Daplex [Shipman 1981] and FQL [Buneman et al. 1982a]. The second of these has a polymorphic type system related to that of ML, which we discuss later. Daplex is interesting because it also captures the notion of inheritance. Daplex was originally intended as a sublanguage, and there was no intention that it possess any degree of type completeness or computational power. It is interesting to note, however, that Task 1, the data description, can be written in Daplex (Figure 28) to capture all the constraints suggested in Figure 6, that Tasks 2 and 4 can be written as neatly as in any relational query language, and that Task 3 can almost be written. Although Daplex contains a transitive closure operation, it is not quite powerful enough to include the arithmetic operations required here.¹⁷

16 In contrast, PS-algol (Section 5.1) and Amber (Section 5.2) treat records as extensionally defined functions from labels to values, so that aPart applied to the label Name yields a string result.
17 There is a problem with limited forms of transitive closure. Task 3 was chosen because it is a natural task that shows up the problems of predefined transitive closure operations.
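The functional view can be sketched in Python by storing extensionally defined functions as tables. In this sketch (sample data and the helper names inverse and closure are hypothetical), a derived function such as UsedIn is computed as the inverse of Uses, and a transitive closure collects every component reachable from an assembly. The closure returns only a set of parts: the quantities along each path are lost, which illustrates why a closure operation on its own cannot produce the quantity-weighted totals that Task 3 requires.

```python
from typing import Dict, Set

# Extensionally defined functions held as tables (illustrative data):
name: Dict[str, str] = {"p1": "Mast", "p2": "Spar", "p3": "Bolt"}  # Name: Part -> String
uses: Dict[str, Set[str]] = {"p1": {"p2", "p3"}, "p2": {"p3"}}   # Uses: CompositePart ->> Part

def inverse(f: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Derive the inverse of a multivalued function, e.g. UsedIn from Uses."""
    g: Dict[str, Set[str]] = {}
    for x, ys in f.items():
        for y in ys:
            g.setdefault(y, set()).add(x)
    return g

def closure(f: Dict[str, Set[str]], start: str) -> Set[str]:
    """Everything reachable from start by one or more applications of f."""
    seen: Set[str] = set()
    stack = list(f.get(start, set()))
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(f.get(x, set()))
    return seen

used_in = inverse(uses)
print(sorted(used_in["p3"]))                         # ['p1', 'p2']
print(sorted(name[p] for p in closure(uses, "p1")))  # ['Bolt', 'Spar']
```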

    Part() ->> ENTITY
    Name(Part) -> STRING
    BasePart() ->> Part
    Cost(BasePart) -> INTEGER
    Mass(BasePart) -> INTEGER
    CompositePart() ->> Part
    AssemblyCost(CompositePart) -> INTEGER
    MassIncrement(CompositePart) -> INTEGER
    Uses(CompositePart) ->> Part
    UsedIn(Part) ->> INVERSE OF Uses(CompositePart)

    Figure 28. Task 1: a Daplex data description for parts data.

Rather than review Daplex here (the reader is referred to Kulkarni and Atkinson [1983, 1985] for experiments with the language), we shall look at a substantial endeavor to conflate Ada and Daplex in the language Adaplex [Smith et al. 1983]. Adaplex has maintained most of Daplex's data model in that it exploits the functional model and inheritance. Some of the computational structures of Daplex have been omitted since they can be performed in Ada. It should be emphasized that Adaplex is not an embedding of Daplex in Ada in the sense of embedded SQL. It is an extension of Ada to incorporate new data types and control structures corresponding to the functional model as formulated in Daplex, but consistent with the design and philosophy of Ada. Although Daplex [Shipman 1981] was a major influence in the design of Adaplex, the notation for defining the database bears little resemblance to the Daplex notation for function declarations shown in Figure 28; however, the main changes are surface syntax to comply with Ada convention. For example, the double-headed arrow of Daplex becomes set of in Adaplex. Figure 29 shows how the definition of the database (Task 1) is recorded in Adaplex. Unfortunately, unlike Daplex, Adaplex does not allow us to define a derived function, as we do in Figure 28 in recording the fact that UsedIn(Part) is the inverse of Uses(CompositePart).

An interesting point to make here is that one of the tenets of "object-oriented" programming is that objects, in contrast to tuples in a relation, are not identified by their intrinsic properties. Thus, two ENTITY values in Daplex (and in many of the subsequent languages we examine) can be distinct even if all their defined functions, or attributes, have the same values. There is therefore no need to declare a Pno(Part) function to distinguish two Part values that may agree on all other attributes: they are distinguished by reference.

    database Parts is
      type Supplier;          --Needed for forward reference
      type CompositePart;
      type Use;

      --Declare Domains
      type PosInt is 1..System'Max_Int;
      type Dollars is integer;
      type Grams is integer;

      --Declare Entities
      type Part is entity
        Name   : String(1..16);
        UsedIn : set of CompositePart;
      end entity;
      unique Name within Part;

      --Declare Specialisation
      subtype BasePart is Part entity
        Cost       : Dollars;
        Mass       : Grams;
        SuppliedBy : set of Supplier;
      end entity;

      subtype CompositePart is Part entity
        AssemblyCost  : Dollars;
        MassIncrement : Grams;
        Uses          : set of Use;
      end entity;

      type Use is entity
        Component : Part;
        Assembly  : CompositePart;
        Quantity  : PosInt;
      end entity;
    end Parts;

    Figure 29. Task 1: Adaplex data definition for parts.

The operation of extracting the expensive bought-in parts is simply encoded in Adaplex, as is shown in Figure 30, obtaining Name(p) automatically from Part. The inheritance was defined by the subtype statement. Note that by descending allows the simple specification of the order in which the selected entities are processed.

Coding Task 3 in Adaplex is more problematic.¹⁸ Figure 31 shows a program based on the algorithm illustrated first in Pascal (Figure 3). The recursive scan of the parts tree is much as it was in Pascal, and the management of getting function values specific to a specialization (subtype) of Parts is similar. The determination of which specialization a part belongs to is different, because the specialization structure in Adaplex is more general, allowing overlap, and does not require the programmer to introduce the discriminator. One disadvantage of this is the loss of the possibility of encoding the selection using a case clause. The selection of a part given its name is much simpler in Adaplex than in Pascal, and it would probably be more efficient as well. Note that in the final selection of the Part whose Name is 'Mast', the singleton set resulting from the restriction of the original set is coerced to its member, an exception being raised if the set is not a singleton. Also, the function costAndMass is automatically extended to work for sets of Part entities.¹⁹ It is not certain whether a Part entity instance may be passed as a parameter as shown in the above example, since Adaplex is quite restrictive about the places in which entity types may be declared and used.

18 It is stated in Smith et al. [1983] that the required procedures (containing entity and set types/manipulations) cannot be declared. Under such constraints Task 3 cannot be coded. We understand this would have been corrected had the implementation of Adaplex continued.
19 This automatic extension of functions on entity types to functions on set types is a form of overloading that is ambiguous in languages that allow "higher order" sets or sequences. For example, what is the meaning of the function Size (cardinality) when applied to a set of sets?

For reasons given earlier, it is desirable to improve on the above algorithm by adding a data structure to hold a memo of subassembly properties already calculated. It is unfortunate that this only appears to be possible if the database designer has the foresight to include an appropriate entity type to hold these memos. Although such foresight is, in general, improbable, we assume it occurred for our problem. The fragment of the parts database description then appears as in Figure 32. The algorithm can now be modified to avoid reprocessing CompositeParts, as shown in Figure 33. The provision of sets in Adaplex makes it easy to introduce the appropriate data structure, and the assumed indexing mechanism makes it efficient. The extra effort required to construct this for Pascal, particularly if the number of parts allowed is very large,²⁰ probably deters many programmers. The use of an exclude statement to obtain an empty set is somewhat odd.

20 Requiring indexing and possibly explicit buffering on disk, for example, when expanding a hospital, aircraft, or ship.
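The memo technique that Figures 23 and 33 express in Pascal/R and Adaplex can be sketched in Python as follows. The sample parts DAG, in which the subassembly spar is used both directly by mast and inside rig, is invented for illustration, and the visits list exists only to show that each composite part is expanded once. The memo is an ordinary transient dictionary supplied by the caller, so, unlike an Adaplex Memo extent, it does not outlive the run.

```python
from typing import Dict, List, Tuple

# Hypothetical parts DAG (not from the paper). Base parts: name -> (Cost, Mass).
base: Dict[str, Tuple[int, int]] = {
    "bolt": (1, 10),
    "tube": (20, 500),
}
# Composite parts: name -> (AssemblyCost, MassIncrement, [(component, quantity)]).
composite: Dict[str, Tuple[int, int, List[Tuple[str, int]]]] = {
    "spar": (5, 0, [("tube", 1), ("bolt", 4)]),
    "rig": (3, 2, [("spar", 2)]),
    "mast": (10, 5, [("spar", 1), ("rig", 1), ("bolt", 8)]),
}

def cost_and_mass(
    p: str, memo: Dict[str, Tuple[int, int]], visits: List[str]
) -> Tuple[int, int]:
    if p in base:                      # base part case
        return base[p]
    if p in memo:                      # already calculated: reuse the memo entry
        return memo[p]
    visits.append(p)                   # p's components are expanded only here
    assembly_cost, mass_increment, uses = composite[p]
    total_cost, total_mass = assembly_cost, mass_increment
    for component, quantity in uses:
        sub_cost, sub_mass = cost_and_mass(component, memo, visits)
        total_cost += quantity * sub_cost
        total_mass += quantity * sub_mass
    memo[p] = (total_cost, total_mass)  # memoize values for this part
    return memo[p]

memo: Dict[str, Tuple[int, int]] = {}  # transient: a fresh memo for each run
visits: List[str] = []
print(cost_and_mass("mast", memo, visits))  # (108, 1707)
print(visits)                               # ['mast', 'spar', 'rig']
```

Without the memo, spar would be expanded twice (once directly and once inside rig); on a large product structure with much sharing, that repetition is what the memo removes.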


M. P. Atkinson


and 0. P. Buneman

with Parts; use Parts: Task2: atomic for each bPart in BasePart


PUTl M-.ne( bParl )); PCrqCost( PUI’(hfass( bPari )); NEW-LINE;

Cosl( bPari bPart

) 2 100 by descending

Co&( bPart) loop


end loon: end ato& Figure 30. --Using an algorithm

Task 2-an

Adaplex program to list expensive parts.

similar to that in Pascal - figure 3

with Parts; use Paris; Task3:

atomic procedure cosiAndhfass(in aPartPort; out resullCosi: Dollars; declare subioialcosiz Dollars; subioialhfass: Grams; if apart is in BasePart then mresullCosi := Cosf( apart ); resullM4ss := hIass( apart )

out resultMass:

resullCosl := AssemblyCosi( aPart ); resultMass := MassIncremeni( for cacb allse in Ilses( aPart ) loop 4Use ),, sublofalCost, subtotalMass ); cosiAndMass( Componeni( resullCost := resulfcost + Qunnlziy( aUse ) * subtoinlcost; resulthfass := resullhlass + Quaniiiy( aUse ) * sublofalhf4ss; end loop; end if; end cosiA ndhf ass; declare




Task 3-calculating Parts


‘Mast ‘, iiscost,


hfemo is entity

ForPart TotCosi

: Part; : Dollars;

end entity; unique Forpart eud Parts; Figure 32.


a part’s cost and mass using Adaplex.

--type declarations


apart );

iisM4ss: Grams;

coslAndhfass(aPari in Part where Name(aPart) PdU~~~.s~st); Pllqifshfass); NEW-LINE; Figure 31.


--to store Tolhloss




: Grams;


Task l-revised

Adaplex data definition

The provision of persistence is associated with implicit entity classes, referred to as the extent of the entity class. In the examples the use of an entity name after in (e.g., Part) is actually an abbreviation for the extent (e.g., Part’extent). These extents automatically persist in the module-like objects introduced by the database construct. It appears that the instances of Memo will persist, which is clearly undesir-

ACM Computing Surveys, Vol. 19, No. 2, June 1987

as in fig 29

for parts.

able. Perhaps exclude Memo from Memo at the end of the transaction would avoid this. The introduction of the construct for transactions-introduced by atomic-is A major step in clarifying the grouping of operations on persistent data., Task 4 can be accomplished as shown in Figure 34. The design of Adaplex has forced us to include the dialogue with the user within the transaction ivith the database-

Types and Persistence in Database Programming



--Using an improved algorithm similar to that in figure 31
--Using a set to avoid retraversing common subassemblies
--Preamble as in fig 31
declare
  doneParts: set of Memo;
  procedure costAndMass(in aPart: Part; out resultCost: Dollars; out resultMass: Grams);
    declare
      subtotalCost: Dollars; subtotalMass: Grams;
      donePart: Memo;
    if aPart is in BasePart then
      --As in fig 31
    else --Not a base part - Composite
      donePart := mem in doneParts where ForPart(mem) = aPart;
      if count(donePart) = 0 then --test if already memoised
        resultCost := AssemblyCost(aPart); --Here for a new subassembly
        --As in fig 31
        include {new Memo(ForPart => aPart, TotCost => resultCost, TotMass => resultMass)} into doneParts;
      else
        resultCost := TotCost(donePart); --Here for a subassembly already
        resultMass := TotMass(donePart); --visited
      end if; --End any Composite Part case
    end if;
  end costAndMass;
  itsCost: Dollars; itsMass: Grams;
exclude ... --set the memo empty
--as for fig 31
Figure 33. Revised Adaplex process to calculate cost and mass.
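Figure 33 avoids retraversing common subassemblies by recording each costed composite part in the set doneParts. The sketch below expresses the same memoization in TypeScript over a hypothetical in-memory representation of parts (the type and field names are illustrative, not Adaplex's, and part names are assumed to be unique). Unlike the Adaplex version, the memo here is a local table, so nothing is left behind to be excluded afterwards.

```typescript
// Hypothetical in-memory model of the parts database (illustrative names).
interface BasePart {
  kind: "base";
  name: string;
  cost: number; // Dollars
  mass: number; // Grams
}

interface CompositePart {
  kind: "composite";
  name: string;
  assemblyCost: number;
  massIncrement: number;
  uses: { component: Part; quantity: number }[];
}

type Part = BasePart | CompositePart;

interface CostAndMass {
  cost: number;
  mass: number;
}

// Total cost and mass of a part. `done` plays the role of doneParts in
// Figure 33: each composite part is costed once, however many assemblies
// share it. If `log` is given, each composite's name is appended when computed.
function costAndMass(root: Part, log?: string[]): CostAndMass {
  const done: { [name: string]: CostAndMass } = Object.create(null);

  function visit(p: Part): CostAndMass {
    if (p.kind === "base") {
      return { cost: p.cost, mass: p.mass };
    }
    const memo = done[p.name];
    if (memo !== undefined) {
      return memo; // a subassembly already visited
    }
    let cost = p.assemblyCost;
    let mass = p.massIncrement;
    for (const u of p.uses) {
      const sub = visit(u.component);
      cost += u.quantity * sub.cost;
      mass += u.quantity * sub.mass;
    }
    const result: CostAndMass = { cost: cost, mass: mass };
    done[p.name] = result; // cf. include {new Memo(...)} into doneParts
    if (log) log.push(p.name);
    return result;
  }

  return visit(root);
}

// A fitting is used both directly by the mast and inside each spar.
const bolt: Part = { kind: "base", name: "bolt", cost: 2, mass: 10 };
const pole: Part = { kind: "base", name: "pole", cost: 50, mass: 3000 };
const fitting: Part = {
  kind: "composite", name: "fitting", assemblyCost: 1, massIncrement: 5,
  uses: [{ component: bolt, quantity: 2 }],
};
const spar: Part = {
  kind: "composite", name: "spar", assemblyCost: 5, massIncrement: 0,
  uses: [{ component: pole, quantity: 1 }, { component: fitting, quantity: 2 }],
};
const mast: Part = {
  kind: "composite", name: "mast", assemblyCost: 20, massIncrement: 100,
  uses: [{ component: spar, quantity: 2 }, { component: fitting, quantity: 1 }],
};

const computed: string[] = [];
const total = costAndMass(mast, computed);
```

With these data, fitting is costed only once although it is reached both directly and through the spar; the call returns a cost of 155 and a mass of 6225.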

with Parts; use Parts;
task4:
atomic
  declare
    newName: String(1..16);
    newAssemblyCost: Dollars; newMassIncrement: Grams;
    newQuantity: PosInt; cmpntName: String(1..16);
    newPart: CompositePart; cmpntPart: Part;
  --Obtain values of newName, newAssemblyCost and newMassIncrement
  --Here declare an exception handler in case name already used
  newPart := new CompositePart(..., MassIncrement => newMassIncrement);
  --The set attributes, UsedIn and Uses, will be automatically initialised to the empty set
  --The new composite part will have been included in the extents of Part and CompositePart
  --Now loop obtaining Names of components, and building the parts explosion information
  cmpntName := '????'; --A name that is never used
  newQuantity := 0; --replaced during first exception
  while cmpntName /= 'NONE' loop
    --declare exception handler for name not known; deals with first time and prevents cycles
    cmpntPart := {aPart in Part where Name(aPart) = cmpntName};
    newUse := new Use(UsedIn => newPart, Component => cmpntPart, ...);
    include {newUse} into Uses(newPart);
    include {newUse} into UsedIn(cmpntPart);
    --Code to obtain next component name and quantity
    --Once for each type of component in the subassembly
  end loop;
end atomic;
Figure 34. Task 4: recording the definition of a composite part.

See Section 1.3 for the pros and cons of this choice. The declarations of the entity and set types can only occur within the atomic unit. With some labor we could have built corresponding Ada data structures during a dialogue, accumulating input, which would then be passed to the atomic unit. This illustrates a problem we first encountered with embedded languages: if the types of permanent and transient data are constrained to be different, communication between the code handling one and the code handling the other is potentially contorted by being restricted to the common, usually elementary, data types of the two systems. To check the user's input eagerly, we would need to run small atomic units for validation, and this validation would need to be redone when the final transaction was performed. In most cases, invalidation of these checks by concurrent transactions would be unlikely. But we must ask questions such as "Can the situation be recovered when such invalidations have occurred?" and "Can the tests be made without performing the updates?" The lack of choice in Adaplex appears to arise from a decision to combine the syntactic support for identifying transactions with the delimitation of the parts of the program in which extensions are in use.

Returning to the actual code in Figure 34, note the simplicity of creating the new composite part. As the comments explain, the new objects are automatically inserted into both extents. The set include operation is markedly more convenient than the explicit pointer manipulation seen elsewhere. The Pascal type system was intended to reduce a class of pointer manipulation errors; for example, it prevents a Part record from appearing in the sequence of Use records that represent the parts used in the same assembly or the places in which a part is used. However, those type constraints do not prevent a programmer from accidentally merging two such lists, or two lists concerning different parts. The further abstraction to set of in Adaplex eliminates this class of errors. Similarly, automatic inclusion in the class extents prevents such inclusion from being forgotten or mishandled. Which generally useful abstractions can beneficially be built into a language is still a matter for research and experiment.
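The two conveniences credited to Adaplex above, set-valued attributes instead of hand-built linked lists and automatic inclusion of each new entity in the extents of its class and superclasses, can be approximated in library code, though without Adaplex's syntactic support. In this TypeScript sketch the class names, the module-level extent arrays, and addUse are hypothetical; arrays stand in for sets.

```typescript
// Hypothetical sketch: extents maintained by constructors, and set-valued
// attributes kept consistent by a single linking operation.
const partExtent: PartEntity[] = [];
const compositePartExtent: CompositePartEntity[] = [];

class PartEntity {
  // Use links in which this part is the component (like UsedIn in Adaplex).
  readonly usedIn: UseLink[] = [];

  constructor(readonly name: string) {
    partExtent.push(this); // inclusion in the extent of Part cannot be forgotten
  }
}

class CompositePartEntity extends PartEntity {
  // Use links in which this part is the assembly (like Uses in Adaplex).
  readonly uses: UseLink[] = [];

  constructor(name: string, readonly assemblyCost: number) {
    super(name); // included in the extent of Part ...
    compositePartExtent.push(this); // ... and in the extent of CompositePart
  }
}

class UseLink {
  constructor(
    readonly assembly: CompositePartEntity,
    readonly component: PartEntity,
    readonly quantity: number
  ) {}
}

// The only way to record a use: both set-valued attributes are updated
// together, so one side cannot be linked while the other is forgotten.
function addUse(assembly: CompositePartEntity, component: PartEntity, quantity: number): UseLink {
  const link = new UseLink(assembly, component, quantity);
  assembly.uses.push(link);
  component.usedIn.push(link);
  return link;
}

const bolt = new PartEntity("bolt");
const mast = new CompositePartEntity("mast", 20);
const link = addUse(mast, bolt, 6);
```

After these three statements partExtent holds bolt and mast, compositePartExtent holds only mast, and the same link object appears in mast.uses and bolt.usedIn.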
The counterargument is that, as such facilities are added, the language becomes more specific, harder to implement, and harder to understand. An alternative we explore later via ML (4.1), Poly (4.2), and PS-algol (5.1) is to keep the language kernel small and provide the means of building such abstractions. However, in such a case there is no syntactic support, unlike the literals for sets and entities in Adaplex. The exception mechanism of Ada can be neatly exploited in this sort of task. For example, the violation of a uniqueness constraint may raise an exception when newPart is constructed. Again, the exception mechanism was used for the special case of the first component part name and for any that are unknown. Although we have sketched rather more of the total task than usual, Adaplex has allowed this task to be coded succinctly and clearly. It is slightly strange that the language does not permit multiple instances of databases of the same type, since corresponding Ada constructs can usually have multiple instances. Perhaps it inherits this from Pascal/R.

3.2 Taxis

Taxis [Mylopoulos et al. 1980] represents the first attempt to exploit inheritance extensively in what was originally intended as a conceptual modeling language. However, its purpose is essentially the same as that of a DBPL, and we shall treat it as such. The basic notion in Taxis is a class that, roughly speaking, can be considered as an element of a type hierarchy. There is a most general class, Any Class, which has as predefined subclasses Data Class, Formatted Class, Finitely Defined Class, Test Defined Class, Exception Class, and Transaction Class. All database programming in Taxis is performed by defining classes. For example, the procedures that we have used in other languages may be represented in Taxis by creating an appropriate instance of a Transaction Class, this being a predefined class whose instances are actions. A Variable Class is similar to a type with an associated extent, much like the entity type of Adaplex. The generalization/specialization hierarchies we have discussed so far correspond to subsets of values or instances. There is a distinct, but related, hierarchy that we might call the instance hierarchy. For example, we might try to refine our

MetaMetaClass Part-Class isA AnyDataClass with
  attributeProperties
    highestPartNumberUsed: NonNegativeInteger;
    maxPartNumber: NonNegativeInteger
endMetaMetaClass /* Part-Class */ ;
Figure 35. A metaclass definition in Taxis.

description of the parts database so that each individually manufactured part has a representative value in the database. In this case we might describe record "Mast27" (the 27th mast to be manufactured) as an instance of the part named "mast", which in turn is an instance of the class (or type) Part. Of the various languages investigated here, only Taxis allows more than two levels in the instance hierarchy; the other languages maintain two levels, such as class/instance or type/value. Using this idea we can attach properties to classes themselves; in Figure 35 a MetaClass called Part-Class is defined. Note that as a MetaClass it is itself an instance of MetaMetaClass. This attaches properties to any instance of Part-Class, such as Part, that we might subsequently create: Part-Class is defined with properties that should attach to any class of parts. It is not obvious how one would do something equivalent to this in a conventionally typed language. Indeed, the analogy of classes (or metaclasses) with types breaks down on close examination; moreover, the form of inheritance that we shall see in Amber (5.2) is not the same as the inheritance of methods associated with object-oriented programming or the inheritance in Taxis. A DataClass (see Figure 36) is one that has a modifiable extent (i.e., insertions and deletions are allowed). In defining Base-Part we are therefore creating a new DataClass that inherits all the properties of Part. Within these declarations, a key specifies an attribute that must be unique within the class, and characteristics and attributeProperties specify, respectively, fixed and modifiable attributes. The class Use performs the usual many-to-many linking; in this declaration the separation of characteristics and attributeProperties is somewhat arbitrary. Since there

IntegerClass NonNegativeInteger := {| 0 :: 1000000 |};
IntegerClass Dollars := {| 0 :: 10000 |};
IntegerClass Grams := {| 0 :: 1000000 |};

DataClass and Part-Class Part with
  characteristics
    pnum: {| 1 :: 1000 |};
    pname: Char-80
  attributeProperties
    number-in-stock: PositiveInteger
  keys
    partID: (pnum);
    partNameKey: (pname)
endDataClass /* Part */ ;

DataClass Base-Part isA Part with
  attributeProperties
    unit-price: Dollars;
    mass: Grams
endDataClass /* Base-Part */ ;

DataClass Composite-Part isA Part with
  attributeProperties
    assembly-cost: Dollars;
    mass-increment: Grams
endDataClass /* Composite-Part */ ;

DataClass Use with
  characteristics
    where-used: Composite-Part;
    uses: Part
  attributeProperties
    quantity: PositiveInteger
endDataClass /* Use */ ;

DataClass Cost-And-Mass-Record with
  characteristics
    cost: Dollars;
    mass: Grams
endDataClass /* Cost-And-Mass-Record */ ;

...
endDataClass /* Supplier-Class */ ;

ExceptionClass NoRoomException with
  characteristics
    which-class: Part-Class
endExceptionClass /* NoRoomException */ ;

Figure 36. Data description in Taxis.

is no independent set type in Taxis, it would be difficult to attach to, say, a Composite-Part instance the set of Use instances with which it is associated, as was done in Adaplex (Figure 29). Presumably explicit linked lists could be built by complicating the Use class, as was done for Pascal in Figure 1. In contrast to Adaplex, the union of the extents of Base-Part and Composite-Part



TransactionClass Expensive-Parts (...) returns ...
  with locals ...
  actions
    find: for each instance p of Base-Part
            if p.unit-price > threshold then
              begin
                p.pname.write; p.unit-price.write; p.mass.writeLine
              end
  return nothing
endTransactionClass /* Expensive-Parts */ ;
Figure 37. Task 2: an expensive parts transaction in Taxis.

is not the same as the extent of Part: it is possible to create an instance of the Part class alone. These declarations do not, however, ensure the mutual exclusion of these two classes. It is worth digressing here to note that in a different situation one might want to allow a part to be both base and composite, that is, one that can both be bought and manufactured. Moreover, it might be desirable to attach further properties to parts that are instances of both classes, such as whether it is preferable to manufacture them or to buy them. Taxis allows us to create new classes that are subclasses of more than one existing class. This is known as multiple inheritance, which is not allowed in Simula or Smalltalk and would be difficult to model using the network or relational structures. To ensure exclusion, assuming we want it, Taxis has an Exception Class, which we could use to raise an exception whenever an attempt was made to create a Base-Part that was already a Composite-Part, and vice versa. An example of an exception class, used for a different purpose, is NoRoomException in Figure 36. The use of exceptions and their relationship to database programming is beyond the scope of this paper; the reader is referred to Borgida [1985c]. As a simple transaction, Figure 37 shows an implementation of Task 2. The presence of a specialization hierarchy avoids the explicit join needed in the relational examples. Note the use of classes and properties for input and output. For some reason, all transaction classes must have a parameter, and this transaction would be invoked with 100.Expensive-Parts rather than the conventional Expensive-Parts(100). This is consistent with the functional model, which treats attributes as functions that can be applied to values of the appropriate (record) type.

The cost and mass calculation in Figure 38 demonstrates a problem with associating type and extent. Since we want this transaction to return pairs of values and since, as in Pascal, there is no pairing operator, we have constructed a special class to describe records containing cost and mass fields. In this case we only want to use this class as a type for a parameter; there is no need for the class Cost-And-Mass-Record to have an associated extent. However, in Taxis the class necessarily has a persistent extent, and we would presumably have to clean up after performing a Cost-And-Mass computation by deleting every instance of Cost-And-Mass-Record. If we were to add a key to this record, say PartID, we could readily capitalize on the persistence of these records to memoize the transaction. Note that the lack of structure in the data definition appears to mean that we have to iterate over the entire Use class at each call to the transaction. In fact, reverse pointers are kept with each attribute, and it is possible to reduce the size of the set over which the iteration is performed. In an object-oriented approach to the problem of representing the database, it should not be necessary to have an

TransactionClass Cost-And-Mass (p: Part) returns Cost-And-Mass-Record
  with locals
    total-cost: Dollars; total-mass: Grams;
    result-record: Cost-And-Mass-Record
  return nothing
endTransactionClass /* Cost-And-Mass */ ;

TransactionClass Cost-And-Mass (p: Base-Part) returns Cost-And-Mass-Record
  actions
    init-cost: total-cost ← p.unit-price;
    init-mass: total-mass ← p.mass;
    calculate: insertObject result-record into Cost-And-Mass-Record
                 with (cost: total-cost, mass: total-mass)
  return result-record /* not deleted automatically */
endTransactionClass /* Cost-And-Mass */ ;

TransactionClass Cost-And-Mass (p: Composite-Part) returns Cost-And-Mass-Record
  locals
    u: Use;
    cm: Cost-And-Mass-Record
  actions
    init-cost: total-cost ← p.assembly-cost;
    init-mass: total-mass ← p.mass-increment;
    calculate:
      begin
        for each instance u of Use do
          if u.where-used = p then
            begin
              cm ← u.uses.Cost-And-Mass;
              total-cost ← total-cost + cm.cost * u.quantity;
              total-mass ← total-mass + cm.mass * u.quantity
            end;
        insertObject result-record into Cost-And-Mass-Record
          with (cost: total-cost, mass: total-mass)
      end
  return result-record /* not deleted automatically */
endTransactionClass /* Cost-And-Mass */ ;

Figure 38. A recursive transaction in Taxis.

identifier, such as pnum, associated with the Part class. However, it is included here (Figure 39) to show that Taxis can cope with the problem of maintaining persistent data associated with the transaction GeneratePartNumber. The update example (Figure 40) shows a TransactionClass being used as a transaction. Suppose, for example, we tried to create a Use link to a subpart whose name was not known in the database. The attempt to find a part with the appropriate name would fail and abort the whole transaction, including the generation of a new part number. partNameKey, declared as a key in the definition of Part, serves as an inverse function from (or attribute of) the class associated with pname (Char-80) to Part. Note the use of preconditions in this transaction and the raising of an exception if they are not satisfied.

To summarize, Taxis has tried to exploit inheritance to support all aspects of database programming. Although we have some doubt as to whether this can be done without making a large number of special classes, with a consequent increase in complexity, we should also acknowledge that our tasks were not designed to illustrate all the uses of inheritance. The reader is referred to the source material for some more elegant demonstrations of Taxis [Mylopoulos and Wong 1980; Mylopoulos et al. 1980]. It should also be said that Taxis was originally intended as a conceptual language for designing database applications


TransactionClass InitPartNumbers (c: Part-Class, initValue: NonNegativeInteger, maxValue: NonNegativeInteger) returns None
  with /* Assign each part class an initial ... */
  actions
    init:
      begin
        c.highestPartNumberUsed ← initValue; /* e.g. zero */
        c.maxPartNumber ← maxValue /* e.g. 9999 */
      end
  return nothing
endTransactionClass /* InitPartNumbers */ ;

TransactionClass GeneratePartNumber (c: Part-Class) returns ...
  actions
    generate: c.highestPartNumberUsed ← c.highestPartNumberUsed + 1
  return c.highestPartNumberUsed
endTransactionClass /* GeneratePartNumber */ ;

Figure 39. Generating part numbers in Taxis.

TransactionClass ... (nil: None) returns ...
  locals
    aPart: Composite-Part;
    subPartName: Char-80;
    subPartQuantity: NonNegativeInteger
  preconditions
    full: (Composite-Part.highestPartNumberUsed < Composite-Part.maxPartNumber)
            exc NoRoomException(Composite-Part)
  actions
    insertObject aPart into Composite-Part
      with (pname: Char-80.read,
            assembly-cost: {| 1 :: 1000 |}.read,
            mass-increment: Grams.read,
            pnum: Composite-Part.GeneratePartNumber,
            number-in-stock: 0);
    handleSubParts:
      begin
        subPartName ← Char-80.read;
        subPartQuantity ← NonNegativeInteger.read;
        while subPartQuantity > 0 do
          creating:
            begin
              insertObject u into Use
                with (uses: subPartName.partNameKey,
                      where-used: aPart,
                      quantity: subPartQuantity);
              subPartName ← Char-80.read;
              subPartQuantity ← NonNegativeInteger.read
            end
      end
  return ...
endTransactionClass /* ... */ ;

Figure 40. Task 4: adding a new part in Taxis.

and therefore should not be judged on its ability to handle all the implementation details. We now understand that a compiler for the full language is under development (A. Borgida, Communication Regarding Status and Interpretation of Taxis, personal communication, 1985; see O'Brien [1983], Chung [1984], and Nixon [1984]), and a compiler from part of an earlier version of Taxis to Pascal/R was built at Toronto [Nixon 1983]. Taxis is reported to be destined to become part of a major effort in programming environments, where expert tools will assist the programmer in developing an efficient information system in a DBPL (J. W. Schmidt, Plans for an Esprit, personal communication, 1985).

use PartsDB :=
  (type Unit <-> num drop nonfix *, nonfix mod, nonfix /, nonfix div
     with q: num * u: Unit := mkUnit(q * repUnit u)
   in type Dollars <-> Unit and Grams <-> Unit
   in rec Parts class Part <->
        (Name: string
         and Pno: num assert within (1, 100000)
         and WhereUsed := derived all c in CompositeParts with this into Subpart of CmpntList of c)
        key (Pno)
   and BaseParts partition of Parts with CompositeParts
        class BasePart <-> (is Part and Cost: Dollars and Mass: Grams and SuppliedBy: var seq Supplier)
   and CompositeParts partition of Parts with BaseParts
        class CompositePart <-> (is Part and var AssemblyCost: Dollars and MassIncrement: Grams and CmpntList: var seq Use)
   and Use <-> (Subpart: Part and Quantity: num assert > 0 and integral)
   and Suppliers class Supplier <-> ( ... ))
Figure 41. Task 1 coded in Galileo.
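Figures 35, 39, and 40 keep the part-number counter on the class itself (an instance of the metaclass Part-Class), guard its use with a precondition that raises NoRoomException, and rely on the whole transaction being abandoned, generated number included, if a subpart name is unknown. A rough TypeScript analogue follows; PartClass, addCompositePart, and the name-list representation of existing parts are hypothetical, and the undo-on-failure logic is written by hand where Taxis provides it.

```typescript
// Hypothetical analogue of Part-Class (Figure 35): attributes that belong to
// a class of parts rather than to any individual part.
class PartClass {
  highestPartNumberUsed = 0;

  constructor(readonly className: string, readonly maxPartNumber: number) {}

  // Like GeneratePartNumber in Figure 39.
  generatePartNumber(): number {
    this.highestPartNumberUsed += 1;
    return this.highestPartNumberUsed;
  }
}

interface CompositePartRecord {
  pnum: number;
  pname: string;
  uses: { subPart: string; quantity: number }[];
}

const knownPartNames: string[] = ["bolt", "pole"]; // assumed existing parts
const compositeParts: CompositePartRecord[] = [];

// Rough analogue of the Figure 40 transaction: either everything happens or,
// if an exception is raised, the counter is restored and nothing is recorded.
function addCompositePart(
  cls: PartClass,
  pname: string,
  subParts: { name: string; quantity: number }[]
): CompositePartRecord {
  // Precondition "full": raise NoRoomException if no numbers are left.
  if (!(cls.highestPartNumberUsed < cls.maxPartNumber)) {
    throw new Error("NoRoomException: " + cls.className);
  }
  const savedNumber = cls.highestPartNumberUsed;
  try {
    const record: CompositePartRecord = { pnum: cls.generatePartNumber(), pname: pname, uses: [] };
    for (const s of subParts) {
      // Like the partNameKey lookup failing for an unknown subpart name.
      if (knownPartNames.indexOf(s.name) < 0) {
        throw new Error("unknown part: " + s.name);
      }
      record.uses.push({ subPart: s.name, quantity: s.quantity });
    }
    // Only now is the new part made visible.
    compositeParts.push(record);
    knownPartNames.push(pname);
    return record;
  } catch (e) {
    cls.highestPartNumberUsed = savedNumber; // undo the generated number
    throw e;
  }
}

const compositePartClass = new PartClass("Composite-Part", 9999);
const spar = addCompositePart(compositePartClass, "spar", [{ name: "pole", quantity: 1 }]);
```

A later attempt that names an unknown subpart raises an exception and leaves highestPartNumberUsed where it was, so the next successful call still receives part number 2.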


3.3 Galileo

Among the languages that have adopted a type system designed to make the meaning of the long-term data more apparent is Galileo [Albano et al. 1983, 1985a], a research language being developed at the University of Pisa. As such, it is presented alongside Taxis and Adaplex. It has a type system partially derived from an early version of ML [Gordon et al. 1979].21 The derivation includes the addition of a class and subtype mechanism. These are used in the data description given in Figure 41.

21 These examples are fairly free from type declarations since Galileo inherits the type inference mechanism of ML [Milner 1978].

In that figure, PartsDB is made to persist by being preceded by use, which indicates that the bindings being defined are to be stored permanently. In this case the binding is of PartsDB to a new environment, which is a set of (name, object : type) bindings. Only one environment exists at the outer level, and it is extended implicitly by bindings preceded by use, but Galileo provides environment manipulation facilities such as inner, overlapping, restricted, and derived environments (views?). Any environment may be established as the context for name interpretation for subsequent code by using enter, as in Figure 42. The program is completely statically type checked (as the source is compiled) and bound to its environment (data, names, and type definitions) at this time, in contrast with Pascal/R (2.2), where the data could be passed as a parameter, and with PS-algol (5.1). The static binding means that the only way in which the program can be transferred to run on some other data collection is in source form. Similarly, it is difficult to imagine any means of choosing among databases other than those that are in a common outer environment; and the relatively common operation of acquiring data from one database to use in another would require the data to be reduced to the elementary types that can be passed in an I/O stream.

enter PartsDB;
all b in BaseParts with Cost >= 100
Figure 42. Task 2 in Galileo without projection.

In coding Task 1, operators are removed from the built-in type num to produce a new abstract data type Unit with dimensional properties. Two distinct types, Dollars and Grams, are then defined. These represent the properties of those units more satisfactorily than any of the types available in the preceding languages.

The name Parts identifies a class, which is a sequence of instances of the abstract type, separately named in this case as Part. It is important to note that the designers have chosen to require separate names for the class and for the type of its members, in contrast to other languages. Members of a class can be explicitly created and deleted; creation implies insertion into the class. It is not possible to create two classes over the same abstract data type, since reiteration of the data type name or definition generates a distinguished abstract data type. However, the constructor seq of would allow subsets of a class to be represented and explicitly managed. Iteration over a class is shown in Figure 42.

BaseParts and CompositeParts are defined as mutually exclusive subclasses of Parts by the partition of construct. Galileo also has a subset of construct to describe overlapping subsets of a class, and a restriction of construct to arrange automatic insertion into the subclass on the basis of a Boolean expression over constant properties evaluated when the member is created. The types of a subclass must be specializations of the types of its superclass. They inherit properties (as shown by is Part), and insertion into a superclass is a consequence of insertion into a subclass (see Figure 46). The subclass is determined as the instance is constructed: a Part cannot become a BasePart, and a BasePart cannot become a CompositePart. This is clearly contrary to commonly encountered systems (cf. a person becomes a student), and it draws our attention to the independence of two decisions: whether to provide transition operators that revise the allocation of entities to subclasses, and whether to provide subclasses and/or extents.

The and construct arranges for two declarations to be simultaneously effective and consequently allows the additional fields to be introduced simultaneously with those inherited from the supertype after is; this is equivalent to specialization. var indicates that fields may be updated; otherwise fields retain their creation-time value. Compare characteristics and attributeProperties in Taxis, or c as a prefix of a type in PS-algol (see Section 5.1), which declares a variable that is assigned only once, when it is created. assert establishes a constraint,22 violations of which raise an exception. derived arranges that a field is automatically computed when needed rather than stored. The seq of construct is similar to set of in Adaplex, with the corresponding advantage of reducing explicit pointer manipulation.

22 We included Pno to illustrate this feature. Pno is not necessary, since Galileo supports reference to members of a class.

Figure 43 shows a solution for Task 2. Since the language is intended to execute statements immediately unless they are within a function, the result would be returned immediately, as in a traditional query language. However, in this case, the language allows a continuum of arbitrarily complex programs.

Figure 44 shows the Galileo solution to Task 3. The language supports recursion, and so the traversal of the tree is easily organized. A type cAndM is constructed to pass back the compound result. The procedure for finding a part is trivial, since get, defined

enter PartsDB;
for bPart in BaseParts with Cost >= 100 loop
  (print Name of bPart; print Cost of bPart; print Mass of bPart)
Figure 43. Task 2 in Galileo projecting the result to required properties.

enter PartsDB;
use rec type cAndM := (cost: Dollars and mass: Grams)
and costAndMass(aPart: Part): cAndM :=
  ( %here for a simple base part%
    if aPart alsoin BaseParts then newcAndM(Cost of aPart, Mass of aPart)
    else
      ( %here for a composite part%
        var resultCost := AssemblyCost of aPart
        and var resultMass := MassIncrement of aPart
        in ( for cmpnt in CmpntList of aPart loop
               ( var tempcAndM := costAndMass(Subpart of cmpnt)
                 resultCost := resultCost + Quantity of cmpnt * cost of tempcAndM
                 resultMass := resultMass + Quantity of cmpnt * mass of tempcAndM );
             newcAndM(resultCost, resultMass) ) ) )
and findPart(name: string): Part := get Parts with Name = name;
itscAndM := costAndMass(findPart("Mast"));
print cost of itscAndM; print mass of itscAndM
Figure 44. Task 3 in Galileo.

%a structure to hold the memo%
use memos class memo <-> (thePart: Part and itsCost: ... and itsMass: ...) key thePart
%cAndM and start of costAndMass definitions as before%
  %test and handle base case%
  ...
  else %composite case%
    %first test if memoised%
    ( var previousResult := get memos with ...
      in newcAndM(... of previousResult, ... of previousResult) )
    iffails
    ( %here not in memo yet%
      %calculate cost and mass as before%
      ... newmemo( ... ) ... )
Figure 45. Memoized version of Task 3 in Galileo.


over classes, either returns a unique element or raises an exception. The new fragment of code to memoize costAndMass is shown in Figure 45. The get operation to select a memo from memos will fail if this part has not been previously evaluated; the alternative code after iffails then catches the exception and organizes the calculation. The provision of get also suggests that Galileo has a general indexing structure, at least logically.
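The get/iffails pattern of Figure 45 (look a memo up, and compute and record it only if the lookup raises an exception) can be sketched in TypeScript as follows. getUnique and memoizedCostAndMass are hypothetical names, an array stands in for the class memos, and try/catch stands in for iffails; the cost computation itself is passed in as a function.

```typescript
// Hypothetical analogue of Galileo's get over a class: the unique member
// satisfying the condition, or an exception if there is none (or several).
function getUnique<T>(cls: T[], pred: (x: T) => boolean): T {
  const matches = cls.filter(pred);
  if (matches.length !== 1) {
    throw new Error("get failed: " + matches.length + " matching members");
  }
  return matches[0];
}

interface Memo {
  thePart: string; // plays the role of the key thePart
  itsCost: number;
  itsMass: number;
}

const memos: Memo[] = [];

// Figure 45's pattern: try get on memos; iffails, compute and record a memo.
// Only the lookup is inside try, so failures in compute() are not mistaken
// for "not yet memoized".
function memoizedCostAndMass(
  partName: string,
  compute: () => { cost: number; mass: number }
): Memo {
  let found: Memo | undefined;
  try {
    found = getUnique(memos, (m) => m.thePart === partName);
  } catch (e) {
    found = undefined; // iffails: here not in memo yet
  }
  if (found !== undefined) return found;
  const r = compute();
  const memo: Memo = { thePart: partName, itsCost: r.cost, itsMass: r.mass };
  memos.push(memo);
  return memo;
}
```

Calling memoizedCostAndMass twice for the same part invokes the computation only once; the second call is answered by get.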

The code for Task 4 is sketched in Figure 46. nextPno is bound in PartsDB and is therefore persistent. newCompositePart23 makes a CompositePart and the corresponding Part, placing them in their subclass and class, respectively, if they pass all the asserted constraints. The evaluation of expressions

23 This is a function that generates values of type CompositePart; it is created and given this name when CompositePart is declared as a type.


enter PartsDB;
%code to set up new values by dialogue%
%so that newName contains the name%
%newAC the assembly cost%
%etc.%
Task4(nme: string, nAC: Dollars, nMI: Grams, nCL: seq Use)
  ( nextPno := nextPno + 1   %issue new part number%
    newCompositePart(nme, nextPno, nAC, nMI, nCL) )
%now a transaction%
Task4(newName, newAC, newMI, newCL)
Figure 46. Task 4 in Galileo.

at the top level are transactions, so Task4(...) is a transaction. An exception is raised if it fails to commit, for example, when a constraint fails and is not handled. The WhereUsed information is derived, and so the programmer does not have to arrange its construction, a marked improvement over the other languages we have considered. An implementation of this language now exists [Albano et al. 1985c], but, at present, the provision of persistence depends on saving the workspace. This has limitations described elsewhere (6.2). Recent work on the language and its implementation is described in Albano et al. [1985b, 1985d].

4. POLYMORPHISM AND DATABASE PROGRAMMING

At first sight, languages like ML and Poly, which form the basis for this section, have little to do with database programming, since they support only limited forms of persistence and do not support a bulk data type (such as a relation), which is essential for database work. However, we believe that the idea of polymorphic type systems is sufficiently fundamental to the future development of database programming that examination of these languages is essential. It is worth noting that ML and Poly are interactive (in the sense that they have incremental compilers), statically typed programming languages.

4.1 ML

Based on the typed lambda calculus, ML [Gordon et al. 1979] has been developed to the point where it is now a practical programming language.24 For our purposes, its most important property is its polymorphic type system, which allows types to be freely parameterized by other types. However, there are several other properties that are relevant to our discussion. In particular, ML is incrementally compiled. This means that developing an ML program consists mostly of coding and interactively compiling small programs, and, from practical experience, most debugging takes place through interaction with the compiler (in particular the type checker). For the most part, type declarations are not needed in ML, since there is a type inference mechanism [Milner 1979] that determines a type for each expression on the basis of its environment.

24 It is undergoing rapid development and there is an attempt to arrive at an agreed standard, so the reader is warned that versions exist with substantial differences from the version that was used to test these examples [Milner 1983, 1984; MacQueen 1985].

ML also has an exception-handling mechanism and a system of modules that is currently under development. The latter is likely to have very important consequences for database programming [Cardelli and MacQueen 1985].

val rec append(nil, l) = l
  | append(x::r, l) = x::append(r, l);
Figure 47. A simple ML function definition.

As an introduction to ML, Figure 47 shows a function for concatenating two lists. Note the use of pattern matching (:: is infix cons) to bind parameters. In response to this input, the ML compiler will output a message to the effect

abstype 'a set = Set of 'a list
with val emptyset = Set nil
and rec choose(Set nil) = escape "choose"
      | choose(Set(x::l)) = x
and insert(x, Set l) =
      let val rec ins nil = x::nil
                | ins(y::l) = if x=y then y::l else y::ins l
      in Set(ins l) end
and reduce f (Set nil) z = z
  | reduce f (Set(x::l)) z = f(x, reduce f (Set l) z);
val remove(x, s) = reduce (fun(y, s1) if x=y then s1 else insert(y, s1)) s emptyset;
val membe
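Both the append function of Figure 47 and the abstract set type above are parameterized by an element type. As an illustration only, the sketch below renders them with TypeScript generics; the class MLSet is a hypothetical name, equality is ===, and has and size are additions for the example (the listing breaks off). As with abstype, the list representation is private, so clients can use only the operations.

```typescript
// append : 'a list * 'a list -> 'a list (Figure 47), as a generic function.
function append<A>(xs: A[], ys: A[]): A[] {
  if (xs.length === 0) return ys;
  return [xs[0]].concat(append(xs.slice(1), ys));
}

// Abstract set type in the spirit of the abstype above: the list
// representation is private, so clients see only the operations.
class MLSet<A> {
  private constructor(private readonly items: A[]) {}

  static empty<A>(): MLSet<A> {
    return new MLSet<A>([]);
  }

  // insert: add x at the end unless it is already present.
  insert(x: A): MLSet<A> {
    return this.items.indexOf(x) >= 0 ? this : new MLSet<A>(this.items.concat([x]));
  }

  // choose: some element; fails on the empty set, like escape "choose".
  choose(): A {
    if (this.items.length === 0) throw new Error("choose");
    return this.items[0];
  }

  // reduce f z: a right fold, f(x1, f(x2, ... f(xn, z))).
  reduce<B>(f: (x: A, acc: B) => B, z: B): B {
    let acc = z;
    for (let i = this.items.length - 1; i >= 0; i--) {
      acc = f(this.items[i], acc);
    }
    return acc;
  }

  // remove, defined with reduce as in the listing.
  remove(x: A): MLSet<A> {
    return this.reduce((y: A, s: MLSet<A>) => (y === x ? s : s.insert(y)), MLSet.empty<A>());
  }

  // Additions for the example.
  has(x: A): boolean {
    return this.items.indexOf(x) >= 0;
  }

  size(): number {
    return this.items.length;
  }
}

const s = MLSet.empty<number>().insert(1).insert(2).insert(1);
const total = s.reduce((x: number, acc: number) => x + acc, 0);
```

The same code serves sets of numbers, strings, or parts; unlike ML, TypeScript needs the element type spelled out in places such as MLSet.empty<number>().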


...
and BasePart is <... UseList, Cost: Dollars, Mass: Grams, Suppliers:> SuppList>
and AnyPart is [Base: BasePart, Composite: CompositePart]
and UseList is [nil: unit, cell: <... CompositePart, NextUses:> UseList, NextUsedIn:> ...>]
and PartList is [nil: unit, cell: <... next: PartList>]
and SuppList is ...
and DataBaseType is <... SuppList>
... = nil, Suppliers = nil>>)
Figure 68. Task 1: describing the data in Amber.

import "PartsDataBaseType"
  type CompositePart, BasePart, AnyPart, UseList, PartList, SuppList, DataBaseType
  value DataBaseTypeVal : Type
let newValue = import("PartsDataBaseFile")   -- newValue has type dynamic
if typeOf(newValue) = DataBaseTypeVal then
  let DataBase = coerce newValue to DataBaseType
  -- Perform updates
  do export("PartsDataBaseFile", dynamic DataBase)
else printString("PartsDataBaseFile corrupted")
Figure 69. An outline of an update program in Amber.
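Figure 69 relies on two properties: only dynamic values persist, and a dynamic value carries a description of its type that can be inspected with typeOf before coerce converts it back to a statically typed value. The following TypeScript sketch imitates that cycle under stated assumptions: TypeRep is a hypothetical, deliberately small language of type descriptions, JSON strings stand in for the file PartsDataBaseFile, and the type comparison is purely structural.

```typescript
// Hypothetical run-time type descriptions, playing the role of Amber's Type.
type TypeRep =
  | { kind: "int" }
  | { kind: "string" }
  | { kind: "list"; elem: TypeRep }
  | { kind: "record"; fields: { [label: string]: TypeRep } };

// A dynamic value carries the description of its own type.
interface Dynamic {
  type: TypeRep;
  value: unknown;
}

function typeOf(d: Dynamic): TypeRep {
  return d.type;
}

// Structural equality of type descriptions (record labels in any order).
function sameType(a: TypeRep, b: TypeRep): boolean {
  if (a.kind === "list" && b.kind === "list") {
    return sameType(a.elem, b.elem);
  }
  if (a.kind === "record" && b.kind === "record") {
    const la = Object.keys(a.fields).sort();
    const lb = Object.keys(b.fields).sort();
    if (la.length !== lb.length) return false;
    for (let i = 0; i < la.length; i++) {
      if (la[i] !== lb[i] || !sameType(a.fields[la[i]], b.fields[lb[i]])) return false;
    }
    return true;
  }
  return a.kind === b.kind; // both "int", both "string", or a mismatch
}

// Like "coerce newValue to DataBaseType": fails unless the descriptions match.
function coerce<T>(d: Dynamic, expected: TypeRep): T {
  if (!sameType(typeOf(d), expected)) {
    throw new Error("coercion failed: type mismatch");
  }
  return d.value as T;
}

// Persistence simulated with strings: only Dynamic values are written and read.
function exportDynamic(d: Dynamic): string {
  return JSON.stringify(d);
}

function importDynamic(s: string): Dynamic {
  return JSON.parse(s) as Dynamic;
}

interface PartRow {
  name: string;
  mass: number;
}

const partRowType: TypeRep = {
  kind: "record",
  fields: { name: { kind: "string" }, mass: { kind: "int" } },
};

const file = exportDynamic({ type: partRowType, value: { name: "bolt", mass: 10 } });
const newValue = importDynamic(file);
const row = coerce<PartRow>(newValue, partRowType);
```

As in Amber, the descriptions are only a guide: once they match, the cast in coerce is trusted, so a description that misstates its value is not detected.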

checks that it has the correct type, produces a (nondynamic) value for that type, and, later, writes an updated version of that value out to a file. To elaborate, using the type declarations in the PartsDataBaseType module, the program first reads in a value from the file PartsDataBaseFile, which is a dynamic value (since dynamic values are the only values that can persist). The next statement checks its type. There is a built-in type, Type, that is used to describe the type of a dynamic object. Objects of type Type are not themselves types; they are just values whose structure reflects the type structure of the language. The function typeOf takes a dynamic object and returns a value of type Type. Thus, in checking that the dynamic value has the correct type, we compare the value produced by typeOf(newValue) with DataBaseTypeVal, the result of applying typeOf to an exemplar dynamic value in the module in Figure 68. It should be emphasized that values of type Type are only used as a guide: the subsequent coercion will fail if the type of the dynamic value does not match the given type.

Figure 70 provides a simple example of inheritance in Amber. The code for costAndMass is not given here but would be similar to that used in our previous

Types and Persistence in Database Programming Languages value costAndMass = ret costandMass : AnyPart type

Part =


NameOf = fun p : Part + String is p.name



= fun p : AnyPart


examples (e.g., in Figure 4) that are based on a similar data structure. More interesting is the type declaration for Part. The fields of Part match (both in label and type) fields that are present in BasePart and CompositePart. From this it is inferred that both BasePart and CompositePart are subtypes of Part. This means, for example, that any value that is of type BasePart is also of type Part. Now consider the functions NameOf and MostExpensiveComponent. NameOf is of type Part + String and therefore can also take any object whose type is Part as argument. Thus, since any object of type BasePart is also of type Part, NameOf is also of type BasePart + String and CompositePart + String. On the other hand, MostExpensiveComponent returns a BasePart, and any value of this type is necessarily of type Part. Thus, MostExpensiveComponent is also of type AnyPart + Part. As a result, the expression NameOf (MostExpensiveComponnt


where a is of type AnyPart, is well typed. This is an example of another form of polymorphism, inheritance polymorphism. In general, there is a partial ordering (c) on record types, and if a function f is of typea+r,thenitisalsooftypea’+r’ whenever u c Q’ and T ’ E T. From this ordering we can infer a partial ordering on function types: u+7Ea’+r’

whenever u’

C a

and 7~7’.

In other words, this ordering on functions is antimorwtonic (or contravariant) on the argument type. Cardelli [Cardelli 1984a] provides a semantics for this ordering and shows how inheritance combines naturally with functional programming. It


Figure 70. Illustrating the use of inheritance in Amber.

: String, UsedIn :> UseList>



BasePart is

should be noted that a similar ordering can be placed on variant types. More recently [Cardelli and Wegner 19851, it has been suggested that type hierarchies of this kind can be combined with the parameterized polymorphism in ML and that a typechecking algorithm exists, but type inferencing is not yet understood. This contravariant behavior of function types may seem problematical at first and should be compared with the orderings on classes in Taxis. However, it has recently been shown [Buneman 19851 that if this form of inheritance is carried down to the level of values, a natural typing of relations and other data structures such as the tables of PS-algol may be achieved. 5.3 Persistence and Object-Oriented Programming

In the preceding sections we have seen a need to support incremental type checking, direct reference to objects, and property inheritance over the class-subclass relationship. These are all features of the languages designed to support object-oriented programming, such as Smalltalk [Goldberg and Robson 1983]. The combination of object-oriented programming languages with databases is, in fact, an active line of development. A good example is the work on Gemstone and its language OPAL by Copeland and Maier [1984] and Maier et al. [1986], which has the following goals:

• Extensible Data Model. New types should be definable by the programmer, including encapsulation of behavior in code.
• Database Amenities. Standard database requirements, such as transactions, security, recoverability, concurrent access, and serializability, should be met.
• Programming Environment. There should be at least an interactive interface for defining new objects, writing OPAL routines, executing ad hoc queries expressed in OPAL, a window package, and a procedural interface for languages such as Pascal and C.

M. P. Atkinson and O. P. Buneman

The advantages of starting from an object-oriented language are identified as follows:

• Modeling Power. There is direct representation of classes and object type hierarchies.
• Object Identity. There are no problems of referential constraints; moreover, many update anomalies cannot occur, since common properties are represented by common substructures [Khoshafian and Copeland 1986].
• Modeling Behavior. Behavior is encapsulated in messages associated with object types. This means that it is possible to rely on a common behavior of the object (exactly one implementation), and revision can be achieved by replacing that message handler to obtain a new behavior.
• Classes. In addition to the properties of classes we have displayed earlier (3.1, 3.2, and 3.3), the rich repertoire of classes already defined in libraries is commented on. Such inherited culture would apply to any successful language.
• Associating Types with Objects. This ability enables the modeling of irregular structures, the handling of exceptional data, and the writing of routines that function successfully when data are defined later.

It is this last property of object-oriented systems that is especially relevant in this section. In PS-algol, there is a class/pntr type that arranges for type information to be associated with instances of the classes and checks this upon access to the instance. All other bindings are checked by type information associated with identifiers at compile time. In Amber, the type dynamic is introduced to arrange for type information to be associated with a value and for the checks to be applied when the value is coerced. In both these languages the need for a dynamic binding and a delayed type-checking mechanism was accepted, but, in the interest of eager type checking (early warning of coding errors) and efficiency, it was carefully limited in locality. OPAL and similar languages, by universally using the one binding mechanism, may actually achieve simplicity and even greater flexibility. For example, they may cope with the redefinition of existing data. It remains a matter for research whether a judicious mixture of static and incremental binding/checking is more successful at supporting large-scale persistent programming than this universal adoption of incremental binding and type checking.

Of course, these issues blur: good compilation techniques can optimize away much of the dynamic checking [Owoso 1984; Atkinson and Morrison 1986], and a good program development environment can give early warning of potential errors, even as the code is typed. Furthermore, the object-oriented model can be refined to have new type descriptions based on a theoretical understanding of polymorphism [Cardelli and Wegner 1985; Zdonik and Wegner 1985]. Even in object-oriented programming, the management of changing types is not resolved in the context of persistence [Skarra and Zdonik 1986].

Obviously, if an efficient typed persistent system existed, much of an operating system could be written in it, and access to operating system facilities could then be granted to programmers while the parameters involved still retained their structure [Balzer 1986]. Further development of the integration of object-oriented systems with other persistent structures, such as relations, depends on a better understanding of the supporting theory [Buneman and Ohori 1986, 1987].

5.4 Persistence and Large Databases

Long-lived systems allow time for data to accumulate. As a result, they have to address the issue of size even more seriously than do other computing systems. In part, this can be treated as an implementation


issue and, therefore, largely outside the scope of this paper. But in many cases programmers may have good ideas as to how to manage the issue in the context of their particular application, and therefore a good DBPL will give them tools for doing this. Such tools are often based on the provision of an indexing mechanism. Other mechanisms, not presented here, include methods of data compression and control of data placement. Indexes can be viewed both as a model of storage organization and as a data modeling tool, allowing a partial map to be modeled. Many of the languages examined have certainly had an index mechanism to support some other construct, such as relations in Pascal/R and class extents in Galileo. PS-algol, however, has implemented indexes directly, and they are widely used in that language. Experience there has shown that most instances of these indexes are small, but a few may become very large, and since continuity in the programming is expected, the underlying implementation must be adaptive. We may consider an index to be a partial function of type α → β. This is essentially the same as a function, except that it may return no value as its result, and the mapping may be modified. In PS-algol, only two options, string and int, are available for α, whereas β may range over all the types via the pntr construct. Poly, in fact, also provided an index mechanism, where α was constrained to integers. The OPAL system (Section 5.3) also provides an indexing class of objects. Recent work [Donahue et al. 1986] reports an extension to the Cedar language [Swinehart et al. 1986] via a procedural interface that is based on an entity, domain, and index model of the data. This provides an implementation of the entity-relationship model [Cattell 1983]. A good discussion of the compromises between efficiency and generality is given in Donahue et al. [1986], and the interface makes extensive use of first-class procedures, so that calls on composite objects yield procedures to scan the elements of the composite structure. This has the programming advantage of organizing iteration and the efficiency



advantage of streaming only those objects needed by the program (cf. FQL, Section 6.1.3). The interface clearly allows a greater range of type substitutions for α (for example, enumerations) than does PS-algol, but it is doubtful whether it permits the full range of types supported by Cedar. As we have remarked before, the difficulty here is that a set of operations, at least equality and either less-than or hashing, must be defined for the type. When iteration is defined over an index object or any other uniform collection, there is a problem of defining a natural order for the iteration. Various approaches are possible. The sequence can be random and not necessarily repeatable (favored by those who wish to implement using dynamic hash coding), random but repeatable, or defined declaratively when the index type is created, when the index instance is created, or when the iteration begins. Investigating these options and their implementations is clearly another avenue of DBPL research.

5.5 Object-Oriented


Any attempt to combine inheritance with persistence in a system that can manipulate large collections of data can reasonably be classified as an object-oriented database system. Whether a particular language or database management system deserves this appellation depends on one's view of what an object-oriented language is. Galileo, for example, supports inheritance and persistence but might be disqualified on the ground that Galileo's types do not correspond to classes, since a method (i.e., a procedure) cannot reside in a type; only the type of the procedure can reside there. In this sense, a class more closely resembles an abstract data type (see 4.1), in which the procedures are part of the type declaration, but ML in its present form does not support any form of inheritance. Other distinctions include methods versus procedures and the notion of object identity. In the absence of any form of concurrency, the only distinction between methods and functions is that methods express a limited form of overloading; that is, the code that is executed

depends on the class in which the method is defined. Another topical issue is that of "object identity" [Copeland and Khoshafian 1987]. Whether or not this is equivalent to the notion of reference types, which are common in programming languages, remains to be seen. However, if we take a more liberal attitude to what an object-oriented database is, then a number of systems fall into this category. For example, Postgres [Stonebraker and Rowe 1986] allows, among other things, relations to contain procedures, thereby allowing methods to be held in the database. A component of the EXODUS system [Carey et al. 1986] is the language E, an extension of the language C++ [Stroustrup 1986] that maintains persistent classes and persistent extents. In order to maintain compatibility with C++, types (and classes) are grouped into nonpersistent and persistent categories; however, each nonpersistent type has a persistent counterpart, so that persistence, although not entirely uniform, is quite general. Trellis/Owl [O'Brien et al. 1986] and Vbase [Ontologic, Inc. 1986] are also object-oriented systems that include some form of persistence. In addition, there are a number of recent efforts to understand the theory of types and classes for object-oriented databases [Bruce and Wegner 1987; Ohori 1987; Richard and Velez 1987].

6. RELATED LANGUAGES AND INTERFACES

6.1 Approaches Based on Logic and Algebra

Given the substantial recent activity in relating logic programming to databases, it is perhaps surprising that we have relegated a review of these approaches to a short summary. The reason is that most of the recent work in this area has focused on implementation; the study of type systems or persistence in these languages is still in its infancy. Moreover, there are already substantial surveys, which are cited below, that cover both the logic and the algebraic approaches to databases. Nevertheless, these approaches provide a terse and (arguably) clear method of specifying database queries, and a brief review of the relevant languages and how we would use them for some of our tasks is appropriate. This is especially important because logic and functional programming promise to have a serious influence on the future of DBPLs.

expensive(N, C, M) :- part( ...

Figure 71. Task 2, expensive parts, in Prolog.

above(X, Y) :- madeFrom(X, Y, _).
above(X, Z) :- madeFrom(X, Y, _), above(Y, Z).

Figure 72. A transitive closure in Prolog.

6.1.1 Logic Programming

There is a well-established and elegant connection between logic programming and relational databases [Gallaire and Minker 1978; Gallaire et al. 1984]. In particular, one can represent a relation as a set of simple predicates, all of the same arity, whose arguments are all constants. Referring to the relational schema of Figure 5, the tuples of each relation provide us with predicates such as part(123, 'Mast') to assert the existence of a tuple in the Part relation with Pname 'Mast' and Pno 123. The database is now represented as a set of such predicates, and queries such as those required for Task 2 can be readily formulated in a language like Prolog [Clocksin and Mellish 1981], as in Figure 71. Posing the "query" expensive(N, C, M) will cause all the bindings of N, C, and M that satisfy the predicate to be printed.

A relatively simple precursor to Task 3 is to ask for the transitive closure of the part-subpart relationship. This is shown in Figure 72, where the predicate above(X, Y) indicates (compare with Figure 49) whether the part X requires the part Y somewhere in its manufacture. The problem of optimizing such queries is the subject of considerable current research in databases [Bancilhon and Ramakrishnan 1986]. The problem is not so simple when we want to complete Task 3, since this involves the introduction of functions into the query. A related example is presented in Clocksin and Mellish [1981], but it uses a list structure to encode the part-subpart relationship instead of directly representing the MadeFrom relation as a set of predicates.

costAndMass(Pname, TotalCost, TotalMass) :-
    part(Pname, Cost, Mass),
    subCostAndMass(Pname, SubCost, SubMass),
    TotalCost is Cost + SubCost,
    TotalMass is Mass + SubMass.

% make sure there are no existing "assignments" and initialize
subCostAndMass(Pname, _, _) :-
    retractall(bindings(Pname, _, _)),
    assert(bindings(Pname, 0, 0)),
    fail.
% otherwise iterate
subCostAndMass(Pname, _, _) :-
    madeFrom(Pname, Component, Qty),
    costAndMass(Component, SubCost1, SubMass1),
    rebind(Pname, Qty, SubCost1, SubMass1),
    fail.
% set final values
subCostAndMass(Pname, SubCost, SubMass) :-
    bindings(Pname, SubCost, SubMass).

% rebind variables
rebind(Pname, Qty, SubCost1, SubMass1) :-
    retract(bindings(Pname, OldCost, OldMass)),   % get existing values
    NewCost is OldCost + Qty * SubCost1,
    NewMass is OldMass + Qty * SubMass1,
    assert(bindings(Pname, NewCost, NewMass)).    % store new values

Figure 73. Task 3 in Prolog using program control.

In Figure 73 the computation is performed by forcing the program, through the use of cut (!) and fail, to iterate over the subparts of a part. The need for modifiable variables is met by using assert and retract, which effectively store these values in the "database." It is interesting to compare this use of the database with Taxis in Figure 39, where we had to use the database for temporary storage of a different nature. This complexity can be avoided with the use of a list structure to control the iteration over the subparts. The predicate bagof accumulates in the list Tuples the set of all (SubCost1, SubMass1) pairs that satisfy the intermediate predicate that recursively computes costs and masses of subparts. A generic predicate such as bagof, which accumulates some or all of the bindings that satisfy a specified goal, is clearly desirable for the forms of set and bag manipulation that are common in database query languages. The implementation of such a predicate is presumably accomplished through the use of side effects of the form shown in Figure 74. Metapredicates are useful in the task of memoizing; in fact, for an arbitrary predicate Goal, we can write a memo predicate memo(Goal) as shown in Figure 75. Whereas this will take care of the exponential running time that results from pathological examples without memoization, one should not expect the efficiency that is achieved in other examples that have an explicit index structure to implement the memo function.

The connection between database constraints (functional dependencies, multivalued dependencies, etc.) and types is a subject that must be more fully explored. It is possible to model database constraints very elegantly in logic programming and to devise systems that do not allow the addition of a predicate (a form of update) should it violate these constraints. However, there is nothing corresponding to static type checking available for logic programming, although proposals for such a
type checker have been given by Mycroft [1984]; moreover, the use of modules in connection with logic programming has been suggested in Goguen and Meseguer [1984] and Miller [1986]. Another difficulty that may occur when scaling up logic programming is that specifying parameters positionally can be expensive in real applications, where relations tend to get extremely "fat"; that is, they contain a large number of columns. An interesting development is recent work [Ait-Kaci and Nasr 1985] in which inheritance can be used in logic programming. Among other things, this permits the use of a keyword notation that works more naturally with conventional relational query languages. Thus, although logic programming clearly deserves consideration as a database programming language, the issues addressed in current research have little to do with the issues of types and persistence that are the focus of this paper. Current implementations of logic programming languages deal with persistence in very much the same fashion as LISP implementations. However, we expect that this situation will change in the near future, and progress with implementations of logic programming systems that can be combined with database management is likely to include novel approaches to types and persistence.

costAndMass(Pname, TotalCost, TotalMass) :-
    part(Pname, Cost, Mass),
    subCostAndMass(Pname, SubCost, SubMass),
    TotalCost is Cost + SubCost,
    TotalMass is Mass + SubMass.

subCostAndMass(Pname, SubCost, SubMass) :-
    bagof((SubCost1, SubMass1),                 % collect results in a "bag"
          Component^Qty^SubCost2^SubMass2^(
              madeFrom(Pname, Component, Qty),
              costAndMass(Component, SubCost2, SubMass2),
              SubCost1 is SubCost2 * Qty,
              SubMass1 is SubMass2 * Qty),
          Tuples),
    !,
    sumTuples(Tuples, SubCost, SubMass).
subCostAndMass(_, 0, 0).                        % base part: bagof finds no subparts

sumTuples([], 0, 0).
sumTuples([(C, M) | Rest], CSum, MSum) :-
    sumTuples(Rest, CRest, MRest),
    CSum is C + CRest,
    MSum is M + MRest.

Figure 74. Task 4 in Prolog using a list to accumulate values.

:- dynamic recall/1.    % lets the first lookup fail instead of raising an error
memo(Goal) :- recall(Goal).
memo(Goal) :- call(Goal), assert(recall(Goal)).

Figure 75. A general memoizing schema in Prolog.

6.1.2 Aldat

Aldat [Merrett 1977, 1984, 1985b] is an attempt to build a language that is based predominantly on relational algebra, but with extensions to give more expressive power. A similar approach was adopted for a tutorial system, RDB, in relational algebra [Nikhil 1982]. These languages contrast with other examples of relational DBPLs, which are basically general-purpose programming languages that have an added relational data type. The idea is to develop the relational algebra by the addition of operators, function definitions, and some form of procedural evaluation to provide, in a sense, the expressive power of other programming languages. Aldat demonstrates that a language in which the only available data types are relations and the usual base types, such as integer and string, can be used for a number of tasks that one would not normally associate with database work. In Merrett [1985b] there are a number of examples of applications of this extended relational algebra to areas as diverse as inferencing, text processing, and geometric computation. Aldat should be compared closely with APL, which similarly demonstrates the power of using an array data type with a suitably rich set of operators.

Name, Cost, Mass in (Part ijoin BasePart) where Cost ≥ 100

Figure 76. Task 2 in Aldat.

Above ← (Assembly, Component in MadeFrom) ujoin
        (Above [Component icomp Assembly] (Assembly, Component in MadeFrom))

Figure 77. A transitive closure in Aldat.

let Assembl be Assembly; Comp1 be Component; Qty1 be Qty.
let Qty2 be equiv + of (Qty1 × Qty) by (Assembly, Comp1);
let Qty3 be Qty + Qty2
Above ← Assembly, Component, Qty3 in
        (MadeFrom [Assembly, Component ujoin Assembly, Comp1]
          (Assembly, Comp1, Qty2 in Above [Component ijoin Assembl]
            (Assembl, Comp1, Qty1 in MadeFrom)))

CostRel ← Pno, Cost in BasePart [Pno, Cost, Mass ujoin Pno, AssemblyCost, MassIncrement] CompositePart
let SubCost be Qty × Cost
let SubCost2 be equiv + of SubCost by Assembly
let TotalCost be SubCost2 + Cost
SubCostRel ← Assembly, SubCost2 in Above [Component ijoin Pno] CostRel
TotalCostRel ← Pno, TotalCost in CostRel [Pno, TotalCost ujoin Assembly, SubCost2] SubCostRel

Figure 78. A partial solution to Task 3 in Aldat.

The database schema declarations are essentially those given in Figure 5 and need no further elaboration here. Task 2 is shown in Figure 76. The syntax is similar to the SELECT ... FROM ... WHERE of SQL. ijoin performs a natural join but can also be extended to relabel the columns. To embark on the solution to Task 3, we follow the previous development of a Prolog program and first give in Figure 77 a solution to the transitive closure problem. This is a breadth-first solution in which the transitive closure is formed by recursively joining the relation Above with MadeFrom. The infix function [Component icomp Assembly] relabels the columns, joins on the named columns, and then drops that column from the relation. ujoin performs a union after the appropriate relabeling. A partial solution to Task 3 is given in Figure 78. The first step is to obtain a relation Above(Assembly, Component, Qty), which gives the total number of times

a given component part is used anywhere in the manufacture of a part. The general structure follows that of the transitive closure given before; however, a number of declarations to provide relabelings and functions on domains are needed. The form equiv ... of ... by provides, like GROUP BY in SQL, a method of reducing columns by some function such as addition. The second stage is to form a combined cost relation for base and composite parts in CostRel; this is done by relabeling and taking the union. Finally, the Cost in this relation is multiplied by the Qty of Above and then reduced to obtain the total cost of all (direct and indirect) subparts in SubCost2 of the relation SubCostRel, which is then added to the cost of manufacture to obtain, for every part, the total cost of manufacture in the Cost column of FinalCost.
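Aldat's strategy of first building Above(Assembly, Component, Qty) and then pricing every part with one join and one grouped sum can be mimicked over Python sets of tuples. This is a minimal sketch, not Aldat: the function names above_with_qty and total_costs are hypothetical, MadeFrom is modeled as a set of (assembly, component, qty) tuples, a dict of each part's own cost stands in for CostRel, and the part-subpart graph is assumed to be acyclic so that the iteration terminates.

```python
from collections import defaultdict


def above_with_qty(made_from):
    """Closure of MadeFrom with quantities multiplied along each path and
    summed over all paths, in the spirit of Above in Figure 78.
    Assumes the part-subpart graph is acyclic."""
    above = {}
    while True:
        step = defaultdict(int)
        for a, c, q in made_from:              # direct uses
            step[(a, c)] += q
        for a, b, q in made_from:              # join MadeFrom.Component = Above.Assembly
            for (b2, c), q2 in above.items():
                if b2 == b:
                    step[(a, c)] += q * q2     # summed per (Assembly, Component)
        step = dict(step)
        if step == above:                      # fixed point: no new paths
            return above
        above = step


def total_costs(made_from, cost):
    """cost maps every part to its own cost (Cost or AssemblyCost), like CostRel."""
    sub = defaultdict(int)
    for (a, c), q in above_with_qty(made_from).items():
        sub[a] += q * cost[c]                  # SubCost2, grouped by Assembly
    return {p: cost[p] + sub[p] for p in cost} # own cost plus all subpart costs
```

Every (assembly, component) total is materialized in the dictionary returned by above_with_qty, which corresponds to the implicit memoization of intermediate results noted in the text.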

Only the total cost of manufacture has been computed here. One could clearly write an additional computation to find the total mass and join the resulting relations together. Performing the two computations together would indicate that some further pairing functions are needed and that the algebra for relations may not be entirely independent of the algebra of operations required on domains.

The efficiency of this solution is, of course, questionable. Whether it is more efficient than the recursive programs we have presented elsewhere depends, among other things, on the structure of the database. A database for which the number of parts is large but for which the MadeFrom relation is small would not be appropriate for this method. However, manufacturing systems are frequently set up to produce one or two complete assemblies (e.g., a car). In such cases, the solution presented here would be competitive with the memoized versions we have examined earlier. Note that in this method we are implicitly memoizing the intermediate results. In contrast to Prolog, part of the complexity of the Aldat solution arises from the need to relabel columns. Any language based on the relational algebra needs to address this problem.

6.1.3 Functional Algebras

Since the relational algebra has proved so useful in the implementation and optimization of relational queries and can be extended, as we have just seen, to a general programming language, it is natural to ask whether a similar algebra exists for the functional data model. FQL [Buneman et al. 1982a] is a functional algebra designed for the implementation and optimization of functional languages. Loosely based on FP [Backus 1978], the language treats the database as a collection of extensionally defined functions that may be set valued in essentially the same fashion as Daplex (Figure 28). It is possible to represent (with varying degrees of convenience) both the relational and CODASYL data models in this way, and interfaces have been built to specific database management systems of both these species. The implementation is based on a lazy evaluation scheme [Henderson and Morris 1976], which automatically performs many of the optimizations required in database programming. A recent experimental implementation [Nikhil 1984] has used a polymorphic type system and type inferencing, similar to that of ML (Section 4.1), to check the types of database queries before they are evaluated.

{(Name(x), Mass(x), Cost(x)) | x ∈ !BasePart, Cost(x) ≥ 100}
... |([Cost, 100] . ≥) ...

Figure 79. Task 2 expressed in set notation and FQL.

GCost(x) = if IsBasePart(x) then Cost(x) else AssemblyCost(x)
GMass(x) = if IsBasePart(x) then Mass(x) else MassIncrement(x)
TotalCost(x) = GCost(x) + Σ{Qty(u) × TotalCost(Component(u)) | u ∈ Uses(x)}

Sum2((x1, y1), (x2, y2)) = (x1 + x2, y1 + y2)
Σ2{z1, z2, ...} = Sum2(z1, Sum2(z2, ...))
Mul2(x, (y1, y2)) = (x × y1, x × y2)
CostAndMass(x) = Sum2((GCost(x), GMass(x)), Σ2{Mul2(Qty(u), CostAndMass(Component(u))) | u ∈ Uses(x)})

Figure 80. Partial and complete solutions to Task 3 in set notation.

GCost = (IsBasePart, Cost, AssemblyCost)
GMass = (IsBasePart, Mass, MassIncrement)
TotalCost = [GCost, Uses . *([Qty, Component . TotalCost] . mul) . /+] . +
Sum2 = [#1 ...
Mul2 = [[#1, #2 ...] mul, ...
CostAndMass = [GCost, Uses ...

Figure 81. Partial and complete solutions to Task 3 in FQL.

Figure 79 shows the computation for Task 2 represented first in quasi-mathematical notation like the "Z-F" expressions used by Turner in Miranda [Turner 1986] and then in FQL.³¹ Note that in the presence of inheritance, functions like Name do not require an explicit join or dereferencing when they are applied to a BasePart. If there is no inheritance, as in the CODASYL representation of the database, an appropriate function must be written in FQL to do the dereferencing.

³¹ Susan Davidson has pointed out to us that the "safeness" condition used to bound queries in the tuple calculus is related to Turner's condition that every variable in a Z-F expression must be bound by at least one generator. However, Turner's "sets" are unbounded sequences, which allows for a greater degree of expressive power.

The set notation is self-explanatory and is syntactically quite close to SQL. The notation !BasePart describes the extent (rather than the type) associated with BasePart. We follow Turner in assuming that this extent is a sequence rather than a set, although in these examples the distinction is unimportant. The FQL expression on the line below shows how this query can be implemented using a few typed higher order functions:

• Functional composition (.). If f is a function of type α → β and g is of type β → γ, then f . g is of type α → γ. Note that this is reverse Polish syntax.
• Selection (|). If f is a predicate, that is, a function of type α → Bool, then |f is of type Seq[α] → Seq[α]; |f selects from the sequence those elements that satisfy f.
• Mapping (*). If f is of type α → β, then *f is of type Seq[α] → Seq[β]; it applies f to each element of its argument to produce a sequence of results. This operation is not explicit in Adaplex (see 3.1).
• Tupling ([...]). If f1 is of type α → β1, f2 is of type α → β2, and so on, then [f1, f2, ...] is of type α → β1 × β2 × .... A tuple of functions is applied to a single argument to produce a tuple of results.

Although one or two additional higher order functions are useful, the functions listed above are the basis for most expressions in FQL. In addition, there are the functions associated with lists (hd, tl, etc.), selectors #1, #2, ... for tuples, and the usual complement of arithmetic functions. The FQL query in Figure 79 is understood by being read from right to left. !BasePart is a constant-valued function that produces all the objects of this type. This is then filtered by the predicate [Cost, 100] . ≥, which applies both Cost and the constant-valued function 100 to each part and compares the results (composition with ≥). Finally, the tuple of functions [Name, Mass, Cost] is applied to each member of the sequence to produce the required output, a sequence of tuples.
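The four combinators can be sketched in Python to make their types concrete. This is an illustrative sketch, not FQL itself: the helper names (comp, select, star, tup, const, ge), the dictionary representation of parts, and the sample data are all hypothetical. comp follows the reverse Polish convention described above, applying its leftmost argument first.

```python
from functools import reduce


def comp(*fs):
    """Composition in FQL's reverse Polish order: comp(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)


def select(pred):
    """|pred: keep the elements of a sequence that satisfy pred."""
    return lambda seq: [x for x in seq if pred(x)]


def star(f):
    """*f: apply f to every element of a sequence."""
    return lambda seq: [f(x) for x in seq]


def tup(*fs):
    """[f1, f2, ...]: apply every function to the same argument."""
    return lambda x: tuple(f(x) for f in fs)


def const(k):
    """Constant-valued function, e.g. 100 or an extent such as !BasePart."""
    return lambda _x: k


def ge(pair):
    """>= applied to a pair, as in [Cost, 100] . >=."""
    return pair[0] >= pair[1]


# Hypothetical sample extent of BasePart.
BASE_PARTS = [
    {"name": "bolt", "cost": 5, "mass": 1},
    {"name": "engine", "cost": 900, "mass": 120},
    {"name": "seat", "cost": 150, "mass": 8},
]


def Name(p):
    return p["name"]


def Cost(p):
    return p["cost"]


def Mass(p):
    return p["mass"]


# Task 2: take the extent, filter with [Cost, 100] . >=, then map [Name, Mass, Cost].
task2 = comp(const(BASE_PARTS),
             select(comp(tup(Cost, const(100)), ge)),
             star(tup(Name, Mass, Cost)))

if __name__ == "__main__":
    print(task2(None))  # prints [('engine', 120, 900), ('seat', 8, 150)]
    # The rewrite *f . *g = *(f . g) holds on sample data:
    double, inc = (lambda x: 2 * x), (lambda x: x + 1)
    assert comp(star(double), star(inc))([1, 2, 3]) == star(comp(double, inc))([1, 2, 3])
```

Because each combinator returns a function, the whole query is itself a value that can be inspected and rewritten before it is run, which is what makes algebraic identities such as the map-fusion rewrite usable for optimization.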

The expression of Task 3 in set notation is shown in Figure 80 and is again straightforward. We start by defining, as before, GCost and GMass functions that operate independently of whether a part is a base part or a composite part. We have also assumed that the function Uses gives a set of Use objects. This is not quite consistent with the Daplex definition but does correspond to the Adaplex and Galileo set-valued attributes. The first expression describes the evaluation of TotalCost. By then defining some simple new arithmetic functions, the computation of CostAndMass can be described in exactly the same fashion. The relationship between the set notation for Task 3 and the FQL code in Figure 81 should be apparent. FQL has not been developed with its own database management system but has been used as an interface to other existing relational or CODASYL systems. The implementation of GCost and GMass depends on the details of the underlying database, but it is simple to do for either the relational schema or the network schema described in Figure 11. Since formal parameters are not used in FQL, Sum2 and Mul2 are implemented by a slightly cumbersome selection from the relevant tuples; however, the presence of a generic reduction operator (/) makes the definition of the function to sum sequences of pairs unnecessary.

ACM Computing Surveys, Vol. 19, No. 2, June 1987

M. P. Atkinson and O. P. Buneman

We do not propose to indulge in the argument about whether algebra or logic (or something else) is the best metaphor for database programming. It seems clear that, whereas logic is essential for specifying programs, algebra is required in the implementation. This is certainly true for the relational algebra and also for the functional data model, in which simple optimizations such as $*f \,.\, *g = *(f \,.\, g)$ are immediately apparent in the algebraic syntax but much harder to recognize in procedural code. Both approaches can be criticized for their failure to implement satisfactory forms of update and persistence. We should note that, although a rather unsatisfactory account of update can be given through the use of metapredicates or functions such as assert or eval, these are far too unconstrained and open to abuse and are of no help in defining transactions. Moreover, the only notion of persistence is that of workspaces, which we now discuss.
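Two points made above, the map-fusion law *f . *g = *(f . g) and the claim that a generic reduction operator (/) removes the need for a special function to sum sequences of pairs, are easy to check on small data. The Python sketch below assumes left-to-right composition, as in FQL; slash, add_pairs and the sample (cost, mass) pairs are our own illustrative names and values, not FQL's.

```python
from functools import reduce

def compose(f, g):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda x: g(f(x))

def star(f):
    """Mapping (*): apply f to every element of a sequence."""
    return lambda seq: [f(x) for x in seq]

def slash(op):
    """Reduction (/): fold a binary operator over a non-empty sequence."""
    return lambda seq: reduce(op, seq)

def double(x):
    return 2 * x

def add_three(x):
    return x + 3

def add_pairs(p, q):
    """Componentwise sum of two (cost, mass) pairs."""
    return (p[0] + q[0], p[1] + q[1])

xs = [1, 2, 3]
two_passes = compose(star(double), star(add_three))(xs)  # builds an intermediate list
one_pass = star(compose(double, add_three))(xs)          # fused: a single traversal
print(two_passes, one_pass)  # [5, 7, 9] [5, 7, 9]

# Summing (cost, mass) pairs needs no special-purpose Sum2:
# the generic reduction applied to componentwise addition is enough.
print(slash(add_pairs)([(10, 1.5), (20, 0.5), (5, 2.0)]))  # (35, 4.0)
```

The fused form is what an optimizer would prefer, since it avoids materializing the intermediate sequence; in procedural code the same rewrite requires recognizing that two separate loops can be merged.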

6.2 Persistence and Workspaces

Many of the languages we discuss in this paper are designed to be used interactively. In practice this means that the environment provided is one that the user, in the normal course of work, never leaves. Editing, compiling/interpreting, debugging, and so on, are all performed within the language environment. Even though most database query systems fail to provide a complete environment, there is no reason why this could not be done. Although it can be claimed that there is no difference in principle between this form of programming environment and the interactive interface provided by many operating systems for the traditional edit-compile-run cycle used for many programming languages, there are qualitative differences that are important for our examination of database languages, especially with respect to persistence.

Particularly important in the design of interactive languages and their environments is that many of the services normally provided by the operating system, such as editors and links to files, are available in the language environment itself and should be callable from the language. This means that some attempt to deal with persistence must be made within the language or language environment. In this context the programming environments for LISP [McCarthy 1962; Teitelman 1975] and APL [Iverson 1979] are interesting. The simplest form of persistence, provided by all useful interpreters for these languages, is some form of checkpoint-restart instruction. By using these instructions a programmer may save the current state of the environment and resume it later. This is a simple form of all-or-nothing persistence, a more sophisticated version of which is provided by Poly (see Section 4.2). LISP is more interesting because certain objects such as functions (S-expressions) have a textual representation. These can be saved in files and read in again in some subsequent session. By grouping collections of such objects in files, a degree of modularity can be obtained. However, the data structures that one normally associates with databases cannot easily be flattened into a textual representation. Therefore, special-purpose algorithms need to be developed for certain classes of structure. This can be done, but there is a consequent loss of generality of the form of persistence that is available.

APL deserves special mention both because of its type checking and its persistence. Although it was designed primarily as a scientific programming language, it has been widely used as a programming language for the implementation of small- and medium-scale databases. One of the reasons for this is that it was one of the first sophisticated interactive languages to become available on the brands of hardware on which databases are commonly needed. Another is that its workspace mechanism made it particularly easy to partition and share data.

In APL, the only nonprimitive data types are functions and arrays. There are no reference types. Every data structure is therefore flat, and a workspace is no more than a collection of such structures, together with their names. The penalty for this organization is that global references within functions cannot be bound statically and the language exhibits the usual anomalies of dynamic binding. Type checking in APL is also interesting because the only bulk data type, the array, is uniformly typed. Thus, the type of A and B in A + B is (array of real) or (array of integer) rather than array of (real or integer). Type checking and optimization need only be done once for addition of arrays, as opposed to once for each addition of pairs of elements. This is in contrast to LISP, where there is no mechanism for imposing a uniform type on the elements of lists. APL programs that are written to exploit arrays and operators will therefore have a very small overhead for this form of dynamic type checking. The flatness of APL structures is similar to the first-normal-form requirement for relational databases. Given this, it is surprising that no one has yet attempted to produce a relational database programming environment based on that of APL. One reason may be that efficient implementation of relational operators requires an elaborate overhead of nonflat data structures, and these may be difficult or expensive to disentangle when copying objects from one workspace to another.

7. CONCLUSIONS

The impetus to write this paper was the authors' conviction of the importance of programming language design. At the outset of this paper we gave three criteria according to which we intended to assess various designs for database programming languages: data type completeness, persistence, and expressive power. An obvious way to summarize would therefore be to produce a consumer guide based on these criteria; but there would be little point to this since most of the languages we have surveyed are not yet on the market and in many cases the designs are not complete. When a sufficient number of these languages are in a state in which they can be used in anger, we will be in a better position to make more accurate comparisons and perhaps to influence subsequent designs. Moreover we will be better able to understand the implementation problems that are common to database programming. The importance of a survey like this is not to produce detailed comparisons and analyses but to identify some common research themes or principles that we have perceived in most of the languages we have discussed and that are surely in need of further development. To recapitulate, they are

(1) The need for a uniform language. There should be no major linguistic barriers to the development of programs that are computationally complex.
(2) The provision of a mechanism to control persistence that is independent of type.
(3) A built-in abstract data type or family of types to represent the regularity of large volumes of data.
(4) A polymorphic indexing or retrieval mechanism for efficient implementation. It is possible for this to be combined with, or subsumed by, (3).
(5) Programs or procedures should be typed objects and uniformly treated as values that may persist.
(6) The type system must represent some form of inheritance.
(7) The types or modules must permit some form of incremental or dynamic binding.
(8) As much static type checking should be performed as is consistent with (7).
(9) A notation should be provided to signify "variables" that receive a value computed at the time they are created but that cannot be updated subsequently.

In addition, there are equally important issues that we have not discussed here. These include locking and sharing mechanisms, privacy control, transactions, and recovery. These topics deserve a separate survey in the context of database programming and will certainly have equally difficult research problems. We shall briefly elaborate on these points here, but we should stress that, although this paper is about design, implementation is equally important. It took some 10 years for relational technology to develop to the point at which relational systems could compete in efficiency with existing database management systems, and we are only now in a position to understand the advantages and drawbacks of relational database programming languages and to use them as a basis for the discussions of the next wave of programming languages. If the ideas examined here are to be taken seriously, it would be a great pity if we were to have to wait another 10 years in order to understand what further research is needed.
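Principle (2), persistence independent of type, and the earlier remark that database structures cannot easily be flattened into a textual representation, can both be seen in a small Python sketch. Here the standard pickle module stands in for a persistent store (an assumption for illustration only; pickle is a serializer, not a persistent programming system), and the parts records are invented. Pickle accepts values of unrelated types in one store and preserves sharing, so a subpart used by two assemblies is still a single object after reloading, whereas a round trip through JSON text returns two separate copies. Pickle does not meet principle (5), however: it saves functions only by name, not as values.

```python
import json
import pickle

# Hypothetical parts data: two assemblies share one "bolt" record.
bolt = {"name": "bolt", "cost": 0.5, "uses": []}
wheel = {"name": "wheel", "cost": 20.0, "uses": [bolt]}
frame = {"name": "frame", "cost": 80.0, "uses": [bolt]}

# Values of several unrelated types saved in one store; the mechanism
# does not depend on what type each value has.
store = {"parts": [wheel, frame], "threshold": 100, "tags": {"draft", "v1"}}

restored = pickle.loads(pickle.dumps(store))
w, f = restored["parts"]
shared_after_pickle = w["uses"][0] is f["uses"][0]

# Flattening to text (JSON here) loses the sharing: the bolt comes back twice.
w2, f2 = json.loads(json.dumps(store["parts"]))
shared_after_json = w2["uses"][0] is f2["uses"][0]

print(shared_after_pickle, shared_after_json)  # True False
```

Sharing is exactly what a database update relies on: after the JSON round trip, changing the cost of the wheel's bolt would no longer change the frame's bolt.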

7.1 Expressive Power

Since we had assumed that all languages would provide adequate power, we expected that this issue would be quickly laid to rest. Recall that Task 2 was introduced (see Section 1) to test whether a simple query could be expressed simply, whereas Task 3 was intended to discover whether a language's expressive power was at all limited. Although it is true that all the languages beyond the relational query languages appear to provide the power of a Turing machine, several omissions were discovered that might have been pinpointed by putting a language design out to independent tests. Initially, it was not clear, for example, whether recursive transactions were possible in Taxis (see 3.2). In Adaplex (see 3.1), the lack of temporary entity types made the coding of the memoized version of Task 3 particularly cumbersome.³² In general, all the computations we demanded appeared possible but sometimes so awkward as to be impracticable, and the awkwardness was invariably associated with lack of type completeness.

³² We understand that the designs of both these languages have recently been changed to correct these failings.

We claimed in the Introduction that we saw no reason why a language should not be both powerful and simple enough for the uninitiated user, and we believe that languages such as SASL and Prolog have convincingly demonstrated this for general computations in languages with simple type systems. In Galileo (see 3.3), we saw that simple operations such as Task 2 could be written as straightforwardly and as succinctly as in a relational query language. ML (see 4.1) shared this brevity, but we were also able to express Task 3, a much more complicated computation, equally succinctly after designing the appropriate abstract data type to support general computations of that class. The only problem we encountered with languages (such as ML) whose syntax is based on the lambda calculus is that the terseness of this syntax often means that what are really syntax errors give rise to incorrect programs that are syntactically well formed but ill typed. The type-error messages that are caused, say, by a misplaced parenthesis or by forgetting the precedence of operators can be extremely confusing, even for an experienced user. One of us has been involved in experiments [Kaplan 1983] with Turner's Z-F notation for database iterations [Turner 1981] along the lines suggested in Section 6.1.3. Not only is it extremely simple to understand, being syntactically quite close to the relational query languages, it is also convenient for several forms of high-level optimization [Nikhil 1984].

Languages such as Galileo, ML, and Miranda suggest that it is now possible to produce, in languages with rich type systems, the same sort of smooth learning curve for the user to climb in developing progressively more complicated programs that is exhibited by SASL, Prolog, and other simple interactive languages. It is important to continue this principle, especially where interaction with complex data types, such as database schemas, becomes important. A major problem with these languages is that, although the treatment of simple correct programs may be simple, interpreting the type or run-time errors that result from incorrect ones often requires expert understanding of the language.

To summarize, we believe that there is no reason to limit the power of a database programming language on the grounds of simplicity, nor have we seen that there are gains in overall efficiency that result from allowing only a limited subset of operators. On the contrary, the inefficiencies introduced by having to switch language, program around the deficiency, or move data between programs, when the database language is not powerful enough for a given task, are much more serious. We suspect the same is true for the design of database machines. They should implement certain operations very efficiently but should also interpret a language that is powerful enough to express an arbitrary database program.

7.2 Data Type Completeness

There is no need to elaborate any further on the failure of data type completeness. The challenge to the designer of a new or newly extended programming language is to produce the appropriate set of base types and type constructors. The constructors are, of course, closely related to the choice of data model, whereas base types may be related to application targets, and both might be expected to vary. Looking at the more recent languages (see Sections 3, 4, and 5), we find that there is considerable convergence in the choice of constructors. In particular, all have an indexing mechanism, which in some is directly exposed as a type; all have some notion of a class; and most have attempted to deal with inheritance.

7.2.1 Indexes and Bulk Structures

The usefulness of a generic index type in programming languages is self-evident and was the basis of several early database management systems. It is noteworthy that many database systems continue to be built on the basis of a package that provides for nothing more than the implementation of an efficient index. In PS-algol (Section 7.1) and Poly (Section 4.2), the structure was insufficiently generic because the key type was restricted. In Pascal/R (Section 2.2) the relation could be treated as a sparse multidimensional array and subscripted on a key tuple. Adaplex and Galileo do not make the structure explicitly available but follow relational query systems in using it as an optimization technique. The design of these indexing constructs is invariably closely linked with the bulk data constructors, such as relations. But this is only one context in which they may be useful, and it remains an open question as to whether they should have a separate type or whether both types, relations and indexes, can be subsumed in one generic type. Iterators and some collection of bulk operators for these structures are clearly essential, but what form should a "place holder" or "cursor" have and how should the order of iteration be specified? In fact, the questions of whether the bulk type should have an ordering and whether cursors are even necessary have yet to be resolved. These, as well as problems of implementation, are research issues. In particular, our experience has shown that the majority of instances of "bulk" types are small and only a few are very large, so that many implementation strategies have excessive overheads.

7.2.2 Classes

In Task 1 we wished to model the parts used in manufacturing as a set of uniformly typed entities. The need for this kind of representation is ubiquitous in databases, and any database language must provide a mechanism for doing this. In Pascal/R, one starts by constructing a record type and using that to construct a relation. There is thus a clear separation between the underlying type of the entity and the associated extent (in fact, there may be several extents associated with a given type). The entity types of Adaplex and the DataClass of Taxis do not make this distinction. Galileo requires a separate name for the extent and the underlying entity type; but although it will allow classes over identical types, the type for each class is a new type. Effectively, in all three, one declaration produces both the type and the extent. The designer of a new language should ask what is to be gained by doing this. In Pascal/R, the fact that the constituent fields of records that form the basis of relations are limited to certain types should be regarded as a failure of type completeness resulting from an engineering compromise. It would, in fact, be useful to have a relation of integer, that is, to have a generic set type.

If one is to adopt a single declaration for a type and extent, a consequent problem is whether the extent should necessarily be persistent. One may also require a representation of a subset of the extent. We have seen that temporary relations can be most useful in certain kinds of computation and that having persistence as a default will necessitate a separate method of declaring temporary extents.

Note that in Pascal/R, persistence is a property of a database, not of a relation. The fact that databases can contain only relations should again be regarded as an engineering compromise, and it would be extremely useful if objects of other types could be put in a Pascal/R database. The database constructor of Pascal/R is a good method of establishing a type and naming convention for a database. However, multiple instances and fields of any type are required to conform with type completeness. But to conform with the need to allow the normal progress of database evolution, the binding cannot be entirely static. Amber (Section 5.2) addresses this with its dynamic type, PS-algol uses the dynamic properties of pointers and tables, and Galileo and Poly exploit the manipulation of environments. All these mechanisms address somewhat different functions, and further research on the appropriate mixture of static and dynamic binding is discussed elsewhere [Atkinson and Morrison 1985a].

7.2.3 Inheritance

All the advanced database programming languages express some form of inheritance within their type system. Taxis is interesting in its attempt to use inheritance as the basis for nearly all aspects of the language. Each language takes a somewhat different approach to expressing constraints on specializations of a type. In Pascal (Section 1.1), for example, the variants of a record are disjoint. Thus, a person record type may have variants that correspond to student and employee types, but to create a person record type that represents both a student and an employee (or neither) requires extra variants to be added. In Adaplex, disjoint unions are the default but can be relaxed with an overlap statement. Moreover, a type is the union of its subtypes, and to allow that a person may be neither an employee nor a student requires the declaration of a further empty subtype to account for this. By contrast, in Taxis, the default is that a person may be student, employee, both, or neither. In fact, some work is required to express an integrity constraint that will prevent the two subtypes from overlapping. In Galileo, three options are provided for specifying how subtypes and supertypes relate to each other. In Amber, the subtype relationship is inferred from the intrinsic properties of the type.

The interaction between procedures that operate on the type hierarchy also needs to be considered. We defined specializations of a Taxis TransactionClass (that operated on parts) to operate on base- and composite-parts. This apparently gives rise to a hierarchy of transaction classes that corresponds to the hierarchy of other classes, or types. As we have seen in Section 5.2, in Amber this hierarchy of procedure types is contravariant [Cardelli 1984a] with the hierarchy of argument types, since any procedure that takes a part as an argument can necessarily take a base-part as argument, but not vice versa; in other words, there are more procedures that operate on base-parts than on parts. This leads to the initially rather surprising conclusion that we should think of the type of a procedure that operates on parts as a specialization of the type of a procedure that operates on base-parts.
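The contravariance argument can be made concrete with Python type annotations. In this sketch the classes Part and BasePart and the helper procedures are hypothetical. Python does not check annotations at run time, so the claim that a procedure on parts may stand in for a procedure on base-parts (but not conversely) is the judgment a static checker such as mypy would make, because typing.Callable is contravariant in its argument types.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Part:
    name: str

@dataclass
class BasePart(Part):  # a base-part is a specialization of a part
    cost: float

def describe(p: Part) -> str:
    """A procedure on parts: it relies only on what every part has."""
    return p.name

def price(bp: BasePart) -> float:
    """A procedure on base-parts: it needs the extra cost field."""
    return bp.cost

def over_base_parts(proc: Callable[[BasePart], object], items: list[BasePart]) -> list:
    """Accepts any procedure that can handle a base-part."""
    return [proc(bp) for bp in items]

def over_parts(proc: Callable[[Part], object], items: list[Part]) -> list:
    """Accepts only procedures that can handle an arbitrary part."""
    return [proc(p) for p in items]

stock = [BasePart("bolt", 0.5), BasePart("nut", 0.2)]

# Both kinds of procedure are acceptable where a base-part procedure is expected.
print(over_base_parts(describe, stock))  # ['bolt', 'nut']
print(over_base_parts(price, stock))     # [0.5, 0.2]

# The converse is unsafe: a static checker rejects this call, and at run
# time it fails because a plain Part has no cost attribute.
try:
    over_parts(price, [Part("frame")])
except AttributeError as err:
    print("rejected:", err)
```

Read as types, Callable[[Part], object] is usable wherever Callable[[BasePart], object] is required, which is the sense in which the procedure type over parts is the specialization.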

7.3 Polymorphism

The examples given in ML and Poly show that parameterized polymorphism is an extremely powerful tool for constructing new data types. However, there are a number of tasks relevant to database programming that still defeat polymorphic languages. One that we discussed is the construction of a generic index type, for which one needs a hash function (for hash tables) or a comparison function (for search trees). And whereas it is always possible to construct such functions, the construction usually depends on some system-dependent information about the structures involved. Therefore, a generic index type, given the state of these languages, must be predefined. A second, and more fundamental, problem relevant to databases is the data type of a relation. Suppose one wishes to assign a data type to the user-defined function

$$\mathit{join3}(x, y, z) = \mathit{join}(x, \mathit{join}(y, z))$$

where join denotes the natural join of two relations. How does one express the argument and result types of join3 if it is to be a generic function, that is, defined for all relations x, y, and z? The interaction between relational data types and polymorphic programming has yet to be properly understood; in fact, we believe that the data type of the natural join has yet to be discovered.

As another example, consider the problem of substituting another value, of a different (or even the same) type, for every occurrence of one value (substructure) in a data structure of arbitrary type. We can do this if we know the complete type of the data structure in which the substitution is to be done; but we cannot produce generic code to do this for all types, even though at some level of database restructuring this is a common operation. Similar problems arise if one wishes to define a universal printing function, a snapshotting function, a generic forms-based data acquisition function, and so on. The requirement for "universal application" programs like those just mentioned [Owoso 1984, 1985] is so extensive in the database context that this area requires immediate attention. One strategy, illustrated by the Poly examples, is to have every object provide a sufficient set of base functions for each abstract type. But if that is pursued, it will certainly be necessary to have a mechanism for adding to the set of base functions already in the database. This appears to be an unsolved problem for typed languages, and the lack of a solution is still an argument that is often offered against strict (or any) typing.

There are a number of programming tasks that require some form of self-reference in the language. We have remarked that it is not possible to write one function that memoizes another in languages such as ML, although this is possible in an untyped language such as LISP. A similar problem is that, in an incremental programming environment, one would like to call upon the compiler as one calls upon other functions, but how is the type of the compiler to be expressed?

Finally, we should emphasize that strict type checking is highly desirable for database programming. The open question is how much of it can be made static. Since the term "static" is relative, this question can only be answered with a better understanding of persistence.

7.4 Persistence

We have advocated throughout this paper that persistence and data type should be orthogonal properties of values. We have also seen that transience is equally important for certain structures. Of the languages we have reviewed, PS-algol, Galileo, Poly, and Amber provide a completely uniform approach to persistence.

Some interactive languages provide a form of checkpointing (Section 6.2). The user may save an existing workspace and recall it later on. However, this is not adequate for database work. There is no way a user can exploit modularity to save parts of a workspace. For example, it is likely that the database itself should be persistent but that experimental programs should be kept in a separate, disposable workspace. More important, when sharing of databases is required, checkpointing is completely inadequate. It is interesting to note here that APL does provide decomposable workspaces. However, this is relatively easy to implement since APL's workspaces are flat. There is no need to maintain references from one workspace to another. Unfortunately, the use of flat workspaces prohibits the use of references (essential for database work) and makes static scoping difficult to manage.

Transactions and concurrency are both intimately connected with the provision of persistence. Transactions were explicitly supported in Adaplex (through its atomic statement) and in PS-algol. Other languages that are addressing the problem of concurrency and distribution find transactions essential [Liskov et al. 1983]. Recently, there has been considerable debate as to whether it is appropriate to build in a particular model of transactions, or whether it is better for the language to provide more primitive constructs out of which the programmer constructs appropriate transactional and concurrent behavior for the various abstractions he or she provides [Weihl 1985a, 1985b]. This is still an open research question. Another open question is the treatment of exceptions in a persistent environment. In recent work, Borgida [1985a, 1985b] has argued that it is necessary to design database management systems whose integrity constraints may be violated, and has analyzed the persistence of error states in a database. None of the languages, with the possible exception of Adaplex, which inherits the properties of Ada, has adequate facilities for concurrency. The design and implementation of concurrency constructs appropriate for database programming languages are clearly a prerequisite for their serious use.

7.5 Modularity: A Construct for Persistence and Organization

Traditional programming language research has distinguished between the language itself and the programming language environment. One of the achievements of database programming has been to integrate parts of the environment with the language itself. The traditional form of database programming is, as we have seen, to treat the database as part of the environment and to have separate programs for compiling and linking schemas, and so on.
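One way to picture procedures being kept in the same store as the data they operate on, rather than in a separate environment of compiled programs, is sketched below in Python. This is a toy under stated assumptions: the store is an in-memory dictionary of byte strings, total_cost and the parts data are hypothetical, and the standard marshal module is used only because it can serialize code objects (its format is specific to the Python version that wrote it, and it is not intended as a persistence mechanism). Note that only the code is saved; the bindings of its free names, such as sum, are supplied again when it is loaded, which is the kind of binding problem the paper argues modules should take over.

```python
import marshal
import types

def total_cost(parts):
    """A hypothetical procedure to be stored alongside the data it operates on."""
    return sum(p["cost"] for p in parts)

# A toy store in which both data and code are held as byte strings.
store = {
    "parts": marshal.dumps([{"name": "bolt", "cost": 0.5}, {"name": "motor", "cost": 250.0}]),
    "total_cost": marshal.dumps(total_cost.__code__),
}

# Later (under the same Python version), rebuild both from the store.
parts = marshal.loads(store["parts"])
code = marshal.loads(store["total_cost"])
# Free names such as sum are resolved through the globals supplied here,
# not through anything recorded in the store.
restored = types.FunctionType(code, globals())

print(restored(parts))  # 250.5
```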

In contrast, database programming languages have made an initial contribution to what must be desirable for all languages: a proper solution to the integration of languages with their environments. For databases, this requires a proper model of persistence and of how, and when, to perform binding and type checking. In programming environments, it is necessary to manipulate many persistent objects, such as source code, interface specifications, and compiled code. Such objects may be organized as modules. It is reasonable to expect that a proper integration of any language with its environment will lead to the same database programming language being used for these modules as is used for all other data, simplifying the programming environment. Further, the modules provide a context for retaining the procedures and other data values and for retaining the bindings between types, values, and names. Indeed, one of their roles is to assist in managing the large numbers of names that appear in a substantial software system. They also form a natural unit for component replacement. Consequently, they are important in organizing the construction and maintenance of both the body of data and the suite of programs. Indeed, we expect to see any distinction between the treatment of program and of other data disappear. Poly, Galileo, and PS-algol, where the analogs of modules were procedure closures or environments, demonstrated that the module can be used to enforce arbitrary protection and constraints. This is the subject of much current research, for example, Atkinson and Morrison [1985b], Cardelli and MacQueen [1985], and MacQueen [1985]. If database programming language research succeeds, then programmers will no longer consciously use databases. A uniform system of modules will provide for the maintenance of both persistent data and persistent code.

ACKNOWLEDGMENTS

Some of the material for this paper was assembled during a seminar course at the University of Pennsylvania in spring 1984. We very much appreciate the contributions made by the participants in that seminar: Mark Reinhold, Sharon Perl, Rishiyur Nikhil, and others. Further material was developed with a senior honors class and an ICL course in the 1985 Candlemas term at the University of Glasgow. We particularly appreciated the assistance given by Dave Matthews, who actually programmed all the tasks before our very eyes in Poly. Many useful discussions with Luca Cardelli helped in our understanding of Galileo and Amber. Alex Borgida and Brian Nixon gave us a great deal of help in understanding Taxis and submitted the examples in this paper to the compiler that is being developed. Joachim Schmidt and Matthias Jarke explained Pascal/R and its descendants, Modula/R and DBPL, to us. Steve Fox of Computer Corporation of America helped us with the Adaplex examples. Antonio Albano and Renzo Orsini helped us with Galileo. Ron Morrison, Tony Davie, and the rest of our Persistent Programming Research Group provided many detailed and constructive comments. David MacQueen, Peter Wegner, Bob Harper, and Jim Donahue, as well as others that were at the Appin workshop, contributed to our general understanding of types. Our work together was supported by the Universities of Glasgow and Pennsylvania. The University of Edinburgh helped with access to equipment during one stage in the preparation of this paper. Our work was funded by the British Science and Engineering Research Council, which gave a fellowship to Peter Buneman to spend a year in Scotland (GRC 86280). Support was also provided by National Science Foundation CER grant 5-22930, ONR contract 5-20689, and ARO contract DAA29-84-9-0027. Other grants from SERC, GRA 86541, GRC 21977, and GRC 21960, provided travel, electronic communication, and equipment. This work has also been supported by International Computers, Ltd., University Research Council.

REFERENCES

AHO, A. V., AND ULLMAN, J. D. 1979. Universality of data retrieval languages. In Proceedings of the 6th ACM Symposium on Principles of Programming Languages. ACM, New York, pp. 110-120.
AIT-KACI, H., AND NASR, R. 1985. Login: A logic programming language with built-in inheritance. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland.
ALBANO, A., CARDELLI, L., AND ORSINI, R. 1983. Galileo: A strongly typed, interactive conceptual language. Tech. Rep., Internal Technical Document Services, AT&T Bell Laboratories, Murray Hill, N.J.
ALBANO, A., CARDELLI, L., AND ORSINI, R. 1985a. Galileo: A strongly typed interactive conceptual language. ACM Trans. Database Syst. 10, 2 (June), 230-260.
ALBANO, A., GHELLI, G., AND ORSINI, R. 1985b. The implementation of Galileo’s persistent values. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 197-208.
ALBANO, A., OCCHIUTO, M. 1985c. Galileo Reference Manual, Vax/Unix version 1.0. Tech. Rep., Dipartimento di Informatica, Univ. di Pisa, Pisa, Italy.
ALBANO, ... PEDRESCHI, D. 1985. The type system of Galileo. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 175-195.
AMBLE, T., BRATBERGSENGEN, K., AND RISNES, O. 1979. ASTRAL: A structured and unified approach to database design and manipulation. In Proceedings of the Database Architecture Conference (Venice, Italy, June).
ASTRAHAN, M. M., BLASGEN, M. W., CHAMBERLIN, D. D., ESWARAN, K. P., GRAY, J. N., GRIFFITHS, P. P., KING, W. F., LORIE, R. A., MCJONES, P. R., MEHL, J. W., PUTZOLU, G. R., TRAIGER, I. L., WADE, B. W., AND WATSON, V. 1976. System R: Relational approach to database management. ACM Trans. Database Syst. 1, 2 (June), 97-137.
ATKINSON, M. P. 1978. Programming languages and databases. In The 4th International Conference on Very Large Data Bases, S. B. Yao, Ed. (Berlin, West Germany, Sept.). IEEE & ACM, New York, pp. 408-419.
ATKINSON, M. P., AND MORRISON, R. 1984. First class persistent procedures are enough. In Proceedings of the 4th Conference on the Foundations of Theoretical Computer Science and Software Technology (Bangalore, India). Springer-Verlag, Berlin, pp. 223-240.
ATKINSON, M. P., AND MORRISON, R. 1985a. First class persistent procedures. ACM Trans. Program. Lang. Syst. 7, 4 (Oct.), 501-538.
ATKINSON, M. P., AND MORRISON, R. 1985b. Types, bindings and parameters in a persistent environment. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 1-24.
ATKINSON, M. P., AND MORRISON, R. 1986. Towards an integrated persistent graphical programming language. In Proceedings of the 18th Hawaii International Conference on Systems Sciences, vol. 2 (Jan.). Western Periodicals, North Hollywood, Calif., pp. 842-854.

ACM Computing Surveys, Vol. 19, No. 2, June 1987




ATKINSON, M. P., CHISHOLM, K. J., AND COCKSHOTT, W. P. 1981. PS-algol: An Algol with a persistent heap. SIGPLAN Not. (ACM) 17, 7 (July), 24-31. Also available as Tech. Rep. CSR-94-81, Computer Science Dept., Edinburgh Univ., Edinburgh, Scotland.
ATKINSON, M. P., CHISHOLM, K. J., COCKSHOTT, W. P., AND MARSHALL, R. M. 1983a. Algorithms for a persistent heap. Softw. Pract. Exper. 13, 7 (Mar.).
ATKINSON, M. P., BAILEY, P. J., CHISHOLM, K. J., COCKSHOTT, W. P., AND MORRISON, R. 1983b. An approach to persistent programming. Comput. J. 26, 4 (Nov.).
ATKINSON, M. P., BAILEY, P., COCKSHOTT, W. P., CHISHOLM, K. J., AND MORRISON, R. 1984. Progress with Persistent Programming. Cambridge University Press, Cambridge, England.
ATKINSON, M. P., MORRISON, R., AND PRATTEN, G. D. 1986. Designing a persistent information space architecture. In Proceedings, H. J. Kugler, Ed. IFIP, Dublin, Sept., pp. 115-119.
BACKUS, J. 1978. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Commun. ACM 21, 8 (Aug.), 613-641.
BALZER, R. M. 1986. Living in the next generation operating system. In Information Processing 86. Elsevier North-Holland, New York.
BANCILHON, F., AND RAMAKRISHNAN, R. 1986. An amateur’s introduction to recursive query processing strategies. In Proceedings, ACM SIGMOD (Washington, D.C., May). ACM, New York, pp. 166-176.
BEVER, M., AND LOCKEMANN, P. C. 1985. Database hosting in strongly typed language. ACM Trans. Database Syst. 10, 1 (Mar.), 107-126.
BORGIDA, A. 1983. Features of Languages for Conceptual Information System Development. Tech. Rep., Dept. of Computer Science, Hill Center, Rutgers Univ., New Brunswick, N.J.
BORGIDA, A. 1985a. Accommodating exceptions to type. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 265-271.
BORGIDA, A. 1985b. Flexible data exceptions. In Proceedings of the 11th International Conference on Very Large Data Bases (Singapore, Aug.). VLDB Endowment, Saratoga, Calif.
BORGIDA, A. 1985c. Language features for flexible handling of exceptions in information systems. ACM Trans. Database Syst. 10, 4 (Dec.), 565-603.
BRACHMAN, R. J. 1978. A New Paradigm for Representing Knowledge. Tech. Rep. BBN 3605. Bolt, Beranek and Newman, Cambridge, Mass.
BRACHMAN, R. J. 1983. What IS-A is and isn’t: An analysis of taxonomic links in semantic networks. Computer 16, 10 (Oct.), 30-35.


BRAGGER, R. P., DUDLER, A., REBSAMEN, J., AND ZEHNDER, C. A. 1983. Gambit: An Interactive Database Design Tool for Data Structures, Integrity Constraints and Transactions. Institut für Informatik, Eidgenössische Technische Hochschule, Zurich, pp. 65-95.
BRODIE, M., MYLOPOULOS, J., AND SCHMIDT, J. 1983. On Conceptual Modelling: Perspectives from Artificial Intelligence, Databases, and Programming Languages. Springer-Verlag, Berlin.
BRUCE, K., AND WEGNER, P. 1987. An algebraic model of subtype and inheritance. In Proceedings of the Roscoff Workshop on Database Programming Languages, Altair-CRAI (Sept.). Available as a technical report from Computer and Information Science Dept., Univ. of Pennsylvania, Philadelphia, Pa. or from Altair, BP 105 Rocquencourt, 76153 LeChesnay Cedex, France.
BUNEMAN, O. P. 1985. Data Types for Data Base Programming. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 285-298.
BUNEMAN, O. P., AND ATKINSON, M. P. 1986. Inheritance and persistence in database programming languages. In Proceedings of ACM SIGMOD (Washington, D.C.). ACM, New York, X, 2, June, pp. 4-15.
BUNEMAN, O. P., AND OHORI, A. 1986. A domain theoretic approach to higher-order relations. In ICDT 86: International Conference on Database Theory (Rome). Springer-Verlag, Berlin.
BUNEMAN, O. P., AND OHORI, A. 1987. Using powerdomains to generalize relational databases. Tech. Rep., Computer and Information Science Dept., Univ. of Pennsylvania, Philadelphia.
BUNEMAN, O. P., FRANKEL, R. E., AND NIKHIL, R. 1982. An implementation technique for database query languages. ACM Trans. Database Syst. 7, 2 (June), 164-186.
BUNEMAN, O. P., HIRSCHBERG, J., AND ROOT, D. 1982. A CODASYL interface for Pascal and Ada. In Proceedings of the 2nd British National Conference on Databases (Bristol, England, July). British Computer Society, Bristol, England.
BURGE, W. H. 1977. Recursive Programming Techniques. Addison-Wesley, Reading, Mass.
CARDELLI, L. 1984a. A Semantics of Multiple Inheritance. Springer-Verlag, Berlin, pp. 51-67.
CARDELLI, L. 1984b. Amber. Tech. Rep., AT&T Bell Labs, Murray Hill, N.J.
CARDELLI, L., AND MACQUEEN, D. M. 1985. Persistence and type abstraction. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 221-230.

CARDELLI, L., AND WEGNER, P. 1985. On understanding types, data abstraction, and polymorphism. ACM Comput. Surv. 17, 4 (Dec.), 471-522.
CAREY, M., DEWITT, D., RICHARDSON, J., AND SHEKITA, E. 1986. Object and file management in the EXODUS extensible database system. In Proceedings of the 12th International Conference on Very Large Databases (Kyoto, Japan, Aug.). VLDB Endowment, Saratoga, Calif.
CATTELL, R. G. G. 1983. Design and implementation of a relationship-entity-datum data model. Tech. Rep. CSL-83-4, Xerox Palo Alto Research Center, Palo Alto, Calif.
CHEN, P. P. S. 1976. The entity-relationship model: Toward a unified view of data. ACM Trans. Database Syst. 1, 1 (Mar.), 9-36.
CHUNG, K. L. 1984. Implementation of Taxis, Process Management and Enforcement of Semantic Integrity Constraints. Master’s thesis, Dept. of Computing Science, Univ. of Toronto, Toronto, Canada.
CLOCKSIN, W. F., AND MELLISH, C. S. 1981. Programming in Prolog. Springer-Verlag, Berlin.
CODD, E. F. 1970. A relational model for large shared databanks. Commun. ACM 13, 6 (June), 377-387.
CODD, E. F. 1979. Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4, 4 (Dec.).


COLE, A. J., AND MORRISON, R. 1982. An Introduction to Programming with S-algol. Cambridge University Press, New York.
COPELAND, G. P., AND KHOSHAFIAN, S. N. 1987. Identity and versions for complex objects. In Proceedings of the Appin Workshop on Persistent Object Systems, Research Report 44. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Glasgow, Scotland, Aug., pp. 407-428.
COPELAND, G., AND MAIER, D. 1984. Making Smalltalk a database system. In Proceedings of SIGMOD ’84. ACM, New York, pp. 316-325.
DAHL, O., AND NYGAARD, K. 1966. Simula, an Algol-based simulation language. Commun. ACM 9, 9 (Sept.), 671-678.
DARLINGTON, J., HENDERSON, P., AND TURNER, D. A. 1982. Functional Programming and Its Applications. Cambridge University Press, Cambridge, England.
DATE, C. J. 1981a. Referential integrity. In The 7th International Conference on Very Large Data Bases (Cannes, France). IEEE, New York, pp. 2-12.
DATE, C. J. 1981b. An Introduction to Database Systems. 3d ed. Addison-Wesley, Reading, Mass.
DATE, C. J. 1983a. An Introduction to Database Systems, vol. 2. Addison-Wesley, Reading, Mass.
DATE, C. J. 1983b. Database: A Primer. Addison-Wesley, Reading, Mass.
DEMERS, A., AND DONAHUE, J. 1979. Revised report on Russell. Tech. Rep. TR79-389, Dept. of Computer Science, Cornell Univ., Ithaca, N.Y.


DONAHUE, J., HAUSER, C., AND KENT, J. 1986. A client interface to an entity-relationship database system. Tech. Rep. CSL-86-4, Xerox Palo Alto Research Center, Palo Alto, Calif.
ECKHARDT, H., EDELMAN, J., KOCH, J., MALL, M., AND SCHMIDT, J. W. 1985. Draft report on the database programming language DBPL. DBPL Memo 091-85. Fachbereich Informatik, Univ. of Frankfurt, Frankfurt, West Germany.
FAIRBAIRN, J. 1982. Ponder and its type system. Tech. Rep. 31, Computer Laboratory, Univ. of Cambridge, Cambridge, England.
FAIRBAIRN, J. 1985. A new type-checker for a functional language. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 107-123.
GALLAIRE, H., AND MINKER, J. 1978. Logic and Databases. Plenum, New York.
GALLAIRE, H., MINKER, J., AND NICOLAS, J.-M. 1984. Logic and databases: A deductive approach. ACM Comput. Surv. 16, 2 (June), 153-185.
GOGUEN, A., AND MESEGUER, J. 1984. EQLOG: Equality, types and generic modules for logic programming. J. Logic Program. 1, 179-209.
GOLDBERG, A., AND ROBSON, D. 1983. Smalltalk-80: The Language and Its Implementation. Addison-Wesley, Reading, Mass.
GOLDSTEIN, I. P., AND BOBROW, D. G. 1980. Extending object oriented programming in Smalltalk. In Proceedings of the 1980 Lisp Conference (Aug.). ACM, New York, pp. 75-81.
GORDON, M. J., MILNER, A. J. R. G., AND WADSWORTH, C. P. 1979. Edinburgh LCF. Lecture Notes in Computer Science, vol. 78. Springer-Verlag, New York.
HALL, P. A. V. 1983. Adding database management to Ada. SIGPLAN Not. (ACM) 13, 3 (Apr.), 13-17.
HAMMER, M., AND MCLEOD, D. 1980. On database management system architecture. In Infotech State of the Art Report on Database, M. Atkinson, Ed. Infotech, Maidenhead, England.
HAMMER, M., AND MCLEOD, D. 1981. Database description with SDM: A semantic database model. ACM Trans. Database Syst. 6, 3 (Sept.), 351-386.
HENDERSON, P., AND MORRIS, J. H. 1976. A lazy evaluator. In 3rd ACM Symposium on Principles of Programming Languages. ACM, New York, pp. 95-103.
HOROWITZ, E., AND KEMPER, A. 1983. AdaRel: A relational extension of Ada. Tech. Rep. TR-83-218, Dept. of Computing Science, Univ. of Southern California, Los Angeles, Calif.
ICHBIAH, J. H., BARNES, J. G. P., HELIARD, J. C., KRIEG-BRUCKNER, B., ROUBINE, O., AND WICHMANN, B. A. 1979. Rationale of the design of the programming language Ada. SIGPLAN Not. (ACM) 14, 6.
IVERSON, K. E. 1979. Operators. ACM Trans. Program. Lang. Syst. 1, 2 (Oct.), 161-176.
JARKE, M., AND KOCH, J. 1982. A survey of query optimization in centralized database systems. Tech. Rep. CRIS 44, GBA 82-73 CR, Center for Research on Information Systems, New York Univ., New York, Nov.
JARKE, M., AND KOCH, J. 1983. Range nesting: A fast method to evaluate quantified queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (San Jose, Calif.). ACM, New York. Also Tech. Rep. CRIS 49, GBA 83-25 CR, Center for Research on Information Systems, New York Univ., New York, May.
JARKE, M., AND KOCH, J. 1984. Query optimization in database systems. ACM Comput. Surv. 16, 2 (June), 111-152.
JARKE, M., AND VASSILIOU, Y. 1985. A framework for choosing a database query language. ACM Comput. Surv. 17, 3 (Sept.), 313-340.
KAHN, G., MACQUEEN, D., AND PLOTKIN, G., EDS. Semantics of Data Types: International Symposium. Springer-Verlag, Berlin, 1984.
KAPLAN, H. 1983. High level interfaces for databases. Master’s thesis, Dept. of Computing and Information Science, Univ. of Pennsylvania, Philadelphia, Pa.
KENT, W. 1978. Data and Reality. North-Holland, Amsterdam.
KENT, W. 1979. Limitations of record-based information models. ACM Trans. Database Syst. 4, 1 (Mar.), 107-131.
KERSTEN, M. L., AND WASSERMAN, A. I. 1981. The architecture of the PLAIN data base handler. Softw. Pract. Exper. 11, 175-186.
KHOSHAFIAN, S. N., AND COPELAND, G. P. 1986. Object identity. SIGPLAN Not. (ACM) 21, 11 (Nov.), 406-416.
KOCH, J., MALL, M., PUTFARKEN, P., REIMER, M., SCHMIDT, J. W., AND ZEHNDER, C. A. 1983. Modula/R report. Lilith version. Tech. Rep., Institut für Informatik, Eidgenössische Technische Hochschule, Zurich.
KRABLIN, G. L. 1985. Building flexible multilevel transactions in a distributed persistent environment. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 83-105.
KULKARNI, K. G., AND ATKINSON, M. P. 1983. Use of PS-algol to experiment with data models. Softw. Pract. Exper. 17, 3 (Mar.), 171-185.
KULKARNI, K. G., AND ATKINSON, M. P. 1985. EFDM: Extended Functional Data Model. Brit. Comput. J. 19, 1 (Jan.), 38-45.
LAMPSON, B. W., HORNING, J. J., LONDON, R. L., MITCHELL, J. G., AND POPEK, G. L. 1977. Report on the programming language EUCLID. SIGPLAN Not. (ACM) 12, 2.
LISKOV, B. 1981. CLU Reference Manual, G. Goos and J. Hartmanis, Eds. Lecture Notes in Computer Science, vol. 114. Springer-Verlag, Berlin.
LISKOV, B., HERLIHY, M., JOHNSON, P., LEAVENS, G., SCHEIFLER, R., AND WEIHL, W. 1983. Preliminary ARGUS reference manual. Tech. Rep. Memo 39, Programming Methodology Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass., Oct.
MACQUEEN, D. M. 1985. Modules for standard ML. Polymorphism 2, 2. AT&T Bell Laboratories.
MAIER, D., STEIN, J., OTIS, A., AND PURDY, A. 1986. Development of an object-oriented DBMS. SIGPLAN Not. (ACM) 21, 11 (Nov.), 172-182.
MALL, M., SCHMIDT, J. W., AND REIMER, M. 1984. Data selection, sharing, and access control in a relational scenario. In On Conceptual Modelling, M. L. Brodie, J. L. Mylopoulos, and J. W. Schmidt, Eds. Springer-Verlag, Berlin.
MATTHEWS, D. C. J. 1985a. Poly manual. Tech. Rep. 63, Computer Laboratory, Univ. of Cambridge, Cambridge, England.
MATTHEWS, D. C. J. 1985b. Overview of the Poly programming language. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 255-263.
MCCARTHY, J. 1962. LISP 1.5 Programmer’s Manual. MIT Press, Cambridge, Mass.
MERRETT, T. H. 1977. Relations as programming language elements. Inf. Process. Lett. 6, 1 (Feb.), 29-33.
MERRETT, T. H. 1983. Extending the relational data model to capture less meaning. ACM SIGMOD Rec. 14, 3, 5&69.
MERRETT, T. H. 1984. Relational Information Systems. Reston Publishing, Reston, Va.
MERRETT, T. H. 1985a. First Steps to Algebraic Processing of Text. Academic Press, Orlando, Fla.
MERRETT, T. H. 1985b. Persistence and Aldat. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, Aug., pp. 35-48.
MERRETT, T. H., AND DOCHTING, B. 1984. Relational storage and processing of two dimensional diagrams. Comput. Graph. $3.
MICHIE, D. 1968. “Memo” functions and machine learning. Nature 218 (Apr.), 19-22.


MILLER, D. 1986. A theory of modules for logic programming. In Proceedings of the 1986 IEEE Symposium on Logic Programming (Salt Lake City, Utah). IEEE, New York.
MILNER, R. 1978. A theory of type polymorphism in programming. J. Comput. Syst. Sci. 17, 348-375.
MILNER, R. 1979. Flowgraphs and flow algebras. J. ACM 26, 4 (Oct.), 794-818.
MILNER, R. 1983. A proposal for standard ML. Polymorphism 1, 3 (Dec.).
MILNER, R. 1984. A proposal for standard ML. In Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming (Aug.). ACM, New York, pp. 184-197.
MORRISON, R. 1979. S-Algol Language Reference Manual. Tech. Rep. CS/79/1, Dept. of Computational Science, Univ. of St. Andrews, St. Andrews, Scotland.
MORRISON, R., DEARLE, A., BROWN, A. L., AND ATKINSON, M. P. 1985a. The persistent store as an enabling technology for integrated support environments. In Proceedings of the 8th International Conference on Software Engineering (London, Aug.). IEEE, New York.
MORRISON, R., BROWN, A. L., BAILEY, P. J., DAVIE, A. J. T., AND DEARLE, A. 1985b. A persistent graphics facility for the ICL PERQ. Softw. Pract. Exper. 14, 3.
MYCROFT, A. 1984. An inferential type system for Prolog. Tech. Rep., Computer Science Dept., Univ. of Edinburgh, Edinburgh, Scotland.
MYLOPOULOS, J., AND WONG, H. K. T. 1980. Some features of the Taxis data model. In The 6th International Conference on Very Large Data Bases (Montreal, Canada, Oct. 1-3). ACM, New York.
MYLOPOULOS, J., BERNSTEIN, P. A., AND WONG, H. K. T. 1980. A language facility for designing database intensive applications. ACM Trans. Database Syst. 5, 2 (June), 185-207.
NIKHIL, R. 1982. RDB: A relational database management system. User Manual. Tech. Rep., Dept. of Computer Science and Information Science, Univ. of Pennsylvania, Philadelphia.
NIKHIL, R. 19&. An incremental, strongly typed applicative programming system for databases. Ph.D. dissertation, Dept. of Computing and Information Science, School of Engineering and Applied Science, Univ. of Pennsylvania, Philadelphia.
NIXON, B. 1983. A Taxis compiler. Master’s thesis, Dept. of Computing Science, Univ. of Toronto, Toronto, Ont., Canada.
NIXON, B., ED. 1984. Taxis ’84: Selected papers. Tech. Rep. TR CSRG-169, Computer Science Research Group, Univ. of Toronto, Toronto, Ont., Canada.
O’BRIEN, P. 1983. An integrated interactive design environment for Taxis. In Proceedings of the 1983 Softfair Conference (July). Softfair, Washington, D.C.
O’BRIEN, P., BULLIS, B., AND SCHAFFERT, C. 1986. Persistent and shared objects in Trellis/Owl. In Proceedings of the 1986 IEEE International Workshop on Object-Oriented Database Systems. IEEE, New York.
OHORI, A. 1987. Orderings and types in databases. In Proceedings of the Roscoff Workshop on Database Programming Languages, Altair-CRAI. Available as a technical report from Computer and Information Science Dept., Univ. of Pennsylvania, Philadelphia, Pa. 19104 or from Altair, BP 105 Rocquencourt, 76153 LeChesnay Cedex, France.
OLLE, T. W. 1978. The CODASYL Approach to Data Base Management. Wiley-Interscience, New York.
ONTOLOGIC, INC. 1986. Vbase Object Manager User Manual. Billerica, Mass.
OWOSO, G. O. 1984. Data description and manipulation in persistent programming languages. Ph.D. dissertation, Computer Science Dept., Univ. of Edinburgh, Edinburgh, Scotland.
OWOSO, G. O. 1985. Flexible data handling in programming languages. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland.
PERSISTENT PROGRAMMING RESEARCH GROUP 1985. The PS-Algol Reference Manual. 2d ed. Tech. Rep. PPR-12-85, Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland.
PIROTTE, A., AND LACROIX, M. 1980. User interfaces for database application programming. In Infotech State of the Art Conference on Databases. Infotech, London, England.
RICHARD, P., AND VELEZ, F. 1987. An object-oriented formal data model. In Proceedings of the Roscoff Workshop on Database Programming Languages, Altair-CRAI, Sept. Available as a technical report from Computer and Information Science Dept., Univ. of Pennsylvania, Philadelphia, Pa. 19104, or from Altair, BP 105 Rocquencourt, 76153 LeChesnay Cedex, France.
ROSENTHAL, A., HEILER, S., DAYAL, U., AND MANOLA, F. 1986. Traversal recursion: A practical approach to supporting recursive applications. In Proceedings of the ACM SIGMOD International Conference on the Management of Data (Washington, D.C., May 28-30). ACM, New York, pp. 166-176.
ROWE, L. 1985. Windows on relations. In Proceedings of the 11th International Conference on Very Large Data Bases (Singapore, Aug.). J. Bubenko, Ed. VLDB Endowment, Saratoga, Calif.
ROWE, L., AND SHOENS, K. 1979. Data abstraction, views and updates in RIGEL. In Proceedings of ACM SIGMOD International Conference on Management of Data (Boston, Mass., May 30-June 1). ACM, New York, pp. 71-81.


SAMET, P. A., ED. 1981. Query Languages: A Unified Approach. Monographs in Informatics. Heyden & Sons, London, England.
SCHMIDT, J. W. 1977. Some high level language constructs for data of type relation. ACM Trans. Database Syst. 2, 3 (Sept.), 247-261.
SCHMIDT, J. W., AND BRODIE, M. 1983. Relational Database Systems. Springer-Verlag, New York.
SCHMIDT, J. W., AND MALL, M. 1983. Abstraction mechanisms for database programming. SIGPLAN Not. (ACM) 18, 6 (June).
SHIPMAN, D. W. 1981. The functional data model and the data language DAPLEX. ACM Trans. Database Syst. 6, 1 (Mar.), 140-173.
SHOPIRO, J. E. 1979. THESEUS: A programming language for relational databases. ACM Trans. Database Syst. 4, 4 (Dec.), 493-517.
SKARRA, A. H., AND ZDONIK, S. B. 1986. The management of changing types in an object-oriented database. SIGPLAN Not. (ACM) 21, 11 (Nov.), 483-495.
SMITH, J. M., AND SMITH, D. C. P. 1977. Database abstractions: Aggregation and generalisation. ACM Trans. Database Syst. 2, 2 (June), 105-133.
SMITH, J. M., FOX, S., AND LANDERS, T. 1983. ADAPLEX: Rationale and Reference Manual. 2d ed. Computer Corporation of America, Cambridge, Mass.
STONEBRAKER, M., AND ROWE, L. A. 1986. The design of Postgres. In Proceedings of ACM SIGMOD International Conference on the Management of Data (June). ACM, New York, pp. 340-355.

STONEBRAKER, M., WONG, E., KREPS, P., AND HELD, G. 1976. The design and implementation of INGRES. ACM Trans. Database Syst. 1, 3 (Sept.), 189-222.
STROUSTRUP, B. 1986. The C++ Programming Language. Addison-Wesley, Reading, Mass.
SWINEHART, D. C., ZELLWEGER, P. T., HAGMANN, R. B., AND BEACH, R. J. 1986. A structural view of the Cedar programming language. ACM Trans. Program. Lang. Syst. 8, 4 (Oct.), 419-490.
TEITELMAN, W. 1975. INTERLISP reference manual. Tech. Rep., Xerox Palo Alto Research Center, Palo Alto, Calif.
TSICHRITZIS, D. C., AND LOCHOVSKY, F. H. 1977. Data Base Management Systems. Academic Press, New York.
TURNER, D. A. 1981. The semantic elegance of applicative languages. In Proceedings of 1981 Conference on Functional Programming Languages & Computer Architecture (Portsmouth, N.H., Oct.). ACM, New York, pp. 18-22.
TURNER, D. A. 1985. Miranda: A non-strict functional language with polymorphic types. In Functional Programming Languages and Computer Architecture, J.-P. Jouannaud, Ed. Springer-Verlag, Berlin, pp. 1-16.
TURNER, D. 1986. An overview of Miranda. SIGPLAN Not. (ACM) 21, 12 (Dec.), 158-167.
ULLMAN, J. D. 1982. Principles of Database Systems. 2d ed. Pitman, Marshfield, Mass.
VAN WIJNGAARDEN, A., MAILLOUX, B. J., PECK, J. E. L., KOSTER, C. H. A., SINTZOFF, M., LINDSEY, C. H., MEERTENS, L. G. L. T., AND FISKER, R. G. 1969. Report on the algorithmic language ALGOL 68. Numer. Math. 14, 79-218.
WASSERMAN, A. I. 1978. Design goals for PLAIN. In Proceedings of the 11th Hawaii International Conference on Systems Sciences. Western Periodicals, North Hollywood, Calif., pp. 25-30.
WASSERMAN, A. I., AND BOOSTER, T. W. 1981. String handling and pattern matching in PLAIN. Tech. Rep. 50, Laboratory of Medical Information Science, Univ. of California, San Francisco, Calif.
WASSERMAN, A. I., SHERTZ, D. D., KERSTEN, M. L., RRIT, R. P., AND VAN DE DIPPE, M. D. 1981. Revised report on the programming language PLAIN. SIGPLAN Not. (ACM).
WEIHL, W. E., AND LISKOV, B. 1985a. Implementation of resilient, atomic data types. ACM Trans. Program. Lang. Syst. 7, 2 (Apr.), 244-269.
WEIHL, W. E. 1985b. Linguistic support for atomic data types. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 145-173.
WIRTH, N. 1971. The programming language PASCAL. Acta Inf. 1.
WIRTH, N. 1983. Programming in Modula-2. 2d ed. Springer-Verlag, Berlin.
ZDONIK, S. B., AND WEGNER, P. 1985. A database approach to languages, libraries and environments. In Proceedings of the Appin Workshop on Data Types and Persistence, Research Report 16. M. P. Atkinson, O. P. Buneman, and R. Morrison, Eds. Persistent Programming Research Group, Dept. of Computing Science, Univ. of Glasgow, Glasgow, Scotland, pp. 145-173.

Received November 1985; revised June 1986 and September 1987; final revision accepted November 1987.