Theseus - A Programming Language for Relational Databases
Jonathan E. Shopiro
Computer Science Department
The University of Rochester
Rochester, NY 14627
TR31 (Revised) March 1979
Theseus, a very high level programming language extending EUCLID, is described. Data objects in Theseus include relations and a-sets, a generalization of records. Primary design goals of Theseus are to facilitate the writing of well-structured programs for database applications and to serve as a vehicle for research in automatic program optimization.
The preparation of this paper was supported in part by the Alfred P. Sloan Foundation under Grant No. 74-12-5, and in part by the National Science Foundation under Grant No. MCS76-10825.
A database is a model of some part of the world. A set of relations [Codd 70] describing the entities of that world is an agreeable form for such a model, because it is easy for people to understand how the tuples of the relations represent facts about the objects of the world [Date 77]. One of the main advantages of relational databases over other kinds of databases is that information about many different aspects of the world can be easily accommodated in the same model without bias toward any particular aspect. For example, a relational database for a corporation can contain information about employees, departments, suppliers, products, and customers and all their interrelationships in such a way that it is equally easy to request information about any of the various kinds of objects or relationships. A considerable effort [Chamberlin 76, Zloof 75] has been expended on the design of interactive query languages for relational databases, but there has been rather less work [Stonebraker and Rowe 77] on programming languages for relational databases. We believe that the same benefits of easy access to information regarding all aspects of the data model should accrue to the application programmer as well as the interactive user, especially since most use of the database, interactive or batch, is likely to be by specialized application programs rather than a general user interface.
One of the reasons for the lack of development of programming languages for relational databases has been that a human programmer with knowledge of the physical implementation underlying the relational database could write a much more efficient program than could be expected from a compiler of a relational language. We believe that automatic program optimization (particularly flow analysis [Hecht 77] and data structure selection [Low and Rovner 76]) has reached the state at which this no longer need be true; that is, we believe that a compiler can be built that can take a program written at the relational level and a description of the physical implementation of the database and generate highly efficient code. We are particularly interested in global program optimization [Allen 69], rather than optimization of individual retrieval requests [Lorie and Wade 77].

Our goals in designing Theseus were to build a testbed for research in program optimization that would be a powerful, useful, and useable language for relational database applications. We felt it should have the following characteristics:
1. It should make programs easy to write and understand.
2. It should not require the user to know anything about the physical structure of the data. He must know the names of relations, their field names and data types, and the semantics of the relational model of the world.
3. It should not have any arbitrary limitations in its power, so that any computable function can be expressed in it. [*]
4. It should be amenable to compile time analysis, so that the compiler can make efficient choices of access paths, data representations, etc.

To achieve these goals, it would not be adequate to add a few system calls to a general purpose programming language [Stonebraker 77]. The approach that we have taken is to extend the programming language EUCLID [Lampson 77], itself an extension and modification of PASCAL [Jensen and Wirth 75]. The primary reason that we chose EUCLID as a base for our extensions is that it was designed for verifiability, and the Theseus compiler will have to prove various properties of the program being compiled in order to apply certain optimizing transformations. Otherwise, we could have made these extensions to nearly any block-structured language. We have also assumed some non-relational extensions to EUCLID. We will use strings of unspecified length, real numbers, and homogeneous sets of individuals (i.e. sets of elements of uniform simple type) without further comment. Since EUCLID was designed for writing small system programs, and Theseus is intended to be used for relational database application programs, we have necessarily departed to some extent from the spirit of EUCLID, but we have tried to do it as little harm as possible.
THE MODEL AND THE LANGUAGE
This section of the paper gives the syntax [**] and semantics of our extension to EUCLID. The syntax is designed to merge smoothly with EUCLID, but it is subject to change as we get user feedback; the objects and operations are not expected to change
[*] This is controversial; see [Aho and Ullman 79].
[**] We use a modified BNF with | separating alternatives, square brackets surrounding optional items, and curly brackets surrounding items that may appear zero or more times. When these symbols appear in quotes they denote themselves.
2.1
A fundamental data type of Theseus is the a-set (for "association set") [Feldman and Williams 78], a set of name-value pairs best thought of as an expandable EUCLID record. In order to avoid ambiguity, when we use the word, name, in the text of this paper in this technical sense (i.e. a field name of an a-set), we will put it in single quotes. The 'names' that can be used in a-sets are all declared globally along with the data type for each. Thus the 'names' in an a-set serve the same function as the field names in a Pascal record, and the values that are associated with those 'names' correspond to the values stored in the fields of the Pascal record. Pointers, a-sets and 'names' are not allowed as values in a-sets.

The relations in our model are sets of a-sets, with the 'names' found in the a-sets of the relation serving as the column names of the relation. Each a-set in the relation then represents a tuple or row, the 'name'-value pairs in the a-set associating the appropriate values with the column names. Thus the reader familiar with the relational database literature may safely substitute "row" or "tuple" for "a-set" in the following, with the understanding that some positions in the row may be empty.

We view a-sets as a generalization and unification of several programming language concepts, e.g. records, argument lists, [...] will be improved by the replacement of these many concepts by one clean and powerful one. In the context of database management, using a-sets instead of tuples will [...] The uses of a-sets in other programming language contexts, as well as the interesting challenge of generating efficient code for languages that use them, will be discussed elsewhere.

Other programming language constructs that are similar to the Theseus a-set are the SNOBOL table [Griswold 71], the SETL tabular [...]

[...] statements can be used for grouped retrievals. For statements with more than one parameter-generator pair can also be used to form joins of relations, as discussed in section 4.4.2, below.
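To make the a-set model described earlier in this section concrete, the following is a minimal Python sketch, not Theseus code. It assumes that an a-set can be approximated by an immutable set of 'name'-value pairs and a relation by a set of such a-sets. The helpers make_aset and value_of, the table DECLARED_NAMES, and the 'name' ename are hypothetical and introduced only for illustration.

```python
# Hypothetical illustration only; none of these helper names come from Theseus.

# 'Names' are declared globally, each with a data type (here: a Python type).
DECLARED_NAMES = {"empNo": int, "ename": str, "salary": float}


def make_aset(**pairs):
    """Build an a-set: a set of 'name'-value pairs over the declared 'names'."""
    for name, value in pairs.items():
        if name not in DECLARED_NAMES:
            raise KeyError(f"undeclared 'name': {name}")
        if not isinstance(value, DECLARED_NAMES[name]):
            raise TypeError(f"{name} must be of type {DECLARED_NAMES[name].__name__}")
    # A frozenset of pairs is hashable, so a-sets can be members of a set.
    return frozenset(pairs.items())


def present(aset, name):
    """True if the a-set associates some value with 'name'."""
    return any(n == name for n, _ in aset)


def value_of(aset, name):
    """Return the value associated with 'name' (KeyError if it is absent)."""
    return dict(aset)[name]


# A relation is a set of a-sets; the second row leaves 'salary' empty.
emp = {
    make_aset(empNo=1, ename="Ada", salary=5200.0),
    make_aset(empNo=2, ename="Grace"),
}

# The 'names' found in the a-sets serve as the relation's column names.
column_names = sorted({n for row in emp for n, _ in row})
```

Representing a missing value by the absence of a pair, rather than by a null marker, mirrors the remark that some positions in a row may be empty.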
Insertion and Deletion
As mentioned above, relations have associated procedures that will be activated when an insertion or deletion is attempted. The compiler supplies default versions of these procedures with each relation declaration, but the default versions can be supplanted by the programmer. This technique can be used to enforce semantic constraints on the database, which include normal form [Codd 71] constraints as well as constraints that cannot be expressed in terms of functional dependencies.
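The mechanism can be approximated outside Theseus. The following Python sketch assumes a relation object that routes every attempted insertion through a replaceable procedure. The class Relation, the functions default_insert and require_key, and the dictionary key "reason" are hypothetical names chosen for illustration; this is an analogy under those assumptions, not the Theseus implementation.

```python
# Hypothetical sketch; Relation, default_insert and require_key are illustrative names.

def default_insert(relation, aset):
    """Default procedure: always perform the insertion."""
    relation.rows.append(dict(aset))
    return {"returnCode": "okay"}


class Relation:
    def __init__(self, key, insert_proc=default_insert):
        self.key = key
        self.rows = []
        self.insert_proc = insert_proc  # may be supplanted by the programmer

    def insert(self, aset):
        # Every attempted insertion goes through the associated procedure.
        return self.insert_proc(self, aset)


def require_key(relation, aset):
    """Programmer-supplied procedure: reject a-sets that lack the key 'name'."""
    if relation.key not in aset:
        return {"returnCode": "noGood", "reason": "noKey"}
    return default_insert(relation, aset)


emp = Relation(key="empNo", insert_proc=require_key)
ok = emp.insert({"empNo": 7, "salary": 4100.0})
bad = emp.insert({"salary": 3900.0})
```

Because the constraint lives in the procedure attached to the relation, callers cannot bypass it by inserting directly, which is the property that makes the technique suitable for enforcing semantic constraints.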
Insertion and deletion procedures have exactly two formal parameters, which must be of type a-set. The first parameter will be the a-set to be inserted or deleted and the second, a var parameter, is the result a-set through which the procedure indicates the status of the operation to its caller.

Within the procedure, when it is determined that the insertion or deletion should actually take place, this is indicated by a statement like the ordinary relation insertion or deletion, except that the keyword is nowInsert or nowDelete. These reserved words are illegal outside insertion and deletion procedures. The simplest example of an insertion procedure is the default.
    procedure Insert ( TheAset: aSet, var Result: aSet ) =
        imports ( theRelation )
        begin
            nowInsert TheAset in theRelation returning Result
        end Insert

The default deletion procedure is analogous. The following is an example of a programmer defined insertion procedure, preceded by declarations of three types and a 'name' for each.

    pervasive type returnCodeType = ( okay, noGood )
    pervasive type reasonType = ( noKey, noSalary, payCut )
    pervasive type date =
        record
            var day: 1..31,
            var month: ( Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec ),
            var year: 1900..2100
        end date

    name returnCode: returnCodeType
    name changeDate: date
    EMP: relation key empNo
        procedure Insert ( EmpRec: aSet, var Why: aSet ) =
            imports ( theRelation )
            begin
                Why := Empty
                if not present ( EmpRec, empNo ) then
                    put returnC