Combinators as control mechansims in multiprocessor systems

Retrospective Theses and Dissertations 1987 Combinators as control mechansims in multiprocessor systems Deborah Lee Knox Iowa State University Foll...
Author: Isaac Curtis
3 downloads 0 Views 2MB Size
Retrospective Theses and Dissertations

1987

Combinators as control mechansims in multiprocessor systems Deborah Lee Knox Iowa State University

Follow this and additional works at: http://lib.dr.iastate.edu/rtd Part of the Computer Sciences Commons Recommended Citation Knox, Deborah Lee, "Combinators as control mechansims in multiprocessor systems " (1987). Retrospective Theses and Dissertations. Paper 8553.

This Dissertation is brought to you for free and open access by Digital Repository @ Iowa State University. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Digital Repository @ Iowa State University. For more information, please contact [email protected].

INFORMATION TO USERS While the most advanced technology has been used to photograph and reproduce this manuscript, the quality of the reproduction is heavily dependent upon the quality of the material submitted. For example: • Manuscript pages may have indistinct print. In such cases, the best available copy has been filmed. • Manuscripts may not always be complete. In such cases, a note will indicate that it is not possible to obtain missing pages. • Copyrighted material may have been removed from the manuscript. In such cases, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, and charts) are photographed by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each oversize page is also filmed as one exposure and is available, for an additional charge, as a standard 35mm slide or as a 17"x 23" black and white photographic print. Most photographs reproduce acceptably on positive microfilm or microfiche but lack the clarity on xerographic copies made from the microfilm. For an additional charge, 35mm slides of 6"x 9" black and white photographic prints are available for any photographs or illustrations that cannot be reproduced satisfactorily by xerography.

8716783

Knox, Deborah Lee

COMBINATORS AS CONTROL MECHANISMS IN MULTIPROCESSOR SYSTEMS

Iowa State University

University Microfilms Intsmâtionsl 300 N. zeeb Road, Ann Arbor, Ml 48106

PH.D.

1987

PLEASE NOTE:

In all cases this material has been filmed in the best possible way from the available copy. Problems encountered with this document have been identified here with a check mark V .

1.

Glossy photographs or pages

2.

Colored illustrations, paper or print

3.

Photographs with dark background

4.

Illustrations are poor copy

5.

Pages with black marks, not original copy

6.

Print shows through as there is text on both sides of page

7.

Indistinct, broken or small print on several pages

8.

Print exceeds margin requirements

9.

Tightly bound copy with print lost in spine

\/

10.

Computer printout pages with indistinct print

11.

Page(s) author.

lacking when material received, and not available from school or

12.

Page(s)

seem to be missing in numbering only as text follows.

13.

Two pages numbered

14.

Curling and wrinkled pages

15.

Dissertation contains pages with print at a slant, filmed as received

16.

Other

. Text follows.

University Microfilms International

Combinators as control mechanisms in multiprocessor systems

by

Deborah Lee Knox

A Dissertation Submitted to the Graduate Faculty in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY Major: Computer Science

Approved;

Signature was redacted for privacy. In Charge of Major Work

Signature was redacted for privacy. For the Major Department

Signature was redacted for privacy. For the Gradu

Colleg

Iowa Stale University Ames. Iowa 1987

ii

TABLE OF CONTENTS

1. INTRODUCTION 1.1 Introduction 1.2 Problem Statement 1.3 Thesis Organization 2. RLVIEW OF BACKGROUND MATERL4L 2.1 Introduction 2.2 Literature Review 2.3 Lambda Calculus 2.3.1 Functional Languages Compared to Imperative Languages 2.3.2 Functional Languages and the Lambda Calculus 2.3.3 Applicative Expressions 2.3.4 Useful Properties of the Lambda Calculus 2.3.5 Correspondence Between a Functional Language and the Lambda Calculus 2.4 Combinators 2.4.1 Useful Properties of the Combinator Model 2.4.2 Correspondence of the Lambda Calculus and Combinators 2.4.3 The S-K Reduction Machine 3. A METHODOLOGY TO GENERATE COMPUTER ARCHITECTURES 3.1 Abdali's Abstraction Algorithm 3.1.1 Demonstration of Abdali's Abstraction Method 3.1.2 Comparison of Abdali's Method and Turner's Method 3.2 Introduction of New Combinators 3.2.1 Revisiting Examples Using New .Abstraction Rules 3.3 Optimization of the New Abstraction Method ; 3.3.1 Evaluation of the Optimized Abstraction Method 3.4 Another Optimization 3.4.1 More Examples 3.5 Review of a Fixed-Program Machine 3.6 Modification of the Evaluator 3.6.1 Evaluation of Examples Using the Modified Evaluator 3.7 Multiprocessor Support 3.7.1 Allocation of Processors 3.7.2 Synchronization of Execution 3.7.3 Deallocation of Processors 4. IMPLEMENTATION NOTES 4.1 Machine Instructions 4.2 .Multiprocessor Support 4.3 Management of a Finite Processor System 4.4 System Requirements

1 1 3 3 5 5 5 12 12 12 13 13 16 17 18 18 20 22 22 25 27 30 33 37 40 42 45 49 53 54 65 65 68 69 71 71 73 75 76

iii

4.5 Simulation Enhancements 4.6 Examples of Our Multiprocessing Scheme 5. PERFORMANCE MEASUREMENTS 5.1 Traditional Measures 5.2 Measures of Our Systems 5.3 Interpretation of Our Measures 5.4 An In-Depth Interpretation of Our Measures 6. SUMMARY 6.1 Thesis Summary 6.2 Future Research 7. REFERENCES 8. ACKNOWLEDGEMENTS 9. APPENDIX 1: COMPILATION RULES AND SINGLE PROCESSOR MACHINE 10. APPENDIX 2: MULTIPROCESSOR MACHINE

77 78 118 118 120 122 124 132 132 133 136 139 140 144

1

1. INTRODUCTION

1.1 Introduction

Interest in functional programming languages and their support has increased rapidly in recent years. Although the elegance of functional languages has long been recognized, such languages have not, in general, been in wide use. One reason for this is that functional language implementations have not proven to be competitive with imperative language implementations. However, there are signs this situation is starting to change. Researchers are rediscovering that functional languages provide an inherent parallelism that is not present in sequential languages. If this parallelism could be taken advantage of, then the functional approach would be competitive with the traditional imperative approach. More importantly, this inherent parallelism would map quite nicely to multiprocessor evaluation. One of the problems with capitalizing on the inherent parallelism found in functional languages is how to capture it. Most research being conducted involves studying a program to determine where the parallelism exists, and then deciding how to decompose the program to best take advantage of this parallelism. This decomposition of the program involves generating subprograms that "fit" the given architecture. It is our belief that a very viable way to capture the parallelism inherent in functional programs is at the language level itself. In particular, we feel the language, via its semantic definition, should directly dictate the computer architecture upon which it is to run. It is this belief that has been the guiding force behind our work. Our work in this area began after studying a method in which a language dictates its supporting architecture in a formal way [21,22]. In Wand's approach to developing a machine, he first analyzed the denotational semantics of the language to be supported. The

2

denotational semantics define very precisely the meaning of the constructs of the language. Using this definition, he redefined the language in "operational" terms. The resulting operational definition provided his basis for developing a computer to support the language. In this sense, then, the language definition itself defines the requirements for the computer. Wand shows how he analyzed the denotational semantics for a programming language in order to develop a translator for the language, and then proceeds to map the translator definitions into architectural features. He presents a simple addition expression language, derives a translator for the language, and shows that the underlying architecture "derived" from the language definition is a stack machine. We have been unable, thus far, to utilize Wand's approach. We still believe in the concept of deriving an architecture from the language it is intended to support. We also believe that the way to do this is from the denotational semantics of the language. We have broken the problem down and have tackled a portion of it. Our approach is to use the denotational definition of a language and to translate this definition into combinator expressions [4]. These combina tors have well-defined rules for interpretation associated with them. This means that the combinator based definition of the language is operational in nature, i.e.. that we can use the rules as machine instructions. As we develop the machine instructions, we are beginning to specify a supporting architecture. In the process of deriving the architecture to support our language, we found that the combinators are dual-purposed. Not only do they serve as machine instructions for our underlying evaluator, but they provide a built-in mechanism to decompose our programs. We have found that combinators can, in fact, be used as control mechanisms for parallel evaluation. In particular, the inherent parallelism of functional languages can be captured via the natural decomposition capabilities of combinators. We do not have to struggle to break apart our programs in order to take advantage of any parallelism that may be built into the function. The combinators manage the decomposition of our program and allocate

3

program pieces to processors for (presumably parallel) evaluation.

1.2 Problem Statement

Our belief is that computer architectures should be derived from the programming languages they are intended to support. Furthermore, we contend that the inherent parallelism of functional programming languages should automatically be supported by such a derived architecture. Given a language, the semantics of that language can be defined denotationally. This definition is then converted into an equivalent definition that is operational in nature. Such a definition is then translated into machine constructs that represent the underlying supporting architecture. Our approach to solving this problem focuses on the semantics of the language. We start with a language definition given denotationally in terms of the lambda calculus [2]. This definition is then mapped into combinator expressions. Since the parallelism is inherent to the semantics of the language, it should be reflected in the definitions of the combinators as well. We plan to capture this parallelism by mapping the combinators onto an underlying implementation consisting of multiple processors. The combinators not only dictate the underlying architecture needed for support, but also direct the allocation of parallel evaluation of the combinator expressions.

1.3 Thesis Organization

The remainder of this thesis describes our method to tailor (adjust) a multiprocessor system to utilize the inherent parallelism in programs during evaluation. In Chapter 2, we review the pertinent literature, and provide background information on the theoretical foundations of lambda calculus and combinatory logic. We introduce a

4

basic functional language and demonstrate how combinators can be used to represent programs of this functional language. In Chapter 3. we introduce a newly developed set of combinators and discuss their equivalence to traditional sets of combinators. We present some comparisons among these various sets of combinators. Our new combinators are shown to be appropriate for sequential evaluation of programs, but more importantly are shown to have the capability to direct the decomposition of program code to allow for parallel execution on a multiprocessor system. In Chapter 4, we provide supporting information on how our system is implemented. We discuss the machine instructions of our evaluator and various supporting functions required to handle the multiprocessing features. We present system requirements for both the infinite and finite processor simulations. The chapter concludes with a presentation of some examples evaluated by our multiprocessing scheme. In Chapter 5. we analyze the performance of our multiprocessor system. We discuss some traditional performance measurements and present those measures of importance to our implementation. Based on these measurements, an analysis of an example evaluation by our multiprocessor system is conducted. The chapter concludes with a presentation of some system performance results. Finally, in Chapter 6, we summarize the work presented and outline future research directions.

5

2. REVIEW OF BACKGROIÎND MATERIAL

2.1 Introduction

We are interested in the support of functional languages, in particular in the underlying architecture used to support the execution of programs. We are proposing to derive an architecture from the definition of a language. We base our approach on a functional language because of certain key properties of this class of languages. Functional languages provide a high degree of expressive power. This expressiveness yields shorter code that is easier to read. More importantly, functional languages have the property of referential transparency. This property provides the potential for parallel evaluation of programs. Since the language is free from assignment statements, it is free from side-effects. This means that when evaluating an expression, subexpressions do not interfere with one another and may be evaluated in any order — even in parallel. Our approach to supporting a functional language is to map the language through a series of transitions. This thesis demonstrates each stage of the mapping. The stages of transition are depicted below. functional language -* ^.expression -* combinator expression -» machine code The combinators of the language define the architecture features necessary to support our language, thus the language dictates the support it requires. After some brief background information, each of these steps of transition are examined in detail.

2.2 Literature Review

Functional languages are based on a mathematical model known as the lambda calculus [2]. It provides a formal method to study functions and function application. We are not interested in studying the lambda calculus, but want to use the resulting

6

model to express our functional language. Stoye et al. [18], Gordon [5], and others, have shown that a language can be formally defined in terms of the lambda calculus. The denotational semantics of a language provide us with a framework to compare it against other languages. These formal semantics also allow us to establish formal proofs of correctness of individual programs. More important to our work is that the definition of a language provides the basis for a method of automatically generating an implementation of the language. It is this last feature that we are most interested in pursuing. Wand has used this capability in deriving a combinator machine, as we introduced in Chapter 1. Schmidt, in his work on the semantics of programming languages, has suggested that semantic definitions be used to "tune" computer architectures [l6,17]. Functional languages can also be translated into a form from which all bound variables are removed. This representation is based on the mathematical model of combinatory logic. The roles of the eliminated bound variables are expressed by primitive operators, called combinators, and a mechanism for applying a function to its argument. Turner [19] presents a method for translating lambda calculus programs into combinator notation. Before proceeding with Turner's results, we first review some of the features of combinators. Combinators were introduced by Schonfinkel in the early 1920s. They were based on the same area of mathematical logic that produced the lambda calculus. A combinator is an expression that is equivalent to a X.-expression that has no free variables. We are interested in using combinators because of this elimination of variables. The significance of not having variables in the code is that there is no need to maintain an environment. It is well-established that the maintenance of an environment, and subsequent variable look-up is very expensive in traditional implementations of von-Neumann languages. Another benefit of using a combinator model to represent programs is the simplicity of the resulting combinator machine to execute the code. This machine's instruction set is comprised of the

7

translation rules of the combinators. Using the combinator approach. Turner presents a method for the compilation of a high level language. He presents a method for eliminating the bound variables in an expression in order to produce combinator code. He also develops a machine to execute this code, known as the S-K Reduction Machine. By progressively reducing the combinator code, the expression is transformed to a number, or some other final value. This transformation method is unconventional. In the traditional "fixed program" machine, the code is is not modified once it has been compiled. The importance of the transformations in Turner's machine is that expressions are replaced by mathematically equivalent ones. The reduction rules in Turner's machine follow a policy of normal order reduction. This is simple to follow, but also has the advantage of being known to terminate (whenever termination is possible). This is important since it allows the machine to support non-strict functions. Turner made some comparisons of his S-K Reduction Machine against an SECD machine. (The SECD machine, introduced by Landin [15], has become the standard approach in implementing functional programming languages.) Favorable results for the S-K machine were gathered when comparing the size of the required code and the speed of execution (i.e., the number of steps required to complete execution). Turner also points out that the S-K machine is actually more powerful than the SECD machine since the S-K machine follows a normal order reduction, while the SECD machine is an applicative machine. Modifying the SECD machine to handle unevaluated parameters (i.e. normal order reduction) appeared to slow the machine by an order of magnitude. Jones [ll] also investigated the efficiency in using combinator code (as Turner proposed) compared to implementing a lambda calculus reducer. He studied three reducers: a normal order lambda reducer, an applicative order lambda reducer, and a combinator reducer. He discovered that the combinator reducer outperformed both

8

lambda reducers. Among other conclusions, his results point out the high cost of the environment lookup for the lambda reducers. He concluded that combinatorial implementations of functional languages were competitive with other alternatives. There are two other important features of Turner's machine. It requires the evaluation of a common subexpression only once, no matter how many times the subexpression is included in a program. A second benefit is that it allows a programmer to introduce abstractions into his program without any penalty in execution. The first time an abstraction is used, it is replaced by the necessary expansion. These benefits are not unique to the S-K machine, but occur because the machine is a reduction machine. However, even if the machine is compared against a lambda calculus reduction machine (that performs j3-reductions). the S-K machine outperforms the latter. This is because the use of combinators allows for very simple reductions rules which can be performed faster than /3-reductions. Turner presented another algorithm for the abstraction process which handles special cases in a more efficient manner than his original method [20]. The algorithm keeps the number of combinators constant, therefore keeps the machine instructions at a small number. Abdali proposes an alternative algorithm that also handles the use of multivariable abstraction, but uses an infinite number of combinators that need to be implemented at the machine level [l]. We will provide a detailed study of Abdali's approach later in this thesis. Other researchers have found Turner's work to be a strong foundation to build upon. Whereas Turner implemented the S-K Reduction Machine in software, Clarke et al. [3] investigated how Turner's ideas could be implemented in hardware. The SKIM (The S, K, I Reduction Machine) implementation views the combinator rules as machine code and implements a fixed program in microcode to execute the combinator sequences of user programs. The SKIM investigators were pleased with the outcome of their project. The

9

simplicity of the combinators lead to a equally simple, yet fast, hardware design. Favorable results with the SKIM project led to a successor implementation. The SKIM II project recognized improvements on microcoding, methods, and algorithms [18]. SKIM II provides the environment for software experimentation in the methods for combinator reductions. In particular, the project team is investigating the compilation of functional programs into microcode. Various optimizations to Turner's methods are presented, although the authors point out that these modifications are invisible to the high level programmer. The SKIM II project should be able to easily incorporate the use of multiple processors and experience increased performance. It is also pointed out that current methods for combinator generation are far from satisfactory, the project should be able to make use of any improvement in that area. Another area of investigation into implementing functional languages is that of compiling the functional programs. For conventional languages, compiled code generally executes faster than interpreted code. Jones and Muchnick present an approach for translating combinator code into fixed-program code [lO]. Their method translates combinator code into stack machine code. This compiled code is then evaluated. Their evaluator operates by reducing a given combinator to its head normal form. This form is required by an extension to their algorithm to perform call-by-need. If normal order (call-by-name) evaluation is followed, the combinator expression can be reduced to a value. The call-by-need implementation allows some improvement in execution by generating pointers to common code so that it needs to be executed only once, regardless of the number of times it is called. Jones and Muchnick indicate that their methods are comparable to the S-K Reduction Machine and the SKIM approaches. They also suggest further improving the execution by adapting the code for execution by a multiprocessor system. It became apparent to us that

10

this was exactly the approach we would take. In all the work on combinators, a common theme to emerge is the need to find a set of combinators that is optimal in expressive power. In addition to the work of Turner and Abdali in finding optimal combinators, Hughes [8a] explores "super-combinators". Pointing out several problems with Turner's basic approach, including slow compilation due to the optimization rules and the numerous passes over the code as each variable is abstracted, Hughes proceeds to describe an approach to overcome these problems. His approach introduces super-combinators, which are a generalization of the class of combinators. Any X-expression that contains no free variables and has a body that is an applicative form is a combinator. (An applicative form consists of variables and constants joined by application.) Combinators are considered to be constants, so a combinator can be built from other combinators. This leads to an infinite number of combinators. A compiler would not need to contain definitions of these infinite number of combinators, but must be able to generate the definitions for the combinators it uses. Each program would have a unique set of combinators compiled. The lambda calculus is considered to be the canonical programming language. Hughes chose the language of constant applicative forms (cafs) as the graph-reduction machine code. He points out that Turner's conversion from lambda calculus programs into combinators achieves full laziness by breaking down the computation into very small, independent steps. The machine code is far removed from the program code, making it very difficult to debug. Hughes presents a new method that increases the size of the granularity of the steps of compulation. Refining the process of optimizing the granularity of the combinators. Hudak and Goldberg [6,7] introduced "serial combinators". They present a method for translating a functional program into serial combinators suitable for execution on a multiprocessor system that employs no shared memory. The serial combinators are intended to improve

11

on the notion of the super-combinator by making them larger to retain locality and improve efficiency, but also by ensuring that no parallelism is lost. Compared to Turner's method, which requires n abstractions for the translation process and n reductions to execute an expression with a free variable at lexical depth n , the serial combinator method requires constant overhead to translate from X-expressions into combinator notation. Hudak's and Goldberg's intent is to provide a general-purpose system on which a user could write and debug a functional program which is then run on a parallel machine for improved performance. The parallel machine has no special need for communications or synchronization primitives or for special parallel constructs. This is in contrast to the work on the Rediflow project [12,13], where the programmer defines the granularity by source level function definitions. Another graph reduction machine is the G-machine which was designed by Johnsson [9]. The work was based on Turner's combinator approach, but instead of using a fixed set of combinators, each user defined function is used as a "combinator", actually a rewrite rule. Functions are compiled into code for the G-machine which, when executed, creates and reduces expression graphs to reduce expressions to their values. Kieburtz [14] presents an evaluator based upon the G-machine's abstract architecture. He points out that control in a programmed graph reduction is specified by a sequence of instructions. These instructions are statically derived by the compilation of the applicative expression. This contrasts with combinator reduction, where control is derived dynamically from the combinator expression. It is suggested that a programmed reduction system allows the use of current technology in that current computer architectures can execute the machine code. However, as we have seen, functional languages are not supported very efficiently on von Neumann architectures. The maintenance of and access to the environment appear to be the primary source of overhead when evaluating functional language programs with conventional

12

evaluators. The Kieburtz evaluator alters the importance of overhead of the environment by eliminating non-local variables, creating suspensions rather than closures, saving state in hardware, utilizing a separate processor to handle memory management, and the use of the register space as stacks.

2.3 Lambda Calculus

2.3.1 Functional Languages Compared to Imperative Languages

Traditional imperative languages force the programmer to think of the flow of control through a program. This is brought on. in part, by thinking of variables as storage locations. Program statements are history sensitive, forcing the value of a variable to be dependent on the order of computation. This demands thai the programmer keep track of the interaction different sections of code have on the same variables. This presence of "side effects" reduces the capability of parallelism in execution. Functional languages avoid some of the problems associated with the conventional languages. There can not be any side efl"ects because the model does not include assignment statements. The programmer, therefore, does not need to be concerned with the flow of control through a program, i.e.. the programs are not history sensitive. The value of an expression depends only on its context. As a result, an expression can be evaluated at any time and replaced by its equivalent value.

2.3.2 Functional Languages and the Lambda Calculus

Functional languages are based on a mathematical model known as the lambda calculus. It provides a formal method to study functions and function application. We will use the lambda calculus model to define our functional language.

13

2.3.3 Applicative Expressions

Functional languages are based on applying a function to its arguments. An applicative expression [15] (AE) is either 1.

an identifier.

2.

or a X-expression. which has a bound variable part that is either an identifier or an identifier list, and a X-body. which is an AE,

3.

or a combination, consisting of an operator (rator), and an operand (rand). Both the rator and the rand are AEs.

Example Applicative Expressions Expression

Type

X \a . a + 1 \ a .\ b . a / ( 2 x 6 + 3 ) [Xa . a + l](4)

identifier X-expression X-expression combination

In the combination above, [Xa .a + l] is the rator, or function. It is an AE. with \a representing the bound variable part, and a + 1 representing the X-body. The rand of the combination is the (4). Function application occurs by substituting the 4 for every occurrence of a in the X-body.

2.3.4 Useful Properties of the Lambda Calculus

There are certain features of the lambda calculus that make the model very useful. One such property, referential transparency, is due to the absence of assignment statements. When an expression is evaluated, it can be replaced by its equivalent value at any time. Unlike updating a variable, there are no possible side effects from replacing the

14

expression with an equivalent form. This equivalence provides an opportunity to optimize. Once the evaluation is completed for a particular expression, the resulting value can be used later without re-evaluating the original expression. This could result in a savings of time and resources during program execution. Another feature of the lambda calculus is called the Church-Rosser property [4]. It allows expressions in the lambda calculus to be evaluated in any order. If different orders of evaluation all terminate, we are guaranteed they each yield the same result. This means we can evaluate the rators and the rands in any order. In fact, we may even evaluate the rators and the rands in parallel and apply the results of the rators to the resulting rands. The following example illustrates the ability to evaluate an expression in different orders. Part (a) of the example uses an outside/in evaluation order, while part (b) uses an inside/out approach. (a) [Ax . ([\2 . ([Xy . z XyKx + 2))](3))K2) = [Xz . ([Xy . z X y ](2 + 2))](3) = [Xz . ([Xy . z X >'](4))K3) = [Xy .3 X y](4) = 3X4 = 12

(b) [Xx . ([Xz . ([Xy .2 XyKx + 2))](3))](2) = [X.t . ([Xz . z X (x + 2)](3))](2) = [Xx . 3 X (x + 2)K2) = 3 X(2 + 2) = 3X4 = 12

This next example illustrates the importance of convergence for two different evaluation orders to produce equivalent results. Given the AE: [Xx . 2]([Xy . y y ](Xy . y y ))

15

First, applying the rator to the rand, we evaluate the expression to 2. Now, suppose we choose to evaluate the rand before making the function application. We get: [XJC . 2]([\>' . y y ](XY • Y 3* ))

The rand evaluates to itself! It is easy to see that if we choose to always evaluate the rand in this example, then the evaluation will never converge. The following example is designed to allow parallel evaluation of the rator and the rand. First, a sequential evaluation is demonstrated in part (a). (a) [[Xx . k y . X + 1](1) K[\2 - Z + l](2)) = [[Xx .\y .X + 1](1)K2 + 1) = [[Xx . Xy . X + l](l)](3) = [Xy . 1 + 1K3) = [\y . 2](3) =2

This particular expression could have the rator and the rand evaluated in parallel. One machine could evaluate the rator and then wait for the evaluation of the rand to complete (via a second machine) and finish the expression evaluation by applying the resulting rator to the final form of the rand. Part (b) demonstrates this optimization: (b) Machine 1 [Xx . Xy . X + l ] ( l ) = Xy . 1 + 1 — Xy 2

Machine 2 ([Xz . z + lK2)) =2 + 1 =3

To finish. Machine 1 gets the result from Machine 2 and applies the rator to the rand: = [Xy . 2](3) = 2

16

2.3.5 Correspondence Between a Functional Language and the Lambda Calculus

A simple functional language consisting of let expressions illustrates a correspondence between functional languages and the lambda calculus. A let expression creates a local environment by binding the variables. The general form of the functional language is; let / UJ.X2, • • ,x„) = E in M

This corresponds to the X-expression: [\/ . A/](\(XI,X2. • • • ,A:„ ) . £) A simple example illustrates this mapping: let / (a: ) = X + 1 in /(2) corresponds to Wf . / ( 2 ) ] ( \ x . x + 1) =

[Xx

.X

+ l](2)

= 2 + 1

=3 The next example illustrates the static scoping used in the lambda calculus. let a = 1 in let / (x ) = a +2 in let a =5 in /(17) corresponds to

17

[\a . [\/ . ([Xa . / (l7)](5))](Xx . a + 2)](1) = [\a . [ \ f . f (17)](\z . a + 2)](1) = [\a . [Xx . a + 2](17)](l) = [Xa . a + 2](l) = 1+2

= 3 The environment for the function / (x ) includes the binding of a to 1. Therefore, when the function is called with an argument of 17, the result is 3. If dynamic scoping had been used the result would have been 5. The next example illustrates the need to rename variables in order to avoid confusion regarding the scope of the bound variables. let % = 2 in let X =3 in let y = X +2 in X Xy

corresponds to [Xx . ([Xx . ([Xy

.X

XyKx + 2))](3))](2)

Upon renaming, we get the following: [Xx . ([Xz . ([Xy . z Xy](x + 2))](3))](2) = [Xz . ([Xy . z X y ](2 + 2))](3) = [Xz . ([Xy . z Xy](4))](3) = [Xy . 3 + y ](4) =3x4 = 12

2.4 Combinators

Another representation of functional languages is the combinator model. This model allows us to eliminate the use of variables. The role of these eliminated variables can be expressed by primitive operators, called combinators. together with a mechanism for applying a function to its argument.

18

2.4.1 Useful Properties of the Combinator Model

Two properties of the combinator model are of interest. First, the evaluation permits non-strict functions to be implemented at no extra cost. This means that not all arguments to a function must be evaluated, i.e.. an operand will only be evaluated if and when it is needed. The lambda calculus model does not support non-strict functions unless some modifications are made (at an extra cost). Second, user defined functions are inexpensive to use (after the initial overhead cost of the first call).

2.4.2 Correspondence of the Lambda Calculus and Combinators

We are interested in transforming \-expressions into combinator expressions. (We already have seen how to translate a high level functional language into À-expressions.) This step will allow us to take a high level language and "compile" it into a form that can be executed by a simple combinator machine. The instruction set of this machine consists of the combinator rules. As we will see. the use of combinators will eliminate the need for bound variables and the environment model to represent them. Turner discusses a method for removing variables from programs which is based on three initial combinators: S fgx =fx(gx) K X y=X I x =X A special abstraction operation denoted by [x]E is used to remove all occurrences of x in E. where x is a variable and E is an expression. The abstraction operation is defined for combinators S. K. and I as follows: [XKEjEJ) - S([x]E,)([X]E2) [x]y -» K y X [x]x -* I

19

where Ej and

are expressions and where y is a constant or variable (other than x).

The resulting code from abstracting the variables can be very large. There are some optimization rules: S ( K E,)( K E,) - K (E.EJ S(KEj)I ^Ej S ( K E )E -* B EjEj if no earlier rule applies S Ej( K E ) -» C EjEj if no earlier rule applies j

2

j

These optimization rules account for special cases where the variable being abstracted is not present in all subexpressions. B and C are combinators to handle these special situations. An example conversion from a À-expression to combinator expression illustrates the elimination of the variables via the above rules. The following example was mapped from a simple functional language into a \-expression in section 2.3.5. [Xf.f(2)](\x.x+1)

1.

Conversion (abstract both f and x) ([f]f(2 ))([x](plus X 1)) = ( S ([f]f)([f]2 )X S ([x] plus x)([x]l)) = ( S ( I X K 2 )X S ( S ([x]plusX[x]x))( K D) = ( S (I X K 2 )X S ( S ( K plus)( I )X K D)

2.

Optimization = C I 2( S(plusK K D) = C 1 2( C plus 1)

Notice that the variables f and x have been eliminated from the final expression.

20

2.4.3 The S-K Reduction Machine

The S-K Reduction Machine evaluates the combinator expressions resulting from the variable abstraction. The evaluation mechanism is through transformation of the combinator expression. Reduction rules are applied to each stage of the transformation until a value is obtained. Some of the reduction rules are: S fgx-»fx(gx) K X y -»x C f g X -»(f x) g B f g x -•f (g x) I X -• X and also the primitives, such as: plus X y -» x+y etc.

To continue with the example from the above section: 3.

Reduction = I ( C plus 1 )2 = C plus 1 2 = plus 2 1 =3 The S-K reduction machine transforms the combinator expression with each

reduction step. This is unlike a conventional fixed program machine that executes the code, usually without modification. The transformations that occur replace expressions by mathematically equivalent ones, e.g., plus 1 2 is replaced by 3 so that the actual evaluation of plus 1 2 only occurs once. The order of the reductions is defined as normal order. At each step, the leftmost reduction is performed. This method is not only simple to use. but is guaranteed to terminate (if possible). Normal order reduction is what we previously called the

21

outside/in evaluation. The other order we looked at, inside/out, is called applicative order reduction. This involves performing the innermost reductions first. Applicative order reduction is not guaranteed to converge. Also, applicative order reduction does not support non-strict functions. Each subexpression must be evaluated, which can cause problems. The normal order reduction method supports non-strict functions.

22

3. A METHODOLOGY TO GENERATE COMPUTER ARCHITECTURES

3.1 Abdali's Abstraction Algorithm

In an attempt to improve upon Turner's combinator sequences. Abdali presents an algorithm to carry out a new abstraction method. His method results in combinator code consisting of a different set of combinators. It performs a one step abstraction on multiple variables and yields a single combinator for the group of variables. In comparison. Turner's method requires that the abstractions be nested and a combinator sequence be generated for each variable being abstracted. To accommodate multiple variable abstractions. Abdali introduces the notion of a family of combinators. Each combinator family has an infinite number of members, but each member behaves in a manner dictated by the family. Consider, for example, the K combinator that Turner uses. Its behavior can be characterized by the reduction rule K X y - * X . A b d a l i m a k e s u s e o f a f a m i l y o f Ar„ c o m b i n a t o r s , w h e r e T u r n e r ' s K

combinator is actually a member of the set. in particular K j. The action of the K,, combinator family is given by the reduction rule: K„ a b I ••• b„ -* a

It is easy to see that Turner's K follows the family's reduction rule, for n =1. The other combinator families employed in Abdali's scheme contain members whose actions we are already familiar with: I X -*x B f X y

f {x y)

The more general family combinators have the following reduction properties: A'" a 1 • • • a„ -» a,„ a hi

n

^1

bm Ci ••• c„ -• a (ft 1 cJ • • • c„ ) • • • (6„, cI • • • c„ )

m ,n ^ 1

23

By inspection, we can see that the combinators we are familiar with are indeed members of the above families. Suppose m=\ and %=1: // X -*x Bi f X y ^ f {x y) A working description of Abdali's abstraction algorithm will be given. For a more

formal treatment, see the original work [l]. An expression of the form [xj • • •

]e

indicates that the variables xi • • • x„ are being abstracted from the expression e . Given such an expression, the following rules are applied, in order, to perform the abstraction. 1.

If X i is not contained in e for all i, then 1

[x I • • • x„] e -* K„ e

Examples: [x ] a - * K i a [x.y]l -^K2 7 [x.yl + a b -* K 2"^ a b

2.

If e matches x i • • •

. then [xi - • • x„]e -»/

Examples:

[x .y ] x y -*/

3.

If e matches an x, for some i. then [ x I • • • x „ ] e - * I!,

Examples: [x,y]x [x.3'.z]y

1^1

24

4.

If g is of the form g x ^ - • •

and x - , is not contained in g for all i , then

[xI • • • x„] e -*g

1

Examples: [x ] + X -» + [x ,y ] + a: y -* +

5.

If e is of the form g x „ • • • x „ and x, is not contained in g for i ^ m . then [xi • • xje -*[x 1 - x^.Jg Examples: [x J ] + y -*[x] + [x .y .2 ] + + 1 w 2 -• [x .y ] + + 1 w

6.

If e is of the form x, f 2 [x 1

f m for some i . then

• x„ ] e -» 5^" I

([x 1 • • • x„ ] / 2) • • • ([x 1 • • • x„ ] /,„ )

Examples: [x ] X a b c - * B I I /1' ([x ] a ) ([x ] 6 ) ([x ] c ) [x ,y ] y a b c - * 8 2 I 1 2 ([x ,y ] a ) ([x .y ] 6 ) ([x .y ] c ) [x .y]x a b -*82 1 I2

.y]a) ([x .y ] 6 )

[/] / 2 - 5 ? / l l ([/]2) 7.

If e is of the form f \ f 2 ' ' ' Ï m where / 1 is the longest initial component not containing an x,- for all i, then [x 1 • • • x„ ] e -» 5^""' / 1 ([x 1 • • • x„ ] / 2) • • • ([x 1 • • • x„ ] f , „ ) Examples: [x ] a (6 X ) c - * B i a ([x ] 6 x ) ([x ] c ) [x .y ] a b ( c y ) ( d x ) - * 8 ? ( a ô ) ([x .y ] c y ) ([x .y ] [x ] +

X

1

81 + ([x ] X ) ([x ] 1 )

x)

25

3.1.1 Demonstration of Abdali's Abstraction Method

A few examples of converting simple functional programs into combinator code will illustrate how Abdali's abstraction method works. Later in this chapter we will compare combinator sequences obtained by Abdali's method to those obtained when Turner's original abstraction algorithm is followed. Example 1 (Function Definition) let / (x ) = z + 1 in / (2) Lambda calculus notation: [X/ . / 2] (\ x . + x 1) Abdali's abstraction method:

([/]/ 2) ([x] + X 1) Rule 6

B f I 1 } ([/ ] 2) ([x] + a: 1)

Rule 1

Bi I II Ui2)([x] + x 1)

Rule 7

B l I II U i 2 ) 5 f + (U]x) ([x] D)

Rule 2

Bl I II {KxDB^ + / ([X] D)

Rule 1

B} I n

2) (5i2 + / (ATi D)

Reduction: B ? I // (JiTi 2) { B ! + I Ui D) - » / U } W f + I i K i 1))) K i 2 ( . B ^ + I (/fi 1)))

- » / i ' ( B f + / i K i D ) U , 2 ( g f + / (AT, 1 ) ) ) + / Ui 1) (%i 2(5f + / (A:i 1))) - + ( / ( A : i 2 ( 5 f + / ( i i f i 1 ) ) ) ) ( j f i 1 C f i 2 ( a f + y ( a Ti i ) ) ) ) -+ Ui 2 iBl + / - + 21 -»3

1))) 1

26

Example 2 (Multiple Parameters) let / (a .6 ) = a + b in / (3,4) Lambda calculus notation: [X/ . / (3 4)] (\(a , b ) . + a b ) Abdali's abstraction method: ([/ ] / 3 4) ([a ,6] + a 6) Rule 6

/ 7/ ([/ ] 3) ([/ ] 4) ([a .6] + a 6 )

Rule 1

/ 7/ (A-j 3) ([/ ] 4) [a .6] + a 6)

i?uZe 1

1 II (A'i3)(A-i4)([a,fe] + a b )

Rule A

af7 7f Cfi3)(#i4) +

Reduction: B l I I Î U x 3) (A:, 4) + ^ 7 (7/ +) (Ai 3 +)(%! 4 +) -7i' + (À-, 3 +) (A-j 4 +) -+ + (A"1 3 +) (A 1 4 +) -* + 3 (A" 1 4 +) -»+ 3 4 -7 Example 3 (Nesting) let a =1 in let 6 =2 in a •¥ b

Lambda calculus notation: [Xa . [X6 . + a 6 ] 2] 1 Abdali's abstraction method; ([a] ([^] + a 6)2) 1 Note the abstraction is performed from the inside of the expression to the outside.

27

Rule 4

([a ] + a 2) 1

Ridel

5 ? + ([a]a) ([a]2) 1

Ridel

5f+/([a]2)l

Ride 1

+ I { K i 2) 1

Reduction B l + I { K i 2) 1

- + (/ 1) U i 2 1) -»+ 1 U i 2 1) -•+ 1 2

-3

3.1.2 Comparison of Abdali's Method and Turner's Method

The primary reason we became interested in Abdali's work was that we were looking for a way to use simultaneous definitions in the lambda calculus and still have the capability to convert programs into combinator sequences. Consider the following program: let a =1 and 6=2 in a +6 Turner's method would force us to curry the lambda calculus notation for this program; [Xa . [X6 . (+ a ) 6] 2] 1 The above notation is identical to what we would arrive at for let a = 1 in let 6 =2 in a •¥ b

Thus, the necessity of currying forces the nesting of the variables where nesting is not

28

really desired. What we want is for the variables a and b to be declared in the same block, and the abstractions to clearly depict the simultaneous nature of the variable declarations. Suppose we represent the above program as [\(a , h ) . + a 6] (1.2) where the (a .6) represents the simultaneous definition of the variables. It is now an easy job to translate the lambda calculus notation into a combinator sequence using Abdali's method. ([a .6] + a 6) (1,2) Rule 4

+ (1.2)

We won't always see such a simple outcome, but it is clear that Abdali's method will handle simultaneous definitions as well as multiparameter functions. To further demonstrate this multiparameter capability. Example 2 above shows that let / (a .6 ) = a + 6 in / (3,4) translates to [\/ . / 3 4] (X(a ,6) . + a b ) . This lambda calculus equation can easily be handled by Abdali's abstraction method. In order to apply Turner's abstraction rules, the equation must be curried to [X/ . (/ 3)4] (Xa . (X6 . (+ a ) )). This currying operation forces us to abstract a and b in sequence rather than simultaneously, thus making the abstraction process a little longer. Lambda calculus notation: [X/ . (/ 3)4] (Xa . {Kb . (+ a ) 6)) Turner's abstraction method: ([/](/ 3)4)([a]([6](+a)6)) (5 ( [ / ] / 3)([/]4))([a](5 ([6] + a ) ([&]&))) (S (S ( [ / ] / ) ( [ / ] 3 ) ) {K 4)) ([a](S (S ([6]+) ([ftja))/))

29

(S (S I i K 3)) ( K 4)) ([a] (5 (S ( K +)

a)) /))

(S (C / 3) (Jir 4)) ([a] (5 (.K (+ a)) /)) C (C / 3) 4 ([a ] + a ) C (C / 3)4(S ([a]+) ([a]a)) C (C / 3 ) 4 ( 5 (JT + ) / ) C (C / 3) 4 + Reduction: C (C / 3) 4 + -»C/3 + 4 -7 + 34 -+34 -*7 Upon further study of Abdali versus Turner combinator representations of programs, it became apparent that in certain cases Abdali's representations were inferior to Turner's due to the sheer number of combinators generated. Clearly Abdali's method offers the capability to handle multiple parameters while Turner's algorithm does not, but in some situations Turner's code is more desirable. For example, consider Example 3 from above. let a =1 in let 6 =2 in a +b Abdali's method generates the combinator sequence code for this program is much shorter: C + 2 1.

+ I ( K i 2 ) 1 . Turner's optimized

30

3.2 Introduction of New Combinators

The logical step was to somehow combine the features of both methods. We wanted reasonable combinator sequences to be generated, but did not want to lose the capability to handle multiple parameters simultaneously. Turner's code is an optimized version, so we attempted to model our new abstraction scheme on Turner's set of combinators. At the same time, we decided to keep the notion of families of combinators that Abdali uses. Turner's rules for abstraction are repeated again below. -"S ( U ] £ i) ( U ] £ 2 ) [x]x

[x]y -*K y

Abdali includes the I and K in his translation rules as follows: For [ x I • • • x „ ] e

1.

If X; is not contained in e for all i , then 1

[xi • • • x„]e -* K„ e

2.

If e matches x j • • • x „ . then [x 1 • • •

3.

]e

If e matches an X, for some i. then [x 1 • • • x„ ] e

1

We introduce an Abdali-like translation rule for S,'," : If e is an application F A . then [x 1 • • • x„ ] e -• S,/ ([x 1 • • • x„ ] F ) ([x 1 • • • x„ ] /i ) Extending this rule for the more general case, we get the following:

31

If g is an application grouping f \ f 2 ' ' ' f m A . then [x 1 • • •

] c -»

S™ ([x 1 • • • X„ ] / 1) ([x 1 • • • Xn ] / 2) • • • (U 1 • • • X„ ] /m ) ([x 1 • • •

] /I )

The Sn combinator is defined by Sn"* X y 1 • • •

z 1 • • • z„ -• (x z 1 • • • z„ ) (y 1 21 • • • z„ ) • • • (y„ z 1 • • • z„ )

It is easy to see that the S combinator Turner uses is a member of the S™ family. S f g x = S } f g x = f x ( g x )

We now have three families of combinators: SI,". K„ . and /"'. We consider / as a special case of the family Z™. Turner used members of these families. While they are sufficient to remove all variables from programs, long-winded combinator code results. To improve the code, extra combinators B and C were introduced. We will also include the families of B'" and C," in our new set of combinators. Abdali already introduced the 5™ family of combinators in his work. 5™ a 61 • • • 6,„ c 1 • • • c„

(61 c 1 • • • c„ ) • • • (6,„ ci • • • c„)

The C/i" combinator family is similarly defined. Cn f \ -

fm g

\ XI • • • x„) • • • i f

X^

• x„) g

The optimizations for these combinators. patterned after Turner's results, follow: Opt 1 5™ {K„ Eo) U n E 0 -

- U „ E„, ) - * K „ { E o E ,

Opt 2 s: {K„ Eo) In' 4' " Jn"

Opt 3 £0) E , - - - E , „ ^ 5- E ^ E , • • • E n , Opt 4 S„'" £0

Un En, ) -» a;' £0

• E„,

--

)

32

We can show that the left and right hand sides of these equations are always equal by applying both sides to arbitrary x j • • •

and simplifying.

Opt 1 S™

Eo)

E0--U„ E^)xi--x„

-*iKn E Q X I - • • x„) (.K„ El X I • • • x„) • • • (.K„ E„ X I • • • x „ ) -*EOEI • • • E„ and K„ {EoEi • • • E,„) x i • • • x„

EÇ^EI • • • E„, therefore

T (Ar„ £o) (^„ E L ) - - ( K „ E . „ ) - > K „ ( £ „ Opt 2 5„"(Ar„ E o ) 7 „ ' l „ ^ - - - I „ " x i - - - x „ -*(,K„ EQXI -

X„) UN^ XI - - • X„)

XI

- • X„) - - - (//; X 1 • • •

— £ ( ) X i X2 • • - Xn and j?0 *1 ^2 • • • therefore 5„" ( K „ E o ) I n

• - - la XI • - - x„

Opt 3 S™ (/:„ E(0 EL - - - E„, X1 • • • x„ -» (^n

1• • •

-• £(, (-^1 ^ 1 • • •

) (-^l X I • • • X„ ) • • • (£„, X 1 • • • X„ ) )• • •

and B!,"E„EI

X , • • • X„

^1 • • •

)

)

33

- * E q { E I X I - • • Xn) • • • i.E^ X1 • • •

)

therefore SIP {KN E Q ) EL' •• EN,

BH' E Q E I • • • EM XX • • • x „

Opt 4 S n Eq Ei - • • EM- l i-Kn

Xi - • • x„

-*iEo Xi • • • x „ ) ( E l Xi • • • x „ ) • • • ( E „ - i Xi • • • x „ ) (.K„ E,„ x i • • • x „ ) -*(EoXi - •• x„) (El xi -

x„) •• • (Em-i xi - • • x„) E„

and Cn Eo Ex • • • E,„ xx - • • x„ -*{.Eq Xx - • • x„){ExXx - • • x„) • • •{E^.x'^y - • • therefore SX E^Ex -

- E„,.x

-C r £ o - ^ 1 • • • E „ ,

3.2.1 Revisiting Examples Using New Abstraction Rules

It will be useful to study some examples to see if the proposed set of abstraction rules produce desirable code sequences. We will revisit some earlier examples. Example 1 let / i x ) = X +1 in

/ (2) Lambda calculus notation: [\/ ./ 2] (X x . + .r 1) New abstraction method: ( [ / ] / 2) ([z ] + z 1)

S i ( [ / ] / ) ( [ / ] 2) (5^ ax]+)([x]A:)([x] D) 5 / I ( K i 2 ) i S ^ (Kx + ) / (ATi D ) optimizing once yields

34

S} I iKi2) {Bl + / {Kl D) and optimizing again yields C/ / 2(5f + I U i D) Reduction: 1 IkBl + 1 {Kl D) -»(/ {Bl + I {Kl 1)))2

+ / {Kl 1) 2 -^+ (/ 2) {Kl 1 2) - + 2 Ui 1 2) -»+ 2 1

-3 One way of measuring the desirability of the code produced is to count the number of combinators that result from each method. New abstraction method: C l I 1 { 3 \ + / ( i f i 1 ) ) Abdali's method: B i I // { K i 2) { B l + 1 { K i 1)) Turner's method: C I 2 { C + 1 ) In this example, we see that our new method produced code with a number of combinators between the number resulting from Turner's method and from Abdali's method. Since we have added the ability to handle multiple parameters and simultaneous definitions to Turner's original method, we should expect to pay the price by producing longer code. Similarly, since we have introduced optimizations to Abdali's technique, we would expect to get code that is the same length (if the Abdali code is optimal) or shorter. The cost to have the shorter code is a longer compilation phase, but the method gives us (potentially) a shorter execution phase.

35

Example 2 let / (a ,& ) = a + 6 in / (3,4) Lambda calculus notation: [\/ . / (3 4)] (X(a ,&) . + a &) New abstraction method: ([/ ] / 3 4) ([a .è] (+ a b)) Sf ( [ / ] / ) ([/ ] 3) ([/ ] 4)

( [ a . b ] +) ([a ,6] a ) ([a .6] i ))

/ Ui 3) (iifi 4) (S| (%; +)

7| )

optimizing Cf / Ui 3) 4 (5| Uz +)

/I )

optimizing C l I U i 3) 4 +

Abdali's method: B i I 1 } ( . K i 3) (A'l 4) + We have previously discussed the necessity of currying in order for Turner's original method to handle the above example. Turner's abstraction process requires more work than our new technique. Turner's method: C(C/3)4 + Note in this example that our new technique produces code that is just as efficient as Turner's, yet is produced in fewer abstraction steps. Reduction: Cf I ( K i 3) 4 + - ( / + ) (A-i 3 + ) 4 —* + ( K 1 3 + ) 4 ->+3 4 -»7

36

Example 3 let a =1 in let 6 =2 in a +6

Lambda calculus notation: [\a . [\6 . + a 6 ] 2] 1 New abstraction method: ([a]([&] + a 6 ) 2 ) 1

( [ a ] ( S f ( [ & ] + ) ([6] a ) ([6] 6)) 2 ) 1 ([a](Sf

+)Uia)7)2) 1

optimizing (Xa]{Bl + {Ky a ) 1)2) 1 S Î {[a]Bl + U,a)/)([a]2) 1

51^ (5? ([a]5f ),([a]+) ([a ] iif, a ) ([a ]/)) Uj 2) 1 5/ (5? iKiBl){Ki+){Sl ([a]/ri)([a]a))(iiri/))(ii:i2) 1 S/ (5? U i 5,2) U i +)(5/

I)iK^l)){Ki2) 1

optimizing 5i' (5? Ui Bf)(A\ +)

(A-i /)) U, 2) 1

optimizing Si

g f (A-j +) K i iKi / ) ) (/r, 2) 1

optimizing Cl

5,2 (A:, +) AT, (À-, /)) 2 1

Abdali's method: 5^ + / (AT, 2) 1 Turner's method: C + 2 1 Our method obviously has some faults also! The Abdali generated code is shorter than the code produced by the new abstraction method. One reason for this unexpected result is that the new method is designed to handle simultaneous definitions well, rather than the

37

nested declarations that this example contains.

3.3 Optimization of the New Abstraction Method

After observing several cases where Abdali's code was better in some sense than our new code, we decided that our method could be improved upon. The optimization rules for the new method were too restrictive in certain cases. Abdali's rules could handle some abstractions more efficiently than our current set of rules could. We studied examples to determine some of the inefficiencies of our method. We found that we could improve upon the optimization rules that handled the special cases of recognizing when a variable is being abstracted from one or more subexpressions. To accommodate this situation, we introduce the combinator families of

and

ymM •* n Y^ - • •

XI • • • X„ -*yo • • •

X I • • • X „ ) • • • (.y„ X I • • • X „ )

y n - ^ y o - - - y n , X i - - - x „ -*(.yoXi • • • x „ ) • • • ( y t _ i X i • • • x „ ) y^ • • • y,„ The optimization rules that produce these combinators are as follows: Opt 5 S™ ( K „ £ « ) • • •

£;._,) £ , • • • E,„ -* X'^'^ E n - • • E,„

S„"' £,>•••

£,)••• { K „ E , „ ) - yf"*

Opt 6 En,

As we did for the first set of optimization rules, we can show that these rules hold by applying both sides of the equations to arbitrary

• • • .t„ and simplifying.

Opt 5 S„"' (K„ E,t) • • • (.K„ Ek _i) E^ ••• E„ XI ••• x„ EaXi - • • x„) • • • (K„ Et

x i • • • x„ ) (E^ Xi • • • x„) • • • (f,„ x i •



)

38

-*EQ • • • E t - i i E k x y • • X „ ) • • - {En, X I - • • X „ ) and Xn"*-*

EO - • • EM

xi • • • x„

-•£0 • • • Ek-i{Ek x i • • • X n ) • • • ( E „ x i • • • x „ ) therefore T (^n Eo)--

Et-i) Ek •••E^-* Xir^ E^---Em

Opt 6 S„"' E o -

Et-i(.K„ E t ) • • (.Kn E ^ ) x i - • • x„

-» (£0 * 1 • • • ^/I ) • • •

-1 ^ 1 • • •

) (-^n Ek A: 1 • • • Xn ) • • • iK„ E„, X ^ ' • • X „ )

-*{.EqXi - • • X^) • • - {Et-iXi - • • x„) Et • • • E„

and riT''E o - - - E , „ x,---x„

-» (£0 ^ 1 • • •

) • • • (•£"* -1 ^ 1 • • •

) -Êit ' ' ' E„,

therefore

S^m_n E_0 ... E_{k-1} (K_n E_k) ... (K_n E_m) → Y^m_{n,k} E_0 ... E_m

(def do_I                               ; I x => x
  (lambda ()
    (cond ((> (hole_count 1 1) 0)
           (my_push (list 'do_I))
           (my_push resumeflag)
           (my_top))
          (t (cond ((< (length stack) 1)
                    (my_push (list 'do_I))
                    (my_push resumeflag)
                    (join)))
             (check_args 1 'Ictr)
             (my_trace (list 'do_I))
             (cond (resumed (setq resumed nil) (do_top)))))))

(def do_K                               ; K x y => x
  (lambda ()
    (check_args 2 'Kctr)
    (my_trace (list 'do_K))
    (setq stack (cons (stk1) (stk2r)))))

(def do_S                               ; S f g x => f x (g x)
  (lambda ()
    (check_args 3 'Sctr)
    (my_trace (list 'do_S))
    (setq stack (cons (stk1)
                      (cons (stk3)
                            (cons (cons (stk2) (stk3)) (stk3r)))))))

(def do_B                               ; B f g x => f (g x)
  (lambda ()
    (check_args 3 'Bctr)
    (my_trace (list 'do_B))
    (setq stack (cons (stk1)
                      (cons (cons (stk2) (stk3)) (stk3r))))))

(def do_C                               ; C f g x => f x g
  (lambda ()
    (check_args 3 'Cctr)
    (my_trace (list 'do_C))
    (setq stack (cons (stk1)
                      (cons (stk3)
                            (cons (stk2) (stk3r)))))))

(def toss                               ; drop the first n elements of l
  (lambda (n l)
    (cond ((eq n 0) l)
          (t (toss (- n 1) (cdr l))))))

(def find                               ; the n-th element of l
  (lambda (n l)
    (cond ((eq n 0) nil)
          ((eq n 1) (car l))
          (t (find (- n 1) (cdr l))))))

(def extract                            ; n elements of l starting at position s
  (lambda (s n l)
    (firstn n (toss (- s 1) l))))

(def firstn                             ; the first n elements of l
  (lambda (n l)
    (cond ((eq n 0) nil)
          (t (cons (car l) (firstn (- n 1) (cdr l)))))))

(def build                              ; apply each element of x to the argument list y
  (lambda (x y)
    (cond ((eq x nil) nil)
          (t (cons (cons (car x) y) (build (cdr x) y))))))

(def do_newS                            ; S^m_n: distribute the n arguments over E_0 ... E_m
  (lambda (m n)
    (setq idle_temp (max clock_ticks (untag_stack (+ (+ m n) 1))))
    (cond ((neq idle_temp clock_ticks)
           (setq overhead (+ overhead


                             (min local_overhead (- idle_temp clock_ticks))))))
    (setq idle_time (+ idle_time (- idle_temp clock_ticks)))
    (cond ((neq idle_temp clock_ticks)
           (my_trace (list 'wait (- idle_temp clock_ticks)))))
    (setq clock_ticks idle_temp)
    (cond ((> (hole_count 1 (+ (+ m n) 1)) 0)
           (my_push (list 'do_newS m n))
           (my_push resumeflag)
           (my_top))
          (t (cond ((< (length stack) (+ (+ m 1) n))
                    (my_push (list 'do_newS m n))
                    (my_push resumeflag)
                    (join)))
             (check_args (+ (+ m 1) n) 'newSctr)
             (my_trace (list 'do_newS m n))
             (setq stack (append (build (extract 1 (+ m 1) stack)
                                        (toss (+ m 1)
                                              (extract 1 (+ (+ m 1) n) stack)))
                                 (toss (+ (+ 1 m) n) stack)))
             (startprocs 2 (+ m 1))
             (update_clocks)
             (cond (resumed (setq resumed nil) (do_top)))))))

(def do_X
  (lambda (m n k)
    (setq idle_temp (max clock_ticks (untag_stack (+ (+ m n) 1))))
    (cond ((neq idle_temp clock_ticks)
           (setq overhead (+ overhead
                             (min local_overhead (- idle_temp clock_ticks))))))
    (setq idle_time (+ idle_time (- idle_temp clock_ticks)))
    (cond ((neq idle_temp clock_ticks)
           (my_trace (list 'wait (- idle_temp clock_ticks)))))
    (setq clock_ticks idle_temp)
    (cond ((> (hole_count 1 (+ (+ m n) 1)) 0)
           (my_push (list 'do_X m n k))
           (my_push resumeflag)
           (my_top))
          (t

;; the remainder of do_X falls on a page absent from this copy;
;; the listing resumes inside do_Z:

    (cond ((> (hole_count 1 (+ (+ m n) 1)) 0)
           (my_push (list 'do_Z m n k))
           (my_push resumeflag)
           (my_top))
          (t (cond ((< (length stack) (+ (+ m 1) n))
                    (my_push (list 'do_Z m n k))


                    (my_push resumeflag)
                    (join)))
             (check_args (+ (+ m 1) n) 'Zctr)
             (my_trace (list 'do_Z m n k))
             ;; apply the (k+1)-st entry to the n arguments, leaving the
             ;; other entries and the rest of the stack untouched
             (setq stack
                   (append (append (extract 1 k stack)
                                   (append (build (list (find (+ k 1) stack))
                                                  (toss (+ m 1)
                                                        (extract 1 (+ (+ m 1) n) stack)))
                                           (extract (+ k 2) (- m k) stack)))
                           (toss (+ (+ 1 m) n) stack)))
             (cond ((eq k 0) (startprocs 2 (+ m 1)))
                   (t (startprocs (+ k 1) (+ m 1))))
             (update_clocks)
             (cond (resumed (setq resumed nil) (do_top)))))))

(def do_newI                            ; I^m_n: select the m-th of the n arguments
  (lambda (m n)
    (setq idle_temp (max clock_ticks (untag_stack (+ (+ m n) 1))))
    (cond ((neq idle_temp clock_ticks)
           (setq overhead (+ overhead
                             (min local_overhead (- idle_temp clock_ticks))))))
    (setq idle_time (+ idle_time (- idle_temp clock_ticks)))
    (cond ((neq idle_temp clock_ticks)
           (my_trace (list 'wait (- idle_temp clock_ticks)))))
    (setq clock_ticks idle_temp)
    (cond ((> (hole_count 1 n) 0)
           (my_push (list 'do_newI m n))
           (my_push resumeflag)
           (my_top))
          (t (cond ((< (length stack) n)
                    (my_push (list 'do_newI m n))
                    (my_push resumeflag)
                    (join)))
             (check_args n 'newIctr)
             (my_trace (list 'do_newI m n))
             (setq stack (append (list (find m stack)) (toss n stack)))
             (cond (resumed (setq resumed nil) (do_top)))))))

(def do_newK                            ; K_n E x_1 ... x_n => E
  (lambda (n)
    (setq idle_temp (max clock_ticks (untag_stack (+ n 1))))
    (cond ((neq idle_temp clock_ticks)
           (setq overhead (+ overhead
                             (min local_overhead (- idle_temp clock_ticks))))))
    (setq idle_time (+ idle_time (- idle_temp clock_ticks)))
    (cond ((neq idle_temp clock_ticks)
           (my_trace (list 'wait (- idle_temp clock_ticks)))))


    (setq clock_ticks idle_temp)
    (cond ((> (hole_count 1 (+ n 1)) 0)
           (my_push (list 'do_newK n))
           (my_push resumeflag)
           (my_top))
          (t (cond ((< (length stack) (+ 1 n))
                    (my_push (list 'do_newK n))
                    (my_push resumeflag)
                    (join)))
             (check_args (+ 1 n) 'newKctr)
             (my_trace (list 'do_newK n))
             (setq stack (append (list (stk1)) (toss (+ n 1) stack)))
             (cond (resumed (setq resumed nil) (do_top)))))))

(def init
  (lambda ()
    (setq Xctr 0 newSctr 0 Yctr 0 Zctr 0 newIctr 0 newKctr 0)
    (setq stack '(a b c d e f l m n o p w x y z))))

(def new_top                            ; dispatch on the top of the stack
  (lambda ()
    (setq newtopctr (+ newtopctr 1))
    (setq idle_temp (max clock_ticks (untag_stack 1)))
    (cond ((neq idle_temp clock_ticks)
           (setq overhead (+ overhead
                             (min local_overhead (- idle_temp clock_ticks))))))
    (setq idle_time (+ idle_time (- idle_temp clock_ticks)))
    (cond ((neq idle_temp clock_ticks)
           (my_trace (list 'wait (- idle_temp clock_ticks)))))
    (setq clock_ticks idle_temp)
    (cond ((eq (stk1) resumeflag)       ; resume a suspended reduction
           (setq local (stk2))
           (setq stack (stk2r))
           (setq resumed t)
           (eval local)))
    (cond ((numberp (stk1)) (go (no_quote (retstk1))))
          ((atom (stk1))                ; a combinator: jump to its handler
           (progn (setq tem (stk1))
                  (setq stack (stk1r))
                  (cond ((atom stack)
                         (cond ((not (null stack))
                                (setq stack (list stack))))))
                  (go (no_quote tem))))
          (t                            ; an application pair: unfold it
             (progn (setq oldpair (stk1))
                    (setq stack (cons (caar stack)
                                      (ourappend (cdar stack) (stk1r))))
                    (new_top))))))

(def ourappend                          ; append that tolerates atomic arguments
  (lambda (x y)
    (cond


     ((null y) x)
     ((null x) y)
     (t (append (cond ((atom x) (list x)) (t x))
                (cond ((atom y) (list y)) (t y))))

)))
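;; Worked examples of the list helpers above (ours, not in the original
;; listing), assuming the definitions behave as written:
;;   (toss 2 '(a b c d))      => (c d)          drop the first n elements
;;   (find 3 '(a b c d))      => c              fetch the n-th element
;;   (extract 2 2 '(a b c d)) => (b c)          n elements starting at s
;;   (firstn 2 '(a b c d))    => (a b)          take the first n elements
;;   (build '(f g) '(x))      => ((f x) (g x))  pair each entry with the args
;;   (ourappend 'a '(b))      => (a b)          append tolerating atoms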

;; see that the top two stack entries are under evaluation, forking a
;; child processor for any entry that is still a compound expression
(def fix_args
  (lambda (toptemp)
    (setq idle_temp (max clock_ticks (untag_stack 2)))
    (cond ((neq idle_temp clock_ticks)
           (setq overhead (+ overhead
                             (min local_overhead (- idle_temp clock_ticks))))))
    (setq idle_time (+ idle_time (- idle_temp clock_ticks)))
    (cond ((neq idle_temp clock_ticks)
           (my_trace (list 'wait (- idle_temp clock_ticks)))))
    (setq clock_ticks idle_temp)
    (update_clocks)
    (cond ((is_hole (stk1))
           (cond ((is_hole (stk2)) (my_push toptemp) (my_top))
                 ((numberp (stk2)) (my_push toptemp) (my_top))
                 (t (setq stack (append (list (stk1) (fork (stk2) nil)) (stk2r)))
                    (my_push toptemp)
                    (my_top))))
          ((numberp (stk1))
           (cond ((is_hole (stk2)) (my_push toptemp) (my_top))
                 ((numberp (stk2)) nil)
                 (t (setq stack (append (list (stk1) (fork (stk2) nil)) (stk2r)))
                    (my_push toptemp)
                    (my_top))))
          (t (cond ((is_hole (stk2))
                    (setq stack (append (list (fork (stk1) nil) (stk2)) (stk2r)))
                    (my_push toptemp)
                    (my_top))
                   ((numberp (stk2))
                    (setq stack (append (list (fork (stk1) nil) (stk2)) (stk2r)))
                    (my_push toptemp)
                    (my_top))
                   (t (setq stack (append (list (fork (stk1) nil) (fork (stk2) nil)) (stk2r)))
                      (my_push toptemp)
                      (my_top)))))))

(def start_arg                          ; should the i-th stack entry be started?
  (lambda (i)
    (cond


     ((> i (length stack)) nil)
     ((is_hole (nthelem i stack)) t)
     ((numberp (nthelem i stack)) nil)
     ((> (hole_count 1 n) 0) t)         ; n is free here (dynamic scope)
     (t nil))))

(def is_hole                            ; is x one of the outstanding hole ids?
  (lambda (x)
    (def look
      (lambda (y)
        (cond ((null y) nil)
              ((eq x (car y)) t)
              (t (look (cdr y))))))
    (look hole_ids)))

(def appendl                            ; append a single element to a list
  (lambda (x y) (append x (list y))))

(def fork
  (lambda (mystack myret_stack)
    (setq procs_in_use (+ procs_in_use 1))
    (cond ((< max_in_use procs_in_use) (setq max_in_use procs_in_use)))
    (setq forkctr (+ forkctr 1))
    (cond ((atom mystack) (setq mystack (list mystack))))
    (cond ((atom (cdr mystack))
           (cond ((not (null (cdr mystack)))
                  (setq mystack (list (car mystack) (cdr mystack)))))))
    (cond ((atom myret_stack) (setq myret_stack (list myret_stack))))
    (cond ((atom (cdr myret_stack))
           (cond ((not (null (cdr myret_stack)))
                  (setq myret_stack (list (car myret_stack) (cdr myret_stack)))))))
    (setq ready (appendl ready mystack))
    (setq hole_ids (appendl hole_ids (gensym)))


    (setq ready1 (appendl ready1 myret_stack))
    (setq timers (appendl timers 0))
    (setq master_clocks (appendl master_clocks 0))
    (setq trace_stuff (appendl trace_stuff (list mystack)))
    (setq idle_list (appendl idle_list 0))
    (setq overhead_times (appendl overhead_times overhead_constant))
    (setq overhead_in (appendl overhead_in
                               (list (our_length mystack) (depth mystack))))
    (rplaca (last current_trace)
            (appendl (car (last current_trace)) (last hole_ids)))
    (car (last hole_ids))

))
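;; Sketch of the fork/join protocol, as we read the two routines here:
;; fork registers a new simulated processor: the child's stack frame is
;; queued on READY, a fresh gensym "hole id" marks where its result will
;; later be substituted, and one slot is added to each of the parallel
;; per-processor lists (timers, master clocks, traces, idle and overhead
;; accounts). fork returns the hole id, which the caller leaves in its
;; own stack as a placeholder. join, below, retires the current
;; processor: it substitutes the finished stack for that hole id in every
;; waiting frame (dsubst), records the timing summaries, pops the
;; per-processor lists, and resumes the next ready frame via new_top.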

(def join
  (lambda ()
    (setq procs_in_use (- procs_in_use 1))
    (update_clocks)
    (setq master_clock (+ master_clock overhead_constant))   ; overhead
    (setq proc_switches (+ proc_switches 1))
    (setq current_trace (appendl current_trace stack))
    (setq joinctr (+ joinctr 1))
    (setq ready (cdr ready))
    (setq over_out (our_length stack))
    (setq over_out1 (depth stack))
    (cond ((eq (length stack) 1) (setq stack (car stack))))
    (dsubst (list 'timeflag (list master_clock overhead) stack)
            (car hole_ids) ready)
    (dsubst (list 'timeflag (list master_clock overhead) stack)
            (car hole_ids) ready1)
    (setq time_counts (cons (list (car hole_ids) master_clock) time_counts))
    (setq traces (cons (list (car hole_ids) current_trace) traces))
    (setq idle_summaries (cons (list (car hole_ids) idle_time) idle_summaries))
    (setq overhead_summaries (cons (list (car hole_ids) overhead) overhead_summaries))
    (setq timers (cdr timers))
    (setq master_clocks (cdr master_clocks))
    (setq trace_stuff (cdr trace_stuff))
    (setq idle_list (cdr idle_list))
    (setq overhead_times (cdr overhead_times))
    (setq hole_ids (cdr hole_ids))
    (setq ready1 (cdr ready1))
    (setq stack (car ready))
    (setq ret_stack (car ready1))
    (setq clock_ticks (car timers))
    (setq master_clock (car master_clocks))
    (setq current_trace (car trace_stuff))
    (setq idle_time (car idle_list))
    (setq overhead (car overhead_times))
    (setq overhead_in (cdr overhead_in))
    (new_top)))


(def my_rotate                          ; rotate list x, leaving its new head in y
  (lambda (x y)
    (set x (appendl (cdr (eval x)) (eval y)))
    (set y (car (eval x)))))

(def do_top
  (lambda ()
    (check_args 0 'topctr)
    (my_trace (list 'do_top))
    (cond ((> (length ready) 1)
           (setq proc_switches (+ proc_switches 1))))
    (setq ready (appendl (cdr ready) stack))
    (setq stack (car ready))
    (setq ready1 (appendl (cdr ready1) ret_stack))
    (setq ret_stack (car ready1))
    (my_rotate 'timers 'clock_ticks)
    (my_rotate 'master_clocks 'master_clock)
    (my_rotate 'trace_stuff 'current_trace)
    (my_rotate 'idle_list 'idle_time)
    (my_rotate 'overhead_times 'overhead)
    (setq hole_ids (appendl (cdr hole_ids) (car hole_ids)))
    (setq overhead_in (appendl (cdr overhead_in) (car overhead_in)))
    (new_top)

))
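;; do_top realizes round-robin scheduling, as we read it: the current
;; frame moves to the back of READY, and my_rotate turns each parallel
;; bookkeeping list in step, so (car ...) of every list always describes
;; the processor now running. For example, with three live processors:
;;   ready = (P1 P2 P3)  --do_top-->  ready = (P2 P3 P1), stack = P2
;; my_top, below, simply refunds the clock tick charged for the
;; interrupted combinator before rescheduling.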

(def my_top
  (lambda ()
    (setq clock_ticks (- clock_ticks 1))
    (do_top)

))

(def our_init
  (lambda (x)
    (setq resumed nil)
    (setq resumeflag 'spflag)
    (setq top_temp 'deb)
    (setq ready (list nil))              ; processor stack frames
    (setq ready1 (list nil))             ; processor return stack frames
    (setq hole_ids (list (gensym)))      ; return hole ids
    (setq timers (list 0))               ; processor clock_ticks
    (setq master_clocks (list 0))        ; processor master_clock ticks
    (setq trace_stuff nil)               ; process traces
    (setq idle_list (list 0))            ; processor idle times
    (setq overhead_times (list 0))       ; overhead times
    (setq idle_summaries nil)            ; idle_time summaries
    (setq time_counts nil)               ; master_clock summaries
    (setq traces nil)                    ; overall trace summaries
    (setq overhead_in (list 0))          ; input expense
    (setq overhead_summaries nil)        ; the overhead times
    (setq stack x)

))
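;; Our reading of the initial state: our_init primes a single simulated
;; processor whose frame is the expression to be reduced, with all clocks
;; zeroed; fork/join/do_top then keep the parallel lists above in step,
;; one entry per live processor. A hypothetical session (the input
;; expression is invented for illustration) would begin:
;;   (our_init '(S f g x))   ; load the graph S f g x on the stack
;; after which the driver (not part of this excerpt) dispatches through
;; new_top until the ready list drains.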

(def myprint
  (lambda ()
    (print 'stack=) (print stack) (terpri)
    (print 'ready=) (print ready) (terpri)
    (print 'ready1=) (print ready1) (terpri)
    (print 'hole_ids=) (print hole_ids) (terpri)


(def call1 (lambda (cont) (cond ((