Computer Science and State Machines

Leslie Lamport, 8 June 2008

Contribution to a Festschrift honoring Willem-Paul de Roever on his retirement.

Computation

Computer science is largely about computation. Many kinds of computing devices have been described, some abstract and some very concrete. Among them are:

• Automata, including Turing machines, Moore machines, Mealy machines, pushdown automata, and cellular automata.
• Computer programs written in a programming language.
• Algorithms written in natural language and pseudocode.
• von Neumann computers.
• BNF grammars.
• Process algebras such as CCS.

Computer scientists collectively suffer from what I call the Whorfian syndrome [1]: the confusion of language with reality. Since these devices are described in different languages, they must all be different. In fact, they are all naturally described as state machines.

[1] See http://en.wikipedia.org/wiki/Sapir-Whorf_hypothesis.

State Machines

There are two ways to define a state machine, one emphasizing the states and the other the transitions from one state to the next. I will use the simpler one that emphasizes states. For brevity, I ignore termination/liveness and consider only safety. A state machine is then specified by a set $S$ of states, a set $I$ of initial states, and a next-state relation $N$ on $S$, so $I \subseteq S$ and $N \subseteq S \times S$. It generates all computations $s_1 \to s_2 \to s_3 \to \cdots$ such that:

S1. $s_1 \in I$
S2. $\langle s_i, s_{i+1} \rangle \in N$, for all $i$.

For example, a BNF grammar can be described by a state machine whose states are sequences of terminals and/or nonterminals. The set of initial states contains only the sequence consisting of the single starting nonterminal. The next-state relation is defined to contain $\langle s, t \rangle$ iff $s$ can be transformed to $t$ by applying a production rule to expand a single nonterminal.

Some of the computing devices listed above have an event (called an “input”, “output”, or “action”) associated with a state transition. Those devices can be represented by augmenting the state to include the last event. In other words, a transition $s \xrightarrow{\alpha} t$ from state $s$ to state $t$ with event $\alpha$ can be represented as a transition from (augmented) state $\langle s, \beta \rangle$ to state $\langle t, \alpha \rangle$, where $\beta$ is the event that “led to” $s$. (Initial states have the form $\langle s, \bot \rangle$ for a special initial event $\bot$.)

Describing all the other kinds of computing devices listed above as state machines is straightforward. Complexity results only from the innate complexity of the device, programs written in a modern programming language being especially complicated. However, representing a program in even the simplest language as a state machine may be impossible for a computer scientist suffering from the Whorfian syndrome. Languages for describing computing devices often do not make explicit all components of the state. For example, simple programming languages provide no way to refer to the call stack, which is an important part of the state. For one afflicted by the Whorfian syndrome, a state component that has no name doesn’t exist.

It is impossible to represent a program with procedures as a state machine if all mention of the call stack is forbidden. Whorfian-syndrome induced restrictions that make it impossible to represent a program as a state machine also lead to incompleteness in methods for reasoning about programs.
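The description of a BNF grammar as a state machine can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar S -> a S b | a b and the names `GRAMMAR`, `initial_states`, `next_states`, and `is_computation` are hypothetical and do not come from the text. States are tuples of symbols, and the check covers only a finite prefix of a computation.

```python
# Hypothetical toy grammar (an assumption for illustration): S -> a S b | a b
GRAMMAR = {"S": [("a", "S", "b"), ("a", "b")]}
START = "S"


def initial_states():
    """The set I: only the sequence consisting of the start nonterminal."""
    return {(START,)}


def next_states(state):
    """All t such that <state, t> is in N: expand one nonterminal once."""
    result = set()
    for i, symbol in enumerate(state):
        # Terminals have no productions, so they contribute nothing.
        for rhs in GRAMMAR.get(symbol, []):
            result.add(state[:i] + rhs + state[i + 1:])
    return result


def is_computation(states):
    """Check conditions S1 and S2 on a finite sequence of states."""
    if not states or states[0] not in initial_states():
        return False  # S1 fails
    return all(t in next_states(s) for s, t in zip(states, states[1:]))  # S2


derivation = [("S",), ("a", "S", "b"), ("a", "a", "b", "b")]
print(is_computation(derivation))  # True
```

A sequence made only of terminals has no successor, so a finished derivation is simply a state from which the machine can take no further step.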

Specifying a State Machine

To use state machines, we need a language for specifying them. The languages designed by computer scientists for describing computations usually specify state machines, defining the computations by S1 and S2. A partisan of such a language will insist that it is ideal for describing any state machine. I will ignore computer scientists and use instead the language employed by every other branch of science and engineering, namely ordinary mathematics.

In science and engineering, a set of states is usually specified by a collection of variables and their ranges, which are sets of values. A state $s$ assigns to every variable $v$ a value $s(v)$ in its range. For example, physicists might describe the state of a particle moving in one dimension by variables $x$ (the particle’s position) and $p$ (its momentum) whose ranges are the set of real numbers. The state $s_t$ at a time $t$ is described by the real numbers $s_t(x)$ and $s_t(p)$, which physicists usually write $x(t)$ and $p(t)$.

We specify the set of initial states the way sets of states are generally described: by a boolean-valued expression containing variables and ordinary mathematical constants and operators. For the particle example, $x = 0$ specifies the set of all states $s$ such that $s(x) = 0$ and $s(p)$ is any real number.

Because most fields of science and engineering study continuous processes, there is no standard way to describe a next-state relation. The simplest way I know to do it is with an expression that can contain primed as well as unprimed variables, the unprimed variables referring to the first state and the primed variables to the second state. For example, $(x' = x + 1) \land (p' > x')$ specifies the relation consisting of all pairs $\langle s, t \rangle$ of states such that $t(x) = s(x) + 1$ and $t(p) > t(x)$.
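One way to read the primed-variable convention is as a predicate on a pair of states. The sketch below is an assumption-laden illustration, not part of the original text: states are represented as Python dictionaries from variable names to values, and the function names `init` and `next_rel` are hypothetical.

```python
def init(s):
    """Initial predicate x = 0; the value of p may be any real number."""
    return s["x"] == 0


def next_rel(s, t):
    """The relation (x' = x + 1) and (p' > x').

    Unprimed variables are read from the first state s,
    primed variables from the second state t.
    """
    return t["x"] == s["x"] + 1 and t["p"] > t["x"]


s = {"x": 0, "p": -3.5}
t = {"x": 1, "p": 2.0}
u = {"x": 2, "p": 1.0}
print(init(s), next_rel(s, t), next_rel(t, u))  # True True False
```

The pair of t and u fails because the new value of p (1.0) is not greater than the new value of x (2), even though x was incremented correctly.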

State Machines in Action

The benefits of describing state machines mathematically rather than hiding them behind computer-science languages would make a long list. It might begin with the replacement of esoteric programming logics by ordinary mathematics. For example, the Hoare triple $\{P\}\,S\,\{Q\}$ becomes the formula $P \land \mathbf{S} \Rightarrow Q'$, where $\mathbf{S}$ is the relation on states described by the program statement $S$ and $Q'$ is formula $Q$ with each variable primed.

Instead of compiling such a list, I consider one nice little example: two algorithms that appear unrelated until they are expressed mathematically as state machines. The first algorithm is described by this simple program $X$ that runs forever, alternately performing the operations $P$ and $C$.

    X : loop P ; C endloop

The second algorithm is an important hardware protocol called two-phase handshake, illustrated by this diagram.

    [Figure: processes Prod and Cons connected by wires p and c, with an arrow on p pointing from Prod and an arrow on c pointing from Cons.]

The “wires” $p$ and $c$ can assume the values 0 and 1; the arrows indicate that $p$ is set by process Prod and $c$ is set by process Cons. The processes synchronize using $p$ and $c$ so they take turns executing operations $P$ and $C$. Their protocol can be described as follows, where $p$ and $c$ are initially equal and $\oplus$ is defined to be addition modulo 2 (known to hardware designers as 1-bit exclusive-or).

    Y : process Prod : whenever p = c do P ; p := p ⊕ 1 end
        ||
        process Cons : whenever p ≠ c do C ; c := c ⊕ 1 end

It is easy to see, though not completely obvious, that $Y$ alternately performs $P$ and $C$ operations, just like $X$. From the state machines’ pseudocode descriptions, this seems coincidental. The mathematical descriptions of these state machines reveal that it is no coincidence. Starting from $X$, we can derive $Y$ mathematically.

For simplicity, assume $P$ and $C$ to be atomic operations. They are then described by relations between primed and unprimed variables. To avoid introducing new symbols, let $P$ and $C$ also denote these two mathematical relations. Let $\mathit{varPC}$ be the set of variables that occur in these relations.

To describe program $X$ as a state machine, we must introduce a variable to represent the control state, which is part of the state not described by program variables, so to victims of the Whorfian syndrome it doesn’t exist. Let’s call that variable $\mathit{pc}$, which we assume is not in $\mathit{varPC}$. The state variables of $X$ are therefore $\mathit{pc}$ and the variables in $\mathit{varPC}$. Since $P$ and $C$ are atomic operations, each executed as a single step, the variable $\mathit{pc}$ assumes just two values. Let those values be 0 and 1. State machine $X$ then has initial predicate $\mathit{Init}_X$ and next-state relation $\mathit{Next}_X$ defined as follows, where $\mathit{Init}_{PC}$ specifies the initial values of the variables in $\mathit{varPC}$.

$$
\begin{aligned}
\mathit{Init}_X &\triangleq (\mathit{pc} = 0) \land \mathit{Init}_{PC} \\
\mathit{Next}_X &\triangleq ((\mathit{pc} = 0) \land P \land (\mathit{pc}' = 1)) \lor ((\mathit{pc} = 1) \land C \land (\mathit{pc}' = 0))
\end{aligned}
$$
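The initial predicate and next-state relation of X translate almost literally into predicates on states and on pairs of states. The following is a minimal sketch under assumptions that are not in the text: P and C are abstract there, so the code uses hypothetical stand-ins in which P increments a counter `produced` and C increments a counter `consumed`, and the initial values of both counters are taken to be 0.

```python
def P(s, t):
    """Hypothetical stand-in for relation P: produced' = produced + 1."""
    return (t["produced"] == s["produced"] + 1
            and t["consumed"] == s["consumed"])


def C(s, t):
    """Hypothetical stand-in for relation C: consumed' = consumed + 1."""
    return (t["consumed"] == s["consumed"] + 1
            and t["produced"] == s["produced"])


def init_x(s):
    """(pc = 0) and Init_PC, with Init_PC assumed to zero both counters."""
    return s["pc"] == 0 and s["produced"] == 0 and s["consumed"] == 0


def next_x(s, t):
    """((pc = 0) and P and (pc' = 1)) or ((pc = 1) and C and (pc' = 0))."""
    return ((s["pc"] == 0 and P(s, t) and t["pc"] == 1)
            or (s["pc"] == 1 and C(s, t) and t["pc"] == 0))


def is_computation_x(states):
    """Check a finite prefix: first state is initial, each step is in Next_X."""
    return init_x(states[0]) and all(
        next_x(s, t) for s, t in zip(states, states[1:]))


def st(pc, produced, consumed):
    return {"pc": pc, "produced": produced, "consumed": consumed}


good = [st(0, 0, 0), st(1, 1, 0), st(0, 1, 1), st(1, 2, 1)]
bad = [st(0, 0, 0), st(1, 1, 0), st(1, 2, 0)]  # two P steps in a row
print(is_computation_x(good), is_computation_x(bad))  # True False
```

The second sequence is rejected because, with pc equal to 1, only a C step is allowed.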

To describe $Y$ as a simple state machine, we assume that the body of each process is executed as a single atomic action. Thus, when $p = c$ is true, process Prod both executes $P$ and increments $p$ as one step. There is then no control state, and the state variables are $p$, $c$, and the variables in $\mathit{varPC}$. The initial predicate and next-state relation of $Y$ are

$$
\begin{aligned}
\mathit{Init}_Y &\triangleq (p = c) \land \mathit{Init}_{PC} \\
\mathit{Next}_Y &\triangleq \mathit{Prod} \lor \mathit{Cons}
\end{aligned}
$$

where formulas $\mathit{Prod}$ and $\mathit{Cons}$, which describe the two processes, are defined by:

$$
\begin{aligned}
\mathit{Prod} &\triangleq (p = c) \land P \land (p' = p \oplus 1) \land (c' = c) \\
\mathit{Cons} &\triangleq (p \neq c) \land C \land (c' = c \oplus 1) \land (p' = p)
\end{aligned}
$$

The mathematical relation between these two state machines is simple: $Y$ is obtained from $X$ by substituting $p \oplus c$ for $\mathit{pc}$. Substituting an expression for a variable is a basic and powerful mathematical operation.

Let us now see exactly how we derive $Y$ from $X$ by this substitution. For any formula $F$, let $\overline{F}$ be the formula obtained from $F$ by this substitution. For example, $\overline{\mathit{pc}'}$ equals $(p \oplus c)'$, which equals $p' \oplus c'$. It is easy to see that

$$
\overline{\mathit{pc}} =
\begin{cases}
0 & \text{if } p = c \\
1 & \text{if } p \neq c
\end{cases}
$$

from which we obtain

$$
\begin{aligned}
\overline{\mathit{Init}_X} &\triangleq (p = c) \land \mathit{Init}_{PC} \\
\overline{\mathit{Next}_X} &\triangleq \mathit{Pr} \lor \mathit{Co}
\end{aligned}
$$

where

$$
\begin{aligned}
\mathit{Pr} &\triangleq (p = c) \land P \land (p' \neq c') \\
\mathit{Co} &\triangleq (p \neq c) \land C \land (p' = c')
\end{aligned}
$$

The formulas $\overline{\mathit{Init}_X}$ and $\overline{\mathit{Next}_X}$ are the initial predicate and next-state relation of a state machine $\overline{X}$ whose states are the states of $Y$. We first consider its relation to state machine $X$. Define a mapping $\Psi$ from states of $Y$ to states of $X$ by letting $\Psi(s)$ assign the same values to the variables in $\mathit{varPC}$ as $s$, and letting it assign to $\mathit{pc}$ the value $s(p) \oplus s(c)$. (Recall that $s(p)$ and $s(c)$ are the values assigned to $p$ and $c$ by state $s$.) Extend $\Psi$ to a mapping on computations (sequences of states) by letting $\Psi(s_1 \to s_2 \to \cdots)$ equal $\Psi(s_1) \to \Psi(s_2) \to \cdots$.

It follows easily from our definition of $\overline{F}$ that a formula $\overline{F}$ is true of state $s$ of $Y$ iff $F$ is true of state $\Psi(s)$ of $X$. Similarly, a relation $\overline{R}$ is true of a pair $\langle s_1, s_2 \rangle$ of states of $Y$ iff $R$ is true of $\langle \Psi(s_1), \Psi(s_2) \rangle$. It follows that a sequence $\sigma$ of states of $Y$ is a computation of the state machine $\overline{X}$ iff $\Psi(\sigma)$ is a computation of $X$.
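The claim that the relation Pr ∨ Co, obtained by substituting p ⊕ c for pc, holds of a pair of Y-states exactly when the next-state relation of X holds of their images under Ψ can be checked by brute force on a small domain. The sketch below rests on assumptions that are not in the text: P and C are replaced by hypothetical counter-incrementing relations, and the counters are restricted to the values 0, 1, 2 so that the state space is finite.

```python
from itertools import product


def P(s, t):
    """Hypothetical stand-in for relation P: produced' = produced + 1."""
    return (t["produced"] == s["produced"] + 1
            and t["consumed"] == s["consumed"])


def C(s, t):
    """Hypothetical stand-in for relation C: consumed' = consumed + 1."""
    return (t["consumed"] == s["consumed"] + 1
            and t["produced"] == s["produced"])


def next_x(s, t):
    """Next-state relation of X, on states that contain pc."""
    return ((s["pc"] == 0 and P(s, t) and t["pc"] == 1)
            or (s["pc"] == 1 and C(s, t) and t["pc"] == 0))


def next_x_bar(s, t):
    """Pr or Co: Next_X with p XOR c substituted for pc (states contain p, c)."""
    pr = s["p"] == s["c"] and P(s, t) and t["p"] != t["c"]
    co = s["p"] != s["c"] and C(s, t) and t["p"] == t["c"]
    return pr or co


def psi(s):
    """Map a state of Y to a state of X: pc gets the value s(p) XOR s(c)."""
    return {"pc": s["p"] ^ s["c"],
            "produced": s["produced"], "consumed": s["consumed"]}


y_states = [
    {"p": p, "c": c, "produced": a, "consumed": b}
    for p, c, a, b in product((0, 1), (0, 1), range(3), range(3))
]

# The substituted relation holds of <s, t> iff Next_X holds of <psi(s), psi(t)>.
agree = all(next_x_bar(s, t) == next_x(psi(s), psi(t))
            for s in y_states for t in y_states)
print(agree)  # True
```

An exhaustive check over a finite domain is of course no substitute for the general argument in the text; it only confirms the statement for this particular choice of P and C.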

Let us now consider the disjuncts of the next-state relation $\overline{\mathit{Next}_X}$, starting with $\mathit{Pr}$. Because $p$ and $c$ assume only the values 0 and 1, $p = c$ implies

$$
p' \neq c' \;\equiv\; ((p' = p \oplus 1) \land (c' = c)) \lor ((p' = p) \land (c' = c \oplus 1))
$$

This implies

$$
\mathit{Pr} \;\equiv\; ((p = c) \land P \land (p' = p \oplus 1) \land (c' = c)) \lor ((p = c) \land P \land (p' = p) \land (c' = c \oplus 1))
$$

A $\mathit{Pr}$ step therefore either increments $p$ and leaves $c$ unchanged (satisfying the first disjunct) or else increments $c$ and leaves $p$ unchanged (satisfying the second disjunct). If we want an algorithm in which the process that executes $P$ modifies only $p$, then we must allow only the first possibility, eliminating the second disjunct. We are left with the first disjunct, which equals $\mathit{Prod}$. A similar calculation shows that we obtain $\mathit{Cons}$ from $\mathit{Co}$ by eliminating a disjunct that modifies $p$ and leaves $c$ unchanged. This leads us to a state machine with initial predicate $\overline{\mathit{Init}_X}$ and next-state predicate $\mathit{Prod} \lor \mathit{Cons}$, which is precisely the state machine $Y$.

Our derivation shows that $\mathit{Prod}$ implies $\mathit{Pr}$ and $\mathit{Cons}$ implies $\mathit{Co}$. Hence, $\mathit{Next}_Y$ implies $\overline{\mathit{Next}_X}$. Since $\mathit{Init}_Y$ equals $\overline{\mathit{Init}_X}$, we deduce that any computation $\sigma$ of $Y$ is a computation of $\overline{X}$. We have already seen that $\sigma$ is a computation of $\overline{X}$ iff $\Psi(\sigma)$ is a computation of $X$. Hence, if $\sigma$ is any computation of $Y$, then $\Psi(\sigma)$ is a computation of $X$. Because the states $s$ and $\Psi(s)$ assign the same values to the variables in $\mathit{varPC}$, this means that $\sigma$ has the same $P$ and $C$ steps as $\Psi(\sigma)$. Thus, we deduce that the derived protocol $Y$ produces the same sequence of $P$ and $C$ operations as does $X$. Since it is obvious that $X$ alternately executes $P$ and $C$ operations, this shows that $Y$ does too. In other words, this shows that $Y$ is correct by construction.

When presenting this kind of derivation, it is conventional to pretend that it leads to the discovery of the resulting protocol. I presented $Y$ before the derivation to make it easier to see where we were heading. This allowed me to “cheat” by letting $\mathit{pc}$ assume the convenient values 0 and 1. Had I chosen two arbitrary values $a$ and $b$ instead, we would have substituted $\textbf{if } p = c \textbf{ then } a \textbf{ else } b$ for $\mathit{pc}$. A simple calculation would have shown

$$
\overline{\mathit{pc}} =
\begin{cases}
a & \text{if } p = c \\
b & \text{if } p \neq c
\end{cases}
$$

From that point, the derivation would have proceeded exactly as before, with the same formulas $\overline{\mathit{Init}_X}$ and $\overline{\mathit{Next}_X}$.
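The step that splits Pr into two disjuncts, and the conclusion that Y alternates P and C operations, can both be checked mechanically on the two wire bits. The sketch below is only an illustration: the function names are hypothetical, the relations are restricted to their conditions on p and c (the abstract relation P is left out because it is a common conjunct), and `run_y` executes each process body as one atomic step, as the text assumes.

```python
from itertools import product

BITS = (0, 1)


def pr_bits(p, c, p2, c2):
    """Bit part of Pr: (p = c) and (p' != c'); p2, c2 stand for p', c'."""
    return p == c and p2 != c2


def prod_bits(p, c, p2, c2):
    """Bit part of Prod: (p = c) and (p' = p XOR 1) and (c' = c)."""
    return p == c and p2 == p ^ 1 and c2 == c


def other_bits(p, c, p2, c2):
    """The eliminated disjunct: (p = c) and (p' = p) and (c' = c XOR 1)."""
    return p == c and p2 == p and c2 == c ^ 1


# Pr is equivalent to the disjunction of Prod and the eliminated disjunct.
split_ok = all(
    pr_bits(*v) == (prod_bits(*v) or other_bits(*v))
    for v in product(BITS, repeat=4)
)


def run_y(p, c, steps):
    """Run protocol Y from wire values p, c; return the operations performed."""
    ops = []
    for _ in range(steps):
        if p == c:   # Prod is enabled: perform P, then flip p
            ops.append("P")
            p ^= 1
        else:        # Cons is enabled: perform C, then flip c
            ops.append("C")
            c ^= 1
    return ops


print(split_ok, run_y(0, 0, 6))  # True ['P', 'C', 'P', 'C', 'P', 'C']
```

Starting from p = c = 1 gives the same alternation, which matches the initial predicate requiring only that p and c be equal.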

A Lesson

Using ordinary mathematics, we have derived the simple but useful protocol $Y$ from the trivial algorithm $X$ by substituting $p \oplus c$ for $\mathit{pc}$. We could do this because we represented these algorithms as state machines and we described the state machines using ordinary mathematics. The pseudocode descriptions probably seem more natural to most computer scientists. But how could our derivation possibly have been done from those descriptions? How do we substitute for a variable $\mathit{pc}$ that doesn’t appear in the pseudocode? Even if $\mathit{pc}$ did appear as a variable, what would it mean to substitute an expression for it in an assignment statement pc := ... ?

Quite a number of formalisms have been proposed for specifying and verifying protocols such as $Y$. The ones that work in practice essentially describe a protocol as a state machine. Many of these formalisms are said to be mathematical, having words like algebra and calculus in their names. Because a proof that a protocol satisfies a specification is easily turned into a derivation of the protocol from the specification, it should be simple to derive $Y$ from $X$ in any of those formalisms. (A practical formalism will have no trouble handling such a simple example.) But in how many of them can this derivation be performed by substituting for $\mathit{pc}$ in the actual specification of $X$? The answer is: very, very few. Despite what those who suffer from the Whorfian syndrome may believe, calling something mathematical does not confer upon it the power and simplicity of ordinary mathematics.
