Parallel and Distributed Systems Group

Partial Order Trace Analyzer (POTA) for Distributed Programs Alper Sen and Vijay K. Garg

TR-PDS-2003-004

May 2003

Parallel and Distributed Systems Group Department of Electrical and Computer Engineering The University of Texas at Austin Austin, Texas 78712

Partial Order Trace Analyzer (POTA) for Distributed Programs

Alper Sen and Vijay K. Garg

Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, 78712, USA

{sen,garg}@ece.utexas.edu

http://www.ece.utexas.edu/~{sen,garg}

May 16, 2003

Abstract

Checking the correctness of software is a growing challenge. In this paper, we present a prototype implementation of Partial Order Trace Analyzer (POTA), a tool for checking execution traces of message passing programs using temporal logic. So far, runtime verification tools have used the total order model of an execution trace, whereas POTA uses a partial order model. The partial order model enables us to capture a possibly exponential number of interleavings and, in turn, this allows us to find bugs that are not found using a total order model. However, verification in the partial order model suffers from the state explosion problem: the number of possible global states in a program increases exponentially with the number of processes. POTA employs an effective abstraction technique called computation slicing. A slice of a computation (execution trace) with respect to a predicate is the computation with the least number of global states that contains all global states of the original computation for which the predicate evaluates to true. The advantage of this technique is that it mitigates the state explosion problem by reasoning only on the part of the global state space that is of interest. We implemented computation slicing algorithms for temporal logic predicates from a subset of CTL in POTA. The overall complexity of evaluating a predicate in this logic using computation slicing becomes polynomial in the number of processes, compared to exponential without slicing. We illustrate the effectiveness of our techniques in POTA on several test cases such as the General Inter-ORB Protocol (GIOP) [23] and the primary secondary protocol [39]. POTA also contains a translator module from execution traces to Promela [21] (the input language of SPIN). This module enables us to compare our results on execution traces with SPIN. In some cases, we were able to verify traces with 250 processes compared to only 10 processes using SPIN.

Supported in part by the NSF Grants ECS-9907213, CCR-9988225, Texas Education Board Grant ARP-320, an Engineering Foundation Fellowship, and an IBM grant.

1 Introduction

A fundamental problem in distributed systems is that of predicate detection: detecting whether a finite execution trace of a distributed program satisfies a given predicate. There are applications of predicate detection in many domains such as testing, debugging, and monitoring of distributed programs. For example, when debugging a distributed mutual exclusion algorithm, it is useful to monitor the system to detect concurrent accesses to the shared resources.

A finite trace can be modeled in two ways. The first model imposes a partial order between events, for example Lamport's happened-before relation [26]. The second model imposes a total order (interleaving) of events. We use the former approach in this paper, which is a more faithful representation of concurrency [26].

Consider an execution of a distributed program. The partial order model of the resulting execution trace is shown in Figure 1(a). In the trace, there are two processes $P_1$ and $P_2$ with integer variables $x$ and $y$, respectively. The events are represented by solid circles. Process $P_2$ sends a message to process $P_1$ by executing event $f_1$ and process $P_1$ receives that message by executing event $e_1$. Each event is labeled with the value of the respective variable immediately after the event is executed. For example, the value of $x$ immediately after executing $e_1$ is 2. The first event on each process initializes the state of the process. The set of all global states reachable from the initial state $\{e_0, f_0\}$ is displayed in Figure 1(b). Observe that $\{e_1, f_0\}$ is not a reachable global state because it depicts a situation where a message has been received from $P_2$ by $P_1$, that is $e_1$, but $P_2$ has not yet sent the message. By using a partial order representation, we are able to capture all possible interleavings of events, namely ten in total, rather than a single interleaving. One such interleaving sequence is $\{e_0, f_0\}$, $\{e_0, f_1\}$, $\{e_1, f_1\}$, $\{e_2, f_1\}$, $\{e_3, f_1\}$, $\{e_3, f_2\}$, $\{e_3, f_3\}$, as shown in Figure 1(b) with thick lines. Therefore we can obtain better coverage in terms of testing and debugging by capturing all interleavings. This coverage may translate into finding bugs that are not found using a single interleaving.

The main problem in predicate detection in the partial order model is the state explosion problem: the set of possible global states of a distributed program with $n$ individual processes can be of size exponential in $n$.
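The counts quoted for the example of Figure 1 can be reproduced with a few lines of Python. This is only an illustrative sketch, not part of POTA. The encoding of a global state as a pair of last-event indices, and the names `N_EVENTS`, `MESSAGES`, `is_consistent`, `global_states` and `count_interleavings`, are assumptions made for this example.

```python
from functools import lru_cache
from itertools import product

# Hypothetical encoding of Figure 1(a): state (i, j) means that e_i is the
# last event executed on P1 and f_j is the last event executed on P2.
N_EVENTS = (4, 4)  # e0..e3 on P1, f0..f3 on P2

# One message, sent at f1 on P2 and received at e1 on P1, written as
# (receiver process, receive index, sender process, send index).
MESSAGES = [(0, 1, 1, 1)]


def is_consistent(state):
    """A global state is consistent if every received message was also sent."""
    return all(
        state[rp] < ri or state[sp] >= si
        for rp, ri, sp, si in MESSAGES
    )


def global_states():
    """All consistent global states, as tuples of last-event indices."""
    ranges = [range(n) for n in N_EVENTS]
    return [s for s in product(*ranges) if is_consistent(s)]


@lru_cache(maxsize=None)
def count_interleavings(state=(0, 0)):
    """Number of event sequences leading from `state` to the final state."""
    final = tuple(n - 1 for n in N_EVENTS)
    if state == final:
        return 1
    total = 0
    for p in range(len(state)):
        if state[p] + 1 < N_EVENTS[p]:
            nxt = state[:p] + (state[p] + 1,) + state[p + 1:]
            if is_consistent(nxt):
                total += count_interleavings(nxt)
    return total


if __name__ == "__main__":
    print(len(global_states()))    # number of reachable global states
    print(count_interleavings())   # number of interleavings
```

For this two-process trace the sketch reports 13 global states and 10 interleavings. The enumeration itself is exponential in the number of processes, which is exactly the state explosion problem described above.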
A variety of strategies for ameliorating the state explosion problem, including symbolic representation of states and partial order reduction, have been explored [29, 16, 40, 33, 9, 38, 39].

In this paper, we present a prototype implementation of the Partial Order Trace Analyzer (POTA) tool for checking execution traces of distributed programs. POTA consists of an instrumentation module, a translator module that translates execution traces into Promela [21] (the SPIN input language), and an analyzer module. The use of an effective abstraction technique called computation slicing for temporal logic verification is the most significant aspect of POTA and constitutes the analyzer module.

Figure 1: (a) A computation (b) its set of all reachable global states (c) its slice with respect to $(2 \le x \le 4) \wedge (y \ne 2)$ (d) its slice with respect to $EF((2 \le x \le 4) \wedge (y \ne 2))$

Computation slicing was introduced in [14, 31] as an abstraction technique for analyzing distributed computations (finite execution traces). A computation slice, defined with respect to a global predicate, is the computation with the least number of global states that contains all global states of the original computation for which the predicate evaluates to true. Slicing can be used to throw away the extraneous global states of the original computation in an efficient manner, and focus on only those that are currently relevant for our purpose.

Using the results in [14, 31] and [35], we can efficiently use computation slicing for the subset of CTL [4] with the following three properties. First, temporal operators are EF, EG, and AG and boolean operators are conjunction and disjunction. Second, atomic propositions are regular predicates, which we will define later. Third, the negation operator has been pushed onto atomic propositions. We call this logic Regular CTL plus (RCTL+), where the plus denotes that the disjunction and negation operators are included in the logic. We also consider a disjunction and negation free subset of RCTL+ and denote this by Regular CTL (RCTL).

In RCTL+, we use the class of predicates, called regular predicates, that was introduced in [14]. The slice with respect to a regular predicate contains precisely those global states for which the predicate evaluates to true. Regular predicates widely occur in practice during verification. Some examples of regular predicates are conjunctions of local predicates [12, 22] such as "all processes are in red state", certain channel predicates [12] such as "at most $k$ messages are in transit from process $P_i$ to $P_j$", and some relational predicates [12].

To illustrate predicate detection using computation slicing, consider the computation in Figure 1(a). Let $p = (2 \le x \le 4) \wedge (y \ne 2)$, and suppose we want to detect $EF(p)$. Without computation slicing, we are forced to examine all global states of the computation, thirteen in total, to decide whether the computation satisfies the predicate. Figure 1(b) contains the set of all reachable global states of the computation. In the figure, we represent a global state as a tuple where each element is the last event that occurred on a process.

Alternatively, we can compute the slice of the computation with respect to the regular predicate $EF(p)$ and use this slice for predicate detection. For this purpose, first we compute the slice with respect to the atomic proposition $p$ as follows. Immediately after executing $f_2$, the value of $y$ becomes 2, which does not satisfy $y \ne 2$. To reach a global state satisfying $y \ne 2$, $f_3$ has to be executed. In other words, any global state in which only $f_2$ has been executed but not $f_3$ is of no interest to us and can be ignored. The slice is shown in Figure 1(c). It is modeled by a partial order on a set of meta-events; each meta-event consists of one or more "primitive" events. A global state of the slice either contains all the events in a meta-event or none of them. Moreover, a meta-event "belongs" to a global state only if all its incoming neighbours are also contained in the state. The slice contains only four states $C$, $D$, $V$ and $W$ and has much fewer states than the computation itself (exponentially smaller in many cases), resulting in substantial savings.

Using the slice in Figure 1(c), we can obtain the last state that satisfies $p$ in the computation, which is denoted by $W$. We also know from the definition of $EF(p)$ that every global state of the computation that occurs before $W$ satisfies $EF(p)$, e.g. the states enclosed in the dashed ellipse in Figure 1(b). Therefore, applying this observation we can compute the slice with respect to $EF(p)$ as shown in Figure 1(d). Finally, we check whether the initial state of the computation is the same as the initial state of the slice. If the answer is yes then the predicate is satisfied, otherwise not.

POTA implements predicate detection algorithms for RCTL and RCTL+ which use computation slicing. We show in [35] that the complexity of predicate detection for a predicate $p$ in RCTL is $O(|p| \cdot n^2 |E|)$, where $|p|$ is the number of boolean and temporal operators in $p$, $n$ is the number of processes, and $|E|$ is the number of events. To the best of our knowledge, there did not exist tools that implement efficient algorithms (polynomial in the number of processes) to detect predicates that contain nested temporal logic predicates.
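For comparison, the running example can also be checked by brute force, which is what slicing is meant to avoid. The sketch below is not the slicing algorithm of POTA: it simply searches every global state of Figure 1. The variable values are those read from Figure 1(a), and the names `X`, `Y`, `STATES`, `p`, `successors` and `ef` are assumptions made for this illustration.

```python
from itertools import product

# Values of x after e0..e3 and of y after f0..f3, as read from Figure 1(a).
X = [0, 2, 4, 5]
Y = [0, 0, 2, 6]


def consistent(i, j):
    """e1 receives the message sent at f1, so e1 requires f1."""
    return i < 1 or j >= 1


# State (i, j): e_i and f_j are the last events executed on P1 and P2.
STATES = [(i, j) for i, j in product(range(4), range(4)) if consistent(i, j)]


def p(state):
    """The atomic proposition (2 <= x <= 4) and (y != 2)."""
    i, j = state
    return 2 <= X[i] <= 4 and Y[j] != 2


def successors(state):
    """Global states reachable by executing exactly one more event."""
    i, j = state
    return [s for s in ((i + 1, j), (i, j + 1)) if s in STATES]


def ef(pred, state):
    """EF(pred): some global state reachable from `state` satisfies pred."""
    seen, stack = set(), [state]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if pred(s):
            return True
        stack.extend(successors(s))
    return False


if __name__ == "__main__":
    print([s for s in STATES if p(s)])   # the states C, D, V, W of the slice
    print(ef(p, (0, 0)))                 # EF(p) at the initial state
```

The four states that satisfy `p` are exactly the four states of the slice in Figure 1(c), and `ef(p, s)` holds for every state at or before `(2, 3)`, the state called $W$ in the text. The search visits up to all thirteen states, whereas the slice lets the same answer be read off its initial state.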

An example of a nested predicate is $AG(EF(reset))$, which states that reset is possible from every state. Furthermore, we validate with experiments that even for RCTL+ predicates our computation slicing based technique is very effective.

We performed experiments using POTA on several protocols. We also used the POTA translator module to enable comparison with SPIN on execution traces. In fairness, SPIN is designed for checking correctness of programs and not traces. However, to the best of our knowledge it is the best distributed program verification tool we can use for our partial order models. Some of the protocols we used for experiments are the General Inter-ORB Protocol (GIOP) [23] and the primary secondary protocol [39]. GIOP is a central feature of the Common Object Request Broker Architecture (CORBA) that aids in achieving the desired interoperability between ORBs. The CORBA specification defines a standard protocol to allow communication of object invocations between ORBs. Kamel and Leue [23] could not fully verify an abstract model of GIOP with 10 processes. Instead, they verified a simplified version of the protocol without server migration functionality. In one case, we generated execution traces of the unsimplified GIOP protocol for a configuration with 250 processes. However, even with an execution trace as input, SPIN failed to complete verification with more than 10 processes. We also injected faults into the protocol and analyzed the resulting execution traces. With SPIN, we used the bit-state hashing approximation option to handle a larger number of processes, but in this case SPIN failed to find the faults before running out of memory. POTA, in contrast, was able to find the faults easily. In all cases, our algorithms are significantly faster and more space efficient than SPIN. We have measured a gain of over three orders of magnitude over SPIN in some experiments.
Computation slicing can indeed be used to facilitate predicate detection even for a larger class of predicates than RCTL+, as illustrated by the following example. Consider a predicate $p$ that is a conjunction of two clauses $p_1$ and $p_2$. Now, assume that $p_1$ belongs to RCTL+ but $p_2$ has no structural property that can be exploited for efficient detection, such as $(x_1 * x_2 + x_3 > x_4)$, where $x_i$ is an integer variable on process $i$. To detect $p$ without computation slicing, we are forced to use global-state-space-construction-based approaches, which do not take advantage of the fact that $p_1$ can be detected efficiently. With computation slicing, however, we can first compute the slice for $p_1$. If only a small fraction of global states satisfy $p_1$, then instead of detecting $p$ in the computation, it is much more efficient to detect $p$ in the slice. Therefore, by spending only a polynomial amount of time in computing the slice we can throw away an exponential number of global states, thereby obtaining an exponential speedup overall.

The remainder of this paper is organized as follows. In the next section, we compare our work with related work in this area. Section 3 presents an overview of the POTA architecture. In Section 4, we discuss the computation model that is used to represent distributed programs. A brief overview of computation slicing is given in Section 5. Section 6.1 describes the regular predicates and our logic. Section 7 describes the underlying temporal logic computation slicing algorithms. We give experimental results in Section 8. Finally, some conclusions and a description of future work are given in Section 9.

2 Related Work

The notion of a computation slice is similar to the concept of a program slice [41]. Given a program and a slicing criterion, that is, a set of variables, a program slice consists of all statements in the program that may affect the value of the variables in the set at some given point. The criterion in program slicing has also been extended to some predicate classes such as atomic propositions in the temporal logic LTL [8]. Millett and Teitelbaum [30] have applied program slicing to Promela. Program slicing has been shown to be useful in program debugging, testing, program understanding, and software maintenance [25, 41]. A slice can significantly narrow the size of the program to be analyzed, thereby making the understanding of the behavior easier. We expect to have the same benefit from a computation slice for predicate detection and other problems.

Predicate detection is a hard and widely-studied problem. Detecting even a 2-CNF predicate under the EF modality has been shown to be NP-complete, in general [13]. There are three major approaches to solving predicate detection: the global-snapshot-based approach [2], the global-state-space-construction-based approach (including model checking) [4, 5], and the predicate-restriction-based approach [12]. The first approach can detect only stable predicates (which remain true once they become true), and the second approach suffers from the state explosion problem. We follow the predicate-restriction-based approach, which exploits the structure of the predicate and directly uses the computation to detect if the predicate is satisfied in a global state. Some examples of the predicates for which predicate detection can be solved efficiently are: conjunctive [12, 22], disjunctive [12], stable [2], observer-independent [3, 12], linear [12, 36], and non-temporal regular [14, 31] predicates. These predicate classes have so far been detected under some or all of the temporal operators EF, EG, AG, AF and under the until operator of CTL [36], but not under any nesting of these operators. For example, a predicate $EF(p \wedge EG(q))$, where $p$ and $q$ are conjunctive predicates, cannot be efficiently detected using only the efficient algorithms for conjunctive predicates. In POTA, we can detect such nested temporal logic predicates efficiently.

The idea of using temporal logic for analyzing execution traces (also referred to as runtime verification) has recently been attracting a lot of attention. We first presented a temporal logic framework for partially ordered execution traces in [36] and gave efficient algorithms for predicates of the form $EG(p)$ and $AG(p)$ when $p$ is a linear predicate. The efficiency of those algorithms depended on the fact that $p$ was a state predicate and therefore we could efficiently evaluate the satisfiability of $p$ at a global state. In this paper, however, we present an implementation of efficient algorithms even when $p$ is a temporal predicate.

Some other examples of using temporal logic for checking execution traces are the commercial Temporal Rover tool (TR) [7], the MaC tool [24], the JPaX tool [19], and the JMPaX tool [37]. TR allows the user to specify temporal formulas in programs. These temporal formulas are translated into Java code before compilation. The MaC and JPaX tools consider a totally ordered view of an execution trace and therefore can potentially miss bugs that can be deduced from the trace. Other LTL based verification approaches for execution traces are based on automata generation [11] or rewriting [20], where the verification complexity is polynomial time for full LTL, yet the representation model is a total order.

The JMPaX tool is closer to POTA because of its partial order trace model. The differences between the two approaches can be summarized as follows. POTA works with message passing distributed programs, whereas JMPaX considers multithreaded shared memory Java programs. JMPaX uses a subset of temporal logic with safety where atomic propositions can be arbitrary, whereas POTA uses a subset of temporal logic with both safety and liveness where atomic propositions are restricted. The complexity of the predicate detection algorithm in POTA is polynomial-time, whereas the complexity is exponential-time in JMPaX.

3 Overview of POTA Architecture

The overall structure of the POTA architecture is shown in Figure 2. The tool consists of three main modules: analyzer, translator, and instrumentor.

Figure 2: Overview of POTA Architecture

The analyzer module contains our computation slicing and predicate detection algorithms. Given an execution trace and a predicate (specification) in RCTL+, the computation slice may contain more states than the ones that satisfy the predicate. Therefore, the analyzer module uses the following strategy to decide whether the predicate is satisfied or not. Case 1: if the slice and the input trace have different initial states, then the predicate is not satisfied. In this case a counterexample is generated. Case 2: if the predicate is from RCTL and the slice and the input trace have the same initial states, then the predicate is satisfied. Case 3: when the predicate does not belong to RCTL (that is, it contains disjunction or negation operators) and the slice and the input trace have the same initial states, we have to take an extra step. This is because the initial state of the slice may not satisfy the predicate. Therefore, we employ the translator module and translate the slice into Promela [21] (the input language of SPIN). Then we use SPIN to check the trace, assuming that there are equivalent specifications in LTL.

The translator module takes a partial order representation of a trace and generates output in specific languages. This module serves two purposes: to enable comparison of our slicing technique with other techniques such as partial order reduction, and to enable verification of predicates that do not belong to RCTL but for which we can take advantage of computation slicing. The latter purpose is served when the predicate belongs to RCTL+, as explained in Case 3 in the above paragraph, or when the predicate is a conjunction of predicates where one of the conjuncts belongs to RCTL+, as explained in the introduction. Since we are working with distributed programs, which exhibit a lot of parallelism and independence, partial order reduction techniques can take advantage of these properties of distributed programs. The SPIN model checker contains an implementation of partial order reduction techniques. Currently, translation from traces to Promela is supported. The translation mechanism is similar to the technique explained in [27] for translations from message sequence charts (MSC) to Promela.

The instrumentation module inserts code at the appropriate places in the program to be monitored. The instrumented program outputs the values of variables relevant to the predicate in question and keeps a vector clock that is updated for each internal, send and receive event according to the Fidge/Mattern algorithm [10, 28]. We use the vector clock to obtain a partial order representation of traces. Upon running the instrumented program, a separate log file for each process is generated. Each log file consists of a sequence of local states that a process goes through. Each local state contains the values of variables relevant to the predicate being verified and a vector clock. The log files of all processes are then combined to obtain a partial order representation of the execution trace. If, instead of using a log file, every process sends its trace to a dedicated process which combines them during runtime, we can obtain an on-line verification environment.

Currently, programs are manually instrumented. We conducted experiments with Java and Promela programs. For SPIN programs, we inserted code into the Promela programs and also made changes to the SPIN source code so that we can obtain a partial order model when we run SPIN in simulation mode with the option for generating a message sequence chart output. SPIN's MSC output is by default a totally ordered execution. However, we observed that this MSC output contains unnecessary dependencies: some events do not need to be totally ordered. For example, request messages sent by two different processes to two different servers do not need to be ordered with respect to each other. We are in the process of choosing an appropriate instrumentation technique for Java programs. The choice is between Java JDI as in [1] or bytecode instrumentation as in JPaX.
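The vector clock bookkeeping that the instrumented program performs can be sketched as follows. This is a minimal illustration of the Fidge/Mattern update rules, not the instrumentation code used by POTA. The class `Process`, its methods, the log entry layout, and the helpers `happened_before` and `concurrent` are assumptions made for this example.

```python
class Process:
    """One process keeping a Fidge/Mattern vector clock and a local log."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n
        self.log = []  # sequence of (label, variables, clock) local states

    def _tick_and_log(self, label, variables):
        self.clock[self.pid] += 1
        self.log.append((label, dict(variables), tuple(self.clock)))

    def internal(self, label, variables):
        self._tick_and_log(label, variables)

    def send(self, label, variables):
        self._tick_and_log(label, variables)
        return tuple(self.clock)  # timestamp piggybacked on the message

    def receive(self, label, variables, msg_clock):
        # Merge the sender's knowledge, then count the receive event itself.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self._tick_and_log(label, variables)


def happened_before(u, v):
    """Vector clock u precedes v if u <= v componentwise and u != v."""
    return u != v and all(a <= b for a, b in zip(u, v))


def concurrent(u, v):
    return u != v and not happened_before(u, v) and not happened_before(v, u)


# Replay a prefix of the trace of Figure 1(a).
p1, p2 = Process(0, 2), Process(1, 2)
p1.internal("e0", {"x": 0})
p2.internal("f0", {"y": 0})
stamp = p2.send("f1", {"y": 0})
p1.receive("e1", {"x": 2}, stamp)
p2.internal("f2", {"y": 2})
p1.internal("e2", {"x": 4})

# Combine the per-process logs into one map from event label to timestamp.
clock_of = {label: clk for proc in (p1, p2) for label, _, clk in proc.log}
```

With these timestamps, `happened_before(clock_of["f1"], clock_of["e1"])` holds because of the message, while `e2` and `f2` are reported as concurrent. Comparing timestamps in this way is what allows the per-process logs to be combined into a partial order.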

4 Model

We assume a loosely-coupled message-passing asynchronous system without any shared memory or a global clock. A distributed program consists of $n$ sequential processes denoted by $P_1, P_2, \ldots, P_n$ communicating via asynchronous messages. In this paper, we are concerned with a single computation (execution) of a distributed program. We assume that no messages are altered or spuriously introduced. We do not make any assumptions about the FIFO nature of channels.

Traditionally, a distributed computation is modeled as a partial order on a set of events, called the happened-before relation [26]. The happened-before relation between any two "primitive" events $e$ and $f$ can be formally stated as the smallest relation such that $e$ happened-before $f$ if and only if $e$ occurs before $f$ in the same process, or $e$ is a send of a message and $f$ is a receive of that message, or there exists an event $g$ such that $e$ happened-before $g$ and $g$ happened-before $f$.

In this paper we relax the restriction that the order on events must be a partial order. More precisely, we use directed graphs to model distributed computations as well as slices. Directed graphs allow us to handle both of them in a uniform and convenient manner.

Given a directed graph $G$, let $V(G)$ and $E(G)$ denote the set of vertices and edges, respectively. We define a consistent cut (global state) on directed graphs as a subset of vertices such that if the subset contains a vertex then it contains all its incoming neighbours. Formally, $C$ is a consistent cut of $G$ if $\forall e, f \in V(G) : ((e, f) \in E(G)) \wedge (f \in C) \Rightarrow (e \in C)$. We say that a strongly connected component is non-trivial if it has more than one vertex. We denote the set of consistent cuts of a directed graph $G$ by $\mathcal{C}(G)$. Observe that the empty set $\emptyset$ and the set of vertices $V(G)$ trivially belong to $\mathcal{C}(G)$. We call them trivial consistent cuts. We use $\mathcal{P}(G)$ to denote the set of pairs of vertices $(u, v)$ such that there is a path from $u$ to $v$ in $G$. We assume that each vertex has a path to itself.

We model a distributed computation (or simply a computation), denoted by $\langle E, \rightarrow \rangle$, as a directed graph with vertices as the set of events $E$ and edges as $\rightarrow$. We use event and vertex interchangeably. To limit our attention to only those consistent cuts that can actually occur during an execution, we assume that $\mathcal{P}(\langle E, \rightarrow \rangle)$ contains at least Lamport's happened-before relation [26]. A distributed computation in our model can contain cycles. This is because, whereas a computation in the happened-before model captures the observable order of execution of events, a computation in our model captures the set of possible consistent cuts. Intuitively, each strongly connected component of a computation can be viewed as a meta-event; a meta-event consists of one or more primitive events.

We assume the presence of a fictitious global initial event and a global final event, denoted by $\bot$ and $\top$, respectively. The global initial event occurs before any other event on the processes and initializes the state of the processes. The global final event occurs after all other events on the processes. Any non-trivial consistent cut will contain the global initial event and not the global final event. Therefore, every consistent cut of a computation in the traditional model (happened-before model) is a non-trivial consistent cut of the computation in our model and vice versa. Note that the empty consistent cut, $\emptyset$, in the traditional model corresponds to $\{\bot\}$ in our model, and the final consistent cut, $E$, in the traditional model corresponds to $E \setminus \{\top\}$ in our model; we denote this by $E$. We use uppercase letters $C$, $D$, $H$, $V$, and $W$ to represent consistent cuts.

The predecessor and successor events of an event $e$ on the process on which $e$ occurs are denoted by $pred(e)$ and $succ(e)$, respectively, if they exist. A frontier of a consistent cut is the set of those events of the cut whose successors, if they exist, are not contained in the cut. Formally, $frontier(C) \triangleq \{ e \in C \mid succ(e) \text{ exists} \Rightarrow succ(e) \notin C \}$. A consistent cut is uniquely characterized by its frontier and vice versa.

Figure 3 shows a computation and its lattice of (non-trivial) consistent cuts. A consistent cut in the figure is represented by its frontier. For example, the consistent cut $C = \{e_3, e_2, e_1, f_2, f_1, \bot\}$ is represented by $\{e_3, f_2\}$.

Figure 3: (a) A computation $\langle E, \rightarrow \rangle$ (b) and its lattice corresponding to $\mathcal{C}(G)$

Given a consistent cut, a predicate is evaluated with respect to the values of variables resulting after executing all events in the cut. If a predicate $p$ evaluates to true for a consistent cut $C$, we say that $C$ satisfies $p$. We leave the predicate undefined for the trivial consistent cuts.
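The definitions of consistent cut and frontier translate directly into code. The sketch below enumerates the consistent cuts of the computation of Figure 3 by exhaustive search, which is exponential and meant only to make the definitions concrete. The event names, the edge list, and the choice to treat `bot` and `top` as belonging to no process when computing frontiers are assumptions of this example.

```python
from itertools import combinations

BOT, TOP = "bot", "top"  # fictitious global initial and final events

# Local order of events on each process (Figure 3(a)).
PROCESSES = {"P1": ["e1", "e2", "e3"], "P2": ["f1", "f2", "f3"]}
VERTICES = [BOT, "e1", "e2", "e3", "f1", "f2", "f3", TOP]
EDGES = [
    (BOT, "e1"), (BOT, "f1"),      # the initial event precedes everything
    ("e1", "e2"), ("e2", "e3"),    # process order on P1
    ("f1", "f2"), ("f2", "f3"),    # process order on P2
    ("f1", "e1"),                  # message sent at f1, received at e1
    ("e3", TOP), ("f3", TOP),      # the final event follows everything
]


def is_consistent_cut(cut, edges=EDGES):
    """A cut contains every incoming neighbour of each of its vertices."""
    return all(u in cut for (u, v) in edges if v in cut)


def consistent_cuts(vertices=VERTICES, edges=EDGES):
    """All consistent cuts, found by testing every subset of the vertices."""
    cuts = []
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            cut = frozenset(subset)
            if is_consistent_cut(cut, edges):
                cuts.append(cut)
    return cuts


def frontier(cut):
    """Events of the cut whose local successor, if any, is outside the cut."""
    result = set()
    for events in PROCESSES.values():
        for idx, event in enumerate(events):
            has_succ = idx + 1 < len(events)
            if event in cut and (not has_succ or events[idx + 1] not in cut):
                result.add(event)
    return result


ALL_CUTS = consistent_cuts()
TRIVIAL = {frozenset(), frozenset(VERTICES)}
NON_TRIVIAL = [c for c in ALL_CUTS if c not in TRIVIAL]
```

For this graph the search finds the two trivial cuts plus thirteen non-trivial ones, each of which contains `bot` and omits `top`, and `frontier({"e3", "e2", "e1", "f2", "f1", "bot"})` evaluates to `{"e3", "f2"}`, matching the example in the text.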

5 Background on Slicing

The notion of computation slice is based on Birkhoff's Representation Theorem for Finite Distributive Lattices [6], which we describe next.

5.1 Birkhoff's Theorem

We first describe some concepts needed to understand the theorem. Given a lattice, its meet (infimum) and join (supremum) operators are denoted by $\sqcap$ and $\sqcup$, respectively. A lattice is distributive if meet distributes over join [6]. We call an element of a lattice join-irreducible if it cannot be expressed as the join of two distinct elements (of the lattice), both different from itself [6].

Let $L$ be a lattice and $JI(L)$ be the set of its join-irreducible elements. In case $L$ is a distributive lattice, it satisfies an important property. Specifically, every element in $L$ can be expressed as the join of some subset of elements in $JI(L)$ and vice versa [6, Birkhoff's Theorem]. In other words, $JI(L)$ completely characterizes $L$. This is significant because $|JI(L)|$ is generally much smaller (exponentially in many cases) than $|L|$. Hence if some computation on $L$ can instead be performed on $JI(L)$, we obtain a significant computational advantage.

Consider a computation $\langle E, \rightarrow \rangle$ and let $\mathcal{C}(E)$ denote the set of its consistent cuts. In [15], it was shown that $\mathcal{C}(E)$ forms a distributive lattice under the relation $\subseteq$; its join and meet operators correspond to set union ($\cup$) and set intersection ($\cap$), respectively. Furthermore, no additional structural property is satisfied by $\mathcal{C}(E)$. The set of join-irreducible elements of $\mathcal{C}(E)$ is isomorphic to the set of strongly connected components of $\langle E, \rightarrow \rangle$.

Now, consider a subset $D \subseteq \mathcal{C}(E)$. We say that $D$ forms a sublattice of $\mathcal{C}(E)$ if $D$ is closed under set union and set intersection. That is, given two consistent cuts from $D$, the consistent cuts obtained by their set union and set intersection also belong to $D$. It can be proved that any sublattice of a distributive lattice is also a distributive lattice [6]. Thus if $D$ is a sublattice of $\mathcal{C}(E)$, then, using Birkhoff's Theorem, $JI(D)$ completely characterizes $D$. This forms the basis for the notion of computation slice.
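The gap between $|JI(L)|$ and $|L|$ can be observed on a small example. The sketch below builds the lattice of consistent cuts of a six-event computation in the traditional happened-before model (the events and edges are those of Figure 3(a) without the fictitious events) and extracts its join-irreducible elements by brute force. It assumes the common convention that the least element is not counted as join-irreducible. All names are illustrative.

```python
from itertools import combinations

EVENTS = ["e1", "e2", "e3", "f1", "f2", "f3"]
EDGES = [
    ("e1", "e2"), ("e2", "e3"),   # process order on P1
    ("f1", "f2"), ("f2", "f3"),   # process order on P2
    ("f1", "e1"),                 # message from P2 to P1
]


def consistent_cuts():
    """All down-closed subsets of EVENTS (traditional model, no bot/top)."""
    cuts = set()
    for size in range(len(EVENTS) + 1):
        for subset in combinations(EVENTS, size):
            cut = frozenset(subset)
            if all(u in cut for u, v in EDGES if v in cut):
                cuts.add(cut)
    return cuts


def join_irreducibles(lattice):
    """Elements that are not the union of two other elements of the lattice.

    Assumption: the least element is excluded, as is conventional.
    """
    bottom = frozenset.intersection(*lattice)
    result = set()
    for c in lattice:
        if c == bottom:
            continue
        reducible = any(
            a | b == c
            for a, b in combinations(lattice, 2)
            if a != c and b != c
        )
        if not reducible:
            result.add(c)
    return result


L = consistent_cuts()
JI = join_irreducibles(L)

if __name__ == "__main__":
    print(len(L), len(JI))
```

Here the lattice has 13 elements but only 6 join-irreducible ones, one per event, and every cut is the union of the join-irreducible cuts it contains. With more processes and fewer messages the lattice grows exponentially while the number of join-irreducible elements stays bounded by the number of events.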

5.2 Computation Slice

Roughly speaking, a computation slice (or simply a slice) is a concise representation of all those consistent cuts of the computation that satisfy the predicate. More precisely,

Definition 1 (slice [32]) A slice of a computation with respect to a predicate is a directed graph with the least number of consistent cuts that contains all consistent cuts of the given computation for which the predicate evaluates to true.

We denote the slice of a computation $\langle E, \rightarrow \rangle$ with respect to a predicate $p$ by $slice(\langle E, \rightarrow \rangle, p)$. Note that $\langle E, \rightarrow \rangle = slice(\langle E, \rightarrow \rangle, true)$. It was proven in [32] that the slice exists and is uniquely defined for all predicates. The main idea behind the proof is as follows. Consider the computation $\langle E, \rightarrow \rangle$ and a predicate $p$. Let $\mathcal{C}(E)$ denote the set of consistent cuts of $\langle E, \rightarrow \rangle$ and further, let $\mathcal{C}_p(E) \subseteq \mathcal{C}(E)$ be the subset of those consistent cuts that satisfy $p$. We show that there exists a unique subset $D \subseteq \mathcal{C}(E)$ satisfying the following conditions. First, $D$ contains $\mathcal{C}_p(E)$, that is, $\mathcal{C}_p(E) \subseteq D$. Second, $D$ forms a sublattice of $\mathcal{C}(E)$. Last, among all sublattices that fulfill the first two conditions, $D$ is the smallest one.

From Birkhoff's Theorem, $JI(D)$, the set of join-irreducible elements of $D$, completely characterizes $D$. We call the poset (partially ordered set) induced on $JI(D)$ by the relation $\subseteq$ the slice of $\langle E, \rightarrow \rangle$ with respect to $p$. Each join-irreducible element gives rise to a meta-event. Alternatively, the slice can also be represented by a directed graph drawn on the set of events such that the set of consistent cuts of the graph is exactly $D$. Such a graph can be obtained by simply forming a strongly connected component out of each meta-event. Whereas the poset representation of a slice is better for presentation purposes, the graph representation is more suited for slicing algorithms.

Every slice derived from the computation $\langle E, \rightarrow \rangle$ has the trivial consistent cuts ($\emptyset$ and $E$) among its set of consistent cuts. A slice is empty if it has no non-trivial consistent cuts [32]. In the rest of the paper, unless otherwise stated, a consistent cut refers to a non-trivial consistent cut. In general, a slice will contain consistent cuts that do not satisfy the predicate (besides trivial consistent cuts). In case a slice does not contain any such cut, it is called lean. We next give the class of predicates for which the slice is lean.

Regular Predi ates

Given a computation, the set of consistent cuts satisfying a regular predicate forms a sublattice of the set of consistent cuts of the computation [14]. Equivalently,

Definition 2 (regular predicate [32]) A predicate is regular if given two consistent cuts that satisfy the predicate, the consistent cuts obtained by their set union and set intersection also satisfy the predicate. Formally, given a regular predicate $p$,

$(C \text{ satisfies } p) \land (D \text{ satisfies } p) \Rightarrow (C \cap D \text{ satisfies } p) \land (C \cup D \text{ satisfies } p)$

We say that a regular predicate is non-temporal if it does not contain temporal operators such as EF, AG, and EG, otherwise it is a temporal regular predicate. In [14] polynomial-time algorithms are given for computing slices for non-temporal regular predicates. In [35], we showed that EF(p), AG(p), and EG(p) are temporal regular predicates when p is regular and gave polynomial-time algorithms to compute these slices, which we will briefly explain in the next section.

Some examples of non-temporal regular predicates are monotonic channel predicates such as "there are at least k messages in transit from $P_i$ to $P_j$", conjunction of local predicates such as "$P_i$ and $P_j$ are in critical section", and relational predicates such as $x_1 - x_2 \le 5$, where $x_i$ is a monotonically non-decreasing integer variable on process $i$. From the definition of a regular predicate we deduce that a regular predicate has a least satisfying cut and a greatest satisfying cut. Furthermore, the class of regular predicates is closed under conjunction.

Also in [31] polynomial-time algorithms are given to compute slices with respect to boolean combination of regular predicates. Given the slices with respect to two regular predicates, the complexity of computing the slice for the conjunction and disjunction of these regular predicates is $O(n^2 |E|)$. The complexity of computing the slice for the negation of a regular predicate is $O(n^2 |E|^2)$. Note that regular predicates are not closed under disjunction and negation operators, therefore slices obtained with respect to predicates that contain these operators may not be lean.
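Definition 2 and the smallest-sublattice construction of Section 5.2 can be checked by brute force on a toy example. The Python sketch below is illustrative only and is not taken from POTA. It assumes a hypothetical computation with two processes, three events each and no messages, so that every pair of per-process event counts is a consistent cut and set union and intersection reduce to componentwise max and min. The local predicates `in_cs_1` and `in_cs_2` are made-up stand-ins for "P_i is in critical section".

```python
from itertools import product

def join(c, d):
    """Set union of two cuts given as per-process event counts."""
    return tuple(max(a, b) for a, b in zip(c, d))

def meet(c, d):
    """Set intersection of two cuts given as per-process event counts."""
    return tuple(min(a, b) for a, b in zip(c, d))

def is_regular(cuts, pred):
    """Definition 2: satisfying cuts are closed under union and intersection."""
    sat = [c for c in cuts if pred(c)]
    return all(pred(join(c, d)) and pred(meet(c, d)) for c in sat for d in sat)

def smallest_sublattice(cuts, pred):
    """Close the satisfying cuts under join and meet (the cuts of the slice)."""
    closed = {c for c in cuts if pred(c)}
    while True:
        extra = {op(c, d) for c in closed for d in closed for op in (join, meet)}
        extra -= closed
        if not extra:
            return closed
        closed |= extra

# Hypothetical computation: 2 processes, 3 events each, no messages.
cuts = list(product(range(4), repeat=2))

in_cs_1 = lambda c: c[0] in (1, 2)   # assumed local predicate on process 1
in_cs_2 = lambda c: c[1] in (2, 3)   # assumed local predicate on process 2
conj = lambda c: in_cs_1(c) and in_cs_2(c)
disj = lambda c: in_cs_1(c) or in_cs_2(c)

print(is_regular(cuts, conj))   # conjunction of local predicates
print(is_regular(cuts, disj))   # disjunction of the same predicates
lean = smallest_sublattice(cuts, disj) == {c for c in cuts if disj(c)}
print(lean)                     # whether the slice for the disjunction is lean
```

On this example the conjunction passes the closure test while the disjunction fails it, and the sublattice generated for the disjunction picks up cuts that do not satisfy it, which is what "not lean" means above. The enumeration is exponential in the number of processes and is meant only to make the definitions concrete.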

6.1 RCTL+ Syntax and Semantics

We define successor of a cut by a relation $\triangleright \subseteq \mathcal{C}(G) \times \mathcal{C}(G)$ such that $C \triangleright D$ if and only if $D = C \cup e$, where $e$ is the set of vertices in some strongly connected component in $\langle E, \rightarrow \rangle$ and $e \cap C = \emptyset$. We denote the reflexive closure of this relation by $\trianglerighteq$. A consistent cut sequence $C_0, C_1, \ldots, C_k$ of $(\mathcal{C}(G), \subseteq)$ satisfies that for each $0 \le i < k$, $C_i \triangleright C_{i+1}$. We say that a cut $D$ is reachable from a cut $C$ if $C \subseteq D$.

Propositional temporal logics use a finite set of atomic propositions $AP$, each one of which represents some property of the global state. A labeling function $\lambda : \mathcal{C}(G) \rightarrow 2^{AP}$ assigns to each global state the set of predicates from $AP$ that hold in it. In this paper we assume that atomic propositions are non-temporal regular predicates and their negations. The formal syntax of RCTL+ is given below.

- Every predicate $ap \in AP$ is an RCTL+ formula.
- If $p$ and $q$ are RCTL+ formulas, then so are $p \lor q$, $p \land q$, EF(p), EG(p), and AG(p).

Given a finite distributive lattice $L = (\mathcal{C}(G), \subseteq)$, the formulas of RCTL+ are interpreted over the consistent cuts in $\mathcal{C}(G)$. Let $p$ be an RCTL+ formula and $C$ be a consistent cut in $\mathcal{C}(G)$. Then, the satisfaction relation $L, C \models p$ means that predicate $p$ holds at consistent cut $C$ in lattice $L = (\mathcal{C}(G), \subseteq)$ and is defined inductively below. We denote $C \models p$ as a short form for $L, C \models p$, when $L$ is clear from the context.

- $C \models ap$ iff $ap \in \lambda(C)$ for an atomic proposition $ap$.
- $C \models p \land q$ iff $C \models p$ and $C \models q$.
- $C \models p \lor q$ iff either $C \models p$ or $C \models q$.
- $C \models EG(p)$ iff for some consistent cut sequence $C_0, \ldots, C_k$ such that (i) $C_0 = C$, (ii) $C_k = E$, (iii) $C_i \triangleright C_{i+1}$ for $0 \le i < k$, we have (iv) $C_i \models p$ for all $0 \le i \le k$.
- $C \models AG(p)$ iff for all consistent cut sequences $C_0, \ldots, C_k$ such that (i) $C_0 = C$, (ii) $C_k = E$, (iii) $C_i \triangleright C_{i+1}$ for $0 \le i < k$, we have (iv) $C_i \models p$ for all $0 \le i \le k$.
- $C \models EF(p)$ iff for some consistent cut sequence $C_0, \ldots, C_k$ such that (i) $C_0 = C$, (ii) $C_k = E$, (iii) $C_i \triangleright C_{i+1}$ for $0 \le i < k$, we have (iv) $C_i \models p$ for some $0 \le i \le k$.

We define $L \models p$ if and only if $L, \{\bot\} \models p$. The formula $C \models AG(p)$ (resp. $C \models EG(p)$) intuitively means that for all consistent cut sequences (resp. for some consistent cut sequence) $C, \ldots, E$, $p$ holds at every cut of the sequence. The formula $C \models EF(p)$ intuitively means that for some consistent cut sequence $C, \ldots, E$, there exists a consistent cut that satisfies $p$. We define RCTL as the subset of RCTL+ where disjunction and negation operators are not allowed.

The predicate detection problem is to decide whether the initial consistent cut of a distributed computation satisfies a predicate.

7 Algorithms for Computing Slices for Temporal Predicates
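As a point of reference for the slicing algorithms in this section, the RCTL+ semantics of Section 6.1 can be evaluated directly by walking the lattice of consistent cuts. The following Python sketch is a brute-force baseline under stated assumptions, not the POTA implementation: cuts are tuples of per-process event counts, every strongly connected component is a single event, and the names `make_checker`, `deps`, `x_vals` and `y_vals` (including the variable values) are hypothetical. It visits every consistent cut, which is exactly the state explosion that slicing is meant to avoid.

```python
from functools import lru_cache

def make_checker(n_events, deps):
    """n_events[i] is the number of events on process i.
    deps holds pairs ((i, a), (j, b)): the a-th event of process i
    happens before the b-th event of process j."""
    final = tuple(n_events)

    def consistent(cut):
        return all(cut[i] >= a for ((i, a), (j, b)) in deps if cut[j] >= b)

    def successors(cut):
        # C |> D: D extends C by exactly one event.
        for i, done in enumerate(cut):
            if done < n_events[i]:
                nxt = cut[:i] + (done + 1,) + cut[i + 1:]
                if consistent(nxt):
                    yield nxt

    @lru_cache(maxsize=None)
    def holds(formula, cut):
        op = formula[0]
        if op == "ap":
            return formula[1](cut)
        if op == "and":
            return holds(formula[1], cut) and holds(formula[2], cut)
        if op == "or":
            return holds(formula[1], cut) or holds(formula[2], cut)
        if op == "EF":
            return holds(formula[1], cut) or any(
                holds(formula, d) for d in successors(cut))
        if op == "EG":
            return holds(formula[1], cut) and (
                cut == final or any(holds(formula, d) for d in successors(cut)))
        if op == "AG":
            return holds(formula[1], cut) and all(
                holds(formula, d) for d in successors(cut))
        raise ValueError(f"unknown operator: {op}")

    return holds

# Hypothetical computation: two processes with three events each and one
# message from the first event of process 1 to the first event of process 0.
holds = make_checker((3, 3), frozenset({((1, 1), (0, 1))}))
x_vals = [0, 2, 4, 5]   # assumed value of x after 0..3 events on process 0
y_vals = [0, 0, 2, 6]   # assumed value of y after 0..3 events on process 1
bad = ("ap", lambda c: x_vals[c[0]] == 5 and y_vals[c[1]] == 0)
ok = ("ap", lambda c: not (x_vals[c[0]] == 5 and y_vals[c[1]] == 0))

initial = (0, 0)
print(holds(("EF", bad), initial))   # can the bad state be reached?
print(holds(("AG", ok), initial))    # is the bad state avoided on all paths?
print(holds(("EG", ok), initial))    # is it avoided on some path?
```

Predicate detection in the sense of Section 6.1 corresponds to calling `holds` on the initial cut, here `(0, 0)`.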

Our distributed program analysis tool POTA uses computation slicing for predicate detection. Mittal and Garg [31] also used computation slicing for efficient detection of predicates of the form EF(p), EG(p), AG(p) for non-temporal regular p. However, their predicate detection algorithm is based on computing slices for non-temporal regular predicates. Therefore, it cannot be used for detecting nested temporal predicates such as AG(p) when p is a temporal predicate like p = EF(q). In this section, we explain our slicing algorithms from [35] for temporal regular predicates to enable efficient predicate detection for RCTL+, which also includes nested temporal predicates.

The slice of a computation with respect to a temporal predicate is the smallest computation that contains all consistent cuts of the given computation for which the predicate holds. We proved in [35] that temporal predicates EF(p), EG(p), and AG(p) are regular for regular p. Therefore, the slices for these temporal predicates are lean. The input to each algorithm in this section is a computation $\langle E, \rightarrow \rangle$ and its slice with respect to a regular predicate p, that is, $slice(\langle E, \rightarrow \rangle, p)$. The output of each algorithm is an application of a temporal operator on the slice. For example, in order to generate a slice with respect to AG(EF(p)), where p is a non-temporal regular predicate, we can use the slicing algorithms explained in this section as follows: First, we compute the slice for p using the algorithms in [14, 31] for non-temporal regular predicates. Then, we give this slice and the computation to the EF slicing algorithm to obtain $slice(\langle E, \rightarrow \rangle, EF(p))$. Finally, the output of the EF slicing algorithm and the computation are given as input to the AG slicing algorithm to obtain $slice(\langle E, \rightarrow \rangle, AG(EF(p)))$.

Since the consistent cuts of the slice of a computation are a subset of the consistent cuts of the computation, the slice can be obtained by adding edges to the computation. In other words, the slice contains additional edges that do not exist in the computation. For example, consider Figure 7(a), which displays the slice of the computation in Figure 3 with respect to $\neg((x = 5) \land (y = 2))$. The only consistent cut in the computation that does not satisfy the predicate is $\{e_3, e_2, e_1, f_1, \bot\}$. By adding the edge $(f_2, e_3)$, we disallow this consistent cut from the slice. Below, for computing slices for EF(p), we will show which edges we add to the computation. Similarly, since the consistent cuts of the slice for AG(p) are a subset of the consistent cuts of the slice for p, the slice for AG(p) can be obtained by adding edges to the slice for p.

Now we explain Algorithm A1 in Figure 5 for generating the slice of a computation with respect to EF(p). From the definition of EF(p), all consistent cuts of the computation that can reach the greatest consistent cut that satisfies p, say W, will also satisfy EF(p), and furthermore these are the only cuts that satisfy EF(p). We can find the cut W using $slice(\langle E, \rightarrow \rangle, p)$ when it is nonempty. We construct the slice for EF(p) from the computation so that W is the final cut of the slice. To ensure that all cuts which cannot reach W do not belong to the slice, we add edges from $\top$ to the successors of events in the frontier of W. Adding an edge from $\top$ to an event makes any cut that contains the event trivial.

Figure 4 shows the application of Algorithm A1. Given the slice of the computation in Figure 3(a) for some predicate p as shown in Figure 4(a), first we compute the final cut of the slice for p, that is, $\{e_2, f_3\}$. Then, on the computation, we add an edge from $\top$ to the successor of $e_2$, that is $e_3$. The successor of $f_3$ does not exist so we do not add any other edges. The resulting slice for EF(p) is displayed in Figure 4(c).

Now we describe the AG(p) slicing algorithm in Figure 5. We explained above that to obtain the slice for AG(p) we will add edges to the slice for p and eliminate consistent cuts that do not belong to the slice for AG(p). Now we show which edges we should add. We claim that consistent cuts of $slice(\langle E, \rightarrow \rangle, p)$ that do not include vertex e of each additional edge (e, f) do not satisfy AG(p). For simplicity, let $slice(\langle E, \rightarrow \rangle, p)$ have a single additional edge (e, f). For example, consistent cuts $\{\bot\}$, $\{f_1, \bot\}$, $\{e_1, f_1, \bot\}$, and $\{e_2, e_1, f_1, \bot\}$ of the slice in Figure 7(a) do not include vertex $f_2$ of the additional edge $(f_2, e_3)$. It is easy to see that these four consistent cuts do not satisfy AG(p) and therefore we should add edges to eliminate them. We now give a proof sketch of the correctness of the algorithm for the simplified case with a single additional edge. The proof for the full case can be found in [35].
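The EF construction described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: Algorithm A1 reads the cut W off the slice for p, whereas the sketch finds W by enumerating all consistent cuts, which is exponential and only viable for toy inputs. The computation (two processes, one message from `f1` to `e1`), the predicate `p`, and the names `BOT`, `TOP`, `build_edges`, `nontrivial_cuts` and `ef_slice` are all assumptions made for the example.

```python
from itertools import combinations

BOT, TOP = "BOT", "TOP"   # fictitious initial and final events

def build_edges(procs, messages):
    """Process order (BOT first, TOP last on every process) plus message edges."""
    edges = set(messages)
    for events in procs:
        chain = [BOT] + events + [TOP]
        edges |= set(zip(chain, chain[1:]))
    return edges

def nontrivial_cuts(procs, edges):
    """Brute force: predecessor-closed event sets containing BOT but not TOP."""
    all_events = [e for events in procs for e in events]
    found = []
    for size in range(len(all_events) + 1):
        for chosen in combinations(all_events, size):
            cut = frozenset(chosen) | {BOT}
            if all(u in cut for (u, v) in edges if v in cut):
                found.append(cut)
    return found

def ef_slice(procs, edges, p):
    """Add an edge from TOP to the successor of each frontier event of W."""
    satisfying = [c for c in nontrivial_cuts(procs, edges) if p(c)]
    if not satisfying:
        return None                        # empty slice
    w = frozenset().union(*satisfying)     # greatest satisfying cut (p regular)
    graph = set(edges)
    for events in procs:
        executed = sum(1 for e in events if e in w)
        if executed < len(events):         # the frontier event has a successor
            graph.add((TOP, events[executed]))
    return graph

# Hypothetical computation and regular predicate.
procs = [["e1", "e2", "e3"], ["f1", "f2", "f3"]]
computation = build_edges(procs, [("f1", "e1")])
p = lambda cut: "e2" in cut and "e3" not in cut and "f2" in cut

ef_graph = ef_slice(procs, computation, p)
ef_cuts = set(nontrivial_cuts(procs, ef_graph))
print(sorted(ef_graph - computation))      # the edges that were added
print(len(ef_cuts))                        # consistent cuts left in the slice
```

In this example the greatest satisfying cut has frontier `e2`, `f3`, so the only edge added goes from `TOP` to `e3`, and the surviving cuts are exactly those from which a cut satisfying `p` is still reachable.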

Theorem 1 Given a computation $\langle E, \rightarrow \rangle$ and $slice(\langle E, \rightarrow \rangle, p)$, a consistent cut D in $\langle E, \rightarrow \rangle$ satisfies AG(p) iff it includes vertex e of the additional edge (e, f) in $slice(\langle E, \rightarrow \rangle, p)$.

Proof Sketch: If a consistent cut D does not include vertex e then there exists a consistent cut H that can be reached from D in the computation such that H does not include e but includes f. In this case, it is clear that H does not satisfy p since (e, f) is an edge in $slice(\langle E, \rightarrow \rangle, p)$ and every consistent cut of $slice(\langle E, \rightarrow \rangle, p)$ that includes f must include e. Therefore from the definition of AG(p), D does not satisfy AG(p).

Now we prove the other direction. If a consistent cut D does not satisfy AG(p) then there exists a consistent cut H reachable from D such that H does not satisfy p. We know that only the consistent cuts that include f but not e do not satisfy p. Since H is reachable from D and H does not include e, we have that D also does not include e.

In Algorithm A2, for any additional edge (e, f), we add an edge from vertex e to vertex $\bot$. This ensures that consistent cuts of the computation that do not include vertex e of any additional edge (e, f) are disallowed from the slice, whereas the rest still belong to $slice(\langle E, \rightarrow \rangle, AG(p))$. For example, consistent cut $\{e_1, f_1, \bot\}$ of the slice in Figure 7(a) does not include vertex $f_2$ of the additional edge $(f_2, e_3)$ in Figure 7(a), therefore we add an edge $(f_2, \bot)$ and obtain the slice in Figure 7(c). The cut $\{e_1, f_1, \bot\}$ cannot be a consistent cut of this new slice since it has to include vertex $f_2$.

The algorithm for EG(p) slicing displayed in Figure 6 is similar to the AG(p) slicing algorithm and will not be explained here. The complexity of the temporal slicing algorithms is $O(n|E|)$ [35].

Figure 4: (a) A slice of $\langle E, \rightarrow \rangle$ in Fig. 3 (b) the corresponding sublattice (c) The application of the temporal operator EF on the slice in (a) (d) the corresponding sublattice

Algorithm A1
Input: A computation $\langle E, \rightarrow \rangle$ and $slice(\langle E, \rightarrow \rangle, p)$
Output: $slice(\langle E, \rightarrow \rangle, EF(p))$
Step 1. Let G be $\langle E, \rightarrow \rangle$ and let W be the final cut of $slice(\langle E, \rightarrow \rangle, p)$
Step 2. If W exists then
Step 3.     $\forall e \in frontier(W)$: add an edge from the vertex $\top$ to $succ(e)$ in G
Step 4.     return G
Step 5. else return empty slice

Algorithm A2
Input: A computation $\langle E, \rightarrow \rangle$ and $slice(\langle E, \rightarrow \rangle, p)$
Output: $slice(\langle E, \rightarrow \rangle, AG(p))$
Step 1. Let G be $slice(\langle E, \rightarrow \rangle, p)$
Step 2. For each pair of vertices (e, f) in G such that,
            (i) $\neg(e \rightarrow f)$ in $\langle E, \rightarrow \rangle$, and
            (ii) $(e \rightarrow f)$ in G
        add an edge from vertex e to the vertex $\bot$
Step 3. return G

Figure 5: Algorithms for generating a slice with respect to EF(p) and AG(p)

Algorithm A3
Input: A computation $\langle E, \rightarrow \rangle$ and $slice(\langle E, \rightarrow \rangle, p)$
Output: $slice(\langle E, \rightarrow \rangle, EG(p))$
Step 1. Let G be $slice(\langle E, \rightarrow \rangle, p)$
Step 2. For each pair of vertices (e, f) in G such that,
            (i) $\neg(e \rightarrow f)$ in $\langle E, \rightarrow \rangle$, and
            (ii) $(e \rightarrow f)$ and $(f \rightarrow e)$ in G
        add an edge from vertex e to the vertex $\bot$
Step 3. return G

Figure 6: Algorithm for generating a slice with respect to EG(p)
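The AG construction of Algorithm A2 can be tried out on a small example in the same brute-force style. The Python sketch below is an assumption-laden illustration, not the POTA implementation: it reads "e → f" as "f is reachable from e along edges", takes the slice for p as a given edge set, and checks the result against a direct enumeration of the cuts that satisfy AG(p). The computation, the predicate `p`, and all function names are hypothetical.

```python
from itertools import combinations

BOT, TOP = "BOT", "TOP"   # fictitious initial and final events

def build_edges(procs, messages):
    """Process order (BOT first, TOP last on every process) plus message edges."""
    edges = set(messages)
    for events in procs:
        chain = [BOT] + events + [TOP]
        edges |= set(zip(chain, chain[1:]))
    return edges

def nontrivial_cuts(procs, edges):
    """Brute force: predecessor-closed event sets containing BOT but not TOP."""
    all_events = [e for events in procs for e in events]
    found = []
    for size in range(len(all_events) + 1):
        for chosen in combinations(all_events, size):
            cut = frozenset(chosen) | {BOT}
            if all(u in cut for (u, v) in edges if v in cut):
                found.append(cut)
    return found

def reachable(edges, source):
    """Vertices reachable from source along one or more edges."""
    seen, stack = set(), [source]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def ag_slice(procs, computation, slice_p):
    """For every e ordered before some f only in slice_p, add the edge (e, BOT)."""
    graph = set(slice_p)
    vertices = [BOT, TOP] + [e for events in procs for e in events]
    for e in vertices:
        newly_ordered = reachable(slice_p, e) - reachable(computation, e) - {e}
        if newly_ordered:
            graph.add((e, BOT))
    return graph

# Hypothetical computation; its slice for p has one additional edge (f2, e3).
procs = [["e1", "e2", "e3"], ["f1", "f2", "f3"]]
computation = build_edges(procs, [("f1", "e1")])
slice_p = computation | {("f2", "e3")}
p = lambda cut: not ("e3" in cut and "f2" not in cut)

ag_graph = ag_slice(procs, computation, slice_p)
ag_cuts = set(nontrivial_cuts(procs, ag_graph))
print(sorted(ag_graph - slice_p))   # the edges that were added
print(len(ag_cuts))                 # consistent cuts left in the slice
```

Here the only vertex that gains a successor in the slice is `f2`, so the single edge from `f2` to `BOT` is added and every remaining cut must contain `f2`, in line with the discussion of the additional edge $(f_2, e_3)$ above.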

7.1 Complexity of RCTL Predicate Detection

Given a predicate in RCTL we can compute the slice for the predicate recursively from inside-out by applying the appropriate temporal or boolean operator on the slices. It is then easy to see if the predicate is satisfied by just checking whether the initial state of the computation and the slice are the same. The complexity of predicate detection is dominated by the complexity of computing the slice with respect to a non-temporal regular predicate, which has $O(n^2 |E|)$ complexity [14, 31]. Therefore, the overall complexity of predicate detection for RCTL is $O(|p| \cdot n^2 |E|)$, where $|p|$ is the number of boolean and temporal operators in p. The predicate detection in RCTL+ has worst case exponential-time complexity. However, the slice is in general much smaller than the computation, which we validate with experiments of the next section.

Figure 7: (a) The slice of $\langle E, \rightarrow \rangle$ in Fig. 3 with respect to $\neg((x = 5) \land (y = 0))$ (b) the corresponding sublattice (c) The slice of $\langle E, \rightarrow \rangle$ in Fig. 3 with respect to $AG(\neg((x = 5) \land (y = 2)))$ (d) the corresponding sublattice

8 Experimental Results

In order to evaluate the effectiveness of POTA, we performed experiments with scalable protocols, comparing our computation slicing based approach with the partial order reduction based approach of SPIN [21]. All experiments were performed on a 1.4 GHz Pentium 4 machine running Linux. We restricted the memory usage to 512MB, but did not set a time limit. The two performance metrics we measured are running time (T in seconds) and memory usage (M in megabytes). We use the symbol * to denote that the verification was not completed due to running out of memory. Due to space limitations we present some of the experimental results in this section. The other results can be found in [34]. We consider the following message passing distributed programs: distributed dining philosophers, primary-secondary, and GIOP protocols.

8.1 Distributed Dining Philosophers (ddph)

We use the Java protocol from [18] for this exercise and check the following properties. The complement of the safety property, that is, $\bigvee_{i,j \in 0 \ldots (n-1)} EF(eat_i \land eat_j)$, where i and j denote philosophers next to each other. The complement of the liveness property, that is, $\bigvee_{i \in 0 \ldots (n-1)} EF(hungry_i \land EG(\neg eat_i))$, for each philosopher i. Observe that the negation of a local predicate $\neg eat_i$ is also a local predicate and furthermore it is a regular predicate. Finally, we check the property $AG(EF(eat_i))$ which denotes that eating is possible from every state. Table 1 displays our results for the liveness property.

Table 1: Distributed Dining Philosophers, Liveness Property

    n    POTA T   POTA M    SPIN T    SPIN M
    3      0.14     0.18      0.1       1.67
    4      0.17     0.29      1.16      4.13
    5      0.19     0.36     15.67     35.46
    6      0.23     0.42    144.75    223.26
    7      0.22     0.5       *         *
    8      0.4      0.64      *         *
    9      0.46     0.78      *         *
   10      0.49     0.92      *         *
   20      3.54     1.57      *         *
   30     10.37     4.53      *         *
   40     18.16     6.84      *         *
  100    137.26    33.45      *         *
  250    965.3     96.01      *         *

8.2 Primary Secondary

The primary secondary program [39] concerns an algorithm designed to ensure that the system always contains a pair of processes acting together as primary and secondary. The property requires that there is a pair of processes $P_i$ and $P_j$ such that (1) $P_i$ is acting as a primary and correctly thinks that $P_j$ is its secondary, and (2) $P_j$ is acting as a secondary and correctly thinks that $P_i$ is its primary. Both the primary and secondary may choose new processes as their successor at any time. The complement of the safety property is

$EF\bigl(\bigwedge_{i,j} (\neg isPrimary_i \lor \neg isSecondary_j \lor (secondary_i \ne P_j) \lor (primary_j \ne P_i))\bigr)$

where $i, j \in 0 \ldots (n-1)$, $i \ne j$. Note that this predicate contains disjunction operators and the slice may not be lean. However, Table 2 shows that even in this case slicing can reduce the state space substantially.

Table 2: Primary Secondary, Safety Property

    n    POTA T   POTA M    SPIN T    SPIN M
    3      0.14     0.44      0.01      1.57
    4      0.21     0.82      0.01      1.57
    5      0.28     1.38      0.02      1.67
    6      0.6      0.38      0.07      2.29
    7      0.97     5.7       0.38      5.56
    8      1.03     3.14      0.45      6.48
    9      1.34     5.08      1.96     23.48
   10      2.46    13.73      *         *
   20      7.88    64.74      *         *
   30     33.29   178.35      *         *
   40    133.37   239.2       *         *

8.3 GIOP

In this section, we present experimental results for the General Inter-ORB Protocol (GIOP) which was verified in [23] using SPIN. The Common Object Request Broker Architecture (CORBA) [17] describes the architecture of a middleware platform that supports the implementation of applications in distributed and heterogeneous environments. The CORBA standard is issued by OMG. The ORB is the key component of the CORBA programming model. An ORB is responsible for transferring operations from Clients to Servers. This requires the ORB to locate a Server implementation (and possibly activate it), transmit the operation and its parameters, and finally return the results back to the Client.

The General Inter-ORB Protocol (GIOP) is the abstract protocol which is used for communications between CORBA ORBs. It specifies the transfer syntax and a standard set of message formats for ORB interoperation over any connection-oriented transport protocol. GIOP is designed to be simple and easy to implement, while still allowing for reasonable scalability and performance. In order to allow server objects to move between different ORBs and have messages forwarded to them wherever they are, GIOP supports server migration.

Figure 8 displays the high level view of the Promela model of the GIOP system as depicted in [23]. The protocol consists of User, Client, Transport, Agent and Server processes. Here, we conduct experiments for 4 of the 8 LTL predicates used in [23] (properties (iv) and (v) are considered as one).

1. After sending a URequest message a User should eventually receive the corresponding UReply message. $EF(URequestSent_i \land EG(\neg UReplyReceived_i))$, for all users i.
2. After sending an SRequest the GIOP-Agent should eventually receive a corresponding SReply. $EF(SRequestSent_i \land EG(\neg SReplyReceived_i))$, for all agents i.
3. Requests sent by a client are responded to eventually by a reply unless they have been cancelled. $EF(CRequestSent_i \land EG(\neg CReplyReceived_i \lor \neg CCancelSent_i))$, for all clients i.
4. If the user received no exception, its request was performed exactly once. $AG\bigl(\neg NoException_i \lor \bigvee_k \bigwedge_j (Server_jProcessed_i = m)\bigr)$, where m = 1 if k = j and m = 0 otherwise, for all users i and for all servers j, k.
5. If the user received exception, its request was performed at most once. $AG\bigl(\neg SystemException_i \lor \bigvee_k \bigwedge_j (Server_jProcessed_i = m) \lor \bigwedge_l (Server_lProcessed_i = 0)\bigr)$, where m = 1 if k = j and m = 0 otherwise, for all users i and for all servers j, k, l.

Figure 8: GIOP model (blocks: User, GIOP Client, Transport, Transport, GIOP Agent, Server)

The full verification (not approximate) of GIOP by Kamel and Leue [23] even for the configuration in Figure 8 with 10 processes was not completed due to state explosion. They could verify a simplified version of the protocol without server migration with 10 processes. To enable verification for a larger number of processes, they used an approximation technique in SPIN called bit-state hashing where two bits of memory are used to store a reachable state. SPIN displays a state coverage number (hash-factor) at the end of a verification with bit-state hashing. With bit-state hashing, they could verify the unsimplified version of the protocol with 20 processes with 1.5 hash-factor, which means that the coverage was less than one percent since best coverage is obtained when the hash-factor is greater than 100.

We generated execution traces for a variety of GIOP architectures where we duplicated the User and Server blocks. In one case, we generated execution traces from the unsimplified version of the GIOP protocol where the total number of processes was increased to 250 and completed full verification of these traces. In Tables 3 and 4, we present our experimental results for the GIOP models with server migration.

Table 3: GIOP Property (iii)

    n    POTA T   POTA M    SPIN T    SPIN M
   10      0.13     0.5     362.61    320.03
   20      0.57     1.71      *         *
   30      0.42     3.19      *         *
   60      2.77    26.62      *         *
  170     28.12    92         *         *
  250     45.2    256.3       *         *

Table 4: GIOP Property (v)

    n    POTA T   POTA M    SPIN T    SPIN M
   10      0.12     0.48    319.21    305.49
   20      5.34     4.48      *         *
   30     19.06    24.62      *         *
   60    134.68   158.11      *         *
  170    174.33   257.69      *         *

8.4 Discussion

In all execution trace verifications, SPIN could verify up to only 7 processes in ddph, and 10 in primary secondary and GIOP protocols, even when the -DCOLLAPSE and -DMA compilation options were used. Observe that since the memory we use is larger than the one used in [23], the verification of the unsimplified GIOP with 10 processes is now possible in SPIN. We obtain more than three orders of magnitude speed up and state space reduction compared to partial order reduction with SPIN, as shown in the GIOP experiments. Using our slicing based technique we could verify up to 250 processes in some cases. We also injected faults into the traces and compared results. Even with bit-state hashing enabled verification in SPIN, the faults could not be found because the state spaces were too large and the coverage was low. However, the faults were easily found by POTA.

9 Conclusion

We have presented a partial order execution trace analysis tool (POTA) that implements our computation slicing algorithms. For problem sizes that preclude exhaustive program verification or exhaustive runtime verification, POTA proves to be an effective tool. Our technique is orthogonal to other reduction techniques, that is, one can always use POTA to reduce the state space as long as we can exploit the specification for computation slicing. As a next step, we would like to apply our theory to multi-threaded Java programs. Furthermore, we are also interested in the research of selection of execution traces.

Acknowledgements: We would like to acknowledge Neeraj Mittal for his contribution in the implementation of our tool. We also thank Gerard J. Holzmann for discussion on SPIN.

References

[1] M. Brorkens and M. Moller. Dynamic event generation for runtime checking using the JDI. In Klaus Havelund and Grigore Rosu, editors, Electronic Notes in Theoretical Computer Science, volume 70 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2002.
[2] K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63–75, February 1985.
[3] B. Charron-Bost, C. Delporte-Gallet, and H. Fauconnier. Local and temporal predicates in distributed systems. ACM Transactions on Programming Languages and Systems, 17(1):157–179, Jan 1995.
[4] E. M. Clarke and E. A. Emerson. Design and Synthesis of Synchronization Skeletons using Branching Time Temporal Logic. In Proc. of the Workshop on Logics of Programs, volume 131 of Lecture Notes in Computer Science, Yorktown Heights, New York, May 1981.
[5] R. Cooper and K. Marzullo. Consistent detection of global predicates. In Proc. of the Workshop on Parallel and Distributed Debugging, pages 163–173, Santa Cruz, CA, May 1991. ACM/ONR.
[6] B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, Cambridge, UK, 1990.
[7] D. Drusinsky. The temporal rover and the ATG rover. In SPIN Model Checking and Verification, volume 1885 of LNCS, pages 323–330, 2000.
[8] M. B. Dwyer and J. Hatcliff. Slicing software for model construction. In Proc. Workshop on Partial Evaluation and Semantic-Based Program Manipulation, pages 105–118, 1999.
[9] J. Esparza. Model checking using net unfoldings. Science of Computer Programming, 23(2):151–195, 1994. Also appeared in Proc. TAPSOFT '93, volume 668 of Lecture Notes in Computer Science, pages 613–628. Springer-Verlag, 1993.
[10] C. J. Fidge. Partial orders for parallel debugging. Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, published in ACM SIGPLAN Notices, 24(1):183–194, January 1989.
[11] B. Finkbeiner and H. B. Sipma. Checking finite traces using alternating automata. In Klaus Havelund and Grigore Rosu, editors, Runtime Verification 2001, volume 55 of Electronic Notes in Theoretical Computer Science, pages 44–60. Elsevier, July 2001.
[12] V. K. Garg. Elements of Distributed Computing. John Wiley & Sons, 2002.
[13] V. K. Garg and N. Mittal. On detecting global predicates in distributed computations. In Proc. of the 15th International Conference on Distributed Computing Systems (ICDCS), pages 3–10, Phoenix, Arizona, 2001.
[14] V. K. Garg and N. Mittal. On slicing a distributed computation. In Proc. of the 15th International Conference on Distributed Computing Systems (ICDCS), pages 322–329, Phoenix, Arizona, 2001.
[15] V. K. Garg and N. Mittal. On slicing a distributed computation. In Proc. of the 15th International Conference on Distributed Computing Systems (ICDCS), pages 322–329, Phoenix, Arizona, 2001.
[16] P. Godefroid and P. Wolper. A partial approach to model checking. In Proceedings of the 6th IEEE Symposium on Logic in Computer Science, pages 406–415, 1991.
[17] Object Management Group. The Common Object Request Broker: Architecture and Specification. August 1997.
[18] S. Hartley. Concurrent Programming: The Java Programming Language. Oxford University Press, 1998.
[19] K. Havelund and G. Rosu. Monitoring java programs with Java PathExplorer. In K. Havelund and G. Rosu, editors, Runtime Verification 2001, volume 55 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2001.
[20] K. Havelund and G. Rosu. Monitoring programs using rewriting. In Proceedings of Int. Conference on Automated Software Engineering (ASE'01), pages 135–143, San Diego, California, 2001.
[21] G. J. Holzmann. The model checker spin. IEEE Transactions on Software Engineering, 23(5), May 1997.
[22] M. Hurfin, M. Mizuno, M. Raynal, and M. Singhal. Efficient detection of conjunctions of local predicates. IEEE Transactions on Software Engineering, 24(8):664–677, 1998.
[23] M. Kamel and S. Leue. Formalization and validation of the general inter-orb protocol (giop) using promela and spin. Software Tools for Technology Transfer, 2(4):394–409, April 2000.
[24] M. Kim, S. Kannan, I. Lee, O. Sokolsky, and M. Viswanathan. Javamac: a run-time assurance tool for java programs. In K. Havelund and G. Rosu, editors, Runtime Verification 2001, volume 55 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2001.
[25] B. Korel and J. Rilling. Application of dynamic slicing in program debugging. In Proc. of the 3rd International Workshop on Automated and Algorithmic Debugging, pages 43–58, 1997.
[26] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.
[27] S. Leue and P.B. Ladkin. Implementing and verifying msc specifications using promela/xspin. In Proceedings of the DIMACS Workshop SPIN96, the 2nd International Workshop on the SPIN Verification System, volume 32 of DIMACS Series, 1997.
[28] F. Mattern. Virtual time and global states of distributed systems. In Parallel and Distributed Algorithms: Proc. of the International Workshop on Parallel and Distributed Algorithms, pages 215–226. Elsevier Science Publishers B.V. (North-Holland), 1989.
[29] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[30] L. I. Millett and T. Teitelbaum. Issues in slicing promela and its applications to model checking, protocol understanding, and simulation. Software Tools for Technology Transfer, 2(4):343–349, April 2000.
[31] N. Mittal and V. K. Garg. Computation slicing: Techniques and theory. In Proc. of the 15th International Symposium on Distributed Computing (DISC), pages 78–92, Lisbon, Portugal, 2001.
[32] N. Mittal and V. K. Garg. Computation slicing: Techniques and theory. In Proc. of the 15th International Symposium on Distributed Computing (DISC), pages 78–92, Lisbon, Portugal, 2001.
[33] D. Peled. All from one, one for all: On model checking using representatives. In C. Courcoubetis, editor, Computer Aided Verification: Proc. of the 5th International Conference CAV'93, pages 409–423. Springer, Berlin, Heidelberg, 1993.
[34] A. Sen. http://maple.ece.utexas.edu/~sen/.
[35] A. Sen and V. K. Garg. Automatic generation of slices for temporal logic predicate detection. Technical Report TR-PDS-2002-001, PDSL, ECE Dept. Univ. of Texas at Austin, 2002. Available at http://maple.ece.utexas.edu/.
[36] A. Sen and V. K. Garg. Detecting temporal logic predicates on the happened-before model. In Proc. of the International Parallel and Distributed Processing Symposium (IPDPS), Fort Lauderdale, Florida, 2002.
[37] K. Sen, G. Rosu, and G. Agha. Runtime safety analysis of multithreaded programs. Technical Report UIUCDCS-R-2003-2334, Univ. of Illinois at Urbana Champaign, April 2003.
[38] S. D. Stoller and Y. Liu. Efficient symbolic detection of global properties in distributed systems. In 10th Int'l. Conference on Computer-Aided Verification (CAV), volume 1447 of LNCS, pages 357–368, 1998.
[39] S. D. Stoller, L. Unnikrishnan, and Y. A. Liu. Efficient detection of global properties in distributed systems using partial-order methods. In 12th Int'l. Conference on Computer-Aided Verification (CAV), volume 1855 of LNCS, pages 264–279, 2000.
[40] A. Valmari. A stubborn attack on state explosion. In Proc. of Computer-Aided Verification (CAV '90), volume 531 of LNCS, pages 156–165, Berlin, Germany, 1991.
[41] M. Weiser. Programmers use slices when debugging. Communications of the ACM, 25(7):446–452, July 1982.

10 Appendix

10.1 Illustration of Various Concepts

10.2 Illustration of Concepts in Section 4

Example 1. Consider the computation depicted in Figure 9(a). It has three processes, namely p1, p2 and p3. The events e1, f1 and g1 are the initial events, and the events e4, f4 and g4 are the final events of the computation. The cut A = {e1, e2, e3, e4, f1, g1} is not consistent because g4 → e4 and e4 ∈ A but g4 ∉ A. On the other hand, the cut {e1, e2, f1, f2, g1} is a consistent cut. The events e1, f1 and g1 belong to the same strongly connected component, or meta-event. Processes p1, p2 and p3 host integer variables x, y and z, respectively. The predicate x  1 is a local predicate whereas the predicate x + y ≤ z is not. The consistent cut {e1, f1, g1} satisfies the predicate x + y ≤ z but the consistent cut {e1, e2, f1, f2, g1} does not.
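The consistency test used in Example 1 can be sketched in a few lines of Python. This is a minimal illustration and not POTA's implementation: the names `PROCESSES`, `MESSAGES`, `predecessors` and `is_consistent` are hypothetical, and the only cross-process dependency encoded is g4 → e4, the one stated in the text. Any other message edges of Figure 9(a) are left out.

```python
# Events of each process in local order, named as in Example 1.
PROCESSES = {
    "p1": ["e1", "e2", "e3", "e4"],
    "p2": ["f1", "f2", "f3", "f4"],
    "p3": ["g1", "g2", "g3", "g4"],
}

# Cross-process dependencies (source, destination). Assumption: only the
# edge g4 -> e4 mentioned in the text is encoded here.
MESSAGES = [("g4", "e4")]


def predecessors(event):
    """Return the immediate predecessors of an event: the sources of
    messages it receives and the previous event on its own process."""
    preds = [src for src, dst in MESSAGES if dst == event]
    for events in PROCESSES.values():
        if event in events:
            index = events.index(event)
            if index > 0:
                preds.append(events[index - 1])
    return preds


def is_consistent(cut):
    """A cut is consistent if it contains every immediate predecessor of
    each of its events (and hence, by induction, every predecessor)."""
    return all(p in cut for e in cut for p in predecessors(e))


cut_a = {"e1", "e2", "e3", "e4", "f1", "g1"}   # contains e4 but not g4
cut_b = {"e1", "e2", "f1", "f2", "g1"}
```

Under these assumptions `is_consistent(cut_a)` is false because e4 is present without g4, while `is_consistent(cut_b)` is true, matching the two cuts discussed in the example.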

10.3 Illustration of Birkhoff's Theorem

Example 2. Consider the computation shown in Figure 9(a). The (distributive) lattice spanned by its set of consistent cuts is shown in Figure 9(b). In the figure, each consistent cut is labeled with the number of events that have to be executed on each process to reach the cut. The join-irreducible elements of the lattice have been drawn with thick boundaries. (They have exactly one incoming edge.) The lattice has eight join-irreducible elements, which is the same as the number of strongly connected components of the computation. It can be verified that every consistent cut of the computation can be obtained as the join of some subset of these eight join-irreducible elements and vice versa. For instance, the consistent cut R (in Figure 9(b)) can be expressed as the join of the consistent cuts U and V.
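The correspondence described in Example 2 can be checked by brute force on a small instance. The sketch below does not encode Figure 9(a); it uses a hypothetical four-event poset `PREDS` in which every event is its own strongly connected component, and the helper names are illustrative. A cut is treated as join-irreducible when it has exactly one lower cover, which mirrors the "exactly one incoming edge" remark.

```python
from itertools import combinations

# Hypothetical computation: event -> set of immediate predecessors.
# Two processes (a, b) with two events each and one message a1 -> b2.
PREDS = {
    "a1": set(),
    "a2": {"a1"},
    "b1": set(),
    "b2": {"b1", "a1"},
}


def consistent_cuts(preds):
    """Enumerate all consistent cuts (predecessor-closed event sets)."""
    events = list(preds)
    cuts = []
    for size in range(len(events) + 1):
        for subset in combinations(events, size):
            cut = frozenset(subset)
            if all(preds[e] <= cut for e in cut):
                cuts.append(cut)
    return cuts


def join_irreducibles(cuts):
    """Return the cuts with exactly one lower cover in the lattice.
    In this lattice a lower cover is obtained by removing one event."""
    cut_set = set(cuts)
    result = []
    for cut in cuts:
        lower_covers = [cut - {e} for e in cut if (cut - {e}) in cut_set]
        if len(lower_covers) == 1:
            result.append(cut)
    return result


cuts = consistent_cuts(PREDS)
irreducibles = join_irreducibles(cuts)
```

For this poset the enumeration yields eight consistent cuts and four join-irreducible elements, one per event, and every consistent cut equals the union (join) of the join-irreducible elements it contains. Brute-force enumeration is exponential in the number of events, so it only serves to illustrate the theorem on toy inputs.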

10.4 Illustration of the Notion of Computation Slice

Example 3. Consider the lattice of consistent cuts depicted in Figure 9(b). The consistent cuts that satisfy the predicate x + y − z ≤ 1 have been shaded in the figure. Figure 9(c) depicts the smallest sublattice that contains these consistent cuts. The consistent cuts P and Q do not satisfy the predicate but have been included to complete the sublattice. The join-irreducible elements of the sublattice have been drawn with thick boundaries. There are, in total, seven join-irreducible elements, namely T, U, V, W, X, Y and Z. Figure 9(d) portrays the partial order induced on the set J = {T, U, V, W, X, Y, Z}. There is a one-to-one correspondence between the set of join-irreducible elements and the set of strongly connected components of the graph shown in Figure 9(d). It can be verified that every consistent cut in the sublattice can be expressed as the join of some subset of J and, furthermore, the join of every subset of J is a consistent cut of the sublattice.
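The completion step of Example 3, in which cuts such as P and Q are added even though they do not satisfy the predicate, amounts to closing the set of satisfying cuts under join (union) and meet (intersection). The sketch below is a naive fixed-point computation on explicit sets, intended only to illustrate the definition; it is not the slicing algorithm used by POTA, which avoids enumerating cuts. The function name and the sample cuts are hypothetical.

```python
from itertools import combinations


def smallest_sublattice(cuts):
    """Close a collection of consistent cuts under union and intersection.
    Unions and intersections of consistent cuts are again consistent, so
    the result is the smallest sublattice containing the given cuts."""
    closed = set(cuts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that new cuts can be added safely.
        for a, b in combinations(list(closed), 2):
            for candidate in (a | b, a & b):
                if candidate not in closed:
                    closed.add(candidate)
                    changed = True
    return closed


# Hypothetical input: two satisfying cuts over events a1 and b1.
satisfying = [frozenset({"a1"}), frozenset({"b1"})]
sublattice = smallest_sublattice(satisfying)
added = sublattice - set(satisfying)
```

Here `added` contains the empty cut and the cut {a1, b1}: neither was in the input, but both are needed to make the set closed, in the same way that P and Q complete the sublattice in Figure 9(c).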

[Figure 9, panels (a)–(d). Panel (a) shows processes p1, p2 and p3, hosting variables x, y and z, with events e1–e4, f1–f4 and g1–g4. Legend: trivial consistent cut; non-trivial consistent cut; join-irreducible element; consistent cut that satisfies the predicate.]

Figure 9: (a) A computation, (b) the lattice of its consistent cuts, (c) the smallest sublattice that contains all consistent cuts satisfying the predicate x + y − z ≤ 1, and (d) the poset induced on the set of join-irreducible elements of the sublattice.