Capturing Causality
Assigning logical timestamps
Distributed snapshot
Logical Time
For every minute spent in organizing, an hour is earned or a minute is lost.
Marco Aiello, Eirini Kaldeli
University of Groningen
Distributed Systems Course, 2009
Outline
1. Capturing Causality: Causal relation between events; Changing the order of events
2. Assigning logical timestamps: Lamport Timestamps; Vector Clocks
3. Distributed snapshot: Detecting a consistent global state; Cuts and consistent cuts; The Chandy-Lamport snapshot algorithm
Logical Time
No reference to global time: local physical clocks cannot be perfectly synchronised, so we cannot appeal to physical time to impose a total order on events.
However, what really interests us is an order that preserves causality, i.e. the relation between events that potentially influence each other.
⇒ Assign logical timestamps to events. The timestamps are communicated through the standard message passing between the processors and can be used to deduce the causality relations between events.
Causal relation between events
Happens-Before relation
Definition. Event φi happens-before event φj, denoted φi → φj, if one of the following holds:
1. the two events occurred at the same process and φi precedes φj;
2. φi is the event sending message m and φj is the event receiving m;
3. there exists an event φ such that φi → φ and φ → φj (transitivity).
The → relation is an irreflexive partial order.
If φi ↛ φj and φj ↛ φi, then φi and φj are concurrent: φi ∥ φj.
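The three rules of the definition can be sketched in code as a transitive closure over program order and message edges. This is only a minimal illustration: the function names, the event labels e1 to e6 and the three-process execution (two events per process, messages e2 to e3 and e4 to e6) are hypothetical and chosen for the example.

```python
from itertools import product


def happens_before(process_events, messages):
    """Return the set of pairs (a, b) such that a -> b.

    process_events: one list of events per process, in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    hb = set()
    # Rule 1: events of the same process are ordered by local precedence.
    for events in process_events:
        for idx, a in enumerate(events):
            for b in events[idx + 1:]:
                hb.add((a, b))
    # Rule 2: a send event precedes the matching receive event.
    hb.update(messages)
    # Rule 3: transitive closure (Floyd-Warshall style; the first
    # component of product() varies slowest, so k is the outer loop).
    all_events = [e for events in process_events for e in events]
    for k, a, b in product(all_events, repeat=3):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb


def hb_concurrent(a, b, hb):
    """Two distinct events are concurrent if neither happens before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Hypothetical execution: p1 = [e1, e2], p2 = [e3, e4], p3 = [e5, e6].
example_hb = happens_before(
    [["e1", "e2"], ["e3", "e4"], ["e5", "e6"]],
    [("e2", "e3"), ("e4", "e6")],
)
```

In this assumed execution, e1 → e6 follows by transitivity, while e5 and e4 are concurrent because no chain of local steps and messages connects them.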
Causal influence on a space-time diagram
Example: What is the happened-before relation between the events?
[Figure: space-time diagram of processes p1, p2, p3 with events φ1 to φ6 and messages m1, m2; the horizontal axis is physical time.]
What if we put φ5 after φ3 and before φ6?
Changing the order of events
Causal Shuffle
Definition. Given a sequence of events σ = φ1, ..., φk, a permutation π of σ is a causal shuffle of σ if:
1. the order of events occurring at each individual processor remains unchanged, i.e. for all i, 1 ≤ i ≤ n, σ|i = π|i, where |i denotes the restriction to the events occurring at pi;
2. if a message m is sent during pi's event φ in σ, then in π, φ precedes the delivery of m.
The resulting sequence π is indistinguishable from σ to the processors.
Lemma. Any total ordering of the events in σ that is consistent with the → relation is a causal shuffle of σ.
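The two conditions of the definition translate directly into a check. The sketch below is illustrative only: `is_causal_shuffle`, the event names and the two-process execution are assumptions made for the example, not part of the original material.

```python
def is_causal_shuffle(sigma, pi, proc_of, messages):
    """Check whether pi is a causal shuffle of sigma.

    sigma, pi: lists of event names.
    proc_of: dict mapping each event to the id of its processor.
    messages: list of (send_event, delivery_event) pairs.
    """
    # pi must be a permutation of sigma.
    if sorted(sigma) != sorted(pi):
        return False
    # Condition 1: the projection onto each processor is unchanged.
    for p in set(proc_of.values()):
        local_sigma = [e for e in sigma if proc_of[e] == p]
        local_pi = [e for e in pi if proc_of[e] == p]
        if local_sigma != local_pi:
            return False
    # Condition 2: every send precedes the delivery of its message in pi.
    position = {e: idx for idx, e in enumerate(pi)}
    return all(position[s] < position[r] for s, r in messages)


# Hypothetical execution: p1 runs a1 then a2, p2 runs b1 then b2,
# and the message sent at a1 is delivered at b1.
shuffle_sigma = ["a1", "b1", "a2", "b2"]
shuffle_proc = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
shuffle_msgs = [("a1", "b1")]
```

For this assumed execution, `["a1", "a2", "b1", "b2"]` is a causal shuffle, whereas `["b1", "a1", "a2", "b2"]` delivers the message before it is sent and `["a2", "a1", "b1", "b2"]` reorders the events of p1.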
Causal Shuffles of an example execution
Example
[Figure: space-time diagram of an execution on processes p1, p2, p3 with events φ1, φ2, φ3, φ4.]
Which of the following permutations are causal shuffles?
φ1, φ3, φ4, φ2: no (✗)
φ3, φ1, φ2, φ4: yes (✓)
Lamport Timestamps
Lamport Timestamps: Definition
We want to mark events so that some information about causality is captured.
⇒ Assign a Lamport timestamp LT(φ) to each event φ:
Each pi keeps a local counter LTi, which is initially set to 0.
At each event φ in pi, $LT_i = \max\{LT_i,\ \max\{LT \text{ of msgs received upon } \varphi\}\} + 1$.
When pi sends a message, it attaches the value of LTi to the message.
Note: for each pi, LTi is strictly increasing.
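The update rule can be sketched as a small class. This is a minimal illustration under stated assumptions: the class name `LamportProcess`, its `event` method and the simulated three-process execution (p1 sends from its second event to p2's first event, p2 sends from its second event to p3's second event) are hypothetical choices for the example.

```python
class LamportProcess:
    """One process p_i holding the local counter LT_i."""

    def __init__(self):
        self.lt = 0  # LT_i is initially 0

    def event(self, received=()):
        """Execute one event and return its timestamp.

        received: timestamps attached to the messages delivered at this
        event (empty for an internal or send-only event).
        """
        # LT_i = max(LT_i, max of received timestamps) + 1
        self.lt = max([self.lt, *received]) + 1
        return self.lt


# Assumed execution with three processes and two messages.
lp1, lp2, lp3 = LamportProcess(), LamportProcess(), LamportProcess()
t1 = lp1.event()
t2 = lp1.event()        # sends a message carrying t2
t3 = lp2.event([t2])    # receives that message
t4 = lp2.event()        # sends a message carrying t4
t5 = lp3.event()
t6 = lp3.event([t4])    # receives that message
lamport_ts = [t1, t2, t3, t4, t5, t6]
```

Under these assumptions the timestamps come out as 1, 2, 3, 4, 1, 5. The fifth event gets a smaller timestamp than the fourth although no message chain leads from one to the other, so a smaller timestamp alone does not establish causality.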
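Lamport Timestamps for an example execution
Example
[Figure: space-time diagram of processes p1, p2, p3 annotated with the Lamport timestamp of each event.]
p1: φ1 ⟨1⟩, φ2 ⟨2⟩
p2: φ3 ⟨3⟩, φ4 ⟨4⟩
p3: φ5 ⟨1⟩, φ6 ⟨5⟩
Note that LT(φ5) < LT(φ4) but ¬(φ5 → φ4).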
Lamport Timestamps and Happens-Before Relation
Theorem (Weak consistency). Let φ1, φ2 be two events in an execution. If φ1 → φ2 then LT(φ1) < LT(φ2).
Drawback of Lamport Timestamps: if LT(φ1) < LT(φ2), we can only tell that ¬(φ2 → φ1); we do not know whether φ1 → φ2 or φ1 ∥ φ2.
The problem is that < induces a total order over the integers while → is only a partial order, so the information about which events are not causally related is lost.
Vector Clocks
Capturing concurrency as well: vector timestamps
Choose logical timestamps from a domain that is not totally ordered ⇒ vectors over integers.
Each pi keeps a local vector VCi of size n, whose entries VCi[j] are initially set to 0.
At each event φ in pi: $VC_i[i] = VC_i[i] + 1$, and for all $j \neq i$, $VC_i[j] = \max\{VC_i[j],\ \max\{VC[j] \text{ of msgs received upon } \varphi\}\}$.
VCi is attached to every message sent by pi.
Note: VCi[j] is an "estimate" maintained by pi of VCj[j], i.e. of the number of events that have occurred in pj so far.
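The vector update rule can be sketched in the same way. The class name `VectorClockProcess`, the 0-based process indices and the simulated execution (three processes, one message from p1's second event to p2's first event and one from p2's second event to p3's second event) are assumptions made for this illustration.

```python
class VectorClockProcess:
    """One process p_i holding the local vector VC_i (0-based index)."""

    def __init__(self, index, n):
        self.i = index
        self.vc = [0] * n  # all entries are initially 0

    def event(self, received=()):
        """Execute one event and return its vector timestamp.

        received: vectors attached to the messages delivered at this event.
        """
        # Own entry: VC_i[i] = VC_i[i] + 1
        self.vc[self.i] += 1
        # Other entries: component-wise maximum with the received vectors.
        for msg_vc in received:
            for j, value in enumerate(msg_vc):
                if j != self.i:
                    self.vc[j] = max(self.vc[j], value)
        return tuple(self.vc)


# Assumed execution with three processes and two messages.
vp1 = VectorClockProcess(0, 3)
vp2 = VectorClockProcess(1, 3)
vp3 = VectorClockProcess(2, 3)
v1 = vp1.event()
v2 = vp1.event()        # sends a message carrying v2
v3 = vp2.event([v2])    # receives that message
v4 = vp2.event()        # sends a message carrying v4
v5 = vp3.event()
v6 = vp3.event([v4])    # receives that message
vector_ts = [v1, v2, v3, v4, v5, v6]
```

Under these assumptions the six events receive the vectors (1,0,0), (2,0,0), (2,1,0), (2,2,0), (0,0,1) and (2,2,2).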
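Vector Clocks in an example execution
Example
[Figure: space-time diagram of processes p1, p2, p3 annotated with the vector timestamp of each event.]
p1: φ1 ⟨1, 0, 0⟩, φ2 ⟨2, 0, 0⟩
p2: φ3 ⟨2, 1, 0⟩, φ4 ⟨2, 2, 0⟩
p3: φ5 ⟨0, 0, 1⟩, φ6 ⟨2, 2, 2⟩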
Vector clocks can indeed capture concurrency
Only pi can increase VCi[i], so pj's estimate of the number of pi's steps is less than or equal to their actual number.
Proposition. For all i, j with 1 ≤ i, j ≤ n: VCi[j] ≤ VCj[j].
Vector clocks capture concurrency, i.e. φ1 ∥ φ2 iff VC(φ1) and VC(φ2) are incomparable.
Recall that:
V1 ≤ V2 iff V1[i] ≤ V2[i] for all 1 ≤ i ≤ n.
V1 < V2 iff V1 ≤ V2 ∧ V1 ≠ V2. E.g. (2, 2, 3) < (3, 2, 4).
V1 ∥ V2 iff ¬(V1 ≤ V2) ∧ ¬(V2 ≤ V1). E.g. (3, 2, 4) ∥ (4, 1, 4).
Theorem (Strong consistency). φ1 → φ2 iff VC(φ1) < VC(φ2).
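The three vector relations recalled above can be written as short predicates. The function names `vc_leq`, `vc_lt` and `vc_concurrent` are hypothetical; the sketch assumes both vectors have the same length n.

```python
def vc_leq(v1, v2):
    """V1 <= V2 iff every component of V1 is <= the matching one of V2."""
    return all(a <= b for a, b in zip(v1, v2))


def vc_lt(v1, v2):
    """V1 < V2 iff V1 <= V2 and the vectors differ."""
    return vc_leq(v1, v2) and tuple(v1) != tuple(v2)


def vc_concurrent(v1, v2):
    """V1 || V2 iff neither V1 <= V2 nor V2 <= V1 (incomparable)."""
    return not vc_leq(v1, v2) and not vc_leq(v2, v1)
```

With these predicates, `vc_lt((2, 2, 3), (3, 2, 4))` and `vc_concurrent((3, 2, 4), (4, 1, 4))` both hold, matching the two examples given in the text. By the strong consistency theorem, `vc_lt` applied to the timestamps of two events decides whether the first happens before the second.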
Vector clocks strong consistency: proof
Proof.
(⇒) If φ1 and φ2 occur at the same pi, the claim is trivial. If φ1 at pi sends a message that is received by φ2 at pj, then VCj(φ2)[k] ≥ VCi(φ1)[k] for all k ≠ j, and VCj(φ2)[j] ≥ VCi(φ1)[j] + 1, because pj increments its own entry at φ2. The rest follows by transitivity of the < relation on vectors.
(⇐) Suppose VC(φ1) < VC(φ2) but φ1 ↛ φ2. If φ2 → φ1, then VC(φ2) < VC(φ1) by the first direction, a contradiction. If φ1 ∥ φ2, with φ1 at pi and φ2 at pj, then VCj(φ2)[i] < VCi(φ1)[i], since the only way for VCj(φ2)[i] to reach VCi(φ1)[i] would be the existence of a sequence of events φ'1, ..., φ'n such that φ1 → φ'1 → ... → φ'n → φ2. Similarly, VCi(φ1)[j] < VCj(φ2)[j]. Thus VCi(φ1) and VCj(φ2) would be incomparable, again a contradiction.
Vector Clock Size Lower Bound
The size n of vector timestamps is large; can we do better?
Theorem (Lower bound on the size of vector clocks). If VC is a function that maps each event in an execution in a system of n processors to a vector in S^k, where S is any totally ordered set (e.g.