Replication and Consistency
Distributed Systems
September 2002
Introduction to replication (1)

Replication can provide the following:

- Performance enhancement
  - e.g. several web servers can share the same DNS name and be selected in turn, to share the load
  - replication of read-only data is simple, but replication of changing data has overheads
- Fault-tolerant service
  - guarantees correct behaviour in spite of certain faults (can include timeliness)
  - if f of f + 1 servers crash, then 1 remains to supply the service
  - if f of 2f + 1 servers have Byzantine faults, the remainder can still supply a correct service
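As a minimal sketch (the function name is ours, not from the slides), the replica counts above can be computed directly:

```python
def replicas_needed(f, byzantine=False):
    """Minimum servers to tolerate f faulty ones, per the rules above:
    f + 1 for crash faults (one correct server survives),
    2f + 1 for Byzantine faults (correct servers outnumber faulty ones)."""
    return 2 * f + 1 if byzantine else f + 1
```

For example, tolerating 2 crashed servers needs 3 replicas, while tolerating 2 Byzantine servers needs 5.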
© DoCS 2002
Introduction to replication (2)

- Availability is hindered by
  - server failures: replicate data at failure-independent servers; when one fails, a client may use another. Note that caches do not help with availability (they are incomplete).
  - network partitions and disconnected operation: users of mobile computers deliberately disconnect, and then on re-connection resolve conflicts
Requirements for replicated data

What is replication transparency?

- Replication transparency
  - clients see logical objects, not several physical copies: they access one logical item and receive a single result
- Consistency
  - specified to suit the application; e.g. when a user of a diary disconnects, their local copy may become inconsistent with the others and will need to be reconciled on re-connection, but connected clients using different copies should get consistent results
System model

- Each logical object is implemented by a collection of physical copies called replicas
  - the replicas are not necessarily consistent all the time (some may have received updates not yet conveyed to the others)
- We assume an asynchronous system in which processes fail only by crashing, and generally assume no network partitions
- Replica managers (RMs)
  - an RM contains replicas on a computer and accesses them directly
  - RMs apply operations to replicas recoverably, i.e. they do not leave inconsistent results if they crash
  - objects are copied at all RMs unless we state otherwise
  - static systems are based on a fixed set of RMs
  - in a dynamic system, RMs may join or leave (e.g. when they crash)
  - an RM can be a state machine
Replication

- Improve availability
- Improve performance
  - scale
  - location
- Cost is consistency
- "Informal replication": caching
- "Formal replication": managed
Managing Concurrency

- Objects must manage concurrent invocations
- Replicas' states must be consistent
- Concurrent invocations on replicas should have the same behaviour as on a single object

[figure: clients invoking a single object, versus clients invoking replicas 1, 2 and 3]
Object Replication

- Object managed: a remote object capable of handling concurrent invocations on its own
- "System" managed: a remote object for which an object adapter is required to handle concurrent invocations
Object Replication

a) A distributed system for replication-aware distributed objects
b) A distributed system responsible for replica management
Consistency Models

- Data-centric consistency models
- Client-centric consistency models
Consistency Models (Data-Centric)

- Strict consistency
- Sequential consistency
- Linearizability
- Causal consistency
- FIFO consistency
- Weak consistency
Strict Consistency

Condition: any read of x returns the value of the most recent write of x.

- Depends on a global clock.

[figure: one execution that is strictly consistent and one that is not]

Notation:
- Wi(x)a - process i writes value a to variable x
- Ri(x)b - process i gets value b by reading x
- tsOP(x) - the time at which operation OP (read or write) was performed on x
Sequential Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
Sequential Consistency

[figure: one execution that is sequentially consistent and one that is not]
Sequential Consistency

Process P1:  x = 1; print(y, z);
Process P2:  y = 1; print(x, z);
Process P3:  z = 1; print(x, y);

Four valid executions.
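To make the example concrete, the following sketch (names ours) enumerates every global interleaving of the six statements that respects each process's program order, as sequential consistency requires, and collects the possible print signatures. It confirms the classic result that 90 interleavings exist and that a signature such as 000000 can never occur, since some write always precedes the final print:

```python
from itertools import permutations

# Program order per process: a write of 1, then a print of two variables.
PROGS = {
    "P1": [("w", "x"), ("r", ("y", "z"))],
    "P2": [("w", "y"), ("r", ("x", "z"))],
    "P3": [("w", "z"), ("r", ("x", "y"))],
}

def run(order):
    """Execute one global interleaving; return the 6-digit print signature."""
    mem = {"x": 0, "y": 0, "z": 0}
    out = []
    for pid, step in order:
        op, arg = PROGS[pid][step]
        if op == "w":
            mem[arg] = 1
        else:
            out += [str(mem[v]) for v in arg]
    return "".join(out)

ops = [(p, s) for p in PROGS for s in (0, 1)]
valid = [perm for perm in permutations(ops)
         if all(perm.index((p, 0)) < perm.index((p, 1)) for p in PROGS)]
signatures = {run(perm) for perm in valid}
```

Running P1's two statements, then P2's, then P3's, yields the signature 001011, one of the valid executions from the slide.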
Linearizable Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, the operations of each individual process appear in the order specified by its program, and, in addition, if tsOP1(x) < tsOP2(y) then operation OP1(x) precedes OP2(y) in this sequence.
Causal Consistency

Condition: if event B is caused or influenced by event A, then all processes see A before B.

- Writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines.
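Potential causality between writes is commonly tracked with vector clocks; a small sketch (helper names are ours) of the happened-before and concurrency tests:

```python
def happened_before(v1, v2):
    """v1 -> v2 iff v1 <= v2 componentwise and they differ; a write
    stamped v1 is then potentially causally related to one stamped v2
    and must be seen first by every process."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

def concurrent(v1, v2):
    """Concurrent writes may be seen in different orders at different
    machines without violating causal consistency."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)
```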
Causal Consistency

[figure: two executions (a) and (b); (a) is not causally consistent but (b) is]
FIFO Consistency

Necessary condition: writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.

- Messages are propagated in order
- No causal consistency
- Also known as Pipelined RAM (PRAM) consistency
FIFO Consistency

Process P1:  x = 1; print(y, z);
Process P2:  y = 1; print(x, z);
Process P3:  z = 1; print(x, y);

Three statement orders, as seen by the three processes:

P1's view:  x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y);   P1 prints: 00
P2's view:  x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y);   P2 prints: 10
P3's view:  y = 1; print(x, z); z = 1; print(x, y); x = 1; print(y, z);   P3 prints: 01
FIFO Consistency

Process P1:  x = 1; if (y == 0) kill(P2);
Process P2:  y = 1; if (x == 0) kill(P1);

Two concurrent processes. Under sequential consistency at least one write precedes the other process's test, so at most one process is killed; under FIFO consistency each process may see its own write before the other's, so both may be killed.
Weak Consistency

- Based on the critical-section model
- Assumes locking is done separately
- Sets of operations can be performed on the local copy within the critical section

[figure: transaction manager, scheduler, data manager]
Weak Consistency

- Uses a synchronization variable
  - accesses to synchronization variables are sequentially consistent
  - no operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere
  - no read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed
Weak Consistency

[figure: a valid and an invalid event sequence for weak consistency]
Weak Consistency

A program fragment in which some variables may be kept in registers:

int a, b, c, d, e, x, y;                 /* variables */
int *p, *q;                              /* pointers */
int f(int *p, int *q);                   /* function prototype */

a = x * x;                               /* a stored in register */
b = y * y;                               /* b as well */
c = a*a*a + b*b + a*b;                   /* used later */
d = a * a * c;                           /* used later */
p = &a;                                  /* p gets address of a */
q = &b;                                  /* q gets address of b */
e = f(p, q);                             /* function call */
Release Consistency (1)

[figure: a valid event sequence for release consistency]
Release Consistency (2)

Rules:
- Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully
- Before a release is allowed to be performed, all previous reads and writes by the process must have completed
- Accesses to synchronization variables are FIFO consistent (sequential consistency is not required)
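A toy illustration of these rules (a sketch with invented names, not a real distributed shared memory implementation): writes are buffered locally and only published to the shared store on release, while an acquire refreshes the local view with everything released so far:

```python
import threading

class ReleaseConsistentStore:
    """Single-lock sketch of release consistency. acquire() completes
    before any read/write; release() completes all buffered writes."""
    def __init__(self):
        self._shared = {}              # the "everywhere" state
        self._lock = threading.Lock()
        self._local = {}               # this process's working copy

    def acquire(self):
        self._lock.acquire()
        self._local = dict(self._shared)   # see all released writes

    def write(self, key, value):
        self._local[key] = value           # buffered until release

    def read(self, key):
        return self._local.get(key)

    def release(self):
        self._shared.update(self._local)   # publish writes, then release
        self._lock.release()
```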
Entry Consistency

Conditions:
- An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process
- Before an exclusive-mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode
- After an exclusive-mode access to a synchronization variable has been performed, any other process's next nonexclusive-mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner
Entry Consistency

[figure: a valid event sequence for entry consistency]
Summary

a) Consistency models not using synchronization operations:

Consistency      Description
Strict           Absolute time ordering of all shared accesses matters
Linearizability  All processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (nonunique) global timestamp
Sequential       All processes see all shared accesses in the same order; accesses are not ordered in time
Causal           All processes see causally-related shared accesses in the same order
FIFO             All processes see writes from each other in the order they were issued; writes from different processes may not always be seen in that order

b) Models with synchronization operations:

Consistency      Description
Weak             Shared data can be counted on to be consistent only after a synchronization is done
Release          Shared data are made consistent when a critical region is exited
Entry            Shared data pertaining to a critical region are made consistent when a critical region is entered
Client-Centric Consistency

Goal: show how we can perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what should be maintained by servers.

Background: most large-scale distributed systems (e.g. databases) apply replication for scalability, but can support only weak consistency:

- DNS: updates are propagated slowly, and inserts may not be immediately visible
- NEWS: articles and reactions are pushed and pulled throughout the Internet, such that reactions can be seen before the postings they refer to
- WWW: caches all over the place, but there need be no guarantee that you are reading the most recent version of a page
Eventual Consistency

[figure illustrating eventual consistency]
Client-Centric Models (1)

- Monotonic reads: if a process reads the value of a data item x, any successive read operation on x by that process will always return the same value or a more recent value.
- Monotonic writes: a write operation by a process on a data item x is completed before any successive write operation on x by the same process.
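One way a client can enforce monotonic reads for itself is to remember the version it last read and refuse any replica whose copy is older. A client-side sketch (class and method names are ours, assuming versions are tracked as vectors of per-server write counts):

```python
def dominates(v1, v2):
    """True if version vector v1 reflects at least every write in v2."""
    return all(v1.get(k, 0) >= n for k, n in v2.items())

class Session:
    """Tracks the last version read, so every later read returns the
    same value or a more recent one (the monotonic-reads guarantee)."""
    def __init__(self):
        self.read_set = {}

    def read(self, server_version, value):
        if not dominates(server_version, self.read_set):
            raise RuntimeError("monotonic reads violated: stale replica")
        self.read_set = dict(server_version)
        return value
```

A read served at version {"A": 2} succeeds; a later read from a replica still at {"A": 1} is rejected.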
Monotonic Reads

- Example: automatically reading your personal calendar updates from different servers. Monotonic reads guarantees that the user sees all updates, no matter from which server the automatic reading takes place.
- Example: reading (not modifying) incoming mail while you are on the move. Each time you connect to a different e-mail server, that server fetches (at least) all the updates from the server you previously visited.
Monotonic Reads

The read operations performed by a single process P at two different local copies of the same data store:
a) a monotonic-read consistent data store
b) a data store that does not provide monotonic reads

Notation: WS(xi) is the set of write operations (at Li) that lead to version xi of x (at time t); WS(xi,xj) indicates that it is known that WS(xi) is part of WS(xj).
Monotonic Writes

- Example: updating a program at server S2, and ensuring that all components on which the compilation and linking depend are also placed at S2.
- Example: maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).
Monotonic Writes

The write operations performed by a single process P at two different local copies of the same data store:
a) a monotonic-write consistent data store
b) a data store that does not provide monotonic-write consistency
Client-Centric Models (2)

- Read your writes: the effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
- Writes follow reads: a write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read.
Read Your Writes

- Example: updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.
Read Your Writes

a) A data store that provides read-your-writes consistency
b) A data store that does not
Writes Follow Reads

- Example: you see reactions to posted articles only if you have the original posting (a read "pulls in" the corresponding write operation).
Writes Follow Reads

a) A writes-follow-reads consistent data store
b) A data store that does not provide writes-follow-reads consistency
Implementation

- Unique IDs on writes
  - each write is uniquely identified
  - the identification includes the replica where the write was initiated
  - each process keeps read and write sets of IDs
- Timestamps
  - V(i)[j] equals the timestamp of the latest write initiated on server j that has been propagated to server i
  - leads to an efficient representation of the read and write sets
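With vector timestamps, a read or write set collapses to one vector of per-server counters, and both merging and the "can this server handle me?" test become cheap componentwise comparisons. A sketch (function names are ours):

```python
def merge(u, v):
    """Componentwise max: the smallest vector covering both sets of writes."""
    return {j: max(u.get(j, 0), v.get(j, 0)) for j in set(u) | set(v)}

def can_serve(server_vec, client_set):
    """Server i may handle the request iff V(i)[j] >= client_set[j] for
    every server j, i.e. it has seen every write the client depends on."""
    return all(server_vec.get(j, 0) >= n for j, n in client_set.items())
```

After reading at {"A": 2} and writing at {"B": 1}, a client's combined set is merge of the two, and only servers covering both counters qualify.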
Distribution Protocols

- Placement and nature of replicas
- Distributing updates
- Achieving consistency
Replicas

- Permanent
  - load sharing
  - mirroring
- Server-initiated
  - push caches
  - temporary hosts (NOMAD)
- Client-initiated
  - caches
Replica Placement

[figure: the logical organization of different kinds of copies of a data store into three concentric rings]
Permanent Replicas

- The files that constitute a site are replicated across a limited number of servers on a single LAN
- A Web site is copied to a limited number of servers, called mirrored sites, which are geographically spread across the Internet
Server-Initiated Replicas

Counting access requests from different clients:

1. Keep track of access counts per file, aggregated by considering the server closest to the requesting clients
2. If the number of accesses drops below a deletion threshold D, drop the file
3. If the number of accesses exceeds a replication threshold R, replicate the file
4. If the number of accesses lies between D and R, the file may be migrated
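The decision rule above can be sketched as a single function (the name and return labels are ours):

```python
def replica_action(access_count, drop_threshold, repl_threshold):
    """Server-initiated replication rule: drop a file when demand falls
    below D, replicate when it exceeds R, and consider migrating the
    file toward the requesting clients in between."""
    assert drop_threshold < repl_threshold
    if access_count < drop_threshold:
        return "drop"
    if access_count > repl_threshold:
        return "replicate"
    return "migrate"
```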
Client-Initiated Replicas

- Client caches
  - improve access time
  - located at the client machine
  - consistency left to the client
  - cache hits
  - caches can be shared between clients
Update Propagation

- State or operation?
  - notification of update
  - new copy of the data
  - copy of the operation
  - trade-off: bandwidth versus processing
- Push or pull?
  - push by server: the server must know the replicas; the client is immediately updated
  - pull by client: the client must poll, or the response is delayed until the item is requested
- Leases
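A lease combines the two approaches: the server pushes updates only while a client's lease is unexpired, after which the client falls back to pulling. A minimal sketch (class name ours; the clock is injected so the behaviour is testable):

```python
class Lease:
    """While valid(), the server pushes updates to the holder; once it
    expires, the holder must pull (and may renew the lease then)."""
    def __init__(self, duration, clock):
        self.clock = clock                  # callable returning the time
        self.expires = clock() + duration

    def valid(self):
        return self.clock() < self.expires
```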
Pull versus Push Protocols

Issue                    Push-based                                Pull-based
State of server          List of client replicas and caches        None
Messages sent            Update (and possibly fetch update later)  Poll and update
Response time at client  Immediate (or fetch-update time)          Fetch-update time

A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.
Epidemic Protocols

- Replica states:
  - infective: holds an update it is willing to spread
  - susceptible: has not yet received the update
  - removed: holds the update but no longer spreads it
- An update is pushed
  - may not infect all replicas
  - augment with a pull
- If a push reaches an already-infected replica, stop spreading with probability 1/k
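A toy simulation of the exchange step (function name ours; this sketches anti-entropy, where paired replicas end up identical, rather than the full infective/removed protocol):

```python
import random

def anti_entropy_rounds(n_replicas, seed=0):
    """Each round, every replica pairs with a random other replica and
    they exchange state differences, so an update held by either ends
    up at both. Returns the rounds until all replicas hold the update."""
    rng = random.Random(seed)
    updated = {0}                          # replica 0 starts infective
    rounds = 0
    while len(updated) < n_replicas:
        rounds += 1
        for i in range(n_replicas):
            j = rng.randrange(n_replicas - 1)
            j = j if j < i else j + 1      # random partner other than i
            if i in updated or j in updated:
                updated |= {i, j}          # exchange equalizes both states
        if rounds > 10 * n_replicas:       # safety cap for the sketch
            break
    return rounds
```

Propagation is lazy but eventually reaches every replica, and with random pairing it typically finishes in a number of rounds logarithmic in the replica count.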
Principles

Basic idea, assuming there are no write-write conflicts:
- update operations are initially performed at one or only a few replicas
- a replica passes its updated state to a limited number of neighbours
- update propagation is lazy, i.e. not immediate
- eventually, each update should reach every replica
Principles

- Anti-entropy: each replica regularly chooses another replica at random, and exchanges state differences, leading to identical states at both afterwards
- Gossiping: a replica which has just been updated (i.e. has been contaminated) tells a number of other replicas about its update, contaminating them as well
Consistency Protocols

- Primary-based
  - remote write
  - local write
- Replicated-write
  - active replication
  - quorum-based voting
  - cache-coherence
Remote-Write Protocols (1)

[figure: primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded]
Remote-Write Protocols (2)

[figure: the principle of the primary-backup protocol]
Local-Write Protocols (1)

[figure: primary-based local-write protocol in which a single copy is migrated between processes]
Local-Write Protocols (2)

[figure: a local-write protocol]
Active Replication (1)

The problem of replicated invocations:
- the operation is forwarded to each replica
- operations must be sequenced
- use a multicast or a sequencer
Active Replication (2)

a) Forwarding an invocation request from a replicated object
b) Returning a reply to a replicated object
Quorum-Based Protocols

- Valid read/write quorums must satisfy
  - NR + NW > N (every read quorum overlaps every write quorum)
  - NW > N/2 (any two write quorums overlap)
- Choices violating either constraint are invalid
- Read-one/write-all is the special case NR = 1, NW = N
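The two constraints are a one-line check (function name ours):

```python
def valid_quorum(n, n_read, n_write):
    """Quorum constraints: a read quorum must overlap every write quorum
    (NR + NW > N), and two write quorums must overlap (NW > N/2)."""
    return n_read + n_write > n and 2 * n_write > n
```

With N = 12: NR = 3, NW = 10 is valid; NR = 7, NW = 6 is invalid, because two write quorums of 6 need not overlap; NR = 1, NW = 12 is read-one/write-all.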
Orca

A simplified stack object in Orca, with internal data and two operations:

OBJECT IMPLEMENTATION stack;
  top: integer;                              # variable indicating the top
  stack: ARRAY[integer 0..N-1] OF integer;   # storage for the stack

  OPERATION push(item: integer);             # function returning nothing
  BEGIN
    GUARD top < N DO
      stack[top] := item;                    # push item onto the stack
      top := top + 1;                        # increment the stack pointer
    OD;
  END;

  OPERATION pop(): integer;                  # function returning an integer
  BEGIN
    GUARD top > 0 DO                         # suspend if the stack is empty
      top := top - 1;                        # decrement the stack pointer
      RETURN stack[top];                     # return the top item
    OD;
  END;
BEGIN
  top := 0;                                  # initialization
END;
Management of Shared Objects in Orca

[figure: four cases of a process P performing an operation on an object O in Orca]
Causally-Consistent Lazy Replication

[figure: the general organization of a distributed data store; clients are assumed to also handle consistency-related communication]
Processing Read Operations

[figure: performing a read operation at a local copy]
Processing Write Operations

[figure: performing a write operation at a local copy]