Replication and Consistency


Distributed Systems, September 2002

Introduction to replication (1)

Replication can provide the following:

• Performance enhancement
  o e.g. several web servers can share the same DNS name and be selected in turn, to share the load
  o replication of read-only data is simple, but replication of changing data has overheads

• Fault-tolerant service
  o guarantees correct behaviour in spite of certain faults (can include timeliness)
  o if f of f + 1 servers crash, then 1 remains to supply the service
  o if f of 2f + 1 servers have byzantine faults, then the remaining servers can still supply a correct service
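
These bounds are easy to restate as code. A minimal sketch (plain Python; the function name and the example values are ours, not from the slides):

def replicas_needed(f, byzantine=False):
    """Minimum number of servers needed to tolerate f faulty ones.

    Crash faults: f + 1 servers leave at least one correct server running.
    Byzantine faults: 2f + 1 servers let the correct majority outvote the faulty ones.
    """
    return 2 * f + 1 if byzantine else f + 1

print(replicas_needed(2))                  # 3: two crashes still leave one server
print(replicas_needed(2, byzantine=True))  # 5: three correct servers outvote two byzantine ones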


Introduction to replication (2)

• Availability is hindered by
  o server failures
    - replicate data at failure-independent servers; when one fails, a client may use another. Note that caches do not help with availability (they are incomplete)
  o network partitions and disconnected operation
    - users of mobile computers deliberately disconnect, and then resolve conflicts on re-connection


Requirements for replicated data

What is replication transparency?

• Replication transparency
  o clients see logical objects (not several physical copies)
    - they access one logical item and receive a single result

• Consistency
  o specified to suit the application
    - e.g. when a user of a diary disconnects, their local copy may become inconsistent with the others and will need to be reconciled on re-connection; but connected clients using different copies should get consistent results


System model

• Each logical object is implemented by a collection of physical copies called replicas
  o the replicas are not necessarily consistent all the time (some may have received updates not yet conveyed to the others)

• We assume an asynchronous system where processes fail only by crashing, and we generally assume no network partitions

• Replica managers (RMs)
  o an RM holds the replicas on one computer and accesses them directly
  o RMs apply operations to replicas recoverably
    - i.e. they do not leave inconsistent results if they crash
  o objects are copied at all RMs unless we state otherwise
  o static systems are based on a fixed set of RMs
  o in a dynamic system, RMs may join or leave (e.g. when they crash)
  o an RM can be a state machine


Replication

• Improve availability
• Improve performance
  o scale
  o location
• Cost is consistency
• “Informal replication”: caching
• “Formal replication”: managed


Managing Concurrency

• Objects must manage concurrent invocations
• Replica states must be consistent
• Concurrent invocations on replicas should have the same behaviour as on a single object

[Figure: several clients invoking a single shared object, compared with clients invoking operations on replicas of that object]


Object Replication

• Object managed: a remote object capable of handling concurrent invocations on its own.
• “System” managed: a remote object for which an object adapter is required to handle concurrent invocations.


Object Replication

a) A distributed system for replication-aware distributed objects.
b) A distributed system responsible for replica management.


Consistency Models

• Data-centric consistency models
• Client-centric consistency models


Consistency Models (Data-Centric)

• Strict consistency
• Sequential consistency
• Linearizability
• Causal consistency
• FIFO consistency
• Weak consistency


Strict Consistency

Condition: any read of x returns the value of the most recent write of x.

• Depends on a global clock.

[Figure: (a) a strictly consistent data store; (b) a data store that is not strictly consistent]

Notation:
  Wi(x)a  - process i writes value a to variable x
  Ri(x)b  - process i reads x and obtains value b
  tsOP(x) - the time at which operation OP (a read or a write) was performed on x


Sequential Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.


Sequential Consistency

[Figure: (a) a sequentially consistent data store; (b) a data store that is not sequentially consistent]


Sequential Consistency

Three processes, each executing two statements in program order:

  Process P1: x = 1; print(y, z);
  Process P2: y = 1; print(x, z);
  Process P3: z = 1; print(x, y);

Four valid executions (see the sketch below for how the valid interleavings and their outputs can be enumerated).
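
The set of sequentially consistent executions of this little program can be enumerated by brute force. The following sketch (our own Python, not part of the original slides) generates every interleaving that respects program order, runs it against a single shared memory, and collects the distinct output signatures (P1's two digits, then P2's, then P3's):

from itertools import permutations

# Program order for each process: first a write, then a print of two variables.
PROGRAMS = {
    "P1": [("write", "x"), ("print", ("y", "z"))],
    "P2": [("write", "y"), ("print", ("x", "z"))],
    "P3": [("write", "z"), ("print", ("x", "y"))],
}

def run(order):
    """Execute one global interleaving; return P1's, P2's and P3's output."""
    mem = {"x": 0, "y": 0, "z": 0}
    printed = {}
    for pid, (kind, arg) in order:
        if kind == "write":
            mem[arg] = 1
        else:
            printed[pid] = f"{mem[arg[0]]}{mem[arg[1]]}"
    return printed["P1"] + printed["P2"] + printed["P3"]

def sequential_signatures():
    """Output signatures of all interleavings that respect program order."""
    slots = [pid for pid in PROGRAMS for _ in PROGRAMS[pid]]   # P1,P1,P2,P2,P3,P3
    signatures = set()
    for perm in set(permutations(slots)):
        position = {pid: 0 for pid in PROGRAMS}
        order = []
        for pid in perm:
            order.append((pid, PROGRAMS[pid][position[pid]]))
            position[pid] += 1
        signatures.add(run(order))
    return signatures

print(sorted(sequential_signatures()))   # e.g. '001011' is possible, '000000' is not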


Linearizable Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, the operations of each individual process appear in this sequence in the order specified by its program, and, in addition, if tsOP1(x) < tsOP2(y), then operation OP1(x) precedes OP2(y) in this sequence.


Causal Consistency

Condition: if event B is caused or influenced by event A, then all processes see A before B.

• Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.


Causal Consistency

[Figure: two event sequences, (a) and (b)]

(a) is not causally consistent, but (b) is.


FIFO Consistency

Necessary condition: writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.

• Messages are propagated in order
• Causal consistency is not required
• Also known as Pipelined RAM (PRAM) consistency


FIFO Consistency

The same three processes:

  Process P1: x = 1; print(y, z);
  Process P2: y = 1; print(x, z);
  Process P3: z = 1; print(x, y);

Under FIFO consistency, different processes may see the statements in different orders. Three possible statement orders, with the printed output shown in each case:

  x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y);    Prints: 00
  x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y);    Prints: 10
  y = 1; print(x, z); z = 1; print(x, y); x = 1; print(y, z);    Prints: 01


FIFO Consistency

Two concurrent processes:

  Process P1: x = 1; if (y == 0) kill (P2);
  Process P2: y = 1; if (x == 0) kill (P1);

Under sequential consistency at most one process can be killed; under FIFO consistency both may be killed, since each process may see its own write before it sees the other's.


Weak Consistency

• Based on the critical-section model
• Assumes locking is done separately
• Sets of operations can be performed on the local copy within the critical section

[Figure: Transaction Manager / Scheduler / Data Manager]


Weak Consistency

• Uses a synchronization variable
  o Accesses to synchronization variables are sequentially consistent
  o No operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere
  o No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed


Weak Consistency

[Figure: a valid and an invalid event sequence for weak consistency]


Weak Consistency

Properties:

• Accesses to synchronization variables associated with a data store are sequentially consistent
• No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere
• No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed


Weak Consistency

int a, b, c, d, e, x, y;         /* variables */
int *p, *q;                      /* pointers */
int f(int *p, int *q);           /* function prototype */

a = x * x;                       /* a stored in register */
b = y * y;                       /* b as well */
c = a * a * a + b * b + a * b;   /* used later */
d = a * a * c;                   /* used later */
p = &a;                          /* p gets address of a */
q = &b;                          /* q gets address of b */
e = f(p, q);                     /* function call */

A program fragment in which some variables may be kept in registers.


Release Consistency (1)

A valid event sequence for release consistency.


Release Consistency (2)

Rules:

• Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully
• Before a release is allowed to be performed, all previous reads and writes by the process must have completed
• Accesses to synchronization variables are FIFO consistent (sequential consistency is not required)
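
To make the acquire/release discipline concrete, here is a small toy model in Python. It is our own illustration, not the protocol from the slides: replicas are plain dictionaries, acquire pulls the current state from a designated home copy, writes are buffered, and release pushes the buffered writes to every replica.

class ReleaseConsistentStore:
    """Toy model: shared data are only guaranteed consistent around acquire/release."""

    def __init__(self, replicas):
        self.replicas = replicas        # one dict per site; replicas[0] plays the "home" copy
        self.pending = {}               # writes buffered since the last acquire

    def acquire(self, local):
        # Before touching shared data, the acquire must complete:
        # here it simply pulls the latest values into the local replica.
        local.update(self.replicas[0])

    def write(self, local, key, value):
        local[key] = value              # visible locally right away
        self.pending[key] = value       # but only buffered for the others

    def release(self, local):
        # All writes of the critical section are pushed out before the
        # release completes, making them visible at every replica.
        for replica in self.replicas:
            replica.update(self.pending)
        self.pending.clear()

sites = [{"x": 0}, {"x": 0}]
store = ReleaseConsistentStore(sites)
store.acquire(sites[0])
store.write(sites[0], "x", 1)
store.release(sites[0])
print(sites[1]["x"])                    # 1: the other site sees the update after the release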


Entry Consistency

Conditions:

• An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process
• Before an exclusive-mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in non-exclusive mode
• After an exclusive-mode access to a synchronization variable has been performed, any other process's next non-exclusive-mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner


Entry Consistency

A valid event sequence for entry consistency.


Summary

a) Consistency models not using synchronization operations:

  Strict: absolute time ordering of all shared accesses matters.
  Linearizability: all processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (non-unique) global timestamp.
  Sequential: all processes see all shared accesses in the same order; accesses are not ordered in time.
  Causal: all processes see causally-related shared accesses in the same order.
  FIFO: all processes see writes from each other in the order they were used; writes from different processes may not always be seen in that order.

b) Models with synchronization operations:

  Weak: shared data can be counted on to be consistent only after a synchronization is done.
  Release: shared data are made consistent when a critical region is exited.
  Entry: shared data pertaining to a critical region are made consistent when a critical region is entered.


Client-Centric Consistency

Goal: show how we can perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what has to be maintained by servers.

Background: most large-scale distributed systems (e.g. databases) apply replication for scalability, but can support only weak consistency:

  o DNS: updates are propagated slowly, and inserts may not be immediately visible
  o NEWS: articles and reactions are pushed and pulled throughout the Internet, so that reactions can be seen before the postings they refer to
  o WWW: caches are everywhere, but there need be no guarantee that you are reading the most recent version of a page


Eventual Consistency


Client-Centric Models (1)

• Monotonic reads
  o If a process reads the value of a data item x, any successive read on x by that process will always return the same value of x or a more recent value (a sketch of this guarantee follows below).

• Monotonic writes
  o A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
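
As a sketch of what the monotonic-read guarantee means for a client-side library (our own Python illustration; the class and method names are not from the slides), the client simply remembers the highest version it has read per item and refuses to accept an older one from a stale replica:

class Replica:
    """A replica that stores (version, value) pairs per data item."""
    def __init__(self):
        self.data = {}                          # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write(self, key, version, value):
        self.data[key] = (version, value)

class MonotonicReadClient:
    """Never returns an older version of an item than one already seen."""
    def __init__(self):
        self.last_seen = {}                     # key -> highest version read so far

    def read(self, replica, key):
        version, value = replica.read(key)
        if version < self.last_seen.get(key, 0):
            raise RuntimeError("stale replica: retry at another copy or wait")
        self.last_seen[key] = version
        return value

up_to_date, stale = Replica(), Replica()
up_to_date.write("x", 2, "new")
stale.write("x", 1, "old")
client = MonotonicReadClient()
print(client.read(up_to_date, "x"))             # "new" (version 2 is now recorded)
# client.read(stale, "x") would now raise: version 1 < 2 violates monotonic reads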


Monotonic Reads

• Example: automatically reading your personal calendar updates from different servers. Monotonic reads guarantee that the user sees all updates, no matter from which server the automatic reading takes place.
• Example: reading (not modifying) incoming mail while you are on the move. Each time you connect to a different e-mail server, that server fetches (at least) all the updates from the server you previously visited.


Monotonic Reads

The read operations performed by a single process P at two different local copies of the same data store:

a) A monotonic-read consistent data store.
b) A data store that does not provide monotonic reads.

Notation: WS(xi) is the set of write operations (at Li) that lead to version xi of x (at time t); WS(xi,xj) indicates that it is known that WS(xi) is part of WS(xj).


Monotonic Writes

• Example: updating a program at server S2, and ensuring that all components on which the compilation and linking depend are also placed at S2.
• Example: maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).


Monotonic Writes

The write operations performed by a single process P at two different local copies of the same data store:

a) A monotonic-write consistent data store.
b) A data store that does not provide monotonic-write consistency.


Client-Centric Models (2)

• Read your writes
  o The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

• Writes follow reads
  o A write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read.


Read Your Writes

• Example: updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.


Read Your Writes

a) A data store that provides read-your-writes consistency.
b) A data store that does not.


Writes Follow Reads

• Example: see reactions to posted articles only if you have the original posting (a read “pulls in” the corresponding write operation).


Writes Follow Reads

a) A writes-follow-reads consistent data store.
b) A data store that does not provide writes-follow-reads consistency.


Implementation

• Unique IDs on writes
  o Each write is uniquely identified
  o The identification includes the replica
  o Each process keeps read and write sets of these IDs

• Timestamps
  o V(i)[j] is the timestamp of the latest write initiated at server j that has been propagated to server i
  o This leads to an efficient representation of read and write sets (see the sketch below)
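
A minimal sketch of that vector-timestamp representation (our own Python; the server names and the dominates() helper are illustrative, not from the slides): the client's read set is summarized as one vector of per-server write counts, and a replica may serve the session only if its own vector dominates the client's.

def dominates(server_vec, client_vec):
    """True if the server has already seen every write the client has seen."""
    return all(server_vec.get(srv, 0) >= ts for srv, ts in client_vec.items())

# Read set of a client session, summarized as "latest write seen per origin server".
client_reads = {"S1": 3, "S2": 1}

replica_a = {"S1": 3, "S2": 2}   # has propagated everything the client has read
replica_b = {"S1": 2, "S2": 5}   # still missing write number 3 from S1

print(dominates(replica_a, client_reads))   # True:  safe to serve this client's reads
print(dominates(replica_b, client_reads))   # False: serving here could violate monotonic reads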


Distribution Protocols

• Placement and nature of replicas
• Distributing updates
• Achieving consistency


Replicas

• Permanent
  o Load sharing
  o Mirroring

• Server-initiated
  o Push caches
  o Temporary hosts (NOMAD)

• Client-initiated
  o Caches


Replica Placement

The logical organization of different kinds of copies of a data store into three concentric rings.


Permanent Replicas

• The files that constitute a site are replicated across a limited number of servers on a single LAN
• A Web site is copied to a limited number of servers, called mirrored sites, which are geographically spread across the Internet


Server-Initiated Replicas

Counting access requests from different clients:

1. Keep track of access counts per file, aggregated by considering the server closest to the requesting clients
2. If the number of accesses drops below threshold D, drop the file
3. If the number of accesses exceeds threshold R, replicate the file
4. If the number of accesses lies between D and R, the file may be migrated
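
Expressed as code, the rule above is just a pair of threshold comparisons. A minimal sketch (our own Python; the function name and the example thresholds are illustrative):

def placement_decision(access_count, drop_threshold, replication_threshold):
    """Decide what to do with this server's copy of a file (thresholds D < R)."""
    assert drop_threshold < replication_threshold
    if access_count < drop_threshold:
        return "drop"        # too few requests: remove the local copy
    if access_count > replication_threshold:
        return "replicate"   # hot file: create an extra copy nearer the clients
    return "migrate"         # in between: consider moving the copy instead

print(placement_decision(3, drop_threshold=10, replication_threshold=100))    # drop
print(placement_decision(250, drop_threshold=10, replication_threshold=100))  # replicate
print(placement_decision(50, drop_threshold=10, replication_threshold=100))   # migrate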


Client-Initiated Replicas

• Client caches
  o Improve access time
  o Located at the client machine
  o Consistency left to the client
  o Cache hit
  o Caches can be shared between clients


Update Propagation

• State or operation? Propagate:
  o A notification of the update
  o A new copy of the data
  o A copy of the operation
• Trades bandwidth against processing

• Push or pull?
• Push by the server
  o The server must know the replicas
  o The client is updated immediately
• Pull by the client
  o The client must poll, or
  o The response is delayed when an item is requested
• Leases (a sketch of a lease-based pull cache follows below)
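
One way to picture the lease idea, offered as our own illustration rather than anything defined in the slides: the client answers from its cache while a lease (modelled here as a simple time-to-live) is valid, and falls back to pulling from the server once it expires.

import time

class LeasedCache:
    """Pull-based client cache whose entries are valid only while a lease holds."""

    def __init__(self, fetch_from_server, lease_seconds=5.0):
        self.fetch_from_server = fetch_from_server   # callable: key -> value
        self.lease_seconds = lease_seconds
        self.cache = {}                              # key -> (lease_expires_at, value)

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[0] > time.time():
            return entry[1]                          # lease still valid: no server traffic
        value = self.fetch_from_server(key)          # lease expired: pull (poll the server)
        self.cache[key] = (time.time() + self.lease_seconds, value)
        return value

cache = LeasedCache(lambda key: f"value-of-{key}", lease_seconds=1.0)
print(cache.get("x"))   # first call pulls from the "server"
print(cache.get("x"))   # second call is answered locally while the lease lasts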


Pull versus Push Protocols

  Issue                    | Push-based                                | Pull-based
  State of server          | List of client replicas and caches        | None
  Messages sent            | Update (and possibly fetch update later)  | Poll and update
  Response time at client  | Immediate (or fetch-update time)          | Fetch-update time

A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.


Epidemic Protocols

• Replicas are:
  o Infective
  o Susceptible
  o Removed

• The update is pushed
  o It may not infect all replicas
  o Augment with a pull

• If a push reaches an already-infected replica, stop pushing with probability 1/k


Principles

Basic idea: assume there are no write-write conflicts:

• Update operations are initially performed at one or only a few replicas
• A replica passes its updated state to a limited number of neighbours
• Update propagation is lazy, i.e. not immediate
• Eventually, each update should reach every replica


Principles

• Anti-entropy: each replica regularly chooses another replica at random and exchanges state differences, leading to identical states at both afterwards.
• Gossiping: a replica that has just been updated (i.e. has been "contaminated") tells a number of other replicas about its update, contaminating them as well.
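
A toy simulation of the anti-entropy variant (our own Python, only meant to illustrate the idea): each replica's state is modelled as a set of update IDs, a round pairs every replica with a random partner and both take the union of their sets, and the loop counts how many rounds it takes for one update to reach everybody.

import random

def anti_entropy_round(replicas):
    """Each replica exchanges state differences with one random partner."""
    for i in range(len(replicas)):
        j = random.randrange(len(replicas))
        if i != j:
            merged = replicas[i] | replicas[j]    # push-pull: both end up identical
            replicas[i] = merged
            replicas[j] = set(merged)

replicas = [set() for _ in range(8)]
replicas[0].add("update-1")                       # one replica starts out "infective"

rounds = 0
while any("update-1" not in r for r in replicas): # run until every replica holds the update
    anti_entropy_round(replicas)
    rounds += 1
print("update reached all replicas after", rounds, "rounds")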


Consistency Protocols

• Primary-based
  o Remote write
  o Local write

• Replicated-write
  o Active replication
  o Quorum-based voting
  o Cache coherence


Remote-Write Protocols (1)

Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.


Remote-Write Protocols (2)

The principle of the primary-backup protocol.
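
A bare-bones sketch of the primary-backup idea (our own Python; the class names are not from the slides): writes go to the primary, which updates its own copy and forwards the update to every backup before acknowledging, so a subsequent read at any copy sees the new value.

class Backup:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value          # install an update forwarded by the primary

    def read(self, key):
        return self.store.get(key)

class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, value):
        self.apply(key, value)           # update the primary's own copy
        for backup in self.backups:      # forward the update to all backups
            backup.apply(key, value)
        return "ack"                     # acknowledge only after propagation

backups = [Backup(), Backup()]
primary = Primary(backups)
primary.write("x", 42)
print(backups[0].read("x"))              # 42: any copy can now serve the read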


Local-Write Protocols (1)

Primary-based local-write protocol in which a single copy is migrated between processes.


Local-Write Protocols (2)


Active Replication (1)

The problem of replicated invocations:

• The operation is forwarded to each replica
• Operations must be sequenced
• Use multicast or a sequencer (a sequencer sketch follows below)
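
The sequencer option can be pictured with a few lines of Python (ours, purely illustrative): a central sequencer hands out global sequence numbers, every replica applies the operations in that order, and because the replicas are deterministic they all end up in the same state.

from itertools import count

class Sequencer:
    """Assigns a globally unique, increasing sequence number to each operation."""
    def __init__(self):
        self._numbers = count()

    def assign(self, operation):
        return (next(self._numbers), operation)

class StateMachineReplica:
    """Applies the logged operations in sequence-number order (deterministic replay)."""
    def __init__(self):
        self.log = []
        self.value = 0

    def deliver(self, sequenced_op):
        self.log.append(sequenced_op)
        self.log.sort()                      # order by sequence number
        self.value = 0
        for _, operation in self.log:
            self.value = operation(self.value)

sequencer = Sequencer()
replicas = [StateMachineReplica(), StateMachineReplica()]
for operation in (lambda v: v + 5, lambda v: v * 2):
    sequenced = sequencer.assign(operation)
    for replica in replicas:                 # "multicast" the sequenced operation
        replica.deliver(sequenced)
print([r.value for r in replicas])           # [10, 10]: identical states everywhere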


Active Replication (2)

a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.


Quorum-Based Protocols

• Valid read/write quorums must satisfy:
  o NR + NW > N
  o NW > N/2

• Otherwise the quorums are invalid (a read quorum can miss the latest write, or two write quorums may fail to overlap)
• Read-one, write-all is the special case NR = 1, NW = N
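
The two constraints are easy to check in code. A minimal sketch (our own Python; N, NR and NW as used above, example numbers chosen arbitrarily):

def valid_quorum(n, nr, nw):
    """Read and write quorums must overlap, and any two write quorums must overlap."""
    return nr + nw > n and nw > n / 2

print(valid_quorum(12, nr=3, nw=10))   # True:  every read quorum sees the latest write
print(valid_quorum(12, nr=3, nw=8))    # False: NR + NW = 11 <= N, a read can miss a write
print(valid_quorum(12, nr=1, nw=12))   # True:  read-one, write-all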


Orca

OBJECT IMPLEMENTATION stack;
  top: integer;                              # variable indicating the top
  stack: ARRAY[integer 0..N-1] OF integer;   # storage for the stack

  OPERATION push(item: integer);             # function returning nothing
  BEGIN
    GUARD top < N DO
      stack[top] := item;                    # push item onto the stack
      top := top + 1;                        # increment the stack pointer
    OD;
  END;

  OPERATION pop(): integer;                  # function returning an integer
  BEGIN
    GUARD top > 0 DO                         # suspend if the stack is empty
      top := top - 1;                        # decrement the stack pointer
      RETURN stack[top];                     # return the top item
    OD;
  END;

BEGIN
  top := 0;                                  # initialization
END;

A simplified stack object in Orca, with internal data and two operations.


Management of Shared Objects in Orca

Four cases of a process P performing an operation on an object O in Orca.


Causally-Consistent Lazy Replication

The general organization of a distributed data store. Clients are assumed to also handle consistency-related communication.


Processing Read Operations

Performing a read operation at a local copy.


Processing Write Operations

Performing a write operation at a local copy.
