Hardware Transactional Memory on Beehive

Author: Basil Lawson

Andrew Birrell, Tom Rodeheffer, Chuck Thacker
Microsoft Research, Silicon Valley

Basics: Concurrent Execution

Debit-credit with a race

Core #2:
    var r1 = Read(x);
    var r2 = r1 + 7;
    Write(x, r2);

Core #3:
    var r3 = Read(x);
    var r4 = r3 - 5;
    Write(x, r4);

Memory: x starts at 10.
• Intended behavior: x becomes 17, then 12.
• Possible behavior: both cores read 10, so r2: 17 and r4: 5; Core #3 writes last and x ends as 5 instead of 12.
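The lost update can be reproduced deterministically by writing out one bad interleaving by hand. The following is a minimal Python sketch; the `memory` dict and the chosen step order are illustrative assumptions, not Beehive code.

```python
# Hypothetical model: shared memory is a dict, registers are local variables.
memory = {"x": 10}

# One bad interleaving: both cores read x before either writes it back.
r1 = memory["x"]        # Core #2 reads 10
r3 = memory["x"]        # Core #3 reads 10
r2 = r1 + 7             # Core #2 computes 17
r4 = r3 - 5             # Core #3 computes 5
memory["x"] = r2        # Core #2 writes 17
memory["x"] = r4        # Core #3 overwrites with 5; the credit of 7 is lost

racy_result = memory["x"]
intended_result = 10 + 7 - 5
print(racy_result, intended_result)   # 5 12
```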

September 2010


Classic Solution: Mutual Exclusion

Core #2:
    lock(m) {
        var r1 = Read(x);
        var r2 = r1 + 7;
        Write(x, r2);
    }

Core #3:
    lock(m) {
        var r3 = Read(x);
        var r4 = r3 - 5;
        Write(x, r4);
    }

Memory:
• x: 10, then 17, then 12
• m: alternates between held and free
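A runnable analogue of the locked version, as a sketch only: Python threads stand in for the two cores, `threading.Lock` stands in for `m`, and the `adjust` helper is a hypothetical name introduced here.

```python
import threading

memory = {"x": 10}
m = threading.Lock()

def adjust(delta):
    # The read-modify-write is one critical section, so no update is lost.
    with m:
        r = memory["x"]
        memory["x"] = r + delta

core2 = threading.Thread(target=adjust, args=(7,))
core3 = threading.Thread(target=adjust, args=(-5,))
core2.start()
core3.start()
core2.join()
core3.join()

print(memory["x"])   # 12, whichever thread ran first
```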


Lock-based Mutual Exclusion is Hard
• Locking levels
• Composition
• Locking granularity


Locking Levels
• Deadlock if locking order is inconsistent:

Core #2:
    lock(p) { lock(q) { … } }

Core #3:
    lock(q) { lock(p) { … } }

• Requires partial order on locks
• Doesn’t scale
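One common way to impose a consistent order is to give every lock a rank and always acquire in rank order. The sketch below assumes a simple total order by integer rank (stricter than the partial order the slide asks for); `RankedLock`, `acquire_in_order`, and `release_all` are hypothetical names.

```python
import threading

class RankedLock:
    """A lock with a fixed rank; ranks define the only legal acquisition order."""
    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()

def acquire_in_order(*locks):
    # Sort by rank so every thread takes shared locks in the same order.
    ordered = sorted(locks, key=lambda l: l.rank)
    for l in ordered:
        l.lock.acquire()
    return ordered

def release_all(ordered):
    for l in reversed(ordered):
        l.lock.release()

p, q = RankedLock(1), RankedLock(2)
log = []

def worker(name, first, second):
    held = acquire_in_order(first, second)
    try:
        log.append((name, [l.rank for l in held]))
    finally:
        release_all(held)

# Core #2 asks for (p, q) and Core #3 for (q, p); both end up taking p first.
core2 = threading.Thread(target=worker, args=("core2", p, q))
core3 = threading.Thread(target=worker, args=("core3", q, p))
core2.start()
core3.start()
core2.join()
core3.join()
```

The cost is that every caller must know the rank of every lock it might take, which is the scaling problem the slide points at.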


Composition
• Hard to extend existing libraries
• E.g. hash table:
  – add, lookup, remove
• How to enhance with atomic “rename”?
  – rename = { add(); remove() } … leaves temp duplicate
  – rename = { remove(); add() } … leaves temp gap
• Requires access to internal locking mechanism
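The composition problem can be made concrete with a small locked table. This is a sketch under assumptions: the `Table` class, its private `_lock`, and the `observe` callback (which stands in for another core running between the two calls) are all hypothetical.

```python
import threading

class Table:
    def __init__(self):
        self._lock = threading.Lock()   # internal; not part of the public API
        self._items = {}

    def add(self, key, value):
        with self._lock:
            self._items[key] = value

    def lookup(self, key):
        with self._lock:
            return self._items.get(key)

    def remove(self, key):
        with self._lock:
            return self._items.pop(key, None)

    def rename(self, old, new):
        # Atomic only because it lives inside the class and holds the internal lock.
        with self._lock:
            if old in self._items:
                self._items[new] = self._items.pop(old)

def rename_outside(table, old, new, observe):
    """Built from public operations only: the middle state is visible."""
    table.add(new, table.lookup(old))
    observe()                 # another core could run exactly here
    table.remove(old)

t = Table()
t.add("a", 1)
seen = []
rename_outside(t, "a", "b", lambda: seen.append((t.lookup("a"), t.lookup("b"))))
t.rename("b", "c")            # the in-class version exposes no middle state
print(seen)                   # [(1, 1)]: the temporary duplicate
```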


Locking Granularity
• Simple locking can inhibit concurrency:

Core #2:
    lock(m) {
        var r1 = Read(x[i]);
        var r2 = r1 + 7;
        Write(x[i], r2);
    }

Core #3:
    lock(m) {
        var r3 = Read(x[j]);
        var r4 = r3 - 5;
        Write(x[j], r4);
    }

• Tricky trade-off of complexity/performance


Alternatives
• Rely on experts?
  – Parallel processing libraries (LINQ)
  – Map-Reduce, Hadoop, Dryad
  – Parallelizing compilers
  – GPU graphics
• Use a better abstraction?
  – Atomic Transactions:
    • semantics as if sequential
    • actually, concurrent


Debit-credit with Atomic Transactions

Core #2:
    atomic {
        var r1 = Read(x);
        var r2 = r1 + 7;
        Write(x, r2);
    }

Core #3:
    atomic {
        var r3 = Read(x);
        var r4 = r3 - 5;
        Write(x, r4);
    }

Memory: x: 10, then 17, then 12


Atomic Transaction Semantics
• Database transactions:
  – Atomicity: all of the transaction happens, or nothing
  – Consistency: failed transactions have no effect
  – Isolation: internal state invisible to others
  – Durability: after commit, effects are permanent
• Transactional memory:
  – Serialization: execution is indistinguishable from some serial execution of the transactions
  – Reality: non-transactional code can see non-atomic effects of transactions


Transactional Memory Research
• ~500 papers in last 15 years
• All about software TM, or about simulations
• Software TM is extremely inefficient
• Hardware TM needs hardware
• So, Beehive …


Implementing TM Transactions
• Execute in parallel, hoping to succeed
• Detect conflicts that prevent serialization
• Rollback all but one and retry them


Implementing TM Debit-Credit

Core #2:
    tm_startTX(); {
        var r1 = x;
        var r2 = r1 + 7;
        x := r2;
    } tm_endTX();
    R{ x }  W{ x: 17 }

Core #3:
    tm_startTX(); {
        var r3 = x;
        var r4 = r3 - 5;
        x := r4;
    } tm_endTX();
    R{ x }  W{ x: 5 } on the first attempt, W{ x: 12 } after rollback and retry

Memory: x: 10, then 17, then 12


Conflict Detection Abstractly
• For each uncommitted transaction “X” maintain:
  – Read set R(X), all locations read so far
  – Write set W(X), all locations written so far
  – Ability to undo writes for rollback (or do them on commit)
• If W(X) intersects with R(Y) or W(Y), and X commits before Y:
  – rollback Y
  – (or could rollback X)
  – (or could delay committing X)
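The intersection rule can be written down directly. The following is a minimal Python sketch of the abstract scheme with deferred writes; the `Tx` class and the `conflicts` function are hypothetical names, and this models the rule only, not any hardware.

```python
class Tx:
    def __init__(self, name):
        self.name = name
        self.read_set = set()      # R(X): locations read so far
        self.write_buffer = {}     # W(X) plus deferred values: location -> value

    def read(self, memory, loc):
        self.read_set.add(loc)
        # A transaction sees its own deferred writes first.
        return self.write_buffer.get(loc, memory[loc])

    def write(self, loc, value):
        self.write_buffer[loc] = value

def conflicts(committing, other):
    """True if W(committing) intersects R(other) or W(other)."""
    w = set(committing.write_buffer)
    return bool(w & (other.read_set | set(other.write_buffer)))

memory = {"x": 10}
x_tx, y_tx = Tx("X"), Tx("Y")
x_tx.write("x", x_tx.read(memory, "x") + 7)   # X: deferred x = 17
y_tx.write("x", y_tx.read(memory, "x") - 5)   # Y: deferred x = 5

memory.update(x_tx.write_buffer)              # X commits first
must_rollback_y = conflicts(x_tx, y_tx)
print(memory["x"], must_rollback_y)           # 17 True
```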


Conflict Detection in Beehive Hardware
• During transaction “X”:
  – Maintain R(X) and W(X)
  – Defer writes to DRAM
• To commit “X”:
  – Send writes to DRAM
  – Send W(X) around the ring
• During transaction “Y”:
  – Snoop on W(X), compare with R(Y)
  – Rollback and retry on conflict
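The commit-and-snoop protocol can be traced in software for the debit-credit example. This is a toy model under stated assumptions: the `Core` class, the `ring` list standing in for the hardware ring, and the single location `"x"` are all illustrative.

```python
class Core:
    def __init__(self, delta):
        self.delta = delta

    def run(self, memory):
        # Start (or restart) the transaction: record R, defer the write.
        self.aborted = False
        self.read_set = {"x"}
        self.deferred = {"x": memory["x"] + self.delta}

    def snoop(self, written):
        # Compare a committing core's write set with our read set.
        if self.read_set & written:
            self.aborted = True

    def commit(self, memory, ring):
        memory.update(self.deferred)          # send writes to DRAM
        for other in ring:                    # send W(X) around the ring
            other.snoop(set(self.deferred))

memory = {"x": 10}
core2, core3 = Core(+7), Core(-5)
core2.run(memory)
core3.run(memory)                             # both have read x = 10

core2.commit(memory, ring=[core3])            # x = 17; Core #3 sees the conflict
retries = 0
while core3.aborted:
    core3.run(memory)                         # rollback and retry: reads 17
    retries += 1
core3.commit(memory, ring=[])                 # no other transaction in flight
print(memory["x"], retries)                   # 12 1
```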


Finally, Some Hardware
• R(X) is recorded in a “Bloom filter”
• D-cache evicts go to victim cache, not DRAM
• Filter snoops on Tx writes; a conflict triggers abort
• “Commit” sends writes to ring

[Diagram labels: Core N; D cache; W(X): Victim Cache; R(X): Bloom Filter; The Ring]


Sidebar: “What’s a Bloom filter?”
• Probabilistic storage for set membership
• Storage can be less than set size
• Operations:
  – Add x to the set
  – Is x in the set?
    • If x is in the set, answer “yes”
    • If x isn’t in the set, answer either “yes” or “no”
• Skill is controlling probability of false positives
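The two operations fit in a few lines of software. The sketch below is a generic Bloom filter, not the Beehive circuit: the 64-bit size, the three hash functions, and the use of SHA-256 to derive bit positions are assumptions chosen for readability.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                         # bit vector held in one integer

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means present or a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

read_set = BloomFilter()
for address in (0x1000, 0x1040, 0x2080):
    read_set.add(address)

print(all(read_set.might_contain(a) for a in (0x1000, 0x1040, 0x2080)))  # True
```

For conflict detection the asymmetry is the useful property: a false positive costs only an unnecessary abort, while a false negative (which a Bloom filter never produces) would miss a real conflict.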


Details (1): Serializing Commits
• tm_endTX() {
    P(commitMutex);
    Flush victim cache and D-cache;
    Clear “inTx” state;
    V(commitMutex);
  }


Details (2): Conflicts and Rollback
• tm_startTX() {
    setjmp(…); // inline at caller
    Flush D-cache;
    Set “inTx”;
  }
• On conflict: clear “inTx”, trap to location 2
• abortHandler() {
    Invalidate D-cache;
    longjmp(…)
  }
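The setjmp/longjmp pair amounts to a retry loop: the abort handler jumps back to the point saved at transaction start. Python has no setjmp, so this sketch substitutes an exception for the jump; `TxAbort`, `run_transaction`, and the attempt limit are hypothetical, and the conflict is simulated rather than detected.

```python
class TxAbort(Exception):
    """Raised where the hardware would trap to the abort handler."""

def run_transaction(body, max_attempts=10):
    for attempt in range(1, max_attempts + 1):
        # Plays the role of the setjmp point: each retry restarts the body here.
        try:
            return body(attempt), attempt
        except TxAbort:
            continue          # plays the role of longjmp: drop the attempt, retry
    raise RuntimeError("transaction did not commit")

def body(attempt):
    if attempt < 3:           # pretend the first two attempts hit a conflict
        raise TxAbort()
    return "committed"

result, attempts = run_transaction(body)
print(result, attempts)       # committed 3
```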


Details (3): Mice and Elephants
• Victim cache can overflow
• Clear “inTx”, trap to location 3
• Elephant:
  – tm_startTX(): { P(commitMutex); flush; set “inTx”; }
  – execute the transaction, completely;
  – tm_endTX(): { clear “inTx”; V(commitMutex); }
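A software sketch of the mouse-to-elephant fallback, assuming (the slide does not spell this out) that the overflow trap reruns the transaction as an elephant. The capacity `VICTIM_CACHE_LINES`, the `Overflow` exception, and the function names are hypothetical.

```python
import threading

VICTIM_CACHE_LINES = 4            # hypothetical capacity, for illustration only
commit_mutex = threading.Lock()

class Overflow(Exception):
    """Raised where the hardware would trap to location 3."""

def run_as_mouse(writes, memory):
    if len(writes) > VICTIM_CACHE_LINES:
        raise Overflow()          # deferred writes no longer fit
    with commit_mutex:            # mutex held only for the commit flush
        memory.update(writes)

def run_as_elephant(writes, memory):
    with commit_mutex:            # mutex held for the whole transaction
        for loc, value in writes.items():
            memory[loc] = value   # writes need not be buffered

def run(writes, memory):
    try:
        run_as_mouse(writes, memory)
        return "mouse"
    except Overflow:
        run_as_elephant(writes, memory)
        return "elephant"

memory = {}
small = run({"a": 1, "b": 2}, memory)
large = run({f"k{i}": i for i in range(10)}, memory)
print(small, large)               # mouse elephant
```

Because an elephant holds `commitMutex` from start to end, no other transaction can commit while it runs, so it cannot be invalidated by a concurrent commit.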


Measurements
• With Sungpack Hong
• Eigenbench and STAMP benchmarks (Stanford)
• Comparing with “SwissTM” (EPFL) and “TL2” (Sun) software TM
• Work-in-progress


Eigenbench (1): no conflicts

[Two charts: “Short Transactions (9RD, 1WR)” and “Large Transactions (270RD, 30WR)”. Each plots Speedup against Number of Cores for the series Unprotected and TM, with %Overflow and %Abort shown as percentages. Data labels legible in the extract: 11.71, 10.69, 9.23, 8.92, 3.11%; their assignment to series is not recoverable.]


Eigenbench (2): with Conflicts

[Chart: “Speedup (8 cores, 90R, 10W)”. Plots Speedup (0 to 6) against Estimated Degree of Conflicts (From Analytic Model), 0% to 100%, for the series Beehive Speedup, SwissTM Speedup, and TL2 Speedup.]


Red-Black Trees

[Chart: “Execution time in committed transactions”. Plots # cycles (0 to 10000) against Number of Cores (1, 2, 4, 8, 12), broken down into tm_startTX, Body, and tm_endTX. Data labels legible in the extract: 1012, 876, 733, 717, 767; 6147, 4679, 5205, 4354, 4444; 1492, 1556, 1651, 1836, 2099. Their assignment to bars and series is not recoverable.]

STAMP Benchmarks

[Chart residue. Legible caption fragment: “Breakdown of core, 10 RB Tree (12”. Legible annotations: Overflow / False Positives; Nondeterministic Execution; Overflow; Small TXs; Floating Point Operations.]


Atomicity: a Cautionary Tale

Core #2:
    tm_startTX(); {
        if (ok) { x->f(); }
    } tm_endTX();
    R{ ok }, then R{ ok, x }

Core #3:
    tm_startTX(); {
        x = NULL;
        ok = false;
    } tm_endTX();
    W{ x: 0, ok: 0 }

Memory:
• x: &y, then 0
• ok: 1, then 0

• Make TX writes on the ring atomic
• Inhibit mice while running an elephant


Status
• It works
• It needs polishing
• The fundamental question remains:
  – “is TM significantly easier than locks/monitors?”
• Beehive TM will help us answer this
