Programming Languages Dr. Michael Petter Exercise Sheet 2

WS 2016/17

Assignment 2.1 MESI-Protocol. Take the example from slide 23 of the lecture and start with the cache state in which CPU B exclusively holds a and b in its cache. Assume the execution B.1 A.1 A.2 B.1 B.2. Draw a happened-before diagram without any store/invalidate buffers.

Solution 2.1

[Figure: happened-before diagram for CPUs A and B with their caches. Initially a and b are in state I in A's cache and in state E0 in B's cache. Recoverable labels: events a=1, b=1, b==0 (twice), a==1; cache-line states I, E0, S1, M1; messages read, read response, read invalidate, invalidate ack, write back.]
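The state changes named in this solution (E to I on a read invalidate, M to S with a write back on a remote read) can be replayed with a small simulation. This is a minimal sketch and not part of the original sheet: the types line_t and mesi_t, the functions mesi_read, mesi_write and replay, and the two-CPU restriction are all hypothetical. It also assumes that A.1 is a=1, A.2 is b=1, B.1 reads b and B.2 reads a, which is inferred from the event labels of the diagram rather than from slide 23 itself.

```c
#include <assert.h>

/* MESI state of one cache line in one cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

enum { CPU_A = 0, CPU_B = 1 };   /* hypothetical CPU indices */

typedef struct {
    mesi_t state[2];   /* state of the line in A's and B's cache */
    int    value[2];   /* cached copy, meaningful unless INVALID  */
    int    memory;     /* value in main memory                    */
} line_t;

/* Read by `cpu`. On a miss a "read" is sent; a cache holding the line
   answers with a "read response" (after a write back if it was MODIFIED)
   and both copies end up SHARED. */
int mesi_read(line_t *l, int cpu) {
    assert(cpu == CPU_A || cpu == CPU_B);
    if (l->state[cpu] == INVALID) {
        int other = 1 - cpu;
        if (l->state[other] == MODIFIED)
            l->memory = l->value[other];      /* write back */
        if (l->state[other] != INVALID) {
            l->state[other] = SHARED;
            l->state[cpu]   = SHARED;
        } else {
            l->state[cpu]   = EXCLUSIVE;      /* nobody else has it */
        }
        l->value[cpu] = l->memory;
    }
    return l->value[cpu];
}

/* Write by `cpu`. Unless the line is already owned (E or M), a
   "read invalidate" is sent; the other cache drops its copy and
   acknowledges. No store buffer: the write waits for the ack. */
void mesi_write(line_t *l, int cpu, int v) {
    assert(cpu == CPU_A || cpu == CPU_B);
    int other = 1 - cpu;
    if (l->state[cpu] != EXCLUSIVE && l->state[cpu] != MODIFIED) {
        if (l->state[other] == MODIFIED)
            l->memory = l->value[other];      /* write back first */
        l->state[other] = INVALID;            /* invalidate + ack */
    }
    l->value[cpu] = v;
    l->state[cpu] = MODIFIED;
}

/* Replays B.1 A.1 A.2 B.1 B.2 under the assumptions stated above and
   stores the three values that B reads. */
void replay(int seen[3]) {
    line_t a = { {INVALID, EXCLUSIVE}, {0, 0}, 0 };
    line_t b = { {INVALID, EXCLUSIVE}, {0, 0}, 0 };
    seen[0] = mesi_read(&b, CPU_B);   /* B.1: b still 0            */
    mesi_write(&a, CPU_A, 1);         /* A.1: a=1, B's copy -> I    */
    mesi_write(&b, CPU_A, 1);         /* A.2: b=1, B's copy -> I    */
    seen[1] = mesi_read(&b, CPU_B);   /* B.1: miss, both copies S1  */
    seen[2] = mesi_read(&a, CPU_B);   /* B.2: miss, both copies S1  */
}
```

Calling replay(seen) from a test program fills seen with the values B observes; because every write waits for its invalidate ack, B can never see b==1 and then a==0 in this model.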

Assignment 2.2 Dekker. Draw a happened-before diagram for the Dekker algorithm describing an interaction of two threads for a case where one of the threads succeeds in entering the critical section. Assume the hardware to be sequentially consistent. In the beginning, all variables have a value of zero and are in shared state.

Solution 2.2

[Figure: happened-before diagram for threads T0 and T1 with their caches. Initially f[0], f[1] and turn are in state S0 in both caches. Recoverable labels: events f[0]=1; f[1]=1; f[0]==1; f[1]==1; turn!=1; f[1]=0; cache-line states S0, S1, M0, M1, I; messages read, write back, invalidate, invalidate ack.]

Assignment 2.3 MESI-Protocol. Consider the following example program with Threads A and B executing a() and b(), respectively:

    struct G { int b=0; int a=0; };

    void a(){
      G.b=1;
      int rega=G.a;
      // *
    }

    void b(){
      G.a=1;
      int regb=G.b;
      // *
    }

1. Devise a configuration of the underlying machine model such that a sequence of instructions reaching the respective program points * leads to both variables rega and regb containing the value 0.
2. Draw a happened-before diagram of a sequence of instructions that leads to this state.

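Solution 2.3

Store buffers:

[Figure: happened-before diagram for CPUs A and B with store buffers. Initially G.b and G.a are in state S0 in both caches and both store buffers are empty [ ]. Recoverable labels: events G.b=1; rega=G.a; G.a=1; regb=G.b; store-buffer contents [ b=1 ] and [ a=1 ]; cache-line states S0, I, M1; messages invalidate, invalidate ack.]

Invalidate buffers:

[Figure: happened-before diagram for CPUs A and B with invalidate buffers. Initially G.b and G.a are in state S0 in both caches and both invalidate buffers are empty [ ]. Recoverable labels: events G.b=1; rega=G.a; G.a=1; regb=G.b; invalidate-buffer contents [a] and [b]; cache-line states S0, I, M1; messages invalidate, invalidate ack.]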
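The store-buffer configuration can also be checked with a deterministic, single-threaded model. This is only a sketch under stated assumptions and not part of the original solution: store_buffer_t, G_t, cpu_store, cpu_load, cpu_flush and run_litmus are hypothetical names, each buffer holds at most one pending store, and the interleaving is fixed by hand instead of being produced by real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* One pending store per CPU is enough for this example. */
typedef struct {
    bool pending;
    int *addr;
    int  value;
} store_buffer_t;

typedef struct { int b; int a; } G_t;

/* A store is parked in the CPU's own buffer instead of reaching memory. */
void cpu_store(store_buffer_t *sb, int *addr, int value) {
    assert(!sb->pending);            /* single-slot buffer */
    sb->pending = true;
    sb->addr    = addr;
    sb->value   = value;
}

/* A load consults the CPU's own buffer first, otherwise memory.
   Stores buffered by the other CPU are invisible. */
int cpu_load(const store_buffer_t *sb, const int *addr) {
    if (sb->pending && sb->addr == addr)
        return sb->value;
    return *addr;
}

/* Drain the buffer: only now does the store become globally visible. */
void cpu_flush(store_buffer_t *sb) {
    if (sb->pending) {
        *sb->addr   = sb->value;
        sb->pending = false;
    }
}

/* Fixed interleaving: both stores are buffered before either load runs. */
void run_litmus(G_t *G, int *rega, int *regb) {
    store_buffer_t sbA = { false, 0, 0 };
    store_buffer_t sbB = { false, 0, 0 };
    cpu_store(&sbA, &G->b, 1);         /* A: G.b=1;      */
    cpu_store(&sbB, &G->a, 1);         /* B: G.a=1;      */
    *rega = cpu_load(&sbA, &G->a);     /* A: rega=G.a;   */
    *regb = cpu_load(&sbB, &G->b);     /* B: regb=G.b;   */
    cpu_flush(&sbA);                   /* stores drain late */
    cpu_flush(&sbB);
}
```

In this model both loads run while the stores are still buffered, so rega and regb are both 0 even though G.a and G.b are both 1 once the buffers have drained.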

Assignment 2.4 Dekker Implementation.

1. Implement Dekker’s algorithm without memory barriers. To use Posix threads in C, you might want to look for pthread_create() in pthread.h and compile with the -pthread compiler flag!
2. Demonstrate that out-of-order execution actually breaks Dekker’s algorithm when implemented without memory barriers. Hint: Clever instrumentation makes the difference!
3. Introduce memory barriers in your implementation of Dekker’s algorithm; test whether you can still observe broken behaviour. The statements to introduce memory barriers are compiler dependent.
   • Clang or GNU C++, as in MingW/Orwell-Dev-C++ or on Linux systems, use __sync_synchronize(void),
   • MacOS’ Xcode uses OSMemoryBarrier(void), defined in libkern/OSAtomic.h,
   • MS’ Visual C++ uses _mm_mfence(void), defined in intrin.h.

As an environment for threads, you may use Posix threads, e.g.

    // gcc -pthread dekker.c -o dekker
    #include <pthread.h>  // pthread_create, pthread_exit
    #include <stdio.h>    // printf
    #include <stdlib.h>   // exit
    #include <stdbool.h>  // false

    #define NUM_THREADS 2 // Dekker's algorithm handles exactly two threads

    // flag[] and dekker() are defined in your implementation

    int main(int argc, char *argv[]) {
      pthread_t threads[NUM_THREADS];
      int rc;
      long t;
      flag[0] = false;
      flag[1] = false;
      for(t = 0; t < NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, dekker, (void *)t);
        if(rc) {
          printf("ERROR; return code from pthread_create() is %d\n", rc);
          exit(-1);
        }
      }
      /* last thing that main() should do */
      pthread_exit(NULL);
    }

Solution 2.4

1. Looks like this:

    #include <assert.h>   // assert
    #include <pthread.h>  // pthread_exit
    #include <stdbool.h>  // true, false
    #include <stdio.h>    // printf

    // volatile keeps the compiler from caching flag/turn in registers
    // (otherwise the busy-wait loops may never terminate when optimizing);
    // it does not order memory accesses in hardware
    volatile int flag[2];
    volatile int turn = 0;
    int data = 0;

    void *dekker(void *threadid) {
      long tid = (long)threadid; // keep book of the thread's id
      printf("This is thread #%ld!\n", tid);
      while(true) {
        flag[tid] = true;
        while(flag[1 - tid] == true) {
          if(turn != tid) {
            flag[tid] = false;
            while(turn != tid) ;
            flag[tid] = true;
          }
        }
        // start critical section
        data++;
        printf("tid #%ld -> %d;\n", tid, data); // threadnr -> datavalue
        data--;
        assert(data==0); // here we identify irregularities
        // end critical section
        turn = 1 - tid;
        flag[tid] = false;
      }
      pthread_exit(NULL); // never reached
    }

2. With memory barriers, no assert fails anymore (are all of the barriers used here really necessary?):

    #include <assert.h>   // assert
    #include <pthread.h>  // pthread_exit
    #include <stdbool.h>  // true, false
    #include <stdio.h>    // printf

    int flag[2];
    int turn = 0;
    int data = 0;

    void *dekker(void *threadid) {
      long tid = (long)threadid; // keep book of the thread's id
      printf("This is thread #%ld!\n", tid);
      while(true) {
        flag[tid] = true;
        while(__sync_synchronize(), flag[1 - tid] == true) {
          if(__sync_synchronize(), turn != tid) {
            flag[tid] = false;
            while(__sync_synchronize(), turn != tid) ;
            flag[tid] = true;
          }
        }
        // start critical section
        data++;
        printf("tid #%ld -> %d;\n", tid, data); // threadnr -> datavalue
        data--;
        assert(data==0); // here we identify irregularities...
        // end critical section
        turn = 1 - tid;
        __sync_synchronize();
        flag[tid] = false;
      }
      pthread_exit(NULL); // never reached
    }
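As a compiler-independent alternative to the barrier statements listed in the assignment, the entry and exit protocol can be written with C11 atomics, whose default ordering is sequentially consistent. This is a sketch and not the sheet's solution: it assumes a C11 compiler that provides stdatomic.h, and the names dk_flag, dk_turn, dekker_lock and dekker_unlock are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Static atomics are zero-initialized: nobody wants to enter,
   and thread 0 has priority. */
atomic_int dk_flag[2];
atomic_int dk_turn;

/* Entry protocol for thread tid (0 or 1). atomic_store/atomic_load
   use memory_order_seq_cst, so the store to dk_flag[tid] cannot be
   reordered after the load of dk_flag[1 - tid]. */
void dekker_lock(int tid) {
    assert(tid == 0 || tid == 1);
    atomic_store(&dk_flag[tid], 1);
    while (atomic_load(&dk_flag[1 - tid]) == 1) {
        if (atomic_load(&dk_turn) != tid) {
            atomic_store(&dk_flag[tid], 0);     /* back off            */
            while (atomic_load(&dk_turn) != tid)
                ;                               /* wait for our turn   */
            atomic_store(&dk_flag[tid], 1);     /* announce intent again */
        }
    }
}

/* Exit protocol: hand priority to the other thread, then withdraw. */
void dekker_unlock(int tid) {
    assert(tid == 0 || tid == 1);
    atomic_store(&dk_turn, 1 - tid);
    atomic_store(&dk_flag[tid], 0);
}
```

A thread function would wrap its critical section in dekker_lock(tid) and dekker_unlock(tid); the data++ / data-- / assert instrumentation from the solution can be reused unchanged between the two calls.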
