Software Transactional Memory

Nir Shavit*
MIT and Tel-Aviv University

Dan Touitou
Tel-Aviv University

Abstract

by

constructing

classes

blocking [7, 15, 14]. As we learn

from

chroni~ation signing

highly

isting of

a

literature,

is

flexibility

greatly

concurrent

hardware

Building

inflexible

on

chronization

for

supporting

synchronization

can

of

be

Herlihy

provide

a general

Empirical

all

a k-word

evidence

chitectures the

concurrent

shows

that

lock-free

translation

and

outperforms

Herlihy’s

numbers

of processors.

software-transactional

always in

translation

based

the

and

to

a Load.

unlike

most

by

employing

operating

proposed

Barnes

style

protocols,

pol-

language

they

1

tency

Introduction

adding

are

able

using

protocol.

can

, and

or soon on the

for

to

level

a single

algorithms

the

of

word, imprac-

problem

on

machines

Herlihy

and

hardware

to

can

trans-

associative

to the

support

cache

a flexible

cache

consistency transactional

operations.

be written

Moss

solution:

a specialized changes

of provid-

existing

support.

synchronization

operation

executed

which

current

concurrent

primitives

minor

writing

chronization

[18]

operations operation

system

By

several

for

of the

an ingenious

memory.

making

helping”

icy.

most

to overcome

programming

and

Heap

Compare&Swap

support

[7] suggested

efficient

actional

of our

lists net-

Fetch&Complement

of

Rappoport’s

highly

on

counting

future.

ing

have

the

of

on two

compression

a three-word

of these near

simpro-

a Compare&Swap path

Linked/Store-Conditional

greatly

data-structures

combination

and

architectures

in the

[16]

for sufficiently

use

non-

flexibil-

concurrent

operation,

Unfortunately,

Bershad

of Barnes,

parallel

using

more.

operations

use

S pi ice

which

be developed

making

ar-

[2]

are

literature,

non-blocking

which

a special

that

the

non-blocking

the

[22]

implemented

tical

based

are

Pu

[5]

from

designing

Fetch&Inc, Israeli

translating

efficiency

“recursive

of

implementations

synchronization

of

Anderson’s

many a

STM

and uses

and be

the

task

Examples

ones

style

to the

is that

on a costly

works

outperforms

method key

approach

it is not

offer

multiprocessor

method

The

which

STM-transaction.

methods

large

methods,

our

for

the

words,

syn-

only

choosing

Massalin

software

use

lock-free

on simulated

the

we

using

method

compare&swap

collected

single

non-blocking,

We

to

grams.

programming

machines

implementations

on implementing

is

ex-

a

novel

in

plifies

level

Moss,

ity

de-

the

on

a

operation.

highly

object

of

transactional and

STM

existing

Load.Linked/Store.Conditional sequential

on

transactional

operations. on

best

(STM),

flexible

implemented

at

based

memory

syn-

task

operation

hardware

transactional

method

is

itional

the

choosing

the

Unfortunately,

and

methodology

software

in

simplifies

programs.

Load_Linked/Store_Cond

word.

of

the

operations

of

As we learn

Any

syn-

as a transaction

an optimistic

algorithm

built

into

Unfortunately

though,

this

the

transactional

and

the

consisis block-

solution

ing. A major chines

obstacle widely

designing Given an

on the

highly the

increasingly

serious

tention

for

to

to

number gram

timing

highly and uses

they and

by

limit

anomalies

of

(possibly

modern

means

and

critical eliminating

make

critical

is

the

a multiprocessor sections

its

transactional

supports

flexible

face

clear

focus

memory

pro-

altogether)

tions

which

This

class

Though

on

access

we cannot

and

processor

failures.

most

of in the

aim

known

the

memto todays in the

transactional

that

sequence

the

for

resiliency

a software

transactions,

a pre-determined

primitives

that

transactional

of applicability

of

static

includes

chronization

and

introduce

design

of synchroniza-

machines,

implementations

support

a novel

software

in terms among

anomalies

that

(STM),

our

advantages

We

programming

software.

portability

approach,

implementation.

memory

performance,

of timing

We

the

in

overall has

to adopt based

transactional

operations

machines,

The

proposes hardware

software

ory

system

decrease

paper

not

same

con-

failures. to

im-

sections

increase

but

tion

for

of critical

processor

sections

is

multiprocessor

and

programming

delay

techniques

parallelism,

interconnect,

concurrent size

in

in

structures.

unpredictable

conventional

objects

since

memory

vulnerable key

problem

data

This

ma-

of programmers and

that

that

concurrent

unsuitable,

multiprocessor

difficulty

programs

realization

we argue

plementing

to making

is the

concurrent

growing

architectures,

are

way

acceptable

and

is,

transac-

of locations. proposed

syn-

literature.

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission PODC

and/or

95 Ottawa

1.1

CA

©1995

ACM

0-89791-710-3/95/08.

of the

.$3.50

Author E-mail: shanir@theory.lcs.mit.edu

operation


a nutshell environment,

operations

clusively lContact

in

In a non-faulty

fee. Ontario

STM

is usually

ownerships Op.

on the

If a transaction

the

way

based

to ensure

on locking

memory cannot

locations capture

the

atomicity

or acquiring accessed

exby an

an ownerships

it fails,

and releases

erwise, acquired.

To

deadlocks,

which

the

guarantee for

ownerships

continue

certain

process

which

This

other

same

location

plete

its

tional

is

to

only

help

eral

has

single

transactions

key

order

to

of

attempting

is the

cooperative

this

location in

free

if

ing

the

gives,

to

com-

one

release

we

using

cific

can

by

The

raon-

cooperative

other

One

can

use

method into

to

STM

for

to

[6, 26].

The

memory

to

tions.

non-blocking

Herlihy,

in

sequential

objects

cording done ory,

to

his

by first

switching

the the

help

into

it

to the

solution

Herlihy

for

large

current

updating.

Alemany

suggested the

new

data

structure,

to

whole

improve

price

of losing

support

making

not

object, and

the

and

all that

portability,

standard

does

not

support

[4] and

with

as our and

drawbacks

em-

its

spe-

which

are

method:

a

recursive

causes

access

structure

processes

a disjoint

part

using

of

to

help

of the

data

ever,

in many

cases

since

have

to

will

have

to first

P‘s

operation

help

to

On

[20]

b, help

sys-

Q,

will

then

Q,

likely

other

hand,

and

fail.

for

again

P,

Moreover,

it,

when

also

the

P

and

location

after

b will

only HOW-

change

read

waiting

requesting

to

to

likely

help

P.

executes

the P

restart.

only

the

P

transaction,

a and

help

find

if

STM

release

have

most

P,

processes

as an

not have

on system

help

and

not

already

processes

some

coopera-

a

in

any

P has

ac-

redundantly

P. the

a will

the

the

pare&Swap

method

All

the

operation. will

P

a 2-word that

operation

own

P’s Compare&Swap

retry.

b, all

its

its

but

helped

Assume to

co-

fail

executes

b.

Q complete

11 after

of

nevertheless

which

on

level

percentage

b. According

continues

Q changed

will

P

are

a and

owns

helps

b and

transaction operations

they

locations

P first

acquires

quired

since

Q already

method,

Compare&Swap

the

a high

a process

on

process

case

con-

operating

assumptions

example

on

Compare&Swap

contention

for

k-word fail

“helped,”

k-word

then

ap-

LaMarca

general

tive

not

processes.

memory

a suitable

the

by

a set of strong

is

operations.

provide

of this

Ac-

of mem-

like

Felten

efficiency

’s

tentatively

atomic

and

the

at

version

does

of locking

tem

new

thus

by other Take

of

structure block

are

Compare&Swap

Herlihy

ones.

allocated

structures

proach

as

transformation

a data

’s method

data

loca-

guarantees

the

has

mostly

generate

other

of Load_Linked/Store_Conditional

Unfortunately,

a

Com-

desired

concurrent

a new

on

to

k-word

sequel

updating

into

changes

pointer

the

non-blocking

methodology,

changes

the

a general

method

transactional

which

operative

succeed. in

offer

which

STM’S

and

of

transactional

of

implementation

to

to

copying

making

the

(referred first

approach

as an atomic

always

general

frequently

processes

operations

implementations

caching

2) on

[19]

streamlined

Compare&Swap

However,

translation

method

which

Unlike

concurrent

use

Figure

STM will

[15]

was the

the

collection

them (see

transaction

method),

object

on

any

performing transaction

The

highly

is straightforward:

implement

object,

some

based

approach to

a general

and

implemen-

and

.

major

is done

Barnes

structure.

Translation

sequential

ones

pare&Swap

that

provide

translating

non-blocking

shared

Lock-free

this

Rappoport

a clean k-word

two

based

and

needs

the lock-

by

call

“helping”

Sequential

STM

and

by Israeli

the

have

our

it helps

on specific

method,

both

approach

Though

are vague

achieving

a process

operation, chain.

paper

to

caching

process

cooperative

suggest,

the

by another own

executing

key

whenever

Prakash

implementation

overcome

redundant-helping.

1.2

in

of a non-blocking

results

update

method

its

a recent the

the The

Load-Linked/Store_Conditional

pirical

sev-

a location

which

details, using

in

locks.

dependency

and

implementation

need

among

complete the

the

behavior

locked

along

Shasha,

tation

transacone

to

involved

releasing

already

process

capture

the

and

recursively

the

Turek,

Moreover,

help

a location

to

out,

a location

policy

of [6, 26]

methodology,

coordination

to

helping

resilient

we must

feature

transaction.

overhead

a “reactive”

of

The

owner

the

non-blocking

even

to

locations

the

swapped

trying

the

eliminate

In order

a “helping”

owner

in

order.

key,

operation

by acquiring

delayed,

are

of their the

environment,

been

Oth-

ownerships

first

completes

by

the

the

is done

a faulty

which

help

avoid

must

increasing

achieved

is that

its

employing

it

transaction.

approach

one

transaction

transactions

own

effectively

in

every

acquired.

frees

transactions

in some

executes

crashed.

forcing

static

liveness

that

already

Op and

liveness,

needed

ensuring

make

or

the ownerships

it succeeds in executing

Finally, value

The

The

P.

will

if

to

processes

processes Q hasn’t

of b in its

2-word fail

own

Comacquire

waiting

for

waiting

for b will

changed

b, P will

cache.

behavior. To overcome in

[6],

the A

whole

limitations hk

object

similar

and

approach

Shasha,

and

“simulates”

vate

memory, the

in the

stores restarts the

the

cache new

from

k-word

a location but

writing

process

uses

atomic

operation

memory

update.

values the

in the

beginning.

Read-Modify-write

disjoint

to the for

Barnes,

is done an

this

memory. Barnes

To make ceptable,

Turek,

its

which

the

checks

is the

case,

Otherwise, suggested

by locking

has

the

the

Results

techniques Parallel We

val-

when

(see

Section

found

blocking

process

method periments

order


the

cited

translation

to reduce

the

system

5) the

stable

above.

We

on

use

a simulated

accessing

translation and

the show

that

acone

(non-faulty).

We

comparison of

well

Alewife

machine, the

method

cooperative

the

methods overhead

the

of

translation

accepted

Proteus

[8, 9].

shared-memory in

stable

experimental conditions

Simulator

that

performance

is

first

under

Hardware

concurrency

to implement

needs

pay

distributed

value

operation

in ascending

sequential-to-non-blocking

performance

k-word the

the

to

private

if the

to the

Empirical

one

present

pri-

is done

non-blocking

are equivalent If

updating.

time

into

Our

by

in

first

1.3

copying

a process

updating the

Barnes,

avoids

proposed

According of

’s method, that

independently

execution

the

Write

the

concurrent

memory

Then,

Read-Modifyin

allows

i.e reading

ues contained read

method,

[26].

the

shared

memory.

of Herlihy

caching

was

Prakash

first

from

the

introduced

object

[1] cache-coherent as

the

grows,

outperforms method.

in general

STM

potential the both

Unfortunately, and

other

for

STM

non-

Herlihy’s our

non-blocking

ex-

Dequeue()
  BeginTransaction
    DeletedItem = Read_transactional(Head)
    if DeletedItem = Null
      ReturnedValue = Empty
    else
      Write_transactional(Head, DeletedItem→Next)
      if DeletedItem→Next = Null
        Write_transactional(Tail, Null)
      ReturnedValue = DeletedItem→Value
  EndTransaction
end Dequeue
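The Dequeue transaction above can be mimicked in a few lines of Python. This is only a sketch under loud assumptions: `ToyTM`, `atomically`, `Node`, `enqueue` and `EMPTY` are hypothetical names, and a single global lock stands in for BeginTransaction/EndTransaction. It therefore shows the atomic read-then-write pattern of the procedure, not a non-blocking implementation.

```python
import threading

EMPTY = object()  # sentinel returned when the queue holds no items


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class ToyTM:
    """Toy transactional memory: one lock makes each transaction atomic.

    Assumption of this sketch only; it is blocking, unlike the STM
    discussed in the paper.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.cells = {"Head": None, "Tail": None}

    def atomically(self, body):
        # Plays the role of BeginTransaction ... EndTransaction.
        with self._lock:
            return body(self.cells)


def enqueue(tm, value):
    node = Node(value)

    def body(cells):
        if cells["Tail"] is None:
            cells["Head"] = node
        else:
            cells["Tail"].next = node
        cells["Tail"] = node

    tm.atomically(body)


def dequeue(tm):
    def body(cells):
        deleted = cells["Head"]           # Read_transactional(Head)
        if deleted is None:
            return EMPTY
        cells["Head"] = deleted.next      # Write_transactional(Head, DeletedItem→Next)
        if deleted.next is None:
            cells["Tail"] = None          # Write_transactional(Tail, Null)
        return deleted.value

    return tm.atomically(body)
```

Which cells the transaction touches depends on the value it reads from Head, so its data set is not known in advance.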

k_word_C&S(Size,

DataSet[],Old[],

for

i=1

to

if

Size

do

Read_transactional

techniques

are

methods In ible

A Non

inferior

such

architecture

I:

to

summary,

similar

STM

shared

and

improved

section

objects,

for

i=1

to

Size

mentation

and

Finally,

in

and

ReturnedValue

for

design

which

ensures

the

lock-based a shared

package

software

transactional

standard

properties

in non-faulty

in

ones.

of the

our

concur-

faulty

The

our

A

begin

variant is

by

of the

a finite

transactional

sequence

the transactional

memory

of

local

and

memory,

of [16]. shared

A

a

- reads

a local

the

value

memory

of

machine

a shared

location

The

data

set of

a transaction

accessed

by the

structions.

Any

cessfully, other

– stores location.

in

of the

which

case For

of a doubly

returns A

dequeued

k-word

proposed

The

k-word

two

cessful

values

that

case,

memory turns A

item

and

New

the

1 may

transaction

as in the

stores

a C&S-Success

are

data set’s

transaction

Figure set,

the

checks

New value,

its

size.

equivalent

size

A suc-

a finite

implementation the

old. into

otherwise

In

try

the

the

set,

set

will

and

focus

on

supports

of the

in

the

in

Figure

known

literature.

Dequeue

that

could

2 is an procedure

in

(but

not

in

one

whole

cannot

and

with

system.

An

be swapped

implemented fail

forever,

if

out

transac-

order

However,

if pro(as when

used

only

can be made

since the

the

is non-blocking

in different

list).

if

implies

successfully

if it

implementation during

terminates

the

repeatedly

non-blocking,

swapped

same

terminate

hardware

linked

which

successfully

non-blocking

the

tolerant,

locations

their

is

will

theory

process

by a process

a process

The

two

if any

It

necessarily

times.

a doubly

Our

(1)

data

that

terminates

of attempts

transactions,

3

be

data

most

transaction

is swap

to write

updating

never

while

transaction)

assumption

of [16]

will

the

we

transaction

transaction

(not

number

many

tolerant

includes

of attempts.

process

after

static

paper

in

thus

a deterministic

the

operations

of some

different

is repeatedly

whether

to

values

2 is

number

STM

tions

and

is wait-free

the

execution

some

cesses

(3)

memory

that

can

be stored

This

implementation

a possibly

under

transaction

and

as parameters

inputs

transaction,

executes

infinitely

successfully

value.

data

the

be performed

or an Empty

of the

to

~from

terminates

memory

transaction returns

a value

as parameters

and

atomically

gets

a transactional a class

of

a single

execution

for

swap-

process

which

of a transaction

successfully).

the it

re-

Implementation

of

Static-STM

C&S-Failure. software which

changes action

the

in

visible

as in Figure

insuc-

form

1 is not.

a finite

that

locations

or complete

dequeuing

Compare&Swap

stored

are

transaction

gets

Old

k-word

the

ject

which

vectors

of shared fail,

trans-

order.

advance,

should

Compare&Swap

STM

after

the

which

synchronization

of a static

in Figure

register

Write-transactional

either

changes

Compare&Swap

a transaction and

its

list

the

of a local

set

and

may

example,

linked If

the

is the

among

transaction

transaction. of

transactions,

and

repeated

transaction

as a transaction. it

contents

Read-transactional

processes.

head

the

on

values

in

which

based

new

output

An

a shared

following

sequen-

real-time

a special

of the

which

repeatedly into

the

execute

order

their

is known

inputs

implementations

register.

Write.transactional

with is

set

the

the

example

into

satisfy

to

sequential

of as a procedure

static.

transaction

instructions:

Read-transactional

appear

The

data

set (2)

returns

software

should

interleaving.

transaction

the

thought

performance

Memory

presenting

without

is consistent

static

which

proof.

function

We

transactions

i.e.,

actions

imple-

correctness

empirical

memory [13]:

Serializability:

runs

data

Transactional

Transaction

following

3 we describe

a sketch

tially,

evaluation.

2

2: A Static

bus

of flex-

of highly

resiliency

In Section

present

C&S-Success

EndTransaction

A tomicity: software

5 we

(DataSet[i],New[i])

=

k_word_C&S

end

for

a novel

provide

Section

Results

offers

STM,

Old[i]

do

Write_transactional

non-resilient

[23].

performance

introduces

#

C&S-Failure

ExitTransaction

Next )

in flavor.

coordination-operation

=

Transaction

standard

as queue-locks were

rent

Static

(DataSet[i])

ReturnedValue

Figure Figure

New[])

BeginTransaction

transactional behaves

to its

addresses

is a thread

of primitive

operations

memory

like

a memory by means

of control to

that

memory.

(STM), that

is a shared supports

of transactions. applies Any

a finite

ob-

multiple A

implement

Memory[M],

trans-

of


a non-blocking a vector

transactional

sequence

implementation

We

memory,

termines

for

it.

process

Each

which

any

cell

static contains

TM the

Ownerships[M], in Memory[M],

keeps

in

the

shared

of

size

data a vector

which memory

M

stored

using in the

which

transaction a record

deowns with

StartTransaction(

input,DataSet

Initialize(Tranj

,input

Tranj

=

+ Stable

)

,DataSet

+

-.

Stable

=

return

tran,version) tran+stat.s)

version,True)

False

Tranj→Version++ if Tranj→Status =

ran,version,IsInitiator)

AcquireOwnerships( f;~ayi~~fl/l;h~j(

Transaction(Tranj,Tranj Tranj

Transaction(t

)

True

if

(version

#

tran-+version)

SC(tran+status, Success

(Success,

then

CalcOutput(Tranj

-+

OldValues,input))

if status

else

=

=

Failure

Success

NewValues

return

O))

LL(tran+status) then

AgreeOldValues( return

then

(Success,

(status,failadd)

tran,version) =

CalcNewValues(tran+

UpdateMemory(tran,

version)

ReleaseOwnerships(

tran,version)

ReleaseOwnerships(

tran,version)

OldValues,tran+ NewValues)

else

Figure

if IsInitiator

3: StartTransaction

then

failtran=

Ownerships[failadd]

if failtran

=

Nobody

then

return

the

following

set.

Addo

in

fields:

increasing

order.

Oldvatues~ Null

the

successful

tion

are

of the its

– the

vector

involved

in

record

termines

contains

this

vector

to

every process

and

time

which

the

PJ,

the

former

may

a Figure

transac-

The

eventually

help

O, which

transaction.

process

terminates

determines

AcquireOwnerships(tran,

transize

de-

This

for

address

P]

the

the

stable,

that

and

the

if

the

by

3. Transaction the

processors

checks

record’s

will

After the

called

by the

The

parameter

output

of the

Mmtiator,

from

read

never

version

by the

or by

the

parameter

during

ownership

the

If the

sets the

new

it

old

the

a helping

that

the

owns record 1 The Validate

the

status 0).

values

into

the

of

operation

this

field

In

ReleaseO

case

for

record,

the

status

the

failure.

The

it

already

owns

it helps

Helping

the

i =

1 to

if LL(

size

(Null

, O) ) then

while

ion],

tran)

then

loop (Failure,i)

) then

version) do

tran+Add~] O wnerships[locat

if tran+.

version

ion])

AgreeOldValues( size

= i =

=

version

#

SC(Ownerships[location],

tran

then

then

return

Nobody)

tran,version)

tran→size 1 to

field

location=

tries

if LL(tran+

size

do

tran+.Add~] OldValues[locat

if tran+version

#

ion])

version

#

then

Null

then

return

SC(tran+OldValues[location],Memory[location]) UpdateMemory(tran, si5e

the

for

process

field

and,

the

in

transaction

size

do

LL(Memory[location]) AllWritten #

if oldvalue#

then

if

case

(not

then

newvalues~]

LL(tran+

if version

return

tran+version

SC(Memory[location],

re-

newvalues)

tran+Add~]

if version

newvalues~])

AllWritten)) #

return

then

tran+version

then then

return

SC(tran+AllWritten,True)

which

only

1 to

if tran+

contains first

i =

version,

tran+size

oldvalue=

and

process

=

location=

calculates

memory

is performed

loop

tran→size

location=

(Fail-

yet,

the

to the

while

returning

a value

of success

exit

SC( O wnerships[locat

wnerships(tran, =

calling

be set to

have

Otherwise,

process,

upon

will

doesn’t

them

that

location.

a stable

so then

writes

caused

a helping

do

return Add~]])

return

rou-

first by

then return

then

if SC(tran+status,

of the

version

locations

transaction’s

to be stored,

failing

use

the (Success,

ownerships

is in

to

the

then

SC(tran+status, if

and

the

Transaction, set’s

fails field

that

it is not

data

Nobody

process.

when

Null

version

else

and

number

used

since

call.

the

status

ownerships.

location

leases

to

values

releases the

it

AcquireOwnerships,

zme, fadadd).

the

on If

process

tran

Transaction

instance

is not

=

suc-

executed,

#

=

the

input

whether

process

initiating

AcquireOwnerships.

writes

#

(Ownerships[tran+

exit

as parameters

transaction

contains

This

change

acquire

process

LL

if owner if

for

from

tran+.add~]

if owner

a con-

has

the

4), gets

indicating

initiating

executed¹ is called

to

=

as

executing

transaction

(Figure

address

was

will

=

first

record

size

Transaction

value

tine

location owner

vector.

a boolean

record

do

do

if LL(tran+status)

of

a transaction

declares

transaction.

process

procedure

tran,

of

then

helping

the

if so calculates

OldValues The

record

any of

the

execution

ion routine of Figure

description

ceeded,

the

process’s

ensuring

transaction the

initiates

Transact

initializes

sistent

size

true

if tran-+version

process

calling

version)

tran+size

1 to

while

record. A

=

i =

field

a transac-

the

4: Transaction

other owner

of the

Tran3

of

the

initially

failtran+version

values

of the input.

=

if failtran+stable

to

case

between

an integer,

failversion

Transaction(failtran,failversion,False)

initialized In

the

data

addresses

transaction.

are

output

synchronize

number

tion

every

The

processes

instance

is incremented For

vector

set

the

cells

this

locations.

the

of

transaction.

Version–

the

input

which

size of the

data

every

order

and

the the

of

from

used

transactions:

its

Input

transaction

is calculated

contains

contains

beginning

in the

fields

which

which

a consensus

at

stored

Size

– a vector

else

if the

state.

unbounded is available

field [18,

can

be

avoided

if

an

additional

Figure

19].


5: Ownerships

and

Memory

access
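The roles of the Ownerships and Memory vectors can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the paper's implementation: plain reads and writes replace Load_Linked/Store_Conditional, helping and version numbers are omitted, the failure status records an index into the data set, and the names `Rec`, `ToySTM`, `acquire`, `release`, `transaction` and `k_word_cas` are hypothetical. It keeps the parts that a sequential model can show: ownerships are acquired in increasing address order, memory is updated only after every location is owned, and ownerships are released afterwards.

```python
class Rec:
    """Transaction record; field names are assumptions of this sketch."""

    def __init__(self, addrs, calc_new):
        self.addrs = sorted(addrs)   # data set, kept in increasing order
        self.calc_new = calc_new     # maps list of old values to new values
        self.status = None           # ("Success", 0) or ("Failure", index)
        self.old = None              # old values read after acquisition


class ToySTM:
    def __init__(self, size):
        self.memory = [0] * size
        self.owner = [None] * size   # None plays the role of Nobody

    def acquire(self, rec):
        # Claim locations in increasing address order; stop at the first
        # location already owned by a different transaction.
        for i, addr in enumerate(rec.addrs):
            if self.owner[addr] is not None and self.owner[addr] is not rec:
                rec.status = ("Failure", i)
                return
            self.owner[addr] = rec
        rec.status = ("Success", 0)

    def release(self, rec):
        # Free only the locations this record actually owns.
        for addr in rec.addrs:
            if self.owner[addr] is rec:
                self.owner[addr] = None

    def transaction(self, rec):
        self.acquire(rec)
        ok = rec.status[0] == "Success"
        if ok:
            rec.old = [self.memory[a] for a in rec.addrs]
            for addr, value in zip(rec.addrs, rec.calc_new(rec.old)):
                self.memory[addr] = value
        self.release(rec)
        return ok


def k_word_cas(stm, addrs, old, new):
    """k-word compare&swap expressed as one static transaction."""
    pairs = sorted(zip(addrs, old, new))   # align with rec.addrs order
    want = [o for _, o, _ in pairs]
    repl = [n for _, _, n in pairs]
    rec = Rec(addrs, lambda cur: repl if cur == want else cur)
    return stm.transaction(rec) and rec.old == want
```

For example, `k_word_cas(ToySTM(4), [2, 0], [0, 0], [7, 5])` swaps both words because both hold their expected old value 0. A second call that expects stale values leaves memory unchanged.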

input)

Since by

AcquireOwnerships

the

that

initiator

(1)

same

all

processes

locations

from

the

moment

tion.

The

but

which

reads

ership

on

a free

it,

undecided. (Null,0) read

in the

the

action,

All

have

the

before

past to

not

Any

only

prevents

acquiring

by

the To

of T is the ownership

Claim

4.2

the

process

Proof:

I ) which

ecuting

trans-

location

UpdateMemory in

order

to

after

the

so every

process

updating

the

wrote location

A failing

Formally,

following

actional

[21],

memory

static

for

transactions

k types TranJ

as

(Sketch

than

Return, ..n.

In

set to Failure.

owner-

failing

location

the

acquired

and

cess should

sets

memory

failing

of a static

that

supports

described

as an

(DataSet)

and

(FinalStatus,

k types

trans-

where

implementation,

transaction

record

of the

version

started

the

any

tran,

field).

the

initiator Transaction

of

to

be

helping

of T

T.

All

with the

an instance

Therefore,

execution

tine

and

transaction

(which

the

owns

k and

T is related

to

processes

parameters

helping

are the

The

implementation

record

which

of T.

executing

which tran)

execute

The

as rouare

initiator

processes

a

content

(tran,version,False),

processes

processes

(the process

the

4.1

is

atomic

and

of T.

owned

P has

of this

and

lemma

instruction

should

All

the

4.3

same

data

Any

set vector

executing

set will

processes

serializ-

not

process be

able

T

of to

was

stored

which

update

any

T read

by T“s

read

.

of

the

the

the

saw

the

if P has

failing

pro-

belongs itself.

before

failing

transac-

the

to T. But,

in

failing

process

pro-

saw

the

executed

the

Store-Conditions

therefore

the

Store-Conditions

loI I



has failed.

the

initiator.

different

data

shared

data

) Assume

nates

successfully.

failures

is finite.

in the

computation,

for

the

same

on

tries

such

By

dresses

higher

have

Claim

4.2

are

the an

those

there

of

failing

the

of

has failed

there

are infinitely on

A

many but

failed

to the

con-

initiator

which

have

one

and

ownership

A – a contradiction

than

lo-

every

is at least often,

transactions

the

is completed, number

Since

that

if

Since

transaction

transaction

acquired

pro-

location

only

often.

infinitely

the

on, Ac-

same

happen

location.

point in the

are several

of the

that

fail

of transaction

“stuck”

infinite

that termi-

some

there

may

it follows

which

number

case

implies

to help

retrying,

transaction

infinitely

be

processes

highest

failed.

this

contradiction

no

ownership

when

must in turn

the

transaction

the

only

transaction before

In

of

if from

processes

released

there

which

the only

This

and

This

A,

the

transaction.

that

way which

happens

acquire

is released

sider

that

to

is acquired

follows

Assume

all

try

by in

routine.

which

is non-blocking.

schedule

This

quireOwnerships cesses

implementation

(Sketch

is an infinite

it

is based on the

of a transaction

which

since P has

the

location

free

in was

Now,

then

ex-

status

process

location

failing

an

a higher

invariant the

that

location

the

first

failing

failing

ownership,

The

there

the

executing

the

be

on

transaction.

location

and

before

on the

him 1.

the

the

acquired

instruction

location

The proof

that

P

before

confirmed

on a higher

P saw the

occupied

transactions.

of proofi invariants:

to

location.

Let

By

by another

seen

since

case,

cation

able.

Sketch following

failed

owns its fail-

ownership

ownership

before

that

contrary.

location. that

P has

Therefore

location

Lemma

never

its failing

an

Therefore

have

Lemma

J C 1...

the

process

exe-

failing

with

actions:

number

we define

The

k different

automaton

of output

Output)

status.

T will

the

is undefined

ownership

Proof:

our

defined

be

define

T, is the

failing

that

acquired

P acquired

status

actions:

Request,

Zel.

n processes

the

location

which the

4.1,

cation

specification

that

) Assume

process

tion’s

Outline

the

can

of input

TTanJ

the

Proof

to T’s

transaction,

or a higher

prevent

ownerships.

Correctness

Failure

we first

on it.

cess saw it occupied

4

property transaction

is still

a different

memory

do

after

location acquire

Lemma to

which

non-blocking of a failing

own-

status

any

the

process

process

ing location

process

Store.-Conditiona

synchronize

updating

the

(with

(2)

to prove failing

transac-

proving

before

as owned

to be True,

releasing

cuting

the

the

becomes

that

transaction

values

released.

field

on

Failure.

new

from

the

This

processes

been

Written

the

status

5, the

process

ships

that

for

for

have

by writing

that

between

transaction

allowed property.

will

field.

in

writing

a slow

location

status

version

In order

ensure

instructions)

of the are

non-blocking

is done

location

Figure

status

either

ownership the

is essential

confirm

to set the

When in

to This

acquire

by checking

the

the

be called we must

Store.Conditional

property

also

5 may

processes

to

ownerships

second

atomicity

the

that

no additional

try

is done

and

fixed,

helping

will

(this

Load.Linked

the

of Figure

or by the

fact

have on

ad-

A is

that



highest.

structures. To 2.

All

the

executing

processes

acquire

ownership

All

the

ownerships

the

version

field

after

of

of a transaction

the

the

owned

by

T’s

record

status T

will

T will

be released

is incremented

gorithm

never

of T has been

set.

T’s

will

All

the

T will

executing update

the

processes memory

of a successful before

T’s

transaction

AllWritten

field

is

set to True.


only

helping

increase

of the

or

decreases

“redundant

the

helping

and the

the

avoid

al-

as much

when

a failing

process.

Such

help-

consequently,

will

cause

In

interval it

any

implementa-

ownerships

helped.

helps”

occur,

STM

occurs

non-faulty

release

if not

must

In

contention to

released

no failures

paradigm

helping.”

another

process

have

when

redundant

“helps”

cess increases function

on the

above,

helped

would

overheads

“redundant

given

the 3.

based

transaction ing

initiator.

major

as possible tion

before by

avoid

our

later

then

algorithm,

between discovered.

it

a prohelps

as a

5

An

Empirical

tion 5.1

Evaluation

of

Transla-

no

Doubly

Methodology

We

compared

methods ing

Colbrook

and

without

and

[8].

Our

2048

contention

of switching

software

at

Dellarocas,

architecture

was

MIT

[1].

of 6 bytes

and

4 cycles

or wiring

in

in the

us-

Brewer,

distributed-memory

lines

cost

other

architectures

by

network

development

with

and

network

developed

cache-coherent

under

a cache

cost

Weihl

of STM

bus

simulator

Alewife

currently

performance

64 processor

Proteus

of the

had

the

on

the

both

the

array

contain

item

in

version

of

used

a

slightly

respectively.

an item

processor

Compare&Swap stamp.

2

version

operation

where

On

pare&Swap

may

existing

ples

of enqueue/dequeue

1

tial

be

lock-free

We ous

used

methods

This

when

the

the

serve

as

64

bits

the

by

using

the

Alpha

a

shared

64

size

of

the

for

evaluating structures.

data

structure

We

bits

Each

of n processes

10000/n

times.

In this

change

the whole

object

increments

variThe and

a shared

benchmark state,

updates

the

counter

in

and have no built

in par-

A resource allocation scenario [10]: a few processes share a set of resources, and from time to time a process tries to atomically acquire a subset of

size s of those resources. This is the typical of a well designed distributed data structure. of space we show only cesses atomically locations
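The resource allocation scenario described above can be sketched as a small driver. This is a minimal illustration only, not the authors' benchmark code: the function name `run_resource_allocation` and its parameters are hypothetical, the defaults (a vector of length 60, 5000/n operations per process, s = 2) are read from the surrounding text fragments, and an ordinary lock stands in for the atomic multi-word update that the compared methods would provide.

```python
import random
import threading


def run_resource_allocation(n_procs, n_resources=60, s=2, total_ops=5000, seed=0):
    """Each of n_procs threads atomically increments s randomly chosen
    locations of a shared vector, total_ops // n_procs times."""
    resources = [0] * n_resources
    # Assumption: a plain mutex is used here as a stand-in for the
    # atomic multi-location update; it is not a non-blocking method.
    lock = threading.Lock()

    def worker(worker_id):
        # Per-thread generator so runs are reproducible for a given seed.
        rng = random.Random(seed + worker_id)
        for _ in range(total_ops // n_procs):
            # Choose s distinct locations uniformly at random.
            chosen = rng.sample(range(n_resources), s)
            with lock:
                for i in chosen:
                    resources[i] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return resources


if __name__ == "__main__":
    counts = run_resource_allocation(n_procs=4)
    print(sum(counts))
```

Because every update touches exactly s locations, the sum of the vector after the run equals the number of completed operations times s, which gives a simple atomicity check for whatever update mechanism replaces the lock.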

increment

chosen

length

60.

highly

the benchmark

have

5000/n

uniformly

The

at random

benchmark

concurrent

times

queue

captures

and counter

the

transaction

n.

We used

t ation

[11].

a variant In

consequently dequeues heap

this

of

used

the

lier

and

with

the

is probably

greatest the

this

built

directly

cost

a memory

we the

believe the theoretical

tations

of

[17]

do not

and

the 3 The

spurious

value

empty

and

most

trying

operation

Load-Linked/Store-Conditional have

a random

s =

2

a vector

of

the

behavior

of

Proteus), while

access

to

non-blocking failures

the

it

efficient 15,

shared

the

18]

non-blocking

raeli

and

four 1.

(which wont.

Alpha between

[12]

2.

there

the

will

be achieved

only

if the

size

of

ia rela-

size.

we

to

queue-lock in

the

STM

processes

data

do

set before

value

which

compare

says

STM

methods [23]

include

solution

(the

method.

backoff

manner).

Method

Compare&Swap

cooperative

and

based

All

the ear-

exclusive

Herlihy’s

to

described

based

a mutually

k-word

the

Is-

imple-

non-blocking

[3] to reduce

contention.

leads us to conclude

differentiating

among

parallelism: do

not

process

at The

the

that

there

performance

joint

parts

The

price

the

a time

is

allow

of

are

of the

data

update

it

to the

is a least

the

private

pointer).

when

the

only

the

data

the to

coopera-

access

Herlihy’s

dis-

object

is such

the

process

number

copy,

a failing

and the

that

the

reading nature

almost

updates

methods, locations

of

(reading

Fortunately,

caching

lock-free

accesses

of the

to

In both

In

size

protocols

performed are local.

and processes

memory

the

coherence

cached

to

update: of

copying

Herlihy’s and

structure.

number

in is at least

writing

and

parallelism

allowed

concurrent

a failing

the

cache

locking

potential

of the

and

failure

Both

exploit

software-transactional

methods

copy

However,

in

a boolean

MCS

methods

Potential

cesses

will

general

is that

stored on

transaction

the

methods:

and

could

of

benchmarks

to be presented

factors

the

its the

private

price

accessed

of

all ac-

of a

during

execution.

PowerPC Load_Linked

3.

operations. property

the

Results

object

world then implemenor

object

translation

of the

update

is

value

use exponential

method,

theoretical

the real existing

interfering

oMl_J or not.

Rappoport’s

tive

This

Compare&Swap

to since

memory

The

than

Store_Conditional

as on

times. since

a failing

is closer

and

size is n.

benchmark

[6,

a heap

5000/n

without

simplification

is accessed

method

n processes

in

maximal

in

a failing

Load-Linked/Store-Conditional allow

its

since

64-bit Compare&Swap Load_Linked/Store_Conditional

Store-Conditional

from

less

proposed

into

access

is

value

queue

software

a blocking

The data

from

of the

parallelism

Compare&Swap

only

to

above

structure

The

three

has n pro-

The

started,

nonblocking

5.2

k-word

on the

We

heap implemen-

each

couof ini-

enqueues/dequeues

as specialization

above.

two

implementations

of a sequential

up-

of the

a queue

to the

is equal

queue on a heap of size

benchmark

enqueues

is initially

2 Naturally

priority

index

limited

empty,

by

dequeues

5000/n

compared

2)

structure.

A shared

the

on

supports

of the

a high

item

executes

operations

which element

and

enqueue/dequeue

value

one

Queue

process

of

Every

array,

index

head to contain

is not

list.

a new

item’s

as

cells

next

in each

if the

as in [24, 25]. Priority

the and

locations

to agree

methods

behavior For lack

which

of the

in

enqueues

benchmark

the

given

mentation

Allocation

tail

two

of processes,

Figure

not

data

are short,

allelism. Resource

of a queue

number

implemented

(given

Ber-

of parallelism,

Counting

data

Com -

or using

data

small

of the

first

previous

Each

queue

For

updated

tively

a time

list.

tail/head

other.

the

64-bit

scheme

benchmarks

implementing in

bits

a

the

each

we

[7].

synthetic

for

vary

amount

size n.

update

support

supports

as on the

methodology

four

methods

32

implemented

Load-Linked/Store_Conditional shad’s

not

in the

the

new

the

The

Instead

machines

by updating

size

The

of cells

the

the

architectures.

was

and

process

contain

item

does

that

of

next

architecture

array.

is a couple

Each

tadto

the

implementation

an

head

index

access

instructions.

modified

list

the

in

the

and

n.

An

list

a memory

Alewife

Proteus

Load_Linked/Store_Conditional

the

dating

with

Queue

represent

that

concurrency

linked

machine Each

Linked

since

current

for increases

a doubly

cycle/packet. The

potential

structure

Methods

number

of

is finite.

The

amount

of helping

ists

only

the

erative


in

methods.

by other

processes:

software-transactional In

the

cooperative

Helping and

the

implementation,

excoop-


k-word ations

only

that by all

the

and

so on...

the

locations.

mance

factor,

terminate

the

The

results

6.

and

the

vertical

there

architecture, higher

the

number

concurrently,

sors,

the

the

number

of

priority

queue and

accessed

most

linearly

word

updatea k-word on

the

given

in Fig-

of processors achieved.

to

the

give

This

since size

.

method

the

of the

On the

bus

significantly

the

update

with In

the queue

the

STM

Compare&Swap

5.3


local

work

can

performance

7,

the

STM

number

declines,

as the

be performed

of

a certain

of

causing

that

Every

theoretical implemented

parison them

increases

smaller

too.

im-

and

than better

the

size

than

not

the

methods)

(in

STM).

of

the

allow

chose

doubly

linked

it

is

limited:

the

paid usually

method,

priority

object.

transactions

most

for

low

two

in

it

should

ran

mark

are

results

given

inherent

As

in

processes

method a

the

failed number

granularity implies

may

of of

that

of

performs

update

the up

to

the

the

price

Table

~tkc~~~~thod.

in all

remote Israeli

of

disjoint and

10.

In

since of a 2-word

the

this

operations times

of the

advantage

should

the

the high

bench-

throughqueue

number and

that

to Israeli

priority

in Israeli

number

give

highest

sequential

is the

priority

queue

provides

and

regular

priority

algorithm

STM for

spite

the

the

highlights

1, where

entries

in all benchmarks for

the

counter

of are

of failing Rappoport

of successful

STM and

the

other

the

k-word

pure

bench-

throughput

ratio

outperforms

the

coop-

outperforms

Herlihy’s

benchmark.

protwo-

4 In of


tion

fact,

since

using it

avoids

3-word freezing

Compare&Swap [18]

nodes

simplifies

the

a

pro-

Rappoport

different instead

aa for

As can be seen,

method except

2.5

helping.

execution

slightly

concurrent

of the

reason

We

a concurrent

4.

in Figure

counter

Com-

.

summarize in

is

all

which

recursive the

operation

of the

method, the

k-word

operation.

on

to

ation

Compare&Swap

erative

grows

the

We marks

advantage

benchmark

structure

Rappoport

put.

based

another

operation same

(in

policy

for

during

Compare&Swap

the

The

is

use

perfor-

backoff

supported

helps an

the

implementation

algorithm

it

implement

without

cooperative

a process

give

Our

Store-Conditional

is

since

one should

we compare

a specific

a software

[18],

ways com-

non-redundant-helping

Rappoport’s

whenever

method:

counter

such

and

in many

to get a fair

methods

the

for

Compare&Swap

k-word

There the

the

compare

STM

queue

method.

In order

methods

without

also

needs Israeli

benchmarks,

method

results.

than

Herlihy’s

twice the

queue

object at

queue.

penalty size:

the

though

methods

be improved

Therefore,

non-blocking

with

Therefore,

and

the

non-blocking

can

form.

and

Compare&Swap

accessing

levels,

Test-and-Test-and-Set.

the non-blocking

We

explicitly

queue.

number

Herlihy’s

concurrency

in practice.

purest

of all the

the

We

the

increases, Still,

all the

of

method

between

in their

uses a 3-word

a grow-

does

of processors

in than

comparison

when

proces-

conflicts,

structure

number

A

cess,

Figure

constant higher

only

k-word

methods.

Compare&Swap

is

in

because

1

20

is still

pare&Swap

level.

though

8

10

remains

work

the

perfor-

methods,

benchmark

performs

9 contains

concurrently

cesses.

are

parallelism

parallelism

accessed

concurrency

benchmark,

poorly

need

fails

number

based

caching

is a data

STM

concurrency

a failure local

the

locations the

Figure more

the

for

the

as the

of locations

Therefore, in

Com-

as failing

is equivalent

beyond

k-word

k-word

degrades.

concurrency, number

thus

them

unsuccessful

throughput

increases,

for

of

throughput

the

the

bus,

potential


Benchmark

mance

caching

allocation

and On

a...

but

helping

that

not

Herlihy’s

than

of processors

proves.

A

and

oper-

is a crucial

benchmark

memory

resource

0

are

ones,

a transaction

shows

is no potential

throughput

In

---”.

e

0

helped.

shows

to the

locking

this

of the

when

axis

of updated and

operations

and

counting

axis

is crucial

amount object

ing

the

an

transactions,

it is not

horizontal

benchmark

by

most

, and

for

The

L

concurrently,

are in turn

method,

only

as failing

failing

locations

STM

in STM

location,

Figure 6: Counting Benchmark

Compare&Swap

that

Moreover,

Compare&Swap

ure

k-word same

is helped

same

first

the

operations In

pare&Swap

, including

by

access

also


Compare&Swap not


Figure

helped

method

[Plot legend: STM; Cooperative method; Herlihy’s method; QUEUE spin lock. Panels: Bus, Alewife.]

implementa-

it

Figure 7: Resource Allocation Benchmark

Figure 8: Priority Queue Benchmark

Acknowledgments

10

20

Herlihy

Scale ings

The

MIT

Alewife

Distributed-Memory of

Workshop

processors, tended

Kluwer

version

publication,

Scalable

Academic

of

and

Machine:

Multiprocessor. on

this

Shared has

In

ProceedMulti-

1991.

been

as MIT/LCS

E. W.

Blocking

Synchronization

cessors.

In

pression. Parallel

Primitives

Proceeding

for

of

Algorithms

and

An

submitted Memo

the

Asynchronous Jth

ACM

works.

for

[6] G.

T.E.

M.P.

pp.

Anderson. for

The shared

performance memory

List pages

of

Performance

of

Issues

11th

ACM

in

Non-

MultiproSymposium

Computation,

spin

multiprocessors.

Pages

on

125-134

and ACM,

N.

Shavit.

Vol.

Counting

41, No.

Net-

5 (September

Method

Structures on

for In

Parallel

Implementing

Proceedings

Lock-Free of the

Algorithms

and

5th

ACM

Architectures

1993.

Comon

199-208,

lock

Herlihy, of the

A

Data

[7] B.N.

Bershad.

current Carnegie

ternatives

Systems,

1020-1048.

Barnes

Shared

1992. [3]

Distributed

on Shared-Memory

Proceedings

Journal

1994),

ex-

TM-454,

Symposium

Architectures,

and

1992.

Symposium

R. J. Anderson.

Felten

of Distributed

[5] J. Aspnes,

1991.

[2]

60

A Large-

Memory

Publishers,

paper

appears

50

1990.

[4] J. Alemany,

August et al.

on Parallel

January

Principles

A. Agarwal

40

for their

comments.

References

[1]

1

30

Transaction

1(1):6-16,

and Maurice


Benchmark

IEEE

Greg Barnes


helpful


We wish to thank


many


alIn


[6] E.A. E.

consideration

Technical

Mellon

University.

Brewer Weihl.

Practical

objects.

C.N. Proteus:

A.

lock-free

CMU-CS-91-

September

Dellarocas, A

for

Report,

183,

1991.

Colbrook,

High-Performance

con-

and

W.

Parallel-


Linked

Queue


10:

Simulator.

+--.-.._.-+ I 1

1

30 40 Processors

Non-blocking

50

C.N.

Dellarocas.

Proteus.

*------

o

ations

of Israeli

Documen-


& Rappoport

[15] M.

September

User

----

,0 ~

60

implement

MIT/LCS/TR-516.

+..y

100

1989. Brewer

30 40 Processors

250 ‘.

100 -

E.A.

20

300 *

150 -

[9]

Figure 9: Doubly Linked Queue Benchmark

Architecture

--------

‘a Priority

Herlihy.

A

Queue

methodology

concurrent

data

gramming

Languages

November

1993.

for

implementing

ACM

objects. and

highly

Transactions

on

15(9):

Systems,

Pro-

745–770,

t ation.

[16] M. [10]

K.

Chandy

Problem. guages [11]

T.H.

and InA

and

to

CM

The

Drinking

Transaction

on

6(4):632-646,

Systems,

Cormen,

duction

J. Misra.

C.E.

Leiserson

algorithms.

MIT

Programming October

and

R.L.

In

Lan-

20th

pages

1984.

Rivest.

Herlihy

and

Architectural

Philosopher

[17] IBM.

Intro-

Press.

Annual 289-300,

Power

[18] A. Israeli

DEC.

[13]

M.

Alpha

Herlihy

ness

system

and

condition

action pages

J.M. for

reference

manual.

Wing.

Linearizability:

concurrent

on Programming 463-492,

July

objects.

Languages

and

M.

Herlihy.

action pages

on Programming

124-149,

January

Languages

and

on

Notes

Verlag,

1-17.

pages

Memory:

Data

Structures.

Computer

Architecture,

1993.

PC. Reference

199.5’. Lecture

In

ACM

Trans-

Systems,

12(3),

[19]

A.

Israeli

and

In ACM

Trans-

Systems,

13(1),

L.

Implementations the

Synchronization.

Symposium May

Transactional

Lock-Free

manual.

in

Efficient Priority

Wait

Free Imple-

Queue.

Computer

Science

In

725,

WDAG

Springer

A correct-

1990.

Wait-Free

Moss. for

of a Concurrent

13th

[20]

A.

LaMarca.

Synchronization

1991.


Rappoport. of

ACM

Computing [14]

B.

and L. Rappoport.

mentation [12]

J.E

Support

Strong

Symposium

pages A

Disjoint-Access-Parallel Shared on

Memory

Principles

Proc.

of

of Distributed

151-160. Performance Protocols.

Evaluation Proc.

of the 13th

of Lock-Free ACM

Sym-

Throughput

ratio of

STMf

Counter Doubly

linked

queue

queue

Table

posiwn

on

Principles

0.34 0.30 6.07 2.44 22.5 24.14 0.42 0.41

Bus Alewife Bus Alewife Bus Alewife BUS Alewife

Resource Allocation Priority

10 processors Herlihy’s Cooperative method method

other

1: Pure

implementation

of Distributed

throughput

Computing,

pages

130-140. [21]

N. Lynch

and M. Tuttle.

for Distributed Symposium Pages

[22]

on

kernel.

versit y. Mars [23]

J.M.

[24]

L. Rudolph, chines.

[25]

of and

Support Systems,

Allocation

the

3rd

Interna-

for

Program-

April

1991.

A Simple

Load

in Parallel

Ma-

Symposium

on

ACM

pages

Architectures,

and A. Zemach. the

Annual

Architectures

D. Shasha Making

1992

Principle

Touitou.

posal.

and

Lock

237-245,

Non-blocking.

Lock-Free University

Trees. In ProceedParallel

Locking

Concurrent In

Systems

without

Data

Struc-

Proceedings pages

Programming: April

Algorithms

1994.

S. Prakash. Based

of Database

Tel Aviv

on

June

(SPAA),

J. Turek

Algorithms

Diffracting

Symposium

blocking:

D.

Synchronization

of the 4th

and E. Upfal.

for Task

Algorithms

of

ture

[27]

Uni-

1991.

ings

[26]

OS

Columbia

Scott

Operating

M. Slivkin, Scheme

N. Shavit and

and

In Proceedings

Parallel

July

and M.L. In Proceedings on Architecture

Languages

Balancing

multiprocessor

CUCS-005-91.

1991.

Conference

ming

A lock-free

Report

Contention.

tional

ACM

Computation,

1987.

Mellor-Crummey

without

Proofs

of 6th

of Distributed

and C. Pu.

Technical

Correctness

In Proceedings

Principles

August

137-151

H. Massalin

Hierarchical

Algorithm.

of

the

212-222.

A Thesis

Pro-

1993.


0.74 0.45 58.9 12.9 85.61 59.8 2.8 1.1

1.98 1.92 1.44 1.75 1.09 1.12 1.26 1.27

ratio:

60 processors Herlihy’s Cooperative method method

STM

/ other

8.44 7.6 3.36 7.28 1.69 2.35 2.16 2.24

methods
