Mutual Exclusion in Shared Memory
Slides provided by
Prof. Jennifer Welch
1
Shared Memory Model
• Processors communicate via a set of
shared variables, instead of passing
messages.
• Each shared variable has a type,
defining a set of operations that can be
performed atomically.
2
Shared Memory Model Example
p0 p1 p2
read write read write
X Y
3
Shared Memory Model
• Changes to the model from the
message-passing case:
– no inbuf and outbuf state components
– configuration includes a value for each
shared variable
– only event type is a computation step by a
processor
– An execution is admissible if every
processor takes an infinite number of steps
4
Computation Step in Shared
Memory Model
• When processor pi takes a step:
– pi 's state in old configuration specifies
which shared variable is to be accessed
and with which operation
– operation is done: shared variable's value
in the new configuration changes
according to the operation's semantics
– pi 's state in new configuration changes
according to its old state and the result of
the operation
5
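The three parts of a computation step can be sketched in code. This is a minimal illustration only: it assumes a configuration is a pair of dictionaries, and the names `Config`, `step`, `choose` and `transition` are hypothetical, not part of the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    """A configuration: one local state per processor, one value per shared variable."""
    local: dict = field(default_factory=dict)    # processor id -> local state
    shared: dict = field(default_factory=dict)   # variable name -> value

def step(config, pid, choose, transition):
    """One computation step by processor pid.

    choose(state)             -> (variable name, operation), read off the OLD state
    operation(value)          -> (new value, result), applied atomically
    transition(state, result) -> new local state
    """
    var, operation = choose(config.local[pid])
    new_value, result = operation(config.shared[var])
    new_local = dict(config.local)
    new_shared = dict(config.shared)
    new_shared[var] = new_value
    new_local[pid] = transition(config.local[pid], result)
    return Config(new_local, new_shared)

# Example: processor 0 reads X and remembers what it saw.
read = lambda value: (value, value)   # a read leaves the variable unchanged
c0 = Config(local={0: "start"}, shared={"X": 7})
c1 = step(c0, 0, choose=lambda s: ("X", read), transition=lambda s, r: ("saw", r))
print(c1.local[0])   # ('saw', 7)
```

Exactly one shared variable is accessed per call, which matches the "one access per step" convention of the model.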
Observations on SM Model
• Accesses to the shared variables are
modeled as occurring instantaneously
(atomically) during a computation step,
one access per step
• Definition of admissible execution
implies
– asynchronous
– no failures
6
Mutual Exclusion (Mutex) Problem
• Each processor's code is divided into four
sections:
remainder -> entry -> critical -> exit -> remainder (repeating cycle)
– entry: synchronize with others to ensure mutually
exclusive access to the …
– critical: use some resource; when done, enter
the…
– exit: clean up; when done, enter the…
– remainder: not interested in using the resource
7
Mutual Exclusion Algorithms
• A mutual exclusion algorithm specifies
code for entry and exit sections to
ensure:
– mutual exclusion: at most one processor
is in its critical section at any time, and
– some kind of "liveness" or "progress"
condition. There are three commonly
considered ones…
8
Mutex Progress Conditions
• no deadlock: if a processor is in its entry
section at some time, then later some
processor is in its critical section
• no lockout: if a processor is in its entry
section at some time, then later the same
processor is in its critical section
• bounded waiting: no lockout + while a
processor is in its entry section, other
processors enter the critical section no more
than a certain number of times.
• These conditions are increasingly strong.
9
Mutual Exclusion Algorithms
• The code for the entry and exit sections
is allowed to assume that
– no processor stays in its critical section
forever
– shared variables used in the entry and exit
sections are not accessed during the
critical and remainder sections
10
Complexity Measure for Mutex
• An important complexity measure for
shared memory mutex algorithms is
amount of shared space needed.
• Space complexity is affected by:
– how powerful is the type of the shared
variables
– how strong is the progress property to be
satisfied (no deadlock vs. no lockout vs.
bounded waiting)
11
Test-and-Set Shared Variable
• A test-and-set variable V holds one of
two values, 0 or 1, and supports two
(atomic) operations:
– test&set(V):
temp := V
V := 1
return temp
– reset(V):
V := 0
12
Mutex Algorithm Using Test&Set
• code for entry section:
repeat
t := test&set(V)
until (t = 0)
An alternative construction is:
wait until test&set(V) = 0
• code for exit section:
reset(V)
13
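The test&set algorithm can be tried out in a small simulation of the shared memory model. This is a sketch under stated assumptions: each processor is modeled as a Python generator whose `yield` marks the end of one computation step, and a random scheduler picks who steps next. The names `TestAndSet`, `processor` and `run` are hypothetical.

```python
import random

class TestAndSet:
    """Shared variable holding 0 or 1; each method models one atomic operation."""
    def __init__(self):
        self.value = 0

    def test_and_set(self):
        temp, self.value = self.value, 1
        return temp

    def reset(self):
        self.value = 0

def processor(pid, V, in_cs, log, rounds):
    """Generator for one processor; every `yield` ends one computation step."""
    for _ in range(rounds):
        while True:                      # entry section
            t = V.test_and_set()
            yield
            if t == 0:
                break
        in_cs.add(pid)                   # critical section
        assert len(in_cs) == 1, "mutual exclusion violated"
        log.append(pid)
        yield
        in_cs.remove(pid)
        V.reset()                        # exit section
        yield

def run(n=4, rounds=3, seed=0):
    """Let a random scheduler pick which processor takes the next step."""
    rng = random.Random(seed)
    V, in_cs, log = TestAndSet(), set(), []
    procs = {pid: processor(pid, V, in_cs, log, rounds) for pid in range(n)}
    while procs:
        pid = rng.choice(sorted(procs))
        try:
            next(procs[pid])
        except StopIteration:
            del procs[pid]
    return log, V.value

log, final_v = run()
print(len(log), final_v)   # 12 0
```

The assertion inside `processor` would fail if two processors were ever in the critical section at once; the random schedule is only a spot check, not a proof.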
Mutual Exclusion is Ensured
• Suppose not. Consider the first violation,
when some pi enters CS but another pj is
already in CS:
– pj enters CS: sees V = 0, sets V to 1
– no processor leaves CS in between, so V stays 1
– pi enters CS: must see V = 0, which is impossible!
14
No Deadlock
• Claim: V = 0 iff no processor is in CS.
– Proof is by induction on events in
execution, and relies on fact that mutual
exclusion holds.
• Suppose there is a time after which a
processor is in its entry section but no
processor ever enters CS.
– from then on no processor enters CS, and none
stays in CS forever, so eventually no processor is in CS
– so V always equals 0, and the next t&s returns 0
– that processor enters CS, contradiction!
15
What About No Lockout?
• One processor could always grab V
(i.e., win the test&set competition) and
starve the others.
• No Lockout does not hold.
• Thus Bounded Waiting does not hold.
16
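A concrete starving schedule makes the point. The sketch below hard-codes one hypothetical schedule for two processors: p0 always wins the test&set, p1 always tries while V is 1, and p0 resets before trying again. The function name `starving_schedule` is illustrative.

```python
def starving_schedule(cycles):
    """Repeat the step pattern (p0 test&set, p1 test&set, p0 reset) `cycles` times."""
    V = 0

    def test_and_set():
        nonlocal V
        temp, V = V, 1
        return temp

    def reset():
        nonlocal V
        V = 0

    p0_entries = 0
    p1_entries = 0
    for _ in range(cycles):
        if test_and_set() == 0:   # step of p0: sees 0, enters the CS
            p0_entries += 1
        if test_and_set() == 0:   # step of p1: V is already 1, keeps waiting
            p1_entries += 1
        reset()                   # step of p0: exit section
    return p0_entries, p1_entries

print(starving_schedule(1000))   # (1000, 0)
```

Both processors keep taking steps, so extending this pattern forever gives an admissible execution in which p1 never enters its critical section.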
Read-Modify-Write (rmw) Shared
Variable
• Assume: The state of such a variable can
be of any size.
• Variable V supports the (atomic) operation
– rmw(V,f ), where f is any function
temp := V
V := f(V)
return temp
• This variable type is very “strong”: One
shared variable suffices to achieve “no
lockout”
17
Mutex Algorithm Using RMW
• Conceptually, the list of waiting processors is
stored in a circular queue of length n
• Each waiting processor remembers in its
local state its location in the queue (instead of
keeping this info in the shared variable)
• Shared RMW variable V keeps track of active
part of the queue with first and last pointers,
which are indices into the queue (between 0
and n-1)
– so V has two components, first and last
18
Conceptual Data Structure
(Figure: circular queue with first and last pointers.)
The RMW shared object just contains these two "pointers".
19
Mutex Algorithm Using RMW
• Code for entry section:
// increment last (mod n) to enqueue self
position := rmw(V,(V.first,V.last+1))
// wait until first equals this value
repeat
queue := rmw(V,V)
until (queue.first = position.last)
• Code for exit section:
// dequeue self
rmw(V,(V.first+1,V.last))
20
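The queue algorithm can be simulated in the same step-by-step style. This is a sketch, assuming that index arithmetic is done mod n (indices stay between 0 and n-1) and that each generator `yield` ends one computation step; `Queue`, `RMW`, `processor` and `run` are hypothetical names.

```python
import random
from collections import namedtuple

Queue = namedtuple("Queue", "first last")

class RMW:
    """Shared variable with a single atomic read-modify-write operation."""
    def __init__(self, value):
        self.value = value

    def rmw(self, f):
        temp = self.value
        self.value = f(temp)
        return temp

def processor(pid, V, n, rounds, log):
    for _ in range(rounds):
        # entry: enqueue self; the old value of last is this processor's slot
        position = V.rmw(lambda v: Queue(v.first, (v.last + 1) % n))
        yield
        while True:                      # spin until first reaches our slot
            queue = V.rmw(lambda v: v)
            yield
            if queue.first == position.last:
                break
        log.append(("enter", pid))       # critical section
        yield
        log.append(("leave", pid))
        V.rmw(lambda v: Queue((v.first + 1) % n, v.last))   # exit: dequeue self
        yield

def run(n=4, rounds=3, seed=1):
    rng = random.Random(seed)
    V, log = RMW(Queue(0, 0)), []
    procs = {pid: processor(pid, V, n, rounds, log) for pid in range(n)}
    while procs:
        pid = rng.choice(sorted(procs))  # scheduler picks who steps next
        try:
            next(procs[pid])
        except StopIteration:
            del procs[pid]
    return log

log = run()
print(len(log))   # 24
```

In the returned log every "enter" is immediately followed by the matching "leave", which is what mutual exclusion predicts for this run.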
Correctness Sketch
• Mutual Exclusion:
– Only the processor at the head of the
queue (V.first) can enter the CS, and only
one processor is at the head at any time.
• n-Bounded Waiting:
– FIFO order of enqueueing, and fact that no
processor stays in CS forever, give this
result.
21
Space Complexity
• The shared RMW variable V has two
components in its state, first and last.
• Both are integers that take on values
from 0 to n-1, n different values.
• The total number of different states of V
is thus n^2.
• And thus the required size of V in bits is
2⌈log2 n⌉.
22
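The counts can be worked out for a few values of n. The sketch assumes each of the two components is stored separately and that the bit count is rounded up when n is not a power of two; `rmw_queue_space` is a hypothetical helper.

```python
from math import ceil, log2

def rmw_queue_space(n):
    """Return (number of states of V, bits needed) for n processors."""
    states = n * n                 # n choices for first times n choices for last
    bits = 2 * ceil(log2(n))       # each component rounded up to whole bits
    return states, bits

for n in (2, 8, 100):
    print(n, rmw_queue_space(n))
# 2 (4, 2)
# 8 (64, 6)
# 100 (10000, 14)
```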
Spinning
• A drawback of the RMW queue algorithm is
that processors in entry section repeatedly
access the same shared variable
– called spinning
• Having multiple processors spinning on the
same shared variable can be very time-
inefficient in certain multiprocessor
architectures
• Alter the queue algorithm so that each waiting
processor spins on a different shared variable
23
RMW Mutex Algorithm With
Separate Spinning
• Shared RMW variables:
– Last : corresponds to last "pointer"
from previous algorithm
• cycles through 0 to n-1
• keeps track of index to be given to the
next processor that starts waiting
• initially 0
24
RMW Mutex Algorithm With
Separate Spinning
• Shared RMW variables:
– Flags[0..n-1] : array of binary variables
• these are the variables that processors
spin on
• make sure no two processors spin on the
same variable at the same time
• initially Flags[0] = 1 (proc "has lock") and
Flags[i] = 0 (proc "must wait") for i > 0
25
Mutex Using Separate Spinning
Initially Last = 0; Flags[0] = 1; Flags[i] = 0 for 1 ≤ i ≤ n-1
// entry section
my-place := rmw(Last, (Last+1) mod n)
wait until (Flags[my-place] = 1)
Flags[my-place] := 0
// critical section
// exit section
Flags[(my-place + 1) mod n] := 1
26
Overview of Algorithm
• entry section:
– get next index from Last and store in a local
variable myPlace
– spin on Flags[myPlace] until it equals 1
(means proc "has lock" and can enter CS)
– set Flags[myPlace] to 0 ("doesn't have lock")
• exit section:
– set Flags[(myPlace+1) mod n] to 1 (i.e., give the
priority to the next proc)
27
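The entry and exit sections just outlined can be exercised in a step-by-step simulation. This is a sketch, assuming indices wrap around mod n and that each generator `yield` ends one computation step; the names `processor`, `run` and `my_place` are illustrative.

```python
import random

def processor(pid, shared, n, rounds, log):
    for _ in range(rounds):
        # entry: atomic rmw on Last (no yield between the read and the write)
        my_place = shared["Last"]
        shared["Last"] = (my_place + 1) % n
        yield
        while True:                          # spin on our own flag only
            flag = shared["Flags"][my_place]
            yield
            if flag == 1:
                break
        shared["Flags"][my_place] = 0        # "doesn't have lock" any more
        yield
        log.append(("enter", pid, my_place)) # critical section
        yield
        log.append(("leave", pid, my_place))
        shared["Flags"][(my_place + 1) % n] = 1   # exit: hand over to next slot
        yield

def run(n=4, rounds=3, seed=3):
    rng = random.Random(seed)
    shared = {"Last": 0, "Flags": [1] + [0] * (n - 1)}
    log = []
    procs = {pid: processor(pid, shared, n, rounds, log) for pid in range(n)}
    while procs:
        pid = rng.choice(sorted(procs))      # scheduler picks who steps next
        try:
            next(procs[pid])
        except StopIteration:
            del procs[pid]
    return log

log = run()
places = [place for kind, _, place in log if kind == "enter"]
print(places)   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```

The critical section is granted in slot order, and each waiting processor reads only its own element of Flags while it spins.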
Question
• Do the shared variables Last and Flags
have to be RMW variables?
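• Answer: The RMW semantics
(atomically reading and updating a
variable) are needed for Last, to make
sure two processors don't get the same
index at overlapping times. The elements
of Flags are only ever read or written in
separate steps, so ordinary read/write
variables suffice for them.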
28
Invariants of the Algorithm
1. At most one element of Flags has
value 1 ("has lock")
2. If no element of Flags has value 1,
then some processor is in the CS.
3. If Flags[k] = 1, then exactly
(Last - k) mod n processors are in the
entry section, spinning on Flags[i], for i
= k, (k+1) mod n, …, (Last-1) mod n.
29
Typo in textbook: replace (k-Last-1) on page 69 first paragraph by (Last-k)
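Invariants 1 and 2 can be spot-checked along a random execution. The sketch below models each processor as a small state machine with a hypothetical program counter (`rem`, `spin`, `clear`, `cs`); it treats a processor as being in the CS until its exit write, and it does not check invariant 3.

```python
import random

def check_invariants(n=4, steps=2000, seed=2):
    """Run `steps` random computation steps, asserting invariants 1 and 2 after each."""
    rng = random.Random(seed)
    last, flags = 0, [1] + [0] * (n - 1)
    pc = ["rem"] * n            # per-processor program counter
    place = [None] * n          # per-processor local variable myPlace
    for _ in range(steps):
        p = rng.randrange(n)
        if pc[p] == "rem":                    # take an index: rmw on Last
            place[p], last = last, (last + 1) % n
            pc[p] = "spin"
        elif pc[p] == "spin":                 # read own flag
            if flags[place[p]] == 1:
                pc[p] = "clear"
        elif pc[p] == "clear":                # reset own flag, enter the CS
            flags[place[p]] = 0
            pc[p] = "cs"
        else:                                 # leave the CS, pass the lock on
            flags[(place[p] + 1) % n] = 1
            pc[p] = "rem"
        assert sum(flags) <= 1                    # invariant 1
        assert sum(flags) == 1 or "cs" in pc      # invariant 2
        assert pc.count("cs") <= 1                # mutual exclusion
    return True

print(check_invariants())   # True
```

A passing run is evidence for one schedule only; the actual proof is by induction over the steps of an execution.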
Correctness
• Those three invariants can be used to
prove:
– Mutual exclusion is satisfied
– n-Bounded Waiting is satisfied.
30
A lower bound on number of
shared memory states
31
Lower Bound on Number of
Memory States
Theorem (4.4): Any mutex algorithm with
k-bounded waiting (and no-deadlock)
uses at least n states of shared
memory.
Proof: Assume for contradiction that there is
an algorithm using fewer than n states of
shared memory.
32
Lower Bound on Number of
Memory States
• Consider this execution of the algorithm:
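C --(p0 p0 p0 …)--> C0 --(p1)--> C1 --(p2)--> C2 … --(pn-1)--> Cn-1
– C0: p0 is in CS (reached by no-deadlock)
– Ci, for i ≥ 1: pi is now in its entry section
• C0, …, Cn-1 are n configurations, but there are
fewer than n states of shared memory, so there
exist i < j such that Ci and Cj have the same
state of shared memory.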
33
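The existence of i and j is a pigeonhole argument, which a few lines of code can illustrate. The helper `find_repeat` and the sample state labels are hypothetical; the input stands for the shared memory states of C0, C1, … in order.

```python
def find_repeat(states):
    """Return the first pair (i, j), i < j, with states[i] == states[j], else None."""
    seen = {}
    for j, s in enumerate(states):
        if s in seen:
            return seen[s], j
        seen[s] = j
    return None

# Five configurations but only four distinct memory states: a repeat must exist.
print(find_repeat(["a", "b", "c", "b", "d"]))   # (1, 3)
```

Whenever the list is longer than the number of distinct states, the function cannot return None.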
Lower Bound on Number of
Memory States
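• Ci and Cj have the same shared memory state;
pi+1, pi+2, …, pj take the steps leading from Ci to Cj
– in Ci: p0 in CS, p1-pi in entry, rest in remainder
– in Cj: p0 in CS, p1-pj in entry, rest in remainder
• Consider a schedule in which p0-pi take steps alternately:
– starting from Ci: by ND, some ph eventually has
entered CS k+1 times
– starting from Cj: ph likewise enters CS k+1 times,
while pi+1 is in entry the whole time
• This contradicts k-bounded waiting.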
ND = no deadlock, CS = critical section
Lower Bound on Number of
Memory States
• But why does ph do the same thing when
executing the schedule's sequence of steps
starting from Cj as when starting from Ci?
• All the processors p0,…,pi do the same thing
because:
– they are in the same states in the two configs
– the shared memory state is the same in the two configs
– the only differences between Ci and Cj are
(potentially) the states of pi+1,…,pj, and they don't
take any steps in the schedule
35
Discussion of Lower Bound
• The lower bound of n just shown on number of
memory states only holds for algorithms that
must provide bounded waiting in every
execution.
• Suppose we weaken the liveness condition to
just no-lockout in every execution: then
square-root(2n) + ½ distinct shared memory
states is a lower bound
• And if liveness is weakened to just
no-deadlock in every execution, then the bound
is just 2 (see algo. using test&set: slide 13)
36
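The three bounds quoted above can be compared numerically. The sketch simply evaluates the expressions as stated on the slide; `lower_bounds` is a hypothetical helper.

```python
from math import sqrt

def lower_bounds(n):
    """Lower bounds on the number of shared memory states, per liveness condition."""
    return {
        "bounded waiting": n,
        "no lockout": sqrt(2 * n) + 0.5,
        "no deadlock": 2,
    }

print(lower_bounds(8))
# {'bounded waiting': 8, 'no lockout': 4.5, 'no deadlock': 2}
```

The gap widens with n: for n = 50 the values are 50, 10.5 and 2.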
"Beating" the Lower Bound with
Randomization
• An alternative way to weaken the requirement
is to give up on requiring liveness in every
execution
• Consider Probabilistic No-Lockout: every
processor has non-zero probability of
succeeding each time it is in its entry section.
• Now there is an algorithm using O(1) states of
shared memory.
Recommended reading: Section 14.2
37