Lock-Free in Swift: Memory model and Peterson’s algorithm
Today we will continue to explore atomics and the lock-free topic we started in the previous article. We will discuss the memory model that Swift inherited from C++ in its atomic package and reimplement Peterson’s algorithm in Swift.
Memory model
Finding the right place for a memory barrier can be difficult: if we use barriers too often, they can slow the code down, effectively making it sequential. C++ has a memory model that manages memory barriers for us, and Swift has adopted that model in its Atomics package as well. The main idea is that for each operation on an atomic primitive, we choose how it interacts with barriers: in some cases we don’t need barriers at all, and in others we need a particular ordering.
Let’s explore the various types of memory ordering in the model:
- sequentiallyConsistent — this ordering imposes the strongest restrictions on the processor. Instructions before an atomic operation with this ordering can’t be reordered after it, and instructions after it can’t be reordered before it. This matches the programmer’s intuitive picture of code execution, but it employs the heaviest barriers, which impacts performance.
- acquiring — a combined usage of two barriers, Load Load and Load Store, which prevent load instructions before the barrier from being reordered with the load and store instructions after it.
- releasing — a combined usage of two barriers, Store Store and Load Store, which prevent store and load instructions before the barrier from being reordered with store instructions after it.
- relaxed — this ordering imposes no restrictions and allows the processor to reorder operations for optimal performance. In other words, it involves no memory barriers at all.
- acquiringAndReleasing — a combination of acquiring and releasing.
By leveraging acquiring and releasing we can organize access to a resource so that instructions inside the code block guarded by the acquiring and releasing operations cannot be moved out of it (instructions from outside the block may still be moved into it).
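As a minimal, hedged sketch of this pairing (the payload and ready names are illustrative, not taken from the article or the library), one thread publishes a value with a releasing store and another observes it with an acquiring load:
import Atomics
import Foundation

// Illustrative example: publishing a value from one thread to another.
// `payload` is an ordinary variable; `ready` is the atomic flag guarding it.
var payload = 0
let ready = ManagedAtomic<Bool>(false)

DispatchQueue.global().async {
    payload = 42 // plain write
    ready.store(true, ordering: .releasing) // the write above cannot move below this store
}

DispatchQueue.global().async {
    while !ready.load(ordering: .acquiring) { } // the read below cannot move above this load
    print(payload) // observes 42
}

sleep(1) // keep the demo process alive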
In the following example, we can see the initialization of an atomic variable and load and store operations on it. The ordering parameter allows us to configure the memory ordering for these operations. The available options are sequentiallyConsistent, acquiring, releasing and relaxed.
import Atomics
let test = ManagedAtomic<Int>(10)
print(test.load(ordering: .acquiring))
test.store(15, ordering: .releasing)
print(test.load(ordering: .relaxed))
Result:
10
15
Compiler reordering
Another potential source of reorderings is the compiler. To optimize the binary, it applies heuristic optimizations and may reorder instructions. To prevent that, dedicated compiler barriers exist, but we don’t need to worry about compiler reorderings here because the memory barriers described above prevent them as well.
Read-modify-write operations
In the previous article, we delved into read and write atomic operations and briefly mentioned read-modify-write (RMW) operations. As a quick recap, atomic operations cannot be divided into parts during execution. A read-modify-write operation is another type of atomic operation built on top of reads and writes. It enables us to perform operations while expecting possible changes to the data from other threads. RMW operations are crucial components of lock-free algorithms.
Compare and swap (CAS) is one of the fundamental RMW operations. It has three parameters: address (the address of the variable), expected (the value we expect to find at the provided address), and new (the new value intended to be set at that address). If the value located at the address matches the expected value, CAS sets the new value at the address and returns true; otherwise, it doesn’t alter anything and returns false. The pseudo-code snippet below illustrates the structure of the compareAndSwap function.
// Atomic: assume the whole body executes as a single indivisible step
func compareAndSwap<T: Equatable>(address: UnsafeMutablePointer<T>, expected: T, new: T) -> Bool {
    if address.pointee == expected {
        address.pointee = new
        return true
    }
    return false
}
Below we can see its counterpart from the Atomics package, compareExchange, and its application. The comments above and below the atomic operations are left deliberately to clarify where barriers can be positioned, based on the insights gained from the previous article.
import Atomics
let test = ManagedAtomic<Int>(0)
// Load Load | Load Store
let (exchanged, original) = test.compareExchange(expected: 0, desired: 1, ordering: .acquiringAndReleasing)
// Load Store | Store Store
// Load Load | Load Store
print(test.load(ordering: .acquiring))
print(exchanged)
print(original)
Result:
1
true
0
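For completeness, here is a short sketch (our own illustration, not from the original text) of the failing path: the stored value doesn’t match expected, so nothing is written, exchanged is false, and original reports the actual value.
import Atomics

let test = ManagedAtomic<Int>(1)
// The stored value is 1, not 0, so the exchange fails and nothing is written
let (exchanged, original) = test.compareExchange(expected: 0, desired: 5, ordering: .acquiringAndReleasing)
print(exchanged) // false
print(original) // 1
print(test.load(ordering: .acquiring)) // still 1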
In certain scenarios, we just want to exchange the original value for a new one without any extra conditions. This can be accomplished with the exchange function. The primary distinction between exchange and store is that exchange allows us to employ acquiringAndReleasing ordering (it also returns the original value), as we will see in more detail shortly.
let test = ManagedAtomic<Int>(10)
// Load Load | Load Store
let original = test.exchange(11, ordering: .acquiringAndReleasing)
// Load Store | Store Store
// Load Load | Load Store
print(test.load(ordering: .acquiring))
print(original)
Result:
11
10
The compare-and-swap operation is one of the key building blocks of lock-free algorithms. Leveraging it, we can implement another widely used RMW operation known as fetch-and-add, which is a lock-free variant of the conventional increment operation. fetchAndAdd takes two parameters, the memory address and the increment value, and returns the value before the increment. Internally, it re-reads the current value and calls compareAndSwap in a loop, which iterates until the incremented value is successfully assigned. Let’s look at a possible implementation:
func fetchAndAdd<T: AdditiveArithmetic>(address: UnsafeMutablePointer<T>, increment: T) -> T {
    while true {
        // Re-read the current value on every attempt in case another thread has changed it
        let current = address.pointee
        if compareAndSwap(address: address, expected: current, new: current + increment) {
            return current
        }
    }
}
This is the Atomics package’s version of fetchAndAdd, called loadThenWrappingIncrement:
import Atomics
let test = ManagedAtomic<Int>(10)
// Load Load | Load Store
let original = test.loadThenWrappingIncrement(by: 5, ordering: .acquiringAndReleasing)
// Load Store | Store Store
// Load Load | Load Store
print(test.load(ordering: .acquiring))
print(original)
Result:
15
10
The implementation of compareAndSwap differs across processor architectures. On x86 it is implemented as a single atomic instruction (cmpxchg). On ARM architectures, however, it involves two instructions, load-linked and store-conditional, and this dual-instruction approach introduces additional computational overhead to achieve atomicity at the processor level.
To mitigate this extra cost we can use a weak version of compareAndSwap, accordingly named weakCompareAndSwap. This variant is specifically designed for use within a while loop. It’s important to note that weakCompareAndSwap may fail spuriously: it can return false even when the value at the address matches the expected one, a case in which the regular compareAndSwap would return true and perform the swap.
In the provided code snippet, a while loop is executed until the variable test gets exchanged. The body of the while loop is empty here, but in more complex examples involving data structures, it typically performs specific work on each retry.
let test = ManagedAtomic<Int>(0)
// Load Load | Load Store
while !test.weakCompareExchange(expected: 0, desired: 10, successOrdering: .acquiringAndReleasing, failureOrdering: .acquiring).exchanged { /* Empty */ }
// Load Store | Store Store
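To make that retry pattern more concrete, here is a hedged sketch of a hypothetical atomicMax helper (not part of the Atomics package) that keeps retrying weakCompareExchange with a freshly observed value until it either succeeds or has nothing left to do:
import Atomics

// Hypothetical helper: atomically raises the stored value to `value` if it is larger
func atomicMax(_ atomic: ManagedAtomic<Int>, _ value: Int) {
    var current = atomic.load(ordering: .relaxed)
    while current < value {
        let (exchanged, original) = atomic.weakCompareExchange(
            expected: current,
            desired: value,
            successOrdering: .acquiringAndReleasing,
            failureOrdering: .acquiring
        )
        if exchanged { return }
        // Spurious failure or another thread changed the value: retry with the fresh value
        current = original
    }
}

let maximum = ManagedAtomic<Int>(5)
atomicMax(maximum, 10)
print(maximum.load(ordering: .acquiring)) // 10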
Peterson lock
In the preceding article, we implemented the Peterson lock at the machine level. Now we will replicate the same functionality using the Swift Atomics library. First, we introduce atomic variables to manage the lock state, keeping the variable names A, B, and turn consistent with the earlier article:
final class PetersonLock {
    enum ThreadId {
        case thread0
        case thread1
    }

    private let A = ManagedAtomic<Int>(0)
    private let B = ManagedAtomic<Int>(0)
    private let turn = ManagedAtomic<Int>(0)
}
A indicates that the lock has been acquired by thread0, and B serves the same purpose for thread1. The variable turn determines which thread has to wait when both attempt to acquire the lock at the same time. Although the Peterson lock involves only two threads and may not represent a real-world scenario, we implement it here because we did so in the previous article; the goal is a clearer understanding of the Swift constructs we employ.
To recreate the logic from the previous implementation, we use .relaxed ordering for all atomic operations, because we want to place the memory barriers explicitly, in the same way as in the earlier article. The fence function takes the parameter acquiringAndReleasing, which implies a combination of acquire (Load Load and Load Store) and release (Store Store and Load Store) barriers, aligning with our abstract code from before.
It’s important to note that at the end of lock we set A (or B) back to 0, indicating that the lock has been released. Following this, we use atomicMemoryFence(ordering: .releasing) to drain the store buffer and update memory with the latest value of A (or B). This step ensures synchronization and coherence of the memory state.
func lock(threadId: ThreadId) {
    if threadId == .thread0 {
        // Acquire thread 0
        A.store(1, ordering: .relaxed)
        turn.store(1, ordering: .relaxed)
        // Load Load | Load Store
        // Load Store | Store Store
        atomicMemoryFence(ordering: .acquiringAndReleasing)
        // If the lock was acquired by thread 1 keep spinning
        while(B.load(ordering: .relaxed) == 1 && turn.load(ordering: .relaxed) == 1) { }
        // Do some work
        // Release thread 0
        A.store(0, ordering: .relaxed)
        // Load Store | Store Store
        atomicMemoryFence(ordering: .releasing)
    } else {
        // Acquire thread 1
        B.store(1, ordering: .relaxed)
        turn.store(0, ordering: .relaxed)
        // Load Load | Load Store
        // Load Store | Store Store
        atomicMemoryFence(ordering: .acquiringAndReleasing)
        // If the lock was acquired by thread 0 keep spinning
        while(A.load(ordering: .relaxed) == 1 && turn.load(ordering: .relaxed) == 0) { }
        // Do some work
        // Release thread 1
        B.store(0, ordering: .relaxed)
        // Load Store | Store Store
        atomicMemoryFence(ordering: .releasing)
    }
}
That was an example bridging the previous and the current article, transitioning from abstract code to a Swift implementation. Now we can refine the code by leveraging the memory orderings provided by Swift Atomics. Additionally, to make it resemble a traditional lock, we introduce two methods, lock and unlock.
In the lock method below, some notable changes were introduced compared to the previous example with the explicit fences. Primarily, the switch from store to exchange is noteworthy, as store doesn’t support acquiringAndReleasing ordering, but exchange does.
However, there is an additional aspect to consider: acquiringAndReleasing is applied to the turn variable, which means the invalidation queue is drained before the exchange and the store buffer is flushed after it. We also need acquiring on the loads of A and B in the while loops to ensure we read the most recent values. Arranging orderings for atomic operations is an intricate process, and sometimes it is even more comprehensible to set the barriers explicitly.
func lock(threadId: ThreadId) {
    if threadId == .thread0 {
        // Acquire thread 0
        A.store(1, ordering: .relaxed)
        // Load Load | Load Store
        _ = turn.exchange(1, ordering: .acquiringAndReleasing)
        // Load Store | Store Store
        // If the lock was acquired by thread 1 keep spinning
        // Load Load | Load Store
        while(B.load(ordering: .acquiring) == 1 && turn.load(ordering: .relaxed) == 1) { }
    } else {
        // Acquire thread 1
        B.store(1, ordering: .relaxed)
        // Load Load | Load Store
        _ = turn.exchange(0, ordering: .acquiringAndReleasing)
        // Load Store | Store Store
        // If the lock was acquired by thread 0 keep spinning
        // Load Load | Load Store
        while(A.load(ordering: .acquiring) == 1 && turn.load(ordering: .relaxed) == 0) { }
    }
}
Another consideration when choosing orderings is whether a variable is shared between threads. If it is, we may need to add barriers; if it is not, we can keep it relaxed, since it will not be read outside of the processor’s cache.
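As a related, hedged sketch (the hits counter and the loop bounds are illustrative), a standalone counter that doesn’t guard any other data is a typical case where relaxed is sufficient: the increments stay atomic, and the total is only read after all the work has finished.
import Atomics
import Foundation

// Illustrative example: an event counter that does not publish any other data
let hits = ManagedAtomic<Int>(0)
let group = DispatchGroup()

for _ in 0..<4 {
    DispatchQueue.global().async(group: group) {
        for _ in 0..<1_000 {
            // The increment itself is atomic; relaxed is enough here
            hits.wrappingIncrement(by: 1, ordering: .relaxed)
        }
    }
}

group.wait()
print(hits.load(ordering: .relaxed)) // 4000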
unlock resets A or B depending on the passed threadId. In the earlier example, we didn’t encapsulate unlock in a separate function; instead, it was embedded at the end of the lock method, where another fence was used to set the Store Store barrier. Here we can achieve the same effect by passing releasing to store:
func unlock(threadId: ThreadId) {
    if threadId == .thread0 {
        // Release thread 0
        A.store(0, ordering: .releasing)
        // Load Store | Store Store
    } else {
        // Release thread 1
        B.store(0, ordering: .releasing)
        // Load Store | Store Store
    }
}
Now that we have implemented the lock and unlock methods, we can see how they work in practice:
func testPetersonLock() {
    var test = 0
    let lock = PetersonLock()

    DispatchQueue.global().async {
        lock.lock(threadId: .thread0)
        test = 1
        sleep(1)
        print(test)
        lock.unlock(threadId: .thread0)
    }

    DispatchQueue.global().async {
        lock.lock(threadId: .thread1)
        test = 2
        sleep(1)
        print(test)
        lock.unlock(threadId: .thread1)
    }
}
Result:
1
<- 1 second wait ->
2
or:
2
<- 1 second wait ->
1
Depending on which thread acquires the lock first, it will print its associated number and hold the lock for one second; the second thread waits until the first one releases it. As previously mentioned, this lock is designed to work with only two threads and is not suitable for production use. However, it serves as a straightforward example for understanding how atomics are used.
Today, we discussed how to use atomic operations such as load, store, and RMW operations. By revisiting Peterson’s algorithm from the prior article, we demonstrated its implementation using Swift atomics primitives. In the next article, we will delve into the ABA problem, which is a very common challenge for atomic algorithms.