Lock-Free in Swift: Memory model and Peterson’s algorithm

Alex Shchukin
9 min readDec 28, 2023

--

Today we will continue to explore atomics and the lock-free topic we started in the previous article. We will discuss the memory model that Swift inherited from C++ in its atomic package and reimplement Peterson’s algorithm in Swift.

Memory model

Finding the right place for the memory barrier can be difficult. If we use it too often, it might slow down the code, making it sequential. In C++, there’s a memory model developed for managing memory barriers for us, and Swift has adopted that memory model as well. The main idea is for each operation with atomic primitive, we decide a way we operate with barriers. In some cases, we don’t need barriers at all, and in others, we need some order.

Let’s explore the various types of memory ordering in the model:

  • sequentiallyConsistent — this model imposes the strongest restrictions on the processor. The instructions before the atomic operation with that type can’t be reordered after it and instructions after can’t be reordered before the operation. This ordering aligns with an intuitive code execution flow for the programmer. It employs the heaviest barriers and that impacts the performance.
  • acquiring — combined usage of two barriers Load Load and Load Store which prevent load instructions before the barrier to be reordered with the load and store instructions after the barrier.
  • releasing — combined usage of two barriers Store Store and Load Store which prevent store and load instructions before the barrier to be reordered with store instructions after the barrier.
  • relaxed — it’s a model that doesn’t have any restrictions and allows the processor to reorder operation for optimal performance. In other words, it involves no memory barriers at all.
  • acquiringAndReleasing — a combination of acquiring and releasing.

By leveraging acquiring and releasing we can organize access to the resource so instructions from outside of the guarded by acquiring and releasing code cannot be moved inside and instructions inside of the code block cannot move out from it.

In the following example, we can see the initiation of an atomic variable and load and store operations on it. There is a parameter ordering that allows us to configure the memory ordering for these operations. The available options aresequentiallyConsistent, acquiring, releasing and relaxed.

import Atomics

let test = ManagedAtomic<Int>(10)
print(test.load(ordering: .acquiring))
test.store(15, ordering: .releasing)
print(test.load(ordering: .relaxed))

Result:

10
15

Complier reorder

Another potential source of reorderings is the compiler. To optimize binary it makes heuristic optimizations and reorders. To prevent that we can use special barriers for the compilers. But we don’t need to worry about the compile reorderings because the memory barriers described above prevent compiler reorders as well.

Read-write-modify operations

In the previous article, we delved into read and write atomic operations and briefly mentioned read-write-modify (RMW) operations. As a quick recap, atomic operations cannot be divided into parts during an execution. Read-write-modify operation is another type of atomic operation based on read and write operations. It enables us to perform operations expecting possible changes of data from other threads. They are crucial components of lock-free algorithms.

Compare and swap (CAS) one of the fundamental RMW operations. It has three parameters: address (address of the variable), expected (a value that we expect to be located by the provided address), and new (a new value intended to be set to the given address). If the value located by address matches the expected value CAS sets the new value to the address and returns true otherwise it doesn’t alter anything and returns false. The pseudo-code snippet below illustrates the structure of the compareAndSwap function.

// Atomic
func compareAndSwap<T: Equatable>(address: UnsafeMutablePointer<T>, expected: T, new: T) -> Bool {
if address.pointee == expected {
address.pointee = new
return true
}
return false
}

Here, we observe its representation in atomics package and its application. The deliberately left comments above and below the atomic operations serve the purpose of clarifying where barriers can be positioned based on the insights gained from the previous article.

import Atomics

let test = ManagedAtomic<Int>(0)

// Load Load | Load Store
let (exchanged, original) = test.compareExchange(expected: 0, desired: 1, ordering: .acquiringAndReleasing)
// Load Store | Store Store

// Load Load | Load Store
print(test.load(ordering: .acquiring))
print(exchanged)
print(original)

Result:

1
true
0

In certain scenarios, we want just to exchange the original value for the new one without any extra conditions. This operation can be accomplished using exchange function. The primary distinction between exchange and store lies in the fact that exchange allows us to employ acquiringAndReleasing ordering and we will see it in more detail shortly.

let test = ManagedAtomic<Int>(10)
// Load Load | Load Store
let original = test.exchange(11, ordering: .acquiringAndReleasing)
// Load Store | Store Store

// Load Load | Load Store
print(test.load(ordering: .acquiring))
print(original)

Result:

11
10

Compare and swap operation is one of the key functions used in lock-free algorithms leveraging it we can implement another widely used RMW operation known as fetch and add. This operation is a lock-free variant of the conventional increment operation.

fetchAndAdd takes two parameters: the memory address and the increment value. It returns the value before the increment. Internally, it employs compareAndSwap in the while loop to exchange the current value to the incremented one. This loop will iterate until the incremented value is successfully assigned. Let’s look at its possible implementation:

func fetchAndAdd<T: AdditiveArithmetic>(address: UnsafeMutablePointer<T>, increment: T) -> T {
let current = address.pointee
while(!compareAndSwap(address: address, expected: current, new: current + increment)) { }

return current
}

That is a version of the fetchAndAdd from atomics package:

import Atomics

let test = ManagedAtomic<Int>(10)

// Load Load | Load Store
let original = test.loadThenWrappingIncrement(by: 5, ordering: .acquiringAndReleasing)
// Load Store | Store Store

// Load Load | Load Store
print(test.load(ordering: .acquiring))
print(original)

Result:

15
10

There are different implementations of compareAndSwap on the various processor architectures. In the x86 architecture, it’s implemented as a single atomic command compareAndSwap . However, on ARM architectures, it involves the usage of two atomic commands: load-linked and store-conditional. This dual-command approach introduces additional computational overhead to achieve atomicity at the processor level.

To mitigate this extra cost we can use a weak version of compareAndSwap , accordingly named weakCompareAndSwap. This variant is specifically designed for use within a while true loop. It's important to note that weakCompareAndSwap does not guarantee complete atomicity and may occasionally return false even when the data hasn't changed. In contrast, the regular compareAndSwap would return true in such cases.

In the provided code snippet, a while loop is executed until the variable test gets exchanged. The body of the while loop is empty but in more complex examples involving data structures, it typically performs specific tasks within the loop.

let test = ManagedAtomic<Int>(0)
// Load Load | Load Store
while(test.weakCompareExchange(expected: 0, desired: 10, successOrdering: .acquiringAndReleasing, failureOrdering: .acquiring).exchanged) { /* Empty */ }
// Load Store | Store Store

Peterson lock

In the preceding article, we implemented the Peterson lock at the machine level. Now, we will replicate the same functionality using the Swift atomics library. Initially, we introduce atomic variables to manage the lock state, maintaining consistency with the variable names we used in the earlier article by using A, B, and turn:

final class PetersonLock {
enum ThreadId {
case thread0
case thread1
}

private var A = ManagedAtomic<Int>(0)
private var B = ManagedAtomic<Int>(0)
private var turn = ManagedAtomic<Int>(0)
}

A identifies that the lock has been acquired by thread0 and B serves the same purpose for thread1. The variable turn determines which thread is currently executing the lock function. Although the Peterson lock implementation involves only two threads and may not represent a real-world scenario we want to implement it since we did it in the previous article. This decision is motivated by the goal of having a clearer understanding of the Swift constructions we employed here.

To recreate the logic from the previous implementation, we’re using .relaxed ordering for all atomic operations. Since we want to directly set the memory barrier, consistent in the same way taken in the earlier article. The barrier function takes the parameter acquiringAndReleasing, implying a combination of acquire (Load Load and Load Store) and release (Store Store and Load Store), aligning with our abstract code from before.

It’s important to note that after the lock completes, we set A (or B) to 1, indicating that the lock has been released. Following this, we use atomicMemoryFence(ordering: .releasing) to empty store buffer and update the memory with the latest value of A (or B). This step ensures synchronization and coherence in the memory state.

func lock(threadId: ThreadId) {
if threadId == .thread0 {
// Acquire thread 0
A.store(1, ordering: .relaxed)

turn.store(1, ordering: .relaxed)

// Load Load | Load Store
// Load Store | Store Store
atomicMemoryFence(ordering: .acquiringAndReleasing)

// If the lock was acquired by thread 1 keep spinning
while(B.load(ordering: .relaxed) == 1 && turn.load(ordering: .relaxed) == 1) { }

// Do some work

// Release thread 0
A.store(0, ordering: .relaxed)

// Load Store | Store Store
atomicMemoryFence(ordering: .releasing)
} else {
// Acquire thread 1
B.store(1, ordering: .relaxed)

turn.store(0, ordering: .relaxed)

// Load Load | Load Store
// Load Store | Store Store
atomicMemoryFence(ordering: .acquiringAndReleasing)

// If the lock was acquired by thread 0 keep spinning
while(A.load(ordering: .relaxed) == 1 && turn.load(ordering: .relaxed) == 0) { }

// Do some work

// Relese thread 1
B.store(1, ordering: .relaxed)

// Load Store | Store Store
atomicMemoryFence(ordering: .releasing)
}
}

It was an example of bridging the previous and current articles transitioning from abstract code to an implementation in Swift. Now we can refine the code leveraging all the provided by Swift Atomics memory orderings. Additionally, we want to enhance familiarity with traditional locks by introducing two methods, lock and unlock.

In the method lock below notable changes were introduced compared to the previous example with the explicit lock. Primarily, the switch from store to exchange is noteworthy, as store doesn't support the use of acquiringAndReleasing, but exchange does.

However, there’s an additional aspect to consider: acquiringAndReleasing is applied to the turn variable, signifying it empties the invalidateQueue before and executes the store buffer after the exchange method. However, we also need acquiring to be applied to A and B as well in the while loops to ensure we get the most recent value. It’s a very intricate process to arrange orderings for the atomic operations and sometimes it’s even more comprehensible to set barriers explicitly.

func lock(threadId: ThreadId) {
if threadId == .thread0 {
// Acquire thread 0
A.store(1, ordering: .relaxed)

// Load Load | Load Store
_ = turn.exchange(1, ordering: .acquiringAndReleasing)
// Load Store | Store Store

// If the lock was acquired by thread 1 keep spinning
// Load Load | Load Store
while(B.load(ordering: .acquiring) == 1 && turn.load(ordering: .relaxed) == 1) { }
} else {
// Acquire thread 1
B.store(1, ordering: .relaxed)

// Load Load | Load Store
_ = turn.exchange(0, ordering: .acquiringAndReleasing)
// Load Store | Store Store

// If the lock was acquired by thread 0 keep spinning
// Load Load | Load Store
while(A.load(ordering: .acquiring) == 1 && turn.load(ordering: .relaxed) == 0) { }
}
}

Another perspective about orderings is whether the variable is shared between threads. If it does we need possibly add some barriers but if it’s not we can keep it relaxed since it will not be read outside of the processor’s cache.

unlock switches off A or B depending on the passedthreadId. In the earlier example, we didn’t encapsulate unlock in a separate function; instead, it was embedded at the end of the lock method. There, it was used another fence to set Store Store barrier. Here, we can use the same logic by passing releasing to store:

func unlock(threadId: ThreadId) {
if threadId == .thread0 {
// Release thread 0
A.store(0, ordering: .releasing)
// Load Store | Store Store
} else {
// Release thread 1
B.store(0, ordering: .releasing)
// Load Store | Store Store
}
}

Now when we have implemented lock and unlock methods we can see how they work in practice:

func testPetersonLock() {
var test = 0
let lock = PetersonLock()

DispatchQueue.global().async {
lock.lock(threadId: .thread0)

test = 1
sleep(1)
print(test)

lock.unlock(threadId: .thread0)
}

DispatchQueue.global().async {
lock.lock(threadId: .thread1)

test = 2
sleep(1)
print(test)

lock.unlock(threadId: .thread1)
}
}

Result:

1
<- 1 second wait ->
2

or:

2
<- 1 second wait ->
1

Depending on which thread acquires the lock first, it will print the associated number and hold the thread for one second. The second thread will wait until the first one is released. As previously mentioned, this lock is designed to function with only two threads and is not suitable for production development. However, it serves as a straightforward example to gain an understanding of atomics usage.

Today, we discussed how to use atomic operations such as load, store, and RMW operations. y revisiting Peterson’s algorithm from a prior article, we demonstrated its implementation using Swift atomics primitives. In the next article, we will delve into the ABA problem which is a very common challenge for atomic algorithms.

--

--

No responses yet