
Understanding Memory Order

2022-09-07

C++11 and C11 both added memory-order facilities to their standards, a requirement born from multi-core processors and increasingly large processor caches. In multi-core processors, although multiple cores share main memory (NUMA is not considered here), the caches are mutually independent. Modern CPUs have thus become something like distributed systems, and memory order is a very important synchronization mechanism within this distributed system.

The memory orders in the new standards include:

- relaxed (memory_order_relaxed)
- acquire (memory_order_acquire)
- release (memory_order_release)
- acquire-release (memory_order_acq_rel)
- sequentially-consistent (memory_order_seq_cst)

Strictly speaking, there is also a “consume” memory order (memory_order_consume), but mainstream compilers do not implement it specifically and silently promote it to acquire. Consume can therefore be ignored here.

First, let’s start with acquire and release. The names of these two operations come from the acquisition and release of mutex locks. Here, Git can be used to aid understanding. Computer main memory can be viewed as a central Git repository, such as GitHub or GitLab; while the caches of various CPUs can be viewed as the local Git repositories of developers distributed across different locations.

When a thread on a CPU core reads or writes memory, it may actually be reading or writing the cache, and these load/store operations will not necessarily be synchronized to main memory immediately. Meanwhile, a thread on another CPU core may be reading or writing the same memory, which leads to inconsistency. In the Git analogy, a conflict has occurred. In Git we can resolve conflicts by hand, but a CPU running at full speed has no such mechanism: when a conflict occurs, whichever write arrives later simply overwrites the earlier one, potentially producing race conditions. In severe cases this can even crash the process.

Acquire and release are similar to pull/push. Acquire is like git pull: it pulls down the main-memory state prior to the acquire operation, ensuring the current cache is up to date. Release is like git push: it synchronizes the modifications made to the cache prior to the release operation back to main memory. Acq-rel, as the name implies, applies to atomic read-modify-write operations: pull before reading, push after writing. Combining this synchronization with atomic operations guarantees that critical operations do not conflict, preventing race conditions.

However, on architectures like x86/x64, ordinary loads already carry acquire semantics and ordinary stores carry release semantics; this is called a strong memory model. ARM, by contrast, offers no such guarantee and is called a weak memory model. But even on x86/x64 one cannot let one’s guard down: the compiler may optimize the program aggressively, and load/store operations may be reordered. Acquire and release also tell the compiler that such reordering is forbidden here. Because acquire must guarantee that subsequent reads and writes see the freshly pulled state, read/write operations after an acquire cannot be reordered to before it; similarly, because release must push out every write made before it, read/write operations before a release cannot be reordered to after it.

A typical scenario for acquire-release is the reference counting of smart pointers. When a smart pointer leaves scope, the reference count decreases by 1; if it reaches zero, the object must be destructed and its memory reclaimed. This decrement needs acquire-release. First, because memory may be about to be reclaimed, the modifications made on other CPU cores must be visible to this thread so that destruction is correct; second, once reclamation completes, other threads must be made aware of it; meanwhile, the compiler must be forbidden from reordering the surrounding reads and writes. Acquire-release is therefore necessary.

Sequentially-consistent is even stricter than acquire-release. It not only synchronizes with main memory but also guarantees that all CPU cores observe these operations in a single agreed-upon order, achieving an effect similar to a single core, at the cost of speed. To reduce developer confusion, seq-cst is the default for atomic operations in C++11. By contrast, release does not guarantee that all threads see the current thread’s modifications; other threads are only guaranteed to see the modifications made before the release once they perform an acquire.

Relaxed, on the other hand, involves no synchronization at all; it merely guarantees that the read/write of the atomic variable itself is atomic. Relaxed is often used for incrementing a reference count by 1: acquiring a reference can never trigger destruction and free, so it can never lead to a double free or a leak; the only requirement is that the operation on the counter itself is atomic.

However, atomic operations and memory order remain dangerous tools and require extreme caution. If conditions permit, it is best to replace sharing with communication; failing that, a mutex and condition variable are sufficient in the vast majority of cases.

Mistivia - https://mistivia.com