#include <sys/smr.h>
Note that any SMR-protected data structure must be implemented carefully such that operations behave correctly in the absence of mutual exclusion between readers and writers. The data structure must be designed to be lock-free; SMR merely facilitates the implementation, for example by making it safe to follow dangling pointers and by helping avoid the ABA problem.
When shared accesses to and mutations of a data structure can proceed concurrently, writers must take care to ensure that any items removed from the structure are not freed and recycled while readers are accessing them in parallel. This requirement results in a two-phase approach to the removal of items: first, the item is unlinked such that all pointers to the item are removed from the structure, preventing any new readers from observing the item. Then, the writer waits until some mechanism guarantees that no existing readers are still accessing the item. At that point the memory for that item can be freed and reused safely.
SMR provides this mechanism: readers may access a lock-free data structure in between calls to the smr_enter() and smr_exit() functions, which together create a read section, and the smr_advance(), smr_poll(), smr_wait(), and smr_synchronize() functions can be used to wait for threads in read sections to finish.
All of these functions operate on an smr_t state block which holds both per-CPU and global state. Readers load global state and modify per-CPU state, while writers must scan all per-CPU states to detect active readers. SMR is designed to amortize this cost by batching, so as to give acceptable performance in write-heavy workloads.
In typical usage, a UMA zone (created with the UMA_ZONE_SMR flag or initialized with the uma_zone_set_smr() function) is coupled with an smr_t state block. To insert an item into an SMR-protected data structure, memory is allocated from the zone using the uma_zalloc_smr() function. Insertions and removals are serialized using traditional mutual exclusion, and items are freed using the uma_zfree_smr() function. Read-only lookup operations are performed in SMR read sections. uma_zfree_smr() waits for all active readers which may be accessing the freed item to finish their read sections before recycling that item's memory.
If the zone has an associated per-item destructor, it will be invoked at some point when no readers can be accessing a given item. The destructor can be used to release additional resources associated with the item. Note, however, that there is no guarantee that the destructor will be invoked within a bounded time period.
Internally, SMR maintains a global ‘write sequence’ number. When entering a read section, smr_enter() loads a copy of the write sequence and stores it in per-CPU memory, hence ‘observing’ that value. To exit a read section, this per-CPU memory is overwritten with an invalid value, making the CPU inactive. Writers perform two operations: advancing the write sequence number, and polling all CPUs to see whether active readers have observed a given sequence number. These operations are performed by smr_advance() and smr_poll(), respectively, which do not require serialization between multiple writers.
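The sequence-number scheme described above can be illustrated with a small user-space model. This is only a sketch, not the kernel implementation: every name prefixed with model_ is hypothetical, CPUs are simulated as array slots, and sequence wraparound and the memory-ordering details that a real implementation must handle are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NCPU        4
#define MODEL_SEQ_INVALID 0u	/* slot value for a CPU outside a read section */

typedef uint32_t model_seq_t;

/* Hypothetical stand-in for the state block: global plus per-CPU state. */
struct model_smr {
	_Atomic model_seq_t wr_seq;			/* global write sequence */
	_Atomic model_seq_t cpu_seq[MODEL_NCPU];	/* value observed per CPU */
};

static void
model_init(struct model_smr *s)
{
	atomic_init(&s->wr_seq, 1);
	for (int i = 0; i < MODEL_NCPU; i++)
		atomic_init(&s->cpu_seq[i], MODEL_SEQ_INVALID);
}

/* Reader: copy the write sequence into this CPU's slot ("observe" it). */
static void
model_enter(struct model_smr *s, int cpu)
{
	atomic_store(&s->cpu_seq[cpu], atomic_load(&s->wr_seq));
}

/* Reader: overwrite the slot with the invalid value, marking it inactive. */
static void
model_exit(struct model_smr *s, int cpu)
{
	atomic_store(&s->cpu_seq[cpu], MODEL_SEQ_INVALID);
}

/* Writer: advance the write sequence and return the new value. */
static model_seq_t
model_advance(struct model_smr *s)
{
	return (atomic_fetch_add(&s->wr_seq, 1) + 1);
}

/*
 * Writer: scan every slot.  Returns true only if no active reader is
 * still holding a value older than goal.  Never blocks.
 */
static bool
model_poll(struct model_smr *s, model_seq_t goal)
{
	for (int i = 0; i < MODEL_NCPU; i++) {
		model_seq_t seen = atomic_load(&s->cpu_seq[i]);

		if (seen != MODEL_SEQ_INVALID && seen < goal)
			return (false);
	}
	return (true);
}
```

In this model, a reader that entered before model_advance() keeps model_poll() returning false for the new value until it calls model_exit(); readers that enter afterwards observe the new value and do not hold the writer back.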
After a writer unlinks an item from a data structure, it increments the write sequence number and tags the item with the new value returned by smr_advance(). Once all CPUs have observed the new value, the writer can use smr_poll() to deduce that no active readers have access to the unlinked item, and thus the item is safe to recycle. Because this pair of operations is relatively expensive, it is generally a good idea to amortize this cost by accumulating a collection of multiple unlinked items and tagging the entire batch with a target write sequence number.
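The batching idea can be sketched in user space as follows. This is a single-threaded illustration under stated assumptions, not kernel code: all demo_ names are hypothetical, the write sequence and per-CPU slots are plain variables rather than the real state block, and the items are assumed to have been unlinked from the data structure before they are added to the batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NCPU  2
#define DEMO_IDLE  0u	/* slot value for a CPU outside a read section */
#define DEMO_BATCH 8

typedef uint32_t demo_seq_t;

/* Toy stand-ins for the global write sequence and the per-CPU slots. */
static demo_seq_t demo_wr_seq = 1;
static demo_seq_t demo_cpu_seq[DEMO_NCPU];

static void demo_enter(int cpu) { demo_cpu_seq[cpu] = demo_wr_seq; }
static void demo_exit(int cpu) { demo_cpu_seq[cpu] = DEMO_IDLE; }
static demo_seq_t demo_advance(void) { return (++demo_wr_seq); }

/* True if no active reader still holds a value older than goal. */
static bool
demo_poll(demo_seq_t goal)
{
	for (int i = 0; i < DEMO_NCPU; i++)
		if (demo_cpu_seq[i] != DEMO_IDLE && demo_cpu_seq[i] < goal)
			return (false);
	return (true);
}

/* A collection of unlinked items that share one target sequence number. */
struct demo_batch {
	void		*items[DEMO_BATCH];
	size_t		count;
	demo_seq_t	tag;
};

/* Queue an already-unlinked item; returns false if the batch is full. */
static bool
demo_batch_add(struct demo_batch *b, void *item)
{
	if (b->count == DEMO_BATCH)
		return (false);
	b->items[b->count++] = item;
	return (true);
}

/* One advance tags the whole batch, instead of one advance per item. */
static void
demo_batch_seal(struct demo_batch *b)
{
	b->tag = demo_advance();
}

/* Free every item if the tag has been reached; returns the number freed. */
static size_t
demo_batch_reclaim(struct demo_batch *b)
{
	size_t n = b->count;

	if (n == 0 || !demo_poll(b->tag))
		return (0);
	for (size_t i = 0; i < n; i++)
		free(b->items[i]);
	b->count = 0;
	return (n);
}
```

A writer in this model would call demo_batch_add() after each unlink, demo_batch_seal() once the batch is full or old enough, and demo_batch_reclaim() later, so that a single advance and a single poll cover many items.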
smr_poll() is a non-blocking operation and returns true only if all active readers are guaranteed to have observed the target sequence number value. smr_wait() is a variant of smr_poll() which waits until all CPUs have observed the target sequence number value. smr_synchronize() combines smr_advance() with smr_wait() to wait for all CPUs to observe a new write sequence number. This is an expensive operation and should only be used if polling cannot be deferred in some way.
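The relationship between the three operations can be sketched with a toy user-space model. The toy_ names are hypothetical and none of this is the kernel implementation; in particular, the relax callback is an assumption made for this single-threaded sketch and merely stands in for other CPUs making progress while the writer waits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NCPU 2
#define TOY_IDLE 0u	/* slot value for a CPU outside a read section */

typedef uint32_t toy_seq_t;

/* Toy stand-ins for the global write sequence and the per-CPU slots. */
static toy_seq_t toy_wr_seq = 1;
static toy_seq_t toy_cpu_seq[TOY_NCPU];
static unsigned toy_spins;	/* number of times toy_wait() had to retry */

static void toy_enter(int cpu) { toy_cpu_seq[cpu] = toy_wr_seq; }
static void toy_exit(int cpu) { toy_cpu_seq[cpu] = TOY_IDLE; }
static toy_seq_t toy_advance(void) { return (++toy_wr_seq); }

/* Non-blocking: report whether every active reader has reached goal. */
static bool
toy_poll(toy_seq_t goal)
{
	for (int i = 0; i < TOY_NCPU; i++)
		if (toy_cpu_seq[i] != TOY_IDLE && toy_cpu_seq[i] < goal)
			return (false);
	return (true);
}

/* Blocking variant: keep polling until the goal has been reached. */
static void
toy_wait(toy_seq_t goal, void (*relax)(void))
{
	while (!toy_poll(goal)) {
		toy_spins++;
		relax();	/* let "other CPUs" run in this model */
	}
}

/* Advance and wait in one step: the most expensive of the three. */
static void
toy_synchronize(void (*relax)(void))
{
	toy_wait(toy_advance(), relax);
}

/* Hypothetical progress hook: the reader on CPU 0 leaves its section. */
static void
toy_reader_finishes(void)
{
	toy_exit(0);
}
```

With a reader active on CPU 0, toy_synchronize(toy_reader_finishes) retries once in this model and returns only after that reader has left its read section, whereas toy_poll() on its own would simply have returned false.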
The acronym SMR is also used to mean "shingled magnetic recording", a technology used to store data on hard disk drives which requires operating system support. These two uses of the acronym are unrelated.
SMR (9) | January 17, 2023 |