linearise updates more efficiently #9
Labels
domain: Rust
Involves Rust code.
priority: next
I'll probably get to it, eventually.
state: approved
Approved to proceed.
type: fix
Iterations on existing features or infrastructure
work: clear
A known solution is (to be) implemented.
work: complicated
A specific goal already exists, but the path there isn't certain.
Is your feature request related to a problem? Please describe.
Queueing updates in
GlobalSignalsRuntime
currently takes out a global exclusive lock, which is mildly put less than ideal.This action can be made lock-free and barrier-free between concurrent writes.
This reduces read/write synchronisation essentially to multiple SPSC channels which can likely use
compare_exchange
withRelaxed
failure
ordering due to an unbroken series of transaction IDs (see below).That still leaves congestion of the atomic (see below), but considering that shouldn't be called in a tight loop, it might be irrelevant in that the atomic is likely to be committed to a shared cache naturally before it is accessed again.
Describe the solution you'd like
when an update is submitted
Relaxed
(?)fetch_add
to produce a transaction index.No wrapping logic is necessary – even if the counter is increased each nanosecond, it would take close to 600 years for an
u64
to cap out.Release
barrier specific to that storage compartment.when updates are consumed
u64
is used to lock and allocate consumption. (u64::MAX
acts as lock which breaks out of the loop when encountered.) Without changing semantics, the frequency of this check can be reduced based on when the outermost call on a thread's context stack exits, only then trying to perform updates.Acquire
swap
.If the number is equal to the submission counter, undo the lock and exit.
(Conveniently, it's possible to ensure the overall queue sequence never skips a transaction index.)
Describe alternatives you've considered
Is this just an MPSC that respects outside memory barriers and doesn't introduce them itself?
It may exist already, and if not it likely makes for a useful library. Maybe crossbeam-channel or an explicit SPSC channel library could help (or be used for the thread-associated storage if it allows peeking).
Additional context
This is inspired by https://snarfed.org/2019-12-28_39697. Thanks to @snarfed for pointing me in this direction, that was spot-on.
I can't quite use transactional memory unfortunately, but my submitted-updates API can still benefit from a very similar implementation.
I'll label this as both work: clear and work: complicated since the approach is straightforward in theory but packaging it nicely may not be. There may also be an existing implementation, which going to need more research to rule out.
The text was updated successfully, but these errors were encountered: