
linearise updates more efficiently #9

Open
Tamschi opened this issue Nov 22, 2024 · 1 comment
Labels
domain: Rust (Involves Rust code.)
priority: next (I'll probably get to it, eventually.)
state: approved (Approved to proceed.)
type: fix (Iterations on existing features or infrastructure.)
work: clear (A known solution is (to be) implemented.)
work: complicated (A specific goal already exists, but the path there isn't certain.)

Comments

Tamschi commented Nov 22, 2024

Is your feature request related to a problem? Please describe.
Queueing updates in GlobalSignalsRuntime currently takes out a global exclusive lock, which is, mildly put, less than ideal.
This action can be made lock-free and barrier-free between concurrent writes.

This reduces read/write synchronisation essentially to multiple SPSC channels, which can likely use compare_exchange with a Relaxed failure ordering thanks to the unbroken series of transaction IDs (see below).

That still leaves contention on the atomic (see below), but since it shouldn't be hit in a tight loop, this is likely irrelevant: the atomic will probably be committed to a shared cache naturally before it is accessed again.
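As a quick sanity check on relying on a non-wrapping u64 transaction counter (assuming at most one increment per nanosecond; function name is mine, purely illustrative):

```rust
/// Years until a u64 counter overflows when incremented once per nanosecond.
fn u64_ns_counter_lifetime_years() -> u128 {
    // Nanoseconds in a (non-leap) year: 1e9 * 60 * 60 * 24 * 365.
    let ns_per_year: u128 = 1_000_000_000 * 60 * 60 * 24 * 365;
    u64::MAX as u128 / ns_per_year
}
```

That comes out to 584 years, which backs the "close to 600 years" estimate below.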

Describe the solution you'd like

  • when an update is submitted

    • a central atomic u64 counter is incremented via a Relaxed(?) fetch_add to produce a transaction index.
      No wrapping logic is necessary: even if the counter were incremented every nanosecond, it would take close to 600 years for a u64 to overflow.
    • The update/transaction number pair is enqueued in thread-associated storage with a Release barrier specific to that storage compartment.
  • when updates are consumed

    • A second atomic u64 is used to lock and allocate consumption. (u64::MAX acts as the lock value, which breaks out of the loop when encountered.) Without changing semantics, the frequency of this check can be reduced by attempting to perform updates only when the outermost call on a thread's context stack exits.
    • Locking can be done with an Acquire swap.
      If the number is equal to the submission counter, undo the lock and exit.
    • Otherwise, find the enqueued updates in that range and perform them in order. Spinning may be necessary here.
      (Conveniently, it's possible to ensure the overall queue sequence never skips a transaction index.)
    • Unlock and loop around, refetching the submission counter. (Memory order?)
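The two-counter protocol above could look roughly like this. A minimal sketch only: the names are mine, the flagged orderings are my assumptions (the issue itself leaves some of them open), and the thread-associated queues are elided.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Global transaction counter: each submitted update gets a unique index.
static SUBMITTED: AtomicU64 = AtomicU64::new(0);
/// Index up to which updates have been consumed; u64::MAX doubles as the lock value.
static CONSUMED: AtomicU64 = AtomicU64::new(0);

const LOCKED: u64 = u64::MAX;

/// Allocates a transaction index for a new update.
fn submit() -> u64 {
    // Relaxed may be enough here; ordering with the queued payload would be
    // established by the Release store into the thread-associated queue (not shown).
    SUBMITTED.fetch_add(1, Ordering::Relaxed)
}

/// Tries to consume all pending updates. `perform` receives the index range
/// of updates to apply in order.
fn consume(mut perform: impl FnMut(std::ops::Range<u64>)) {
    loop {
        // The Acquire swap both locks consumption and synchronises with the
        // previous consumer's Release unlock.
        let consumed = CONSUMED.swap(LOCKED, Ordering::Acquire);
        if consumed == LOCKED {
            // Another thread is consuming; break out of the loop.
            return;
        }
        // Refetch the submission counter (memory order still an open question).
        let submitted = SUBMITTED.load(Ordering::Relaxed);
        if consumed == submitted {
            // Nothing to do: undo the lock and exit.
            CONSUMED.store(consumed, Ordering::Release);
            return;
        }
        // Find the enqueued updates in `consumed..submitted` and perform them
        // in order (spinning on the thread-associated queues, not shown).
        perform(consumed..submitted);
        // Unlock and loop around.
        CONSUMED.store(submitted, Ordering::Release);
    }
}
```

The unbroken index sequence is what makes the range-based consumption work: every index in `consumed..submitted` is guaranteed to show up in some thread's queue eventually.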

Describe alternatives you've considered
Is this just an MPSC channel that respects outside memory barriers without introducing its own?
It may already exist, and if not, it likely makes for a useful library. Maybe crossbeam-channel or an explicit SPSC channel library could help (or be used for the thread-associated storage, if it allows peeking).

Additional context
This is inspired by https://snarfed.org/2019-12-28_39697. Thanks to @snarfed for pointing me in this direction, that was spot-on.
I can't quite use transactional memory unfortunately, but my submitted-updates API can still benefit from a very similar implementation.

I'll label this as both work: clear and work: complicated, since the approach is straightforward in theory but packaging it nicely may not be. There may also be an existing implementation, which is going to need more research to rule out.

@Tamschi Tamschi added state: approved Approved to proceed. work: complicated A specific goal already exists, but the path there isn't certain. priority: next I'll probably get to it, eventually. work: clear A known solution is (to be) implemented. type: fix Iterations on existing features or infrastructure domain: Rust Involves Rust code. labels Nov 22, 2024
@Tamschi Tamschi moved this to Ready in `isoprenoid` Nov 22, 2024

Tamschi commented Nov 26, 2024

It seems there's no existing unbounded SPSC I can use for this. I should be able to chain ringbuf FIFOs, though, by storing them in a linked list where the producer holds a reference to the first and the consumer to the last.

  • Producer:
    • If the current buffer is full, create a new one with twice the capacity, and store the new node in the previous node's next pointer with a Release ordering. (Ensure that this can't happen after the ringbuf producer is dropped, likely with a barrier in the list node's Drop::drop.)
  • Consumer:
    • When peeking fails, check if the producer is held. If yes, fail. If not, consume the first entry in the buffer list. If the list is now empty, fail while reporting that the channel is closed.

Provide a way to add a new small buffer to trim the overall allocation?
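A single-threaded sketch of the chained-buffer growth strategy described above. Illustrative only: the real version would split producer and consumer halves across threads, use ringbuf for each node, and make `next` an atomic pointer with Release/Acquire ordering; here plain `VecDeque` nodes and my own type names stand in.

```rust
use std::collections::VecDeque;

/// One link in the buffer chain: a bounded FIFO plus a pointer to its successor.
struct Node<T> {
    buf: VecDeque<T>,
    capacity: usize,
    next: Option<Box<Node<T>>>,
}

/// Unbounded FIFO built from a linked list of bounded buffers.
struct ChainedQueue<T> {
    head: Box<Node<T>>, // consumer end (oldest node)
}

impl<T> ChainedQueue<T> {
    fn new(capacity: usize) -> Self {
        Self {
            head: Box::new(Node {
                buf: VecDeque::with_capacity(capacity),
                capacity,
                next: None,
            }),
        }
    }

    fn push(&mut self, value: T) {
        // Walk to the producer end (the real producer would hold a direct handle).
        let mut node = &mut self.head;
        while node.next.is_some() {
            node = node.next.as_mut().unwrap();
        }
        if node.buf.len() == node.capacity {
            // Full: link a new node with twice the capacity.
            let capacity = node.capacity * 2;
            node.next = Some(Box::new(Node {
                buf: VecDeque::with_capacity(capacity),
                capacity,
                next: None,
            }));
            node = node.next.as_mut().unwrap();
        }
        node.buf.push_back(value);
    }

    fn pop(&mut self) -> Option<T> {
        loop {
            if let Some(value) = self.head.buf.pop_front() {
                return Some(value);
            }
            // Current buffer exhausted: drop it and advance to the next node.
            match self.head.next.take() {
                Some(next) => self.head = next,
                None => return None, // chain empty
            }
        }
    }
}
```

Dropping exhausted nodes in `pop` is what keeps the overall allocation bounded by the live element count rather than by the historical peak, though the doubling policy still overshoots, hence the question above about adding a fresh small buffer.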
