
linearise updates more efficiently #9

Open
Tamschi opened this issue Nov 22, 2024 · 1 comment
Labels
domain: Rust (Involves Rust code.)
priority: next (I'll probably get to it, eventually.)
state: approved (Approved to proceed.)
type: fix (Iterations on existing features or infrastructure.)
work: clear (A known solution is (to be) implemented.)
work: complicated (A specific goal already exists, but the path there isn't certain.)

Comments

Tamschi commented Nov 22, 2024

Is your feature request related to a problem? Please describe.
Queueing updates in GlobalSignalsRuntime currently takes out a global exclusive lock, which is, mildly put, less than ideal.
This action can be made lock-free and barrier-free between concurrent writes.

This reduces read/write synchronisation essentially to multiple SPSC channels, which can likely use compare_exchange with a Relaxed failure ordering thanks to the unbroken series of transaction IDs (see below).

That still leaves contention on the atomic (see below), but since it shouldn't be hit in a tight loop, this is likely irrelevant: the atomic will probably be committed to a shared cache naturally before it is accessed again.
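As a quick sanity check on relying on a non-wrapping u64 transaction counter (assuming at most one increment per nanosecond; function name is mine, purely illustrative):

```rust
/// Years until a u64 counter overflows when incremented once per nanosecond.
fn u64_ns_counter_lifetime_years() -> u128 {
    // Nanoseconds in a (non-leap) year: 1e9 * 60 * 60 * 24 * 365.
    let ns_per_year: u128 = 1_000_000_000 * 60 * 60 * 24 * 365;
    u64::MAX as u128 / ns_per_year
}
```

That comes out to 584 years, which backs the "close to 600 years" estimate below.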

Describe the solution you'd like

  • when an update is submitted

    • a central atomic u64 counter is incremented via a Relaxed(?) fetch_add to produce a transaction index.
      No wrapping logic is necessary: even if the counter were incremented every nanosecond, it would take close to 600 years for a u64 to overflow.
    • The update/transaction number pair is enqueued in thread-associated storage with a Release barrier specific to that storage compartment.
  • when updates are consumed

    • A second atomic u64 is used to lock and allocate consumption. (u64::MAX acts as the lock value, which breaks out of the loop when encountered.) Without changing semantics, the frequency of this check can be reduced by attempting to perform updates only when the outermost call on a thread's context stack exits.
    • Locking can be done with an Acquire swap.
      If the number is equal to the submission counter, undo the lock and exit.
    • Otherwise, find the enqueued updates in that range and perform them in order. Spinning may be necessary here.
      (Conveniently, it's possible to ensure the overall queue sequence never skips a transaction index.)
    • Unlock and loop around, refetching the submission counter. (Memory order?)
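The two-counter protocol above could look roughly like this. A minimal sketch only: the names are mine, the flagged orderings are my assumptions (the issue itself leaves some of them open), and the thread-associated queues are elided.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Global transaction counter: each submitted update gets a unique index.
static SUBMITTED: AtomicU64 = AtomicU64::new(0);
/// Index up to which updates have been consumed; u64::MAX doubles as the lock value.
static CONSUMED: AtomicU64 = AtomicU64::new(0);

const LOCKED: u64 = u64::MAX;

/// Allocates a transaction index for a new update.
fn submit() -> u64 {
    // Relaxed may be enough here; ordering with the queued payload would be
    // established by the Release store into the thread-associated queue (not shown).
    SUBMITTED.fetch_add(1, Ordering::Relaxed)
}

/// Tries to consume all pending updates. `perform` receives the index range
/// of updates to apply in order.
fn consume(mut perform: impl FnMut(std::ops::Range<u64>)) {
    loop {
        // The Acquire swap both locks consumption and synchronises with the
        // previous consumer's Release unlock.
        let consumed = CONSUMED.swap(LOCKED, Ordering::Acquire);
        if consumed == LOCKED {
            // Another thread is consuming; break out of the loop.
            return;
        }
        // Refetch the submission counter (memory order still an open question).
        let submitted = SUBMITTED.load(Ordering::Relaxed);
        if consumed == submitted {
            // Nothing to do: undo the lock and exit.
            CONSUMED.store(consumed, Ordering::Release);
            return;
        }
        // Find the enqueued updates in `consumed..submitted` and perform them
        // in order (spinning on the thread-associated queues, not shown).
        perform(consumed..submitted);
        // Unlock and loop around.
        CONSUMED.store(submitted, Ordering::Release);
    }
}
```

The unbroken index sequence is what makes the range-based consumption work: every index in `consumed..submitted` is guaranteed to show up in some thread's queue eventually.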

Describe alternatives you've considered
Is this just an MPSC channel that respects outside memory barriers without introducing its own?
It may already exist, and if not, it likely makes for a useful library. Maybe crossbeam-channel or an explicit SPSC channel library could help (or be used for the thread-associated storage, if it allows peeking).

Additional context
This is inspired by https://snarfed.org/2019-12-28_39697. Thanks to @snarfed for pointing me in this direction, that was spot-on.
I can't quite use transactional memory unfortunately, but my submitted-updates API can still benefit from a very similar implementation.

I'll label this as both work: clear and work: complicated, since the approach is straightforward in theory but packaging it nicely may not be. There may also be an existing implementation, which is going to need more research to rule out.

@Tamschi Tamschi added state: approved Approved to proceed. work: complicated A specific goal already exists, but the path there isn't certain. priority: next I'll probably get to it, eventually. work: clear A known solution is (to be) implemented. type: fix Iterations on existing features or infrastructure domain: Rust Involves Rust code. labels Nov 22, 2024
@Tamschi Tamschi moved this to Ready in `isoprenoid` Nov 22, 2024

Tamschi commented Nov 26, 2024

It seems there's no existing unbounded SPSC I can use for this. I should be able to chain ringbuf FIFOs, though, by storing them in a linked list where the producer holds a reference to the first and the consumer to the last.

  • Producer:
    • If the current buffer is full, create a new one with twice the capacity, and store the new node in the previous node's next pointer with a Release ordering. (Ensure that this can't happen after the ringbuf producer is dropped, likely with a barrier in the list node's Drop::drop.)
  • Consumer:
    • When peeking fails, check if the producer is held. If yes, fail. If not, consume the first entry in the buffer list. If the list is now empty, fail while reporting that the channel is closed.

Provide a way to add a new small buffer to trim the overall allocation?
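A single-threaded sketch of the chained-buffer growth strategy described above. Illustrative only: the real version would split producer and consumer halves across threads, use ringbuf for each node, and make `next` an atomic pointer with Release/Acquire ordering; here plain `VecDeque` nodes and my own type names stand in.

```rust
use std::collections::VecDeque;

/// One link in the buffer chain: a bounded FIFO plus a pointer to its successor.
struct Node<T> {
    buf: VecDeque<T>,
    capacity: usize,
    next: Option<Box<Node<T>>>,
}

/// Unbounded FIFO built from a linked list of bounded buffers.
struct ChainedQueue<T> {
    head: Box<Node<T>>, // consumer end (oldest node)
}

impl<T> ChainedQueue<T> {
    fn new(capacity: usize) -> Self {
        Self {
            head: Box::new(Node {
                buf: VecDeque::with_capacity(capacity),
                capacity,
                next: None,
            }),
        }
    }

    fn push(&mut self, value: T) {
        // Walk to the producer end (the real producer would hold a direct handle).
        let mut node = &mut self.head;
        while node.next.is_some() {
            node = node.next.as_mut().unwrap();
        }
        if node.buf.len() == node.capacity {
            // Full: link a new node with twice the capacity.
            let capacity = node.capacity * 2;
            node.next = Some(Box::new(Node {
                buf: VecDeque::with_capacity(capacity),
                capacity,
                next: None,
            }));
            node = node.next.as_mut().unwrap();
        }
        node.buf.push_back(value);
    }

    fn pop(&mut self) -> Option<T> {
        loop {
            if let Some(value) = self.head.buf.pop_front() {
                return Some(value);
            }
            // Current buffer exhausted: drop it and advance to the next node.
            match self.head.next.take() {
                Some(next) => self.head = next,
                None => return None, // chain empty
            }
        }
    }
}
```

Dropping exhausted nodes in `pop` is what keeps the overall allocation bounded by the live element count rather than by the historical peak, though the doubling policy still overshoots, hence the question above about adding a fresh small buffer.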
