-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(app): non blocking enqueue_batch #7724
Merged
psychedelicious
merged 10 commits into
main
from
psyche/experiment/app/non-blocking-enqueue
Mar 3, 2025
Merged
feat(app): non blocking enqueue_batch #7724
psychedelicious
merged 10 commits into
main
from
psyche/experiment/app/non-blocking-enqueue
Mar 3, 2025
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This allows for read and write concurrency without using a global mutex. Operations may still fail they take longer than the busy timeout (5s). If we get a database lock error after waiting 5s for an operation, we have a problem. So, I think it's actually better to use a busy timeout instead of a global mutex. Alternatively, we could add a timeout to the global mutex.
Rely on WAL mode and the busy timeout. Also changed: - Remove extraneous rollbacks when we were only doing a `SELECT` - Remove try/catch blocks that were made extraneous when removing the extraneous rollbacks
This allows tags to be invalidated while mutations are executing, resolving an issue in this situation: - A long-running mutation starts. - A tag is invalidated; for example, user edits a board name, and the boards list query tag is invalidated. - The boards list query isn't fired, and the board name isn't updated. - The long-running mutation finishes. - Finally, the boards list query fires and the board name is updated. This is the "delayed" behaviour. The "immediately" behaviour has the fires requests from tag invalidation immediately, without waiting for all mutations to finish. It may cause extra network requests and stale data if we are mutating a lot of things very quickly. I don't think it will be an issue in practice and the improved responsiveness will be a net benefit.
This reverts commit ddf00bf.
This reverts commit eb6a323.
Note for reviewers - there is very little code added in this PR. The diff is large because of changed indentation. Many extraneous try/except blocks were removed, shifting code a bit. git isn't representing this very well. |
hipsterusername
approved these changes
Mar 3, 2025
4 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Most service methods in the application are blocking. For example,
enqueue_batch
blocks the main thread - preventing HTTP and socket traffic. When enqueuing very large or complicated batches, this can cause unresponsiveness.To make
enqueue_batch
non-blocking, we can make it async and run it in a thread. However, we have another source of "blocking" - a global mutex on database operations.This global mutex is required because SQLite, by default, does not support concurrent reads and writes. We painstakingly wrap all DB operations in a re-entrant lock to prevent issues with interacting with a locked database.
SQLite has an alternate mode called WAL (write-ahead logging) mode, which supports concurrent reads and allow us to treat writes as concurrent.
(Technically, in WAL mode, SQLite will queue writes using the WAL log as a buffer, so it's not true concurrency. But we can interact with the DB as it if was concurrent.)
We can also enable the SQLite busy timeout at 5s. If we are waiting on an operation for more than 5s, SQLite will error. This is plenty of time for any SQLite operation we may need to do. If something takes longer than that, we have problems and should fail loudly.
The busy timeout ends up being a better failure mode than we had before w/ the global mutex, which had no timeout and could lock up the app indefinitely if something went awry.
So, to make
enqueue_batch
non-blocking, we need to:I also cleaned up many methods that touched the DB. Many read operations caught errors and did a connection rollback, but this is not necessary. Reads do not have anything to roll back and calling rollback is a noop, so we can remove this nonfunctional logic.
There's one draw-back to WAL mode. You cannot open the DB file over the network. For example, if your host machine is different from your client, and you share the DB via SMB, you cannot connect to the DB over SMB from the client.
Long ago, this drawback was the reason to not use WAL mode. But now, I think it is a fine tradeoff. You can still SSH to the host and connect to the DB directly.
Finally, I made a change to our RTKQ implementation fixing an issue where, during a long-running mutation (e.g. enqueuing a big batch), invalidated queries are not immediately re-fetched (e.g. changing a board's name would wait until the enqueue completes before refreshing).
Related Issues / Discussions
n/a
QA Instructions
Try the app. Everything should work. When enqueuing huge, complicated batches, the app should remain responsive.
Thanks to the massive perf improvements to batch prep in #7688, the benefits from making this change will be less impactful as we should never be blocked for long. Nevertheless, this really cleans up our DB code and reduces footgun potential from mishandled mutexes.
Merge Plan
n/a
Checklist
What's New
copy (if doing a release after this PR)