Panel performance degradation while running load tests #6502
My experience is that Panel can handle an initial load of 2-3 users per second per process. It's not been a problem for my use cases so far, but it also means there are use cases I currently stay away from.
Thanks for the super helpful load testing framework. I have actually been working on moving session creation to a thread and am aiming to have a version of that in Panel 1.5.0.
One quick question after reviewing your document: was the admin page always enabled when you ran the load tests? I'm asking because the admin dashboard itself may add non-negligible overhead. After disabling it, here are the timings I get with the latest Panel:
The load timings in particular are significantly better, though of course still nowhere near where we want them to be:
Okay, my investigations showed multiple things:
One more note about threading: the actual bottleneck likely isn't even the session creation itself (particularly for an app as simple as the one we are profiling) but rather that currently all websocket connections are owned by a single thread. This means that when the server has to send data to multiple sessions simultaneously, all the messages queue up waiting to be sent to the frontend. So while sessions may be created in parallel, as long as the main thread is locked up, none of the messages will actually be sent out.
The solution will likely be to maintain a completely independent thread pool that grabs events ready to be sent off a queue and dispatches them (roughly the pattern sketched below).
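As a rough illustration of that queue-plus-worker-pool idea, here is a minimal, hypothetical sketch; the names (`OutboundMessage`, `send_to_websocket`) and the pool size are invented for illustration and are not Panel or Bokeh internals:

```python
# Hypothetical sketch of the proposed design: worker threads drain a
# shared queue of outbound websocket messages, so the main event-loop
# thread only enqueues and never blocks on socket writes.
import queue
import threading
from dataclasses import dataclass

@dataclass
class OutboundMessage:
    session_id: str
    payload: bytes

outbox: "queue.Queue[OutboundMessage]" = queue.Queue()

def send_to_websocket(msg: OutboundMessage) -> None:
    # Stand-in for the actual (potentially blocking) websocket write.
    print(f"sent {len(msg.payload)} bytes to {msg.session_id}")

def dispatch_worker() -> None:
    while True:
        msg = outbox.get()
        try:
            send_to_websocket(msg)
        finally:
            outbox.task_done()

# An independent pool of dispatcher threads, started once at server startup.
for _ in range(4):
    threading.Thread(target=dispatch_worker, daemon=True).start()

# The server thread only enqueues; writes happen concurrently in the pool.
outbox.put(OutboundMessage("session-1", b'{"event": "patch"}'))
outbox.join()  # wait until all queued messages have been dispatched
```

With this shape, a slow or congested client connection only ties up one worker rather than the thread that creates sessions and processes events.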
Try looking at Gradio. They are probably the data app framework that has had to scale the most, thanks to usage on Hugging Face. My understanding is that every event is sent to a queue that is shared across all users to enable scaling.
You can also look at Streamlit; they run every user in a separate thread. It solves some problems but creates others, from what I've heard.
The main problem, as I see it, is that our "template" is quite heavy and custom. Streamlit uses the same minimal HTML template to start the websocket; the rest is sent afterwards. It means they can respond to many more initial HTTP requests.
Their server architectures are very different, so I don't think there's much we can steal from them. I have a pretty good idea of how we can scale the applications better. Moving session creation and destruction to a thread is one part of it, but the bigger part will be to move the websocket writes to a thread.
This isn't really true, or rather it isn't what's causing the problem. The problem is that we create a session (i.e. run the user code) before the initial template is sent. You can already work around this by enabling defer loading.
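For reference, a minimal sketch of that workaround, assuming Panel's `defer_load` option (available in recent Panel releases):

```python
# Minimal sketch of the defer-loading workaround: the template is
# returned immediately, and the (slow) user code only runs after the
# page has loaded, with a loading indicator shown in the meantime.
import time

import panel as pn

pn.extension(defer_load=True, loading_indicator=True)

def expensive_view():
    time.sleep(2)  # stand-in for slow session initialization
    return pn.pane.Markdown("# Hello, World!")

# Passing a callable lets Panel defer its execution until page load.
pn.panel(expensive_view).servable()
```

Served with `panel serve app.py`, the initial HTTP response no longer waits for `expensive_view` to finish.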
I agree the server architectures are very different and you cannot copy code. Still, I believe there is a good chance of finding ideas and understanding scaling requirements in 2024.
@armaaar - thanks for putting this together! @philippjfr or @MarcSkovMadsen - is there a recommended way to define "page loaded" in such load tests for Panel? I wanted to try with the latest Panel.
If you think k6 is useful for Panel users, it would be great if you could add a mention to the docs with best practices for defining "page loaded" in a robust way (one possible approach is sketched after this comment). One more thought on this:
I think this load test is very atypical, since for normal Panel apps there are a few MB of JS/CSS loaded once and then cached by the browser, but here k6 possibly comes with a clean browser cache every time and, to a certain extent, just puts load on Panel as a static file server?
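On defining "page loaded" robustly: one hedged option (not official Panel guidance) is to drive a real browser and wait for a component the app is known to render, rather than timing only the initial HTTP response. A sketch using Playwright, where the URL and selector are app-specific assumptions:

```python
# Hedged sketch: treat the page as "loaded" only once a known component
# has actually rendered in a real browser, not when the HTML arrives.
from playwright.sync_api import sync_playwright

URL = "http://localhost:5006/app"  # hypothetical app address

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    # Wait for text the app is known to render once fully initialized.
    page.wait_for_selector("text=Hello, World!", timeout=30_000)
    browser.close()
```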
If I go to https://panel-gallery-dev.holoviz.dsp.anaconda.com/portfolio_analyzer I see page load times vary between sometimes 1 sec and sometimes 10 or 20 seconds. I also see 8 MB of resources (???) and 2.3 MB transferred, but with the browser cache on only 7 kB are transferred. Is that normal behaviour and a correct interpretation of the Chrome network tab load timings and download sizes?
That server is severely under-provisioned right now unfortunately; we're working on it, but for now those numbers may be right. I also had to update the k6 script and will make a PR for an improved benchmark on the latest Panel tomorrow or Monday.
Okay, I tracked down the main issue that was plaguing the particular test case above. Specifically, the problem was that during session cleanup Bokeh was calling `gc.collect`, which blocks the server while it runs. While not a perfect fix, avoiding that did make a big difference to the timings:
The average loading time is now well below 1 second and even the p95 is "only" 1.5 seconds.
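To illustrate why a forced garbage collection on every session teardown hurts, here is a quick, self-contained timing sketch (the numbers are machine-dependent):

```python
# With many live objects, a full gc.collect() can take tens of
# milliseconds, all of it spent blocking the thread that calls it;
# on a server, that is the event loop handling every session.
import gc
import time

# Simulate a server process holding many live Python objects.
junk = [{"i": i, "payload": list(range(10))} for i in range(500_000)]

start = time.perf_counter()
gc.collect()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"gc.collect() took {elapsed_ms:.1f} ms")
```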
When we tried to measure the performance of Panel apps using k6, we noticed a huge degradation in performance with only 10 users using the app simultaneously, even after enabling defer loading and refactoring pages into classes.
We created a "Hello, World!" Panel page in isolation and ran our load tests against it, which showed huge performance degradation the more the Panel server was hit. We suspect that a large part of this comes from how Panel handles sessions: apparently session destruction (and maybe initialization as well) does not use threads and blocks the whole server.
Here is a repository that demonstrates the performance degradation of a Panel app being hit by 10 users simultaneously: https://github.com/armaaar/panel-performance
ALL software version info
Description of expected behavior and the observed behavior
Complete, minimal, self-contained example code that reproduces the issue
Please refer to our test repository here: https://github.com/armaaar/panel-performance. The README documents installation, how to run the tests, and the results.
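For convenience, this is roughly the shape of the "Hello, World!" page being tested (an illustrative stand-in; the authoritative version is in the repository):

```python
# app.py: minimal Panel page of roughly the shape being load-tested;
# run with `panel serve app.py` and point the k6 script at it.
import panel as pn

pn.extension()

pn.pane.Markdown("# Hello, World!").servable()
```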
Screenshots or screencasts of the bug in action
Everything can be found in the repo readme: https://github.com/armaaar/panel-performance/blob/master/README.md