Some containers are not being monitored continuously and disappear from the graphs (for example "OpenProject") #58
Comments
Please see #50. It seems to be fixed by upgrading to the latest version of Docker, or possibly just by restarting the Docker service. I'd love to get to the bottom of this and find the cause if it's fixable on this end. It would be helpful to know:
|
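Judging from the answers in the next comment, the questions concerned the Docker version and the last restart. A quick way to gather both (a minimal sketch; standard Docker and coreutils commands):

```bash
# Docker Engine (server) version.
docker version --format '{{.Server.Version}}'
# How long the host has been up since the last restart.
uptime
```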
Hello, thank you for such a quick answer. Unfortunately I cannot update, since it's a Synology NAS and custom Docker updates bring a lot of other issues. Here are my answers:
2. Last restart was 5 days ago. In any case, it's good to know that the latest update of Docker fixes it, since at some point in the future it will be fixed on my NAS. |
It looks like it could be a problem with the Docker Engine API in version 24. Not all machines on 24 have the issue, though, so it's possible that a restart could help, at least for a bit. If someone comes along with the same problem on version 27, I'll put it at the top of my list. But right now it's low priority, since it seems to be fixable with an upgrade. |
Let me know if you can test this again. I'm curious if any of the updates we've made will fix this. Also please try upgrading the Synology Container Manager if they've released an update. Thanks! |
This is still happening to me: I have lots of gaps in both of the Docker graphs. I updated to the beta Synology Container Manager (see version below), but the problem still occurs.
|
It's a bug with the old version that Synology uses. I think I figured out a workaround and will try to release it tomorrow. |
Let me know if 0.5.1 fixes this. I can't test with Synology, but I was able to replicate and fix the issue using an LXC container running Docker 24.0.2. |
@ektorasdj Awesome! You might want to unsubscribe from thread notifications while I troubleshoot with Nathan. @nathang21 First of all, impressive resource utilization. You're definitely getting your money's worth. The problem is indeed related to the number of containers; 24.0.2 seems to have a bug with the stats endpoint when handling concurrent requests. How many containers are you running in total, and do you see the same number of containers populate each time? If so, how many? Can you try running the agent with the debug logging env var? I'm short on time tonight, but tomorrow I can give you a little bash script to verify that it's the same issue I was seeing. |
Sounds like progress! @henrygd Thanks for noticing :) I will say I do run BOINC, which uses a limited set of spare resources to donate compute power towards research efforts; that smooths out the CPU load to around 75% on average (as configured). I have 60 containers running, and that number is fairly static (i.e. no dynamically scaling workloads), but it is slowly trending up as I discover more things to self-host in my addiction/hobby. No rush at all. I just added the env variable, and I see the following logs, which look expected (followed by timeout spam):
|
Cool, I'm going to look into setting that up. I remember doing folding@home on my PS3 back in the day.

Here are two bash scripts which should help figure out the problem. They both send 10 requests for container stats to Docker. One does it in sequence and the other in parallel.

First run the curl command to make sure it returns the stats properly. I'm using container ID 323201cdd8ac because it was in your logs, but swap it out with something else in the scripts if needed:

```bash
curl --unix-socket /var/run/docker.sock -H "Content-Type: application/json" "http://localhost/containers/323201cdd8ac/stats?stream=0&one-shot=1"
```

**sequence.sh**

Save this as sequence.sh and run it. It should take 9 seconds to complete.

```bash
#!/bin/bash
for i in {1..10}
do
  curl -s --unix-socket /var/run/docker.sock -H "Content-Type: application/json" "http://localhost/containers/323201cdd8ac/stats?stream=0&one-shot=1"
done
```

**parallel.sh**

Save this as parallel.sh and run it. It should take 1 second to complete. Also try changing 10 to 60; it should still take only 1 second.

```bash
#!/bin/bash
for i in {1..10}
do
  curl -s --unix-socket /var/run/docker.sock -H "Content-Type: application/json" "http://localhost/containers/323201cdd8ac/stats?stream=0&one-shot=1" &
done
wait
```

Please take your time, and I don't need the entire output. Just let me know if you see anything different in the timings. Thanks! |
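A quick way to compare the two scripts' wall-clock times (a minimal sketch; assumes the scripts above were saved as sequence.sh and parallel.sh):

```bash
chmod +x sequence.sh parallel.sh
# Discard the JSON output and keep only the timings (printed to stderr by `time`).
time ./sequence.sh > /dev/null   # expected: roughly 9s on a healthy engine
time ./parallel.sh > /dev/null   # expected: roughly 1s; much longer suggests requests are queueing
```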
Same yeah, well feel free to reach out if you run into trouble. The linuxserver.io image is, I think, the best for getting it running in Docker; it emulates the old-school desktop app.

I ran them all a few times and was noticing some different results. Hope this helps, lmk if you need more tests.

**curl test**

**sequence test**

**parallel test 10**

**parallel test 60**
|
Thanks for doing that. Really strange results. Not sure how making 10 requests could take almost 5 minutes, or why the timings are so spread out. My first thought is to check the health of the containers. Maybe one is stuck in a boot loop or has some other issue that is causing problems. Try checking the container statuses (see the sketch below), or run ctop and look for anything irregular. |
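For checking container health from the CLI, a minimal sketch using standard docker ps filters (an illustration, not necessarily the exact command meant above):

```bash
# Containers stuck restarting (possible boot loop).
docker ps -a --filter "status=restarting"
# Containers whose healthcheck is currently failing.
docker ps --filter "health=unhealthy"
```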
Thanks for sharing ctop, really great tool I hadn't seen before. Indeed, I was noticing the past few days that everything was a little slow; I think my disk I/O was overloaded. I've taken significant steps to customize a bunch of my containers to prioritize using RAM instead of writing to disk, as well as reducing writes in general. Everything feels more stable now. However, the Docker graphs in Beszel are still full of gaps (more gaps than actual data - Edit: actually the graphs are now improved, more data than gaps, but still frequent gaps), and I just re-ran the same scripts; here is the trimmed output:
Does this look any better? |
I increased the timeout to 2100ms in 0.5.3, so that's probably helping as well. In the next release I'll use two different timeouts depending on the Docker version, and bump the older versions to eight seconds or so. That may finally fix it. When I was testing Docker 24, it was in an otherwise empty LXC container, so I was seeing the delay, but it was always consistent. In your case the inconsistency may be due to other programs also accessing the API and creating a queue. Which, again, is a bug that does not happen in 25+. |
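As a rough way to see whether an individual stats request fits inside a given deadline, curl's --max-time flag can mimic the agent's timeout (a minimal sketch; 2.1s mirrors the 2100ms mentioned above, and the container ID is the one from the earlier scripts):

```bash
# curl exits non-zero (code 28) if the request exceeds --max-time.
curl -s --unix-socket /var/run/docker.sock --max-time 2.1 \
  "http://localhost/containers/323201cdd8ac/stats?stream=0&one-shot=1" > /dev/null \
  && echo "completed within 2.1s" \
  || echo "timed out - the agent would record a gap for this container"
```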
Just an update after the latest release: unfortunately I'm now seeing gaps in all of the graphs, not just the Docker ones, which appears to be a regression. Will share logs and more details tomorrow when I'm at my computer. Edit: see attached screenshot + snippet of logs. Let me know if this is helpful or if you need more specific details.
|
I added an env var so you can tweak the timeout yourself. It looks like you're getting queue pileups that overlap time windows from Beszel requests. I just thought of one other thing to try that I'll put in the next release. |
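As an illustration of passing such a timeout to a containerized agent, a minimal sketch; DOCKER_TIMEOUT is a purely hypothetical placeholder for the real variable name, and other required agent settings are omitted:

```bash
# DOCKER_TIMEOUT is a placeholder name, not confirmed from the source;
# the key/port settings the agent normally needs are omitted for brevity.
docker run -d --name beszel-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -e DOCKER_TIMEOUT=15s \
  henrygd/beszel-agent:latest
```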
Cool thanks, I've started playing around with this but so far haven't noticed a big difference. I assume there are diminishing returns to increasing the timeout, but I'm not quite sure how to think about it. I've tried 15s and 30s respectively, but both still had gaps (now in all graphs as mentioned previously, not just the Docker graphs). |
Let me know if you have any more success with 0.6.2. 🤞 I'd also recommend pausing the system for a bit in case the API queue is really jammed up. If you want to keep tweaking the timeout, go ahead. |
Try dropping it back down. Let me know if you still have gaps after that. I have a couple other things to try. |
No worries, let me know if you start getting gaps in the non-docker metrics again. Or if anything else changes in a significant way. I'll continue tweaking things on this end. |
Still seeing occasional gaps in both non-Docker and Docker metrics, the latter being much more common. Overall it's definitely much more stable, and it may be related to the I/O bottlenecks on my system discussed previously (which I may continue to optimize). |
Thanks for the update. I'll have another possible fix in the next release. I think the most likely explanation is that another service you host is also requesting info from Docker. Every minute, the agent asks Docker for the stats. Sometimes the timing is bad and it hits a queue created by the other service. If this is the case, it may be impossible to fully fix, but we can try to make it as stable as possible. |
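One way to check whether another service is holding connections to the Docker API (a sketch assuming the iproute2 ss tool is available on the host):

```bash
# List processes with open connections to the Docker socket.
sudo ss -xp | grep -F 'docker.sock'
```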
@nathang21 I decided not to include the tweaks for old Docker versions in the 0.7.0 release because I got sidetracked with localization and didn't have time to test it as much as I wanted to. If you want to test manually, you can do this.

Point your compose file at a locally loaded image:

```yaml
# image: "henrygd/beszel-agent:latest"
image: "beszel-agent:latest"
```

Then download and load the test image, and recreate the container:

```bash
wget https://henrygd-assets.b-cdn.net/beszel/bin/beszel-agent-image.zst
docker load -i ./beszel-agent-image.zst
docker compose up -d
```

This may also be a way to test different things in a more controlled way, rather than including the changes in every new release. |
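A small sanity check to confirm the test image loaded under the expected tag and that the recreated container is using it (standard docker CLI; assumes the container is named beszel-agent):

```bash
# Should list a local image tagged beszel-agent:latest.
docker image ls beszel-agent
# Confirm the running container was created from the local image.
docker inspect --format '{{.Config.Image}}' beszel-agent
```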
Hello,
Thank you for this nice tool. Unfortunately, I have some containers (created through Portainer) that are not showing continuously in the graphs. For example, check OpenProject, which runs continuously with no issues, yet it disappears from the graphs for some time.
