Kestrel stops processing requests for a period #2986
It looks like you're blocking thread pool threads by calling Task.Wait in a bunch of places:
Are you doing synchronous reads/writes anywhere? Is your custom body logic reading from the request body synchronously?
Yes, there is one instance of a non-async read (I believe there was a reason to do it this way originally). To ensure there is no thread pool starvation I do have the thread pool set to a very large size. Thanks for the advice. I will remove the synchronous stream read logic and see if it helps.
Maybe look at the number of items queued to the thread pool:
Hmm that may not work for queued tasks. Checking something.

This is not looking too happy, that's for sure.
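For reference, thread pool starvation can be made visible without a core dump. On newer runtimes (.NET Core 3.0+) `ThreadPool.PendingWorkItemCount` exposes the queue length directly; on 2.x, one rough approximation (a hypothetical helper, not something from this thread) is to time how long a queued work item waits before it actually runs:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class ThreadPoolMonitor
{
    // Queues a trivial work item and reports how long it sat in the
    // thread pool queue before executing. Near zero on a healthy pool;
    // seconds under starvation.
    public static Task<TimeSpan> MeasureSchedulingDelayAsync()
    {
        var sw = Stopwatch.StartNew();
        var tcs = new TaskCompletionSource<TimeSpan>();
        ThreadPool.QueueUserWorkItem(_ => tcs.SetResult(sw.Elapsed));
        return tcs.Task;
    }
}
```

Sampling this periodically (e.g. from a dedicated, non-pool thread) gives a simple starvation signal: a healthy pool reports delays in the microsecond-to-millisecond range, while a starved one climbs into hundreds of milliseconds or more.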
Starvation it is! It's likely Kestrel can't serve requests because it can't even schedule the callback to parse the incoming HTTP requests. If you can't change the calling code, you can buffer the incoming body so you won't kill the thread pool.

Option 1: Change the Stream reading to be async.

Option 2: Buffer the incoming Streams so you can keep the synchronous stream reading code.

```csharp
private static async Task HandleConnection(HttpContext context)
{
    var request = context.Request;
    var response = context.Response;

    // incomingToken is resolved earlier (not shown in this extract)
    var client = UserManager.GetClientFromToken(incomingToken);
    if (client == null)
    {
        response.StatusCode = 403;
        return; // an async Task method returns directly, not Task.CompletedTask
    }

    // Buffer the whole request body up front so later synchronous reads hit
    // memory/disk rather than the network and cannot block a pool thread on I/O.
    context.Request.EnableBuffering();
    await context.Request.Body.DrainAsync(CancellationToken.None);
    context.Request.Body.Seek(0L, SeekOrigin.Begin);

    var tcs = new TaskCompletionSource<bool>();
    client.SetStreams(request.Body, response.Body, (int)(request.ContentLength ?? 0), tcs);

    // The request is read by the client synchronously here.
    // The response is processed and written to by a separate worker pool within one second of this call.
    // After the worker finishes using the response, it calls tcs.SetResult(true) to complete the task awaited below.
    await tcs.Task;
}
```
DrainAsync is in the Microsoft.AspNetCore.WebUtilities namespace.
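For completeness, Option 1 might look something like the sketch below. The client's actual read loop is not shown in this thread, so `ReadBodyAsync` and its shape are assumptions, not the poster's code:

```csharp
using System.IO;
using System.Threading.Tasks;

// Replace a blocking Stream.Read loop with ReadAsync so the thread pool
// thread is released back to the pool while waiting for network data.
private static async Task<byte[]> ReadBodyAsync(Stream body, int contentLength)
{
    var buffer = new byte[contentLength];
    var offset = 0;
    while (offset < contentLength)
    {
        var read = await body.ReadAsync(buffer, offset, contentLength - offset);
        if (read == 0)
            throw new EndOfStreamException("Client disconnected before the body completed.");
        offset += read;
    }
    return buffer;
}
```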
Thanks again for the options. I will make the changes and report back after testing with production workload tomorrow!
On looking back over the data I collected I noticed that I was graphing the […].

I am also kicking myself for not realising the cause here, as I spent a good month working on a large refactor to async code in another project which encountered similar issues. I have since re-deployed and things are looking to be 100% stable (I converted one […]).

It's a wonder in this async world we live in that there are no debug-level diagnostics to warn the developer of thread pool starvation, as it's easy to run into and hard to diagnose if you don't know what you're looking for.

Anyways, all fixed. Happily serving ~2,000 req/s with no signs of struggling 👍.
I have been migrating a .NET Framework 4.x project to .NET Core. In the process I ran into an issue with HttpListener not working as expected, so I migrated my code to use Kestrel.

The server application is a high-performance game server that handles request and response bodies in a completely custom way. It processes around 3,000 req/s. After around 5-30 minutes of correct execution, the server goes into a state where it is no longer processing incoming requests. Eventually it will recover from this state, but by this point clients believe they have lost their connections.

I have been debugging over the last few days and have ruled out my code being the direct issue, but I have also been unable to reproduce the problem under non-production load. I did manage to get a core dump in the hanging state and you can find the log of sos EEstack here. I am able to share any further lldb/sos output should it be relevant.

While I fully intend to continue investigation on my own, I want to post this here for visibility in case someone working on the project can guide me in the right direction based on the thread call stacks above.

The project was built against .NET Core 2.1.4 / Kestrel 2.2.0-preview2-35157 on a macOS host using dotnet publish, running on an Ubuntu host (rid linux-x64).

Note that I was seeing similar behaviour before migrating from HttpListener, so it is quite possibly something at a lower level or unrelated to Kestrel directly.

As the project this is being used in is private, I cannot link to the source, but here is an extract showing how I am using Kestrel:
Thanks in advance for any help with this issue.