Kestrel: Deadlocked in SocketOutput? #1278
Comments
@physhi do you pass a cancellation token to any WriteAsync calls? What does your application look like? Also, what version of Kestrel are you using? Edit: actually, that stack looks like the request aborted token is being passed into WriteAsync. |
Yep there's a deadlock... |
I'm using version 1.1.0 of .NET Core, running on Windows. My application is essentially a file store with a lot of big files, and I have bandwidth throttling, content range support and a bunch of content transformations implemented in code. I try to pass the cancellation token as far down as possible so that I can terminate the request if the user has disconnected from the server. Currently, because of this deadlock issue, I've had to implement a watchdog process with very aggressive ping times that recycles Kestrel as soon as the server stops processing a request. |
The only way to avoid it would be to stop passing a cancellation token. It's something we need to fix for 1.1.1. The problem is that we're disposing the cancellation registration in the write callback under the context lock:
KestrelHttpServer/src/Microsoft.AspNetCore.Server.Kestrel/Internal/Http/SocketOutput.cs, line 429 in b46e48f
KestrelHttpServer/src/Microsoft.AspNetCore.Server.Kestrel/Internal/Http/SocketOutput.cs, line 299 in b46e48f
|
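For readers following along, the pattern described in the comment above (disposing a cancellation registration while holding a lock that the cancellation callback also wants) can be sketched in isolation. This is only an illustrative sketch, not the actual SocketOutput code: `contextLock` borrows its name from the comment, and every other name is made up. It assumes the documented behavior that `CancellationTokenRegistration.Dispose()` waits for a callback that is currently running on another thread. To keep the demo from hanging, the callback uses `Monitor.TryEnter` with a timeout where real code would use a plain `lock`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the "context lock" mentioned in the thread.
        var contextLock = new object();
        var cts = new CancellationTokenSource();
        var callbackStarted = new ManualResetEventSlim();
        bool lockAcquired = true;

        // Cancellation callback that needs the same lock the writer holds.
        var registration = cts.Token.Register(() =>
        {
            callbackStarted.Set();
            // A plain lock (contextLock) here would block forever in this demo;
            // the timeout lets the program finish and report what happened.
            bool taken = Monitor.TryEnter(contextLock, TimeSpan.FromSeconds(2));
            lockAcquired = taken;
            if (taken)
            {
                Monitor.Exit(contextLock);
            }
        });

        Task canceller;
        lock (contextLock)
        {
            // Another thread cancels the token while we hold the lock.
            canceller = Task.Run(() => cts.Cancel());
            callbackStarted.Wait();

            // Dispose() waits for the in-flight callback to return, but the
            // callback is waiting for contextLock, which this thread holds.
            registration.Dispose();
        }

        canceller.Wait();
        Console.WriteLine(lockAcquired
            ? "callback acquired the lock"
            : "callback timed out waiting for the lock held across Dispose()");
    }
}
```

If the runtime behaves as assumed, the callback can only give up by timing out, which is the stand-in for the hang. Moving `registration.Dispose()` outside the `lock` block would let the callback take the lock once the writer releases it; whether that matches the eventual fix in Kestrel is not something this sketch claims.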
@physhi Are you manually calling |
Yep, I have a code path that calls abort, but looking at my logs, I don't see it being called. |
Putting this in 1.1.1 so that it's triaged with the rest of 1.1.1 items. |
@physhi have you tried not passing in the token? |
@davidfowl I tried not passing the token, but the problem still persists. |
@physhi can you take a snapshot of the threads when it hangs without the token? |
It's difficult to take a snapshot, as I see the hangs in a production service and it gets recycled as soon as a hang is detected. I just know that the service was restarted and that's it. |
We're fixing this issue, but this deadlock can't happen if no cancellation token is passed into WriteAsync. It's possible there's another hang, and that what you're experiencing won't be fixed by #1281. |
@davidfowl, let me deploy the proposed fix and see if it fixes the issue. I'll know it worked if there are no hangs in the next 24 hours. |
So I am trying to see if this is related to what I have seen a couple of times over the last couple of weeks. I have a fairly simple app running as an Azure App Service on a single Large instance. It has sometimes gone more than a week without any issues, serving about 1.5 million pages a day, and then it stops processing requests. If I restart the app, it starts without any issues and usually hums along after that. Looking in the log, I see the following error just before it stops processing anything: 2017-01-12 21:47:28.036 +00:00 [Warning] Unable to bind to http://localhost:17739 on the IPv6 loopback interface. |
That error is harmless. It's hard to know whether it's the same issue without more information; a process dump taken while the application hangs would confirm it. |
There seems to be a deadlock in SocketOutput (and Kestrel stops accepting new connections). My server locks up randomly. See below the two sets of threads that may be causing the deadlock. I looked at the code, and these two sets of stacks do seem to have the possibility of deadlocking.
I took 2 other dumps and saw the same pair of stacks getting stuck.
Since I can't repro the deadlock, I can't really tell what's causing these threads to deadlock. This issue could be related to #1267, but the big difference is that in my case CPU goes down to zero.