This repository was archived by the owner on Dec 19, 2018. It is now read-only.

webHost.Dispose() does not complete #918

Closed
izaaz opened this issue Jan 20, 2017 · 13 comments

Comments

@izaaz

izaaz commented Jan 20, 2017

Hi,

We have a .NET Core WebListener service deployed in Service Fabric. When we upgrade the service version, SF stops the application by calling CloseAsync before moving it to the new version. This is our implementation of CloseAsync:

        Task ICommunicationListener.CloseAsync(CancellationToken cancellationToken)
        {
            etwEventSource.ServiceMessage(serviceContext, "StartClose");
            try
            {
                webHost?.Dispose();
            }
            catch (Exception ex)
            {
                etwEventSource.ServiceMessage(serviceContext, "CloseService-Exception", ex);
                // no op
            }
            etwEventSource.ServiceMessage(serviceContext, "StopClose");

            return Task.FromResult(true);
        }

What we see is that when the service is receiving traffic and Service Fabric calls CloseAsync, the Dispose call never completes (no exception is thrown), so Service Fabric waits 15 minutes before killing the application. When the service is not receiving any traffic, Dispose completes and everything works as expected.

Any idea why Dispose might not return when the service is receiving traffic?

Thanks,
Izaaz

project.json:

    "Microsoft.ApplicationInsights.AspNetCore": "1.0.2",
    "Microsoft.AspNet.WebApi.Client": "5.2.3",
    "Microsoft.AspNet.WebApi.Core": "5.2.3",
    "Microsoft.AspNetCore.Mvc": "1.0.1",
    "Microsoft.AspNetCore.Owin": "1.0.0",
    "Microsoft.AspNetCore.Routing": "1.0.1",
    "Microsoft.AspNetCore.Server.Kestrel": "1.0.1",
    "Microsoft.AspNetCore.Server.WebListener": "1.1.0",
    "Microsoft.ServiceFabric": "5.3.301",
    "Microsoft.ServiceFabric.Data": "2.3.301",
    "Microsoft.ServiceFabric.Services": "2.3.301"

@Tratcher
Member

WebListener may be waiting for current requests to complete. Are there any long running requests active when this happens?

@davidfowl
Member

Do you have any logs? Turning logging up to verbose might help you see what's happening. You could also attach a debugger or take a dump to see what the active threads are doing.

@izaaz
Author

izaaz commented Jan 20, 2017

I have a repro with a simple WebListener application. The basic web app is here:

https://github.com/izaaz/SFTestApp

I deployed this to my local cluster and generated fake traffic using the following script.

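    using System;
    using System.Net;
    using System.Threading.Tasks;

    class LoadGenerator
    {
        // Sends batches of 10 parallel requests in a loop until the process is stopped.
        static void Main()
        {
            while (true)
            {
                Parallel.For(0, 10, i =>
                {
                    var request = WebRequest.Create("http://localhost:8416/api/values");
                    using (var response = request.GetResponse())
                    {
                        Console.WriteLine(response.Headers);
                    }
                });
            }
        }
    }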

While this program is running, upgrade the SF application to a new version. You'll notice the cluster go into a warning state.

Meanwhile, I am going to try taking a dump of the process and see what might be going on.

@davidfowl
Member

You shouldn't need a dump, right? Just pause the threads and look at the call stacks. The Parallel Stacks window is a great tool for this if you can reproduce it in VS.

@izaaz
Author

izaaz commented Jan 20, 2017

I wasn't able to attach the debugger to the application; I couldn't find it in the list of processes.

Anyway, I took a dump with ProcDump. Here are the parallel stacks.

[Image: Parallel Stacks view of the dump]

Is there a symbol server to get the pdb files for Service Fabric?

@masnider

Should just be on symbols

@izaaz
Author

izaaz commented Jan 20, 2017

@masnider I'm not sure I understood what you meant. The debugger wasn't able to find them on the default "Microsoft Symbol Servers".

@izaaz
Author

izaaz commented Jan 20, 2017

Just to add: when I use Kestrel instead of WebListener, I don't see the issue anymore.

@n777ty-zz

This looks like a deadlock of sorts.

The thread calling webHost.Dispose owns a lock on System.Collections.Generic.Dictionary`2[[Microsoft.Extensions.DependencyInjection.ServiceLookup.IService, Microsoft.Extensions.DependencyInjection],[System.Object, mscorlib]] and is now waiting for requests to drain.

However, at least one of the requests that is supposed to drain is trying to acquire that same lock, so it is stuck in:

    000000001b79ce98 00007ffd977a6c24 [HelperMethodFrame: 000000001b79ce98] System.Threading.Monitor.Enter(System.Object)
    000000001b79cf90 00007ffd292341bb DynamicClass.lambda_method(System.Runtime.CompilerServices.Closure, Microsoft.Extensions.DependencyInjection.ServiceProvider)
    000000001b79d1f0 00007ffd28eb2964 Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(System.IServiceProvider, System.Type)
    000000001b79d230 00007ffd28eb287b Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[System.__Canon, mscorlib]
    000000001b79d270 00007ffd29345b4c Microsoft.AspNetCore.Mvc.ObjectResult.ExecuteResultAsync(Microsoft.AspNetCore.Mvc.ActionContext)
    000000001b79d2b0 00007ffd293454e9 Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker+d__30.MoveNext()
    000000001b79d300 00007ffd293453f5 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker+d__30, Microsoft.AspNetCore.Mvc.Core]](d__30 ByRef) [f:\dd\ndp\clr\src\BCL\system\runtime\compilerservices\AsyncMethodBuilder.cs @ 322]
    000000001b79d3b0 00007ffd29345353 Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeResultAsync(Microsoft.AspNetCore.Mvc.IActionResult)
    000000001b79d460 00007ffd2933ae3b Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State ByRef, Scope ByRef, System.Object ByRef, Boolean ByRef)
    000000001b79d5f0 00007ffd293450a1 Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker+d__28.MoveNext()
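
The shape of this hang can be modelled without ASP.NET at all. The sketch below is a hypothetical model, not the real Microsoft.Extensions.DependencyInjection or WebListener code: `containerLock` stands in for the lock on the service dictionary, the "shutdown" task takes it and then waits for in-flight requests, and the "request" task needs the same lock before it can finish. `DrainDeadlockDemo` and `ShutdownCompletes` are illustrative names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the hang above; not the actual DI container or WebListener code.
public static class DrainDeadlockDemo
{
    // Returns true if the simulated shutdown finished within observeFor.
    public static bool ShutdownCompletes(TimeSpan drainTimeout, TimeSpan observeFor)
    {
        // Stands in for the lock on the container's resolved-services dictionary.
        var containerLock = new object();
        var lockTaken = new ManualResetEventSlim(false);
        // One request is in flight when shutdown begins.
        var inFlight = new CountdownEvent(1);

        // The in-flight request: it still has to resolve a service,
        // which (like GetRequiredService above) needs the container lock.
        Task.Run(() =>
        {
            lockTaken.Wait();          // ensure shutdown grabs the lock first
            lock (containerLock)
            {
                // resolve the service
            }
            inFlight.Signal();         // the request completes and "drains"
        });

        // The thread calling Dispose: holds the lock while waiting for requests to drain.
        var shutdown = Task.Run(() =>
        {
            lock (containerLock)
            {
                lockTaken.Set();
                // Timeout.InfiniteTimeSpan models a drain with no timeout.
                inFlight.Wait(drainTimeout);
            }
        });

        return shutdown.Wait(observeFor);
    }
}
```

With `Timeout.InfiniteTimeSpan` as the drain timeout, `ShutdownCompletes` returns false and both tasks stay blocked for good, which matches the behaviour in the dump. With a finite drain timeout, the shutdown gives up waiting, releases the lock, and the request can then finish.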

@davidfowl
Member

@Tratcher it seems like WebListener waits indefinitely for requests to drain. There should be a timeout, like in Kestrel.
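
A drain with a timeout could look roughly like the following. This is a minimal sketch, not WebListener's MessagePump or Kestrel's actual implementation; `RequestTracker`, `RequestStarted`, `RequestCompleted` and `StopAndDrain` are hypothetical names. The server counts in-flight requests and, on shutdown, waits for the count to reach zero only up to a timeout, so a stuck request cannot block Dispose forever. It assumes the server has already stopped accepting new requests before calling `StopAndDrain`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of a bounded request drain; not WebListener's or Kestrel's actual code.
public sealed class RequestTracker
{
    private readonly object _gate = new object();
    private readonly TaskCompletionSource<bool> _drained =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private int _outstanding;
    private bool _stopping;

    // Call when the server starts processing a request.
    public void RequestStarted()
    {
        lock (_gate) { _outstanding++; }
    }

    // Call when a request finishes; completes the drain if this was the last one during shutdown.
    public void RequestCompleted()
    {
        lock (_gate)
        {
            _outstanding--;
            if (_stopping && _outstanding == 0)
            {
                _drained.TrySetResult(true);
            }
        }
    }

    // Marks the tracker as stopping and waits for in-flight requests,
    // but only up to the timeout. Returns true if everything drained in time.
    public bool StopAndDrain(TimeSpan timeout)
    {
        lock (_gate)
        {
            _stopping = true;
            if (_outstanding == 0)
            {
                _drained.TrySetResult(true);
            }
        }
        return _drained.Task.Wait(timeout);
    }
}
```

Returning false instead of blocking lets the caller decide what to do with requests that are still running (log them, abort them, or let the process exit); which of those is right is a design choice this sketch leaves open.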

@davidfowl
Member

Opened aspnet/HttpSysServer#298 as this isn't a hosting issue.

@n777ty-zz

Please make it configurable ;)

@davidfowl
Member

Unfortunately, I can't think of a great workaround for this one. You'd have to copy the MessagePump code and tweak this method https://github.com/aspnet/HttpSysServer/blob/5fca3b0022a5125e0dad98004abd4e01983f98fe/src/Microsoft.AspNetCore.Server.WebListener/MessagePump.cs#L222-L226.

Then you could register it as the server for your application.
