-
Notifications
You must be signed in to change notification settings - Fork 524
ASP.NET Core MVC,CPUs Become Higher Slowly #2937
Comments
My first thought is that you have a memory leak, and the GC has to work more, or page to disk. Also maybe try to get two traces, one when it's low CPU, and one when CPU is much higher. We can then to a diff and see what changed. |
Hi,@sebastienros, memory is always stable. |
Are you using If you can please share the double traces I suggested you to take. |
Hi,@sebastienros We use HttpClient, |
@goldeneyezhang what transport do you use Sockets or Libuv ? |
@marfenij Sockets |
@gzpbx thanks, so we stay with libuv |
I see a lot of threads is blocked ,use pstack,show epoll_wait pstack 80514 |
Your high CPU trace shows the swapper taking up most of the time spent (if I'm reading this correctly) cc @vancem |
@davidfowl the swapper is actually just the Idle loop. If you look at the functions where the CPU is being spent in swapper you will see | ||||+ kernel.kallsyms!arch_cpu_idle Which I am assuming is where the OS lets CPUS run when it has nothing better. If you filter to just what is going on in the dotnet process, you see that 3/4 of the time is spend in System.private.Corelib (the main framework DLL), but we don't have symbols for methods inside. Including @brianrob who can help with understanding why that did now work. However we should make sure you are following the instructions https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md. In particular you have to set export COMPlus_PerfMapEnabled=1 so that these environment variables get inherited into the dotnet process. If you DIDN'T do this it will have the symptoms that you describe. Case of the variables matters (as always on Linux). Finally there is a potential work-around using a very new (not even documented yet), feature of .NET 2.1 called EventPipe. You use it by setting some the environment variable. export COMPlus_EnableEventPipe=1 Then start the process, then let it run (it is OK that it runs a long time). Then you have to shut the process down cleanly (if you are and ASP.NET app, they have a Ctrl-C hander that does this. You shoud have a *.netperf file generated. |
Ah good to know 😄 |
@vancem we know why we can't get the symbols. Use the export ComPlus_XXXX=1 and then restart supervisor that can't get the symbols , we must use the same shell to execute export and dotnet run. For now , we have to restart the program every day... yes on production ... |
@vancem Are you sure about "export COMPlus_EnableEventPipe=1" when I execute this export thing and then start the process but failed to start the port (can't access from url). The other 2 export commands are OK. And the Ctrl+C doesn't OK either. |
@gzpbx Do you think you could share the application with me? I could try to repro the issue and get the appropriate traces. If possible you can send me a link by email (first.last at microsoft.com) |
@vancem I got the *.netperf file . Phase 1: All user requests on this VM |
@sebastienros, it's not that easy, the application can't run alone, it needs other services (some of them are WCF services). It's an internet web site. |
@gzpbx You seem to have a logger that's blocking thread pool threads on a Task.Wait. What is this?
|
Are you also making outbound http requests on a timer interval? What things does your application do when they are no requests? |
Ok trying to narrow it down:
|
@davidfowl , You seem to have a logger that's blocking thread pool threads on a Task.Wait. What is this? |
@gzpbx It's not really exactly long running from the looks of it: This dispatch suggests that you kick off a work item to the thread pool then wait on it synchronously on another thread. What does this loop look like? BTW I'm not saying this is the problem, I'm just looking for things that are happening off request threads. |
} Hi,@davidfowl , we have a log queue, if no log,then delay. |
Change this: Task.Delay(sleepTime).Wait(); To: Thread.Sleep(sleepTime); Task.Delay is only better if you're going to await it. If you're blocking a thread then I'd recommend sleeping. Where is ProcessItemLoop kicked off? Is it a dedicated Thread? |
Thanks,@davidfowl,we use it like this, Task.Run(() => ProcessItemLoop()); |
For long running work that blocks, you should spawn a thread manually or use Task.Factory.StartNew with TaskCreationOptions.LongRunning. var thread = new Thread(ProcessItemLoop)
{
Name = "Queue Processing Loop",
IsBackground = true
};
thread.Start(); OR Task.Factory.StartNew(ProcessItemLoop, TaskCreationOptions.LongRunning); |
@davidfowl , The fix(change long running task and remove WingonConfig.ReadConfig) doesn't work. The cpu still go higher slowly, so ... |
Can you collect a new profile? |
This time I collect 3 profiles (Not using EventPipe, EventPipe needs Ctrl-C to generate *.netperf file but I need keep the application running in the 3 times collecting ) -
The weird thing is the second and third profiles (basically no user request) there are still many thread have cpu load (sum of sample). |
@gzpbx can you get a memory dump when there's no requests but high CPU? Maybe looking at a dump might reveal something. https://github.com/dotnet/coreclr/blob/21f94f2be12e16bb36187d40c2fa82af9ebfd469/Documentation/botr/xplat-minidump-generation.md |
Yeah, we have a monitor (in zabbix) to see the application is alive or not, 1 request/ 1 minute. I will get a pure No requesting dump. |
@davidfowl The cpu doesn't go down after on request. There 4429856 objects, too many ? Our memory dump => coredump.1047.zip |
OK I have looked at all the data to date. First we are struggling the restrictions with our current capabilities (we know that they are problematic, and indeed we have fixed at least some of them in V2.2 but that does not help us just this minute). I am including @brianrob, to keep me honest. Basically this investigation would be much easier if we had symbols for the framework. You have to do this rather ugly process explained here https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md#resolving-framework-symbols Another work-around would be to suppress the use of precompiled images. If you added the variable
If you collect more traces, please try that. However I think I have figured out quite a bit with the data we already have. The complaint is that there is CPU when there are not requests, and it gets worse with time. Indeed the most useful data the EventPipe data, Process-20485.netperf.zip, which shows that in fact lots of time is being spent in the Timer class firing timers. This was reinforced by data from the last file 152_10_1131.trace.zip which in its 'events' view shows .NET runtime events that fire every time a timer goes off (currently you need to be an expert to know how to use these events). It shows that we are firing a timer 1200 times a second. This is clearly too much, and fixing that will likely fix the issue. Also in addition to the ones that fire, there is so much overhead in the timer queue itself that it implies that there are MANY entries in the queue. One very suspicious possibility is that timers are entered that have PERIODIC times specified. These never die until canceled, and thus could accumulate in the queue over time, leading to EXACTLY the symptom we see. This is my current theory. Some of the timers are doing a 'ReadConfig' but these only go off once a minute, so are not interesting. Indeed non of the callbacks that I see in the trace go of frequently enough to explain things. My current theory is that the callback is so small that it does not show up on during a CPU sample. Anyway we have a interesting line of attack. At this point we should put a breakpoint in the System.Threading.TimerQueue.CallCallback method as well as System.Threading.TimerQueue.UpdateTimer (this is called every time a new entry is added to queue of things that might go off). Lets just see what is being put in the queue. That will probably tell us what we need to know. Vance |
Also, the Timer queue starts at the System.Threading.TimerQueue.Instances. This is an array (one per CPU), of TimerQueues. Each TimerQueue has a m_timers entry that is the linked list of entries. Each entry has a due time (in msec, as recorded by Environment.TickCount) and a callback. Our problem is that something is putting too many entries in the list. Note that once an entry gets into the queue, if it has a periodic time it will stay in the queue until explicitly removed (and UpdateTimer is not called on each new due-time). |
@vancem @davidfowl We can send you our source code If you feel OK . I am not sure the third component (DLLs) use that much Timer or TimerQueue but we don't use that much other DLLs. When I got the coredump.1047.zip file, the application has taken 35,374 api calls. 35,374 is close to 34,890 , is that possible each api call (our application will call an internal WCF service ) generate a Timer or TimeQueue ? |
Do you have any calls to Task.Delay or any CancellationTokens that timeouts? Take a look at the number of CancellationTokenSources or DelayPromise instances (if you have any) |
You could also try to gcroot the timer instances to see what is holding onto them (other than the timer queue itself) |
No Task.Delay (has been removed) but we have CancellationTokens - About "the timer instances to see what is holding onto them" we are working on the coredump.1047.zip, It's real hard to investigate it on linux with only a few sos commands. |
Why aren’t you disposing the cancellation token source after the outgoing http request is complete? |
It does get cleaned up but after the timeout. I’m not sure why you didn’t see the issue before. It’s actually worse on .NET Framework (the timer is more optimized on .NET Core). You end spinning up lots of timers and because you’re not disposing the CTS, they will fire even if the http requests competes. |
You should dispose the CancellationTokenSource anyways. I'm not sure about the WCF client. Is that the generated client? |
Shouldn't wcf clients be handled the same way HttpClient is, reusing its instances? |
Yes, basically the generated but we do some update (e.g. use our entities copy code from WCF Server) |
@gzpbx did you resolve this? |
Yes, you can close this issue. Thank you for your help. |
We deploy Asp.net Core MVC on Linux,CPUs become higher slowly

From Zabbix,we can see
.NET Core SDK (reflecting any global.json):
Version: 2.1.402
Commit: 3599f217f4
Runtime Environment:
OS Name: centos
OS Version: 7
OS Platform: Linux
RID: centos.7-x64
Base Path: /usr/share/dotnet/sdk/2.1.402/
Host (useful for support):
Version: 2.1.4
Commit: 85255dde3e
.NET Core SDKs installed:
2.1.402 [/usr/share/dotnet/sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.All 2.1.4 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.4 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.1.4 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
We use perf to collect info ,but have no idea.
sampleTrace.trace.zip
Any ideas on a possible cause?
The text was updated successfully, but these errors were encountered: