Skip to content
This repository was archived by the owner on Dec 18, 2018. It is now read-only.

AV in coreclr during long haul with Ngnix #656

Closed
yuwaMSFT opened this issue Feb 25, 2016 · 3 comments
Closed

AV in coreclr during long haul with Ngnix #656

yuwaMSFT opened this issue Feb 25, 2016 · 3 comments
Labels

Comments

@yuwaMSFT
Copy link

The server was running HelloWorldMvc with Nginx as reverse proxy. 500-1000 clients kept sending requests.

(lldb) thread select 1

  • thread Parse names/ports from server urls #1: tid = 0, 0x00007f68f8beff53 libcoreclr.soJIT_CheckedWriteBarrier_End + 3, name = 'dnx', stop reason = signal SIGSEGV frame #0: 0x00007f68f8beff53 libcoreclr.soJIT_CheckedWriteBarrier_End + 3
    -> 0x7f68f8beff53 <JIT_CheckedWriteBarrier_End+3>: movq %rcx, (%rdi)

Several threads were trying to allocate objects at that time, and thread 14 just finished a GC at that time.
Thread 14 just did a GC:

  • thread This is not a binding #14: tid = 13, 0x00007f68fa16208a libpthread.so.0__lll_unlock_wake + 26, stop reason = signal SIGSEGV frame #0: 0x00007f68fa16208a libpthread.so.0__lll_unlock_wake + 26
    frame Parse names/ports from server urls #1: 0x00007f68fa15e8a4 libpthread.so.0_L_unlock_722 + 15 frame #2: 0x00007f68f8f9065e libcoreclr.soCorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings() [inlined] CorUnix::CPalSynchronizationManager::SignalThreadCondition(CorUnix::_ThreadNativeWaitData*) + 72 at synchmanager.cpp:2628
    frame Add Kestrel to the ci server #3: 0x00007f68f8f90616 libcoreclr.soCorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings(this=0x00007f684828f670) + 70 at synchmanager.cpp:4592 frame #4: 0x00007f68f8f8a062 libcoreclr.soCorUnix::CSynchControllerBase::Release() + 130 at synchmanager.hpp:639
    frame Improve data flows #5: 0x00007f68f8f8a02e libcoreclr.soCorUnix::CSynchControllerBase::Release(this=0x00007f6844042508) + 78 at synchcontrollers.cpp:111 frame #6: 0x00007f68f8f88eaa libcoreclr.soSetEvent [inlined] CorUnix::InternalSetEvent(hEvent=, fSetEvent=BOOL at scalar) + 113 at event.cpp:447
    frame Establish which ipv4/ipv6 socket addresses to open #7: 0x00007f68f8f88e39 libcoreclr.soSetEvent(hEvent=<unavailable>) + 57 at event.cpp:336 frame #8: 0x00007f68f8bc6d78 libcoreclr.soThreadSuspend::RestartEE(int, int) + 584

This thread was trying to serve the view from the managed stack:
OS Thread Id: 0x36ce (16)
Child SP IP Call Site
00007F685F7F9B88 00007F68FA15F414 [HelperMethodFrame: 00007f685f7f9b88]
00007F685F7FA240 00007F6880371412 DynamicClass.lambda_method(System.Runtime.CompilerServices.Closure, Microsoft.Extensions.DependencyInjection.ServiceProvider)
00007F685F7FA3F0 00007F687FF3C618 Microsoft.Extensions.DependencyInjection.ServiceProviderExtensions.GetRequiredService(System.IServiceProvider, System.Type)
00007F685F7FA410 00007F688057F3BC Microsoft.AspNetCore.Mvc.Razor.RazorPageActivator+<>c__DisplayClass7_0.b__2(Microsoft.AspNetCore.Mvc.Rendering.ViewContext)
00007F685F7FA440 00007F688057EC65 Microsoft.Extensions.Internal.PropertyActivator1[[System.__Canon, mscorlib]].Activate(System.Object, System.__Canon) 00007F685F7FA470 00007F688057CB69 Microsoft.AspNetCore.Mvc.Razor.RazorPageActivator.Activate(Microsoft.AspNetCore.Mvc.Razor.IRazorPage, Microsoft.AspNetCore.Mvc.Rendering.ViewContext) 00007F685F7FA4B0 00007F688057CA43 Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageCoreAsync(Microsoft.AspNetCore.Mvc.Razor.IRazorPage, Microsoft.AspNetCore.Mvc.Rendering.ViewContext) 00007F685F7FA4E0 00007F688057BF90 Microsoft.AspNetCore.Mvc.Razor.RazorView+<RenderPageAsync>d__14.MoveNext() 00007F685F7FA530 00007F688057B8CB System.Runtime.CompilerServices.AsyncTaskMethodBuilder1[[System.__Canon, mscorlib]].Start[[Microsoft.AspNetCore.Mvc.Razor.RazorView+d__14, Microsoft.AspNetCore.Mvc.Razor]](RenderPageAsync>d__14 ByRef)
00007F685F7FA560 00007F688057B827 Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageAsync(Microsoft.AspNetCore.Mvc.Razor.IRazorPage, Microsoft.AspNetCore.Mvc.Rendering.ViewContext, Boolean)
00007F685F7FA610 00007F688057B525 Microsoft.AspNetCore.Mvc.Razor.RazorView+d__13.MoveNext()
00007F685F7FA650 00007F688057B401 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Mvc.Razor.RazorView+d__13, Microsoft.AspNetCore.Mvc.Razor]](RenderAsync>d__13 ByRef)
00007F685F7FA680 00007F688057B36A Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderAsync(Microsoft.AspNetCore.Mvc.Rendering.ViewContext)
00007F685F7FA710 00007F6880575A1D Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor+d__17.MoveNext()
00007F685F7FA790 00007F68805756B1 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor+d__17, Microsoft.AspNetCore.Mvc.ViewFeatures]](ExecuteAsync>d__17 ByRef)
00007F685F7FA7C0 00007F6880575615 Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor.ExecuteAsync(Microsoft.AspNetCore.Mvc.ActionContext, Microsoft.AspNetCore.Mvc.ViewEngines.IView, Microsoft.AspNetCore.Mvc.ViewFeatures.ViewDataDictionary, Microsoft.AspNetCore.Mvc.ViewFeatures.ITempDataDictionary, System.String, System.Nullable`1)

After GC was done, thread 14 was notifying other 2 waiting thread about this event
(lldb) p ((CPalThread*) 0x00007f684828f500)->synchronizationInfo

m_lPendingSignalingCount = 2
m_rgpthrPendingSignalings = ([0] = 0x00007f68541d4820, [1] = 0x00007f683000c480, [2] = 0x00007f68541d4820, [3] = 0x00007f684828d4c0, [4] = 0x00007f68541d4820, [5] = 0x00007f6830002cf0, [6] = 0x00007f6830005040, [7] = 0x00007f682c002e00, [8] = 0x00007f6830005040, [9] = 0x0000000000000000)

The 2 threads to be notified are:
(lldb) p ((CPalThread_) 0x00007f683000c480)->m_threadId
(SIZE_T) $40 = 14038
(lldb) p ((CPalThread_) 0x00007f68541d4820)->m_threadId
(SIZE_T) $41 = 3403

These thread must have been suspended due to GC. Thread 3403 was still in waiting state, while Thread id 14038 is thread #1, which hit the AV (the managed call stack couldn’t display for this thread for some reason).
(lldb) thread select 1

  • thread Parse names/ports from server urls #1: tid = 0, 0x00007f68f8beff53 libcoreclr.soJIT_CheckedWriteBarrier_End + 3, name = 'dnx', stop reason = signal SIGSEGV frame #0: 0x00007f68f8beff53 libcoreclr.soJIT_CheckedWriteBarrier_End + 3
    -> 0x7f68f8beff53 <JIT_CheckedWriteBarrier_End+3>: movq %rcx, (%rdi)
    (lldb) register read rdi
    rdi = 0x00007f686794dfc0

If examining the address in GDB, it said couldn’t access.
(gdb) x 0x7f686794dfc0
0x7f686794dfc0: Cannot access memory at address 0x7f686794dfc0

If you compare this address with the GC heaps, it’s beyond Gen0, and not in LOH as well. It makes me guessing that it tried to access some memory just released by GC.

It sounds something related to GC behavior, or we were trying to access some object without proper refcount or pin.

@yuwaMSFT
Copy link
Author

@halter73

@muratg muratg added this to the 1.0.0-rc2 milestone Mar 1, 2016
@muratg muratg modified the milestones: 1.0.0, 1.0.0-rc2 Apr 13, 2016
@halter73
Copy link
Member

@muratg I don't think this should still be in the 1.0.0 milestone. There is not stack trace for the AV. No AccessViolationException (maybe this was a first chance exception). Kestrel classes are nowhere on the stack that is provided.

There was also a GC related fix made since this report that could be related.

Unless we can reproduce this issue again, I would close.

@muratg muratg modified the milestones: 1.0.1, 1.0.0 May 18, 2016
@muratg
Copy link
Contributor

muratg commented Jun 28, 2016

Closing. If you have a repro, refile please.

@muratg muratg closed this as completed Jun 28, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants