Skip to content
This repository was archived by the owner on Mar 15, 2022. It is now read-only.

GC: reporting for GC references from non-SSA references #33

Closed
AndyAyersMS opened this issue Feb 17, 2015 · 2 comments
Closed

GC: reporting for GC references from non-SSA references #33

AndyAyersMS opened this issue Feb 17, 2015 · 2 comments
Assignees

Comments

@AndyAyersMS
Copy link
Member

Value type GC fields need reporting, but don't naturally fit into SSA. Find a way to report them.

swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Apr 9, 2015
Precise GC using statepoints cannot handle aggregates that contain
managed pointers yet. (Issue dotnet#33)

So, this change adds a check to the reader-postpass, which scans the
generated IR if it contains values of managed-aggregate type.
If so, the compilation fails gracefully by throwing a NYI exception.

This check in ReaderIr.cpp is based on the assertion checks in
RewriteStatepointsForGC.cpp.

[Issue dotnet#32]
swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Apr 9, 2015
Precise GC using statepoints cannot handle aggregates that contain
managed pointers yet. (Issue dotnet#33)

So, this change adds a check to the reader-postpass, which scans the
generated IR if it contains values of managed-aggregate type.
If so, the compilation fails gracefully by throwing a NYI exception.

This check in ReaderIr.cpp is based on the assertion checks in
RewriteStatepointsForGC.cpp.

[Issue dotnet#32]
@swaroop-sridhar
Copy link
Contributor

Here's a summary of the discussion with @AndyAyersMS, @preames and @sanjoy on this topic:

Supporting aggregates containing GC-pointers
Pointers stored within aggregates on the stack need to be reported explicitly (reporting the entire aggregate won’t work.

Philip Reams wrote:
Thinking about this, I see a couple of approaches. They divide into two ways of thinking about the problem.

Option 1 - Extend the stackmap format to encode an object map for the on stack aggregate. Stackmap contains only the pointer to the alloca and a pointer to the appropriate object map. The runtime can translate the object map + alloca entry into specific stack offsets for individual fields if desired.

Option 2 - Report each field within the object directly in the stackmap. No format extensions needed.

For option 2, we could extend the lowering code to distinguish a GEP from an alloca from a "normal" SSA GEP. We already have this distinction for alloca pointers themselves. We might want to introduce a new argument section (for explicit slots?), but this could be prototyped quickly today. I do have some concerns about distinguishing values reliably - i.e. consider a transform which adds a 100 nop GEPs - but in practice, I suspect this would work fine.

For option 1, we could reuse the metadata argument to gc.root. We could interpret a pointer to a alloca with a gc_root marker as being an explicit stack slot (i.e. no mem2reg performed), and use the metadata field to describe the object map. We would need to either define a metadata format for the object map or find a generic way to serialize a metadata encoding into the stack map.

In either case, we probably need to extend Mem2Reg to not treat a statepoint gc arg use as an escaping reference. We want allocas to still become SSA in most cases.

Anyone else have another approach? Or thoughts on these? I mildly tempted to take option 2 out of simple expediency, but I'm mildly concerned some GC might expect to be able to get object maps for the structs on the stack.

Swaroop wrote:

My preference is also for option 2 – reporting each GC-pointer field individually in the stackmap – since it is closer to the CLR stackmap format.
For CLR, if we generate the stackmap per option 1, the LLILC JIT layer will have to translate it to the individual-pointer format.

Andy Ayers Wrote:
The option 1 approach doesn’t work with the current CoreCLR. That is the runtime cant’ figure it out.

Value types on the stack are what we (sometimes jokingly) refer to as “non self-describing aggregates”.

(As opposed to value types on the heap [boxed types] which have a vtable which links to gc info to describe layout, or ref types which always have a vtable which likewise links to gc layout, and currently never live on the stack).

So there’s no way to convey to the runtime that a range of stack locations corresponds to some value type. Instead we must report each stack slot that contains a gc reference. There’s a bunch of code in our jit to ensure that the LLVM type accurately captures this information, so presumably we can either feed GEPs for each GC ref slot in the aggregate or just feed the base address of the aggregate (which would be the alloca) to the statepoint instruction with the right type obtainable (I hope), and from that reconstruct the right info. Since the latter is just one extra arg per live value type with GC refs it seems preferable and gives us a bit of factorization.

Sanjoy Das wrote:
In LLVM, memory (both heap memory and memory alloca'ed on the stack) is untyped, and you cannot expect a "struct %Foo_" to remain a "struct %Foo_" through optimization.

Andy Ayers wrote:
We don't necessarily have to recover the type from the IR, provided we can map the address on the stack back to a particular local/temp that we've created. So perhaps that is more tractable.

If not, we can just report all the slots one by one, but there can be lots of them. Luckily we don't have inplace allocated arrays yet.

The other aspect of this is that we need to ensure that those GC ref slots in the value types are seen as updatable by the calls at statepoints. I think this is covered by escape semantics, which is tolerable. It means we'll think the non-GC fields may change at the call as well, which is unfortunate, but we can live with that for a while.

@swaroop-sridhar swaroop-sridhar self-assigned this Jun 22, 2015
@swaroop-sridhar swaroop-sridhar added this to the Sprint 86 milestone Jun 26, 2015
@swaroop-sridhar swaroop-sridhar modified the milestones: Sprint 87, Sprint 86 Jul 23, 2015
swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Aug 5, 2015
In Reader's post-process phase, add an additional check to
detect allocation of managed aggregates on the stack (alloca).
Conservatively defer to the default Jit for functions containing
such instructions.

Without this check, LLVM silently generated bad code
(until Issue dotnet#33 is fixed).

This change also clarifies a comment in GcInfo.cpp.
swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Aug 5, 2015
In Reader's post-process phase, add an additional check to
detect allocation of managed aggregates on the stack (alloca).
Conservatively defer to the default Jit for functions containing
such instructions.

Without this check, LLVM silently generated bad code
(until Issue dotnet#33 is fixed).

This change also clarifies a comment in GcInfo.cpp.
swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Aug 6, 2015
In Reader's post-process phase, add an additional check to
detect allocation of managed aggregates on the stack (alloca).
Conservatively defer to the default Jit for functions containing
such instructions.

Without this check, LLVM silently generated bad code
(until Issue dotnet#33 is fixed).

This change also clarifies a comment in GcInfo.cpp.
@swaroop-sridhar swaroop-sridhar removed this from the Sprint 87 milestone Aug 13, 2015
michellemcdaniel pushed a commit to michellemcdaniel/llilc that referenced this issue Aug 13, 2015
In Reader's post-process phase, add an additional check to
detect allocation of managed aggregates on the stack (alloca).
Conservatively defer to the default Jit for functions containing
such instructions.

Without this check, LLVM silently generated bad code
(until Issue dotnet#33 is fixed).

This change also clarifies a comment in GcInfo.cpp.
@swaroop-sridhar
Copy link
Contributor

Notes from an earlier discussion:


In MSIL code, the use of gc-aggregates is fairly common.

(1) The common case is that a pointer to the gc-aggregate is live at a call-site.
(2) The gc-aggregates can also be passed by value. This is less common, and in most cases reduces to (1) because of calling convention.

For example:

    public class Num { … }
    public struct Pair
    {
        Num n1;
        Num n2;
        public Pair(int i) { … }
        public int Sum() { … }
    }

    public static int Main() {
        Pair P = new Pair(1);
        return P.Sum();
    }

Here, we silently generate bad code (StackMap) for Main() today -- only a pointer to Pair P is reported as a root, and not the constituent fields.

  define i32 @Program.Test.Main() gc "coreclr" {
  %loc0 = alloca %Program.Pair
  %safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(
                               i64 2882400000, i32 0, void ()*CoreCLR!JIT_PollGC, i32 0, i32 0, i32 0, i32 0)
  %safepoint_token.1 = call i32 (i64, i32, void (%Program.Pair*, i8, i64)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp0Program.Pairi8i64f(
                       i64 2882400000, i32 0, void (%Program.Pair*, i8, i64)* CoreCLR!JIT_MemSet, i32 3, i32 0, %Program.Pair* %loc0, i8 0, i64 16, i32 0, i32 0)
  %1 = addrspacecast %Program.Pair* %loc0 to %Program.Pair addrspace(1)*, !dbg !8
  %safepoint_token.2 = call i32 (i64, i32, void (%Program.Pair addrspace(1)*, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1Program.Pairi32f(
                                 i64 2882400000, i32 0, void (%Program.Pair addrspace(1)*, i32)* Program.Pair..ctor, i32 2, i32 0, 
                                 %Program.Pair addrspace(1)* %1, i32 1, i32 0, i32 0, %Program.Pair addrspace(1)* %1), !dbg !9
  %2 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %safepoint_token.2, i32 9, i32 9), !dbg !10 ; (%1, %1)
  %3 = addrspacecast %Program.Pair* %loc0 to %Program.Pair addrspace(1)*, !dbg !10
  %safepoint_token.3 = call i32 (i64, i32, i32 (%Program.Pair addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i32p1Program.Pairf(
                                 i64 2882400000, i32 0, i32 (%Program.Pair addrspace(1)*)* Program.Pair.Sum, i32 1, i32 0, 
                                 %Program.Pair addrspace(1)* %3, i32 0, i32 0, %Program.Pair addrspace(1)* %3), !dbg !11
  %4 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %safepoint_token.3, i32 8, i32 8), !dbg !12 ; (%3, %3)
  %5 = call i32 @llvm.experimental.gc.result.i32(i32 %safepoint_token.3), !dbg !12
  ret i32 %5, !dbg !12
}

00007ffe`0ec504a0 56              push    rsi
00007ffe`0ec504a1 4883ec30        sub     rsp,30h
00007ffe`0ec504a5 48b884dbc76dfe7f0000 mov rax,offset CoreCLR!JIT_PollGC
00007ffe`0ec504af ffd0            call    rax
00007ffe`0ec504b1 488d742420      lea     rsi,[rsp+20h]
00007ffe`0ec504b6 48b8f05d116efe7f0000 mov rax,offset CoreCLR!JIT_MemSet
00007ffe`0ec504c0 31d2            xor     edx,edx
00007ffe`0ec504c2 41b810000000    mov     r8d,10h
00007ffe`0ec504c8 4889f1          mov     rcx,rsi
00007ffe`0ec504cb ffd0            call    rax
00007ffe`0ec504cd 48b8c800c50efe7f0000 mov rax,offset Program.Pair..ctor(Int32) 00007ffe`0ec504d7 ba01000000      mov     edx,1
00007ffe`0ec504dc 4889f1          mov     rcx,rsi
00007ffe`0ec504df ffd0            call    rax
00007ffe`0ec504e1 48b8d000c50efe7f0000 mov rax,offset Program.Pair.Sum()
00007ffe`0ec504eb 4889f1          mov     rcx,rsi
00007ffe`0ec504ee ffd0            call    rax
00007ffe`0ec504f0 4883c430        add     rsp,30h
00007ffe`0ec504f4 5e              pop     rsi
00007ffe`0ec504f5 c3              ret
    0:000> !gcinfo 00007ffe`0ec504a0
entry point 00007ffe0ec50480
00000030 is a safepoint: 
0000004c is a safepoint: 
00000060 is a safepoint:  +sp+20(interior)
0000006f is a safepoint:  +sp+20(interior)

If I understand correctly, the proposal from last meeting is to modify ReWriteStatepointsForGC phase such that the GC-pointer fields of gc-aggregates are made explicit at statepoints.
That is, in place of

  %safepoint_token.2 = call i32 (i64, i32, void (%Program.Pair addrspace(1)*, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1Program.Pairi32f(
                                 i64 2882400000, i32 0, void (%Program.Pair addrspace(1)*, i32)* Program.Pair..ctor, i32 2, i32 0, 
                                 %Program.Pair addrspace(1)* %1, i32 1, i32 0, i32 0, %Program.Pair addrspace(1)* %1), !dbg !9
  %2 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %safepoint_token.2, i32 9, i32 9), !dbg !10 ; (%1, %1)

We’ll have something like

    %2 = getelementpointer   %Program.Pair, %Program.Pair addrspace(1)* %1, i64 0, i32 0
    %3 = getelementpointer   %Program.Pair, %Program.Pair addrspace(1)* %1, i64 0, i32 1
  %safepoint_token.2 = call i32 (i64, i32, void (%Program.Pair addrspace(1)*, i32)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1Program.Pairi32f(
                                 i64 2882400000, i32 0, void (%Program.Pair addrspace(1)*, i32)* Program.Pair..ctor, i32 2, i32 0, 
                                 %Program.Pair addrspace(1)* %1, i32 1, i32 0, i32 0, %Program.Pair addrspace(1)* %2, %Program.Pair addrspace(1)* %3), !dbg !10
  %4 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %safepoint_token.2, i32 9, i32 9), !dbg !11 ; (%2, %2)
  %5 = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %safepoint_token.2, i32 10, i32 10), !dbg !11 ; (%3, %3)

Here, %1 (the pointer to the gc-aggregate) is not reported separately, since (a) all constituent fields are covered, and (b) the aggregate is on the stack.

The main concerns with this approach is throughput: large number of arguments on statepoint instructions and associated relocations can degrade throughput.
(1) Stack allocated structs with a large number of gc-pointer fields are not common.
(2) However, it is common to have several instances of small gc-aggregates live in a method.

swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Oct 16, 2015
Report GcInfo for GcAggregates, Pinned pointers, GS Cookie
Security Object, and Generics Context.

The stackmap generated via Statepoints only handles SSA values,
and therefore does not track pointers contained within GC-aggregates.
To circumvent this problem,
(1) Gc-aggregate stack allocations are marked frame-escaped to
    keep the allocation live throughout the function.
(2) We report all pointers within GC-aggregates as untracked.

Pinned pointers and other special symbols are handled similarly.

Testing:

NGEN of MsCorlib pass when built for precise GC.

All CoreCLR tests passed when built for Precise GC
(Statepoints inserted and GcInfo generated) but run with Conservative GC.
dotnet-ci.cloudapp.net/job/dotnet_llilc_release_win64_prtest/1111/

Previously passing CoreCLR tests pass with Precise GC (108 tests failed).
http://dotnet-ci.cloudapp.net/job/dotnet_llilc_release_win64_prtest/1112/

Notes:

1) GcInfo, Emitter and Recorder should be separated to different files.
   Will do in a separate checkin (to help view diffs for review)
2) DoInsertStatepoints can be enabled by default (while running with
   Conservative GC). Will do this in a separate change.

Fixes dotnet#33, dotnet#29, dotnet#766, dotnet#767, dotnet#768
swaroop-sridhar added a commit to swaroop-sridhar/llilc that referenced this issue Oct 19, 2015
Issue dotnet#33
The stackmap generated via Statepoints only handles SSA values,
and therefore does not track pointers contained within GC-aggregates.
To circumvent this problem,
(1) Gc-aggregate stack allocations are marked frame-escaped to
keep the allocation live throughout the function.
(2) We report all pointers within GC-aggregates as untracked.

Issue dotnet#29
Pinned pointers are handled similarly.

Issues dotnet#766, dotnet#767, dotnet#768
GS Cookie, Security Object, and Generics Context are tracked in the GcInfo
and escaped. They are not reported in the encoder because we need some
additional information.

Testing:

NGEN of MsCorlib pass when built for precise GC.

All CoreCLR tests passed when built for Precise GC
(Statepoints inserted and GcInfo generated) but run with Conservative GC.
http://dotnet-ci.cloudapp.net/job/dotnet_llilc_release_win64_prtest/1111/

Previously passing CoreCLR tests pass with Precise GC (108 tests failed).
http://dotnet-ci.cloudapp.net/job/dotnet_llilc_release_win64_prtest/1112/

Notes:

1) GcInfo, Emitter and Recorder should be separated to different files.
Will do in a separate checkin (to help view diffs for review)
2) DoInsertStatepoints can be enabled by default (while running with
Conservative GC). Will do this in a separate change.

Fixes dotnet#33, dotnet#29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants