Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing yk tracing locations on function prototype #32

Closed
Pavel-Durov opened this issue Jul 25, 2023 · 10 comments
Closed

Missing yk tracing locations on function prototype #32

Pavel-Durov opened this issue Jul 25, 2023 · 10 comments
Assignees

Comments

@Pavel-Durov
Copy link
Contributor

Pavel-Durov commented Jul 25, 2023

Some function prototypes don't have to trace locations when it Lua interpreter frees their references.

Solution:

We need to find a way to store trace locations in a way that will persist through GC cycles so that we can remove (aka drop) them from tracing when functions are freed by GC.

Context:
Lua bytecodes for a function are stored inside this "prototype" struct and therefore it made sense to put our locations in there too so that's what we did as the first implementation.
When GC frees functions, we cannot obtain references to trace locations and cannot drop them via yk_location_drop.

Related Issues: #31, #25

@ltratt
Copy link
Contributor

ltratt commented Jul 25, 2023

[I don't think this has anything to do with the trace compiler per se, although that happens to be the way that we notice that we've not found the right part of yklua to alter.]

@Pavel-Durov Pavel-Durov changed the title Lua GC frees functions while the trace compiler have references to their tracing locations. Lua GC frees tracing locations before releasing them Jul 25, 2023
@Pavel-Durov
Copy link
Contributor Author

@ltratt re-worded

@ltratt
Copy link
Contributor

ltratt commented Jul 25, 2023

Works for me!

@Pavel-Durov Pavel-Durov self-assigned this Jul 25, 2023
@Pavel-Durov
Copy link
Contributor Author

From my understanding, GC clears all the heap-allocated objects that have no reference to them.

In this issue we have f->yklocs with a surprising value of null, and we assumed that it was collected by the GC.

But if it was collected by the GC I would expect f to be cleared as well since f holds a reference to yklocs.

And in the callstack I can see that when we get Segment fault, the GC is just about to clear the f reference 0x929990:

@@ f: 0x929990, f->yklocs: (nil) <------------------- printf just before Segmentation fault.

Program received signal SIGSEGV, Segmentation fault.
0x000000000080808e in luaF_freeproto (L=<optimised out>, f=0x929990) at lfunc.c:280
280	        printf("@@ f->yklocs[i]: %p,\n", f->yklocs[i]);
(gdb) bt
#0  0x000000000080808e in luaF_freeproto (L=<optimised out>, f=0x929990) at lfunc.c:280
#1  0x000000000080a9fe in freeobj (L=0x917e38, o=0x929990) at lgc.c:767
#2  0x00000000008158a7 in deletelist (L=0x917e38, p=0x929990, limit=<optimised out>) at lgc.c:1494
#3  0x00000000008156b4 in luaC_freeallobjects (L=0x917e38) at lgc.c:1512
#4  0x000000000085052d in close_state (L=0x917e38) at lstate.c:276
#5  0x0000000000850bae in lua_close (L=0x917e38) at lstate.c:421
#6  0x00000000007d9cd0 in main (argc=<optimised out>, argv=<optimised out>) at lua.c:663

So it doesn't look like a GC issue to me 🤔

@ltratt
Copy link
Contributor

ltratt commented Jul 25, 2023

My suggestion is that we put to one side that this an interpreter with a VM: we've got a double free and so we need to find all the paths that can call free and work out where we should be calling free ourselves. For example, setting a breakpoint on the function that does the double free might be a start.

@Pavel-Durov
Copy link
Contributor Author

Yklua loads the same function twice but in two different ways.

The first time it loads it as text using luaY_parser and the second time as "pre-compiled" binary using luaU_undump (prototypes are dumped/saved prior to that).

See condition: https://github.com/ykjit/yklua/blob/main/src/ldo.c#L965

When prototype binaries are loaded/undumped - tracing locations are not loaded cause we didn't add any code to save them here: https://github.com/ykjit/yklua/blob/main/src/ldump.c#L213

It looks less and less like a GC issue (I was able to reproduce the same error with GC disabled) :)

@ltratt
Copy link
Contributor

ltratt commented Jul 26, 2023

The first time it loads it as text using luaY_parser and the second time as "pre-compiled" binary using luaU_undump (prototypes are dumped/saved prior to that).

Aha, that sounds promising!

@Pavel-Durov
Copy link
Contributor Author

Another thing to note 🤓

We've seen this behaviour when running lua tests (all.lua file) which shadows the standard dofile function: https://github.com/ykjit/yklua/blob/main/tests/all.lua#L143 - it causes this pre-compiled loading behaviour.

For example:

-- NO missing locations on function prototype Segmentation fault
dofile('main.lua')

-- This code will have missing locations on function prototype Segmentation fault
local dofile = function (n, strip)
  local f = assert(loadfile(n))
  local b = string.dump(f, strip)
  f = assert(load(b))
  return f()
end

dofile('main.lua')

@Pavel-Durov Pavel-Durov changed the title Lua GC frees tracing locations before releasing them Missing yk tracing locations on function prototype Jul 26, 2023
bors bot added a commit that referenced this issue Aug 14, 2023
36: Add function undump locations recovery r=vext01 a=Pavel-Durov

Solves #32

+ Added function prototype location reconstruction as part of function
undump process.
+ Moved yk functionality to lyk.h and lyk.c.

### Note:
We still get errors such as: 

```shell
yk-fork/ykllvm/llvm/lib/IR/Value.cpp:1101: void llvm::ValueHandleBase::RemoveFromUseList(): Assertion `*PrevPtr == this && "List invariant broken"' failed
...
yk-fork/ykllvm/llvm/lib/IR/Value.cpp:1061: void llvm::ValueHandleBase::AddToUseList(): Assertion `Entry && "Value doesn't have any handles?"' failed
...
yk-fork/ykllvm/llvm/lib/IR/Value.cpp:285: llvm::ValueName* llvm::Value::getValueName() const: Assertion `I != Ctx.pImpl->ValueNames.end() && "No name entry found!"' failed.
```
when we run `../src/lua -e"_U=true" all.lua`.


Co-authored-by: Pavel Durov <[email protected]>
bors bot added a commit that referenced this issue Aug 14, 2023
36: Add function undump locations recovery r=ltratt a=Pavel-Durov

Solves #32

+ Added function prototype location reconstruction as part of function
undump process.
+ Moved yk functionality to lyk.h and lyk.c.

### Note:
We still get errors such as: 

```shell
yk-fork/ykllvm/llvm/lib/IR/Value.cpp:1101: void llvm::ValueHandleBase::RemoveFromUseList(): Assertion `*PrevPtr == this && "List invariant broken"' failed
...
yk-fork/ykllvm/llvm/lib/IR/Value.cpp:1061: void llvm::ValueHandleBase::AddToUseList(): Assertion `Entry && "Value doesn't have any handles?"' failed
...
yk-fork/ykllvm/llvm/lib/IR/Value.cpp:285: llvm::ValueName* llvm::Value::getValueName() const: Assertion `I != Ctx.pImpl->ValueNames.end() && "No name entry found!"' failed.
```
when we run `../src/lua -e"_U=true" all.lua`.


Co-authored-by: Pavel Durov <[email protected]>
bors bot added a commit that referenced this issue Aug 25, 2023
64: Track initialised yk locations r=ltratt a=Pavel-Durov

Partially fixes (only for serial compilation)  - #62

# Changes

- Remove reallocarray in favour of calloc
- Change YkLocations to be stored as pointers
- Change lyk interface to reflect its "hooks" functionality
- Move tests to a separate test script and enable serialised all.lua test suite
- Added logging in lyk module for ease of debugging
- Updated readme

`test.sh` was tested for resiliency multiple times:
```shell
$ try_repeat -v 100 sh ./test.sh
``` 
# Issues

## Uninitialised locations memory 
`reallocarray` does not initialise memory with default zero bytes locations as `calloc`. 
Moving to `calloc` and using pointers allowed to check for NULL with default values set at initialisation time.

## Tests

When lua tests are executed one by one as single files it produces different results from when it's executed via `all.lua` test suite.

Example:

```shell

$ YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" ./main.lua 
...
../src/lua: ./main.lua:343: assertion failed!
stack traceback:
        [C]: in function 'assert'
        ./main.lua:343: in main chunk
        [C]: in ?
```
When run through `all.lua` it passes:
```shell
$ YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" ./all.lua 
...
***** FILE 'main.lua'*****
...
***** FILE 'gc.lua'*****
...
```
Since `all.lua` is what we base the stability of yklua, I think we should include it in the tests as is, currently only serialised compilation works.

Running `all.lua` prior to these changes results in error (`main/56c5787799b876f36babbae24e9afc025806b831`):
```
$ YKD_SERIALISE_COMPILATION=1 gdb -ex 'r' -ex 'bt' --args ../src/lua -e"_U=true" ./all.lua 
...
***** FILE 'main.lua'*****

Program received signal SIGSEGV, Segmentation fault.
core::sync::atomic::AtomicUsize::fetch_sub (self=<optimised out>) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs:2575
2575    /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs: No such file or directory.
#0  core::sync::atomic::AtomicUsize::fetch_sub (self=<optimised out>) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs:2575
#1  alloc::sync::{impl#33}::drop<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global> (self=0x7fffffffb410)
    at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/alloc/src/sync.rs:2370
#2  0x00007ffff7b0d4ab in core::ptr::drop_in_place<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global>> () at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/ptr/mod.rs:497
#3  0x00007ffff7b0067e in core::mem::drop<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global>> (_x=...) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/mem/mod.rs:987
#4  0x00007ffff7b121f0 in ykrt::location::{impl#1}::drop (self=0x7fffffffb468) at ykrt/src/location.rs:197
#5  0x00007ffff7affa9b in core::ptr::drop_in_place<ykrt::location::Location> () at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/ptr/mod.rs:497
#6  0x00007ffff7aff6bd in core::mem::drop<ykrt::location::Location> (_x=...) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/mem/mod.rs:987
#7  0x00007ffff7b004bd in ykcapi::yk_location_drop (loc=...) at ykcapi/src/lib.rs:90
#8  0x000000000088aa0e in free_loc (f=<optimised out>, i=<optimised out>, idx=<optimised out>) at lyk.c:66
#9  0x000000000088aba5 in yk_free_locactions (f=0x928cc0) at lyk.c:76
--Type <RET> for more, q to quit, c to continue without paging--
#10 0x0000000000807738 in luaF_freeproto (L=0x916e38, f=0x928cc0) at lfunc.c:276
#11 0x0000000000809fde in freeobj (L=0x916e38, o=0x928cc0) at lgc.c:767
#12 0x0000000000817145 in sweepgen (L=0x916e38, g=<optimised out>, p=<optimised out>, limit=<optimised out>, pfirstold1=<optimised out>) at lgc.c:1106
#13 0x0000000000816713 in youngcollection (L=0x916e38, g=0x916f00) at lgc.c:1239
#14 0x000000000081557d in genstep (L=0x916e38, g=0x916f00) at lgc.c:1434
#15 0x0000000000815044 in luaC_step (L=0x916e38) at lgc.c:1686
#16 0x00000000007e4447 in lua_pushstring (L=0x916e38, s=<optimised out>) at lapi.c:553
#17 0x000000000089e60c in findloader (L=0x916e38, name=0x928df8 "tracegc") at loadlib.c:641
#18 0x000000000089de0a in ll_require (L=0x916e38) at loadlib.c:666
#19 0x00000000007fda75 in precallC (L=0x916e38, func=<optimised out>, nresults=<optimised out>, f=0x89db80 <ll_require>) at ldo.c:506
#20 0x00000000007fe016 in luaD_precall (L=0x916e38, func=0x923780, nresults=1) at ldo.c:569
#21 0x0000000000883f68 in luaV_execute (L=0x916e38, ci=<optimised out>) at lvm.c:1655
#22 0x00000000007feb3b in ccall (L=0x916e38, func=<optimised out>, nResults=<optimised out>, inc=<optimised out>) at ldo.c:609
#23 0x00000000007fec61 in luaD_callnoyield (L=0x916e38, func=0x917730, nResults=-1) at ldo.c:627
#24 0x00000000007eaf03 in f_call (L=0x916e38, ud=<optimised out>) at lapi.c:1041
#25 0x00000000007f8be7 in luaD_rawrunprotected (L=0x916e38, f=0x7eae40 <f_call>, ud=0x7ffff2487308) at ldo.c:144
#26 0x0000000000801406 in luaD_pcall (L=0x916e38, func=0x7eae40 <f_call>, u=0x7ffff2487308, old_top=<optimised out>, ef=<optimised out>) at ldo.c:926
#27 0x00000000007ea9ec in lua_pcallk (L=0x916e38, nargs=<optimised out>, nresults=<optimised out>, errfunc=<optimised out>, ctx=<optimised out>, k=<optimised out>)
    at lapi.c:1067
#28 0x00000000007dc623 in docall (L=0x916e38, narg=0, nres=-1) at lua.c:160
#29 0x00000000007dbd24 in handle_script (L=0x916e38, argv=<optimised out>) at lua.c:255
#30 0x00000000007d9fe3 in pmain (L=0x916e38) at lua.c:634
#31 0x00000000007fda75 in precallC (L=0x916e38, func=<optimised out>, nresults=<optimised out>, f=0x7d97f0 <pmain>) at ldo.c:506
#32 0x00000000007fe0c8 in luaD_precall (L=0x916e38, func=0x9176f0, nresults=1) at ldo.c:572
#33 0x00000000007fea7f in ccall (L=0x916e38, func=0x9176f0, nResults=1, inc=<optimised out>) at ldo.c:607
#34 0x00000000007fec61 in luaD_callnoyield (L=0x916e38, func=0x9176f0, nResults=1) at ldo.c:627
#35 0x00000000007eaf03 in f_call (L=0x916e38, ud=<optimised out>) at lapi.c:1041
#36 0x00000000007f8be7 in luaD_rawrunprotected (L=0x916e38, f=0x7eae40 <f_call>, ud=0x7ffff2487058) at ldo.c:144
#37 0x0000000000801406 in luaD_pcall (L=0x916e38, func=0x7eae40 <f_call>, u=0x7ffff2487058, old_top=<optimised out>, ef=<optimised out>) at ldo.c:926
#38 0x00000000007ea9ec in lua_pcallk (L=0x916e38, nargs=<optimised out>, nresults=<optimised out>, errfunc=<optimised out>, ctx=<optimised out>, k=<optimised out>)
    at lapi.c:1067
#39 0x00000000007d94b0 in main (argc=<optimised out>, argv=<optimised out>) at lua.c:660
```

That's cause we're freeing uninitialised yk locations.


Co-authored-by: Pavel Durov <[email protected]>
@Pavel-Durov
Copy link
Contributor Author

Was fixed in #36 and #64

1 similar comment
@Pavel-Durov
Copy link
Contributor Author

Was fixed in #36 and #64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants