Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track initialised yk locations #64

Merged
merged 1 commit into from
Aug 25, 2023
Merged

Track initialised yk locations #64

merged 1 commit into from
Aug 25, 2023

Conversation

Pavel-Durov
Copy link
Contributor

Partially fixes (only for serial compilation) - #62

Changes

  • Remove reallocarray in favour of calloc
  • Change YkLocations to be stored as pointers
  • Change lyk interface to reflect its "hooks" functionality
  • Move tests to a separate test script and enable serialised all.lua test suite
  • Added logging in lyk module for ease of debugging
  • Updated readme

test.sh was tested for resiliency multiple times:

$ try_repeat -v 100 sh ./test.sh

Issues

Uninitialised locations memory

reallocarray does not initialise memory with default zero bytes locations as calloc.
Moving to calloc and using pointers allowed to check for NULL with default values set at initialisation time.

Tests

When lua tests are executed one by one as single files it produces different results from when it's executed via all.lua test suite.

Example:

$ YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" ./main.lua 
...
../src/lua: ./main.lua:343: assertion failed!
stack traceback:
        [C]: in function 'assert'
        ./main.lua:343: in main chunk
        [C]: in ?

When run through all.lua it passes:

$ YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" ./all.lua 
...
***** FILE 'main.lua'*****
...
***** FILE 'gc.lua'*****
...

Since all.lua is what we base the stability of yklua, I think we should include it in the tests as is, currently only serialised compilation works.

Running all.lua prior to these changes results in error (main/56c5787799b876f36babbae24e9afc025806b831):

$ YKD_SERIALISE_COMPILATION=1 gdb -ex 'r' -ex 'bt' --args ../src/lua -e"_U=true" ./all.lua 
...
***** FILE 'main.lua'*****

Program received signal SIGSEGV, Segmentation fault.
core::sync::atomic::AtomicUsize::fetch_sub (self=<optimised out>) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs:2575
2575    /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs: No such file or directory.
#0  core::sync::atomic::AtomicUsize::fetch_sub (self=<optimised out>) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs:2575
#1  alloc::sync::{impl#33}::drop<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global> (self=0x7fffffffb410)
    at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/alloc/src/sync.rs:2370
#2  0x00007ffff7b0d4ab in core::ptr::drop_in_place<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global>> () at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/ptr/mod.rs:497
#3  0x00007ffff7b0067e in core::mem::drop<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global>> (_x=...) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/mem/mod.rs:987
#4  0x00007ffff7b121f0 in ykrt::location::{impl#1}::drop (self=0x7fffffffb468) at ykrt/src/location.rs:197
#5  0x00007ffff7affa9b in core::ptr::drop_in_place<ykrt::location::Location> () at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/ptr/mod.rs:497
#6  0x00007ffff7aff6bd in core::mem::drop<ykrt::location::Location> (_x=...) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/mem/mod.rs:987
#7  0x00007ffff7b004bd in ykcapi::yk_location_drop (loc=...) at ykcapi/src/lib.rs:90
#8  0x000000000088aa0e in free_loc (f=<optimised out>, i=<optimised out>, idx=<optimised out>) at lyk.c:66
#9  0x000000000088aba5 in yk_free_locactions (f=0x928cc0) at lyk.c:76
--Type <RET> for more, q to quit, c to continue without paging--
#10 0x0000000000807738 in luaF_freeproto (L=0x916e38, f=0x928cc0) at lfunc.c:276
#11 0x0000000000809fde in freeobj (L=0x916e38, o=0x928cc0) at lgc.c:767
#12 0x0000000000817145 in sweepgen (L=0x916e38, g=<optimised out>, p=<optimised out>, limit=<optimised out>, pfirstold1=<optimised out>) at lgc.c:1106
#13 0x0000000000816713 in youngcollection (L=0x916e38, g=0x916f00) at lgc.c:1239
#14 0x000000000081557d in genstep (L=0x916e38, g=0x916f00) at lgc.c:1434
#15 0x0000000000815044 in luaC_step (L=0x916e38) at lgc.c:1686
#16 0x00000000007e4447 in lua_pushstring (L=0x916e38, s=<optimised out>) at lapi.c:553
#17 0x000000000089e60c in findloader (L=0x916e38, name=0x928df8 "tracegc") at loadlib.c:641
#18 0x000000000089de0a in ll_require (L=0x916e38) at loadlib.c:666
#19 0x00000000007fda75 in precallC (L=0x916e38, func=<optimised out>, nresults=<optimised out>, f=0x89db80 <ll_require>) at ldo.c:506
#20 0x00000000007fe016 in luaD_precall (L=0x916e38, func=0x923780, nresults=1) at ldo.c:569
#21 0x0000000000883f68 in luaV_execute (L=0x916e38, ci=<optimised out>) at lvm.c:1655
#22 0x00000000007feb3b in ccall (L=0x916e38, func=<optimised out>, nResults=<optimised out>, inc=<optimised out>) at ldo.c:609
#23 0x00000000007fec61 in luaD_callnoyield (L=0x916e38, func=0x917730, nResults=-1) at ldo.c:627
#24 0x00000000007eaf03 in f_call (L=0x916e38, ud=<optimised out>) at lapi.c:1041
#25 0x00000000007f8be7 in luaD_rawrunprotected (L=0x916e38, f=0x7eae40 <f_call>, ud=0x7ffff2487308) at ldo.c:144
#26 0x0000000000801406 in luaD_pcall (L=0x916e38, func=0x7eae40 <f_call>, u=0x7ffff2487308, old_top=<optimised out>, ef=<optimised out>) at ldo.c:926
#27 0x00000000007ea9ec in lua_pcallk (L=0x916e38, nargs=<optimised out>, nresults=<optimised out>, errfunc=<optimised out>, ctx=<optimised out>, k=<optimised out>)
    at lapi.c:1067
#28 0x00000000007dc623 in docall (L=0x916e38, narg=0, nres=-1) at lua.c:160
#29 0x00000000007dbd24 in handle_script (L=0x916e38, argv=<optimised out>) at lua.c:255
#30 0x00000000007d9fe3 in pmain (L=0x916e38) at lua.c:634
#31 0x00000000007fda75 in precallC (L=0x916e38, func=<optimised out>, nresults=<optimised out>, f=0x7d97f0 <pmain>) at ldo.c:506
#32 0x00000000007fe0c8 in luaD_precall (L=0x916e38, func=0x9176f0, nresults=1) at ldo.c:572
#33 0x00000000007fea7f in ccall (L=0x916e38, func=0x9176f0, nResults=1, inc=<optimised out>) at ldo.c:607
#34 0x00000000007fec61 in luaD_callnoyield (L=0x916e38, func=0x9176f0, nResults=1) at ldo.c:627
#35 0x00000000007eaf03 in f_call (L=0x916e38, ud=<optimised out>) at lapi.c:1041
#36 0x00000000007f8be7 in luaD_rawrunprotected (L=0x916e38, f=0x7eae40 <f_call>, ud=0x7ffff2487058) at ldo.c:144
#37 0x0000000000801406 in luaD_pcall (L=0x916e38, func=0x7eae40 <f_call>, u=0x7ffff2487058, old_top=<optimised out>, ef=<optimised out>) at ldo.c:926
#38 0x00000000007ea9ec in lua_pcallk (L=0x916e38, nargs=<optimised out>, nresults=<optimised out>, errfunc=<optimised out>, ctx=<optimised out>, k=<optimised out>)
    at lapi.c:1067
#39 0x00000000007d94b0 in main (argc=<optimised out>, argv=<optimised out>) at lua.c:660

That's cause we're freeing uninitialised yk locations.

@Pavel-Durov Pavel-Durov changed the title Track initialised and uninitialised yk locations Track initialised yk locations Aug 25, 2023
@Pavel-Durov
Copy link
Contributor Author

bors try

@vext01
Copy link
Contributor

vext01 commented Aug 25, 2023

So this fixes the crashes?

@Pavel-Durov
Copy link
Contributor Author

Pavel-Durov commented Aug 25, 2023

It fixes the crashes but only for serialised compilation.

YKD_SERIALISE_COMPILATION=1 ${LUA} -e"_U=true" all.lua WORKS 👍
YKD_SERIALISE_COMPILATION=0 ${LUA} -e"_U=true" all.lua FAILS 👎

@Pavel-Durov
Copy link
Contributor Author

looks like bors try didn't trigger the buildbot...

return LYK_VERBOSE;
}

void print_proto_info(char *msg, Proto *f) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I hear Laurie's footsteps approaching...

I'm OK with leaving this debugging code in for now, but once we have everything solved, we should remove it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there's a convincing reason, we should not include it, because we'll never remember to remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also all of the debugging stuff should not be compiled-in for non-devug builds, since they will impact performance.

See NDEBUG in the assert manual page for more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we conditionally compile it in, I'd still like to know: is it going to be useful going forwards? Code, once committed, has a tendency to stay put...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there's a convincing reason, we should not include it, because we'll never remember to remove it.

I knew he'd say that.

So maybe it's best kept as a private patch @Pavel-Durov ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel your pain @Pavel-Durov.

Since Laurie is letting you keep it in, I'd jump on that. Just make sure it's all guarded #ifndef NDEBUG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👉 5048954

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I wasn't clear. We need to use a flag of our own devising so that we can later differentiate "we added this block" from "Lua uses NDEBUG for other purposes". [Admittedly I only see two NDEBUGs in Lua, but it's greater than zero.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok np

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👉 632330f

@Pavel-Durov
Copy link
Contributor Author

bors try

@bors
Copy link
Contributor

bors bot commented Aug 25, 2023

try

Already running a review

@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

If the try succeeds, please squash.

@Pavel-Durov
Copy link
Contributor Author

I don't thing bors try triggered anything...

@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

bors try-

@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

bors ping

@bors
Copy link
Contributor

bors bot commented Aug 25, 2023

pong

@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

bors try

bors bot added a commit that referenced this pull request Aug 25, 2023
@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

@Pavel-Durov Sometimes bors is a bit lazy and needs a try-. I don't know why! It's running now https://ci.soft-dev.org/#/builders/1/builds/2487

@bors
Copy link
Contributor

bors bot commented Aug 25, 2023

try

Build failed:

@Pavel-Durov
Copy link
Contributor Author

failed cause of my typo in buildbot.sh

@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

OK, please push a new commit with the fix.

@Pavel-Durov
Copy link
Contributor Author

fixed 👉 f499cb1

@Pavel-Durov
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Aug 25, 2023
@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

Please squash if try succeeds.

@bors
Copy link
Contributor

bors bot commented Aug 25, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@Pavel-Durov Pavel-Durov force-pushed the tracing-initialised-locations branch from f499cb1 to 150d826 Compare August 25, 2023 17:36
Remove reallocarray in favour of calloc
Change YkLocations to be stored as pointers
Change lyk interface to reflect its "hooks" functionality
Move tests to a separate test script and enabled serialised all.lua test suite
Added logging in lyk module for ease of debugging
Updated readme with debugging instructions
@Pavel-Durov Pavel-Durov force-pushed the tracing-initialised-locations branch from 150d826 to ff503eb Compare August 25, 2023 17:40
@Pavel-Durov
Copy link
Contributor Author

squashed 👉 ff503eb

@ltratt
Copy link
Contributor

ltratt commented Aug 25, 2023

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 25, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit b931338 into main Aug 25, 2023
@bors bors bot deleted the tracing-initialised-locations branch August 25, 2023 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants