sysprof: lj_sysprof.c:139: stream_frame_lua: Assertion `func != ((void *)0)' failed. #8594

Closed
olegrok opened this issue Apr 23, 2023 · 0 comments · Fixed by #9388
Labels: bug, crash, luajit

olegrok commented Apr 23, 2023

Tarantool Enterprise 2.10.6-0-g5d09e81a6-r553-nogc64-debug
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/__w/sdk/sdk/build.sdk/tarantool-2.10/static-build/tarantool-prefix -DENABLE_BACKTRACE=TRUE
Compiler: GNU-9.3.1
C_FLAGS: -static-libstdc++ -fexceptions -funwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/__w/sdk/sdk/tarantool-2.10=. -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -static-libstdc++ -fexceptions -funwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/__w/sdk/sdk/tarantool-2.10=. -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror

Segmentation fault
  code: SEGV_MAPERR
  addr: 0x8
  context: 0x7f839cdff400
  siginfo: 0x7f839cdff530
  rax      0x0                0
  rbx      0x407ed53c         1082053948
  rcx      0x32               50
  rdx      0x20               32
  rsi      0x4000             16384
  rdi      0x407ed010         1082052624
  rsp      0x7f839cdff9c0     140203249367488
  rbp      0x4185cf00         1099288320
  r8       0x5653589694a2     94915968537762
  r9       0x0                0
  r10      0x7f845239af50     140206291922768
  r11      0xfffffffe         4294967294
  r12      0x0                0
  r13      0x0                0
  r14      0x407ee088         1082056840
  r15      0x565358941833     94915968374835
  rip      0x565358968fae     94915968536494
  eflags   0x10206            66054
  cs       0x33               51
  gs       0x0                0
  fs       0x0                0
  cr2      0x8                8
  err      0x4                4
  oldmask  0x0                0
  trapno   0xe                14
Current time: 1682238900
Please file a bug at http://github.com/tarantool/tarantool/issues
Attempting backtrace... Note: since the server has already crashed, 
this may fail as well
#1  0x5653588abc6d in crash_collect+272
#2  0x5653588ac677 in crash_signal_cb+96
#3  0x7f8452242520 in __sigaction+80
#4  0x565358968fae in lj_alloc_free+3739
#5  0x5653589694d5 in lj_alloc_f+51
#6  0x56535893000a in lj_tab_free+652
#7  0x56535891e91b in gc_sweep+463
#8  0x56535891fd58 in gc_onestep+562
#9  0x56535891ffa8 in lj_gc_step+164
#10 0x56535890e4a0 in lua_pushstring+93
#11 0x5653589419b9 in lj_cf_debug_getlocal+390
#12 0x5653589081a7 in lj_BC_FUNCC+70
#13 0x565358912edb in lua_pcall+845
#14 0x56535888c2c8 in luaT_call+41
#15 0x565358885ddb in lua_fiber_run_f+124
#16 0x5653586469b2 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+30
#17 0x5653588b6007 in fiber_loop+145
#18 0x56535926d824 in coro_init+116
[1]    3425775 IOT instruction (core dumped)  ./init.lua --bootstrap true

Backtrace:

Thread 1 "tarantool" received signal SIGSEGV, Segmentation fault.
0x0000555555981ed7 in tmalloc_large (m=0x40000010, nb=1040) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_alloc.c:1196
1196	/__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_alloc.c: No such file or directory.
(gdb) bt
#0  0x0000555555981ed7 in tmalloc_large (m=0x40000010, nb=1040) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_alloc.c:1196
#1  0x0000555555982ea0 in lj_alloc_malloc (msp=0x40000010, nsize=1032) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_alloc.c:1323
#2  0x00005555559844f3 in lj_alloc_f (msp=0x40000010, ptr=0x0, osize=0, nsize=1032) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_alloc.c:1483
#3  0x000055555593b8ef in lj_mem_realloc (L=0x44091790, p=0x0, osz=0, nsz=1032) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_gc.c:860
#4  0x000055555594a4af in newtab (L=0x44091790, asize=129, hbits=0) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_tab.c:138
#5  0x000055555594a5fa in lj_tab_new (L=0x44091790, asize=129, hbits=0) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_tab.c:161
#6  0x000055555594a784 in lj_tab_new_ah (L=0x44091790, a=128, h=0) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_tab.c:170
#7  0x0000555555929c05 in lua_createtable (L=0x44091790, narray=128, nrec=0) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_api.c:731
#8  0x00005555558a420f in luamp_decode (L=0x44091790, cfg=0x40019ad8, data=0x7fff48e7fa30) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lua/msgpack.c:361
#9  0x00005555558a4232 in luamp_decode (L=0x44091790, cfg=0x40019ad8, data=0x7fff48e7fa30) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lua/msgpack.c:363
#10 0x00005555558a4232 in luamp_decode (L=0x44091790, cfg=0x40019ad8, data=0x7fff48e7fa30) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lua/msgpack.c:363
#11 0x00005555558a4232 in luamp_decode (L=0x44091790, cfg=0x40019ad8, data=0x7fff48e7fa30) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lua/msgpack.c:363
#12 0x00005555558a4232 in luamp_decode (L=0x44091790, cfg=0x40019ad8, data=0x7fff48e7fa30) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lua/msgpack.c:363
#13 0x0000555555871bae in port_msgpack_dump_lua (base=0x7fff48e7fd10, L=0x44091790, is_flat=true) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/lua/misc.cc:176
#14 0x000055555585d696 in port_dump_lua (port=0x7fff48e7fd10, L=0x44091790, is_flat=true) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lib/core/port.h:149
#15 0x000055555585e601 in push_lua_args (L=0x44091790, ctx=0x7fff48e7fca0) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/lua/call.c:356
#16 0x000055555585e6b6 in execute_lua_call (L=0x44091790) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/lua/call.c:383
#17 0x00005555559231a7 in lj_BC_FUNCC () at buildvm_x86.dasc:811
#18 0x000055555592dedb in lua_pcall (L=0x44091790, nargs=1, nresults=-1, errfunc=0) at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/luajit/src/lj_api.c:1163
#19 0x00005555558a72c8 in luaT_call (L=0x44091790, nargs=1, nreturns=-1) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lua/utils.c:497
#20 0x000055555585f1a2 in box_process_lua (handler=HANDLER_CALL, ctx=0x7fff48e7fca0, ret=0x7fff48e7fe40)
    at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/lua/call.c:644
#21 0x000055555585f26d in box_lua_call (
    name=0x7fff54a9a1ce "__netbox_call_with_fiber_storage!\223\204\247account\202\254is_anonymousãstr\251anonymous\253routing_key\260MarketDataReport\245start\313A\331\021;\330ڙ\002\242id\331$156f796c-d60a-4728-aad0-9c66e24e5f79\263vshard.storage.call\224\315&\224\245write\260vshard_proxy"..., name_len=32, args=0x7fff48e7fd10, 
    ret=0x7fff48e7fe40) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/lua/call.c:678
#22 0x00005555557b9600 in box_process_call (request=0x7fffed414d58, port=0x7fff48e7fe40) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/call.c:180
#23 0x000055555566c068 in tx_process_call (m=0x7fffed414cd0) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/box/iproto.cc:2155
#24 0x00005555558d7adf in cmsg_deliver (msg=0x7fffed414cd0) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lib/core/cbus.c:375
#25 0x00005555558d8c0c in fiber_pool_f (ap=0x7fff4a403ac8) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lib/core/fiber_pool.c:64
#26 0x00005555556619b2 in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=0x5555558d8a56 <fiber_pool_f>, ap=0x7fff4a403ac8)
    at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lib/core/fiber.h:1022
#27 0x00005555558d1007 in fiber_loop (data=0x0) at /__w/sdk/sdk/tarantool-2.10/tarantool/src/lib/core/fiber.c:921
#28 0x0000555556288824 in coro_init () at /__w/sdk/sdk/tarantool-2.10/tarantool/third_party/coro/coro.c:108

Sysprof was started with the following code:

local ok, err = misc.sysprof.start({mode = 'L', interval = 25, path = '/home/oleg/Projects/tdg/tdg2/sysprof.bin'})
if not ok then
    error(err)
end

box.ctl.on_shutdown(function()
    misc.sysprof.stop()
end)

Initially it failed with the stack trace above. When I tried to run Tarantool under gdb, I got an assertion failure.

Here is also a stack trace from a release build with debug info:

Segmentation fault
  code: 128
  addr: (nil)
  context: 0x7ffaf037df40
  siginfo: 0x7ffaf037e070
  rax      0x45dbc450         1172030544
  rbx      0x41129010         1091735568
  rcx      0xfffffffb00000000 -21474836480
  rdx      0xff0b0245dbc438   71788223451087928
  rsi      0x0                0
  rdi      0x2c8              712
  rsp      0x7ffaf037e4e0     140715748746464
  rbp      0x7c8              1992
  r8       0x260              608
  r9       0x41129288         1091736200
  r10      0xffff             65535
  r11      0xc800000000000000 -4035225266123964416
  r12      0xfffffffffffff838 -1992
  r13      0x5                5
  r14      0x0                0
  r15      0x7c0              1984
  rip      0x559e18ddb4c8     94137510376648
  eflags   0x10282            66178
  cs       0x33               51
  gs       0x0                0
  fs       0x0                0
  cr2      0x0                0
  err      0x0                0
  oldmask  0x4000000          67108864
  trapno   0xd                13
Current time: 1682267042
Please file a bug at http://github.com/tarantool/tarantool/issues
Attempting backtrace... Note: since the server has already crashed, 
this may fail as well
#1  0x559e18d707df in crash_signal_cb+175
#2  0x7ffbace42520 in __sigaction+80
#3  0x559e18ddb4c8 in lj_alloc_malloc+1240
#4  0x559e18dbbe93 in lj_mem_realloc+51
#5  0x559e18dd9945 in resolve_symbolnames+357
#6  0x7ffbacf74ed0 in dl_iterate_phdr+416
#7  0x559e18dd9fce in lj_symtab_dump_newc+46
#8  0x559e18dda1ca in stream_host+42
#9  0x559e18dda3ab in sysprof_signal_handler+91
#10 0x7ffbace42520 in __sigaction+80
#11 0x559e18ddc63c in lj_alloc_free+1116
#12 0x559e18dc3715 in lj_tab_free+245
#13 0x559e18dba7ee in gc_sweep+158
#14 0x559e18dbb3fb in gc_onestep+75
#15 0x559e18dbbb84 in lj_gc_step+84
#16 0x559e18db5155 in lua_pushstring+133
#17 0x559e18dcc0f3 in lj_cf_debug_getlocal+211
#18 0x559e18db17e3 in lj_BC_FUNCC+70
#19 0x559e18db61a4 in lua_pcall+116
#20 0x559e18d57a3b in luaT_call+11
#21 0x559e18d515dc in lua_fiber_run_f+92
#22 0x559e18b99f0d in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+13
#23 0x559e18d79a5d in fiber_loop+61
#24 0x559e19672f68 in coro_init+72
olegrok added the crash, bug, and luajit labels Apr 23, 2023
mkokryashkin added a commit to tarantool/luajit that referenced this issue May 4, 2023
Currently, the symtab update is not signal-safe, and
it needs to be fixed. One possible solution is
to perform that update in a VM hook instead of the sysprof
signal handler.

This patch adds a temporary fix for the problem by introducing
the `SPS_GUARD` state to sysprof, which prohibits any
symtab updates without stopping the sampling process.

Part of tarantool/tarantool#8140
Part of tarantool/tarantool#8594
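
The release-build backtrace in this issue shows the hazard: `sysprof_signal_handler` interrupts `lj_alloc_free` and, through `resolve_symbolnames`, re-enters the allocator via `lj_alloc_malloc`. As a rough illustration of the "do the update outside the signal handler" idea, here is a minimal C sketch. It is not LuaJIT code: `symtab_copy`, `profiler_signal_handler`, `run_deferred_symtab_update`, and `demo_deferred_update` are hypothetical names, and the "VM hook" is modelled as a plain function call from normal context.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the symbol table; not LuaJIT's real structure. */
static char *symtab_copy = NULL;

/* Set by the signal handler, consumed later at a safe point. */
static volatile sig_atomic_t symtab_update_pending = 0;

/* Async-signal-safe: only writes a sig_atomic_t flag, never allocates. */
static void profiler_signal_handler(int sig)
{
  (void)sig;
  symtab_update_pending = 1;
}

/*
 * Runs in ordinary (non-signal) context, e.g. from a hook between VM steps.
 * Allocation is safe here because no allocator call can be interrupted
 * by this code. Returns 1 if an update was performed, 0 otherwise.
 */
static int run_deferred_symtab_update(void)
{
  if (!symtab_update_pending)
    return 0;
  symtab_update_pending = 0;
  free(symtab_copy);
  symtab_copy = malloc(16);
  assert(symtab_copy != NULL);
  strcpy(symtab_copy, "symtab-v2");
  return 1;
}

/* Installs the handler, raises one SIGPROF, then drains the pending work.
 * Returns the number of updates performed, or -1 on setup failure. */
static int demo_deferred_update(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = profiler_signal_handler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGPROF, &sa, NULL) != 0)
    return -1;
  raise(SIGPROF);                              /* handler only sets the flag */
  int updates = run_deferred_symtab_update();  /* performs the update */
  updates += run_deferred_symtab_update();     /* nothing pending any more */
  return updates;
}
```

Calling `demo_deferred_update()` performs exactly one update for one delivered signal. The point of the design is that the handler body stays trivially async-signal-safe, while all allocation happens in a context that cannot be nested inside `malloc` or `free`.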
mkokryashkin added a commit to tarantool/luajit that referenced this issue Jun 5, 2023
Sometimes the Lua stack can be inconsistent during
FFUNC execution, which may lead to a sysprof
crash during stack unwinding.

This patch replaces the `top_frame` property of `global_State`
with the `lj_sysprof_topframe` structure, which contains the `top_frame`
and `ffid` properties. The `ffid` property is meaningful only when the
LuaJIT VM state is set to `FFUNC`; it is set to the
ffid of the fast function that the VM is about to execute.
At the same time, the `top_frame` property is no longer updated, so
the top frame of the Lua stack can be streamed based on the ffid,
and the rest of the Lua stack can be streamed as usual.

Resolves tarantool/tarantool#8594
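
To make the idea in this commit message concrete, here is a toy C model of streaming a stack whose topmost entry comes from a saved ffid rather than from a possibly inconsistent frame. Everything in it is an assumption made for illustration: `toy_vm`, `toy_vmstate`, `stream_stack`, and `demo_ffunc_stream` are hypothetical, frames are reduced to integer function ids, and the real `lj_sysprof_topframe` layout is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of VM states. */
enum toy_vmstate { TOY_VMST_LUA, TOY_VMST_FFUNC };

/* Toy model: each frame is just a function id; frames[0] is the bottom. */
struct toy_vm {
  enum toy_vmstate vmstate;
  const int *frames;
  size_t top_frame;  /* index of the last frame known to be consistent */
  int ffid;          /* meaningful only when vmstate == TOY_VMST_FFUNC */
};

/*
 * Writes the stack from top to bottom into out[] and returns the number
 * of entries written (at most cap). In the FFUNC state the topmost entry
 * is the saved ffid; the rest is walked from top_frame as usual.
 */
static size_t stream_stack(const struct toy_vm *vm, int *out, size_t cap)
{
  size_t n = 0;
  if (vm->vmstate == TOY_VMST_FFUNC && n < cap)
    out[n++] = vm->ffid;
  for (size_t i = vm->top_frame + 1; i-- > 0 && n < cap; )
    out[n++] = vm->frames[i];
  return n;
}

/* Streams a three-frame stack while fast function 99 is "executing". */
static size_t demo_ffunc_stream(int *out, size_t cap)
{
  static const int frames[] = { 10, 20, 30 };
  struct toy_vm vm = { TOY_VMST_FFUNC, frames, 2, 99 };
  return stream_stack(&vm, out, cap);
}
```

With an output buffer of at least four entries, `demo_ffunc_stream` yields the ffid first, followed by the three regular frames from top to bottom. The sampler never has to inspect the half-built frame of the fast function itself.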
mkokryashkin added a commit to mkokryashkin/tarantool that referenced this issue Jun 5, 2023
sysprof: fix crash during FFUNC stream

Closes tarantool#8594

NO_DOC=LuaJIT bump
NO_TEST=LuaJIT bump
ligurio added a commit to ligurio/tarantool that referenced this issue Nov 8, 2023
Follows up tarantool#8594

NO_CHANGELOG=testing
NO_DOC=testing
NO_TEST=testing
igormunkin pushed a commit to tarantool/luajit that referenced this issue Nov 13, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution,
which may lead to a sysprof crash during stack unwinding.

This patch replaces the `top_frame` property of `global_State` with the
`lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid`
properties. The `ffid` property is meaningful only when the LuaJIT VM state
is set to `FFUNC`; it is set to the ffid of the fast function that the VM
is about to execute. At the same time, the `top_frame` property is no longer
updated, so the top frame of the Lua stack can be streamed based on the
ffid, and the rest of the Lua stack can be streamed as usual.

Also, this patch fixes the build via Makefile.original by adding the
`LJ_HASSYSPROF` flag support to it.

Resolves tarantool/tarantool#8594

Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Reviewed-by: Sergey Bronnikov <sergeyb@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
(cherry picked from commit 285a1b0)
igormunkin added a commit to igormunkin/tarantool that referenced this issue Nov 21, 2023
* Mark CONV as non-weak, to prevent elimination of its side-effect.
* Fix ABC FOLD rule with constants.
* test: add test for conversions folding
* Add NaN check to IR_NEWREF.
* LJ_GC64: Fix lua_concat().
* test: introduce asserts assert_str{_not}_equal
* ci: enable codespell
* cmake: introduce target with codespell
* codehealth: fix typos
* tools: add cli flag to run profile dump parsers
* profilers: purge generation mechanism
* memprof: refactor symbol resolution
* sysprof: fix crash during FFUNC stream
* Fix last commit.
* Print errors from __gc finalizers instead of rethrowing them.
* x86/x64: Fix math.ceil(-0.9) result sign.
* test: fix flaky fix-jit-dump-ir-conv.test.lua
* IR_MIN/IR_MAX is non-commutative due to underlying FPU ops.
* Fix jit.dump() output for IR_CONV.
* Fix FOLD rule for x-0.
* FFI: Fix pragma push stack limit check and throw on overflow.
* Prevent compile of __concat with tailcall to fast function.
* Fix base register coalescing in side trace.
* Fix register mask for stack check in head of side trace.
* x64: Properly fix __call metamethod return dispatch.

Closes tarantool#8594
Closes tarantool#8767
Closes tarantool#9339
Part of tarantool#9145

NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
igormunkin added a commit that referenced this issue Nov 21, 2023
* Mark CONV as non-weak, to prevent elimination of its side-effect.
* Fix ABC FOLD rule with constants.
* test: add test for conversions folding
* Add NaN check to IR_NEWREF.
* test: fix flaky OOM error frame test
* LJ_GC64: Fix lua_concat().
* test: introduce asserts assert_str{_not}_equal
* ci: enable codespell
* cmake: introduce target with codespell
* codehealth: fix typos
* tools: add cli flag to run profile dump parsers
* profilers: purge generation mechanism
* memprof: refactor symbol resolution
* sysprof: fix crash during FFUNC stream
* Fix last commit.
* Print errors from __gc finalizers instead of rethrowing them.
* x86/x64: Fix math.ceil(-0.9) result sign.
* test: fix flaky fix-jit-dump-ir-conv.test.lua
* IR_MIN/IR_MAX is non-commutative due to underlying FPU ops.
* Fix jit.dump() output for IR_CONV.
* Fix FOLD rule for x-0.
* FFI: Fix pragma push stack limit check and throw on overflow.
* Prevent compile of __concat with tailcall to fast function.
* Fix base register coalescing in side trace.
* Fix register mask for stack check in head of side trace.
* x64: Properly fix __call metamethod return dispatch.

Closes #8594
Closes #8767
Closes #9339
Part of #9145

NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
igormunkin added a commit to tarantool/luajit that referenced this issue Nov 25, 2023
Often, tests for the sampling profiler require long-running loops to be
executed, so a certain situation is likely to occur. Thus, the test
added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream")
expects the FFUNC VM state (and even the particular instruction to be
executed) at the moment when the stack trace is being collected.
Unfortunately, this makes the test routine hang in several environments,
and even when it does not, we cannot guarantee that the desired scenario
is tested (we can only rely on statistics). As a result, the test for
the aforementioned patch was disabled for Tarantool CI in commit fef60a1
("test: prevent hanging Tarantool CI by sysprof test") until the issue
is resolved.

This patch introduces the new approach for testing our sampling profiler
via <ptrace> implementing precise managed execution of the profiled
instance mentioned in tarantool/tarantool#9387.

Instead of looping over <tostring> a gazillion times, we accurately step
to the exact place where the issue reproduces and manually emit SIGPROF
to the Lua VM being profiled. The particular approach implemented in
this patch is described below.

As mentioned above, the test makes sysprof collect a particular event
type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to
reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to
call the <tostring> fast function in the profiled instance (i.e., the
"tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the
<tostring> builtin, a manager (i.e., the "tracer") is implemented.

Here are the main steps (see comments and `man 2 ptrace' for more info):
  1. Poison the first instruction at <lj_ff_tostring> with <int 3> to
     stop at the beginning of the fast function;
  2. Resume the "tracee" from the "tracer";
  3. Hit the emitted interrupt, restore the original instruction and
     "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
  4. Repeat steps 1-3 for <lj_fff_res1>;
  5. Emit SIGPROF while resuming the "tracee".

As a result sysprof collects the full backtrace with <tostring> fast
function as the topmost frame.
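
The breakpoint bookkeeping from steps 1 and 3 can be sketched in plain C
as follows. This is only an illustrative model, not the code from the
patch: `struct fake_tracee`, `struct breakpoint`, `bp_set` and `bp_hit`
are hypothetical names, and a byte array stands in for the tracee's code.
With a real tracee, the same reads and writes would presumably go through
ptrace requests such as PTRACE_PEEKTEXT, PTRACE_POKETEXT, PTRACE_GETREGS
and PTRACE_SETREGS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC /* x86 one-byte <int 3> breakpoint instruction */

/* Hypothetical model of a tracee: its code bytes and instruction pointer. */
struct fake_tracee {
	uint8_t *text;
	size_t ip;
};

/* Everything needed to undo a breakpoint later. */
struct breakpoint {
	size_t addr;   /* offset of the poisoned instruction */
	uint8_t saved; /* original first byte of that instruction */
};

/* Step 1: remember the original byte and poison it with <int 3>. */
static struct breakpoint bp_set(struct fake_tracee *t, size_t addr)
{
	struct breakpoint bp = { addr, t->text[addr] };
	t->text[addr] = INT3_OPCODE;
	return bp;
}

/*
 * Step 3: after the trap, the instruction pointer is one byte past
 * <int 3>. Restore the original byte and rewind the pointer, so the
 * original instruction is replayed on resume.
 */
static void bp_hit(struct fake_tracee *t, const struct breakpoint *bp)
{
	assert(t->text[bp->addr] == INT3_OPCODE);
	assert(t->ip == bp->addr + 1);
	t->text[bp->addr] = bp->saved;
	t->ip = bp->addr;
}
```

Keeping the saved byte next to the address makes step 4 a matter of
calling the same two helpers again with the address of <lj_fff_res1>.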

Resolves tarantool/tarantool#9387
Follows up tarantool/tarantool#8594
igormunkin added a commit to igormunkin/tarantool that referenced this issue Nov 28, 2023
Closes tarantool#9387
Follows up tarantool#7900
Follows up tarantool#8594

NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
NO_CHANGELOG=LuaJIT submodule bump
igormunkin added a commit to igormunkin/tarantool that referenced this issue Nov 28, 2023
* test: rewrite sysprof test using managed execution
* test: disable buffering for the C test engine

Closes tarantool#9387
Follows up tarantool#7900
Follows up tarantool#8594

NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
NO_CHANGELOG=LuaJIT submodule bump
igormunkin added a commit to tarantool/luajit that referenced this issue Dec 5, 2023
Often, tests for the sampling profiler require long-running loops to be
executed, so a certain situation is likely to occur. Thus, the test
added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream")
expects the FFUNC VM state (and even the particular instruction to be
executed) at the moment when the stack trace is being collected.
Unfortunately, this makes the test routine hang in several environments,
and even when it does not, we cannot guarantee that the desired scenario
is tested (we can only rely on statistics). As a result, the test for
the aforementioned patch was disabled for Tarantool CI in commit fef60a1
("test: prevent hanging Tarantool CI by sysprof test") until the issue
is resolved.

This patch introduces the new approach for testing our sampling profiler
via <ptrace>, implementing precise managed execution of the profiled
instance mentioned in tarantool/tarantool#9387.

Instead of looping over <tostring> a gazillion times, we accurately step
to the exact place where the issue reproduces and manually emit SIGPROF
to the Lua VM being profiled. The particular approach implemented in
this patch is described below.

As mentioned above, the test makes sysprof collect a particular event
type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to
reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to
call the <tostring> fast function in the profiled instance (i.e., the
"tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the
<tostring> builtin, a manager (i.e., the "tracer") is implemented.

Here are the main steps (see comments and `man 2 ptrace' for more info):
  1. Poison the first instruction at <lj_ff_tostring> with <int 3> to
     stop at the beginning of the fast function;
  2. Resume the "tracee" from the "tracer";
  3. Hit the emitted interrupt, restore the original instruction and
     "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
  4. Repeat steps 1-3 for <lj_fff_res1>;
  5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with <tostring> fast
function as the topmost frame.
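
Step 5 boils down to delivering SIGPROF on demand rather than waiting
for a profiling timer to fire. A minimal in-process sketch, assuming a
POSIX system: `raise()` stands in here for the tracer resuming the
tracee with the signal (with ptrace, the signal number is passed as the
data argument of PTRACE_CONT), and `on_sigprof`, `samples_taken` and
`take_one_sample` are hypothetical names rather than part of sysprof.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigaction() under strict -std=c11 */
#endif

#include <signal.h>
#include <string.h>

/* Number of SIGPROF deliveries seen so far (stand-in for a sample). */
static volatile sig_atomic_t samples_taken;

/* Hypothetical handler: a real profiler would collect a backtrace here. */
static void on_sigprof(int signo)
{
	(void)signo;
	samples_taken++;
}

/*
 * Install the handler and deliver exactly one SIGPROF to the current
 * thread. Returns 0 on success and a nonzero value on failure.
 */
static int take_one_sample(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigprof;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGPROF, &sa, NULL) != 0)
		return -1;
	/* raise() returns only after the handler has run. */
	return raise(SIGPROF);
}
```

Because the signal is sent explicitly, exactly one sample is taken at a
known point, which is what makes the scenario deterministic.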

Resolves tarantool/tarantool#9387
Follows up tarantool/tarantool#8594
igormunkin added a commit to tarantool/luajit that referenced this issue Dec 6, 2023
Often, tests for the sampling profiler require long-running loops to be
executed, so a certain situation is likely to occur. Thus, the test
added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream")
expects the FFUNC VM state (and even the particular instruction to be
executed) at the moment when the stack trace is being collected.
Unfortunately, this makes the test routine hang in several environments,
and even when it does not, we cannot guarantee that the desired scenario
is tested (we can only rely on statistics). As a result, the test for
the aforementioned patch was disabled for Tarantool CI in commit fef60a1
("test: prevent hanging Tarantool CI by sysprof test") until the issue
is resolved.

This patch introduces the new approach for testing our sampling profiler
via <ptrace>, implementing precise managed execution of the profiled
instance mentioned in tarantool/tarantool#9387.

Instead of looping over <tostring> a gazillion times, we accurately step
to the exact place where the issue reproduces and manually emit SIGPROF
to the Lua VM being profiled. The particular approach implemented in
this patch is described below.

As mentioned above, the test makes sysprof collect a particular event
type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to
reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to
call the <tostring> fast function in the profiled instance (i.e., the
"tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the
<tostring> builtin, a manager (i.e., the "tracer") is implemented.

Here are the main steps (see comments and `man 2 ptrace' for more info):
  1. Poison the first instruction at <lj_ff_tostring> with <int 3> to
     stop at the beginning of the fast function;
  2. Resume the "tracee" from the "tracer";
  3. Hit the emitted interrupt, restore the original instruction and
     "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
  4. Repeat steps 1-3 for <lj_fff_res1>;
  5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with <tostring> fast
function as the topmost frame.

Resolves tarantool/tarantool#9387
Follows up tarantool/tarantool#8594

Reviewed-by: Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
(cherry picked from commit caa9986)
igormunkin added a commit to tarantool/luajit that referenced this issue Dec 6, 2023
Often, tests for the sampling profiler require long-running loops to be
executed, so a certain situation is likely to occur. Thus, the test
added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream")
expects the FFUNC VM state (and even the particular instruction to be
executed) at the moment when the stack trace is being collected.
Unfortunately, this makes the test routine hang in several environments,
and even when it does not, we cannot guarantee that the desired scenario
is tested (we can only rely on statistics). As a result, the test for
the aforementioned patch was disabled for Tarantool CI in commit fef60a1
("test: prevent hanging Tarantool CI by sysprof test") until the issue
is resolved.

This patch introduces the new approach for testing our sampling profiler
via <ptrace>, implementing precise managed execution of the profiled
instance mentioned in tarantool/tarantool#9387.

Instead of looping over <tostring> a gazillion times, we accurately step
to the exact place where the issue reproduces and manually emit SIGPROF
to the Lua VM being profiled. The particular approach implemented in
this patch is described below.

As mentioned above, the test makes sysprof collect a particular event
type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to
reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to
call the <tostring> fast function in the profiled instance (i.e., the
"tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the
<tostring> builtin, a manager (i.e., the "tracer") is implemented.

Here are the main steps (see comments and `man 2 ptrace' for more info):
  1. Poison the first instruction at <lj_ff_tostring> with <int 3> to
     stop at the beginning of the fast function;
  2. Resume the "tracee" from the "tracer";
  3. Hit the emitted interrupt, restore the original instruction and
     "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
  4. Repeat steps 1-3 for <lj_fff_res1>;
  5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with <tostring> fast
function as the topmost frame.

Resolves tarantool/tarantool#9387
Follows up tarantool/tarantool#8594

Reviewed-by: Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>