sysprof: lj_sysprof.c:139: stream_frame_lua: Assertion `func != ((void *)0)' failed. #8594
Comments
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
May 4, 2023
Currently, the symtab update is not signal-safe, and it needs to be fixed. One possible solution is to perform that update in a VM hook instead of the sysprof signal handler. This patch adds a temporary fix for the problem by introducing the `SPS_GUARD` state to sysprof, which prohibits any symtab updates without stopping the sampling process. Part of tarantool/tarantool#8140 Part of tarantool/tarantool#8594
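The guard idea can be illustrated with a minimal C sketch. Only the `SPS_GUARD` name comes from the commit message; the other states, the counters, and the function names are assumptions made for illustration and are not the actual sysprof code.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical state set: only SPS_GUARD is named in the commit message. */
enum sysprof_state { SPS_IDLE, SPS_PROFILE, SPS_GUARD };

/* sig_atomic_t keeps reads and writes safe with respect to signal delivery. */
static volatile sig_atomic_t sp_state = SPS_IDLE;
static volatile sig_atomic_t samples = 0;        /* samples taken */
static volatile sig_atomic_t symtab_updates = 0; /* symtab updates done */

/* Called outside the signal handler to switch the profiler state. */
static void set_state(enum sysprof_state s)
{
  assert(s >= SPS_IDLE && s <= SPS_GUARD);
  sp_state = s;
}

/* Stand-in for the body of a SIGPROF handler. */
static void on_sample(void)
{
  if (sp_state == SPS_IDLE)
    return;             /* not profiling: nothing to record */
  samples++;            /* sampling continues in SPS_PROFILE and SPS_GUARD */
  if (sp_state == SPS_PROFILE)
    symtab_updates++;   /* the unsafe update is skipped while guarded */
}
```

In this sketch the guarded state still counts a sample but never touches the symbol table, which is the behaviour the message describes.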
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jun 5, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Resolves tarantool/tarantool#8594
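A rough C sketch of the structure described above may help. The field names `top_frame` and `ffid` come from the commit message; the field types, the simplified `Frame` type, the reduced VM-state enum, and the `pick_top_source` helper are assumptions for illustration only and do not reproduce the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for LuaJIT internals (assumptions). */
typedef struct Frame { const char *name; } Frame;
enum vmstate_sketch { VMST_INTERP_SKETCH, VMST_FFUNC_SKETCH };

/* Mirrors the two properties named in the commit message. */
struct lj_sysprof_topframe_sketch {
  Frame *top_frame; /* last consistent top frame of the Lua stack */
  uint8_t ffid;     /* fast function id, meaningful only in FFUNC state */
};

enum top_source { TOP_FROM_FRAME, TOP_FROM_FFID };

/*
 * Decide how the topmost frame should be streamed: in FFUNC state the
 * stack may be inconsistent, so the ffid is used for the top frame.
 */
static enum top_source
pick_top_source(enum vmstate_sketch st,
                const struct lj_sysprof_topframe_sketch *tf)
{
  assert(tf != NULL);
  return st == VMST_FFUNC_SKETCH ? TOP_FROM_FFID : TOP_FROM_FRAME;
}
```

The point of the split is that the unwinder never has to trust a half-built frame: the top entry comes from the ffid, and the walk continues from the saved `top_frame`.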
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jun 5, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jun 7, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jun 7, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jun 8, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jul 3, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jul 3, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jul 4, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jul 4, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jul 10, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build with the plain makefile by adding `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jul 10, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jul 14, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build with the plain makefile by adding `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jul 14, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Jul 17, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build via Makefile.original by adding `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Jul 17, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
ligurio
added a commit
to ligurio/tarantool
that referenced
this issue
Nov 8, 2023
Follows up tarantool#8594 NO_CHANGELOG=testing NO_DOC=testing NO_TEST=testing
mkokryashkin
added a commit
to tarantool/luajit
that referenced
this issue
Nov 8, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build via Makefile.original by adding `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594
mkokryashkin
added a commit
to mkokryashkin/tarantool
that referenced
this issue
Nov 8, 2023
sysprof: fix crash during FFUNC stream Closes tarantool#8594 NO_DOC=LuaJIT bump NO_TEST=LuaJIT bump
igormunkin
pushed a commit
to tarantool/luajit
that referenced
this issue
Nov 13, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build via Makefile.original by adding `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594
igormunkin
pushed a commit
to tarantool/luajit
that referenced
this issue
Nov 13, 2023
Sometimes the Lua stack can be inconsistent during FFUNC execution, which may lead to a sysprof crash during stack unwinding. This patch replaces the `top_frame` property of `global_State` with the `lj_sysprof_topframe` structure, which contains the `top_frame` and `ffid` properties. The `ffid` property is meaningful only when the LuaJIT VM state is set to `FFUNC`; it is set to the ffid of the fast function that the VM is about to execute. At the same time, the `top_frame` property is not updated, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build via Makefile.original by adding `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594 Reviewed-by: Sergey Kaplun <skaplun@tarantool.org> Reviewed-by: Sergey Bronnikov <sergeyb@tarantool.org> Signed-off-by: Igor Munkin <imun@tarantool.org> (cherry picked from commit 285a1b0)
igormunkin
added a commit
to igormunkin/tarantool
that referenced
this issue
Nov 21, 2023
* Mark CONV as non-weak, to prevent elimination of its side-effect.
* Fix ABC FOLD rule with constants.
* test: add test for conversions folding
* Add NaN check to IR_NEWREF.
* LJ_GC64: Fix lua_concat().
* test: introduce asserts assert_str{_not}_equal
* ci: enable codespell
* cmake: introduce target with codespell
* codehealth: fix typos
* tools: add cli flag to run profile dump parsers
* profilers: purge generation mechanism
* memprof: refactor symbol resolution
* sysprof: fix crash during FFUNC stream
* Fix last commit.
* Print errors from __gc finalizers instead of rethrowing them.
* x86/x64: Fix math.ceil(-0.9) result sign.
* test: fix flaky fix-jit-dump-ir-conv.test.lua
* IR_MIN/IR_MAX is non-commutative due to underlying FPU ops.
* Fix jit.dump() output for IR_CONV.
* Fix FOLD rule for x-0.
* FFI: Fix pragma push stack limit check and throw on overflow.
* Prevent compile of __concat with tailcall to fast function.
* Fix base register coalescing in side trace.
* Fix register mask for stack check in head of side trace.
* x64: Properly fix __call metamethod return dispatch.
Closes tarantool#8594
Closes tarantool#8767
Closes tarantool#9339
Part of tarantool#9145
NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
igormunkin
added a commit
to igormunkin/tarantool
that referenced
this issue
Nov 21, 2023
* Mark CONV as non-weak, to prevent elimination of its side-effect.
* Fix ABC FOLD rule with constants.
* test: add test for conversions folding
* Add NaN check to IR_NEWREF.
* test: fix flaky OOM error frame test
* LJ_GC64: Fix lua_concat().
* test: introduce asserts assert_str{_not}_equal
* ci: enable codespell
* cmake: introduce target with codespell
* codehealth: fix typos
* tools: add cli flag to run profile dump parsers
* profilers: purge generation mechanism
* memprof: refactor symbol resolution
* sysprof: fix crash during FFUNC stream
* Fix last commit.
* Print errors from __gc finalizers instead of rethrowing them.
* x86/x64: Fix math.ceil(-0.9) result sign.
* test: fix flaky fix-jit-dump-ir-conv.test.lua
* IR_MIN/IR_MAX is non-commutative due to underlying FPU ops.
* Fix jit.dump() output for IR_CONV.
* Fix FOLD rule for x-0.
* FFI: Fix pragma push stack limit check and throw on overflow.
* Prevent compile of __concat with tailcall to fast function.
* Fix base register coalescing in side trace.
* Fix register mask for stack check in head of side trace.
* x64: Properly fix __call metamethod return dispatch.
Closes tarantool#8594
Closes tarantool#8767
Closes tarantool#9339
Part of tarantool#9145
NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
This was referenced Nov 21, 2023
igormunkin
added a commit
that referenced
this issue
Nov 21, 2023
* Mark CONV as non-weak, to prevent elimination of its side-effect.
* Fix ABC FOLD rule with constants.
* test: add test for conversions folding
* Add NaN check to IR_NEWREF.
* test: fix flaky OOM error frame test
* LJ_GC64: Fix lua_concat().
* test: introduce asserts assert_str{_not}_equal
* ci: enable codespell
* cmake: introduce target with codespell
* codehealth: fix typos
* tools: add cli flag to run profile dump parsers
* profilers: purge generation mechanism
* memprof: refactor symbol resolution
* sysprof: fix crash during FFUNC stream
* Fix last commit.
* Print errors from __gc finalizers instead of rethrowing them.
* x86/x64: Fix math.ceil(-0.9) result sign.
* test: fix flaky fix-jit-dump-ir-conv.test.lua
* IR_MIN/IR_MAX is non-commutative due to underlying FPU ops.
* Fix jit.dump() output for IR_CONV.
* Fix FOLD rule for x-0.
* FFI: Fix pragma push stack limit check and throw on overflow.
* Prevent compile of __concat with tailcall to fast function.
* Fix base register coalescing in side trace.
* Fix register mask for stack check in head of side trace.
* x64: Properly fix __call metamethod return dispatch.
Closes #8594
Closes #8767
Closes #9339
Part of #9145
NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
igormunkin
added a commit
that referenced
this issue
Nov 21, 2023
* Mark CONV as non-weak, to prevent elimination of its side-effect.
* Fix ABC FOLD rule with constants.
* test: add test for conversions folding
* Add NaN check to IR_NEWREF.
* LJ_GC64: Fix lua_concat().
* test: introduce asserts assert_str{_not}_equal
* ci: enable codespell
* cmake: introduce target with codespell
* codehealth: fix typos
* tools: add cli flag to run profile dump parsers
* profilers: purge generation mechanism
* memprof: refactor symbol resolution
* sysprof: fix crash during FFUNC stream
* Fix last commit.
* Print errors from __gc finalizers instead of rethrowing them.
* x86/x64: Fix math.ceil(-0.9) result sign.
* test: fix flaky fix-jit-dump-ir-conv.test.lua
* IR_MIN/IR_MAX is non-commutative due to underlying FPU ops.
* Fix jit.dump() output for IR_CONV.
* Fix FOLD rule for x-0.
* FFI: Fix pragma push stack limit check and throw on overflow.
* Prevent compile of __concat with tailcall to fast function.
* Fix base register coalescing in side trace.
* Fix register mask for stack check in head of side trace.
* x64: Properly fix __call metamethod return dispatch.
Closes #8594
Closes #8767
Closes #9339
Part of #9145
NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
igormunkin
added a commit
to tarantool/luajit
that referenced
this issue
Nov 25, 2023
Tests for the sampling profiler often require long-running loops so that a certain situation is likely to occur. Thus, the test added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream") expects the FFUNC VM state (and even a particular instruction being executed) at the moment when the stack trace is collected. Unfortunately, this makes the test routine hang in several environments, and even when it does not, we cannot guarantee that the desired scenario is tested (we can only rely on statistics). As a result, the test for the aforementioned patch was disabled for Tarantool CI in commit fef60a1 ("test: prevent hanging Tarantool CI by sysprof test") until the issue is resolved.

This patch introduces a new approach to testing our sampling profiler via <ptrace>, implementing the precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running <tostring> a gazillion times, we step to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below.

As mentioned, the test makes sysprof collect a particular event (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to reproduce the issue from tarantool/tarantool#8594. Hence, it is enough to call the <tostring> fast function in the profiled instance (i.e. the "tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the <tostring> builtin, the manager (i.e. the "tracer") is implemented. Here are the main steps (see comments and `man 2 ptrace' for more info):

1. Poison the first instruction at <lj_ff_tostring> with the <int 3> instruction to stop at the beginning of the fast function;
2. Resume the "tracee" from the "tracer";
3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
4. Repeat steps 1-3 for <lj_fff_res1>;
5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with the <tostring> fast function as the topmost frame. Resolves tarantool/tarantool#9387 Follows up tarantool/tarantool#8594
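The byte-level arithmetic behind steps 1 and 3 can be sketched in C as follows. This is a minimal illustration under stated assumptions (x86-64, little-endian, one-byte `int 3` opcode `0xCC`), and the helper names are hypothetical rather than taken from the test. A real tracer would read and write the word with `PTRACE_PEEKTEXT` and `PTRACE_POKETEXT`, adjust RIP with `PTRACE_GETREGS` and `PTRACE_SETREGS`, and pass the signal number as the data argument of `PTRACE_CONT` for step 5.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu /* x86 one-byte breakpoint instruction */

/*
 * Step 1: given the machine word read at the target address, replace its
 * lowest-addressed byte (the low byte on little-endian x86-64) with int 3.
 * The caller keeps the original word so it can be restored later.
 */
static uint64_t poison_word(uint64_t orig)
{
  return (orig & ~(uint64_t)0xFF) | INT3_OPCODE;
}

/*
 * Step 3: after the trap, RIP points just past the one-byte int 3, so it is
 * moved back by one to replay the restored original instruction.
 */
static uint64_t rewind_pc(uint64_t pc_after_trap)
{
  assert(pc_after_trap > 0);
  return pc_after_trap - 1;
}
```

Keeping the arithmetic separate from the `ptrace` calls makes it easy to check in isolation; the system calls themselves need a live tracee and are left out of the sketch.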
igormunkin
added a commit
to igormunkin/tarantool
that referenced
this issue
Nov 28, 2023
Closes tarantool#9387 Follows up tarantool#7900 Follows up tarantool#8594 NO_DOC=LuaJIT submodule bump NO_TEST=LuaJIT submodule bump NO_CHANGELOG=LuaJIT submodule bump
igormunkin
added a commit
to igormunkin/tarantool
that referenced
this issue
Nov 28, 2023
* test: rewrite sysprof test using managed execution
* test: disable buffering for the C test engine
Closes tarantool#9387
Follows up tarantool#7900
Follows up tarantool#8594
NO_DOC=LuaJIT submodule bump
NO_TEST=LuaJIT submodule bump
NO_CHANGELOG=LuaJIT submodule bump
igormunkin
added a commit
to tarantool/luajit
that referenced
this issue
Dec 5, 2023
Tests for the sampling profiler often require long-running loops so that a certain situation is likely to occur. Thus, the test added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream") expects the FFUNC VM state (and even a particular instruction being executed) at the moment when the stack trace is collected. Unfortunately, this makes the test routine hang in several environments, and even when it does not, we cannot guarantee that the desired scenario is tested (we can only rely on statistics). As a result, the test for the aforementioned patch was disabled for Tarantool CI in commit fef60a1 ("test: prevent hanging Tarantool CI by sysprof test") until the issue is resolved.

This patch introduces a new approach to testing our sampling profiler via <ptrace>, implementing the precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running <tostring> a gazillion times, we step to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below.

As mentioned, the test makes sysprof collect a particular event type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to reproduce the issue from tarantool/tarantool#8594. Hence, it is enough to call the <tostring> fast function in the profiled instance (i.e., the "tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the <tostring> builtin, the manager (i.e., the "tracer") is implemented. Here are the main steps (see comments and `man 2 ptrace' for more info):

1. Poison the first instruction at <lj_ff_tostring> with the <int 3> instruction to stop at the beginning of the fast function;
2. Resume the "tracee" from the "tracer";
3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
4. Repeat steps 1-3 for <lj_fff_res1>;
5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with the <tostring> fast function as the topmost frame. Resolves tarantool/tarantool#9387 Follows up tarantool/tarantool#8594
igormunkin
added a commit
to tarantool/luajit
that referenced
this issue
Dec 5, 2023
Tests for the sampling profiler often require long-running loops so that a particular situation is likely to occur. Thus, the test added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream") expects the FFUNC VM state (and even a particular instruction being executed) at the moment the stack trace is collected. Unfortunately, this makes the test routine hang in several environments, and even when it does not, we cannot guarantee that the desired scenario is tested (we can only rely on statistics). As a result, the test for the aforementioned patch was disabled for Tarantool CI in commit fef60a1 ("test: prevent hanging Tarantool CI by sysprof test") until the issue is resolved.

This patch introduces a new approach to testing our sampling profiler via <ptrace>, implementing the precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running <tostring> a gazillion times, we step precisely to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below.

As mentioned, the test makes sysprof collect a particular event type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to reproduce the issue from tarantool/tarantool#8594. Hence, it is enough to call the <tostring> fast function in the profiled instance (i.e., the "tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the <tostring> builtin, the manager (i.e., the "tracer") is implemented.

Here are the main steps (see comments and `man 2 ptrace' for more info):
1. Poison the first instruction at <lj_ff_tostring> with <int 3> to stop at the beginning of the fast function;
2. Resume the "tracee" from the "tracer";
3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
4. Repeat steps 1-3 for <lj_fff_res1>;
5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with the <tostring> fast function as the topmost frame.

Resolves tarantool/tarantool#9387
Follows up tarantool/tarantool#8594
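The breakpoint bookkeeping in the steps above can be illustrated with a toy simulation. This is only a sketch and not the test from the patch: it makes no real ptrace calls, `FakeTracee` and the helper names are hypothetical, every byte is treated as a one-byte instruction, and the offsets 4 and 9 merely stand in for <lj_ff_tostring> and <lj_fff_res1>.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode (int 3)
NOP = 0x90   # x86 one-byte no-op, used here as filler "code"


class FakeTracee:
    """Toy stand-in for a traced process: flat code memory plus RIP."""

    def __init__(self, code, rip=0):
        self.mem = bytearray(code)
        self.rip = rip
        self.pending_signal = None

    def run_until_trap(self):
        """Execute one-byte 'instructions' until an int 3 is hit.

        Assumes a breakpoint exists ahead of RIP; otherwise this
        raises IndexError when it runs off the end of memory.
        """
        while self.mem[self.rip] != INT3:
            self.rip += 1
        # The trap is reported with RIP already past the int 3 byte.
        self.rip += 1


def set_breakpoint(tracee, addr):
    """Step 1: save the original byte and poison it with int 3."""
    saved = tracee.mem[addr]
    tracee.mem[addr] = INT3
    return saved


def hit_and_restore(tracee, addr, saved):
    """Steps 2-3: resume, hit the trap, restore the byte, rewind RIP."""
    tracee.run_until_trap()
    assert tracee.rip == addr + 1, "stopped at an unexpected address"
    tracee.mem[addr] = saved  # put the original instruction back
    tracee.rip = addr         # rewind so the instruction is replayed


def stop_at_and_signal(tracee, addrs, signal_name):
    """Steps 1-5: stop at each address in turn, then queue the signal."""
    for addr in addrs:  # step 4: the same hack for every address
        saved = set_breakpoint(tracee, addr)
        hit_and_restore(tracee, addr, saved)
    # Step 5: the signal is delivered when the tracee is resumed.
    tracee.pending_signal = signal_name
    return tracee.rip


tracee = FakeTracee(bytes([NOP]) * 16)
stopped_at = stop_at_and_signal(tracee, [4, 9], "SIGPROF")
print(stopped_at, tracee.pending_signal)
```

In a real tracer the memory patching and the register rewind would go through ptrace requests on the live process; the simulation only shows why the original byte has to be saved and why RIP must be moved back by one before resuming.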
igormunkin
added a commit
to tarantool/luajit
that referenced
this issue
Dec 6, 2023
Tests for the sampling profiler often require long-running loops so that a particular situation is likely to occur. Thus, the test added in commit 285a1b0 ("sysprof: fix crash during FFUNC stream") expects the FFUNC VM state (and even a particular instruction being executed) at the moment the stack trace is collected. Unfortunately, this makes the test routine hang in several environments, and even when it does not, we cannot guarantee that the desired scenario is tested (we can only rely on statistics). As a result, the test for the aforementioned patch was disabled for Tarantool CI in commit fef60a1 ("test: prevent hanging Tarantool CI by sysprof test") until the issue is resolved.

This patch introduces a new approach to testing our sampling profiler via <ptrace>, implementing the precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running <tostring> a gazillion times, we step precisely to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below.

As mentioned, the test makes sysprof collect a particular event type (FFUNC) at a certain instruction in the Lua VM (<lj_fff_res1>) to reproduce the issue from tarantool/tarantool#8594. Hence, it is enough to call the <tostring> fast function in the profiled instance (i.e., the "tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of the <tostring> builtin, the manager (i.e., the "tracer") is implemented.

Here are the main steps (see comments and `man 2 ptrace' for more info):
1. Poison the first instruction at <lj_ff_tostring> with <int 3> to stop at the beginning of the fast function;
2. Resume the "tracee" from the "tracer";
3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
4. Repeat steps 1-3 for <lj_fff_res1>;
5. Emit SIGPROF while resuming the "tracee".

As a result, sysprof collects the full backtrace with the <tostring> fast function as the topmost frame.

Resolves tarantool/tarantool#9387
Follows up tarantool/tarantool#8594

Reviewed-by: Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
(cherry picked from commit caa9986)
Tarantool Enterprise 2.10.6-0-g5d09e81a6-r553-nogc64-debug
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/__w/sdk/sdk/build.sdk/tarantool-2.10/static-build/tarantool-prefix -DENABLE_BACKTRACE=TRUE
Compiler: GNU-9.3.1
C_FLAGS: -static-libstdc++ -fexceptions -funwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/__w/sdk/sdk/tarantool-2.10=. -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -static-libstdc++ -fexceptions -funwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/__w/sdk/sdk/tarantool-2.10=. -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror
Backtrace:
Initially, it failed with the stack trace above. When I tried to run Tarantool under gdb, I got an assertion failure.
Here is also the stack trace from a release build with debug info: