FreeBSD: core dump static build #19

Closed
zloidemon opened this issue May 18, 2013 · 2 comments
@zloidemon
Contributor

The static build doesn't work on FreeBSD:

[test] 11:02 %> gdb tarantool_box
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) r -c tarantool.cfg.sample
Starting program: /usr/local/bin/tarantool_box -c tarantool.cfg.sample
[New LWP 101079]
[New LWP 101079]
/usr/local/bin/tarantool_box: updating a stale pid file
2013-05-18 11:02:39.252 [14549] 1/sched I> space 0 successfully configured
2013-05-18 11:02:39.252 [14549] 1/sched I> recovery start
2013-05-18 11:02:39.252 [14549] 1/sched I> recover from `./00000000000000000001.snap'
2013-05-18 11:02:39.252 [14549] 1/sched I> snapshot recovered, confirmed lsn: 1
2013-05-18 11:02:39.252 [14549] 1/sched I> done `./00000000000000000002.xlog' confirmed_lsn: 5
2013-05-18 11:02:39.252 [14549] 1/sched I> WALs recovered, confirmed lsn: 5
2013-05-18 11:02:39.252 [14549] 1/sched I> building secondary indexes
2013-05-18 11:02:39.252 [14549] 1/sched I> bound to port 33013
2013-05-18 11:02:39.253 [14549] 1/sched I> I am primary
2013-05-18 11:02:39.253 [14549] 1/sched I> bound to port 33014
2013-05-18 11:02:39.253 [14549] 1/sched I> bound to port 33015
[New Thread 800c29400 (LWP 101079/tarantool_box)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 800c29400 (LWP 101079/tarantool_box)]
0x00000000005dedcc in thr_kill ()
(gdb) thread apply all bt
[New Thread 800c29c00 (LWP 113108/tarantool_box)]

Thread 3 (Thread 800c29c00 (LWP 113108/tarantool_box)):
#0  0x000000000052634c in _umtx_op_err ()
#1  0x00000000005206e5 in _thr_umtx_timedwait_uint ()
#2  0x0000000000528edd in cond_wait_common ()
#3  0x000000000044fab7 in wal_writer_pop (writer=0xa06320, input=0x7fffffbfdf90) at fio.h:170
#4  0x00000000004500e2 in wal_writer_thread (worker_args=0x8009e5178) at fio.h:170
#5  0x00000000005281f4 in thread_start ()
#6  0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffffbfe000: Bad address.

Thread 2 (Thread 800c29400 (LWP 101079/tarantool_box)):
#0  0x00000000005dedcc in thr_kill ()
#1  0x000000000062f1b8 in abort ()
#2  0x00000000005d88d5 in uw_init_context_1 (context=0x7fffffffc810, outer_cfa=0x7fffffffca40, outer_ra=0x46ea95) at unwind-pe.h:155
#3  0x00000000005d8d27 in _Unwind_RaiseException (exc=0x800c07f80) at unwind-pe.h:155
#4  0x000000000046ea95 in err_raise_ext ()
#5  0x000000000046eaf2 in lj_err_throw ()
#6  0x000000000046f426 in lj_err_lex ()
#7  0x0000000000485aee in lj_lex_error ()
#8  0x0000000000485c39 in err_token ()
#9  0x000000000048d5fe in lj_parse ()
#10 0x0000000000482a35 in cpparser ()
#11 0x00000000004ab06e in lj_vm_cpcall ()
#12 0x0000000000482bb9 in lua_load ()
#13 0x00000000004a3547 in luaL_loadbuffer ()
#14 0x00000000004a359a in luaL_loadstring ()
#15 0x000000000042dda8 in tarantool_lua_dostring (L=0x109e5378, str=0x668118 "os.execute = nil\nos.exit = nil\nos.rename = nil\nos.tmpname = nil\nos.remove = nil\nrequire = nil\n")
    at _ctype.h:125
#16 0x000000000042e495 in tarantool_lua_sandbox (L=0x109e5378) at _ctype.h:125
#17 0x000000000042e52d in tarantool_lua_load_init_script (L=0x109e5378) at _ctype.h:125
#18 0x000000000042ba39 in main (argc=1, argv=0x7fffffffd610) at rlist.h:142

tarantool.cfg.sample:

slab_alloc_arena = 0.1
pid_file = "box.pid"
primary_port = 33013
secondary_port = 33014
admin_port = 33015
rows_per_wal = 50000
space[0].enabled = 1
space[0].index[0].type = "HASH"
space[0].index[0].unique = 1
space[0].index[0].key_field[0].fieldno = 0
space[0].index[0].key_field[0].type = "NUM"
work_dir = "var"
@kostja
Contributor

kostja commented Sep 24, 2013

ghost pushed a commit to truenas/ports that referenced this issue Oct 30, 2013
- Added plugins support
- Removed static build, it doesn't work; more details:
tarantool/tarantool#19
- Added patch from devel/libev

Approved by:	eadler (mentor)
@kostja
Contributor

kostja commented Jan 28, 2014

It's been a while and we haven't had a chance either to reproduce it or to fix it. Closing as not important. Please feel free to reopen if you're using Tarantool on FreeBSD and need a static build.

@kostja kostja closed this as completed Jan 28, 2014
splbio pushed a commit to splbio/freebsd-ports that referenced this issue Nov 24, 2014
zloidemon added a commit that referenced this issue Mar 24, 2015
locker added a commit that referenced this issue Jan 17, 2018
say_logger_init() zeroes the default logger object (log_default) before
proceeding to logging subsystem configuration. If configuration fails
for some reason (e.g. error opening the log file), the default logger
will be left uninitialized, and we will crash trying to print the error
to the console:

  #0  0x564065001af5 in print_backtrace+9
  #1  0x564064f0b17f in _ZL12sig_fatal_cbi+e2
  #2  0x7ff94519f0c0 in __restore_rt+0
  #3  (nil) in +0
  #4  0x564064ffc399 in say_default+d2
  #5  0x564065011c37 in _ZNK11SystemError3logEv+6d
  #6  0x5640650117be in exception_log+3d
  #7  0x564064ff9750 in error_log+1d
  #8  0x564064ff9847 in diag_log+50
  #9  0x564064ffab9b in say_logger_init+22a
  #10 0x564064f0bffb in load_cfg+69a
  #11 0x564064fd2f49 in _ZL13lbox_cfg_loadP9lua_State+12
  #12 0x56406502258b in lj_BC_FUNCC+34
  #13 0x564065045103 in lua_pcall+18e
  #14 0x564064fed733 in luaT_call+29
  #15 0x564064fe5536 in lua_main+b9
  #16 0x564064fe5d74 in run_script_f+7b5
  #17 0x564064f0aef4 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e
  #18 0x564064fff4e5 in fiber_loop+82
  #19 0x5640651a123b in coro_init+4c
  #20 (nil) in +4c

Fix this by making say_logger_init() initialize the default logger
object first and only assign it to log_default on success.

See #3048
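
A minimal sketch of the approach described in this commit message, with placeholder types and names (the `struct log` layout and the `log_create()`/`say_logger_init()` signatures here are illustrative, not the exact Tarantool API): configure a new logger into a local object and publish it as `log_default` only after configuration has succeeded.

```c
#include <stdio.h>

/* Placeholder logger type; the real struct log has more fields. */
struct log {
	FILE *file;     /* NULL means "write to stderr" */
	int level;
};

static struct log log_boot = { NULL, 5 };   /* always-usable fallback logger */
static struct log log_file;                 /* storage for the configured logger */
struct log *log_default = &log_boot;        /* what console error reporting prints through */

/* Open the requested log file; placeholder for the real configuration step. */
static int
log_create(struct log *log, const char *path, int level)
{
	FILE *f = fopen(path, "a");
	if (f == NULL)
		return -1;              /* e.g. error opening the log file */
	log->file = f;
	log->level = level;
	return 0;
}

int
say_logger_init(const char *path, int level)
{
	struct log new_log;
	/*
	 * Configure into a local object first. If this fails,
	 * log_default still points at a valid logger, so printing
	 * the error to the console cannot crash.
	 */
	if (log_create(&new_log, path, level) != 0)
		return -1;
	log_file = new_log;
	log_default = &log_file;        /* publish only on success */
	return 0;
}
```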
llelik8 pushed a commit that referenced this issue Feb 21, 2018
locker added a commit that referenced this issue Apr 14, 2019
To propagate changes applied to a space while a new index is being
built, we install an on_replace trigger. In case the on_replace
trigger callback fails, we abort the DDL operation.

The problem is the trigger may yield, e.g. to check the unique
constraint of the new index. This opens a time window for the DDL
operation to complete and clear the trigger. If this happens, the
trigger will try to access the outdated build context and crash:

 | #0  0x558f29cdfbc7 in print_backtrace+9
 | #1  0x558f29bd37db in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7
 | #2  0x7fe24e4ab0e0 in __restore_rt+0
 | #3  0x558f29bfe036 in error_unref+1a
 | #4  0x558f29bfe0d1 in diag_clear+27
 | #5  0x558f29bfe133 in diag_move+1c
 | #6  0x558f29c0a4e2 in vy_build_on_replace+236
 | #7  0x558f29cf3554 in trigger_run+7a
 | #8  0x558f29c7b494 in txn_commit_stmt+125
 | #9  0x558f29c7e22c in box_process_rw+ec
 | #10 0x558f29c81743 in box_process1+8b
 | #11 0x558f29c81d5c in box_upsert+c4
 | #12 0x558f29caf110 in lbox_upsert+131
 | #13 0x558f29cfed97 in lj_BC_FUNCC+34
 | #14 0x558f29d104a4 in lua_pcall+34
 | #15 0x558f29cc7b09 in luaT_call+29
 | #16 0x558f29cc1de5 in lua_fiber_run_f+74
 | #17 0x558f29bd30d8 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e
 | #18 0x558f29cdca33 in fiber_loop+41
 | #19 0x558f29e4e8cd in coro_init+4c

To fix this issue, let's recall that when a DDL operation completes,
all pending transactions that affect the altered space are aborted by
the space_invalidate callback. So to avoid the crash, we just need to
bail out early from the on_replace trigger callback if we detect that
the current transaction has been aborted.

Closes #4152
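
As a rough illustration of the bail-out described above (the types and the trigger signature here are simplified placeholders, not the exact vinyl code): if the transaction was aborted while the trigger yielded, return early instead of touching the stale build context.

```c
#include <stddef.h>

enum txn_state { TXN_INPROGRESS, TXN_ABORTED };

struct txn {
	enum txn_state state;
};

struct index_build_ctx;          /* opaque here; freed when the DDL operation completes */

static int
build_on_replace(struct txn *txn, struct index_build_ctx *ctx)
{
	if (txn->state == TXN_ABORTED) {
		/*
		 * The DDL operation completed while we yielded (e.g. on a
		 * uniqueness check) and invalidated this transaction; the
		 * build context may already be gone, so do nothing.
		 */
		return 0;
	}
	/* ... otherwise propagate the statement to the new index ... */
	(void)ctx;
	return 0;
}
```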
locker added a commit that referenced this issue Apr 16, 2019
locker added a commit that referenced this issue Apr 16, 2019
locker added a commit that referenced this issue Apr 16, 2019
drakonhg pushed a commit that referenced this issue Sep 2, 2021
drakonhg pushed a commit that referenced this issue Sep 2, 2021
drakonhg pushed a commit that referenced this issue Sep 2, 2021
svmhdvn pushed a commit to svmhdvn/freebsd-ports that referenced this issue Jan 10, 2024
locker added a commit to locker/tarantool that referenced this issue Jun 10, 2024
`key_part::offset_slot_cache` and `key_part::format_epoch` are used for
speeding up tuple field lookup in `tuple_field_raw_by_part()`. These
structure members are accessed and updated without any locks, assuming
this code is executed exclusively in the tx thread. However, this isn't
necessarily true because we also perform tuple field lookups in vinyl
read threads. Apparently, this can result in unexpected races and bugs,
for example:

```
  #1  0x590be9f7eb6d in crash_collect+256
  #2  0x590be9f7f5a9 in crash_signal_cb+100
  #3  0x72b111642520 in __sigaction+80
  #4  0x590bea385e3c in load_u32+35
  #5  0x590bea231eba in field_map_get_offset+46
  #6  0x590bea23242a in tuple_field_raw_by_path+417
  #7  0x590bea23282b in tuple_field_raw_by_part+203
  #8  0x590bea23288c in tuple_field_by_part+91
  #9  0x590bea24cd2d in unsigned long tuple_hint<(field_type)5, false, false>(tuple*, key_def*)+103
  #10 0x590be9d4fba3 in tuple_hint+40
  #11 0x590be9d50acf in vy_stmt_hint+178
  #12 0x590be9d53531 in vy_page_stmt+168
  #13 0x590be9d535ea in vy_page_find_key+142
  #14 0x590be9d545e6 in vy_page_read_cb+210
  #15 0x590be9f94ef0 in cbus_call_perform+44
  #16 0x590be9f94eae in cmsg_deliver+52
  #17 0x590be9f9583e in cbus_process+100
  #18 0x590be9f958a5 in cbus_loop+28
  #19 0x590be9d512da in vy_run_reader_f+381
  #20 0x590be9cb4147 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+34
  #21 0x590be9f8b697 in fiber_loop+219
  #22 0x590bea374bb6 in coro_init+120
```

Fix this by skipping this optimization for threads other than tx.

No test is added because reproducing this race is tricky. Ideally, bugs
like this one should be caught by fuzzing tests or thread sanitizers.

Closes #10123

NO_DOC=bug fix
NO_TEST=tested manually with fuzzer
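
A sketch of the fix described above, with placeholder names: `in_tx_thread()` stands in for the real tx-thread check and `lookup_offset_slot()` for the uncached lookup; only the cache field names follow the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct tuple_format { uint64_t epoch; };

struct key_part {
	int32_t offset_slot_cache;
	uint64_t format_epoch;
};

/* Placeholder: are we running in the tx thread? */
extern bool in_tx_thread(void);
/* Placeholder: uncached slot lookup in the format. */
extern int32_t lookup_offset_slot(const struct tuple_format *format,
				  const struct key_part *part);

static int32_t
key_part_offset_slot(struct key_part *part, const struct tuple_format *format)
{
	bool in_tx = in_tx_thread();
	/* The unsynchronized fast path is only taken in the tx thread. */
	if (in_tx && part->format_epoch == format->epoch)
		return part->offset_slot_cache;
	int32_t slot = lookup_offset_slot(format, part);
	if (in_tx) {
		/* Only the tx thread may refresh the cache fields,
		 * so vinyl read threads never race with this update. */
		part->offset_slot_cache = slot;
		part->format_epoch = format->epoch;
	}
	return slot;
}
```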
locker added a commit to locker/tarantool that referenced this issue Jun 11, 2024
locker added a commit that referenced this issue Jun 13, 2024
locker added a commit that referenced this issue Jun 13, 2024
locker added a commit that referenced this issue Jun 13, 2024