ER_CURSOR_NO_TRANSACTION in on_commit trigger for syncro space #8505

Closed
R-omk opened this issue Mar 27, 2023 · 3 comments · Fixed by #8736
Labels: 2.10 (Target is 2.10 and all newer release/master branches), bug (Something isn't working)


R-omk commented Mar 27, 2023

Bug description

```
main/103/app.lua init.c:376 E> ER_CURSOR_NO_TRANSACTION: The transaction the cursor belongs to has ended
main/103/app.lua F> transaction commit trigger failed
```

Tarantool 2.11.0-rc2-0-g2ae0c94a2 Linux-x86_64-RelWithDebInfo

Steps to reproduce

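```lua
box.cfg{}
-- A synchronous space: its transactions wait in the limbo until confirmed.
s1 = box.schema.space.create('s1', { if_not_exists = true, is_sync = true})
s1:create_index('primary', {if_not_exists = true, unique = true, parts = {{field = 1, type = 'unsigned'}}})
-- Claim the synchronous transaction queue so this instance can commit sync transactions.
box.ctl.promote()

-- Calling the iterator passed to the on_commit trigger runs lbox_txn_pairs,
-- which raises ER_CURSOR_NO_TRANSACTION if the fiber has no transaction attached.
f_commit = function(row_pairs) row_pairs() end
-- Register the on_commit trigger from inside the transaction.
on_replace_f = function(old, new) box.on_commit(f_commit) end

s1:on_replace(on_replace_f)
s1:insert({1})
```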

R-omk added the bug (Something isn't working) label Mar 27, 2023
Gerold103 (Collaborator) commented

Apparently this also affects transactions on async spaces. If the limbo is not empty, all transactions are queued there and will get the same error on commit.
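A toy model can make this queueing rule concrete. The sketch below is a hypothetical simplification in C (the `model_*` names are not Tarantool's `txn_limbo` API); it only encodes the rule stated above: a synchronous transaction always waits in the limbo, and an asynchronous one waits too whenever the queue is not empty, so both end up being completed on the same confirmation path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified limbo queue; not Tarantool's API. */
enum { MODEL_LIMBO_CAP = 16 };

struct model_limbo {
	int ids[MODEL_LIMBO_CAP];	/* queued transaction ids, oldest first */
	size_t len;
};

/*
 * Returns true if the transaction has to wait in the limbo. A synchronous
 * transaction always waits; an asynchronous one waits only when the queue
 * is not empty, so that it does not overtake unconfirmed entries.
 */
static bool
model_limbo_submit(struct model_limbo *limbo, int txn_id, bool is_sync)
{
	if (!is_sync && limbo->len == 0)
		return false;	/* completes without going through the limbo */
	assert(limbo->len < MODEL_LIMBO_CAP);
	limbo->ids[limbo->len++] = txn_id;
	return true;
}

/*
 * Confirms every queued entry, sync or async alike, and returns how many
 * were completed. All of them are completed on the confirmation path.
 */
static size_t
model_limbo_confirm_all(struct model_limbo *limbo)
{
	size_t completed = limbo->len;
	limbo->len = 0;
	return completed;
}
```

For example, submitting an async, a sync and another async transaction to an empty queue leaves two entries waiting, and one confirmation completes both of them.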

R-omk (Author) commented Mar 27, 2023

This problem was first found while moving a bucket with vshard: this fatal error can occur randomly on the receiver side.

Serpentian (Contributor) commented Jun 1, 2023

The error comes from the function:

```c
static int
lbox_txn_pairs(struct lua_State *L)
{
	int64_t txn_id = luaL_toint64(L, lua_upvalueindex(1));
	struct txn *txn = in_txn();
	if (txn == NULL || txn->id != txn_id) {
		diag_set(ClientError, ER_CURSOR_NO_TRANSACTION);
		return luaT_error(L);
	}
	luaL_pushint64(L, txn_id);
	lua_pushlightuserdata(L, stailq_first_entry(&txn->stmts,
						    struct txn_stmt, next));
	lua_pushcclosure(L, lbox_txn_iterator_next, 2);
	lua_pushnil(L);
	lua_pushinteger(L, 0);
	return 3;
}
```
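The failing condition can be isolated in a small C model of the guard above. It assumes a single fiber whose current transaction is a plain global pointer; `model_current_txn`, `model_txn_pairs` and the error enum are hypothetical stand-ins for `in_txn()`, `lbox_txn_pairs()` and `ER_CURSOR_NO_TRANSACTION`, and all Lua stack handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct txn: only the id matters here. */
struct model_txn {
	int64_t id;
};

/* Hypothetical stand-in for what in_txn() returns in the current fiber. */
static struct model_txn *model_current_txn;

enum model_err { MODEL_OK = 0, MODEL_ER_CURSOR_NO_TRANSACTION = 1 };

/*
 * Mirrors the check at the top of lbox_txn_pairs(): the iterator factory
 * remembers the id of the transaction it was created for and refuses to
 * run unless that same transaction is still attached to the fiber.
 */
static enum model_err
model_txn_pairs(int64_t captured_txn_id)
{
	struct model_txn *txn = model_current_txn;
	if (txn == NULL || txn->id != captured_txn_id)
		return MODEL_ER_CURSOR_NO_TRANSACTION;
	return MODEL_OK;
}
```

Both a detached fiber (`NULL`) and a fiber running a different transaction are rejected, so the iterator works only while its own transaction is attached.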

`txn` is NULL, which means that no transaction is assigned to the current fiber. The error has the following backtrace:

```
#0  lbox_txn_pairs (L=0x5555559328e2 <lj_alloc_f(void*, void*, size_t, size_t)+81>) at /home/serpentian/Programming/tnt/tarantool/src/box/lua/init.c:451
#1  0x0000555555ad3387 in lj_BC_FUNCC () at buildvm_x86.dasc:811
#2  0x00005555558d9d34 in lua_pcall (L=0x40000378, nargs=1, nresults=-1, errfunc=0) at /home/serpentian/Programming/tnt/tarantool/third_party/luajit/src/lj_api.c:1163
#3  0x000055555585106a in luaT_call (L=0x40000378, nargs=1, nreturns=-1) at /home/serpentian/Programming/tnt/tarantool/src/lua/utils.c:618
#4  0x000055555584b3a2 in lbox_trigger_run (ptr=0x555556112320, event=0x7ffff440e038) at /home/serpentian/Programming/tnt/tarantool/src/lua/trigger.c:111
#5  0x00005555558a2b3e in trigger_run_list (list=0x7ffff45ffa40, event=0x7ffff440e038) at /home/serpentian/Programming/tnt/tarantool/src/lib/core/trigger.cc:119
#6  0x00005555558a2ce7 in trigger_run_reverse (list=0x7ffff440e198, event=0x7ffff440e038) at /home/serpentian/Programming/tnt/tarantool/src/lib/core/trigger.cc:166
#7  0x0000555555701f40 in txn_complete_success (txn=0x7ffff440e038) at /home/serpentian/Programming/tnt/tarantool/src/box/txn.c:795
#8  0x0000555555707afe in txn_limbo_read_confirm (limbo=0x555555e73a20 <txn_limbo>, lsn=4) at /home/serpentian/Programming/tnt/tarantool/src/box/txn_limbo.c:464
#9  0x0000555555708377 in txn_limbo_ack (limbo=0x555555e73a20 <txn_limbo>, replica_id=1, lsn=4) at /home/serpentian/Programming/tnt/tarantool/src/box/txn_limbo.c:657
#10 0x0000555555703185 in txn_commit (txn=0x7ffff440e038) at /home/serpentian/Programming/tnt/tarantool/src/box/txn.c:1140
#11 0x00005555557101a3 in box_process_rw (request=0x7ffff45ffc10, space=0x555555f543f0, result=0x7ffff45ffd08)
    at /home/serpentian/Programming/tnt/tarantool/src/box/box.cc:475
#12 0x000055555571bf37 in box_process1 (request=0x7ffff45ffc10, result=0x7ffff45ffd08) at /home/serpentian/Programming/tnt/tarantool/src/box/box.cc:3378
#13 0x000055555571cad8 in box_insert (space_id=512, tuple=0x7ffff4403240 "\221\001", 'P' <repeats 198 times>..., tuple_end=0x7ffff4403242 'P' <repeats 200 times>...,
    result=0x7ffff45ffd08) at /home/serpentian/Programming/tnt/tarantool/src/box/box.cc:3570
#14 0x0000555555818864 in lbox_insert (L=0x40000378) at /home/serpentian/Programming/tnt/tarantool/src/box/lua/index.c:61
```

`txn_limbo_ack` tries to confirm a request that has triggers. These triggers need the transaction to be set in the current fiber, but the transaction has already been cleared earlier in `txn_commit`:

tarantool/src/box/txn.c, lines 1124 to 1145 in 606e50c

```c
fiber_set_txn(fiber(), NULL);
if (journal_write(req) != 0)
	goto rollback_io;
if (req->res < 0) {
	diag_set_journal_res(req->res);
	goto rollback_io;
}
if (txn_has_flag(txn, TXN_WAIT_SYNC)) {
	struct txn_limbo_entry *limbo_entry = txn->limbo_entry;
	assert(limbo_entry->lsn > 0);
	/*
	 * XXX: ACK should be done on WAL write too. But it can make
	 * another WAL write. Can't be done until it works
	 * asynchronously.
	 */
	if (txn_has_flag(txn, TXN_WAIT_ACK)) {
		txn_limbo_ack(&txn_limbo, txn_limbo.owner_id,
			      limbo_entry->lsn);
	}
	if (txn_limbo_wait_complete(&txn_limbo, limbo_entry) < 0)
		goto rollback;
}
```


on_rollback/on_commit don't work with synchronous spaces at all!
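The ordering problem can be sketched as follows, assuming only the synchronous path matters and the journal write always succeeds. `model_txn_commit` is a hypothetical reduction of the quoted `txn_commit` fragment, and `model_on_commit` plays the role of the Lua trigger that calls `row_pairs()`; none of these names exist in Tarantool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct txn. */
struct model_txn {
	long id;
};

static struct model_txn *model_fiber_txn;	/* stands in for in_txn() */
static bool model_trigger_saw_txn;		/* result of the last trigger */

/* An on_commit trigger that needs its transaction attached to the fiber. */
static void
model_on_commit(struct model_txn *txn)
{
	model_trigger_saw_txn = model_fiber_txn != NULL &&
				model_fiber_txn->id == txn->id;
}

/* Completion from the limbo before the fix: triggers run as-is. */
static void
model_complete_success(struct model_txn *txn)
{
	model_on_commit(txn);
}

/*
 * Order of operations for a synchronous transaction: the transaction is
 * detached from the fiber first, and only afterwards is it acknowledged
 * and confirmed, which completes it and runs its triggers.
 */
static void
model_txn_commit(struct model_txn *txn)
{
	model_fiber_txn = NULL;		/* fiber_set_txn(fiber(), NULL) */
	/* ... journal_write() would happen here ... */
	model_complete_success(txn);	/* reached via txn_limbo_ack() */
}
```

Because the detach happens before the entry is completed, the trigger observes no transaction, which matches the first backtrace.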

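Another backtrace, where the confirmation comes from `tx_status_update` in a `fiber_pool` fiber rather than from `txn_commit`: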

```
2023-06-03 15:34:11.791 [16576] main/119/main I> #1  0x55b106699e2f in backtrace_collect+153
#2  0x55b10661f7df in lbox_txn_pairs+355
#3  0x55b1068eb547 in lj_BC_FUNCC+70
#4  0x55b1066f1f04 in lua_pcall+890
#5  0x55b10666923b in luaT_call+41
#6  0x55b106663573 in lbox_trigger_run+359
#7  0x55b1066bad0e in trigger_run_list(rlist*, void*)+58
#8  0x55b1066baeb7 in trigger_run_reverse+173
#9  0x55b106519f40 in txn_complete_success+300
#10 0x55b10651fbba in txn_limbo_read_confirm+531
#11 0x55b106520433 in txn_limbo_ack+559
#12 0x55b106563e58 in tx_status_update(cmsg*)+296
#13 0x55b10669d395 in cmsg_deliver+48
#14 0x55b10669e4a5 in fiber_pool_f+521
#15 0x55b1064072e7 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+30
#16 0x55b1066957d9 in fiber_loop+181
#17 0x55b1068ac7a7 in coro_init+76

2023-06-03 15:34:11.791 [16576] main/119/main init.c:457 E> ER_CURSOR_NO_TRANSACTION: The transaction the cursor belongs to has ended
```

Serpentian self-assigned this Jun 1, 2023
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 2, 2023
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 4, 2023
Currently any transaction on a synchronous space fails to complete with
the ER_CURSOR_NO_TRANSACTION error when on_rollback/on_commit triggers
are set. This happens because some rollback/commit triggers require the
in_txn fiber variable to be set, but it is not set when a transaction
is completed from the limbo.

Let's assign the transaction to the fiber when we complete it from the
limbo. Moreover, let's add assertions which check whether in_txn() is
set when on_rollback/on_commit triggers are run.

Closes tarantool#8505

NO_DOC=bugfix
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 4, 2023
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 6, 2023
Currently some transactions on a synchronous space fail to complete with
the `ER_CURSOR_NO_TRANSACTION` error when on_rollback/on_commit triggers
are set.

This happens because some rollback/commit triggers require the in_txn
fiber variable to be set, but it is not set when a transaction is
completed from the limbo. The callbacks used to work with iterators
(`lbox_txn_pairs` and `lbox_txn_iterator_next`) acquire txn statements
from the current transaction, but they cannot do that when this
transaction is not assigned to the current fiber, so
`ER_CURSOR_NO_TRANSACTION` is thrown.

Let's assign the in_txn variable when we complete a transaction from the
limbo. Moreover, let's add assertions which check that in_txn() is
correct, to make sure that `txn_complete_success/fail` always run with
in_txn set.

Closes tarantool#8505

NO_DOC=bugfix
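The fix described in this commit message can be sketched as below. It is a hypothetical model, not the actual patch: the `model_*` names are illustrative, and saving and restoring the fiber's previous transaction around the confirmation loop is an assumption of the sketch. Each confirmed entry is attached to the current fiber before it is completed, and an assertion checks this, in the spirit of the assertions the commit adds to `txn_complete_success/fail`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct txn. */
struct model_txn {
	long id;
	int triggers_ok;	/* set by the trigger if it saw its own txn */
};

static struct model_txn *model_fiber_txn;	/* stands in for in_txn() */

/* A trigger that, like row_pairs(), needs its transaction attached. */
static void
model_on_commit(struct model_txn *txn)
{
	txn->triggers_ok = model_fiber_txn == txn;
}

static void
model_complete_success(struct model_txn *txn)
{
	/* The kind of check the fix adds: triggers never run detached. */
	assert(model_fiber_txn == txn);
	model_on_commit(txn);
}

/*
 * Confirms queued entries in order, e.g. when one ACK covers several
 * LSNs. The caller may be a fiber that has no transaction of its own
 * (as in the tx_status_update backtrace), so its previous value is
 * restored afterwards.
 */
static void
model_limbo_confirm(struct model_txn **entries, size_t count)
{
	struct model_txn *saved = model_fiber_txn;
	for (size_t i = 0; i < count; i++) {
		model_fiber_txn = entries[i];
		model_complete_success(entries[i]);
	}
	model_fiber_txn = saved;
}
```

Restoring the saved value keeps the confirming fiber's own state unchanged, which matters when completion runs inside an unrelated fiber.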
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 6, 2023
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 13, 2023
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 16, 2023
Serpentian added a commit to Serpentian/tarantool that referenced this issue Jun 16, 2023
sergepetrenko added the 2.10 (Target is 2.10 and all newer release/master branches) label Jun 19, 2023
sergepetrenko pushed a commit that referenced this issue Jun 19, 2023
sergepetrenko pushed a commit that referenced this issue Jun 19, 2023
(cherry picked from commit 6fadc8a)
sergepetrenko pushed a commit that referenced this issue Jun 19, 2023
(cherry picked from commit 6fadc8a)