Cannot recover from an anonymous replica after writing to a local space due to an on_replace trigger for a global space. #8746
yanshtunder
changed the title
Cannot recover from an anonymous replica due to writing to a local space in an on_replace trigger for a global space.
Cannot recover from an anonymous replica after writing to a local space due to an on_replace trigger for a global space.
Jun 7, 2023
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Jul 27, 2023
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Jul 31, 2023
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Aug 23, 2023
sergepetrenko
added
replication
2.10
Target is 2.10 and all newer release/master branches
labels
Aug 24, 2023
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Aug 24, 2023
Transaction boundaries were not updated correctly for transactions in which local space writes were made from a replication trigger. Fix transaction boundary calculations for such cases.

Closes tarantool#8746

NO_DOC=bugfix
NO_CHANGELOG=covered by the next commit
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Aug 24, 2023
Force recovery first tries to collect all rows of a transaction into a single list, and only then applies those rows. The problem was that it collected rows based on the row replica_id. For local rows replica_id is set to 0, but such rows can actually be part of a transaction coming from any instance. Fix recovery of such rows.

Follow-up tarantool#8746
Follow-up tarantool#7932

NO_DOC=bugfix
NO_CHANGELOG=the broken behaviour couldn't be seen due to bug tarantool#8746
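The grouping problem described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions, not the code from the commit: the `Row` class, its field names (`replica_id`, `tsn`, `is_commit`) and both helper functions are hypothetical, and grouping by a shared transaction number is just one plausible way to avoid keying on `replica_id`.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical, simplified model of a WAL row (not Tarantool's real struct).
    replica_id: int   # 0 marks a row written to a local space
    tsn: int          # assumed: number shared by all rows of one transaction
    is_commit: bool   # assumed: set on the last row of a transaction
    body: str


def collect_by_replica_id(rows):
    """Group rows by replica_id, as in the behaviour the commit describes as broken."""
    pending, done = {}, []
    for row in rows:
        pending.setdefault(row.replica_id, []).append(row)
        if row.is_commit:
            done.append(pending.pop(row.replica_id))
    return done, pending


def collect_by_tsn(rows):
    """Group rows by transaction number, so a local row stays with its transaction."""
    pending, done = {}, []
    for row in rows:
        pending.setdefault(row.tsn, []).append(row)
        if row.is_commit:
            done.append(pending.pop(row.tsn))
    return done, pending


# A transaction from instance 1 that ends with a local row added by a trigger.
wal = [
    Row(replica_id=1, tsn=10, is_commit=False, body="replace into global space"),
    Row(replica_id=0, tsn=10, is_commit=True, body="replace into local space"),
]

broken_done, broken_pending = collect_by_replica_id(wal)
fixed_done, fixed_pending = collect_by_tsn(wal)
```

In this model, keying on `replica_id` splits the transaction: the local row is emitted on its own while the global row stays pending forever. Keying on the shared transaction number keeps both rows in one list.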
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Aug 31, 2023
Transaction boundaries were not updated correctly for transactions in which local space writes were made from a replication trigger. Existing transaction boundaries and row flags from the master were written as is on the replica.

Actually, the replica should recalculate transaction boundaries and even WAIT_SYNC/WAIT_ACK flags. Transaction boundaries should be recalculated when a replica appends a local write at the end of the master's transaction, and WAIT_SYNC/WAIT_ACK should be overwritten when nopifying synchronous transactions coming from an old term.

The latter fix has uncovered a bug in skipping outdated synchronous transactions: if one replica replaces a transaction from an old term with NOPs and then passes that transaction to another replica, the other replica raises a split brain error. It believes the NOPs are an async transaction from an old term. This worked before the fix, because the rows were written with the original WAIT_ACK = true bit. Now this is fixed properly: we allow fully NOP async transactions from the old term.

Closes tarantool#8746

NO_DOC=bugfix
NO_CHANGELOG=covered by the next commit
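The two recalculations named in this commit message can be sketched as follows. This is a minimal model under stated assumptions, not the actual patch: the `Row` class and the functions `recalc_boundaries`, `append_local_write` and `nopify` are hypothetical names, and the flags are modelled as plain booleans.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    # Hypothetical, simplified row: only the flags discussed in the commit message.
    body: str
    is_commit: bool = False   # assumed: marks the last row of a transaction
    wait_sync: bool = False
    wait_ack: bool = False


def recalc_boundaries(rows):
    """Return copies of the rows where only the last one carries is_commit."""
    last = len(rows) - 1
    return [replace(r, is_commit=(i == last)) for i, r in enumerate(rows)]


def append_local_write(master_txn, local_row):
    """Append a trigger's local write and move the commit boundary onto it."""
    return recalc_boundaries(list(master_txn) + [local_row])


def nopify(rows):
    """Turn every row into a NOP and clear the synchronous flags."""
    nops = [replace(r, body="NOP", wait_sync=False, wait_ack=False) for r in rows]
    return recalc_boundaries(nops)


master_txn = [Row("insert 1"), Row("insert 2", is_commit=True)]
on_replica = append_local_write(master_txn, Row("local write from trigger"))
```

Writing the master's flags as is would leave `is_commit` on the second row, so the appended local row would fall outside the transaction. Recomputing the flags after the append keeps the whole list one transaction.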
sergepetrenko
added a commit
that referenced
this issue
Oct 9, 2023
Transaction boundaries were not updated correctly for transactions in which local space writes were made from a replication trigger. Existing transaction boundaries and row flags from the master were written as is on the replica.

Actually, the replica should recalculate transaction boundaries and even WAIT_SYNC/WAIT_ACK flags. Transaction boundaries should be recalculated when a replica appends a local write at the end of the master's transaction, and WAIT_SYNC/WAIT_ACK should be overwritten when nopifying synchronous transactions coming from an old term.

The latter fix has uncovered a bug in skipping outdated synchronous transactions: if one replica replaces a transaction from an old term with NOPs and then passes that transaction to another replica, the other replica raises a split brain error. It believes the NOPs are an async transaction from an old term. This worked before the fix, because the rows were written with the original WAIT_ACK = true bit. Now this is fixed properly: we allow fully NOP async transactions from the old term.

Closes #8746

NO_DOC=bugfix
NO_CHANGELOG=covered by the next commit
sergepetrenko
added a commit
that referenced
this issue
Oct 9, 2023
Force recovery first tries to collect all rows of a transaction into a single list, and only then applies those rows. The problem was that it collected rows based on the row replica_id. For local rows replica_id is set to 0, but such rows can actually be part of a transaction coming from any instance. Fix recovery of such rows.

Follow-up #8746
Follow-up #7932

NO_DOC=bugfix
NO_CHANGELOG=the broken behaviour couldn't be seen due to bug #8746
sergepetrenko
added a commit
that referenced
this issue
Oct 9, 2023
Transaction boundaries were not updated correctly for transactions in which local space writes were made from a replication trigger. Existing transaction boundaries and row flags from the master were written as is on the replica.

Actually, the replica should recalculate transaction boundaries and even WAIT_SYNC/WAIT_ACK flags. Transaction boundaries should be recalculated when a replica appends a local write at the end of the master's transaction, and WAIT_SYNC/WAIT_ACK should be overwritten when nopifying synchronous transactions coming from an old term.

The latter fix has uncovered a bug in skipping outdated synchronous transactions: if one replica replaces a transaction from an old term with NOPs and then passes that transaction to another replica, the other replica raises a split brain error. It believes the NOPs are an async transaction from an old term. This worked before the fix, because the rows were written with the original WAIT_ACK = true bit. Now this is fixed properly: we allow fully NOP async transactions from the old term.

Closes #8746

NO_DOC=bugfix
NO_CHANGELOG=covered by the next commit

(cherry picked from commit 099cb2d)
sergepetrenko
added a commit
that referenced
this issue
Oct 9, 2023
Force recovery first tries to collect all rows of a transaction into a single list, and only then applies those rows. The problem was that it collected rows based on the row replica_id. For local rows replica_id is set to 0, but such rows can actually be part of a transaction coming from any instance. Fix recovery of such rows.

Follow-up #8746
Follow-up #7932

NO_DOC=bugfix
NO_CHANGELOG=the broken behaviour couldn't be seen due to bug #8746

(cherry picked from commit 85df1c9)
If, on an anonymous replica, an on_replace trigger on a global space writes to a local space, recovery from that replica becomes impossible. Run Tarantool as shown below:
Now try to recover from the anonymous replica. If you recover without `force_recovery`, you get an error:
If you recover with `force_recovery = true`, you also get an error: