New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synchro replication needs answer with Confirm immediately #5100
Comments
Gerold103
added a commit
that referenced
this issue
Jun 22, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
sergepetrenko
pushed a commit
that referenced
this issue
Jun 23, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
sergepetrenko
pushed a commit
that referenced
this issue
Jun 23, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jun 23, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
sergepetrenko
pushed a commit
that referenced
this issue
Jun 25, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jun 25, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
sergepetrenko
pushed a commit
that referenced
this issue
Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jun 29, 2020
Writer condition variable was used by the writer fiber to be woken up when it is time to send a heartbeat or an ACK. However it is not really needed, because writer fiber pointer is always available in the same structure as writer_cond, and can be used to call fiber_wakeup() directly. Note, fiber_cond_signal() is basically the same fiber_wakeup(). The patch is not just refactoring for nothing. It is a prerequisite for #5100. In this issue it will be needed to wakeup the applier's writer fiber directly on each WAL write from txn.c module. So the writer_cond won't be available. The only usable thing will be txn->fiber, which will be set to applier's writer. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jun 30, 2020
Writer condition variable was used by the writer fiber to be woken up when it is time to send a heartbeat or an ACK. However it is not really needed, because writer fiber pointer is always available in the same structure as writer_cond, and can be used to call fiber_wakeup() directly. Note, fiber_cond_signal() is basically the same fiber_wakeup(). The patch is not just refactoring for nothing. It is a prerequisite for #5100. In this issue it will be needed to wakeup the applier's writer fiber directly on each WAL write from txn.c module. So the writer_cond won't be available. The only usable thing will be txn->fiber, which will be set to applier's writer. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jun 30, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jul 1, 2020
Writer condition variable was used by the writer fiber to be woken up when it is time to send a heartbeat or an ACK. However it is not really needed, because writer fiber pointer is always available in the same structure as writer_cond, and can be used to call fiber_wakeup() directly. Note, fiber_cond_signal() is basically the same fiber_wakeup(). The patch is not just refactoring for nothing. It is a prerequisite for #5100. In this issue it will be needed to wakeup the applier's writer fiber directly on each WAL write from txn.c module. So the writer_cond won't be available. The only usable thing will be txn->fiber, which will be set to applier's writer. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 1, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jul 2, 2020
Writer condition variable was used by the writer fiber to be woken up when it is time to send a heartbeat or an ACK. However it is not really needed, because writer fiber pointer is always available in the same structure as writer_cond, and can be used to call fiber_wakeup() directly. Note, fiber_cond_signal() is basically the same fiber_wakeup(). The patch is not just refactoring for nothing. It is a prerequisite for #5100. In this issue it will be needed to wakeup the applier's writer fiber directly on each WAL write from txn.c module. So the writer_cond won't be available. The only usable thing will be txn->fiber, which will be set to applier's writer. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 2, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
sergepetrenko
pushed a commit
that referenced
this issue
Jul 4, 2020
Writer condition variable was used by the writer fiber to be woken up when it is time to send a heartbeat or an ACK. However it is not really needed, because writer fiber pointer is always available in the same structure as writer_cond, and can be used to call fiber_wakeup() directly. Note, fiber_cond_signal() is basically the same fiber_wakeup(). The patch is not just refactoring for nothing. It is a prerequisite for #5100. In this issue it will be needed to wakeup the applier's writer fiber directly on each WAL write from txn.c module. So the writer_cond won't be available. The only usable thing will be txn->fiber, which will be set to applier's writer. Part of #5100
sergepetrenko
pushed a commit
that referenced
this issue
Jul 4, 2020
Concept of 'commit' becomes not 100% matching WAL write event, when synchro replication comes. And yet applier relied on commit event when sent periodic hearbeats to tell the master the replica's new vclock. The patch makes applier send heartbeats on any write event. Even if it was not commit. For example, when a sync transaction's data was written, and the replica needs to tell the master ACK using the heartbeat. Closes #5100
Gerold103
added a commit
that referenced
this issue
Jul 5, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 5, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 5, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
Gerold103
added a commit
that referenced
this issue
Jul 5, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 5, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 5, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
Gerold103
added a commit
that referenced
this issue
Jul 6, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
Gerold103
added a commit
that referenced
this issue
Jul 7, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 7, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 7, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
LeonidVas
pushed a commit
that referenced
this issue
Jul 8, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
LeonidVas
pushed a commit
that referenced
this issue
Jul 8, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
LeonidVas
pushed a commit
that referenced
this issue
Jul 8, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
Gerold103
added a commit
that referenced
this issue
Jul 9, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 9, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 9, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
Gerold103
added a commit
that referenced
this issue
Jul 9, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 9, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 9, 2020
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
Gerold103
added a commit
that referenced
this issue
Jul 10, 2020
With synchronous replication a sycn transaction passes 2 stages: WAL write + commit. These are separate events on the contrary with async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
Gerold103
added a commit
that referenced
this issue
Jul 10, 2020
Applier has a writer fiber sending vclock of the instance to the master after each WAL write or when heartbeat timeout passes. However it missed WAL writes happened *during* sending ACK on a previous WAL write. That made applier sleep heartbeat timeout even though it had not sent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Oct 31, 2022
When replica heartbeat is read while another heartbeat is en route, it's delivery is delayed for a full replication_timeout. This can be seen in replication/qsync_basic.test.lua occasional hangs on these lines: ``` -- -- tarantoolgh-5100: replica should send ACKs for sync transactions after -- WAL write immediately, not waiting for replication timeout or -- a CONFIRM. -- -- on replica: box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000} test_run:switch('default') box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000} <factored-out> -- Now commit something sync. It should return immediately even -- though the replication timeout is huge. box.space.sync:replace{4} ``` Change that so that a pending heartbeat is sent immediately upon status message return from tx thread. Closes tarantool#7869 NO_TEST=hard to test NO_DOC=bugfix NO_CHANGELOG=not user-visible
sergepetrenko
added a commit
to sergepetrenko/tarantool
that referenced
this issue
Nov 1, 2022
When replica heartbeat is read while another heartbeat is en route, it's delivery is delayed for a full replication_timeout. This can be seen in replication/qsync_basic.test.lua occasional hangs on these lines: ``` -- -- tarantoolgh-5100: replica should send ACKs for sync transactions after -- WAL write immediately, not waiting for replication timeout or -- a CONFIRM. -- -- on replica: box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000} test_run:switch('default') box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000} <factored-out> -- Now commit something sync. It should return immediately even -- though the replication timeout is huge. box.space.sync:replace{4} ``` Change that so a pending heartbeat is sent immediately upon status message return from tx thread. Closes tarantool#7869 NO_TEST=hard to test NO_DOC=bugfix NO_CHANGELOG=not user-visible
Gerold103
pushed a commit
that referenced
this issue
Nov 10, 2022
When replica heartbeat is read while another heartbeat is en route, it's delivery is delayed for a full replication_timeout. This can be seen in replication/qsync_basic.test.lua occasional hangs on these lines: ``` -- -- gh-5100: replica should send ACKs for sync transactions after -- WAL write immediately, not waiting for replication timeout or -- a CONFIRM. -- -- on replica: box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000} test_run:switch('default') box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000} <factored-out> -- Now commit something sync. It should return immediately even -- though the replication timeout is huge. box.space.sync:replace{4} ``` Change that so a pending heartbeat is sent immediately upon status message return from tx thread. Closes #7869 NO_TEST=hard to test NO_DOC=bugfix NO_CHANGELOG=not user-visible
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Need to rework the process of confirmation: currently replica sends confirm with next heartbeat, which is not acceptable.
The intention was to save on additional traffic because of confirms, but in current implementation to achieve low latency one need to set the heartbeat interval to a small value, which in turn will cause heartbeat+ack traffic.
The text was updated successfully, but these errors were encountered: