Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchro replication needs answer with Confirm immediately #5100

Closed
sergos opened this issue Jun 22, 2020 · 0 comments
Closed

Synchro replication needs answer with Confirm immediately #5100

sergos opened this issue Jun 22, 2020 · 0 comments
Assignees
Labels
feature A new functionality qsync replication
Milestone

Comments

@sergos
Copy link
Contributor

sergos commented Jun 22, 2020

Need to rework the process of confirmation: currently replica sends confirm with next heartbeat, which is not acceptable.
The intention was to save on additional traffic because of confirms, but in current implementation to achieve low latency one need to set the heartbeat interval to a small value, which in turn will cause heartbeat+ack traffic.

@sergos sergos added this to the 2.5.1 milestone Jun 22, 2020
@kyukhin kyukhin added the feature A new functionality label Jun 22, 2020
Gerold103 added a commit that referenced this issue Jun 22, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
sergepetrenko pushed a commit that referenced this issue Jun 23, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
sergepetrenko pushed a commit that referenced this issue Jun 23, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jun 23, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
sergepetrenko pushed a commit that referenced this issue Jun 25, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jun 25, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
sergepetrenko pushed a commit that referenced this issue Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jun 29, 2020
Writer condition variable was used by the writer fiber to be woken
up when it is time to send a heartbeat or an ACK.

However it is not really needed, because writer fiber pointer is
always available in the same structure as writer_cond, and can be
used to call fiber_wakeup() directly.

Note, fiber_cond_signal() is basically the same fiber_wakeup().

The patch is not just refactoring for nothing. It is a
prerequisite for #5100. In this issue it will be needed to wakeup
the applier's writer fiber directly on each WAL write from txn.c
module. So the writer_cond won't be available. The only usable
thing will be txn->fiber, which will be set to applier's writer.

Part of #5100
Gerold103 added a commit that referenced this issue Jun 29, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jun 30, 2020
Writer condition variable was used by the writer fiber to be woken
up when it is time to send a heartbeat or an ACK.

However it is not really needed, because writer fiber pointer is
always available in the same structure as writer_cond, and can be
used to call fiber_wakeup() directly.

Note, fiber_cond_signal() is basically the same fiber_wakeup().

The patch is not just refactoring for nothing. It is a
prerequisite for #5100. In this issue it will be needed to wakeup
the applier's writer fiber directly on each WAL write from txn.c
module. So the writer_cond won't be available. The only usable
thing will be txn->fiber, which will be set to applier's writer.

Part of #5100
Gerold103 added a commit that referenced this issue Jun 30, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jul 1, 2020
Writer condition variable was used by the writer fiber to be woken
up when it is time to send a heartbeat or an ACK.

However it is not really needed, because writer fiber pointer is
always available in the same structure as writer_cond, and can be
used to call fiber_wakeup() directly.

Note, fiber_cond_signal() is basically the same fiber_wakeup().

The patch is not just refactoring for nothing. It is a
prerequisite for #5100. In this issue it will be needed to wakeup
the applier's writer fiber directly on each WAL write from txn.c
module. So the writer_cond won't be available. The only usable
thing will be txn->fiber, which will be set to applier's writer.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 1, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jul 2, 2020
Writer condition variable was used by the writer fiber to be woken
up when it is time to send a heartbeat or an ACK.

However it is not really needed, because writer fiber pointer is
always available in the same structure as writer_cond, and can be
used to call fiber_wakeup() directly.

Note, fiber_cond_signal() is basically the same fiber_wakeup().

The patch is not just refactoring for nothing. It is a
prerequisite for #5100. In this issue it will be needed to wakeup
the applier's writer fiber directly on each WAL write from txn.c
module. So the writer_cond won't be available. The only usable
thing will be txn->fiber, which will be set to applier's writer.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 2, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
sergepetrenko pushed a commit that referenced this issue Jul 4, 2020
Writer condition variable was used by the writer fiber to be woken
up when it is time to send a heartbeat or an ACK.

However it is not really needed, because writer fiber pointer is
always available in the same structure as writer_cond, and can be
used to call fiber_wakeup() directly.

Note, fiber_cond_signal() is basically the same fiber_wakeup().

The patch is not just refactoring for nothing. It is a
prerequisite for #5100. In this issue it will be needed to wakeup
the applier's writer fiber directly on each WAL write from txn.c
module. So the writer_cond won't be available. The only usable
thing will be txn->fiber, which will be set to applier's writer.

Part of #5100
sergepetrenko pushed a commit that referenced this issue Jul 4, 2020
Concept of 'commit' becomes not 100% matching WAL write event,
when synchro replication comes.

And yet applier relied on commit event when sent periodic
hearbeats to tell the master the replica's new vclock.

The patch makes applier send heartbeats on any write event. Even
if it was not commit. For example, when a sync transaction's
data was written, and the replica needs to tell the master ACK
using the heartbeat.

Closes #5100
Gerold103 added a commit that referenced this issue Jul 5, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 5, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 5, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
Gerold103 added a commit that referenced this issue Jul 5, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 5, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 5, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
Gerold103 added a commit that referenced this issue Jul 6, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
Gerold103 added a commit that referenced this issue Jul 7, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 7, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 7, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
LeonidVas pushed a commit that referenced this issue Jul 8, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
LeonidVas pushed a commit that referenced this issue Jul 8, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
LeonidVas pushed a commit that referenced this issue Jul 8, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
Gerold103 added a commit that referenced this issue Jul 9, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 9, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 9, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
Gerold103 added a commit that referenced this issue Jul 9, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 9, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 9, 2020
Applier used to send ACKs to master when commit happens. But for
sync transactions this is not enough - their ACK should be sent
after WAL write. Master doesn't really care whether a commit
will happen after WAL write on the replica. The only thing which
matters is whether the replica managed to persist the sync
transaction.

Now applier uses WAL write event instead of commit to send ACKs.
Nothing changed for async transactions (for them WAL write ==
commit). But sync transactions now send ACKs immediately, without
waiting for heartbeat timeout.

Closes #5100
Closes #5127
Gerold103 added a commit that referenced this issue Jul 10, 2020
With synchronous replication a sycn transaction passes 2 stages:
WAL write + commit. These are separate events on the contrary with
async transactions, where WAL write == commit.

The WAL write event is needed on non-leader nodes to be able to
send an ACK to the master.

Part of #5100
Gerold103 added a commit that referenced this issue Jul 10, 2020
Applier has a writer fiber sending vclock of the instance to the
master after each WAL write or when heartbeat timeout passes.

However it missed WAL writes happened *during* sending ACK on a
previous WAL write. That made applier sleep heartbeat timeout
even though it had not sent data. It is not a problem for async
replication, but becomes a bug when sync transactions appear. For
them an ACK should be sent as soon as possible.

Part of #5100
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue Oct 31, 2022
When replica heartbeat is read while another heartbeat is en route, it's
delivery is delayed for a full replication_timeout.

This can be seen in replication/qsync_basic.test.lua occasional hangs on
these lines:
```
--
-- tarantoolgh-5100: replica should send ACKs for sync transactions after
-- WAL write immediately, not waiting for replication timeout or
-- a CONFIRM.
--
-- on replica:
box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
test_run:switch('default')
box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
<factored-out>
-- Now commit something sync. It should return immediately even
-- though the replication timeout is huge.
box.space.sync:replace{4}

```

Change that so that a pending heartbeat is sent immediately upon status
message return from tx thread.

Closes tarantool#7869

NO_TEST=hard to test
NO_DOC=bugfix
NO_CHANGELOG=not user-visible
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue Nov 1, 2022
When replica heartbeat is read while another heartbeat is en route, it's
delivery is delayed for a full replication_timeout.

This can be seen in replication/qsync_basic.test.lua occasional hangs on
these lines:
```
--
-- tarantoolgh-5100: replica should send ACKs for sync transactions after
-- WAL write immediately, not waiting for replication timeout or
-- a CONFIRM.
--
-- on replica:
box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
test_run:switch('default')
box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
<factored-out>
-- Now commit something sync. It should return immediately even
-- though the replication timeout is huge.
box.space.sync:replace{4}

```

Change that so a pending heartbeat is sent immediately upon status
message return from tx thread.

Closes tarantool#7869

NO_TEST=hard to test
NO_DOC=bugfix
NO_CHANGELOG=not user-visible
Gerold103 pushed a commit that referenced this issue Nov 10, 2022
When replica heartbeat is read while another heartbeat is en route, it's
delivery is delayed for a full replication_timeout.

This can be seen in replication/qsync_basic.test.lua occasional hangs on
these lines:
```
--
-- gh-5100: replica should send ACKs for sync transactions after
-- WAL write immediately, not waiting for replication timeout or
-- a CONFIRM.
--
-- on replica:
box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
test_run:switch('default')
box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
<factored-out>
-- Now commit something sync. It should return immediately even
-- though the replication timeout is huge.
box.space.sync:replace{4}

```

Change that so a pending heartbeat is sent immediately upon status
message return from tx thread.

Closes #7869

NO_TEST=hard to test
NO_DOC=bugfix
NO_CHANGELOG=not user-visible
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new functionality qsync replication
Projects
None yet
Development

No branches or pull requests

3 participants