Implement quorum-based synchronous replication #4842

Open
10 of 12 tasks
sergos opened this issue Apr 3, 2020 · 8 comments
Labels: feature (A new functionality), qsync, replication

@sergos
Contributor

sergos commented Apr 3, 2020

The quorum-based synchronous replication has to address:

  • protocol backward compatibility to enable cluster upgrade without downtime
  • consistency of data on replica and leader
  • switch from leader to replica without data loss
  • up-to-date replicas to run read-only requests
  • ability to switch async replicas into sync ones
  • guarantee of rollback on leader and sync replicas
  • simplicity of cluster orchestration

The RFC is available here

Major subtasks:

@sergos sergos self-assigned this Apr 9, 2020
@kyukhin kyukhin added prio1 feature A new functionality labels Apr 10, 2020
@kyukhin kyukhin added this to the 2.5.1 milestone Apr 10, 2020
@Gerold103
Collaborator

I started the implementation on this branch: gerold103/gh-4842-sync-replication. Since the tickets depend on each other, we should probably push our intermediate results more often and merge our branches into each other from time to time.

@Gerold103
Collaborator

Gerold103 commented Apr 29, 2020

I pushed 'some kind of sync replication' on the branch.
https://github.com/tarantool/tarantool/tree/ad983366d97f6a9f50c0edfb48cb66918c42deaf
Basically it is just wait_lsn, but right inside txn_commit() in C. However, there is a plan for how to develop it further. Other issues may depend on that, so I decided to describe it here.

  1. I tried to solve the problem of where to store transactions that are waiting for a quorum, so that relays can reach them and confirm them. For that I introduced a structure, txn_limbo. It is a list of transactions originating from the master, plus a vclock. Every component of the vclock is the master's LSN as seen by one replica. So it holds 32 versions of the master's LSN, as seen on all replicas.

  2. I added the replication_sync_quorum option. It is a global variable that sets how many instances must apply a transaction before its commit succeeds.

  3. Every relay thread on the master, in tx_status_update(), reports the relay's last vclock to the txn_limbo and increments ack_count of the newly acked transactions.

  4. When the transaction's fiber wakes up and sees that it has reached the quorum, the commit succeeds.

The key thing here is txn_limbo. It is supposed to be the central data structure for all the confirmation and rollback logic. It is a channel between the replication module and the txn module.
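
As a rough illustration of the four steps above, here is a minimal sketch in C. It is not the code on the branch: every `sketch_*` name, the fixed-size queue, and the field layout are assumptions made for the example. It also counts only explicit acknowledgements, because the comment does not say whether the master's own write counts toward the quorum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 32 matches the vclock size mentioned above; the capacity is arbitrary. */
enum { SKETCH_VCLOCK_MAX = 32, SKETCH_LIMBO_CAP = 16 };

/* One transaction waiting for a quorum (hypothetical layout). */
struct sketch_limbo_entry {
	int64_t lsn;   /* master LSN of the transaction, assumed > 0 */
	int ack_count; /* number of instances that confirmed it so far */
};

/* Hypothetical stand-in for txn_limbo: pending transactions plus a vclock. */
struct sketch_limbo {
	struct sketch_limbo_entry queue[SKETCH_LIMBO_CAP];
	int len;
	/* vclock[i] is the master LSN that replica i is known to have applied. */
	int64_t vclock[SKETCH_VCLOCK_MAX];
	/* Plays the role of replication_sync_quorum. */
	int quorum;
};

static void
sketch_limbo_create(struct sketch_limbo *l, int quorum)
{
	memset(l, 0, sizeof(*l));
	l->quorum = quorum;
}

/* The master parks a transaction in the limbo until it collects a quorum. */
static struct sketch_limbo_entry *
sketch_limbo_append(struct sketch_limbo *l, int64_t lsn)
{
	assert(l->len < SKETCH_LIMBO_CAP);
	struct sketch_limbo_entry *e = &l->queue[l->len++];
	e->lsn = lsn;
	e->ack_count = 0;
	return e;
}

/* A relay reports that replica_id has applied everything up to lsn. */
static void
sketch_limbo_ack(struct sketch_limbo *l, int replica_id, int64_t lsn)
{
	assert(replica_id >= 0 && replica_id < SKETCH_VCLOCK_MAX);
	int64_t prev = l->vclock[replica_id];
	if (lsn <= prev)
		return; /* stale or repeated report: nothing is newly acked */
	l->vclock[replica_id] = lsn;
	for (int i = 0; i < l->len; i++) {
		int64_t entry_lsn = l->queue[i].lsn;
		/* Count only the entries newly covered by this report. */
		if (entry_lsn > prev && entry_lsn <= lsn)
			l->queue[i].ack_count++;
	}
}

/* What the transaction's fiber would check after it wakes up. */
static bool
sketch_limbo_is_complete(const struct sketch_limbo *l,
			 const struct sketch_limbo_entry *e)
{
	return e->ack_count >= l->quorum;
}
```

With a quorum of 2, an entry appended at LSN 10 becomes complete once two different replicas report an LSN of 10 or higher. A repeated report from the same replica does not bump the counter, because its vclock component has not advanced.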

@Gerold103
Collaborator

I pushed a more or less finished version of the things I described above on the same branch: gerold103/gh-4842-sync-replication.

@kostja
Contributor

kostja commented May 10, 2020 via email

@Gerold103
Collaborator

So if we add a space option quorum, then every transaction will have its own quorum value? And the needed quorum is the MAX among the quorums of the affected sync spaces, right?
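
To make the question concrete, the rule being asked about could be sketched as below. This is only an illustration of the "MAX among quorums" reading: the per-space quorum option is a proposal in this thread, and `sketch_txn_quorum` along with the convention that an async space carries a quorum of 0 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: the quorum a transaction needs is the largest
 * per-space quorum among the spaces it touches. Async spaces are
 * assumed to carry a quorum of 0.
 */
static int
sketch_txn_quorum(const int *space_quorums, size_t count)
{
	int max = 0;
	for (size_t i = 0; i < count; i++) {
		if (space_quorums[i] > max)
			max = space_quorums[i];
	}
	return max;
}
```

Under this reading, a transaction that touches spaces with quorums 2, 0 and 3 would wait for 3 acknowledgements.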

@kostja
Contributor

kostja commented May 11, 2020

I mentioned earlier in the thread that I don't think we should let spaces with different quorum requirements mix in the same transaction, and I provided an example of how it leads to either availability or consistency issues. Please read my reply to Sergey's spec.

The more I think about it, the more I believe that we need to explicitly specify the replication group the space is part of, not just the replication factor, and that the replication group has to be explicitly defined from the UUIDs of its participants. Otherwise we have 6 replicas, rf=3, two transactions get quorum=2 but from 2 different replicas -> consistency violation.
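
The arithmetic behind this example can be checked with a small sketch in C. The bitmask representation of replicas and all `sketch_*` names are assumptions made for the illustration. The intersection test is the usual pigeonhole condition: any two quorums of size q drawn from n members must share a member exactly when 2q > n.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the set bits, i.e. how many replicas are in the mask. */
static int
sketch_popcount(uint32_t mask)
{
	int n = 0;
	for (; mask != 0; mask &= mask - 1)
		n++;
	return n;
}

/* Only acks from members of the group count toward the quorum. */
static bool
sketch_has_quorum(uint32_t ack_mask, uint32_t group_mask, int quorum)
{
	return sketch_popcount(ack_mask & group_mask) >= quorum;
}

/* True if any two quorums taken from the group must share a member. */
static bool
sketch_quorums_must_intersect(int group_size, int quorum)
{
	return 2 * quorum > group_size;
}
```

With 6 replicas and quorum=2, one transaction can be acked by replicas {0, 1} and another by {2, 3}: both reach the quorum and no replica holds both, since 2 * 2 is not greater than 6. With an explicit group of 3 members, 2 * 2 > 3, so any two quorums overlap in at least one member.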

@kostja
Contributor

kostja commented May 11, 2020

In other words, I suggest introducing new DDL:
box.schema.group.create('groupname')
box.schema.group.add('groupname', server_uuid)
box.schema.group.drop()
And then in space properties:
box.schema.space.create('spacename', {group = groupname})
It's a good idea to reserve the first thousand groups for the system.

@kostja
Contributor

kostja commented May 11, 2020

System spaces should be part of a system group = all, which includes all nodes in _cluster

cyrillos added a commit that referenced this issue Jun 26, 2020
Introduced in 157beda
but never used since.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
cyrillos added a commit that referenced this issue Jun 26, 2020
Last time used in 1d97902

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
cyrillos added a commit that referenced this issue Jun 26, 2020
Introduced in 157beda
and never used since.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
cyrillos added a commit that referenced this issue Jun 26, 2020
We never use this method so no need to waste space.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
cyrillos added a commit that referenced this issue Jun 26, 2020
To operate with flags we've three helpers:
_set, _clear and _has. No need for additional
wrapper.

Same time it is more convenient to grep for
TXN_FORCE_ASYNC directly.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Gerold103 pushed a commit that referenced this issue Jun 29, 2020
Introduced in 157beda
but never used since.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Gerold103 pushed a commit that referenced this issue Jun 29, 2020
Last time used in 1d97902

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Gerold103 pushed a commit that referenced this issue Jun 29, 2020
Introduced in 157beda
and never used since.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Gerold103 pushed a commit that referenced this issue Jun 29, 2020
We never use this method so no need to waste space.

In-scope-of #4842

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
@kyukhin kyukhin modified the milestones: 2.5.1, 2.6.1 Jul 22, 2020
@kyukhin kyukhin modified the milestones: 2.6.1, wishlist Oct 23, 2020