-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add document about single region 1PC #57
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Single Region 1PC | ||
|
||
For transactions that affect only one region, or more strictly speaking, that can be prewritten with only one prewrite request, can be committed directly while the prewrite request is being handled, so the commit phase can be totally removed. Therefore we can get less latency and more throughput. For TiDB, indices and rows are unsally not in a same region, **so this optimization only works under a limited amount of scenarios, like sysbench oltp_update_non_index**. But in a suitable scenario, it gains significantly better performance. | ||
|
||
We used to think about supporting 1PC before but stopped after meeting many difficulties. Now since async commit, which meets and solved mostly the same problem as 1PC, is implemented, we can continue supporting 1PC with much less effort to make. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From @cfzjywxk :
|
||
|
||
* Problems already solved by supporting async commit: | ||
* Commit ts calculation | ||
* Non-unique commit ts and rollback record overlapping | ||
* Memory lock | ||
* Replica read problem | ||
* Problems that we do not care anymore: | ||
* Binlog incompatibility | ||
|
||
## Basic Design | ||
|
||
Since async commit is implemented, it's not difficult to implement a working 1PC. However, it's still hard to make it perfectly correct and compatible with other components. | ||
|
||
* When committing a transaction, if TiDB finds that the prewrite phase can be done with only one single request, the transaction is allowed to be committed with 1PC protocol. A field named `try_one_pc` in the prewrite request will be set to let TiKV know that 1PC is available for this transaction. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When would this be set or not set? If and only if the transaction only touches one region/store? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It will be set iff the prewrite can be done with only one request. Typically it means the transaction affects only one region. But if there's too many affected keys in a region, it may need to be prewritten with more than one requests, in which case 1PC is not applicable. |
||
* When TiKV receives a request with `try_one_pc` set, it first handle it just like how it handles normal prewrite requests.But after generating write buffer and before writing them down to RocksDB, it will additionally check if the prewrite is fully successful, and convert the locks into commit records if so. And finally write them down to RocksDB. The `commit_ts` is `max(max_ts, start_ts, for_update_ts) + 1`. It fetches the `max_ts` while acquiring the memory lock, and the memory lock is released after applying, just like how async commit does. The final `commit_ts` will be sent back to TiDB via prewrite response. | ||
* 1PC and async commit can be independent. When TiKV rejects to commit a transaction with 1PC, the transaction can then fallback to normal transactions, and it may become a normal 2PC transaction or an async commit transaction, according to if the async commit flag is set. | ||
|
||
## Problems need to be solved | ||
|
||
### Schema version checking problem | ||
|
||
[This problem exists in async commit too](https://github.com/tikv/sig-transaction/blob/master/design/async-commit/parallel-commit-known-issues-and-solutions.md#schema-version-checking). But it's even harder to solve for 1PC, because if it's committed in TiKV, it will have no chance to check if the schema version has changed between the `start_ts` and `commit_ts`, while for async commit it can be checked after prewrite finishing. | ||
|
||
Possible solution: When trying committing a transaction with 1PC, find a ts `one_pc_max_commit_ts` before which we can guarantee that the schema version can't change, and send it to TiKV. TiKV will reject committing if the calculated `commit_ts` exceeds the `one_pc_max_commit_ts`. | ||
|
||
### CDC compatibility problem | ||
|
||
CDC syncs data from TiKV by observing applying events. Prewrites and commits are distinguished, and CDC will use these events to compose complete transactions and then send them to the downstream. So when 1PC is enabled, there need to be some way for CDC to distinguish if a apply event is caused by 1PC committing, otherwise CDC will expect that a commit event must has a corresponding prewrite event to compose a complete transaction. | ||
|
||
Possible solutions: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Firstly, we should have a way that disables 1PC when CDC is connected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will 1PC be shipped with the problem unsolved? Otherwise I think we can allow enabling 1PC only after resolving these problems, then we won't need to automatically disable 1PC when CDC is connected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So we can enable 1PC by default on the master branch. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if enabling it by default is a good idea unless we've done enough tests and it's stable enough. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. enable it by default can make the feature mature more quickly. |
||
1. Passing a `is_1pc` flag from txn layer to apply, just like how `TxnExtra` (which is used to support CDC outputting old value) was written previously. It would be ugly. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is the least ugly solution :-) 2 is very fragile, 3 seems wasteful and make write records more complex. |
||
2. When an apply batch puts write records but without deleting lockcf, we conclude that it must be a 1PC committing. This sounds like a simple way, but this may leave troubles to future. For example, if we want to introduce some new mechanisms that may amend persisted write records, we will then also need a new mechanism to distinguish the amending and 1PC committing. There's even one more problem: if the transaction being committed is a pessimistic transaction, the pessimistic lock need to be removed while writing the commit record | ||
3. Add a field to write record that marks it's committed by 1PC. This approach makes the write record more complicated may waste some disk space. | ||
|
||
## POC Test | ||
|
||
We verified the improvement on a draft implementation with a brief test with following configurations, on a cluster with 3 TiKV nodes(16c 90G each) and 1 TiDB node(16c 64G): | ||
|
||
``` | ||
sysbench oltp_update_non_index run --rand-type=uniform --db-driver=mysql --tables=8 --table-size=10000000 --time=270 --percentile=99 --report-interval=10 --threads=128 | ||
``` | ||
|
||
Here's the result: | ||
|
||
| test | qps | lat avg (ms) | lat .99 (ms) | lat max (ms)| | ||
|----------------------------|---------|--------------|--------------|-------------| | ||
|update_non_index<br/>1pc | 16603.37| 7.71| 15.00| 81.50| | ||
|update_non_index<br/>async_commit|13278.29| 9.64| 18.61| 110.50| | ||
|update_non_index<br/>2pc | 8892.85| 14.39| 24.83| 118.57| | ||
|
||
|
||
## Plan | ||
|
||
TBD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.