Speed up synchronous transaction queue processing #9917

Open
sergepetrenko opened this issue Apr 8, 2024 · 0 comments


Synchronous replication performance degrades sharply as the number of parallel synchronous requests grows.
For example, on my machine with 6000 requests in parallel (box.info.synchro.queue.len reads 6000), the master was only able to handle about 8000 RPS, and messages like this were frequent in the log:

2024-04-08 17:23:11.047 [43663] main txn.c:830 W> too long WAL write: 1 rows at LSN 11794161: 0.520 sec

The issue manifested itself even with replication_synchro_quorum = 1, so it isn't related to network delay. The size of the quorum didn't influence the results much either. It seems the problem lies in the way synchronous transactions are processed in the queue.

Besides, the same 6000 concurrent replace requests against an async space reached as much as 300k RPS, so the issue isn't related to batch finalization of transactions.

The most likely cause of the degradation is that txn_limbo_ack traverses the whole transaction list. In the example above, txn_limbo_ack is always called with the LSN of the last of the 6000 transactions, yet it still walks the entire list and updates ack_count on each transaction separately. We might improve this by keeping an array of acked LSNs and, once the ack LSN increases, finding the point up to which everything should be committed, for example via binary search.
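
A rough sketch of that idea follows, in Python rather than the C that txn_limbo is written in. It is only an illustration under assumptions, not the actual limbo code: the class LimboSketch, its fields, and the methods append and ack are hypothetical names. It assumes the pending transactions are kept in ascending LSN order and that each replica reports the highest LSN it has acked. No per-transaction ack counter is kept. The confirmed LSN is taken as the quorum-th largest acked LSN, and the commit boundary in the queue is found with a binary search.

```python
from bisect import bisect_right


class LimboSketch:
    """Hypothetical model of a synchronous transaction queue (not Tarantool code)."""

    def __init__(self, quorum):
        self.quorum = quorum
        self.queue_lsns = []  # LSNs of queued sync transactions, ascending
        self.head = 0         # index of the first not-yet-committed entry
        self.acked = {}       # replica id -> highest LSN acked by that replica

    def append(self, lsn):
        # Transactions are assumed to enter the queue in increasing LSN order.
        assert not self.queue_lsns or lsn > self.queue_lsns[-1]
        self.queue_lsns.append(lsn)

    def ack(self, replica_id, lsn):
        """Record an ack and return the LSNs that became committed."""
        if lsn <= self.acked.get(replica_id, 0):
            return []  # stale or repeated ack, nothing changes
        self.acked[replica_id] = lsn
        if len(self.acked) < self.quorum:
            return []  # not enough replicas have acked anything yet
        # The quorum-th largest acked LSN is covered by at least `quorum` replicas.
        confirmed_lsn = sorted(self.acked.values(), reverse=True)[self.quorum - 1]
        # Binary search for the first queued LSN above confirmed_lsn.
        cut = bisect_right(self.queue_lsns, confirmed_lsn, self.head)
        committed = self.queue_lsns[self.head:cut]
        self.head = cut
        return committed


limbo = LimboSketch(quorum=2)
for lsn in range(1, 6001):
    limbo.append(lsn)
first = limbo.ack(1, 6000)   # only one replica so far, quorum not reached
second = limbo.ack(2, 3000)  # LSNs 1..3000 now have two acks
third = limbo.ack(2, 6000)   # the remaining 3001..6000 are confirmed
```

In this sketch an ack costs a sort over the replicas plus one binary search over the queue, instead of a pass that touches every queued transaction. Whether the real limbo can keep its entries in an indexable, LSN-ordered structure is an open assumption here.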

@sergepetrenko added the bug, performance, qsync, and replication labels and removed the bug label on Apr 8, 2024
@CuriousGeorgiy CuriousGeorgiy self-assigned this May 15, 2024