wal_queue_max_size is set too late on initial box.cfg #10013

Closed
sergepetrenko opened this issue May 15, 2024 · 0 comments · Fixed by #10015
Labels: 2.11 (target is 2.11 and all newer release/master branches), bug (something isn't working), replication

Comments

sergepetrenko (Collaborator) commented May 15, 2024

Bug description

wal_queue_max_size is a dynamic cfg parameter, and it's applied by load_cfg only after box_cfg_xc() finishes.
If box_cfg_xc() includes syncing with a remote master, all of the syncing happens with the default wal_queue_max_size setting, 16 megabytes. It turns out the default is too big for some applications: when rows are tiny, the WAL queue allows hundreds of thousands of rows to enter the queue in one event loop cycle. When that happens, the same symptoms already discussed in #5536 appear, and the node never manages to sync with the master in time.
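
To get a feel for the "hundreds of thousands of rows" figure, here is a rough back-of-the-envelope sketch in plain Lua. The 50-byte row size is an assumption chosen for illustration, not a measurement from the affected application. `rows_fitting` is a hypothetical helper, not a Tarantool API. The sketch also assumes "16 megabytes" means 16 * 1024 * 1024 bytes and ignores any per-row bookkeeping overhead.

```lua
-- Default wal_queue_max_size, assuming "16 megabytes" means 16 MiB.
local DEFAULT_WAL_QUEUE_MAX_SIZE = 16 * 1024 * 1024

-- Hypothetical helper: how many rows of a given size fit into the queue.
local function rows_fitting(queue_size_bytes, row_size_bytes)
    return math.floor(queue_size_bytes / row_size_bytes)
end

-- Assumed row size of 50 bytes, purely illustrative.
print(rows_fitting(DEFAULT_WAL_QUEUE_MAX_SIZE, 50)) -- prints 335544
```

With rows that small, a single pass could plausibly admit a few hundred thousand rows before the limit kicks in, which matches the order of magnitude described above.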

Reproduced on Tarantool 2.8.4.
A workaround for Tarantool 2.8.4 is to start with box.cfg{replication_sync_timeout=0.01}. The "sync" stage then ends almost immediately, and the desired wal_queue_max_size is applied as soon as possible.
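
A minimal sketch of what the workaround might look like in an instance file. The replication URI and the 1 MiB queue size are placeholders picked for illustration; only `replication_sync_timeout=0.01` comes from the description above.

```lua
-- Workaround sketch for the initial box.cfg call on a replica.
box.cfg{
    -- Hypothetical master URI; replace with your own.
    replication = {'replicator:password@192.168.0.101:3301'},
    -- End the "sync" stage almost immediately (value from this issue).
    replication_sync_timeout = 0.01,
    -- Illustrative value (1 MiB); choose one that suits your row sizes.
    wal_queue_max_size = 1024 * 1024,
}
```

Since box.cfg returns before the replica has caught up, it is probably worth checking `box.info.status` before serving traffic. The expectation is that the node stays in `orphan` until the sync completes, but verify that on your version.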

sergepetrenko added the bug, replication, and 2.11 labels May 15, 2024
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue May 16, 2024
wal_queue_max_size took effect only after the initial box.cfg call,
meaning that users with non-zero `replication_sync_timeout` still synced
using the default 16 Mb queue size. In some cases the default was too
big and the same issues described in tarantool#5536 arose.

Fix this.

Closes tarantool#10013

NO_DOC=bugfix
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue May 16, 2024
sergepetrenko self-assigned this May 16, 2024
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue May 20, 2024
sergepetrenko added a commit that referenced this issue May 21, 2024
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue May 21, 2024
(cherry picked from commit ab0f791)
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue May 21, 2024
(cherry picked from commit ab0f791)
sergepetrenko added a commit that referenced this issue May 21, 2024
(cherry picked from commit ab0f791)
sergepetrenko added a commit that referenced this issue May 21, 2024
(cherry picked from commit ab0f791)