Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

box: apply dynamic cfg even if option value is unchanged #8552

Closed
wants to merge 1 commit into from

Conversation

locker
Copy link
Member

@locker locker commented Apr 11, 2023

If a configuration option value passed to box.cfg is the same as the old one, the option handler isn't called. As a result, one can't restart the server IPROTO connection or replication by passing the same URI to box.cfg. This is inconvenient, for example:

  • To update an SSL key file without renaming it, one has to first pass an empty URI to box.cfg.listen and then reset it back.
  • To restart broken replication after fixing it manually (e.g. deleting a conflicting record on the replica), one has to first set box.cfg.replication to {} and then set it back instead of just calling box.cfg{replication = box.cfg.replication}.

There's no reason to skip an option update if the value is unchanged. It should be up to the option configuration callback.

The only callback that may actually fail on an attempt to set the same value is box.cfg.memtx_memory. This happens because the value is rounded up to a multiple of the quota unit size so we just need to fix the check accordingly.

Note that the box.cfg.replication configuration callback won't restart replication if it's up and running even if the URI list is rearranged thanks to commit 5994892 ("replication: fix replica disconnect upon reconfiguration"). This commit doesn't break this behavior - it just removes the load_cfg.lua code that would skip option update if the old and new option values are equal in Lua.

Needed for tarantool/tarantool-ee#432
Closes #8551

@locker locker requested a review from a team as a code owner April 11, 2023 16:00
@locker locker requested a review from nshy April 11, 2023 16:10
@coveralls
Copy link

coveralls commented Apr 11, 2023

Coverage Status

Coverage: 85.559% (+0.002%) from 85.557% when pulling 8b4b088 on locker:box-cfg-reload-fix into 7314a1b
on tarantool:master
.

@Mons
Copy link
Contributor

Mons commented Apr 11, 2023

Don't do this, It's breaking behavior.

@locker
Copy link
Member Author

locker commented Apr 11, 2023

Don't do this, It's breaking behavior.

What does it break? All tests pass. The replication configuration callback won't restart replication if it's up and running thanks to commit 5994892.

See also #8551 (comment).

@Totktonada
Copy link
Member

Even if it doesn't break things it sets the new mechanics and requires extra attention from callback authors to keep the old behavior. I have a proposal: #8551 (comment).

Copy link
Contributor

@p7nov p7nov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my suggestion on the changelog wording.

@unera
Copy link
Collaborator

unera commented Apr 12, 2023

what exactly can be broken?

@Mons
Copy link
Contributor

Mons commented Apr 12, 2023

what exactly can be broken?

Discussed in issue.
Confusing wording in description lead to misunderstanding.

If no active replication is restarted in case when nothing changed, then everything is ok.

If a configuration option value passed to `box.cfg` is the same as the
old one, the option handler isn't called. As a result, one can't restart
the server IPROTO connection or replication by passing the same URI to
`box.cfg`. This is inconvenient, for example:
 - To update an SSL key file without renaming it, one has to first pass
   an empty URI to `box.cfg.listen` and then reset it back.
 - To restart broken replication after fixing it manually (e.g. deleting
   a conflicting record on the replica), one has to first set
   `box.cfg.replication` to `{}` and then set it back instead of just
   calling `box.cfg{replication = box.cfg.replication}`.

There's no reason to skip an option update if the value is unchanged.
It should be up to the option configuration callback.

The only callback that may actually fail on an attempt to set the same
value is `box.cfg.memtx_memory`. This happens because the value is
rounded up to a multiple of the quota unit size so we just need to fix
the check accordingly.

Note that the `box.cfg.replication` configuration callback won't restart
replication if it's up and running even if the URI list is rearranged
thanks to commit 5994892 ("replication: fix replica disconnect upon
reconfiguration"). This commit doesn't break this behavior - it just
removes the `load_cfg.lua` code that would skip option update if the old
and new option values are equal in Lua.

Needed for tarantool/tarantool-ee#432
Closes tarantool#8551

NO_DOC=bug fix; this behavior isn't described anywhere in the docs
@kostja
Copy link
Contributor

kostja commented Apr 14, 2023

The client has a right to expect that if the network is OK and the hardware is running fine its queries will not fail. There is a whole class of applications for which it is good enough. Now you're saying you'll be dropping connections for no reason whatsoever, just because somebody reset an unrelated parameter.

A lot of applications are not idempotent. With interactive transactions it's even worse.

@locker
Copy link
Member Author

locker commented Apr 14, 2023

The client has a right to expect that if the network is OK and the hardware is running fine its queries will not fail. There is a whole class of applications for which it is good enough. Now you're saying you'll be dropping connections for no reason whatsoever, just because somebody reset an unrelated parameter.

I didn't say that.

No active connections would be dropped on reconfiguration after this change, because neither box.cfg.replication nor box.cfg.listen callbacks drop active connections. With box.cfg.listen it's always been this way while box.cfg.replication was patched to behave this way in commit 5994892.

Still, after this change reconfiguration of box.cfg.replication would wait for sync is complete, which may be an issue. At least, this breaks vshard tests.

@locker
Copy link
Member Author

locker commented Apr 17, 2023

#8551 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables all tests for a pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

box.cfg doesn't try to re-apply listen or replication options if the option value is the same
8 participants