Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't throw away valid configuration updates #5952

Merged
merged 3 commits into from Feb 10, 2020

Conversation

zaphod42
Copy link
Contributor

@zaphod42 zaphod42 commented Dec 6, 2019

What does this PR do?

The configuration reloading code takes various actions in order to limit the number of reloads that have to be performed.
One of those actions is to check if the new configuration is the same as the current configuration.
If the new configuration is the same, then there is no need to apply it and it can be safely ignored.

However, the check was performed at the wrong location.
The location assumed that the c.currentConfigurations variable is up to date.
This assumption does not hold given the throttling and asynchronous handling of configuration updates.
The effect of this incorrect assumption is that a configuration update can be thrown away and leave a prior configuration in place.

This resolves the problem by moving the check to the loadMessage method.
This is the point where the c.currentConfigurations variable is modified and so there is no chance that the new configuration will be compared against an incorrect version.

Motivation

Fixes #5901

More

  • Added/updated tests
  • Added/updated documentation

@ldez

This comment has been minimized.

@ldez ldez closed this Dec 6, 2019
@dtomcej dtomcej moved this from To review to Done in v2 Dec 10, 2019
@ldez ldez reopened this Dec 11, 2019
@ldez ldez changed the base branch from master to v2.1 December 11, 2019 09:07
@ldez ldez added the kind/bug/fix a bug fix label Dec 11, 2019
@alexconlin
Copy link

Hi @ldez @dtomcej , do you have any news about this PR? I wondered if it might have slipped out of your queue as it was moved from To review to Done but then was reopened. We're running a patched version at the moment and we'd love to be able to track updates to the project again.

@ldez ldez self-requested a review January 7, 2020 15:27
@juliens
Copy link
Member

juliens commented Jan 28, 2020

Thx for your PR!

By fixing this bug like this, I see a side effect. With your version, if we receive a "same configuration", it will now trigger the throttle and so if we receive, one "same configuration" and one different configuration, we will have to wait for the throttle time before applying the new configuration.

WDYT if you put the configuration compare in the ThrottleGoroutine https://github.com/containous/traefik/blob/16288d171cedf4cc1ff429d9099152721b90e8fb/pkg/server/configurationwatcher.go#L216 and just compare with a save of the previous configuration that was sent in the ring.In() chan?

@zaphod42
Copy link
Contributor Author

@juliens thank you for taking a look at this. I could take or leave the change you are suggesting. My main concern was around correctness rather than speed of config update as it seemed that the throttle design was already purposefully introducing delays. I chose separation that I did because it kept two concepts separate: selecting a "most recent" update (the "throttle"), and idempotently applying the configuration updates that are selected. The code that I changed already existed and I simply moved it to the place which seemed to actually achieve the goal given the design as I saw it. I didn't want to take on a different design, which is what I see the change you suggesting leading to. That is to change the "throttle" into a comprehensive configuration updates gating (de-duplication, rate limiting, validation) mechanism. I think that is a valid design, but not something I wanted to take on in this fix.

@juliens
Copy link
Member

juliens commented Jan 28, 2020

I agree that it changes a little bit the design, but your change changes the behaviour too, because, yes the throttle add some delay, but if you read the code carefully, you will see that the first received configuration is apply, and then we add the throttle duration.
With your modification, we could have a delay where we didn't have it before, so to be honest, I prefer a little change in the design ( that for me, is not perfect because it's inherit from the v1 and we should refactor it, i hope soon ) than a change in the behaviour.

@zaphod42
Copy link
Contributor Author

@juliens, I'd like to be clear on what you are saying. Are you saying that you won't accept this fix unless I make the change you are stating? Or are you saying that, if I have time, could I please make this change?

If you are asking if I have time. Then the answer is that I really don't. You are welcome to make the change to address the other issue that you are worried about if you want to address that.

I'll make the change if you are demanding that, but unhappily. I have provided a fix that addresses the issue I've reported. I've already spent a large amount of time on this bug to explain it to one of your collegues and have pretty much moved on by this point. The only thing holding me to this now is that I want to make sure we aren't running on a patched version of the code forever (or until we move off of traefik because of this bug). I am not relishing the idea of taking this on as, without some more thinking about it, I'm not convinced of its correctness. I'd rather not spend that time right now.

@juliens
Copy link
Member

juliens commented Jan 28, 2020

@zaphod42, to be clear, what I am saying is that I will not merge your PR as is because it fixes your bug, but it does add a new bug.

Having said that, I can really understand that contributors don't always have enough time to quickly manage comments, and that you don't have time to manage that. This is why, if you agree, since the fix I am offering does not take much time, I can modify your PR in order to handle this, and I would be really happy if you could take a look and confirm that it fixes your bug (even if I am already pretty sure that it fixes it thanks to the test that you added)

@juliens
Copy link
Member

juliens commented Feb 3, 2020

@zaphod42 I made the fix, could you confirm the fix works for you too?
Thx

Copy link
Member

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍

@ldez ldez added this to the 2.1 milestone Feb 10, 2020
Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dtomcej dtomcej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
:shipit:

Andrew Parker and others added 3 commits February 10, 2020 20:22
The configuration reloading code takes various actions in order to limit
the number of reloads that have to be performed. One of those actions is
to check if the new configuration is the same as the current
configuration. If the new configuration is the same, then there is no
need to apply it and it can be safely ignored.

However, the check was performed at the wrong location. The location
assumed that the c.currentConfigurations variable is up to date. This
assumption does not hold given the throttling and asynchronous handling
of configuration updates. The effect of this incorrect assumption is
that a configuration update can be thrown away and leave a prior
configuration in place.

This resolves the problem by moving the check to the loadMessage method.
This is the point where the c.currentConfigurations variable is modified
and so there is no chance that the new configuration will be compared
against an incorrect version.
Didn't run go fmt on the previous commit. This was pointed out by a
failing test.
@zaphod42
Copy link
Contributor Author

zaphod42 commented Mar 4, 2020

Hey guys, I wanted to come back and thank you for working on this. I'm sorry about snapping at you earlier, I let my stress get to me and it came out on you. You don't deserve that.

I took a look at the change and it looks like it addresses the race issue as well as the timing issue that you raised, @juliens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
v2
Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants