-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding --annotation-ttl for automatic unlock #119
Conversation
@dholbach any feedback on this one please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
I think it needs a few more safeguards as noted.
Also, how about unit tests?
@@ -79,6 +86,11 @@ func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) { | |||
if err := json.Unmarshal([]byte(valueString), &value); err != nil { | |||
return false, err | |||
} | |||
|
|||
if ttlExpired(value.Created, value.TTL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you have a running kured install, and you add the TTL flag; won't it see the previous annotation as expired immediately?
Similarly if someone takes the lock manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take lock manually as described, then set TTL flag it will invalidate it instantly as there will be no created
and ttl
values set in annotation:
if ttl > 0 && time.Since(created) >= ttl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this is acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So double checked this behaviour.
Launch params:
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Kubernetes Reboot Daemon: -64ebf53"
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Node ID: XXX.compute.internal"
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Reboot Sentinel: /var/run/update_engine_autoupdate_completed every 1m0s"
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Blocking Pod Selectors: []"
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Reboot on: SunMonTueWedThuFriSat between 00:00 and 23:59 UTC"
kured-pzs8b kured time="2020-05-05T18:27:40Z" level=info msg="Force annotation cleanup after: 2m0s"
Manually disabled reboots:
➜ ~ kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}'
daemonset.apps/kured annotated
Output:
kured-pzs8b kured time="2020-05-05T18:29:29Z" level=info msg="Reboot required"
kured-pzs8b kured time="2020-05-05T18:29:29Z" level=warning msg="Lock already held: manual"
kured-pzs8b kured time="2020-05-05T18:30:29Z" level=info msg="Reboot required"
kured-pzs8b kured time="2020-05-05T18:30:29Z" level=warning msg="Lock already held: manual"
kured-pzs8b kured time="2020-05-05T18:31:29Z" level=info msg="Reboot required"
kured-pzs8b kured time="2020-05-05T18:31:29Z" level=warning msg="Lock already held: manual"
kured-pzs8b kured time="2020-05-05T18:32:29Z" level=info msg="Reboot required"
kured-pzs8b kured time="2020-05-05T18:32:29Z" level=warning msg="Lock already held: manual"
kured-pzs8b kured time="2020-05-05T18:33:29Z" level=info msg="Reboot required"
kured-pzs8b kured time="2020-05-05T18:33:29Z" level=warning msg="Lock already held: manual"
Removing lock manually:
➜ ~ kubectl -n kube-system annotate ds kured weave.works/kured-node-lock-
daemonset.apps/kured annotated
Output:
kured-pzs8b kured time="2020-05-05T18:35:29Z" level=info msg="Reboot required"
kured-pzs8b kured time="2020-05-05T18:35:29Z" level=info msg="Acquired reboot lock"
kured-pzs8b kured time="2020-05-05T18:35:29Z" level=info msg="Draining node XXX.compute.internal"
...
So if lock has been taken manually, enabling this won't take over as I though initially.
To answer your question if this is acceptable - turns out an ability to manually disable reboot is not gone despite enabling annotation TTL functionality, so all is ok.
Of course this might not fit in every environments, but ideally I think the better solution would be to annotate node objects instead ds which would not require any cleanup mechanism being implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - I had missed that if ttl > 0
is checking the ttl set by the locker (if any), not Kured's command-line setting.
@@ -79,6 +86,11 @@ func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) { | |||
if err := json.Unmarshal([]byte(valueString), &value); err != nil { | |||
return false, err | |||
} | |||
|
|||
if ttlExpired(value.Created, value.TTL) { | |||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would print a log message in this case, for when it does something unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not see any other log references inside this pkg, shall this be changed then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's valid; could add another return value and print at the higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think I could find place for err
inside ttlExpired
function as this is simple true/false check function.
If I'd return an err value here, it would be errored as fatal - https://github.com/weaveworks/kured/blob/f2a0f8e20dbe6b0cc4a5d03cbe06889333adee98/cmd/kured/main.go#L196-L205
Buuut, I might be blind...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'll fix it in post.
@bboreham I'd love to extend unit tests if there are any ;) Addressed your concerns, can you have 2nd look please? |
Yes there are some: |
Added tests for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM.
@@ -79,6 +86,11 @@ func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) { | |||
if err := json.Unmarshal([]byte(valueString), &value); err != nil { | |||
return false, err | |||
} | |||
|
|||
if ttlExpired(value.Created, value.TTL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - I had missed that if ttl > 0
is checking the ttl set by the locker (if any), not Kured's command-line setting.
@@ -79,6 +86,11 @@ func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) { | |||
if err := json.Unmarshal([]byte(valueString), &value); err != nil { | |||
return false, err | |||
} | |||
|
|||
if ttlExpired(value.Created, value.TTL) { | |||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'll fix it in post.
I have observed this behavior when running kured together with cluster autoscaler, reason why this idea was born.
This PR introduces
--annotationTTL
parameter, by default set to 0 (means do not use this feature), which allows other kured pods to take off the lock if it has expired.Comparing kured to coreos-update-operator latter one is adding lock annotation onto node object which makes such feature as lockTTL completely not required.