[Bug 1627769] Fix slow leak of wedged workers #2837
Just re-running ci-admin/tc-admin won't actually call updateWorkerPool unless the worker pools have changed.
If I'm understanding the issue correctly, it's that config.lifecycle is omitted in most worker-pool definitions, and that leads to interpretLifecycle returning nulls.
In addition to updating the defaults in the schemas, could that function be modified to return default values in that case instead?
Yeah, my only worry about having defaults in two places is remembering to update both later if we ever change them, but that's a good point. I'll add some defaults there too.
I think we've also tried to avoid defaults in schemas because their behavior is kind of wonky -- as you've seen here. But I am just parroting jonas here, so who knows :)
I think we fixed a lot of the wonkiness with the patch we made to it forever ago. The defaults are nice for docs at least. Would you rather I just remove the schema default entirely here and just have the default in the code, or do both?
Both seems good -- the schema defaults are just to
Interestingly setting the default
@@ -21,7 +21,7 @@ properties:
   reregistrationTimeout:
     title: Checkin Timeout
     type: integer
-    default: 345600 # 3 days
+    default: 345600 # 4 days (note this is also set in interpretLifecycle; update both if changing)
👍
Oh wait, the default schema stuff is going to wreak havoc on tc-admin/ci-admin, isn't it? I should just do the code-only bits I guess, unless the admin tools understand this?
I'm not sure how tc-admin or ci-admin would be affected -- neither tool uses the schemas, that I know of. What are you seeing?
I haven't tried it yet, I'm just thinking this won't work. Since the default is applied to the input, the actual config that is stored changes, and the tc-admin diff will always detect a change until the local config is updated as well.
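To make the concern above concrete, here is a minimal sketch of the churn. It is not the actual tc-admin or worker-manager code: `SCHEMA_DEFAULTS`, `storeConfig`, `hasChanged`, and the `maxCapacity` field are hypothetical names used only for illustration, and it assumes the service fills in defaults before storing the config.

```javascript
const { isDeepStrictEqual } = require("node:util");

// Hypothetical: the defaults a schema validator would inject into the input.
const SCHEMA_DEFAULTS = { reregistrationTimeout: 345600 };

// Hypothetical stand-in for the service storing a config after defaults
// have been applied to it.
function storeConfig(input) {
  return {
    ...input,
    lifecycle: { ...SCHEMA_DEFAULTS, ...(input.lifecycle ?? {}) },
  };
}

// Hypothetical stand-in for the admin tool's diff: any structural
// difference between local and stored config counts as a change.
function hasChanged(local, stored) {
  return !isDeepStrictEqual(local, stored);
}

// Local definition that omits lifecycle, as most pools reportedly do.
const localConfig = { maxCapacity: 5 };
const storedConfig = storeConfig(localConfig);

// The stored config gained a lifecycle block, so the diff never settles.
const driftBefore = hasChanged(localConfig, storedConfig);

// Once the local config spells out the default, the diff is empty.
const updatedLocal = { maxCapacity: 5, lifecycle: { reregistrationTimeout: 345600 } };
const driftAfter = hasChanged(updatedLocal, storeConfig(updatedLocal));
```

Under these assumptions `driftBefore` is true and `driftAfter` is false, which is the "always detects a change until the local config is updated" behavior described in the comment.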
Oh, good thinking. We could work around that in those tools, if it's worthwhile for the better display in the schema.
Adding this to the worker-lifecycle management sprint, and we can decide on the nicest way to do that then. I'm leaning towards just having a default in the code at this point.
ok, just gave up on making it show up in schemas. the schema docs will still show the default though
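A code-side default along the lines discussed in this thread could look roughly like the sketch below. The signature and return shape are assumptions made for illustration and are not taken from the real `interpretLifecycle`; only the 345600 value and the `reregistrationTimeout` name come from the diff above.

```javascript
// 4 days in seconds (4 * 24 * 60 * 60), matching the schema default in the diff.
const DEFAULT_REREGISTRATION_TIMEOUT = 345600;

// Hypothetical shape: takes a worker-pool config and returns the effective
// lifecycle values, falling back to the default when lifecycle (or the
// field itself) is omitted or null instead of returning null.
function interpretLifecycle(config = {}) {
  const lifecycle = config.lifecycle ?? {};
  return {
    reregistrationTimeout:
      lifecycle.reregistrationTimeout ?? DEFAULT_REREGISTRATION_TIMEOUT,
  };
}

// A pool with no lifecycle block gets the default...
const fromEmpty = interpretLifecycle({});
// ...while an explicit value is kept as-is.
const fromExplicit = interpretLifecycle({ lifecycle: { reregistrationTimeout: 3600 } });
```

Keeping the fallback in code means the stored config is never rewritten, so the admin tools' diff is unaffected; the cost is that the value has to be kept in sync with the number shown in the schema docs.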
Nice!
https://bugzilla.mozilla.org/show_bug.cgi?id=1639365 filed for the intermittency
Previous to this, the default lifecycles weren't getting applied. This was fine in the 99% case where the worker successfully registered, as at that time the `terminateAfter` date gets set to the `expires` of the creds returned.

However, for workers that did not register, the `reregisterTimeout` was null and these workers were allowed to live forever. To apply this to all currently existing pools, we should run `ci-admin apply`/`tc-admin apply`.

We had to add a default to all places where the lifecycle `$ref` was used. However, Ajv does not allow the default to be specified outside of `properties` and `items`, so we just do it where it is "imported".

This also changes our reference validation because in Ajv, `$ref` doesn't actually have to be alone. It can have siblings that it will merge with.

Bugzilla Bug: 1627769