Add support for deleting scheduled tasks from the daemon #1417

brdji · 2022-08-03T09:17:04Z

Description

The command to remove (terminate) tasks (ie. testground terminate) does not remove scheduled jobs. If the daemon is overloaded, or we wish to remove scheduled tasks, we have to remove them from the queue (ie. Redis) manually.

What defines this endeavour to be complete?

Support terminating any task by state, other than the ones in progress (that would be too complicated, and we have a timeout for that).

Support terminating tasks by state (ie. testground terminate --state complete)
Support terminating scheduled tasks

The text was updated successfully, but these errors were encountered:

laurentsenta · 2022-08-04T09:22:41Z

Nice, thanks for creating this,
We might have the feature already in the dashboard (http://localhost:8042 by default): each task has "kill/delete" buttons.

If we decide to implement a CLI command, a few notes:

(1) What would happen if a user ran testground terminate --state complete?
I understand it as "terminate the tasks that are already complete" which looks like a no-op.

(2) Having a command specialized to terminate tasks by state sounds quite specialized,
could we provide testground kill --task ID instead?

It's more cohesive with the rest of the API (testground status --task ID)
It's more versatile since you may combine commands (the unix way)
- For example, with docker, you may write docker stop $(docker container ls -q --filter name='testground*')
- Similarly, we could combine the tasks and kill commands.

dektech · 2022-08-08T10:30:16Z

I completely agree that termination per state is quite specialized, and not really necessary when I think about it.
Termination per task ID would give us the possibility to clear stuck/scheduled tasks without disrupting anything else.

Regarding the testground terminate --state complete, yeah it looks like a no-op, but do you think it would be beneficial if we could clear all completed tasks from the list after 3/5/7 days or keep only the last 50/70/100 tasks?

laurentsenta · 2022-08-08T13:10:05Z

Regarding clearing history, is there any reason to focus on testground terminate?
I might lack some context.

If we want to remove old tasks' data from the logs,
I would stick to something similar to my previous comment:

Improve testground tasks to list the tasks we need. Something like testground tasks --status=completed --completed-after=$DATE,
Expose the testground delete operation we've seen in the dashboard into the CLI.

It seems like the effort would be the same, with a clearer UX and more opportunities for reuse.

Regarding benefits, is that a new feature or is this related to 3. Clean up the state after running tests, so performance is not affected if a test crashes or takes too longin Devgrant?

dektech · 2022-08-08T14:23:57Z

The new feature is partially related to what you just listed - not to clutter the system and affect other tests and overall performance.

But a more solid example of why would we need is what we are encountering while testing ping-pong (as described in #1416) - one pod goes into completed state, the other stays running indefinitely until removed from the cluster.
Even if we delete the pods, the daemon will still show the task as running, until the daemon deployment is deleted and then redeployed.
So the new feature would allow us to cancel/stop/terminate tasks without the need to restart the daemon pods.

laurentsenta · 2022-08-08T14:34:04Z

Got it, thanks for sharing, I don't think we need a new feature here, we need to fix bugs:

There is a task timeout for every test run in testground, when a task reaches this timeout we should kill it and update the state. Look for task_timeout_min in the codebase.
There is a kill operation in the dashboard, that should force kill the instances and update the state, if that is not working as intended we should fix it.
That might be a symptom of a concurrency error like in fix: testground instability due to race condition on outcomes #1407 or a software design flaw (should we hook into some sort of k8s status update stream?)

Could you open issues for the first two?

laurentsenta mentioned this issue Aug 8, 2022

testground terminate: options to indicate what to destroy #611

Open

This was referenced Aug 29, 2022

Delete tasks that have timeouted #1432

Open

Verify kill command is working as intended and write a help topic #1433

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for deleting scheduled tasks from the daemon #1417

Add support for deleting scheduled tasks from the daemon #1417

brdji commented Aug 3, 2022

laurentsenta commented Aug 4, 2022

dektech commented Aug 8, 2022

laurentsenta commented Aug 8, 2022

dektech commented Aug 8, 2022

laurentsenta commented Aug 8, 2022 •

edited

Add support for deleting scheduled tasks from the daemon #1417

Add support for deleting scheduled tasks from the daemon #1417

Comments

brdji commented Aug 3, 2022

Description

What defines this endeavour to be complete?

laurentsenta commented Aug 4, 2022

dektech commented Aug 8, 2022

laurentsenta commented Aug 8, 2022

dektech commented Aug 8, 2022

laurentsenta commented Aug 8, 2022 • edited

laurentsenta commented Aug 8, 2022 •

edited