I'm creating an issue for this change in condu because it might change a large portion of the code.
The problem is:
When a worker process (a child process that polls tasks from the Conductor server) is killed by the OS, for example due to OOM, there is no way to mark the task that the dead process was handling as "FAILED".
The killed process becomes a zombie process, and the task type it was handling is never polled again by this worker.
The solution I propose is:
Each child process will be started from its own thread, which keeps a pipe open to the process it started.
Whenever the process receives a task from the Conductor server, it sends that task to the thread that started it.
If the child process is killed, for example due to OOM, the thread that created it can detect this, mark the task as FAILED, and start a new process to replace the one that was killed.
If you have any questions or suggestions, please comment.
This seems like a viable approach.
There are a few caveats that should be taken into consideration. It's preferable to use new processes instead of new threads because of the GIL. This builtin library should be able to handle the tasks you propose.
Hi @bioslikk, long time no see :p.
Since the threads will only be used to hold a Pipe to the process they create, I don't think context switching between threads adds much overhead.
PS: I'm aware of GIL.
Cheers.