
Handling Failing Conditions

vzakaznikov edited this page Jan 24, 2024 · 1 revision

The program is designed to handle the following failing conditions:

The Server Never Registers a Runner: The server will remain in a running state and should be reclaimed by the scale-down service when it checks the actual runners registered for the current servers. If it finds a server that is running but has no active runner, the server will be deleted after the max-runner-registration-time period.
The ./config.sh Command Fails: The behavior will be the same as for the Server Never Registers a Runner case above.
The ./run.sh Command Fails: The server will be powered off by the startup script and deleted by the scale-down service.
Creating a Server for a Queued Job Fails: If creation of the server fails for any reason, the scale-up service will retry the operation in the next interval, because the job's status will remain queued.
Runner Never Gets a Job Assigned: If the runner never gets a job assigned, then the scale-down service will remove the runner and delete its server after the max-unused-runner-time period.
Runner Created With Mismatched Labels: The behavior will be the same as in the Runner Never Gets a Job Assigned case above.
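
The scale-down cases above can be summarized as a single decision per server. The following is only a minimal sketch of that logic, not the program's actual implementation: the `ServerState` class, its fields, the `scale_down_action` function, and the returned action strings are all hypothetical names chosen for illustration. Only the two periods, max-runner-registration-time and max-unused-runner-time, come from the descriptions above. The sketch also assumes that a powered-off server is deleted on the next scale-down check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerState:
    """Hypothetical snapshot of one server as seen by a scale-down check."""
    name: str
    status: str                       # assumed values: "running" or "off"
    age_seconds: float                # time since the server was created
    runner_registered: bool           # True if a runner exists for this server
    runner_busy: bool = False         # True if the runner is executing a job
    runner_idle_seconds: float = 0.0  # time the runner has gone without a job


def scale_down_action(
    server: ServerState,
    max_runner_registration_time: float,
    max_unused_runner_time: float,
) -> Optional[str]:
    """Return the cleanup action for a server, or None to leave it alone."""
    # ./run.sh failed: the startup script powered the server off.
    # Assumption: such a server is deleted on the next check.
    if server.status == "off":
        return "delete-server"

    # No runner ever registered (this also covers a failed ./config.sh).
    if not server.runner_registered:
        if server.age_seconds > max_runner_registration_time:
            return "delete-server"
        return None  # still within the registration grace period

    # Runner registered but never assigned a job
    # (this also covers a runner created with mismatched labels).
    if not server.runner_busy and server.runner_idle_seconds > max_unused_runner_time:
        return "remove-runner-and-delete-server"

    return None
```

For example, with both periods set to 120 seconds, a running server that is 300 seconds old and has no registered runner would yield `"delete-server"`, while the same server at 60 seconds old would be left alone. The failed server creation case is not covered here because it is handled by the scale-up service, which simply sees the job as still queued in its next interval.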