I frequently get an error when starting many jobs at the same time, but sometimes even when starting only a small number (64 jobs):
The jobs won't start. I suspect the problem occurs when the jobs are fetched from the table. Even restarting the MySQL server does not help right away. Sometimes, after several dozen of these errors, it eventually grabs the experiments and starts running.
There is no stack trace, and I currently have no idea why exactly this happens. Still, it is a rather severe problem, because valuable cluster compute time is burnt just waiting for the jobs to start.
I would suggest rethinking how the experiments are fetched. You might want to check out the Java version in AILibs, where we use a combined SELECT/INSERT statement to reserve a job immediately when it is fetched (in the same query, without the need for a transaction).
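As a rough illustration of the "reserve the job while fetching it" idea, here is a minimal Python sketch that claims a job with one atomic UPDATE instead of a separate SELECT followed by a write. It is neither the AILibs implementation (which uses a combined SELECT/INSERT) nor PyExperimenter's code: the `jobs` table, its columns, and `claim_next_job` are hypothetical. It uses the standard-library `sqlite3` so it runs stand-alone. On MySQL, which rejects a subquery on the table being updated, the equivalent statement would be `UPDATE jobs SET ... WHERE status = 'created' ORDER BY id LIMIT 1`.

```python
import sqlite3
import uuid
from typing import Optional, Tuple

# Hypothetical schema for illustration only; the real experiment table differs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id          INTEGER PRIMARY KEY,
    params      TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'created',
    claim_token TEXT
)
"""

# One statement both picks the next free job and marks it as running.
# The extra "AND status = 'created'" guard ensures a row already taken
# by another worker is never overwritten.
CLAIM_SQL = """
UPDATE jobs
   SET status = 'running', claim_token = ?
 WHERE id = (SELECT id FROM jobs WHERE status = 'created' ORDER BY id LIMIT 1)
   AND status = 'created'
"""


def claim_next_job(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Atomically reserve one pending job and return (id, params), or None."""
    token = uuid.uuid4().hex  # unique per claim attempt, identifies our row
    with conn:  # commits the UPDATE, or rolls back on error
        cur = conn.execute(CLAIM_SQL, (token,))
    if cur.rowcount == 0:
        return None  # no pending job was left to claim
    # Look up the row we just reserved via our own token.
    row = conn.execute(
        "SELECT id, params FROM jobs WHERE claim_token = ?", (token,)
    ).fetchone()
    return (row[0], row[1])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO jobs (params) VALUES (?)", [("a",), ("b",)])
    print(claim_next_job(conn))  # (1, 'a')
    print(claim_next_job(conn))  # (2, 'b')
    print(claim_next_job(conn))  # None
```

Because selection and reservation happen in a single statement, two workers cannot both read the same "created" row and then race to start it. A worker that finds nothing to claim simply gets `None` and can stop or back off.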