Deadlock when fetching experiments simultaneously. #67

Closed
fmohr opened this issue Nov 6, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@fmohr
Collaborator

fmohr commented Nov 6, 2022

I frequently get the following error when starting many jobs at the same time, and sometimes even when starting only a small number (64 jobs):

1205 (HY000): Lock wait timeout exceeded; try restarting transaction
 raised when executing sql statement.

The jobs won't start. I suppose the problem occurs at the moment jobs are fetched from the table. Even restarting the MySQL server does not help immediately. Sometimes, after several dozen of these errors, a job eventually grabs its experiments and starts running.

There is no stack trace, and I currently have no clue why exactly this is happening. It is a rather severe problem, though, because valuable cluster compute time is burnt just waiting for the job to start.
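As a stopgap for the wasted waiting time, the failing statement could be retried with a randomized backoff so that simultaneously started jobs do not all hit the table again at the same moment. The following is only a sketch: `LockWaitTimeout` and `run_with_retry` are hypothetical names, and in practice the exception would have to be whatever the database driver raises for error 1205.

```python
import random
import time


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for the driver exception carrying error 1205."""


def run_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on LockWaitTimeout, wait and try again.

    The wait is drawn uniformly from [0, base_delay * 2**(attempt - 1)],
    so the upper bound doubles with each failed attempt and parallel jobs
    spread out instead of retrying in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except LockWaitTimeout:
            if attempt == max_attempts:
                raise  # give up and surface the original error
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

This only mitigates the symptom; it does not remove the lock contention itself.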

I suggest rethinking how the experiments are fetched. You might want to check out the Java version in AILibs, in which we use a combined SELECT/INSERT statement to reserve a job immediately when it is fetched (in the same query, without needing a transaction).
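To illustrate the general idea of reserving a job in a single statement, here is a minimal sketch. It is not the AILibs statement: it uses a conditional UPDATE rather than a SELECT/INSERT, and it runs against an in-memory SQLite database as a stand-in for MySQL so that it is self-contained. The table `experiments`, the columns `status` and `claim_token`, and both function names are assumptions made for this example.

```python
import sqlite3
import uuid


def claim_next_experiment(conn):
    """Reserve one open experiment in a single UPDATE; return its id or None."""
    token = uuid.uuid4().hex  # unique marker identifying this claim
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            # SQLite form. MySQL typically rejects a subquery on the table
            # being updated; there, UPDATE ... WHERE status = 'created'
            # ORDER BY id LIMIT 1 would play the same role.
            "UPDATE experiments SET status = 'running', claim_token = ? "
            "WHERE id = (SELECT id FROM experiments "
            "WHERE status = 'created' ORDER BY id LIMIT 1)",
            (token,),
        )
        if cur.rowcount == 0:
            return None  # nothing left to claim
    # Look up which row this worker marked with its token.
    row = conn.execute(
        "SELECT id FROM experiments WHERE claim_token = ?", (token,)
    ).fetchone()
    return row[0]


def make_demo_db():
    """Create an in-memory table holding three open experiments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiments ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, claim_token TEXT)"
    )
    conn.executemany(
        "INSERT INTO experiments (status) VALUES (?)", [("created",)] * 3
    )
    conn.commit()
    return conn
```

Because the row is selected and marked in the same statement, two workers cannot both reserve the same experiment, and no lock is held across separate SELECT and UPDATE round trips.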

@fmohr fmohr added the bug Something isn't working label Nov 6, 2022
@alexandertornede
Collaborator

Do you have a minimal code example to reproduce this?

Also: I suppose this is about the MySQL version?

@tornede
Owner

tornede commented Nov 21, 2022

This issue should be solved with PR #76

@tornede tornede closed this as completed Nov 21, 2022

3 participants