Restore slurm tracker? #572

Open
thvasilo opened this issue Oct 7, 2019 · 0 comments
thvasilo commented Oct 7, 2019

Code for using the SLURM scheduler as a tracker for distributed training still exists, but it was removed as an option from submit.py some time ago.

Lately I've been training XGBoost on an MPI cluster. I haven't been able to get the MPI tracker to work, but reinstating the SLURM tracker seems to work after I made some changes to the command being called.

So would the community consider adding SLURM back as an option, or is it supposed to be superseded by the MPI tracker now? If so, has anyone gotten the MPI tracker to train XGBoost recently?
