raise Exception("The wandb backend process has shutdown") #4651

Closed
kanishk-adapt opened this issue Dec 16, 2022 · 3 comments
Labels
a:app Area: Frontend/Backend

Comments

@kanishk-adapt

Current Behavior

My script failed with the following error:

      File "bert_unlab_mlm.py", line 300, in <module>
        model, stat = train_eval(model, train_loader, val_loader, device)
      File "bert_unlab_mlm.py", line 175, in train_eval
        wandb.log({'train_batch_loss':loss.item()})
      File "/home/kverma/anaconda3/envs/acl/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1422, in log
        self._log(data=data, step=step, commit=commit)
      File "/home/kverma/anaconda3/envs/acl/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1232, in _log
        self._partial_history_callback(data, self.history_step, commit=True)
      File "/home/kverma/anaconda3/envs/acl/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 1083, in _partial_history_callback
        self._backend.interface.publish_partial_history(
      File "/home/kverma/anaconda3/envs/acl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 517, in publish_partial_history
        self._publish_partial_history(partial_history)
      File "/home/kverma/anaconda3/envs/acl/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 61, in _publish_partial_history
        self._publish(rec)
      File "/home/kverma/anaconda3/envs/acl/lib/python3.8/site-packages/wandb/sdk/interface/interface_queue.py", line 49, in _publish
        raise Exception("The wandb backend process has shutdown")
    Exception: The wandb backend process has shutdown

Can you please help with this?

Expected Behavior

No response

Steps To Reproduce

No response

Screenshots

No response

Environment

OS: Linux

Version: wandb==0.12.11; Python 3.8.12

Additional Context

No response

@kanishk-adapt kanishk-adapt added the a:app Area: Frontend/Backend label Dec 16, 2022
@thanos-wandb
Contributor

Hi @kanishk-adapt, thank you for reporting this. Can you please provide a bit more information about the training environment and infrastructure where this was running? I've noticed you're using an older wandb SDK version. Would it be possible to upgrade to the most recent one (0.13.7)? Also, could you provide us with the debug.log and debug-internal.log files of the run that failed? You can find them under /wandb/run-<date-time>-<run-id>/logs, relative to the directory where you executed the script. If you don't want to share them publicly, please send these files to support@wandb.com and mention this GitHub issue, and we can take it from there.
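For anyone collecting the same files, here is a minimal sketch of how the log locations described above could be listed. It assumes the run directories sit in a `wandb` folder under the directory the script was launched from, as the comment says; the helper name `find_wandb_debug_logs` is hypothetical and not part of the wandb SDK.

```python
from pathlib import Path


def find_wandb_debug_logs(base_dir="."):
    """List debug*.log files under <base_dir>/wandb/run-*/logs.

    Hypothetical helper: the directory layout is taken from the comment
    above, not from the wandb API.
    """
    return sorted(Path(base_dir).glob("wandb/run-*/logs/debug*.log"))


# Print every debug log found relative to the current working directory.
for log_path in find_wandb_debug_logs():
    print(log_path)
```

Pass the directory you launched training from as `base_dir` if you run this from somewhere else.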

@thanos-wandb
Contributor

Hi @kanishk-adapt, I wanted to follow up on this issue. Does it still occur for you? If so, would it be possible to provide the information requested above so that we can investigate the root cause? Thanks!

@kanishk-adapt
Author

Hi @thanos-wandb, thank you for your response.
I found a similar issue here: #3223
The resolution suggested there, adding the environment variable WANDB_START_METHOD="thread", seems to have resolved the issue.
I have also upgraded wandb to 0.13.7, and it seems to be working well.
Thanks again,
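
For readers hitting the same error, this is a minimal sketch of one way to apply the workaround above from inside a Python script. It assumes the variable must be set before wandb is imported and initialized; the helper name `use_thread_start_method` and the project name in the comment are hypothetical. Exporting the variable in the shell before launching the script should have the same effect.

```python
import os


def use_thread_start_method(env=os.environ):
    """Set WANDB_START_METHOD=thread unless a value is already present."""
    env.setdefault("WANDB_START_METHOD", "thread")
    return env["WANDB_START_METHOD"]


# Call this early in the script, before wandb is imported or initialized.
use_thread_start_method()

# import wandb
# wandb.init(project="my-project")  # hypothetical project name
```

`setdefault` is used so that a value already exported in the shell is not overwritten.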
