wandb hangs experiment (10 min+): Internal Server Error for url: https://api.wandb.ai/graphql #1016
Comments
@hanspinckaers our retry logic should do a rolling backoff and resume normal operation in the case of outages. None of that logic should have changed between 0.8.33 and 0.8.35. We're currently looking into a brief nightly outage caused by automatic database maintenance. We'll confirm the retry logic is working appropriately and are looking for solutions to the DB maintenance issue. Are you running this from a regular Python process or within a Jupyter shell?
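For readers unfamiliar with the term, the "rolling backoff" mentioned above can be sketched roughly as follows. This is a generic illustration of exponential backoff with jitter, not wandb's actual retry code; the helper name `retry_with_backoff` and all of its parameters are hypothetical.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying failures with exponentially growing, jittered delays.

    Hypothetical helper for illustration only; not part of the wandb API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the error instead of waiting forever
            # The delay ceiling doubles each attempt (base, 2*base, 4*base, ...)
            # and is capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": sleep a random amount up to the ceiling so that
            # many clients do not retry in lockstep after an outage.
            sleep(random.uniform(0, cap))
```

Bounding the number of attempts is a design choice: a caller that retries indefinitely during a long outage looks, from the outside, exactly like a hung process.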
It seems more likely that it is the outage then; the 0.8.33 version 'working' could just be a coincidence. This is running in a regular Python process (PyTorch multiprocessing though). We had some hiccups with our storage system as well, so that could have played a role too. However, in some cases this exception was the last thing my Python process logged before hanging or crashing.
I have never seen this again, closing this issue now.
@richardrl we had an outage last night that caused these errors. Everything should be functioning properly now.
$ wandb --version && python --version && uname
wandb, version 0.10.30
Python 3.6.13 :: Anaconda, Inc.
Linux
I still have this issue.
@emanuelevivoli is the error you're talking about related to the sweep issue you filed here? Can you share the specific steps that cause your process to hang?
Hi @vanpelt,
Hi, I'm experiencing this issue in a Google Colab environment. To reproduce:
# bash
wandb login --cloud "API_KEY"
then
# python
api = wandb.Api()
runs = api.runs(f'{entity}/{project_name}')
runs[0]
output:
Hi, I'm experiencing the same issue as @sadra-barikbin.
wandb --version && python --version && uname
Description
For a few days, I noticed experiments hanging on wandb logging. Sometimes I even saw crashes.
So far, downgrading to 0.8.33 seems to help. Will report if the problem arises again.
What I Did