Metric present but marked as <null> (Parallel Coordinates) #1179
Comments
Hey there, trying to figure this out. Could you share a link to the project? If you want, you can send it to my email: artyom@wandb.com
Link sent!
Honestly, so far I haven't been able to recreate this issue. Could you share some code to reproduce it?
My code is pretty huge (with a lot of external references), so I will try to write a smaller version as soon as I can.
That would be great!
Hey @pierreelliott, do the names of your metrics contain special characters like `/` or `.`?
Hi @tyomhak, there are some (which should come from TensorBoard). I've finally written a smaller version that you can find in this gist; however, I can't reproduce my problem with it. For your information, this isn't a problem for me anymore, as I've copied all the needed metrics to each run's summary (the important metrics were only composed of one value).
Thank you very much!
Hi @tyomhak, I have retried my script a few times and there are some things that might be of interest:
This issue is stale because it has been open 60 days with no activity. |
I am having a very similar issue on wandb 0.11.0, Python 3.8, on Compute Canada, doing an Optuna hyperparameter optimization run (using offline mode and syncing later). When I sync, some values are drawn correctly but do not appear in the table. So if I want to sort by reward achieved, I can't, even though the reward is actually being registered and plotted correctly. The weird thing is that this only happens on some runs, with no discernible pattern. The value is clearly being uploaded to the server, so on the surface this seems to be a client issue.
Hey @jefft255, can you share a link to one of the runs this is impacting?
https://wandb.ai/jft/olqg_stationkeeping_td3_belugasweep?workspace=user-jft It's private, but I imagine you can still take a look as a developer? Otherwise, how do I grant you access? Take a look at `final_avg_reward`. I tried using `wandb.summary` directly, to no avail. Even then, the expected behaviour is that the last logged metric is saved and you can sort by it in the table view.
@jefft255 this definitely looks like a regression. Can you find one of the local run directories for the runs that aren't reporting these metrics, zip it, then send it to vanpelt@wandb.com? It should be a subdirectory named run-DATE-ID of the wandb directory relative to the script you ran.
Zip sent!
Hey @jefft255, I didn't get an email. Did you send it to vanpelt@wandb.com?
Resent it, otherwise here's a OneDrive link: https://1drv.ms/u/s!Am7JVxHPejSNg-0VswGfUYHEVy4OEw?e=Z7lsF1
Hey @jefft255 we did some more digging. We still haven't found the root cause, but this issue only occurs with offline runs when an edge case is hit. Until we find the root cause, you can manually fix the runs missing summary metrics with the following script:
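The fix script itself did not survive in this copy of the thread. As a rough reconstruction of the idea only (the project path and helper name below are placeholders, not the original code), one could rebuild a run's summary as the last non-null value logged for each history key and push it back through the public wandb API:

```python
import math

def summary_from_history(rows):
    """Collapse history rows into a summary: last non-null value per key.

    Illustrative helper, not part of wandb; `rows` is a list of dicts as
    returned by run.history(pandas=False).
    """
    summary = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                continue
            if isinstance(value, float) and math.isnan(value):
                continue
            summary[key] = value
    return summary

# Pushing it to the server would look roughly like (untested sketch):
#   import wandb
#   api = wandb.Api()
#   for run in api.runs("my-entity/my-project"):  # placeholder path
#       run.summary.update(summary_from_history(run.history(pandas=False)))

rows = [{"loss": 0.9}, {"loss": 0.5, "reward": 3.2}, {"loss": 0.4, "reward": None}]
print(summary_from_history(rows))  # {'loss': 0.4, 'reward': 3.2}
```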
Your script works, and is a good enough fix for me for now. Thank you for your help!
TL;DR: Make sure you didn't disable wandb when running sweeps. I had the same problem. When I ran my code without the sweep, everything worked. While coding and testing, I had wandb disabled by default.
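For anyone hitting the same thing, here is a small sketch (the helper name is mine, not from this thread) of checking whether the environment would silently disable wandb logging before a sweep starts:

```python
import os

def wandb_disabled(env=None):
    """True if WANDB_MODE would disable logging entirely.

    Illustrative helper, not part of the wandb API: WANDB_MODE=disabled
    turns wandb.init() into a no-op, so sweep runs log nothing.
    """
    env = os.environ if env is None else env
    return env.get("WANDB_MODE", "").lower() == "disabled"

print(wandb_disabled({"WANDB_MODE": "disabled"}))  # True
print(wandb_disabled({"WANDB_MODE": "online"}))    # False
```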
Had the same issue today. I used a Colab notebook and ran some tests. Unfortunately I had an extra
I had the same null issue, but the above script returns the following error:
Do you have any clue why this is happening?
Looks related to histograms in your summary. What version of wandb are you using? You might want to just skip over histogram keys with:
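The snippet was also lost from this copy of the thread. A plausible sketch, assuming wandb's JSON encoding where a logged histogram is serialized as a dict tagged with `_type: "histogram"` (check your version's output if the layout differs):

```python
def drop_histograms(metrics):
    """Return a copy of `metrics` without histogram-typed values.

    Illustrative helper: filters out dicts that wandb tags with
    _type == "histogram" so only scalar-ish keys reach the summary.
    """
    return {
        k: v
        for k, v in metrics.items()
        if not (isinstance(v, dict) and v.get("_type") == "histogram")
    }

row = {
    "loss": 0.1,
    "weights": {"_type": "histogram", "values": [1, 2], "bins": [0, 1, 2]},
}
print(drop_histograms(row))  # {'loss': 0.1}
```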
We are closing this due to inactivity; please comment to reopen.
Hey! I ran into a similar issue. I checked the Any help would be much appreciated. Thanks!
Want to note that I'm getting a similar issue as well. I can look into it more in the next few days and post more details, but the short version is that I have a number of runs whose summary metrics are filled while training progresses, but once the trial has finished, the summary metrics disappear, similar to the images above. I'm using the
I had the same problem, and I figured that if I remove the line
Running into a similar issue where `val_loss` in the sweep configuration is not recognized. I see `val_loss` reported in each run, but not in the parallel coordinates panel for the sweep (it just shows up as null).
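For context, the parallel coordinates panel resolves the sweep's optimization metric by the exact summary key, so the `metric.name` in the sweep config has to match the key passed to `wandb.log` character for character. A generic sweep config fragment (illustrative, not the commenter's actual file) looks like:

```yaml
method: bayes
metric:
  name: val_loss   # must match the logged key exactly, including any prefix
  goal: minimize
parameters:
  learning_rate:
    values: [0.001, 0.01]
```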
wandb --version && python --version && uname

Description

Even though I have values for my `accuracy` metric (the graph is not empty), the Parallel Coordinates graph doesn't show the final value (all runs are marked as `<null>`).

What I Did

I guess I know where the problem might be, because my logs are a little bit strange. I am building my model in two stages:

1. The first stage (the autoencoder) logs the `loss` and `val_loss` metrics to Wandb, plus an additional `accuracy` whose value is 0 (needed by the `keras-tuner` library to perform the hyperparameter search), at the end of each epoch.
2. The second stage produces the final `accuracy` value (and a few other metrics, such as FalsePositive, ...) that I log to Wandb.

This double logging makes my last `accuracy` value land one step after many of the other metrics (i.e., if `loss` and `val_loss` were logged during the first 156 steps, the last `accuracy` value will be on the 157th step).

For information, during the autoencoder's training the `accuracy` value is correctly shown as 0 and gets wiped out afterwards.
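The behaviour the report leans on is that the run summary keeps the last value logged for each key, even when one metric trails the others by a step. A toy model of that bookkeeping (plain Python for illustration, not wandb's actual implementation):

```python
class FakeRun:
    """Minimal stand-in for a run: summary holds the last logged value per key."""

    def __init__(self):
        self._next_step = 0
        self.summary = {}
        self.history = []

    def log(self, metrics, step=None):
        # Auto-increment the step when none is given, like wandb.log.
        if step is None:
            step = self._next_step
        self._next_step = step + 1
        self.history.append((step, dict(metrics)))
        self.summary.update(metrics)

run = FakeRun()
for s in range(156):                              # steps 0..155
    run.log({"loss": 1.0 / (s + 1), "accuracy": 0.0})
run.log({"accuracy": 0.93})                       # lands on step 156, one past loss

print(run.summary["accuracy"])   # 0.93 -- the trailing value still wins
print(run.history[-1][0])        # 156
```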