Tensorboard cannot load more than two event files in logdir #9512
Comments
@Cooper-Yang: Can you clarify? The last sentence makes it sound like the problem is fixed in nightly builds.
@girving Sure. I chose a nightly build because I ran into #7500 (OpKernel "bla bla bla" for unknown op: bla bla bla) while using TensorFlow, and using a nightly build was the suggested fix, so I did. Then I hit this problem while using TensorBoard. Update: after some experimenting, this problem exists in every version I tried, including the current release 1.1.0 and the nightly builds 1.1.0-rc1 and 1.1.0-rc2.
@dandelionmane Any ideas about this TensorBoard+Windows problem?
Update: after a fresh install of Ubuntu 16.04 and TensorFlow, it turns out that TensorBoard on Ubuntu can load the same logdir without any problem. The console output shows the following:
This is a known issue: TensorBoard doesn't like it when you write multiple event files from separate runs in the same directory. It will be fixed if you use a new subdirectory for every run (new hyperparameters = new subdirectory).
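The per-run-subdirectory fix above can be sketched like this (a minimal sketch using only the standard library; `make_run_dir` and the `tf_logs` path are illustrative names, not part of TensorFlow):

```python
import os
import time

def make_run_dir(base_dir, tag=""):
    """Create a fresh subdirectory for a single run.

    TensorBoard treats each subdirectory of --logdir as a separate run,
    so giving every run its own folder avoids writing multiple event
    files into one directory.
    """
    run_name = time.strftime("%Y%m%d-%H%M%S")
    if tag:
        run_name += "-" + tag
    run_dir = os.path.join(base_dir, run_name)
    os.makedirs(run_dir)  # fails loudly if the run name collides
    return run_dir

# Each (re)start of training would then get its own directory, e.g.:
#   writer = tf.summary.FileWriter(make_run_dir("tf_logs", "lr0.01"))
```

Encoding the hyperparameters in the `tag` (as in the `lr0.01` example) also makes the runs easy to tell apart in TensorBoard's run selector.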
@dandelionmane any plans to fix this? With this issue, one must maintain a single writer per run, which isn't possible when, for example, using Keras's …
This issue comes up with estimators as well, which write to an … I think this should be reopened :)
If possible, it would also be good to know where this happened (run directory, etc.) and exactly what data triggered this warning.
The sample project below can be used to reproduce the warnings. It's an implementation of the model in Getting Started with TensorFlow and uses https://github.com/guildai/examples/tree/master/iris Steps:
I'm also getting this issue. Any news?
I see the same problem. Why was this closed?
I have the same problem with tf.estimator.DNNRegressor
I'm encountering the same problem when employing the TensorBoard callback for training a Keras model. Is there a workaround for this issue that doesn't involve creating a separate subdirectory for logging event files generated by each run?
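As far as the thread goes, the subdirectory approach is the known workaround, and it can be wired into the Keras callback with very little code. A hedged sketch (`fresh_keras_log_dir` is a hypothetical helper; the commented lines assume `tf.keras` is installed):

```python
import os
from datetime import datetime

def fresh_keras_log_dir(base="logs/fit"):
    """Return a new, timestamped log directory for one training run,
    e.g. logs/fit/20240101-120000 -- one subdirectory per run."""
    return os.path.join(base, datetime.now().strftime("%Y%m%d-%H%M%S"))

log_dir = fresh_keras_log_dir()
# Pointing the callback at the fresh directory keeps each run's event
# file separate (assumes tf.keras is available):
#   tb_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
#   model.fit(x, y, callbacks=[tb_cb])
```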
This prevents a TensorBoard bug, see: tensorflow/tensorflow#9512
Hi, can someone please elaborate on how to do this? Thanks
I am also having this problem when using tf.estimator.Estimator with a tf.estimator.RunConfig that saves checkpoints. There was an issue that was already closed without a solution: #17272
Same question
When using tf.estimator.train_and_evaluate(...) with a tf.estimator.Estimator, I have the same problem. Any ideas?
Too bad ~~
Two possible reasons trigger that:
Using torch version 1.3.0.dev20190924.
I had the same problem using tensorboardX for PyTorch. I noticed that the code was writing two logs when it started: I was instantiating the summary writer in a main.py and writing from a train.py. Instantiating the writer in train.py solved it.
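The underlying pattern is "exactly one writer per run". A minimal stand-alone sketch of it (`EventFileWriter` here is a plain stand-in for tensorboardX's `SummaryWriter`, recording events in a list instead of an event file):

```python
class EventFileWriter:
    """Stand-in for tensorboardX's SummaryWriter: one instance == one
    event file. Creating a second instance would mean a second file."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        self.events = []  # a real writer appends to an event file on disk

    def add_scalar(self, tag, value, step):
        self.events.append((tag, value, step))

def train(writer, epochs=3):
    # The training code receives the writer; it never creates its own.
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        writer.add_scalar("loss", loss, epoch)

def main():
    # The ONLY writer for this run; every module writes through it.
    writer = EventFileWriter("tf_logs/run-0")
    train(writer)
    return writer
```

Whether the single writer lives in main.py or train.py matters less than there being exactly one of it per run.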
As mentioned by @mohamed-ezz:
In order to properly log runs with custom callbacks, this needs to be addressed. I believe this should be reopened, or a new issue needs to be created.
Same issue... I'm very surprised that such a trivial problem has not been addressed yet. |
Edit: Hope that helps! |
For anyone looking for a naive solution: after every run, just remove the log folder or the log file itself, e.g. "rm -rf tf_logs". Each time, tf_logs is created as if it were the first run, and you remove it after reviewing. Since we have to launch TensorBoard from the terminal anyway, I think this holds for now.
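The same cleanup can live inside the training script, so stale event files are cleared before each run. A small sketch using only the standard library (`tf_logs` is just the folder name from the comment above; note this deletes all previous runs):

```python
import os
import shutil

def reset_log_dir(log_dir="tf_logs"):
    """Delete any stale event files so TensorBoard only ever sees the
    current run, then recreate the (now empty) log directory."""
    if os.path.isdir(log_dir):
        shutil.rmtree(log_dir)  # removes old runs -- irreversible!
    os.makedirs(log_dir)
    return log_dir

# Called once at startup, before creating any summary writer:
#   log_dir = reset_log_dir()
#   writer = tf.summary.FileWriter(log_dir)
```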
If someone as [insert deprecating word here] as me is here: this might be caused by adding
to the wrong loop, probably a batch loop inside the epoch loop.
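The loop-placement pitfall above can be illustrated without any framework (`FakeWriter` is a hypothetical stand-in that just counts calls; in real code it would be a tensorboardX-style writer's `add_scalar`):

```python
class FakeWriter:
    """Counts logging calls so the loop placement is easy to check."""

    def __init__(self):
        self.calls = 0

    def add_scalar(self, tag, value, step):
        self.calls += 1

def train(writer, epochs=2, batches=5):
    for epoch in range(epochs):
        epoch_loss = 0.0
        for batch in range(batches):
            epoch_loss += 0.1  # stand-in for a real training step
            # WRONG place: calling writer.add_scalar here would log once
            # per batch, flooding the run with extra event data.
        # RIGHT place: once per epoch, after the batch loop has finished.
        writer.add_scalar("loss", epoch_loss / batches, epoch)

w = FakeWriter()
train(w)  # logs one point per epoch, not one per batch
```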
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
Yes, custom network structure and data pre-processing for my own task and dataset, modified from the current single-GPU CIFAR-10 tutorial (which uses the monitored session).
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Windows 10 Pro 1703
TensorFlow installed from (source or binary):
Binary, installed locally using pip install .\xxx.whl in my miniconda environment; for the environment, pip freeze gives the following information.
TensorFlow version (use command below):
Nightly build #149 (GPU version), 1.1.0-rc2
CUDA/cuDNN version:
CUDA 8.0, cuDNN 5.1
GPU model and memory:
Quadro M1200, 4GB, WDDM mode
Describe the problem
When restarting training (due to some hyperparameter adjustment) for the third time, TensorBoard cannot load the new event file. It only loads the first two event files, and after that the scalars stop refreshing.
Powershell console gave the following output:
The 'current step' 17454 in the output is the first step of my second restart.
Information about event files:
1st: events.out.tfevents.1493274079
2nd: events.out.tfevents.1493310339
3rd: events.out.tfevents.1493352650
About this problem on Ubuntu:
I just switched to Windows several days ago; this problem did not exist on Ubuntu (at least 14.04). I was using the exact same script, but with TensorFlow version 1.0.1 (GPU, not a nightly build), installed following the official instructions.
Under Windows, it was because of #7500, which left me no choice but to install a nightly build.