
Tensorboard cannot load more than two event file in logdir #9512

Closed
Misairu-G opened this issue Apr 28, 2017 · 27 comments
Labels
comp:tensorboard Tensorboard related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug

Comments

@Misairu-G

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes. A custom network structure and data pre-processing for my own task and dataset, modified from the current single-GPU CIFAR-10 tutorial (which uses the monitored session).

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Windows 10 Pro 1703

  • TensorFlow installed from (source or binary):
    Binary, installed locally with pip install .\xxx.whl in my miniconda environment. For that environment, pip freeze gives the following:

appdirs==1.4.3
bleach==1.5.0
cycler==0.10.0
html5lib==0.9999999
Markdown==2.2.0
matplotlib==2.0.0
numpy==1.12.1
olefile==0.44
packaging==16.8
Pillow==4.1.0
protobuf==3.2.0
pyparsing==2.2.0
python-dateutil==2.6.0
pytz==2017.2
six==1.10.0
tensorflow-gpu==1.1.0rc2
Werkzeug==0.12.1
  • TensorFlow version (use command below):
    Nightly build #149 (GPU Version), 1.1.0-rc2

  • CUDA/cuDNN version:
    CUDA 8.0, cuDNN 5.1

  • GPU model and memory:
    Quadro M1200, 4GB, WDDM mode

Describe the problem

When I restart training for the third time (due to some hyperparameter adjustments), TensorBoard cannot load the new event file. It only loads the first two event files, and after that the scalars stop refreshing.

The PowerShell console gave the following output:

[tensor] PS D:\Workspace\ConsorFlow> tensorboard.exe --logdir '../input_data/lpr_train_exp_01'
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
Starting TensorBoard b'52' at http://DESKTOP-P7T44AT:6006
(Press CTRL+C to quit)
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
ERROR:tensorflow:Unable to get size of D:\Workspace\input_data\lpr_train_exp_01\events.out.tfevents.1493274079.DESKTOP-P7T44AT: D:\Workspace\input_data\lpr_train_exp_01\events.out.tfevents.1493274079.DESKTOP-P7T44AT
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Detected out of order event.step likely caused by a TensorFlow restart. Purging expired events from Tensorboard display between the previous step: -1 (timestamp: -1) and current step: 17454 (timestamp: 1493310366.6493406). Removing 174 scalars, 76 histograms, 76 compressed histograms, 451 images, and 0 audio.

The 'current step' 17454 in the output is the first step in my second restart.

Information about event files:
1st: events.out.tfevents.1493274079
2nd: events.out.tfevents.1493310339
3rd: events.out.tfevents.1493352650

About this problem on Ubuntu:
I switched to Windows only a few days ago; this problem did not exist on Ubuntu (at least 14.04). I was using the exact same script, but with TensorFlow 1.0.1 (GPU, not a nightly version), installed following the official instructions.

On Windows I ran into #7500, which left me no choice but to install a nightly build.

@girving
Contributor

girving commented Apr 28, 2017

@Cooper-Yang: Can you clarify? The last sentence makes it sound like the problem is fixed in nightly builds.

@girving girving added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Apr 28, 2017
@Misairu-G
Author

@girving Sure. I chose a nightly build because I encountered #7500 (OpKernel "bla bla bla" for unknown op: bla bla bla) while using TensorFlow, and a nightly build was suggested as the fix. So I did.

And then, I met this problem while using Tensorboard.

Update: After some experimenting, this problem exists in every version I tried, including the current release 1.1.0 and the nightly builds 1.1.0-rc1 and 1.1.0-rc2.

@girving
Contributor

girving commented Apr 28, 2017

@dandelionmane Any ideas about this TensorBoard+Windows problem?

@girving girving added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Apr 28, 2017
@Misairu-G
Author

Update: after a fresh install of Ubuntu 16.04 and TensorFlow, it turns out that TensorBoard on Ubuntu can load the same logdir without any problem. The console output shows the following:

(tensor) coopery@WorkStation-CY:/media/coopery/X1/Workspace/input_data$ tensorboard --host 127.0.0.1 --logdir ./lpr_train_exp_01
Starting TensorBoard b'47' at http://127.0.0.1:6006
(Press CTRL+C to quit)
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:path ../external/data/plugin/text/runs not found, sending 404
WARNING:tensorflow:path ../external/data/plugin/text/runs not found, sending 404
WARNING:tensorflow:path ../external/data/plugin/text/runs not found, sending 404
WARNING:tensorflow:path ../external/data/plugin/text/runs not found, sending 404
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Detected out of order event.step likely caused by a TensorFlow restart. Purging expired events from Tensorboard display between the previous step: -1 (timestamp: -1) and current step: 17454 (timestamp: 1493310366.6493406). Removing 174 scalars, 76 histograms, 76 compressed histograms, 451 images, and 0 audio.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Detected out of order event.step likely caused by a TensorFlow restart. Purging expired events from Tensorboard display between the previous step: -1 (timestamp: -1) and current step: 64632 (timestamp: 1493381130.1441092). Removing 116 scalars, 76 histograms, 0 compressed histograms, 451 images, and 0 audio.

@teamdandelion teamdandelion added the comp:tensorboard Tensorboard related issues label Jun 16, 2017
@teamdandelion
Contributor

This is a known issue: TensorBoard doesn't like it when you write multiple event files from separate runs into the same directory. It goes away if you use a new subdirectory for every run (new hyperparameters = new subdirectory).
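A minimal sketch of that workaround (the helper name new_run_dir is illustrative, not part of any TensorBoard API): give every training run its own timestamped subdirectory under the log root, and point the writer or callback there.

```python
import os
from datetime import datetime

def new_run_dir(log_root):
    """Create and return a unique, timestamped subdirectory for one run."""
    stamp = datetime.now().strftime("run-%Y%m%d-%H%M%S-%f")
    run_dir = os.path.join(log_root, stamp)
    n = 0
    while os.path.exists(run_dir):  # guard against coarse clock resolution
        n += 1
        run_dir = os.path.join(log_root, f"{stamp}-{n}")
    os.makedirs(run_dir)
    return run_dir

# Each (re)start of training gets its own directory, e.g. (TF 1.x):
#   writer = tf.summary.FileWriter(new_run_dir("logs"))
# and `tensorboard --logdir logs` then shows each run separately.
```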

@mohamed-ezz

mohamed-ezz commented Dec 28, 2017

@dandelionmane any plans to fix this?

With this issue, one must maintain a single writer per run, which isn't possible when, for example, using Keras's TensorBoard callback along with other custom image/audio tensorboard callbacks. The other option is to have each writer in a separate directory but then Tensorboard will show them as separate runs.

@gar1t

gar1t commented Mar 10, 2018

This issue comes up with estimators as well, which write to an eval subdirectory of model_dir.

I think this should be reopened :)

@tpet

tpet commented Mar 11, 2018

If possible, it would also be good to know where this happened (which run directory, etc.) and exactly what data triggered the warning.

@gar1t

gar1t commented Mar 11, 2018

The sample project below can be used to reproduce the warnings. It's an implementation of the model from Getting Started with TensorFlow and uses tf.estimator.DNNClassifier.

https://github.com/guildai/examples/tree/master/iris

Steps:

git clone https://github.com/guildai/examples.git /tmp/tb-issue
cd /tmp/tb-issue/iris
python train.py
tensorboard --logdir model

@JRMeyer
Contributor

JRMeyer commented May 22, 2018

I'm also getting this issue with tf.estimator.DNNClassifier...

any news?

@boulderZ

I see the same problem; why was this closed?

@meproyousuck

I have the same problem with tf.estimator.DNNRegressor.

@pranav-vempati

I'm encountering the same problem when employing the TensorBoard callback for training a Keras model. Is there a workaround for this issue that doesn't involve creating a separate subdirectory for logging event files generated by each run?

mdangschat added a commit to mdangschat/ctc-asr that referenced this issue Oct 29, 2018
@mjasonong

This is a known issue, TensorBoard doesn't like it when you write multiple event files from separate runs in the same directory. It will be fixed if you use a new subdirectory for every run (new hyperparameters = new subdirectory).

Hi, can someone please elaborate on how to do this? Thanks.

@simonsays1980

I am also having this problem when using tf.estimator.Estimator with a tf.estimator.RunConfig that saves checkpoints. There was already an issue on this, closed without a solution: #17272

oscmansan added a commit to nokutu/m3-project that referenced this issue Jan 22, 2019
@datianshi21

Same question

@ywdong

ywdong commented Jun 13, 2019

When using tf.estimator.train_and_evaluate(...) with a tf.estimator.Estimator, I have the same problem. Any ideas?

@xxllp

xxllp commented Jul 2, 2019

too bad ~~

@datianshi21

When using tf.estimator.train_and_evaluate(...) with a tf.estimator.Estimator, I have the same problem. Any ideas?

Two possible reasons trigger that:
Reason 1: you haven't cleaned the model directory before training a new model.
Solution: just clean the model directory before you train another model.
Reason 2: you are using a model like WDL that is made up of more than one graph.
Solution: in this situation, you cannot use TensorBoard's graph visualization.


@zyuanbing

Using torchvision 1.3.0.dev20190924, tensorboard works now.

@mauricioarmani

I had the same problem using tensorboardX for PyTorch. I noticed that the code was writing two logs when it started: I was instantiating the summary writer in main.py and writing from train.py. Instantiating the writer in train.py instead solved it.
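The fix described above amounts to keeping exactly one writer per run. One way to sketch that pattern (the SummaryWriter class here is a self-contained stand-in for tensorboardX's, and get_writer is a hypothetical helper, not a library function):

```python
from functools import lru_cache

class SummaryWriter:
    """Placeholder for tensorboardX.SummaryWriter, so the sketch runs standalone."""
    def __init__(self, logdir):
        self.logdir = logdir

@lru_cache(maxsize=None)
def get_writer(logdir="runs/current"):
    # main.py, train.py and any callbacks all call get_writer() and receive
    # the SAME instance, so only one event file is opened for the run.
    return SummaryWriter(logdir)
```

With a single shared writer, main.py and train.py no longer each open their own event file in the same directory.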

@Danfoa

Danfoa commented Feb 12, 2020

As mentioned by @mohamed-ezz:

With this issue, one must maintain a single writer per run, which isn't possible when, for example, using Keras's TensorBoard callback along with other custom image/audio tensorboard callbacks. The other option is to have each writer in a separate directory but then Tensorboard will show them as separate runs.

In order to properly log runs with custom callbacks, this needs to be addressed. I believe this should be reopened, or a new issue needs to be created.

@andics

andics commented Aug 11, 2020

Same issue... I'm very surprised that such a trivial problem has not been addressed yet.
It makes the most sense to me to use the same directory for the same experiment. I'm thinking about a script that merges the tensorboard event files in one directory so that tensorboard doesn't have a problem.

@andics

andics commented Aug 11, 2020

Edit:
I found a solution. When starting your tensorboard server, add the following flag:
--purge_orphaned_data false

Hope that helps!

@TomMeowMeow

TomMeowMeow commented Feb 8, 2021

For anyone looking for a naive solution: after every run, just remove the log folder or the log file itself, with something like "rm -rf tf_logs". That way tf_logs is created fresh each time, and you remove it after reviewing. Since we have to call tensorboard from the terminal anyway, I think for now this would hold.
Sample commands:
tensorboard --logdir tf_logs/
rm -rf tf_logs

@ecorreig

ecorreig commented Sep 1, 2023

If someone as [insert deprecating word here] as me is here, this might be caused by adding

if epoch == 0:
    tensorboard_writer.add_graph(model, data[0])

to the wrong loop; probably in a batch loop inside the epochs loop.
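To make that failure mode concrete, here is a stripped-down loop with a stub writer that just counts calls (no real TensorBoard involved); the point is that the guarded add_graph belongs in the epoch loop, where it fires once per run rather than once per batch:

```python
class StubWriter:
    """Counts add_graph calls in place of a real summary writer."""
    def __init__(self):
        self.graph_writes = 0

    def add_graph(self, model, example):
        self.graph_writes += 1

def train(writer, epochs=3, batches=5):
    for epoch in range(epochs):
        # Correct placement: inside the epoch loop, guarded so it runs once.
        if epoch == 0:
            writer.add_graph("model", "first_batch")
        for batch in range(batches):
            pass  # training step; calling add_graph here instead would
                  # write the graph on every batch, triggering the warnings
```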
