
TensorBoard does not refresh automatically when using an HDFS path as logdir #5438

Closed
wangyang0918 opened this issue Nov 7, 2016 · 5 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug


@wangyang0918

Environment
https://github.com/tensorflow/ecosystem/blob/master/docker/Dockerfile.hdfs

I use the following commands to start a TensorFlow job. The job itself works well; however, TensorBoard does not refresh automatically unless I restart the TensorBoard server.


python mnist.py --data_dir=hdfs://hdpalt/user/danrtsey.wy/mnist-data --train_dir=hdfs://hdpalt/user/danrtsey.wy/.slider/checkpoints/test1

tensorboard --logdir=hdfs://hdpalt/user/danrtsey.wy/.slider/checkpoints/test1

BTW, I found that the size of the event file on HDFS is not updated, although its content has changed. Could this be the reason?


$ hadoop fs -ls hdfs://hdpalt/user/danrtsey.wy/.slider/checkpoints/test1/events.out.tfevents.1478500140.8e103b0b7135
-rw-r--r--   3 yarn danrtsey.wy         40 2016-11-07 14:29 hdfs://hdpalt/user/danrtsey.wy/.slider/checkpoints/test1/events.out.tfevents.1478500140.8e103b0b7135

$ hadoop fs -cat hdfs://hdpalt/user/danrtsey.wy/.slider/checkpoints/test1/events.out.tfevents.1478500140.8e103b0b7135 | wc -l
9312
@aselle
Contributor

aselle commented Nov 7, 2016

@jhseu, @danmane, any ideas on why this might be?

@wangyang0918 wangyang0918 added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug labels Nov 7, 2016
@RenChunde

RenChunde commented Nov 10, 2016

@aselle, @jhseu, @danmane, I think the reason is that while an HDFS file is being written, the length returned by listStatus/getFileStatus from the NameNode is not updated until the current block is completed or the writer syncs with SyncFlag.UPDATE_LENGTH. However, data already flushed from the OutputStream is readable through a newly opened InputStream. A workaround is to reopen the input stream repeatedly, as HBase replication does.
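The reopen-the-stream workaround can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not TensorFlow's actual fix: the class name `ReopeningReader` is hypothetical, and a local temporary file stands in for the HDFS event file, so it only shows the access pattern (fresh stream per poll, self-tracked offset), not HDFS's stale-length behaviour itself.

```python
import os
import tempfile


class ReopeningReader:
    """Hypothetical tail reader that opens a fresh stream on every poll.

    Each call opens the file anew, seeks to the last consumed offset and
    reads until EOF, instead of reusing a stream that already hit EOF or
    trusting a file size reported by a directory listing.
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0  # bytes consumed so far

    def poll(self):
        # Open a new stream for each poll. Per the thread, on HDFS this is
        # what makes bytes flushed after the previous open visible.
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        return data


if __name__ == "__main__":
    # Assumption: a local temp file stands in for the HDFS event file.
    path = os.path.join(tempfile.mkdtemp(), "events.out.tfevents.demo")
    with open(path, "wb") as f:
        f.write(b"record-1")
    reader = ReopeningReader(path)
    print(reader.poll())  # b'record-1'
    print(reader.poll())  # b'' (nothing new yet)
    with open(path, "ab") as f:
        f.write(b"record-2")
    print(reader.poll())  # b'record-2'
```

Because the reader tracks its own byte offset, it never depends on the length reported by listStatus/getFileStatus, which, as described above, can lag behind the readable data while the file is still open for writing.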

@jhseu
Contributor

jhseu commented Nov 17, 2016

Fixed internally and will show up during the next commit sync within a day or so.

We now reopen the input stream upon reaching EOF, as suggested by @RenChunde.

@jhseu jhseu closed this as completed Nov 17, 2016
@jhseu
Contributor

jhseu commented Nov 17, 2016

Also note that the file size still doesn't update when listing the directory, but the new contents are available for reading and show up on tensorboard.

@wangyang0918
Author

Thanks a lot for your attention to this issue. I will confirm the fix after the next commit sync.
