-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add TensorBoard writer without TensorFlow #2065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you change the title for this PR to "Add TensorBoard writer without TensorFlow"? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Have some change requests on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And one more comment. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great. @nfelt I think we should be ready for your review very soon. Any updates you'd request here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! @nfelt should be ready for your review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies - didn't realize you still had masked_crc32c and u32 here. Let's just use the shared version of those instead.
remove `directory_check` and `reopen`
- dummy tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the restructuring and adding tests, it's looking a lot better! I realize this is a lot of comments - many are just small naming/comment issues. The rest are primarily about the tests, with a few comments on the multithreading logic and correctness of flushing behavior.
# In my experiment, the tolerance can be set as high to roughly to 0.95. | ||
# I set 0.9 here in case the CI is too slow. | ||
|
||
def test_async_writer_auto_flushing(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have tests that use sleep(). It's better to instead access the w._queue property; accessing private members for test purposes is acceptable if there isn't a cleaner way to do it.
If you revise the flushing logic to call self._queue.task_done()
at the very end (the way my suggestion has it) then it's possible to do w._queue.join()
to ensure that all tasks have been finished (including any auto-flushing), which allows this to be tested deterministically.
The way that would work is that you could 'mock.patch' time.time()
to return a fake timestamp, and change the writer to be (rather than a raw file) a fake object where write() and flush() populate a list of lists (each outer list representing a set of flushed data, and each inner list representing the series of written data flushed at that point). Then you could have a test like this:
- set timestamp to 0
- w.write(b"1")
- w._queue.join()
- assert that fake writer contains [[b"1"]] (automatic auto-flush on first write)
- set timestamp to 0 + flush_secs - 1
- w.write(b"2")
- w._queue.join()
- assert that fake writer contains no new data (no auto-flush yet)
- set timestamp to 0 + flush_secs + 1
- w.write(b"3")
- w._queue.join()
- assert that fake writer contains [[b"1"], [b"2", b"3"]] (automatic auto-flush after "3")
- w.write(b"4")
- w._queue.join()
- assert that fake writer contains no new data
- set timestamp to 0 + flush_secs + 1 + flush_secs - 1
- w.write(b"5")
- w._queue.join()
- assert that fake writer contains no new data
- set timestamp to 0 + flush_secs + 1 + flush_secs + 1
- w.write(b"6")
- w._queue.join()
- assert that fake writer contains [[b"1"], [b"2", b"3"], [b"4", b"5", b"6"]] (automatic auto-flush after "6")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should made this a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so can we remove the more complex flushing tests from this PR and add them back in the follow-up?
That would be test_async_writer_auto_flushing
(along with the comment/diagram) and test_async_writer_flush_before_flush_secs
now that the latter has gotten more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lanpa - yes, I realize now what dummy_delay is used for, but I'm still not crazy about that usage - see some of my other comments on the tests. Using sleep()
in tests can make them flaky which could impact our continuous testing stability.
Since we're on a tight timeframe, maybe what makes sense for now is to omit dummy_delay parameter and the auto_flushing test from this PR, and we can revisit how to make the tests work without sleep() calls in a follow up?
@nfelt Thank you for the precious comments. Now I have more sense about how to write unittests. (tests should be deterministic) Please have a look at those unresolved conversations for my questions. ps. The CI fails on tf-nightly only starting from bcbb1ac, It should be fixable by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replying to a few things here. Looking better!
from tensorboard import test as tb_test | ||
|
||
class EventFileWriterTest(tb_test.TestCase): | ||
def __init__(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @nfelt means you can just remove this def __init__(self, *args, **kwargs):
method completely. Give that a try and see how it goes.
assert f.read() == random_bytes | ||
|
||
|
||
def get_copy_by_OS(oldfilename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there is only one use, can we just inline this code? We can then remove it later.
|
||
class RecordWriterTest(tb_test.TestCase): | ||
def __init__(self, *args, **kwargs): | ||
super(RecordWriterTest, self).__init__(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - please just remove this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, thanks for all the test updates! Just a few more comments.
One general comment that applies to several tests: let's avoid using os.urandom
and instead just use fixed dummy strings like b"hello world"
or if you need a specific length, use something like b"x" * 64
.
# In my experiment, the tolerance can be set as high to roughly to 0.95. | ||
# I set 0.9 here in case the CI is too slow. | ||
|
||
def test_async_writer_auto_flushing(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so can we remove the more complex flushing tests from this PR and add them back in the follow-up?
That would be test_async_writer_auto_flushing
(along with the comment/diagram) and test_async_writer_flush_before_flush_secs
now that the latter has gotten more complicated.
Also heads up that this landing will be critical for our 1.1 release, which we'll patch ASAP! |
@nfelt, please merge when ready. Thank you. :) |
Summary: This PR adds TensorBoard logging support natively within PyTorch. It is based on the tensorboardX code developed by lanpa and relies on changes inside the tensorflow/tensorboard repo landing at tensorflow/tensorboard#2065. With these changes users can simply `pip install tensorboard; pip install torch` and then log PyTorch data directly to the TensorBoard protobuf format using ``` import torch from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() s1 = torch.rand(1) writer.add_scalar('data/scalar1', s1[0], 0) writer.close() ``` Design: - `EventFileWriter` and `RecordWriter` from tensorboardX now live in tensorflow/tensorboard - `SummaryWriter` and PyTorch-specific conversion from tensors, nn modules, etc. now live in pytorch/pytorch. We also support Caffe2 blobs and nets. Action items: - [x] `from torch.utils.tensorboard import SummaryWriter` - [x] rename functions - [x] unittests - [x] move actual writing function to tensorflow/tensorboard in tensorflow/tensorboard#2065 Review: - Please review for PyTorch standard formatting, code usage, etc. - Please verify unittest usage is correct and executing in CI Any significant changes made here will likely be synced back to github.com/lanpa/tensorboardX/ in the future. cc orionr, ezyang Pull Request resolved: #16196 Differential Revision: D15062901 Pulled By: orionr fbshipit-source-id: 3812eb6aa07a2811979c5c7b70810261f9ea169e
Summary: This PR adds TensorBoard logging support natively within PyTorch. It is based on the tensorboardX code developed by lanpa and relies on changes inside the tensorflow/tensorboard repo landing at tensorflow/tensorboard#2065. With these changes users can simply `pip install tensorboard; pip install torch` and then log PyTorch data directly to the TensorBoard protobuf format using ``` import torch from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() s1 = torch.rand(1) writer.add_scalar('data/scalar1', s1[0], 0) writer.close() ``` Design: - `EventFileWriter` and `RecordWriter` from tensorboardX now live in tensorflow/tensorboard - `SummaryWriter` and PyTorch-specific conversion from tensors, nn modules, etc. now live in pytorch/pytorch. We also support Caffe2 blobs and nets. Action items: - [x] `from torch.utils.tensorboard import SummaryWriter` - [x] rename functions - [x] unittests - [x] move actual writing function to tensorflow/tensorboard in tensorflow/tensorboard#2065 Review: - Please review for PyTorch standard formatting, code usage, etc. - Please verify unittest usage is correct and executing in CI Any significant changes made here will likely be synced back to github.com/lanpa/tensorboardX/ in the future. cc orionr, ezyang Pull Request resolved: pytorch#16196 Differential Revision: D15062901 Pulled By: orionr fbshipit-source-id: 3812eb6aa07a2811979c5c7b70810261f9ea169e
This PR addresses discussions in #1801.
cc @orionr