
Conversation

@lanpa lanpa commented Mar 27, 2019

This PR addresses discussions in #1801.

  • move the writer files to tensorboard/writer
  • handle the plugin protos the correct way (use a symlink)
  • add unit tests
  • rewrite the writer to avoid potential license issues

cc @orionr

orionr commented Mar 27, 2019

Can you change the title for this PR to "Add TensorBoard writer without TensorFlow"? Thanks.

@orionr orionr left a comment

Looking good! I have some change requests here.

@orionr orionr left a comment

And one more comment. :)

@lanpa lanpa changed the title [WIP] tensorboard-notf for pytorch (again) [WIP] Add TensorBoard writer without TensorFlow Mar 28, 2019

@orionr orionr left a comment

This is looking great. @nfelt I think we should be ready for your review very soon. Any updates you'd request here?

@orionr orionr left a comment

Looks great! @nfelt should be ready for your review.

@lanpa lanpa changed the title [WIP] Add TensorBoard writer without TensorFlow Add TensorBoard writer without TensorFlow Apr 2, 2019

@orionr orionr left a comment

Apologies, I didn't realize you still had `masked_crc32c` and `u32` here. Let's just use the shared versions of those instead.
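
For context, here is a minimal sketch of what these two helpers compute in the TFRecord/event-file format. The names `_u32` and `_mask_crc32c` below are illustrative; the point of the comment above is to reuse TensorBoard's shared implementations rather than keep a private copy.

```
def _u32(x):
    # Truncate an integer to 32 bits, matching uint32 arithmetic.
    return x & 0xFFFFFFFF


def _mask_crc32c(crc):
    # TFRecord-style CRC masking: rotate the CRC32C value right by 15 bits
    # and add a fixed constant, keeping the result within 32 bits.
    return _u32(((crc >> 15) | _u32(crc << 17)) + 0xA282EAD8)
```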

@orionr orionr left a comment

Looking good!

orionr commented Apr 12, 2019

@nfelt, confirmed with @lanpa that this is ready for review! The optimization changes can hopefully be done as a follow-up, but let us know.

orionr commented Apr 17, 2019

@nfelt and @wchargin, a gentle ping for review on this. Thanks.

@nfelt nfelt left a comment

Thanks for the restructuring and adding tests, it's looking a lot better! I realize this is a lot of comments - many are just small naming/comment issues. The rest are primarily about the tests, with a few comments on the multithreading logic and correctness of flushing behavior.

# In my experiment, the tolerance can be set as high as roughly 0.95.
# I set 0.9 here in case the CI is too slow.

def test_async_writer_auto_flushing(self):
Contributor

We shouldn't have tests that use `sleep()`. It's better to instead access the `w._queue` attribute; accessing private members for test purposes is acceptable if there isn't a cleaner way to do it.

If you revise the flushing logic to call `self._queue.task_done()` at the very end (the way my suggestion has it), then it's possible to do `w._queue.join()` to ensure that all tasks have finished (including any auto-flushing), which allows this to be tested deterministically.

The way that would work is that you could `mock.patch` `time.time()` to return a fake timestamp, and change the underlying writer to be a fake object (rather than a raw file) whose `write()` and `flush()` populate a list of lists (the outer list has one entry per flush, and each inner list holds the series of data written and flushed at that point). Then you could have a test like this (see the sketch after this list):

  • set timestamp to 0
  • w.write(b"1")
  • w._queue.join()
  • assert that fake writer contains [[b"1"]] (automatic auto-flush on first write)
  • set timestamp to 0 + flush_secs - 1
  • w.write(b"2")
  • w._queue.join()
  • assert that fake writer contains no new data (no auto-flush yet)
  • set timestamp to 0 + flush_secs + 1
  • w.write(b"3")
  • w._queue.join()
  • assert that fake writer contains [[b"1"], [b"2", b"3"]] (automatic auto-flush after "3")
  • w.write(b"4")
  • w._queue.join()
  • assert that fake writer contains no new data
  • set timestamp to 0 + flush_secs + 1 + flush_secs - 1
  • w.write(b"5")
  • w._queue.join()
  • assert that fake writer contains no new data
  • set timestamp to 0 + flush_secs + 1 + flush_secs + 1
  • w.write(b"6")
  • w._queue.join()
  • assert that fake writer contains [[b"1"], [b"2", b"3"], [b"4", b"5", b"6"]] (automatic auto-flush after "6")
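
Here's a minimal, self-contained sketch of that test under the assumptions above; `ToyAsyncWriter`, `FakeRecordWriter`, and `flush_secs=2` are hypothetical stand-ins rather than the real `EventFileWriter`:

```
import queue
import threading
import time
import unittest
from unittest import mock


class FakeRecordWriter:
    """Fake sink: flush() snapshots everything written since the last flush."""
    def __init__(self):
        self.flushed = []      # list of lists, one inner list per flush
        self._pending = []

    def write(self, data):
        self._pending.append(data)

    def flush(self):
        if self._pending:
            self.flushed.append(self._pending)
            self._pending = []


class ToyAsyncWriter:
    """Minimal async writer: a worker thread drains a queue and auto-flushes
    once flush_secs have elapsed since the previous flush."""
    def __init__(self, record_writer, flush_secs=2):
        self._writer = record_writer
        self._flush_secs = flush_secs
        self._next_flush_time = 0
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def write(self, data):
        self._queue.put(data)

    def _run(self):
        while True:
            data = self._queue.get()
            self._writer.write(data)
            now = time.time()
            if now >= self._next_flush_time:
                self._writer.flush()
                self._next_flush_time = now + self._flush_secs
            # task_done() comes last so that _queue.join() also waits
            # for any auto-flush triggered by this item.
            self._queue.task_done()


class ToyAsyncWriterTest(unittest.TestCase):
    def test_auto_flushing_is_deterministic(self):
        fake_time = mock.Mock(return_value=0.0)
        with mock.patch("time.time", fake_time):
            sink = FakeRecordWriter()
            w = ToyAsyncWriter(sink, flush_secs=2)

            w.write(b"1")
            w._queue.join()
            self.assertEqual(sink.flushed, [[b"1"]])    # flush on first write

            fake_time.return_value = 1.0                # flush_secs - 1 later
            w.write(b"2")
            w._queue.join()
            self.assertEqual(sink.flushed, [[b"1"]])    # no auto-flush yet

            fake_time.return_value = 3.0                # flush_secs + 1 later
            w.write(b"3")
            w._queue.join()
            self.assertEqual(sink.flushed, [[b"1"], [b"2", b"3"]])


if __name__ == "__main__":
    unittest.main()
```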

Contributor Author

Yes, we should make this a follow-up PR.

Contributor

Okay, so can we remove the more complex flushing tests from this PR and add them back in the follow-up?

That would be test_async_writer_auto_flushing (along with the comment/diagram) and test_async_writer_flush_before_flush_secs now that the latter has gotten more complicated.

@nfelt nfelt left a comment

@lanpa - yes, I realize now what `dummy_delay` is used for, but I'm still not crazy about that usage - see some of my other comments on the tests. Using `sleep()` in tests can make them flaky, which could impact our continuous-testing stability.

Since we're on a tight timeframe, maybe what makes sense for now is to omit the `dummy_delay` parameter and the auto-flushing test from this PR, and we can revisit how to make the tests work without `sleep()` calls in a follow-up?

lanpa commented Apr 20, 2019

@nfelt Thank you for the valuable comments; I now have a better sense of how to write unit tests (tests should be deterministic). Please have a look at the unresolved conversations for my questions. P.S. The CI fails on tf-nightly only, starting from bcbb1ac; it should be fixable by merging 'upstream/master' into no-tf-sep-writer again before the PR lands. By the way, since many of the asyncThreadTest-related tests will move to the follow-up PR, I have marked those conversations resolved.
cc @orionr

@orionr orionr left a comment

Replying to a few things here. Looking better!

from tensorboard import test as tb_test

class EventFileWriterTest(tb_test.TestCase):
    def __init__(self, *args, **kwargs):
Contributor

I think @nfelt means you can just remove this `def __init__(self, *args, **kwargs):` method completely. Give that a try and see how it goes.

assert f.read() == random_bytes


def get_copy_by_OS(oldfilename):
Contributor

Given that there is only one use, can we just inline this code? We can then remove it later.


class RecordWriterTest(tb_test.TestCase):
    def __init__(self, *args, **kwargs):
        super(RecordWriterTest, self).__init__(*args, **kwargs)
Contributor

Yup - please just remove this method.

@nfelt nfelt left a comment

Looking good, thanks for all the test updates! Just a few more comments.

One general comment that applies to several tests: let's avoid using `os.urandom` and instead just use fixed dummy strings like `b"hello world"`, or, if you need a specific length, something like `b"x" * 64`.
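
For example (an illustrative snippet, not code from this PR):

```
import os

# Nondeterministic test data makes failures harder to reproduce:
data = os.urandom(64)

# A fixed dummy payload of the same length is deterministic:
data = b"x" * 64
```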

orionr commented Apr 23, 2019

@nfelt, should be ready for review. Thanks @lanpa!

orionr commented Apr 23, 2019

Also, heads up that landing this will be critical for our 1.1 release, which we'll patch ASAP!

orionr commented Apr 24, 2019

@nfelt, please merge when ready. Thank you. :)

@nfelt nfelt merged commit 740be35 into tensorflow:master Apr 24, 2019
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request Apr 26, 2019
Summary:
This PR adds TensorBoard logging support natively within PyTorch. It is based on the tensorboardX code developed by lanpa and relies on changes inside the tensorflow/tensorboard repo landing at tensorflow/tensorboard#2065.

With these changes, users can simply `pip install tensorboard; pip install torch` and then log PyTorch data directly to the TensorBoard protobuf format using

```
import torch
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
s1 = torch.rand(1)
writer.add_scalar('data/scalar1', s1[0], 0)
writer.close()
```

Design:
- `EventFileWriter` and `RecordWriter` from tensorboardX now live in tensorflow/tensorboard
- `SummaryWriter` and PyTorch-specific conversion from tensors, nn modules, etc. now live in pytorch/pytorch. We also support Caffe2 blobs and nets.

Action items:
- [x] `from torch.utils.tensorboard import SummaryWriter`
- [x] rename functions
- [x] unittests
- [x] move actual writing function to tensorflow/tensorboard in tensorflow/tensorboard#2065

Review:
- Please review for PyTorch standard formatting, code usage, etc.
- Please verify unittest usage is correct and executing in CI

Any significant changes made here will likely be synced back to github.com/lanpa/tensorboardX/ in the future.

cc orionr, ezyang
Pull Request resolved: #16196

Differential Revision: D15062901

Pulled By: orionr

fbshipit-source-id: 3812eb6aa07a2811979c5c7b70810261f9ea169e
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019