
Add a unit test for training and validation callbacks #32847

Closed · wants to merge 1 commit

Conversation

@IvanUkhov (Contributor) commented Sep 26, 2019

The test checks that the progress bar shown by Keras during training works properly when training and validating with inputs of unknown sizes.
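
For illustration, a minimal sketch of the scenario under test — not the test code itself; the model, shapes, and step counts here are assumptions. Datasets built from generators have unknown cardinality, so Keras cannot infer the number of steps, and the progress bar must rely on steps_per_epoch and validation_steps:

import tensorflow as tf

def generator():
    # An unbounded stream: Keras cannot infer the dataset's size.
    while True:
        yield [1.0, 1.0], [1.0]

def make_dataset():
    return tf.data.Dataset.from_generator(
        generator,
        output_types=(tf.float32, tf.float32),
        output_shapes=([2], [1])).batch(2)

training, validation = make_dataset(), make_dataset()

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
model.compile(optimizer='sgd', loss='mse')

# With inputs of unknown size, the step counts must be given explicitly;
# the progress bar should then report 20/20 for both the training and the
# validation phase of each epoch.
model.fit(training, validation_data=validation,
          epochs=2, steps_per_epoch=20, validation_steps=20)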

@tensorflow-bot added the size:XS (CL Change Size: Extra Small) label Sep 26, 2019
@rthadur self-assigned this Sep 26, 2019
@rthadur added this to Assigned Reviewer in PR Queue via automation Sep 26, 2019
@rthadur added the comp:keras (Keras related issues) label Sep 26, 2019
@pavithrasv (Member)

Can you verify that the progress bar logger looks good and works as expected? I remember trying this and something was failing; I don't recall what now.

@IvanUkhov (Contributor, Author) commented Sep 26, 2019

Well, I tried it with the same code as I wrote in #32819, and it worked as expected. However, I don't think I'm in a position to guarantee that it doesn't break anything; I assumed the test suite would catch any problems.

@pavithrasv (Member)

Sounds good. Can you add a test case for the two-Dataset use case you tested to callbacks_test? There are existing unit tests that check Progbar; maybe we can add something similar for this case.
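
For reference, those Progbar checks follow roughly this pattern: run fit while capturing stdout and assert a regex over the printed epoch and step counters. A sketch, reusing the model and datasets from the sketch in the description above (the pattern is illustrative, not the exact one in callbacks_test):

import contextlib
import io
import re

# Run fit while capturing stdout, which is where the Progbar writes.
stream = io.StringIO()
with contextlib.redirect_stdout(stream):
    model.fit(training, validation_data=validation,
              epochs=2, steps_per_epoch=20, validation_steps=20)

# Both epochs should run their 20 training steps and reach 20/20.
assert re.search(r'(?s)Epoch 1/2.*20/20.*Epoch 2/2.*20/20',
                 stream.getvalue())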

@IvanUkhov (Contributor, Author)

The current status is that all tests in keras pass with the proposed change:

bazel test //tensorflow/python/keras/...
...
Executed 151 out of 152 tests: 152 tests pass.
INFO: Build completed successfully, 771 total actions

(There was apparently one skipped, but I suspect it was due to a previous run.)

I will add a test next, as suggested.

@pavithrasv (Member)

Thank you for adding the test. Can you confirm that the test fails without the change?

@IvanUkhov (Contributor, Author) commented Sep 28, 2019

I did the following on master (earlier, I reported results for what ships with tensorflow/tensorflow:devel-py3, without pulling).

Without any changes or new tests, the following test was failing under keras:

bazel test //tensorflow/python/keras/...
...
//tensorflow/python/keras/distribute:multi_worker_fault_tolerance_test   FAILED in 14 out of 14 in 9.2s
...
Executed 152 out of 152 tests: 151 tests pass and 1 fails locally.
INFO: Build completed, 1 test FAILED, 16920 total actions

I assumed it was irrelevant.

Then I focused on keras:callbacks_test, added the test there, and got a failure:

bazel test //tensorflow/python/keras:callbacks_test
...
//tensorflow/python/keras:callbacks_test                                 FAILED in 3 out of 4 in 38.8s
...
INFO: Build completed, 1 test FAILED, 5 total actions
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v1_session_sequential
[  FAILED  ] KerasCallbacksTest.test_progbar_logging_training_validation_v1_session_sequential
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_subclass
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_subclass
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_function_functional
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_function_functional

However, the failure had a different cause (not what the test was asserting):

  File "…/tensorflow/python/keras/callbacks_test.py", line 352, in test_progbar_logging_training_validation
    steps_per_epoch=20)
...
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.

It complained that steps_per_epoch was not given, but it was. I thought it was due to a bug or lack of support in version 1, since it was coming from training.py, not training_v2.py (the latter is what this pull request adjusts). I forced the test to exclude version 1:

@keras_parameterized.run_all_keras_modes(always_skip_v1=True)

Still without any changes outside the tests, the new test succeeded. I confirmed that the test was properly executed by changing the expected output and seeing it fail.
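
In terms of the sketch above, the sanity check amounts to corrupting the expected counters and seeing the assertion fail, which proves the test is actually exercised (the pattern is illustrative):

# Sanity check: a deliberately wrong step count (21/20 instead of 20/20).
# This assertion must fail, confirming the test does not pass vacuously.
assert re.search(r'(?s)Epoch 1/2.*21/20', stream.getvalue())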

I suspected the bug had already been fixed on master, so I checked out v1.15.0-rc1. Without any changes or new tests, there was already one failure:

bazel test //tensorflow/python/keras/...
...
//tensorflow/python/keras:callbacks_test                                 FAILED in 4 out of 4 in 61.1s
...
Executed 153 out of 153 tests: 152 tests pass and 1 fails locally.
INFO: Build completed, 1 test FAILED, 10644 total actions

The failing test was test_progbar_logging_validation_split, which the new test is actually based on. The new test, with always_skip_v1=True, gave the following:

[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[  FAILED  ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass

It failed, as expected, with AssertionError: Regex didn't match…. Then I changed the calculation of validation_callbacks, as shown in the patch, and got the following:

[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass

(test_progbar_logging_validation_split was failing as before.)

In summary, the proposed change fixes a bug in 1.15.0-rc1 but doesn't seem to be needed for what is on the master branch. I believe the following commit fixed it: 327c5be.

So what do we do with all this? The change doesn't affect master, so it's probably better not to touch what isn't broken. The test might still be helpful, though.

@pavithrasv (Member)

Thank you for the detailed notes. I agree about adding the test case, since it will be useful, and then we can call the issue done.

@IvanUkhov (Contributor, Author) commented Oct 4, 2019

I’ve rebased and removed the first commit.

@pavithrasv (Member) left a comment:

Thank you! Could you update the title/description to reflect the changes?

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Oct 4, 2019
@IvanUkhov changed the title from "Eliminate state sharing between training and validation callbacks" to "Add a unit test for training and validation callbacks" Oct 4, 2019
@IvanUkhov (Contributor, Author)

Done!

@pavithrasv previously approved these changes Oct 4, 2019

@pavithrasv (Member) left a comment:

Thank you!

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Oct 4, 2019
@tensorflow-bot added the kokoro:force-run (Tests on submitted change) and ready to pull (PR ready for merge process) labels Oct 4, 2019
@kokoro-team removed the kokoro:force-run (Tests on submitted change) label Oct 4, 2019
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Oct 5, 2019
@IvanUkhov (Contributor, Author)

I’ve looked through the failed builds, and one of them was due to the new code: there was incorrect indentation on two lines. Fixed.

tensorflow-copybara pushed a commit that referenced this pull request Oct 5, 2019
Imported from GitHub PR #32847

The test is to check that the progress bar shown by Keras during the training process is working properly when training and validating with inputs of unknown sizes.

Copybara import of the project:

  - 3912067 Add a unit test for training and validation callbacks by Ivan Ukhov <ivan.ukhov@gmail.com>
  - 2f4d26b Merge 3912067 into 18f70... by Ivan Ukhov <ivan.ukhov@gmail.com>

COPYBARA_INTEGRATE_REVIEW=#32847 from IvanUkhov:shared-callbacks 3912067
PiperOrigin-RevId: 272958417
@rthadur (Contributor) commented Oct 7, 2019

It seems auto-merge is not happening, but the changes are now committed, so we can close this. Thank you for the PR.

@rthadur rthadur closed this Oct 7, 2019
PR Queue automation moved this from Reviewer Requested Changes to Closed/Rejected Oct 7, 2019
Labels: cla: yes · comp:keras (Keras related issues) · ready to pull (PR ready for merge process) · size:XS (CL Change Size: Extra Small)