
Add a unit test for training and validation callbacks #32847

Closed · wants to merge 1 commit

Conversation

@IvanUkhov (Contributor) commented Sep 26, 2019

The test checks that the progress bar shown by Keras during training works properly when training and validating with inputs of unknown sizes.
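
For illustration, a minimal sketch of the scenario under test — not the test code itself; the model, shapes, and step counts here are assumptions. Datasets built from generators have unknown cardinality, so Keras cannot infer the number of steps, and the progress bar must rely on steps_per_epoch and validation_steps:

import tensorflow as tf

def generator():
    # An unbounded stream: Keras cannot infer the dataset's size.
    while True:
        yield [1.0, 1.0], [1.0]

def make_dataset():
    return tf.data.Dataset.from_generator(
        generator,
        output_types=(tf.float32, tf.float32),
        output_shapes=([2], [1])).batch(2)

training, validation = make_dataset(), make_dataset()

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
model.compile(optimizer='sgd', loss='mse')

# With inputs of unknown size, the step counts must be given explicitly;
# the progress bar should then report 20/20 for both the training and the
# validation phase of each epoch.
model.fit(training, validation_data=validation,
          epochs=2, steps_per_epoch=20, validation_steps=20)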

@tensorflow-bot added the size:XS (CL Change Size: Extra Small) label Sep 26, 2019
@rthadur self-assigned this Sep 26, 2019
@rthadur added this to Assigned Reviewer in PR Queue via automation Sep 26, 2019
@rthadur added the comp:keras (Keras related issues) label Sep 26, 2019
@pavithrasv (Member)

Can you verify that the progress bar logger looks good and works as expected? I remember trying this and something was failing; I don't recall what now.

@IvanUkhov (Contributor, Author) commented Sep 26, 2019

Well, I tried it with the same code as I wrote in #32819, and it worked as expected. However, I don't think I'm in a position to guarantee that it doesn't break anything; I assumed the test suite would catch any problems.

@pavithrasv (Member)

Sounds good. Can you add a test case for the two-Dataset use case you tested to callbacks_test? There are existing unit tests that check Progbar; maybe we can add something similar for this case.
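
For reference, those Progbar checks follow roughly this pattern: run fit while capturing stdout and assert a regex over the printed epoch and step counters. A sketch, reusing the model and datasets from the sketch in the description above (the pattern is illustrative, not the exact one in callbacks_test):

import contextlib
import io
import re

# Run fit while capturing stdout, which is where the Progbar writes.
stream = io.StringIO()
with contextlib.redirect_stdout(stream):
    model.fit(training, validation_data=validation,
              epochs=2, steps_per_epoch=20, validation_steps=20)

# Both epochs should run their 20 training steps and reach 20/20.
assert re.search(r'(?s)Epoch 1/2.*20/20.*Epoch 2/2.*20/20',
                 stream.getvalue())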

@IvanUkhov (Contributor, Author)

The current status is that all tests in keras pass with the proposed change:

bazel test //tensorflow/python/keras/...
...
Executed 151 out of 152 tests: 152 tests pass.
INFO: Build completed successfully, 771 total actions

(There was apparently one skipped, but I suspect it was due to a previous run.)

I will add a test next, as suggested.

@pavithrasv (Member)

Thank you for adding the test. Can you confirm that the test fails without the change?

@IvanUkhov (Contributor, Author) commented Sep 28, 2019

I did the following on master (earlier, I reported results for what ships with tensorflow/tensorflow:devel-py3, without pulling).

Without any changes or new tests, the following test was failing under keras:

bazel test //tensorflow/python/keras/...
...
//tensorflow/python/keras/distribute:multi_worker_fault_tolerance_test   FAILED in 14 out of 14 in 9.2s
...
Executed 152 out of 152 tests: 151 tests pass and 1 fails locally.
INFO: Build completed, 1 test FAILED, 16920 total actions

I assumed it was irrelevant.

Then I focused on keras:callbacks_test, added the test there, and got a failure:

bazel test //tensorflow/python/keras:callbacks_test
...
//tensorflow/python/keras:callbacks_test                                 FAILED in 3 out of 4 in 38.8s
...
INFO: Build completed, 1 test FAILED, 5 total actions
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v1_session_sequential
[  FAILED  ] KerasCallbacksTest.test_progbar_logging_training_validation_v1_session_sequential
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_subclass
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_subclass
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_function_functional
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_function_functional

However, the failure had a different cause (not what the test was asserting):

  File "…/tensorflow/python/keras/callbacks_test.py", line 352, in test_progbar_logging_training_validation
    steps_per_epoch=20)
...
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.

It complained that steps_per_epoch was not given, but it was. I thought it was due to a bug or lack of support in version 1, since it was coming from training.py, not training_v2.py (the latter is what this pull request adjusts). I forced the test to exclude version 1:

@keras_parameterized.run_all_keras_modes(always_skip_v1=True)

Still without any changes outside the tests, the new test succeeded. I confirmed that the test was properly executed by changing the expected output and seeing it fail.
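
In terms of the sketch above, the sanity check amounts to corrupting the expected counters and seeing the assertion fail, which proves the test is actually exercised (the pattern is illustrative):

# Sanity check: a deliberately wrong step count (21/20 instead of 20/20).
# This assertion must fail, confirming the test does not pass vacuously.
assert re.search(r'(?s)Epoch 1/2.*21/20', stream.getvalue())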

I suspected the bug had already been fixed on master, so I checked out v1.15.0-rc1. Without any changes or new tests, there was already one failure:

bazel test //tensorflow/python/keras/...
...
//tensorflow/python/keras:callbacks_test                                 FAILED in 4 out of 4 in 61.1s
...
Executed 153 out of 153 tests: 152 tests pass and 1 fails locally.
INFO: Build completed, 1 test FAILED, 10644 total actions

The failing test was test_progbar_logging_validation_split, which the new test is actually based on. The new test, with always_skip_v1=True, gave the following:

[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[  FAILED  ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass

It failed, as expected, with AssertionError: Regex didn't match…. Then I changed the calculation of validation_callbacks, as shown in the patch, and got the following:

[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_eager_sequential
[ RUN      ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass
[       OK ] KerasCallbacksTest.test_progbar_logging_training_validation_v2_funcgraph_subclass

(test_progbar_logging_validation_split was failing as before.)

In summary, the proposed change fixes a bug in 1.15.0-rc1 but doesn't seem to be needed for what is on the master branch. I believe the following commit fixed it: 327c5be.

So what do we do with all this? The change doesn't affect master, so it's probably better not to touch what isn't broken. The test might still be helpful, though.

@pavithrasv (Member)

Thank you for the detailed notes. I agree about adding the test case, since it will be useful, and then we can call the issue done.

@IvanUkhov (Contributor, Author) commented Oct 4, 2019

I’ve rebased and removed the first commit.

@pavithrasv (Member) left a comment:

Thank you! Could you update the title/description to reflect the changes?

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Oct 4, 2019
@IvanUkhov changed the title from "Eliminate state sharing between training and validation callbacks" to "Add a unit test for training and validation callbacks" Oct 4, 2019
@IvanUkhov (Contributor, Author)

Done!

@pavithrasv previously approved these changes Oct 4, 2019

@pavithrasv (Member) left a comment:

Thank you!

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Oct 4, 2019
@tensorflow-bot added the kokoro:force-run (Tests on submitted change) and ready to pull (PR ready for merge process) labels Oct 4, 2019
@kokoro-team removed the kokoro:force-run (Tests on submitted change) label Oct 4, 2019
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Oct 5, 2019
@IvanUkhov (Contributor, Author)

I’ve looked through the failed builds, and one of them was due to the new code: there was incorrect indentation on two lines. Fixed.

tensorflow-copybara pushed a commit that referenced this pull request Oct 5, 2019
Imported from GitHub PR #32847

The test is to check that the progress bar shown by Keras during the training process is working properly when training and validating with inputs of unknown sizes.

Copybara import of the project:

  - 3912067 Add a unit test for training and validation callbacks by Ivan Ukhov <ivan.ukhov@gmail.com>
  - 2f4d26b Merge 3912067 into 18f70... by Ivan Ukhov <ivan.ukhov@gmail.com>

COPYBARA_INTEGRATE_REVIEW=#32847 from IvanUkhov:shared-callbacks 3912067
PiperOrigin-RevId: 272958417
@rthadur (Contributor) commented Oct 7, 2019

It seems auto-merge is not happening, but the changes are now committed, so we can close this. Thank you for the PR.

@rthadur rthadur closed this Oct 7, 2019
PR Queue automation moved this from Reviewer Requested Changes to Closed/Rejected Oct 7, 2019
Labels: cla: yes · comp:keras (Keras related issues) · ready to pull (PR ready for merge process) · size:XS (CL Change Size: Extra Small)