
Cleanup tensorflow/c/experimental/gradients Part 1 #45547

Merged
merged 6 commits into tensorflow:master from the gradients-cleanup branch on Dec 11, 2020

Conversation

vnghia
Contributor

@vnghia vnghia commented Dec 9, 2020

@saxenasaurabh

One thing I don't understand: if computing numerical gradients with TensorFloat-32 is numerically unstable, wouldn't it make more sense to disable TensorFloat-32 inside gradient_checker rather than inside the test, since we will have to disable TF-32 in all binaries that depend on gradient_checker anyway?
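For reference, a minimal sketch of what disabling TF-32 inside gradient_checker could look like, assuming the helpers declared in tensorflow/core/platform/tensor_float_32_utils.h; the body of the numerical-gradient routine (shown here as CalcNumericalGrad) is illustrative, not the actual implementation:

#include "tensorflow/core/platform/tensor_float_32_utils.h"

Status CalcNumericalGrad(/* ctx, forward model, inputs, ... */) {
  // Numerical gradients are too imprecise under TensorFloat-32, so force
  // full-precision float32 execution for the duration of the check and
  // restore the previous setting afterwards.
  const bool tf32_enabled = tensorflow::tensor_float_32_execution_enabled();
  tensorflow::enable_tensor_float_32_execution(false);
  // ... evaluate the forward model at perturbed inputs and form the
  // central finite differences ...
  tensorflow::enable_tensor_float_32_execution(tf32_enabled);
  return Status::OK();
}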

@rthadur rthadur added this to Assigned Reviewer in PR Queue via automation Dec 9, 2020
@saxenasaurabh
Member

tensorflow/c/eager/unified_api_test.cc:124:76: error: invalid conversion from 'tensorflow::int64* {aka long long int*}' to 'int64_t* {aka long int*}' [-fpermissive]
TestTensorHandleWithDimsFloat(ctx.get(), data, dim_sizes, 2, &x_raw);

Please fix.

@vnghia
Contributor Author

vnghia commented Dec 10, 2020

tensorflow/c/eager/unified_api_test.cc:124:76: error: invalid conversion from 'tensorflow::int64* {aka long long int*}' to 'int64_t* {aka long int*}' [-fpermissive]
TestTensorHandleWithDimsFloat(ctx.get(), data, dim_sizes, 2, &x_raw);

Done
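For context, a hedged sketch of the kind of change that resolves this -fpermissive error (the actual diff is not shown in this thread, and the dimension and data values below are made up): on LP64 Linux, tensorflow::int64 is long long while int64_t is long, so pointers to them do not convert implicitly; declaring the buffer with the type the callee expects fixes the call.

// was: tensorflow::int64 dim_sizes[] = {2, 2};
int64_t dim_sizes[] = {2, 2};
float data[] = {1.0f, 2.0f, 3.0f, 4.0f};
TestTensorHandleWithDimsFloat(ctx.get(), data, dim_sizes, 2, &x_raw);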

PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Dec 10, 2020
@rthadur
Contributor

rthadur commented Dec 10, 2020

@vnvo2409 can you please resolve conflicts?

@vnghia
Contributor Author

vnghia commented Dec 10, 2020

@vnvo2409 can you please resolve conflicts?

Done

@saxenasaurabh
Member

Looks like eager/BUILD has conflicts. Rebase maybe? Also nn_grad_test is segfaulting internally. I will try to patch and see what's going on.

@vnghia
Contributor Author

vnghia commented Dec 11, 2020

@saxenasaurabh

eager/BUILD has conflicts.

I've already fixed it.

Also nn_grad_test is segfaulting internally.

There are three memory problems in nn_grad_test:


1. One comes from SoftMaxModel and RunAndMaybeSum:

Status RunAndMaybeSum(AbstractContext* ctx, Model forward,
                      absl::Span<AbstractTensorHandle* const> inputs,
                      absl::Span<AbstractTensorHandle*> outputs,
                      bool use_function) {
  GradientRegistry registry;
  std::vector<AbstractTensorHandle*> model_outputs(1);

SoftMaxModel returns 2 tensors but model_outputs holds only 1, which causes a buffer overflow (see the standalone sketch after this list). I intend to fix this issue in the next PR (running without ASan is fine), but we could fix it in this PR if needed. WDYT?


2. gradient_checker leaks a lot of AbstractTensorHandles. This will be fixed in the next PR too.


3. It seems that tape->ComputeGradients leaks a tensor. The traceback points to the allocation of a new tensor in BuildOnesLike. However, I have not found a fix for it yet.
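A standalone illustration of the first problem and its fix (this is not the TensorFlow code; the types and names below are stand-ins): a model that writes two outputs into a caller-provided span sized for only one overflows the buffer, and the fix is to size the output vector for every tensor the model returns.

#include <cassert>
#include <span>
#include <vector>

struct Handle { int id; };

// Stand-in for a two-output model such as the softmax-loss model above:
// it writes BOTH a loss tensor and a backprop tensor into `outputs`.
void TwoOutputModel(std::span<Handle*> outputs) {
  assert(outputs.size() >= 2 && "caller must provide room for both outputs");
  outputs[0] = new Handle{0};  // loss
  outputs[1] = new Handle{1};  // backprop
}

int main() {
  // Buggy pattern: `model_outputs(1)` has room for one handle, but the
  // model writes two, which is an out-of-bounds write under ASan.
  // Fixed pattern: size the buffer for every output the model produces.
  std::vector<Handle*> model_outputs(2);
  TwoOutputModel(std::span<Handle*>(model_outputs));
  for (Handle* h : model_outputs) delete h;
  return 0;
}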

@saxenasaurabh
Member

Ah got it. Let's fix the SoftmaxModel. The leaks we can address in a follow-up.

@vnghia
Contributor Author

vnghia commented Dec 11, 2020

Let's fix the SoftmaxModel.

Done

@vnghia
Contributor Author

vnghia commented Dec 11, 2020

@saxenasaurabh
The copybara/feedback check failed.

@copybara-service copybara-service bot merged commit c4e8492 into tensorflow:master Dec 11, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Dec 11, 2020
@vnghia vnghia deleted the gradients-cleanup branch December 11, 2020 23:57