
fix the problem of sigmoid gradient generating NaN #1140

Merged

louisfd merged 6 commits from the sigmoid-backward branch into tracel-ai:main on Jan 16, 2024

Conversation

@wcshds (Contributor) commented Jan 13, 2024

Pull Request Template

Checklist

  • Confirmed that the run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Fixes #1139

Changes

Use sigmoid's derivative formula directly to avoid differentiating log and exp in autodiff.
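
For context, the closed-form derivative in question: since sigmoid(x) = 1 / (1 + exp(-x)), its derivative is sigmoid(x) * (1 - sigmoid(x)), so the backward pass only needs the forward output that was saved in the state and never has to differentiate the log/exp chain. A minimal plain-Rust sketch of the idea (scalar f32 stand-ins rather than Burn's tensor ops; function names are illustrative only):

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// Gradient computed from the saved forward output:
// d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)).
fn sigmoid_backward(output: f32, grad: f32) -> f32 {
    output * (1.0 - output) * grad
}

fn main() {
    let x = -100.0_f32;
    let y = sigmoid(x); // exp(-x) overflows to inf, but the result is still a clean 0.0
    // The closed-form gradient stays finite (0.0 here) instead of turning into NaN.
    println!("sigmoid({x}) = {y}, grad = {}", sigmoid_backward(y, 1.0));
}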

codecov bot commented Jan 13, 2024

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (76c9358) 85.67% compared to head (197cf27) 85.95%.
Report is 4 commits behind head on main.

Files                                  Patch %   Lines
burn-autodiff/src/ops/activation.rs    95.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1140      +/-   ##
==========================================
+ Coverage   85.67%   85.95%   +0.27%     
==========================================
  Files         513      518       +5     
  Lines       57006    57724     +718     
==========================================
+ Hits        48841    49616     +775     
+ Misses       8165     8108      -57     


Comment on lines 929 to 943
/// Returns a new tensor with sigmoid values.
///
/// # Arguments
///
/// * `tensor` - The tensor to take the sigmoid of.
///
/// # Returns
///
/// A tensor with the same shape as `tensor` with sigmoid values.
fn sigmoid<const D: usize>(tensor: FloatTensor<B, D>) -> FloatTensor<B, D> {
    B::exp(B::neg(B::log(B::add_scalar(
        B::exp(B::neg(tensor)),
        1.0_f32.elem(),
    ))))
}
Member:

We can add this function in burn-tensor/src/tensor/ops/activation.rs instead.

Comment on lines 81 to 88
match B::FloatElem::precision() {
    Precision::Half => {
        let tensor_full = tensor.to_full_precision();
        let tensor_tmp = tensor_full.sigmoid();
        Tensor::from_full_precision(tensor_tmp)
    }
    _ => tensor.sigmoid(),
}
Member:

I don't think we need full precision here, as it is now declared as a method in the backend. The backend implementations can choose to use full precision regardless of the circumstances. Perhaps we can consider incorporating full precision into the default implementation.

Comment on lines +117 to +125
let tensor_full = B::to_full_precision(&tensor);
let tensor_tmp = B::FullPrecisionBackend::exp(B::FullPrecisionBackend::neg(
    B::FullPrecisionBackend::log(B::FullPrecisionBackend::add_scalar(
        B::FullPrecisionBackend::exp(B::FullPrecisionBackend::neg(tensor_full)),
        1.0.elem(),
    )),
));

B::from_full_precision(tensor_tmp)
Member:

Not sure @louisfd if there is a more numerically stable implementation possible here.

Member:

Looked it up; I think it's the best we can do.
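
For reference on the failure mode this PR works around (an illustrative sketch in plain f32, not code from this PR): the forward expression exp(-log(1 + exp(-x))) is fine even when exp(-x) overflows, but differentiating it with the chain rule multiplies that overflowed exp(-x) by a factor that has gone to zero, and 0 * inf is NaN in IEEE arithmetic:

// Hand-applied chain rule for y = exp(-ln(1 + exp(-x))).
fn chained_sigmoid_grad(x: f32) -> f32 {
    let e = (-x).exp();       // overflows to +inf once -x exceeds roughly 88.7 in f32
    let u = 1.0 + e;          // +inf
    let y = (-u.ln()).exp();  // forward value is still fine: exp(-inf) = 0
    // dy/dx = y * e / u, which evaluates as 0 * inf = NaN
    y * e / u
}

fn main() {
    assert!(chained_sigmoid_grad(-100.0).is_nan());
    // The closed-form sigmoid(x) * (1 - sigmoid(x)) stays finite for the same input.
}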

@louisfd (Member) left a comment:

Hi, can you check my comment about the naming of the argument to backward? Afterwards it will be ready.

///
/// The output tensor.
fn sigmoid_backward<const D: usize>(
    x: FloatTensor<B, D>,
Member:

I think x should have a better name, like output, because it's actually sigmoid(x) that was saved in the state, not the original input.

Contributor (author):

Done.

@louisfd (Member) left a comment:

Thanks a lot

@louisfd merged commit a5bdf38 into tracel-ai:main on Jan 16, 2024
14 checks passed
@wcshds deleted the sigmoid-backward branch on January 17, 2024 at 02:11
Successfully merging this pull request may close these issues:

Small negative values cause the gradient of sigmoid to become NaN (#1139)