Describe the bug
You can reproduce this on crates.io 0.5, but it should also be reproducible on master.
To replicate the bug, try the following program:
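(The exact program is not shown on this page; the sketch below is a hypothetical reconstruction, with shapes chosen to match the error output further down. It assumes the burn 0.5-era API: the ADBackendDecorator/NdArrayBackend type alias, Tensor::random, Shape::new, and Distribution::Standard are assumptions on my part, while the by-reference matmul/add call matches the snippet quoted in the context section.)

```rust
use burn::tensor::{Distribution, Shape, Tensor};
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

// Autodiff on top of the ndarray backend (assumed type names for burn 0.5).
type B = ADBackendDecorator<NdArrayBackend<f32>>;

fn main() {
    // x: [2, 3], weights: [3, 1], bias: [1, 3] (note: [1, 3], not [1, 1]).
    let x = Tensor::<B, 2>::random(Shape::new([2, 3]), Distribution::Standard);
    let weights = Tensor::<B, 2>::random(Shape::new([3, 1]), Distribution::Standard);
    let bias = Tensor::<B, 2>::random(Shape::new([1, 3]), Distribution::Standard);

    // matmul yields [2, 1]; add co-broadcasts [2, 1] + [1, 3] into [2, 3].
    let y = x.matmul(&weights).add(&bias);
    println!("{:?}", y.shape());

    // Any scalar reduction serves as a loss here.
    let loss = y.sum();
    println!("{:?}", loss);

    // Panics: the head gradient has shape [2, 3] instead of [2, 1].
    let _grads = loss.backward();
}
```

With these shapes, the backward pass of matmul plausibly computes grad.matmul(&weights.transpose()), i.e. a [2, 3] × [1, 3] product, which would line up exactly with the "2 × 3 and 1 × 3" incompatibility in the panic below.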
When you run this, the forward pass works (it prints the shape of y and the value of the loss), but the backward pass then panics with a runtime error:
Tensor { value: NdArrayTensor { array: [15.016455], shape=[1], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } }
thread 'main' panicked at 'ndarray: inputs 2 × 3 and 1 × 3 are not compatible for matrix multiplication', <path_omitted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.6/src/linalg/impl_linalg.rs:299:5
Expected behavior
The expected behavior is that if a forward pass is valid and runs without a runtime error, the backward pass should also run without a runtime error.
Additional context
Note that this is not a typical model definition. It looks like an affine function, but it is actually slightly different: there is an unexpected implicit broadcast in the add operation.
That is, for a proper affine function, we would expect the bias term to have shape [1, 1] (or be reshaped to that). Instead, it has shape [1, 3].
This means the add operation in x.matmul(&weights).add(&bias) is adding tensors of shapes [2, 1] and [1, 3], which produces a tensor of shape [2, 3]: the first tensor is implicitly broadcast along the second dimension to size 3, and the second tensor is implicitly broadcast along the first dimension to size 2.
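For illustration, this co-broadcasting rule can be reproduced with plain ndarray 0.15 (the crate named in the panic); this standalone snippet is mine, not from the issue:

```rust
use ndarray::Array2;

fn main() {
    let a = Array2::<f64>::ones((2, 1));
    let b = Array2::<f64>::ones((1, 3));
    // ndarray 0.15+ co-broadcasts both operands of arithmetic ops:
    // [2, 1] + [1, 3] -> [2, 3].
    let c = &a + &b;
    assert_eq!(c.shape(), &[2, 3]);
}
```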
Based on the error, I believe the issue is that the backward pass is trying to push the head gradient of the add (with shape [2, 3]) onto the input with shape [1, 3], but the backward pass for add does not account for the fact that the broadcast could have happened. I think it would effectively have to inject a broadcast operation implicitly inside itself first, and then the autodiff should work.
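The adjoint of an implicit broadcast is a sum over the broadcast axes, so "injecting the broadcast" on the backward side amounts to reducing the head gradient back to each input's shape. Here is a minimal sketch of that reduction, written against plain ndarray rather than burn's actual internals (unbroadcast is a hypothetical helper name, and it assumes the gradient and the input have the same rank):

```rust
use ndarray::{ArrayD, Axis, IxDyn};

/// Hypothetical helper: sum `grad` over every axis where the input was
/// broadcast (size 1 expanded to a larger size), so the result matches the
/// input's shape again. Assumes `grad` and `input_shape` have the same rank.
fn unbroadcast(grad: &ArrayD<f64>, input_shape: &[usize]) -> ArrayD<f64> {
    let mut reduced = grad.clone();
    for (axis, (&g, &i)) in grad.shape().iter().zip(input_shape).enumerate() {
        if i == 1 && g != 1 {
            // Collapse the broadcast axis back to size 1.
            reduced = reduced.sum_axis(Axis(axis)).insert_axis(Axis(axis));
        }
    }
    reduced
}

fn main() {
    // Head gradient of the add has shape [2, 3]; the bias had shape [1, 3].
    let head = ArrayD::<f64>::ones(IxDyn(&[2, 3]));
    let grad_bias = unbroadcast(&head, &[1, 3]);
    assert_eq!(grad_bias.shape(), &[1, 3]);

    // The matmul output had shape [2, 1]; its gradient reduces the same way.
    let grad_matmul = unbroadcast(&head, &[2, 1]);
    assert_eq!(grad_matmul.shape(), &[2, 1]);
}
```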
What this shows is that when the add operation broadcasts, the gradients on the arguments do not have the right shape. The above program shows this because the gradient it computes for x has a different shape than the actual tensor x.
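The commenter's program is not shown on this page; the following is a hypothetical sketch of the kind of check that would surface this, assuming the same burn 0.5-era API as above (including the Option-returning grad accessor). With a bare broadcasting add there is no subsequent matmul to panic, so the backward pass completes and the bad gradient shape can be observed directly:

```rust
use burn::tensor::{Distribution, Shape, Tensor};
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

type B = ADBackendDecorator<NdArrayBackend<f32>>;

fn main() {
    let x = Tensor::<B, 2>::random(Shape::new([2, 1]), Distribution::Standard);
    let bias = Tensor::<B, 2>::random(Shape::new([1, 3]), Distribution::Standard);

    // Bare broadcasting add: [2, 1] + [1, 3] -> [2, 3].
    let loss = x.add(&bias).sum();
    let grads = loss.backward();

    if let Some(grad_x) = x.grad(&grads) {
        // Expected [2, 1]; with the bug the gradient instead keeps the
        // broadcast shape [2, 3].
        println!("x: {:?}, grad_x: {:?}", x.shape(), grad_x.shape());
    }
}
```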
@nathanielsimard when you have a chance, let's triage this bug. We need to understand its impact on users and its severity. Maybe this bug no longer exists because of your recent changes and fixes.