Backward pass issue when there is implicit broadcasting in add #181

Closed
jmacglashan opened this issue Feb 26, 2023 · 3 comments · Fixed by #312
Labels
bug Something isn't working

Comments

@jmacglashan
Contributor

Describe the bug
You can reproduce this on the 0.5 release from crates.io, but it should also be reproducible on master.

To replicate the bug, run the following program:

use burn::tensor::Distribution;
use burn::tensor::Tensor;
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

type BackendAlias = NdArrayBackend<f32>;
type ADBackendAlias = ADBackendDecorator<BackendAlias>;

pub fn main() {
    let x: Tensor<ADBackendAlias, 2> = Tensor::random([2, 3], Distribution::Standard);
    let label: Tensor<ADBackendAlias, 2> = Tensor::random([2, 1], Distribution::Standard);

    let weights: Tensor<ADBackendAlias, 2> = Tensor::random([3, 1], Distribution::Standard);
    let bias: Tensor<ADBackendAlias, 2> = Tensor::random([1, 3], Distribution::Standard);

    let y = x.matmul(&weights).add(&bias);
    let loss = y.sub(&label).powf(2.0).sum();

    println!("y shape = {:?}", y.shape());
    println!("{:?}", loss.inner());
    let gradients = loss.backward();
    let weights_grad = weights.grad(&gradients);
    let bias_grad = bias.grad(&gradients);
    println!("{:?}\n{:?}", weights_grad, bias_grad);
}

When you run this, the forward pass works (it prints the shape of y and the value of the loss), but the backward pass fails at runtime with the following error:

Tensor { value: NdArrayTensor { array: [15.016455], shape=[1], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } }
thread 'main' panicked at 'ndarray: inputs 2 × 3 and 1 × 3 are not compatible for matrix multiplication', <path_omitted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.6/src/linalg/impl_linalg.rs:299:5

Expected behavior
If a forward pass is valid and runs without a runtime error, the backward pass should also run without a runtime error.

Additional context
Note that this is not a typical model definition. It looks like an affine function, but it's actually a little different: there is an unexpected implicit broadcast in the add operation.

That is, for a proper affine function, we should expect the bias term to have shape [1, 1] (or be reshaped to that). Instead, it has shape [1, 3].

This means the add operation in x.matmul(&weights).add(&bias) is adding tensors with shapes [2, 1] and [1, 3], which produces a tensor with shape [2, 3] by implicitly broadcasting the first tensor along the second dimension to size 3 and the second tensor along the first dimension to size 2.
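As an illustration of that broadcasting rule (a standalone sketch of the same-rank case only, not burn's actual code), the output shape is computed per axis:

// Standalone sketch of the per-axis broadcasting rule (same-rank case only).
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| {
            if x == y {
                Some(x)
            } else if x == 1 {
                Some(y)
            } else if y == 1 {
                Some(x)
            } else {
                None // incompatible dimensions
            }
        })
        .collect()
}

fn main() {
    // [2, 1] + [1, 3] broadcasts to [2, 3].
    assert_eq!(broadcast_shape(&[2, 1], &[1, 3]), Some(vec![2, 3]));
}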

Based on the error, I believe the issue is that the backward pass is trying to push the head gradient of the add (with shape [2, 3]) onto the input with shape [1, 3], but the backward pass for add does not account for the fact that the broadcast could have happened. I think the add would effectively have to inject a broadcast operation inside itself first (so the backward pass can reduce the head gradient back to each input's shape), and then the autodiff should work.
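For illustration, here is a minimal standalone sketch of that reduction (using ndarray directly, not burn's internals, and assuming both tensors have the same rank): the head gradient is summed over every axis the input was broadcast along, which restores the input's shape.

use ndarray::{ArrayD, Axis, IxDyn};

// Hypothetical helper: sum the head gradient over every axis that was
// broadcast from size 1, so the result matches the input's shape again.
fn reduce_grad_to_input_shape(grad: ArrayD<f32>, input_shape: &[usize]) -> ArrayD<f32> {
    let mut reduced = grad;
    for (axis, &dim) in input_shape.iter().enumerate() {
        if dim == 1 && reduced.shape()[axis] != 1 {
            // Sum over the broadcast axis, then re-insert it with size 1.
            reduced = reduced.sum_axis(Axis(axis)).insert_axis(Axis(axis));
        }
    }
    reduced
}

fn main() {
    // Head gradient of the add has the broadcast shape [2, 3];
    // the input that was broadcast has shape [1, 3].
    let head_grad = ArrayD::<f32>::ones(IxDyn(&[2, 3]));
    let input_grad = reduce_grad_to_input_shape(head_grad, &[1, 3]);
    assert_eq!(input_grad.shape(), &[1, 3]);
}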

@jmacglashan
Contributor Author

jmacglashan commented Feb 26, 2023

Here is a simpler example that narrows down the problem:

use burn::tensor::Distribution;
use burn::tensor::Tensor;
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

type BackendAlias = NdArrayBackend<f32>;
type ADBackendAlias = ADBackendDecorator<BackendAlias>;

pub fn main() {
    let x: Tensor<ADBackendAlias, 2> = Tensor::random([2, 1], Distribution::Standard);
    let y: Tensor<ADBackendAlias, 2> = Tensor::random([1, 3], Distribution::Standard);
    let z = x.add(&y);
    println!("{:?}", z.inner());
    let grads = z.backward();
    let x_grad = x.grad(&grads).unwrap();
    println!("{:?}, {:?}", x.shape(), x_grad.shape()); // Outputs: Shape { dims: [2, 1] }, Shape { dims: [2, 3] }
    assert!(x_grad.shape() == x.shape()); // FAILS
}

This shows that when the add operation broadcasts, the gradients on its arguments do not have the right shape: the gradient computed for x has a different shape than x itself.
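For reference, assuming the backward pass seeds the head gradient of z with ones (an assumption for illustration only), the correct gradient for x would be that [2, 3] head gradient summed over the broadcast axis: a [2, 1] tensor filled with 3.0. A quick standalone check with ndarray:

use ndarray::{Array, Axis};

fn main() {
    // Assumed head gradient of z: ones with the broadcast shape [2, 3].
    let head_grad = Array::<f32, _>::ones((2, 3));
    // Gradient w.r.t. x (shape [2, 1]): sum over the axis that x was
    // broadcast along, keeping the axis so the shape matches x.
    let x_grad = head_grad.sum_axis(Axis(1)).insert_axis(Axis(1));
    assert_eq!(x_grad.shape(), &[2, 1]);
    // Every entry is 3.0 because three output columns share each element of x.
    assert!(x_grad.iter().all(|&v| (v - 3.0).abs() < 1e-6));
}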

@antimora added the bug label on Apr 2, 2023
@antimora
Collaborator

@nathanielsimard when you have a chance, let's triage this bug. We need to understand its severity and its impact on users. It may be that this bug no longer exists because of your recent changes and fixes.

@antimora
Collaborator

@jmacglashan, @nathanielsimard has fixed the bug. If you have a chance, can you please confirm the fix is complete?
