Describe the bug
You can reproduce this on crates.io 0.5, but it should also be reproducible on master.
To replicate the bug, try the following program:
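(The exact program is not shown on this page; the sketch below is a hypothetical reconstruction, with shapes chosen to match the error output further down. It assumes the burn 0.5-era API: the ADBackendDecorator/NdArrayBackend type alias, Tensor::random, Shape::new, and Distribution::Standard are assumptions on my part, while the by-reference matmul/add call matches the snippet quoted in the context section.)

```rust
use burn::tensor::{Distribution, Shape, Tensor};
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

// Autodiff on top of the ndarray backend (assumed type names for burn 0.5).
type B = ADBackendDecorator<NdArrayBackend<f32>>;

fn main() {
    // x: [2, 3], weights: [3, 1], bias: [1, 3] (note: [1, 3], not [1, 1]).
    let x = Tensor::<B, 2>::random(Shape::new([2, 3]), Distribution::Standard);
    let weights = Tensor::<B, 2>::random(Shape::new([3, 1]), Distribution::Standard);
    let bias = Tensor::<B, 2>::random(Shape::new([1, 3]), Distribution::Standard);

    // matmul yields [2, 1]; add co-broadcasts [2, 1] + [1, 3] into [2, 3].
    let y = x.matmul(&weights).add(&bias);
    println!("{:?}", y.shape());

    // Any scalar reduction serves as a loss here.
    let loss = y.sum();
    println!("{:?}", loss);

    // Panics: the head gradient has shape [2, 3] instead of [2, 1].
    let _grads = loss.backward();
}
```

With these shapes, the backward pass of matmul plausibly computes grad.matmul(&weights.transpose()), i.e. a [2, 3] × [1, 3] product, which would line up exactly with the "2 × 3 and 1 × 3" incompatibility in the panic below.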
When you run this, the forward pass works (it prints the shape of y and the value of the loss), but the backward pass then panics with a runtime error:
Tensor { value: NdArrayTensor { array: [15.016455], shape=[1], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } }
thread 'main' panicked at 'ndarray: inputs 2 × 3 and 1 × 3 are not compatible for matrix multiplication', <path_omitted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.6/src/linalg/impl_linalg.rs:299:5
Expected behavior
The expected behavior is that if a forward pass is valid and runs without a runtime error, the backward pass should also run without a runtime error.
Additional context
Note that this is not a typical model definition. It looks like an affine function, but it is actually slightly different: there is an unexpected implicit broadcast in the add operation.
That is, for a proper affine function, we would expect the bias term to have shape [1, 1] (or be reshaped to that). Instead, it has shape [1, 3].
This means the add operation in x.matmul(&weights).add(&bias) is adding tensors of shapes [2, 1] and [1, 3], which produces a tensor of shape [2, 3]: the first tensor is implicitly broadcast along the second dimension to size 3, and the second tensor is implicitly broadcast along the first dimension to size 2.
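For illustration, this co-broadcasting rule can be reproduced with plain ndarray 0.15 (the crate named in the panic); this standalone snippet is mine, not from the issue:

```rust
use ndarray::Array2;

fn main() {
    let a = Array2::<f64>::ones((2, 1));
    let b = Array2::<f64>::ones((1, 3));
    // ndarray 0.15+ co-broadcasts both operands of arithmetic ops:
    // [2, 1] + [1, 3] -> [2, 3].
    let c = &a + &b;
    assert_eq!(c.shape(), &[2, 3]);
}
```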
Based on the error, I believe the issue is that the backward pass is trying to push the head gradient of the add (with shape [2, 3]) onto the input with shape [1, 3], but the backward pass for add does not account for the fact that the broadcast could have happened. I think it would effectively have to inject a broadcast operation implicitly inside itself first, and then the autodiff should work.
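The adjoint of an implicit broadcast is a sum over the broadcast axes, so "injecting the broadcast" on the backward side amounts to reducing the head gradient back to each input's shape. Here is a minimal sketch of that reduction, written against plain ndarray rather than burn's actual internals (unbroadcast is a hypothetical helper name, and it assumes the gradient and the input have the same rank):

```rust
use ndarray::{ArrayD, Axis, IxDyn};

/// Hypothetical helper: sum `grad` over every axis where the input was
/// broadcast (size 1 expanded to a larger size), so the result matches the
/// input's shape again. Assumes `grad` and `input_shape` have the same rank.
fn unbroadcast(grad: &ArrayD<f64>, input_shape: &[usize]) -> ArrayD<f64> {
    let mut reduced = grad.clone();
    for (axis, (&g, &i)) in grad.shape().iter().zip(input_shape).enumerate() {
        if i == 1 && g != 1 {
            // Collapse the broadcast axis back to size 1.
            reduced = reduced.sum_axis(Axis(axis)).insert_axis(Axis(axis));
        }
    }
    reduced
}

fn main() {
    // Head gradient of the add has shape [2, 3]; the bias had shape [1, 3].
    let head = ArrayD::<f64>::ones(IxDyn(&[2, 3]));
    let grad_bias = unbroadcast(&head, &[1, 3]);
    assert_eq!(grad_bias.shape(), &[1, 3]);

    // The matmul output had shape [2, 1]; its gradient reduces the same way.
    let grad_matmul = unbroadcast(&head, &[2, 1]);
    assert_eq!(grad_matmul.shape(), &[2, 1]);
}
```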
What this shows is that when the add operation broadcasts, the gradients on the arguments do not have the right shape. The above program shows this because the gradient it computes for x has a different shape than the actual tensor x.
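The commenter's program is not shown on this page; the following is a hypothetical sketch of the kind of check that would surface this, assuming the same burn 0.5-era API as above (including the Option-returning grad accessor). With a bare broadcasting add there is no subsequent matmul to panic, so the backward pass completes and the bad gradient shape can be observed directly:

```rust
use burn::tensor::{Distribution, Shape, Tensor};
use burn_autodiff::ADBackendDecorator;
use burn_ndarray::NdArrayBackend;

type B = ADBackendDecorator<NdArrayBackend<f32>>;

fn main() {
    let x = Tensor::<B, 2>::random(Shape::new([2, 1]), Distribution::Standard);
    let bias = Tensor::<B, 2>::random(Shape::new([1, 3]), Distribution::Standard);

    // Bare broadcasting add: [2, 1] + [1, 3] -> [2, 3].
    let loss = x.add(&bias).sum();
    let grads = loss.backward();

    if let Some(grad_x) = x.grad(&grads) {
        // Expected [2, 1]; with the bug the gradient instead keeps the
        // broadcast shape [2, 3].
        println!("x: {:?}, grad_x: {:?}", x.shape(), grad_x.shape());
    }
}
```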
@nathanielsimard when you have a chance, let's triage this bug. We need to understand its impact on users and its severity. Maybe this bug no longer exists because of your recent changes and fixes.