Please clarify interpolation algorithm for resample2d #358
Comments
Link to #270
BruceDai: You want to sample pixel centers, not the top-left corner of each pixel (which would shift the output image slightly), and you want the output to be reflection invariant.
For bilinear sampling
Starting from a given output coordinate, compute the corresponding input coordinate. The integer part of the input coordinate gives the location of the 4 input pixels to sample, and the fractional part gives the weights for a linear interpolation between those four adjacent pixels. E.g., given a 1-pixel-tall (1D) image being stretched horizontally, if the mapped input coordinate was x=10.25, then the output pixel would weigh 75% from input pixel x=10 and 25% from input pixel x=11. For 2D, you apply the same linear weighting vertically too. In ONNX, it is
In TF2, it looks like:
import tensorflow as tf
# NCHW
x = tf.constant(
[[[
[ 0, 1, 2, 3],
[ 0, 1, 2, 3],
[12,13,14,15],
[12,13,14,15]
]]],
dtype=tf.float32
)
x_nhwc = tf.transpose(x, perm=[0, 2, 3, 1])
y_nhwc = tf.image.resize(
x_nhwc,
size=(8,8),
method=tf.image.ResizeMethod.BILINEAR,
preserve_aspect_ratio=False,
antialias=False,
name=None
)
y = tf.transpose(y_nhwc, perm=[0, 3, 1, 2])
print("x\n", x, sep='')
print("y\n", y, sep='')
# x
# tf.Tensor(
# [[[[ 0. 1. 2. 3.]
# [ 0. 1. 2. 3.]
# [12. 13. 14. 15.]
# [12. 13. 14. 15.]]]], shape=(1, 1, 4, 4), dtype=float32)
# y
# tf.Tensor(
# [[[[ 0. 0.25 0.75 1.25 1.75 2.25 2.75 3. ]
# [ 0. 0.25 0.75 1.25 1.75 2.25 2.75 3. ]
# [ 0. 0.25 0.75 1.25 1.75 2.25 2.75 3. ]
# [ 3. 3.25 3.75 4.25 4.75 5.25 5.75 6. ]
# [ 9. 9.25 9.75 10.25 10.75 11.25 11.75 12. ]
# [12. 12.25 12.75 13.25 13.75 14.25 14.75 15. ]
# [12. 12.25 12.75 13.25 13.75 14.25 14.75 15. ]
# [12. 12.25 12.75 13.25 13.75 14.25 14.75 15. ]]]], shape=(1, 1, 8, 8), dtype=float32)
The sampling pattern should look evenly distributed like: (figure not preserved)
And not like: (figure not preserved)
For nearest neighbor sampling
I don't see any details in https://www.w3.org/TR/webnn/#api-mlgraphbuilder-resample2d, but I presume/propose you just round to nearest, with X.5 halves rounding toward negative infinity (which is a common default in graphics). So an input coordinate of x = 10.4 would read from x = 10, x = 10.9 from x = 11, and x = 10.5 from x = 10.
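Both rules described above can be sketched in plain Python for a single row. This is only an illustration under stated assumptions, not the WebNN, ONNX, or TensorFlow implementation: the helper names (`output_to_input`, `resample_linear_1d`, `round_half_down`, `resample_nearest_1d`) are hypothetical, the coordinate mapping is the pixel-center ("half pixel") form, and out-of-range coordinates are assumed to clamp to the nearest edge pixel.

```python
import math


def output_to_input(out_index, scale):
    """Map an output pixel index to a continuous input coordinate (pixel centers)."""
    return (out_index + 0.5) / scale - 0.5


def resample_linear_1d(row, out_size):
    """Linear resample of one row; edge clamping is an assumption of this sketch."""
    in_size = len(row)
    scale = out_size / in_size
    out = []
    for i in range(out_size):
        x = output_to_input(i, scale)
        x = min(max(x, 0.0), in_size - 1)  # clamp to the valid coordinate range
        x0 = math.floor(x)                 # left neighbor
        x1 = min(x0 + 1, in_size - 1)      # right neighbor, clamped at the edge
        t = x - x0                         # fractional part = weight of x1
        out.append(row[x0] * (1 - t) + row[x1] * t)
    return out


def round_half_down(x):
    """Round to nearest; exact halves go toward negative infinity."""
    return math.ceil(x - 0.5)


def resample_nearest_1d(row, out_size):
    """Nearest-neighbor resample of one row using round_half_down."""
    in_size = len(row)
    scale = out_size / in_size
    out = []
    for i in range(out_size):
        x = output_to_input(i, scale)
        idx = min(max(round_half_down(x), 0), in_size - 1)
        out.append(row[idx])
    return out


print(resample_linear_1d([0, 1, 2, 3], 8))
print(resample_nearest_1d([0, 1, 2, 3], 8))
```

For the row `[0, 1, 2, 3]` upsampled to 8 samples, the linear version yields 0, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3, which agrees with the first row of the TF2 output above, and the nearest version yields 0, 0, 1, 1, 2, 2, 3, 3, which reads the same when the row is mirrored. A 2D resample would apply the same steps along the vertical axis as well.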
@fdwr Thanks very much for your clarifications. Please take a look at the resample2d implementations for WebNN-Baseline.
Just want to say Thank You to @fdwr and @BruceDai for your attention to detail in this and other issues. We can include figures in the specification similar to those referred to in this issue if/when they help explain the algorithms. Any figures are considered informative and complement the normative algorithms, i.e. they cannot supplant them. You can consider this idea an extra "enhancement" effort. The icing on the cake. As a concrete example, inline SVG works for smaller and simpler figures: example and source. A good property of SVG is that it allows links and, as a text-based format, allows for human-readable diffs.
We discussed this in https://www.w3.org/2023/05/11-webmachinelearning-minutes.html#t11 and it looks like this would be an opportunity for an interested expert and WG participant (@fdwr maybe? :-)) to propose a PR that clarifies the expected semantics of these two interpolation algorithms:
Normative prose would go into
There are two interpolation modes for resample2d:
@huningxin @wchao1115 @zolkis Please clarify the interpolation algorithm for each of them, thanks.