Please clarify interpolation algorithm for resample2d #358

Open
BruceDai opened this issue Feb 28, 2023 · 5 comments

@BruceDai
Contributor

There are two interpolation modes for resample2d:

  1. Nearest Neighbor interpolation
  2. Linear interpolation

@huningxin @wchao1115 @zolkis Please clarify the interpolation algorithm for each of them, thanks.

@BruceDai
Contributor Author

Link to #270

@fdwr
Collaborator

fdwr commented Mar 1, 2023

BruceDai: You want to sample pixel centers, not the top left of each pixel (which would undesirably shift the output image slightly), and you want the output to be reflection invariant.

For bilinear sampling

scale.x = outputSize.x / inputSize.x
scale.y = outputSize.y / inputSize.y
inputCoordinate.x = (outputCoordinate.x + 0.5) / scale.x - 0.5
inputCoordinate.y = (outputCoordinate.y + 0.5) / scale.y - 0.5

Starting from a given output coordinate, compute the corresponding input coordinate. The integer part of the input coordinate gives the location of the 4 input pixels to sample, and the fractional part gives the weights for a linear interpolation between those adjacent pixels. e.g. given a 1-pixel-tall image being stretched horizontally, if the mapped input coordinate is x = 10.25, then the output pixel is weighted 75% from input pixel x = 10 and 25% from input pixel x = 11. For 2D, you apply the same linear weighting vertically too.
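
For reference, here is a minimal NumPy sketch of that mapping and blending. The helper name resample2d_linear and the edge-clamping behavior are illustrative assumptions, not taken from the WebNN spec:

import numpy as np

def resample2d_linear(x, out_h, out_w):
    # Half-pixel bilinear resampling of a 2D array, clamping reads at the edges.
    x = np.asarray(x, dtype=np.float64)
    in_h, in_w = x.shape
    scale_y, scale_x = out_h / in_h, out_w / in_w
    out = np.empty((out_h, out_w))
    for oy in range(out_h):
        iy = (oy + 0.5) / scale_y - 0.5        # map output pixel center into input space
        y0 = int(np.floor(iy))
        fy = iy - y0                            # fractional part = vertical weight
        y0c, y1c = np.clip([y0, y0 + 1], 0, in_h - 1)
        for ox in range(out_w):
            ix = (ox + 0.5) / scale_x - 0.5
            x0 = int(np.floor(ix))
            fx = ix - x0                        # fractional part = horizontal weight
            x0c, x1c = np.clip([x0, x0 + 1], 0, in_w - 1)
            top = (1 - fx) * x[y0c, x0c] + fx * x[y0c, x1c]
            bottom = (1 - fx) * x[y1c, x0c] + fx * x[y1c, x1c]
            out[oy, ox] = (1 - fy) * top + fy * bottom
    return out

# resample2d_linear([[0,1,2,3], [0,1,2,3], [12,13,14,15], [12,13,14,15]], 8, 8)
# reproduces the 8x8 output of the TF2 example below.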

In ONNX, it is Resize with coordinate_transformation_mode = half_pixel.
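
As a rough sketch of that ONNX mapping (the graph/tensor names are arbitrary and the pin to opset 13 is an assumption), an equivalent Resize node could be built with the onnx helper API:

import onnx
from onnx import helper, TensorProto

# Single Resize node that upsamples a 1x1x4x4 float tensor to 1x1x8x8
# with linear interpolation and half_pixel coordinate transformation.
sizes = helper.make_tensor("sizes", TensorProto.INT64, [4], [1, 1, 8, 8])
resize = helper.make_node(
    "Resize",
    inputs=["X", "", "", "sizes"],   # roi and scales left empty; sizes supplied
    outputs=["Y"],
    mode="linear",
    coordinate_transformation_mode="half_pixel",
)
graph = helper.make_graph(
    [resize], "resize_half_pixel",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 1, 4, 4])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 1, 8, 8])],
    initializer=[sizes],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)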

In TF2, it looks like:

import tensorflow as tf

# NCHW
x = tf.constant(
        [[[
           [ 0, 1, 2, 3],
           [ 0, 1, 2, 3],
           [12,13,14,15],
           [12,13,14,15]
        ]]],
        dtype=tf.float32
    )
x_nhwc = tf.transpose(x, perm=[0, 2, 3, 1])

y_nhwc = tf.image.resize(
    x_nhwc,
    size=(8,8),
    method=tf.image.ResizeMethod.BILINEAR,
    preserve_aspect_ratio=False,
    antialias=False,
    name=None
)
y = tf.transpose(y_nhwc, perm=[0, 3, 1, 2])
print("x\n", x, sep='')
print("y\n", y, sep='');

#   tf.Tensor(
#   [[[[ 0.  1.  2.  3.]
#      [ 0.  1.  2.  3.]
#      [12. 13. 14. 15.]
#      [12. 13. 14. 15.]]]], shape=(1, 1, 4, 4), dtype=float32)
#   y
#   tf.Tensor(
#   [[[[ 0.    0.25  0.75  1.25  1.75  2.25  2.75  3.  ]
#      [ 0.    0.25  0.75  1.25  1.75  2.25  2.75  3.  ]
#      [ 0.    0.25  0.75  1.25  1.75  2.25  2.75  3.  ]
#      [ 3.    3.25  3.75  4.25  4.75  5.25  5.75  6.  ]
#      [ 9.    9.25  9.75 10.25 10.75 11.25 11.75 12.  ]
#      [12.   12.25 12.75 13.25 13.75 14.25 14.75 15.  ]
#      [12.   12.25 12.75 13.25 13.75 14.25 14.75 15.  ]
#      [12.   12.25 12.75 13.25 13.75 14.25 14.75 15.  ]]]], shape=(1, 1, 8, 8), dtype=float32)

The sampling pattern should look evenly distributed like:

[Figure: evenly distributed half-pixel sampling grid, from https://jricheimer.github.io/tensorflow/2019/02/11/resize-confusion/]

And not like:

[Figure: unevenly distributed sampling pattern]

For nearest neighbor sampling

I don't see any details in https://www.w3.org/TR/webnn/#api-mlgraphbuilder-resample2d, but I presume/propose you just round to nearest with X.5 halves toward negative infinity (a common default in graphics). So an input coordinate of x = 10.4 would read from x = 10, x = 10.9 from x = 11, and x = 10.5 from x = 10. In other words, x = ceil(x - 0.5), not x = floor(x + 0.5). Note this differs from the classic "round halves up" mode used in banking, and it differs from rounding halves to nearest even (which would give a bad staggered appearance).
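
A minimal NumPy sketch of this proposal, using the same half-pixel mapping as the bilinear case (again, the helper name resample2d_nearest and the edge clamping are illustrative assumptions):

import numpy as np

def resample2d_nearest(x, out_h, out_w):
    # Half-pixel nearest-neighbor resampling; halves round toward negative infinity.
    x = np.asarray(x, dtype=np.float64)
    in_h, in_w = x.shape
    scale_y, scale_x = out_h / in_h, out_w / in_w
    out = np.empty((out_h, out_w))
    for oy in range(out_h):
        iy = (oy + 0.5) / scale_y - 0.5
        sy = int(np.clip(np.ceil(iy - 0.5), 0, in_h - 1))   # ceil(v - 0.5): 10.5 -> 10
        for ox in range(out_w):
            ix = (ox + 0.5) / scale_x - 0.5
            sx = int(np.clip(np.ceil(ix - 0.5), 0, in_w - 1))
            out[oy, ox] = x[sy, sx]
    return out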

@BruceDai
Contributor Author

BruceDai commented Mar 7, 2023

@fdwr Thanks so much for your clarifications; please take a look at the resample2d implementations for WebNN-Baseline.

@anssiko
Member

anssiko commented Mar 8, 2023

Just want to say Thank You to @fdwr and @BruceDai for your attention to detail in this and other issues.

We can include in the specification figures similar to those referred to in this issue if/when they help explain the algorithms. Any figures are considered informative and complement the normative algorithms, i.e. they cannot supplant them. You can consider this idea an extra "enhancement" effort, the icing on the cake.

As a concrete example, inline SVG works for smaller and simpler figures: example and source. A good property of SVG is that it allows links, and of course, as a text-based format, it allows for human-readable diffs.

@anssiko
Member

anssiko commented May 19, 2023

We discussed this in https://www.w3.org/2023/05/11-webmachinelearning-minutes.html#t11 and it looks like this would be an opportunity for an interested expert and WG participant (@fdwr maybe? :-)) to propose a PR that clarifies the expected semantics of these two interpolation algorithms:

enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

Normative prose would go into the "Arguments: > mode" part of the resample2d() method section, and informative content such as figures could be added to a green note box following "Returns:", similar to the "can be generically emulated" boxes in various decomposable ops.
