Signal an error when number of index argument exceeds array dimension #418

ukoethe · 2017-09-16T20:37:02Z

Currently, index access to arrays and ufuncs via operator() implements very permissive semantics: When passing too many arguments, only the last D are used, where D is the dimension. I'm convinced that this will lead to errors that are hard to debug. Instead, the function should throw an exception (if D is only known at runtime) or a static_assert (if D is known at compile time). I can't see any negative side effects of this change.

The text was updated successfully, but these errors were encountered:

JohanMabille · 2017-09-16T20:54:48Z

Actually, assuming a is a 2D expression and b is a 1D expression, the following code should be valid:

auto expr = a + b;
double res = expr(2, 3); // This is computed as a(2, 3) + b(2, 3);

More formally, we want the following to be true, whatever the dimensions of a and b are:

(a+b)(i0, ..., in) = a(i0, ..., in) + b(i0, ..., in);

This is not possible if we forbid more arguments than the dimension.

ukoethe · 2017-09-16T21:00:30Z

Then the cleanest solution would be to provide two functions:

b(2, 3) signals an error
b.broadcast(2, 3) implements the current behavior and makes explicit that this is intended (maybe find a shorter name).

ukoethe · 2017-09-16T21:02:29Z

BTW, numpy also signals an error when too many indices are provided and still manages to implement broadcasting.

ukoethe · 2017-09-16T21:06:06Z

Another solution to the broadcasting problem worked successfully in my vigra2 prototype: Embed the smaller array in an array view with the correct dimension and set the stride of the missing dimensions to 0. This has the additional advantage that one can broadcast over any dimension, not just the first.

JohanMabille · 2017-09-16T21:26:06Z

Numpy implements broadcasting for expression assignment only, it does not provide unevaluated expressions with evaluation upon element access (which is what we're doing in the example that I posted above).

When assigning an expression, the broadcasting mechanism in xtensor is different from the one described here; we use iterators and steppers instead of the strides index scheme to avoid computing inner product; numpy uses a similar one with pointer arithmetic.

I don't really see how allowing more index arguments than the dimension can be hard to debug, as long as the behavior is specified in the documentation (which is something we should really do).

Also I'm not sure to get what you mean about broadcasting over any dimension and not just the first ones. Do you mean something like the following?

xarray<double> a({3, 1, 4});
xarray<double> b({3, 3, 4});
auto expr = a + b;
double res = expr(2, 2, 2);

That is, broadcasting w.r.t. to the second dimension ?

ukoethe · 2017-09-16T21:44:37Z

Do you mean broadcasting w.r.t. to the second dimension ?

Yes. But I suppose this already works?

"Hard to debug" refers to the case where one accidentaly writes more indices than intended, and the code silently produces incorrect results. If the results are not too far off, it may take a long time until one even notices that something is wrong.

JohanMabille · 2017-09-16T22:02:08Z

About the broadcasting over any dimension, indeed that already works.

Considering an additional broadcasting method, we would have

(a+b).broadcast(i0, ..., in) == a.broadcast(i0, ..., in) + b.broadcast(i0, ..., in);

but (a+b)(i0, ...., in) may throw if a and b do not have the same dimension. Thus if I write some code that must work with any kind of expression, I will end up using the broadcast method everywhere instead of the operator(), to be sure I don't have any exception.

If the problem is debugging only, maybe a better solution would be to turn on exceptions according to a compilation flag (just like we do for bounds checking), or add an element access method that throws when it is provided too many arguments. Besides, I find it more "C++ish" (sorry for this barbarism), where the operator() can be very permissive, and other functions (like at) perform additional checks and may throw.

SylvainCorlay · 2017-09-16T22:55:12Z

The desired behavior is that if too many indices are provided, we drop the first until we reach the right number. For containers, it is to prepend with zeros until we reach the right number. We narrowed this recently and some expressions still don't implement correctly the case where too few arguments are provided, (namely xview and xgenerator).

This is the only rule that guarantees that (a + b)(i0, ..., in) = a(i0, ..., in) + b(i0, ..., in) with the broadcasting rules when a and b have different dimensions.

I feel very strongly that this is how it should be across the library and that throwing exceptions on dimension mismatch is actually a mistake

ukoethe · 2017-09-16T22:56:02Z

but (a+b)(i0, ...., in) may throw if a and b do not have the same dimension.

I didn't mean to be that strict. (a+b)(i0, ...., in) would only throw if n was larger than the maximum dimension of a and b (or if the shapes were not compatible). Thus, it would be implemented as

(a+b)(i0, ..., in) == a.broadcast(i0, ..., in) + b.broadcast(i0, ..., in)

after checking that the number of indices on the LHS is valid.

maybe a better solution would be to turn on exceptions according to a compilation flag

That's also a possibility. I generally follow this rule: if the check doesn't significantly affect performance, turn it on by default, otherwise have the user turn it on explicitly. One should not forget that the C/C++ tradition of having very permissive functions originated in a time when computers were really slow. We don't have to continue like this just for historical reasons when the performance argument is no longer valid (e.g., a bounds check may incur no performance hit at all if the CPU pipeline gets branch prediction right). Every diagnostic is helpful for the user!

ukoethe · 2017-09-16T22:58:32Z

For containers, it is to prepend with zeros until we reach the right number.

I don't argue against this rule at all. I just refer to the case when there are too many indices (meaning: more than the largest dimension in the expression) on the outermost expression/access.

JohanMabille · 2017-09-17T21:04:05Z

Even with the less strict implementation

(a+b)(i0, ..., in) == a.broadcast(i0, ..., in) + b.broadcast(i0, ..., in)

writing code that must work with any kind of expressions, whatever the dimension, will end up using the broadcast method everywhere. One of the use case we have is a 2D PDE solver where coeffecients may be constant over a direction (or even the two). In that case, to save memory space, we use 1D (or 0D) expressions but access them as if they were 2D (so the same code works with every kind of coefficients).

About the C++ tradition, my point was more on the intuitive part of it: having used the STL, we expect operator[] / operator() of containers to not check their arguments, and to have dedicated functions for that (the counterpart of the vector<T>::at function). I find it more intuitive to keep this scheme, that is operator() / operator[] does not throw, specific method such at or safe_access do. Thus the user has the choice, he can work with the safe API if he needs to debug or if this does not hit performances on his machine, or with the one guaranteed to be the fastest.

Every diagnostic is helpful for the user!

I strongly agree with you

So here is what I propose to do for fixing that:

add an at method that checks dimension and bounds
add a compilation flag to activate dimension checking for operator(), for debugging purpose.

ukoethe · 2017-09-18T15:12:37Z

Thanks a lot. One can now even compare the speed with and w/o ckecking code activated and eventually keep the check active at places were it doesn't hurt performance or usability.

JohanMabille · 2017-09-18T15:21:20Z

I forgot to mention, checking the dimension requires to compute the shape of the expression, which may add some overhead for some kinds of expressions where the shape is computed the first time it is asked.

This won't cause any performance hurt in most of the cases since the shape will be evaluated when assigning the expression, however this could give strange results in benchmarks.

ukoethe · 2017-09-18T15:25:20Z

If the expression only involves arrays whose dimension is known at compile time, the dimension check can be done in a metaprogramm followed by static_assert and costs nothing at runtime. This is probably the most powerful variant of the check I can think of.

ukoethe · 2017-09-18T15:29:32Z

This is especially useful because arrays with static dimensions are anyway the preferrable alternative in many situations.

SylvainCorlay added the Feature Request label Sep 17, 2017

JohanMabille mentioned this issue Sep 18, 2017

safe access method #420

Merged

JohanMabille closed this as completed in #420 Sep 18, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Signal an error when number of index argument exceeds array dimension #418

Signal an error when number of index argument exceeds array dimension #418

ukoethe commented Sep 16, 2017

JohanMabille commented Sep 16, 2017

ukoethe commented Sep 16, 2017

ukoethe commented Sep 16, 2017 •

edited

Loading

ukoethe commented Sep 16, 2017

JohanMabille commented Sep 16, 2017 •

edited

Loading

ukoethe commented Sep 16, 2017 •

edited

Loading

JohanMabille commented Sep 16, 2017 •

edited

Loading

SylvainCorlay commented Sep 16, 2017

ukoethe commented Sep 16, 2017

ukoethe commented Sep 16, 2017 •

edited

Loading

JohanMabille commented Sep 17, 2017 •

edited

Loading

ukoethe commented Sep 18, 2017

JohanMabille commented Sep 18, 2017

ukoethe commented Sep 18, 2017 •

edited

Loading

ukoethe commented Sep 18, 2017

Signal an error when number of index argument exceeds array dimension #418

Signal an error when number of index argument exceeds array dimension #418

Comments

ukoethe commented Sep 16, 2017

JohanMabille commented Sep 16, 2017

ukoethe commented Sep 16, 2017

ukoethe commented Sep 16, 2017 • edited Loading

ukoethe commented Sep 16, 2017

JohanMabille commented Sep 16, 2017 • edited Loading

ukoethe commented Sep 16, 2017 • edited Loading

JohanMabille commented Sep 16, 2017 • edited Loading

SylvainCorlay commented Sep 16, 2017

ukoethe commented Sep 16, 2017

ukoethe commented Sep 16, 2017 • edited Loading

JohanMabille commented Sep 17, 2017 • edited Loading

ukoethe commented Sep 18, 2017

JohanMabille commented Sep 18, 2017

ukoethe commented Sep 18, 2017 • edited Loading

ukoethe commented Sep 18, 2017

ukoethe commented Sep 16, 2017 •

edited

Loading

JohanMabille commented Sep 16, 2017 •

edited

Loading

ukoethe commented Sep 16, 2017 •

edited

Loading

JohanMabille commented Sep 16, 2017 •

edited

Loading

ukoethe commented Sep 16, 2017 •

edited

Loading

JohanMabille commented Sep 17, 2017 •

edited

Loading

ukoethe commented Sep 18, 2017 •

edited

Loading