Skip to content

Abort array/tensor construction if pyobject is not valid#176

Merged
JohanMabille merged 1 commit intoxtensor-stack:masterfrom
PerretB:py_container_from_null
Nov 5, 2018
Merged

Abort array/tensor construction if pyobject is not valid#176
JohanMabille merged 1 commit intoxtensor-stack:masterfrom
PerretB:py_container_from_null

Conversation

@PerretB
Copy link
Contributor

@PerretB PerretB commented Nov 4, 2018

This PR fixes #174.

Currently when pybind11 tries to cast a pyobject, that is not compatible with the ndarray interface, into a pyarray (or a pytensor), the custom type caster ends up building a pyarray based on an invalid pyobject (pointing to null). This leads to a segfault when init_from_python is called in the contructor.

The proposed solution is to abort the function init_from_python if the underlying pyobject is invalid.

Note that the constructed pyarray object will itself be invalid, which is consistent with the base class pyobject but probably not with the xtensor usual behavior. Another possibility could be to throw an exception or to change the validation logic of the type caster.

@JohanMabille
Copy link
Member

JohanMabille commented Nov 5, 2018

Thanks for tackling this!

I think we should mimic the behavior of pybind11 / python and throw an exception in the constructor if the argument type mismatches, as you propose as an alternative. Aborting the execution of init_from_python function will delay the segfault to a future call of operator() for instance, and that makes it more difficult to debug than signaling the error as soon as it happens, what do you think?

EDIT: also I think it could be nice to add a test for this use case so we are notified in the future if we break this behavior.

@PerretB
Copy link
Contributor Author

PerretB commented Nov 5, 2018

In fact, pybind11 does not expect an exception if the cast fails: the caster should just returns false. So from the point of view of pybind there is absolutely no problem and the incomplete pyarray will never be used. That could only be a problem for a user trying to create a pyarray from an invalid pyobject in c++ code without going through the caster logic.

If we throw an exception from the pyarray constructor, it must be caught by the caster (by the way, the same should be done about pytensor throwing when the dimension does not match as it breaks pybind11 overload resolution process): this is simple to do but I am not confident enough in xtensor design to tell if that can produce a memory leak ?

The other solution is to move away from the current optimistic approach of the caster by checking if the pyobject is suitable before trying to construct the pyarray: this solution will perhaps be a bit slower when the conversion is successful as there will be double checks (that won't solve the issue of someone trying to create a pyarray from an invalid pyobject in c++ code).

@JohanMabille
Copy link
Member

JohanMabille commented Nov 5, 2018

Ok so we have two things here: the behavior of a call from Python with the wrong argument type, and the behavior of the constructor of pyarray from C++ with an invalid pyobject.

Regarding the pure C++ behavior, having a segmentation fault due to an invalid initialization is fine, it is expected that you can shoot yourself in the foot if you're not careful enough.

Regarding the Python behavior, what I meant with "mimic pybind11 behavior" is having the same behavior as you explain in #174: calling a function expecting a pyarray with something else should trigger the following exception:

TypeError: foo(): incompatible function arguments. The following argument types are supported: 1. (arg0: numpy.ndarray[float32]) -> int

I don't recall all the internal of pybindd11, thus my proposal of throwing from the constructor since we already check if the type is compatible.

@PerretB
Copy link
Contributor Author

PerretB commented Nov 5, 2018

So the current fix is compliant with what you describe:

  • in c++, if the user performs an invalid initialization, he will get an invalid object;
  • in python, if the user tries to call a method with an incompatible object, pybind11 will be notified that the call is not possible and will eventually, if no other valid overloaded definitions exists, raise a TypeError exception.

However, raising an exception from the c++ code without catching it before returning to pybind11 in the type caster would be an error as it would break pybind11 overload resolution mechanism.
I will create an issue regarding this problem with current pytensor constructor: this can be fixed in this PR too.

Note that the check_array method https://github.com/QuantStack/xtensor-python/blob/3ebbcc40d86180f1b1741e4642df403669bcef73/include/xtensor-python/pyarray.hpp#L54 is not called in normal (without overload) name resolution as pybind11 will tell the type caster to allow conversions (convert is true) by default. If the method is overloaded, pybind11 will first try to find a perfect match, thus without any argument conversion (in this context ndarray[double] -> pyarray[double] is not considered as a conversion while ndarray[float] -> pyarray[double] is one), and, if this fails, it will try a second pass with conversions allowed.

Thus, I think that it's a matter of style preference: personally I don't like to throw exceptions from constructors, but I can change the PR in that way if you prefer.

@JohanMabille
Copy link
Member

Ok, thanks for the clarification and all the details. I have the same preference as you regarding throwing exceptions form constructors, I thought it was the only way to get the same behavior as pybind11 (as I said previously, I don't remember all the internal of pybind11).

@JohanMabille JohanMabille merged commit d11799f into xtensor-stack:master Nov 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segfault on failed dispatching of an overrided method involving a pyarray argument

2 participants