Request) Make the rarray SEXP constructor error if incompatible types are passed #59

DavisVaughan · 2018-11-10T16:31:27Z

This has been a bug on my part that I struggled with for a few days now, and have just figured out.

I had a simple function that took in a SEXP that is automatically converted to a rarray<double>, and then just returns it. If I passed in c(1, 2, 3) everything worked fine. If I passed in 1:3 it gave garbage results like c(1e-314, 1e-314, 1e-314).

This is a result of me being dumb and not remembering that c(1, 2, 3) is a numeric (double) vector, and 1:3 is an integer vector. So passing along the integer SEXP results in garbage when it's converted to rarray<double>.
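A minimal sketch of where values like 1e-314 can come from, assuming the integer buffer is simply read through a double pointer (the thread does not show the actual code path in xtensor-r). The helper name `first_double_from_ints` is hypothetical. It copies the first eight bytes of an int32 buffer into a double, so two small integers end up as the bit pattern of a tiny subnormal number.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the first 8 bytes of an int32 buffer as a
// single double. This mimics reading an integer vector's memory as if it held
// doubles; it is not code from xtensor-r.
double first_double_from_ints(const std::int32_t* ints) {
    static_assert(sizeof(double) == 2 * sizeof(std::int32_t),
                  "sketch assumes 64-bit double and 32-bit int");
    double out;
    std::memcpy(&out, ints, sizeof(double));  // well-defined way to copy raw bits
    return out;
}
```

With the buffer {1, 2, 3, 0}, the exponent bits of the resulting double are all zero, so the value is a positive subnormal on the order of 1e-314 rather than anything resembling 1.0.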

An example of all of this is in the readme here (just look at the calls to identity_cpp()):
https://github.com/DavisVaughan/xtensorfailure

With the identity function here:
https://github.com/DavisVaughan/xtensorfailure/blob/master/src/example.cpp#L7

Would it be possible to check that the type T matches up with the type of the R object provided, and throw an error if not? I think you could probably do this in the rarray and rtensor constructors, where you could maybe compare SXP as you have defined it against TYPEOF(SEXP_object_to_convert) and throw an error if they are different?
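A rough sketch of the check being requested, written against a mock object so that it compiles without the R headers. `MockSexp`, `expected_type`, and `check_type` are hypothetical names, not the xtensor-r or Rcpp API. The type codes use the same numeric values R assigns to INTSXP, REALSXP, and CPLXSXP. In a real package the comparison would presumably be against TYPEOF(x), and the error would be raised in whatever way the package prefers.

```cpp
#include <complex>
#include <cstdint>
#include <stdexcept>
#include <string>

// Mock type codes, using the numeric values R uses for these vector types.
enum MockSexpType : int { MOCK_INTSXP = 13, MOCK_REALSXP = 14, MOCK_CPLXSXP = 15 };

// Stand-in for an R object: it only carries the runtime type tag.
struct MockSexp { MockSexpType type; };

// Compile-time mapping from the C++ element type to the expected type code.
template <class T> struct expected_type;
template <> struct expected_type<std::int32_t> { static constexpr int value = MOCK_INTSXP; };
template <> struct expected_type<double> { static constexpr int value = MOCK_REALSXP; };
template <> struct expected_type<std::complex<double>> { static constexpr int value = MOCK_CPLXSXP; };

// Throw if the runtime tag does not match what the element type T requires.
template <class T>
void check_type(const MockSexp& x) {
    const int expected = expected_type<T>::value;
    const int actual = static_cast<int>(x.type);
    if (actual != expected) {
        throw std::runtime_error("R object has type code " + std::to_string(actual) +
                                 ", but type code " + std::to_string(expected) +
                                 " was expected");
    }
}
```

Calling `check_type<double>` on an object tagged as integer throws, which is the behavior asked for when 1:3 is passed to a function expecting rarray<double>.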

wolfv · 2018-11-10T17:25:18Z

That is a good idea. I was struggling with the same problem before.

I think it should definitely be possible.

eddelbuettel · 2018-11-11T15:46:43Z

We struggle with this in Rcpp land too. The trouble is that we are smushing the lines between compile and run-time. C++ wants a compile time check, but R will only tell us at run-time what the type is. Hence the need for SEXP in interfaces, and dispatch inside the function.

In short, it looks like @DavisVaughan re-invented what we have explained eg in this somewhat classic piece from 2013 by Kevin at the Rcpp Gallery (which is referenced in eg a number of StackOverflow and mailing list answers).

I fear we can't do much better than this.

wolfv · 2018-11-11T16:10:38Z

Hi Dirk, great to see you in this issue -- I was about to ping you for ideas :)

I think we can do a bit better:

1. Disallow arrays with data types that are not double, int(32), bool or complex, as I understand those are the only data types in R (of interest here). E.g. uint64 and others should all be forbidden.
2. When taking in an R array we should perform the dynamic type check. Yes, that's costly, but it's probably not that high of a price. When creating an rarray from C++ we don't need to pay the price as we know the type already. I guess pybind11 and Rcpp differ here: pybind11 does a runtime check of the type, and actually selects a working overload (yes, this can be expensive).
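The first point above could be enforced at compile time. The following is only a sketch: `is_r_element_type` and `checked_container` are hypothetical names, and the allowed list is taken directly from the comment (double, 32-bit int, bool, complex), not from the xtensor-r sources.

```cpp
#include <complex>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true only for element types named in the discussion.
template <class T>
struct is_r_element_type
    : std::integral_constant<bool,
                             std::is_same<T, double>::value ||
                             std::is_same<T, std::int32_t>::value ||
                             std::is_same<T, bool>::value ||
                             std::is_same<T, std::complex<double>>::value> {};

// Hypothetical container shell: instantiating it with an unsupported element
// type fails to compile with a readable message.
template <class T>
struct checked_container {
    static_assert(is_r_element_type<T>::value,
                  "element type has no matching native R vector type");
};
```

With this in place, `checked_container<double>` compiles, while `checked_container<std::uint64_t>` is rejected by the compiler before any R object is involved.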

If I use the TYPEOF macro on a SEXP do I get the element type of the R vector/matrix?

eddelbuettel · 2018-11-11T16:44:38Z

I am hopeful we can do better :) So far we haven't.

Type restrictions are fine. The list is what R has. (For int64_t we cheat via "overloaded" interpretation, nothing has been written for uint64_t :-( ).

pybind11 is on my list of things to look at, but as I hardly use Python that list only gets push_back and no pop_front :-( So I'm not sure how it differs.

Yup. TYPEOF is your friend. Should be in a number of Rcpp Gallery posts, StackOverflow answers and of course the Rcpp sources. It is an R macro, as R started this business with the "union-alike" SEXP.

dselivanov · 2018-11-11T16:53:15Z

A bit offtopic.

For int64_t we cheat via "overloaded" interpretation

For 32 bit float there is a float pkg which takes same "cheating" approach - use R's integer vectors as storage. I use float pkg with Armadillo "mapped" matrices without any issue. So I believe we can have interoperability of the float and xtensor.

wolfv · 2018-11-11T17:53:19Z

@dselivanov interesting. This approach would make the proposed type checking completely useless, right? Because we couldn't detect a "float" vector (which is actually an int32 vector to R)?

I think for this kind of advanced R usage, we on the xtensor team are in need of advanced R people! It would be awesome if you could help out @dselivanov :)

eddelbuettel · 2018-11-11T17:56:40Z

No, no. It's a side issue, just like my mention of the integer64 hack.

Fact: R has int32, double, complex, bool. All support NA (and most support NaN) so bool is three valued just to mess with us :)

Fact: They all travel in/out as SEXP and you can use TYPEOF at run-time to inquire about payload.

Fact: Add-on packages (bit64 for integer, float for float) cheat by sticking 64bit ints into a double, and 32bit floats into an int32.

Fact: None of that helps with rarray.

But mentioning these side-hacks shows different approaches to bending the rules a little.
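As an illustration of the storage "cheat" described in the facts above, here is a sketch of carrying a 64-bit integer inside the bytes of a double. The function names are hypothetical and this is not the code of any add-on package. It only shows that the bits survive a round trip, while the double read as a number is meaningless, which is also why a TYPEOF check alone cannot tell such vectors apart from ordinary doubles.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: store the raw bits of an int64 in a double and read
// them back. The double's numeric value is not meant to be interpreted.
double pack_int64(std::int64_t v) {
    static_assert(sizeof(double) == sizeof(std::int64_t), "sketch assumes 64-bit double");
    double d;
    std::memcpy(&d, &v, sizeof d);
    return d;
}

std::int64_t unpack_int64(double d) {
    std::int64_t v;
    std::memcpy(&v, &d, sizeof v);
    return v;
}
```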

SylvainCorlay · 2018-11-11T18:08:48Z

I also like pybind11's dynamic dispatch approach, although it is not always ideal because of the overhead.

Maybe we could make a dynamic dispatch in xtensor-r for the data type?
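One possible shape for such a dispatch, sketched with a mock tagged vector so that it builds without R. `MockVector`, `sum_impl`, and `sum_dispatch` are hypothetical names. The idea is a single non-template entry point that inspects the runtime type tag and forwards to the matching template instantiation, instead of assuming one element type up front.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Mock type tags, using the numeric values R uses for INTSXP and REALSXP.
enum MockType { MOCK_INTSXP = 13, MOCK_REALSXP = 14 };

// Stand-in for an R vector: a runtime tag plus storage for either payload.
struct MockVector {
    MockType type;
    std::vector<std::int32_t> ints;
    std::vector<double> reals;
};

// The statically typed implementation, written once as a template.
template <class T>
double sum_impl(const std::vector<T>& v) {
    double total = 0.0;
    for (T x : v) total += static_cast<double>(x);
    return total;
}

// Runtime dispatch: choose the instantiation that matches the tag.
double sum_dispatch(const MockVector& x) {
    switch (x.type) {
        case MOCK_INTSXP:  return sum_impl(x.ints);
        case MOCK_REALSXP: return sum_impl(x.reals);
    }
    throw std::invalid_argument("unsupported element type");
}
```

The cost mentioned in the thread is visible here: every call pays for the switch, and every supported type adds another instantiation of the implementation.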

Thanks @eddelbuettel for chiming in!

eddelbuettel · 2018-11-11T18:30:05Z

Happy to help, particularly as "talk is cheap" :)

Dynamic dispatch may be worth it, definitely for an exploration. I am not really sure if that has been tried (and I am not following Arrow all that closely so I am not sure what they do over there). If it works for pybind11, and as you already put the C++14 marker down (which will "eventually" be less of an issue as all compilers catch up) it may be a good route.

wolfv · 2018-11-15T21:11:37Z

Hi @eddelbuettel & @DavisVaughan

I've implemented the safeguards in this PR: https://github.com/QuantStack/xtensor-r/pull/61/files

Do you guys want to quickly review the change? @eddelbuettel is this an appropriate way of raising an error to R?
It seems to work fine!

Regarding what you've been mentioning above... I had a bit of a talk with a Pandas dev who's interested in Arrow, and he mentioned that R implements sentinel values, correct? Is that how NA values are represented for ints and bools? It could be quite cool to support R's way of creating NA values etc. natively using xtensor_optional_assembly and related tools!
I was additionally wondering how the character / string array works in R (with regards to memory layout). Is it a bunch of \0 terminated strings, or do they all have the same buffer length (as they do in NumPy)? I wonder whether it would be trivial to wrap this data type in xtensor, or not ... :)

Cheers & thanks for the help!

eddelbuettel · 2018-11-15T21:39:21Z

That looks good to me, and yes, Rcpp::stop() it is, as we did a number of iterations on that over the years to get stack unwinding etc pp right. Should "Just Work" (TM).

The (super useful) NA and NaN definitions for types other than double are in the R headers.
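A small sketch of what a sentinel-based NA looks like for integers, without including the R headers. It assumes R's convention that the integer NA is the smallest 32-bit int. The constant `kNaInteger` and the helper names are hypothetical stand-ins for the definitions that the R headers provide.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption: mirrors R's integer NA sentinel (the minimum 32-bit int).
constexpr std::int32_t kNaInteger = std::numeric_limits<std::int32_t>::min();

// An element is missing exactly when it equals the sentinel.
inline bool is_na_int(std::int32_t x) { return x == kNaInteger; }

// Count the missing entries in an integer buffer.
std::size_t count_na(const std::vector<std::int32_t>& v) {
    std::size_t n = 0;
    for (std::int32_t x : v) {
        if (is_na_int(x)) ++n;
    }
    return n;
}
```

A wrapper that exposes optional values could build its missing-value mask from a predicate like `is_na_int` instead of storing a separate mask.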

DavisVaughan · 2018-11-16T20:47:47Z

Thanks for implementing this, these merged changes look good!

DavisVaughan changed the title ~~Request) Make the rcontainer SEXP constructor error if incompatible types are passed~~ Request) Make the rarray SEXP constructor error if incompatible types are passed Nov 10, 2018

DavisVaughan closed this as completed Nov 16, 2018
