Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert readxl from Rcpp to cpp11 #659

Merged
merged 32 commits into from Aug 6, 2021
Merged

Conversation

sbearrows
Copy link
Contributor

@sbearrows sbearrows commented Jul 28, 2021

This initial commit contains the first steps for converting from Rcpp to cpp11

Now I'll convert file by file and push each time I've completed a file so we can review.

src/zip.cpp Outdated Show resolved Hide resolved
src/zip.cpp Outdated Show resolved Hide resolved
@sbearrows
Copy link
Contributor Author

@sbearrows sbearrows commented Jul 30, 2021

I had to edit a few other files to define the macros R_NO_REMAP and STRICT_R_HEADERS to get rid of the error message:

R headers were included before cpp11 headers and at least one of R_NO_REMAP or STRICT_R_HEADERS was not defined.

It was easier then try to figure out how to reorganize the header files.

src/CellLimits.h Outdated Show resolved Hide resolved
Copy link
Member

@jimhester jimhester left a comment

There were a few cases where as far as I could tell you were converting an R SEXP to another R SEXP using as_cpp(), which you should never need to do.

The mental model is as_cpp() is used to convert a SEXP to a native C++ type (like std::string, int, etc.) and as_sexp() is used to go the opposite way, e.g. convert a C++ type to an R SEXP.

But in general you should have to call these explicitly somewhat rarely, most of the time they can be done implicitly in the generated registration code.

src/XlsxWorkSheet.cpp Outdated Show resolved Hide resolved
@@ -198,7 +204,7 @@ class XlsxWorkSheet {

xcell->inferType(na, trimWs, wb_.stringTable(), dateFormats_);
CellType type = xcell->type();
Rcpp::RObject column = cols[col];
cpp11::sexp column = cpp11::as_sexp(cols[col]);
Copy link
Member

@jimhester jimhester Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this call to as_sexp()? I would guess no? maybe just this?

Suggested change
cpp11::sexp column = cpp11::as_sexp(cols[col]);
cpp11::sexp column(cols[col]);

Copy link
Contributor Author

@sbearrows sbearrows Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up having to do something similar to what we did with
std::vector<ColType> colTypes = colTypeStrings(static_cast<SEXP>(col_types));
Because I was getting a constructor error but let me know if there is another solution!


// catches empty sheets and sheets where requested rectangle contains no data
if (ws.nrow() == 0 && ws.ncol() == 0) {
return Rcpp::List(0);
cpp11::list x;
Copy link
Member

@jimhester jimhester Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need to be a writable list I think? Otherwise it is going to return NULL instead of list()

Suggested change
cpp11::list x;
cpp11::writable::list();

bool has_col_names = false;
switch(TYPEOF(col_names)) {
case STRSXP:
colNames = as<CharacterVector>(col_names);
colNames = cpp11::as_cpp<cpp11::strings>(col_names);
Copy link
Member

@jimhester jimhester Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need any cast? just colNames = col_names

@@ -47,7 +48,7 @@ List read_xlsx_(std::string path, int sheet_i,
if (TYPEOF(col_types) != STRSXP) {
Rcpp::stop("`col_types` must be a character vector");
}
std::vector<ColType> colTypes = colTypeStrings(as<CharacterVector>(col_types));
std::vector<ColType> colTypes = colTypeStrings(cpp11::as_cpp<cpp11::strings>(col_types));
Copy link
Member

@jimhester jimhester Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure you need a cast here either

src/ColSpec.h Outdated Show resolved Hide resolved
src/CellLimits.h Outdated Show resolved Hide resolved
#include <set>
#include <vector>
Copy link
Member

@jennybc jennybc Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to keep the system includes, e.g. <whatever> together and above the ones defined here, e.g. "utils.h"?

If you agree, that would be a good thing to check globally. I think that is (mostly?) how things started.

Copy link
Contributor Author

@sbearrows sbearrows Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you for reminding me about this.
I rearranged some of the header files to avoid conflicts with Rcpp and cpp11 libraries so I need to reorganize them (or organize them back to how you had them). I ended up using the macros #define R_NO_REMAP and #define STRICT_R_HEADERS instead to avoid the conflicts, which I'll remove since they won't be needed anymore once this PR is closed.

Copy link
Member

@jennybc jennybc Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state seems to be much more regular, but often puts the local headers before the system headers. It feels like a more conventional order would be: system, vendored code (libxls), written-by-us, which also reflects intended precedence. (Let me also acknowledge that this probably doesn't matter and is more a matter of style at this point.)

Copy link
Contributor Author

@sbearrows sbearrows Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes sorry! I should have reread your comment before executing based on my memory. I'll rearrange in that order.

src/XlsWorkSheet.cpp Outdated Show resolved Hide resolved
src/XlsWorkSheet.h Outdated Show resolved Hide resolved
src/XlsxWorkSheet.cpp Outdated Show resolved Hide resolved
src/XlsxWorkSheet.h Outdated Show resolved Hide resolved
tests/testthat/test-read-excel.R Outdated Show resolved Hide resolved
src/StringSet.h Outdated Show resolved Hide resolved
src/ColSpec.h Outdated Show resolved Hide resolved
src/CellLimits.h Outdated Show resolved Hide resolved
Copy link
Member

@jimhester jimhester left a comment

We need to remove the rest of the Rcpp references as well. You can find them with find my files or with git grep Rcpp

  1. Remove Rcpp references from Imports and LinkingTo
  2. Remove the importFrom Rcpp roxygen annotation in R/readxl.R (and run devtools::document() afterwards)
  3. Remove any stray #include "Rcpp.h" directives

You may find some compilation errors, as Rcpp includes a number of standard headers which cpp11 may not. If any of these (like std::vector or std::map etc.) are used in the generated cpp11.cpp code you will need to include them in a special file, src/readxl_types.h which cpp11 will automatically include in the generated file.

@jimhester
Copy link
Member

@jimhester jimhester commented Aug 5, 2021

It looks like the only test still failing is on windows release due to std::assert not being declared, you should be able to #include <cassert> before the file including the libxls headers to fix this.

R/read_excel.R Outdated Show resolved Hide resolved
jimhester and others added 4 commits Aug 5, 2021
I think the Rcpp::stop crash is happening because unwind protect does a
long jump, and long jumping from a constructor is AFAIK undefined
behavior.

I am not clear about the cpp11::warning case, but possibly it could be a similar
issue.
R/read_excel.R Outdated Show resolved Hide resolved
tests/testthat/test-read-excel.R Outdated Show resolved Hide resolved
sbearrows and others added 3 commits Aug 6, 2021
cpp11 automatically translates strings to UTF-8 when you construct a std::string. Rcpp did not do this, and the existing tests assumed they would be converted in the native encoding.

so we do this explicitly using Rf_translateChar()
@jimhester jimhester merged commit 5682328 into tidyverse:master Aug 6, 2021
12 checks passed
@jimhester
Copy link
Member

@jimhester jimhester commented Aug 6, 2021

Thanks a million!

@jennybc
Copy link
Member

@jennybc jennybc commented Aug 6, 2021

Woohoo! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants