-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate R_ObjectTables #2922
Comments
There's also an example here: r-lib/rlang#153 It includes a small hack to turn objectables into generic environments: // This creates a new environment from C
SEXP bare_environment() {
SEXP env = Rf_cons(R_NilValue, R_EmptyEnv);
SET_TYPEOF(env, ENVSXP);
return env;
}
// This sets the HASHTAB of the new environment to the objectable
{
SEXP env_ptr = PROTECT(R_MakeExternalPtr(env_struct, R_NilValue, R_NilValue));
mut_class(env_ptr, string("UserDefinedDatabase"));
SEXP env = PROTECT(bare_environment());
SET_HASHTAB(env, env_ptr);
SEXP classes = PROTECT(Rf_allocVector(STRSXP, 2));
mut_chr_at(classes, 0, r_string("hook_env"));
mut_chr_at(classes, 1, r_string("UserDefinedDatabase"));
mut_class(env, classes);
} I will also look into implementing tidyeval overscoping with objectables, the right abstraction is an environment with two parents rather than two linear scopes chained together. |
@jeroen has strong views against object tables ... but I guess mostly when you |
Here they would be used as an enclosure. I am excited about object tables because they bring genericity to environments. |
See my initial explorations at https://github.com/r-lib/objectable. Performance is quite bad. |
Need to profile. I suspect part of the slowness is because two strings of 24 characters are compared for every objectable access: https://github.com/wch/r-source/blob/fdb530cd87f3c90c3dbd0b09a87878436d2fb3f0/src/main/envir.c#L102 If that turns out to be the problem, maybe Duncan Temple Lang can help bringing a fix to R-core. |
I think it's probably worth spending some time on this independent of dplyr, but even if we can improve performance dramatically, it'll only apply to new versions of R, and we'd need to carefully think through if that's ok. |
I wonder if ALTREP provides a way for implementing what we need here. |
@krlmlr do we have more details about this ? |
FWIW, @ltierney has mentioned he is working on object tables and altrep environments. Perhaps if you carefully write down what it is that you are trying to accomplish he may have some ideas. |
There's an R level wrapper in objectable — we did some performance exploration and found some significant performance overhead. If we want to use in dplyr, we'll need to work with Luke/R-core to identify and fix the bottlenecks. |
There's a trade-off between:
I'd like to to do some benchmarking in the context of jointprof and then decide. |
Yeah. I’ll try to write something. This is about how variables from a grouped data frame are resolved, e.g in a mutate. Now we use active bindings that end up calling a c++ function. Conceptually an object table or the hinted alternative environment make more sense as we would not have to explicitly create the active bindings. It may also be the case that an altrep keeping the full column and the indices and only resolving when needed, but again we would need to explicitly make those. |
Do these compare typical dplyr use cases ? i.e. if we have nc columns we need to make that nc active bindings that end up calling back some c++ code to extract the subset. I’d like to compare that to a use of object table directly calling the c++ code. We don’t need the intermediate R layer, which i imagine is what is expensive. Now we pay for hashing twice, once to find the active binding in the env, and once in the C++ code as part of the Subsets class. |
Handling data frames with a very large number of columns is currently low priority for dplyr. |
What's the limit? How many columns do we want to support well? |
I think we can close this now. |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
instead of bindrcpp.
See RProtoBuf for an example implementation: https://github.com/eddelbuettel/rprotobuf/blob/20cc4ab41b36c9582adeed182f1c429e38df6be4/src/lookup.cpp#L222.
Existing package: http://www.omegahat.net/RObjectTables/.
The text was updated successfully, but these errors were encountered: