Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate R_ObjectTables #2922

Closed
3 tasks
krlmlr opened this issue Jun 27, 2017 · 17 comments
Closed
3 tasks

Investigate R_ObjectTables #2922

krlmlr opened this issue Jun 27, 2017 · 17 comments
Labels
feature a feature request or enhancement

Comments

@krlmlr
Copy link
Member

krlmlr commented Jun 27, 2017

instead of bindrcpp.

  • benchmark
  • robustness
  • drop-in replacement?

See RProtoBuf for an example implementation: https://github.com/eddelbuettel/rprotobuf/blob/20cc4ab41b36c9582adeed182f1c429e38df6be4/src/lookup.cpp#L222.

Existing package: http://www.omegahat.net/RObjectTables/.

@lionel-
Copy link
Member

lionel- commented Jun 28, 2017

See RProtoBuf for an example implementation:

There's also an example here: r-lib/rlang#153

It includes a small hack to turn objectables into generic environments:

// This creates a new environment from C
SEXP bare_environment() {
   SEXP env = Rf_cons(R_NilValue, R_EmptyEnv);
   SET_TYPEOF(env, ENVSXP);
   return env;
 }

// This sets the HASHTAB of the new environment to the objectable
{
  SEXP env_ptr = PROTECT(R_MakeExternalPtr(env_struct, R_NilValue, R_NilValue));
  mut_class(env_ptr, string("UserDefinedDatabase"));

  SEXP env = PROTECT(bare_environment());
  SET_HASHTAB(env, env_ptr);

  SEXP classes = PROTECT(Rf_allocVector(STRSXP, 2));
  mut_chr_at(classes, 0, r_string("hook_env"));
  mut_chr_at(classes, 1, r_string("UserDefinedDatabase"));
  mut_class(env, classes);
}

I will also look into implementing tidyeval overscoping with objectables, the right abstraction is an environment with two parents rather than two linear scopes chained together.

@romainfrancois
Copy link
Member

@jeroen has strong views against object tables ... but I guess mostly when you attach them

@lionel-
Copy link
Member

lionel- commented Jun 28, 2017

Here they would be used as an enclosure. I am excited about object tables because they bring genericity to environments.

@hadley
Copy link
Member

hadley commented Aug 23, 2017

See my initial explorations at https://github.com/r-lib/objectable. Performance is quite bad.

@hadley hadley added data frame feature a feature request or enhancement labels Aug 23, 2017
@krlmlr
Copy link
Member Author

krlmlr commented Jan 24, 2018

Need to profile. I suspect part of the slowness is because two strings of 24 characters are compared for every objectable access: https://github.com/wch/r-source/blob/fdb530cd87f3c90c3dbd0b09a87878436d2fb3f0/src/main/envir.c#L102

If that turns out to be the problem, maybe Duncan Temple Lang can help bringing a fix to R-core.

@hadley
Copy link
Member

hadley commented Jan 24, 2018

I think it's probably worth spending some time on this independent of dplyr, but even if we can improve performance dramatically, it'll only apply to new versions of R, and we'd need to carefully think through if that's ok.

@krlmlr
Copy link
Member Author

krlmlr commented Feb 5, 2018

I wonder if ALTREP provides a way for implementing what we need here.

@romainfrancois
Copy link
Member

@krlmlr do we have more details about this ?

@jeroen
Copy link

jeroen commented May 31, 2018

FWIW, @ltierney has mentioned he is working on object tables and altrep environments. Perhaps if you carefully write down what it is that you are trying to accomplish he may have some ideas.

@hadley
Copy link
Member

hadley commented May 31, 2018

There's an R level wrapper in objectable — we did some performance exploration and found some significant performance overhead. If we want to use in dplyr, we'll need to work with Luke/R-core to identify and fix the bottlenecks.

@krlmlr
Copy link
Member Author

krlmlr commented May 31, 2018

There's a trade-off between:

  • accepting some overhead when accessing a column (in the order of microseconds, perhaps less), or
  • paying the initial cost of establishing an environment of active bindings, most of which might never be accessed (e.g. in a very wide table).

I'd like to to do some benchmarking in the context of jointprof and then decide.

@romainfrancois
Copy link
Member

Yeah. I’ll try to write something. This is about how variables from a grouped data frame are resolved, e.g in a mutate.

Now we use active bindings that end up calling a c++ function.

Conceptually an object table or the hinted alternative environment make more sense as we would not have to explicitly create the active bindings.

It may also be the case that an altrep keeping the full column and the indices and only resolving when needed, but again we would need to explicitly make those.

@romainfrancois
Copy link
Member

Do these compare typical dplyr use cases ?

i.e. if we have nc columns we need to make that nc active bindings that end up calling back some c++ code to extract the subset.

I’d like to compare that to a use of object table directly calling the c++ code. We don’t need the intermediate R layer, which i imagine is what is expensive.

Now we pay for hashing twice, once to find the active binding in the env, and once in the C++ code as part of the Subsets class.

@hadley
Copy link
Member

hadley commented May 31, 2018

Handling data frames with a very large number of columns is currently low priority for dplyr.

@krlmlr
Copy link
Member Author

krlmlr commented May 31, 2018

What's the limit? How many columns do we want to support well?

@romainfrancois
Copy link
Member

I think we can close this now.

@lock
Copy link

lock bot commented Jun 16, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jun 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants