Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict pluck #482 #522

Merged
merged 18 commits into from
Nov 27, 2018
Merged

Strict pluck #482 #522

merged 18 commits into from
Nov 27, 2018

Conversation

daniel-barnett
Copy link
Contributor

This is an implementation of strict pluck, so-called chuck from #482.

Once we have agreed on everything, I'll also add some examples and descriptions to the help.

Since this is based on pluck, it seems like #480 is involved here too.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, although I think it would be better to move the checking a bit further down in order to get better error messages (which will require some test tweaking).

src/extract.c Outdated
@@ -66,10 +66,17 @@ int find_offset(SEXP x, SEXP index, int i) {
}
}

SEXP extract_vector(SEXP x, SEXP index_i, int i) {
SEXP extract_vector(SEXP x, SEXP index_i, int i, int strict) {
int offset = find_offset(x, index_i, i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to pass strict down to find_offset() — that's obviously a lot more work because there are so many more returns, but I think it would lead to considerably more informative error messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted all of this so it gives specific messages for each type of problem in find_offset(); however, for the error messages, which way do you think is better out of one or two line errors?

chuck(1:10, Inf)
#> Error: Each accessor must be finite:
#> * Accessor 1 is Inf

chuck(1:10 Inf)
#> Error: Accessor 1 must be finite, not Inf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second - I slightly prefer the wording, and the former suggests we might warn for any Inf, not just the first we encounter (which while nice, I don't think that would be worth the implementation complexity)

src/extract.c Outdated
}

SEXP extract_attr(SEXP x, SEXP index_i, int i) {
SEXP extract_attr(SEXP x, SEXP index_i, int i, int strict) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe should rename to extract_s4() to make the intent more clear?

src/extract.c Outdated
Rf_errorcall(R_NilValue,
"Can't find slot `%s`.",
Rf_translateCharUTF8(Rf_asChar(index_i))
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like you could use an else block here, just to make clear what the default is (instead of relying on Rf_getAttrib())

@daniel-barnett
Copy link
Contributor Author

Sorry for the delay; this latest push should have more informative error messages as I moved it further down as you suggested. I did change some of existing error messages (i.e. the ones that pluck also uses) so they're more consistent. If this is fine, I can go ahead and write up something for the documentation.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! A couple more comments/questions

src/extract.c Outdated
} else if (val >= n) {
if (strict)
Rf_errorcall(R_NilValue,
"Index %i exceeds the length of object being plucked (%.0f > %i).",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why %.0f and not %i?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of double val = REAL(index)[0];, using %i gives nvalid index (1): index must be greater than 0, not 536870913..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it.

src/extract.c Outdated
if (Rf_length(index) > 1) {
Rf_errorcall(R_NilValue, "Index %i must have length 1", i + 1);
Rf_errorcall(R_NilValue,
"Index %i must have length 1, not %i.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these messages are a bit easier to write/read if they all start in the same way, either with "Invalid object:" or "Invalid index (%i):".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you replace all instances of "must not" by "can't" please? For instance this:

Object being plucked must not be NULL.

reads better as:

Plucked object can't be NULL.

cf https://style.tidyverse.org/error-messages.html

(just to be clear, the affirmative "must" cases are good!)

@hadley
Copy link
Member

hadley commented Sep 12, 2018

Feel free to start on the documentation - I think you should document chuck() with pluck() (i.e. with @rdname chuck, and then just tweak the existing pluck docs)

src/extract.c Outdated
if (val < 0 || val >= n)
return -1;
if (val < 0) {
if (strict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: can you add curly braces even to single expressions please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want me to add those for the existing code that I haven't yet modified too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lionel- I don't think we use that style anymore, and no parens is fine for simple expressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I wasn' aware, Jim, Gabor and I still use it. It's consistent with our R style and easier to read since we use 2-level indents. Especially with multi-line single expressions such as here. But nevermind then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also have implications for covr, I think? That is, the presence of the brackets helps covr do its job in some weird edge cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the edge cases were fixed and this is officially blessed @jimhester style

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am behind the times then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The officially blessed jimhester style is to always use braces for single expressions.

The edge cases in covr are fixed, but I think it is still better to use braces, it is too easy to accidentally forget to add the braces in the future and write something like the following, then not realize the second if is unconditionally evaluated.

if (val < 0)
  print(val)
  if (strict)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am SHOOK

@hadley
Copy link
Member

hadley commented Nov 19, 2018

@daniel-barnett do you mind updating your code to use {} for all single expressions? Don't worry about changing the style for any code that you didn't write. We'll fix that.

@lionel-
Copy link
Member

lionel- commented Nov 27, 2018

Don't worry about the latest comments @daniel-barnett, I'm going to finish the PR because I need it merged before working on other pluck bugs and features.

@lionel-
Copy link
Member

lionel- commented Nov 27, 2018

Refactored the code to put all the checking logic at the end of the file. Merged several code paths to get more consistent error checking and make it easier to maintain. Improved consistency of error messages.

@hadley Could you review the last commit which adds documentation please? I have moved attr_getter() in its own topic so as not to distract from the important stuff.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reviewed the docs

R/pluck.R Outdated
#' accessor functions, i.e. functions that take an object and return
#' some internal piece.
#' `pluck()` and `chuck()` implement a generalised form of `[[` that
#' allow you to index deeply and flexibly into data structures. While
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "while"

R/pluck.R Outdated
#' accessors because it reads linearly and is free of syntactic
#' cruft. Compare: \code{accessor(x[[1]])$foo} to `pluck(x, 1,
#' accessor, "foo")`.
#' * You can pluck or chuck with standard accessors like integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these should move down to details?

R/pluck.R Outdated
#'
#' @details
#'
#' Since it handles arbitrary accessor functions, `pluck()` is a type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this paragraph. Maybe it can be removed?

@lionel- lionel- merged commit b4ae036 into tidyverse:master Nov 27, 2018
@lionel-
Copy link
Member

lionel- commented Nov 27, 2018

Thanks again @daniel-barnett !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants