
Feedback before submitting to CRAN #1

Closed
tzakharko opened this issue Jul 1, 2022 · 18 comments

Comments

@tzakharko
Owner

This issue is reserved for discussing what needs to be changed/improved before the package is submitted to CRAN.

@TimTaylor

Thank you for sharing the package - I'm enjoying digging through the code. Two initial comments:

  1. You advertise it as a drop-in replacement for stopifnot(), but you can only provide a single string literal per call. To expand, in `stopifnot()` we can write:
ex_stopifnot <- function(x,y) {
    stopifnot(
        "x must be positive" = x > 0,
        "y must be negative" = y < 0
    )
}

whereas with precondition we would need:

ex_precondition <- function(x,y) {
    precondition("x must be positive", x > 0)
    precondition("y must be negative", y < 0)
}

It would be nice if we could do the following (which would feel more like a stopifnot replacement):

ex_precondition_b <- function(x,y) {
    precondition(
        "x must be positive", x > 0,
        "y must be negative", y < 0
    )
}
  2. I wonder if it should be possible to turn off the {cli} formatting via an option?

Generally I think it's a nice little package which I'd consider using 👍

@TimTaylor

TimTaylor commented Jul 1, 2022

In case it's useful, it's also worth highlighting {vetr} and the review its author did.

@nfultz

nfultz commented Jul 2, 2022

Hello, I've contributed to the (over-complicated) stopifnot - I like that your implementation is quite a bit simpler.

Here are a couple ideas / nice-to-haves:

  • One thing that would be nice is a way to disable checking completely a la -DNDEBUG in C. Could create a variable in the C file with a setter exposed, and early exit the checks if it's turned off.

  • I can't quite tell at first glance if you are using non-standard-evaluation, but {{x}} makes it look like you are - can you add an example where it works with other NSE ? eg within(mtcars, precondition({{mpg}} < 100))

  • Another thing I would recommend is to loop over all elements of an expression, instead of forcing the user to use all() - I generally prefer things to be vectorized by default. Just needs an extra loop.

@tzakharko
Owner Author

Thanks @TimTaylor, @nfultz — that's exactly the kind of quality feedback I was hoping to get. Replying to your points in order.

  1. You advertise it as a drop-in replacement for stopifnot(), but you can only provide a single string literal per call. To expand, in `stopifnot()` we can write:

The original idea was that one assertion tests exactly one (potentially complex) predicate, which is why there is only a single string literal. But I understand that this can be limiting. I am quite open to providing multiple debug strings; one just needs to keep in mind that it makes the design more complex. We'd need to carefully consider the exact relation between the debug strings and the conditions. It would make sense to me if the debug string applied to the conditions following it, e.g.:

precondition(
  "x is a positive integer", is.integer(x), x > 0,
  "y is a bare list", is.list(y), !is.object(y)
)

And then one would need to consider the following cases:

# what does one do if there are multiple debug strings without conditions?
precondition(
  "x is a positive integer", is.integer(x), x > 0,
  "something", # ??
  "y is a list", is.list(y)
)

# what if I want to "end" the debug string scope without providing a new debug string?
precondition(
  "something about x", check(x),
  "", # empty string? - not about x anymore
  check(y)
)
Of course, one could just do what stopifnot() does and use named assertion predicates:

precondition(
   "x is a positive integer" =  is.integer(x) && x > 0,
   "y is a bare list" =  is.list(y) && !is.object(y)
)

but then the debugging suffers since we can't pinpoint what exactly has failed... and of course, one can have the names auto-propagate to any subsequent condition, but then everything gets kind of messy...

I might be massively overthinking this, but I just don't want to end up with a design pattern that confuses users more than it helps.

  2. I wonder if it should be possible to turn off the {cli} formatting via an option?

You mean just the colorisation or something else? The colorisation can be turned off via options(crayon.enabled = FALSE) or by setting the NO_COLOR environment variable.

    • One thing that would be nice is a way to disable checking completely a la -DNDEBUG in C. Could create a variable in the C file with a setter exposed, and early exit the checks if it's turned off.

Absolutely, there should be an option to control that. I had already planned to do it anyway; you are right that it should be done before the package is on CRAN.
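A minimal sketch of such a kill switch in plain R, assuming hypothetical names (`set_assertions_enabled`, `precondition_sketch`) that are illustrative only, not the package's actual API:

```r
# Hypothetical sketch of an NDEBUG-style switch; all names here are
# illustrative, not the precondition package's API.
.assert_state <- new.env(parent = emptyenv())
.assert_state$enabled <- TRUE

set_assertions_enabled <- function(value) {
  .assert_state$enabled <- isTRUE(value)
  invisible(value)
}

precondition_sketch <- function(cond) {
  # early exit when checking is globally disabled (a la -DNDEBUG);
  # thanks to lazy evaluation the condition is never even evaluated
  if (!.assert_state$enabled) return(invisible(TRUE))
  if (!isTRUE(cond)) stop("precondition failure", call. = FALSE)
  invisible(TRUE)
}
```

With assertions disabled via set_assertions_enabled(FALSE), even a failing precondition_sketch(1 > 2) becomes a no-op instead of an error.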

  • I can't quite tell at first glance if you are using non-standard-evaluation, but {{x}} makes it look like you are - can you add an example where it works with other NSE ? eg within(mtcars, precondition({{mpg}} < 100))

No, there is no NSE; the embrace is only used to extract the expressions shown in error reports for easier debugging. Good idea about the example.
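The general mechanism can be illustrated in base R with substitute(); this is a simplified sketch of the idea, not the package's actual internals:

```r
# Sketch: capture the condition's source expression for the error report
# while evaluating it completely normally (no NSE involved).
show_failed_expr <- function(cond) {
  expr <- substitute(cond)  # recorded purely for diagnostics
  if (isTRUE(cond)) "ok" else sprintf("`%s` is not TRUE", deparse(expr))
}

x <- -1
show_failed_expr(x > 0)  # "`x > 0` is not TRUE"
```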

  • Another thing I would recommend is to loop over all elements of an expression, instead of forcing the user to use all() - I generally prefer things to be vectorized by default. Just needs an extra loop.

The decision to accept only logical scalars is a deliberate design choice, because I think it makes it easier to catch some programming errors. In the end, it's about choosing which default to treat as privileged: should the user add an explicit length(x) == 1L if they expect scalar data, or an explicit all() if they expect vector data? I think the length(x) == 1L check is easier to forget, and erroneously accepting vector data where a scalar should be used can be trickier to debug in R, with its automatic vectorisation and data promotion. So I'd rather inconvenience users into adding an explicit all(), as it spells out more clearly what their code expects. At least, that's the idea.

But then again, I don't have an overly strong opinion on this, and I am certainly biased because most of the functions I write expect scalar objects. Maybe you have a motivating example for preferring vectorised conditions by default?
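The scalar-only rule amounts to a strict check along these lines (a sketch, not the package's code):

```r
# A condition passes only if it is exactly one non-NA TRUE; logical
# vectors and NA are rejected instead of being silently reduced.
is_exactly_true <- function(x) {
  is.logical(x) && length(x) == 1L && !is.na(x) && x
}

is_exactly_true(TRUE)           # TRUE
is_exactly_true(c(TRUE, TRUE))  # FALSE: a vector, spell out all() instead
is_exactly_true(NA)             # FALSE: NA does not silently pass
```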

@tzakharko
Owner Author

tzakharko commented Jul 6, 2022

As an experiment, I have uploaded the vectorized-checks branch, which makes things a bit more compatible with stopifnot().

With this,

precondition("x is a vector of positive numbers", is.numeric(x), all(x > 0))

becomes

precondition("x is a vector of positive numbers" =  is.numeric(x), x > 0)

This also makes it possible to do things like @TimTaylor suggested, e.g.

precondition( "x must be positive" = x > 0,  "y must be negative" = y < 0)
# note that the message "sticks", e.g. 
precondition( "x must be a positive scalar" = length(x) == 1, x > 0,  "y must be negative" = y < 0)

Looking at this, I think that multiple assertion messages can be useful; I am just unsure about the choice of syntax (names vs. string literals; the literals do come with a small performance penalty). At the same time, I am not convinced that vectorised checks work better as a default, as they open up more possibilities for things to go wrong and could increase the chance of writing bad assertions (some more background on this rationale is in my last post).

@tzakharko
Owner Author

The main branch now implements multiple assertion messages as per @TimTaylor's suggestion:

precondition( "x must be positive", x > 0,  "y must be negative", y < 0)

I am using the string literal syntax instead of assignment as I think it's a bit more transparent (even if the evaluation ends up a bit more involved).

@TimTaylor

@tzakharko - I like the string literal syntax; it reminds me of fcase in data.table and feels very natural. The .env variable is nice as well, as it lets us highlight the calling function, which is something that isn't possible with stopifnot() (AFAIK). One minor point on the formatting of error messages and the double !: as the second ! is additional information, it would be nice to distinguish it:

library(precondition)

ex_precondition <- function(x,y) {
    precondition( 
        "x must be positive", x > 0,
        "y must be negative", y < 0
    )
}

# this will give two ! symbols via cli
ex_precondition(1,1)
#> Error in `ex_precondition()`:
#> ! y must be negative
#> ! `y < 0` is not TRUE

# It would be nice if we could distinguish additional information, e.g.
cli::cli_abort(c("y must be negative", "{.code y < 0} is not TRUE"))
#> Error:
#> ! y must be negative
#> `y < 0` is not TRUE

# or perhaps
cli::cli_abort(c("y must be negative", "*" = "{.code y < 0} is not TRUE"))
#> Error:
#> ! y must be negative
#> • `y < 0` is not TRUE

Created on 2022-07-06 by the reprex package (v2.0.1)

@nfultz

nfultz commented Jul 6, 2022

LGTM - the string literal syntax will interact better with i18n, which is a shortcoming of stopifnot at the moment.

If you choose not to vectorize the conditions to match stopifnot, I'd recommend a length-1 check plus a warning/error that matches base R's behavior for || / && - see the discussion in HenrikBengtsson/Wishlist-for-R#48 - this will be changing in base R 4.3.
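The suggested guard could be sketched like this (a hypothetical helper, not part of the package; from R 4.3, && and || themselves signal an error for non-scalar operands):

```r
# Sketch of a length-1 guard mirroring the stricter `&&`/`||` semantics
# planned for base R 4.3: non-scalar conditions raise an error instead
# of being silently truncated to their first element.
check_scalar_condition <- function(value, label = "condition") {
  if (length(value) != 1L) {
    stop(sprintf("invalid %s: length %d, expected length 1",
                 label, length(value)), call. = FALSE)
  }
  isTRUE(value)
}
```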

@tzakharko
Owner Author

Agree on both points. The latest commit removes the "!" prefix from condition failures; it looks cleaner now:

! precondition failure
  `x > 0` is not TRUE
  
  `x` = dbl -1

Also, an informative note is displayed if the condition did not produce exactly TRUE or FALSE:

precondition(1:5 > 0)
#> ! precondition failure
#>  `1:5 > 0` is not TRUE
#>
#> ℹ note: condition should produce a scalar TRUE or FALSE (logical vector found)

precondition(NA > 0)
#> ! precondition failure
#>  `NA > 0` is not TRUE
#>
#> ℹ note: condition should produce a scalar TRUE or FALSE (NA found)

precondition(list(list(1, 2)) > 0)
#> ! precondition failure
#>  `list(list(1, 2)) > 0` is not TRUE
#>
#> ℹ note: error occurred when evaluating a condition
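The note selection above might be driven by a classification along these lines (a sketch with an illustrative helper name; the evaluation-error case would additionally need a tryCatch around the condition):

```r
# Sketch: pick the informative note for a condition value that is not
# exactly TRUE (the helper name is illustrative, not the package's API).
condition_note <- function(value) {
  if (!is.logical(value)) {
    "condition should produce a scalar TRUE or FALSE (non-logical found)"
  } else if (length(value) > 1L) {
    "condition should produce a scalar TRUE or FALSE (logical vector found)"
  } else if (is.na(value)) {
    "condition should produce a scalar TRUE or FALSE (NA found)"
  } else {
    NULL  # a plain scalar FALSE needs no extra note
  }
}

condition_note(1:5 > 0)  # "... (logical vector found)"
condition_note(NA > 0)   # "... (NA found)"
```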

@TimTaylor

I've been playing with this a little more and wondering if .env is actually of much use in the package's current form. From what I can tell, the main use would be in bubbling up errors from internal functions to user-facing functions. This works well for the function call itself, but it doesn't help with the message: the string literal requirement means messages cannot be altered based on the arguments of the external functions. There are similar complications with how you would want the conditions displayed when called from internal functions. I wonder if for the initial release you want to drop the .env parameter and focus on precondition being used from user-facing functions? Hope this makes sense.

@tzakharko
Owner Author

tzakharko commented Jul 7, 2022

@TimTaylor the idea behind .env is indeed to allow helper functions to do checking on behalf of the main function. It is definitely an advanced feature, since you need to make sure that the condition promises are valid in the context of the main function (especially since .env is where the diagnostics routine will look for any embraced expression). So you can do something like this:

my_user_facing_function <- function(x, y) {
  validate_fun_args(x, y)
  x + y
}

# private
validate_fun_args <- function(x, y) {
  precondition(
    "x is a mandatory numeric argument", !is.missing(x), is.numeric({{x}}),
    "y is a mandatory numeric argument", !is.missing(y), is.numeric({{y}}),
    .env = parent.frame()
  )
}

my_user_facing_function(1, "aa")


#> Error in `my_user_facing_function()`:
#> ! y is a mandatory numeric argument
#>   `is.numeric(y)` is not TRUE
#>   
#>   `y` = chr "aa"
#> Backtrace:
#>     ▆
#>  1. └─global my_user_facing_function(1, "aa")
#>  2.   └─global validate_fun_args(x, y)
#>  3.     └─precondition::precondition(...)

There are similar complications with how you would want the conditions displayed when called from internal functions.

Does one need to treat this in a special way at all? The current model is fragile and puts the developer in charge of correctly propagating the arguments, but I think it does what it is supposed to do. But maybe I am missing something that you are seeing?

@TimTaylor

TimTaylor commented Jul 7, 2022

I think it's reasonable to put the onus on the developer so as not to overcomplicate things for little gain. I understand the current behaviour and will try to convey below where I think there is slight friction in the design (note this is only in regard to use with internal functions). It's worth noting that normally I try to avoid condition checking in internal functions anyway, but occasionally I do (e.g. when I'm checking similar things repeatedly).

Example 1 - How I'm most likely to use precondition - works great

What really appeals to me is that I can write things like this:

my_user_facing_function_1 <- function(a, b, d) {
    precondition( 
        is.numeric(a),
        is.numeric(b),
        "`d` must be a string", is.character(d), length(d) == 1L
    )
}

my_user_facing_function_1(1,2L,letters)
#> Error in `my_user_facing_function_1()`:
#> ! `d` must be a string
#>   `length(d) == 1L` is not TRUE

Example 2 - Occasionally I'd use it like this

my_internal_function_2 <- function(x, .env = parent.frame()) {
    precondition(
        "`x` must be a string", is.character(x), length(x) == 1L, .env = .env
    )
}

my_user_facing_function_2 <- function(aaazzz) {
    my_internal_function_2(x=aaazzz)
}

As the parameters do not match, we will get a confusing error message for users (i.e. they only know of the aaazzz parameter, not x). If I'm using internal functions, I should not need to think about matching arguments with the wrapping function.

my_user_facing_function_2(letters[1:2])
#> Error in `my_user_facing_function_2()`:
#> ! `x` must be a string
#>   `length(x) == 1L` is not TRUE

Example 3 - Try evaluating for some clarity

This will fail as it tries to evaluate x in the wrong environment

my_internal_function_3 <- function(x, .env = parent.frame()) {
    precondition(
        "`x` must be a string", is.character({{x}}), length({{x}}) == 1L, .env = .env
    )
}

my_user_facing_function_3 <- function(aaazzz) {
    my_internal_function_3(x=aaazzz)
}

my_user_facing_function_3(letters[1:2])
#> Error in `my_user_facing_function_3()`:
#> ! `x` must be a string
#>   `length(x) == 1L` is not TRUE
#>   
#>   `x` failed to evaluate (object 'x' not found)

Example 4 - Try evaluating for some clarity again

This will work but now the wrong call is given and the message is still not overly helpful due to the different arguments

my_internal_function_4 <- function(x) {
    precondition(
        "`x` must be a string", is.character({{x}}), length({{x}}) == 1L
    )
}

my_user_facing_function_4 <- function(aaazzz) {
    my_internal_function_4(x=aaazzz)
}

my_user_facing_function_4(letters[1:2])
#> Error in `my_internal_function_4()`:
#> ! `x` must be a string
#>   `length(x) == 1L` is not TRUE
#>   
#>   `x` = chr c("a", "b")

For completeness

For completeness, my current way of dealing with internal functions is somewhat involved (needing individual pre-specified assertions), but it illustrates the sort of flexibility it would, ideally, be nice to achieve:

.assert_scalar_character <- function(x, label = deparse(substitute(x)), call = sys.call(-1)) {
    if (!(is.character(x) && length(x) == 1)) {
        msg <- sprintf("`%s` must be a character vector of length 1.", label)
        stop(simpleError(msg, call[-2]))
    }
    invisible(x)
}

internal_fun <- function(x) {
    .assert_scalar_character(x, label = deparse(substitute(x)), call = sys.call(-1))
}

external_fun <- function(aaazzz) internal_fun(aaazzz)   
external_fun(letters)
#> Error in external_fun(): `aaazzz` must be a character vector of length 1.

external_fun2 <- function(aaazzz) .assert_scalar_character(aaazzz)
external_fun2(letters)
#> Error in external_fun2(): `aaazzz` must be a character vector of length 1.

@TimTaylor

To summarise ... this boils down to how tightly coupled parameters/arguments are between the internal and external functions and how flexible (or not) you wish to be with this :-)

@TimTaylor

Apologies. I've reread your initial response and realise this coupling is a conscious requirement you have made in how this is implemented. Please ignore the noise above.

@tzakharko
Owner Author

It's not noise at all; what you wrote is perfectly legitimate and relevant. And I agree with you that it would be a great feature to have, I'm just not sure how feasible it is to solve for the general case. One could potentially introduce an auxiliary operation for argument substitution, e.g.:

.assert_scalar_character <- function(x) {
  precondition(
    "{x} must be a character vector of length 1",
    is.character({{substitute_arg(x, parent.frame())}}),
    {{length(substitute_arg(x, parent.frame()))}} == 1,
    .env = parent.frame()
  )
}

substitute_arg would be a no-op unless an error is triggered. In that case, the diagnostic routine will inspect the promise expression and trace the argument back to its original value in the relevant frame. I don't know whether it will cover all the relevant cases, but it should work in principle. The syntax might be a bit verbose, but that shouldn't be a problem, seeing that it will only be used in auxiliary functions. And the runtime overhead will be negligible (unless an error is triggered, which is the slow path anyway).
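For comparison, the one-level case that substitute_arg would generalise can already be written in base R: substitute() used in a default argument recovers the expression the caller supplied. A simplified illustration, not the proposed implementation:

```r
# A check helper that reports the caller-side argument name by recovering
# the promise expression via substitute() in its default argument.
internal_check <- function(x, label = deparse(substitute(x))) {
  if (!(is.character(x) && length(x) == 1L)) {
    stop(sprintf("`%s` must be a string", label), call. = FALSE)
  }
  invisible(x)
}

external_fun <- function(aaazzz) internal_check(aaazzz)
# external_fun(letters) errors with: `aaazzz` must be a string
```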

Anyway, it's probably a bit premature to target this kind of functionality for the initial release, but I will keep it in mind as a potential enhancement. I think for now it should be sufficient to describe .env as a potentially unsafe, advanced parameter that targets only specific use cases where tight coupling is acceptable.

@tzakharko
Owner Author

@TimTaylor, I went back to the drawing board and managed to implement what I think is a nice generic mechanism for forwarding arguments to internal check functions as well as giving them the ability to generate fully custom error messages. You will find more information in #3

Would be very curious to hear your thoughts on this!

I am closing this issue in the meantime.

@TimTaylor

Cheers @tzakharko. I'll take a look but will likely not be until the latter half of September.

@tzakharko
Owner Author

@TimTaylor take your time, I’m traveling until the end of the month anyway. Just wanted to get this patch out before I leave. Thanks!
