Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative width and indentation behavior for usage() #65

Closed
egnha opened this issue Apr 16, 2017 · 7 comments
Closed

Alternative width and indentation behavior for usage() #65

egnha opened this issue Apr 16, 2017 · 7 comments

Comments

@egnha
Copy link
Contributor

egnha commented Apr 16, 2017

Consider the following example of usage() from the help page (formatR 1.4.2):

usage(barplot.default, width = 60)

It produces

## Default S3 method:
barplot(height, width = 1, space = NULL, names.arg = NULL, legend.text = NULL, 
    beside = FALSE, horiz = FALSE, density = NULL, angle = 45, 
    col = NULL, border = par("fg"), main = NULL, sub = NULL, 
    xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, xpd = TRUE, 
    log = "", axes = TRUE, axisnames = TRUE, cex.axis = par("cex.axis"), 
    cex.names = par("cex.axis"), inside = TRUE, plot = TRUE, 
    axis.lty = 0, offset = 0, add = FALSE, args.legend = NULL, 
    ...)

There are two issues here:

  1. It would be desirable to have the option to indent subsequent lines by the width of barplot(, cf. how to split the parameters into different lines and keep them aligned? #57, Indenting consistent with Rstudio #53.
  2. The width parameter does not set an upper bound on the length of output lines, as the help page implies, where it is stated that width is the "the width of output." In the example, the first line is more than 60 characters (78).

@yihui I would like to propose two minor modifications to usage() (and only usage()) to resolve these points, assuming you consider them to be actual "issues":

  1. To handle the indentation issue, one can add a logical toggle to usage(), say indent.by.FUN (with default value FALSE, to maintain the current behavior), to turn on indentation by the width of the function name, see 9661846.

  2. To handle the width issue, one has to work around the fact that the width.cutoff parameter of deparse() is actually a lower bound for line lengths. So, to ensure that line lengths don't exceed width, one can first try width.cutoff = width, and if that leads to lines that are too long, then decrement width and repeat until all lines are no more than width in length. See 0e34eab for an implementation of this rudimentary strategy.

The outcome of these two modifications is that usage() now conforms to the naive interpretation of width:

  • Lines won't exceed width in length (while remaining relatively "full"):
    usage(barplot.default, width = 60)
    
    ## Default S3 method:
    barplot(height, width = 1, space = NULL, names.arg = NULL, 
        legend.text = NULL, beside = FALSE, horiz = FALSE, 
        density = NULL, angle = 45, col = NULL, 
        border = par("fg"), main = NULL, sub = NULL, 
        xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, 
        xpd = TRUE, log = "", axes = TRUE, axisnames = TRUE, 
        cex.axis = par("cex.axis"), cex.names = par("cex.axis"), 
        inside = TRUE, plot = TRUE, axis.lty = 0, 
        offset = 0, add = FALSE, args.legend = NULL, 
        ...)
  • Moreover, indentation by function-name width becomes an option:
    usage(barplot.default, width = 60, indent.by.FUN = TRUE)
    
    ## Default S3 method:
    barplot(height, width = 1, space = NULL, names.arg = NULL, 
            legend.text = NULL, beside = FALSE, horiz = FALSE, 
            density = NULL, angle = 45, col = NULL, 
            border = par("fg"), main = NULL, sub = NULL, 
            xlab = NULL, ylab = NULL, xlim = NULL, 
            ylim = NULL, xpd = TRUE, log = "", axes = TRUE, 
            axisnames = TRUE, cex.axis = par("cex.axis"), 
            cex.names = par("cex.axis"), inside = TRUE, 
            plot = TRUE, axis.lty = 0, offset = 0, 
            add = FALSE, args.legend = NULL, ...)

(To be precise, this new interpretation of width doesn't guarantee that line lengths won't exceed it, always, only that this will be so if it's possible with deparse(expr, width.cutoff), for some (valid) value of width.cutoff less than or equal to width.)

@egnha
Copy link
Contributor Author

egnha commented Apr 16, 2017

With commits 9661846, 0e34eab, all existing tests pass. I haven't (yet) added formal tests of the new behavior.

@yihui
Copy link
Owner

yihui commented Apr 17, 2017

You may submit two pull requests addressing these two issues separately. I think I can accept egnha@9661846 but I'm not sure about egnha@0e34eab. I came up with the similar idea a couple of years ago: https://github.com/yihui/Rd2roxygen/blob/b9fcedb24/R/build.R#L152-L162 but obviously it is not efficient.That said, I don't have other ways to implement the idea of the upper bound of the width. I wrote a blog post in Chinese two months ago about this issue: https://yihui.name/cn/2017/02/formatr/ Not sure if you are able to read it, but the possible ways are:

  1. Patch deparse() in base R;

  2. Use getParseData() to reconstruct lines while respecting the maximum width;

  3. Or use the sourcetools package.

The first way is probably the most difficult one. The latter two may be a little easier, but still not trivial.

@egnha
Copy link
Contributor Author

egnha commented Apr 19, 2017

I should have figured that you'd already tried the while-loop strategy. ;) I agree, it's too pokey.

In the meantime, I've come up with a different way to get usage() to literally interpret the width argument, by a method that is not as sophisticated as the ones you suggested: see the function usage2() in commit c4d40bd. Basically what I've done there is bypass tidy_source() and simply deparse each argument in succession, while keeping track of the token count, rather than repeatedly deparsing the entire call. Once the tokens-per-argument are tabulated, it is a simple matter to break the lines according to the specified width.

We even get a small performance boost over the original usage() (with the new indent.by.FUN option):

microbenchmark::microbenchmark(
  usage(barplot.default, 60L, indent.by.FUN = TRUE, output = FALSE),
  usage2(barplot.default, 60L, indent.by.FUN = TRUE, output = FALSE),
  times = 1e3L
)

#> Unit: milliseconds
#>                                                                expr      min       lq     mean   median       uq       max neval
#>   usage(barplot.default, 60L, indent.by.FUN = TRUE, output = FALSE) 3.846394 4.006600 4.397606 4.087900 4.346930 46.761532  1000
#>  usage2(barplot.default, 60L, indent.by.FUN = TRUE, output = FALSE) 1.580056 1.642587 1.803428 1.676515 1.768616  4.916034  1000

(In the case where the function is supplied as a string, further speed-up could be had by replacing argsAnywhere(), which can be quite slow compared to args()).

Typical output of usage2() looks like this:

usage2(barplot.default, 60L, indent.by.FUN = TRUE)

#> ## Default S3 method:
#> barplot(height, width = 1, space = NULL, names.arg = NULL,
#>         legend.text = NULL, beside = FALSE, horiz = FALSE,
#>         density = NULL, angle = 45, col = NULL,
#>         border = par("fg"), main = NULL, sub = NULL,
#>         xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL,
#>         xpd = TRUE, log = "", axes = TRUE, axisnames = TRUE,
#>         cex.axis = par("cex.axis"),
#>         cex.names = par("cex.axis"), inside = TRUE,
#>         plot = TRUE, axis.lty = 0, offset = 0, add = FALSE,
#>         args.legend = NULL, ...)

@yihui
Copy link
Owner

yihui commented Apr 19, 2017

I love it! Please submit a PR replacing my usage() with your usage2(). Thanks!

@egnha
Copy link
Contributor Author

egnha commented Apr 20, 2017

I'd be glad to submit a PR. :)

Aside from writing some tests and ensuring that corner cases are properly handled, there is the question of what usage() should do if the line-breaking condition is unfulfillable.

Example of what currently happens:

# usage2() calls tidy_usage() to fit the lines (formerly done by tidy_source())

usg = "foo  (bar, really_long_argument_name = long_argument_default_value)"
tidy_usg = tidy_usage("foo", usg, width = 50, indent = 3)

cat(tidy_usg)

#> foo(bar,
#>     really_long_argument_name = long_argument_default_value)

nchar(capture.output(cat(tidy_usg)))

#> [1]  8 60

With indent = 3 (width of foo) it is impossible to satisfy the width requirement of 50 characters, because key-value pairs won't be split, so usage2() just does the best it can. But in such cases I think it would be better to:

  1. Stop with an error whose message points to the culprit (what is too long), or
  2. Return the best fitting output, but raise a warning that the line-breaking condition is unfulfillable.

I think the first option is the more practical one, because usage() has no way of knowing whether the problem is caused by a badly chosen width or indent on the user's part, or a potential coding smell (e.g., super long argument name/value), and the notion of "best fitting" is probably best left to the user's discretion (i.e., the user should decide how to handle an error, and whether to retry usage() with an alternative width or indent value).

Which of these options (or an alternative) do you think is best?

@yihui
Copy link
Owner

yihui commented Apr 20, 2017

I'd add an argument, say, fail = c("warn", "stop", "none"), to decide what to do when usage() fails to respect the maximum width. I suggest warn to be the default for the sake of backward compatibility (previously I didn't stop()).

@egnha
Copy link
Contributor Author

egnha commented Apr 20, 2017

That is a good compromise. I'll do that.

@yihui yihui closed this as completed in #66 Apr 23, 2017
yihui added a commit that referenced this issue Apr 23, 2017
Implementation of usage() that respects width and indent (#65)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants