New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format documentation to follow tidyverse style guide #159

Closed
echasnovski opened this Issue Jul 26, 2018 · 4 comments

Comments

Projects
None yet
2 participants
@echasnovski
Copy link
Collaborator

echasnovski commented Jul 26, 2018

I have a suggestion to format current documentation to be closer to tidyverse style guide. This (rather long) comment is going to show which format changes I think should be done to increase readability of documentation code. As some of them are matter of taste, this issue is created for discussion of this suggestion.

Slightly modified version of get_pvalue() documentation will be an example for cumulative changes (shown in collapsable spoilers):

Current documentation
#' Compute the p-value for (currently only) simulation-based methods
#' \code{get_pvalue()} is an alias of \code{p_value}
#' @param x data frame of calculated statistics or containing attributes
#' of theoretical distribution values
#' @param obs_stat a numeric value or a 1x1 data frame (as extreme or more extreme than this)
#' @param direction a character string. Options are "less", "greater", or "two_sided".
#' Can also specify "left", "right", or "both".
#'
#' @return a 1x1 data frame with value between 0 and 1
#' @export
#' @rdname get_pvalue
#' @examples
#' mtcars_df <- mtcars %>%
#'     dplyr::mutate(am = factor(am))

p_value <- function(x, obs_stat, direction) {x}

#' @export
#' @rdname get_pvalue
get_pvalue <- p_value

Step 1: Use markdown. Not many changes here, but documentation with more inline code and cross-linking will benefit from that.

After using markdown
#' Compute the p-value for (currently only) simulation-based methods
#' `get_pvalue()` is an alias of `p_value`.
#' @param x data frame of calculated statistics or containing attributes
#' of theoretical distribution values
#' @param obs_stat a numeric value or a 1x1 data frame (as extreme or more extreme than this)
#' @param direction a character string. Options are "less", "greater", or "two_sided".
#' Can also specify "left", "right", or "both".
#'
#' @return a 1x1 data frame with value between 0 and 1
#' @export
#' @rdname get_pvalue
#' @examples
#' mtcars_df <- mtcars %>%
#'     dplyr::mutate(am = factor(am))

p_value <- function(x, obs_stat, direction) {x}

#' @export
#' @rdname get_pvalue
get_pvalue <- p_value

Step 2: Reflow roxygen comments. Use width of 80 characters, as this goes along with recommendation for syntax and spares the need for horizontal scroll in IDE. New lines should be indented by two spaces. Alternatively, tags that span over multiple lines (like @description, @examples and @section) can have the corresponding tag on its own line and then subsequent lines don’t need to be indented (see this guide section). Use maximum allowed width. If output text will be shown as one line, it should be on one line in source too. Usually, this step can be done by using Ctrl+Shift+/ shortcut in RStudio (complicated nested list is one exception).

It will be good if code in examples is also under tidyverse style. In this case it means to use indentation of two spaces instead of four.

After reflow
#' Compute the p-value for (currently only) simulation-based methods
#' `get_pvalue()` is an alias of `p_value`.
#' @param x data frame of calculated statistics or containing attributes of
#'   theoretical distribution values
#' @param obs_stat a numeric value or a 1x1 data frame (as extreme or more
#'   extreme than this)
#' @param direction a character string. Options are "less", "greater", or
#'   "two_sided". Can also specify "left", "right", or "both".
#'
#' @return a 1x1 data frame with value between 0 and 1
#' @export
#' @rdname get_pvalue
#' @examples
#' mtcars_df <- mtcars %>%
#'   dplyr::mutate(am = factor(am))

p_value <- function(x, obs_stat, direction) {x}

#' @export
#' @rdname get_pvalue
get_pvalue <- p_value

Step 3: Arrange documentation blocks. Mostly used "blocks" are: title, description, parameters, details, extra sections, return value, "See also", examples, and development-target. All blocks should be separated by line break. After documentation code there should be a documented object (without line break).

By "development-target block" I mean a set of roxygen keywrods that don't directly appear in documentation: @export, @name, importFrom, etc.

I prefer to stick to this order of "blocks", however it is not mandatory. The one thing I really prefer is to have development-target block right in the end, which is usually right near the function. Also @export should come last to easily see whether the object is exported.

Also this step seems like a nice time to ensure that title is as short as it can be, moving the rest to description.

After arranging blocks
#' Compute the p-value
#' 
#' P-value is implemented for (currently only) simulation-based methods.
#' `get_pvalue()` is an alias of `p_value`.
#' 
#' @param x data frame of calculated statistics or containing attributes of
#'   theoretical distribution values
#' @param obs_stat a numeric value or a 1x1 data frame (as extreme or more
#'   extreme than this)
#' @param direction a character string. Options are "less", "greater", or
#'   "two_sided". Can also specify "left", "right", or "both".
#'
#' @return a 1x1 data frame with value between 0 and 1
#' 
#' @examples
#' mtcars_df <- mtcars %>%
#'   dplyr::mutate(am = factor(am))
#' 
#' @rdname get_pvalue
#' @export
p_value <- function(x, obs_stat, direction) {x}

#' @rdname get_pvalue
#' @export
get_pvalue <- p_value

Step 4: Use sentence case with full stop (except in title).

After sentence case
#' Compute the p-value
#' 
#' P-value is implemented for (currently only) simulation-based methods.
#' `get_pvalue()` is an alias of `p_value`.
#' 
#' @param x Data frame of calculated statistics or containing attributes of
#'   theoretical distribution values.
#' @param obs_stat A numeric value or a 1x1 data frame (as extreme or more
#'   extreme than this).
#' @param direction A character string. Options are "less", "greater", or
#'   "two_sided". Can also specify "left", "right", or "both".
#'
#' @return A 1x1 data frame with value between 0 and 1.
#' 
#' @examples
#' mtcars_df <- mtcars %>%
#'   dplyr::mutate(am = factor(am))
#' 
#' @rdname get_pvalue
#' @export
p_value <- function(x, obs_stat, direction) {x}

#' @rdname get_pvalue
#' @export
get_pvalue <- p_value

Step 5: Use @name to document multiple objects. Documentation should be done for NULL with setting an informative @name. This isn't really a tidyverse guide but rather an option in this section of roxygen2 vignette. I like this way because it forces to think about an explicit name for a set of documented objects.

Notes:

  • The way of "documenting NULL" forces to place @export tag near all documented functions. It shouldn't be placed in main documentation code.
  • If name differs from names of all objects, extra help page will be created in the package index.

This step is pretty arguable, so please tell your thoughts.

Final result
#' Compute the p-value
#' 
#' P-value is implemented for (currently only) simulation-based methods.
#' `get_pvalue()` is an alias of `p_value`.
#' 
#' @param x Data frame of calculated statistics or containing attributes of
#'   theoretical distribution values.
#' @param obs_stat A numeric value or a 1x1 data frame (as extreme or more
#'   extreme than this).
#' @param direction A character string. Options are "less", "greater", or
#'   "two_sided". Can also specify "left", "right", or "both".
#'
#' @return A 1x1 data frame with value between 0 and 1.
#' 
#' @examples
#' mtcars_df <- mtcars %>%
#'   dplyr::mutate(am = factor(am))
#' 
#' @name p_value
NULL

#' @rdname p_value
#' @export
p_value <- function(x, obs_stat, direction) {x}

#' @rdname p_value
#' @export
get_pvalue <- p_value

That's probably it. There is also a matter of indentation inside a nested list (I prefer to indent all text in list node equally). However, there seem to be no nested lists in documentation yet, so it is not urgent.

Questions:

  1. In style guide there is a suggestion to "not use code font for package names". So in documentation instead of "infer" there should be "infer package". I don'y really like this one. There are also options like "{infer}" (saw in this package couple of times) and "'infer'" (CRAN guide). Which one is preferred? I like using "{infer}" but it means writing "\{infer\}" in documentation code. So I think "infer" is better with "infer package" to occasionally remove ambiguity.
  2. @ismayc, what are your thoughts about all this? Especially about order of documentation blocks and step 5.
@ismayc

This comment has been minimized.

Copy link
Collaborator

ismayc commented Jul 26, 2018

@echasnovski Definitely aligned on Steps 1-4 here. Step 5 seems to make sense to me as well. Forcing explicit naming seems like a good practice. Can you think of any downsides?

I'd prefer to use "{infer}" throughout or force the use of "infer package" if we must. That makes it much clearer to the outside reader.

@echasnovski

This comment has been minimized.

Copy link
Collaborator Author

echasnovski commented Jul 26, 2018

The only downside I can think of right now is an extra entry in package index. So if in the example there is #' @name p-val with later use of #' @rdname p-val, package index has three entries with description "Compute the p-value": p-val, p_value, and get_pvalue. For some people this might be confusing as there are only two functions. For other people this makes total sense as one of entries describe set of functions and it might be more intuitive for making reference.

Nevertheless, I adopted this approach and feel pretty happy. If in some situation extra entry makes absolutely no sense, name can be equal to one of the functions and it should fix this (as I briefly checked).

I agree with "{infer}" as a form of package name. Just need to remember to write "\\{infer\\}" in roxygen comments if markdown is used. There are two backslashes, I ironically misspelled before by using two backslashes in GitHub markdown which produced one backslash in the output.

@echasnovski

This comment has been minimized.

Copy link
Collaborator Author

echasnovski commented Jul 27, 2018

Which option will be better for PR reviewing?

  • Each step in separate commit. It means every commit will be about many relatively small changes in most of the files. On the other hand it may be harder to see "before and after" differences in every commit.
  • All steps in one commit. It should be better to see "before and after" differences as almost all documentation code will change. On the other hand it forces cognitive load in manual recognising of all steps.
@ismayc

This comment has been minimized.

Copy link
Collaborator

ismayc commented Jul 27, 2018

Thanks much for asking. I’m fine with either one as they both have positives. Whichever way is more natural for you will work.

echasnovski added a commit to echasnovski/infer that referenced this issue Jul 29, 2018

Update documentation code to follow tidyverse style guide (tidymodels…
…#159).

Also:
- Fix some typos (in return value of `generate()` and by using "p-value" instead of "pvalue" in description of `pvalue_fill` of `visualize()`).
- Add more cross-linking.

echasnovski added a commit to echasnovski/infer that referenced this issue Jul 29, 2018

Update documentation code to follow tidyverse style guide (tidymodels…
…#159).

Also:
- Fix some typos (in return value of `conf_int()` and by using "p-value" instead of "pvalue" in description of `pvalue_fill` of `visualize()`).
- Add more cross-linking.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment