Skip to content

Sketch implementation of new separate functions #1304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 78 commits into from
Nov 4, 2022
Merged

Sketch implementation of new separate functions #1304

merged 78 commits into from
Nov 4, 2022

Conversation

hadley
Copy link
Member

@hadley hadley commented Jan 21, 2022

TODO:

  • Better function names
  • extra and fill need prefixes
  • Better way to get bad rows? — No actual demand, no maybe in the future.
  • Should list2df() be baked into unnest_wider() or provided in stringr? — Doesn't really matter; leave in tidyr for this PR.
  • Warnings for separate_wider_fixed() & separate_wider_regex()
  • Need to pass along names_repair in *_wider
  • Check inputs
  • Add tests
  • Extract out repeated code to map_unpack() + map_unchop()?
  • Don't use named groups to avoid problems with special var names
  • separate_longer() that extracts group from multiple matches
  • Warn if separate_wider_regex() uses parens (i.e. check number of columns in match)
  • str_separate_wider_fixed should be able to create arbitrary number of columns too
  • Make into generics? — Maybe in the future.
  • Need way to diagnose/iterate with separate_wider_regex()
  • Think about how to incorporate into names_pattern etc in pivot_longer()
  • Update error handling — argument checking in top level functions, so no longer needed
  • Same spec for pivot_longer(names_pattern)
  • Use sep instead of delim
  • Some way to suppress warnings in separate_regex_wider
  • diagnose = TRUE
  • Add news bullet
  • Update superseded functions
  • Don't include NA values in problems

Copy link
Member

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly lots of small things. Overall looks really nice

I had a tough time knowing how to review the "progressively relax the matches" part of str_separate_wider_regex() because I'm not super familiar with regex

Copy link
Member

@mine-cetinkaya-rundel mine-cetinkaya-rundel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR looks great. I like the naming of the functions and the arguments. I've added some comments throughout but they're mostly minor.

One thing that I think would be beneficial is a table like https://github.com/tidyverse/tidyups/blob/9fbe5d48ec9c8dbf3bcd5db5995d905fba9532a7/002-tidyr-stringr.Rmd#L108-L114. I wonder if that's more of a vignette-thing than a function docs thing, but I think it would be good to expose a version of that with the new functions somewhere in the tidyverse docs.

A minor / potentially nitpicky comment is about styling of function arguments in documentation. For example, we should use "code text" (backticks) for the names_repair options in the pack.R documentation (

tidyr/R/pack.R

Lines 91 to 94 in f6b9509

#' * "minimal": no name repair or checks, beyond basic existence,
#' * "unique": make sure names are unique and not empty,
#' * "check_unique": (the default), no name repair, but check they are unique,
#' * "universal": make the names unique and syntactic
), that then gets used in various other places, e.g., "minimal". I think it's also worthwhile to make a similar change to the extra and fill options for separate() as well. (Note: I checked other similar arguments and .groups for summarize() doesn't use code text but method for geom_smooth() does. I think it's good to consistently distinguish character strings that are function arguments with styling.)

#' @return A data frame based on `data`. It has the same columns, but different
#' rows.
#' @examples
#' df <- tibble(id = 1:4, x = c("x", "x y", "x y z", NA))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of these examples would be easier to follow if the dfs are defined with tribble() instead of tibble(), but I realize that takes space...

Comment on lines +9 to +12
#' `separate_rows()` has been superseded in favour of [separate_longer_delim()]
#' because it has a more consistent API with other separate functions.
#' Superseded functions will not go away, but will only receive critical bug
#' fixes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a general comment about this sort of statement, I think it makes more sense above the description, as the first thing in the documentation for the function. I often find myself regretting having spent mental energy deciphering the description of the function just to find out it's been superseded/deprecated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; I put them below in the belief that most people would want to know what a function does before knowing what replaces it. I can move them.

@hadley hadley merged commit 9174795 into main Nov 4, 2022
@hadley hadley deleted the separate branch November 7, 2022 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants