Would it be useful to have a dedicated function (say, pick()) to select columns from the current data? Currently, across() with only a .cols argument serves this role.
I would see a dedicated function having at least three advantages:
- Nicer syntax for union selections: pick(1, last_col()) vs. across(c(1, last_col())).
- Better semantics: across() makes sense when there are functions to apply, but less so when it’s used just for selecting columns. pick() seems intuitive for only selecting columns.
- Reuse of existing patterns: across(c(1:2, 4), mean) vs. map_df(pick(1:2, 4), mean). The first requires you to know that across() can both select columns and apply a function; the latter can reuse existing function application methods.
The last point is particularly important if/when ... is deprecated in across() (#6073), as the functionality would not be identical anymore. For example:
# With no ..., need to use an anonymous function for na.rm
across(c(1, 3:4), ~ mean(., na.rm = TRUE))
# Could be avoided with `pick()`
map_df(pick(1, 3:4), mean, na.rm = TRUE)
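To make the comparison concrete, here is a minimal runnable sketch. Since pick() is only a proposal here, the code uses a hypothetical stand-in called pick_sketch(), built like the select(cur_data(), ...) wrapper; the toy data frame and its column names are also made up for illustration, and purrr is assumed to be installed.

```r
library(dplyr)

# Hypothetical stand-in for the proposed pick(); not an existing API in this post.
# cur_data() returns the current data inside a dplyr verb
# (newer dplyr releases may warn that it is deprecated).
pick_sketch <- function(...) {
  select(cur_data(), ...)
}

# Made-up example data
df <- tibble(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(2, 4, NA),
  d = c(10, 20, 30)
)

# Selection and function application bundled together in across()
with_across <- df %>%
  summarise(across(c(1, 3:4), ~ mean(.x, na.rm = TRUE)))

# Selection only, with the function applied by an ordinary mapper,
# so na.rm can be passed along without an anonymous function
with_pick <- df %>%
  summarise(purrr::map_df(pick_sketch(1, 3:4), mean, na.rm = TRUE))
```

Both calls should summarise columns a, c and d; the difference is only in where the selection ends and the function application begins.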
I would see the primary uses for this as:
- Replacing across() in selections inside verbs such as group_by(): group_by(across(c(1, 3:5))) vs. group_by(pick(1, 3:5)). Big semantic and syntactic win, IMO.
- Passing columns to functions that take data frame or matrix arguments, for example in the common questions about taking means or sums over rows of a data frame. In my experience people don’t think to write apply(across(1:5), 1, f), but apply(pick(1:5), 1, f) might be more intuitive.
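As a rough illustration of the row-wise use case, the sketch below passes a column selection to rowSums() and apply(). Again, pick_sketch() is a hypothetical stand-in for the proposed function (written like the select(cur_data(), ...) wrapper), and the scores data and its column names are invented for the example.

```r
library(dplyr)

# Hypothetical stand-in for the proposed pick(); an assumption, not an existing API here.
pick_sketch <- function(...) {
  select(cur_data(), ...)
}

# Made-up example data: one id column and three numeric columns
scores <- tibble(
  id = c("a", "b", "c"),
  q1 = c(1, 2, 3),
  q2 = c(4, 5, 6),
  q3 = c(7, 8, 9)
)

out <- scores %>%
  mutate(
    # Row sums over the selected columns only
    total = rowSums(pick_sketch(q1:q3)),
    # Any row-wise function can be applied the same way
    spread = apply(pick_sketch(q1:q3), 1, function(r) max(r) - min(r))
  )
```

The selection reads as "take these columns", and the row-wise function is whatever base R already provides for data frames or matrices.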
I could think of two ways to implement this as a wrapper:
pick <- function(...) {
  across(.cols = c(...))
}
Or:
pick <- function(...) {
  select(cur_data(), ...)
}
Although, particularly with the across() route, it would seem nicer to reverse the dependency and extract the relevant parts from across() into pick() instead.
I appreciate your consideration for this feature request.