Deprecate dataframe-based mapping functions #226

lionel- · 2016-08-18T11:10:43Z

Deprecates the rows-based functions, now that we have the new mapping functions in dplyr and the nest() / mutate() / map() idiom.

One use case that has no equivalent in dplyr is when the output length falls between 1 and nrow(df), i.e. neither one row per group as with summarise() nor one row per input row as with mutate(). For example:

dmap(mtcars, summary)

This can be worked around. Here, for example, we'd do:

mtcars %>% map(summary) %>% as_data_frame()

However, a new dplyr family of verbs for variable-length output might be useful. It could be called condense().

Like summarise(), it would discard all input columns except the grouping variables, which allows the output to have a different number of rows than the input.
Unlike summarise(), it would not require length-1 results; it would only check that results have equal lengths within each group. Grouping columns would be recycled to these lengths.

Ungrouped data frame: check the square constraint (all results must have the same length):

mtcars %>% condense(col = 1:5, other = 5:1)
#> # A tibble: 5 x 2
#>     col other
#>   <int> <int>
#> 1     1     5
#> 2     2     4
#> 3     3     3
#> 4     4     2
#> 5     5     1

mtcars %>% condense(col = 1:5, other = 2:1)
#> Error: results must have same length

This immediately gives us:

mtcars %>% condense_all(summary)
#> # A tibble: 6 x 11
#>           mpg         cyl        disp          hp        drat
#>   <S3: table> <S3: table> <S3: table> <S3: table> <S3: table>
#> 1        10.4        4.00        71.1        52.0        2.76
#> 2        15.4        4.00       121.0        96.5        3.08
#> 3        19.2        6.00       196.0       123.0        3.70
#> 4        20.1        6.19       231.0       147.0        3.60
#> 5        22.8        8.00       326.0       180.0        3.92
#> 6        33.9        8.00       472.0       335.0        4.93
#> # ... with 6 more variables: wt <S3: table>, qsec <S3: table>, vs <S3:
#> #   table>, am <S3: table>, gear <S3: table>, carb <S3: table>

For a grouped data frame, we'd check the square constraint within groups:

grouped <- mtcars %>% group_by(am)

grouped %>%
  condense(
    col = rep(mean(cyl), times = round(mean(cyl))),
    other = rep(length(col), length(col))
  )
#> # A tibble: 12 x 3
#>       am   col other
#>    <dbl> <dbl> <int>
#> 1      0  6.95     7
#> 2      0  6.95     7
#> 3      0  6.95     7
#> 4      0  6.95     7
#> 5      0  6.95     7
#> 6      0  6.95     7
#> 7      0  6.95     7
#> 8      1  5.08     5
#> 9      1  5.08     5
#> 10     1  5.08     5
#> 11     1  5.08     5
#> 12     1  5.08     5

grouped %>%
  condense(
    col = rep(mean(cyl), times = round(mean(cyl))),
    other = rep(length(col), length(col) - 1)
  )
#> Error: results must have same length within groups
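To make the proposed semantics concrete, here is a minimal base-R sketch, for illustration only. `condense_sketch()` and its `.by` argument (a character vector of grouping column names, standing in for group_by()) are hypothetical names, not a dplyr API. Expressions are evaluated in order within each group so that later ones can refer to earlier results, which the `length(col)` example relies on.

```r
# Hypothetical sketch of the proposed condense() verb; not a dplyr function.
# Grouping is passed as a character vector `.by` (an assumption that keeps
# the sketch in base R instead of relying on group_by()).
condense_sketch <- function(.data, ..., .by = character()) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated, named expressions
  caller <- parent.frame()
  # Row indices of each group; the whole data frame is one group if ungrouped
  groups <- if (length(.by)) {
    split(seq_len(nrow(.data)), .data[.by], drop = TRUE)
  } else {
    list(seq_len(nrow(.data)))
  }
  pieces <- lapply(groups, function(rows) {
    # Data mask: this group's columns, falling back to the caller's environment
    env <- list2env(as.list(.data[rows, , drop = FALSE]), parent = caller)
    results <- list()
    for (nm in names(exprs)) {
      val <- eval(exprs[[nm]], env)
      assign(nm, val, envir = env)  # later expressions can see earlier results
      results[[nm]] <- val
    }
    # Square constraint: all results must share one length within the group
    if (length(unique(lengths(results))) > 1) {
      msg <- "results must have same length"
      if (length(.by)) msg <- paste(msg, "within groups")
      stop(msg, call. = FALSE)
    }
    piece <- as.data.frame(results)
    if (length(.by)) {
      # Recycle the group's key values to the common result length
      keys <- .data[rep(rows[1], nrow(piece)), .by, drop = FALSE]
      piece <- cbind(keys, piece)
    }
    piece
  })
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}
```

With this sketch, `condense_sketch(mtcars, col = 1:5, other = 5:1)` returns a 5-row data frame, and the grouped example corresponds to `condense_sketch(mtcars, col = rep(mean(cyl), times = round(mean(cyl))), other = rep(length(col), length(col)), .by = "am")`, which returns 12 rows (7 for am == 0, 5 for am == 1).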

Relevant discussion: tidyverse/dplyr#154

codecov-io · 2016-08-18T11:14:27Z

Current coverage is 76.47% (diff: 85.71%)

Merging #226 into master will increase coverage by 0.18%

@@             master       #226   diff @@
==========================================
  Files            42         42          
  Lines          1544       1556    +12   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           1178       1190    +12   
  Misses          366        366          
  Partials          0          0

Powered by Codecov. Last update d162445...39984cd

lionel- · 2016-08-18T14:25:43Z

Could we also have a disperse() verb, like condense() but spreading the results over numbered columns, so that it returns one row per group like summarise()? That would be the equivalent of .collate = "cols" in the purrr data frame functions.

With that verb, lengths could differ across results but would have to be the same across groups.
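A hypothetical base-R sketch of the disperse() idea, assuming each result of length k is spread over columns named name1 to namek (the naming scheme is an assumption). `disperse_sketch()` and its `.by` argument (a character vector of grouping columns) are illustrative names only, not an existing dplyr or purrr API.

```r
# Hypothetical sketch of the proposed disperse() verb; not a dplyr function.
# Grouping is a character vector `.by` (assumption); output is one row per group.
disperse_sketch <- function(.data, ..., .by = character()) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated, named expressions
  caller <- parent.frame()
  groups <- if (length(.by)) {
    split(seq_len(nrow(.data)), .data[.by], drop = TRUE)
  } else {
    list(seq_len(nrow(.data)))
  }
  # Evaluate every expression once per group, in a mask of that group's columns
  results <- lapply(groups, function(rows) {
    env <- list2env(as.list(.data[rows, , drop = FALSE]), parent = caller)
    lapply(exprs, eval, envir = env)
  })
  # Lengths may differ across results but must agree across groups
  if (length(unique(lapply(results, lengths))) > 1) {
    stop("results must have same length across groups", call. = FALSE)
  }
  out_rows <- lapply(seq_along(groups), function(i) {
    res <- results[[i]]
    # Spread each result over numbered columns: name1, name2, ...
    cols <- unlist(lapply(names(res), function(nm) {
      stats::setNames(as.list(res[[nm]]), paste0(nm, seq_along(res[[nm]])))
    }), recursive = FALSE)
    row <- as.data.frame(cols)
    if (length(.by)) row <- cbind(.data[groups[[i]][1], .by, drop = FALSE], row)
    row
  })
  out <- do.call(rbind, out_rows)
  rownames(out) <- NULL
  out
}
```

For instance, `disperse_sketch(mtcars, q = range(mpg), n = length(mpg), .by = "am")` gives one row per value of am, with columns am, q1, q2 and n1.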

hadley · 2016-09-21T13:07:06Z

Can you please add a bullet to NEWS and move the dplyr stuff to a dplyr issue? It would also be useful to comment on dmap(iris, summary); I'm not sure whether summary() is key to your examples or not.

lionel- force-pushed the df-deprecation branch from 39984cd to 00d5563, September 21, 2016 17:29

Deprecate dataframe-based mapping functions (4cdeccf)

lionel- force-pushed the df-deprecation branch from 00d5563 to 4cdeccf, September 21, 2016 17:37

lionel- merged commit 8c72c35 into tidyverse:master Sep 21, 2016

lionel- deleted the df-deprecation branch November 27, 2018 14:37
