
Deprecate dataframe-based mapping functions #226

Merged (1 commit) on Sep 21, 2016


@lionel- (Member) commented Aug 18, 2016

This PR deprecates the row-based functions now that we have the new mapping functions in dplyr and the nest() / mutate() / map() idiom.

One use case that has no equivalent in dplyr is when the output length is within the range c(1, nrow(df)). For example:

dmap(mtcars, summary)

This can be worked around. Here, for example, we'd do:

mtcars %>% map(summary) %>% as_data_frame()

However, a new family of dplyr verbs for variable-length output may be useful. It could be called condense().

  • Like summarise(), it would discard all input columns except the grouping variables. This allows the output to have a different number of rows than the input.
  • Unlike summarise(), it would not require length 1 results and would only check for equal length within group. Grouping columns would be recycled to these lengths.
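To make these two rules concrete, here is a minimal base-R sketch of condense() for an ungrouped data frame. It is hypothetical, not a dplyr API: it uses substitute()/eval() instead of dplyr's evaluation machinery, assumes every argument is named, and, as with summarise(), lets later arguments refer to earlier results.

```r
# Hypothetical condense() for an ungrouped data frame (base R sketch, not dplyr).
condense <- function(.data, ...) {
  caller <- parent.frame()
  # Capture the unevaluated, named expressions passed in `...`
  exprs <- as.list(substitute(list(...)))[-1]
  # Columns of .data are visible by name; earlier results are added as we go
  env <- list2env(as.list(.data), parent = caller)
  out <- list()
  for (name in names(exprs)) {
    value <- eval(exprs[[name]], env)
    assign(name, value, envir = env)
    out[[name]] <- value
  }
  # Square constraint: all results must have the same length
  if (length(unique(lengths(out))) > 1) {
    stop("results must have same length", call. = FALSE)
  }
  # Input columns are discarded; the row count comes from the results
  as.data.frame(out)
}

condense(mtcars, col = 1:5, other = 5:1)
```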

For an ungrouped data frame, we'd check the square constraint:

mtcars %>% condense(col = 1:5, other = 5:1)
#> # A tibble: 5 x 2
#>     col other
#>   <int> <int>
#> 1     1     5
#> 2     2     4
#> 3     3     3
#> 4     4     2
#> 5     5     1

mtcars %>% condense(col = 1:5, other = 2:1)
#> Error: results must have same length

With a condense_all() variant (by analogy with summarise_all()), this immediately gives us:

mtcars %>% condense_all(summary)
#> # A tibble: 6 x 11
#>           mpg         cyl        disp          hp        drat
#>   <S3: table> <S3: table> <S3: table> <S3: table> <S3: table>
#> 1        10.4        4.00        71.1        52.0        2.76
#> 2        15.4        4.00       121.0        96.5        3.08
#> 3        19.2        6.00       196.0       123.0        3.70
#> 4        20.1        6.19       231.0       147.0        3.60
#> 5        22.8        8.00       326.0       180.0        3.92
#> 6        33.9        8.00       472.0       335.0        4.93
#> # ... with 6 more variables: wt <S3: table>, qsec <S3: table>, vs <S3:
#> #   table>, am <S3: table>, gear <S3: table>, carb <S3: table>
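A hypothetical condense_all() only has to apply one function to every column and enforce the same constraint. The base-R sketch below assumes the function returns an atomic vector per column; it uses as.vector() to drop the summaryDefault/table class, so its columns are plain numbers rather than the `<S3: table>` columns shown in the tibble output.

```r
# Hypothetical condense_all(): apply .f to every column (base R sketch).
condense_all <- function(.data, .f, ...) {
  out <- lapply(.data, function(column) .f(column, ...))
  # All results must share one length, which becomes the number of rows
  if (length(unique(lengths(out))) > 1) {
    stop("results must have same length", call. = FALSE)
  }
  # Strip classes such as "summaryDefault"/"table" and names from each result
  as.data.frame(lapply(out, as.vector))
}

condense_all(mtcars, summary)
```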

For a grouped data frame, we'd check the square constraint within groups:

grouped <- mtcars %>% group_by(am)

grouped %>%
  condense(
    col = rep(mean(cyl), times = round(mean(cyl))),
    other = rep(length(col), length(col))
  )
#> # A tibble: 12 x 3
#>       am   col other
#>    <dbl> <dbl> <int>
#> 1      0  6.95     7
#> 2      0  6.95     7
#> 3      0  6.95     7
#> 4      0  6.95     7
#> 5      0  6.95     7
#> 6      0  6.95     7
#> 7      0  6.95     7
#> 8      1  5.08     5
#> 9      1  5.08     5
#> 10     1  5.08     5
#> 11     1  5.08     5
#> 12     1  5.08     5

grouped %>%
  condense(
    col = rep(mean(cyl), times = round(mean(cyl))),
    other = rep(length(col), length(col) - 1)
  )
#> Error: results must have same length within groups
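The grouped behaviour can be sketched by running the same evaluation once per group: lengths are checked within each group, and the grouping value is recycled to that group's result length. This is again hypothetical base R; condense_grouped() and its `.group` argument (a single column name given as a string) are illustrative stand-ins for group_by() followed by condense().

```r
# Hypothetical grouped condense (base R sketch, not dplyr).
condense_grouped <- function(.data, .group, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  caller <- parent.frame()
  pieces <- lapply(split(.data, .data[[.group]]), function(df) {
    # Evaluate the expressions against this group's rows only
    env <- list2env(as.list(df), parent = caller)
    out <- list()
    for (name in names(exprs)) {
      value <- eval(exprs[[name]], env)
      assign(name, value, envir = env)
      out[[name]] <- value
    }
    # Square constraint, checked within the group
    n <- unique(lengths(out))
    if (length(n) > 1) {
      stop("results must have same length within groups", call. = FALSE)
    }
    # Recycle the grouping value to the group's result length
    key <- setNames(list(rep(df[[.group]][1], n)), .group)
    as.data.frame(c(key, out))
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

condense_grouped(mtcars, "am",
  col = rep(mean(cyl), times = round(mean(cyl))),
  other = rep(length(col), length(col))
)
```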

Relevant discussion: tidyverse/dplyr#154

@codecov-io

Current coverage is 76.47% (diff: 85.71%)

Merging #226 into master will increase coverage by 0.18%

@@             master       #226   diff @@
==========================================
  Files            42         42          
  Lines          1544       1556    +12   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           1178       1190    +12   
  Misses          366        366          
  Partials          0          0          

Powered by Codecov. Last update d162445...39984cd

@lionel- (Member, Author) commented Aug 18, 2016

We could also have a disperse() verb that would work like condense() but spread the results over numbered columns, resulting in one row per group like summarise(). That would be the equivalent of .collate = "cols" in the purrr data frame functions.

With that verb, lengths could differ across results but would have to be the same across groups.
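A disperse() verb could be sketched the same way. In this hypothetical base-R version, disperse_grouped(), its `.group` argument, the `name_1`, `name_2`, ... column naming and the error message are all assumptions: each result is spread over numbered columns, giving one row per group, and a result's length may differ from other results but must agree across groups.

```r
# Hypothetical disperse() for a data frame grouped by one column (base R sketch).
disperse_grouped <- function(.data, .group, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  caller <- parent.frame()
  groups <- split(.data, .data[[.group]])
  # Evaluate every named expression once per group
  results <- lapply(groups, function(df) {
    env <- list2env(as.list(df), parent = caller)
    lapply(exprs, eval, envir = env)
  })
  # Each result must have the same length in every group
  for (name in names(exprs)) {
    lens <- vapply(results, function(r) length(r[[name]]), integer(1))
    if (length(unique(lens)) > 1) {
      stop("results must have same length across groups", call. = FALSE)
    }
  }
  rows <- lapply(names(groups), function(g) {
    # Spread each result over columns name_1, name_2, ...
    spread <- lapply(names(exprs), function(name) {
      v <- as.vector(results[[g]][[name]])
      setNames(as.list(v), paste0(name, "_", seq_along(v)))
    })
    key <- setNames(list(groups[[g]][[.group]][1]), .group)
    as.data.frame(c(key, unlist(spread, recursive = FALSE)))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

disperse_grouped(mtcars, "am", mpg_range = range(mpg), n = length(mpg))
```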

@hadley (Member) commented Sep 21, 2016

Can you please add a bullet to NEWS and move the dplyr stuff to a dplyr issue? It would also be useful to comment on dmap(iris, summary); I'm not sure whether summary() is key to your examples.

@lionel- merged commit 8c72c35 into tidyverse:master on Sep 21, 2016
@lionel- deleted the df-deprecation branch on November 27, 2018 14:37