
Broadly consider do() and rowwise() #4723

Closed
hadley opened this issue Jan 9, 2020 · 2 comments
Labels
feature grouping 👨‍👩‍👧‍👦

Comments

@hadley
Member

hadley commented Jan 9, 2020

The goal is to (finally!) come up with a formal replacement for do().

Imaginary code:

gapminder %>%
  nest_by(country) %>% 
  mutate(fit = lm(lifeExp ~ year, data = data)) %>% 
  summarise(broom::tidy(fit))

Equivalent to:

library(gapminder)
library(tidyverse)

gapminder %>%
  group_by(country) %>%
  nest() %>% 
  mutate(
    fit = map(data, ~ lm(lifeExp ~ year, data = .x)),
    tidy = map(fit, broom::tidy)
  ) %>% 
  select(tidy) %>% 
  unnest(tidy)
#> Adding missing grouping variables: `country`
#> # A tibble: 284 x 6
#> # Groups:   country [142]
#>    country     term         estimate std.error statistic  p.value
#>    <fct>       <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#>  1 Afghanistan (Intercept)  -508.     40.5        -12.5  1.93e- 7
#>  2 Afghanistan year            0.275   0.0205      13.5  9.84e- 8
#>  3 Albania     (Intercept)  -594.     65.7         -9.05 3.94e- 6
#>  4 Albania     year            0.335   0.0332      10.1  1.46e- 6
#>  5 Algeria     (Intercept) -1068.     43.8        -24.4  3.07e-10
#>  6 Algeria     year            0.569   0.0221      25.7  1.81e-10
#>  7 Angola      (Intercept)  -377.     46.6         -8.08 1.08e- 5
#>  8 Angola      year            0.209   0.0235       8.90 4.59e- 6
#>  9 Argentina   (Intercept)  -390.      9.68       -40.3  2.14e-12
#> 10 Argentina   year            0.232   0.00489     47.4  4.22e-13
#> # … with 274 more rows

Or, using do():

gapminder %>%
  group_by(country) %>%
  do(fit = lm(lifeExp ~ year, data = .)) %>% 
  do(data.frame(country = .$country, broom::tidy(.$fit)))

This requires:

  • mutate.rowwise() needs to automatically wrap outputs in a list where needed.
  • rowwise() needs to be able to capture grouping variables; or does grouped_df need some way to activate row-wise magic?
  • summarise.rowwise() would return a grouped_df, since the result might no longer have one row per group?
  • A new nest_by() that works like group_nest() + rowwise(). The name needs a lot of thought.
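The first requirement (wrap outputs in a list only where needed) can be illustrated in base R. This is a minimal sketch, not dplyr's implementation: rowwise_col() is a hypothetical helper that keeps a per-row result as an ordinary column when every row returns a length-1 atomic value and otherwise keeps it as a list-column, and split() on the built-in mtcars data stands in for nest_by().

```r
# Hypothetical helper (not part of dplyr): combine per-row results into a
# column. If every row produced a length-1 atomic value, simplify to an
# ordinary vector; otherwise keep a list-column ("wrap in a list where needed").
rowwise_col <- function(results) {
  scalar <- vapply(results, function(x) is.atomic(x) && length(x) == 1L, logical(1))
  if (all(scalar)) unlist(results, use.names = FALSE) else results
}

# Stand-in for nest_by(cyl): one row per group, with that group's rows
# stored in a list-column called `data`.
groups <- split(mtcars, mtcars$cyl)
nested <- data.frame(cyl = as.numeric(names(groups)))
nested$data <- unname(groups)

# Row-wise "mutate": evaluate each expression once per row.
# lm() returns a model object, so `fit` becomes a list-column ...
nested$fit <- rowwise_col(lapply(nested$data, function(d) lm(mpg ~ wt, data = d)))
# ... while nrow() returns one integer per row, so `n` stays a plain vector.
nested$n <- rowwise_col(lapply(nested$data, nrow))
```

The wrapping decision is made for the whole column rather than per row, because a column cannot hold plain values in some rows and list elements in others.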

hadley added the feature and grouping 👨‍👩‍👧‍👦 labels on Jan 9, 2020
@romainfrancois
Member

romainfrancois commented Jan 9, 2020

If we don't need fit to stick around, then we can write:

library(gapminder)
library(dplyr, warn.conflicts = FALSE)

gapminder %>%
  filter(continent == "Asia") %>%
  group_by(country) %>% 
  summarise(broom::tidy(lm(lifeExp ~ year)))
#> # A tibble: 66 x 6
#>    country     term        estimate std.error statistic  p.value
#>    <fct>       <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 Afghanistan (Intercept) -508.      40.5       -12.5  1.93e- 7
#>  2 Afghanistan year           0.275    0.0205     13.5  9.84e- 8
#>  3 Bahrain     (Intercept) -860.      54.3       -15.8  2.07e- 8
#>  4 Bahrain     year           0.468    0.0274     17.0  1.02e- 8
#>  5 Bangladesh  (Intercept) -936.      32.3       -29.0  5.63e-11
#>  6 Bangladesh  year           0.498    0.0163     30.5  3.37e-11
#>  7 Cambodia    (Intercept) -736.     186.         -3.95 2.74e- 3
#>  8 Cambodia    year           0.396    0.0942      4.20 1.82e- 3
#>  9 China       (Intercept) -989.     128.         -7.74 1.57e- 5
#> 10 China       year           0.531    0.0645      8.23 9.21e- 6
#> # … with 56 more rows

Created on 2020-01-09 by the reprex package (v0.3.0.9000)
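For comparison, the same shape (each group returns several rows, which are stacked and keyed by the group, and the fit itself is discarded) can be spelled out in base R. This sketch assumes the built-in mtcars data in place of gapminder and a hypothetical helper tidy_coef() that reads summary(fit)$coefficients as a stand-in for broom::tidy(); its column names mirror broom's output, but it is not broom.

```r
# Hypothetical stand-in for broom::tidy() on an lm fit: one row per term.
tidy_coef <- function(fit) {
  co <- summary(fit)$coefficients
  data.frame(
    term      = rownames(co),
    estimate  = unname(co[, "Estimate"]),
    std.error = unname(co[, "Std. Error"]),
    statistic = unname(co[, "t value"]),
    p.value   = unname(co[, "Pr(>|t|)"]),
    stringsAsFactors = FALSE
  )
}

# Fit one model per group without keeping the fit around, and return a
# multi-row result per group with the grouping key re-attached.
groups <- split(mtcars, mtcars$cyl)
per_group <- lapply(names(groups), function(g) {
  cbind(cyl = as.numeric(g), tidy_coef(lm(mpg ~ wt, data = groups[[g]])))
})

# Stack the per-group results: 2 terms x 3 groups = 6 rows.
results <- do.call(rbind, per_group)
```

Unlike the nest()/map()/unnest() version in the opening comment, no model object survives here; only the stacked table does.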

@romainfrancois
Member

romainfrancois commented Jan 14, 2020

If we were to:

  • roll back summarise()'s ability to return results of size > 1
  • enlist (wrap in a list) any result that is not a size-1 vector

Then we could have this:

library(gapminder)
library(dplyr, warn.conflicts = FALSE)

gapminder %>%
  filter(continent == "Asia") %>%
  group_by(country) %>% 
  summarise(
    fit = lm(lifeExp ~ year),
    results = broom::tidy(fit)
  )

Perhaps with a warning here, because we would be magically enlisting a data frame (which is a vector) of size > 1.

We could make it explicit with results = list(broom::tidy(fit)) instead. We would not have to be explicit for fit =, because lm() does not return a vector, nor would we need to add [[ to extract it as long as we are inside the same summarise() call, since we know we've just created it.

This gives up some functionality that summarise() was supposed to gain, i.e. results of size > 1, but that could become the job of do(), which would allow such results and potentially would not peel off a grouping level.
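The enlisting rule proposed here can be sketched as a small function. This is a minimal sketch under stated assumptions: enlist_result() is a hypothetical helper, not a dplyr API; "vector" is approximated as an atomic vector, a data frame, or a bare list (vctrs defines this more precisely); size is taken as NROW(); and mtcars plus summary()$coefficients stand in for gapminder and broom::tidy().

```r
# Hypothetical helper (not a dplyr API) implementing the proposed rule:
# - a vector of size 1 is kept as-is (so an explicit list(...) passes through);
# - anything that is not a vector (e.g. an lm fit) is wrapped in list() silently;
# - a vector of any other size (e.g. a multi-row data frame) is also wrapped,
#   but with a warning, because the user may have expected several rows.
# Assumption: "vector" = atomic vector, data frame, or bare (unclassed) list.
enlist_result <- function(x, name) {
  is_vec <- is.atomic(x) || is.data.frame(x) || (is.list(x) && !is.object(x))
  if (is_vec && NROW(x) == 1L) return(x)
  if (is_vec) {
    warning(sprintf("`%s` has size %d; wrapping it in a list.", name, NROW(x)),
            call. = FALSE)
  }
  list(x)
}

# Per-group "summarise": later expressions use the bare fit (no [[ needed),
# and only the stored values are enlisted.
summarise_group <- function(d) {
  fit <- lm(mpg ~ wt, data = d)
  coefs <- as.data.frame(summary(fit)$coefficients)
  list(
    fit = enlist_result(fit, "fit"),          # silent: not a vector
    results = enlist_result(coefs, "results") # warns: 2-row data frame
  )
}

# Run it for every group, collecting the warnings instead of printing them.
msgs <- character()
out <- withCallingHandlers(
  lapply(split(mtcars, mtcars$cyl), summarise_group),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Writing results = list(...) explicitly yields a bare list of size 1, which this rule keeps unchanged and without a warning, matching the explicit form described above.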
