morph() to automatically remove columns "used up" by a mutate() #3721
library(tidyverse)
# Proposed: iris %>% transmutate(Petal.Area = Petal.Width * Petal.Length)
# Current workaround: mutate(), then drop the inputs by hand
iris %>%
as_tibble() %>%
mutate(Petal.Area = Petal.Width * Petal.Length) %>%
select(-Petal.Width, -Petal.Length)
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Species Petal.Area
#> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 setosa 0.280
#> 2 4.9 3 setosa 0.280
#> 3 4.7 3.2 setosa 0.26
#> 4 4.6 3.1 setosa 0.3
#> 5 5 3.6 setosa 0.280
#> 6 5.4 3.9 setosa 0.68
#> 7 4.6 3.4 setosa 0.42
#> 8 5 3.4 setosa 0.3
#> 9 4.4 2.9 setosa 0.280
#> 10 4.9 3.1 setosa 0.15
#> # ... with 140 more rows
Thanks, I'm missing this functionality myself occasionally. Instead of parsing the expression, we could detect and record column access in the C++ code. What should the verb do in a grouped scenario if some groups access a different set of columns than other groups?
I think there are two natural options:
I'd rather support only The biggest problem I see with both options is that type stability is compromised: the resulting data frame might end up with different columns, depending on the data. Perhaps the safest thing to do would be to raise an error from this verb if different columns are accessed for different groups. What naming alternatives do we have? It might be difficult to remember the differences between
I agree that raising an error is probably the appropriate action when different columns are accessed across groups. I'm not sure I follow the type stability concerns; if the For naming alternatives, perhaps we can turn to a thesaurus: https://www.thesaurus.com/browse/transmute EDIT: A nicer alternative might be
Suppose we have two group types, X and Y, and the mutator code for group X accesses column We create something, but also take something else away. How about
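The idea of recording column access at evaluation time (instead of parsing), combined with the proposed rule of raising an error when groups touch different columns, can be sketched in plain R. This is a minimal illustration under stated assumptions, not dplyr's C++ mechanism: `eval_tracking()` and `used_columns_by_group()` are hypothetical helpers, each column is exposed as an active binding (`makeActiveBinding()`) that logs every read, and groups come from `split()` on a single grouping column.

```r
# Hypothetical helper (not a dplyr API): evaluate `expr` with the columns of
# `df` in scope and record which columns the evaluation actually read.
eval_tracking <- function(df, expr) {
  accessed <- character()
  mask <- new.env(parent = globalenv())
  for (nm in names(df)) {
    local({
      col <- nm
      # Each column is an active binding: reading it logs the name first.
      makeActiveBinding(col, function() {
        accessed <<- union(accessed, col)
        df[[col]]
      }, mask)
    })
  }
  value <- eval(expr, mask)
  list(value = value, accessed = sort(accessed))
}

# Hypothetical per-group check: return the columns read by `expr`, or stop
# if different groups read different sets of columns.
used_columns_by_group <- function(df, group_col, expr) {
  groups <- split(df, df[[group_col]])
  used <- lapply(groups, function(g) eval_tracking(g, expr)$accessed)
  if (length(unique(used)) > 1) {
    stop("groups accessed different columns; cannot decide which to drop")
  }
  used[[1]]
}

df <- data.frame(g = c("X", "X", "Y"), a = c(1, 2, -1), b = c(10, 20, 30))
used_columns_by_group(df, "g", quote(a * b))
```

With a data-dependent expression such as `if (all(a > 0)) a else b`, group X reads only `a` while group Y reads both `a` and `b`, so the sketch stops with an error instead of returning a data frame whose columns depend on the data.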
Thanks for the example, Kirill. That makes sense, and raising an error seems like the best approach to maintain consistency. I think I have a slight preference towards
Yes!!
I like the idea of the
I find those less descriptive of the proposed feature.
We're thinking about a new workflow based on tibble return values where
transmutate() automatically removes columns that were used in a mutate()-like call.
A few thoughts on implementation given the recent changes to dplyr internals: Implementing My primary concern is that this will be harder to implement if we switch to using ALTREP slices, but we could probably still have a fallback for
Dear dplyr developers,

A recent Stack Overflow question raised an interesting use case: having columns fed to a mutate() call automatically removed from the result. To do this, the mutator would need to parse the input expressions to determine which symbols were used, and I made a first pass at designing such a function. The question author liked my answer and suggested that I contribute it to dplyr.

I am happy to work on a PR with a more robust implementation, but I wanted to check with you whether such a feature would align with your design principles and the spirit of the package.
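The parsing approach can be sketched as a small wrapper around mutate(). This is a minimal illustration, not the function from the Stack Overflow answer and not a dplyr API: `transmutate()` here is hypothetical, it assumes dplyr >= 1.0 (for `all_of()`) plus rlang, and it treats any symbol in an expression that matches an existing column name as "used up".

```r
library(dplyr)
library(rlang)

# Hypothetical helper, NOT part of dplyr: mutate(), then drop every
# existing column whose name appears as a symbol in the expressions.
transmutate <- function(.data, ...) {
  dots <- enquos(...)
  # all.vars() lists the symbols referenced in each expression
  # (function names such as `*` are not included).
  symbols <- unique(unlist(lapply(dots, function(q) all.vars(quo_get_expr(q)))))
  # Only symbols naming existing columns count as "used up";
  # this also ignores variables from the calling environment.
  used <- intersect(symbols, names(.data))
  # Columns (re)defined by this call are kept, e.g. x = x + y keeps x.
  used <- setdiff(used, names(dots))
  out <- mutate(.data, !!!dots)
  select(out, -all_of(used))
}

iris %>%
  as_tibble() %>%
  transmutate(Petal.Area = Petal.Width * Petal.Length)
```

Because it only inspects symbols, this sketch will not notice columns referenced indirectly (for example through tidyselect helpers such as `starts_with()` or a column name stored in a string), and on grouped data select() keeps grouping columns even when they were used.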
Thanks. Big fan of your work.
-Artem