Using rowwise and mutate to create new variables #3890

Closed
nealpsmith opened this issue Oct 9, 2018 · 3 comments

@nealpsmith commented Oct 9, 2018

When reviewing some code, I noticed there seems to be an issue with using rowwise() with mutate() to create a new dataframe column.

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe used below

# Create a data frame with two rows of values
df <- as.data.frame(matrix(c(2,3,4,5,20,6,1,7,2,3), nrow = 2, byrow = TRUE))

# Use apply() to compute the median of each row: this gives correct values
df$apply.median <- apply(df, 1, median)

# Use rowwise with mutate to get medians: This gives incorrect values
df %<>% rowwise() %>%
  dplyr::mutate(dplyr.median=median(V1:V5))

df
Source: local data frame [2 x 7]
Groups: <by row>

# A tibble: 2 x 7
     V1    V2    V3    V4    V5 apply.median dplyr.median
  <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>        <dbl>
1     2     3     4     5    20            4         11  
2     6     1     7     2     3            3          4.5

I've seen this issue not just with medians, but also with other basic functions. For example, if I compute the length of V1:V5 in the same manner, this is the output:

# Look for the length of the vector that is being used for the median
df %<>% rowwise() %>%
  dplyr::mutate(length = length(V1:V5))

> df
Source: local data frame [2 x 8]
Groups: <by row>

# A tibble: 2 x 8
     V1    V2    V3    V4    V5 apply.median dplyr.median length
  <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>        <dbl>  <int>
1     2     3     4     5    20            4         11       19
2     6     1     7     2     3            3          4.5      4

@hadley (Member) commented Oct 9, 2018

I don't think median(V1:V5) does what you think it does 😉

@hadley closed this Oct 9, 2018

@romainfrancois (Member) commented Oct 9, 2018

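When you write median(V1:V5), the : operator acts on the values in each row rather than selecting the columns V1 through V5. For the first row you get the same as median(2:20), which is 11; for the second, median(6:3), which is 4.5.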
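To make that concrete, here is a minimal base R sketch that mimics what happens for each row. The names row1, row2, seq1 and seq2 are illustrative only and do not appear in the original report; the values are taken from the example data frame.

```r
# Values of V1..V5 for the two rows of the example data frame
row1 <- c(2, 3, 4, 5, 20)
row2 <- c(6, 1, 7, 2, 3)

# In a rowwise mutate(), V1:V5 is an ordinary call to `:` on the row's
# V1 and V5 values, so it builds a sequence from V1 to V5.
seq1 <- row1[1]:row1[5]  # 2:20
seq2 <- row2[1]:row2[5]  # 6:3, which counts down: 6 5 4 3

median(seq1)  # 11, the unexpected dplyr.median for row 1
length(seq1)  # 19
median(seq2)  # 4.5, the unexpected dplyr.median for row 2
length(seq2)  # 4

# The intended result is the median of the five values themselves
median(row1)  # 4
median(row2)  # 3
```

These numbers match the dplyr.median and length columns reported above, which suggests the output is expected behaviour of : rather than a bug in rowwise().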

This is more of a job for purrr::pmap():

library(tidyverse)
df <- as.data.frame(matrix(c(2,3,4,5,20,6,1,7,2,3), nrow = 2, byrow = TRUE))
df %>% 
  mutate(median = select(., V1:V5) %>% pmap_dbl(~median(c(...))))
#>   V1 V2 V3 V4 V5 median
#> 1  2  3  4  5 20      4
#> 2  6  1  7  2  3      3

Created on 2018-10-09 by the reprex package (v0.2.1.9000)
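For readers on a newer dplyr, a possible alternative is c_across(), which was added in dplyr 1.0.0 (released after this thread) and collects a tidyselect range of columns into a vector for the current row. The sketch below assumes dplyr 1.0.0 or later is installed; the variable name res is illustrative only.

```r
library(dplyr)

df <- as.data.frame(matrix(c(2,3,4,5,20,6,1,7,2,3), nrow = 2, byrow = TRUE))

# c_across(V1:V5) treats V1:V5 as a column selection, not as a call to `:`
# on the values, so median() sees the five values of the current row.
res <- df %>%
  rowwise() %>%
  mutate(median = median(c_across(V1:V5))) %>%
  ungroup()  # drop the rowwise grouping once the row-level work is done

res$median
```

This keeps the rowwise() style of the original report while giving the same medians (4 and 3) as the apply() and pmap_dbl() versions.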

@lock (bot) commented Apr 7, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock (bot) locked and limited conversation to collaborators Apr 7, 2019