Feature request: across2() and pacross() #5703

Closed
mbcann01 opened this issue Jan 23, 2021 · 2 comments

@mbcann01

mbcann01 commented Jan 23, 2021

I found myself in a situation today where I wished there were an across2() or pacross() version of across(), analogous to the map2() and pmap() versions of purrr::map(). Here is a reproducible example to illustrate the situation.

I have patient data that includes items from the Patient Health Questionnaire (PHQ). I've been asked to dichotomize each of the phq columns. I've also been given column labels and value labels and asked to add them to the new columns in the data frame.

library(dplyr, warn.conflicts = FALSE)

# Example data
df <- tibble(
  id   = 1:10,
  sex  = c("m", "m", "m", "f", "f", "f", "m", "f", "f", "m"),
  phq1 = c(3, 1, 0, 1, 1, 0, 0, 3, 0, 1),
  phq2 = c(1, 2, 1, 1, 0, 1, 0, 3, 0, 1),
  phq3 = c(2, 1, 0, 0, 0, 0, 1, 3, 0, 1),
  phq4 = c(0, 2, 2, 0, 0, 0, 0, 3, 0, 1)
)
# Column labels
phq_var_labs <- c(
  'Little interest or pleasure in doing things dicot',
  'Felling down, depressed, or hopeless dicot',
  'Trouble falling or staying asleep, or sleeping too much dicot',
  'Feeling tired or having little energy dicot'
)

# Value labels
phq_val_labs <- c(
  'Not at all and several days' = 0, 
  'More than half the days and nearly every day' = 1
)

Of course, I can easily create the new dichotomized variables using mutate() and across().

df %>% 
  mutate(
    across(
      .cols = phq1:phq4,
      .fns  = ~ case_when(
        .x  < 2 ~ 0,
        .x >= 2 ~ 1
      ),
      .names = "{col}_dicot"
    )
  )

However, I wasn't able to come up with an iterative tidyverse solution that I liked for also adding the labels. Here is the best solution I was able to come up with:

# Columns to dichotomize
phq_vars <- c("phq1", "phq2", "phq3", "phq4")

# Dichotomize the values
df <- df %>%
  mutate(
    across(
      .cols = all_of(phq_vars),
      .fns  = ~ case_when(
        .x  < 2 ~ 0,
        .x >= 2 ~ 1
      ),
      .names = "{col}_dicot"
    )
  )

# Add column and value labels, pairing each new column with its label by position
for (i in seq_along(phq_vars)) {
  d <- paste0(phq_vars[[i]], "_dicot")
  attr(df[[d]], "label")  <- phq_var_labs[[i]]
  attr(df[[d]], "labels") <- phq_val_labs
}

In my case, and perhaps in others, it would be handy to be able to write something like:

df <- df %>% 
  mutate(
    across2(
      .cols = phq1:phq4,
      .y    = phq_var_labs,
      .fns  = function(x, .y) {
        v <- case_when(
          x  < 2 ~ 0,
          x >= 2 ~ 1
        )
        attr(v, "label")  <- .y
        attr(v, "labels") <- phq_val_labs
        v
      },
      .names = "{col}_dicot"
    )
  )
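Until something like across2() exists, one possible workaround is sketched below. It assumes dplyr >= 1.0.0, where cur_column() can be called inside across() to get the name of the column currently being processed. It also assumes the column labels are stored in a vector named by source column; the phq1 = ... names are added here for illustration and are not part of the original data. Looking each label up by column name gives the same pairing that across2() would provide.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id   = 1:10,
  sex  = c("m", "m", "m", "f", "f", "f", "m", "f", "f", "m"),
  phq1 = c(3, 1, 0, 1, 1, 0, 0, 3, 0, 1),
  phq2 = c(1, 2, 1, 1, 0, 1, 0, 3, 0, 1),
  phq3 = c(2, 1, 0, 0, 0, 0, 1, 3, 0, 1),
  phq4 = c(0, 2, 2, 0, 0, 0, 0, 3, 0, 1)
)

# Assumed setup: each column label is named after the column it describes
phq_var_labs <- c(
  phq1 = 'Little interest or pleasure in doing things dicot',
  phq2 = 'Felling down, depressed, or hopeless dicot',
  phq3 = 'Trouble falling or staying asleep, or sleeping too much dicot',
  phq4 = 'Feeling tired or having little energy dicot'
)

# Value labels shared by every dichotomized column
phq_val_labs <- c(
  'Not at all and several days' = 0,
  'More than half the days and nearly every day' = 1
)

df <- df %>%
  mutate(
    across(
      .cols = phq1:phq4,
      .fns  = function(x) {
        v <- case_when(
          x  < 2 ~ 0,
          x >= 2 ~ 1
        )
        # cur_column() returns the name of the source column being processed,
        # so it can index the matching column label
        attr(v, "label")  <- phq_var_labs[[cur_column()]]
        attr(v, "labels") <- phq_val_labs
        v
      },
      .names = "{col}_dicot"
    )
  )
```

Because the lookup is by name rather than by position, the labels do not need to be listed in the same order as the selected columns.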

@romainfrancois
Member

This feels a bit esoteric, especially with .cols= using tidy selection and .y= not.

I'm not sure this needs to be abstracted out into a new function. For this specific example, it could be something like the following:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id   = 1:10,
  sex  = c("m", "m", "m", "f", "f", "f", "m", "f", "f", "m"),
  phq1 = c(3, 1, 0, 1, 1, 0, 0, 3, 0, 1),
  phq2 = c(1, 2, 1, 1, 0, 1, 0, 3, 0, 1),
  phq3 = c(2, 1, 0, 0, 0, 0, 1, 3, 0, 1),
  phq4 = c(0, 2, 2, 0, 0, 0, 0, 3, 0, 1)
)
# Column labels
phq_var_labs <- c(
  'Little interest or pleasure in doing things dicot',
  'Felling down, depressed, or hopeless dicot',
  'Trouble falling or staying asleep, or sleeping too much dicot',
  'Feeling tired or having little energy dicot'
)

out <- df %>% 
  mutate({
    data <- across(phq1:phq4)
    out <- purrr::map2_df(data, phq_var_labs, function(.x, .y) {
      v <- case_when(
        .x  < 2 ~ 0,
        .x >= 2 ~ 1
      )
      attr(v, "label")  <- .y
      attr(v, "labels") <- phq_var_labs
      v  
    })
    names(out) <- glue::glue("{col}_dicot", col = names(out))
    out
  })

structure(out$phq2_dicot)
#>  [1] 0 1 0 0 0 0 0 1 0 0
#> attr(,"label")
#> [1] "Felling down, depressed, or hopeless dicot"
#> attr(,"labels")
#> [1] "Little interest or pleasure in doing things dicot"            
#> [2] "Felling down, depressed, or hopeless dicot"                   
#> [3] "Trouble falling or staying asleep, or sleeping too much dicot"
#> [4] "Feeling tired or having little energy dicot"

Created on 2021-01-25 by the reprex package (v0.3.0)

@mbcann01
Author

Hi @romainfrancois, thank you for your consideration. I can see how it could seem esoteric. Other potential uses could include creating factors and assigning factor labels. Even so, I respect that it isn't worth pursuing further.

Also, thank you for providing an example of a potential solution for my specific issue. As a side note, it doesn't quite give the desired result. Notice that all four variable labels have been assigned to the "labels" attribute of phq2_dicot above, where the value labels were expected; only its "label" attribute correctly holds "Felling down, depressed, or hopeless dicot".
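For reference, the mismatch traces back to the attr(v, "labels") line in the reprex above, which assigns phq_var_labs (the four column labels) where phq_val_labs (the value labels) was presumably intended. Below is a sketch of the same map2_df() approach with that one name swapped, plus paste0() in place of glue::glue() to drop a dependency. It assumes purrr is installed and redefines the example data so it runs on its own.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id   = 1:10,
  sex  = c("m", "m", "m", "f", "f", "f", "m", "f", "f", "m"),
  phq1 = c(3, 1, 0, 1, 1, 0, 0, 3, 0, 1),
  phq2 = c(1, 2, 1, 1, 0, 1, 0, 3, 0, 1),
  phq3 = c(2, 1, 0, 0, 0, 0, 1, 3, 0, 1),
  phq4 = c(0, 2, 2, 0, 0, 0, 0, 3, 0, 1)
)
phq_var_labs <- c(
  'Little interest or pleasure in doing things dicot',
  'Felling down, depressed, or hopeless dicot',
  'Trouble falling or staying asleep, or sleeping too much dicot',
  'Feeling tired or having little energy dicot'
)
phq_val_labs <- c(
  'Not at all and several days' = 0,
  'More than half the days and nearly every day' = 1
)

out <- df %>%
  mutate({
    data <- across(phq1:phq4)
    # Walk the selected columns and their labels in parallel
    res <- purrr::map2_df(data, phq_var_labs, function(.x, .y) {
      v <- case_when(
        .x  < 2 ~ 0,
        .x >= 2 ~ 1
      )
      attr(v, "label")  <- .y            # one column label per column
      attr(v, "labels") <- phq_val_labs  # value labels, not column labels
      v
    })
    names(res) <- paste0(names(res), "_dicot")
    res
  })
```

The unnamed data frame returned by the block is spliced into the result by mutate(), so out gains phq1_dicot through phq4_dicot alongside the original columns.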
