Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coalesce() function "Must subset elements with valid subscript vector" #5326

Closed
mlok2u opened this issue Jun 11, 2020 · 8 comments · Fixed by #5334
Closed

coalesce() function "Must subset elements with valid subscript vector" #5326

mlok2u opened this issue Jun 11, 2020 · 8 comments · Fixed by #5334
Labels
bug an unexpected problem or unintended behavior funs 😆

Comments

@mlok2u
Copy link

mlok2u commented Jun 11, 2020

Hi,

I have historically been able to wrap the coalesce() function with a function that will identify suffixed columns in a dataframe and coalesce accordingly.
However, after updating to v.1.0.0, the functionality appears to be broken due to the following exception:
"Must subset elements with a valid subscript vector."


Brief description of the problem

library(dplyr)

coalesce_df_fields <- function(df, suffix_a, suffix_b) {
  suffix_a <- paste0('(',suffix_a,')$')
  suffix_b <- paste0('(',suffix_b,')$')

  fields_a <- data.frame(fields_a = colnames(df)[grepl(suffix_a, colnames(df))]) %>%
    mutate(root = gsub(suffix_a, '', fields_a))
  
  fields_b <- data.frame(fields_b = colnames(df)[grepl(suffix_b, colnames(df))]) %>%
    mutate(root = gsub(suffix_b, '', fields_b)) %>%
    filter(fields_b != 'not_in_sf')
  
  colname_ref <- fields_a %>% full_join(fields_b, by = 'root')
  ## LOOP: Need to create new field "root" in df that coalesces "fields_a" and "fields_b" values
  df[colname_ref$root] <- coalesce(df[colname_ref$fields_a], df[colname_ref$fields_b])
  
  return(df)
  
}

## use case of the function:

sample_df <- data.frame(id = c('a', 'b', 'c', 'd'),
                 int_x = c(NA, 5, NA, 8),
                 char_x = c('14', NA, 'hello', 'what'),
                 int_y = c(100, 389, 90, 9),
                 char_y = c('this', 'should', 'fill', 'in'))

df <- coalesce_df_fields(sample_df, '_x', '_y')
@mlok2u mlok2u changed the title coalesce() function coalesce() function "Must subset elements with valid subscript vector" Jun 11, 2020
@lionel-
Copy link
Member

lionel- commented Jun 11, 2020

Minimal example

coalesce(data.frame(x = c(NA, 1)), data.frame(x = 1:2))
#> Error: Must subset elements with a valid subscript vector.
#> ✖ Subscript must be a simple vector, not a matrix.

@Jillzee
Copy link

Jillzee commented Jun 15, 2020

I'm also experiencing this error after updating to 1.0.0 today for a similar data situation to those which lionel and mlok2u provided above. It had been working prior to updating.

@lionel-
Copy link
Member

lionel- commented Jun 15, 2020

I'll take a look now.

@lionel-
Copy link
Member

lionel- commented Jun 15, 2020

As far as I can tell this never worked properly. The behaviour with data frames is not documented in ?coalesce and it doesn't return sensical output in 0.8.5:

out <- dplyr::coalesce(data.frame(x = c(NA, 1)), data.frame(x = 1:2))
str(out)
#> 'data.frame':	2 obs. of  1 variable:
#>  $ x:List of 2
#>   ..$ : int  1 2
#>   ..$ : num 1

So I think you were relying on undefined behaviour. We should probably make it work with data frames but that will require some thinking and the behaviour will be different than in 0.8.5.

@mlok2u
Copy link
Author

mlok2u commented Jun 15, 2020

It appeared to have worked when using dplyr 0.7.7.

@lionel-
Copy link
Member

lionel- commented Jun 15, 2020

I've just tried with 0.7.7 and I see the same non-sensical output as in my last comment.

I have prepared a fix but the behaviour will be different, the data frames will be coalesced columns by columns.

lionel- added a commit to lionel-/dplyr that referenced this issue Jun 15, 2020
lionel- added a commit to lionel-/dplyr that referenced this issue Jun 15, 2020
@lionel-
Copy link
Member

lionel- commented Jun 15, 2020

@mlok2u My bad, the use case you've reported did work correctly in previous versions. With the fix in #5334, the only difference in the output is that the type of the column int will now be numeric instead of character, which is the correct behaviour.

@mlok2u
Copy link
Author

mlok2u commented Jun 16, 2020

Awesome - thanks for looking into this!

@hadley hadley added bug an unexpected problem or unintended behavior funs 😆 labels Jun 22, 2020
hadley pushed a commit that referenced this issue Jun 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior funs 😆
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants