Invisible differences in column names cause unnest() to fail #722

Closed
paleolimbot opened this issue Aug 29, 2019 · 4 comments
@paleolimbot
Member

This is a very weird one that was discovered while trying to fix paleolimbot/rclimateca#17. I can't replicate it without the exact binary representation of the original data frame, even though identical(df, df2) is true. I think it involves weird characters in the column names. Note that this code works in the CRAN version of tidyr (0.8.3) but fails in the current development version.

library(tibble)
library(tidyr)

df <- tibble(
  dataset = c("ec_climate", "ec_climate"),
  location = c("KENTVILLE CDA CS NS 27141", "KENTVILLE CDA CS NS 27141"),
  result = list(
    tibble(
      `Max Temp Flag` = character(0),
      `Min Temp (°C)` = character(0),
      `Min Temp Flag` = character(0),
      `Mean Temp (°C)` = character(0)
    ),
    tibble(
      `Max Temp Flag` = rep(NA_character_, 6),
      `Min Temp (°C)` = rep(NA_character_, 6),
      `Min Temp Flag` = rep(NA_character_, 6),
      `Mean Temp (°C)` = rep(NA_character_, 6)
    )
  )
)

unnest(df, result)
#> # A tibble: 6 x 6
#>   dataset location `Max Temp Flag` `Min Temp (°C)` `Min Temp Flag`
#>   <chr>   <chr>    <chr>           <chr>           <chr>          
#> 1 ec_cli… KENTVIL… <NA>            <NA>            <NA>           
#> 2 ec_cli… KENTVIL… <NA>            <NA>            <NA>           
#> 3 ec_cli… KENTVIL… <NA>            <NA>            <NA>           
#> 4 ec_cli… KENTVIL… <NA>            <NA>            <NA>           
#> 5 ec_cli… KENTVIL… <NA>            <NA>            <NA>           
#> 6 ec_cli… KENTVIL… <NA>            <NA>            <NA>           
#> # … with 1 more variable: `Mean Temp (°C)` <chr>

bad_df_file <- tempfile(fileext = ".rds")
curl::curl_download(
  "https://gist.github.com/paleolimbot/ec9b62b758ae57a5b4669fa771fc40a0/raw/e96b55f54d68b1cb3877bb358b28b99dc8836ceb/bad_df.rds",
  bad_df_file
)
df2 <- readr::read_rds(bad_df_file)
df2$result <- lapply(df2$result, function(x) {
  attr(x, "flag_info") <- NULL
  x
})

testthat::expect_identical(df, df2)
unnest(df2, result)
#> Column names `Min Temp (°C)`, `Mean Temp (°C)` must not be duplicated.
#> Use .name_repair to specify repair.

Created on 2019-08-29 by the reprex package (v0.2.1)
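
For readers following along: one general way two character vectors can compare as identical yet differ at the byte level is a difference in marked encoding. ?identical documents that strings in different marked encodings are treated as identical if they agree once translated to UTF-8. The sketch below illustrates that behaviour with a made-up column name; whether this is what happens in bad_df.rds is an assumption, not a diagnosis.

```r
# Illustrative only: the same visible name stored in two encodings.
name_utf8   <- "Min Temp (\u00b0C)"                          # marked UTF-8
name_latin1 <- iconv(name_utf8, from = "UTF-8", to = "latin1")  # marked latin1

Encoding(name_utf8)
Encoding(name_latin1)

# identical() compares the strings after translation, so it reports TRUE ...
same_string <- identical(name_utf8, name_latin1)

# ... while the underlying bytes differ (the degree sign is c2 b0 vs b0).
same_bytes <- identical(charToRaw(name_utf8), charToRaw(name_latin1))

same_string
same_bytes
```

Comparing Encoding(names(x)) and charToRaw() output for each element of df2$result is one way to look for this kind of difference.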

@hadley hadley added this to the v1.0.0 milestone Aug 29, 2019
@jennybc
Member

jennybc commented Aug 29, 2019

@paleolimbot I am making some progress on this (I can reproduce it). I'm curious: what version of vctrs do you have? I suspect this thread eventually leads there.

@paleolimbot
Member Author

0.2.0, I believe!

vctrs * 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)

@paleolimbot
Member Author

I also have an email from BDR about a "byte order mark" (U+FEFF) appearing in my source files, and I wonder if it also appears in the files I'm reading from Environment Canada and has somehow made it into the column name vector.
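
A minimal sketch of how the byte order mark hypothesis above could be checked, using only base R. The helper name code_points() and the example names are hypothetical; nothing here is taken from the actual Environment Canada files.

```r
# Hypothetical helper: Unicode code points for each name, which makes
# invisible characters such as U+FEFF show up as numbers.
code_points <- function(x) lapply(enc2utf8(x), utf8ToInt)

# Made-up column names; the first one starts with a byte order mark.
nms <- c("\ufeffDate/Time", "Max Temp Flag")

bom <- 0xFEFF
has_bom <- vapply(code_points(nms), function(cp) bom %in% cp, logical(1))
has_bom

# Strip the mark wherever it occurs.
clean <- gsub("\ufeff", "", nms, fixed = TRUE)
nchar(nms) - nchar(clean)
```

Applied to the real data, the same check would be run on names(x) for each element of df2$result.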

@jennybc
Member

jennybc commented Aug 30, 2019

This is a very strange object you have @paleolimbot. But in any case, I think it's an issue for vctrs (r-lib/vctrs#553), not tidyr.

@jennybc jennybc closed this as completed Aug 30, 2019