Offer backwards compatibility to < 1.2.0 in name repair #546

HughParsonage · 2019-02-02T00:46:39Z

Since the introduction of .name_repair changed the output of read_excel(), and since the old names are quite difficult to reproduce, it would be nice to offer a package option, say getOption("readxl.old_repair_names"), to provide backwards compatibility. Otherwise, the only alternative appears to be a hack using assignInNamespace(). That is, instead of

```r
assignInNamespace(
  ns = "readxl",
  x = "set_readxl_names",
  value = function(l, .name_repair = "unique") {
    tibble_version <- utils::packageVersion("tibble")
    # FALSE && ... disables the new branch, forcing the legacy repair below
    if (FALSE && tibble_version > "1.4.2") {
      if (is.null(.name_repair)) {
        return(tibble::as_tibble(l))
      } else {
        return(tibble::as_tibble(l, .name_repair = .name_repair))
      }
    }
    tibble::repair_names(tibble::as_tibble(l, validate = FALSE),
      prefix = "X", sep = "__")
  }
)
```

Just set options("readxl.old_repair_names" = TRUE) and then modify set_readxl_names() like this:

```r
set_readxl_names <- function(l, .name_repair = "unique") {
  tibble_version <- utils::packageVersion("tibble")
  # Use tibble's new name repair unless the proposed option requests the old behaviour
  if (!getOption("readxl.old_repair_names", FALSE) && tibble_version > "1.4.2") {
    if (is.null(.name_repair)) {
      return(tibble::as_tibble(l))
    } else {
      return(tibble::as_tibble(l, .name_repair = .name_repair))
    }
  }
  # Legacy (< 1.2.0) repair: X__1, X__2, ... for blanks, "__" suffixes for duplicates
  tibble::repair_names(tibble::as_tibble(l, validate = FALSE),
    prefix = "X", sep = "__")
}
```


raredd · 2019-02-04T16:08:59Z

This update has broken so much code. How could this be overlooked?

jennybc · 2019-02-13T06:42:56Z

@HughParsonage

If you want the old style of name repair with the current CRAN versions of both readxl and tibble, I suggest that you define a function like legacy_repair(). Below I inline the logic from tibble::repair_names(). Then you can pass it as the .name_repair argument.

```r
library(readxl)

legacy_repair <- function(nms, prefix = "X", sep = "__") {
  if (length(nms) == 0) return(character())
  blank <- nms == ""
  nms[!blank] <- make.unique(nms[!blank], sep = sep)
  new_nms <- setdiff(paste(prefix, seq_along(nms), sep = sep), nms)
  nms[blank] <- new_nms[seq_len(sum(blank))]
  nms
}

legacy_repair(rep_len("", 3))
#> [1] "X__1" "X__2" "X__3"
legacy_repair(c("x", "", "x"))
#> [1] "x"    "X__1" "x__1"

readxl_test_sheet <- "~/rrr/readxl/tests/testthat/sheets/names-need-repair-xlsx.xlsx"

read_excel(readxl_test_sheet)
#> New names:
#> * `a b` -> `a b..1`
#> * `a b` -> `a b..2`
#> * `` -> `..3`
#> # A tibble: 2 x 4
#>   `a b..1` `a b..2` ..3   `c%&$`
#>      <dbl> <chr>    <chr>  <dbl>
#> 1        1 a        one      1.1
#> 2        2 b        two      2.2

read_excel(readxl_test_sheet, .name_repair = legacy_repair)
#> # A tibble: 2 x 4
#>   `a b` `a b__1` X__1  `c%&$`
#>   <dbl> <chr>    <chr>  <dbl>
#> 1     1 a        one      1.1
#> 2     2 b        two      2.2
```
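A few edge cases make the order of operations in legacy_repair() clearer: non-blank names are de-duplicated with make.unique() first, and only then are the X__n fillers chosen with setdiff(), so a filler never collides with a name that is already present. This is a sketch for illustration only; the function is copied from the comment above so the block runs on its own, and the outputs are what base R's make.unique() and setdiff() give for these inputs.

```r
# legacy_repair() as defined above, repeated so this block is self-contained
legacy_repair <- function(nms, prefix = "X", sep = "__") {
  if (length(nms) == 0) return(character())
  blank <- nms == ""
  nms[!blank] <- make.unique(nms[!blank], sep = sep)
  new_nms <- setdiff(paste(prefix, seq_along(nms), sep = sep), nms)
  nms[blank] <- new_nms[seq_len(sum(blank))]
  nms
}

# "X__1" is already taken, so the blank name is filled with "X__2"
legacy_repair(c("X__1", ""))
#> [1] "X__1" "X__2"

# De-duplication runs first and creates "X__1", which the filler then skips
legacy_repair(c("X", "X", ""))
#> [1] "X"    "X__1" "X__2"

# Repeated duplicates get increasing "__n" suffixes
legacy_repair(c("a", "a", "a"))
#> [1] "a"    "a__1" "a__2"
```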

Or you could decline name repair and use tibble::repair_names() after the fact.

```r
library(readxl)

readxl_test_sheet <- "~/rrr/readxl/tests/testthat/sheets/names-need-repair-xlsx.xlsx"

x <- read_excel(readxl_test_sheet, .name_repair = "minimal")
x
#> # A tibble: 2 x 4
#>   `a b` `a b` ``    `c%&$`
#>   <dbl> <chr> <chr>  <dbl>
#> 1     1 a     one      1.1
#> 2     2 b     two      2.2

tibble::repair_names(
  tibble::as_tibble(x, validate = FALSE), prefix = "X", sep = "__"
)
#> # A tibble: 2 x 4
#>   `a b` `a b__1` X__1  `c%&$`
#>   <dbl> <chr>    <chr>  <dbl>
#> 1     1 a        one      1.1
#> 2     2 b        two      2.2
```
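If you would rather not rely on tibble::repair_names() for the after-the-fact step, a rough base R equivalent is to assign the output of a legacy_repair()-style function (as defined earlier in this thread) to names(). This is a sketch under assumptions: a plain data.frame with unrepaired names stands in for the result of read_excel(..., .name_repair = "minimal"), so the block runs without readxl or the test sheet.

```r
# legacy_repair() as defined earlier in the thread, repeated so this block is self-contained
legacy_repair <- function(nms, prefix = "X", sep = "__") {
  if (length(nms) == 0) return(character())
  blank <- nms == ""
  nms[!blank] <- make.unique(nms[!blank], sep = sep)
  new_nms <- setdiff(paste(prefix, seq_along(nms), sep = sep), nms)
  nms[blank] <- new_nms[seq_len(sum(blank))]
  nms
}

# Stand-in for a sheet read with .name_repair = "minimal":
# a duplicated "a b" column and a blank column name
x <- data.frame(1:2, c("a", "b"), c("one", "two"), c(1.1, 2.2))
names(x) <- c("a b", "a b", "", "c%&$")

# Repair the names after the fact
names(x) <- legacy_repair(names(x))
names(x)
#> [1] "a b"    "a b__1" "X__1"   "c%&$"
```

The resulting names match the tibble::repair_names() output shown above; only the column names change, the data are untouched.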

Or maybe stay on an earlier version of readxl or tibble until the time is right to take control of variable names? This should not affect sheet reads where the names are well-formed to begin with.

We really want to reduce the inconsistencies across tidyverse packages re: how column names are repaired. I know this requires some adjustments for people, but this was a considered move. It was also pre-announced (see comment I'm about to write in response to @raredd).

jennybc · 2019-02-13T06:50:01Z

@raredd

This is not an oversight. readxl has been downloaded almost 600K times since its December 2018 release, from RStudio CRAN mirrors alone. This is the very first issue opened about name repair, about 6 weeks later. We go through complete revdep checks for readxl and for tibble and we search & analyze non-package code on GitHub before we make such changes. Changes to column name repair were pre-announced when v1.1.0 was released in April of 2018. There are costs to changing and to not changing and we think making this change is a net improvement to the ecosystem.

HughParsonage · 2019-02-13T07:21:25Z

Thank you very much @jennybc -- that solution works for the sheets I had had trouble with.

(Apologies for the lack of reprex and for seeming ungrateful -- I'm truly spoiled by this package.)

jennybc · 2019-02-13T07:34:34Z

No worries @HughParsonage, I know this is a disruption and the average level of pain is not indicative of what any specific user experiences.

raredd · 2019-02-14T15:55:26Z

@jennybc There are already four built-in name repair options, so why not simply add a 5th that was the standard for as long as readxl has existed?
Yes, it has been almost three entire months since Dec 2018, but only recently have I had to go back and update several projects that predate Dec 2018, all of which have thrown errors about variable names. If I am the only one, I will send you a check.
As a package developer, you know the vast majority of users will never even look at the other name repair options, much less use them; yet you support those three plus the new default, and you already have a solution for legacy code. I honestly do not care what the default is (I'm sure you only have the best names), but adding a 5th name repair option that hundreds of thousands of users have relied on since readxl's first release seems like the obvious step.


jennybc closed this as completed Feb 13, 2019

hadley mentioned this issue May 28, 2019

Deprecated unnest() calls need to use old name repair tidyverse/tidyr#641 (Closed)

lionel- added a commit to lionel-/vctrs that referenced this issue Jun 19, 2019

Add vec_as_names_legacy() (b3b1570)

Addresses tidyverse/readxl#546 Addresses tidyverse/tidyr#641 Closes r-lib#359

lionel- mentioned this issue Jun 19, 2019

Add vec_as_names_legacy() r-lib/vctrs#430 (Merged)

jennybc mentioned this issue Jun 19, 2019

Note the existence of vctrs::vec_as_names_legacy() #575 (Closed)

lionel- added a commit to lionel-/vctrs that referenced this issue Jun 20, 2019

Add vec_as_names_legacy() (fe827e1)

Addresses tidyverse/readxl#546 Addresses tidyverse/tidyr#641 Closes r-lib#359

lock bot locked and limited conversation to collaborators Feb 14, 2020
