New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returned tibble has NAs in colnames #199

Closed
t-kalinowski opened this Issue Aug 29, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@t-kalinowski

t-kalinowski commented Aug 29, 2016

particularly problematic if dplyr (and tibble) is attached, because of this error

library(dplyr)
path <- "2016-07-27_Site Wide Transducer Data_ID.xlsm"
readxl::read_excel(path = path, sheet = "MW-Morris1A" ) 
Error in x[needs_ticks] <- paste0("`", gsub("`", "\\\\`", x[needs_ticks]),  : 
  NAs are not allowed in subscripted assignments

with a fresh session

names(readxl::read_excel(path = path, sheet = "MW-Morris1A" ))
 [1] "Well IDs"                                 "Elapsed Time"                             "SN#: 417685                             " ""                                         "SN#: 417685                             "
 [6] ""                                         "SN#: 417685"                              "14.99"                                    "From Barotroll"                           "114-P2B1-MW101S"                         
[11] NA                                         NA                                         NA                                         ""                                         ""                                        
[16] ""
@LiNk-NY

This comment has been minimized.

LiNk-NY commented Nov 4, 2016

I would change the issue title to read_excel returns NA instead of "" colnames of tbl_df

@jennybc

This comment has been minimized.

Member

jennybc commented Nov 5, 2016

I can reproduce this, without dplyr explicitly loaded.

Even then, though, read_excel() succeeds. It's just the printing that fails due to names that need backticks. I assume that's true above as well. A workaround you could use until there's a better solution is to immediately work on the names. The simplest thing is to run them through make.names() and then at least you can start printing the object.

Here's an example w/ a horrific sheet that imports with very disordered names.

library(readxl)
x <- read_excel("~/rrr/gapminder/data-raw/xls/gapdata001-1.xlsx")
#> Warning in read_xlsx_(path, sheet, col_names = col_names, col_types =
#> col_types, : [150, 49]: expecting numeric: got 'see e21: t3 koll for
#> details on problems after 1993.'
x
#> Error in x[needs_ticks] <- paste0("`", gsub("`", "\\\\`", x[needs_ticks]), : NAs are not allowed in subscripted assignments

x2 <- x
names(x2) <- make.names(names(x2))
x2
#> # A tibble: 259 × 276
#>                     Area   NA.
#>                    <chr> <dbl>
#> 1               Abkhazia    NA
#> 2            Afghanistan    NA
#> 3  Akrotiri and Dhekelia    NA
#> 4                Albania    NA
#> 5                Algeria    NA
#> 6         American Samoa    NA
#> 7                Andorra    NA
#> 8                 Angola    NA
#> 9               Anguilla    NA
#> 10   Antigua and Barbuda    NA
#> # ... with 249 more rows, and 274 more variables:
#> #   X1800...type.of.estimate <chr>,
#> #   Ad.hoc.adjustment.of.growth..the.growth.were.adjusted.to.reconcile.the.level.at.a.certain.starting.year.with.the.level.in.2005..The.level.of.the.starting.year.were.assigned.a.certain.level..and.the.subsequent.growth.were.adjusted.accoringly. <chr>,
#> #   NA. <dbl>,
#> #   Years.for.which.we.use.the.average.level.of.the.Maddison.country.groups <chr>,
#> #   Maddison.country.group...For.some.countries..for.some.years..we.used.the.average.for.the.relevant.Maddison.aggregated.county.group..This.data.was.then.adjusted.with.Gapminder.model.2.and.then.multiplied.with.an.arbitrary..spreadout.factor.. <chr>,
I'LL SPARE YOU THE REST
@t-kalinowski

This comment has been minimized.

t-kalinowski commented Nov 8, 2016

Yup, this still bugs me 😀 .

This is how i workaround it these days:

fix_names <- function(x, unique = TRUE) setNames(x, make.names(names(x), unique = unique))

actually, because the strings of .. caused by whitespace annoy me, I often convert them to _.

fix_names <- function(x, unique = TRUE) 
  setNames(x, make.names(names(x)) %>%
             gsub("\\.+", "_", .) %>%
             make.names(unique = unique))

This, along with issue #82 has me avoiding attachingreadxl in my analysis scripts, instead defining my own local version with these workarounds...

read_excel <-  function(..., col_names = FALSE, ignore_warnings = FALSE) {
  quiet_read <- purrr::quietly(readxl::read_excel)
  out <- quiet_read(..., col_names = col_names)
  if (ignore_warnings) 
    out[["warnings"]] <- character()
  if (length(c(out[["warnings"]], out[["messages"]])) == 0)
    return(fix_names(out[["result"]]))
  else readxl::read_excel(..., col_names = col_names)
}
@LiNk-NY

This comment has been minimized.

LiNk-NY commented Nov 8, 2016

@jennybc Thanks for the help!
make.names and make.unique should be used sparingly (although tidy data shouldn't require them!).
The tidyverse likes to keep names as is and I think this is the right way to go. There should be a warning for any names that are empty and converted via make.names.

@jennybc

This comment has been minimized.

Member

jennybc commented Nov 8, 2016

Yes, I agree that readxl should handle such names and handle them like other pkgs in the tidyverse.

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Nov 8, 2016

I agree that there should be as little munging of names as possible. To solve the tibble printing issues, missing names should be left as "".

However, I would prefer readxl not become more chatty, or become more "helpful" by calling make.names or some variant of it automatically. I posted my workaround in the hope that it would be helpful to you, not as a suggested change for readxl.

@jennybc jennybc changed the title from Returned object has NA instead of "" names to Returned tibble has NAs in colnames Jan 6, 2017

@jennybc jennybc closed this in 387161d Feb 6, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment