Empty column names need to be given unique names #364

jabranham · 2016-02-14T16:06:05Z

I used readr::read_csv() to import a csv file into R, which resulted in no errors. Later, when trying to run lm() on the data, I got the error:

Error in terms.formula(formula, data = data) : 
  attempt to use zero-length variable name

I found this question/answer on Stack Overflow, which seems to indicate that other people are having the same issue.

http://stackoverflow.com/questions/31385976/error-attempt-to-use-zero-length-variable-name


hadley · 2016-03-02T01:45:04Z

Could you please provide a minimal reproducible example (preferably using a small made up dataset)?

jabranham · 2016-03-03T21:02:31Z

Yes, this is so simple that I feel like I'm just making some sort of silly mistake. But here is code that replicates it for me.

set.seed(2904)

thedata <- data.frame(
  x = rnorm(100),
  y = rnorm(100, 3, 270),
  groups = rep(1:5, times = 20))  # 100 values, spelled out instead of relying on recycling

write.csv(thedata, "examplecsv.csv")

thedata2 <- readr::read_csv("examplecsv.csv")

lm(y ~ x * groups, data = thedata2)

And the results of sessionInfo, if you want it:

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.2.3 readr_0.2.2    tools_3.2.3    Rcpp_0.12.3

jennybc · 2016-03-03T21:24:27Z

The write.csv() command puts row names into the file and readr::read_csv() brings them back in as a variable with the empty string as a name. Which apparently lm() doesn't like, even though the model has nothing to do with that variable.

You could prevent the writing of row names via write.csv(thedata, "examplecsv.csv", row.names = FALSE) or prevent the reading of them via readr::read_csv("examplecsv.csv", col_types = "_nni"). Or drop them after you read the data in, but before you fit the model: thedata2[names(thedata2) != ""].

Which, I now see, is pretty much what the SO answers say.

Look for a variable named "" in your data and either drop it or rename it.
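A minimal sketch of those options, using only base R so it runs without readr. The small data frame and the `imported` object (built by hand to stand in for what read_csv() returned at the time) are assumptions for illustration, not part of the original report.

```r
# Small stand-in for the data in the report (hypothetical values).
thedata <- data.frame(x = c(0.1, 0.2, 0.3), y = c(1, 2, 3))

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

# Default: row names are written as a first column with an empty header.
write.csv(thedata, with_rownames)
# Option 1: do not write the row names at all.
write.csv(thedata, without_rownames, row.names = FALSE)

header_with <- readLines(with_rownames, n = 1)
header_without <- readLines(without_rownames, n = 1)
print(header_with)
print(header_without)

# Hand-built stand-in for an imported data frame whose first column
# has the empty string as its name.
imported <- data.frame(1:3, thedata)
names(imported)[1] <- ""

# Option 2: drop the empty-named column before modelling.
cleaned <- imported[names(imported) != ""]

# Option 3: keep the column but give it a real name ("row_id" is arbitrary).
renamed <- imported
names(renamed)[names(renamed) == ""] <- "row_id"
```

Either `cleaned` or `renamed` can then be passed to lm() without a zero-length variable name in the data.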

UPDATE: Hey, I recognize you from Austin!

jabranham · 2016-03-04T15:43:51Z

Ah, that makes sense. Might it make sense for read_csv to name the rownames something if they aren't named (like row_names)? When I use utils::read.csv() it names them X.
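For comparison, a small base R sketch of where that X comes from: with its default check.names = TRUE, utils::read.csv() passes the header through make.names(), which turns the empty string into "X". The tiny data frame here is made up for illustration.

```r
# Hypothetical three-row data set, written with the default row names.
thedata <- data.frame(x = c(0.1, 0.2, 0.3), y = c(1, 2, 3))
path <- tempfile(fileext = ".csv")
write.csv(thedata, path)

# read.csv() repairs the empty header because check.names = TRUE by default.
via_base <- utils::read.csv(path)
print(names(via_base))

# The repair itself is done by make.names().
print(make.names(""))
```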

And why in the world does lm complain about variables that aren't in the model? That's just weird.

Yup, I was at your talk in Austin. I've taught a few people git and show them your "burn it all down" slides.

Update: This is somewhat related to tidyverse/dplyr#1576

jennybc · 2016-03-04T16:26:12Z

I think @hadley might refine this particular point of the readr philosophy: "Column names are left as is". Maybe an exception will be made to populate missing names? And maybe the empty-string-as-variable-name would get the same treatment.

hadley · 2016-03-04T16:27:50Z

Yes, this will definitely get fixed - missing/empty col names need some repair because they cause so many downstream problems, and it's never helpful to maintain the missingness.

Will probably adopt some convention like _missing_1, _missing_2 for missing names.

hadley · 2016-06-02T10:36:10Z

We now have:

read_csv(",,\n1,2,3")
#> Source: local data frame [1 x 3]
#> 
#>      X1    X2    X3
#>   <int> <int> <int>
#> 1     1     2     3

But this probably needs an explicit warning.

hadley · 2016-07-13T15:37:14Z

It doesn't seem useful to generate an invalid data frame, so now both missing and duplicated column names get an automatic fix and a warning:

x1 <- read_csv(",,\n1,2,3")
#> Warning: Missing column names filled in: 'X1' [1], 'X2' [2], 'X3' [3]
x2 <- read_csv("x,x,x\n1,2,3")
#> Warning: Duplicated column names deduplicated: 'x' => 'x_1' [2], 'x' =>
#> 'x_2' [3]
x3 <- read_csv("X2,\n1,2")
#> Warning: Missing column names filled in: 'X2' [2]
#> Warning: Duplicated column names deduplicated: 'X2' => 'X2_1' [2]
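The naming scheme visible in those warnings can be sketched in base R as follows. This is only an illustration of the convention (fill empty names with "X" plus the column position, then suffix later duplicates with _1, _2, ...), not readr's actual implementation; `repair_col_names` is a hypothetical helper name.

```r
# Hypothetical helper: mimics the naming scheme shown in the warnings above.
repair_col_names <- function(nms) {
  nms <- as.character(nms)

  # Step 1: fill missing or empty names with "X" + column position.
  missing <- is.na(nms) | nms == ""
  nms[missing] <- paste0("X", which(missing))

  # Step 2: number the occurrences of each name; the first occurrence
  # keeps its name, later ones get "_1", "_2", ...
  occurrence <- ave(seq_along(nms), nms, FUN = seq_along)
  dup <- occurrence > 1
  nms[dup] <- paste0(nms[dup], "_", occurrence[dup] - 1)

  nms
}

print(repair_col_names(c("", "", "")))
print(repair_col_names(c("x", "x", "x")))
print(repair_col_names(c("X2", "")))
```

One limitation of this sketch: a generated name such as x_1 could still collide with a column that was already called x_1 in the file.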

cluoma · 2016-08-26T09:40:43Z

Is it possible to disable 'deduplicating' column names? This is undesired behaviour in my case since the csv is malformed such that the first and second rows together create unique column names.

hadley · 2016-08-26T12:33:29Z

Just use col_names = FALSE
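A rough sketch of how that could look for a file whose first two rows together form the header. The file contents, the "_" separator, and the variable names are assumptions made up for illustration; it requires the readr package.

```r
# Hypothetical malformed file: rows 1 and 2 together make up the header.
path <- tempfile(fileext = ".csv")
writeLines(c("a,a", "x,y", "1,2", "3,4"), path)

# col_names = FALSE treats every row as data, so nothing is deduplicated;
# the columns get placeholder names X1, X2, ...
raw <- suppressMessages(readr::read_csv(path, col_names = FALSE))

# Build the real header from the first two rows, then drop those rows.
header <- paste(unlist(raw[1, ]), unlist(raw[2, ]), sep = "_")
body <- raw[-(1:2), ]
names(body) <- header

# The header rows forced every column to character, so re-guess the types.
body <- suppressMessages(readr::type_convert(body))
print(names(body))
```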

mightypog · 2017-04-17T14:19:25Z

I did a completely low tech thing. I pulled the empty column out of the original spreadsheet and reloaded it. Not a coding solution, but it worked.

hadley changed the title ~~read_csv() later results in error attempt to use zero-length variable name~~ Empty column names need to be given unique names Jun 1, 2016

hadley added the feature (a feature request or enhancement) and ready labels Jun 1, 2016

ggranath mentioned this issue Jun 16, 2016

Empty column names need to be given unique names? tidyverse/readxl#182

Closed

hadley modified the milestone: 0.3.0 Jul 13, 2016

hadley self-assigned this Jul 13, 2016

hadley closed this as completed in 06330ff Jul 13, 2016

hadley removed the ready label Jul 13, 2016

tshynik mentioned this issue Jun 20, 2017

Empty column name causes errors after using spread() tidyverse/tidyr#314

Closed

lock bot locked and limited conversation to collaborators Sep 24, 2018
