Empty column names need to be given unique names #364
Could you please provide a minimal reproducible example (preferably using a small made up dataset)? |
Yes, this is so simple that I feel like I'm just making some sort of silly mistake. But here is code that replicates it for me.
set.seed(2904)
thedata <- data.frame(
x = rnorm(100),
y = rnorm(100, 3, 270),
groups = rep(1:5))
write.csv(thedata, "examplecsv.csv")
thedata2 <- readr::read_csv("examplecsv.csv")
lm(y ~ x * groups, data = thedata2)
And the results of sessionInfo, if you want it:
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.2.3 readr_0.2.2 tools_3.2.3 Rcpp_0.12.3 |
The You could prevent the writing of row names via Which, I now see, is pretty much what the SO answers say. Look for a variable named UPDATE: Hey, I recognize you from Austin! |
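For readers following along, here is a minimal sketch of the row-names point using base R only. It assumes the `row.names` argument of `write.csv()` is the option being discussed; the temporary file path and the `header_default` / `header_clean` names are made up for illustration.

```r
# Assumption: a temporary file stands in for "examplecsv.csv".
path <- tempfile(fileext = ".csv")

thedata <- data.frame(
  x = rnorm(100),
  y = rnorm(100, 3, 270),
  groups = rep(1:5, length.out = 100)
)

# Default: row names are written as a first column with an empty header.
write.csv(thedata, path)
header_default <- readLines(path, n = 1)

# row.names = FALSE: only the three real columns are written.
write.csv(thedata, path, row.names = FALSE)
header_clean <- readLines(path, n = 1)
```

Comparing the two header lines shows where the unnamed first column comes from: the default header starts with an empty quoted field, while the second one does not.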
Ah, that makes sense. Might it make sense for And why in the world does Yup, I was at your talk in Austin. I've taught a few people git and show them your "burn it all down" slides. Update: This is somewhat related to tidyverse/dplyr#1576 |
I think @hadley might refine this particular point of the |
Yes, this will definitely get fixed - missing/empty col names need some repair because they cause so many downstream problems, and it's never helpful to maintain the missingness. Will probably adopt some convention like |
We now have:
read_csv(",,\n1,2,3")
#> Source: local data frame [1 x 3]
#>
#> X1 X2 X3
#> <int> <int> <int>
#> 1 1 2 3
But this probably needs an explicit warning |
It doesn't seem useful to generate an invalid data frame, so now both missing and duplicated column names get an automatic fix and a warning:
x1 <- read_csv(",,\n1,2,3")
#> Warning: Missing column names filled in: 'X1' [1], 'X2' [2], 'X3' [3]
x2 <- read_csv("x,x,x\n1,2,3")
#> Warning: Duplicated column names deduplicated: 'x' => 'x_1' [2], 'x' =>
#> 'x_2' [3]
x3 <- read_csv("X2,\n1,2")
#> Warning: Missing column names filled in: 'X2' [2]
#> Warning: Duplicated column names deduplicated: 'X2' => 'X2_1' [2] |
Is it possible to disable 'deduplicating' column names? This is undesired behaviour in my case since the csv is malformed such that the first and second rows together create unique column names. |
Just use |
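For the two-header-row case, one possible workaround (not necessarily the one suggested above) is to read the file without a header and build the names by hand. This is a rough sketch in base R; the sample CSV text and the `_` separator are assumptions for illustration.

```r
# Hypothetical malformed CSV: two header rows that are only unique together.
csv_text <- "a,a,b\nx,y,x\n1,2,3\n4,5,6"

# Read everything with no header so no names are deduplicated.
raw <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Combine the first two rows into one name per column.
combined_names <- paste(unlist(raw[1, ]), unlist(raw[2, ]), sep = "_")

# Drop the two header rows, then convert the remaining columns
# from character to their natural types.
body <- raw[-(1:2), , drop = FALSE]
body[] <- lapply(body, type.convert, as.is = TRUE)
names(body) <- combined_names
rownames(body) <- NULL
```

The same idea should carry over to other readers that can skip header parsing, since the names are assembled after the data is already in memory.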
I did a completely low tech thing. I pulled the empty column out of the original spreadsheet and reloaded it. Not a coding solution, but it worked. |
I used readr::read_csv() to import a csv file into R, which resulted in no errors. Later, when trying to run lm() on the data, I got the error
I found this question/answer on Stack Overflow, which seems to indicate that other people are having the same issue.
http://stackoverflow.com/questions/31385976/error-attempt-to-use-zero-length-variable-name
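As a stopgap on versions that still return empty names, a check like the following could be run before modelling. It is a sketch only: the data frame is made up, and the `X<position>` placeholder pattern is an arbitrary choice that happens to mirror the names shown earlier in this thread.

```r
# Hypothetical data frame with an empty column name, similar to what a
# reader that keeps blank headers could produce.
df <- data.frame(a = 1:3, x = c(0.5, 1.5, 2.5), y = c(2, 4, 7))
names(df)[1] <- ""

# Positions of missing or zero-length names.
bad <- which(is.na(names(df)) | !nzchar(names(df)))

# Give those columns placeholder names based on their position.
names(df)[bad] <- paste0("X", bad)
```

After the rename every column has a usable name, so the data frame can be passed to functions that look variables up by name.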