Ignore missing/duplicate names if column is skipped #571
|
You can get the result you want by explicitly skipping that column. Here is one way, but there are some others, such as using
library(readr)
read_csv("X1,X2,\nhi,there,\n", col_types = "cc_")
#> Warning: Missing column names filled in: 'X3' [3]
#> # A tibble: 1 × 2
#> X1 X2
#> <chr> <chr>
#> 1 hi there |
|
Hm. It seems like skipping columns always occurs after all data has been read, which is why the warning makes sense if you know how The same thing is true for skipping columns in arbitrary positions if they don't have values at all, for example: This results in two warnings because first the missing column automatically gets renamed to Considering the behavior above, I was expecting to supply 3 column names - but this doesn't work and I only have to specify the names for the used columns. I'm really just starting to use |
|
I think this is a problem to do with automatically renaming columns that are then skipped. An option to skip consecutive delimiters seems dangerous to me.
library(readr)
read_csv("X1,\nhi", col_types = "c_")
#> Warning: Missing column names filled in: 'X2' [2]
#> Warning: 1 parsing failure.
#> row col expected actual
#> 1 -- 2 columns 1 columns
#> # A tibble: 1 × 1
#> X1
#> <chr>
#> 1 hi
read_csv("X2,\nhi", col_types = "c_")
#> Warning: Missing column names filled in: 'X2' [2]
#> Warning: Duplicated column names deduplicated: 'X2' => 'X2_1' [2]
#> Warning: 1 parsing failure.
#> row col expected actual
#> 1 -- 2 columns 1 columns
#> # A tibble: 1 × 1
#> X2
#> <chr>
#> 1 hi |
|
There is a bit of a chicken-and-egg problem here: standardising column types needs column names sorted out first, but if column names depend on skipped columns It can be done, I am sure, but will likely take some refactoring of |
|
FWIW I have the same problem in readxl. Also unsolved. We should talk/commiserate about this @jimhester, to harmonize the solutions as much as possible. |
|
I just stumbled over this issue again. I'm reading a CSV file with an extra delimiter at the end of each line (so Short example: Since I explicitly state which columns I want to load, the warning is a bit irritating. Would it be possible to not issue the warning if I haven't explicitly selected it? Otherwise, wrapping everything in Or maybe |
|
I've got the same problem with read_delim |
When I read a file with trailing delimiters, read_csv spits out a warning that a missing column name was filled in. Is there a way to tell the function that I want to read in all but the last (empty) column so that the warning message is not produced? I don't know how common such (malformed) CSV files are, but an option to ignore trailing delimiters might be useful. I tried to get it to work with the col_types argument, but it seems like all columns are read in at first. See also my question on StackOverflow.
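One possible workaround for trailing delimiters, sketched here with base R only and not taken from the thread, is to strip the extra comma from each line before parsing, so that no empty column exists to be named. The input lines and the names raw_lines, cleaned, and df are hypothetical, and the sketch assumes the last real field of a row is never legitimately empty.

```r
# Hypothetical input mirroring the thread's example: every line ends with an extra comma.
raw_lines <- c("X1,X2,", "hi,there,")

# Drop exactly one trailing comma per line.
# Assumption: the last real field is never empty, otherwise this would remove a real column.
cleaned <- sub(",$", "", raw_lines)

# Parse the cleaned text with base R; there is no empty trailing column left to name.
df <- read.csv(text = cleaned, stringsAsFactors = FALSE)

print(names(df))
print(df)
```

For a file on disk, the same idea would apply to the result of readLines() on that file. This avoids the warning by changing the input rather than the parser, so it does not address the underlying ordering of renaming and skipping discussed above.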