If col_types = NULL and the first 1000 rows of a column are all NA, read_csv guesses the column type as logical.
(As an aside, this seems to have changed in the last several months: I discovered it through old code that used to read these columns as character.)
This seems like a behavior that is prone to errors. If we only know that the first 1000 are NA, there is no reason to assume what comes later will be logical. The most robust handling of this situation would be to treat it as character so that there is no risk of parsing failures that coerce values to fit some other class.
Here's a test file (natest.csv.zip): basically just a CSV with 1200 rows of "NA", then a row with "TEST".
require(readr)
#> Loading required package: readr
read_csv("~/temp/natest.csv")
#> Parsed with column specification:
#> cols(
#>   numbers = col_double(),
#>   testcol = col_logical()
#> )
#> Warning: 1 parsing failure.
#> row  col     expected           actual file
#> 1290 testcol 1/0/T/F/TRUE/FALSE TEST   '~/temp/natest.csv'
#> # A tibble: 1,290 x 2
#>    numbers testcol
#>      <dbl> <lgl>
#>  1      1. NA
#>  2      2. NA
#>  3      3. NA
#>  4      4. NA
#>  5      5. NA
#>  6      6. NA
#>  7      7. NA
#>  8      8. NA
#>  9      9. NA
#> 10     10. NA
#> # ... with 1,280 more rows
jzadra changed the title from "read_csv column type guess for 1000 rows of "NA" is logical - should be character" to "read_csv column type guess for 1000 rows of "NA" is logical - should be character?" on May 2, 2018
This is a considered decision and, I think, should have been this way from the start. If you have no information to go on, the most R-like thing to do is to guess the missing data is logical. This is also critical for later vector-binding or coercion or row-binding, because you can upcast logical NAs but can't downcast character.
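The coercion argument can be checked in base R, without readr. The following is only a small sketch of the general rule; the variable names are made up for illustration:

```r
# An all-NA logical vector carries no type commitment: combining it with
# other values promotes it to whatever type those values need.
na_lgl <- c(NA, NA)
lgl_plus_dbl <- c(na_lgl, 1.5)     # promoted to numeric
lgl_plus_chr <- c(na_lgl, "TEST")  # promoted to character

# An all-NA character vector has already committed to character:
# combining it with a number turns the number into a string.
na_chr <- c(NA_character_, NA_character_)
chr_plus_dbl <- c(na_chr, 1.5)     # stays character; 1.5 becomes "1.5"

class(na_lgl)
class(lgl_plus_dbl)
class(lgl_plus_chr)
class(chr_plus_dbl)
```

So a logical guess for an all-NA column keeps every later option open, while a character guess forces everything bound to it to become character.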
Some places to read previous discussion re: why things are the way they are now:
If you know a column should be character, then it's best to express that outright. Or if you want to increase the number of rows used for guessing, you can increase guess_max. Also, cols() allows you to set your own default column type.
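As a rough sketch of those three options: the file below is a hypothetical stand-in for natest.csv, written to a temporary path, and the column names are taken from the reprex output above. What the default guess returns may differ between readr versions, so only the explicit variants are shown.

```r
library(readr)

# Hypothetical stand-in for natest.csv: 1200 rows of "NA", then one "TEST" row.
n <- 1201
path <- tempfile(fileext = ".csv")
writeLines(
  c("numbers,testcol",
    paste(seq_len(n), c(rep("NA", n - 1), "TEST"), sep = ",")),
  path
)

# Option 1: state the column type outright.
explicit <- read_csv(
  path,
  col_types = cols(numbers = col_double(), testcol = col_character())
)

# Option 2: raise guess_max so that guessing sees every row, including "TEST".
guessed <- read_csv(path, guess_max = n)

# Option 3: make character the default type for all columns.
all_chr <- read_csv(path, col_types = cols(.default = col_character()))

class(explicit$testcol)
class(guessed$testcol)
class(all_chr$numbers)
```

Option 1 is the most predictable, since it does not depend on guessing at all. Option 2 still guesses, just with more rows to go on.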
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
lockbot locked and limited conversation to collaborators on Oct 31, 2018
Created on 2018-05-02 by the reprex package (v0.2.0).