read_csv() throws warning for NaN, INF, -INF when col_type set to number #1225

Closed
peterdesmet opened this issue Jul 9, 2021 · 4 comments

@peterdesmet
Contributor

Consider a CSV file where a column contains the literal values NaN, INF or -INF (in any letter case).

  • When col_types is not specified, read_csv() reads the values correctly and guesses the column type as double
  • When col_types = "n", read_csv() throws a warning, sets the values to NA and sets the column type to double
  • When col_types = "d", read_csv() reads the values correctly and sets the column type to double

Is this by design?

Reprex:

library(readr)
file <- "https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv"
read_csv(file)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   num = col_double(),
#>   num_nan = col_double(),
#>   num_inf = col_double(),
#>   num_ninf = col_double()
#> )
#> # A tibble: 3 x 4
#>     num num_nan num_inf num_ninf
#>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1     3     NaN     Inf     -Inf
#> 2     3     NaN     Inf     -Inf
#> 3     3       3       3        3
read_csv(file, col_types = "nnnn")
#> Warning: 6 parsing failures.
#> row      col expected actual                                                                                                                                       file
#>   1 num_nan  a number   NaN  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   1 num_inf  a number   INF  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   1 num_ninf a number   -INF 'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   2 num_nan  a number   nan  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   2 num_inf  a number   inf  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#> ... ........ ........ ...... ..........................................................................................................................................
#> See problems(...) for more details.
#> # A tibble: 3 x 4
#>     num num_nan num_inf num_ninf
#>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1     3      NA      NA       NA
#> 2     3      NA      NA       NA
#> 3     3       3       3        3
read_csv(file, col_types = "dddd")
#> # A tibble: 3 x 4
#>     num num_nan num_inf num_ninf
#>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1     3     NaN     Inf     -Inf
#> 2     3     NaN     Inf     -Inf
#> 3     3       3       3        3
@jimhester
Collaborator

Yes, this is expected. col_types = "d" uses the native IEEE 754 double parser, which understands these special values, whereas col_types = "n" uses readr's own flexible number parser, which is intended for more human-generated numbers (e.g. those with thousands separators) and does not recognise them.
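The difference between the two parsers can be seen at the vector level with parse_double() and parse_number(), the counterparts of col_double() and col_number(). This is a small sketch; the inputs are the special values from the reprex above plus two "human" numbers chosen for illustration, and the behaviour of parse_number() on the special values is taken from this issue rather than checked against every readr version.

```r
library(readr)

# Special values as they appear in the CSV from the reprex above
special <- c("NaN", "INF", "-INF", "nan", "inf", "-inf")

# parse_double() is the vector counterpart of col_double() ("d"):
# it recognises the IEEE 754 special values.
parse_double(special)

# parse_number() is the vector counterpart of col_number() ("n"):
# it drops non-numeric characters such as "$" and grouping marks ...
parse_number(c("$100", "10,000"))

# ... but, as reported in this issue, it does not recognise the special
# values, so each one becomes NA and is recorded as a parsing failure.
bad <- parse_number(special)
problems(bad)
```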

@peterdesmet
Contributor Author

Thanks! Does this mean that a column containing a mix of human-generated numbers (such as $100 and 10,000) and special values (such as INF and NaN) cannot be parsed without throwing warnings?

@jimhester
Collaborator

Yeah, I would suggest importing these columns as character and then parsing them with a custom function after reading.
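A custom parser along those lines could look like the following minimal sketch. The function name parse_flexible_number() is hypothetical, and it assumes "$" is the only currency symbol and "," the only grouping mark; other locales would need a different cleaning step.

```r
# Hypothetical helper: parse a character vector that mixes "human" numbers
# (e.g. "$100", "10,000") with IEEE 754 special values (NaN, INF, -INF).
# Assumes "$" is the only currency symbol and "," the only grouping mark.
parse_flexible_number <- function(x) {
  key <- tolower(trimws(x))
  out <- rep(NA_real_, length(x))

  # 1. Map the special values explicitly, ignoring letter case.
  special <- c("nan" = NaN, "inf" = Inf, "+inf" = Inf, "-inf" = -Inf)
  is_special <- key %in% names(special)
  out[is_special] <- special[key[is_special]]

  # 2. Strip "$" and "," from everything else and convert with as.numeric().
  #    Genuinely unparseable values still become NA with a warning.
  rest <- !is_special & !is.na(x)
  out[rest] <- as.numeric(gsub("[$,]", "", key[rest]))

  out
}

parse_flexible_number(c("$100", "10,000", "INF", "-inf", "NaN", "3"))
# returns 100, 10000, Inf, -Inf, NaN, 3
```

With readr, the affected columns could first be read as text, e.g. `read_csv(file, col_types = cols(.default = col_character()))`, and the helper then applied to each of them. Matching the special values explicitly, rather than relying on as.numeric() to accept them, keeps the set of accepted spellings visible in one place, while still letting truly malformed values warn.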

@peterdesmet
Contributor Author

Thanks, I will switch between col_number() and col_double() depending on the use case.
