read_csv() throws warning for NaN, INF, -INF when col_type set to number #1225

peterdesmet · 2021-07-09T17:22:08Z

Consider a csv file where a column contains the literal values NaN, INF or -INF (case ignored).

When col_types is not set, read_csv() reads the values correctly and sets the column to double.
When col_types = "n", read_csv() throws a warning, sets the values to NA and sets the column to double.
When col_types = "d", read_csv() reads the values correctly and sets the column to double.

Is this by design?

Reprex:

library(readr)
file <- "https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv"
read_csv(file)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   num = col_double(),
#>   num_nan = col_double(),
#>   num_inf = col_double(),
#>   num_ninf = col_double()
#> )
#> # A tibble: 3 x 4
#>     num num_nan num_inf num_ninf
#>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1     3     NaN     Inf     -Inf
#> 2     3     NaN     Inf     -Inf
#> 3     3       3       3        3
read_csv(file, col_types = "nnnn")
#> Warning: 6 parsing failures.
#> row      col expected actual                                                                                                                                       file
#>   1 num_nan  a number   NaN  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   1 num_inf  a number   INF  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   1 num_ninf a number   -INF 'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   2 num_nan  a number   nan  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#>   2 num_inf  a number   inf  'https://gist.githubusercontent.com/peterdesmet/9525d4aaefc109230f162164e5d66e23/raw/fda84bf3b924d37b926eaf1b80b794ff0a119dca/numbers.csv'
#> ... ........ ........ ...... ..........................................................................................................................................
#> See problems(...) for more details.
#> # A tibble: 3 x 4
#>     num num_nan num_inf num_ninf
#>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1     3      NA      NA       NA
#> 2     3      NA      NA       NA
#> 3     3       3       3        3
read_csv(file, col_types = "dddd")
#> # A tibble: 3 x 4
#>     num num_nan num_inf num_ninf
#>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1     3     NaN     Inf     -Inf
#> 2     3     NaN     Inf     -Inf
#> 3     3       3       3        3


jimhester · 2021-07-09T17:46:57Z

Yes, this is expected. col_types = "d" uses a native IEEE 754 double parser, which understands these special values. col_types = "n" uses a custom, flexible number parser specific to readr, intended for more human-generated numbers (e.g. those with thousands separators), and it does not understand them.
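The difference between the two parsers can be seen without a file by calling readr's vector parsers directly. This is a minimal sketch, assuming parse_double() and parse_number() behave the same way as the "d" and "n" column types in the reprex above; the variable names are only for illustration.

```r
library(readr)

special <- c("NaN", "INF", "-INF")

# Double parser: recognises the IEEE 754 special values (case ignored)
as_double <- parse_double(special)

# Flexible number parser: per this thread, these values fail to parse and
# come back as NA; suppressWarnings() hides the parsing-failure warning
as_number <- suppressWarnings(parse_number(special))

# What the number parser is designed for: human-formatted values
human <- parse_number(c("$100", "10,000"))

print(as_double)
print(as_number)
print(human)
```

The trade-off is that parse_double() rejects values such as "$100", so neither parser alone covers a column that mixes both kinds of input.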

peterdesmet · 2021-07-09T17:54:37Z

Thanks! Does this mean that a column containing a mix of human-generated numbers ($100, 10,000) and special values (INF, NaN) cannot be parsed without throwing warnings?

jimhester · 2021-07-09T18:04:19Z

Yeah. I would suggest importing these columns as character and then parsing them with a custom function after reading.
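One possible shape for such a custom function is sketched below. The helper name parse_number_special() is hypothetical and not part of readr; it assumes the only special spellings to handle are NaN, INF, +INF and -INF (case ignored) and hands everything else to parse_number().

```r
library(readr)

# Hypothetical helper (not part of readr): parse human-formatted numbers,
# mapping NaN / INF / -INF spellings to their IEEE 754 values first.
parse_number_special <- function(x) {
  key <- toupper(trimws(x))
  special <- c("NAN" = NaN, "INF" = Inf, "+INF" = Inf, "-INF" = -Inf)
  is_special <- !is.na(key) & key %in% names(special)

  out <- rep(NA_real_, length(x))
  # Look up the special values by their normalised spelling
  out[is_special] <- unname(special[key[is_special]])
  # Everything else goes through readr's flexible number parser
  out[!is_special] <- parse_number(x[!is_special])
  out
}

raw <- c("$100", "10,000", "INF", "-inf", "NaN", NA)
parsed <- parse_number_special(raw)
print(parsed)
```

To use it with read_csv(), the affected columns would be read with col_character() (for example col_types = "cccc" for the file above) and then converted column by column with the helper.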

peterdesmet · 2021-07-12T12:13:56Z

Thanks, I will switch between col_number() and col_double() depending on the use case.

peterdesmet closed this as completed Jul 12, 2021

slodge mentioned this issue Aug 18, 2021

In v2, quoted NaN values force a column to character #1277

Closed

khusmann mentioned this issue Dec 7, 2023

type_convert() does not parse IEEE 754 double values (NaN, Inf, -Inf) #1526

Open
