write_csv: scientific notation cannot be disabled #671

dpprdan · 2017-05-10T15:04:26Z

write_csv() turns (some) longer numbers into scientific notation and there does not seem to be a way to disable it. This has been mentioned before in #229 and apparently was fixed then, so this might be a regression?

library("readr")
df <- data.frame(a = -0.0004029971, b = 0.0412975501857025)
print(df, digits = 17)
#>                         a                    b
#> 1 -0.00040299710000000002 0.041297550185702497
cat(format_csv(df))
#> a,b
#> -4.029971e-4,0.0412975501857025

The problem is I cannot use scientific notation (well) with the tools that import the csv.

Also compare this to the following, which IMHO should be equivalent:

format_csv(data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, 
  1)))
#> [1] "GEOID\n60150001022e3\n60150001022001\n60150001022002\n60150001022003\n60150001022004\n60150001022005\n"

I.e. GEOID\n60150001022e3 instead of GEOID\n60150001022000

jimhester · 2017-05-10T18:09:56Z

I don't think we will be changing this behavior in the near future (if ever). A workaround you can use is to format the columns before writing. See ?base::format for details on possible formatting arguments.

format_numeric <- function(x, ...) {
  numeric_cols <- vapply(x, is.numeric, logical(1))
  x[numeric_cols] <- lapply(x[numeric_cols], format, ...)
  x
}

library("readr")
df <- data.frame(a = -0.0004029971, b = 0.0412975501857025)
format_csv(format_numeric(df))
#> [1] "a,b\n-0.0004029971,0.04129755\n"
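
The shortened b in the output above comes from format() itself, which rounds to 7 significant digits by default (getOption("digits")). A minimal sketch in base R, passing digits explicitly so the result does not depend on session options; the variable names are only for illustration:

```r
x <- 0.0412975501857025

# 7 significant digits (the usual default of getOption("digits")) rounds the value
short <- format(x, digits = 7)
# 15 significant digits keeps every digit of the literal
long <- format(x, digits = 15)

print(short)  # "0.04129755"
print(long)   # "0.0412975501857025"
```

Because the extra arguments of the workaround are forwarded to format(), a call such as format_numeric(df, digits = 15) should have the same effect on every numeric column.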

dpprdan · 2017-05-11T09:01:02Z

Thanks! One general question though: Why default to a notation/formatting that, at least to me, seems to be less compatible with other tools? (Even more so when the file format, csv, is arguably one of the most interchangeable/compatible formats.) I guess this is all a matter of perspective; I would just like to understand your design choice.

One addition and one question regarding the format_numeric function: I guess one ought to add the scientific = FALSE argument to reliably disable scientific notation, irrespective of options(scipen).

format_numeric_jh <- function(x, ...) {
  numeric_cols <- vapply(x, is.numeric, logical(1))
  x[numeric_cols] <- lapply(x[numeric_cols], format, ...)
  x
}
format_numeric_dpd <- function(x, scientific = FALSE, ...) {
  numeric_cols <- vapply(x, is.numeric, logical(1))
  x[numeric_cols] <- lapply(x[numeric_cols], format, scientific = scientific, ...)
  x
}
df <- data.frame(a = -0.00004029971, b = 0.0412975501857025)
geoid_df <- data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, 1))
print(df, digits = 18)
#>                         a                    b
#> 1 -4.0299709999999997e-05 0.041297550185702497
library("readr")
format_csv(format_numeric_jh(df))
#> [1] "a,b\n-4.029971e-05,0.04129755\n"
format_csv(format_numeric_jh(geoid_df))
#> [1] "GEOID\n6.015e+13\n6.015e+13\n6.015e+13\n6.015e+13\n6.015e+13\n6.015e+13\n"
# ehm, no
format_csv(format_numeric_dpd(df))
#> [1] "a,b\n-0.00004029971,0.04129755\n"
format_csv(format_numeric_dpd(geoid_df))
#> [1] "GEOID\n60150001022000\n60150001022001\n60150001022002\n60150001022003\n60150001022004\n60150001022005\n"

But how can I reliably preserve precision without hard-coding it with digits or nsmall?
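
One possible approach, sketched in base R under stated assumptions: try 15, 16 and then 17 significant digits and keep the first fixed-notation string that parses back to the same double. The names format_roundtrip and format_numeric_exact are hypothetical and not part of readr, and the per-value loop is not tuned for large data.

```r
# Hypothetical helper (not part of readr): for each value, use the fewest
# significant digits (15, 16 or 17) whose fixed-notation string parses back
# to the same double.
format_roundtrip <- function(x) {
  vapply(x, function(v) {
    if (is.na(v)) return(NA_character_)
    s <- format(v, digits = 15, scientific = FALSE)
    for (d in 16:17) {
      # Stop as soon as the current string round-trips
      if (isTRUE(as.numeric(s) == v)) break
      s <- format(v, digits = d, scientific = FALSE)
    }
    s
  }, character(1), USE.NAMES = FALSE)
}

# Apply the helper to every numeric column of a data frame
format_numeric_exact <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], format_roundtrip)
  df
}

df <- data.frame(a = -0.00004029971, b = 0.0412975501857025,
                 GEOID = 60150001022000)
out <- format_numeric_exact(df)
print(out)
```

The resulting character columns can then be passed to format_csv() or write_csv() as in the examples above. Very small magnitudes (say 1e-300) would produce long strings of zeros in fixed notation, which is the price of avoiding the exponent.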

zeehio · 2017-05-17T13:57:37Z

While I am not the one to talk about design decisions, maybe I can help explain the integer limitations (this message may help you understand the problem or work around it with the bit64 package; it does not provide a specific solution):

In R, integers are stored as 32-bit numbers (a <- 3L). This limits you to a maximum of a <- 2147483647L (2^31-1). If you try a larger number, a <- 2147483648L, you will see the warning "non-integer value 2147483648L qualified with L; using numeric value" and the number will be stored as a double.
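
The 32-bit limit can be checked from the console. A minimal base R sketch; the variable names are only for illustration:

```r
# Largest value an R integer can hold (2^31 - 1)
int_max <- .Machine$integer.max
print(int_max)  # 2147483647

# Integer arithmetic past the limit overflows to NA (with a warning)
overflowed <- suppressWarnings(int_max + 1L)
print(overflowed)  # NA

# A literal without the L suffix is already a double, so it has no such limit
print(typeof(2147483648))  # "double"
```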

Using doubles, integers up to 2^53 (9007199254740992; see http://stackoverflow.com/a/1848953/446149) are stored without loss of precision. Above that, you may lose precision depending on the number's floating-point representation. For instance, 2^53+1 can't be represented as a double without loss of precision, but 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368 (close to 1.8E308) can.
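
The same kind of check works for the double limit. A minimal base R sketch, assuming standard IEEE 754 double arithmetic; the variable name limit is only for illustration:

```r
# 2^53 + 1 is not representable as a double, so it rounds back to 2^53
print(2^53 + 1 == 2^53)  # TRUE

# Below the limit, neighbouring integers are still distinct
print(2^53 - 1 == 2^53)  # FALSE

# Fixed notation shows every digit of 2^53
limit <- format(2^53, scientific = FALSE)
print(limit)  # "9007199254740992"

# Largest finite double, the value close to 1.8E308
print(.Machine$double.xmax)
```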

Another alternative, if you need to work with integers below 2^63-1, is to use the bit64 package, which works as expected. This would be a workaround for your problem.

x <- data.frame(a = bit64::as.integer64(60150001022000))
readr::write_csv(x, "/tmp/test.csv")
# cat /tmp/test.csv 
# a
# 60150001022000

Usually doubles (called numeric in R) are used to store very large or very small numbers up to some degree of precision. Using them to store integers is fine, as long as we are aware of the 2^53 limit, but printing those numbers is then much more complicated: printing libraries either need to be designed with the use case "the user wants to store an integer in a double" in mind, or we need to work around that.

dpprdan · 2017-05-19T07:12:25Z

Thanks @zeehio for the explanation. I was not aware of that (in that detail at least).
However, that does not explain why, when I try to store doubles like (a = -0.00004029971, b = 0.0412975501857025) as csv, I either get scientific notation (a = -4.029971e-05), lose precision (b = 0.04129755), or both, does it?
In any case, this remains true.

dpprdan mentioned this issue May 10, 2017

Loss of precision with csv_write #229

Closed

jimhester closed this as completed May 19, 2017

zeehio mentioned this issue May 20, 2017

Allow control of scientific notation in write_csv #679

Closed

rmcd1024 mentioned this issue May 10, 2018

write_csv treatment of "small" numerical values #845

Closed

lock bot locked and limited conversation to collaborators Sep 24, 2018
