Allow control of scientific notation in write_csv #679



@zeehio zeehio commented May 20, 2017

Issues #671 and #229 show that some users want to control how scientific notation is used when writing CSV files.

Currently readr (using the grisu3 C library) prints integers ending with more than two zeroes in scientific notation (see the linked line of code). This usually makes sense for readability, but it gives non-uniform results in cases like this:

cat(format_csv(data.frame(a = seq(from=1998, to=2002, 1))))

In cases like this, the output is more readable if scientific notation is avoided.
Storing the values as integers instead of doubles already prints them correctly, but some
users use doubles to store integers that exceed R's 32-bit integer range (see the linked issues).

This pull request adds an option to disable scientific notation for
positive integers. When the option is enabled, integers up to 10^15 are
printed without scientific notation. The 10^15 limit is reasonable, as
2^53+1 (approx. 9*10^15) is the first integer that can't be represented as a double without losing precision.

cat(format_csv(data.frame(a = seq(from=1998, to=2002, 1)), int_use_scientific = FALSE))
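The reasoning behind the 10^15 cutoff can be checked in base R. This is only a small illustration of double precision, independent of readr and of this PR's code:

```r
# Doubles (IEEE 754 binary64) have a 53-bit significand, so every integer
# up to 2^53 is exactly representable.
limit <- 2^53

# 2^53 + 1 cannot be stored exactly and rounds back to 2^53.
limit + 1 == limit          # TRUE

# Just below the limit, neighbouring integers are still distinct.
limit - 1 == limit          # FALSE

# The proposed 10^15 cutoff sits below 2^53.
1e15 < limit                # TRUE

# R's native integer type is 32-bit, which is why larger whole numbers
# end up stored as doubles.
.Machine$integer.max        # 2147483647
```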

Another loosely related issue is that write_csv does not offer a
scipen-like argument, which some users want because other CSV import tools don't
cope well with scientific notation
(see the linked Stack Overflow question).

This PR also addresses this issue by offering a scipen argument. Currently the choice
between fixed point notation and scientific notation is a tradeoff
between the length of the number and its readability. With positive values of scipen
we bias the decision towards fixed point format (penalizing scientific notation), while
with negative values we bias it towards scientific notation. There are no absolutes:
this PR does not allow scientific notation to be fully disabled. If it did, we could end up printing
very long numbers (>300 digits) that are slow to parse.
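To give a sense of the lengths involved, here is a quick base R check of how many characters the largest finite double needs in fixed notation. It uses `sprintf()` only for illustration and is not how readr formats numbers:

```r
# Largest finite double, roughly 1.797693e308.
x <- .Machine$double.xmax

# Render it in fixed notation with no decimals and count the characters.
fixed_text <- sprintf("%.0f", x)
n_fixed <- nchar(fixed_text)

# The scientific rendering is much shorter.
sci_text <- sprintf("%e", x)
n_sci <- nchar(sci_text)

cat("fixed:", n_fixed, "characters; scientific:", n_sci, "characters\n")
```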

df <- data.frame(a = c(1E-1, 1E-2, 1E-3, 1E-4))
format_csv(df, scipen = -3)
#[1] "a\n1e-1\n1e-2\n1e-3\n1e-4\n"
format_csv(df, scipen = -2)
#[1] "a\n0.1\n1e-2\n1e-3\n1e-4\n"
format_csv(df, scipen = -1)
#[1] "a\n0.1\n0.01\n1e-3\n1e-4\n"
format_csv(df, scipen = 0)
#[1] "a\n0.1\n0.01\n0.001\n1e-4\n"
format_csv(df, scipen = 1)
#[1] "a\n0.1\n0.01\n0.001\n0.0001\n"
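The output above is consistent with a simple width comparison between the two renderings. The following is a rough sketch in plain R, not the PR's actual implementation: `choose_notation()` and its `slack` offset are hypothetical, and `slack = 1` was picked only because it reproduces the results listed above.

```r
# Hypothetical helper, not readr code: pick one of two candidate renderings
# of the same number, given as strings.
choose_notation <- function(fixed, sci, scipen = 0, slack = 1) {
  # Keep fixed notation unless it is more than `scipen + slack` characters
  # wider than the scientific rendering.
  if (nchar(fixed) <= nchar(sci) + scipen + slack) fixed else sci
}

fixed <- c("0.1", "0.01", "0.001", "0.0001")
sci   <- c("1e-1", "1e-2", "1e-3", "1e-4")

# Print one line per scipen value used in the example above.
for (scipen in c(-3, -2, -1, 0, 1)) {
  chosen <- mapply(choose_notation, fixed, sci,
                   MoreArgs = list(scipen = scipen))
  cat(sprintf("scipen = %2d: %s\n", scipen, paste(chosen, collapse = " ")))
}
```

Larger values of scipen widen the allowance for fixed notation, and negative values shrink it, which matches the bias described above.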

Feel free to review and merge if you feel these features are worth it.


Currently I use the hack from this post to work around the issue.

I still hope this pull request will be accepted.

@zeehio zeehio force-pushed the write_csv_allow_long_int_without_scientific branch from 48e67f3 to e4633fa Compare October 24, 2017 12:34

Thank you for the pull request. I am not sure we want to go down the route of having a number of customization options for writing; part of the idea is to use standardized formatting.

I think we should simply turn off scientific formatting of integers entirely.

@zeehio zeehio closed this Dec 16, 2017
@zeehio zeehio force-pushed the write_csv_allow_long_int_without_scientific branch from e4633fa to 5415cc1 Compare December 16, 2017 12:04

zeehio commented Dec 16, 2017

Thanks for the review. I don't know what I did, but I closed the pull request by accident.

Anyway, I will resubmit a pull request to simply turn off scientific formatting of integers.
