write_csv: Disable scientific notation for integers #765




@zeehio zeehio commented Dec 16, 2017

write_csv does not handle large integer values properly: they are written in scientific notation, which loses precision.

This PR is a follow-up to #679, much simplified and hopefully closer to what @jimhester has in mind 😃

Thanks for your time reviewing it :-)

@zeehio zeehio force-pushed the zeehio:write_csv_allow_long_int_without_scientific branch Dec 16, 2017
@zeehio zeehio force-pushed the zeehio:write_csv_allow_long_int_without_scientific branch to f032ab7 Dec 17, 2017

@simecek simecek commented Jan 6, 2018

I have run into a similar problem in a Kaggle competition:

The input file train_2.csv contains Wikipedia traffic data, i.e. how many times a given page (rows) was accessed on a given date (columns). read_csv warned me that row 37603 was not parsed properly. This happened because of the value "1e+05", which parse_integer fails to parse, as demonstrated below:

> parse_integer(c("1e05", "2", "3"))
Warning: 1 parsing failure.
row # A tibble: 1 x 4 col     row   col               expected actual expected   <int> <int>                  <chr>  <chr> actual 1     1    NA no trailing characters    e05
[1] NA  2  3
# A tibble: 1 x 4
    row   col               expected actual
  <int> <int>                  <chr>  <chr>
1     1    NA no trailing characters    e05 
> parse_double(c("1e05", "2", "3"))
[1] 1e+05 2e+00 3e+00

I am not sure how the train_2.csv file was created, or whether write_csv is to blame. However, something similar is to be expected for any large file of integers purely by chance.

Being unable to write data with write_csv and read it back with read_csv (without specifying column types) might be challenging for beginners. I would suggest either accepting this pull request or modifying parse_integer to accept scientific notation.
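
For illustration, here is a minimal base R sketch of both sides of the round trip. It is not readr code and not what this PR implements: `parse_integer_sci` and `format_plain` are hypothetical helper names made up for this example. The reading helper assumes that going through a double and rejecting non-whole or out-of-range values is acceptable; the writing helper assumes the values are whole numbers.

```r
# Hypothetical helper (not part of readr): parse integer-valued strings,
# tolerating scientific notation such as "1e05".
parse_integer_sci <- function(x) {
  # Go through double first; strings that are not numbers become NA.
  d <- suppressWarnings(as.numeric(x))
  # Keep only whole numbers that fit in R's 32-bit integer type.
  ok <- !is.na(d) & d == trunc(d) & abs(d) <= .Machine$integer.max
  out <- rep(NA_integer_, length(x))
  out[ok] <- as.integer(d[ok])
  out
}

# Hypothetical helper (not part of readr): format whole numbers without
# an exponent, so the text can be read back as an integer.
format_plain <- function(x) {
  format(x, scientific = FALSE, trim = TRUE)
}

parsed  <- parse_integer_sci(c("1e05", "2", "3"))  # integer vector: 100000 2 3
written <- format_plain(c(1e5, 2, 3))              # "100000" "2" "3"
```

Either side alone would be enough to make the round trip work for a column like this one; the sketch shows both only to make the two suggestions above concrete.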

Contributor Author

@zeehio zeehio commented Sep 30, 2018

This PR closes #845 :-)

@jimhester jimhester merged commit 213eb0e into tidyverse:master Nov 13, 2018
0 of 2 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build is in progress

@jimhester jimhester commented Nov 13, 2018

Thanks for the PR and for your patience in getting it merged!
