write_csv: Disable scientific notation for integers #765

Merged


zeehio
Contributor

@zeehio zeehio commented Dec 16, 2017

write_csv does not handle large integer values properly: it saves them in scientific notation, which loses precision.
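
A minimal base R sketch of the kind of behaviour described above. It only illustrates how default number formatting can switch to scientific notation and how a shortened mantissa loses digits; it is an assumption-laden illustration, not readr's actual code path, and the variable names are hypothetical.

```r
# Base R only; illustrates the formatting issue, not readr's internals.
x <- 1e5
default_text <- format(x)                     # "1e+05" with the default scipen = 0
fixed_text   <- format(x, scientific = FALSE) # "100000"

# A scientific representation with too few significant digits is lossy.
y <- 1234567
short <- sprintf("%.6g", y)                   # "1.23457e+06"
roundtrip_ok <- as.numeric(short) == y        # FALSE: the last digit was rounded away
```

Writing whole numbers in fixed notation, as in `fixed_text`, avoids both the exponent and the rounding.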

This PR is a follow-up to #679, much simplified and hopefully closer to what @jimhester has in mind 😃

Thanks for your time reviewing it :-)

@zeehio zeehio force-pushed the write_csv_allow_long_int_without_scientific branch from 6e193da to 7106735 Compare December 16, 2017 12:13
@zeehio zeehio force-pushed the write_csv_allow_long_int_without_scientific branch from 7106735 to f032ab7 Compare December 17, 2017 15:28
@simecek

simecek commented Jan 6, 2018

I have run into a similar problem with a Kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data

The input file train_2.csv contains Wikipedia traffic data, i.e. how many times a given page (in rows) was accessed on a given date (in columns). The read_csv function warned me that row 37603 was not parsed properly. That happened because of the value "1e+05", which parse_integer fails to parse, as demonstrated below (using "1e05"):

> parse_integer(c("1e05", "2", "3"))
Warning: 1 parsing failure.
row # A tibble: 1 x 4 col     row   col               expected actual expected   <int> <int>                  <chr>  <chr> actual 1     1    NA no trailing characters    e05
[1] NA  2  3
attr(,"problems")
# A tibble: 1 x 4
    row   col               expected actual
  <int> <int>                  <chr>  <chr>
1     1    NA no trailing characters    e05 
> parse_double(c("1e05", "2", "3"))
[1] 1e+05 2e+00 3e+00

I am not sure how the train_2.csv file was created, or whether write_csv is to blame. However, something similar is to be expected in any large file of integers purely by chance.

Being unable to write data with write_csv and read it back with read_csv (without specifying column types) might be challenging for beginners. I would suggest either accepting this pull request or modifying parse_integer to accept scientific notation.
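
To make the second suggestion concrete, here is a rough base R sketch of an integer parser that tolerates scientific notation. The function name `parse_integer_sci` is hypothetical and this is not how readr implements `parse_integer`; it only assumes that a value should count as an integer when it is a whole number within R's integer range.

```r
# Hypothetical helper, base R only: parse strings as integers while
# accepting scientific notation such as "1e05".
parse_integer_sci <- function(x) {
  # as.numeric() understands both "100000" and "1e05"; unparseable input becomes NA.
  num <- suppressWarnings(as.numeric(x))
  # Keep only whole numbers that fit into R's 32-bit integer type.
  ok <- !is.na(num) & num == floor(num) & abs(num) <= .Machine$integer.max
  out <- rep(NA_integer_, length(x))
  out[ok] <- as.integer(num[ok])
  out
}

parse_integer_sci(c("1e05", "2", "3"))
```

Values such as "1.5" or "abc" come back as NA instead of being silently truncated, which keeps the helper strict about what counts as an integer.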

@zeehio
Contributor Author

zeehio commented Sep 30, 2018

This PR closes #845 :-)

@jimhester jimhester merged commit 213eb0e into tidyverse:master Nov 13, 2018
@jimhester
Member

Thanks for the PR and for your patience in getting it merged!

3 participants