write_csv: Disable scientific notation for integers #765

Merged
jimhester merged 2 commits into tidyverse:master from zeehio:write_csv_allow_long_int_without_scientific on Nov 13, 2018

@zeehio
Contributor

zeehio commented Dec 16, 2017

write_csv does not handle large integer values properly: it saves them in scientific notation, which loses precision.
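To illustrate the kind of output at issue, here is a minimal base R sketch. It is not the readr implementation and does not show what this PR changes internally; it only shows how a whole number stored as a double can be rendered in scientific notation, and how fixed notation keeps every digit. The example values are made up for illustration.

```r
# Base R only; an illustration, not readr's writer code.
x <- c(100000, 123456789012)   # whole numbers stored as doubles (made-up values)

# R picks scientific notation here because it is the shorter representation.
sci <- as.character(x[1])
print(sci)    # "1e+05"

# Forcing fixed notation keeps all integer digits.
fixed <- format(x, scientific = FALSE, trim = TRUE)
print(fixed)  # "100000" "123456789012"
```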

This PR is a follow-up to #679, much simplified and hopefully closer to what @jimhester has in mind 😃

Thanks for your time reviewing it :-)

@zeehio zeehio force-pushed the zeehio:write_csv_allow_long_int_without_scientific branch from 6e193da to 7106735 Dec 16, 2017

@zeehio zeehio force-pushed the zeehio:write_csv_allow_long_int_without_scientific branch from 7106735 to f032ab7 Dec 17, 2017

@simecek

simecek commented Jan 6, 2018

I have run into a similar problem with a Kaggle competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data

The input file train_2.csv contains Wikipedia traffic data, i.e. how many times a given page (in rows) was accessed on a given date (in columns). The read_csv function warned me that row 37603 was not parsed properly. That happened because of the value "1e+05", which parse_integer fails to parse, as demonstrated below with a similar value:

> parse_integer(c("1e05", "2", "3"))
Warning: 1 parsing failure.
row # A tibble: 1 x 4 col     row   col               expected actual expected   <int> <int>                  <chr>  <chr> actual 1     1    NA no trailing characters    e05
[1] NA  2  3
attr(,"problems")
# A tibble: 1 x 4
    row   col               expected actual
  <int> <int>                  <chr>  <chr>
1     1    NA no trailing characters    e05 
> parse_double(c("1e05", "2", "3"))
[1] 1e+05 2e+00 3e+00

I am not sure how the train_2.csv file was created, or whether write_csv is to blame. However, something similar is to be expected for any large file of integers purely by chance.

Being unable to write data with write_csv and read it back with read_csv (without specifying column types) might be challenging for beginners. I would suggest either accepting this pull request or modifying parse_integer to accept scientific notation.
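Until one of those options is available, a workaround along the following lines may help. This is only a sketch: `parse_integer_lenient` is a hypothetical helper, not part of readr, and it uses base R's `as.numeric` in place of `parse_double` so that it runs without any packages. It parses each value as a double and converts to integer only when the value is whole and within integer range.

```r
# Hypothetical helper (not a readr function): accept integers that were
# written in scientific notation, such as "1e05".
parse_integer_lenient <- function(x) {
  # Parse as double first; unparseable strings become NA.
  d <- suppressWarnings(as.numeric(x))
  # Keep only whole values that fit in R's integer type.
  ok <- !is.na(d) & d == round(d) & abs(d) <= .Machine$integer.max
  out <- rep(NA_integer_, length(x))
  out[ok] <- as.integer(d[ok])
  out
}

res <- parse_integer_lenient(c("1e05", "2", "3"))
print(res)  # 100000 2 3
```

Values that are not whole numbers, or that cannot be parsed at all, come back as NA rather than being silently truncated.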

@zeehio

Contributor

zeehio commented Sep 30, 2018

This PR closes #845 :-)

@jimhester jimhester merged commit 213eb0e into tidyverse:master Nov 13, 2018

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build is in progress
@jimhester

Member

jimhester commented Nov 13, 2018

Thanks for the PR and for your patience in getting it merged!
