New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to parse dates from long numbers #302

Merged
merged 1 commit into from Feb 5, 2015

Conversation

Projects
None yet
2 participants
@jiho
Contributor

jiho commented Jan 30, 2015

With as.character(), long numbers get converted to scientific
notation which the parse of course fails to recognise. Using
format(..., scientific = FALSE) prevents that.

@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Jan 30, 2015

Contributor

Should really have run check(). Additional commits fix mistakes.

Contributor

jiho commented Jan 30, 2015

Should really have run check(). Additional commits fix mistakes.

@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Jan 30, 2015

Contributor

Example of what used to fail:

> x <- 20140101000000
> as.character(x)
[1] "2.0140101e+13"
> format(x, scientific = FALSE, trim = TRUE)
[1] "20140101000000"
> .num_to_date(x)
[1] "20140101000000"

It seems like a bug in as.character actually but it is probably safer to fix it here rather than wait for as.character to change...

Contributor

jiho commented Jan 30, 2015

Example of what used to fail:

> x <- 20140101000000
> as.character(x)
[1] "2.0140101e+13"
> format(x, scientific = FALSE, trim = TRUE)
[1] "20140101000000"
> .num_to_date(x)
[1] "20140101000000"

It seems like a bug in as.character actually but it is probably safer to fix it here rather than wait for as.character to change...

@vspinu

This comment has been minimized.

Show comment
Hide comment
@vspinu

vspinu Jan 30, 2015

Member

Thanks. Could you please add a small test for this?

Could you also squash your commits into one? We try to stick to one-commit-per-change rule to make it easier to navigate the history.

Member

vspinu commented Jan 30, 2015

Thanks. Could you please add a small test for this?

Could you also squash your commits into one? We try to stick to one-commit-per-change rule to make it easier to navigate the history.

@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Jan 30, 2015

Contributor

Regarding performance, format would actually be even a bit better:

> x <- round(runif(10^7) * 100000)
> system.time(as.character(x))
   user  system elapsed 
  7.786   0.015   7.816 
> system.time(format(x, scientific = FALSE, trim = TRUE))
   user  system elapsed 
  6.412   0.013   6.431 
Contributor

jiho commented Jan 30, 2015

Regarding performance, format would actually be even a bit better:

> x <- round(runif(10^7) * 100000)
> system.time(as.character(x))
   user  system elapsed 
  7.786   0.015   7.816 
> system.time(format(x, scientific = FALSE, trim = TRUE))
   user  system elapsed 
  6.412   0.013   6.431 
@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Jan 30, 2015

Contributor

Tests are added and everything is squashed into one commit

Contributor

jiho commented Jan 30, 2015

Tests are added and everything is squashed into one commit

Show outdated Hide outdated inst/tests/test-parsers.R
@@ -156,6 +156,20 @@ test_that("ymd functions correctly parse dates with no separators and no quotes"
equals(as.POSIXct("2010-01-02 23:59:59", tz = "UTC")))
})
test_that("numbers are correctly converted into character for parsing", {
# numbers with 000000 at the end are wrongly converted by as.character

This comment has been minimized.

@vspinu

vspinu Jan 30, 2015

Member

They are not wrongly converted. That's the feature which is customized by scipen option.

@vspinu

vspinu Jan 30, 2015

Member

They are not wrongly converted. That's the feature which is customized by scipen option.

Show outdated Hide outdated inst/tests/test-parsers.R
equals(as.POSIXct("2010-01-02 00:00:00", tz = "UTC")))
expect_that(.num_to_date(20100102000000),
equals("20100102000000"))
# check for a fix in as.character

This comment has been minimized.

@vspinu

vspinu Jan 30, 2015

Member

I don't think we need these tests. as.character is not broken and will never be fixed.

@vspinu

vspinu Jan 30, 2015

Member

I don't think we need these tests. as.character is not broken and will never be fixed.

@vspinu

This comment has been minimized.

Show comment
Hide comment
@vspinu

vspinu Jan 30, 2015

Member

This PR made me think, are there other places as.character will fail similarly?

Member

vspinu commented Jan 30, 2015

This PR made me think, are there other places as.character will fail similarly?

@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Feb 2, 2015

Contributor

Potential problems:

  • R/parse.r l. 455 as.character is used with randomly generated numbers which may end with many zeros by chance.
  • inst/tests/test-parser.r l. 520 same as above.
  • R/parse.r l. 489, as.character is used when it is not necessary (.num_to_date always returns a character)

The other uses in the source code do not seem problematic to me.

Contributor

jiho commented Feb 2, 2015

Potential problems:

  • R/parse.r l. 455 as.character is used with randomly generated numbers which may end with many zeros by chance.
  • inst/tests/test-parser.r l. 520 same as above.
  • R/parse.r l. 489, as.character is used when it is not necessary (.num_to_date always returns a character)

The other uses in the source code do not seem problematic to me.

@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Feb 2, 2015

Contributor

I've addressed the comments above. Let me know what you think. I guess I'll need to squash the commits again but I'll rather do that when all is accepted.

Contributor

jiho commented Feb 2, 2015

I've addressed the comments above. Let me know what you think. I guess I'll need to squash the commits again but I'll rather do that when all is accepted.

@vspinu

This comment has been minimized.

Show comment
Hide comment
@vspinu

vspinu Feb 2, 2015

Member

R/parse.r l. 455 as.character is used with randomly generated numbers which may end with many zeros by chance.

Probably not an issue as as.character is applied to POSIXct object. as.character(.POSIXct(1000000000000)) works as expected.

R/parse.r l. 489, as.character is used when it is not necessary (.num_to_date always returns a character)

Good catch. That makes me think that .num_to_date is a misnomer. It doesn't convert to date.

I've addressed the comments above. Let me know what you think. I guess I'll need to squash the commits again but I'll rather do that when all is accepted.

Yes. Looks good. Please squash. Instead of adding new commits you can always amend previous commit and then force push.

Thanks.

Member

vspinu commented Feb 2, 2015

R/parse.r l. 455 as.character is used with randomly generated numbers which may end with many zeros by chance.

Probably not an issue as as.character is applied to POSIXct object. as.character(.POSIXct(1000000000000)) works as expected.

R/parse.r l. 489, as.character is used when it is not necessary (.num_to_date always returns a character)

Good catch. That makes me think that .num_to_date is a misnomer. It doesn't convert to date.

I've addressed the comments above. Let me know what you think. I guess I'll need to squash the commits again but I'll rather do that when all is accepted.

Yes. Looks good. Please squash. Instead of adding new commits you can always amend previous commit and then force push.

Thanks.

Allow to parse dates from long numbers
With as.character(), long numbers with many zeros at the end get
converted to scientific notation which the parse of course fails to
recognise. Using format(..., scientific = FALSE) prevents that.

trim = TRUE is necessary for variable length input.

Add tests for these problematic numbers. Tests also check wether
the behaviour of as.character() changes but format()
outperforms as.character() and may be preferred anyway.
@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Feb 5, 2015

Contributor

Squashed commits

Contributor

jiho commented Feb 5, 2015

Squashed commits

vspinu added a commit that referenced this pull request Feb 5, 2015

Merge pull request #302 from jiho/fix_parse_numeric_dates
Allow to parse dates from long numbers

@vspinu vspinu merged commit 7e62fb9 into tidyverse:master Feb 5, 2015

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@vspinu

This comment has been minimized.

Show comment
Hide comment
@vspinu

vspinu Feb 5, 2015

Member

👍 Thanks!

Member

vspinu commented Feb 5, 2015

👍 Thanks!

@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho Feb 5, 2015

Contributor

You are welcome. Happy to help.

Contributor

jiho commented Feb 5, 2015

You are welcome. Happy to help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment