crazy timestamp bug in feather? #351

Closed
randomgambit opened this issue Oct 16, 2018 · 13 comments

@randomgambit

Hello the dream team,

Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug.

Consider this

import pandas as pd
import feather
import numpy as np


# Three tz-naive timestamps that represent UTC wall-clock times.
df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'),
                                  pd.to_datetime('2018-02-01 14:01:00.456'),
                                  pd.to_datetime('2018-03-05 14:01:02.200')]})

# Attach UTC, convert to US/Eastern, then drop the zone so the column is tz-naive again.
df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)

df
Out[17]: 
          string_time_utc           timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200

Here I derive the US/Eastern wall-clock time for each of my original timestamps (which are in UTC). Both columns end up without any timezone information attached.
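To make the three steps of that chain explicit, here is a minimal sketch using only the standard library rather than pandas. The helper name `utc_naive_to_eastern_naive` is hypothetical, and the fixed UTC-5 offset is an assumption that stands in for 'US/Eastern'; it holds for the sample dates, which fall before the 2018-03-11 daylight saving switch, but not year-round.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for 'US/Eastern'. This is only
# valid for the sample dates (all before the 2018-03-11 DST switch).
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def utc_naive_to_eastern_naive(ts: datetime) -> datetime:
    """Mirror tz_localize('UTC') -> tz_convert(...) -> tz_localize(None)."""
    aware_utc = ts.replace(tzinfo=timezone.utc)         # attach UTC, clock unchanged
    aware_est = aware_utc.astimezone(EASTERN_STANDARD)  # same instant, new wall clock
    return aware_est.replace(tzinfo=None)               # drop the zone, keep wall clock


utc_values = [
    datetime(2018, 2, 1, 14, 0, 0, 531000),
    datetime(2018, 2, 1, 14, 1, 0, 456000),
    datetime(2018, 3, 5, 14, 1, 2, 200000),
]
est_values = [utc_naive_to_eastern_naive(ts) for ts in utc_values]

for ts in est_values:
    # tzinfo prints as None: the result carries no zone, only a wall-clock time.
    print(ts.isoformat(sep=" ", timespec="milliseconds"), ts.tzinfo)
```

The point of the sketch is the last step: once the zone is dropped, nothing in the value itself says that it is an Eastern wall-clock time, so any reader has to guess.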

Saving the dataframe to CSV and to feather now gives two completely different results once the files are read back.

df.to_csv('P://testing.csv')
df.to_feather('P://testing.feather')

Switching to R.

Using the good old CSV gives me something a bit annoying, but expected. R assumes UTC by default and wrongly attaches that timezone to timestamp_est. No big deal: I can always reassign the zone with force_tz (with_tz would shift the clock time instead), or even better, import the column as character and parse it as a timestamp in R.
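The "import as character" idea can be sketched in Python as well: write each timestamp as text that carries its own UTC offset, so the reader does not have to guess a zone. This is an illustration only, not what `to_csv` or `read_csv` do; the helper names `to_text` and `from_text` are hypothetical, and the fixed UTC-5 offset is again an assumed stand-in for 'US/Eastern' on these dates.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset as a stand-in for 'US/Eastern' in Feb/Mar 2018.
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def to_text(ts: datetime) -> str:
    """Serialize a tz-aware timestamp as ISO 8601 text, offset included."""
    return ts.isoformat()


def from_text(text: str) -> datetime:
    """Parse the text back; the offset travels with the value."""
    return datetime.fromisoformat(text)


original = datetime(2018, 2, 1, 9, 0, 0, 531000, tzinfo=EASTERN_STANDARD)
text = to_text(original)
restored = from_text(text)

print(text)
print(restored == original, restored.utcoffset())
```

Because the offset is part of the string, the round trip does not depend on the default timezone of whichever tool reads the file.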

> dataframe <- read_csv('P://testing.csv')
Parsed with column specification:
cols(
  X1 = col_integer(),
  string_time_utc = col_datetime(format = ""),
  timestamp_est = col_datetime(format = "")
)
Warning message:
Missing column names filled in: 'X1' [1] 
> 
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 4
     X1 string_time_utc         timestamp_est          
  <int> <dttm>                  <dttm>                 
1     0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
2     1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
3     2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
  mytimezone
  <chr>     
1 UTC       
2 UTC       
3 UTC 

Now look at what happens with feather:

> dataframe <- read_feather('P://testing.feather')
> 
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 3
  string_time_utc         timestamp_est           mytimezone
  <dttm>                  <dttm>                  <chr>     
1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 ""        
2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 ""        
3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 ""     

My timestamps have been converted: both columns are shifted back by five hours!!! Pure insanity.
Am I missing something here?

Thanks!!
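One plausible reading of the feather output above, offered as an assumption that this thread does not confirm: the tz-naive values are stored as instants counted from the epoch, the reader attaches no timezone (which is why `tz()` returns `""`), and R prints a date-time with an empty timezone in the session's local zone. The sketch below reproduces the five-hour shift under that assumption, using only the standard library. The helper `display_as_local` is hypothetical, and the fixed UTC-5 offset stands in for a US/Eastern session on these dates.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the R session runs in US/Eastern, modeled as a fixed UTC-5 offset
# (valid for the sample dates, which fall before the 2018 DST switch).
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def display_as_local(naive: datetime, local_zone: timezone) -> str:
    """Treat a tz-naive value as a UTC instant and render it in local_zone."""
    instant = naive.replace(tzinfo=timezone.utc)
    local = instant.astimezone(local_zone)
    # %f yields microseconds; trim to milliseconds to match the R printout.
    return local.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


# First row of the dataframe as it was written from pandas.
written = {
    "string_time_utc": datetime(2018, 2, 1, 14, 0, 0, 531000),
    "timestamp_est": datetime(2018, 2, 1, 9, 0, 0, 531000),
}
shown = {name: display_as_local(value, EASTERN_STANDARD) for name, value in written.items()}

for name, text in shown.items():
    print(name, text)
```

Under this reading the stored values are intact and only the display differs, which would also explain why the CSV path, where the reader labels the column as UTC, shows no shift.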

@wesm
Owner

wesm commented Oct 17, 2018

Can you report in Apache Arrow? Someone will need to investigate once we get R fully moved over from this codebase

@randomgambit
Author

hi wes you mean on the JIRA? can you send me the link? thx!

@wesm
Owner

wesm commented Oct 17, 2018

@randomgambit
Author

done

@wesm
Owner

wesm commented Oct 17, 2018

Thanks. It will need a more descriptive / objective title

@randomgambit
Author

like "INSANE timestamp ERROR - win 400$ click here" I guess

@wesm
Owner

wesm commented Oct 17, 2018

We have hundreds of issues. We need to be able to understand the nature of the problem from the title

@randomgambit
Author

sure i was kidding :)

@randomgambit
Author

monsieur @wesm , do we have an update about this bug? timestamps are always tricky... thanks!!

@wesm
Owner

wesm commented Jan 16, 2019

I'm not sure of the status. What does the JIRA issue you created say?

@randomgambit
Author

what do you think, @wesm ? when is .13 being released? Thanks!

@wesm
Owner

wesm commented Jan 18, 2019

Please comment on ARROW-3543. I estimate timeline for 0.13 to be end of March

@wesm wesm closed this as completed Jan 18, 2019
Repository owner locked and limited conversation to collaborators Jan 18, 2019