timezone of DateTime are not correctly processed #27
Having read the CH documentation [1], I found that the default timezone of a CH server is configurable. Does TB support this yet? If so, we need to consider how to pass this parameter to DF. Another question, from [2] we know that [1] https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#server_configuration_parameters-timezone |
OK, nice summary! That is all we want to have :) We already auto-discover the default tz in mgmt.rs, and it is easy to extend that with a configurable option in conf, but this is not a high priority. I list the tasks by priority (additions and corrections are welcome): priority high:
priority low (not necessarily solved in the near future):
|
To support the default tz detected by TB, I think we should correctly pass the tz configuration to DF, and that might be the first thing to do. Then in the future, when we plan to support a configurable tz in conf, it will be easy to modify only the tz detection process. |
@whjpji right. The main thing is how to make DF operations tz aware. |
@jinmingjian I got stuck. Could you please give me some advice on how to represent the default tz in DF? Should I pass the default tz into DF via a default catalog or schema (or maybe the information schema? I'm not very familiar with these concepts)? i.e. In
    let config = ExecutionConfig::new()
        .create_default_catalog_and_schema(true)
        .with_default_catalog_and_schema(/* ... */);
    let mut ctx = ExecutionContext::with_config(config);
I'm afraid this might bring extra cost if the engine accesses this parameter frequently when running queries. Another way might be using a Which way do you prefer? Or do you have any better ideas? [1]
|
@whjpji thanks for exploring. How to pass the parameter is unimportant for the task as a whole. What we should figure out is whether, and how, A-DF correctly processes tz info. This answer is important because I see very little tz handling logic in A-DF. Another discussion, for example, is this recent issue in Arrow. If A-DF does not respect the tz well, then we need to find a way to solve this problem ourselves. That's the biggest problem.
We should first figure out how "the engine accesses this parameter"; then we can understand the cost of that access. |
@whjpji more thoughts: |
@jinmingjian I agree with you. My mental model was "pass tz to DF, and then make DF handle tz right", so the first thing that came to mind was "pass tz to DF". However, the latter is more important, so my thinking was too linear. I will read the A-DF code more carefully and run some experiments to find out whether it can handle tz correctly and, if so, how. |
@jinmingjian Unfortunately, arrow does not support tz yet. The code shows that:
We have to support tz ourselves by making changes in A-DF. However, unlike #154, we cannot make all the changes in a dedicated mod, and these changes might be strongly coupled with all parts of A-DF. Should we open PRs directly on the arrow-rs repo and backport them to TB? I think it is better if these PRs can be tracked and reviewed by the arrow community. What do you think? [1]
[2]
[3]
[4]
|
Oh, my last comment was really about a different question: arrow does not support tz in date/time types. Returning to our original question: how do we make arrow support a global tz offset based on the server's tz or, in the future, on the tz setting in the configuration file? To make TB compatible with CH, not only must the tz in date/time types not be ignored, but a global tz offset must also be stored somewhere. Which way do you think is better for introducing this global tz offset: via the system schema, or as a global static variable somewhere? |
@whjpji If you confirm that A-DF does not support tz, I suggest we first try to patch on the planner side. Patch TB's own Timestamp32 first: if we meet a Timestamp32, we patch that column with +/- the global tz offset from mgmt as an expression during planning. What do you think? :) |
@whjpji correction: you cannot use the tz from mgmt directly, otherwise there is a circular dependency. Set up a global tz in DF, and sync mgmt's tz into it at boot or at some later point. |
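A minimal sketch of the "global tz in DF, synced at boot" idea, assuming the tz is reduced to a fixed offset in seconds east of UTC. All names here (`DEFAULT_TZ_OFFSET_SECS`, `sync_default_tz_offset`, `shift_to_local`) are hypothetical and are not part of TB, arrow, or DataFusion; the last function only illustrates what a planner-inserted "+/- offset" expression would compute for a single value.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical process-wide default tz offset, in seconds east of UTC.
/// It lives on the query-engine side so the engine never depends on mgmt.
static DEFAULT_TZ_OFFSET_SECS: AtomicI32 = AtomicI32::new(0);

/// Called by the management layer at boot (or whenever it re-detects the tz).
pub fn sync_default_tz_offset(offset_secs: i32) {
    DEFAULT_TZ_OFFSET_SECS.store(offset_secs, Ordering::Relaxed);
}

/// Read by the planner; a relaxed atomic load, so frequent access is cheap.
pub fn default_tz_offset() -> i32 {
    DEFAULT_TZ_OFFSET_SECS.load(Ordering::Relaxed)
}

/// What a planner-inserted `ts + offset` expression would yield for one
/// Timestamp32 value (assumed here to be seconds since the Unix epoch).
pub fn shift_to_local(epoch_secs: i32) -> i64 {
    epoch_secs as i64 + default_tz_offset() as i64
}
```

With this shape the dependency points one way only: mgmt calls `sync_default_tz_offset`, and the engine only ever reads the stored value.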
@jinmingjian I have confirmed that A-DF does not support tz. First, with my default timezone of Etc/GMT+8 (Asia/Shanghai), I created a table and inserted a timestamp into it:
Then I set my timezone to Etc/GMT-5 (America/Los_Angeles) and re-queried the timestamp:
I got the same answer. However, in CH the displayed timestamp should vary with the server timezone, so this is not the expected CH behavior. I will try to fix it this weekend. |
@whjpji this method does not confirm the A-DF behavior, because the backing type of DateTime is Timestamp32, which is a type newly added by TB. A better way is to use or change a unit test in DF to confirm it. A-DF may ignore tz in computation now, as you and I have both seen, but the real question is whether there is some preset slot in the temporal functions that enables tz without patching. As an idea:
this ignores the tz, but you can tweak the logic into
|
@whjpji if my idea holds, you can change Timestamp32(Option<String>) to something like Timestamp32(Option<Int32>) for performance, so that the offset does not have to be parsed from a string on every function call. |
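The performance point above can be sketched as follows: parse the offset string once (for example when the type is constructed) and let every temporal function work on the resulting integer. This is only an illustration under the assumption that the tz is written as a fixed offset like "+08:00"; `parse_fixed_offset` and `hour_of_day` are hypothetical helpers, and named zones are deliberately not handled.

```rust
/// Parses a fixed offset such as "+08:00" or "-05:30" into seconds east of
/// UTC. Returns None for anything else, including named zones.
pub fn parse_fixed_offset(s: &str) -> Option<i32> {
    let bytes = s.as_bytes();
    if bytes.len() != 6 || bytes[3] != b':' {
        return None;
    }
    let sign = match bytes[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    // Reads one ASCII digit at position `i`, rejecting any other byte.
    let digit = |i: usize| -> Option<i32> {
        let b = bytes[i];
        if b.is_ascii_digit() {
            Some((b - b'0') as i32)
        } else {
            None
        }
    };
    let hours = digit(1)? * 10 + digit(2)?;
    let minutes = digit(4)? * 10 + digit(5)?;
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3600 + minutes * 60))
}

/// A temporal function that takes the pre-parsed offset, so no string
/// parsing happens per call. `epoch_secs` is seconds since the Unix epoch.
pub fn hour_of_day(epoch_secs: i64, offset_secs: i32) -> u32 {
    (epoch_secs + offset_secs as i64).rem_euclid(86_400) as u32 / 3600
}
```

For instance, `parse_fixed_offset("+08:00")` would be evaluated once, and the resulting `Some(28800)` stored in the type and passed to `hour_of_day` for each value.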
@jinmingjian Thanks for your correction, I didn't know that
I agree with you, and we can tweak the
I think it is not enough to store only an offset in the data type, because the tz name is also needed in some functions. Here is an example from the CH documentation:
    SELECT toDateTime('2019-01-01 00:00:00', 'UTC') AS time_utc,
        toTypeName(time_utc) AS type_utc,
        toInt32(time_utc) AS int32utc,
        toTimeZone(time_utc, 'Asia/Yekaterinburg') AS time_yekat,
        toTypeName(time_yekat) AS type_yekat,
        toInt32(time_yekat) AS int32yekat,
        toTimeZone(time_utc, 'US/Samoa') AS time_samoa,
        toTypeName(time_samoa) AS type_samoa,
        toInt32(time_samoa) AS int32samoa
    FORMAT Vertical;
And the result will be:
If only the tz offset is stored in the metadata, the column What if we make the BTW, all the changes made to |
But there is still a problem. If no tz is specified when inserting into a table (named But if the server is restarted with a changed tz (say GMT+5), should the timestamp stored in If the tz interpretation changes as the server's tz changes, the above strategy will handle this incorrectly. The tz might then be handled this way:
    Timestamp32(None) => to_some_part(array) // parse the array with the default tz provided by TB,
    Timestamp32(Some(Utc)) => to_some_part(array, Utc) // parse the array with Utc (which is the current behavior of A-DF),
    Timestamp32(Some(tz)) => to_some_part(array, tz) // for other tz
I have no experience using CH. What do you think of this problem? |
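The three-way dispatch in the comment above could look roughly like this in plain Rust. It is a sketch under stated assumptions: the `Tz` enum, `resolve_offset`, and `to_hour` are hypothetical stand-ins (not arrow or TB types), zones are modeled as fixed offsets, and the zone name is kept next to the offset because some functions need the name as well.

```rust
/// Hypothetical tz metadata attached to a Timestamp32 column.
#[derive(Debug, Clone, PartialEq)]
pub enum Tz {
    Utc,
    /// A zone modeled as a fixed offset; the name is kept for functions
    /// that must report it.
    Fixed { name: String, offset_secs: i32 },
}

/// Picks the offset to apply: the server default when the column has no tz,
/// zero for Utc, and the column's own offset otherwise.
pub fn resolve_offset(column_tz: Option<&Tz>, server_default_offset_secs: i32) -> i32 {
    match column_tz {
        None => server_default_offset_secs,
        Some(Tz::Utc) => 0,
        Some(Tz::Fixed { offset_secs, .. }) => *offset_secs,
    }
}

/// Stand-in for `to_some_part`: extracts the hour of day from each value,
/// where values are seconds since the Unix epoch.
pub fn to_hour(values: &[i32], column_tz: Option<&Tz>, server_default_offset_secs: i32) -> Vec<u32> {
    let offset = resolve_offset(column_tz, server_default_offset_secs) as i64;
    values
        .iter()
        .map(|&v| ((v as i64 + offset).rem_euclid(86_400) / 3600) as u32)
        .collect()
}
```

Because the stored values are never rewritten, a column without tz metadata simply picks up whatever server default is in effect at query time.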
@whjpji these are all great considerations! It is not necessary to be 100% identical to CH, but we still want to stay as compatible as possible.
yeah, try it.
if the column's (A's) type also has no tz, the timestamp stored in A is interpreted as a time in the GMT+5 tz in all kinds of datetime calculations, but the timestamp itself is always a Unix timestamp/epoch in CH.
this treatment is unnecessary, as mentioned above. |
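To illustrate that the stored value stays a Unix epoch and only its rendering depends on the tz, here is a small std-only sketch. It assumes fixed offsets (UTC+5 for Asia/Yekaterinburg and UTC-11 for US/Samoa, matching the CH documentation query quoted earlier in this thread); `civil_from_days` and `format_with_offset` are hypothetical helpers, and the date conversion uses the well-known days-to-civil algorithm rather than anything from arrow or CH.

```rust
/// Converts days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365, year starts in March
    let mp = (5 * doy + 2) / 153; // 0..=11, March-based month index
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month: u32 = if mp < 10 { (mp + 3) as u32 } else { (mp - 9) as u32 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Renders the same epoch value as wall-clock time at the given fixed offset.
/// The epoch value itself is never modified.
pub fn format_with_offset(epoch_secs: i64, offset_secs: i32) -> String {
    let local = epoch_secs + offset_secs as i64;
    let (y, m, d) = civil_from_days(local.div_euclid(86_400));
    let sod = local.rem_euclid(86_400); // second of day
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        y, m, d, sod / 3600, sod % 3600 / 60, sod % 60
    )
}
```

Calling `format_with_offset(1_546_300_800, 0)`, `format_with_offset(1_546_300_800, 5 * 3600)`, and `format_with_offset(1_546_300_800, -11 * 3600)` renders one and the same stored integer three different ways, which is the behavior described for a column whose type carries no tz.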
We have committed an initial solution in #166. This makes the timezone behavior of TB compatible with that of CH. The behavior may be improved in the future. For now, I am closing this issue. |
The logic in arrow/DF ignores the timezone, but CH reinterprets the presentation according to the server's timezone.