Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support timezone in metadata of DateTime #190

Merged
merged 6 commits into from Jul 11, 2021

Conversation

frank-king
Copy link
Contributor

@frank-king frank-king commented Jul 10, 2021

After #166, TB has supported a global default timezone for the DateTime type, but have not supported the DateTime type with tz in metadata.

This PR is to support Datetime type with timezone in metadata.

First, I added a type struct TimeZoneId(u16), a wrapper of some variant in enum chrono_tz::Tz, to store the timezone in BqlType::DateTime for the space-efficiency consideration.

When the user create a table with a column (for example, who's type is DateTime('Etc/GMT-8')), it is parsed as a timezone id, stored in BqlType, and passed to A-DF as type Timestamp32(Some(BaseTimeZone(<tz_id>, <offset>))) where the tz_id corresponds to 'Etc/GMT-8'. The offset is calculated in ahead and stored here for fast lookup.

The CH functions in A-DF which accepts a DateTime with or without a tz uses Timestamp32(None) as its signature. During type coercion, Timestamp32 with or without timezones are "casted" to the same Timestamp32(None) type without any tz offsets applied (i.e. timezones here are ignored).

Then during datetime (CH) functions in A-DF, temporal arguments passed in are treated as timestamps with the timezone that is stored in the input schema.

Finally the timezones can be resolved correctly.


Curerntly , timezones of data values are ignored, only the tz in schema are resolved. However, values can have their own timezone, such as

select toTimeZone(toDateTime('2021-01-01 00:00:00', 'Etc/GMT-8'), 'UTC')

We can solve this problem in the next PR.

Signed-off-by: Frank King <frankking1729@gmail.com>
Signed-off-by: Frank King <frankking1729@gmail.com>
Signed-off-by: Frank King <frankking1729@gmail.com>
…imezones

Signed-off-by: Frank King <frankking1729@gmail.com>
…ent timezones

Note: only timezone of input schema is resolved. Timezone in `Timestamp32`
values are ignored.

Signed-off-by: Frank King <frankking1729@gmail.com>
Signed-off-by: Frank King <frankking1729@gmail.com>
@jinmingjian
Copy link
Contributor

@whjpji this is great!

I like to discuss one my old idea: I personally think DateTime('SomeRegion/SomeCity') is a little counter-intuitive to use. Because 'SomeRegion/SomeCity' is not arbitrary, only tiny set of city names is accepted. You must lookup a table to correctly setup timezone parameter. Is it better that we support both DateTime('SomeRegion/SomeCity') and DateTime(offset in i32), and we recommend the second form, similarly applied to other kinds of datetime functions, like

select toTimeZone(toDateTime('2021-01-01 00:00:00', 3600*8), 0)

Further more, if we can only use some BaseTimeZoneId(i16) as the only core internal concept, i16 may be an shifted offset-in-second(for perf). The string name or offset in second is just derived from this BaseTimeZoneId.

How do you think so:)

@jinmingjian
Copy link
Contributor

@whjpji but my idea is beyond this PR. Let's have a look today to see if this is good to merge:)

@frank-king
Copy link
Contributor Author

@jinmingjian I agree with you. Supporting a straight forward offset is easier to use. Currently, GMT timezones, based on the offsets in hours, are already available. More precisely, if we want to support offsets which are not integral times of hours, we can consider the A-DF's approach:

An absolute time zone offset of the form '+XX:XX' or '-XX:XX', such as '+07:30'

It is not difficult to support it. Simply use larger values of u16 in the TimeZoneId (such as 0x80xx). Consider only supporting the offsets that is a multiple of 15 minutes, there are 24 hours * 4 quarters * 2 signs = 192 values needed. An i8 embedded in the lower byte of u16 is enough.

@frank-king
Copy link
Contributor Author

@jinmingjian There might be one thing to pay attention is that, some timezones are not so simple as where only a time offset is applied, but might have more complicated rules (such as the daylight saving time). Whether and how we handle those timezones needs our further thinking.

@jinmingjian
Copy link
Contributor

@whjpji great! '+07:30' is better! We do not need add a variant for types. We can support it and recommend it before 'region/city'.
For DST, you are right. But:

Only a minority of the world's population uses DST; Asia and Africa generally do not observe it.

DST clock shifts sometimes complicate timekeeping and can disrupt travel, billing, record keeping, medical devices, and sleep patterns. Computer software generally adjusts clocks automatically.

We can still use i16 as the internal representation for furthermore if we really want to support DST.

@jinmingjian
Copy link
Contributor

@whjpji considering all tests passed, I firstly merge this PR. And you can refactor if you think the above is the good direction.

@jinmingjian jinmingjian merged commit 4b46809 into tensorbase:main Jul 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants