Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.time: add Date #18272

Closed
wants to merge 10 commits into from
Closed

std.time: add Date #18272

wants to merge 10 commits into from

Conversation

Vexu
Copy link
Member

@Vexu Vexu commented Dec 13, 2023

Implements #8396

@InKryption
Copy link
Contributor

Haven't looked through thoroughly, but it seems the added Month enum is a duplicate of std.time.epoch.Month.

@xdBronch
Copy link
Contributor

xdBronch commented Dec 13, 2023

i dont have a strong opinion on it but recently there was some discussion about StackFallback having get instead of allocator and imo it makes sense for it to be different. afaik its the only allocator getter that affects the state of the allocator which can cause some unexpected behavior if youre not aware of that.

edit: i didnt realize this existed, see #16344. my thinking is that the function having a different name will possibly cause people to investigate why and learn what it does differently

@Vexu
Copy link
Member Author

Vexu commented Dec 13, 2023

Haven't looked through thoroughly, but it seems the added Month enum is a duplicate of std.time.epoch.Month.

Looks like it does, that enum didn't exist when I started this. I'll combine them.

i dont have a strong opinion on it but recently there was some discussion about StackFallback having get instead of allocator and imo it makes sense for it to be different. afaik its the only allocator getter that affects the state of the allocator which can cause some unexpected behavior if youre not aware of that.

Oh right, didn't notice that despite it being documented. Maybe there should be an additional allocator function which doesn't reset it or a const allocator = @compileError("use get because XYZ");?

@xdBronch
Copy link
Contributor

either sounds good, even get just not reinitializing the FBA like #16344 mentioned would probably be fine

@Vexu
Copy link
Member Author

Vexu commented Dec 13, 2023

I extracted the stackFallback change to a separate PR since it is breaking: #18275

Comment on lines +158 to +171
pub const names = [_][]const u8{
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these need to be off in some Locale specific area; not as a pub field in the Month enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these names constants be provided at all? They are specifically English and not localized and if you do not intend to localize your application to languages other than English it's not too difficult to do @tagName on the tag and then customize the capitalization as well. Maybe I want to capitalize it "january" or "JANUARY" instead.

lib/std/time.zig Outdated Show resolved Hide resolved
lib/std/time.zig Outdated Show resolved Hide resolved
lib/std/time.zig Show resolved Hide resolved
lib/std/time.zig Outdated
@@ -83,6 +83,12 @@ pub fn localtime(allocator: std.mem.Allocator) LocalTimeError!TimeZone {

/// ISO 8601 compliant date and time representation.
pub const DateTime = struct {
/// nanoseconds, range 0 to 999
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the update :) Not sure though why we would need separate fields for ms, us and ns - a u30 nanosecond field [0-999999999] would allow any amount of fractional seconds, down to ns precision, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the format for the other parts but using just one field for fractions does make more sense thinking again.

day: u8,

/// day of the year, range 1 to 366
year_day: u16,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding year_day and week_day - what's the reason to carry those around with any datetime instance? why not calculate them "on demand"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why carry anything but the timestamp and the timezone offset? Either way removing them doesn't affect the size of the struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Go takes the timestamp approach. Rust's Chrono seems to be in the middle. I feel that one benefit of a timestamp approach is that it's easier to implement date arithmetic.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to advocate a purely timestamp-based approach. If there is a need to have fields like year_day stored in the structure then that's fine. I was just thinking that fields that are rarely required could be replaced by methods that calculate the value on request for given datetime.

lib/std/time.zig Outdated

/// UNIX timestamp in nanoseconds
nano_timestamp: i128,
tt: TimeZone.TimeType,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the implication of "TimeType" here? Does this cover a rule associated with the TimeZone (POSIX or transition lookup from the tzfile)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like how this is currently either. I'll adjust it so that it is just the offset to the UNIX timestamp and a time zone name.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leroycep is currently working on an abstraction for this in his zig-chrono library. Basically, I think you'd like to have an interface type that could be a fixed offset, a POSIX TZ rule or a IANA db tzfile time zone as concrete type. Any concrete type would have a method that gives you the offset from UTC (+ a DST indicator if appropriate) for a given datetime, tz and Unix timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I have it organized like this:

tz.zig
tz/
    TZif.zig
     Posix.zig

tz/TZif.zig is the equivalent of the TimeZone file here, and tz/Posix.zig implements parsing and using POSIX TZ environment variable that is in the footer of most TZif files.

Like @FObersteiner mentioned, I have also been working on a TimeZone abstraction. I decided to go with an Allocator-like interface. Unfortunately this does make lifetimes of the TimeZones a bit more complex, which I'm handling at the moment by having a Database which handles their allocation.

allocator.free(self.timetypes);
}

/// Project UTC timestamp to this time zone.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cover a fall-back to LMT if the tzfile neither defines an offset for given Unix time (not UTC ;-)) nor a general POSIX rule? (v1 TZif)

Copy link
Member Author

@Vexu Vexu Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently it does only the most basic lookup for the offset. I was hoping for someone more familiar with the format to handle the edge cases so that I don't need to learn more about it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking this through once more, I think if you have a IANA tz db file loaded (TZif), it should specify an LMT offset as the first entry in the timetypes array. I'm not sure if this is historically "correct" since the concept of time zones as we have them today didn't exist a 150 years ago. But to me it seems "more" correct than extending a POSIX rule indefinitely into the past.
For the future, the story is a bit different; if the tzfile does not define any transitions after 2038, you end up with a fixed UTC offset, which doesn't seem right. But as for the past, extending a DST rule into the future indefinitely also seems... optimistic?

@karlseguin
Copy link
Contributor

I don't mean to pick on this specific PR, since I think it's in-line with much of the standard lib, but I think this could have much more extensive testing. For example, each of these lines crash in a different way:

std.time.DateTime.fromTimestamp(170141183460469231731687303715884105727, std.time.TimeZone.TimeType.UTC);

std.time.DateTime.fromTimestamp(-1, std.time.TimeZone.TimeType.UTC);

std.time.DateTime.fromTimestamp(-323232223991999, std.time.TimeZone.TimeType.UTC);

@intCast(tzi.StandardName[4]),
@intCast(tzi.StandardName[5]),
},
.offset = tzi.Bias,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation doesn't mention if this includes leap seconds but I'd guess it does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. If not, there is microsoft/STL#1624 and https://www.codeproject.com/Articles/8789/World-Clock-and-the-TimeZoneInformation-class on how to do it.

Bear in mind, that Linux may or may not be configured to use UTC or TAI directly without (from my understanding) a standard way to check the current config ie for folks hosting their own UTC servers (without standard query interface) see https://en.wikipedia.org/wiki/Unix_time#Variant_that_counts_leap_seconds.

@Vexu
Copy link
Member Author

Vexu commented Dec 14, 2023

I don't mean to pick on this specific PR, since I think it's in-line with much of the standard lib, but I think this could have much more extensive testing. For example, each of these lines crash in a different way:

I fully expected a lot of comments on this considering the three previous PRs and the way time is in general.

I'll look more into the edge cases once we agree on the common cases.

/// nanoseconds, range 0 to 999 999 999
nanosecond: u32,

/// seconds, range 0 to 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could mention here 60 [leap second]

@intCast(tzi.StandardName[4]),
@intCast(tzi.StandardName[5]),
},
.offset = tzi.Bias,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. If not, there is microsoft/STL#1624 and https://www.codeproject.com/Articles/8789/World-Clock-and-the-TimeZoneInformation-class on how to do it.

Bear in mind, that Linux may or may not be configured to use UTC or TAI directly without (from my understanding) a standard way to check the current config ie for folks hosting their own UTC servers (without standard query interface) see https://en.wikipedia.org/wiki/Unix_time#Variant_that_counts_leap_seconds.

/// %M Minute (00-59)
/// %p AM or PM designation
/// %s Millisecond 891
/// %S Second (00-60)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60 [leap second]

while (i < header.counts.leapcnt) : (i += 1) {
const occur: i64 = if (legacy) try reader.readInt(i32, .big) else try reader.readInt(i64, .big);
if (occur < 0) return error.Malformed; // rfc8536: occur [...] MUST be nonnegative
if (i > 0 and leapseconds[i - 1].occurrence + 2419199 > occur) return error.Malformed; // rfc8536: occur [...] each later value MUST be at least 2419199 greater than the previous value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const least_diff_leap_sec = 2419199 ?

LocalTimeUnavailable,
} || std.mem.Allocator.Error;

pub fn localtime(allocator: std.mem.Allocator) LocalTimeError!TimeZone {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to support the Posix style footers and/or TZ environment variable? I have support for Posix style TimeZone specifications (https://codeberg.org/geemili/chrono-zig/src/branch/master/src/tz/Posix.zig) and support for loading from the TZ environment variable: https://codeberg.org/geemili/chrono-zig/src/branch/master/src/tz.zig#L140-L154

};

/// Get current date and time in the system's local time.
/// On non-Windows systems this loads the system's local time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// On non-Windows systems this loads the system's local time
/// On non-Windows systems this loads the system's local

/// Get current date and time in the system's local time.
/// On non-Windows systems this loads the system's local time
/// time zone database from the filesystem and may allocate
/// so when called repeated `nowTz` should be preferred.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// so when called repeated `nowTz` should be preferred.
/// so when called repeatedly `nowTz` should be preferred.

Comment on lines +251 to +263
const leap_adjustment = leap: {
var i = self.leapseconds.len;
if (i == 0) break :leap 0;
while (i > 0) {
i -= 1;
if (self.leapseconds[i].occurrence < seconds) break;
}
break :leap self.leapseconds[i].correction;
};
const tt = self.transitions[index].timetype;
return .{
.offset = tt.offset + leap_adjustment,
.name_data = tt.name_data,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get that one :) Are you proposing a leap second correction for a UTC offset? IIUC, time zones as defined in TZif specify offsets relative to a certain reference time (UTC).

Comment on lines +330 to +331
/// %s Millisecond 891
/// %S Second (00-60)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using %f for the fractional seconds part? That would make it independent of the resolution (ms, µs, ns).

/// %S Second (00-60)
/// %y Year, last two digits (00-99)
/// %Y Year
/// %z UTC offset in the form ±HHMM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ISO8601 wants there to be a colon between hours and minutes, plus there might be non-zero seconds as well (current IANA db version does define such offsets for some time zones during some periods of time).

/// %y Year, last two digits (00-99)
/// %Y Year
/// %z UTC offset in the form ±HHMM
/// %Z Time zone name.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%Z typically gives you the abbreviation, not the (up-to-date) IANA db identifier. While this is nice for output, parsing the abbreviations is a pain, if not impossible since there are many ambiguous ones. I think it would be great to have a directive that gives (or accepts) the IANA identifier.

@andrewrk
Copy link
Member

andrewrk commented Jan 4, 2024

I'm leaning towards rejecting #8396. Can you share why you personally want zig standard library to have date APIs in it, and maybe start by maintaining a third party API for some duration, and then we can think about upstreaming it to the standard library?

@FObersteiner
Copy link

As a side-note to Andrew's comment above; if you not only want datetime support but also IANA-db time zone support, you would also have to ship the database itself since not all distributions come with it. Windows for example does its own thing. Plus you would have to make sure that version of the database is up-to-date.

@Vexu
Copy link
Member Author

Vexu commented Jan 4, 2024

I need the system's local time and a way to convert that into a textual representation in order to properly implement __DATE__, __TIME__ and __TIMESTAMP__ macros in Aro.

@andrewrk
Copy link
Member

andrewrk commented Jan 4, 2024

Note that timestamps are a colossal headache for the reproducible builds community. This spec for SOURCE_DATE_EPOCH suggests a neat idea: using the last modification time of the source if this environment variable is not provided.

Of course, for this use case, localizing the time is out of the question.

It seems such a shame to add a massive dependency to the zig standard library (timezone data in addition to code), which is also volatile (subject to changing in response to wars, politics, border disputes, etc.), which is then mainly used to cause reproducibility problems.

@nektro
Copy link
Contributor

nektro commented Jan 4, 2024

I used to be in the camp that this is something that would be essential for Zig to have eventually but I'm come around to agreeing that, just like with proper string support, users that need it would be better served by a separate module that's not tied to Zig's own release cadence and can update whenever the source data does

@FObersteiner
Copy link

FObersteiner commented Jan 5, 2024

Note that timestamps are a colossal headache for the reproducible builds community. This spec for SOURCE_DATE_EPOCH suggests a neat idea: using the last modification time of the source if this environment variable is not provided.

Is this to say "don't use local time"? I mean, IIUC they suggest Unix time in seconds resolution as the solution - I didn't see that coming ;-) Having an ISO8601 compatible date/time string with UTC offset would at least make it human readable - if that is a useful feature in this context I don't know. And it's convertible to UTC/Unix time, except for potential leap seconds. Plus, you wouldn't need "real" time zone support for this, just the UTC offset.
Anyways, I can see why you wouldn't want a half-featured datetime in your std lib (you still need a 3rd party lib for actual time zone support) or full-featured datetime with tz integration (dependencies mentioned above).

@Vexu
Copy link
Member Author

Vexu commented Jan 6, 2024

I don't necessarily need the DateTime since Aro already has code to convert timestamps into human readable dates but what about time zones? How do you suggest I get the offset of the local time zone for those macros and what should be done about std.Tz?

@andrewrk
Copy link
Member

andrewrk commented Jan 6, 2024

I think if you made it so that Aro assumed that everyone's local time zone was UTC then you would be doing everyone a favor. It doesn't make any sense for the local time zone of the computer that compiled a piece of software to be present inside the executable. Nobody wants that.

edit:

what should be done about std.Tz?

It looks like it is unused by the zig project itself and should probably be deleted in favor of a third party package.

I'm open to reconsidering this later, but I'm pretty sure this is the best option for the time being.

@Vexu
Copy link
Member Author

Vexu commented Jan 7, 2024

I think if you made it so that Aro assumed that everyone's local time zone was UTC then you would be doing everyone a favor. It doesn't make any sense for the local time zone of the computer that compiled a piece of software to be present inside the executable. Nobody wants that.

That's how it works currently since it uses the value from std.time.timestamp but it could lead to confusion since that's not how those macros are defined.

I'll open a new PR that removes std.Tz.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet