Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: disallow leading zeroes in int literals and int types #12443

Merged
merged 2 commits into from
Aug 18, 2022

Conversation

wooster0
Copy link
Contributor

This makes 0123 and u0123 etc. illegal.

I'm now confident that this is a good change because
I actually caught two C header translation mistakes in haiku.zig with this.
Clearly, 0123 being octal in C (TIL) can cause confusion, and we avoid ambiguity by
requiring 0o as the prefix and now also disallowing leading zeroes in integers.

For consistency and because it looks weird, we disallow it for integer types too (e.g. u0123).

Fixes #11963
Fixes #12417

@tiehuis
Copy link
Member

tiehuis commented Aug 13, 2022

As an aside, the current implementation of parseInt and parseFloat in std accept leading zeroes. May want to do a follow-up to this to modify their behavior if we are wanting consistency.

pub const CSBRK = 08005;
pub const CSBRK = 0x8005;
Copy link
Contributor

@marler8997 marler8997 Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brings me back. I'll share the story for those interested. At my old job we were sometimes getting weird file permissions. It was a very large code base (millions of LOC in hundreds of thousands of files) and was hard to reproduce. Various people had spent days trying to reproduce/debug this issue but it's difficult when there's so much code to sift through. I thought it could be some sort of uninitialized value issue but I was able to determine that each time it happened the permission bits were consistent. After looking at the bit pattern of the permissions, it dawned on me that a user probably entered a hex integer literal when it was meant to be octal (i.e. they did something like 0x755 instead of 0755). I took the bit pattern and decoded it into a hex integer literal and used grep to search our repositories (probably took 5 or 10 minutes) and sure enough, I got a hit for a set of permission bits being passed to an open call! Changed that 0x to an 0 and problem solved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for sharing! That really sounds like a pain. So much time spent for such a simple fix. Luckily with Zig this should never happen again!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had exact this issue myself a couple of weeks ago in my zig codebase, and thought about suggesting this just a few days ago. @r00ster91 you got it , thanks.

@wooster0
Copy link
Contributor Author

@tiehuis good point! I opened #12446.

Copy link
Member

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.zig:1:11: error: expected expression, found 'invalid bytes'
const a = 00123;
          ^~~~~
a.zig:1:16: note: invalid byte: ';'
const a = 00123;
               ^

.. is a terrible compile error. This either needs a custom error in Ast.zig or, preferably an error in AstGen.zig since this doesn't prevent us from parsing properly.

@wooster0
Copy link
Contributor Author

I agree. I thought it was very confusing too but then I noticed that we actually already have this problem with many other, similar cases too:

pub fn main() void {
    _ = 1_;
}
x.zig:4:9: error: expected expression, found 'invalid bytes'
    _ = 1_;
        ^~
x.zig:4:11: note: invalid byte: ';'
    _ = 1_;
          ^
pub fn main() void {
    _ = 1__1;
}
x.zig:4:9: error: expected expression, found 'invalid bytes'
    _ = 1__1;
        ^~
x.zig:4:11: note: invalid byte: '_'
    _ = 1__1;
          ^

I think it is very much part of the lump you will find when you search for "underscore_placement" on https://github.com/ziglang/zig/find/master

As you can see, they all expect the // :2:21: error: expected expression, found 'invalid bytes' error too, which I also find could probably be improved.

I think one simple thing we could do is use a more specific result.tag than .invalid? Currently we use .invalid for everything but we could use more specific ones. Is that an option?
I feel like this might be better suited as a follow-up PR.

@wooster0
Copy link
Contributor Author

wooster0 commented Aug 14, 2022

Besides, do all these "invalid_underscore_placement_in_"-named tests even belong in zig/test/cases/compile_errors? These tests seem like very simple tests that should just be part of tests in tokenizer.zig (and probably most if not all of them already are).

@Vexu
Copy link
Member

Vexu commented Aug 14, 2022

I agree. I thought it was very confusing too but then I noticed that we actually already have this problem with many other, similar cases too:

Yes, and I'm about to open an issue for this.

I think one simple thing we could do is use a more specific result.tag than .invalid? Currently we use .invalid for everything but we could use more specific ones. Is that an option?

Most uses of .invalid are related to literals, it would be better to just validate them at a later stage than the tokenizer.

I feel like this might be better suited as a follow-up PR.

Fixing the previous things sure, but I'd prefer we start this new error off properly.

This makes `0123` and `u0123` etc. illegal.

I'm now confident that this is a good change because
I actually caught two C header translation mistakes in `haiku.zig` with this.
Clearly, `0123` being octal in C (TIL) can cause confusion, and we make this easier to read by
requiring `0o` as the prefix and now also disallowing leading zeroes in integers.

For consistency and because it looks weird, we disallow it for integer types too (e.g. `u0123`).

Fixes ziglang#11963
Fixes ziglang#12417
@Vexu Vexu merged commit 4055e60 into ziglang:master Aug 18, 2022
@nektro
Copy link
Contributor

nektro commented Aug 19, 2022

this broke translate-c

@wooster0
Copy link
Contributor Author

How did it break? Any test case? Can you elaborate?

@nektro
Copy link
Contributor

nektro commented Aug 19, 2022

/run/media/meghan/dev/mount.ufs/zig-cache/o/9aeca74be3315ca7b4accb89a5db24de/cimport.zig:1648:33: error: integer literal '00' has leading zero
pub const O_RDONLY = @as(c_int, 00);
                                ^~

@wooster0
Copy link
Contributor Author

What is the C version of that?

@nektro
Copy link
Contributor

nektro commented Aug 19, 2022

https://github.com/ziglang/zig/blob/master/lib/libc/musl/include/fcntl.h#L44

#define O_RDONLY  00

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arbitrary bit-width integer types allow leading zeros compiler accepting leading zeroes
6 participants