x86_64: Implement integer saturating left shifting codegen #22529
Force-pushed from c87f153 to ebace39.
Force-pushed from ebace39 to ec2df79.
|
What's the CI error? It seems that I didn't touch that region of code. |
|
Don't mind that failure -- it's an inconsistent issue which is being debugged as I type this. Once it's fixed, a rebase should solve that error. |
|
I don't like that this introduces a miscomp; I would rather the backend error on unsupported types than silently produce incorrect code. |
|
@jacobly0 Which unsupported types? There is already a compiler error when an unsupported type is hit. |
|
Force-pushed from ec2df79 to 9494329.
|
@jacobly0 ahh! Sorry for my mistake and thanks for your careful review! I forgot that negative values need to saturate to the minimum value, and it seems that the saturating_arithmetic.zig behavior test is not strong enough. The force-push above should fix the error, and the check you gave now passes. However, x86 does not have a saturating shift instruction, so we have to emit up to two shifts and two conditional branches, about 10 MIR instructions in total. I doubt whether the shl_sat syntax and IR are useful, as it seems to be a pretty rare case. |
jacobly0
left a comment
Looks good enough to get in the next release, if I don't manage to finish the rewrite before then.
Force-pushed from 0f90ddb to 0d040ec.
|
While rewriting comparisons, I have added |
Force-pushed from 0d040ec to 7206677.
jacobly0
left a comment
A missing instruction is causing a miscomp of:
const std = @import("std");
var lhs: i256 = -0x53d4148cee74ea43477a65b3daa7b8fdadcbf4508e793f4af113b8d8da5a7eb6;
var rhs: u8 = 0x91;
pub fn main() void {
    std.debug.print("{} <<| {} == {}\n", .{ lhs, rhs, lhs <<| rhs });
}
$ zig run -fllvm repro.zig
-37916679844576018420322620862767642147218730393747648456860519430618437287606 <<| 145 == -57896044618658097711785492504343953926634992332820282019728792003956564819968
$ zig run -fno-llvm repro.zig
-37916679844576018420322620862767642147218730393747648456860519430618437287606 <<| 145 == 57896044618658097711785492504343953926634992332820282019728792003956564819967
Fixed by:
@@ -88544,7 +88548,7 @@ fn genShiftBinOpMir(
) !void {
const pt = self.pt;
const zcu = pt.zcu;
- const abi_size: u32 = @intCast(lhs_ty.abiSize(zcu));
+ const abi_size: u31 = @intCast(lhs_ty.abiSize(zcu));
const shift_abi_size: u32 = @intCast(rhs_ty.abiSize(zcu));
try self.spillEflagsIfOccupied();
@@ -88728,7 +88732,17 @@ fn genShiftBinOpMir(
.immediate => {},
else => self.performReloc(skip),
}
- }
+ } else try self.asmRegisterMemory(.{ ._, .mov }, temp_regs[2].to64(), .{
+ .base = .{ .frame = lhs_mcv.load_frame.index },
+ .mod = .{ .rm = .{
+ .size = .qword,
+ .disp = switch (tag[0]) {
+ ._l => lhs_mcv.load_frame.off,
+ ._r => lhs_mcv.load_frame.off + abi_size - 8,
+ else => unreachable,
+ },
+ } },
+ });
switch (rhs_mcv) {
.immediate => |shift_imm| try self.asmRegisterImmediate(
tag,
Also, there is no handling for shift amounts larger than the bit width:
const std = @import("std");
var lhs: u8 = 0xab;
var rhs: u8 = 32;
pub fn main() void {
    std.debug.print("{} <<| {} == {}\n", .{ lhs, rhs, lhs <<| rhs });
}
$ zig run -fllvm repro.zig
171 <<| 32 == 255
$ zig run -fno-llvm repro.zig
171 <<| 32 == 171
Normally I would suggest looking at the llvm backend lowering, but even that miscomps. As usual, adding a check for this case to emit an error instead of a miscomp would make this PR mergeable.
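For reference, the oversized-shift case can be sketched in plain Zig as below. This is only an illustration, not the backend's lowering: `shlSatU8` is a hypothetical helper, and it assumes that `<<|` should yield the mathematically exact result clamped to the type's range, so a nonzero `u8` shifted by 8 or more saturates to 255 while 0 stays 0.

```zig
const std = @import("std");

/// Hypothetical reference helper (not the x86_64 backend code).
/// Saturating left shift for u8 that also accepts shift amounts >= 8.
fn shlSatU8(lhs: u8, rhs: u8) u8 {
    // Zero shifted by any amount is still zero.
    if (lhs == 0) return 0;
    // A nonzero value shifted by the full width or more always overflows.
    if (rhs >= 8) return std.math.maxInt(u8);
    const amt: u3 = @intCast(rhs);
    // `<<` discards the bits shifted out; shifting back shows whether any were lost.
    const shifted = lhs << amt;
    if ((shifted >> amt) == lhs) return shifted;
    return std.math.maxInt(u8);
}

test "oversized shift amounts saturate" {
    // The values from the repro above: 171 <<| 32 is expected to be 255.
    try std.testing.expectEqual(@as(u8, 255), shlSatU8(0xab, 32));
    try std.testing.expectEqual(@as(u8, 0), shlSatU8(0, 32));
}
```

The explicit `rhs >= 8` check is needed because a plain `<<` only accepts a `u3` shift amount for `u8`, so the wide amount has to be handled before narrowing it.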
|
Applied your fix for the missing instruction. Thank you! |
|
Based on x86_64 semantics, you have only implemented the following cases:
|
|
The genShiftBinOpMir diff mentioned above sometimes fails to encode: |
|
That's unrelated to my diff, the instruction I added is neither |
|
oops, sorry. It is my mistake. |
Force-pushed from 788831b to bf4615d.
Force-pushed from bf4615d to 4051039.
|
Just a heads up, given all of the miscomps I have already found, this is going to need more behavior test coverage than just enabling an already existing test. At least regression tests covering the examples that have been mentioned should be added near the enabled behavior test. |
|
Just out of curiosity, how do you find so many cases (which looked random to me) for testing? |
Force-pushed from 4051039 to 6b824df.
|
The bits are just copied from other random values, not very relevant to the miscomps. I just tried various combinations of type widths, signedness, and shift amounts that either overflow or not. |
jacobly0
left a comment
Otherwise looks good, I know some people have been waiting for this operation for a while.
Force-pushed from 6b824df to c2692a7.
Similarly to shl_with_overflow, we first SHL/SAL the integer, then SHR/SAR it back to check whether overflow happened. If overflow happened, set result to the upper limit to make it saturating.
Bug: ziglang#17645
Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
Signed-off-by: Bingwu Zhang <xtex@aosc.io>

Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
Signed-off-by: Bingwu Zhang <xtex@aosc.io>
Force-pushed from c2692a7 to 1da909a.
Similarly to shl_with_overflow, we first SHL/SAL the integer, then SHR/SAR it back to check whether overflow happened.
If overflow happened, set the result to the saturation limit (the minimum value for a negative operand, the maximum otherwise).
Theoretically, if the left shift is lowered as a single SHL/SAL opcode and the left operand fits into a register (so no truncation is needed), the CF flag can be used to check for overflow. However, this optimization is not implemented right now (out of laziness on my part).
Bug: #17645
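
The shift-and-shift-back check described above can be sketched in Zig roughly as follows. This is a minimal illustration for `i32` only, not the MIR the backend emits: `shlSatI32` is a hypothetical helper, and taking the shift amount as `u5` means amounts of 32 or more are out of scope here. Per the review discussion, a negative operand saturates to the minimum value rather than the maximum.

```zig
const std = @import("std");

/// Hypothetical reference helper mirroring the shift-and-shift-back idea;
/// not the code generated by the x86_64 backend.
fn shlSatI32(lhs: i32, amt: u5) i32 {
    // `<<` keeps only the low 32 bits, much like SHL/SAL on a 32-bit register.
    const shifted = lhs << amt;
    // `>>` on a signed integer is an arithmetic shift, much like SAR.
    // If shifting back restores the operand, no significant bits were lost.
    if ((shifted >> amt) == lhs) return shifted;
    // Overflow: saturate towards the sign of the original operand.
    if (lhs < 0) return std.math.minInt(i32);
    return std.math.maxInt(i32);
}

test "signed saturating shift" {
    try std.testing.expectEqual(@as(i32, 20), shlSatI32(5, 2));
    try std.testing.expectEqual(@as(i32, std.math.maxInt(i32)), shlSatI32(1, 31));
    try std.testing.expectEqual(@as(i32, std.math.minInt(i32)), shlSatI32(-3, 31));
}
```

Comparing against the original operand after the arithmetic shift back covers both signs with one check, which is why only the final choice of limit needs to branch on the sign.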