Skip to content

Conversation

@xtexx
Copy link
Contributor

@xtexx xtexx commented Jan 18, 2025

Simliarly to shl_with_overflow, we first SHL/SAL the integer, then SHR/SAR it back to compare if overflow happens.
If overflow happened, set result to the upper limit to make it saturating.

Theoretically, if the left shifting instruction is lowered as a single SHL/SAL opcode, and the left operand fits into a register (so no truncation is needed), the CF flag can be used to check for overflow. However the optimization is not implemented right now (for my laziness).

Bug: #17645

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from c87f153 to ebace39 Compare January 18, 2025 14:08
@xtexx xtexx changed the title x86_64: Implement integer saturing left shifting codegen x86_64: Implement integer saturating left shifting codegen Jan 18, 2025
@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from ebace39 to ec2df79 Compare January 18, 2025 15:21
@xtexx
Copy link
Contributor Author

xtexx commented Jan 19, 2025

What's the CI error. It seems that I didn't touch that region of code.

@andrewrk andrewrk requested a review from jacobly0 January 19, 2025 03:29
@mlugg
Copy link
Member

mlugg commented Jan 19, 2025

Don't mind that failure -- it's an inconsistent issue which is being debugged as I type this. Once it's fixed, a rebase should solve that error.

@jacobly0
Copy link
Member

jacobly0 commented Jan 19, 2025

I don't like that this introduces a miscomp, I would rather the backend error on unsupported types rather than silently produce incorrect code.

@xtexx
Copy link
Contributor Author

xtexx commented Jan 19, 2025

@jacobly0 which unsupported types? there is already a compiler error when unsupported type is hit.

https://github.com/ziglang/zig/blob/ec2df79cbf25e53c79d53dd6d461bd9d7457bd85/src/arch/x86_64/CodeGen.zig#L12944-L12946

@jacobly0
Copy link
Member

$ cat check.zig
const std = @import("std");

fn shlSat(x: anytype, y: std.math.Log2Int(@TypeOf(x))) @TypeOf(x) {
    return x <<| y;
}

fn testType(comptime T: type) bool {
    var ok = true;
    comptime var rhs: std.math.Log2Int(T) = 0;
    inline while (true) : (rhs += 1) {
        comptime var lhs: T = std.math.minInt(T);
        inline while (true) : (lhs += 1) {
            ok = shlSat(lhs, rhs) == lhs <<| rhs and ok;
            if (lhs == std.math.maxInt(T)) break;
        }
        if (rhs == @bitSizeOf(T) - 1) break;
    }
    return ok;
}

pub fn main() void {
    var ok = true;
    ok = testType(i2) and ok;
    ok = testType(u2) and ok;
    ok = testType(i3) and ok;
    ok = testType(u3) and ok;
    ok = testType(i4) and ok;
    ok = testType(u4) and ok;
    std.debug.print("{s}\n", .{if (ok) "ok" else "bad"});
}
$ zig run -fllvm check.zig
ok
$ zig run -fno-llvm check.zig
bad

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from ec2df79 to 9494329 Compare January 20, 2025 01:53
@xtexx
Copy link
Contributor Author

xtexx commented Jan 20, 2025

@jacobly0 ahh! Sorry for my mistake and thanks for your careful review!

I forgot that negative values need the minimum value to saturate and it seems that the saturating_arithmetic.zig behavior test is not strong enough. The force-push above should solve the error and the check you have given is passing successfully.

However, x86 does not have a saturating shifting instruction so we have to emit up to two shifting and two conditional branches, totally ~10 MIR instructions, I doubt if the shl_sat grammar and IR is useful, as it seems to be a pretty rare case.

Copy link
Member

@jacobly0 jacobly0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good enough to get in the next release, if I don't manage to finish the rewrite before then.

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch 2 times, most recently from 0f90ddb to 0d040ec Compare January 20, 2025 06:56
@jacobly0
Copy link
Member

While rewriting comparisons, I have added Temp.cmpInts in #22637. If you want, you can see other uses in the file for how to integrate it with other legacy code. It handles turning any passed in condition into a functional result flag.

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from 0d040ec to 7206677 Compare March 1, 2025 05:58
@xtexx xtexx requested a review from jacobly0 March 1, 2025 06:00
Copy link
Member

@jacobly0 jacobly0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A missing instruction is causing a miscomp of:

const std = @import("std");
var lhs: i256 = -0x53d4148cee74ea43477a65b3daa7b8fdadcbf4508e793f4af113b8d8da5a7eb6;
var rhs: u8 = 0x91;
pub fn main() void {
    std.debug.print("{} <<| {} == {}\n", .{ lhs, rhs, lhs <<| rhs });
}
$ zig run -fllvm repro.zig
-37916679844576018420322620862767642147218730393747648456860519430618437287606 <<| 145 == -57896044618658097711785492504343953926634992332820282019728792003956564819968
$ zig run -fno-llvm repro.zig
-37916679844576018420322620862767642147218730393747648456860519430618437287606 <<| 145 == 57896044618658097711785492504343953926634992332820282019728792003956564819967

Fixed by:

@@ -88544,7 +88548,7 @@ fn genShiftBinOpMir(
 ) !void {
     const pt = self.pt;
     const zcu = pt.zcu;
-    const abi_size: u32 = @intCast(lhs_ty.abiSize(zcu));
+    const abi_size: u31 = @intCast(lhs_ty.abiSize(zcu));
     const shift_abi_size: u32 = @intCast(rhs_ty.abiSize(zcu));
     try self.spillEflagsIfOccupied();
 
@@ -88728,7 +88732,17 @@ fn genShiftBinOpMir(
                 .immediate => {},
                 else => self.performReloc(skip),
             }
-        }
+        } else try self.asmRegisterMemory(.{ ._, .mov }, temp_regs[2].to64(), .{
+            .base = .{ .frame = lhs_mcv.load_frame.index },
+            .mod = .{ .rm = .{
+                .size = .qword,
+                .disp = switch (tag[0]) {
+                    ._l => lhs_mcv.load_frame.off,
+                    ._r => lhs_mcv.load_frame.off + abi_size - 8,
+                    else => unreachable,
+                },
+            } },
+        });
         switch (rhs_mcv) {
             .immediate => |shift_imm| try self.asmRegisterImmediate(
                 tag,

Also, there is no handling for shift amounts larger than the bit width:

const std = @import("std");
var lhs: u8 = 0xab;
var rhs: u8 = 32;
pub fn main() void {
    std.debug.print("{} <<| {} == {}\n", .{ lhs, rhs, lhs <<| rhs });
}
$ zig run -fllvm repro.zig
171 <<| 32 == 255
$ zig run -fno-llvm repro.zig
171 <<| 32 == 171

Normally I would suggest looking at the llvm backend lowering, but even that miscomps. As usual, adding a check for this case to emit an error instead of a miscomp would make this PR mergeable.

@xtexx
Copy link
Contributor Author

xtexx commented Mar 1, 2025

Applied you fix for the missing instruction. Thank you!
While RHS may not be an immediate/literal constant, how can the miscomp case be checked at compile-time?

@jacobly0
Copy link
Member

jacobly0 commented Mar 1, 2025

Based on x86_64 semantics, you have only implemented the following cases:

  • lhs_bits <= 32 and rhs_bits <= 5
  • lhs_bits > 32 and lhs_bits <= 64 and rhs_bits <= 6
  • rhs_bits <= log2(lhs_bits)

@xtexx
Copy link
Contributor Author

xtexx commented Mar 1, 2025

The genShiftBinOpMir diff mentioned above sometimes fails to encode:

error(x86_64_encoder): no encoding found for: none mov m128 imm8s none none

@jacobly0
Copy link
Member

jacobly0 commented Mar 1, 2025

That's unrelated to my diff, the instruction I added is neither .size = .xword nor contains an immediate.

@xtexx
Copy link
Contributor Author

xtexx commented Mar 1, 2025

oops, sorry. It is my mistake.

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch 2 times, most recently from 788831b to bf4615d Compare March 1, 2025 14:40
@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from bf4615d to 4051039 Compare March 1, 2025 15:31
@jacobly0
Copy link
Member

jacobly0 commented Mar 2, 2025

Just a heads up, given all of the miscomps I have already found, this is going to need more behavior test coverage than just enabling an already existing test. At least regression tests covering the examples that have been mentioned should be added near the enabled behavior test.

@xtexx
Copy link
Contributor Author

xtexx commented Mar 2, 2025

Just out of curiosity, how do you find some many cases (which looked like random to me) for testing?

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from 4051039 to 6b824df Compare March 2, 2025 01:07
@jacobly0
Copy link
Member

jacobly0 commented Mar 2, 2025

The bits are just copied from other random values, not very relevant to the miscomps. I just tried various combinations of type widths, signedness, and shift amounts that either overflow or not.

Copy link
Member

@jacobly0 jacobly0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks good, I know some people have been waiting for this operation for a while.

@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from 6b824df to c2692a7 Compare March 2, 2025 02:30
xtexx and others added 2 commits March 2, 2025 10:30
Simliarly to shl_with_overflow, we first SHL/SAL the integer, then
SHR/SAR it back to compare if overflow happens.
If overflow happened, set result to the upper limit to make it saturating.

Bug: ziglang#17645
Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
Signed-off-by: Bingwu Zhang <xtex@aosc.io>
Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
Signed-off-by: Bingwu Zhang <xtex@aosc.io>
@xtexx xtexx force-pushed the x86-64/shl-sat-int branch from c2692a7 to 1da909a Compare March 2, 2025 02:30
@jacobly0 jacobly0 enabled auto-merge March 2, 2025 02:32
@jacobly0 jacobly0 merged commit a6525c1 into ziglang:master Mar 2, 2025
9 checks passed
@xtexx xtexx deleted the x86-64/shl-sat-int branch March 15, 2025 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants