SHA-256: Initial constraint optimizations by rose2221 · Pull Request #247 · worldfnd/provekit

rose2221 · 2025-12-25T01:26:50Z

Summary

Optimizes SHA256 compression constraint generation through byte-level operations and fused constraints.

Results

Metric	Before	After	Change
Constraints	432,976	152,699	-64.73%
Constraints (per SHA256 compression call)	39,293	20,907	-46.79%

SHA256 Compression – R1CS Cost Breakdown

Component	Constraints	Witnesses
Direct	3,600	13,364
AND + XOR	144,389	212,998
RANGE	4,651	5,145
Total	152,640	231,507

Marginal SHA256 Call

Component	Constraints / call	Witnesses / call
Direct	3,600	13,364
AND + XOR	13,312	16,384
RANGE	3,995	4,169
Total per SHA256	20,907	33,917

Key Optimizations

1. New Type System for Byte-Level Operations

Introduced U8 and U32 wrapper types to replace raw usize witness indices
U32 represents a 32-bit word as 4 little-endian U8 bytes
Each U8 tracks whether it has been range-checked via a range_checked flag
Enables byte-level operations that avoid expensive 32-bit digital decomposition
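
As a rough illustration of the wrapper types described above, the sketch below models U8 and U32 as thin structs over witness indices. The field names (`witness`, `bytes`) and the `rotr_bytes` helper are assumptions made for this example, not the PR's actual definitions. It shows why a rotation by a whole number of bytes is free: only the indices are permuted.

```rust
/// Hypothetical stand-in for the PR's U8 wrapper; field names are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U8 {
    /// Index of the witness that holds this byte.
    pub witness: usize,
    /// True once the byte is known to lie in [0, 255].
    pub range_checked: bool,
}

/// Hypothetical stand-in for the PR's U32 wrapper.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U32 {
    /// Little-endian: bytes[0] is the least significant byte.
    pub bytes: [U8; 4],
}

impl U32 {
    /// Rotate right by `n` whole bytes. Only witness indices move, so this
    /// adds no constraints and no new witnesses.
    pub fn rotr_bytes(self, n: usize) -> U32 {
        let mut out = self.bytes;
        for i in 0..4 {
            out[i] = self.bytes[(i + n) % 4];
        }
        U32 { bytes: out }
    }
}
```

Because the range_checked flag travels with each byte, a reordered word does not need to be range-checked again.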

2. ROTR with Fused Constraints

Implemented ROTR in three steps: free byte reordering, byte partitioning into lo and hi via a new BytePartition witness builder, and a single fused constraint per output byte: result * 2^k + lo = byte + lo_next * 256
Here lo is the low k bits of the current byte and lo_next is the low k bits of the next more significant byte, wrapping around at the top byte.
This costs 4 constraints per ROTR (one per byte) instead of multiple decomposition constraints.
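
The fused relation can be checked on plain integers. The sketch below is not the PR's circuit code: it works on concrete byte values, uses hypothetical helper names (`partition`, `rotr_bytes_checked`), and assumes that lo_next is the low k bits of the next more significant byte. It asserts the fused relation for every output byte.

```rust
/// Split a byte at bit `k` (1..=7) so that byte = lo + hi * 2^k.
fn partition(byte: u8, k: u32) -> (u32, u32) {
    let b = byte as u32;
    (b & ((1u32 << k) - 1), b >> k)
}

/// Rotate a little-endian 4-byte word right by `n` bits, asserting the fused
/// relation from the text for each output byte.
pub fn rotr_bytes_checked(bytes: [u8; 4], n: u32) -> [u8; 4] {
    let byte_shift = (n / 8) as usize;
    let k = n % 8;

    // Whole-byte rotation is a free reordering.
    let mut r = [0u8; 4];
    for i in 0..4 {
        r[i] = bytes[(i + byte_shift) % 4];
    }
    if k == 0 {
        return r;
    }

    // Low k bits of every reordered byte.
    let lo: Vec<u32> = r.iter().map(|&b| partition(b, k).0).collect();

    let mut out = [0u8; 4];
    for i in 0..4 {
        let hi = partition(r[i], k).1;
        let lo_next = lo[(i + 1) % 4]; // wraps around at the top byte
        let result = hi + (lo_next << (8 - k));
        // Fused constraint: result * 2^k + lo = byte + lo_next * 256
        assert_eq!(result * (1u32 << k) + lo[i], r[i] as u32 + lo_next * 256);
        out[i] = result as u8;
    }
    out
}
```

Comparing the output with `u32::rotate_right` for every rotation amount is a quick way to confirm that the relation pins down the correct result.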

3. SHR Implementation with Fused Constraints

Same approach as ROTR, except that the vacated high bits are zero-filled instead of wrapped around.
Byte-level shift is free (reordering + zero padding)
Uses same fused constraint approach for intra-byte bit shifting
MSB byte has special handling: result * 2^k + lo = byte (no next byte term)
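
A minimal sketch of the SHR variant on plain integers follows, under the same assumptions as the ROTR description (little-endian bytes, lo_next taken from the next more significant byte). The function name `shr_bytes_checked` is hypothetical. For the most significant byte, lo_next is zero, so the asserted relation reduces to result * 2^k + lo = byte.

```rust
/// Shift a little-endian 4-byte word right by `n` bits (n < 32), asserting
/// the fused relation for each output byte.
pub fn shr_bytes_checked(bytes: [u8; 4], n: u32) -> [u8; 4] {
    assert!(n < 32);
    let byte_shift = (n / 8) as usize;
    let k = n % 8;

    // Whole-byte shift: reorder and zero-pad the top bytes (free).
    let mut s = [0u8; 4];
    for i in 0..4 - byte_shift {
        s[i] = bytes[i + byte_shift];
    }
    if k == 0 {
        return s;
    }

    let mask = (1u32 << k) - 1;
    let mut out = [0u8; 4];
    for i in 0..4 {
        let byte = s[i] as u32;
        let lo = byte & mask;
        // The top byte has no neighbour, so zeros are shifted in.
        let lo_next = if i == 3 { 0 } else { s[i + 1] as u32 & mask };
        let result = (byte >> k) + (lo_next << (8 - k));
        // With lo_next = 0 this is exactly: result * 2^k + lo = byte
        assert_eq!(result * (1u32 << k) + lo, byte + lo_next * 256);
        out[i] = result as u8;
    }
    out
}
```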

4. Range Check Optimization

Only range-checks lo during byte partitioning. Soundness is preserved because the partitioned high bits (hi) contribute to a derived output byte via fused constraints, and that output byte is subsequently constrained to [0,255] by downstream byte-level lookup tables (e.g. AND/XOR). This reduces the number of explicit range checks in this context without sacrificing soundness.

5. New BytePartition WitnessBuilder

Added dedicated witness builder variant for splitting an 8-bit value at a bit boundary.
Computes lo (lower k bits) and hi (upper 8-k bits) such that: x = lo + hi * 2^k
Produces 2 witness values per invocation and is reused across ROTR and SHR operations.
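
The witness computation itself is small. The following is a sketch of what such a builder might compute for one byte; the function name and signature are assumptions for this example, not the PR's WitnessBuilder API.

```rust
/// Split an 8-bit value at bit `k` (1..=7), returning (lo, hi) such that
/// x = lo + hi * 2^k, with lo < 2^k and hi < 2^(8 - k).
pub fn byte_partition(x: u8, k: u32) -> (u8, u8) {
    assert!((1..8).contains(&k), "split point must lie inside the byte");
    let lo = x & ((1u8 << k) - 1); // lower k bits
    let hi = x >> k; // upper 8 - k bits
    (lo, hi)
}
```

For example, `byte_partition(0b1011_0110, 3)` returns `(0b110, 0b10110)`.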

6. Chained U32 Addition with Fused Constraints

Replaced chained pairwise additions in T1 computation (h + Σ₁(e) + Ch(e,f,g) + K[i] + W[i]) with a single variadic addition using the fused constraint: a + b + c + d + e = result + carry * 2^32
Reduces to 1 constraint + 2 witnesses, with carry range-checks sized precisely based on input count.
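
To make the carry sizing concrete, here is a small sketch on plain integers. The names `multi_add_u32` and `carry_bits` are hypothetical. For n inputs below 2^32 the carry is at most n - 1, which gives the number of bits the carry range check needs.

```rust
/// Variadic 32-bit addition: returns (result, carry) such that
/// sum(inputs) = result + carry * 2^32.
pub fn multi_add_u32(inputs: &[u32]) -> (u32, u64) {
    let sum: u64 = inputs.iter().map(|&x| x as u64).sum();
    let result = sum as u32; // low 32 bits
    let carry = sum >> 32; // everything above bit 31
    // The single fused relation from the text.
    assert_eq!(sum, result as u64 + carry * (1u64 << 32));
    (result, carry)
}

/// Bit width needed to range-check the carry of an n-input addition.
/// Each input is at most 2^32 - 1, so the carry is at most n - 1.
pub fn carry_bits(n: usize) -> u32 {
    assert!(n >= 1);
    let max_carry = (n - 1) as u64;
    64 - max_carry.leading_zeros()
}
```

With five inputs, as in the T1 sum, the carry is at most 4 and fits in 3 bits. With a single input the carry is always zero and needs no range check at all.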

7. Byte-Level Binary Operations (AND/XOR)

Kept SHA256 AND/XOR operations at the 8-bit level by introducing and_ops_byte and xor_ops_byte tracking and an is_byte_level flag in add_binop_constraints. This avoids unnecessary 32-bit decomposition while preserving the full decomposition path for general Noir blackbox ops.

8. Range Check & LogUp Optimizations

Introduced fused constraints for LogUp inverse computation, replacing the two-step denominator + inverse verification.
- Before: compute the denominator (X - c·v) as a witness, then verify denominator * inverse = 1 (2 constraints, 2 witnesses).
- After: a single fused constraint, (X - c·v) * inverse = 1 (1 constraint, 1 witness).
Added a new WitnessBuilder::LogUpInverse variant that computes the inverse directly during batch inversion, eliminating the intermediate denominator witness.
Fused the log-derivative multiset equality check into a single constraint.
- Before: Separate constraints for table sum, witness sum, and equality check (3 constraints).
- After: Single constraint: (Σ table_terms - Σ witness_terms) = 0. Removes 2 constraints and 2 intermediate sum witnesses per range-check lookup.
Extended batch inversion support to include LogUpInverse.
- LogUpInverse is deferred and batched alongside existing inverse operations.
- Denominators (X - c·v) are computed inline during batch inversion.
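
The sketch below illustrates the two ideas in this section over a toy prime field (2^61 - 1, chosen only so the arithmetic fits in u128; it is not the field the prover uses). All function names are hypothetical, and setting c = 1 in the balance check is an assumption made for the example. Denominators are computed inline and inverted in one batch, and the multiset check is a single fused sum that must equal zero.

```rust
const P: u64 = (1 << 61) - 1; // toy Mersenne prime, assumption for this sketch

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

fn inv(a: u64) -> u64 { pow(a, P - 2) } // Fermat inverse, a != 0

/// Montgomery batch inversion: one field inversion for the whole slice.
fn batch_invert(xs: &[u64]) -> Vec<u64> {
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs {
        prefix.push(acc);
        acc = mul(acc, x);
    }
    let mut inv_acc = inv(acc);
    let mut out = vec![0u64; xs.len()];
    for i in (0..xs.len()).rev() {
        out[i] = mul(inv_acc, prefix[i]);
        inv_acc = mul(inv_acc, xs[i]);
    }
    out
}

/// Inverse witnesses for the fused constraint (X - c*v) * inverse = 1.
/// The denominators are computed inline and never stored as witnesses.
pub fn logup_inverses(x: u64, c: u64, values: &[u64]) -> Vec<u64> {
    let denoms: Vec<u64> = values.iter().map(|&v| sub(x, mul(c, v))).collect();
    batch_invert(&denoms)
}

/// The single fused constraint per lookup.
pub fn fused_inverse_holds(x: u64, c: u64, v: u64, inverse: u64) -> bool {
    mul(sub(x, mul(c, v)), inverse) == 1
}

/// Fused multiset check: (sum of table terms) - (sum of witness terms).
/// Table terms are weighted by multiplicity; the result must be 0.
pub fn fused_balance(x: u64, table: &[u64], witness: &[u64]) -> u64 {
    let mut mult = vec![0u64; table.len()];
    for &w in witness {
        let pos = table.iter().position(|&t| t == w).expect("value not in table");
        mult[pos] += 1;
    }
    let t_inv = logup_inverses(x, 1, table);
    let w_inv = logup_inverses(x, 1, witness);
    let mut acc = 0u64;
    for (m, ti) in mult.iter().zip(&t_inv) {
        acc = add(acc, mul(*m, *ti));
    }
    for wi in &w_inv {
        acc = sub(acc, *wi); // negated coefficient instead of a second sum
    }
    acc
}
```

Folding the witness terms in with negated coefficients is what lets one linear combination replace the separate table sum, witness sum, and equality check.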

8. BinOp LogUp Constraint Fusion (binops.rs)

Replaced separate binop-side and table-side summations with a single fused sum using negated coefficients.
Enforces equality via one constraint: (Σ binop_terms − Σ table_terms) = 0.
Removes intermediate sum witnesses and reduces constraints per AND/XOR lookup.

9. Combined AND/XOR Lookup Table

Replaced separate lookup tables for AND and XOR operations with a single combined table.
Before: Two independent lookup tables (~196K constraints each).
After: One combined table using a 4-term encoding: sz − (lhs + rs·rhs + rs²·and + rs³·xor). Eliminates one full lookup table.
Savings: ~147K constraints.
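
As an illustration of the 4-term encoding, the sketch below evaluates it over a toy prime field (2^61 - 1, an assumption for this example only). The function names are hypothetical; sz and rs are taken from the formula above and treated here as ordinary field elements. One table row covers both operations, so the table has 2^16 rows, one per (lhs, rhs) byte pair.

```rust
const P: u64 = (1 << 61) - 1; // toy prime field, assumption for this sketch

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// lhs + rs*rhs + rs^2*and + rs^3*xor, evaluated Horner-style.
pub fn encode(rs: u64, lhs: u8, rhs: u8) -> u64 {
    let and = (lhs & rhs) as u64;
    let xor = (lhs ^ rhs) as u64;
    let mut acc = xor;
    acc = add(mul(acc, rs), and);
    acc = add(mul(acc, rs), rhs as u64);
    acc = add(mul(acc, rs), lhs as u64);
    acc
}

/// Denominator of one combined-table term: sz - (encoded row).
pub fn denominator(sz: u64, rs: u64, lhs: u8, rhs: u8) -> u64 {
    sub(sz, encode(rs, lhs, rhs))
}

/// Row index of an operand pair in the combined table.
pub fn table_index(lhs: u8, rhs: u8) -> usize {
    ((lhs as usize) << 8) + rhs as usize
}
```

With rs = 256 the encoding is simply the four bytes packed side by side, which makes it easy to inspect by hand: `encode(256, 0xAB, 0xCD)` is `0x6689CDAB`, since 0xAB & 0xCD = 0x89 and 0xAB ^ 0xCD = 0x66.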

10. Byte-Level Binary Operations (SHA256)

Avoided 32-bit decomposition for SHA256 AND/XOR by operating directly on U8 byte types.
Before: 32-bit AND/XOR required decomposition into 4 bytes followed by 4 separate 8-bit lookups.
After: SHA256 binops are executed directly at the byte level using U8, skipping decomposition entirely.
Eliminates decomposition constraints and intermediate witnesses per binary operation.

11. Inlined T1 / T2 Computation in SHA256 Rounds

Removed intermediate T1 and T2 values in SHA256 round computation.
Before:

T1 = h + Σ₁(e) + Ch(e,f,g) + K[i] + W[i]
T2 = Σ₀(a) + Maj(a,b,c)
new_e = d + T1
new_a = T1 + T2

After: new_e and new_a are computed directly with all terms inlined.
Savings: 12 witnesses per round (result, carry, and unpacked bytes for T1/T2) × 64 rounds = 768 witnesses per SHA256 compression.
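
The equivalence of the two formulations is easy to check on plain integers. The sketch below uses the standard SHA-256 round functions; the function names `round_with_temps` and `round_inlined` are hypothetical and only illustrate the arithmetic, not the PR's constraint code.

```rust
fn big_sigma0(a: u32) -> u32 { a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22) }
fn big_sigma1(e: u32) -> u32 { e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25) }
fn ch(e: u32, f: u32, g: u32) -> u32 { (e & f) ^ (!e & g) }
fn maj(a: u32, b: u32, c: u32) -> u32 { (a & b) ^ (a & c) ^ (b & c) }

/// Before: T1 and T2 are materialised, each reduced mod 2^32 on its own.
/// Returns (new_a, new_e).
pub fn round_with_temps(s: [u32; 8], k: u32, w: u32) -> (u32, u32) {
    let [a, b, c, d, e, f, g, h] = s;
    let t1 = h
        .wrapping_add(big_sigma1(e))
        .wrapping_add(ch(e, f, g))
        .wrapping_add(k)
        .wrapping_add(w);
    let t2 = big_sigma0(a).wrapping_add(maj(a, b, c));
    (t1.wrapping_add(t2), d.wrapping_add(t1))
}

/// After: every term is summed once and reduced mod 2^32 once per output.
/// Returns (new_a, new_e).
pub fn round_inlined(s: [u32; 8], k: u32, w: u32) -> (u32, u32) {
    let [a, b, c, d, e, f, g, h] = s;
    let shared = [h, big_sigma1(e), ch(e, f, g), k, w];
    let shared_sum: u64 = shared.iter().map(|&x| x as u64).sum();
    // new_e is a 6-input addition, new_a a 7-input addition.
    let new_e = (d as u64 + shared_sum) as u32;
    let new_a = (shared_sum + big_sigma0(a) as u64 + maj(a, b, c) as u64) as u32;
    (new_a, new_e)
}
```

Both versions agree because addition mod 2^32 is associative. The inlined form trades two wider additions for the removal of the intermediate results and their carries.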

12. Inline Table Entry Inverse (CombinedTableEntryInverse)

Optimized lookup inverse handling by inlining the denominator directly into the inverse constraint.
Before: Create denominator witness, then verify denominator × inverse = 1 (2 witnesses per entry).
After: Single fused constraint: (sz − lhs − rs·rhs − rs²·and − rs³·xor) × inv = 1. Eliminates the intermediate denominator witness.
Savings: 65,536 witnesses (one per table entry).

13. Pack Cache for U32 Operations

Introduced memoization for U32 packing operations.
Before: Identical U32 byte sequences could be repacked multiple times within a compression.
After: pack_cached() caches pack results keyed by byte indices and reuses them.
Avoids duplicate pack witnesses and constraints across SHA256 rounds.
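
A minimal sketch of such a cache follows. Only the method name pack_cached comes from the description above; the struct, its fields, and the choice of a BTreeMap keyed on all four byte witness indices are assumptions made for this example (an ordered map with a full key keeps iteration deterministic and avoids key collisions).

```rust
use std::collections::BTreeMap;

/// Hypothetical memoization layer for U32 packing.
pub struct PackCache {
    /// Maps the four byte witness indices to the packed-word witness index.
    cache: BTreeMap<[usize; 4], usize>,
    /// Next unused witness index.
    next_witness: usize,
    /// Number of pack witnesses (and pack constraints) actually emitted.
    pub packs_emitted: usize,
}

impl PackCache {
    pub fn new(first_free_witness: usize) -> Self {
        PackCache {
            cache: BTreeMap::new(),
            next_witness: first_free_witness,
            packs_emitted: 0,
        }
    }

    /// Return the witness index of the packed word, allocating a new one
    /// only if this exact byte sequence has not been packed before.
    pub fn pack_cached(&mut self, bytes: [usize; 4]) -> usize {
        if let Some(&w) = self.cache.get(&bytes) {
            return w; // cache hit: no new witness, no new constraint
        }
        let w = self.next_witness;
        self.next_witness += 1;
        self.packs_emitted += 1;
        self.cache.insert(bytes, w);
        w
    }
}
```

Packing the same bytes twice returns the same witness index, while a different byte order counts as a distinct word.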

Bisht13 · 2026-01-16T19:57:43Z

Nit: Remove unused import.


The combined AND/XOR lookup table operates on 8-bit operands. When handling BlackBoxFuncCall::AND/XOR with constant operands, the code was pushing full 32-bit constants directly to the ops vectors instead of decomposing them into 4 bytes first. This caused index out of bounds panics in the witness builder when computing multiplicities: the index was computed as (lhs << 8) + rhs, expecting 8-bit values (max index 65535), but receiving 32-bit values (producing indices like 3389742323). Fix: Decompose constant operands into [u8; 4] bytes and push byte-level constant operations to the lookup table, matching the expected byte-level semantics.

When handling AND/XOR operations where lhs is a constant and rhs is a witness, the code was using the raw 32-bit rhs witness directly instead of decomposing it into bytes. This caused index out of bounds panics in the witness builder when computing multiplicities for the 8-bit lookup table (max index 65535, but receiving indices like 2282366754). Fix: Add explicit handling for the (lhs=constant, rhs=witness) case that decomposes the rhs witness into bytes via add_digital_decomposition before pushing to the binop ops vectors.
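
The two fixes above come down to splitting 32-bit operands into bytes before indexing the 8-bit table. The sketch below shows this for constant operands. The function names are hypothetical and the little-endian byte order is an assumption, chosen to match the U32 layout described in the PR summary.

```rust
/// Index into the 2^16-entry byte table, as described in the commit message:
/// (lhs << 8) + rhs with both operands restricted to 8 bits.
pub fn table_index(lhs: u8, rhs: u8) -> usize {
    ((lhs as usize) << 8) + rhs as usize
}

/// Decompose two 32-bit constants into bytes and return the four table
/// indices for the byte-level lookups, least significant byte first.
pub fn constant_binop_indices(lhs: u32, rhs: u32) -> [usize; 4] {
    let l = lhs.to_le_bytes();
    let r = rhs.to_le_bytes();
    let mut out = [0usize; 4];
    for i in 0..4 {
        out[i] = table_index(l[i], r[i]);
    }
    out
}
```

Taking `u8` parameters makes the out-of-range case unrepresentable: every index is at most 65535 by construction.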


rose2221 marked this pull request as draft December 25, 2025 01:27

rose2221 force-pushed the rs/sha_optimization branch 2 times, most recently from 6fd8fd0 to 5b78f63 Compare January 7, 2026 19:38

rose2221 marked this pull request as ready for review January 7, 2026 19:41

Bisht13 requested changes Jan 16, 2026


Bisht13 approved these changes Jan 16, 2026


Rose Jethani and others added 16 commits January 17, 2026 14:49

feat: streamline SHA256 compression witness handling and constraints

1dd8fff

simplify witness addition and constraint handling in SHA256 compression

1c2da3d

feat: Optimize SHA256 compression with byte-level ops and fused constraint

30a9dbd

fix: Correct comment formatting for fused constraints and message schedule expansion in SHA256 functions

42d20e8

fix: Improve comment formatting for clarity in multi-addition and SHA256 round functions

a430e99

fix: Improve comment formatting for clarity in add_u32_multi_addition function

620603e

feat: simplify Ch and Maj boolean formulas in SHA256 compression

7068cbd

feat: add LogUpInverse witness builder and integrate into batch processing

34e94c6

fix: improve comment formatting for clarity in R1CS solver panic messages

df62e0b

refactor: enhance clarity in binop constraints by restructuring summands and comments

5da5e51

refactor: remove unused not_u32 and not_u8 functions

20b0cce

refactor: rename variables for clarity in range check helper function

0763cd9

perf:refactor binary operation constraints to use combined lookup table

af53038

Address PR review comments: fix non-deterministic HashMap, key collision, skip zero-carry range checks, remove unused variants

df24630

refactor: streamline binary operation handling by removing unused byte-level ops and optimizing witness decomposition

662c7ba

cargo run --release --bin provekit-cli circuit_stats ./target/basic.json

e331ec2

rose2221 force-pushed the rs/sha_optimization branch from a834994 to e331ec2 Compare January 17, 2026 09:58

Bisht13 and others added 4 commits January 18, 2026 19:18

fix: correct non-SHA256 range check count in circuit_stats display

9fea91d

Use range_ops_total from R1CS breakdown instead of ACIR RANGE opcode count when calculating non-SHA256 range checks. The previous code incorrectly used ACIR-level counts which don't match R1CS-level range operations.

refactor: simplify formatting of digital decomposition calls in NoirToR1CSCompiler

e967be7

Bisht13 merged commit f8728db into main Jan 19, 2026
4 of 5 checks passed

Bisht13 deleted the rs/sha_optimization branch January 19, 2026 06:51

dcbuild3r pushed a commit that referenced this pull request May 16, 2026

Merge pull request #247 from worldfnd/rs/sha_optimization

3e1aab9

SHA-256: Initial constraint optimizations


Bisht13 merged 20 commits into main from rs/sha_optimization
