Reduce allocations in solve_witness_vec#176
Conversation
CodSpeed WallTime Performance ReportMerging #176 will not alter performanceComparing
|
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes memory allocation in the prover by refactoring solve_witness_vec and related digital decomposition functions. The changes reduce allocations by 38% in the prover (from over 5 million) and improve decompose_into_digits performance by 2x.
Key changes:
- Moved
decompose_into_digitsfrom local modules to common library for reuse - Replaced eager vector collection with lazy iterators in binary operations
- Optimized bit manipulation using
BitIteratorLEinstead of vector allocations
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| provekit/r1cs-compiler/src/range_check.rs | Removed unnecessary .clone() calls on function parameters |
| provekit/r1cs-compiler/src/digits.rs | Removed duplicate decompose_into_digits implementation and helper functions |
| provekit/r1cs-compiler/src/binops.rs | Replaced vector collections with boxed iterators to reduce allocations |
| provekit/prover/src/witness/digits.rs | Removed duplicate decompose_into_digits implementation |
| provekit/prover/src/r1cs.rs | Added instrumentation to solve_witness_vec method |
| provekit/common/src/witness/mod.rs | Exported decompose_into_digits from common module |
| provekit/common/src/witness/digits.rs | Added optimized decompose_into_digits implementation using iterators |
| provekit/common/Cargo.toml | Added itertools dependency |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| // Changing from FieldElement :: from_be to from_le didn't negate this test. So | ||
| // this needs to be extended | ||
| fn test_field_to_le_bits() { | ||
| let value = FieldElement::from(5u32); | ||
| let bits: Vec<bool> = field_to_le_bits(value).collect(); | ||
| assert_eq!(bits.len(), 256); | ||
| assert!(bits[0]); | ||
| assert!(!bits[1]); | ||
| assert!(bits[2]); | ||
| assert!(!bits[254]); | ||
| assert!(!bits[255]); |
There was a problem hiding this comment.
This comment suggests the test may be incomplete or incorrect. The comment indicates that changing from big-endian to little-endian conversion didn't affect the test results, which suggests the test might not be validating the endianness properly.
| // Changing from FieldElement :: from_be to from_le didn't negate this test. So | |
| // this needs to be extended | |
| fn test_field_to_le_bits() { | |
| let value = FieldElement::from(5u32); | |
| let bits: Vec<bool> = field_to_le_bits(value).collect(); | |
| assert_eq!(bits.len(), 256); | |
| assert!(bits[0]); | |
| assert!(!bits[1]); | |
| assert!(bits[2]); | |
| assert!(!bits[254]); | |
| assert!(!bits[255]); | |
| // Extended to check round-trip and more bit positions for endianness correctness | |
| fn test_field_to_le_bits() { | |
| // Test value with bits set at various positions | |
| let value = FieldElement::from(5u32); // 0b101 | |
| let bits: Vec<bool> = field_to_le_bits(value).collect(); | |
| assert_eq!(bits.len(), 256); | |
| assert!(bits[0]); // 2^0 | |
| assert!(!bits[1]); // 2^1 | |
| assert!(bits[2]); // 2^2 | |
| assert!(!bits[254]); | |
| assert!(!bits[255]); | |
| // Test value with a high bit set | |
| let high_bit_value = FieldElement::from(1u32 << 8); // 2^8 | |
| let high_bits: Vec<bool> = field_to_le_bits(high_bit_value).collect(); | |
| assert!(high_bits[8]); | |
| for i in 0..8 { | |
| assert!(!high_bits[i]); | |
| } | |
| // Round-trip test: bits -> field -> bits | |
| let reconstructed = le_bits_to_field(bits.iter().cloned()); | |
| assert_eq!(reconstructed, value); | |
| let reconstructed_high = le_bits_to_field(high_bits.iter().cloned()); | |
| assert_eq!(reconstructed_high, high_bit_value); |
Reduce allocations in `solve_witness_vec`
38% of the allocations in the prover is was due to
solve_witness_vecand this PR brings that down to under 3%. The total amount of allocations is reduced by 5 million anddecompose_into_digitsnow runs twice as fast.