feat(vm): add P256VERIFY precompile (TIP-7951)#6720
CodeNinjaEvan merged 3 commits into tronprotocol:develop from
Conversation
Force-pushed from 29c4ccb to 4837049
Pure-Java BC port of EIP-7951; gated by ALLOW_TVM_OSAKA.
Manual @Test methods, not part of the regular suite. Run via the --tests filter.
Force-pushed from 4837049 to 9d4e1d1
 * Single-threaded, pure-Java BouncyCastle path. The first three tests use a
 * 5000-iteration JIT warmup; coldNoWarmup deliberately skips it.
 */
public class PrecompileBenchmark {
Really impressive benchmark methodology here — the cold-no-warmup test in particular is a thoughtful touch for low-frequency precompiles, and the headroom analysis in the PR description is genuinely convincing. 🎯
One small concern: the class Javadoc says "Not part of the regular test suite — invoke explicitly", but because the methods use @Test without @Ignore they will still be picked up by a plain ./gradlew :framework:test run. The 4 benchmarks bring ~100 keypair generations plus 5×5000-iter measurement loops per benchmark, which adds noticeable time and a flood of System.out.printf lines to CI logs.
The repo already has a precedent for this exact case at framework/src/test/java/org/tron/common/runtime/vm/TimeBenchmarkTest.java, which is annotated @Ignore at the class level and still runs fine via --tests selection. Would it be reasonable to add @Ignore here as well so the benchmark stays opt-in? The two --tests … invocations in the PR description would still work unchanged.
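The suggested pattern can be sketched as the following skeleton (method bodies elided; the annotation message is illustrative, not the PR's actual text):

```java
import org.junit.Ignore;
import org.junit.Test;

// Class-level @Ignore keeps the benchmark out of a plain
// ./gradlew :framework:test run; per the precedent cited above
// (TimeBenchmarkTest), explicit --tests selection still works.
@Ignore("Benchmark, not a regression test - run explicitly via --tests")
public class PrecompileBenchmark {

  @Test
  public void coldNoWarmup() {
    // per-call timing without JIT warmup (body elided)
  }
}
```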
Summary
- Adds the P256VERIFY precompile at address 0x100, gated by the ALLOW_TVM_OSAKA proposal (no separate P256VERIFY config-file knob; it activates via on-chain proposal vote, mirroring ALLOW_TVM_SELFDESTRUCT_RESTRICTION).
- Tests (P256VerifyTest): 778 vectors from the Wycheproof ecdsa_secp256r1_sha256_p1363_test suite plus 4 reference / invalid-input cases adopted from go-ethereum's EIP-7951 fixtures.
- Benchmarks (PrecompileBenchmark, guarded by -DrunPrecompileBenchmark=true) compare against ECRecover. TEST 4 (coldNoWarmup) measures no-execute-warmup latency to reflect the low-frequency mainnet case where the JVM has not JIT-compiled the precompile path.

Spec
The 6900-energy cost is the value mandated by EIP-7951 (not derived from these benchmarks); the benchmarks below confirm that this spec value is conservative for TRON's JVM.
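For reference, EIP-7951 fixes the call convention this precompile implements: a 160-byte input of hash || r || s || qx || qy (32 bytes each), with a 32-byte big-endian 1 returned on success and empty output on failure. A stdlib-only sketch of the field split (class and method names are illustrative, not the PR's code):

```java
import java.math.BigInteger;
import java.util.Arrays;

public class P256InputLayout {
  // EIP-7951 fixed input layout: hash(32) || r(32) || s(32) || qx(32) || qy(32).
  static final int WORD = 32;

  // Extracts the i-th 32-byte word as an unsigned big-endian integer.
  static BigInteger word(byte[] input, int i) {
    return new BigInteger(1, Arrays.copyOfRange(input, i * WORD, (i + 1) * WORD));
  }

  public static void main(String[] args) {
    byte[] input = new byte[160];
    input[63] = 0x07;                   // set the low byte of r, i.e. r = 7
    System.out.println(word(input, 1)); // prints 7
  }
}
```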
Test
- ./gradlew :framework:test --tests org.tron.common.runtime.vm.P256VerifyTest
- ./gradlew :framework:test --tests org.tron.common.runtime.vm.AllowTvmOsakaTest
- ./gradlew :framework:test --no-daemon -DrunPrecompileBenchmark=true --tests 'org.tron.common.runtime.vm.PrecompileBenchmark.coldNoWarmup' -i (manual)
- ./gradlew :framework:test --no-daemon -DrunPrecompileBenchmark=true --tests 'org.tron.common.runtime.vm.PrecompileBenchmark' -i (manual)

Benchmark
Server: Linux, 8 cores / 32 GB. Two separate opt-in --no-daemon runs so the cold measurement gets a fresh JVM:
- -DrunPrecompileBenchmark=true --tests 'PrecompileBenchmark.coldNoWarmup'
- -DrunPrecompileBenchmark=true --tests 'PrecompileBenchmark' (provides warm steady-state and fail-path numbers)

Warm steady-state — TEST 1 / TEST 3 (5000-iter warmup, 5000 × 5 measure)
At C2 steady state P256 is ~1.75× slower than ECRecover, well under the 2.30× energy ratio (6900/3000) — i.e. the EIP-7951-mandated 6900 leaves ~24% headroom against TRON's measured cost.
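The headroom arithmetic above, spelled out (numbers taken from this section's measurements):

```java
public class Headroom {
  public static void main(String[] args) {
    double energyRatio = 6900.0 / 3000.0;  // spec cost ratio, P256VERIFY vs ECRecover
    double measuredRatio = 1.75;           // measured warm steady-state slowdown
    double headroom = 1.0 - measuredRatio / energyRatio;
    // energyRatio = 2.30, headroom ~ 0.24 -> the ~24% quoted above
    System.out.printf("energy ratio %.2f, headroom %.0f%%%n", energyRatio, headroom * 100);
  }
}
```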
Cold no-execute-warmup — TEST 4 (100 distinct inputs, per-call timing)
The first measured execute() call sees an unprimed precompile path, but inputs are generated before timing, so this is not a full classloading measurement of the cryptographic helpers. Both precompiles still land in the 10–20 ms range for the first measured call. After 100 invocations the JVM has not yet triggered C1/C2 compilation (-XX:CompileThreshold=10000 by default), so per-call cost stays around 2 ms — three orders of magnitude above the C2 steady state of ~2 µs. This is the realistic upper bound for a low-frequency precompile that rarely reaches JIT steady state.

Fail-path early-exit — TEST 2 (warm, 5000-iter warmup, 5000 × 5 measure)
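A minimal sketch of the per-call timing idea behind TEST 4 (harness and class names are assumptions, not the PR's code; the real benchmark times the precompile's execute()):

```java
public class ColdTiming {
  // Times each of the first `calls` invocations individually, with no warmup
  // loop, so early per-call costs (interpreter, unprimed code paths) stay
  // visible instead of being averaged away by a steady-state loop.
  static long[] perCallNanos(Runnable op, int calls) {
    long[] ns = new long[calls];
    for (int i = 0; i < calls; i++) {
      long t0 = System.nanoTime();
      op.run();
      ns[i] = System.nanoTime() - t0;
    }
    return ns;
  }

  public static void main(String[] args) {
    // Inputs (here, the RNG) are prepared before timing, as in TEST 4.
    java.util.Random rnd = new java.util.Random(42);
    long[] ns = perCallNanos(() -> java.math.BigInteger.probablePrime(96, rnd), 100);
    System.out.printf("call #1: %d ns, call #100: %d ns%n", ns[0], ns[99]);
  }
}
```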
- len != 160
- r ≥ N (modulus bound)
- qx ≥ P (field bound)

Confirms the cheap guards short-circuit before scalar multiplication: length / bound / infinity checks finish in ~100–500 ns, three orders of magnitude faster than a full ECDSA pass. Off-curve detection costs one curve-equation evaluation (~2 µs) but is still ~1000× cheaper than a full verify. A failing ECDSA verify costs the same as a passing one — both run the full scalar multiplication; only the final equality check differs.
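The guard ordering these numbers reflect can be sketched with the stdlib only (the constants are the standard NIST P-256 parameters; class and method names are illustrative, and the curve-equation and full ECDSA checks are elided):

```java
import java.math.BigInteger;
import java.util.Arrays;

public class P256Guards {
  // NIST P-256 (secp256r1) field prime P and group order N.
  static final BigInteger P = new BigInteger(
      "ffffffff00000001000000000000000000000000ffffffffffffffffffffffff", 16);
  static final BigInteger N = new BigInteger(
      "ffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551", 16);

  // Cheap early-exit guards, cheapest first: length, scalar bounds,
  // field bounds, point at infinity. On-curve and full-verify checks
  // (the expensive part) would follow and are elided here.
  static boolean passesGuards(byte[] in) {
    if (in == null || in.length != 160) return false;               // len != 160
    BigInteger r  = new BigInteger(1, Arrays.copyOfRange(in, 32, 64));
    BigInteger s  = new BigInteger(1, Arrays.copyOfRange(in, 64, 96));
    BigInteger qx = new BigInteger(1, Arrays.copyOfRange(in, 96, 128));
    BigInteger qy = new BigInteger(1, Arrays.copyOfRange(in, 128, 160));
    if (r.signum() == 0 || r.compareTo(N) >= 0) return false;       // r in [1, N)
    if (s.signum() == 0 || s.compareTo(N) >= 0) return false;       // s in [1, N)
    if (qx.compareTo(P) >= 0 || qy.compareTo(P) >= 0) return false; // field bound
    if (qx.signum() == 0 && qy.signum() == 0) return false;         // point at infinity
    return true;
  }

  public static void main(String[] args) {
    System.out.println(passesGuards(new byte[160])); // false: r == 0
  }
}
```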