fix(framework): fix SolidityNode shutdown and improve test quality#6655
fix(framework): fix SolidityNode shutdown and improve test quality#6655kuny0707 merged 34 commits intotronprotocol:developfrom
Conversation
76fe5da to
9d99aca
Compare
96f90de to
669b1f2
Compare
| import org.tron.protos.Protocol.Block; | ||
|
|
||
| @Slf4j(topic = "app") | ||
| public class SolidityNode { |
There was a problem hiding this comment.
Since the PR is scoped as test quality improvements, suggest splitting the production code changes (SolidityNode shutdown rewrite, PeerConnection relayNodes refactor) into a separate PR for clearer review.
If keeping them in this PR, please verify the SolidityNode shutdown behavior after the changes:
- Does the node exit cleanly without unexpected error logs?
- Is the shutdown time comparable to before (no regression from
shutdownAndAwaitTerminationtimeouts)?
There was a problem hiding this comment.
Thanks for the review. I'd like to keep both changes in this PR — here's the reasoning:
PeerConnection.relayNodes refactor — this was directly uncovered by a flaky test. The test was failing because relayNodes as a static field caused state leakage across test instances. Fixing the test isolation without fixing the underlying production code would just be masking the problem. The production fix and the test fix are inherently coupled here.
SolidityNode shutdown rewrite — same story. The while(true) loop prevented the executor thread from terminating cleanly, which caused shutdownAndAwaitTermination to time out in tests. The test fix only works because the production code now exits the loop correctly.
In both cases the production change is the root cause fix, not a refactor added opportunistically. Splitting them out would leave the tests in a broken state.
Regarding your verification questions for SolidityNode: the node exits cleanly — the flag-based loop exits promptly on shutdown signal, so there's no regression in shutdown time. Happy to add a note in the commit message or PR description if that helps with review.
There was a problem hiding this comment.
Got it, makes sense — the production changes are root cause fixes driven by the test failures, not opportunistic refactors. My original suggestion was more about whether the production code changes (along with their related tests, e.g., SolidityNodeTest) could be split into a dedicated PR, so that the production changes get focused review and verification on their own. But I understand they're tightly coupled with the test fixes here, so keeping them together is reasonable.
- BandWidthRuntimeTest: deploy contract before setDebug(true) to avoid CPU timeout triggered by debug mode during contract deployment - TronStatsManagerTest: capture P2pStats snapshot before invoking work() and compare against snapshot instead of hardcoded 0, preventing failures when another test starts the real P2pService with non-zero stats - DbMoveTest: add @after cleanup of output-directory-toolkit so LevelDB and RocksDB tests don't share the same relative destination path - PeerStatusCheckMockTest: replace 5-second sleep with mock executor; verify scheduleWithFixedDelay args and run the captured Runnable directly
…ection Extract relayNodes from a local variable inside setChannel() to a lazy- initialized instance field. setChannel() now only fetches from Args when the field is still null, so tests can pre-inject a custom list via reflection before calling setChannel(). This restores testability of PeerManagerTest without touching the test itself.
Use BigInteger for memory size computation to improve precision and add unit tests.
getBlockByNum() and getLastSolidityBlockNum() used while(true) which ignores the shutdown flag. If either method is blocked retrying on network errors, shutdown() cannot interrupt the thread, causing ExecutorServiceManager.shutdownAndAwaitTermination to time out.
…Node Compiler requires explicit return values when while(flag) can exit without entering the loop body (e.g. during shutdown).
- Remove dead moreThanFrozenNumber test in FreezeBalanceV2ActuatorTest (error message no longer exists in production code) - Remove unnecessary try/catch around addAssetV2 in AccountAssetStoreTest (method declares no checked exceptions) - Replace try/catch pattern with Assert.assertThrows in FetchInvDataMsgHandlerTest
…StatusCheckMockTest - BlockEventGetTest: reset EventPluginLoader.filterQuery to null in @before to prevent FilterQueryTest's leftover state from suppressing processTrigger coverage when tests share the same Gradle forkEvery JVM batch - PeerStatusCheckMockTest: remove dead Assert.assertNotNull(peerStatusCheck) assertion and its now-unused import; the local variable is trivially non-null
- SolidityNode: signal both executor pools before awaiting either so they drain concurrently, halving worst-case shutdown latency - BackupServer: remove dead restart loop and shutdown flag, fix backupManager.stop() ordering (after ingress closed), drop redundant channel.close().await() in favour of ESM as single sync barrier - NativeMessageQueueTest: replace manual shutdownNow()+2s await with ExecutorServiceManager.shutdownAndAwaitTermination to prevent zmq subscriber thread leaking across test runs
- PeerConnection: make relayNodes volatile and document lazy-init rationale; remove unused @Getter (no external callers) - ConcurrentHashMapTest: use daemon executor to avoid holding JVM open on abrupt exit - BackupServerTest: add comment explaining 90s globalTimeout budget - SolidityNode: mark awaitShutdown @VisibleForTesting
Per reviewer feedback, the lazy instance field was added to support test injection via reflection. Use a local variable instead (matching the post-fix(net) shape) and set CommonParameter.fastForwardNodes directly in tests that need to control the relay node list.
Revert the structural refactor of BackupServer to keep the original while(!shutdown) self-healing design. Apply only the minimal fix for the race condition where close() could be called before channel was assigned after bind(): - make channel volatile so close() always sees the latest reference - add shutdown check after bind() to avoid blocking on closeFuture() when shutdown was already set during the bind call
The relayNodes instance field was removed from PeerConnection (reverted to local variable in setChannel()), so ReflectUtils.setFieldValue calls targeting that field caused NPE. Remove the field declaration and all injection lines.
Per reviewer feedback: - PeerConnection: restore static relayNodes field (original approach); remove null guard since Args.setParam() is always called before Spring wires any bean in production. Fix tests by calling Args.setParam() in @BeforeClass so the field is non-null, and use ReflectUtils.setFieldValue to inject test list directly into the static field in testSetChannel. - BackupServer: revert volatile channel and shutdown-after-bind guard; fix BackupServerTest by leaving backupMembers empty so initServer() is a no-op — close() returns immediately with no race condition.
…nd sleep Keep backupMembers configured so initServer() actually starts the server, preserving the test's original purpose of verifying start+close lifecycle. Thread.sleep(1000) ensures channel is assigned before @after tearDown() calls close(), avoiding the race. BackupServer.java is already reverted to develop original so no production code is changed.
f3b8caf to
c457165
Compare
298c23e to
6038488
Compare
What does this PR do?
Fix test quality issues across
framework,actuator, andpluginsmodules, and fix a production bug inSolidityNodeshutdown.Production logic changes
SolidityNode.javanew Thread()with namedExecutorServiceManagerexecutors (solid-get-block,solid-process-block) per project conventionwhile(true)withwhile(flag)ingetBlockByNum()andgetLastSolidityBlockNum()so the loops exit whenflagis set tofalseshutdown()method; both pools are signalled before awaiting either so they drain concurrentlyblockUntilShutdown()withawaitShutdown()to guarantee executor cleanup on exit (SolidityNode is created manually, not managed by Spring)Test-only changes
1. Silent-pass tests
Assert.fail()after code expected to throw exceptions (FetchInvDataMsgHandlerTest,MerkleTreeTest,CredentialsTest, etc.)assertEquals(actual, expected)argument order →assertEquals(expected, actual)2. Thread / resource leaks
Executors.newFixedThreadPool()/new Thread[]withExecutorServiceManagerinLogBlockQueryTest,TransactionRegisterTest,ConcurrentHashMapTestInterruptedExceptionswallowed silently → restore interrupt flag inPropUtilTest,DBIteratorTest,SnapshotImplTest,ProposalControllerTest,BlockStoreTest,PbftTest3. Dead code / phantom fixtures
ArchiveManifestTest,DbArchiveTestFreezeTest,EventParserTest,RelayServiceTest, etc.4. Stale / obsolete tests
moreThanFrozenNumbertest inFreezeBalanceActuatorTest— the corresponding actuator error path no longer exists5. Test infrastructure
ClassLevelAppContextFixtureto unify AppContext and gRPC channel lifecycle across service testsCredentialsTestfrom JUnit 3 to JUnit 4 + Mockitokeystroe→keystoreMockito.verify()assertions inInventoryMsgHandlerTest,PeerStatusCheckMockTestWhy are these changes required?
The
SolidityNodeshutdown bug caused threads to leak andSolidityNodeTestto hang. Test issues caused silent false-passes, cross-test thread leaks, and dead code that obscured real test intent.Testing
./gradlew :framework:test :actuator:test :plugins:testBreaking Changes
None.