Skip to content

fix(framework): fix SolidityNode shutdown and improve test quality#6655

Merged
kuny0707 merged 34 commits intotronprotocol:developfrom
Little-Peony:fix_tests_v2
Apr 28, 2026
Merged

fix(framework): fix SolidityNode shutdown and improve test quality#6655
kuny0707 merged 34 commits intotronprotocol:developfrom
Little-Peony:fix_tests_v2

Conversation

@Little-Peony
Copy link
Copy Markdown
Contributor

@Little-Peony Little-Peony commented Apr 8, 2026

What does this PR do?

Fix test quality issues across framework, actuator, and plugins modules, and fix a production bug in SolidityNode shutdown.

Production logic changes

SolidityNode.java

  • Replace bare new Thread() with named ExecutorServiceManager executors (solid-get-block, solid-process-block) per project convention
  • Replace while(true) with while(flag) in getBlockByNum() and getLastSolidityBlockNum() so the loops exit when flag is set to false
  • Add explicit shutdown() method; both pools are signalled before awaiting either so they drain concurrently
  • Wrap blockUntilShutdown() with awaitShutdown() to guarantee executor cleanup on exit (SolidityNode is created manually, not managed by Spring)

Test-only changes

1. Silent-pass tests

  • Add Assert.fail() after code expected to throw exceptions (FetchInvDataMsgHandlerTest, MerkleTreeTest, CredentialsTest, etc.)
  • Fix assertEquals(actual, expected) argument order → assertEquals(expected, actual)

2. Thread / resource leaks

  • Replace bare Executors.newFixedThreadPool() / new Thread[] with ExecutorServiceManager in LogBlockQueryTest, TransactionRegisterTest, ConcurrentHashMapTest
  • Fix InterruptedException swallowed silently → restore interrupt flag in PropUtilTest, DBIteratorTest, SnapshotImplTest, ProposalControllerTest, BlockStoreTest, PbftTest

3. Dead code / phantom fixtures

  • Remove never-called private static helpers in ArchiveManifestTest, DbArchiveTest
  • Remove unreachable test code in FreezeTest, EventParserTest, RelayServiceTest, etc.

4. Stale / obsolete tests

  • Remove moreThanFrozenNumber test in FreezeBalanceActuatorTest — the corresponding actuator error path no longer exists

5. Test infrastructure

  • Extract ClassLevelAppContextFixture to unify AppContext and gRPC channel lifecycle across service tests
  • Migrate CredentialsTest from JUnit 3 to JUnit 4 + Mockito
  • Fix package typo: keystroekeystore
  • Add Mockito.verify() assertions in InventoryMsgHandlerTest, PeerStatusCheckMockTest

Why are these changes required?

The SolidityNode shutdown bug caused threads to leak and SolidityNodeTest to hang. Test issues caused silent false-passes, cross-test thread leaks, and dead code that obscured real test intent.

Testing

  • SolidityNode: verify sync from full node works normally and node shuts down cleanly
  • All changed tests pass locally: ./gradlew :framework:test :actuator:test :plugins:test

Breaking Changes

None.

@Sunny6889 Sunny6889 changed the title test: fix test quality issues in framework, actuator and plugins test(framework): fix test quality issues Apr 8, 2026
Comment thread framework/src/test/java/org/tron/core/db/AccountAssetStoreTest.java Outdated
@Sunny6889 Sunny6889 requested a review from 3for April 8, 2026 07:53
Comment thread framework/src/test/java/org/tron/core/db/DBIteratorTest.java
Comment thread plugins/src/test/java/org/tron/plugins/DbMoveTest.java
Comment thread framework/src/main/java/org/tron/program/SolidityNode.java Outdated
Comment thread framework/src/test/java/org/tron/core/witness/ProposalControllerTest.java Outdated
Comment thread framework/src/test/java/org/tron/core/net/MessageTest.java Outdated
import org.tron.protos.Protocol.Block;

@Slf4j(topic = "app")
public class SolidityNode {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the PR is scoped as test quality improvements, suggest splitting the production code changes (SolidityNode shutdown rewrite, PeerConnection relayNodes refactor) into a separate PR for clearer review.

If keeping them in this PR, please verify the SolidityNode shutdown behavior after the changes:

  • Does the node exit cleanly without unexpected error logs?
  • Is the shutdown time comparable to before (no regression from shutdownAndAwaitTermination timeouts)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I'd like to keep both changes in this PR — here's the reasoning:

PeerConnection.relayNodes refactor — this was directly uncovered by a flaky test. The test was failing because relayNodes as a static field caused state leakage across test instances. Fixing the test isolation without fixing the underlying production code would just be masking the problem. The production fix and the test fix are inherently coupled here.

SolidityNode shutdown rewrite — same story. The while(true) loop prevented the executor thread from terminating cleanly, which caused shutdownAndAwaitTermination to time out in tests. The test fix only works because the production code now exits the loop correctly.

In both cases the production change is the root cause fix, not a refactor added opportunistically. Splitting them out would leave the tests in a broken state.

Regarding your verification questions for SolidityNode: the node exits cleanly — the flag-based loop exits promptly on shutdown signal, so there's no regression in shutdown time. Happy to add a note in the commit message or PR description if that helps with review.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, makes sense — the production changes are root cause fixes driven by the test failures, not opportunistic refactors. My original suggestion was more about whether the production code changes (along with their related tests, e.g., SolidityNodeTest) could be split into a dedicated PR, so that the production changes get focused review and verification on their own. But I understand they're tightly coupled with the test fixes here, so keeping them together is reasonable.

Comment thread framework/src/test/java/org/tron/common/utils/PropUtilTest.java Outdated
Comment thread framework/src/test/java/org/tron/core/event/BlockEventGetTest.java
Comment thread framework/src/test/java/org/tron/core/net/peer/PeerStatusCheckMockTest.java Outdated
Little-Peony and others added 20 commits April 27, 2026 10:39
- BandWidthRuntimeTest: deploy contract before setDebug(true) to avoid
  CPU timeout triggered by debug mode during contract deployment
- TronStatsManagerTest: capture P2pStats snapshot before invoking work()
  and compare against snapshot instead of hardcoded 0, preventing
  failures when another test starts the real P2pService with non-zero stats
- DbMoveTest: add @after cleanup of output-directory-toolkit so LevelDB
  and RocksDB tests don't share the same relative destination path
- PeerStatusCheckMockTest: replace 5-second sleep with mock executor;
  verify scheduleWithFixedDelay args and run the captured Runnable directly
…ection

Extract relayNodes from a local variable inside setChannel() to a lazy-
initialized instance field. setChannel() now only fetches from Args when
the field is still null, so tests can pre-inject a custom list via
reflection before calling setChannel(). This restores testability of
PeerManagerTest without touching the test itself.
Use BigInteger for memory size computation to improve precision and add unit tests.
getBlockByNum() and getLastSolidityBlockNum() used while(true) which
ignores the shutdown flag. If either method is blocked retrying on
network errors, shutdown() cannot interrupt the thread, causing
ExecutorServiceManager.shutdownAndAwaitTermination to time out.
…Node

Compiler requires explicit return values when while(flag) can exit
without entering the loop body (e.g. during shutdown).
- Remove dead moreThanFrozenNumber test in FreezeBalanceV2ActuatorTest
  (error message no longer exists in production code)
- Remove unnecessary try/catch around addAssetV2 in AccountAssetStoreTest
  (method declares no checked exceptions)
- Replace try/catch pattern with Assert.assertThrows in FetchInvDataMsgHandlerTest
…StatusCheckMockTest

- BlockEventGetTest: reset EventPluginLoader.filterQuery to null in @before
  to prevent FilterQueryTest's leftover state from suppressing processTrigger
  coverage when tests share the same Gradle forkEvery JVM batch
- PeerStatusCheckMockTest: remove dead Assert.assertNotNull(peerStatusCheck)
  assertion and its now-unused import; the local variable is trivially non-null
- SolidityNode: signal both executor pools before awaiting either so
  they drain concurrently, halving worst-case shutdown latency
- BackupServer: remove dead restart loop and shutdown flag, fix
  backupManager.stop() ordering (after ingress closed), drop redundant
  channel.close().await() in favour of ESM as single sync barrier
- NativeMessageQueueTest: replace manual shutdownNow()+2s await with
  ExecutorServiceManager.shutdownAndAwaitTermination to prevent zmq
  subscriber thread leaking across test runs
- PeerConnection: make relayNodes volatile and document lazy-init
  rationale; remove unused @Getter (no external callers)
- ConcurrentHashMapTest: use daemon executor to avoid holding JVM open
  on abrupt exit
- BackupServerTest: add comment explaining 90s globalTimeout budget
- SolidityNode: mark awaitShutdown @VisibleForTesting
Per reviewer feedback, the lazy instance field was added to support
test injection via reflection. Use a local variable instead (matching
the post-fix(net) shape) and set CommonParameter.fastForwardNodes
directly in tests that need to control the relay node list.
Revert the structural refactor of BackupServer to keep the original
while(!shutdown) self-healing design. Apply only the minimal fix for
the race condition where close() could be called before channel was
assigned after bind():
- make channel volatile so close() always sees the latest reference
- add shutdown check after bind() to avoid blocking on closeFuture()
  when shutdown was already set during the bind call
The relayNodes instance field was removed from PeerConnection (reverted
to local variable in setChannel()), so ReflectUtils.setFieldValue calls
targeting that field caused NPE. Remove the field declaration and all
injection lines.
Per reviewer feedback:
- PeerConnection: restore static relayNodes field (original approach);
  remove null guard since Args.setParam() is always called before
  Spring wires any bean in production. Fix tests by calling
  Args.setParam() in @BeforeClass so the field is non-null, and use
  ReflectUtils.setFieldValue to inject test list directly into the
  static field in testSetChannel.
- BackupServer: revert volatile channel and shutdown-after-bind guard;
  fix BackupServerTest by leaving backupMembers empty so initServer()
  is a no-op — close() returns immediately with no race condition.
…nd sleep

Keep backupMembers configured so initServer() actually starts the server,
preserving the test's original purpose of verifying start+close lifecycle.
Thread.sleep(1000) ensures channel is assigned before @after tearDown()
calls close(), avoiding the race. BackupServer.java is already reverted to
develop original so no production code is changed.
@Little-Peony Little-Peony changed the title test(framework): fix test quality issues fix(framework): fix SolidityNode shutdown and improve test quality Apr 27, 2026
Copy link
Copy Markdown
Collaborator

@halibobo1205 halibobo1205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kuny0707 kuny0707 merged commit 980c707 into tronprotocol:develop Apr 28, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants