test(framework): add dual DB engine coverage with flaky test fixes #6586

Open
vividctrlalt wants to merge 3 commits into tronprotocol:develop from vividctrlalt:test/dual-db-engine-coverage

Conversation

@vividctrlalt
Contributor

Summary

New test infrastructure

  • Extract shared DB test logic into DbDataSourceImplTest, BaseMethodTest, TestConstants
  • Extend BaseTest to support DB engine override via Typesafe Config system property
  • Add testWithRocksDb Gradle task; CI runs it on x86_64 after default build
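The engine override described above can be sketched as follows. This is a minimal illustration using only the Java standard library, not the PR's actual BaseTest code: the property name `storage.db.engine`, the class `DbEngineOverride`, and the method `resolveEngine` are assumptions made for the example. It mirrors the layering that Typesafe Config applies when system properties are stacked on top of a config file, where a `-D` value wins over the file value.

```java
import java.util.Map;

/** Hypothetical helper: picks the DB engine for a test run. */
final class DbEngineOverride {
  // Property name is an assumption for illustration, not taken from the PR.
  static final String ENGINE_PROPERTY = "storage.db.engine";

  /**
   * Returns the system property if it is set and non-blank,
   * otherwise the value from the file config, otherwise the default.
   */
  static String resolveEngine(Map<String, String> fileConfig, String defaultEngine) {
    String fromProperty = System.getProperty(ENGINE_PROPERTY);
    if (fromProperty != null && !fromProperty.trim().isEmpty()) {
      return fromProperty.trim();
    }
    return fileConfig.getOrDefault(ENGINE_PROPERTY, defaultEngine);
  }
}
```

With this shape, a task such as testWithRocksDb would only need to pass the property to the test JVM; the test classes themselves stay engine-agnostic.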

Flaky test fixes (20+ classes)

  • VM: FreezeTest, FreezeV2Test, VoteTest, InternalTransactionCallTest
  • DB: ChainbaseTest, CheckpointV2Test, SnapshotImplTest, SnapshotManagerTest, SnapshotRootTest, RevokingDbWithCacheNewValueTest
  • Core: ManagerTest, TransactionExpireTest, ForkControllerTest, ShieldedReceiveTest
  • Services: MetricsApiServiceTest, PeerStatusCheckTest, SyncServiceTest, ComputeRewardTest, ConditionallyStopTest
  • Config: ArgsTest, DynamicArgsTest
  • New: TrieTest

Production code (minimal)

  • Add Args.validateConfig() for ARM64 + LevelDB rejection at startup
  • Call validateConfig() in FullNode.main() (production entry only, tests unaffected)
  • Remove unused ROCKS_DB_ENGINE constant in Storage.java

Test plan

  • All existing tests pass with default (LevelDB) engine
  • ./gradlew :framework:testWithRocksDb passes on x86_64
  • ARM64 builds exclude LevelDB tests and use RocksDB
  • CI pr-check workflow runs dual-engine tests on x86_64

- Extract shared test logic into DbDataSourceImplTest, BaseMethodTest, TestConstants
- Enable DB engine override via system property so tests run against both engines
- Refactor 20+ test classes to support dual-engine testing
- Fix flaky tests: FreezeTest, FreezeV2Test, VoteTest, ManagerTest, ShieldedReceiveTest
- Update build.gradle and CI workflow accordingly
private final Map<String, Sha256Hash> dbRoots = Maps.newConcurrentMap();

public static String getDbEngineFromConfig(final Config config) {
  if (Arch.isArm64()) {

This change and validateConfig introduce a regression: on ARM64, starting the node with the default config.conf (where db.engine is set to LEVELDB) now results in a startup failure.

The issue can be reproduced as follows:

$ ./gradlew :framework:buildFullNodeJar
…..
Building for architecture: aarch64, Java version: 17
…..

$ java -jar framework/build/libs/FullNode.jar
$ tail -f logs/tron.log
…..
15:17:21.876 ERROR [main] [Exit](ExitManager.java:49) Shutting down with code: PARAMETER_INIT(1), reason: ARM64 architecture only supports RocksDB. Current engine: LEVELDB

Contributor

Agreed. I've opened an issue to continue the discussion: #6587

Contributor Author

I believe fail-fast is the correct behavior here. If a user runs on ARM64 with db.engine=LEVELDB, the node should refuse to start with a clear error message, rather than silently falling back to RocksDB or deferring the failure to runtime.

The principle is: no hidden behavior. If you're running on ARM64, you must explicitly configure a supported engine. A silent fallback would mask a misconfiguration and could lead to confusion later (e.g., user expects LevelDB but data is actually stored in RocksDB).

That said, I understand the concern about the default config.conf shipping with LEVELDB. This is really a config-level issue — the default config should either be platform-aware or the documentation should clearly state that ARM64 users need to set db.engine=ROCKSDB. Happy to discuss further in #6587.

Contributor Author

To add more context: the whole point of this series of changes is to eliminate silent behaviors in FullNode. Silently swapping the DB engine in production is exactly the kind of hidden behavior we're trying to remove — it should be strictly prohibited.

If an ARM64 node has db.engine=LEVELDB in config.conf, that's a configuration management problem, not something the application should paper over at runtime. Mixing config-management concerns into production code leads to harder-to-debug issues and undermines the principle of explicit configuration.

The correct fix belongs in the deployment/config layer — ARM64 environments should ship with the right config.conf from the start.

 * Validate final configuration after all sources (defaults, config, CLI) are applied.
 */
public static void validateConfig() {
  if (Arch.isArm64() && !"ROCKSDB".equals(PARAMETER.storage.getDbEngine())) {

Change "ROCKSDB".equals(PARAMETER.storage.getDbEngine()) to "ROCKSDB".equals(PARAMETER.storage.getDbEngine().toUpperCase()) or Constant.ROCKSDB.equalsIgnoreCase(PARAMETER.storage.getDbEngine()), so that values like rocksdb or RocksDB are also accepted, as is already done at https://github.com/vividctrlalt/java-tron/blob/test/dual-db-engine-coverage/framework/src/main/java/org/tron/core/config/args/Args.java#L799.

Contributor Author

Good catch! Will fix to use Constant.ROCKSDB.equalsIgnoreCase() for consistency.
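For reference, the fail-fast check with the case-insensitive comparison agreed on above might look roughly like this. It is a standalone sketch, not the code in Args.java: `EngineValidator` and its `validate` signature are hypothetical, the architecture flag is passed in as a parameter instead of calling Arch.isArm64(), and a plain exception stands in for the node's exit handling.

```java
/** Hypothetical stand-in for the startup validation discussed in this thread. */
final class EngineValidator {
  /**
   * Throws if the architecture/engine pair is unsupported; returns normally otherwise.
   * The isArm64 flag is a parameter here so the check can be exercised on any machine.
   */
  static void validate(boolean isArm64, String dbEngine) {
    // equalsIgnoreCase accepts "rocksdb" and "RocksDB", and is null-safe
    // when called on the literal (a null engine is rejected on ARM64).
    if (isArm64 && !"ROCKSDB".equalsIgnoreCase(dbEngine)) {
      throw new IllegalArgumentException(
          "ARM64 architecture only supports RocksDB. Current engine: " + dbEngine);
    }
  }
}
```

Calling the comparison on the constant rather than on the configured value avoids a NullPointerException when the engine is missing from the config.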

* Proper fix: TrieImpl.insert() should short-circuit when the existing value
* equals the new value, avoiding unnecessary invalidate(). See TrieImpl:188-192.
*/
@Ignore("TrieImpl bug: root hash depends on insertion order with duplicate key-value puts")

Why was @Ignore used instead of addressing the issue in TrieImpl.insert()?

Contributor Author

Good question! This is not a test bug — it's a bug in the production code TrieImpl.insert(). When inserting a duplicate key with the same value, the root hash changes depending on insertion order, which is incorrect.

The comment on @Ignore explains the root cause: TrieImpl.insert() should short-circuit when the existing value equals the new value, avoiding an unnecessary invalidate() call (see TrieImpl:188-192).

Since the scope of this PR is test refactoring (dual DB engine coverage), fixing production code in TrieImpl is out of scope here. The @Ignore annotation documents the known issue so it can be tracked and fixed separately. Also worth noting that this trie code is not yet enabled in production, so the impact is limited for now.
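The short-circuit idea can be illustrated on a toy structure. The sketch below is not TrieImpl and does not reproduce the root-hash bug; `CachedHashStore` is a hypothetical map-backed store with an invalidation counter, used only to show the guard: a put with an unchanged value returns before any cached state is invalidated.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy store; stands in for a structure that caches a derived hash. */
final class CachedHashStore {
  private final Map<String, byte[]> values = new HashMap<>();
  private int invalidations = 0;

  /** Stores the value, invalidating cached state only if something actually changed. */
  void put(String key, byte[] value) {
    byte[] existing = values.get(key);
    if (existing != null && Arrays.equals(existing, value)) {
      return; // same key, same value: no change, so no invalidation
    }
    values.put(key, value.clone());
    invalidations++; // a real implementation would mark cached hashes dirty here
  }

  /** Number of times a put caused cached state to be invalidated. */
  int invalidations() {
    return invalidations;
  }
}
```

Under this guard, repeating an identical put is a no-op, so the order in which duplicates arrive cannot affect any state derived from the stored values.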

@3for

3for commented Mar 18, 2026

Running ./gradlew :framework:testWithRocksDb on ARM64 causes testLevelDb to fail. It would be better to add assumeLevelDbAvailable(); at the beginning of testLevelDb to skip this test when LevelDB is not available on the platform.

org.tron.core.db.DBIteratorTest > testLevelDb FAILED
    java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [java.lang.LinkageError: Unable to load library leveldbjni64-1.8, java.lang.LinkageError: Unable to load library leveldbjni-1.8, java.lang.LinkageError: Unable to load library leveldbjni, java.lang.LinkageError: Unable to load library from /var/folders/p5/bmmxtx5x14j_6znwywzj09w40000gq/T/libleveldbjni-1.8-7667426779319649693.jnilib, java.lang.LinkageError: Unable to load library from /Users/***/.hawtjni/leveldbjni/libleveldbjni-1.8-2478008178187835858.jnilib]
        at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:204)
        at org.fusesource.hawtjni.runtime.Library.load(Library.java:156)
        at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
        at org.tron.core.db.DBIteratorTest.testLevelDb(DBIteratorTest.java:35)

2315 tests completed, 6 failed, 28 skipped

> Task :framework:testWithRocksDb FAILED

FAILURE: Build failed with an exception.

Add assumeLevelDbAvailable() to DBIteratorTest.testLevelDb to skip
when LevelDB JNI is unavailable on ARM64 platforms.

Co-Authored-By: 3for <zouyudi@gmail.com>
@vividctrlalt
Contributor Author

Thanks @3for for catching this! Fixed in d57d110 — added assumeLevelDbAvailable() to DBIteratorTest.testLevelDb so it skips on ARM64 where LevelDB JNI is unavailable.
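How assumeLevelDbAvailable() is implemented is not shown in this thread. One plausible shape, sketched here with hypothetical names (`NativeProbe`, `isAvailable`), is to attempt the native load once and treat any LinkageError as "not available"; the stack trace above shows the failure surfacing as UnsatisfiedLinkError, which is a subclass of LinkageError.

```java
/** Hypothetical probe for an optional native dependency. */
final class NativeProbe {
  /**
   * Runs the given loader and reports whether it completed without a linkage failure.
   * The loader would typically touch the class whose static initializer loads the JNI library.
   */
  static boolean isAvailable(Runnable loader) {
    try {
      loader.run();
      return true;
    } catch (LinkageError e) {
      // Covers UnsatisfiedLinkError on the first attempt and
      // NoClassDefFoundError on later attempts after a failed static init.
      return false;
    }
  }
}
```

In a JUnit 4 test, the boolean result could then be passed to org.junit.Assume.assumeTrue so the test is reported as skipped rather than failed on platforms without the library.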

On ARM64, LevelDB is not available at build time. Instead of throwing
an exception that breaks default config startup, warn and override
to RocksDB. Also uses equalsIgnoreCase for consistency.
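The warn-and-override behavior described in this commit message can be sketched as below. It is an illustration under stated assumptions, not the committed code: `EngineFallback` and `effectiveEngine` are hypothetical names, the architecture flag is a parameter rather than a call to Arch.isArm64(), and java.util.logging stands in for the project's logger.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of the ARM64 engine fallback. */
final class EngineFallback {
  private static final Logger LOG = Logger.getLogger("EngineFallback");

  /**
   * Returns the engine to use. On ARM64 any engine other than RocksDB
   * is replaced by "ROCKSDB" with a warning; elsewhere the configured value is kept.
   */
  static String effectiveEngine(boolean isArm64, String configured) {
    if (isArm64 && !"ROCKSDB".equalsIgnoreCase(configured)) {
      LOG.warning("ARM64 only supports RocksDB; overriding configured engine '"
          + configured + "' with ROCKSDB");
      return "ROCKSDB";
    }
    return configured;
  }
}
```

Compared with the fail-fast approach argued for earlier in the thread, this keeps the default config.conf bootable on ARM64 at the cost of the node running an engine other than the one written in the config, which is why the warning matters.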

Development

Successfully merging this pull request may close these issues.

[Discussion] Should aarch64 throw an exception when db.engine=LEVELDB?

3 participants