feat: add get-db-stats command #3274

Merged
merged 3 commits into from Sep 2, 2021

@sdbondi sdbondi commented Aug 31, 2021

Description

Adds get-db-stats command. This returns the LMDB entry stats and
the total entry sizes for each internal blockchain db.

Motivation and Context

Useful for debugging database sizes. Example output at height 26215:

>> get-db-stats
Name                              | Entries | Depth | Branch Pages | Leaf Pages | Overflow Pages
--------------------------------- | ------- | ----- | ------------ | ---------- | --------------
metadata_db                       | 5       | 1     | 0            | 1          | 23
headers_db                        | 26218   | 3     | 19           | 4057       | 0
header_accumulated_data_db        | 26218   | 3     | 6            | 1010       | 0
block_accumulated_data_db         | 26218   | 3     | 99           | 21775      | 1087
block_hashes_db                   | 26218   | 3     | 9            | 468        | 0
utxos_db                          | 747509  | 5     | 15022        | 373736     | 0
inputs_db                         | 560784  | 5     | 5928         | 99085      | 0
txos_hash_to_index_db             | 747509  | 4     | 533          | 35242      | 0
kernels_db                        | 262572  | 5     | 2556         | 39712      | 0
kernel_excess_index               | 262572  | 4     | 172          | 11999      | 0
kernel_excess_sig_index           | 262572  | 4     | 432          | 15393      | 0
kernel_mmr_size_index             | 26218   | 2     | 1            | 170        | 0
output_mmr_size_index             | 26218   | 3     | 3            | 438        | 0
utxo_commitment_index             | 186725  | 4     | 119          | 6889       | 0
orphans_db                        | 720     | 3     | 9            | 554        | 1674
orphan_header_accumulated_data_db | 718     | 2     | 1            | 47         | 0
monero_seed_height_db             | 1       | 1     | 0            | 1          | 0
orphan_chain_tips_db              | 16      | 1     | 0            | 1          | 0
orphan_parent_map_index           | 720     | 2     | 1            | 22         | 0

19 databases, page size: 4096 bytes

Totalling DB entry sizes. This may take a few seconds...

>>
Name                              | Entries | Total Size | Avg. Size/Entry | % of total
--------------------------------- | ------- | ---------- | --------------- | ----------
metadata_db                       | 5       | 0.09 MiB   | 18859 bytes     | 0.01%
headers_db                        | 26218   | 12.34 MiB  | 493 bytes       | 0.90%
header_accumulated_data_db        | 26218   | 3.40 MiB   | 136 bytes       | 0.25%
block_accumulated_data_db         | 26218   | 39.25 MiB  | 1569 bytes      | 2.86%
block_hashes_db                   | 26218   | 1.00 MiB   | 40 bytes        | 0.07%
utxos_db                          | 747509  | 789.72 MiB | 1107 bytes      | 57.51%
inputs_db                         | 560784  | 263.80 MiB | 493 bytes       | 19.21%
txos_hash_to_index_db             | 747509  | 84.83 MiB  | 119 bytes       | 6.18%
kernels_db                        | 262572  | 90.40 MiB  | 361 bytes       | 6.58%
kernel_excess_index               | 262572  | 29.05 MiB  | 116 bytes       | 2.12%
kernel_excess_sig_index           | 262572  | 37.06 MiB  | 148 bytes       | 2.70%
kernel_mmr_size_index             | 26218   | 0.40 MiB   | 16 bytes        | 0.03%
output_mmr_size_index             | 26218   | 1.40 MiB   | 56 bytes        | 0.10%
utxo_commitment_index             | 186725  | 12.82 MiB  | 72 bytes        | 0.93%
orphans_db                        | 720     | 7.51 MiB   | 10932 bytes     | 0.55%
orphan_header_accumulated_data_db | 718     | 0.11 MiB   | 160 bytes       | 0.01%
monero_seed_height_db             | 1       | 0.00 MiB   | 40 bytes        | 0.00%
orphan_chain_tips_db              | 16      | 0.00 MiB   | 72 bytes        | 0.00%
orphan_parent_map_index           | 720     | 0.05 MiB   | 72 bytes        | 0.00%

Total data size: 1373.23 MiB
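The two tables can be cross-checked against each other: every branch, leaf and overflow page occupies one full page, so the page counts give the space a database takes up in the file, while the second table gives the bytes of entry data stored in it. The sketch below does that arithmetic for the `utxos_db` row. The function names are illustrative and are not part of the command's implementation.

```rust
/// Bytes occupied by a database's pages: each branch, leaf and overflow
/// page takes one full page, whether or not it is completely filled.
fn page_footprint_bytes(branch: u64, leaf: u64, overflow: u64, page_size: u64) -> u64 {
    (branch + leaf + overflow) * page_size
}

/// Converts a byte count to MiB for display.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

fn main() {
    // utxos_db row of the first table: 15022 branch, 373736 leaf, 0 overflow pages.
    let footprint = page_footprint_bytes(15_022, 373_736, 0, 4096);
    // Total entry size reported for utxos_db in the second table.
    let entries_mib = 789.72;

    println!("page footprint: {:.2} MiB", to_mib(footprint));
    println!("entry data:     {:.2} MiB", entries_mib);
    println!("utilisation:    {:.0}%", 100.0 * entries_mib / to_mib(footprint));
}
```

For this row the entry data comes to roughly half of the page footprint, which is the kind of gap between reported size and file size that the stats are meant to expose.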

How Has This Been Tested?

Manually by running get-db-stats

@sdbondi sdbondi force-pushed the core-lmdb-stats branch 3 times, most recently from 783654f to e03d4d9 Compare August 31, 2021 12:54
@sdbondi sdbondi changed the title feat: add get-blockchain-db-stats command feat: add get-db-stats command Aug 31, 2021
@sdbondi sdbondi force-pushed the core-lmdb-stats branch 4 times, most recently from fdbb8d7 to c2b1e10 Compare August 31, 2021 13:02
stringhandler
stringhandler previously approved these changes Aug 31, 2021

@stringhandler stringhandler left a comment

Awesome

SWvheerden
SWvheerden previously approved these changes Aug 31, 2021

@SWvheerden SWvheerden left a comment

LGTM, ran locally, and it worked.
There is just a large discrepancy between the reported size and the size on disk.
The command reported 1.4 GB.
Size on disk (OSX): 2.7 GB.
Could this just be an LMDB allocation issue?

@sdbondi
Member Author

sdbondi commented Aug 31, 2021

Yeah, I saw the same. There would be some overhead for the B-trees and some "wasted" bytes (glossing over some details of LMDB overflow pages), but I can't imagine there would be this much overhead. While pruned syncing, I'm observing that the real size is almost always double the actual used data size. I've added additional info that the resizer uses.

@sdbondi sdbondi force-pushed the core-lmdb-stats branch 8 times, most recently from a001e6d to 1cf2e2a Compare August 31, 2021 14:50
Adds `get-db-stats` command. This returns the LMDB entry stats and
the total entry sizes for each internal blockchain db.
@sdbondi sdbondi force-pushed the core-lmdb-stats branch 4 times, most recently from b6bcb69 to 37bb8a4 Compare September 2, 2021 06:09
@sdbondi
Member Author

sdbondi commented Sep 2, 2021

UPDATE:

Resizing changed in this PR:

Previous approach: on every write, call mdb_env_stat and mdb_env_info to determine if a resize is needed. Pros: simple. Cons: slightly (not measured) decreases write performance.

This approach: Monero's approach. If a write returns an MDB_MAP_FULL error, resize and retry the transaction. Pros: no write overhead, potentially more robust. Cons: DbTransactions must be "retryable" (in practice, don't consume the operations; borrow them instead).
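A minimal sketch of the resize-and-retry pattern described above. It runs against a hypothetical in-memory `Store` rather than a real LMDB environment: `Store`, `StoreError::MapFull`, `write_with_retry` and the `max_resizes` cap are all invented for illustration and are not taken from this PR. The point it shows is why the operations have to be borrowed: the same slice is replayed after the map has grown.

```rust
#[derive(Debug, PartialEq)]
enum StoreError {
    /// Stand-in for LMDB reporting that the map is full.
    MapFull,
}

/// Hypothetical stand-in for an LMDB environment with a fixed map size.
struct Store {
    map_size: usize,
    used: usize,
    data: Vec<Vec<u8>>,
}

impl Store {
    fn new(map_size: usize) -> Self {
        Store { map_size, used: 0, data: Vec::new() }
    }

    /// Applies all operations, or fails with MapFull and leaves the store unchanged.
    fn write(&mut self, ops: &[Vec<u8>]) -> Result<(), StoreError> {
        let needed: usize = ops.iter().map(|v| v.len()).sum();
        if self.used + needed > self.map_size {
            return Err(StoreError::MapFull);
        }
        self.data.extend(ops.iter().cloned());
        self.used += needed;
        Ok(())
    }

    fn grow(&mut self, by: usize) {
        self.map_size += by;
    }
}

/// Borrows `ops` so the same transaction can be replayed after a resize.
/// Returns the number of resizes that were needed.
fn write_with_retry(
    store: &mut Store,
    ops: &[Vec<u8>],
    grow_by: usize,
    max_resizes: usize,
) -> Result<usize, StoreError> {
    let mut resizes = 0;
    loop {
        match store.write(ops) {
            Ok(()) => return Ok(resizes),
            Err(StoreError::MapFull) if resizes < max_resizes => {
                store.grow(grow_by);
                resizes += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut store = Store::new(8);
    // 12 bytes of writes against an 8-byte map forces one resize.
    let ops = vec![vec![0u8; 6], vec![1u8; 6]];
    let resizes = write_with_retry(&mut store, &ops, 4, 10).unwrap();
    println!("committed after {} resize(s), map size now {}", resizes, store.map_size);
}
```

The happy path pays nothing extra: a write that fits goes straight through, and the resize logic only runs once a write has already failed.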

Tested on syncing base node with small initial size
This change is in another commit so can be reversed if there are strong feelings against it.

Tested manually with config, left overnight with at least one db resize occurring during block propagation

db_init_size_mb = 500
db_grow_size_mb = 100
db_resize_threshold_mb = 99
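One plausible reading of these three settings, sketched as a proactive check in the style of the previous approach: start the map at the initial size, and grow it by the grow size whenever the free space falls to or below the threshold. This interpretation is an assumption, as are the `ResizeConfig` type, the `next_map_size` function and the treatment of "mb" as MiB; the thread does not spell out how the resizer uses the values.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical mirror of the three settings above, held in bytes.
struct ResizeConfig {
    init_size: u64,
    grow_size: u64,
    resize_threshold: u64,
}

impl ResizeConfig {
    fn from_mb(init_mb: u64, grow_mb: u64, threshold_mb: u64) -> Self {
        ResizeConfig {
            init_size: init_mb * MIB,
            grow_size: grow_mb * MIB,
            resize_threshold: threshold_mb * MIB,
        }
    }
}

/// Returns the new map size if free space has dropped to the threshold
/// or below, otherwise None. (Assumed semantics, see the note above.)
fn next_map_size(cfg: &ResizeConfig, map_size: u64, used: u64) -> Option<u64> {
    let free = map_size.saturating_sub(used);
    if free <= cfg.resize_threshold {
        Some(map_size + cfg.grow_size)
    } else {
        None
    }
}

fn main() {
    let cfg = ResizeConfig::from_mb(500, 100, 99);
    let map_size = cfg.init_size;
    for &used_mb in &[350u64, 402] {
        match next_map_size(&cfg, map_size, used_mb * MIB) {
            Some(new_size) => println!("{} MiB used: grow map to {} MiB", used_mb, new_size / MIB),
            None => println!("{} MiB used: no resize needed", used_mb),
        }
    }
}
```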

@sdbondi
Member Author

sdbondi commented Sep 2, 2021

Regarding DB Size:

At our current height, the actual stored blockchain data takes only around 55% of the total bytes used by LMDB pages. Overflow pages are 4096 bytes each (like every other page) and hold "overflow" data bigger than one page. We are seeing a lot of this data in block_accumulated_data_db which holds the deleted bitmap for a block.
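The cost of an overflow value can be estimated with a short calculation. The sketch below assumes a 16-byte page header in front of the value data, which is what LMDB uses on common 64-bit builds; treat that constant and the function names as assumptions rather than values taken from this PR.

```rust
/// Assumed size of the LMDB page header in bytes.
const PAGE_HEADER_BYTES: usize = 16;

/// Number of whole pages needed to hold one overflow value.
fn overflow_pages(value_len: usize, page_size: usize) -> usize {
    (PAGE_HEADER_BYTES + value_len + page_size - 1) / page_size
}

/// Bytes allocated for the value but not holding value data.
fn wasted_bytes(value_len: usize, page_size: usize) -> usize {
    overflow_pages(value_len, page_size) * page_size - value_len
}

fn main() {
    // 4976 bytes is the serialized bitmap size measured in this thread.
    for &len in &[4080usize, 4081, 4976] {
        println!(
            "{} bytes -> {} page(s), {} bytes not holding data",
            len,
            overflow_pages(len, 4096),
            wasted_bytes(len, 4096)
        );
    }
}
```

A value only slightly larger than one page still consumes two full pages, which is how many medium-sized values can push the file size well above the sum of the entry sizes.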

Comparing the serialized size of a bitmap against a Vec, after adding the same 500 random u32s to each, gives the following.

Output:

bitmap = 4976 (This would create an extra overflow page, meaning a total of 4096 + 4096 bytes used to store this value)
vector = 2008 (4 * 500 + 8 byte (usize) length)

Perhaps we should use a simple Vec for accumulated data

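// Crates assumed by this snippet: croaring (Bitmap), rand (OsRng), bincode.
use croaring::Bitmap;
use rand::{rngs::OsRng, RngCore};

let mut bm = Bitmap::create();
let mut v = Vec::new();
// Insert the same 500 random u32s into both containers.
for _ in 0..500 {
    let n = OsRng.next_u32();
    bm.add(n);
    v.push(n);
}
// Serialize before and after run_optimize() to confirm it makes no difference here.
let b = bm.serialize();
eprintln!("bm.run_optimize() = {:?}", bm.run_optimize());
let b2 = bm.serialize();
assert_eq!(b.len(), b2.len());
eprintln!("bitmap = {:?}", b.len());

// bincode writes an 8-byte length followed by 4 bytes per u32.
let b = bincode::serialize(&v).unwrap();
eprintln!("vector = {:?}", b.len());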
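The 2008-byte figure for the vector can be reproduced without any crates. The sketch below is a hand-rolled, fixed-width encoding that matches the "4 * 500 + 8 byte length" breakdown quoted above. It is not the bincode crate itself, and `encode_u32s` and `decode_u32s` are illustrative names.

```rust
/// Encodes a slice of u32s as an 8-byte little-endian length
/// followed by 4 little-endian bytes per element.
fn encode_u32s(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 * values.len());
    out.extend_from_slice(&(values.len() as u64).to_le_bytes());
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decodes the format written by `encode_u32s`; returns None on malformed input.
fn decode_u32s(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() < 8 {
        return None;
    }
    let (head, rest) = bytes.split_at(8);
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(head);
    let len = u64::from_le_bytes(len_bytes) as usize;
    if rest.len() != len.checked_mul(4)? {
        return None;
    }
    Some(
        rest.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

fn main() {
    // Deterministic stand-in for the 500 random u32s used above.
    let values: Vec<u32> = (0..500u32).map(|i| i.wrapping_mul(2_654_435_761)).collect();
    let encoded = encode_u32s(&values);
    println!("vector = {}", encoded.len());
    assert_eq!(decode_u32s(&encoded), Some(values));
}
```

With this layout the size depends only on the element count, so it stays predictable for sparse, randomly distributed indices where the bitmap serialization came out larger.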

@stringhandler
Collaborator

Interesting. Given that the aim of MW is primarily to reduce DB size, I would suggest storing it as a Vec then.

Resize every time the database indicates it has run out of space rather
than checking for each and every write.
@aviator-app aviator-app bot removed the mq-failed label Sep 2, 2021
@aviator-app aviator-app bot merged commit d785f4f into tari-project:development Sep 2, 2021
@sdbondi sdbondi deleted the core-lmdb-stats branch September 5, 2021 06:26