Skip to content

[BugFix] fix cloud native pk index memory statistic leak #60566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

luohaha
Copy link
Contributor

@luohaha luohaha commented Jul 3, 2025

Why I'm doing:

In the current cloud-native PK index memtable implementation, we only account for the memory usage of keys and values. However, since we're using a B-tree for storage, the memory consumption of the B-tree's internal nodes is not being accurately tracked.
The B-tree provides a bytes_used interface to retrieve its total memory usage. However, since keys use std::string, the heap memory allocated by these strings isn't included in the bytes_used count. We need to implement additional tracking for this memory.

What I'm doing:

This pull request refactors the PersistentIndexMemtable implementation in be/src/storage/lake/persistent_index_memtable.cpp to improve memory tracking and efficiency. The changes include replacing std::string with std::string_view for key handling, introducing a new method to check heap allocation for strings, and updating memory usage calculations.

Refactoring for Memory Efficiency:

  • Replaced std::string with std::string_view for keys in methods like upsert, insert, erase, erase_with_filter, replace, and get to reduce unnecessary string allocations. ([[1]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L33-R38), [[2]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L54-R65), [[3]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L87-R91), [[4]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L113-R115), [[5]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L130-R135), [[6]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L179-R180))
  • Added a new function is_string_heap_allocated in be/src/util/string_util.cpp to determine whether a std::string uses heap memory, and updated _keys_heap_size calculations accordingly. ([[1]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-58984ee0cc1ce8771537ac8af37a9bb3447abc880f413364470f5bbfc989a265R51-R58), [[2]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-987ba58807d358f61be1b8ea0c619122113532bdb5b574f922986a6a79eada42R79))
  • Modified PersistentIndexMemtable::memory_usage() to use _keys_heap_size and _map.bytes_used() for more accurate memory usage tracking. ([be/src/storage/lake/persistent_index_memtable.cppL179-R180](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728L179-R180))
  • Renamed _keys_size to _keys_heap_size and reset it in the clear() method to ensure proper memory management. ([[1]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-0fa7a9fa792819b27a25ea176c8c8154af0ed664f12dfd466c8ed781e304b2aeL81-R81), [[2]](https://github.com/StarRocks/starrocks/pull/60566/files#diff-2a0d7738bf81a53d3c0561225e77a7ddfb6f5aec5f831a03c4b0371715c04728R189))

Testing Enhancements:

  • Added a unit test test_is_string_heap_allocated in be/test/util/string_util_test.cpp to verify the behavior of the is_string_heap_allocated function for both small and large strings. ([be/test/util/string_util_test.cppR107-R115](https://github.com/StarRocks/starrocks/pull/60566/files#diff-ca9144c8e70f0db720811da5c0d9413dc7a706795fc7ae268380b3cd55f669b0R107-R115))

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.5
    • 3.4
    • 3.3

Signed-off-by: luohaha <18810541851@163.com>
@luohaha luohaha requested review from a team as code owners July 3, 2025 08:44
@mergify mergify bot assigned luohaha Jul 3, 2025
luohaha added 2 commits July 3, 2025 16:46
Signed-off-by: luohaha <18810541851@163.com>
Signed-off-by: luohaha <18810541851@163.com>
@kevincai kevincai requested a review from Copilot July 4, 2025 04:22
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors PersistentIndexMemtable to accurately track heap memory usage for keys by switching from std::string to std::string_view, adding a helper to detect heap-allocated strings, and updating memory accounting and tests.

  • Replace std::string conversions with std::string_view in all mutating and lookup methods to avoid unnecessary allocations.
  • Introduce is_string_heap_allocated to detect when a std::string uses heap memory and update _keys_heap_size accordingly.
  • Update memory_usage() and clear() to include heap key sizes, and add unit tests for is_string_heap_allocated.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
be/src/util/string_util.h Declare is_string_heap_allocated helper
be/src/util/string_util.cpp Define heap-allocation detection logic
be/test/util/string_util_test.cpp Add unit test for is_string_heap_allocated
be/src/storage/lake/persistent_index_memtable.h Rename _keys_size to _keys_heap_size
be/src/storage/lake/persistent_index_memtable.cpp Use string_view, update heap-size tracking, memory_usage(), and clear()

Signed-off-by: luohaha <18810541851@163.com>
Copy link

github-actions bot commented Jul 4, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Jul 4, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Jul 4, 2025

[BE Incremental Coverage Report]

pass : 19 / 20 (95.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/storage/lake/persistent_index_memtable.cpp 15 16 93.75% [165]
🔵 be/src/util/string_util.cpp 4 4 100.00% []

@luohaha luohaha enabled auto-merge (squash) July 4, 2025 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants