Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Tracker #5082

Merged
merged 32 commits into from
Dec 28, 2022
Merged

Memory Tracker #5082

merged 32 commits into from
Dec 28, 2022

Conversation

codesigner
Copy link
Contributor

@codesigner codesigner commented Dec 21, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

#5081

Description:

UPDATE: MemoryTracker is only turn on when jemalloc in enabled. because default malloc cannot get accurate size of free(ptr) (see untrackMemory in Memory.h, malloc_usable_size() is not accurate);

override the default c++ new/delete operator, the new added new/delete function do the memory quota counting before do the real alloc/free

  • NewDelete.cpp take over the c++ global new/free operators, hand over control to Memory.h;
  • Memory.h defined the tracking/alloc/dealloc routines
    • using jemalloc to get the alignment and sizing staff;
    • use MemoryTracker to track the memory usage;
  • MemoryTracker.cpp
    • A thread local MemoryStats to make thread can reserve some memory in local, avoid get memory from global on every (de)alloc;
    • A global used memory counter and A set-able limit, throw bad_alloc when used exceeded limit;
  • Graph Executors
    • added thenError callback to capture the bad_alloc exception, wrap error in status, making the query hit the limit fail;
  • Storage Processors
    • added try catch, to catch exception in singleThread, (FLAGS_query_concurrently=false )
    • add thenError, to be able failed in multiple Thread(FLAGS_query_concurrently=true)

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A
  • Mannual

Test

  • DataSet: LDBC sf1, set cgroup to limit graphd to 500Mb, or storaged to 300Mb to let graphd/storaged hit the memory limit
  • Query match (v:Person)-[e*3]->(x) where id(v)==933 return x
  • Env: graphd, storaged's max memory controled by separate cgroup
  • Before: result in graphd/storaged killed by system's OOM killer,
  • After:
    • query failed with error message
(root@nebula) [sf1]> match (v:Person)-[e*3]->(x) where id(v)==933 return x;
[ERROR (-1005)]: Graph Error: GRAPH_MEMORY_EXCEEDED(-2600)

(root@nebula) [sf1]> match (v:Person)-[e*3]->(x) where id(v)==933 return x;
[ERROR (-1005)]: Storage Error: STORAGE_MEMORY_EXCEEDED(-3600)
  • graphd/storaged not killed,

  • Debug LOG indicate it do not leak memory

  • graphd memory stats log

I20221223 17:13:10.113437 694469 MemoryUtils.cpp:135]  sys_used: 22601728 sys_total: 524288000 sys_ratio:0.0431094 usr_used:53477376 usr_total:419430400 usr_ratio:0.1275
I20221223 17:13:11.110716 694469 MemoryUtils.cpp:135]  sys_used: 21544960 sys_total: 524288000 sys_ratio:0.0410938 usr_used:53477376 usr_total:419430400 usr_ratio:0.1275
I20221223 17:13:12.111212 694469 MemoryUtils.cpp:135]  sys_used: 35426304 sys_total: 524288000 sys_ratio:0.0675703 usr_used:61865984 usr_total:419430400 usr_ratio:0.1475
I20221223 17:13:13.111642 694469 MemoryUtils.cpp:135]  sys_used: 35430400 sys_total: 524288000 sys_ratio:0.0675781 usr_used:61865984 usr_total:419430400 usr_ratio:0.1475
I20221223 17:13:14.112054 694469 MemoryUtils.cpp:135]  sys_used: 35426304 sys_total: 524288000 sys_ratio:0.0675703 usr_used:61865984 usr_total:419430400 usr_ratio:0.1475
I20221223 17:13:15.108489 694469 MemoryUtils.cpp:135]  sys_used: 35426304 sys_total: 524288000 sys_ratio:0.0675703 usr_used:61865984 usr_total:419430400 usr_ratio:0.1475
I20221223 17:13:16.108870 694469 MemoryUtils.cpp:135]  sys_used: 35426304 sys_total: 524288000 sys_ratio:0.0675703 usr_used:61865984 usr_total:419430400 usr_ratio:0.1475
I20221223 17:13:17.109252 694469 MemoryUtils.cpp:135]  sys_used: 142598144 sys_total: 524288000 sys_ratio:0.271984 usr_used:123731968 usr_total:419430400 usr_ratio:0.295
I20221223 17:13:18.109655 694469 MemoryUtils.cpp:135]  sys_used: 264663040 sys_total: 524288000 sys_ratio:0.504805 usr_used:278921216 usr_total:419430400 usr_ratio:0.665
E20221223 17:13:18.984743 694227 QueryInstance.cpp:144] Graph Error: GRAPH_MEMORY_EXCEEDED(-2600)
I20221223 17:13:19.131322 694469 MemoryUtils.cpp:135]  sys_used: 406847488 sys_total: 524288000 sys_ratio:0.776 usr_used:53477376 usr_total:419430400 usr_ratio:0.1275
I20221223 17:13:20.132285 694469 MemoryUtils.cpp:135]  sys_used: 40009728 sys_total: 524288000 sys_ratio:0.0763125 usr_used:53477376 usr_total:419430400 usr_ratio:0.1275
I20221223 17:13:21.113485 694469 MemoryUtils.cpp:135]  sys_used: 29794304 sys_total: 524288000 sys_ratio:0.0568281 usr_used:53477376 usr_total:419430400 usr_ratio:0.1275
I20221223 17:13:22.110652 694469 MemoryUtils.cpp:135]  sys_used: 29667328 sys_total: 524288000 sys_ratio:0.0565859 usr_used:53477376 usr_total:419430400 usr_ratio:0.1275

  • storaged memory stats log
    storaged has a FLAGS_query_concurrently, will run different code path, both tested with FLAGS_query_concurrently set to false and true
I20221223 16:11:21.265054 694559 MemoryUtils.cpp:135]  sys_used: 108347392 sys_total: 314572800 sys_ratio:0.344427 usr_used:154830224 usr_total:251658240 usr_ratio:0.61524
I20221223 16:11:22.266657 694559 MemoryUtils.cpp:135]  sys_used: 108355584 sys_total: 314572800 sys_ratio:0.344453 usr_used:154830224 usr_total:251658240 usr_ratio:0.61524
I20221223 16:11:23.266523 694559 MemoryUtils.cpp:135]  sys_used: 33931264 sys_total: 314572800 sys_ratio:0.107865 usr_used:155878800 usr_total:251658240 usr_ratio:0.619407
I20221223 16:11:24.265806 694559 MemoryUtils.cpp:135]  sys_used: 84860928 sys_total: 314572800 sys_ratio:0.269766 usr_used:228230544 usr_total:251658240 usr_ratio:0.906907
I20221223 16:11:25.265872 694559 MemoryUtils.cpp:135]  sys_used: 107819008 sys_total: 314572800 sys_ratio:0.342747 usr_used:154830224 usr_total:251658240 usr_ratio:0.61524
I20221223 16:11:26.265977 694559 MemoryUtils.cpp:135]  sys_used: 107679744 sys_total: 314572800 sys_ratio:0.342305 usr_used:154830224 usr_total:251658240 usr_ratio:0.61524
I20221223 16:11:27.266279 694559 MemoryUtils.cpp:135]  sys_used: 107683840 sys_total: 314572800 sys_ratio:0.342318 usr_used:154830224 usr_total:251658240 usr_ratio:0.61524

Performance

E2E test

  • Test a query to verify how mush the latency increase, with memtracker, average latency increase 1.429839%, (just a glance, may varies on different queries)
  • Query: match (v:Person)-[e*3]->(x) where id(v)==933 return max(x);
  master(s) master + memtracker (s)  increasement(%)
  4.748618 4.732939  
  4.696424 4.834848  
  4.744301 4.838668  
  4.78011 4.829482  
  4.706159 4.808589  
  4.872129 4.896602  
  4.798863 4.863956  
  4.704617 4.827797  
  4.706771 4.79206  
  4.762867 4.77539  
avg 4.7520859 4.8200331 1.429839%

FlameGraph

the percentile of in MemoryTracker's allocImpl() and free() is trival, I think the cache miss introduce by atomic member (which cannot be revealed by FlameGraph) is the major reason of slowness.

  • GraphD
    perf_graphd2

    • alloc (0.08%)
      Screenshot from 2022-12-26 13-41-19
    • free (0.04%)
      Screenshot from 2022-12-26 13-40-52
  • StorageD
    perf_storaged

    • alloc
      can not find, maybe too trivial
    • free (0.1%)
      Screenshot from 2022-12-26 13-50-17

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

@codecov-commenter
Copy link

codecov-commenter commented Dec 21, 2022

Codecov Report

Base: 77.36% // Head: 77.47% // Increases project coverage by +0.11% 🎉

Coverage data is based on head (834dad6) compared to base (011f2fa).
Patch coverage: 47.86% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5082      +/-   ##
==========================================
+ Coverage   77.36%   77.47%   +0.11%     
==========================================
  Files        1105     1110       +5     
  Lines       82188    82917     +729     
==========================================
+ Hits        63586    64242     +656     
- Misses      18602    18675      +73     
Impacted Files Coverage Δ
src/clients/meta/MetaClient.h 92.30% <ø> (ø)
src/common/memory/MemoryTracker.cpp 0.00% <0.00%> (ø)
src/common/meta/SchemaManager.h 50.00% <ø> (ø)
src/common/meta/ServerBasedSchemaManager.h 100.00% <ø> (ø)
src/graph/executor/Executor.h 41.02% <0.00%> (-8.98%) ⬇️
src/graph/executor/StorageAccessExecutor.h 55.55% <0.00%> (-3.54%) ⬇️
src/graph/executor/algo/ShortestPathBase.cpp 46.15% <0.00%> (-0.73%) ⬇️
src/graph/executor/algo/ShortestPathBase.h 50.00% <ø> (ø)
src/graph/optimizer/OptimizerUtils.cpp 94.82% <ø> (-0.03%) ⬇️
src/graph/optimizer/rule/IndexScanRule.cpp 88.13% <ø> (+0.11%) ⬆️
... and 134 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -65,6 +66,14 @@ folly::Future<Status> SubgraphExecutor::getNeighbors() {
}
vids_.clear();
return handleResponse(std::move(resp));
})
.thenError(folly::tag_t<std::bad_alloc>{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about other Executor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about other Executor?

I am trying add more check, including storaged's execution path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we catch all bad allocation exception in one place like Scheduler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we catch all bad allocation exception in one place like Scheduler?

I tried, but experimented show that exception occured in folly's Executor can be catched in thenError, but outside seems not able to catch it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no conflict. As I known, Scheduler will call execute function of each Executor and got the Future of execution. So, we could process bad allocation exception from this Future.
WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no conflict. As I known, Scheduler will call execute function of each Executor and got the Future of execution. So, we could process bad allocation exception from this Future. WDYT?

Yes, no conflict, I have also added catch in Scheduler, see Scheduler.cpp

@codesigner codesigner requested a review from a team as a code owner December 22, 2022 13:34
Comment on lines 70 to 72
memoryMonitorThread_ = std::make_unique<thread::GenericWorker>();
if (!memoryMonitorThread_ || !memoryMonitorThread_->start("graph-memory-monitor")) {
return Status::Error("Fail to start query engine background thread.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query engine? graph-memory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query engine? graph-memory?

good catch

Comment on lines 32 to 59
try {
if (req.common_ref().has_value() && req.get_common()->profile_detail_ref().value_or(false)) {
profileDetailFlag_ = true;
profileDetail("GetDstBySrcProcessorTotal", 0);
profileDetail("GetDstBySrcProcessorDedup", 0);
}

spaceId_ = req.get_space_id();
auto retCode = getSpaceVidLen(spaceId_);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
spaceId_ = req.get_space_id();
auto retCode = getSpaceVidLen(spaceId_);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
}
onFinished();
return;
}
onFinished();
return;
}
this->planContext_ = std::make_unique<PlanContext>(
this->env_, spaceId_, this->spaceVidLen_, this->isIntId_, req.common_ref());

// check edgetypes exists
retCode = checkAndBuildContexts(req);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
this->planContext_ = std::make_unique<PlanContext>(
this->env_, spaceId_, this->spaceVidLen_, this->isIntId_, req.common_ref());

// check edgetypes exists
retCode = checkAndBuildContexts(req);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
}
onFinished();
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try-catch it outside the function doProcess in GraphStorageServiceHandler.cpp?

try {
  doProcess()
} catch xxx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try-catch it outside the function doProcess in GraphStorageServiceHandler.cpp?

try {
  doProcess()
} catch xxx

The call stack is like this, xxxProcessor.process is called from folly
Screenshot from 2022-12-22 13-13-17

@Sophie-Xie Sophie-Xie added the priority/hi-pri Priority: high label Dec 26, 2022
…valable, depends on jemalloc to get accurate free size
critical27
critical27 previously approved these changes Dec 27, 2022
Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!

dutor
dutor previously approved these changes Dec 27, 2022
Copy link
Contributor

@dutor dutor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me mostly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/hi-pri Priority: high ready for review ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants