New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Collect end_to_end traces should only request traces if there is a parent trace. #19497
Closed
1 task done
Tracked by
#19626
Labels
2.14 Backport Required
2.16 Backport Required
2.18 Backport Required
2.20 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/high
High Priority
Comments
amitanandaiyer
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Oct 12, 2023
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
and removed
status/awaiting-triage
Issue awaiting triage
labels
Oct 12, 2023
yugabyte-ci
added
priority/high
High Priority
and removed
priority/medium
Medium priority issue
labels
Oct 12, 2023
amitanandaiyer
added a commit
that referenced
this issue
Oct 17, 2023
… non-null Summary: `collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc. With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging. Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced. Memory footprint: Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory. Jira: DB-8294 Test Plan: ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 147 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 145 ``` With-change: ``` 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 9 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 0 ``` ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 591 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 139 ``` With-change: ``` 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 441 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 14 ``` Reviewers: rthallam, esheng Reviewed By: rthallam Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D29266
amitanandaiyer
added a commit
that referenced
this issue
Oct 18, 2023
…parent trace is non-null Summary: Original commit: a41c17d / D29266 `collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc. With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging. Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced. Memory footprint: Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory. Jira: DB-8294 Test Plan: ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 147 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 145 ``` With-change: ``` 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 9 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 0 ``` ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 591 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 139 ``` With-change: ``` 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 441 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 14 ``` Reviewers: rthallam, esheng, arybochkin Reviewed By: arybochkin Subscribers: ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29415
amitanandaiyer
added a commit
that referenced
this issue
Oct 19, 2023
…parent trace is non-null Summary: Original commit: a41c17d / D29266 `collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc. With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging. Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced. Memory footprint: Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory. Jira: DB-8294 Test Plan: ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 147 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 145 ``` With-change: ``` 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 9 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 0 ``` ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 591 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 139 ``` With-change: ``` 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 441 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 14 ``` Reviewers: rthallam, esheng, arybochkin Reviewed By: arybochkin Subscribers: ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29416
amitanandaiyer
added a commit
that referenced
this issue
Oct 19, 2023
…parent trace is non-null Summary: Original commit: a41c17d / D29266 `collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc. With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging. Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced. Memory footprint: Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory. Jira: DB-8294 Test Plan: ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 147 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 145 ``` With-change: ``` 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 9 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 0 ``` ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 591 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 139 ``` With-change: ``` 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 441 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 14 ``` Reviewers: rthallam, esheng, arybochkin Reviewed By: arybochkin Subscribers: ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29417
amitanandaiyer
added a commit
that referenced
this issue
Oct 23, 2023
…parent trace is non-null Summary: Original commit: a41c17d / D29266 `collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc. With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging. Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced. Memory footprint: Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory. Jira: DB-8294 Test Plan: ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 147 10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 145 ``` With-change: ``` 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 9 10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 0 ``` ``` export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10" ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large ``` Master: ``` 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 591 10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 139 ``` With-change: ``` 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 441 10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log 14 ``` Reviewers: rthallam, esheng Reviewed By: rthallam Subscribers: ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29450
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
2.14 Backport Required
2.16 Backport Required
2.18 Backport Required
2.20 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/high
High Priority
Jira Link: DB-8294
Description
Generally,
collect_end_to_end_traces is used in conjunction with enable_tracing to fetch the traces from the remote server for outgoing calls such as Read/Write etc.
With Sampled traces, if we enable collect_end_to_end_traces, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause bloat in memory consumption if a large number of requests are being processed by the tserver. The memory bloat can cause the server to start running out of memory and cause the workload to hit soft memory thresholds etc, if the workload is quite intensive.
Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: