Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Collect end_to_end traces should only request traces if there is a parent trace. #19497

Closed
1 task done
Tracked by #19626
amitanandaiyer opened this issue Oct 12, 2023 · 0 comments
Closed
1 task done
Tracked by #19626

Comments

@amitanandaiyer
Copy link
Contributor

amitanandaiyer commented Oct 12, 2023

Jira Link: DB-8294

Description

Generally,
collect_end_to_end_traces is used in conjunction with enable_tracing to fetch the traces from the remote server for outgoing calls such as Read/Write etc.

With Sampled traces, if we enable collect_end_to_end_traces, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause bloat in memory consumption if a large number of requests are being processed by the tserver. The memory bloat can cause the server to start running out of memory and cause the workload to hit soft memory thresholds etc, if the workload is quite intensive.

Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@amitanandaiyer amitanandaiyer added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Oct 12, 2023
@amitanandaiyer amitanandaiyer self-assigned this Oct 12, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Oct 12, 2023
@yugabyte-ci yugabyte-ci added priority/high High Priority and removed priority/medium Medium priority issue labels Oct 12, 2023
amitanandaiyer added a commit that referenced this issue Oct 17, 2023
… non-null

Summary:
`collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc.

With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging.

Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.

Memory footprint:
Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory.
Jira: DB-8294

Test Plan:
```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000"
ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```

Master:
```
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
147
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
145

```
With-change:
```
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
9
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
0
```

```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10"

ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```
Master:
```
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
591
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
139
```

With-change:
```
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
441
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
14
```

Reviewers: rthallam, esheng

Reviewed By: rthallam

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D29266
amitanandaiyer added a commit that referenced this issue Oct 18, 2023
…parent trace is non-null

Summary:
Original commit: a41c17d / D29266
`collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc.

With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging.

Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.

Memory footprint:
Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory.
Jira: DB-8294

Test Plan:
```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000"
ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```

Master:
```
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
147
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
145

```
With-change:
```
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
9
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
0
```

```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10"

ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```
Master:
```
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
591
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
139
```

With-change:
```
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
441
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
14
```

Reviewers: rthallam, esheng, arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29415
amitanandaiyer added a commit that referenced this issue Oct 19, 2023
…parent trace is non-null

Summary:
Original commit: a41c17d / D29266
`collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc.

With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging.

Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.

Memory footprint:
Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory.
Jira: DB-8294

Test Plan:
```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000"
ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```

Master:
```
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
147
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
145

```
With-change:
```
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
9
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
0
```

```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10"

ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```
Master:
```
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
591
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
139
```

With-change:
```
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
441
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
14
```

Reviewers: rthallam, esheng, arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29416
amitanandaiyer added a commit that referenced this issue Oct 19, 2023
…parent trace is non-null

Summary:
Original commit: a41c17d / D29266
`collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc.

With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging.

Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.

Memory footprint:
Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory.
Jira: DB-8294

Test Plan:
```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000"
ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```

Master:
```
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
147
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
145

```
With-change:
```
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
9
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
0
```

```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10"

ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```
Master:
```
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
591
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
139
```

With-change:
```
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
441
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
14
```

Reviewers: rthallam, esheng, arybochkin

Reviewed By: arybochkin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29417
amitanandaiyer added a commit that referenced this issue Oct 23, 2023
…parent trace is non-null

Summary:
Original commit: a41c17d / D29266
`collect_end_to_end_traces` is used in conjunction with `enable_tracing` to fetch the traces from the remote server for outgoing calls such as Read/Write etc.

With Sampled traces, if we enable `collect_end_to_end_traces`, we notice that all outgoing calls end up requesting a trace. i.e. even though inbound calls are only traced 1 in 1000, outgoing calls end up being traced 100% of times, causing the lower layers to trace a lot more. This can cause additional memory consumption. And a lot more logging.

Collecting a trace at the ybclient layer is not that useful, if it is not tied to a parent trace/incoming trace, so we could change this to only request an outgoing trace, if the current rpc is being traced.

Memory footprint:
Although the memory consumed by the Trace object is tracked as part of the inbound_call, the accounting does not keep track of the increase as we add more traces -- this can cause "unexplained memory buildup" due to the difference between total memory and server memory.
Jira: DB-8294

Test Plan:
```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=1000"
ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```

Master:
```
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
147
10:29 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
145

```
With-change:
```
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
9
10:31 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
0
```

```
export YB_EXTRA_TSERVER_FLAGS="--collect_end_to_end_traces=true --print_trace_every=1 --enable_tracing=false --sampled_trace_1_in_n=10"

ybd --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.Large

```
Master:
```
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
591
10:35 ~/code/yugabyte-db [master] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
139
```

With-change:
```
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Tracing op:" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
441
10:33 ~/code/yugabyte-db [fix_end_to_end_traces] $ grep -c "inbound_call.cc.*Trace Created" /Users/amitanand/code/yugabyte-db/build/debug-clang-dynamic-arm64-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_Large.log
14
```

Reviewers: rthallam, esheng

Reviewed By: rthallam

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29450
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants