feat(kubernetes_logs source): Use resource_version of 0 to use cache #9974

spencergilbert · 2021-11-09T19:55:22Z

I had to remove the const to set it like this; if there's a better alternative, let me know. Setting this to 0 rather than None should allow us to use cached values from the apiserver rather than getting the freshest results from etcd directly.
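For readers unfamiliar with the parameter, here is a minimal sketch of what the change amounts to at the HTTP level. It is not the code from this PR: `list_pods_path` is a hypothetical helper, and the `/api/v1/pods` path is only used as a familiar example of a list endpoint.

```rust
/// Hypothetical helper (not Vector's actual code): builds the request path
/// for listing pods, optionally pinning a `resourceVersion`.
fn list_pods_path(resource_version: Option<&str>) -> String {
    let base = "/api/v1/pods";
    match resource_version {
        // No resourceVersion: the apiserver must return the most recent data.
        None => base.to_owned(),
        // resourceVersion=0: the apiserver may answer from its cache.
        Some(rv) => format!("{}?resourceVersion={}", base, rv),
    }
}
```

Calling `list_pods_path(None)` corresponds to the old behaviour described above, and `list_pods_path(Some("0"))` to the new one.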

Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>

netlify · 2021-11-09T19:55:28Z

✔️ Deploy Preview for vector-project canceled.

🔨 Explore the source changes: 172ebeb

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/618c267ac81a7d0007aa2f77

jszwedko

Makes sense. For the kubernetes_logs source we only consume logs for local pods so I imagine that the local node will have any relevant data available. Are you aware of any risks here?

spencergilbert · 2021-11-09T20:03:46Z

> Makes sense. For the kubernetes_logs source we only consume logs for local pods so I imagine that the local node will have any relevant data available. Are you aware of any risks here?

We actually don't communicate with the "local" node. Both today and after this change, we make calls to the control plane for pod/namespace information. What this changes is our requirement around the freshness of the resource: whether we get it from what the kube-apiserver has cached (part of the control plane) or end up querying etcd directly (also sorta part of the control plane, but more in a storage role).

Communicating with "node local" APIs is what Datadog suggested after a short review, but that would be a pretty full rewrite.
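As a rough illustration of the freshness distinction, the sketch below models how the Kubernetes API documents `resourceVersion` for get/list requests when `resourceVersionMatch` is unset. The enum and function names are made up for this example and do not appear in Vector.

```rust
/// Illustrative names only: the three read behaviours the Kubernetes API
/// documents for get/list requests.
#[derive(Debug, PartialEq)]
enum ReadSemantics {
    /// Unset: most recent data, which requires a quorum read from etcd.
    MostRecent,
    /// "0": any version is acceptable, so the apiserver cache can serve it.
    Any,
    /// Any other value: data at least as new as the given version.
    NotOlderThan(String),
}

fn read_semantics(resource_version: Option<&str>) -> ReadSemantics {
    match resource_version {
        None | Some("") => ReadSemantics::MostRecent,
        Some("0") => ReadSemantics::Any,
        Some(rv) => ReadSemantics::NotOlderThan(rv.to_owned()),
    }
}
```

Under this model the PR moves the source from `MostRecent` to `Any`, which is where the possibility of stale data comes from.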

jszwedko · 2021-11-09T20:06:37Z

Ah, gotcha, thanks for clearing that up.

Is the risk then that we might get stale metadata for k8s resources like the pods or namespaces and use that to annotate the logs?

spencergilbert · 2021-11-09T20:10:11Z

> Ah, gotcha, thanks for clearing that up.
>
> Is the risk then that we might get stale metadata for k8s resources like the pods or namespaces and use that to annotate the logs?

Yeah - it does introduce that possibility, but I'd say the stability improvements are worth it. As is (especially with every Vector instance making calls) we can DDoS etcd pretty easily :|

jszwedko · 2021-11-09T20:28:23Z

Yeah, I agree, I just wanted to make sure I understood the trade-offs.

github-actions · 2021-11-09T20:36:06Z

Soak Test Results

Baseline: 99e7549
Comparison: 3279d38
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.
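For reference, a minimal sketch of how skewness and kurtosis can be computed from a set of throughput samples. It assumes population moments and excess kurtosis (kurtosis minus 3, which would be consistent with the negative values reported); the estimator the soak tooling actually uses is not shown in this thread, and the function name is hypothetical.

```rust
/// Hypothetical helper: returns (skewness, excess kurtosis) using population
/// moments, or None when there are too few samples or no variance.
fn skewness_and_excess_kurtosis(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Central moments of order 2, 3 and 4.
    let m2 = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    if m2 == 0.0 {
        return None; // all samples identical: both ratios are undefined
    }
    let m3 = samples.iter().map(|x| (x - mean).powi(3)).sum::<f64>() / n;
    let m4 = samples.iter().map(|x| (x - mean).powi(4)).sum::<f64>() / n;
    Some((m3 / m2.powf(1.5), m4 / (m2 * m2) - 3.0))
}
```

A perfectly symmetric sample has a skewness of zero, and values near zero for both statistics indicate throughput that clusters evenly around its mean.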

`datadog_agent_remap_blackhole`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	10.56Mi	10.60Mi	10.61Mi	10.61Mi	-0.35	-0.58
comparison	10.59Mi	10.61Mi	10.62Mi	10.62Mi	0.36	-0.10

`datadog_agent_remap_datadog_logs`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	19.86Mi	19.95Mi	19.97Mi	19.97Mi	-0.41	-0.76
comparison	18.56Mi	18.60Mi	18.61Mi	18.62Mi	-0.30	-0.66

`syslog_humio_logs`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	7.50Mi	7.65Mi	7.65Mi	7.65Mi	-0.46	-1.29
comparison	7.07Mi	7.19Mi	7.20Mi	7.21Mi	0.14	-1.16

`syslog_log2metric_humio_metrics`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	5.05Mi	5.07Mi	5.08Mi	5.08Mi	0.07	-1.24
comparison	5.15Mi	5.18Mi	5.18Mi	5.18Mi	0.36	-0.89

`syslog_log2metric_splunk_hec_metrics`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	5.13Mi	5.26Mi	5.26Mi	5.26Mi	-1.05	-0.49
comparison	5.35Mi	5.38Mi	5.38Mi	5.38Mi	-0.03	-1.05

`syslog_loki`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	3.91Mi	4.11Mi	4.15Mi	4.15Mi	0.18	-1.13
comparison	3.86Mi	4.05Mi	4.07Mi	4.07Mi	-0.33	-0.98

`syslog_regex_logs2metric_ddmetrics`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	3.88Mi	3.89Mi	3.90Mi	3.90Mi	0.28	-0.75
comparison	3.77Mi	3.79Mi	3.79Mi	3.79Mi	0.25	-1.07

`syslog_splunk_hec_logs`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	7.02Mi	7.03Mi	7.03Mi	7.03Mi	-0.01	0.27
comparison	7.24Mi	7.26Mi	7.26Mi	7.26Mi	-0.01	-0.85

github-actions · 2021-11-10T20:35:44Z

Soak Test Results

Baseline: a9c310c
Comparison: 172ebeb
Total Vector CPUs: 4

What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.

`datadog_agent_remap_blackhole`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	10.56Mi	10.60Mi	10.61Mi	10.61Mi	-0.35	-0.58
comparison	10.04Mi	10.09Mi	10.09Mi	10.10Mi	-0.17	-1.16

`datadog_agent_remap_datadog_logs`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	19.94Mi	19.98Mi	19.99Mi	19.99Mi	0.06	-0.74
comparison	20.12Mi	20.15Mi	20.17Mi	20.17Mi	0.28	0.19

`splunk_hec_route_s3`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	5.71Mi	6.04Mi	6.09Mi	6.11Mi	-0.27	-0.74
comparison	5.47Mi	5.77Mi	5.83Mi	5.83Mi	-0.48	-0.58

`splunk_transforms_splunk3`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	2.28Mi	2.72Mi	2.79Mi	2.80Mi	-0.04	-1.13
comparison	2.61Mi	2.81Mi	2.83Mi	2.86Mi	-0.32	-0.50

`syslog_humio_logs`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	7.50Mi	7.65Mi	7.65Mi	7.65Mi	-0.46	-1.29
comparison	7.07Mi	7.19Mi	7.20Mi	7.21Mi	0.14	-1.16

`syslog_log2metric_humio_metrics`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	5.02Mi	5.04Mi	5.05Mi	5.05Mi	-0.47	-0.31
comparison	5.00Mi	5.03Mi	5.04Mi	5.04Mi	-0.55	-0.74

`syslog_log2metric_splunk_hec_metrics`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	5.19Mi	5.19Mi	5.19Mi	5.19Mi	-0.39	-1.25
comparison	5.34Mi	5.38Mi	5.38Mi	5.38Mi	-0.44	-0.54

`syslog_loki`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	4.13Mi	4.37Mi	4.45Mi	4.46Mi	0.29	-0.12
comparison	3.86Mi	4.05Mi	4.07Mi	4.07Mi	-0.33	-0.98

`syslog_regex_logs2metric_ddmetrics`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	3.71Mi	3.72Mi	3.72Mi	3.72Mi	-0.04	-1.58
comparison	3.77Mi	3.79Mi	3.79Mi	3.79Mi	0.25	-1.07

`syslog_splunk_hec_logs`

EXPERIMENT	VALUE_min	VALUE_p90	VALUE_p99	VALUE_max	VALUE_skewness	VALUE_kurtosis
baseline	7.02Mi	7.03Mi	7.03Mi	7.03Mi	-0.01	0.27
comparison	7.24Mi	7.26Mi	7.26Mi	7.26Mi	-0.01	-0.85

feat(kubernetes_logs source): Use resource_version of 0 to use cache

3279d38

Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>

spencergilbert added the `source: kubernetes_logs` label Nov 9, 2021

spencergilbert self-assigned this Nov 9, 2021

spencergilbert requested review from JeanMertz, jszwedko and StephenWakely November 9, 2021 19:55

spencergilbert added the `ci-condition: k8s e2e all targets` and `ci-condition: k8s e2e tests enable` labels Nov 9, 2021

jszwedko approved these changes Nov 9, 2021

spencergilbert added 2 commits November 10, 2021 15:01

Update reflector unit tests, use to_owned over to_string

2c17186

Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>

make fmt

172ebeb

Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>

jszwedko approved these changes Nov 11, 2021

spencergilbert merged commit 6cfb28d into master Nov 11, 2021

spencergilbert deleted the spencer/use-apiserver-cache branch November 11, 2021 15:24

ZhiminXiang mentioned this pull request Jan 19, 2023

Vector making inefficient api GETs in large K8s clusters #7943

Closed

This was referenced Mar 9, 2023

Vector making api requests to Kubernetes API server without using resource_version #16753

Closed

Vector making api requests to Kubernetes API server without using resource_version #16797

Closed
