feat(kubernetes_logs source): Use resource_version of 0 to use cache #9974
Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
✔️ Deploy Preview for vector-project canceled. 🔨 Explore the source changes: 172ebeb 🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/618c267ac81a7d0007aa2f77
Makes sense. For the kubernetes_logs source we only consume logs for local pods, so I imagine that the local node will have any relevant data available. Are you aware of any risks here?
We actually don't communicate with the "local" node. Both today and after this change, we make calls to the control plane for pod/namespace information. This change relaxes our requirements around the freshness of the resource: it determines whether we get it from what the kube-apiserver has cached (part of the control plane) or whether the request ends up querying etcd directly (also sorta part of the control plane, but more in a storage role). Communicating with "node local" APIs is what Datadog suggested after a short review, but that would be a pretty full rewrite.
Ah, gotcha, thanks for clearing that up. Is the risk then that we might get stale metadata for k8s resources like the pods or namespaces and use that to annotate the logs?
Yeah, it does introduce that possibility, but I'd consider the stability improvements worth it. As is (especially with every vector instance making calls) we can DDoS etcd pretty easily :|
Yeah, I agree, I just wanted to make sure I understood the trade-offs.
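To make the trade-off discussed above concrete, here is a minimal sketch of how the `resourceVersion` query parameter selects freshness on a list request. It assumes the documented Kubernetes semantics (unset means "most recent", `0` means "any", another value means "not older than"). The `Freshness` enum and `list_pods_path` helper are hypothetical names for illustration and are not taken from Vector's code.

```rust
/// Hypothetical model of the freshness a client asks for on a list call.
#[derive(Debug, Clone, PartialEq)]
enum Freshness {
    /// Parameter omitted: the apiserver must return the latest state,
    /// which generally means a quorum read against etcd.
    MostRecent,
    /// `resourceVersion=0`: any version is acceptable, so the apiserver
    /// may answer from its cache. The result can be stale.
    Any,
    /// Any other value: the result must not be older than this version.
    NotOlderThan(String),
}

impl Freshness {
    /// Value to send as the `resourceVersion` query parameter, if any.
    fn resource_version(&self) -> Option<String> {
        match self {
            Freshness::MostRecent => None,
            Freshness::Any => Some("0".to_owned()),
            Freshness::NotOlderThan(rv) => Some(rv.clone()),
        }
    }
}

/// Build the request path for listing pods (illustrative only; a real
/// client would also add selectors and encode parameters properly).
fn list_pods_path(freshness: &Freshness) -> String {
    let base = "/api/v1/pods";
    match freshness.resource_version() {
        Some(rv) => format!("{}?resourceVersion={}", base, rv),
        None => base.to_owned(),
    }
}
```

Under these assumptions, `Freshness::Any` corresponds to the behavior this PR opts into, and `Freshness::MostRecent` to the previous behavior of sending no resource version.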
Soak Test Results
Baseline: 99e7549
What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 10.56Mi | 10.60Mi | 10.61Mi | 10.61Mi | -0.35 | -0.58 |
comparison | 10.59Mi | 10.61Mi | 10.62Mi | 10.62Mi | 0.36 | -0.10 |
datadog_agent_remap_datadog_logs
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 19.86Mi | 19.95Mi | 19.97Mi | 19.97Mi | -0.41 | -0.76 |
comparison | 18.56Mi | 18.60Mi | 18.61Mi | 18.62Mi | -0.30 | -0.66 |
syslog_humio_logs
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 7.50Mi | 7.65Mi | 7.65Mi | 7.65Mi | -0.46 | -1.29 |
comparison | 7.07Mi | 7.19Mi | 7.20Mi | 7.21Mi | 0.14 | -1.16 |
syslog_log2metric_humio_metrics
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 5.05Mi | 5.07Mi | 5.08Mi | 5.08Mi | 0.07 | -1.24 |
comparison | 5.15Mi | 5.18Mi | 5.18Mi | 5.18Mi | 0.36 | -0.89 |
syslog_log2metric_splunk_hec_metrics
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 5.13Mi | 5.26Mi | 5.26Mi | 5.26Mi | -1.05 | -0.49 |
comparison | 5.35Mi | 5.38Mi | 5.38Mi | 5.38Mi | -0.03 | -1.05 |
syslog_loki
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 3.91Mi | 4.11Mi | 4.15Mi | 4.15Mi | 0.18 | -1.13 |
comparison | 3.86Mi | 4.05Mi | 4.07Mi | 4.07Mi | -0.33 | -0.98 |
syslog_regex_logs2metric_ddmetrics
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 3.88Mi | 3.89Mi | 3.90Mi | 3.90Mi | 0.28 | -0.75 |
comparison | 3.77Mi | 3.79Mi | 3.79Mi | 3.79Mi | 0.25 | -1.07 |
syslog_splunk_hec_logs
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 7.02Mi | 7.03Mi | 7.03Mi | 7.03Mi | -0.01 | 0.27 |
comparison | 7.24Mi | 7.26Mi | 7.26Mi | 7.26Mi | -0.01 | -0.85 |
Soak Test Results
Baseline: a9c310c
What follows is a statistical summary of the soak captures between the SHAs given above. Units are bytes/second/CPU, except for 'skewness' and 'kurtosis'. Higher numbers in 'comparison' are generally better. Higher skewness or kurtosis numbers indicate a lack of consistency in behavior, making predictions of fitness in the field challenging.
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 10.56Mi | 10.60Mi | 10.61Mi | 10.61Mi | -0.35 | -0.58 |
comparison | 10.04Mi | 10.09Mi | 10.09Mi | 10.10Mi | -0.17 | -1.16 |
datadog_agent_remap_datadog_logs
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 19.94Mi | 19.98Mi | 19.99Mi | 19.99Mi | 0.06 | -0.74 |
comparison | 20.12Mi | 20.15Mi | 20.17Mi | 20.17Mi | 0.28 | 0.19 |
splunk_hec_route_s3
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 5.71Mi | 6.04Mi | 6.09Mi | 6.11Mi | -0.27 | -0.74 |
comparison | 5.47Mi | 5.77Mi | 5.83Mi | 5.83Mi | -0.48 | -0.58 |
splunk_transforms_splunk3
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 2.28Mi | 2.72Mi | 2.79Mi | 2.80Mi | -0.04 | -1.13 |
comparison | 2.61Mi | 2.81Mi | 2.83Mi | 2.86Mi | -0.32 | -0.50 |
syslog_humio_logs
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 7.50Mi | 7.65Mi | 7.65Mi | 7.65Mi | -0.46 | -1.29 |
comparison | 7.07Mi | 7.19Mi | 7.20Mi | 7.21Mi | 0.14 | -1.16 |
syslog_log2metric_humio_metrics
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 5.02Mi | 5.04Mi | 5.05Mi | 5.05Mi | -0.47 | -0.31 |
comparison | 5.00Mi | 5.03Mi | 5.04Mi | 5.04Mi | -0.55 | -0.74 |
syslog_log2metric_splunk_hec_metrics
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 5.19Mi | 5.19Mi | 5.19Mi | 5.19Mi | -0.39 | -1.25 |
comparison | 5.34Mi | 5.38Mi | 5.38Mi | 5.38Mi | -0.44 | -0.54 |
syslog_loki
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 4.13Mi | 4.37Mi | 4.45Mi | 4.46Mi | 0.29 | -0.12 |
comparison | 3.86Mi | 4.05Mi | 4.07Mi | 4.07Mi | -0.33 | -0.98 |
syslog_regex_logs2metric_ddmetrics
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 3.71Mi | 3.72Mi | 3.72Mi | 3.72Mi | -0.04 | -1.58 |
comparison | 3.77Mi | 3.79Mi | 3.79Mi | 3.79Mi | 0.25 | -1.07 |
syslog_splunk_hec_logs
EXPERIMENT | VALUE_min | VALUE_p90 | VALUE_p99 | VALUE_max | VALUE_skewness | VALUE_kurtosis |
---|---|---|---|---|---|---|
baseline | 7.02Mi | 7.03Mi | 7.03Mi | 7.03Mi | -0.01 | 0.27 |
comparison | 7.24Mi | 7.26Mi | 7.26Mi | 7.26Mi | -0.01 | -0.85 |
…9974)
* feat(kubernetes_logs source): Use resource_version of 0 to use cache
  Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
* Update reflector unit tests, use to_owned over to_string
  Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
* make fmt
  Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
Signed-off-by: Spencer Gilbert spencer.gilbert@datadoghq.com
Closes #7943
I had to remove the const to set it like this; if there's a better alternative, let me know. Setting this to 0 rather than None should allow us to use cached values from the kube-apiserver rather than fetching the freshest results from etcd directly.
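On the question of an alternative that keeps `const`: the sketch below is one possibility, assuming the removed `const` sat on a constructor for a type wrapping `Option<String>`. That reading is a guess, since the diff is not shown here. The `ResourceVersionState` type and its methods are hypothetical. The idea is to store a `Cow<'static, str>` so the initial `"0"` is borrowed and needs no allocation, which `"0".to_owned()` would.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the state that tracks which resource
/// version to send with the next request; the real type may differ.
#[derive(Debug, Clone, PartialEq)]
struct ResourceVersionState(Option<Cow<'static, str>>);

impl ResourceVersionState {
    /// Can stay `const` because `Cow::Borrowed` does not allocate.
    /// Starting at "0" lets the first request be served from cache.
    const fn new() -> Self {
        Self(Some(Cow::Borrowed("0")))
    }

    /// Record a resource version observed in an API response.
    fn update(&mut self, observed: &str) {
        self.0 = Some(Cow::Owned(observed.to_owned()));
    }

    /// Fall back to the initial value, e.g. after a desync.
    fn reset(&mut self) {
        *self = Self::new();
    }

    /// Current value to send as `resourceVersion`, if any.
    fn get(&self) -> Option<&str> {
        self.0.as_deref()
    }
}

/// Demonstrates that the constructor is usable in a const context.
const INITIAL: ResourceVersionState = ResourceVersionState::new();
```

Whether this is worth the extra type complexity compared to simply dropping `const` is a judgment call; the plain `String` version is arguably easier to read.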