Update Internal Collector Telemetry Docs #7035
base: main
Conversation
There are three ways to export internal Collector metrics.

1. Self-ingesting, exporting internal metrics via the
   [Prometheus exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter).
AFAIK this is not using the Prometheus exporter, but rather https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus
@jmichalek132 thanks for the clarification. How does the Go Prometheus exporter differ from the one in the Collector? I was under the impression that the Collector's one was based on the Go one.
@dashpole can answer that nicely.
There are two reasons:
First, the exporters implement different interfaces: the Go SDK exporter is a `Reader`, while the Collector's exporter implements `exporter.Metrics`.
Second, the collector's exporter is designed to aggregate metrics from multiple resources/targets together, similar to how the Prometheus server's /federate endpoint works. The Go SDK exporter is designed to only handle metrics from a single resource, more like prometheus client_golang.
@dashpole so when the `exporter` is configured to use `prometheus` to export internal metrics, it's using the Go Prometheus exporter behind the scenes?
Yes, that is correct. You can link to go.opentelemetry.io/otel/exporters/prometheus
exporter:
  prometheus:
    host: '0.0.0.0'
    port: 8888
It might also be nice to provide an example of how to get the original metric names back:
https://github.com/open-telemetry/opentelemetry-collector/blob/e1f670844604a5b119d8560bc079ceca4c92bf72/CHANGELOG.md?plain=1#L347
@jmichalek132 happy to do that. Can you elaborate on what is meant by the line "Users who do not customize the Prometheus reader should not be impacted." in the changelog? Is the "Prometheus reader" the same as the "Prometheus receiver"? Do the Prometheus receiver (`receivers::prometheus`) and/or Prometheus exporter (`exporters::prometheus`) have to be configured when the internal metrics exporter is `prometheus`?
@jmichalek132 friendly reminder for clarification on above item 😁
@avillela We just added a note about this further down the page. You could link to it from here, if you want, and I think that should handle this suggestion.
https://opentelemetry.io/docs/collector/internal-telemetry/#_total-suffix
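For reference, a sketch of the sort of example requested above, assuming the pull-based Prometheus reader in the Collector's declarative telemetry configuration supports the `without_units` and `without_type_suffix` options (not verified against the current schema):

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
                # Assumed options: drop the unit and type (_total) suffixes
                # to restore the original metric names.
                without_units: true
                without_type_suffix: true
```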
- [Traces](#configure-internal-traces)

{{% alert title="Who monitors the monitor?" color="info" %}} Internal Collector
metrics can be exported directly to a backend for analysis, or to the Collector
I agree that a collector shouldn't self-monitor its own telemetry, but I would be wary of suggesting the telemetry should be sent directly to a backend.
For example, imagine a common agent and gateway pattern on Kubernetes. We could have hundreds or thousands of node agents batching and shipping application telemetry to a gateway layer that could be comprised of only a few instances.
If someone tried to make connections from thousands of node agents to a vendor's backend for otelcol telemetry, there could be a lot of scaling issues. The internal telemetry would also never have a chance to be enriched with Kubernetes metadata.
I've been thinking of using a dedicated internal telemetry gateway of otelcol instances for this purpose, so every otelcol instance, regardless of whether it's a node agent, a gateway, or a load-balancing exporter layer, would send to the same collector instances dedicated to otelcol telemetry. I'm not sure what to call this pattern, but maybe we can suggest it here?
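A rough sketch of what that pattern could look like on each Collector instance, assuming a dedicated internal-telemetry gateway reachable at the hypothetical address `otelcol-telemetry-gateway:4318`, with internal metrics pushed over OTLP/HTTP:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                # Hypothetical hostname of the dedicated internal-telemetry gateway.
                endpoint: https://otelcol-telemetry-gateway:4318
```

The same endpoint would be used by node agents, gateways, and any load-balancing exporter layer, so all otelcol telemetry lands on the collectors dedicated to it.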
@kallangerard I like that idea! I'll make the revisions.
On the same note, we just use Prometheus to scrape otel collector metrics directly; it might be worth calling that out as an option.
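For illustration, a minimal Prometheus scrape configuration for that approach, assuming the Collector exposes its internal metrics on port 8888 (the target hostname is hypothetical):

```yaml
scrape_configs:
  - job_name: 'otel-collector-internal'
    static_configs:
      # Point this at the Collector's internal Prometheus metrics endpoint.
      - targets: ['otel-collector:8888']
```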
@jmichalek132 sorry... I'm confused. Isn't that what setting the `exporter` to `prometheus` does in the first config?
For reference, we used to have a similar warning on this page, but it was removed when the self-monitoring section was removed. I'll leave it up to @codeboten if we want to add the warning again.
data), its internal telemetry won't be sent to its intended destination. This
makes it difficult to detect problems with the Collector itself. Likewise, if
the Collector generates additional telemetry related to the above issue, such
error logs, and those logs are sent into the same collector, it can create a
Do we need an example of excluding otelcol's own logs from log tailing? I know Splunk's otelcol Helm chart had some examples of this. I believe it was using custom exclude annotations, if I recall correctly.
@kallangerard can you point me in the direction of this documentation?
I just had a look and I was wrong: Splunk used path filtering in the filelog receiver to exclude their self-logs, while OTLP logs are filtered out with an exclude annotation on the pod. They seem to be excluding the filelog capability entirely in their latest examples, though.
I think if someone is using the filelog receiver, they're likely to have already come across this issue and are handling it in their own way.
For internal telemetry logs sent to an OTLP endpoint, I don't think there's any way to do it safely by sending to the Collector's own OTLP receiver endpoint. I'm not 100% sure on a safe alternative though. I believe there's some internal rate limiting in otelcol's internal logs, but I haven't tested it with a self-exporting broken logging pipeline. I've been scraping otelcol logs with Datadog for a while and haven't seen any runaway log volumes, but I guess that's not self-consumption. 🤷😅
Outside of this PR I'll try and write up some examples of a dedicated internal telemetry collector.
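As a rough illustration of the path-filtering approach mentioned above (the glob patterns are hypothetical, not taken from the Splunk chart):

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      # Hypothetical pattern: skip the Collector's own pod logs
      # to avoid ingesting its self-logs.
      - /var/log/pods/*otel-collector*/*/*.log
```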
- pull:
    exporter:
      prometheus:
        host: '0.0.0.0'
Is `0.0.0.0` right here?

- I'm not sure if this will work for IPv6-only stacks.
- The default behaviour has been changed in endpoints and such from `0.0.0.0` to `localhost`. Should we just be using `localhost` if we are intending to expose for self-scraping, or be more explicit for public interfaces?

Probably not something we need to tackle here, but I'd love a chime-in from anyone with better container/Linux networking knowledge than me.
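For comparison, a sketch of the same reader bound to `localhost`, which would only be appropriate if the scraper runs on the same host or network namespace:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                # Binding to localhost limits exposure to the local host
                # or network namespace.
                host: 'localhost'
                port: 8888
```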
@kallangerard this was in the original docs, so I can't speak for it (and haven't used it). Hope someone else will chime in with more info. 😁
@open-telemetry/collector-approvers, PTAL. Thanks!
Thank you for the PR. I have a few questions about the new section.
{{% alert title="Internal telemetry configuration changes" %}} | ||
There are three ways to export internal Collector metrics. | ||
|
||
1. Self-ingesting, exporting internal metrics via the |
What does "self-ingesting" mean here? It doesn't look like this section has the Collector ingest its own metrics.
```

2. Self-ingesting and exporting, scraping metrics via the Collector's own
   [Prometheus Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver).
I'm not sure it makes sense to have a section about self-ingesting Prometheus-exported metrics, for two reasons:

- We used to have a section about self-ingesting OTLP-exported metrics in the past, but it was removed because we want to discourage users from doing self-ingestion, as it can introduce reliability and data-loss issues (if the Collector is unhealthy, who's going to export the metrics showing that it is?). See "remove suggestion to process internal telemetry through collector" (#5749) for precedent.
- If users are going to be doing self-ingestion anyway, I think it makes more sense to do it through OTLP rather than Prometheus. This will preserve all of the telemetry and its semantics as-is, without unintended conversions (e.g. metric names with the default config) or data loss (e.g. scope attributes until recently) that could occur due to the Prometheus exporter/receiver combo. Moreover, it's possible with the Core Collector distribution, not just Contrib.
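If an example of that ends up being wanted, a sketch of self-ingestion over OTLP might look like the following, assuming the Collector's own OTLP/HTTP receiver listens on `localhost:4318` (the reliability caveats from the first point still apply):

```yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                # Assumes the Collector's own OTLP/HTTP receiver listens here.
                endpoint: http://localhost:4318
```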
```

{{% alert title="WARNING" color="warning" %}} Although the above approach is
possible, it is not recommended, as it can introduce scaling issues.
What kind of scaling issues do you expect when exporting metrics through OTLP? Is this comment meant to be about self-ingestion? I'd rather we didn't discourage people from emitting internal telemetry with our own protocol.
@@ -161,9 +256,12 @@ service:
      exporter:
        otlp:
          protocol: http/protobuf
-         endpoint: https://backend:4318
+         endpoint: https://${OTLP_ENDPOINT}
Nitpick: People often mix up OTLP/gRPC and OTLP/HTTP, so I think keeping a reminder of the standard port would be good. Maybe add a line like "This will load the endpoint from the `OTLP_ENDPOINT` environment variable, which should look something like `backend:4318`".
I left a few suggestions, but I'll do a more thorough copy edit after the content is finalized based on the other suggestions. Thanks, @avillela!
- [Logs](#configure-internal-logs)
- [Traces](#configure-internal-traces)

{{% alert title="Who monitors the monitor?" color="info" %}} As a matter of best
{{% alert title="Who monitors the monitor?" color="info" %}} As a matter of best | |
{{% alert title="Who monitors the monitor?" %}} As a matter of best |
exports the telemetry to an OTLP backend for analysis.

When a Collector is responsible for handling its own telemetry through a traces,
metrics, or logs pipeline and encounters an issue (e.g. memory limiter blocking
- metrics, or logs pipeline and encounters an issue (e.g. memory limiter blocking
+ metrics, or logs pipeline and encounters an issue (for example, memory limiter blocking
metrics, or logs pipeline and encounters an issue (e.g. memory limiter blocking
data), its internal telemetry won't be sent to its intended destination. This
makes it difficult to detect problems with the Collector itself. Likewise, if
the Collector generates additional telemetry related to the above issue, such
- the Collector generates additional telemetry related to the above issue, such
+ the Collector generates additional telemetry related to the above issue, such as
This PR contains updates to the documentation on Internal Collector Telemetry to help clarify some of the approaches for exporting internal Collector metrics. It also includes an explanation of why self-ingesting telemetry is not advisable.