Skip to content

[feature request] Improve SDK logging output for more efficient troubleshooting #6242

Open
@toddfoust

Description

@toddfoust

Package

None

Is your feature request related to a problem?

Can we improve the output or formatting of these self-diagnostic logs to make troubleshooting much more efficient?

Has anyone tried to troubleshoot the SDK yet? I got involved on an issue where someone's web app was not feeding information to Live Metrics. I turned on SDK Logging built into OTel, to collect the self-diagnostics output while they opened up a browser and navigated to Live Metrics blade. I'm starting to read through the output file to decipher what processing is going on within the SDK and when I first load the log file I get this experience in VSCode, where each line is a composite format string:

Image

I'm clearly interested in reading, reviewing and comprehending the three SDK entries highlighted in red above, but the string output just shows me the raw unformatted string with each place holder. Let's extract that SDK log entry on line 299 here:

2025-02-11T23:04:06.0983783Z:Transmission failed. StatusCode: {0}. Error from Ingestion: {1}. Action: {2}. Origin: {3}. AAD Enabled: {4}. Instrumentation Key: {5}. Configured Endpoint: {6}. Actual Endpoint: {7}{0}{(To get exact error change LogLevel to Verbose)}{Telemetry is stored offline for retry}{AzureMonitorMetricExporter}{False}{1226a108-1d56-4bc7-9aea-c04d04904833}{https://eastus2-0.in.applicationinsights.azure.com/}{eastus2-0.in.applicationinsights.azure.com}

When I read the output string of that log entry I see that we have 8 placeholder values that I need to mentally replace, my eyes cannot easily distinguish which text is part of which placeholder so instead of reading this line like a normal person, I have to spend a few minutes or even open up notepad and try to slowly piece together... "ok, so {0} needs to get replaced with '(to get exact error change LogLevel to Verbose)', or was I supposed to replace that first {0} place holder with '0", lets skip that for now, let's assume {1} is the placeholder that should get '(to get exact error change LogLevel to Verbose)', then would 'Action: {2}' become 'Action: Telemetry is stored offline for retry'?

Just trying to decipher or eye-ball this one entry alone is a challenge, when we could just have the SDK write out the formatted string with the values already replaced into the placeholder slots.

What is the expected behavior?

Instead of line 299 of that sample log reading like this:

2025-02-11T23:04:06.0983783Z:Transmission failed. StatusCode: {0}. Error from Ingestion: {1}. Action: {2}. Origin: {3}. AAD Enabled: {4}. Instrumentation Key: {5}. Configured Endpoint: {6}. Actual Endpoint: {7}{0}{(To get exact error change LogLevel to Verbose)}{Telemetry is stored offline for retry}{AzureMonitorMetricExporter}{False}{1226a108-1d56-4bc7-9aea-c04d04904833}{https://eastus2-0.in.applicationinsights.azure.com/}{eastus2-0.in.applicationinsights.azure.com}

I'd like to see it written like this in the resulting log file:

2025-02-11T23:04:06.0983783Z:Transmission failed. StatusCode: 0. Error from Ingestion: (To get exact error change LogLevel to Verbose). Action: Telemetry is stored offline for retry. Origin: AzureMonitorMetricExporter. AAD Enabled: False. Instrumentation Key: 1226a108-1d56-4bc7-9aea-c04d04904833. Configured Endpoint: https://eastus2-0.in.applicationinsights.azure.com/. Actual Endpoint: eastus2-0.in.applicationinsights.azure.com

Which alternative solutions or features have you considered?

Can we add a new feature to the open telemetry SDK that would allow for better logging output that is more readable and allows people troubleshooting missing data symptoms to more easily identify what is going on within the SDK at the time of missing data?

Not sure the best feature implementation, maybe as an additional setting the SDK self-diagnostic logging enablement? Should we just add another setting that would help output more readable logs?

How about another configurable setting called 'OutputFormat', or 'OutputStyle' or just 'Format'? If omitted, the resulting log file sticks with this weird composite formatted string, but if that configurable value is changed to 'ResolvePlaceholders' or 'Readable' or 'Interpolated', then the SDK will go ahead and replace each placeholder with the string value automatically before writing to the log?

{
"LogDirectory": ".",
"FileSize": 32768,
"LogLevel": "Warning",
"Format": "Interpolated"
}

So maybe this "Format" setting defaults to "Composite", as I'm sure some users have taken a dependency on this format type already, then add an alternative value of "Interpolated" if you want those placeholders replaced with their string values in the output log file?

Thoughts?

Additional context

I have real world dropped, delayed or duplicate telemetry problems that I need to solve. When that investigation takes me to the SDK and how the records were produced, I don't want to have to do mental interpolation just to figure out what the failure was or what the SDK was doing at that moment in time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestneeds-triageNew issues which have not been classified or triaged by a community member

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions