DNS failure logs to be logged on info level instead of debug level #39142

Open
kishor7007 opened this issue Apr 16, 2025 · 13 comments · May be fixed by #39652
Labels
area/dns area/envoy_log question Questions that are neither investigations, bugs, nor enhancements

Comments

@kishor7007

kishor7007 commented Apr 16, 2025

Title: DNS failure logs to be logged as info instead of debug

Description:

For strict DNS and logical DNS clusters, DNS resolution failures are logged only when the debug log level is enabled. Logging them at info level would let users identify these failures easily, without needing to enable debug logging.

Sample Log:

2025-04-16T08:59:11.694103Z debug envoy dns external/envoy/source/extensions/network/dns_resolver/cares/dns_impl.cc:152 dns resolution for myapp.net failed with c-ares status 11 thread=19
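
For readers who want to feed lines like the sample above into a monitoring pipeline, here is a minimal Python sketch that extracts the hostname and the c-ares status. The regular expression is an assumption inferred from this single sample line, and `parse_dns_failure` is a hypothetical helper name. The status names follow the numbering in c-ares's `ares.h`, under which status 11 is `ARES_ECONNREFUSED`.

```python
import re

# Status names as numbered in c-ares's ares.h (codes 1 through 12).
CARES_STATUS = {
    1: "ARES_ENODATA",
    2: "ARES_EFORMERR",
    3: "ARES_ESERVFAIL",
    4: "ARES_ENOTFOUND",
    5: "ARES_ENOTIMP",
    6: "ARES_EREFUSED",
    7: "ARES_EBADQUERY",
    8: "ARES_EBADNAME",
    9: "ARES_EBADFAMILY",
    10: "ARES_EBADRESP",
    11: "ARES_ECONNREFUSED",
    12: "ARES_ETIMEOUT",
}

# Pattern inferred from the one sample line in this issue (an assumption).
FAILURE_RE = re.compile(
    r"dns resolution for (?P<host>\S+) failed with c-ares status (?P<status>\d+)"
)


def parse_dns_failure(line):
    """Return (host, status, status_name) for a failure line, else None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return match.group("host"), status, CARES_STATUS.get(status, "UNKNOWN")


sample = (
    "2025-04-16T08:59:11.694103Z debug envoy dns "
    "external/envoy/source/extensions/network/dns_resolver/cares/dns_impl.cc:152 "
    "dns resolution for myapp.net failed with c-ares status 11 thread=19"
)
print(parse_dns_failure(sample))  # ('myapp.net', 11, 'ARES_ECONNREFUSED')
```

Lines that do not match (for example, successful resolutions) return `None`, so the same helper can double as a filter.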

@kishor7007 kishor7007 added the triage Issue requires triage label Apr 16, 2025
@kishor7007 kishor7007 changed the title DNS failure logs to be logged as info instead of debug DNS failure logs to be logged as info instead of debug level Apr 16, 2025
@kishor7007 kishor7007 changed the title DNS failure logs to be logged as info instead of debug level DNS failure logs to be logged on info level instead of debug level Apr 16, 2025
@ramaraochavali
Contributor

Can you use stats https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/dns_resolution.html to identify the failures? These logs can become spam if there are many DNS clusters in the mesh.

@kishor7007
Author

These are all summarised counters per Envoy instance, without details on which specific FQDN lookup (e.g. test.example.com) actually failed or the respective failure reason.
Moreover, failures should be addressed promptly rather than lingering in the system. Having DNS lookup failures in the logs enables monitoring systems like ElasticSearch to filter sidecar logs for real-time alarms, analyse historical patterns, trigger alerts, and improve response times.

@phlax phlax added question Questions that are neither investigations, bugs, nor enhancements area/dns area/envoy_log and removed triage Issue requires triage labels Apr 17, 2025
@ramaraochavali
Contributor

You can also find out which cluster's DNS resolution failed from the update_failure metric on the cluster.

If you still want logging, you can use https://www.envoyproxy.io/docs/envoy/v1.34.0/operations/cli.html#cmdoption-component-log-level and enable it just for the dns component, e.g. dns:debug
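
As a rough illustration of the metric-based suggestion above, the following Python sketch scans a plain-text stats dump for `cluster.<name>.update_failure` counters and reports the clusters with a nonzero count. The `name: value` line format is assumed to match the admin stats text output, the cluster names in the example are made up, and `clusters_with_update_failures` is a hypothetical helper name.

```python
import re

# Matches lines such as "cluster.myapp.update_failure: 3".
STAT_RE = re.compile(r"^cluster\.(?P<cluster>.+)\.update_failure: (?P<count>\d+)$")


def clusters_with_update_failures(stats_text):
    """Map cluster name -> update_failure count, keeping only nonzero counts."""
    failing = {}
    for line in stats_text.splitlines():
        match = STAT_RE.match(line.strip())
        if match and int(match.group("count")) > 0:
            failing[match.group("cluster")] = int(match.group("count"))
    return failing


# Hypothetical stats dump; the cluster names are invented for illustration.
example_stats = """\
cluster.myapp.update_attempt: 42
cluster.myapp.update_failure: 3
cluster.other.update_failure: 0
"""
print(clusters_with_update_failures(example_stats))  # {'myapp': 3}
```

This identifies the failing cluster but, as the next comment points out, not the failure reason.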

@kishor7007
Author

Enabling the dns:debug level prints the unwanted success logs as well, right?

@yanavlasov
Contributor

You can use the fine-grain logger to limit the log messages emitted by Envoy: https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-enable-fine-grain-logging

Generally, logging DNS errors to the log file is not a viable approach, as even at info level a busy proxy can produce too many log entries. However, I'm unsure how to surface DNS diagnostics in an easy way, as Envoy does not have an access log for DNS resolutions. Maybe others can suggest something.

@kishor7007
Author

Since debug logs are not enabled by default, any DNS errors occurring in the system may go unnoticed, potentially masking issues until an external user reports them. Given the minimal volume of DNS errors compared to the current logging activity in an active traffic system, enabling logs for these errors is essential to ensure timely awareness of potential problems.

@kishor7007
Author

Waiting for a response on this.

@yanavlasov
Contributor

Changing the level of DNS error messages in dns_impl.cc to info can produce log spam for Envoy deployments that use dynamic forwarding proxy, where DNS resolution errors can be common. I do not think this is practical.

If you need to monitor DNS resolution for clusters where the error rate is lower, can you use the fine-grain logger to enable only the messages from dns_impl.cc? To better isolate the errors, we can downgrade the rest of the messages in this file to trace.

@kishor7007
Author

Thanks @yanavlasov for the response. I agree with isolating the error logs from the general logs, i.e.:

  1. Setting the general logs to trace level, providing maximum detail.
  2. Setting the error logs specifically to debug level, providing enhanced visibility on problems.
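
The effect of this split can be sketched with Python's standard `logging` module as an analogy. This is not Envoy's logger: the `TRACE` level value of 5, the `ListHandler` class, and `resolve_and_log` are all assumptions made for illustration. With the logger set to debug, the routine trace messages are suppressed and only the failure is emitted.

```python
import logging

# Python's logging has no trace level; 5 (below DEBUG=10) is an assumption.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


class ListHandler(logging.Handler):
    """Collects emitted messages so the effect of the level is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname} {record.getMessage()}")


def resolve_and_log(logger, host, ok):
    # Routine detail goes to trace; only the failure is logged at debug.
    logger.log(TRACE, "dns resolution for %s started", host)
    if ok:
        logger.log(TRACE, "dns resolution for %s completed", host)
    else:
        logger.debug("dns resolution for %s failed", host)


logger = logging.getLogger("dns_sketch")
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)  # loosely analogous to enabling dns:debug

resolve_and_log(logger, "ok.example.com", ok=True)
resolve_and_log(logger, "myapp.net", ok=False)
print(handler.messages)  # ['DEBUG dns resolution for myapp.net failed']
```

Lowering the logger to `TRACE` would bring the routine messages back, which mirrors item 1 in the list above.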

@kishor7007
Author

@ramaraochavali and @yanavlasov,
I hope we can proceed with this approach.

@ramaraochavali
Contributor

Sure. As long as info level is not touched, I am fine with that.

@kishor7007
Author

kishor7007 commented May 21, 2025

Hi @ramaraochavali and @yanavlasov,
While working on this, I noticed that apple_dns_impl.cc already logs its errors at warn level.

grep -ir "ENVOY_LOG(warn" .
./apple/apple_dns_impl.cc: ENVOY_LOG(warn, "DNS resolver error ({}) in dnsServiceGetAddrInfo for {}", error, dns_name);
./apple/apple_dns_impl.cc: ENVOY_LOG(warn, "DNS resolver error in dnsServiceRefSockFD for {}", dns_name);
./apple/apple_dns_impl.cc: ENVOY_LOG(warn, "DNS resolver error ({}) in DNSServiceProcessResult", error);

Based on this, I see the following options.

  1. Raise the cares/dns_impl.cc error logs to warn level as well, so that error logging is consistent across the dns component.
  2. Keep the ./apple/apple_dns_impl.cc error logs as is (warn level, since this is already in production) and set the cares/dns_impl.cc error logs to debug level.
  3. Set the error logs in both ./apple/apple_dns_impl.cc and cares/dns_impl.cc to debug level.

Please suggest which option is best to proceed with.

@yanavlasov
Contributor

The Apple DNS resolver is not used in production environments. You can go with either option 2 or 3.

@kishor7007 kishor7007 linked a pull request May 28, 2025 that will close this issue