Handle validation failures on startup more gracefully #1113

adamconnelly · 2020-06-27T13:08:51Z

At the moment Promitor relies on throwing a ValidationFailedException to crash the application if the configuration isn't valid. This isn't ideal because it adds noise to the console output, meaning you have to scroll back up past the stack trace to get to the validation error.

Here's an example of the output you receive:

[13:56:38 INF] Starting validation of Promitor setup
[13:56:38 INF] Start Validation step 1/6: Metrics Declaration Path
[13:56:38 INF] Scrape configuration found at '/home/adam/github.com/adamconnelly/promitor/config/promitor/scraper/metrics.yaml'
[13:56:38 INF] Validation step 1/6 succeeded
[13:56:38 INF] Start Validation step 2/6: Azure Authentication
[13:56:38 INF] Validation step 2/6 succeeded
[13:56:38 INF] Start Validation step 3/6: Metrics Declaration
[13:56:38 INF] Metrics declaration is using spec version v1
[13:56:38 ERR] The following problems were found with the metric configuration:
Error 1:1: 'metrics' is a required field but was not found.
Warning 12:1: Unknown field 'metric'. Did you mean 'metrics'?
[13:56:38 WRN] Validation step 3/6 failed. Error(s): Errors were found while deserializing the metric configuration.
[13:56:38 INF] Start Validation step 4/6: Resource Discovery
[13:56:38 INF] Validation step 4/6 succeeded
[13:56:38 INF] Start Validation step 5/6: StatsD Metric Sink
[13:56:38 INF] Validation step 5/6 succeeded
[13:56:38 INF] Start Validation step 6/6: Prometheus Scraping Endpoint Metric Sink
[13:56:38 INF] Validation step 6/6 succeeded
[13:56:38 FTL] Promitor is not configured correctly. Please fix validation issues and re-run.
[13:56:38 FTL] Host terminated unexpectedly
Promitor.Agents.Scraper.Validation.Exceptions.ValidationFailedException: Validation Failed. Errors:- Metrics Declaration: Errors were found while deserializing the metric configuration.

   at Promitor.Agents.Scraper.Validation.RuntimeValidator.ProcessValidationResults(List`1 validationResults) in /home/adam/github.com/adamconnelly/promitor/src/Promitor.Agents.Scraper/Validation/RuntimeValidator.cs:line 61
   at Promitor.Agents.Scraper.Validation.RuntimeValidator.Run() in /home/adam/github.com/adamconnelly/promitor/src/Promitor.Agents.Scraper/Validation/RuntimeValidator.cs:line 47
   at Promitor.Agents.Scraper.Startup.ValidateRuntimeConfiguration(IServiceCollection services) in /home/adam/github.com/adamconnelly/promitor/src/Promitor.Agents.Scraper/Startup.cs:line 87
   at Promitor.Agents.Scraper.Startup.ConfigureServices(IServiceCollection services) in /home/adam/github.com/adamconnelly/promitor/src/Promitor.Agents.Scraper/Startup.cs:line 58
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.InvokeCore(Object instance, IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.<>c__DisplayClass9_0.<Invoke>g__Startup|0(IServiceCollection serviceCollection)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.Invoke(Object instance, IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.<>c__DisplayClass8_0.<Build>b__0(IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.UseStartup(Type startupType, HostBuilderContext context, IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.<>c__DisplayClass12_0.<UseStartup>b__0(HostBuilderContext context, IServiceCollection services)
   at Microsoft.Extensions.Hosting.HostBuilder.CreateServiceProvider()
   at Microsoft.Extensions.Hosting.HostBuilder.Build()
   at Promitor.Agents.Scraper.Program.Main(String[] args) in /home/adam/github.com/adamconnelly/promitor/src/Promitor.Agents.Scraper/Program.cs:line 23

A validation failure isn't really exceptional: it's part of the normal lifecycle of the application, and we expect that validation can fail (which is why we're doing it in the first place!), so relying on an exception like this doesn't feel appropriate. The stack trace doesn't add value for Promitor users, since the validation output already tells them what's wrong. Developers don't need it either, since we know where the validation code lives and can see which step failed from the error message.

Specification

The application should follow this lifecycle:

1. Load configuration and register any services.
2. Run validation to make sure we can start successfully.
3. Start the web server and scraping jobs.

We can use a similar approach to the one outlined here to achieve this: https://andrewlock.net/running-async-tasks-on-app-startup-in-asp-net-core-part-1/#4-manually-running-tasks-in-program-cs.

This will allow us to run validation, exiting with a non-zero exit code if validation fails.
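
To make the proposed ordering concrete, here is a minimal, framework-free sketch of it. `LifecycleSketch`, its method names, the "null means the check passed" convention and the exit code `2` are all assumptions made for illustration; they are not Promitor's actual types or values, and a real implementation would build the host and resolve the validator from it rather than take delegates.

```csharp
using System;
using System.Collections.Generic;

public static class LifecycleSketch
{
    // Runs each check and returns the collected error messages (empty when everything passed).
    public static List<string> Validate(IEnumerable<Func<string>> checks)
    {
        var errors = new List<string>();
        foreach (var check in checks)
        {
            var error = check(); // Convention for this sketch: null means the check passed.
            if (error != null)
            {
                errors.Add(error);
            }
        }
        return errors;
    }

    // Returns the process exit code: 0 when the app was started, non-zero when validation failed.
    public static int Run(IEnumerable<Func<string>> checks, Action startApplication)
    {
        // 1. Load configuration and register services (omitted in this sketch).

        // 2. Validate before anything is started.
        var errors = Validate(checks);
        if (errors.Count > 0)
        {
            foreach (var error in errors)
            {
                Console.Error.WriteLine(error);
            }
            Console.Error.WriteLine("Promitor is not configured correctly. Please fix validation issues and re-run.");
            return 2; // Arbitrary non-zero code chosen for this sketch.
        }

        // 3. Only now start the web server and scraping jobs.
        startApplication();
        return 0;
    }

    public static int Main(string[] args)
    {
        var checks = new List<Func<string>>
        {
            () => null, // a check that passes
            () => "Metrics Declaration: 'metrics' is a required field but was not found."
        };
        return Run(checks, () => Console.WriteLine("Web server and scraping jobs started"));
    }
}
```

Because `Run` returns a code instead of throwing, the process exits without a stack trace, and the validation messages are the last thing left on the console.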

Question

Currently validation continues even if a previous step has failed. Is this deliberate, and is it the behaviour we want?
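
For reference, the two behaviours in question can be sketched side by side. This is a hypothetical illustration, not the `RuntimeValidator` code: `ValidationOrderSketch` and the `stopOnFirstFailure` flag are invented names, and the step names are borrowed from the log output above.

```csharp
using System;
using System.Collections.Generic;

public static class ValidationOrderSketch
{
    // Runs the steps in order and returns the names of those that failed.
    // With stopOnFirstFailure set, later steps are skipped once one step fails.
    public static List<string> Run(
        IEnumerable<KeyValuePair<string, Func<bool>>> steps,
        bool stopOnFirstFailure)
    {
        var failed = new List<string>();
        foreach (var step in steps)
        {
            if (step.Value())
            {
                continue; // Step passed, move on to the next one.
            }

            failed.Add(step.Key);
            if (stopOnFirstFailure)
            {
                break;
            }
        }
        return failed;
    }

    public static void Main()
    {
        var steps = new List<KeyValuePair<string, Func<bool>>>
        {
            new KeyValuePair<string, Func<bool>>("Metrics Declaration Path", () => true),
            new KeyValuePair<string, Func<bool>>("Metrics Declaration", () => false),
            new KeyValuePair<string, Func<bool>>("Resource Discovery", () => false)
        };

        // Run everything: both failing steps are reported in one pass.
        Console.WriteLine(string.Join(", ", Run(steps, stopOnFirstFailure: false)));

        // Fail fast: only the first failing step is reported.
        Console.WriteLine(string.Join(", ", Run(steps, stopOnFirstFailure: true)));
    }
}
```

Running every step lets a user fix several problems in one go, while failing fast avoids reporting follow-on errors that are only a consequence of an earlier failed step.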


I've tweaked the way that the startup process for Promitor works so that it runs the validation in the `Main()` method. This gives us the opportunity to exit gracefully if validation fails instead of throwing an exception. I've also added a new enum to track the possible exit statuses, and made sure that unhandled exceptions continue to use an exit code of `1`. Fixes tomkerkhove#1113

I've tweaked the way that the startup process for Promitor works so that it runs the validation in the `Main()` method. This gives us the opportunity to exit gracefully if validation fails instead of throwing an exception.

Also:

- Added a new enum to track the possible exit statuses, and made sure that unhandled exceptions continue to use an exit code of `1`.
- Updated the unhandled exception message to point people to raising an issue.
- Altered the check to make sure the config folder is set so that it exits gracefully instead of ending up in the unhandled exception block.
- Moved the logging about whether or not the configuration is valid from RuntimeValidator into the main method. It seemed more appropriate for the logging to be there since the main method now has logic for exiting if the config is invalid.

Fixes tomkerkhove#1113
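
As a rough illustration of the exit-status idea described above, the sketch below maps the three outcomes to exit codes. Only the enum name `ExitStatus` and the value `1` for unhandled exceptions come from this thread; the member names, the value `2` for validation failures, the `ExitStatusSketch` class and the wording of the unhandled exception message are assumptions.

```csharp
using System;

// Member names and values are assumptions for this sketch, apart from
// unhandled exceptions mapping to 1 as described in the thread.
public enum ExitStatus
{
    Success = 0,
    UnhandledException = 1,
    ValidationFailed = 2 // hypothetical value
}

public static class ExitStatusSketch
{
    // Validates, starts the host if validation passed, and maps each outcome to an exit code.
    public static int Run(Func<bool> validate, Action startHost)
    {
        try
        {
            if (!validate())
            {
                Console.Error.WriteLine("Promitor is not configured correctly. Please fix validation issues and re-run.");
                return (int)ExitStatus.ValidationFailed;
            }

            startHost();
            return (int)ExitStatus.Success;
        }
        catch (Exception exception)
        {
            // Anything unexpected still ends the process with exit code 1.
            // The message text here is illustrative only.
            Console.Error.WriteLine($"Unhandled exception, please consider raising an issue: {exception.Message}");
            return (int)ExitStatus.UnhandledException;
        }
    }

    public static int Main(string[] args)
    {
        return Run(() => false, () => Console.WriteLine("Host started"));
    }
}
```

Keeping validation failures on their own exit code lets scripts and orchestrators tell a configuration problem apart from a genuine crash.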

tomkerkhove · 2020-06-29T05:56:28Z

Re-opening for resource discovery agent

- Moved `ExitStatus` into the agents core and updated the discovery agent to use it.
- Updated the unhandled exception message for the discovery agent to match the format of the scraper agent.
- Added some additional validation to both agents to check that their required config files exist. This is to avoid us ending up in the unhandled exception block and directing users to create an issue.

Fixes tomkerkhove#1113
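
The required-config-file check mentioned above could look roughly like the following. This is a sketch under assumptions: `ConfigFileCheckSketch`, the `runtime.yaml` file name, the error message and the exit code `2` are all hypothetical and not taken from the Promitor code base.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ConfigFileCheckSketch
{
    // Returns the required files that are missing; an empty list means startup can continue.
    public static List<string> FindMissingFiles(IEnumerable<string> requiredPaths)
    {
        var missing = new List<string>();
        foreach (var path in requiredPaths)
        {
            if (!File.Exists(path))
            {
                missing.Add(path);
            }
        }
        return missing;
    }

    public static int Main(string[] args)
    {
        // Hypothetical file name, used only for illustration.
        var missing = FindMissingFiles(new[] { "runtime.yaml" });
        foreach (var path in missing)
        {
            Console.Error.WriteLine($"Required configuration file '{path}' was not found.");
        }

        // 2 is an arbitrary non-zero code for this sketch.
        return missing.Count == 0 ? 0 : 2;
    }
}
```

Checking for the files up front turns a missing file into an ordinary validation message rather than an exception that would send the user to the issue tracker.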

triage-new-issues bot added the triage label Jun 27, 2020

adamconnelly mentioned this issue Jun 27, 2020

Exit gracefully if a validation error occurs #1114

Merged

tomkerkhove closed this as completed in #1114 Jun 29, 2020

tomkerkhove added this to the v2.0.0 milestone Jun 29, 2020

tomkerkhove added the enhancement label Jun 29, 2020

triage-new-issues bot removed the triage label Jun 29, 2020

tomkerkhove reopened this Jun 29, 2020

tomkerkhove assigned adamconnelly Jun 29, 2020

adamconnelly mentioned this issue Jul 3, 2020

Update discovery agent startup to match scraper #1144

Merged

tomkerkhove closed this as completed in #1144 Jul 6, 2020
