Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #30284 - Improve smartproxy registration failure error messages #912

Merged
merged 2 commits into from
Jan 27, 2021

Conversation

wbclark
Copy link
Contributor

@wbclark wbclark commented Jan 6, 2021

draft PR

@wbclark
Copy link
Contributor Author

wbclark commented Jan 6, 2021

Current output when httpd is down:

    2021-01-06 17:17:26 [NOTICE] [configure] 1600 out of 1684 done.
    2021-01-06 17:17:30 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Could not evaluate: Exception Failed to open TCP connection to centos7-katello-3-18.mercury.example.com:443 (Connection refused - connect(2) for "centos7-katello-3-18.mercury.example.com" port 443) in get request to: https://centos7-katello-3-18.mercury.example.com/api/v2/smart_proxies?search=name=%22centos7-foreman-proxy-nightly.mercury.example.com%22
    2021-01-06 17:17:30 [ERROR ] [configure] Wrapped exception:
    2021-01-06 17:17:30 [ERROR ] [configure] Failed to open TCP connection to centos7-katello-3-18.mercury.example.com:443 (Connection refused - connect(2) for "centos7-katello-3-18.mercury.example.com" port 443)
    2021-01-06 17:17:30 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Failed to call refresh: Exception Failed to open TCP connection to centos7-katello-3-18.mercury.example.com:443 (Connection refused - connect(2) for "centos7-katello-3-18.mercury.example.com" port 443) in get request to: https://centos7-katello-3-18.mercury.example.com/api/v2/smart_proxies?search=name=%22centos7-foreman-proxy-nightly.mercury.example.com%22
    2021-01-06 17:17:30 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Exception Failed to open TCP connection to centos7-katello-3-18.mercury.example.com:443 (Connection refused - connect(2) for "centos7-katello-3-18.mercury.example.com" port 443) in get request to: https://centos7-katello-3-18.mercury.example.com/api/v2/smart_proxies?search=name=%22centos7-foreman-proxy-nightly.mercury.example.com%22
    2021-01-06 17:17:30 [ERROR ] [configure] Wrapped exception:
    2021-01-06 17:17:30 [ERROR ] [configure] Failed to open TCP connection to centos7-katello-3-18.mercury.example.com:443 (Connection refused - connect(2) for "centos7-katello-3-18.mercury.example.com" port 443)
    2021-01-06 17:17:32 [NOTICE] [configure] System configuration has finished.
    2021-01-06 17:17:32 [NOTICE] [post] Executing hooks in group post
    2021-01-06 17:17:32 [NOTICE] [post] All hooks in group post finished
  
      There were errors detected during install.
      Please address the errors and re-run the installer to ensure the system is properly configured.
      Failing to do so is likely to result in broken functionality.
  
      The full log is at /var/log/foreman-installer/foreman-proxy-content.log

@wbclark
Copy link
Contributor Author

wbclark commented Jan 6, 2021

When foreman.service is down, there is no issue because foreman.socket will bring it back up.

OTOH, when foreman.socket itself is down:

    2021-01-06 19:14:49 [NOTICE] [configure] 1600 out of 1684 done.
    2021-01-06 19:14:53 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Could not evaluate: Proxy centos7-foreman-proxy-nightly.mercury.example.com cannot be retrieved: unknown error (response 503)
    2021-01-06 19:14:53 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Failed to call refresh: Proxy centos7-foreman-proxy-nightly.mercury.example.com cannot be retrieved: unknown error (response 503)
    2021-01-06 19:14:53 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Proxy centos7-foreman-proxy-nightly.mercury.example.com cannot be retrieved: unknown error (response 503)
    2021-01-06 19:14:55 [NOTICE] [configure] System configuration has finished.
    2021-01-06 19:14:55 [NOTICE] [post] Executing hooks in group post
    2021-01-06 19:14:55 [NOTICE] [post] All hooks in group post finished
  
      There were errors detected during install.
      Please address the errors and re-run the installer to ensure the system is properly configured.
      Failing to do so is likely to result in broken functionality.
  
      The full log is at /var/log/foreman-installer/foreman-proxy-content.log

@wbclark
Copy link
Contributor Author

wbclark commented Jan 6, 2021

When a 401 response is received (for example by using incorrect --foreman-proxy-oauth-consumer-key):

    2021-01-06 20:48:53 [NOTICE] [configure] 1600 out of 1684 done.
    2021-01-06 20:48:57 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Could not evaluate: Proxy centos7-foreman-proxy-nightly.mercury.example.com cannot be retrieved: unknown error (response 401)
    2021-01-06 20:48:57 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Failed to call refresh: Proxy centos7-foreman-proxy-nightly.mercury.example.com cannot be retrieved: unknown error (response 401)
    2021-01-06 20:48:57 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Proxy centos7-foreman-proxy-nightly.mercury.example.com cannot be retrieved: unknown error (response 401)
    2021-01-06 20:48:59 [NOTICE] [configure] System configuration has finished.
    2021-01-06 20:48:59 [NOTICE] [post] Executing hooks in group post
    2021-01-06 20:48:59 [NOTICE] [post] All hooks in group post finished
  
      There were errors detected during install.
      Please address the errors and re-run the installer to ensure the system is properly configured.
      Failing to do so is likely to result in broken functionality.
  
      The full log is at /var/log/foreman-installer/foreman-proxy-content.log

@wbclark wbclark force-pushed the clear_proxy_error branch 2 times, most recently from 3f9b0f0 to 5912fc8 Compare January 6, 2021 22:47
@wbclark wbclark force-pushed the clear_proxy_error branch 4 times, most recently from ab88b03 to 82ede21 Compare January 7, 2021 07:21
@wbclark wbclark force-pushed the clear_proxy_error branch 3 times, most recently from 303a7a6 to 661fdde Compare January 13, 2021 15:52
@wbclark wbclark force-pushed the clear_proxy_error branch 10 times, most recently from f0713d8 to c165811 Compare January 15, 2021 16:57
@wbclark wbclark changed the title [WIP] implement clearer error messages for smartproxy registration failures implement clearer error messages for smartproxy registration failures Jan 15, 2021
@wbclark wbclark marked this pull request as ready for review January 15, 2021 16:59
@wbclark wbclark changed the title implement clearer error messages for smartproxy registration failures Refs #30284 - Improve smartproxy registration failure error messages Jan 18, 2021
@wbclark
Copy link
Contributor Author

wbclark commented Jan 18, 2021

Error output with the latest version is as below:

    2021-01-18 23:46:13 [NOTICE] [configure] 1600 out of 1684 done.
    2021-01-18 23:46:17 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Could not evaluate: Error making GET request to Foreman at https://centos7-katello-3-18.mercury.example.com/api/v2/smart_proxies: 503 Service Unavailable: The webserver was unable to reach the backend service. Is foreman.service running?
    2021-01-18 23:46:17 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Failed to call refresh: Error making GET request to Foreman at https://centos7-katello-3-18.mercury.example.com/api/v2/smart_proxies: 503 Service Unavailable: The webserver was unable to reach the backend service. Is foreman.service running?
    2021-01-18 23:46:17 [ERROR ] [configure] /Stage[main]/Foreman_proxy::Register/Foreman_smartproxy[centos7-foreman-proxy-nightly.mercury.example.com]: Error making GET request to Foreman at https://centos7-katello-3-18.mercury.example.com/api/v2/smart_proxies: 503 Service Unavailable: The webserver was unable to reach the backend service. Is foreman.service running?
    2021-01-18 23:46:20 [NOTICE] [configure] System configuration has finished.

Removing additional unnecessary language and de-duplicating error messages in foreman-installer output will require an additional change in Kafo, independent of the implementation here.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 19, 2021

FYI: I still think that theforeman/kafo#290 is the way to go for real error reporting. I suspect you have more context available to give a precise message.

It looks like a powerful solution, but it also requires a lot more build out, so it looks like there isn't an easy path to get that done for the next release?

@ehelms
Copy link
Member

ehelms commented Jan 19, 2021

A couple of minor comments from me. I think this is a good update to the output messages for an area that is often unclear as to what might be the underlying cause when users hit it to help users and developers more easily diagnosis and fix the problem. I think there have been some further updates outlined we could investigate for general error reporting the installer, and for me those should not hold up this improvement.

messages = {
'400' => '400 Bad Request: Something is wrong with the data we sent to Foreman server',
'401' => '401 Unauthorized Request: Check credentials for validity',
'404' => '404 Not Found: The requested resource was not found',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an edge case to me. The only thing I can think of is that they entered the wrong server URL and end up on a different service.

Copy link
Contributor Author

@wbclark wbclark Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely an edge case and I had a few more in mind when I wrote it, based on situations I've seen previously:

  1. Attempted improper use of load balancing

  2. "I forgot I cloned this VM and left it running on the same network"

  3. Automation/scripts run amok

'504' => 'The webserver timed out waiting for a response from the backend service. Is Foreman under unusually heavy load?'
}

if explanation = explanations[response.code.to_str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is to use braces when there is an assignment as a visual cue. Otherwise it's too easy to confuse with == and it may just be a bug/typo.

Suggested change
if explanation = explanations[response.code.to_str]
if (explanation = explanations[response.code.to_str])

The alternative is:

Suggested change
if explanation = explanations[response.code.to_str]
explanation = explanations[response.code.to_str]
if explanation

Copy link
Contributor Author

@wbclark wbclark Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't aware of this convention.

Thinking about it some more, would this be acceptable?

if not (explanation = explanations[response.code.to_str]).nil?

The ()s are only useful if the reader knows the convention. This makes it very clear we are testing whether explanation got assigned a non-nil value, while keeping it on one line. (unless was also tempting here but unless... else feels awkward)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is to add Rubocop to the repo which will enforce the convention. If you don't do it and meant if x == y, Rubocop will ask you to change it to if (x = y). That should tell you enough.

On a side note, Python thought it was a bad idea to allows assignment statements in if conditions to prevent those logic bugs. Recently it did add if x := y as a compromise.

That's the theory behind it.

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it Refs instead of Fixes? What do you think is needed to close the issue?

Code wise I think this is ok.

@ehelms
Copy link
Member

ehelms commented Jan 22, 2021

Same, I do feel this 'Fixes' the issue. And other enhancements we may do can be captured by independent issues.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 22, 2021

Why is it Refs instead of Fixes? What do you think is needed to close the issue?

Code wise I think this is ok.

I'd like to also remove near duplicate lines like Unable to Evaluate: #{some_message} vs. Failed to call Refresh: #{the_same_message} from installer output

But I'm happy to treat that as a separate issue.... I'll change this one to Fixes

@wbclark wbclark changed the title Refs #30284 - Improve smartproxy registration failure error messages Fixes #30284 - Improve smartproxy registration failure error messages Jan 22, 2021
@wbclark
Copy link
Contributor Author

wbclark commented Jan 25, 2021

Forgot to comment on Friday that I did in fact update to use the parenthesis convention for variable assignment within conditional.

@ekohl could you take a look and let me know if you are satisfied?

@wbclark
Copy link
Contributor Author

wbclark commented Jan 26, 2021

I have rebased to solve the merge conflict

@ehelms
Copy link
Member

ehelms commented Jan 26, 2021

I'd like to get this change in before I do a release of the module.

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline there are some recommendations for more actionable error messages but I'm ok merging this and improving it later on.

@@ -96,6 +99,20 @@ def success?(response)
end

def error_message(response)
JSON.parse(response.body)['error']['full_messages'].join(' ') rescue "unknown error (response #{response.code})"
explanations = {
'400' => 'Something is wrong with the data sent to Foreman server',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud because it's a bit vague. How about

Suggested change
'400' => 'Something is wrong with the data sent to Foreman server',
'400' => 'Receiving server (Foreman) did not accept our data',

The reason I put Foreman between braces is that if it actually hits some proxy or something else in between then it may be clearer. Of course there shouldn't be anything in between but if everything went well we wouldn't be writing this code :) I believe 400 is more likely to be returned by some non-Foreman server in between. Like if a user pointed it at https://internal-service.example.com

'400' => 'Something is wrong with the data sent to Foreman server',
'401' => 'Often this is caused by invalid Oauth credentials',
'404' => 'The requested resource was not found',
'500' => 'Check /var/log/foreman/production.log on Foreman server for detailed information',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #28384 (Foreman 2.2) there is a rake task. If we have the request ID (available as the X-Request-Id header) we can recommend REQUEST_ID=... foreman-rake errors:fetch_log instead of the precise error. I don't have experience with this myself, but if we want to make it easy to debug we can be much more precise nowadays. This task has to run on the Foreman server but the registration can happen from any server so we can't directly run it.

Note that Apache can also return a HTTP 500 in which case you need to look in the Apache logs. That was more common with Passenger (for example, when Rails failed to even boot) but may still happen.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 27, 2021

Thanks @ekohl . I'm not fully satisfied with that wording either (and I'm not sure that I'll ever be).

Thinking longer term, I think the best approach is to have links in these explanation messages to community wikis that can be much more comprehensive in covering various cases, with the option of downstream releases replacing those links with more specifically applicable ones for their community. Often the workflow for an user encountering one of these errors will be to copy and paste it into a search engine, so we might as well remove that step for them and just send them directly to the relevant webpage where we can really cover all possible causes in order of likelihood, and they can hopefully expand on it as new causes are discovered.

I'd like to consider this as a parking lot item to revisit in the future. For now, I'll merge it considering it's still an improvement over the current state.

@wbclark wbclark merged commit d0bfd99 into theforeman:master Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants