-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add node-exporter #13
Conversation
9fdd363
to
a51b26b
Compare
* Add role to create CA and client certificates This role will likely only be used only for boostrapping an environment. It doesn't take care of certificate rotation yet, but it can be made to do so as and when required.
a51b26b
to
618f3cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine.
Would be nicer if we had some more central internal CA; and that we dont store it in S3; but babysteps. We have 3150 days to fix it :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some comments and have some questions:
Since you included everything related to metrics into this role, instead of putting it in a separate one, would it make sense to make it optional?
Can you elaborate on why the monitoring-certs
role came into existence?
Remove documentation for obsolete 'metrics_fqdn'
Also: * comment out usage of use_common_name_for_san as it is not available in ansible-2.7
In ansible 2.7, when openssl_csr sees and empty file, it panics.
They don't work so nicely with ansible 2.7, they need exact timestamp and calculating that is not fun. Fortunately they default to 10 years anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you included everything related to metrics into this role, instead of putting it in a separate one, would it make sense to make it optional?
Can you elaborate on why the
sft-monitoring-certs
role came into existence?
Minor comments. Also, the two questions above remain.
|
||
# NOTE: defaults to 'inherit' which leads to error messages not | ||
# appearing if nginx.conf is set to 'error_log: stderr' | ||
StandardError=journal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we sleeping? Is it to work around the bug that is linked? (I don't see why) or is it causing the bug that is linked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised inherit
doesn't work StandardOutput=
defaults to journal
and StandardError=inherit
just means "Connect to the same fd as StandardOutput
" Honestly sounds like a systemd bug to me. @flokli thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't added in this PR, but I fixed it anyways :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akshaymankar can you elaborate on why you removed this? The issue that you and I linked closes with Confirmed → Won't Fix
. I experienced the issue just last week, which was when I introduced the suggested workaround.
@arianvp the line you commented on, was introduced by #18. Happy to discuss anything related over there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understood from the launchpad thread, this is marked Won't Fix
for the ubuntu package only. It was fixed in the upstream according to this comment from over a year ago.
I could be wrong though. I did run the role and make sure it works. But as it is a race condition problem, I should've probably run this many times, which I didn't, so maybe I could run systemctl restart nginx
a few times to make sure it works. To be sure, what version of nginx did you face this issue with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nginx/1.18.0
. The comment also seems to refer to the Ubuntu package, no? The role installs from Nginx directly. Did you spot a link to a commit/change in the thread, so we can see that the actual fix was?
The issue has been fixed https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1581864
Also: * Rename 'nginx configuration changed' handler to 'reload nginx'
This reverts commit 86b11d3. To be discussed and done in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last change I'd like to ask for: move the Prometheus collector port 8443
into a role variable in defaults/main.yml
. Otherwise, LGTM.
listen 8443 ssl http2; | ||
listen [::]:8443 ssl http2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to a variable in defaults/main.yml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to a constant. Let's make it overridable as we need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
Moved it to a constant.
I'm the last one disagreeing with an extensive use of constants ;) I just had this line in the back of my head, which could be - when configured as part of the environment variables - feed into both, Terraform and the Ansible role. Anyway, later.
Fixes https://github.com/zinfra/backend-issues/issues/1556