add support for custom TLS certificates #798
Well, tests are already better, since they do not fail with some obscure error, but rather with timeout. I'll check why. |
@zimbatm since we enabled CRD validation all new manifest fields must be covered by it. That's why the test is failing - validation doesn't know Have a look at the CRD and add the field there (alphabetical order). If you have questions feel free to ask. Had to deal with validation a lot in the past months. |
Thanks @FxKu. I just pushed updates to both schema files. Let's see what CI is saying now. |
4ac7c38 to 6b59a55
it looks like coveralls is unhappy. any idea how I could add test coverage for |
Stoked on this PR, as it will be really helpful for what we have going. But when testing with the current spilo, spilo (or maybe patroni) raises an issue with the permissions:
|
Disregard my comment. Issue is related to the operator, not this PR. Opened issue #821 |
This will fix #633 (mentioning it here just so it gets linked to that issue) |
6b59a55 to d152ea9
Rebased to fix the merge conflicts. @FxKu is it possible to schedule 30min next week to get this merged? You can reserve a slot at https://calendly.com/zimbatm/ |
@FxKu ok, let me know if that's better! |
pkg/cluster/k8sres.go
Outdated
Just found another issue. In one of your documented examples caFile is left out. But it is still appended as an env variable and Postgres tries to look it up, but:
FATAL: could not load root certificate file "/tls/ca.crt": No such file or directory
Maybe the unit test could also be extended then to check that the env variable is not created when the CA is left out
When I did my testing, Postgres complains but keeps going. Is that not the case for you?
I will add documentation for the cafile.
Made the CAFile optional and documented it a bit in 662e7e68e90ed199b6fd6df1ac0388a1bb40407c
I wanted both secret types to work out of the box but I understand if the log line is undesirable.
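For readers following this thread, the optional-CA behaviour discussed above could be sketched roughly as below. This is only an illustration, not the code from the PR: the `TLSDescription` and `EnvVar` types, the `tlsEnvVars` helper, the env variable names, and the `tls.crt`/`tls.key` defaults are all assumptions made for the example. Only the `/tls` mount path and the `ca.crt` file name come from the error message quoted earlier.

```go
package main

import (
	"fmt"
	"path"
)

// TLSDescription is a hypothetical stand-in for the TLS section of the
// cluster manifest; the field names are assumptions for this sketch.
type TLSDescription struct {
	SecretName      string
	CertificateFile string
	PrivateKeyFile  string
	CAFile          string // optional: empty means "no CA file in the secret"
}

// EnvVar is a simplified stand-in for a Kubernetes container env variable.
type EnvVar struct {
	Name  string
	Value string
}

// tlsEnvVars builds the env variables handed to the database container.
// The CA variable is only emitted when CAFile is set, so Postgres is never
// pointed at a file that does not exist in the mounted secret.
func tlsEnvVars(tls TLSDescription, mountPath string) []EnvVar {
	certFile := tls.CertificateFile
	if certFile == "" {
		certFile = "tls.crt" // assumed default, not confirmed by this thread
	}
	keyFile := tls.PrivateKeyFile
	if keyFile == "" {
		keyFile = "tls.key" // assumed default, not confirmed by this thread
	}

	// Env variable names are illustrative assumptions.
	envs := []EnvVar{
		{Name: "SSL_CERTIFICATE_FILE", Value: path.Join(mountPath, certFile)},
		{Name: "SSL_PRIVATE_KEY_FILE", Value: path.Join(mountPath, keyFile)},
	}
	if tls.CAFile != "" {
		envs = append(envs, EnvVar{Name: "SSL_CA_FILE", Value: path.Join(mountPath, tls.CAFile)})
	}
	return envs
}

func main() {
	// Without a CA file, only the certificate and key variables are produced.
	for _, e := range tlsEnvVars(TLSDescription{SecretName: "pg-tls"}, "/tls") {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

A unit test along the lines suggested above would then assert that no CA variable appears when `CAFile` is empty, and that it does appear (with the mounted path) when it is set.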
docs/reference/cluster_manifest.md
Outdated
Defaults to "ca.crt".
missing (?)
Not anymore, I had to remove it to avoid the Postgres runtime error.
docs/user.md
Outdated
hm, can't it be read from the secret if it's in there?
The secret resource is not inspectable during schema generation. The secret resource might not exist, or might also be changed at a later point in time. Reading it would require adding another hook in the operator to listen for changes on that secret and update the schema accordingly. I don't know if it's worth the extra complexity.
Another solution would be to change the spilo image to unset the environment variable if the file doesn't exist. This would hide errors if the file really is supposed to exist and is not there.
That's why initially I just passed the env to the spilo image and let postgres complain. It's unfortunate that the log line contains "FATAL" because it didn't seem to affect the running instance the last time I tested it.
I see. Thanks for the explanation.
👍 |
@FxKu last commit to improve the user documentation |
docs/user.md
Outdated
That's not very precise. Are changes ignored? Does it break the cluster?
It inherits the same issue that changing spilo_fsgroup has, where the pod needs to be re-created.
I will remove that section for now.
Once the user reads up on spilo_fsgroup they will probably find the issue.
I think this is ready |
daf62d1 to 0803b90
rebased and squashed |
Thanks @zimbatm. Can you address my last comment? We are already telling in the reference docs that changing the |
fair enough, I will just remove that section then. |
it looks like the e2e test failures are also there in master |
👍 |
yes that's one part in the e2e test which probably needs more waiting time, since it's passing at times and sometimes not :-/ |
👍 |
Thanks @zimbatm for adding this cool feature and your patience in the review process. |
yesssssss!! 🎉 thanks to you too for helping me out on this PR! |
@FxKu @zimbatm
The statefulSet does not get the mount point for the secret. Nothing seems to happen, and no errors in the opr logs. |
It's hard to say without more details. I haven't used openshift before so I don't know what changes from a vanilla k8s installation. The first thing I would do is look at the event log to see if anything stands out. If it's an existing pg cluster, try creating a fresh one as pods don't really like the fsGroup to be changed. |
right, it's the new section that adds fsGroup automatically that is causing the issue:
We must have it autodetect the group, or ideally simply remove it. When we remove the fsGroup from the sts (for a few minutes, until the operator puts it back), perms and all look like this:
k8sres.go:
and
IMO adding TLS should not automatically add functionality like fsGroup when it was not specifically requested. For these two reasons (it does not work on OpenShift, and the behaviour is non-intuitive), I suggest removing the above two sections. I have simulated removing that section in OpenShift, and all seems to work very nicely (thanks for the code!). If fsGroup is mandatory for k8s to fix perms, we should state in the docs that the user should add fsGroup, maybe changing the docs:
to something like:
(ideally add more info in the last phrase). @zimbatm @FxKu , WDYS? |
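To make the proposal above concrete, here is a minimal sketch of "only set fsGroup when the user asked for it". It is an illustration under assumptions, not the operator's actual code: `SecurityContext` is a simplified stand-in for the Kubernetes pod security context, and `podSecurityContext` and its warning text are hypothetical names invented for this example.

```go
package main

import "fmt"

// SecurityContext is a simplified stand-in for the pod security context;
// only the field relevant to this discussion is modelled.
type SecurityContext struct {
	FSGroup *int64
}

// podSecurityContext copies the fsGroup only when it was explicitly
// configured. Enabling TLS never injects a group on its own; instead a
// warning is returned so the user can decide whether to set one.
func podSecurityContext(configuredFSGroup *int64, tlsEnabled bool) (SecurityContext, string) {
	warning := ""
	if tlsEnabled && configuredFSGroup == nil {
		// Hypothetical hint; the wording is not taken from the operator.
		warning = "TLS is enabled without an fsGroup; the mounted secret may not be readable by the database user"
	}
	return SecurityContext{FSGroup: configuredFSGroup}, warning
}

func main() {
	// TLS enabled, no group requested: nothing is added to the pod spec.
	sc, warning := podSecurityContext(nil, true)
	fmt.Println("fsGroup set:", sc.FSGroup != nil)
	fmt.Println("warning:", warning)

	// Group requested explicitly: it is passed through unchanged.
	group := int64(103) // example value only
	sc, _ = podSecurityContext(&group, true)
	fmt.Println("fsGroup:", *sc.FSGroup)
}
```

The trade-off is that on plain Kubernetes the user may then have to set the group themselves for the secret files to be readable, which is why the docs change suggested above would matter.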
* solves #798 (comment) Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* solves zalando/postgres-operator#798 (comment) Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
re-opened #690 from my account to allow pushes from maintainers.