Cluster initialisation fails with Permission Denied on GKE #676
Never seen this error before. It would be good to know what your K8s environment looks like: where do you run it, which K8s version, are there PodSecurityPolicies defined, etc.?
Closing due to missing response. Anybody facing this problem, feel free to reopen.
Something similar happens to me when the pods restart after being killed. I'm running on GKE with version 1.3.0 with the
and after that it loops. I can run
Same here
Happens after Istio CNI failed pod initialisation for a node that is preemptible on GKE, and then force-shutting down the pg with
Seems to be related to GKE. What is the securityContext of the pod? Does it run in privileged mode or not? I also remember issues with readOnlyRootFilesystem set to true, but that was on OpenShift. Any ideas @CyberDem0n ?
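For reference, a minimal sketch of a pod-level securityContext that often resolves root-owned volume mounts like this one on GKE. The GID 103 and the image tag are assumptions for illustration, not values taken from this thread:

```yaml
# Sketch only: fsGroup tells the kubelet to chgrp mounted volumes to the
# given GID (and set the setgid bit), so a non-root postgres process can
# write to PGDATA without a manual chown.
spec:
  securityContext:
    fsGroup: 103          # assumed GID of the postgres group in the image
  containers:
    - name: postgres
      image: registry.opensource.zalan.do/acid/spilo-12:latest  # assumed tag
      securityContext:
        privileged: false
        readOnlyRootFilesystem: false
```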
One may want to try this new Spilo image, which does the chmod accordingly at startup. To get the latest versions of the images of both the operator and Spilo, do: It solved similar issues in OpenShift.
@ReSearchITEng What's the
These are the permissions that are causing the crash:
Also:
That's the acronym for our internal CI pipeline tool: continuous delivery pipeline.
We have a saying in Swedish: skit bakom spakarna; shit behind the levers (in this case me). 😃
We started running the latest cdp container.
Would you like one of the dump files as well?
Also, we have two developer laptops; the one with Windows Subsystem for Linux works with the latest cdp image, but on macOS:
both Postgres pods are waiting for something. Also, this warning is new:
and this one:
However, deleting the postgresql instance and adding it back made one of them become the leader.
Can you maybe raise this in the Spilo repository? The latter should already be fixed in the latest operator.
I can confirm this issue is not fixed in the latest version of Spilo and the operator v1.5.0. I see the same behavior on GKE and PKS with PodSecurityPolicies enabled (privileged policy enabled). It seems to work on EKS without PodSecurityPolicies. This issue is quite a problem for us because it requires manual intervention every time a Postgres pod restarts (we have two single-node Postgres instances per namespace). To get the DB online again, the following command resolves the issue:
It seems like Spilo requires the group permission bits to be 0 to start up? I tried the spilo 1.6-p114 image, but for some reason it didn't change the permissions either.
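A declarative alternative to a manual chown is the operator's spiloFSGroup setting, which is only available in newer operator releases. A sketch of a cluster manifest using it; the cluster name, team ID, version, and GID below are placeholders:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-test-cluster   # placeholder name
spec:
  teamId: "acid"
  numberOfInstances: 1
  volume:
    size: 1Gi
  postgresql:
    version: "12"
  # spiloFSGroup sets pod.spec.securityContext.fsGroup for the Spilo pods,
  # so the mounted PGDATA volume is writable by the postgres process.
  spiloFSGroup: 103
```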
Any news or plans regarding this issue? We started using the operator and are running into the same problem.
@stoetti help us to help you:
```diff
diff --git a/postgres-appliance/launch.sh b/postgres-appliance/launch.sh
index 56d4c68..4d47f2d 100755
--- a/postgres-appliance/launch.sh
+++ b/postgres-appliance/launch.sh
@@ -18,7 +18,7 @@ fi
 sysctl -w vm.dirty_background_bytes=67108864 > /dev/null 2>&1
 sysctl -w vm.dirty_bytes=134217728 > /dev/null 2>&1
-mkdir -p "$PGLOG" "$RW_DIR/postgresql" "$RW_DIR/tmp" "$RW_DIR/certs"
+mkdir -p "$PGLOG" "$PGDATA" "$RW_DIR/postgresql" "$RW_DIR/tmp" "$RW_DIR/certs"
 if [ "$(id -u)" -ne 0 ]; then
     sed -e "s/^postgres:x:[^:]*:[^:]*:/postgres:x:$(id -u):$(id -g):/" /etc/passwd > "$RW_DIR/tmp/passwd"
     cat "$RW_DIR/tmp/passwd" > /etc/passwd
@@ -35,6 +35,7 @@ done
 chown -R postgres: "$PGROOT" "$RW_DIR/certs"
 chmod -R go-w "$PGROOT"
 chmod 01777 "$RW_DIR/tmp"
+chmod 0700 "$PGDATA"
 if [ "$DEMO" = "true" ]; then
     python3 /scripts/configure_spilo.py patroni pgqd certificate pam-oauth2
```
@CyberDem0n thanks for the quick reply and the suggestion. I had to update your patch using the permission from a previous comment, '0700', instead of the suggested '07000', because a permission error came up upon initial startup. Opened a pull request zalando/spilo#447
Right, it was a stupid copy&paste error from my side. The permission clearly should be set to octal 0700, not 07000 :)
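The difference between the two modes is easy to see in isolation: 07000 is only the setuid/setgid/sticky bits with no read/write/execute permission at all, while 0700 gives the owner full access. A small Python sketch (independent of Spilo) illustrates this:

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()

# 0o7000 sets only setuid+setgid+sticky -- no rwx bits for anyone, so
# even the owner cannot enter or write the directory (unless running as root).
os.chmod(d, 0o7000)
print(oct(stat.S_IMODE(os.stat(d).st_mode)))  # 0o7000

# 0o700 is what initdb expects: full access for the owner, none for others.
os.chmod(d, 0o700)
print(oct(stat.S_IMODE(os.stat(d).st_mode)))  # 0o700
```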
I tried to set up a PG cluster with your provided minimal-database.yaml.
When the database pod starts, an error is thrown while initializing the database:
initdb: could not access directory "/home/postgres/pgdata/pgroot/data": Permission denied
I did a quick: kubectl exec -it acid-test-cluster-0 -- /bin/bash
chown postgres.postgres pgdata/
and everything is fine and working (/home/postgres/pgdata had root.root as owner).