OpenShift Installation #985

Closed
steveyang95 opened this issue May 19, 2020 · 16 comments

@steveyang95

steveyang95 commented May 19, 2020

Hi!

Is there any formal documentation, or a set of directions someone could write up, for getting set up on OpenShift?

I have tried the following without much luck:
#852 (comment)

I am running on OpenShift 4.4 and my OpenShift cluster creation log says: API v1.17.1 up

Error

  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 498, in _load_cluster                                                                                                       
    self._wait_caches()                                                                                                                                                                                     
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 492, in _wait_caches                                                                                                        
    raise RetryFailedError('Exceeded retry deadline')                                                                                                                                                       
patroni.utils.RetryFailedError: 'Exceeded retry deadline'                                                                                                                                                   
2020-05-19 05:47:47,803 ERROR: Error communicating with DCS                                                                                                                                                 
2020-05-19 05:47:47,803 INFO: DCS is not accessible                                                                                                                                                         
2020-05-19 05:47:47,805 WARNING: Loop time exceeded, rescheduling immediately.                                                                                                                              
2020-05-19 05:47:48,470 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:48,473 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:49,474 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:49,475 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:50,477 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:50,479 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:51,480 ERROR: ObjectCache.run ApiException()                                                                                                                                               
2020-05-19 05:47:51,482 ERROR: ObjectCache.run ApiException() 

I have also set kubernetes_use_configmaps: "true".

These are the commands that I run:

oc apply -f postgres-operator/manifests/operator-service-account-rbac.yaml
oc apply -f postgres-operator/manifests/postgres-operator.yaml
oc apply -f postgres-operator/manifests/api-service.yaml
oc apply -f postgres-operator/manifests/minimal-postgres-manifest.yaml

My configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # additional_secret_mount: "some-secret-name"
  # additional_secret_mount_path: "/some/dir"
  api_port: "8080"
  aws_region: eu-central-1
  cluster_domain: cluster.local
  cluster_history_entries: "1000"
  cluster_labels: application:spilo
  cluster_name_label: cluster-name
  # connection_pooler_default_cpu_limit: "1"
  # connection_pooler_default_cpu_request: "500m"
  # connection_pooler_default_memory_limit: 100Mi
  # connection_pooler_default_memory_request: 100Mi
  connection_pooler_image: "registry.opensource.zalan.do/acid/pgbouncer:master-7"
  # connection_pooler_max_db_connections: 60
  # connection_pooler_mode: "transaction"
  # connection_pooler_number_of_instances: 2
  # connection_pooler_schema: "pooler"
  # connection_pooler_user: "pooler"
  # custom_service_annotations: "keyx:valuez,keya:valuea"
  # custom_pod_annotations: "keya:valuea,keyb:valueb"
  db_hosted_zone: db.example.com
  debug_logging: "true"
  # default_cpu_limit: "1"
  # default_cpu_request: 100m
  # default_memory_limit: 500Mi
  # default_memory_request: 100Mi
  docker_image: registry.opensource.zalan.do/acid/spilo-12:1.6-p3
  # downscaler_annotations: "deployment-time,downscaler/*"
  # enable_admin_role_for_users: "true"
  # enable_crd_validation: "true"
  # enable_database_access: "true"
  # enable_init_containers: "true"
  # enable_lazy_spilo_upgrade: "false"
  enable_master_load_balancer: "false"
  # enable_pod_antiaffinity: "false"
  # enable_pod_disruption_budget: "true"
  enable_replica_load_balancer: "false"
  # enable_shm_volume: "true"
  # enable_sidecars: "true"
  # enable_team_superuser: "false"
  enable_teams_api: "false"
  # etcd_host: ""
  kubernetes_use_configmaps: "true"
  # infrastructure_roles_secret_name: postgresql-infrastructure-roles
  # inherited_labels: application,environment
  # kube_iam_role: ""
  # log_s3_bucket: ""
  logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
  # logical_backup_s3_access_key_id: ""
  logical_backup_s3_bucket: "my-bucket-url"
  # logical_backup_s3_region: ""
  # logical_backup_s3_endpoint: ""
  # logical_backup_s3_secret_access_key: ""
  logical_backup_s3_sse: "AES256"
  logical_backup_schedule: "30 00 * * *"
  master_dns_name_format: "{cluster}.{team}.{hostedzone}"
  # master_pod_move_timeout: 20m
  # max_instances: "-1"
  # min_instances: "-1"
  # min_cpu_limit: 250m
  # min_memory_limit: 250Mi
  # node_readiness_label: ""
  # oauth_token_secret_name: postgresql-operator
  # pam_configuration: |
  #  https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
  # pam_role_name: zalandos
  pdb_name_format: "postgres-{cluster}-pdb"
  # pod_antiaffinity_topology_key: "kubernetes.io/hostname"
  pod_deletion_wait_timeout: 10m
  # pod_environment_configmap: "default/my-custom-config"
  pod_label_wait_timeout: 10m
  pod_management_policy: "ordered_ready"
  pod_role_label: spilo-role
  # pod_service_account_definition: ""
  pod_service_account_name: "postgres-pod"
  # pod_service_account_role_binding_definition: ""
  pod_terminate_grace_period: 5m
  # postgres_superuser_teams: "postgres_superusers"
  # protected_role_names: "admin"
  ready_wait_interval: 3s
  ready_wait_timeout: 30s
  repair_period: 5m
  replica_dns_name_format: "{cluster}-repl.{team}.{hostedzone}"
  replication_username: standby
  resource_check_interval: 3s
  resource_check_timeout: 10m
  resync_period: 30m
  ring_log_lines: "100"
  secret_name_template: "{username}.{cluster}.credentials"
  # sidecar_docker_images: ""
  # set_memory_request_to_limit: "false"
  spilo_privileged: "false"
  super_username: postgres
  # team_admin_role: "admin"
  # team_api_role_configuration: "log_statement:all"
  # teams_api_url: http://fake-teams-api.default.svc.cluster.local
  # toleration: ""
  # wal_s3_bucket: ""
  watched_namespace: "*"  # listen to all namespaces
  workers: "4"

I have also tried the following and got the same ApiException():

Operator Image should be at least: registry.opensource.zalan.do/acid/postgres-operator:v1.4.0-21-g1249626-dirty
Operator should be configured with these values:
kubernetes_use_configmaps: "true"
docker_image: registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p114 #or newer

@yaroslavkasatikov

Hello,
I totally support this.
Got the same issue on OpenShift 4.4.3.

Tried with:

kubernetes_use_configmaps: "true"
docker_image: registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p119
operator: registry.opensource.zalan.do/acid/postgres-operator:v1.5.0

All outputs are the same as @steveyang95's.

I also tried to start the cluster in privileged mode without kubernetes_use_configmaps, but got this error:

2020-05-23 13:58:32,423 ERROR: failed to update leader lock
2020-05-23 13:58:32,505 INFO: not promoting because failed to update leader lock in DCS
2020-05-23 13:58:42,369 INFO: Lock owner: acid-cluster-0; I am acid-cluster-0
2020-05-23 13:58:42,418 ERROR: Permission denied
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 288, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 634, in patch_or_create
    ret = self.retry(func, self._namespace, body) if retry else func(self._namespace, body)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 455, in retry
    return self._retry.copy()(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/utils.py", line 331, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 277, in wrapper
    return getattr(self._api, func)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 203, in wrapper
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8SClient.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '025a21dc-753a-4082-b64f-c4b9689e04e7', 'Content-Type': 'application/json', 'Date': 'Sat, 23 May 2020 13:58:42 GMT', 'Content-Length': '251'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"acid-cluster\\" is forbidden: endpoint address 10.129.0.87 is not allowed","reason":"Forbidden","details":{"name":"acid-cluster","kind":"endpoints"},"code":403}\n'

2020-05-23 13:58:42,419 ERROR: failed to update leader lock
2020-05-23 13:58:42,419 INFO: not promoting because failed to update leader lock in DCS

So, could you please test your installation on OpenShift 4.4.3 and give feedback?

Many thanks,
Yaroslav

@FxKu
Member

FxKu commented May 25, 2020

Maybe @ReSearchITEng, you can weigh in here as our OpenShift user. I think you cannot simply take the Spilo image as is, but must make sure it runs in rootless mode.

@yaroslavkasatikov

yaroslavkasatikov commented May 31, 2020

Hey team,

Do you have any updates here?
We're looking for a PostgreSQL operator as a strategic solution and want to test yours, but OpenShift is the cornerstone of our entire infrastructure. It would be too sad if we couldn't run your postgres operator :-(

@FxKu
Member

FxKu commented Jun 3, 2020

@steveyang95 and @yaroslavkasatikov can you check if the solution described here helps? It suggests setting the spiloFSGroup parameter.

On the other hand, as per the docs this parameter is also not required for OpenShift. Maybe you can choose a previous Spilo release with the v1.5.0 operator? Then we can better tell where the incompatibility is coming from.
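
For reference, a minimal sketch of what setting it in a cluster manifest could look like (the cluster name comes from the minimal manifest above; the value 103 is an assumption matching the postgres group in Spilo images):

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  # fsGroup applied to the pod, so a non-root Spilo process can write the data volume
  spiloFSGroup: 103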

@Jan-M
Member

Jan-M commented Jun 4, 2020

I would look into the privileges of the pod; it seems Patroni (within the Postgres pod) is not allowed to perform leader election.

So you may lack pod privileges to update/write the ConfigMaps which are used on OpenShift for leader election.

https://github.com/zalando/postgres-operator/blob/master/manifests/operator-service-account-rbac.yaml#L220

@CyberDem0n
Contributor

@yaroslavkasatikov on OpenShift you have to set kubernetes_use_configmaps.

When using Endpoints, Patroni tries to update the subsets with the IP address of the pod that is running as primary, and on OpenShift that is not allowed :(
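
For reference, Spilo exposes the same switch to Patroni as the KUBERNETES_USE_CONFIGMAPS environment variable (the name appears in spilo#449 later in this thread); a sketch of setting it on a plain pod template:

# pod template excerpt: make Patroni keep its state in ConfigMaps instead of Endpoints
env:
- name: KUBERNETES_USE_CONFIGMAPS
  value: "true"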

@FxKu
Member

FxKu commented Jun 5, 2020

@steveyang95 and @yaroslavkasatikov can you try extending the cluster role used by the pods, and hence Patroni, to be able to read and update ConfigMaps? I guess simply replacing endpoints here with configmaps should be fine. Can you try?
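
A sketch of such a rule for the postgres-pod cluster role (the verbs are an assumption based on what Patroni needs for leader election):

- apiGroups:
  - ""
  resources:
  - configmaps  # was: endpoints
  verbs:
  - get
  - list
  - patch
  - update
  - watch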

@ReSearchITEng
Contributor

ReSearchITEng commented Jun 11, 2020

@steveyang95 @yaroslavkasatikov
I can confirm it works on OCP 4.3, tested and working. I have not tested on 4.4 (yet).
oc version returns "kubernetes v1.16.2" (aka OpenShift 4.3).

Please enable DEBUG messages, so we'll get a better understanding of which resource OCP is rejecting.

More on the OCP 4.3 setup we use:
(On a security-hardened cluster where PV read is not allowed, after 30 min (the resync period) oc get pg will change the status from Running to "SyncFailed". The cluster still works as expected; just the resync is impacted. This should be solved by PR #958. Meanwhile you can set resync_period to some big number.)
When you install the Helm chart (1.5.0), use configTarget: "ConfigMap".

resync_period: 987654321 # some big number
spilo_privileged: "false"
kubernetes_use_configmaps: "true"
docker_image: registry.opensource.zalan.do/acid/spilo-12:1.6-p3
watched_namespace: "" # if you want one operator per namespace

When you install the cluster, make sure you comment out:

#  enableShmVolume: true
#  spiloFSGroup: 103

This is because the OCP SCC will dynamically allocate the user/group, and newer Spilo images know how to chown to it at startup.
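
A minimal sketch of the corresponding Helm values (the configTarget/configGeneral key layout is an assumption to verify against the 1.5.0 chart's values.yaml):

# values.yaml excerpt for the postgres-operator Helm chart
configTarget: "ConfigMap"
configGeneral:
  kubernetes_use_configmaps: "true"
  docker_image: registry.opensource.zalan.do/acid/spilo-12:1.6-p3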

@ReSearchITEng
Contributor

Yes, on OCP 4.4 (based on k8s 1.17) the cluster pods (Spilo) give errors:

  1. I can confirm the error: ERROR: ObjectCache.run ApiException()
     Solution: giving more perms to the postgres-pod role (I also gave it all the postgres-operator perms). I did not identify the exact missing perm.

  2. After that, I ran into this error (Spilo fails to run callback_endpoint.py when KUBERNETES_USE_CONFIGMAPS is set to true, spilo#449):

2020-07-03 19:23:05,818 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
Traceback (most recent call last):
  File "/scripts/callback_endpoint.py", line 9, in <module>
    from kubernetes import client as k8s_client, config as k8s_config
ModuleNotFoundError: No module named 'kubernetes'
2020-07-03 19:23:05,915 INFO: promoted self to leader by acquiring session lock
server promoting
2020-07-03 19:23:05,919 INFO: cleared rewind state after becoming the leader
Traceback (most recent call last):
  File "/scripts/callback_endpoint.py", line 9, in <module>
    from kubernetes import client as k8s_client, config as k8s_config
ModuleNotFoundError: No module named 'kubernetes'

<grants, etc all ok>

2020-07-03 19:23:16,956 INFO: Lock owner: postgres-operator-cluster-0; I am postgres-operator-cluster-0
2020-07-03 19:23:17,053 INFO: no action.  i am the leader with the lock

The DB appears to be up (the psql command in the pod works), but the cluster is in "SyncFailed" status.

@ReSearchITEng
Contributor

ReSearchITEng commented Jul 3, 2020

Solution:

  1. postgres-pod perms:
     Make sure the postgres-pod serviceAccount also has the permissions below. If you don't have them already, add:
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - patch
  - get
  - list
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - patch
  - update
  - watch

on top of existing:

- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
# Patroni needs to watch pods
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - patch
  - update
  - watch
# to let Patroni create a headless service
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - patch
  2. postgres-pod perms:
     If not already there, add nodes perms:
  # to check nodes for node readiness label
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch

on top of existing.

  3. crd perms:
     Make sure you have at least "get", if not the entire set like get,create,patch,update,...
     More in: limit perms for crds #1044

  4. Ignorable error:
     As for the from kubernetes import client as k8s_client, config as k8s_config error: it can be safely ignored. It will be fixed via spilo#449 (Spilo fails to run callback_endpoint.py when KUBERNETES_USE_CONFIGMAPS is set to true).

  5. Result:

$ oc version
Client Version: 4.5.0-202005291417-9933eb9
Kubernetes Version: v1.17.1
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         34d     Cluster version is 4.4.5
$ oc get pg
NAME                        TEAM       VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE    STATUS
postgres-cluster   postgres   12        1      1Gi      10m           100Mi            120m   Running
$ helm version
version.BuildInfo{Version:"v3.2.4"

@stewartshea

I think you also need the create verb in the postgres-pod perms on configmaps:

- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - patch
  - update
  - watch
  - create
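
A quick way to verify the resulting permissions (a sketch; substitute your namespace, and note that postgres-pod is the pod_service_account_name from the operator config above):

# check that the pod service account is now allowed to create ConfigMaps
oc auth can-i create configmaps \
  --as=system:serviceaccount:<namespace>:postgres-pod -n <namespace>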

@davidkarlsen

Related: #1327

@davidkarlsen

davidkarlsen commented Jan 22, 2021

Shouldn't the operator provide whatever RBAC permissions are needed? Otherwise it becomes a bit hackish and not so automated.

@ghost

ghost commented Mar 4, 2022

Even when kubernetes_use_configmaps is set, the operator still tries to create endpoints, which is not allowed on OpenShift.

Probably related to this PR:

#1760

Can one of the maintainers have a look at this PR? Thx!

The only way I'm able to install the operator on OpenShift is to use an older version (1.6.3), unset kubernetes_use_configmaps so the deployment fails, and then add endpoints/restricted to the postgres-pod cluster role:

resources:
- endpoints
- endpoints/restricted

endpoints/restricted cannot be there from the start.

This trick doesn't seem to work with the latest version (1.7.x) or the main branch.

If somebody else is able to install the operator on OpenShift please share your config. Thx!

@FxKu
Member

FxKu commented Apr 4, 2022

This has now been fixed with #1760 and #1825 and will be included in the next release this week.

@FxKu FxKu closed this as completed Apr 4, 2022
@ghost

ghost commented Apr 5, 2022

This has now been fixed with #1760 and #1825 and will be included in the next release this week.

Thanks! We tested the code and I confirm that it works on OpenShift.
