Cluster creation fails due to webhook error. #167

Closed
jrab66 opened this issue Nov 25, 2022 · 11 comments
Labels
type/question Type: question about the product

Comments

@jrab66

jrab66 commented Nov 25, 2022

Describe the bug (required)
I am unable to install nebula-cluster 1.3.0 on GKE; it fails with a webhook error at install time.

The kruise, cert-manager, and operator pods are all running; the failure occurs when creating the nebula-cluster.

Your Environments (required)
Tested on two GKE versions:

  • GKE: 1.22.12-gke.2300
  • GKE: 1.24.7-gke.900

How To Reproduce (required)

prerequisites:

kruise 1.1.0
cert-manager 1.10.0
nebula-operator 1.3.0

helm install kruise openkruise/kruise --version 1.1.0
NAME: kruise
LAST DEPLOYED: Tue Nov 22 18:09:47 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None


kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.10.0/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
configmap/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created


kubectl create ns nebula-operator-system
namespace/nebula-operator-system created

helm install nebula-operator nebula-operator/nebula-operator --namespace=nebula-operator-system -f values/dev.operator.values.yaml --version 1.3.0
NAME: nebula-operator
LAST DEPLOYED: Tue Nov 22 18:12:17 2022
NAMESPACE: nebula-operator-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Nebula Operator installed!

Error when installing the nebula-cluster chart v1.3.0:

helm install starmatch-nebula-dev nebula-operator/nebula-cluster -f values/dev.values.yaml --version 1.3.0
Error: Internal error occurred: failed calling webhook "nebulaclustervalidating.nebula-graph.io": failed to call webhook: Post "https://nebula-operator-webhook-service.nebula-operator-system.svc:443/apis/admission.nebula-graph.io/v1alpha1/nebulaclustervalidating?timeout=10s": no endpoints available for service "nebula-operator-webhook-service"
exit status 1
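
For anyone hitting the same message: "no endpoints available" generally means the Service has no ready pods behind it at that moment. Below is a minimal sketch for checking that, assuming `kubectl` points at the affected cluster. The function names are hypothetical, and the default namespace and Service name are taken from the error above.

```shell
# Print the ready endpoint IPs of a Service (empty output means none).
webhook_endpoint_ips() {
  local namespace="$1" service="$2"
  kubectl get endpoints "$service" -n "$namespace" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Return 0 if the Service has at least one ready endpoint, 1 otherwise.
# Defaults are the namespace and Service name from the error message.
check_webhook_endpoints() {
  local namespace="${1:-nebula-operator-system}"
  local service="${2:-nebula-operator-webhook-service}"
  local ips
  ips="$(webhook_endpoint_ips "$namespace" "$service")"
  if [ -z "$ips" ]; then
    echo "no ready endpoints for $service in $namespace"
    return 1
  fi
  echo "ready endpoints for $service: $ips"
}
```

Calling `check_webhook_endpoints` with no arguments checks the Service named in the error. An empty result suggests the operator pods are not ready yet or the Service selector does not match them.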
@wey-gu
Contributor

wey-gu commented Nov 25, 2022

@MegaByte875 As I recall, we don't enable the webhook by default? Could we check this on GKE/GCP?

@MegaByte875
Contributor

@jrab66 The cert-manager and OpenKruise integrations are configurable, and both default to false. Please follow the install_guide doc.

@jrab66
Author

jrab66 commented Nov 28, 2022

These are the values I am using to install it.

kruise is installed with default values.

cert-manager is installed via the YAML manifest.

Operator values.yaml:

controllerManager:
  replicas: 2
  resources:
    limits:
      cpu: 200m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 300Mi

admissionWebhook:
  create: true

enableKruise: true

Sophie-Xie added the type/question label on Nov 30, 2022
@kqzh
Contributor

kqzh commented Nov 30, 2022

Hi @jrab66, please check whether cert-manager is installed successfully in your environment. If you set admissionWebhook.create = true, you need to install cert-manager before installing nebula-operator. Here is the install doc: https://cert-manager.io/docs/installation/helm/
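
One way to enforce that ordering is to wait for cert-manager's deployments to finish rolling out before running the operator install. This is only a sketch: the function name is hypothetical, the 120s default timeout is an arbitrary assumption, and the three deployment names are the ones created by the cert-manager manifest applied earlier in this issue.

```shell
# Wait until every cert-manager deployment has rolled out, or fail.
# The timeout argument is optional; 120s is an assumed default.
wait_for_cert_manager() {
  local timeout="${1:-120s}"
  local deploy
  for deploy in cert-manager cert-manager-cainjector cert-manager-webhook; do
    kubectl rollout status "deployment/$deploy" -n cert-manager \
      --timeout="$timeout" || return 1
  done
}
```

It could then gate the install, for example `wait_for_cert_manager && helm install nebula-operator nebula-operator/nebula-operator --namespace=nebula-operator-system --version 1.3.0`.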

@bradenwright

bradenwright commented Dec 1, 2022

Cert-manager is installed and working...

braden@rltadmins-MacBook-Pro-4 infrastructure % kc get all -n cert-manager
W1130 22:59:56.587268   14620 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
NAME                                           READY   STATUS    RESTARTS      AGE
pod/cert-manager-cainjector-5bbfd5f64c-pz6v9   1/1     Running   5 (31h ago)   32h
pod/cert-manager-db6ff9997-csstp               1/1     Running   1 (32h ago)   32h
pod/cert-manager-webhook-6776d98fdc-mqmcb      1/1     Running   0             32h
pod/reflector-5664d9c5b4-qjnnw                 1/1     Running   0             32h

NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.96.7.130    <none>        9402/TCP   32h
service/cert-manager-webhook   ClusterIP   10.96.10.248   <none>        443/TCP    32h

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           32h
deployment.apps/cert-manager-cainjector   1/1     1            1           32h
deployment.apps/cert-manager-webhook      1/1     1            1           32h
deployment.apps/reflector                 1/1     1            1           32h

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-cainjector-5bbfd5f64c   1         1         1       32h
replicaset.apps/cert-manager-db6ff9997               1         1         1       32h
replicaset.apps/cert-manager-webhook-6776d98fdc      1         1         1       32h
replicaset.apps/reflector-5664d9c5b4                 1         1         1       32h

The service it complains about is running, has valid endpoints, and exposes the proper port...

braden@rltadmins-MacBook-Pro-4 infrastructure % kc describe svc -n mynamespace nebula-operator-webhook-service
W1130 22:59:30.562664   14601 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Name:              nebula-operator-webhook-service
Namespace:         mynamespace
Labels:            app.kubernetes.io/component=admission-webhook
                   app.kubernetes.io/instance=dev-mn-nebula
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=nebula-operator
                   app.kubernetes.io/version=1.0.0
                   argocd.argoproj.io/instance=dev-mn-nebula
                   helm.sh/chart=nebula-operator-1.3.0
Annotations:       argocd.argoproj.io/sync-wave: -4
                   cloud.google.com/neg: {"ingress":true}
Selector:          app.kubernetes.io/component=controller-manager,app.kubernetes.io/instance=dev-mn-nebula,app.kubernetes.io/name=nebula-operator
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.1.102
IPs:               10.96.1.102
Port:              <unset>  443/TCP
TargetPort:        9443/TCP
Endpoints:         10.92.5.5:9443,10.92.7.8:9443
Session Affinity:  None
Events:            <none>

The version of cert-manager I'm running is...

jetstack cert-manager helm chart v1.10.0
app version for cert-manager v1.10.0

which is one of the latest if not the latest version of cert-manager. Anything else to check? Still getting:

Failed sync attempt to 282abdc49f98d84353b747e1905d8b3db9be05a9: one or more objects failed to apply, reason: Internal error occurred: failed calling webhook "nebulaclustervalidating.nebula-graph.io": failed to call webhook: Post "https://nebula-operator-webhook-service.mynamespace.svc:443/apis/admission.nebula-graph.io/v1alpha1/nebulaclustervalidating?timeout=10s": context deadline exceeded (retried 5 times).

@kqzh
Contributor

kqzh commented Dec 1, 2022

Hi @bradenwright, I tested with cert-manager v1.10.1 in my local environment. Can you run kubectl describe deploy nebula-operator-controller-manager-deployment -n <your-namespace> and show me the result?

@bradenwright

bradenwright commented Dec 1, 2022

It looks good from what I can tell, nebula-operator-controller-manager-deployment 2/2 2 2 5m17s:

braden@rltadmins-MacBook-Pro-4 helm % kc describe deploy -n mynamespace nebula-operator-controller-manager-deployment
W1201 01:16:13.798444   31715 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Name:                   nebula-operator-controller-manager-deployment
Namespace:              mynamespace
CreationTimestamp:      Thu, 01 Dec 2022 01:10:27 -0600
Labels:                 app.kubernetes.io/component=controller-manager
                        app.kubernetes.io/instance=prod-sm-nebula
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=nebula-operator
                        app.kubernetes.io/version=1.0.0
                        argocd.argoproj.io/instance=prod-sm-nebula
                        helm.sh/chart=nebula-operator-1.3.0
Annotations:            argocd.argoproj.io/sync-wave: -4
                        deployment.kubernetes.io/revision: 1
Selector:               app.kubernetes.io/component=controller-manager,app.kubernetes.io/instance=prod-sm-nebula,app.kubernetes.io/name=nebula-operator
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/component=controller-manager
                    app.kubernetes.io/instance=prod-sm-nebula
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=nebula-operator
                    app.kubernetes.io/version=1.0.0
                    helm.sh/chart=nebula-operator-1.3.0
  Service Account:  nebula-operator-controller-manager-sa
  Containers:
   controller-manager:
    Image:      vesoft/nebula-operator:v1.3.0
    Port:       9443/TCP
    Host Port:  0/TCP
    Command:
      /usr/local/bin/controller-manager
    Args:
      --health-probe-bind-address=:8081
      --metrics-bind-address=:8080
      --enable-kruise=true
      --max-concurrent-reconciles=3
      --enable-leader-election
      --leader-election-namespace=mynamespace
      --admission-webhook=true
    Limits:
      cpu:     200m
      memory:  300Mi
    Requests:
      cpu:        200m
      memory:     300Mi
    Liveness:     http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:    http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
   kube-rbac-proxy:
    Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=2
    Limits:
      cpu:     100m
      memory:  30Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nebula-operator-webhook-secret
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   nebula-operator-controller-manager-deployment-58f6ccc4b7 (2/2 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  5m47s  deployment-controller  Scaled up replica set nebula-operator-controller-manager-deployment-58f6ccc4b7 to 2

@kqzh
Contributor

kqzh commented Dec 1, 2022

It seems the selector of your nebula-operator-webhook-service does not match the labels of nebula-operator-controller-manager-deployment: the app.kubernetes.io/instance label is different. Can you edit it and try again?
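
To make that comparison mechanical, here is a small sketch that checks whether every key=value pair of a Service selector appears in a pod's labels. The function name is hypothetical, and both arguments are assumed to be comma-separated lists like the Selector line printed by kubectl describe.

```shell
# Return 0 when every key=value pair in the selector also appears in the labels.
# Prints the first pair that is missing and returns 1 otherwise.
selector_matches_labels() {
  local selector="$1" labels="$2" pair
  local IFS=','   # split the selector on commas; restored on return
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;                                # pair present, keep going
      *) echo "label mismatch: $pair"; return 1 ;;   # pair missing
    esac
  done
  return 0
}
```

With the selector from the first paste (instance=dev-mn-nebula) and the pod template labels from the second (instance=prod-sm-nebula), this prints `label mismatch: app.kubernetes.io/instance=dev-mn-nebula` and returns 1.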

@bradenwright

I think that was just me switching between two deployments. Here's the info again, this time from the same deployment/cluster:

braden@rltadmins-MacBook-Pro-4 mn-nebula % kc describe svc -n mynamespace nebula-operator-webhook-service
W1201 01:44:19.515837   35571 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Name:              nebula-operator-webhook-service
Namespace:         mynamespace
Labels:            app.kubernetes.io/component=admission-webhook
                   app.kubernetes.io/instance=prod-mn-nebula
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=nebula-operator
                   app.kubernetes.io/version=1.0.0
                   argocd.argoproj.io/instance=prod-mn-nebula
                   helm.sh/chart=nebula-operator-1.3.0
Annotations:       argocd.argoproj.io/sync-wave: -4
                   cloud.google.com/neg: {"ingress":true}
Selector:          app.kubernetes.io/component=controller-manager,app.kubernetes.io/instance=prod-mn-nebula,app.kubernetes.io/name=nebula-operator
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.208.6.25
IPs:               10.208.6.25
Port:              <unset>  443/TCP
TargetPort:        9443/TCP
Endpoints:         10.204.1.18:9443,10.204.3.5:9443
Session Affinity:  None
Events:            <none>
braden@rltadmins-MacBook-Pro-4 mn-nebula % kc describe deploy -n mynamespace nebula-operator-controller-manager-deployment
W1201 01:44:45.710497   35591 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Name:                   nebula-operator-controller-manager-deployment
Namespace:              mynamespace
CreationTimestamp:      Thu, 01 Dec 2022 01:10:27 -0600
Labels:                 app.kubernetes.io/component=controller-manager
                        app.kubernetes.io/instance=prod-mn-nebula
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=nebula-operator
                        app.kubernetes.io/version=1.0.0
                        argocd.argoproj.io/instance=prod-mn-nebula
                        helm.sh/chart=nebula-operator-1.3.0
Annotations:            argocd.argoproj.io/sync-wave: -4
                        deployment.kubernetes.io/revision: 1
Selector:               app.kubernetes.io/component=controller-manager,app.kubernetes.io/instance=prod-mn-nebula,app.kubernetes.io/name=nebula-operator
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/component=controller-manager
                    app.kubernetes.io/instance=prod-mn-nebula
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=nebula-operator
                    app.kubernetes.io/version=1.0.0
                    helm.sh/chart=nebula-operator-1.3.0
  Service Account:  nebula-operator-controller-manager-sa
  Containers:
   controller-manager:
    Image:      vesoft/nebula-operator:v1.3.0
    Port:       9443/TCP
    Host Port:  0/TCP
    Command:
      /usr/local/bin/controller-manager
    Args:
      --health-probe-bind-address=:8081
      --metrics-bind-address=:8080
      --enable-kruise=true
      --max-concurrent-reconciles=3
      --enable-leader-election
      --leader-election-namespace=mynamespace
      --admission-webhook=true
    Limits:
      cpu:     200m
      memory:  300Mi
    Requests:
      cpu:        200m
      memory:     300Mi
    Liveness:     http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:    http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
   kube-rbac-proxy:
    Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=2
    Limits:
      cpu:     100m
      memory:  30Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nebula-operator-webhook-secret
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   nebula-operator-controller-manager-deployment-58f6ccc4b7 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  34m   deployment-controller  Scaled up replica set nebula-operator-controller-manager-deployment-58f6ccc4b7 to 2

@MegaByte875
Contributor

MegaByte875 commented Mar 27, 2023

@jrab66 Did you resolve the problem?
I hope the checklist below helps:

  1. kubectl get secrets |grep webhook
  2. nebula-operator logs about webhook
    DEBUG controller-runtime.webhook.webhooks admission/http.go:136 wrote response {"webhook": "/apis/admission.nebula-graph.io/v1alpha1/statefulsetvalidating", "code": 200, "reason": "", "UID": "7bbd714a-43e5-4edc-a882-341f823cd6f5", "allowed": true}
  3. cert-manager logs about nebula-operator-webhook-cert
  4. cert-manager-cainjector logs about nebula-operator-webhook-secret
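
The four checks above could be run in one pass roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the default namespace is the one from the original report, the operator deployment and container names are taken from the describe output earlier in this thread, and cert-manager is assumed to live in the cert-manager namespace.

```shell
# Run the four checklist items against the given operator namespace.
webhook_checklist() {
  local namespace="${1:-nebula-operator-system}"

  echo "== 1. webhook secrets =="
  kubectl get secrets -n "$namespace" | grep webhook

  echo "== 2. nebula-operator logs about webhook =="
  kubectl logs -n "$namespace" \
    deployment/nebula-operator-controller-manager-deployment \
    -c controller-manager | grep -i webhook

  echo "== 3. cert-manager logs about nebula-operator-webhook-cert =="
  kubectl logs -n cert-manager deployment/cert-manager \
    | grep nebula-operator-webhook-cert

  echo "== 4. cert-manager-cainjector logs about nebula-operator-webhook-secret =="
  kubectl logs -n cert-manager deployment/cert-manager-cainjector \
    | grep nebula-operator-webhook-secret
}
```

Note that `kubectl logs deployment/...` reads from a single pod of the deployment, so with two operator replicas it may be worth repeating step 2 per pod.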

@QingZ11
Contributor

QingZ11 commented May 5, 2023

I have noticed that the issue you created hasn’t been updated for nearly a month, so I have to close it for now. If you have any new updates, you are welcome to reopen this issue anytime.

Thanks a lot for your contribution anyway 😊
