Getting Started instructions fail on GKE v 1.17 #3910

Closed
paultiplady opened this issue Apr 30, 2021 · 9 comments · Fixed by #4517
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments


paultiplady commented Apr 30, 2021

Expected Behavior

Running the command in https://github.com/tektoncd/pipeline/blob/main/docs/install.md should succeed and produce a working install.

Actual Behavior

kubectl apply command fails and does not produce a working install.

Steps to Reproduce the Problem

  1. kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.22.0/release.yaml
namespace/tekton-pipelines created
podsecuritypolicy.policy/tekton-pipelines configured
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-controller-cluster-access unchanged
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-controller-tenant-access configured
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-webhook-cluster-access unchanged
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-leader-election unchanged
role.rbac.authorization.k8s.io/tekton-pipelines-controller created
role.rbac.authorization.k8s.io/tekton-pipelines-webhook created
serviceaccount/tekton-pipelines-controller created
serviceaccount/tekton-pipelines-webhook created
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller-cluster-access configured
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller-leaderelection unchanged
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller-tenant-access configured
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-webhook-cluster-access configured
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-webhook-leaderelection unchanged
rolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller created
rolebinding.rbac.authorization.k8s.io/tekton-pipelines-webhook created
customresourcedefinition.apiextensions.k8s.io/clustertasks.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/conditions.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/images.caching.internal.knative.dev unchanged
customresourcedefinition.apiextensions.k8s.io/pipelines.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/pipelineruns.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/pipelineresources.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/runs.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/tasks.tekton.dev configured
customresourcedefinition.apiextensions.k8s.io/taskruns.tekton.dev configured
secret/webhook-certs created
validatingwebhookconfiguration.admissionregistration.k8s.io/validation.webhook.pipeline.tekton.dev configured
mutatingwebhookconfiguration.admissionregistration.k8s.io/webhook.pipeline.tekton.dev configured
validatingwebhookconfiguration.admissionregistration.k8s.io/config.webhook.pipeline.tekton.dev configured
clusterrole.rbac.authorization.k8s.io/tekton-aggregate-edit unchanged
clusterrole.rbac.authorization.k8s.io/tekton-aggregate-view unchanged
deployment.apps/tekton-pipelines-controller created
service/tekton-pipelines-controller created
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook created
poddisruptionbudget.policy/tekton-pipelines-webhook created
deployment.apps/tekton-pipelines-webhook created
service/tekton-pipelines-webhook created
Error from server (InternalError): error when creating "release-v0.19.0.yaml": Internal error occurred: failed calling webhook "config.webhook.pipeline.tekton.dev": Post https://tekton-pipelines-webhook.tekton-pipelines.svc:443/config-validation?timeout=30s: service "tekton-pipelines-webhook" not found

Additional Info

Looking at the failing webhook pods, it seems the ConfigMaps are not in place in time:

✗ kubectl -n tekton-pipelines logs tekton-pipelines-webhook-5475db4495-xqtgz
2021/04/30 18:51:15 Registering 2 clients
2021/04/30 18:51:15 Registering 3 informer factories
2021/04/30 18:51:15 Registering 4 informers
2021/04/30 18:51:15 Registering 5 controllers
2021/04/30 18:51:15 Readiness and health check server listening on port 8080
{"severity":"INFO","timestamp":"2021-04-30T18:51:15.078241552Z","caller":"logging/config.go:116","message":"Successfully created the logger."}
{"severity":"INFO","timestamp":"2021-04-30T18:51:15.078396125Z","caller":"logging/config.go:117","message":"Logging level set to: info"}
{"severity":"INFO","timestamp":"2021-04-30T18:51:15.078549076Z","logger":"tekton-pipelines-webhook","caller":"profiling/server.go:64","message":"Profiling enabled: false","commit":"b459114"}
{"severity":"INFO","timestamp":"2021-04-30T18:51:15.084831041Z","logger":"tekton-pipelines-webhook","caller":"leaderelection/context.go:46","message":"Running with Standard leader election","commit":"b459114"}
{"severity":"INFO","timestamp":"2021-04-30T18:51:15.149607254Z","logger":"tekton-pipelines-webhook","caller":"sharedmain/main.go:209","message":"Starting configuration manager...","commit":"b459114"}
{"severity":"EMERGENCY","timestamp":"2021-04-30T18:51:15.250117394Z","logger":"tekton-pipelines-webhook","caller":"sharedmain/main.go:211","message":"Failed to start configuration manager","commit":"b459114","error":"configmap \"config-artifact-pvc\" not found","stacktrace":"github.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain.MainWithConfig\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain/main.go:211\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/webhook/main.go:235\nruntime.main\n\truntime/proc.go:204"}

But the ConfigMaps are validated by the tekton-pipelines-webhook, which cannot start because the ConfigMaps are missing. Validation deadlock?

I worked around this by moving the configmaps above the ValidatingWebhookConfiguration manifests in the file.
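A minimal sketch of that reordering, assuming the manifest is a multi-document YAML file whose documents are separated by `---` lines and whose top-level `kind:` sits at column 0. The function name `configmaps_before_webhooks` is hypothetical, not something shipped with Tekton. It moves every ConfigMap to just before the first `*WebhookConfiguration` document, so the Namespace and RBAC objects stay ahead of them:

```shell
# Hypothetical helper: emit the documents of a multi-document YAML file with
# all ConfigMaps moved to just before the first *WebhookConfiguration document.
configmaps_before_webhooks() {
  awk '
    # Store the document collected so far, together with its top-level kind.
    function end_doc() {
      if (doc != "") { n++; docs[n] = doc; kinds[n] = kind }
      doc = ""; kind = ""
    }
    /^---[ \t]*$/ { end_doc(); next }
    /^kind:/ && kind == "" { kind = $2 }
    { doc = doc $0 "\n" }
    END {
      end_doc()
      first = 0
      for (i = 1; i <= n; i++) {
        if (kinds[i] ~ /WebhookConfiguration$/) { first = i; break }
      }
      for (i = 1; i <= n; i++) {
        if (i == first) {
          # Emit every ConfigMap right before the first webhook configuration.
          for (j = 1; j <= n; j++) {
            if (kinds[j] == "ConfigMap") printf "---\n%s", docs[j]
          }
        }
        # ConfigMaps were already emitted above, so skip them here.
        if (first > 0 && kinds[i] == "ConfigMap") continue
        printf "---\n%s", docs[i]
      }
    }
  ' "$1"
}

# Example usage (not run here):
#   configmaps_before_webhooks release.yaml > release-reordered.yaml
#   kubectl apply -f release-reordered.yaml
```

If the file contains no webhook configuration at all, the documents come out in their original order.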

Or alternatively, without destroying everything and re-creating, this worked too:

kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.22.0/release.yaml
# Error ^
kubectl -n tekton-pipelines delete validatingwebhookconfiguration.admissionregistration.k8s.io/validation.webhook.pipeline.tekton.dev
kubectl -n tekton-pipelines delete validatingwebhookconfiguration.admissionregistration.k8s.io/config.webhook.pipeline.tekton.dev
kubectl create -f configmaps.yaml --save-config  # containing just the configmaps from the release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.22.0/release.yaml
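The `configmaps.yaml` used in the commands above is not part of the release, so it has to be cut out of a downloaded `release.yaml`. One way to do that is sketched here, assuming documents are separated by `---` lines and each ConfigMap has a top-level `kind: ConfigMap` line. The function name `only_configmaps` is hypothetical:

```shell
# Hypothetical helper: print only the ConfigMap documents of a
# multi-document YAML file, each preceded by a "---" separator.
only_configmaps() {
  awk '
    # Print the collected document if it was a ConfigMap, then reset.
    function end_doc() {
      if (is_cm) printf "---\n%s", doc
      doc = ""; is_cm = 0
    }
    /^---[ \t]*$/ { end_doc(); next }
    /^kind:[ \t]*ConfigMap[ \t]*$/ { is_cm = 1 }
    { doc = doc $0 "\n" }
    END { end_doc() }
  ' "$1"
}

# Example usage (not run here), with release.yaml downloaded beforehand:
#   only_configmaps release.yaml > configmaps.yaml
```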
  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.17-dispatcher", GitCommit:"a39a896b5018d0c800124a36757433c660fd0880", GitTreeState:"clean", BuildDate:"2021-01-28T22:06:27Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.17-gke.3000", GitCommit:"4e9d32bd87bb5644183bae27249c6021ddb89436", GitTreeState:"clean", BuildDate:"2021-02-23T09:19:15Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}  
  • Tekton Pipeline version:

None (Tekton not installed yet)

@paultiplady paultiplady added the kind/bug Categorizes issue or PR as related to a bug. label Apr 30, 2021

paolocarta commented Jun 11, 2021

Experienced a similar problem with a private GKE cluster version v1.19.9-gke.1900. In my case I installed tektoncd v0.24.2 with the webhook controller running.

The error:
Internal error occurred: failed calling webhook "config.webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/config-validation?timeout=10s": context deadline exceeded

@paultiplady The problem was a GCP firewall rule that only allows traffic from the control plane to Pods on worker nodes on ports 443 and 10250.
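A hedged sketch of what an extra firewall rule for this could look like. It assumes the webhook Pod listens on port 8443, which should be checked against the `targetPort` of the `tekton-pipelines-webhook` Service on your install. The rule name `allow-tekton-webhook`, the function name and all arguments are placeholders. The function only prints the `gcloud` command so it can be reviewed before being run by hand:

```shell
# Hypothetical helper: print (do not run) a gcloud command that would allow
# the GKE control plane to reach the webhook port on the worker nodes.
#   $1  VPC network of the cluster
#   $2  control plane IPv4 CIDR of the private cluster
#   $3  network tag carried by the cluster's nodes
#   $4  webhook container port (assumed default: 8443)
webhook_firewall_cmd() {
  printf 'gcloud compute firewall-rules create allow-tekton-webhook --network %s --direction INGRESS --action ALLOW --rules tcp:%s --source-ranges %s --target-tags %s\n' \
    "$1" "${4:-8443}" "$2" "$3"
}

# Example usage with placeholder values:
#   webhook_firewall_cmd my-network 172.16.0.0/28 gke-my-cluster-node
```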

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 14, 2021

bygui86 commented Nov 14, 2021

No news and even no comments on this... looks a bit weird...


bygui86 commented Nov 14, 2021

/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 14, 2021
@jerop jerop assigned ghost Jan 10, 2022
@bobcatfish
Collaborator

I think we've seen a very similar issue pop up occasionally in the past that we were a bit stumped by (e.g. #2207) but this might be different b/c in those cases I think the error was something like 'connection refused' vs this one has 'service not found' (could be the same underlying issue tho?).

if it WAS a similar issue then i think you'd only see this happen very rarely and not consistently


bygui86 commented Jan 10, 2022

Six months ago, together with @paolocarta, we tried many times to deploy Tekton on a private GKE cluster.

We were not able to see the same behaviour twice in a row. If I remember correctly we hit both "connection refused" and "service not found" errors, but I cannot be sure, as too much time has passed.

After 2 weeks and many attempts we gave up and chose another CI/CD tool.

@ghost ghost added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 24, 2022
@debuggerpk

I am facing the same issue on a private GKE cluster

@debuggerpk

following the instructions at #3317 (comment) seems to solve the issue.
