
Some integration tests hanging on dashboard PRs #398

Closed · bobcatfish opened this issue May 20, 2020 · 1 comment

@bobcatfish (Contributor):

Expected Behavior

Prow integration test statuses should always eventually get updated instead of hanging indefinitely.

Actual Behavior

Notably in tektoncd/dashboard#1406 and tektoncd/dashboard#1403, the integration tests would just hang. Symptoms:

  • The status stays "pending" no matter how many /test commands are issued
  • Clicking "details" says the build is running and always shows only 36 lines of output

You can find the underlying pods by looking at the "hook" component's logs (see the Prow architecture docs).
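As a rough sketch, assuming hook runs as a deployment named "hook" in the default namespace of the prow cluster (both names are assumptions here, not confirmed from this setup), pulling those logs looks something like:

# Hypothetical: tail the hook component's logs to find the pod IDs
# associated with a PR (deployment name and namespace are assumptions)
kubectl --context prow -n default logs deploy/hook | grep dashboard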

The pods in question were stuck initializing until they eventually disappeared, apparently cleaned up by something, but the status was never updated.

For example:

(⎈ |euca:default)➜  Downloads kubectl --context prow get pod c90a44ca-9abe-11ea-b60d-a22e91c6f8b8
NAME                                   READY   STATUS     RESTARTS   AGE
c90a44ca-9abe-11ea-b60d-a22e91c6f8b8   0/2     Init:0/3   0          17m

Running kubectl describe on the pod shows events like:

  Type     Reason                  Age                  From                                          Message
  ----     ------                  ----                 ----                                          -------
  Normal   Scheduled               21m                  default-scheduler                             Successfully assigned default/c90a44ca-9abe-11ea-b60d-a22e91c6f8b8 to gke-prow-highmem-pool-45b2fab2-01f9
  Warning  FailedCreatePodSandBox  2m35s (x8 over 18m)  kubelet, gke-prow-highmem-pool-45b2fab2-01f9  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "c90a44ca-9abe-11ea-b60d-a22e91c6f8b8": operation timeout: context deadline exceeded
  Warning  FailedSync              7s (x11 over 2m21s)  kubelet, gke-prow-highmem-pool-45b2fab2-01f9  error determining status: rpc error: code = Unknown desc = Error: No such container: 287318f1a2d49db1df7c2737e3ef3a3ca0d721022cc15e91a827bff1c8bb5093
@bobcatfish (Contributor, Author):

Googling for failed to create a sandbox for pod "c90a44ca-9abe-11ea-b60d-a22e91c6f8b8": operation timeout: context deadline exceeded led me to kubernetes/kubernetes#79451, so I got the bright idea that maybe if I updated the nodes, they would come with a fix for this issue.

However, the fact that the FailedCreatePodSandBox error keeps repeating seems to hint that the kubelet is retrying, and it also looks like the fix shipped in the 1.14.4 release (https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.14.md#v1144). The version of the nodes is somewhat lost in the sands of time, but I think it was at least 1.14.10, which would already include that fix.
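For the record, a quick way to check what version the nodes are actually running (using the same prow context as above):

# List nodes along with their kubelet versions; anything >= v1.14.4
# should already include the fix from kubernetes/kubernetes#79451
kubectl --context prow get nodes -o wide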

I upgraded the nodes anyway, and now it seems like things are working (@eddycharly plz reopen if I'm wrong), so either that fixed something or whatever it was stopped happening (for now).

(When I looked at the node this was running on, it looked like it had recently restarted, so maybe that was somehow the problem.)
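A sketch of how to inspect that node's state, using the node name from the events above:

# Describe the suspect node; the conditions and recent events here would
# show something like a recent kubelet restart or a NotReady period
kubectl --context prow describe node gke-prow-highmem-pool-45b2fab2-01f9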

Looking at https://tekton-releases.appspot.com/builds/tekton-prow/pr-logs/directory/pull-tekton-dashboard-integration-tests, the last few runs have actually completed, though you can see all the ones that never got updated as well:

[Screenshot: recent pull-tekton-dashboard-integration-tests runs, showing the last few completed alongside the ones that never got updated]

I'm gonna close this for now 🤞 but we can reopen if this keeps happening.
