Fix flaky test by disabling Prometheus metrics #1341

dhwthompson · 2022-01-24T22:26:25Z

The test suites for some of our packages spin up parts of a Kubernetes cluster using the envtest library, so that we have something more solid than mocks to test against. One problem with this was that running the tests for the kube and server packages at the same time would cause one of the two test suites to fail.

Side-note: it's possible to reproduce this problem more reliably by occupying port 8080 with a throwaway web server before running the tests, for example using Python's built-in web server:

python3 -m http.server 8080

After some digging, I found that both test environments were trying to start a Prometheus metrics handler listening on port 8080. If they both tried to do this at once, one would fail to bind the port, causing the entire test suite for that package to fail too (including some nil-pointer panics during cleanup, which might be worth tidying up in a future pull request).
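To illustrate the failure mode outside of envtest, here's a minimal Go sketch using only the standard library. It isn't code from this change, and `bindTwice` is a made-up helper: it opens a listener on a free loopback port and then tries to open a second listener on the same address, which is essentially what the two test suites were doing with port 8080. Note that port 0 in `net.Listen` means "let the OS pick a free port", which is a different meaning from the "0" metrics bind address discussed below.

```go
package main

import (
	"fmt"
	"net"
)

// bindTwice opens a TCP listener on a free loopback port, then tries to open
// a second listener on exactly the same address. It returns the address used
// and the error from the second attempt. setupErr is non-nil only if the
// first listener could not be created at all.
func bindTwice() (addr string, secondErr, setupErr error) {
	// Port 0 asks the OS for any free port, so the sketch doesn't depend on
	// 8080 being available.
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	defer first.Close()

	addr = first.Addr().String()

	// The first listener is still open, so this bind is expected to fail.
	second, err := net.Listen("tcp", addr)
	if err == nil {
		second.Close() // unexpected success; don't leak the listener
	}
	return addr, err, nil
}

func main() {
	addr, secondErr, setupErr := bindTwice()
	if setupErr != nil {
		fmt.Println("could not open first listener:", setupErr)
		return
	}
	fmt.Printf("second listener on %s failed: %v\n", addr, secondErr)
}
```

The exact error text depends on the operating system, but the second bind should be rejected for as long as the first listener is still open.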

Setting the metrics bind address to the string "0" disables metrics for the test clusters (the documentation for which is buried in a comment in the controller-runtime docs [1]), which fixes this particular problem. Testing on my local machine, I didn't see any other fixed port assignments cropping up to take its place.

[1]: https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.10.2/pkg/manager
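For reference, here's a rough sketch of what that setting looks like with controller-runtime v0.10.x. It isn't the exact diff from this change: the `run` function and variable names are illustrative, and running it needs the controller-runtime module plus the envtest control-plane binaries installed locally.

```go
package main

import (
	"fmt"
	"log"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// run starts a local test control plane and creates a manager with the
// metrics listener disabled. All names here are illustrative.
func run() error {
	testEnv := &envtest.Environment{}

	cfg, err := testEnv.Start()
	if err != nil {
		return fmt.Errorf("starting envtest: %w", err)
	}
	defer func() {
		if stopErr := testEnv.Stop(); stopErr != nil {
			log.Printf("stopping envtest: %v", stopErr)
		}
	}()

	// "0" disables metrics serving, so the manager doesn't try to bind the
	// default metrics port (8080).
	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		MetricsBindAddress: "0",
	})
	if err != nil {
		return fmt.Errorf("creating manager: %w", err)
	}
	_ = mgr // controllers would be registered and the manager started here

	return nil
}

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
```

Later controller-runtime releases moved the metrics settings into a separate options struct, so the field name above is only assumed to hold for the version linked in [1].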

Closes: #1332

jpellizzari

🤯

jpellizzari

Never would have found this. Nice work!

* New proto endpoints
* Added integration tests
* V2 workflow added
* Use controller-runtime client in app server
* Fix whitespace and unused linting errors
* Add kustomization tests
* Fix flaky test by disabling Prometheus metrics (#1341)

  The test suites for some of our packages spin up parts of a Kubernetes cluster using the `envtest` library, so that we have something more solid than mocks to test against. One problem this was causing was that running tests for both the `kube` and `server` packages at once would cause one of those test suites to fail.

  Side-note: it's possible to reproduce this problem a little more efficiently by spinning up a fake web server to sit on port 8080, for example using Python's built-in web server:

      python3 -m http.server 8080

  After some digging, I found out that the reason for this is that both environments were trying to spin up a Prometheus metrics handler listening on port 8080. If they both tried to do this at once, one would fail, causing the entire test suite for that package to fail too (including some nil-pointer panics during the clean up, which might be worth tidying up in a future pull request).

  Setting the metrics bind address to the string "0" disables metrics for the test clusters (the documentation for which is buried in a comment in the Kubernetes docs[1]), which fixes this particular problem. Testing on my local machine, there didn't seem to be any other fixed port assignments cropping up to take its place.

  [1]: https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.10.2/pkg/manager

* Use suite test for kubebuilder setup/teardown
* Add more create kustomization test coverage
* Skip acceptance test lint
* Split k8s resources into separate files
* Initial work implementing HelmRepository to the API
* Added weave related labels to helm repository resource
* Use app name to list helm repositories
* Added unit test for ListHelmRepositories
* Added missing namespace parameter when listing helm repositories

Co-authored-by: James Wilson <jwils22@gmail.com>
Co-authored-by: Jordan Pellizzari <jordan@weave.works>
Co-authored-by: David Thompson <32775+dhwthompson@users.noreply.github.com>

dhwthompson requested a review from jpellizzari January 24, 2022 22:26

dhwthompson added the exclude from release notes label Jan 24, 2022

dhwthompson force-pushed the fix-double-test-port-failure branch from 59d39fa to c6076a2 January 24, 2022 23:50

jpellizzari reviewed Jan 25, 2022

jpellizzari requested review from jpellizzari and J-Thompson12 January 25, 2022 17:02

jpellizzari approved these changes Jan 25, 2022

J-Thompson12 approved these changes Jan 25, 2022

dhwthompson merged commit 0d022cc into main Jan 25, 2022

dhwthompson deleted the fix-double-test-port-failure branch January 25, 2022 18:58
