Update helmfile defaults for faster helm deployments. #404

willgraf · 2020-12-10T21:00:34Z

All of our helmfiles used the same default values, including `timeout: 600`. This PR updates those defaults by:

- Adding a specific timeout to each release (fixes Reduce Failure Time of Helmfile Deployments #267):
  - 600 seconds is kept for long-running deployments (redis, openvpn, prometheus-operator).
  - 180 seconds for cluster-issuers (inspired by cloudposse).
  - 300 seconds for everything else (this can always be changed later).
- Removing `force: true` from all helm deployments, as it was causing issues with Helm 3.
- Adding `atomic: true` and `cleanupOnFail: true` to all releases. Having Helm do the cleanup lets us simplify our deployment wrapper script.
- Improving comments throughout and adding links for all helmfile references.
- Updating the helmfile deployment script to log all helmfile failures, along with instructions for re-installing the failed releases. The script no longer calls `helm delete`, since every release now has `atomic` and `cleanupOnFail` enabled. (Fixes "Cluster created" screen shows even if there were helm deployment issues. #349)
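To make the new per-release defaults concrete, here is a minimal Python sketch (not the actual helmfile YAML) that maps a release name to the options it would receive under this PR. The `release_options` helper and the exact release names are hypothetical; in the PR these settings are written directly into each release entry of the helmfiles.

```python
# Hypothetical model of the per-release Helm options introduced by this PR.
# Release names are illustrative; the real values live in each helmfile.

LONG_RUNNING = {"redis", "openvpn", "prometheus-operator"}  # keep 600s
CLUSTER_ISSUERS = {"cluster-issuers"}  # 180s, inspired by cloudposse
DEFAULT_TIMEOUT = 300  # everything else


def release_options(name: str) -> dict:
    """Return the helm options a release would get under the new defaults."""
    if name in LONG_RUNNING:
        timeout = 600
    elif name in CLUSTER_ISSUERS:
        timeout = 180
    else:
        timeout = DEFAULT_TIMEOUT
    return {
        "name": name,
        "timeout": timeout,     # seconds, set per release instead of a global 600
        "atomic": True,         # roll back (or uninstall) the release if it fails
        "cleanupOnFail": True,  # delete new resources created by a failed upgrade
        # "force": True is intentionally absent; it caused issues with Helm 3
    }


if __name__ == "__main__":
    for release in ["redis", "cluster-issuers", "frontend"]:
        print(release_options(release))
```

Because `atomic` already rolls back a failed release, the wrapper script no longer needs its own `helm delete` step.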


Renamed the script to `deploy-helmfiles`, as there is nothing GKE-specific about it. All failures are now logged at the end of the helmfile deployment to tell the user what may be failing (fixes #349).
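The end-of-run failure log can be sketched in Python as follows. This is an illustration only: the real `deploy-helmfiles` script is bash, the `failure_report` helper is hypothetical, and the re-install command template (in particular the `sync` subcommand) is an assumption. It also assumes the command is run from the directory that contains the helmfiles.

```python
# Hypothetical sketch of the failure summary printed at the end of deployment.
# The re-install command form below is an assumption, not the script's text.

REINSTALL_TEMPLATE = "helmfile -l name={name} sync"  # assumed command form


def failure_report(results: dict) -> list:
    """Given {release_name: succeeded}, return log lines for failed releases."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return []
    lines = [f"WARNING: {len(failed)} release(s) failed to deploy:"]
    for name in failed:
        command = REINSTALL_TEMPLATE.format(name=name)
        lines.append(f"  - {name}: re-install with `{command}`")
    return lines


if __name__ == "__main__":
    for line in failure_report({"redis": True, "frontend": False, "openvpn": False}):
        print(line)
```

Collecting every failure and reporting them together, rather than stopping at the first one, lets the user see all problem releases in one place.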

willgraf · 2020-12-11T21:49:46Z

Please see below for an example of the helmfile failure output/warnings.

MekWarrior · 2020-12-12T03:28:10Z

I like a lot of what is here, but I do have a couple of concerns. First, we need to make sure the docs are clear on where to run the `helmfile -l name=...` command. Second, the message seems contradictory: on the one hand we say "Not all .... successfully deployed", but at the end we say "...created successfully." This could cause confusion, and the early message may be disregarded and subsequently lost in the logs.
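One way to address the contradictory-message concern is to derive a single closing line from the list of failed releases, so that "created successfully" is never printed after a failure warning. A minimal Python sketch; the `final_status` helper and the message wording are hypothetical, not the script's actual output.

```python
# Hypothetical sketch: choose one closing message that agrees with any
# earlier failure warnings. Message wording is illustrative only.


def final_status(failed_releases: list) -> str:
    """Return the closing message for cluster creation."""
    if failed_releases:
        names = ", ".join(sorted(failed_releases))
        return f"Cluster created, but some helm releases failed to deploy: {names}"
    return "Cluster created successfully."


if __name__ == "__main__":
    print(final_status([]))
    print(final_status(["openvpn", "frontend"]))
```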

MekWarrior

Looks good!

* Support TLS traffic with cert-manager. (#357)
* Fix frontend ingress issue when no hosts are provided. (#381)
* Template frontend ingress annotations using `CERTIFICATE_MANAGER_ENABLED` (#383)
* Create tf-serving configuration files using an initContainer. (#382)
* Fix whitespace issue in tasks/Makefile.kubectl (#386)
* Bump openvpn to 4.2.3 (#385)
* Upgrade certificate manager to version 1.0.3 (#384)
* Add screenshot of successfully created cluster to docs. (#388)
* Set up an AlertManager with slack receiver support (#317)
* Install procps to give access to sysctl. (#390)
* Migrate CI/CD from TravisCI to GitHub Actions (#394)
* Change the redis helm chart repo to bitnami (#393)
* Upgrade tf-serving chart to 0.3.0 for application version 0.4.0 (#392)
* Move the frontend HPA definition into the helm chart. (#395)
* Move the tf-serving HPA into the helm chart. (#396)
* Move redis-consumer HPA into the helm chart. (#397)
* Remove deprecated and unused charts (#398)
* Migrate stable helm chart repo to archived URL. (#399)
* Destroy the secret and remove the key from the DNS solver SA in a new task: `gke/destroy/certificate-manager-secret` (fixes #391).
* Use GCP_SERVICE_ACCOUNT for DNS resolution (#401)
* Clean up docs and test them with new GitHub Action workflow (#402)
* Add code-formatted filename to list of files to change (#403)
* Update ELK stack helmfiles (#380)
* Move the prometheus-redis-exporter script to a chart using incubator/raw. (#405)
* Use `kubectl del pvc` instead of deleting all pds with the cluster name. (#406)
* Update helmfile defaults for faster helm deployments. (#404)
* Skip gke/destroy/node-pools during cluster teardown. (#407)
* Update docs to reflect the pending 1.4.0 release. (#408)
* Bump redis-consumer version to 0.8.3 (#409)
* Run integration tests on all PRs to master OR if they have the commit message. (#411)
* Remove helm defaults for ELK helmfiles (#413)

Co-authored-by: Morgan Schwartz <msschwartz21@gmail.com>

willgraf added 2 commits December 10, 2020 12:49

Update comments and reference URLs for all helmfiles.

306e2a1

Remove default helm options and define them for each release.

51cbbb0

Set timeout for each release instead of standard 600. Use cleanupOnFail and atomic for all releases. Redis, prometheus, and openvpn keep their 600s timeout, all others go to 300s (except cluster-issuers, 180s)

willgraf added the wip label Dec 10, 2020

willgraf changed the base branch from master to stable December 10, 2020 21:00

willgraf added 4 commits December 10, 2020 17:13

Add exponential delay to helmfile deployment and keep track of failures.

1de1f94

Rename script to deploy-helmfiles as there is nothing GKE specific. Log all failures at the end of helmfile deployment to inform user what may be failing (Fixes #349)

No double [[ or ]]

7027b4f

Fix integer comparison

99159d8

Better logging.

9a9f18b
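The first commit in this group adds an exponential delay and tracks failures instead of aborting. A minimal Python sketch, assuming the delay is applied between retries of a single helmfile run; `deploy_with_backoff`, `deploy_all`, the attempt count, and the base delay are all hypothetical, and the real script is bash.

```python
import time
from typing import Callable, Dict, Iterable


def deploy_with_backoff(
    run: Callable[[str], bool],
    name: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try run(name) up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        if run(name):
            return True
        if attempt < attempts - 1:
            # exponential delay: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return False


def deploy_all(run: Callable[[str], bool], names: Iterable[str], **kwargs) -> Dict[str, bool]:
    """Deploy every release, recording success or failure instead of stopping early."""
    return {name: deploy_with_backoff(run, name, **kwargs) for name in names}


if __name__ == "__main__":
    calls: Dict[str, int] = {}

    def flaky(name: str) -> bool:
        # hypothetical stand-in for one helmfile run: "broken" always fails,
        # every other release fails once and then succeeds
        calls[name] = calls.get(name, 0) + 1
        return name != "broken" and calls[name] > 1

    print(deploy_all(flaky, ["redis", "broken"], base_delay=0.01))
```

Injecting `sleep` keeps the backoff testable without real waiting, and returning a `{name: succeeded}` map gives the end-of-run report everything it needs.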

willgraf added 3 commits December 11, 2020 15:23

Merge branch 'stable' into faster-helmfile

2c008b2

reduce a bit of whitespace in output

a3ca5a9

Reduce noise from kubens in create task.

4095eb3

willgraf removed the wip label Dec 12, 2020

willgraf added 3 commits December 11, 2020 20:19

Make changes from PR review.

8ef9ba5

Update error messages and move "finished" text into script.

f63ec40

deployment not destruction

f368efd

MekWarrior approved these changes Dec 14, 2020


willgraf merged commit 1a4aad0 into stable Dec 14, 2020

willgraf deleted the faster-helmfile branch December 14, 2020 21:13

This was referenced Dec 14, 2020

- "Cluster created" screen shows even if there were helm deployment issues. #349 (Closed)
- Reduce Failure Time of Helmfile Deployments #267 (Closed)
- Speed Up Cluster Creation and Destruction #279 (Closed)
- Fix helm defaults for ELK helmfiles. #413 (Merged)
