
Helmify the monitoring #723

Merged: 25 commits from helmify-the-monitoring into master on Mar 25, 2020

Conversation

coryschwartz

Fixes #594 (upgrade to Kubernetes 1.17)

Partially fixes the now-re-opened #228 (effective cluster monitoring)

In this PR I have removed the kops addon, which relied on APIs deprecated in Kubernetes 1.16, in particular the apps/v1beta2 APIs, which are removed in 1.17 but still required by the kops addon.

In its place, I've constructed a helm chart consisting of the prometheus operator and pushgateway, with appropriate value changes.

I have upgraded the Kubernetes version in this PR to the latest release (1.17), and the new dashboards provided by this chart work better than the buggy kops addon dashboards.
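For reference, the cluster.yaml change boils down to bumping the kops kubernetesVersion field. A minimal sketch, assuming kops's v1alpha2 Cluster spec; the cluster name and exact patch version here are illustrative, not taken from this PR:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: testground.example.k8s.local   # hypothetical cluster name
spec:
  kubernetesVersion: 1.17.3            # bumped from the previous 1.1x release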

At present I haven't migrated the custom dashboards, but I'll do that in this PR before it's finished.

@Robmat05 this is a draft PR, not yet ready for review, but open for comments about the strategy and direction.

@coryschwartz
Author

coryschwartz commented Mar 20, 2020

Due to the large number of files included in this PR, here is an explanation of what is going on:

1. All the infrastructure is in code.

  • prometheus operator moved from kops addon to helm chart
  • prometheus pushgateway moved from downloaded helm chart to scm
  • grafana dashboards are in scm
  • redis moved from downloaded helm chart to scm
  • sidecar is already in scm and was left alone
  • weave networking is already in scm and was left alone
  • almost all of the infra configuration is in one file: infra/k8s/testground-infra/values.yaml
  • we can roll back if an update breaks us.

2. Kubernetes is upgraded to 1.17

  • updated the versions in cluster.yaml.

3. Automatic cluster monitoring.

  • This existed before with the kops addon, but it was buggy.
  • redis and weave have serviceMonitors. Again, this was already there with the kops addon.

4. Automatic plan dashboards

  • This was not possible with the kops addon.
  • This works via a helm template -- /infra/k8s/testground-infra/charts/testground-dashboards
  • dashboards are uploaded to a configMap, which grafana watches (a sketch follows this list).
  • Updating the configMaps updates the dashboards without any additional effort.
  • (future) dashboards can be fully managed by CI

5. Not-quite-automatic plan dashboard downloads

  • This was already possible with the kops addon
  • Use the tool in /dashboards or use the web UI to export dashboards
  • dashboards are stored in plain grafana json.
  • users can create dashboards comfortably from the grafana webUI, without a templating DSL.
  • the tool downloads dashboards which have a testground tag.
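As a rough illustration of point 4, a dashboard delivered as a configMap might look like the sketch below. The name and the grafana_dashboard label are assumptions based on the grafana sidecar's common defaults, not necessarily what the chart in this PR emits:

apiVersion: v1
kind: ConfigMap
metadata:
  name: testground-dashboard-example      # hypothetical name
  labels:
    grafana_dashboard: "1"                # label the grafana sidecar watches for (assumed default)
data:
  example-dashboard.json: |
    {"title": "Example dashboard", "tags": ["testground"], "panels": []}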

There are a couple of dashboards with missing queries -- they didn't survive the upgrade. I'll save those for the demo tomorrow so I can show how dashboard import and export works.
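The export side (point 5) is conceptually just Grafana's HTTP API. This is only a sketch of the idea, not the actual tool in /dashboards; the address, auth, and jq usage are assumptions:

GRAFANA=http://localhost:3000   # assumed address, e.g. via kubectl port-forward; add an API key header if auth is enabled
for uid in $(curl -s "$GRAFANA/api/search?tag=testground" | jq -r '.[].uid'); do
  # fetch each tagged dashboard and keep only the dashboard JSON
  curl -s "$GRAFANA/api/dashboards/uid/$uid" | jq '.dashboard' > "dashboards/$uid.json"
done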

@coryschwartz coryschwartz marked this pull request as ready for review March 20, 2020 04:26
@Robmat05 Robmat05 added this to the Testground v0.4 milestone Mar 20, 2020
-f ./sidecar.yaml

echo "Install Redis..."
Member


Not sure why Redis is removed here?

@nonsense
Member

nonsense commented Mar 20, 2020

@coryschwartz I am not sure why we are moving all the community-supported helm charts into our testground repo. We don't have the resources to maintain or fork all these helm charts. Are we modifying them? Why not use the community-provided helm charts directly, rather than copying them into our repo?


Overall I like that we are getting rid of the kops addons, but I don't understand why we need to copy over all the helm charts rather than just using them as provided by the community and updating our values.yml for each chart.

@coryschwartz
Author

@coryschwartz I am not sure why we are moving all the community-supported helm charts into our testground repo. We don't have the resources to maintain or fork all these helm charts. Are we modifying them? Why not use the community-provided helm charts directly, rather than copying them into our repo?

This was just for version control, so we can roll back to previous images, etc., if something is found not to be working. I don't have a strong preference for placing them in a single chart like this; it would work fine to just replace the values.

There is some benefit to having them in a single chart -- you can describe the dependency tree. For example, the serviceMonitor is a CRD created by the prometheus operator, and the weave network serviceMonitor would fail if the operator were not installed first. I haven't added weave to testground-infra, so the dependency is implicit and might be confusing. I would also move the sidecar and weave into the chart, so that the whole infra is described.
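To illustrate, the umbrella chart's Chart.yaml can declare the sub-charts explicitly. A minimal sketch, where the versions and repository URLs are placeholders rather than the actual values in this PR:

apiVersion: v2
name: testground-infra
version: 0.1.0
dependencies:
  - name: prometheus-operator
    version: "8.x.x"                     # placeholder
    repository: "https://kubernetes-charts.storage.googleapis.com"
  - name: prometheus-pushgateway
    version: "1.x.x"                     # placeholder
    repository: "https://kubernetes-charts.storage.googleapis.com"
  - name: redis
    version: "10.x.x"                    # placeholder
    repository: "https://charts.bitnami.com/bitnami"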

With regard to the additional chore of updates, I agree with you; that is the downside of this approach. The way I created these was just by running helm pull and extracting the output. I am not editing, and don't want to edit, the content of the third-party charts. I could imagine writing a simple script to update the charts periodically. What do you think of this idea, @nonsense?

@coryschwartz
Author

@nonsense

I think this is more reviewable now that I've removed the huge number of sub-charts. In their place, I added a small bash script, update_helm_thirdparty.sh, which can be used to download the latest versions of the unmanaged third-party charts.

@@ -41,8 +41,8 @@
"styles": null
},
{
"datasource": "prometheus",
"editable": false,
"datasource": "Prometheus",
Author


When I initially uploaded this to the bitnami grafana, it couldn't find the datasource "prometheus", so I went through and capitalized these.

"colorScheme": "interpolateOranges",
"exponent": 0.5,
"mode": "spectrum"
"CustomPanel": {
Author


These changes were generated with the dashboard import tool, just by uploading them to the new grafana and downloading them again.

@@ -4,7 +4,7 @@ metadata:
name: weave-net
labels:
k8s-app: weave-net
namespace: monitoring
namespace: default
Author


Didn't re-create the monitoring namespace.

@@ -0,0 +1,8 @@
apiVersion: v1
Author


Creating a configmap containing all of the dashboards in /dashboards.

Something to think about here is that there is a maximum size a configMap can be, limited by etcd; we probably cannot grow this configMap beyond about 1MB.

At present, the total size of all dashboards is about 80k, so we are still well away from this limit. If we do get close to the 1MB limit, we will have to change this template a little.

Rather than one configMap with all dashboards, we would create one configMap per dashboard, so that each dashboard gets its own 1MB limit. As long as they all carry the appropriate label, this works fine. A sketch of that per-dashboard variant is below.
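A rough sketch of what the per-dashboard version of the template could look like, using Helm's Files helpers; the glob path, naming, and label are assumptions, not the template actually in this PR:

{{- /* hypothetical sketch: one ConfigMap per dashboard file */ -}}
{{- range $path, $bytes := .Files.Glob "dashboards/*.json" }}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dashboard-{{ base $path | trimSuffix ".json" | lower }}
  labels:
    grafana_dashboard: "1"
data:
  {{ base $path }}: |-
{{ $.Files.Get $path | indent 4 }}
{{- end }}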

@@ -342,14 +342,16 @@ func (c *ClusterK8sRunner) healthcheckRedis() (redisCheck api.HealthcheckItem) {
redisCheck.Message = err.Error()
return
}
if len(pods.Items) != 1 {
redisCheck.Message = fmt.Sprintf("expected 1 redis pod. found %d.", len(pods.Items))
// one master, and slaves.
Member


For simplicity's sake, we should probably run just one Redis master, without slaves, for the time being. Having seen benchmarks together with @raulk, we don't seem to be anywhere near the scale where we need multiple Redis containers, and I think this will just increase complexity for now, considering that we expect strong consistency from the sync service.

@nonsense
Member

This is failing on my machine with:

installing helm infrastructure
walk.go:74: found symbolic link in path: /Users/nonsense/code/src/github.com/ipfs/testground/infra/k8s/testground-infra/charts/testground-dashboards/dashboards resolves to /Users/nonsense/code/src/github.com/ipfs/testground/dashboards
Error: found in Chart.yaml, but missing in charts/ directory: prometheus-operator, prometheus-pushgateway, redis
Error on line 118

I think we are missing a call to ./update_helm_thirdparty.sh, and we seem to be missing the helm repo add command for bitnami in the README.
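Presumably the missing README steps are something like the following; the bitnami URL is the standard public repo, and this is a sketch rather than the exact README wording:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
./update_helm_thirdparty.sh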


for repo in "${thirdparty[@]}"
do
helm pull "$repo" --untar --untardir ./testground-infra/charts/
Member


We are mixing our source code with something we pull here, which will result in a not-clean repo. We should probably pull these into a gitignored directory.

Author


My intention was to eventually add these to source control so we have the ability to roll back in the event of a bad update. I hadn't intended to run update_helm_thirdparty.sh during normal installation, only when we want to pull in updates to those charts.
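For context, the whole script is roughly of this shape; the chart list below is an assumption based on the dependencies named elsewhere in this PR, not a verbatim copy:

#!/usr/bin/env bash
# Sketch of update_helm_thirdparty.sh: pull fresh copies of the unmanaged
# third-party charts into the umbrella chart's charts/ directory.
set -euo pipefail

thirdparty=(
  stable/prometheus-operator
  stable/prometheus-pushgateway
  bitnami/redis
)

for repo in "${thirdparty[@]}"
do
  helm pull "$repo" --untar --untardir ./testground-infra/charts/
done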

@nonsense
Member

After manually adding the bitnami repo and running helm install testground-infra ./testground-infra, I get:

installing helm infrastructure
walk.go:74: found symbolic link in path: /Users/nonsense/code/src/github.com/ipfs/testground/infra/k8s/testground-infra/charts/testground-dashboards/dashboards resolves to /Users/nonsense/code/src/github.com/ipfs/testground/dashboards
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]
Error on line 118

@nonsense
Member

nonsense commented Mar 24, 2020

The testground network/ping-pong test is failing with this PR; it looks like we have some misconfiguration with Redis:

testground -vv run single network/ping-pong \
    --builder=docker:go \
    --runner=cluster:k8s \
    --build-cfg bypass_cache=true \
    --build-cfg push_registry=true \
    --build-cfg registry_type=aws \
    --run-cfg keep_service=false \
    --instances=2 \
  --collect
{"ts":1585052403466778622,"msg":"","group_id":"single","run_id":"9cfe8b0a0a66","event":{"type":"finish","outcome":"failed","error":"during redisClient: no viable redis host found","stacktrace":"goroutine 1 [running]:\nruntime/debug.Stack(0xb0e49c, 0x2, 0xc00033d7e8)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9d\ngithub.com/ipfs/testground/sdk/runtime.(*logger).RecordCrash(0xc00010ff00, 0xa5b500, 0xc0000afc20)\n\t/sdk/runtime/output.go:170 +0xa2\ngithub.com/ipfs/testground/sdk/runtime.Invoke.func2(0xc00016b320)\n\t/sdk/runtime/runner.go:88 +0x67\npanic(0xa5b500, 0xc0000afc20)\n\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\ngithub.com/ipfs/testground/sdk/sync.MustWatcherWriter(0xd0cf20, 0xc0000b6d80, 0xc00016b320, 0x0, 0x0)\n\t/sdk/sync/common.go:145 +0x83\nmain.run(0xc00016b320, 0x0, 0x0)\n\t/plan/main.go:33 +0x129\ngithub.com/ipfs/testground/sdk/runtime.Invoke(0xc5c830)\n\t/sdk/runtime/runner.go:96 +0x399\nmain.main()\n\t/plan/main.go:17 +0x2d\n"}}

The environment variable is set correctly for the container:

      REDIS_HOST:                 testground-infra-redis-headless

➜  testground git:(helmify-the-monitoring) ✗ testground healthcheck --runner=cluster:k8s
Mar 24 12:23:20.155311  INFO    testground client initialized   {"addr": "localhost:8080"}
checking runner cluster:k8s
finished checking runner cluster:k8s
Checks:
- k8s: ok; k8s cluster is running
- efs: ok; efs provisioner is running
- redis: ok; redis service is running
- sidecar: ok; sidecar service is running
No fixes applied.

@nonsense left a comment


Make sure that Redis is a single node (so that we have strong consistency) and that network/ping-pong continues to work.

Also make sure that ./install.sh runs flawlessly and installs monitoring and dashboards.

@coryschwartz
Author

Oh, I had intended to change the redis host only for kubernetes, not for local runners!

Alright, I'm disabling redis cluster mode and will address the other suggestions as well.
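The change amounts to a values tweak roughly like the one below, assuming the bitnami chart's cluster toggle and that redis is a sub-chart of testground-infra; the exact keys depend on the chart version, so treat this as a sketch:

redis:
  cluster:
    enabled: false        # run a single standalone master, no slaves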

After manually adding the bitnami repo and running helm install testground-infra ./testground-infra, I get:

installing helm infrastructure
walk.go:74: found symbolic link in path: /Users/nonsense/code/src/github.com/ipfs/testground/infra/k8s/testground-infra/charts/testground-dashboards/dashboards resolves to /Users/nonsense/code/src/github.com/ipfs/testground/dashboards
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]
Error on line 118

Oh, right! There is a difference between helm 2 and helm 3 CRD installation. These charts are fully helm-3 capable, and we can ignore those messages. I added an option to disable the CRD install hook, but since we are using helm 3, the CRDs still get installed.
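A quick way to confirm the operator CRDs actually landed after a helm 3 install (assuming a configured kubectl context):

kubectl get crds | grep monitoring.coreos.com
# expect entries such as prometheuses.monitoring.coreos.com and servicemonitors.monitoring.coreos.com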

@coryschwartz
Author

I got the ping-pong test working.

I'm adding the charts back in. If there is disagreement about keeping these in source control, I don't mind removing them in the future.

@coryschwartz coryschwartz merged commit c5e3314 into master Mar 25, 2020
@coryschwartz coryschwartz deleted the helmify-the-monitoring branch March 25, 2020 00:09
@nonsense
Member

nonsense commented Mar 25, 2020

@coryschwartz the network/ping-pong test is not working for me. I am not sure if we have some obscure bug that only manifests occasionally, or whether the Redis config changes in this PR triggered it (we are changing the DNS name of Redis here, so it might be the case that this is triggering an existing bug).


{"ts":1585137068426355474,"msg":"","group_id":"single","run_id":"8553537b6069","event":{"type":"message","message":"registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)"}}
{"ts":1585137068426413089,"msg":"","group_id":"single","run_id":"8553537b6069","event":{"type":"start","runenv":{"plan":"network","case":"ping-pong","seq":0,"params":{},"instances":2,"outputs_path":"/outputs/8553537b6069/single/0","network":"ip+net","group":"single","group_instances":2}}}
{"ts":1585137068426544234,"msg":"","group_id":"single","run_id":"8553537b6069","event":{"type":"message","message":"waiting for pushgateway to become accessible"}}
{"ts":1585137068428239694,"msg":"","group_id":"single","run_id":"8553537b6069","event":{"type":"message","message":"pushgateway is up at prometheus-pushgateway:9091; pushing metrics every 5s."}}
{"ts":1585137068431793818,"msg":"","group_id":"single","run_id":"8553537b6069","event":{"type":"message","message":"before sync.MustWatcherWriter"}}
{"ts":1585137068433512760,"msg":"","group_id":"single","run_id":"8553537b6069","event":{"type":"finish","outcome":"failed","error":"during redisClient: no viable redis host found","stacktrace":"goroutine 1 [running]:\nruntime/debug.Stack(0xb0e49c, 0x2, 0xc0002ef7e8)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9d\ngithub.com/ipfs/testground/sdk/runtime.(*logger).RecordCrash(0xc00018a9e0, 0xa5b500, 0xc00018b000)\n\t/sdk/runtime/output.go:170 +0xa2\ngithub.com/ipfs/testground/sdk/runtime.Invoke.func2(0xc0000b3d40)\n\t/sdk/runtime/runner.go:88 +0x67\npanic(0xa5b500, 0xc00018b000)\n\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\ngithub.com/ipfs/testground/sdk/sync.MustWatcherWriter(0xd0cf20, 0xc0001be840, 0xc0000b3d40, 0x0, 0x0)\n\t/sdk/sync/common.go:145 +0x83\nmain.run(0xc0000b3d40, 0x0, 0x0)\n\t/plan/main.go:33 +0x129\ngithub.com/ipfs/testground/sdk/runtime.Invoke(0xc5c830)\n\t/sdk/runtime/runner.go:96 +0x399\nmain.main()\n\t/plan/main.go:17 +0x2d\n"}}
during redisClient: no viable redis host found
goroutine 1 [running]:
runtime/debug.Stack(0x2f, 0x0, 0x0)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/ipfs/testground/sdk/runtime.Invoke.func2(0xc0000b3d40)
        /sdk/runtime/runner.go:92 +0xc0
panic(0xa5b500, 0xc00018b000)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/ipfs/testground/sdk/sync.MustWatcherWriter(0xd0cf20, 0xc0001be840, 0xc0000b3d40, 0x0, 0x0)
        /sdk/sync/common.go:145 +0x83
main.run(0xc0000b3d40, 0x0, 0x0)
        /plan/main.go:33 +0x129
github.com/ipfs/testground/sdk/runtime.Invoke(0xc5c830)
        /sdk/runtime/runner.go:96 +0x399
main.main()
        /plan/main.go:17 +0x2d

@nonsense
Member

@coryschwartz we need all Testground services to be on the flannel network, so every service we start needs to have:

  podAnnotations: {
    cni: "flannel"
  }

Otherwise when a testplan is started, it has access only to the control network and doesn't have routes to the required services. This seems to be the case for Redis right now.

@nonsense
Member

Alternatively, we would have to somehow fix the networking to always attach new pods to flannel rather than weave, but I am not sure how to do that with the CNI Genie plugin, so for now we are explicit in the annotations.

@nonsense
Member

On my cluster right now I see:

NAME                                                   READY   STATUS      RESTARTS   AGE     IP              NODE                                             NOMINATED NODE   READINESS GATES
busybox                                                1/1     Running     0          5m58s   100.96.1.21     ip-172-20-56-193.eu-central-1.compute.internal   <none>           <none>
collect-outputs                                        1/1     Running     0          37m     100.96.3.6      ip-172-20-58-7.eu-central-1.compute.internal     <none>           <none>
dummy-8t4xk                                            1/1     Running     0          40m     100.96.2.8      ip-172-20-40-70.eu-central-1.compute.internal    <none>           <none>
dummy-nsc4h                                            1/1     Running     0          40m     100.96.3.4      ip-172-20-58-7.eu-central-1.compute.internal     <none>           <none>
dummy-xswlw                                            1/1     Running     0          40m     100.96.1.4      ip-172-20-56-193.eu-central-1.compute.internal   <none>           <none>
efs-provisioner-ddb4ddf7d-j7wmn                        1/1     Running     0          41m     100.96.2.3      ip-172-20-40-70.eu-central-1.compute.internal    <none>           <none>
prometheus-pushgateway-b44bc5955-ztf6t                 1/1     Running     0          41m     100.96.2.7      ip-172-20-40-70.eu-central-1.compute.internal    <none>           <none>
prometheus-testground-infra-prometheu-prometheus-0     3/3     Running     1          40m     30.0.0.1        ip-172-20-56-193.eu-central-1.compute.internal   <none>           <none>
testground-infra-grafana-59966cb797-bw5hd              3/3     Running     0          41m     100.96.2.5      ip-172-20-40-70.eu-central-1.compute.internal    <none>           <none>
testground-infra-kube-state-metrics-5458c7998f-hfl4d   1/1     Running     0          41m     100.96.2.6      ip-172-20-40-70.eu-central-1.compute.internal    <none>           <none>
testground-infra-prometheu-operator-7f5f59d49d-64qt2   2/2     Running     0          41m     100.96.1.3      ip-172-20-56-193.eu-central-1.compute.internal   <none>           <none>
testground-infra-prometheus-node-exporter-kwz68        1/1     Running     0          41m     172.20.52.88    ip-172-20-52-88.eu-central-1.compute.internal    <none>           <none>
testground-infra-prometheus-node-exporter-nm6s4        1/1     Running     0          41m     172.20.56.193   ip-172-20-56-193.eu-central-1.compute.internal   <none>           <none>
testground-infra-prometheus-node-exporter-wl4xk        1/1     Running     0          41m     172.20.58.7     ip-172-20-58-7.eu-central-1.compute.internal     <none>           <none>
testground-infra-prometheus-node-exporter-zbb6z        1/1     Running     0          41m     172.20.40.70    ip-172-20-40-70.eu-central-1.compute.internal    <none>           <none>
testground-infra-redis-master-0                        2/2     Running     0          41m     16.0.0.2        ip-172-20-58-7.eu-central-1.compute.internal     <none>           <none>

testground-infra-redis-master-0 is on the wrong network.
prometheus-testground-infra-prometheu-prometheus-0 is on the wrong network.

Multus CNI has a default network, but CNI Genie doesn't seem to have one, so it appears to attach pods to an arbitrary network if there are no specific annotations.

@nonsense
Member

I'm adding the charts back in. If there is disagreement about keeping these in source control, I don't mind removing them in the future.

I am not sure what problem this addresses. It should be just a simple set of helm repo add and helm repo update commands, rather than having to include 70k lines and manually update our charts.

I think the community is doing a great job of keeping important charts like the ones we use maintained without introducing problems, so we can pretty much always rely on the latest version.

@nonsense
Member

nonsense commented Mar 25, 2020

@coryschwartz this seems to revert all the configuration we have been applying to Redis:

  sysctls:
  - name: net.core.somaxconn
    value: "100000"
  - name: net.netfilter.nf_conntrack_max
    value: "100000"
  extraFlags:
  - "--maxclients 100000"
  podAnnotations: {
    cni: "flannel"
  }

We want Redis to occupy a full node, and this is an easy way to make sure that happens. Obviously it should be improved with annotations and affinity at some point, but for now it does the trick.

    requests:
      memory: 13000Mi
      cpu: 6500m
## Sysctl InitContainer
## used to perform sysctl operation to modify Kernel settings (needed sometimes to avoid warnings)
sysctlImage:
  enabled: true
  command:
  - /bin/sh
  - -c
  - |
    sysctl -w net.core.somaxconn=100000 &&
    sysctl -w net.netfilter.nf_conntrack_max=100000
