Timeout Helm deployment > 300 seconds failed #3604

misteio · 2021-10-18T20:29:22Z

Description:

I want to install an Helm Chart that takes more than 300 seconds to deploy.

Steps to reproduce the issue:

Deploy a personal Helm Chart through KubeApps.
Wait around 300 seconds
It failed

Describe the results you received:

Describe the results you expected:

It could be great if we can change this default behavior.

Additional information you deem important (e.g. issue happens only occasionally):

I use a custom Helm Chart I've made. When I add the following annotations, it won't wait until the end of deployment.

annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation

Version of Helm, Kubeapps and Kubernetes:

Output of helm version:

version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"clean", GoVersion:"go1.16.5"}

Output of helm list -n <kubeapps-namespace> <kubeapps-release-name>:

kubeapps        shared          2               2021-10-18 18:08:32.619996037 +0000 UTC deployed        kubeapps-7.5.8          2.4.1

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-eks-d88609", GitCommit:"d886092805d5cc3a47ed5cf0c43de38ce442dfcb", GitTreeState:"clean", BuildDate:"2021-07-31T00:29:12Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}

The text was updated successfully, but these errors were encountered:

absoludity · 2021-10-18T22:03:02Z

As Antonio mentioned on slack, this was an arg to the kubeops service (but was not exposed in the chart). We need to make it an arg to the kubeapps-apis service as a generic timeout that plugins may or may not obey, and pass it to the helm plugin for now.

antgamdia · 2021-10-19T10:03:25Z

Just for the record, as mentioned in slack, I guess we should:

Create a --timeout parameter in the Cobra handler (like https://github.com/kubeapps/kubeapps/blob/master/cmd/kubeapps-apis/cmd/root.go#L78)
Pass the --timeout parameter in the chart (like https://github.com/kubeapps/kubeapps/blob/master/chart/kubeapps/templates/kubeappsapis/deployment.yaml#L79)
Use it and pass it through each Helm action (like https://github.com/kubeapps/kubeapps/blob/master/pkg/agent/agent.go#L87)

PRs are welcomed :)

castelblanque · 2021-11-23T16:30:42Z

I'm trying to come up with an implementation for this :)

Handling of the input flag in the Cobra command is clear
On the other end, usage of the value for e.g. Helm's Install command is clear as well

But I'm struggling with the step in between, that is, passing the actual timeout value to the plugin, so that it is available upon each of the operations.

Would it make sense to pass an additional param that would contain plugin-level common values (like timeout for operations) when registering plugins? Here. It would be kept at the server level, and thus accesible from all operations in the plugin.

antgamdia · 2021-11-23T16:57:41Z

But I'm struggling with the step in between, that is, passing the actual timeout value to the plugin, so that it is available upon each of the operations.

What you're pointing out is something interesting and we really need. I mean, passing arbitrary plugin-level config in runtime is essential for our plugins to run.
Although I did mention this --flag way, these flags are intended for the core API server and not for the plugins, so it wasn't a good suggestion, my bad.
Fortunately, Satya has been working on passing plugin-specific config and I think we can leverage this work also for this feature.

What about extending this struct with a helmAgentConfig or similar?
This way we would have the config available for the Helm plugin, I guess.

Of course, another alternative is using ENV vars as we were doing at the beginning of the development.

absoludity · 2021-11-23T23:12:22Z

What about extending this struct with a helmAgentConfig or similar? This way we would have the config available for the Helm plugin, I guess.

Yes, the idea there is that we users can add pluginConfig in the values here:

https://github.com/kubeapps/kubeapps/blob/master/chart/kubeapps/values.yaml#L1602

and then each plugin is responsible for parsing that (and failing if validation fails for the plugins config) when the plugin is registered (the path to the config is passed to the plugin). So it won't be going via the cobra command (well, only in the sense that the pluginconfiguration path is passed in this way).

Thanks for raising the issue clearly Rafa!

castelblanque · 2021-11-24T07:55:29Z

Ok, at first I thought that timeout would be a generic CLI arg that plugins may honor or not.
Let's make it using pluginConfig.

Thanks @antgamdia @absoludity !

castelblanque · 2021-11-24T17:03:51Z

It seems that it is only caused by using hooks which take too long (>300 seconds by default).
When installing a chart, create and update operations are not affected by the Helm timeout unless --wait arg is used.

How to reproduce

Add a hook to the Helm chart. E.g.

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{.Release.Name}}"
  labels:
    app.kubernetes.io/managed-by: {{.Release.Service | quote }}
    app.kubernetes.io/instance: {{.Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{.Chart.Name}}-{{.Chart.Version}}"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    metadata:
      name: "{{.Release.Name}}"
      labels:
        app.kubernetes.io/managed-by: {{.Release.Service | quote }}
        app.kubernetes.io/instance: {{.Release.Name | quote }}
        helm.sh/chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    spec:
      restartPolicy: Never
      containers:
      - name: pre-install-job
        image: "alpine:3.3"
        command: ["/bin/sleep","{{default "10" .Values.sleepyTime}}"]

Set a sleepyTime value greater than the timeout set for Helm (--timeout=10s)

absoludity · 2021-11-24T23:10:09Z

It seems that it is only caused by using hooks which take too long (>300 seconds by default).

Yes - I guess because helm just creates the resources (without --wait), unless a hook is defined that needs to run (to completion) before the resources are created (or after), to be considered successful.

When installing a chart, create and update operations are not affected by the Helm timeout unless --wait arg is used.

Do you mean when installing a chart without hooks? So with the hook, the create/update can be affected and this is the use-case originally reported? Sorry, just confused by the last statement.

castelblanque · 2021-11-25T07:57:11Z

When installing a chart, create and update operations are not affected by the Helm timeout unless --wait arg is used.

Do you mean when installing a chart without hooks? So with the hook, the create/update can be affected and this is the use-case originally reported? Sorry, just confused by the last statement.

No, it happens in any case. Timeout always applies to hooks, even when not using wait. Timeout only applies to create/update when wait is set.

This is the flow that Helm uses (at least for installing):

Pre-install hooks always applying timeout
Create or Update resources without waiting
If wait flag is set, wait until resources are correctly created/updated
Post-install hooks always applying timeout

In other words:

timeout always applies to pre/post hooks. waitflag never affects hooks.
timeout applies to create/update resources only when using wait flag

Sorry, maybe I'm entangling the explanation 😅

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Added timeout parameter for Helm operations (#3604) Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com> * Finalising Helm timeout tests (#3604) Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com> * Finalising Helm timeout tests (#3604) Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com> * Finalising Helm timeout tests (#3604) Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

absoludity added component/apis-server Issue related to kubeapps api-server kind/feature An issue that reports a feature (approved) to be implemented size/S labels Oct 18, 2021

antgamdia added the good first issue label Oct 19, 2021

antgamdia added this to Inbox in Kubeapps Oct 19, 2021

antgamdia added this to the Community requests milestone Oct 19, 2021

ppbaena added the priority/low label Oct 25, 2021

ppbaena moved this from Inbox to Backlog in Kubeapps Oct 25, 2021

antgamdia moved this from Backlog to Next iteration discussion in Kubeapps Nov 22, 2021

ppbaena moved this from Next iteration discussion to Committed in Kubeapps Nov 22, 2021

castelblanque self-assigned this Nov 22, 2021

castelblanque moved this from Committed to In progress in Kubeapps Nov 22, 2021

castelblanque pushed a commit that referenced this issue Nov 25, 2021

Added timeout parameter for Helm operations (#3604)

b23faec

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

castelblanque mentioned this issue Nov 25, 2021

Added timeout parameter for Helm operations (#3604) #3822

Merged

castelblanque pushed a commit that referenced this issue Nov 26, 2021

Finalising Helm timeout tests (#3604)

a7fb982

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

castelblanque pushed a commit that referenced this issue Nov 26, 2021

Finalising Helm timeout tests (#3604)

a0a8308

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

castelblanque pushed a commit that referenced this issue Nov 26, 2021

Finalising Helm timeout tests (#3604)

45fcd2e

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

castelblanque moved this from In progress to Done in Kubeapps Nov 30, 2021

castelblanque closed this as completed Dec 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timeout Helm deployment > 300 seconds failed #3604

Timeout Helm deployment > 300 seconds failed #3604

misteio commented Oct 18, 2021 •

edited

absoludity commented Oct 18, 2021

antgamdia commented Oct 19, 2021

castelblanque commented Nov 23, 2021

antgamdia commented Nov 23, 2021

absoludity commented Nov 23, 2021 •

edited

castelblanque commented Nov 24, 2021

castelblanque commented Nov 24, 2021

absoludity commented Nov 24, 2021

castelblanque commented Nov 25, 2021

Timeout Helm deployment > 300 seconds failed #3604

Timeout Helm deployment > 300 seconds failed #3604

Comments

misteio commented Oct 18, 2021 • edited

Description:

Steps to reproduce the issue:

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Version of Helm, Kubeapps and Kubernetes:

absoludity commented Oct 18, 2021

antgamdia commented Oct 19, 2021

castelblanque commented Nov 23, 2021

antgamdia commented Nov 23, 2021

absoludity commented Nov 23, 2021 • edited

castelblanque commented Nov 24, 2021

castelblanque commented Nov 24, 2021

How to reproduce

absoludity commented Nov 24, 2021

castelblanque commented Nov 25, 2021

misteio commented Oct 18, 2021 •

edited

absoludity commented Nov 23, 2021 •

edited