Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout Helm deployment > 300 seconds failed #3604

Closed
misteio opened this issue Oct 18, 2021 · 9 comments
Closed

Timeout Helm deployment > 300 seconds failed #3604

misteio opened this issue Oct 18, 2021 · 9 comments
Assignees
Labels
component/apis-server Issue related to kubeapps api-server good first issue kind/feature An issue that reports a feature (approved) to be implemented
Projects

Comments

@misteio
Copy link

misteio commented Oct 18, 2021

Description:

I want to install an Helm Chart that takes more than 300 seconds to deploy.

Steps to reproduce the issue:

  1. Deploy a personal Helm Chart through KubeApps.
  2. Wait around 300 seconds
  3. It failed

Describe the results you received:

Error

Describe the results you expected:

It could be great if we can change this default behavior.

Additional information you deem important (e.g. issue happens only occasionally):

I use a custom Helm Chart I've made. When I add the following annotations, it won't wait until the end of deployment.

annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation

Version of Helm, Kubeapps and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"clean", GoVersion:"go1.16.5"}
  • Output of helm list -n <kubeapps-namespace> <kubeapps-release-name>:
kubeapps        shared          2               2021-10-18 18:08:32.619996037 +0000 UTC deployed        kubeapps-7.5.8          2.4.1  
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-eks-d88609", GitCommit:"d886092805d5cc3a47ed5cf0c43de38ce442dfcb", GitTreeState:"clean", BuildDate:"2021-07-31T00:29:12Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
@absoludity
Copy link
Contributor

As Antonio mentioned on slack, this was an arg to the kubeops service (but was not exposed in the chart). We need to make it an arg to the kubeapps-apis service as a generic timeout that plugins may or may not obey, and pass it to the helm plugin for now.

@absoludity absoludity added component/apis-server Issue related to kubeapps api-server kind/feature An issue that reports a feature (approved) to be implemented size/S labels Oct 18, 2021
@antgamdia
Copy link
Contributor

Just for the record, as mentioned in slack, I guess we should:

  1. Create a --timeout parameter in the Cobra handler (like https://github.com/kubeapps/kubeapps/blob/master/cmd/kubeapps-apis/cmd/root.go#L78)
  2. Pass the --timeout parameter in the chart (like https://github.com/kubeapps/kubeapps/blob/master/chart/kubeapps/templates/kubeappsapis/deployment.yaml#L79)
  3. Use it and pass it through each Helm action (like https://github.com/kubeapps/kubeapps/blob/master/pkg/agent/agent.go#L87)

PRs are welcomed :)

@antgamdia antgamdia added this to Inbox in Kubeapps Oct 19, 2021
@antgamdia antgamdia added this to the Community requests milestone Oct 19, 2021
@ppbaena ppbaena moved this from Inbox to Backlog in Kubeapps Oct 25, 2021
@antgamdia antgamdia moved this from Backlog to Next iteration discussion in Kubeapps Nov 22, 2021
@ppbaena ppbaena moved this from Next iteration discussion to Committed in Kubeapps Nov 22, 2021
@castelblanque castelblanque self-assigned this Nov 22, 2021
@castelblanque castelblanque moved this from Committed to In progress in Kubeapps Nov 22, 2021
@castelblanque
Copy link
Collaborator

I'm trying to come up with an implementation for this :)

  • Handling of the input flag in the Cobra command is clear
  • On the other end, usage of the value for e.g. Helm's Install command is clear as well

But I'm struggling with the step in between, that is, passing the actual timeout value to the plugin, so that it is available upon each of the operations.

Would it make sense to pass an additional param that would contain plugin-level common values (like timeout for operations) when registering plugins? Here. It would be kept at the server level, and thus accesible from all operations in the plugin.

@antgamdia
Copy link
Contributor

But I'm struggling with the step in between, that is, passing the actual timeout value to the plugin, so that it is available upon each of the operations.

What you're pointing out is something interesting and we really need. I mean, passing arbitrary plugin-level config in runtime is essential for our plugins to run.
Although I did mention this --flag way, these flags are intended for the core API server and not for the plugins, so it wasn't a good suggestion, my bad.
Fortunately, Satya has been working on passing plugin-specific config and I think we can leverage this work also for this feature.

What about extending this struct with a helmAgentConfig or similar?
This way we would have the config available for the Helm plugin, I guess.

Of course, another alternative is using ENV vars as we were doing at the beginning of the development.

@absoludity
Copy link
Contributor

absoludity commented Nov 23, 2021

What about extending this struct with a helmAgentConfig or similar? This way we would have the config available for the Helm plugin, I guess.

Yes, the idea there is that we users can add pluginConfig in the values here:

https://github.com/kubeapps/kubeapps/blob/master/chart/kubeapps/values.yaml#L1602

and then each plugin is responsible for parsing that (and failing if validation fails for the plugins config) when the plugin is registered (the path to the config is passed to the plugin). So it won't be going via the cobra command (well, only in the sense that the pluginconfiguration path is passed in this way).

Thanks for raising the issue clearly Rafa!

@castelblanque
Copy link
Collaborator

Ok, at first I thought that timeout would be a generic CLI arg that plugins may honor or not.
Let's make it using pluginConfig.

Thanks @antgamdia @absoludity !

@castelblanque
Copy link
Collaborator

It seems that it is only caused by using hooks which take too long (>300 seconds by default).
When installing a chart, create and update operations are not affected by the Helm timeout unless --wait arg is used.

How to reproduce

  1. Add a hook to the Helm chart. E.g.
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{.Release.Name}}"
  labels:
    app.kubernetes.io/managed-by: {{.Release.Service | quote }}
    app.kubernetes.io/instance: {{.Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{.Chart.Name}}-{{.Chart.Version}}"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    metadata:
      name: "{{.Release.Name}}"
      labels:
        app.kubernetes.io/managed-by: {{.Release.Service | quote }}
        app.kubernetes.io/instance: {{.Release.Name | quote }}
        helm.sh/chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    spec:
      restartPolicy: Never
      containers:
      - name: pre-install-job
        image: "alpine:3.3"
        command: ["/bin/sleep","{{default "10" .Values.sleepyTime}}"]
  1. Set a sleepyTime value greater than the timeout set for Helm (--timeout=10s)

@absoludity
Copy link
Contributor

It seems that it is only caused by using hooks which take too long (>300 seconds by default).

Yes - I guess because helm just creates the resources (without --wait), unless a hook is defined that needs to run (to completion) before the resources are created (or after), to be considered successful.

When installing a chart, create and update operations are not affected by the Helm timeout unless --wait arg is used.

Do you mean when installing a chart without hooks? So with the hook, the create/update can be affected and this is the use-case originally reported? Sorry, just confused by the last statement.

@castelblanque
Copy link
Collaborator

When installing a chart, create and update operations are not affected by the Helm timeout unless --wait arg is used.

Do you mean when installing a chart without hooks? So with the hook, the create/update can be affected and this is the use-case originally reported? Sorry, just confused by the last statement.

No, it happens in any case. Timeout always applies to hooks, even when not using wait. Timeout only applies to create/update when wait is set.

This is the flow that Helm uses (at least for installing):

  1. Pre-install hooks always applying timeout
  2. Create or Update resources without waiting
  3. If wait flag is set, wait until resources are correctly created/updated
  4. Post-install hooks always applying timeout

In other words:

  • timeout always applies to pre/post hooks. waitflag never affects hooks.
  • timeout applies to create/update resources only when using wait flag

Sorry, maybe I'm entangling the explanation 😅

castelblanque pushed a commit that referenced this issue Nov 25, 2021
Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
castelblanque pushed a commit that referenced this issue Nov 26, 2021
Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
castelblanque pushed a commit that referenced this issue Nov 26, 2021
Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
castelblanque pushed a commit that referenced this issue Nov 26, 2021
Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
castelblanque added a commit that referenced this issue Nov 29, 2021
* Added timeout parameter for Helm operations (#3604)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Finalising Helm timeout tests (#3604)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Finalising Helm timeout tests (#3604)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>

* Finalising Helm timeout tests (#3604)

Signed-off-by: Rafa Castelblanque <rcastelblanq@vmware.com>
@castelblanque castelblanque moved this from In progress to Done in Kubeapps Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/apis-server Issue related to kubeapps api-server good first issue kind/feature An issue that reports a feature (approved) to be implemented
Projects
No open projects
Kubeapps
  
Done
Development

No branches or pull requests

5 participants