A flux app in failed install state, deleted via UX, stays around in k8s flux HelmRelease CR #5577

Open
gfichtenholt opened this issue Oct 27, 2022 · 5 comments · Fixed by #5584
Labels
component/plugin-flux Issue related to kubeapps plugin to manage Helm charts via Flux kind/bug An issue that reports a defect in an existing feature

Comments

@gfichtenholt
Contributor

gfichtenholt commented Oct 27, 2022

On a brand new kind cluster created via 'make cluster-kind':

  1. Deploy flux and install kubeapps with the flux plugin
  2. Log in as kubeapps-operator (cluster admin)
  3. Add a package repository
    a) name: podinfo
    b) url: https://stefanprodan.github.io/podinfo
    c) description: test
    d) Packaging format: pick Helm Charts via Flux. Notice that as soon as you pick it, the 'Description' field gets disabled
    e) click INSTALL REPOSITORY and verify it gets to Ready state. I had to hit 'REFRESH' once.
  4. Click Catalog. Pick podinfo, version 6.2.2 (as of today)
  5. Click DEPLOY
    a) name: podinfo
    b) service account: pick "default"
    c) click "DEPLOY 6.2.2"
  6. On the screen immediately following this you will see something like
    Status: pending
    ArtifactFailed: HelmChart 'default/default-podinfo' is not ready (see attached 1)
  7. Reload the page (attached screen 2) until it says status: failed
    BTW, the fact that it fails to install is not a problem: the service account is not appropriate
    for this app
  8. Click DELETE to delete the app
  9. Click Applications. Verify the app is gone from that page. However:
  10. The flux HelmRelease CR still exists in k8s:
$ kubectl get hr --all-namespaces -o yaml
apiVersion: v1
items:
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
  kind: HelmRelease
  metadata:
    creationTimestamp: "2022-10-27T07:37:45Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2022-10-27T07:40:51Z"
    finalizers:
    - finalizers.fluxcd.io
    generation: 2
    name: podinfo
    namespace: default
    resourceVersion: "32419"
    uid: 04f27a0b-b813-4d3d-abc9-ed749b46c82d
  spec:
    chart:
      spec:
        chart: podinfo
        reconcileStrategy: ChartVersion
        sourceRef:
          kind: HelmRepository
          name: podinfo
          namespace: default
        version: 6.2.2
    interval: 1m0s
    serviceAccountName: default
    timeout: 5m0s
    values:
      affinity: {}
      backend: null
      backends: []
      cache: ""
      certificate:
        create: false
        dnsNames:
        - podinfo
        issuerRef:
          kind: ClusterIssuer
          name: self-signed
      faults:
        delay: false
        error: false
        testFail: false
        testTimeout: false
        unhealthy: false
        unready: false
      h2c:
        enabled: false
      host: null
      hpa:
        cpu: null
        enabled: false
        maxReplicas: 10
        memory: null
        requests: null
      image:
        pullPolicy: IfNotPresent
        repository: ghcr.io/stefanprodan/podinfo
        tag: 6.2.2
      ingress:
        annotations: {}
        className: ""
        enabled: false
        hosts:
        - host: podinfo.local
          paths:
          - path: /
            pathType: ImplementationSpecific
        tls: []
      linkerd:
        profile:
          enabled: false
      logLevel: info
      nodeSelector: {}
      podAnnotations: {}
      probes:
        liveness:
          failureThreshold: 3
          initialDelaySeconds: 1
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        readiness:
          failureThreshold: 3
          initialDelaySeconds: 1
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
      redis:
        enabled: false
        repository: redis
        tag: 6.0.8
      replicaCount: 1
      resources:
        limits: null
        requests:
          cpu: 1m
          memory: 16Mi
      securityContext: {}
      service:
        annotations: {}
        enabled: true
        externalPort: 9898
        grpcPort: 9999
        grpcService: podinfo
        hostPort: null
        httpPort: 9898
        metricsPort: 9797
        nodePort: 31198
        type: ClusterIP
      serviceAccount:
        enabled: false
        imagePullSecrets: []
        name: null
      serviceMonitor:
        additionalLabels: {}
        enabled: false
        interval: 15s
      tls:
        certPath: /data/cert
        enabled: false
        hostPort: null
        port: 9899
        secretName: null
      tolerations: []
      ui:
        color: '#34577c'
        logo: ""
        message: ""
  status:
    conditions:
    - lastTransitionTime: "2022-10-27T07:37:45Z"
      message: failed to get last release revision
      reason: GetLastReleaseFailed
      status: "False"
      type: Ready
    failures: 9
    helmChart: default/default-podinfo
    observedGeneration: 1
kind: List
metadata:
  resourceVersion: ""

and it will stay there (see workaround below) and will prevent any new app with the same name (podinfo) from being deployed.

Running
$ kubectl delete hr/podinfo
does not help. The command prints
helmrelease.helm.toolkit.fluxcd.io "podinfo" deleted
and then hangs.

I noticed that if I manually run kubectl edit hr/podinfo and remove the lines

finalizers:
- finalizers.fluxcd.io

the CR goes away.
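
The manual edit can also be done non-interactively with kubectl patch. A minimal sketch, assuming the stuck CR is hr/podinfo in the default namespace as in this report; the function name remove_hr_finalizers is made up for illustration. Note that it clears all finalizers on the object, so flux performs no uninstall or other clean-up for it.

```shell
# Hypothetical helper (not part of kubeapps or flux): clears metadata.finalizers
# on a HelmRelease so that a pending deletion can complete.
# Usage: remove_hr_finalizers NAME [NAMESPACE]
remove_hr_finalizers() {
  local name="$1" namespace="${2:-default}"
  # A JSON merge patch that sets the field to null removes it from the object.
  kubectl patch "hr/${name}" --namespace "${namespace}" \
    --type merge --patch '{"metadata":{"finalizers":null}}'
}
```

For the CR in this report the call would be: remove_hr_finalizers podinfo default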
[Screenshot: Screen Shot 2022-10-27 at 12 14 33 AM]
[Screenshot: Screen Shot 2022-10-27 at 12 15 44 AM]

@gfichtenholt gfichtenholt added the kind/bug An issue that reports a defect in an existing feature label Oct 27, 2022
@gfichtenholt gfichtenholt changed the title A flux app in failed install state, deleted via UX stays around in k8s A flux app in failed install state, deleted via UX, stays around in k8s flux HelmRelease CR Oct 27, 2022
@gfichtenholt gfichtenholt added the component/plugin-flux Issue related to kubeapps plugin to manage Helm charts via Flux label Oct 27, 2022
@gfichtenholt
Contributor Author

gfichtenholt commented Oct 27, 2022

right before the DELETE button is clicked:

$ kubectl get hr --all-namespaces -o yaml | tee ~/Desktop/kubeapps/hr-before-delete-2.yaml
apiVersion: v1
items:
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
  kind: HelmRelease
  metadata:
    creationTimestamp: "2022-10-27T07:59:43Z"
    finalizers:
    - finalizers.fluxcd.io
    generation: 1
    name: podinfo
    namespace: default
    resourceVersion: "35624"
    uid: 36263d3d-546f-4982-8d26-393277d4b3db
  spec:
    chart:
      spec:
        chart: podinfo
        reconcileStrategy: ChartVersion
        sourceRef:
          kind: HelmRepository
          name: podinfo
          namespace: default
        version: 6.2.2
    interval: 1m0s
    serviceAccountName: default
    timeout: 5m0s
    values:
      affinity: {}
      backend: null
      backends: []
      cache: ""
      certificate:
        create: false
        dnsNames:
        - podinfo
        issuerRef:
          kind: ClusterIssuer
          name: self-signed
      faults:
        delay: false
        error: false
        testFail: false
        testTimeout: false
        unhealthy: false
        unready: false
      h2c:
        enabled: false
      host: null
      hpa:
        cpu: null
        enabled: false
        maxReplicas: 10
        memory: null
        requests: null
      image:
        pullPolicy: IfNotPresent
        repository: ghcr.io/stefanprodan/podinfo
        tag: 6.2.2
      ingress:
        annotations: {}
        className: ""
        enabled: false
        hosts:
        - host: podinfo.local
          paths:
          - path: /
            pathType: ImplementationSpecific
        tls: []
      linkerd:
        profile:
          enabled: false
      logLevel: info
      nodeSelector: {}
      podAnnotations: {}
      probes:
        liveness:
          failureThreshold: 3
          initialDelaySeconds: 1
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        readiness:
          failureThreshold: 3
          initialDelaySeconds: 1
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
      redis:
        enabled: false
        repository: redis
        tag: 6.0.8
      replicaCount: 1
      resources:
        limits: null
        requests:
          cpu: 1m
          memory: 16Mi
      securityContext: {}
      service:
        annotations: {}
        enabled: true
        externalPort: 9898
        grpcPort: 9999
        grpcService: podinfo
        hostPort: null
        httpPort: 9898
        metricsPort: 9797
        nodePort: 31198
        type: ClusterIP
      serviceAccount:
        enabled: false
        imagePullSecrets: []
        name: null
      serviceMonitor:
        additionalLabels: {}
        enabled: false
        interval: 15s
      tls:
        certPath: /data/cert
        enabled: false
        hostPort: null
        port: 9899
        secretName: null
      tolerations: []
      ui:
        color: '#34577c'
        logo: ""
        message: ""
  status:
    conditions:
    - lastTransitionTime: "2022-10-27T07:59:43Z"
      message: failed to get last release revision
      reason: GetLastReleaseFailed
      status: "False"
      type: Ready
    failures: 9
    helmChart: default/default-podinfo
    observedGeneration: 1
kind: List
metadata:
  resourceVersion: ""

and

$ grpcurl -plaintext -d '{"context": {"cluster": "default", "namespace": "default"}}' -H "Authorization: $token" localhost:8080 kubeappsapis.plugins.fluxv2.packages.v1alpha1.FluxV2PackagesService.GetInstalledPackageSummaries
{
  "installedPackageSummaries": [
    {
      "installedPackageRef": {
        "context": {
          "cluster": "default",
          "namespace": "default"
        },
        "identifier": "podinfo",
        "plugin": {
          "name": "fluxv2.packages",
          "version": "v1alpha1"
        }
      },
      "name": "podinfo",
      "pkgVersionReference": {
        "version": "6.2.2"
      },
      "currentVersion": {
        "pkgVersion": "6.2.2",
        "appVersion": "6.2.2"
      },
      "pkgDisplayName": "podinfo",
      "shortDescription": "Podinfo Helm chart for Kubernetes",
      "latestVersion": {
        "pkgVersion": "6.2.2",
        "appVersion": "6.2.2"
      },
      "status": {
        "reason": "STATUS_REASON_FAILED",
        "userReason": "GetLastReleaseFailed: failed to get last release revision"
      }
    }
  ]
}

@gfichtenholt
Contributor Author

The delete can also be executed via gRPC:

$ grpcurl -plaintext -d '{"installed_package_ref": {"context": {"cluster": "default", "namespace": "default"}, "plugin": {"name": "fluxv2.packages", "version": "v1alpha1"}, "identifier": "podinfo"}}' -H "Authorization: $token" localhost:8080 kubeappsapis.core.packages.v1alpha1.PackagesService.DeleteInstalledPackage
{
  
}

@gfichtenholt
Contributor Author

gfichtenholt commented Oct 27, 2022

Notice that after the delete is executed, the backend returns an empty list:

$ grpcurl -plaintext -d '{"context": {"cluster": "default", "namespace": "default"}}' -H "Authorization: $token" localhost:8080 kubeappsapis.plugins.fluxv2.packages.v1alpha1.FluxV2PackagesService.GetInstalledPackageSummaries
{
  
}

but the HelmRelease CR is still there

$ kubectl get hr --all-namespaces
NAMESPACE   NAME      AGE   READY   STATUS
default     podinfo   12m   False   failed to get last release revision

This sure looks like a backend issue, so I am self-assigning it. The flux plugin skips this CR when listing installed packages because metadata.generation=2 and status.observedGeneration=1, which basically says the CR is being "reconciled". I guess that is technically the case, except that said reconciliation never ends unless you help it along.
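
The generation mismatch can be checked from the command line. This is a sketch of the same comparison, not the plugin's actual code; hr_is_reconciling is a hypothetical name, and the namespace defaults to default as in this report.

```shell
# Hypothetical helper: succeeds (exit status 0) when metadata.generation differs
# from status.observedGeneration, i.e. the controller has not yet observed the
# latest version of the HelmRelease.
# Usage: hr_is_reconciling NAME [NAMESPACE]
hr_is_reconciling() {
  local name="$1" namespace="${2:-default}"
  local generation observed
  generation="$(kubectl get "hr/${name}" --namespace "${namespace}" \
    -o jsonpath='{.metadata.generation}')"
  observed="$(kubectl get "hr/${name}" --namespace "${namespace}" \
    -o jsonpath='{.status.observedGeneration}')"
  [ "${generation}" != "${observed}" ]
}
```

With the values from this issue (generation 2, observedGeneration 1) the check succeeds, which matches the CR being treated as still reconciling.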

I think the fact that GetInstalledPackageSummaries() returns an empty set while GetInstalledPackageDetail() returns the detail IS one problem that should be fixed. They should be consistent.
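
One way to observe the inconsistency is to request the detail for the same package right after the summaries call returns an empty list. A sketch that reuses the grpcurl invocation style and the installed_package_ref payload from the delete call earlier in this thread; it assumes the core PackagesService exposes GetInstalledPackageDetail with that request shape, and get_installed_detail is a hypothetical name.

```shell
# Hypothetical helper: fetches the installed package detail through the
# kubeapps core packages API for a flux package.
# Assumes $token holds the Authorization header value and that the kubeapps
# APIs are reachable on localhost:8080, as in the grpcurl commands in this issue.
# Usage: get_installed_detail IDENTIFIER [NAMESPACE]
get_installed_detail() {
  local identifier="$1" namespace="${2:-default}"
  local request
  request="$(printf '{"installed_package_ref": {"context": {"cluster": "default", "namespace": "%s"}, "plugin": {"name": "fluxv2.packages", "version": "v1alpha1"}, "identifier": "%s"}}' \
    "${namespace}" "${identifier}")"
  grpcurl -plaintext -d "${request}" -H "Authorization: ${token}" \
    localhost:8080 \
    kubeappsapis.core.packages.v1alpha1.PackagesService.GetInstalledPackageDetail
}
```

For the package in this report the call would be: get_installed_detail podinfo default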

@gfichtenholt gfichtenholt self-assigned this Oct 27, 2022
gfichtenholt added a commit that referenced this issue Nov 5, 2022
…X, stays around in k8s flux HelmRelease CR #5577  (#5584)

fixed an inconsistency between GetInstalledPackageSummaries() and
GetInstalledPackageDetail() in one corner case.
Main fix is dependent on flux
fluxcd/helm-controller#554

There is only one small change to production code. The rest is
test-related code. Also,

+ added a few integration tests.
+ bump flux version in tests
+ fix for available package handling with flux in multi-tenant mode
#5541
@gfichtenholt
Contributor Author

Reopening since dependent fluxcd/helm-controller#554 is still open

@gfichtenholt gfichtenholt reopened this Nov 5, 2022
@ppbaena ppbaena added this to the Technical debt milestone Nov 7, 2022
@gfichtenholt
Contributor Author

gfichtenholt commented Nov 21, 2022

Capturing notes from slack discussion:
I got a response (to the dependent issue) but it's not a great one. Basically, they claim there is nothing they can do at their level. Full discussion is here
Whether that's entirely true or not needs some investigation.

One other thought: our UX recently made a change where the service account is optional for flux packages. So the problem space has really become smaller compared to the time I opened the issue. Nowadays, one has to actually go and "change away" from the default settings to cause the issue to happen. In that case, and if what the flux guys claim is correct, we may have no choice but to direct our users to manually "patch the finalizers out" of the flux CR via kubectl, as the flux guys suggest.

Comment on #554 HelmRelease with non-existing service account gets stuck being deleted
Looks like this is related to: fluxcd/flux2#997
The missing service account means that HelmRelease can't actually do anything, because its permissions in RBAC only come from the service account. So the clean-up tasks that are normally associated with deleting a HelmRelease can't be performed, or checked for performance.
Deleting a HelmRelease normally would perform helm uninstall (unless it has been suspended before deleting.)
There is an example command here that shows how to wield kubectl patch in order to remove a finalizer, without invoking kubectl edit or necessarily requiring that you drop into an editor or imperative workflow.
https://kubernetes.io/blog/2021/05/14/using-finalizers-to-cont…

Projects
Status: 🗂 Backlog