Velero cannot backup one StorageClass with plugins and another StorageClass with restic #2999

Closed
sagor999 opened this issue Oct 9, 2020 · 10 comments
Labels: Area/CSI (Related to Container Storage Interface support), Enhancement/User (End-User Enhancement to Velero), Restic (Relates to the restic integration), Reviewed Q2 2021, Volumes (Relating to volume backup and restore)

Comments

@sagor999

sagor999 commented Oct 9, 2020

What steps did you take and what happened:
I upgraded to Velero 1.5.1 to take advantage of --default-volumes-to-restic, so that I would not have to annotate every single volume.
Unfortunately, it doesn't work: the backup ends in the PartiallyFailed phase.
Here is the command I ran:
velero backup create lvm-test6 --default-volumes-to-restic --include-namespaces default --selector app=lvm-test
Here is the log:

time="2020-10-09T21:51:57Z" level=info msg="Executing custom action" backup=velero/lvm-test6 logSource="pkg/backup/item_backupper.go:327" name=lvm-test-pvc namespace=default resource=persistentvolumeclaims
time="2020-10-09T21:51:57Z" level=info msg="Starting PVCBackupItemAction" backup=velero/lvm-test6 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/backup/pvc_action.go:57" pluginName=velero-plugin-for-csi
time="2020-10-09T21:51:57Z" level=info msg="Fetching storage class for PV lvm-local-disk" backup=velero/lvm-test6 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/backup/pvc_action.go:101" pluginName=velero-plugin-for-csi
time="2020-10-09T21:51:57Z" level=info msg="1 errors encountered backup up item" backup=velero/lvm-test6 logSource="pkg/backup/backup.go:451" name=lvm-test-5ff8bbd8f5-r7gtc
time="2020-10-09T21:51:57Z" level=error msg="Error backing up item" backup=velero/lvm-test6 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=default, name=lvm-test-pvc): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass lvm-local-disk: failed to get volumesnapshotclass for provisioner topolvm.cybozu.com" logSource="pkg/backup/backup.go:455" name=lvm-test-5ff8bbd8f5-r7gtc
time="2020-10-09T21:51:57Z" level=info msg="Backed up 3 items out of an estimated total of 4 (estimate will change throughout the backup)" backup=velero/lvm-test6 logSource="pkg/backup/backup.go:418" name=lvm-test-5ff8bbd8f5-r7gtc namespace=default progress= resource=pods

Velero is installed using Helm:

    initContainers: 
      - name: velero-plugin-for-aws
        image: velero/velero-plugin-for-aws:v1.1.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - mountPath: /target
            name: plugins
      - name: velero-plugin-for-csi
        image: velero/velero-plugin-for-csi:v0.1.1
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - mountPath: /target
            name: plugins
...
      # Comma separated list of velero feature flags. default: empty
      features: EnableCSI
...
      # Set true for backup all pod volumes without having to apply annotation on the pod when used restic Default: false. Other option: false.
      defaultVolumesToRestic: true

We have two storage providers in our on-prem cluster: CephCSI and TopolVM.
CephCSI supports snapshotting and works fine.
TopolVM volumes do not support snapshotting and should be backed up with restic.

Annotating the volume for opt-in works, though:

      annotations:
        backup.velero.io/backup-volumes: lvm-volume
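For reference, the opt-in annotation's value is a comma-separated list of pod volume names (here, the single volume lvm-volume), set on the pod; for a Deployment, that means the pod template's metadata. Below is a minimal Go sketch of reading that list, assuming the annotations are available as a plain map. The helper optInVolumes is hypothetical, not Velero's own code, and it trims whitespace and skips empty entries, which may differ from Velero's parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// Velero's opt-in annotation for restic backups, set on the pod.
const backupVolumesAnnotation = "backup.velero.io/backup-volumes"

// optInVolumes returns the pod volume names listed in the opt-in
// annotation, or nil if the annotation is absent. Hypothetical helper:
// whitespace is trimmed and empty entries are skipped.
func optInVolumes(podAnnotations map[string]string) []string {
	raw, ok := podAnnotations[backupVolumesAnnotation]
	if !ok {
		return nil
	}
	var names []string
	for _, part := range strings.Split(raw, ",") {
		if name := strings.TrimSpace(part); name != "" {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	annotations := map[string]string{backupVolumesAnnotation: "lvm-volume"}
	fmt.Println(optInVolumes(annotations)) // prints [lvm-volume]
}
```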

Is there a way to make it so that velero automatically backs up volumes that support snapshotting via snapshot feature, and for the rest to default to restic?

What did you expect to happen:
I expected the backup to succeed, with the TopolVM volume backed up by restic.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
Client:
        Version: v1.5.1
        Git commit: -
Server:
        Version: v1.5.1
  • Velero features (use velero client config get features):
velero client config get features
features: <NOT SET>
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-19T09:16:25Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
    on prem cluster.
  • OS (e.g. from /etc/os-release):

@carlisia self-assigned this Oct 19, 2020
@carlisia added the Needs info (Waiting for information) and Needs investigation labels and removed the Needs info label Oct 19, 2020
@carlisia
Contributor

I'm double-checking whether restic is supposed to work when CSI is enabled.

@carlisia added the Restic (Relates to the restic integration) label Oct 21, 2020
@carlisia removed their assignment Oct 21, 2020
@nrb added the Area/CSI (Related to Container Storage Interface support) label Oct 23, 2020
@nrb self-assigned this Oct 23, 2020
@nrb added the Volumes (Relating to volume backup and restore) and Enhancement/User (End-User Enhancement to Velero) labels Oct 23, 2020
@nrb
Contributor

nrb commented Oct 23, 2020

Is there a way to make it so that velero automatically backs up volumes that support snapshotting via snapshot feature, and for the rest to default to restic?

Not right now, no. There's not a way to allow Velero to use restic for one storage class (TopolVM in your case) and a plugin for another.

We have had similar requests, and this is likely a feature we need to look into, as mixed environments seem to be becoming more prevalent.

I'm going to update the issue title to reflect that this is a feature request, since I can't find a pre-existing issue for it.
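The requested behavior amounts to a per-volume routing decision. The error in the original log (failed to get volumesnapshotclass for provisioner topolvm.cybozu.com) suggests one possible rule: snapshot a volume when a VolumeSnapshotClass exists for its StorageClass's provisioner, and fall back to restic otherwise. The Go sketch below is hypothetical and is not Velero code; Velero v1.5 exposes no such setting. The names chooseMethod and snapshotDrivers are illustrative, and rbd.csi.ceph.com is assumed as the CephCSI RBD driver name.

```go
package main

import "fmt"

// Method is a hypothetical per-volume backup method.
type Method string

const (
	MethodCSISnapshot Method = "csi-snapshot"
	MethodRestic      Method = "restic"
)

// chooseMethod picks a CSI snapshot when some VolumeSnapshotClass uses the
// volume's provisioner as its driver, and restic otherwise.
// snapshotDrivers is an assumed set of VolumeSnapshotClass driver names.
func chooseMethod(provisioner string, snapshotDrivers map[string]bool) Method {
	if snapshotDrivers[provisioner] {
		return MethodCSISnapshot
	}
	return MethodRestic
}

func main() {
	// Assumption: only the CephCSI RBD driver has a VolumeSnapshotClass.
	snapshotDrivers := map[string]bool{"rbd.csi.ceph.com": true}
	for _, p := range []string{"rbd.csi.ceph.com", "topolvm.cybozu.com"} {
		fmt.Printf("%s -> %s\n", p, chooseMethod(p, snapshotDrivers))
	}
}
```

Falling back to restic instead of failing the item is what would let a mixed CephCSI/TopolVM cluster be backed up without per-pod annotations.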

@nrb changed the title from "--default-volumes-to-restic fails to backup topolVM volume" to "Velero cannot backup one StorageClass with plugins and another StorageClass with restic" Oct 23, 2020
@nrb
Contributor

nrb commented Oct 23, 2020

@betta1 I think you were requesting this too at one point, if I'm not mistaken.

@nrb removed their assignment Oct 23, 2020
@ashish-amarnath
Contributor

@sagor999 It looks to me like you wanted to use restic to back up volumes that were backed by a CSI provider, using the --default-volumes-to-restic feature.
The v0.1.1 release of velero-plugin-for-csi is incompatible with Velero v1.5.x because it uses an old API to determine whether a volume is being backed up using restic. Specifically, this change.

Please update your velero-plugin-for-csi to the latest release, v0.1.2.
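To illustrate why the plugin version matters: with --default-volumes-to-restic, restic backs up a pod's volumes unless they are opted out, so a plugin that only checks the opt-in annotation would conclude that no volume uses restic and attempt a snapshot instead, which would be consistent with the original volumesnapshotclass error. The Go sketch below shows the two selection modes under simplifying assumptions. resticVolumes and nameSet are hypothetical helpers; backup.velero.io/backup-volumes-excludes is Velero's opt-out annotation; and Velero's real logic also skips some volume types, which this sketch ignores.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	optInAnnotation  = "backup.velero.io/backup-volumes"          // opt-in list
	optOutAnnotation = "backup.velero.io/backup-volumes-excludes" // opt-out list
)

// nameSet parses a comma-separated annotation value into a set of names.
func nameSet(value string) map[string]bool {
	set := make(map[string]bool)
	for _, part := range strings.Split(value, ",") {
		if name := strings.TrimSpace(part); name != "" {
			set[name] = true
		}
	}
	return set
}

// resticVolumes returns the pod volumes restic would back up (simplified):
// opt-out mode when defaultVolumesToRestic is true, opt-in mode otherwise.
func resticVolumes(volumes []string, annotations map[string]string, defaultVolumesToRestic bool) []string {
	var selected []string
	if defaultVolumesToRestic {
		excluded := nameSet(annotations[optOutAnnotation])
		for _, v := range volumes {
			if !excluded[v] {
				selected = append(selected, v)
			}
		}
		return selected
	}
	included := nameSet(annotations[optInAnnotation])
	for _, v := range volumes {
		if included[v] {
			selected = append(selected, v)
		}
	}
	return selected
}

func main() {
	volumes := []string{"lvm-volume"}
	noAnnotations := map[string]string{}
	// An opt-in-only check sees no restic volume, so a snapshot would be attempted.
	fmt.Println(resticVolumes(volumes, noAnnotations, false)) // prints []
	// With --default-volumes-to-restic, the same volume goes to restic.
	fmt.Println(resticVolumes(volumes, noAnnotations, true)) // prints [lvm-volume]
}
```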

@ashish-amarnath
Contributor

ashish-amarnath commented Oct 24, 2020

The reported issue is fixed in the v0.1.2 release of the CSI plugin.
This can be closed once this is confirmed.
Based on this, I am removing the "Needs Investigation" label and adding the "Needs Info" label.

I think the "Enhancement/User" label was added for

Is there a way to make it so that velero automatically backs up volumes that support snapshotting via snapshot feature, and for the rest to default to restic?

So I'm preserving that label.

@ashish-amarnath added the Needs info (Waiting for information) and Enhancement/User (End-User Enhancement to Velero) labels and removed the Enhancement/User and Needs investigation labels Oct 24, 2020
@sagor999
Author

sagor999 commented Nov 3, 2020

@ashish-amarnath I just tried this again. For some reason, it now doesn't use restic at all.
It seems to skip backing up the volume's data entirely.

time="2020-11-03T01:31:56Z" level=info msg="Backing up item" backup=velero/sentry-kafka0 logSource="pkg/backup/item_backupper.go:121" name=data-sentry-kafka-0 namespace=ctla resource=persistentvolumeclaims
time="2020-11-03T01:31:56Z" level=info msg="Executing custom action" backup=velero/sentry-kafka0 logSource="pkg/backup/item_backupper.go:327" name=data-sentry-kafka-0 namespace=ctla resource=persistentvolumeclaims
time="2020-11-03T01:31:56Z" level=info msg="Executing PVCAction" backup=velero/sentry-kafka0 cmd=/velero logSource="pkg/backup/backup_pv_action.go:49" pluginName=velero
time="2020-11-03T01:31:56Z" level=info msg="Backing up item" backup=velero/sentry-kafka0 logSource="pkg/backup/item_backupper.go:121" name=pvc-1e4bfb1c-ec3e-4c88-a09d-dc2867276d58 namespace= resource=persistentvolumes
time="2020-11-03T01:31:56Z" level=info msg="Executing takePVSnapshot" backup=velero/sentry-kafka0 logSource="pkg/backup/item_backupper.go:405" name=pvc-1e4bfb1c-ec3e-4c88-a09d-dc2867276d58 namespace= resource=persistentvolumes
time="2020-11-03T01:31:56Z" level=info msg="Backup has volume snapshots disabled; skipping volume snapshot action." backup=velero/sentry-kafka0 logSource="pkg/backup/item_backupper.go:408" name=pvc-1e4bfb1c-ec3e-4c88-a09d-dc2867276d58 namespace= resource=persistentvolumes
time="2020-11-03T01:31:56Z" level=info msg="Executing custom action" backup=velero/sentry-kafka0 logSource="pkg/backup/item_backupper.go:327" name=data-sentry-kafka-0 namespace=ctla resource=persistentvolumeclaims
time="2020-11-03T01:31:56Z" level=info msg="Starting PVCBackupItemAction" backup=velero/sentry-kafka0 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/backup/pvc_action.go:58" pluginName=velero-plugin-for-csi
time="2020-11-03T01:31:56Z" level=info msg="Volume snapshotting not requested for backup velero/sentry-kafka0" backup=velero/sentry-kafka0 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/backup/pvc_action.go:62" pluginName=velero-plugin-for-csi

I looked at the backup storage and couldn't find any restic data there.
I tried restoring from that backup: it restores the PV and PVC, but does not add the restic restore init container, so no actual data is backed up or restored.

I also tried adding the annotation
backup.velero.io/backup-volumes: kafka0-volume, but that did not change anything.

@ashish-amarnath
Contributor

@sagor999 Can you please share a sample workload that you are trying to back up and restore?
If you have already done that, please point me to it. I will try this out to see what might be happening.

@sagor999
Author

sagor999 commented Nov 3, 2020

@ashish-amarnath Sure, here it is:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: velero-test-backup-pod
  namespace: ctla
spec:
  replicas: 1
  selector:
    matchLabels:
      app: velero-test-backup-pod
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: kafka0-volume
      labels:
        app: velero-test-backup-pod
    spec:
      volumes:
        - name: kafka0-volume
          persistentVolumeClaim:
            claimName: data-sentry-kafka-0
      containers:
        - name: app
          image: ubuntu:focal
          args:
            - /bin/sh
            - -c
            - touch /tmp/healthy; sleep 50000000;
          volumeMounts:
            - mountPath: "/backup"
              name: kafka0-volume

The referenced PVC is backed by TopolVM (https://github.com/topolvm/topolvm).

Velero itself is installed via the Helm chart, using two plugins:
velero/velero-plugin-for-gcp:v1.1.0
velero/velero-plugin-for-csi:v0.1.2
features: EnableCSI

Thank you!

@sagor999
Author

sagor999 commented Nov 4, 2020

@ashish-amarnath I think I found the cause. In the Helm chart I had this set:
configuration.defaultVolumesToRestic: true
Setting it to false fixed restic backup/restore, and the restore now properly adds the restic-restore init container.
So I think something is wrong with that option.

@carlisia added the Info Received label and removed the Needs info (Waiting for information) label Nov 11, 2020
@eleanor-millman
Contributor

Closing because this seems to be resolved from Velero's point of view. It looks like the remaining issue is with the Helm chart. If so, please open an issue against the Helm chart.
