
[v1.4.0-beta.1] Unable to restore restic data with custom certificate option. #2562

Closed
leitaof opened this issue May 22, 2020 · 15 comments · Fixed by #2576

leitaof commented May 22, 2020

What steps did you take and what happened:
I have deployed the latest beta version in order to use a custom CA cert. The backup is performed properly with restic and I see the data in MinIO under mybucket/restic.

But when trying to restore, restic fails with x509: certificate signed by unknown authority.

What did you expect to happen:
The connection should work for the restic restore the same way it does for the restic backup.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version):
    Client:
    Version: v1.4.0-beta.1
    Git commit: 8bf75bd
    Server:
    Version: v1.4.0-beta.1

  • Velero features (use velero client config get features):
    features:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:08:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:00:06Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version:
    rke v1.0.6

  • Cloud provider or hardware configuration:
    hardware

  • OS (e.g. from /etc/os-release):
    kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    ltec-fil-m-01 Ready controlplane,etcd 245d v1.15.11 10.195.177.52 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9
    ltec-fil-m-02 Ready controlplane,etcd 245d v1.15.11 10.195.177.53 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9
    ltec-fil-m-03 Ready controlplane,etcd 245d v1.15.11 10.195.177.54 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9
    ltec-fil-w-01 Ready worker 245d v1.15.11 10.195.177.55 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9
    ltec-fil-w-02 Ready worker 245d v1.15.11 10.195.177.56 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9
    ltec-fil-w-03 Ready worker 207d v1.15.11 10.195.177.57 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9
    ltec-fil-w-99 Ready worker 121d v1.15.11 10.195.200.99 Ubuntu 16.04.6 LTS 4.4.0-169-generic docker://18.9.9

skriss added the Bug label May 26, 2020

skriss commented May 26, 2020

Ah. I see what's going on here. While we're correctly passing the --cacert flag to the actual restic restore command, we're not passing it to the restic stats command here: https://github.com/vmware-tanzu/velero/blob/master/pkg/restic/exec_commands.go#L188-L191.

I'll work on a fix for this.
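
For illustration only, here is roughly what this looks like at the restic CLI level; the repository URL, snapshot ID, and cert path below are placeholders, not values from this issue. The data restore itself gets the CA bundle, but the stats call does not, so that call's TLS verification falls back to the system trust store and fails against the custom-signed endpoint:

# volume data restore: --cacert is passed, so the custom CA is trusted
restic -r s3:https://minio.example.local/mybucket/restic restore <snapshot-id> --target /restores/<volume> --cacert /path/to/custom-ca.crt
# stats call: no --cacert, so this is where
# "x509: certificate signed by unknown authority" comes from
restic -r s3:https://minio.example.local/mybucket/restic stats <snapshot-id>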

skriss added this to the v1.4 milestone May 26, 2020
skriss self-assigned this May 26, 2020

skriss commented May 26, 2020

@leitaof if you're available, it'd be great to have you test out a fix for this. I should have a docker image up shortly that you can use.


leitaof commented May 26, 2020

@skriss Sure, will test it this afternoon with the new docker image.


skriss commented May 26, 2020

Awesome, thanks!

OK, the image with the fix is: steveheptio/velero:fix-2562. You can swap it with:

kubectl -n velero set image deployment/velero velero=steveheptio/velero:fix-2562
kubectl -n velero set image daemonset/restic restic=steveheptio/velero:fix-2562
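
If useful, a quick way to confirm both workloads picked up the new image (standard kubectl, nothing specific to this build):

kubectl -n velero get deployment/velero -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl -n velero get daemonset/restic -o jsonpath='{.spec.template.spec.containers[0].image}'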


leitaof commented May 26, 2020

I have tried the restore, but my pod is unable to find the fixed image because it is searching in the velero repo instead of steveheptio:

29m Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
29m Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
29m Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
16m Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image


skriss commented May 26, 2020

Ah, shoot. I retagged the image with the fix as steveheptio/velero:v1.4.0-beta.1 (despite the tag, it does include the fix). You can use that updated image, which should avoid the error you got:

kubectl -n velero set image deployment/velero velero=steveheptio/velero:v1.4.0-beta.1
kubectl -n velero set image daemonset/restic restic=steveheptio/velero:v1.4.0-beta.1

You'll have to delete the partially-restored workload and try again after updating the images.
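
For example, assuming the workload was restored into a namespace named nexus and using a placeholder backup name:

kubectl delete namespace nexus
velero restore create --from-backup <your-backup-name>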


leitaof commented May 26, 2020

Still the same error:

2m55s Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
2m55s Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
2m55s Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
4m17s Normal SandboxChanged pod/nexus-694dff6965-cbh6p Pod sandbox changed, it will be killed and re-created.
3m8s Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image "velero/velero-restic-restore-helper:fix-2562"

Events from the updated image:
10m Normal Created pod/velero-779455f468-bvwqh Created container velero
10m Normal Pulling pod/velero-779455f468-bvwqh Pulling image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Pulled pod/velero-779455f468-bvwqh Successfully pulled image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Started pod/velero-779455f468-bvwqh Started container velero
10m Normal ScalingReplicaSet deployment/velero Scaled down replica set velero-775cc8b8fd to 0
10m Normal SuccessfulDelete replicaset/velero-775cc8b8fd Deleted pod: velero-775cc8b8fd-svsjv
10m Normal Killing pod/velero-775cc8b8fd-svsjv Stopping container velero
10m Normal Killing pod/restic-g2jgq Stopping container restic
10m Normal SuccessfulDelete daemonset/restic Deleted pod: restic-g2jgq
10m Normal Pulling pod/restic-7p25r Pulling image "steveheptio/velero:v1.4.0-beta.1"
10m Normal SuccessfulCreate daemonset/restic Created pod: restic-7p25r
10m Normal Scheduled pod/restic-7p25r Successfully assigned velero/restic-7p25r to ltec-fil-w-99
10m Normal Pulled pod/restic-7p25r Successfully pulled image "steveheptio/velero:v1.4.0-beta.1"
10m Normal Created pod/restic-7p25r Created container restic


skriss commented May 26, 2020

Did you delete this pod: pod/nexus-694dff6965-cbh6p (or the entire namespace) and start a new restore?


leitaof commented May 26, 2020

Yes, I have deleted the namespace, and just to be sure I deleted it again and did another restore.

LAST SEEN TYPE REASON OBJECT MESSAGE
71s Normal CREATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
71s Normal CREATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
71s Normal CREATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
71s Normal CREATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
71s Normal Scheduled pod/nexus-694dff6965-cbh6p Successfully assigned nexus/nexus-694dff6965-cbh6p to ltec-fil-w-99
71s Normal ProvisioningSucceeded persistentvolumeclaim/nexus-data Successfully provisioned volume pvc-af93e765-518a-4535-9b8a-14bf2b557b6f
71s Normal Provisioning persistentvolumeclaim/nexus-data External provisioner is provisioning volume for claim "nexus/nexus-data"
71s Normal ExternalProvisioning persistentvolumeclaim/nexus-data waiting for a volume to be created, either by external provisioner "ltec-fil-nfs-client-provisioner" or manually created by system administrator
7s Normal BackOff pod/nexus-694dff6965-cbh6p Back-off pulling image "velero/velero-restic-restore-helper:fix-2562"
7s Warning Failed pod/nexus-694dff6965-cbh6p Error: ImagePullBackOff
29s Warning Failed pod/nexus-694dff6965-cbh6p Error: ErrImagePull
29s Warning Failed pod/nexus-694dff6965-cbh6p Failed to pull image "velero/velero-restic-restore-helper:fix-2562": rpc error: code = Unknown desc = Error response from daemon: manifest for velero/velero-restic-restore-helper:fix-2562 not found
29s Normal Pulling pod/nexus-694dff6965-cbh6p Pulling image "velero/velero-restic-restore-helper:fix-2562"
53s Normal UPDATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
53s Normal UPDATE ingress/docker-public-ingress Ingress nexus/docker-public-ingress
53s Normal UPDATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress
53s Normal UPDATE ingress/nexus-http-ingress Ingress nexus/nexus-http-ingress


skriss commented May 26, 2020

OK, here's the other way to work around this: you can override which image it tries to pull for the restic restore helper by providing a configmap that specifies the image to use:

kubectl -n velero create configmap restic-restore-action-config --from-literal=image=velero/velero-restic-restore-helper:v1.4.0-beta.1
kubectl -n velero label configmap restic-restore-action-config velero.io/plugin-config=
kubectl -n velero label configmap restic-restore-action-config velero.io/restic=RestoreItemAction
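
To double-check that the configmap is in place and labeled as expected before retrying (plain kubectl, nothing Velero-specific):

kubectl -n velero get configmap restic-restore-action-config --show-labels
kubectl -n velero get configmap restic-restore-action-config -o jsonpath='{.data.image}'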

After setting this up, you'll need to (a) delete the partially-restored workload/namespace in your cluster, and (b) try a new restore.

Thanks for your patience!


skriss commented May 26, 2020

Ah, I think I see why you were still getting the issue with pulling the fix-2562 restore helper tag - retagging the core velero image wasn't sufficient to have it change which tag it pulled for the restic restore helper; the velero binary needed to be fully recompiled with the different version tag.

nrb closed this as completed in #2576 May 26, 2020

skriss commented May 26, 2020

@leitaof we went ahead and merged the code change since it seemed straightforward and low-risk, but we'd still like to have your verification!


leitaof commented May 26, 2020

No problem, but I will test it tomorrow and give you feedback afterwards.


leitaof commented May 27, 2020

I have tested the restore with the latest v1.4.0 and it works properly.
Thanks guys for your good work!


skriss commented May 27, 2020

Awesome, thanks again for the testing and feedback!
