Velero randomly restoring another namespace's PVCs on a backup #2570
Also, to assist in debugging, can I get some information on how it's decided which snapshot gets attached? Just to describe the steps of what happened:
When this environment gets created, what is weird is that instead of the Deployment being attached to the new PVC, or the StatefulSet being attached to the PVC claim, it is attached to a velero-clone PV. So instead of a brand new environment, we get a random PVC snapshot attached instead (pretty huge bug + security issue, since the PVC is in another namespace.)
Perhaps Velero is only comparing the PVC name instead of namespace + PVC name? Each environment has the same PVC name.
Also, I believe Velero should have a security e-mail of some sort; I looked everywhere and can't find a way to report a security issue to the team.
@arianitu my guess based on the info that you provided is that the PV named [...] met the new PVC's requirements. To verify this, could you provide the full YAML for:
That could be possible, but how does the PV decide it met its requirements? This was a completely fresh environment, although the YAML of the clone looks really weird, since it's targeting this new environment.
Looking at that PV, it was created on "2020-05-04T20:18:18Z", but somehow has the namespace "wp-bf083a8f-1ac7-416c-bac5-234f586c4440". How is that possible when the namespace "wp-bf083a8f-1ac7-416c-bac5-234f586c4440" was created yesterday?
By the way, both the Deployment and StatefulSet PVCs got replaced by cloned Velero instances twice, and only the StatefulSet PVC got replaced the third time. Every subsequent environment is no longer seeing this behaviour.
There's some info on the binding process here. The basic idea is that if there's a PV that exists/is Available, has the same storage class as the PVC, and is at least as big as the PVC's request, then it may be used to satisfy the PVC. Only if there are no existing PVs that match the PVC's spec is a new one dynamically provisioned. The PV itself is a cluster-scoped resource, so it doesn't have a namespace; the namespace inside the PV's YAML just represents the two-way binding with the PVC (which happened when you created the PVC). If you always want new PVs to be dynamically provisioned, then I'd ensure that your cluster contains no Available PVs before creating the new PVCs.
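For example, a rough way to spot loose Available PVs before creating new PVCs (plain kubectl, nothing Velero-specific):

kubectl get pv | grep Available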
@skriss does this mean only one clone can happen at a time, since you're not guaranteed an order when matching a PV? Seems error-prone, especially if a restore was pending/failed before. Is there any way to make it consistent so the PV only binds to the intended namespace?
When Velero clones a PV as part of a restore, it'll specify that specific PV name in the PVC spec, so the new PVC is always bound to the correct PV. So the Velero behavior should be safe as far as I know. The behavior you're seeing isn't something Velero is triggering; it just happens that this old PV that Velero created a few weeks ago (not sure why) happens to meet the requirements of this new PVC that you're creating.
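To illustrate (names here are made up, not taken from an actual restore): a restored PVC pins itself to one specific PV via spec.volumeName, along these lines:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  namespace: clone-test-b
spec:
  volumeName: velero-clone-abc123   # hypothetical cloned PV; this claim can bind to no other PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ssd

With volumeName set, the claim can only bind that PV. The problem discussed below is the reverse direction: nothing stops other PVCs from grabbing the PV first.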
@skriss Okay, I think I am understanding this a bit more, but isn't that still a big problem? If a restore fails for any reason, you will have Deployments/StatefulSets being attached to a PV that was cloned instead of a freshly made one. There isn't a good way for us to prevent PVs from existing before creating Deployments/StatefulSets: we create Deployments/StatefulSets all the time, and we're cloning environments all the time. If a clone fails, our deployment is going to fail. Is there any solution that can work around this problem? It doesn't seem very intuitive, and people will definitely run into this and be a bit shocked to notice that their Deployment is serving another random failed restore.
@skriss Can we instead manually create our PVCs with the PV name set, like you said Velero does, to prevent Deployments/StatefulSets from grabbing an arbitrary PV?
Here's a race condition as well:
1. Velero restores a cloned PV; it sits Available until the matching PVC is restored.
2. Before that PVC is created, an unrelated PVC with the same storage class and size binds to the PV.
In a busy environment, this situation is going to happen often.
Closing this, since this isn't really an issue with Velero and more an issue with how Kubernetes handles PVs/PVCs.
Hello @skriss, sorry to bring this up from the dead, but given that Velero is using ClaimRef, why did our PVCs match to the Velero PV? This issue says this should not happen: kubernetes/kubernetes#23615, and it has been fixed.

Can you maybe help me understand ClaimRef and what the idea of it is? I thought ClaimRef is a deterministic way to find a PersistentVolume from a PersistentVolumeClaim. In the above case, why did the ClaimRef match? Is it because ClaimRef only uses .name, or does it use .namespace as well? Shouldn't it have to match on Name + Namespace? Why did a separate Namespace match to the PV? https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/util/util.go#L214 It looks like ClaimRef has to match Namespace + Name, so why did our PVC attach to the PV?

Other attempts to solve this problem: the only thing we can think of is to manually create the GKE disk, PersistentVolume, and PersistentVolumeClaim using LabelSelectors. But that sucks, since we lose the volume abstraction layer of Kubernetes. This seems very silly; why can't I get a PVC to just NOT bind to a specific PV created by Velero? I've also requested more information on the above Kubernetes issue.
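A sketch of that label-based workaround, with hypothetical names (env-a, manually-created-disk; the GCE disk is created out of band):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: env-a-data
  labels:
    environment: env-a
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  gcePersistentDisk:
    fsType: ext4
    pdName: manually-created-disk   # pre-created GCE persistent disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  namespace: env-a
spec:
  selector:
    matchLabels:
      environment: env-a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ssd

Note the selector only restricts what this PVC may bind to; it does not reserve the PV for this claim, so an unrelated PVC could still grab the labeled PV. That is why the claimRef approach discussed below matters.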
Can you point me to the logic for this? Are you using ClaimRef, or LabelSelectors, or what are you using to guarantee this? Maybe this is a Velero bug? Edit: it appears Velero uses volumeName: to restore a PVC to a PV. I'm investigating if there is maybe a better way to do it and guarantee PVCs binding to a PV without running into the above condition. https://github.com/vmware-tanzu/velero/blob/master/pkg/restore/restore.go#L1051
@skriss Can we please also add ClaimRef instead of just volumeName? If we add ClaimRef, we should not run into the above race condition, since the Name + Namespace would have to match for the PVC to be able to mount the PV. At https://github.com/vmware-tanzu/velero/blob/master/pkg/restore/restore.go#L1051, we need to also append a ClaimRef with the Name + Namespace.
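Concretely, the restored PV's spec would then additionally carry something like this (names illustrative; they match the repro later in this thread):

claimRef:
  name: nginx-pvc
  namespace: clone-test-b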
@arianitu I'm not working on Velero anymore, but someone from @vmware-tanzu/velero-maintainers should be able to help answer your questions. |
@carlisia can you assist in finding the right maintainer to fix this bug? If you look at the Kubernetes issue above, apparently both volumeName and ClaimRef are required for correct binding of an existing PV to a PVC. Edit: thank you @skriss for all your work on the Velero project, good luck on your future projects!
@arianitu Yeah, it looks like you're right, we need to make sure that the ClaimRef is set correctly on the restored PV.
@arianitu Are you doing namespace remapping during your restores? That's what the line you linked to handles; I just want to be sure. I think we need to be checking this whether there's remapping happening or not. Searching through the restore/restore.go code, we do use the ClaimRef in spots, but don't necessarily use it in the actual restoration. We also don't reference it in the RestoreItemAction for PVs. I think, but don't know at the moment, that the ClaimRef shouldn't be cleared; but if the PV is coming in and getting attached to the wrong PVCs, then that would point to either the ClaimRef being cleared or something being unset.
@nrb Yes, I am doing a namespace remapping when I hit this case. Can you show me the code where ClaimRef is used?
Also @nrb, this is a pretty critical bug for us; do you know the timeline for a fix? I think PVC restores should definitely use ClaimRef, and I don't see that anywhere.
@arianitu Are you hitting it consistently when doing namespace remapping? I agree we should be setting the ClaimRef in remap cases, though I think it's more or less handled for us when not doing namespace remapping. I don't have a timeline on a fix at the moment; depending on how easily it can be reproduced, we might be able to put out a patch release, but I can't make promises there at the moment.
@nrb Yes, this is consistent, but it is a race condition: you need another PVC waiting for a PV created by Velero to hit this case. But the bug seems major to me, what do you think? Velero will only restore things consistently as long as nothing else is happening on the cluster (which on a busy cluster is quite rare). Do you want a reproducible case?
@arianitu A reproducible case would be awesome!
@arianitu Checking in to see if you were able to get us a reproducible case for this? |
@ashish-amarnath still working on this, we're a bit busy, but this issue is a high priority for us so we hope to get a case soon. |
@arianitu I am going to close this issue. Please re-open once you are able to provide a repro case. |
@ashish-amarnath @nrb Please re-open, reproduction case below.

1. What is the problem? Velero does not use ClaimRef with the cloned PV created by the restore. Please see kubernetes/kubernetes#23615 (comment)

Original Namespace:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  namespace: clone-test-a
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ssd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: clone-test-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.19.3
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-data
              mountPath: "/var/www/html"
      volumes:
        - name: nginx-data
          persistentVolumeClaim:
            claimName: nginx-pvc
2. Do a namespace clone. This leaves a PV like the following in the cluster (note there is no claimRef):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: this-should-not-match
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  gcePersistentDisk:
    fsType: ext4
    pdName: velero-should-not-match
  persistentVolumeReclaimPolicy: Delete

3. Create a PVC in a random namespace that is unrelated to the above (assume this is happening in the cluster at the same time as the restore, i.e. the above PV was restored by Velero, but its PVC has not yet been created in clone-test-b):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  namespace: unrelated-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ssd

4. The unrelated PVC binds to the cloned PV (kubectl get pv):

this-should-not-match   1Gi   RWO   Delete   Bound   unrelated-namespace/nginx-pvc   ssd   33s

Solution and test
Create the same cloned PV, but with a claimRef pointing at the intended namespace:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: this-does-not-match
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  gcePersistentDisk:
    fsType: ext4
    pdName: velero-does-not-match
  persistentVolumeReclaimPolicy: Delete
  claimRef:
    name: nginx-pvc
    namespace: clone-test-b
Then create an unrelated PVC again:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  namespace: unrelated-namespace-2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ssd
kubectl get pv shows the PV stays reserved for clone-test-b:

this-does-not-match   1Gi   RWO   Delete   Available   clone-test-b/nginx-pvc   ssd   41s

This PVC got a brand new PV, which is the correct behaviour:

NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-pvc   Bound    pvc-e15b95f6-0a4f-11eb-9a60-42010a8e0076   1Gi        RWO            ssd            25s
Also, this bug is caused here: https://github.com/vmware-tanzu/velero/blob/master/pkg/restore/restore.go#L891
In this case, this function calls executePVAction, which deletes the claimRef from the PV spec. The claimRef is deleted there but is never added back. I think maybe under https://github.com/vmware-tanzu/velero/blob/master/pkg/restore/restore.go#L864 we should add oldClaimRefName, and then under https://github.com/vmware-tanzu/velero/blob/master/pkg/restore/restore.go#L864 we should set claimRef.namespace and claimRef.name accordingly. Something along these lines.
@arianitu Thanks for the detailed description of the issue and the proposed fix. The main thing that needs to happen is the UID needs to be cleared out of the ClaimRef, but the namespace and name can be kept in most instances. In the case that a namespace is being remapped, the namespaces inside a ClaimRef will need to be updated, of course.

The UID is removed because it is what makes any object unique within the running cluster, even if all other information is the same. Upon restore, the UID of the target PVC to rebind with will not be the same if the PVC was remade, so it needs to be removed in order to ensure that the PVC and PV are linked. As you showed in your example, it's not necessary in the spec, and it is usually used to indicate successful binding, since ClaimRef was an early Kubernetes type and best practices hadn't emerged by then.

I'm working on a fix that does this now; I hope to have it available for review tomorrow. I plan to get this in for v1.5.2.
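To make that concrete, here's an illustrative before/after of a PV's claimRef across a restore that remaps clone-test-a to clone-test-b (the uid value is made up):

# As captured in the backup (bound PV):
claimRef:
  apiVersion: v1
  kind: PersistentVolumeClaim
  namespace: clone-test-a
  name: nginx-pvc
  uid: 7a9c2f1e-0a4f-11eb-9a60-42010a8e0076   # unique to the old PVC object
# As it should be restored (namespace remapped, uid cleared so the PV can bind the re-created PVC):
claimRef:
  apiVersion: v1
  kind: PersistentVolumeClaim
  namespace: clone-test-b
  name: nginx-pvc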
What steps did you take and what happened:
We schedule a backup when creating our development environment. After the environment is created, Velero somehow replaces the new PVC with a cloned PVC instead of keeping the new PVC.
This is a huge issue for us, since we do not want PVCs randomly being attached. I am unsure why it's happening; we are not issuing any restore commands. The only command I see us issuing is a backup schedule.
At first I thought maybe our ingress was just serving the wrong environment, but when I run kubectl get pvc on the new environment, I see velero-clone in my PVC names:

kubectl get pvc -n wp-bf083a8f-1ac7-416c-bac5-234f586c4440
I wanted to check the restore logs, but velero restore get is returning nothing for some reason. Is there another way to get a list of restores and their logs? Maybe I'm running into a bug?

What did you expect to happen:
I did not expect a backup schedule to trigger a PVC clone. I expected the backup schedule to just create a backup as usual.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

- kubectl logs deployment/velero -n velero: https://pastebin.com/Z1MgNT20
- velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
- velero backup logs <backupname>: https://pastebin.com/sVuiWjGk
- velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml: Cannot describe restore; velero restore get is not showing any entries.
- velero restore logs <restorename>: Cannot get restore logs; velero restore get is not showing any entries.
Anything else you would like to add:
Environment:

- Velero version (use velero version):
- Velero features (use velero client config get features):
- Kubernetes version (use kubectl version):
- Kubernetes installer & version: GKE (Google Cloud)
- Cloud provider or hardware configuration: Google Cloud
- OS (e.g. from /etc/os-release):