Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Use namespace and name to match PVB to Pod restore #3051

Merged
merged 1 commit into from
Nov 10, 2020

Conversation

ashish-amarnath
Copy link
Member

@ashish-amarnath ashish-amarnath commented Nov 2, 2020

Signed-off-by: Ashish Amarnath ashisham@vmware.com

Fixes: #3027
Fixes: #2691
Fixes: #2956

@zubron Thanks for the debugging and repro example.

@ashish-amarnath
Copy link
Member Author

Here is the sample workload, from Bridget, that gives us a repro https://gist.github.com/zubron/dce63c10b5aea026f988b8a14d2934c5#file-app-yaml

Copy link
Contributor

@zubron zubron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this, @ashish-amarnath! Just a couple of minor suggestions in test names but is otherwise good. Also looks like some other restore tests need to be updated following this change.

pkg/restic/common_test.go Outdated Show resolved Hide resolved
pkg/restic/common_test.go Outdated Show resolved Hide resolved
@zubron
Copy link
Contributor

zubron commented Nov 2, 2020

This will also fix #2691 and #2956, I looked at those two issues again and they also have the same problem of using the same pod name across multiple namespaces.

@nrb
Copy link
Contributor

nrb commented Nov 2, 2020

Could I get a summary of what the issue was? Were PVBs being assigned to the wrong pod?

@zubron
Copy link
Contributor

zubron commented Nov 2, 2020

@nrb I wrote up my findings for this issue in these two comments: #3027 (comment) and #3027 (comment)

@ashish-amarnath
Copy link
Member Author

To summarize the bug here.
Velero uses only the pod's name to find its corresponding podvolumebackups. When there are pods with the same name but in different namespaces, the wrong podvolume backup is matched with the wrong pod.

The fix here is to use both the pod name and its namespace to find the podvolumebackups that pertain to the pod.

Looking at the CI failures now.

Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
Copy link
Contributor

@zubron zubron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nrb nrb merged commit dc6762a into vmware-tanzu:main Nov 10, 2020
georgettica pushed a commit to georgettica/velero that referenced this pull request Dec 23, 2020
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
zubron pushed a commit that referenced this pull request Jan 14, 2021
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
georgettica pushed a commit to georgettica/velero that referenced this pull request Jan 26, 2021
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
vadasambar pushed a commit to vadasambar/velero that referenced this pull request Feb 3, 2021
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
zubron added a commit to zubron/velero that referenced this pull request Feb 18, 2021
In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
zubron added a commit to zubron/velero that referenced this pull request Feb 18, 2021
In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
zubron added a commit to zubron/velero that referenced this pull request Feb 18, 2021
In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
ashish-amarnath pushed a commit that referenced this pull request Feb 22, 2021
* Use pod namespace from backup when matching PVBs

In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes #3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
zubron added a commit to zubron/velero that referenced this pull request Mar 31, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
zubron added a commit to zubron/velero that referenced this pull request Mar 31, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
carlisia pushed a commit that referenced this pull request Apr 1, 2021
* Restore CAPI cluster objects in a better order

Restoring CAPI workload clusters without this ordering caused the
capi-controller-manager code to panic, resulting in an unhealthy cluster
state.

This can be worked around
(https://community.pivotal.io/s/article/5000e00001pJyN41611954332537?language=en_US),
but we provide the inclusion of these resources as a default in order to
provide a better out-of-the-box experience.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Add changelog

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Use pod namespace from backup when matching PVBs (#3475)

* Use pod namespace from backup when matching PVBs

In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes #3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Allow Dockerfiles to be configurable (#3634)

For internal builds of Velero, we need to be able to specify an
alternative Dockerfile which uses an alternative image registry to pull
the base images from. This change adapts our Makefile such that both the
main Dockerfile and build image Dockerfile can be overridden.

We have some special handling for the build image to only build when the
Dockerfile has changed. In this case, we check whether a custom
Dockerfile has been provided, and always rebuild in that case. For
custom build image Dockerfiles, use a fixed tag rather than the one
based on commit SHA of the original file.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Combine CRD install verification into 1 job, and update k8s versions (#3448)

* Validate CRDs against latest Kubernetes versions

Add Kubernetes v1.19 and v1.20 series images, and consolidate the job
into a single file to reduce repetition.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Ignore job if the changes are only site/design

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Fix codespell error

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Cache Velero binary for reuse on workers

This will cache the Velero binary based on the PR number and a SHA256 of
the generated binary.

This way, the runners testing each version of Kubernetes do not need to
build it independently.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Fix GitHub event access

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Wrap output path in quotes

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Move code checkout to build step

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Also cache go modules

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Fix syntax issues

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Download cached binary on each node

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Use cached go modules on main CI

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Add changelog for v1.5.4

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

Co-authored-by: Nolan Brubaker <brubakern@vmware.com>
dharmab pushed a commit to dharmab/velero that referenced this pull request May 25, 2021
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
dharmab pushed a commit to dharmab/velero that referenced this pull request May 25, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jun 29, 2021
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jun 29, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
gyaozhou pushed a commit to gyaozhou/velero-read that referenced this pull request May 14, 2022
Signed-off-by: Ashish Amarnath <ashisham@vmware.com>
gyaozhou pushed a commit to gyaozhou/velero-read that referenced this pull request May 14, 2022
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants