Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pod namespace from backup when matching PVBs #3475

Conversation

zubron
Copy link
Contributor

@zubron zubron commented Feb 18, 2021

Please add a summary of your change

In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Signed-off-by: Bridget McErlean bmcerlean@vmware.com

Does your change fix a particular issue?

Fixes #3467

Please indicate you've done the following:

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
@zubron zubron force-pushed the use-original-namespace-for-pvb-matching-3467 branch from 77b9ea1 to 77c2bd7 Compare February 18, 2021 16:18
@zubron zubron marked this pull request as ready for review February 18, 2021 16:20
@zubron zubron added P1 - Important kind/release-blocker Must fix issues for the coming release (milestone) labels Feb 18, 2021
@zubron zubron added this to In progress in v1.6.0 via automation Feb 18, 2021
@zubron zubron added this to the v1.6.0 milestone Feb 18, 2021
Copy link
Member

@ashish-amarnath ashish-amarnath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines 79 to 83
var sourcePod corev1.Pod
if err := runtime.DefaultUnstructuredConverter.FromUnstructured(input.ItemFromBackup.UnstructuredContent(), &sourcePod); err != nil {
return nil, errors.Wrap(err, "unable to convert source pod from runtime.Unstructured")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for reference, we could have also reverse looked up the RestoreItemActionExecuteInput.Restore.Spec.NamespaceMapping. But this is also fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about doing that but there was actually an issue with that approach. At the point when this action's Execute function is called in restoreItem, the namespace mapping hasn't yet been applied to the object being restored. That doesn't happen until later in the function. This means that, under the current implementation, both pod and sourcePod have the same namespace when this is called. Rather than assuming they would be the same, I opted to look at the item from the backup. It seemed too risky for this change to change when the namespace mapping would be applied in restoreItem, but checking the pod from the backup also prevents this issue occurring again if when we do change when the mapping is applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add a comment though to explain why we can't rely on the namespace mapping from the restore spec. Thanks for reminding me about why I couldn't use that approach! 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zubron Not sure I follow. The NamespaceMapping that I was referring to is a field on the restore object spec. This field is set using the value of the namespace-mappings flag that is passed at the CLI running velero restore create command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying @zubron.
Just capturing our discussion here. Changes at line79-82 weren't necessary and we could have used pod.Namespace at https://github.com/vmware-tanzu/velero/pull/3475/files#diff-9f110ab28d31cd60e15e9ec43ad813f22bd8b9329eaf0ebe70c0b6590592b76dR100
and added unit tests to catch any future code reordering that may break. The changes at line79-82 is future proofing.

v1.6.0 automation moved this from In progress to Review in progress Feb 20, 2021
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
@zubron zubron force-pushed the use-original-namespace-for-pvb-matching-3467 branch from dbd3d54 to 0d56725 Compare February 20, 2021 16:57
@nrb
Copy link
Contributor

nrb commented Feb 22, 2021

Before reviewing, I think this has the potential for backporting so that v1.5.x users can benefit from the fix.

Copy link
Contributor

@nrb nrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the added test and the comment about why we can't do reverse lookup!

v1.6.0 automation moved this from Review in progress to Reviewer approved Feb 22, 2021
@ashish-amarnath ashish-amarnath merged commit 0246a32 into vmware-tanzu:main Feb 22, 2021
v1.6.0 automation moved this from Reviewer approved to Done Feb 22, 2021
zubron added a commit to zubron/velero that referenced this pull request Mar 31, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
zubron added a commit to zubron/velero that referenced this pull request Mar 31, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
carlisia pushed a commit that referenced this pull request Apr 1, 2021
* Restore CAPI cluster objects in a better order

Restoring CAPI workload clusters without this ordering caused the
capi-controller-manager code to panic, resulting in an unhealthy cluster
state.

This can be worked around
(https://community.pivotal.io/s/article/5000e00001pJyN41611954332537?language=en_US),
but we provide the inclusion of these resources as a default in order to
provide a better out-of-the-box experience.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Add changelog

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Use pod namespace from backup when matching PVBs (#3475)

* Use pod namespace from backup when matching PVBs

In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes #3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Allow Dockerfiles to be configurable (#3634)

For internal builds of Velero, we need to be able to specify an
alternative Dockerfile which uses an alternative image registry to pull
the base images from. This change adapts our Makefile such that both the
main Dockerfile and build image Dockerfile can be overridden.

We have some special handling for the build image to only build when the
Dockerfile has changed. In this case, we check whether a custom
Dockerfile has been provided, and always rebuild in that case. For
custom build image Dockerfiles, use a fixed tag rather than the one
based on commit SHA of the original file.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Combine CRD install verification into 1 job, and update k8s versions (#3448)

* Validate CRDs against latest Kubernetes versions

Add Kubernetes v1.19 and v1.20 series images, and consolidate the job
into a single file to reduce repetition.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Ignore job if the changes are only site/design

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Fix codespell error

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Cache Velero binary for reuse on workers

This will cache the Velero binary based on the PR number and a SHA256 of
the generated binary.

This way, the runners testing each version of Kubernetes do not need to
build it independently.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Fix GitHub event access

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Wrap output path in quotes

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Move code checkout to build step

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Also cache go modules

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Fix syntax issues

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Download cached binary on each node

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Use cached go modules on main CI

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

* Add changelog for v1.5.4

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

Co-authored-by: Nolan Brubaker <brubakern@vmware.com>
dharmab pushed a commit to dharmab/velero that referenced this pull request May 25, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jun 29, 2021
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
gyaozhou pushed a commit to gyaozhou/velero-read that referenced this pull request May 14, 2022
* Use pod namespace from backup when matching PVBs

In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.

To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.

Fixes vmware-tanzu#3467.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Explain why the namespace mapping can't be used

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has-changelog has-unit-tests kind/release-blocker Must fix issues for the coming release (milestone)
Projects
No open projects
v1.6.0
  
Done
Development

Successfully merging this pull request may close these issues.

Restic restores failing when using namespace mapping
3 participants