Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use appropriate CRD API during readiness check #4015

Merged

Conversation

zubron
Copy link
Contributor

@zubron zubron commented Aug 5, 2021

Please add a summary of your change

The readiness check for the Velero CRDs was still using the v1beta1 API.
This would cause the readiness check to fail on 1.22 clusters as the
v1beta1 API is no longer available. Previously, this error would be
ignored and the installation would proceed, however with #4002, we are
no longer ignoring errors from this check.

This change modifies the CRD readiness check to check the CRDs using the
same API version that was used when submitting the CRDs to the cluster.
It also introduces a new CRD builder using the V1 API for testing.

This change also fixes a bug that was identified in the polling code
where if the CRDs were not ready on the first polling iteration, they
would be added again to the list of CRDs to check resulting in
duplicates. This would cause the length check to fail on all subsequent
polls and the timeout would always be reached.

Signed-off-by: Bridget McErlean bmcerlean@vmware.com

Please indicate you've done the following:

@zubron zubron force-pushed the use-appropriate-crd-version-for-install-wait branch 2 times, most recently from 753d1ba to 5f0cd04 Compare August 5, 2021 14:39
The readiness check for the Velero CRDs was still using the v1beta1 API.
This would cause the readiness check to fail on 1.22 clusters as the
v1beta1 API is no longer available. Previously, this error would be
ignored and the installation would proceed, however with vmware-tanzu#4002, we are
no longer ignoring errors from this check.

This change modifies the CRD readiness check to check the CRDs using the
same API version that was used when submitting the CRDs to the cluster.
It also introduces a new CRD builder using the V1 API for testing.

This change also fixes a bug that was identified in the polling code
where if the CRDs were not ready on the first polling iteration, they
would be added again to the list of CRDs to check resulting in
duplicates. This would cause the length check to fail on all subsequent
polls and the timeout would always be reached.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
sseago
sseago previously approved these changes Aug 5, 2021
Copy link
Collaborator

@sseago sseago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor comment on something in code comments before a func. Not a merge blocker, but the update would make things a bit clearer.

@@ -161,8 +177,8 @@ func IsCRDReady(crd *apiextv1beta1.CustomResourceDefinition) bool {
}

// IsUnstructuredCRDReady checks an unstructured CRD to see if it's ready, with both the Established and NamesAccepted conditions.
// TODO: Delete this function and use IsCRDReady when the upstream runtime.FromUnstructured function properly handles int64 field conversions.
// Duplicated function because the velero install package uses IsCRDReady with the beta types.
// TODO: Delete this function and use Isv1beta1CRDReady when the upstream runtime.FromUnstructured function properly handles int64 field conversions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restore code which uses IsUnstructuredCRDReady doesn't really need to deal with v1/v1beta1 issues right now. I'm guessing that by the time this code is refectored (if it is ever refactored) we'll no longer be supporting v1beta1, so the above (and below) lines should probably reference IsV1CRDReady instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that was a find/replace gone wrong :) I'll fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked yesterday, and the upstream fix is available in the v0.20.0 version of the API libraries. Once we upgrade those, a unit test which checks for the upstream error will fail which might encourage the refactoring at that stage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. OK. If we choose to do that soon we'll need to deal with v1 vs v1beta1 issues in the refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the comment, but I think there is quite a bit of history surrounding this issue which I'm not fully aware of. Let me know what you think and I can change it again if you think it could be clearer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine. The issue was that the typed checks were failing with CRDs which included int64 fields. Velero CRDs don't use those, so the installer wasn't hitting the problem, but restoring backups which included CRDs with these fields was failing. The workaround was to use the unstructured fields. I don't know whether it really matters enough at this point to go back and refactor the restore CRD ready checks to use the v1/v1beta1 functions, but if we do we just need to make sure that the int64 issue is gone by testing backup/restore with a CRD which uses this sort of field.

jenting
jenting previously approved these changes Aug 5, 2021
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
@zubron zubron dismissed stale reviews from jenting and sseago via 822c317 August 5, 2021 15:17
@zubron zubron requested review from sseago and jenting August 5, 2021 15:20
@zubron zubron merged commit d98c65f into vmware-tanzu:main Aug 5, 2021
@zubron zubron added this to the v1.6.3 milestone Aug 5, 2021
zubron added a commit to zubron/velero that referenced this pull request Aug 6, 2021
* Use appropriate CRD API during readiness check

The readiness check for the Velero CRDs was still using the v1beta1 API.
This would cause the readiness check to fail on 1.22 clusters as the
v1beta1 API is no longer available. Previously, this error would be
ignored and the installation would proceed, however with vmware-tanzu#4002, we are
no longer ignoring errors from this check.

This change modifies the CRD readiness check to check the CRDs using the
same API version that was used when submitting the CRDs to the cluster.
It also introduces a new CRD builder using the V1 API for testing.

This change also fixes a bug that was identified in the polling code
where if the CRDs were not ready on the first polling iteration, they
would be added again to the list of CRDs to check resulting in
duplicates. This would cause the length check to fail on all subsequent
polls and the timeout would always be reached.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Remove duplicate V1 CRD builder and update comment

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
jenting pushed a commit that referenced this pull request Aug 6, 2021
* Use appropriate CRD API during readiness check (#4015)

* Use appropriate CRD API during readiness check

The readiness check for the Velero CRDs was still using the v1beta1 API.
This would cause the readiness check to fail on 1.22 clusters as the
v1beta1 API is no longer available. Previously, this error would be
ignored and the installation would proceed, however with #4002, we are
no longer ignoring errors from this check.

This change modifies the CRD readiness check to check the CRDs using the
same API version that was used when submitting the CRDs to the cluster.
It also introduces a new CRD builder using the V1 API for testing.

This change also fixes a bug that was identified in the polling code
where if the CRDs were not ready on the first polling iteration, they
would be added again to the list of CRDs to check resulting in
duplicates. This would cause the length check to fail on all subsequent
polls and the timeout would always be reached.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Remove duplicate V1 CRD builder and update comment

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Merge pull request #4012 from jenting/add-k8s-1.22-ci-test

Add Kubernetes v1.22 CI test

* Update changelog for v1.6.3

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

Co-authored-by: Scott Seago <sseago@redhat.com>
gyaozhou pushed a commit to gyaozhou/velero-read that referenced this pull request May 14, 2022
* Use appropriate CRD API during readiness check

The readiness check for the Velero CRDs was still using the v1beta1 API.
This would cause the readiness check to fail on 1.22 clusters as the
v1beta1 API is no longer available. Previously, this error would be
ignored and the installation would proceed, however with vmware-tanzu#4002, we are
no longer ignoring errors from this check.

This change modifies the CRD readiness check to check the CRDs using the
same API version that was used when submitting the CRDs to the cluster.
It also introduces a new CRD builder using the V1 API for testing.

This change also fixes a bug that was identified in the polling code
where if the CRDs were not ready on the first polling iteration, they
would be added again to the list of CRDs to check resulting in
duplicates. This would cause the length check to fail on all subsequent
polls and the timeout would always be reached.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>

* Remove duplicate V1 CRD builder and update comment

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants