Add custom groups support to upgrade run resource #1191

Merged
merged 3 commits into from
Jun 6, 2024

Conversation

ksamoray
Collaborator

@ksamoray ksamoray commented Apr 18, 2024

Add the logic to create and update custom host groups while running the upgrade process.

@ksamoray ksamoray force-pushed the upgrade_groups branch 2 times, most recently from 55d6452 to 67fdfd9 Compare April 18, 2024 19:11
Optional: true,
Default: true,
},
"extended_configuration": {
Collaborator

I think we can reuse schema for key/value pairs, we already have a func getSwitchingProfileIdsSchema, we might want to rename it and generalize the description. Also advanced_configuration of edge transport node might benefit from this

Collaborator Author

Switching profile uses a different schema than edge transport node and upgrade - SwitchingProfileTypeIdEntry vs KeyValuePair.
I'll consolidate edge and upgrade and leave switching profiles as is.

return fmt.Errorf("host %s already exists in group %s", hostID, groupID)
}
group.UpgradeUnits = append(group.UpgradeUnits, model.UpgradeUnit{Id: &hostID})
_, err = client.Update(groupID, group)
Collaborator

I feel simultaneous updates might cause errors here, so we might need to wrap this code (including Get) in retry. Auto-retry that we have today will not re-read the revision, so Update would fail again if revision got updated. What do you think?

Collaborator Author

Makes a lot of sense. Do we have any example for such retry mechanism in the code?
BTW it seems like the _revision attribute is ignored by NSX - I'll open a bug for this.

Collaborator Author

I think that I've found something.
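
The retry idea discussed in this thread can be sketched as a read-modify-write loop that calls Get again on every attempt, so each Update carries the latest revision. This is a minimal sketch, not the code that was merged: `groupStore`, `memStore`, `errConflict` and `addHostWithRetry` are hypothetical names, and it assumes the backend rejects stale revisions (the thread notes that NSX may currently ignore `_revision`).

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a revision-mismatch error (hypothetical).
var errConflict = errors.New("revision conflict")

// group is a simplified stand-in for the upgrade unit group model.
type group struct {
	Revision int
	UnitIDs  []string
}

// groupStore abstracts the Get/Update calls; an assumption made for this sketch.
type groupStore interface {
	Get(id string) (group, error)
	Update(id string, g group) error
}

// addHostWithRetry re-reads the group on every attempt and retries only on conflicts.
func addHostWithRetry(s groupStore, groupID, hostID string, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		g, err := s.Get(groupID) // fresh read, so the revision is current
		if err != nil {
			return err
		}
		g.UnitIDs = append(g.UnitIDs, hostID)
		err = s.Update(groupID, g)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err // unrelated failure: do not retry
		}
	}
	return fmt.Errorf("group %s: still conflicting after %d attempts", groupID, maxAttempts)
}

// memStore is an in-memory fake that rejects updates carrying a stale revision.
type memStore struct {
	g         group
	bumpOnGet int // how many times to simulate a concurrent writer right after a Get
}

func (m *memStore) Get(id string) (group, error) {
	out := group{Revision: m.g.Revision, UnitIDs: append([]string(nil), m.g.UnitIDs...)}
	if m.bumpOnGet > 0 {
		m.bumpOnGet--
		m.g.Revision++ // another client changed the group after our read
	}
	return out, nil
}

func (m *memStore) Update(id string, g group) error {
	if g.Revision != m.g.Revision {
		return errConflict
	}
	g.Revision++
	m.g = g
	return nil
}

func main() {
	s := &memStore{bumpOnGet: 1}
	err := addHostWithRetry(s, "group-1", "host-1", 3)
	fmt.Println(err, s.g.UnitIDs)
}
```

The point of the design is that the Get sits inside the loop; retrying only the Update would resend the same stale revision each time.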

@ksamoray ksamoray force-pushed the upgrade_groups branch 2 times, most recently from af8f4a4 to b9f3123 Compare April 21, 2024 14:11
@ksamoray
Collaborator Author

/test-all

@ksamoray ksamoray changed the title Implement upgrade groups resources Add custom groups support to upgrade run resource May 9, 2024
@ksamoray ksamoray force-pushed the upgrade_groups branch 3 times, most recently from 8aeb442 to f57ef48 Compare May 9, 2024 11:56
@ksamoray
Collaborator Author

ksamoray commented May 9, 2024

/test-all

return fmt.Errorf("couldn't find upgrade unit group without id or display_name")
}
// This is a custom group, try to find it by name
groupList, err := upgradeClientSet.GroupClient.List(nil, nil, nil, nil, nil, nil, nil, nil)
Collaborator

is pagination relevant here?

Collaborator Author

Do we expect more than 1000 groups?

Collaborator

Doesn't sound feasible, but maybe a comment about it would be helpful
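
If pagination ever did become relevant here, a cursor-following loop along these lines would cover it. This is a generic sketch under assumptions: the `page` type, its `Cursor` field and the `listFunc` wrapper are hypothetical and do not mirror the actual NSX SDK list result or the eight-argument List call shown in the diff.

```go
package main

import (
	"fmt"
	"strconv"
)

// page mimics a list result with an optional cursor; field names are assumptions.
type page struct {
	Results []string
	Cursor  *string
}

// listFunc fetches one page starting at the given cursor (nil for the first page).
type listFunc func(cursor *string) (page, error)

// listAll follows cursors until the server stops returning one.
func listAll(list listFunc) ([]string, error) {
	var all []string
	var cursor *string
	for {
		p, err := list(cursor)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Results...)
		if p.Cursor == nil || *p.Cursor == "" {
			return all, nil
		}
		cursor = p.Cursor
	}
}

// fakeList serves names in pages of the given size, using the start index as cursor.
func fakeList(names []string, size int) listFunc {
	return func(cursor *string) (page, error) {
		start := 0
		if cursor != nil {
			n, err := strconv.Atoi(*cursor)
			if err != nil {
				return page{}, err
			}
			start = n
		}
		end := start + size
		if end >= len(names) {
			return page{Results: names[start:]}, nil // last page: no cursor
		}
		next := strconv.Itoa(end)
		return page{Results: names[start:end], Cursor: &next}, nil
	}
}

func main() {
	groups, err := listAll(fakeList([]string{"g1", "g2", "g3", "g4", "g5"}, 2))
	fmt.Println(groups, err)
}
```

With a single page, the loop makes exactly one call, so it costs nothing extra in the common case the thread describes.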

var err error
isCreate := false
if groupID == "" {
groupName := group["display_name"].(string)
Collaborator

can the user change display_name for existing group?

Collaborator Author

No - this isn't doable, because display_name is what identifies a pre-existing custom group and the upgrade resource doesn't maintain state.
If the upgrade implementation supported state, we could identify the group by id and update it.

} else if err != nil {
return handleDeleteError("Host Upgrade Group Binding", nsxUnit, err)
}
return addHostToGroup(m, groupID, nsxUnit, false)
Collaborator

Is the return here intentional?

Collaborator Author

It's a bug...


hostIDs := getUnitIDsFromUnits(group.UpgradeUnits)
if slices.Contains(hostIDs, hostID) {
return fmt.Errorf("host %s already exists in group %s", hostID, groupID)
Collaborator

should we swallow this error?

Collaborator Author

Indeed, thanks
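
Swallowing the "already exists" error amounts to making the add idempotent. A minimal sketch of that idea follows; `ensureHost` is a hypothetical helper operating on plain string IDs, not the function from this PR.

```go
package main

import "fmt"

// ensureHost returns the unit list with hostID present and reports whether
// an update call is needed. A host that is already a member is not an error.
func ensureHost(unitIDs []string, hostID string) ([]string, bool) {
	for _, id := range unitIDs {
		if id == hostID {
			return unitIDs, false // already a member: nothing to do
		}
	}
	// Copy before appending so the caller's slice is never modified.
	out := append(append([]string(nil), unitIDs...), hostID)
	return out, true
}

func main() {
	ids, changed := ensureHost([]string{"host-1"}, "host-1")
	fmt.Println(ids, changed)
	ids, changed = ensureHost(ids, "host-2")
	fmt.Println(ids, changed)
}
```

Returning a "changed" flag lets the caller skip the Update call entirely when the host is already in the group.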

// it should be assigned to the 'Group 1 for ESXI' group (this value is hardcoded in NSX)
groupClient := upgrade.NewUpgradeUnitGroupsClient(connector)
componentType := "HOST"
hostGroups, err := groupClient.List(&componentType, nil, nil, nil, nil, nil, nil, nil)
Collaborator

It seems wasteful to run this API for each getHostDefaultUpgradeGroup call; can the logic be changed to run it once? We only care about default groups here, and default groups are not expected to change throughout the update operation, is that correct?

Collaborator Author

You mean, load the entire host/group map and cache it for the entire execution?
As the "original" upgrade group for each host could be different according to cluster membership (or lack of cluster association).

Collaborator

Yes, only run the API once and use the list for all calls of this function
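
One way to run the list call once and reuse it for every lookup is a lazily filled host-to-group map. The sketch below assumes a hypothetical `load` function wrapping the List call and a simplified `unitGroup` type; it is not safe for concurrent use without a mutex or `sync.Once`.

```go
package main

import "fmt"

// unitGroup is a simplified stand-in for an upgrade unit group.
type unitGroup struct {
	ID      string
	UnitIDs []string
}

// groupCache loads the host-to-group mapping on first use and reuses it afterwards.
type groupCache struct {
	load   func() ([]unitGroup, error) // hypothetical wrapper around the List call
	byHost map[string]string
	calls  int // number of load invocations, kept only for illustration
}

// defaultGroupFor returns the ID of the group that contains hostID.
func (c *groupCache) defaultGroupFor(hostID string) (string, error) {
	if c.byHost == nil {
		c.calls++
		groups, err := c.load()
		if err != nil {
			return "", err
		}
		c.byHost = make(map[string]string)
		for _, g := range groups {
			for _, unit := range g.UnitIDs {
				c.byHost[unit] = g.ID
			}
		}
	}
	id, ok := c.byHost[hostID]
	if !ok {
		return "", fmt.Errorf("host %s is not in any listed group", hostID)
	}
	return id, nil
}

func main() {
	c := &groupCache{load: func() ([]unitGroup, error) {
		return []unitGroup{{ID: "cluster-a", UnitIDs: []string{"h1", "h2"}}}, nil
	}}
	g1, _ := c.defaultGroupFor("h1")
	g2, _ := c.defaultGroupFor("h2")
	fmt.Println(g1, g2, c.calls)
}
```

The cache has to be filled before any host is moved into a custom group; otherwise the map would record the custom group rather than the original default one.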

@annakhm
Collaborator

annakhm commented May 13, 2024

I didn't see any update in setUpgradeRunOutput function (sorry if I missed it), but looks like custom group needs to be updated with its new attributes

@ksamoray
Collaborator Author

> I didn't see any update in setUpgradeRunOutput function (sorry if I missed it), but looks like custom group needs to be updated with its new attributes

Yeah I missed this. Although as we do not support the state existence, that's a bit meaningless for now.

@ksamoray
Collaborator Author

> I didn't see any update in setUpgradeRunOutput function (sorry if I missed it), but looks like custom group needs to be updated with its new attributes

But this is only the status type - which doesn't have the extra group attributes. So I'm not sure that a change is required here.

@annakhm
Collaborator

annakhm commented May 17, 2024

> > I didn't see any update in setUpgradeRunOutput function (sorry if I missed it), but looks like custom group needs to be updated with its new attributes
>
> Yeah I missed this. Although as we do not support the state existence, that's a bit meaningless for now.

I think that although we ask to remove state post-upgrade, terraform plan should not produce a non-empty diff after upgrade. Is this true today?

@annakhm
Collaborator

annakhm commented May 23, 2024

LGTM, except for two doubts:

  • after upgrade is complete and there are no new changes in config, would terraform plan show non-empty diff?
  • comment below regarding deleted custom group

@@ -503,9 +523,83 @@ func updateUpgradeUnitGroups(upgradeClientSet *upgradeClientSet, d *schema.Resou
for _, groupI := range d.Get(componentToGroupKey[component]).([]interface{}) {
Collaborator

Looks like the code doesn't delete custom groups that were created with previous apply (which might have failed) but are not present in current config, is that correct?
To solve this, d.GetChange("host_group") might be useful

Add the logic to create and update custom host groups while running the
upgrade process.

Signed-off-by: Kobi Samoray <kobi.samoray@broadcom.com>
Signed-off-by: Kobi Samoray <kobi.samoray@broadcom.com>
@ksamoray
Collaborator Author

Terraform doesn't show an empty diff, also without these changes as far as I understand (there are dependencies on computed groups within the upgrade process - I think that this is the reason?).
As for group deletion: it's possible to check the groups for empty groups (with no upgrade units) which do not match the "predefined group pattern" (either have the id of a compute collection, or that hardcoded name which NSX uses for non-clustered hosts). These can be deleted I guess. Is that what you had in mind?

@annakhm
Collaborator

annakhm commented May 28, 2024

> Terraform doesn't show an empty diff, also without these changes as far as I understand (there are dependencies on computed groups within the upgrade process - I think that this is the reason?). As for group deletion: it's possible to check the groups for empty groups (with no upgrade units) which do not match the "predefined group pattern" (either have the id of a compute collection, or that hardcoded name which NSX uses for non-clustered hosts). These can be deleted I guess. Is that what you had in mind?

I would suggest relying on the GetChange routine to check whether groups are explicitly deleted, plus deleting groups that we know are recreated as a result of update.
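
The GetChange-based check suggested here comes down to a set difference between the old and new `host_group` values. The sketch below is an illustration under assumptions: `removedGroupNames` is a hypothetical helper, and its inputs are assumed to have the shape `d.GetChange("host_group")` returns for a list of blocks, that is `[]interface{}` holding `map[string]interface{}` entries with a `display_name` key.

```go
package main

import (
	"fmt"
	"sort"
)

// groupNames collects the non-empty display_name values from a raw list of blocks.
func groupNames(raw interface{}) map[string]bool {
	out := map[string]bool{}
	list, _ := raw.([]interface{}) // a nil or mistyped value yields an empty set
	for _, item := range list {
		m, ok := item.(map[string]interface{})
		if !ok {
			continue
		}
		if name, ok := m["display_name"].(string); ok && name != "" {
			out[name] = true
		}
	}
	return out
}

// removedGroupNames returns names present in the old config but absent from
// the new one, i.e. custom groups the user explicitly deleted.
func removedGroupNames(oldRaw, newRaw interface{}) []string {
	oldNames, newNames := groupNames(oldRaw), groupNames(newRaw)
	var removed []string
	for name := range oldNames {
		if !newNames[name] {
			removed = append(removed, name)
		}
	}
	sort.Strings(removed) // deterministic order for callers and tests
	return removed
}

func main() {
	oldCfg := []interface{}{
		map[string]interface{}{"display_name": "custom-a"},
		map[string]interface{}{"display_name": "custom-b"},
	}
	newCfg := []interface{}{
		map[string]interface{}{"display_name": "custom-a"},
	}
	fmt.Println(removedGroupNames(oldCfg, newCfg))
}
```

In a resource update function the two arguments would come from `oldV, newV := d.GetChange("host_group")`, and each returned name would then be looked up and deleted on NSX.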

@ksamoray ksamoray force-pushed the upgrade_groups branch 2 times, most recently from 91ff29b to f71f3d6 Compare June 4, 2024 08:44
When a host upgrade group is deleted, clean it from NSX.

Signed-off-by: Kobi Samoray <kobi.samoray@broadcom.com>
@ksamoray
Collaborator Author

ksamoray commented Jun 4, 2024

/test-all

@ksamoray ksamoray merged commit b9a7d1f into vmware:master Jun 6, 2024
8 checks passed
@ksamoray ksamoray deleted the upgrade_groups branch June 6, 2024 10:59