Design for data mover node selection #7383

Merged 1 commit into vmware-tanzu:main on Mar 18, 2024

Conversation

Lyndon-Li (Contributor)

Add the design for node selection for data mover backup

github-actions bot added the Area/Design (Design Documents) label on Feb 5, 2024
Lyndon-Li marked this pull request as ready for review on February 5, 2024
codecov bot commented on Feb 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.62%. Comparing base (08a020e) to head (2f9d8ae).
Report is 52 commits behind head on main.

@@            Coverage Diff             @@
##             main    #7383      +/-   ##
==========================================
- Coverage   61.75%   61.62%   -0.13%     
==========================================
  Files         262      263       +1     
  Lines       28433    28681     +248     
==========================================
+ Hits        17558    17675     +117     
- Misses       9643     9758     +115     
- Partials     1232     1248      +16     


ywk253100 previously approved these changes on Feb 27, 2024
blackpiglet previously approved these changes on Feb 27, 2024
sseago previously approved these changes on Feb 27, 2024
design/node-agent-affinity.md (review thread, outdated):
As mentioned in the [Volume Snapshot Data Movement Design][2], the exposer decides where to launch the VGDP instances. At present, for volume snapshot data movement backups, the exposer creates backupPods, and the VGDP instances are initiated on the nodes where the backupPods are scheduled. So the loadAffinity will be translated (from `metav1.LabelSelector` to `corev1.Affinity`) and set on the backupPods.

It is possible that node-agent pods, as a daemonset, don't run on every worker node; users can achieve this by specifying `nodeSelector` or `nodeAffinity` in the node-agent daemonset spec. On the other hand, at present, VGDP instances must be assigned to nodes where node-agent pods are running. Therefore, if there is any node selection for node-agent pods, users must take it into account in the load affinity configuration, so as to guarantee that VGDP instances are always assigned to nodes where node-agent pods are available. This is left to users; we don't inherit any node selection configuration from the node-agent daemonset, because the daemonset scheduler works differently from the plain pod scheduler, and simply inheriting all the configurations may cause unexpected backupPod scheduling results.
Otherwise, if a backupPod is scheduled to a node where the node-agent pod is absent, the corresponding DataUpload CR will stay in the `Accepted` phase until the prepare timeout (30 minutes by default).
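
For illustration only, here is a minimal Go sketch of the kind of translation described above, mapping a `metav1.LabelSelector` loadAffinity into a `corev1.Affinity` that can be set on the backupPod. The package and function names are illustrative, not the actual Velero code.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// toNodeAffinity converts a loadAffinity label selector into a required node
// affinity: matchLabels become "In" requirements and matchExpressions map to
// the corresponding node selector operators.
func toNodeAffinity(selector *metav1.LabelSelector) *corev1.Affinity {
	if selector == nil {
		return nil
	}

	requirements := []corev1.NodeSelectorRequirement{}
	for key, value := range selector.MatchLabels {
		requirements = append(requirements, corev1.NodeSelectorRequirement{
			Key:      key,
			Operator: corev1.NodeSelectorOpIn,
			Values:   []string{value},
		})
	}
	for _, expr := range selector.MatchExpressions {
		requirements = append(requirements, corev1.NodeSelectorRequirement{
			Key:      expr.Key,
			Operator: corev1.NodeSelectorOperator(expr.Operator),
			Values:   expr.Values,
		})
	}

	return &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{
					{MatchExpressions: requirements},
				},
			},
		},
	}
}
```

The resulting affinity would then be assigned to the backupPod's `spec.affinity` before the pod is created.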
Contributor:

Let me clarify, this is a possible situation in v1.13, right?

IMO it's possible to take the node-selector in the node-agent spec into account when scheduling the backupPod?

Lyndon-Li (Contributor, Author):

Yes, it is possible, because users can modify node-agent's yaml and add any configuration that affects scheduling, e.g., topologies, affinities/anti-affinities, node selectors, etc.

Node-agent's node selection is done by the Kubernetes scheduler, and the only way to control its behavior is to modify node-agent's yaml; on the other hand, the data mover node-selection configuration is in a configMap that is detected after node-agent starts.
Therefore, if we reflected the node-selection configuration into node-agent scheduling, we would have to dynamically edit node-agent's yaml after node-agent starts, which causes node-agent to restart one more time.
Moreover, users may be surprised by that extra restart, because they may not have realized that their node-selection configuration affects node-agent scheduling.

Besides, it is possible that the node-agent pod simply cannot run on a specific node; in that case, even if we change the node-agent spec, things still don't work.

Therefore, users must understand the relationship between node-agent scheduling and node selection in either case. So we'd better have users realize this from the beginning, and it is then easy for them to make both configurations, the node-agent spec and the node-selection configuration, correct.
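
As a rough illustration of the configMap point above, the sketch below shows how node-agent might read a node-selection configuration from a configMap after it starts. The configMap name (`node-agent-config`), the `loadAffinity` key, and the struct are assumptions for illustration, not the exact Velero implementation.

```go
package example

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// loadAffinityConfig mirrors the idea that the node-selection configuration is
// a list of label selectors stored in a configMap and picked up at runtime.
type loadAffinityConfig struct {
	LoadAffinity []*metav1.LabelSelector `json:"loadAffinity"`
}

// getLoadAffinity fetches the (assumed) node-agent-config configMap and parses
// the loadAffinity entry, if present.
func getLoadAffinity(ctx context.Context, client kubernetes.Interface, namespace string) (*loadAffinityConfig, error) {
	cm, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, "node-agent-config", metav1.GetOptions{})
	if err != nil {
		return nil, err
	}

	config := &loadAffinityConfig{}
	if data, found := cm.Data["loadAffinity"]; found {
		if err := json.Unmarshal([]byte(data), &config.LoadAffinity); err != nil {
			return nil, err
		}
	}

	return config, nil
}
```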

Contributor:

OK, let's make sure this is covered in the documentation.

design/node-agent-affinity.md (review thread):

At present, as part of the expose operations, the exposer creates a volume, represented by backupPVC, from the snapshot. The backupPVC uses the same storageClass as the source volume. If the `volumeBindingMode` in the storageClass is `Immediate`, the volume is immediately allocated from the underlying storage without waiting for the backupPod. On the other hand, the loadAffinity is set to the backupPod's affinity. If the backupPod is scheduled to a node where the snapshot volume is not accessible, e.g., because of storage topologies, the backupPod won't get into the Running state and, consequently, the data movement won't complete.
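
To make the caveat concrete, below is a hypothetical guard (an editor's sketch, not existing Velero code) that inspects the storageClass and warns when load affinity is configured together with `Immediate` volume binding; the function name and the warning text are assumptions.

```go
package example

import (
	"context"
	"log"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// warnOnImmediateBinding checks the storage class used by the backupPVC and
// warns when a load affinity is configured while the class binds volumes
// immediately, since the volume may land in a topology the selected node
// cannot reach.
func warnOnImmediateBinding(ctx context.Context, client kubernetes.Interface, storageClassName string, affinityConfigured bool) error {
	sc, err := client.StorageV1().StorageClasses().Get(ctx, storageClassName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	if affinityConfigured && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingImmediate {
		// With Immediate binding the snapshot volume is allocated before the
		// backupPod is scheduled, so a conflicting affinity can leave the
		// backupPod Pending and the DataUpload stuck until the prepare timeout.
		log.Printf("storage class %s uses Immediate volume binding; verify the data mover load affinity does not conflict with the volume's topology", storageClassName)
	}

	return nil
}
```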
Contributor:

I think we can explicitly document that when the storageClass has `volumeBindingMode` set to `Immediate`, the user SHOULD NOT set a node selector for the data mover.

Lyndon-Li (Contributor, Author):

We can document this, but probably we just need to tell users to be careful when setting node selection for Immediate volumes, as not all volumes have constraints such as topologies. In environments with no such constraints, node selection works well with Immediate volumes.

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>

There is a common solution to both problems:
- We have an existing logic that periodically enqueues the DataUpload CRs which are in the `Accepted` phase for timeout and cancel checks
- We add a new check to this existing logic to verify whether the corresponding backupPods are in an unrecoverable status (a rough sketch of this check follows below)
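
The sketch below illustrates the described check with hypothetical type and helper names (`dataUpload`, `isUnrecoverable`, `shouldCancel`); the real Velero controller logic differs in detail, and a production check would likely tolerate transient `Unschedulable` conditions rather than cancel immediately.

```go
package example

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// dataUpload is a stand-in for the DataUpload CR's fields relevant here.
type dataUpload struct {
	Phase        string
	AcceptedTime time.Time
}

// isUnrecoverable returns true when the backupPod can no longer reach Running,
// e.g. it failed outright or the scheduler reported it unschedulable.
func isUnrecoverable(pod *corev1.Pod) bool {
	if pod.Status.Phase == corev1.PodFailed {
		return true
	}
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled && cond.Status == corev1.ConditionFalse && cond.Reason == corev1.PodReasonUnschedulable {
			return true
		}
	}
	return false
}

// shouldCancel mirrors the periodic check: an Accepted DataUpload is canceled
// when the prepare timeout expires or its backupPod is unrecoverable.
func shouldCancel(du *dataUpload, backupPod *corev1.Pod, prepareTimeout time.Duration) bool {
	if du.Phase != "Accepted" {
		return false
	}
	if time.Since(du.AcceptedTime) > prepareTimeout {
		return true
	}
	return backupPod != nil && isUnrecoverable(backupPod)
}
```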
Contributor:

Could we give more information about the definition of the unrecoverable state?
It's better to let the user know under which conditions DataUpload cancellation or failure is expected.

Lyndon-Li (Contributor, Author):

This Unrecoverable status check is an existing logic; here we add a further check to it.
At present, this check is not included in the doc; we can add it and cover all the checks for the backupPod/restorePod's Unrecoverable status.

Lyndon-Li (Contributor, Author):

There are several comments requesting documentation. We will create a separate PR to add a document for node selection, and we will cover all of those requests in that PR.

Lyndon-Li merged commit 6ec1701 into vmware-tanzu:main on Mar 18, 2024
65 of 66 checks passed