Make ESX Ops idempotent #1084

pdhamdhere · 2017-03-24T23:19:27Z

Make ESX Ops idempotent to tolerate plugin retries in case of VMCI communication errors.

Testing done: changed the plugin code to retry every operation by injecting a failure on the first attempt, then ran docker volume create/remove and docker run.
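For readers who want to see the shape of that test, here is a minimal sketch in Python. It is not the plugin's code, which is not shown in this thread. `FlakyTransport`, `make_idempotent_handler`, `call_with_retry`, and the in-memory `volumes` set are hypothetical names that stand in for the VMCI channel, the ESX service, and the datastore. The assumption is that the first attempt executes on the service side but the reply is lost, so the retry reaches a service that has already done the work.

```python
class FlakyTransport:
    """Hypothetical stand-in for the VMCI channel: the first send of each
    request runs the operation on the service side, then loses the reply."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()

    def send(self, op, name):
        result = self.handler(op, name)
        if (op, name) not in self.seen:
            self.seen.add((op, name))
            raise ConnectionError("injected failure on first attempt")
        return result


def make_idempotent_handler(volumes):
    """Build a service-side handler that returns None on success and
    treats 'already done' as success rather than as an error."""
    def handler(op, name):
        if op == "create":
            volumes.add(name)      # volume already existing is not an error
        elif op == "remove":
            volumes.discard(name)  # volume already removed is not an error
        else:
            return "unknown op: %s" % op
        return None
    return handler


def call_with_retry(transport, op, name, attempts=2):
    """Client side: retry the same request after a communication error."""
    last_error = None
    for _ in range(attempts):
        try:
            return transport.send(op, name)
        except ConnectionError as e:
            last_error = e
    raise last_error
```

With a non-idempotent handler the retried create would fail with "already exists" even though the volume was created successfully, which is the situation this PR is meant to avoid.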

Fixes #1076

shuklanirdesh82 · 2017-03-25T00:36:25Z

I guess a few tests need to be fixed as well.

shuklanirdesh82 · 2017-03-25T00:37:34Z

esx_service/vmdk_ops.py

+       # Return success since disk is anyway not attached
+       logging.warning("*** Detach disk={0} not found. VM={1}".format(
+                       vmdk_path, vm.config.uuid))
+       return None


This is where some tests failed. Please add the affected tests as part of this PR.

Fixed in latest commit.

A test is also failing because the volume is in use when remove is issued without a detach. That path is not addressable as we can't auto-detach the volume. Or can we?

Failing tests are fixed. Auto-detach is outside the scope of this change.
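The distinction discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code in vmdk_ops.py: `detach`, `remove`, and the `attachments` and `volumes` containers are hypothetical, and treating an already-missing volume as removed is an assumption made here to keep the example symmetric.

```python
import logging


def detach(attachments, vmdk_path, vm_id):
    """Idempotent detach: a disk that is not attached to this VM is treated
    as already detached, so a retried request still returns success (None)."""
    if attachments.get(vmdk_path) != vm_id:
        logging.warning("Detach disk=%s not found. VM=%s", vmdk_path, vm_id)
        return None
    del attachments[vmdk_path]
    return None


def remove(volumes, attachments, vmdk_path):
    """Remove refuses to act on an attached volume (no auto-detach).
    Assumption: a volume that is already gone counts as removed."""
    if vmdk_path in attachments:
        return "Volume %s is in use, cannot remove" % vmdk_path
    volumes.discard(vmdk_path)
    return None
```

A repeated detach is safe because the end state is the same either way. A remove on an attached volume still fails, since succeeding would require the service to detach on the caller's behalf.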

govint · 2017-03-25T03:17:06Z

esx_service/vmdk_ops.py

-        return err("File %s already exists" % vmdk_path)
+        # We are mostly here due to race or Plugin VMCI retry #1076
+        logging.warning("File %s already exists", vmdk_path)
+        return None


Let's at least check that the volume opts in the KV match what's provided. Two create requests with, say, different VM sizes should be caught, unless the two volumes would have exactly the same properties.

Good point. Actually if they are different we may want to reject the request. It's not really idempotent if requests are different :-)

I think we should go ahead and merge (after fixing the tests) and open an issue to fix create with different params. It is super rare now (actually, never).

Filed a new issue #1089. Yes, it's a rare case where Volume Create is requested from 2 separate nodes at the same time. Addressing it would require some code cleanup since some default options are set in KV (diskformat, fstype) but some are not set (e.g. size).
@govint Are you okay with addressing this in a separate issue?

No problem if it's going to take more changes anyway.
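As a rough idea of what the follow-up in #1089 might involve, here is a sketch of a duplicate-create check. Everything in it is an assumption for illustration: `conflicting_opts`, `create_if_absent`, the `existing` dict standing in for the KV store, and the default values in `ASSUMED_DEFAULTS` are hypothetical and are not taken from the service code.

```python
# Hypothetical defaults, for illustration only; the real service's
# defaults and option names may differ.
ASSUMED_DEFAULTS = {"diskformat": "thin", "fstype": "ext4"}


def conflicting_opts(stored_opts, requested_opts, defaults=ASSUMED_DEFAULTS):
    """Return the option keys whose values differ between an existing
    volume and a repeated create request."""
    requested = dict(defaults)
    requested.update(requested_opts or {})
    conflicts = []
    for key, value in requested.items():
        # Options missing from the stored metadata (e.g. size, per the
        # discussion above) cannot be compared, so they are skipped here.
        if key in stored_opts and str(stored_opts[key]) != str(value):
            conflicts.append(key)
    return sorted(conflicts)


def create_if_absent(existing, name, opts):
    """Idempotent create: a repeat with the same options succeeds (None),
    a repeat with different options returns an error string."""
    if name in existing:
        diff = conflicting_opts(existing[name], opts)
        if diff:
            return "Volume %s exists with different options: %s" % (
                name, ", ".join(diff))
        return None
    stored = dict(ASSUMED_DEFAULTS)
    stored.update(opts or {})
    existing[name] = stored
    return None
```

The skipped-key branch is where the cleanup mentioned above would matter: an option that is never written to the KV cannot be checked on a retry, so a mismatch in it would go unnoticed.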

govint · 2017-03-25T03:18:39Z

esx_service/vmdk_ops.py

@@ -1318,10 +1325,11 @@ def disk_detach(vmdk_path, vm):
    if not device:
       # Could happen if the disk attached to a different VM - attach fails
       # and docker will insist to sending "unmount/detach" which also fails.


Not related to this change, but Docker no longer sends unmount if mount fails. This service doesn't have to assume such behavior, though. Possibly remove all mentions of Docker from the service code.

govint · 2017-03-25T03:19:46Z

esx_service/vmdk_ops.py

-       vmdk_path, vm.config.uuid)
-       logging.warning(msg)
-       return err(msg)
+       # Or Plugin retrying operation due to socket errors #1076


Just as a coding convention, we can skip mentioning the PR number in code.

We actually do refer to GitHub issues and PRs in different places, so while it may be good or bad, it is certainly consistent (there are 30 cases like that in 17 files in our repo, including ./vendor). BTW, what/where is this coding convention asking to skip PR numbers?

I added the PR# to give future readers some context on why these special cases can occur. Alternatively, I can update the function header comment without the PR number.

govint · 2017-03-25T03:22:36Z

esx_service/vmdk_ops_test.py

@@ -245,9 +245,9 @@ def testPolicy(self):
                                            volume_kv.DISK_ALLOCATION_FORMAT: unit[3]})
            self.assertEqual(err == None, unit[2], err)

-            # clean up should fail if the created should have failed.
+            # clean up would succeed with #1084.


As mentioned, PR/issue numbers can be skipped in comments.

govint

A few comments added.

make esx ops idempotent

54cc3ff

vmwclabot added the cla-not-required label Mar 24, 2017

pdhamdhere changed the title ~~Pd fix 1079~~ Make ESX Ops idempotent Mar 24, 2017

Remove extra log

94921be

pdhamdhere force-pushed the pd-fix-1079 branch from dc9b846 to 94921be Compare March 24, 2017 23:35

msterin approved these changes Mar 25, 2017

shuklanirdesh82 suggested changes Mar 25, 2017

Fixed vmdk_ops_test

5c11e1c

govint reviewed Mar 25, 2017

govint suggested changes Mar 25, 2017

pdhamdhere mentioned this pull request Mar 25, 2017

Fail duplicate CreateVMDK if VolOpts are different #1089

Open

pdhamdhere merged commit fcc6381 into master Mar 25, 2017

pdhamdhere deleted the pd-fix-1079 branch March 25, 2017 22:56
