Skip to content
This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Make ESX Ops idempotent #1084

Merged
merged 3 commits into from
Mar 25, 2017
Merged

Make ESX Ops idempotent #1084

merged 3 commits into from
Mar 25, 2017

Conversation

pdhamdhere
Copy link
Contributor

Make ESX Ops idempotent to tolerate plugin retries in case of VMCI communication errors.

Testing Done: Changed Plugin code to retry every operation by injecting failure on 1st attempt. And then tried docker volume create/remove and docker run.

Fixes #1076

@pdhamdhere pdhamdhere changed the title Pd fix 1079 Make ESX Ops idempotent Mar 24, 2017
@shuklanirdesh82
Copy link
Contributor

I guess few tests needs to be fixed as well.

# Return success since disk is anyway not attached
logging.warning("*** Detach disk={0} not found. VM={1}".format(
vmdk_path, vm.config.uuid))
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where some tests failed. Please add affected tests as part of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is also failing that the volume is in use when remove is issued without a detach. That path is not addressable as we can;t auto detach the volume. Or can we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing tests are fixed. Auto-detach is outside scope of this change.

return err("File %s already exists" % vmdk_path)
# We are mostly here due to race or Plugin VMCI retry #1076
logging.warning("File %s already exists", vmdk_path)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets at least check that the volume opts in the KV match whats provided. Two create requests with say different VM sizes should be caught, unless the two volumes would be having the exact same properties.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Actually if they are different we may want to reject the request. It's not really idempotent if requests are different :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go ahead and merge (after fixing the tests) and open an issue to fix create with diff.params - it is super rear now (actually, never)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed a new issue #1089. Yes, it's a rare case where Volume Create is requested from 2 separate nodes at the same time. Addressing it would require some code cleanup since some default options are set in KV (diskformat, fstype) but some are not set (e.g. size).
@govint Are you okay addressing this in separate issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem if its going to take more changes anyway.

@@ -1318,10 +1325,11 @@ def disk_detach(vmdk_path, vm):
if not device:
# Could happen if the disk attached to a different VM - attach fails
# and docker will insist to sending "unmount/detach" which also fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this change, Docker doesn't send unmount if mount fails anymore. But this service doesn't haven't to assume such behavior. Possibly remove all mention docker in the service code.

vmdk_path, vm.config.uuid)
logging.warning(msg)
return err(msg)
# Or Plugin retrying operation due to socket errors #1076
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a coding convention can skip mentioning PR number in code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we actually do refer to github issues and PRs in different places, so while it may be good or bad, it is certainly consistent (there are 30 cases like that in 17 files in our repo, including ./vendor). BTW , what/where is this coding convention asking to skip PR numbers ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added PR# to give future reader some context why this special cases can occur. Alternatively, I can update function header comment without PR.

@@ -245,9 +245,9 @@ def testPolicy(self):
volume_kv.DISK_ALLOCATION_FORMAT: unit[3]})
self.assertEqual(err == None, unit[2], err)

# clean up should fail if the created should have failed.
# clean up would succeed with #1084.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned PR/issue numbers can be skipped in comments.

Copy link
Contributor

@govint govint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments added.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants