Skip to content
This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Disabled volume discovery and added env var to enable it if needed #1047

Merged
merged 2 commits into from
Mar 21, 2017

Conversation

msterin
Copy link
Contributor

@msterin msterin commented Mar 17, 2017

Fixes #1039

Volume discovery was written to address challenges with docker/plugin startup order,
and as potential auto-recovery for plugin crashes.

  • We do not see crashes often (as a matter of fact, only once so far , with VMCI memory issues - and it' fixed).
  • Docker now manages the order of plugin invocation/startup with Managed Plugins.
  • and, the last but not the least, during the discovery we make assumption about Docker API availability, but Docker makes API available only AFTER full startup including plugin initialization, so it could lead to bug like deb Installation doesn't survive VM restart #1039, where docker either hangs for 15-20 sec (like with legacy plugin) or steps on it's own bugs and crash (like with managed plugin)

In the spirit of taking care of the "sunny day" first, when everything works OK and our code does not crash, this commit turns off discovery on startup.

Managed plugin handles all restarts and remounts fine. Legacy plugin may have issues if we crash, but first of all we never crash :-) and second legacy plugins are getting depreciated anyways

Test

  • manual for managed plugin : install plugin, created volumes, restarted docker, checked volumes exist, run docker container with restart , rebooted VM - all works
  • Manual for legacy plugin (thanks to @shuklanirdesh82 ) - rm existing plugin. restart docker, install docker-volume-vsphere_0.12.ccfc38f_amd64.deb and then docker volume ls, volume create, restart docker, volume ls, volume create, revboot the VM, docker ls, docker rm

//@pdhamdhere @shuklanirdesh82 //@govint

@msterin msterin self-assigned this Mar 17, 2017
@msterin msterin added this to the 0.13 milestone Mar 17, 2017
@msterin msterin added the P0 label Mar 17, 2017
Copy link
Contributor

@shaominchen shaominchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please check the CI failure.

]
},
{"name": "VDVS_DISCOVER_VOLUMES",
"description": "non-empty if pre-existing volumes need to be discovered ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems tabs and spaces are being mix-used which caused inconsistent alignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and on the top of this github does not set proper tabstops.

Unfortunately I don't know how to set tab->spaces replacement for json editor in Code.
For everything else I did find it, but for json we'd need suffer :-) unless someone can point me out how to set it.
Add ?ts=4 to the diff URL to see it well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also update "docker-plugin-drivers.md" to include this option.

Copy link
Contributor Author

@msterin msterin Mar 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but only after we integrated plugin with ../Makefiles. Still, you are right and a note there is needed. Captured in #1051

@msterin
Copy link
Contributor Author

msterin commented Mar 18, 2017

Please check the CI failure.

For some reason CI is now down with a sever case of failing checkouts, can't do much about it

git fetch --tags origin +refs/pull/1047/merge:
fatal: Couldn't find remote ref refs/pull/1047/merge
[info] build failed (exit code 1)

@shuklanirdesh82

Copy link
Contributor

@shuklanirdesh82 shuklanirdesh82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I have restarted the failing build https://ci.vmware.run/vmware/docker-volume-vsphere/1707

@pdhamdhere
Copy link
Contributor

pdhamdhere commented Mar 18, 2017

I am little paranoid so tried few scenarios. There is a bug with refCnt discovery with managed plugin.

  • Managed plugin can't be upgraded or disabled when in-use so we are GOOD here.
# docker plugin upgrade vsphere:latest
the plugin must be disabled before upgrading

# docker plugin disable vsphere:latest
Error response from daemon: plugin vsphere:latest is in use
  • Removing managed plugin with "-f" option. After using "-f" you are really on your own!
    Existing container continues to run but any further operation would fail.

  • Managed Plugin crash: Docker restarts plugin with same container id but we fail to populate reference count due to driverName mismatch with managed plugin

docker run --name=p1 --rm -d -v tv1:/data busybox sh -c 'ping 8.8.8.8 >/data/p1.txt'
docker-runc kill <plugin container> -a 

Following check in refcnt.go going to fail;

		for _, mount := range containerJSONInfo.Mounts {
			if mount.Driver == driverName {  <<<<<<<<<<<<<<<<<
				r.Incr(mount.Name)
				log.Debugf("  name=%v (driver=%s source=%s)",
					mount.Name, mount.Driver, mount.Source)
			}
		}
docker inspect p1
...
                "Driver": "vsphere:latest", <<<<<<

And we treat this as stale mount and issue detach/unmount. Result: Container continues to run with no volume :(

@pdhamdhere
Copy link
Contributor

With refCnt DISABLED, code correctly handles plugin crash while volume is in-use by container.

2017-03-18 23:01:02.431628106 +0000 UTC [ERROR] Refcount error - still trying to unmount...name=tv1 refcount=0 
2017-03-18 23:01:02.431859178 +0000 UTC [DEBUG] volume name=tv1 refcnt=0
2017-03-18 23:01:02.450321304 +0000 UTC [DEBUG] vmdkOps.Detach name=tv1

In summary, this PR fixes serious issue in case we crash! However, we lose ability to detach stale mounts (across VM crashes?). I am fine submitting this PR and filing a separate issue to track broken discovery.

@govint
Copy link
Contributor

govint commented Mar 19, 2017

Also disable the refcnt sanity test or update it with running with this option?

And document the behavior with volumes left attached to the VM in case case the VM restarts and the option is set or not set.

Can't this new option be part of the plugin configuration file? User may find it easier to run with the configuration file vs. update the start up scripts to set env variable.

@msterin
Copy link
Contributor Author

msterin commented Mar 19, 2017

Also disable the refcnt sanity test or update it with running with this option?

Yes CI is failing due to 'refcnt on restart' rest, I'll fix before merge and make sure CI is green.

And document the behavior with volumes left attached to the VM in case case the VM restarts and the option is set or not set.

I am not sure what specifically you want in documentation, if you can provide a draft I'd have an idea.

Can't this new option be part of the plugin configuration file? User may find it easier to run with the configuration file vs. update the start up scripts to set env variable.

I think it's easy enough to pass option on plugin install command, so no conf file changes are needed.

@msterin
Copy link
Contributor Author

msterin commented Mar 20, 2017

I've opened #1050 to address @pdhamdhere point (bring the discovery back after making a repair :) ), and hid the crash recovery test. When CI passes I will merge

Mark Sterin added 2 commits March 20, 2017 14:31
Volume discovery was written to address challenges with docker/plugin startup order,
and as potential auto-recovery for plugin crashes.

* We do not see crashes often (as a matter of fact, only once so far , with VMCI memory issues  - and
it' fixed).
* Docker now manages the order of plugin invocation/startup with
Managed Plugins.
* and, the last but not the least, during the discovery we make assumption about Docker API availability,
but Docker makes API available only AFTER full startup including plugin initialization, so it could lead to
bug like #1039, where docker either hangs for 15-20 sec (like with legacy plugin) or steps on it's own
bugs and crash (like with managed plugin)

In the spirit of taking care of the "sunny day" first, when everything works OK and our code does not crash,
this commit turns off discovery on startup.

Managed plugin handles all restarts and remounts fine. Legacy plugin may have issues if we crash, but first of all
we never crash :-) and second legacy plugins are getting depreciated anyways
@msterin msterin merged commit 74c8772 into master Mar 21, 2017
@msterin msterin deleted the no-discovery.msterin branch March 22, 2017 04:58
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

deb Installation doesn't survive VM restart
6 participants