
The isolator should checkpoint mounts BEFORE the actual mount operation. #93

jieyu opened this issue Apr 14, 2016 · 8 comments

@jieyu

jieyu commented Apr 14, 2016

Currently, the isolator checkpoints the mount AFTER doing the actual mount. The consequence is that if the slave crashes after doing the mount but before the checkpointing, it loses track of that mount during recovery. One obvious issue is mount leaking. There might be other issues.
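
A minimal sketch of the ordering being described, with the window in which a crash loses track of the mount (the helpers `osMount` and `checkpointMount` are hypothetical stand-ins, not the isolator's actual code):

```go
package sketch

// Hypothetical helpers standing in for the real operations.
func osMount(provider, volume string) (string, error)           { return "/mnt/" + provider + "/" + volume, nil }
func checkpointMount(provider, volume, mountpoint string) error { return nil }

// mountThenCheckpoint reflects the current ordering and the crash window it
// leaves open.
func mountThenCheckpoint(provider, volume string) error {
	mountpoint, err := osMount(provider, volume) // 1. actual OS-level mount
	if err != nil {
		return err
	}
	// If the slave crashes at this point, the mount exists on the host but no
	// checkpoint records it, so recovery cannot clean it up (mount leak).
	return checkpointMount(provider, volume, mountpoint) // 2. checkpoint
}
```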

@cantbewong
Contributor

Your statement that checkpointing occurs after the mount is correct.

Order is:

  1. mount at OS level
  2. queue "Future" bind isolation mount for container
  3. record checkpoint

Note that the checkpoint record includes the mount location, which is NOT deterministic and can only be recorded after the mount.

The mountpoint is there to handle cases where multiple containers on the agent request the identical volume. When this happens, the isolator needs to look up the OS mountpoint for the bind mount.

The design accepted the risk of a mount leak on the premise that a pre-emptive mount would remedy it under the rare circumstance of a slave service crash.

If we go with a checkpoint-first model, the checkpoint would need to be a two-phase approach: first record only provider + volume name as a "tentative mount", then record the full details. The "tentative mount" info would be enough to do an unmount during recover().
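
A rough sketch of what that two-phase, checkpoint-first approach could look like, assuming a simple JSON file per mount; all names (`MountRecord`, `checkpoint`, `mountWithCheckpoint`) are illustrative, not the isolator's actual API:

```go
package sketch

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// MountRecord is the checkpointed state for one external volume mount.
// An empty Mountpoint marks the record as tentative.
type MountRecord struct {
	Provider   string `json:"provider"`
	Volume     string `json:"volume"`
	Mountpoint string `json:"mountpoint,omitempty"`
}

func checkpoint(dir string, rec MountRecord) error {
	data, err := json.Marshal(rec)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, rec.Provider+"-"+rec.Volume+".json"), data, 0644)
}

// mountWithCheckpoint records a tentative checkpoint before the OS-level
// mount, then upgrades it to a full record once the mountpoint is known.
func mountWithCheckpoint(dir, provider, volume string, doMount func() (string, error)) error {
	// Phase 1: tentative record (provider + volume only). If the agent crashes
	// after the mount but before phase 2, recover() still knows a mount may
	// exist and can unmount it.
	if err := checkpoint(dir, MountRecord{Provider: provider, Volume: volume}); err != nil {
		return err
	}

	// OS-level mount; the mountpoint is not deterministic, so it can only be
	// recorded afterwards.
	mountpoint, err := doMount()
	if err != nil {
		return err
	}

	// Phase 2: full record, now including the actual mountpoint.
	return checkpoint(dir, MountRecord{Provider: provider, Volume: volume, Mountpoint: mountpoint})
}
```

During recover(), a record with an empty mountpoint would be treated as a tentative mount: provider + volume is enough to issue an unmount and then discard the record.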

@jieyu
Author

jieyu commented Apr 14, 2016

Do you need to checkpoint the mount point location, or do you just need provider + volume?

In other words, if your recovery just needs provider + volume, why are you checkpointing the mount point (and other information)?

@cantbewong
Contributor

The bind mount needs the mount point location. Provider + volume is a usually unique identifier, but it is not sufficient to make a bind call by itself. It might be possible to query the provider, but this takes time and may get no response, adding the complexity that you would want to do it asynchronously. And some providers, such as Amazon, allow duplicate volume names, meaning that the query is only usually reliable as opposed to 100% reliable.

Other information (mount options) was checkpointed for troubleshooting and for the possibility of generating a warning should two containers request a mount of the same volume with different options. This warning generation is not implemented at this time.

@cantbewong
Contributor

So to make it clear: when a second container arrives requesting a mount of a volume that is already mounted and in use by another container, we need a way to determine the mount point location returned by the mount performed for the first container.
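
For illustration, the lookup for that second container could be as simple as reading back the checkpointed record keyed by provider + volume (record layout as in the sketch above; none of this is the isolator's real code):

```go
package sketch

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// MountRecord matches the checkpoint layout sketched earlier.
type MountRecord struct {
	Provider   string `json:"provider"`
	Volume     string `json:"volume"`
	Mountpoint string `json:"mountpoint,omitempty"`
}

// lookupMountpoint returns the OS mountpoint previously checkpointed for
// provider + volume, so the isolator can bind-mount that path into the second
// container's sandbox instead of mounting the volume again.
func lookupMountpoint(dir, provider, volume string) (string, error) {
	data, err := os.ReadFile(filepath.Join(dir, provider+"-"+volume+".json"))
	if err != nil {
		return "", err
	}
	var rec MountRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return "", err
	}
	if rec.Mountpoint == "" {
		return "", fmt.Errorf("mount of %s/%s is still tentative", provider, volume)
	}
	return rec.Mountpoint, nil
}
```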

@jieyu
Author

jieyu commented Apr 15, 2016

Just to clarify, the duplicated volume names sound weird to me. What are the semantics? Say you mount the same volume name (same driver) twice; will there be two mounts in the host mount table? What happens if I do a subsequent unmount? Will that unmount both, or just one of them? If just one of them, which one will be unmounted?

Now I see that you need to checkpoint the mount points. Thanks for the explanation.

@jieyu
Author

jieyu commented Apr 15, 2016

I looked at the docker volume specification:
https://docs.docker.com/engine/extend/plugins_volume/

/VolumeDriver.Mount
"Docker requires the plugin to provide a volume, given a user specified volume name. This is called once per container start. If the same volume_name is requested more than once, the plugin may need to keep track of each new mount request and provision at the first mount request and deprovision at the last corresponding unmount request."

It looks to me that the ref counting should be handled by the plugin (e.g., rexray), rather than the container runtime. The container runtime will simply call 'Mount' during container startup and 'Unmount' during container teardown. Given that, I am wondering why we are doing mount tracking in the isolator?
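
For reference, a minimal sketch of the plugin-side ref counting the spec describes, i.e. provision on the first Mount of a name and deprovision on the last matching Unmount (this is not rexray's actual implementation; `attachAndMount` and `unmountAndDetach` are hypothetical stand-ins):

```go
package sketch

import "sync"

// Hypothetical helpers standing in for the real attach/mount work.
func attachAndMount(name string) (string, error) { return "/var/lib/plugin/mnt/" + name, nil }
func unmountAndDetach(name string) error         { return nil }

// refCountingPlugin tracks outstanding Mount calls per volume name.
type refCountingPlugin struct {
	mu     sync.Mutex
	counts map[string]int    // volume name -> outstanding Mount calls
	paths  map[string]string // volume name -> host mountpoint
}

func newRefCountingPlugin() *refCountingPlugin {
	return &refCountingPlugin{counts: map[string]int{}, paths: map[string]string{}}
}

func (p *refCountingPlugin) Mount(name string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.counts[name] == 0 {
		path, err := attachAndMount(name) // first request: provision
		if err != nil {
			return "", err
		}
		p.paths[name] = path
	}
	p.counts[name]++
	return p.paths[name], nil
}

func (p *refCountingPlugin) Unmount(name string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.counts[name] > 0 {
		p.counts[name]--
	}
	if p.counts[name] == 0 {
		delete(p.counts, name)
		delete(p.paths, name)
		return unmountAndDetach(name) // last request: deprovision
	}
	return nil
}
```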

@clintkitson
Contributor

The Docker engine does not keep track of state or mounted volume/container counts today. But this isn't to say this is ideal.

For example, what if the Docker engine crashes, and the containers along with it? The plugin would have zero knowledge of this occurring other than in the form of more requests and a higher tracking count. This will lead to the volume never getting unmounted.

The container runtime, from our view, should be the sole authority on when to mount and unmount the volumes from the hosts. It should have all of the necessary knowledge to determine when that operation is proper, because it knows about the state of all containers.

So we didn't necessarily do anything different with the spec. We implemented the volume package but changed how the Mesos runtime did accounting to be more sophisticated and tracked directly with containers.
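
A rough sketch of that runtime-side accounting, assuming the runtime tracks which containers reference each volume and calls the plugin's Unmount only when the last one goes away (all names are illustrative):

```go
package sketch

// volumeKey identifies an external volume by provider and name.
type volumeKey struct{ provider, name string }

// runtimeAccounting maps each mounted volume to the set of containers using
// it, so the runtime (not the plugin) decides when an Unmount is proper and
// can reconcile after a crash because it knows which containers survived.
type runtimeAccounting struct {
	users map[volumeKey]map[string]bool
}

func newRuntimeAccounting() *runtimeAccounting {
	return &runtimeAccounting{users: map[volumeKey]map[string]bool{}}
}

func (r *runtimeAccounting) containerLaunched(containerID string, v volumeKey) {
	if r.users[v] == nil {
		r.users[v] = map[string]bool{}
	}
	r.users[v][containerID] = true
}

// containerDestroyed reports whether the volume should now be unmounted,
// i.e. no remaining container references it.
func (r *runtimeAccounting) containerDestroyed(containerID string, v volumeKey) bool {
	delete(r.users[v], containerID)
	if len(r.users[v]) == 0 {
		delete(r.users, v)
		return true // last user gone: the runtime calls the plugin's Unmount
	}
	return false
}
```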

@jieyu
Author

jieyu commented Apr 15, 2016

@clintonskitson that makes sense. Thanks for the explanation.
