isolator should invoke potentially blocking operations async from module API handlers #92

Open
jdef opened this issue Apr 12, 2016 · 4 comments

@jdef

jdef commented Apr 12, 2016

Related to #88: if calls to os::shell to execute dvdcli hang or block for a significant amount of time, the task launch pipeline breaks down and tasks become stuck in STAGING. Part of the reason is that the isolator module invokes potentially blocking operations synchronously from within the Mesos module API handlers.

A better approach would be to invoke such commands asynchronously, for example by using Subprocess. The HDFS code in Mesos provides an example of this approach: https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/hdfs/hdfs.cpp#L53
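To illustrate the difference, here is a minimal sketch in Python rather than the module's C++. It is not the Mesos Subprocess API. The names `run_command`, `SLOW` and `demo` are hypothetical, and a short Python one-liner stands in for a slow dvdcli call. The point is only that a handler which awaits the command, instead of blocking on it, leaves other handlers free to run.

```python
import asyncio
import sys


async def run_command(argv, timeout):
    """Run argv without blocking the event loop; return (exit_code, stdout)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, _err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # A hung command is killed instead of stalling every later handler.
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, out.decode().strip()


# Assumption: a stand-in for a slow external command such as a dvdcli mount.
SLOW = [sys.executable, "-c", "import time; time.sleep(0.5); print('mounted')"]


async def demo():
    events = []

    async def slow_handler():
        code, out = await run_command(SLOW, timeout=30)
        events.append(out)
        return code

    async def fast_handler():
        # Represents a launch that does not use an external volume.
        events.append("unrelated launch handled")

    slow = asyncio.create_task(slow_handler())
    await asyncio.sleep(0)  # let the slow handler start its subprocess
    await fast_handler()    # runs while the slow command is still pending
    code = await slow
    return code, events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the unrelated handler completes before the slow command does, and the timeout bounds how long a hung command can stay pending.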

@cantbewong

I looked at the Marathon code and I agree that this is a good idea and should be feasible. Thanks for the input.

@jieyu

jieyu commented Apr 14, 2016

To add to @jdef's description, this problem is pretty severe. If any operation in the dvdi module blocks, ALL subsequent container launch/update/destroy operations will be BLOCKED, irrespective of whether the container is using an external volume or not.

Fixing that might involve serializing dvdcli operations. This is because when you use Subprocess, the order in which dvdcli operations are executed is non-deterministic. For instance, say you need to umount a volume, and then a new container arrives requesting the same volume. You expect the volume to be mounted for the new container. However, due to the race, it's likely that the umount happens after the mount.
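One way to picture the serialization is a per-volume chain, where each new operation waits for the previous operation on the same volume before it starts. The sketch below is in Python for illustration only. `VolumeSequencer`, `submit` and `fake_op` are hypothetical names, and `asyncio.sleep` stands in for the actual dvdcli calls. It assumes that ordering only matters between operations on the same volume.

```python
import asyncio


class VolumeSequencer:
    """Runs operations on the same volume in submission order.

    Operations on different volumes are not ordered against each other.
    """

    def __init__(self):
        # volume name -> task for the most recently submitted operation
        self._tail = {}

    def submit(self, volume, op):
        """Schedule coroutine function `op` after all earlier ops on `volume`."""
        previous = self._tail.get(volume)

        async def run_after_previous():
            if previous is not None:
                # Wait for the earlier op to finish, whether it succeeded or failed.
                await asyncio.wait([previous])
            return await op()

        task = asyncio.create_task(run_after_previous())
        self._tail[volume] = task
        return task


async def demo():
    log = []

    async def fake_op(name, delay):
        # Assumption: sleeping stands in for running an external command.
        await asyncio.sleep(delay)
        log.append(name)

    seq = VolumeSequencer()
    # The slow unmount is submitted first, then a fast mount of the same volume.
    t1 = seq.submit("vol1", lambda: fake_op("unmount vol1", 0.2))
    t2 = seq.submit("vol1", lambda: fake_op("mount vol1", 0.01))
    # An operation on another volume is not held up by vol1.
    t3 = seq.submit("vol2", lambda: fake_op("mount vol2", 0.05))
    await asyncio.gather(t1, t2, t3)
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Here the mount of vol1 only starts once the slower unmount has finished, while the vol2 mount proceeds independently. A real implementation would also need to decide what a failed earlier operation means for the ones queued behind it, which this sketch ignores.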

@dvonthenen

@jdef, just for my understanding, what happens in the case of Docker-type workloads/containers? The specific case I am thinking about is this: if we mount the volume async and come out of the staging state, and the application comes up without the volume data being available, the application might error out because the data is not there.

Maybe I am misunderstanding how to use subprocess and what its capabilities are.

@jdef

jdef commented May 18, 2016

@jieyu, how did we handle this scenario with the docker volume isolator recently added to Mesos?
