The purpose of sandbox vs. process vs. container config separation? #12
Sandbox is the static part of the config. It's usually deployed by admins and only changed by admins. Also, we don't make a unique sandbox for every project: it's just a template with a unique directory, a unique user/port range, and a separate network bridge/VLAN.

Process config, on the other hand, is dynamic. It says "run N processes of version X", where both N and X are supposed to be updated many times an hour. It's supposed to be generated by an orchestration system (or merely a deployment script).

Another way to think of it: in many other systems, the process config is the data transferred in a runtime API call. We just don't try to stick to an HTTP API, and we think files are a good API too.

Another thing we're trying to do (but we're not there yet) is to make it multi-tenant. I.e. think of the "sandbox" config as written by the hosting provider, the "container" config as written by the actual user, and the "process" config as created by CI/CD or an orchestration system.
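To make the split concrete, here is a rough Python sketch of the three layers and who owns each one. It is not lithos' actual config format: every class name, field name, and value below is made up for illustration. The only point it tries to show is that a deploy rewrites the process config while the sandbox stays untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sandbox:
    """Static part, owned by an admin / hosting provider (hypothetical fields)."""
    name: str
    image_dir: str
    uid_range: range
    port_range: range
    bridge: str


@dataclass(frozen=True)
class ContainerConfig:
    """Owned by the actual user, shipped with the image (hypothetical fields)."""
    executable: str
    memory_limit_mb: int


@dataclass(frozen=True)
class ProcessConfig:
    """Dynamic part, owned by CI/CD or an orchestrator: run N processes of version X."""
    service: str
    version: str
    instances: int


def scale(proc, *, version=None, instances=None):
    """Return an updated process config; the original object is left as-is."""
    return replace(
        proc,
        version=proc.version if version is None else version,
        instances=proc.instances if instances is None else instances,
    )


# The sandbox is written once by an admin and is not touched by deploys.
sandbox = Sandbox(
    name="plum",
    image_dir="/var/lib/images/plum",  # made-up path
    uid_range=range(10000, 10100),
    port_range=range(8000, 8100),
    bridge="br-plum",
)

# The process config is regenerated on every deploy or rescale.
web = ProcessConfig(service="plum-web", version="v1", instances=2)
web_next = scale(web, version="v2", instances=5)
```

In this sketch an orchestrator would only ever emit new `ProcessConfig` values, which is the "files as API" idea from the paragraphs above.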
TL;DR We haven't figured out a good way to enforce such limit(s) at the lithos level yet. We enforce them in the orchestration system, and we're looking for a way to do it in lithos too.

Memory Limit

Well, yes, cgroup limits are underspecified for now. Let's talk about the memory limit first, because this is the one I think about the most. Let's say we have a project called "plum", and it has two services:
Now, let's talk about the following scenarios. The core idea is that all of them should work without changing the sandbox:
Now, (1) and (3) look easy: just set a 10Gb memory limit in the sandbox. Except for one interesting question I've skipped so far: are plum-web and plum-celery in the same sandbox or in different ones? What about plum-ssr? The key point here is that interactions like "increase the number of instances of one service and decrease the number of instances of another" can happen between any services, including services that belong to different projects. So we're not yet sure how to set a memory limit for a sandbox:
Okay, this is already too long...

CPU Limit

We only have
There is also sched-bwc, but I haven't tried it yet, and I'm not sure how to configure it sensibly.

Other Limits

Well, I'm not sure about other ones. It looks like there isn't a good way to limit disk throughput (the cgroup one isn't effective in my experience). Any other ones?

The sandbox effectively limits other things: users, directories that can be mounted, the network bridge that can be used. These things don't have the shenanigans we have with memory.

Now

Currently, we handle the memory limit in the orchestration system. An update that contains an invalid memory limit is rejected. I.e. if we have a 100Gb memory limit for the whole cluster and 200 instances configured to run, we reject any config that has a memory limit above 500mb. You can also check it as part of your CI/CD pipeline, i.e. don't allow deploying anything that has a limit larger than specified. This requires trusted code in CI, which may or may not be a problem for your use case.

Future

There are a few other things that don't exactly match the "sandbox" concept in its current form:
So we're looking for a way to configure these things that doesn't complicate the config even more. Your input is valuable.
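The orchestration-side memory check described earlier in this comment could look roughly like the Python sketch below. It is only an illustration, not our actual orchestration code: the function names and the shape of the `services` mapping are made up, and 1Gb is taken as 1000mb so that the numbers match the 100Gb / 200 instances / 500mb example.

```python
def per_instance_ceiling_mb(cluster_limit_mb, total_instances):
    """Largest per-instance memory limit that still fits the cluster budget."""
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    return cluster_limit_mb // total_instances


def rejected_services(cluster_limit_mb, services):
    """Return the names of services whose memory limit is too large.

    `services` maps a service name to (instances, memory_limit_mb).
    This mapping shape is an assumption made for this sketch.
    """
    total = sum(instances for instances, _ in services.values())
    ceiling = per_instance_ceiling_mb(cluster_limit_mb, total)
    return sorted(
        name
        for name, (_, limit_mb) in services.items()
        if limit_mb > ceiling
    )


# 100Gb for the whole cluster, 200 instances in total -> 500mb ceiling.
update = {
    "plum-web": (150, 400),     # 150 instances, 400mb each
    "plum-celery": (50, 800),   # 50 instances, 800mb each
}
print(rejected_services(100_000, update))  # ['plum-celery']
```

The same function could run as a CI/CD step, with the caveat already mentioned: the CI code doing the check has to be trusted.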
I don't think there is anything actionable left in this issue. Feel free to open separate issues for specific proposals.
I still don’t understand the high-level concept of Lithos’ configuration. Why is the configuration of a container separated into three configs (sandbox, process and container), and specifically, why are sandbox and process not a single config? What’s the idea behind this and how is it supposed to be used? Could you please provide an example of some non-trivial setup that demonstrates this?
In #8 (comment) you wrote:
This mostly makes sense, except for one thing: cgroup limits are configured only in the container config (inside the image). Cgroup limits are exactly the thing I’d like to enforce on users and not let them change.