Implement an efficient process pool for sandboxing evaluation of foreign computations #24
Comments
I'll take this on, if no one else is working on it.
Sure, though can you prioritize finishing up the key-value store bindings first?
Sure thing.
I started looking into this and have a few questions:
I think it would help me to see how some client code would use a `Pool`.
I think you're right. Not sure what I was thinking there. Let's go with that.
With the API I gave, the client of `evaluate :: Budget -> a -> IO (ID, IO (Either Err b))` gets back an `ID` and a handle on the result. So the outer `IO` just submits the computation, and the inner `IO (Either Err b)` blocks until the worker produces an answer.
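So hypothetical client code might look like this, where `someBudget` and `someThunk` are placeholders rather than real definitions:

```haskell
-- Sketch of a caller; assumes the evaluate above is in scope.
demo :: IO ()
demo = do
  (eid, waitForResult) <- evaluate someBudget someThunk
  -- The outer IO has returned already: eid could later be used to
  -- kill the computation. The inner IO blocks until the worker replies.
  res <- waitForResult
  case res of
    Left err -> putStrLn ("foreign evaluation failed: " ++ show err)
    Right n  -> print (n :: Int)
```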
I'm a bit hazy on this myself, but client code would just call the `evaluate` function. The worker process would just read a stream of packets:

```haskell
data Packet a = Kill | Done | Eval a deriving (Generic)

instance Serial a => Serial (Packet a)
```

So the implementation of `evaluate` just speaks this protocol to a worker process. Do you think that gives you enough to go on, or do you have more questions? Obviously as stuff comes up we can talk more. :)
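For a concrete picture, the worker's main loop might look roughly like the following, reusing the `Packet` type above; `receive` and `send` stand in for whatever framing ends up over stdin/stdout, so treat the whole thing as a sketch:

```haskell
-- Hypothetical worker main loop: block on the next packet, run the
-- evaluation function on Eval payloads, and exit when told to.
workerLoop :: IO (Packet a)   -- read the next packet from the node
           -> (b -> IO ())    -- write a result back to the node
           -> (a -> IO b)     -- the evaluation function itself
           -> IO ()
workerLoop receive send eval = loop
  where
    loop = do
      pkt <- receive
      case pkt of
        Kill   -> pure ()                -- parent wants us gone
        Done   -> loop                   -- protocol detail still TBD
        Eval a -> eval a >>= send >> loop
```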
@pchiusano Thanks for elaborating. I've started a branch in my fork for this work (there's no implementation yet): https://github.com/tmciver/unison/tree/process-pool. A few more questions:
@tmciver Okay, I have been giving this some thought, and realized that sandboxing can be done in a way that is more elegant and makes this process pool unneeded. So I'm going to close this issue. Really sorry to pull the rug out from under you. :( I'll be on the lookout for other bite-sized projects.

Briefly, the new design (which I'll write up in a post) is that nodes are much lighter-weight things (many nodes can be running on a single machine), and you can even spawn nodes dynamically, using a new primitive like:

```
-- | Create a new local node, running in the specified sandbox
spawn : Budget -> Sandbox -> Remote! Node
```

I think this is a more elegant approach. Any pooling of nodes can be done just using pure Unison code. And when sharing node references with others, you can choose to share a sandboxed node that has more limited capabilities. For the implementation, each node will be backed by a single OS process if it's 'awake', or if it's 'asleep' it will just be backed by some bits on disk. I'm going to do some prototyping of this architecture before committing to it, but once it is more solid I think there will be some good bite-sized pieces for others to work on.
OK, no problem. Let me know when there's more work to do; I'm eager to contribute to this project and hone my skills!
Cool. :)
This is a fun, important, fairly self-contained project that we aren't blocked on right now, and it requires minimal background. If you'd like to get involved in Unison development, read on and see if you'd like to take the lead on an implementation!
This project will be an important component of the distributed systems API. Reading or at least skimming that post is probably good background but isn't strictly necessary.
When a Unison node receives a computation to evaluate from another node (a "foreign computation"), we currently evaluate it in the same process as the node itself. This is bad for a few reasons.
Since we don't necessarily trust the foreign computation with the full set of CPU and memory resources available to a Unison node, we need to run foreign computations in some sort of sandbox. Here's the API (subject to tweaking, but this is probably a good start):
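Roughly, using the `evaluate` signature quoted in the comments above, with placeholder `ID`, `Err`, and `Budget` types standing in for the details:

```haskell
import Data.Bytes.Serial (Serial)

type TimeBudget  = Int  -- seconds (placeholder)
type SpaceBudget = Int  -- megabytes (placeholder)

data Budget = Budget TimeBudget SpaceBudget

newtype ID = ID Integer deriving (Eq, Ord, Show)

data Err
  = InsufficientProcesses   -- the pool can't supply a worker
  | ResourceBudgetExceeded  -- time or space budget blown
  deriving (Show)

-- | Submit a foreign computation for sandboxed evaluation. The outer
-- IO returns promptly with an ID; the inner IO blocks until the
-- worker produces a result or fails.
evaluate :: (Serial a, Serial b) => Budget -> a -> IO (ID, IO (Either Err b))
evaluate = error "sketch only"
```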
That is the full API. The implementation should be backed by a growable pool of processes. (If Haskell threads could specify a max heap size on startup, we could do everything in-process, but unfortunately, that isn't supported and it doesn't look like it's happening anytime soon.)
Here's a simple sketch of an implementation:
- `available :: Map (TimeBudget, SpaceBudget) [ProcessHandle]`, which is the list of free worker processes ("workers") associated with each budget. We don't literally want to spin up a new OS process every time `evaluate` gets called.
- `running :: Map (TimeBudget, SpaceBudget) [ProcessHandle]`, which is the list of processes that are currently running a call to `evaluate`.
- `ids :: Map ID [ProcessHandle]`, storing the mapping from `ID` to processes with that ID.

When `evaluate` gets called, serialize the thunk using the argument passed to `pool`. Look up in `available` to see if there's an existing process configured with that budget which happens to be free; if there isn't and the pool can't grow, return `Left InsufficientProcesses`. If not, spin up a new process with that budget, add it to the `available` map, and move to the next step. Then hand the serialized thunk to the worker, return an `IO (Either Err b)` that reads back the result, and update the `running`, `active`, and `ids` state accordingly.

Note: Any restriction of privileges other than time / space budgeting will be handled before a call to `evaluate`. So for instance, if we want to disallow write access to the node's local data store, this would be implemented by inspecting the term, and making sure it cannot reference any such functions. We'll call this a "capability failure" vs a "resource budget failure" caused by a computation exceeding its time or space budget.

The pool is backed by a number of worker processes (or just "workers"). A worker process will be initialized with a CPU and space budget (probably via command line flags), and its main logic will be some `a -> IO b`.
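Here's a minimal Haskell sketch of how that pool state and worker spawning might hang together; the worker executable name, its `--time-budget` flag, and the helper names are assumptions, not decided API:

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar)
import qualified Data.Map as Map
import System.Process (ProcessHandle, createProcess, proc)

-- Placeholder budget representations.
type TimeBudget  = Int  -- seconds
type SpaceBudget = Int  -- megabytes

newtype ID = ID Integer deriving (Eq, Ord)

-- The maps described above, guarded by a single MVar.
data PoolState = PoolState
  { available :: Map.Map (TimeBudget, SpaceBudget) [ProcessHandle]
  , running   :: Map.Map (TimeBudget, SpaceBudget) [ProcessHandle]
  , ids       :: Map.Map ID [ProcessHandle]
  }

newtype Pool = Pool (MVar PoolState)

-- Spawn a worker capped at the given heap size via the RTS -M flag
-- (the worker binary must be built with -rtsopts). The executable
-- name and --time-budget flag are hypothetical.
spawnWorker :: TimeBudget -> SpaceBudget -> IO ProcessHandle
spawnWorker t s = do
  (_, _, _, ph) <- createProcess $
    proc "unison-worker"
      ["--time-budget", show t, "+RTS", "-M" ++ show s ++ "m", "-RTS"]
  pure ph

-- Check out a free worker for this budget, spawning one if needed.
-- A real version would enforce a max pool size and surface
-- InsufficientProcesses instead of growing without bound.
checkoutWorker :: Pool -> (TimeBudget, SpaceBudget) -> IO ProcessHandle
checkoutWorker (Pool st) budget@(t, s) = modifyMVar st $ \ps ->
  case Map.lookup budget (available ps) of
    Just (w:ws) -> pure
      ( ps { available = Map.insert budget ws (available ps)
           , running   = Map.insertWith (++) budget [w] (running ps) }
      , w )
    _ -> do
      w <- spawnWorker t s
      pure (ps { running = Map.insertWith (++) budget [w] (running ps) }, w)
```

Returning a worker to `available` after its result is read (and removing it from `running` and `ids`) would be the mirror image of `checkoutWorker`.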
The time budget will be handled internal to the Haskell code, but the memory budget will have to be handled via an RTS flag. It looks like `myprog +RTS -M1024m` will limit `myprog` to 1024 megabytes of heap. The time budget should be handled internally so that the same worker can be reused, rather than having to spin up a new process every time; it will be quite common to have lots of sequential requests with the same budget.

If you are interested in this project and have questions (or suggestions), please post them here, or come discuss in the chat room.