Investigate more flexible access models for persistent storage #92

Open
pchiusano opened this Issue Aug 17, 2016 · 0 comments

pchiusano commented Aug 17, 2016

Currently, persistent storage is tied to an originating node for both reads and writes. For instance, in the following program, the lookup call on the last line will run on the n1 node, even though the surrounding computation has moved to n2 by that point:

Remote {
  n1 := Remote.spawn;
  n2 := Remote.spawn;
  ind := Remote {
    -- Remote.transfer : Node -> Remote Unit
    Remote.transfer n1;
    ind := Index.empty;
    Index.insert "Unison" "Rulez!!!1" ind;
    pure ind;
  };
  Remote.transfer n2;
  -- this will contact `n1` and do the lookup there
  Index.lookup "Unison" ind;
}

This works and is correct, but it has the disadvantage that all reads route through the originating node, making that node a bottleneck. For read-heavy workloads, we can imagine relaxing this constraint and letting nodes in the same storage universe as the originating node issue the query directly, without routing through the originating node. (I think writes should continue to always route through the originating node. We don't want shared, distributed mutable state; users should build higher-level abstractions in pure Unison for that sort of thing.)
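
To make the proposed rule concrete, here is a toy model of the routing decision. It is written in Python rather than Unison, and everything in it (`Node`, `IndexRef`, `route`, the string-valued `universe` field) is a hypothetical name made up for illustration, not part of any existing API. It only sketches the rule described above: writes always go to the originating node, and reads run locally when the caller shares the originating node's storage universe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    universe: str  # hypothetical identifier for the node's storage universe


@dataclass(frozen=True)
class IndexRef:
    origin: Node  # the node that created the index


def route(op: str, caller: Node, index: IndexRef) -> Node:
    """Return the node that should execute `op` ("read" or "write")."""
    if op == "write":
        # Writes always route through the originating node.
        return index.origin
    if op == "read":
        # Reads may run on the caller if it shares the origin's universe.
        if caller.universe == index.origin.universe:
            return caller
        return index.origin
    raise ValueError(f"unknown operation: {op}")


n1 = Node("n1", "u1")
n2 = Node("n2", "u1")  # same universe as n1
n3 = Node("n3", "u2")  # different universe
ind = IndexRef(origin=n1)
```

With these definitions, `route("read", n2, ind)` picks `n2`, while `route("read", n3, ind)` and `route("write", n2, ind)` both fall back to `n1`.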

Some remarks on this approach:

  • We'd need to include the Universe as part of the runtime representation of an Index or any other persisted type.
  • There are some questions around encryption. At the moment, with this API, we are assuming encryption is transparent, done with some key derived from the node's private key. If we want to allow multiple nodes to do reads, we need to get them the key somehow, or just not encrypt the data.
    • One idea is to make key management and encryption more explicit in the API. So instead of `empty : Remote (Index k v)`, we'd have `empty : Key -> Remote (Index k v)` and `lookup : Key -> k -> Index k v -> Remote (Optional v)`.
    • This is probably a good idea. More explicit, and we can provide common patterns just using pure Unison code.
  • There are questions around sandboxing. I've been thinking that sandboxes would certainly include control over what persistent data may be accessed and/or written. But since the sandbox is currently tied to the node, how do we enforce the sandboxing policy when other nodes may be issuing the queries? We may need to handle 'non-public' persistent data differently than persistent data that is 'world-readable'.
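
As a rough illustration of the first two remarks, the sketch below models an index that carries its universe in its runtime representation and requires an explicit key for every operation. It is a toy in Python, not Unison. The `universe` parameter on `empty`, the `key_id` fingerprint, the `insert` signature and the `PermissionError` behaviour are all assumptions for illustration. The key check merely stands in for encryption: nothing is actually encrypted here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


def key_id(key: bytes) -> str:
    """Fingerprint of a key; a stand-in for real key management."""
    return hashlib.sha256(key).hexdigest()


@dataclass
class Index:
    universe: str  # carried in the runtime representation
    key_id: str    # fingerprint of the key the index was created with
    entries: Dict[str, str] = field(default_factory=dict)


def _check_key(key: bytes, index: Index) -> None:
    if key_id(key) != index.key_id:
        raise PermissionError("wrong key for this index")


def empty(key: bytes, universe: str) -> Index:
    """Loosely mirrors `empty : Key -> Remote (Index k v)`."""
    return Index(universe=universe, key_id=key_id(key))


def insert(key: bytes, k: str, v: str, index: Index) -> None:
    _check_key(key, index)
    index.entries[k] = v


def lookup(key: bytes, k: str, index: Index) -> Optional[str]:
    """Loosely mirrors `lookup : Key -> k -> Index k v -> Remote (Optional v)`."""
    _check_key(key, index)
    return index.entries.get(k)
```

Any node that is handed the key can perform the read, which is the point of making the key explicit. The sandboxing question in the last bullet is untouched by this sketch.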