Meta-issue for upcoming work #101

pchiusano opened this Issue Sep 10, 2016 · 3 comments

pchiusano commented Sep 10, 2016

Overall

The project has so far focused on building proofs of concept, design exploration, and research. There's still some of that to do, but generally I'd like to shift focus to the engineering work needed to get the Unison language/libraries/runtime into a usable state for 'real' stuff. This will let us see how well the ideas that have been developed work at scale.

The milestone is having the implementation in good enough shape that you could write a highly-available backend service in pure Unison. And if you didn't mind being on the bleeding edge, this might even be something you put in production, or at least use internally.

Motivating use cases

  • YouTube backend. Should handle the fact that some videos have huge demand, others very little. Basically a distributed, elastic, load-balanced key-value store of video fragments, plus maybe a node pool for doing encoding on uploads. 95% of the work will be on the generic data structure, which would have lots of other uses.
  • Twitter clone backend. Should handle the fact that some people have millions of followers, others very few.
  • Amazon Lambda clone. This is just an elastic node pool. Should be very little code.
  • Later: real-time P2P video app. This makes sense once the node protocol is converted over to using UDP. Most of the service will be in having a small number of gateway nodes to facilitate UDP hole punching.
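To make the first use case concrete, here is a hypothetical sketch (in Python, not Unison, and not any actual Unison API) of the kind of "generic data structure" the YouTube backend would lean on: a consistent-hash ring that maps each video-fragment key to a set of storage nodes, so hot fragments can be replicated across several successors while the node pool grows or shrinks elastically.

```python
# Hedged sketch: consistent hashing for a distributed, load-balanced
# key-value store. All names here (Ring, nodes_for) are illustrative.
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # vnodes: virtual points per node, which smooths the key distribution
        self._points = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def nodes_for(self, key: str, replicas: int = 1):
        """Return `replicas` distinct nodes responsible for `key`,
        walking clockwise around the ring from the key's position."""
        idx = bisect.bisect(self._points, (_hash(key),))
        out = []
        for _, node in self._points[idx:] + self._points[:idx]:
            if node not in out:
                out.append(node)
            if len(out) == replicas:
                break
        return out

ring = Ring(["node-a", "node-b", "node-c"])
# A hot fragment can be served from two replicas; adding a node
# to the ring only remaps a small fraction of keys.
owners = ring.nodes_for("video-fragment-42", replicas=2)
```

The appeal of this design is that elasticity and load balancing fall out of the same mechanism: replication count per key can track demand, and node churn moves only the keys adjacent to the changed node.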

Very high level plan:

  • Improve core language and editing experience (data types + pattern matching + command line editing tool is a good MVP). This lets us develop nontrivial libraries in pure Unison, with a pleasant development workflow (no waiting for code compilation + easy refactoring).
  • Improve the runtime, including distributed communication API (basically, make the runtime good enough for 'real' work, not just proof of concept)
  • Build lots of awesome stuff :) See 'Motivating use cases' above.

More detailed plan

  • @pchiusano is working on #104, then will do #59, which unblocks a lot of stuff and will incidentally make the search engine example run at faster than glacier speed. :)
  • @refried is working on #115. Lots of possible stuff to work on once done with that.
  • @sfultong is on #77, moving on to #107 and/or #103
  • @runarorama is working on #108 (error handling and supervision primitives)

Things for the language and editor to address after that:

  • #112 (improving parser)
  • #106 (data declarations)
  • #105 (text-oriented codebase interface)

At that point we'll have a nice interface to a Unison codebase, a nice parser, and data types. We can add more stuff to the standard library, perhaps #114 (nontrivial distributed libraries), but there are lots of basic utilities to fill in ('obvious' stuff we take for granted from Haskell's standard libraries).

The next big thing is more work on the Unison runtime, which is all blocked on #59 (separate runtime values from syntax tree). Once #59 is done, at least v1, we can do the following:

  • #109 (rewrite Multiplex to support node snapshots)
  • #110 (support contacting nodes outside current container)

Now we have pretty much all the ingredients to try writing some seriously nontrivial stuff. We have a nice editing and refactoring tool, a standard library, the ability to define new data types, and a runtime that isn't completely terrible. So we can start writing Unison code for some of these nontrivial use cases I gave above, like the Twitter backend, the YouTube backend, etc. How awesome will that be?? :)

Other notes:

  • Still some details to be filled in. See the section below which talks about upcoming design work. Some of the questions about key management might have some cascading effects later, but nothing I'm too scared of, so I think it's okay to punt on these things for now.

Upcoming design work:

  • Design work on node lifecycles - when is a node destroyed? what does it mean to destroy a node? (Paul or someone with strong FP design)
  • Design work on persistent data lifecycles - when is persistent data destroyed? (Paul or someone with strong FP design)
  • Design work on node key management (Paul)
    • Perhaps should make encryption API more explicit
      • Crypto.generate-key : Key
      • Crypto.generate-keypair : (Key, SecretKey)
    • Do spawned nodes need a public key? Maybe not:
      • spawn : Key -> Remote Node, will use provided key for encryption-at-rest, transporting
      • transfer : Key -> Node -> Remote ()
      • With this model, node id can just be random guid, fast to generate
      • No forward secrecy with this approach
  • Design work on live upgrades - how to upgrade running system? Also see Erlang for inspiration.
    • General idea: nodes are immutable, don't do hot replacement, just bring up new nodes with new logic, direct traffic to the new nodes
    • May need design for migrating ownership of persistent data, depends on
  • Implement GADTs (Paul, Dan, Arya, or someone w/ type)
  • Node snapshotting - basically, modify Multiplex to contain serializable continuations for all running computations. Lets us suspend a Node at any time if it isn't in use or its container is overloaded, and transport a running node between containers. This also gives a pretty good story for how to do live code upgrades to a running system! Just apply the patch to all the continuations in the Multiplex state.
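The key-management proposal above can be mocked up to see its shape. This is a hypothetical Python model, not a real Unison API: `spawn` takes only a symmetric key (used for encryption-at-rest and transport), a node's identity is just a random GUID (cheap to generate), and `transfer` re-keys a node. The trade-off noted above is visible in the types: with no per-node keypair there is nothing to provide forward secrecy.

```python
# Hedged sketch of the proposed signatures:
#   Crypto.generate-key : Key
#   spawn : Key -> Remote Node
#   transfer : Key -> Node -> Remote ()
# All class and function names are illustrative.
import os
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Key:
    # symmetric key material (32 random bytes)
    material: bytes = field(default_factory=lambda: os.urandom(32))

@dataclass(frozen=True)
class Node:
    id: uuid.UUID  # node id is just a random GUID, fast to generate

def generate_key() -> Key:
    # models Crypto.generate-key : Key (symmetric only, per the proposal)
    return Key()

def spawn(key: Key) -> Node:
    # models spawn : Key -> Remote Node; the container would remember `key`
    # for encrypting the node's state at rest and in transit (not shown)
    return Node(id=uuid.uuid4())

def transfer(key: Key, node: Node) -> None:
    # models transfer : Key -> Node -> Remote (): re-encrypt under `key`
    pass

k = generate_key()
n = spawn(k)  # identity is independent of the key: no keypair, no forward secrecy
```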

@pchiusano pchiusano changed the title from Brain dump meta-issue for upcoming work to Meta-issue for upcoming work Sep 27, 2016

dumblob commented Oct 17, 2016

Hi @pchiusano, I'm glad you didn't give up on this effort a year ago. I've been following Unison from the very beginning and I'm a huge fan of it. Thank you!

Based on your talk at Full Stack Fest 2016 and your recent switch from research & architecture work to "the boring stuff" (engineering internals, increasing efficiency, etc.), I'd like to raise a few questions to get a good overview of your plans. I'm asking in this thread so as not to pollute the GitHub Issues page.

  1. Are you planning to implement some space-saving heuristics (to save memory and hard drive space) for all the caches each Unison node uses? Currently these caches have add-only semantics, but considering IoT space limits and long-running programs, some pruning based on locality, reachability and recency will be needed.
  2. Will you provide a minimal build (e.g. a "statically" compiled executable or an as-small-as-possible Docker image) for easy deployment for testing purposes? I'm thinking about mobile devices, IoT, personal computers, etc. (in sharp contrast with Haskell's gigabytes of runtime). I'd envisage something like 3 MBytes being fine for an MVP.
  3. What are the exact goals for persistence, durability and reliability of all the data (including caches) used in Unison? I've looked at #92 , but could not extract an answer to this question.
  4. How performant would it be to treat persistent values as samples of a signal of the same type? In other words, is Unison prepared for high loads where a particular value (e.g. a number) is rewritten 10,000x per second (e.g. because millions of participants vote within the same second, sustained for a few days), with the guarantee that at any time the last value is available to readers and that this read value has been persisted?
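As an aside on question 1, the kind of recency-based pruning being asked about can be sketched in a few lines. This is illustrative only (it says nothing about how Unison's node caches actually work): an LRU policy where lookups refresh an entry's recency and inserts beyond a capacity evict the least-recently-used entry.

```python
# Hedged sketch: recency-based cache pruning (LRU). The class name and
# capacity policy are illustrative, not part of Unison.
from collections import OrderedDict

class PruningCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = PruningCache(capacity=2)
cache.put("#a", 1)
cache.put("#b", 2)
cache.get("#a")      # refresh "#a"
cache.put("#c", 3)   # evicts "#b", now the least recently used
```

A content-addressed store would presumably combine something like this with reachability information, so that hashes still referenced by live code are never pruned.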

Btw, I'm participating in the development of Dao (a programming language using many functional-programming-like ideas in its internals and in the abstractions offered to the programmer), and I'd like to mimic the interfaces you came up with for Unison in Dao (we have quite good tools for that - e.g. code sections, concrete interfaces, futures, defer, etc.). Dao is written in C, though, for portability and embeddability reasons. Dao is also very tiny, but has a comprehensive and pretty well-designed standard library (compared to the current "crippled" non-Haskell world 😉). We suffer from bad presentation and unfinished documentation, though, so don't be surprised.


dumblob commented Nov 15, 2016

@pchiusano any insights on this?


pchiusano commented Nov 15, 2016


Hey @dumblob, sorry, this has been on my todo list for a while...

Re 1) and 2): basically, I'm not considering it a top priority right now to have Unison be this minimal executable that people could literally run on a toaster oven. It's something I'm keeping in mind and it would be nice to have, but it's a big engineering constraint that I'm not aiming for right now. The way I'd manage a fleet of IoT devices is not by having Unison literally running on the devices, but by having a Unison node on your network with capabilities that let it talk to those IoT devices, which can speak whatever protocol (hopefully some widespread standard will emerge for this). So: your house has a bunch of smart devices, and you have a Unison node on your local network that has access to those devices, and you can write Unison programs that orchestrate their actions. (I'm ignoring the potentially scary implications of having hardware that can be controlled by possibly buggy software, but this is the basic idea...)

So with that as the model, optimizing for space usage and executable size is much less of a concern.

Re 3): the general goal is just to have performant, durable, typed state available to Unison programs, and to move away from people needing to do explicit serialization to/from the file system, database, or whatever. There are some API questions around this that we'll probably continue to play with and iterate on. Not sure that answers your question.

Particular performance use cases like 4) I'm not too focused on right now. It's more important to me to just have reasonable performance for general-purpose computation. When that's done, we can go from there. Particular use cases are in the back of my mind and we can improve things iteratively - Unison's runtime can be made screaming fast, just like any other language's. But I think it will be a while before Unison gets used for something like HFT. :)
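The goal in 3) - durable, typed state without hand-rolled serialization - can be approximated outside Unison to show what it buys the programmer. This is an illustrative Python sketch only (the class name and file format are inventions here, not anything Unison provides): a cell whose value survives process restarts, written with a write-then-rename so readers never observe a torn value.

```python
# Hedged sketch: a durable "cell" with atomic updates. Unison's actual
# persistent-data API is still being designed; this just mimics the goal.
import json
import os
import tempfile

class DurableCell:
    def __init__(self, path: str, default):
        self.path = path
        if not os.path.exists(path):
            self.set(default)  # initialize on first use

    def get(self):
        with open(self.path) as f:
            return json.load(f)

    def set(self, value) -> None:
        # write to a temp file, then atomically rename over the old value,
        # so the file on disk always holds one complete value
        d = os.path.dirname(os.path.abspath(self.path)) or "."
        fd, tmp = tempfile.mkstemp(dir=d)
        with os.fdopen(fd, "w") as f:
            json.dump(value, f)
        os.replace(tmp, self.path)

cell = DurableCell("counter.json", default=0)
cell.set(cell.get() + 1)  # the increment survives a process restart
```

In Unison the serialization step would disappear entirely: the cell would simply hold a typed value, with durability handled by the runtime.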

