Skip to content

Latest commit

 

History

History
660 lines (503 loc) · 31.4 KB

README.md

File metadata and controls

660 lines (503 loc) · 31.4 KB

Research

This collects research on continuation-passing style, async/await, coroutines

Introduction to suspendable/resumable functions

React 16 uses coroutines, fibers, continuations and so a lot of Javascript developers have to learn that stuff:

Alternatively this post in the Python mailing list from 1999 is also a good explainer:

Layman's explanation of delimited continuations with examples of using them for exception handling and nondeterministic programming.

The Rise of Coroutines, by Kotlin Coroutines lead:

Low-level introduction to Continuation-Passing Style and Coroutines

Coroutines can replace state-machines:

State machine

class ProtocolWrapper(object):
    def __init__(self,
            header='\x61',
            footer='\x62',
            dle='\xAB',
            after_dle_func=lambda x: x):
        self.header = header
        self.footer = footer
        self.dle = dle
        self.after_dle_func = after_dle_func

        self.state = self.WAIT_HEADER
        self.frame = ''

    # internal state
    (WAIT_HEADER, IN_MSG, AFTER_DLE) = range(3)

    def input(self, byte):
        """ Receive a byte.
            If this byte completes a frame, the
            frame is returned. Otherwise, None
            is returned.
        """
        if self.state == self.WAIT_HEADER:
            if byte == self.header:
                self.state = self.IN_MSG
                self.frame = ''

            return None
        elif self.state == self.IN_MSG:
            if byte == self.footer:
                self.state = self.WAIT_HEADER
                return self.frame
            elif byte == self.dle:
                self.state = self.AFTER_DLE
            else:
                self.frame += byte
            return None
        elif self.state == self.AFTER_DLE:
            self.frame += self.after_dle_func(byte)
            self.state = self.IN_MSG
            return None
        else:
            raise AssertionError()

Coroutines

@coroutine
def unwrap_protocol(header='\x61',
                    footer='\x62',
                    dle='\xAB',
                    after_dle_func=lambda x: x,
                    target=None):
    """ Simplified framing (protocol unwrapping)
        co-routine.
    """
    # Outer loop looking for a frame header
    #
    while True:
        byte = (yield)
        frame = ''

        if byte == header:
            # Capture the full frame
            #
            while True:
                byte = (yield)
                if byte == footer:
                    target.send(frame)
                    break
                elif byte == dle:
                    byte = (yield)
                    frame += after_dle_func(byte)
                else:
                    frame += byte

Included in repo

Nim RFCs

Case studies

Structured concurrency / avoid eager execution

C++ Unified Executor proposals

Python Trio

Note from experience in eager execution vs delayed execution of compute graph in Machine Learning (which are similar to async task graphs), people found eager execution (Pytorch) significantly easier to reason about than lazy (Tensorflow).

Note 2: Rust seems to have come to the same conclusion. Rust features are delayed until they are polled

Languages write up

Continuations in C

C++ coroutines

Library impact: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0975r0.html

Disappearing coroutines:

For ultra low overhead coroutines usecase:

Alternatives and criticism of C++ CoroutineTS

Tradeoffs of various implementation techniques including:

C++ Fibers

Fibers == stackful coroutines == user-mode threads as in they allocate their own stacks on the heap and modify the stack base pointer and stack pointer to jump to it

C++ Resumable Functions

C# Async/Await

Concurrent ML and Guile

F# continuations

Kotlin coroutines

Kotlin builds async/await on top of coroutines abstraction

Talks:

LLVM coroutines

Lua coroutines

Javascript React Fiber

ML

Nim closure iterators

Ocaml Async

Note: OCaml optimizes chain of Deferred in tail call (i.e. async tail call optimization)

Python Greenlet, Stacklet, Continulet, Fibers

Rust readiness-based zero-alloc futures

Caveat: As Windows IOCP and Linux io_uring require being passed an owned buffer to allow "zero-copy" IO, Rust futures will need to allocate anyway, see async_await_cancellation_rust_matthias247

In particular it distinguishes between the traditional completion-based and their own poll-based futures with completion-based requiring a buffer for each future and so requiring more memory allocation (which are problematic because it stresses the GC, and lead to memory fragmentation on long running application). In particular the poll approach is attractive because it eases cancellation (don't poll) and since there is no heap indirection for the future, the compiler can do deep optimizations.

Presentations:

Key takeways

  • Rust async-await is future-first instead of being coroutine-first.
  • Rust allocates once per async scope (structured concurrency) instead of once per coroutine (await calls).
  • The future stack contains all the intermediate state and the future is implemented as a state machine.
  • The Executor and the Reactor (event loop) are separated.
    • The executor has a collection of runnable tasks and schedules them
    • The reactor has a collection of handles (futures) and notify the associated executor on events
    • A Rust future is a handle to a computation, it associates a procedure + data/context + an executor
      • The executor handle is stored in a field of type Waker
    • A task is a future in-progress, currently scheduled by an executor

Rust rough edges and criticism

Rust RFCs

Note: stackless in Rust means on the stack while stackless in Python means on the heap (don't use stack) ¯\(ツ)

Rust-style futures in C

Scheme

Swift Async proposal

Swift would build async/await on top of coroutines abstraction

24 Dec announcement:

Kernel I/O primitives

Windows IOCP

Windows Async Procedure Calls

Linux AIO

Linux io_uring

Academic research

Continuations

  • The Discoveries of Continuations, John Reynolds
    https://www.cs.tufts.edu/~nr/cs257/archive/john-reynolds/histcont.pdf

  • Representing Monads
    Filinski, 1994
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.8213&rep=rep1&type=pdf Reddit quote: https://www.reddit.com/r/programming/comments/pz26z/oleg_says_its_time_to_think_about_removing_callcc/c3twxmh/?utm_source=reddit&utm_medium=web2x&context=3

    A compelling argument for why delimited continuations are elegant is the paper Representing Monads. Two very short one liners are enough to make any monad native to your programming language. For example your language doesn't support exceptions? Define a maybe monad and use monadic reflection to make them feel like native exceptions. Your language doesn't have mutable state? Define a state monad and use monadic reflection to make it feel like native mutable state. Your language doesn't have coroutines and requires you to write callback manually as in Node.js? Define a monad similar to F#'s async workflows (=~ continuation monad), and now it feels like your language has native asynchronicity. The same goes for monadic parsers.

    Summary: delimited continuations give you a way to write monadic code as if you were writing normal code (you could say it's do-notation on steroids).

  • Capturing the Future by Replaying the Past
    Functional Pearl
    James Koppel, Gabriel Scherer, Armando Solar-Lezama
    https://arxiv.org/pdf/1710.10385.pdf\

    Delimited continuations are the mother of all monads! So goes the slogan inspired by Filinski’s 1994 paper,which showed that delimited continuations can implement any monadic effect, letting the programmer usean effect as easily as if it was built into the language. It’s a shame that not many languages have delimited continuations.

Ergonomics (coroutine == delimited continuations):

Coroutines

The yield paper proved that yield as used in programming language is equivalent to shift and reset used to build delimited continuations.

Use cases

Database optimization

Wikipedia - related concepts

CPS for compilers

Type Erasure

A scheduler will likely need to enqueue continuations/frames/tasks in a data structure. For that it will need to be type-erasedat 2 level, the tasks and the environment. Tpe erasing the task is easy as it is always used by pointer. However the environment as dynmic size.

Some solutions:

Function colors

Function coloring problem might be exagerated. waitFor solves crossing the async->sync barrier. And an equivalent construct is needed for sync->async especially:

  • to handle blocking file operations on Linux
  • to "asyncify" channels/flowvers used to communicate in a multithreaded program.

Async function color

Rebuttal of the function color myth:

{.sideeffect.}, IO Monad, Tainted string, Result, error code are all colored

Nim {.sideeffect.} are also API infectious, which is usually OK except when we want to introduce logs in a previously side-effect free function.

However in the case of {.sideeffect.}, Result, error codes or {.raises: [].}, the viral nature is actually wanted, just like having proper parameter types as this ensures program correctness.

Fork/Join parallelism color

In most modern implementations fork/join parallel runtime are reentrant, a caller isn't "infected" by the internals of the called function, parallelism is black-boxed. However, Cilk, the foundational implementation of fork/join parallelism, that also proved how to schedule optimally on multicore is colored. http://supertech.csail.mit.edu/papers/PPoPP95.pdf At a spawn point, the current execution is separated into a child call and a continuation:

int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
}

At this point, the current thread can either:

  • execute the child call (work-first) and another thread can swoop in and steal the continuation (continuation-stealing). This is Cilk mode of operation.
  • execute the continuation (help-first) and another thread can swoop in and steal the child (child-stealing). This is most other framework mode of operations.

The main difference is that continuation stealing require saving the stack frame of the current function or it can't be resumed from an arbitrary thread. And another consequence is that function including cilk_spawn have a different calling convention than C since they are resumable. This means that Cilk program cannot naively be called from C. The workaround Cilk developers used is to always compile 2 versions of a program, one with the Cilk resumable calling convention and one with the C calling convention.

A side note: continuation stealing ensures that on a single-thread order of execution of the program is the same as it's serial equivalent. Also child stealing suffers from unbounded suspended tasks spawned since they are only worked on late. It's an excellent way to DOS GCC OpenMP implementation.

Ergonomics

Adding async call vs not adding it:

Schedulers

It is interesting to have both global schedulers and scoped schedulers.

Global schedulers have significantly more optimization opportunities. This is the reason why Microsoft forces everyone to use Windows Fibers for multithreading and Apple forces everyone to use Grand Central Dispatch. On the other hand, developers might want local scheduler either to control their priority/niceness or ensure reentrancy if used as a library.

Scheduler research

Async I/O frameworks

  • CppCoro by Lewis Baker: https://github.com/lewissbaker/cppcoro

    • Task, generator
    • Scheduler
    • Event, Barrier
    • Cancellation
    • Threadpool
    • File IO
    • Socket IO, IP
    • awaitable traits
    • awaiter, awaitable, scheduler concepts
  • C++ Elle by Docker: https://github.com/infinit/elle

    • elle: Utilities including serialization, logs, buffer, formatting, ...
    • reactor: An asynchronous framework using a coroutines scheduler
    • cryptography: Object-oriented cryptography wrapper around OpenSSL
    • protocol: Network communication designed to support RPCs
    • das: Symbol-based introspection
    • athena: Byzantine environment algorithms (Paxos)
    • service/aws: reactorified AWS API wrapper
    • service/dropbox: reactorified Dropbox API wrapper
    • nbd: A Network Block Device implementation.
  • C++ Mordor by Mozy: https://github.com/mozy/mordor

    • Streams
    • HTTP server and clients
    • logging, configuration, statistics gathering, and exceptions.
    • unit test
  • Rust async-std: https://github.com/async-rs/async-std