Replace dunai by automata/state machines #299

turion · 2024-03-11T19:02:26Z

Motivation

Unfortunately, two reasons force me to drop the dependency on dunai and replace it with an implementation of effectful Mealy state machines:

- dunai is incompatible with transformers 0.6, mtl 2.3, and GHC 9.6. See "Allow GHC 9.6" (#215) and "dunai: Does not build with transformers-0.6 because of ListT" (ivanperez-keera/dunai#402).
- There are fundamental performance issues with dunai. See "Space leak in rhine-bayes example program?" (#227) and "dunai: morphGS is probably inefficient" (ivanperez-keera/dunai#370).

Work done

This PR replaces dunai with effectful Mealy state machines in the initial encoding. The implementation is heavily inspired by https://github.com/lexi-lambda/incremental/blob/master/src/Incremental/Fast.hs and https://github.com/turion/essence-of-live-coding. It solves the two problems:

- The automaton implementation is compatible with transformers 0.6. Notably, there is no support for the old ListT.
- A benchmark is added which shows dramatic performance improvements.
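
To illustrate what "effectful Mealy state machine in the initial encoding" can mean, here is a minimal sketch. It is not the code from this PR: the names `Mealy`, `>>>>`, `arrM`, `sumFrom`, `runOn` and `wordCount` are hypothetical and chosen only for this example. The idea is that a machine is an explicit state together with a step function, so composing two machines just pairs their states instead of rebuilding a continuation on every step.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

module Main where

import Data.Functor.Identity (Identity (..))

-- | Hypothetical effectful Mealy machine in the initial encoding:
-- an initial state and a step function, with the state type hidden.
data Mealy m a b = forall s. Mealy s (s -> a -> m (b, s))

-- | Sequential composition: the composite state is the pair of both states.
(>>>>) :: Monad m => Mealy m a b -> Mealy m b c -> Mealy m a c
Mealy s1 f1 >>>> Mealy s2 f2 = Mealy (s1, s2) step
  where
    step (t1, t2) a = do
      (b, t1') <- f1 t1 a
      (c, t2') <- f2 t2 b
      pure (c, (t1', t2'))

-- | Lift an effectful function into a machine with trivial state.
arrM :: Functor m => (a -> m b) -> Mealy m a b
arrM f = Mealy () (\() a -> fmap (\b -> (b, ())) (f a))

-- | Running sum; the new total is forced before it is stored as state.
sumFrom :: Applicative m => Int -> Mealy m Int Int
sumFrom n0 = Mealy n0 (\n a -> let n' = n + a in n' `seq` pure (n', n'))

-- | Feed a list of inputs through a machine and collect the outputs.
runOn :: Monad m => Mealy m a b -> [a] -> m [b]
runOn (Mealy s0 f) = go s0
  where
    go _ [] = pure []
    go s (a : rest) = do
      (b, s') <- f s a
      bs <- go s' rest
      pure (b : bs)

-- | Cumulative word count over a stream of lines.
wordCount :: Monad m => Mealy m String Int
wordCount = arrM (pure . length . words) >>>> sumFrom 0

main :: IO ()
main =
  -- prints [6,10]
  print (runIdentity (runOn wordCount ["to be or not to be", "that is the question"]))
```

Because the state is ordinary data rather than a closure, a compiler has a better chance of inlining and unboxing the step function, which is one plausible reason for the speedups reported below.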

Benchmark

The benchmark is a simple word count implementation which counts the words of Shakespeare's complete works. Its purpose is to show how much overhead dunai and rhine introduce. It includes:

- An idiomatic Rhine implementation
- An idiomatic state automaton implementation (without Rhine clocks)
- An idiomatic dunai implementation
- Three direct implementations using text:
  - Using IORefs
  - Reading line by line, but avoiding IORefs
  - Lazy text
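
For orientation, a direct IORef-based word count could look roughly like the following sketch. It is an assumption about the general shape of such a baseline, not the benchmark code from this PR; the function `countWords` and the file `sample.txt` are made up for the example, and the real benchmark reads Shakespeare's complete works instead.

```haskell
module Main where

import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (Handle, IOMode (ReadMode), hIsEOF, withFile)

-- | Count words line by line, keeping the running total in an IORef.
countWords :: Handle -> IO Int
countWords h = do
  ref <- newIORef 0
  let loop = do
        eof <- hIsEOF h
        if eof
          then pure ()
          else do
            line <- TIO.hGetLine h
            -- Strict update, so no chain of thunks builds up in the reference.
            modifyIORef' ref (+ length (T.words line))
            loop
  loop
  readIORef ref

main :: IO ()
main = do
  -- Hypothetical sample input written just for this example.
  writeFile "sample.txt" "to be or not to be\nthat is the question\n"
  n <- withFile "sample.txt" ReadMode countWords
  print n -- prints 10
```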

For a non-Haskell baseline, the standard wc completes the benchmark in 30 ms on my reference machine.

Result with state automata

benchmarking WordCount/rhine
time                 239.8 ms   (233.2 ms .. 253.6 ms)
                     0.998 R²   (0.993 R² .. 1.000 R²)
mean                 214.1 ms   (196.9 ms .. 225.8 ms)
std dev              19.36 ms   (9.243 ms .. 27.70 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking WordCount/automaton
time                 84.87 ms   (77.64 ms .. 92.24 ms)
                     0.992 R²   (0.986 R² .. 0.998 R²)
mean                 91.60 ms   (88.83 ms .. 94.73 ms)
std dev              5.279 ms   (3.818 ms .. 7.088 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking WordCount/dunai
time                 900.6 ms   (789.3 ms .. 972.3 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 848.3 ms   (821.7 ms .. 868.4 ms)
std dev              28.27 ms   (12.92 ms .. 39.29 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking WordCount/Text/IORef
time                 90.62 ms   (87.74 ms .. 94.72 ms)
                     0.997 R²   (0.994 R² .. 1.000 R²)
mean                 89.66 ms   (88.41 ms .. 90.98 ms)
std dev              2.116 ms   (1.502 ms .. 3.124 ms)

benchmarking WordCount/Text/no IORef
time                 154.8 ms   (148.1 ms .. 163.4 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 152.5 ms   (149.9 ms .. 155.1 ms)
std dev              3.750 ms   (2.495 ms .. 5.029 ms)
variance introduced by outliers: 12% (moderately inflated)

benchmarking WordCount/Text/Lazy
time                 128.0 ms   (102.9 ms .. 141.7 ms)
                     0.962 R²   (0.841 R² .. 0.996 R²)
mean                 136.6 ms   (126.1 ms .. 155.2 ms)
std dev              21.80 ms   (10.23 ms .. 34.99 ms)
variance introduced by outliers: 48% (moderately inflated)

At around 90 ms, the naive Haskell baseline takes about 3 times as long as wc. Faster word count implementations than wc exist in Haskell, but this benchmark is about the overhead introduced by the frameworks, so I wrote a baseline implementation that is the fastest program I can imagine a Rhine program being optimized to.
Automata achieve this baseline.
Not using IORef introduces a slowdown factor of roughly 1.4 to 1.7 over this baseline in the run above.
Rhine with automata is 2.5x slower than the baseline. This is not ideal, but I find it acceptable. Further investigation would be necessary to find out where the additional overhead comes from.
dunai is far behind with a 10x slowdown.

Comparison: No automata

I will introduce the benchmark before this PR (#285). With Rhine depending on dunai, it is vastly slower: on the order of 100x slower than wc, and still over 2x slower than dunai alone, which means that clock erasure does not optimize well. The abstractions introduced in dunai-dependent Rhine are far from zero-cost.

benchmarking rhine
time                 2.368 s    (2.173 s .. 2.609 s)
                     0.999 R²   (0.996 R² .. 1.000 R²)
mean                 2.984 s    (2.696 s .. 3.352 s)
std dev              410.3 ms   (14.38 ms .. 509.5 ms)
variance introduced by outliers: 24% (moderately inflated)

Open questions, to dos

- Check whether all primitives in automata are inlined
- Fix copyrights, make clear that the API is heavily inspired by dunai

turion · 2024-05-10T16:44:50Z

CC @ners I'm pretty close to merging now :) one leftover FIXME and a final review from my side. In case you want to review, feel free!

turion force-pushed the dev_automata branch from 8dfb47b to 8b0fbb2 · March 13, 2024 10:47

This was referenced Mar 26, 2024:
- Allow GHC 9.6 #215 (Merged)
- Theory & advanced track turion/rhine-koans#21 (Open)

turion force-pushed the dev_automata branch 2 times, most recently from 0f5a463 to 1b8e153 · March 28, 2024 15:37

turion mentioned this pull request Mar 29, 2024:
- Simplify initClock? #304 (Open)

turion force-pushed the dev_automata branch from 1b8e153 to db127e2 · March 29, 2024 16:14