Performance issues with Bytes literals #3954

ceedubs · 2023-05-02T14:59:54Z

In general large Bytes values seem to be fine, but literal values of Bytes have bad performance and result in excessive resource usage that tends to get the Unison process killed by the OS.

Here is a fairly minimal way to reproduce:

Start up Unison with a fresh codebase: ❯ unison -C $(mktemp -d)

Open a scratch.u file and save the following (you don't need to add it):

main : '{Exception, IO} Bytes
main = do
  splitmix 42 do
    Random.bytes 100000

.> run main
.> add.run largeBytes

This all happens fairly quickly. Now I open my scratch file, add the watch expression > Bytes.size largeBytes and save. My ucm process hangs for a while and then gets killed by my OS.

@dolio and I talked about this a bit, and he said that Bytes literals don't have a dedicated representation in the AST and that they end up getting inefficiently encoded as [Nat] where each Nat represents a single Byte. It sounds like there are even further issues with excessive stack frames for large Bytes literals.

The text was updated successfully, but these errors were encountered:

dolio · 2023-05-02T15:16:46Z

Yeah, the problem is that e.g. 0xs0123 is actually something like fromList [0x01, 0x23] which then turns into (pseudocode):

let x0u = 0x01u -- unboxed number
    x0 = Nat x0u
    x1u = 0x23u
    x1 = Nat x1u
    l = mkList x0 x1
 in Bytes.fromList l

So, there's a massive blowup from that representation. Not only is every byte represented as 8 bytes, but code has to be emitted to build the representation, and put it in a sequence. So there's a huge per byte overhead on the code representation even.

The temporary workaround I suggested is to use a string literal of the hex digits, because the code emitted for strings is much more compact. It'd be nice if that were the actual representation of bytes literals, or if they were directly represented. I guess code could be written to detect the same fromBytes [...] with all literal numbers in the compiler, but that's additional work.

ceedubs added the performance label May 2, 2023

aryairani added the team-arya label May 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance issues with Bytes literals #3954

Performance issues with Bytes literals #3954

ceedubs commented May 2, 2023

dolio commented May 2, 2023 •

edited

Performance issues with Bytes literals #3954

Performance issues with Bytes literals #3954

Comments

ceedubs commented May 2, 2023

dolio commented May 2, 2023 • edited

dolio commented May 2, 2023 •

edited