runtime (gc_blocks.go): use best-fit allocation #5105
base: dev
Conversation
The allocator originally just looped through the blocks until it found a sufficiently long range. This is simple, but it fragments very easily and can degrade to a full heap scan for long requests. Instead, we now maintain a sorted nested list of free ranges by size. The allocator selects the shortest range that is still long enough, which generally reduces fragmentation, and the data structure can find a range in time directly proportional to the requested length.
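As a rough illustration of the mechanism described above, here is a minimal sketch of a size-sorted free-range list with best-fit selection. This is not the actual gc_blocks.go code: freeList, sizeClass, and all field names here are hypothetical, and the real allocator works on block indices in the heap metadata.

```go
// freeRange is a run of contiguous free heap blocks.
type freeRange struct {
	start, length int
}

// sizeClass groups all free ranges of one length. Classes form a singly
// linked list sorted by ascending size: the "sorted nested list".
type sizeClass struct {
	size   int
	ranges []freeRange
	next   *sizeClass
}

type freeList struct {
	head *sizeClass // smallest size class first
}

// alloc walks the class list from the smallest size upward and takes a
// range from the first class that is big enough, i.e. the best fit.
// Because the list is sorted ascending, a lookup only walks past classes
// smaller than the request, so the cost grows with the requested length
// rather than with the size of the heap.
func (f *freeList) alloc(want int) (freeRange, bool) {
	for c := f.head; c != nil; c = c.next {
		if c.size < want || len(c.ranges) == 0 {
			continue
		}
		r := c.ranges[len(c.ranges)-1]
		c.ranges = c.ranges[:len(c.ranges)-1]
		if r.length > want {
			// Hand back the unused tail as a smaller free range.
			f.insert(freeRange{r.start + want, r.length - want})
		}
		return freeRange{r.start, want}, true
	}
	return freeRange{}, false // nothing fits; the caller runs a GC cycle
}

// insert files a range under its size class, keeping the list sorted.
func (f *freeList) insert(r freeRange) {
	p := &f.head
	for *p != nil && (*p).size < r.length {
		p = &(*p).next
	}
	if *p == nil || (*p).size != r.length {
		*p = &sizeClass{size: r.length, next: *p}
	}
	(*p).ranges = append((*p).ranges, r)
}
```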
This is the same basic mechanism as #1181, but it is a lot cleaner.
This adds 100-300 bytes of code. We need to decide if this is worth it.
I often run out of memory because of fragmentation, so I heartily support anything that combats it.
@dgryski, can you take a look to see whether it helps with GC performance?
In general, "best fit" is going to reduce fragmentation at the expense of CPU time. An allocation-heavy benchmark (in this case the binary trees benchmark game) shows this to be the case: Our current allocation scheme is "next fit". Interestingly, using Running the binary trees benchmark with |
It might be worth waiting until #5104 is merged; I remember being able to optimize the free-range construction on my experiments branch by exploiting the new metadata format, whereas the current construction code just loops over the individual blocks. Also, are you using array-backed trees, or trees whose nodes are each allocated as separate fixed-size objects? If the latter, that is basically the worst case for this change: there isn't really any meaningful fragmentation to remove in the first place.
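To make the "worst case" point concrete: when every allocation has the same size, as with separately allocated tree nodes, any freed slot can satisfy any later request, so best fit has no fragmentation to win back. A sketch of that allocation pattern (written from the general shape of the binary trees benchmark, not copied from it):

```go
// Every node is the same size, so frees leave uniformly sized holes
// and the heap cannot meaningfully fragment; any free slot fits any
// future node.
type node struct {
	left, right *node
}

// bottomUpTree allocates 2^(depth+1)-1 identical nodes.
func bottomUpTree(depth int) *node {
	if depth <= 0 {
		return &node{}
	}
	return &node{
		left:  bottomUpTree(depth - 1),
		right: bottomUpTree(depth - 1),
	}
}
```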
Can you link the trees code so I can debug the SEGV?
I'm using https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-go-2.html . So yes, fixed-size allocations for the tree nodes.
Oh right, that SEGV is the race condition where we release the GC lock before writing the layout bitmap. I fixed it in #5102 while reorganizing the
Also, the main issue with the binary trees benchmark here is that the collector is not the bottleneck; the lock is. If you switch to
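A hypothetical illustration of that bottleneck: if every allocation takes one global heap lock, allocation-heavy goroutines serialize on the lock regardless of how fast the free-range search is. heapLock and alloc below are stand-ins, not TinyGo's real identifiers.

```go
package main

import "sync"

// Stand-in for a runtime with a single global heap lock.
var (
	heapLock sync.Mutex
	bump     uintptr
)

// alloc serializes all allocators on heapLock; with many goroutines
// allocating tree nodes concurrently, time goes to lock contention
// rather than to the collector or the free-range search.
func alloc(size uintptr) uintptr {
	heapLock.Lock()
	defer heapLock.Unlock()
	p := bump
	bump += size
	return p
}
```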
A downstream commit referencing this PR:

Add SSTGCHint() and related functions to optimize GC behavior for SST's simpler memory patterns. SST has fundamentally different memory characteristics:
- Single shared stack (no per-goroutine allocations)
- Fixed-size event queues (pre-allocated)
- Tasks created once at startup
- Run-to-completion (no blocking state)
The best-fit allocator from PR tinygo-org#5105 is not critical for SST because the allocation patterns are much more predictable and less prone to fragmentation.
Performance in the problematic `go/format` benchmark: