@niaow
Member

@niaow niaow commented Nov 8, 2025

The compiler needs to know whether a defer is in a loop to determine whether to allocate stack or heap memory. Previously, this performed a DFS of the CFG every time a defer was found. This resulted in time complexity jointly proportional to the number of defers and the number of blocks in the function.
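For illustration, the previous per-defer approach can be sketched roughly as follows. This is not the compiler's actual code: the `succs` adjacency-list representation and the `inLoopDFS` name are assumptions made for the example. Each call walks the CFG from one block, so calling it once per defer costs on the order of (number of defers) × (number of blocks and edges).

```go
package main

import "fmt"

// inLoopDFS reports whether block start can reach itself by following
// successor edges, i.e. whether it lies on a cycle of the CFG.
// succs[i] lists the successors of block i (hypothetical representation).
func inLoopDFS(succs [][]int, start int) bool {
	visited := make([]bool, len(succs))
	// Seed the work stack with the direct successors of start.
	stack := append([]int(nil), succs[start]...)
	for len(stack) > 0 {
		b := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if b == start {
			return true // found a path back to the starting block
		}
		if visited[b] {
			continue
		}
		visited[b] = true
		stack = append(stack, succs[b]...)
	}
	return false
}

func main() {
	// 0 -> 1 -> 2 -> 1 (loop between blocks 1 and 2), 2 -> 3
	succs := [][]int{{1}, {2}, {1, 3}, {}}
	for b := range succs {
		// One full traversal per queried block.
		fmt.Println(b, inLoopDFS(succs, b))
	}
}
```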

Now, the compiler will instead use Tarjan's strongly connected components algorithm to find cycles in linear time. The search is performed lazily, so this has minimal performance impact on functions without defers.

In order to implement Tarjan's SCC algorithm, additional state needed to be attached to the blocks. I chose to merge all of the per-block state into a single slice to simplify memory management.

@niaow
Member Author

niaow commented Nov 8, 2025

I think I see what is going on here. I seem to have missed it since I mostly ran the non-device tests when developing this. The map seems to have been hiding the assignment with a nil block before.

@deadprogram
Member

Passes the test corpus, with #5081 applied. At least in those tests there is no apparent performance difference.

@niaow this seems like a good idea overall. Is there a way we can benchmark it?

@deadprogram
Member

The compiler needs to know whether a defer is in a loop to determine whether to allocate stack or heap memory.
Previously, this performed a DFS of the CFG every time a defer was found.
This resulted in time complexity jointly proportional to the number of defers and the number of blocks in the function.

Now, the compiler will instead use Tarjan's strongly connected components algorithm to find cycles in linear time.
The search is performed lazily, so this has minimal performance impact on functions without defers.

In order to implement Tarjan's SCC algorithm, additional state needed to be attached to the blocks.
I chose to merge all of the per-block state into a single slice to simplify memory management.
@niaow
Member Author

niaow commented Nov 9, 2025

I made the requested change, removed index from tarjanState since we don't actually need it, and added some more comments to explain the variations used.

I don't think there is a good way or need to benchmark this. It only really matters if you have a bunch of defers in a gigantic function. We generally have other stuff that uses a lot more CPU time. I could create some crazy function with a huge number of blocks and defers but that wouldn't really be a good benchmark.

@deadprogram
Member

Thanks for the improvement here, @niaow. Now merging.

@deadprogram deadprogram merged commit 852dde6 into tinygo-org:dev Nov 10, 2025
22 of 23 checks passed