Skip to content

Set array size only when safe to do so #58848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

MilesCranmer
Copy link
Member

@MilesCranmer MilesCranmer commented Jun 29, 2025

The .size of an array might be getting set too early; before the memory is actually allocated.

@MilesCranmer MilesCranmer changed the title Set array size when safe to do so Set array size only when safe to do so Jun 29, 2025
@LilithHafner
Copy link
Member

@oscardssmith, can you look at this?

@mbauman mbauman added arrays [a, r, r, a, y, s] backport 1.11 Change should be backported to release-1.11 labels Jun 30, 2025
@MilesCranmer
Copy link
Member Author

MilesCranmer commented Jun 30, 2025

Copying my comment from my premature backport:

Ideally these would be atomic so order doesn't matter. But in case of misuse, I think this order is the least bad.

  • If the array is to shrink, then you ought to modify the length at the start.
  • If the array is to grow, then you ought to modify the length at the end.

This prevents one thread from reading .size and then indexing past the end of the array buffer before .ref has been set.

Now, while this lowers the chance of corruption from thread race, you can still totally get corruption. But this version is slightly less bad.

@vtjnash
Copy link
Member

vtjnash commented Jul 1, 2025

That would require a lock, since there are 3 fields to update (data, offset, length), which isn't really feasible. The described order is still at risk from all sorts of hoisting and sinking operations, as well as concurrent growing and shrinking on threads, but still is probably the best possible option (though I'd suggest making sure they are always exactly consecutively length then memory ref, as that is the only order that allows the CPU to do them as a single cache line update most of the time)

@MilesCranmer
Copy link
Member Author

Happy with whatever, just let me know what

they are always exactly consecutively length then memory ref, as that is the only order that allows the CPU to do them as a single cache line update most of the time

Interesting! Why does order matter if they are getting fused into a single update?

@vtjnash
Copy link
Member

vtjnash commented Jul 1, 2025

There is a write barrier after the store of memref

@MilesCranmer
Copy link
Member Author

Cool. How much would it affect real Julia performance? I would have assumed there aren’t the right effects in place for the compiler to do such an optimization – such as there permitting multiple references to the same object

@vtjnash
Copy link
Member

vtjnash commented Jul 1, 2025

The fusing will only happen if you use that order. Otherwise they will be separate operations.

@MilesCranmer
Copy link
Member Author

Sorry, are you saying that this fusing is already happening in Julia? Or that you’d like me to modify the PR so that this does happen?

If the latter, how much % performance improvement would this bring?

@vtjnash
Copy link
Member

vtjnash commented Jul 1, 2025

You'll need to modify the PR to make sure the setfield of length only occurs either immediately before another setfield, or after any other logic if there is no other setfield

@MilesCranmer
Copy link
Member Author

Current master branch doesn’t have this property; there is logic sitting in between each setfield. Maybe the ordering was profiled and then determined not to actually matter for performance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] backport 1.11 Change should be backported to release-1.11
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants