-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace loop in ensureCapacity with a std.mem.max for the new capacity. #1233
Replace loop in ensureCapacity with a std.mem.max for the new capacity. #1233
Conversation
if (better_capacity >= new_capacity) break; | ||
} | ||
better_capacity += better_capacity / 2 + 8; | ||
better_capacity = std.mem.max(usize, []usize{better_capacity, new_capacity}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't what you want. You need to keep multiplying better_capacity
by two and adding 8 until you are greater than new_capacity
. The loop actually accomplishes this mathematical recurrence: f(n) = f(n - 1) + f(n - 1) / 2 + 8
. The new code does not yield the same result.
Edit: thanks @tiehuis for correcting recurrence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just clarifying, but the current factor is 1.5x f(n) = f(n-1) + f(n-1) / 2 + 8
which is arguably a better ratio than 2. We want to do this in a loop because it preserves the property that a resize will always slightly overallocate on the assumption that more items beyond the point will eventually be added.
It could be worthwhile in some cases to have an ensureCapacityExact
which resizes to the exact amount if it is less than.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the new code doesn't do the same as what the old code does. But I don't think what the old code does makes any sense. I can see that you might want slightly more than asked for, but there are simpler ways to achieve that than having a loop.
ps. Why is 1.5 better than 2? If you just append elements in a tight loop 2 is better than 1.5 (and probably >2 is even better) because you just do less allocations and copies.
pps. Just for reference. For std::vector libstdc++ has 2 and the microsoft one has 1.5. And both just take the greater of the equivalent of better_capacity and new_capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md
Reasoning for the 2 vs 1.5 discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I wonder how much that affects real life programs. However, they don't always use 1.5.
https://github.com/facebook/folly/blob/f6f9d7ead19c34c033c4b673cc7ecb2942a6cb4a/folly/FBVector.h#L1186
By the way the +8 also pushes back the point of reusing memory significantly when starting with small sizes.
Reading that piece also makes me wonder about the allocators (and realloc). Maybe they could return a larger size than you requested if that suits them better.
It's a curious problem, isn't it? Thanks for the FBVector link, @monouser7dig. As far as I'm concerned these kinds of decisions are up for debate until 1.0.0. I'm intrigued by the idea of supporting a more sophisticated allocator interface so that allocators can provide (optional?) introspection into their state and the most efficient way to utilize them. For example, an ArenaAllocator could hint that a reallocation would unconditionally have to do a memcpy, and DirectAllocator could hint that if an allocation used less than an entire page, it could realloc to gain the rest of the page for free. |
@andrewrk |
I generalized this concept with a "stack fallback allocator": Example usage: Line 2802 in 2a719ee
Implementation: Line 364 in 2a719ee
It's probably not faster, but if the goal is to avoid allocation entirely for allocations below some threshold then I think it makes sense to use. For example, probably we will use it here: Lines 320 to 325 in 2a719ee
|
The loop just looked really weird to me and I couldn't just not fix it.