-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8301404: Factor out os::malloc with os::realloc common code, so that we only have 1 code path #24189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back gziemski! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@gerard-ziemski The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Gerard!
Does the split into pre_
and post_
alloc really help with legibility when reading the code? To me, it doesn't.
What I mean by that is: If I read os::malloc
for the first time, then os::pre_alloc
and os::post_alloc
just looks like magic to me. They don't actually say what they do. If I want to understand os::malloc
I have to go into both pre_
and post_
alloc and read the code anyway. Then, I also have the issue that I need to remember exactly what the arguments were for those two functions, because the behavior changes depending on what's passed in.
We essentially have two functions for both pre and post, actually. Here're our pre calls:
680: size_t outer_size = os::pre_alloc(&rc, nullptr, size, true, mem_tag);
712: size_t outer_size = os::pre_alloc(&rc, memblock, size, false, mem_tag);
And here are our post calls:
693: return post_alloc(outer_ptr, size, 0, mem_tag, stack);
754: rc = post_alloc(outer_ptr, size, old_size, mem_tag, stack);
If we were to partially apply these, we have:
680: size_t outer_size = os::pre_alloc_nullptr_true(&rc, mem_tag);
712: size_t outer_size = os::pre_alloc_memblock_false(&rc, mem_tag);
And we have:
693: return post_alloc_0(outer_ptr, size, mem_tag, stack);
754: rc = post_alloc_old_size(outer_ptr, size, mem_tag, stack);
This, in combination with the non-obvious names, makes this harder to understand for me. It also doesn't actually save any lines of code.
Cheers,
Johan
Thank you for the review Johan, Instead of generic I think that makes the code more readable and explains their functionality without having to read their implementation, unless you are very curious or need to. We now use |
To me it never was about saving the lines of code, which it still does - I will calculate that in a bit. The malloc and realloc are so similar that I wanted to know exactly where the differences were and saw the opportunity to understand and clean it up. I personally think that we have achieved that here. I still think we can do more here, it's not perfect, but for now I am personally happy with this refactoring. Notice that we are implementing realloc in terms of malloc. I really think we can take this further, if we want to, and do the opposite - implement malloc in terms of realloc, which would completely eliminate malloc impl. We can save that for a follow up, if we want to. |
@stefank To answer Stefan's question from Slack channel - |
My comment was along the lines that I thought it would probably be clearer if os::pre_alloc returned a pointer and let the returned size_t be an output parameter instead. Given that the function allocates memory, I'm still not convinced that returning size_t is beneficial for the readability here. I posted that comment in the Slack channel because couldn't prioritize looking at this today, but wanted to give a quick answer to the direct ping in that channel. Now that you got my attention I think that the rename to |
I see your point about |
I think this went to the wrong PR. |
I took a deeper look at the proposed patch and I'm still not convinced that this is a net positive cleanup. The added shared code now contains new branches to sort out the difference between malloc and realloc. FWIW, I poked around at the original code to see what it would look to go the other way. Push NMT-specific code deeper down so that the higher-level functions get clearer and maybe easier to maintain. If you want to take a look, I've a branch for that here: |
Ah, thanks I will reveert it. |
Taking a look (very hopeful) ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really reasonable. The new names for the refactoring are good.
Also I think I saw that you ran tier1 on this change. You should run tier1-4 at least. |
Absolutely, I usually run tier1-5, but even getting to tier1 was quite a challenge, so I stopped there for a moment. |
I see what you were looking for - you have split and shuffled the code around into logical units. It is definitively an improvement over the existing code and I do like it, but it is not addressing at all the reason I personally started to look into this, which was to share as much code as possible between realloc and malloc (or just outright implement malloc in terms of realloc), so that any NMT interaction could be reduced to just one common path. Still, I think this is a definite improvement and I would personally take it. Thank you for all your work! Can we get this into a PR please? Also, one thing in your PR would be to change either:
or:
|
Thank you for the review Coleen, unfortunately this looks like a dead end, since Johan and Stefan do not see this as an improvement. Stefan has suggested an alternate impl, which does clean up things nicely, even though it does not address at all my concerns and reasons for working on this fix. Is that correct? Johan, Stefan, you are NO on this? |
Sure. But let's see what other people delving into this area think.
Thanks for finding that. I went back-and-forth between the two names and ended up using both. |
I have to think this through. I know that this code can be deceptively tricky, and looks deceptively simple, but a lot of thought has been put into the details of the existing implementation. Plus, this code is not well covered with regression tests. For those reasons alone, I'd rather avoid changes like these. They save 36 lines of code, but the simplicity here is really subjective, and the review work needed to be put into this is time we probably won't ever get back from these simplifications. Just my 5 cent. I am not against refactoring code, far from it, but there are oh so many places in the JVM that would benefit more from some cleanup love (e.g. hotspot arenas). |
Right. I don't think the goal of having 1 code path here is currently leading to a cleanup that makes the code easier to read or maintain. |
OK, closing this one. Hopefully, we can use Stefan's alternative to move forward here. |
@gerard-ziemski you have done a lot of very good cleanups over the last years. Sorry for being negative on this one. |
This is the 2nd time I am proposing this (controversial?) change, but this time I do have performance numbers, which indicate no change in speed (using NMTBenchmark from #23786):
proposed:
time:72,642,827[ns]
[samples:807,804] [NMT headers:382,064]
[malloc#:588,703] [realloc#:12,462] [free#:206,639]
memory requested:57,274,288 bytes, allocated:69,004,800 bytes
malloc overhead:4,853,360 bytes [8.47%], NMT headers overhead:6,877,152 bytes [12.01%]
existing code:
time:73,085,446[ns]
[samples:807,804] [NMT headers:382,064]
[malloc#:588,703] [realloc#:12,462] [free#:206,639]
memory requested:57,274,288 bytes, allocated:69,004,800 bytes
malloc overhead:4,853,360 bytes [8.47%], NMT headers overhead:6,877,152 bytes [12.01%]
Note: the NMTBenchmark reports realloc(nullptr) as mallocs(), which is why both versions show the same count for mallocs/reallocs.
The performance is virtually the same where I sampled each test 30 times and took the best (the shortest).
This proposed change factors out the common code and simplifies both os::malloc and os::realloc. We were able to reduce malloc from 44 lines down to 8 (saving of 36 lines) and realloc from 84 to 55 (29 lines).
To me the most important part here is that we reduce the number of times that NMT has to interact with the native allocation code.
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24189/head:pull/24189
$ git checkout pull/24189
Update a local copy of the PR:
$ git checkout pull/24189
$ git pull https://git.openjdk.org/jdk.git pull/24189/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24189
View PR using the GUI difftool:
$ git pr show -t 24189
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24189.diff
Using Webrev
Link to Webrev Comment