Skip to content

8301404: Factor out os::malloc with os::realloc common code, so that we only have 1 code path #24189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

gerard-ziemski
Copy link

@gerard-ziemski gerard-ziemski commented Mar 24, 2025

This is the 2nd time I am proposing this (controversial?) change, but this time I do have performance numbers, which indicate no change in speed (using NMTBenchmark from #23786):

proposed:

time:72,642,827[ns]
[samples:807,804] [NMT headers:382,064]
[malloc#:588,703] [realloc#:12,462] [free#:206,639]
memory requested:57,274,288 bytes, allocated:69,004,800 bytes
malloc overhead:4,853,360 bytes [8.47%], NMT headers overhead:6,877,152 bytes [12.01%]
existing code:

time:73,085,446[ns]
[samples:807,804] [NMT headers:382,064]
[malloc#:588,703] [realloc#:12,462] [free#:206,639]
memory requested:57,274,288 bytes, allocated:69,004,800 bytes
malloc overhead:4,853,360 bytes [8.47%], NMT headers overhead:6,877,152 bytes [12.01%]
Note: the NMTBenchmark reports realloc(nullptr) as mallocs(), which is why both versions show the same count for mallocs/reallocs.

The performance is virtually the same where I sampled each test 30 times and took the best (the shortest).

This proposed change factors out the common code and simplifies both os::malloc and os::realloc. We were able to reduce malloc from 44 lines down to 8 (saving of 36 lines) and realloc from 84 to 55 (29 lines).

To me the most important part here is that we reduce the number of times that NMT has to interact with the native allocation code.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8301404: Factor out os::malloc with os::realloc common code, so that we only have 1 code path (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24189/head:pull/24189
$ git checkout pull/24189

Update a local copy of the PR:
$ git checkout pull/24189
$ git pull https://git.openjdk.org/jdk.git pull/24189/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24189

View PR using the GUI difftool:
$ git pr show -t 24189

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24189.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 24, 2025

👋 Welcome back gziemski! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 24, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Mar 24, 2025

@gerard-ziemski The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Mar 24, 2025
@gerard-ziemski gerard-ziemski marked this pull request as ready for review March 27, 2025 23:02
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 27, 2025
@mlbridge
Copy link

mlbridge bot commented Mar 27, 2025

Copy link
Contributor

@jdksjolen jdksjolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gerard!

Does the split into pre_ and post_ alloc really help with legibility when reading the code? To me, it doesn't.

What I mean by that is: If I read os::malloc for the first time, then os::pre_alloc and os::post_alloc just looks like magic to me. They don't actually say what they do. If I want to understand os::malloc I have to go into both pre_ and post_ alloc and read the code anyway. Then, I also have the issue that I need to remember exactly what the arguments were for those two functions, because the behavior changes depending on what's passed in.

We essentially have two functions for both pre and post, actually. Here're our pre calls:

680:  size_t outer_size = os::pre_alloc(&rc, nullptr, size, true, mem_tag);
712:  size_t outer_size = os::pre_alloc(&rc, memblock, size, false, mem_tag);

And here are our post calls:

693:  return post_alloc(outer_ptr, size, 0, mem_tag, stack);
754:    rc = post_alloc(outer_ptr, size, old_size, mem_tag, stack);

If we were to partially apply these, we have:

680:  size_t outer_size = os::pre_alloc_nullptr_true(&rc, mem_tag);
712:  size_t outer_size = os::pre_alloc_memblock_false(&rc, mem_tag);

And we have:

693:  return post_alloc_0(outer_ptr, size, mem_tag, stack);
754:    rc = post_alloc_old_size(outer_ptr, size, mem_tag, stack);

This, in combination with the non-obvious names, makes this harder to understand for me. It also doesn't actually save any lines of code.

Cheers,
Johan

@openjdk openjdk bot removed the rfr Pull request is ready for review label Mar 31, 2025
@gerard-ziemski
Copy link
Author

gerard-ziemski commented Mar 31, 2025

Thank you for the review Johan,

Instead of generic pre_alloc, post_alloc names, which explain nothing I am now using pre_init_and_check_size and record_and_clear_bits

I think that makes the code more readable and explains their functionality without having to read their implementation, unless you are very curious or need to.

We now use ptr instead of the old rc, which I have no idea what it stood for.

@gerard-ziemski
Copy link
Author

gerard-ziemski commented Mar 31, 2025

To me it never was about saving the lines of code, which it still does - I will calculate that in a bit.

The malloc and realloc are so similar that I wanted to know exactly where the differences were and saw the opportunity to understand and clean it up.

I personally think that we have achieved that here. I still think we can do more here, it's not perfect, but for now I am personally happy with this refactoring.

Notice that we are implementing realloc in terms of malloc. I really think we can take this further, if we want to, and do the opposite - implement malloc in terms of realloc, which would completely eliminate malloc impl. We can save that for a follow up, if we want to.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 31, 2025
@gerard-ziemski
Copy link
Author

@stefank To answer Stefan's question from Slack channel - NMTPreInit::handle_realloc() takes handle and returns bool, so I was trying to stay consistent, which is why os::pre_init_and_check_size() takes handle and returns size.

@stefank
Copy link
Member

stefank commented Mar 31, 2025

@stefank To answer Stefan's question from Slack channel - NMTPreInit::handle_realloc() takes handle and returns bool, so I was trying to stay consistent, which is why os::pre_init_and_check_size() takes handle and returns size.

My comment was along the lines that I thought it would probably be clearer if os::pre_alloc returned a pointer and let the returned size_t be an output parameter instead. Given that the function allocates memory, I'm still not convinced that returning size_t is beneficial for the readability here.

I posted that comment in the Slack channel because couldn't prioritize looking at this today, but wanted to give a quick answer to the direct ping in that channel.

Now that you got my attention I think that the rename to pre_init_and_check_size lost the hint that it was performing "pre init allocation". The "and check size" is also an indication that the function does two different, unrelated(?) things. I would need to look closer to give suggestions for cleanups.

@gerard-ziemski
Copy link
Author

gerard-ziemski commented Mar 31, 2025

@stefank To answer Stefan's question from Slack channel - NMTPreInit::handle_realloc() takes handle and returns bool, so I was trying to stay consistent, which is why os::pre_init_and_check_size() takes handle and returns size.

My comment was along the lines that I thought it would probably be clearer if os::pre_alloc returned a pointer and let the returned size_t be an output parameter instead. Given that the function allocates memory, I'm still not convinced that returning size_t is beneficial for the readability here.

I posted that comment in the Slack channel because couldn't prioritize looking at this today, but wanted to give a quick answer to the direct ping in that channel.

Now that you got my attention I think that the rename to pre_init_and_check_size lost the hint that it was performing "pre init allocation". The "and check size" is also an indication that the function does two different, unrelated(?) things. I would need to look closer to give suggestions for cleanups.

I see your point about os::pre_init_and_check_size(), but then we should touch NMTPreInit::handle_realloc() as well?

@stefank
Copy link
Member

stefank commented Mar 31, 2025

pass explicit mem_tag value, skip exec parameter when false

I think this went to the wrong PR.

@stefank
Copy link
Member

stefank commented Mar 31, 2025

I took a deeper look at the proposed patch and I'm still not convinced that this is a net positive cleanup. The added shared code now contains new branches to sort out the difference between malloc and realloc.

FWIW, I poked around at the original code to see what it would look to go the other way. Push NMT-specific code deeper down so that the higher-level functions get clearer and maybe easier to maintain. If you want to take a look, I've a branch for that here:
master...stefank:jdk:24189_alternative

@gerard-ziemski
Copy link
Author

pass explicit mem_tag value, skip exec parameter when false

I think this went to the wrong PR.

Ah, thanks I will reveert it.

@gerard-ziemski
Copy link
Author

I took a deeper look at the proposed patch and I'm still not convinced that this is a net positive cleanup. The added shared code now contains new branches to sort out the difference between malloc and realloc.

FWIW, I poked around at the original code to see what it would look to go the other way. Push NMT-specific code deeper down so that the higher-level functions get clearer and maybe easier to maintain. If you want to take a look, I've a branch for that here: master...stefank:jdk:24189_alternative

Taking a look (very hopeful) ...

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really reasonable. The new names for the refactoring are good.

@coleenp
Copy link
Contributor

coleenp commented Apr 1, 2025

Also I think I saw that you ran tier1 on this change. You should run tier1-4 at least.

@gerard-ziemski
Copy link
Author

Also I think I saw that you ran tier1 on this change. You should run tier1-4 at least.

Absolutely, I usually run tier1-5, but even getting to tier1 was quite a challenge, so I stopped there for a moment.

@gerard-ziemski
Copy link
Author

I took a deeper look at the proposed patch and I'm still not convinced that this is a net positive cleanup. The added shared code now contains new branches to sort out the difference between malloc and realloc.

FWIW, I poked around at the original code to see what it would look to go the other way. Push NMT-specific code deeper down so that the higher-level functions get clearer and maybe easier to maintain. If you want to take a look, I've a branch for that here: master...stefank:jdk:24189_alternative

I see what you were looking for - you have split and shuffled the code around into logical units. It is definitively an improvement over the existing code and I do like it, but it is not addressing at all the reason I personally started to look into this, which was to share as much code as possible between realloc and malloc (or just outright implement malloc in terms of realloc), so that any NMT interaction could be reduced to just one common path.

Still, I think this is a definite improvement and I would personally take it.

Thank you for all your work! Can we get this into a PR please?

Also, one thing in your PR would be to change either:

realloc_impl --> realloc_inner

or:

malloc_inner --> malloc_impl

@gerard-ziemski
Copy link
Author

Thank you for the review Coleen, unfortunately this looks like a dead end, since Johan and Stefan do not see this as an improvement. Stefan has suggested an alternate impl, which does clean up things nicely, even though it does not address at all my concerns and reasons for working on this fix.

Is that correct? Johan, Stefan, you are NO on this?

@stefank
Copy link
Member

stefank commented Apr 1, 2025

I took a deeper look at the proposed patch and I'm still not convinced that this is a net positive cleanup. The added shared code now contains new branches to sort out the difference between malloc and realloc.
FWIW, I poked around at the original code to see what it would look to go the other way. Push NMT-specific code deeper down so that the higher-level functions get clearer and maybe easier to maintain. If you want to take a look, I've a branch for that here: master...stefank:jdk:24189_alternative

I see what you were looking for - you have split and shuffled the code around into logical units. It is definitively an improvement over the existing code and I do like it, but it is not addressing at all the reason I personally started to look into this, which was to share as much code as possible between realloc and malloc (or just outright implement malloc in terms of realloc), so that any NMT interaction could be reduced to just one common path.

Still, I think this is a definite improvement and I would personally take it.

Thank you for all your work! Can we get this into a PR please?

Sure. But let's see what other people delving into this area think.

Also, one thing in your PR would be to change either:

realloc_impl --> realloc_inner

or:

malloc_inner --> malloc_impl

Thanks for finding that. I went back-and-forth between the two names and ended up using both.

@tstuefe
Copy link
Member

tstuefe commented Apr 1, 2025

I have to think this through. I know that this code can be deceptively tricky, and looks deceptively simple, but a lot of thought has been put into the details of the existing implementation. Plus, this code is not well covered with regression tests.

For those reasons alone, I'd rather avoid changes like these. They save 36 lines of code, but the simplicity here is really subjective, and the review work needed to be put into this is time we probably won't ever get back from these simplifications.

Just my 5 cent. I am not against refactoring code, far from it, but there are oh so many places in the JVM that would benefit more from some cleanup love (e.g. hotspot arenas).

@stefank
Copy link
Member

stefank commented Apr 1, 2025

Thank you for the review Coleen, unfortunately this looks like a dead end, since Johan and Stefan do not see this as an improvement. Stefan has suggested an alternate impl, which does clean up things nicely, even though it does not address at all my concerns and reasons for working on this fix.

Is that correct? Johan, Stefan, you are NO on this?

Right. I don't think the goal of having 1 code path here is currently leading to a cleanup that makes the code easier to read or maintain.

@gerard-ziemski
Copy link
Author

OK, closing this one. Hopefully, we can use Stefan's alternative to move forward here.

@tstuefe
Copy link
Member

tstuefe commented Apr 1, 2025

OK, closing this one. Hopefully, we can use Stefan's alternative to move forward here.

@gerard-ziemski you have done a lot of very good cleanups over the last years. Sorry for being negative on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

5 participants