
Memory leak under OpenJ9 #36

Closed · Ristovski opened this issue Jul 12, 2019 · 12 comments

Comments

@Ristovski

Running Canvas under OpenJ9-JDK11 (not to be confused with OpenJDK 9) seems to be causing a gradual memory leak that eventually requires a full client restart.

What's even more interesting is that RAM usage remains roughly the same (as reported by htop); only buff/cache (as reported by free -mh under Linux) seems to go up. Thus, not only is a client restart required, but the buff/cache also has to be freed via echo 3 > /proc/sys/vm/drop_caches.

This might be caused by virtual memory, but I am not sure.

I initially suspected OpenJ9's "JIT Compiler" threads, since those are the only ones that seem to be doing any disk I/O. But that would imply it happens all the time, and it only happens with Canvas loaded.

Do note the whole buff/cache thing could be a side effect of the high memory allocation and might well be irrelevant.

Asking around Fabric's Discord, a user shared the following screenshot of OptiFine's author explaining a similar issue with OpenJ9 (apparently /not/ OpenJDK9, as written in the screenshot):
[screenshot: OptiFine author's explanation of the OpenJ9 issue]

Note how JDK 9-12 are supposed to mitigate this, yet it happens with JDK 11 anyway.

No idea if relevant or helpful, but going underwater seems to accelerate the issue.

I will try and get more accurate RAM/VRAM measurements later today or tomorrow.

Any ideas on how to try debugging this?

@grondag
Collaborator

grondag commented Jul 13, 2019

Thank you for the excellent report.

Best guess is that it is caused by some incompatibility with LWJGL's MemoryUtil allocation routines, which default to using Unsafe if available, then reflection if not.

There is/was such a bug with OpenJDK 9 which, as you point out, is not the same as OpenJ9. But that does illustrate that such bugs are possible.

The biggest consumer in Canvas will be chunk buffer uploads, for which Canvas uses MemoryUtil.memAlloc() and MemoryUtil.memFree(). It is likely the calls are made on different threads, but they should be properly matched 1:1.

The only other use of memAlloc/memFree is for light map texture initialization, but that is a one-time allocation and likely not the culprit.

To reproduce this in isolation, a simple test app that allocates and deallocates on multiple threads with a variety of buffer sizes might do the job (see the sketch below). I will eventually get around to doing that myself, but if you want to have a go at it, that would be great.
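Something along these lines could serve as a starting point. This is only a minimal sketch, assuming LWJGL 3's MemoryUtil on the classpath; the class name, buffer sizes, iteration count, and thread count are all arbitrary:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.lwjgl.system.MemoryUtil;

// Standalone stress test: allocate off-heap buffers of varying sizes on one
// thread and free them on worker threads, then watch RSS and buff/cache
// externally (htop, free -mh) for growth.
public class MemoryUtilStressTest {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10_000; i++) {
            int size = 64 * 1024 + ThreadLocalRandom.current().nextInt(4 * 1024 * 1024);
            ByteBuffer buf = MemoryUtil.memAlloc(size); // allocate here...
            pool.submit(() -> MemoryUtil.memFree(buf)); // ...free on another thread
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```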

As for how to fix it, I will probably replace those buffers with a pool of fixed-size buffers for reuse, which may help; or perhaps LWJGL offers a different allocation routine that is more compatible (though perhaps less performant). I have not done that research yet.
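As a rough illustration of the pooled idea (not actual Canvas code; the class name and the 1 MiB buffer size are made up):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.lwjgl.system.MemoryUtil;

// Hypothetical fixed-size buffer pool: buffers are allocated once and reused,
// so the native allocator sees far fewer alloc/free pairs.
final class BufferPool {
    private static final int BUFFER_SIZE = 1 << 20; // 1 MiB, arbitrary
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    ByteBuffer claim() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : MemoryUtil.memAlloc(BUFFER_SIZE);
    }

    void release(ByteBuffer buf) {
        buf.clear();      // reset position/limit for the next user
        free.offer(buf);  // return to the pool instead of freeing
    }
}
```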

If the issue turns out to be allocate/release on different threads, I should be able to force both onto the client thread if necessary.

@grondag
Collaborator

grondag commented Jul 13, 2019

Also, a faster if less exact way to confirm the problem is the chunk buffer allocation: just open a world and load a bunch of render chunks, either via F3+A or by moving across terrain. If the leak is worse when you do that, it is almost certainly a problem with the MemoryUtil routines.

@ghost

ghost commented Jul 13, 2019

The memory leak is still present with AdoptOpenJDK's JDK12-OpenJ9 build and worsens as I move across terrain I did not previously visit.

Sorry if my comment did not answer anything.

@grondag
Collaborator

grondag commented Jul 14, 2019

That does help some. I need to confirm it doesn’t happen on other JVMs and then I can try a different allocation approach.

@Ristovski
Author

So I just tested spamming F3+A in a superflat world and I can confirm that with Canvas the RAM usage seems to go up over time (and especially buff/cache).

So it does indeed seem like some of the allocations are not getting freed.

Both instances (with and without Canvas) started off at about 850-900 MB of RAM; the one without Canvas stayed at that value most of the time, while the one with Canvas shot up to 1800 MB fairly quickly.

The 'memory allocated' metric in the F3 debug screen stayed the same, along with the reported memory use. (Which I guess confirms the issue is with the MemoryUtil allocations.)

@ghost

ghost commented Jul 14, 2019

Memory leak also happens on:

  • Zulu 11.31.11
  • AdoptOpenJDK 12 with HotSpot (JRE)

@grondag
Collaborator

grondag commented Jul 17, 2019

Thank you for the confirmation. I'm focused on a different (not yet released) mod currently, but will get back to working on Canvas "soon."

@Ristovski
Author

Ristovski commented Aug 3, 2019

@grondag any updates on this?

Currently looking into https://github.com/LWJGL/lwjgl3-wiki/wiki/2.5.-Troubleshooting#memory-allocator-debug-mode, which could yield some more information.
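For reference, if I'm reading the LWJGL wiki right, that debug mode can be enabled either with -Dorg.lwjgl.util.DebugAllocator=true on the JVM command line or programmatically before any allocations happen. A small sketch (class name is just for illustration):

```java
import org.lwjgl.system.Configuration;

public class DebugAllocatorExample {
    public static void main(String[] args) {
        // Same effect as launching with -Dorg.lwjgl.util.DebugAllocator=true:
        // LWJGL then tracks memAlloc/memFree pairs and reports any unfreed
        // allocations (with stack traces) when the process exits.
        Configuration.DEBUG_MEMORY_ALLOCATOR.set(true);

        // ... start the game / test code after this point
    }
}
```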

@grondag
Collaborator

grondag commented Aug 5, 2019

[screenshot: in-game view, 2019-08-04 17:35:38]

AdoptOpenJDK 12 with HotSpot is working great on OSX. Guessing it is OS-specific.

I will try adding some additional config options to at least allow for debugging, and maybe more conservative memory allocation. A rewrite of buffering altogether is what I really need to do, but that will take longer than I have until work on Exotic Matter and the fluid API gets to a stopping point.

@grondag grondag closed this as completed in 3405dc1 Aug 6, 2019
@grondag
Collaborator

grondag commented Aug 7, 2019

Should hopefully be better in build 354, available now on Curse: https://www.curseforge.com/minecraft/mc-mods/canvas-renderer/files/2756313

I found some scenarios where it may not have been deallocating properly. If that doesn't fix the problem, try enabling "safe memory allocation" on the debug config tab. That will use NIO buffers, which are much slower but are automatically deallocated via JVM garbage collection.
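For context on the trade-off, the two allocation paths look roughly like this. This is a generic illustration, not the actual Canvas code; the method names are invented:

```java
import java.nio.ByteBuffer;

import org.lwjgl.system.MemoryUtil;

public class AllocationModes {
    // "Safe" path: a JVM-managed direct buffer. Slower, but reclaimed by the
    // garbage collector once unreachable, so a missed free cannot leak it.
    static ByteBuffer safeAllocate(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    // Default path: LWJGL's explicit allocator. Fast, but every memAlloc must
    // be matched by a memFree or the native memory leaks.
    static ByteBuffer fastAllocate(int size) {
        return MemoryUtil.memAlloc(size);
    }

    static void release(ByteBuffer buf, boolean safeMode) {
        if (!safeMode) {
            MemoryUtil.memFree(buf); // only explicit allocations need freeing
        }
    }
}
```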

Please let me know if the problem persists.

@Ristovski
Author

Ristovski commented Aug 7, 2019

Just tested the new version and it's better! Memory usage still goes up over time though, albeit much slower than before.

With NIO buffers, the memory usage is still a bit higher but doesn't seem to go up as much.

One thing I did notice when spamming F3+A in rapid succession was that with Canvas the CPU usage goes up and the FPS drops to ~19, while without Canvas reloading chunks at that speed causes no FPS drop.

This happens both with default and NIO buffers. I don't recall this happening with previous versions, but I could be wrong and might test that out later.

@grondag
Collaborator

grondag commented Aug 7, 2019

It does do extra work now on chunk reload to ensure that all buffers are deallocated. When an early reload happens that way, there can easily be a couple thousand buffers awaiting access to the main thread for upload, especially on a fast machine. Now it should block until they are all released.
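Roughly speaking (an illustrative sketch only; the class and method names are invented, not the actual Canvas implementation), the pending uploads are queued for the render thread and a reload now drains and frees whatever is still queued:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.lwjgl.system.MemoryUtil;

// Buffers built off-thread wait here until the render thread uploads them.
final class PendingUploads {
    private final ConcurrentLinkedQueue<ByteBuffer> pending = new ConcurrentLinkedQueue<>();

    void enqueue(ByteBuffer buf) {
        pending.offer(buf);
    }

    // Called on chunk reload (e.g. F3+A) before new buffers are built, so
    // nothing that was still awaiting upload is left allocated.
    void drainAndFree() {
        ByteBuffer buf;
        while ((buf = pending.poll()) != null) {
            MemoryUtil.memFree(buf);
        }
    }
}
```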
