JVM memory leaks #7

mboes · 2016-11-14T22:29:45Z

One well-known difficulty of interoperating two languages with automatic memory management is that the two garbage collectors tend to cut the grass under each other's feet. This is because each language has its own heap, which the GC of the other language can't traverse. A common solution to this problem (see e.g. the HaskellR docs) is to register the objects referenced from one heap as GC roots in the other heap. Even then cycles can be an issue, but those are uncommon and/or can often be statically ruled out.

But unlike HaskellR, the jvm package has so far left the memory management conundrum largely as "future work". In practice though, things work surprisingly well as-is. This is because the JNI does much of the work for us already. When the JNI provides a reference to some object, the reference is implicitly added as a GC root. So jvm is at least safe, in that objects won't just disappear under the Haskell program's feet (but see below). What's more, the JNI automatically pops these GC roots when the control flow returns from a native activation frame on the call stack, so leaks are not an issue in the (common) simple cases.

However, with the JNI we have other problems:

- References are thread-local. That means that the programmer shouldn't play games trying to store Java references in long-lived structures shared between multiple threads. There is currently no mechanism in place to statically protect the programmer from herself. References can be made thread-global at a small performance cost. It is probably best to let the programmer do so explicitly, but we don't even have bindings for that yet.
- Even storing JVM object references in long-lived thread-local structures won't do, because the dynamic scope of the reference would then extrude from its lexical scope (remember that the JNI invalidates these references upon returning from a native call).

Both of these issues can be solved using monadic regions, in the style of Kiselyov and Shan. Regions give static guarantees that thread-local object references can't escape the lexical scope (i.e. can't live longer than the current activation frame).
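To make the idea concrete, here is a minimal sketch of an ST-style region, assuming nothing about the actual jvm API. The names `Region`, `LocalRef`, `newLocal`, `useLocal` and `runRegion` are all hypothetical, and local references are simulated with plain `Int`s rather than real JNI handles. The point is only the rank-2 type of `runRegion`, which is what keeps a reference from escaping.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef', newIORef, writeIORef)

-- | A computation running inside region @s@ (hypothetical type).
-- In a real library the constructor would not be exported.
newtype Region s a = Region { unRegion :: IORef [Int] -> IO a }

instance Functor (Region s) where
  fmap f (Region g) = Region (fmap f . g)

instance Applicative (Region s) where
  pure x = Region (\_ -> pure x)
  Region f <*> Region x = Region (\live -> f live <*> x live)

instance Monad (Region s) where
  Region m >>= k = Region (\live -> m live >>= \a -> unRegion (k a) live)

-- | Stand-in for a JNI local reference, tagged with the region that owns it.
newtype LocalRef s = LocalRef Int

-- | Simulated allocation: record the reference as live in the current region.
newLocal :: Int -> Region s (LocalRef s)
newLocal n = Region $ \live -> do
  modifyIORef' live (n :)
  pure (LocalRef n)

-- | Simulated use of a reference; only possible inside its own region.
useLocal :: LocalRef s -> Region s Int
useLocal (LocalRef n) = Region (\_ -> pure n)

-- | Run a region. Because @s@ is universally quantified, the result type
-- @a@ cannot mention it, so no 'LocalRef s' can be returned to the caller.
runRegion :: (forall s. Region s a) -> IO a
runRegion r = do
  live <- newIORef []
  -- Clearing the list stands in for releasing every local reference.
  unRegion r live `finally` writeIORef live []

main :: IO ()
main = do
  n <- runRegion (do
         ref <- newLocal 42
         useLocal ref)
  print n
  -- The following is rejected by the type checker, since @s@ would escape:
  -- leaked <- runRegion (newLocal 42)
```

Note that this sketch only shows the scoping trick. Nested regions and sharing references between parent and child regions, as in Kiselyov and Shan's design, need more machinery than shown here.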

One thing to keep in mind, however, is that monadic regions do have costs:

- They need an ST-like monad transformer on top of IO, with a dummy type variable to track the active region. So code is no longer in vanilla IO.
- They impose a monadic style everywhere Java objects are accessed, even in code that could otherwise be considered pure and written in direct style.
- They impose a stack-like discipline on memory management. Our experience suggests this is quite okay in practice, since it just means some objects end up living slightly longer (but predictably so) than they otherwise would. Ideally, though, the programmer would retain more fine-grained control over the lifetime of resources.

A long-term solution to both of those problems is to extend GHC with linear types. Tweag I/O is currently working with GHC HQ and Gothenburg university on precisely that (see https://ghc.haskell.org/trac/ghc/wiki/LinearTypes for an early writeup of the proposal). Linear types in this context would make it possible to avoid the inconvenience of a stack-like memory management discipline. One would be able to free objects at any point, while still avoiding two GCs killing each other in a duel. In the short term, though, I reckon our only bet is to embrace monadic regions if programmers do need the extra static checking.

To summarize, I see two action items here:

- Introduce monadic regions for extra static checking of local references to JVM objects.
- Introduce an interface that lets the programmer explicitly graduate local references to global references for advanced use cases. These global references would be modeled as a ForeignPtr in Haskell, so that finalizers can be attached which remove the global reference from the JVM once the object becomes unreachable from Haskell.
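As a rough illustration of the second item, the sketch below wraps a global reference in a `ForeignPtr` whose finalizer deletes it. It does not use real JNI bindings: `FakeJvm`, `fakeNewGlobalRef`, `fakeDeleteGlobalRef`, `GlobalRef` and `promote` are hypothetical stand-ins, with the fake functions playing the role of the JNI's `NewGlobalRef` and `DeleteGlobalRef`, and the "JVM" is just a counter of live global references.

```haskell
module Main where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import Foreign.Ptr (Ptr, nullPtr, plusPtr)

-- | Opaque stand-in for a JVM object.
data JObjectRaw

-- | Hypothetical fake JVM: just counts live global references.
type FakeJvm = IORef Int

-- | Stand-in for JNI NewGlobalRef (assumption: not a real binding).
fakeNewGlobalRef :: FakeJvm -> Ptr JObjectRaw -> IO (Ptr JObjectRaw)
fakeNewGlobalRef jvm p = modifyIORef' jvm (+ 1) >> pure p

-- | Stand-in for JNI DeleteGlobalRef (assumption: not a real binding).
fakeDeleteGlobalRef :: FakeJvm -> Ptr JObjectRaw -> IO ()
fakeDeleteGlobalRef jvm _ = modifyIORef' jvm (subtract 1)

-- | A global reference whose lifetime is tied to Haskell reachability.
newtype GlobalRef = GlobalRef (ForeignPtr JObjectRaw)

-- | Graduate a local reference to a global one and attach a finalizer
-- that deletes the global reference again.
promote :: FakeJvm -> Ptr JObjectRaw -> IO GlobalRef
promote jvm local = do
  g <- fakeNewGlobalRef jvm local
  GlobalRef <$> newForeignPtr g (fakeDeleteGlobalRef jvm g)

main :: IO ()
main = do
  jvm <- newIORef 0
  let local = nullPtr `plusPtr` 0x1000   -- fake local reference
  GlobalRef fp <- promote jvm local
  readIORef jvm >>= print                -- one live global reference
  finalizeForeignPtr fp                  -- run the finalizer eagerly for the demo
  readIORef jvm >>= print                -- back to zero
```

In a real implementation the finalizer would normally run whenever the Haskell GC gets around to it, on a thread of the runtime's choosing, so it would presumably have to make sure that thread is attached to the JVM before calling into the JNI.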

This is all still up for discussion. For example, an alternative we could consider is to use only global references everywhere, with finalizers, no local references. But I worry about the performance overhead of such a strategy, which we'd have to measure carefully.

cc @robinbb @alpmestan @dcoutts


mboes mentioned this issue Nov 30, 2016

Free attached threads #6

Closed

mboes mentioned this issue May 15, 2017

Document the performance cost of local/global refs creation #65

Closed

facundominguez mentioned this issue Jul 3, 2017

Introduce type-level regions #73

Open

parimalyeole1 pushed a commit that referenced this issue Feb 3, 2023

test ci with darwin runner #7

d2b9b99
