Interaction between Tensorflow Java and JavaCPP Pointer deallocation #208
Comments
If you're confident you don't need GC, we can set the "org.bytedeco.javacpp.nopointergc" system property to "true" to reduce overhead. |
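As a minimal sketch of the suggestion above, the property can be set programmatically, or passed on the command line as -Dorg.bytedeco.javacpp.nopointergc=true. Only the property name comes from this thread; the class name NoPointerGcConfig is made up for illustration. The sketch assumes JavaCPP reads the property when its classes are first loaded, so it has to be set before anything touches TF Java or DJL.

```java
public class NoPointerGcConfig {
    // Property name as quoted in this thread.
    static final String NO_POINTER_GC = "org.bytedeco.javacpp.nopointergc";

    /** Sets the property and returns the value now visible to the JVM. */
    public static String disablePointerGc() {
        System.setProperty(NO_POINTER_GC, "true");
        return System.getProperty(NO_POINTER_GC);
    }

    public static void main(String[] args) {
        // Assumption: this runs before any JavaCPP class has been initialized.
        System.out.println(NO_POINTER_GC + "=" + disablePointerGc());
    }
}
```

With the GC-based deallocation turned off this way, native memory has to be released explicitly, for example by closing the scopes or sessions that own it.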
I am not sure whether I need GC or not. Looking at the EagerSession class, it seems to be managing memory through the pointer scope. The NDArrays I allocate through DJL/TF use the same mechanism. |
I will try the system property you have mentioned and run my benchmarks, thanks for such a quick response! |
You'll need to keep in mind, though, that neither TF nor DJL is particularly designed to reduce GC. JavaCPP is only a small part of the overall design, and there is a lot of garbage that gets generated elsewhere. |
Still investigating how GC optimizations work in the DJL + Tensorflow Environment. |
Like I said, make sure to set the "org.bytedeco.javacpp.nopointergc" system property to "true" to prevent JavaCPP from calling |
I've tried the org.bytedeco.javacpp.nopointergc and very quickly ran out of memory :-) |
If I keep it "false", then in JVM with G1GC, I spend 25% of the time in GC on a high throughput inference use-case, with occasional blocking. |
stacktrace related to allocation
|
So I've tried it: the blocking calls are gone, the thread with the JavaCPP Deallocator also disappears, and there are no more blocking GCs. The only issue is that I quickly run out of memory :-) |
There is a team discussion on DJL around Tensorflow Java and JavaCPP performance, if you are interested - deepjavalibrary/djl#625 |
Obviously, when not using the GC, you'll need to make sure to deallocate native memory some other way! |
True that, I've just tested the theory that PointerScope takes care of my
specific use case for inference. But you are right, there is quite a lot of
garbage besides that.
|
Eager execution does allocate a lot of native resources, as each operation and each of its outputs needs to be freed. I don't know much about the details of the GC feature of JavaCPP we are using now, but I can tell that in 1.14, the GC listener was running in a separate thread and was trying to free resources referenced by deleted objects, but only as a "best effort", since GC is not entirely reliable when it comes to native memory. @saudet, this thread-based implementation was just listening to a phantom reference queue and was therefore non-blocking; is JavaCPP doing something similar? Ultimately, resources should be cleaned up by closing the eager session enclosing some piece of code, and that part is still true:
This ensures that all resources are released independently of the GC. So while it is not enforced by the API, it is recommended to scope your eager operations into multiple eager sessions instead of relying only on the default one (i.e. the one used when simply invoking |
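To make the "listener on a phantom reference queue" idea concrete, here is a rough sketch using only java.lang.ref. It is not the TF Java 1.14 code nor JavaCPP's implementation; PhantomCleanupSketch, ResourceRef, track and drain are names invented for this example. The release action must not reference the owner object, otherwise the owner never becomes unreachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PhantomCleanupSketch {
    /** Phantom reference that carries the action freeing the native resource. */
    public static final class ResourceRef extends PhantomReference<Object> {
        final Runnable release;

        ResourceRef(Object owner, ReferenceQueue<Object> queue, Runnable release) {
            super(owner, queue);
            this.release = release;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the phantom references themselves reachable until they are processed.
    private final Set<ResourceRef> pending = ConcurrentHashMap.newKeySet();

    /** Registers a release action to run once the owner has been garbage collected. */
    public ResourceRef track(Object owner, Runnable release) {
        ResourceRef ref = new ResourceRef(owner, queue, release);
        pending.add(ref);
        return ref;
    }

    /** Non-blocking, best effort: releases whatever has been enqueued so far. */
    public int drain() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ResourceRef r = (ResourceRef) ref;
            if (pending.remove(r)) {
                r.release.run();
                released++;
            }
        }
        return released;
    }

    /** Starts a daemon thread that waits on the queue, so allocating threads never block. */
    public Thread startListener() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    ResourceRef r = (ResourceRef) queue.remove();
                    if (pending.remove(r)) {
                        r.release.run();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "cleanup-listener");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        PhantomCleanupSketch tracker = new PhantomCleanupSketch();
        Object owner = new Object();
        ResourceRef ref = tracker.track(owner, () -> System.out.println("native memory released"));
        ref.enqueue(); // stands in for the GC discovering that the owner is unreachable
        System.out.println("released: " + tracker.drain());
    }
}
```

Since nothing guarantees when (or whether) the GC enqueues a reference, this can only ever be a safety net next to explicit closing.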
It's doing that, but it also blocks by default when maxBytes or maxPhysicalBytes is reached, unless that is set to 0 as done in DJL here: https://github.com/awslabs/djl/blob/master/tensorflow/tensorflow-engine/src/main/java/ai/djl/tensorflow/engine/LibUtils.java#L52
Java users are typically used to having per-process memory thresholds like that, but it looks like users of DJL consider this a "bug", and prefer to have it behave more like C/C++/Python, where we can always allocate all memory available on the system with no restrictions. |
If I leave the default settings for JavaCPP, use a single interop and intraop thread in Tensorflow, and use G1GC, I get weird behavior. Everything works fine until a certain load threshold, after which the heap gets to about 100% and full GC cycles cannot clean anything in it. It seems like something is holding pointers forever. I need to test custom maxPhysicalBytes params.
|
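For anyone wanting to experiment with the maxBytes and maxPhysicalBytes thresholds mentioned above, a small sketch follows. The property names org.bytedeco.javacpp.maxbytes and org.bytedeco.javacpp.maxphysicalbytes are my assumption of how these settings are exposed as system properties, as is passing plain byte counts; please check both against the JavaCPP documentation. That 0 removes the limit is taken from the comment above. JavaCppLimitsSketch and setLimits are made-up names.

```java
public class JavaCppLimitsSketch {
    // Assumed property names; verify them against the JavaCPP documentation.
    static final String MAX_BYTES = "org.bytedeco.javacpp.maxbytes";
    static final String MAX_PHYSICAL_BYTES = "org.bytedeco.javacpp.maxphysicalbytes";

    /** Stores both limits as byte counts; 0 is meant to disable the threshold. */
    public static void setLimits(long maxBytes, long maxPhysicalBytes) {
        if (maxBytes < 0 || maxPhysicalBytes < 0) {
            throw new IllegalArgumentException("limits must be >= 0");
        }
        System.setProperty(MAX_BYTES, Long.toString(maxBytes));
        System.setProperty(MAX_PHYSICAL_BYTES, Long.toString(maxPhysicalBytes));
    }

    public static void main(String[] args) {
        // Assumption: must run before JavaCPP classes are initialized to have any effect.
        setLimits(0, 0);
        System.out.println(MAX_BYTES + "=" + System.getProperty(MAX_BYTES));
        System.out.println(MAX_PHYSICAL_BYTES + "=" + System.getProperty(MAX_PHYSICAL_BYTES));
    }
}
```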
That sounds like an issue somewhere in TF Java, not JavaCPP. If you have a simple way to reproduce this, please let us know! |
I do, let me write it up.
|
@saudet, I just want to verify one more thing with you concerning the GC listening thread in JavaCPP. Before, in TF Java (1.x), each resource that had native memory allocated was being referred by a So basically, that phantom reference contains all the information required to release any native resources associated with a particular object that was garbage collected. In JavaCPP, how do you keep track of which object refers to which native resources? Does it exclusively rely on |
No, there are |
Ok so the Then another question, if the |
Yes, that's the idea. Everything is centered around Pointer to make it easier to reason about, and I've found that we can also use it to map anything of interest from the C/C++ world, well, anything that has to do with memory living on the native heap anyway.
It stores a strong reference, otherwise this kind of pattern may result in a crash:
try (PointerScope scope = new PointerScope()) {
    IntPointer something = new IntPointer(10);
    someNativeObject.keepReferenceTo(something);
    // The Java object "something" is no longer referenced from here, but someNativeObject expects the native memory to be
    // still allocated, so this call may crash, unless an object like PointerScope keeps a strong reference, which it does.
    someNativeObject.doSomething();
}
In the event that a scope is not closed, but becomes itself unreachable though, everything it contains becomes eligible for GC. |
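The effect of a scope holding strong references can be illustrated without any native code. The following is only a toy model, not PointerScope itself: Scope and FakeNative are hypothetical stand-ins, and releasing in reverse order of attachment is my own choice for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StrongScopeSketch {
    /** Stand-in for an object that owns native memory. */
    public static final class FakeNative implements AutoCloseable {
        private boolean released;

        public boolean isReleased() {
            return released;
        }

        @Override
        public void close() {
            released = true;
        }
    }

    /** Scope that strongly references everything attached to it until close(). */
    public static final class Scope implements AutoCloseable {
        private final Deque<FakeNative> attached = new ArrayDeque<>();

        public FakeNative attach(FakeNative resource) {
            attached.push(resource);
            return resource;
        }

        @Override
        public void close() {
            // Release in reverse order of attachment (a choice made for this sketch).
            while (!attached.isEmpty()) {
                attached.pop().close();
            }
        }
    }

    public static void main(String[] args) {
        FakeNative something = new FakeNative();
        try (Scope scope = new Scope()) {
            scope.attach(something);
            // Still allocated here, whether or not user code keeps its own reference.
            System.out.println("released inside scope: " + something.isReleased());
        }
        System.out.println("released after scope: " + something.isReleased());
    }
}
```

The flip side, which matters for the rest of this thread, is that a scope which stays open keeps everything attached to it alive.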
Ok, this new paradigm introduced once we switched to JavaCPP is probably the cause of the issue here. Before, the EagerSession scope was acting weakly and would not prevent the native memory of its garbage-collected resources from being released.
Some sessions (like the default one, which I think @skirdey uses) are never or rarely closed, and therefore rely entirely on the GC or on explicit closing of the resources themselves (the tensors, for example). Since a strong reference is kept on the native pointer, the memory will never be released. That certainly happens with the EagerOperation resources: https://github.com/tensorflow/java/blob/439373d4a76f4ba3665f55b929bb74876d4d96b7/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/EagerOperation.java#L56
It would be great if a user could decide to attach a Pointer weakly or strongly to a PointerScope. If that's not feasible, then we might want to prevent attaching these resources to the eager session if we know that session is the default one. But there is probably a cleaner way to fix this as well.
@skirdey, do you build your own version of TF Java when you are doing your tests, or do you use the prebuilt one coming with DJL? It would be interesting to see the behaviour of your code with this patch added. |
@karllessard I would love to try a patch. I am using the stock version of Tensorflow that comes with DJL. I am still not sure what the patch would be, so if you have one, send it my way :-) |
By the way, when using nopointergc=true the stock DJL benchmark goes OOM as well (deepjavalibrary/djl#690), so if you want to try it locally without my specific use case, you can also do that.
|
@karllessard @saudet @skirdey Here is my thought.
Either approach tries to get rid of GC (System.gc()), which slows down the entire system. The right pattern for coding a deep learning system is to make sure all native resources are tracked by a scope and released without leaks and without the help of the GC. @saudet Does JavaCPP support approach 2 if users are working on memory-intensive applications? |
Looking at the DJL code, TfNDManager creates a single session with the async option: https://github.com/awslabs/djl/blob/master/tensorflow/tensorflow-engine/src/main/java/ai/djl/tensorflow/engine/TfNDManager.java#L69 I cannot tell whether it is a "default" session or not. |
I see, that's what @rnett was referring to in pull #188 (comment).
It's never a good idea to rely on the GC. We cannot avoid situations like I mention above where the native side keeps a reference to an existing Tensor, but where the Java side doesn't. The only sane way to deal with this is by not relying on the GC at all. We can still have it as a sort of option for users that don't want to use something like TensorScope though, but in that case, since we cannot offer any guarantees anyway, it makes no difference whether eager session holds on to weak references, or no reference at all!
Which patch? |
I checked the source code. It should not be a default EagerSession. The defaultEagerSession is created only when you call initDefault(options) or getDefault() |
JavaCPP supports both the "Java style", that is using GC via |
@karllessard I'm trying to find where the old code was using weak references, and I can't find it. I don't remember removing anything like that myself either. From what I can tell, EagerSession has always been keeping strong references with this map: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/EagerSession.java#L499 |
|
@stu1130 , even if it is not the default session, if the created session remains open for a long time while creating a lot of ops and tensors, the issue can happen
@saudet: like @Craigacp pointed out,
There is none yet :) Ideally, like we discussed, it would be possible to refer weakly to a |
Ah, that's where I screwed up. I remember getting confused about what If we still want to have something that behaves like the original implementation, we should probably just replace the |
Ok, I understand that you want JavaCPP to follow closely the (future) behaviour of the JDK. I'm personally totally comfortable with what @rnett proposed about weak tensor scopes, which is quite identical to the original behaviour of the eager sessions. So TF could have its own "weak" pointer scope. @rnett , were you planning to change your proposed If that's reshuffling too much of your work, we can simply keep our own weak references directly in the |
I wasn't planning on doing it as part of TensorScope, but rather as part of Ops or Scope since we're passing it around anyways. But that's all up for discussion, I don't have anything particularly firm yet. |
Ok let's play it simple for now and I'll try to see how it works by simply replacing the |
Ok, so the issue was very easy to reproduce. Starting a JVM with only 256M of memory, this simple loop was hitting an OOM between 30K and 40K iterations:
public static void main(String[] args) {
    try (EagerSession s = EagerSession.create()) {
        Ops tf = Ops.create(s);
        while (true) {
            tf.math.add(tf.constant(2), tf.constant(2));
        }
    }
}
I pushed this PR #229 to keep only weak references on eager resources in the session, as proposed earlier, and now the garbage collection allows this loop to run forever. I'm pretty confident this should fix the issue observed earlier by @skirdey when using DJL. |
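For readers following along, the idea of a session that only keeps weak references can be sketched as below. This is not the code from the PR; WeakSession, FakeNative and prune are hypothetical names. The sketch also assumes that a resource collected by the GC has its native memory freed by some GC-driven deallocator, as discussed earlier in the thread.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class WeakSessionSketch {
    /** Stand-in for an eager resource that owns native memory. */
    public static final class FakeNative implements AutoCloseable {
        private boolean released;

        public boolean isReleased() {
            return released;
        }

        @Override
        public void close() {
            released = true;
        }
    }

    /** Session that tracks its resources weakly, so it never keeps them alive by itself. */
    public static final class WeakSession implements AutoCloseable {
        private final List<WeakReference<FakeNative>> attached = new ArrayList<>();

        public FakeNative attach(FakeNative resource) {
            attached.add(new WeakReference<>(resource));
            return resource;
        }

        /** Drops entries whose resource was already collected; returns how many were dropped. */
        public int prune() {
            int before = attached.size();
            attached.removeIf(ref -> ref.get() == null);
            return before - attached.size();
        }

        @Override
        public void close() {
            // Only resources still reachable from user code need an explicit release here.
            for (WeakReference<FakeNative> ref : attached) {
                FakeNative resource = ref.get();
                if (resource != null) {
                    resource.close();
                }
            }
            attached.clear();
        }
    }

    public static void main(String[] args) {
        FakeNative kept = new FakeNative();
        try (WeakSession session = new WeakSession()) {
            session.attach(kept);
            session.attach(new FakeNative()); // unreachable right away, so the GC may reclaim it
            session.prune();
        }
        System.out.println("kept resource released on close: " + kept.isReleased());
    }
}
```

In a long-running loop like the one above, the list of cleared references would itself grow, so a real implementation would need to prune regularly or use a reference queue.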
Awesome, I can give it a try!
|
I am trying to understand why, when I use DJL with the Tensorflow engine, a good amount of time is spent in GC, while both DJL and Tensorflow Java seem to use PointerScope and do not rely on GC for object cleanup.
For example, I see JavaCPP Pointer.deallocator gets invoked https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/Pointer.java#L666-L667
which is a heavy synchronized call.
Any help is appreciated.