Description
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Code is based on examples in the tests found in TensorFlow Java 0.3.3
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04 x86_64): Method profiling is done on macOS 12.4 x86_64. Resource profiling is done on Ubuntu 18.04 x86_64
- TensorFlow installed from (source or binary): Installed from binary using Gradle
- TensorFlow version (use command below): TensorFlow 2.4.1
- Java version (i.e., the output of `java -version`): Compiled with Java 11, run with Java 17
- Java command line flags (e.g., GC parameters): `-XX:+UseG1GC`
- Python version (if transferring a model trained in Python): Python 3.7
Describe the current behavior
I'm trying to upgrade from TensorFlow Java 0.3.3 to 0.4.1 and am seeing significant performance degradation after the upgrade. I've done a fair amount of profiling to help diagnose the issue and was hoping someone could help me understand what's going on. At a high level, these performance tests construct a `HashMap` of `Tensor` objects from input data and feed them to a trained logistic regression model for inference. Note that this profiling shows a downgrade from 0.4.1 back to 0.3.3, so in each measurement the earlier period is the newer version and the later period is the older version.
After downgrading from 0.4.1 to 0.3.3, I see a significant decrease in p50 latency (measured in ms).
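To compare p50 numbers across the two versions on equal terms, here is a minimal sketch of how a p50 can be computed from per-request timings. It uses the nearest-rank percentile definition, which is an assumption (many monitoring systems interpolate instead), and `runInference()` is a hypothetical placeholder for building the `HashMap` of tensors and calling the model.

```java
import java.util.Arrays;

class LatencyPercentile {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
    }

    public static void main(String[] args) {
        double[] latenciesMs = new double[1_000];
        for (int i = 0; i < latenciesMs.length; i++) {
            long start = System.nanoTime();
            runInference();
            latenciesMs[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        System.out.printf("p50 = %.3f ms%n", percentile(latenciesMs, 50));
    }

    static double sink;

    // Hypothetical placeholder: in the real test this would build the
    // HashMap of input tensors and run the model.
    static void runInference() {
        double acc = 0;
        for (int i = 0; i < 10_000; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc; // publish the result so the loop is not optimized away
    }
}
```

Running the same harness against 0.3.3 and 0.4.1 builds keeps the percentile definition identical between the two measurements.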

With 0.4.1, I also see increased CPU usage

and increased young-generation garbage collection.
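Young-generation collections can also be counted directly around the inference call with the standard `GarbageCollectorMXBean` API. This is a minimal sketch that assumes the G1 collector from the flags above, whose young collector bean is named `G1 Young Generation`; other collectors use other bean names, so the `"Young"` name filter is an assumption. The counts are JVM-wide, and the workload lambda is a hypothetical stand-in.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcCounter {
    // Publishing to a volatile field stops the JIT from eliding the allocations.
    static volatile Object sink;

    // Sums the collection counts of collectors whose name contains "Young".
    // With -XX:+UseG1GC this matches the "G1 Young Generation" bean; other
    // collectors use different bean names (e.g. "PS Scavenge" for Parallel GC).
    static long youngGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getName().contains("Young")) {
                long count = gc.getCollectionCount();
                if (count > 0) {
                    total += count; // -1 means the count is undefined for this collector
                }
            }
        }
        return total;
    }

    // Runs the workload and returns how many young collections (JVM-wide) happened meanwhile.
    static long youngGcsDuring(Runnable workload) {
        long before = youngGcCount();
        workload.run();
        return youngGcCount() - before;
    }

    public static void main(String[] args) {
        long gcs = youngGcsDuring(() -> {
            // Hypothetical allocation-heavy stand-in for building per-request tensors.
            for (int i = 0; i < 100_000; i++) {
                float[] features = new float[256];
                features[0] = i;
                sink = features;
            }
        });
        System.out.println("young GCs during workload: " + gcs);
    }
}
```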

Using method-level profiling in IntelliJ, I found a CPU hotspot in the tensor-creation code: version 0.4.1 has significantly more CPU samples there than version 0.3.3.
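To check the tensor-creation hotspot outside the profiler, the current thread's CPU time can be read before and after the suspect code with `ThreadMXBean.getCurrentThreadCpuTime()`. This is a sketch; the workload in `main` is a hypothetical placeholder for the real tensor-creation code, and CPU time spent on other threads (for example native worker threads) is not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuTimer {
    // Prevents the JIT from discarding the stand-in work.
    static volatile Object sink;

    // CPU time (not wall-clock time) consumed by the current thread while the
    // workload runs. Work done on other threads is not counted.
    static long cpuNanos(Runnable workload) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported on this JVM");
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        long before = threads.getCurrentThreadCpuTime();
        workload.run();
        return threads.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        long nanos = cpuNanos(() -> {
            // Hypothetical stand-in for the tensor-creation code.
            float[] features = new float[1_000_000];
            for (int i = 0; i < features.length; i++) {
                features[i] = i * 0.5f;
            }
            sink = features;
        });
        System.out.printf("CPU time: %.3f ms%n", nanos / 1e6);
    }
}
```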

I also found a memory hotspot in the inference step: version 0.4.1 has significantly more memory samples there than version 0.3.3.
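Per-thread heap allocation during the inference step can be measured in a similar way with the `com.sun.management.ThreadMXBean` extension. This sketch assumes a JVM that provides that extension (such as HotSpot; the cast fails otherwise) and uses `getThreadAllocatedBytes(long)` so it also compiles for Java 11. It only sees Java heap allocations; native (off-heap) memory is not included. The workload in `main` is a hypothetical stand-in.

```java
import java.lang.management.ManagementFactory;

class AllocationMeter {
    // Keeps the stand-in allocation from being optimized away.
    static volatile Object sink;

    // Approximate heap bytes allocated by the current thread while the workload runs.
    static long allocatedBytes(Runnable workload) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!threads.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("allocation tracking is not supported on this JVM");
        }
        if (!threads.isThreadAllocatedMemoryEnabled()) {
            threads.setThreadAllocatedMemoryEnabled(true);
        }
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        workload.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytes(() -> {
            // Hypothetical stand-in for the inference step's per-request buffers.
            sink = new float[1_000][64];
        });
        System.out.println("allocated bytes: " + bytes);
    }
}
```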

Describe the expected behavior
I'm having a hard time diagnosing the cause of this issue, since I would not expect a version upgrade to change performance this significantly. I'm happy to hear any ideas on what might be causing this.
Code to reproduce the issue
It's difficult to supply a self-contained code snippet since this is a production system with many different classes, but I can supply more code as needed.
