Performance Issue when upgrading from 0.3.3 to 0.4.1 #461

@mattdornfeld

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Code is based on examples in the tests found in TensorFlow Java 0.3.3
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 x86_64):
    Method profiling is done on macOS 12.4 x86_64. Resource profiling is done on Ubuntu 18.04 x86_64
  • TensorFlow installed from (source or binary):
    Installed from binary using Gradle
  • TensorFlow version (use command below):
    Tensorflow 2.4.1
  • Java version (i.e., the output of java -version):
    Compiled with Java 11, run with Java 17
  • Java command line flags (e.g., GC parameters):
    -XX:+UseG1GC
  • Python version (if transferring a model trained in Python):
    Python 3.7

Describe the current behavior
I'm trying to upgrade from TensorFlow Java 0.3.3 to 0.4.1 and am seeing significant performance degradation. I've done a good amount of profiling to help diagnose the issue, and I was hoping someone might be able to help me understand what's going on. At a high level, these performance tests construct a HashMap of tf.Tensor objects from input data and feed them to a trained Logistic Regression model for inference. Note that this profiling shows a downgrade from 0.4.1 back to 0.3.3, so the earlier data is from the newer version and the later data is from the older version.

When downgrading from 0.4.1 to 0.3.3, I see a significant decrease in p50 latency (measured in ms):
image

I also see increased CPU usage
image

Decreased memory usage
image

And increased Young Generation Garbage collection
image

Using method-level profiling with IntelliJ, I was able to see a CPU hotspot in the Tensor creation code. The newer version, 0.4.1, has significantly more CPU samples than the older version, 0.3.3.
image

I was also able to see a memory hotspot in the inference step. The newer version, 0.4.1, has significantly more memory samples than the older version, 0.3.3.
image

Describe the expected behavior
I'm having a hard time diagnosing the cause of this issue since I would expect a version upgrade to not have significantly different performance. Happy to hear any ideas on what might be causing this.

Code to reproduce the issue
It's difficult to supply a self-contained code snippet since this is a production system with many different classes, but I can supply more code as needed.
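
In the meantime, a stand-alone harness along the following lines could be used to compare the two versions outside the production system. This is only a sketch, not the actual production code: `LatencyProbe` and its methods are hypothetical names, it uses only the JDK, and the placeholder workload in `main` is an assumption that would need to be replaced with the real Tensor creation and inference calls. It records per-call latency, reports p50 and p99 in ms, and counts the garbage collections that happened during the measured run.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

// Hypothetical helper for comparing two library versions; not part of TensorFlow Java.
final class LatencyProbe {

    /** Nearest-rank percentile (0 < p <= 100) of the given samples. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Sum of collection counts over all collectors the JVM reports. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Runs the workload warmup times unmeasured, then returns nanoseconds per measured call. */
    static long[] measure(Runnable workload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            workload.run();
        }
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in Tensor creation plus the inference call.
        Runnable workload = () -> {
            double[] features = new double[256];
            Arrays.fill(features, 1.0);
        };

        long gcBefore = totalGcCount();
        long[] nanos = measure(workload, 1_000, 10_000);
        long gcDuringRun = totalGcCount() - gcBefore;

        System.out.printf("p50 = %.3f ms, p99 = %.3f ms, GC collections = %d%n",
                percentile(nanos, 50) / 1e6, percentile(nanos, 99) / 1e6, gcDuringRun);
    }
}
```

Running the same harness once against each dependency version, with the same JVM flags (for example `-XX:+UseG1GC`), should make it easier to tell whether the regression reproduces in isolation.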
