
Excessive memory consumption and preparation runtime of tf.keras.backend.max in custom layer with masking #37479

Closed
padoremu opened this issue Mar 10, 2020 · 9 comments
Assignees
Labels
comp:keras Keras related issues · TF 2.1 for tracking issues in 2.1 release · type:bug Bug · type:performance Performance Issue

Comments

@padoremu

System information

  • Have I written custom code: yes
  • OS Platform and Distribution: Linux Ubuntu 18.04
  • Mobile device if the issue happens on mobile device: -
  • TensorFlow installed from: binary
  • TensorFlow version: 2.2.0-dev20200303
  • Python version: 3.6.9
  • Bazel version: -
  • GCC/Compiler version: -
  • CUDA/cuDNN version: CPU only
  • GPU model and memory: CPU only

Describe the current behavior
Memory consumption seems to be proportional to num_iterations and is therefore excessive, which most likely indicates a memory leak. The time until the first fit result appears is also extremely long: 15 seconds until the first fit call, 55 seconds until the result of the first fit is shown, while the remaining fits run through in less than a second. Apparently, the runtime is spent on memory management rather than on the actual max function evaluation.

When tf.keras.backend.max is used together with tf.stack to compute a mask in a real setup, memory consumption increases steadily until the process runs out of memory at approximately 30 GB. In contrast, without compute_mask, memory consumption does not exceed approximately 1 GB.
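For reference, the growth can also be observed programmatically. Below is a minimal sketch, assuming psutil is available (any RSS readout works equally well); the numbers above were simply read off a system monitor such as top.

import os
import psutil  # assumption: psutil is installed; not part of the original report

def rss_mb():
  # Resident set size of the current Python process in megabytes.
  return psutil.Process(os.getpid()).memory_info().rss / 1e6

print('RSS before building the model: %.0f MB' % rss_mb())
# ... build and fit the model from the reproduction code below ...
print('RSS after fit: %.0f MB' % rss_mb())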

Describe the expected behavior
I would expect memory consumption to be independent of num_iterations and therefore much lower, and I would expect the preparation runtime to be much shorter as well.

Code to reproduce the issue

import tensorflow as tf
import numpy as np

batch_size = 100
dim_input = 100
dim_output = 1
num_iterations = 100  # will consume approx. 5 GB RAM when set to 1000


class CustomMask(tf.keras.layers.Layer):
  def __init__(self):
    super(CustomMask, self).__init__()

  def compute_mask(self, inputs, mask=None):
    batch_size = inputs.shape[0]

    # One maximum per batch row, shape (batch_size,).
    batch_maxes = tf.keras.backend.max(inputs, axis=1)

    # batch_size * num_iterations calls to max; the results are discarded,
    # the calls alone trigger the memory growth described above.
    for batch in range(batch_size):
      for i in range(num_iterations):
        _ = tf.keras.backend.max(batch_maxes[batch])

    return None

  def call(self, inputs, mask=None):
    return inputs


model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(batch_input_shape=(batch_size, dim_input)))
model.add(CustomMask())
model.add(tf.keras.layers.Dense(dim_output))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

training_input = np.zeros([batch_size, dim_input])
training_output = np.zeros([batch_size, dim_output])

model.fit(training_input, training_output, batch_size=batch_size)
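
For contrast, here is a minimal, purely illustrative sketch of the same reduction expressed without Python loops. This is not proposed as the fix; it only isolates the looped op creation from the vectorized case.

import tensorflow as tf

# Illustrative only: one vectorized reduction instead of
# batch_size * num_iterations individual max calls.
class VectorizedMask(tf.keras.layers.Layer):
  def compute_mask(self, inputs, mask=None):
    batch_maxes = tf.keras.backend.max(inputs, axis=1)  # shape: (batch_size,)
    overall_max = tf.keras.backend.max(batch_maxes)     # a single further op
    return None

  def call(self, inputs, mask=None):
    return inputs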

Other info / logs
If my usage of tf.keras.backend.max is wrong with regard to memory consumption and/or runtime, please let me know. I need to call it frequently within compute_mask to compute a custom mask in conjunction with tf.stack (a rough sketch of that usage follows below). However, the latter does not seem to be the problem, which is why I left it out of the stripped-down code.
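
To make the real use case a bit more concrete, here is a hypothetical sketch of the kind of mask computation meant above; the threshold, the helper name and the shapes are made up for illustration and are not the actual code.

import tensorflow as tf

# Hypothetical illustration only: a boolean mask built from per-row maxima
# and stacked back into a single tensor, roughly as described above.
def build_mask(inputs, threshold=0.5):
  rows = tf.unstack(inputs, axis=0)  # requires a static batch dimension
  row_masks = [tf.keras.backend.max(row) > threshold for row in rows]
  return tf.stack(row_masks)         # shape: (batch_size,), dtype bool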

@padoremu padoremu added the type:bug Bug label Mar 10, 2020
@gadagashwini-zz gadagashwini-zz added the comp:keras Keras related issues, TF 2.1 for tracking issues in 2.1 release and type:performance Performance Issue labels, and removed the type:bug Bug label Mar 11, 2020
@gadagashwini-zz
Contributor

@padoremu, when I tried to execute the code with num_iterations=1000, the session crashed, which shows it took more than 12 GB of RAM. Please find the gist here and confirm the issue. Thanks!

@gadagashwini-zz gadagashwini-zz added the stat:awaiting response Status - Awaiting response from author label Mar 11, 2020
@padoremu
Author

Thanks. I can confirm the issue.

@gadagashwini-zz gadagashwini-zz removed the stat:awaiting response Status - Awaiting response from author label Mar 12, 2020
@gowthamkpr gowthamkpr assigned fchollet and unassigned gowthamkpr Mar 12, 2020
@gowthamkpr gowthamkpr added the type:bug Bug label Mar 12, 2020
@padoremu
Author

I would like to kindly ask whether there is any news on this issue.

@padoremu
Author

padoremu commented Apr 9, 2020

This issue has been inactive for one month now. I would appreciate some feedback very much. Thank you.

@padoremu
Author

padoremu commented Apr 20, 2020

@gadagashwini @gowthamkpr @fchollet Is there any chance that a TensorFlower could comment on this issue? That would be very kind. Thank you.

@Saduf2019
Contributor

@padoremu
Can you please try with the latest TF version and check whether the issue still persists.

If it does, please post this issue on the keras-team/keras repo.
To learn more, refer to:
https://discuss.tensorflow.org/t/keras-project-moved-to-new-repository-in-https-github-com-keras-team-keras/1999

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Sep 19, 2021
@padoremu
Author

@Saduf2019
Thank you for following up after so long. I just tried with a fresh tf-nightly installation (2.7.0-dev20210920) and the initially posted code, and I can still reproduce the problem perfectly: the larger num_iterations is set, the more memory consumption grows. I recommend setting num_iterations = 1000 and observing the evolution with e.g. top on Linux. Memory consumption increases steadily; after 15 minutes it is already above 20 GB on my machine.

Since this issue was created one and a half years ago, as you can imagine I have had to find ways to avoid needing this kind of functionality. Please feel free to move this issue to the keras-team/keras repo. My motivation to invest more time in communicating and documenting this problem is limited, but of course I would still be interested in a solution or fix. Anybody can easily reproduce the problem with the initially posted code; that is all one needs.

Thank you.

@Saduf2019 Saduf2019 removed the stat:awaiting response Status - Awaiting response from author label Sep 21, 2021
@tensorflowbutler
Member

Hi There,

This is a stale issue. As you are using an older version of tensorflow, we are checking to see if you still need help on this issue. Please test the issue with the latest TensorFlow (TF2.7 and tf-nightly). If the issue still persists with the newer versions of TF, please feel free to open it in keras-team/keras repository by providing details about the issue and a standalone code to reproduce the issue. Thanks!

Please note that Keras development has moved to a separate Keras-team/keras repository to focus entirely on only Keras. Thanks!
