Memory leak in Conv2D/Activation on GPU #46475
Description
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): Binary, the standard docker distribution
- TensorFlow version (use command below): v2.4.0-rc4-71-g582c8d236cb 2.4.0
- Python version: 3.6.9
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 11.0
- GPU model and memory: GeForce RTX 2070, 8GB
Describe the current behavior
I upgraded from TF 2.1.2 to TF 2.4.0, and training a very simple convolutional network, which worked fine in 2.1.2, started running out of memory. I distilled a minimal reproducible example that demonstrates the issue: each training epoch consumes about 50 MB of additional memory, and given enough epochs the usage grows without bound (until it exhausts the 32 GB on my machine). It only occurs on GPU; the same script runs fine on CPU.
Describe the expected behavior
Memory usage should stay roughly constant across epochs, or grow only very little.
Standalone code to reproduce the issue
```python
import gc
import os
import psutil
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, Flatten, BatchNormalization, Activation

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

input_tensor = tf.keras.layers.Input(shape=(512,64,1))
x = Conv2D(filters=32, kernel_size=(5,5), strides=(2,2), padding='same')(input_tensor)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(filters=64, kernel_size=(4,4), strides=(2,2), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Flatten()(x)
x = Dense(5, activation='sigmoid')(x)

model = tf.keras.Model(inputs=input_tensor, outputs=x)

train_x = np.random.random((2048, 512, 64, 1))
train_y = np.random.random((2048, 5))

model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam())

process = psutil.Process(os.getpid())
for i in range(50):
    model.fit(train_x, train_y, epochs=1, batch_size=32, verbose=0)
    gc.collect()
    print(i, process.memory_info().rss // 1000000)
```
Note 1
Now, if you uncomment the BatchNormalization() layers, the memory problem disappears. So it is somehow caused by an Activation layer immediately following a Conv2D layer.
Note 2
The memory problem also occurs if I train multiple epochs in a single fit() call, such as

```python
model.fit(train_x, train_y, epochs=50, batch_size=32)
```

I used the for loop only so that I could call the garbage collector and print memory usage after each epoch.
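Since the loop exists only for measurement, the same per-epoch readings can be taken inside a single multi-epoch fit() call with a callback. A minimal sketch (assuming Linux, as in the system info above; it reads RSS straight from /proc, so it does not even need psutil):

```python
import os

def rss_mb():
    """Resident set size of this process in MB, read from /proc (Linux only)."""
    with open('/proc/self/statm') as f:
        resident_pages = int(f.read().split()[1])  # second field = resident pages
    return resident_pages * os.sysconf('SC_PAGE_SIZE') // 1_000_000

# Hook it into a single multi-epoch fit() call:
# cb = tf.keras.callbacks.LambdaCallback(
#     on_epoch_end=lambda epoch, logs: print(epoch, rss_mb()))
# model.fit(train_x, train_y, epochs=50, batch_size=32, verbose=0, callbacks=[cb])
```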
Note 3
A Conv2D layer with the activation embedded in it, such as

```python
Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same', activation='relu')
```

also triggers the memory issue.
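One way to narrow down where the extra memory lives is to compare process RSS with what Python's own allocator sees: if Python-level usage stays flat while RSS keeps growing, the leak is most likely in native/CUDA allocations rather than in Python objects. A sketch of such a measurement using the stdlib tracemalloc (the list-building workload here is only a stand-in for a model.fit(...) call):

```python
import tracemalloc

def python_alloc_delta(fn):
    """Net bytes of Python-object memory still allocated after running fn."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

# Stand-in workload: retains ~100k ints, so the delta is clearly positive.
retained = []
print(python_alloc_delta(lambda: retained.extend(range(100_000))))
```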