Memory leak in Conv2D/Activation on GPU #46475
Description
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): Binary, the standard docker distribution
- TensorFlow version (use command below): v2.4.0-rc4-71-g582c8d236cb 2.4.0
- Python version: 3.6.9
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 11.0
- GPU model and memory: GeForce RTX 2070, 8GB
Describe the current behavior
I upgraded from TF 2.1.2 to TF 2.4.0, and training a very simple convolutional network, which worked fine in 2.1.2, started running out of memory. I distilled a minimal reproducible example that demonstrates the issue: each training epoch consumes about 50 MB of additional memory, and given enough epochs the usage grows without bound (until it exhausts the 32 GB on my machine). It only occurs on GPU; the same script runs fine on CPU.
Describe the expected behavior
Memory usage should stay roughly constant across epochs, or grow only very little.
Standalone code to reproduce the issue
```python
import gc
import os
import psutil
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, Flatten, BatchNormalization, Activation

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

input_tensor = tf.keras.layers.Input(shape=(512,64,1))
x = Conv2D(filters=32, kernel_size=(5,5), strides=(2,2), padding='same')(input_tensor)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(filters=64, kernel_size=(4,4), strides=(2,2), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Flatten()(x)
x = Dense(5, activation='sigmoid')(x)

model = tf.keras.Model(inputs=input_tensor, outputs=x)

train_x = np.random.random((2048, 512, 64, 1))
train_y = np.random.random((2048, 5))

model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam())

process = psutil.Process(os.getpid())
for i in range(50):
    model.fit(train_x, train_y, epochs=1, batch_size=32, verbose=0)
    gc.collect()
    print(i, process.memory_info().rss // 1000000)
```
Note 1
Now, if you uncomment the BatchNormalization() layers, the memory problem disappears. So it is somehow caused by an Activation layer immediately following a Conv2D layer.
Note 2
The memory problem also occurs if I train multiple epochs in a single fit() call, such as

```python
model.fit(train_x, train_y, epochs=50, batch_size=32)
```

I used the for loop only so that I could call the garbage collector and print memory usage after each epoch.
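Since the loop exists only for measurement, the same per-epoch readings can be taken inside a single multi-epoch fit() call with a callback. A minimal sketch (assuming Linux, as in the system info above; it reads RSS straight from /proc, so it does not even need psutil):

```python
import os

def rss_mb():
    """Resident set size of this process in MB, read from /proc (Linux only)."""
    with open('/proc/self/statm') as f:
        resident_pages = int(f.read().split()[1])  # second field = resident pages
    return resident_pages * os.sysconf('SC_PAGE_SIZE') // 1_000_000

# Hook it into a single multi-epoch fit() call:
# cb = tf.keras.callbacks.LambdaCallback(
#     on_epoch_end=lambda epoch, logs: print(epoch, rss_mb()))
# model.fit(train_x, train_y, epochs=50, batch_size=32, verbose=0, callbacks=[cb])
```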
Note 3
A Conv2D layer with the activation embedded in it, such as

```python
Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same', activation='relu')
```

also triggers the memory issue.
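One way to narrow down where the extra memory lives is to compare process RSS with what Python's own allocator sees: if Python-level usage stays flat while RSS keeps growing, the leak is most likely in native/CUDA allocations rather than in Python objects. A sketch of such a measurement using the stdlib tracemalloc (the list-building workload here is only a stand-in for a model.fit(...) call):

```python
import tracemalloc

def python_alloc_delta(fn):
    """Net bytes of Python-object memory still allocated after running fn."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

# Stand-in workload: retains ~100k ints, so the delta is clearly positive.
retained = []
print(python_alloc_delta(lambda: retained.extend(range(100_000))))
```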