System information
TensorFlow version (use command below): v2.3.0-rc2-23-gb36436b087 2.3.0
Describe the feature and the current behavior/state.
I am using TensorFlow 2.3 to implement a paper that has an official PyTorch version. Everything works except for the training batch size: the official PyTorch code runs with a batch_size of 32 on 8 GPUs, while I can only fit 16 with the same GPUs and the same network settings. I used the TensorBoard Profiler to inspect and optimize my training loop.
Here is the Profiler output for 10 steps of my model training.
You can see from the image that the peak heap usage occurs when the GradientTape requests memory for the last layer of the network. After that allocation, however, memory usage drops from 7.41 GiB to only about 6 GiB, so I am wondering why TF2 allocates so much heap at the beginning of each training step and then leaves that part unused for the rest of the loop, and what the difference between heap usage and memory usage is. Is there any way to optimize the heap allocation so that I can fit a batch_size of 32 into GPU memory?
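For reference, the training step follows the standard GradientTape pattern, and a trace like the one above can be captured with the programmatic profiler API. A minimal self-contained sketch (the stand-in conv model, random data, and log directory are illustrative, not my actual EDVR code):

import tensorflow as tf

# Stand-in model and optimizer; the real network is much larger.
model = tf.keras.Sequential([tf.keras.layers.Conv2D(3, 3, padding='same')])
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(batch_x, batch_y):
    # The backward pass recorded by the tape is where the profiler
    # reports the peak heap allocation.
    with tf.GradientTape() as tape:
        pred = model(batch_x, training=True)
        loss = tf.reduce_mean(tf.square(batch_y - pred))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Capture a 10-step trace; it shows up in TensorBoard's Profile tab,
# including the memory timeline discussed above.
batch_x = tf.random.normal([16, 360, 640, 3])
batch_y = tf.random.normal([16, 360, 640, 3])
tf.profiler.experimental.start('logs/profile')
for _ in range(10):
    train_step(batch_x, batch_y)
tf.profiler.experimental.stop()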
I also noticed that the memory capacity the TF Profiler shows is 10.96 GiB, which is only about 90% of the physical memory. My GPU has 12196 MiB, so there should still be some space available for training. Workarounds from various blogs, such as tf.config.experimental.set_memory_growth(gpu, True), do not help. I am therefore looking for a way to let TF use that remaining part of the GPU memory, i.e. 100% of it.
I tried to configure this with the following code, as the documentation suggests, but it still failed:
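(Reconstructed from the TensorFlow GPU guide; this is the standard per-GPU memory-growth setup the documentation recommends, not necessarily my exact snippet.)

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Let TF grow its allocation on demand instead of grabbing
        # a fixed fraction of GPU memory up front.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)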
I am sure that if these two problems are solved, I can increase my batch_size from 16 to 32, since a lot of GPU memory is currently going to waste. I sincerely appreciate your help.
I am training on multiple GPUs, so here is the error traceback:
Traceback (most recent call last):
File "main.py", line 29, in <module>
main()
File "main.py", line 25, in main
trainer.train()
File "/home/lz/potter/EDVR/trainers/train.py", line 238, in train
loss, acc = self.train_epoch(epoch)
File "/home/lz/potter/EDVR/trainers/train.py", line 191, in train_epoch
loss, psnr = self.multi_train_step(batch_x, batch_y)
File "/home/lz/anaconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/lz/anaconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 846, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "/home/lz/anaconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
return self._call_flat(
File "/home/lz/anaconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/lz/anaconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
outputs = execute.execute(
File "/home/lz/anaconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 3 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[4,256,360,640] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node StatefulPartitionedCall/conv2d_61/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Identity_2/_190]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[4,256,360,640] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node StatefulPartitionedCall/conv2d_61/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(2) Resource exhausted: OOM when allocating tensor with shape[4,256,360,640] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node StatefulPartitionedCall/conv2d_61/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[GroupCrossDeviceControlEdges_0/StatefulPartitionedCall/Adam/Adam/update_1_1/Const/_155]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_multi_train_step_36442]
Function call stack:
multi_train_step -> multi_train_step -> multi_train_step
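For context, multi_train_step is a tf.function that dispatches the per-replica training step through the distribution strategy. Below is a simplified, self-contained sketch of that pattern, assuming a MirroredStrategy setup; the stand-in conv model and MSE loss are illustrative, and the real code also computes PSNR:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # assumption: mirrored multi-GPU setup

with strategy.scope():
    # Stand-in network; the real one is the EDVR model.
    model = tf.keras.Sequential([tf.keras.layers.Conv2D(3, 3, padding='same')])
    optimizer = tf.keras.optimizers.Adam()

@tf.function
def multi_train_step(batch_x, batch_y):
    def step_fn(x, y):
        with tf.GradientTape() as tape:
            pred = model(x, training=True)
            # Per-example MSE, scaled by the global batch size so the
            # gradients are averaged correctly across replicas.
            per_example = tf.reduce_mean(tf.square(y - pred), axis=[1, 2, 3])
            loss = tf.nn.compute_average_loss(per_example)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    per_replica_loss = strategy.run(step_fn, args=(batch_x, batch_y))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)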
Will this change the current api? How?
No
Who will benefit with this feature?
Everyone who wants to optimize the consumption of GPU memory.
@sanjoy Thanks for your reply! Here is a solution I found on the Internet that uses the rest of the GPU memory:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to a fixed 12195 MB allocation on each GPU
    try:
        for gpu in gpus:
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=12195)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
I just set memory_limit to 12195 MB directly, and TF now uses the full capacity of my GPU.
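Note that memory_limit is given in MB and creates one logical device per physical GPU, and, as the except branch hints, the configuration has to run before the GPUs are first initialized or TF raises a RuntimeError. Setting the limit at (nearly) the physical 12196 MiB is what reclaims the roughly 10% of capacity the Profiler previously reported as unavailable.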