Description
While implementing the technique from the paper "Mixed Precision Training" in the open-source tensor2tensor code, I could not reach the training speed I expected. On checking, I found that GPU utilization dropped significantly with my modified code. I did not modify any of the underlying code that allocates GPU resources, so I suspect the problem lies where the GPU and CPU interact.
Has anyone encountered the same or a similar problem? Many thanks in advance!