New Feature: Pascal, CUDA 8, Unified Memory #3678
CUDA 8 enables unified memory for Pascal GPUs, putting the CPU and GPU in the same address space and extending the memory available to the GPU (with some extra latency) by using CPU RAM.
In other words, GPU memory can be oversubscribed for large datasets/models.
Example of 64GB allocation on GPU:
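A minimal sketch of what such an allocation could look like with CUDA 8 managed memory; the 64 GB size, the `touch` kernel, and the launch configuration here are illustrative assumptions, not the original poster's code:

```cpp
// Sketch: oversubscribing GPU memory with CUDA 8 managed memory.
// The allocation can exceed physical GPU DRAM; on Pascal, pages
// migrate to the GPU on demand when the kernel faults on them.
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void touch(float *data, size_t n) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1.0f;   // first touch on the GPU pages the data in
}

int main() {
  const size_t bytes = 64ULL << 30;          // 64 GB, larger than GPU DRAM
  const size_t n = bytes / sizeof(float);
  float *data = NULL;

  cudaError_t err = cudaMallocManaged((void **)&data, bytes);
  if (err != cudaSuccess) {
    printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  touch<<<(unsigned)((n + 255) / 256), 256>>>(data, n);
  cudaDeviceSynchronize();

  cudaFree(data);
  return 0;
}
```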
UVM is implemented through page faults between the CPU and GPU. It makes GPU programs easier to write, but not necessarily as fast as possible. Without more investigation, we are not sure it is suitable for high-performance machine learning. Note that the typical bandwidth across PCIe is often about two orders of magnitude lower than accessing the GPU's own DRAM.
If the model is indeed too large to fit into GPU memory, it may make sense to load parts of the model in parallel, instead of relying on page faults inside the kernels to page in the data.
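As a rough sketch of that idea at the CUDA level, explicit prefetching (here with `cudaMemPrefetchAsync`, available since CUDA 8) can move the next chunk of a managed buffer to the GPU while the current chunk is being processed. The chunk sizes, the `process` kernel, and the two-stream layout below are illustrative assumptions:

```cpp
// Sketch: prefetch the next chunk on a separate stream instead of
// letting the kernel fault pages in on demand.
#include <cuda_runtime.h>

__global__ void process(float *chunk, size_t n) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) chunk[i] *= 2.0f;
}

int main() {
  const size_t total = 1ULL << 28;   // 256M floats = 1 GiB (illustrative)
  const size_t chunk = 1ULL << 24;   // 16M floats = 64 MiB per chunk
  int device = 0;
  cudaGetDevice(&device);

  float *data = NULL;
  cudaMallocManaged((void **)&data, total * sizeof(float));

  cudaStream_t compute, copy;
  cudaStreamCreate(&compute);
  cudaStreamCreate(&copy);

  for (size_t off = 0; off < total; off += chunk) {
    size_t n = (off + chunk <= total) ? chunk : total - off;
    // Hint the driver to migrate the *next* chunk to the GPU on the copy
    // stream so the transfer overlaps with the kernel on the current chunk.
    if (off + chunk < total) {
      size_t next = (off + 2 * chunk <= total) ? chunk : total - (off + chunk);
      cudaMemPrefetchAsync(data + off + chunk, next * sizeof(float), device, copy);
    }
    process<<<(unsigned)((n + 255) / 256), 256, 0, compute>>>(data + off, n);
  }
  cudaDeviceSynchronize();
  cudaFree(data);
  return 0;
}
```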
My understanding was that the 80 GB/s figure is between devices. The CPU/GPU communication through PCIe used by UVM is only a small fraction of that. And both are much smaller than the 720 GB/s the GPU gets when accessing its own memory.
This is an active research area, and we might still find a good use for UVM down the road. But the current belief is that it is better to page memory in and out with the CPU in parallel, while the compute engine on the GPU accesses its own memory at full speed. A good example is the "swap_memory" option in "tf.while_loop", which swaps the temporary memory created in the loop out to the host when device memory is under pressure.
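This is not how `tf.while_loop` actually implements `swap_memory`, but the underlying overlap technique can be sketched at the CUDA level: temporaries are copied out to pinned host memory on a copy stream while the compute stream keeps the GPU working on device-resident buffers. Buffer sizes, the `step` kernel, and the double-buffering scheme are illustrative assumptions:

```cpp
// Sketch: swap temporaries to pinned host memory on a copy stream while
// the compute stream keeps the GPU busy on its own (device) memory.
#include <cuda_runtime.h>

__global__ void step(float *buf, size_t n) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) buf[i] = buf[i] * 0.5f + 1.0f;   // stand-in for the loop body
}

int main() {
  const size_t n = 1 << 24;   // 16M floats per temporary (illustrative)
  const int iters = 8;

  float *dev[2], *host[2];
  cudaStream_t compute, copy;
  cudaEvent_t kernel_done[2], copy_done[2];
  cudaStreamCreate(&compute);
  cudaStreamCreate(&copy);
  for (int b = 0; b < 2; ++b) {
    cudaMalloc((void **)&dev[b], n * sizeof(float));       // device-resident working set
    cudaMemset(dev[b], 0, n * sizeof(float));
    cudaMallocHost((void **)&host[b], n * sizeof(float));  // pinned host "swap space"
    cudaEventCreate(&kernel_done[b]);
    cudaEventCreate(&copy_done[b]);
  }

  for (int it = 0; it < iters; ++it) {
    int cur = it % 2;
    // Don't overwrite this buffer until its previous swap-out has finished.
    cudaStreamWaitEvent(compute, copy_done[cur], 0);
    step<<<(unsigned)((n + 255) / 256), 256, 0, compute>>>(dev[cur], n);
    cudaEventRecord(kernel_done[cur], compute);

    // Swap the temporary out to pinned host memory on the copy stream; the
    // PCIe transfer overlaps with the next iteration's kernel, which keeps
    // reading and writing device memory at full speed.
    cudaStreamWaitEvent(copy, kernel_done[cur], 0);
    cudaMemcpyAsync(host[cur], dev[cur], n * sizeof(float),
                    cudaMemcpyDeviceToHost, copy);
    cudaEventRecord(copy_done[cur], copy);
  }
  cudaDeviceSynchronize();

  for (int b = 0; b < 2; ++b) { cudaFree(dev[b]); cudaFreeHost(host[b]); }
  return 0;
}
```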