New Feature: Pascal, CUDA 8, Unified Memory #3678

Closed
deeplearning-ai-research opened this Issue Aug 6, 2016 · 11 comments

deeplearning-ai-research commented Aug 6, 2016

Hi,

CUDA 8 enables unified memory on Pascal GPUs, putting the CPU and GPU in the same address space and extending the memory available to the GPU with CPU RAM (at a limited latency cost).

  1. Would it be possible in TensorFlow to train a network plus data that is larger than GPU RAM (but smaller than CPU RAM)? It would help reduce distributed-computing/network latency.

That is, this would use the idea of oversubscribing GPU memory for large datasets/models.

The relevant CUDA API is documented here:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-simplifying

Example of a 64 GB managed allocation, larger than the physical memory of any single GPU:

#include <cuda_runtime.h>

void foo() {
  // Allocate 64 GB of managed (unified) memory; on Pascal this can
  // exceed physical GPU memory and is backed by CPU RAM.
  char *data;
  size_t size = 64ULL * 1024 * 1024 * 1024;  // 64-bit literal avoids int overflow
  cudaMallocManaged(&data, size);
  // ... launch kernels that touch data; pages migrate on demand ...
  cudaFree(data);
}

zheng-xq (Contributor) commented Aug 7, 2016

UVM is implemented through page faults between the CPU and GPU. It makes GPU programs easier to write, but not necessarily as fast as possible. Without more investigation, we are not sure it is suitable for high-performance machine learning. Note that typical PCIe bandwidth is about two orders of magnitude lower than the bandwidth of the GPU's own DRAM.

If the model is indeed too large to fit into GPU memory, it may make sense to load parts of the model in parallel, instead of relying on page faults in the kernels to page in the data.
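
For illustration, a minimal sketch of that explicit-staging idea in TensorFlow's graph API; the variable name and shapes here are hypothetical, not taken from any comment above. The oversized parameter stays in host RAM, and only the slices needed for a step are copied to the device:

import tensorflow as tf

with tf.device("/cpu:0"):
    # The oversized parameter lives in host RAM (~25 GB of float32).
    big_table = tf.get_variable("big_table", shape=[50000000, 128])
    ids = tf.placeholder(tf.int32, shape=[None])
    # Gathering on the CPU means only the selected rows cross PCIe.
    rows = tf.gather(big_table, ids)

with tf.device("/gpu:0"):
    # The GPU computes on a small, explicitly staged slice at full speed.
    step_output = tf.reduce_mean(rows, axis=1)

Because the gather runs on the host, the host-to-device copy is an ordinary explicit transfer that can be overlapped with other work, rather than stalling a kernel on a page fault.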

arita37 commented Aug 9, 2016

CUDA 8 with Pascal has NVLink, which runs at 80 GB/s, so latency between CPU RAM and the GPU is very low. It allows creating a single allocation larger than GPU memory, at very low latency. Performance would be enhanced.

See the slides.

zheng-xq (Contributor) commented Aug 9, 2016

My understanding was that 80 GB/s was between devices. The CPU/GPU communication through PCIe used by UVM is only a small fraction of that, and both are much smaller than the 720 GB/s the GPU gets when accessing its own memory.

This is an active research area, and we might still find a good use for UVM down the road. But the current belief is that it is better to page memory in and out with the CPU in parallel, while the compute engine on the GPU accesses its own memory at full speed. A good example is the "swap_memory" option in "tf.while_loop", which swaps temporary memory created in the loop out to the host when device memory is under pressure.
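
For reference, a minimal sketch of that option; the loop body here is a hypothetical stand-in for a real recurrent computation:

import tensorflow as tf

x = tf.random_normal([1024, 1024])

def body(i, acc):
    # Each iteration's intermediates are kept for the backward pass;
    # swap_memory lets TensorFlow spill them to host RAM under pressure.
    return i + 1, tf.tanh(tf.matmul(acc, x))

_, result = tf.while_loop(
    cond=lambda i, acc: i < 100,
    body=body,
    loop_vars=[tf.constant(0), tf.ones([1024, 1024])],
    swap_memory=True)

grads = tf.gradients(tf.reduce_sum(result), [x])  # swapped tensors are paged back in here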

yaroslavvb (Contributor) commented Oct 3, 2017

Initial support was added in cd4f584.

evolu8 commented Oct 19, 2017

UVM support throughout would be of enormous benefit, and I would be very keen to see it implemented. Ideal training throughput is often less important than coping with large inputs and large models that simply fail to fit on a card.

byronyi (Contributor) commented Oct 19, 2017

Looks like another case of “we don’t need distributed transactions at Google, so we let users implement crappy/wrong versions of their own,” as with BigTable/Megastore.

Now we have Spanner :)

ranshadmi commented Oct 21, 2017

+1 for "UVM support throughout"!

Orna123 commented Oct 22, 2017

+1 for "UVM support throughout"!

evolu8 commented Nov 10, 2017

Any movement on this?

Cheers

tarlovsky commented May 30, 2018

Hey guys, any progress? Highly interested!

smit-hinsu (Contributor) commented Jun 16, 2018

UVM support was added recently in b113981.
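
A hedged sketch of how that support appears to be exposed, assuming the commit switches the allocator to CUDA unified memory when the per-process memory fraction is set above 1.0 (a Pascal-or-newer GPU and CUDA 8+ would be required):

import tensorflow as tf

config = tf.ConfigProto()
# A fraction above 1.0 oversubscribes device memory; under the commit
# above this is assumed to use cudaMallocManaged, backing the excess
# with host RAM.
config.gpu_options.per_process_gpu_memory_fraction = 2.0

with tf.Session(config=config) as sess:
    pass  # build and run a graph larger than physical GPU memory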
