Limit on input Tensor size using cunn #144

Open · Tushar-N opened this issue Sep 24, 2015 · 5 comments
@Tushar-N

Code to reproduce the error:

require 'cunn'
require 'cutorch'
cutorch.setDevice(1)
model=nn.Sequential():add(nn.Linear(300, 500)):add(nn.LogSoftMax()):cuda()
batch_size=90000
output=model:forward(torch.rand(batch_size,300):float():cuda())

The error:

/home/tushar/torch/install/share/lua/5.1/nn/Sequential.lua:44: invalid argument at /tmp/luarocks_cunn-scm-1-144/cunn/LogSoftMax.cu:249
stack traceback:
    [C]: in function 'updateOutput'
    /home/tushar/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    [string "_RESULT={model:forward(torch.rand(batch_size,..."]:1: in main chunk
    [C]: in function 'xpcall'
    /home/tushar/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
    ...shar/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x00406670  

Large batch sizes work fine on CPU. Smaller batch sizes (<80k) work fine on GPU.

Soumith Chintala on the issue:

It must be that the CUDA launch parameters (number of blocks/threads) were configured without such large batch sizes in mind.
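
One possible stop-gap while the launch parameters are unfixed (not a fix from this thread; the helper name forwardInChunks and the 60000 chunk size are illustrative only) is to split the forward pass into sub-batches small enough to stay under the launch limit, reusing model and batch_size from the reproduction snippet above and assuming inference only:

-- Sketch: run the forward pass in chunks so no single kernel launch
-- sees the full 90k-row batch. Assumes no backward pass is needed.
local function forwardInChunks(model, input, chunkSize)
   chunkSize = chunkSize or 60000            -- arbitrary size below the limit
   local outputs = {}
   for first = 1, input:size(1), chunkSize do
      local len = math.min(chunkSize, input:size(1) - first + 1)
      local out = model:forward(input:narrow(1, first, len))
      table.insert(outputs, out:clone())     -- clone: :forward() reuses its output buffer
   end
   return torch.cat(outputs, 1)              -- reassemble along the batch dimension
end

output = forwardInChunks(model, torch.rand(batch_size, 300):float():cuda())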

@etrulls commented Sep 29, 2015

I'm also running into this.
(I'm trying to hunt down the magic number but I'm new to CUDA and it's not very friendly.)

@dominikgrewe (Member)

The grid used to launch the kernel uses the batch size as the first dimension: https://github.com/torch/cunn/blob/master/LogSoftMax.cu#L238

There's a limit on the size of these grids, which depends on the compute capability you're targeting: it's 65535 for compute capability <= 2.x and 2^31-1 for newer versions (see https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications).
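
A quick way to see which limit applies on a given card (a sketch, assuming cutorch.getDeviceProperties exposes the major/minor compute-capability fields; note the effective limit also depends on the architecture cunn was actually compiled for, as the next comments discuss):

-- Print the device's compute capability and the corresponding grid x-dimension limit,
-- then compare it against the batch_size from the reproduction snippet above.
local props = cutorch.getDeviceProperties(cutorch.getDevice())
local gridLimit = (props.major <= 2) and 65535 or (2^31 - 1)
print(string.format('compute capability %d.%d, max grid x-dimension %d',
                    props.major, props.minor, gridLimit))
if batch_size > gridLimit then
   print('batch size exceeds the grid limit; expect an "invalid argument" error')
end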

@etrulls commented Sep 29, 2015

Aha, I got that far, but I couldn't see why it wasn't working out of the box on a K40.
Turns out cunn is compiled for compute capability 2.0, so changing that seems to solve the problem. Thanks!

@hogwild commented Oct 2, 2015

I use a GeForce 750 Ti. The problem is solved by changing CUDA_NVCC_FLAGS from "-arch=sm_20" to "-arch=sm_50" in CMakeLists.txt.

@mrharicot commented Apr 14, 2016

I have the same issue with SpatialSoftMax on the GPU. I am using a Titan X, and cunn was compiled with CUDA 7.5 and compute capability 5.2.
Any idea if this is fixable?

EDIT: I think I have fixed this by explicitly using cudnn.SpatialSoftMax.
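
Applied to the original example, the same kind of workaround might look like this (a sketch, assuming the cudnn Torch bindings are installed; cudnn.LogSoftMax calls into the cuDNN library rather than cunn's LogSoftMax kernel):

require 'cunn'
require 'cudnn'

-- Same model as in the original report, but with the cuDNN-backed log-softmax.
model = nn.Sequential():add(nn.Linear(300, 500)):add(cudnn.LogSoftMax()):cuda()
output = model:forward(torch.rand(90000, 300):float():cuda())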
