Limit on input Tensor size using cunn #144

Open · Tushar-N opened this issue Sep 24, 2015 · 5 comments
@Tushar-N

Code to reproduce the error:

require 'cunn'
require 'cutorch'
cutorch.setDevice(1)
model=nn.Sequential():add(nn.Linear(300, 500)):add(nn.LogSoftMax()):cuda()
batch_size=90000
output=model:forward(torch.rand(batch_size,300):float():cuda())

The error:

/home/tushar/torch/install/share/lua/5.1/nn/Sequential.lua:44: invalid argument at /tmp/luarocks_cunn-scm-1-144/cunn/LogSoftMax.cu:249
stack traceback:
    [C]: in function 'updateOutput'
    /home/tushar/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    [string "_RESULT={model:forward(torch.rand(batch_size,..."]:1: in main chunk
    [C]: in function 'xpcall'
    /home/tushar/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
    ...shar/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x00406670  

Large batch sizes work fine on CPU. Smaller batch sizes (<80k) work fine on GPU.

Soumith Chintala on the issue:

It must be that the CUDA launch parameters (number of blocks/threads) were configured without such large batch sizes in mind.
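
One possible stop-gap while the launch parameters are unfixed (not a fix from this thread; the helper name forwardInChunks and the 60000 chunk size are illustrative only) is to split the forward pass into sub-batches small enough to stay under the launch limit, reusing model and batch_size from the reproduction snippet above and assuming inference only:

-- Sketch: run the forward pass in chunks so no single kernel launch
-- sees the full 90k-row batch. Assumes no backward pass is needed.
local function forwardInChunks(model, input, chunkSize)
   chunkSize = chunkSize or 60000            -- arbitrary size below the limit
   local outputs = {}
   for first = 1, input:size(1), chunkSize do
      local len = math.min(chunkSize, input:size(1) - first + 1)
      local out = model:forward(input:narrow(1, first, len))
      table.insert(outputs, out:clone())     -- clone: :forward() reuses its output buffer
   end
   return torch.cat(outputs, 1)              -- reassemble along the batch dimension
end

output = forwardInChunks(model, torch.rand(batch_size, 300):float():cuda())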

@etrulls commented Sep 29, 2015

I'm also running into this.
(I'm trying to hunt down the magic number but I'm new to CUDA and it's not very friendly.)

@dominikgrewe (Member)

The grid used to launch the kernel uses the batch size as the first dimension: https://github.com/torch/cunn/blob/master/LogSoftMax.cu#L238

There's a limit on the size of these grids, which depends on the compute capability you're targeting: it's 65535 for compute capability <= 2.x and 2^31-1 for newer versions (see https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications).
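
A quick way to see which limit applies on a given card (a sketch, assuming cutorch.getDeviceProperties exposes the major/minor compute-capability fields; note the effective limit also depends on the architecture cunn was actually compiled for, as the next comments discuss):

-- Print the device's compute capability and the corresponding grid x-dimension limit,
-- then compare it against the batch_size from the reproduction snippet above.
local props = cutorch.getDeviceProperties(cutorch.getDevice())
local gridLimit = (props.major <= 2) and 65535 or (2^31 - 1)
print(string.format('compute capability %d.%d, max grid x-dimension %d',
                    props.major, props.minor, gridLimit))
if batch_size > gridLimit then
   print('batch size exceeds the grid limit; expect an "invalid argument" error')
end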

@etrulls commented Sep 29, 2015

Aha, I got that far, but I couldn't see why it wasn't working out of the box on a K40.
Turns out cunn is compiled for compute capability 2.0, so changing that seems to solve the problem. Thanks!

@hogwild commented Oct 2, 2015

I use a GeForce 750 Ti. The problem is solved by changing CUDA_NVCC_FLAGS from "-arch=sm_20" to "-arch=sm_50" in CMakeLists.txt.

@mrharicot commented Apr 14, 2016

I have the same issue with SpatialSoftMax on the GPU. I am using a Titan X, and cunn was compiled with CUDA 7.5 and compute capability 5.2.
Any idea if this is fixable?

EDIT: I think I have fixed this by explicitly using cudnn.SpatialSoftMax.
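
Applied to the original example, the same kind of workaround might look like this (a sketch, assuming the cudnn Torch bindings are installed; cudnn.LogSoftMax calls into the cuDNN library rather than cunn's LogSoftMax kernel):

require 'cunn'
require 'cudnn'

-- Same model as in the original report, but with the cuDNN-backed log-softmax.
model = nn.Sequential():add(nn.Linear(300, 500)):add(cudnn.LogSoftMax()):cuda()
output = model:forward(torch.rand(90000, 300):float():cuda())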
