An error training an Encoder-Decoder Attention Model #7

Open
qiang2100 opened this issue Dec 30, 2017 · 2 comments
qiang2100 commented Dec 30, 2017

When I train an Encoder-Decoder Attention Model using "sh run_std.sh", I get the following error:

/home/qiang/torch/extra/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [56,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
THCudaCheck FAIL file=/home/qiang/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/qiang/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /home/qiang/torch/extra/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
[C]: at 0x7fbc8f5b6050
[C]: in function '__index'
layers/EMaskedClassNLLCriterion.lua:18: in function 'forward'
nnets/EncDecAWE.lua:391: in function 'opfunc'
/home/qiang/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'optimMethod'
nnets/EncDecAWE.lua:468: in function 'trainBatch'
train.lua:40: in function 'train'
train.lua:162: in function 'main'
train.lua:269: in function 'main'
train.lua:272: in main chunk
[C]: in function 'dofile'
...iang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405e90
Lock freed

Usage instructions:

To obtain and lock an id: ./gpu_lock.py --id
The lock is automatically freed when the parent terminates

To get an id that won't be freed: ./gpu_lock.py --id-to-hog
You must manually free these ids: ./gpu_lock.py --free

More info: http://homepages.inf.ed.ac.uk/imurray2/code/gpu_monitoring/
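
The assertion srcIndex < srcSelectDimSize in indexSelectLargeIndex usually means that an index passed to an index-select (for example the token ids going into an nn.LookupTable embedding) lies outside the table's range, so it is worth verifying that every id in a batch is within the vocabulary the lookup table was built with. A minimal sketch of such a check, using hypothetical names (vocabSize, batch) rather than this repository's actual variables:

-- Sketch only: vocabSize and batch are placeholders, not names from this repo.
local vocabSize = 30000  -- replace with the size the nn.LookupTable was created with
local function checkBatchIndices(batch)  -- batch: torch.LongTensor of token ids
  -- nn.LookupTable expects ids in [1, vocabSize]; anything outside that range
  -- triggers the "srcIndex < srcSelectDimSize" device-side assert on the GPU.
  local minId, maxId = batch:min(), batch:max()
  assert(minId >= 1 and maxId <= vocabSize,
         string.format('token id out of range: min=%d max=%d vocabSize=%d',
                       minId, maxId, vocabSize))
end

Running this check on the batches produced by the data loader (or simply rerunning on CPU, as suggested below) should point to the exact id that is out of range.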


Sanqiang commented Apr 23, 2018

If you switch to CPU mode, you can see more clearly where the error comes from.
One bug I fixed may have been caused by the author using an older version of Torch; I fixed it by replacing float with double.
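
For reference, a minimal sketch of what running in CPU mode amounts to in Torch, assuming the network object is called model and that input/target are the tensors of one training batch (hypothetical names, not this repository's actual variables). Converting everything to DoubleTensor makes the out-of-range index raise an ordinary Lua error with a readable traceback instead of an asynchronous device-side assert:

-- Sketch only: model, input, target are placeholders for the network and one batch.
require 'nn'
model = model:double()        -- convert parameters and buffers from CudaTensor to DoubleTensor
input = input:double()        -- inputs must be converted to the same tensor type
target = target:double()
local output = model:forward(input)  -- a bad id now fails here with a normal Lua stack trace

Alternatively, keeping the GPU but exporting CUDA_LAUNCH_BLOCKING=1 before launching makes kernel launches synchronous, so the reported stack trace points at the actual failing call rather than at a later THCStorage check.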

Crista23 commented

Hi @qiang2100! I am encountering the same error, did you find out what is causing it?
