An error training an Encoder-Decoder Attention Model #7

Open
qiang2100 opened this issue Dec 30, 2017 · 2 comments
qiang2100 commented Dec 30, 2017

When I train an Encoder-Decoder Attention Model using "sh run_std.sh", I get the following error:

/home/qiang/torch/extra/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [56,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
THCudaCheck FAIL file=/home/qiang/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/qiang/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /home/qiang/torch/extra/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
[C]: at 0x7fbc8f5b6050
[C]: in function '__index'
layers/EMaskedClassNLLCriterion.lua:18: in function 'forward'
nnets/EncDecAWE.lua:391: in function 'opfunc'
/home/qiang/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'optimMethod'
nnets/EncDecAWE.lua:468: in function 'trainBatch'
train.lua:40: in function 'train'
train.lua:162: in function 'main'
train.lua:269: in function 'main'
train.lua:272: in main chunk
[C]: in function 'dofile'
...iang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405e90
Lock freed

Usage instructions:

To obtain and lock an id: ./gpu_lock.py --id
The lock is automatically freed when the parent terminates

To get an id that won't be freed: ./gpu_lock.py --id-to-hog
You must manually free these ids: ./gpu_lock.py --free

More info: http://homepages.inf.ed.ac.uk/imurray2/code/gpu_monitoring/
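
The assertion srcIndex < srcSelectDimSize in indexSelectLargeIndex usually means that an index passed to an index-select (for example the token ids going into an nn.LookupTable embedding) lies outside the table's range, so it is worth verifying that every id in a batch is within the vocabulary the lookup table was built with. A minimal sketch of such a check, using hypothetical names (vocabSize, batch) rather than this repository's actual variables:

-- Sketch only: vocabSize and batch are placeholders, not names from this repo.
local vocabSize = 30000  -- replace with the size the nn.LookupTable was created with
local function checkBatchIndices(batch)  -- batch: torch.LongTensor of token ids
  -- nn.LookupTable expects ids in [1, vocabSize]; anything outside that range
  -- triggers the "srcIndex < srcSelectDimSize" device-side assert on the GPU.
  local minId, maxId = batch:min(), batch:max()
  assert(minId >= 1 and maxId <= vocabSize,
         string.format('token id out of range: min=%d max=%d vocabSize=%d',
                       minId, maxId, vocabSize))
end

Running this check on the batches produced by the data loader (or simply rerunning on CPU, as suggested below) should point to the exact id that is out of range.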


Sanqiang commented Apr 23, 2018

If you switch to CPU mode, you can see more clearly where the error comes from.
One bug I fixed may have been caused by the author using an older version of Torch; I fixed it by replacing float with double.
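
For reference, a minimal sketch of what running in CPU mode amounts to in Torch, assuming the network object is called model and that input/target are the tensors of one training batch (hypothetical names, not this repository's actual variables). Converting everything to DoubleTensor makes the out-of-range index raise an ordinary Lua error with a readable traceback instead of an asynchronous device-side assert:

-- Sketch only: model, input, target are placeholders for the network and one batch.
require 'nn'
model = model:double()        -- convert parameters and buffers from CudaTensor to DoubleTensor
input = input:double()        -- inputs must be converted to the same tensor type
target = target:double()
local output = model:forward(input)  -- a bad id now fails here with a normal Lua stack trace

Alternatively, keeping the GPU but exporting CUDA_LAUNCH_BLOCKING=1 before launching makes kernel launches synchronous, so the reported stack trace points at the actual failing call rather than at a later THCStorage check.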

Crista23 commented

Hi @qiang2100! I am encountering the same error, did you find out what is causing it?
