RuntimeError: CUDA error: device-side assert triggered #44
Comments
Try reprocessing the data, as it is being loaded from the cache here.
Also, make sure that the model you are loading from
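One way to force the reprocessing suggested above is to delete the stale feature cache before training. This is a minimal sketch, assuming the cache lives in `cache_dir/` and its files are named `cached_*`, as in the log line `Features loaded from cache at cache_dir/cached_train_xlm_128_binary`. The helper name `clear_feature_cache` is made up for illustration and is not part of simpletransformers. The library also accepts a `reprocess_input_data` entry in `args`, which should have the same effect, but verify that against your installed version.

```python
from pathlib import Path


def clear_feature_cache(cache_dir="cache_dir", pattern="cached_*"):
    """Delete cached feature files so they are rebuilt from the DataFrame.

    Hypothetical helper: `cache_dir` and `pattern` are assumptions taken
    from the path shown in the log, not a simpletransformers API.
    Returns the names of the files that were removed.
    """
    removed = []
    cache_path = Path(cache_dir)
    if not cache_path.is_dir():
        # Nothing cached yet, so there is nothing to clear.
        return removed
    for path in sorted(cache_path.glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling `clear_feature_cache()` once before `model.train_model(train_df)` should make the next run rebuild the features from the current labels instead of reusing old ones.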
It helped, thanks! Converting to features started. I am using an RTX 2080 (8 GB). How can I deal with that? Thank you very much for your time and help!
You are running out of storage (HDD/SSD). Clearing some space on your drive should do the trick.
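To confirm that the drive is the bottleneck before deleting anything, free space can be checked from Python. This is a small sketch using only the standard library. `free_gib` is a hypothetical helper name, and checking the current directory assumes that is where `cache_dir/` and the output files are written.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # Assumption: the working directory is on the drive that fills up.
    print(f"Free space: {free_gib():.1f} GiB")
```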
My code:
from simpletransformers.classification import ClassificationModel
import pandas as pd

# The CSV files have no header row: column 0 is the label, column 1 is the text.
train_df = pd.read_csv('data/train.csv', header=None)
eval_df = pd.read_csv('data/test.csv', header=None)

# Map the original label 2 to 1 and every other label to 0.
train_df[0] = (train_df[0] == 2).astype(int)
eval_df[0] = (eval_df[0] == 2).astype(int)

# Build the text/label frames, replacing newlines in the text with spaces.
train_df = pd.DataFrame({
    'text': train_df[1].replace(r'\n', ' ', regex=True),
    'label': train_df[0]
})
eval_df = pd.DataFrame({
    'text': eval_df[1].replace(r'\n', ' ', regex=True),
    'label': eval_df[0]
})

model = ClassificationModel('xlm', 'model/', args={'fp16': False})
model.train_model(train_df)
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
Error:
Features loaded from cache at cache_dir/cached_train_xlm_128_binary
Epoch: 0%| | 0/1 [00:00<?, ?it/s
/opt/conda/conda-bld/pytorch_1570710853631/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1570710853631/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1570710853631/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "run1.py", line 24, in <module>
    model.train_model(train_df)
  File "/home/data/anaconda3/envs/pytorch/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py", line 162, in train_model
    global_step, tr_loss = self.train(train_dataset, output_dir, show_running_loss=show_running_loss, eval_df=eval_df)
  File "/home/data/anaconda3/envs/pytorch/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py", line 235, in train
    print("\rRunning loss: %f" % loss, end="")
RuntimeError: CUDA error: device-side assert triggered
Could you please help me figure out how to fix that? Thank you!
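The assertion `t >= 0 && t < n_classes` in the log says that at least one target label falls outside the range `0` to `n_classes - 1`. A quick check of the labels before training can catch this on the CPU with a readable message. This is a sketch, not part of simpletransformers: `find_bad_labels` is a hypothetical helper, and `num_labels=2` assumes a binary task. Labels stored in an old feature cache bypass the DataFrame, so this check only covers freshly processed data.

```python
def find_bad_labels(labels, num_labels=2):
    """Return the sorted distinct labels outside the range 0 .. num_labels - 1.

    Hypothetical helper; `labels` can be any iterable of integer-like
    values, such as a list or a pandas Series.
    """
    bad = {int(label) for label in labels if not 0 <= int(label) < num_labels}
    return sorted(bad)


# Example usage with the DataFrame from the post (left commented out so the
# snippet runs without the data files):
# bad = find_bad_labels(train_df['label'])
# if bad:
#     raise ValueError(f"Labels outside the expected range: {bad}")
```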