Description
Related to TFT/PyTorch
Describe the bug
I'm trying to add a new dataset to this framework by following the YAML configuration. I get several different errors, but the most common one is:
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1445, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/workspace/models/tft_pyt/modeling.py", line 229, in forward
    t_observed_tgt = fused_pointwise_linear_v2(t_tgt_obs, self.t_tgt_embedding_vectors, self.t_tgt_embedding_bias)
RuntimeError: Error instantiating 'training.trainer.CTLTrainer' : The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/workspace/models/tft_pyt/modeling.py", line 89, in fused_pointwise_linear_v2
def fused_pointwise_linear_v2(x, a, b):
    out = x.unsqueeze(3) * a
    out = out + b
          ~~~~~~~ <--- HERE
    return out
**RuntimeError: CUDA error: device-side assert triggered**
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
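The message above suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the assert is reported at the call that actually failed rather than at a later one. A minimal sketch, assuming the variable is set from Python before anything initializes CUDA; the helper name `enable_blocking_cuda_launches` is hypothetical, and exporting the variable in the shell before launching training should work equally well:

```python
import os


def enable_blocking_cuda_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper for illustration. It must run before the first
    CUDA call in the process, otherwise the setting may have no effect.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    # Call this at the very top of the entry script, before importing torch.
    enable_blocking_cuda_launches()
    print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With synchronous launches the traceback should point at the operation that triggered the assert, which may be earlier in `forward` than the `out + b` line shown above.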
To Reproduce
Dataset:
zeus@b8ae237f7dad:/workspace$ head /workspace/datasets/sosd/timeseries_datasetcs.csv
AUDAT,MATNR,WERKS,total_quantity
2022-04-01,M00000000213903201,D110,1
2022-04-01,M00000000215022201,D110,5
2022-04-01,M00000000214593302,D110,3
2022-04-01,M00000000215043701,D110,5
2022-04-01,M00000000213449504,D110,0
2022-04-01,M00000000214319300,D110,0
2022-04-01,M00000000214385102,D110,10
2022-04-01,M00000000214180004,D110,0
2022-04-01,M00000000214458104,D110,20
Config:
_target_: data.datasets.create_datasets
config:
  graph: False
  source_path: /workspace/datasets/sosd/timeseries_datasetcs.csv
  dest_path: /workspace/datasets/sosd/
  train_range:
    - '2022-04-01'
    - '2023-09-02'
  valid_range:
    - '2023-10-26'
    - '2024-02-15'
  test_range:
    - '2023-09-02'
    - '2023-10-26'
  scale_per_id: True
  encoder_length: 5
  input_length: 5
  example_length: 10
  dataset_stride: 1
  MultiID: False
  features:
    - name: 'MATNR'
      feature_type: 'ID'
      feature_embed_type: 'CATEGORICAL'
      cardinality: 70908
    - name: 'MATNR'
      feature_type: 'STATIC'
      feature_embed_type: 'CATEGORICAL'
      cardinality: 70908
    - name: 'WERKS'
      feature_type: 'ID'
      feature_embed_type: 'CATEGORICAL'
      cardinality: 1
    - name: 'AUDAT'
      feature_type: 'TIME'
      feature_embed_type: 'DATE'
    - name: 'WERKS'
      feature_type: 'KNOWN'
      feature_embed_type: 'CATEGORICAL'
      cardinality: 1
    - name: 'total_quantity'
      feature_type: 'TARGET'
      feature_embed_type: 'CONTINUOUS'
      scaler:
        _target_: sklearn.preprocessing.StandardScaler
  train_samples: 619765
  valid_samples: 174172
  binarized: True
  time_series_count: 70908
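A device-side assert in an embedding-heavy model is often caused by a categorical index falling outside the declared range, although nothing in this report confirms that this is the cause here. As a sanity check, the sketch below counts the distinct values of each categorical column in the CSV and compares them with the `cardinality` values from the config. The helper names are hypothetical, and whether the framework expects `cardinality` to equal the distinct count or to be larger (for example, to reserve an extra index) is an assumption to verify against its preprocessing code.

```python
import csv
import io


def unique_counts(csv_file, columns):
    """Stream a CSV file object and count distinct values per column."""
    seen = {column: set() for column in columns}
    for row in csv.DictReader(csv_file):
        for column in columns:
            seen[column].add(row[column])
    return {column: len(values) for column, values in seen.items()}


def columns_exceeding(counts, declared):
    """Return columns whose distinct count is above the declared cardinality."""
    return [c for c, limit in declared.items() if counts[c] > limit]


if __name__ == "__main__":
    # Two rows copied from the dataset head above, used as a stand-in for
    # the real file; replace the StringIO with open(source_path, newline="").
    sample = (
        "AUDAT,MATNR,WERKS,total_quantity\n"
        "2022-04-01,M00000000213903201,D110,1\n"
        "2022-04-01,M00000000215022201,D110,5\n"
    )
    declared = {"MATNR": 70908, "WERKS": 1}  # cardinality values from the config
    counts = unique_counts(io.StringIO(sample), list(declared))
    print(counts, columns_exceeding(counts, declared))
```

If any column is reported, or if the distinct count equals the declared value while the encoded indices start at 1, the embedding lookup could receive an out-of-range index.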
Expected behavior
Training starts without errors.
Environment
- NVIDIA-SMI 535.216.03
- Driver Version: 535.216.03
- CUDA Version: 12.2