Training problem #132

20210726 · 2023-08-15T11:13:35Z

At Step 5 (Begin training), I get the following error:
2023-08-15 19:05:25,999 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
INFO:mmdet:workflow: [('train', 1)], max: 24 epochs
2023-08-15 19:05:26.085644: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
File "tools/train.py", line 230, in <module>
main()
File "tools/train.py", line 220, in main
train_model(
File "/waymo/SST/mmdet3d/apis/train.py", line 41, in train_model
train_detector(
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/root/ctrl/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/ctrl/mmcv/mmcv/runner/epoch_based_runner.py", line 47, in train
for i, data_batch in enumerate(self.data_loader):
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
return self.dataset[idx % self._ori_len]
File "/waymo/SST/mmdet3d/datasets/waymo_tracklet_dataset.py", line 284, in __getitem__
data = self.prepare_train_data(idx)
File "/waymo/SST/mmdet3d/datasets/waymo_tracklet_dataset.py", line 209, in prepare_train_data
input_dict = self.get_data_info(index)
File "/waymo/SST/mmdet3d/datasets/waymo_tracklet_dataset.py", line 139, in get_data_info
trk.set_type(self.cat2id[trk.type_name], 'mmdet3d')
KeyError: 'Pedestrian'

Killing subprocess 2618
Traceback (most recent call last):
File "/root/anaconda3/envs/ctrl/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/anaconda3/envs/ctrl/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/ctrl/bin/python3', '-u', 'tools/train.py', '--local_rank=0', 'configs/ctrl/ctrl_veh_24e.py', '--launcher', 'pytorch', '--no-validate']' returned non-zero exit status 1.

I followed CTRL_instructions.md, used only part of the Waymo dataset, and skipped Step 2. There should be a configuration file that fixes this problem, but I couldn't find it.

Abyssaledge · 2023-08-16T06:42:11Z

The key should not be 'Pedestrian' since you use the vehicle config. I need the command and adopted config for checking.
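
A minimal sketch of how this KeyError can arise, assuming the dataset builds `cat2id` from the class list in the config, as mmdet3d datasets commonly do. The one-element class tuple and the `lookup_label` helper are illustrative assumptions, not code from the repository.

```python
# Hypothetical class list for a vehicle-only config (an assumption for illustration).
CLASSES = ('Car',)

# Map each configured class name to an integer label.
cat2id = {name: i for i, name in enumerate(CLASSES)}


def lookup_label(type_name, cat2id):
    """Return the label id, or raise a KeyError that names the configured classes."""
    try:
        return cat2id[type_name]
    except KeyError:
        raise KeyError(
            f"{type_name!r} is not in the configured classes {sorted(cat2id)}; "
            "the tracklet data probably contains classes the config does not expect"
        ) from None


print(lookup_label('Car', cat2id))  # 0
try:
    lookup_label('Pedestrian', cat2id)
except KeyError as err:
    print('KeyError:', err)
```

If the tracklets fed to a vehicle-only config contain pedestrians, the lookup fails in the same way as line 139 of waymo_tracklet_dataset.py in the traceback.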

20210726 · 2023-08-16T06:47:18Z

> The key should not be 'Pedestrian' since you use the vehicle config. I need the command and adopted config for checking.

The training command is: bash tools/dist_train.sh configs/ctrl/ctrl_veh_24e.py 1 --no-validate

and fsd_base_vehicle.yaml is:

Abyssaledge · 2023-08-16T06:51:06Z

Why do you use train_gt.bin in the YAML file?

20210726 · 2023-08-16T06:57:30Z

> Why do you use train_gt.bin in the YAML file?

I thought train_gt.bin was the detection result in Waymo bin format. So should I first run Step 2 (Use ImmortalTracker to generate tracking results in training split (bin file format)), and then use the bin file generated in Step 2 to train the model?

Abyssaledge · 2023-08-16T07:05:10Z

No, train_gt.bin contains the ground-truth information for the training set. What you need here is the proposals on the training set.
So do not change the bin path to train_gt.bin. Only change the split to training and use 'fsd6f6e_vehicle_full_trainset.bin' if you want to generate training data.
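
The advice above can be turned into a quick pre-flight check. This is only a sketch: the keys `bin_path` and `split` and the `./data/` paths are hypothetical placeholders for whatever fsd_base_vehicle.yaml actually uses, and the config is shown as a plain dict rather than loaded from the YAML file.

```python
import os


def check_track_input_config(cfg):
    """Return a list of warnings for a track-input data config.

    `cfg` is a plain dict; the keys 'bin_path' and 'split' are hypothetical
    names used for illustration only.
    """
    warnings = []
    bin_name = os.path.basename(cfg.get('bin_path', ''))
    if bin_name == 'train_gt.bin':
        warnings.append(
            'bin_path points at train_gt.bin (ground truth); '
            'use the proposals on the training set instead'
        )
    if cfg.get('split') != 'training':
        warnings.append("split should be 'training' when generating training data")
    return warnings


# Hypothetical example configs.
bad = {'bin_path': './data/train_gt.bin', 'split': 'training'}
good = {'bin_path': './data/fsd6f6e_vehicle_full_trainset.bin', 'split': 'training'}
print(check_track_input_config(bad))
print(check_track_input_config(good))  # []
```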

Abyssaledge · 2023-08-22T10:12:02Z

Please reopen this issue if you need further discussion.

20210726 · 2023-08-22T13:03:39Z

Here are the steps I took to reproduce CTRL. Is anything wrong, especially in Step 2? And is the config file 'fsd_base_vehicle.yaml' correct?

1. Prepare the Waymo data (I only use part of the Waymo dataset)
1.1 Use my own Python script to generate train.txt, val.txt, test.txt, idx2timestamp.pkl and idx2contextname.pkl.
Then copy train.txt, val.txt and test.txt to ./data/waymo/kitti_format/ImageSets/
and copy idx2timestamp.pkl and idx2contextname.pkl to ./data/waymo/kitti_format/
1.2 python tools/create_data.py --dataset waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo

Step 1: Generate train_gt.bin once for all. (waymo bin format).
python ./tools/ctrl/generate_train_gt_bin.py
This generates the file 'train_gt.bin'.

python ./tools/ctrl/extract_poses.py
This generates the files context2timestamp.pkl and pose.pkl.

Step 2: Use ImmortalTracker to generate tracking results in training split (bin file format)
Modify the files ego_info.py and time_stamp.py like this:

Modify the file waymo_convert_detection.sh like this:

Then run:
bash preparedata/waymo/waymo_preparedata.sh ~/dataset/waymo/waymo_format/
This generates files like this:

bash preparedata/waymo/waymo_convert_detection.sh ~/dataset/waymo/waymo_format/train_gt.bin CTRL_FSD_TTA
This generates files like this,
in data/waymo/training/detection/CTRL_FSD_TTA/dets:

Modify the file run_mot.sh like this:

Then run:
bash run_mot.sh
This generates a file like this:

Step 3: Generate track input for training
Modify the file 'fsd_base_vehicle.yaml' like this (pred.bin was generated in Step 2):

python ./tools/ctrl/generate_track_input.py ./tools/ctrl/data_configs/fsd_base_vehicle.yaml --process 1
This generates files like this:

Step 4: Assign candidates GT tracks
python ./tools/ctrl/generate_candidates.py ./tools/ctrl/data_configs/fsd_base_vehicle.yaml --process 1

Step 5: Begin training
bash tools/dist_train.sh configs/ctrl/ctrl_veh_24e.py 1 --no-validate
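
Steps 3 to 5 above can be chained so that a failure stops the pipeline before training starts. This is a sketch, not part of the repository: `run_steps` and its `runner` parameter are hypothetical helpers, and the commands are copied from the steps listed above.

```python
import shlex
import subprocess

# Commands copied from Steps 3-5 above.
YAML = './tools/ctrl/data_configs/fsd_base_vehicle.yaml'
STEPS = [
    ('Step 3', f'python ./tools/ctrl/generate_track_input.py {YAML} --process 1'),
    ('Step 4', f'python ./tools/ctrl/generate_candidates.py {YAML} --process 1'),
    ('Step 5', 'bash tools/dist_train.sh configs/ctrl/ctrl_veh_24e.py 1 --no-validate'),
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order; return the names of the steps that completed.

    `runner` is injectable so the function can be dry-run without a GPU.
    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit status, which stops the loop at the failing step.
    """
    done = []
    for name, cmd in steps:
        print(f'[{name}] {cmd}')
        runner(shlex.split(cmd), check=True)
        done.append(name)
    return done


if __name__ == '__main__':
    # Dry run: print the commands without executing them.
    run_steps(STEPS, runner=lambda argv, check: None)
```

Calling `run_steps(STEPS)` with the default runner executes the real commands from the repository root.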

20210726 · 2023-08-22T13:06:14Z

@Abyssaledge

JayYangSS · 2023-10-17T03:14:47Z

In Step 2, I think you shouldn't use train_gt.bin:
bash preparedata/waymo/waymo_convert_detection.sh ~/dataset/waymo/waymo_format/train_gt.bin CTRL_FSD_TTA

You need to use the base detector to generate the prediction results.

Abyssaledge closed this as completed Aug 22, 2023

20210726 mentioned this issue Sep 21, 2023

Here is my step trying to reproduce CTRL，I want to know is there any wrong ？ #161

Open
