RuntimeError: received 0 items of ancdata #138

GCQi · 2023-02-21T14:04:34Z

When I train LSTR on TuSimple with the command python main_landet.py --train --config ./configs/lane_detection/lstr/resnet18s_tusimple.py --mixed-precision, it runs for several epochs and then, at a random point, raises the error RuntimeError: received 0 items of ancdata.

The error message is:

Traceback (most recent call last):
  File "main_landet.py", line 80, in <module>
    runner.run()
  File "/data/123/gcq/LaneDetection/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 35, in run
    for i, data in enumerate(self.dataloader, 0):
  File "/home/123/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/123/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1316, in _next_data
    idx, data = self._get_data()
  File "/home/123/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1282, in _get_data
    success, data = self._try_get_data()
  File "/home/123/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1120, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/123/anaconda3/envs/pad/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/home/123/anaconda3/envs/pad/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 305, in rebuild_storage_fd
    fd = df.detach()
  File "/home/123/anaconda3/envs/pad/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/123/anaconda3/envs/pad/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/123/anaconda3/envs/pad/lib/python3.8/multiprocessing/reduction.py", line 164, in recvfds
    raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata
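For context: with PyTorch's default file_descriptor sharing strategy on Linux, a DataLoader worker hands each batch's shared-memory storage to the main process as a file descriptor sent over a Unix domain socket as SCM_RIGHTS ancillary data ("ancdata"). That is the rebuild_storage_fd → recvfds path in the traceback above. The sketch below is a standard-library-only, Unix-only illustration of that fd-passing step, not PyTorch's code; send_fd and recv_fd are hypothetical helpers. On Linux, if the receiving process cannot install the descriptor (for example because it has hit its open-file limit), the control message is typically dropped, so the receiver sees zero ancdata items. That is the condition recvfds reports with this RuntimeError.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one file descriptor as SCM_RIGHTS ancillary data (hypothetical helper)."""
    fds = array.array("i", [fd])
    # At least one byte of regular data must accompany the ancillary data.
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor, with the same length check recvfds performs."""
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
    if len(ancdata) != 1:
        # The situation multiprocessing reports as 'received 0 items of ancdata'.
        raise RuntimeError("received %d items of ancdata" % len(ancdata))
    level, kind, data = ancdata[0]
    if level != socket.SOL_SOCKET or kind != socket.SCM_RIGHTS:
        raise RuntimeError("unexpected ancillary data")
    fds.frombytes(data[: fds.itemsize])
    return fds[0]


if __name__ == "__main__":
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()

    send_fd(child, write_end)     # "worker" side hands over a descriptor
    received = recv_fd(parent)    # "main process" side gets a new fd number

    os.write(received, b"hello")  # the received fd refers to the same pipe
    print(os.read(read_end, 5))   # prints b'hello'
```

Sending a plain message with no descriptor attached (child.send(b"y")) and then calling recv_fd(parent) reproduces the exact error text.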


GCQi · 2023-02-21T14:08:31Z

Besides, it also shows me the following warning: /data/123/gcq/LaneDetection/pytorch-auto-drive/utils/datasets/utils.py:30: UserWarning: An output with one or more elements was resized since it had shape [88473600], which does not match the required output shape [128, 3, 360, 640]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552411/work/aten/src/ATen/native/Resize.cpp:17.)

GCQi · 2023-02-21T14:17:25Z

Also, I changed the batch size to 128. Could that have caused the error?

voldemortX · 2023-02-21T14:25:05Z

Yes, that is probably the reason. Could you scale it down and see if the issue persists? Usually, this loading error occurs when parallel data loading is too heavy for your system.

GCQi · 2023-02-21T14:49:22Z

Now I have changed it to 64, and the error has not occurred so far.

GCQi · 2023-02-26T12:16:38Z

Unfortunately, with the batch size still set to 64 and the number of workers set to 32, the error RuntimeError: received 0 items of ancdata appeared again.

GCQi · 2023-02-26T12:18:52Z

Besides, train_augmentation is set as follows:

train_augmentation = dict(
    name='Compose',
    transforms=[
        dict(
            name='Resize',
            size_image=(360, 640),
            size_label=(360, 640)
        ),
        dict(
            name='RandomHorizontalFlip',
            flip_prob=0.5
        ),
        dict(
            name='RandomRotation',
            degrees=10
        ),
        dict(
            name='ColorJitter',
            brightness=0.4,
            contrast=0.4,
            saturation=0.4,
            hue=0.2
        ),
        dict(
            name='ToTensor'
        ),
        dict(
            name='RandomLighting',
            mean=0.0,
            std=0.1,
            eigen_value=[0.00341571, 0.01817699, 0.2141788],
            eigen_vector=[
                [0.41340352, -0.69563484, -0.58752847],
                [-0.81221408, 0.00994535, -0.5832747],
                [0.41158938, 0.71832671, -0.56089297]
            ]
        ),
        dict(
            name='Normalize',
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
            normalize_target=True
        )
    ]
)

GCQi · 2023-02-26T12:22:07Z

Have you encountered this problem before? I cannot get any useful information from the error message.

voldemortX · 2023-02-26T12:29:41Z

@GCQi In my experience, this problem comes with heavy data loading (relative to your hardware). A larger batch size, more workers, and a longer training schedule all increase the probability of encountering this error, which can happen halfway through training. You may notice that my default batch size is kept at 20 for this very reason.
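To put rough numbers on "heavy data loading", here is a back-of-envelope estimate of how much image data the workers can hold in shared memory at once. The assumptions are mine, not from the thread: images are float32 tensors of shape [batch, 3, 360, 640] as in the config above, labels and other tensors are ignored, and each worker prefetches prefetch_factor batches (the DataLoader default is 2 when num_workers > 0). The function names are hypothetical.

```python
FLOAT32_BYTES = 4


def batch_bytes(batch_size: int, channels: int = 3, height: int = 360, width: int = 640) -> int:
    """Bytes of one float32 image batch of shape [batch_size, channels, height, width]."""
    return batch_size * channels * height * width * FLOAT32_BYTES


def in_flight_bytes(batch_size: int, num_workers: int, prefetch_factor: int = 2) -> int:
    """Rough estimate of image bytes prefetched by all workers at once.

    Assumes every worker holds `prefetch_factor` ready batches; ignores labels.
    """
    return num_workers * prefetch_factor * batch_bytes(batch_size)


if __name__ == "__main__":
    # Element count behind the UserWarning for shape [128, 3, 360, 640].
    print(128 * 3 * 360 * 640)  # 88473600
    # Batch 64 with 32 workers (the failing setup in this thread) vs. a lighter,
    # hypothetical setup of batch 20 with 8 workers.
    for bs, workers in [(64, 32), (20, 8)]:
        gb = in_flight_bytes(bs, workers) / 1e9
        print(f"batch={bs}, workers={workers}: ~{gb:.2f} GB prefetched")
    # batch=64, workers=32: ~11.32 GB prefetched
    # batch=20, workers=8: ~0.88 GB prefetched
```

Under these assumptions, the batch-64 / 32-worker setup keeps on the order of 11 GB of image data in shared memory (64 batches in flight, each handed over as its own shared storage), which is consistent with, though not proof of, the observation that fewer workers and smaller batches make the error less likely.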

voldemortX · 2023-02-26T12:32:09Z

Sometimes the file_system sharing strategy can help, but it has a memory leak issue of its own. It is left commented out in the code:

pytorch-auto-drive/main_landet.py, line 9 (at commit f2615da):

# torch.multiprocessing.set_sharing_strategy('file_system')
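A different workaround, not suggested in this thread, is to keep the default file_descriptor strategy but raise the main process's open-file limit, since every batch received from a worker arrives as a file descriptor. PyTorch's multiprocessing notes recommend file_system for systems whose descriptor limits are low and cannot be raised. Below is a minimal sketch using the standard resource module (Unix only, written with Linux in mind); raise_nofile_limit is a hypothetical helper, and calling it near the start of the training script, before the DataLoader is created, is an assumed placement. It only lifts the soft limit up to the existing hard limit, which needs no special privileges. The shell equivalent is ulimit -n.

```python
import resource
from typing import Tuple


def raise_nofile_limit(target: int = 65536) -> Tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Hypothetical helper; returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # Cap the new soft limit at the hard limit unless the hard limit is unlimited.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:  # never lower an already higher soft limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("RLIMIT_NOFILE (soft, hard):", raise_nofile_limit())
```

If the hard limit itself is low (for example in some containers), this cannot help, and switching to file_system, with the memory-leak caveat above, is the remaining option.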

GCQi · 2023-02-26T12:35:35Z

OK, thanks for your help! This open framework is pretty good. Thanks for your contribution and great work!

voldemortX added lane detection possible bug labels Feb 21, 2023

voldemortX removed the possible bug label Feb 22, 2023

GCQi closed this as completed Feb 26, 2023
