Data Loading Error #10

Open · aadharna opened this issue Nov 2, 2021 · 1 comment

aadharna commented Nov 2, 2021

When I try to run your Shapes-2D build/train functions (python train.py --dataset data/shapes_train.h5 --encoder small --name shapes), I end up hitting a RuntimeError at this line:

c-swm/train.py, line 104 (commit e944b24):

```python
obs = train_loader.__iter__().next()[0]
```

Specifically, the error is RuntimeError: DataLoader worker is killed by signal: Killed. It seems to come from the DataLoader's worker processes failing while loading the training set, but when I look through your utils files, I don't see why this error would occur.

This same error has occurred on both a Windows machine and a Linux machine.

Here's the full trace:

Traceback (most recent call last):
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/connection.py", line 921, in wait
    ready = selector.select(timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 19014) is killed by signal: Killed. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 104, in <module>
    obs = train_loader.__iter__().next()[0]
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 737, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 19012, 19014) exited unexpectedly
(cswm) aadharna@penguin:~/PycharmProjects/c-swm$ 

I'll continue poking around and update when I find the root cause.
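
In the meantime, a minimal workaround sketch, assuming the failure really is in the DataLoader's worker processes: construct the loader with num_workers=0 so all loading happens in the main process and no worker can be killed. The TensorDataset below is just a stand-in for the repo's actual dataset object:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; in train.py this would be the repo's own dataset object.
dataset = TensorDataset(torch.zeros(8, 3, 50, 50))

# num_workers=0 loads data in the main process, sidestepping the
# multiprocessing path that raises "DataLoader worker ... killed".
train_loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

# next(iter(...)) is the idiomatic form of train_loader.__iter__().next()
obs = next(iter(train_loader))[0]
print(obs.shape)  # torch.Size([4, 3, 50, 50])
```

This trades away parallel loading, so epochs may be slower, but it at least isolates whether the worker processes are the problem.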


mmdjiji commented Mar 6, 2024

Same situation here. Maybe there is not enough RAM to run the training?

Update: yes, it was a RAM problem.
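
If RAM is indeed the bottleneck, one hedged mitigation sketch: rather than loading the entire HDF5 file into memory up front, read each sample lazily with h5py. The class and dataset key below are hypothetical placeholders, not the repo's actual API, so they would need adapting to the real layout of shapes_train.h5:

```python
import h5py
from torch.utils.data import Dataset

class LazyH5Dataset(Dataset):
    """Hypothetical sketch: fetch one sample from disk per access instead
    of holding the whole HDF5 file in RAM."""

    def __init__(self, path, key="obs"):  # "obs" is a placeholder key
        self.path = path
        self.key = key
        with h5py.File(path, "r") as f:
            self.length = len(f[key])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Reopen per access so each DataLoader worker gets its own handle;
        # sharing one open h5py file across worker processes is unsafe.
        with h5py.File(self.path, "r") as f:
            return f[self.key][idx]
```

Lazy reads keep the memory footprint near one batch at the cost of extra disk I/O per sample.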
