
On Windows 10, AttributeError: Can't pickle local object 'StringEncoder.<locals>.EncodeField' #130

Closed
DONJYARAHOI opened this issue Aug 30, 2020 · 4 comments

@DONJYARAHOI

Hello,

I'm using Windows 10, torch 1.5.1+cu101, and torchvision 0.6.1+cu101.

I tried to run the example scripts. visualise_data worked well, but in agent_motion_prediction I get the following error when converting the dataloader to an iterator:

AttributeError                            Traceback (most recent call last)
<ipython-input-19-b7d59c605d01> in <module>
      1 # ==== TRAIN LOOP
----> 2 tr_it = iter(train_dataloader)
      3 progress_bar = tqdm(range(cfg["train_params"]["max_num_steps"]))
      4 losses_train = []
      5 for _ in progress_bar:

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    277             return _SingleProcessDataLoaderIter(self)
    278         else:
--> 279             return _MultiProcessingDataLoaderIter(self)
    280 
    281     @property

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    717             #     before it starts, and __del__ tries to join but will get:
    718             #     AssertionError: can only join a started process.
--> 719             w.start()
    720             self._index_queues.append(index_queue)
    721             self._workers.append(w)

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\multiprocessing\process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\multiprocessing\context.py in _Popen(process_obj)
    222     @staticmethod
    223     def _Popen(process_obj):
--> 224         return _default_context.get_context().Process._Popen(process_obj)
    225 
    226 class DefaultContext(BaseContext):

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\multiprocessing\context.py in _Popen(process_obj)
    325         def _Popen(process_obj):
    326             from .popen_spawn_win32 import Popen
--> 327             return Popen(process_obj)
    328 
    329     class SpawnContext(BaseContext):

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     91             try:
     92                 reduction.dump(prep_data, to_child)
---> 93                 reduction.dump(process_obj, to_child)
     94             finally:
     95                 set_spawning_popen(None)

~\Anaconda3\envs\Kaggle_Lyft_201126A\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

AttributeError: Can't pickle local object 'StringEncoder.<locals>.EncodeField'
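For context on the error itself: on Windows, torch starts DataLoader workers with multiprocessing's spawn method, which pickles the dataset and everything it references, and functions defined inside other functions (like EncodeField inside StringEncoder) cannot be pickled. A minimal sketch with hypothetical names that reproduces the same message:

```python
import pickle

def make_encoder():
    # A function defined inside another function is a "local object";
    # pickle serializes functions by their importable qualified name,
    # and 'make_encoder.<locals>.encode_field' cannot be looked up.
    def encode_field(value):
        return str(value).encode("utf-8")
    return encode_field

enc = make_encoder()
pickle.dumps(enc)
# AttributeError: Can't pickle local object 'make_encoder.<locals>.encode_field'
```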
@lucabergamini self-assigned this Sep 1, 2020
@lucabergamini
Contributor

Hi @DONJYARAHOI,
This seems to be a torch multiprocessing-related issue; I'm not sure it's something we can fix on our side. Do you still get it with num_workers set to 0, or alternatively when using a for loop over the dataloader?
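For reference, a minimal sketch of the num_workers=0 workaround; the dataset variable and cfg keys here are assumptions based on the example notebook:

```python
from torch.utils.data import DataLoader

# num_workers=0 loads batches in the main process, so nothing needs to be
# pickled and shipped to worker processes (which is the step failing on Windows).
train_dataloader = DataLoader(
    train_dataset,                                      # assumed dataset from the notebook
    shuffle=cfg["train_data_loader"]["shuffle"],        # assumed cfg layout
    batch_size=cfg["train_data_loader"]["batch_size"],
    num_workers=0,
)
```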

@DONJYARAHOI
Author

Setting num_workers to 0 works well (though I don't know how fast it is).
Using a for loop gives the same error.
I see. I'll consider other environments, such as WSL or a Kaggle kernel.

@lucabergamini
Contributor

> Setting num_workers to 0 works well (though I don't know how fast it is).

I expect it to be quite slow, as rasterisation is our current bottleneck.

> I see. I'll consider other environments, such as WSL or a Kaggle kernel.

Yeah, we haven't disabled support for Windows, as some people have successfully run L5Kit on it, but we're currently lacking an active developer for that platform, so our support is very limited.

@nosound2

nosound2 commented Sep 6, 2020

Needing num_workers = 0 is a torch-level limitation on Windows; it is a well-known issue. I am able to train successfully on Windows.
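A common pattern for keeping one training script portable is to drop to single-process loading on Windows only; a sketch with illustrative names:

```python
import sys
from torch.utils.data import DataLoader

# On Windows, workers are spawned (not forked), so everything the dataset
# references must be picklable; num_workers=0 sidesteps that requirement.
num_workers = 0 if sys.platform.startswith("win") else 4

if __name__ == "__main__":  # spawn re-imports the main module, so guard the entry point
    train_dataloader = DataLoader(train_dataset, batch_size=64,
                                  num_workers=num_workers)
    for batch in train_dataloader:
        ...  # training step
```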
