This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Support pickling #32

Closed
shmpanski opened this issue Oct 23, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@shmpanski

shmpanski commented Oct 23, 2019

Pickling is often necessary when using multiprocessing. Could you add support for it?

>>> import pickle
>>> from youtokentome import BPE
>>> bpe = BPE("shared.model")
>>> file = open("test.pkl", "wb")
>>> pickle.dump(bpe, file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "stringsource", line 2, in _youtokentome_cython.BPE.__reduce_cython__
TypeError: self.encoder cannot be converted to a Python object for pickling
>>> 
@shmpanski
Author

Actually, I solved my multiprocessing problem. The need for pickling came from a PyTorch bug that shows up when using multiple workers in the data loader.

@brian8128

This would be nice to have even if we can do multiprocessing without it.

@1475963

1475963 commented Jan 15, 2020

Hi, you can do this to pickle the BPE object or any object that contains a BPE encoder.

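import youtokentome as yttm

# Placeholder: replace with the path to your trained model file.
MODEL_PATH = "shared.model"

def bpe_reduce(self):
    # Tell pickle to rebuild the object by calling BPE(MODEL_PATH) on load.
    return (
        self.__class__,
        (str(MODEL_PATH),),
    )

# Patch the class so pickle uses bpe_reduce instead of the failing default.
yttm.BPE.__reduce__ = bpe_reduce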

Then you can pickle.dump any instance of youtokentome.BPE.
With this patch, pickle no longer tries to save the BPE object's internal state. It stores only the tuple (class, model_path) and re-instantiates the object from it when you load your pickle file, so the model file must still exist at that path at load time.
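
The `__reduce__` approach above relies on a module-level MODEL_PATH, so every instance is restored from the same file. A minimal sketch of a per-instance variant follows, assuming the wrapper remembers the path it was built from. `FakeTokenizer`, `roundtrip`, and the temporary file are hypothetical stand-ins for illustration and are not part of youtokentome; the lock merely plays the role of the unpicklable native encoder.

```python
import os
import pickle
import tempfile
import threading


class FakeTokenizer:
    """Hypothetical stand-in for a wrapper around an unpicklable native object."""

    def __init__(self, model_path):
        self.model_path = model_path
        # A lock cannot be pickled, much like the encoder in the traceback.
        self._native = threading.Lock()
        with open(model_path, encoding="utf-8") as f:
            self.vocab = f.read().split()

    def __reduce__(self):
        # Store only (class, constructor args); the object is rebuilt on load.
        return (self.__class__, (self.model_path,))


def roundtrip(obj):
    """Pickle and unpickle an object in memory."""
    return pickle.loads(pickle.dumps(obj))


# Write a tiny placeholder "model" file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".model", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("a b c")
    demo_path = tmp.name

original = FakeTokenizer(demo_path)
restored = roundtrip(original)  # re-runs __init__ with the stored path
print(restored.model_path == original.model_path, restored.vocab)
os.remove(demo_path)
```

Because the constructor runs again on load, this only works while the file at the stored path is still readable, which is the same constraint as the monkey-patch above.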

@kefirski
Contributor

Closed in #81
