Feature request: Allow textX to work across processes (multiprocessing, ProcessPoolExecutor) #295
Comments
Hello @stanislaw. Thanks for your kind words. I haven't hit a use-case where I would need to pickle models so far, so I think your approach seems like a good solution at the moment. Thanks for providing the info on what needs to be stripped for serialization to work.
Thanks. We haven't set up any donation channel so far, so the only way to help the project at the moment is through contributions, like this one :)
Another part of this report: if I switch to thread-based parallelization instead of process-based parallelization, I get crashes as well. This is roughly the code that I am using inside each thread; the reader class is roughly as follows:

```python
class SDReader:
    def __init__(self):
        self.meta_model = metamodel_from_str(
            STRICTDOC_GRAMMAR, classes=DOCUMENT_MODELS, use_regexp_group=True
        )
        obj_processors = {
            # some processors that I am sure are thread-safe
        }
        self.meta_model.register_obj_processors(obj_processors)

    def read(self, input):
        document = self.meta_model.model_from_str(input)
        return document

    def read_from_file(self, file_path):
        with open(file_path, 'r') as file:
            sdoc_content = file.read()
        try:
            sdoc = self.read(sdoc_content)
            return sdoc
        except Exception:
            # (exception handling elided in the original report)
            raise
```

I am under the impression that there is some shared state somewhere in textX. I am quite sure that thread-based parallelization would not speed things up much, as this is a known limitation with Python (the GIL), but I wanted to pass this information along anyway in case there is a simple way to make this API thread-safe, just for the sake of having the API be side-effect free, with no shared state. Thanks. Some examples of the crashes:
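One common workaround for this kind of shared-parser state, sketched here as an illustration rather than code from this thread, is to cache one reader per thread with `threading.local()` so that no parser object is ever shared across threads (`make_reader` stands in for the `SDReader` construction above):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def make_reader():
    # Stand-in for SDReader(): the real code would build the metamodel
    # here, once per thread, via metamodel_from_str(...).
    return object()

def get_reader():
    # Lazily create and cache one reader per thread. Threads never
    # share a reader, so no parser state is shared either.
    if not hasattr(_local, "reader"):
        _local.reader = make_reader()
    return _local.reader

def read_file(path):
    reader = get_reader()
    # The real code would return reader.read_from_file(path).
    return id(reader)

with ThreadPoolExecutor(max_workers=4) as pool:
    reader_ids = set(pool.map(read_file, ["a.sdoc", "b.sdoc", "c.sdoc", "d.sdoc"]))
```

The trade-off is one metamodel construction per thread, which is usually cheap relative to parsing many files.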
This should make metamodel_from_* calls thread-safe
Thanks for reporting. Indeed, grammar parsing is not thread-safe, as we cache/reuse parsers for efficiency, so different threads might use the same parser concurrently. I've just pushed a possible fix. It is always hard to test issues like this, so please install from the branch and check whether the issue is fixed.
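The patch itself is not shown in this thread, but the general technique for keeping the efficiency of a parser cache without cross-thread reuse can be sketched as follows (stand-in code illustrating the idea, not the actual textX fix):

```python
import threading

# Cache parsers per (grammar, thread id): a parser is still reused
# within a thread for efficiency, but never shared across threads.
_cache = {}
_lock = threading.Lock()

def make_parser(grammar):
    # Stand-in for the expensive parser-construction step.
    return {"grammar": grammar, "position": 0}

def parser_for(grammar):
    key = (grammar, threading.get_ident())
    with _lock:  # the cache dict itself is also only touched under a lock
        if key not in _cache:
            _cache[key] = make_parser(grammar)
        return _cache[key]
```

Note that locking the cache alone would not be enough: because a parser carries mutable parse state, the key point is that two threads must never receive the same parser instance.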
Discussion continued on #297
Hello,
First of all, thank you very much for creating this tool. I am building my own tool on top of textX and so far I have had a great experience using it!
To improve performance in my project, I have tried to parallelize reading text files based on my custom textX grammar, as well as writing text files from textX in-memory models. In both cases, the attempt to parallelize results in various pickling errors similar to the following:
For writing my objects that I previously obtained with textX, I have managed to work around the pickling errors by stripping out some of textX's metadata as follows:
This is the code that I am using to parallelize:
I am wondering if there is a good recommendation or fix by the developers/maintainers of textX as to how I could achieve the parallelization without using the hack above.
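For illustration, the metadata-stripping workaround mentioned above might look roughly like the following sketch. The `_tx_` attribute-name prefix and the recursive walk are assumptions about textX internals (textX attaches parser/position metadata under such names), not the reporter's actual code:

```python
def strip_private_metadata(obj, prefix="_tx_", _seen=None):
    """Recursively delete prefix-named attributes from an object graph,
    so the remaining plain data can be pickled and sent to a worker."""
    if _seen is None:
        _seen = set()  # guards against cycles in the model graph
    if id(obj) in _seen or not hasattr(obj, "__dict__"):
        return obj
    _seen.add(id(obj))
    for name in [n for n in vars(obj) if n.startswith(prefix)]:
        delattr(obj, name)
    for value in list(vars(obj).values()):
        if isinstance(value, (list, tuple)):
            for item in value:
                strip_private_metadata(item, prefix, _seen)
        else:
            strip_private_metadata(value, prefix, _seen)
    return obj

# Tiny demonstration with a stand-in object, not a real textX model:
class Node:
    def __init__(self):
        self._tx_parser = lambda s: s   # lambdas cannot be pickled
        self.children = []

root = Node()
root.children.append(Node())
strip_private_metadata(root)
# The lambda-valued _tx_parser attributes are now gone from the whole
# graph, so pickling the plain data no longer trips over them.
```

An alternative that avoids mutating models is to build the metamodel separately in each worker process (e.g. via a pool initializer) and pass only file paths and plain results across the process boundary.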
P.S. Does the project accept donations? It is the core building block of my project and I would be absolutely happy to send a few coins to support textX.