openprinting-ppds: MemoryError #2
Hi @532910, thanks for the report. This problem occurs because pyppd saves all the PPDs into a single compressed string, which needs to be decompressed in its entirety to load a PPD. This requires enough RAM to hold all decompressed PPDs in memory. On my system (Ubuntu 16.04), it requires 700 MB to extract a single PPD, and ~30 MB to list the available PPDs. It's a lot. To solve this, we'd need to change the compression method to somehow be able to extract just the required PPD. This will probably reduce the compression ratio, as the PPDs would be compressed individually (or in batches), in exchange for reduced runtime memory usage. I won't have time to work on this in the following weeks, but if you or anyone else reading this is able to work on a pull request, I can help point you in the right direction.
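The batch idea above can be sketched roughly as follows. This is not pyppd's actual code; the function names, the batch layout, and the stored length metadata are all assumptions for illustration, showing only why per-batch compression caps the amount that must be decompressed to read one PPD.

```python
import lzma

def compress_in_batches(ppds, batch_size=10):
    """Sketch: compress PPDs in independent batches so that extracting one
    PPD only requires decompressing its own batch, not the whole archive."""
    batches = []
    for i in range(0, len(ppds), batch_size):
        batch = ppds[i:i + batch_size]
        # Record each PPD's length so the batch can be split after decompression.
        lengths = [len(p) for p in batch]
        batches.append((lengths, lzma.compress(b"".join(batch))))
    return batches

def extract(batches, index, batch_size=10):
    """Decompress only the batch containing PPD number `index`."""
    lengths, compressed = batches[index // batch_size]
    blob = lzma.decompress(compressed)
    offset = sum(lengths[:index % batch_size])
    return blob[offset:offset + lengths[index % batch_size]]
```

The trade-off mentioned in the comment shows up here directly: smaller batches mean less decompression work per lookup but a worse compression ratio, since LZMA cannot exploit redundancy across batch boundaries.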
On my system 762Mi is not enough. It's virtual host so I've increased ram and it works now, but I don't know how to check how much it requires. This issue is not only about required memory amount, but also about wrong handling this case. Cups just says: Unable to copy PPD file / empty PPD file |
Adding to my comment above, it might be possible to use https://docs.python.org/3/library/lzma.html#lzma.LZMADecompressor to decompress the archive incrementally, reducing the memory usage. @532910 You can check the memory usage by running:
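The exact command from this comment was not preserved in this extract. As one possible way to measure it, a process can report its own peak memory from Python's standard library (`ru_maxrss` is in kibibytes on Linux, bytes on macOS):

```python
import resource

# Peak resident set size of the current process so far.
# On Linux, ru_maxrss is reported in kibibytes; on macOS, in bytes.
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak memory: ~%d MiB" % (peak_kib // 1024))
```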
On the wrong error message, I agree it would be good to improve it and I'm happy to change it in PyPPD. However, I think this is a case of CUPS handling the exception, not PyPPD. I'm following the issue you created in their repo.
About the "list" option to retrieve the index of the PPD package, we do not need to worry: each PPD is represented by a one-line entry there, so it should be around 1% or less of the whole archive size.
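The one-line-per-PPD index described above might look like the following. This is a hypothetical illustration, not pyppd's actual format: the names and the (offset, length) layout are assumptions, showing only why the index stays tiny relative to the archive.

```python
# Hypothetical sketch: a small index mapping each PPD name to an
# (offset, length) pair inside the decompressed archive, one entry per PPD.
ppds = {
    "hp-laserjet.ppd": b'*PPD-Adobe: "4.3"\n' * 100,
    "epson-generic.ppd": b'*PPD-Adobe: "4.3"\n' * 80,
}

index = {}
archive = b""
offset = 0
for name, data in ppds.items():
    index[name] = (offset, len(data))   # one short entry per PPD
    archive += data
    offset += len(data)
```

Listing the available PPDs then only touches `index`, never the PPD bodies, which is why the "list" path needs so little memory.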
Hey @tillkamppeter, nice to see you around! Yes, that's what I was thinking. It seems Python's LZMA package is able to do this streaming decompression, but we're not using it because it's Python3-only. I'm unable to work on this for the next weeks, so if anyone reading this wants to give it a try, please go ahead. The changes will probably be limited to https://github.com/vitorbaptista/pyppd/blob/master/pyppd/pyppd-ppdfile.in#L42-L55 and https://github.com/vitorbaptista/pyppd/blob/master/pyppd/compressor.py.
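The streaming approach being discussed could look roughly like this. It is a sketch, not the code from the linked files; the function name and chunk size are assumptions. The point is that `lzma.LZMADecompressor` lets us feed compressed data in small chunks and discard decompressed output we don't need, so peak memory stays near one chunk plus the requested slice instead of the whole archive.

```python
import lzma

def stream_extract(compressed, offset, length, chunk=64 * 1024):
    """Decompress incrementally, keeping only the slice
    [offset, offset + length) of the decompressed stream."""
    dec = lzma.LZMADecompressor()
    seen = 0   # total decompressed bytes produced so far
    out = b""
    pos = 0
    while pos < len(compressed) and len(out) < length and not dec.eof:
        block = dec.decompress(compressed[pos:pos + chunk])
        pos += chunk
        if seen + len(block) > offset:
            start = max(0, offset - seen)
            out += block[start:start + (length - len(out))]
        seen += len(block)
    return out
```

Combined with an index of per-PPD offsets, this also lets extraction stop early once the requested PPD has been fully produced, although a PPD near the end of the archive still pays the CPU cost of decompressing everything before it.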
Really, I see nothing wrong with a Python3 dependency; Python2 is widely deprecated nowadays.
@tillkamppeter What do you think? We could use the native LZMA support on Python3 only, which will reduce the memory requirements, and fall back to the current approach on Python2.
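A common shape for this kind of version split is an import-time feature test; this is a hedged sketch of the idea being proposed, not pyppd's actual code (the flag name is invented):

```python
try:
    # Python 3: lzma is in the standard library and supports
    # streaming decompression via lzma.LZMADecompressor.
    import lzma
    NATIVE_LZMA = True
except ImportError:
    # Python 2: no stdlib lzma; fall back to the older approach
    # used by the 1.x series (details elided in the thread above).
    NATIVE_LZMA = False
```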
I think so, too. Do a 2.x release which is Python3-only and uses the native LZMA support. The rare cases where Python2 is still needed are covered by the old 1.x versions.
Could you finish this (including the 2.0 release) at least some days before Feb 27? This is the Feature Freeze for Ubuntu 20.04 LTS (Focal). Thanks.
Isn't it better to use Zstd?
Note that the zstd Python bindings are not in Ubuntu, and the zstd command-line utility is not in Ubuntu Main. So using zstd for pyppd now would make it impossible to get the fixed pyppd into Ubuntu 20.04 LTS. We should use the Python3 built-in solution for now and can think about anything even better only after 20.04.
Agreed.
I have merged Pull Request #3, cloned the repo with it included, and tried it. If I select a PPD close to the start of the list, it extracts the PPD quickly and works correctly with a few MB of memory consumption. If I take a PPD near the end of the list, it takes some seconds and then gets killed with signal 9 before reaching the desired PPD. I have downloaded the source of foomatic-db and compressed the PPDs in it.
I have looked into it again: with a PPD near the beginning of the archive we really save memory, but if we try to extract one near the end of the archive, the process seems to hog much more memory than before, so the kernel kills it because it is running out of memory. There seems to be a severe memory leak, which I was not able to deduce from reviewing the Python code.
Can it be that we had double compression and decompression in the code? If you look at the load() function, it calls decompress(); then the old code called decompress() again in the cat() function, which is now replaced by streaming decompression.
The last patch (Pull Request #4) makes it work great now. Thank you very much, @dsam82.
openprinting-ppds requires too much memory, and doesn't handle this case correctly: