
openprinting-ppds: MemoryError #2

Closed
532910 opened this issue Feb 14, 2020 · 17 comments

@532910

532910 commented Feb 14, 2020

openprinting-ppds requires too much memory and doesn't handle this case
correctly:

% /usr/lib/cups/driver/openprinting-ppds cat openprinting-ppds:0/ppd/openprinting/Samsung/PXL/Samsung_M262x_282x_Series_PXL.ppd
Traceback (most recent call last):
  File "/usr/lib/cups/driver/openprinting-ppds", line 119, in <module>
    main()
  File "/usr/lib/cups/driver/openprinting-ppds", line 95, in main
    ppd = cat(args[1])
  File "/usr/lib/cups/driver/openprinting-ppds", line 67, in cat
    ppds['ARCHIVE'] = BytesIO(decompress(base64.b64decode(ppds['ARCHIVE'].encode('ASCII'))))
  File "/usr/lib/cups/driver/openprinting-ppds", line 17, in decompress
    return process.communicate(value)[0]
  File "/usr/lib/python3.7/subprocess.py", line 939, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.7/subprocess.py", line 1711, in _communicate
    stdout = b''.join(stdout)
MemoryError
zsh: exit 1     /usr/lib/cups/driver/openprinting-ppds cat
% free
              total        used        free      shared  buff/cache   available
Mem:          986Mi        55Mi       698Mi       0.0Ki       233Mi       762Mi
Swap:            0B          0B          0B
% apt policy openprinting-ppds 
openprinting-ppds:
  Installed: 20181217-2
  Candidate: 20181217-2
  Version table:
 *** 20181217-2 500
        500 https://deb.debian.org/debian buster/main amd64 Packages
        100 /var/lib/dpkg/status
@vitorbaptista
Collaborator

Hi @532910, thanks for the report.

This problem happens because pyppd saves all the PPDs into a single compressed string, which needs to be decompressed in its entirety to load any one PPD. This requires enough RAM to hold all decompressed PPDs in memory. On my system (Ubuntu 16.04), it takes 700 MB to extract a single PPD, and ~30 MB to list the available PPDs. It's a lot.

To solve this, we'd need to change the compression method so that we can extract just the required PPD. This will probably reduce the compression ratio, as the PPDs would be compressed individually (or in batches), in exchange for reduced runtime memory usage.

I won't have time to work on this in the coming weeks, but if you or anyone else reading this is able to work on a pull request, I can help point it in the right direction.

@532910
Author

532910 commented Feb 14, 2020

On my system 762 MiB is not enough. It's a virtual host, so I've increased the RAM and it works now, but I don't know how to check how much it requires.

This issue is not only about the amount of memory required, but also about this case being handled incorrectly. CUPS just says: Unable to copy PPD file / empty PPD file

@vitorbaptista
Collaborator

Adding to my comment above, it might be possible to use https://docs.python.org/3/library/lzma.html#lzma.LZMADecompressor to decompress the archive incrementally, reducing the memory usage.
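For illustration, here is a minimal sketch (not pyppd's actual code) of how lzma.LZMADecompressor could decompress the archive in bounded chunks; the file path and chunk size are assumptions:

```python
import lzma

def iter_decompressed_chunks(path, chunk_size=64 * 1024):
    """Yield decompressed chunks without holding the whole archive in RAM."""
    decompressor = lzma.LZMADecompressor()
    with open(path, "rb") as f:
        while not decompressor.eof:
            compressed = f.read(chunk_size)
            if not compressed:
                break
            # Only chunk_size bytes of compressed input are fed at a time,
            # so peak memory stays roughly bounded by the chunk size.
            yield decompressor.decompress(compressed)
```

A consumer could then scan the yielded chunks for the desired PPD and stop early, instead of materializing the whole decompressed archive.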

@532910 You can check the memory usage by running:

$ /usr/bin/time --verbose /usr/lib/cups/driver/openprinting-ppds cat openprinting-ppds:0/ppd/openprinting/Samsung/PXL/Samsung_M262x_282x_Series_PXL.ppd
...
	Command being timed: "/usr/lib/cups/driver/openprinting-ppds cat openprinting-ppds:0/ppd/openprinting/Samsung/PXL/Samsung_M262x_282x_Series_PXL.ppd"
	User time (seconds): 1.11
	System time (seconds): 0.51
	Percent of CPU this job got: 142%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.13
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 700256
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 189094
	Voluntary context switches: 28437
	Involuntary context switches: 17
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The required memory is the "Maximum resident set size". In this example it is 700256 kbytes, roughly 700 MB. The more PPDs in your archive, the higher this memory requirement.

Regarding the wrong error message, I agree it would be good to improve it, and I'm happy to change it in PyPPD. However, I think this is a case of CUPS handling the exception, not PyPPD. I'm following the issue you created in their repo.

@532910
Author

532910 commented Feb 14, 2020

Maximum resident set size (kbytes): 1159316, i.e. ~1.1 GB

@tillkamppeter
Member

We don't need to worry about the "list" option, which retrieves the index of the PPD package: each PPD is represented by a one-line entry there, so it should be around 1% or less of the whole archive size.
With the "cat" option we always request one single PPD file, and a single PPD is rather small compared to what the whole package can be. What we should do is not decompress the whole package, but do a streaming decompression until we arrive at the desired PPD file, discarding the unneeded/skipped data immediately before loading new data. Once the desired PPD is fully loaded, we stop the process immediately. For example, we could load 2 data blocks of ~10 kB each; if they do not contain the beginning of the PPD, discard the first and load the next, and so on until we find the beginning of the PPD. Then we stop discarding and load more blocks until we find the end. After that we output the PPD and terminate the program. WDYT?
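The block-scanning idea above could look roughly like this sketch. The start/end markers here are hypothetical (pyppd's real archive layout may use different delimiters), and the chunks would come from a streaming decompressor:

```python
def extract_ppd(chunks, start_marker, end_marker):
    """Scan decompressed chunks, keeping only a small tail in memory
    until the start marker is found, then accumulate bytes up to the
    end marker and return the PPD (or None if not found)."""
    buf = b""
    found = False
    for chunk in chunks:
        buf += chunk
        if not found:
            idx = buf.find(start_marker)
            if idx == -1:
                # keep a tail in case the marker spans two chunks
                buf = buf[-(len(start_marker) - 1):]
                continue
            buf = buf[idx:]
            found = True
        end = buf.find(end_marker, len(start_marker))
        if end != -1:
            return buf[:end + len(end_marker)]
    return None
```

Because discarded data is dropped before the next chunk is loaded, peak memory is roughly one PPD plus one chunk, rather than the whole decompressed archive.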

@vitorbaptista
Collaborator

Hey @tillkamppeter, nice to see you around! Yes, that's what I was thinking. It seems Python's LZMA package can do this streaming decompression, but we're not using it because it's Python 3-only; we're using the xz executable instead. We'd need to find a way to send the data to xz and read the result back in batches, which I'm not sure is possible.

I'm unable to work on this for the next weeks, so if anyone reading this wants to give it a try, please go ahead. The changes will probably be limited to https://github.com/vitorbaptista/pyppd/blob/master/pyppd/pyppd-ppdfile.in#L42-L55 and https://github.com/vitorbaptista/pyppd/blob/master/pyppd/compressor.py.

@532910
Author

532910 commented Feb 16, 2020

Really, I see nothing wrong with a Python 3 dependency; Python 2 is widely deprecated nowadays.

@vitorbaptista
Collaborator

Really, I see nothing wrong with a Python 3 dependency; Python 2 is widely deprecated nowadays.

@tillkamppeter What do you think? We could use the native LZMA support on Python 3 only, which would reduce the memory requirements, and fall back to the current usage of xz otherwise. However, if Python 2 isn't important anymore, we could just stop supporting it and bump the version to 2.x. If anyone needs Python 2 support, they can use the 1.x versions.
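The fallback variant could be sketched like this (function name and xz invocation are illustrative, not pyppd's actual code): prefer the stdlib lzma module where available, otherwise shell out to the xz executable as today.

```python
try:
    import lzma

    def decompress(data):
        # Python 3: native LZMA support, also enables incremental decompression
        return lzma.decompress(data)
except ImportError:
    import subprocess

    def decompress(data):
        # Python 2 fallback: pipe the data through the xz executable
        process = subprocess.Popen(["xz", "--decompress", "--stdout"],
                                   stdin=subprocess.PIPE,
                                   stdout=subprocess.PIPE)
        return process.communicate(data)[0]
```

Dropping the except branch entirely is what a Python 3-only 2.x release would amount to.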

@tillkamppeter
Member

I think so, too. Do a 2.x release which is Python 3-only and uses the native LZMA support. The rare cases where Python 2 is needed are covered by the old 1.x versions using xz.
By the way, in Ubuntu 20.04 Python 2 will be completely removed from Main, and most common Python projects (HPLIP, system-config-printer, ...) already switched to Python 3 years ago.

@tillkamppeter
Member

Could you finish this (including the 2.0 release) at least a few days before Feb 27? That is the Feature Freeze for Ubuntu 20.04 LTS (Focal). Thanks.

@532910
Author

532910 commented Feb 18, 2020

Wouldn't it be better to use Zstd?

% git clone https://github.com/OpenPrinting/foomatic-db
% cd foomatic-db/db/source/
% tar caf PPD.tar PPD
% =time --verbose zstd --ultra -22 PPD.tar -o PPD.tar.ultra.zst
	User time (seconds): 72.66
	Maximum resident set size (kbytes): 1261088
% =time --verbose zstd PPD.tar -o PPD.tar.default.zst        
	User time (seconds): 0.56
	Maximum resident set size (kbytes): 36536
% =time --verbose xz --keep PPD.tar --stdout > PPD.tar.default.xz
	User time (seconds): 93.65
	Maximum resident set size (kbytes): 98024
% =time --verbose xz --keep --best PPD.tar --stdout > PPD.tar.best.xz   
	User time (seconds): 84.93
	Maximum resident set size (kbytes): 690072

% ls
587M  PPD.tar
3.2M  PPD.tar.best.xz
4.7M  PPD.tar.default.xz
14M   PPD.tar.default.zst
1.9M  PPD.tar.ultra.zst

% =time --verbose xzcat PPD.tar.default.xz > /dev/null 
	User time (seconds): 0.96
	Maximum resident set size (kbytes): 10808
% =time --verbose xzcat PPD.tar.best.xz > /dev/null 
	User time (seconds): 0.85
	Maximum resident set size (kbytes): 68220
% =time --verbose zstdcat PPD.tar.default.zst > /dev/null
	User time (seconds): 0.22
	Maximum resident set size (kbytes): 4392
% =time --verbose zstdcat PPD.tar.ultra.zst  > /dev/null
	User time (seconds): 0.18
	Maximum resident set size (kbytes): 133560

@tillkamppeter
Member

Note that the zstd Python bindings are not in Ubuntu, and the zstd command-line utility is not in Ubuntu Main, so using zstd for pyppd now would make it impossible to get the fixed pyppd into Ubuntu 20.04 LTS. We should use the Python 3 onboard solution for now, and can think about anything even better only after 20.04.

@532910
Author

532910 commented Feb 19, 2020

agree

@tillkamppeter
Member

I have merged Pull Request #3, cloned the repo with it included, and tried it. If I select a PPD close to the start of the list, it extracts the PPD quickly and works correctly with a few MB of memory consumption. If I take a PPD near the end of the list, it takes some seconds and then gets killed with signal 9 before reaching the desired PPD.

I have downloaded the source of foomatic-db and compressed the db/source/PPD directory. I am on Ubuntu Focal; this distro is Python 3-only and there is no /usr/bin/python. I tried both symlinking /usr/bin/python3 to /usr/bin/python and editing the shebang of the self-extracting archive.

@tillkamppeter
Member

I have looked into it again: with a PPD near the beginning of the archive we really do save memory, but if we try to extract one near the end of the archive, the process seems to hog much more memory than before, so the kernel kills it when it runs out of memory. There seems to be a severe memory leak which I was not able to deduce from reviewing the Python code.

@tillkamppeter
Member

Can it be that we had double compression and decompression in the code? If you look at the load() function, it calls decompress(); then the old code called decompress() again in the cat() function, which is now replaced by streaming decompression.

@tillkamppeter
Member

The last patch (Pull Request #4) makes it work great now. Thank you very much @dsam82.
