Multithreading PNG decoding, "parallel" PNG's #54
Comments
I've been thinking about this also: brion/mtpng#20. My original idea was to have some metadata that points to offsets within the file for each zlib sub-block. However, your idea of just using IDAT block boundaries could make more sense, and only requires one bit of additional metadata to signal. An additional metadata flag could be used to signal that the first row of each block has filter type None (or another filter type that does not reference the previous row); if this is known by the decoder, then it can perform defiltering in parallel too.
I've had this idea for a while and have made my png-restart-marker repo public; it's kind of like JPEG restart markers for PNG. I will push what I have so far tomorrow.
If you start on a scanline, IDAT, and DEFLATE boundary, the filter byte will be the first thing. I would just exclude filter types 2, 3 and 4 at the start of each segment; then each worker is truly independent.
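For illustration, here is a minimal sketch of that encoder-side restriction (the constant names and helper shape are assumptions for this example, not code from png-restart-marker): only filters that don't reference the previous scanline are considered for the first row of a segment.

```python
# PNG filter type codes
NONE, SUB, UP, AVERAGE, PAETH = range(5)

def allowed_filters(is_first_row_of_segment: bool) -> tuple[int, ...]:
    # Up, Average and Paeth all read the previous scanline, so they are
    # excluded at a segment boundary; None and Sub only use the current row.
    if is_first_row_of_segment:
        return (NONE, SUB)
    return (NONE, SUB, UP, AVERAGE, PAETH)
```

An encoder's per-row filter search would then simply score only the candidates returned by `allowed_filters()`, so each worker can defilter its segment without any data from the previous one.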
Yes. The only spec change would be the definition of a new standard ancillary chunk type (which would be ignored by existing readers) that indicates which levels of parallelization the file supports: parallel IDAT decompression only, or parallel IDAT decompression plus parallel de-filtering. It could also indicate the number of scanlines present in each IDAT chunk. This is so useful that my plan is to put together a quick prototype for 24bpp and 32bpp, specifically in order to benchmark it.
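As a rough illustration of what such a chunk could carry (the name `pLLD`, the field layout, and the level values below are placeholders for this sketch, not a registered chunk): one byte for the parallelization level and a 32-bit count of scanlines per IDAT piece, wrapped in standard PNG chunk framing.

```python
import struct
import zlib

def build_chunk(chunk_type: bytes, data: bytes) -> bytes:
    # Standard PNG chunk framing: length, type, data, CRC over type + data.
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", zlib.crc32(chunk_type + data)))

def build_parallel_hint(level: int, scanlines_per_piece: int) -> bytes:
    # level 1 = parallel IDAT decompression only,
    # level 2 = parallel decompression + parallel de-filtering (hypothetical values)
    payload = struct.pack(">BI", level, scanlines_per_piece)
    return build_chunk(b"pLLD", payload)
```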
I think if we could agree on a standard chunk type that contains this metadata, multiple independent implementations with faster writing/reading could then be created. This would be extremely valuable (large PNGs, like 8k-16k, are now becoming common).
One question is: should the safe-to-copy bit be set in the chunk name? I suppose it shouldn't. An unaware PNG processing tool may do things like combine or re-split IDAT chunks, which would break parallel decoders if they still tried to process the chunks in parallel.
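For reference, the safe-to-copy property is carried by bit 5 of the fourth byte of the chunk type (a lowercase fourth letter means safe to copy), so a quick check looks like this; `pLLD` is just the placeholder name from the sketch above.

```python
def is_safe_to_copy(chunk_type: bytes) -> bool:
    # Bit 5 (0x20) of the 4th byte: set (lowercase letter) = safe to copy.
    return bool(chunk_type[3] & 0x20)

print(is_safe_to_copy(b"pLLd"))  # True  - lowercase 'd', safe to copy
print(is_safe_to_copy(b"pLLD"))  # False - uppercase 'D', not safe to copy
```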
I have created a proof-of-concept implementation of parallel encoding and decoding here: https://github.com/DavidBuchanan314/parallel-png-proposal, along with a proposal for how the metadata could be encoded.
Wow that was fast! Thank you. My implementation can use this chunk data.
Yea, I'm thinking it shouldn't be set. According to the PNG spec, the IDAT boundaries have "no semantic significance", so a processing tool may recompress the data. It's the safe thing to do because we can't be sure what the processing tool is going to do with the IDAT data and/or filters.
I'm wondering if the chunk should have an array of offsets/lengths that point to each IDAT chunk? (Someone suggested this, not sure who/where.) It's not necessary, because a PNG reader can just find them by scanning the file, but perhaps there is some value in this on very large images: having to find all the chunks and dispatch the decode jobs is serial processing. On the flip side, the IDAT offsets could be brittle in some way I can't think of, aren't necessary, and introduce a concept into PNG that seems to go against the flow.
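To make the "serial processing" point concrete, here is a minimal sketch (not taken from any of the linked implementations) of the scan a decoder has to do before it can dispatch parallel jobs when no offset table is present. It only reads the 8-byte chunk headers and seeks past the data, so it is cheap, but it is still inherently sequential.

```python
import struct

def find_idat_spans(f) -> list[tuple[int, int]]:
    """Return (data offset, data length) for every IDAT chunk in a PNG file opened in binary mode."""
    f.seek(8)                                  # skip the 8-byte PNG signature
    spans = []
    while True:
        header = f.read(8)
        if len(header) < 8:
            break
        length, chunk_type = struct.unpack(">I4s", header)
        if chunk_type == b"IDAT":
            spans.append((f.tell(), length))
        f.seek(length + 4, 1)                  # skip chunk data + 4-byte CRC
        if chunk_type == b"IEND":
            break
    return spans
```

Each span could then be handed to a worker for decompression; an offset table in the metadata chunk would let a decoder skip this walk, at the cost of the brittleness mentioned above.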
In your proposal: shouldn't this be ceil(image_height/piece_height)? If the image height is 17 and the piece_height is 8, I would expect this to be 3 (not 2). (I filed a bug.)
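For completeness, the rounding being discussed is just a ceiling division; a tiny sketch:

```python
import math

def num_pieces(image_height: int, piece_height: int) -> int:
    return math.ceil(image_height / piece_height)

assert num_pieces(17, 8) == 3   # pieces of 8, 8 and 1 rows
assert num_pieces(16, 8) == 2
```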
Hah, I spotted the bug around the same time as you did, I pushed a fix already.
Yes, I've been having similar thoughts. In practice, I don't think it'd be any faster to have an offset table, although it would be marginally faster if you wanted to implement random-access (but I'm not sure anyone would really want that?) |
One argument for using an offset table is that you can have your IDAT chunks be arbitrary sizes - I know some encoders like to use chunks of a certain size, although I've never understood why exactly.
I cleaned up and published the specification I had lying around, and also added a method to encode without an offset table, for maximum bikeshedding potential: https://github.com/libspng/png-restart-marker
So far we have three potential approaches: [...]
As a very brief summary (note: here I use "piece" to refer to the subsections of the zlib stream and/or slices of the image): [...]
It is my opinion that recording the file offsets of pieces (as in [...]) [...]
Record a table of [...]
I believe @brion is looking into implementing parallel decoding in mtpng, and I would be interested to see the performance numbers, especially in terms of quantifying how much time it costs to initially iterate over the chunks.
As we are currently working on an update to PNG, the timing for this is quite good... My personal preference would be to avoid anything that includes offset tables, for (a) the reasons that @DavidBuchanan314 mentioned concerning attacks, but also (b) because of scenarios where assets are modified without changing their content (e.g., updated metadata), which could well change the byte offsets. So while I haven't looked at the proposals in depth, I would tend to favor [...]
I believe error handling will be the hardest part of this. Do note that I went into the details of this and have made the argument that any error during parallel decoding must be recoverable and should be continued with linear decoding: #60 (comment).
One issue I have with [...]: if defiltering requires data from the previous piece (flag = 0), then a simple parallel decoder would have to wait for the previous piece to finish, defeating the purpose of parallel decoding. An optimal decoder uses [...]
To implement a decoder for it you only have to create [...] If you encounter an unexpected filter value you switch to error recovery; it would be mostly error recovery code you have to add anyway. The "IDAT inside another IDAT" scenario doesn't seem possible: if all workers have to consume all data in their regions (which is a reasonable requirement), then you are in fact implicitly guarding against that. I wouldn't drop offsets for this reason. I have an entire page on the rationale behind the [...]
It looks possible to write completely standard PNG files that can be optionally decompressed across multiple threads:
https://twitter.com/richgel999/status/1470846966276018187
To do this, you would use multiple IDATs, force a flush to a byte boundary at the end of each IDAT using a 0- or 1-byte uncompressed Deflate block (if necessary - 1/8th of the time it shouldn't be), and make sure the LZ matches never cross IDAT block boundaries. A special standard ancillary extension chunk (which would be ignored by existing readers) would hint to new decoders that the file can be decoded in parallel.
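As a sketch of how an encoder could get those properties from an off-the-shelf zlib (this is just one way to do it, not the prototype described here): `Z_FULL_FLUSH` ends the current Deflate block, pads to a byte boundary with an empty stored block, and resets the LZ window, so no match in a later IDAT can reach back into an earlier one. The trade-off is that it always emits the empty stored block rather than only when a filler is actually needed.

```python
import zlib

def compress_pieces(filtered_pieces: list[bytes]) -> list[bytes]:
    """Compress per-piece filtered scanlines into one IDAT payload per piece."""
    co = zlib.compressobj(level=9)
    payloads = []
    for i, piece in enumerate(filtered_pieces):
        data = co.compress(piece)
        if i < len(filtered_pieces) - 1:
            data += co.flush(zlib.Z_FULL_FLUSH)   # byte-aligned, LZ window reset
        else:
            data += co.flush(zlib.Z_FINISH)       # normal end of the zlib stream
        payloads.append(data)                     # each entry becomes one IDAT chunk
    return payloads
```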
The filtering can also be done in parallel with decompression, using a simple pipeline.
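A rough sketch of what that buys the decoder (assuming the encoder sketch above; the first piece's 2-byte zlib header and the trailing Adler-32 checksum are glossed over here): each piece after a full flush is a byte-aligned raw Deflate segment with no back-references, so pieces can be inflated on separate workers, and de-filtering of a piece can begin as soon as its own inflate finishes instead of after the whole stream.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed

def inflate_piece(raw: bytes) -> bytes:
    # wbits=-15 selects raw Deflate (no zlib header/trailer expected).
    return zlib.decompressobj(wbits=-15).decompress(raw)

def decode_pieces(pieces: list[bytes]) -> list[bytes]:
    results = [None] * len(pieces)
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(inflate_piece, p): i for i, p in enumerate(pieces)}
        for fut in as_completed(futures):
            i = futures[fut]
            results[i] = fut.result()   # de-filtering of piece i could start here
    return results
```

(CPython's zlib releases the GIL while compressing and decompressing, so a plain thread pool is enough to get real parallelism out of this.)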
This would significantly reduce the wall-clock time involved in reading large (8k-32k) PNG files on modern machines.