Lossless JPEG: Valid range for DC table ? #68

malaterre · 2022-03-14T08:43:12Z

Let's consider the following steps:

% wget http://graphics.stanford.edu/~jowens/223b/lena/lena.ppm
% jpeg -c -p lena.ppm lena.jpg

If one reads the DC table from the generated file (*), one sees that:

Offset 0x0025 Marker 0xffc4 DHT Define Huffman Table(s) length variable 0x113
        JPEG_DHT_Parameters:
                 TableClass = 0
                 HuffmanTableIdentifier = 0
[...]
                         nHuffmanCodesOfLength 14 = 15
                                 ValueOfHuffmanCode 0 = 241
                                 ValueOfHuffmanCode 1 = 242
                                 ValueOfHuffmanCode 2 = 243
                                 ValueOfHuffmanCode 3 = 244
                                 ValueOfHuffmanCode 4 = 245
                                 ValueOfHuffmanCode 5 = 246
                                 ValueOfHuffmanCode 6 = 247
                                 ValueOfHuffmanCode 7 = 248
                                 ValueOfHuffmanCode 8 = 249
                                 ValueOfHuffmanCode 9 = 250
                                 ValueOfHuffmanCode 10 = 251
                                 ValueOfHuffmanCode 11 = 252
                                 ValueOfHuffmanCode 12 = 253
                                 ValueOfHuffmanCode 13 = 254
                                 ValueOfHuffmanCode 14 = 255

However, the value 241 in the Huffman table is impossible according to ISO 10918-1, H.1.2.2 "Huffman coding of the modulo difference" (Table H.2). It should be in the range 0 ≤ symbol ≤ 16.

What am I misunderstanding here ?

(*)

% jpegdump < lena.jpg
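The same check can be reproduced without jpegdump. Below is a minimal Python sketch, not part of any of the tools discussed here, that walks the marker segments up to the first SOS and lists the symbols of each DHT table that exceed a limit. The function names `read_dht_tables` and `symbols_above` and the synthetic byte stream are illustrative assumptions; tables defined after the first SOS and markers without a length field are not handled.

```python
import struct


def read_dht_tables(data: bytes):
    """Collect (table_class, table_id, bits, values) for every DHT table
    found before the first SOS marker of a JPEG byte stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    tables = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte in front of a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: stop scanning headers
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xC4:  # DHT, may hold several tables back to back
            seg = data[pos + 4:pos + 2 + length]
            i = 0
            while i + 17 <= len(seg):
                table_class, table_id = seg[i] >> 4, seg[i] & 0x0F
                bits = list(seg[i + 1:i + 17])  # number of codes per length 1..16
                count = sum(bits)
                values = list(seg[i + 17:i + 17 + count])
                tables.append((table_class, table_id, bits, values))
                i += 17 + count
        pos += 2 + length
    return tables


def symbols_above(values, limit=16):
    """Return the symbols larger than `limit` (16 for the lossless DC case)."""
    return [v for v in values if v > limit]


# Synthetic stream (an assumption for illustration): SOI, one DHT with
# class 0 / id 0 and the symbols 0, 5, 241, then an empty SOS header.
bits = [0, 1, 2] + [0] * 13
payload = bytes([0x00]) + bytes(bits) + bytes([0, 5, 241])
demo = (b"\xff\xd8" + b"\xff\xc4" + struct.pack(">H", len(payload) + 2)
        + payload + b"\xff\xda\x00\x02")

for table_class, table_id, _, values in read_dht_tables(demo):
    print(table_class, table_id, symbols_above(values))
```

Running `read_dht_tables(open("lena.jpg", "rb").read())` on the file generated above should list the same values that jpegdump reports, assuming all tables precede the first scan.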


malaterre · 2022-03-14T09:38:36Z

One remark. Using the flag:

-h         : optimize the Huffman tables

seems to remove those Huffman values.

thorfdbg · 2022-03-14T16:40:25Z


Note that this is a "default table" that covers all coding modes of JPEG. While the codewords these entries represent do not appear in sequential modes, they do appear in other modes and are therefore present. They don't do any harm there: the decoder creates the right Huffman codes from them and simply never uses these entries.

Without a fully populated table, the 12-bit sequential, progressive, or differential modes, or some of the "extended range" modes of JPEG XT, would require the -h command line switch to encode an image. Thus, they are present. Clearly, you get better coding results with "-h", which builds a customized table that is ideal for the requested mode and image.

Greetings, Thomas
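To illustrate why extra entries are harmless, here is a minimal Python sketch of canonical Huffman code assignment in the style of Annex C of ISO 10918-1: codes are handed out in increasing order within each length, and the running code is shifted left when moving to the next length. The function name `build_codes` and the three-symbol table are illustrative assumptions, not the table from the file above.

```python
def build_codes(bits, values):
    """Map each symbol to its codeword (as a bit string).

    bits[i] is the number of codes of length i + 1; values lists the
    symbols in order of increasing code length, as stored in a DHT segment.
    """
    if sum(bits) != len(values):
        raise ValueError("BITS and HUFFVAL disagree on the number of symbols")
    codes = {}
    code = 0
    index = 0
    for length in range(1, 17):
        for _ in range(bits[length - 1]):
            codes[values[index]] = format(code, f"0{length}b")
            code += 1
            index += 1
        code <<= 1  # move on to the next code length
    return codes


# Hypothetical table: one code of length 2, two codes of length 3.
bits = [0, 1, 2] + [0] * 13
values = [0, 5, 241]
codes = build_codes(bits, values)
for symbol, word in codes.items():
    print(symbol, word)
```

Symbol 241 receives a codeword like any other entry, and the resulting code stays prefix-free. The decoder only ever looks that entry up if the bitstream actually contains its codeword.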

thorfdbg · 2022-03-14T16:41:47Z

On 14.03.22 at 10:38, Mathieu Malaterre wrote:

One remark. Using the flag "-h : optimize the Huffman tables" seems to remove those huffman values

Exactly. See previous email. What you see is just a Huffman default table, which extends the example table from Annex K by those entries necessary for all other coding modes. Certainly not the ideal table, but not an error either.

Greetings, Thomas

malaterre · 2022-03-15T07:15:45Z

@scaramallion could you confirm that you always use optimized Huffman tables in pylibjpeg? Otherwise, this will confuse another famous lossless JPEG library when decompressing DICOM instances.

scaramallion · 2022-03-16T01:58:35Z

Hmm, I don't think I do. I'll have to check. Thanks for the heads-up

malaterre · 2022-03-18T08:30:39Z

Dear @thorfdbg

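I am trying to understand how best to address this. Could you please have a quick look at the following 12-bit JPEG file: bogus_huffman <https://user-images.githubusercontent.com/228803/158963962-0b4d6d98-90ab-4361-8096-f4f7a40b8db4.jpg>

You'll see that it contains a bogus value of 17 in the Huffman table: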

                         nHuffmanCodesOfLength 11 = 1
                                 ValueOfHuffmanCode 0 = 17

I've used a simple scan of all Huffman table values to decide whether or not the input file is valid. However, as discussed previously, you've demonstrated that this is a naive approach, since a table may perfectly well contain bogus Huffman values as long as they are not used.

My question is twofold:

1. Could you confirm that the 12-bit JPEG file attached above is indeed invalid?
2. Could you suggest a fix so that your jpeg command line tool rejects this invalid file?

Thanks much !

thorfdbg · 2022-03-18T17:49:54Z

On 18.03.22 at 09:30, Mathieu Malaterre wrote:

Dear @thorfdbg, I am trying to understand on how best to address this. Could you please have a quick look at the following 12bits JPEG file: bogus_huffman <https://user-images.githubusercontent.com/228803/158963962-0b4d6d98-90ab-4361-8096-f4f7a40b8db4.jpg>

You'll see that it contains a bogus 17 value in the huffman table:

nHuffmanCodesOfLength 11 = 1
        ValueOfHuffmanCode 0 = 17

I've used a simple scanning of all huffman table values to decide whether or not the input file is valid or not. However as discussed previously you've demonstrated that this is a naive approach, as tables could perfectly contain bogus huffman values if not used.

Correct. You can also look at this as follows: The example Huffman table documented in the JPEG standard, used as default table by many implementations, defines a (subset of) Huffman codes. Can you actually ensure that each encoded image actually makes use of each code in this example table? Probably not. Thus, I can certainly decode the above image correctly, and I wouldn't know what the issue with the table is.

My question is twofold: 1. Could you confirm that the above attached 12bits JPEG file is indeed invalid ?

Actually, it looks pretty ok to me.

2. Could you suggest a fix so that your jpeg command line tool reject this invalid file ?

On which basis should it reject it? Because there is a code that is not used? Why?

Greetings, Thomas

malaterre · 2022-03-21T09:25:30Z

On which basis should it reject it? Because there is a code that is not used? Why?

https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-6b/jdhuff.c#L253-L265

AFAIK this is the original code from libjpeg6b ... I cannot possibly report a 27 years old bug.
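The disagreement comes down to when a symbol is validated. Below is a minimal Python sketch of the two strategies, assuming the linked check rejects a DC table as soon as it lists any symbol above a fixed bound; the bound of 15, the function names, and the sample data are assumptions for illustration, not code from either library.

```python
class BadHuffmanTable(ValueError):
    """Raised when a Huffman symbol falls outside the accepted range."""


def validate_table_eagerly(values, max_symbol=15):
    """Strict strategy: reject the table if any listed symbol is out of range."""
    for symbol in values:
        if not 0 <= symbol <= max_symbol:
            raise BadHuffmanTable(f"symbol {symbol} listed in table")


def validate_symbol_lazily(symbol, max_symbol=15):
    """Lenient strategy: only check a symbol once it is actually decoded."""
    if not 0 <= symbol <= max_symbol:
        raise BadHuffmanTable(f"symbol {symbol} decoded from stream")
    return symbol


# Hypothetical data: the table lists 17, but the stream never uses it.
table_values = [0, 1, 2, 3, 17]
decoded_symbols = [0, 3, 1, 2]

try:
    validate_table_eagerly(table_values)
    eager_ok = True
except BadHuffmanTable:
    eager_ok = False

lazy_ok = all(validate_symbol_lazily(s) == s for s in decoded_symbols)
print(eager_ok, lazy_ok)
```

With the eager check the file is rejected before any sample is decoded; with the lazy check the same file decodes, and an out-of-range symbol is only reported if the bitstream really produces it.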

malaterre · 2022-03-25T09:29:12Z

I cannot possibly report a 27 years old bug.

OK, let's try:

Bogus Huffman table definition: jpeg_make_d_derived_tbl too strict libjpeg-turbo/libjpeg-turbo#586

thorfdbg · 2022-03-25T10:50:34Z

On 25.03.2022 at 10:29, Mathieu Malaterre wrote:

I cannot possibly report a 27 years old bug. OK, let's try: libjpeg-turbo/libjpeg-turbo#586

Just to be sure... how did you create the stream? Note that libjpeg-turbo does not implement all of JPEG; in particular, it does not implement the lossless process.

Greetings, Thomas

malaterre · 2022-03-25T11:04:03Z

Just to be sure... how did you create the stream?

I've extracted the JPEG bitstream from a DICOM file (PHILIPS_Gyroscan-12-Jpeg_Extended_Process_2_4.dcm). In DICOM terminology this is a JPEG Extended (Process 2 & 4): Default Transfer Syntax for Lossy JPEG 12 Bit Image Compression (Process 4 only) (aka 1.2.840.10008.1.2.4.51). It is just a JPEG bitstream with a SOF1 marker using Sample Precision = 12 (grayscale).

Note that libjpegturbo does not implement all of JPEG, in particular, it does not implement the lossless process.

That was the most difficult part of reporting this bug. This is a 12-bit-per-sample example. I know that libjpeg6b has supported those since day 1, but I suspect most people will simply assume this is a bogus file.

Really (ideally!), I would have generated an 8-bit Huffman example. However, I did not know whether there was a way to craft one using your jpeg command line tool.

The only samples I have at hand to demonstrate the issue are a lossless one (which, as you know, is not handled in libjpeg-turbo) or a 12-bit one, which is somewhat 'exotic' for the libjpeg-turbo audience.

Is there a way to craft such an example (8-bit/Huffman)?

thorfdbg · 2022-03-25T11:07:57Z

On 25.03.2022 at 12:04, Mathieu Malaterre wrote:

Just to be sure... how did you create the stream? I've extracted the JPEG bitstream from a DICOM file (PHILIPS_Gyroscan-12-Jpeg_Extended_Process_2_4.dcm). In DICOM terminology this is a JPEG Extended (Process 2 & 4): Default Transfer Syntax for Lossy JPEG 12 Bit Image Compression (Process 4 only) (aka 1.2.840.10008.1.2.4.51). Just a JPEG bitstream with a SOF1 marker using Sample Precision = 12 (grayscale).

For 12-bit input, you need to recompile libjpeg-turbo; it does not implement a unified bitdepth-agnostic JPEG decoder as the ISO software does. )-:

Greetings, Thomas

malaterre · 2022-03-25T11:11:55Z

For 12bit input, you need to recompile libjpeg-turbo,

I did indicate the complete steps at (including setting of WITH_12BIT:BOOL=ON):

Bogus Huffman table definition: jpeg_make_d_derived_tbl too strict libjpeg-turbo/libjpeg-turbo#586 (comment)

My question remains: how can I generate something equivalent using 8 bits per sample? Thanks for any suggestion (as always).

thorfdbg · 2022-05-23T06:04:15Z

As far as libjpeg-turbo is concerned, I believe this is best checked with Darrell. For this code, I added some additional security measures in 1.64 for the lossless predictive path and the AC coded sequential path. The refinement and sequential coding paths already had sufficient security measures to detect out-of-bounds cases for incoming streams.

Anyhow, I believe the current code is fine as is, please re-open if you find an issue.

malaterre mentioned this issue Mar 25, 2022

Bogus Huffman table definition: jpeg_make_d_derived_tbl too strict libjpeg-turbo/libjpeg-turbo#586

Closed

thorfdbg closed this as completed May 23, 2022
