Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lossless JPEG: Valid range for DC table ? #68

Closed
malaterre opened this issue Mar 14, 2022 · 14 comments
Closed

Lossless JPEG: Valid range for DC table ? #68

malaterre opened this issue Mar 14, 2022 · 14 comments

Comments

@malaterre
Copy link
Contributor

Let's consider the following steps:

% wget http://graphics.stanford.edu/~jowens/223b/lena/lena.ppm
% jpeg -c -p lena.ppm lena.jpg

If one read the DC table from the generated file (*), we'll see that:

Offset 0x0025 Marker 0xffc4 DHT Define Huffman Table(s) length variable 0x113
        JPEG_DHT_Parameters:
                 TableClass = 0
                 HuffmanTableIdentifier = 0
[...]
                         nHuffmanCodesOfLength 14 = 15
                                 ValueOfHuffmanCode 0 = 241
                                 ValueOfHuffmanCode 1 = 242
                                 ValueOfHuffmanCode 2 = 243
                                 ValueOfHuffmanCode 3 = 244
                                 ValueOfHuffmanCode 4 = 245
                                 ValueOfHuffmanCode 5 = 246
                                 ValueOfHuffmanCode 6 = 247
                                 ValueOfHuffmanCode 7 = 248
                                 ValueOfHuffmanCode 8 = 249
                                 ValueOfHuffmanCode 9 = 250
                                 ValueOfHuffmanCode 10 = 251
                                 ValueOfHuffmanCode 11 = 252
                                 ValueOfHuffmanCode 12 = 253
                                 ValueOfHuffmanCode 13 = 254
                                 ValueOfHuffmanCode 14 = 255

However the value 241 in the huffman table is impossible according to ISO 10918-1, H.1.2.2 Huffman coding of the modulo difference (table H.2). It should be in the range 0 ≤ symbol ≤ 16.

What am I misunderstanding here ?

(*)

% jpegdump < lena.jpg
@malaterre
Copy link
Contributor Author

One remark. Using the flag:

-h         : optimize the Huffman tables

seems to remove those huffman values

@thorfdbg
Copy link
Owner

thorfdbg commented Mar 14, 2022 via email

@thorfdbg
Copy link
Owner

thorfdbg commented Mar 14, 2022 via email

@malaterre
Copy link
Contributor Author

@scaramallion could you confirm you always use optimized huffman tables in pylibjpeg. Otherwise this will confuse another famous lossless JPEG library when decompressing DICOM instances.

@scaramallion
Copy link

scaramallion commented Mar 16, 2022

Hmm, I don't think I do. I'll have to check. Thanks for the heads-up

@malaterre
Copy link
Contributor Author

Dear @thorfdbg

I am trying to understand on how best to address this. Could you please have a quick look at the following 12bits JPEG file:

bogus_huffman

You'll see that it contains a bogus 17 value in the huffman table:

                         nHuffmanCodesOfLength 11 = 1
                                 ValueOfHuffmanCode 0 = 17

I've used a simple scanning of all huffman table values to decide whether or not the input file is valid or not. However as discussed previously you've demonstrated that this is a naive approach, as tables could perfectly contains bogus huffman value if not used.

My question is twofold:

  1. Could you confirm that the above attached 12bits JPEG file is indeed invalid ?
  2. Could you suggest a fix so that your jpeg command line tool reject this invalid file ?

Thanks much !

@thorfdbg
Copy link
Owner

thorfdbg commented Mar 18, 2022 via email

@malaterre
Copy link
Contributor Author

malaterre commented Mar 21, 2022

On which basis should it reject it? Because there is a code that is not used? Why?

https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-6b/jdhuff.c#L253-L265

AFAIK this is the original code from libjpeg6b ... I cannot possibly report a 27 years old bug.

@malaterre
Copy link
Contributor Author

I cannot possibly report a 27 years old bug.

OK, let's try:

@thorfdbg
Copy link
Owner

thorfdbg commented Mar 25, 2022 via email

@malaterre
Copy link
Contributor Author

Just to be sure... how did you create the stream?

I've extracted the JPEG bitstream from a DICOM file (PHILIPS_Gyroscan-12-Jpeg_Extended_Process_2_4.dcm). In DICOM terminology this is a JPEG Extended (Process 2 & 4): Default Transfer Syntax for Lossy JPEG 12 Bit Image Compression (Process 4 only) (aka 1.2.840.10008.1.2.4.51). Just a JPEG bistream with a SOF1 marker using Sample Precision = 12 (grayscale).

Note that libjpegturbo does not implement all of JPEG, in particular, it does not implement the lossless process.

That was the most complex thing with reporting of this bug. This is a 12bits per sample example. I know that libjpeg6b has supported those since day 1, I suspect most people will simply assume this is a bogus file.

Really (ideally!), I would have generated an 8bits huffman example. However I did not know if there was a way using your jpeg command line tool to craft one such.

The only sample I have at hand to demonstrate the issue are a lossless (which as you know is not handled in libjpeg-turbo or a 12bits sample, which is somewhat 'exotic' for the libjpeg-turbo audience).

Is there a way to craft one such example (8bits/huffman) ?

@thorfdbg
Copy link
Owner

thorfdbg commented Mar 25, 2022 via email

@malaterre
Copy link
Contributor Author

For 12bit input, you need to recompile libjpeg-turbo,

I did indicate the complete steps at (including setting of WITH_12BIT:BOOL=ON):

My question remains: as to how to generate something equivalent using 8bits / sample... thanks for any suggestion (as always).

@thorfdbg
Copy link
Owner

As far as libjpeg-turbo is concerned, I believe this is best checked with Darrell. For this code, I added some additional security measures in 1.64 for the lossless predictive path and the AC coded sequential path. The refinement and sequential coding paths had already sufficient security measures to detect ouf-of-bounds cases for incoming streams.

Anyhow, I believe the current code is fine as is, please re-open if you find an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants