New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lossless JPEG: Valid range for DC table ? #68
Comments
One remark. Using the flag:
seems to remove those huffman values |
Am 14.03.22 um 09:43 schrieb Mathieu Malaterre:
Let's consider the following steps:
|% wget http://graphics.stanford.edu/~jowens/223b/lena/lena.ppm % jpeg
-c -p lena.ppm lena.jpg |
If one read the DC table from the generated file (*), we'll see that:
|Offset 0x0025 Marker 0xffc4 DHT Define Huffman Table(s) length variable
0x113 JPEG_DHT_Parameters: TableClass = 0 HuffmanTableIdentifier = 0
[...] nHuffmanCodesOfLength 14 = 15 ValueOfHuffmanCode 0 = 241
ValueOfHuffmanCode 1 = 242 ValueOfHuffmanCode 2 = 243 ValueOfHuffmanCode
3 = 244 ValueOfHuffmanCode 4 = 245 ValueOfHuffmanCode 5 = 246
ValueOfHuffmanCode 6 = 247 ValueOfHuffmanCode 7 = 248 ValueOfHuffmanCode
8 = 249 ValueOfHuffmanCode 9 = 250 ValueOfHuffmanCode 10 = 251
ValueOfHuffmanCode 11 = 252 ValueOfHuffmanCode 12 = 253
ValueOfHuffmanCode 13 = 254 ValueOfHuffmanCode 14 = 255 |
However the value |241| in the huffman table is impossible according to
ISO 10918-1, H.1.2.2 Huffman coding of the modulo difference (table
H.2). It should be in the range 0 ≤ symbol ≤ 16.
What am I misunderstanding here ?
Note that this is a "default table" that covers all coding modes of
JPEG. While the codewords they represent do not appear in sequential
modes, they do appear in other modes and are therefore present. They
don't do any harm there. While the decoder would create the right
Huffman codes from them, it simply wouldn't use these entries.
Without a fully populated table, coding modes with the 12-bit sequential
or progressive or differential modes, or some of the "extended range"
modes of JPEG XT would require the -h command line switch to encode an
image. Thus, they are present.
Clearly, you get better coding results with "-h" which builds a
customized table that is ideal for the requested mode, and image.
Greetings,
Thomas
|
Am 14.03.22 um 10:38 schrieb Mathieu Malaterre:
One remark. Using the flag:
|-h : optimize the Huffman tables |
seems to remove those huffman values
Exactly. See previous email. What you see is just a Huffman default
table, which extends the example table from Annex K by those entries
necessary for all other coding modes.
Certainly not the ideal table, but not an error either.
Greetings,
Thomas
|
@scaramallion could you confirm you always use |
Hmm, I don't think I do. I'll have to check. Thanks for the heads-up |
Dear @thorfdbg I am trying to understand on how best to address this. Could you please have a quick look at the following 12bits JPEG file: You'll see that it contains a bogus 17 value in the huffman table:
I've used a simple scanning of all huffman table values to decide whether or not the input file is valid or not. However as discussed previously you've demonstrated that this is a naive approach, as tables could perfectly contains bogus huffman value if not used. My question is twofold:
Thanks much ! |
Am 18.03.22 um 09:30 schrieb Mathieu Malaterre:
Dear @thorfdbg <https://github.com/thorfdbg>
I am trying to understand on how best to address this. Could you please
have a quick look at the following 12bits JPEG file:
bogus_huffman
<https://user-images.githubusercontent.com/228803/158963962-0b4d6d98-90ab-4361-8096-f4f7a40b8db4.jpg>
You'll see that it contains a bogus 17 value in the huffman table:
|nHuffmanCodesOfLength 11 = 1 ValueOfHuffmanCode 0 = 17 |
I've used a simple scanning of all huffman table values to decide
whether or not the input file is valid or not. However as discussed
previously you've demonstrated that this is a naive approach, as tables
could perfectly contains bogus huffman value if not used.
Correct. You can also look at this as follows: The example Huffman table
documented in the JPEG standard, used as default table by many
implementations, defines a (subset of) Huffman codes. Can you actually
ensure that each encoded image actually makes use of each code in this
example table? Probably not.
Thus, I can certainly decode the above image correctly, and I wouldn't
know what the issue with the table is.
My question is twofold:
1. Could you confirm that the above attached 12bits JPEG file is indeed
invalid ?
Actually, it looks pretty ok to me.
2. Could you suggest a fix so that your |jpeg| command line tool reject
this invalid file ?
On which basis should it reject it? Because there is a code that is not
used? Why?
Greetings,
Thomas
|
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-6b/jdhuff.c#L253-L265 AFAIK this is the original code from libjpeg6b ... I cannot possibly report a 27 years old bug. |
OK, let's try: |
Am 25.03.2022 um 10:29 schrieb Mathieu Malaterre:
I cannot possibly report a 27 years old bug.
OK, let's try:
* libjpeg-turbo/libjpeg-turbo#586
<libjpeg-turbo/libjpeg-turbo#586>
Just to be sure... how did you create the stream? Note that libjpegturbo
does not implement all of JPEG, in particular, it does not implement the
lossless process.
Greetings,
Thomas
|
I've extracted the JPEG bitstream from a DICOM file (
That was the most complex thing with reporting of this bug. This is a 12bits per sample example. I know that libjpeg6b has supported those since day 1, I suspect most people will simply assume this is a bogus file. Really (ideally!), I would have generated an 8bits huffman example. However I did not know if there was a way using your The only sample I have at hand to demonstrate the issue are a lossless (which as you know is not handled in libjpeg-turbo or a 12bits sample, which is somewhat 'exotic' for the libjpeg-turbo audience). Is there a way to craft one such example (8bits/huffman) ? |
Am 25.03.2022 um 12:04 schrieb Mathieu Malaterre:
Just to be sure... how did you create the stream?
I've extracted the JPEG bitstream from a DICOM file
(|PHILIPS_Gyroscan-12-Jpeg_Extended_Process_2_4.dcm|). In DICOM
terminology this is a |JPEG Extended (Process 2 & 4): Default Transfer
Syntax for Lossy JPEG 12 Bit Image Compression (Process 4 only)| (aka
|1.2.840.10008.1.2.4.51|). Just a JPEG bistream with a SOF1 marker using
Sample Precision = 12 (grayscale).
For 12bit input, you need to recompile libjpeg-turbo, it does not
implement a unified bitdepth-agnoistic JPEG decoder as the ISO software
does. )-:
Greetings,
Thomas
|
I did indicate the complete steps at (including setting of My question remains: as to how to generate something equivalent using 8bits / sample... thanks for any suggestion (as always). |
As far as libjpeg-turbo is concerned, I believe this is best checked with Darrell. For this code, I added some additional security measures in 1.64 for the lossless predictive path and the AC coded sequential path. The refinement and sequential coding paths had already sufficient security measures to detect ouf-of-bounds cases for incoming streams. Anyhow, I believe the current code is fine as is, please re-open if you find an issue. |
Let's consider the following steps:
If one read the DC table from the generated file (*), we'll see that:
However the value
241
in the huffman table is impossible according to ISO 10918-1, H.1.2.2 Huffman coding of the modulo difference (table H.2). It should be in the range 0 ≤ symbol ≤ 16.What am I misunderstanding here ?
(*)
The text was updated successfully, but these errors were encountered: