LZ4 format not compatible? #43

mickeyl · 2023-04-24T07:48:44Z

For some reason, the version of LZ4 you have implemented does not seem to be compatible with Apple's LZ4_RAW or the reference implementation at https://github.com/lz4/lz4. Is that possible, or am I using it wrong?

tsolomko · 2023-04-24T08:00:26Z

I don't know anything about Apple's LZ4_RAW, but it definitely should be compatible with the reference implementation. Do you have any specific examples of incompatibility?

mickeyl · 2023-04-24T09:15:43Z

Ok, here are some details. Consider the following example using upstream lz4.c and lz4.h:

-(void)testCompression {

    // Input: ten repetitions of "HALLO" separated by spaces (59 bytes).
    auto string = std::string("HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO");
    uint8_t buffer[100];

    // Returns the number of bytes written to buffer, or 0 on failure.
    auto compressed = ::LZ4_compress_default(string.c_str(), (char*)buffer, int(string.size()), sizeof(buffer));

    // Dump the compressed block as hex.
    for (int i = 0; i < compressed; ++i) {
        printf("%02X, ", buffer[i]);
    }
    printf("\n");
}

This prints out 6F, 48, 41, 4C, 4C, 4F, 20, 06, 00, 1D, 50, 48, 41, 4C, 4C, 4F.

A corresponding Swift program:

import SWCompression
import Foundation

let string = "HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO"
let data = string.data(using: .utf8)!
let compressed = LZ4.compress(data: data)
for byte in compressed {
    print(String(format: "%02X, ", byte), terminator: "")
}
print("")

emits 04, 22, 4D, 18, 64, 70, B9, 10, 00, 00, 00, 6F, 48, 41, 4C, 4C, 4F, 20, 06, 00, 1D, 50, 48, 41, 4C, 4C, 4F, 00, 00, 00, 00, 4B, B9, 24, 1F.

It looks like your algorithm implementation is appending/prepending some metadata.
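
As a sanity check on the 16 bytes printed by the C++ example, here is a minimal sketch of an LZ4 block decoder, written from the public block format description (token, literals, 2-byte little-endian offset, optional length extension bytes). The name decodeLz4Block is hypothetical, and the sketch does only basic bounds checks, so it is not a substitute for LZ4_decompress_safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decodes a single raw LZ4 block (no frame metadata).
std::string decodeLz4Block(const std::vector<uint8_t>& in) {
    std::string out;
    size_t i = 0;
    // Reads the length extension bytes that follow a nibble equal to 15.
    // The extension ends at the first byte that is not 255.
    auto readExtra = [&](size_t length) {
        uint8_t b;
        do {
            if (i >= in.size()) throw std::runtime_error("truncated length");
            b = in[i++];
            length += b;
        } while (b == 255);
        return length;
    };
    while (i < in.size()) {
        const uint8_t token = in[i++];
        // High nibble of the token: number of literal bytes.
        size_t literals = token >> 4;
        if (literals == 15) literals = readExtra(literals);
        if (literals > in.size() - i) throw std::runtime_error("truncated literals");
        out.append(in.begin() + i, in.begin() + i + literals);
        i += literals;
        if (i == in.size()) break;  // the last sequence carries literals only
        // Two-byte little-endian match offset.
        if (in.size() - i < 2) throw std::runtime_error("truncated offset");
        const size_t offset = size_t(in[i]) | (size_t(in[i + 1]) << 8);
        i += 2;
        if (offset == 0 || offset > out.size()) throw std::runtime_error("bad offset");
        // Low nibble of the token: match length minus 4.
        size_t matchLength = token & 0x0F;
        if (matchLength == 15) matchLength = readExtra(matchLength);
        matchLength += 4;
        // Copy byte by byte, because the match may overlap the bytes being written.
        for (size_t k = 0; k < matchLength; ++k) {
            out.push_back(out[out.size() - offset]);
        }
    }
    return out;
}
```

Feeding it the 16 bytes above yields the original 59-character string: 6 literal bytes ("HALLO "), a 48-byte match at offset 6, and a final 5-byte literal ("HALLO").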

tsolomko · 2023-04-24T09:37:45Z

Right, the function LZ4_compress_default from the reference implementation produces compressed blocks, whereas SWCompression produces compressed frames, which consist of blocks (in principle, more than one) and the various metadata required to decompress these blocks. You may note that the output of LZ4_compress_default is contained in its entirety within SWCompression's output, which confirms this.

To quote the LZ4 manual:

Blocks are different from Frames (doc/lz4_Frame_format.md).
Frames bundle both blocks and metadata in a specified manner.
Embedding metadata is required for compressed data to be self-contained and portable.
Frame format is delivered through a companion API, declared in lz4frame.h.

As such, to obtain the exact same output from the reference implementation, you would need to use a function from lz4frame.h.
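
The containment claim can be checked mechanically. The following sketch hard-codes the two byte sequences quoted earlier in this thread and searches for the block inside the frame. findBlockInFrame is a hypothetical helper name, not part of either library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Output of LZ4_compress_default, as quoted in this thread (a raw block).
const std::vector<uint8_t> kBlock = {
    0x6F, 0x48, 0x41, 0x4C, 0x4C, 0x4F, 0x20, 0x06,
    0x00, 0x1D, 0x50, 0x48, 0x41, 0x4C, 0x4C, 0x4F};

// Output of SWCompression's LZ4.compress(data:), as quoted in this thread (a frame).
const std::vector<uint8_t> kFrame = {
    0x04, 0x22, 0x4D, 0x18, 0x64, 0x70, 0xB9, 0x10, 0x00, 0x00, 0x00,
    0x6F, 0x48, 0x41, 0x4C, 0x4C, 0x4F, 0x20, 0x06,
    0x00, 0x1D, 0x50, 0x48, 0x41, 0x4C, 0x4C, 0x4F,
    0x00, 0x00, 0x00, 0x00, 0x4B, 0xB9, 0x24, 0x1F};

// Hypothetical helper: offset of `block` inside `frame`, or -1 if it is absent.
std::ptrdiff_t findBlockInFrame(const std::vector<uint8_t>& frame,
                                const std::vector<uint8_t>& block) {
    auto it = std::search(frame.begin(), frame.end(), block.begin(), block.end());
    return it == frame.end() ? -1 : it - frame.begin();
}
```

For these inputs the block starts at offset 11 and is followed by 8 trailing bytes. In front of it sit the 4-byte magic number, the 3-byte frame descriptor, and the 4-byte block size; behind it are the 4-byte end mark and the 4-byte content checksum.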

mickeyl · 2023-04-24T09:40:41Z

I see. Would it be possible to add a parameter to your implementation so that we can choose the output format between frames and blocks?

tsolomko · 2023-04-24T09:55:55Z

I am not sure there is any value in choosing between frames and blocks. There are various configurable parameters of the LZ4 algorithm (block size, block dependency, dictionary compression) that can result in different outputs for the same input (possibly even a different number of blocks!). So to decompress any isolated block you would still need to give the decompressor some external knowledge about how these blocks were compressed. At that point you would be reimplementing (at least partially) the LZ4 frame format in some other form.

So my question to you is: why do you need the ability to compress into blocks? If you would like to reduce the size of the output, you can have a look at this function. By default, the blockChecksums and contentSize parameters are false, while contentChecksum is true, so you can additionally set the latter to false to further reduce the amount of metadata.
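
To put numbers on the metadata, here is a rough sketch of the per-frame overhead as I read the LZ4 frame format document, for a frame that holds a single block and no dictionary ID. The struct and function names are hypothetical; the three flags mirror the parameters mentioned above.

```cpp
#include <cstddef>

// Hypothetical mirror of the three options discussed above, with the same defaults.
struct FrameOptions {
    bool blockChecksums = false;
    bool contentSize = false;
    bool contentChecksum = true;
};

// Bytes of frame metadata around a single block (assumes no dictionary ID).
std::size_t singleBlockFrameOverhead(const FrameOptions& o) {
    std::size_t n = 4;               // magic number
    n += 3;                          // FLG byte, BD byte, header checksum byte
    if (o.contentSize) n += 8;       // optional content size field
    n += 4;                          // block size field
    if (o.blockChecksums) n += 4;    // optional per-block checksum
    n += 4;                          // end mark
    if (o.contentChecksum) n += 4;   // optional content checksum
    return n;
}
```

With the defaults this gives 19 bytes, which matches the example in this thread (35-byte frame, 16-byte block). Setting contentChecksum to false brings it down to 15.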

mickeyl · 2023-04-24T10:08:16Z

I'm using LZ4 to compress/uncompress binary protocol data that is sent to a micro-controller (ESP32) via BLE. Every saved byte is valuable in this scenario. The central (iOS) would be using SWCompression, the peripheral (FreeRTOS) would be using upstream lz4. I would have thought that it would be pretty easy to just "toggle" appending/prepending the frame metadata, but if it's not, I might as well be using upstream lz4 on iOS.

tsolomko · 2023-04-24T12:41:30Z

Hmm, I see. I will think about adding such functionality in the future.

You wrote: "I would have thought that it would be pretty easy to just 'toggle' appending/prepending the frame metadata."

Generally, it is not. As I mentioned before, one input may result in the creation of several blocks. This can happen if the input is "sufficiently large", the precise definition of which depends on your uncompressed block size settings. In SWCompression the default block size is 4MB, which incidentally is the strict upper limit of the reference implementation on the block size. I do wonder what the aforementioned LZ4_compress_default function from lz4.h does if the input is larger than 4MB. It is also unclear to me how it handles non-compressible inputs.

Meanwhile, assuming that the size of your input is always smaller than 4MB and the input is always compressible, I can suggest the following workaround to extract only the block data:
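
A note on those two open questions, based on my reading of lz4.h: LZ4_compress_default does not appear to be tied to the 4MB frame block limit (its documented maximum input is LZ4_MAX_INPUT_SIZE, 0x7E000000 bytes), and non-compressible input produces a block slightly larger than the input, bounded by LZ4_compressBound. The sketch below restates that bound in standalone form. compressBound and kMaxInputSize are local stand-ins, so please check them against the header you actually ship.

```cpp
#include <cstdint>

// Largest input accepted by the block API, per LZ4_MAX_INPUT_SIZE in lz4.h.
constexpr std::uint32_t kMaxInputSize = 0x7E000000;

// Local stand-in for the LZ4_COMPRESSBOUND macro: worst-case block size for
// `inputSize` bytes, or 0 if the input is too large for a single call.
constexpr std::uint32_t compressBound(std::uint32_t inputSize) {
    return inputSize > kMaxInputSize ? 0 : inputSize + inputSize / 255 + 16;
}
```

For the 59-byte example the bound is 75 bytes, so the 100-byte buffer used earlier is sufficient. If the destination buffer is too small for the compressed result, LZ4_compress_default returns 0.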

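let data = // ... data.count must be smaller than 4 * 1024 * 1024
let compressedData = LZ4.compress(data: data)
// Skip the 11-byte frame header (magic number, frame descriptor, block size)
// and the 8-byte trailer (end mark, content checksum).
let block = compressedData[(compressedData.startIndex + 11)..<(compressedData.endIndex - 8)]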
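
The fixed offsets 11 and 8 only hold for the default settings. A more defensive variant, sketched here in C++ (for example, to validate frames on the receiving side), reads the frame descriptor and the block size field instead. It is written from the LZ4 frame format document; extractFirstBlock and ExtractedBlock are hypothetical names, and checksums are skipped rather than verified.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct ExtractedBlock {
    std::vector<uint8_t> bytes;
    bool stored;  // true if the frame marks the block as stored uncompressed
};

// Hypothetical helper: returns the first data block of an LZ4 frame.
ExtractedBlock extractFirstBlock(const std::vector<uint8_t>& frame) {
    const uint8_t magic[4] = {0x04, 0x22, 0x4D, 0x18};
    if (frame.size() < 11 || !std::equal(magic, magic + 4, frame.begin())) {
        throw std::runtime_error("not an LZ4 frame");
    }
    const uint8_t flg = frame[4];
    if ((flg >> 6) != 1) throw std::runtime_error("unsupported frame version");

    size_t pos = 6;             // skip magic number, FLG and BD
    if (flg & 0x08) pos += 8;   // content size field is present
    if (flg & 0x01) pos += 4;   // dictionary ID field is present
    pos += 1;                   // header checksum byte

    if (frame.size() < pos + 4) throw std::runtime_error("truncated header");
    uint32_t size = uint32_t(frame[pos]) | (uint32_t(frame[pos + 1]) << 8) |
                    (uint32_t(frame[pos + 2]) << 16) | (uint32_t(frame[pos + 3]) << 24);
    pos += 4;

    // The highest bit of the block size flags an uncompressed (stored) block.
    const bool stored = (size & 0x80000000u) != 0;
    size &= 0x7FFFFFFFu;
    if (size == 0) throw std::runtime_error("frame has no data blocks");
    if (size > frame.size() - pos) throw std::runtime_error("truncated block");

    return {std::vector<uint8_t>(frame.begin() + pos, frame.begin() + pos + size), stored};
}
```

If stored comes back true, the block is a verbatim copy of the input rather than LZ4-compressed data, which is the non-compressible case that the fixed-offset slice cannot distinguish.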

mickeyl · 2023-04-24T12:42:33Z

Awesome, that's a quick fix that will do it for now. Thanks a lot.

mickeyl closed this as completed Apr 24, 2023
