LZ4 format not compatible? #43

Closed
mickeyl opened this issue Apr 24, 2023 · 8 comments

Comments

@mickeyl

mickeyl commented Apr 24, 2023

For some reason, your LZ4 implementation does not seem to be compatible with Apple's LZ4_RAW or the reference implementation at https://github.com/lz4/lz4. Is that possible, or am I using it wrong?

@tsolomko
Owner

I don't know anything about Apple's LZ4_RAW, but it definitely should be compatible with the reference implementation. Do you have any specific examples of incompatibility?

@mickeyl
Author

mickeyl commented Apr 24, 2023

OK, here are some details. Consider the following example, which uses the upstream lz4.c and lz4.h:

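// Objective-C++ test method; assumes lz4.h, <string>, and <cstdio> are included.
-(void)testCompression {

    auto string = std::string("HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO");
    uint8_t buffer[100];

    // Compress into a raw LZ4 block; the return value is the compressed size in bytes (0 on failure).
    auto compressed = ::LZ4_compress_default(string.c_str(), (char*)buffer, static_cast<int>(string.size()), sizeof(buffer));

    // Dump the compressed bytes as hex.
    for (int i = 0; i < compressed; ++i) {
        printf("%02X, ", buffer[i]);
    }
    printf("\n");
}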

This prints out 6F, 48, 41, 4C, 4C, 4F, 20, 06, 00, 1D, 50, 48, 41, 4C, 4C, 4F, .

A corresponding Swift program:

import SWCompression
import Foundation

let string = "HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO HALLO"
let data = string.data(using: .utf8)!
let compressed = LZ4.compress(data: data)
for byte in compressed {
    print(String(format: "%02X, ", byte), terminator: "")
}
print("")

emits 04, 22, 4D, 18, 64, 70, B9, 10, 00, 00, 00, 6F, 48, 41, 4C, 4C, 4F, 20, 06, 00, 1D, 50, 48, 41, 4C, 4C, 4F, 00, 00, 00, 00, 4B, B9, 24, 1F,.

It looks like your algorithm implementation is appending/prepending some metadata.

@tsolomko
Owner

Right, the function LZ4_compress_default from the reference implementation produces compressed blocks, whereas SWCompression produces compressed frames, which consist of blocks (in principle, more than one) and the metadata required to decompress them. Note that the 16-byte output of LZ4_compress_default is contained in its entirety within SWCompression's 35-byte output, starting at byte offset 11, which confirms this.
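The containment claim can be checked mechanically. The following is a minimal C++ sketch that hard-codes the two byte dumps from this thread and searches for the block inside the frame. The names kBlock, kFrame, and findSubsequence are made up for illustration and are not part of either library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Block bytes printed by the LZ4_compress_default example in this thread.
const std::vector<std::uint8_t> kBlock = {
    0x6F, 0x48, 0x41, 0x4C, 0x4C, 0x4F, 0x20, 0x06,
    0x00, 0x1D, 0x50, 0x48, 0x41, 0x4C, 0x4C, 0x4F};

// Frame bytes printed by the SWCompression example in this thread.
const std::vector<std::uint8_t> kFrame = {
    0x04, 0x22, 0x4D, 0x18, 0x64, 0x70, 0xB9, 0x10, 0x00, 0x00, 0x00,
    0x6F, 0x48, 0x41, 0x4C, 0x4C, 0x4F, 0x20, 0x06,
    0x00, 0x1D, 0x50, 0x48, 0x41, 0x4C, 0x4C, 0x4F,
    0x00, 0x00, 0x00, 0x00, 0x4B, 0xB9, 0x24, 0x1F};

// Hypothetical helper: returns the offset of the first occurrence of `needle`
// inside `haystack`, or -1 if `needle` does not occur.
std::ptrdiff_t findSubsequence(const std::vector<std::uint8_t>& haystack,
                               const std::vector<std::uint8_t>& needle) {
    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end());
    if (it == haystack.end()) {
        return -1;
    }
    return it - haystack.begin();
}
```

Calling findSubsequence(kFrame, kBlock) locates the block at offset 11, which leaves 11 bytes of metadata before it and 8 bytes after it.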

To quote the LZ4 manual:

Blocks are different from Frames (doc/lz4_Frame_format.md).
Frames bundle both blocks and metadata in a specified manner.
Embedding metadata is required for compressed data to be self-contained and portable.
Frame format is delivered through a companion API, declared in lz4frame.h.

As such, to obtain the exact same output from the reference implementation, you would need to use a function from lz4frame.h.
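To see what the leading metadata in the SWCompression output says, here is a small C++ sketch that decodes the first bytes of a frame according to the LZ4 frame format (magic number, then the FLG and BD descriptor bytes). The struct and function names are hypothetical, and the header checksum byte is not verified in this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the fields of an LZ4 frame descriptor.
struct FrameDescriptor {
    bool validMagic;
    unsigned version;           // must be 1 for the current frame format
    bool blockIndependence;
    bool blockChecksum;
    bool contentSize;
    bool contentChecksum;
    bool dictionaryId;
    std::size_t blockMaxSize;   // in bytes; 0 if the size ID is not defined
};

// Decodes the magic number plus the FLG and BD bytes of a frame.
// Returns an all-zero descriptor if fewer than 7 bytes are available.
FrameDescriptor parseFrameDescriptor(const std::vector<std::uint8_t>& frame) {
    FrameDescriptor d = {};
    if (frame.size() < 7) {
        return d;
    }
    // Magic number 0x184D2204, stored little-endian.
    d.validMagic = frame[0] == 0x04 && frame[1] == 0x22 &&
                   frame[2] == 0x4D && frame[3] == 0x18;
    const std::uint8_t flg = frame[4];
    const std::uint8_t bd = frame[5];
    d.version = static_cast<unsigned>(flg >> 6);
    d.blockIndependence = (flg & 0x20) != 0;
    d.blockChecksum = (flg & 0x10) != 0;
    d.contentSize = (flg & 0x08) != 0;
    d.contentChecksum = (flg & 0x04) != 0;
    d.dictionaryId = (flg & 0x01) != 0;
    // Size IDs 4..7 map to 64 KB, 256 KB, 1 MB, and 4 MB.
    const unsigned id = (bd >> 4) & 0x07u;
    if (id >= 4) {
        d.blockMaxSize = static_cast<std::size_t>(1) << (8 + 2 * id);
    }
    return d;
}
```

For the bytes 04, 22, 4D, 18, 64, 70, B9 from the Swift output, this reports independent blocks, a content checksum, no block checksums, no content size, and a 4 MB maximum block size. A frame produced through lz4frame.h would need the same settings to match byte for byte.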

@mickeyl
Author

mickeyl commented Apr 24, 2023

I see. Would it be possible to add a parameter to your implementation so that we can choose the output format between frames and blocks?

@tsolomko
Owner

tsolomko commented Apr 24, 2023

I am not sure there is any value in choosing between frames and blocks. There are various configurable parameters of the LZ4 algorithm (block size, block dependency, dictionary compression) that can result in different outputs (possibly even a different number of blocks!) for the same input. So to decompress an isolated block, you would still need to give the decompressor some external knowledge about how the block was compressed. At that point you would be reimplementing the LZ4 frame format (at least partially), just in some other form.
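To illustrate the point about external knowledge, below is a minimal C++ sketch of an LZ4 block decoder. A raw block does not record its decompressed size, so the caller has to supply an upper bound (maxOutput here) from somewhere else. The function name and signature are made up for this example. It is not the reference implementation's API and it has no dictionary support.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads the optional length extension bytes that follow a 4-bit length of 15.
// Returns false if the input ends prematurely.
static bool readExtendedLength(const std::vector<std::uint8_t>& in,
                               std::size_t& pos, std::size_t& length) {
    std::uint8_t b;
    do {
        if (pos >= in.size()) return false;
        b = in[pos++];
        length += b;
    } while (b == 255);
    return true;
}

// Hypothetical decoder for a single raw LZ4 block.
// `maxOutput` is the external knowledge: the block itself does not say how
// large the decompressed data is. Returns false on malformed input or if the
// output would exceed `maxOutput`.
bool decodeLz4Block(const std::vector<std::uint8_t>& block,
                    std::size_t maxOutput,
                    std::vector<std::uint8_t>& out) {
    out.clear();
    std::size_t i = 0;
    while (i < block.size()) {
        const std::uint8_t token = block[i++];

        // High nibble: number of literal bytes to copy verbatim.
        std::size_t literalLength = token >> 4;
        if (literalLength == 15 && !readExtendedLength(block, i, literalLength)) return false;
        if (literalLength > block.size() - i) return false;
        if (literalLength > maxOutput - out.size()) return false;
        for (std::size_t k = 0; k < literalLength; ++k) out.push_back(block[i + k]);
        i += literalLength;

        // The last sequence of a block consists of literals only.
        if (i == block.size()) break;

        // Two-byte little-endian offset back into the already decoded output.
        if (block.size() - i < 2) return false;
        const std::size_t offset = block[i] | (static_cast<std::size_t>(block[i + 1]) << 8);
        i += 2;
        if (offset == 0 || offset > out.size()) return false;

        // Low nibble: match length minus the minimum match length of 4.
        std::size_t matchLength = token & 0x0F;
        if (matchLength == 15 && !readExtendedLength(block, i, matchLength)) return false;
        matchLength += 4;
        if (matchLength > maxOutput - out.size()) return false;

        // Copy byte by byte, because the match may overlap the bytes being written.
        const std::size_t from = out.size() - offset;
        for (std::size_t k = 0; k < matchLength; ++k) {
            const std::uint8_t byte = out[from + k];
            out.push_back(byte);
        }
    }
    return true;
}
```

Fed the 16-byte block from the C++ example above with maxOutput set to 59, this sketch reproduces the original 59-byte string. With a smaller bound it refuses to decode, which is exactly the kind of information a frame header would otherwise carry.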

So my question to you is: why do you need the capability to compress into blocks? If you would like to reduce the size of the output, have a look at this function. By default, the blockChecksums and contentSize parameters are false, while contentChecksum is true, so you can set the latter to false to further reduce the amount of metadata.
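How much each of these options costs can be estimated from the LZ4 frame format. The C++ sketch below adds up the metadata bytes for a given combination. The FrameOptions struct and frameOverhead function are hypothetical names chosen to mirror the parameters mentioned above. The dictionaryId field is an extra assumption taken from the frame format, not from this thread.

```cpp
#include <cstddef>

// Hypothetical mirror of the frame options discussed in this thread.
struct FrameOptions {
    bool blockChecksums;
    bool contentSize;
    bool contentChecksum;
    bool dictionaryId;   // assumption: present only when a dictionary ID is written
};

// Number of metadata bytes in a frame that holds `blockCount` blocks.
std::size_t frameOverhead(const FrameOptions& options, std::size_t blockCount) {
    std::size_t bytes = 0;
    bytes += 4;                                   // magic number
    bytes += 2;                                   // FLG and BD descriptor bytes
    if (options.contentSize) bytes += 8;          // uncompressed content size
    if (options.dictionaryId) bytes += 4;         // dictionary ID
    bytes += 1;                                   // header checksum
    const std::size_t perBlock =
        4 + (options.blockChecksums ? 4u : 0u);   // block size field, optional checksum
    bytes += blockCount * perBlock;
    bytes += 4;                                   // EndMark
    if (options.contentChecksum) bytes += 4;      // checksum of the whole content
    return bytes;
}
```

With the defaults described above and a single block, this gives 19 bytes, which matches the difference between the 35-byte frame and the 16-byte block in the earlier example. Turning contentChecksum off brings it down to 15 bytes.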

@mickeyl
Author

mickeyl commented Apr 24, 2023

I'm using LZ4 to compress/decompress binary protocol data that is sent to a microcontroller (ESP32) via BLE. Every saved byte is valuable in this scenario. The central (iOS) would be using SWCompression, the peripheral (FreeRTOS) would be using upstream lz4. I would have thought that it would be pretty easy to just "toggle" appending/prepending the frame metadata, but if it's not, I might as well use upstream lz4 on iOS.

@tsolomko
Owner

Hmm, I see. I will think about adding such functionality in the future.

I would have thought that it would be pretty easy to just "toggle" appending/prepending the frame metadata

Generally, it is not. As I mentioned before, one input may result in the creation of several blocks. This can happen if the input is "sufficiently large", the precise definition of which depends on your uncompressed block size setting. In SWCompression the default block size is 4 MB, which incidentally is the strict upper limit on the block size in the reference implementation. I do wonder what the aforementioned LZ4_compress_default function from lz4.h does if the input is larger than 4 MB. It is also unclear to me how it handles non-compressible inputs.
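As a rough illustration of when an input is "sufficiently large", the C++ sketch below counts the blocks a frame would contain, assuming the compressor fills every block up to the configured uncompressed block size before starting the next one. That assumption and the function name blockCount are mine and are not taken from SWCompression's source.

```cpp
#include <cstddef>

// Number of blocks needed for `inputSize` bytes when each block holds at most
// `blockSize` uncompressed bytes. Returns 0 for an empty input or a zero
// block size (no blocks can be formed).
std::size_t blockCount(std::size_t inputSize, std::size_t blockSize) {
    if (inputSize == 0 || blockSize == 0) {
        return 0;
    }
    // Ceiling division without risking overflow in inputSize + blockSize.
    return inputSize / blockSize + (inputSize % blockSize != 0 ? 1 : 0);
}
```

With a 4 MB block size, any input of up to 4 * 1024 * 1024 bytes fits in one block, and a single additional byte produces a second block.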

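Meanwhile, assuming that the size of your input is always smaller than 4 MB and the input is always compressible, I can suggest the following workaround to extract only the block data. With the default settings, the first 11 bytes are the frame header (7 bytes) plus the block size field (4 bytes), and the last 8 bytes are the EndMark (4 bytes) plus the content checksum (4 bytes):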

let data = // ... data.count must be smaller than 4 * 1024 * 1024
let compressedData = LZ4.compress(data: data)
let block = compressedData[(compressedData.startIndex + 11)..<(compressedData.endIndex - 8)]
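The fixed offsets in the slice above only hold for the default frame options. A slightly more defensive variant walks the frame header instead. The C++ sketch below does that for the first block of a frame, following the LZ4 frame format. The function name and signature are hypothetical, and checksums are skipped rather than verified.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies the payload of the first block of an LZ4 frame
// into `block`. `stored` is set to true if the frame marks the block as
// uncompressed (highest bit of the block size field), which happens for
// non-compressible data. Returns false if the frame is malformed or empty.
bool extractFirstBlock(const std::vector<std::uint8_t>& frame,
                       std::vector<std::uint8_t>& block,
                       bool& stored) {
    block.clear();
    stored = false;
    if (frame.size() < 7) return false;
    if (frame[0] != 0x04 || frame[1] != 0x22 ||
        frame[2] != 0x4D || frame[3] != 0x18) return false;

    const std::uint8_t flg = frame[4];
    std::size_t pos = 6;                 // skip magic number, FLG, and BD
    if (flg & 0x08) pos += 8;            // optional content size
    if (flg & 0x01) pos += 4;            // optional dictionary ID
    pos += 1;                            // header checksum
    if (pos > frame.size() || frame.size() - pos < 4) return false;

    // Little-endian 32-bit block size field.
    const std::uint32_t field =
        static_cast<std::uint32_t>(frame[pos]) |
        (static_cast<std::uint32_t>(frame[pos + 1]) << 8) |
        (static_cast<std::uint32_t>(frame[pos + 2]) << 16) |
        (static_cast<std::uint32_t>(frame[pos + 3]) << 24);
    pos += 4;
    if (field == 0) return false;        // EndMark: the frame has no blocks

    stored = (field & 0x80000000u) != 0;
    const std::size_t length = field & 0x7FFFFFFFu;
    if (length > frame.size() - pos) return false;

    block.assign(frame.begin() + static_cast<std::ptrdiff_t>(pos),
                 frame.begin() + static_cast<std::ptrdiff_t>(pos + length));
    return true;
}
```

For the 35-byte frame shown earlier in the thread, this yields the same 16 bytes as the slice with the fixed offsets. The stored flag lets the sender notice the non-compressible case, in which the payload is not an LZ4 block at all.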

@mickeyl
Author

mickeyl commented Apr 24, 2023

Awesome, that's a quick fix that will do it for now. Thanks a lot.

@mickeyl mickeyl closed this as completed Apr 24, 2023