poor documentation #15
Comments
Apologies for the delay in responding. I've been on holidays and then started a new job. I've added links in the README to the file format and some blog posts that give some background in commit f1a3824. Is this what you're looking for? If not, would you be willing to do a PR that does contain what you're looking for? |
I think the graphics belong in the repository, in a README-FORMAT.md or just FORMAT.md, or in the wiki, with some examples. That would be really helpful. |
So here are some more things that should be added:
I know, lots of questions. But this will help implementing zchunk in other languages. |
Also, how to use zck_gen_zdict would be handy. |
Good point.
This is actually already supposed to be working (there are zdicts designed for Fedora repodata available in the package fedora-repo-zdicts), but it seems that the updates* metadata isn't currently using the zdicts. I've submitted a patch to Fedora infra that will hopefully fix this. |
I'd be more interested in how to generate zdicts for repos other than fedora and fedora-update (e.g. rpmfusion, private repos). Or are the fedora-repo-zdicts generic enough to be used on any repo? |
That's a good question. They'd definitely be better than nothing, but, especially for a private repo, you'd probably get much better compression using a zdict generated specifically for the repo. |
Thanks for the information. |
No, zdicts give you significantly better compression at the cost of a slight speed decrease, mainly during the compression process. Zchunk splits a file into completely independent compressed chunks, which means that identical data in one chunk can't be referenced by another. Zdicts help us get around this problem by providing a compression dictionary of strings common to more than one chunk. This dictionary is stored in the first chunk, and, while it takes up space (the default is around 100KB), it generally makes a huge difference in compression size. The trick is that, for a file to take advantage of zchunk's benefits, the zdict has to stay the same from one version of the file to the next. If you change the zdict, identical chunks will no longer match because their compressed data is different. |
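To make the dictionary idea concrete, here is a minimal sketch of compressing independent chunks with and without a shared dictionary. It uses Python's standard `zlib` module (which accepts a preset dictionary via `zdict`) as a stand-in; zchunk itself uses zstd, so this only illustrates the principle. The sample data, the dictionaries, and the helper names are all hypothetical.

```python
import zlib


def compress_chunk(chunk: bytes, zdict: bytes = b"") -> bytes:
    """Compress one chunk on its own, optionally primed with a shared dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(chunk) + comp.flush()


def decompress_chunk(data: bytes, zdict: bytes = b"") -> bytes:
    """Reverse compress_chunk; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


# Hypothetical chunks that share boilerplate, as repository metadata often does.
boilerplate = b'<package type="rpm"><arch>x86_64</arch><license>GPLv2+</license><name>'
chunks = [boilerplate + name + b"</name></package>"
          for name in (b"alpha", b"bravo", b"charlie")]

# Hypothetical dictionaries: strings common to more than one chunk.
shared_dict = boilerplate
changed_dict = boilerplate + b"<summary>"

plain = [compress_chunk(c) for c in chunks]
primed = [compress_chunk(c, shared_dict) for c in chunks]

print("without dictionary:", sum(len(p) for p in plain), "bytes")
print("with dictionary:   ", sum(len(p) for p in primed), "bytes")

# Same chunk, different dictionary: the compressed bytes no longer match,
# so a client could not reuse the chunk it already has.
same_chunk_new_dict = compress_chunk(chunks[0], changed_dict)
print("chunk still matches after dict change:", same_chunk_new_dict == primed[0])
```

Each chunk is compressed in isolation, so without the dictionary the repeated boilerplate is paid for in every chunk. The last comparison shows why the dictionary has to stay stable between versions of a file.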
In case this helps anyone: I built a rough parser for zchunk in Kaitai (https://ide.kaitai.io/devel/#). I've found it helps visualize the format and what is happening, and it can generate parsers in other languages.
|
@tomberek, I think there is an issue with the endianness of ci values. In my tests, the ci value must be calculated as follows:
|
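For reference, a minimal sketch of how a compressed integer (ci) can be encoded and decoded, assuming the encoding as I read it in zchunk_format.txt: little endian, seven bits per byte, with the top bit set only on the final byte. This is an illustration under that assumption, not the calculation from the comment above, and the function names are hypothetical.

```python
from typing import Tuple


def encode_ci(value: int) -> bytes:
    """Encode a non-negative integer, least significant 7 bits first."""
    if value < 0:
        raise ValueError("compressed integers are unsigned")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value == 0:
            # Final byte: set the top bit to mark the end of the number.
            out.append(low | 0x80)
            return bytes(out)
        out.append(low)


def decode_ci(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one ci starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    pos = offset
    while pos < len(data):
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            return value, pos
        shift += 7
    raise ValueError("truncated compressed integer")


encoded = encode_ci(300)
value, next_offset = decode_ci(encoded)
print(encoded.hex(), value, next_offset)
```

Reading the bytes in the opposite order, or treating the top bit as a continuation flag as in LEB128-style varints, would give different values for anything above 127, which is the kind of mismatch an endianness bug in a parser tends to produce.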
Does Zchunk have a "magic number" identifying the file type? |
Hey Daniel, yes it does. As per https://github.com/zchunk/zchunk/blob/main/zchunk_format.txt the first bytes of the file are:
To clarify, almost everything you would see in Fedora would be the first. Detached headers were added for someone wanting to use zchunk to download immutable full disk images for the automotive industry. |
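A minimal sketch of checking the leading bytes in Python. The two ID values below are an assumption based on my reading of zchunk_format.txt (a five-byte ID, `\0ZCK1` for a regular zchunk file and `\0ZHR1` for a detached header); please verify them against that file before relying on this. The function name and return labels are hypothetical.

```python
from typing import Optional

# Assumed ID values; check against zchunk_format.txt.
ZCHUNK_ID = b"\x00ZCK1"           # regular zchunk file
DETACHED_HEADER_ID = b"\x00ZHR1"  # detached zchunk header


def identify_zchunk(first_bytes: bytes) -> Optional[str]:
    """Return a label for the file type, or None if the ID is not recognized."""
    if first_bytes.startswith(ZCHUNK_ID):
        return "zchunk"
    if first_bytes.startswith(DETACHED_HEADER_ID):
        return "detached-header"
    return None


def identify_file(path: str) -> Optional[str]:
    """Read just enough of the file to inspect the ID."""
    with open(path, "rb") as handle:
        return identify_zchunk(handle.read(len(ZCHUNK_ID)))
```

For example, `identify_file("primary.xml.zck")` would return `"zchunk"` for a typical Fedora metadata file if the assumed values are right; the file name here is only an illustration.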
Hi, the README is poorly documented and has no links to documentation or related papers, so I'm missing all the context. Can you guide me? Fixing the README might help newcomers like me who explore GitHub and come across this repo.