rscompress

A compression library written in Rust with a focus on scientific data.

Disclaimer

This is a rewrite and merge of several compression algorithms developed during my time as a PhD student.

The dissertation can be downloaded from https://doi.org/10.5445/IR/1000105055

Architecture

The library is split into one base and four supporting libraries. The base library orchestrates the supporting libraries. All compression algorithms follow the same basic structure:

1. Decorrelate data using transformations
2. Approximate the data, if lossy compression is needed
3. Code the data

Additionally, check if each step executed as expected.

                   +----------------+      lossless      +----------+
                   |                |                    |          |
Start   +------>   | Transformation |   +------------>   |  Coding  |   +------>   End
                   |                |                    |          |
                   +----------------+                    +----------+

                           +                                   ^
                           |                                   |
                           |  lossy                            |
                           |                                   |
                           v                                   |
                                                               |
                   +---------------+                           |
                   |               |                           |
                   | Approximation |  +------------------------+
                   |               |
                   +---------------+

This library will follow the same principles.
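The pipeline in the diagram can be sketched as three stage traits and one function that chains them. This is only an illustration: the trait names (`Transformation`, `Approximation`, `Coding`), the functions `compress` and `decompress`, and the toy stages `Delta` and `RawBytes` are assumptions made for this example and are not taken from the crates in this repository.

```rust
// Hypothetical stage traits; the names are illustrative, not the crates' API.
pub trait Transformation {
    fn forward(&self, data: &[i64]) -> Vec<i64>;
    fn backward(&self, data: &[i64]) -> Vec<i64>;
}

pub trait Approximation {
    fn approximate(&self, data: &[i64]) -> Vec<i64>;
}

pub trait Coding {
    fn encode(&self, data: &[i64]) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> Vec<i64>;
}

/// Transformation -> optional Approximation -> Coding, as in the diagram.
pub fn compress(
    t: &dyn Transformation,
    a: Option<&dyn Approximation>,
    c: &dyn Coding,
    data: &[i64],
) -> Vec<u8> {
    let transformed = t.forward(data);
    let prepared = match a {
        Some(approx) => approx.approximate(&transformed), // lossy path
        None => transformed,                              // lossless path
    };
    c.encode(&prepared)
}

/// Undo the coding, then undo the transformation.
pub fn decompress(t: &dyn Transformation, c: &dyn Coding, bytes: &[u8]) -> Vec<i64> {
    t.backward(&c.decode(bytes))
}

/// Toy transformation: store differences between neighbouring values.
pub struct Delta;

impl Transformation for Delta {
    fn forward(&self, data: &[i64]) -> Vec<i64> {
        let mut prev = 0i64;
        let mut out = Vec::with_capacity(data.len());
        for &x in data {
            out.push(x.wrapping_sub(prev));
            prev = x;
        }
        out
    }

    fn backward(&self, data: &[i64]) -> Vec<i64> {
        let mut acc = 0i64;
        let mut out = Vec::with_capacity(data.len());
        for &d in data {
            acc = acc.wrapping_add(d);
            out.push(acc);
        }
        out
    }
}

/// Placeholder coding: little-endian bytes, no actual compression.
pub struct RawBytes;

impl Coding for RawBytes {
    fn encode(&self, data: &[i64]) -> Vec<u8> {
        let mut out = Vec::with_capacity(data.len() * 8);
        for x in data {
            out.extend_from_slice(&x.to_le_bytes());
        }
        out
    }

    fn decode(&self, bytes: &[u8]) -> Vec<i64> {
        bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                i64::from_le_bytes(buf)
            })
            .collect()
    }
}
```

Calling `compress(&Delta, None, &RawBytes, &data)` takes the lossless path, and `decompress(&Delta, &RawBytes, &bytes)` then returns the original values. Passing `Some(&approximation)` selects the lossy branch of the diagram instead.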

Transformations

Transformations are algorithms which represent the same information using a different alphabet. Good transformation algorithms eliminate redundant information in the data. A mathematical function can be seen as a transformation of a series of data. The series 1 1 2 3 5 8 13 21 ... can be expressed as f(x) = f(x-1) + f(x-2) with f(1) = f(2) = 1. Here the information represented in alphabet A (integers) is mapped to an alphabet B (letters + integers), which is more compact. It is important to note that all transformations must have two properties:

Applying a transformation algorithm to data does not lose information.
All transformation algorithms are reversible, such that the original representation can be reconstructed from the new alphabet.
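As a minimal sketch of both properties, the example below applies the recurrence from the series above as a predictor: each value is replaced by its residual against the prediction x[i-1] + x[i-2]. The function names `to_residuals` and `from_residuals` are hypothetical and chosen for this example only. Wrapping arithmetic is used so that the round trip stays exact even if an intermediate sum overflows.

```rust
/// Forward transformation: replace each value by its residual against
/// the prediction x[i-1] + x[i-2] (missing predecessors count as 0).
pub fn to_residuals(data: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(data.len());
    let (mut prev1, mut prev2) = (0i64, 0i64);
    for &x in data {
        out.push(x.wrapping_sub(prev1.wrapping_add(prev2)));
        prev2 = prev1;
        prev1 = x;
    }
    out
}

/// Inverse transformation: rebuild the original values from the residuals.
pub fn from_residuals(residuals: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(residuals.len());
    let (mut prev1, mut prev2) = (0i64, 0i64);
    for &r in residuals {
        let x = r.wrapping_add(prev1.wrapping_add(prev2));
        out.push(x);
        prev2 = prev1;
        prev1 = x;
    }
    out
}
```

For the series 1 1 2 3 5 8 13 21 the residuals are a single 1 followed by zeros, which a later coding step can store very compactly, and `from_residuals` restores the series exactly.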

Approximations

Approximations are algorithms which lose information for the sake of better compression. Given a threshold theta (absolute or relative), the algorithm maps the data from alphabet A to alphabet B with an information loss within that threshold. An example of an approximation is the ~= operator known from primary school, e.g. 1/3 ~= 0.3. Approximations have the following properties:

Applying an approximation algorithm to data results in information loss.
Approximation algorithms are not reversible.
The information loss is guaranteed to be within the threshold theta.
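One simple way to meet an absolute threshold is uniform quantization with a bucket width of 2 * theta. The sketch below is an assumption made for illustration, not the approximation implemented in this repository; `quantize` and `reconstruct` are hypothetical names. The error bound holds up to floating-point rounding.

```rust
/// Map a value to the index of its bucket; buckets are 2 * theta wide.
/// Many different inputs share one index, so this step is not reversible.
pub fn quantize(x: f64, theta: f64) -> i64 {
    assert!(theta > 0.0, "theta must be positive");
    (x / (2.0 * theta)).round() as i64
}

/// Return the centre of the bucket, which lies at most theta away from
/// every value that was mapped into it (up to floating-point rounding).
pub fn reconstruct(q: i64, theta: f64) -> f64 {
    q as f64 * (2.0 * theta)
}
```

With theta = 0.05, the value 1/3 falls into bucket 3 and is reconstructed as roughly 0.3, matching the 1/3 ~= 0.3 example. The inputs 0.31 and 0.33 land in the same bucket, which is why the original value cannot be recovered.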

Codings

Codings are the algorithms where the actual compression happens: the information is stored on disk as compactly as possible. Examples are Huffman coding and arithmetic coding.
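To make the effect of a coding step concrete, the sketch below computes Huffman code lengths for a byte sequence and the resulting size in bits. It is a minimal illustration with hypothetical function names (`huffman_code_lengths`, `encoded_bits`). It does not write a bitstream or a code table, and it is not the coder used in this repository.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, BinaryHeap};

/// Return the Huffman code length in bits for every distinct byte in `data`.
pub fn huffman_code_lengths(data: &[u8]) -> BTreeMap<u8, u32> {
    // Count how often each byte occurs.
    let mut freq: BTreeMap<u8, u64> = BTreeMap::new();
    for &b in data {
        *freq.entry(b).or_insert(0) += 1;
    }

    let mut lengths: BTreeMap<u8, u32> = freq.keys().map(|&b| (b, 0)).collect();
    if freq.len() == 1 {
        // A single distinct symbol still needs one bit per occurrence.
        for len in lengths.values_mut() {
            *len = 1;
        }
        return lengths;
    }

    // Min-heap of (weight, symbols in the subtree).
    let mut heap: BinaryHeap<Reverse<(u64, Vec<u8>)>> = freq
        .iter()
        .map(|(&b, &w)| Reverse((w, vec![b])))
        .collect();

    // Merge the two lightest subtrees until one tree remains. Every merge
    // pushes all symbols of both subtrees one level deeper.
    while heap.len() > 1 {
        let Reverse((w1, s1)) = heap.pop().unwrap();
        let Reverse((w2, s2)) = heap.pop().unwrap();
        let mut merged = s1;
        merged.extend(s2);
        for b in &merged {
            *lengths.get_mut(b).unwrap() += 1;
        }
        heap.push(Reverse((w1 + w2, merged)));
    }
    lengths
}

/// Total number of bits needed to store `data` with those code lengths.
pub fn encoded_bits(data: &[u8]) -> u64 {
    let lengths = huffman_code_lengths(data);
    data.iter().map(|b| u64::from(lengths[b])).sum()
}
```

For the input `aaaabbc` the frequent byte `a` gets a 1-bit code while `b` and `c` get 2-bit codes, so the payload shrinks from 56 bits to 10 bits (ignoring the cost of storing the code table).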

Checksums

Checksums are algorithms that check the integrity of the data at each step, e.g. Adler-32.
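A straightforward Adler-32 can be written in a few lines. The sketch below follows the standard definition of the algorithm (two running sums modulo 65521). The function name `adler32` is chosen for this example and is not necessarily the interface of the rscompress-checksums crate.

```rust
/// Largest prime below 2^16, as used by Adler-32.
const MOD_ADLER: u32 = 65_521;

/// Compute the Adler-32 checksum of a byte slice.
pub fn adler32(data: &[u8]) -> u32 {
    let mut a: u32 = 1; // running sum of all bytes, plus one
    let mut b: u32 = 0; // running sum of the intermediate values of `a`
    for &byte in data {
        a = (a + u32::from(byte)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    (b << 16) | a
}
```

Computing the checksum of the data before a step and again after reversing that step gives a cheap check that the round trip did not alter the data.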
