GitHub

This is a small project to implement Philip Gage's Byte Pair Encoding compression, written in contemporaneous C89 for fun. It is based directly on Gage's pseudo-code. I started off looking at his code, but then decided early on to rewrite without looking at it. So it definitely can't be considered original.

I added error-checking to the expand program, and the compress program should be able to handle arbitrary binary input. A voluminous amount of debug logging is generated with a debug print macro if DEBUG is defined. A few sample files, text and binary, are provided for testing.

This is the first time I've used fuzzing, and AFL++ was extremely useful for finding corner cases, in particular "circular byte expansion". I'm convinced it's pretty robust now, as I don't dynamically allocate memory anywhere.

I picked BPE because it appealed to me as dead-simple compression algorithm. It is in theory - replace two bytes with an unused one - but also fundamentally limited in the given algorithm by the byte as a unit, specifically the number of unused byte values available. If a block happens to start with many different byte values, then the block will be cut short with little compression. The pair table RLE is a little fiddly (I didn't implement it exactly) and limited in size, while a Huffman code is very elegant and can be as large as needed. Encoding can't be fast either, because my program takes dozens of passes over each variable-length block, while Huffman coding can be done in one or two passes over each fixed-length block. At least I/O is buffered so I can use getc/putc byte-by-byte without being horrendously inefficient.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
compress.c		compress.c
expand.c		expand.c
hashtable.c		hashtable.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

jxu/bpe

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages