Skip to content
This repository

Implementations of my simple data compression algorithm

branch: master
README.md
  • License: New BSD License
  • Copyright (c) 2012, Piotr Tarsa

Project description

This project contains implementations of a single compression algorithm in different languages and performance measurements of those implementations under available execution environments.

Project goal

The goal is to evaluate programming platforms usefullness when it comes to develop algorithms that are heavy on computations. Ideally, a programming platform should have comparatively low resource usage and execute programs quickly while hiding complexities related to the programming platform and allowing the programmer to focus only on the problem that he was originally trying to solve. Usually, comparisons between execution environments for different programming languages are done using microbenchmarks. Unfortunately those microbenchmarks are too simple to draw any sensible conclusion. This project provides several implementations for a single nontrivial problem and focuses on side effects (ie produced output) while providing freedom to implementation details.

The requirements for any implementation are:

  • Each implementation must produce identical output when fed with identical input and compression options
  • Each implementation should expose all compression options that the format defines and allow for adjustment of that options at encoding phase
  • Languages mixing is not allowed, eg mixing Python code with non-standard C modules
  • There must be no side effects other than the output, ie the decompressed stream when decompressing, compressed stream when compressing and compression options when checking them

Parallelization is allowed.

Performance results

All tests are performed on following system:

  • CPU: Intel Core 2 Duo E8400 @ 3.00 GHz (stock),
  • RAM: 8 GiB A-Data DDR3 RAM @ CL5, 1000 MHz
  • OS: Ubuntu 12.04 64-bit

Encoding speed on enwik8 (command line interface implementations) - revision 53a85ed:

Language Real time User time Sys time Execution environment Notes
C 7.473s 7.165s 0.276s GNU GCC 4.6.3 Three runs average, SSE optimizations (vectors and prefetching)
C 12.734s 12.437s 0.279s GNU GCC 4.6.3 Three runs average, no SSE optimizations
Java 20.587s 20.326s 0.280s Oracle JDK 7 update 15 Three runs average
Python 60.346s 59.907s 0.367s ShedSkin 0.9.3 + GNU GCC 4.6.3 Three runs average
Python 139.983s 132.384s 6.339s PyPy 1.9.0 Three runs average
Python 1806.230s 1804.573s 0.436s CPython 2.7.3 Single run, -OO

Encoding speed on enwik8 (JavaScript in browsers) - revision 53a85ed:

Browser Reported time Browser version Notes
Chrome 47.136s 25.0.1364.97 Three runs average
Opera 137.397s 12.11 (build 1661) Three runs average
Firefox 153.289s 19.0 Three runs average

Encoding speed on enwik9 (command line interface implementations) - revision 53a85ed:

Language Real time User time Sys time Execution environment Notes
C 67.853s 66.129s 0.764s GNU GCC 4.6.3 Three runs average, SSE optimizations (vectors and prefetching)
C 114.255s 112.308s 0.757s GNU GCC 4.6.3 Three runs average, no SSE optimizations
Java 189.163s 186.413s 0.845s Oracle JDK 7 update 15 Three runs average
Python 545.170s 540.163s 1.023s ShedSkin 0.9.3 + GNU GCC 4.6.3 Three runs average
Python 1086.181s 1069.226s 6.840s PyPy 1.9.0 Three runs average
Python 16799.145s 16789.525s 3.668s CPython 2.7.3 Single run, -OO

Encoding speed on enwik9 (JavaScript in browsers) - revision 53a85ed:

Browser Reported time Browser version Notes
Chrome 422.509s 25.0.1364.97 Three runs average
Opera 1287.090s 12.11 (build 1661) Three runs average
Firefox 1434.559s 19.0 Three runs average
Something went wrong with that request. Please try again.