delz (decompress LZ)
Polish vehicle registration certificate data decompression tool
Polish vehicle registration certificates issued since the accession to the European Union contain a 2D code that allows automatic processing of the data printed in the document. Reading it takes a few steps. The first and most obvious one is scanning the code itself, which is an Aztec code, a format that is not very popular these days. The reader outputs a base64-encoded chunk of data followed by one extra character at the end (its meaning is still unknown). Decoding the base64 yields a binary blob of a few hundred bytes, compressed with a custom, LZ77-derived algorithm; this program decompresses that data. The result is an array of unlabeled strings delimited by the pipe symbol (|, 0x7c). Note that this data is encoded as UTF-16LE, which is typical for Windows systems, where it is known simply as Unicode. The output of this program is covered in detail at the end of this document. The whole process is shown in the diagram below.
    +-------+        +--------+                 +------------+                 +-----------+
    | Aztec | scan() |        | base64_decode() |     LZ     | decompress_lz() |   pipe    |
    |       |------->| base64 |---------------->| compressed |---------------->| delimited |
    | code  |        |        |                 |    data    |                 |   text    |
    +-------+        +--------+                 +------------+                 +-----------+
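The first two arrows of the diagram can be sketched in a few lines of Python. This is only an illustration, not part of the project: the function name reader_output_to_blob is made up here, and the sketch assumes that the extra trailing character is always exactly the last character of the reader output.

```python
import base64


def reader_output_to_blob(reader_output: str) -> bytes:
    """Turn the text returned by an Aztec reader into the compressed blob.

    Assumption: the final character is the extra one with unknown meaning,
    so it is dropped before the base64 decoding.
    """
    payload = reader_output[:-1]  # drop the trailing character
    return base64.b64decode(payload, validate=True)


if __name__ == "__main__":
    # Made-up stand-in for real scanner output: base64 of four bytes plus "X".
    sample = base64.b64encode(b"\x01\x02\x03\x04").decode("ascii") + "X"
    print(reader_output_to_blob(sample).hex())
```

The bytes returned by such a helper are what the decompressor expects on its input.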
This project provides you with:
- a standalone program for decompressing the blob you get as the output of a base64 decoding function
  It can be compiled by simply typing make in your terminal (provided that you have a compiler).
- a library in ar format, ready to be included in your application
  It can be compiled by typing make liblz.a; it is also built as a dependency of the above.
Both parts are licensed under the LGPL, so you can use them even in commercial products. For the exact conditions of redistribution, see the LICENSE file distributed with this copy of the program source.
To decompress data properly, feed the base64-decoded byte stream to the program's standard input. The decompressed data is printed to the program's standard output. Assuming that data.bin contains a valid stream and is stored in the current working directory, you could run:

    ./delz < data.bin > data.utf8

After that, the decompressed data should be in the data.utf8 file. In case of failure the program returns a non-zero value.
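If you would rather call the program from a script than from the shell, the same invocation might look roughly like the sketch below. The helper name run_delz and its command parameter are hypothetical, and decoding the output as UTF-8 is an assumption based on the data.utf8 file name used above.

```python
import subprocess


def run_delz(blob: bytes, command=("./delz",)) -> str:
    """Pipe a base64-decoded blob through delz and return its output as text."""
    result = subprocess.run(list(command), input=blob, stdout=subprocess.PIPE)
    if result.returncode != 0:
        # The program signals failure with a non-zero exit status.
        raise RuntimeError(f"delz failed with exit status {result.returncode}")
    # Assumption: the program prints UTF-8 text.
    return result.stdout.decode("utf-8")


# Example use, assuming data.bin exists in the current working directory:
#     with open("data.bin", "rb") as handle:
#         text = run_delz(handle.read())
```

The command parameter is there only so that a different path to the binary can be passed in.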
The program was tested on an amd64 Linux system, but it should work properly on any UNIX-like system that supports UTF-8 in its console. It might be possible to make it work on Windows as well, but since Windows traditionally uses UTF-16 as its standard console encoding, the output may not print properly to the console.
Furthermore, there should be no problem using it on a big-endian system, but this needs further testing.
The compression algorithm turns out to be a custom implementation of the LZSS algorithm. If you are interested in how LZ77-based algorithms work, you could start with the Wikipedia pages for LZ77 and LZSS. tl;dr: both algorithms find repeating substrings and replace them with a reference to the last occurrence of the same string. LZ77 stores the length and offset together and places the next byte of the string after them uncompressed, which was not efficient in practice. LZSS tried to solve this problem by prefixing each item of output with a single bit indicating whether it is raw data or a length-offset pair. That was better, but raw bytes then have to be stored as 9-bit words, which is still not very efficient. According to this article, one of the implementers of LZSS solved this by storing the flags in packs of 8 bits.
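To make the idea of packed flags more concrete, here is a small decoder for a toy LZSS stream. The byte layout is invented for this illustration and is not the format used in the registration certificates: one flag byte announces up to eight items (least significant bit first), a set bit means one literal byte, and a clear bit means a two-byte pair holding the distance back into the output and the copy length.

```python
def lzss_decode(data: bytes) -> bytes:
    """Decode a toy LZSS stream (illustrative layout, not the delz format)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        flags = data[pos]  # one byte of flags covers the next eight items
        pos += 1
        for bit in range(8):
            if pos >= len(data):
                break
            if flags & (1 << bit):
                # Set bit: the next byte is raw data.
                out.append(data[pos])
                pos += 1
            else:
                # Clear bit: the next two bytes are a distance-length pair.
                if pos + 1 >= len(data):
                    raise ValueError("truncated pair")
                distance, length = data[pos], data[pos + 1]
                pos += 2
                if distance == 0 or distance > len(out):
                    raise ValueError("pair points outside the decoded data")
                for _ in range(length):
                    # Copying byte by byte lets a pair overlap its own output.
                    out.append(out[-distance])
    return bytes(out)


if __name__ == "__main__":
    # Flags 0x07: three literals ("abc"), then one pair (3 back, 6 long).
    print(lzss_decode(bytes([0x07, 0x61, 0x62, 0x63, 0x03, 0x06])))
```

In this example six input bytes expand to nine output bytes, and the single pair copies more bytes than the distance it reaches back, which works because the copy proceeds one byte at a time.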
The implementation used here goes one step further and, besides the above, tries to optimize the use of length-offset pairs by:
- encoding offsets longer than 127 bytes just after the pair indicator mentioned above
- using the same bits as in the previous point to indicate that the previous offset should be reused (this saves a byte when bytes have to be copied twice in a row from, e.g., the current offset minus 1)
- encoding the length just after the big-offset bits, in such a way that lengths shorter than about 36 bytes need 8 bits or fewer, which saves a few bits for shorter lengths (and it is uncommon to copy more than 36 bytes at once)
If you want to learn more about the details of this algorithm, read the code of the function (I know it may be difficult for someone who does not write low-level code, though).
It seems that some time ago, after this code was implemented, the government (or PWPW, which is responsible for producing the documents) changed the output data format. In its new version the code stores some fields that are not present in the document itself, and their meaning is unknown too. The new version is indicated by an XXC1 field at the beginning.
The table below lists the known fields. The field labels and example values are quoted in Polish, as they appear on the certificate.
|Position (old)||Position (new)||Field in the certificate||Example||Description|
|-||0||XXC1||n/a||Distinguishes protocol versions|
|0||1||SERIA DR||BAF1026996||Certificate series and number|
|-||2||?||1465198||TERYT code of the registering office|
|1||3||ORGAN WYDAJĄCY||PREZYDENT M. ST. WARSZAWY||Line 1|
|2||4||DZIELNICA ŻOLIBORZ||Line 2|
|3||5||ul. NIEISTNIEJĄCA 1/2||Line 3|
|4||6||01-627 WARSZAWA||Line 4|
|7||A||UA 12345||Vehicle registration number|
|14||I||2001-12-21||Issue date of the registration certificate (YYYY-MM-DD)|
|15||H||---||Validity period of the certificate|
|16||C.1.1||KOWALSKI JAN||First line|
|27||C.2.1||KOWALSKI JAN||First line|
|38||F.1||1600||Maximum total mass [kg]|
|39||F.2||1600||Permissible total mass of the vehicle [kg]|
|40||F.3||2600||Permissible total mass of the vehicle combination [kg]|
|43||K||---||Vehicle type-approval certificate number|
|45||O.1||1000||Maximum total mass of a trailer with brakes|
|46||O.2||400||Maximum total mass of a trailer without brakes|
|47||Q||Power-to-mass ratio (in kW/kg)|
|48||P.1||1600,00||Engine displacement [cm^3]|
|49||P.2||80,00||Engine power [kW]|
|51||B||2001-12-31||Date of first registration of the vehicle (YYYY-MM-DD)|
|52||S.1||5||Number of seats|
|53||S.2||---||Number of standing places|
|54||RODZAJ POJAZDU||SAMOCHÓD OSOBOWY|
|58||NAJWIĘKSZY DOP. NACISK OSI||10,00||[kN]|
|59||NR KARTY POJAZDU||Karty nie wydano|
|61||?||03||Kind - code|
|62||?||06||Sub-kind - code|
|63||?||000||Intended use - code|
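As a rough illustration of how the table above could be used, the sketch below splits the decompressed text on the pipe symbol and picks the fields for which both the old and the new position are listed. The function name read_header_fields is made up, and the sketch assumes that in the new format the first field is literally the string XXC1.

```python
def read_header_fields(text: str) -> dict:
    """Pick the series number and issuing-authority lines from the output."""
    fields = text.split("|")
    # Assumption: the new format starts with the literal field "XXC1".
    new_format = fields[0] == "XXC1"
    series_at = 1 if new_format else 0      # certificate series and number
    authority_at = 3 if new_format else 1   # first of four authority lines
    return {
        "new_format": new_format,
        "series": fields[series_at],
        "authority": fields[authority_at:authority_at + 4],
    }


if __name__ == "__main__":
    # Example values taken from the table; only the leading fields are shown.
    sample = ("XXC1|BAF1026996|1465198|PREZYDENT M. ST. WARSZAWY|"
              "DZIELNICA ŻOLIBORZ|ul. NIEISTNIEJĄCA 1/2|01-627 WARSZAWA")
    print(read_header_fields(sample))
```

Fields for which the table gives only one position are left out here on purpose.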