Skip to content
C translation of the cbor2 implementation
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
cboar
scripts
source
tests
.gitignore
LICENSE.txt
Makefile
README.rst
setup.py
tox.ini

README.rst

CBOAR

A high performance, flexible CBOR serialization library. Basic usage is as you'd expect:

>>> import cboar as cbor
>>> cbor.dumps(0)
b'\x00'
>>> cbor.dumps([1, 2, 3])
b'\x83\x01\x02\x03'
>>> cbor.loads(cbor.dumps('foo'))
'foo'

Status

Don't use this yet! It's not finished although it is approaching an alpha release. If you're desperate enough to use it, please report any memory leaks as issues, and any incorrect encodings / decodings (preferably with easy to reproduce test cases).

Be warned that while I've tried to hew closely to the design of cbor2 I have made a few internal changes to keep things easier at the C level so it's never going to be exactly the same. For example, rather than separate decoding functions called from a CBORDecoder class, it's easier to manage the shareable's state by moving the functions into methods on the class itself.

Background

On the piwheels project we recently switched to CBOR for all our serialization needs. Partly this was down to security; we previously used pickle because it was quick to get started with, but obviously there's awful security holes there if you can't trust the nodes you're communicating with. Partly it was a matter of flexibility; JSON was considered and quickly rejected for not supporting half the data-types we wanted to transmit (timestamps, durations, sets, etc. - half the fun of Python is its rich datatypes!).

Initially we settled on the cbor2 library; it was extremely flexible and the code looked well written and tested. Unfortunately, while cbor2 is easily fast enough for the majority of purposes on a PC, we run piwheels on a pi and when chucking around large structures (e.g. the search index) cbor2 took quite a while to encode things. After a day of tweaking cbor2 to try and improve the performance, and trying pypy3 (nice idea, but it's not ready for primetime yet with various external libraries causing issues), I decided to move to a C-based implementation, specifically the popular cbor library.

Quickly, we ran into issues: it doesn't support some types out of the box (sets and timestamps to name but two). No matter, it was flexible enough to provide a mechanism to extend it with new types. Unfortunately, this mechanism isn't as well designed as cbor2's. For instance, patching in set support breaks when dealing with, say, sets of tuples (because unlike cbor2 it doesn't know it should switch to immutable hashable collections when decoding within a set, or for dict keys for that matter). Its decoding is also rather basic in several areas (the long int decoding runs out of precision after a while, the dict decoding doesn't handle complex keys). After digging into the code to see if these issues could be quickly patched around, I came to the conclusion that its internal design wasn't half as clean (or extensible) as cbor2's.

If only there was a C-based CBOR implementation that had a design as clean as cbor2's, but written in (vaguely) comprehensible C!

Well, after a week mulling it over, here's my shot at it. It's basically a port of cbor2 into C. Most of the internal architecture is exactly the same (an OrderedDict to look up types, an identical default encoder mechanism, etc), so I've licensed it the same as cbor2 because for all intents and purposes, it's a derivative work (hell, I even nicked their test suite).

You can’t perform that action at this time.