I tried to used the table_driven algorithm from python, and quickly realized that the generated table was not memorized. Instead it is generated for each single call to table_driven(), making the computation several order of magnitude slower than bit_by_bit_fast() in my case.
I think the table should be memorized, either at init, or at first call to table_driven, with something like:
if self.tbl is None:
self.tbl = self.gen_table()