Speck, Android and Linux kernel interop #585
The Linux kernel added Speck-64 and Speck-128 for opportunistic encryption on low resource devices that would otherwise have to disable encryption on the file system. Also see [PATCH v2 0/5] crypto: Speck support on the kernel-crypto mailing list.
The problem is we are not going to interop well with the kernel. The problem is due to confusion in the Simon and Speck algorithmic description and the test vectors. The algorithmic description and test vectors did not quite align, and we went down one path (follow the test vectors) and the kernel went down another path (follow the algorithmic description).
It is possible to to interop with the kernel using our implementation but... (1) it is not readily apparent how to do it, and (2) there's a loss of efficiency when doing it. The interop can happen now by providing an adapter class. If my estimates are correct, then we could loose 2 to 4 cpb on a conversion via the adapter. That means SSE4 could drop from 2.1 cpb to about 4 or 6 cpb.
Also not readily apparent, the same interop issue affect Simon, too. At this point in time we are not aware of anyone implementing Simon and diverging due to the algorithmic description versus test vector results.
Simon and Speck has been part of the library since December 2017. We are probably running between the cracks, but we can likely make changes and take action before the 6.1 release and anyone notices.
This will track our handling of the issue and document how we close the gap.
Also see Speck, Android and Linux kernel interop on the Crypto++ mailing list.
The text was updated successfully, but these errors were encountered:
Why would it be slower? The data only needs to be packed/unpacked once, not once per cipher round. With SSE2 you need _mm_unpacklo_epi64() (PUNPCKLQDQ) and _mm_unpackhi_epi64 (PUNPCKHQDQ) anyway to de-interleave the (x, y), and similarly for interleaving them again at the end. So it should be easy to relabel the registers such that the word order is (y, x). Also since the byte order is now little endian the byte swap via _mm_shuffle_epi8(), i.e. PSHUFB, isn't needed anymore which would make the new convention faster actually.
Also just to clarify, the above linked LKML thread is for v1 of the patchset which used the (x, y) word order. But in v2 (https://firstname.lastname@example.org/msg30925.html) it has been changed to (y, x) since that's what the Speck designers said is the intended order.
My bad... I was referring to an adapter class to frame the data. That would accommodate both behaviors with minimal changes. Let me get that added above.
The other options are (1) add a second implementation so both behaviors are stand alone and perform at speed, (2) provide only one implementation and align with the kernel's implementation by modifying the existing implementation. A variation on (2) is, provide the adapter class on top of the updated Speck implementation so the old behavior is still available through the adapter but slower.
I'm learning towards (2). We'll modify the Crypto++ implementation to align with the kernel's implementation. The modified Speck will interop and run at speed. Those who need the incorrect behavior can get it through an adapter class that runs a little slower.
OK, thanks. The reference has been changed.
We recently learned our Simon and Speck implementation was wrong. The removal will stop harm until we can loop back and fix the issue. The issue is, the paper, the test vectors and the ref-impl do not align. Each produces slightly different result. We followed the test vectors but they turned out to be wrong for the ciphers. We have one kernel test vector but we don't have a working implementation to observe it to fix our implementation. Ugh...
The original files for the Simon and Speck implementation are available in
We will loop back around to this issue when we have the time to untangle things. We really need (1) a good set of test vectors; and (2) a working reference implementation. We don't have either at the moment.
We have one test vector from the Linux kernel, but our implementation failed to arrive at the known answer. We did some knob turning but failed to arrive at the answer. Plugging the kernel test vector into the reference implementation also failed to arrive at the answer.
Here is the kernel test vector provided by email. More should be available in the patch set for
Of the 200+ test vectors only 10 are semi-authentic. The ten are from the Simon and Speck paper but they had permutations applied to them so they worked with the algorithms described in the paper. The remaining 200 or so were generated with Crypto++ using straight C++ code. The library generated the test vectors because we don't have a reference implementation
The ALTIVEC function required an inline declaration. Lack of inline caused the self test failure. Two NEON functions needed the same. We also cleaned up constants in unnamed namespaces