
proposal: encoding/{base32,base64}: constant-time implementations of RFC 4648 codecs for use with cryptography keys #73901

Open
@tob-scott-a

Description

Proposal Details

Background

The current implementations of the base32 and base64 codecs are not ideal for cryptographic use cases. The linked document describes practical timing side-channels against general-purpose RFC 4648 codecs when they are used to encode or decode cryptographic secrets. The current implementations use a table look-up in both directions, which introduces the risk of cache-timing attacks.
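To make the concern concrete, the pattern at issue looks roughly like the sketch below (illustrative only, not the stdlib's exact code): the table index is derived from secret data, so which cache lines get touched depends on the secret being processed.

// Illustrative sketch of a table-driven decode step. The index into the
// 256-entry table comes from secret data, so the memory-access pattern
// (and therefore the cache state an attacker can probe) depends on the secret.
var decodeMap [256]byte // ASCII -> 6-bit value, 0xFF for invalid characters

func decodeByte(c byte) byte {
	return decodeMap[c] // secret-dependent table index
}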

Specialized implementations of these codecs exist. @Sc00bz wrote a constant-time implementation of base64 in C, which I used as a basis for both base32 and base64 in PHP.

Proposed Change

New functions in encoding/base32 and encoding/base64:

  • EncodeToStringConstTime as an alternative to EncodeToString
  • DecodeStringConstTime as an alternative to DecodeString

These functions could then be used in other packages that handle cryptographic secrets, and auditors could be absolutely sure that no timing leaks are present.

(I'm happy to provide a patch for this, if accepted.)

Activity

added this to the Proposal milestone on May 28, 2025
mateusz834 (Member) commented on May 28, 2025

CC @golang/security

added the Proposal-Crypto label (Proposal related to crypto packages or other security issues) on May 28, 2025
tob-scott-a (Author) commented on May 28, 2025

Just in case there's any confusion: I do not think this qualifies as a security vulnerability (or I would have reported it as one, rather than filing an issue with the intent to follow up with a patch).

added the LibraryProposal label (Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool) on May 28, 2025
Jorropo (Member) commented on May 28, 2025

👍 However, do we want new functions? It would be simpler for everyone if we made the current functions constant time.
If you could send your patch already, that would be nice; I would like to benchmark it.

tob-scott-a (Author) commented on May 28, 2025

A constant-time alternative will certainly be slower than the general-purpose functions: It replaces a map look-up with many operations.

In the context of handling cryptographic secrets, this trade-off is worth it. However, making all use-cases for a base64 codec constant-time would slow most Go programs down unnecessarily.

It is for this reason that my opening offer was to implement new functions rather than change the current behavior.
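For reference, the per-character shape of such a constant-time decoder, following the approach in the C and PHP implementations linked above, is sketched below for the standard alphabet (function names are illustrative): each table look-up is replaced by a handful of compares and masks.

// ctRange returns -1 (all ones) if lo <= c <= hi and 0 otherwise, using only
// arithmetic on the sign bit: no branch, no table lookup.
func ctRange(c, lo, hi int32) int32 {
	return ^(((c - lo) | (hi - c)) >> 31)
}

// base64DecodeCharCT maps one character of the standard alphabet to its 6-bit
// value without secret-dependent branches or memory accesses; it returns -1
// for characters outside the alphabet.
func base64DecodeCharCT(ch byte) int32 {
	c := int32(ch)
	ret := int32(-1)
	ret += ctRange(c, 'A', 'Z') & (c - 'A' + 0 + 1)  // 'A'..'Z' -> 0..25
	ret += ctRange(c, 'a', 'z') & (c - 'a' + 26 + 1) // 'a'..'z' -> 26..51
	ret += ctRange(c, '0', '9') & (c - '0' + 52 + 1) // '0'..'9' -> 52..61
	ret += ctRange(c, '+', '+') & (62 + 1)           // '+' -> 62
	ret += ctRange(c, '/', '/') & (63 + 1)           // '/' -> 63
	return ret
}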

Jorropo (Member) commented on May 28, 2025

A single map lookup is far from cheap, but I don't want to risk guessing the performance impact; a benchmark would really help 🙂.
Also, in Go we often do not mind trading a few % of performance if it means making the API easier to use.

mateusz834 (Member) commented on May 28, 2025

> Also, in Go we often do not mind trading a few % of performance if it means making the API easier to use.

We can always do this in reverse: make the current API constant-time, but add unsafe variants (if needed) for speed. Then folks would actually use the constant-time APIs; otherwise we might see low usage of such (new) APIs.

renthraysk commented on May 28, 2025

Doesn't the fact that the stdlib has configurable alphabets for base64/base32 rule out replacing them?

tob-scott-a (Author) commented on May 29, 2025

@renthraysk It's not actually a total show-stopper (although it does make the patch a bit trickier to write). See #73909 for a first pass at a patch that adheres to the suggestions provided in this discussion so far.

gopherbot (Contributor) commented on May 29, 2025

Change https://go.dev/cl/676917 mentions this issue: encoding/base64: add constant-time behavior, enabled by default

rolandshoemaker (Member) commented on May 29, 2025

I don't think we'd want to turn on constant-time behavior by default for all uses of encoding/{base32,base64}. I suspect the vast majority of their usage is for entirely benign inputs. That said, we would probably want to use it by default for encoding/pem, where the contents are much more likely to be sensitive (of course, we should probably still add some way to disable this, for people who e.g. need to parse large numbers of certificates).

I haven't thought through this too deeply, but based on my understanding of how people typically do constant-time base64, could we add a generic decoder/encoder that generates the required bit-shift offsets from the alphabet, rather than needing to implement per-alphabet encoders and decoders? We'd then just need to add a WithConstantTime (or something) method to Encoder.
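A rough sketch of that alphabet-generic idea (names like ctRange, buildRanges, and decodeChar are illustrative, not an existing API): derive contiguous (lo, hi, value) character ranges from the alphabet once, at Encoding construction time, then decode with only compares and masks, so one implementation serves any alphabet.

type ctRange struct {
	lo, hi, val int32 // characters lo..hi decode to val..val+(hi-lo)
}

// buildRanges collapses an alphabet into runs of consecutive characters; the
// standard and URL-safe alphabets both collapse to a handful of ranges.
// The ranges are derived from public data only.
func buildRanges(alphabet string) []ctRange {
	var rs []ctRange
	for i := 0; i < len(alphabet); i++ {
		c := int32(alphabet[i])
		if n := len(rs); n > 0 && rs[n-1].hi+1 == c {
			rs[n-1].hi = c // extend the current run
			continue
		}
		rs = append(rs, ctRange{lo: c, hi: c, val: int32(i)})
	}
	return rs
}

// decodeChar is constant time with respect to ch: the loop bound and range
// bounds are public, and each range contributes via an arithmetic mask rather
// than a branch or table lookup. It returns -1 for invalid characters.
func decodeChar(rs []ctRange, ch byte) int32 {
	c := int32(ch)
	ret := int32(-1)
	for _, r := range rs {
		mask := ^(((c - r.lo) | (r.hi - c)) >> 31) // -1 iff r.lo <= c <= r.hi
		ret += mask & (c - r.lo + r.val + 1)
	}
	return ret
}

An Encoding could compute these ranges once (for example behind the WithConstantTime option floated above) and reuse them for every Encode/Decode call.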

11 remaining items

rolandshoemaker (Member) commented on May 30, 2025

I think we need benchmarks to make reasonable decisions here about the trade-offs, but I will note that encoding/pem is currently only three functions. Doubling the API surface is perhaps not ideal, but concretely that comes down to adding three new functions, which isn't particularly horrific.

The point about requiring users to decide is valid, but I think the combination of naming and reasonable documentation makes it less scary. Additionally, since the existing Encode/Decode will become "safe", it's unlikely people will unintentionally begin using the "unsafe" variants unless they are explicitly looking for them for performance reasons.

Of course all of this is moot if the benchmarks show that the performance degradation is relatively low, hence why we need numbers before we can make a decision.

moved this to Incoming in Proposals on Jun 19, 2025
moved this from Incoming to Active in Proposals on Jun 19, 2025
aclements (Member) commented on Jun 19, 2025

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— aclements for the proposal review group

aclements (Member) commented on Jul 9, 2025

I think we need a benchmark comparison between variable-time and constant-time base64 and base32 encoding to make any decisions here. If the regression is small, then I'd favor fewer configuration knobs in the API.

tob-scott-a (Author) commented on Jul 10, 2025

@aclements Good point. I recall @Jorropo was planning to run their own benchmarks (based on this comment). If something changes and I need to write and share a small program (and the data collected from my own machines), I'm happy to do so.

rolandshoemaker (Member) commented on Jul 10, 2025

Benchmark based on https://go.dev/cl/676917:

% benchstat vt-base64 ct-base64
goos: darwin
goarch: arm64
pkg: encoding/base64
cpu: Apple M1 Pro
                       │  vt-base64  │               ct-base64               │
                       │   sec/op    │    sec/op     vs base                 │
EncodeToString/2-10      13.35n ± 1%    17.94n ± 1%   +34.47% (p=0.000 n=10)
EncodeToString/4-10      14.59n ± 1%    22.40n ± 0%   +53.58% (p=0.000 n=10)
EncodeToString/8-10      18.58n ± 1%    31.44n ± 1%   +69.19% (p=0.000 n=10)
EncodeToString/64-10     71.42n ± 1%   192.20n ± 1%  +169.13% (p=0.000 n=10)
EncodeToString/8192-10   5.907µ ± 1%   19.849µ ± 1%  +236.02% (p=0.000 n=10)
DecodeString/2-10        19.81n ± 4%    26.03n ± 0%   +31.37% (p=0.000 n=10)
DecodeString/4-10        19.28n ± 2%    32.28n ± 1%   +67.43% (p=0.000 n=10)
DecodeString/8-10        22.80n ± 1%    44.97n ± 1%   +97.21% (p=0.000 n=10)
DecodeString/64-10       54.78n ± 1%   231.20n ± 0%  +322.05% (p=0.000 n=10)
DecodeString/8192-10     4.181µ ± 2%   25.227µ ± 1%  +503.43% (p=0.000 n=10)
NewEncoding-10           145.7n ± 0%    289.3n ± 1%   +98.56% (p=0.000 n=10)
geomean                  75.33n         168.3n       +123.44%

                       │   vt-base64   │              ct-base64               │
                       │      B/s      │     B/s       vs base                │
EncodeToString/2-10       142.9Mi ± 1%   106.3Mi ± 1%  -25.61% (p=0.000 n=10)
EncodeToString/4-10       261.6Mi ± 1%   170.3Mi ± 0%  -34.89% (p=0.000 n=10)
EncodeToString/8-10       410.6Mi ± 1%   242.7Mi ± 1%  -40.90% (p=0.000 n=10)
EncodeToString/64-10      854.6Mi ± 1%   317.5Mi ± 1%  -62.85% (p=0.000 n=10)
EncodeToString/8192-10   1322.5Mi ± 1%   393.6Mi ± 1%  -70.24% (p=0.000 n=10)
DecodeString/2-10         192.6Mi ± 4%   146.6Mi ± 0%  -23.90% (p=0.000 n=10)
DecodeString/4-10         395.7Mi ± 2%   236.3Mi ± 1%  -40.28% (p=0.000 n=10)
DecodeString/8-10         502.0Mi ± 1%   254.5Mi ± 1%  -49.29% (p=0.000 n=10)
DecodeString/64-10       1532.1Mi ± 1%   363.0Mi ± 0%  -76.31% (p=0.000 n=10)
DecodeString/8192-10     2492.0Mi ± 2%   413.0Mi ± 1%  -83.43% (p=0.000 n=10)
NewEncoding-10           1675.4Mi ± 0%   843.8Mi ± 1%  -49.63% (p=0.000 n=10)
geomean                   608.6Mi        272.4Mi       -55.25%

Does not look great. I don't think we could reasonably accept slowing down non-sensitive base64 usages this much by default.

Didn't check the perf of @Sc00bz's implementation, nor did I do much investigation into whether there are easy perf optimizations still available for https://go.dev/cl/676917.

aclements (Member) commented on Jul 10, 2025

Thanks for running those. I'm not surprised that adding an indirect function call for every byte, as https://go.dev/cl/676917 does, slows things down dramatically. I think we'd have to find a different way to do this.

Just as an experiment, what happens if you inline that logic for a particular alphabet? (It might work to just use a direct function call, but be sure to check that the compiler actually inlines it if you try that.)
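As an aside, the gc compiler's optimization-debug flag prints its inlining decisions, e.g.:

% go build -gcflags='-m' encoding/base64

which reports which functions can be inlined and which call sites were actually inlined.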

rolandshoemaker (Member) commented on Jul 10, 2025

Inlining the calls improves the performance, but not massively:

% benchstat vt-base64 ct-inlined-base64
goos: darwin
goarch: arm64
pkg: encoding/base64
cpu: Apple M1 Pro
                       │  vt-base64  │           ct-inlined-base64           │
                       │   sec/op    │    sec/op     vs base                 │
EncodeToString/2-10      13.35n ± 1%    14.72n ± 1%   +10.34% (p=0.000 n=10)
EncodeToString/4-10      14.59n ± 1%    19.08n ± 3%   +30.82% (p=0.000 n=10)
EncodeToString/8-10      18.58n ± 1%    26.43n ± 1%   +42.25% (p=0.000 n=10)
EncodeToString/64-10     71.42n ± 1%   153.00n ± 2%  +114.24% (p=0.000 n=10)
EncodeToString/8192-10   5.907µ ± 1%   16.689µ ± 4%  +182.52% (p=0.000 n=10)
DecodeString/2-10        19.81n ± 4%    23.62n ± 1%   +19.23% (p=0.000 n=10)
DecodeString/4-10        19.28n ± 2%    29.69n ± 1%   +53.97% (p=0.000 n=10)
DecodeString/8-10        22.80n ± 1%    41.86n ± 2%   +83.60% (p=0.000 n=10)
DecodeString/64-10       54.78n ± 1%   213.35n ± 0%  +289.47% (p=0.000 n=10)
DecodeString/8192-10     4.181µ ± 2%   24.479µ ± 1%  +485.55% (p=0.000 n=10)
NewEncoding-10           145.7n ± 0%    297.7n ± 2%  +104.32% (p=0.000 n=10)
geomean                  75.33n         150.0n        +99.13%

                       │   vt-base64   │          ct-inlined-base64           │
                       │      B/s      │     B/s       vs base                │
EncodeToString/2-10       142.9Mi ± 1%   129.5Mi ± 1%   -9.37% (p=0.000 n=10)
EncodeToString/4-10       261.6Mi ± 1%   199.9Mi ± 3%  -23.56% (p=0.000 n=10)
EncodeToString/8-10       410.6Mi ± 1%   288.7Mi ± 1%  -29.70% (p=0.000 n=10)
EncodeToString/64-10      854.6Mi ± 1%   398.9Mi ± 2%  -53.33% (p=0.000 n=10)
EncodeToString/8192-10   1322.5Mi ± 1%   468.2Mi ± 4%  -64.60% (p=0.000 n=10)
DecodeString/2-10         192.6Mi ± 4%   161.5Mi ± 1%  -16.15% (p=0.000 n=10)
DecodeString/4-10         395.7Mi ± 2%   257.0Mi ± 1%  -35.06% (p=0.000 n=10)
DecodeString/8-10         502.0Mi ± 1%   273.4Mi ± 2%  -45.54% (p=0.000 n=10)
DecodeString/64-10       1532.1Mi ± 1%   393.3Mi ± 0%  -74.33% (p=0.000 n=10)
DecodeString/8192-10     2492.0Mi ± 2%   425.6Mi ± 1%  -82.92% (p=0.000 n=10)
NewEncoding-10           1675.4Mi ± 0%   820.0Mi ± 2%  -51.06% (p=0.000 n=10)
geomean                   608.6Mi        305.6Mi       -49.78%

Because we are operating on 8-byte chunks at a time, I do wonder if SIMD (or another multi-op packing technique) could provide a significantly more performant implementation.

randall77 (Contributor) commented on Jul 10, 2025

For encoding, at least, you only need a 64-byte lookup table, which fits in a single cache line. I would think that if you properly aligned the lookup table, it would be constant time as far as cache-timing attacks go.
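For concreteness, a rough sketch of that alignment (manual padding, since Go has no 64-byte alignment directive; the names here are illustrative): place the 64-entry alphabet inside an oversized buffer and slice out the 64 bytes that start on a cache-line boundary, so every lookup lands on the same cache line regardless of the index.

import "unsafe"

const encodeStd = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// padded is oversized so a 64-byte-aligned window is guaranteed to exist in it.
var padded [128]byte

// alignedEncodeTable returns a 64-byte slice of padded that starts on a
// 64-byte boundary and holds the standard alphabet.
func alignedEncodeTable() []byte {
	base := uintptr(unsafe.Pointer(&padded[0]))
	off := int((64 - base%64) % 64)
	tbl := padded[off : off+64 : off+64]
	copy(tbl, encodeStd)
	return tbl
}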

randall77 (Contributor) commented on Jul 10, 2025

Decoding is not too hard to fit into a 64-byte table either. See CL 687396.

rolandshoemaker (Member) commented on Jul 11, 2025

@randall77 I quite like the idea of using the fact that the LUT fits in a single cache line in theory, but in practice I'm not entirely comfortable relying on CPU caching semantics for constant-time properties (I'm not sure I've ever seen a case where somebody has exploited a mismatch between described behavior and implemented behavior here, but it also wouldn't entirely shock me). If anything, the last four years or so have made me less willing to rely on CPU designers not to silently break this.

Sc00bz commented on Jul 12, 2025

Here's base64 encode/decode of 8 characters at a time, SIMD-style, using standard 64-bit integers.

TL;DR: I'm doing eight 7-bit math operations inside a 64-bit integer. A separator bit in each lane prevents negative values from bleeding into the next lane, and the same separator bit is used to build the eight 7-bit conditional masks.
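A worked single-lane example of that trick (one 7-bit lane standing in for all eight, with constants taken from the 'A'..'Z' range of the decoder below):

// The separator bit set on src survives both subtractions only when
// 'A' <= src <= 'Z'; it is then stretched into a 7-bit mask and used to
// select the lane's contribution, all without branching.
func exampleLane() uint64 {
	src := uint64('Q') // 0x51, the 17th letter, so the expected index is 16
	src2 := src | 0x80 // set the lane's separator bit
	mask := (src2 - uint64('A')) & ((uint64('Z') | 0x80) - src) & 0x80
	mask = mask - (mask >> 7)                  // 0x80 -> 0x7f, 0 -> 0
	return (src & mask) - (uint64('A') & mask) // 0x51 - 0x41 = 16
}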

// Decode usage (these snippets use "encoding/binary"; the decoder below also uses "math/bits"):
encoded := binary.BigEndian.Uint64(in)
data := base64StandardDecodeBlock(encoded)
out[0] = byte(data >> 40)
out[1] = byte(data >> 32)
out[2] = byte(data >> 24)
out[3] = byte(data >> 16)
out[4] = byte(data >>  8)
out[5] = byte(data >>  0)

// Encode usage:
data :=
	(uint64(in[0]) << 40) |
	(uint64(in[1]) << 32) |
	(uint64(in[2]) << 24) |
	(uint64(in[3]) << 16) |
	(uint64(in[4]) <<  8) |
	(uint64(in[5]) <<  0)
encoded := base64StandardEncodeBlock(data)
binary.BigEndian.PutUint64(out, encoded)

func base64StandardDecodeBlock(src uint64) uint64 {
	const SEPARATORS uint64 = 0x8080808080808080

	const LO1         int64 = int64('A')
	const HI1         int64 = int64('Z')
	const LO2         int64 = int64('a')
	const HI2         int64 = int64('z')
	const LO3         int64 = int64('0')
	const HI3         int64 = int64('9')
	const CHAR4       int64 = int64('+')
	const CHAR5       int64 = int64('/')
	const FULL_LO1   uint64 = uint64(LO1 << 56 | LO1 << 48 | LO1 << 40 | LO1 << 32 | LO1 << 24 | LO1 << 16 | LO1 << 8 | LO1)
	const FULL_HI1   uint64 = uint64(HI1 << 56 | HI1 << 48 | HI1 << 40 | HI1 << 32 | HI1 << 24 | HI1 << 16 | HI1 << 8 | HI1) | SEPARATORS
	const FULL_LO2   uint64 = uint64(LO2 << 56 | LO2 << 48 | LO2 << 40 | LO2 << 32 | LO2 << 24 | LO2 << 16 | LO2 << 8 | LO2)
	const FULL_HI2   uint64 = uint64(HI2 << 56 | HI2 << 48 | HI2 << 40 | HI2 << 32 | HI2 << 24 | HI2 << 16 | HI2 << 8 | HI2) | SEPARATORS
	const FULL_LO3   uint64 = uint64(LO3 << 56 | LO3 << 48 | LO3 << 40 | LO3 << 32 | LO3 << 24 | LO3 << 16 | LO3 << 8 | LO3)
	const FULL_HI3   uint64 = uint64(HI3 << 56 | HI3 << 48 | HI3 << 40 | HI3 << 32 | HI3 << 24 | HI3 << 16 | HI3 << 8 | HI3) | SEPARATORS
	const FULL_CHAR4 uint64 = uint64(CHAR4 << 56 | CHAR4 << 48 | CHAR4 << 40 | CHAR4 << 32 | CHAR4 << 24 | CHAR4 << 16 | CHAR4 << 8 | CHAR4)
	const FULL_CHAR5 uint64 = uint64(CHAR5 << 56 | CHAR5 << 48 | CHAR5 << 40 | CHAR5 << 32 | CHAR5 << 24 | CHAR5 << 16 | CHAR5 << 8 | CHAR5)

	const COUNT1       int64 = 0
	const COUNT2       int64 = COUNT1 + HI1 - LO1 + 1
	const COUNT3       int64 = COUNT2 + HI2 - LO2 + 1
	const COUNT4       int64 = COUNT3 + HI3 - LO3 + 1
	const COUNT5       int64 = COUNT4 + 1
	const FULL_COUNT4 uint64 = uint64(COUNT4 << 56 | COUNT4 << 48 | COUNT4 << 40 | COUNT4 << 32 | COUNT4 << 24 | COUNT4 << 16 | COUNT4 << 8 | COUNT4)
	const FULL_COUNT5 uint64 = uint64(COUNT5 << 56 | COUNT5 << 48 | COUNT5 << 40 | COUNT5 << 32 | COUNT5 << 24 | COUNT5 << 16 | COUNT5 << 8 | COUNT5)

	const DIFF1      uint64 = uint64( (LO1 - COUNT1))
	const DIFF2      uint64 = uint64( (LO2 - COUNT2))
	const DIFF3      uint64 = uint64(-(LO3 - COUNT3)) // Note the sign
	const FULL_DIFF1 uint64 = DIFF1 << 56 | DIFF1 << 48 | DIFF1 << 40 | DIFF1 << 32 | DIFF1 << 24 | DIFF1 << 16 | DIFF1 << 8 | DIFF1
	const FULL_DIFF2 uint64 = DIFF2 << 56 | DIFF2 << 48 | DIFF2 << 40 | DIFF2 << 32 | DIFF2 << 24 | DIFF2 << 16 | DIFF2 << 8 | DIFF2
	const FULL_DIFF3 uint64 = DIFF3 << 56 | DIFF3 << 48 | DIFF3 << 40 | DIFF3 << 32 | DIFF3 << 24 | DIFF3 << 16 | DIFF3 << 8 | DIFF3

	const FULL_ONE uint64 = 0x0101010101010101

	var value uint64
	var mask uint64

	src2 := src | SEPARATORS
	err := src & SEPARATORS

	// value += src >= LO1 && HI1 >= src ? src - DIFF1 : 0
	mask = (src2 - FULL_LO1) & (FULL_HI1 - src) & SEPARATORS
	mask = mask - (mask >> 7)
	value += (src & mask) - (FULL_DIFF1 & mask)
	err |= mask

	// value += src >= LO2 && HI2 >= src ? src - DIFF2 : 0
	mask = (src2 - FULL_LO2) & (FULL_HI2 - src) & SEPARATORS
	mask = mask - (mask >> 7)
	value += (src & mask) - (FULL_DIFF2 & mask)
	err |= mask

	// value += src >= LO3 && HI3 >= src ? src + DIFF3 : 0 // Sign of DIFF3 was flipped
	mask = (src2 - FULL_LO3) & (FULL_HI3 - src) & SEPARATORS
	mask = mask - (mask >> 7)
	value += (src & mask) + (FULL_DIFF3 & mask) // Sign of DIFF3 was flipped
	err |= mask

	// value += src == CHAR4 ? COUNT4 : 0
	mask = ((src2 ^ FULL_CHAR4) - FULL_ONE) & SEPARATORS ^ SEPARATORS
	mask = mask - (mask >> 7)
	value += FULL_COUNT4 & mask
	err |= mask

	// value += src == CHAR5 ? COUNT5 : 0
	mask = ((src2 ^ FULL_CHAR5) - FULL_ONE) & SEPARATORS ^ SEPARATORS
	mask = mask - (mask >> 7)
	value += FULL_COUNT5 & mask
	err |= mask

	value = 
		((value & 0x3f00000000000000) >> (56 - 7 * 6)) |
		((value & 0x003f000000000000) >> (48 - 6 * 6)) |
		((value & 0x00003f0000000000) >> (40 - 5 * 6)) |
		((value & 0x0000003f00000000) >> (32 - 4 * 6)) |
		((value & 0x000000003f000000) >> (24 - 3 * 6)) |
		((value & 0x00000000003f0000) >> (16 - 2 * 6)) |
		((value & 0x0000000000003f00) >> ( 8 - 1 * 6)) |
		((value & 0x000000000000003f) >> ( 0 - 0 * 6))

	// err = err == 0x7f7f7f7f7f7f7f7f ? 0 : -1
	_, err = bits.Sub64(err ^ 0x7f7f7f7f7f7f7f7f, 1, 0)
	err = ^(0 - err)

	return value | err // returns -1 on error
}

func base64StandardEncodeBlock(src uint64) uint64 {
	const SEPARATORS uint64 = 0x8080808080808080

	const LO1       int64 = int64('A')
	const HI1       int64 = int64('Z')
	const LO2       int64 = int64('a')
	const HI2       int64 = int64('z')
	const LO3       int64 = int64('0')
	const HI3       int64 = int64('9')
	const LO4       int64 = int64('+')
	const HI4       int64 = int64('+')
	const LO5       int64 = int64('/')
	const FULL_LO1 uint64 = uint64(LO1 << 56 | LO1 << 48 | LO1 << 40 | LO1 << 32 | LO1 << 24 | LO1 << 16 | LO1 << 8 | LO1)

	const DIFF1      uint64 = uint64( (LO2 - HI1 - 1) & 0x7f)
	const DIFF2      uint64 = uint64( (LO3 - HI2 - 1) & 0x7f)
	const DIFF3      uint64 = uint64(-(LO4 - HI3 - 1) & 0x7f) // avoid overflow with 0x71
	const DIFF4      uint64 = uint64( (LO5 - HI4 - 1) & 0x7f)
	const FULL_DIFF1 uint64 = DIFF1 << 56 | DIFF1 << 48 | DIFF1 << 40 | DIFF1 << 32 | DIFF1 << 24 | DIFF1 << 16 | DIFF1 << 8 | DIFF1
	const FULL_DIFF2 uint64 = DIFF2 << 56 | DIFF2 << 48 | DIFF2 << 40 | DIFF2 << 32 | DIFF2 << 24 | DIFF2 << 16 | DIFF2 << 8 | DIFF2
	const FULL_DIFF3 uint64 = DIFF3 << 56 | DIFF3 << 48 | DIFF3 << 40 | DIFF3 << 32 | DIFF3 << 24 | DIFF3 << 16 | DIFF3 << 8 | DIFF3
	const FULL_DIFF4 uint64 = DIFF4 << 56 | DIFF4 << 48 | DIFF4 << 40 | DIFF4 << 32 | DIFF4 << 24 | DIFF4 << 16 | DIFF4 << 8 | DIFF4
	// The overflow:
	// LO1 + DIFF1 + DIFF2 - 0x0f + DIFF4 + 63 = 0xaf
	// LO1 + DIFF1 + DIFF2 + 0x71 + DIFF4 + 63 = 0x12f (overflow)

	const COUNT1      uint64 = uint64(HI1 - LO1) + 1
	const COUNT2      uint64 = uint64(HI2 - LO2) + 1 + COUNT1
	const COUNT3      uint64 = uint64(HI3 - LO3) + 1 + COUNT2
	const COUNT4      uint64 = uint64(HI4 - LO4) + 1 + COUNT3
	const FULL_COUNT1 uint64 = COUNT1 << 56 | COUNT1 << 48 | COUNT1 << 40 | COUNT1 << 32 | COUNT1 << 24 | COUNT1 << 16 | COUNT1 << 8 | COUNT1
	const FULL_COUNT2 uint64 = COUNT2 << 56 | COUNT2 << 48 | COUNT2 << 40 | COUNT2 << 32 | COUNT2 << 24 | COUNT2 << 16 | COUNT2 << 8 | COUNT2
	const FULL_COUNT3 uint64 = COUNT3 << 56 | COUNT3 << 48 | COUNT3 << 40 | COUNT3 << 32 | COUNT3 << 24 | COUNT3 << 16 | COUNT3 << 8 | COUNT3
	const FULL_COUNT4 uint64 = COUNT4 << 56 | COUNT4 << 48 | COUNT4 << 40 | COUNT4 << 32 | COUNT4 << 24 | COUNT4 << 16 | COUNT4 << 8 | COUNT4

	src =
		((src << (56 - 7 * 6)) & 0x3f00000000000000) |
		((src << (48 - 6 * 6)) & 0x003f000000000000) |
		((src << (40 - 5 * 6)) & 0x00003f0000000000) |
		((src << (32 - 4 * 6)) & 0x0000003f00000000) |
		((src << (24 - 3 * 6)) & 0x000000003f000000) |
		((src << (16 - 2 * 6)) & 0x00000000003f0000) |
		((src << ( 8 - 1 * 6)) & 0x0000000000003f00) |
		((src << ( 0 - 0 * 6)) & 0x000000000000003f)
	src2 := src | SEPARATORS
	diff := FULL_LO1

	var mask uint64
	// if   ch  >= COUNT                    { diff += DIFF }
	mask = (src2 - FULL_COUNT1) & SEPARATORS; diff += FULL_DIFF1 & (mask - (mask >> 7))
	mask = (src2 - FULL_COUNT2) & SEPARATORS; diff += FULL_DIFF2 & (mask - (mask >> 7))
	mask = (src2 - FULL_COUNT3) & SEPARATORS; diff -= FULL_DIFF3 & (mask - (mask >> 7)) // overflow avoided
	mask = (src2 - FULL_COUNT4) & SEPARATORS; diff += FULL_DIFF4 & (mask - (mask >> 7))

	return (src + diff) & 0x7f7f7f7f7f7f7f7f
}

I didn't test all the possible encode/decode inputs (2**48), but I tested all 3-byte values (2**24) left-shifted by 0, 6, 12, 18, and 24 bits.

SSE2/AVX2/NEON will be faster. SSE2, AVX2, and NEON have byte compares for equal and signed greater-than: _mm_cmpeq_epi8/_mm256_cmpeq_epi8/vceqq_s8 (PCMPEQB/VPCMPEQB/CMEQ) and _mm_cmpgt_epi8/_mm256_cmpgt_epi8/vcgtq_s8 (PCMPGTB/VPCMPGTB/CMGT). Thus a simple conditional add is 3 SIMD instructions (compare, and, add), but adding actual SIMD seems like overkill.

I'll benchmark this tomorrow. If someone else hasn't.

The base32 version is fairly straightforward. For decode, you'll get an error on the DIFFs whose sign you need to flip. For encode, you need to check for overflows manually and flip the appropriate signs.

I'll submit a pull request, assuming these are faster. Oh, I just realized I need to make a change to save time on partial blocks: the input to the encode function should be expanded before calling, and similarly for the output of the decode function.

Sc00bz commented on Jul 14, 2025

It's slower. I'll try a few things, but I think this will be slower without SIMD. I'm pretty sure this won't change much, but note that the benchmark is testing encode and decode of zero bytes, so it's always hitting the same elements: encode[0] and decodeMap['A'].
