Decode URL safe base64 correctly #4721
Open
+22
−0
Subsystem
ktor-utils
Motivation
In a unit test for a Ktor project I was encoding and decoding some data with `ByteArray.encodeBase64(): String` and `String.decodeBase64Bytes(): ByteArray`, and was wondering why the result was not what I expected. After decoding and encoding, the resulting data was very slightly different from what I had put in.
It turned out that I was using URL-safe base64 (i.e. the same as normal base64, but with every `+` replaced by `-`, every `/` replaced by `_`, and no `=` padding at the end).
After looking at the source code, I found that `String.decodeBase64Bytes(): ByteArray` treats every character that is not a valid Base64 character as if it were the character `/`. That's because the decoded value is determined using `indexOf()`, which returns `-1` for an unknown character, so the last character in the alphabet is selected.
So `"_-!A".decodeBase64Bytes()` is decoded to `byteArrayOf(-1, -1, -64)`, and `byteArrayOf(-1, -1, -64).encodeBase64()` encodes back to `///A`.
Solution
I thought this could easily be adapted to work with URL-safe base64 as well, without impacting encoding and decoding of valid "normal" Base64. So that's what I did with this PR.
For strings containing non-Base64 characters, I assumed the current behaviour is undefined (and, as far as I can see, not documented), so I hope it is fine to make this change here.
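To make the difference concrete, here is a minimal, self-contained sketch of the two lookup strategies. It is not Ktor's actual source: all names are hypothetical, and it assumes that the `-1` from `indexOf()` ends up as index 63 because the value is masked to six bits. It also assumes that other invalid characters such as `!` keep falling back to `/`, which may or may not match what the PR does.

```kotlin
// Hypothetical sketch, not Ktor's implementation.
private const val ALPHABET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// Legacy-style lookup: indexOf() yields -1 for unknown characters, and
// masking to six bits (assumption) turns -1 into 63, the index of '/'.
fun legacySextet(c: Char): Int = ALPHABET.indexOf(c) and 0x3f

// Lenient lookup: map the URL-safe characters onto the same values as
// their standard counterparts before falling back to the alphabet.
fun lenientSextet(c: Char): Int = when (c) {
    '-' -> 62 // same value as '+'
    '_' -> 63 // same value as '/'
    else -> ALPHABET.indexOf(c) and 0x3f
}

// Decodes padded or unpadded input using the given per-character lookup.
fun decode(input: String, sextet: (Char) -> Int): ByteArray {
    val data = input.trimEnd('=')
    val out = ArrayList<Byte>(data.length * 3 / 4)
    var buffer = 0
    var bits = 0
    for (c in data) {
        // Keep only the low 16 bits so the accumulator never overflows.
        buffer = ((buffer shl 6) or sextet(c)) and 0xffff
        bits += 6
        if (bits >= 8) {
            bits -= 8
            out.add(((buffer shr bits) and 0xff).toByte())
        }
    }
    return out.toByteArray()
}

fun main() {
    println(decode("_-!A", ::legacySextet).toList())  // [-1, -1, -64]
    println(decode("_-!A", ::lenientSextet).toList()) // [-1, -17, -64]
}
```

With the lenient lookup, `-` and `_` decode to the same values as `+` and `/`, while input that only uses the standard alphabet decodes exactly as before.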
I also added a test case verifying that a string containing the entire Base64 alphabet (both normal and URL safe) is converted correctly, since several characters were not tested in the previous unit tests.
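The shape of such a full-alphabet check could look roughly like the sketch below. It is not the test from this PR: it uses the JVM's `java.util.Base64` as a stand-in reference decoder, and the constant names are made up for illustration.

```kotlin
import java.util.Base64

// All 64 characters of the standard alphabet, in index order.
const val STANDARD =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// The same string in the URL-safe variant: '+' becomes '-', '/' becomes '_'.
val URL_SAFE = STANDARD.replace('+', '-').replace('/', '_')

fun main() {
    // 64 characters carry 64 * 6 = 384 bits, i.e. 48 bytes with no padding.
    val expected = Base64.getDecoder().decode(STANDARD)
    val actual = Base64.getUrlDecoder().decode(URL_SAFE)

    check(expected.size == 48)
    // Both variants must describe the same bytes.
    check(expected.contentEquals(actual))
    // Re-encoding must reproduce the standard alphabet string.
    check(Base64.getEncoder().encodeToString(expected) == STANDARD)
    println("full-alphabet round trip ok")
}
```

Because every character of the alphabet appears exactly once, a wrong mapping for any single character changes the decoded bytes and fails the comparison.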
By the way: I'm not sure how far along it currently is, but there is now also a `Base64` utility in the stdlib: https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.io.encoding/-base64/ (currently needs opt-in to `@ExperimentalEncodingApi`). Are there any reasons holding Ktor back from just using that (besides it still being marked as experimental)?
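For reference, a small sketch of what using the stdlib utility might look like. The helper function names are hypothetical, and the opt-in is kept because the API is described above as experimental; newer Kotlin versions may not need it.

```kotlin
import kotlin.io.encoding.Base64
import kotlin.io.encoding.ExperimentalEncodingApi

// Hypothetical helpers wrapping the stdlib encoders.
@OptIn(ExperimentalEncodingApi::class)
fun toStandard(bytes: ByteArray): String = Base64.Default.encode(bytes)

@OptIn(ExperimentalEncodingApi::class)
fun toUrlSafe(bytes: ByteArray): String = Base64.UrlSafe.encode(bytes)

@OptIn(ExperimentalEncodingApi::class)
fun fromUrlSafe(text: String): ByteArray = Base64.UrlSafe.decode(text)

fun main() {
    // 0xFF 0xEF 0xC0 splits into the six-bit values 63, 62, 63, 0.
    val bytes = byteArrayOf(-1, -17, -64)

    println(toStandard(bytes)) // /+/A
    println(toUrlSafe(bytes))  // _-_A

    // The URL-safe form round-trips through the matching decoder.
    check(fromUrlSafe(toUrlSafe(bytes)).contentEquals(bytes))
}
```

Unlike the lenient approach in this PR, the stdlib keeps the two alphabets as separate instances (`Base64.Default` and `Base64.UrlSafe`), so callers would have to know which variant they are decoding.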