Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode URL safe base64 correctly #4721

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Decode URL safe base64 correctly #4721

wants to merge 4 commits into from

Conversation

floscher
Copy link

@floscher floscher commented Mar 3, 2025

Subsystem
ktor-utils

Motivation
In a unit test for a Ktor project I was encoding and decoding some data with ByteArray.encodeBase64(): String and String.decodeBase64Bytes(): ByteArray and was wondering why the result was not what I was expecting. After decoding and encoding, the resulting data was very slightly different from what I put in.

Turns out that the issue was, that I was using URL-safe base64 (i.e. same as normal base64, but all + characters are replaced by - and all / replaced by _, also no = padding at the end).

After looking at the source code, I found that String.decodeBase64Bytes(): ByteArray treats every character that is not a valid Base64 character as if it were the character /. That's because the decoded character is determined using indexOf(), which returns -1, so the last character in the alphabet is selected.

So "_-!A".decodeBase64Bytes() is decoded to byteArrayOf(-1, -1, -64).

byteArrayOf(-1, -1, -64).encodeBase64() will encode to ///A.

Solution
I thought, this could be easily adapted to work with URL-safe base64 as well. Without impacting encoding and decoding of valid "normal" Base64. So that's what I did with this PR.

For strings containing non-base64 I was assuming this behaviour was currently undefined (and as far as I can see not documented) behaviour, so I hope it is fine to make this here.

I also added a test case verifying that a string containing the entire Base64 alphabet (both normal and URL safe) is converted correctly, since several characters were not tested in the previous unit tests.


By the way: I'm not sure how far along it currently is, but there is now also a Base64 utility in the stdlib: https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.io.encoding/-base64/ (currently needs opt-in to @ExperimentalEncodingApi). Are there any reasons holding ktor back from just using that (besides it still being marked as experimental)?

floscher added 2 commits March 3, 2025 20:47
…aracters

This also checks decoding of URL-safe base64 strings, which currently does not work, but will be changed in a separate commit.
Copy link
Contributor

@bjhham bjhham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

Could you run ./gradlew formatKotlin to get past the code style check?

We might also want to target this for 3.2.0 in case people are relying on the incorrect behaviour.

@floscher
Copy link
Author

floscher commented Mar 4, 2025

@bjhham Done, it was just a blank line too much.

@bjhham
Copy link
Contributor

bjhham commented Mar 7, 2025

Thanks again, I decided we can target this for 3.1.2 - here is a YouTrack ticket for the purposes of tracking https://youtrack.jetbrains.com/issue/KTOR-8292/URL-safe-base64-decoding-problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants