minor cleanup
tvercaut committed Feb 22, 2024
1 parent b099d42 commit e522a49
Showing 2 changed files with 42 additions and 34 deletions.
27 changes: 17 additions & 10 deletions README.md
@@ -1,6 +1,6 @@
# base64
A simple approach to convert strings from and to base64.
Header only library.
Header only c++ library (single header).

## Usage

@@ -10,20 +10,27 @@ Header only library.
#include "base64.hpp"

int main() {
auto base64= to_base64("Hello, World!");
std::cout << base64 << std::endl; // SGVsbG8sIFdvcmxkIQ==
auto s = from_base64("SGVsbG8sIFdvcmxkIQ==");
std::cout << s << std::endl; // Hello, World!
auto encoded_str = base64::to_base64("Hello, World!");
std::cout << encoded_str << std::endl; // SGVsbG8sIFdvcmxkIQ==
auto decoded_str = base64::from_base64("SGVsbG8sIFdvcmxkIQ==");
std::cout << decoded_str << std::endl; // Hello, World!
}
```

## Notes
This library relies on C++17.
This library relies on C++17 but will exploit some C++20 features if available (e.g. `bit_cast`).

A benchmark of various c/c++ base64 implementations can be found at https://github.com/gaspardpetit/base64/
There are many implementations available and it may be worth looking at those. A benchmark of various c/c++ base64 implementations can be found at https://github.com/gaspardpetit/base64/

There are many implementations available and it may be worth looking at those. For example, a different, unrelated, C++20 library for base64 encoding/decoding can be found at https://github.com/matheusgomes28/base64pp
This implementation adopts the approach of Nick Galbreath's `modp_b64` library, which is also used by Chromium (e.g. https://github.com/chromium/chromium/tree/main/third_party/modp_b64 ), but offers it as a c++ single header file. This choice was based on the good computational performance of the underlying algorithm. We also decided to avoid relying on a c++ `union` to perform type punning as this, while working in practice, is strictly speaking undefined behaviour in c++: https://en.wikipedia.org/wiki/Type_punning#Use_of_union

There is also an implementation that works with older C++ versions available at https://github.com/ReneNyffenegger/cpp-base64
Faster c/c++ implementations exist, although these likely exploit simd / openmp or similar acceleration techniques:
- https://github.com/aklomp/base64
- https://github.com/lemire/fastbase64 (From a [blog post](https://lemire.me/blog/2018/01/17/ridiculously-fast-base64-encoding-and-decoding/) by the authors: "My understanding is that our good results have been integrated in [Klomp’s base64 library](https://github.com/aklomp/base64).")
- Other implementations related to the one by lemire: https://github.com/WojciechMula/base64-avx512 and https://github.com/WojciechMula/base64simd
- https://github.com/powturbo/Turbo-Base64 (Note that this is licensed under GPL 3.0)

There are also some more generic libraries available such as https://github.com/azawadzki/base-n
Many other C++ centric approaches exist, although they seem to focus on readability or genericity at the cost of performance, e.g.:
- https://github.com/matheusgomes28/base64pp (C++20 library from which we borrowed the unit test code)
- https://github.com/ReneNyffenegger/cpp-base64 (Implementation that works with older C++ versions)
- https://github.com/azawadzki/base-n (more generic baseN such as N=16 and N=32)
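
The note above about avoiding `union`-based type punning can be illustrated with a minimal sketch. The `sketch::bit_cast` helper below is hypothetical and is not the library's actual `detail::bit_cast`; it assumes a C++17 compiler, forwards to `std::bit_cast` when the standard library advertises it, and otherwise falls back to `std::memcpy`, which is a well-defined way to copy the bytes of one trivially copyable type into another.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

#if __has_include(<bit>)
#include <bit>  // defines __cpp_lib_bit_cast when std::bit_cast is available
#endif

namespace sketch {  // hypothetical namespace, for illustration only

// Reinterpret the bytes of `src` as a value of type `To` without a union.
template <class To, class From>
To bit_cast(const From& src) noexcept {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable_v<From> &&
                    std::is_trivially_copyable_v<To>,
                "both types must be trivially copyable");
#if defined(__cpp_lib_bit_cast)
  return std::bit_cast<To>(src);  // C++20 path
#else
  static_assert(std::is_trivially_default_constructible_v<To>,
                "the memcpy fallback needs a default-constructible target");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));  // C++17 fallback
  return dst;
#endif
}

// Example use: view a 32-bit word as four bytes and convert it back.
// The calls are qualified so that argument-dependent lookup cannot also
// pick up std::bit_cast and make the call ambiguous.
inline void round_trip_example() {
  const std::uint32_t word = 0x04030201u;
  const auto bytes = sketch::bit_cast<std::array<char, 4>, std::uint32_t>(word);
  const auto back = sketch::bit_cast<std::uint32_t, std::array<char, 4>>(bytes);
  assert(back == word);  // holds regardless of endianness
  (void)back;
}

}  // namespace sketch
```

The order of the bytes inside `bytes` depends on the platform's endianness, so only the round trip is checked here.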
49 changes: 25 additions & 24 deletions include/base64.hpp
@@ -251,6 +251,13 @@ std::array<std::uint32_t, 256> constexpr decode_table_3 = {
0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff,
0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff,
0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff};

// TODO fix decoding tables to avoid the need for different indices in big
// endian?
inline constexpr size_t decidx0{0};
inline constexpr size_t decidx1{1};
inline constexpr size_t decidx2{2};

#elif defined(__BIG_ENDIAN__)

std::array<std::uint32_t, 256> constexpr decode_table_0 = {
@@ -433,6 +440,12 @@ std::array<std::uint32_t, 256> constexpr decode_table_3 = {
0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff,
0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff};

// TODO fix decoding tables to avoid the need for different indices in big
// endian?
inline constexpr size_t decidx0{1};
inline constexpr size_t decidx1{2};
inline constexpr size_t decidx2{3};

#endif

std::array<char, 256> constexpr encode_table_0 = {
@@ -595,19 +608,15 @@ inline OutputBuffer decode_into(std::string_view base64Text) {
"Invalid base64 encoded data - Invalid character"};
}

// Use bit_cast instead of union and type punning to avoid
// undefined behaviour risk:
// https://en.wikipedia.org/wiki/Type_punning#Use_of_union
const std::array<char, 4> tempBytes =
detail::bit_cast<std::array<char, 4>, uint32_t>(temp);

#if defined(__LITTLE_ENDIAN__)
*currDecoding++ = tempBytes[0];
*currDecoding++ = tempBytes[1];
*currDecoding++ = tempBytes[2];
#else
// TODO fix decoding table to avoid the #if here?
*currDecoding++ = tempBytes[1];
*currDecoding++ = tempBytes[2];
*currDecoding++ = tempBytes[3];
#endif
*currDecoding++ = tempBytes[detail::decidx0];
*currDecoding++ = tempBytes[detail::decidx1];
*currDecoding++ = tempBytes[detail::decidx2];
}

switch (numPadding) {
@@ -630,16 +639,13 @@ inline OutputBuffer decode_into(std::string_view base64Text) {
"Invalid base64 encoded data - Invalid character"};
}

// Use bit_cast instead of union and type punning to avoid
// undefined behaviour risk:
// https://en.wikipedia.org/wiki/Type_punning#Use_of_union
const std::array<char, 4> tempBytes =
detail::bit_cast<std::array<char, 4>, uint32_t>(temp);
#if defined(__LITTLE_ENDIAN__)
*currDecoding++ = tempBytes[0];
*currDecoding++ = tempBytes[1];
#else
// TODO fix decoding table to avoid the #if here?
*currDecoding++ = tempBytes[1];
*currDecoding++ = tempBytes[2];
#endif
*currDecoding++ = tempBytes[detail::decidx0];
*currDecoding++ = tempBytes[detail::decidx1];
break;
}
case 2: {
@@ -658,12 +664,7 @@ inline OutputBuffer decode_into(std::string_view base64Text) {

const std::array<char, 4> tempBytes =
detail::bit_cast<std::array<char, 4>, uint32_t>(temp);
#if defined(__LITTLE_ENDIAN__)
*currDecoding++ = tempBytes[0];
#else
// TODO fix decoding table to avoid the #if here?
*currDecoding++ = tempBytes[1];
#endif
*currDecoding++ = tempBytes[detail::decidx0];
break;
}
default: {
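
The change in `decode_into` replaces the repeated `#if defined(__LITTLE_ENDIAN__)` blocks with index constants that are selected once per endianness. A minimal sketch of that idea follows, under stated assumptions: every name in `sketch` is hypothetical, endianness is detected with the GCC/Clang `__BYTE_ORDER__` macro (other compilers are assumed to be little endian here), `std::memcpy` stands in for `detail::bit_cast`, and `pack3` only mimics the layout the decode tables are assumed to produce, with the unused byte being the numerically most significant one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {  // hypothetical namespace, for illustration only

// Assumption: GCC/Clang predefined macros; anything else is treated as
// little endian in this sketch.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
inline constexpr bool big_endian = true;
#else
inline constexpr bool big_endian = false;
#endif

// Positions of the three decoded bytes in the 4-byte view of a 32-bit word,
// mirroring decidx0..decidx2 ({0,1,2} little endian, {1,2,3} big endian).
inline constexpr std::size_t idx0 = big_endian ? 1 : 0;
inline constexpr std::size_t idx1 = big_endian ? 2 : 1;
inline constexpr std::size_t idx2 = big_endian ? 3 : 2;

// Stand-in for OR-ing the decode table entries: place three bytes in the
// low 24 bits so that they appear in order in memory on this platform.
inline std::uint32_t pack3(unsigned char a, unsigned char b, unsigned char c) {
  const std::uint32_t x = a, y = b, z = c;
  return big_endian ? ((x << 16) | (y << 8) | z) : (x | (y << 8) | (z << 16));
}

// Read the three bytes back with a single code path for both endiannesses.
inline std::array<unsigned char, 3> unpack3(std::uint32_t word) {
  std::array<unsigned char, 4> bytes{};
  std::memcpy(bytes.data(), &word, sizeof(word));
  return {bytes[idx0], bytes[idx1], bytes[idx2]};
}

// Example use: the bytes come back in the order they were packed.
inline void unpack_example() {
  const auto out = unpack3(pack3('M', 'a', 'n'));
  assert(out[0] == 'M' && out[1] == 'a' && out[2] == 'n');
  (void)out;
}

}  // namespace sketch
```

Keeping the endianness decision in three constants means the decoding loop and each padding case share one code path, which is the simplification this commit makes.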
