Instructions for porting to C# #189

verelpode · 2020-12-05T19:06:13Z

It seems like Ulf Adams did a great job in creating Ryu, and he made impressive optimizations. The only downside is that the "master" version is the plain-C version instead of a modern language. Nevertheless it's great work.

I wrote up instructions for porting the "d2s.c" file to C#:

Firstly, eliminate all usage of the plain-C/prehistoric memcpy function. Change this:

memcpy(result + index + olength - i - 1, DIGIT_TABLE + c0, 2);
memcpy(result + index + olength - i - 3, DIGIT_TABLE + c1, 2);
memcpy(result + index + olength - i - 5, DIGIT_TABLE + d0, 2);
memcpy(result + index + olength - i - 7, DIGIT_TABLE + d1, 2);

To:

CopyFromDigitTable(result, (uint)index + olength - i - 1, c0);
CopyFromDigitTable(result, (uint)index + olength - i - 3, c1);
CopyFromDigitTable(result, (uint)index + olength - i - 5, d0);
CopyFromDigitTable(result, (uint)index + olength - i - 7, d1);

Where CopyFromDigitTable is a new method (not in the original source code) implemented like this:

[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
private static void CopyFromDigitTable(System.Span<char> inDestination, uint inDestOffset, uint inDigitTableOffset)
{
	unchecked {
		inDestination[(int)inDestOffset] = DIGIT_TABLE[inDigitTableOffset];
		inDestination[(int)inDestOffset+1] = DIGIT_TABLE[inDigitTableOffset+1];
	}
}
// A table of all two-digit numbers. This is used to speed up decimal digit generation by copying pairs of digits into the final output.
private static readonly char[] DIGIT_TABLE = new char[200] {
	'0','0','0','1','0','2','0','3','0','4','0','5','0','6','0','7','0','8','0','9',
	'1','0','1','1','1','2','1','3','1','4','1','5','1','6','1','7','1','8','1','9',
	'2','0','2','1','2','2','2','3','2','4','2','5','2','6','2','7','2','8','2','9',
	'3','0','3','1','3','2','3','3','3','4','3','5','3','6','3','7','3','8','3','9',
	'4','0','4','1','4','2','4','3','4','4','4','5','4','6','4','7','4','8','4','9',
	'5','0','5','1','5','2','5','3','5','4','5','5','5','6','5','7','5','8','5','9',
	'6','0','6','1','6','2','6','3','6','4','6','5','6','6','6','7','6','8','6','9',
	'7','0','7','1','7','2','7','3','7','4','7','5','7','6','7','7','7','8','7','9',
	'8','0','8','1','8','2','8','3','8','4','8','5','8','6','8','7','8','8','8','9',
	'9','0','9','1','9','2','9','3','9','4','9','5','9','6','9','7','9','8','9','9'
};

To port the mulShiftAll64 function, simplify and replace its confusing/unsafe mul pointer parameter with 2 simple reliable uint64 parameters named mul_0 and mul_1. Change this:

static inline uint64_t mulShiftAll64(const uint64_t m, const uint64_t* const mul, const int32_t j,
  uint64_t* const vp, uint64_t* const vm, const uint32_t mmShift) {
  *vp = mulShift64(4 * m + 2, mul, j);
  *vm = mulShift64(4 * m - 1 - mmShift, mul, j);
  return mulShift64(4 * m, mul, j);
}

To use two simple uint64 parameters named mul_0 and mul_1 instead of the unsafe mul pointer:

private static uint64_t mulShiftAll64(uint64_t m, uint64_t mul_0, uint64_t mul_1, int32_t j, out uint64_t vp, out uint64_t vm, uint32_t mmShift)
{
	unchecked {
		vp = mulShift64(4 * m + 2, mul_0, mul_1, j);
		vm = mulShift64(4 * m - 1 - mmShift, mul_0, mul_1, j);
		return mulShift64(4 * m, mul_0, mul_1, j);
	} // unchecked
}

As you see above, mulShiftAll64 invokes mulShift64. Likewise change mulShift64 to have 2 parameters mul_0 and mul_1 instead of the unsafe mul pointer.
Also, mulShift64 invokes umul128. Replace umul128 invocations with System.Math.BigMul:

private static uint64_t mulShift64(uint64_t m, uint64_t mul_0, uint64_t mul_1, int32_t j)
{
	unchecked {
		// m is maximum 55 bits
		uint64_t low1, high1; // 128
		high1 = System.Math.BigMul(m, mul_1, low: out low1);  // was:  low1 = umul128(m, mul_1, &high1); // 64
		uint64_t high0 = System.Math.BigMul(m, mul_0, low: out _);  // was:  umul128(m, mul_0, &high0); // 0
		uint64_t sum = high0 + low1;
		if (sum < high0) {
			++high1; // overflow into high1
		}
		return shiftright128(sum, high1, j - 64);
	} // unchecked
}

Swap "high" and "low" when replacing umul128 with System.Math.BigMul. i.e. System.Math.BigMul and umul128 have exchanged meanings of the output parameter and return value ("low" and "high" are in swapped places).

When the code invokes mulShiftAll64, update the invocation to pass the two new mul_0 and mul_1 parameters instead of the mul pointer: Change this:

vr = mulShiftAll64(m2, DOUBLE_POW5_INV_SPLIT[q], i, &vp, &vm, mmShift);

To:

vr = mulShiftAll64(m2, DOUBLE_POW5_INV_SPLIT[q,0], DOUBLE_POW5_INV_SPLIT[q,1], i, out vp, out vm, mmShift);

Note the usage of the two-dimensional array, hence the comma in DOUBLE_POW5_INV_SPLIT[q,0]. This is much clearer and safer than the original confusing mul pointer.

To port the lookup table, change this:

#define DOUBLE_POW5_INV_BITCOUNT 125
#define DOUBLE_POW5_BITCOUNT 125
#define DOUBLE_POW5_INV_TABLE_SIZE 342
#define DOUBLE_POW5_TABLE_SIZE 326

static const uint64_t DOUBLE_POW5_INV_SPLIT[DOUBLE_POW5_INV_TABLE_SIZE][2] = {
  { 1u, 2305843009213693952u }, { 11068046444225730970u, 1844674407370955161u },
  // ...
};

To a C# two-dimensional array with comma in the [,]:

private const int DOUBLE_POW5_INV_BITCOUNT = 125;
private const int DOUBLE_POW5_BITCOUNT = 125;
private const int DOUBLE_POW5_INV_TABLE_SIZE = 342;
private const int DOUBLE_POW5_TABLE_SIZE = 326;

private static readonly uint64_t[,] DOUBLE_POW5_INV_SPLIT = new uint64_t[DOUBLE_POW5_INV_TABLE_SIZE, 2] {
	{ 1u, 2305843009213693952u }, { 11068046444225730970u, 1844674407370955161u },
// ...
};

To extract the float64/double to raw bits in a uint64, change this:

int d2s_buffered_n(double f, char* result) {
	// Step 1: Decode the floating-point number, and unify normalized and subnormal cases.
	const uint64_t bits = double_to_bits(f);
	// ...
}

To use either BitConverter.DoubleToInt64Bits or CompilerServices.Unsafe.As:

private static int d2s_buffered_n(double f, System.Span<char> result)
{
	unchecked {
		// Step 1: Decode the floating-point number, and unify normalized and subnormal cases.
		uint64_t bits = (uint64_t)System.BitConverter.DoubleToInt64Bits(f);
		// Or:
		uint64_t bits = System.Runtime.CompilerServices.Unsafe.As<double, uint64_t>(ref f);
		
		// ...
	} // unchecked
}

Insert unchecked { ... } blocks covering the entire body of each function/method, to solve the problem of it throwing overflow exceptions whenever the C# program is compiled with overflow-checking enabled. In many places, this Ryu code is designed to deliberately cause and ignore overflows.

// Returns e == 0 ? 1 : ceil(log_2(5^e)); requires 0 <= e <= 3528.
private static int32_t pow5bits(int32_t e)
{
	unchecked {
		// This approximation works up to the point that the multiplication overflows at e = 3529.
		// If the multiplication were done in 64 bits, it would fail at 5^4004 which is just greater than 2^9297.
		assert(e >= 0);
		assert(e <= 3528);
		return (int32_t)(((((uint32_t)e) * 1217359) >> 19) + 1);
	} // unchecked
}

Change every char* to System.Span<char>. For example, change:

static inline int to_chars(const floating_decimal_64 v, const bool sign, char* const result) { /*...*/ }

To:

private static int to_chars(floating_decimal_64 v, bool sign, System.Span<char> result) { /*...*/ }

This Ryu code continues to work regardless of whether char is 8-bit or 16-bit.

Change this:

void d2s_buffered(double f, char* result) {
  const int index = d2s_buffered_n(f, result);

  // Terminate the string.
  result[index] = '\0';
}
char* d2s(double f) {
  char* const result = (char*) malloc(25);
  d2s_buffered(f, result);
  return result;
}

To:

public static string d2s(double f)
{
	System.Span<char> result = stackalloc char[25];
	int index = d2s_buffered_n(f, result);
	return result.Slice(0, index).ToString();
}

The struct floating_decimal_64 should be changed to readonly:

// A floating decimal representing m * 10^e.
private readonly struct floating_decimal_64
{
	public readonly uint64_t mantissa;
	// Decimal exponent's range is -324 to 308 inclusive, and can fit in a short if needed.
	public readonly int32_t exponent;
	public floating_decimal_64(uint64_t inMantissa, int32_t inExponent)
	{
		mantissa = inMantissa;
		exponent = inExponent;
	}
}

Only the d2s_buffered_n function/method modifies the fields inside struct floating_decimal_64 but it is very easy to change d2s_buffered_n to use local variables instead of the fields, followed by a final step of creating the instance of floating_decimal_64 at the end:

// ...
floating_decimal_64 v;
bool isSmallInt = d2d_small_int(ieeeMantissa, ieeeExponent, out v);
if (isSmallInt)
{
	// For small integers in the range [1, 2^53), v.mantissa might contain trailing (decimal) zeros.
	// For scientific notation we need to move these zeros into the exponent.
	// (This is not needed for fixed-point notation, so it might be beneficial to trim
	// trailing zeros in to_chars only if needed - once fixed-point notation output is implemented.)
	uint64_t v_mantissa = v.mantissa;
	int32_t v_exponent = v.exponent;
	for ( ; ; )
	{
		uint64_t q = div10(v_mantissa);
		uint32_t r = ((uint32_t)v_mantissa) - 10 * ((uint32_t)q);
		if (r != 0) break;
		v_mantissa = q;
		++v_exponent;
	}
	v = new floating_decimal_64(v_mantissa, v_exponent);
}
// ...

Use AggressiveInlining where applicable:

[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
private static uint64_t div5(uint64_t x) { return x / 5; }

[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
private static uint64_t div10(uint64_t x) { return x / 10; }

[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
private static uint64_t div100(uint64_t x) { return x / 100; }

[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
private static uint64_t div1e8(uint64_t x) { return x / 100000000; }

// ...and likewise for other tiny functions...

The text was updated successfully, but these errors were encountered:

verelpode · 2020-12-05T19:11:52Z

@Dogwei -- I see your C# port at:
https://github.com/Dogwei/RyuCsharp

Thanks, that's great -- that helped me get started with Ryu faster. I suggest some improvements. I see that currently your port doesn't use System.Math.BigMul. To achieve the desired performance goals, I suggest replacing umul128 invocations with System.Math.BigMul as shown in my previous message.

I also suggest eliminating (or at least reducing) the usage of unsafe pointers. These unsafe pointers are unnecessary. For char pointers, if you use .NET Framework 5.0 (or .NET Core), then you can use the safe, reliable, fast Span<char> instead of char*. Alternatively, instead of Span<char>, you can use a simple char array such char[] result = new char[25];. No need to resort to unsafe pointers.

Even when using the older .NET Framework 4.8 that doesn't support Span<T>, you can still implement the two-dimensional array 'DOUBLE_POW5_INV_SPLIT' without using any unsafe pointers. In the mulShiftAll64 function, simplify and replace its confusing/unsafe mul pointer parameter with 2 simple reliable uint64 parameters named mul_0 and mul_1. This is demonstrated in my previous message.

hmmm on second point, maybe you know all this already but you wanted to make a C# Ryu version that is as close as possible to the original plain-C source code, regardless of whether it does or doesn't achieve the performance goals when System.Math.BigMul etc are not used.

If that was your goal, I suggest you could make two versions: One that is a practical port to C# for real-world usage, and another version that is as close as possible to the plain-C source code.

verelpode · 2020-12-05T19:15:28Z

See also this suggestion from @Tornhoof to implement Ryu inside a future version of the .NET Framework 5.0:

Consider implementing Ryu algorithm for double.ToString() #10939

ulfjack · 2020-12-07T11:59:42Z

Hi @verelpode. I intentionally picked a boring old languages, rather than an exciting new one, exactly because it is boring, close to assembly, and still pretty much universal. I'm happy to link to wherever you want to post your port or instructions for whatever modern language you feel like. However, this repo is going to stick with C. This report isn't directly actionable, so I will close it. Feel free to post the instructions (or even better, the source) somewhere and let me know. Thanks!

ulfjack closed this as completed Dec 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instructions for porting to C# #189

Instructions for porting to C# #189

verelpode commented Dec 5, 2020

verelpode commented Dec 5, 2020

verelpode commented Dec 5, 2020

ulfjack commented Dec 7, 2020

Instructions for porting to C# #189

Instructions for porting to C# #189

Comments

verelpode commented Dec 5, 2020

verelpode commented Dec 5, 2020

verelpode commented Dec 5, 2020

ulfjack commented Dec 7, 2020