#### Binary Floating Point Representation
Consider the problem of how to store a floating point number using a 32-bit binary representation. The IEEE 754 standard defines the format for single precision floating point numbers: 1 sign bit, 8 exponent bits, and 23 mantissa bits.

In order to understand the format, we need to write some functions to convert between binary and decimal representations.  First, we write a function to convert any positive base-10 integer to a binary string.

In [3]:
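def convert_int_to_binary(num):
    # Zero needs special handling because the loop below never runs for it
    if num == 0:
        return "0"

    res = ""

    while num > 0:
        # The remainder is the next binary digit, starting from the least significant.
        # Integer division avoids the float rounding that int(num / 2) suffers for large values.
        num, digit = divmod(num, 2)
        res = str(digit) + res

    return res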

In [2]:
convert_int_to_binary(4096)

'1000000000000'
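
As a quick sanity check, Python's built-in `format` function produces the same digit string. The sketch below is not part of the original notebook, and `builtin_binary` is a name chosen here purely for illustration.

```python
def builtin_binary(num):
    """Return the binary digits of a non-negative integer using the built-in formatter."""
    return format(num, "b")

print(builtin_binary(4096))   # 1000000000000
print(builtin_binary(10))     # 1010
```

Comparing the two results for a handful of integers is an easy way to catch mistakes in the hand-written loop.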

Next, we will write a function to convert a decimal number between 0 and 1 to a binary string.  We will use the following algorithm:

1. Multiply the number by 2.
2. If the result is greater than or equal to 1, the next digit in the binary representation is 1.  Subtract 1 from the result.
3. If the result is less than 1, the next digit in the binary representation is 0.
4. Repeat steps 1-3 with the remaining fraction until it reaches 0 or the desired number of digits has been produced.

In [4]:
def convert_fraction_to_binary(num, precision=40):
    # num is a fraction with 0 <= num < 1; precision is the maximum number of binary digits to produce
    res = ""
    while num > 0 and precision > 0:
        num = num * 2
        if num >= 1:
            res += "1"
            num -= 1
        else:
            res += "0"
        precision -= 1
    return res

In [4]:
convert_fraction_to_binary(0.125,35)

'001'
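
The value 0.125 terminates after three binary digits, but most decimal fractions do not. The sketch below repeats the same doubling algorithm with exact rational arithmetic from the standard library's `fractions` module, so that no float rounding interferes. The name `fraction_bits` is an assumption made for this illustration and is not part of the original notebook.

```python
from fractions import Fraction

def fraction_bits(frac, precision):
    """Return up to `precision` binary digits of an exact fraction in [0, 1)."""
    digits = ""
    while frac > 0 and len(digits) < precision:
        frac *= 2
        if frac >= 1:
            digits += "1"
            frac -= 1
        else:
            digits += "0"
    return digits

print(fraction_bits(Fraction(1, 8), 35))    # 001
print(fraction_bits(Fraction(1, 10), 12))   # 000110011001
```

The digits of 1/10 repeat the pattern `0011` indefinitely, which is why 0.1 cannot be stored exactly in any finite binary format.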

Finally, we will write a function to convert a decimal number to the IEEE 754 binary representation.  The algorithm is as follows:

1. If the number is negative, then the sign bit is 1.  Otherwise, the sign bit is 0.
2. Convert the integer part and the fractional part of the absolute value to binary using the two functions above.
3. Normalize the binary number to the form 1.xxx x 2^e.  If the number is at least 1, the exponent e is the number of binary digits to the left of the binary point minus 1.  If the number is less than 1, e is the negative of the position of the first 1 after the binary point (for example, binary 0.001 is 1.0 x 2^(-3)).
4. Add the bias of 127 to the exponent and write the sum as an 8-bit binary number.
5. The mantissa consists of the bits that follow the leading 1.  The leading 1 itself is implicit and is not stored.
6. Truncate the mantissa to 23 bits, or pad it with 0s on the right until it is 23 bits long.  (IEEE 754 hardware normally rounds to nearest; this simplified version truncates.)
7. The result is the concatenation of the sign bit, the exponent, and the mantissa.
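
Reading the steps in reverse gives a way to check any encoding by hand. The sketch below is an illustration rather than part of the original notebook: `decode_single` is a hypothetical helper, it assumes the three fields are written as a string separated by `|`, and it handles only normalized numbers (no zeros, subnormals, infinities, or NaNs).

```python
def decode_single(fields):
    """Rebuild the value from a 'sign|exponent|mantissa' string of 1, 8 and 23 bits."""
    sign_bit, exp_bits, man_bits = fields.split("|")
    sign = -1 if sign_bit == "1" else 1
    exponent = int(exp_bits, 2) - 127       # remove the bias
    fraction = int(man_bits, 2) / 2**23     # value of the bits after the implicit leading 1
    return sign * (1 + fraction) * 2.0**exponent

print(decode_single("0|01111111|01000000000000000000000"))   # 1.25
print(decode_single("1|10000000|10000000000000000000000"))   # -3.0
```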

In [5]:
def get_fp_binary_representation(n):
    # request 48 binary digits after the point ... only 23 are kept for the mantissa, so this is far more than we need
    p = 48

    # Step 1:  split the number into two parts - both strings
    front, back = str(n).split('.')

    # Step 2:  convert the part in front of the decimal to binary ... take the sign into account
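    # test the string rather than int(front), so that values such as -0.5 (where front is "-0") keep their sign
    if front.startswith("-"):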
        sign = "-"
        i_front = -int(front)
    else:
        sign = ""
        i_front = int(front)

    if i_front == 0:
        front_bin = "0"
    else:
        front_bin = convert_int_to_binary(i_front)

    # Step 3:  convert the part after the decimal to binary
    divisor = 10**len(back)
    back_bin = convert_fraction_to_binary(float(back)/divisor, p)

    # Step 4:  add the strings together and print the result
    bin_result = sign + front_bin + "." + back_bin
    print(f"The binary representation of the {n} is {bin_result}")

    # Step 5:  Determine the exponent and mantissa
    if front_bin == "0":
        exponent = 0
        keep_going = True
        while keep_going:
            # print (back_bin[-exponent],exponent)
            if back_bin[-exponent] == "1":
                keep_going = False
                exponent = exponent + 1
            exponent = exponent - 1
        exponent = exponent - 1

        # print(exponent)
        back_bin = back_bin[-exponent:]
        mantissa_truncated = back_bin
    else:
        exponent = len(front_bin)-1
        mantissa = front_bin[1:] + back_bin
        mantissa_truncated = mantissa[0:23]

    true_result = sign + "1." + mantissa_truncated + " x 2^(" + str(exponent) + ")"
    print(f"The binary scientific notation representation is {true_result}")

    # Step 6:  Convert to 32-bit floating point representation
    if front.startswith("-"):
        bit1 = "1"
    else:
        bit1 = "0"

    exp = int(exponent)+127
    exp_binary_rep = convert_int_to_binary(exp)
    # left-pad with zeros to the full 8-bit exponent width
    exp_binary_rep = exp_binary_rep.zfill(8)

    if len(mantissa_truncated) < 23:
        mantissa_truncated = mantissa_truncated + (23-len(mantissa_truncated))*"0"

    if len(mantissa_truncated) > 23:
        mantissa_truncated = mantissa_truncated[0:23]

    res = bit1 + "|" + exp_binary_rep + "|" + mantissa_truncated
    return res

In [6]:
#fp_num = input("Enter your floating point value : \n")
fp_num = 3.14159265
result = get_fp_binary_representation(fp_num)
print(f"The IEEE 754 representation is {result}")

print()

fp_num = 3.14159266
result = get_fp_binary_representation(fp_num)
print(f"The IEEE 754 representation is {result}")

print()

fp_num = 1.25
result = get_fp_binary_representation(fp_num)
print(f"The IEEE 754 representation is {result}")

print()

fp_num = -1.25
result = get_fp_binary_representation(fp_num)
print(f"The IEEE 754 representation is {result}")

The binary representation of the 3.14159265 is 11.001001000011111101101010011110010001101010011110
The binary scientific notation representation is 1.10010010000111111011010 x 2^(1)
The IEEE 754 representation is 0|10000000|10010010000111111011010

The binary representation of the 3.14159266 is 11.001001000011111101101010101001000000110110111011
The binary scientific notation representation is 1.10010010000111111011010 x 2^(1)
The IEEE 754 representation is 0|10000000|10010010000111111011010

The binary representation of the 1.25 is 1.01
The binary scientific notation representation is 1.01 x 2^(0)
The IEEE 754 representation is 0|01111111|01000000000000000000000

The binary representation of the -1.25 is -1.01
The binary scientific notation representation is -1.01 x 2^(0)
The IEEE 754 representation is 1|01111111|01000000000000000000000
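
These results can be cross-checked against the bits the machine actually stores. The sketch below uses the standard library's `struct` module to pack a value as a 32-bit float and read the bytes back as an integer. `float32_fields` is a name introduced here for illustration and is not part of the original notebook.

```python
import struct

def float32_fields(x):
    """Return the IEEE 754 single precision bits of x as 'sign|exponent|mantissa'."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes as an unsigned integer
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(as_int, "032b")
    return bits[0] + "|" + bits[1:9] + "|" + bits[9:]

print(float32_fields(1.25))         # 0|01111111|01000000000000000000000
print(float32_fields(-1.25))        # 1|01111111|01000000000000000000000
print(float32_fields(3.14159265))   # 0|10000000|10010010000111111011011
```

The values 1.25 and -1.25 match exactly. For 3.14159265 the last mantissa bit differs from the output above, because the conversion to a 32-bit float rounds to the nearest representable value while the function in this section truncates.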


The IEEE 754 representation for 64-bit "double precision" numbers uses 1 sign bit, 11 exponent bits, and 52 stored mantissa bits. Counting the implicit leading 1, the significand has 53 bits of precision. The exponent bias is 2^10 - 1 = 1023.
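
The field widths can be confirmed directly from the stored bits. The sketch below packs a Python float as a 64-bit double with the standard library's `struct` module and splits the bit string into sign, exponent, and mantissa. `float64_fields` is a hypothetical helper introduced for this illustration, and the check of `sys.float_info.mant_dig` assumes a platform where Python floats are IEEE 754 doubles.

```python
import struct
import sys

def float64_fields(x):
    """Return the IEEE 754 double precision bits of x as 'sign|exponent|mantissa'."""
    # Pack as a big-endian 64-bit float, then reinterpret the same 8 bytes as an unsigned integer
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_int, "064b")
    return bits[0] + "|" + bits[1:12] + "|" + bits[12:]

fields = float64_fields(1.25)
print(fields)
print([len(part) for part in fields.split("|")])   # [1, 11, 52]
print(sys.float_info.mant_dig)                     # 53
```

The stored mantissa is 52 bits wide, so the three fields add up to exactly 64 bits. The value 53 reported by `mant_dig` includes the implicit leading 1.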

In [7]:
def get_double_binary_representation(n):
    # request 48 binary digits after the point ... this is fewer than the mantissa field holds, so the low-order bits of the result are zero padding
    p = 48

    # Step 1:  split the number into two parts - both strings
    front, back = str(n).split('.')

    # Step 2:  convert the part in front of the decimal to binary ... take the sign into account
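    # test the string rather than int(front), so that values such as -0.5 (where front is "-0") keep their sign
    if front.startswith("-"):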
        sign = "-"
        i_front = -int(front)
    else:
        sign = ""
        i_front = int(front)

    if i_front == 0:
        front_bin = "0"
    else:
        front_bin = convert_int_to_binary(i_front)

    # Step 3:  convert the part after the decimal to binary
    divisor = 10**len(back)
    back_bin = convert_fraction_to_binary(float(back)/divisor, p)

    # Step 4:  add the strings together and print the result
    bin_result = sign + front_bin + "." + back_bin
    print(f"The binary representation of the {n} is {bin_result}")

    # Step 5:  Determine the exponent and mantissa
    if front_bin == "0":
        exponent = 0
        keep_going = True
        while keep_going:
            # print (back_bin[-exponent],exponent)
            if back_bin[-exponent] == "1":
                keep_going = False
                exponent = exponent + 1
            exponent = exponent - 1
        exponent = exponent - 1

        # print(exponent)
        back_bin = back_bin[-exponent:]
        mantissa_truncated = back_bin
    else:
        exponent = len(front_bin)-1
        mantissa = front_bin[1:] + back_bin
        mantissa_truncated = mantissa[0:52]

    true_result = sign + "1." + mantissa_truncated + " x 2^(" + str(exponent) + ")"
    print(f"The binary scientific notation representation is {true_result}")

    # Step 6:  Convert to 64-bit floating point representation
    if front.startswith("-"):
        bit1 = "1"
    else:
        bit1 = "0"

    exp = int(exponent)+1023
    # left-pad with zeros to the full 11-bit exponent width
    exp_binary_rep = convert_int_to_binary(exp).zfill(11)

    # the stored mantissa field is 52 bits wide (the leading 1 is implicit)
    if len(mantissa_truncated) < 52:
        mantissa_truncated = mantissa_truncated + (52-len(mantissa_truncated))*"0"

    if len(mantissa_truncated) > 52:
        mantissa_truncated = mantissa_truncated[0:52]

    res = bit1 + "|" + exp_binary_rep + "|" + mantissa_truncated
    return res

In [14]:
# double precision carries about 15 to 17 significant decimal digits
#fp_num = input("Enter your floating point value : \n")
fp_num = 3.141592653589793
result = get_double_binary_representation(fp_num)
print(f"The IEEE 754 64 bit representation is {result}")

print()

fp_num = 3.141592653589794
result = get_double_binary_representation(fp_num)
print(f"The IEEE 754 64 bit representation is {result}")

print()

fp_num = 1.602176634
result = get_double_binary_representation(fp_num)
print(f"The IEEE 754 64 bit representation is {result}")

print()

fp_num = 1.602176638
result = get_double_binary_representation(fp_num)
print(f"The IEEE 754 64 bit representation is {result}")

The binary representation of the 3.141592653589793 is 11.001001000011111101101010100010001000010110100010
The binary scientific notation representation is 1.1001001000011111101101010100010001000010110100010 x 2^(1)
The IEEE 754 64 bit representation is 0|10000000000|1001001000011111101101010100010001000010110100010000

The binary representation of the 3.141592653589794 is 11.001001000011111101101010100010001000010110100011
The binary scientific notation representation is 1.1001001000011111101101010100010001000010110100011 x 2^(1)
The IEEE 754 64 bit representation is 0|10000000000|1001001000011111101101010100010001000010110100011000

The binary representation of the 1.602176634 is 1.100110100010100000111111011101010111001000000011
The binary scientific notation representation is 1.100110100010100000111111011101010111001000000011 x 2^(0)
The IEEE 754 64 bit representation is 0|01111111111|1001101000101000001111110111010101110010000000110000

The binary representation of the 1.6021766