## Example 4.1 : The difference of two numbers

In [1]:
import numpy as np
from math import sqrt

In [2]:
# for later use, define some functions
def parse_double(d):  # given a 64-bit double, return its sign, exponent, and fraction bits as strings
    x = np.float64(d)
    x_bits = int(x.view(np.uint64))  # reinterpret the bits of x as an unsigned 64-bit integer (int64 would print a minus sign for negative d)
    binary_str = f"{x_bits:064b}"  # zero-padded 64-character binary string
    sign = binary_str[0]
    exponent = binary_str[1:12]
    fraction = binary_str[12:]
    return sign, exponent, fraction

def make_double(sign_str, exponent_str, fraction_str):  # the "inverse" of parse_double (normal numbers only: the implicit leading 1 is assumed)
    assert len(sign_str)==1 and len(exponent_str)==11 and len(fraction_str)==52
    sign = int(sign_str, 2)
    exponent = int(exponent_str, 2)
    fraction = 0.
    for dig in range(52):
        bit = int(fraction_str[dig])
        r = bit * 2**(-dig-1)
        fraction += r
    assert fraction < 1.
    d = (-1)**sign * (1. + fraction) * 2**(exponent - 1023)
    return d
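
As a sanity check on the bit layout (1 sign bit, 11 exponent bits, 52 fraction bits), the same three fields can be extracted with the standard library alone. The following is a minimal sketch for cross-checking, not part of the original notebook; the helper name `fields_via_struct` is hypothetical.

```python
import struct

def fields_via_struct(d):
    # Hypothetical helper: pack d as a big-endian IEEE 754 double,
    # then read the same 8 bytes back as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", d))
    s = f"{bits:064b}"
    return s[0], s[1:12], s[12:]

print(fields_via_struct(1.0))    # sign 0, biased exponent 1023, zero fraction
print(fields_via_struct(-2.0))   # sign 1, biased exponent 1024, zero fraction
```

Because the bits are read as an unsigned integer, negative inputs also produce a clean 64-character string.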

# Task
Calculate (1e14) * (y - x) from the following cell step by step, following the rules of floating-point addition and multiplication.

In [3]:
x = 1.0
y = 1.0 + (1e-14) *sqrt(2)
print((1e14) *(y-x))
print(sqrt(2))

1.4210854715202004
1.4142135623730951


### Calculation of (1e-14) *sqrt(2)

In [4]:
print(parse_double(1e-14))
print(parse_double(sqrt(2)))

('0', '01111010000', '0110100001001001101110000110101000010010101110011011')
('0', '01111111111', '0110101000001001111001100110011111110011101111001101')


A = 01111010000____0110100001001001101110000110101000010010101110011011  
B = 01111111111____0110101000001001111001100110011111110011101111001101  
Prepend "1." to the fraction part:  
A = 01111010000__1.0110100001001001101110000110101000010010101110011011  
B = 01111111111__1.0110101000001001111001100110011111110011101111001101  
Add the biased exponent:  
01111010000 + 01111111111 - 01111111111 = 01111010000  
Multiply the 1+f parts:  
1.0110100001001001101110000110101000010010101110011011 * 1.0110101000001001111001100110011111110011101111001101 = 1.111111011000011000101101101000100000001010010101011011 (truncated to 55 digits)  
Then C=A*B is:  
C = 01111010000____1.111111011000011000101101101000100000001010010101011011  
Normalize: (unnecessary)  
Round:  
C = 01111010000____1.1111110110000110001011011010001000000010100101010111  

The intermediate result is as follows:

In [5]:
make_double("0", "01111010000", "1111110110000110001011011010001000000010100101010111")

1.4142135623730951e-14
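
The hand multiplication above can be reproduced with exact integer arithmetic: multiply the two 53-bit significands as Python integers, then round once to 53 bits. The sketch below assumes the default IEEE 754 round-to-nearest, ties-to-even mode and positive normal inputs; the helper names `to_int_significand` and `round_to_53_bits` are hypothetical.

```python
from math import frexp, ldexp, sqrt

def to_int_significand(x):
    # Hypothetical helper: write a positive normal double as x == M * 2**E,
    # where M is a 53-bit integer (the "1+f part" without the binary point).
    m, e = frexp(x)                  # x == m * 2**e with 0.5 <= m < 1
    return int(ldexp(m, 53)), e - 53

def round_to_53_bits(M, E):
    # Round the integer significand M to 53 bits (round-to-nearest, ties to even).
    extra = M.bit_length() - 53
    if extra <= 0:
        return M, E
    q, r = divmod(M, 1 << extra)
    half = 1 << (extra - 1)
    if r > half or (r == half and q % 2 == 1):
        q += 1
    if q.bit_length() > 53:          # rounding carried into a 54th bit: renormalize
        q >>= 1
        extra += 1
    return q, E + extra

Ma, Ea = to_int_significand(1e-14)
Mb, Eb = to_int_significand(sqrt(2))
M, E = round_to_53_bits(Ma * Mb, Ea + Eb)   # exact product, then a single rounding
by_hand = ldexp(M, E)

print(f"{M:053b}"[1:])                      # the 52 fraction bits after rounding
print(by_hand == 1e-14 * sqrt(2))
```

The printed fraction bits can be compared with the "Round" line above, and the last line checks that the hand-rounded value agrees with the hardware product.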

### Calculation of 1.0 + (1e-14) *sqrt(2)

In [6]:
print(parse_double(1.0))

('0', '01111111111', '0000000000000000000000000000000000000000000000000000')


A = 01111010000____1111110110000110001011011010001000000010100101010111  (the previous result)  
B = 01111111111____0000000000000000000000000000000000000000000000000000  
Prepend "1." to the fraction part, the result of which we call "1+f part":  
A = 01111010000__1.1111110110000110001011011010001000000010100101010111  
B = 01111111111__1.0000000000000000000000000000000000000000000000000000  
Make the exponent of A the same as that of B, and shift the 1+f part of A to the right accordingly:  
Since 01111111111 - 01111010000 = 101111 = (47)_10, we shift the 1+f part of A to the right by 47 bits.  
A = 01111111111__0.0000000000000000000000000000000000000000000000111111101100 (truncated)   
B = 01111111111__1.0000000000000000000000000000000000000000000000000000  
Add the 1+f parts, and call the result C:   
C = 01111111111__1.0000000000000000000000000000000000000000000000111111101100  
Round :  
C = 01111111111__1.0000000000000000000000000000000000000000000001000000  
 
The intermediate result is as follows:

In [7]:
make_double("0", "01111111111", "0000000000000000000000000000000000000000000001000000")

1.0000000000000142
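
The alignment and rounding above amount to snapping the small addend to the grid of doubles in [1, 2), whose spacing is 2^-52. The following sketch, which assumes round-to-nearest with ties to even, checks this with exact rational arithmetic; the variable names are illustrative only.

```python
from fractions import Fraction
from math import sqrt

a = 1e-14 * sqrt(2)
ulp_at_one = Fraction(1, 2**52)        # spacing of doubles in [1, 2)

# Number of grid steps nearest to a; round() on a Fraction rounds ties to even.
k = round(Fraction(a) / ulp_at_one)
by_hand = float(1 + k * ulp_at_one)    # exactly representable, so float() is exact here

print(k)                               # 64, i.e. the single bit 2**-46
print(by_hand == 1.0 + a)
```

Since a is about 63.7 grid steps, it is rounded up to 64 steps, which is why only one bit of the fraction of C is set.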

### Calculation of [1.0 + (1e-14) *sqrt(2)] - 1.0

In [8]:
print(parse_double(1.0))

('0', '01111111111', '0000000000000000000000000000000000000000000000000000')


A = 01111111111____1.0000000000000000000000000000000000000000000001000000  (previous result)  
B = 01111111111____1.0000000000000000000000000000000000000000000000000000  
Subtract the 1+f part of B from that of A:  
C = 01111111111____0.0000000000000000000000000000000000000000000001000000  
Normalize :  
We must shift the 1+f part of C to the left by 46 bits. Since (46)_10 = (101110)_2, the exponent part becomes 01111111111 - 101110 = 01111010001  
C = 01111010001____1.0000000000000000000000000000000000000000000000000000    

Thus we have seen that the combination of a large right shift, rounding away the low-order bits, and a large left shift yields a large numerical error.  
The intermediate result is:

In [9]:
make_double("0", "01111010001", "0000000000000000000000000000000000000000000000000000")

1.4210854715202004e-14
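
To quantify the error seen above, we can compare the computed difference with the addend that went in. This is a minimal sketch using exact rational arithmetic; note that the subtraction itself is exact here, so the error it reveals was committed when the addend was rounded in the preceding addition.

```python
from fractions import Fraction
from math import sqrt

a = 1e-14 * sqrt(2)
diff = (1.0 + a) - 1.0

print(diff == 2.0**-46)                # only the single bit 2**-46 survives

# Relative error of diff with respect to a, computed exactly and then displayed as a float.
rel_err = abs(Fraction(diff) - Fraction(a)) / Fraction(a)
print(float(rel_err))                  # roughly 5e-3
print(float(rel_err) / 2.0**-53)       # the same error in units of the unit roundoff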

### Calculation of ([1.0 + (1e-14) *sqrt(2)] - 1.0) * 1e14
The result might be obvious, but we perform the calculation anyway.

In [10]:
print(parse_double(1e14))

('0', '10000101101', '0110101111001100010000011110100100000000000000000000')


A = 01111010001____1.0000000000000000000000000000000000000000000000000000  (previous result)  
B = 10000101101____1.0110101111001100010000011110100100000000000000000000  
Add the biased exponent:  
01111010001 + 10000101101 - 01111111111 = 01111111111  
Multiply the 1+f parts:  
the result is clearly 1.0110101111001100010000011110100100000000000000000000  
Then C=A*B is:  
C = 01111111111____1.0110101111001100010000011110100100000000000000000000  
Normalize: (unnecessary) 
Round: (unnecessary)  

Thus the final result is:

In [11]:
make_double("0", "01111111111", "0110101111001100010000011110100100000000000000000000")

1.4210854715202004

This coincides with the value computed directly:

In [12]:
((1.0 + (1e-14) *sqrt(2)) - 1.0) * 1e14

1.4210854715202004

Printing each step to 110 decimal places shows where the digits are lost:

In [13]:
a = sqrt(2)
print("step 1: ", f"{a:.110f}")
a = 1e-14 * a
print("step 2: ", f"{a:.110f}")
a = 1.0 + a
print("step 3: ", f"{a:.110f}")
a = a - 1.0
print("step 4: ", f"{a:.110f}")
a = 1e14 * a
print("step 5: ", f"{a:.110f}")

step 1:  1.41421356237309514547462185873882845044136047363281250000000000000000000000000000000000000000000000000000000000
step 2:  0.00000000000001414213562373095145516491647545337186492049163372142217554028320591896772384643554687500000000000
step 3:  1.00000000000001421085471520200371742248535156250000000000000000000000000000000000000000000000000000000000000000
step 4:  0.00000000000001421085471520200371742248535156250000000000000000000000000000000000000000000000000000000000000000
step 5:  1.42108547152020037174224853515625000000000000000000000000000000000000000000000000000000000000000000000000000000
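
The final value can also be checked exactly. Since the difference from step 4 is the power of two 2^-46 and 1e14 is exactly representable as a double, the last multiplication introduces no further rounding. The sketch below verifies this with `fractions.Fraction`; it is an illustrative check under the default rounding mode, not part of the original computation.

```python
from fractions import Fraction
from math import sqrt

result = ((1.0 + 1e-14 * sqrt(2)) - 1.0) * 1e14

# The computed result equals 10**14 / 2**46 exactly.
print(Fraction(result) == Fraction(10**14, 2**46))

# Absolute deviation from the value one would naively expect, sqrt(2).
print(result - sqrt(2))
```

The deviation from sqrt(2) already appears in the third significant digit, in line with steps 3 to 5 of the printout.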
