## Self-learning 4.1 The basics of floating point, its addition and multiplication

In [1]:
import numpy as np

In [2]:
# for later use, define some functions
def parse_double(d):  # given a 64bit double, this function returns its sign, exponent, fraction part
    x = np.float64(d)
    bits = int(x.view(np.uint64))  # reinterpret the 64 bits as an unsigned integer (a signed int64 would print a "-" for negative doubles)
    binary_str = f"{bits:064b}"  # 64-character string of 0s and 1s
    sign = binary_str[0]
    exponent = binary_str[1:12]
    fraction = binary_str[12:]
    return sign, exponent, fraction

def make_double(sign_str, exponent_str, fraction_str):  # this is the "inverse" of parse_double
    assert len(sign_str)==1 and len(exponent_str)==11 and len(fraction_str)==52
    sign = int(sign_str, 2)
    exponent = int(exponent_str, 2)
    fraction = 0.
    for dig in range(52):
        bit = int(fraction_str[dig])
        r = bit * 2**(-dig-1)
        fraction += r
    assert fraction < 1.
    d = (-1)**sign * (1. + fraction) * 2**(exponent - 1023)
    return d

# Task 1 (the basics of floating point representation)
For the results shown in the next cell:
 - Explain why the results are rounded to 17 digits.
 - Explain the specific values shown in the results: why are "0.12345678901234568" and "1.2345678901234567" chosen as the approximate values in a computer? 

In [3]:
print(0.1234567890123456789)
print(1.234567890123456789)

0.12345678901234568
1.2345678901234567


### Explanation of 17 digits

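In Python, a real number is represented by default as a double-precision (64-bit) floating point number, which is decomposed as:  
sign bit (1 bit)  
exponent (11 bits)  
fraction (52 bits)  
and the number is given by
$(-1)^{\rm sign} \times (1 + {\rm fraction}) \times 2^{{\rm exponent}-1023}$.  

The range is governed by the exponent part, and the precision by the fraction part.  
The smallest relative step (the gap between 1 and the next double) is $2^{-52}\approx 2.22\times 10^{-16}$, so a double carries 53 significant bits, or about $53\log_{10}2\approx 15.95$ significant decimal digits. Because the binary and decimal grids do not line up, 16 digits are not always enough to tell neighboring doubles apart, while 17 significant digits always are. Python prints the shortest decimal string that reads back as the same double, so at most 17 significant digits appear; both results above need all 17.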

Note: For denormalized (subnormal) numbers, whose absolute value is below $2^{-1022}\approx 2.23\times 10^{-308}$, the story is a little different: the exponent field is all zeros and there is no implicit leading 1.
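
The claims above can be checked directly. The following is a small sketch, assuming CPython 3.1 or later (where `repr` of a float is the shortest string that round-trips); the variable names `eps`, `tiny`, and `s17` are chosen here for illustration.

```python
import sys
import numpy as np

eps = np.finfo(np.float64).eps      # gap between 1.0 and the next double
tiny = np.finfo(np.float64).tiny    # smallest positive normalized double

print(eps == 2.0**-52)              # True
print(tiny == 2.0**-1022)           # True
print(sys.float_info.dig)           # 15: decimal digits that always survive a round trip

# 17 significant digits are always enough to recover the same double.
for x in (0.1234567890123456789, 1.234567890123456789, 0.1, 1.0 / 3.0):
    s17 = f"{x:.17g}"               # x written with 17 significant digits
    assert float(s17) == x
    assert float(repr(x)) == x      # repr is the shortest string that round-trips
    print(repr(x), s17)

# 16 significant digits are not always enough.
x = 0.1 + 0.2
print(float(f"{x:.16g}") == x)      # False
```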

### Explanation of the specific approximate values
First, we consider the case print(0.1234567890123456789), which outputs 0.12345678901234568.

In [4]:
s, e, f = parse_double(0.12345678901234568)
print(s,e,f)

0 01111111011 1111100110101101110100110111010001101111011001011111


In [5]:
print(make_double("0", "01111111011", "1111100110101101110100110111010001101111011001011110"))
print(make_double("0", "01111111011", "1111100110101101110100110111010001101111011001011111"))
print(make_double("0", "01111111011", "1111100110101101110100110111010001101111011001100000"))

0.12345678901234566
0.12345678901234568
0.12345678901234569


Now we can see that 0.12345678901234568 was chosen because, among these three neighboring doubles, it is the closest to the "true value" 0.1234567890123456789.

Similarly, 1.2345678901234567 was chosen because it is the closest to the "true value" 1.234567890123456789, as follows.

In [6]:
s, e, f = parse_double(1.234567890123456789)
print(s,e,f)

0 01111111111 0011110000001100101001000010100011000101100111111011


In [7]:
print(make_double("0", "01111111111", "0011110000001100101001000010100011000101100111111010"))
print(make_double("0", "01111111111", "0011110000001100101001000010100011000101100111111011"))
print(make_double("0", "01111111111", "0011110000001100101001000010100011000101100111111100"))

1.2345678901234565
1.2345678901234567
1.234567890123457
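
The same comparison can be done without typing bit strings by hand. The sketch below uses `np.nextafter` to get the two neighboring doubles and `fractions.Fraction` to measure the exact distance to the decimal literal; the helper names `neighbors` and `closest_double` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
import numpy as np

def neighbors(x):
    """Return the double just below x, x itself, and the double just above x."""
    x = np.float64(x)
    return float(np.nextafter(x, -np.inf)), float(x), float(np.nextafter(x, np.inf))

def closest_double(decimal_str):
    """Among float(decimal_str) and its two neighbors, pick the one nearest the exact decimal."""
    true_value = Fraction(decimal_str)            # exact rational value of the literal
    candidates = neighbors(float(decimal_str))
    return min(candidates, key=lambda c: abs(Fraction(c) - true_value))

for literal in ("0.1234567890123456789", "1.234567890123456789"):
    lo, mid, hi = neighbors(float(literal))
    print(lo, mid, hi, "-> closest:", closest_double(literal))
```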


Note:  
Of course, the value stored in the computer has more than 17 digits when written out exactly in decimal. Since $2^{-52} = 2.220446049250313080847263336181640625\times 10^{-16}$ exactly (37 significant digits after 15 leading zeros, i.e. 52 decimal places), a double between 1 and 2 can need up to 52 decimal places, and smaller doubles can need even more. The next cell is an example:

In [8]:
print(f"{0.3:.65f}")
print(f"{0.03:.65f}")
print(f"{0.003:.65f}")

0.29999999999999998889776975374843459576368331909179687500000000000
0.02999999999999999888977697537484345957636833190917968750000000000
0.00300000000000000006245004513516505539882928133010864257812500000
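
As a cross-check that does not depend on guessing a field width such as `.65f`, the standard library's `decimal.Decimal` converts a float exactly. This is a minimal sketch; the variable names are illustrative.

```python
from decimal import Decimal

for x in (0.3, 0.03, 0.003):
    exact = Decimal(x)                         # exact conversion of the stored double
    places = -exact.as_tuple().exponent        # number of digits after the decimal point
    print(f"{x!r} needs {places} decimal places:")
    print(f"  {exact}")
```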


# Task 2 (floating point addition)

According to https://www.sciencedirect.com/topics/computer-science/floating-point-addition, the rule of floating point addition is as follows:


1. Extract exponent and fraction bits.  
2. Prepend leading 1 to form the mantissa.  
3. Compare exponents.  
4. Shift smaller mantissa if necessary.  
5. Add mantissas.  
6. Normalize mantissa and adjust exponent if necessary.  
7. Round result.  
8. Assemble exponent and fraction back into floating-point number.  

(Here "mantissa" means the fraction with the implicit leading 1 prepended, which we call the "1+f part" below.)  
Explain the following results according to the rule.

In [9]:
print(0.001 + 0.002)
print(0.01 + 0.02)
print(0.1 + 0.2)
print(0.11 + 0.19)
print("---------")
print(0.1 + 0.7)
print(5.8 + 53.62)

0.003
0.03
0.30000000000000004
0.3
---------
0.7999999999999999
59.419999999999995
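
Before going through the cases by hand, the eight steps quoted above can be sketched in code. This is only an illustration under stated assumptions: it handles positive, normalized doubles whose sum does not overflow, it uses round to nearest with ties to even in step 7, and the names `split`, `round_half_even`, and `add_doubles` are hypothetical. Instead of shifting the smaller mantissa right, it shifts the larger one left so that no bits are lost before rounding.

```python
import struct

def split(x):
    """Steps 1-2: biased exponent and 53-bit mantissa (leading 1 included)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    assert bits >> 63 == 0 and 0 < exponent < 0x7FF, "positive normal doubles only"
    return exponent, (1 << 52) | fraction

def round_half_even(m, drop):
    """Drop the lowest `drop` bits of m, rounding to nearest with ties to even."""
    if drop == 0:
        return m
    kept, rest = m >> drop, m & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1
    return kept

def add_doubles(a, b):
    ea, ma = split(a)
    eb, mb = split(b)
    if ea > eb:                                  # step 3: let b have the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    total = ma + (mb << (eb - ea))               # steps 4-5: align exactly, then add
    drop = total.bit_length() - 53               # step 6: bits beyond a 53-bit mantissa
    mantissa = round_half_even(total, drop)      # step 7
    exponent = ea + drop
    if mantissa == 1 << 53:                      # rounding carried into a new top bit
        mantissa >>= 1
        exponent += 1
    bits = (exponent << 52) | (mantissa & ((1 << 52) - 1))   # step 8
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

for a, b in [(0.1, 0.2), (0.11, 0.19), (0.1, 0.7)]:
    print(a, "+", b, "=", add_doubles(a, b), add_doubles(a, b) == a + b)
```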


### Explanation of 0.001 + 0.002 = 0.003

In [10]:
print(parse_double(0.001))
print(parse_double(0.002))

('0', '01111110101', '0000011000100100110111010010111100011010100111111100')
('0', '01111110110', '0000011000100100110111010010111100011010100111111100')


Call the former A and the latter B. We write the exponent and the fraction side by side, separated by underscores:  
A = 01111110101___0000011000100100110111010010111100011010100111111100  
B = 01111110110___0000011000100100110111010010111100011010100111111100  
Prepend "1." to the fraction part, the result of which we call "1+f part":  
A = 01111110101_1.0000011000100100110111010010111100011010100111111100  
B = 01111110110_1.0000011000100100110111010010111100011010100111111100  
Set the exponent of A to that of B, and shift A's 1+f part to the right accordingly:  
A = 01111110110_0.1000001100010010011011101001011110001101010011111110  
B = 01111110110_1.0000011000100100110111010010111100011010100111111100  
Add the 1+f parts, and call the result C:   
C = 01111110110_1.1000100100110111010010111100011010100111111011111010  
Delete "1.":  
C = 01111110110___1000100100110111010010111100011010100111111011111010

Now we can see the result is 0.003

In [11]:
make_double("0", "01111110110", "1000100100110111010010111100011010100111111011111010")

0.003

### Explanation of 0.01 + 0.02 = 0.03

In [12]:
print(parse_double(0.01))
print(parse_double(0.02))

('0', '01111111000', '0100011110101110000101000111101011100001010001111011')
('0', '01111111001', '0100011110101110000101000111101011100001010001111011')


Call the former A and the latter B. We write the exponent and the fraction side by side, separated by underscores:  
A = 01111111000___0100011110101110000101000111101011100001010001111011  
B = 01111111001___0100011110101110000101000111101011100001010001111011  
Prepend "1." to the fraction part, the result of which we call "1+f part":   
A = 01111111000_1.0100011110101110000101000111101011100001010001111011  
B = 01111111001_1.0100011110101110000101000111101011100001010001111011  
Set the exponent of A to that of B, and shift A's 1+f part to the right accordingly:  
A = 01111111001_0.10100011110101110000101000111101011100001010001111011  
B = 01111111001_1.0100011110101110000101000111101011100001010001111011  
Add the 1+f parts, and call the result C:   
C = 01111111001_1.11101011100001010001111010111000010100011110101110001  
Round the fraction part:  
C = 01111111001_1.1110101110000101000111101011100001010001111010111000  
Delete "1.":  
C = 01111111001___1110101110000101000111101011100001010001111010111000  

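In the rounding step, the discarded bit is a single 1, which lies exactly halfway between two representable values. IEEE 754's default mode, round to nearest with ties to even, then picks the neighbor whose lowest kept bit is 0, so here the fraction is left unchanged.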

Now we can see the result is 0.03

In [13]:
make_double("0", "01111111001", "1110101110000101000111101011100001010001111010111000")

0.03
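
The rounding step used in these walkthroughs can be written down precisely. The sketch below assumes the default IEEE 754 mode (round to nearest, ties to even) and works on strings of fraction bits like the ones shown in this notebook; the function name `round_bits_half_even` is hypothetical.

```python
def round_bits_half_even(bits, keep):
    """Round a string of fraction bits to `keep` bits: nearest, ties to even."""
    kept, rest = int(bits[:keep], 2), bits[keep:]
    half = "1" + "0" * (len(rest) - 1)         # the pattern that marks an exact tie
    # Equal-length bit strings compare like the numbers they encode.
    if rest and (rest > half or (rest == half and kept % 2 == 1)):
        kept += 1                              # a carry out of the top bit would need renormalizing
    return f"{kept:0{keep}b}"

print(round_bits_half_even("10001", 4))     # tie, kept part even  -> 1000
print(round_bits_half_even("00111", 4))     # tie, kept part odd   -> 0100
print(round_bits_half_even("001101", 4))    # below one half       -> 0011
print(round_bits_half_even("01111101", 4))  # above one half       -> 1000
```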

### Explanation of 0.1 + 0.2 = 0.30000000000000004

In [14]:
print(parse_double(0.1))
print(parse_double(0.2))

('0', '01111111011', '1001100110011001100110011001100110011001100110011010')
('0', '01111111100', '1001100110011001100110011001100110011001100110011010')


Call the former A and the latter B. We write the exponent and the fraction side by side, separated by underscores:  
A = 01111111011____1001100110011001100110011001100110011001100110011010  
B = 01111111100____1001100110011001100110011001100110011001100110011010  
Prepend "1." to the fraction part, the result of which we call "1+f part":  
A = 01111111011__1.1001100110011001100110011001100110011001100110011010  
B = 01111111100__1.1001100110011001100110011001100110011001100110011010  
Set the exponent of A to that of B, and shift A's 1+f part to the right accordingly:  
A = 01111111100__0.1100110011001100110011001100110011001100110011001101  
B = 01111111100__1.1001100110011001100110011001100110011001100110011010  
Add the 1+f parts, and call the result C:   
C = 01111111100_10.0110011001100110011001100110011001100110011001100111  
Shift the 1+f part to the right, and increase the exponent part by 1.  
C = 01111111101__1.00110011001100110011001100110011001100110011001100111  
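Round the fraction part (the dropped bit 1 is an exact tie and the lowest kept bit is 1, so rounding ties to even rounds up):  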
C = 01111111101__1.0011001100110011001100110011001100110011001100110100  
Delete "1.":  
C = 01111111101____0011001100110011001100110011001100110011001100110100

Now we can see the result is 0.30000000000000004

In [15]:
make_double("0", "01111111101", "0011001100110011001100110011001100110011001100110100")

0.30000000000000004
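
That this case is an exact tie can be confirmed with exact rational arithmetic. A short sketch using `fractions.Fraction`, which converts a float to its exact stored value (the variable names are illustrative):

```python
from fractions import Fraction

exact_sum = Fraction(0.1) + Fraction(0.2)        # exact sum of the two stored doubles
below, above = 0.3, 0.30000000000000004          # the two neighboring doubles

print(Fraction(below) < exact_sum < Fraction(above))                 # True
print(exact_sum - Fraction(below) == Fraction(above) - exact_sum)    # True: an exact tie
print(0.1 + 0.2 == above)                                            # True
```

The fraction of 0.3 ends in the bit 1 and that of 0.30000000000000004 ends in 0, so the tie goes to the latter.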

### Explanation of 0.11 + 0.19 = 0.3

In [16]:
print(parse_double(0.11))
print(parse_double(0.19))

('0', '01111111011', '1100001010001111010111000010100011110101110000101001')
('0', '01111111100', '1000010100011110101110000101000111101011100001010010')


Call the former A and the latter B. We write the exponent and the fraction side by side, separated by underscores:  
A = 01111111011____1100001010001111010111000010100011110101110000101001  
B = 01111111100____1000010100011110101110000101000111101011100001010010  
Prepend "1." to the fraction part, the result of which we call "1+f part":  
A = 01111111011__1.1100001010001111010111000010100011110101110000101001  
B = 01111111100__1.1000010100011110101110000101000111101011100001010010  
Set the exponent of A to that of B, and shift A's 1+f part to the right accordingly:  
A = 01111111100__0.11100001010001111010111000010100011110101110000101001  
B = 01111111100__1.1000010100011110101110000101000111101011100001010010  
Add the 1+f parts, and call the result C:   
C = 01111111100_10.01100110011001100110011001100110011001100110011001101  
Shift the 1+f part to the right, and increase the exponent part by 1.  
C = 01111111101__1.001100110011001100110011001100110011001100110011001101  
Round the fraction part (here two bits must be dropped to bring the fraction back to 52 bits; the dropped bits 01 are below one half, so we round down):  
C = 01111111101__1.0011001100110011001100110011001100110011001100110011  
Delete "1.":  
C = 01111111101____0011001100110011001100110011001100110011001100110011

Now we can see the result is 0.3

In [17]:
make_double("0", "01111111101", "0011001100110011001100110011001100110011001100110011")

0.3

### Explanation of 0.1 + 0.7 = 0.7999999999999999

In [18]:
print(parse_double(0.1))
print(parse_double(0.7))

('0', '01111111011', '1001100110011001100110011001100110011001100110011010')
('0', '01111111110', '0110011001100110011001100110011001100110011001100110')


Call the former A and the latter B. We write the exponent and the fraction side by side, separated by underscores:  
A = 01111111011___1001100110011001100110011001100110011001100110011010  
B = 01111111110___0110011001100110011001100110011001100110011001100110  
Prepend "1." to the fraction part, the result of which we call "1+f part":  
A = 01111111011_1.1001100110011001100110011001100110011001100110011010  
B = 01111111110_1.0110011001100110011001100110011001100110011001100110  
Set the exponent of A to that of B, and shift A's 1+f part to the right accordingly:  
A = 01111111110_0.0011001100110011001100110011001100110011001100110011010  
B = 01111111110_1.0110011001100110011001100110011001100110011001100110  
Add the 1+f parts, and call the result C:   
C = 01111111110_1.1001100110011001100110011001100110011001100110011001010  
Round the fraction part of C (the dropped bits 010 are below one half, so we round down):  
C = 01111111110_1.1001100110011001100110011001100110011001100110011001  
Delete "1.":  
C = 01111111110___1001100110011001100110011001100110011001100110011001  

Now we can see the result is 0.7999999999999999

In [19]:
make_double("0", "01111111110", "1001100110011001100110011001100110011001100110011001")

0.7999999999999999

### Explanation of 5.8 + 53.62 = 59.419999999999995

In [20]:
print(parse_double(5.8))
print(parse_double(53.62))

('0', '10000000001', '0111001100110011001100110011001100110011001100110011')
('0', '10000000100', '1010110011110101110000101000111101011100001010001111')


Call the former A and the latter B. We write the exponent and the fraction side by side, separated by underscores:  
A = 10000000001____0111001100110011001100110011001100110011001100110011  
B = 10000000100____1010110011110101110000101000111101011100001010001111  
Prepend "1." to the fraction part, the result of which we call "1+f part":  
A = 10000000001__1.0111001100110011001100110011001100110011001100110011  
B = 10000000100__1.1010110011110101110000101000111101011100001010001111  
Set the exponent of A to that of B, and shift A's 1+f part to the right accordingly:  
A = 10000000100__0.0010111001100110011001100110011001100110011001100110011  
B = 10000000100__1.1010110011110101110000101000111101011100001010001111  
Add the 1+f parts, and call the result C:   
C = 10000000100__1.1101101101011100001010001111010111000010100011110101011  
Round the fraction part of C (the dropped bits 011 are below one half, so we round down):  
C = 10000000100__1.1101101101011100001010001111010111000010100011110101  
Delete "1.":  
C = 10000000100____1101101101011100001010001111010111000010100011110101 

Now we can see the result is 59.419999999999995

In [21]:
make_double("0", "10000000100", "1101101101011100001010001111010111000010100011110101")

59.419999999999995

# Task 3 (floating point multiplication)
According to https://www.doc.ic.ac.uk/~eedwards/compsys/float/, the floating point multiplication rule is that

1. Add the biased exponents
2. Multiply the mantissas
3. Normalise
4. Round the result

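Explain the following results according to this rule (in step 1, each biased exponent already includes the bias 1023, so the bias is subtracted once after the addition):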

In [22]:
print(1.3 * 3.2)
print(0.1 * 0.7)
print(1e2 * 5.4321)

4.16
0.06999999999999999
543.21
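
The four steps quoted above can be sketched in code as follows. This is an illustration under stated assumptions, not a full implementation: it handles positive, normalized doubles whose product is also normalized, it rounds to nearest with ties to even, and the names `split` and `mul_doubles` are hypothetical.

```python
import struct

def split(x):
    """Biased exponent and 53-bit mantissa (leading 1 included) of a positive normal double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    assert bits >> 63 == 0 and 0 < exponent < 0x7FF, "positive normal doubles only"
    return exponent, (1 << 52) | fraction

def mul_doubles(a, b):
    ea, ma = split(a)
    eb, mb = split(b)
    exponent = ea + eb - 1023                  # step 1: the bias is counted twice, remove it once
    product = ma * mb                          # step 2: exact, 105 or 106 bits
    extra = product.bit_length() - 105         # step 3: 1 if the product of 1+f parts is >= 2
    exponent += extra
    drop = 52 + extra                          # bits beyond a 53-bit mantissa
    kept, rest = product >> drop, product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):   # step 4: nearest, ties to even
        kept += 1
    if kept == 1 << 53:                        # rounding carried into a new top bit
        kept >>= 1
        exponent += 1
    assert 0 < exponent < 0x7FF, "result must stay in the normalized range"
    bits = (exponent << 52) | (kept & ((1 << 52) - 1))
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

for a, b in [(1.3, 3.2), (0.1, 0.7), (1e2, 5.4321)]:
    print(a, "*", b, "=", mul_doubles(a, b), mul_doubles(a, b) == a * b)
```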


### Explanation of 1.3 * 3.2 = 4.16

In [23]:
print(parse_double(1.3))
print(parse_double(3.2))

('0', '01111111111', '0100110011001100110011001100110011001100110011001101')
('0', '10000000000', '1001100110011001100110011001100110011001100110011010')


A = 01111111111____0100110011001100110011001100110011001100110011001101  
B = 10000000000____1001100110011001100110011001100110011001100110011010  
Prepend "1." to the fraction part:  
A = 01111111111__1.0100110011001100110011001100110011001100110011001101  
B = 10000000000__1.1001100110011001100110011001100110011001100110011010  
Add the biased exponents and subtract the bias once:  
(01111111111)_2 + (10000000000)_2 - (01111111111)_2 = (10000000000)_2  
Multiply the 1+f parts:  
1.0100110011001100110011001100110011001100110011001101 * 1.1001100110011001100110011001100110011001100110011010 = 10.0001010001111010111000010100011110101110000101001000 (truncated to 54 bits; the full product has 106 bits)  
Then C=A*B is:  
C = 10000000000____10.0001010001111010111000010100011110101110000101001000  
Normalize:  
C = 10000000001____1.00001010001111010111000010100011110101110000101001000  
Round (the dropped part starts with 0, so it is below one half and we round down):  
C = 10000000001____1.0000101000111101011100001010001111010111000010100100  

I am a lazy person so I used https://www.exploringbinary.com/binary-calculator/ to multiply 1+f parts.  
Thus, we see that the result is 4.16
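
An online calculator is not strictly needed: Python integers have arbitrary precision, so the product of the 1+f parts can be computed exactly. A minimal sketch, where the helper `one_plus_f` is a hypothetical name and the two fraction strings are copied from the output of `parse_double` above:

```python
def one_plus_f(fraction_str):
    """Integer value of '1' followed by the 52 fraction bits (the 1+f part scaled by 2**52)."""
    assert len(fraction_str) == 52
    return int("1" + fraction_str, 2)

f_a = "0100110011001100110011001100110011001100110011001101"   # fraction of 1.3
f_b = "1001100110011001100110011001100110011001100110011010"   # fraction of 3.2

product = one_plus_f(f_a) * one_plus_f(f_b)     # exact integer, scaled by 2**104
bits = f"{product:b}"
int_len = len(bits) - 104                       # number of bits before the binary point

print(bits[:int_len] + "." + bits[int_len:])                  # full exact product
print(bits[:int_len] + "." + bits[int_len:int_len + 52])      # leading bits only
```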

In [24]:
make_double("0", "10000000001", "0000101000111101011100001010001111010111000010100100")

4.16

### Explanation of 0.1 * 0.7 = 0.06999999999999999

In [25]:
print(parse_double(0.1))
print(parse_double(0.7))

('0', '01111111011', '1001100110011001100110011001100110011001100110011010')
('0', '01111111110', '0110011001100110011001100110011001100110011001100110')


A = 01111111011____1001100110011001100110011001100110011001100110011010  
B = 01111111110____0110011001100110011001100110011001100110011001100110  
Prepend "1." to the fraction part:  
A = 01111111011__1.1001100110011001100110011001100110011001100110011010  
B = 01111111110__1.0110011001100110011001100110011001100110011001100110  
Add the biased exponents and subtract the bias once:  
01111111011 + 01111111110 - 01111111111 = 01111111010  
Multiply the 1+f parts:  
1.1001100110011001100110011001100110011001100110011010 * 1.0110011001100110011001100110011001100110011001100110 = 10.00111101011100001010001111010111000010100011110101101 (truncated to 55 bits)  
Then C=A*B is:  
C = 01111111010____10.00111101011100001010001111010111000010100011110101101  
Normalize:  
C = 01111111011____1.000111101011100001010001111010111000010100011110101101  
Round (the dropped part starts with 01, so it is below one half and we round down):  
C = 01111111011____1.0001111010111000010100011110101110000101000111101011  

Thus, we see that the result is 0.06999999999999999

In [26]:
make_double("0", "01111111011", "0001111010111000010100011110101110000101000111101011")

0.06999999999999999

### Explanation of 1e2 * 5.4321 = 543.21

In [27]:
print(parse_double(1e2))
print(parse_double(5.4321))

('0', '10000000101', '1001000000000000000000000000000000000000000000000000')
('0', '10000000001', '0101101110100111100001101100001000100110100000001010')


A = 10000000101____1001000000000000000000000000000000000000000000000000  
B = 10000000001____0101101110100111100001101100001000100110100000001010  
Prepend "1." to the fraction part:  
A = 10000000101__1.1001000000000000000000000000000000000000000000000000  
B = 10000000001__1.0101101110100111100001101100001000100110100000001010  
Add the biased exponents and subtract the bias once:  
10000000101 + 10000000001 - 01111111111 = 10000000111  
Multiply the 1+f parts:  
1.1001000000000000000000000000000000000000000000000000 * 1.0101101110100111100001101100001000100110100000001010 = 10.0001111100110101110000101000111101011100001010001111101  
Then C=A*B is:  
C = 10000000111____10.0001111100110101110000101000111101011100001010001111101  
Normalize:  
C = 10000001000____1.00001111100110101110000101000111101011100001010001111101  
Round (the dropped bits 1101 are above one half, so we round up):  
C = 10000001000____1.0000111110011010111000010100011110101110000101001000

Thus, we see that the result is 543.21

In [28]:
make_double("0", "10000001000", "0000111110011010111000010100011110101110000101001000")

543.21

FYI, the exact decimal value of the double that prints as 543.21:

In [29]:
print(f"{543.21:.60f}")

543.210000000000036379788070917129516601562500000000000000000000
