# WS02: Floating point number systems

These exercises are intended to give you practice with the material on numerical approximation and to reinforce what was covered in lectures.

Please attempt the worksheet before your tutorial. Support is available in your tutorial or in the Class Team.

# Part a (pen and paper warm up)

### 1. Number systems

Consider the number system given by $(\beta, t, L, U) = (10, 3, -3, 3)$ which gives

$$
x = \pm .b_1 b_2 b_3 \times 10^e \text{ where } -3 \le e \le 3.
$$

-  How many numbers can be represented by this normalised system?

-  What are the two largest positive numbers in this system?

-  What are the two smallest positive numbers?

-  What is the smallest possible difference between two numbers in this system?

-  What is the smallest possible difference $y - x$ between two numbers $x$ and $y$ in this system for which $x < 100 < y$?
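
Once you have attempted these by hand, a brute-force enumeration is one way to check your answers. The following is a minimal sketch, assuming the system is normalised ($b_1 \neq 0$) and that zero is not counted. The helper name `positive_numbers` is hypothetical, and exact rational arithmetic from `fractions` is used so that Python's own binary rounding does not disturb the result.

```python
from fractions import Fraction

def positive_numbers(beta=10, t=3, L=-3, U=3):
    """Return the sorted positive numbers of the normalised system (beta, t, L, U)."""
    values = []
    # A normalised mantissa .b1 b2 ... bt has b1 != 0, so the integer
    # b1 b2 ... bt runs from beta**(t-1) up to beta**t - 1.
    for e in range(L, U + 1):
        for m in range(beta ** (t - 1), beta ** t):
            values.append(Fraction(m, beta ** t) * Fraction(beta) ** e)
    return sorted(values)

pos = positive_numbers()
gaps = [b - a for a, b in zip(pos, pos[1:])]
below_100 = max(v for v in pos if v < 100)
above_100 = min(v for v in pos if v > 100)

print("count (both signs, without zero):", 2 * len(pos))
print("two largest:", float(pos[-1]), float(pos[-2]))
print("two smallest:", float(pos[0]), float(pos[1]))
print("smallest gap:", float(min(gaps)))
print("neighbours of 100:", float(below_100), float(above_100))
```

Whether zero belongs to the count depends on the convention used in lectures, so adjust the total accordingly.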

### 2. Are floating point operations associative?

-  Let

    $$
     x = .85 \times 10^0, \quad
     y = .3 \times 10^{-2}, \quad
     z = .6 \times 10^{-2},
    $$
    
    in the system $(\beta, t, L, U) = (10, 2, -3, 3)$. Evaluate the following expressions in this number system.
    
    $$
    x+(y+y), \quad
    (x+y)+y, \quad
    x+(z+z), \quad
    (x+z) +z.
    $$
   
    (Also note the benefits of adding the *smallest* terms first!)

 
-  Given the number system $(\beta, t, L, U) = (10, 3, -3, 3)$ and $x = .100\times 10^3$, find nonzero numbers $y$ and $z$ from this system for which $fl(x+y) = x$ and $fl(x+z) > x$.
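
Hand calculations like these can be checked with Python's `decimal` module, whose contexts round every operation to a fixed number of significant decimal digits. This is a sketch under two assumptions: that the toy system rounds to nearest (rather than chopping), and that no exponent overflow or underflow occurs, since the range $L \le e \le U$ is not enforced here. The helper name `fl_add` is hypothetical.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

def fl_add(a, b, t):
    """Add two decimals and round the result to t significant digits."""
    return Context(prec=t, rounding=ROUND_HALF_EVEN).add(a, b)

x, y, z = Decimal("0.85"), Decimal("0.003"), Decimal("0.006")

# Every intermediate sum is rounded to t = 2 digits, as in the toy system.
results = {
    "x+(y+y)": fl_add(x, fl_add(y, y, 2), 2),
    "(x+y)+y": fl_add(fl_add(x, y, 2), y, 2),
    "x+(z+z)": fl_add(x, fl_add(z, z, 2), 2),
    "(x+z)+z": fl_add(fl_add(x, z, 2), z, 2),
}
for label, value in results.items():
    print(label, "=", value)

# Second part, t = 3: try candidate values against x = .100 x 10^3.
x3 = Decimal("100")
for candidate in (Decimal("0.1"), Decimal("1")):
    print(candidate, "->", fl_add(x3, candidate, 3))
```

If your course uses chopping instead of rounding, swap `ROUND_HALF_EVEN` for `decimal.ROUND_DOWN` and compare the results.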

# Part b (floating point numbers in Python)

### 3. Floating point types in `numpy`

`numpy` has information about its floating point types built in. Run this code block and interpret the answers.

In [1]:
import numpy as np

help(np.finfo)  # help() prints the documentation itself and returns None

for dtype in [float, np.double, np.single, np.half]:
    print(dtype.__name__, np.finfo(dtype))

Help on class finfo in module numpy:

class finfo(builtins.object)
 |  finfo(dtype)
 |  
 |  finfo(dtype)
 |  
 |  Machine limits for floating point types.
 |  
 |  Attributes
 |  ----------
 |  bits : int
 |      The number of bits occupied by the type.
 |  eps : float
 |      The difference between 1.0 and the next smallest representable float
 |      larger than 1.0. For example, for 64-bit binary floats in the IEEE-754
 |      standard, ``eps = 2**-52``, approximately 2.22e-16.
 |  epsneg : float
 |      The difference between 1.0 and the next smallest representable float
 |      less than 1.0. For example, for 64-bit binary floats in the IEEE-754
 |      standard, ``epsneg = 2**-53``, approximately 1.11e-16.
 |  iexp : int
 |      The number of bits in the exponent portion of the floating point
 |      representation.
 |  machar : MachAr
 |      The object which calculated these parameters and holds more
 |      detailed information.
 |  machep : int
 |      The exponent that yiel
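
The `eps` attribute described in the help text can also be estimated directly, which may help when interpreting the numbers above. The following is a minimal sketch, assuming round-to-nearest arithmetic; the function name `machine_epsilon` is hypothetical. It keeps halving a candidate until adding half of it to 1 no longer changes the result.

```python
import numpy as np

def machine_epsilon(dtype):
    """Estimate the gap between 1 and the next larger float of the given type."""
    one = dtype(1)
    two = dtype(2)
    eps = dtype(1)
    # Stop as soon as 1 + eps/2 rounds back to 1.
    while one + eps / two > one:
        eps = eps / two
    return eps

for dtype in (np.float64, np.float32, np.float16):
    print(dtype.__name__, machine_epsilon(dtype), np.finfo(dtype).eps)
```

All quantities are kept in the type under test so that no intermediate result is silently promoted to double precision.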

### 4. Equality of floating point representations

When working with floating point numbers you cannot simply test for equality (see also [this stackoverflow question](https://stackoverflow.com/questions/4915462/how-should-i-do-floating-point-comparison/4915891#4915891)).

We want to test if we have computed the square root of two accurately.

In [2]:
a = np.sqrt(2.0)
print(f"a={a}, type(a)={type(a)}")

b = a*a
print(f"b={b}")

print(b == 2.0)
print(b-2.0)

a=1.4142135623730951, type(a)=<class 'numpy.float64'>
b=2.0000000000000004
False
4.440892098500626e-16


Disaster! One option is to use [`numpy.isclose`](https://numpy.org/doc/stable/reference/generated/numpy.isclose.html), which compares two values up to a tolerance. A simplified version that uses only an absolute tolerance looks like this:

In [3]:
def my_isclose(x, y, tol=1.0e-9):
    return abs(x - y) < tol

print(my_isclose(b, 2.0))  # compare the computed square with the exact value
print(np.isclose(b, 2.0))

True
True
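
A purely absolute tolerance treats any two sufficiently tiny numbers as equal, which is why `numpy.isclose` also includes a relative term. Its documented test is $|a - b| \le \text{atol} + \text{rtol}\,|b|$ with defaults `rtol=1e-05` and `atol=1e-08`. The sketch below mirrors that formula for scalars only; the name `isclose_sketch` is hypothetical and it ignores the handling of NaN, infinities and arrays.

```python
import numpy as np

def isclose_sketch(a, b, rtol=1.0e-5, atol=1.0e-8):
    """Scalar version of the tolerance test documented for numpy.isclose."""
    return abs(a - b) <= atol + rtol * abs(b)

b = np.sqrt(2.0) * np.sqrt(2.0)
print(isclose_sketch(b, 2.0), np.isclose(b, 2.0))

# Two tiny values that differ by a factor of two:
print(isclose_sketch(1.0e-12, 2.0e-12))            # absolute term dominates
print(isclose_sketch(1.0e-12, 2.0e-12, atol=0.0))  # relative test only
```

Setting `atol=0.0` gives a purely relative comparison, which is often the better choice when the values being compared are far from 1 in magnitude.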


### 5. What's the best way to write a function

Show that the two functions

   $$
   f(x) = x ( \sqrt{x+1} - \sqrt{x}) \qquad \mbox{ and } \qquad
   g(x) = \frac{x}{\sqrt{x+1} + \sqrt{x}},
   $$
   are equivalent.
   Evaluate $f(500)$ and $g(500)$ using double precision (`np.float64`), single precision (`np.float32`) and half precision (`np.float16`). You should use `numpy.sqrt` to compute square roots.
   Explain why the answers differ, and comment on which function is more accurate and why.

In [4]:
import numpy as np

def f(x):
    return x*(np.sqrt(x+1)-np.sqrt(x))

def g(x):
    return x/(np.sqrt(x+1)+np.sqrt(x))

In [5]:
x =  np.float64(500)
print("f=", f(x), "g=", g(x))
x =  np.float32(500)
print("f=", f(x), "g=", g(x))
x =  np.float16(500)
print("f=", f(x), "g=", g(x))

f= 11.174755300746853 g= 11.174755300747199
f= 11.174829567274003 g= 11.174755337843376
f= 11.827142799695878 g= 11.175081178213036
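
To judge which result is more accurate, it helps to have a reference value computed with many more digits. The following is a sketch using Python's `decimal` module with 50 significant digits; the choice of 50 digits and the variable names are assumptions made for illustration. `f` and `g` are repeated so that the block runs on its own.

```python
from decimal import Decimal, getcontext
import numpy as np

def f(x):
    return x * (np.sqrt(x + 1) - np.sqrt(x))

def g(x):
    return x / (np.sqrt(x + 1) + np.sqrt(x))

# High-precision reference, using the form without subtraction.
getcontext().prec = 50
x_exact = Decimal(500)
reference = x_exact / ((x_exact + 1).sqrt() + x_exact.sqrt())

errors = {}
for dtype in (np.float64, np.float32, np.float16):
    x = dtype(500)
    err_f = abs(Decimal(float(f(x))) - reference)
    err_g = abs(Decimal(float(g(x))) - reference)
    errors[dtype.__name__] = (err_f, err_g)
    print(f"{dtype.__name__}: |f - ref| = {float(err_f):.3e}, |g - ref| = {float(err_g):.3e}")
```

The subtraction of two nearly equal square roots in `f` is where significant digits are lost; `g` avoids that subtraction entirely.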


### 6. Ordering of floating point calculations in practice

For this question you are required to write a function `piSquared`. This function should make use of the formula

$$
\frac{1}{6} \pi^2 = \sum_{k=1}^\infty \frac{1}{k^2}
$$

to estimate the value of $\pi^2$ by computing the partial sum

$$
\pi^2 \approx 6 \sum_{k=1}^n \frac{1}{k^2}
$$

for large values of $n$.

a.  Using double precision and the predefined constant `np.pi`, produce a table that gives the absolute error in the approximation to $\pi^2$ that you obtain using your function `piSquared` with $n = \{10^6, 10^7, 10^8, 10^9\}$. What is the difference between your answers when $n=10^8$ and $n=10^9$? Can you explain what is happening?

In [6]:
def piSquared(n):
    piS = 0.0
    k=1.0
    while k < (n + 0.1) :
        piS = piS + 1./k/k
        k = k + 1.0
        
    return 6.0*piS

n=1.e6
for i in range(4):
    piS = piSquared(n)
    print(f"n={n}", f"piS={piS}", f"error={np.abs(piS - np.pi*np.pi)}")
    n = n*10.0

n=1000000.0 piS=9.86959840109262 error=5.999996737671154e-06
n=10000000.0 piS=9.869603801083557 error=6.000058014876686e-07
n=100000000.0 piS=9.86960434700745 error=5.40819087291311e-08
n=1000000000.0 piS=9.86960434700745 error=5.40819087291311e-08
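
The identical results for $n=10^8$ and $n=10^9$ can be investigated by asking when a term $1/k^2$ becomes too small to change the running sum. The sketch below assumes round-to-nearest double precision and uses the limit $\pi^2/6$ as a stand-in for the running sum, which lies in the same range $[1, 2)$ and therefore has the same spacing between neighbouring floats.

```python
import numpy as np

s = np.pi ** 2 / 6.0          # stand-in for the running sum
half_gap = np.spacing(s) / 2  # half the distance from s to the next float64

# A term smaller than half the gap is rounded away when added to s.
k_stall = np.sqrt(1.0 / half_gap)
print(f"terms stop registering near k = {k_stall:.3e}")

for k in (9.0e7, 1.0e8):
    term = 1.0 / k / k
    print(f"k={k:.1e}: term={term:.3e}, s + term == s is {s + term == s}")
```

Once $k$ passes this threshold, every further addition leaves the sum unchanged, so increasing $n$ has no effect.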


b.  Now modify the function `piSquared` to compute the same sum of $n$ terms but in the opposite order (i.e., add the smallest terms first, which correspond to the largest values of $k$). Call your new function `piSquared_v2`.

In [7]:
def piSquared_v2(n):
    piS = 0.0
    k = n
    while k > 0.5 :
        piS = piS + 1./k/k
        k = k - 1.0
        
    return 6.0*piS

n=1.e6
for i in range(4):
    piS = piSquared_v2(n)
    print(f"n={n}", f"piS={piS}", f"error={np.abs(piS - np.pi*np.pi)}")
    n = n*10.0

n=1000000.0 piS=9.869598401092357 error=5.999997000571966e-06
n=10000000.0 piS=9.869603801089388 error=5.999999697081648e-07
n=100000000.0 piS=9.869604341089358 error=5.999999963535174e-08
n=1000000000.0 piS=9.869604395089357 error=6.000000496442226e-09


Produce a table that gives the absolute error in the approximation to $\pi^2$ that you obtain using your modified `piSquared_v2` with $n = \{10^6, 10^7, 10^8, 10^9\}$. What difference do you see when using the modified function? What is the explanation for this function being superior to the original version?
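
Because the loop with $n = 10^9$ takes a long time in pure Python, the same effect can be seen much faster in single precision, where it already appears for a few thousand terms. This is a sketch only; the function name `pi_squared_float32` and the choice $n = 20000$ are assumptions made for illustration.

```python
import numpy as np

def pi_squared_float32(n, reverse=False):
    """Estimate pi^2 from n terms of the series, accumulating in float32."""
    ks = range(n, 0, -1) if reverse else range(1, n + 1)
    one = np.float32(1.0)
    total = np.float32(0.0)
    for k in ks:
        kf = np.float32(k)
        total = total + one / kf / kf
    return np.float32(6.0) * total

n = 20000
forward = pi_squared_float32(n)
backward = pi_squared_float32(n, reverse=True)
err_forward = abs(float(forward) - np.pi ** 2)
err_backward = abs(float(backward) - np.pi ** 2)
print(f"largest first:  {forward}, error={err_forward:.3e}")
print(f"smallest first: {backward}, error={err_backward:.3e}")
```

When the small terms are added first, they accumulate into a value large enough to register against the larger terms, instead of being rounded away one at a time.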