# Observing Imprecisions in Floating-Point Arithmetic

- Floating-point arithmetic is inherently imprecise
- Machines store numbers in a finite number of bits, so values that cannot be represented exactly must be rounded
- Even exactly representable values can introduce error when combined, if their magnitudes differ greatly
- You can demonstrate this with Python 3 out of the box

In [1]:
def count_to_one(N):
    step = 1.0 / N
    return step, step*N, sum([step for _ in range(int(N))])

- Mathematically, the last value returned (the sum) should be exactly 1: we're adding N steps of size 1/N together

In [2]:
N = 1e3
print(count_to_one(N))

(0.001, 1.0, 1.0000000000000007)


- That didn't give us the number 1...
- It's _close_, but not exact. Is it random/probabilistic?
- Let's run it 5 more times

In [3]:
N = 1e3
for _ in range(5):
    print(count_to_one(N))

(0.001, 1.0, 1.0000000000000007)
(0.001, 1.0, 1.0000000000000007)
(0.001, 1.0, 1.0000000000000007)
(0.001, 1.0, 1.0000000000000007)
(0.001, 1.0, 1.0000000000000007)


- So we're getting the _same_ *wrong* answer each time
- Does the value we pick for N make a difference?

In [4]:
for N in [1e3, 1e6]:
    print("N = {0:.0e}".format(N))
    for _ in range(5):
        print("  {0}".format(count_to_one(N)))

N = 1e+03
  (0.001, 1.0, 1.0000000000000007)
  (0.001, 1.0, 1.0000000000000007)
  (0.001, 1.0, 1.0000000000000007)
  (0.001, 1.0, 1.0000000000000007)
  (0.001, 1.0, 1.0000000000000007)
N = 1e+06
  (1e-06, 1.0, 1.000000000007918)
  (1e-06, 1.0, 1.000000000007918)
  (1e-06, 1.0, 1.000000000007918)
  (1e-06, 1.0, 1.000000000007918)
  (1e-06, 1.0, 1.000000000007918)


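- the errors are consistent for each value of N we tried, but their size depends on N (about 7e-16 for N = 1e3 and about 8e-12 for N = 1e6), while `step*N` rounds back to exactly 1.0 in both cases
- it so happens that negative powers of ten (0.1, 0.001, 1e-6, ...) cannot be stored exactly in a finite number of binary digits (like many, many other decimal fractions)
- the initial division in our function therefore introduced a small rounding error
- every subsequent addition of this inexact value was rounded again, and those errors accumulated
- does this problem also exist for other datatypes, or with NumPy?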
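
A minimal sketch of both effects, using only the standard library: `Fraction(0.001)` and `Decimal(0.001)` expose the exact binary value Python stores for `0.001`, and `math.fsum` adds the very same stored steps with a single final rounding. The helper `naive_sum` is a hypothetical name introduced here for illustration; it mimics left-to-right accumulation. Note that on Python 3.12 and later the built-in `sum()` uses compensated (Neumaier) summation for floats, so the `sum()` outputs recorded in In [2] to In [4] may not reproduce there, whereas an explicit loop like `naive_sum` should behave like the older `sum()`.

```python
from decimal import Decimal
from fractions import Fraction
import math

step = 1.0 / 1000

# The stored value of 0.001 is not exactly 1/1000 ...
print(Decimal(step))                          # exact decimal expansion of the stored double
print(Fraction(step) == Fraction(1, 1000))    # False
# ... while integers such as 1000.0 and powers of two such as 0.5 are stored exactly
print(Fraction(1000.0) == 1000, Fraction(0.5) == Fraction(1, 2))  # True True


def naive_sum(values):
    """Hypothetical helper: left-to-right accumulation, every += rounds the total."""
    total = 0.0
    for v in values:
        total += v
    return total


steps = [step] * 1000
print(naive_sum(steps))            # not exactly 1.0: per-addition rounding errors accumulate
print(math.fsum(steps))            # 1.0: the exact sum of the stored steps, rounded once
print(float(Fraction(step) * 1000))  # 1.0: the same exact sum computed with rationals
```

The last two lines suggest that, for N = 1e3, the representation error of `0.001` alone is too small to move the result off 1.0; most of the discrepancy comes from rounding at each addition.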

In [5]:
import numpy as np

def count_to_one(N, dtype=float, sum_only=False):
    N = dtype(N)
    step = dtype(1.0) / N
    if sum_only:
        return np.sum([step for _ in range(int(N))])
    return step, step*N, np.sum([step for _ in range(int(N))])

In [6]:
for N in [1e1, 1e2, 1e3, 1e6]:
    print("N = {0:.0e} ({0})".format(N))
    for dt in [int, np.int32, np.int64, float, np.float32, np.float64]:
        estimate_of_one = count_to_one(N, dt)
        print("  {0} ({1} => {2})".format(estimate_of_one[2],
                                          dt.__name__,
                                          type(estimate_of_one[2]).__name__))

N = 1e+01 (10.0)
  1.0 (int => float64)
  1.0 (int32 => float64)
  1.0 (int64 => float64)
  1.0 (float => float64)
  1.0 (float32 => float32)
  1.0 (float64 => float64)
N = 1e+02 (100.0)
  0.9999999999999999 (int => float64)
  0.9999999999999999 (int32 => float64)
  0.9999999999999999 (int64 => float64)
  0.9999999999999999 (float => float64)
  0.9999998211860657 (float32 => float32)
  0.9999999999999999 (float64 => float64)
N = 1e+03 (1000.0)
  1.0000000000000004 (int => float64)
  1.0000000000000004 (int32 => float64)
  1.0000000000000004 (int64 => float64)
  1.0000000000000004 (float => float64)
  1.0000001192092896 (float32 => float32)
  1.0000000000000004 (float64 => float64)
N = 1e+06 (1000000.0)
  0.9999999999999981 (int => float64)
  0.9999999999999981 (int32 => float64)
  0.9999999999999981 (int64 => float64)
  0.9999999999999981 (float => float64)
  1.000000238418579 (float32 => float32)
  0.9999999999999981 (float64 => float64)
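
The float32 estimates above sit only one to three float32 steps away from 1, but those steps are much larger than float64 steps, because float32 stores 23 mantissa bits versus 52 for float64. A short sketch using NumPy's `np.finfo` and `np.nextafter` makes the spacing of representable numbers near 1 explicit; for example, the N = 1e3 float32 result, 1.0000001192092896, is 1 + 2**-23, the next float32 above 1.

```python
import numpy as np

for dt in (np.float32, np.float64):
    info = np.finfo(dt)
    one = dt(1.0)
    # Distance from 1.0 to the next representable value of this dtype
    gap = np.nextafter(one, dt(2.0)) - one
    print(f"{dt.__name__}: {info.nmant} mantissa bits, eps = {info.eps}, gap above 1 = {gap}")

# One float32 step above 1, printed as a Python float
print(float(np.float32(1.0) + np.finfo(np.float32).eps))  # 1.0000001192092896
```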


- since the computation is deterministic, this should be obvious, but if we average over many executions we'll *still* get the wrong answer!

In [7]:
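def iter_and_avg(cmd, args=None, kwargs=None, niters=100, verbose=False):
    # Use None defaults: a mutable default ([] or {}) is shared across calls
    args = [] if args is None else args
    kwargs = {} if kwargs is None else kwargs
    results = []
    for _ in range(niters):
        results.append(cmd(*args, **kwargs))
        if verbose:
            print(results[-1])
    # Mean and (population) standard deviation of the collected results
    return np.mean(results), np.std(results)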

In [8]:
iter_and_avg(count_to_one, args=[1e3], kwargs={"sum_only": True}, niters=30)

(1.0000000000000004, 0.0)

- just for sanity, let's try a few more iterations...

In [9]:
iter_and_avg(count_to_one, args=[1e3], kwargs={"sum_only": True}, niters=50)

(1.0000000000000002, 2.220446049250313e-16)

- What?! But it's deterministic... let's print the results...

In [10]:
iter_and_avg(count_to_one, args=[10000], kwargs={"sum_only": True}, niters=100, verbose=True)

1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
1.0000000000000004
...

(1.0000000000000002, 2.220446049250313e-16)

- We see the same value throughout...
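- Even the mean and standard deviation of these identical values come out wrong: the mean differs from the value being averaged and the standard deviation is nonzero, because computing them requires further rounded additions and a division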
- Ok, if we're always going to be wrong, maybe we should embrace being wrong in a structured way
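
As a hedged illustration that the aggregation step itself introduces the error: in CPython's implementation, the standard `statistics` module accumulates its sums with exact rational arithmetic and rounds to a float only at the end, so averaging identical floats returns that float exactly. The sketch compares it with `np.mean`/`np.std` on 50 copies of the value printed above; whether NumPy deviates for this particular input depends on its summation order, so its results are only printed, not asserted.

```python
import statistics

import numpy as np

x = 1.0000000000000004  # the repeated value printed above
values = [x] * 50

# NumPy: rounded floating-point sum, then a rounded division by len(values)
print(np.mean(values), np.std(values))

# statistics: exact rational sums internally, rounded to float only at the end
print(statistics.mean(values), statistics.pstdev(values))  # 1.0000000000000004 0.0
```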

### Monte Carlo Arithmetic


... in the next notebook