In [1]:
using BenchmarkTools

### Chapter 1 Problems

#### Problem 1.1
*(Machine precision)* In this computer experiment we determine the machine precision $\varepsilon_M$. Starting with a value of 1.0, $x$ is divided repeatedly by 2 until numerical addition of 1 and $x=2^{-M}$ gives 1. Compare single and double precision calculations.

In [2]:
ε = Float64(1.0)
while one(ε) + ε > one(ε)
    ε /= 2
end
@show ε * 2;
@show eps(Float64(1.0));

ε * 2 = 2.220446049250313e-16
eps(Float64(1.0)) = 2.220446049250313e-16


In [3]:
ε = Float32(1.0)
while one(ε) + ε > one(ε)
    ε /= 2
end
@show ε * 2;
@show eps(Float32(1.0));

ε * 2 = 1.1920929f-7
eps(Float32(1.0)) = 1.1920929f-7
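
The two cells above differ only in the floating-point type, so the same loop can be written once for any type. The following is a minimal sketch; the name `machine_eps` is a hypothetical helper, not part of the original notebook, and `Float16` is included only as an extra comparison.

```julia
# Hypothetical helper: halve x until 1 + x is rounded to 1 in type T.
function machine_eps(::Type{T}) where {T<:AbstractFloat}
    x = one(T)
    while one(T) + x > one(T)
        x /= 2
    end
    return 2x   # last value for which 1 + x was still larger than 1
end

# Compare with Julia's built-in eps for each type.
for T in (Float16, Float32, Float64)
    println(T, ": ", machine_eps(T), "  (eps: ", eps(T), ")")
end
```

For each of these types the loop result should agree with the built-in `eps(T)`.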


#### Problem 1.2
*(Maximum and minimum integers)* Integers are used as counters or to encode elements of a finite set like characters or colors. There are different integer formats available which store signed or unsigned integers of different lengths. There is no infinite integer, and adding 1 to the maximum integer gives the minimum integer.

In this computer experiment we determine the smallest and largest integer numbers. Beginning with $I=1$ we repeatedly add 1 until the condition $I+1>I$ becomes invalid, or repeatedly subtract 1 until $I-1<I$ becomes invalid. For the 64 bit long integer format this takes too long. Here we instead multiply $I$ repeatedly by 2 until $I-1<I$ becomes invalid. For the character format the corresponding ordinal number is shown, which is obtained by casting the character to an integer.
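
The wrap-around and the character cast described here can be checked directly on a small type. This is only an illustrative sketch; `Int8`, `UInt8` and the characters used are arbitrary choices, not taken from the original text.

```julia
# Fixed-width integers wrap around on overflow.
@show typemax(Int8) + one(Int8) == typemin(Int8)    # maximum + 1 gives the minimum
@show typemin(Int8) - one(Int8) == typemax(Int8)    # minimum - 1 gives the maximum
@show typemax(UInt8) + one(UInt8) == zero(UInt8)    # unsigned types wrap to zero

# Casting a character to an integer gives its ordinal number (code point).
@show Int('A')
@show Char(97);
```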

In [4]:
I = Int16(1)
while I + one(I) > I
    I += one(I)
end
Max_Int16 = I
Min_Int16 = I + one(I)
@show Max_Int16
@show Min_Int16;

Max_Int16 = 32767
Min_Int16 = -32768


In [5]:
I = Int128(1)
while I - one(I) < I
    I *= Int128(2)
end
@show typemin(Int128)
@show I;

typemin(Int128) = -170141183460469231731687303715884105728
I = -170141183460469231731687303715884105728


In [6]:
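function myIntMinMax(x::T) where {T<:Union{Signed, Unsigned}}
    # Only the type of x is used; its value is ignored.
    I = T(1)
    # Doubling eventually overflows to typemin(T) (signed) or 0 (unsigned),
    # at which point I - 1 wraps around and the condition fails.
    while I - one(I) < I
        I *= T(2)
    end
    # I is now the minimum; subtracting 1 wraps around to the maximum.
    return I, I - one(I)
end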

myIntMinMax (generic function with 1 method)

In [7]:
myIntMinMax(Int64(1)) == (typemin(Int64), typemax(Int64))

true

In [8]:
@btime myIntMinMax(Int128(1))

  118.297 ns (0 allocations: 0 bytes)


(-170141183460469231731687303715884105728, 170141183460469231731687303715884105727)

#### Problem 1.3
*(Truncation error)* This computer experiment approximates the cosine function by a truncated Taylor series
$$ \cos(x)\approx \mathrm{mycos}(x, n_{max})=\sum_{n=0}^{n_{max}}(-1)^n \frac{x^{2n}}{(2n)!}=1-\frac{x^2}{2}+\frac{x^4}{24}-\frac{x^6}{720}+\cdots $$
in the interval $-\pi/2 < x < \pi/2$. The function $\mathrm{mycos}(x, n_{max})$ is numerically compared to the intrinsic cosine function.

In [9]:
function mycos(x, nmax)
    @assert -π/2 < x < π/2
    sum(n -> (-1)^n * x^(2n)/factorial(2n), 0:nmax)
end

mycos (generic function with 1 method)

In [10]:
x = 1.0
for n in 1:6
    ϵ = mycos(x, n) - cos(x)
    println("Truncation error of mycos($x, $n) is $ϵ.")
end

Truncation error of mycos(1.0, 1) is -0.040302305868139765.
Truncation error of mycos(1.0, 2) is 0.0013643607985268646.
Truncation error of mycos(1.0, 3) is -2.4528090362019306e-5.
Truncation error of mycos(1.0, 4) is 2.734969395401521e-7.
Truncation error of mycos(1.0, 5) is -2.076252725302652e-9.
Truncation error of mycos(1.0, 6) is 1.1422973678065773e-11.
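
The errors alternate in sign and shrink quickly. For an alternating series with decreasing terms, the truncation error is bounded in magnitude by the first omitted term, here $|x|^{2n_{max}+2}/(2n_{max}+2)!$. The sketch below compares the two for $x=1$; the helper names `taylor_cos` and `first_omitted` are hypothetical and are defined here only to keep the example self-contained.

```julia
# Hypothetical helper: partial sum of the cosine series up to n = nmax.
taylor_cos(x, nmax) = sum(n -> (-1)^n * x^(2n) / factorial(2n), 0:nmax)

# Magnitude of the first omitted term, which bounds the truncation error
# of an alternating series with decreasing terms.
first_omitted(x, nmax) = abs(x)^(2 * nmax + 2) / factorial(2 * nmax + 2)

x = 1.0
for n in 1:6
    err = abs(taylor_cos(x, n) - cos(x))
    println("n = $n: |error| = $err, bound = $(first_omitted(x, n))")
end
```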


In [11]:
@btime mycos(1.0, 5)

  186.892 ns (0 allocations: 0 bytes)


0.540302303791887
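
Because `factorial(2n)` on `Int64` throws an `OverflowError` for $2n > 20$, `mycos` as written is limited to `nmax ≤ 10`. One possible way around this, shown as a sketch rather than as the notebook's method, is to build each term from the previous one. The name `mycos_recursive` is hypothetical.

```julia
# Hypothetical variant: term_n = -term_{n-1} * x^2 / ((2n - 1) * 2n),
# so no factorial and no large power is ever formed.
function mycos_recursive(x, nmax)
    term = one(x)       # n = 0 term
    total = term
    for n in 1:nmax
        term *= -x^2 / ((2n - 1) * (2n))
        total += term
    end
    return total
end

@show mycos_recursive(1.0, 5)
@show mycos_recursive(1.0, 20) - cos(1.0);
```

The recurrence also replaces the power and factorial evaluations in each term with one multiplication and one division.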