In [None]:
using LinearAlgebra
using Plots,LaTeXStrings
using Polynomials
#include("FNC.jl")
include("functions/chapter02.jl")
include("functions/chapter03.jl")

# Example 3.1.1

Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951-1980 average (source: NASA).

In [None]:
year = 1955:5:2000
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
    0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ];
    
scatter(year,y,label="data",
    xlabel="year",ylabel="anomaly (ºC)",leg=:bottomright)

A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix. 

In [None]:
t = @. (year-1950)/10; 
V = [ t[i]^j for i=1:length(t), j=0:length(t)-1 ]
c = V\y
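
The claim about conditioning can be checked directly. The following is a minimal sketch, not part of the original example: the helper name `vander` is hypothetical, and it simply compares the condition number of the square Vandermonde matrix built from raw years with the one built from decades since 1950.

```julia
using LinearAlgebra

year = 1955:5:2000

# Hypothetical helper: square Vandermonde matrix with columns x.^0, x.^1, ..., x.^(n-1)
vander(x) = [ x[i]^j for i in 1:length(x), j in 0:length(x)-1 ]

# float.() matters here: 2000^9 overflows Int64 arithmetic
kappa_years   = cond(vander(float.(year)))
kappa_decades = cond(vander((year .- 1950) ./ 10))

@show kappa_years kappa_decades;
```

The rescaled matrix should report a far smaller condition number, which is why the example works with `t` rather than `year`.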

In [None]:
p = Polynomial(c)

In [None]:
#f = s -> p((s-1950)/10)
f(s) = p((s-1950)/10)

plot(f,1954.9,2000.1,label="interpolant",c=2)
scatter!(year,y,label="data",
    xlabel="year",ylabel="anomaly (ºC)",leg=:bottomright,c=1)

As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as _overfitting_.

# Example 3.1.2

Here are the 5-year temperature averages again.

In [None]:
year = 1955:5:2000
t = year .- 1955;
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
    0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ];

The standard best-fit line results from using a linear polynomial that meets the least squares criterion. 
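
Stated explicitly, and using notation chosen here for illustration, the fit is $f(t)=c_1+c_2 t$ and $\mathbf{V}$ is the matrix whose $i$th row is $[\,1 \;\; t_i\,]$. The least squares criterion selects the coefficients that minimize the sum of squared residuals:

$$ \min_{c_1,c_2} \sum_{i=1}^{10} \bigl( y_i - (c_1 + c_2 t_i) \bigr)^2 = \min_{\mathbf{c}} \| \mathbf{y} - \mathbf{V}\mathbf{c} \|_2^2. $$

For a rectangular matrix, Julia's backslash returns a least squares solution, so `V\y` produces these coefficients.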

In [None]:
V = [ t.^0 t ]    # Vandermonde-ish matrix

In [None]:
c = V\y

In [None]:
norm(V*c - y)

In [None]:
pl = Polynomial(c)

In [None]:
fl = s -> pl(s-1955)
plot(fl,1954,2001,label="linear fit",c=2)
scatter!(year,y,label="data",
    xlabel="year",ylabel="anomaly (ºC)",leg=:bottomright,c=1)

If we use a global cubic polynomial, the points are fit more closely.

In [None]:
V = [ t[i]^j for i=1:length(t), j=0:3 ]   # Vandermonde-ish matrix

In [None]:
c = V\y

In [None]:
norm(V*c - y)

In [None]:
pc = Polynomial( c )

In [None]:
fc = s -> pc(s-1955)

plot(fl,1954,2001,label="linear fit",c=2)
plot!(fc,1954,2001,label="cubic fit",c=3)
scatter!(year,y,label="data",
    xlabel="year",ylabel="anomaly (ºC)",leg=:bottomright,c=1)

If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
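
A minimal sketch of that trend, under one assumption that differs from the cells above: time is rescaled to decades since 1955 so that the higher-degree fitting matrices stay reasonably conditioned.

```julia
using LinearAlgebra

year = 1955:5:2000
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
       0.1180,  0.2100,  0.3320,  0.3340,  0.4560 ]
t = (year .- 1955) ./ 10        # decades since 1955 (assumed rescaling)

degrees = 1:7
resid = map(degrees) do d
    V = [ t[i]^j for i in 1:length(t), j in 0:d ]   # columns t.^0 through t.^d
    c = V \ y                                       # least squares coefficients
    norm(V*c - y)                                   # residual norm at the data
end

for (d, r) in zip(degrees, resid)
    println("degree $d: residual norm = $r")
end
```

A shrinking residual only measures agreement at the ten data points. It says nothing about how the curve behaves between them.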

# Example 3.1.3

Finding numerical approximations to $\pi$ has fascinated people for millennia. One famous formula is

$$ \frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots. $$


Say $s_k$ is the sum of the first $k$ terms of the series above, and $p_k = \sqrt{6s_k}$. Here is a compact way to compute these sequences.

In [None]:
#a = [1/k^2 for k=1:100] 
#s = cumsum(a)        # cumulative summation
s = cumsum(1/k^2 for k=1:100)   # avoids allocating the array a

In [None]:
p = @. sqrt(6*s)

In [None]:
plot(1:100,π .- p,m=(:o,2),yaxis=:log,leg=:none,xlabel=L"k",ylabel=L"|\pi - p_k|",title="Sequence convergence")

This graph suggests that $p_k\to \pi$ but doesn't give much information about the rate of convergence. Let $\epsilon_k=|\pi-p_k|$ be the sequence of errors. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.

In [None]:
ep = @. abs(pi-p)    # error sequence
scatter(1:100,ep,m=(:o,2),
    leg=:none,xaxis=(:log10,L"k"),yaxis=(:log10,"error"),title="Convergence of errors")

This suggests a power-law relationship where $\epsilon_k\approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.

In [None]:
k = 1:100
V = [ k.^0 log.(k) ]     # fitting matrix

In [None]:
c = V \ log.(ep)         # coefficients of linear fit

In terms of the parameters $a$ and $b$ used above, we have 

In [None]:
@show (a,b) = exp(c[1]),c[2];

It's tempting to conjecture that $b\to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve. 

In [None]:
plot!(k,a*k.^b,l=:dash)
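
There is a short argument for the conjectured exponent, offered here as a supplement rather than as part of the original example. The tail of the series satisfies $\pi^2/6 - s_k \approx 1/k$, and linearizing the square root gives $\pi - p_k \approx \frac{3}{\pi}\bigl(\pi^2/6 - s_k\bigr) \approx \frac{3}{\pi k}$. If that is right, $k\,\epsilon_k$ should level off near $3/\pi$. The sketch below checks this for larger $k$; the names `err`, `ks`, and `scaled` are introduced only for illustration.

```julia
# Error of p_k = sqrt(6*s_k), summing the series directly for each k
err(k) = π - sqrt(6*sum(1/j^2 for j in 1:k))

ks = 10 .^ (1:5)                      # k = 10, 100, ..., 100000
scaled = [ k*err(k) for k in ks ]     # levels off if the error behaves like a/k

@show scaled
@show 3/π;
```

The fit over $k=1,\dots,100$ is influenced by the small-$k$ terms, which is one reason its exponent need not equal $-1$ exactly.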

# Example 3.2.1

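Because the functions $\sin^2(t)$, $\cos^2(t)$, and $1$ are linearly dependent, the columns of the following matrix, in which the cosine argument is perturbed only slightly, are nearly dependent. We should therefore find that the matrix is ill-conditioned.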

In [None]:
t = range(0,stop=3,length=400)
A = [ sin.(t).^2 cos.((1+1e-7)*t).^2 t.^0 ]
kappa = cond(A)

In [None]:
A'A

In [None]:
cond(A'A)

Now we set up an artificial linear least squares problem with a known exact solution that actually makes the residual zero.

In [None]:
x = [1.,2,1]
b = A*x;

Using backslash to find the solution, we get a relative error that is no worse than about $\kappa$ times machine epsilon.

In [None]:
x_BS = A\b

In [None]:
@show observed_err = norm(x_BS-x)/norm(x)
@show max_err = kappa*eps()
@show digits = -log(10,observed_err);

If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2\approx 10^{14}$, we may not be left with more than about 2 accurate digits.
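
The squared condition number follows from the singular values. Assuming $\mathbf{A}$ has full column rank with singular values $\sigma_1 \ge \cdots \ge \sigma_n > 0$, the singular values of $\mathbf{A}^T\mathbf{A}$ are $\sigma_i^2$, so in the 2-norm

$$ \kappa(\mathbf{A}^T\mathbf{A}) = \frac{\sigma_1^2}{\sigma_n^2} = \kappa(\mathbf{A})^2, \qquad \kappa(\mathbf{A})^2\, \epsilon_{\text{mach}} \approx 10^{14} \times 2.2\times 10^{-16} \approx 2\times 10^{-2}. $$

A relative error near $2\times 10^{-2}$ corresponds to roughly two accurate digits, which is the estimate quoted above.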

In [None]:
N = A'*A
x_NE = N\(A'*b)

In [None]:
@show observed_err = norm(x_NE-x)/norm(x)
@show digits = -log(10,observed_err);

# Example 3.3.1

Julia provides access to both the thin and full forms of the QR factorization.

In [None]:
A = rand(1.:9.,6,4)

In [None]:
m,n = size(A)

Here is a standard call:

In [None]:
Q,R = qr(A)
Q

In [None]:
R

If you look carefully, you see that we got a full $Q$ but a thin $R$. Moreover, the $Q$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form. 
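
The shapes involved can be inspected directly. This is a small sketch on its own random matrix; `Qthin` and `Qfull` are names introduced here for illustration. It assumes a recent Julia release, where `Matrix(F.Q)` yields the thin factor and multiplying `F.Q` by an identity matrix yields the full one.

```julia
using LinearAlgebra

A = rand(1.:9., 6, 4)
m, n = size(A)
F = qr(A)

@show size(F.Q)                            # acts as an m×m operator
@show size(F.R)                            # n×n upper triangular
Qthin = Matrix(F.Q)                        # m×n, orthonormal columns
Qfull = F.Q * Matrix{Float64}(I, m, m)     # m×m orthogonal matrix
@show size(Qthin) size(Qfull);
```

Either form reproduces the matrix: the thin factor multiplies `F.R` directly, while the full factor would need `F.R` padded with `m-n` rows of zeros.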

In [None]:
Q̂ = Matrix(Q)

We can test that $\mathbf{Q}$ is orthogonal.

In [None]:
QTQ = Q'Q

In [None]:
norm(QTQ - I)

In [None]:
Q*Q'

The thin $Q$ cannot be an orthogonal matrix, because it is not even square, but it still has orthonormal columns (ONC).
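
One way to see the difference is to look at both products. With orthonormal columns, $\hat{\mathbf{Q}}^T\hat{\mathbf{Q}}$ is the $n\times n$ identity, while $\hat{\mathbf{Q}}\hat{\mathbf{Q}}^T$ is an $m\times m$ orthogonal projector onto the column space rather than an identity. A minimal sketch, using its own random matrix and the illustrative names `Qthin` and `P`:

```julia
using LinearAlgebra

A = rand(1.:9., 6, 4)
Qthin = Matrix(qr(A).Q)          # 6×4 thin factor

@show norm(Qthin'*Qthin - I)     # near zero: the columns are orthonormal
P = Qthin*Qthin'                 # 6×6, but not the identity
@show norm(P*P - P)              # near zero: P is idempotent
@show norm(P*A - A)              # near zero: P leaves the columns of A unchanged
@show norm(P - I);               # not small
```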

In [None]:
Q̂'*Q̂ - I

In [None]:
Q̂*Q̂'

# Test `lsnormal` and `lsqrfact`

In [None]:
t = range(0,stop=3,length=400)
A = [ sin.(t).^2 cos.((1+1e-7)*t).^2 t.^0 ]
kappa = cond(A)

In [None]:
x = [1.,2,1]
b = A*x;

In [None]:
xlsn = lsnormal(A,b)

In [None]:
norm(b - A*x)

In [None]:
norm(b - A*xlsn)

In [None]:
xlsqr = lsqrfact(A,b)

In [None]:
norm(b - A*xlsqr)

In [None]:
xbs = A\b

In [None]:
norm(b - A*xbs)
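
The definitions of `lsnormal` and `lsqrfact` live in the included chapter files and are not shown in this notebook. The sketch below is an assumption about what such functions typically compute, not the actual source: `lsnormal_sketch` solves the normal equations with a Cholesky factorization, and `lsqrfact_sketch` uses the thin QR factorization. Both names are hypothetical.

```julia
using LinearAlgebra

# Hypothetical stand-in: least squares via the normal equations A'A x = A'b
function lsnormal_sketch(A, b)
    N = A'*A
    z = A'*b
    R = cholesky(Symmetric(N)).U    # N = R'R; may fail if N is numerically singular
    w = R' \ z                      # forward substitution
    return R \ w                    # backward substitution
end

# Hypothetical stand-in: least squares via the thin QR factorization A = Q̂R
function lsqrfact_sketch(A, b)
    F = qr(A)
    Qthin = Matrix(F.Q)             # m×n factor with orthonormal columns
    c = Qthin'*b
    return UpperTriangular(F.R) \ c # backward substitution
end

# Small, well-conditioned check with a known solution
A = [1.0 2; 3 4; 5 6; 7 9]
x = [1.0, -2.0]
b = A*x
@show lsnormal_sketch(A, b)
@show lsqrfact_sketch(A, b);
```

On a well-conditioned problem like this one the two approaches agree closely. The differences show up when the condition number is large, as in the test matrix above.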

# Example 3.4.1

We will use Householder reflections to produce a QR factorization of the matrix

In [None]:
A = rand(1.:9.,6,4);
Aorig = copy(A)

In [None]:
m,n = size(A)

Our first step is to introduce zeros below the diagonal in column 1. Define the vector 

In [None]:
z = A[:,1]

Applying the Householder definitions gives us

In [None]:
sign(-4.0)

In [None]:
#v = z - norm(z)*[1;zeros(m-1)]
v = [z[1] + sign(z[1])*norm(z); z[2:end]]

In [None]:
P = I - 2/(v'*v)*(v*v')   # reflector

(Julia automatically fills in an identity of the correct size for the `I` above.) By design we can use the reflector to get the zero structure we seek:

In [None]:
P*z
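
The zero pattern follows from a short calculation, sketched here with $\sigma = \operatorname{sign}(z_1)$ and assuming $z_1 \neq 0$. With $\mathbf{v} = \mathbf{z} + \sigma \|\mathbf{z}\| \mathbf{e}_1$, as in the code above,

$$ \mathbf{v}^T\mathbf{z} = \|\mathbf{z}\|^2 + |z_1|\,\|\mathbf{z}\|, \qquad \mathbf{v}^T\mathbf{v} = 2\bigl(\|\mathbf{z}\|^2 + |z_1|\,\|\mathbf{z}\|\bigr), $$

so that

$$ \mathbf{P}\mathbf{z} = \mathbf{z} - \frac{2\,\mathbf{v}^T\mathbf{z}}{\mathbf{v}^T\mathbf{v}}\,\mathbf{v} = \mathbf{z} - \mathbf{v} = -\sigma \|\mathbf{z}\|\, \mathbf{e}_1. $$

Choosing the sign this way means $z_1$ and $\sigma\|\mathbf{z}\|$ are added with the same sign, which avoids cancellation when forming $\mathbf{v}$.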

Now we let 

In [None]:
A = P*A

In [None]:
Q = P

We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. To put it another way, we can repeat the process we just did on the smaller submatrix

In [None]:
A[2:m,2:n]

In [None]:
z = A[2:m,2]

In [None]:
#v = z - norm(z)*[1;zeros(m-2)]
v = [z[1] + sign(z[1])*norm(z); z[2:end]]

In [None]:
P = I - 2/(v'*v)*(v*v')

We now apply the reflector to the submatrix.

In [None]:
[1 zeros(m-1)'
zeros(m-1) P]

In [None]:
A[2:m,2:n] = P*A[2:m,2:n]
A

In [None]:
Q[:,2:m] = Q[:,2:m]*P
Q

We need two more iterations of this process.

In [None]:
for j = 3:n
    z = A[j:m,j]
    #v = z - norm(z)*[1;zeros(m-j)]
    v = [z[1] + sign(z[1])*norm(z); z[2:end]]
    P = I - 2/(v'*v)*(v*v')
    A[j:m,j:n] = P*A[j:m,j:n]
    Q[:,j:m] = Q[:,j:m]*P
end

We have now reduced the original $\mathbf{A}$ to an upper triangular matrix using four orthogonal Householder reflections:

In [None]:
R = A

In [None]:
Q

In [None]:
norm(Q'Q - I)

In [None]:
Q*R
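
The steps above can be collected into a single function. This is a minimal sketch under stated assumptions, not the notebook's or the library's implementation: the name `householder_qr_sketch` is hypothetical, the matrix is assumed to have at least as many rows as columns, and a zero leading entry is treated as having positive sign so that the reflector is always defined.

```julia
using LinearAlgebra

function householder_qr_sketch(A)
    m, n = size(A)                        # assumes m >= n
    R = float(copy(A))                    # work on a floating-point copy
    Q = Matrix{Float64}(I, m, m)          # accumulates the product of reflectors
    for j in 1:n
        z = R[j:m, j]
        s = z[1] >= 0 ? 1.0 : -1.0        # treat sign(0) as +1
        v = [z[1] + s*norm(z); z[2:end]]
        vv = v'*v
        vv == 0 && continue               # column already zero: nothing to do
        P = I - (2/vv)*(v*v')             # reflector for rows j through m
        R[j:m, j:n] = P*R[j:m, j:n]
        Q[:, j:m] = Q[:, j:m]*P
    end
    return Q, R
end

A = [4.0 1 2; 2 3 1; 1 1 5; 3 2 2]
Q, R = householder_qr_sketch(A)
@show norm(Q*R - A)
@show norm(Q'*Q - I)
@show norm(tril(R, -1));                  # entries below the diagonal
```

Forming each reflector as a dense matrix keeps the code close to the walkthrough. A practical implementation would apply $\mathbf{v}$ directly and never build $\mathbf{P}$.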

In [None]:
Aorig

In [None]:
F = qr(Aorig)

In [None]:
Q

In [None]:
R

In [None]:
dump(F)

In [None]:
F.factors

In [None]:
F.T

In [None]:
F.Q