<center>
    <img src="http://sct.inf.utfsm.cl/wp-content/uploads/2020/04/logo_di.png" style="width:60%">
    <h1> INF285 - Computación Científica </h1>
    <h2> Weighted Linear Least Squares Problems</h2>
    <h2> <a href="#acknowledgements"> [S]cientific [C]omputing [T]eam </a> </h2>
    <h2> Version: 1.04</h2>
</center>

<div id='toc' />

## Table of Contents
* [Introduction](#intro)
* [Weighted least-square](#wLeastSquare)
    * [Example in explanation](#exampleExplanation)
    * [Extension of "Initial Example" in jupyter notebook "07_08_Least_Squares"](#extensionInitialExample)
* [Acknowledgements](#acknowledgements)

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as spla
%matplotlib inline
# https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
from sklearn import datasets
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import matplotlib as mpl
mpl.rcParams['font.size'] = 14
mpl.rcParams['axes.labelsize'] = 20
mpl.rcParams['xtick.labelsize'] = 14
mpl.rcParams['ytick.labelsize'] = 14
M=8

<div id='intro' />

## Introduction
[Back to TOC](#toc)

This jupyter notebook presents the notion of weighted linear least-squares problems.
It first presents the theoretical background, then a few small examples, and finally makes a connection with the "Initial Example" from the section "Overdetermined Linear Systems of Equations" in the notebook "07_08_Least_Squares".
We strongly suggest that the reader review that example first.

<div id='wLeastSquare' />

## Weighted Linear Least-Square Problems
[Back to TOC](#toc)

In this section we give a different weight to each term of the linear least-squares problem.
Mathematically, this can be seen as multiplying the least-squares problem by a diagonal matrix, say $W=\text{diag}(w_1,w_2,\dots,w_m)$, with $w_i>0$.
For instance, consider the following least-squares example:
$$
\begin{equation}
    \widehat{\mathbf{r}} = W\,\mathbf{r} =
    \underbrace{W}_{\texttt{Weight matrix}}\,\left(\underbrace{\begin{bmatrix}
        y_1 \\
        y_2 \\
        y_3 \\
        \vdots\\
        y_m 
    \end{bmatrix}}_{\displaystyle{\mathbf{b}}}
    -
    \underbrace{\begin{bmatrix}
        1 & x_1 \\
        1 & x_2 \\
        1 & x_3 \\
        \vdots & \vdots \\
        1 & x_m
    \end{bmatrix}}_{\displaystyle{A}}
    \underbrace{\begin{bmatrix}
        a\\
        b
    \end{bmatrix}}_{\mathbf{x}}\right).
\end{equation}
$$
In this case, if $W$ is the identity matrix, all the equations have the same weight.
Now, giving a different weight to each equation, we obtain the following:
$$
\begin{equation}
    \underbrace{W\,A}_{\displaystyle{B}}\,\mathbf{x}=\underbrace{W\,\mathbf{b}}_{\displaystyle{\mathbf{c}}},
\end{equation}
$$
where, as mentioned before, we could define $W$ as follows,
$$
\begin{equation}
    W=\text{diag}(w_1,w_2,\dots,w_m).
\end{equation}
$$
So, when computing the quadratic error we obtain,
$$\begin{equation}
    E = \left\|\widehat{\mathbf{r}} \right\|_2^2 =  \left\|W\,\mathbf{r} \right\|_2^2 = \left\|W\,\left(\mathbf{b}- A\,\mathbf{x}\right)\right\|_2^2=\sum_{i=1}^m w_i^2\,(y_i-a-b\,x_i)^2.
\end{equation}
$$
This indicates that the equations with a larger value of $w_i$ have a higher impact on the minimization.
From the normal equations point of view, this slightly modified problem still needs to satisfy the normal equations, which in this case become,
$$
\begin{equation}
    B^*\,B\,\overline{\mathbf{x}}_w=B^*\,\mathbf{c}.
\end{equation}
$$
Notice that we used the sub-index $w$ for $\overline{\mathbf{x}}_w$ to differentiate it from the unweighted least square solution.
Thus, coming back to the original matrices and using that $W$ is real and diagonal, so $W^*\,W=W^2$, we obtain,
$$
\begin{align*}
    B^*\,B\,\overline{\mathbf{x}}_w &=B^*\,\mathbf{c},\\
    (W\,A)^*\,(W\,A)\,\overline{\mathbf{x}}_w &=(W\,A)^*\,W\,\mathbf{b},\\
    A^*\,W^*\,W\,A\,\overline{\mathbf{x}}_w &=A^*\,W^*\,W\,\mathbf{b},\\
    A^*\,W^2\,A\,\overline{\mathbf{x}}_w &=A^*\,W^2\,\mathbf{b},\\
    \overline{\mathbf{x}}_w &=(A^*\,W^2\,A)^{-1}\,A^*\,W^2\,\mathbf{b}.\\
\end{align*}
$$
So, if $W$ is the identity matrix or a non-zero multiple of it, i.e. $W=\alpha\,I$ for $\alpha\neq0$, the weighted least-squares solution reduces to,
$$
\begin{align*}
    \overline{\mathbf{x}} &=(A^*\,\alpha^2\,I^2\,A)^{-1}\,A^*\,\alpha^2\,I^2\,\mathbf{b},\\
    \overline{\mathbf{x}} &=(A^*\,A)^{-1}\,A^*\,\mathbf{b},\\
\end{align*}
$$
which means that it recovers the traditional linear least-squares approximation.
**However, if the weights are not equal, we get a different approximation**.
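
The derivation above can be checked numerically with a minimal sketch. The data, the weights and the helper name `weighted_ls` are assumptions made only for this illustration; they do not come from the course material. The sketch compares the weighted normal equations with an ordinary least-squares solve of the scaled system $B\,\mathbf{x}=\mathbf{c}$, verifies the expression for the quadratic error $E$, and checks that $W=\alpha\,I$ gives back the unweighted solution.

```python
import numpy as np

# Hypothetical data: m = 5 points, chosen only for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
w = np.array([3.0, 1.0, 1.0, 2.0, 0.5])    # positive weights w_i

A = np.column_stack((np.ones_like(x), x))  # rows [1, x_i]
W = np.diag(w)

def weighted_ls(A, b, W):
    """Solve A^T W^2 A x = A^T W^2 b (real data, so A^* = A^T)."""
    W2 = W @ W
    return np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)

x_w = weighted_ls(A, y, W)

# Same problem posed as an ordinary least-squares problem with B = W A, c = W b.
x_scaled, *_ = np.linalg.lstsq(W @ A, W @ y, rcond=None)

# Quadratic error: ||W (b - A x)||_2^2 versus the explicit weighted sum.
E_norm = np.linalg.norm(W @ (y - A @ x_w))**2
E_sum = np.sum(w**2 * (y - x_w[0] - x_w[1] * x)**2)

# W = I and W = alpha * I should give the same (unweighted) solution.
x_plain = weighted_ls(A, y, np.eye(len(x)))
x_alpha = weighted_ls(A, y, 7.0 * np.eye(len(x)))

print('normal equations vs scaled lstsq agree:', np.allclose(x_w, x_scaled))
print('both expressions for E agree:          ', np.isclose(E_norm, E_sum))
print('alpha*I gives unweighted solution:     ', np.allclose(x_plain, x_alpha))
print('unequal weights change the solution:   ', not np.allclose(x_w, x_plain))
```

Forming $A^*\,W^2\,A$ explicitly mirrors the formulas of this section; for larger or badly scaled problems, solving the scaled system with `np.linalg.lstsq` is usually the safer numerical choice.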

It is important to point out that, from the least-squares point of view, any algebraic modification of an equation, such as multiplying it by a constant, changes its weight with respect to the other equations.
For instance, if we have the following problem,
$$
\begin{equation}
   \widetilde{\mathbf{r}} = W\,\mathbf{r}=
    \begin{bmatrix}
        4\,y_1 \\
        y_2 \\
        y_3 
    \end{bmatrix}
    -
    \begin{bmatrix}
        4 & 4\,x_1 \\
        1 & x_2 \\
        1 & x_3
    \end{bmatrix}
    \begin{bmatrix}
        a\\
        b
    \end{bmatrix}.
\end{equation}
$$
It will have a different least square solution since we have a weight of $4$ for the first equation.
This is because we are minimizing the residual $\left\|\widetilde{\mathbf{r}}\right\|_2^2$.
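
To make the effect explicit, the quadratic error of this three-equation case can be written out term by term, using the general expression for $E$ with $w_1=4$ and $w_2=w_3=1$:
$$
\begin{equation}
    \left\|\widetilde{\mathbf{r}}\right\|_2^2 = 16\,(y_1-a-b\,x_1)^2+(y_2-a-b\,x_2)^2+(y_3-a-b\,x_3)^2.
\end{equation}
$$
So a misfit in the first equation is penalized $16$ times more than the same misfit in either of the other two equations.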

On the other hand, if we had cancelled out the coefficient $4$, we would be solving,
$$
\begin{equation}
    \widehat{\mathbf{r}}
    =
    \begin{bmatrix}
        y_1 \\
        y_2 \\
        y_3 
    \end{bmatrix}
    -
    \begin{bmatrix}
        1 & x_1 \\
        1 & x_2 \\
        1 & x_3
    \end{bmatrix}
    \begin{bmatrix}
        a\\
        b
    \end{bmatrix},
\end{equation}
$$
which minimizes the norm of a different residual vector, namely $\left\|\mathbf{r}\right\|_2^2$.
This is why the two problems generate different linear least-squares solutions.

**It is very important to highlight that this weight effect matters because we are dealing with an overdetermined linear system of equations.**
If we deal with a **square and non-singular linear system of equations, the weights do not change the solution** in exact arithmetic. In that case the analysis becomes,
$$
\begin{align*}
    B^*\,B\,\mathbf{x} &= B^*\,\mathbf{c},\\
    (W\,A)^*\,(W\,A)\,\mathbf{x} &= (W\,A)^*\,W\,\mathbf{b},\\
    A^*\,W^*\,W\,A\,\mathbf{x} &=A^*\,W^*\,W\,\mathbf{b},\\
    A^*\,W^2\,A\,\mathbf{x} &=A^*\,W^2\,\mathbf{b},\\
    \mathbf{x} &=(A^*\,W^2\,A)^{-1}\,A^*\,W^2\,\mathbf{b},\\
    \mathbf{x} &=A^{-1}\,W^{-2}\,A^{-*}\,A^*\,W^2\,\mathbf{b},\\
    \mathbf{x} &=A^{-1}\,W^{-2}\,\,W^2\,\mathbf{b},\\
    \mathbf{x} &=A^{-1}\,\mathbf{b}.\\
\end{align*}
$$
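The key difference is in the sixth line.
For a square and non-singular $A$, each factor inside the parentheses ($A^*$, $W^2$ and $A$) is invertible, so the inverse of the product splits into the product of the inverses. This is not possible in the overdetermined case, where $A$ is rectangular.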
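
A minimal numerical sketch of this statement follows. The $3\times 3$ matrix, the right-hand side and the weights are hypothetical values chosen only for illustration. In floating-point arithmetic the two solutions are expected to agree up to rounding errors, not exactly.

```python
import numpy as np

# Hypothetical square, non-singular system (values chosen for illustration).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

# Arbitrary positive weights, one per equation.
W = np.diag([10.0, 0.1, 5.0])

# Direct solution of A x = b.
x_direct = np.linalg.solve(A, b)

# Normal equations of the weighted system B x = c, with B = W A and c = W b.
B, c = W @ A, W @ b
x_weighted = np.linalg.solve(B.T @ B, B.T @ c)

print('x_direct:  ', x_direct)
print('x_weighted:', x_weighted)
print('same solution up to rounding:', np.allclose(x_direct, x_weighted))
```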

<div id='exampleExplanation' />

### Example in explanation
[Back to TOC](#toc)

In [2]:
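# Data points (x_i, y_i)
x1,x2,x3 = 1,2,3
y1,y2,y3 = 5,-1,3

# Weighted problem: the first equation is multiplied by 4
A1 = np.ones((3,2))
A1[:,1]=[x1,x2,x3]
b1=np.array([y1,y2,y3])
A1[0,:]*=4
b1[0]*=4
# Solving the normal equations of the weighted problem
x1_bar=np.linalg.solve(A1.T @ A1,A1.T @ b1)
print('A1: ', A1)
print('b1: ', b1)
print('x1_bar: ', x1_bar)

# Unweighted problem: the original equations
A2 = np.ones((3,2))
A2[:,1]=[x1,x2,x3]
b2=np.array([y1,y2,y3])
# Solving the normal equations of the unweighted problem
x2_bar=np.linalg.solve(A2.T @ A2,A2.T @ b2)
print('A2: ', A2)
print('b2: ', b2)
print('x2_bar: ', x2_bar)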

A1:  [[4. 4.]
 [1. 2.]
 [1. 3.]]
b1:  [20 -1  3]
x1_bar:  [ 6.80246914 -1.92592593]
A2:  [[1. 1.]
 [1. 2.]
 [1. 3.]]
b2:  [ 5 -1  3]
x2_bar:  [ 4.33333333 -1.        ]


We clearly observe that the least square solutions are different!
This is consistent with the previous explanation.
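
As a complementary sketch, the same weighted solution can be obtained by keeping $A$ and $\mathbf{b}$ unscaled and applying $W=\text{diag}(4,1,1)$ through the formula $A^*\,W^2\,A\,\overline{\mathbf{x}}_w=A^*\,W^2\,\mathbf{b}$. The variable names below are chosen only for this illustration.

```python
import numpy as np

# Unscaled data matrix and right-hand side of the example above.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([5.0, -1.0, 3.0])

# Weight of 4 on the first equation, 1 on the others.
W = np.diag([4.0, 1.0, 1.0])
W2 = W @ W

# Weighted normal equations: A^T W^2 A x_w = A^T W^2 b.
x_w = np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)
# Unweighted normal equations (W = I).
x_plain = np.linalg.solve(A.T @ A, A.T @ b)

print('x_w:     ', x_w)
print('x_plain: ', x_plain)
```

The values printed for `x_w` and `x_plain` should match `x1_bar` and `x2_bar` from the previous cell, respectively.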

<div id='extensionInitialExample' />

### Extension of "Initial Example" in jupyter notebook "07_08_Least_Squares"
[Back to TOC](#toc)


In this example we will approximate $m$ points considering a linear relationship.
This means that we will have the data points $(x_i,y_i)$ for $i\in\{1,2,\dots,m\}$ and consider the relationship $y=a_0+a_1\,x$.
The error added to the data follows a normal distribution, and we will use the following weight matrix,
$$
\begin{equation}
    W=\text{diag}(w,w,1,\dots,1).
\end{equation}
$$
This means we give more weight to the first two equations when $w>1$ (and less when $w<1$). This choice is arbitrary and is only used to show its effect on the least-squares problem.

The example will only show the approximation output since we already saw all the other components in the original example.

### Question to think about before modifying the weight $w$: What would you expect to happen with the approximation if you use a large value for $w$?

In [3]:
def showWeightedOutput(w=1):
    # Number of points to be used
    m = 10
    # Relationship considered
    fv = np.vectorize(lambda x, a0, a1: a0+a1*x)
    # Coefficients considered
    a0, a1 = 1, 4

    np.random.seed(0)
    # Standard deviation for the error
    sigma = 5e-1
    # Error to be added
    e = np.random.normal(0,sigma,m)

    # Generating data points
    x = np.linspace(0,1,m)
    y = fv(x,a0,a1)+e

    # Build the data matrix
    A = np.ones((m,2))
    A[:,1] = x
    # Setting up the right hand side
    b = np.copy(y)
    # Applying the weight w to the first two equations, i.e. B = W A and c = W b
    A[:2,:]*=w
    b[:2]*=w

    # Building and solving the normal equations
    # A^T A x_bar = A^T b
    x_bar = np.linalg.solve(A.T @ A, A.T @ b)
    # Showing the comparison between the "original function" and the "least-squares reconstructed approximation".
    # We added in red a "sample" of possible functions.
    # Notice that the colors used follow the description included in the classnotes.
    # This means to consider the following analogy:
    # blue: data points, these correspond to the right-hand-side vector "b".
    # red: this corresponds to the sub-space generated by Ax, i.e. the span of the columns of A.
    # violet: this corresponds to the least-squares solution found.
    ####
    # plt.figure(figsize=(10,10))
    # for i in range(100):
    #     plt.plot(x,fv(x,x_bar[0]+np.random.normal(0,1),x_bar[1]+np.random.normal(0,1)),'r-',linewidth=1,alpha=0.2)
    # plt.plot(x,fv(x,a0,a1),'k-',linewidth=8,alpha=0.8)
    # plt.plot(x,fv(x,x_bar[0],x_bar[1]),'--',color='darkviolet',linewidth=4)
    # plt.plot(x,fv(x,x_bar[0],x_bar[1]),'r.',markersize=20)
    # plt.plot(x,y,'b.',markersize=10)
    # plt.grid(True)
    # plt.xlabel(r'$x$')
    # plt.ylabel(r'$y$')
    #
    plt.plot(x,y,'b.',markersize=20, label='Data points')
    plt.plot(x,fv(x,x_bar[0],x_bar[1]),'r.', markersize=30, label='Approximated points')
    plt.plot(x,fv(x,a0,a1),'k-',linewidth=8,alpha=0.8, label='Original function')
    plt.plot(x,fv(x,x_bar[0],x_bar[1]),'--',color='darkviolet',linewidth=4, label='Least Square approximation')
    plt.plot(x,fv(x,x_bar[0]+np.random.normal(0,1),x_bar[1]+np.random.normal(0,1)),'r-',linewidth=1,alpha=0.2, label='Non-optimal solutions')
    plt.grid(True)
    plt.xlabel(r'$x$')
    plt.ylabel(r'$y$')
    plt.legend(loc='lower left', ncol=1, fancybox=True, shadow=True, numpoints=1, bbox_to_anchor=(1,0))
    #
    plt.show()
interact(showWeightedOutput,w=(0.01,101,0.01))

<function __main__.showWeightedOutput(w=1)>
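
For readers without the interactive widget, the following non-interactive sketch tabulates the fitted coefficients for a few values of $w$. The helper `weighted_fit` is hypothetical: it is meant to mirror the data generation of `showWeightedOutput`, but it uses `np.random.RandomState` so the global random state is not modified, and it solves the scaled system with `np.linalg.lstsq` instead of the normal equations, since large weights make the normal equations poorly conditioned. Because only two equations receive the weight $w$ and there are two unknowns, the coefficients approach those of the line through the first two data points as $w$ grows.

```python
import numpy as np

def weighted_fit(w, m=10, a0=1.0, a1=4.0, sigma=0.5, seed=0):
    """Hypothetical helper: weighted fit with weight w on the first two equations."""
    rng = np.random.RandomState(seed)
    e = rng.normal(0, sigma, m)          # normally distributed error
    x = np.linspace(0, 1, m)
    y = a0 + a1 * x + e                  # noisy data points
    A = np.column_stack((np.ones(m), x))
    weights = np.ones(m)
    weights[:2] = w                      # W = diag(w, w, 1, ..., 1)
    B = weights[:, None] * A             # B = W A (row scaling)
    c = weights * y                      # c = W b
    x_bar, *_ = np.linalg.lstsq(B, c, rcond=None)
    return x, y, x_bar

x, y, x_unweighted = weighted_fit(1.0)

# Line passing exactly through the first two data points.
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]
interp = np.array([intercept, slope])
print('line through the first two points:', interp)

for w in (0.01, 1.0, 10.0, 100.0, 1e5):
    _, _, x_bar = weighted_fit(w)
    print('w =', w, ' x_bar =', x_bar)
```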

<div id='acknowledgements' />

# Acknowledgements
[Back to TOC](#toc)
* _Material created by professor Claudio Torres_ (`ctorres@inf.utfsm.cl`) DI UTFSM. June 2021.- v1.0.
* _Update May 2022 - v1.01 - C.Torres_ : Adding \$\$ in Markdown for LaTeX.
* _Update May 2024 - v1.02 - C.Torres_ : Updating the use of strings for Python 3.12.*.
* _Update Nov 2024 - v1.03 - C.Torres_ : Including now the use of the residual vector.
* _Update Nov 2024 - v1.04 - C.Torres_ : Updating the explanation.