## Homework Assignment 3 - Linear Regression and Polynomial Regression


### 1. Linear Regression
Use NumPy to build a multivariable linear regression model
$$y = Xa + b,$$
where $y \in \mathbb{R}^{n}$, $X \in \mathbb{R}^{n \times m}$, $a \in \mathbb{R}^{m}$, and $b \in \mathbb{R}$. Here $a$ is the vector of regression coefficients and $b$ is the intercept, which is added to every entry of $Xa$. By the least-squares method,
$$[a|b] = (\hat{X}^T\hat{X})^{-1}\hat{X}^{T}y,$$
where $\hat{X} = [X|I_{n}]$ and $I_n = (1,1,1,\ldots,1)^T \in \mathbb{R}^{n}$ is the all-ones vector. The inverse exists when $\hat{X}$ has full column rank, which requires $n \ge m + 1$.
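The formula follows from minimizing the sum of squared residuals. Writing $\theta = [a|b] \in \mathbb{R}^{m+1}$ and assuming $\hat{X}$ has full column rank, a short derivation:

$$L(\theta) = \|y - \hat{X}\theta\|_2^2 = (y - \hat{X}\theta)^T(y - \hat{X}\theta)$$
$$\nabla_\theta L = -2\hat{X}^T(y - \hat{X}\theta) = 0 \;\Longrightarrow\; \hat{X}^T\hat{X}\,\theta = \hat{X}^T y \;\Longrightarrow\; \theta = (\hat{X}^T\hat{X})^{-1}\hat{X}^T y.$$

Since $L$ is a convex quadratic (strictly convex under the full-rank assumption), this stationary point is the unique minimizer.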

#### Step 1. Data generation

In [2]:
```python
import numpy as np
```

In [3]:
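```python
def generate_linear_model_data(n, m):
    """Draw a random linear model and n noisy samples from it (fixed seed for reproducibility)."""
    np.random.seed(2023)
    assert n > m
    X = np.random.uniform(0, 1, (n, m))          # features, uniform on [0, 1)
    a = np.random.normal(0, 1, m)                # true coefficients
    b = np.random.normal(0, 1, 1)                # true intercept, shape (1,)
    y = X @ a + b + np.random.normal(0, 0.3, n)  # targets with Gaussian noise (std 0.3)
    return X, y, a, b
```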

In [4]:
```python
n = 10
m = 3
X, y, a_true, b_true = generate_linear_model_data(n, m)
print('shape of X and y is:{} and {}'.format(X.shape, y.shape))
```

```
shape of X and y is:(10, 3) and (10,)
```
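The generator above resets NumPy's global random state with `np.random.seed`, which also affects any other code drawing from `np.random`. NumPy's documentation recommends the `Generator` API (`np.random.default_rng`) for new code. Below is a minimal sketch of the same data generation with a local generator. The function name `generate_linear_model_data_rng` and its `noise_std` parameter are hypothetical, and because `default_rng` uses a different bit generator, the numbers will not match the outputs shown in this notebook.

```python
import numpy as np

def generate_linear_model_data_rng(n, m, seed=2023, noise_std=0.3):
    """Hypothetical variant of generate_linear_model_data using a local Generator.

    Draws the same kinds of quantities (uniform X, Gaussian a and b, Gaussian noise)
    without touching NumPy's global random state.
    """
    assert n > m
    rng = np.random.default_rng(seed)          # local RNG; global np.random state untouched
    X = rng.uniform(0, 1, size=(n, m))         # features, uniform on [0, 1)
    a = rng.normal(0, 1, size=m)               # true coefficients
    b = rng.normal(0, 1, size=1)               # true intercept, shape (1,)
    y = X @ a + b + rng.normal(0, noise_std, size=n)
    return X, y, a, b

# Distinct names so the notebook's X, y, a_true, b_true are not overwritten
X_demo, y_demo, a_demo, b_demo = generate_linear_model_data_rng(10, 3)
print(X_demo.shape, y_demo.shape, a_demo.shape, b_demo.shape)  # (10, 3) (10,) (3,) (1,)
```

Calling the function twice with the same `seed` returns identical arrays, so reproducibility is kept without global side effects.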


#### Step 2. Fit the data using the least-squares method

In [5]:
```python
def expand_x(X):
    """Append the all-ones column I_n to X, giving X_hat = [X | I_n] of shape (n, m + 1)."""
    n, m = X.shape
    I_n = np.ones((n, 1))
    # Concatenate X and I_n along the column axis
    X_expand = np.concatenate((X, I_n), axis=1)
    return X_expand
```
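A tiny worked example of the expansion $\hat{X} = [X|I_n]$, written here with `np.hstack` (equivalent to `np.concatenate(..., axis=1)` for 2-D arrays). The input `X_demo_small` is made up for illustration.

```python
import numpy as np

X_demo_small = np.array([[1.0, 2.0],
                         [3.0, 4.0]])
# Append a column of ones (the I_n vector) on the right, as expand_x does.
X_hat_demo = np.hstack([X_demo_small, np.ones((X_demo_small.shape[0], 1))])
print(X_hat_demo)
# [[1. 2. 1.]
#  [3. 4. 1.]]
```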

In [6]:
```python
def ls_method(X, y):
    """Least-squares fit; X is the expanded matrix X_hat, and its last column is I_n."""
    X_trans = np.transpose(X)
    # Normal-equation solution (X^T X)^{-1} X^T y, using np.linalg.inv as the assignment suggests
    coeff = np.linalg.inv(X_trans @ X) @ X_trans @ y
    a_est, b_est = coeff[:-1], coeff[-1:]  # b_est keeps shape (1,)
    return a_est, b_est
```
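Explicitly inverting $\hat{X}^T\hat{X}$ works here, but it is generally less accurate than solving the linear system directly, especially when $\hat{X}^T\hat{X}$ is ill-conditioned. Below is a sketch of two alternatives that keep the same `[a|b]` output layout as `ls_method`: `np.linalg.solve` on the normal equations, and `np.linalg.lstsq` on $\hat{X}$ itself, which avoids forming $\hat{X}^T\hat{X}$ at all. The function names are hypothetical, and the check uses noise-free synthetic data so the exact coefficients are known.

```python
import numpy as np

def ls_method_solve(X_hat, y):
    # Solve (X^T X) coeff = X^T y directly instead of forming the inverse.
    coeff = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    return coeff[:-1], coeff[-1:]

def ls_method_lstsq(X_hat, y):
    # Minimize ||y - X_hat @ coeff||^2 with NumPy's SVD-based least-squares solver.
    coeff, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return coeff[:-1], coeff[-1:]

# Noise-free check: y = 2*x1 - x2 + 0.5 should be recovered up to rounding error.
rng = np.random.default_rng(0)
X_chk = rng.uniform(0, 1, size=(20, 2))
y_chk = X_chk @ np.array([2.0, -1.0]) + 0.5
X_hat_chk = np.hstack([X_chk, np.ones((20, 1))])

results = {f.__name__: f(X_hat_chk, y_chk) for f in (ls_method_solve, ls_method_lstsq)}
for name, (a_est_chk, b_est_chk) in results.items():
    print(name, a_est_chk, b_est_chk)
```

For a well-conditioned $\hat{X}$ all three approaches agree to rounding error; `lstsq` is the safer default when columns are nearly collinear, as can happen with polynomial features.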

In [7]:
```python
X = expand_x(X)
```

In [8]:
```python
a_est, b_est = ls_method(X, y)
```

In [9]:
```python
# compare the true model and the estimated one
print('The true a is {}, and the estimated a is {}'.format(a_true, a_est))
print('The true b is {}, and the estimated b is {}'.format(b_true, b_est))
```

```
The true a is [0.77174916 0.74152348 1.32476273], and the estimated a is [0.68836574 0.07102966 1.39777345]
The true b is [0.43928671], and the estimated b is [0.71355283]
```
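The second coefficient is estimated poorly (0.071 against a true 0.742). With only n = 10 samples and noise of standard deviation 0.3, sizeable estimation error is plausible, and the gap is expected to shrink as n grows. A sketch that refits with larger n follows; it repeats the Step 1 generator so it runs on its own. Note that `X` is drawn before `a` and `b`, so the true coefficients themselves change with n. Variable names such as `X_n` and `errors` are illustrative.

```python
import numpy as np

def generate_linear_model_data(n, m):
    # Same generator as Step 1, repeated so this block runs on its own.
    np.random.seed(2023)
    assert n > m
    X = np.random.uniform(0, 1, (n, m))
    a = np.random.normal(0, 1, m)
    b = np.random.normal(0, 1, 1)
    y = X @ a + b + np.random.normal(0, 0.3, n)
    return X, y, a, b

errors = {}
for n_samples in (10, 100, 10_000):
    X_n, y_n, a_n, b_n = generate_linear_model_data(n_samples, 3)
    X_hat_n = np.concatenate((X_n, np.ones((n_samples, 1))), axis=1)
    coeff = np.linalg.solve(X_hat_n.T @ X_hat_n, X_hat_n.T @ y_n)
    # Largest absolute gap between the true [a|b] and the estimate
    errors[n_samples] = float(np.max(np.abs(np.append(a_n, b_n) - coeff)))
    print(f"n = {n_samples:>6}: max |true - estimated| = {errors[n_samples]:.3f}")
```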


### 2. Polynomial Regression
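Use NumPy to build a polynomial regression model. The least-squares method from Part 1 carries over unchanged: it suffices to expand the raw data $X$ into $[X | X\odot X | X \odot X \odot X | \ldots | I_n]$, where $\odot$ is the elementwise product, and fit a linear model to the expanded matrix. This expansion raises each feature to the powers $1, \ldots, p$ separately and contains no cross terms such as $x_1 x_2$.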

In [10]:
```python
def expand_poly_x(X, p):
    '''
    input  X: (n, m) raw data
           p: integer, highest power
    output X_expand: (n, m * p + 1) = [X | X*X | ... | X^p | I_n]
    '''
    n, m = X.shape
    assert m * p + 1 <= n  # need at least as many samples as unknowns
    # Iteratively multiply to fill the power blocks, then append the all-ones vector I_n
    X_expand = np.zeros((n, m * p))
    X_expand[:, 0:m] = X
    for i in range(1, p):
        # block i holds X^(i+1): the previous block times X, elementwise
        X_expand[:, i*m:(i+1)*m] = X_expand[:, (i-1)*m:i*m] * X
    I_n = np.ones((n, 1))
    X_expand = np.concatenate((X_expand, I_n), axis=1)
    return X_expand
```
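The loop can also be written with elementwise powers. The sketch below is a hypothetical vectorized equivalent (`expand_poly_x_vectorized`, without the sample-count assertion) that produces the same column layout: all first powers, then all squares, and so on, with the intercept last. For m = 2 and p = 3 the columns are $x_1, x_2, x_1^2, x_2^2, x_1^3, x_2^3, 1$, which is also the order of the entries of `a_est` reported in cell 16 (the intercept goes to `b_est`).

```python
import numpy as np

def expand_poly_x_vectorized(X, p):
    """Hypothetical vectorized equivalent of expand_poly_x: [X | X**2 | ... | X**p | 1]."""
    n, m = X.shape
    powers = [X ** k for k in range(1, p + 1)]     # one (n, m) block per power
    return np.hstack(powers + [np.ones((n, 1))])    # last column is the intercept I_n

X_small = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
print(expand_poly_x_vectorized(X_small, 2))
# [[ 1.  2.  1.  4.  1.]
#  [ 3.  4.  9. 16.  1.]]
```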

In [14]:
```python
p = 3
n = 10
m = 2
X, y, a_true, b_true = generate_linear_model_data(n, m)
X = expand_poly_x(X, p)
```

In [15]:
```python
a_est, b_est = ls_method(X, y)
```

In [16]:
```python
a_est, b_est
```

```
(array([-14.41164636,   6.48866598,  49.77857677, -14.2304274 ,
        -53.34168224,   8.25924227]),
 array([-0.72581642]))
```
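The estimated coefficients above are large in magnitude even though the data were generated by a linear model; with 7 unknowns and only 10 samples, the cubic fit likely absorbs much of the noise. The sketch below compares training residual sums of squares for a linear (p = 1) and a cubic (p = 3) design on the same data. Because the cubic design contains every column of the linear one, its least-squares training residual can never be larger, so a lower training error by itself does not show that the cubic model is better; held-out data would be needed for that. `poly_design` and `training_rss` are hypothetical helpers.

```python
import numpy as np

def generate_linear_model_data(n, m):
    # Same generator as Step 1, repeated so this block runs on its own.
    np.random.seed(2023)
    assert n > m
    X = np.random.uniform(0, 1, (n, m))
    a = np.random.normal(0, 1, m)
    b = np.random.normal(0, 1, 1)
    y = X @ a + b + np.random.normal(0, 0.3, n)
    return X, y, a, b

def poly_design(X, p):
    # [X | X**2 | ... | X**p | 1], the same column layout as expand_poly_x.
    return np.hstack([X ** k for k in range(1, p + 1)] + [np.ones((X.shape[0], 1))])

def training_rss(X_hat, y):
    # Residual sum of squares of the least-squares fit, measured on the training data.
    coeff, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    r = y - X_hat @ coeff
    return float(r @ r)

X_rss, y_rss, _, _ = generate_linear_model_data(10, 2)
rss_linear = training_rss(poly_design(X_rss, 1), y_rss)
rss_cubic = training_rss(poly_design(X_rss, 3), y_rss)
print(f"training RSS, linear: {rss_linear:.4f}, cubic: {rss_cubic:.4f}")
```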