# Question 1, Answer B
###### We know that deterministic noise is directly related to the complexity of our target function and to the ability of the 'best' hypothesis from our hypothesis set to approximate that target. As we choose more complex hypothesis sets, the 'best' hypothesis can approximate the target function more closely, so in general our deterministic noise decreases.

###### Now, we know that $H' \subset H$, so $H'$ is less complex than $H$. Every hypothesis in $H'$ is also in $H$, so the best hypothesis from $H'$ cannot approximate the target function better than the best hypothesis from $H$. Therefore, in general, the deterministic noise increases if we use $H'$ instead of $H$.
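The argument above can be checked numerically with a small sketch. Everything in it is an illustrative assumption rather than part of the homework: the target `sin(pi * x)`, the use of degree-1 polynomials for $H'$ and degree-3 polynomials for $H$, and the mean squared error on a dense grid as a stand-in for deterministic noise.

```python
import numpy as np

# Hypothetical target function, chosen only for illustration.
def target(x):
    return np.sin(np.pi * x)

def best_fit_error(degree, x, f):
    """Mean squared error of the least-squares polynomial of the given degree."""
    A = np.vander(x, degree + 1)                 # columns: x^degree, ..., x, 1
    coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
    return float(np.mean((A @ coeffs - f) ** 2))

x = np.linspace(-1, 1, 1001)                     # dense grid approximating the input space
f = target(x)

err_small = best_fit_error(1, x, f)              # stands in for H' (lines)
err_large = best_fit_error(3, x, f)              # stands in for H (cubics), with H' a subset of H
print(f"best fit in H': {err_small:.4f}, best fit in H: {err_large:.4f}")
```

Because every line is also a cubic with its higher-order coefficients set to zero, the best fit in the smaller set can never have a lower error than the best fit in the larger set.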

# Question 2, Answer A

In [65]:
import numpy as np

def load_data(file_name):
    data = np.genfromtxt(file_name)
    return data[:, :2], data[:, 2] # X in first two columns, y in last column

def non_lin_trans(X):
    Z = np.array([(1, point[0], point[1], point[0] ** 2, point[1] ** 2, 
                   point[0] * point[1], np.absolute(point[0] - point[1]), np.absolute(point[0] + point[1])) for point in X])
    return Z

def run_linear_regression(Z, y):
    # one-step learning: w = pseudo_inv(Z) * y, with pseudo_inv(Z) = (Z^T Z)^-1 Z^T
    ZtZ = np.dot(Z.T, Z)
    pinv_Z = np.dot(np.linalg.inv(ZtZ), Z.T)
    w = np.dot(pinv_Z, y)
    return w

def compute_error(w, Z, y):
    # for each point (row) z in Z, compute w^T z and compare sign(w^T z) with its label y
    # error_list holds True for a misclassified point and False for a correct one
    error_list = [(np.sign(w.T @ z) != y_point) for z, y_point in zip(Z, y)]
    return np.sum(error_list)/len(error_list)
    

In [69]:

X_train, y_train = load_data('in.dta')
Z_train = non_lin_trans(X_train)# transform data using non linear transformation
w_lin = run_linear_regression(Z_train, y_train)# one step learning using linear regression
E_in = compute_error(w_lin, Z_train, y_train)

X_test, y_test = load_data('out.dta')
Z_test = non_lin_trans(X_test)# transform data using non linear transformation
E_out = compute_error(w_lin, Z_test, y_test)

print(f'The in-sample error is {E_in:.2f}, while the out of sample error is {E_out:.2f}')
print('The correct answer choice is A')

The in-sample error is 0.03, while the out of sample error is 0.08
The correct answer choice is A
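The classification error computed by `compute_error` can also be written without an explicit loop. The following is a minimal sketch; the name `compute_error_vectorized` and the tiny demo arrays are made up for illustration and are not the homework data.

```python
import numpy as np

def compute_error_vectorized(w, Z, y):
    # Z @ w evaluates w^T z for every row z of Z at once;
    # the mean of the boolean mismatch array is the fraction misclassified.
    return float(np.mean(np.sign(Z @ w) != y))

# Made-up example: three points with two features each
Z_demo = np.array([[1.0, 2.0], [1.0, -3.0], [1.0, 0.5]])
y_demo = np.array([1.0, 1.0, -1.0])
w_demo = np.array([0.0, 1.0])

demo_error = compute_error_vectorized(w_demo, Z_demo, y_demo)
print(demo_error)  # two of the three points are misclassified
```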


# Question 3, Answer D

In [80]:
# linear regression with weight decay; lamda is the regularization strength
def run_linear_regression(Z, y, lamda):
    _, cols = np.shape(Z)
    ZtZ = np.dot(Z.T, Z)
    pinv_Z = np.dot(np.linalg.inv(ZtZ + lamda * np.identity(cols)), Z.T)
    w = np.dot(pinv_Z, y) # w = (Z^T Z + lamda * I)^-1 Z^T y
    return w

def regularize_lin_reg(k):
    lamda = 10 ** k # purposely misspelling lambda because it is a keyword in python
    X_train, y_train = load_data('in.dta')
    
    Z_train = non_lin_trans(X_train)# transform data using non linear transformation
    w_lin = run_linear_regression(Z_train, y_train, lamda)# one step learning using linear regression
    E_in = compute_error(w_lin, Z_train, y_train)

    X_test, y_test = load_data('out.dta')
    Z_test = non_lin_trans(X_test)# transform data using non linear transformation
    E_out = compute_error(w_lin, Z_test, y_test)
    return E_in, E_out



In [81]:
E_in, E_out = regularize_lin_reg(k=-3)
print(f'The in-sample error is {E_in:.2f}, while the out of sample error is {E_out:.2f}')
print('The correct answer choice is D')

The in-sample error is 0.03, while the out of sample error is 0.08
The correct answer choice is D
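The weight-decay solution $w = (Z^T Z + \lambda I)^{-1} Z^T y$ can also be obtained by solving the linear system directly, which avoids forming the inverse explicitly. This is a sketch of an alternative, not the method used in the cells above; the function name `weight_decay_solve` and the random demo data are assumptions for illustration.

```python
import numpy as np

def weight_decay_solve(Z, y, lamda):
    # Solve (Z^T Z + lamda * I) w = Z^T y instead of inverting the matrix.
    n_features = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lamda * np.identity(n_features), Z.T @ y)

# Made-up data, only to compare the two formulations
rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(20, 3))
y_demo = np.sign(rng.normal(size=20))
lamda_demo = 10 ** -3

w_solve = weight_decay_solve(Z_demo, y_demo, lamda_demo)
w_inv = np.linalg.inv(Z_demo.T @ Z_demo + lamda_demo * np.identity(3)) @ Z_demo.T @ y_demo
print(np.allclose(w_solve, w_inv))
```

Both formulations give the same weights up to floating-point error; the `solve` version is usually preferred numerically.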


# Question 4, Answer E

In [85]:
E_in, E_out = regularize_lin_reg(k=3)
print(f'The in-sample error is {E_in:.1f}, while the out of sample error is {E_out:.1f}')
print('The correct answer choice is E')

The in-sample error is 0.4, while the out of sample error is 0.4
The correct answer choice is E


# Question 5, Answer D

In [103]:
k_s = range(-2, 3)
E_list = [regularize_lin_reg(k) for k in k_s]
E_out_list = [E[1] for E in E_list]


print(f'Error list: {E_out_list}')
print(f'List of k: {list(k_s)}')
print(f'As we can see the minimum out of sample error is E_out = {E_out_list[1]} for k = {k_s[1]}')
print('The correct answer choice is D')

Error list: [0.084, 0.056, 0.092, 0.124, 0.228]
List of k: [-2, -1, 0, 1, 2]
As we can see the minimum out of sample error is E_out = 0.056 for k = -1
The correct answer choice is D
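Rather than reading the position of the minimum off the printed list, the best `k` can be selected programmatically. The sketch below reuses the error values printed above; the names `k_values`, `E_out_values`, `best_index` and `best_k` are introduced here for illustration.

```python
import numpy as np

k_values = [-2, -1, 0, 1, 2]
E_out_values = [0.084, 0.056, 0.092, 0.124, 0.228]  # copied from the output above

best_index = int(np.argmin(E_out_values))  # position of the smallest out-of-sample error
best_k = k_values[best_index]
print(f'Minimum E_out = {E_out_values[best_index]} at k = {best_k}')
```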


# Question 6, Answer B

In [106]:
print(f'Minimum out of sample error: {min(E_out_list):.2f}')
print('The correct answer choice is B')

Minimum out of sample error: 0.06
The correct answer choice is B


# Question 7, Answer C
##### According to the definition of $H(Q, C, Q_0)$, $H(10, 0, 3)$ has a weight vector $w$ of the form $w = (w_1, w_2, 0, 0, 0, 0, 0, 0, 0, 0)$
##### This is equivalent to a weight vector from the hypothesis set $H_2$, which is of the form $w = (w_1, w_2)$
##### Similarly, $H(10, 0, 4)$ has a weight vector $w$ of the form $w = (w_1, w_2, w_3, 0, 0, 0, 0, 0, 0, 0)$
##### This is equivalent to a weight vector from the hypothesis set $H_3$, which is of the form $w = (w_1, w_2, w_3)$

##### We can see that any vector in $H_2$ can be generated by $H_3$ by letting $w_3=0$. Therefore, $H_2$ is a subset of $H_3$

##### We cannot make any generalizations about vectors from the sets $H(10, 1, 3)$ or $H(10, 1, 4)$, except that the former set is a subset of the latter set:

##### Let $w_a \in H(10, 1, 3)$. $w_a$ can thus be written as $w_a = (w_1, w_2, 1, 1, 1, 1, 1, 1, 1, 1)$
##### Let $w_b \in H(10, 1, 4)$. $w_b$ can thus be written as $w_b = (w_1, w_2, w_3, 1, 1, 1, 1, 1, 1, 1)$
##### $H(10, 1, 3) \subset H(10, 1, 4)$. However, we can't make any generalizations relating either of them to the definition of $H_Q$

##### a. $H_2 \cup H_3 = H_3$, since $H_3$ is a superset of $H_2$, so this choice is not correct
##### b. As stated earlier, we can't make any generalizations regarding the definition of $H_Q$ here
##### c. $H_2 \cap H_3 = H_2$, so this choice is correct
##### d. Again, we can't make any generalizations here
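
The reasoning behind choice c can be summarized in one line, using the identifications of $H(10, 0, 3)$ with $H_2$ and $H(10, 0, 4)$ with $H_3$ made above:

$$H(10, 0, 3) \cap H(10, 0, 4) = H_2 \cap H_3 = H_2, \quad \text{because } H_2 \subset H_3.$$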