# Question 1, Answer C
##### We can approach this problem by solving the provided equation for $N$ in terms of $\sigma$, $d$, and $E_D[E_{in}(w_{lin})]$
##### By simple algebra, we arrive at:
$$ N = \frac{\sigma ^2 (d+1)}{\sigma ^2 - E_D[E_{in}(w_{lin})]}$$
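The algebra can be spelled out step by step. This sketch assumes the provided equation is the expected in-sample error of linear regression with noise variance $\sigma^2$; the starting form is inferred from the result above rather than quoted from the problem statement.

$$
\begin{aligned}
E_D[E_{in}(w_{lin})] &= \sigma^2\left(1 - \frac{d+1}{N}\right) \\
\frac{\sigma^2 (d+1)}{N} &= \sigma^2 - E_D[E_{in}(w_{lin})] \\
N &= \frac{\sigma^2 (d+1)}{\sigma^2 - E_D[E_{in}(w_{lin})]}
\end{aligned}
$$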

In [1]:
import math

sigma = 0.1
d = 8
E_in_exp = 0.008

N_min = ((sigma ** 2) * (d + 1)) / ( (sigma ** 2) - E_in_exp)
print(f'A minimum of {math.ceil(N_min)} samples is needed to achieve an expected in-sample error of {E_in_exp}')
print(f'Among the answer choices, the smallest number of samples that gives an expected in-sample error greater than {E_in_exp} must be greater than {math.ceil(N_min)}. Therefore, the correct answer is C, 100 samples.')

A minimum of 45 samples is needed to achieve an expected in-sample error of 0.008
Among the answer choices, the smallest number of samples that gives an expected in-sample error greater than 0.008 must be greater than 45. Therefore, the correct answer is C, 100 samples.
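As a sanity check on the threshold, the expected in-sample error can be evaluated directly for a few sample sizes. This is a minimal sketch: `expected_E_in` is a helper name introduced here, the formula $\sigma^2 (1 - (d+1)/N)$ is assumed to be the provided equation, and the sample sizes are illustrative rather than the full list of answer choices.

```python
# Sketch: evaluate the expected in-sample error for a few sample sizes.
sigma = 0.1   # noise standard deviation (same value as in the cell above)
d = 8         # input dimension (same value as in the cell above)

def expected_E_in(N):
    """Assumed form of the provided equation: sigma^2 * (1 - (d + 1) / N)."""
    return sigma ** 2 * (1 - (d + 1) / N)

# Illustrative sample sizes (hypothetical, chosen to bracket the threshold)
for N in (25, 45, 100):
    print(N, expected_E_in(N))
```

The error sits at the 0.008 threshold for N = 45 and only exceeds it for larger N, which is why the answer must be a choice above 45.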


# Question 2, Answer D

##### The hypothesis is of the form: $h(x) = \text{sign}(w_0 + w_1x_1^2 + w_2x_2^2)$
##### At $(x_1, x_2) = (0, 0)$, the hypothesis should classify the point as +1
##### At this point, $h(x) = \text{sign}(w_0)$, so classifying the point as +1 requires $w_0 > 0$
##### When $x_1$ is large and $x_2 = 0$, the hypothesis should classify the point as -1
##### At such a point, $h(x) = \text{sign}(w_0 + w_1x_1^2) = -1$; since $w_0 > 0$, this implies that $w_1 < 0$
##### When $x_2$ is large and $x_1 = 0$, the hypothesis should classify the point as +1
##### At such a point, $h(x) = \text{sign}(w_0 + w_2x_2^2) = +1$; for this to hold as $x_2$ grows, $w_2$ must not be negative, so we take $w_2 > 0$
##### This matches answer choice D
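The sign argument above can be checked numerically. The sketch below uses hypothetical weights that satisfy $w_0 > 0$, $w_1 < 0$, $w_2 > 0$; the specific values and the helper `h` are assumptions for illustration, not part of the original problem.

```python
import numpy as np

def h(x1, x2, w):
    """Hypothesis sign(w0 + w1*x1^2 + w2*x2^2)."""
    w0, w1, w2 = w
    return int(np.sign(w0 + w1 * x1 ** 2 + w2 * x2 ** 2))

# Hypothetical weights with w0 > 0, w1 < 0, w2 > 0
w = (1.0, -1.0, 1.0)

print(h(0, 0, w))  # the origin
print(h(3, 0, w))  # large x1, x2 = 0
print(h(0, 3, w))  # large x2, x1 = 0
```

With these weights the three test points are classified as +1, -1, and +1 respectively, in line with the reasoning above.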

# Question 3, Answer C
##### There are 14 non-constant coordinates and 1 bias coordinate in $\Phi_4$
##### A linear model in this 15-coordinate space therefore has $d_{vc} \le 15$
##### Among the answer choices, the smallest value that is not smaller than the VC dimension is 15, which is choice C
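The coordinate count can be reproduced by enumerating monomials. This sketch assumes the input is two-dimensional, $(x_1, x_2)$ as in Question 2, and that $\Phi_4$ contains every monomial $x_1^i x_2^j$ with $i + j \le 4$; the variable names are introduced here for illustration.

```python
# Count monomials x1^i * x2^j with total degree i + j <= Q.
Q = 4  # order of the polynomial transform

monomials = [(i, j) for i in range(Q + 1) for j in range(Q + 1 - i)]

n_total = len(monomials)       # includes (0, 0), the constant (bias) term
n_nonconstant = n_total - 1    # coordinates other than the bias

print(n_nonconstant, n_total)
```

Under these assumptions the enumeration gives 14 non-constant coordinates plus the bias, 15 in total.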

# Question 4, Answer E
##### $E(u, v) = (ue^v - 2ve^{-u})^2$
##### $\frac{\partial{E}}{\partial{u}} = 2(ue^v - 2ve^{-u})(e^v + 2ve^{-u})$
##### This corresponds to answer choice E
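The partial derivative above can be checked against a central finite difference. This is a minimal sketch; the helper names, the test point, and the step size `h` are arbitrary choices made here, not part of the original solution.

```python
import math

def E(u, v):
    """Error surface E(u, v) = (u e^v - 2 v e^-u)^2."""
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

def dE_du(u, v):
    """Analytic partial derivative with respect to u, as derived above."""
    return 2 * (u * math.exp(v) - 2 * v * math.exp(-u)) * (math.exp(v) + 2 * v * math.exp(-u))

def dE_du_numeric(u, v, h=1e-6):
    """Central finite-difference approximation of dE/du."""
    return (E(u + h, v) - E(u - h, v)) / (2 * h)

u0, v0 = 1.0, 1.0  # arbitrary test point
print(dE_du(u0, v0), dE_du_numeric(u0, v0))
```

The two printed values should agree to several decimal places if the analytic expression is correct.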

# Question 5, Answer D

In [2]:
import numpy as np

def E(u, v):
    return (u*np.exp(v) - 2*v*np.exp(-u)) ** 2

def deltaE(u, v):
    partialE_partialu = 2 * (u*np.exp(v) - 2*v*np.exp(-u)) * (np.exp(v) + 2 *v* np.exp(-u))
    partialE_partialv = 2 * (u*np.exp(v) - 2*v*np.exp(-u)) * (u*np.exp(v) - 2*np.exp(-u))
    return np.array([partialE_partialu, partialE_partialv])

eta = 0.1 # learning rate
error_thresh = 10 ** -14 # threshold error

u, v = (1, 1) # initializing as 1, 1
iterations = 0
while E(u, v) > error_thresh:
    iterations += 1
    gradE = deltaE(u, v)
    
    # update weights vector
    u -= eta * gradE[0]
    v -= eta * gradE[1]
    
print(f'{iterations} iterations taken for algorithm to converge')
print('Therefore the correct answer choice is D')

10 iterations taken for algorithm to converge
Therefore the correct answer choice is D


# Question 6, Answer E

In [3]:
print('Final Weights: ')
print("{:.3f}".format(u), "{:.3f}".format(v))
print('Therefore the correct answer choice is E')

Final Weights: 
0.045 0.024
Therefore the correct answer choice is E


# Question 7, Answer A

In [4]:
import numpy as np

def E(u, v):
    return (u*np.exp(v) - 2*v*np.exp(-u)) ** 2

def deltaE(u, v):
    partialE_partialu = 2 * (u*np.exp(v) - 2*v*np.exp(-u)) * (np.exp(v) + 2 *v* np.exp(-u))
    partialE_partialv = 2 * (u*np.exp(v) - 2*v*np.exp(-u)) * (u*np.exp(v) - 2*np.exp(-u))
    return np.array([partialE_partialu, partialE_partialv])

eta = 0.1 # learning rate
error_thresh = 10 ** -14 # threshold error

u, v = (1, 1) # initializing as 1, 1
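for i in range(15):
    # coordinate descent: first step along u only
    gradE = deltaE(u, v)
    u -= eta * gradE[0]
    # then re-evaluate the gradient at the updated u before stepping along v
    gradE = deltaE(u, v)
    v -= eta * gradE[1]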

print('Final error measure')
print(E(u, v))
print('Therefore the correct answer choice is A, 0.1')

Final error measure
0.13981379199615315
Therefore the correct answer choice is A, 0.1


# Question 8, Answer D
# Question 9, Answer A

In [63]:
import numpy as np
import matplotlib.pyplot as plt

N_experiments = 100
N_points = 100
N_test_points = 1000
eta = 0.01

def generate_line():
    # generate two random points
    p1 = np.random.uniform(-1, 1, size=2)
    p2 = np.random.uniform(-1, 1, size=2)
    
    # Using points to get the parameters m and c for y = mx + c
    m = (p1[1] -p2[1])/(p1[0] - p2[0]) # m = (y1 - y2)/ (x1 -x2)
    c = p1[1] - m*p1[0] # c = y - mx   
    
    w_target = np.array([c, m, -1])
    
    return w_target


def get_input_points(num_points):
    # draw points uniformly from [-1, 1] x [-1, 1]
    X = np.random.uniform(-1, 1, size=(num_points, 2))
    # prepend a column of ones for the bias term; each row of the result is one input x
    return np.hstack((np.ones((num_points, 1)), X))

def generate_dataset(w_target, N_points):
    X = get_input_points(N_points)
    
    y_n = np.sign(np.dot(X, w_target))
    
    return X, y_n


def deltaE(w, x, y):
    return (-y * x)/(1 + np.exp(y  * np.dot(w, x)))
    
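def logistic_reg_fit(w_target, X, y_n):
    # w_target is used only for its shape; the weights start at zero
    epochs = 0
    w = np.zeros(np.shape(w_target))
    
    while True:     
        prev_w = w  # weights at the start of this epoch
        
        # one epoch of SGD: visit every training point once, in random order
        for n in np.random.permutation(X.shape[0]):
            gradE = deltaE(w, X[n,:], y_n[n])
            w = w - eta * gradE
        
        epochs += 1
        
        # stop when the weights move less than 0.01 over a full epoch
        if np.linalg.norm(w - prev_w) < 0.01:
            break
    return w, epochs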

def cross_entropy_error(w, x, y):
    return np.log(1 + np.exp(-y * np.dot(w, x)))


def calculate_E_out(w_log_reg, X_test, y_n_test):
    point_errs = []
    
    for point_index in range(X_test.shape[0]):
        point_errs.append(cross_entropy_error(w_log_reg, X_test[point_index, :], y_n_test[point_index]))
    
    E_out = np.mean(point_errs)
    return E_out

E_out_list = []
epochs_list = [] 
for i in range(N_experiments):
    w_target = generate_line()
    X, y_n = generate_dataset(w_target, N_points) # get target function and training points

    # fitting to training points
    w_final, epochs = logistic_reg_fit(w_target, X, y_n)

    # test points, cross entropy error
    X_test, y_n_test = generate_dataset(w_target, N_test_points)

    E_out = calculate_E_out(w_final, X_test, y_n_test)
    E_out_list.append(E_out)
    epochs_list.append(epochs)

print(f'Mean out of sample error, {np.mean(E_out_list)}')
print('Therefore, Question 8 Answer Choice D is correct')

print(f'Mean number of epochs needed for convergence, {np.mean(epochs_list)}')
print('Therefore, Question 9 Answer Choice A is correct')

Mean out of sample error, 0.10199060311763469
Therefore Answer Choice D is correct


# Question 10, Answer E