In [None]:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA
import random

#### 1) Run the following cell to obtain the data points $(x,y)$ in tabular format. Replace the NaN values with values that make intuitive sense given the other data. Feel free to use the data-generating process for $y$, provided in the code below.

In [None]:
x = np.linspace(-5, 5, 100)
y = 3 * x - 10 + np.random.normal(0, 2, 100)
y[[0, 24, 50, 78, 99]] = np.nan  # mark five entries as missing
dat = pd.DataFrame({'x':x, 'y':y}).sample(frac = 1)
dat.reset_index(drop=True, inplace=True)
dat

In [None]:
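# SOLUTION
# Fill each missing y with the noise-free value of the generating process, 3x - 10.
# Using .loc with a boolean mask avoids chained assignment (dat['y'][i] = ...).
missing = dat['y'].isnull()
dat.loc[missing, 'y'] = 3 * dat.loc[missing, 'x'] - 10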
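
If the generating process were not known, one alternative is to estimate the line from the observed rows and use it to fill the gaps. The sketch below assumes a straight-line fit via `np.polyfit`; the names `demo`, `observed`, and the fixed seed are illustrative and not part of the original exercise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
x = np.linspace(-5, 5, 100)
y = 3 * x - 10 + rng.normal(0, 2, 100)
y[[0, 24, 50, 78, 99]] = np.nan
demo = pd.DataFrame({'x': x, 'y': y})

# Fit a degree-1 polynomial using only the rows where y is observed
observed = demo['y'].notna()
slope, intercept = np.polyfit(demo.loc[observed, 'x'], demo.loc[observed, 'y'], deg=1)

# Replace the missing values with the fitted line's predictions
demo.loc[~observed, 'y'] = slope * demo.loc[~observed, 'x'] + intercept
```

The fitted slope and intercept should land close to 3 and -10, so the imputed values end up near those of the solution above.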

#### 2) Compute the SVD of the following matrix and print its singular values (in array or matrix form).

In [None]:
# SOLUTION
A = np.array([2, 1, -2, 1, 0, 0, 0, 1, 0]).reshape(3,3)
# np.linalg.svd returns U, the singular values (in descending order), and V^T
u, s, vt = np.linalg.svd(A)
s
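
As a sanity check, the factors should rebuild the matrix, and the singular values should equal the square roots of the eigenvalues of $A^T A$. A minimal sketch (the names `A_rebuilt` and `s_from_eig` are illustrative):

```python
import numpy as np

A = np.array([2, 1, -2, 1, 0, 0, 0, 1, 0]).reshape(3, 3)
u, s, vt = np.linalg.svd(A)

# Rebuild A from U, diag(s), and V^T
A_rebuilt = u @ np.diag(s) @ vt

# Eigenvalues of the symmetric matrix A^T A, sorted in descending order;
# clip guards against tiny negative values from round-off
eig_ata = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
s_from_eig = np.sqrt(np.clip(eig_ata, 0, None))

print(np.allclose(A_rebuilt, A), np.allclose(s, s_from_eig))
```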

#### 3) Order the eigenvalues of the following matrix and find the eigenvector corresponding to the largest eigenvalue. Hint: you can use the $\verb|np.argmax|$ function.

In [None]:
# SOLUTION
A = np.array([2, 1, -2, 1, 0, 0, 0, 1, 0]).reshape(3,3)
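l, v = np.linalg.eig(A)
# Eigenvalues in descending order
print(np.sort(l)[::-1])
# Eigenvectors are the columns of v, so select the column, not the row
max_l = np.argmax(l)
v[:, max_l]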
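
Because `np.linalg.eig` returns the eigenvectors as the columns of its second output, it is worth confirming that the selected vector satisfies $Av = \lambda v$. A short check, with `lam`, `vec`, and `residual` as illustrative names:

```python
import numpy as np

A = np.array([2, 1, -2, 1, 0, 0, 0, 1, 0]).reshape(3, 3)
l, v = np.linalg.eig(A)

# The eigenvalues of this matrix are real (2, 1, -1), so argmax is well defined
top = np.argmax(l)
lam, vec = l[top], v[:, top]

# For a true eigenpair, A @ vec - lam * vec is zero up to round-off
residual = np.linalg.norm(A @ vec - lam * vec)
print(lam, vec, residual)
```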


#### 4) Find the SVD of the following matrix and scale the last two singular values to $0.1s_i(i+1)$, where $s_i$ is the $i^{th}$ singular value. Print the original X and the new X created with the scaled singular values and compare.

In [None]:
# SOLUTION
X = np.array([3,4,5,10,16,2,-1,2,1]).reshape(3,3)
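u, s, vt = np.linalg.svd(X)
# Scale the last two singular values to 0.1 * s_i * (i + 1),
# assuming i is the 0-based index of the singular value
for i in range(1, len(s)):
    s[i] = 0.1 * s[i] * (i + 1)
s = np.diag(s)
u @ s @ vt, X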

#### 5) You have some data $x$ and $y$ provided below. Assume the true generating function is linear, i.e. $y = w_0 + w_1 x + \epsilon$ where $\epsilon$ is random noise. First, construct a feature matrix $X$ with one column each for the constant $1$, $x$, and $x^2$.

In [None]:
# SOLUTION
x = np.linspace(-5, 5, 100)
y = 3*x + 3 + np.random.normal(0, 1, 100)
ones = np.ones(100)
x_sq = x ** 2
# Stack the three features as columns: X has shape (100, 3)
X = np.column_stack([ones, x, x_sq])

#### 6) Run PCA on your feature matrix $X$ with $\verb|n_components = 2|$. Report your observations.

In [None]:
# SOLUTION
pca = PCA(n_components = 2)
pca.fit(X)
pca_X = pca.transform(X)
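
One way to ground the observations is to inspect the explained variance ratios and the component loadings. The sketch below rebuilds the same feature matrix so that it runs on its own; `ratio` and `components` are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.linspace(-5, 5, 100)
X = np.column_stack([np.ones(100), x, x ** 2])

pca = PCA(n_components=2).fit(X)
ratio = pca.explained_variance_ratio_   # share of variance per component
components = pca.components_            # rows are directions in feature space

print(ratio)
print(components)
```

The column of ones has zero variance, so its loadings are zero and two components capture all of the variance. On this symmetric grid $x$ and $x^2$ are uncorrelated and $x^2$ has the larger spread, so the first component lines up with the $x^2$ column even though $y$ depends only on $x$.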

#### 7) Now, run CCA on your feature matrix $X$. Report the difference between PCA and CCA for this problem.

In [None]:
# SOLUTION
# y has a single target column, so CCA supports at most one component here
cca = CCA(n_components=1)
cca.fit(X, y)
cca_X = cca.transform(X)

#### 8) We have provided some data with various classes below. You are tasked with running regression on the classes, similar to the Baseball problem. First, create a featurized matrix that one-hot-encodes the classes. Only create columns for classes B and C. Class A will act as the "base case."

In [None]:
x = np.linspace(-5, 5, 100)
y = 3 * x - 10 + np.random.normal(0, 2, 100)
classes = ['A', 'B', 'C']
classes_dat = random.choices(classes, k=100)
dat = pd.DataFrame({'x':x, 'y':y, 'Class':classes_dat}).sample(frac = 1)
dat.reset_index(drop=True, inplace=True)
dat

In [None]:
# SOLUTION
# Add a 0/1 indicator column for every class except the base case A
for c in classes[1:]:
    dat[c] = (dat['Class'] == c).astype(float)
dat
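
The same encoding can also be produced with `pd.get_dummies`, where `drop_first=True` drops the first category in sorted order (here A) so that it serves as the base case. A small sketch on a toy frame; `demo` and `dummies` are illustrative names, not part of the exercise data.

```python
import pandas as pd

demo = pd.DataFrame({'Class': ['A', 'B', 'C', 'B', 'A']})

# One indicator column per class, minus the first (A), stored as floats
dummies = pd.get_dummies(demo['Class'], drop_first=True, dtype=float)
demo = pd.concat([demo, dummies], axis=1)
print(demo)
```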

#### 9) What is the main difference between CCA and PCA?

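**Solution:** PCA is unsupervised, while CCA is supervised in the sense that it uses a second dataset (here, $y$). PCA finds the directions (linear combinations of the features) given by the eigenvectors of the covariance matrix of $X$ with the largest eigenvalues; these are the directions of maximum variance in $X$, chosen without any reference to $y$. CCA instead finds linear combinations of the two datasets that are maximally correlated with each other, so the directions it picks are the ones most useful for relating $X$ to $y$ linearly.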

#### 10) Explain why during one-hot-encoding, we keep one class as the "base case." What would happen if we one-hot-encoded all the classes and tried to solve for the least squares solution? Hint: recall the least squares solution involves the term $(X^T X)^{-1}$. In order for $X^T X$ to be invertible, what must be true?

**Solution:** Consider a basic example with two classes, A and B, and only five data points. Suppose you one-hot-encode both classes to obtain the vectors $I_A = (1, 0, 1, 1, 1)$ and $I_B = (0, 1, 0, 0, 0)$. Every point belongs to exactly one class, so $I_A + I_B = \mathbf{1}$, the all-ones vector. The feature matrix $X$ also contains a column of ones for the intercept, so its columns are linearly dependent. $X^T X$ is invertible only if the columns of $X$ are linearly independent (full column rank), so here $X^T X$ is singular and the least squares solution is not unique. The same argument extends to any number of classes $k$: the $k$ indicator columns always sum to the intercept column. Dropping one class as the base case removes the dependency.
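
The five-point example can be checked directly by comparing the rank of $X^T X$ with and without the base case dropped. A minimal sketch, assuming the feature matrix includes an intercept column; `X_full` and `X_base` are illustrative names.

```python
import numpy as np

I_A = np.array([1, 0, 1, 1, 1])
I_B = np.array([0, 1, 0, 0, 0])
ones = np.ones(5)

# Intercept plus both indicators: the columns satisfy ones = I_A + I_B
X_full = np.column_stack([ones, I_A, I_B])
# Intercept plus the indicator for B only; A is the base case
X_base = np.column_stack([ones, I_B])

rank_full = np.linalg.matrix_rank(X_full.T @ X_full)  # 2, but the matrix is 3x3
rank_base = np.linalg.matrix_rank(X_base.T @ X_base)  # 2, and the matrix is 2x2
print(rank_full, rank_base)
```

With all classes encoded, the $3 \times 3$ matrix $X^T X$ has rank 2 and cannot be inverted; with the base case dropped, the $2 \times 2$ matrix has full rank.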