# Homework 8 - Gaussian Process Regression
This homework will explore:
- parameter values of covariance functions
- new covariance functions (incl. products of covariance functions)

You should have downloaded:
- 'x_train.csv'
- 'y_train.csv'
- 'x_test.csv'

The data are synthetically generated, but let's assume they capture car price trends from 2016 to 2022:
- x-axis: year; the fractional part indicates when during the year the data point was obtained
- y-axis: car price, in units of \$10,000

## 1 Training and Test Data
### 1.1 Load and plot data
1. [1 pt] Import `x_train`, `y_train`, `x_test`, and store them accordingly.
2. [3 pt] Plot the training data as a scatter plot.
    - Set xlim from the minimum in x_test to the maximum in x_test.
    - Set ylim from 0 to 10.
    - Label axes.
    - Include a legend.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

x_train = None          # TODO
y_train = None          # TODO
x_test = None          # TODO

# TODO plot

### 1.2 Describe training and test data
1. [1 pt] Describe the training data. Based on the scatter plot, what is your guess for the short-term pattern and long-term trajectory of the unknown function that the data points are sampled from?

    **Ans:** (type response here)

2. [1 pt] Describe the test points. At which x-values are we testing?

    **Ans:** (type response here)


## 2 Plotting functions
### 2.1 [5 pt] Complete the function `plot_pred` below.
- There should be 3 plots in the same figure:
    1. Scatter plot of training data
    2. Curve of prediction
    3. 95% confidence region
- Set xlim from the minimum in x_test to the maximum in x_test.
- Set ylim from 0 to 10.
- Include a legend.
- Include a title, using the argument `plottitle`, which is a string.
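A 95% confidence region for a Gaussian predictive distribution is usually drawn as mean ± 1.96 standard deviations. The sketch below assumes `pred` and `std` are 1-D arrays (as returned by sklearn's `predict(..., return_std=True)`) and that the x-values are sorted; `confidence_band` is a hypothetical helper name and the arrays are toy values, not the homework data.

```python
import numpy as np
import matplotlib.pyplot as plt

def confidence_band(pred, std, z=1.96):
    """Hypothetical helper: lower/upper bounds of a symmetric Gaussian interval.

    z=1.96 gives roughly 95% coverage for a normal distribution.
    """
    pred = np.asarray(pred).ravel()
    std = np.asarray(std).ravel()
    return pred - z * std, pred + z * std

# Toy stand-ins for GPR output (not the homework data)
x_demo = np.linspace(0.0, 1.0, 5)
pred_demo = np.sin(2 * np.pi * x_demo)
std_demo = np.full_like(x_demo, 0.1)

lower, upper = confidence_band(pred_demo, std_demo)

fig, ax = plt.subplots()
ax.plot(x_demo, pred_demo, label="prediction")
# fill_between shades the area between the two curves; x must be sorted
ax.fill_between(x_demo, lower, upper, alpha=0.3, label="95% confidence region")
ax.legend()
plt.close(fig)  # close the demo figure so it does not linger
```

If `x_test` is not sorted, sort it (and reorder `pred` and `std` with the same index array) before calling `fill_between`, otherwise the shaded region will zig-zag.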

In [12]:
def plot_pred(x_train, y_train, x_test, pred, std, plottitle):
    # TODO plot 3 things

    # TODO all other features of plot

    pass

### 2.2 [3 pt] Complete the function `plot_cov_fn` below.
- This function should plot the covariance function $k(x,x')$ against the distance between points $x-x'$, similar to those you see in lecture and section.
- Read the sklearn documentation for the RBF kernel, paying attention to how its `__call__` method works.
    - You may need to reshape arrays with `.reshape(-1, 1)` for compatibility with the sklearn GPR package.
- Set xlim so that the horizontal axis spans $-5 \leq x-x^\prime \leq 5$.
- Set ylim from 0 to 1.1.
- Include appropriate x and y labels.
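As a sketch of how the `__call__` method behaves: `kernel(X, Y)` returns the covariance matrix of shape `(len(X), len(Y))`, with both inputs 2-D of shape `(n_samples, n_features)`. For a stationary kernel (one that depends only on $x-x'$), fixing $x'=0$ turns one column of that matrix into $k$ as a function of distance. The `length_scale=1.0` value here is illustrative only.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

kernel = RBF(length_scale=1.0)  # illustrative value, not a tuned choice

# Distances x - x' on a grid; sklearn expects 2-D inputs (n_samples, n_features)
x_minus_xprime = np.linspace(-5, 5, 201)
X = x_minus_xprime.reshape(-1, 1)
origin = np.zeros((1, 1))  # fix x' = 0 so that x - x' = x

# kernel(X, Y) returns the covariance matrix, here of shape (201, 1)
cov_matrix = kernel(X, origin)
cov_values = np.squeeze(cov_matrix)  # drop the singleton column -> shape (201,)
```

The same two lines work unchanged for the ESS and product kernels later on, since they are also stationary.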

In [None]:
def plot_cov_fn(kernel):
    x_minus_xprime = None           # TODO
    cov_values = None           # TODO, you may need to use np.squeeze() on the array returned by sklearn

    # TODO plot
    plt.plot(x_minus_xprime, cov_values)
    plt.xlabel(r'$x-x^\prime$'); plt.ylabel(r'$k(x,x^\prime)$'); plt.title('covariance function')
    plt.ylim([0,1.1])
    plt.show()

## 3 Radial Basis Function (RBF) Covariance Function
The Squared Exponential (SE) covariance function is also known as the Radial Basis Function (RBF) covariance function; RBF is the name used in sklearn. The formula is
$$
k(x,x') = \exp\left(-\frac{(x-x')^2}{2\ell^2}\right).
$$
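A quick numerical check of the formula helps build intuition for $\ell$: at a distance of one length scale the covariance is $e^{-1/2} \approx 0.61$, and at two length scales it is $e^{-2} \approx 0.14$. The sketch below compares sklearn's `RBF` against the formula directly; `ell = 2.0` is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

ell = 2.0  # illustrative length scale
kernel = RBF(length_scale=ell)

d = np.array([0.0, ell, 2 * ell])  # distances x - x'
# Evaluate k(d, 0) with sklearn and with the closed-form expression
k_sklearn = np.squeeze(kernel(d.reshape(-1, 1), np.zeros((1, 1))))
k_formula = np.exp(-d**2 / (2 * ell**2))
```

Larger $\ell$ therefore means correlation decays more slowly, giving smoother functions.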

### 3.1 Define GPR model with RBF and Predict
Import the relevant sklearn classes for GPR and the RBF kernel.
1. [2 pt] Define `kernel_rbf`, the RBF kernel.
    - The length scale parameter is a value you will have to experiment with.
    - Find the value that gives the best possible fit. You can check this by completing the next few cells and plotting the prediction. 
    
    (The best value should at least fit the training data well, though not necessarily the test points.)
2. [1 pt] Define `gp_rbf`, a GPR object with the RBF kernel. 
    - Set the noise parameter alpha to 0.01.
    - Set optimizer to None. (You can try using the optimizer, but it may or may not find good parameters.)
3. [1 pt] Fit gp_rbf to the training data
    - You may need to reshape your training data with .reshape(-1, 1) to make it compatible with sklearn.
4. [1 pt] Predict the values at test points, obtain the mean and standard deviation of your predictions, storing them as `mean_pred_rbf`, `std_pred_rbf`.
    - Again, you may need to reshape your test data.
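The define/fit/predict pattern above can be sketched on synthetic stand-in data (not the homework CSVs). With `optimizer=None` the kernel hyperparameters stay exactly as given, and `return_std=True` makes `predict` return a `(mean, std)` pair. The sine data and `length_scale=1.0` are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in data (NOT the homework CSVs)
rng = np.random.default_rng(0)
x_demo_train = np.linspace(0, 5, 20)
y_demo_train = np.sin(x_demo_train) + 0.1 * rng.standard_normal(x_demo_train.shape)
x_demo_test = np.linspace(-1, 7, 50)

# optimizer=None keeps length_scale fixed at the value given here
gp_demo = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.01, optimizer=None)
gp_demo.fit(x_demo_train.reshape(-1, 1), y_demo_train)

# return_std=True makes predict return (mean, standard deviation)
mean_demo, std_demo = gp_demo.predict(x_demo_test.reshape(-1, 1), return_std=True)
```

Because `normalize_y` defaults to `False`, the prior mean is zero: far from the training data the RBF prediction decays toward 0 and the standard deviation grows toward 1, which is worth keeping in mind when you look at the test range.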

In [23]:
# TODO import packages

kernel_rbf = None       # TODO
gp_rbf = None       # TODO
mean_pred_rbf, std_pred_rbf = None, None       # TODO (unpacking a single None would raise a TypeError)

### 3.2 Plot Prediction and Confidence Interval for GPR with RBF
1. [2 pt] Using the function plot_pred(), plot your predictions. 
    - Make sure you have a meaningful title.

2. [1 pt] Plot the covariance function  used in your GPR model, using plot_cov_fn that you wrote earlier.

In [None]:
# TODO plot prediction and covariance function

## 4 Exponential Sine Squared (ESS) Covariance Function
This is another possible covariance function we could use. Let's see what this covariance function looks like and how it performs. The formula is
$$
k(x,x') = \exp\left(-\frac{2 \left[\sin(\pi|x-x'|/p) \right]^2}{\ell^2}\right).
$$
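To see what $p$ and $\ell$ do, the sketch below compares sklearn's `ExpSineSquared` with the formula. The covariance returns to exactly 1 at every multiple of $p$, so points one period apart are treated as perfectly correlated, and it reaches its minimum $e^{-2/\ell^2}$ at half a period. The values `ell = 1.0` and `p = 1.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process.kernels import ExpSineSquared

ell, p = 1.0, 1.5  # illustrative values, not tuned for the homework data
kernel = ExpSineSquared(length_scale=ell, periodicity=p)

d = np.array([0.0, 0.25 * p, 0.5 * p, p, 2 * p])  # distances x - x'
# Evaluate k(d, 0) with sklearn and with the closed-form expression
k_sklearn = np.squeeze(kernel(d.reshape(-1, 1), np.zeros((1, 1))))
k_formula = np.exp(-2 * np.sin(np.pi * np.abs(d) / p) ** 2 / ell**2)
```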

### 4.1 Define GPR model with ESS and Predict
Import the relevant sklearn class for the ESS kernel, `ExpSineSquared`.
1. [2 pt] Define `kernel_ess`, the ESS kernel.
    - The length scale parameter is a value you will have to experiment with.
    - The periodicity parameter should be clear from the training data: roughly what period can you see by eye?
    - Find the parameters that give the best possible fit. You can check this by completing the next few cells and plotting the prediction.
    
    (Even the best parameters should only capture the periodicity of the training data; the fit will otherwise be poor.)
2. [1 pt] Define `gp_ess`, GPR object with ESS kernel. 
    - Set noise parameter alpha to 0.01.
    - Set optimizer to None.
3. [1 pt] Fit gp_ess to the training data
4. [1 pt] Predict the values at test points, storing them as `mean_pred_ess`, `std_pred_ess`.
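One way to see why an ESS-only model can capture periodicity but not a long-term trend: since $k$ depends on $x-x'$ only through $\sin^2$, shifting a test point by a whole number of periods leaves every entry of $k(x_*, X)$ unchanged, so the predicted mean is identical. The sketch below assumes a toy signal (a sine plus an upward trend) and illustrative parameters; it is not the homework data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared

p = 1.0  # illustrative period
rng = np.random.default_rng(1)
x_demo_train = np.linspace(0, 3, 30)
# Periodic signal plus an upward trend that the ESS kernel alone cannot represent
y_demo_train = (np.sin(2 * np.pi * x_demo_train / p) + 0.5 * x_demo_train
                + 0.05 * rng.standard_normal(30))

gp_demo = GaussianProcessRegressor(kernel=ExpSineSquared(length_scale=1.0, periodicity=p),
                                   alpha=0.01, optimizer=None)
gp_demo.fit(x_demo_train.reshape(-1, 1), y_demo_train)

x_a = np.linspace(3.0, 4.0, 11).reshape(-1, 1)
mean_a = gp_demo.predict(x_a)
mean_b = gp_demo.predict(x_a + 5 * p)  # same points shifted by five periods
```

The two predictions match, so the model simply repeats one period forever instead of following the upward trend.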

In [29]:
# TODO import packages
kernel_ess = None       # TODO
gp_ess = None       # TODO
mean_pred_ess, std_pred_ess = None, None       # TODO (unpacking a single None would raise a TypeError)

### 4.2 Plot GPR with ESS kernel
1. [2 pt] Using the function plot_pred(), plot your predictions. 
    - Make sure you have a meaningful title.

2. [1 pt] Plot the ESS covariance function used in your GPR model.

In [None]:
# TODO plot prediction and covariance function

## 5 Product of SE and ESS Covariance Function
We can build new covariance functions from existing ones by taking their product, getting the best of both worlds. The formula is
$$
k(x,x') = \exp\left(-\frac{(x-x')^2}{2\ell_1^2}\right) \cdot  \exp\left(-\frac{2 \left[\sin(\pi|x-x'|/p) \right]^2}{\ell_2^2}\right).
$$
Note: There are 3 parameters, $\ell_1, \ell_2, p$.
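In sklearn, multiplying two kernels with `*` builds a `Product` kernel whose value is the elementwise product of the factors, matching the formula above. The sketch checks this numerically and shows that, unlike pure ESS, the product no longer returns to 1 after one period: the SE factor damps it to $\exp(-p^2/(2\ell_1^2))$, so the pattern can drift slowly over many periods. The values of $\ell_1$, $\ell_2$, $p$ are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

ell_1, ell_2, p = 3.0, 1.0, 1.0  # illustrative values only
k_se = RBF(length_scale=ell_1)
k_ess = ExpSineSquared(length_scale=ell_2, periodicity=p)
k_pdt = k_se * k_ess  # the * operator builds a Product kernel

d = np.linspace(-5, 5, 101).reshape(-1, 1)
origin = np.zeros((1, 1))
k_product = np.squeeze(k_pdt(d, origin))
k_separate = np.squeeze(k_se(d, origin)) * np.squeeze(k_ess(d, origin))

# Covariance between points exactly one period apart
k_one_period = k_pdt(np.array([[p]]), origin)[0, 0]
```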
### 5.1 Define GPR model with SE x ESS covariance function
1. [2 pt] Define `kernel_pdt`, which is the SE kernel multiplied by ESS kernel.
    - The length scales and periodicity are values you will have to experiment with. _They may differ from the optimal parameters you found in previous parts._
    
    (The best parameters should be able to both interpolate and extrapolate the training data well.)

2. [1 pt] Define `gp_pdt`, a GPR object with the SE x ESS kernel. 
    - Set noise parameter alpha to 0.01.
    - Set optimizer to None.
3. [1 pt] Fit gp_pdt to the training data
4. [1 pt] Predict the values at test points, obtain the mean and standard deviation of your predictions, storing them as `mean_pred_pdt`, `std_pred_pdt`.
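While experimenting with the three parameters, it can help to know how sklearn names them inside a product kernel: the left factor is `k1` and the right factor is `k2`, so the parameters appear as `k1__length_scale`, `k2__length_scale`, and `k2__periodicity`. The sketch below assumes RBF is written first; the numeric values are illustrative only.

```python
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

kernel_demo = RBF(length_scale=3.0) * ExpSineSquared(length_scale=1.0, periodicity=1.0)

params = kernel_demo.get_params()
# For a Product kernel, k1 is the left factor (RBF), k2 the right (ExpSineSquared)
print(params["k1__length_scale"], params["k2__length_scale"], params["k2__periodicity"])

# set_params updates nested parameters in place and returns the kernel itself
kernel_demo.set_params(k1__length_scale=5.0, k2__periodicity=1.2)
```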

In [31]:
# TODO import packages
kernel_pdt = None       # TODO
gp_pdt = None       # TODO
mean_pred_pdt, std_pred_pdt = None, None       # TODO (unpacking a single None would raise a TypeError)

### 5.2 Plot GPR with SE x ESS
1. [2 pt] Using the function plot_pred(), plot your predictions. 
    - Make sure you have a meaningful title.
2. [1 pt] Plot the product covariance function used in your GPR model.

In [None]:
# TODO plot prediction and covariance function

## 6 Prediction of Car Prices
1. [2 pt] Based on your prediction of car prices using GPR with the product covariance function (SE x ESS), in what year do car prices first rise above \$45,000 (4.5 on the y-axis) and stay above it from then on? 
    - The year you answer with should be within the test range.
    - Your answer should include any relevant plots or code that verifies your answer.

    **Ans:** (type response here)
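One possible way to check this numerically, assuming "always above" is judged on the predicted mean over a sorted test grid (a stricter reading could use the lower 95% bound, mean minus 1.96 times std, instead): find the last grid point at or below the threshold and take the next one. `first_x_always_above` is a hypothetical helper, and the arrays below are toy values, not real predictions.

```python
import numpy as np

def first_x_always_above(x, values, threshold):
    """Hypothetical helper: smallest x from which `values` stays strictly above `threshold`.

    Assumes x is sorted ascending. Returns None if the last value is not above threshold.
    """
    x = np.asarray(x).ravel()
    values = np.asarray(values).ravel()
    below = np.flatnonzero(values <= threshold)
    if below.size == 0:
        return x[0]  # above the threshold everywhere on the grid
    last_below = below[-1]
    if last_below == len(x) - 1:
        return None  # still at or below the threshold at the end of the grid
    return x[last_below + 1]

threshold = 4.5  # 45,000 USD in units of 10,000 USD
x_grid = np.array([2020.0, 2020.5, 2021.0, 2021.5, 2022.0])   # toy grid
mean_grid = np.array([4.0, 4.6, 4.4, 4.7, 4.9])               # toy mean predictions
print(first_x_always_above(x_grid, mean_grid, threshold))  # 2021.5
```

The answer can only be as precise as the spacing of `x_test`, so a finer grid gives a sharper crossing point.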

In [None]:
# TODO code for how you verified your answer