# Worksheet D

#### *variationalform* <https://variationalform.github.io/>

#### *Just Enough: progress at pace*

<https://variationalform.github.io/>

<https://github.com/variationalform>

Simon Shaw
<https://www.brunel.ac.uk/people/simon-shaw>.

<table>
<tr>
<td>
<img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:18px"/>
<img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:18px"/>
<img src="https://mirrors.creativecommons.org/presskit/icons/sa.svg?ref=chooser-v1" style="height:18px"/>
</td>
<td>

<p>
This work is licensed under CC BY-SA 4.0 (Attribution-ShareAlike 4.0 International)

<p>
Visit <a href="http://creativecommons.org/licenses/by-sa/4.0/">http://creativecommons.org/licenses/by-sa/4.0/</a> to see the terms.
</td>
</tr>
</table>

<table>
<tr>
<td>This document uses python</td>
<td>
<img src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png" style="height:30px"/>
</td>
<td>and also makes use of LaTeX </td>
<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/92/LaTeX_logo.svg/320px-LaTeX_logo.svg.png" style="height:30px"/>
</td>
<td>in Markdown</td> 
<td>
<img src="https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png" style="height:30px"/>
</td>
</tr>
</table>

## What this is about:

This worksheet is based on the material in the notebook

- regress: polynomial and logistic regression.

Note that while the 'lecture' notebooks are prefixed with `1_`, `2_` and so on,
to indicate the order in which they should be studied, the worksheets are prefixed
with `A_`, `B_`, ...

In [None]:
# useful imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model

### Exercise 1
A straight line has gradient $m=2$ and $y$-intercept
$c=4$. Sketch it, and determine the value of $x$ for which $y=8$.

In [None]:
# Answer here - create more cells as necessary

### Exercise 2
If a straight line with gradient $m$ passes through the point $(x_0,y_0)$,
show that $y-y_0 = m(x-x_0)$. This is
called the *point slope* form.

In [None]:
# Answer here - create more cells as necessary

### Exercise 3
A line with gradient $m=5$ passes through $(x,y)=(-1,2)$. Find the
equation of the line in the form $y=mx+c$.

In [None]:
# Answer here - create more cells as necessary

### Exercise 4

Recall Anscombe's data set. We used the following code to split it into its four
sub-sets, and we also produced scatterplots, as shown below for the first data
subset.

```
dfa = sns.load_dataset('anscombe')
print("The size of Anscombe's data set is:", dfa.shape)
dfa.dataset.unique()
dfa1 = dfa.loc[dfa['dataset'] == 'I']
dfa2 = dfa.loc[dfa['dataset'] == 'II']
dfa3 = dfa.loc[dfa['dataset'] == 'III']
dfa4 = dfa.loc[dfa['dataset'] == 'IV']
sns.scatterplot(data=dfa1, x="x", y="y")
dfa1.describe()
```

Implement linear regression for this first dataset `dfa1`. Then
implement ridge and LASSO regression. Plot your regression
lines on the same plot and include the underlying data.

You might find the following useful:

```
dfreg = dfa1.sort_values('x', ascending = True).reset_index(drop=True)
```

After this you can either reassign `dfa1 = dfreg` or work directly
with `dfreg`.
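Sorting helps because `plt.plot` joins consecutive points in the order it receives them, so sorting by `x` keeps each fitted line drawn left to right (essential for curved fits, tidier for straight ones). `reset_index(drop=True)` then discards the old row labels so the sorted rows are numbered `0, 1, 2, ...`. The sketch below hard-codes subset I of Anscombe's published data so it runs without a download; the values should match those in `sns.load_dataset('anscombe')`.

```python
import pandas as pd

# Subset I of Anscombe's quartet, hard-coded so the sketch runs offline
dfa1 = pd.DataFrame({
    'dataset': ['I'] * 11,
    'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    'y': [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
})

# Sort rows by x, then renumber them 0..n-1 (drop=True discards the old index)
dfreg = dfa1.sort_values('x', ascending=True).reset_index(drop=True)
print(dfreg.head())
```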


In [None]:
# Answer here - create more cells as necessary

### Exercise 5

Repeat Exercise 4 but with `dfa2`.

In [None]:
# Answer here - create more cells as necessary

### Exercise 6

Repeat Exercise 5 but with `dfa3`.

In [None]:
# Answer here - create more cells as necessary

### Exercise 7

Repeat Exercise 6 but with `dfa4`.

In [None]:
# Answer here - create more cells as necessary

# Outline Suggested Solutions

The following are suggestions for solutions of the above problems. 
Please have a go first though before looking at these.


### Solution 1
The line passes through the vertical axis at $y=4$ and climbs a
vertical distance of $2$ for every unit of horizontal distance.
When $y=8$, we have from $y=mx+c$ that $x=(y-c)/m = 2$. 
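A quick computational sketch of the line, marking the intercept and the point where $y=8$. The plotting range of $x$ is an arbitrary choice made here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

m, c = 2.0, 4.0              # gradient and y-intercept from Exercise 1
x = np.linspace(-4, 4, 50)   # arbitrary range of x values for the sketch
y = m * x + c

x_star = (8.0 - c) / m       # rearrange y = m x + c for x when y = 8

plt.plot(x, y, 'b-', label='$y = 2x + 4$')
plt.plot(0, c, 'go', label='y-intercept (0, 4)')
plt.plot(x_star, 8.0, 'ro', label='point where y = 8')
plt.axhline(0, color='k', lw=0.5)   # x-axis
plt.axvline(0, color='k', lw=0.5)   # y-axis
plt.legend()
print('x when y = 8:', x_star)      # x when y = 8: 2.0
```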

### Solution 2
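Let $(x,y)$ be any other point on the line, distinct from $(x_0,y_0)$. The gradient is
the change in $y$ divided by the change in $x$ between these two points, so
$m = (y-y_0)/(x-x_0)$. Multiplying both sides by $x-x_0$ gives the *point slope* form
$y-y_0 = m(x-x_0)$, which also holds trivially at $(x_0,y_0)$ itself.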

### Solution 3
$y-y_0 = m(x-x_0)$ with $(x_0,y_0)=(-1,2)$ and $m=5$. Therefore

$$
y = mx - mx_0 + y_0 = 5x - 5\times(-1) + 2 = 5x+7.
$$
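The same rearrangement, $c = y_0 - m x_0$, as a small Python helper. The name `slope_intercept` is hypothetical and chosen only for this illustration.

```python
def slope_intercept(m, x0, y0):
    """Convert point-slope form y - y0 = m (x - x0) into (m, c) with y = m x + c."""
    c = y0 - m * x0   # expand: y = m x - m x0 + y0
    return m, c

m, c = slope_intercept(5, -1, 2)
print(f"y = {m}x + {c}")   # y = 5x + 7
```

As a check, the line from Exercise 1 passes through $(0,4)$ with $m=2$, and `slope_intercept(2, 0, 4)` recovers $c=4$.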

### Solution 4

An outline solution to Exercise 4 follows.

In [None]:
dfa = sns.load_dataset('anscombe')
print("The size of Anscombe's data set is:", dfa.shape)

In [None]:
dfa.dataset.unique()

In [None]:
dfa1 = dfa.loc[dfa['dataset'] == 'I']
dfa2 = dfa.loc[dfa['dataset'] == 'II']
dfa3 = dfa.loc[dfa['dataset'] == 'III']
dfa4 = dfa.loc[dfa['dataset'] == 'IV']

In [None]:
sns.scatterplot(data=dfa1, x="x", y="y")
dfa1.describe()

In [None]:
dfa1.head()

In [None]:
dfreg = dfa1.sort_values('x', ascending = True).reset_index(drop=True)

In [None]:
dfreg.head()

In [None]:
X_vals = dfreg.iloc[:,1].values.reshape(-1,1)
y_vals = dfreg.iloc[:,2].values.reshape(-1,1)
#print(X_vals,'\n', y_vals)

In [None]:
# standard (usual) regression
reg_usual = linear_model.LinearRegression()
reg_usual.fit(X_vals, y_vals)
# Make predictions 
y_pred_usual = reg_usual.predict(X_vals)
print('reg_usual_coef_ = ', reg_usual.coef_)
print('reg_usual_intercept_ = ', reg_usual.intercept_)
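As a cross-check on `LinearRegression`, ordinary least squares for a single feature has a closed form. Writing $S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)$ and $S_{xx}=\sum_i (x_i-\bar x)^2$ (notation introduced here), the slope is $S_{xy}/S_{xx}$ and the intercept is $\bar y - \text{slope}\times\bar x$. The sketch hard-codes subset I of Anscombe's data so it is self-contained, and also solves the same problem with `np.linalg.lstsq`.

```python
import numpy as np

# Subset I of Anscombe's quartet (hard-coded so the sketch is self-contained)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Closed form: centre the data, then slope = Sxy / Sxx
xc, yc = x - x.mean(), y - y.mean()
m_ols = (xc @ yc) / (xc @ xc)
c_ols = y.mean() - m_ols * x.mean()

# Same problem via a generic least-squares solver on the design matrix [1, x]
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print('slope, intercept:', m_ols, c_ols)
print('lstsq [intercept, slope]:', coef)
```

Both should give a slope of about 0.500 and an intercept of about 3.00, in agreement with `reg_usual.coef_` and `reg_usual.intercept_`.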

In [None]:
# ridge regression
reg_ridge = linear_model.Ridge(alpha=0.5)
reg_ridge.fit(X_vals, y_vals)
# Make predictions
y_pred_ridge = reg_ridge.predict(X_vals)
print('reg_ridge_coef_ = ', reg_ridge.coef_)
print('reg_ridge_intercept_ = ', reg_ridge.intercept_)
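To see what `alpha` does here: scikit-learn's `Ridge` minimises $\|y - Xw\|_2^2 + \alpha\|w\|_2^2$ and leaves the intercept unpenalised. With one feature and centred data, setting the derivative to zero gives $w = S_{xy}/(S_{xx}+\alpha)$, using the same $S_{xy}$, $S_{xx}$ notation as in the least-squares check (introduced here, not by scikit-learn). A minimal sketch on hard-coded subset I:

```python
import numpy as np

# Subset I of Anscombe's quartet (hard-coded so the sketch is self-contained)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
alpha = 0.5   # same value as in the Ridge cell

xc, yc = x - x.mean(), y - y.mean()
sxy, sxx = xc @ yc, xc @ xc

m_ols = sxy / sxx              # unpenalised slope
m_ridge = sxy / (sxx + alpha)  # the penalty adds alpha to the denominator
c_ridge = y.mean() - m_ridge * x.mean()
print('OLS slope:', m_ols, ' ridge slope:', m_ridge, ' ridge intercept:', c_ridge)
```

Because $S_{xx}=110$ is large compared with $\alpha=0.5$, the ridge slope (about 0.498) is only slightly flatter than the OLS slope, which is why the ridge and usual lines nearly coincide on the plot.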

In [None]:
# LASSO regression
reg_lasso = linear_model.Lasso(alpha=0.5)
reg_lasso.fit(X_vals, y_vals)
# Make predictions
y_pred_lasso = reg_lasso.predict(X_vals)
print('reg_lasso_coef_ = ', reg_lasso.coef_)
print('reg_lasso_intercept_ = ', reg_lasso.intercept_)
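scikit-learn's `Lasso` minimises $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1$. For a single feature this has an exact solution by *soft thresholding*, $w = \operatorname{sign}(S_{xy})\max(|S_{xy}| - n\alpha,\,0)/S_{xx}$, with $S_{xy}$, $S_{xx}$ as before. The helper `lasso_slope` below is a hypothetical name for this sketch, run on hard-coded subset I.

```python
import numpy as np

# Subset I of Anscombe's quartet (hard-coded so the sketch is self-contained)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

def lasso_slope(x, y, alpha):
    """Single-feature LASSO slope via soft thresholding of the centred Sxy."""
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    sxy, sxx = xc @ yc, xc @ xc
    # Shrink |Sxy| by n*alpha; if that goes negative the slope is exactly zero
    return np.sign(sxy) * max(abs(sxy) - n * alpha, 0.0) / sxx

m_lasso = lasso_slope(x, y, 0.5)   # same alpha as in the Lasso cell
c_lasso = y.mean() - m_lasso * x.mean()
print('LASSO slope:', m_lasso, ' intercept:', c_lasso)
```

With $\alpha=0.5$ the slope should come out near 0.450 (intercept near 3.45), much flatter than the ridge slope for the same $\alpha$, because the $L^1$ penalty subtracts a fixed amount $n\alpha$ from $S_{xy}$. For any $\alpha$ above $|S_{xy}|/n\approx 5.0$ the slope is exactly zero and the fitted line is flat at $\bar y$.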

In [None]:
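# data as red circles; each fitted line with its own colour and marker
plt.plot(X_vals, y_vals, 'ro', label='data')
plt.plot(X_vals, y_pred_usual, 'b-d', label='linear')
plt.plot(X_vals, y_pred_ridge, 'g-x', label='ridge')
plt.plot(X_vals, y_pred_lasso, 'c-+', label='LASSO')
plt.legend()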

### Solution 5

An outline solution to Exercise 5 follows.

In [None]:
dfreg = dfa2.sort_values('x', ascending = True).reset_index(drop=True)
X_vals = dfreg.iloc[:,1].values.reshape(-1,1)
y_vals = dfreg.iloc[:,2].values.reshape(-1,1)

# standard (usual) regression
reg_usual = linear_model.LinearRegression()
reg_usual.fit(X_vals, y_vals)
# Make predictions 
y_pred_usual = reg_usual.predict(X_vals)
print('reg_usual_coef_ = ', reg_usual.coef_)
print('reg_usual_intercept_ = ', reg_usual.intercept_)

# ridge regression
reg_ridge = linear_model.Ridge(alpha=0.5)
reg_ridge.fit(X_vals, y_vals)
# Make predictions
y_pred_ridge = reg_ridge.predict(X_vals)
print('reg_ridge_coef_ = ', reg_ridge.coef_)
print('reg_ridge_intercept_ = ', reg_ridge.intercept_)

# LASSO regression
reg_lasso = linear_model.Lasso(alpha=0.5)
reg_lasso.fit(X_vals, y_vals)
# Make predictions
y_pred_lasso = reg_lasso.predict(X_vals)
print('reg_lasso_coef_ = ', reg_lasso.coef_)
print('reg_lasso_intercept_ = ', reg_lasso.intercept_)

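# data as red circles; each fitted line with its own colour and marker
plt.plot(X_vals, y_vals, 'ro', label='data')
plt.plot(X_vals, y_pred_usual, 'b-d', label='linear')
plt.plot(X_vals, y_pred_ridge, 'g-x', label='ridge')
plt.plot(X_vals, y_pred_lasso, 'c-+', label='LASSO')
plt.legend()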


### Solution 6

An outline solution to Exercise 6 follows.

In [None]:
dfreg = dfa3.sort_values('x', ascending = True).reset_index(drop=True)
X_vals = dfreg.iloc[:,1].values.reshape(-1,1)
y_vals = dfreg.iloc[:,2].values.reshape(-1,1)

# standard (usual) regression
reg_usual = linear_model.LinearRegression()
reg_usual.fit(X_vals, y_vals)
# Make predictions 
y_pred_usual = reg_usual.predict(X_vals)
print('reg_usual_coef_ = ', reg_usual.coef_)
print('reg_usual_intercept_ = ', reg_usual.intercept_)

# ridge regression
reg_ridge = linear_model.Ridge(alpha=0.5)
reg_ridge.fit(X_vals, y_vals)
# Make predictions
y_pred_ridge = reg_ridge.predict(X_vals)
print('reg_ridge_coef_ = ', reg_ridge.coef_)
print('reg_ridge_intercept_ = ', reg_ridge.intercept_)

# LASSO regression
reg_lasso = linear_model.Lasso(alpha=0.5)
reg_lasso.fit(X_vals, y_vals)
# Make predictions
y_pred_lasso = reg_lasso.predict(X_vals)
print('reg_lasso_coef_ = ', reg_lasso.coef_)
print('reg_lasso_intercept_ = ', reg_lasso.intercept_)

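# data as red circles; each fitted line with its own colour and marker
plt.plot(X_vals, y_vals, 'ro', label='data')
plt.plot(X_vals, y_pred_usual, 'b-d', label='linear')
plt.plot(X_vals, y_pred_ridge, 'g-x', label='ridge')
plt.plot(X_vals, y_pred_lasso, 'c-+', label='LASSO')
plt.legend()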


### Solution 7

An outline solution to Exercise 7 follows.

In [None]:
dfreg = dfa4.sort_values('x', ascending = True).reset_index(drop=True)
X_vals = dfreg.iloc[:,1].values.reshape(-1,1)
y_vals = dfreg.iloc[:,2].values.reshape(-1,1)

# standard (usual) regression
reg_usual = linear_model.LinearRegression()
reg_usual.fit(X_vals, y_vals)
# Make predictions 
y_pred_usual = reg_usual.predict(X_vals)
print('reg_usual_coef_ = ', reg_usual.coef_)
print('reg_usual_intercept_ = ', reg_usual.intercept_)

# ridge regression
reg_ridge = linear_model.Ridge(alpha=0.5)
reg_ridge.fit(X_vals, y_vals)
# Make predictions
y_pred_ridge = reg_ridge.predict(X_vals)
print('reg_ridge_coef_ = ', reg_ridge.coef_)
print('reg_ridge_intercept_ = ', reg_ridge.intercept_)

# LASSO regression
reg_lasso = linear_model.Lasso(alpha=0.5)
reg_lasso.fit(X_vals, y_vals)
# Make predictions
y_pred_lasso = reg_lasso.predict(X_vals)
print('reg_lasso_coef_ = ', reg_lasso.coef_)
print('reg_lasso_intercept_ = ', reg_lasso.intercept_)

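# data as red circles; each fitted line with its own colour and marker
plt.plot(X_vals, y_vals, 'ro', label='data')
plt.plot(X_vals, y_pred_usual, 'b-d', label='linear')
plt.plot(X_vals, y_pred_ridge, 'g-x', label='ridge')
plt.plot(X_vals, y_pred_lasso, 'c-+', label='LASSO')
plt.legend()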
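Solutions 4 to 7 repeat the same cell four times. A minimal sketch of folding the three fits into one function and looping over the subsets follows; the helper name `fit_three` is hypothetical. To run offline it hard-codes Anscombe's published values in the same `dataset`, `x`, `y` layout as `sns.load_dataset('anscombe')`, which you can swap back in if you have a connection.

```python
import pandas as pd
from sklearn import linear_model

# Anscombe's quartet, hard-coded from the published table (columns: dataset, x, y)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    'I':   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    'II':  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    'III': (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    'IV':  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
dfa = pd.concat(
    [pd.DataFrame({'dataset': name, 'x': xs, 'y': ys}) for name, (xs, ys) in quartet.items()],
    ignore_index=True,
)

def fit_three(df, alpha=0.5):
    """Fit ordinary, ridge and LASSO regression to one subset.

    Returns a dict mapping model name to (slope, intercept).
    """
    dfreg = df.sort_values('x', ascending=True).reset_index(drop=True)
    X = dfreg[['x']].to_numpy()   # 2D feature array, shape (n, 1)
    y = dfreg['y'].to_numpy()     # 1D target, shape (n,)
    models = {
        'linear': linear_model.LinearRegression(),
        'ridge': linear_model.Ridge(alpha=alpha),
        'lasso': linear_model.Lasso(alpha=alpha),
    }
    results = {}
    for name, model in models.items():
        model.fit(X, y)
        results[name] = (float(model.coef_[0]), float(model.intercept_))
    return results

results = {name: fit_three(dfa.loc[dfa['dataset'] == name]) for name in quartet}
for name, res in results.items():
    print(name, {k: (round(s, 3), round(b, 3)) for k, (s, b) in res.items()})
```

All four subsets should give almost the same ordinary least-squares line, $y \approx 3 + 0.5x$, even though their scatterplots look completely different; that is the point of Anscombe's quartet, and a reason to always plot the data alongside the fit.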


## Technical Notes, Production and Archiving

Ignore the material below. What follows is not relevant to the material being taught.

#### Production Workflow

- Finalise the notebook material above
- Clear and fresh run of entire notebook
- Create html slide show:
  - `jupyter nbconvert --to slides D_worksheet.ipynb`
- Set `OUTPUTTING=1` below
- Comment out the display of web-sourced diagrams
- Clear and fresh run of entire notebook
- Comment back in the display of web-sourced diagrams
- Clear all cell output
- Set `OUTPUTTING=0` below
- Save
- git add, commit and push to FML
- copy PDF, HTML etc to web site
  - git add, commit and push
- rebuild binder

Some of this originated from
https://stackoverflow.com/questions/38540326/save-html-of-a-jupyter-notebook-from-within-the-notebook.
These lines create a backup of the notebook and can be ignored.
At some point this would be better as a bash script outside of the notebook.

In [None]:
%%bash
NBROOTNAME='D_worksheet'
OUTPUTTING=0

if [ $OUTPUTTING -eq 1 ]; then
  jupyter nbconvert --to html $NBROOTNAME.ipynb
  cp $NBROOTNAME.html ../backups/$(date +"%m_%d_%Y-%H%M%S")_$NBROOTNAME.html
  mv -f $NBROOTNAME.html ./formats/html/

  jupyter nbconvert --to pdf $NBROOTNAME.ipynb
  cp $NBROOTNAME.pdf ../backups/$(date +"%m_%d_%Y-%H%M%S")_$NBROOTNAME.pdf
  mv -f $NBROOTNAME.pdf ./formats/pdf/

  jupyter nbconvert --to script $NBROOTNAME.ipynb
  cp $NBROOTNAME.py ../backups/$(date +"%m_%d_%Y-%H%M%S")_$NBROOTNAME.py
  mv -f $NBROOTNAME.py ./formats/py/
else
  echo 'Not Generating html, pdf and py output versions'
fi