## Lecture 11: Least squares and regularization

### STAT598z: Intro. to computing for statistics


***




### Vinayak Rao

#### Department of Statistics, Purdue University

```R
options(repr.plot.width=4, repr.plot.height=3)
```

### Ordinary least squares regression

Consider linear regression: $y = x^{\top} w + \varepsilon$
![Alt text](./figures/reg1.png)

In vector notation, with the $n$ inputs stacked as the columns of $X$:
$$ y =  X^{\top} w + \varepsilon, \quad y\in\Re^n, X \in \Re^{p\times n} $$
![Alt text](./figures/reg2.png)


The problem:
$$ \hat{{w}} 
= \text{arg min}\ \|{y} - {X}^{\top} w\|^2
= \text{arg min} \sum_{i=1}^n (y_i - x_i^{\top} w)^2$$


The solution:
$$ \hat{w} 
=  \left(X{X}^{\top}\right)^{-1}Xy
$$

In 1-dim, this just gives the (normalized) correlation
$$ \hat{{w}} = \sum_{i=1}^n x_iy_i / \sum_{i=1}^n x_i^2
$$

For a new input $x^*$, we predict $y^* = {x^*}^{\top}\hat{w}$

$$ \hat{w} 
=  \left(X{X}^{\top}\right)^{-1}Xy
$$

<table> 
<tr> 
<td> $XX^{\top}$ </td>
<td> $Xy$ </td>
</tr>
<tr> 
<td> <img src="./figures/reg3.png" /> </td>
<td> <img src="./figures/reg4.png" /> </td>
</tr>
</table>

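Do not invert $XX^{\top}$ explicitly using `solve(X %*% t(X))`! Forming the inverse is slower and numerically less stable than solving the linear system.

Instead, solve for $w$ directly using `solve(X %*% t(X), X %*% y)`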
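As a minimal sketch of the formulas above, the following simulates data and solves the normal equations directly. It assumes the convention that $X$ is $p \times n$ (one column per observation); the names `w_true`, `x_star` and the simulated data are made up for illustration.

```R
set.seed(1)                                    # make the simulated data reproducible
p <- 3                                         # number of features (illustrative)
n <- 100                                       # number of observations (illustrative)
X <- matrix(rnorm(p * n), nrow = p, ncol = n)  # p x n: one column per observation
w_true <- c(2, -1, 0.5)                        # hypothetical "true" weights
y <- as.vector(t(X) %*% w_true + rnorm(n, sd = 0.1))

# Solve (X X^T) w = X y directly, without forming the inverse
w_hat <- solve(X %*% t(X), X %*% y)

# Prediction for a new input x_star
x_star <- c(1, 1, 1)
y_star <- sum(x_star * w_hat)

# One-dimensional case: the same call reduces to sum(x * y) / sum(x^2)
X1 <- matrix(X[1, ], nrow = 1)                 # a single feature, as a 1 x n matrix
w1 <- solve(X1 %*% t(X1), X1 %*% y)
w1_direct <- sum(X[1, ] * y) / sum(X[1, ]^2)
```

The result can be cross-checked against R's built-in fit without an intercept, `coef(lm(y ~ t(X) - 1))`, which should agree with `w_hat` up to numerical precision.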

### Programming style
Good programming style makes your life easier

Have informative variable/function names

Break your code into smaller functions and test individually.

Much better to build up from a set of bug-free components
than to write down a big function and then debug.
+ Much easier to deal with modular code
+ Build your own library of functions/wrappers

### Tracking down errors
Try to read the error message

It can be confusing, but it is informative compared to, e.g., LaTeX
(gibberish) or C (usually ‘segmentation faults’)

`"Error: object 'my_obj' not found"`

Have you defined the variable or loaded the package? A common error:

` for(i in 1:10) a[i] <- i # First declare a! `

`"Error: Incompatible lengths ..."`

What are the lengths?

`missing value where TRUE/FALSE needed`

What is the argument to `if`/`while`?
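Two of the messages above can be reproduced and fixed in a few lines. This is only an illustrative sketch: the variable names (`a_missing`, `a`, `x`) are made up, and `tryCatch` is used just to capture the message so the script keeps running.

```R
# 1. Assigning into a vector that has not been created yet
msg_missing <- tryCatch({
  for (i in 1:10) a_missing[i] <- i   # a_missing was never declared
  "no error"
}, error = function(e) conditionMessage(e))

# Fix: declare (preallocate) the vector first
a <- numeric(10)
for (i in seq_along(a)) a[i] <- i

# 2. An NA reaching the condition of an if statement
x <- NA
msg_na <- tryCatch(
  if (x > 0) "positive" else "not positive",
  error = function(e) conditionMessage(e)
)

# Fix: handle the missing value explicitly before comparing
sign_label <- if (!is.na(x) && x > 0) "positive" else "not positive"

print(msg_missing)
print(msg_na)
```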

### Tracking down errors
R always tells you where the error was detected

The actual cause can be much earlier than R indicates.

(In general, it is a bad idea to write many lines of code without
running and checking them a few times along the way)

A Google search often leads to a solution on Stack Exchange
(be sure to remove variable names specific to your code)

### ‘Rubber-ducking’
(ericlippert.com/2014/03/05/how-to-debug-small-programs/)

+ You should be able to explain in simple words why each line is correct (not necessarily to a rubber duck though)

For each line, know what the input and the expected output are

While debugging code, intersperse `if`’s and `print`’s.
```R 
if (prob > 1 || prob < 0) {
  # Report the problem, then halt execution
  print("NOOO!!! Invalid probability")
  stop()
}
```
+ compare expected with produced values

The `stopifnot` function is also useful:
```R
stopifnot(prob >= 0, prob <= 1)
```
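To compare expected with produced values, one option is to run a function on an input whose answer you know by hand. The sketch below uses a hypothetical helper `normalize` (not from the lecture) that checks both its inputs and its outputs with `stopifnot`.

```R
# Hypothetical helper: turn non-negative weights into probabilities
normalize <- function(w) {
  stopifnot(is.numeric(w), all(w >= 0), sum(w) > 0)  # check the inputs
  prob <- w / sum(w)
  stopifnot(all(prob >= 0), all(prob <= 1))          # check the outputs
  prob
}

produced <- normalize(c(1, 1, 2))
expected <- c(0.25, 0.25, 0.5)   # worked out by hand

# all.equal() compares with a numerical tolerance; wrap it in isTRUE()
# because it returns a description, not FALSE, when the values differ
stopifnot(isTRUE(all.equal(produced, expected)))
```

Using `all.equal` rather than `==` avoids spurious failures from floating-point rounding.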

### Minimal working example
If you can’t track it down by inspection or want to email me:
+ Create a Minimal Working Example (MWE)

A third person should:
+ be able to reproduce error by cutting and pasting your code
+ not have to worry about unnecessary details

Do not send me your entire code, saying "Help"

Do not send me just the error message, saying "Help"

Remove all unnecessary code after error (easy)

Remove all unnecessary code before error (harder):

+ Remove unnecessary variables/functions/packages
+ Remove unnecessary layers in ggplot()
+ If offending line is wrapped in a for/while loop, remove that
+ If error involves a long vector, try to minimize its length
+ Remove unnecessary columns in dataframes
+ If random numbers are involved, call `set.seed()` so the error is reproducible
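After applying the reductions in the list above, an MWE might look like the sketch below. The data frame and the failing command are invented for illustration; `try()` is used here only so that the script keeps running and the message can be shown.

```R
set.seed(1)                          # fix the random numbers
df <- data.frame(x = rnorm(3),       # short vectors, only the columns needed
                 g = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# The single offending command: adding a numeric and a character column
out <- try(df$x + df$g, silent = TRUE)
cat(out)                             # show the captured error message
```

Everything a reader needs is here: one assignment that builds the data and one command that triggers the error.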

Most class errors I’ve seen can be reduced to one assignment
and one command

A good set of guidelines: http://stackoverflow.com/help/mcve

Write a minimal program:

+ *Restart from scratch*: Starting from an empty file, add as few
lines as possible to get your error
+ *Divide-and-conquer*: Remove parts of program line by line till the error 
disappears, and add the last line back

Ask a specific question

*"My code (see attached) gives an error"*: This is a story, not a
question

*“Why does my code (see attached) give an error?”*: An unhelpful
question deserves an unhelpful answer: “Maybe your code is wrong”

Asking a specific question is halfway towards fixing your bug

It also convinces me that you have thought about it