# 1st Course: Hand-made Linear Regression

The purpose of this notebook is to get acquainted with **tensors** and **gradients**. As a guiding toy example, we shall implement linear regression using gradient descent.

Let us start with the necessary imports.

In [None]:
import tensorflow as tf
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

## Step 1: Tensors, and Creating our Dataset

TensorFlow **tensors** are very similar to NumPy **arrays**: they have a certain **rank** and **shape**, all values in a tensor share the same specified **dtype** (which cannot be `object`, though), and many array operations known from NumPy have an analogue in TensorFlow.
To get started with our linear regression example, let's

- fix some initial parameters $\alpha$, $\beta$,
- fix the sample size $n$,
- choose uniformly spaced $x_1,\ldots,x_n$ in the interval $[0,1]$ and
- choose samples $y_i = \alpha x_i + \beta + e_i$ with some Gaussian noise $e_i$.


In [None]:
alpha = tf.constant(0.5)
beta = tf.constant(1.)
alpha, beta

In [None]:
n = 21
x = tf.linspace(0., 1., n)
x

In [None]:
y = alpha * x + beta + tf.random.normal((n,), stddev=0.1)
y

Let's plot the dataset together with the underlying line:

In [None]:
plt.scatter(x,y) # the points
plt.plot([0, 1], [beta, alpha + beta], color='orange') # the line
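
To make the NumPy analogy from the beginning of this step concrete, here is a small sketch (the names `m`, `a`, `ints` and so on are illustrative only and are not used later in the notebook). It shows a tensor's rank, shape and dtype, a few NumPy-like operations, conversion in both directions, and an explicit `tf.cast`, since TensorFlow generally does not promote dtypes implicitly the way NumPy does:

```python
import numpy as np
import tensorflow as tf

# A rank-2 tensor (a 2x3 matrix); all entries share a single dtype.
m = tf.constant([[1., 2., 3.], [4., 5., 6.]])
print(m.shape.rank, m.shape, m.dtype)  # rank, shape and dtype

# TensorFlow analogues of familiar NumPy operations.
col_sums = tf.reduce_sum(m, axis=0)  # like np.sum(a, axis=0)
doubled = 2 * m                      # elementwise, like 2 * a
m_transposed = tf.transpose(m)       # like a.T

# Converting between tensors and arrays works in both directions.
a = m.numpy()                        # tf.Tensor -> np.ndarray
m_again = tf.convert_to_tensor(a)    # np.ndarray -> tf.Tensor

# Dtypes are not promoted implicitly; convert explicitly with tf.cast.
ints = tf.constant([1, 2, 3])        # dtype int32
floats = tf.cast(ints, tf.float32) + 0.5
```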


## Step 2: Variables, and Initializing our Parameters

A TensorFlow **variable** is a wrapper for a tensor and

> the best way to represent shared, persistent state manipulated by your program.

Variables should be used to store **model parameters**; they receive special treatment when gradients are tracked and when models are optimized or saved.

For our linear regression example, we want to start with random parameters:

In [None]:
alpha = tf.Variable(tf.random.uniform((1,),-5,5)[0], name="alpha")
beta = tf.Variable(tf.random.uniform((1,),-5,5)[0], name="beta")
alpha, beta

To update the value of a TensorFlow variable, use the `assign` method or variants such as `assign_add`, `assign_sub` and so on.
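
A minimal sketch of these update methods on a throwaway variable `v` (a hypothetical name, unrelated to the regression parameters). Note that ordinary arithmetic on a variable returns a new tensor instead of changing the variable:

```python
import tensorflow as tf

v = tf.Variable(1.0, name="v")  # hypothetical example variable

v.assign(3.0)      # v now holds 3.0
v.assign_add(0.5)  # v now holds 3.5
v.assign_sub(1.5)  # v now holds 2.0

# Ordinary arithmetic does not modify the variable in place:
# it returns a new tf.Tensor and leaves v unchanged.
w = v + 1.0
print(v.numpy(), w.numpy())
```

This is why the gradient descent step below updates the parameters with `assign_sub` rather than with `alpha = alpha - ...`, which would rebind the name to a plain tensor.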

## Step 3: Gradients and Gradient Tapes

To compute **gradients** of a function with respect to TensorFlow tensors or variables, use the class `tf.GradientTape` and its methods `watch` and `gradient` as follows:

In [None]:
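# Use fresh names so that the dataset x, y from Step 1 is not overwritten.
t = tf.constant(3.)
with tf.GradientTape() as tape:
    tape.watch(t)         # constants must be watched explicitly
    f = alpha * t + beta  # alpha, beta are variables: watched automatically
tape.gradient(f, [alpha, t, beta])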


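A gradient tape records the operations executed in its scope, so that it can afterwards compute the gradients of their results

- with respect to trainable `tf.Variable` instances, which are watched automatically, and
- with respect to tensors that are watched explicitly via `watch`.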

In [None]:
del tape
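
Both watching rules, and the role of `del`, can be seen in a small sketch with hypothetical names `s` (a variable) and `k` (a constant). By default a tape releases its resources as soon as `gradient` is called once; with `persistent=True` it can be queried several times and should then be deleted explicitly once it is no longer needed, as in the cell above:

```python
import tensorflow as tf

s = tf.Variable(2.0)  # trainable variable: watched automatically
k = tf.constant(5.0)  # plain tensor: not watched unless tape.watch(k) is called

with tf.GradientTape(persistent=True) as tape:
    f = s * k
    g = s * s

df_ds, df_dk = tape.gradient(f, [s, k])  # df_dk is None, since k was not watched
dg_ds = tape.gradient(g, s)              # a second call requires persistent=True
del tape  # free the resources held by the persistent tape

print(df_ds.numpy(), df_dk, dg_ds.numpy())
```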

## Step 4: Implementing Gradient Descent

To perform gradient descent, we need to compute the gradients of our **loss** with respect to the **model parameters**. Let us first write down the loss function:

In [None]:
def compute_loss(alpha, beta, x, y):
    y_predict = alpha * x + beta
    errors = y - y_predict
    loss = tf.reduce_sum(errors * errors)
    return loss

compute_loss(alpha, beta, x, y)

This could be shortened using suitable tensor operations (see the exercises).

Let us turn to the gradient descent step.

In the function above, the loss is computed from the parameters and the samples using several elementary functions. To compute the gradient of the loss with respect to the parameters, we need to

- keep track of the gradients of the elementary functions and
- combine these one-step gradients using the chain rule.
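
For this particular loss, the chain rule can also be carried out by hand. Writing $r_i = y_i - \alpha x_i - \beta$ for the residuals, so that $L(\alpha,\beta) = \sum_{i=1}^n r_i^2$, and using $\partial r_i/\partial\alpha = -x_i$ and $\partial r_i/\partial\beta = -1$, we get

$$\frac{\partial L}{\partial \alpha} = -2\sum_{i=1}^n x_i r_i, \qquad \frac{\partial L}{\partial \beta} = -2\sum_{i=1}^n r_i.$$

The following sketch compares these formulas with what a gradient tape returns. The helper `check_gradients` is hypothetical; it recomputes the loss inline so that it runs on its own, and it uses noise-free illustrative data so that the notebook's `x`, `y`, `alpha` and `beta` stay untouched:

```python
import tensorflow as tf

def check_gradients(alpha, beta, x, y):
    """Return (tape, closed-form) pairs for the gradients w.r.t. alpha and beta."""
    with tf.GradientTape() as tape:
        residuals = y - (alpha * x + beta)           # r_i
        loss = tf.reduce_sum(residuals * residuals)  # same loss as compute_loss
    grad_alpha, grad_beta = tape.gradient(loss, [alpha, beta])

    # Closed-form gradients obtained from the chain rule by hand.
    manual_alpha = -2.0 * tf.reduce_sum(x * residuals)
    manual_beta = -2.0 * tf.reduce_sum(residuals)
    return (grad_alpha, manual_alpha), (grad_beta, manual_beta)

# Illustrative inputs with local names.
xs = tf.linspace(0., 1., 21)
ys = 0.5 * xs + 1.0
(ga, ma), (gb, mb) = check_gradients(tf.Variable(2.0), tf.Variable(-1.0), xs, ys)
print(ga.numpy(), ma.numpy(), gb.numpy(), mb.numpy())
```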


In [None]:
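def gradient_step(alpha, beta, x, y, learning_rate):
    # Record the forward pass, i.e. the computation of the loss.
    with tf.GradientTape() as tape:
        loss = compute_loss(alpha, beta, x, y)
    # Compute the gradients once recording is finished.
    grad_alpha, grad_beta = tape.gradient(loss, [alpha, beta])
    # Move each parameter a small step against its gradient.
    alpha.assign_sub(learning_rate * grad_alpha)
    beta.assign_sub(learning_rate * grad_beta)
    return loss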

def gradient_descent(alpha, beta, x, y, nr_steps=10, learning_rate=0.01):
    alphas, betas, losses = [], [], []
    for step in range(0, nr_steps):
        loss = gradient_step(alpha, beta, x, y, learning_rate)
        alphas.append(alpha.read_value())
        betas.append(beta.read_value())
        losses.append(loss)
    return tf.stack(alphas), tf.stack(betas), tf.stack(losses)

def choose_params():
    alpha = tf.Variable(tf.random.uniform((1,),-5,5)[0], name="alpha")
    beta = tf.Variable(tf.random.uniform((1,),-5,5)[0], name="beta")
    return alpha, beta

alphas, betas, losses = gradient_descent(*choose_params(), x, y)
alphas, betas, losses 
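
Since the model is linear, the minimizer of the squared-error loss is also available in closed form. This gives a reference for the values that `alphas` and `betas` should approach as the number of steps grows (for a small enough learning rate). A minimal NumPy sketch; `least_squares_line` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def least_squares_line(x, y):
    """Return (alpha, beta) minimizing sum((y - alpha * x - beta) ** 2)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = covariance(x, y) / variance(x); the intercept makes the line
    # pass through the point of means.
    alpha = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    beta = y_mean - alpha * x_mean
    return alpha, beta

# Sanity check on noise-free data: the true parameters are recovered.
xs = np.linspace(0., 1., 21)
print(least_squares_line(xs, 0.5 * xs + 1.0))
```

Applied to the notebook's data, `least_squares_line(x, y)` can be compared with `alphas[-1]` and `betas[-1]`; `np.polyfit(x, y, 1)` computes the same degree-1 least-squares fit.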

Let us visualize the results quickly:

In [None]:
import pandas as pd
import seaborn as sns

sns.set_style('whitegrid')

def visualize(alphas, betas, losses):
    pd.Series(losses.numpy(), name='loss').plot()
    pd.DataFrame({'alpha': alphas.numpy(), 'beta': betas.numpy()}).plot()
    
visualize(*gradient_descent(*choose_params(), x, y, nr_steps=100))

## Visualizing training with TensorBoard

For long-running computations, TensorFlow offers a convenient tool to log and visualize data: TensorBoard. To log data, use the `tf.summary` module, for example as in the following function:

In [None]:
import os
import time

LOGDIR = 'tmp'

def tb_gradient_descent(alpha, beta, x, y, nr_steps=10, learning_rate=0.01):
    path = os.path.join(LOGDIR, time.strftime('%H-%M-%S'))
    with tf.summary.create_file_writer(path).as_default():
        for step in range(0, nr_steps):
            loss = gradient_step(alpha, beta, x, y, learning_rate)
            tf.summary.scalar('alpha', alpha.read_value(), step=step)
            tf.summary.scalar('beta', beta.read_value(), step=step)
            tf.summary.scalar('loss', loss, step=step)

tb_gradient_descent(*choose_params(), x, y, nr_steps=100)

In [None]:
%load_ext tensorboard
%tensorboard --logdir $LOGDIR

## Eager mode versus graph mode

By default, TensorFlow 2 executes all tensor operations eagerly, which helps with debugging and prototyping. If we wrap a Python function with `tf.function` (or decorate it with `@tf.function`), its first call traces it into a computation graph; subsequent calls with compatible arguments execute that graph without the per-operation Python overhead and may therefore run much more quickly.

In [None]:
alpha, beta = choose_params()
%timeit gradient_descent(alpha, beta, x, y, nr_steps=100)

In [None]:
alpha, beta = choose_params()
quick_descent = tf.function(gradient_descent)
%timeit quick_descent(alpha, beta, x, y, nr_steps=100)

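Compare the two timings: for a small computation like this one, where Python overhead dominates, the graph version is typically much faster, although the exact factor depends on the hardware and the TensorFlow version.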
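
One detail worth knowing: Python code inside a `tf.function` runs only while the function is traced, and non-tensor arguments such as Python integers are part of the cache key, so each new value triggers a new trace. For `quick_descent`, this means that the Python `for` loop over `range(nr_steps)` is unrolled into the graph, and calling it with a different `nr_steps` traces it again. A small sketch with a hypothetical function `scaled_sum` that records its traces in a list:

```python
import tensorflow as tf

traces = []  # hypothetical log: one entry per trace

@tf.function
def scaled_sum(vec, k):
    traces.append(k)  # plain Python: runs only while tracing, not on every call
    return k * tf.reduce_sum(vec)

vec = tf.constant([1., 2., 3.])
r1 = scaled_sum(vec, 2)  # first call with k=2: traced
r2 = scaled_sum(vec, 2)  # same signature: the cached graph is reused
r3 = scaled_sum(vec, 3)  # new Python value for k: traced again
print(traces)            # the Python values of k seen while tracing
```

Passing such arguments as tensors instead (e.g. `tf.constant(3.)`) would let calls with different values share one graph.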

## Exercises

### Exercise 1: Tensor operations

Shorten the following function `compute_loss` using `tf.square` or `tf.norm`:

In [None]:
def compute_loss(alpha, beta, x, y):
    y_predict = alpha * x + beta
    errors = y - y_predict
    loss = tf.reduce_sum(errors * errors)
    return loss


### Exercise 2: Computing gradients

Make TensorFlow compute the derivative of the function $t \mapsto \cos t \cdot \sin t$ at $t=1$.

### Exercise 3: Computing second-order differentials

Compute the second derivative of the function $f(t) = t\cos(t)$ at $t=1$ and check that it equals $-\cos(1) - 2\sin(1)$: