### Exercises

#### Question 1

The accompanying file `data.csv` contains observations of a value `x` recorded at times `t`.

Given this data, we want to calculate the rate of change of this value over time. We do this by taking two consecutive observations, say $x(t_i)$ and $x(t_{i+1})$, and approximating the rate of change with this formula:

$$
v(t_{i+1}) = \frac{x(t_{i+1}) - x(t_i)}{t_{i+1} - t_i}
$$

For example, if the data looks like this:

```
t     x
0.1   10
0.2   12
0.4   14
0.5   15
```

Then the first row of data is considered $t_0$, the second row $t_1$, and so on.

We can then approximate the rate of change starting at $v_1 = v(t_1)$, which is calculated as:

$$
v_1 = \frac{12 - 10}{0.2 - 0.1} = 20.0
$$

Similarly, $v_2$ would be calculated as:

$$
v_2 = \frac{14 - 12}{0.4 - 0.2} = 10.0
$$

Use NumPy arrays to create an array that holds the calculated rates of change and determine the minimum, maximum, average and standard deviation of the rate of change.

In [2]:
import csv
import numpy as np

# Read the t and x columns from the CSV file, skipping the header row.
with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)
    print(f'Header: {header}')
    T = []
    X = []
    for row in reader:
        t, x = row
        T.append(float(t))
        X.append(float(x))
T = np.array(T)
X = np.array(X)

# Differences between consecutive observations.
DX = np.array([])
DT = np.array([])
for i in range(1, len(T)):
    DX = np.append(DX, X[i] - X[i-1])
    DT = np.append(DT, T[i] - T[i-1])

# Rate of change for each pair of consecutive observations.
V = DX / DT
print(f'Max V is  {np.max(V)}')
print(f'Min V is  {np.min(V)}')
print(f'Mean V is {np.mean(V)}')
print(f'Std V is  {np.std(V)}')
print(V)

Header: ['t', 'x']
Max V is  69.07300506151955
Min V is  29.42739859222142
Mean V is 49.98125178748103
Std V is  9.043463532187504
[50.86622176 51.94935197 35.59144075 67.22995203 39.1007292  52.93709476
 48.7089318  62.20773438 51.32188572 31.73236779 57.36477206 53.03182065
 60.73281476 43.1915111  57.94411504 34.83306963 57.73255581 42.08891533
 57.98542428 38.32023478 67.48615221 48.093643   42.31323664 53.71244492
 47.40536029 56.94238226 34.93056181 69.07300506 34.25537804 50.61200384
 47.3108194  60.68569144 47.95527183 52.66924298 39.17350419 62.6727992
 47.34774089 51.06091262 44.84522578 57.78944777 40.05894433 59.18939376
 49.93942905 44.04063203 53.8329408  44.59051897 54.01930861 57.62888975
 42.55131646 43.3626037  66.71151349 32.38972102 58.07364049 51.45280885
 48.63115874 50.34830522 44.42223678 46.13926723 61.21598928 43.69435355
 46.52295121 61.16976936 45.07897815 60.60440737 33.92767856 59.26966689
 46.00648977 61.93828733 46.22600595 46.02387813 43.85041604 55.014928

In [5]:
delta_t = T[1:] - T[:-1]
delta_x = X[1:] - X[:-1]
v1 = delta_x/delta_t
v1

array([50.86622176, 51.94935197, 35.59144075, 67.22995203, 39.1007292 ,
       52.93709476, 48.7089318 , 62.20773438, 51.32188572, 31.73236779,
       57.36477206, 53.03182065, 60.73281476, 43.1915111 , 57.94411504,
       34.83306963, 57.73255581, 42.08891533, 57.98542428, 38.32023478,
       67.48615221, 48.093643  , 42.31323664, 53.71244492, 47.40536029,
       56.94238226, 34.93056181, 69.07300506, 34.25537804, 50.61200384,
       47.3108194 , 60.68569144, 47.95527183, 52.66924298, 39.17350419,
       62.6727992 , 47.34774089, 51.06091262, 44.84522578, 57.78944777,
       40.05894433, 59.18939376, 49.93942905, 44.04063203, 53.8329408 ,
       44.59051897, 54.01930861, 57.62888975, 42.55131646, 43.3626037 ,
       66.71151349, 32.38972102, 58.07364049, 51.45280885, 48.63115874,
       50.34830522, 44.42223678, 46.13926723, 61.21598928, 43.69435355,
       46.52295121, 61.16976936, 45.07897815, 60.60440737, 33.92767856,
       59.26966689, 46.00648977, 61.93828733, 46.22600595, 46.02
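
As a sanity check, the same calculation can be run on the four-row example table from the question statement. The sketch below uses `np.diff`, which returns the differences between consecutive elements and is equivalent to the slicing `a[1:] - a[:-1]`. The names `t_example`, `x_example` and `v_example` are illustrative and not part of the original solution.

```python
import numpy as np

# The four-row example table from the question statement.
t_example = np.array([0.1, 0.2, 0.4, 0.5])
x_example = np.array([10.0, 12.0, 14.0, 15.0])

# np.diff(a) computes a[1:] - a[:-1], i.e. consecutive differences.
v_example = np.diff(x_example) / np.diff(t_example)

# The first two entries should match v_1 = 20 and v_2 = 10 from the
# worked example, up to floating-point rounding.
print(v_example)
print(v_example.min(), v_example.max(), v_example.mean(), v_example.std())
```

Note that `np.std` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` if the sample standard deviation is wanted.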

#### Question 2

In linear regression we try to find the coefficients `m` (slope) and `c` (y-intercept) of a straight line

$$
y = mx + c
$$

that provides the "best" fit given some `x` and `y` data. This formula then allows us to predict `y` values for given `x` values.

Given an array of `n` `(x, y)` data pairs, these coefficients can be calculated very simply.

A bit of terminology first:

- Let `X` mean the column of `x` values.
- Let `Y` mean the column of `y` values.
- Let `XX` mean the column calculated by multiplying each `x` in the `X` column by itself.
- Let `XY` mean the column calculated by multiplying each `x` in the `X` column by the corresponding `y` in the `Y` column.

Then, given some column (say `X`), this symbol: $\sum{X}$ means the sum of all the elements in the column.

Similarly, the symbol $\sum{XY}$ means the sum of the values obtained by multiplying (pairwise) the values in `X` and `Y`.
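
In NumPy, these columns are element-wise products and each $\sum$ is a call to `sum`. Here is a minimal sketch, using the four-row example table from Question 1 as stand-in data (with `t` playing the role of `X` and `x` the role of `Y`). The variable names are illustrative.

```python
import numpy as np

# Stand-in data: the four-row example table from Question 1.
X = np.array([0.1, 0.2, 0.4, 0.5])      # the t column
Y = np.array([10.0, 12.0, 14.0, 15.0])  # the x column

XX = X * X   # each x multiplied by itself
XY = X * Y   # pairwise products of x and y

# Each sum collapses a column to a single number.
sum_X, sum_Y = X.sum(), Y.sum()
sum_XX, sum_XY = XX.sum(), XY.sum()
print(sum_X, sum_Y, sum_XX, sum_XY)
```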

Given those definitions, the formulas for calculating the "best" values of `m` and `c` are given by:

$$
m = \frac{n\sum{XY} - \sum{X}\sum{Y}}{n\sum{XX} - (\sum{X})^2}
$$

$$
c = \frac{\sum{Y}\sum{XX} - \sum{X}\sum{XY}}{n\sum{XX} - (\sum{X})^2}
$$

(where `n` is the number of `(x,y)` pairs in our data set.)
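
The two formulas translate almost term by term into NumPy. The function below is a sketch rather than a reference implementation: `fit_line` is a hypothetical helper name, and the data passed to it is the four-row example table from Question 1, not `data.csv`. The result is cross-checked against `np.polyfit` with degree 1, which returns the slope followed by the intercept.

```python
import numpy as np

def fit_line(x, y):
    """Return (m, c) for the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sum_x, sum_y = x.sum(), y.sum()
    sum_xx, sum_xy = (x * x).sum(), (x * y).sum()
    denom = n * sum_xx - sum_x ** 2   # shared denominator of both formulas
    m = (n * sum_xy - sum_x * sum_y) / denom
    c = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return m, c

# Stand-in data: the four-row example table from Question 1.
t_example = [0.1, 0.2, 0.4, 0.5]
x_example = [10, 12, 14, 15]

m, c = fit_line(t_example, x_example)
print(m, c)

# Cross-check against NumPy's built-in degree-1 polynomial fit.
m_ref, c_ref = np.polyfit(t_example, x_example, 1)
print(m_ref, c_ref)
```

Both formulas share the same denominator, so it is computed once. The denominator is zero when all `x` values are identical, in which case no unique line exists.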

Using the same data we saw in Question 1, calculate the values for `m` and `c` for that data set given the formulas above.

You can think of the `t` column in the data as the `X` column, and the `x` values in the data as the `Y` column: we are trying to predict the value of `x` given a value of `t`.

This will result in a straight line that "best" fits through the data.

Compare the slope of this regression line to the average rate of change you calculated in Question 1.
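
The two numbers need not agree: the mean of the finite-difference rates weights every interval equally, while the regression slope is a least-squares fit through all the points. The sketch below runs the comparison on the four-row example table from Question 1. This is stand-in data, not `data.csv`, and `np.polyfit` is used as a shortcut for the formulas above.

```python
import numpy as np

# Stand-in data: the four-row example table from Question 1.
t = np.array([0.1, 0.2, 0.4, 0.5])
x = np.array([10.0, 12.0, 14.0, 15.0])

# Average of the rates of change between consecutive observations.
mean_rate = np.mean(np.diff(x) / np.diff(t))

# Slope and intercept of the least-squares line through (t, x).
slope, intercept = np.polyfit(t, x, 1)

print(f'Mean rate of change: {mean_rate:.2f}')
print(f'Regression slope:    {slope:.2f}')
```

On this small table, the short first interval has a steep rate that pulls the mean above the fitted slope.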