<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# Mathematics Basics

**With `NumPy` and `SciPy`**

&copy; Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | [training@tpq.io](mailto:training@tpq.io) | [@dyjh](http://twitter.com/dyjh)

See also `31_math_basics.ipynb`.

## `SciPy`

From the package page (https://scipy.org):

> SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:
> * NumPy: Base N-dimensional array package
> * SciPy library: Fundamental library for scientific computing
> * Matplotlib: Comprehensive 2-D plotting
> * IPython: Enhanced interactive console
> * SymPy: Symbolic mathematics
> * pandas: Data structures & analysis

## Simple Linear Regression

From Wikipedia (https://en.wikipedia.org/wiki/Simple_linear_regression):

> In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.
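The straight line described above is usually found by ordinary least squares. The sketch below uses a small synthetic data set generated only for illustration. It computes the closed-form estimates $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ with NumPy and compares them to `scipy.stats.linregress`:

```python
import numpy as np
from scipy.stats import linregress

# Synthetic data (an assumption for illustration): y = 3 + x / 2 plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 101)
y = 3 + x / 2 + rng.normal(0, 0.2, len(x))

# Closed-form ordinary least squares estimates for a single predictor
x_mean, y_mean = x.mean(), y.mean()
slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

# Compare with SciPy's implementation
res = linregress(x, y)
print(slope, res.slope)
print(intercept, res.intercept)
```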

## Coefficient of Determination

From Wikipedia (https://en.wikipedia.org/wiki/Coefficient_of_determination):

> In statistics, the coefficient of determination, denoted $R^2$ or $r^2$ and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).<br>It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.
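For simple linear regression with an intercept, $R^2 = 1 - SS_\text{res} / SS_\text{tot}$. This equals the square of the correlation coefficient $r$, which `linregress` reports as `rvalue`. A short sketch on hypothetical noisy data checks this numerically:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical noisy linear data
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 201)
y = 1 + 2 * x + rng.normal(0, 1.5, len(x))

res = linregress(x, y)
y_hat = res.intercept + res.slope * x   # fitted values

ss_res = ((y - y_hat) ** 2).sum()       # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(r_squared, res.rvalue ** 2)       # the two agree up to rounding
```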

## P-Value

From Simple English Wikipedia (https://simple.wikipedia.org/wiki/P-value):

> In statistics, a p-value is the probability that the null hypothesis (the idea that a theory being tested is false) gives for a specific experimental result to happen. p-value is also called probability value. If the p-value is low, the null hypothesis is unlikely, and the experiment has statistical significance as evidence for a different theory.<br>In many fields, an experiment must have a p-value of less than 0.05 for the experiment to be considered evidence of the alternative hypothesis. In short, a low p-value means a higher chance of the null hypothesis being false.
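The `pvalue` that `linregress` reports comes from a two-sided test. Its null hypothesis is that the slope is zero, and the test statistic follows a t distribution with $n-2$ degrees of freedom. The sketch below uses made-up data with a weak trend and recomputes the p-value from the slope and its standard error:

```python
import numpy as np
from scipy import stats

# Made-up data with a weak linear trend
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 51)
y = 3 + 0.1 * x + rng.normal(0, 1.0, len(x))

res = stats.linregress(x, y)

# Test statistic for H0: slope == 0
t_stat = res.slope / res.stderr
df = len(x) - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value

print(t_stat, p_value, res.pvalue)
```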

### Deterministic Sample Data 1 

In [None]:
!git clone https://github.com/tpq-classes/mathematics_basics.git
import sys
sys.path.append('mathematics_basics')


In [None]:
import numpy as np
np.set_printoptions(suppress=True)

In [None]:
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

In [None]:
x = np.linspace(0, 10, 1001)

In [None]:
y = 3 + x / 2

In [None]:
plt.plot(x, y);

For details on `linregress`, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html.

In [None]:
from scipy.stats import linregress

In general, a good linear fit shows a high absolute correlation coefficient $|r|$ (equivalently, a high $r^2$) and a low $p$ value.
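Because $r$ is negative for a falling line, the criterion is the size of $|r|$ (or $r^2$), not the sign of $r$. A quick check uses a perfectly linear but decreasing series, chosen here only for illustration:

```python
import numpy as np
from scipy.stats import linregress

x = np.linspace(0, 10, 1001)
y = 3 - x / 2  # perfectly linear, but with a negative slope

res = linregress(x, y)
print(res.rvalue)       # close to -1, not +1
print(res.rvalue ** 2)  # close to 1
```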

In [None]:
linregress(x, y)

In [None]:
res = linregress(x, y)

In [None]:
res

In [None]:
res[0]  # slope

In [None]:
res[1]  # intercept
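The result of `linregress` behaves like a named tuple, so its fields can also be accessed by name. This is easier to read than positional indexing. The sketch assumes SciPy 1.6 or later, where `intercept_stderr` is available as an attribute only and not by index:

```python
import numpy as np
from scipy.stats import linregress

x = np.linspace(0, 10, 1001)
y = 3 + x / 2

res = linregress(x, y)
# Named fields of the result object
print(res.slope, res.intercept)          # same as res[0], res[1]
print(res.rvalue, res.pvalue)            # same as res[2], res[3]
print(res.stderr, res.intercept_stderr)  # intercept_stderr: attribute only
```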

### Deterministic Sample Data 2

In [None]:
x = np.linspace(0, 10, 1001)

In [None]:
y = 3 + x ** 3 / 2

In [None]:
plt.plot(x, y);

In [None]:
res = linregress(x, y)

In [None]:
res

In [None]:
plt.plot(x, y)
plt.plot(x, res[0] * x + res[1], 'r--');
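A high $r^2$ does not mean that a straight line is the right model. In the cubic example $r^2$ is still high, but the residuals follow a clear pattern. The sketch below recomputes the data from the cells above and checks the residual signs at the start, middle and end of the range:

```python
import numpy as np
from scipy.stats import linregress

x = np.linspace(0, 10, 1001)
y = 3 + x ** 3 / 2

res = linregress(x, y)
residuals = y - (res.intercept + res.slope * x)

print(res.rvalue ** 2)                              # high, roughly 0.84
print(residuals[0], residuals[500], residuals[-1])  # positive, negative, positive
```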

## Random Data

### Random Sample Data 1

In [None]:
from numpy.random import default_rng

In [None]:
rng = default_rng(100)

In [None]:
y = 3 + x / 2 + rng.normal(0, 0.2, len(x))
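The seed passed to `default_rng` makes the noisy samples reproducible. Because `rng` is reused in the following sections without reseeding, each later sample continues the same random stream. Here is a small check, with an extra seed of 101 chosen only for comparison:

```python
import numpy as np
from numpy.random import default_rng

# Two generators created with the same seed produce identical draws
a = default_rng(100).normal(0, 0.2, 5)
b = default_rng(100).normal(0, 0.2, 5)
# A different seed gives a different stream
c = default_rng(101).normal(0, 0.2, 5)

print(np.array_equal(a, b), np.array_equal(a, c))
```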

In [None]:
plt.plot(x, y, 'b.');

In [None]:
res = linregress(x, y)

In [None]:
res

In [None]:
plt.plot(x, y, 'b.')
plt.plot(x, res[0] * x + res[1], 'r--');
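With noisy data, the estimates deviate slightly from the true values 0.5 and 3. The sketch below builds 95% confidence intervals from `stderr`, `intercept_stderr` and the t distribution. It recreates the data with a fresh generator seeded with 100, which should match the notebook's draw if the cells above are run once, in order:

```python
import numpy as np
from numpy.random import default_rng
from scipy import stats

# Recreates Random Sample Data 1 (fresh generator, first draw)
x = np.linspace(0, 10, 1001)
rng = default_rng(100)
y = 3 + x / 2 + rng.normal(0, 0.2, len(x))

res = stats.linregress(x, y)

# Two-sided 95% intervals using the t distribution with n - 2 degrees of freedom
tinv = stats.t.ppf(0.975, len(x) - 2)
slope_ci = (res.slope - tinv * res.stderr, res.slope + tinv * res.stderr)
intercept_ci = (res.intercept - tinv * res.intercept_stderr,
                res.intercept + tinv * res.intercept_stderr)

print(slope_ci)
print(intercept_ci)
```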

### Random Sample Data 2

In [None]:
y = 3 + np.sqrt(x) + rng.normal(0, 0.2, len(x))

In [None]:
plt.plot(x, y, 'b.');

In [None]:
res = linregress(x, y)

In [None]:
res

In [None]:
plt.plot(x, y, 'b.')
plt.plot(x, res[0] * x + res[1], 'r--');
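This data is linear in $\sqrt{x}$ rather than in $x$, so one option is to transform the regressor before calling `linregress`. The sketch below assumes the functional form is known. Its noise comes from a fresh generator, so it differs from the draw above:

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import linregress

x = np.linspace(0, 10, 1001)
rng = default_rng(100)  # fresh generator: noise differs from the notebook's draw
y = 3 + np.sqrt(x) + rng.normal(0, 0.2, len(x))

res_raw = linregress(x, y)            # straight line in x
res_sqrt = linregress(np.sqrt(x), y)  # straight line in sqrt(x)

print(res_raw.rvalue ** 2, res_sqrt.rvalue ** 2)
print(res_sqrt.slope, res_sqrt.intercept)  # close to 1 and 3
```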

### Random Sample Data 3

In [None]:
x = np.linspace(0, 22, 1001)

In [None]:
y = 3 + np.sin(x) + rng.normal(0, 0.2, len(x))

In [None]:
plt.plot(x, y, 'b.');

In [None]:
res = linregress(x, y)

In [None]:
res

In [None]:
plt.plot(x, y, 'b.')
plt.plot(x, res[0] * x + res[1], 'r--');
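For the sine data, the straight line is nearly flat and $r^2$ is close to zero, even though `y` clearly depends on `x`. The reason is that $r$ only measures linear association. Comparing the residual standard deviation with the known noise level of 0.2 is one way to expose the missing structure. The sketch below draws fresh noise (seed 100 is an assumption):

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import linregress

x = np.linspace(0, 22, 1001)
rng = default_rng(100)
y = 3 + np.sin(x) + rng.normal(0, 0.2, len(x))

res = linregress(x, y)
residuals = y - (res.intercept + res.slope * x)

print(res.rvalue ** 2)   # close to 0: almost no linear association
print(residuals.std())   # far above the noise level of 0.2
```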

### Random Sample Data 4

In [None]:
x = np.linspace(1, 50, 1001)

In [None]:
y = 3 + x / 2 - np.log(x) - np.sqrt(x) / 2 + 3 * np.sin(x) + rng.normal(0, 0.2, len(x))

In [None]:
plt.plot(x, y, 'b.');

In [None]:
res = linregress(x, y)

In [None]:
res

In [None]:
plt.plot(x, y, 'b.')
plt.plot(x, res[0] * x + res[1], 'r--');
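The formula for this sample combines several terms, so a single straight line cannot capture it. If the basis functions are assumed to be known, a multiple least-squares fit with `np.linalg.lstsq` is one way to account for them. In the sketch below (fresh noise, seed 100 as an assumption), the residual standard deviation should drop to about the noise level of 0.2. The individual coefficients can still be imprecise because `x`, `log(x)` and `sqrt(x)` are strongly correlated:

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import linregress

x = np.linspace(1, 50, 1001)
rng = default_rng(100)
y = (3 + x / 2 - np.log(x) - np.sqrt(x) / 2 + 3 * np.sin(x)
     + rng.normal(0, 0.2, len(x)))

# Simple linear regression: one straight line in x
res = linregress(x, y)
resid_line = y - (res.intercept + res.slope * x)

# Least squares on the (assumed known) basis functions of the generating formula
A = np.column_stack([np.ones_like(x), x, np.log(x), np.sqrt(x), np.sin(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid_basis = y - A @ coef

print(resid_line.std(), resid_basis.std())
print(coef)  # compare with 3, 0.5, -1, -0.5, 3
```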
