# PySR tutorial: from cosine to rate-and-state friction

In this tutorial we walk through the first steps of symbolic regression using `pysr`. We start with a basic example built on a cosine function, then move on to a rate-and-state friction model. Along the way we explain how `pysr` arrives at its functional solution.

Some of this material comes from the original tutorial provided by `pysr`, which can be found on Google Colab:

https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb#scrollTo=4nDAAnisdhTc

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from pysr import PySRRegressor
from sklearn.model_selection import train_test_split

In [3]:
# Dataset
np.random.seed(0)
X = 2 * np.random.randn(100, 5)
x3 = X[:, 3]
x0 = X[:, 0]
# y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2
y = 2.5382 * np.cos(x3) + x0 ** 2 - 2

# Explanation of how the model works

Below we create a dictionary of the parameters we want to use in the model.

By default, `populations=15`, but you can set a different number of populations with this option. More populations may increase the diversity of the equations discovered, at the cost of longer training. It is usually more efficient to have `populations>procs`, so that multiple populations run on each core.

By default, `PySRRegressor` uses `model_selection='best'`, which selects an equation from `PySRRegressor.equations_` using a combination of accuracy and complexity. You can also select `model_selection='accuracy'`, which picks the equation with the lowest loss regardless of its complexity.

* How is accuracy calculated?
* How is complexity calculated?
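The two questions above can be made concrete with a small sketch. As far as the PySR documentation describes it, accuracy is the loss on the training data (mean squared error by default), and complexity is by default the number of nodes in the expression tree. `model_selection='best'` then picks the highest-scoring equation among those whose loss is within 1.5x of the lowest loss, where the score is the drop in log-loss per unit of added complexity. The complexity and loss values below are made up for illustration, and `scores` and `pick_best` are hypothetical helpers, not part of PySR.

```python
import numpy as np

# Hypothetical candidate equations, sorted by complexity (values are invented).
complexity = np.array([1, 3, 5, 8])
loss = np.array([10.0, 4.0, 0.5, 0.4])

def scores(complexity, loss):
    """Drop in log-loss per unit of added complexity; the first entry scores 0."""
    out = np.zeros(len(loss))
    out[1:] = -np.diff(np.log(loss)) / np.diff(complexity)
    return out

def pick_best(complexity, loss, factor=1.5):
    """Index of the highest score among rows with loss <= factor * min(loss)."""
    s = scores(complexity, loss)
    eligible = loss <= factor * loss.min()
    return int(np.argmax(np.where(eligible, s, -np.inf)))

best = pick_best(complexity, loss)      # trades accuracy against complexity
most_accurate = int(np.argmin(loss))    # what 'accuracy' would select
print(best, most_accurate)
```

In this made-up table the most accurate equation is the most complex one, but the large drop in loss happens one step earlier, so the two selection rules disagree.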

In [4]:
default_pysr_params = dict(
    populations=30, # https://astroautomata.com/PySR/options/#populations
    model_selection="best",
)

Now we can fit the model. The process works essentially the same way as for a `sklearn` model: first you create the model object with its set of hyperparameters, then you fit it. Keep in mind that, unlike a typical `sklearn` estimator, which fits a single model, a fitted `PySR` model holds a whole set of candidate symbolic expressions of increasing complexity (stored in `equations_`). These expressions are not restricted to linear combinations of the variables; they are built from whichever operators you allow, and one of them is chosen for prediction according to `model_selection`.

`PySR` can run for an arbitrarily long time and keep finding more and more accurate expressions. You can set the total number of cycles of evolution with `niterations`, although there are also a few other ways to stop execution.
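As a rough sketch of those other stopping options: to the best of our knowledge `PySRRegressor` also accepts `timeout_in_seconds`, `max_evals`, and `early_stop_condition`. The values below are arbitrary placeholders rather than recommendations, and `stopping_params` is just a name chosen for this example.

```python
# Extra stopping criteria for PySRRegressor; all values are illustrative only.
stopping_params = dict(
    timeout_in_seconds=600,     # stop after ten minutes of wall-clock time
    max_evals=100_000,          # stop after this many expression evaluations
    early_stop_condition=1e-6,  # stop once an equation reaches this loss
)

# Assumed usage, alongside the parameters defined earlier in this tutorial:
# model = PySRRegressor(niterations=30, **default_pysr_params, **stopping_params)
```

Whichever criterion is met first ends the search, so `niterations` then acts as an upper bound rather than a fixed run length.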

Binary operators take two arguments, such as `A + B` or `A * B` (see https://en.wikipedia.org/wiki/Binary_operation).

Unary operators take a single argument and transform it, such as `sin(x)`, `abs(x)`, or `-(x)`.
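To see how a handful of operators is enough to express the target function, here is a minimal sketch that writes `2.5382 * cos(x3) + x0**2 - 2` as an expression tree using only `plus`, `mult`, and `cos`. The nested-tuple representation and the helpers `evaluate` and `count_nodes` are assumptions made for this illustration, not PySR internals. Note that with no power or subtraction operator available, `x0**2` has to be written as `x0 * x0` and `- 2` as adding the constant `-2`.

```python
import numpy as np

# Operator tables keyed by the same names that are passed to PySRRegressor.
BINARY = {"plus": np.add, "mult": np.multiply}
UNARY = {"cos": np.cos, "exp": np.exp, "sin": np.sin}

def evaluate(node, X):
    """Evaluate a nested-tuple expression tree on the columns of X."""
    if isinstance(node, (int, float)):          # constant leaf
        return np.full(X.shape[0], float(node))
    kind = node[0]
    if kind == "x":                             # feature leaf: ("x", column)
        return X[:, node[1]]
    if kind in UNARY:                           # one child
        return UNARY[kind](evaluate(node[1], X))
    return BINARY[kind](evaluate(node[1], X), evaluate(node[2], X))

def count_nodes(node):
    """Count operators, features, and constants in the tree."""
    if isinstance(node, (int, float)) or node[0] == "x":
        return 1
    return 1 + sum(count_nodes(child) for child in node[1:])

# (2.5382 * cos(x3) + x0 * x0) + (-2)
tree = ("plus",
        ("plus",
         ("mult", 2.5382, ("cos", ("x", 3))),
         ("mult", ("x", 0), ("x", 0))),
        -2.0)

rng = np.random.default_rng(0)
X_demo = 2 * rng.standard_normal((100, 5))
y_tree = evaluate(tree, X_demo)
y_direct = 2.5382 * np.cos(X_demo[:, 3]) + X_demo[:, 0] ** 2 - 2

print(np.allclose(y_tree, y_direct), count_nodes(tree))
```

The node count gives a feel for what a complexity measure based on tree size looks like: every operator, feature, and constant adds one.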

In [None]:
# Learn equations
model = PySRRegressor(
    niterations=30,
    binary_operators=["plus", "mult"],
    unary_operators=["cos", "exp", "sin"],
    **default_pysr_params
)

model.fit(X, y)

