# Writing Models to and from YAML Files

In all the other tutorials, we write the models from scratch in Python.
This is useful when interactively testing new models.
However, there comes a point where we want to test multiple models quickly and writing separate Python files for all of them becomes cumbersome.
For those use cases, `simpple` allows you to specify your entire model in a YAML file.

## Simulated Data

In [None]:
import numpy as np

rng = np.random.default_rng(123)

x = np.sort(10 * rng.random(100))
m_true = 1.338
b_true = -0.45
truths = {"m": m_true, "b": b_true, "sigma": None}
y_true = m_true * x + b_true
yerr = 0.1 + 0.5 * rng.random(x.size)
y = y_true + 2 * yerr * rng.normal(size=x.size)

## Loading the Model

### Loading only the Parameters

In many cases, what we actually want to store is the prior distribution on parameters.
For example, the data may already be loaded with NumPy and the forward model function already implemented, and we only want to test various prior distributions.
In this case, we can load just the parameters dictionary and pass it to a model as we normally would.

In [None]:
import simpple.load as sl

sl.load_parameters("./docs/tutorials/examples/line.yaml")

### Loading a `ForwardModel`

The next option is to load the entire model directly from the YAML file.
Arguments (other than `parameters`) and keyword arguments can be passed directly to `from_yaml()`.

In [None]:
from simpple.model import ForwardModel


def linear_model(p, x):
    return p["m"] * x + p["b"]


def log_likelihood(p, x, y, yerr):
    ymod = linear_model(p, x)
    var = yerr**2 + p["sigma"] ** 2
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - ymod) ** 2 / var)


model = ForwardModel.from_yaml(
    "./docs/tutorials/examples/line.yaml",
    log_likelihood=log_likelihood,
    forward=linear_model,
)
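For reference, the `log_likelihood` function in the cell above can be written as a Gaussian likelihood with an extra variance term. The symbols $\sigma_{y,i}$ (for `yerr`) and $s_i^2$ (for `var`) are notation introduced here, not taken from `simpple`:

$$
\ln \mathcal{L} = -\frac{1}{2} \sum_{i} \left[ \ln\left(2\pi s_i^2\right) + \frac{\left(y_i - (m x_i + b)\right)^2}{s_i^2} \right],
\qquad s_i^2 = \sigma_{y,i}^2 + \sigma^2
$$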

Another option is to specify the arguments and keyword arguments in the YAML file, under the `args` and `kwargs` fields.
For the built-in `Model` and `ForwardModel` classes, the only extra arguments are functions.
We cannot write functions in the YAML file, so the config file should contain the name of the function.
If there is a dot in the function name, `simpple` will try to import it; otherwise, it will search for it in the global namespace.
This means that as long as the function is available to be used or imported in your script, `simpple` should be able to find it.
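
As a rough illustration of this kind of name lookup, here is a minimal sketch. It is not `simpple`'s actual implementation: `resolve_function` is a hypothetical helper written for this example, and it assumes the caller passes the namespace to search explicitly.

```python
import importlib


def resolve_function(name, namespace):
    """Hypothetical helper: turn a function name from a config file into a callable."""
    if "." in name:
        # Dotted name: import the module part, then fetch the attribute from it.
        module_name, _, attr = name.rpartition(".")
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    # Plain name: look it up in the namespace supplied by the caller.
    try:
        return namespace[name]
    except KeyError:
        raise NameError(f"Function {name!r} not found in the given namespace") from None


def linear_model(p, x):
    return p["m"] * x + p["b"]


sqrt = resolve_function("math.sqrt", globals())  # imported from the math module
line = resolve_function("linear_model", globals())  # found in this script's globals
print(sqrt(9.0))  # 3.0
print(line({"m": 2.0, "b": 1.0}, 3.0))  # 7.0
```

With a scheme like this, a function defined in the current script can be referred to by its bare name, while a function living in an installed package is referred to by its full dotted path.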

In [None]:
model = ForwardModel.from_yaml("./docs/tutorials/examples/line.yaml")

## Saving the Model

To save the model, simply call its `to_yaml()` method.

In [None]:
model.to_yaml("./save_line.yaml", overwrite=True)

## Working with Custom Models

### Using the Default YAML Functions

As discussed in the [Writing Model Classes](./writing-model-classes.ipynb) tutorial, we can build our own model classes.
As long as they are subclasses of `Model` or `ForwardModel`, they will inherit the `from_yaml()` and `to_yaml()` methods.

In [None]:
import simpple.distributions as sdist


class PolyModel(ForwardModel):
    def __init__(self, parameters: dict[str, sdist.Distribution], order: int):
        self.order = order
        self.parameters = parameters
        for i in range(self.order + 1):
            k = "a" + str(i)
            if k not in self.parameters:
                raise KeyError(
                    f"Parameters should have keys from a0 to a{self.order} for polynomial of order {self.order}. Key {k} not found."
                )

    def _forward(self, p, x):
        parr = np.array([p[f"a{i}"] for i in range(self.order + 1)])
        return np.vander(x, self.order + 1, increasing=True) @ parr

    def _log_likelihood(self, p, x, y, yerr):
        ymod = self.forward(p, x)
        var = yerr**2 + p["sigma"] ** 2
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - ymod) ** 2 / var)

Here, `PolyModel` already defines its forward and log-likelihood functions.
Therefore, the only argument needed in the YAML file is the order of the polynomial.

In [None]:
pm = PolyModel.from_yaml("./docs/tutorials/examples/line_poly.yaml")

In [None]:
# TODO: Fix to_yaml so it saves only what was passed as an argument to ForwardModel
pm.to_yaml("./save_line_poly.yaml", overwrite=True)

### Using Custom YAML Functions

There are many cases where it can be useful to add custom functionality when reading from or writing to YAML files.
We can easily do so by implementing our own `from_yaml()` and `to_yaml()` methods.

For example, let us say our polynomial model also stores the data as attributes.
It is convenient to pass the data directly as arrays when creating our model in a notebook or a script.
However, when storing the model in a YAML file, we need to read and write the data in some other way.
One simple option is to store the arrays in a text file and load them from that text file.

Note that this is a simple implementation for demonstration purposes, but you could customize the two methods in any way you see fit for your own use case.
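
The text-file round trip on its own can be sketched as follows. The small arrays and the `demo_data.csv` file name are made up for illustration, and a temporary directory is used so that nothing is left on disk.

```python
import tempfile
from pathlib import Path

import numpy as np

# Small made-up arrays standing in for the x, y and yerr data.
x_demo = np.array([0.0, 1.0, 2.0])
y_demo = np.array([-0.5, 0.9, 2.2])
yerr_demo = np.array([0.1, 0.2, 0.3])

with tempfile.TemporaryDirectory() as tmpdir:
    data_file = Path(tmpdir) / "demo_data.csv"
    # Stack the 1D arrays as columns so that each row is one data point.
    np.savetxt(data_file, np.column_stack([x_demo, y_demo, yerr_demo]), delimiter=",")
    # Transposing the loaded (N, 3) array lets us unpack it column by column.
    x_back, y_back, yerr_back = np.loadtxt(data_file, delimiter=",").T

# The arrays read back from the file match the ones that were written.
print(np.allclose(x_back, x_demo))
```

Writing one data point per row keeps the file easy to inspect by hand, at the cost of a transpose when reading it back.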


In [None]:
import yaml
import simpple.distributions as sdist
from pathlib import Path
from simpple.load import write_parameters, parse_parameters


class PolyModelData(ForwardModel):
    def __init__(
        self,
        parameters: dict[str, sdist.Distribution],
        order: int,
        x: np.ndarray,
        y: np.ndarray,
        yerr: np.ndarray,
    ):
        self.x = x
        self.y = y
        self.yerr = yerr
        self.order = order
        self.parameters = parameters
        for i in range(self.order + 1):
            k = "a" + str(i)
            if k not in self.parameters:
                raise KeyError(
                    f"Parameters should have keys from a0 to a{self.order} for polynomial of order {self.order}. Key {k} not found."
                )

    @classmethod
    def from_yaml(cls, path: Path | str, data_file: Path | None = None):
        with open(path) as f:
            mdict = yaml.safe_load(f)
        parameters = parse_parameters(mdict["parameters"])
        data_file = data_file or mdict["data_file"]
        x, y, yerr = np.loadtxt(data_file, delimiter=",").T
        model = cls(parameters, mdict["order"], x, y, yerr)
        model.data_file = data_file
        return model

    def to_yaml(self, path: Path | str, overwrite: bool = False):
        path = Path(path)

        model_dict = {}
        model_dict["class"] = self.__class__.__name__
        model_dict["parameters"] = write_parameters(self.parameters)
        model_dict["order"] = self.order

        if hasattr(self, "data_file"):
            # The model was loaded from a data file: save its path so we re-use the same data.
            model_dict["data_file"] = str(self.data_file)
        else:
            # No data file is associated with the model yet: write the data next to the YAML file.
            data_file = path.parent / (path.stem + "_data.csv")
            model_dict["data_file"] = str(data_file)
            if data_file.exists() and not overwrite:
                raise FileExistsError(
                    f"The data file {data_file} already exists. Use overwrite=True to overwrite it."
                )
            # Use the arrays stored on the instance, not global variables.
            np.savetxt(
                data_file, np.array([self.x, self.y, self.yerr]).T, delimiter=","
            )

        if path.exists() and not overwrite:
            raise FileExistsError(
                f"The file {path} already exists. Use overwrite=True to overwrite it."
            )
        with open(path, mode="w") as f:
            yaml.dump(model_dict, f)

    def _forward(self, p, x):
        parr = np.array([p[f"a{i}"] for i in range(self.order + 1)])
        return np.vander(x, self.order + 1, increasing=True) @ parr

    def _log_likelihood(self, p, x, y, yerr):
        ymod = self.forward(p, x)
        var = yerr**2 + p["sigma"] ** 2
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - ymod) ** 2 / var)


test_p = sl.load_parameters("./docs/tutorials/examples/line_poly.yaml")
pmd = PolyModelData(test_p, 1, x, y, yerr)

Since the class needs data to be instantiated, we want to save those data along with the YAML file.

In [None]:
pmd.to_yaml("./save_poly_data.yaml", overwrite=True)

If we then load the model, it will simply re-use the data file that the YAML file points to.

In [None]:
pmd_yaml = PolyModelData.from_yaml("./save_poly_data.yaml")