# The zfit API

Currently, the zfit package has a two-fold purpose: to design a high-level API to manage model building, fitting and generation, and to implement this API using the `tensorflow` backend.

Given a PDF model `pdf`, a dataset `data`, and a list of parameters `params` to minimize, the final goal is to be able to perform minimization tasks such as

```python
import zfit
from zfit.core.minimizer import MinuitMinimizer

nll = zfit.UnbinnedNLL(pdf, data, fit_range=(-10, 10))
minimizer = MinuitMinimizer(nll)
minimizer.minimize(params)
minimizer.hesse(params)
minimizer.error(params)
result = minimizer.get_state()
```

and generation in a very simple way

```python
sample = pdf.sample(n_draws=int(1e7), limits=(-10, 10))
```

The main concepts in the API are
- Parameters
- PDFs (and scalar functions, which are basically unnormalized pdfs)
- Loss functions
- Minimizers
  
In the following, we informally outline the basics of each of these objects, but we will not go into some of the more nitty-gritty API details.


## Parameters

Parameters are named quantities to be optimized in the minimization problems we are trying to solve.
Classes implementing parameters contain the value of the parameter, its limits, whether it is fixed or not, and possibly its symmetric and asymmetric errors.


A *Parameter* initialization **MUST** contain its name and its initial value, and **MAY** include its lower and upper limits.

One can access the parameter information through the following properties (names are self-explanatory):
  - The parameter name is accessed through `name`.
  - Its initial value is `init_value` and its current value is given by `value`.
  - Uncertainties are given by `error`, `upper_error` and `lower_error`. An error is raised if one tries to access them without having performed a minimization first.
  
Additionally, the parameter can be fixed or unfixed by setting the `floating` property to `False` or `True`, respectively.
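
As a rough illustration of the behaviour described above, the following self-contained sketch mimics a parameter in plain Python. It is not zfit code: `ToyParameter`, the `lower_limit`/`upper_limit` argument names and the choice of `RuntimeError` are assumptions made for this example; only the `name`, `init_value`, `value`, `floating` and `error` attributes follow the text.

```python
class ToyParameter:
    """Hypothetical stand-in for a parameter (illustration only, not zfit)."""

    def __init__(self, name, init_value, lower_limit=None, upper_limit=None):
        # Name and initial value are mandatory, limits are optional.
        self.name = name
        self.init_value = float(init_value)
        self.value = float(init_value)
        self.lower_limit = lower_limit
        self.upper_limit = upper_limit
        self.floating = True   # False means the parameter is fixed
        self._error = None     # would be filled in by a minimizer

    @property
    def error(self):
        # Uncertainties only exist once a minimization has been performed.
        if self._error is None:
            raise RuntimeError(
                f"No error available for '{self.name}': run a minimization first"
            )
        return self._error


mu = ToyParameter("mu", 1.0, lower_limit=-5.0, upper_limit=5.0)
mu.floating = False  # fix the parameter

try:
    mu.error
    error_raised = False
except RuntimeError:
    error_raised = True
```

Raising on early access to `error`, rather than returning a placeholder such as `None`, makes it harder to silently propagate a missing uncertainty.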

## PDFs

PDF objects are normalized distributions, typically as a function of several parameters.
A very important concept is the *normalization range*, which is mandatory in most operations involving PDFs.

*Note*: details on how to compose and create your own PDFs, implement integrals, etc., belong to the implementation and will be discussed later.

PDF objects **MUST** be initialized with either *Parameters* or simple constants, **preferably** passed as keyword arguments, and **MAY** also have a name. For example:

```python
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, name="My Gaussian")
```

or simply a PDF with constant parameter values

```python
gauss_fix = zfit.pdf.Gauss(mu=1., sigma=4.)
```

The main methods of the PDF are:

- Getting the probability through the `prob` method, which **MUST** be called with a data array `x` and a normalization range `norm_range` as inputs. For example:

    ```python
    import numpy as np

    # Get the probabilities of some randomly generated events
    probs = gauss.prob(x=np.random.random(10), norm_range=(-30., 30))
    ```

- Getting the value of its integral in some given `limits` with the `integrate` method. The `norm_range` **MUST** be given here, but it **MAY** be set to `False` to request the integral of the unnormalized PDF:

    ```python
    # Calculate the integral between -5 and 3 over the PDF normalized between -30 and 30
    integral_norm = gauss.integrate(limits=(-5, 3), norm_range=(-30., 30))
    # Calculate the unnormalized integral 
    integral_unnorm = gauss.integrate(limits=(-5, 3), norm_range=False)
    ```
    
- Getting the gradient through the `gradient` method, which **MUST** get the data array `x` and the normalization range `norm_range` as inputs (as always, `norm_range` can be set to `False`, in which case no normalization is done). Additionally, the list of parameters with respect to which the gradient is computed **MAY** be given through the `params` argument:

    ```python
    gradient = gauss.gradient(x=np.random.random(10), norm_range=(-30, 30), params=['mu'])
    ```

- Sampling from the PDF is done through the `sample` method, which **MUST** include the number of events `n_draws` as well as the limits from which to draw (`limits`):

    ```python
    # Draw 1000 samples within (-10, 10)
    sample = gauss.sample(n_draws=1000, limits=(-10, 10))
    ```
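
To make the role of `norm_range` in `prob` and `integrate` more concrete, here is a minimal sketch in plain Python for a one-dimensional Gaussian. It is not zfit code: the free functions, their signatures and the helper names `gauss_shape` and `gauss_integral` are assumptions made for this illustration. The normalized quantities are simply the unnormalized ones divided by the integral of the shape over `norm_range`.

```python
import math


def gauss_shape(x, mu, sigma):
    """Unnormalized Gaussian shape, equal to 1 at x == mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def gauss_integral(limits, mu, sigma):
    """Analytic integral of the unnormalized shape over `limits`."""
    lower, upper = limits
    root2 = math.sqrt(2.0)
    return sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((upper - mu) / (sigma * root2))
        - math.erf((lower - mu) / (sigma * root2))
    )


def prob(x, mu, sigma, norm_range):
    """Shape divided by its integral over `norm_range` (False: no normalization)."""
    value = gauss_shape(x, mu, sigma)
    if norm_range is False:
        return value
    return value / gauss_integral(norm_range, mu, sigma)


def integrate(limits, mu, sigma, norm_range):
    """Integral over `limits` of the PDF normalized over `norm_range`."""
    integral = gauss_integral(limits, mu, sigma)
    if norm_range is False:
        return integral
    return integral / gauss_integral(norm_range, mu, sigma)


# Integrating over the normalization range itself gives 1 by construction.
full = integrate(limits=(-30.0, 30.0), mu=1.0, sigma=4.0, norm_range=(-30.0, 30.0))
# A sub-range gives the fraction of the PDF contained in it.
partial = integrate(limits=(-5.0, 3.0), mu=1.0, sigma=4.0, norm_range=(-30.0, 30.0))
# With norm_range=False the raw integral of the shape is returned.
unnormalized = integrate(limits=(-5.0, 3.0), mu=1.0, sigma=4.0, norm_range=False)
```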
    
Additionally, a PDF can be made *extended* through the `set_yield` method, which **MUST** get a parameter as input. For an extended PDF, anything using a `norm_range` returns not the probability but the number probability: the function is normalized to the yield instead of to 1 inside the `norm_range`.

```python
yield1 = zfit.Parameter("yield1", 100, 0, 1000)
gauss.set_yield(yield1)
# This integral yields approx 100
integral_extended = gauss.integrate(limits=(-10, 10), norm_range=(-10, 10))
```
    
The `is_extended` property can then be used to check whether a PDF is extended or not.
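
The effect of a yield on the normalization can be sketched with a flat distribution, for which the integral is just a ratio of interval lengths. This is a plain-Python illustration, not zfit code: `ToyUniform` is a hypothetical class, and for brevity `set_yield` takes a plain number here instead of a parameter object.

```python
class ToyUniform:
    """Hypothetical flat PDF used only to illustrate extended normalization."""

    def __init__(self):
        self._yield = None

    def set_yield(self, value):
        # Simplification: a plain number instead of a parameter object.
        self._yield = float(value)

    @property
    def is_extended(self):
        return self._yield is not None

    def integrate(self, limits, norm_range):
        lower, upper = limits
        norm_lower, norm_upper = norm_range
        fraction = (upper - lower) / (norm_upper - norm_lower)
        # Extended PDFs are normalized to the yield instead of to 1.
        scale = self._yield if self.is_extended else 1.0
        return scale * fraction


flat = ToyUniform()
before = flat.integrate(limits=(-10, 10), norm_range=(-10, 10))  # normalized to 1
flat.set_yield(100)
after = flat.integrate(limits=(-10, 10), norm_range=(-10, 10))   # normalized to the yield
half = flat.integrate(limits=(0, 10), norm_range=(-10, 10))      # half of the yield
```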

Finally, there **MUST** be the option to *temporarily* set the `norm_range` of a PDF with a context manager, so that several operations can share it and the code becomes more readable:

```python
with pdf.temp_norm_range((-30, 30)):
    pdf.prob(data)  # norm_range is now set
    pdf.integrate(limits=(-5, 3))
```
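
One possible way to obtain this behaviour is Python's standard `contextlib.contextmanager` decorator. The sketch below is an assumption about how such a method could look and not the zfit implementation; `ToyPDF` is a hypothetical class that only stores the normalization range.

```python
from contextlib import contextmanager


class ToyPDF:
    """Hypothetical PDF that only keeps track of its normalization range."""

    def __init__(self):
        self.norm_range = None

    @contextmanager
    def temp_norm_range(self, norm_range):
        previous = self.norm_range
        self.norm_range = norm_range
        try:
            yield self
        finally:
            # Restore the old range even if the body raised an exception.
            self.norm_range = previous


pdf = ToyPDF()
with pdf.temp_norm_range((-30, 30)):
    inside = pdf.norm_range   # the temporary range is active here
outside = pdf.norm_range      # the previous value is back
```

The `try`/`finally` guarantees that a failure inside the `with` block cannot leave the PDF with the temporary range still set.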


## Loss functions

Loss functions can then be built using `pdf.prob`, following a common interface. The model, the dataset and the fit range **MUST** be given; the fit range internally sets `norm_range` in the PDF and makes sure that only data within that range are used. Parameter constraints **MAY** be given in the form of a dictionary `{param: constraint}`.

As an example for unbinned NLL, one would write

```python
my_loss = zfit.UnbinnedNLL(gauss,
                           data,
                           fit_range=(-10, 10),
                           constraints={mu: zfit.pdf.Gauss(mu=1., sigma=0.4)})
```

Additional constraints **MAY** be passed to the loss object using the `add_constraint(constraints)` method.

To build loss functions for simultaneous fits, the addition operation, either through the `my_loss.add` method or through the `+` operator, can be used (the particular combination that is performed depends on the type of loss function). The same result can be achieved by passing a list of PDFs on instantiation, along with the same number of datasets and fit ranges.

Finally, the value of the loss function is evaluated using the `eval()` method.

```python
simultaneous_loss = my_loss1 + my_loss2
loss_value = simultaneous_loss.eval()
```
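
The following plain-Python sketch illustrates the two ideas above for an unbinned NLL: the fit range both selects the data and normalizes the PDF, and adding two losses gives a loss whose value is the sum of the individual values. It is not zfit code. `ToyNLL` and `ToySum` are hypothetical classes, the Gaussian has fixed `mu` and `sigma`, and constraints are left out to keep the example short.

```python
import math


class ToyNLL:
    """Hypothetical unbinned NLL for a Gaussian with fixed mu and sigma."""

    def __init__(self, mu, sigma, data, fit_range):
        lower, upper = fit_range
        self.mu, self.sigma = mu, sigma
        self.fit_range = fit_range
        # Only events inside the fit range enter the likelihood.
        self.data = [x for x in data if lower <= x <= upper]

    def _norm(self):
        # Integral of the unnormalized Gaussian shape over the fit range.
        lower, upper = self.fit_range
        root2 = math.sqrt(2.0)
        return self.sigma * math.sqrt(math.pi / 2.0) * (
            math.erf((upper - self.mu) / (self.sigma * root2))
            - math.erf((lower - self.mu) / (self.sigma * root2))
        )

    def eval(self):
        # -log(shape / norm) = 0.5 * z**2 + log(norm), summed over events.
        log_norm = math.log(self._norm())
        return sum(
            0.5 * ((x - self.mu) / self.sigma) ** 2 + log_norm for x in self.data
        )

    def __add__(self, other):
        return ToySum([self, other])


class ToySum:
    """Sum of losses, as used for a simultaneous fit."""

    def __init__(self, losses):
        self.losses = list(losses)

    def eval(self):
        return sum(loss.eval() for loss in self.losses)

    def __add__(self, other):
        return ToySum(self.losses + [other])


loss1 = ToyNLL(0.0, 1.0, data=[-0.5, 0.2, 1.1, 25.0], fit_range=(-10, 10))
loss2 = ToyNLL(1.0, 2.0, data=[0.3, 2.4], fit_range=(-10, 10))
simultaneous = loss1 + loss2
```

Here the event at `25.0` is dropped from `loss1` because it lies outside the fit range.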

## Minimizers

Minimizer objects are key to providing a coherent fitting API.
They are tied to a loss function and they keep an internal state that can be queried at any moment.

In their initialization, the loss function **MUST** be given. Additionally, the `params` to minimize, the `tolerance`, its `name`, as well as any other arguments needed to configure a particular algorithm **MAY** be given.

The internal state of the Minimizer is stored in a `MinimizerState` object, which provides access to the Estimated Distance to the Minimum, the value at the minimum and its status through the `edm`, `fmin` and `status` properties, respectively.
Additionally, the parameters of the minimizer can be accessed through the `get_parameters` method, which accepts two optional arguments:

- `names`: A list of the parameter names to fetch. If not given, it returns all of them. 
- `only_floating`: Fetch only floating parameters. Defaults to `True`.

Access to all the properties and functions of `MinimizerState` can be done directly from the Minimizer, which would just return a view of its internal state.

The API **REQUIRES** the following methods and properties to be implemented (below, `params=None` means that all the parameters from the loss function are used):
- `minimize(params=None)`, which runs the minimization and returns the internal state.
- `step(params=None)`, which performs only one step of the minimization procedure. If not applicable, this raises `NotImplementedError`.
- `hesse(params=None)`, which calculates the Hessian and returns the internal state.
- `error(params=None)`, which calculates the two-sided error and returns the internal state. This typically complicated function can be configured with `set_error_options`. Additionally, several methods for calculating this error can be implemented in a given minimizer, and the `set_error_method` method can be used to set the one called by `error`.
- `get_state(copy=True)`, which returns the internal `MinimizerState`, *i.e.*, the parameters, their errors, etc. The optional `copy` parameter controls whether a copy of the internal state (which would be the equivalent of a fit result) is returned, or just a view (reference) of it.
- `converged`, a property which is set to `True` if the minimization process has been successful.
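
The interplay between `step`, `minimize`, `converged` and the copy/view semantics of `get_state` can be sketched with a toy one-dimensional minimizer. This is an illustration in plain Python under several assumptions, not the zfit or Minuit implementation: plain gradient descent is used as the algorithm, the loss is a Python callable of one variable, and `ToyState`, `ToyMinimizer`, `start`, `learning_rate` and `max_steps` are hypothetical names.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical, much reduced stand-in for a minimizer state."""
    fmin: float = float("nan")
    status: str = "not run"
    values: dict = field(default_factory=dict)


class ToyMinimizer:
    """Hypothetical 1D gradient-descent minimizer (illustration only)."""

    def __init__(self, loss, start=0.0, tolerance=1e-8, learning_rate=0.1):
        self.loss = loss
        self.tolerance = tolerance
        self.learning_rate = learning_rate
        self.converged = False
        self._x = start
        self._state = ToyState()

    def _gradient(self, x, eps=1e-6):
        # Central finite difference of the loss.
        return (self.loss(x + eps) - self.loss(x - eps)) / (2.0 * eps)

    def step(self):
        """Perform one update, refresh the state, return the gradient size."""
        grad = self._gradient(self._x)
        self._x -= self.learning_rate * grad
        self._state.fmin = self.loss(self._x)
        self._state.values = {"x": self._x}
        return abs(grad)

    def minimize(self, max_steps=10_000):
        for _ in range(max_steps):
            if self.step() < self.tolerance:
                self.converged = True
                break
        self._state.status = "converged" if self.converged else "failed"
        return self._state

    def get_state(self, copy=True):
        # A copy behaves like a frozen fit result, a view follows later updates.
        return deepcopy(self._state) if copy else self._state


minimizer = ToyMinimizer(lambda x: (x - 3.0) ** 2)
minimizer.minimize()
snapshot = minimizer.get_state()        # independent copy
view = minimizer.get_state(copy=False)  # reference to the internal state
```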

## Wrap up

With the API outlined above, a full fit procedure can be wrapped in a simple function:

```python
import zfit
from zfit.core.minimizer import MinuitMinimizer


def minimize_unbinned_nll(pdf, data, fit_range, constraints=None):
    nll = zfit.UnbinnedNLL(pdf=pdf,
                           data=data,
                           fit_range=fit_range,
                           constraints=constraints)
    minimizer = MinuitMinimizer(nll)
    minimizer.minimize()
    minimizer.hesse()
    minimizer.error()
    return minimizer.get_state()
```