# Example 

Here we'll go into more detail on the Quick Start Example from
:doc:`getting_started`. In this example, we'll build a pipeline that
uses a Savitzky-Golay filter to compute the first derivative of each
measurement, computes the pairwise similarity between the derivatives of
all samples, clusters the samples using spectral clustering, fits a
Gaussian Process classifier to the resulting cluster labels, and finally
uses an acquisition function to choose the next sample.

## Input Data

To begin, we'll load a pre-prepared `xarray.Dataset`. To see how this data is generated, see :doc:`Building xarray.Datasets <../how-to/building_xarray_datasets>`.

This codebase uses :py:class:`xarray.Dataset` to store all input, intermediate, and output data. This is a powerful and flexible data structure for working with multi-dimensional data.
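The example file itself is not reproduced here. To give a rough picture of the structure the pipeline works with, the sketch below builds a small synthetic dataset by hand. The dimension names ``sample`` and ``x`` and the variable names ``measurement`` and ``composition`` come from the pipeline in this example; the ``component`` dimension, the shapes, and all values are made up for illustration and may not match the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical synthetic data: 4 samples, each measured on a 50-point x grid
n_sample, n_x = 4, 50
x = np.linspace(0.0, 1.0, n_x)
rng = np.random.default_rng(0)

measurement = np.sin(2 * np.pi * x)[None, :] + 0.01 * rng.standard_normal((n_sample, n_x))
composition = rng.random((n_sample, 2))  # hypothetical 2-component composition

ds_demo = xr.Dataset(
    data_vars={
        # one curve per sample, indexed by (sample, x)
        'measurement': (('sample', 'x'), measurement),
        # composition of each sample ('component' is a made-up dimension name)
        'composition': (('sample', 'component'), composition),
    },
    coords={'x': x},
)
print(ds_demo)
```

Because every variable carries named dimensions, later steps can refer to axes by name (for example ``dim='x'``) instead of by position.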


```python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

# Load the example dataset (path is relative to the current working directory)
ds = xr.load_dataset('../data/example_dataset.nc')
ds
```


## Pipeline Step 1: Savitzky-Golay Filter


Now that the data is in place, we'll instantiate a :py:class:`SavgolFilter` object inside a context manager (i.e., the ``with`` statement in the next cell). With this approach, every pipeline operation created inside the context is automatically added to the ``my_first_pipeline`` variable.

```python
from AFL.double_agent import *

# Every operation instantiated inside the `with` block is added to the pipeline
with Pipeline() as my_first_pipeline:
    SavgolFilter(
        input_variable='measurement',
        output_variable='derivative',
        dim='x',
        derivative=1,
    )

my_first_pipeline
```

```
<Pipeline Pipeline N=1>
```
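The automatic registration described above is a common Python pattern. The following is a minimal sketch of how it can work, not AFL's actual implementation: the classes ``ToyPipeline`` and ``ToyOp`` are hypothetical. Entering the ``with`` block makes the pipeline the "active" one, and each operation appends itself to whichever pipeline is active when it is constructed.

```python
class ToyPipeline:
    """Hypothetical pipeline that collects operations created inside `with`."""

    _active = []  # stack of pipelines currently used as context managers

    def __init__(self):
        self.ops = []

    def __enter__(self):
        ToyPipeline._active.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        ToyPipeline._active.pop()
        return False  # do not suppress exceptions raised inside the block

    def __repr__(self):
        return f'<ToyPipeline N={len(self.ops)}>'


class ToyOp:
    """Hypothetical operation that registers itself with the active pipeline."""

    def __init__(self, name):
        self.name = name
        if ToyPipeline._active:
            ToyPipeline._active[-1].ops.append(self)


with ToyPipeline() as pipe:
    ToyOp('filter')
    ToyOp('cluster')

print(pipe)  # <ToyPipeline N=2>
```

Keeping the active pipelines on a stack, rather than in a single slot, lets pipelines be nested, with each operation going to the innermost one.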

Going over the keyword arguments one by one:

- The ``input_variable`` keyword argument specifies the name of the variable in the dataset that will be used as
  the input to the Savitzky-Golay filter.
- The ``output_variable`` keyword argument specifies the name of the new variable that will be added to the dataset.
- The ``dim`` keyword argument specifies the dimension along which the filter will be applied.
- The ``derivative`` keyword argument specifies the order of the derivative to be computed.
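To see what these parameters mean numerically, here is a sketch that calls SciPy's ``scipy.signal.savgol_filter`` directly. We assume, without confirming, that ``SavgolFilter`` does something similar internally; the ``window_length`` and ``polyorder`` values below are illustrative choices, not the class's defaults. Here ``axis=-1`` plays the role of ``dim='x'``, ``deriv=1`` matches ``derivative=1``, and ``delta`` (the grid spacing) scales the result so it is a derivative with respect to ``x`` rather than with respect to the array index.

```python
import numpy as np
from scipy.signal import savgol_filter

# Two hypothetical samples on a uniform x grid, shape (sample, x)
x = np.linspace(0.0, 2 * np.pi, 201)
measurement = np.stack([np.sin(x), 2.0 * np.sin(x)])

dx = x[1] - x[0]  # grid spacing, so the result is d/dx rather than d/d(index)

# Fit a local cubic in an 11-point window and take its first derivative
derivative = savgol_filter(
    measurement,
    window_length=11,
    polyorder=3,
    deriv=1,
    delta=dx,
    axis=-1,  # filter along the x axis, analogous to dim='x'
)

# The derivative of sin(x) is cos(x), so the first row should track cos(x)
print(np.max(np.abs(derivative[0] - np.cos(x))))
```

Without ``delta``, the output would be the derivative per grid step, which differs from the true derivative by a factor of ``dx``.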

We can inspect the pipeline by calling its ``print()`` method.

```python
my_first_pipeline.print()
```

```
PipelineOp                               input_variable ---> output_variable
----------                               -----------------------------------
0  ) <SavgolFilter>                      measurement ---> derivative

Input Variables
---------------
0) measurement

Output Variables
----------------
0) derivative
```


Finally, we can run the pipeline on the dataset and plot the results.

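```python
from IPython.display import display

# Run every operation in the pipeline on the dataset; the returned Dataset
# contains the original variables plus the new 'derivative' variable
ds_output = my_first_pipeline.calculate(ds)
display(ds_output)

# Plot the raw measurement and its first derivative for the first sample
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
ds_output.measurement.isel(sample=0).plot(ax=axes[0])
ds_output.derivative.isel(sample=0).plot(ax=axes[1])
```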


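## Full Pipeline

Putting it all together, the cell below extends the pipeline with four more operations. ``Similarity`` computes the pairwise cosine similarity between the ``derivative`` curves of all samples, ``SpectralClustering`` groups the samples into ``labels`` using that similarity matrix, and ``GaussianProcessClassifier`` extrapolates the labels from ``composition`` onto ``composition_grid``, writing its outputs under the ``extrap`` prefix. Finally, ``MaxValueAF`` reads ``extrap_variance`` and, as its name suggests, writes the grid point with the largest value to ``next_sample``.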

```python
from AFL.double_agent import *

with Pipeline() as my_first_pipeline:
    # Step 1: first derivative of each measurement along x
    SavgolFilter(
        input_variable='measurement',
        output_variable='derivative',
        dim='x',
        derivative=1,
    )

    # Step 2: pairwise cosine similarity between the samples' derivatives
    Similarity(
        input_variable='derivative',
        output_variable='similarity',
        sample_dim='sample',
        params={'metric': 'cosine'},
    )

    # Step 3: cluster the samples using the similarity matrix
    SpectralClustering(
        input_variable='similarity',
        output_variable='labels',
        dim='sample',
    )

    # Step 4: extrapolate the labels from sample compositions onto the grid
    GaussianProcessClassifier(
        feature_input_variable='composition',
        predictor_input_variable='labels',
        output_prefix='extrap',
        sample_dim='sample',
        grid_variable='composition_grid',
        grid_dim='grid',
    )

    # Step 5: select the grid point with the maximum extrap_variance
    MaxValueAF(
        input_variables=['extrap_variance'],
        output_variable='next_sample',
        grid_variable='composition_grid',
    )

my_first_pipeline
```

```
<Pipeline Pipeline N=5>
```
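For intuition about what steps 2 to 5 compute, the sketch below reproduces a similar flow with NumPy and scikit-learn on made-up data. It is an assumption-laden analog, not the AFL implementation: we assume the cosine similarity matrix is used as a precomputed affinity for spectral clustering, and we stand in for ``extrap_variance`` with the Bernoulli variance ``p * (1 - p)`` of the predicted class probability, which may differ from how ``GaussianProcessClassifier`` defines it. The data, the two-cluster setup, and the kernel choice are all hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Hypothetical data: 20 samples with a 2-component composition. Samples with
# composition[:, 0] < 0.5 get a derivative curve peaked at x=0.3, the rest at x=0.7.
n_sample = 20
composition = np.column_stack([np.linspace(0.05, 0.95, n_sample), rng.random(n_sample)])
true_phase = (composition[:, 0] >= 0.5).astype(int)

x = np.linspace(0.0, 1.0, 50)
centers = np.where(true_phase == 0, 0.3, 0.7)
derivative = np.exp(-((x[None, :] - centers[:, None]) ** 2) / (2 * 0.05**2))
derivative += 0.01 * rng.random(derivative.shape)  # small non-negative noise

# Step 2 analog: pairwise cosine similarity, shape (sample, sample)
similarity = cosine_similarity(derivative)

# Step 3 analog: spectral clustering with the similarity matrix as the affinity
labels = SpectralClustering(
    n_clusters=2, affinity='precomputed', random_state=0
).fit_predict(similarity)

# Step 4 analog: GP classifier from composition to cluster label, on a grid
g = np.linspace(0.0, 1.0, 21)
g0, g1 = np.meshgrid(g, g)
composition_grid = np.column_stack([g0.ravel(), g1.ravel()])

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=0.2), random_state=0)
gpc.fit(composition, labels)
p = gpc.predict_proba(composition_grid)[:, 1]
extrap_variance = p * (1.0 - p)  # Bernoulli variance, our stand-in for uncertainty

# Step 5 analog: the next sample is the grid point with the largest variance
next_sample = composition_grid[np.argmax(extrap_variance)]
print(next_sample)
```

Note that ``p * (1 - p)`` peaks both near the boundary between clusters and far from any measured sample, where the classifier's prediction relaxes toward 0.5, so a max-variance acquisition of this kind favors boundaries as well as unexplored regions.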