# Workflow
This section shows how to generate consistent subcellular landmarks (CSLs) from an example 4i dataset.

CAMPA provides a high-level API and command-line scripts for creating datasets, training models, and extracting features.
Settings for the individual stages of the workflow are passed via parameter files.
These are Python files that usually contain a dictionary of settings used by the corresponding step.
You can find a complete set of example parameter files in [notebooks/params](params).

In the following, we use these parameter files to train a conditional variational autoencoder (cVAE) and generate CSLs for unperturbed and meayamycin-perturbed cells.

## 1. Setup & download data
Ensure that you have set up `config.ini` as described in the [installation instructions](../installation.rst) and that its `[data]` section contains an entry for `TestData` (`TestData = .../notebooks/params/TestData_constants.py`).
The `[data]` section maps dataset names to Python files containing the configuration parameters needed to load each dataset. CAMPA refers to a dataset by the name used in this section.
For example, the test data is identified by `TestData`, which points to the config in [`notebooks/params/TestData_constants.py`](params/TestData_constants.py).
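The following sketch illustrates this structure by parsing a hypothetical `config.ini` excerpt with Python's standard `configparser`. It only shows how dataset names map to constants files. The `...` is kept from the entry above and stands for the path to your CAMPA directory. The sketch does not show how CAMPA itself reads `config.ini`.

```python
import configparser

# Hypothetical excerpt of config.ini. Replace "..." with the real path to CAMPA.
CONFIG_TEXT = """
[data]
TestData = .../notebooks/params/TestData_constants.py
"""

config = configparser.ConfigParser()
# configparser lower-cases keys by default; keep dataset names such as "TestData" as written
config.optionxform = str
config.read_string(CONFIG_TEXT)

# Each key in [data] is a dataset name; each value is the path to its constants file.
data_sources = dict(config["data"])
print(data_sources)
# {'TestData': '.../notebooks/params/TestData_constants.py'}
```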

The constants file is dataset-specific but must contain the following variables:
- `DATA_DIR`: path to the folder containing the data
- `DATASET_DIR`: path to the folder where training/testing datasets derived from this data should be stored
- `OBJ_ID`: name of the column in `metadata.csv` that contains a unique object identifier
- `CHANNELS_METADATA`: name of the CSV file containing channel metadata (relative to `DATA_DIR`). The channel names are expected in a column called `name`.
- `CONDITIONS`: dictionary of conditions used by cVAE models. Keys are column names in `metadata.csv`, and values are all possible values of that condition. These are used to convert conditions to one-hot encoded vectors.
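The one-hot conversion of `CONDITIONS` can be sketched as follows. The condition columns and their values are hypothetical. Concatenating the vectors of several conditions is an assumption made for illustration and is not necessarily how CAMPA combines them.

```python
# Hypothetical CONDITIONS dictionary: column names and values are placeholders.
CONDITIONS = {
    "perturbation": ["unperturbed", "meayamycin"],
    "cell_cycle": ["G1", "S", "G2"],
}


def one_hot_condition(column, value, conditions=CONDITIONS):
    """Return a one-hot list for `value`, ordered as in conditions[column]."""
    values = conditions[column]
    if value not in values:
        raise ValueError(f"{value!r} is not a known value for condition {column!r}")
    return [1 if v == value else 0 for v in values]


def encode_conditions(row, conditions=CONDITIONS):
    """Concatenate the one-hot vectors of all condition columns for one metadata row.

    Concatenation is an illustrative assumption, not necessarily CAMPA's behaviour.
    """
    vector = []
    for column in conditions:
        vector.extend(one_hot_condition(column, row[column], conditions))
    return vector


print(encode_conditions({"perturbation": "meayamycin", "cell_cycle": "S"}))
# [0, 1, 0, 1, 0]
```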

To use a different dataset, create a new constants file with dataset-specific settings and add a corresponding entry to the `[data]` section of your `config.ini` file.
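A constants file for the example data might look like the sketch below. Only the variable names come from the list above. Every value is a hypothetical placeholder, including the paths, the `mapobject_id` column, the CSV filename, and the condition column and values. Check the real [`TestData_constants.py`](params/TestData_constants.py) for the actual settings.

```python
# TestData_constants.py: hypothetical sketch, every value below is a placeholder.
import os

# Folder containing the raw data (hypothetical relative location)
DATA_DIR = os.path.join("notebooks", "example_data")
# Folder where NNDatasets derived from this data are written (hypothetical)
DATASET_DIR = os.path.join("notebooks", "datasets")
# Column in metadata.csv that uniquely identifies each object (hypothetical name)
OBJ_ID = "mapobject_id"
# CSV with channel metadata, relative to DATA_DIR; must contain a "name" column
CHANNELS_METADATA = "channels_metadata.csv"
# Condition column in metadata.csv -> all possible values (used for one-hot encoding)
CONDITIONS = {
    "perturbation": ["unperturbed", "meayamycin"],
}
```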

To follow along with this tutorial, you need to download the example dataset.

TODO download commands

The example data is now stored in your `notebooks/example_data` folder.

The data is represented as an `MPPData` object. For more information on this class and its representation on disk, see [the tutorial on MPPData](mpp_data.ipynb).

## 2. Create NNDataset
An `NNDataset` is created from a data parameter file that specifies the data, normalisation, and subsampling to use.
Internally, an `NNDataset` represents the train, val, and test splits as `MPPData` objects. For more information on `NNDataset`, see [the tutorial on creating and working with a dataset using NNDataset](nn_dataset.ipynb).
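The NNDataset tutorial describes how CAMPA builds its splits. The following is a generic illustration only, not CAMPA's implementation. A common way to create train, val, and test splits for pixel-level data is to split by object ID, so that all pixels of one cell end up in the same split. `split_objects`, the fractions, and the seed are hypothetical.

```python
import random


def split_objects(obj_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign each unique object ID to exactly one of train/val/test.

    Splitting by object (rather than by pixel) keeps all pixels of a cell
    in the same split. Fractions and seed are hypothetical defaults.
    """
    unique_ids = sorted(set(obj_ids))
    rng = random.Random(seed)  # seeded for reproducible splits
    rng.shuffle(unique_ids)
    n_val = int(len(unique_ids) * val_frac)
    n_test = int(len(unique_ids) * test_frac)
    return {
        "val": set(unique_ids[:n_val]),
        "test": set(unique_ids[n_val:n_val + n_test]),
        "train": set(unique_ids[n_val + n_test:]),
    }


# Toy example: 20 objects, each contributing 5 pixels
pixel_obj_ids = [i // 5 for i in range(100)]
splits = split_objects(pixel_obj_ids)
print({name: len(ids) for name, ids in splits.items()})
# {'val': 2, 'test': 2, 'train': 16}
```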

Create the NNDataset with the CLI (assuming `$CAMPA_DIR` is the location of CAMPA):
```
campa create_dataset $CAMPA_DIR/notebooks/params/example_data_params.py
```

## 3. Train cVAE
Training and evaluation require an experiment parameter file (in this example, `example_experiment_params.py`).
This file defines several experiments. For more information on the training and evaluation process, see [the tutorial on training cVAE models](train.ipynb).

To train, evaluate, and cluster the models, run (assuming `$CAMPA_DIR` is the location of CAMPA):
```
campa train all --config $CAMPA_DIR/notebooks/params/example_experiment_params.py
```
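You may want to run both CLI steps from a Python script, for example in a batch job. The following minimal sketch wraps the two commands shown above with `subprocess.run`. It assumes that `campa` is on your `PATH` and reads the CAMPA location from the `CAMPA_DIR` environment variable. `campa_commands` and `run_workflow` are hypothetical helper names. By default, the script only prints the commands (`dry_run=True`).

```python
import os
import shlex
import subprocess


def campa_commands(campa_dir):
    """Build the two CLI calls from this tutorial for a given CAMPA directory."""
    params = os.path.join(campa_dir, "notebooks", "params")
    return [
        ["campa", "create_dataset", os.path.join(params, "example_data_params.py")],
        ["campa", "train", "all", "--config",
         os.path.join(params, "example_experiment_params.py")],
    ]


def run_workflow(campa_dir, dry_run=True):
    """Print (dry run) or execute the commands in order, stopping at the first failure."""
    for cmd in campa_commands(campa_dir):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    run_workflow(os.environ.get("CAMPA_DIR", "."), dry_run=True)
```

Running the steps in sequence with `check=True` ensures that training never starts on a dataset whose creation failed.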

## 4. Cluster cVAE latent representation
After training a cVAE model, we can use its latent representation to generate consistent subcellular landmarks.
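This is done in three steps:
1. Cluster a subset of the data.
2. Predict the latent representation on the entire dataset.
3. Project the cluster assignments to the entire dataset.

TODO link to detailed notebook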

### Cluster a subset of the data
First, only a subset of the data is clustered, because clustering more than 10 million data points directly is not feasible.
```
campa cluster xxx
```
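The clustering command above handles this step. To illustrate the idea, the following minimal sketch draws a reproducible random subset of data point indices without replacement. The function name, subset size, and seed are hypothetical, and CAMPA's own subsampling may differ.

```python
import random


def subsample_indices(n_points, n_subset, seed=0):
    """Draw a reproducible random subset of point indices, without replacement."""
    if n_subset >= n_points:
        return list(range(n_points))
    rng = random.Random(seed)
    # random.sample works on a range without materialising all n_points indices
    return sorted(rng.sample(range(n_points), n_subset))


# e.g. pick 100,000 of 10 million points to cluster (sizes are placeholders)
subset = subsample_indices(10_000_000, 100_000)
print(len(subset))
# 100000
```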

Optionally, the clusters can be annotated manually after this step.

### Predict latent representation on entire dataset
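Predicting the latent representation for every data point means running the trained encoder over the full dataset, which may not fit in memory at once. The following is a minimal sketch of batched prediction. `encode` is a hypothetical stand-in for the trained cVAE encoder, with conditions omitted for brevity. The batch size is an arbitrary placeholder.

```python
def predict_in_batches(encode, data, batch_size=4096):
    """Apply `encode` to `data` in fixed-size batches and concatenate the results.

    `encode` is a hypothetical stand-in for a trained cVAE encoder that maps a
    list of input points to a list of latent vectors.
    """
    latent = []
    for start in range(0, len(data), batch_size):
        latent.extend(encode(data[start:start + batch_size]))
    return latent


def fake_encode(batch):
    """Toy 'encoder' mapping a 3-channel point to a 2-d latent vector."""
    return [[x[0] + x[1], x[2]] for x in batch]


points = [[i, 2 * i, 3 * i] for i in range(10)]
print(predict_in_batches(fake_encode, points, batch_size=4)[:2])
# [[0, 0], [3, 3]]
```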

### Project cluster assignments to entire dataset
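One simple way to transfer the labels of the clustered subset to all data points is nearest-neighbour assignment in latent space. The following sketch uses brute-force 1-nearest-neighbour search under that assumption. CAMPA's own projection method may differ. A dataset of millions of points would also need a faster, approximate nearest-neighbour index. `project_labels` and the toy data are hypothetical.

```python
import math


def project_labels(subset_points, subset_labels, all_points):
    """Give each point the cluster label of its nearest neighbour in the clustered subset.

    Brute-force 1-nearest-neighbour search (O(len(all_points) * len(subset_points))).
    """
    projected = []
    for point in all_points:
        nearest = min(range(len(subset_points)),
                      key=lambda i: math.dist(point, subset_points[i]))
        projected.append(subset_labels[nearest])
    return projected


# Toy latent vectors of the clustered subset and their cluster labels
subset_points = [[0.0, 0.0], [10.0, 10.0]]
subset_labels = [0, 1]
print(project_labels(subset_points, subset_labels, [[1.0, 0.5], [9.0, 11.0], [0.2, 0.1]]))
# [0, 1, 0]
```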

## 5. Extract features
TODO add notes here + link to features NB