In [1]:
from pathlib import Path

## Set data and experiment paths

Adapt the paths below to point to your own data and experiment directories.

In [2]:
data_dir = Path('./campa_data')
experiment_dir = Path('./campa_experiment')

## Update config
The `campa.ini` used to generate this data looked as follows:

```
[DEFAULT]
data_dir = <path-to-data>
experiment_dir = <path-to-experiments>

[data]
NascentRNA = <path-to-campa_ana>/NascentRNA_constants.py

[co_occ]
co_occ_chunk_size = 1e7
```

The code below reproduces this config and writes it to `~/.config/campa/campa.ini`.

In [3]:
from campa.constants import campa_config

campa_config.BASE_DATA_DIR = data_dir
campa_config.EXPERIMENT_DIR = experiment_dir
campa_config.add_data_config("NascentRNA", Path('.').parent / "NascentRNA_constants.py")
campa_config.CO_OCC_CHUNK_SIZE = 1e7

# save to default config location
config_fname = Path.home() / ".config" / "campa" / "campa.ini"
campa_config.write(config_fname)


Reading config from /Users/hannah.spitzer/.config/campa/campa.ini
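
To double-check what was written, the file can be read back with Python's standard `configparser`. This is a minimal sketch, not part of CAMPA: `read_ini` is a hypothetical helper, and it assumes the written file is a plain INI file with sections like the ones shown above.

```python
import configparser


def read_ini(fname):
    """Return the INI file at fname as a dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    with open(fname) as f:
        parser.read_file(f)
    # [DEFAULT] is not listed by parser.sections(), so add it explicitly.
    result = {"DEFAULT": dict(parser.defaults())}
    for section in parser.sections():
        # Drop keys inherited from [DEFAULT]. Note that this also hides a
        # section key that reuses the name of a DEFAULT key.
        result[section] = {
            key: value
            for key, value in parser[section].items()
            if key not in parser.defaults()
        }
    return result
```

Calling `read_ini(config_fname)` with the `config_fname` defined above returns a nested dictionary. Note that `configparser` lowercases keys by default, so a key such as `NascentRNA` would appear as `nascentrna` in this sketch.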


## Download data

Download the data and the pre-trained models and unzip them to `data_dir` and `experiment_dir`, respectively:

- Data: https://doi.org/10.5281/zenodo.7299516 (~200 GB unzipped)
- Pre-trained models: https://doi.org/10.5281/zenodo.7299750 (~20 GB unzipped)

## Structure of the data

After downloading, you should have the following folders and files in `data_dir` (`campa_config.BASE_DATA_DIR`):

- `184A1_unperturbed/`
    unperturbed 184A1 cells (4 wells)
- `184A1_DMSO/`
    184A1 cells with control DMSO treatment (2 wells)
- `184A1_AZD4573/`
    184A1 cells with AZD4573 treatment (3 wells 1h, 2 wells 2.5h)
- `184A1_CX5461/`
    184A1 cells with CX5461 treatment (3 wells)
- `184A1_meayamycin/`
    184A1 cells with Meayamycin treatment (2 wells)
- `184A1_triptolide/`
    184A1 cells with Triptolide treatment (2 wells)
- `184A1_TSA/`
    184A1 cells with TSA treatment (2 wells)
- `HeLa_scrambled/`
    Control HeLa cells (3 wells)
- `HeLa_SBF2/`
    SBF2-perturbed HeLa cells (3 wells)
- `wells_metadata.csv`
    Metadata
- `channels_metadata.csv`
    Metadata
- `datasets/` 
    Datasets for training models

Each well is prepared in the MPPData format and can be read with CAMPA. For more information on how to read `MPPData` objects, refer to the [MPPData tutorial](https://campa.readthedocs.io/en/latest/notebooks/mpp_data.html).
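
A quick way to confirm that the download unpacked as listed above is to check for each expected entry. The following is a minimal sketch using only the standard library; `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names introduced here, and the entry list is copied from the list above.

```python
from pathlib import Path

# Folders and files listed in the "Structure of the data" section.
EXPECTED_ENTRIES = [
    "184A1_unperturbed",
    "184A1_DMSO",
    "184A1_AZD4573",
    "184A1_CX5461",
    "184A1_meayamycin",
    "184A1_triptolide",
    "184A1_TSA",
    "HeLa_scrambled",
    "HeLa_SBF2",
    "wells_metadata.csv",
    "channels_metadata.csv",
    "datasets",
]


def missing_entries(data_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist under data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).exists()]
```

For example, `missing_entries(data_dir)` returns an empty list when every folder and file from the list is present, and otherwise the names that still need to be downloaded or unzipped.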

Experiments are downloaded to `experiment_dir` (`campa_config.EXPERIMENT_DIR`):
- `VAE_all/`
    - `CondVAE_pert-CC/`
        cVAE conditioned on perturbation and cell cycle with 3x3 neighborhood (main model reported in CAMPA)
    - `CondVAE_pert-CC_noneigh/`
        cVAE conditioned on perturbation and cell cycle with no (1x1) neighborhood
    - `CondVAE_pert-CC_neigh5/`
        cVAE conditioned on perturbation and cell cycle with 5x5 neighborhood
    - `CondVAE_pert-CC_neigh7/`
        cVAE conditioned on perturbation and cell cycle with 7x7 neighborhood  
    - `VAE/`
        VAE model without conditioning
    - `MPPleiden/`
        pixel clustering model (baseline reported in CAMPA)
- `VAE_SBF2/`
    - `CondVAE_siRNA-CC/`
        cVAE conditioned on siRNA condition (SBF2 and scrambled) and cell cycle with 3x3 neighborhood (main model reported in CAMPA)
    - `VAE/`
        VAE model without conditioning
    - `MPPleiden/`
        pixel clustering model (baseline reported in CAMPA)

Each experiment contains the trained models in `weights_epochXX`, and the final clustering (and annotation where applicable) in `aggregated/sub-XXX`. 
We also provide a summary of CSL-derived features from the 184A1 and the HeLa datasets [here](https://doi.org/10.6084/m9.figshare.19699651).
To derive features from all cells in all wells yourself, start the provided workflow at the [Project clustering](04_cluster.ipynb#project-clustering) step.
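
To see which trained weights and clusterings a downloaded experiment contains, its folder can be scanned for the `weights_epochXX` and `aggregated/sub-XXX` entries described above. This is a minimal sketch using only the standard library; `summarize_experiment` is a hypothetical helper, and it assumes the naming patterns match the description above.

```python
from pathlib import Path


def summarize_experiment(exp_dir):
    """List weight entries and aggregated clusterings found in exp_dir."""
    exp_dir = Path(exp_dir)
    aggregated = exp_dir / "aggregated"
    # Files or folders whose name starts with "weights_epoch".
    weights = (
        sorted(p.name for p in exp_dir.glob("weights_epoch*"))
        if exp_dir.is_dir()
        else []
    )
    # Sub-folders of "aggregated" whose name starts with "sub-".
    clusterings = (
        sorted(p.name for p in aggregated.glob("sub-*"))
        if aggregated.is_dir()
        else []
    )
    return {"weights": weights, "clusterings": clusterings}
```

For example, `summarize_experiment(experiment_dir / "VAE_all" / "CondVAE_pert-CC")` returns a dictionary with the names found for the main model. Both lists are empty if the folder does not exist.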

