## Context Vectors Creation for CELTIC

This notebook demonstrates the process of creating context vectors to be used with CELTIC. It uses two pre-downloaded field-of-view (FoV) images for the analysis. If additional images are needed, they can be downloaded from the Allen Institute’s [hiPSC Single-Cell Image Dataset](https://open.quiltdata.com/b/allencell/packages/aics/hipsc_single_cell_image_dataset).

The notebook covers steps to extract single-cell images, generate context vectors, and save the outputs for training and prediction. All necessary resources and code are provided to replicate the context creation process or adapt it for new FoVs.


In [None]:
# Package installation (e.g., for Colab users)
!git clone https://github.com/zaritskylab/CELTIC
%cd CELTIC
!pip install .

In [2]:
from celtic.preprocess.context_creator import ContextVectorsCreator
from celtic.utils.functions import download_resources
import os

# Presets
organelle = 'microtubules'
resources_dir = '../resources'


In [4]:
# download resources - sample images, metadata and models (2-3 min)
if not os.path.exists(resources_dir):
    shared_folder_link = 'https://drive.google.com/drive/folders/1KTzb3fzwjH5ffSLtLNHuYiLiPg2p2VUf?usp=sharing'
    download_resources(shared_folder_link, os.path.dirname(resources_dir))

### ContextVectorsCreator Initialization

In this section, we initialize the `ContextVectorsCreator` class, which generates context vectors for the specified organelle. It takes the following arguments:

- **`organelle`**: The organelle of interest, such as microtubules in this example.

- **`fovs_to_process`**: A list of Field-of-View (FoV) identifiers (in this case, 94 and 116) indicating which images will be processed. These FoVs represent subsets of the data for which the context vectors will be generated.

- **`resources_dir`**: The directory containing the resources needed by the `ContextVectorsCreator` class. This directory was downloaded in the previous cell and includes the sample images, metadata, and models.

- **`single_cell_image_dir`**: The directory where the single-cell images are stored. These images will be used as input for processing within the ContextVectorsCreator class.

In [14]:
creator = ContextVectorsCreator(organelle,
                                fovs_to_process = [94, 116],
                                resources_dir = resources_dir,
                                single_cell_image_dir='./single_cells')

### Extracting Single Cell Images

In this step, we use the `extract_single_cell_images()` method to extract single-cell images from the specified FoVs. The single-cell images will be made publicly available in a few weeks, and the context for microtubules is already available in the `resources/microtubules/metadata` directory, in files with the `_context.csv` suffix.

Although this context data is already available, the code is provided to show how the context was created and to enable context generation for FoVs not included in this study. The process is time-consuming, so we recommend saving the extracted images locally for future use.

The method performs the following tasks:
- Extracts single-cell images from the specified FoVs.
- Pre-calculates the necessary data that will be used for context creation in later steps.
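
Before running the extraction, it can help to check which pre-computed context files are already on disk. The snippet below is a minimal sketch, not part of the CELTIC package: the helper `find_context_files` is a hypothetical name, and the `<resources_dir>/<organelle>/metadata` layout is assumed from the path mentioned above.

```python
from pathlib import Path


def find_context_files(resources_dir, organelle):
    """Return the sorted *_context.csv files under <resources_dir>/<organelle>/metadata.

    Hypothetical helper; the directory layout is an assumption based on the
    `resources/microtubules/metadata` path mentioned in this notebook.
    """
    metadata_dir = Path(resources_dir) / organelle / "metadata"
    if not metadata_dir.is_dir():
        # Resources not downloaded yet (or a different layout): nothing to list.
        return []
    return sorted(metadata_dir.glob("*_context.csv"))


# List whatever context files exist for the organelle used in this notebook.
for path in find_context_files("../resources", "microtubules"):
    print(path.name)
```

If the list is empty, the resources have not been downloaded or the context files are stored elsewhere.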


In [16]:
creator.extract_single_cell_images()

Processing FOVId 94
.... CellId 233289
.... CellId 233293
.... CellId 233295
.... CellId 233296
.... CellId 233297
.... CellId 233303
.... CellId 233304
.... CellId 233305
.... CellId 233306
.... CellId 233307
.... CellId 233308
.... CellId 233310
.... CellId 233311
.... CellId 233312
Processing FOVId 116
.... CellId 233419
.... CellId 233420
.... CellId 233422
.... CellId 233423
.... CellId 233424
.... CellId 233427
.... CellId 233430
.... CellId 233432
.... CellId 233433
.... CellId 233434
.... CellId 233440
.... CellId 233441


### Creating Context Vectors

In this step, we define the contexts of interest and use the `create_context_vectors()` method to generate context vectors for the specified FoVs. The `contexts` variable lists the types of context data to include. In this example, all implemented contexts are used.

For detailed definitions of the context types, refer to the [paper](https://www.biorxiv.org/content/10.1101/2024.11.10.622841v1.full), under the subsection **"CELTIC, cell-context dependent in silico labeling"** in the Results section.

The output of this function is a DataFrame containing the context vectors. This DataFrame can be saved and referenced using variables like `path_context_csv` and `context_features`, which are later used in the training and prediction steps.


In [19]:
contexts = ['cell_stage', 'location', 'classic_shape', 'ml_shape', 'neighborhood_density']
creator.create_context_vectors(contexts)
 

_cell_stage
_location
_classic_shape
_ml_shape
_neighborhood_density


Unnamed: 0,cell_stage_0,cell_stage_1,cell_stage_2,cell_stage_3,cell_stage_4,cell_stage_5,location,classic_shape_0,classic_shape_1,classic_shape_2,classic_shape_3,classic_shape_4,ml_shape_0,ml_shape_1,ml_shape_2,neighborhood_density
0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0.636364
1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0.181818
2,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.636364
3,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.545455
4,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.181818
5,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0.363636
6,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.363636
7,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.636364
8,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.454545
9,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0.545455
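
To reuse the vectors later, the table can be written to a CSV file and its column names collected as the feature list. The sketch below uses only the standard library and two rows copied from the output above. The file name `context.csv`, the helper functions, and the idea that `context_features` is simply the list of column names are assumptions for illustration, not the confirmed CELTIC interface. With a real DataFrame `df`, `df.to_csv(path_context_csv, index=False)` would write the same kind of file.

```python
import csv
import os
import tempfile

contexts = ['cell_stage', 'location', 'classic_shape', 'ml_shape', 'neighborhood_density']

# Column names as they appear in the output above (index column omitted).
HEADER = (
    [f"cell_stage_{i}" for i in range(6)]
    + ["location"]
    + [f"classic_shape_{i}" for i in range(5)]
    + [f"ml_shape_{i}" for i in range(3)]
    + ["neighborhood_density"]
)
# First two rows of the output above.
ROWS = [
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0.636364],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0.181818],
]


def save_context_csv(path, header, rows):
    """Write the context table as a plain CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_context_features(path):
    """Return the column names stored in the first line of the CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def group_by_context(features, context_names):
    """Map each context name to the columns that belong to it."""
    groups = {name: [] for name in context_names}
    for feature in features:
        for name in context_names:
            if feature == name or feature.startswith(name + "_"):
                groups[name].append(feature)
                break
    return groups


# Hypothetical output location; replace with a persistent path in practice.
path_context_csv = os.path.join(tempfile.mkdtemp(), "context.csv")
save_context_csv(path_context_csv, HEADER, ROWS)

context_features = read_context_features(path_context_csv)
groups = group_by_context(context_features, contexts)
for name, columns in groups.items():
    print(name, len(columns))
```

Grouping the columns by context name makes it easy to select a subset of contexts later, for example only the `cell_stage` columns.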
