# Tutorial: Preparing Gene Embeddings for MORPH

MORPH accepts gene embeddings as a pickled Python dictionary (a `.pkl` file), where:

- **Keys** are gene names (matching those in `adata.obs['gene']` from your single-cell AnnData object).
- **Values** are the corresponding embedding vectors (e.g., NumPy arrays).

This allows MORPH to incorporate **prior biological knowledge** about each genetic perturbation.
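
As a minimal sketch of this format, the snippet below builds a small dictionary and writes it to a `.pkl` file. The gene names (`GENE_A`, etc.), the embedding size, the random vectors, and the output file name are all placeholders chosen for illustration; they are not part of MORPH.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical gene names and an arbitrary embedding size, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C"]
emb_dim = 8

# Random vectors stand in for real embeddings (e.g., DepMap gene effect scores).
rng = np.random.default_rng(0)
gene_emb = {gene: rng.normal(size=emb_dim) for gene in genes}

# Write the dictionary to a .pkl file (a temporary directory is used here).
out_path = os.path.join(tempfile.mkdtemp(), "my_gene_embeddings.pkl")
with open(out_path, "wb") as f:
    pickle.dump(gene_emb, f)

# Reload the file to confirm that keys and values survive the round trip.
with open(out_path, "rb") as f:
    reloaded = pickle.load(f)

print(sorted(reloaded))
print(reloaded["GENE_A"].shape)
```

In practice, you would replace the random vectors with your own embeddings and use the gene names found in `adata.obs['gene']` as keys.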

### Choosing Gene Embeddings

You are free to use any gene embeddings that capture biologically meaningful information. Examples include:

- **DepMap gene effect scores**  
  (as used in our demo)
- **Embeddings from foundation models**  
  (e.g., pre-trained models such as Geneformer, scGPT, and GenePT)
- **Custom or domain-specific embeddings**  
  (e.g., co-expression-based, pathway-informed, or literature-derived)

Make sure that:
- The embedding dictionary includes **at least** every gene in your perturbation set. *(Note: Perturbations without a corresponding embedding vector will be excluded during training.)*
- The gene names in the dictionary exactly match those in `adata.obs['gene']`.
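
One way to check both conditions before training is to compare the perturbation names against the dictionary keys. The sketch below is an illustration, not part of MORPH: `check_embedding_coverage` is a hypothetical helper, and the plain list `perturbed_genes` stands in for something like `list(adata.obs['gene'].unique())`.

```python
def check_embedding_coverage(perturbed_genes, gene_emb):
    """Split perturbation names into those with and without an embedding.

    Matching is exact, so differences in case or stray whitespace
    count as missing.
    """
    wanted = set(perturbed_genes)
    available = set(gene_emb)
    covered = sorted(wanted & available)
    missing = sorted(wanted - available)
    return covered, missing


# Stand-in for list(adata.obs['gene'].unique()); values are illustrative.
perturbed_genes = ["A1BG", "A2M", "a2m", "AAAS "]

# Toy embedding dictionary; short lists stand in for real embedding vectors.
gene_emb = {"A1BG": [0.1, 0.2], "A2M": [0.3, 0.4], "AAAS": [0.5, 0.6]}

covered, missing = check_embedding_coverage(perturbed_genes, gene_emb)
print("covered:", covered)
print("missing:", missing)
```

Here `a2m` and `AAAS ` (with a trailing space) are reported as missing even though `A2M` and `AAAS` have embeddings, which shows why the names must match exactly.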

Below is an example using the **DepMap** prior (the processed version can also be downloaded [here](https://drive.google.com/drive/folders/1TQJE281q4xH7HcNHMg1v0urD99EDj5bO?usp=drive_link)).

In [2]:
import pickle

# Path to the processed DepMap embedding dictionary
path = './data/depmap_crispr_gene_effect_name_correct.pkl'
with open(path, 'rb') as f:
    gene_emb = pickle.load(f)  # dict: gene name -> NumPy embedding vector
gene_emb

{'A1BG': array([-0.12263659,  0.01975626, -0.10720831, ..., -0.025991  ,
        -0.12763858, -0.06866593]),
 'A1CF': array([ 0.02588131, -0.08364029, -0.02321112, ..., -0.00770627,
        -0.04070519, -0.10752963]),
 'A2M': array([ 0.03421726, -0.0601177 ,  0.20020365, ..., -0.03846791,
         0.13455629,  0.06780639]),
 'A2ML1': array([-0.12808195, -0.02741749,  0.1160394 , ...,  0.23657601,
        -0.04798419,  0.11207051]),
 'A3GALT2': array([-0.03128489, -0.03611604, -0.17222678, ..., -0.23969028,
        -0.11611389, -0.14989671]),
 'A4GALT': array([ 0.33804566, -0.00105581, -0.07129434, ...,  0.06161137,
         0.21314402,  0.06892323]),
 'A4GNT': array([-0.00643947,  0.31287557,  0.20327011, ...,  0.12846113,
         0.1986469 , -0.03064705]),
 'AAAS': array([-0.09364196, -0.08689675, -0.12780591, ..., -0.49811884,
        -0.18335495, -0.22024095]),
 'AACS': array([ 0.18918552,  0.20443352, -0.09098149, ...,  0.02209684,
        -0.00440888, -0.04156209]),
 'AADAC': arr