# CytoTRACE
* Using CytoTRACE 2 (cytotrace2) to map the trajectory between cDC1, cDC2 and mregDC
* Python package info: https://github.com/digitalcytometry/cytotrace2/tree/main/cytotrace2_python

## Prepare data for CytoTRACE
* Create two tab-delimited .txt files:
  * gene expression file --> rows: genes, columns: cell IDs
  * cell annotations file --> column 1: cell IDs, column 2: cell annotations

In [1]:
# load required packages
import os
import scanpy as sc
import pandas as pd
import seaborn as sns
import dandelion as ddl
import matplotlib.pyplot as plt

In [3]:
# move to the project directory and confirm the working directory
os.chdir('/scratch/user/s4436039/scdata/Python_Integration_Sep')
os.getcwd()

'/scratch/user/s4436039/scdata/Python_Integration_Sep'

In [4]:
# read in data
data = sc.read_h5ad('NRclean_clustered2.h5ad')

In [5]:
# keep only the DC populations (copy so the subset is not a view of data)
data_DC = data[data.obs["NR_annotations_simple"].isin(["cDC1", "cDC2", "mregDC"])].copy()

In [6]:
# Make the expression file (genes x cells)
X = data_DC.X
X = X.toarray() if hasattr(X, "toarray") else X  # densify if .X is sparse
expression_df = pd.DataFrame(
    X.T,
    index=data_DC.var.index,  # gene names as row indices
    columns=data_DC.obs.index  # cell IDs as column names
)

# Display the DataFrame
expression_df.head()

Unnamed: 0,GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120_AM1_ACACTGATCCACTGGG-1,GSE215120_AM1_ACAGCCGCAAACCTAC-1,GSE215120_AM1_ACCAGTAAGACTGGGT-1,GSE215120_AM1_ACGTCAACAAGGACTG-1,GSE215120_AM1_ACGTCAATCCGCATCT-1,GSE215120_AM1_ACTGAGTCAGGCTGAA-1,GSE215120_AM1_ACTGTCCGTCTCTTAT-1,GSE215120_AM1_ACTTGTTTCTGAAAGA-1,GSE215120_AM1_AGAGCTTGTACAGTTC-1,...,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_CGGGCATTCTTCTGTA,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_CTACGGGGTGATCGTT,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_CTCATGCTCGTTAGAC,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_CTTGATTAGCAGGTCA,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_GAGGGATCAAGCGCTC,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_GTTTACTCAAGGCCTC,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_TACCGAACAAACCGGA,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_TCACGCTTCCGTCACT,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_TGCAGTAGTGTTGCCG,GSE180661_HGSOC_SPECTRUM-OV-045_S1_CD45P_LEFT_OVARY_TGTGAGTTCGGAAACG
HES4,2.10043,3.452755,4.218359,-0.323357,-0.392818,-0.331008,-0.30674,-0.320882,-0.312021,-0.356277,...,-0.303529,-0.236081,-0.295836,-0.235505,-0.226924,-0.299287,-0.298367,-0.291317,-0.291651,-0.271114
ISG15,-0.380933,0.944106,1.471873,0.228329,0.61383,0.046307,-0.984695,-0.364538,-0.35182,-0.223009,...,-0.025469,0.931269,-0.226304,-0.576623,0.731378,0.483867,0.876842,-0.908661,-0.895117,-0.780945
TNFRSF18,2.628043,5.650237,-0.12299,-0.125734,-0.10602,-0.124636,-0.125048,-0.125863,-0.124997,-0.115076,...,-0.12445,-0.113257,-0.120143,-0.113432,-0.112191,-0.120684,-0.131582,-0.126496,-0.121923,-0.123205
TNFRSF4,5.070108,4.063953,-0.175611,-0.16577,-0.234417,-0.180716,5.608533,-0.17404,-0.168396,-0.207875,...,-0.157069,-0.15253,-0.166347,-0.145882,-0.157567,-0.177489,-0.143188,-0.170851,-0.177479,-0.153878
ATAD3C,-0.083031,-0.084595,-0.073887,-0.070054,-0.095923,-0.073671,-0.069898,-0.070967,-0.070526,-0.084446,...,-0.066246,-0.063251,-0.070049,-0.06209,-0.064733,-0.071872,-0.061709,-0.067103,-0.069721,-0.063604
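
The values printed above include negatives, so `.X` here looks scaled rather than raw. As far as I understand the CytoTRACE 2 README, the model expects raw or CPM/TPM counts rather than scaled data, so a quick check before exporting may be worthwhile. A minimal sketch: `looks_like_counts` is a hypothetical helper (not part of scanpy or cytotrace2), and the toy values are illustrative only.

```python
import numpy as np
import pandas as pd


def looks_like_counts(df: pd.DataFrame) -> bool:
    """Return True if all values are finite and non-negative.

    Hypothetical helper: negative values indicate scaled/centred data,
    which is not what a count-based tool would normally expect.
    """
    values = df.to_numpy(dtype=float)
    return bool(np.isfinite(values).all() and (values >= 0).all())


# Toy genes x cells matrices (illustrative values, not from the dataset)
raw_like = pd.DataFrame([[0, 3], [5, 1]], index=["G1", "G2"], columns=["c1", "c2"])
scaled_like = pd.DataFrame(
    [[-0.3, 2.1], [0.9, -0.4]], index=["G1", "G2"], columns=["c1", "c2"]
)

print(looks_like_counts(raw_like))     # True
print(looks_like_counts(scaled_like))  # False
```

If the check fails on `expression_df`, one option is to export raw counts instead, assuming they were kept somewhere such as `data.raw` or a layer (this notebook does not show whether they were).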


In [7]:
# Make cell annotations file
annotations_df = pd.DataFrame({
    "annotation": data_DC.obs["NR_annotations_simple"]  # Use the annotation column
})

# Display the resulting DataFrame
annotations_df.head()

Unnamed: 0,annotation
GSE215120_AM1_AAATGCCCAGAGCCAA-1,cDC2
GSE215120_AM1_ACACTGATCCACTGGG-1,cDC2
GSE215120_AM1_ACAGCCGCAAACCTAC-1,cDC2
GSE215120_AM1_ACCAGTAAGACTGGGT-1,cDC1
GSE215120_AM1_ACGTCAACAAGGACTG-1,cDC1


In [8]:
# move to the CytoTRACE directory and confirm the working directory
os.chdir('/scratch/user/s4436039/scdata/CytoTRACE')
os.getcwd()

'/scratch/user/s4436039/scdata/CytoTRACE'

In [9]:
# Save both dataframes as .txt
annotations_df.to_csv("annotations_df.txt", sep="\t", index=True)
expression_df.to_csv("expression_df.txt", sep="\t", index=True)
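
Before running the model it may help to confirm that the two saved files agree on cell IDs (expression columns vs annotation rows). A minimal sketch, assuming both files were written with `sep="\t"` and `index=True` as above; `cell_ids_match` is a hypothetical helper, and the toy files are written to a temporary directory so the real outputs are not touched.

```python
import os
import tempfile

import pandas as pd


def cell_ids_match(expr_path: str, annot_path: str) -> bool:
    """Check that expression columns and annotation rows list the same cell IDs."""
    # Read only the header line so a large expression matrix is not loaded
    with open(expr_path) as fh:
        expr_cells = fh.readline().rstrip("\n").split("\t")[1:]
    annot_cells = pd.read_csv(annot_path, sep="\t", index_col=0).index
    return expr_cells == list(annot_cells)


# Toy example (illustrative IDs and values, not from the dataset)
with tempfile.TemporaryDirectory() as tmp:
    expr_path = os.path.join(tmp, "expression_df.txt")
    annot_path = os.path.join(tmp, "annotations_df.txt")
    toy_expr = pd.DataFrame(
        [[0, 3], [5, 1]], index=["G1", "G2"], columns=["cellA", "cellB"]
    )
    toy_annot = pd.DataFrame({"annotation": ["cDC1", "cDC2"]}, index=["cellA", "cellB"])
    toy_expr.to_csv(expr_path, sep="\t", index=True)
    toy_annot.to_csv(annot_path, sep="\t", index=True)
    files_ok = cell_ids_match(expr_path, annot_path)

print(files_ok)  # True
```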

## Run CytoTRACE

In [10]:
from cytotrace2_py.cytotrace2_py import cytotrace2

In [15]:
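# NOTE: this imports Python's argparse; it does not provide the R package
# 'argparse' that the Rscript step of cytotrace2 loads via library(argparse)
import argparse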

In [16]:
input_path = "/scratch/user/s4436039/scdata/CytoTRACE/expression_df.txt"
annots_path = "/scratch/user/s4436039/scdata/CytoTRACE/annotations_df.txt"
species_type = "human"

results = cytotrace2(input_path,
                     annotation_path=annots_path,
                     species=species_type)

cytotrace2: Input parameters
    Input file: /scratch/user/s4436039/scdata/CytoTRACE/expression_df.txt
    Species: human
    Full model: False
    Parallelization enabled: True
    User-provided limit for number of cores to use: None
    Batch size: 10000
    Smoothing batch size: 1000
    Max PCs: 200
    Seed: 14
    Output directory: cytotrace2_results
cytotrace2: Loading dataset
cytotrace2: Dataset characteristics
    Number of input genes:  1268
    Number of input cells:  30241
cytotrace2: Preprocessing
cytotrace2: 192 cores detected
cytotrace2: Running 4 prediction batch(es) in parallel using 10 cores for smoothing per batch.
cytotrace2: Initiated processing batch 1/4 with 7561 cells
cytotrace2: Initiated processing batch 2/4 with 7560 cells
cytotrace2: Initiated processing batch 3/4 with 7560 cells
cytotrace2: Initiated processing batch 4/4 with 7560 cells
    Mapped 1130 input gene names to mouse orthologs
    1130 input genes are present in the model features.
    In case of a correct species input, be advised that model performance might be compromised due to gene space differences.
    Mapped 1130 input gene names to mouse orthologs
    1130 input genes are present in the model features.
    In case of a correct species input, be advised that model performance might be compromised due to gene space differences.
    Mapped 1130 input gene names to mouse orthologs
    1130 input genes are present in the model features.
    In case of a correct species input, be advised that model performance might be compromised due to gene space differences.
    Mapped 1130 input gene names to mouse orthologs
    1130 input genes are present in the model features.
    In case of a correct species input, be advised that model performance might be compromised due to gene space differences.
Error in library(argparse) : there is no package called ‘argparse’
Execution halted
Error in library(argparse) : there is no package called ‘argparse’
Execution halted
Error in library(argparse) : there is no package called ‘argparse’
Execution halted
Error in library(argparse) : there is no package called ‘argparse’
Execution halted


CalledProcessError: Command '['Rscript', '/home/s4436039/miniforge3/envs/env/lib/python3.12/site-packages/cytotrace2_py/resources/smoothDatakNN.R', '--output-dir', 'cytotrace2_results', '--suffix', '_0', '--max-pcs', '200', '--seed', '14']' returned non-zero exit status 1.
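
The error above comes from the R side: `smoothDatakNN.R` calls `library(argparse)` and that R package is missing from the environment, so importing Python's `argparse` has no effect on it. A minimal pre-flight check, assuming `Rscript` is on the PATH; `r_package_available` is a hypothetical helper, not part of cytotrace2.

```python
import shutil
import subprocess


def r_package_available(package: str, rscript: str = "Rscript") -> bool:
    """Return True if `rscript` can load the given R package."""
    if shutil.which(rscript) is None:
        return False  # no Rscript executable found, so nothing can be loaded
    result = subprocess.run(
        [rscript, "-e", f"library({package})"],
        capture_output=True,
        text=True,
    )
    # R exits with a non-zero status when library() fails
    return result.returncode == 0


if not r_package_available("argparse"):
    print("R package 'argparse' is not available to Rscript")
```

If the check fails, installing the R package into the same environment should resolve this particular error, for example with `install.packages("argparse")` from an R session, or `conda install -c conda-forge r-argparse` if R is managed by conda. The smoothing script may load other R packages as well, and those can be checked the same way.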