# Generative Modeling of Isoform Expression using Diffusion Methods

**Authors:**  
Célien ABBET - s251705  
Thibaut HEIM - s252933  
Loïc LAISNEY - s253047  
Yann BECKER - s253048  
Pierre-Eduard KOLAR - s254145  

**Course:** 02456 Deep Learning, DTU, Fall 2025

## 0. Project Overview

This notebook documents our project for the 02456 Deep Learning course.  
The project focuses on generative modeling of isoform expression using diffusion-based methods.


# 1. Introduction

This notebook presents the full deep learning pipeline developed in our project, covering:

### Training of three core models
- Variational Autoencoder (SCimilarity VAE)
- Diffusion model (scDiffusion)
- Cell-type classifier

### Application of the trained models for
- Bulk isoform inference (guided and non-guided)
- Single-cell isoform inference (guided and non-guided)

### A set of evaluation metrics
- MMD, KL divergence, Wasserstein distances
- Random Forest distinguishability test, scored by the AUC (area under the ROC curve)
- UMAP and ROC visualization

---

This notebook does not run any heavy computation. Instead, it documents the methods and illustrates the outputs with figures that were generated beforehand, during and after model development.


# 2. Datasets

## 2.1 Bulk Dataset
- Transcriptome isoform quantification  
- Input dimensionality: 162,009 features
- Leiden clustering used to derive reference cell-type labels  




**Bulk transcripts dataset Overview**
```text
============================================================
Dataset: bulk_transcripts
============================================================

Shape: 19882 cells × 162009 features
  - n_obs (cells): 19882
  - n_vars (genes/transcripts): 162009

Data Matrix (X):
  - Type: <class 'scipy.sparse._csr.csr_matrix'>
  - Dtype: uint32
  - Memory: 6.85 GB
  - Sparse: True

Cell metadata (obs): ['geo_accession', 'series_id', 'characteristics_ch1', 'extract_protocol_ch1', 'source_name_ch1', 'title', 'contact_city', 'contact_country', 'contact_institute', 'instrument_model', 'library_source', 'organism_ch1', 'platform_id', 'singlecellprobability', 'submission_date', 'taxid_ch1', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'mt_outlier', 'n_counts', 'n_genes', 'leiden']
Gene/Transcript metadata (var): ['gene_name', 'gene_id']
Layers: None
Obs metadata (obsm): None
Var metadata (varm): None
Unstructured annotations (uns): ['gene_n_transcripts', 'gene_to_transcripts', 'multi_isoform_genes', 'single_isoform_genes', 'transcript_id_to_index', 'transcript_mapping']

```





## 2.2 Single-Cell Dataset
- Single-cell isoform expression matrix  
- Input dimensionality: 179,610 features  
- Leiden clustering used to derive reference cell-type labels  

**Single-cell transcripts dataset Overview**

```text
============================================================
Dataset: sc_transcripts
============================================================

Shape: 183880 cells/genes × 179610 features
  - n_obs (cells): 183880
  - n_vars (genes/transcripts): 179610

Data Matrix (X):
  - Type: <class 'scipy.sparse._csr.csr_matrix'>
  - Dtype: uint32
  - Memory: 9.72 GB
  - Sparse: True

Cell metadata (obs): ['geo_accession', 'series_id', 'characteristics_ch1', 'extract_protocol_ch1', 'source_name_ch1', 'title', 'contact_city', 'contact_country', 'contact_institute', 'instrument_model', 'library_source', 'organism_ch1', 'platform_id', 'singlecellprobability', 'submission_date', 'taxid_ch1', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'mt_outlier', 'n_counts', 'n_genes', 'leiden']
Gene/Transcript metadata (var): ['gene_name', 'gene_id']
Layers: None
Obs metadata (obsm): None
Var metadata (varm): None
Unstructured annotations (uns): ['gene_n_transcripts', 'gene_to_transcripts', 'multi_isoform_genes', 'single_isoform_genes', 'transcript_id_to_index', 'transcript_mapping']
```
<br><br>

**Note:**
Most of this metadata is not used in the project. The Leiden clusters (`'leiden'`), however, are valuable for conditioning the generation of isoform expression.


## 2.3 Preprocessing
- Normalization
- Log1p transformation


```python
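import anndata as ad
import scanpy as sc

# Load the preprocessed bulk transcript matrix
adata = ad.read_h5ad("/work3/s193518/scIsoPred/data/bulk_processed_transcripts.h5ad")
# Library-size normalization: scale each cell to 10,000 total counts
sc.pp.normalize_total(adata, target_sum=1e4)
# Log1p transformation: x -> log(1 + x)
sc.pp.log1p(adata)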
```
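
To make the two preprocessing steps concrete, here is a minimal NumPy sketch of what library-size normalization followed by a log1p transform computes on a small dense count matrix. The helper name `normalize_and_log1p` and the toy counts are illustrative assumptions; the project itself relies on the Scanpy calls shown above, which also handle sparse matrices.

```python
import numpy as np

def normalize_and_log1p(counts, target_sum=1e4):
    """Scale each row (cell) to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against all-zero rows so they stay zero instead of producing NaN.
    safe_totals = np.where(totals == 0, 1.0, totals)
    scaled = counts / safe_totals * target_sum
    return np.log1p(scaled)

# Toy example: 2 cells x 3 transcripts (hypothetical counts).
toy_counts = np.array([[1, 3, 0],
                       [10, 0, 10]])
toy_log = normalize_and_log1p(toy_counts)

# Undoing the log shows that every non-empty cell now sums to target_sum.
row_sums = np.expm1(toy_log).sum(axis=1)
```

After this step, cells with very different sequencing depths become comparable, and the log transform compresses the dynamic range of highly expressed transcripts.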

# 3. Variational Autoencoder (VAE)

## 3.1 VAE Architecture
The VAE architecture is based on SCimilarity, whose Zenodo records provide code and pretrained weights.

We used their annotation model (V1) as the basis for our architecture (`url = "https://zenodo.org/record/8286452/files/annotation_model_v1.tar.gz?download=1"`).

The pretrained weights can be downloaded with `src/utils/download_VAE.py` for fine-tuning.

- Source file: `src/VAE/VAE_model.py`

```python
class Encoder(nn.Module):
    ...

class Decoder(nn.Module):
    ...

class VAE(torch.nn.Module):
    ...
```

## 3.2 Training Procedure
- Training script: `src/VAE/VAE_train.py`

**Command example:**
```bash
python VAE_train.py \
    --data_dir '/work3/s193518/scIsoPred/data/sc_processed_transcripts.h5ad' \
    --num_genes 179610 \
    --max_steps 4000 \
    --max_minutes 3000 \
    --checkpoint_freq 1000 \
    --batch_size 128 \
    --state_dict "./annotation_model_v1" \
    --save_dir "./output/ae_checkpoint/vae_sc_transcript/"
```
**Note**: The `state_dict` argument specifies the path to the initial weights (when finetuning).


# 4. Diffusion Model (scDiffusion)


## 4.1 Model Architecture

Once again, we build on an existing architecture. The scDiffusion model is available on GitHub: `https://github.com/EperLuo/scDiffusion/tree/main`

- Source file: `src/classifier/cell_model.py`

```python
class Cell_Unet(nn.Module):
    ...
```
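
As a rough illustration of what the denoising network is trained against, the sketch below implements the forward (noising) process of a DDPM-style diffusion on latent vectors. The 1000 steps and the latent dimension of 128 come from this notebook; the linear beta schedule, its endpoints, and the helper name `q_sample` are assumptions made for the example, not values read from the project code.

```python
import numpy as np

NUM_STEPS = 1000   # number of noise steps stated in this notebook
LATENT_DIM = 128   # matches --latent_dim in the commands

# Assumed linear beta schedule (a common DDPM default, not confirmed here).
betas = np.linspace(1e-4, 0.02, NUM_STEPS)
alphas_cumprod = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Draw x_t ~ N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I) and return it with the noise."""
    a_bar = alphas_cumprod[t]
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, LATENT_DIM))      # stand-in for VAE latents
x_early, _ = q_sample(x0, 10, rng)             # still close to x0
x_late, _ = q_sample(x0, NUM_STEPS - 1, rng)   # almost pure noise
```

With this schedule the signal weight `alphas_cumprod[t]` decays monotonically, so late timesteps carry almost no information about the original latent.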

## 4.2 Training Procedure
- Training script: `src/diffusion/diffusion_train.py`

We trained three variants:  
- Bulk-only training  
- Bulk pretraining followed by single-cell finetuning  
- Single-cell training from scratch  

**Command example:**  
```bash
python diffusion_train.py \
    --data_dir "/work3/s193518/scIsoPred/data/bulk_processed_transcripts.h5ad" \
    --lr_anneal_steps 200000 \
    --batch_size 128 \
    --log_interval 10 \
    --save_interval 100000 \
    --vae_path '/zhome/5b/d/223428/DTU_DL_PROJECT_DIFFUSION/src/VAE/output/ae_checkpoint/vae_sc_transcript/model_seed=0_step=3999.pt' \
    --latent_dim 128 \
    --model_path 'output/classifier_checkpoint/classifier_sc_processed_transcripts' \
    --start_guide_time 500 \
    --num_class 44
```

## 4.3 Inference

- Inference script: `src/diffusion/cell_sample.py`


**Command example**
```bash
python cell_sample.py \
    --num_samples 12000 \
    --batch_size 3000 \
    --model_path "model.pt" \
    --sample_dir "samples"

```

## 4.4 Visualization (UMAP, ROC)

Click an image to open it at full size.


**Bulk-only**

<a href="umap/UMAP_Global_bulk_non_guided.png" target="_blank">
  <img src="umap/UMAP_Global_bulk_non_guided.png" alt="Isoform expression" height="250">
</a>

<a href="umap/ROC_rd_forests/random_forest_plot_Global_bulk_ng.png" target="_blank">
  <img src="umap/ROC_rd_forests/random_forest_plot_Global_bulk_ng.png" alt="Isoform expression" height="250">
</a>

**Single-cell only**

<a href="umap/UMAP_Global_sc_non_guided.png" target="_blank">
  <img src="umap/UMAP_Global_sc_non_guided.png" alt="Isoform expression" height="250">
</a>

<a href="umap/ROC_rd_forests/random_forest_plot_sc_non_guided.png" target="_blank">
  <img src="umap/ROC_rd_forests/random_forest_plot_sc_non_guided.png" alt="Isoform expression" height="250">
</a>


**Single-cell transfer learning**

<a href="umap/UMAP_Globa_sc_transfer_non_guided.png" target="_blank">
  <img src="umap/UMAP_Globa_sc_transfer_non_guided.png" alt="Isoform expression" height="250">
</a>

<a href="umap/ROC_rd_forests/random_forest_plot_transfer_non_guided.png" target="_blank">
  <img src="umap/ROC_rd_forests/random_forest_plot_transfer_non_guided.png" alt="Isoform expression" height="250">
</a>


**Single-cell unique class**

<a href="umap/sc_unique_class/UMAP_Unique_Class_0.png" target="_blank">
  <img src="umap/sc_unique_class/UMAP_Unique_Class_0.png" height="250">
</a>

<a href="umap/ROC_rd_forests/sc_unique_class/random_forest_plot_Unique_Class_0_sc_False_False.png" target="_blank">
  <img src="umap/ROC_rd_forests/sc_unique_class/random_forest_plot_Unique_Class_0_sc_False_False.png" height="250">
</a>

# 5. Leiden-Cluster Classifier

The classifier is also adapted from scDiffusion.

## 5.1 Architecture
- Source file: `src/classifier/cell_model.py`

```python
class Cell_classifier(nn.Module):
    ...
```

## 5.2 Training Procedure
- Training script: `src/classifier/classifier_train.py`

**Command example:**
```bash
python classifier_train.py \
    --data_dir "/work3/s193518/scIsoPred/data/sc_processed_transcripts.h5ad" \
    --iterations 200000 \
    --batch_size 128 \
    --log_interval 20 \
    --eval_interval 20 \
    --save_interval 100000 \
    --vae_path '/zhome/5b/d/223428/DTU_DL_PROJECT_DIFFUSION/src/VAE/output/ae_checkpoint/vae_sc_transcript/model_seed=0_step=3999.pt' \
    --latent_dim 128 \
    --model_path 'classifier_sc_processed_transcripts' \
    --start_guide_time 500 \
    --num_class 44
```
**Note**: The diffusion process uses 1000 noise steps.  
The argument `--start_guide_time 500` sets the point in the reverse diffusion trajectory at which classifier guidance is activated. The first reverse steps (timesteps close to 1000) are dominated by noise, so the classifier signal is unreliable there. Starting guidance at step 500 ensures that the samples are sufficiently denoised for the classifier to provide stable and meaningful gradients.
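
The gating described in this note can be sketched in a few lines. The helper name `guidance_is_active` and the strict `<` comparison are assumptions for illustration; the exact boundary condition used in the project code is not shown in this notebook.

```python
NUM_STEPS = 1000  # number of noise steps stated above

def guidance_is_active(t, start_guide_time=500):
    """Return True once the reverse process has reached the guided region."""
    return t < start_guide_time

# Reverse diffusion runs from t = NUM_STEPS - 1 down to t = 0.
guided_steps = [t for t in reversed(range(NUM_STEPS)) if guidance_is_active(t)]
unguided_steps = NUM_STEPS - len(guided_steps)
```

Under these assumptions, the first half of the reverse trajectory is sampled without guidance, and the classifier gradient only influences the last 500 steps.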


## 5.3 Inference

- Inference script: `src/diffusion/classifier_sample.py`


**Command example**
```bash
python classifier_sample.py \
    --num_samples 9000 \
    --batch_size 3000 \
    --model_path "output/diffusion/diff_model.pt" \
    --classifier_path "output/classifier/class_model.pt" \
    --sample_dir "output/simulated_samples/single_cell" \
    --start_guide_steps 500 \
    --classifier_scale 2
```
**Note**: The argument `--classifier_scale` sets the weight of the guidance term during generation (the variable *s* in equation (3) of the report). The results suggest that a value of 2 may not be optimal.
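
To illustrate the role of the scale *s*, the sketch below applies the standard classifier-guidance update, in which the mean of a reverse step is shifted by *s* times the step variance times the gradient of the classifier log-probability. We assume here that equation (3) of the report follows this standard form. The toy linear softmax classifier, its weights `W`, and the helper names are hypothetical stand-ins for the trained `Cell_classifier`.

```python
import numpy as np

def log_prob_grad(x, W, y):
    """Gradient of log softmax(W @ x)[y] with respect to x (toy linear classifier)."""
    logits = W @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return W[y] - probs @ W

def guided_mean(mean, variance, grad, classifier_scale):
    """Shift the reverse-step mean along the classifier gradient."""
    return mean + classifier_scale * variance * grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # 3 hypothetical classes, 4 latent dimensions
x_t = rng.standard_normal(4)      # current noisy latent
mean, variance = x_t.copy(), np.full(4, 0.1)

grad = log_prob_grad(x_t, W, y=1)
shift_s2 = guided_mean(mean, variance, grad, 2.0) - mean
shift_s4 = guided_mean(mean, variance, grad, 4.0) - mean
```

The shift is linear in the scale, so doubling `classifier_scale` doubles how far each guided step is pushed toward the target class. A larger scale gives stronger conditioning but can move samples away from the data distribution.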

## 5.4 Visualization (UMAP, ROC)

Click an image to open it at full size.









**Bulk-only**


<a href="umap/bulk_guided/UMAP_Cluster_0.png" target="_blank">
  <img src="umap/bulk_guided/UMAP_Cluster_0.png" alt="UMAP Cluster 0" width="250">
</a>
<a href="umap/bulk_guided/UMAP_Cluster_1.png" target="_blank">
  <img src="umap/bulk_guided/UMAP_Cluster_1.png" alt="UMAP Cluster 1" width="250">
</a>
<a href="umap/bulk_guided/UMAP_Cluster_2.png" target="_blank">
  <img src="umap/bulk_guided/UMAP_Cluster_2.png" alt="UMAP Cluster 2" width="250">
</a>
<a href="umap/bulk_guided/UMAP_Cluster_3.png" target="_blank">
  <img src="umap/bulk_guided/UMAP_Cluster_3.png" alt="UMAP Cluster 3" width="250">
</a>

<br>


<a href="umap/bulk_guided/UMAP_Combined_Clusters_0_1_2_3_4_new.PNG" target="_blank">
  <img src="umap/bulk_guided/UMAP_Combined_Clusters_0_1_2_3_4_new.PNG" alt="Combined Clusters" height="250">
</a>

<a href="umap/bulk_guided/random_forest_plot_Guided_Global_bulk_True_False.png" target="_blank">
  <img src="umap/bulk_guided/random_forest_plot_Guided_Global_bulk_True_False.png" height="250">
</a>

<a href="umap/bulk_guided/random_forest_plot_Cluster_0_bulk_True_False.png" target="_blank">
  <img src="umap/bulk_guided/random_forest_plot_Cluster_0_bulk_True_False.png" height="250">
</a>

**Single-cell only**

<a href="umap/sc_guided/UMAP_Cluster_0.png" target="_blank">
  <img src="umap/sc_guided/UMAP_Cluster_0.png" alt="UMAP Cluster 0" width="250">
</a>
<a href="umap/sc_guided/UMAP_Cluster_6.png" target="_blank">
  <img src="umap/sc_guided/UMAP_Cluster_6.png" alt="UMAP Cluster 6" width="250">
</a>
<a href="umap/sc_guided/UMAP_Cluster_7.png" target="_blank">
  <img src="umap/sc_guided/UMAP_Cluster_7.png" alt="UMAP Cluster 7" width="250">
</a>
<a href="umap/sc_guided/UMAP_Cluster_12.png" target="_blank">
  <img src="umap/sc_guided/UMAP_Cluster_12.png" alt="UMAP Cluster 12" width="250">
</a>
<a href="umap/sc_guided/UMAP_Cluster_18.png" target="_blank">
  <img src="umap/sc_guided/UMAP_Cluster_18.png" alt="UMAP Cluster 18" width="250">
</a>

<br>

<a href="umap/sc_guided/UMAP_Combined_Clusters_0_1_2_3_4_5_6_7_8.png" target="_blank">
  <img src="umap/sc_guided/UMAP_Combined_Clusters_0_1_2_3_4_5_6_7_8.png" alt="Combined Clusters" height="250">
</a>

<a href="umap/ROC_rd_forests/sc_guided/random_forest_plot_sc_Guided_Global.png" target="_blank">
  <img src="umap/ROC_rd_forests/sc_guided/random_forest_plot_sc_Guided_Global.png" height="250">
</a>

<a href="umap/ROC_rd_forests/sc_guided/random_forest_sc_guided_plot_Cluster_0.png" target="_blank">
  <img src="umap/ROC_rd_forests/sc_guided/random_forest_sc_guided_plot_Cluster_0.png" height="250">
</a>


**Single-cell transfer learning**

<a href="umap/sc_transfer_guided/UMAP_Cluster_0.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_0.png" alt="Cluster 0" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_1.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_1.png" alt="Cluster 1" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_2.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_2.png" alt="Cluster 2" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_4.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_4.png" alt="Cluster 4" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_6.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_6.png" alt="Cluster 6" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_7.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_7.png" alt="Cluster 7" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_8.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_8.png" alt="Cluster 8" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_10.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_10.png" alt="Cluster 10" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_12.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_12.png" alt="Cluster 12" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_13.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_13.png" alt="Cluster 13" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_14.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_14.png" alt="Cluster 14" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_16.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_16.png" alt="Cluster 16" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_17.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_17.png" alt="Cluster 17" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_18.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_18.png" alt="Cluster 18" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_19.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_19.png" alt="Cluster 19" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_20.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_20.png" alt="Cluster 20" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_21.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_21.png" alt="Cluster 21" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_22.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_22.png" alt="Cluster 22" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_23.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_23.png" alt="Cluster 23" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_24.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_24.png" alt="Cluster 24" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_26.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_26.png" alt="Cluster 26" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_27.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_27.png" alt="Cluster 27" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_28.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_28.png" alt="Cluster 28" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_29.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_29.png" alt="Cluster 29" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_30.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_30.png" alt="Cluster 30" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_31.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_31.png" alt="Cluster 31" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_32.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_32.png" alt="Cluster 32" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_33.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_33.png" alt="Cluster 33" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_34.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_34.png" alt="Cluster 34" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_35.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_35.png" alt="Cluster 35" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_36.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_36.png" alt="Cluster 36" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_37.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_37.png" alt="Cluster 37" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_38.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_38.png" alt="Cluster 38" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_39.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_39.png" alt="Cluster 39" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_42.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_42.png" alt="Cluster 42" width="150">
</a>
<a href="umap/sc_transfer_guided/UMAP_Cluster_43.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Cluster_43.png" alt="Cluster 43" width="150">
</a>

<br>

<a href="umap/sc_transfer_guided/UMAP_Combined_Clusters_0_1_2_3_4_5_7_8.png" target="_blank">
  <img src="umap/sc_transfer_guided/UMAP_Combined_Clusters_0_1_2_3_4_5_7_8.png" height="250">
</a>

<a href="umap/ROC_rd_forests/sc_transfer_guided/random_forest_plot_Guided_Global_sc_True_True.png" target="_blank">
  <img src="umap/ROC_rd_forests/sc_transfer_guided/random_forest_plot_Guided_Global_sc_True_True.png" height="250">
</a>

<a href="umap/ROC_rd_forests/sc_transfer_guided/random_forest_plot_Cluster_0_sc_True_True.png" target="_blank">
  <img src="umap/ROC_rd_forests/sc_transfer_guided/random_forest_plot_Cluster_0_sc_True_True.png" height="250">
</a>




**Single-cell unique class**

<a href="umap/sc_unique_class/UMAP_Unique_Class_0.png" target="_blank">
  <img src="umap/sc_unique_class/UMAP_Unique_Class_0.png" height="250">
</a>

<a href="umap/ROC_rd_forests/sc_unique_class/random_forest_plot_Unique_Class_0_sc_False_False.png" target="_blank">
  <img src="umap/ROC_rd_forests/sc_unique_class/random_forest_plot_Unique_Class_0_sc_False_False.png" height="250">
</a>

# 6. Evaluation Metrics

We use the metrics listed in section 3.3.2 of the report.

- File to run the metrics: `src/metrics/run_metrics_generator.py`

**Command example**
```bash
python run_metrics_generator.py \
    --mode 'sc' \
    --guided True \
    --transfer True \
    --num_samples 1000 \
    --sample_dir "output/simulated_samples/single_cell" \
    --start_guide_steps 500 \
    --rf True \
    --per_cluster True
```
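
As a rough illustration of one of these metrics, the sketch below computes a biased estimate of the squared Maximum Mean Discrepancy (MMD) between a real and a generated sample using a Gaussian kernel. The kernel choice, the bandwidth `gamma`, and the helper names are assumptions for the example; the project's own implementation lives in the metrics script and may differ.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(real, generated, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    k_rr = rbf_kernel(real, real, gamma).mean()
    k_gg = rbf_kernel(generated, generated, gamma).mean()
    k_rg = rbf_kernel(real, generated, gamma).mean()
    return k_rr + k_gg - 2.0 * k_rg

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 8))
similar = rng.standard_normal((200, 8))        # same distribution as `real`
shifted = rng.standard_normal((200, 8)) + 1.0  # mean-shifted distribution

mmd_similar = mmd2(real, similar, gamma=0.1)
mmd_shifted = mmd2(real, shifted, gamma=0.1)
```

A value close to zero indicates that the two samples are hard to tell apart under the chosen kernel, while a mismatched distribution yields a clearly larger value.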

## 6.1 Results

| **Metrics**     | **Bulk NG** | **Bulk G** | **Sc NG** | **Sc G** | **TL NG** | **TL G** |
|-----------------|------------:|-----------:|----------:|---------:|----------:|---------:|
| **MMD**         | 4.0e-4      | 5.8e-4     | 4.0e-4    | 5.2e-4   | 4.0e-4    | 4.0e-4   |            
| **Wasserstein** | 1.2         | 1.1        | 1.6       | 2.6      | 2.1       | 2.0      |           
| **KL**          | 1.7         | 2.6        | 3.3       | 13       | 8.2       | 7.7      |        
| **AUC**         | 0.94        | 0.96       | 0.98      | 0.99     | 1.0       | 1.0      |

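**Note:** All metrics above are measured on globally generated features. Abbreviations: NG = non-guided, G = guided, Sc = single-cell, TL = transfer learning (bulk pretraining followed by single-cell finetuning). The following comparison focuses on features generated when conditioning on Leiden cluster 0.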

| **Metrics**     | **Sc G (Cluster 0)** | **1 Class NG (Cluster 0)** |
|-----------------|---------------------:|---------------------------:|
| **MMD**         | 2.0e-3               | 1.4e-3                     |
| **Wasserstein** | 1.56                 | 1.7                        |
| **KL**          | 11                   | 14                         |
| **AUC**         | 1.0                  | 0.93                       |
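
The AUC rows in the tables above come from a Random Forest distinguishability test: a classifier is trained to separate real from generated samples, and its ROC AUC is reported. An AUC near 0.5 means the two sets are hard to tell apart, while an AUC near 1.0 means the generated samples are easily identified. The sketch below shows the idea with scikit-learn; the split ratio, forest size, synthetic data, and helper name `distinguishability_auc` are assumptions, since the project's exact configuration is not shown in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def distinguishability_auc(real, generated, seed=0):
    """Train a Random Forest to separate real from generated rows; return test AUC."""
    X = np.vstack([real, generated])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(generated))])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    # Probability of the "generated" class is used as the ranking score.
    scores = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)

rng = np.random.default_rng(0)
real = rng.standard_normal((400, 16))
good_fake = rng.standard_normal((400, 16))        # same distribution as `real`
bad_fake = rng.standard_normal((400, 16)) + 2.0   # clearly shifted

auc_good = distinguishability_auc(real, good_fake)
auc_bad = distinguishability_auc(real, bad_fake)
```

Read this way, the reported values between 0.93 and 1.0 indicate that a Random Forest can still separate generated from real samples in most settings.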

# 7. Conclusion

Overall, the project demonstrates that diffusion-based generative modeling can capture complex isoform expression structures, but several limitations remain. First, transfer learning did not yield the expected improvements, likely because the bulk and single-cell distributions are too different for effective transfer. Second, the classifier-guidance mechanism appears suboptimal: the chosen guidance parameter may not be well aligned with the diffusion dynamics. Further discussion of the results and the overall project can be found in Section 4 of the report.