# Analysis of GSE131907 using CaSee

This tutorial demonstrates how to use CaSee to distinguish cancer cells from normal cells in [GSE131907](https://www.nature.com/articles/s41467-020-16164-1).

The input data are deposited on Google Drive: https://drive.google.com/file/d/18KxhF9FqFKCgczmUBUWV-pazx32IZRiO/view?usp=sharing

Files in `GSE131907.tar.xz`:

__input:__
- GSE131907_epi_count.csv # CaSee pipeline input file

- GSE131907_epi_meta.csv # cell annotations from Kim et al.

__output:__

- merge_data_predict_times_1~10.csv # CaSee output files, one per trial, generated with `times: 10`

- merge_data_predict_Integration_.csv # integration of the output files from trials 1 to 10
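
Before configuring CaSee, it can help to unpack the archive and confirm the input files are present. The sketch below assumes the archive has been downloaded into the current directory and that your `tar` supports xz (`-J`); the function name `check_inputs` is hypothetical, and because the archive's internal folder layout is not documented here, it searches for the files by name instead of assuming a path.

```shell
# Hypothetical helper: unpack GSE131907.tar.xz and check that both input files exist.
# Assumes tar with xz support (-J); the archive's internal layout is not assumed.
check_inputs() {
  local archive="${1:-GSE131907.tar.xz}"
  local dest="${2:-.}"
  mkdir -p "$dest"
  tar -xJf "$archive" -C "$dest" || return 1
  local f
  for f in GSE131907_epi_count.csv GSE131907_epi_meta.csv; do
    # Search by file name anywhere under the extraction directory.
    if [ -z "$(find "$dest" -name "$f" -print -quit)" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  echo "inputs OK"
}
```

For example, `check_inputs GSE131907.tar.xz GSE131907/` extracts into `GSE131907/` and prints `inputs OK` when both files are found.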



Columns of `merge_data_predict_Integration_.csv`:

- `Normal_like` and `Tumor_like` # outputs of the capsule neural network

- `scale_Normal_probe` and `scale_Tumor_probe` # standardized `Normal_like` and `Tumor_like`

- `weight_Normal_probe` and `weight_Tumor_probe` # weighted `Normal_like` and `Tumor_like`

- `predict_Times` # number of trials in which the cell was identified as a cancer cell

- `predict_P_vals` # probability that the cell is a cancer cell across the trials
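
To get a quick overview of the integrated result, you can count how many cells exceed a cutoff in one of these columns. This is a minimal sketch, assuming a plain comma-separated file with a header row and no quoted commas; the helper name `count_above` is hypothetical, and the cutoffs in the usage example are illustrative, not thresholds recommended by CaSee.

```shell
# Hypothetical helper: count rows whose value in a named column is greater than a threshold.
# Assumes a header row and simple comma separation (no quoted fields containing commas).
count_above() {
  local csv="$1" col="$2" thr="$3"
  awk -F',' -v col="$col" -v thr="$thr" '
    NR == 1 {
      # Locate the column by its header name, ignoring quotes and carriage returns.
      for (i = 1; i <= NF; i++) { h = $i; gsub(/"|\r/, "", h); if (h == col) idx = i }
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
      next
    }
    ($idx + 0) > thr { n++ }
    END { if (idx) print n + 0 }
  ' "$csv"
}
```

For example, `count_above merge_data_predict_Integration_.csv predict_P_vals 0.5` or `count_above merge_data_predict_Integration_.csv predict_Times 5`; the function exits with status 2 if the column name is not in the header.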

### Create config file

```shell

vim configs/CaSee_Model_configs.yaml

```
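
`vim` will not save into a folder that does not exist yet, and the comment at the top of the config file requires it to live in a folder named `configs`. The sketch below, with the hypothetical helper name `prepare_config_path`, creates the folder and rejects paths whose parent folder is named differently.

```shell
# Hypothetical helper: ensure the config file's parent folder exists and is named "configs".
prepare_config_path() {
  local path="${1:-configs/CaSee_Model_configs.yaml}"
  if [ "$(basename "$(dirname "$path")")" != "configs" ]; then
    echo "config must be inside a folder named configs: $path" >&2
    return 1
  fi
  mkdir -p "$(dirname "$path")"
  echo "$path"
}
```

Usage: `path=$(prepare_config_path) && vim "$path"`.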

### Copy and paste the following text into `CaSee_Model_configs.yaml`

```yaml
# Configuration for the training loop. Note: this config file must be placed in a folder named "configs".

--- 

# training model args
data_arguments: 
  work_dir: /media/yuansh/14THHD/BscModel-V4/GSE131907/ # your scRNA-seq working directory
  Counts_expr: GSE131907_epi_count.csv # must be a .csv file
  Tissue_type: "Tumor" # tissue type, one of ['Tumor','Adjacent','Normal','Unknow']
  use_others: False # whether to also use cell clusters that have no annotation
  remove_genes: False # remove mitochondrial (MT), ribosomal (RP), ERCC spike-in, lncRNA and other non-coding RNA genes

cell_annotion: False    
Marker_genes:
  T_cell: ["CD3D",'CD3E','CD2']
  Fibroblast: ['COL1A1','DCN','C1R']
  Myeloid: ['LYZ','CD68','TYROBP']
  B_cell: ['CD79A','MZB1','MS4A1']
  Endothelial: ['CLDN5','FLT1','RAMP2']
  Mast: ['CPA3','TPSAB1','TPSB2']
  DC: ['LILRA4','CXCR3','IRF7']
  Cancer: ['EPCAM']
  
save_files:
  files: merge_data

trainig_loop:
  times: 10
  batch_size: 128  # reduce this if GPU memory is insufficient
  max_epochs: 20 # 20 epochs is generally enough to get a result; otherwise, set a suitable value
  seed: 42
  lr: 0.0005
  split_data_seed: 0
  gpu: True
ckpt: 
```
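
Before launching a long run, a quick sanity check of the config can catch a wrong `work_dir` or file name. This is a minimal sketch with hypothetical helper names (`yaml_value`, `check_config`); it only handles the simple `key: value # comment` lines used above and is not a general YAML parser. It also assumes that CaSee looks for `Counts_expr` inside `work_dir`, which the comments suggest but this tutorial does not state explicitly.

```shell
# Hypothetical helper: read the first "key: value" line, dropping a trailing "# comment"
# and surrounding spaces or double quotes. Not a general YAML parser.
yaml_value() {
  awk -v key="$2" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, key ":") == 1) {
        val = substr(line, length(key) + 2)
        sub(/[ \t]+#.*$/, "", val)
        gsub(/^[ \t"]+|[ \t"]+$/, "", val)
        print val
        exit
      }
    }' "$1"
}

# Hypothetical check: Counts_expr must be a .csv file that exists under work_dir (assumption).
check_config() {
  local cfg="${1:-configs/CaSee_Model_configs.yaml}"
  local dir counts
  dir=$(yaml_value "$cfg" work_dir)
  counts=$(yaml_value "$cfg" Counts_expr)
  case "$counts" in
    *.csv) ;;
    *) echo "Counts_expr must be a .csv file: $counts" >&2; return 1 ;;
  esac
  if [ ! -f "${dir%/}/$counts" ]; then
    echo "not found: ${dir%/}/$counts" >&2
    return 1
  fi
  echo "${dir%/}/$counts"
}
```

Running `check_config configs/CaSee_Model_configs.yaml` prints the resolved path of the count matrix, or an error if it cannot be found.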
### Running CaSee

```shell
python CaSee.py --config configs/CaSee_Model_configs.yaml
```