# Analysis of scRNA-seq and snRNA-seq datasets of myogenic differentiation in the C2C12 cell line

Here we analyze a SPLiT-seq dataset of the C2C12 myogenic system [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02505](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02505), which recapitulates myoblast-to-myotube differentiation.

The dataset includes single-cell RNA-seq of myoblasts (0h of differentiation), single-nucleus RNA-seq of myoblasts (0h of differentiation) and single-nucleus RNA-seq of myotubes (72h of differentiation). It is available on GEO under accession number [GSE168776](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE168776).

In our analyses we used the 9000-cell library of the short-read SPLiT-seq dataset, selecting batch C because it has the highest median UMI count and gene count per cell. It includes 1305 single myoblast cells, 1718 single myoblast nuclei and 3288 single myotube nuclei. We computed a diffusion map as described above and chose, as the root cell for the computation of the diffusion pseudotime, the myoblast cell with the largest value of Diffusion Component 1 in the diffusion map.

Next, we separated the three cell populations (single-cell myoblasts, single-nucleus myoblasts and single-nucleus myotubes) and selected the top 500 and 1000 HVGs in each population independently, prior to GRN inference.
We used the mouse RBPs included in the “RBP2GO” database [https://pubmed.ncbi.nlm.nih.gov/33196814/](https://pubmed.ncbi.nlm.nih.gov/33196814/) with an RBP2GO score larger than 10, and we added to each dataset the RBPs that belong to the full set of HVGs (i.e. the genes flagged as highly variable by the Scanpy function “scanpy.pp.highly_variable_genes” when the parameter “n_top_genes” is not set). We manually added ADAR1 (mouse gene symbol “Adar”) to the gene set for GRN inference in the datasets where it does not belong to the set of HVGs. GRN inference on the processed datasets was run as described above.
Next, we obtained RBP-RNA interactions for ADAR1 from a native RNA immunoprecipitation combined with RNA-seq (RIP-seq) experiment presented in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3978302/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3978302/), which was performed in the same cell line (C2C12) and at the same time points (0h and 72h) as the scRNA-seq dataset. In this study, 3263 and 401 ADAR1 targets were found at 0h and 72h, respectively. Note that the targets were defined based on the log2 fold change of the binding enrichment between the two time points, hence they represent targets specific to each time point.
We evaluated the performance of each GRN inference method in predicting the ADAR1 RIP-seq interactions before and after filtering the rankings with catRAPID, using the same pipeline and evaluation metrics described above.

In [None]:
%matplotlib inline

In [None]:
import os

import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad

## Load mouse RBPs from RBP2GO

In [None]:
mouse_RBPs=pd.read_csv("Table_MM_RBP.txt",delimiter="\t",skiprows=4)
mouse_RBPs=mouse_RBPs[mouse_RBPs['RBP2GO_Score']>10]
len(mouse_RBPs)

# Analysis of SR-SPLiT-seq data

In [None]:
metadata_short_1k=pd.read_csv("./C2C12_short_1k/GSM5169184_C2C12_short_1k_cell_metadata.csv.gz")
metadata_short_9kA=pd.read_csv("./C2C12_short_9kA/GSM5169185_C2C12_short_9kA_cell_metadata.csv.gz")
metadata_short_9kB=pd.read_csv("./C2C12_short_9kB/GSM5169186_C2C12_short_9kB_cell_metadata.csv.gz")
metadata_short_9kC=pd.read_csv("./C2C12_short_9kC/GSM5169187_C2C12_short_9kC_cell_metadata.csv.gz")
metadata_short_9kD=pd.read_csv("./C2C12_short_9kD/GSM5169188_C2C12_short_9kD_cell_metadata.csv.gz")
metadata_short_9kE=pd.read_csv("./C2C12_short_9kE/GSM5169189_C2C12_short_9kE_cell_metadata.csv.gz")
metadata_short_9kF=pd.read_csv("./C2C12_short_9kF/GSM5169190_C2C12_short_9kF_cell_metadata.csv.gz")

In [None]:
metadata_short_1k['sample'].value_counts()

In [None]:
metadata_short_9kA['sample'].value_counts()

In [None]:
metadata_short_9kB['sample'].value_counts()

In [None]:
metadata_short_9kC['sample'].value_counts()

In [None]:
metadata_short_9kD['sample'].value_counts()

In [None]:
metadata_short_9kE['sample'].value_counts()

In [None]:
metadata_short_9kF['sample'].value_counts()

In [None]:
print("Median UMI count")
print(metadata_short_1k.umi_count.median(),metadata_short_9kA.umi_count.median(),
      metadata_short_9kB.umi_count.median(),metadata_short_9kC.umi_count.median(),
      metadata_short_9kD.umi_count.median(),metadata_short_9kE.umi_count.median(),
      metadata_short_9kF.umi_count.median())

print("Median Gene count")
print(metadata_short_1k.gene_count.median(),metadata_short_9kA.gene_count.median(),
      metadata_short_9kB.gene_count.median(),metadata_short_9kC.gene_count.median(),
      metadata_short_9kD.gene_count.median(),metadata_short_9kE.gene_count.median(),
      metadata_short_9kF.gene_count.median())
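
The comparison printed above can also be tabulated so that the library with the highest medians is picked programmatically. The sketch below is only illustrative: `summarize_libraries` is a hypothetical helper that is not part of the original notebook, and the toy `umi_count` and `gene_count` values are invented stand-ins for the real metadata tables.

```python
import pandas as pd

def summarize_libraries(metadata_by_library):
    """Return a table of median UMI and gene counts, one row per library."""
    rows = {
        name: {
            "median_umi": df["umi_count"].median(),
            "median_genes": df["gene_count"].median(),
        }
        for name, df in metadata_by_library.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy metadata standing in for the per-library cell metadata (values are invented).
toy_metadata = {
    "9kA": pd.DataFrame({"umi_count": [100, 200, 300], "gene_count": [50, 80, 120]}),
    "9kB": pd.DataFrame({"umi_count": [150, 250, 350], "gene_count": [60, 90, 130]}),
    "9kC": pd.DataFrame({"umi_count": [400, 500, 600], "gene_count": [200, 250, 300]}),
}

summary = summarize_libraries(toy_metadata)
best_library = summary["median_umi"].idxmax()
print(summary)
print("Library with the highest median UMI count:", best_library)
```

With the real data, the dictionary would map library names to the `metadata_short_*` tables loaded above.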

We use the 9kC library, which has the highest median UMI count and gene count per cell.

In [None]:
def SaveDataforARACNe(folder, adata, geneset, label1, label2):
    # Keep only the selected genes and store them in a minimal AnnData object
    adata_sub=adata[:,geneset]
    adata_temp=ad.AnnData(X=adata_sub.X)
    adata_temp.obs_names=adata_sub.obs_names
    adata_temp.var_names=adata_sub.var_names
    adata_temp.obs['batch']=label1
    adata_temp.write(folder+'processed_'+label1+'_'+label2+'.h5ad')

## 9kC sample

In [None]:
adata_short_9k=ad.read_mtx("./C2C12_short_9kC/GSM5169187_C2C12_short_9kC.genes.mtx.gz")
genes_short_9k=pd.read_csv("./C2C12_short_9kC/GSM5169187_C2C12_short_9kC_genes.csv.gz")

In [None]:
metadata_short_9kC=metadata_short_9kC.set_index('cell_barcode')

In [None]:
metadata_short_9kC.umi_count.median()

In [None]:
adata_short_9k.obs=metadata_short_9kC.copy()
adata_short_9k.obs.index=adata_short_9k.obs.index.astype(str)
adata_short_9k.var.index=list(genes_short_9k.gene_name)
adata_short_9k.var.index=adata_short_9k.var.index.astype(str)
adata_short_9k.obs_names=adata_short_9k.obs.index
adata_short_9k.var_names=adata_short_9k.var.index
adata_short_9k.var_names_make_unique()

In [None]:
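# Keep only the genes present in the gene-name mapping and rename them.
# NOTE: `mymapping` (a table with an `old_GN` column) and `mydict` (presumably old -> new gene names)
# are not defined in this section and are assumed to have been loaded beforehand.
print(adata_short_9k)
inters=list(set(adata_short_9k.var_names).intersection(set(mymapping.old_GN)))
print(len(inters))
adata_short_9k=adata_short_9k[:,inters].copy()
adata_short_9k.var_names=adata_short_9k.var_names.map(mydict)
adata_short_9k.var_names_make_unique()
adata_short_9k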

In [None]:
# Remove mito genes
print(adata_short_9k)
mito_genes = adata_short_9k.var_names.str.startswith('mt-')
adata_short_9k=adata_short_9k[:,~mito_genes].copy()
print(adata_short_9k)

In [None]:
adata_short_9k.obs['sample'].value_counts()

In [None]:
adata_short9k_UMI=adata_short_9k.copy()
sc.pp.normalize_total(adata_short_9k,target_sum=1e5)
sc.pp.log1p(adata_short_9k)
adata_short_9k.raw=adata_short_9k
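
As a reminder of what the normalization in the cell above does, the following sketch reproduces the arithmetic with NumPy on toy counts: each cell is scaled to a fixed total and then log-transformed with log(1 + x). `normalize_and_log` is a hypothetical helper written for illustration; it is not Scanpy's implementation and ignores sparse matrices.

```python
import numpy as np

def normalize_and_log(counts, target_sum=1e5):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells to avoid a division by zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)

# Toy cells x genes count matrix (values are invented).
toy_counts = np.array([[1, 3, 0],
                       [10, 0, 10]])
normalized = normalize_and_log(toy_counts)

# After undoing the log transform, every non-empty cell sums to target_sum.
print(np.expm1(normalized).sum(axis=1))
```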

In [None]:
adata_short_9k

In [None]:
sc.pp.highly_variable_genes(adata_short_9k,n_top_genes=3000)
adata_short_9k=adata_short_9k[:,adata_short_9k.var.highly_variable].copy()
sc.pp.scale(adata_short_9k)
sc.tl.pca(adata_short_9k,svd_solver='arpack')
sc.pp.neighbors(adata_short_9k)
sc.tl.umap(adata_short_9k)
sc.tl.diffmap(adata_short_9k)
sc.pl.umap(adata_short_9k,color='sample')
sc.pl.diffmap(adata_short_9k,color='sample')

In [None]:
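# Root cell for DPT: the cell with the largest value of Diffusion Component 1
# (column 1 of X_diffmap; column 0 is the non-informative steady-state component)
adata_short_9k.uns['iroot'] = np.argmax(adata_short_9k.obsm['X_diffmap'][:,1])
sc.tl.dpt(adata_short_9k)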
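
The cell above takes the maximum of Diffusion Component 1 over all cells. If the root should be guaranteed to be a myoblast cell, as stated in the description at the top, the search can be restricted to that population. The sketch below shows one way to do this with NumPy; `pick_root` is a hypothetical helper and the toy values are invented.

```python
import numpy as np

def pick_root(dc1, labels, root_label):
    """Index (into the full array) of the `root_label` cell with the largest DC1 value."""
    dc1 = np.asarray(dc1, dtype=float)
    mask = np.asarray(labels) == root_label
    if not mask.any():
        raise ValueError(f"no cells labelled {root_label!r}")
    # Cells outside the root population are masked out with -inf.
    return int(np.argmax(np.where(mask, dc1, -np.inf)))

# Toy DC1 values and sample labels (invented for illustration).
toy_dc1 = [0.2, 0.9, 0.5, 1.4]
toy_labels = ["MB_cells", "MB_cells", "MB_nuclei", "MT_nuclei"]
root = pick_root(toy_dc1, toy_labels, "MB_cells")
print(root)
```

In the notebook, the inputs would be `adata_short_9k.obsm['X_diffmap'][:,1]` and `adata_short_9k.obs['sample'].values`, and the returned index would be stored in `adata_short_9k.uns['iroot']`.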

In [None]:
sc.pl.diffmap(adata_short_9k,color=['sample','dpt_pseudotime'],wspace=0.3,save="C2C12_diffmap.pdf")

In [None]:
adata_short9k_new=ad.AnnData(X=adata_short_9k.raw.X)
adata_short9k_new.var_names=adata_short_9k.raw.var_names
adata_short9k_new.obs_names=adata_short_9k.obs_names
adata_short9k_new.obs=adata_short_9k.obs.copy()

In [None]:
# Separate myoblasts single cells, myoblasts single nuclei and myotubes single nuclei
adata_short9k_MB_SC=adata_short9k_new[adata_short9k_new.obs['sample']=='MB_cells'].copy()
adata_short9k_MB_SN=adata_short9k_new[adata_short9k_new.obs['sample']=='MB_nuclei'].copy()
adata_short9k_MT_SN=adata_short9k_new[adata_short9k_new.obs['sample']=='MT_nuclei'].copy()

adata_short9k_UMI_MB_SC=adata_short9k_UMI[adata_short9k_UMI.obs['sample']=='MB_cells'].copy()
adata_short9k_UMI_MB_SN=adata_short9k_UMI[adata_short9k_UMI.obs['sample']=='MB_nuclei'].copy()
adata_short9k_UMI_MT_SN=adata_short9k_UMI[adata_short9k_UMI.obs['sample']=='MT_nuclei'].copy()

### GENE SELECTION

In [None]:
# NOTE: `out_folder` is not defined in this section and is assumed to be set beforehand
out_folder_SR9k=out_folder+'SR9k/'

os.makedirs(out_folder_SR9k,exist_ok=True)

In [None]:
out_folder_aracne_SR9k=out_folder_SR9k+'ARACNe_INPUT/'

os.makedirs(out_folder_aracne_SR9k,exist_ok=True)

In [None]:
adata_short9k_UMI_MB_SC.write(out_folder_aracne_SR9k+'SR9kMBSC_UMI.h5ad')
adata_short9k_UMI_MB_SN.write(out_folder_aracne_SR9k+'SR9kMBSN_UMI.h5ad')
adata_short9k_UMI_MT_SN.write(out_folder_aracne_SR9k+'SR9kMTSN_UMI.h5ad')

In [None]:
# Save the pseudotime data
pseudo_df_MB_SC=pd.DataFrame(data=adata_short9k_MB_SC.obs['dpt_pseudotime'], index=adata_short9k_MB_SC.obs_names)
pseudo_df_MB_SC.to_csv(out_folder_SR9k+'C2C12_SR9kMBSC_PseudoTime.csv')
pseudo_df_MB_SN=pd.DataFrame(data=adata_short9k_MB_SN.obs['dpt_pseudotime'], index=adata_short9k_MB_SN.obs_names)
pseudo_df_MB_SN.to_csv(out_folder_SR9k+'C2C12_SR9kMBSN_PseudoTime.csv')
pseudo_df_MT_SN=pd.DataFrame(data=adata_short9k_MT_SN.obs['dpt_pseudotime'], index=adata_short9k_MT_SN.obs_names)
pseudo_df_MT_SN.to_csv(out_folder_SR9k+'C2C12_SR9kMTSN_PseudoTime.csv')

### Myoblasts - single cells

In [None]:
adata_short9k_MB_SC_new=ad.AnnData(X=adata_short9k_MB_SC.X)
adata_short9k_MB_SC_new.var_names=adata_short9k_MB_SC.var_names
adata_short9k_MB_SC_new.obs_names=adata_short9k_MB_SC.obs_names
sc.pp.filter_genes(adata_short9k_MB_SC_new,min_cells=10)
sc.pp.highly_variable_genes(adata_short9k_MB_SC_new,max_mean=10)

In [None]:
adata_short9k_MB_SC_new

In [None]:
adata_short9k_MB_SC_new.var.highly_variable.value_counts()

In [None]:
HVRBPs_MB_SC=list(set(mouse_RBPs.Gene_Name).intersection(set(adata_short9k_MB_SC_new[:,adata_short9k_MB_SC_new.var.highly_variable].var_names)))
print(len(HVRBPs_MB_SC))
HVRBPs_MB_SC=list(set(['Adar']+HVRBPs_MB_SC))
print(len(HVRBPs_MB_SC))
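
The same selection (RBPs that are also HVGs, plus `Adar`) is repeated below for each population. A small helper makes the logic explicit; the sketch uses the hypothetical name `rbp_gene_set` and invented toy gene lists. It returns a sorted list because the order of `list(set(...))` over strings can differ between Python sessions, which would change the row order of the exported files.

```python
def rbp_gene_set(rbp_names, hvg_names, forced=("Adar",)):
    """RBPs that are highly variable, plus genes that must always be included."""
    selected = set(rbp_names) & set(hvg_names)
    selected.update(forced)
    # Sorting gives a reproducible order across runs.
    return sorted(selected)

# Toy inputs (invented): RBP list and highly variable genes of one population.
toy_rbps = ["Elavl1", "Ptbp1", "Hnrnpa1"]
toy_hvgs = ["Ptbp1", "Myog", "Elavl1"]
toy_selection = rbp_gene_set(toy_rbps, toy_hvgs)
print(toy_selection)
```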

In [None]:
adata_tmp=adata_short9k_MB_SC_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10,n_top_genes=500)

SR_MB_SC_RBP_RNA500=list(set(list(HVRBPs_MB_SC)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MB_SC_RBP_RNA500))

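# `all_genes` accumulates the genes selected across datasets; it is assumed to be initialised before this cell
all_genes=list(set(all_genes+SR_MB_SC_RBP_RNA500))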

tmp_df=pd.DataFrame(data=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNA500].X.todense().T,
                    index=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNA500].var_names,
                    columns=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNA500].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSCNormalizedData_RBP_RNA500.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNA500].X.todense().T, 
                    index=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNA500].var_names, 
                    columns=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNA500].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSCRawData_RBP_RNA500.csv')

In [None]:
adata_tmp=adata_short9k_MB_SC_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10,n_top_genes=1000)

SR_MB_SC_RBP_RNA1000=list(set(list(HVRBPs_MB_SC)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MB_SC_RBP_RNA1000))

all_genes=list(set(all_genes+SR_MB_SC_RBP_RNA1000))

tmp_df=pd.DataFrame(data=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNA1000].X.todense().T,
                    index=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNA1000].var_names,
                    columns=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNA1000].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSCNormalizedData_RBP_RNA1000.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNA1000].X.todense().T, 
                    index=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNA1000].var_names, 
                    columns=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNA1000].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSCRawData_RBP_RNA1000.csv')

In [None]:
adata_tmp=adata_short9k_MB_SC_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10)

SR_MB_SC_RBP_RNAHVG=list(set(list(HVRBPs_MB_SC)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MB_SC_RBP_RNAHVG))

all_genes=list(set(all_genes+SR_MB_SC_RBP_RNAHVG))

tmp_df=pd.DataFrame(data=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNAHVG].X.todense().T,
                    index=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNAHVG].var_names,
                    columns=adata_short9k_MB_SC_new[:,SR_MB_SC_RBP_RNAHVG].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSCNormalizedData_RBP_RNAHVG.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNAHVG].X.todense().T, 
                    index=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNAHVG].var_names, 
                    columns=adata_short9k_UMI_MB_SC[:, SR_MB_SC_RBP_RNAHVG].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSCRawData_RBP_RNAHVG.csv')
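
The export pattern used in the cells above (subset the genes, transpose to genes x cells, write a CSV) is repeated for every population and gene set. The sketch below factors it into a hypothetical helper, `genes_by_cells_frame`, which works on a dense NumPy array. This is an assumption made to keep the example self-contained: the matrices in this notebook are sparse and would first need to be converted to dense form.

```python
import numpy as np
import pandas as pd

def genes_by_cells_frame(matrix, cell_names, gene_names, geneset):
    """Subset a cells x genes matrix to `geneset` and return it as genes x cells."""
    position = {gene: i for i, gene in enumerate(gene_names)}
    missing = [g for g in geneset if g not in position]
    if missing:
        raise KeyError(f"genes not found: {missing}")
    columns = [position[g] for g in geneset]
    dense = np.asarray(matrix)[:, columns]
    return pd.DataFrame(dense.T, index=list(geneset), columns=list(cell_names))

# Toy cells x genes matrix (values are invented).
toy = np.array([[1, 0, 2],
                [0, 3, 4]])
frame = genes_by_cells_frame(toy, ["cell1", "cell2"],
                             ["Adar", "Myog", "Ptbp1"], ["Ptbp1", "Adar"])
print(frame)
```

The resulting frame can be written with `frame.to_csv(...)`, once for the normalized matrix and once for the raw UMI matrix.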

### Myoblasts - single nuclei

In [None]:
adata_short9k_MB_SN_new=ad.AnnData(X=adata_short9k_MB_SN.X)
adata_short9k_MB_SN_new.var_names=adata_short9k_MB_SN.var_names
adata_short9k_MB_SN_new.obs_names=adata_short9k_MB_SN.obs_names
sc.pp.filter_genes(adata_short9k_MB_SN_new,min_cells=10)
sc.pp.highly_variable_genes(adata_short9k_MB_SN_new,max_mean=10)

In [None]:
adata_short9k_MB_SN_new.var.highly_variable.value_counts()

In [None]:
HVRBPs_MB_SN=list(set(mouse_RBPs.Gene_Name).intersection(set(adata_short9k_MB_SN_new[:,adata_short9k_MB_SN_new.var.highly_variable].var_names)))
print(len(HVRBPs_MB_SN))
HVRBPs_MB_SN=list(set(['Adar']+HVRBPs_MB_SN))
print(len(HVRBPs_MB_SN))

In [None]:
adata_tmp=adata_short9k_MB_SN_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10,n_top_genes=500)

SR_MB_SN_RBP_RNA500=list(set(list(HVRBPs_MB_SN)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MB_SN_RBP_RNA500))

all_genes=list(set(all_genes+SR_MB_SN_RBP_RNA500))

tmp_df=pd.DataFrame(data=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNA500].X.todense().T,
                    index=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNA500].var_names,
                    columns=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNA500].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSNNormalizedData_RBP_RNA500.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNA500].X.todense().T, 
                    index=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNA500].var_names, 
                    columns=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNA500].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSNRawData_RBP_RNA500.csv')

In [None]:
adata_tmp=adata_short9k_MB_SN_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10,n_top_genes=1000)

SR_MB_SN_RBP_RNA1000=list(set(list(HVRBPs_MB_SN)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MB_SN_RBP_RNA1000))

all_genes=list(set(all_genes+SR_MB_SN_RBP_RNA1000))

tmp_df=pd.DataFrame(data=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNA1000].X.todense().T,
                    index=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNA1000].var_names,
                    columns=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNA1000].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSNNormalizedData_RBP_RNA1000.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNA1000].X.todense().T, 
                    index=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNA1000].var_names, 
                    columns=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNA1000].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSNRawData_RBP_RNA1000.csv')

In [None]:
adata_tmp=adata_short9k_MB_SN_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10)

SR_MB_SN_RBP_RNAHVG=list(set(list(HVRBPs_MB_SN)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MB_SN_RBP_RNAHVG))

all_genes=list(set(all_genes+SR_MB_SN_RBP_RNAHVG))

tmp_df=pd.DataFrame(data=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNAHVG].X.todense().T,
                    index=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNAHVG].var_names,
                    columns=adata_short9k_MB_SN_new[:,SR_MB_SN_RBP_RNAHVG].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSNNormalizedData_RBP_RNAHVG.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNAHVG].X.todense().T, 
                    index=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNAHVG].var_names, 
                    columns=adata_short9k_UMI_MB_SN[:, SR_MB_SN_RBP_RNAHVG].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMBSNRawData_RBP_RNAHVG.csv')

### Myotubes - single nuclei

In [None]:
adata_short9k_MT_SN_new=ad.AnnData(X=adata_short9k_MT_SN.X)
adata_short9k_MT_SN_new.var_names=adata_short9k_MT_SN.var_names
adata_short9k_MT_SN_new.obs_names=adata_short9k_MT_SN.obs_names
sc.pp.filter_genes(adata_short9k_MT_SN_new,min_cells=10)
sc.pp.highly_variable_genes(adata_short9k_MT_SN_new,max_mean=10)

In [None]:
adata_short9k_MT_SN_new.var.highly_variable.value_counts()

In [None]:
HVRBPs_MT_SN=list(set(mouse_RBPs.Gene_Name).intersection(set(adata_short9k_MT_SN_new[:,adata_short9k_MT_SN_new.var.highly_variable].var_names)))
print(len(HVRBPs_MT_SN))
HVRBPs_MT_SN=list(set(['Adar']+HVRBPs_MT_SN))
print(len(HVRBPs_MT_SN))

In [None]:
adata_tmp=adata_short9k_MT_SN_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10,n_top_genes=500)

SR_MT_SN_RBP_RNA500=list(set(list(HVRBPs_MT_SN)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MT_SN_RBP_RNA500))

all_genes=list(set(all_genes+SR_MT_SN_RBP_RNA500))

tmp_df=pd.DataFrame(data=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNA500].X.todense().T,
                    index=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNA500].var_names,
                    columns=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNA500].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMTSNNormalizedData_RBP_RNA500.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNA500].X.todense().T, 
                    index=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNA500].var_names, 
                    columns=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNA500].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMTSNRawData_RBP_RNA500.csv')

In [None]:
adata_tmp=adata_short9k_MT_SN_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10,n_top_genes=1000)

SR_MT_SN_RBP_RNA1000=list(set(list(HVRBPs_MT_SN)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MT_SN_RBP_RNA1000))

all_genes=list(set(all_genes+SR_MT_SN_RBP_RNA1000))

tmp_df=pd.DataFrame(data=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNA1000].X.todense().T,
                    index=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNA1000].var_names,
                    columns=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNA1000].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMTSNNormalizedData_RBP_RNA1000.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNA1000].X.todense().T, 
                    index=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNA1000].var_names, 
                    columns=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNA1000].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMTSNRawData_RBP_RNA1000.csv')

In [None]:
adata_tmp=adata_short9k_MT_SN_new.copy()
sc.pp.highly_variable_genes(adata_tmp,max_mean=10)

SR_MT_SN_RBP_RNAHVG=list(set(list(HVRBPs_MT_SN)+list(adata_tmp[:,adata_tmp.var.highly_variable].var_names)))
print(len(SR_MT_SN_RBP_RNAHVG))

all_genes=list(set(all_genes+SR_MT_SN_RBP_RNAHVG))

tmp_df=pd.DataFrame(data=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNAHVG].X.todense().T,
                    index=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNAHVG].var_names,
                    columns=adata_short9k_MT_SN_new[:,SR_MT_SN_RBP_RNAHVG].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMTSNNormalizedData_RBP_RNAHVG.csv')

# Save the raw data in a csv file
tmp_df=pd.DataFrame(data=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNAHVG].X.todense().T, 
                    index=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNAHVG].var_names, 
                    columns=adata_short9k_UMI_MT_SN[:, SR_MT_SN_RBP_RNAHVG].obs_names)
tmp_df.to_csv(out_folder_SR9k+'C2C12_SR9kMTSNRawData_RBP_RNAHVG.csv')

In [None]:
# 9k sample: save gene names (create the output folder if it does not exist)
os.makedirs("./GENE_SELECTION_ADAR",exist_ok=True)
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMBSC_RBP_RNA500.txt",np.c_[SR_MB_SC_RBP_RNA500],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMBSC_RBP_RNA1000.txt",np.c_[SR_MB_SC_RBP_RNA1000],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMBSC_RBP_RNAHVG.txt",np.c_[SR_MB_SC_RBP_RNAHVG],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMBSN_RBP_RNA500.txt",np.c_[SR_MB_SN_RBP_RNA500],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMBSN_RBP_RNA1000.txt",np.c_[SR_MB_SN_RBP_RNA1000],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMBSN_RBP_RNAHVG.txt",np.c_[SR_MB_SN_RBP_RNAHVG],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMTSN_RBP_RNA500.txt",np.c_[SR_MT_SN_RBP_RNA500],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMTSN_RBP_RNA1000.txt",np.c_[SR_MT_SN_RBP_RNA1000],fmt="%s")
np.savetxt("./GENE_SELECTION_ADAR/gnames_C2C12_SR9kMTSN_RBP_RNAHVG.txt",np.c_[SR_MT_SN_RBP_RNAHVG],fmt="%s")
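
The nine calls above follow one naming pattern, so they can also be written as a loop over a dictionary of gene sets. The sketch below uses only the standard library; `save_gene_lists` and the toy gene sets are hypothetical, and the files go to a temporary folder rather than `./GENE_SELECTION_ADAR/`.

```python
import os
import tempfile

def save_gene_lists(gene_sets, folder, prefix="gnames_C2C12_SR9k"):
    """Write one gene name per line to `<folder>/<prefix><label>.txt` for each label."""
    os.makedirs(folder, exist_ok=True)
    paths = {}
    for label, genes in gene_sets.items():
        path = os.path.join(folder, f"{prefix}{label}.txt")
        with open(path, "w") as handle:
            handle.write("\n".join(genes) + "\n")
        paths[label] = path
    return paths

# Toy gene sets keyed by the label used in the file names (contents are invented).
toy_sets = {"MBSC_RBP_RNA500": ["Adar", "Myog"], "MTSN_RBP_RNA500": ["Adar", "Ptbp1"]}
out_dir = tempfile.mkdtemp()
written = save_gene_lists(toy_sets, out_dir)

with open(written["MBSC_RBP_RNA500"]) as handle:
    first_list = handle.read().split()
print(first_list)
```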