# Analysis of the scRNA-seq data from astrocytes in the ARC

## Table of contents:

* <a href=#Load>Load Packages and Set Global Variables</a>
    * <a href=#Imports>Imports and Settings</a>
    * <a href=#Global>Global Variables</a> 
* <a href=#Dataloading>Loading Data, Quality Control and Preprocessing</a>
    * <a href=#Counts>Gene numbers and counts with and without mitochondrial RNA</a>
* <a href=#Allcells>All cells - normalization, projection and clustering</a>
* <a href=#Define>Define Cell Types</a>
* <a href=#astrocytes>Astrocytes Only</a>
    * <a href=#Embedding>Embeddings and Clustering</a>
    * <a href=#adipmarkers>Astrocyte Marker Analysis</a>
    * <a href=#topde>Top ranking DE Genes</a>
    * <a href=#count_dist>Count distribution for Gfap, Aldh1l1 and Slc1a3</a>
* <a href=#traject>Gfap and Aldh1l1 only</a>

# Load Packages and Set Global Variables

<a id="imports"></a>

## Imports and Settings

In [3]:
import numpy as np
import scanpy as sc
import scipy as sci
import scipy.sparse
import pandas as pd
import seaborn as sb
import scvelo as scv
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import colors
from gprofiler import GProfiler
import custom_functions as cf
from matplotlib_venn import venn3_unweighted
from scipy import stats
import pingouin as pg
import matplotlib_venn
import statistics
import gseapy
import sys
import re
import os

import batchglm
import diffxpy.api as de

import warnings
warnings.filterwarnings('ignore')

%load_ext autoreload
%autoreload 2
sc.settings.verbosity = 3 # amount of output

base_dir = '/Users/viktorian.miok/Documents/consultation/Luiza/single_cell/data/scanpy_AnnData/'
dir_out = '/Users/viktorian.miok/Documents/consultation/Luiza/single_cell/results/'
dir_scv = '/Users/viktorian.miok/Documents/consultation/Luiza/single_cell/data/velocyto/'
dir_tables = dir_out+'tables/'
sc_settings_figdir = dir_out+'figures/'
sc_settings_writedir = dir_out+'anndata/'
sc.logging.print_versions()
os.chdir(dir_out)
sc.settings.set_figure_params(dpi=80, scanpy=True)
print(sys.version)



-----
anndata     0.7.5
scanpy      1.7.1
sinfo       0.3.1
-----
PIL                 8.1.2
PyObjCTools         NA
anndata             0.7.5
appdirs             1.4.4
appnope             0.1.2
autoreload          NA
backcall            0.2.0
batchglm            v0.7.4
bioservices         1.7.11
bs4                 4.9.3
certifi             2020.12.05
cffi                1.14.5
chardet             4.0.0
cloudpickle         1.6.0
colorama            0.4.4
colorlog            NA
custom_functions    NA
cycler              0.10.0
cython_runtime      NA
dask                2021.03.0
dateutil            2.8.1
decorator           4.4.2
diffxpy             v0.7.4
docutils            0.16
easydev             0.11.0
get_version         2.1
gprofiler           1.0.0
gseapy              0.10.4
h5py                3.2.1
idna                2.10
igraph              0.9.0
ipykernel           5.4.3
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.17.2
joblib              1.0.1


In [4]:
#Define a nice colour map for gene expression
colors2 = plt.cm.Reds(np.linspace(0, 1, 128))
colors3 = plt.cm.Greys_r(np.linspace(0.7,0.8,20))
colorsComb = np.vstack([colors3, colors2])
mymap = colors.LinearSegmentedColormap.from_list('my_colormap', colorsComb)
sc.set_figure_params(scanpy=True, fontsize=17)

## Global Variables

All embeddings and clusterings can be saved and loaded by this script. Be careful about overwriting cluster caches once cell type annotation has started, as cluster labels may be shuffled.

Set whether anndata objects are recomputed or loaded from cache.

In [5]:
bool_recomp = False

Set whether clustering is recomputed or loaded from a saved .obs file. Loading makes sense if recomputing would change the clustering because of a change in scanpy or one of its dependencies, so that the number of clusters or the cluster labels differ.

In [6]:
bool_recluster = False

Set whether cluster cache is overwritten. Note that the cache exists for reproducibility of clustering, see above.

In [7]:
bool_write_cluster_cache = False

Set whether to produce plots. Set to False for test runs.

In [8]:
bool_plot = False

Set whether observations should be calculated. If False, a cached file containing the necessary information must be read instead; the notebook then shows the distributions of counts and genes, as well as mt_frac, after filtering.
Set to True to see the data before filtering and follow the decisions for cutoffs.

In [9]:
bool_create_observations = True
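
The recompute-or-load pattern controlled by these flags can be sketched as follows. This is a minimal illustration using only the standard library; `load_or_compute`, the JSON cache file and the toy `expensive_step` are hypothetical names that do not appear in the notebook, which caches AnnData objects with `sc.write`/`sc.read` instead.

```python
import json
import os
import tempfile


def load_or_compute(path, compute, recompute=False):
    """Return the cached result at `path`, or compute and cache it.

    `recompute=True` plays the role of `bool_recomp`: it forces the
    computation and overwrites the cache.
    """
    if not recompute and os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    result = compute()
    with open(path, "w") as handle:
        json.dump(result, handle)
    return result


calls = []


def expensive_step():
    # Hypothetical stand-in for normalization, PCA, clustering, ...
    calls.append(1)
    return {"n_clusters": 14}


cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "result.json")

first = load_or_compute(cache_file, expensive_step)                  # computes and writes the cache
second = load_or_compute(cache_file, expensive_step)                 # reads the cache
third = load_or_compute(cache_file, expensive_step, recompute=True)  # recomputes and overwrites
```

Keeping the flag explicit makes it harder to overwrite a cache by accident, which matters here because cluster labels can change between runs.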

<a id="Dataloading"></a>

# Loading Data, Quality Control and Preprocessing

Read the data in:

In [10]:
if bool_recomp:
    adata_raw1 = sc.read(base_dir + 'MUC26030/filtered_feature_bc_matrix.h5ad')
    adata_raw2 = sc.read(base_dir + 'MUC26031/filtered_feature_bc_matrix.h5ad')
    adata_raw3 = sc.read(base_dir + 'MUC26032/filtered_feature_bc_matrix.h5ad')
    adata_raw = adata_raw1.concatenate([adata_raw2, adata_raw3],
                                       batch_key='diet', 
                                       batch_categories=['chow', 'hfd_5', 'hfd_15']
    )
    sc.write(sc_settings_writedir+'adata_raw.h5ad', adata_raw)
else:
    adata_raw = sc.read(sc_settings_writedir+'adata_raw.h5ad')

<a id="QC"></a>

Summary of the steps performed here: cells are filtered on total counts (200 to 100,000), mitochondrial fraction (below 0.5) and number of detected genes (at least 350). Counts per cell are normalized for library depth, log-transformed and batch-corrected across diets with ComBat. The gene (feature) space is reduced with PCA to 50 PCs. A nearest-neighbour graph and a UMAP are computed from the PC space. Cells are clustered with Leiden clustering on the nearest-neighbour graph. Graph abstraction (PAGA) is computed from the Leiden clustering.
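
The library-depth normalization listed in the summary, followed by the log transform applied later in the notebook, can be illustrated on a toy count matrix. This is a NumPy sketch, not the scanpy implementation. It assumes each cell is rescaled to the median library size, which is the documented default of `sc.pp.normalize_per_cell`; the matrix values are invented.

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 4 genes (columns); values are invented.
counts = np.array([
    [10, 0, 5, 5],
    [2, 2, 0, 0],
    [40, 20, 10, 10],
], dtype=float)

library_size = counts.sum(axis=1)   # total counts per cell: 20, 4, 80
target = np.median(library_size)    # median library size: 20

# Rescale every cell to the same total, then apply log(1 + x).
normalized = counts / library_size[:, None] * target
log_normalized = np.log1p(normalized)
```

After rescaling, every row of `normalized` sums to the same total, so differences between cells reflect composition rather than sequencing depth.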

In [11]:
sc.pp.filter_cells(adata_raw, min_counts = 1)

The data contains 21143 observations with 31253 different genes. Due to dropouts, some observations might not show any counts or genes. To calculate the fraction of mitochondrial RNA in the next steps, every observation without counts must be filtered out to prevent NaN values from emerging.
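
Why empty observations lead to NaN can be seen in a small NumPy sketch. The numbers are invented; only the division mirrors the `mt_frac` computation used in the notebook.

```python
import numpy as np

n_counts = np.array([120.0, 0.0, 300.0])  # total counts per cell; the second cell is empty
mt_counts = np.array([6.0, 0.0, 30.0])    # mitochondrial counts per cell

# 0 / 0 is undefined, so the empty cell gets NaN (NumPy would also emit a warning).
with np.errstate(invalid="ignore"):
    mt_frac_all = mt_counts / n_counts

# Dropping empty cells first, as filter_cells(min_counts=1) does, avoids this.
keep = n_counts >= 1
mt_frac_kept = mt_counts[keep] / n_counts[keep]
```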

In [12]:
print('Number of cells: {:d}'.format(adata_raw.n_obs))
print('Number of genes: {:d}'.format(adata_raw.shape[1]))
print('Number of cells per diet:')
adata_raw.obs['diet'].value_counts().sort_index()

Number of cells: 21143
Number of genes: 31253
Number of cells per diet:


chow      7116
hfd_5     6204
hfd_15    7823
Name: diet, dtype: int64

### Gene numbers and counts with and without mitochondrial RNA

Create necessary obs:

In [13]:
adata_qc = adata_raw.copy()
adata_qc.obs['n_genes'] = (adata_qc.X > 0).sum(1)
mt_gene_mask = [gene.startswith('mt-') for gene in adata_qc.var_names]
temp_mt_sum = adata_qc[:,mt_gene_mask].X.sum(1)
temp_mt_sum = np.squeeze(np.asarray(temp_mt_sum))
adata_qc.obs['n_counts'] = adata_qc.X.sum(1)
temp_n_counts = adata_qc.obs['n_counts']
adata_qc.obs['mt_frac'] = temp_mt_sum/adata_qc.obs['n_counts']
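
The same three metrics can be reproduced on a toy sparse matrix to make the shapes explicit. This is a sketch with invented gene names and counts; it assumes `X` is a SciPy CSR matrix, as is typical for AnnData objects read from 10x output.

```python
import numpy as np
import scipy.sparse as sp

# Toy data: 3 cells x 4 genes; gene names and counts are invented.
var_names = ["Gfap", "mt-Co1", "Aqp4", "mt-Nd1"]
X = sp.csr_matrix(np.array([
    [3, 1, 0, 1],
    [0, 5, 2, 5],
    [4, 0, 4, 0],
]))

# Row-wise reductions on a sparse matrix return an (n_cells, 1) np.matrix,
# so they are flattened to 1-D arrays before use.
n_genes = np.asarray((X > 0).sum(axis=1)).ravel()
n_counts = np.asarray(X.sum(axis=1)).ravel()

# Mouse mitochondrial genes carry the 'mt-' prefix.
mt_mask = np.array([name.startswith("mt-") for name in var_names])
mt_counts = np.asarray(X[:, mt_mask].sum(axis=1)).ravel()
mt_frac = mt_counts / n_counts
```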

Plot n_counts and mt_frac:

In [14]:
if bool_plot == True:
    t1 = sc.pl.violin(adata_qc, ['n_counts', 'n_genes', 'mt_frac'], size=1, log=False, jitter=3, multi_panel=True)

In [15]:
if bool_plot==True:
    sc.pl.highest_expr_genes(adata_qc, n_top=20) 

Overall, the data contains many observations with high fractions of mitochondrial RNA. Additionally, most observations show counts below 100, suggesting poor data quality. To further investigate the distribution of counts and genes per observation, scatterplots are created:

### Number of Genes versus Number of Counts

In [16]:
if bool_plot == True:
    p1 = sc.pl.scatter(adata_qc, 'n_counts', 'n_genes', color='mt_frac', size=5)
    p2 = sc.pl.scatter(adata_qc[adata_qc.obs['n_counts']<5000], 'n_counts', 'n_genes', color='mt_frac', size=5)

### Distribution of Counts and Genes

For the remaining observations, the fraction of mitochondrial RNA is generally very low and at most 20%.

In [17]:
if bool_plot == True:
    p6 = sb.distplot(adata_qc.obs['n_counts'], kde=False)
    plt.show()
    p7 = sb.distplot(adata_qc.obs['n_counts'][adata_qc.obs['n_counts']<1000], kde=False)
    plt.show()

In [18]:
if bool_plot == True:
    p9 = sb.distplot(adata_qc.obs['n_genes'], kde=False, bins=60)
    plt.show()
    p10 = sb.distplot(adata_qc.obs['n_genes'][adata_qc.obs['n_genes']<500], kde=False, bins=60)
    plt.show()

### Filtering

In [19]:
# Filter cells according to identified QC thresholds:
print('Total number of cells: {:d}'.format(adata_qc.n_obs))

sc.pp.filter_cells(adata_qc, min_counts=200)
print('Number of cells after min count filter: {:d}'.format(adata_qc.n_obs))

sc.pp.filter_cells(adata_qc, max_counts=100000)
print('Number of cells after max count filter: {:d}'.format(adata_qc.n_obs))

adata_qc = adata_qc[adata_qc.obs['mt_frac'] < 0.5]
print('Number of cells after MT filter: {:d}'.format(adata_qc.n_obs))

sc.pp.filter_cells(adata_qc, min_genes=350)
print('Number of cells after gene filter: {:d}'.format(adata_qc.n_obs))

Total number of cells: 21143


filtered out 35 cells that have more than 100000 counts


Number of cells after min count filter: 21143
Number of cells after max count filter: 21108
Number of cells after MT filter: 20491


filtered out 496 cells that have less than 350 genes expressed
Trying to set attribute `.obs` of view, copying.


Number of cells after gene filter: 19995
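
The effect of applying the thresholds one after another can be traced on a toy QC table. This pandas sketch uses invented values and cell names. The thresholds and their order follow the filtering cell above, and the inclusive comparisons are an assumption about how `sc.pp.filter_cells` treats its `min_*`/`max_*` arguments.

```python
import pandas as pd

# Toy per-cell QC table; values are invented.
obs = pd.DataFrame(
    {
        "n_counts": [150, 2500, 120000, 4000, 900],
        "n_genes": [100, 1200, 8000, 300, 500],
        "mt_frac": [0.10, 0.05, 0.02, 0.08, 0.60],
    },
    index=["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"],
)

# Same thresholds and order as in the filtering cell.
steps = [
    ("min count", obs["n_counts"] >= 200),
    ("max count", obs["n_counts"] <= 100000),
    ("MT", obs["mt_frac"] < 0.5),
    ("gene", obs["n_genes"] >= 350),
]

keep = pd.Series(True, index=obs.index)
remaining = {}
for name, mask in steps:
    keep &= mask  # a cell must pass every filter applied so far
    remaining[name] = int(keep.sum())
    print("Number of cells after {} filter: {:d}".format(name, remaining[name]))

obs_filtered = obs[keep]
```

Recording the count after each step shows which threshold removes the most cells, which helps when revisiting the cutoffs.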


In [20]:
#Filter genes:
print('Total number of genes: {:d}'.format(adata_qc.n_vars))

# Min 20 cells - filters out 0 count genes
sc.pp.filter_genes(adata_qc, min_cells=20)
print('Number of genes after cell filter: {:d}'.format(adata_qc.n_vars))

Total number of genes: 31253


filtered out 13164 genes that are detected in less than 20 cells


Number of genes after cell filter: 18089
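
The gene filter keeps genes detected in a minimum number of cells. A NumPy sketch of that criterion is shown below on invented data, with a smaller `min_cells` than the notebook's 20 so that the toy matrix stays readable.

```python
import numpy as np

# Toy matrix: 5 cells x 3 genes; values are invented.
X = np.array([
    [1, 0, 0],
    [2, 0, 0],
    [0, 0, 3],
    [4, 0, 0],
    [1, 0, 0],
])
min_cells = 2  # the notebook uses 20; a small value suits the toy data

cells_per_gene = (X > 0).sum(axis=0)  # number of cells in which each gene is detected
gene_keep = cells_per_gene >= min_cells
X_filtered = X[:, gene_keep]
```

A gene with zero counts everywhere is detected in zero cells, so any positive `min_cells` removes it as well.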


In [21]:
if bool_plot == True:
    p1 = sc.pl.scatter(adata_qc, 'n_counts','n_genes',color='mt_frac', size=5)
    p3 = sc.pl.scatter(adata_qc[adata_qc.obs['n_counts']<5000],'n_counts','n_genes',color='mt_frac', size=5)

In [22]:
print('Number of cells: {:d}'.format(adata_qc.n_obs))
print('Number of genes: {:d}'.format(adata_qc.shape[1]))
print('Number of cells per diet:')
adata_qc.obs['diet'].value_counts().sort_index()

Number of cells: 19995
Number of genes: 18089
Number of cells per diet:


chow      6741
hfd_5     5886
hfd_15    7368
Name: diet, dtype: int64

## All cells - normalization, projection and clustering

In [23]:
if bool_recomp == True:
        
    adata_proc = adata_qc.copy()
    adata_proc.raw = adata_qc
    sc.pp.normalize_per_cell(adata_proc)
    sc.pp.log1p(adata_proc)
    sc.pp.combat(adata_proc, key='diet')
    sc.pp.highly_variable_genes(adata_proc, flavor='cell_ranger',n_top_genes=4000)
    sc.pl.highly_variable_genes(adata_proc)
    #adata_proc.X = adata_proc.X.toarray()
    
    sc.pp.pca(adata_proc, n_comps=50, random_state=0, use_highly_variable=True, svd_solver='arpack')
    sc.pp.neighbors(adata_proc, n_neighbors=100, knn=True, method='umap', n_pcs=50, random_state=0)
    sc.tl.umap(adata_proc)
    if bool_recluster == True:
        #sc.tl.louvain(adata_proc, resolution=0.5, flavor='vtraag', random_state=0)
        sc.tl.leiden(adata_proc, resolution=0.3)
        pd.DataFrame(adata_proc.obs).to_csv(path_or_buf =sc_settings_writedir+"obs_adata_proc.csv")
    else:
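        # Index the cached table by cell barcode so the labels align with adata_proc.obs
        obs = pd.read_csv(sc_settings_writedir+'obs_adata_proc.csv', index_col=0)
        adata_proc.obs['leiden'] = obs.loc[adata_proc.obs_names, 'leiden'].astype(str).astype('category')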
    sc.write(sc_settings_writedir+'adata_proc.h5ad',adata_proc)
else:
    adata_proc = sc.read(sc_settings_writedir+'adata_proc.h5ad') 
sc.tl.paga(adata_proc)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:01)


Produce some summarizing plots that show the global characteristics of the data.

In [24]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc, ['leiden'], save="_all_cells_leiden", use_raw=False)

In [25]:
#####################################################################################################################
if bool_plot == True:
    plt.rcParams['figure.figsize'] = [5,5]
    cf.plot_umap_marker(adata_proc, ['leiden'], save="_all_cells_leiden_ondata", use_raw=False, legend_loc='on data',
                        frameon=False, title='', size=20)

Number of cells in each cluster:

In [26]:
adata_proc.obs["leiden"].value_counts()

0     3921
1     2547
2     2386
3     2247
4     1466
5     1403
6     1233
7     1080
8     1075
9     1056
10     877
11     515
12     109
13      80
Name: leiden, dtype: int64

# Define Cell Types

<a id="DE"></a>

## Summary heatmap, dotplot and stacked_violin for cluster assignments

In [27]:
#####################################################################################################################
if bool_plot==True:
    marker_genes_dict = {'Astrocytes': ['Slc1a2', 'Slc1a3', 'Aqp4', 'Gfap', 'Aldh1l1', 'Gja1', 'Gjb6', 'Atp1b2'],
                         'Endothelial cells': ['Cldn5', 'Pecam1', 'Slco1c1'],
                         'Ependymal cells': ['Ccdc153', 'Rarres2', 'Hdc', 'Tm4sf1'],
                         'Microglia': ['Itgam', 'Tmem119', 'Cx3cr1','Csf1r', 'Aif1', 'P2ry12'],
                         'Mural cells': ['Mustn1', 'Pdgfrb', 'Des'],
                         'Neurons': ['Rbfox3', 'Syp', 'Tubb3', 'Snap25', 'Syt1'],
                         'Oligodendrocytes': ['Olig1','Mog', 'Mag'],
                         'Tanycytes': ['Rax', 'Lhx2', 'Col23a1', 'Slc16a2', 'Crym', 'Adm'],
                         'VLMCs': ['Lum', 'Col1a1', 'Col3a1']}
    
    for i in list(marker_genes_dict.keys()):
        marker_genes_dict[i].sort()
        
    sc.pl.dotplot(
        adata=adata_proc,
        var_names=marker_genes_dict, 
        groupby='leiden',
        use_raw=False, 
        log=False, 
        dendrogram=True, 
        var_group_rotation=90, 
        show=True, 
        #size_title=5,
        save="all_cells_celltypes_markers.pdf")

## UMAP with assigned cell types

In [28]:
#####################################################################################################################
if bool_plot == True:
    new_cluster_names = {
    '0': "Astrocytes",
    '1': "Tanycytes",
    '2': "Ependymal cells",
    '3': "Neurons",
    '4': "Endothelial cells",
    '5': "Oligodendrocytes",
    '6': "Microglia",
    '7': "Oligodendrocytes",
    '8': "Neurons",
    '9': "Microglia",
    '10': "Oligodendrocytes",
    '11': "Mural cells",
    '12': "VLMCs",
    '13': "Oligodendrocytes"
    }
    adata_proc.obs['celltypes'] = [new_cluster_names[x] for x in  adata_proc.obs['leiden']]
    
    cf.plot_umap_marker(adata_proc, ['celltypes'], save="_all_cells_celltypes.png", use_raw=False, 
                    frameon=False, size=20, title='', 
                    palette=['#1f77b4','#aa40fc','#279e68','#aec7e8','#98df8a','#d62728','#ffbb78','#ff7f0e','#ff9896'])
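
The cluster-to-cell-type assignment can also be written with `Series.map`, which makes unmapped clusters easy to detect. This is a pandas sketch on a handful of invented cells; `cluster_to_celltype` is a hypothetical name holding a subset of the mapping above.

```python
import pandas as pd

# Toy cluster labels, stored as strings as scanpy does for Leiden clusters.
leiden = pd.Series(["0", "6", "9", "12", "0"])

# Subset of the mapping used above; several clusters may share one cell type.
cluster_to_celltype = {
    "0": "Astrocytes",
    "6": "Microglia",
    "9": "Microglia",
    "12": "VLMCs",
}

celltypes = leiden.map(cluster_to_celltype).astype("category")

# Clusters missing from the dictionary would show up as NaN here.
unmapped = int(celltypes.isna().sum())
```

Unlike a list comprehension over the dictionary, `map` does not raise on an unknown cluster, so checking `unmapped` afterwards is the safeguard.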

In [29]:
#####################################################################################################################
if bool_plot == True:
    adata_proc_hv = adata_proc[:, adata_proc.var.highly_variable]
    adata_proc_hv.raw = adata_qc[:, adata_proc.var.highly_variable]
    adata_proc_hv.obs['diet_leiden'] = adata_proc_hv.obs['diet'].str.cat(adata_proc_hv.obs['leiden'], sep='_')
    adata_proc_hv.obs['diet_celltypes'] = adata_proc_hv.obs['diet'].str.cat(adata_proc_hv.obs['celltypes'], sep='_')
    
    # Rank genes for each HFD time point against chow, separately per cell type.
    # Results are stored under keys such as 'ct5_ast' or 'ct15_vlmc'.
    ct_short = {'Astrocytes': 'ast', 'Endothelial cells': 'end', 'Ependymal cells': 'epe', 'Microglia': 'mic',
                'Mural cells': 'mur', 'Neurons': 'neu', 'Oligodendrocytes': 'oli', 'Tanycytes': 'tan', 'VLMCs': 'vlmc'}
    for day in ['5', '15']:
        for celltype, short in ct_short.items():
            sc.tl.rank_genes_groups(adata_proc_hv, 'diet_celltypes', groups=['hfd_' + day + '_' + celltype],
                                    reference='chow_' + celltype, key_added='ct' + day + '_' + short,
                                    method='t-test')  # alternative: 'wilcoxon'

    ct0_5=[]
    ct0_5up=[]
    ct0_5down=[]
    df0_5=[]
    for i in ['ct5_ast','ct5_end','ct5_epe','ct5_mic','ct5_mur','ct5_neu','ct5_oli','ct5_tan','ct5_vlmc']:
        result = adata_proc_hv.uns[i]
        groups = result['names'].dtype.names
        df5 = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                            for group in groups for key in ['pvals_adj', 'logfoldchanges']}) 
        #print(sum(df5.iloc[:, 0]<0.05))
        print(sum((df5.iloc[:, 0]<0.05) & (df5.iloc[:, 1]>0)))
        print(-sum((df5.iloc[:, 0]<0.05) & (df5.iloc[:, 1]<=0)))
        ct0_5.append(sum(df5.iloc[:, 0]<0.05))  
        ct0_5up.append(sum((df5.iloc[:, 0]<0.05) & (df5.iloc[:, 1]>0)))  
        ct0_5down.append(sum((df5.iloc[:, 0]<0.05) & (df5.iloc[:, 1]<=0)))  
        df0_5.append(df5) 
        
    ct0_15=[]
    ct0_15up=[]
    ct0_15down=[]
    df0_15=[]
    for i in ['ct15_ast','ct15_end','ct15_epe','ct15_mic','ct15_mur','ct15_neu','ct15_oli','ct15_tan','ct15_vlmc']:
        result = adata_proc_hv.uns[i]
        groups = result['names'].dtype.names
        df15 = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                             for group in groups for key in ['pvals_adj', 'logfoldchanges']}) 
        #print(sum(df15.iloc[:, 0]<0.05))
        #print(sum((df15.iloc[:, 0]<0.05) & (df15.iloc[:, 1]>0)))
        #print(-sum((df15.iloc[:, 0]<0.05) & (df15.iloc[:, 1]<=0)))
        ct0_15.append(sum(df15.iloc[:, 0]<0.05))
        df0_15.append(df15)
    
    d = {'0_5':ct0_5,'0_15':ct0_15}
    dfin = pd.DataFrame(d)
    dfin['0_0'] = [0,0,0,0,0,0,0,0,0]
    dfin=dfin.reindex(['0_0','0_5','0_15'], axis=1)
    dfin.index=['Astrocytes','Endothelial cells','Ependymal cells','Microglia','Mural cells','Neurons','Oligodendrocytes','Tanycytes','VLMCs']
    
    plt.rcParams["figure.figsize"] = (7,7)

    # One line per cell type (rows of dfin), using the cell type colours
    x_labels = ['SD', '5d HFHS diet', '15d HFHS diet']
    line_colors = ['#1f77b4','#aa40fc','#279e68','#aec7e8','#98df8a','#d62728','#ffbb78','#ff7f0e','#ff9896']
    for row, line_color in enumerate(line_colors):
        line_chart1 = plt.plot(x_labels, dfin.iloc[row], linewidth=3, c=line_color)

    #plt.xlabel("X axis label")
    plt.ylabel("Number of DEGs")
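
The counting rule used for the DEG numbers can be shown on a toy result table. This pandas sketch uses invented gene names and values; as in the loops above, genes with a log fold change of exactly zero are counted as down-regulated.

```python
import pandas as pd

# Toy differential expression table; gene names and values are invented.
de = pd.DataFrame({
    "names": ["GeneA", "GeneB", "GeneC", "GeneD"],
    "pvals_adj": [0.001, 0.20, 0.03, 0.04],
    "logfoldchanges": [1.5, 2.0, -0.7, 0.0],
})

alpha = 0.05
significant = de["pvals_adj"] < alpha
n_deg = int(significant.sum())
n_up = int((significant & (de["logfoldchanges"] > 0)).sum())
n_down = int((significant & (de["logfoldchanges"] <= 0)).sum())
```

Because the up and down masks partition the significant genes, `n_up + n_down` always equals `n_deg`.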

In [30]:
if bool_plot == True:
    result = adata_proc_hv.uns['ct5_ast']
    groups = result['names'].dtype.names
    dfa5 = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                        for group in groups for key in ['names', 'pvals_adj', 'logfoldchanges']}) 

    result = adata_proc_hv.uns['ct5_neu']
    groups = result['names'].dtype.names
    dfn5 = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                        for group in groups for key in ['names', 'pvals_adj', 'logfoldchanges']}) 

    result = adata_proc_hv.uns['ct15_ast']
    groups = result['names'].dtype.names
    dfa15 = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                        for group in groups for key in ['names', 'pvals_adj', 'logfoldchanges']}) 

    result = adata_proc_hv.uns['ct15_neu']
    groups = result['names'].dtype.names
    dfn15 = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                        for group in groups for key in ['names', 'pvals_adj', 'logfoldchanges']}) 
    
    #dfa5.to_csv("DGE0_5astro.csv")
    #dfn5.to_csv("DGE0_5neuro.csv")
    #dfa15.to_csv("DGE0_15astro.csv")
    #dfn15.to_csv("DGE0_15neuro.csv")
    
    #df0_5.to_csv("DGE0_5celltypes.csv")

### Microglia inflammation markers

In [31]:
#####################################################################################################################
if bool_plot == True:
    adata_inf=adata_proc[(adata_proc.obs['celltypes']=='Astrocytes') | (adata_proc.obs['celltypes']=='Microglia'),:]
    adata_inf.obs['diet_celltype'] = adata_inf.obs['celltypes'].astype(str) + '_' + adata_inf.obs['diet'].astype(str)

    marker_genes_mglia = {'inflam_mark': ['Il1b','Il2','Il6','Tnf','Il1r1','Il6ra','Tlr4','Lcn2','Mmp2','Ccl5','Ccl12',
                                         'Vegfa','Tpo','Plaur','Axin2'], # ,'Il5','Mmp13',
                         'mglia_activ': ['Itgam','Cd14','Fcgr3','Fcgr2b','Cd40','Ptprc','Cd68','Cd80','Cd86','Cx3cr1',
                                             'Adgre1','Fcer1g']} # ,'H2'
    marker_pan_a1_a2 = {'PAN_reactive': ['Lcn2','Steap4','S1pr3','Timp1','Hspb1','Cxcl10','Cd44','Osmr','Cp','Serpina3n',
                                         'Aspg','Vim','Gfap'],
                        'A1_specific': ['Serping1','H2-D1','Ggta1','Iigp1','Gbp2','Fbln5','Fkbp5','Psmb8',
                                        'Srgn','Amigo2'], #'C3H2-T23',
                        'A2_specific': ['Clcf1','Tgm1','Ptx3','S100a10','Sphk1','Cd109','Ptgs2','Emp1','Slc10a6','Tm4sf1',
                                        'B3gnt5','Cd14','Stat3']}

In [32]:
#####################################################################################################################
if bool_plot == True:
    adata_inf.layers['scaled'] = sc.pp.scale(adata_inf, copy=True).X

    sc.pl.matrixplot(adata_inf, marker_genes_mglia, 'diet_celltype', dendrogram=False, swap_axes=True,
                     colorbar_title='mean z-score', layer='scaled', vmin=-2, vmax=2, cmap='RdBu_r')
    sc.pl.stacked_violin(adata_inf, marker_genes_mglia, 'diet_celltype', dendrogram=False, swap_axes=True,
                     colorbar_title='mean z-score', layer='scaled', vmin=-2, vmax=2, cmap='RdBu_r')
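
What the "mean z-score" in these plots represents can be sketched with NumPy. The data are invented, and the sketch assumes a plain per-gene z-score with the sample standard deviation; details of `sc.pp.scale`, such as its variance estimator or optional clipping, may differ.

```python
import numpy as np

# Toy log-normalized expression: 4 cells x 2 genes; values are invented.
X = np.array([
    [0.0, 2.0],
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 4.0],
])
groups = np.array(["chow", "chow", "hfd", "hfd"])

# Per-gene z-score across all cells (sample standard deviation, ddof=1).
scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Mean z-score per group: the value a matrix plot shows in each tile.
group_means = {g: scaled[groups == g].mean(axis=0) for g in ("chow", "hfd")}
```

Since z-scores are centred across all cells, equally sized groups have mean z-scores that mirror each other around zero, which is why a diverging colour map such as `RdBu_r` fits.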

In [34]:
#####################################################################################################################
if bool_plot == True:
    adata_a = adata_astro
    adata_a.layers['scaled'] = sc.pp.scale(adata_a, copy=True).X

    sc.pl.matrixplot(adata_a, marker_pan_a1_a2, 'diet', dendrogram=False, swap_axes=True, 
                     colorbar_title='mean z-score', layer='scaled', vmin=-0.2, vmax=0.2, cmap='RdBu_r')

    sc.pl.heatmap(adata_a, marker_pan_a1_a2, swap_axes=True, show=False, use_raw=False,  # values come from the 'scaled' layer
                  show_gene_labels=True, groupby='diet', dendrogram=False, layer='scaled', vmin=-2, vmax=2, cmap='RdBu_r')

    sc.pl.matrixplot(adata=adata_a, var_names=marker_pan_a1_a2, groupby='diet', use_raw=False, log=False,  
                     dendrogram=False, var_group_rotation=90, swap_axes=True, show=True)

# Astrocytes only

<a id="Embedding"></a>

## Embedding and Clustering

In [35]:
if bool_recomp == True:  
    cell_ids_astro = np.asarray(adata_proc.obs_names)[
        [x in ['Astrocytes']  # label must match the capitalized name assigned in 'celltypes'
         for x in np.asarray(adata_proc.obs['celltypes'].values)]
    ]
    adata_astro = adata_raw[cell_ids_astro,:].copy()  # adata_raw
    #dat = pd.DataFrame(adata_proc.X, index=adata_proc.obs.index, columns=adata_proc.var.index)
    adata_astro.obs['n_genes'] = (adata_astro.X > 0).sum(1)
    adata_astro.obs['n_counts'] = adata_astro.X.sum(1)
    mt_gene_mask = [gene.startswith('mt-') for gene in adata_astro.var_names]
    temp_mt_sum = adata_astro[:,mt_gene_mask].X.sum(1)
    temp_mt_sum = np.squeeze(np.asarray(temp_mt_sum))
    temp_n_counts = adata_astro.obs['n_counts']
    adata_astro.obs['mt_frac'] = temp_mt_sum/adata_astro.obs['n_counts']
    adata_astro.raw = adata_astro
    sc.pp.normalize_per_cell(adata_astro)
    sc.pp.log1p(adata_astro)
    sc.pp.highly_variable_genes(adata_astro,n_top_genes=4000)
    sc.pl.highly_variable_genes(adata_astro)
    adata_astro.X = adata_astro.X.toarray()
    
    sc.pp.pca(adata_astro, n_comps=50, use_highly_variable = True, random_state=0, svd_solver='arpack')
    sc.pp.neighbors(adata_astro, n_neighbors=100, knn=True, method='umap', n_pcs=50, random_state=0)
    sc.tl.umap(adata_astro)
    if bool_recluster == True:
        sc.tl.leiden(adata_astro, resolution=0.5)
        pd.DataFrame(adata_astro.obs).to_csv(path_or_buf=sc_settings_writedir+'obs_adata_astro.csv')
    else:
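        # Index the cached table by cell barcode so the labels align with adata_astro.obs
        obs = pd.read_csv(sc_settings_writedir+'obs_adata_astro.csv', index_col=0)
        adata_astro.obs['leiden'] = obs.loc[adata_astro.obs_names, 'leiden'].astype(str).astype('category')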
    sc.write(sc_settings_writedir+'adata_astro.h5ad',adata_astro)
else:
    adata_astro = sc.read(sc_settings_writedir+'adata_astro.h5ad') 
sc.tl.paga(adata_astro)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)


# RNA Velocity and clustering

In [36]:
spmark=['Aldoc','Aqp4','S1pr1','Slc38a1',
        'Npy','Pcsk1n','Ins2','Il1b','Ifit3b','Flt1','Ucp2','Vamp5','S100b','Gfap','Aldh1l1',
        'Apoe','Ift43','Tpt1','Slc1a2','Slc3a2']

#### chow

In [37]:
if bool_recomp == True:
    astro0 = adata_astro[adata_astro.obs['diet']=='chow',]
    sc.tl.leiden(astro0, resolution=0.5, key_added='cluster0')
    astro0.obs['cluster0n'] = astro0.obs['cluster0'].replace({'0': 'b', '1': 'd', '2': 'a', '3': 'c'})
    sc.write(sc_settings_writedir+'astro0.h5ad', astro0)
else:
    astro0 = sc.read(sc_settings_writedir+'astro0.h5ad')   

In [38]:
if bool_recomp == True:  
    adata_loom0 = scv.read(dir_scv+'MUC26030/possorted_genome_bam_Z0I20.loom', cache=True)
    astro0_v = scv.utils.merge(astro0, adata_loom0)
    scv.pl.proportions(astro0_v)
    scv.pp.filter_and_normalize(astro0_v, min_shared_counts=20, n_top_genes=2000)
    scv.pp.moments(astro0_v, n_pcs=30, n_neighbors=30)
    scv.tl.recover_dynamics(astro0_v)
    scv.tl.velocity(astro0_v)
    scv.tl.velocity_graph(astro0_v)
    
    sc.write(sc_settings_writedir+'astro0_v.h5ad', astro0_v)
else:
    astro0_v = sc.read(sc_settings_writedir+'astro0_v.h5ad') 

In [39]:
if bool_plot == True:
    cf.plot_umap_marker(astro0, ['cluster0n'], palette=['dimgray','gray','darkgray','lightgray'],
                        use_raw=False, size=60, frameon=False)
    scv.pl.velocity_embedding_stream(astro0_v, basis='umap', color='cluster0n', use_raw=False, size=60, 
                                     legend_loc='right margin', palette=['dimgray','gray','darkgray','lightgray']) 

Rank genes by likelihoods per cluster/regime.
This ranks genes by their likelihood obtained from the dynamical model grouped by clusters specified in groupby.

In [40]:
if bool_plot == True:
    top_genes = astro0_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro0_v, groupby='cluster0')
    df = scv.get_df(astro0_v, 'rank_dynamical_genes/names')
    df.to_csv('chow_potential_drivers.csv')
    print(df.shape)
    df.head(10)
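
The shape of the resulting table, one column of gene names per cluster ordered by decreasing likelihood, can be mimicked with pandas. This sketch only illustrates the ranking step on invented likelihoods; it is not how scVelo derives cluster-specific likelihoods from the dynamical model.

```python
import pandas as pd

# Toy per-cluster likelihoods (genes x clusters); all values are invented.
likelihood = pd.DataFrame(
    {"0": [0.9, 0.2, 0.5], "1": [0.1, 0.8, 0.6]},
    index=["GeneA", "GeneB", "GeneC"],
)

n_genes = 2
# For each cluster, order genes by decreasing likelihood and keep the top names.
ranked = pd.DataFrame({
    cluster: likelihood[cluster].sort_values(ascending=False).index[:n_genes].tolist()
    for cluster in likelihood.columns
})
```

Selecting a column such as `ranked['1']` then yields the top genes for that cluster, which is how `df['1']` is used in the following cell.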

In [41]:
if bool_plot == True:
    scv.tl.latent_time(astro0_v)
    top_genes = df['1']   #['Aldoc','Aqp4','S1pr1','Slc38a1']
    scv.pl.heatmap(astro0_v, var_names=top_genes, sortby='latent_time', col_color='cluster0n', n_convolve=100,
                   figsize=(10,13), font_scale=1, colorbar=True, yticklabels=True)

In [42]:
if bool_plot == True:
    scv.pl.velocity(astro0_v, var_names=['Gfap','Aldoc','Pcsk1n','Aqp4','S1pr1','Pcdh15','Pak3'], 
                    colorbar=True, ncols=1, color='cluster0n')

#### hfd_5

In [43]:
if bool_recomp == True:
    astro5 = adata_astro[adata_astro.obs['diet']=='hfd_5',]
    sc.tl.leiden(astro5, resolution=0.5, key_added='cluster5')
    astro5.obs['cluster5n'] = astro5.obs['cluster5'].replace({'0': 'c', '1': 'a', '2': 'b'})
    sc.write(sc_settings_writedir+'astro5.h5ad', astro5)
else:
    astro5 = sc.read(sc_settings_writedir+'astro5.h5ad') 

In [44]:
if bool_recomp == True: 
    adata_loom5 = scv.read(dir_scv+'MUC26031/possorted_genome_bam_VXMFJ.loom', cache=True)
    astro5_v = scv.utils.merge(astro5, adata_loom5)
    scv.pl.proportions(astro5_v)
    scv.pp.filter_and_normalize(astro5_v, min_shared_counts=20, n_top_genes=2000)
    scv.pp.moments(astro5_v, n_pcs=30, n_neighbors=30)
    scv.tl.recover_dynamics(astro5_v)
    scv.tl.velocity(astro5_v)
    scv.tl.velocity_graph(astro5_v)

    sc.write(sc_settings_writedir+'astro5_v.h5ad', astro5_v)
else:
    astro5_v = sc.read(sc_settings_writedir+'astro5_v.h5ad') 

In [45]:
if bool_plot == True: 
    cf.plot_umap_marker(astro5, ['cluster5n'], use_raw=False, size=60, frameon=False,
                        palette=['darkorange','orangered','darkcyan'])
    scv.pl.velocity_embedding_stream(astro5_v, basis='umap', color='cluster5n', use_raw=False, size=60, 
                                     legend_loc='right margin', palette=['darkorange','orangered','darkcyan'])

In [46]:
if bool_plot == True: 
    top_genes = astro5_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro5_v, groupby='leiden', n_genes=400)
    df = scv.get_df(astro5_v, 'rank_dynamical_genes/names')
    df.to_csv('hfd5_potential_drivers.csv')
    print(df.shape)
    df.head(10)

In [47]:
if bool_plot == True:
    scv.tl.latent_time(astro5_v)
    top_genes = df['1'] #['Npy','Pcsk1n','Ins2','Il1b','Ifit3b','Flt1','Ucp2','Vamp5','S100b','Gfap','Aldh1l1']
    scv.pl.heatmap(astro5_v, var_names=top_genes, sortby='latent_time', col_color='cluster5n', n_convolve=100,
                  col_cluster=False, figsize=(10,22), font_scale=1, yticklabels=True)

In [48]:
if bool_plot == True:
    scv.pl.velocity(astro5_v, var_names=['Gfap', 'Slc3a2','Clu','Slc38a1','Igfbp5','Ucp2','Vamp5'], 
                    colorbar=True, ncols=1, color='cluster5n')

#### hfd_15

In [49]:
if bool_recomp == True:
    astro15 = adata_astro[adata_astro.obs['diet']=='hfd_15',]
    sc.tl.leiden(astro15, resolution=0.5, key_added='cluster15')
    astro15.obs['cluster15n'] = astro15.obs['cluster15'].replace({'0': 'c', '1': 'b', '2': 'a', '3': 'd'})
    sc.write(sc_settings_writedir+'astro15.h5ad', astro15)
else:
    astro15 = sc.read(sc_settings_writedir+'astro15.h5ad') 

In [50]:
if bool_recomp == True: 
    adata_loom15 = scv.read(dir_scv+'MUC26032/possorted_genome_bam_2UK19.loom', cache=True)
    astro15_v = scv.utils.merge(astro15, adata_loom15)
    scv.pl.proportions(astro15_v)
    scv.pp.filter_and_normalize(astro15_v, min_shared_counts=20, n_top_genes=2000)
    scv.pp.moments(astro15_v, n_pcs=30, n_neighbors=30)
    scv.tl.recover_dynamics(astro15_v)
    scv.tl.velocity(astro15_v)
    scv.tl.velocity_graph(astro15_v)
    
    sc.write(sc_settings_writedir+'astro15_v.h5ad', astro15_v)
else:
    astro15_v = sc.read(sc_settings_writedir+'astro15_v.h5ad') 

In [51]:
if bool_plot == True:
    cf.plot_umap_marker(astro15, ['cluster15n'], use_raw=False, size=60, frameon=False, 
                        palette=['darkorange','orangered','darkcyan','lightseagreen'])
    scv.pl.velocity_embedding_stream(astro15_v, basis='umap', color='cluster15n', use_raw=True,
                                     size=60, legend_loc='right margin',
                                     palette=['darkorange','orangered','darkcyan','lightseagreen'])

In [52]:
if bool_plot == True:
    top_genes = astro15_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro15_v, groupby='cluster15', n_genes=400)
    df = scv.get_df(astro15_v, 'rank_dynamical_genes/names')
    df.to_csv('hfd15_potential_drivers.csv')
    print(df.shape)
    display(df.head(10))  # a bare expression inside an if-block is not rendered

In [53]:
if bool_plot == True:
    scv.tl.latent_time(astro15_v)
    top_genes = df['1'] # ['Apoe','Ift43','Cldn10','Tpt1','Slc1a2','Slc3a2']
    scv.pl.heatmap(astro15_v, var_names=top_genes, sortby='latent_time', col_color='cluster15n', n_convolve=100,
                  col_cluster=False, figsize=(10,20), font_scale=1, yticklabels=True)

In [54]:
if bool_plot == True:
    scv.pl.velocity(astro15_v, var_names=['Gfap', 'Tpt1','Ift43','Apoe','Pmm1','Atp1b1','Slc1a2'], 
                    colorbar=True, ncols=1, color='cluster15n')

In [55]:
if bool_plot == True:
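    # align the per-diet labels on cell barcodes instead of relying on row order
    adata_astro.obs['cluster'] = pd.concat([astro0.obs['cluster0'].astype(str),
                                            astro5.obs['cluster5'].astype(str),
                                            astro15.obs['cluster15'].astype(str)]).reindex(adata_astro.obs_names)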
    adata_astro.obs['diet_cluster'] = adata_astro.obs['diet'].astype(str) + '_' + adata_astro.obs['cluster'].astype(str)
    adata_astro.obs['diet_location'] = adata_astro.obs['diet_cluster'].replace({'chow_2': 'chow_up',
                                                                                'chow_0': 'chow_down',
                                                                                'chow_1': 'chow_down',
                                                                                'chow_3': 'chow_down',
                                                                                'hfd_5_0': 'hfd_5_down',
                                                                                'hfd_5_1': 'hfd_5_up',
                                                                                'hfd_5_2': 'hfd_5_up',
                                                                                'hfd_15_0': 'hfd_15_down',
                                                                                'hfd_15_1': 'hfd_15_up',
                                                                                'hfd_15_2': 'hfd_15_up',
                                                                                'hfd_15_3': 'hfd_15_down'})
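
Combining per-diet cluster labels into a single column only gives correct results if the labels end up on the right cells. The toy sketch below illustrates this with pandas: assigning by position can mislabel cells when the row order differs, whereas aligning on the cell index does not. All barcodes and labels here are invented for illustration, and the mapping is only a subset of the one used above.

```python
import pandas as pd

# Hypothetical toy data: five cells whose row order interleaves the diets.
obs = pd.DataFrame(
    {"diet": ["chow", "hfd_5", "chow", "hfd_15", "hfd_5"]},
    index=["cell1", "cell2", "cell3", "cell4", "cell5"],
)

# Per-diet cluster labels, each indexed by its own cells.
chow = pd.Series(["2", "0"], index=["cell1", "cell3"])
hfd5 = pd.Series(["1", "0"], index=["cell2", "cell5"])
hfd15 = pd.Series(["3"], index=["cell4"])

combined = pd.concat([chow, hfd5, hfd15])

# Positional order follows the concatenation, not the rows of `obs` ...
positional = list(combined)
# ... whereas reindexing restores the row order of `obs`.
aligned = combined.reindex(obs.index)

obs["cluster"] = aligned
obs["diet_cluster"] = obs["diet"] + "_" + obs["cluster"]

# Map diet-specific clusters to 'up'/'down' groups.
location = {
    "chow_2": "chow_up", "chow_0": "chow_down",
    "hfd_5_0": "hfd_5_down", "hfd_5_1": "hfd_5_up",
    "hfd_15_3": "hfd_15_down",
}
obs["diet_location"] = obs["diet_cluster"].replace(location)
print(obs)
```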

#### All diets

In [56]:
if bool_recomp == True:
    adata_loom_all = adata_loom0.concatenate([adata_loom5, adata_loom15])
    astro_all_v = scv.utils.merge(adata_astro, adata_loom_all)
    scv.pl.proportions(astro_all_v)
    scv.pp.filter_and_normalize(astro_all_v, min_shared_counts=20, n_top_genes=2000)
    scv.pp.moments(astro_all_v, n_pcs=30, n_neighbors=30)
    scv.tl.recover_dynamics(astro_all_v)
    scv.tl.velocity(astro_all_v)
    scv.tl.velocity_graph(astro_all_v)
    sc.write(sc_settings_writedir+'astro_all_v.h5ad', astro_all_v)
else:
    astro_all_v = sc.read(sc_settings_writedir+'astro_all_v.h5ad') 

In [57]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro, ['leiden'], use_raw=False, size=60, frameon=False)
    scv.pl.velocity_embedding_stream(astro_all_v, basis='umap', color='leiden', use_raw=True, size=60, 
                                     legend_loc='right margin')

In [58]:
if bool_plot == True:
    top_genes = astro_all_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro_all_v, groupby='leiden', n_genes=400)
    df = scv.get_df(astro_all_v, 'rank_dynamical_genes/names')
    df.to_csv('astro_all_potential_drivers.csv')
    print(df.shape)
    display(df.head(10))  # a bare expression inside an if-block is not rendered

# Differential gene expression

### diet effect

In [59]:
if bool_plot == True:    
    sc.tl.rank_genes_groups(adata_astro, 'diet', groups=['hfd_15'], reference='chow', method='t-test_overestim_var')

    result = adata_astro.uns['rank_genes_groups']
    groups = result['names'].dtype.names
    df = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                       for group in groups for key in ['names', 'logfoldchanges', 'pvals_adj']})

    df.columns = ['gene', 'log2fc', 'pval']
    df.to_csv(path_or_buf = dir_tables+"chow_vs_hfd_15.csv", sep="\t")
    display(df[df['gene'].isin(marker_genes_dict['Astrocytes'])])
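
The table construction above reads one field per group from the structured arrays stored in `adata.uns['rank_genes_groups']`. A minimal sketch of that step on a toy stand-in is shown here; the gene values, the `pval_adj` column name and the 0.05 cut-off are assumptions for illustration, not results from this dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.uns['rank_genes_groups'] with a single group 'hfd_15'.
result = {
    "names": np.array([("Gfap",), ("Aqp4",), ("Slc1a2",)], dtype=[("hfd_15", "U10")]),
    "logfoldchanges": np.array([(1.2,), (-0.4,), (0.1,)], dtype=[("hfd_15", "f4")]),
    "pvals_adj": np.array([(0.001,), (0.03,), (0.8,)], dtype=[("hfd_15", "f8")]),
}

# The field names of the structured array are the tested groups.
group = result["names"].dtype.names[0]

# Name the columns directly instead of renaming them afterwards.
df = pd.DataFrame({
    "gene": result["names"][group],
    "log2fc": result["logfoldchanges"][group],
    "pval_adj": result["pvals_adj"][group],
})

# Example filter on the adjusted p-value (threshold chosen for illustration).
significant = df[df["pval_adj"] < 0.05]
print(significant)
```

Recent scanpy versions also provide `sc.get.rank_genes_groups_df`, which may serve the same purpose without manual unpacking.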

### diet effect of up and down clusters

In [60]:
if bool_plot == True:
    sc.tl.rank_genes_groups(adata_astro, 'diet_location', groups=['hfd_15_down'], reference='hfd_15_up',
                             method='t-test_overestim_var') 

    result = adata_astro.uns['rank_genes_groups']
    groups = result['names'].dtype.names
    df = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                      for group in groups for key in ['names', 'logfoldchanges', 'pvals_adj']})

    df.columns = ['gene', 'log2fc', 'pval']
    df.to_csv(path_or_buf = dir_tables+"hfd_15_up_vs_hfd_15_down.csv", sep="\t")
    display(df[df['gene'].isin(marker_genes_dict['Astrocytes'])])

### diet effect comparing clusters

In [61]:
if bool_plot == True:
    sc.tl.rank_genes_groups(adata_astro, 'diet_cluster', groups=['hfd_15_2'], reference='chow_2',
                             method='t-test_overestim_var') 

    result = adata_astro.uns['rank_genes_groups']
    groups = result['names'].dtype.names
    df = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                      for group in groups for key in ['names', 'logfoldchanges', 'pvals_adj']})

    df.columns = ['gene', 'log2fc', 'pval']
    df.to_csv(path_or_buf = dir_tables+"chow_2_vs_hfd_15_2.csv", sep="\t")
    display(df[df['gene'].isin(marker_genes_dict['Astrocytes'])])

## Astrocyte markers per diet

#### chow

In [62]:
if bool_plot == True:
    # distinct file names so that the saved figures do not overwrite each other
    cf.plot_umap_marker(astro0, ['Gfap', 'Aldh1l1'], color_map=mymap, size=30, 
                        save="_chow_astrocyte_markers_1.png", use_raw=False, vmax=3.4056416)
    cf.plot_umap_marker(astro0, ['Slc1a2', 'Slc1a3', 'Aqp4', 'Gja1', 'Gjb6', 'Atp1b2'], color_map=mymap,
                        size=30, save="_chow_astrocyte_markers_2.png", use_raw=False, vmax=4.1836486)

#### hfd 5

In [63]:
if bool_plot == True:
    # distinct file names so that the saved figures do not overwrite each other
    cf.plot_umap_marker(astro5, ['Gfap', 'Aldh1l1'], color_map=mymap, size=30, 
                        save="_hfd5_astrocyte_markers_1.png", use_raw=False, vmax=3.4056416)
    cf.plot_umap_marker(astro5, ['Slc1a2', 'Slc1a3', 'Aqp4', 'Gja1', 'Gjb6', 'Atp1b2'], color_map=mymap,
                        size=30, save="_hfd5_astrocyte_markers_2.png", use_raw=False, vmax=4.1836486)

#### hfd 15

In [64]:
if bool_plot == True:
    # distinct file names so that the saved figures do not overwrite each other
    cf.plot_umap_marker(astro15, ['Gfap', 'Aldh1l1'], color_map=mymap, size=30, 
                        save="_hfd15_astrocyte_markers_1.png", use_raw=False, vmax=3.4056416)
    cf.plot_umap_marker(astro15, ['Slc1a2', 'Slc1a3', 'Aqp4', 'Gja1', 'Gjb6', 'Atp1b2'], color_map=mymap,
                        size=30, save="_hfd15_astrocyte_markers_2.png", use_raw=False, vmax=4.1836486)

# Counting

In [65]:
if bool_plot == True:
    astro0.var['n_cells'] = np.squeeze(np.asarray((astro0.raw.X > 0).sum(0)))
    astro5.var['n_cells'] = np.squeeze(np.asarray((astro5.raw.X > 0).sum(0)))
    astro15.var['n_cells'] = np.squeeze(np.asarray((astro15.raw.X > 0).sum(0)))

    p0_5 = (astro5.var[astro5.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro0.var[astro0.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']-1)*100
    p5_15 = (astro15.var[astro15.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro5.var[astro5.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']-1)*100
    
    p0_5.to_csv('substraction_cells_hfd5-chow.csv')
    p5_15.to_csv('substraction_cells_hfd15-hfd5.csv')
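
The quantity computed above is the percent change in the number of cells expressing each marker, `(n_condition / n_reference - 1) * 100`. The sketch below reproduces that arithmetic on small invented count matrices; the helper names `n_expressing` and `percent_change` are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd


def n_expressing(counts, genes):
    """Number of cells (rows) with a non-zero count for each gene (column)."""
    return pd.Series(np.asarray((counts > 0).sum(axis=0)).ravel(), index=genes)


def percent_change(n_condition, n_reference, markers):
    """Percent change in expressing-cell numbers relative to the reference."""
    return (n_condition.loc[markers] / n_reference.loc[markers] - 1) * 100


genes = ["Gfap", "Aldh1l1", "Slc1a3"]
# Invented count matrices (cells x genes), for illustration only.
chow = np.array([[1, 0, 2], [0, 0, 1], [3, 1, 0], [0, 2, 4]])
hfd5 = np.array([[2, 1, 0], [1, 0, 0], [4, 0, 1], [1, 1, 0]])

p = percent_change(n_expressing(hfd5, genes), n_expressing(chow, genes), ["Gfap", "Slc1a3"])
print(p)
```

Because this compares absolute cell numbers, groups of different size can shift the result; dividing by the number of cells per group before taking the ratio may be worth considering.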

In [66]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p0_5.index
    pct_change = p0_5
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to chow (%)")
    plt.title("Subtraction HFD_5 - chow")
    plt.show()

In [67]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p5_15.index
    pct_change = p5_15
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to HFD_5 (%)")
    plt.title("Subtraction HFD_15 - HFD_5")
    plt.show()

### up cluster

In [68]:
if bool_plot == True:
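    # .copy() turns the views into independent objects before .var is modified
    astro0up = adata_astro[adata_astro.obs['diet_location']=='chow_up',].copy()
    astro5up = adata_astro[adata_astro.obs['diet_location']=='hfd_5_up',].copy()
    astro15up = adata_astro[adata_astro.obs['diet_location']=='hfd_15_up',].copy()
    
    # flatten the per-gene counts to 1-D, as done for the per-diet objects above
    astro0up.var['n_cells'] = np.squeeze(np.asarray((astro0up.X > 0).sum(0)))
    astro5up.var['n_cells'] = np.squeeze(np.asarray((astro5up.X > 0).sum(0)))
    astro15up.var['n_cells'] = np.squeeze(np.asarray((astro15up.X > 0).sum(0)))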

    p0_5up = astro5up.var[astro5up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro0up.var[astro0up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']
    p0_15up = astro15up.var[astro15up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro0up.var[astro0up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']
    p5_15up = astro15up.var[astro15up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro5up.var[astro5up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']

In [69]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p0_5up.index
    pct_change = (p0_5up-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to chow (%)")
    plt.title("Subtraction HFD_5-Chow - up")
    plt.show()

In [70]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p0_15up.index
    pct_change = (p0_15up-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to chow (%)")
    plt.title("Subtraction HFD_15-Chow - up")
    plt.show()

In [71]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p5_15up.index
    pct_change = (p5_15up-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to HFD_5 (%)")
    plt.title("Subtraction HFD_15-HFD_5 - up")
    plt.show()

### down clusters

In [72]:
if bool_plot == True:
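    # .copy() turns the views into independent objects before .var is modified
    astro0down = adata_astro[adata_astro.obs['diet_location']=='chow_down',].copy()
    astro5down = adata_astro[adata_astro.obs['diet_location']=='hfd_5_down',].copy()
    astro15down = adata_astro[adata_astro.obs['diet_location']=='hfd_15_down',].copy()
    
    # flatten the per-gene counts to 1-D, as done for the per-diet objects above
    astro0down.var['n_cells'] = np.squeeze(np.asarray((astro0down.X > 0).sum(0)))
    astro5down.var['n_cells'] = np.squeeze(np.asarray((astro5down.X > 0).sum(0)))
    astro15down.var['n_cells'] = np.squeeze(np.asarray((astro15down.X > 0).sum(0)))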

    p0_5down = astro5down.var[astro5down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro0down.var[astro0down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']
    p0_15down = astro15down.var[astro15down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro0down.var[astro0down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']
    p5_15down = astro15down.var[astro15down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro5down.var[astro5down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']

In [73]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p0_5down.index
    pct_change = (p0_5down-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to chow (%)")
    plt.title("Subtraction HFD_5-Chow - down")
    plt.show()

In [74]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p0_15down.index
    pct_change = (p0_15down-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to chow (%)")
    plt.title("Subtraction HFD_15-Chow - down")
    plt.show()

In [75]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p5_15down.index
    pct_change = (p5_15down-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to HFD_5 (%)")
    plt.title("Subtraction HFD_15-HFD_5 - down")
    plt.show()

### up vs down clusters

In [76]:
if bool_plot == True:
    p0down_0up = astro0down.var[astro0down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro0up.var[astro0up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']
    p5down_5up = astro5down.var[astro5down.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']/astro5up.var[astro5up.var.index.isin(marker_genes_dict['Astrocytes'])]['n_cells']

In [77]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p0down_0up.index
    pct_change = (p0down_0up-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to chow up (%)")
    plt.title("Subtraction chow_down-chow_up")
    plt.show()

In [78]:
if bool_plot == True:
    fig = plt.figure()
    fig.set_size_inches(9, 5)
    ax = fig.add_axes([0,0,1,1])
    markers = p5down_5up.index
    pct_change = (p5down_5up-1)*100
    ax.bar(markers, pct_change)
    plt.xlabel("astrocyte markers")
    plt.ylabel("percent increase compared to hfd5 up (%)")
    plt.title("Subtraction hfd5_down-hfd5_up")
    plt.show()
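
The bar-plot cells in this section differ only in the input ratio, the axis label and the title, so they could be collapsed into a single function. The following is a sketch under that assumption: `plot_percent_change` and its arguments are hypothetical names, and the ratios are invented rather than taken from the data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_percent_change(ratio, reference_label, title):
    """Bar plot of (ratio - 1) * 100 for each marker; returns the Axes."""
    pct = (ratio - 1) * 100
    fig, ax = plt.subplots(figsize=(9, 5))
    ax.bar(pct.index, pct.values)
    ax.axhline(0, color="black", linewidth=0.8)  # baseline: no change
    ax.set_xlabel("astrocyte markers")
    ax.set_ylabel(f"percent increase compared to {reference_label} (%)")
    ax.set_title(title)
    return ax


# Invented ratios of expressing-cell numbers, for illustration only.
toy_ratio = pd.Series({"Gfap": 1.5, "Aldh1l1": 0.8, "Slc1a3": 1.0})
ax = plot_percent_change(toy_ratio, "chow", "HFD_5 vs. chow")
```

In the notebook, each plotting cell would then reduce to one call, for example with `p0_5up` or `p5_15down` as the `ratio` argument.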