# Analysis of scRNA-seq data set from ARC
## focusing on Gfap and Aldh1l1 Astrocyte populations

In this script, scRNA-seq data generated from the ARC of the hypothalamus of mice fed standard chow, or a high-fat high-sugar diet for 5 or 15 days, are analysed. Here we concentrate on astrocytes, with a special focus on the Gfap and Aldh1l1 cell populations. The following steps were performed to analyse the scRNA-seq data:
- Loading Data, quality control and preprocessing.
- Filtering, normalization, clustering and projection.
- Cell type identification.
- Analysis and visualisation of the Astrocyte population.
- Visualization of Gfap and Aldh1l1 cell populations.
- Creating Diffusion maps to visualise Gfap and Aldh1l1 populations over the pseudotime.
- Inferring the genes with significant correlation with Gfap and Aldh1l1 and their enrichment in GO and KEGG repositories.
- Stringent filtering (60% of cells) to identify the common genes among the populations and diets.
- Visualisation of literature-based marker genes across the marker-diet groups.
- Differential expression analysis comparing populations within and between diets (using various tests).
- For differential expression analysis the Welch-ANOVA test was implemented.
- Enrichment analysis and visualisation of the differential expression results implemented in R script.
- Exploring different ways of visualisation of differentially expressed genes using violin plots.
- RNA velocity of astrocytes was computed for the different diets to recover directed dynamic information and to learn about cellular decision making.


## Table of contents:

* <a href=#Load>Load Packages and Set Global Variables</a>
    * <a href=#imports>Imports and Settings</a>
    * <a href=#Global>Global Variables</a> 
* <a href=#Dataloading>Loading Data, Quality Control and Preprocessing</a>
    * <a href=#Counts>Gene numbers and counts with and without mitochondrial RNA</a>
    * <a href>Number of Genes versus Number of Counts</a>
	* <a href>Distribution of Counts and Genes</a>
	* <a href>Filtering</a>
* <a href>Normalization, projection and clustering</a>
* <a href>Define Cell Types</a>
	* <a href>Differentially expressed Genes</a>
	* <a href>Define Marker Sets</a>
    * <a href>Summary heatmap, dotplot and stacked_violin for cluster assignments</a>
    * <a href>UMAP with assigned cell types</a>
* <a href>Astrocytes</a>
	* <a href>Embedding and Clustering</a>
* <a href>Define Cell Types</a>
	* <a href>Differentially Expressed Genes</a>
    * <a href>Summary heatmap, dotplot and stacked_violin for cluster assignments</a>
    * <a href>Count distribution for Aldh1l1 and Gfap</a>
* <a href>Gfap, Aldh1l1 and double positive populations</a>
	* <a href>Chow</a>
	* <a href>Hfd_5</a>
	* <a href>Hfd_15</a>
* <a href>RNA Velocity</a>
	* <a href>All cells</a>
	* <a href>Astrocytes</a>
* <a href>Common genes</a>
	* <a href>Chow</a>
	* <a href>Hfd_5</a>
	* <a href>Hfd_15</a>
* <a href>Visualisation of well known marker genes</a>
	* <a href>Stacked violin plots</a>
* <a href>Illustration and differential expression analysis</a>
	* <a href>Scatter plot and pie chart - chow</a>
	* <a href>Differential gene expression - chow</a>
	* <a href>Scatter plot and pie chart - hfd 5</a>
	* <a href>Differential gene expression - hfd 5</a>
	* <a href>Scatter plot and pie chart - hfd 15</a>
	* <a href>Differential gene expression - hfd 15</a>
* <a href>Differential expression - diet effect</a>
	* <a href>Aldh1l1</a>
	* <a href>Gfap</a>
	* <a href>Double positive</a>
* <a href>Welch ANOVA - marker effect</a>
	* <a href>Chow</a>
	* <a href>Hfd_5</a>
	* <a href>Hfd_15</a>
	* <a href>Stacked violin plot visualisation over the diets and marker populations</a>
* <a href>Welch ANOVA - diet effect</a>
	* <a href>Aldh1l1</a>
	* <a href>Gfap</a>
	* <a href>Double positive</a>
	* <a href>Different types of stacked violin plot visualisation</a>
* <a href>Diffusion maps of Astrocytes</a>
	* <a href>Chow</a>
	* <a href>Hfd_5</a>
	* <a href>Hfd_15</a>
	* <a href>Pseudotime</a>
* <a href>Correlation analysis of Gfap and Aldh1l1</a>
	* <a href>Chow</a>
	* <a href>Hfd_5</a>
	* <a href>Hfd_15</a>
* <a href>Clusters 0,1 and 2 - potential astrocyte population</a>
	* <a href>Embedding and Clustering</a>
* <a href>Define Cell Types</a>
	* <a href>Differentially Expressed Genes</a>
    * <a href>Summary heatmap, dotplot and stacked_violin for cluster assignments</a>
    * <a href>Count distribution for Aldh1l1 and Gfap</a>
* <a href>Gfap, Aldh1l1 and double positive populations</a>
	* <a href>Chow</a>
	* <a href>Hfd_5</a>
	* <a href>Hfd_15</a>
* <a href>Differential gene expression - chow</a>
* <a href>Differential gene expression - hfd 5</a>
* <a href>Differential gene expression - hfd15</a>

<a id="Load"></a>

# Load Packages and Set Global Variables

<a id="imports"></a>

## Imports and Settings

In [1]:
import numpy as np
import scanpy as sc
import scipy as sci
import scipy.sparse
import pandas as pd
import seaborn as sb
import scvelo as scv
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import colors
from gprofiler import GProfiler
import custom_functions as cf
from matplotlib_venn import venn3_unweighted
from scipy import stats
import pingouin as pg
import matplotlib_venn
import statistics
import gseapy
import sys
import re
import os

import batchglm
import diffxpy.api as de

import warnings
warnings.filterwarnings('ignore')

%load_ext autoreload
%autoreload 2
sc.settings.verbosity=3 # amount of output

base_dir = '/Users/viktorian.miok/Documents/consultation/Luiza/single_cell/data/scanpy_AnnData/'
dir_out = '/Users/viktorian.miok/Documents/consultation/Luiza/single_cell/results/'
dir_tables = dir_out+'tables/'
sc_settings_figdir = dir_out+'figures/'
sc_settings_writedir = dir_out+'anndata/'
sc.logging.print_versions()
os.chdir(dir_out)
sc.settings.set_figure_params(dpi=80, 
                              scanpy=True
)
print(sys.version)



-----
anndata     0.7.5
scanpy      1.7.1
sinfo       0.3.1
-----
PIL                 8.1.2
PyObjCTools         NA
anndata             0.7.5
appdirs             1.4.4
appnope             0.1.2
autoreload          NA
backcall            0.2.0
batchglm            v0.7.4
bioservices         1.7.11
bs4                 4.9.3
certifi             2020.12.05
cffi                1.14.5
chardet             4.0.0
cloudpickle         1.6.0
colorama            0.4.4
colorlog            NA
custom_functions    NA
cycler              0.10.0
cython_runtime      NA
dask                2021.03.0
dateutil            2.8.1
decorator           4.4.2
diffxpy             v0.7.4
docutils            0.16
easydev             0.11.0
get_version         2.1
gprofiler           1.0.0
gseapy              0.10.4
h5py                3.2.1
idna                2.10
igraph              0.9.0
ipykernel           5.4.3
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.17.2
joblib              1.0.1


In [2]:
#Define a nice colour map for gene expression
colors2 = plt.cm.Reds(np.linspace(0, 1, 128))
colors3 = plt.cm.Greys_r(np.linspace(0.7, 0.8, 20))
colorsComb = np.vstack([colors3, colors2])
mymap = colors.LinearSegmentedColormap.from_list('my_colormap', 
                                                 colorsComb
)
sc.set_figure_params(scanpy=True, 
                     fontsize=17
)

## Global Variables

All embeddings and clusterings can be saved and loaded into this script. Be careful about overwriting cluster caches once cell type annotation has started, as cluster labels may be shuffled.

Set whether anndata objects are recomputed or loaded from cache.

In [3]:
bool_recomp = False
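
The recompute-or-load pattern that this flag controls can be sketched in isolation. The helper below is a minimal illustration only: `load_or_compute`, `expensive_step` and the JSON cache file are hypothetical names chosen for this sketch and are not part of the notebook, which caches `AnnData` objects as `.h5ad` files instead.

```python
import json
import os
import tempfile

def load_or_compute(path, compute, recompute=False):
    """Return the cached result stored at `path`, or call `compute()` and cache it."""
    if recompute or not os.path.exists(path):
        result = compute()
        with open(path, "w") as fh:
            json.dump(result, fh)
        return result
    with open(path) as fh:
        return json.load(fh)

calls = []  # records every time the expensive step actually runs

def expensive_step():
    calls.append(1)
    return {"n_cells": 3}

cache_path = os.path.join(tempfile.mkdtemp(), "result.json")
first = load_or_compute(cache_path, expensive_step, recompute=True)    # computes and writes
second = load_or_compute(cache_path, expensive_step, recompute=False)  # reads the cache
```

Unlike the notebook cells, this sketch falls back to recomputing when the cache file is missing; the notebook simply reads the cached file whenever the flag is `False`.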

Set whether clustering is recomputed or loaded from saved .obs file. Loading makes sense if the clustering changes due to a change in scanpy or one of its dependencies and the number of clusters or the cluster labels change accordingly.

In [4]:
bool_recluster = False
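
When cluster labels are restored from a saved `.obs` table, they have to be matched to cells by barcode. The toy example below (made-up barcodes and labels, plain pandas, no `AnnData`) sketches why: assigning a Series with a default integer index to a barcode-indexed table aligns on the index and yields only missing values, whereas a lookup by barcode keeps each cell with its own label.

```python
import pandas as pd

# Toy stand-ins: `obs` plays the role of adata.obs, `saved` the table read back from disk.
obs = pd.DataFrame(index=["cell_a", "cell_b", "cell_c"])
saved = pd.DataFrame({"leiden": [2, 0, 1]}, index=["cell_c", "cell_a", "cell_b"])

# A Series with a default integer index does not match the barcodes: every value becomes NaN.
misaligned = obs.copy()
misaligned["leiden"] = pd.Series(saved["leiden"].values, dtype="category")

# Looking labels up by barcode and casting to string keeps each cell with its cluster.
aligned = obs.copy()
labels = saved.loc[aligned.index, "leiden"].astype(str)
aligned["leiden"] = pd.Categorical(labels.values)
```

Casting to string is a choice made here because later cells address clusters by string labels such as `'0'`.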

Set whether cluster cache is overwritten. Note that the cache exists for reproducibility of clustering, see above.

In [5]:
bool_write_cluster_cache = False

Set whether to produce plots, set to False for test runs.

In [6]:
bool_plot = False

Set whether observations should be calculated. If False, it is necessary to read a cached file that contains the necessary information; the notebook then shows the distributions of counts and genes, as well as mt_frac, after filtering. 
Set to True in order to see the data before filtering and follow the decisions for cutoffs.

In [7]:
bool_create_observations = True

<a id="Dataloading"></a>

# Loading Data, Quality Control and Preprocessing

Read the data in:

In [8]:
if bool_recomp:
    adata_raw1 = sc.read(base_dir+'MUC26030/filtered_feature_bc_matrix.h5ad')
    adata_raw2 = sc.read(base_dir+'MUC26031/filtered_feature_bc_matrix.h5ad')
    adata_raw3 = sc.read(base_dir+'MUC26032/filtered_feature_bc_matrix.h5ad')
    adata_raw = adata_raw1.concatenate([adata_raw2, adata_raw3],
                                       batch_key='diet', 
                                       batch_categories=['chow', 'hfd_5', 'hfd_15']
    )
    sc.write(sc_settings_writedir+'adata_raw.h5ad', adata_raw)
else:
    adata_raw = sc.read(sc_settings_writedir+'adata_raw.h5ad')

<a id="QC"></a>

Summary of steps performed here: Cells are filtered on total counts, number of detected genes and mitochondrial fraction (the thresholds are given in the Filtering section). Counts per cell are normalized to library depth, log-transformed and batch-corrected for diet with ComBat. The gene (feature) space is reduced with PCA to 50 PCs. A nearest-neighbour graph and UMAP are computed in PC space. Cells are clustered with Leiden clustering on the nearest-neighbour graph. Graph abstraction (PAGA) is computed on the Leiden clusters.

In [9]:
sc.pp.filter_cells(adata_raw, min_counts=1)

The data contain 21143 observations and 31253 genes. Due to dropouts, some observations might not show any counts or genes. To calculate the fraction of mitochondrial RNA in the next steps, every observation without counts must be filtered out to prevent NaN values from emerging. 

In [10]:
print('Number of cells: {:d}'.format(adata_raw.n_obs))
print('Number of genes: {:d}'.format(adata_raw.shape[1]))
print('Number of cells per diet:')
adata_raw.obs['diet'].value_counts().sort_index()

Number of cells: 21143
Number of genes: 31253
Number of cells per diet:


chow      7116
hfd_5     6204
hfd_15    7823
Name: diet, dtype: int64

### Gene numbers and counts with and without mitochondrial RNA

Create necessary obs:

In [11]:
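adata_qc = adata_raw.copy()
# Genes detected per cell; asarray().ravel() flattens the (n_cells, 1) result of a sparse sum
adata_qc.obs['n_genes'] = np.asarray((adata_qc.X > 0).sum(1)).ravel()
# Mitochondrial genes carry the 'mt-' prefix
mt_gene_mask = [gene.startswith('mt-') for gene in adata_qc.var_names]
temp_mt_sum = np.asarray(adata_qc[:, mt_gene_mask].X.sum(1)).ravel()
# Total counts per cell
adata_qc.obs['n_counts'] = np.asarray(adata_qc.X.sum(1)).ravel()
temp_n_counts = adata_qc.obs['n_counts']
# Fraction of each cell's counts that come from mitochondrial genes
adata_qc.obs['mt_frac'] = temp_mt_sum / adata_qc.obs['n_counts']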

Plot n_counts and mt_frac:

In [12]:
if bool_plot == True:
    t1 = sc.pl.violin(adata_qc, 
                      ['n_counts', 'n_genes', 'mt_frac'],
                      size=1, 
                      log=False, 
                      jitter=3,
                      multi_panel=True
    )

In [13]:
if bool_plot==True:
    sc.pl.highest_expr_genes(adata_qc,
                             n_top=20
    ) 

Overall, the data contain many observations with a high fraction of mitochondrial RNA. Additionally, most observations show counts below 100, suggesting poor data quality. To further investigate the distribution of counts and genes per observation, scatter plots are created:

### Number of Genes versus Number of Counts

In [14]:
if bool_plot == True:
    p1 = sc.pl.scatter(adata_qc,
                       'n_counts',
                       'n_genes',
                       color='mt_frac', 
                       size=5
    )
    p2 = sc.pl.scatter(adata_qc[adata_qc.obs['n_counts'] < 5000],
                       'n_counts',
                       'n_genes',
                       color='mt_frac',
                       size=5
    )

### Distribution of Counts and Genes

For the remaining observations, the fraction of mitochondrial RNA is generally very low and at most 20%

In [15]:
if bool_plot == True:
    p6 = sb.distplot(adata_qc.obs['n_counts'],
                     kde=False
    )
    plt.show()
    p7 = sb.distplot(adata_qc.obs['n_counts'][adata_qc.obs['n_counts'] < 1000],
                     kde=False
    )
    plt.show()

In [16]:
if bool_plot == True:
    p9 = sb.distplot(adata_qc.obs['n_genes'],
                     kde=False, 
                     bins=60
    )
    plt.show()
    p10 = sb.distplot(adata_qc.obs['n_genes'][adata_qc.obs['n_genes'] < 500],
                      kde=False,
                      bins=60
    )
    plt.show()

### Filtering

In [17]:
# Filter cells according to identified QC thresholds:
print('Total number of cells: {:d}'.format(adata_qc.n_obs))

sc.pp.filter_cells(adata_qc, 
                   min_counts=200
)
print('Number of cells after min count filter: {:d}'.format(adata_qc.n_obs))

sc.pp.filter_cells(adata_qc, 
                   max_counts=100000
)
print('Number of cells after max count filter: {:d}'.format(adata_qc.n_obs))

adata_qc=adata_qc[adata_qc.obs['mt_frac'] < 0.5]
print('Number of cells after MT filter: {:d}'.format(adata_qc.n_obs))

sc.pp.filter_cells(adata_qc, 
                   min_genes=350
)
print('Number of cells after gene filter: {:d}'.format(adata_qc.n_obs))

Total number of cells: 21143


filtered out 35 cells that have more than 100000 counts


Number of cells after min count filter: 21143
Number of cells after max count filter: 21108
Number of cells after MT filter: 20491


filtered out 496 cells that have less than 350 genes expressed
Trying to set attribute `.obs` of view, copying.


Number of cells after gene filter: 19995
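
The cumulative effect of the four cell filters can be sketched with boolean masks. The per-cell values below are hypothetical; only the thresholds (200 and 100000 counts, 0.5 mitochondrial fraction, 350 genes) are taken from the cell above.

```python
import numpy as np

# Toy per-cell QC metrics (hypothetical values).
n_counts = np.array([150, 800, 120000, 5000, 3000])
n_genes = np.array([100, 400, 9000, 300, 1200])
mt_frac = np.array([0.10, 0.05, 0.02, 0.10, 0.70])

steps = [
    ("min counts", n_counts >= 200),
    ("max counts", n_counts <= 100000),
    ("MT filter", mt_frac < 0.5),
    ("min genes", n_genes >= 350),
]

keep = np.ones(n_counts.size, dtype=bool)
remaining = {}
for name, mask in steps:
    keep &= mask  # a cell must pass every filter applied so far
    remaining[name] = int(keep.sum())
```

Recording the number of remaining cells after each step gives the same kind of running tally as the printed output above.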


In [18]:
#Filter genes:
print('Total number of genes: {:d}'.format(adata_qc.n_vars))

# Keep genes detected in at least 20 cells (this also removes genes with zero counts)
sc.pp.filter_genes(adata_qc,
                   min_cells=20
)
print('Number of genes after cell filter: {:d}'.format(adata_qc.n_vars))

Total number of genes: 31253


filtered out 13164 genes that are detected in less than 20 cells


Number of genes after cell filter: 18089
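
The gene filter works on columns rather than rows. As a small sketch with a made-up matrix and a lower threshold than the 20 cells used above:

```python
import numpy as np

# Toy count matrix: 3 cells (rows) by 4 genes (columns).
counts = np.array([[3, 0, 0, 1],
                   [1, 0, 2, 0],
                   [4, 0, 0, 0]])

min_cells = 2  # the notebook uses 20
n_cells_per_gene = (counts > 0).sum(axis=0)  # cells in which each gene is detected
gene_keep = n_cells_per_gene >= min_cells
filtered = counts[:, gene_keep]
```

Only the first gene is detected in at least two cells, so it is the only column that survives.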


In [19]:
if bool_plot == True:
    p1 = sc.pl.scatter(adata_qc,
                       'n_counts',
                       'n_genes',
                       color='mt_frac',
                       size=5
    )
    p3 = sc.pl.scatter(adata_qc[adata_qc.obs['n_counts'] < 5000],
                       'n_counts',
                       'n_genes',
                       color='mt_frac',
                       size=5
    )

In [20]:
print('Number of cells: {:d}'.format(adata_qc.n_obs))
print('Number of genes: {:d}'.format(adata_qc.shape[1]))
print('Number of cells per diet:')
adata_qc.obs['diet'].value_counts().sort_index()

Number of cells: 19995
Number of genes: 18089
Number of cells per diet:


chow      6741
hfd_5     5886
hfd_15    7368
Name: diet, dtype: int64

# Normalization, projection and clustering

In [21]:
if bool_recomp == True:
    adata_proc=adata_qc.copy()
    adata_proc.raw=adata_qc
    sc.pp.normalize_per_cell(adata_proc)
    sc.pp.log1p(adata_proc)
    sc.pp.combat(adata_proc,
                 key='diet'
    )
    sc.pp.highly_variable_genes(adata_proc, 
                                flavor='cell_ranger',
                                n_top_genes=4000
    )
    sc.pl.highly_variable_genes(adata_proc)
    #adata_proc.X=adata_proc.X.toarray()
    
    sc.pp.pca(adata_proc,
              n_comps=50, 
              random_state=0,
              use_highly_variable=True, 
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_proc,
                    n_neighbors=100,
                    knn=True,
                    method='umap',
                    n_pcs=50,
                    random_state=0
    )
    sc.tl.umap(adata_proc)
    if bool_recluster == True:
        #sc.tl.louvain(adata_proc, resolution=0.5, flavor='vtraag', random_state=0)
        sc.tl.leiden(adata_proc,
                     resolution=0.3
        )
        pd.DataFrame(adata_proc.obs).to_csv(path_or_buf=sc_settings_writedir+"obs_adata_proc.csv")
    else:
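        obs = pd.read_csv(sc_settings_writedir+'obs_adata_proc.csv', index_col=0)
        # Align saved labels on cell barcodes; store them as strings, as sc.tl.leiden does
        adata_proc.obs['leiden'] = pd.Categorical(
            obs.loc[adata_proc.obs_names, 'leiden'].astype(str).values
        )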
    sc.write(sc_settings_writedir+'adata_proc.h5ad', adata_proc)
else:
    adata_proc=sc.read(sc_settings_writedir+'adata_proc.h5ad') 
sc.tl.paga(adata_proc)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:01)


Produce some summarizing plots that show the global characteristics of the data.

In [22]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc,
                        ['leiden'],
                        save="_all_cells_leiden",
                        use_raw=False
    )

In [23]:
#####################################################################################################################
if bool_plot == True:
    plt.rcParams['figure.figsize']=[5,5]
    cf.plot_umap_marker(adata_proc,
                        ['leiden'],
                        save="_all_cells_leiden_ondata",
                        use_raw=False,
                        legend_loc='on data',
                        frameon=False,
                        title='', 
                        size=20
    )

In [24]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc, 
                        ['n_genes', 'n_counts', 'mt_frac'], 
                        color_map=mymap,
                        size=10,
                        save="_all_cells_n_gene_count_mt",
                        use_raw=False
    )

A high fraction of mitochondrial RNA is in cluster 3 and around the central cluster 8.

In [25]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc,
                        ['Gfap','Aldh1l1'],
                        color_map=mymap,
                        size=20,
                        save="_all_cells_gfap-aldh", 
                        use_raw=False
    )

In [26]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc,
                        ['diet'], 
                        save="_all_cells_diet",
                        use_raw=False
    )

In [27]:
if bool_plot == True:
    sc.pl.paga(adata_proc,
               save="_all_cells.png"
    )

In [28]:
if bool_plot == True:
    cf.cell_percent(adata_proc,
                    cluster='leiden',
                    condition='diet', 
                    xlabel='clusters', 
                    ylabel='percentage', 
                    title='barplot_all_cells_diet_per_clusters',
                    save=sc_settings_figdir,
                    table=False
    )
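
`cf.cell_percent` comes from the project's own `custom_functions` module, which is not shown here. Assuming it reports the share of each diet within every cluster, the underlying tabulation might look like the following sketch (hypothetical labels, standard library only).

```python
from collections import Counter

# Hypothetical (cluster, diet) labels, one pair per cell.
cells = [("0", "chow"), ("0", "chow"), ("0", "hfd_5"), ("0", "hfd_15"),
         ("1", "hfd_5"), ("1", "hfd_15")]

pair_counts = Counter(cells)
cluster_totals = Counter(cluster for cluster, _ in cells)

# Percentage of each cluster's cells that come from each diet.
percent = {
    (cluster, diet): 100.0 * n / cluster_totals[cluster]
    for (cluster, diet), n in pair_counts.items()
}
```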

In [29]:
if bool_plot == True:
    aldh_pos = adata_proc.obs_names[np.asarray(adata_proc[:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_proc.obs_names[np.asarray(adata_proc[:,'Gfap'].X).flatten() > 0]
    glast_pos = adata_proc.obs_names[np.asarray(adata_proc[:,'Slc1a3'].X).flatten() > 0]

    matplotlib_venn.venn3([set(aldh_pos),
                           set(gfap_pos),
                           set(glast_pos)],
                          set_labels=("Aldh1l1", "Gfap", "Slc1a3"))
    plt.savefig(sc_settings_figdir+'venndiagram_all_cells_gfap-aldh-glast.png')

In [30]:
if bool_plot == True:
    aldh_pos = adata_proc.obs_names[np.asarray(adata_proc[:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_proc.obs_names[np.asarray(adata_proc[:,'Gfap'].X).flatten() > 0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"))#, "Slc1a3"))
    plt.savefig(sc_settings_figdir+'venndiagram_all_cells_gfap-aldh.png')
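
The Venn diagrams rest on simple set operations over the cells in which each marker is detected. A minimal sketch with made-up expression values shows how the Gfap-only, Aldh1l1-only and double-positive groups follow from the two sets:

```python
import numpy as np

cell_ids = np.array(["c1", "c2", "c3", "c4", "c5"])
gfap = np.array([0.0, 1.2, 0.0, 2.1, 0.0])     # toy expression values
aldh1l1 = np.array([0.8, 0.5, 0.0, 0.0, 1.4])

gfap_pos = set(cell_ids[gfap > 0].tolist())
aldh_pos = set(cell_ids[aldh1l1 > 0].tolist())

double_pos = gfap_pos & aldh_pos  # both markers detected
gfap_only = gfap_pos - aldh_pos
aldh_only = aldh_pos - gfap_pos
```

The threshold `> 0` only identifies detected expression if the values are counts or normalized counts in which undetected genes remain exactly zero.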

Number of cells in each cluster:

In [31]:
adata_proc.obs["leiden"].value_counts()

0     3921
1     2547
2     2386
3     2247
4     1466
5     1403
6     1233
7     1080
8     1075
9     1056
10     877
11     515
12     109
13      80
Name: leiden, dtype: int64

## Define Cell Types

<a id="DE"></a>

### Differentially expressed Genes

In [32]:
sc.tl.rank_genes_groups(adata_proc, 
                        groupby='leiden',
                        key_added='rank_genes'
)
if bool_plot==True:
    sc.pl.rank_genes_groups(adata_proc,
                            key='rank_genes',
                            groups=['0','1','2'],
                            save="_all_cells_1.png"
    )
    sc.pl.rank_genes_groups(adata_proc, 
                            key='rank_genes', 
                            groups=['3','4','5'],
                            save="_all_cells_2.png"
    )
    sc.pl.rank_genes_groups(adata_proc, 
                            key='rank_genes', 
                            groups=['6','7','8'],
                            save="_all_cells_3.png"
    )
    sc.pl.rank_genes_groups(adata_proc,
                            key='rank_genes',
                            groups=['9','10','11'], 
                            save="_all_cells_4.png"
    )
    sc.pl.rank_genes_groups(adata_proc,
                            key='rank_genes',
                            groups=['12'], 
                            save="_all_cells_5.png"
    )

ranking genes
    finished: added to `.uns['rank_genes']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:04)


### Define Marker Sets

Define marker sets for some of the expected cell types and add DE genes.

In [33]:
astrocyte_markers=['Slc1a2','Slc1a3','Aqp4', 'S100b','Gfap','Aldh1l1',
                   'Gja1','Gjb6','Agt','Atp1b2'] # , 'Sox9'
neuron_markers=['Rbfox3','Syp', 'Tubb3','Snap25','Syt1']
microglia_markers=['Itgam','Tmem119','Cx3cr1','Csf1r','Aif1','P2ry12']
oligodendrocyte_markers=['Olig1','Mog','Mag']
endothelial_markers=['Cldn5', 'Pecam1','Slco1c1']
mural_markers=['Mustn1','Pdgfrb','Des']
ependymal_markers=['Ccdc153','Rarres2','Hdc','Tm4sf1'] 
tanycyes_markers=['Rax','Lhx2','Col23a1','Slc16a2','Crym','Adm']
npc_markers=['Nes','Sox2','Notch1','Pax6','Prom1']
vlmc_markers=['Lum','Col1a1','Col3a1']

Only keep markers occurring in data set.

In [34]:
astrocyte_markers=np.array([x for x in astrocyte_markers if x in adata_proc.var_names])
neuron_markers=np.array([x for x in neuron_markers if x in adata_proc.var_names])
microglia_markers=np.array([x for x in microglia_markers if x in adata_proc.var_names])
oligodendrocyte_markers=np.array([x for x in oligodendrocyte_markers if x in adata_proc.var_names])
endothelial_markers=np.array([x for x in endothelial_markers if x in adata_proc.var_names])
mural_markers=np.array([x for x in mural_markers if x in adata_proc.var_names])
ependymal_markers=np.array([x for x in ependymal_markers if x in adata_proc.var_names])
tanycyes_markers=np.array([x for x in tanycyes_markers if x in adata_proc.var_names])
npc_markers=np.array([x for x in npc_markers if x in adata_proc.var_names])
vlmc_markers=np.array([x for x in vlmc_markers if x in adata_proc.var_names])
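
The same filtering can be expressed once over a dictionary of marker lists instead of once per variable. This is only a sketch of the idea: `marker_sets` and `var_names` are stand-ins, and `"NotAGene"` is a made-up gene name used to show the effect.

```python
marker_sets = {
    "astrocyte": ["Slc1a2", "Gfap", "Aldh1l1", "NotAGene"],  # "NotAGene" is made up
    "vlmc": ["Lum", "Col1a1"],
}
var_names = {"Slc1a2", "Gfap", "Aldh1l1", "Lum"}  # stand-in for adata_proc.var_names

# Keep only the markers that are present in the data set, preserving their order.
present = {
    cell_type: [gene for gene in genes if gene in var_names]
    for cell_type, genes in marker_sets.items()
}
```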

Plot violin plots of the markers for the cell types of interest.

### Astrocyte Markers

In [35]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          astrocyte_markers.tolist(),
                          save="_all_cells_astrocyte_markers",
                          use_raw=False
    )

### Neuron Markers

In [36]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc, 
                          neuron_markers.tolist(),
                          save="_all_cells_neuron_markers"
    )

### Microglia Markers

In [37]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          microglia_markers.tolist(),
                          save="_all_cells_microglia_markers"
    )

### Oligodendrocytes Markers

In [38]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc, 
                          oligodendrocyte_markers.tolist(),
                          save="_all_cells_oligodendrocyte_markers"
    )

### Endothelial Markers

In [39]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          endothelial_markers.tolist(),
                          save="_all_cells_endothelial_markers"
    )

### Mural Markers

In [40]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          mural_markers.tolist(), 
                          save="_all_cells_mural_markers"
    )

### Ependymal Markers

In [41]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          ependymal_markers.tolist(), 
                          save="_all_cells_ependymal_markers"
    )

### Tanycyte Markers

In [42]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc, 
                          tanycyes_markers.tolist(),
                          save="_all_cells_tanycyes_markers")

### Neural Progenitor Cells Markers

In [43]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          npc_markers.tolist(),
                          save="_all_cells_npc_markers"
    )

### VLMC Markers

In [44]:
if bool_plot == True:
    cf.plot_violin_marker(adata_proc,
                          vlmc_markers.tolist(),
                          save="_all_cells_vlmc_markers"
    )

## Summary heatmap, dotplot and stacked_violin for cluster assignments

In [45]:
selected_astrocyte_markers=['Slc1a2','Slc1a3','Aqp4', 'S100b', 'Sox9','Gfap',
                            'Aldh1l1','Gja1','Gjb6','Agt','Atp1b2']
selected_neuron_markers=['Rbfox3','Syp','Tubb3','Snap25','Syt1']
selected_microglia_markers=['Itgam','Tmem119','Cx3cr1','Csf1r','Aif1','P2ry12']
selected_oligodendrocyte_markers=['Olig1','Mog','Mag']
selected_endothelial_markers=['Cldn5','Pecam1','Slco1c1']
selected_mural_markers=['Mustn1','Pdgfrb','Des']
selected_ependymal_markers=['Ccdc153','Rarres2','Hdc','Tm4sf1'] 
selected_tanycyes_markers=['Rax','Lhx2','Col23a1','Slc16a2','Crym','Adm']
selected_npc_markers=['Nes','Sox2','Notch1','Pax6','Prom1']
selected_vlmc_markers=['Lum','Col1a1','Col3a1']

In [46]:
selected_cell_markers=selected_astrocyte_markers + \
selected_neuron_markers + \
selected_microglia_markers + \
selected_oligodendrocyte_markers + \
selected_endothelial_markers + \
selected_mural_markers + \
selected_ependymal_markers + \
selected_tanycyes_markers + \
selected_npc_markers + \
selected_vlmc_markers

In [47]:
marker_genes_dict = {'Astrocytes': ['Slc1a2', 'Slc1a3', 'Aqp4', 'S100b', 'Sox9','Gfap',
                                    'Aldh1l1','Gja1','Gjb6','Agt','Atp1b2'],
                     'Endothelial cells': ['Cldn5', 'Pecam1','Slco1c1'],
                     'Ependymal cells': ['Ccdc153','Rarres2','Hdc','Tm4sf1'],
                     'Microglia': ['Itgam','Tmem119','Cx3cr1','Csf1r','Aif1','P2ry12'],
                     'Mural cells': ['Mustn1','Pdgfrb','Des'],
                     'Neurons': ['Rbfox3','Syp', 'Tubb3','Snap25','Syt1'],
                     'Oligodendrocytes': ['Olig1','Mog','Mag'],
                     'Tanycytes': ['Rax','Lhx2','Col23a1','Slc16a2','Crym','Adm'],
                     'VLMCs': ['Lum','Col1a1','Col3a1']}

neuron={'ARC_neuro': ['Pomc','Cartpt','Npy','Agrp','Cited1','Tbx3']}

In [48]:
# Sort the marker genes of every cell type alphabetically
for genes in marker_genes_dict.values():
    genes.sort()

Plot heatmaps in which clusters are plotted against the cell type and neuron marker genes.

In [49]:
if bool_plot==True:
    sc.pl.heatmap(adata=adata_proc, 
                  var_names=marker_genes_dict, 
                  groupby="leiden", 
                  use_raw=False, 
                  log=True, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_all_cells_celltypes_markers.png"
    )

In [50]:
if bool_plot==True:
    sc.pl.heatmap(adata=adata_proc, 
                  var_names=neuron, 
                  groupby="leiden", 
                  use_raw=False, 
                  log=True, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_all_cells_ARCneuro_markers.png"
    )

Plot dot plots in which clusters are plotted against the cell type and neuron marker genes.

In [51]:
#####################################################################################################################
if bool_plot==True:
    sc.pl.dotplot(adata=adata_proc,
                  var_names=marker_genes_dict, 
                  groupby='leiden',
                  use_raw=False, 
                  log=False, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show=True, 
                  #size_title=5,
                  save="all_cells_celltypes_markers.pdf"
    )

In [52]:
if bool_plot==True:
    sc.pl.dotplot(adata=adata_proc,
                  var_names=neuron, 
                  groupby='leiden',
                  use_raw=False, 
                  log=False, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show=True, 
                  save="all_cells_ARCneuro_markers.png"
    )

Plot the stacked violin plot in which clusters are plotted against the cell type marker genes.

In [53]:
if bool_plot==True:
    sc.pl.stacked_violin(adata_proc, 
                         marker_genes_dict, 
                         groupby='leiden', 
                         show=True,
                         use_raw=False,
                         dendrogram=True,
                         cmap='viridis_r',
                         save="all_cells_celltypes_markers.png"
    )

Identify the cell types using differentially expressed genes.

In [54]:
if bool_plot==True:
    plt.figure(figsize=(7,7))
    cell_annotation = sc.tl.marker_gene_overlap(adata_proc,
                                                marker_genes_dict, 
                                                key='rank_genes', 
                                                normalize='data'
    )
    sb.heatmap(cell_annotation,
               cbar=False,
               annot=True
    )
    plt.savefig(sc_settings_figdir+'heatmap_all_cells_rank_genes_cell_annotation.png')
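
The idea behind the overlap heatmap can be illustrated with plain sets. The sketch below is a simplified stand-in, not scanpy's implementation: the cluster marker lists are hypothetical, and dividing by the number of cluster markers is an assumption meant to mirror what `normalize='data'` is documented to do.

```python
reference = {
    "Astrocytes": {"Gfap", "Aldh1l1", "Aqp4"},
    "Microglia": {"Itgam", "Cx3cr1"},
}
# Hypothetical top-ranked genes per cluster.
cluster_markers = {
    "0": ["Gfap", "Aqp4", "Slc1a3", "Agt"],
    "1": ["Cx3cr1", "Itgam", "Csf1r", "Aif1"],
}

# Fraction of each cluster's markers that belong to each reference set.
overlap = {
    cell_type: {
        cluster: len(ref_genes & set(genes)) / len(genes)
        for cluster, genes in cluster_markers.items()
    }
    for cell_type, ref_genes in reference.items()
}
# Reference cell type with the largest overlap for each cluster.
best = {c: max(reference, key=lambda ct: overlap[ct][c]) for c in cluster_markers}
```

A cluster is then tentatively labelled with the cell type whose reference markers it shares most, which is how the annotated heatmap is usually read.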

## Gfap vs. Aldh1l1 - per diet

Visualise Gfap versus Aldh1l1 populations per diet.

### chow diet cells

In [55]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_proc[adata_proc.obs['diet'] == 'chow', ][:, ['Gfap']].X, 
                           adata_proc[adata_proc.obs['diet'] == 'chow', ][:, ['Aldh1l1']].X, 
                           s=9, 
                           cmap='seismic',
                           c='dodgerblue'
    )
    ax1.set_title('chow')
    ax1.set_xlabel('Gfap')      # x values are Gfap expression
    ax1.set_ylabel('Aldh1l1')   # y values are Aldh1l1 expression
    ax2_dict = sc.pl.umap(adata_proc[adata_proc.obs['diet'] == 'chow', ],
                          color=['leiden'],
                          size=5, 
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'umap_all_cells_chow_gfap-aldh_1')

In [56]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc[adata_proc.obs['diet'] == 'chow', ],
                        ['Gfap', 'Aldh1l1'],
                        color_map=mymap, 
                        size=20, 
                        save="_all_cells_chow_gfap-aldh.png",
                        use_raw=False
    )

### hfd_5 diet cells

In [57]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_proc[adata_proc.obs['diet'] == 'hfd_5', ][:, ['Gfap']].X, 
                           adata_proc[adata_proc.obs['diet'] == 'hfd_5', ][:, ['Aldh1l1']].X, 
                           s=9,
                           cmap='seismic', 
                           c='darkorange'
    )
    ax1.set_title('hfd_5')
    # Gfap is passed as the first (x) argument of scatter, Aldh1l1 as the second (y)
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_proc[adata_proc.obs['diet'] == 'hfd_5', ],
                          color=['leiden'], 
                          size=5, 
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'umap_all_cells_hfd5_gfap-aldh_1.png')

In [58]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc[adata_proc.obs['diet'] == 'hfd_5', ],
                        ['Gfap', 'Aldh1l1'],
                        color_map=mymap,
                        size=20,
                        save="_all_cells_hfd5_gfap-aldh.png",
                        use_raw=False
    )

### hfd_15 diet cells

In [59]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_proc[adata_proc.obs['diet'] == 'hfd_15', ][:, ['Gfap']].X, 
                           adata_proc[adata_proc.obs['diet'] == 'hfd_15', ][:, ['Aldh1l1']].X, 
                           s=9,
                           cmap='seismic',
                           c='green'
    )
    ax1.set_title('hfd_15')
    # Gfap is passed as the first (x) argument of scatter, Aldh1l1 as the second (y)
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_proc[adata_proc.obs['diet'] == 'hfd_15', ],
                          color=['leiden'],
                          size=5,
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'umap_all_cells_hfd15_gfap-aldh_1.png')

In [60]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc[adata_proc.obs['diet'] == 'hfd_15', ], 
                        ['Gfap', 'Aldh1l1'],
                        color_map=mymap,
                        size=20,
                        save="_all_cells_hfd15_gfap-aldh.png", 
                        use_raw=False
    )
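
The scatter plots above pass `adata[...][:, ['Gfap']].X` directly to matplotlib. Depending on earlier preprocessing, `.X` may be a dense `(n, 1)` array or a sparse column, so a small conversion step can make the input explicit. The helper name `as_1d` is hypothetical and the sketch only assumes that sparse matrices expose `.toarray()`.

```python
import numpy as np


def as_1d(x):
    """Return a flat NumPy vector from a dense (n, 1) array or a sparse column."""
    if hasattr(x, "toarray"):  # SciPy sparse matrices expose .toarray()
        x = x.toarray()
    return np.asarray(x).ravel()


# Toy dense column standing in for adata[:, ['Gfap']].X
col = np.array([[0.0], [1.5], [2.0]])
print(as_1d(col))
```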

## UMAP with assigned cell types

Plot the UMAP projection with the assigned cell type per cluster

In [61]:
new_cluster_names = {'0': "Astrocytes",
                     '1': "Tanycytes",
                     '2': "Ependymal cells",
                     '3': "Neurons",
                     '4': "Endothelial cells",
                     '5': "Oligodendrocytes",
                     '6': "Microglia",
                     '7': "Oligodendrocytes",
                     '8': "Neurons",
                     '9': "Microglia",
                     '10': "Oligodendrocytes",
                     '11': "Mural cells",
                     '12': "VLMCs",
                     '13': "Oligodendrocytes"
}
adata_proc.obs['celltypes'] = [new_cluster_names[x] for x in  adata_proc.obs['leiden']]
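
The list comprehension above raises a bare `KeyError` if a cluster has no entry in the dictionary. A minimal sketch of the same mapping with an explicit check, using a toy label vector and a shortened, illustrative mapping (not the full one from this notebook):

```python
import pandas as pd

# Toy cluster labels stored as strings, as Leiden clustering does
leiden = pd.Series(["0", "3", "8", "6", "0"], dtype="category")

# Shortened mapping; several clusters may share one cell type
cluster_to_celltype = {"0": "Astrocytes", "3": "Neurons", "8": "Neurons", "6": "Microglia"}

# Fail early, with a readable message, if a cluster has no assigned name
missing = set(leiden.cat.categories) - set(cluster_to_celltype)
if missing:
    raise KeyError(f"clusters without a cell type: {sorted(missing)}")

celltypes = leiden.astype(str).map(cluster_to_celltype).astype("category")
print(celltypes.value_counts().sort_index())
```

Storing the result as a categorical keeps it compatible with plotting functions that colour by category.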

In [62]:
#####################################################################################################################
if bool_plot == True:
    cf.plot_umap_marker(adata_proc,
                        ['celltypes'],
                        save="_all_cells_celltypes.png",
                        use_raw=False, 
                        frameon=False,
                        size=20,
                        title=''
    )

In [63]:
if bool_plot == True:
    cf.plot_umap_marker(adata_proc,
                        ['celltypes'],
                        save="_all_cells_celltypes_ondata.png",
                        use_raw=False, 
                        legend_loc='on data'
    )

In [64]:
if bool_plot == True:
    sc.pl.paga(adata_proc,
               save="_all_cells_celltypes.png"
    )

Plot the number of differentially expressed genes per cell type, comparing each HFD time point against chow

In [65]:
######################################################################################################################
if bool_plot == True:
    adata_proc.obs['diet_leiden'] = adata_proc.obs['diet'].str.cat(adata_proc.obs['leiden'],
                                                                   sep='_'
    )
    adata_proc.obs['diet_celltypes'] = adata_proc.obs['diet'].str.cat(adata_proc.obs['celltypes'],
                                                                      sep='_'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_5_Astrocytes'], 
                            reference='chow_Astrocytes', 
                            key_added="ct5_ast",
                            method='t-test'
    ) #  wilcoxon
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes', 
                            groups=['hfd_5_Endothelial cells'],
                            reference='chow_Endothelial cells',
                            key_added="ct5_end",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_5_Ependymal cells'],
                            reference='chow_Ependymal cells',
                            key_added="ct5_epe",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc, 
                            'diet_celltypes',
                            groups=['hfd_5_Microglia'],
                            reference='chow_Microglia',
                            key_added="ct5_mic",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_5_Mural cells'],
                            reference='chow_Mural cells', 
                            key_added="ct5_mur", 
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_5_Neurons'],
                            reference='chow_Neurons',
                            key_added="ct5_neu",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc, 
                            'diet_celltypes',
                            groups=['hfd_5_Oligodendrocytes'],
                            reference='chow_Oligodendrocytes',
                            key_added="ct5_oli",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_5_Tanycytes'],
                            reference='chow_Tanycytes',
                            key_added="ct5_tan",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes', 
                            groups=['hfd_5_VLMCs'],
                            reference='chow_VLMCs',
                            key_added="ct5_vlmc", 
                            method='t-test'
    )

    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_Astrocytes'], 
                            reference='chow_Astrocytes',
                            key_added="ct15_ast", 
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes', 
                            groups=['hfd_15_Endothelial cells'],
                            reference='chow_Endothelial cells',
                            key_added="ct15_end",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_Ependymal cells'], 
                            reference='chow_Ependymal cells',
                            key_added="ct15_epe", 
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_Microglia'],
                            reference='chow_Microglia',
                            key_added="ct15_mic",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_Mural cells'],
                            reference='chow_Mural cells',
                            key_added="ct15_mur", 
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_Neurons'],
                            reference='chow_Neurons',
                            key_added="ct15_neu",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc, 
                            'diet_celltypes',
                            groups=['hfd_15_Oligodendrocytes'],
                            reference='chow_Oligodendrocytes',
                            key_added="ct15_oli",
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_Tanycytes'], 
                            reference='chow_Tanycytes',
                            key_added="ct15_tan", 
                            method='t-test'
    )
    sc.tl.rank_genes_groups(adata_proc,
                            'diet_celltypes',
                            groups=['hfd_15_VLMCs'],
                            reference='chow_VLMCs', 
                            key_added="ct15_vlmc", 
                            method='t-test'
    )

    clus=['ct5_ast','ct5_end','ct5_epe','ct5_mic','ct5_mur','ct5_neu','ct5_oli','ct5_tan','ct5_vlmc']
    ct0_5=[]
    for i in clus:
        result=adata_proc.uns[i]
        groups=result['names'].dtype.names
        df = pd.DataFrame(
             {group + '_' + key[:1]: result[key][group]
             for group in groups for key in ['pvals_adj']})
        ct0_5.append(sum(df.iloc[:, 0] < 0.05))


    clus=['ct15_ast','ct15_end','ct15_epe','ct15_mic','ct15_mur','ct15_neu','ct15_oli','ct15_tan','ct15_vlmc']
    ct0_15=[]
    for i in clus:
        result=adata_proc.uns[i]
        groups=result['names'].dtype.names
        df = pd.DataFrame(
             {group + '_' + key[:1]: result[key][group]
             for group in groups for key in ['pvals_adj']})
        ct0_15.append(sum(df.iloc[:, 0] < 0.05))
        
    plt.rcParams["figure.figsize"]=(7,7)

    d={'0_5':ct0_5, '0_15':ct0_15}
    df=pd.DataFrame(d)

    df['0_0']=[0,0,0,0,0,0,0,0,0]
    df = df.reindex(['0_0','0_5','0_15'],
                    axis=1
    )
    # One line per cell type (rows of df), drawn across the three diet time points
    x_labels = ['SD', '5d HFHS diet', '15d HFHS diet']
    for i in range(len(df)):
        line_chart1 = plt.plot(x_labels,
                               df.iloc[i],
                               linewidth=3
        )

    plt.ylabel("Number of DEGs")
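
The counting step in the cell above can be condensed into a small helper. The sketch below assumes that each result stores `pvals_adj` as a NumPy structured array with one field per tested group; `count_degs` and the toy values are hypothetical and serve only to illustrate the thresholding.

```python
import numpy as np


def count_degs(result, alpha=0.05):
    """Number of genes with adjusted p-value below alpha, per tested group."""
    pvals_adj = result["pvals_adj"]
    return {
        group: int(np.sum(pvals_adj[group] < alpha))
        for group in pvals_adj.dtype.names
    }


# Toy stand-in for one entry of adata_proc.uns (values are invented)
toy_result = {
    "pvals_adj": np.array(
        [(0.001,), (0.04,), (0.2,), (0.9,)],
        dtype=[("hfd_5_Astrocytes", "f8")],
    )
}

print(count_degs(toy_result))
```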

# Astrocytes

<a id="Embedding"></a>

## Embedding and Clustering

In [66]:
if bool_recomp == True:  
    # Cell type labels are capitalised in new_cluster_names ("Astrocytes")
    cell_ids_astro = np.asarray(adata_proc.obs_names)[
        [x in ['Astrocytes'] 
         for x in np.asarray(adata_proc.obs['celltypes'].values)]
    ]
    adata_astro=adata_raw[cell_ids_astro,:].copy()  # adata_raw
    #dat = pd.DataFrame(adata_proc.X, index=adata_proc.obs.index, columns=adata_proc.var.index)
    adata_astro.obs['n_genes'] = (adata_astro.X > 0).sum(1)
    adata_astro.obs['n_counts'] = adata_astro.X.sum(1)
    mt_gene_mask=[gene.startswith('mt-') for gene in adata_astro.var_names]
    temp_mt_sum = adata_astro[:,mt_gene_mask].X.sum(1)
    temp_mt_sum = np.squeeze(np.asarray(temp_mt_sum))
    temp_n_counts = adata_astro.obs['n_counts']
    adata_astro.obs['mt_frac'] = temp_mt_sum/adata_astro.obs['n_counts']
    adata_astro.raw = adata_astro
    sc.pp.normalize_per_cell(adata_astro)
    sc.pp.log1p(adata_astro)
    sc.pp.highly_variable_genes(adata_astro, 
                                n_top_genes=4000
    )
    sc.pl.highly_variable_genes(adata_astro)
    adata_astro.X = adata_astro.X.toarray()
    
    sc.pp.pca(adata_astro,
              n_comps=50,
              use_highly_variable=True,
              random_state=0, 
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_astro,
                    n_neighbors=100,
                    knn=True,
                    method='umap',
                    n_pcs=50, 
                    random_state=0
    )
    sc.tl.umap(adata_astro)
    if bool_recluster == True:
        sc.tl.leiden(adata_astro,
                     resolution=0.5
        )
        pd.DataFrame(adata_astro.obs).to_csv(path_or_buf=sc_settings_writedir+'obs_adata_astro.csv')
    else:
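        # Read the stored clustering with the cell barcodes as index so labels align with obs_names
        obs = pd.read_csv(sc_settings_writedir+'obs_adata_astro.csv', index_col=0)
        adata_astro.obs['leiden'] = pd.Categorical(
            obs.loc[adata_astro.obs_names, 'leiden'].astype(str).values
        )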
    sc.write(sc_settings_writedir+'adata_astro.h5ad', adata_astro)
else:
    adata_astro = sc.read(sc_settings_writedir+'adata_astro.h5ad') 
sc.tl.paga(adata_astro)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)
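
For reference, the per-cell quality metrics recomputed at the start of the cell above (`n_genes`, `n_counts`, `mt_frac`) reduce to a few array operations. A minimal sketch on an invented dense count matrix, assuming mitochondrial genes are identified by the `mt-` prefix as in the notebook:

```python
import numpy as np

# Toy count matrix: 3 cells x 4 genes (invented values)
counts = np.array([[5, 0, 2, 3],
                   [0, 0, 1, 9],
                   [4, 4, 0, 2]])
gene_names = np.array(["Gfap", "Aldh1l1", "mt-Co1", "mt-Nd1"])

n_genes = (counts > 0).sum(axis=1)   # genes detected per cell
n_counts = counts.sum(axis=1)        # total counts per cell
mt_mask = np.char.startswith(gene_names, "mt-")
mt_frac = counts[:, mt_mask].sum(axis=1) / n_counts  # mitochondrial fraction

print(n_genes, n_counts, mt_frac)
```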


In [67]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro, 
                        ['leiden'],
                        save="_astrocyte_leiden.png", 
                        use_raw=False,
                        size=30
    )

In [68]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro,
                        ['leiden'],
                        save="_astrocyte_leiden_ondata.png",
                        use_raw=False, 
                        legend_loc='on data',
                        size=30
    )

In [69]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro,
                        ['n_genes', 'n_counts', 'mt_frac'],
                        color_map=mymap,
                        size=30,
                        save="_astrocyte_n_gene_count_mt.png",
                        use_raw=False
    )

Plot the expression of Gfap and Aldh1l1 in astrocytes

In [70]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro, 
                        ['Gfap','Aldh1l1'],
                        color_map=mymap, 
                        size=30, 
                        save="_astrocyte_markers.png",
                        use_raw=False
    )

In [71]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro,
                        ['diet'],
                        save="_astrocytes_diet.png",
                        size=30, 
                        use_raw=False
    )

In [72]:
if bool_plot == True:
    sc.pl.paga(adata_astro,
               save="_astrocytes.png"
    )

Visualize the marker and cluster populations in astrocytes

In [73]:
if bool_plot == True:
    cf.cell_percent(adata_astro, 
                    cluster='leiden',
                    condition='diet',
                    xlabel='clusters', 
                    ylabel='percentage', 
                    title='barplot_astrocytes_diet_per_clusters',
                    save=sc_settings_figdir,
                    table=False
    )

In [74]:
if bool_plot == True:
    aldh_pos = adata_astro.obs_names[np.asarray(adata_astro[:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_astro.obs_names[np.asarray(adata_astro[:,'Gfap'].X).flatten() > 0]
    glast_pos = adata_astro.obs_names[np.asarray(adata_astro[:,'Slc1a3'].X).flatten() > 0]

    matplotlib_venn.venn3([set(aldh_pos),
                           set(gfap_pos),
                           set(glast_pos)],
                          set_labels = ("Aldh1l1", "Gfap", "Slc1a3")
    )
    plt.savefig(sc_settings_figdir+'venndiagram_astrocytes_gfap-aldh-glast.png')

In [75]:
if bool_plot == True:
    aldh_pos = adata_astro.obs_names[np.asarray(adata_astro[:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_astro.obs_names[np.asarray(adata_astro[:,'Gfap'].X).flatten() > 0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"),
                          set_colors=('lime', 'magenta')
    )
    plt.savefig(sc_settings_figdir+'venndiagram_astrocytes_gfap-aldh.png')

## Define Cell Types

### Differentially Expressed Genes

In [76]:
if bool_plot == True:
    sc.tl.rank_genes_groups(adata_astro,
                            groupby='leiden',
                            key_added='rank_genes'
    )
    sc.pl.rank_genes_groups(adata_astro,
                            key='rank_genes',
                            groups=['0','1','2'],
                            save="_astorcytes_1.png"
    )

### Summary heatmap, dotplot and stacked_violin for cluster assignments

In [77]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_astro, 
                  var_names=marker_genes_dict, 
                  groupby="leiden", 
                  use_raw=False, 
                  log=False, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_astrocytes_celltypes.png"
    )

In [78]:
if bool_plot == True:
    sc.pl.dotplot(adata=adata_astro,
                  var_names=marker_genes_dict, 
                  groupby='leiden',
                  use_raw=False, 
                  log=False, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show=True, 
                  save="_astrocytes_celltypes.png"
    )

In [79]:
if bool_plot == True:
    sc.pl.stacked_violin(adata=adata_astro, 
                         var_names=marker_genes_dict, 
                         groupby='leiden', 
                         use_raw=False,
                         dendrogram=True,
                         cmap='viridis_r',
                         show=True,
                         save="_astrocytes_celltypes.png"
    )

In [80]:
if bool_plot == True:
    plt.figure(figsize=(7,7))
    cell_annotation = sc.tl.marker_gene_overlap(adata_astro,
                                                marker_genes_dict, 
                                                key='rank_genes', 
                                                normalize='data'
    )
    sb.heatmap(cell_annotation, 
               cbar=False,
               annot=True
    )
    plt.savefig(sc_settings_figdir+'heatmap_astrocytes_rank_genes_cell_annotation.png')

In [81]:
if bool_plot==True:
    sc.tl.embedding_density(adata_astro,
                            basis='umap',
                            groupby='diet'
    )
    sc.pl.embedding_density(adata_astro,
                            basis='umap',
                            key='umap_density_diet',
                            group=['chow', 'hfd_5', 'hfd_15'], 
                            bg_dotsize=5, 
                            fg_dotsize=30,
                            save="astrocytes.png"
    )

### Count distribution for Aldh1l1 and Gfap

#### Single and double positive counts

Define booleans for single and double positive counts of Gfap and Aldh1l1

In [82]:
non_boolean_int = np.array((adata_astro[:,'Gfap'].X <= 0) & 
                           (adata_astro[:,'Aldh1l1'].X <= 0), 
                           dtype=int
)

gfap_single_boolean = (adata_astro[:,'Gfap'].X > 0) & (adata_astro[:,'Aldh1l1'].X <= 0)
aldh_single_boolean = (adata_astro[:,'Aldh1l1'].X > 0) & (adata_astro[:,'Gfap'].X <= 0) 
single_boolean_int = np.array((gfap_single_boolean | aldh_single_boolean), 
                            dtype=int)*1

gfap_aldh_double_boolean = (adata_astro[:,'Gfap'].X > 0) & (adata_astro[:,'Aldh1l1'].X > 0)
double_boolean_int = np.array((gfap_aldh_double_boolean),
                              dtype=int)*2

non_boolean_int *=0

In [83]:
gfap_single_pos = adata_astro[:,'Gfap'].X[gfap_single_boolean]
aldh_single_pos = adata_astro[:,'Aldh1l1'].X[aldh_single_boolean]

print('Gfap Single Positive ', len(gfap_single_pos))
print('Aldh1l1 Single Positive ', len(aldh_single_pos))

gfap_aldh_double_pos = adata_astro[:,'Gfap'].X[gfap_aldh_double_boolean]

print('Gfap/Aldh1l1 Double Positive', len(gfap_aldh_double_pos))


Gfap Single Positive  683
Aldh1l1 Single Positive  851
Gfap/Aldh1l1 Double Positive  624
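
The classification behind these counts can be sketched on flat vectors. The expression values below are invented; the encoding (0 = non positive, 1 = single positive, 2 = double positive) follows the integer scheme used above.

```python
import numpy as np

# Toy expression vectors for five cells (invented values)
gfap = np.array([0.0, 1.2, 0.0, 2.3, 0.7])
aldh = np.array([0.0, 0.0, 0.9, 1.1, 0.0])

gfap_pos = gfap > 0
aldh_pos = aldh > 0

# 0 = non positive, 1 = single positive, 2 = double positive
n_positive = gfap_pos.astype(int) + aldh_pos.astype(int)

print("Gfap single positive:", int(np.sum(gfap_pos & ~aldh_pos)))
print("Aldh1l1 single positive:", int(np.sum(aldh_pos & ~gfap_pos)))
print("Double positive:", int(np.sum(gfap_pos & aldh_pos)))
```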


<a id="adipmarkers"></a>

In [84]:
adata_astro.obs['sdt_pos'] = np.array((non_boolean_int + single_boolean_int + double_boolean_int),
                                      dtype=str
)
adata_astro.obs['s_pos'] = np.array((np.array(gfap_single_boolean,
                                              dtype=int)*1)+(np.array(aldh_single_boolean,
                                                                                           dtype=int)*2),
                                    dtype=str
)
adata_astro.obs['d_pos'] = np.array((np.array(gfap_aldh_double_boolean, dtype=int)*1),
                                    dtype=str
)

# make them categorical
adata_astro.obs['sdt_pos'] = pd.Series(adata_astro.obs['sdt_pos'],
                                       dtype="category"
)
adata_astro.obs['s_pos'] = pd.Series(adata_astro.obs['s_pos'], 
                                     dtype="category"
)
adata_astro.obs['d_pos'] = pd.Series(adata_astro.obs['d_pos'],
                                     dtype="category"
)

In [85]:
if bool_plot == True:
    new_cluster_names = ['Non Positives', 'Single Positives', 'Double Positives']
    adata_astro.rename_categories('sdt_pos', 
                                  new_cluster_names
    )
    sc.pl.umap(adata_astro,
               color=['sdt_pos'],
               size=30, 
               palette=["lightgrey", "tomato", "blue"], 
               save="_astrocytes_gfap-aldh_single_double_positiv.png"
    )
    new_cluster_names=['Non Single Positives', 'Gfap Single Positives', 'Aldh1l1 Single Positives']
    adata_astro.rename_categories('s_pos',
                                  new_cluster_names
    )
    sc.pl.umap(adata_astro[adata_astro.obs['s_pos'] != 'Non Single Positives'],
               color=['s_pos'],
               size=30, 
               palette=["magenta", "lime"],
               save="_astrocytes_gfap-aldh_single_positiv.png"
    )
    new_cluster_names=['Non Double Positives', 'Gfap/Aldh1l1 Double Positive']
    adata_astro.rename_categories('d_pos', new_cluster_names)
    sc.pl.umap(adata_astro[adata_astro.obs['d_pos'] != 'Non Double Positives'],
               color=['d_pos'],
               size=30,
               save="_astrocytes_gfap-aldh_doble_positiv.png"
    )

Visualize Gfap and Aldh1l1 populations of astrocytes

### Astrocyte chow diet

In [86]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_astro[adata_astro.obs['diet'] == 'chow', ][:, ['Gfap']].X, 
                           adata_astro[adata_astro.obs['diet'] == 'chow', ][:, ['Aldh1l1']].X, 
                           s=9,
                           cmap='seismic',
                           c='dodgerblue'
    )
    ax1.set_title('chow')
    # Gfap is passed as the first (x) argument of scatter, Aldh1l1 as the second (y)
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'chow', ],
                          color=['leiden'],
                          size=30,
                          ax=ax2, 
                          show=False
    )
    plt.savefig(sc_settings_figdir+'gfap-aldh_astrocytes_chow.png')

In [87]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ],
                        ['Gfap', 'Aldh1l1'],
                        save="_astrocyte_chow_gfap-aldh.png", 
                        size=30, 
                        color_map=mymap, 
                        use_raw=False
    )

In [88]:
if bool_plot == True:
    aldh_pos = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs_names[np.asarray(adata_astro[adata_astro.obs['diet'] == 'chow', ][:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_astro[adata_astro.obs['diet'] =='chow', ].obs_names[np.asarray(adata_astro[adata_astro.obs['diet'] == 'chow', ][:,'Gfap'].X).flatten() > 0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"),
                          set_colors=('lime', 'magenta'))
    plt.savefig(sc_settings_figdir+'venn_diagram_astrocytes_chow_gfap-aldh.png')

### Astrocyte hfd_5 diet

In [89]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ][:, ['Gfap']].X, 
                           adata_astro[adata_astro.obs['diet'] == 'hfd_5', ][:, ['Aldh1l1']].X, 
                           s=9, 
                           cmap='seismic',
                           c='darkorange'
    )
    ax1.set_title('hfd_5')
    # Gfap is passed as the first (x) argument of scatter, Aldh1l1 as the second (y)
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                          color=['leiden'], 
                          size=30,
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'gfap-aldh_astrocytes_hfd5.png')

In [90]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                        ['Gfap', 'Aldh1l1'],
                        save="_astrocyte_hfd5_gfap-aldh.png", 
                        size=30, 
                        color_map=mymap,
                        use_raw=False
    )

In [91]:
if bool_plot == True:
    aldh_pos = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ].obs_names[np.asarray(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ][:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ].obs_names[np.asarray(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ][:,'Gfap'].X).flatten() > 0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"),
                          set_colors=('lime', 'magenta')
    )
    plt.savefig(sc_settings_figdir+'venn_diagram_astrocytes_hfd5_gfap-aldh.png')

### Astrocyte hfd_15 diet

In [92]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ][:, ['Gfap']].X, 
                           adata_astro[adata_astro.obs['diet'] == 'hfd_15', ][:, ['Aldh1l1']].X, 
                           s=9, 
                           cmap='seismic',
                           c='green'
    )
    ax1.set_title('hfd_15')
    # Gfap is passed as the first (x) argument of scatter, Aldh1l1 as the second (y)
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ],
                          color=['leiden'],
                          size=30,
                          ax=ax2, 
                          show=False
    )
    plt.savefig(sc_settings_figdir+'gfap-aldh_astrocytes_hfd15.png')

In [93]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ], 
                        ['Gfap', 'Aldh1l1'],
                        save="_astrocyte_hfd15_gfap-aldh.png", 
                        size=30,
                        color_map=mymap,
                        use_raw=False
    )

In [94]:
if bool_plot == True:
    aldh_pos = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ].obs_names[np.asarray(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ][:,'Aldh1l1'].X).flatten()>0]
    gfap_pos = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ].obs_names[np.asarray(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ][:,'Gfap'].X).flatten()>0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"),
                          set_colors=('lime', 'magenta')) 
    plt.savefig(sc_settings_figdir+'venn_diagram_astrocytes_hfd15_gfap-aldh.png')

## Gfap, Aldh1l1 and double positive populations

Define the cell populations: Gfap only, Aldh1l1 only, double positive and none

In [95]:
adata_astro.obs['gfap_aldh'] = np.select([((adata_astro[:,'Gfap'].X > 0) & (adata_astro[:,'Aldh1l1'].X == 0)), 
                                          ((adata_astro[:,'Gfap'].X == 0) & (adata_astro[:,'Aldh1l1'].X > 0)),
                                          ((adata_astro[:,'Gfap'].X > 0) & (adata_astro[:,'Aldh1l1'].X > 0)), 
                                          ((adata_astro[:,'Gfap'].X == 0) & (adata_astro[:,'Aldh1l1'].X == 0))],
                                          ['gfap_only', 'aldh_only', 'both', 'none'])

UMAP projection with the pie chart of populations over the diets

### Chow

In [96]:
if bool_plot==True:
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['gfap_aldh'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                     color=['gfap_aldh'],
                     size=30,
                     ax=ax0,
                     show=False,
                     palette=['green', 'blue', 'magenta', 'gainsboro']
    )
    wedges, texts, autotexts = ax1.pie(data,
                                       autopct=lambda pct: func(pct, data),
                                       textprops=dict(color="w"), 
                                       colors=['green', 'blue', 'magenta', 'gainsboro']
    )
    ax1.set_title('Chow diet cells - markers share')

    fig.savefig(sc_settings_figdir+'umap_pie_chart_astrocyte_chow_gfap-aldh.png')

### HFD_5

In [97]:
if bool_plot==True:
    data = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ].obs['gfap_aldh'].value_counts().sort_index()
    
    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet']=='hfd_5', ], 
                     color=['gfap_aldh'],
                     size=30,
                     ax=ax0,
                     show=False,
                     palette=['green', 'blue', 'magenta', 'gainsboro']
    )
    wedges, texts, autotexts = ax1.pie(data,
                                       autopct=lambda pct: func(pct, data),
                                       textprops=dict(color="w"),
                                       colors=['green', 'blue', 'magenta', 'gainsboro']
    )
    ax1.set_title('HFD_5 diet cells - markers share')

    fig.savefig(sc_settings_figdir+'umap_pie_chart_astrocyte_hfd5_gfap-aldh.png')

### HFD_15

In [98]:
if bool_plot==True:   
    data = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ].obs['gfap_aldh'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ], 
                     color=['gfap_aldh'], 
                     size=30, 
                     ax=ax0,
                     show=False,
                     palette=['green', 'blue', 'magenta', 'gainsboro']
    )
    wedges, texts, autotexts = ax1.pie(data,
                                       autopct=lambda pct: func(pct, data), 
                                       textprops=dict(color="w"),
                                       colors=['green', 'blue', 'magenta', 'gainsboro']
    )
    ax1.set_title('HFD_15 diet cells - markers share')

    fig.savefig(sc_settings_figdir+'umap_pie_chart_astrocyte_hfd15_gfap-aldh.png')

In [99]:
if bool_plot == True:
    cf.cell_percent(adata_astro, 
                    cluster='diet',
                    condition='gfap_aldh',
                    xlabel='clusters',
                    ylabel='percentage', 
                    title='barplot_per_diet_astrocytes_gfap-aldh',
                    save=sc_settings_figdir, 
                    table=False
    )

# RNA Velocity

## All cells


RNA velocity of all cells, plotted on the UMAP with the cell types indicated

In [100]:
if bool_plot == True:
    adata_loom0 = scv.read(dir_scv+'MUC26030/possorted_genome_bam_Z0I20.loom',
                           cache=True
    )
    adata_loom5 = scv.read(dir_scv+'MUC26031/possorted_genome_bam_VXMFJ.loom',
                           cache=True
    )
    adata_loom15 = scv.read(dir_scv+'MUC26032/possorted_genome_bam_2UK19.loom', 
                            cache=True
    )

### chow

In [101]:
if bool_plot == True:    
    astro_chow = adata_proc[adata_proc.obs['diet'] == 'chow', ]

    astro_chow_v = scv.utils.merge(astro_chow,
                                   adata_loom0
    )
    scv.pl.proportions(astro_chow_v)
    scv.pp.filter_and_normalize(astro_chow_v,
                                min_shared_counts=20, 
                                n_top_genes=2000
    )
    scv.pp.moments(astro_chow_v,
                   n_pcs=30,
                   n_neighbors=30
    )
    scv.tl.velocity(astro_chow_v)
    scv.tl.velocity_graph(astro_chow_v)
    scv.pl.velocity_embedding_stream(astro_chow_v,
                                     basis='umap',
                                     color='leiden',
                                     use_raw=True
    )

### hfd_5

In [102]:
if bool_plot == True:
    astro_hfd5 = adata_proc[adata_proc.obs['diet'] == 'hfd_5', ]

    astro_hfd5_v = scv.utils.merge(astro_hfd5, 
                                   adata_loom5
    )
    scv.pl.proportions(astro_hfd5_v)
    scv.pp.filter_and_normalize(astro_hfd5_v,
                                min_shared_counts=20,
                                n_top_genes=2000
    )
    scv.pp.moments(astro_hfd5_v,
                   n_pcs=30,
                   n_neighbors=30
    )
    scv.tl.velocity(astro_hfd5_v)
    scv.tl.velocity_graph(astro_hfd5_v)
    scv.pl.velocity_embedding_stream(astro_hfd5_v,
                                     basis='umap',
                                     color='leiden',
                                     use_raw=True
    )

### hfd_15

In [103]:
if bool_plot == True:
    astro_hfd15 = adata_proc[adata_proc.obs['diet'] == 'hfd_15', ]

    astro_hfd15_v = scv.utils.merge(astro_hfd15,
                                    adata_loom15
    )
    scv.pl.proportions(astro_hfd15_v)
    scv.pp.filter_and_normalize(astro_hfd15_v,
                                min_shared_counts=20, 
                                n_top_genes=2000
    )
    scv.pp.moments(astro_hfd15_v, 
                   n_pcs=30,
                   n_neighbors=30
    )
    scv.tl.velocity(astro_hfd15_v)
    scv.tl.velocity_graph(astro_hfd15_v)
    scv.pl.velocity_embedding_stream(astro_hfd15_v,
                                     basis='umap', 
                                     color='leiden',
                                     use_raw=True
    )

## Astrocytes 

RNA velocity is computed for the astrocytes of each diet separately
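
The per-diet cells in this part repeat the same sequence of scVelo calls. A minimal sketch of a shared helper is given here; the function name `run_velocity` and the `dynamical` flag are hypothetical, and the scVelo module is passed in as `scv` only so that the sketch does not import it. In scVelo the dynamical model is selected with `mode='dynamical'` in `scv.tl.velocity`, whereas the cells here call `scv.tl.velocity` with its default mode after `scv.tl.recover_dynamics`; whether that is intended is not stated, so the flag is an assumption.

```python
def run_velocity(adata, loom, scv, dynamical=False):
    """Merge spliced/unspliced counts into `adata` and estimate velocities.

    `scv` is the imported scvelo module (assumed to be supplied by the caller).
    Parameter values mirror those used in the notebook cells.
    """
    merged = scv.utils.merge(adata, loom)
    scv.pp.filter_and_normalize(merged, min_shared_counts=20, n_top_genes=2000)
    scv.pp.moments(merged, n_pcs=30, n_neighbors=30)
    if dynamical:
        # Fit per-gene kinetics first, then request the dynamical model explicitly.
        scv.tl.recover_dynamics(merged)
        scv.tl.velocity(merged, mode='dynamical')
    else:
        # Default (stochastic) velocity estimation.
        scv.tl.velocity(merged)
    scv.tl.velocity_graph(merged)
    return merged
```

With such a helper, each diet would reduce to one call, for example `run_velocity(astro_chow, adata_loom0, scv)`, followed by the plotting call.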

### chow

In [104]:
if bool_plot == True:
    astro_chow = adata_astro[adata_astro.obs['diet'] == 'chow', ]

    astro_chow_v = scv.utils.merge(astro_chow,
                                   adata_loom0
    )
    scv.pl.proportions(astro_chow_v)
    scv.pp.filter_and_normalize(astro_chow_v,
                                min_shared_counts=20, 
                                n_top_genes=2000
    )
    scv.pp.moments(astro_chow_v, 
                   n_pcs=30, 
                   n_neighbors=30
    )
    scv.tl.recover_dynamics(astro_chow_v)
    scv.tl.velocity(astro_chow_v)
    scv.tl.velocity_graph(astro_chow_v)
    scv.pl.velocity_embedding_stream(astro_chow_v,
                                     basis='umap',
                                     color='leiden',
                                     use_raw=False, 
                                     size=60, 
                                     legend_loc='right margin'
    ) # palette=['green','blue','magenta','silver'],

In [105]:
if bool_plot == True:
    top_genes = astro_chow_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro_chow_v,
                                groupby='leiden'
    )
    df = scv.get_df(astro_chow_v,
                    'rank_dynamical_genes/names'
    )
    df.head(10)

In [106]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_chow_v,
                         top_genes[1:10],
                         groupby='leiden',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="",
                         swap_axes=True
    )
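
The slice `top_genes[1:10]` used for the violin plot selects nine genes and skips the gene with the highest `fit_likelihood`. Whether that is intended is not stated. The toy list here (placeholder gene names, not results) only shows the difference between the two slices.

```python
# Placeholder names, ordered from best to worst fit (illustration only).
ranked = [f"gene{i}" for i in range(1, 13)]

subset = ranked[1:10]   # positions 1..9: nine genes, the best-ranked one is skipped
top_ten = ranked[:10]   # positions 0..9: the ten best-ranked genes

print(len(subset), subset[0])
print(len(top_ten), top_ten[0])
```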

### hfd_5

In [107]:
if bool_plot == True:
    astro_hfd5 = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ]

    astro_hfd5_v = scv.utils.merge(astro_hfd5,
                                   adata_loom5
    )
    scv.pl.proportions(astro_hfd5_v)
    scv.pp.filter_and_normalize(astro_hfd5_v,
                                min_shared_counts=20,
                                n_top_genes=2000
    )
    scv.pp.moments(astro_hfd5_v,
                   n_pcs=30, 
                   n_neighbors=30
    )
    scv.tl.recover_dynamics(astro_hfd5_v)
    scv.tl.velocity(astro_hfd5_v)
    scv.tl.velocity_graph(astro_hfd5_v)
    scv.pl.velocity_embedding_stream(astro_hfd5_v,
                                     basis='umap',
                                     color='leiden',
                                     use_raw=False, 
                                     size=60, 
                                     legend_loc='right margin'
    )

In [108]:
if bool_plot == True:
    top_genes = astro_hfd5_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro_hfd5_v,
                                groupby='leiden'
    )
    df = scv.get_df(astro_hfd5_v, 
                    'rank_dynamical_genes/names'
    )
    df.head(10)

In [109]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_hfd5_v,
                         top_genes[1:10],
                         groupby='leiden',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="",
                         swap_axes=True
    )

### hfd_15

In [110]:
if bool_plot == True:
    astro_hfd15 = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ]

    astro_hfd15_v = scv.utils.merge(astro_hfd15,
                                    adata_loom15
    )
    scv.pl.proportions(astro_hfd15_v)
    scv.pp.filter_and_normalize(astro_hfd15_v,
                                min_shared_counts=20,
                                n_top_genes=2000
    )
    scv.pp.moments(astro_hfd15_v,
                   n_pcs=30,
                   n_neighbors=30
    )
    scv.tl.recover_dynamics(astro_hfd15_v)
    scv.tl.velocity(astro_hfd15_v)
    scv.tl.velocity_graph(astro_hfd15_v)
    scv.pl.velocity_embedding_stream(astro_hfd15_v,
                                     basis='umap', 
                                     color='leiden',
                                     use_raw=True,
                                     size=60, 
                                     legend_loc='right margin'
    )

In [111]:
if bool_plot == True:
    top_genes = astro_hfd15_v.var['fit_likelihood'].sort_values(ascending=False).index

    scv.tl.rank_dynamical_genes(astro_hfd15_v,
                                groupby='leiden'
    )
    df = scv.get_df(astro_hfd15_v,
                    'rank_dynamical_genes/names'
    )
    df.head(10)

In [112]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_hfd15_v,
                         top_genes[1:10],
                         groupby='leiden',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="",
                         swap_axes=True
    )

Plot the percentage of each marker population within each cluster, per diet

In [113]:
if bool_plot == True:
    cf.cell_percent(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                    cluster='leiden',
                    condition='gfap_aldh',
                    xlabel='clusters',
                    ylabel='percentage', 
                    title='',
                    save=sc_settings_figdir,
                    table=False
    )
    cf.cell_percent(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                    cluster='leiden',
                    condition='gfap_aldh',
                    xlabel='clusters',
                    ylabel='percentage', 
                    title='', 
                    save=sc_settings_figdir,
                    table=False
    )
    cf.cell_percent(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ], 
                    cluster='leiden',
                    condition='gfap_aldh',
                    xlabel='clusters',
                    ylabel='percentage', 
                    title='', 
                    save=sc_settings_figdir,
                    table=False
    )
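
`cf.cell_percent` is a project-specific helper whose implementation is not shown here. The quantity it plots can presumably be computed along the following lines; the function `percent_per_cluster` and the toy labels are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def percent_per_cluster(clusters, conditions):
    """Return {cluster: {condition: percentage of the cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1
    return {
        cluster: {cond: 100.0 * n / sum(c.values()) for cond, n in c.items()}
        for cluster, c in counts.items()
    }


# Toy labels for six cells (not data from this analysis).
clusters = ['0', '0', '0', '0', '1', '1']
markers = ['gfap_only', 'gfap_only', 'both', 'none', 'aldh_only', 'aldh_only']
shares = percent_per_cluster(clusters, markers)
print(shares)
```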

RNA velocity computed on all astrocytes, pooled across diets

In [114]:
if bool_plot == True:  
    adata_loom_all = adata_loom0.concatenate([adata_loom5, adata_loom15])
    astro_all_v = scv.utils.merge(adata_astro,
                                  adata_loom_all
    )
    scv.pl.proportions(astro_all_v)
    scv.pp.filter_and_normalize(astro_all_v,
                                min_shared_counts=20,
                                n_top_genes=2000
    )
    scv.pp.moments(astro_all_v,
                   n_pcs=30,
                   n_neighbors=30
    )
    scv.tl.recover_dynamics(astro_all_v)
    scv.tl.velocity(astro_all_v)
    scv.tl.velocity_graph(astro_all_v)
    scv.pl.velocity_embedding_stream(astro_all_v,
                                     basis='umap', 
                                     color='leiden',
                                     use_raw=True,
                                     size=60, 
                                     legend_loc='right margin'
    )

In [115]:
if bool_plot == True:
    cf.cell_percent(adata_astro,
                    cluster='leiden',
                    condition='gfap_aldh',
                    xlabel='clusters',
                    ylabel='percentage', 
                    title='', 
                    save=sc_settings_figdir,
                    table=False
    )

## Per marker and diet

In [116]:
if bool_plot == True:    
    cutoff = statistics.median(adata_astro[adata_astro.obs['gfap_aldh'] == 'gfap_only', 'Gfap'].X)
    adata_astro.obs['gfap_only'] = np.select([((adata_astro[:,'Gfap'].X >= cutoff) & 
                                               (adata_astro[:,'Aldh1l1'].X == 0)),
                                              ((adata_astro[:,'Gfap'].X < cutoff) & 
                                               (adata_astro[:,'Gfap'].X > 0) & 
                                               (adata_astro[:,'Aldh1l1'].X == 0))],
                                              ['h', 'l']
    )
    c = pd.Categorical(adata_astro.obs['gfap_only'])
    adata_astro.obs['gfap_only'] = c.rename_categories({'h': 'gfap_only_high', 
                                                        'l': 'gfap_only_low',
                                                        '0': 'none'}
    )
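
The median split in the cell above can be sketched in plain Python on toy values. The function name `split_high_low` and the example numbers are assumptions. The logic mirrors the cell: the cutoff is the median Gfap expression among Gfap-only cells, cells at or above it are labelled high, the remaining Gfap-only cells low, and everything else `none`.

```python
import statistics


def split_high_low(gfap, aldh):
    """Label Gfap-only cells as high or low relative to their median expression."""
    gfap_only = [g for g, a in zip(gfap, aldh) if g > 0 and a == 0]
    cutoff = statistics.median(gfap_only)
    labels = []
    for g, a in zip(gfap, aldh):
        if g > 0 and a == 0:
            labels.append('gfap_only_high' if g >= cutoff else 'gfap_only_low')
        else:
            labels.append('none')
    return cutoff, labels


# Toy expression values for five cells (illustration only).
cutoff, labels = split_high_low([0.5, 1.5, 2.5, 0.0, 3.0], [0.0, 0.0, 0.0, 1.0, 2.0])
print(cutoff, labels)
```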

In [117]:
if bool_plot == True: 
    adata_astro.obs['gfap_only'] = np.select([((adata_astro[:,'Gfap'].X > 0) & 
                                               (adata_astro[:,'Aldh1l1'].X == 0)), 
                                              ((adata_astro[:,'Gfap'].X == 0) &
                                               (adata_astro[:,'Aldh1l1'].X > 0)),
                                              ((adata_astro[:,'Gfap'].X > 0) &
                                               (adata_astro[:,'Aldh1l1'].X > 0)), 
                                              ((adata_astro[:,'Gfap'].X == 0) &
                                               (adata_astro[:,'Aldh1l1'].X == 0))],
                                              ['gfap_only', 'none', 'none', 'none']
    )

In [118]:
#####################################################################################################################
if bool_plot == True: 
    adata_astro.obs['gfap_only'] = np.select([((adata_astro[:,'Gfap'].X > 0) & 
                                               (adata_astro[:,'Aldh1l1'].X == 0))],
                                              ['gfap_only']
    )
    c=pd.Categorical(adata_astro.obs['gfap_only'])
    adata_astro.obs['gfap_only'] = c.rename_categories({'gfap_only': 'gfap_only',
                                                        '0': 'none'}
    )

In [119]:
if bool_plot == True:
    cutoff=statistics.median(adata_astro[adata_astro.obs['gfap_aldh'] == 'aldh_only', 'Aldh1l1'].X)
    adata_astro.obs['aldh_only'] = np.select([((adata_astro[:,'Aldh1l1'].X >= cutoff) &
                                               (adata_astro[:,'Gfap'].X == 0)),
                                              ((adata_astro[:,'Aldh1l1'].X < cutoff) &
                                               (adata_astro[:,'Aldh1l1'].X > 0) & 
                                               (adata_astro[:,'Gfap'].X == 0))],
                                              ['h', 'l']
    )
    c = pd.Categorical(adata_astro.obs['aldh_only'])
    adata_astro.obs['aldh_only'] = c.rename_categories({'h': 'aldh_only_high', 
                                                        'l': 'aldh_only_low', 
                                                        '0': 'none'}
    )

In [120]:
if bool_plot == True:
    adata_astro.obs['aldh_only'] = np.select([((adata_astro[:,'Gfap'].X > 0) &
                                               (adata_astro[:,'Aldh1l1'].X == 0)), 
                                              ((adata_astro[:,'Gfap'].X == 0) & 
                                               (adata_astro[:,'Aldh1l1'].X > 0)),
                                              ((adata_astro[:,'Gfap'].X > 0) & 
                                               (adata_astro[:,'Aldh1l1'].X > 0)), 
                                              ((adata_astro[:,'Gfap'].X == 0) & 
                                               (adata_astro[:,'Aldh1l1'].X == 0))],
                                              ['none', 'aldh_only', 'none', 'none']
    )

In [121]:
#####################################################################################################################
if bool_plot == True: 
    adata_astro.obs['aldh_only'] = np.select([((adata_astro[:,'Gfap'].X == 0) & 
                                               (adata_astro[:,'Aldh1l1'].X > 0))],
                                             ['aldh_only']
    )
    c=pd.Categorical(adata_astro.obs['aldh_only'])
    adata_astro.obs['aldh_only'] = c.rename_categories({'aldh_only': 'aldh_only',
                                                        '0': 'none'}
    )

In [122]:
if bool_plot == True:    
    cutoff=statistics.median(adata_astro[adata_astro.obs['gfap_aldh'] == 'both', 'Gfap'].X)
    adata_astro.obs['both'] = np.select([((adata_astro[:,'Aldh1l1'].X >= cutoff) & 
                                          (adata_astro[:,'Gfap'].X >= cutoff)),
                                         ((adata_astro[:,'Aldh1l1'].X < cutoff) & 
                                          (adata_astro[:,'Aldh1l1'].X > 0) & 
                                          (adata_astro[:,'Gfap'].X < cutoff) &
                                          (adata_astro[:,'Gfap'].X>0))],
                                        ['h','l']
    )
    c = pd.Categorical(adata_astro.obs['both'])
    adata_astro.obs['both'] = c.rename_categories({'h': 'both_high',
                                                   'l': 'both_low',
                                                   '0': 'none'}
    )

In [123]:
if bool_plot == True:
    adata_astro.obs['both'] = np.select([((adata_astro[:,'Gfap'].X > 0) & (adata_astro[:,'Aldh1l1'].X == 0)), 
                                        ((adata_astro[:,'Gfap'].X == 0) & (adata_astro[:,'Aldh1l1'].X > 0)),
                                        ((adata_astro[:,'Gfap'].X > 0) & (adata_astro[:,'Aldh1l1'].X > 0)), 
                                        ((adata_astro[:,'Gfap'].X == 0) & (adata_astro[:,'Aldh1l1'].X == 0))],
                                        ['none','none','both','none']
    )

In [124]:
#####################################################################################################################
if bool_plot == True: 
    adata_astro.obs['both'] = np.select([((adata_astro[:,'Gfap'].X>0) & (adata_astro[:,'Aldh1l1'].X>0))],
                                        ['both']
    )
    c=pd.Categorical(adata_astro.obs['both'])
    adata_astro.obs['both'] = c.rename_categories({'both': 'both',
                                                   '0': 'none'}
    )

## Common genes

To identify the genes common to the diet and marker populations, an additional filter is applied. 
A gene is kept if it is expressed in at least 60% of the cells of a given diet-marker population. 
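
The 60% rule can be illustrated on a small count matrix. This sketch is not the scanpy implementation; the function name `genes_expressed_in_fraction` and the toy matrix are assumptions. It keeps a gene when the number of cells with a non-zero count is at least the requested fraction of all cells in the group.

```python
def genes_expressed_in_fraction(matrix, gene_names, min_fraction=0.6):
    """Return the genes expressed (count > 0) in at least `min_fraction` of cells.

    `matrix` is a list of per-cell rows, one column per gene.
    """
    n_cells = len(matrix)
    kept = []
    for j, gene in enumerate(gene_names):
        n_expressing = sum(1 for row in matrix if row[j] > 0)
        if n_expressing >= n_cells * min_fraction:
            kept.append(gene)
    return kept


# Five toy cells, three genes: Apoe in 4/5 cells, Lepr in 2/5, Clu in 5/5.
toy = [
    [3, 0, 1],
    [1, 2, 4],
    [0, 0, 2],
    [5, 1, 1],
    [2, 0, 3],
]
print(genes_expressed_in_fraction(toy, ['Apoe', 'Lepr', 'Clu']))
```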

In [125]:
# define the percentage per group
pcent=0.6

### chow

In [126]:
if bool_plot == True: 
    chow_a = adata_astro[(adata_astro.obs['diet']=='chow') & 
                         (adata_astro.obs['gfap_aldh']=='aldh_only'), ]
    chow_g = adata_astro[(adata_astro.obs['diet']=='chow') &
                         (adata_astro.obs['gfap_aldh']=='gfap_only'), ]
    chow_b = adata_astro[(adata_astro.obs['diet']=='chow') &
                         (adata_astro.obs['gfap_aldh']=='both'), ]

    sc.pp.filter_genes(chow_a,
                       min_cells=len(chow_a)*pcent
    )  
    sc.pp.filter_genes(chow_g,
                       min_cells=len(chow_g)*pcent
    )
    sc.pp.filter_genes(chow_b, 
                       min_cells=len(chow_b)*pcent
    )

In [127]:
if bool_plot == True: 
    chow_a.var.to_csv(path_or_buf=dir_tables+"common_chow_aldh.csv", 
                      sep="\t"
    )
    chow_g.var.to_csv(path_or_buf=dir_tables+"common_chow_gfap.csv",
                      sep="\t"
    )
    chow_b.var.to_csv(path_or_buf=dir_tables+"common_chow_both.csv", 
                      sep="\t"
    )

    matplotlib_venn.venn3_unweighted([set(chow_a.var.index),
                                      set(chow_g.var.index),
                                      set(chow_b.var.index)],
                                     set_labels=("Aldh1l1", "Gfap", "Both")
    )
    chow_enrich = gseapy.enrichr(gene_list=chow_a.var.index.intersection(chow_g.var.index).intersection(chow_b.var.index).tolist(),
                                 organism='Mouse',
                                 gene_sets='GO_Biological_Process_2018',
                                 description='pathway',
                                 cutoff=0.05
    )
    gseapy.barplot(chow_enrich.res2d,
                   title='GO_Biological_Process',
                   top_term=15
    )

### HFD_5

In [128]:
if bool_plot == True: 
    hfd5_a = adata_astro[(adata_astro.obs['diet'] == 'hfd_5') &
                         (adata_astro.obs['gfap_aldh'] == 'aldh_only'), ]
    hfd5_g = adata_astro[(adata_astro.obs['diet'] == 'hfd_5') &
                         (adata_astro.obs['gfap_aldh'] == 'gfap_only'), ]
    hfd5_b = adata_astro[(adata_astro.obs['diet'] == 'hfd_5') &
                         (adata_astro.obs['gfap_aldh'] == 'both'), ]

    sc.pp.filter_genes(hfd5_a, 
                       min_cells=len(hfd5_a)*pcent
    ) 
    sc.pp.filter_genes(hfd5_g,
                       min_cells=len(hfd5_g)*pcent
    )
    sc.pp.filter_genes(hfd5_b,
                       min_cells=len(hfd5_b)*pcent
    )

In [129]:
if bool_plot == True: 
    hfd5_a.var.to_csv(path_or_buf=dir_tables+"common_hfd5_aldh.csv",
                      sep="\t"
    )
    hfd5_g.var.to_csv(path_or_buf=dir_tables+"common_hfd5_gfap.csv",
                      sep="\t"
    )
    hfd5_b.var.to_csv(path_or_buf=dir_tables+"common_hfd5_both.csv", 
                      sep="\t"
    )

    matplotlib_venn.venn3_unweighted([set(hfd5_a.var.index),
                                      set(hfd5_g.var.index),
                                      set(hfd5_b.var.index)],
                                     set_labels=("Aldh1l1", "Gfap", "Both"))

    hfd5_enrich = gseapy.enrichr(gene_list=hfd5_a.var.index.intersection(hfd5_g.var.index).intersection(hfd5_b.var.index).tolist(),
                                 organism='Mouse',
                                 gene_sets='GO_Biological_Process_2018',
                                 description='pathway',
                                 cutoff=0.05
    )
    gseapy.barplot(hfd5_enrich.res2d,
                   title='GO_Biological_Process',
                   top_term=15
    )

### HFD_15

In [130]:
if bool_plot == True: 
    hfd15_a = adata_astro[(adata_astro.obs['diet'] == 'hfd_15') &
                          (adata_astro.obs['gfap_aldh'] == 'aldh_only'), ]
    hfd15_g = adata_astro[(adata_astro.obs['diet'] == 'hfd_15') & 
                          (adata_astro.obs['gfap_aldh'] == 'gfap_only'), ]
    hfd15_b = adata_astro[(adata_astro.obs['diet'] == 'hfd_15') & 
                          (adata_astro.obs['gfap_aldh'] == 'both'), ]

    sc.pp.filter_genes(hfd15_a, 
                       min_cells=len(hfd15_a)*pcent
    ) 
    sc.pp.filter_genes(hfd15_g,
                       min_cells=len(hfd15_g)*pcent
    )
    sc.pp.filter_genes(hfd15_b,
                       min_cells=len(hfd15_b)*pcent
    )

In [131]:
if bool_plot == True: 
    hfd15_a.var.to_csv(path_or_buf=dir_tables+"common_hfd15_aldh.csv", 
                       sep="\t"
    )
    hfd15_g.var.to_csv(path_or_buf=dir_tables+"common_hfd15_gfap.csv",
                       sep="\t"
    )
    hfd15_b.var.to_csv(path_or_buf=dir_tables+"common_hfd15_both.csv",
                       sep="\t"
    )

    matplotlib_venn.venn3_unweighted([set(hfd15_a.var.index),
             set(hfd15_g.var.index),
             set(hfd15_b.var.index)
         ], set_labels = ("Aldh1l1", "Gfap", "Both"))

    hfd15_enrich = gseapy.enrichr(gene_list=hfd15_a.var.index.intersection(hfd15_g.var.index).intersection(hfd15_b.var.index).tolist(),
                                 organism='Mouse',
                                 gene_sets='GO_Biological_Process_2018',
                                 description='pathway',
                                 cutoff=0.05
    )
    gseapy.barplot(hfd15_enrich.res2d,
                   title='GO_Biological_Process',
                   top_term=15
    )

Select the genes that are expressed in at least 60% of the cells of at least one marker-diet group

In [132]:
if bool_plot == True: 
    x = np.unique(list(chow_a.var.index)+list(chow_g.var.index)+list(chow_b.var.index)+
                  list(hfd5_a.var.index)+list(hfd5_g.var.index)+list(hfd5_b.var.index)+
                  list(hfd15_a.var.index)+list(hfd15_g.var.index)+list(hfd15_b.var.index))
    print(len(x))
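
The union computed above, and the three-way intersection passed to the enrichment calls, can both be expressed with Python sets. In this sketch the helper `union_and_core` and the toy gene lists are assumptions, not results from the analysis.

```python
def union_and_core(groups):
    """Return (genes present in any group, genes present in every group), sorted."""
    gene_sets = [set(genes) for genes in groups.values()]
    union = sorted(set().union(*gene_sets))
    core = sorted(set.intersection(*gene_sets))
    return union, core


# Toy per-group gene lists (illustration only).
toy_groups = {
    'chow_a': ['Apoe', 'Clu', 'Cpe'],
    'chow_g': ['Apoe', 'Clu', 'Vim'],
    'chow_b': ['Apoe', 'Gfap'],
}
union, core = union_and_core(toy_groups)
print(len(union), core)
```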

# Visualisation of well known marker genes 

Visualisation of marker genes across the populations and diets. The marker genes are literature based. Dot plots and stacked violin plots are used for the visualisation.

In [133]:
if bool_plot == True:
    homeostasis=['Slc2a1','Slc2a2','Slc2a3','Slc2a4','Slc2a6','Slc38a1','Slc38a2','Slc1a5','Slc16a1','Slc16a3',
                 'Slc5a12','Slc1a2','Slc1a3','Slc7a11','Gck','Pdk1','Pdk4','Prkaa1','Ffar4','Cpt1a','Abhd6','Acadvl',
                 'Acadm','Lpl','Pfkfb3','Cyp11a1','Srd5a1','Srd5a2','Tspo','Apoe','Hcar2','Pkm']         
    hormone=['Insr','Igf1r','Lepr','Glp1r','Ghsr','Mc4r','Mc3r','Npy1r','Npy4r']
    gliotransmision=['Srr','Slc17a6','Kcnj10','Phgdh','Slc7a10','Thbs1','P2ry4','P2ry6','Npy','Nfatc3','Slc6a9',
                     'Vamp2','Vamp3','Stx1a','Snap25','Bdnf','Grm5','Nlgn2','Gpc4'] #,'Odn'
    angiogenic=['Vegfa','Vegfb','Vegfc','Angpt1','Angpt2','Epo','Pdgfa','Pdgfb','Fgf2','Ang','Edn1','Nos3','Havcr2']
    ecm=['Vim','Col4a1','Cspg4','Tgfb1','Irf1','Cxcl12','Cxcr4','Sema4d','Cd44','Ncan']#,'Acan','Plaur','Sema3a'
    ucp2thyroid=['Ucp2','Dio2','Dio3','Slc16a2','Slco1c1','Sod1','Cybb','Gpx1','Gpx4','Ddit3','Atf4']    
    iron=['Cp','Tfrc','Fth1','Ftmt','Slc11a2']        
    cellcycle=['E2f2','Pcna','Mcm2','H4c1','Slbp','Rrm2','Rfc4','Ccna2',
               'Cdc25b','Rad21','Cdkn3','Pttg1','Kif11','Plk1','Psat1','Mybl2','Ccnd1']
              #'Ccne1','Ccne2','Bub1','Top2a','Bub1b','Kif2c','Kif4','Aurka','Foxm1','Mki67','Ccnf','Ccnb1','Ccnb2',
    inflamation=['Csf1','Cxcl1','Il4','Il2','Ccl9','Ccl11','Ccl19','Vegfa','Tlr4','Ccl12','Ifng','C5ar2','Il1b',
                 'Il6','C3','Ccl5','Cxcl2','Il1r1','Il6ra','Lcn2']    
                 ##,'Il12p70','Tnfa','Cxcl8','Ccl21a','Csf3','Il10','Il12b','Il3','Il5','Lep','Tnfsf11'
    calcium=['Itpr2','Plcb1','Plcg1','Prkca','Prkcb','Nfatc1','Calm1']
    hedgehog=['Smo','Shh','Ptch1','Ptch2','Gli1','Gli2','Gli3']
    others=['Tbx3','Cited1','Fech','Cpe','Clu']
    a1astro=['H2-T23','Serping1','H2-D1','Ggta1','Iigp1','Gbp2','Fbln5','Fkbp5','Psmb8','Srgn','Amigo2'] # ,'Ugt1a1'
    a2astro=['Clcf1','Tgm1','Ptx3','S100a10','Sphk1','Cd109','Ptgs2','Emp1','Slc10a6','Tm4sf1','B3gnt5','Cd14']

    marker_genes_dict={'homeostasis': ['Slc2a1','Slc2a2','Slc2a3','Slc2a4','Slc2a6','Slc38a1','Slc38a2','Slc1a5',
                                       'Slc16a1','Slc16a3','Slc5a12','Slc1a2','Slc1a3','Slc7a11','Gck','Pdk1','Pdk4',
                                       'Prkaa1','Ffar4','Cpt1a','Abhd6','Acadvl','Acadm','Lpl','Pfkfb3','Cyp11a1',
                                       'Srd5a1','Srd5a2','Tspo','Apoe','Hcar2','Pkm'],
                       'hormone': ['Insr','Igf1r','Lepr','Glp1r','Ghsr','Mc4r','Mc3r','Npy1r','Npy4r'],
                       'gliotransmision': ['Srr','Slc17a6','Kcnj10','Phgdh','Slc7a10','Thbs1','P2ry4','P2ry6','Npy',
                                           'Nfatc3','Slc6a9','Vamp2','Vamp3','Stx1a','Snap25','Bdnf','Grm5','Nlgn2',
                                           'Gpc4'],
                       'angiogenic': ['Vegfa','Vegfb','Vegfc','Angpt1','Angpt2','Epo','Pdgfa','Pdgfb','Fgf2','Ang',
                                      'Edn1','Nos3','Havcr2'],
                       'ecm': ['Vim','Col4a1','Cspg4','Tgfb1','Irf1','Cxcl12','Cxcr4','Plaur','Sema3a','Sema4d',
                               'Cd44','Ncan','Acan'],
                       'ucp2thyroid': ['Ucp2','Dio2','Dio3','Slc16a2','Slco1c1','Sod1','Cybb','Gpx1','Gpx4','Ddit3',
                                       'Atf4'],
                       'iron': ['Cp','Tfrc','Fth1','Ftmt','Slc11a2'],
                       'cellcycle': ['Ccne1','Ccne2','E2f2','Pcna','Mcm2','H4c1','Slbp','Rrm2','Rfc4','Top2a','Ccna2',
                                     'Ccnf','Ccnb1','Ccnb2','Bub1','Bub1b','Cdc25b','Rad21','Cdkn3','Pttg1','Kif11',
                                     'Kif2c','Kif4','Plk1','Aurka','Psat1','Foxm1','Mki67','Mybl2','Ccnd1'],
                       'inflamation': ['Ccl9','Ccl21a','Lep','Csf3','Ccl11','Il12b','Tnfsf11','Ccl19','Csf1','Cxcl1',
                                       'Il4','Il2','Il5','Ccl12','Il10','Il3','Ifng','Vegfa','C5ar2','Il1b','Il6',
                                       'C3','Ccl5','Cxcl2','Tlr4','Il1r1','Il6ra','Lcn2'],
                       'calcium': ['Itpr2','Plcb1','Plcg1','Prkca','Prkcb','Nfatc1','Calm1'],
                       'hedgehog': ['Smo','Shh','Ptch1','Ptch2','Gli1','Gli2','Gli3'],
                       'others': ['Tbx3','Cited1','Fech','Cpe','Clu'],
                       'a1astro': ['H2-T23','Serping1','H2-D1','Ggta1','Iigp1','Gbp2','Fbln5','Ugt1a1','Fkbp5',
                                   'Psmb8','Srgn','Amigo2'],
                       'a2astro': ['Clcf1','Tgm1','Ptx3','S100a10','Sphk1','Cd109','Ptgs2','Emp1','Slc10a6',
                                   'Tm4sf1','B3gnt5','Cd14']}
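
The flat lists above leave out some genes (for example `Plaur`, `Sema3a` and `Acan` in `ecm`) that `marker_genes_dict` still contains. If those genes are absent from the data set, which is only an assumption about why they were commented out, a small filter like the one sketched here could restrict the dictionary to genes that are present. The helper name `present_markers` and the toy inputs are hypothetical.

```python
def present_markers(marker_genes, var_names):
    """Split a marker dictionary into genes found in `var_names` and genes missing."""
    available = set(var_names)
    kept, missing = {}, {}
    for group, genes in marker_genes.items():
        found = [g for g in genes if g in available]
        absent = [g for g in genes if g not in available]
        if found:
            kept[group] = found
        if absent:
            missing[group] = absent
    return kept, missing


# Toy inputs (illustration only).
toy_markers = {'ecm': ['Vim', 'Plaur', 'Acan'], 'iron': ['Cp', 'Tfrc']}
toy_var_names = ['Vim', 'Cp', 'Tfrc', 'Gfap']
kept, missing = present_markers(toy_markers, toy_var_names)
print(kept, missing)
```

In the notebook this could be called as `present_markers(marker_genes_dict, adata_astro.var_names)` before plotting.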

### Stacked violin plots

In [134]:
if bool_plot == True:
    adata_astro.obs['diet_marker'] = adata_astro.obs['gfap_aldh'] + '_' + adata_astro.obs['diet'].astype(str)
    adata_proc.obs['diet_cluster'] = adata_proc.obs['leiden'].astype(str) + '_' + adata_proc.obs['diet'].astype(str)

In [135]:
if bool_plot == True:
    sc.pl.dotplot(adata=adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                  var_names=inflamation, 
                  groupby='diet_marker',
                  use_raw=False, 
                  log=False, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  title="Inflammation",
                  swap_axes=True,
                  show=True, 
                  save="all_cells_ARCneuro_markers.png"
    )

In [136]:
if bool_plot==True:
    sc.pl.dotplot(adata=adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                  var_names=marker_genes_dict, 
                  groupby='diet_marker',
                  use_raw=False, 
                  log=False, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  swap_axes=True,
                  show=True, 
                  save="all_cells_ARCneuro_markers.png"
    )

In [137]:
if bool_plot == True:
    sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                         a1astro,
                         groupby='diet_marker', 
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="inflammation",
                         swap_axes=True
    )

In [138]:
if bool_plot == True:
    sc.pl.stacked_violin(adata_proc,
                         inflamation,
                         groupby='diet_cluster', 
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="inflammation",
                         swap_axes=True
    )

In [139]:
if bool_plot == True:
    plt.rcParams['figure.figsize']=[8,8]

    adata_c = adata_astro[adata_astro.obs['diet'] == "chow",]
    adata_5 = adata_astro[adata_astro.obs['diet'] == "hfd_5",]
    adata_15 = adata_astro[adata_astro.obs['diet'] == "hfd_15",]

    gf_c = (adata_c[:,'Gfap'].X > 0) & (adata_c[:,'Aldh1l1'].X == 0) 
    gfsum_c = sum(gf_c)
    al_c = (adata_c[:,'Gfap'].X == 0) & (adata_c[:,'Aldh1l1'].X > 0) 
    alsum_c = sum(al_c)
    bt_c = (adata_c[:,'Gfap'].X > 0) & (adata_c[:,'Aldh1l1'].X > 0) 
    btsum_c = sum(bt_c)

    gf_5 = (adata_5[:,'Gfap'].X > 0) & (adata_5[:,'Aldh1l1'].X == 0) 
    gfsum_5 = sum(gf_5)
    al_5 = (adata_5[:,'Gfap'].X == 0) & (adata_5[:,'Aldh1l1'].X > 0) 
    alsum_5 = sum(al_5)
    bt_5 = (adata_5[:,'Gfap'].X > 0) & (adata_5[:,'Aldh1l1'].X > 0) 
    btsum_5 = sum(bt_5)

    gf_15 = (adata_15[:,'Gfap'].X > 0) & (adata_15[:,'Aldh1l1'].X == 0) 
    gfsum_15 = sum(gf_15)
    al_15 = (adata_15[:,'Gfap'].X == 0) & (adata_15[:,'Aldh1l1'].X > 0) 
    alsum_15 = sum(al_15)
    bt_15 = (adata_15[:,'Gfap'].X > 0) & (adata_15[:,'Aldh1l1'].X > 0) 
    btsum_15 = sum(bt_15)
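
The masks above split cells by whether `Gfap` and `Aldh1l1` have non-zero expression. The same rule, written for plain Python values, is sketched below. `classify_cell` and the toy expression pairs are hypothetical; the labels are the ones found in the `gfap_aldh` column used elsewhere in this notebook.

```python
from collections import Counter


def classify_cell(gfap, aldh1l1):
    """Label a cell by which of the two markers it expresses (value > 0)."""
    if gfap > 0 and aldh1l1 > 0:
        return "both"
    if gfap > 0:
        return "gfap_only"
    if aldh1l1 > 0:
        return "aldh_only"
    return "none"


# toy (Gfap, Aldh1l1) expression pairs for five hypothetical cells
cells = [(1.2, 0.0), (0.0, 0.7), (2.1, 0.4), (0.0, 0.0), (0.9, 0.0)]
labels = [classify_cell(g, a) for g, a in cells]
counts = Counter(labels)  # number of cells per population
```

Dividing each count by the number of cells in a diet group gives the kind of share that the pie charts further down display.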

In [140]:
if bool_plot == True:
    columns = ['gfap_c','gfap_5','gfap_15','aldh_c','aldh_5','aldh_15','both_c','both_5','both_15']
    # collect one row per gene; DataFrame.append is no longer available in pandas >= 2.0
    rows = []
    for i in hormone:
        # percentage of cells in each population and diet that express gene i
        rows.append({'gfap_c': sum((adata_c[:,i].X > 0) & gf_c)/gfsum_c*100,
                     'gfap_5': sum((adata_5[:,i].X > 0) & gf_5)/gfsum_5*100,
                     'gfap_15': sum((adata_15[:,i].X > 0) & gf_15)/gfsum_15*100,

                     'aldh_c': sum((adata_c[:,i].X > 0) & al_c)/alsum_c*100,
                     'aldh_5': sum((adata_5[:,i].X > 0) & al_5)/alsum_5*100,
                     'aldh_15': sum((adata_15[:,i].X > 0) & al_15)/alsum_15*100,

                     'both_c': sum((adata_c[:,i].X > 0) & bt_c)/btsum_c*100,
                     'both_5': sum((adata_5[:,i].X > 0) & bt_5)/btsum_5*100,
                     'both_15': sum((adata_15[:,i].X > 0) & bt_15)/btsum_15*100,

                    }
        )
    df = pd.DataFrame(rows, columns=columns)
    df.index=hormone
    for i in df.columns:
        df[i] = df[i].str.get(0)
    df.round(1)#.to_csv("hormone_receptors.csv")
    sb.heatmap(df,
               annot=True, 
               cmap="viridis",
               vmin=0,
               vmax=100
    )

# Illustration and differential expression analysis 

## Scatter plot and pie chart - chow
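
The pie charts in this section pass a callback to `autopct` so that each wedge can be annotated. The definition of `cf.func` is not shown here, so the following is only a guess at what such a helper might look like: a hypothetical `pct_with_count` that converts the percentage matplotlib hands to the callback back into a cell count.

```python
def pct_with_count(pct, counts):
    """Format a pie-wedge label as percentage plus absolute count."""
    total = sum(counts)
    # matplotlib passes the wedge share as a percentage of the total
    absolute = int(round(pct / 100.0 * total))
    return "{:.1f}%\n({:d})".format(pct, absolute)


# toy counts for two categories, e.g. marker-negative and marker-positive cells
counts = [30, 10]
label = pct_with_count(25.0, counts)
```

With matplotlib, such a helper would be passed as `autopct=lambda pct: pct_with_count(pct, counts)`, mirroring the calls below.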

### chow

#### Gfap

In [141]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ],
                        ['gfap_only'],
                        palette=['silver', 'darkmagenta'],
                        save="_astrocytes_gfap_only_chow.png", 
                        size=40, 
                        color_map=mymap, 
                        use_raw=False,
                        frameon=False,
                        title='',
                        legend_loc=None
    ) 
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['gfap_only'].value_counts().sort_index()
    plt.pie(data,
            autopct=lambda pct: cf.func(pct, data),
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'darkmagenta'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_gfap_only_chow.pdf')

In [142]:
if bool_plot == True:
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['gfap_only'].value_counts().sort_index()

    fig, (ax0, ax1)=plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                     color=['gfap_only'],
                     size=40,
                     ax=ax0,
                     show=False,
                     palette=['silver','magenta'], #,'darkmagenta'
                     frameon=False
    ) 
    wedges, texts, autotexts = ax1.pie(data,
                                       autopct=lambda pct: func(pct, data), 
                                       textprops=dict(color="black", fontsize=24),
                                       colors=['silver', 'magenta'],  #,'darkmagenta'
                                       pctdistance=1.2
    ) 
    ax1.set_title('Chow gfap_only share')

    fig.savefig(sc_settings_figdir+'pie_astrocytes_gfap_only_chow.png')
    #cf.plot_umap_marker(adata_astro[adata_astro.obs['diet']=='chow', ], ['gfap_only'], 
                        #palette=['gainsboro','darkmagenta','magenta'],
                        #save="_astrocytes_gfap_only_chow.png", size=30, color_map=mymap, use_raw=False)

#### Aldh1l1

In [143]:
#####################################################################################################################
if bool_plot==True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ],
                        ['aldh_only'], 
                        palette=['silver', 'green'],
                        save="_astrocytes_aldh_only_chow.png",
                        size=40, 
                        color_map=mymap,
                        use_raw=False, 
                        frameon=False,
                        title='', 
                        legend_loc=None
    )
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['aldh_only'].value_counts().sort_index()
    plt.pie(data, 
            autopct=lambda pct: cf.func(pct, data), 
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'green'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_aldh_only_chow.pdf')

In [144]:
if bool_plot == True:  
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['aldh_only'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                     color=['aldh_only'],
                     size=40,
                     ax=ax0,
                     show=False,
                     palette=['silver', 'green'],
                     frameon=False
    )
    wedges, texts, autotexts = ax1.pie(data,
                                       autopct=lambda pct: func(pct, data),
                                       textprops=dict(color="black", fontsize=24),
                                       colors=['silver', 'green'],
                                       pctdistance=1.2
    )
    ax1.set_title('Chow aldh_only share')

    fig.savefig(sc_settings_figdir+'umap_astrocytes_aldh_only_chow.png')
    #cf.plot_umap_marker(adata_astro[adata_astro.obs['diet']=='chow', ], ['aldh_only'], palette=['gainsboro','green','lime'],
    #                    save="_astrocytes_aldh_only_chow.png", size=30, color_map=mymap, use_raw=False)

#### Double positive

In [145]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                        ['both'], 
                        palette=['silver', 'blue'],
                        save="_astrocytes_both_chow.png", 
                        size=40,
                        color_map=mymap,
                        use_raw=False,
                        frameon=False,
                        title='', 
                        legend_loc=None
    )

    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['both'].value_counts().sort_index()
    plt.pie(data, 
            autopct=lambda pct: cf.func(pct, data),
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'blue'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_both_chow.pdf')

In [146]:
if bool_plot == True:
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['both'].value_counts().sort_index()
    
    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                     color=['both'],
                     size=40, 
                     ax=ax0,
                     show=False,
                     palette=['silver', 'blue'],
                     frameon=False
    )
    wedges, texts, autotexts = ax1.pie(data, autopct=lambda pct: func(pct, data),
                                       textprops=dict(color="black", fontsize=24),
                                       colors=['silver', 'blue'],
                                       pctdistance=1.2  #,fontsize=12
    ) 
    ax1.set_title('Chow both share')

    fig.savefig(sc_settings_figdir+'_astrocytes_both_chow.png')
    #cf.plot_umap_marker(adata_astro[adata_astro.obs['diet']=='chow', ], ['both'], palette=['gainsboro','blue','dodgerblue'],
    #                    save="_astrocytes_both_chow.png", size=30, color_map=mymap, use_raw=False)

#### All together

In [147]:
#####################################################################################################################
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ],
                        ['gfap_aldh'], 
                        palette=['green', 'blue', 'magenta', 'gainsboro'],
                        save="_astrocytes_gfap_aldh_chow.png",
                        size=40,
                        color_map=mymap,
                        use_raw=False,
                        frameon=False,
                        title='', 
                        legend_loc=None
    )
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['gfap_aldh'].value_counts().sort_index()
    plt.pie(data,
            autopct=lambda pct: cf.func(pct, data), 
            textprops=dict(color="black", fontsize=24),
            colors=['green', 'blue', 'darkmagenta', 'gainsboro'],
            pctdistance=1.3
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_all_chow.pdf')

In [148]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[(adata_astro.obs['diet'] == 'chow') & 
                                    (adata_astro.obs['gfap_aldh'] != 'gfap_only'), ],
                        ['gfap_aldh'], 
                        palette=['green', 'blue', 'gainsboro'],
                        save="_astrocytes_gfap_aldh_chow.png", 
                        size=30,
                        color_map=mymap,
                        use_raw=False
    )

In [149]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[(adata_astro.obs['diet'] == 'chow') & 
                                    (adata_astro.obs['gfap_aldh'] != 'aldh_only'), ], 
                        ['gfap_aldh'], 
                        palette=['blue', 'magenta', 'gainsboro'],
                        save="_astrocytes_gfap_aldh_chow.png",
                        size=30,
                        color_map=mymap, 
                        use_raw=False
    )

In [150]:
if bool_plot == True:
    cf.plot_umap_marker(adata_astro[(adata_astro.obs['diet'] == 'chow') & 
                                    (adata_astro.obs['gfap_aldh'] != 'both'), ],
                        ['gfap_aldh'], 
                        palette=['green', 'magenta', 'gainsboro'],
                        save="_astrocytes_gfap_aldh_chow.png",
                        size=30,
                        color_map=mymap,
                        use_raw=False
    )

In [151]:
if bool_plot == True:
    data = adata_astro[adata_astro.obs['diet'] == 'chow', ].obs['gfap_aldh'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                     color=['gfap_aldh'],
                     size=40, 
                     ax=ax0, 
                     show=False,
                     palette=['green', 'blue', 'magenta', 'gainsboro'],
                     frameon=False
    )
    wedges, texts, autotexts = ax1.pie(data,
                                       autopct=lambda pct: func(pct, data),
                                       textprops=dict(color="black", fontsize=24),
                                       colors=['green', 'blue', 'magenta', 'gainsboro'],
                                       pctdistance=1.2
    ) 
    fig.savefig(sc_settings_figdir+'_astrocytes_both_chow.png')

## Differential gene expression - chow

Differential expression analysis comparing each population against the other two combined, using Welch's t-test, with violin plots illustrating the significant genes.
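
As a rough illustration of what a Welch t-test computes for a single gene, the sketch below derives the test statistic and the Welch-Satterthwaite degrees of freedom for two groups of expression values using only the standard library. The function name `welch_t` and the toy values are illustrative assumptions and are not part of this notebook or of diffxpy; the cells below rely on `de.test.t_test` for the actual analysis.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) of Welch's unequal-variance t-test for two samples."""
    na, nb = len(a), len(b)
    # squared standard error contributed by each group (sample variance / n)
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# toy expression values of one gene in two hypothetical populations
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

Unlike Student's t-test, this form does not pool the variances, which is why it is often preferred when the compared populations differ in size and spread.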

#### Aldh1l1

In [152]:
#####################################################################################################################
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap','Aldh1l1'])]
    astro_0.obs['aldh-gfap_both'] = astro_0.obs['gfap_aldh'].replace({'both': 'gfap_both',
                                                                      'gfap_only': 'gfap_both'}
    )
    astro0a = astro_0[(astro_0.obs['diet'] == 'chow') & 
                      (astro_0.obs['aldh-gfap_both'].isin(['aldh_only', 'gfap_both'])), ]
    astro0a = astro0a[:,astro0a.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro0a,
                               sample_description=astro0a.obs,
                               grouping="aldh-gfap_both",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"aldh-gfap_both-chow.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))
    dets_mark_summary.sort_values(by=['qval']).iloc[:10,:]  # show the top 10 genes

In [153]:
if bool_plot == True:
    pk = dets_mark_summary.sort_values(by=['qval']).gene[:1]
    sc.pl.stacked_violin(adata_astro[(adata_astro.obs['diet'].isin(['chow'])) & 
                                     (adata_astro.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))], 
                         pk,      
                         groupby='gfap_aldh', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

#### Gfap

In [154]:
#####################################################################################################################
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap','Aldh1l1'])]
    astro_0.obs['gfap-aldh_both'] = astro_0.obs['gfap_aldh'].replace({'both': 'aldh_both',
                                                                      'aldh_only': 'aldh_both'}
    )
    # select only: gfap_only and aldh_both from chow
    astro0g = astro_0[(astro_0.obs['diet'] == 'chow') &
                      (astro_0.obs['gfap-aldh_both'].isin(['gfap_only','aldh_both'])), ]
    astro0g = astro0g[:,astro0g.var.index.isin(x)]
        
    dets_mark = de.test.t_test(data=astro0g, 
                               sample_description=astro0g.obs, 
                               grouping="gfap-aldh_both",
                               is_logged=False
    )
    dets_mark_summary=dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"gfap-aldh_both-chow.csv", 
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))
    dets_mark_summary.sort_values(by=['pval']).iloc[:20,:]


In [155]:
if bool_plot == True:
    pk = dets_mark_summary.sort_values(by=['qval']).gene[:19]
    sc.pl.stacked_violin(adata_astro[(adata_astro.obs['diet'].isin(['chow'])) & 
                                     (adata_astro.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))],  
                          pk,      
                          groupby='gfap_aldh',
                          use_raw=False,
                          swap_axes=True,
                          dendrogram=False,
                          title=''
    )

#### Both

In [156]:
#####################################################################################################################
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro_0.obs['both-aldh_gfap'] = astro_0.obs['gfap_aldh'].replace({'gfap_only': 'aldh_gfap', 
                                                                      'aldh_only': 'aldh_gfap'}
    )
    astro0b = astro_0[(astro_0.obs['diet'] == 'chow') &
                      (astro_0.obs['both-aldh_gfap'].isin(['both', 'aldh_gfap'])), ]
    astro0b = astro0b[:,astro0b.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro0b,
                               sample_description=astro0b.obs,
                               grouping="both-aldh_gfap",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"both-aldh_gfap-chow.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))
    dets_mark_summary.sort_values(by=['pval']).iloc[:40,:]

In [157]:
if bool_plot == True:
    pk = dets_mark_summary.sort_values(by=['qval']).gene[:35]
    sc.pl.stacked_violin(adata_astro[(adata_astro.obs['diet'].isin(['chow'])) & 
                                     (adata_astro.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))],  
                          pk,      
                          groupby='gfap_aldh', 
                          use_raw=False,
                          swap_axes=True,
                          dendrogram=False,
                          title=''
    )

In [158]:
if bool_plot == True:
    matplotlib_venn.venn3_unweighted([set(dets_mark_summaryA[dets_mark_summaryA['qval'] < 0.05]['gene']),
                                      set(dets_mark_summaryG[dets_mark_summaryG['qval'] < 0.05]['gene']),
                                      set(dets_mark_summaryB[dets_mark_summaryB['qval'] < 0.05]['gene'])],
                                     set_labels = ("Aldh1l1", "Gfap", "Double_positive")
    )

## Scatter plot and pie chart - hfd 5

### hfd 5

#### Gfap

In [159]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                        ['gfap_only'],
                        palette=['silver', 'magenta'],
                        save="_astrocytes_gfap_only_hfd5.png",
                        size=40,
                        color_map=mymap,
                        use_raw=False,
                        frameon=False,
                        title='', 
                        legend_loc=None
    ) 

    data = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ].obs['gfap_only'].value_counts().sort_index()
    plt.pie(data,
            autopct=lambda pct: cf.func(pct, data),
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'magenta'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_gfap_only_hfd5.pdf')

#### Aldh1l1

In [160]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                        ['aldh_only'], 
                        palette=['silver', 'limegreen'],
                        save="_astrocytes_aldh_only_hfd5.png", 
                        size=40, 
                        color_map=mymap,
                        use_raw=False,
                        frameon=False,
                        title='',
                        legend_loc=None
    )
    data = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ].obs['aldh_only'].value_counts().sort_index()
    plt.pie(data, 
            autopct=lambda pct: cf.func(pct, data), 
            textprops=dict(color="black",fontsize=24),
            colors=['silver','limegreen'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_aldh_only_hfd5.pdf')

#### Double positive

In [161]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                        ['both'],
                        palette=['silver','dodgerblue'],
                        save="_astrocytes_both_hfd5.png", 
                        size=40, 
                        color_map=mymap,
                        use_raw=False,
                        frameon=False,
                        title='',
                        legend_loc=None
    )
    data = adata_astro[adata_astro.obs['diet'] == 'hfd_5', ].obs['both'].value_counts().sort_index()
    plt.pie(data,
            autopct=lambda pct: cf.func(pct, data),
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'dodgerblue'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_both_hfd5.pdf')

### Differential gene expression - hfd5
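
The hfd5 cells below count genes with raw `pval < 0.05`, whereas the chow cells filter on `qval`. To make the difference concrete, here is a minimal sketch of a Benjamini-Hochberg adjustment, a common way to turn p-values into q-values. It is an assumption, not confirmed by this notebook, that the `qval` column is computed along these lines; `bh_adjust` and the toy p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # indices of the p-values from largest to smallest
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    qvals = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order (1-based)
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


pvals = [0.01, 0.04, 0.03, 0.2]
qvals = bh_adjust(pvals)
n_raw = sum(p < 0.05 for p in pvals)  # genes passing the raw threshold
n_adj = sum(q < 0.05 for q in qvals)  # genes passing after adjustment
```

In this toy example fewer genes pass the threshold after adjustment than before, so counts based on `pval` and on `qval` are not directly comparable.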

#### Aldh1l1

In [162]:
#####################################################################################################################
if bool_plot == True:
    astro_5 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro_5.obs['aldh-gfap_both'] = astro_5.obs['gfap_aldh'].replace({'both': 'gfap_both',
                                                                      'gfap_only': 'gfap_both'}                                                              
    )
    
    astro5a = astro_5[(astro_5.obs['diet'] == 'hfd_5') & 
                      (astro_5.obs['aldh-gfap_both'].isin(['aldh_only', 'gfap_both'])), ]
    astro5a = astro5a[:,astro5a.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro5a,
                               sample_description=astro5a.obs,
                               grouping="aldh-gfap_both",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"aldh-gfap_both-hfd5.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['pval'] < 0.05]))

#### Gfap

In [163]:
#####################################################################################################################
if bool_plot == True:
    astro_5 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro_5.obs['gfap-aldh_both'] = astro_5.obs['gfap_aldh'].replace({'both': 'aldh_both', 
                                                                      'aldh_only': 'aldh_both'}
    )
    # select only: gfap_only and aldh_both from hfd_5
    astro5g = astro_5[(astro_5.obs['diet'] == 'hfd_5') & 
                      (astro_5.obs['gfap-aldh_both'].isin(['gfap_only', 'aldh_both'])), ]
    astro5g = astro5g[:,astro5g.var.index.isin(x)]
        
    dets_mark = de.test.t_test(data=astro5g, 
                               sample_description=astro5g.obs,
                               grouping="gfap-aldh_both",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"gfap-aldh_both-hfd5.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['pval'] < 0.05]))

#### Double positive

In [164]:
#####################################################################################################################
if bool_plot == True:
    astro_5 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro_5.obs['both-aldh_gfap'] = astro_5.obs['gfap_aldh'].replace({'gfap_only': 'aldh_gfap', 
                                                                      'aldh_only': 'aldh_gfap'}
    )
   
    astro5b = astro_5[(astro_5.obs['diet'] == 'hfd_5') & 
                      (astro_5.obs['both-aldh_gfap'].isin(['both', 'aldh_gfap'])), ]
    astro5b = astro5b[:,astro5b.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro5b,
                               sample_description=astro5b.obs,
                               grouping="both-aldh_gfap",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"both-aldh_gfap-hfd5.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['pval'] < 0.05]))

## Scatter plot and pie chart - hfd 15

### hfd 15

#### Gfap

In [165]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ],
                        ['gfap_only'], 
                        palette=['silver', 'violet'],
                        save="_astrocytes_gfap_only_hfd15.png",
                        size=40, 
                        color_map=mymap, 
                        use_raw=False, 
                        frameon=False,
                        title='', 
                        legend_loc=None
    ) 

    data = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ].obs['gfap_only'].value_counts().sort_index()
    plt.pie(data, 
            autopct=lambda pct: cf.func(pct, data),
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'violet'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_gfap_only_hfd15.pdf')

#### Aldh1l1

In [166]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ], 
                        ['aldh_only'],
                        palette=['silver','lime'],
                        save="_astrocytes_aldh_only_hfd15.png",
                        size=40, 
                        color_map=mymap,
                        use_raw=False,
                        frameon=False,
                        title='',
                        legend_loc=None
    )
    data = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ].obs['aldh_only'].value_counts().sort_index()
    plt.pie(data, 
            autopct=lambda pct: cf.func(pct, data), 
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'lime'],
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_aldh_only_hfd15.pdf')

#### Double positive

In [167]:
#####################################################################################################################
if bool_plot == True:    
    cf.plot_umap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ], 
                        ['both'], 
                        palette=['silver', 'darkturquoise'],
                        save="_astrocytes_both_hfd15.png",
                        size=40,
                        color_map=mymap, 
                        use_raw=False, 
                        frameon=False,
                        title='',
                        legend_loc=None
    )

    data = adata_astro[adata_astro.obs['diet'] == 'hfd_15', ].obs['both'].value_counts().sort_index()
    plt.pie(data, 
            autopct=lambda pct: cf.func(pct, data), 
            textprops=dict(color="black", fontsize=24),
            colors=['silver', 'darkturquoise'], 
            pctdistance=1.2
    ) 
    plt.savefig(sc_settings_figdir+'pie_astrocytes_both_hfd15.pdf')

### Differential gene expression - hfd15

#### Aldh1l1

In [168]:
#####################################################################################################################
if bool_plot == True:
    astro_15 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro_15.obs['aldh-gfap_both'] = astro_15.obs['gfap_aldh'].replace({'both': 'gfap_both', 
                                                                        'gfap_only': 'gfap_both'}
    )
    astro15a = astro_15[(astro_15.obs['diet'] == 'hfd_15') &
                        (astro_15.obs['aldh-gfap_both'].isin(['aldh_only', 'gfap_both'])), ]
    astro15a = astro15a[:,astro15a.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro15a,
                               sample_description=astro15a.obs,
                               grouping="aldh-gfap_both", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"aldh-gfap_both-hfd15.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['pval'] < 0.05]))

#### Gfap

In [169]:
#####################################################################################################################
if bool_plot == True:
    astro_15 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro_15.obs['gfap-aldh_both'] = astro_15.obs['gfap_aldh'].replace({'both': 'aldh_both', 
                                                                        'aldh_only': 'aldh_both'}
    )
    # select only: gfap_only and aldh_both from hfd_15
    astro15g = astro_15[(astro_15.obs['diet'] == 'hfd_15') & 
                        (astro_15.obs['gfap-aldh_both'].isin(['gfap_only', 'aldh_both'])), ]
    astro15g = astro15g[:,astro15g.var.index.isin(x)]
        
    dets_mark = de.test.t_test(data=astro15g,
                               sample_description=astro15g.obs,
                               grouping="gfap-aldh_both",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"gfap-aldh_both-hfd15.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['pval'] < 0.05]))

#### Double positive

In [170]:
#####################################################################################################################
if bool_plot == True:
    astro_15 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    # double-positive cells versus the pooled single-positive cells, hfd_15 only
    astro_15.obs['both-aldh_gfap'] = astro_15.obs['gfap_aldh'].replace({'gfap_only': 'aldh_gfap',
                                                                        'aldh_only': 'aldh_gfap'}
    )
    astro15b = astro_15[(astro_15.obs['diet'] == 'hfd_15') & 
                        (astro_15.obs['both-aldh_gfap'].isin(['both', 'aldh_gfap'])), ]
    astro15b = astro15b[:,astro15b.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro15b, 
                               sample_description=astro15b.obs,
                               grouping="both-aldh_gfap",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"both-aldh_gfap-hfd15.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['pval'] < 0.05]))

In [171]:
if bool_plot == True:
    chow_aldh_gfap['Condition']='chow_aldh_gfap'
    chow_both_gfap['Condition']='chow_both_gfap'
    chow_aldh_both['Condition']='chow_aldh_both'

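    # pd.concat stacks the tables row-wise (DataFrame.append is no longer available in pandas >= 2.0)
    df = pd.concat([chow_aldh_gfap, chow_both_gfap, chow_aldh_both])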
    df['Count'] = df['Overlap'].str.split('/').str[0]
    df = df.astype({'Count': 'int32'})

    #plt.rcParams['figure.figsize'] = [5, 10]
    
    #df=df.loc[df['Term'].isin(rpath), ]  # select only the pathways of interest
    sct = plt.scatter(x="Condition",
                      y="Term",
                      s=5*df['Count'], 
                      c=df['Adjusted P-value'], 
                      cmap='inferno',
                      data=df
    )
    plt.margins(x=0.5,
                y=0.05
    )
    #plt.xticks(rotation=30, ha='right')
    plt.colorbar(alpha=0.05, 
                 aspect=16,
                 shrink=0.4,
                 label='adjusted p-value'
    ) 
    #plt.legend(*sct.legend_elements('sizes', num=[5,10,20,30,40,80], alpha=0.6), title="number of genes")
    plt.show()
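
The `Overlap` column of an Enrichr result table holds strings of the form `hits/set size`, and the cell above keeps only the numerator to scale the markers. A minimal stand-alone sketch of that parsing step follows; the helper name `overlap_count` and the overlap strings are made up for illustration.

```python
def overlap_count(overlap: str) -> int:
    """Return the number of query genes found in the gene set ('12/345' -> 12)."""
    hits, _set_size = overlap.split("/")
    return int(hits)


# Made-up Overlap strings, for illustration only.
overlaps = ["12/345", "3/40", "27/1200"]
counts = [overlap_count(o) for o in overlaps]

# Marker areas as in the chow-only bubble plot: 5 units of area per gene.
sizes = [5 * c for c in counts]
```

Scaling the marker area by the hit count makes terms supported by more genes stand out, independently of the colour, which encodes the adjusted p-value.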

In [172]:
if bool_plot == True:
    chow_aldh_gfap['Condition']='chow_aldh_gfap'
    chow_both_gfap['Condition']='chow_both_gfap'
    chow_aldh_both['Condition']='chow_aldh_both'
    hfd5_aldh_gfap['Condition']='hfd5_aldh_gfap'
    hfd5_both_gfap['Condition']='hfd5_both_gfap'
    hfd5_aldh_both['Condition']='hfd5_aldh_both'
    hfd15_aldh_gfap['Condition']='hfd15_aldh_gfap'
    hfd15_both_gfap['Condition']='hfd15_both_gfap'
    hfd15_aldh_both['Condition']='hfd15_aldh_both'

    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([chow_aldh_gfap, chow_both_gfap, chow_aldh_both,
                    hfd5_aldh_gfap, hfd5_both_gfap, hfd5_aldh_both,
                    hfd15_aldh_gfap, hfd15_both_gfap, hfd15_aldh_both]
    )
    df['Count'] = df['Overlap'].str.split('/').str[0]
    df=df.astype({'Count': 'int32'})

    #plt.rcParams['figure.figsize']=[8, 20]
    
    #df=df.loc[df['Term'].isin(rpath), ]  # select only the pathways of interest
    sct = plt.scatter(x="Condition",
                      y="Term",
                      s=df['Count'],
                      c=df['Adjusted P-value'],
                      cmap='inferno', 
                      data=df
    )
    plt.margins(0.02)
    plt.xticks(rotation=30,
               ha='right'
    )
    plt.colorbar(alpha=0.05,
                 aspect=16,
                 shrink=0.4,
                 label='adjusted p-value'
    ) 
    #plt.legend(*sct.legend_elements('sizes', num=[5,10,20,30,40,80], alpha=0.6), title="number of genes")
    plt.show()

# Differential expression - diet effect

### Aldh1l1

#### chow_vs_hfd5

In [173]:
#####################################################################################################################
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro5a = astro_0[(astro_0.obs['diet'].isin(['chow', 'hfd_5']) & 
                      (astro_0.obs['gfap_aldh'] == 'aldh_only')), ]
    astro5a = astro5a[:,astro5a.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro5a,
                               sample_description=astro5a.obs, 
                               grouping="diet", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summaryA5 = dets_mark_summary[dets_mark_summary['qval'] < 0.05]
    dets_mark_summaryA5.to_csv(path_or_buf=dir_tables+"diet_aldh_only_chow-hfd5.csv",
                               sep="\t"
    )
    print(len(dets_mark_summaryA5[dets_mark_summaryA5['qval'] < 0.05]))
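
As far as the diffxpy documentation describes it, `de.test.t_test` runs Welch's unequal-variance t-test per gene. The sketch below shows what that statistic looks like for a single gene in plain Python; the function name `welch_t` and the expression values are hypothetical, and the p-value step (which needs the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up expression values of one gene in two diet groups.
chow = [1.0, 2.0, 3.0, 4.0]
hfd5 = [2.0, 4.0, 6.0, 8.0]
t_stat, dof = welch_t(chow, hfd5)
```

Unlike Student's t-test, the two group variances are not pooled, which matters here because the diet groups can differ considerably in cell number and spread.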

#### chow_vs_hfd15

In [174]:
#####################################################################################################################
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    astro15a = astro_0[(astro_0.obs['diet'].isin(['chow', 'hfd_15']) & 
                      (astro_0.obs['gfap_aldh'] == 'aldh_only')), ]
    astro15a = astro15a[:,astro15a.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro15a,
                               sample_description=astro15a.obs,
                               grouping="diet",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summaryA15 = dets_mark_summary[dets_mark_summary['qval'] < 0.05]  
    dets_mark_summaryA15.to_csv(path_or_buf=dir_tables+"diet_aldh_only_chow-hfd15.csv",
                                sep="\t"
    )
    print(len(dets_mark_summaryA15[dets_mark_summaryA15['qval'] < 0.05]))

In [175]:
if bool_plot == True:
    matplotlib_venn.venn2_unweighted([set(dets_mark_summaryA5.iloc[:,0]),
                                      set(dets_mark_summaryA15.iloc[:,0])],
                                     set_labels=("chow-hfd5", "chow-hfd15")
    )

In [176]:
if bool_plot == True:
    for i in dets_mark_summary[dets_mark_summary['qval'] < 0.05]['gene']:
        sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                             i,#list(set(dets_mark_summaryA5.iloc[:,0])-set(dets_mark_summaryA15.iloc[:,0])),
                             groupby='diet_marker1',
                             use_raw=False, 
                             log=False, 
                             dendrogram=False,
                             rotation=90,
                             title="",
                             size=10,
                             colorbar_title='',
                             swap_axes=True
        )

### Gfap

#### chow_vs_hfd5

In [177]:
#####################################################################################################################
if bool_plot == True:
    astro5g = astro_0[(astro_0.obs['diet'].isin(['chow', 'hfd_5']) & 
                      (astro_0.obs['gfap_aldh'] == 'gfap_only')), ]
    astro5g = astro5g[:,astro5g.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro5g,
                               sample_description=astro5g.obs,
                               grouping="diet",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"diet_gfap_only_chow-hfd5.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))

#### chow_vs_hfd15

In [178]:
#####################################################################################################################
if bool_plot == True:
    astro15g = astro_0[(astro_0.obs['diet'].isin(['chow', 'hfd_15']) & 
                       (astro_0.obs['gfap_aldh'] == 'gfap_only')), ]
    astro15g = astro15g[:,astro15g.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro15g,
                               sample_description=astro15g.obs,
                               grouping="diet", 
                               is_logged=False
    )
    dets_mark_summary=dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"diet_gfap_only_chow-hfd15.csv", 
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))

### Double positive

#### chow_vs_hfd5

In [179]:
#####################################################################################################################
if bool_plot == True:
    astro5b = astro_0[(astro_0.obs['diet'].isin(['chow', 'hfd_5']) & 
                      (astro_0.obs['gfap_aldh'] == 'both')), ]
    astro5b = astro5b[:,astro5b.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro5b,
                               sample_description=astro5b.obs, 
                               grouping="diet", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"diet_both_chow-hfd5.csv",
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))

#### chow_vs_hfd15

In [180]:
#####################################################################################################################
if bool_plot == True:
    astro15b = astro_0[(astro_0.obs['diet'].isin(['chow', 'hfd_15']) & 
                       (astro_0.obs['gfap_aldh'] == 'both')), ]
    astro15b = astro15b[:,astro15b.var.index.isin(x)]
    
    dets_mark = de.test.t_test(data=astro15b, 
                               sample_description=astro15b.obs, 
                               grouping="diet",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"diet_both_chow-hfd15.csv", 
                             sep="\t"
    )
    print(len(dets_mark_summary[dets_mark_summary['qval'] < 0.05]))

# Welch ANOVA - marker effect

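Run the Welch ANOVA cells for all three diets (chow, hfd 5 and hfd 15) before plotting: the plotting cells compare `dfin_0`, `dfin_5` and `dfin_15` and fail if any of them is still undefined.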
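
For orientation, Welch's ANOVA weights each group mean by its sample size divided by its variance, so groups are not assumed to share a common variance. The following plain-Python sketch computes the F statistic and its degrees of freedom for one gene; it is an illustration under the textbook formula, not the `pingouin` implementation used in the cells below, and the function name `welch_anova_f` and the toy values are hypothetical. The p-value is omitted because it needs the F distribution.

```python
from statistics import mean, variance


def welch_anova_f(groups):
    """Return (F, df1, df2) of Welch's ANOVA for a list of samples."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Weight of each group: sample size over sample variance.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / w_sum
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)


# Toy values standing in for one gene in the three marker populations.
f_stat, df1, df2 = welch_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A group with zero variance (for example a gene that is not expressed in any cell of that group) makes the weight undefined, so such genes need to be filtered or handled separately.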

### chow

In [181]:
if bool_plot == True:
    ch = astro_0[(astro_0.obs['diet'].isin(['chow'])) & 
                 (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))]
    ch = ch[:,ch.var.index.isin(x)]
    gene_name=[]
    pval=[]
    aldh_both_pval=[]
    aldh_gfap_pval=[]
    both_gfap_pval=[]

    for i in range(ch.n_vars):
        gen = ch.var.index[i]

        aldh = ch[ch.obs['gfap_aldh'] == 'aldh_only', gen].X.toarray()
        gfap = ch[ch.obs['gfap_aldh'] == 'gfap_only', gen].X.toarray()
        both = ch[ch.obs['gfap_aldh'] == 'both', gen].X.toarray()
        args=(aldh, gfap, both)
        xt = np.concatenate(args)
        xt = xt.ravel()

        #create DataFrame
        df = pd.DataFrame({'score': xt,
                           'group': np.repeat(['aldh', 'gfap', 'both'],
                                              repeats=[len(aldh), len(gfap), len(both)])}
        ) 

        #perform Welch's ANOVA
        oall = pg.welch_anova(dv='score',
                              between='group',
                              data=df)['p-unc']

        pergen = pg.pairwise_gameshowell(dv='score',
                                         between='group',
                                         data=df)['pval']
        gene_name.append(gen)
        pval.append(oall.loc[0])
        aldh_both_pval.append(pergen[0])
        aldh_gfap_pval.append(pergen[1])
        both_gfap_pval.append(pergen[2])

    d=pd.DataFrame()
    d['gene_name']=gene_name
    d['pval']=pval
    d['aldh_both_pval']=aldh_both_pval
    d['aldh_gfap_pval']=aldh_gfap_pval
    d['both_gfap_pval']=both_gfap_pval

    # perform BH multiplicity correction
    d['pval_adj'] = cf.fdr(d['pval'])
    # sort by corrected pval
    dfin_0 = d.sort_values(by=['pval_adj'])
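
`cf.fdr` is a project helper whose implementation is not shown here. Assuming it performs the standard Benjamini-Hochberg adjustment, as the comment in the cell suggests, the procedure can be sketched in plain Python as follows; `bh_adjust` and the p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four genes.
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing with the raw p-values.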

In [182]:
if bool_plot == True:
    for i in list(set(dfin_0.iloc[:40,0])-set(dfin_5.iloc[:160,0])-set(dfin_15.iloc[:115,0])):
        sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                             i,#list(set(dfin_gf.iloc[:17,0])-set(dfin_al.iloc[:52,0])-set(dfin_b.iloc[:19,0])),
                             groupby='diet_marker1',
                             use_raw=False, 
                             log=False, 
                             dendrogram=False,
                             rotation=90,
                             #scale='area',  # area
                             title="",
                             size=10,
                             colorbar_title='',
                             #vmin=-1, vmax=1,
                             #cmap='bwr',
                             #col_palette=['green','green','green','blue','blue','blue','magenta','magenta','magenta'],
                             swap_axes=True
        )

In [183]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_0[(astro_0.obs['diet'].isin(['chow'])) & 
                                 (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))],
                         list(set(dfin_0.iloc[:40,0])-set(dfin_5.iloc[:160,0])-set(dfin_15.iloc[:115,0])),  
                         groupby='gfap_aldh', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

### hfd 5

In [184]:
if bool_plot == True:
    h5 = astro_0[(astro_0.obs['diet'].isin(['hfd_5'])) & 
                 (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))]
    h5 = h5[:,h5.var.index.isin(x)]
    gene_name=[]
    pval=[]
    aldh_both_pval=[]
    aldh_gfap_pval=[]
    both_gfap_pval=[]

    for i in range(h5.n_vars):
        gen = h5.var.index[i]

        aldh = h5[h5.obs['gfap_aldh'] == 'aldh_only', gen].X.toarray()
        gfap = h5[h5.obs['gfap_aldh'] == 'gfap_only', gen].X.toarray()
        both = h5[h5.obs['gfap_aldh'] == 'both', gen].X.toarray()
        args=(aldh, gfap, both)
        xt = np.concatenate(args)
        xt = xt.ravel()

        #create DataFrame
        df = pd.DataFrame({'score': xt,
                           'group': np.repeat(['aldh', 'gfap', 'both'], 
                                              repeats=[len(aldh), len(gfap), len(both)])}
        ) 

        #perform Welch's ANOVA
        oall = pg.welch_anova(dv='score',
                            between='group', 
                            data=df)['p-unc']

        pergen = pg.pairwise_gameshowell(dv='score',
                                         between='group', 
                                         data=df)['pval']

        gene_name.append(gen)
        pval.append(oall.loc[0])
        aldh_both_pval.append(pergen[0])
        aldh_gfap_pval.append(pergen[1])
        both_gfap_pval.append(pergen[2])

    d=pd.DataFrame()
    d['gene_name']=gene_name
    d['pval']=pval
    d['aldh_both_pval']=aldh_both_pval
    d['aldh_gfap_pval']=aldh_gfap_pval
    d['both_gfap_pval']=both_gfap_pval

    # perform BH multiplicity correction
    d['pval_adj'] = cf.fdr(d['pval'])
    # sort by corrected pval
    dfin_5 = d.sort_values(by=['pval_adj'])

In [185]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_0[(astro_0.obs['diet'].isin(['hfd_5'])) & 
                                 (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))],
                         list(set(dfin_5.iloc[:160,0])-set(dfin_15.iloc[:115,0])-set(dfin_0.iloc[:40,0])),
                         groupby='gfap_aldh', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

### hfd 15

In [186]:
if bool_plot == True: 
    h15 = astro_0[(astro_0.obs['diet'].isin(['hfd_15'])) & 
                  (astro_0.obs['gfap_aldh'].isin(['aldh_only','gfap_only','both']))]
    h15 = h15[:,h15.var.index.isin(x)]
    gene_name=[]
    pval=[]
    aldh_both_pval=[]
    aldh_gfap_pval=[]
    both_gfap_pval=[]

    for i in range(h15.n_vars):
        gen=h15.var.index[i]

        aldh = h15[h15.obs['gfap_aldh'] == 'aldh_only', gen].X.toarray()
        gfap = h15[h15.obs['gfap_aldh'] == 'gfap_only', gen].X.toarray()
        both = h15[h15.obs['gfap_aldh'] == 'both', gen].X.toarray()
        args=(aldh, gfap, both)
        xt = np.concatenate(args)
        xt = xt.ravel()

        #create DataFrame
        df = pd.DataFrame({'score': xt,
                           'group': np.repeat(['aldh', 'gfap', 'both'],
                                              repeats=[len(aldh),len(gfap),len(both)])}) 

        #perform Welch's ANOVA
        oall = pg.welch_anova(dv='score', 
                              between='group',
                              data=df)['p-unc']

        pergen = pg.pairwise_gameshowell(dv='score',
                                         between='group',
                                         data=df)['pval']

        gene_name.append(gen)
        pval.append(oall.loc[0])
        aldh_both_pval.append(pergen[0])
        aldh_gfap_pval.append(pergen[1])
        both_gfap_pval.append(pergen[2])

    d=pd.DataFrame()
    d['gene_name']=gene_name
    d['pval']=pval
    d['aldh_both_pval']=aldh_both_pval
    d['aldh_gfap_pval']=aldh_gfap_pval
    d['both_gfap_pval']=both_gfap_pval

    # perform BH multiplicity correction
    d['pval_adj'] = cf.fdr(d['pval'])
    # sort by corrected pval
    dfin_15 = d.sort_values(by=['pval_adj'])

In [187]:
if bool_plot == True: 
    sc.pl.stacked_violin(astro_0[(astro_0.obs['diet'].isin(['hfd_15'])) & 
                                 (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only', 'both']))],
                         list(set(dfin_15.iloc[:115,0])-set(dfin_5.iloc[:160,0])-set(dfin_0.iloc[:40,0])),      
                         groupby='gfap_aldh', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

In [188]:
if bool_plot == True: 
    matplotlib_venn.venn3_unweighted([set(dfin_0.iloc[:40,0]),
                                      set(dfin_5.iloc[:160,0]),
                                      set(dfin_15.iloc[:115,0])],
                                     set_labels = ("Chow", "Hfd_5", "Hfd_15"))
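
The violin plots in this section show genes that are among the top hits of one diet and of no other, which is the outer, non-overlapping region of each circle in the Venn diagram. A minimal sketch of that set logic, with a hypothetical helper `exclusive` and placeholder gene names:

```python
def exclusive(target, *others):
    """Genes in `target` that appear in none of the `others`."""
    return set(target).difference(*others)


# Placeholder gene names, not results from this data set.
chow = {"geneA", "geneB", "geneC"}
hfd5 = {"geneB", "geneD"}
hfd15 = {"geneC", "geneD", "geneE"}

only_chow = exclusive(chow, hfd5, hfd15)
only_hfd5 = exclusive(hfd5, chow, hfd15)
only_hfd15 = exclusive(hfd15, chow, hfd5)
shared_by_all = chow & hfd5 & hfd15
```

An exclusive set can be empty, in which case the corresponding plot has nothing to draw.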

### Stacked violin plot visualisation over the diets and marker populations

In [189]:
if bool_plot == True:
    adata_astro.obs['diet_marker1']=adata_astro.obs['diet'].astype(str) + '_' + adata_astro.obs['gfap_aldh'].astype(str) 

In [190]:
if bool_plot == True:
    sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                         list(set(dfin_5.iloc[:160,0])-set(dfin_15.iloc[:115,0])-set(dfin_0.iloc[:40,0])),
                         groupby='diet_marker1',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="astro",
                         swap_axes=True
    )

In [191]:
if bool_plot == True:
    sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                         list(set(dfin_0.iloc[:40,0])-set(dfin_5.iloc[:160,0])-set(dfin_15.iloc[:115,0])),  
                         groupby='diet_marker1',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         title="",
                         swap_axes=True
    )

# Welch ANOVA - diet effect

### Aldh1l1

In [192]:
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]

    al = astro_0[astro_0.obs['gfap_aldh'] == 'aldh_only',]
    al = al[:,al.var.index.isin(x)]
    gene_name=[]
    pval=[]
    dt0_dt5_pval=[]
    dt0_dt15_pval=[]
    dt5_dt15_pval=[]

    for i in range(al.n_vars):
        gen=al.var.index[i]

        dt0 = al[al.obs['diet'] == 'chow', gen].X.toarray()
        dt5 = al[al.obs['diet'] == 'hfd_5', gen].X.toarray()
        dt15 = al[al.obs['diet'] == 'hfd_15', gen].X.toarray()
        args = (dt0, dt5, dt15)
        xt  =  np.concatenate(args)
        xt = xt.ravel()

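        # create DataFrame; the labels are zero-padded so that they sort as
        # chow < hfd_5 < hfd_15 (pairwise_gameshowell lists pairs over the sorted
        # group labels), keeping the rows read below in the order
        # (dt0, dt5), (dt0, dt15), (dt5, dt15)
        df = pd.DataFrame({'score': xt,
                           'group': np.repeat(['dt00', 'dt05', 'dt15'],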
                                              repeats=[len(dt0), len(dt5), len(dt15)])}
        ) 
        #perform Welch's ANOVA
        oall = pg.welch_anova(dv='score',
                              between='group',
                              data=df)['p-unc']

        pergen = pg.pairwise_gameshowell(dv='score',
                                         between='group',
                                         data=df)['pval']
        gene_name.append(gen)
        pval.append(oall.loc[0])
        dt0_dt5_pval.append(pergen[0])
        dt0_dt15_pval.append(pergen[1])
        dt5_dt15_pval.append(pergen[2])

    d=pd.DataFrame()
    d['gene_name']=gene_name
    d['pval']=pval
    d['dt0_dt5_pval']=dt0_dt5_pval
    d['dt0_dt15_pval']=dt0_dt15_pval
    d['dt5_dt15_pval']=dt5_dt15_pval

    # perform BH multiplicity correction
    d['pval_adj'] = cf.fdr(d['pval'])
    # sort by corrected pval
    dfin_al = d.sort_values(by=['pval_adj'])
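
The positional look-ups `pergen[0]`, `pergen[1]` and `pergen[2]` depend on the row order of the Games-Howell table. Assuming pingouin enumerates the pairs over the alphabetically sorted group labels (an assumption worth checking against the `A` and `B` columns of the returned table), the order can be previewed without running the test. The helper `pair_order` is hypothetical.

```python
from itertools import combinations


def pair_order(labels):
    """Pairs in the order obtained by enumerating the sorted labels."""
    return list(combinations(sorted(labels), 2))


# Labels without padding sort as strings: 'dt15' comes before 'dt5'.
plain = pair_order(["dt0", "dt5", "dt15"])

# Zero-padded labels sort in chronological order.
padded = pair_order(["dt00", "dt05", "dt15"])
```

With unpadded labels the first row would be the chow versus 15-day comparison rather than chow versus 5-day, so reading rows by position only matches the column names when the labels sort chronologically.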

In [193]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_0[astro_0.obs['gfap_aldh'] == 'aldh_only',],
                         list(set(dfin_al.iloc[:52,0])-set(dfin_gf.iloc[:17,0])-set(dfin_b.iloc[:19,0])),  
                         #dfin_al.iloc[:52,0],
                         groupby='diet', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

### Gfap

In [194]:
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]

    gf = astro_0[astro_0.obs['gfap_aldh'] == 'gfap_only',]
    gf = gf[:,gf.var.index.isin(x)]
    gene_name=[]
    pval=[]
    dt0_dt5_pval=[]
    dt0_dt15_pval=[]
    dt5_dt15_pval=[]

    for i in range(gf.n_vars):
        gen = gf.var.index[i]

        dt0 = gf[gf.obs['diet'] == 'chow', gen].X.toarray()
        dt5 = gf[gf.obs['diet'] == 'hfd_5', gen].X.toarray()
        dt15 = gf[gf.obs['diet'] == 'hfd_15', gen].X.toarray()
        args=(dt0, dt5, dt15)
        xt = np.concatenate(args)
        xt=xt.ravel()

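        # create DataFrame; the labels are zero-padded so that they sort as
        # chow < hfd_5 < hfd_15 (pairwise_gameshowell lists pairs over the sorted
        # group labels), keeping the rows read below in the order
        # (dt0, dt5), (dt0, dt15), (dt5, dt15)
        df = pd.DataFrame({'score': xt,
                           'group': np.repeat(['dt00', 'dt05', 'dt15'],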
                                              repeats=[len(dt0), len(dt5), len(dt15)])}
        ) 
        #perform Welch's ANOVA
        oall = pg.welch_anova(dv='score',
                              between='group',
                              data=df)['p-unc']

        pergen = pg.pairwise_gameshowell(dv='score',
                                         between='group',
                                         data=df)['pval']
        gene_name.append(gen)
        pval.append(oall.loc[0])
        dt0_dt5_pval.append(pergen[0])
        dt0_dt15_pval.append(pergen[1])
        dt5_dt15_pval.append(pergen[2])

    d=pd.DataFrame()
    d['gene_name']=gene_name
    d['pval']=pval
    d['dt0_dt5_pval']=dt0_dt5_pval
    d['dt0_dt15_pval']=dt0_dt15_pval
    d['dt5_dt15_pval']=dt5_dt15_pval

    # perform BH multiplicity correction
    d['pval_adj'] = cf.fdr(d['pval'])
    # sort by corrected pval
    dfin_gf = d.sort_values(by=['pval_adj'])

In [195]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_0[astro_0.obs['gfap_aldh'] == 'gfap_only',],
                         list(set(dfin_gf.iloc[:17,0])-set(dfin_al.iloc[:52,0])-set(dfin_b.iloc[:19,0])),
                         #dfin_gf.iloc[:17,0],
                         groupby='diet', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

### Both

In [196]:
if bool_plot == True:
    astro_0 = adata_astro[:,~adata_astro.var.index.isin(['Gfap', 'Aldh1l1'])]
    b = astro_0[astro_0.obs['gfap_aldh'] == 'both',]
    b = b[:,b.var.index.isin(x)]
    gene_name=[]
    pval=[]
    dt0_dt5_pval=[]
    dt0_dt15_pval=[]
    dt5_dt15_pval=[]

    for i in range(b.n_vars):
        gen = b.var.index[i]

        dt0 = b[b.obs['diet'] == 'chow', gen].X.toarray()
        dt5 = b[b.obs['diet'] == 'hfd_5', gen].X.toarray()
        dt15 = b[b.obs['diet'] == 'hfd_15', gen].X.toarray()
        args=(dt0, dt5, dt15)
        xt = np.concatenate(args)
        xt = xt.ravel()

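        # create DataFrame; the labels are zero-padded so that they sort as
        # chow < hfd_5 < hfd_15 (pairwise_gameshowell lists pairs over the sorted
        # group labels), keeping the rows read below in the order
        # (dt0, dt5), (dt0, dt15), (dt5, dt15)
        df = pd.DataFrame({'score': xt,
                           'group': np.repeat(['dt00', 'dt05', 'dt15'],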
                                              repeats=[len(dt0),len(dt5),len(dt15)])}
        ) 

        #perform Welch's ANOVA
        oall = pg.welch_anova(dv='score', 
                              between='group', 
                              data=df)['p-unc']

        pergen = pg.pairwise_gameshowell(dv='score',
                                         between='group',
                                         data=df)['pval']
        gene_name.append(gen)
        pval.append(oall.loc[0])
        dt0_dt5_pval.append(pergen[0])
        dt0_dt15_pval.append(pergen[1])
        dt5_dt15_pval.append(pergen[2])

    d=pd.DataFrame()
    d['gene_name']=gene_name
    d['pval']=pval
    d['dt0_dt5_pval']=dt0_dt5_pval
    d['dt0_dt15_pval']=dt0_dt15_pval
    d['dt5_dt15_pval']=dt5_dt15_pval

    # perform BH multiplicity correction (cf.fdr, as in the other Welch ANOVA cells)
    d['pval_adj'] = cf.fdr(d['pval'])
    # sort by corrected pval
    dfin_b = d.sort_values(by=['pval_adj'])

In [197]:
if bool_plot == True:
    sc.pl.stacked_violin(astro_0[astro_0.obs['gfap_aldh'] == 'both',],
                         list(set(dfin_b.iloc[:19,0])-set(dfin_al.iloc[:52,0])-set(dfin_gf.iloc[:17,0])),  
                         #dfin_b.iloc[:19,0],
                         groupby='diet', 
                         use_raw=False,
                         swap_axes=True,
                         dendrogram=False,
                         title=''
    )

In [198]:
if bool_plot == True:
    matplotlib_venn.venn3_unweighted([set(dfin_al.iloc[:52,0]),
                                      set(dfin_gf.iloc[:17,0]),
                                      set(dfin_b.iloc[:19,0])],
                                     set_labels=("Aldh1l1", "Gfap", "Both"))

### Different types of stacked violin plot visualisation

In [199]:
if bool_plot == True:
    sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                         list(set(dfin_gf.iloc[:17,0])-set(dfin_al.iloc[:52,0])-set(dfin_b.iloc[:19,0])),
                         groupby='diet_marker',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         scale='width',
                         title="",
                         dendrogram_col=True,
                         row_palette=['green', 'limegreen', 'lime', 'blue', 'dodgerblue',
                                      'darkturquoise', 'darkmagenta', 'magenta', 'violet'],
                         swap_axes=False
    )

In [200]:
if bool_plot == True:
    sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                         list(set(dfin_gf.iloc[:17,0])-set(dfin_al.iloc[:52,0])-set(dfin_b.iloc[:19,0])),
                         groupby='diet_marker',
                         use_raw=False, 
                         log=False, 
                         dendrogram=False,
                         rotation=90,
                         scale='width',
                         #standard_scale='var',
                         title="",
                         #vmin=-1, vmax=1,
                         cmap='coolwarm',
                         col_palette=['green', 'limegreen', 'lime', 'blue', 'dodgerblue',
                                      'darkturquoise', 'darkmagenta', 'magenta', 'violet'],
                         swap_axes=True
    )

In [201]:
if bool_plot == True:
    for i in list(set(dfin_gf.iloc[:17,0])-set(dfin_al.iloc[:52,0])-set(dfin_b.iloc[:19,0])):
        sc.pl.stacked_violin(adata_astro[adata_astro.obs['gfap_aldh'] != 'none',],
                             i,
                             groupby='diet_marker',
                             use_raw=False, 
                             log=False, 
                             dendrogram=False,
                             rotation=90,
                             scale='width',  # area
                             title="",
                             colorbar_title='',
                             #vmin=-1, vmax=1,
                             cmap='bwr',
                             #col_palette = ['green','green','green','blue','blue','blue','magenta','magenta','magenta'],
                             swap_axes=True
        )
    

## Differential expression and enrichment analysis

In [202]:
if bool_plot == True:
    astro0 = astro_0[(astro_0.obs['diet'] == 'chow') & 
                   (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro0,
                               sample_description=astro0.obs,
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()#mean_thres=np.log(0.01), qval_thres=0.1, fc_lower_thres=1, fc_upper_thres=1)
    
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"aldh_only-gfap_only-chow.csv",
                             sep="\t"
    )
    print(sum(dets_mark_summary['pval'] < 0.05))
    dets_mark.plot_volcano(alpha=0.05, 
                           corrected_pval=False,
                           min_fc=1,
                           size=15,
                           log10_p_threshold=-5.5, 
                           log2_fc_threshold=3.7,
                           save=sc_settings_figdir+"dge_volcano_chow_aldh-gfap",
                           suffix="_astrocytes.png"
    )
    enri_aldh_gfap_0 = gseapy.enrichr(gene_list=dets_mark_summary.loc[dets_mark_summary['pval'] < 0.05,]['gene'].tolist(),
                                      organism='Mouse',
                                      gene_sets=repository,
                                      description='pathway',
                                      cutoff=cfenr
    )

In [203]:
if bool_plot == True:
    astro0 = astro_0[(astro_0.obs['diet'] == 'chow') & 
                    (astro_0.obs['gfap_aldh'].isin(['both','gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro0,
                               sample_description=astro0.obs,
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()#mean_thres=np.log(0.01), qval_thres=0.1, fc_lower_thres=1, fc_upper_thres=1)
    
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"both-gfap_only-chow.csv",
                             sep="\t"
    )
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False,
                           min_fc=1,
                           size=15,
                           log10_p_threshold=-5.5,
                           log2_fc_threshold=3.5,
                           save=sc_settings_figdir+"dge_volcano_chow_both-gfap",
                           suffix="_astrocytes.png"
    )
    enri_both_gfap_0 = gseapy.enrichr(gene_list=dets_mark_summary.loc[dets_mark_summary['pval']<0.05,]['gene'].tolist(),
                                       organism='Mouse',
                                       gene_sets=repository,
                                       description='pathway',
                                       cutoff=cfenr)

In [204]:
if bool_plot == True:
    astro0 = astro_0[(astro_0.obs['diet'] == 'chow') & 
                    (astro_0.obs['gfap_aldh'].isin(['aldh_only', 'both'])), ]
    dets_mark = de.test.t_test(data=astro0, 
                               sample_description=astro0.obs,
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary()#mean_thres=np.log(0.01), qval_thres=0.1, fc_lower_thres=1, fc_upper_thres=1)
    
    dets_mark_summary.to_csv(path_or_buf=dir_tables+"aldh_only-both-chow.csv",
                             sep="\t"
    )
    dets_mark.plot_volcano(alpha=0.05, 
                           corrected_pval=False,
                           min_fc=1,
                           size=15,
                           log10_p_threshold=-5.5, 
                           log2_fc_threshold=3.5,
                           save=sc_settings_figdir+"dge_volcano_chow_aldh-both",
                           suffix="_astrocytes.png"
    )
    enri_aldh_both_0 = gseapy.enrichr(gene_list=dets_mark_summary.loc[dets_mark_summary['pval'] < 0.05,]['gene'].tolist(),
                                       organism='Mouse',
                                       gene_sets=repository,
                                       description='pathway',
                                       cutoff=cfenr
    )

In [205]:
if bool_plot == True:
    chow_aldh_gfap = enri_aldh_gfap_0.res2d[enri_aldh_gfap_0.res2d['Adjusted P-value'] < 0.05]
    chow_both_gfap = enri_both_gfap_0.res2d[enri_both_gfap_0.res2d['Adjusted P-value'] < 0.05]
    chow_aldh_both = enri_aldh_both_0.res2d[enri_aldh_both_0.res2d['Adjusted P-value'] < 0.05]

<a id="DE"></a>

# Diffusion maps of Astrocytes

In [206]:
if bool_plot == True:
    adata_astro_raw = adata_astro.copy()
    adata_astro_raw.X = adata_astro.raw.X.copy()
    sc.pp.log1p(adata_astro_raw)
    sc.pp.neighbors(adata_astro_raw)
    sc.tl.diffmap(adata_astro_raw)

    sc.pp.neighbors(adata_astro)
    sc.tl.diffmap(adata_astro) 

In [207]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro, 
                           ['gfap_aldh'], 
                           components='1,3',
                           use_raw=False,
                           size=20,
                           save="_astrocyte_leiden.png"
    )

### Chow

In [208]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ], 
                           ['aldh_only'], 
                           components='1,3',#,'Aldh1l1'
                           color_map=mymap,
                           use_raw=False, 
                           palette=['gainsboro', 'green'],
                           size=20,
                           save="_astrocyte_chow_aldh.png"
    )

In [209]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet']  == 'chow', ],
                           ['gfap_only'], 
                           components='1,3',# ,'Gfap' 
                           color_map=mymap,
                           use_raw=False,
                           palette=['gainsboro', 'magenta'],
                           size=20, 
                           save="_astrocyte_chow_gfap.png"
    )

In [210]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'chow', ],
                           ['both'], 
                           components='1,3',
                           use_raw=False,
                           palette=['gainsboro', 'blue'],
                           size=20, 
                           save="_astrocyte_chow_both.svg"
    )

### HFD 5 days

In [211]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                           ['aldh_only'], 
                           components='1,3',# ,'Aldh1l1'
                           color_map=mymap, 
                           use_raw=False,
                           palette=['gainsboro', 'green'],
                           size=20,
                           save="_astrocyte_hfd5_aldh.png"
    )

In [212]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                           ['gfap_only'],
                           components='1,3', #,'Gfap'
                           color_map=mymap,
                           use_raw=False,
                           palette=['gainsboro', 'magenta'], 
                           size=20, 
                           save="_astrocyte_hfd5_gfap.png"
    )

In [213]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_5', ],
                           ['both'],
                           components='1,3',
                           use_raw=False,
                           palette=['gainsboro','blue'],
                           size=20, 
                           save="_astrocyte_hfd5_both.svg"
    )

### HFD 15 days

In [214]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ],
                           ['aldh_only'],
                           components='1,3', #,'Aldh1l1'
                           color_map=mymap,
                           use_raw=False, 
                           palette=['gainsboro', 'green'],
                           size=20,
                           save="_astrocyte_hfd15_aldh.png"
    )

In [215]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ],
                           ['gfap_only'], 
                           components='1,3', #,'Gfap'
                           color_map=mymap,
                           use_raw=False,
                           palette=['gainsboro','magenta'],
                           size=20, 
                           save="_astrocyte_hfd15_gfap.png"
    )

In [216]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro[adata_astro.obs['diet'] == 'hfd_15', ],
                           ['both'], 
                           components='1,3',
                           use_raw=False,
                           palette=['gainsboro', 'blue'],
                           size=20,
                           save="_astrocyte_hfd15_both.svg"
    )

### Pseudotime

In [217]:
if bool_plot == True:
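    # Restrict root candidates to Leiden cluster '0'
    astrocyte_mask21 = np.isin(adata_astro.obs['leiden'],
                               '0'
    )
    # Position (within the masked cells) of the cell with the largest value in diffmap column 2
    max_astrocyte_id21 = np.argmax(adata_astro.obsm['X_diffmap'][astrocyte_mask21,2])
    # Same computation as the line above (also on adata_astro)
    max_astrocyte_id2_raw1 = np.argmax(adata_astro.obsm['X_diffmap'][astrocyte_mask21,2])
    # Map the masked position back to an index into all cells
    root_id21 = np.arange(len(astrocyte_mask21))[astrocyte_mask21][max_astrocyte_id21]
    # Use the same root cell for both objects
    adata_astro.uns['iroot'] = root_id21
    adata_astro_raw.uns['iroot'] = root_id21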

    #Compute dpt
    sc.tl.dpt(adata_astro,
              n_branchings=0
    )
    sc.tl.dpt(adata_astro_raw,
              n_branchings=0
    )
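The root-selection step in the cell above can be illustrated without AnnData. The following is a minimal sketch on made-up values; `dc3`, `leiden` and `pick_root` are hypothetical names introduced only for this illustration. It picks, among the cells of one cluster, the cell with the largest value of a single diffusion component and returns its index in the full cell list.

```python
# Hypothetical toy values: one diffusion-component coordinate per cell
# and the Leiden cluster label of each cell.
dc3 = [0.10, 0.80, 0.30, 0.95, 0.50]
leiden = ['0', '1', '0', '1', '0']


def pick_root(component, labels, cluster):
    """Return the global index of the cell in `cluster` with the largest component value."""
    candidates = [i for i, lab in enumerate(labels) if lab == cluster]
    if not candidates:
        raise ValueError(f"no cells in cluster {cluster!r}")
    return max(candidates, key=lambda i: component[i])


root = pick_root(dc3, leiden, '0')
print(root)  # 4
```

The global maximum (index 3) belongs to cluster '1', so restricting the search to cluster '0' changes which cell becomes the root.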

In [218]:
if bool_plot == True:
    cf.plot_diffmap_marker(adata_astro, 
                           ['dpt_pseudotime'],
                           components='1,3',
                           color_map=mymap,
                           use_raw=False,
                           size=20,
                           save="_astrocyte_pseudotime.png"
    )

# Correlation analysis of Gfap and Aldh1l1

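Correlation analysis is performed to identify genes whose expression correlates with Gfap and Aldh1l1 within each population and diet. Spearman rank correlation is computed per gene, and genes with a coefficient above the cutoff cfcor are passed on to enrichment analysis.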
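As a dependency-free illustration of the rank correlation used in the cells below (which call `stats.spearmanr`), the following sketch ranks the values with averaged ties and then computes a Pearson correlation on the ranks. The gene names and expression values are made up, and the helper functions are hypothetical; this is not the notebook's implementation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float('nan')
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values: one list of per-cell values per gene
expr = {
    'Marker': [0, 1, 2, 3, 4],
    'GeneA':  [0, 2, 4, 9, 16],   # increases monotonically with Marker
    'GeneB':  [5, 4, 3, 2, 1],    # decreases monotonically with Marker
    'GeneC':  [1, 1, 1, 1, 1],    # constant, correlation undefined
}
rho = {g: spearman(v, expr['Marker']) for g, v in expr.items() if g != 'Marker'}

cutoff = 0.2  # same value as cfcor in the notebook
selected = [g for g, r in rho.items() if r > cutoff]
print(selected)  # ['GeneA']
```

A constant gene yields NaN, and NaN never passes the `> cutoff` comparison, so such genes drop out of the selection automatically.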

In [219]:
if bool_plot == True:
    sc.pp.normalize_per_cell(adata_qc)
    adata_qc.X = adata_qc.X.toarray()
    adata_qc.obs['gfap_aldh'] = np.select([((adata_qc[:,'Gfap'].X > 0) & (adata_qc[:,'Aldh1l1'].X == 0)), 
                                           ((adata_qc[:,'Gfap'].X == 0) & (adata_qc[:,'Aldh1l1'].X > 0)),
                                           ((adata_qc[:,'Gfap'].X > 0) & (adata_qc[:,'Aldh1l1'].X > 0)), 
                                           ((adata_qc[:,'Gfap'].X == 0) & (adata_qc[:,'Aldh1l1'].X == 0))],
                                           ['gfap_only', 'aldh_only', 'both', 'none'])
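The four-way labelling built with `np.select` above can be written per cell as a small function, which may make the logic easier to check. This is a sketch on made-up expression values; `classify` and `cells` are hypothetical names.

```python
def classify(gfap, aldh1l1):
    """Label one cell by which of the two marker genes has non-zero expression."""
    if gfap > 0 and aldh1l1 > 0:
        return 'both'
    if gfap > 0:
        return 'gfap_only'
    if aldh1l1 > 0:
        return 'aldh_only'
    return 'none'


# Hypothetical (Gfap, Aldh1l1) expression values for four cells
cells = [(2.0, 0.0), (0.0, 1.5), (3.1, 0.4), (0.0, 0.0)]
labels = [classify(g, a) for g, a in cells]
print(labels)  # ['gfap_only', 'aldh_only', 'both', 'none']
```

The four conditions are mutually exclusive and cover every non-negative pair, so each cell receives exactly one label.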

In [220]:
if bool_plot == True:
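    # Enrichr gene-set library; alternative: 'GO_Biological_Process_2018'
    repository = 'KEGG_2019_Mouse'
    title = 'kegg'
    cfcor = 0.2  # Spearman correlation cutoff
    cfenr = 0.1  # cutoff passed to gseapy.enrichr and gseapy.barplot
    plt.rcParams['figure.figsize'] = [5, 5]

    # Union of all curated marker lists. Note: this name shadows the Python builtin all().
    all = homeostasis+hormone+gliotransmision+angiogenic+ecm+ucp2thyroid+iron+cellcycle+inflamation+calcium+hedgehog+others+a1astro+a2astro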
    ph = [homeostasis, hormone, gliotransmision, angiogenic,
          ecm, ucp2thyroid, iron, cellcycle, inflamation, 
          calcium, hedgehog, others, a1astro, a2astro]
    pname = ['homeostasis', 'hormone', 'gliotransmision', 'angiogenic',
             'ecm', 'ucp2thyroid', 'iron', 'cellcycle', 'inflamation',
             'calcium', 'hedgehog', 'others', 'a1astro', 'a2astro']

### Chow

In [221]:
if bool_plot == True:
    p = adata_qc[(adata_qc.obs['diet'] == 'chow') & (adata_astro.obs['gfap_aldh'] == 'aldh_only'), ]
    df = pd.DataFrame(p.X.toarray(), 
                      index=p.obs.index, 
                      columns=p.var.index
    )
    p.var['chow_aldh_only'] = [stats.spearmanr(df.iloc[:,i],df['Aldh1l1'])[0] for i in range(df.shape[1])]  
    
    plt.hist(p.var['chow_aldh_only'][p.var.index!='Aldh1l1'], bins=15)  
    plt.axvline(x=0.2,
                color='r',
                label='#genes: 394'
    )
    plt.legend(loc='upper right')
    plt.title('chow_aldh_only')

    enr_aldh_chow = gseapy.enrichr(gene_list=p.var[p.var['chow_aldh_only'] > cfcor].index.tolist(),
                                  organism='Mouse',
                                  gene_sets=repository,
                                  description='pathway',
                                  cutoff=cfenr)
    #gseapy.barplot(enr_aldh_chow.res2d,title=title+'_chow_aldh1l1',
    #               cutoff=cfenr, 
    #               top_term=25)#,
    #               #ofname=sc_settings_figdir+title+'_chow_aldh1l1')

    pvalue=[]
    cnt=[]
    gl=p.var[p.var['chow_aldh_only'] > cfcor].index
    gl1=p.var[p.var['chow_aldh_only'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    ch_al = pd.DataFrame({'Term':pname, 
                          'P-value':pvalue,
                          'Count':cnt,
                          'Comparison':'chow_aldh_only'}
    )
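For comparison with the contingency table used in the loop above, the following sketch shows the textbook over-representation test: a one-sided Fisher exact test, which is equivalent to a hypergeometric tail probability. It is not the construction used in this notebook (which adds a pseudocount and builds the table from the curated lists); the function name, gene names and background are made up for the illustration.

```python
from math import comb


def enrichment_pvalue(hits, background, gene_set):
    """One-sided Fisher exact (hypergeometric) p-value for over-representation.

    hits:       genes selected by the analysis (e.g. correlated genes)
    background: all genes that could have been selected
    gene_set:   curated list being tested
    """
    background = set(background)
    hits = set(hits) & background
    gene_set = set(gene_set) & background
    N, K, n = len(background), len(gene_set), len(hits)
    k = len(hits & gene_set)
    # P(overlap >= k) when drawing n genes without replacement from N, K of which are in the set
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical example: 10 background genes, a 3-gene set, 3 selected genes
background = [f'g{i}' for i in range(10)]
gene_set = ['g0', 'g1', 'g2']
hits = ['g0', 'g1', 'g5']

k, p = enrichment_pvalue(hits, background, gene_set)
print(k, round(p, 4))  # 2 0.1833
```

Making the background explicit keeps the four cells of the implied 2x2 table summing to the number of testable genes.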

In [222]:
if bool_plot == True:
    p = adata_qc[(adata_qc.obs['diet'] == 'chow') & (adata_astro.obs['gfap_aldh'] == 'gfap_only'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['chow_gfap_only'] = [stats.spearmanr(df.iloc[:,i],df['Gfap'])[0] for i in range(df.shape[1])]  
    
    plt.hist(p.var['chow_gfap_only'][p.var.index != 'Gfap'], bins=15)  
    plt.axvline(x=0.2, 
                color='r', 
                label='#genes: 766'
    )
    plt.legend(loc='upper right')
    plt.title('chow_gfap_only')
    
    enr_gfap_chow = gseapy.enrichr(gene_list=p.var[p.var['chow_gfap_only'] > cfcor].index.tolist(),
                                  organism='Mouse',
                                  gene_sets=repository,
                                  description='pathway',
                                  cutoff=cfenr
    )
    #gseapy.barplot(enr_gfap_chow.res2d,title=title+'_chow_gfap',
    #               cutoff=cfenr, 
    #               top_term=25    ),
    #               ofname=sc_settings_figdir+title+'_chow_gfap')
    
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['chow_gfap_only'] > cfcor].index
    gl1 = p.var[p.var['chow_gfap_only'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    ch_gf = pd.DataFrame({'Term':pname,
                          'P-value':pvalue, 
                          'Count':cnt, 
                          'Comparison':'chow_gfap_only'}
    )

In [223]:
if bool_plot == True:
    gseapy.barplot(chow_aldh[chow_aldh['Term'].isin(z1)], 
                   title=title+'_chow_intersect_aldh',
                   cutoff=cfenr,
                   top_term=25
    )#, ofname=sc_settings_figdir+title+'_chow_intersect_aldh')

In [224]:
if bool_plot == True:    
    gseapy.barplot(chow_gfap[chow_gfap['Term'].isin(z1)],
                   title=title+'_chow_intersect_gfap',
                   cutoff=cfenr,
                   top_term=25
    )#, ofname=sc_settings_figdir+title+'_chow_intersect_gfap')

In [225]:
if bool_plot == True:
    gseapy.barplot(chow_aldh[chow_aldh['Term'].isin(z2)],
                   title=title+'_chow_only_aldh',
                   cutoff=cfenr, 
                   top_term=25
    )#ofname=sc_settings_figdir+title+'_chow_only_aldh')

In [226]:
if bool_plot == True:
    gseapy.barplot(chow_gfap[chow_gfap['Term'].isin(z3)],
                   title=title+'_chow_only_gfap',
                   cutoff=cfenr, 
                   top_term=25
    )#ofname=sc_settings_figdir+title+'_chow_only_gfap')

In [227]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'chow') & (adata_astro.obs['gfap_aldh'] == 'both'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['chow_both_gfap'] = [stats.spearmanr(df.iloc[:,i],df['Gfap'])[0] for i in range(df.shape[1])]  

    plt.hist(p.var['chow_both_gfap'][p.var.index!='Gfap'], bins=20)  
    plt.axvline(x=0.2, 
                color='r',
                label='#genes: 1847'
    )
    plt.legend(loc='upper right')
    plt.title('chow_both_gfap')
    
    enr_both_chow_gfap = gseapy.enrichr(gene_list=p.var[p.var['chow_both_gfap']>cfcor].index.tolist(),
                                         organism='Mouse',
                                         gene_sets=repository,
                                         description='pathway',
                                         cutoff=cfenr
    )
    #gseapy.barplot(enr_res.res2d,title=title+'_chow_both_gfap', cutoff=cfenr, top_term=25)#,
                  # ofname=sc_settings_figdir+title+'_chow_both_gfap')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['chow_both_gfap'] > cfcor].index
    gl1 = p.var[p.var['chow_both_gfap'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    ch_bt_gf = pd.DataFrame({'Term':pname,
                             'P-value':pvalue,
                             'Count':cnt, 
                             'Comparison':'chow_both_gfap'}
    )        

In [228]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'chow') & (adata_astro.obs['gfap_aldh'] == 'both'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['chow_both_aldh'] = [stats.spearmanr(df.iloc[:,i],df['Aldh1l1'])[0] for i in range(df.shape[1])]  
    
    plt.hist(p.var['chow_both_aldh'][p.var.index!='Aldh1l1'], bins=15)#, bins=20)  
    plt.axvline(x=0.2,
                color='r',
                label='#genes: 1212'
    )
    plt.legend(loc='upper right')
    plt.title('chow_both_aldh')
    
    enr_both_chow_aldh = gseapy.enrichr(gene_list=p.var[p.var['chow_both_aldh'] > cfcor].index.tolist(),
                                        organism='Mouse',
                                        gene_sets=repository,
                                        description='pathway',
                                        cutoff=cfenr
    )
    #gseapy.barplot(enr_res.res2d,title=title+'_chow_both_aldh1l1',cutoff=cfenr, top_term=25)#,
                   #ofname=sc_settings_figdir+title+'_chow_both_aldh1l1')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['chow_both_aldh'] > cfcor].index
    gl1 = p.var[p.var['chow_both_aldh'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    ch_bt_al = pd.DataFrame({'Term':pname, 
                             'P-value':pvalue,
                             'Count':cnt,
                             'Comparison':'chow_both_aldh'}
    )        

In [229]:
if bool_plot == True:
    chow_aldh = enr_aldh_chow.res2d[enr_aldh_chow.res2d['Adjusted P-value'] < 0.05]
    chow_gfap = enr_gfap_chow.res2d[enr_gfap_chow.res2d['Adjusted P-value'] < 0.05]
    chow_both_aldh = enr_both_chow_aldh.res2d[enr_both_chow_aldh.res2d['Adjusted P-value'] < 0.05]
    chow_both_gfap = enr_both_chow_gfap.res2d[enr_both_chow_gfap.res2d['Adjusted P-value'] < 0.05]
    
    ch_al = ch_al[ch_al['P-value'] < 0.05]
    ch_gf = ch_gf[ch_gf['P-value'] < 0.05]
    ch_bt_gf = ch_bt_gf[ch_bt_gf['P-value'] < 0.05]
    ch_bt_al = ch_bt_al[ch_bt_al['P-value'] < 0.05]
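The Enrichr results above are filtered on an adjusted p-value, while the Fisher-test tables are filtered on the raw p-value. To show what an adjustment does to a short list of p-values, here is a minimal sketch of the Benjamini-Hochberg procedure on made-up numbers. Whether the curated-list p-values should be adjusted is a choice for the analysis; the function below is only an illustration and is not part of the notebook.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four terms
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(a, 4) for a in adj])  # [0.04, 0.0533, 0.0533, 0.2]

kept_raw = [i for i, p in enumerate(raw) if p < 0.05]
kept_adj = [i for i, p in enumerate(adj) if p < 0.05]
```

In this toy case three terms pass the raw 0.05 threshold but only one passes after adjustment.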

### HFD 5 days

In [230]:
if bool_plot == True:
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_5') & (adata_astro.obs['gfap_aldh'] == 'aldh_only'), ]
    df = pd.DataFrame(p.X.toarray(), 
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd5_aldh_only'] = [stats.spearmanr(df.iloc[:,i],df['Aldh1l1'])[0] for i in range(df.shape[1])]  
    
    plt.hist(p.var['hfd5_aldh_only'][p.var.index!='Aldh1l1'], bins=20)  
    plt.axvline(x=0.2, 
                color='r',
                label='#genes: 357'
    )
    plt.legend(loc='upper right')
    plt.title('hfd5_aldh_only')
    
    enr_aldh_hfd5 = gseapy.enrichr(gene_list=p.var[p.var['hfd5_aldh_only'] > cfcor].index.tolist(),
                                   organism='Mouse',
                                   gene_sets=repository,
                                   description='pathway',
                                   cutoff=cfenr
    )
    #gseapy.barplot(enr_res_aldh.res2d,title=title+'_hfd5_aldh1l1',cutoff=cfenr, top_term=25)#,
                   #ofname=sc_settings_figdir+title+'_hfd5_aldh1l1')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['hfd5_aldh_only'] > cfcor].index
    gl1 = p.var[p.var['hfd5_aldh_only'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h5_al = pd.DataFrame({'Term':pname,
                          'P-value':pvalue,
                          'Count':cnt,
                          'Comparison':'hfd5_aldh_only'}
    )

In [231]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_5') & (adata_astro.obs['gfap_aldh'] == 'gfap_only'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd5_gfap_only'] = [stats.spearmanr(df.iloc[:,i],df['Gfap'])[0] for i in range(df.shape[1])]  
    
    plt.hist(p.var['hfd5_gfap_only'][p.var.index!='Gfap'], bins=20)  
    plt.axvline(x=0.2,
                color='r',
                label='#genes: 958'
    )
    plt.legend(loc='upper right')
    plt.title('hfd5_gfap_only')
    print(len(p.var[p.var['hfd5_gfap_only'] > cfcor]))
    
    enr_gfap_hfd5 = gseapy.enrichr(gene_list=p.var[p.var['hfd5_gfap_only']>cfcor].index.tolist(),
                             organism='Mouse',
                             gene_sets=repository,
                             description='pathway',
                             cutoff=cfenr
    )
    #gseapy.barplot(enr_res_gfap.res2d,title=title+'_hfd5_gfap',cutoff=cfenr, top_term=25)#,
                   #ofname=sc_settings_figdir+title+'_hfd5_gfap')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['hfd5_gfap_only'] > cfcor].index
    gl1 = p.var[p.var['hfd5_gfap_only'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h5_gf = pd.DataFrame({'Term':pname, 
                          'P-value':pvalue, 
                          'Count':cnt,
                          'Comparison':'hfd5_gfap_only'}
    )    

In [232]:
if bool_plot == True:
    gseapy.barplot(hfd5_aldh[hfd5_aldh['Term'].isin(z1)], 
                   title=title+'_hfd5_intersect_aldh', 
                   cutoff=cfenr,
                   top_term=25#, ofname=sc_settings_figdir+title+'_hfd5_intersect_aldh')
    )

In [233]:
if bool_plot == True:    
    gseapy.barplot(hfd5_gfap[hfd5_gfap['Term'].isin(z1)],
                   title=title+'_hfd5_intersect_gfap', 
                   cutoff=cfenr,
                   top_term=25#, ofname=sc_settings_figdir+title+'_hfd5_intersect_gfap')
    )

In [234]:
if bool_plot == True:
    gseapy.barplot(hfd5_aldh[hfd5_aldh['Term'].isin(z2)],
                   title=title+'_hfd5_only_aldh',
                   cutoff=cfenr,
                   top_term=25#,#ofname=sc_settings_figdir+title+'_hfd5_only_aldh')
    )

In [235]:
if bool_plot == True:
    gseapy.barplot(hfd5_gfap[hfd5_gfap['Term'].isin(z3)],
                   title=title+'_hfd5_only_gfap',
                   cutoff=cfenr,
                   top_term=25#, # ofname=sc_settings_figdir+title+'_hfd5_only_gfap')
    )

In [236]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_5') & (adata_astro.obs['gfap_aldh'] == 'both'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd5_both_gfap'] = [stats.spearmanr(df.iloc[:,i],df['Gfap'])[0] for i in range(df.shape[1])]  

    plt.hist(p.var['hfd5_both_gfap'][p.var.index!='Gfap'], bins=15)  
    plt.axvline(x=0.2, 
                color='r',
                label='#genes: 2277'
    )
    plt.legend(loc='upper right')
    plt.title('hfd5_both_gfap')
    
    enr_hfd5_both_gfap = gseapy.enrichr(gene_list=p.var[p.var['hfd5_both_gfap']>cfcor].index.tolist(),
                             organism='Mouse',
                             gene_sets=repository,
                             description='pathway',
                             cutoff=cfenr
    )
    #gseapy.barplot(enr_res.res2d,title=title+'_hfd5_both_gfap',cutoff=cfenr, top_term=25)#,
                  # ofname=sc_settings_figdir+title+'_hfd5_both_gfap')
    pvalue = []
    cnt = []
    gl = p.var[p.var['hfd5_both_gfap']>cfcor].index
    gl1 = p.var[p.var['hfd5_both_gfap']<cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1)&(set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h5_bt_gf=pd.DataFrame({'Term':pname, 'P-value':pvalue, 'Count':cnt, 'Comparison':'hfd5_both_gfap'})           

In [237]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_5')&(adata_astro.obs['gfap_aldh'] == 'both'), ]
    df = pd.DataFrame(p.X.toarray(), index=p.obs.index, columns=p.var.index)
    p.var['hfd5_both_aldh'] = [stats.spearmanr(df.iloc[:,i],df['Aldh1l1'])[0] for i in range(df.shape[1])]  

    plt.hist(p.var['hfd5_both_aldh'][p.var.index!='Aldh1l1'], bins=20)  
    plt.axvline(x=0.2, 
                color='r',
                label='#genes: 501'
    )
    plt.legend(loc='upper right')
    plt.title('hfd5_both_aldh')
    print(len(p.var[p.var['hfd5_both_aldh']>cfcor]))
    
    enr_hfd5_both_aldh = gseapy.enrichr(gene_list=p.var[p.var['hfd5_both_aldh'] > cfcor].index.tolist(),
                             organism='Mouse',
                             gene_sets=repository,
                             description='pathway',
                             cutoff=cfenr)
    #gseapy.barplot(enr_res.res2d,title=title+'_hfd5_both_aldh1l1',cutoff=cfenr, top_term=25)#,
                   #ofname=sc_settings_figdir+title+'_hfd5_both_aldh1l1')
    pvalue = []
    cnt = []
    gl = p.var[p.var['hfd5_both_aldh'] > cfcor].index
    gl1 = p.var[p.var['hfd5_both_aldh'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h5_bt_al = pd.DataFrame({'Term':pname, 
                             'P-value':pvalue,
                             'Count':cnt,
                             'Comparison':'hfd5_both_aldh'}
    )           

In [238]:
if bool_plot == True:
    hfd5_aldh = enr_aldh_hfd5.res2d[enr_aldh_hfd5.res2d['Adjusted P-value'] < 0.05]
    hfd5_gfap = enr_gfap_hfd5.res2d[enr_gfap_hfd5.res2d['Adjusted P-value'] < 0.05]
    hfd5_both_aldh = enr_hfd5_both_aldh.res2d[enr_hfd5_both_aldh.res2d['Adjusted P-value'] < 0.05]
    hfd5_both_gfap = enr_hfd5_both_gfap.res2d[enr_hfd5_both_gfap.res2d['Adjusted P-value'] < 0.05]

    h5_al = h5_al[h5_al['P-value'] < 0.05]
    h5_gf = h5_gf[h5_gf['P-value'] < 0.05]
    h5_bt_gf = h5_bt_gf[h5_bt_gf['P-value'] < 0.05]
    h5_bt_al = h5_bt_al[h5_bt_al['P-value'] < 0.05]

### HFD 15 days

In [239]:
if bool_plot == True:
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_15') & (adata_astro.obs['gfap_aldh'] == 'aldh_only'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd15_aldh_only'] = [stats.spearmanr(df.iloc[:,i],df['Aldh1l1'])[0] for i in range(df.shape[1])]  
    
    plt.hist(p.var['hfd15_aldh_only'][p.var.index!='Aldh1l1'], bins=15)  
    plt.axvline(x=0.2, 
                color='r',
                label='#genes: 499'
    )
    plt.legend(loc='upper right')
    plt.title('hfd15_aldh_only')
    
    enr_aldh_hfd15 = gseapy.enrichr(gene_list=p.var[p.var['hfd15_aldh_only']>cfcor].index.tolist(),
                             organism='Mouse',
                             gene_sets=repository,
                             description='pathway',
                             cutoff=cfenr
    )
    #gseapy.barplot(enr_res_aldh.res2d,title=title+'_hfd15_aldh1l1',cutoff=cfenr, top_term=25)#,
                 #  ofname=sc_settings_figdir+title+'_hfd15_aldh1l1')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['hfd15_aldh_only']>cfcor].index
    gl1 = p.var[p.var['hfd15_aldh_only']<cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h15_al = pd.DataFrame({'Term':pname, 
                           'P-value':pvalue, 
                           'Count':cnt,  
                           'Comparison':'hfd15_aldh_only'}
    )

In [240]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_15') & (adata_astro.obs['gfap_aldh'] == 'gfap_only'), ]
    df = pd.DataFrame(p.X.toarray(), 
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd15_gfap_only'] = [stats.spearmanr(df.iloc[:,i],df['Gfap'])[0] for i in range(df.shape[1])]  

    plt.hist(p.var['hfd15_gfap_only'][p.var.index!='Gfap'], bins=20)  
    plt.axvline(x=0.2, 
                color='r',
                label='#genes: 2052'
    )
    plt.legend(loc='upper right')
    plt.title('hfd15_gfap_only')
    print(len(p.var[p.var['hfd15_gfap_only'] > cfcor]))
    
    enr_gfap_hfd15 = gseapy.enrichr(gene_list=p.var[p.var['hfd15_gfap_only'] > cfcor].index.tolist(),
                             organism='Mouse',
                             gene_sets=repository,
                             description='pathway',
                             cutoff=cfenr
    )
    #gseapy.barplot(enr_res_gfap.res2d,title=title+'_hfd15_gfap',cutoff=cfenr, top_term=25)#,
                  # ofname=sc_settings_figdir+title+'_hfd15_gfap')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['hfd15_gfap_only'] > cfcor].index
    gl1 = p.var[p.var['hfd15_gfap_only'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h15_gf = pd.DataFrame({'Term':pname, 
                           'P-value':pvalue, 
                           'Count':cnt, 
                           'Comparison':'hfd15_gfap_only'}
    )    

In [241]:
if bool_plot == True:
    gseapy.barplot(hfd15_aldh[hfd15_aldh['Term'].isin(z1)], 
                   title=title+'_hfd15_intersect_aldh', 
                   cutoff=cfenr,
                   top_term=25#, ofname=sc_settings_figdir+title+'_hfd15_intersect_aldh')
    )

In [242]:
if bool_plot == True:    
    gseapy.barplot(hfd15_gfap[hfd15_gfap['Term'].isin(z1)], 
                   title=title+'_hfd15_intersect_gfap', 
                   cutoff=cfenr,
                   top_term=25#, ofname=sc_settings_figdir+title+'_hfd15_intersect_gfap')
    )

In [243]:
if bool_plot == True:
    gseapy.barplot(hfd15_aldh[hfd15_aldh['Term'].isin(z2)], 
                   title=title+'_hfd15_only_aldh', 
                   cutoff=cfenr, 
                   top_term=25#,#ofname=sc_settings_figdir+title+'_hfd15_only_aldh')
    )

In [244]:
if bool_plot == True:
    gseapy.barplot(hfd15_gfap[hfd15_gfap['Term'].isin(z3)],
                   title=title+'_hfd15_only_gfap',
                   cutoff=cfenr,
                   top_term=25#,#ofname=sc_settings_figdir+title+'_hfd15_only_gfap')
    )

In [245]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_15') & (adata_astro.obs['gfap_aldh'] == 'both'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd15_both_gfap'] = [stats.spearmanr(df.iloc[:,i],df['Gfap'])[0] for i in range(df.shape[1])]  

    plt.hist(p.var['hfd15_both_gfap'][p.var.index!='Gfap'], bins=20)  
    plt.axvline(x=0.2, 
                color='r', 
                label='#genes: 470'
    )
    plt.legend(loc='upper right')
    plt.title('hfd15_both_gfap')
    print(len(p.var[p.var['hfd15_both_gfap'] > cfcor]))

    enr_both_hfd15_gfap = gseapy.enrichr(gene_list=p.var[p.var['hfd15_both_gfap']>cfcor].index.tolist(),
                                    organism='Mouse',
                                    gene_sets=repository,
                                    description='pathway',
                                    cutoff=cfenr
    )
    #gseapy.barplot(enr_res.res2d,title=title+'_hfd15_both_gfap',cutoff=cfenr, top_term=25)#,
                   #ofname=sc_settings_figdir+title+'_hfd15_both_gfap')
    pvalue = []
    cnt = []
    gl = p.var[p.var['hfd15_both_gfap'] > cfcor].index
    gl1 = p.var[p.var['hfd15_both_gfap'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h15_bt_gf = pd.DataFrame({'Term':pname,
                              'P-value':pvalue, 
                              'Count':cnt, 
                              'Comparison':'hfd15_both_gfap'}
    )           

In [246]:
if bool_plot == True:    
    p = adata_qc[(adata_qc.obs['diet'] == 'hfd_15') & (adata_astro.obs['gfap_aldh'] == 'both'), ]
    df = pd.DataFrame(p.X.toarray(),
                      index=p.obs.index,
                      columns=p.var.index
    )
    p.var['hfd15_both_aldh'] = [stats.spearmanr(df.iloc[:,i],df['Aldh1l1'])[0] for i in range(df.shape[1])]  

    plt.hist(p.var['hfd15_both_aldh'][p.var.index!='Aldh1l1'], bins=20)  
    plt.axvline(x=0.2,  color='r', label='#genes: 543')
    plt.legend(loc='upper right')
    plt.title('hfd15_both_aldh')
    print(len(p.var[p.var['hfd15_both_aldh'] > cfcor]))

    enr_both_hfd15_aldh = gseapy.enrichr(gene_list=p.var[p.var['hfd15_both_aldh']>cfcor].index.tolist(),
                                         organism='Mouse',
                                         gene_sets=repository,
                                         description='pathway',
                                         cutoff=cfenr
    )
    #gseapy.barplot(enr_res.res2d,title=title+'_hfd15_both_aldh1l1',cutoff=cfenr, top_term=25)#,
                   #ofname=sc_settings_figdir+title+'_hfd15_both_aldh1l1')
    pvalue=[]
    cnt=[]
    gl = p.var[p.var['hfd15_both_aldh'] > cfcor].index
    gl1 = p.var[p.var['hfd15_both_aldh'] < cfcor].index
    for i in ph:
        pval = stats.fisher_exact([[len(set(gl) & set(i))+1, len(i)-len(set(gl) & set(i))],
                                  [len(set(gl) & set(set(all)-set(i))), len(set(gl1) & (set(all)-set(i)))]])[1]
        pvalue.append(pval)
        cnt.append(len(set(gl) & set(i)))

    h15_bt_al = pd.DataFrame({'Term':pname,
                              'P-value':pvalue, 
                              'Count':cnt, 
                              'Comparison':'hfd15_both_aldh'}
    )           

In [247]:
if bool_plot == True:
    hfd15_aldh = enr_aldh_hfd15.res2d[enr_aldh_hfd15.res2d['Adjusted P-value'] < 0.05]
    hfd15_gfap = enr_gfap_hfd15.res2d[enr_gfap_hfd15.res2d['Adjusted P-value'] < 0.05]
    hfd15_both_aldh = enr_both_hfd15_aldh.res2d[enr_both_hfd15_aldh.res2d['Adjusted P-value'] < 0.05]
    hfd15_both_gfap = enr_both_hfd15_gfap.res2d[enr_both_hfd15_gfap.res2d['Adjusted P-value'] < 0.05]
    
    h15_al = h15_al[h15_al['P-value'] < 0.05]
    h15_gf = h15_gf[h15_gf['P-value'] < 0.05]
    h15_bt_gf = h15_bt_gf[h15_bt_gf['P-value'] < 0.05]
    h15_bt_al = h15_bt_al[h15_bt_al['P-value'] < 0.05]
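
The cells above score each pathway by the overlap between the correlated genes and the pathway members, using `stats.fisher_exact` (two-sided by default) on a table with a +1 pseudocount. As a simplified illustration of the same idea, the sketch below computes a one-sided hypergeometric tail probability instead, so its values will not match the notebook's. The function name `enrichment_pvalue` and the toy gene names are hypothetical.

```python
from math import comb

def enrichment_pvalue(hits, pathway, universe):
    """Return (overlap, P(X >= overlap)) for X ~ Hypergeometric(N, K, n)."""
    universe = set(universe)
    hits = set(hits) & universe          # selected (e.g. correlated) genes
    pathway = set(pathway) & universe    # pathway members that were measured
    N, K, n = len(universe), len(pathway), len(hits)
    k = len(hits & pathway)              # observed overlap
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)

# Toy example (hypothetical genes): 10 measured genes, a 3-gene pathway, 3 selected genes.
toy_universe = [f"g{i}" for i in range(10)]
overlap, pval = enrichment_pvalue(["g0", "g1", "g5"], ["g0", "g1", "g2"], toy_universe)
print(overlap, round(pval, 4))
```

A small p-value indicates that the selected genes overlap the pathway more than expected by chance. With many pathways tested, the p-values would still need multiple-testing correction.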

# Clusters 0,1,2 only

Here we focus on clusters 0, 1 and 2 as a potential astrocyte population.

<a id="Embedding"></a>

## Embedding and Clustering

In [248]:
if bool_recomp == True:  
    cell_ids_012 = np.asarray(adata_proc.obs_names)[
        [x in ['astrocytes','ependymal'] 
         for x in np.asarray(adata_proc.obs['celltypes'].values)]
    ]
    adata_012 = adata_raw[cell_ids_012,:].copy()
    adata_012.obs['n_genes'] = (adata_012.X > 0).sum(1)
    adata_012.obs['n_counts'] = adata_012.X.sum(1)
    mt_gene_mask = [gene.startswith('mt-') for gene in adata_012.var_names]
    temp_mt_sum = adata_012[:,mt_gene_mask].X.sum(1)
    temp_mt_sum = np.squeeze(np.asarray(temp_mt_sum))
    temp_n_counts = adata_012.obs['n_counts']
    adata_012.obs['mt_frac'] = temp_mt_sum/adata_012.obs['n_counts']
    adata_012.raw = adata_012
    sc.pp.normalize_per_cell(adata_012)
    sc.pp.log1p(adata_012)
    sc.pp.highly_variable_genes(adata_012,n_top_genes=4000)
    sc.pl.highly_variable_genes(adata_012)
    adata_012.X = adata_012.X.toarray()
    
    sc.pp.pca(adata_012,
              n_comps=50,
              use_highly_variable=True,
              random_state=0, 
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_012, 
                    n_neighbors=100, 
                    knn=True, 
                    method='umap', 
                    n_pcs=50,
                    random_state=0
    )
    sc.tl.umap(adata_012)
    if bool_recluster == True:
        sc.tl.leiden(adata_012,
                     resolution=0.5)
        pd.DataFrame(adata_012.obs).to_csv(path_or_buf=sc_settings_writedir+'obs_adata_012.csv')
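    else:
        # Reuse the stored clustering; the CSV is written by the Leiden branch above,
        # so read the 'leiden' column and align it to the cells by barcode.
        obs = pd.read_csv(sc_settings_writedir+'obs_adata_012.csv', index_col=0)
        adata_012.obs['leiden'] = (obs.loc[adata_012.obs_names, 'leiden']
                                   .astype(str)
                                   .astype('category')
        )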
    sc.write(sc_settings_writedir+'adata_012.h5ad', adata_012)
else:
    adata_012 = sc.read(sc_settings_writedir+'adata_012.h5ad') 
sc.tl.paga(adata_012)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)


In [249]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012, 
                        ['leiden'],
                        save="_clust012_leiden.png",
                        use_raw=False, 
                        size=5
    )

In [250]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012, 
                        ['leiden'],
                        save="_clust012_leiden_ondata.png",
                        use_raw=False, 
                        legend_loc='on data', 
                        size=5
    )

In [251]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012,
                        ['n_genes', 'n_counts', 'mt_frac'], 
                        color_map=mymap, 
                        size=10,
                        save="_clust012_n_gene_count_mt.png",
                        use_raw=False
    )

Cells with a high fraction of mitochondrial RNA are concentrated in cluster 3 and around the central cluster 8.
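
The `mt_frac` covariate plotted above is the per-cell share of counts that come from genes whose names start with `mt-`. A minimal sketch of that calculation on a hypothetical toy count matrix (the values and the zero-count guard are illustrative assumptions, not part of the notebook):

```python
import numpy as np

# Toy count matrix: 3 cells x 4 genes (hypothetical values).
counts = np.array([[10, 0, 5, 5],
                   [2, 2, 0, 16],
                   [0, 0, 0, 0]], dtype=float)
gene_names = ['Gfap', 'Aldh1l1', 'mt-Co1', 'mt-Nd1']

# Mouse mitochondrial genes carry the lowercase 'mt-' prefix.
mt_mask = np.array([gene.startswith('mt-') for gene in gene_names])

n_counts = counts.sum(axis=1)               # total counts per cell
mt_counts = counts[:, mt_mask].sum(axis=1)  # mitochondrial counts per cell

# Divide only where a cell has counts; empty cells get a fraction of 0.
mt_frac = np.divide(mt_counts, n_counts,
                    out=np.zeros_like(mt_counts),
                    where=n_counts > 0)
print(mt_frac)
```

Cells with a high `mt_frac` are often stressed or damaged, which is why this covariate is inspected alongside `n_genes` and `n_counts`.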

In [252]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012, 
                        ['Gfap', 'Aldh1l1'], 
                        color_map=mymap,
                        size=10, 
                        save="_clust012_markers.png",
                        use_raw=False
    )

In [253]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012,
                        ['diet'],
                        save="_clust012_diet.png",
                        size=10,
                        use_raw=False
    )

In [254]:
if bool_plot == True:
    sc.pl.paga(adata_012,
               save="_clust012.png"
    )

In [255]:
if bool_plot == True:
    cf.cell_percent(adata_012,
                    cluster='leiden', 
                    condition='diet', 
                    xlabel='clusters',
                    ylabel='percentage', 
                    title='barplot_clust012_diet_per_clusters',
                    save=sc_settings_figdir,
                    table=False
    )

In [256]:
if bool_plot == True:
    aldh_pos = adata_012.obs_names[np.asarray(adata_012[:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_012.obs_names[np.asarray(adata_012[:,'Gfap'].X).flatten() > 0]
    glast_pos = adata_012.obs_names[np.asarray(adata_012[:,'Slc1a3'].X).flatten() > 0]

    matplotlib_venn.venn3([set(aldh_pos),
                           set(gfap_pos),
                           set(glast_pos)],
                          set_labels=("Aldh1l1", "Gfap", "Slc1a3")
    )
    plt.savefig(sc_settings_figdir+'venndiagram_clust012_gfap-aldh-glast.png')

In [257]:
if bool_plot == True:
    aldh_pos = adata_012.obs_names[np.asarray(adata_012[:,'Aldh1l1'].X).flatten() > 0]
    gfap_pos = adata_012.obs_names[np.asarray(adata_012[:,'Gfap'].X).flatten() > 0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"),
                          set_colors=('lime', 'magenta'))
    plt.savefig(sc_settings_figdir+'venndiagram_clust012_gfap-aldh.png')

## Define Cell Types

### DE Genes

In [258]:
sc.tl.rank_genes_groups(adata_012,
                        groupby='leiden',
                        key_added='rank_genes'
)

if bool_plot == True:
    sc.pl.rank_genes_groups(adata_012, 
                            key='rank_genes',
                            groups=['0','1','2'],
                            save="_clust012_1.png"
    )
    sc.pl.rank_genes_groups(adata_012, 
                            key='rank_genes',
                            groups=['3','4'], 
                            save="_clust012_2.png"
    )
    #sc.pl.rank_genes_groups(adata_012, key='rank_genes', groups=['6','7'], save="_clust012_3.png")

ranking genes
    finished: added to `.uns['rank_genes']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:00)


### Summary heatmap, dotplot and stacked_violin for cluster assignments

In [259]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_012, 
                  var_names=marker_genes_dict, 
                  groupby="leiden", 
                  use_raw=False, 
                  log=False, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_clust012_celltypes.png"
    )

In [260]:
if bool_plot == True:
    sc.pl.dotplot(adata=adata_012,
                  var_names=marker_genes_dict, 
                  groupby='leiden',
                  use_raw=False, 
                  log=False, 
                  dendrogram=True, 
                  var_group_rotation=90, 
                  show=True, 
                  save="_clust012_celltypes.png"
    )

In [261]:
if bool_plot == True:
    sc.pl.stacked_violin(adata=adata_012, 
                         var_names=marker_genes_dict, 
                         groupby='leiden', 
                         use_raw=False,
                         dendrogram=True,
                         cmap='viridis_r',
                         show=True,
                         save="_clust012_celltypes.png"
    )

In [262]:
if bool_plot == True:
    plt.figure(figsize=(7,7))
    cell_annotation = sc.tl.marker_gene_overlap(adata_012,
                                                marker_genes_dict, 
                                                key='rank_genes', 
                                                normalize='data'
    )
    sb.heatmap(cell_annotation, 
               cbar=False,
               annot=True
    )
    plt.savefig(sc_settings_figdir+'heatmap_clust012_rank_genes_cell_annotation.png')

In [263]:
if bool_plot == True:
    sc.tl.embedding_density(adata_012,
                            basis='umap',
                            groupby='diet'
    )
    sc.pl.embedding_density(adata_012, 
                            basis='umap',
                            key='umap_density_diet',
                            group=['chow', 'hfd_5', 'hfd_15'],
                            bg_dotsize=5, 
                            fg_dotsize=30,
                            save="clust012.png"
    )

## Count distribution for Aldh1l1 and Gfap

#### Single and double positive counts

Define booleans for single and double positive counts of Gfap and Aldh1l1.

In [264]:
non_boolean_int = np.array((adata_012[:,'Gfap'].X <= 0) & (adata_012[:,'Aldh1l1'].X <= 0)
                           , dtype=int
)
gfap_single_boolean = (adata_012[:,'Gfap'].X > 0) & (adata_012[:,'Aldh1l1'].X <= 0)
aldh_single_boolean = (adata_012[:,'Aldh1l1'].X > 0) & (adata_012[:,'Gfap'].X <= 0) 
single_boolean_int = np.array((gfap_single_boolean | aldh_single_boolean), 
                              dtype=int)*1

gfap_aldh_double_boolean = (adata_012[:,'Gfap'].X>0) & (adata_012[:,'Aldh1l1'].X>0)
double_boolean_int = np.array((gfap_aldh_double_boolean), 
                              dtype=int)*2

non_boolean_int *=0

In [265]:
gfap_single_pos = adata_012[:,'Gfap'].X[gfap_single_boolean]
aldh_single_pos = adata_012[:,'Aldh1l1'].X[aldh_single_boolean]

print('Gfap Single Positive ',len(gfap_single_pos))
print('Aldh1l1 Single Positive ',len(aldh_single_pos))

gfap_aldh_double_pos = adata_012[:,'Gfap'].X[gfap_aldh_double_boolean]

print('Gfap/Aldh1l1 Double Positive ',len(gfap_aldh_double_pos))


Gfap Single Positive  1698
Aldh1l1 Single Positive  965
Gfap/Aldh1l1 Double Positive  736


<a id="adipmarkers"></a>

In [266]:
adata_012.obs['sdt_pos'] = np.array((non_boolean_int + single_boolean_int + double_boolean_int),
                                    dtype=str
)
adata_012.obs['s_pos'] = np.array((np.array(gfap_single_boolean, dtype=int)*1)+(np.array(aldh_single_boolean, dtype=int)*2), 
                                  dtype=str
)
adata_012.obs['d_pos'] = np.array((np.array(gfap_aldh_double_boolean, dtype=int)*1),
                                  dtype=str
)

# make them categorical
adata_012.obs['sdt_pos'] = pd.Series(adata_012.obs['sdt_pos'],
                                     dtype="category"
)
adata_012.obs['s_pos'] = pd.Series(adata_012.obs['s_pos'], 
                                   dtype="category"
)
adata_012.obs['d_pos'] = pd.Series(adata_012.obs['d_pos'],
                                   dtype="category"
)

In [267]:
if bool_plot == True:
    new_cluster_names = ['Non Positives', 'Single Positives', 'Double Positives']
    adata_012.rename_categories('sdt_pos', 
                                new_cluster_names
    )
    sc.pl.umap(adata_012,
               color=['sdt_pos'],
               size=10, 
               palette=["lightgrey", "tomato", "blue"], 
               save="_clust012_gfap-aldh_single_double_positiv.png"
    )
    new_cluster_names = ['Non Single Positives', 'Gfap Single Positives', 'Aldh1l1 Single Positives']
    adata_012.rename_categories('s_pos',
                                new_cluster_names
    )
    sc.pl.umap(adata_012[adata_012.obs['s_pos'] != 'Non Single Positives'],
               color=['s_pos'],
               size=10, 
               palette=["magenta", "lime"],
               save="_clust012_gfap-aldh_single_positiv.png"
    )
    new_cluster_names = ['Non Double Positives', 'Gfap/Aldh1l1 Double Positive']
    adata_012.rename_categories('d_pos',
                                new_cluster_names
    )
    sc.pl.umap(adata_012[adata_012.obs['d_pos'] != 'Non Double Positives'],
               color=['d_pos'],
               size=10,
               save="_clust012_gfap-aldh_doble_positiv.png"
    )

### Cluster 0,1,2 chow diet

In [268]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_012[adata_012.obs['diet'] == 'chow', ][:, ['Gfap']].X, 
                           adata_012[adata_012.obs['diet'] == 'chow', ][:, ['Aldh1l1']].X, 
                           s=9,
                           cmap='seismic',
                           c='dodgerblue'
    )
    ax1.set_title('chow')
    # Gfap is plotted on the x-axis and Aldh1l1 on the y-axis.
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_012[adata_012.obs['diet']=='chow', ], 
                          color=['leiden'],
                          size=10,
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'gfap-aldh_clust012_chow.png')

In [269]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'chow', ],
                        ['Gfap', 'Aldh1l1'],
                        save="_clust012_chow_gfap-aldh.png",
                        size=10, 
                        color_map=mymap,
                        use_raw=False
    )

In [270]:
if bool_plot == True:
    aldh_pos = adata_012[adata_012.obs['diet'] == 'chow', ].obs_names[np.asarray(adata_012[adata_012.obs['diet'] == 'chow', ][:,'Aldh1l1'].X).flatten()>0]
    gfap_pos = adata_012[adata_012.obs['diet'] == 'chow', ].obs_names[np.asarray(adata_012[adata_012.obs['diet'] == 'chow', ][:,'Gfap'].X).flatten()>0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"),
                          set_colors=('lime', 'magenta')
    )
    plt.savefig(sc_settings_figdir+'venndiagram_clust012_chow_gfap-aldh.png')

### Cluster 0,1,2 hfd_5 diet

In [271]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_012[adata_012.obs['diet'] == 'hfd_5', ][:, ['Gfap']].X, 
                           adata_012[adata_012.obs['diet'] == 'hfd_5', ][:, ['Aldh1l1']].X, 
                           s=9,
                           cmap='seismic', 
                           c='darkorange'
    )
    ax1.set_title('hfd_5')
    # Gfap is plotted on the x-axis and Aldh1l1 on the y-axis.
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_012[adata_012.obs['diet'] == 'hfd_5', ],
                          color=['leiden'],
                          size=10,
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'gfap-aldh_clust012_hfd5.png')

In [272]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_5', ], ['Gfap', 'Aldh1l1'],
                        save="_clust012_hfd5_gfap-aldh.png", size=10, color_map=mymap, use_raw=False)

In [273]:
if bool_plot == True:
    aldh_pos = adata_012[adata_012.obs['diet'] == 'hfd_5', ].obs_names[np.asarray(adata_012[adata_012.obs['diet'] == 'hfd_5', ][:,'Aldh1l1'].X).flatten()>0]
    gfap_pos = adata_012[adata_012.obs['diet'] == 'hfd_5', ].obs_names[np.asarray(adata_012[adata_012.obs['diet'] == 'hfd_5', ][:,'Gfap'].X).flatten()>0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"), 
                          set_colors=('lime', 'magenta')
    )
    plt.savefig(sc_settings_figdir+'venndiagram_clust012_hfd5_gfap-aldh.png')

### Cluster 0,1,2 hfd_15 diet

In [274]:
if bool_plot == True:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), gridspec_kw={'wspace':0.8})
    ax1_dict = ax1.scatter(adata_012[adata_012.obs['diet'] == 'hfd_15', ][:, ['Gfap']].X, 
                           adata_012[adata_012.obs['diet'] == 'hfd_15', ][:, ['Aldh1l1']].X, 
                           s=9,
                           cmap='seismic',
                           c='green'
    )
    ax1.set_title('hfd_15')
    # Gfap is plotted on the x-axis and Aldh1l1 on the y-axis.
    ax1.set_xlabel('Gfap')
    ax1.set_ylabel('Aldh1l1')
    ax2_dict = sc.pl.umap(adata_012[adata_012.obs['diet'] == 'hfd_15', ],
                          color=['leiden'],
                          size=10,
                          ax=ax2,
                          show=False
    )
    plt.savefig(sc_settings_figdir+'gfap-aldh_clust012_hfd15.png')

In [275]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_15', ], 
                        ['Gfap', 'Aldh1l1'],
                        save="_clust012_hfd15_gfap-aldh.png",
                        size=10, 
                        color_map=mymap,
                        use_raw=False
    )

In [276]:
if bool_plot == True:
    aldh_pos = adata_012[adata_012.obs['diet'] == 'hfd_15', ].obs_names[np.asarray(adata_012[adata_012.obs['diet'] == 'hfd_15', ][:,'Aldh1l1'].X).flatten()>0]
    gfap_pos = adata_012[adata_012.obs['diet'] == 'hfd_15', ].obs_names[np.asarray(adata_012[adata_012.obs['diet'] == 'hfd_15', ][:,'Gfap'].X).flatten()>0]

    matplotlib_venn.venn2([set(aldh_pos),
                           set(gfap_pos)],
                          set_labels=("Aldh1l1", "Gfap"), 
                          set_colors=('lime', 'magenta')
    ) 
    plt.savefig(sc_settings_figdir+'venndiagram_clust012_hfd15_gfap-aldh.png')

## Cells expressing Gfap and Aldh1l1

In [277]:
adata_012.obs['gfap_aldh'] = np.select([((adata_012[:,'Gfap'].X > 0) & (adata_012[:,'Aldh1l1'].X == 0)), 
                                        ((adata_012[:,'Gfap'].X == 0) & (adata_012[:,'Aldh1l1'].X > 0)),
                                        ((adata_012[:,'Gfap'].X > 0) & (adata_012[:,'Aldh1l1'].X > 0)), 
                                        ((adata_012[:,'Gfap'].X == 0) & (adata_012[:,'Aldh1l1'].X == 0))],
                                        ['gfap_only', 'aldh_only', 'both', 'none'])
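
The cell above assigns each cell to one of four mutually exclusive marker groups with `np.select`. A minimal sketch of the same labelling on hypothetical 1-D expression vectors (the toy values and the `group_sizes` helper are assumptions for illustration):

```python
import numpy as np

# Hypothetical normalised expression of the two markers in four cells.
gfap = np.array([1.2, 0.0, 0.7, 0.0])
aldh = np.array([0.0, 2.1, 0.3, 0.0])

# Conditions are checked in order; the first match determines the label.
conditions = [
    (gfap > 0) & (aldh == 0),   # Gfap single positive
    (gfap == 0) & (aldh > 0),   # Aldh1l1 single positive
    (gfap > 0) & (aldh > 0),    # double positive
]
labels = np.select(conditions, ['gfap_only', 'aldh_only', 'both'], default='none')

# Count cells per group, as the pie charts further down do per diet.
names, sizes = np.unique(labels, return_counts=True)
group_sizes = dict(zip(names.tolist(), sizes.tolist()))
print(labels, group_sizes)
```

Using a string `default` for the double-negative cells keeps all choices of the same type. Flattening the `(n_cells, 1)` slices returned by `adata[:, gene].X` to 1-D before calling `np.select` would likewise give one label per cell.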

### Chow

In [278]:
if bool_plot == True:
    data = adata_012[adata_012.obs['diet'] == 'chow', ].obs['gfap_aldh'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_012[adata_012.obs['diet']=='chow', ], 
                     color=['gfap_aldh'], size=10, ax=ax0, show=False,
                     palette=['green','blue','magenta','gainsboro'])
    wedges, texts, autotexts = ax1.pie(data, autopct=lambda pct: func(pct, data), textprops=dict(color="w"),
                                      colors=['green','blue','magenta','gainsboro'])
    ax1.set_title('Chow diet cells - markers share')

    fig.savefig(sc_settings_figdir+'umap_pie_chart_clust012_chow_gfap-aldh.png')

### HFD_5

In [279]:
if bool_plot == True:
    data = adata_012[adata_012.obs['diet'] == 'hfd_5', ].obs['gfap_aldh'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_012[adata_012.obs['diet']=='hfd_5', ], 
                     color=['gfap_aldh'],
                     size=10,
                     ax=ax0,
                     show=False,
                     palette=['green','blue','magenta','gainsboro']
    )
    wedges, texts, autotexts = ax1.pie(data, autopct=lambda pct: func(pct, data), textprops=dict(color="w"),
                                       colors=['green', 'blue', 'magenta', 'gainsboro']
    )
    ax1.set_title('HFD_5 diet cells - markers share')

    fig.savefig(sc_settings_figdir+'umap_pie_chart_clust012_hfd5_gfap-aldh.png')

### HFD_15

In [280]:
if bool_plot == True: 
    data = adata_012[adata_012.obs['diet'] == 'hfd_15', ].obs['gfap_aldh'].value_counts().sort_index()

    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(16, 6), subplot_kw=dict(aspect="equal"))
    ax0 = sc.pl.umap(adata_012[adata_012.obs['diet']=='hfd_15', ], 
                     color=['gfap_aldh'],
                     size=10,
                     ax=ax0, 
                     show=False,
                     palette=['green', 'blue', 'magenta', 'gainsboro']
    )
    wedges, texts, autotexts = ax1.pie(data, autopct=lambda pct: func(pct, data), textprops=dict(color="w"),
                                       colors=['green', 'blue', 'magenta', 'gainsboro']
    )
    ax1.set_title('HFD_15 diet cells - markers share')

    fig.savefig(sc_settings_figdir+'umap_pie_chart_clust012_hfd15_gfap-aldh.png')

In [281]:
if bool_plot == True:
    cf.cell_percent(adata_012,
                    cluster='diet',
                    condition='gfap_aldh',
                    xlabel='diet',
                    ylabel='percentage',
                    title='barplot_per_diet_clust012_gfap-aldh',
                    save=sc_settings_figdir,
                    table=False
    )

## Per marker and diet

In [282]:
adata_012.obs['gfap_only'] = adata_012.obs['gfap_aldh']=='gfap_only'
adata_012.obs['aldh_only'] = adata_012.obs['gfap_aldh']=='aldh_only'
adata_012.obs['both'] = adata_012.obs['gfap_aldh']=='both'

In [283]:
# make them categorical
adata_012.obs['gfap_only'] = pd.Series(adata_012.obs['gfap_only'],
                                       dtype="category"
)
adata_012.obs['aldh_only'] = pd.Series(adata_012.obs['aldh_only'], 
                                       dtype="category"
)
adata_012.obs['both'] = pd.Series(adata_012.obs['both'],
                                  dtype="category"
)

In [284]:
adata_012.rename_categories('gfap_only', ['none', 'gfap_only'])

In [285]:
adata_012.rename_categories('aldh_only', ['none', 'aldh_only'])
adata_012.rename_categories('both', ['none', 'both'])

### chow

In [286]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'chow', ],
                        ['gfap_only'],
                        palette=['gainsboro','magenta'],
                        save="_clust012_gfap_only_chow.png", 
                        size=10, 
                        color_map=mymap, 
                        use_raw=False
    )

In [287]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'chow', ],
                        ['aldh_only'],
                        palette=['gainsboro','green'],
                        save="_clust012_aldh_only_chow.png",
                        size=10, 
                        color_map=mymap, 
                        use_raw=False
    )

In [288]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'chow', ],
                        ['both'],
                        palette=['gainsboro','blue'],
                        save="_clust012_both_chow.png",
                        size=10, 
                        color_map=mymap,
                        use_raw=False
    )

In [289]:
if bool_plot == True: 
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'chow', ], 
                        ['gfap_aldh'], 
                        palette=['green', 'blue', 'magenta', 'gainsboro'],
                        save="_clust012_gfap_aldh_chow.png", 
                        size=10,
                        color_map=mymap,
                        use_raw=False
    )

### Differential gene expression - chow

In [290]:
astro_0 = adata_012[:,~adata_012.var.index.isin(['Gfap', 'Aldh1l1'])]

In [291]:
if bool_plot==True:
    astro0 = astro_0[(astro_0.obs['diet'] == 'chow') & 
                     (astro_0.obs['gfap_aldh'].isin(['aldh_only','gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro0, 
                               sample_description=astro0.obs,
                               grouping="gfap_aldh", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01), 
                                          qval_thres=0.1, 
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    dets_mark_summary
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False,
                           min_fc=1,  
                           size=15, 
                           log10_p_threshold=-40, 
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_chow_aldh-gfap",
                           suffix="_clust012.png"
    )

In [292]:
if bool_plot==True:
    astro0 = astro_0[(astro_0.obs['diet'] == 'chow') & 
                     (astro_0.obs['gfap_aldh'].isin(['both','gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro0,
                               sample_description=astro0.obs,
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01),
                                          qval_thres=0.1, 
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False,
                           min_fc=1, 
                           size=15, 
                           log10_p_threshold=-40,
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_chow_gfap-both",
                           suffix="_clust012.png"
    )

In [293]:
if bool_plot == True:
    astro0 = astro_0[(astro_0.obs['diet']=='chow') & 
                     (astro_0.obs['gfap_aldh'].isin(['aldh_only','both'])), ]
    dets_mark = de.test.t_test(data=astro0, 
                               sample_description=astro0.obs, 
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01),
                                          qval_thres=0.1,
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False, 
                           min_fc=1,
                           size=15,
                           log10_p_threshold=-40, 
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_chow_aldh-both",
                           suffix="_clust012.png"
    )

### hfd 5

In [294]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_5', ],
                        ['gfap_only'],
                        palette=['gainsboro','magenta'],
                        save="_clust012_gfap_only_hfd5.png", 
                        size=10, 
                        color_map=mymap, 
                        use_raw=False
    )

In [295]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_5', ], 
                        ['aldh_only'],
                        palette=['gainsboro', 'green'],
                        save="_clust012_aldh_only_hfd5.png",
                        size=10, 
                        color_map=mymap, 
                        use_raw=False
    )

In [296]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_5', ],
                        ['both'], 
                        palette=['gainsboro','blue'],
                        save="_clust012_both_hfd5.png",
                        size=10, 
                        color_map=mymap,
                        use_raw=False
    )

In [297]:
if bool_plot == True:
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_5', ], 
                        ['gfap_aldh'],
                        palette=['green', 'blue', 'magenta', 'gainsboro'],
                        save="_clust012_gfap_aldh_hfd5.png",
                        size=10, 
                        color_map=mymap, 
                        use_raw=False
    )

### Differential gene expression - hfd_5

In [298]:
astro_5 = adata_012[:,~adata_012.var.index.isin(['Gfap', 'Aldh1l1'])]

In [299]:
if bool_plot == True:
    astro5 = astro_5[(astro_5.obs['diet'] == 'hfd_5') & 
                     (astro_5.obs['gfap_aldh'].isin(['aldh_only','gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro5, 
                               sample_description=astro5.obs,
                               grouping="gfap_aldh", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01),
                                          qval_thres=0.1, 
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    dets_mark_summary
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False,
                           min_fc=1, 
                           size=15,
                           log10_p_threshold=-50, 
                           log2_fc_threshold=6, 
                           save=sc_settings_figdir+"dge_volcano_hfd5_aldh-gfap",
                           suffix="_clust012.png"
    )

In [300]:
if bool_plot == True:
    astro5 = astro_5[(astro_5.obs['diet'] == 'hfd_5') & 
                     (astro_5.obs['gfap_aldh'].isin(['both','gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro5,
                               sample_description=astro5.obs,
                               grouping="gfap_aldh", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01), 
                                          qval_thres=0.1, 
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False,
                           min_fc=1, 
                           size=15,
                           log10_p_threshold=-50,
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_hfd5_both-gfap",
                           suffix="_clust012.png"
    )

In [301]:
if bool_plot == True:
    astro5 = astro_5[(astro_5.obs['diet']=='hfd_5') & 
                     (astro_5.obs['gfap_aldh'].isin(['aldh_only','both'])), ]
    dets_mark = de.test.t_test(data=astro5,
                               sample_description=astro5.obs, 
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01), 
                                          qval_thres=0.1,
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05,
                           corrected_pval=False, 
                           min_fc=1,
                           size=15,
                           log10_p_threshold=-50, 
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_hfd5_aldh-both",
                           suffix="_clust012.png"
    )
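
The summaries above are filtered with `qval_thres=0.1`, while the volcano plots use `corrected_pval=False` and therefore show raw p-values. To illustrate the difference, here is a minimal NumPy sketch of a Benjamini-Hochberg adjustment. It assumes the q-values are BH-adjusted, which should be checked against the diffxpy documentation; `bh_adjust` and the toy p-values are hypothetical and not part of the analysis.

```python
import numpy as np

def bh_adjust(pvals):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                      # ascending p-values
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity: each q-value is at most the next larger one
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(scaled, 0.0, 1.0)       # restore the input order
    return q

pvals = [0.001, 0.02, 0.03, 0.5]               # toy values, not real results
qvals = bh_adjust(pvals)
print(qvals)
```

When many genes are tested, a raw p-value below 0.05 can correspond to a much larger q-value, so a point that looks significant in the volcano plot may still fail the summary filter.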

### hfd 15

In [302]:
if bool_plot:
    # UMAP of hfd_15 cells, highlighting the 'gfap_only' label
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_15', ], 
                        ['gfap_only'],
                        palette=['gainsboro', 'magenta'],
                        save="_clust012_gfap_only_hfd15.png",
                        size=10, 
                        color_map=mymap,
                        use_raw=False
    )

In [303]:
if bool_plot:
    # UMAP of hfd_15 cells, highlighting the 'aldh_only' label
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_15', ],
                        ['aldh_only'],
                        palette=['gainsboro', 'green'],
                        save="_clust012_aldh_only_hfd15.png",
                        size=10,
                        color_map=mymap,
                        use_raw=False
    )

In [304]:
if bool_plot:
    # UMAP of hfd_15 cells, highlighting the 'both' label
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_15', ], 
                        ['both'],
                        palette=['gainsboro', 'blue'],
                        save="_clust012_both_hfd15.png",
                        size=10,
                        color_map=mymap,
                        use_raw=False
    )

In [305]:
if bool_plot:
    # UMAP of hfd_15 cells, coloured by the combined 'gfap_aldh' label
    cf.plot_umap_marker(adata_012[adata_012.obs['diet'] == 'hfd_15', ],
                        ['gfap_aldh'], 
                        palette=['green', 'blue', 'magenta', 'gainsboro'],
                        save="_clust012_gfap_aldh_hfd15.png",
                        size=10, 
                        color_map=mymap,
                        use_raw=False
    )

### Differential gene expression - hfd_15

In [306]:
# Drop Gfap and Aldh1l1, the genes behind the 'gfap_aldh' grouping, before testing
astro_15 = adata_012[:, ~adata_012.var.index.isin(['Gfap', 'Aldh1l1'])]
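
The column mask used here can be tried on a toy example. The sketch below applies the same `Index.isin` pattern to a small NumPy matrix instead of an AnnData object; the matrix values and the two extra gene names are illustrative assumptions only.

```python
import numpy as np
import pandas as pd

# Toy gene index and a 2-cell x 4-gene count matrix (made-up values)
genes = pd.Index(["Gfap", "Aldh1l1", "Slc1a3", "Aqp4"])
X = np.arange(8).reshape(2, 4)

# True for every gene that is NOT one of the two marker genes
keep = ~genes.isin(["Gfap", "Aldh1l1"])

X_filtered = X[:, keep]      # keep all cells, drop the marker columns
kept_genes = genes[keep]
print(list(kept_genes), X_filtered.shape)
```

Slicing with `[:, mask]` keeps all cells and removes only the selected genes, which is what the AnnData slice does for `adata_012`.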

In [307]:
if bool_plot:
    # hfd_15: compare 'aldh_only' cells against 'gfap_only' cells
    astro15 = astro_15[(astro_15.obs['diet'] == 'hfd_15') &
                       (astro_15.obs['gfap_aldh'].isin(['aldh_only', 'gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro15,
                               sample_description=astro15.obs, 
                               grouping="gfap_aldh", 
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01),
                                          qval_thres=0.1,
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    display(dets_mark_summary)  # a bare expression inside an if block is not rendered
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05, 
                           corrected_pval=False,
                           min_fc=1,
                           size=15, 
                           log10_p_threshold=-50,
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_hfd15_gfap-aldh",
                           suffix="_clust012.png"
    )

In [308]:
if bool_plot:
    # hfd_15: compare 'both' cells against 'gfap_only' cells
    astro15 = astro_15[(astro_15.obs['diet'] == 'hfd_15') & 
                       (astro_15.obs['gfap_aldh'].isin(['both', 'gfap_only'])), ]
    dets_mark = de.test.t_test(data=astro15, 
                               sample_description=astro15.obs,
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01),
                                          qval_thres=0.1,
                                          fc_lower_thres=1, 
                                          fc_upper_thres=1
    )
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05, 
                           corrected_pval=False, 
                           min_fc=1, 
                           size=15,
                           log10_p_threshold=-50, 
                           log2_fc_threshold=6,
                           save=sc_settings_figdir+"dge_volcano_hfd15_gfap-both",
                           suffix="_clust012.png"
    )

In [309]:
if bool_plot:
    # hfd_15: compare 'aldh_only' cells against 'both' cells
    astro15 = astro_15[(astro_15.obs['diet'] == 'hfd_15') &
                       (astro_15.obs['gfap_aldh'].isin(['aldh_only', 'both'])), ]
    dets_mark = de.test.t_test(data=astro15,
                               sample_description=astro15.obs,
                               grouping="gfap_aldh",
                               is_logged=False
    )
    dets_mark_summary = dets_mark.summary(mean_thres=np.log(0.01),
                                          qval_thres=0.1, 
                                          fc_lower_thres=1,
                                          fc_upper_thres=1
    )
    #dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", sep="\t")
    dets_mark.plot_volcano(alpha=0.05, 
                           corrected_pval=False,
                           min_fc=1,
                           size=15,
                           log10_p_threshold=-50, 
                           log2_fc_threshold=5.6,
                           save=sc_settings_figdir+"dge_volcano_hfd15_aldh-both",
                           suffix="_clust012.png"
    )
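
The three cells above differ only in the pair of groups and the output name. One possible way to reduce the repetition is to enumerate the pairs first and loop over them. The sketch below only builds that plan with the standard library; `comparison_plan` is a hypothetical helper, and the generated file names are illustrative, so they do not exactly match the names used above.

```python
from itertools import combinations

# The three labels found in the 'gfap_aldh' column that are compared pairwise
GROUPS = ["aldh_only", "both", "gfap_only"]

def comparison_plan(diet, groups=GROUPS):
    """Hypothetical helper: list every pairwise comparison for one diet."""
    tag = diet.replace("_", "")                  # 'hfd_15' -> 'hfd15'
    plan = []
    for a, b in combinations(groups, 2):
        short = "-".join(g.replace("_only", "") for g in (a, b))
        plan.append({
            "diet": diet,
            "groups": [a, b],                    # values to pass to .isin(...)
            "save_name": f"dge_volcano_{tag}_{short}",
        })
    return plan

plan = comparison_plan("hfd_15")
for step in plan:
    print(step["groups"], "->", step["save_name"])
```

Each entry could then drive one subset, test and volcano plot inside a single loop, with `step["groups"]` replacing the hard-coded list in `.isin(...)`.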