# Summary of notebook

In this notebook, we perform the following analyses:
(1) Clustering of the cells of young and old mice and annotation of the clusters using marker genes of immune cells and adipocytes.
(2) Separation of the preadipocytes from the immune cells and identification of the mature subpopulation.

# Load packages and set global variables

In [3]:
import numpy as np
import scanpy as sc
import scipy as sci
import scipy.sparse
import re
import pandas
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import sys

from matplotlib import colors

import batchglm
import diffxpy.api as de

#from beakerx import *

%load_ext autoreload
%autoreload 2

sc.settings.verbosity = 3 # amount of output
dir_in = '/Users/viktorian.miok/Documents/consultation/Altun-Ussar/David/data/'
dir_out = '/Users/viktorian.miok/Documents/consultation/Altun-Ussar/David/results/'
dir_tables = dir_out+'tables/'
sc_settings_figdir = dir_out+'panels/'
sc_settings_writedir = dir_out+'anndata/'
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=80, scanpy=True)
print(sys.version)



-----
anndata     0.7.5
scanpy      1.7.1
sinfo       0.3.1
-----
PIL                 8.1.2
PyObjCTools         NA
anndata             0.7.5
appnope             0.1.2
autoreload          NA
backcall            0.2.0
batchglm            v0.7.4
cffi                1.14.5
cloudpickle         1.6.0
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dask                2021.03.0
dateutil            2.8.1
decorator           4.4.2
diffxpy             v0.7.4
get_version         2.1
h5py                3.2.1
igraph              0.9.0
ipykernel           5.4.3
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.17.2
joblib              1.0.1
kiwisolver          1.3.1
legacy_api_wrap     1.2
leidenalg           0.8.3
llvmlite            0.36.0
matplotlib          3.3.4
mpl_toolkits        NA
natsort             7.1.1
numba               0.53.0
numexpr             2.7.3
numpy               1.20.1
packaging           20.9
pandas              1.2.3
par

In [5]:
print(de.__version__)

v0.7.4


## Global variables

All embeddings and clusterings can be saved to disk and loaded back by this script. Be careful about overwriting cluster caches once cell type annotation has started, as cluster labels may be shuffled between runs.

Set whether anndata objects are recomputed or loaded from cache.

In [6]:
bool_recomp = False

Set whether clustering is recomputed or loaded from saved .obs file. Loading makes sense if the clustering changes due to a change in scanpy or one of its dependencies and the number of clusters or the cluster labels change accordingly.

In [7]:
bool_recluster = False

Set whether cluster cache is overwritten. Note that the cache exists for reproducibility of clustering, see above.

In [8]:
bool_write_cluster_cache = False
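
The cluster cache described above amounts to writing the `.obs` table to CSV and reading the labels back. The following is a minimal sketch of such a round trip that keeps the labels aligned to the cell barcodes. The toy barcodes and the in-memory buffer (used in place of a file) are assumptions for illustration only.

```python
import io
import pandas

# Toy stand-in for adata.obs: barcodes as index, louvain labels as strings.
obs = pandas.DataFrame(
    {"louvain": pandas.Categorical(["0", "2", "1", "2"])},
    index=["AAAC-1", "AAAG-1", "AACT-1", "AAGG-1"],
)

# Write the cache (an in-memory buffer replaces the CSV file here).
buffer = io.StringIO()
obs.to_csv(buffer)
buffer.seek(0)

# Read the cache with the barcodes as index and the labels as strings,
# so that clusters can later be selected with string labels such as '2'.
cached = pandas.read_csv(buffer, index_col=0, dtype={"louvain": str})

# Align on barcodes instead of relying on the row order.
restored = cached["louvain"].reindex(obs.index).astype("category")
print(restored.tolist())
```

Aligning on the barcode index makes the cache robust against a changed cell order, at the cost of requiring that the barcodes in the cache match those of the current object.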

Set whether to produce plots, set to False for test runs.

In [9]:
bool_plot = False

# Load data

In [11]:
if bool_recomp:
    # Count matrix:
    fn_cnts_young = dir_in+'Data S1 adolescent_filtered_gene_bc_matrices_h5.h5'
    adata_young = sc.read_10x_h5(fn_cnts_young)
    fn_cnts_old = dir_in+'Data S2 adult_filtered_gene_bc_matrices_h5.h5'
    adata_old = sc.read_10x_h5(fn_cnts_old)
    adata_raw = adata_young.concatenate([adata_old],
                                        batch_key='age',
                                        batch_categories=['young', 'old']
    )
    sc.write(sc_settings_writedir+'adata_raw.h5ad', adata_raw)
else:
    adata_raw = sc.read(sc_settings_writedir+'adata_raw.h5ad')

# Process data

## Embeddings and clustering

Summary of the steps performed here: Only cells with at least 500 UMIs are kept. Counts per cell are normalized by library depth. The gene (feature) space is reduced with PCA to 50 PCs. A nearest-neighbour graph and t-SNE are computed based on the PC space, and a UMAP embedding is computed from the graph. Cells are clustered with louvain clustering based on the nearest-neighbour graph. Graph abstraction (PAGA) is computed based on the louvain clustering.

In [12]:
if bool_recomp:
    adata_proc = adata_raw.copy()
    sc.pp.filter_cells(adata_proc,
                       min_counts=500
    )
    sc.pp.normalize_per_cell(adata_proc)
    adata_proc.raw = sc.pp.log1p(adata_proc,
                                 copy=True
    )
    sc.pp.pca(adata_proc,
              n_comps=50,
              random_state=0, 
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_proc,
                    n_neighbors=100,
                    knn=True, 
                    method='umap',
                    n_pcs=50,
                    random_state=0
    )
    sc.tl.tsne(adata_proc,
               n_jobs=3
    )
    sc.tl.umap(adata_proc)
    if bool_recluster == True:
        sc.tl.louvain(adata_proc, 
                      resolution=1,
                      flavor='vtraag', 
                      random_state=0
        )
        pandas.DataFrame(adata_proc.obs).to_csv(
            path_or_buf = sc_settings_writedir+"obs_adata_proc.csv")
    else:
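        obs = pandas.read_csv(sc_settings_writedir+'obs_adata_proc.csv')
        # Labels are stored as strings and attached to the barcode index;
        # a Series with a default integer index would not align with .obs.
        adata_proc.obs['louvain'] = pandas.Series(obs['louvain'].astype(str).values,
                                                  index=adata_proc.obs_names,
                                                  dtype='category'
        )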
    sc.write(sc_settings_writedir+'adata_proc.h5ad', adata_proc)
else:
    adata_proc = sc.read(sc_settings_writedir+'adata_proc.h5ad')
sc.tl.paga(adata_proc)


This is where adjacency matrices should go now.
  warn(

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)
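
The filtering and depth normalization steps summarized above can be sketched with plain NumPy. This is an illustration on a made-up count matrix and not the scanpy implementation. It assumes that each retained cell is scaled to the median library size of the retained cells, which to our understanding is the default behaviour of `sc.pp.normalize_per_cell`.

```python
import numpy as np

# Made-up cells x genes count matrix (4 cells, 3 genes).
counts = np.array([[300, 100, 50],
                   [400, 400, 200],
                   [10, 20, 5],
                   [900, 600, 500]], dtype=float)

# Keep cells with at least 500 counts in total.
depth = counts.sum(axis=1)
keep = depth >= 500
counts_kept = counts[keep]
depth_kept = depth[keep]

# Scale every retained cell to the median depth of the retained cells
# (assumed default target of the normalization).
target = np.median(depth_kept)
normalized = counts_kept / depth_kept[:, None] * target

# Log-transform with a pseudocount of one, as a log1p transform does.
logged = np.log1p(normalized)
print(keep, normalized.sum(axis=1))
```

After this scaling every retained cell has the same total count, so differences between cells reflect relative rather than absolute expression.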


Produce some summarizing plots that show the global characteristics of the data.

In [13]:
#Define a nice colour map for gene expression
colors2 = plt.cm.Reds(np.linspace(0, 1, 128))
colors3 = plt.cm.Greys_r(np.linspace(0.7,0.8,20))
colorsComb = np.vstack([colors3, colors2])
mymap = colors.LinearSegmentedColormap.from_list('my_colormap', 
                                                 colorsComb
)

In [14]:
if bool_plot==True:
    sc.pl.tsne(adata_proc,
               color=['age'],
               size=5,
               save="_all_age.pdf"
    )
    sc.pl.tsne(adata_proc,
               color=['louvain'],
               size=5,
               save="_all_louvain.pdf"
    )
    sc.pl.tsne(adata_proc,
               color=['n_counts'],
               size=5, 
               save="_all_n_counts.pdf"
    )
    sc.pl.tsne(adata_proc, 
               color=['Pdgfra'],
               size=5,
               save="_all_Pdgfra.pdf", 
               color_map=mymap
    )
    sc.pl.tsne(adata_proc, 
               color=['Slc7a10'],
               size=5,
               save="_all_Slc7a10.pdf", 
               color_map=mymap
    )

In [15]:
if bool_plot == True:
    sc.pl.paga(adata_proc,
               save="_all.pdf"
    )

Number of cells in each sample:

In [16]:
print(np.sum(adata_proc.obs["age"].values == "young"))

3915


In [17]:
print(np.sum(adata_proc.obs["age"].values == "old"))

5429


In [18]:
adata_proc.obs['louvain'].value_counts()

0     1733
1     1379
2     1222
3      948
4      892
5      816
6      799
7      518
8      514
9      421
10     102
Name: louvain, dtype: int64

# Define cell types

## Marker genes

### Define marker sets

Define surface marker sets for some of the expected cell types.

In [19]:
# Leukocyte markers:
leukocyte_markers = ['Ptprc']
tc_markers = ['Cd3d','Cd3e','Cd3g','Cd4'] 
nk_markers = ['Nkg7','Il2rb','Ncr1','Klrd1','Klrb1b','Klrb1f']
myeloid_markers = ['Cd79a','Itgax','Itgam','Fcgr3','S100a8', 'S100a9']
mp_markers = ['Adgre1','Lyz2']
dc_markers = ['Cd74','Anpep' , 'Cd33', 'Cd80', 'Cd83', 'Cd86'] 
bc_markers = ['Cd19']
adipocyte_markers = ['Pdgfra','Slc7a10','Pparg','Fermt2','Fbn1','Col4a1','Itgb1','Cd34','Cd24a','Dlk1']
megakaryocyte_markers = ['Ppbp']
erythrocyte_markers = ['Gypa']
go_adip_dev = ['Aacs','Acat1','Arid5b','Arrdc3','Atf2','Bbs4','Bdh1','Csf1','Dgat2','Dyrk1b','Ebf2','Amer1','Fto',
               'Id2','Lep','Lrp5','Nampt','Oxct1','Paxip1','Pik3ca','Ppard','Ppargc1a','Rorc','Sh3pxd2b','Slc25a25',
               'Sox8','Spg20','Tbl1xr1','Xbp1']

### Plotting routines for marker gene sets:

In [20]:
def plot_violin_marker(adata, markers, save=None, use_raw=True):
    """Violin plots of marker genes per louvain cluster, two genes per figure."""
    for i in range(0, len(markers), 2):
        sc.pl.violin(adata, 
                     groupby='louvain', 
                     keys=markers[i:i+2], 
                     use_raw=use_raw, 
                     rotation=90,
                     size=5,
                     # Write a file only if a file name stem was given.
                     save=None if save is None else save+"_"+str(i//2)+".pdf"
        )
        
def plot_tsne_marker(adata, markers, size=5, save=None, use_raw=True):
    """t-SNE plots coloured by marker gene expression, two genes per figure."""
    for i in range(0, len(markers), 2):
        sc.pl.tsne(adata, 
                   color=markers[i:i+2], 
                   size=size,
                   use_raw=use_raw,
                   color_map=mymap,
                   # Write a file only if a file name stem was given.
                   save=None if save is None else save+"_"+str(i//2)+".pdf"
        )
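
Both routines plot the markers two genes per figure. The pairing is a plain slicing step, sketched here without any plotting. `chunk_pairs` is a hypothetical helper name that is not used elsewhere in this notebook, and the second marker list is made up for the example.

```python
def chunk_pairs(markers):
    """Split a list of gene names into consecutive chunks of at most two."""
    # Slicing past the end of a list is safe, so no explicit bound is needed.
    return [markers[i:i + 2] for i in range(0, len(markers), 2)]

tc_markers = ['Cd3d', 'Cd3e', 'Cd3g', 'Cd4']
odd_markers = ['Adgre1', 'Lyz2', 'Cd19']  # made-up list with an odd length
print(chunk_pairs(tc_markers))
print(chunk_pairs(odd_markers))
```

A list with an odd number of genes ends in a chunk with a single gene, which yields one figure with a single panel.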

### Leukocyte markers:

In [21]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       leukocyte_markers, 
                       save="_all_markers_leukocyte"
    )

Clusters 1, 6, 7, 8 and 9, and potentially 0 and 10, express the leukocyte marker Ptprc. These clusters are further validated with leukocyte-specific markers below. The remaining clusters 2, 3 and 4 are investigated with non-leukocyte marker sets.

### Megakaryocyte markers:

In [22]:
if bool_plot == True:
    plot_violin_marker(adata_proc,
                       megakaryocyte_markers, 
                       save="_all_markers_megakaryocytes"
    )

There do not seem to be many megakaryocytes in this data set.

### Erythrocyte markers:

In [23]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       erythrocyte_markers, 
                       save="_all_markers_erythrocytes"
    )

There do not seem to be many erythrocytes in this data set.

### Preadipocyte markers:

In [24]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       adipocyte_markers, 
                       save="_all_markers_preadipocytes"
    )

Clusters 2, 3 and 4 express adipocyte markers.

### T-cell markers:

In [25]:
if bool_plot == True:
    plot_violin_marker(adata_proc,
                       tc_markers, 
                       save="_all_markers_tcells"
    )

Marker gene expression suggests that clusters 1 and 6 are T-cells. Interestingly, they do not seem to be Cd4+Cd8+ T-cells, as Cd4 expression is low.

### Natural killer cell markers:

In [26]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       nk_markers,
                       save="_all_markers_nk"
    )

Clusters 1, 6 and 7 express natural killer cell marker genes. Cluster 1 also expresses Cd3, so it possibly contains γδ T-cells.

### Myeloid cell markers:

In [27]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       myeloid_markers,
                       save="_all_markers_myeloid"
    )

Clusters 5, 8, 9 and 10 express myeloid cell marker genes. Cluster 9 seems to have bimodal expression of S100a8 and S100a9, so it may need subclustering to separate the cell types it contains.

### Macrophage markers:

In [28]:
if bool_plot == True:
    plot_violin_marker(adata_proc,
                       mp_markers,
                       save="_all_markers_macrophages"
    )

Clusters 5, 7, 8 and 9 express macrophage markers, in line with the myeloid cell marker gene expression.

### Dendritic cell markers:

In [29]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       dc_markers, 
                       save="_all_markers_dendritic"
    )

Clusters 0, 5, 7, 8 and 9 express dendritic cell markers, in line with the myeloid marker gene expression. Clusters 0 and 7 could be non-myeloid dendritic cells.

### B-cell markers:

In [30]:
if bool_plot == True:
    plot_violin_marker(adata_proc, 
                       bc_markers,
                       save="_all_markers_bcells"
    )

Cluster 0 may contain B-cells.

## Summary heatmap to characterize cell types

Select a few genes to summarize cell type assignments:

In [31]:
selected_leukocyte_markers = ['Ptprc']
selected_tc_markers = ['Cd3d','Cd3g']
selected_nk_markers = ['Nkg7','Klrd1']
selected_myeloid_markers = ['Fcgr3','S100a8']
selected_mp_markers = ['Adgre1','Lyz2']
selected_dc_markers = ['Cd74','Cd83'] 
selected_bc_markers = ['Cd19']
selected_adipocyte_markers = ['Pdgfra','Fbn1','Col4a1','Cd34','Cd24a','Dlk1','Slc7a10'] 
selected_megakaryocyte_markers = ['Ppbp']
selected_erythrocyte_markers = ['Gypa']

In [32]:
selected_cell_markers = selected_leukocyte_markers + \
selected_megakaryocyte_markers + \
selected_erythrocyte_markers + \
selected_myeloid_markers + \
selected_mp_markers + \
selected_dc_markers + \
selected_bc_markers + \
selected_tc_markers + \
selected_nk_markers + \
selected_adipocyte_markers

Only keep markers that occur in data set.

In [33]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_proc, 
                  var_names=selected_cell_markers, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=False, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_all_markers_celltypes.pdf"
    )
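
The cell above passes the full marker list to the heatmap. The filtering step announced in the text could look like the following sketch, in which `var_names` is a made-up stand-in for the gene index of the data set (for example `adata_proc.raw.var_names`) and the marker list is shortened.

```python
import pandas

# Made-up gene index standing in for the gene names of the data set.
var_names = pandas.Index(['Ptprc', 'Cd3d', 'Cd19', 'Pdgfra', 'Slc7a10'])

selected_cell_markers = ['Ptprc', 'Ppbp', 'Gypa', 'Cd19', 'Pdgfra']

# Keep the original marker order and drop genes absent from the data.
present = [g for g in selected_cell_markers if g in var_names]
missing = [g for g in selected_cell_markers if g not in var_names]
print(present)
print(missing)
```

Reporting the dropped genes makes it visible when a marker is absent because of a naming difference rather than a true absence.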

# Preadipocytes only

## Embedding and clustering

In [34]:
if bool_recomp == True:
    cell_ids_adip = np.asarray(adata_proc.obs_names)[[x in ['2','3','4'] 
                                                      for x in np.asarray(adata_proc.obs['louvain'].values)]]
    adata_adip = adata_raw[cell_ids_adip,:].copy()
    sc.pp.filter_cells(adata_adip,
                       min_counts=500
    )
    sc.pp.normalize_per_cell(adata_adip)
    adata_adip.raw = adata_adip.copy()
    sc.pp.log1p(adata_adip)
    sc.pp.pca(adata_adip,
              n_comps=50,
              random_state=0,
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_adip,
                    n_neighbors=100,
                    knn=True, 
                    method='umap',
                    n_pcs=50,
                    random_state=0
    )
    sc.tl.tsne(adata_adip,
               n_jobs=3
    )
    if bool_recluster == True:
        sc.tl.louvain(adata_adip, 
                      resolution=1,
                      flavor='vtraag', 
                      random_state=0
        )
        pandas.DataFrame(adata_adip.obs).to_csv(
            path_or_buf = sc_settings_writedir+"obs_adata_adip.csv")
    else:
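        obs = pandas.read_csv(sc_settings_writedir+'obs_adata_adip.csv')
        # Labels are stored as strings and attached to the barcode index;
        # a Series with a default integer index would not align with .obs.
        adata_adip.obs['louvain'] = pandas.Series(obs['louvain'].astype(str).values, 
                                                  index=adata_adip.obs_names,
                                                  dtype='category'
        )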
    sc.write(sc_settings_writedir+'adata_adip.h5ad', adata_adip)
else:
    adata_adip = sc.read(sc_settings_writedir+'adata_adip.h5ad')
sc.tl.paga(adata_adip)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)
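
The selection of cells by cluster label, and by age in the sections that follow, can also be expressed with boolean masks on the `.obs` table. The following sketch uses a made-up table with hypothetical cell names; on the real data the masks would be built from `adata_proc.obs`.

```python
import pandas

# Made-up stand-in for adata_proc.obs.
obs = pandas.DataFrame(
    {"louvain": pandas.Categorical(["2", "0", "3", "4", "1", "2"]),
     "age": pandas.Categorical(["young", "young", "old", "young", "old", "old"])},
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)

# Preadipocyte clusters are selected by their string labels.
is_preadip = obs["louvain"].isin(["2", "3", "4"])
is_young = obs["age"] == "young"

cell_ids_adip = obs.index[is_preadip].tolist()
cell_ids_adip_young = obs.index[is_preadip & is_young].tolist()
print(cell_ids_adip)
print(cell_ids_adip_young)
```

Note that the labels are compared as strings, so this only selects cells if the cluster labels in `.obs` are stored as strings.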


In [35]:
if bool_plot==True:
    sc.pl.tsne(adata_adip,
               color=['age'],
               size=20,
               save="_preadip_age.pdf"
    )
    sc.pl.tsne(adata_adip,
               color=['louvain'],
               size=20, 
               save="_preadip_louvain.pdf"
    )
    sc.pl.tsne(adata_adip,
               color=['n_counts'],
               size=20,
               save="_preadip_n_counts.pdf"
    )

Count depth is fairly equal across preadipocytes. There is a potential batch effect or age effect; the two are indistinguishable in this scenario because each age group corresponds to a single sample.

In [36]:
if bool_plot == True:
    sc.pl.paga(adata_adip, 
               save="_preadip.pdf"
    )

## Marker gene sets

In [37]:
if bool_plot == True:
    plot_violin_marker(adata_adip,
                       adipocyte_markers,
                       save="_preadip_markers_preadipocytes",
                       use_raw=False
    )

In [38]:
if bool_plot==True:
    plot_tsne_marker(adata_adip,
                     adipocyte_markers, 
                     size=20, 
                     save="_preadip_markers_preadipocytes", 
                     use_raw=False
    )

It seems that Slc7a10+ cells are in clusters 0, 2 and 4, which, with a few exceptions, correspond directly to the sample of the young mice.
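
The correspondence between clusters and age can be quantified by cross-tabulating the two annotations. The table below is made up; on the real data the same call would be applied to the `louvain` and `age` columns of `adata_adip.obs`.

```python
import pandas

# Made-up annotations for seven cells.
obs = pandas.DataFrame({
    "louvain": ["0", "0", "1", "1", "2", "2", "2"],
    "age": ["young", "young", "old", "old", "young", "young", "old"],
})

# Number of cells per cluster and age group.
counts = pandas.crosstab(obs["louvain"], obs["age"])

# Fraction of young cells in each cluster.
fraction_young = counts["young"] / counts.sum(axis=1)
print(counts)
print(fraction_young)
```

A fraction close to one or zero indicates a cluster that is dominated by a single age group.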

# Preadipocytes only by age

## Embedding and clustering

### Young mouse

In [39]:
if bool_recomp==True:
    cell_ids_adip_young = np.asarray(adata_proc.obs_names)[
        np.logical_and(np.asarray([x in ['2','3','4']
                                   for x in np.asarray(adata_proc.obs['louvain'].values)]),
                       np.asarray([x=='young'
                                   for x in np.asarray(adata_proc.obs['age'].values)]))]
    adata_adip_young = adata_raw[cell_ids_adip_young,:].copy()
    sc.pp.filter_cells(adata_adip_young,
                       min_counts=500
    )
    sc.pp.normalize_per_cell(adata_adip_young)
    adata_adip_young.raw = adata_adip_young.copy()
    sc.pp.log1p(adata_adip_young)
    sc.pp.pca(adata_adip_young,
              n_comps=50, 
              random_state=0,
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_adip_young,
                    n_neighbors=100,
                    knn=True,
                    method='umap',
                    n_pcs=50,
                    random_state=0
    )
    sc.tl.tsne(adata_adip_young,
               n_jobs=3
    )
    if bool_recluster == True:
        sc.tl.louvain(adata_adip_young, 
                      resolution=1,
                      flavor='vtraag',
                      random_state=0
        )
        pandas.DataFrame(adata_adip_young.obs).to_csv(
            path_or_buf = sc_settings_writedir+"obs_adata_adip_young.csv")
    else:
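        obs = pandas.read_csv(sc_settings_writedir+'obs_adata_adip_young.csv')
        # Labels are stored as strings and attached to the barcode index;
        # a Series with a default integer index would not align with .obs.
        adata_adip_young.obs['louvain'] = pandas.Series(obs['louvain'].astype(str).values, 
                                                        index=adata_adip_young.obs_names,
                                                        dtype='category'
        )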
    sc.write(sc_settings_writedir+'adata_adip_young.h5ad', adata_adip_young)
else:
    adata_adip_young = sc.read(sc_settings_writedir+'adata_adip_young.h5ad')
sc.tl.paga(adata_adip_young)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)


In [40]:
if bool_plot == True:
    sc.pl.tsne(adata_adip_young,
               color=['louvain'],
               size=20,
               save="_preadipYoung_louvain.pdf"
    )
    sc.pl.tsne(adata_adip_young, 
               color=['n_counts'],
               size=20,
               save="_preadipYoung_n_counts.pdf"
    )
    sc.pl.paga(adata_adip_young, 
               save="_preadipYoung.pdf"
    )

Number of preadipocytes observed in young mouse with scRNAseq:

In [41]:
print(adata_adip_young.X.shape[0])

1636


### Young mouse - coarse clustering

In [43]:
adata_adip_young_lowres = adata_adip_young.copy()
if bool_recomp == True:
    if bool_recluster == True:
        sc.tl.louvain(adata_adip_young_lowres, 
                      resolution=0.5, 
                      flavor='vtraag',
                      random_state=0
        )
        pandas.DataFrame(adata_adip_young_lowres.obs).to_csv(
            path_or_buf = sc_settings_writedir+"obs_adata_adip_young_lowres.csv")
    else:
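        obs = pandas.read_csv(sc_settings_writedir+'obs_adata_adip_young_lowres.csv')
        # Labels are stored as strings and attached to the barcode index;
        # a Series with a default integer index would not align with .obs.
        adata_adip_young_lowres.obs['louvain'] = pandas.Series(obs['louvain'].astype(str).values, 
                                                               index=adata_adip_young_lowres.obs_names,
                                                               dtype='category'
        )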

In [44]:
if bool_plot == True:
    sc.pl.tsne(adata_adip_young_lowres,
               color=['louvain'],
               size=20, 
               save="_preadipYoung_louvain_lowres.pdf"
    )

In [45]:
adata_adip_young_lowres.obs['louvain'].value_counts()

0    503
1    386
2    373
3    306
4     68
Name: louvain, dtype: int64

### Old mouse

In [46]:
if bool_recomp == True:
    cell_ids_adip_old = np.asarray(adata_proc.obs_names)[
        np.logical_and(np.asarray([x in ['2','3','4']
                                   for x in np.asarray(adata_proc.obs['louvain'].values)]),
                       np.asarray([x=='old'
                                   for x in np.asarray(adata_proc.obs['age'].values)]))]
    adata_adip_old = adata_raw[cell_ids_adip_old,:].copy()
    sc.pp.filter_cells(adata_adip_old,
                       min_counts=500
    )
    sc.pp.normalize_per_cell(adata_adip_old)
    adata_adip_old.raw = adata_adip_old.copy()
    sc.pp.log1p(adata_adip_old)
    sc.pp.pca(adata_adip_old,
              n_comps=50,
              random_state=0,
              svd_solver='arpack'
    )
    sc.pp.neighbors(adata_adip_old,
                    n_neighbors=50,
                    knn=True,
                    method='umap',
                    n_pcs=50,
                    random_state=0
    )
    sc.tl.tsne(adata_adip_old, 
               n_jobs=3
    )
    if bool_recluster == True:
        sc.tl.louvain(adata_adip_old, 
                      resolution=1,
                      flavor='vtraag',
                      random_state=0
        )
        pandas.DataFrame(adata_adip_old.obs).to_csv(
            path_or_buf = sc_settings_writedir+"obs_adata_adip_old.csv")
    else:
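        obs = pandas.read_csv(sc_settings_writedir+'obs_adata_adip_old.csv')
        # Labels are stored as strings and attached to the barcode index;
        # a Series with a default integer index would not align with .obs.
        adata_adip_old.obs['louvain'] = pandas.Series(obs['louvain'].astype(str).values, 
                                                      index=adata_adip_old.obs_names,
                                                      dtype='category'
        )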
    sc.write(sc_settings_writedir+'adata_adip_old.h5ad', adata_adip_old)
else:
    adata_adip_old = sc.read(sc_settings_writedir+'adata_adip_old.h5ad')
sc.tl.paga(adata_adip_old)

running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)


In [47]:
if bool_plot == True:
    sc.pl.tsne(adata_adip_old,
               color=['louvain'], 
               size=20,
               save="_preadipOld_louvain.pdf"
    )
    sc.pl.tsne(adata_adip_old, 
               color=['n_counts'],
               size=20, 
               save="_preadipOld_n_counts.pdf"
    )
    sc.pl.paga(adata_adip_old,
               save="_preadipOld.pdf"
    )

Number of preadipocytes observed in old mouse with scRNAseq:

In [48]:
print(adata_adip_old.X.shape[0])

1426


## Marker gene sets

### Young mouse

In [49]:
adipocyte_markers = ['Pdgfra','Slc7a10','Pparg','Fermt2','Fbn1','Col4a1',
                     'Itgb1','Cd34','Cd24a','Dlk1','Srr','Fabp4','Dpp4']

In [50]:
if bool_plot == True:
    plot_violin_marker(adata_adip_young,
                       adipocyte_markers,
                       save="_preadipYoung_markers_preadipocytes",
                       use_raw=False
    )

In [51]:
if bool_plot == True:
    plot_violin_marker(adata_adip_young,
                       go_adip_dev,
                       save="_preadipYoung_markers_GO_adipocyte_dev", 
                       use_raw=False
    )

In [52]:
if bool_plot == True:
    plot_tsne_marker(adata_adip_young, 
                     adipocyte_markers,
                     size=20,
                     save="_preadipYoung_markers_preadipocytes", 
                     use_raw=False
    )

In [53]:
if bool_plot==True:
    plot_tsne_marker(adata_adip_young,
                     go_adip_dev,
                     size=20,
                     save="_preadipYoung_markers_GO_adipocyte_dev",
                     use_raw=False
    )

In [54]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_young, 
                  var_names=adipocyte_markers, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=True, 
                  cmap="viridis",
                  vmin=-1,
                  vmax=4,
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipYoung_markers_preadipocytes.pdf"
    )

In [55]:

if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_young, 
                  var_names=go_adip_dev, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=True, 
                  cmap="viridis",
                  vmin=-1,
                  vmax=2,
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipYoung_markers_GO_adipocyte_dev.pdf"
    )

In [56]:
[{'mean expression of Slc7a10 in louvain group '+x:
  np.mean(adata_adip_young[adata_adip_young.obs['louvain'].values == x,:][:,'Slc7a10'].X)} 
 for x in adata_adip_young.obs['louvain'].cat.categories.values]


[{'mean expression of Slc7a10 in louvain group 0': 0.07535989},
 {'mean expression of Slc7a10 in louvain group 1': 0.13056228},
 {'mean expression of Slc7a10 in louvain group 2': 0.18283738},
 {'mean expression of Slc7a10 in louvain group 3': 0.14173967},
 {'mean expression of Slc7a10 in louvain group 4': 0.00809834}]

This seems to be a continuum of cell types with two opposing gradients of gene expression: Pdgfra and Slc7a10. This could indicate a developmental lineage.
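
The per-cluster means computed above with a list comprehension can equivalently be obtained with a pandas group-by. The sketch below uses made-up expression values of a single gene; the column names are chosen for the example.

```python
import numpy as np
import pandas

# Made-up expression of one gene in six cells with their cluster labels.
cells = pandas.DataFrame({
    "louvain": ["0", "0", "1", "1", "2", "2"],
    "Slc7a10": [0.0, 0.2, 0.1, 0.3, 0.0, 0.6],
})

# Mean expression of the gene within each louvain group.
mean_by_group = cells.groupby("louvain")["Slc7a10"].mean()
print(mean_by_group)
```

On the real data the expression column would first have to be extracted from the AnnData object, for example as a dense one-dimensional array.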

### Young mouse - coarse clustering

In [57]:
if bool_plot == True:
    plot_violin_marker(adata_adip_young_lowres,
                       adipocyte_markers,
                       save="_preadipYoung_lowres_markers_preadipocytes"
    )

In [58]:
if bool_plot == True:
    plot_violin_marker(adata_adip_young_lowres, 
                       go_adip_dev, 
                       save="_preadipYoung_lowres_markers_GO_adipocyte_dev"
    )

In [59]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_young_lowres, 
                  var_names=adipocyte_markers, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=True, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipYoung_lowres_markers_preadipocytes.pdf"
    )

In [60]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_young_lowres, 
                  var_names=go_adip_dev, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=True, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipYoung_lowres_markers_GO_adipocyte_dev.pdf"
    )

### Old mouse

In [61]:
if bool_plot == True:
    plot_violin_marker(adata_adip_old, 
                       adipocyte_markers,
                       save="_preadipOld_markers_preadipocytes"
    )

In [62]:
if bool_plot == True:
    plot_violin_marker(adata_adip_old, 
                       go_adip_dev, 
                       save="_preadipOld_markers_GO_adipocyte_dev"
    )

In [63]:
if bool_plot == True:
    plot_tsne_marker(adata_adip_old,
                     adipocyte_markers, 
                     size=10,
                     save="_preadipOld_markers_preadipocytes"
    )

In [64]:
if bool_plot == True:
    plot_tsne_marker(adata_adip_old,
                     go_adip_dev, 
                     size=10,
                     save="_preadipOld_markers_GO_adipocyte_dev"
    )

In [65]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_old, 
                  var_names=adipocyte_markers, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=True, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipOld_markers_preadipocytes.pdf"
    )

In [66]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_old, 
                  var_names=go_adip_dev, 
                  groupby="louvain", 
                  use_raw=True, 
                  log=True, 
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipOld_markers_GO_adipocyte_dev.pdf"
    )

There appear to be two clusters: one consisting of louvain groups 0 and 4 (Pdgfra-Slc7a10-) and one consisting of groups 1, 2, 3 and 5 (Pdgfra+Slc7a10-).

# Differential expression analysis

## By age

In [67]:
dets_age = de.test.t_test(data=adata_adip.raw, 
                          sample_description=adata_adip.obs, 
                          grouping="age", 
                          is_logged=False
)
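
The test above compares the two age groups gene by gene. As a rough illustration of this kind of analysis, and not of the diffxpy internals, the sketch below runs Welch's t-test per gene with SciPy on simulated data and applies a Benjamini-Hochberg correction. The simulated data, the helper `bh_adjust` and the choice of correction method are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    qvals = np.empty(n)
    qvals[order] = np.clip(ranked, 0.0, 1.0)
    return qvals

# Simulated expression: cells x genes for two groups.
rng = np.random.default_rng(0)
n_genes = 20
young = rng.normal(loc=1.0, scale=1.0, size=(50, n_genes))
old = rng.normal(loc=1.0, scale=1.0, size=(60, n_genes))
old[:, 0] += 3.0  # gene 0 is simulated as strongly higher in old cells

# Welch's t-test per gene (column); unequal variances are allowed.
_, pvals = stats.ttest_ind(young, old, axis=0, equal_var=False)
qvals = bh_adjust(pvals)
print(qvals[0] < 0.01)
```

With thousands of genes tested, thresholding on corrected rather than raw p-values keeps the expected fraction of false discoveries under control.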

The differentially expressed genes after thresholding on corrected p-value (at most 0.05), fold change (at most 0.5 or at least 2) and mean expression are:

In [85]:
if bool_plot == True:
    dets_age_summary = dets_age.summary(mean_thres=np.log(0.01), 
                                        qval_thres=0.05, 
                                        fc_lower_thres=0.5, 
                                        fc_upper_thres=2
    )
    dets_age_summary.to_csv(path_or_buf=dir_tables+"DE_preadip_by_age.tab", 
                            sep="\t"
    )
    dets_age_summary

In [71]:
if bool_plot == True:
    dets_age.plot_volcano(alpha=0.01,
                          min_fc=1.5,
                          size=15, 
                          log10_p_threshold=-50,
                          log2_fc_threshold=7,
                          highlight_ids=['Slc7a10'], 
                          highlight_size=30, 
                          highlight_col="red",
                          save=sc_settings_figdir+"preadip_DE_age", 
                          suffix="_volcano.pdf"
    )

In [72]:
if bool_plot==True:
    dets_age.plot_ma(alpha=0.01,
                     log2_fc_threshold=6,
                     highlight_ids=['Slc7a10'], 
                     highlight_size=30, 
                     highlight_col="red",
                     save=sc_settings_figdir+"preadip_DE_age",
                     suffix="_ma_plot.pdf"
    )

Count DE genes at a corrected p-value threshold of 0.01:

In [73]:
# NaN q-values are treated as not significant.
np.sum(np.nan_to_num(dets_age.qval, nan=1.0) < 0.01)

3359
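
Genes for which the test is undefined can carry NaN q-values, which must not be counted as significant. A small sketch with made-up q-values shows one way to count robustly, and why an identity check against `np.nan` is not a reliable NaN test for values taken from an array.

```python
import numpy as np

# Made-up corrected p-values, two of them undefined.
qval = np.array([0.001, np.nan, 0.5, 0.009, np.nan])

# Replace NaN by 1.0 so that undefined tests never pass the threshold.
n_de = int(np.sum(np.nan_to_num(qval, nan=1.0) < 0.01))

# Elements taken from an array are new float objects, so "is np.nan" fails;
# np.isnan is the reliable element-wise check.
identity_check = qval[1] is np.nan
isnan_check = bool(np.isnan(qval[1]))
print(n_de, identity_check, isnan_check)
```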

## By louvain group

### Coarse clustering

#### Test

Here, we perform a differential expression test across the coarse-grained louvain clustering which yielded two groups.

In [75]:
# Run the test unconditionally: its results are used in later cells
# regardless of bool_plot.
dets_young_lowres_louvain = de.test.t_test(data=adata_adip_young_lowres.raw, 
                                           sample_description=adata_adip_young_lowres.obs, 
                                           grouping="louvain", 
                                           is_logged=False
)

The differentially expressed genes after thresholding on corrected p-value (at most 0.05), fold change (at most 0.5 or at least 2) and mean expression are:

In [76]:
if bool_plot == True:
    dets_young_lowres_louvain_summary = dets_young_lowres_louvain.summary(mean_thres=np.log(0.01), 
                                                                          qval_thres=0.05, 
                                                                          fc_lower_thres=0.5,
                                                                          fc_upper_thres=2
    )
    dets_young_lowres_louvain_summary.to_csv(path_or_buf=dir_tables+"DE_preadipYoung_by_louvain_lowres.csv",
                                             sep="\t"
    )
    dets_young_lowres_louvain_summary

In [77]:
if bool_plot == True:
    dets_young_lowres_louvain.plot_volcano(alpha=0.01,
                                           min_fc=1.5,
                                           size=15,
                                           log10_p_threshold=-15,
                                           log2_fc_threshold=5,
                                           highlight_ids=['Slc7a10'], 
                                           highlight_size=30,
                                           highlight_col="red",
                                           save=sc_settings_figdir+"preadipYoung_DE_louvain_lowres",
                                           suffix="_volcano.pdf"
    )

In [78]:
if bool_plot == True:
    dets_young_lowres_louvain.plot_ma(alpha=0.01,
                                      log2_fc_threshold=5,
                                      highlight_ids=['Slc7a10'], 
                                      highlight_size=30,
                                      highlight_col="red",
                                      save=sc_settings_figdir+"preadipYoung_DE_louvain_lowres",
                                      suffix="_ma_plot.pdf"
    )

Count DE genes at a corrected p-value threshold of 0.01:

In [79]:
# NaN q-values are treated as not significant.
np.sum(np.nan_to_num(dets_young_lowres_louvain.qval, nan=1.0) < 0.01)

1927

Save DE genes to file for enrichment:

In [80]:
dets_young_lowres_louvain.summary(qval_thres=0.01)["gene"].to_csv(dir_out+"enrichment/de_genes_young_preadip.csv",
                                                                  index=False
)

#### Heatmaps

Select all differentially expressed genes at a corrected p-value threshold of 0.01, with a fold change of at least 2 or at most 0.5 (an absolute log2 fold change of at least 1), and with a mean expression of at least 0.5.

In [81]:
dets_young_lowres_louvain_summary_forheatmap = dets_young_lowres_louvain.summary(mean_thres=0.5,
                                                                                 qval_thres=0.01, 
                                                                                 fc_lower_thres=0.5, 
                                                                                 fc_upper_thres=2
)
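
The thresholding applied by this summary call can be sketched on a plain table. The example below assumes that the fold-change bounds are given on the linear scale and are compared against the `log2fc` column after a log2 transform; the table and its values are made up, with column names chosen to resemble those of the summary table.

```python
import numpy as np
import pandas

# Made-up summary table of five genes.
summary = pandas.DataFrame({
    "gene": ["g1", "g2", "g3", "g4", "g5"],
    "qval": [0.001, 0.2, 0.005, 0.002, 0.004],
    "log2fc": [2.0, 3.0, -1.5, 0.3, 1.2],
    "mean": [1.0, 2.0, 0.8, 3.0, 0.1],
})

fc_lower, fc_upper = 0.5, 2.0
keep = (
    (summary["qval"] <= 0.01)
    & ((summary["log2fc"] <= np.log2(fc_lower))
       | (summary["log2fc"] >= np.log2(fc_upper)))
    & (summary["mean"] >= 0.5)
)

# Order the retained genes by log2 fold change, as done for the heatmap.
filtered = summary[keep].sort_values("log2fc")
print(filtered["gene"].tolist())
```

Sorting by log2 fold change places genes that are lower in one group at one end of the heatmap and genes that are higher at the other end.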

In [82]:
all_de_genes_young_coarse = dets_young_lowres_louvain_summary_forheatmap['gene'].values[
    np.argsort(dets_young_lowres_louvain_summary_forheatmap['log2fc'].values)
]

In [83]:
print(len(all_de_genes_young_coarse))

257


In [84]:
if bool_plot == True:
    sc.pl.heatmap(adata=adata_adip_young_lowres, 
                  var_names=all_de_genes_young_coarse, 
                  groupby="louvain", 
                  use_raw=False, 
                  log=False, 
                  cmap="viridis",
                  vmin=-1,
                  vmax=5,
                  dendrogram=False, 
                  var_group_rotation=90, 
                  show_gene_labels=True, 
                  show=True, 
                  save="_preadipYoung_lowres_all_de_genes.pdf"
    )