# Introduction

This notebook stores metadata for cell types, conditions, and colors so that they stay consistent across notebooks. It was updated many times during the analysis, so it might not run immediately when starting from raw input files; all the files imported here are, however, eventually created in other notebooks. To reproduce results, I recommend using the `meta.pkl` provided on GEO.

In [1]:
import pickle as pkl
import scanpy as sc
import matplotlib as mpl
import itertools as it
import os
import pandas as pd

In [2]:
mountpoint = '/data/clue_test/'

Path to the most recent concat object:

In [3]:
path_concat = mountpoint + 'prod/comb/h5ads/concat_6.h5ad'

Path to eQTL results:

In [4]:
path_eqtl = mountpoint + 'prod/eqtl/mateqtl/vals/'

# Create Metadata Dictionary

I use a pickle file called `meta.pkl` in various notebooks throughout the processing. This notebook is where I create it.

In [15]:
meta = dict()

## `cts`

In [5]:
concat = sc.read_h5ad(path_concat)

In [16]:
ct_types = [i for i in concat.obs.columns if i.startswith('ct')]

In [17]:
meta['cts'] = dict()
for ct_type in ct_types:
    meta['cts'][ct_type] = concat.obs[ct_type].cat.categories.tolist()

### DESeq2

In [5]:
meta['cts']['de_set1'] = ['B_Naive', 'pDC', 'T4_Naive', 'HSC', 'T4_EM', 'NK', 'T_Tox', 'B_Mem', 'M_cDC', 'T8_Naive']
meta['cts']['de_set2'] = ['ncM', 'cM', 'cDC'] # for all conditions except PMA/I

## `ct` colors

In [18]:
ct_colors = dict()
for ct_type in ct_types:
    ct_colors[ct_type] = dict()
    ct_colors[ct_type]['hex'] = dict(zip(concat.obs[ct_type].cat.categories, concat.uns[ct_type + '_colors']))
    ct_colors[ct_type]['rgb'] = dict(zip(concat.obs[ct_type].cat.categories, list(map(mpl.colors.hex2color, concat.uns[ct_type + '_colors']))))

In [19]:
meta['ct_colors'] = ct_colors

## `conds`

In [5]:
meta['conds'] = dict()
meta['conds']['conds_all'] = ['0', 'A', 'B', 'C', 'G', 'P', 'R']
meta['conds']['conds_filt'] = ['A', 'B', 'C', 'G', 'P', 'R']
meta['conds']['conds_focus'] = ['A', 'B', 'C', 'G', 'R']
meta['conds']['conds_stims'] = ['A', 'B', 'G', 'R', 'P']
meta['conds']['stims_focus'] = ['A', 'B', 'G', 'R']
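
The lists above nest inside one another, and a quick consistency check can catch a label that is dropped or mistyped during later edits. The sketch below is only an illustration: the helper `check_conds` is hypothetical, and the relationships it tests are read off the lists themselves rather than stated anywhere in the analysis.

```python
# Condition lists copied from the cell above.
conds = {
    'conds_all': ['0', 'A', 'B', 'C', 'G', 'P', 'R'],
    'conds_filt': ['A', 'B', 'C', 'G', 'P', 'R'],
    'conds_focus': ['A', 'B', 'C', 'G', 'R'],
    'conds_stims': ['A', 'B', 'G', 'R', 'P'],
    'stims_focus': ['A', 'B', 'G', 'R'],
}


def check_conds(conds):
    """Hypothetical helper: True if the narrower lists agree with the wider ones."""
    all_labels = set(conds['conds_all'])
    # Every list should only use labels that appear in conds_all.
    within_all = all(set(labels) <= all_labels for labels in conds.values())
    # As written, conds_filt is conds_all without '0'.
    filt_ok = set(conds['conds_filt']) == (all_labels - {'0'})
    # As written, stims_focus is the overlap of conds_focus and conds_stims.
    overlap = set(conds['conds_focus']) & set(conds['conds_stims'])
    focus_ok = set(conds['stims_focus']) == overlap
    return within_all and filt_ok and focus_ok


print(check_conds(conds))
```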

## `cond_colors`

In [21]:
cond_colors = dict()
conds_all = meta['conds']['conds_all']
cond_colors['hex'] = dict(zip(conds_all, sc.pl.palettes.default_20))
cond_colors['rgb'] = dict(zip(conds_all, list(map(mpl.colors.hex2color, sc.pl.palettes.default_20))))

In [22]:
meta['cond_colors'] = cond_colors
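
To make the shape of `cond_colors` concrete, here is a minimal sketch that builds the same two-level dictionary without scanpy or matplotlib. Two things are assumptions for illustration: `palette` is a made-up 10-color list rather than `sc.pl.palettes.default_20`, and `hex_to_rgb` is a pure-Python stand-in for `mpl.colors.hex2color`. Because `zip` stops at the shorter input, only as many colors as there are conditions end up in the mapping.

```python
def hex_to_rgb(hex_color):
    """Stand-in for mpl.colors.hex2color: '#rrggbb' -> three floats in [0, 1]."""
    digits = hex_color.lstrip('#')
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))


conds_all = ['0', 'A', 'B', 'C', 'G', 'P', 'R']

# Hypothetical palette, longer than the list of conditions.
palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
           '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

cond_colors = {
    # zip truncates to the 7 conditions; the remaining colors are unused.
    'hex': dict(zip(conds_all, palette)),
    'rgb': dict(zip(conds_all, map(hex_to_rgb, palette))),
}

print(len(cond_colors['hex']))
print(cond_colors['hex']['0'], cond_colors['rgb']['0'])
```

Downstream notebooks can then look up a color by format and condition label, for example `cond_colors['hex']['A']`.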

## `eQTL`

In [35]:
output_files = [fn for fn in os.listdir(path_eqtl) if fn.startswith('mateqtl_')]

In [36]:
cond_cts = list()
for output_file in output_files:
    # Strip the prefix and suffix, then split once so underscores inside the cell type name are kept
    cond_cts.append(tuple(output_file.replace('mateqtl_', '').replace('_all_cis.csv', '').split('_', maxsplit=1)))
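
The parsing step above can be checked in isolation. The sketch below wraps the same string operations in a hypothetical helper, `parse_cond_ct`, and runs it on made-up filenames that follow the pattern the code implies (`mateqtl_<cond>_<ct>_all_cis.csv`); the actual files in `path_eqtl` are not shown here. It also shows why `maxsplit=1` matters for cell type names that contain underscores.

```python
def parse_cond_ct(filename):
    """Hypothetical helper: extract (cond, ct) from a Matrix eQTL output filename."""
    name = filename.replace('mateqtl_', '').replace('_all_cis.csv', '')
    # Split only on the first underscore: the condition is a single token,
    # while the cell type (e.g. 'T4_Naive') may contain underscores itself.
    return tuple(name.split('_', maxsplit=1))


# Made-up filenames, for illustration only.
examples = ['mateqtl_A_T4_Naive_all_cis.csv', 'mateqtl_C_cM_all_cis.csv']

for fn in examples:
    print(parse_cond_ct(fn))

# Without maxsplit, the cell type would be broken into pieces.
print('A_T4_Naive'.split('_'))
```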

In [49]:
meta['eqtl'] = dict()
meta['eqtl']['cond_cts'] = pd.DataFrame(cond_cts, columns=['cond', 'ct'], dtype='category')

### ATAC validation

In `${mountpoint}/amo/atac/peak_calling.ipynb`, I select, based on cell numbers, a few cell types and conditions on which to run the ATAC-eQTL enrichment.

In [4]:
atac_cond_cts = it.product(['C', 'G', 'B'], ['T4_Naive', 'T8_Naive', 'T4_EM', 'T8_Mem', 'NK_CD16+', 'cM', 'B_Naive', 'B_Mem'])
meta['eqtl']['cond_cts_atac'] = pd.DataFrame(atac_cond_cts, columns=['cond', 'ct'], dtype='category')
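
As a small illustration of what `it.product` yields here, the sketch below enumerates the same condition and cell type lists using only the standard library. The variable names `first_pass` and `second_pass` are illustrative. One detail worth knowing: `it.product` returns a one-shot iterator, so once it has been consumed (for example by building a DataFrame from it), iterating over it again yields nothing.

```python
import itertools as it

conds = ['C', 'G', 'B']
cts = ['T4_Naive', 'T8_Naive', 'T4_EM', 'T8_Mem', 'NK_CD16+', 'cM', 'B_Naive', 'B_Mem']

pairs = it.product(conds, cts)

# The first pass consumes the iterator: 3 conditions x 8 cell types.
first_pass = list(pairs)
# The second pass finds it already exhausted.
second_pass = list(pairs)

print(len(first_pass), first_pass[0], first_pass[8])
print(second_pass)
```

Wrapping the call in `list(...)` before storing it is one way to keep the pairs reusable.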

# Export

In [7]:
# with open(mountpoint + 'meta.pkl', 'wb') as file:
#     pkl.dump(meta, file)

In [4]:
with open(mountpoint + 'meta.pkl', 'rb') as file:
    meta = pkl.load(file)
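
For reference, a minimal round trip of the dump and load pattern used above, written against a temporary directory so it does not touch `mountpoint`. The dictionary `meta_demo` is a small stand-in, not the real `meta`. Note that the real `meta` holds pandas DataFrames, so unpickling it needs pandas to be importable in the loading environment.

```python
import os
import pickle as pkl
import tempfile

# Small stand-in for the real metadata dictionary.
meta_demo = {
    'conds': {'stims_focus': ['A', 'B', 'G', 'R']},
    'cts': {'de_set2': ['ncM', 'cM', 'cDC']},
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'meta.pkl')
    # Write the dictionary in binary mode.
    with open(path, 'wb') as file:
        pkl.dump(meta_demo, file)
    # Read it back from the same path.
    with open(path, 'rb') as file:
        restored = pkl.load(file)

print(restored == meta_demo)
```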