# scMultiome from NVF E14.5 Pancreatic Cells - Prepare GEX & ATAC Data

In [2]:
import scipy as sci
import numpy as np
import pandas as pd
import logging
import scanpy as sc

In [3]:
sc.settings.verbosity = 3
sc.logging.print_versions()

-----
anndata     0.7.8
scanpy      1.8.2
sinfo       0.3.4
-----
PIL                         8.4.0
anyio                       NA
attr                        21.2.0
babel                       2.9.1
backcall                    0.2.0
beta_ufunc                  NA
binom_ufunc                 NA
bottleneck                  1.3.2
certifi                     2021.10.08
cffi                        1.15.0
chardet                     4.0.0
charset_normalizer          2.0.7
cloudpickle                 2.0.0
colorama 

# Prepare Data for DropletUtils

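Load the raw (unfiltered) feature-barcode matrix for each sample. Filter out droplets with zero counts and features detected in no droplet. Then save the combined matrix, plus separate GEX and ATAC matrices.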
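The filtering step keeps a barcode only if it has at least one count. It keeps a feature only if it is detected in at least one of the remaining barcodes. The sketch below reproduces that logic on a small SciPy sparse matrix without scanpy. It illustrates what `sc.pp.filter_cells(min_counts=1)` and `sc.pp.filter_genes(min_cells=1)` compute, but it is not scanpy's own implementation. The helper `drop_empty` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def drop_empty(X):
    """Hypothetical helper: drop zero-count barcodes (rows), then undetected features (columns)."""
    X = sp.csr_matrix(X)
    # Like sc.pp.filter_cells(min_counts=1): keep rows whose total count is >= 1
    keep_cells = np.asarray(X.sum(axis=1)).ravel() >= 1
    X = X[keep_cells]
    # Like sc.pp.filter_genes(min_cells=1): keep columns that are non-zero in >= 1 remaining row
    keep_features = np.asarray((X > 0).sum(axis=0)).ravel() >= 1
    X = X[:, keep_features]
    return X, keep_cells, keep_features


# Toy counts: 4 barcodes x 4 features
counts = np.array([
    [0, 0, 0, 0],  # empty droplet: removed
    [3, 0, 1, 0],
    [0, 0, 2, 0],
    [1, 0, 0, 0],
])
filtered, keep_cells, keep_features = drop_empty(counts)
print(filtered.shape)  # (3, 2): features 1 and 3 are never detected
```

An empty barcode contributes no non-zero entries. Removing barcodes first therefore never changes which features are detected, so with these thresholds the order of the two calls does not matter.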

In [6]:
import gc  # needed for gc.collect() below
import os

samples = ['E14-5', 'E15-5']
base_path = '/storage/scRNA-seq/scMultiome_Mouse-Islets_NVF-E14.5_210044/data/cr_arc/cr_count/'  # might have to be adapted
outs_path = 'outs'  # might have to be adapted

for sample in samples:
    path = os.path.join(base_path, sample, outs_path)
    print('Loading ' + path)
    # Load all features (Gene Expression and Peaks), not only GEX
    adata = sc.read_10x_h5(os.path.join(path, 'raw_feature_bc_matrix.h5'), gex_only=False)
    print(adata.shape)
    # Drop barcodes with zero counts and features detected in no barcode
    sc.pp.filter_cells(adata, min_counts=1)
    sc.pp.filter_genes(adata, min_cells=1)
    print(adata.shape, '\n\n')
    # Save combined matrix (same .h5ad file that sc.write produces without an extension)
    sc.write(os.path.join(path, sample + '_raw_feature_bc_matrix.h5ad'), adata)
    # Split by feature type and save GEX and ATAC separately
    gex_mask = adata.var.feature_types.isin(['Gene Expression'])
    atac_mask = adata.var.feature_types.isin(['Peaks'])
    print('Shape GEX:', adata[:, gex_mask].shape)
    sc.write(os.path.join(path, sample + '_raw_gex_bc_matrix.h5ad'), adata[:, gex_mask])
    print('Shape ATAC:', adata[:, atac_mask].shape)
    sc.write(os.path.join(path, sample + '_raw_atac_bc_matrix.h5ad'), adata[:, atac_mask])
    # Free memory before loading the next sample
    del adata
    gc.collect()


Loading /storage/scRNA-seq/scMultiome_Mouse-Islets_NVF-E14.5_210044/data/cr_arc/cr_count/E15-5/outs
reading /storage/scRNA-seq/scMultiome_Mouse-Islets_NVF-E14.5_210044/data/cr_arc/cr_count/E15-5/outs/raw_feature_bc_matrix.h5
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
 (0:00:10)
(735587, 221966)
filtered out 768 cells that have less than 1 counts
filtered out 6471 genes that are detected in less than 1 cells
(734819, 215495) 

... storing 'feature_types' as categorical
... storing 'genome' as categorical
Shape GEX: (734819, 24784)
Shape ATAC: (734819, 190711)
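A quick sanity check on the split is to confirm that the two boolean masks partition the features. If they do, the GEX and ATAC column counts add up to the filtered total. The sketch below makes that check on a toy matrix, using a plain list as a stand-in for `adata.var.feature_types`. The helper `split_by_feature_type` is hypothetical. The last three lines re-check the E15-5 numbers printed in the log above.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def split_by_feature_type(X, feature_types):
    """Hypothetical helper: split the columns of X into GEX and ATAC blocks."""
    ft = pd.Series(feature_types)
    gex_mask = ft.isin(['Gene Expression']).to_numpy()
    atac_mask = ft.isin(['Peaks']).to_numpy()
    # Every feature should belong to exactly one of the two types
    if not np.all(gex_mask ^ atac_mask):
        raise ValueError('found features that are neither Gene Expression nor Peaks')
    return X[:, gex_mask], X[:, atac_mask]


# Toy data: 3 barcodes x 5 features (2 genes, 3 peaks)
X = sp.csr_matrix(np.arange(15).reshape(3, 5))
feature_types = ['Gene Expression', 'Gene Expression', 'Peaks', 'Peaks', 'Peaks']
gex, atac = split_by_feature_type(X, feature_types)
print('Shape GEX:', gex.shape)    # Shape GEX: (3, 2)
print('Shape ATAC:', atac.shape)  # Shape ATAC: (3, 3)

# Cross-check the E15-5 numbers from the log
assert 735587 - 768 == 734819    # barcodes left after filter_cells
assert 221966 - 6471 == 215495   # features left after filter_genes
assert 24784 + 190711 == 215495  # GEX + ATAC cover every retained feature
```

If a run included a third feature type, the helper would raise instead of silently dropping those columns, and the last assertion would fail.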
