### 1. General information about dataset GSE132509

This Jupyter Notebook processes dataset GSE132509. The dataset includes one large cell annotation TSV file covering all samples, plus barcodes/genes/matrix files for each sample.

We therefore only need to combine the barcodes/genes/matrix files of each sample into one AnnData object per sample. There are 11 samples in total, drawn from the following groups:

<span style="color:green">**[ETV6-RUNX1]**</span> Pre-B t(12;21) acute lymphoblastic leukemia

<span style="color:green">**[HHD]**</span> Pre-B High hyper diploid acute lymphoblastic leukemia

<span style="color:green">**[PRE-T]**</span> Pre-T acute lymphoblastic leukemia

<span style="color:green">**[PBMMC]**</span> Healthy pediatric bone marrow mononuclear cells

In [1]:
# Environment setup
import numpy as np
import pandas as pd
import scanpy as sc
import anndata
import scipy

### 2. AnnData object for each sample

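<span style="color:red">**IMPORTANT:**</span> rename the files in each sample directory to remove their prefixes, so that each directory contains exactly these three file names:

1. `barcodes.tsv`: cell barcodes, which go into `.obs`
2. `genes.tsv`: gene IDs and gene symbols, which go into `.var`
3. `matrix.mtx`: the expression matrix, which goes into `.X`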
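
The renaming step can be scripted rather than done by hand. The sketch below is one possible approach, not the procedure actually used for this notebook: `strip_prefixes` is a hypothetical helper, and it assumes that each prefixed file name ends with one of the three names listed above. It defaults to a dry run so the planned renames can be inspected first.

```python
from pathlib import Path

# The three file names listed above.
EXPECTED_NAMES = ("barcodes.tsv", "genes.tsv", "matrix.mtx")


def strip_prefixes(sample_dir, dry_run=True):
    """Rename files such as '<prefix>barcodes.tsv' to 'barcodes.tsv'.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    is renamed, so the plan can be checked before touching any file.
    """
    sample_dir = Path(sample_dir)
    planned = []
    # Materialise the listing first so renames do not disturb the iteration.
    for path in sorted(sample_dir.iterdir()):
        if not path.is_file():
            continue
        for target in EXPECTED_NAMES:
            if path.name.endswith(target) and path.name != target:
                destination = sample_dir / target
                if destination.exists():
                    # Refuse to overwrite an existing file.
                    raise FileExistsError(f"{destination} already exists")
                planned.append((path.name, target))
                if not dry_run:
                    path.rename(destination)
    return planned
```

Calling `strip_prefixes(some_dir)` only reports what would change; passing `dry_run=False` performs the renames.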

In [3]:
etv6_runx1_1 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872434_ETV6-RUNX1_1',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 2776 × 33694

# save the anndata object for later use
etv6_runx1_1.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872434_ETV6-RUNX1_1.h5ad", compression="gzip")

In [4]:
etv6_runx1_2 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872435_ETV6-RUNX1_2',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 6274 × 33694

# save the anndata object for later use
etv6_runx1_2.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872435_ETV6-RUNX1_2.h5ad", compression="gzip")

In [2]:
etv6_runx1_3 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872436_ETV6-RUNX1_3',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 3862 × 33694

# save the anndata object for later use
etv6_runx1_3.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872436_ETV6-RUNX1_3.h5ad", compression="gzip")

In [4]:
etv6_runx1_4 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872437_ETV6-RUNX1_4',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 5069 × 33694

# save the anndata object for later use
etv6_runx1_4.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872437_ETV6-RUNX1_4.h5ad", compression="gzip")

In [6]:
hhd_1 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872438_HHD_1',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 3728 × 33694

# save the anndata object for later use
hhd_1.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872438_HHD_1.h5ad", compression="gzip")

In [8]:
hhd_2 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872439_HHD_2',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 5013 × 33694

# save the anndata object for later use
hhd_2.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872439_HHD_2.h5ad", compression="gzip")

In [11]:
pre_t_1 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872440_PRE-T_1',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 2959 × 33694

# save the anndata object for later use
pre_t_1.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872440_PRE-T_1.h5ad", compression="gzip")

In [13]:
pre_t_2 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872441_PRE-T_2',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 2748 × 33694

# save the anndata object for later use
pre_t_2.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872441_PRE-T_2.h5ad", compression="gzip")

In [15]:
pbmmc_1 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872442_PBMMC_1',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 1612 × 33694

# save the anndata object for later use
pbmmc_1.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872442_PBMMC_1.h5ad", compression="gzip")

In [17]:
pbmmc_2 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872443_PBMMC_2',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 3105 × 33694

# save the anndata object for later use
pbmmc_2.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872443_PBMMC_2.h5ad", compression="gzip")

In [19]:
pbmmc_3 = sc.read_10x_mtx(
    '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE132509/GSE132509_RAW/GSM3872444_PBMMC_3',  
    var_names='gene_symbols',                
    cache=False)                              
# n_obs × n_vars = 2229 × 33694

# save the anndata object for later use
pbmmc_3.write_h5ad("/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872444_PBMMC_3.h5ad", compression="gzip")
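
The eleven cells above differ only in the sample name, so the same work could be expressed as a loop. The following is a minimal sketch under that assumption; `build_jobs` and `convert_all` are hypothetical helper names, while the paths, sample names, and `read_10x_mtx` / `write_h5ad` arguments are copied from the cells above.

```python
from pathlib import Path

# Sample directory names exactly as used in the cells above.
SAMPLES = [
    "GSM3872434_ETV6-RUNX1_1",
    "GSM3872435_ETV6-RUNX1_2",
    "GSM3872436_ETV6-RUNX1_3",
    "GSM3872437_ETV6-RUNX1_4",
    "GSM3872438_HHD_1",
    "GSM3872439_HHD_2",
    "GSM3872440_PRE-T_1",
    "GSM3872441_PRE-T_2",
    "GSM3872442_PBMMC_1",
    "GSM3872443_PBMMC_2",
    "GSM3872444_PBMMC_3",
]

BASE = Path("/scratch/user/s4543064/Xiaohan_Summer_Research")
DATA_DIR = BASE / "data" / "GSE132509" / "GSE132509_RAW"
WRITE_DIR = BASE / "write" / "GSE132509"


def build_jobs(samples=SAMPLES, data_dir=DATA_DIR, write_dir=WRITE_DIR):
    """Pair each sample's input directory with its output .h5ad path."""
    return [(data_dir / name, write_dir / f"{name}.h5ad") for name in samples]


def convert_all(jobs):
    """Read each 10x directory and save it as a gzip-compressed .h5ad file."""
    import scanpy as sc  # imported lazily so build_jobs works without scanpy

    for in_dir, out_path in jobs:
        adata = sc.read_10x_mtx(in_dir, var_names="gene_symbols", cache=False)
        adata.write_h5ad(out_path, compression="gzip")
        print(out_path.name, adata.shape)


# convert_all(build_jobs())  # run on a machine where the data directories exist
```

Unlike the individual cells, this version does not keep one named variable per sample; the saved `.h5ad` files can be reloaded when needed.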

### 3. Confirmation of created AnnData objects

In [25]:
sample_1 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872434_ETV6-RUNX1_1.h5ad')
print(sample_1)

sample_2 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872435_ETV6-RUNX1_2.h5ad')
print(sample_2)

sample_3 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872436_ETV6-RUNX1_3.h5ad')
print(sample_3)

sample_4 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872437_ETV6-RUNX1_4.h5ad')
print(sample_4)

sample_5 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872438_HHD_1.h5ad')
print(sample_5)

sample_6 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872439_HHD_2.h5ad')
print(sample_6)

sample_7 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872440_PRE-T_1.h5ad')
print(sample_7)

sample_8 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872441_PRE-T_2.h5ad')
print(sample_8)

sample_9 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872442_PBMMC_1.h5ad')
print(sample_9)

sample_10 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872443_PBMMC_2.h5ad')
print(sample_10)

sample_11 = anndata.read_h5ad('/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE132509/GSM3872444_PBMMC_3.h5ad')
print(sample_11)

AnnData object with n_obs × n_vars = 2776 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 6274 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 3862 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 5069 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 3728 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 5013 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 2959 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 2748 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 1612 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 3105 × 33694
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 2229 × 33694
    var: 'gene_ids'
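
The printed shapes can also be checked programmatically instead of by eye. The sketch below is illustrative only: `find_mismatches` and `load_shapes` are hypothetical helpers, and the expected cell counts are copied from the output above.

```python
from pathlib import Path

# Cells per sample, copied from the printed output above.
EXPECTED_N_OBS = {
    "GSM3872434_ETV6-RUNX1_1": 2776,
    "GSM3872435_ETV6-RUNX1_2": 6274,
    "GSM3872436_ETV6-RUNX1_3": 3862,
    "GSM3872437_ETV6-RUNX1_4": 5069,
    "GSM3872438_HHD_1": 3728,
    "GSM3872439_HHD_2": 5013,
    "GSM3872440_PRE-T_1": 2959,
    "GSM3872441_PRE-T_2": 2748,
    "GSM3872442_PBMMC_1": 1612,
    "GSM3872443_PBMMC_2": 3105,
    "GSM3872444_PBMMC_3": 2229,
}
N_VARS = 33694  # genes, identical for every sample


def find_mismatches(observed_shapes, expected_n_obs=EXPECTED_N_OBS, n_vars=N_VARS):
    """Return {sample: (observed, expected)} for each shape that differs."""
    mismatches = {}
    for name, n_obs in expected_n_obs.items():
        expected = (n_obs, n_vars)
        observed = observed_shapes.get(name)  # None if the sample is missing
        if observed != expected:
            mismatches[name] = (observed, expected)
    return mismatches


def load_shapes(write_dir, names=EXPECTED_N_OBS):
    """Read each saved .h5ad file and collect its (n_obs, n_vars) shape."""
    import anndata  # imported lazily so the check itself needs no extra packages

    return {
        name: anndata.read_h5ad(Path(write_dir) / f"{name}.h5ad").shape
        for name in names
    }


# Total number of cells across the 11 samples.
print(sum(EXPECTED_N_OBS.values()))  # 39375
```

An empty dictionary from `find_mismatches(load_shapes(write_dir))` would indicate that every saved object has the expected shape.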
