### 1. General information on dataset GSE236351

This Jupyter Notebook processes GEO dataset GSE236351, which provides barcodes/genes/matrix files for each sample.

We therefore only need to read these barcodes/genes/matrix files and build one AnnData object per sample.
In total, there are 7 Mixed-Phenotype Acute Leukemia (MPAL) samples.


In [1]:
# Environment setup
import numpy as np
import pandas as pd
import scanpy as sc
import anndata
import scipy

### 2. AnnData object of each sample

<span style="color:red">**IMPORTANT:**</span> Before running the cell below, rename the files in each sample folder to remove their prefixes, and rename `features.tsv` to `genes.tsv`.
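The renaming can also be scripted. The following is a minimal sketch, not the procedure used for this notebook: it assumes the downloaded files are uncompressed and end in `barcodes.tsv`, `features.tsv` (or `genes.tsv`), and `matrix.mtx`, with some prefix in front. The helper name `strip_prefixes` and the `TARGET_NAMES` mapping are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping: file-name ending -> name wanted inside each sample folder.
TARGET_NAMES = {
    "barcodes.tsv": "barcodes.tsv",
    "features.tsv": "genes.tsv",
    "genes.tsv": "genes.tsv",
    "matrix.mtx": "matrix.mtx",
}


def strip_prefixes(sample_dir):
    """Rename prefixed 10x files in sample_dir; return {old name: new name}."""
    renamed = {}
    # Materialise the listing first so renaming does not disturb the iteration.
    for path in sorted(Path(sample_dir).iterdir()):
        if not path.is_file():
            continue
        for ending, target in TARGET_NAMES.items():
            if path.name.endswith(ending) and path.name != target:
                destination = path.with_name(target)
                if destination.exists():
                    # Refuse to overwrite a file that is already in place.
                    raise FileExistsError(f"{destination} already exists")
                path.rename(destination)
                renamed[path.name] = target
                break
    return renamed
```

Calling `strip_prefixes(folder)` on one sample folder returns the renames it performed, and calling it again on an already-cleaned folder returns an empty dictionary.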

1. `barcodes.tsv`: cell barcodes, which go into `.obs`
2. `genes.tsv`: gene names, which go into `.var`
3. `matrix.mtx`: the expression matrix, which goes into `.X`
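Before loading, it can help to check that the three files agree with each other. In the 10x layout, `matrix.mtx` stores genes as rows and barcodes as columns, and `sc.read_10x_mtx` transposes it into cells × genes. The sketch below uses only the standard library and assumes uncompressed files with one record per line; the helper names (`count_lines`, `mtx_dimensions`, `check_sample_dir`) are hypothetical.

```python
from pathlib import Path


def count_lines(path):
    """Count the non-empty lines of a text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def mtx_dimensions(path):
    """Return (n_rows, n_cols, n_entries) from a Matrix Market size line."""
    with open(path) as handle:
        for line in handle:
            # Skip the '%%MatrixMarket' banner, '%' comments, and blank lines.
            if not line.strip() or line.startswith("%"):
                continue
            n_rows, n_cols, n_entries = (int(x) for x in line.split())
            return n_rows, n_cols, n_entries
    raise ValueError(f"no size line found in {path}")


def check_sample_dir(sample_dir):
    """Return the expected (n_obs, n_vars), or raise if the files disagree."""
    sample_dir = Path(sample_dir)
    n_genes, n_cells, _ = mtx_dimensions(sample_dir / "matrix.mtx")
    if count_lines(sample_dir / "genes.tsv") != n_genes:
        raise ValueError("genes.tsv does not match the matrix row count")
    if count_lines(sample_dir / "barcodes.tsv") != n_cells:
        raise ValueError("barcodes.tsv does not match the matrix column count")
    return n_cells, n_genes
```

The returned pair should equal the `n_obs × n_vars` shape that is printed once the sample has been read.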

In [16]:
# Shared path stems; the sample-specific suffix (e.g. '25_M1') is appended in the loop
general_input_path = '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE236351/GSM75283'
general_output_path = '/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE236351/GSM75283'

# Accessions GSM7528325 to GSM7528331 are paired with samples M1 to M7
for i, j in zip(range(25, 32), range(1, 8)):
    actual_input_path = f'{general_input_path}{i}_M{j}'
    actual_output_path = f'{general_output_path}{i}_M{j}.h5ad'

    # Read barcodes.tsv, genes.tsv and matrix.mtx from the sample folder
    sample = sc.read_10x_mtx(
        actual_input_path,
        var_names='gene_symbols',  # gene symbols as variable names; IDs are kept in .var['gene_ids']
        cache=False
    )
    print(sample)

    # Save the AnnData object
    sample.write_h5ad(actual_output_path, compression="gzip")

AnnData object with n_obs × n_vars = 4293 × 17950
    var: 'gene_ids'

AnnData object with n_obs × n_vars = 6523 × 36601
    var: 'gene_ids'

AnnData object with n_obs × n_vars = 2942 × 17950
    var: 'gene_ids'

AnnData object with n_obs × n_vars = 3731 × 17950
    var: 'gene_ids'

AnnData object with n_obs × n_vars = 3726 × 17950
    var: 'gene_ids'

AnnData object with n_obs × n_vars = 1712 × 17950
    var: 'gene_ids'

AnnData object with n_obs × n_vars = 6916 × 36601
    var: 'gene_ids'



### 3. Confirmation of created AnnData objects

In [18]:
general_output_path = '/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE236351/GSM75283'

# Reload every saved .h5ad file and print it to confirm it matches the object written above
for i, j in zip(range(25, 32), range(1, 8)):
    actual_output_path = f'{general_output_path}{i}_M{j}.h5ad'
    sample = anndata.read_h5ad(actual_output_path)
    print(sample)

AnnData object with n_obs × n_vars = 4293 × 17950
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 6523 × 36601
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 2942 × 17950
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 3731 × 17950
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 3726 × 17950
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 1712 × 17950
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 6916 × 36601
    var: 'gene_ids'
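
The printed shapes show two different gene counts (17950 and 36601), so the samples do not all share one gene set. If the objects are combined later, one option is to restrict them to the genes present in every sample. The sketch below illustrates that intersection on plain lists; the helper `shared_genes` and the toy gene lists are hypothetical, and with real data each list would come from `sample.var_names`.

```python
def shared_genes(gene_lists):
    """Return the sorted genes present in every sample.

    gene_lists maps a sample name to an iterable of gene names.
    """
    gene_sets = [set(genes) for genes in gene_lists.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Toy input for illustration only; these are not genes read from the dataset.
toy_gene_lists = {
    "M1": ["CD34", "MPO", "CD3E"],
    "M2": ["CD34", "CD3E", "CD19", "MPO"],
}
print(shared_genes(toy_gene_lists))  # ['CD34', 'CD3E', 'MPO']
```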
