### 1. General info of dataset GSE148218

This Jupyter Notebook processes dataset GSE148218, which provides barcodes/genes/matrix files for each sample.

Thus, we simply need to read these barcodes/genes/matrix files and generate an AnnData object for each sample.
In total, there are 8 acute lymphoblastic leukemia (ALL) samples.


In [1]:
# Environment setup
import numpy as np
import pandas as pd
import scanpy as sc
import anndata
import scipy

### 2. AnnData object of each sample

<span style="color:red">**IMPORTANT:**</span> before loading, rename the files in each sample folder to remove the sample prefixes, and rename `features.tsv` to `genes.tsv`.

1. `barcodes.tsv`: cell barcodes, which go into `.obs`
2. `genes.tsv`: gene IDs and gene names, which go into `.var`
3. `matrix.mtx`: the expression matrix, which goes into `.X`
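
The renaming step can be scripted rather than done by hand. The following is a minimal sketch, not the procedure actually used for this dataset: the helper `normalize_10x_filenames` is hypothetical, and it assumes each sample folder holds uncompressed files whose names end in `barcodes.tsv`, `genes.tsv` (or `features.tsv`), and `matrix.mtx`, possibly preceded by a sample prefix.

```python
import os

# Map each recognised file-name ending to the name expected in this notebook.
# The suffix-matching rule is an assumption about how the files are prefixed.
CANONICAL = {
    "barcodes.tsv": "barcodes.tsv",
    "genes.tsv": "genes.tsv",
    "features.tsv": "genes.tsv",  # features.tsv is renamed to genes.tsv
    "matrix.mtx": "matrix.mtx",
}


def normalize_10x_filenames(sample_dir):
    """Strip any prefix from the three 10x files in sample_dir (hypothetical helper).

    Returns a dict mapping each old file name to its new name.
    """
    renamed = {}
    for name in sorted(os.listdir(sample_dir)):
        for suffix, target in CANONICAL.items():
            if name.endswith(suffix) and name != target:
                os.rename(
                    os.path.join(sample_dir, name),
                    os.path.join(sample_dir, target),
                )
                renamed[name] = target
                break
    return renamed
```

Calling `normalize_10x_filenames(path_to_sample_folder)` once per sample folder leaves each folder with the three file names listed above. Recent versions of `sc.read_10x_mtx` also accept a `prefix` argument, which may make stripping the prefix unnecessary, depending on the installed scanpy version.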

In [9]:
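# Common path prefixes: the last digit of the GSM accession (1-8) and the
# sample suffix ('_ALL...') are appended inside the loop below
general_input_path = '/scratch/user/s4543064/Xiaohan_Summer_Research/data/GSE148218/GSM445625'
general_output_path = '/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM445625'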

# Sample suffixes in GSM order (GSM4456251 ... GSM4456258)
sample_suffixes = ['1', '3', '8', '9', '10', '12', '10-d15', '12-d15']

for i, j in enumerate(sample_suffixes, start=1):
    actual_input_path = general_input_path + str(i) + '_ALL' + j
    actual_output_path = general_output_path + str(i) + '_ALL' + j + '.h5ad'

    sample = sc.read_10x_mtx(
        actual_input_path,
        var_names='gene_symbols',  
        cache=False
    )
    print(j)
    print(sample)
    
    # save the anndata object
    sample.write_h5ad(actual_output_path, compression="gzip")

1
AnnData object with n_obs × n_vars = 7228 × 32738
    var: 'gene_ids'
3
AnnData object with n_obs × n_vars = 6123 × 32738
    var: 'gene_ids'
8
AnnData object with n_obs × n_vars = 4163 × 32738
    var: 'gene_ids'
9
AnnData object with n_obs × n_vars = 7826 × 32738
    var: 'gene_ids'
10
AnnData object with n_obs × n_vars = 7160 × 32738
    var: 'gene_ids'
12
AnnData object with n_obs × n_vars = 4224 × 32738
    var: 'gene_ids'
10-d15
AnnData object with n_obs × n_vars = 7917 × 32738
    var: 'gene_ids'
12-d15
AnnData object with n_obs × n_vars = 6106 × 32738
    var: 'gene_ids'


### 3. Confirmation of created AnnData objects
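
Before reading each file back, it can help to check that all eight expected `.h5ad` files exist. The sketch below is only an illustration: the helpers `expected_h5ad_names` and `missing_outputs` are hypothetical, and the file-name pattern is inferred from the output paths built in section 2.

```python
import os

# Sample suffixes in GSM order, matching the loop in section 2.
SUFFIXES = ["1", "3", "8", "9", "10", "12", "10-d15", "12-d15"]


def expected_h5ad_names(suffixes=SUFFIXES, gsm_start=4456251):
    """Build the expected output file names (hypothetical helper)."""
    return [f"GSM{gsm_start + k}_ALL{s}.h5ad" for k, s in enumerate(suffixes)]


def missing_outputs(directory):
    """Return the expected file names that are absent from directory."""
    present = set(os.listdir(directory))
    return [name for name in expected_h5ad_names() if name not in present]
```

An empty list from `missing_outputs(directory)`, called with the output directory used in the cell below, would indicate that every sample was written.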

In [10]:
import os

# Specify the directory path
directory = '/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218'

# Loop through every file in the directory
for filename in os.listdir(directory):
    # Get the full path of the file
    filepath = os.path.join(directory, filename)
    # Check if the file is a regular file (not a directory)
    if os.path.isfile(filepath):
        # Process the file
        print(filepath)
        sample = anndata.read_h5ad(filepath)
        print(sample)


/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM4456256_ALL12.h5ad
AnnData object with n_obs × n_vars = 4224 × 32738
    var: 'gene_ids'
/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM4456255_ALL10.h5ad
AnnData object with n_obs × n_vars = 7160 × 32738
    var: 'gene_ids'
/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM4456258_ALL12-d15.h5ad
AnnData object with n_obs × n_vars = 6106 × 32738
    var: 'gene_ids'
/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM4456251_ALL1.h5ad
AnnData object with n_obs × n_vars = 7228 × 32738
    var: 'gene_ids'
/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM4456253_ALL8.h5ad
AnnData object with n_obs × n_vars = 4163 × 32738
    var: 'gene_ids'
/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM4456254_ALL9.h5ad
AnnData object with n_obs × n_vars = 7826 × 32738
    var: 'gene_ids'
/scratch/user/s4543064/Xiaohan_Summer_Research/write/GSE148218/GSM445