### 1. General info of dataset GSE136929

This Jupyter notebook processes dataset GSE136929. The dataset provides barcodes, genes, and matrix files for each of its 2 samples:


<span style="color:green">**[D90-rROs]**</span> Normal human retinal organoids at 90 days

<span style="color:green">**[D90-rRBOs]**</span> Human Rb organoids (hRBOs) at 90 days

In [1]:
# Environment setup
import numpy as np
import pandas as pd
import scanpy as sc
import anndata
import scipy


### 2. AnnData object of each sample

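<span style="color:red">**IMPORTANT:**</span> Before running the cell below, rename the files in each sample folder to remove their prefixes, so that `sc.read_10x_mtx` finds them under these names: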

1. `barcodes.tsv`: cell barcodes, which go into `.obs`
2. `genes.tsv`: gene names, which go into `.var`
3. `matrix.mtx`: the expression matrix, which goes into `.X`
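
The renaming step can be sketched as follows. This is a minimal illustration, not the procedure actually used for this dataset: the helper name `strip_prefixes` is hypothetical, and it assumes each sample folder holds uncompressed files whose names end in the three names listed above. Depending on the installed scanpy version, passing the `prefix` argument of `sc.read_10x_mtx` may be an alternative to renaming.

```python
from pathlib import Path

# The three file names listed above (legacy 10x layout, uncompressed).
EXPECTED_NAMES = ("barcodes.tsv", "genes.tsv", "matrix.mtx")


def strip_prefixes(sample_directory):
    """Rename e.g. 'PREFIX_barcodes.tsv' to 'barcodes.tsv' inside one sample folder.

    Hypothetical helper: returns the list of new paths that were created.
    """
    renamed = []
    for path in sorted(Path(sample_directory).iterdir()):
        for expected in EXPECTED_NAMES:
            # Only touch files that end with an expected name but still carry a prefix.
            if path.is_file() and path.name.endswith(expected) and path.name != expected:
                target = path.with_name(expected)
                if target.exists():
                    # Refuse to overwrite a file that already has the final name.
                    raise FileExistsError(f"{target} already exists")
                path.rename(target)
                renamed.append(target)
                break
    return renamed
```

Calling `strip_prefixes(sample_directory)` once per sample folder leaves files that already have the expected names untouched, so running it a second time is harmless.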

In [16]:
from pathlib import Path

# Specify directory paths
data_directory = Path('/scratch/user/uqjsaxo1/xiaohan-john-project/data/GSE136929_RAW/')
write_directory = Path('/scratch/user/uqjsaxo1/xiaohan-john-project/write/GSE136929/')

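# Loop through all sample folders in the data directory
for sample_directory in data_directory.iterdir():
    if not sample_directory.is_dir():
        continue  # skip stray files; each sample lives in its own folder
    sample_name = sample_directory.stem
    sample_h5ad = sample_name + '_uni.h5ad'

    # Read the barcodes/genes/matrix triplet into an AnnData object
    sample = sc.read_10x_mtx(
        sample_directory,
        var_names='gene_symbols',
        cache=False
    )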

    # Build the per-cell metadata table, indexed by the original cell barcodes
    obs_metrics = pd.DataFrame(index=sample.obs_names)
    obs_metrics['cancer_type'] = 'Retinoblastoma'
    obs_metrics['dataset'] = 'GSE136929'
    obs_metrics['tissue'] = 'retinal organoid'
    obs_metrics['sample_name'] = sample_name
    obs_metrics['uni_barcode'] = obs_metrics['dataset'] + '_' + obs_metrics.index.astype(str)
    
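    # Attach the metadata and index the cells by the unified barcode
    sample.obs = obs_metrics
    sample.obs.set_index("uni_barcode", drop=False, inplace=True)
    print(sample)

    # Save the AnnData object, creating the output folder if needed
    write_directory.mkdir(parents=True, exist_ok=True)
    output_path = write_directory.joinpath(sample_h5ad)
    sample.write_h5ad(output_path, compression="gzip")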

AnnData object with n_obs × n_vars = 9665 × 19997
    obs: 'cancer_type', 'dataset', 'tissue', 'sample_name', 'uni_barcode'
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 12218 × 19997
    obs: 'cancer_type', 'dataset', 'tissue', 'sample_name', 'uni_barcode'
    var: 'gene_ids'
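
The `uni_barcode` built above combines only the dataset ID and the cell barcode. Barcode sequences can in principle repeat across samples, so a quick check for collisions may be useful if the two samples are ever pooled. The sketch below uses made-up barcodes and hypothetical helper names (`make_uni_barcodes`, `find_duplicates`); the variant that adds the sample name is an assumption for illustration, not what the notebook does.

```python
from collections import Counter


def make_uni_barcodes(dataset, barcodes, sample_name=None):
    """Prefix each barcode with the dataset ID and, optionally, the sample name."""
    parts = [dataset] if sample_name is None else [dataset, sample_name]
    prefix = "_".join(parts)
    return [f"{prefix}_{barcode}" for barcode in barcodes]


def find_duplicates(labels):
    """Return the labels that occur more than once, sorted."""
    return sorted(label for label, count in Counter(labels).items() if count > 1)


# Made-up barcodes for two hypothetical samples; the first barcode is shared.
sample_a = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"]
sample_b = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACGAG-1"]

# Scheme used in the notebook: dataset ID + barcode.
pooled = make_uni_barcodes("GSE136929", sample_a) + make_uni_barcodes("GSE136929", sample_b)
print(find_duplicates(pooled))  # ['GSE136929_AAACCTGAGAAACCAT-1']

# Hypothetical variant that also includes the sample name.
pooled_with_sample = (
    make_uni_barcodes("GSE136929", sample_a, "sampleA")
    + make_uni_barcodes("GSE136929", sample_b, "sampleB")
)
print(find_duplicates(pooled_with_sample))  # []
```

Within a single sample the notebook's scheme is unique as long as the original barcodes are; the check only matters once cells from several samples share one index.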


### 3. Confirmation of created AnnData objects

In [18]:
from pathlib import Path

# Specify directory paths
write_directory = Path('/scratch/user/uqjsaxo1/xiaohan-john-project/write/GSE136929/')

# Loop through all saved .h5ad files in the write directory
for file in write_directory.glob('*.h5ad'):
    sample = anndata.read_h5ad(file)
    print(sample)

AnnData object with n_obs × n_vars = 12218 × 19997
    obs: 'cancer_type', 'dataset', 'tissue', 'sample_name', 'uni_barcode'
    var: 'gene_ids'
AnnData object with n_obs × n_vars = 9665 × 19997
    obs: 'cancer_type', 'dataset', 'tissue', 'sample_name', 'uni_barcode'
    var: 'gene_ids'
