### Notebook for the processing of single-cell data from the damaged human heart.

- **Developed by**: Carlos Talavera-López Ph.D
- **Institute of AI for Health, Helmholtz Zentrum München**
- v210830

- Data was downloaded from NCBI GEO using the `GSE109816` and `GSE121893` accession numbers.
- The publication is linked [here](https://www.nature.com/articles/s41556-019-0446-7).

### Load required modules

In [1]:
import anndata
import numpy as np
import pandas as pd
import scanpy as sc

### Set up working environment

In [2]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi = 200, color_map = 'RdPu', dpi_save = 300, vector_friendly = True, format = 'svg')

-----
anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.2.0
anyio               NA
appnope             0.1.2
attr                20.3.0
babel               2.9.0
backcall            0.2.0
bottleneck          1.3.2
brotli              NA
cairo               1.20.0
certifi             2020.12.05
cffi                1.14.5
chardet             4.0.0
cloudpickle         1.6.0
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
cytoolz             0.11.0
dask  

### Read in datasets

- **GSE109816**

In [5]:
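# The GEO matrix is genes × cells; transpose so that cells are observations and genes are variables
GSE109816_raw = sc.read_csv('/Volumes/TIGERII/My Drive/INBOX/heart/GSE109816/GSE109816_normal_heart_umi_matrix.csv.gz').T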
GSE109816_raw

In [None]:
GSE109816_raw.obs_names

Index(['SC_92563_0_69', 'SC_92563_0_17', 'SC_92563_0_23', 'SC_92563_0_12',
       'SC_92563_0_18', 'SC_92563_0_19', 'SC_92563_0_20', 'SC_92563_0_14',
       'SC_92563_1_45', 'SC_92563_1_46',
       ...
       'SC_97502_70_62', 'SC_97502_71_55', 'SC_97502_70_2', 'SC_97502_24_52',
       'SC_97502_34_40', 'SC_97502_32_60', 'SC_97502_66_39', 'SC_97502_30_13',
       'SC_97502_32_1', 'SC_97502_30_69'],
      dtype='object', length=9994)

In [None]:
GSE109816_raw.var_names

Index(['TSPAN6', 'TNMD', 'DPM1', 'SCYL3', 'C1orf112', 'FGR', 'CFH', 'FUCA2',
       'GCLC', 'NFYA',
       ...
       'RP5-1182A14.7', 'RP11-539L10.5', 'RP11-490B18.8', 'RP11-555E9.1',
       'RP11-753C18.12', 'AC126544.4', 'RP11-151A10.3', 'LLNLR-222A1.1',
       'AC008993.3', 'RP13-147D17.3'],
      dtype='object', length=54750)

- Read in metadata for `GSE109816`

In [None]:
# Tab-separated per-cell annotations, indexed by cell ID
GSE109816_meta = pd.read_csv('/Volumes/TIGERII/My Drive/INBOX/heart/GSE109816/GSE109816_normal_heart_cell_info.txt.gz', sep = '\t', index_col = 0)
GSE109816_meta.head()

| ID | Barcode | Type | Individual | Age | Gender | Dispense.Order | X384.Well.Plate.Location | Chip.Row.ID | Chip.Column.ID | Image.ID | ... | Chimera | Duplicate | FragementLength | MappingQuality | MultiMapping | NoFeatures | Nonjunction | Secondary | Unmapped | mito.perc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SC_92563_0_69 | AACCAACCAAG | N_LA_NCM | N6 | 21 | M | 2 | A2 | 0 | 69 | True | ... | 0 | 0 | 0 | 0 | 0 | 159 | 0 | 0 | 0 | 0.35 |
| SC_92563_0_17 | AACCAAGATTC | N_LA_NCM | N6 | 21 | M | 2 | A2 | 0 | 17 | True | ... | 0 | 0 | 0 | 0 | 0 | 47715 | 0 | 0 | 0 | 0.19 |
| SC_92563_0_23 | AACCAAGCAGT | N_LA_NCM | N6 | 21 | M | 2 | A2 | 0 | 23 | True | ... | 0 | 0 | 0 | 0 | 0 | 10959 | 0 | 0 | 0 | 0.39 |
| SC_92563_0_12 | AACCAAGCCTG | N_LA_NCM | N6 | 21 | M | 2 | A2 | 0 | 12 | True | ... | 0 | 0 | 0 | 0 | 0 | 47522 | 0 | 0 | 0 | 0.19 |
| SC_92563_0_18 | AACCAAGCTAA | N_LA_NCM | N6 | 21 | M | 2 | A2 | 0 | 18 | True | ... | 0 | 0 | 0 | 0 | 0 | 5225 | 0 | 0 | 0 | 0.12 |


In [None]:
GSE109816_meta.shape

(9994, 33)
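
The metadata has the same number of rows as the matrix has cells, but a matching shape alone does not show that the rows correspond to the same cells in the same order. A minimal sketch of such a check follows; the helper name `check_alignment` and the toy cell IDs are hypothetical and not part of the original notebook.

```python
import pandas as pd


def check_alignment(obs_names: pd.Index, meta: pd.DataFrame) -> dict:
    """Summarise how a metadata index relates to a set of cell IDs."""
    meta_ids = pd.Index(meta.index)
    return {
        # Cells in the matrix that have no metadata row
        "missing_in_meta": obs_names.difference(meta_ids).tolist(),
        # Metadata rows that have no cell in the matrix
        "extra_in_meta": meta_ids.difference(obs_names).tolist(),
        # True only if both list the same IDs in the same order
        "same_order": bool(obs_names.equals(meta_ids)),
    }


# Toy example (hypothetical IDs): same cells, different order
obs_names = pd.Index(["cell_a", "cell_b", "cell_c"])
meta = pd.DataFrame({"Age": [21, 21, 30]}, index=["cell_b", "cell_a", "cell_c"])
report = check_alignment(obs_names, meta)
print(report)
```

In this notebook the call would be `check_alignment(GSE109816_raw.obs_names, GSE109816_meta)`; if `same_order` is false while both lists are empty, reordering the metadata with `.loc[GSE109816_raw.obs_names]` would bring it in line.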

- Add metadata to the `AnnData` object

In [None]:
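# Reorder the metadata to follow the cell order of the matrix before attaching it
GSE109816_raw.obs = GSE109816_meta.loc[GSE109816_raw.obs_names].copy()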
GSE109816_raw

AnnData object with n_obs × n_vars = 9994 × 54750
    obs: 'Barcode', 'Type', 'Individual', 'Age', 'Gender', 'Dispense.Order', 'X384.Well.Plate.Location', 'Chip.Row.ID', 'Chip.Column.ID', 'Image.ID', 'Barcode.Read.Pairs', 'Distinct.UMIs', 'ERCC.Read.Pairs', 'Trimmed.Read.Pairs', 'NoContam.Read.Pairs', 'Mitochondria.Alignments', 'Mitochondria.Read.Pairs', 'Total.Barcode.Alignments', 'Distinct.Genes.w..Alignments', 'Distinct.Gene.UMI.Combos', 'Aligned', 'Assigned', 'Ambiguity', 'Chimera', 'Duplicate', 'FragementLength', 'MappingQuality', 'MultiMapping', 'NoFeatures', 'Nonjunction', 'Secondary', 'Unmapped', 'mito.perc'

In [9]:
GSE109816_raw.obs['Type'] = GSE109816_raw.obs['Type'].astype('category')
GSE109816_raw.obs['Type'].cat.categories

Index(['N_LA_CM', 'N_LA_NCM', 'N_LV_CM', 'N_LV_NCM'], dtype='object')
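
The four `Type` categories appear to join a condition and region prefix (`N_LA`, `N_LV`) to a cell group suffix (`CM`, `NCM`), which resembles the separate `condition` and `group` columns in the `GSE121893` metadata. Assuming that correspondence holds, the labels could be split as sketched below. The helper `split_type` and the output column names are illustrative choices, not something the original notebook does.

```python
import pandas as pd


def split_type(type_labels: pd.Series) -> pd.DataFrame:
    """Split labels such as 'N_LA_NCM' at the last underscore."""
    parts = type_labels.astype(str).str.rsplit("_", n=1, expand=True)
    # Assumed naming, chosen to mirror the GSE121893 metadata columns
    parts.columns = ["condition", "group"]
    return parts


# The four categories reported for GSE109816
types = pd.Series(["N_LA_CM", "N_LA_NCM", "N_LV_CM", "N_LV_NCM"])
split = split_type(types)
print(split)
```

Applied to the real object, `split_type(GSE109816_raw.obs['Type'])` returns a data frame indexed by cell ID whose columns could be added to `.obs`.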

In [10]:
GSE109816_raw

AnnData object with n_obs × n_vars = 9994 × 54750
    obs: 'Barcode', 'Type', 'Individual', 'Age', 'Gender', 'Dispense.Order', 'X384.Well.Plate.Location', 'Chip.Row.ID', 'Chip.Column.ID', 'Image.ID', 'Barcode.Read.Pairs', 'Distinct.UMIs', 'ERCC.Read.Pairs', 'Trimmed.Read.Pairs', 'NoContam.Read.Pairs', 'Mitochondria.Alignments', 'Mitochondria.Read.Pairs', 'Total.Barcode.Alignments', 'Distinct.Genes.w..Alignments', 'Distinct.Gene.UMI.Combos', 'Aligned', 'Assigned', 'Ambiguity', 'Chimera', 'Duplicate', 'FragementLength', 'MappingQuality', 'MultiMapping', 'NoFeatures', 'Nonjunction', 'Secondary', 'Unmapped', 'mito.perc'

- **GSE121893**

In [11]:
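# The GEO matrix is genes × cells; transpose so that cells are observations and genes are variables
GSE121893_raw = sc.read_csv('/Users/carlos.lopez/INBOX/heart/GSE121893/GSE121893_human_heart_sc_umi.csv.gz').T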
GSE121893_raw

AnnData object with n_obs × n_vars = 4933 × 25742

In [12]:
GSE121893_raw.obs_names

Index(['SC_96279_36_11', 'SC_96279_69_44', 'SC_96279_31_1', 'SC_96279_69_27',
       'SC_96279_36_65', 'SC_96279_29_29', 'SC_96279_65_35', 'SC_96279_69_0',
       'SC_96279_62_3', 'SC_96279_29_23',
       ...
       'SC_105235_37_21', 'SC_105235_38_15', 'SC_105235_37_66',
       'SC_105235_37_16', 'SC_105235_6_12', 'SC_105235_33_4',
       'SC_105235_39_66', 'SC_105235_4_18', 'SC_105235_37_40',
       'SC_105235_38_40'],
      dtype='object', length=4933)

In [13]:
GSE121893_raw.var_names

Index(['TSPAN6', 'DPM1', 'SCYL3', 'FGR', 'CFH', 'FUCA2', 'GCLC', 'NFYA',
       'STPG1', 'NIPAL3',
       ...
       'AC093642.6', 'RNF5P1', 'RP11-221N13.3', 'MEG8', 'CTD-2636A23.2',
       'WI2-87327B8.1', 'RP11-385F5.5', 'U47924.32', 'RP4-633H17.2',
       'RP11-367J11.2'],
      dtype='object', length=25742)

- Read in metadata for `GSE121893`

In [14]:
GSE121893_meta = pd.read_csv('/Users/carlos.lopez/INBOX/heart/GSE121893/GSE121893_all_heart_cell_cluster_info.txt.gz', sep = '\t', index_col = 0)
GSE121893_meta.head()

| ID | nGene | nUMI | condition | group | sample | ident | Age |
|---|---|---|---|---|---|---|---|
| SC_92563_0_17 | 2004 | 35539 | N_LA | NCM | N6 | EC1 | 21 |
| SC_92563_0_19 | 1455 | 16525 | N_LA | NCM | N6 | EC4 | 21 |
| SC_92563_0_14 | 2050 | 32416 | N_LA | NCM | N6 | EC7 | 21 |
| SC_92563_2_64 | 919 | 9016 | N_LA | NCM | N6 | EC2 | 21 |
| SC_92563_2_70 | 742 | 7513 | N_LA | NCM | N6 | EC7 | 21 |


In [15]:
GSE121893_meta.shape

(11377, 7)

In [16]:
GSE121893_meta = GSE121893_meta.loc[GSE121893_raw.obs_names]
GSE121893_meta.shape

(4933, 7)
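
The `.loc[GSE121893_raw.obs_names]` step keeps only the metadata rows whose IDs are present in the matrix and returns them in matrix order, which is why the table shrinks from 11377 to 4933 rows. Label-based selection raises a `KeyError` if any requested ID is absent, so a slightly more informative variant could look like the sketch below. The helper `subset_metadata` and the toy IDs are hypothetical.

```python
import pandas as pd


def subset_metadata(meta: pd.DataFrame, obs_names: pd.Index) -> pd.DataFrame:
    """Return the rows of `meta` for `obs_names`, in that order."""
    missing = pd.Index(obs_names).difference(meta.index)
    if len(missing) > 0:
        # Report how many cells lack metadata instead of a bare KeyError
        raise KeyError(
            f"{len(missing)} cell IDs have no metadata, e.g. {list(missing[:5])}"
        )
    return meta.loc[obs_names]


# Toy example (hypothetical IDs): four annotated cells, two present in the matrix
meta = pd.DataFrame({"nGene": [2004, 1455, 2050, 919]}, index=["c1", "c2", "c3", "c4"])
obs_names = pd.Index(["c3", "c1"])
subset = subset_metadata(meta, obs_names)
print(subset)
```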

In [17]:
GSE121893_raw.obs = GSE121893_meta.copy()
GSE121893_raw

AnnData object with n_obs × n_vars = 4933 × 25742
    obs: 'nGene', 'nUMI', 'condition', 'group', 'sample', 'ident', 'Age'

### Export objects

In [18]:
GSE109816_raw.write('/Users/carlos.lopez/INBOX/heart/GSE109816/GSE109816.iCell8.ctl210830.raw.h5ad')

... storing 'Barcode' as categorical
... storing 'Individual' as categorical
... storing 'Gender' as categorical
... storing 'X384.Well.Plate.Location' as categorical


In [19]:
GSE121893_raw.write('/Users/carlos.lopez/INBOX/heart/GSE121893/GSE121893.iCell8.ctl210830.raw.h5ad')

... storing 'condition' as categorical
... storing 'group' as categorical
... storing 'sample' as categorical
... storing 'ident' as categorical
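
The `... storing ... as categorical` messages indicate that string columns in `.obs` were converted to categoricals while the files were written. If explicit control over that step is preferred, the conversion could be done beforehand, roughly as in the sketch below. The helper `strings_to_category`, the rule of converting only columns with repeated values, and the toy data are all assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def strings_to_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with repeated-value string columns stored as categoricals."""
    out = df.copy()
    for col in out.columns:
        # Assumed rule: only convert string columns whose values repeat
        if is_string_dtype(out[col]) and out[col].nunique() < len(out):
            out[col] = out[col].astype("category")
    return out


# Toy example (hypothetical values)
df = pd.DataFrame(
    {"Gender": ["M", "M", "F"], "Age": [21, 21, 30], "CellID": ["a", "b", "c"]}
)
converted = strings_to_category(df)
print(converted.dtypes)
```

Here `Gender` becomes categorical, while the numeric `Age` column and the all-unique `CellID` column are left unchanged.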
