# Data Structure for Keyence Data

Wenrui Wu, 2024-12-29

In [1]:
import sys
from io import StringIO

import pandas as pd

from pycodex.cls import MarkerMetadata

# Make the parent directory importable so that `src.preprocess` can be found
sys.path.append("..")
from src.preprocess import KeyencePreprocessor

## 01. Data Structure

### Output from Keyence platform

The images of different markers for each region are stored in a corresponding region folder (e.g., `final/reg001`). 

All images follow a uniform naming format: `reg[region number]_cyc[cycle number]_ch[channel number]_[marker name].tif`. Note: `Ch1Cy1` is the DAPI image on channel 1 of cycle 1; likewise, `Ch1Cy2`, `Ch1Cy3`, ... are the DAPI images of later cycles.
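
To make the naming convention concrete, the following is a minimal sketch that splits a file name into its fields. The regular expression, `KeyenceImage`, and `parse_keyence_filename` are hypothetical illustrations rather than part of `pycodex`, and the sketch assumes the region, cycle, and channel numbers are always digits.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for reg[region]_cyc[cycle]_ch[channel]_[marker].tif
KEYENCE_PATTERN = re.compile(
    r"^reg(?P<region>\d+)_cyc(?P<cycle>\d+)_ch(?P<channel>\d+)_(?P<marker>.+)\.tif$"
)


class KeyenceImage(NamedTuple):
    region: int
    cycle: int
    channel: int
    marker: str


def parse_keyence_filename(filename: str) -> KeyenceImage:
    """Split a Keyence file name into region, cycle, channel, and marker."""
    match = KEYENCE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    return KeyenceImage(
        region=int(match["region"]),
        cycle=int(match["cycle"]),
        channel=int(match["channel"]),
        # Everything after the channel field is the marker name
        marker=match["marker"],
    )


print(parse_keyence_filename("reg001_cyc001_ch001_Ch1Cy1.tif"))
# → KeyenceImage(region=1, cycle=1, channel=1, marker='Ch1Cy1')
```

Because marker names such as `DIG_TREM2` or `PDL1_100_500ms` contain underscores, the marker field is taken as everything after the channel field rather than by splitting on `_`.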

```
final
├── reg001
│   ├── reg001_cyc001_ch001_Ch1Cy1.tif
│   ├── reg001_cyc001_ch003_Blank.tif
│   ├── reg001_cyc001_ch004_Blank.tif
│   └── ...
└── ...
```

### Input for this pipeline

First, gather the data from all of your TMAs.

Put all region folders under a single root folder, and make sure each region has a unique id, especially for regions that come from different TMAs.

Recommended format: `[TMA id]_[region id]`, e.g., `TMA544_reg001`.

```
dir_root
├── [unique region id]
│   ├── reg001_cyc001_ch001_Ch1Cy1.tif
│   ├── reg001_cyc001_ch003_Blank.tif
│   ├── reg001_cyc001_ch004_Blank.tif
│   └── ...
└── ...
```
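
The reorganization above can be scripted. A minimal sketch, assuming each TMA's Keyence output sits in its own `final` folder and that symbolic links are acceptable in place of copies (whether the pipeline follows symlinks is an assumption; use `shutil.copytree` if it does not). `link_regions` is a hypothetical helper, not part of the pipeline.

```python
import os
from pathlib import Path


def link_regions(tma_dirs: dict[str, str], dir_root: str) -> list[str]:
    """Symlink each region folder into dir_root as [TMA id]_[region id].

    Hypothetical helper: tma_dirs maps a TMA id (e.g. "TMA544") to that
    TMA's Keyence `final` folder. Returns the new region ids, sorted.
    """
    root = Path(dir_root)
    root.mkdir(parents=True, exist_ok=True)
    new_ids = []
    for tma_id, final_dir in tma_dirs.items():
        for region_dir in sorted(Path(final_dir).iterdir()):
            if not region_dir.is_dir():
                continue  # skip stray files next to the region folders
            unique_id = f"{tma_id}_{region_dir.name}"  # e.g. TMA544_reg001
            target = root / unique_id
            if target.exists() or target.is_symlink():
                raise FileExistsError(f"Region id already used: {unique_id}")
            # A symlink avoids duplicating large image files
            os.symlink(region_dir.resolve(), target, target_is_directory=True)
            new_ids.append(unique_id)
    return sorted(new_ids)
```

The image files inside each folder keep their original `reg001_...` names; only the folder name carries the unique id, matching the layout shown above.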

## 02. Review Data

In [2]:
################################################################################
dir_root = "/mnt/nfs/storage/RCC/RCC_formal_CODEX/RCC_TMA001-run1/reg_4x5/images/final/"
################################################################################

metadatas = MarkerMetadata(dir_root)

Review the root directory.

Ideally, the root directory contains only region folders.

In [3]:
metadatas.summary_dir()

Folders: ['reg001', 'reg002', 'reg003', 'reg004', 'reg005', 'reg006', 'reg007', 'reg008', 'reg009', 'reg010', 'reg011', 'reg012', 'reg013', 'reg014']
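
`summary_dir` comes from `pycodex`; the same expectation (only folders under the root) can also be checked with plain Python. The helpers below are hypothetical and independent of the pipeline.

```python
from pathlib import Path


def region_folders(dir_root: str) -> list[str]:
    """Return the names of the sub-folders under dir_root, sorted."""
    return sorted(p.name for p in Path(dir_root).iterdir() if p.is_dir())


def non_folder_entries(dir_root: str) -> list[str]:
    """Return the names of entries under dir_root that are not folders.

    Ideally this list is empty; hidden files such as .DS_Store also show up here.
    """
    return sorted(p.name for p in Path(dir_root).iterdir() if not p.is_dir())
```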


Organize the metadata of the marker images.

Because the images of each region are stored in their own subfolder (region folder), we set `subfolders=True`.

In [4]:
metadatas.organize_metadata(platform="keyence", subfolders=True)
metadatas.summary_metadata()

Summary of Regions:
    - Total regions: 14 ['reg001', 'reg002', 'reg003', 'reg004', 'reg005', 'reg006', 'reg007', 'reg008', 'reg009', 'reg010', 'reg011', 'reg012', 'reg013', 'reg014']
Summary of Markers:
    - Total unique markers: 73
    - Unique markers: 72 ['ATP5A', 'C1Q', 'CA9', 'CD11b', 'CD11c', 'CD138', 'CD16', 'CD163', 'CD20', 'CD28', 'CD31', 'CD3e', 'CD4', 'CD45', 'CD45RA', 'CD45RO', 'CD56', 'CD57', 'CD68', 'CD69', 'CD8', 'CD86', 'Ch1Cy1', 'Ch1Cy10', 'Ch1Cy11', 'Ch1Cy12', 'Ch1Cy13', 'Ch1Cy14', 'Ch1Cy15', 'Ch1Cy16', 'Ch1Cy17', 'Ch1Cy18', 'Ch1Cy19', 'Ch1Cy2', 'Ch1Cy20', 'Ch1Cy21', 'Ch1Cy22', 'Ch1Cy23', 'Ch1Cy26', 'Ch1Cy27', 'Ch1Cy28', 'Ch1Cy3', 'Ch1Cy4', 'Ch1Cy5', 'Ch1Cy6', 'Ch1Cy7', 'Ch1Cy8', 'Ch1Cy9', 'Cytokeratin', 'DIG_TREM2', 'FoxP3', 'G6PD', 'GLUT1', 'GranzymeB', 'HLA1', 'IDO-1', 'IFN-y', 'Ki-67', 'LAG-3', 'MPO', 'NaKATP', 'P53', 'PD-1', 'PDL1_100_500ms', 'PDL1_50_250ms', 'Podoplanin', 'T-bet', 'TCF1_7', 'Tim-3', 'Tox_Tox2', 'VDAC1', 'aSMA']
    - Blank markers: 1 ['Blank']
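
The sketch below illustrates what organizing metadata from subfolders involves: each folder becomes a region, and each `.tif` file inside it contributes one row. It is not `pycodex`'s `organize_metadata`; the column names, the file-name pattern, and treating `Blank` as the only blank marker name are assumptions. It also explains the summary counts above: the 73 unique names are the 72 markers plus `Blank`.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed pattern: reg[region]_cyc[cycle]_ch[channel]_[marker].tif
PATTERN = re.compile(r"^reg\d+_cyc(?P<cycle>\d+)_ch(?P<channel>\d+)_(?P<marker>.+)\.tif$")


def collect_metadata(dir_root: str) -> pd.DataFrame:
    """Return one row per image: region (folder name), cycle, channel, marker, path."""
    rows = []
    for region_dir in sorted(p for p in Path(dir_root).iterdir() if p.is_dir()):
        for image in sorted(region_dir.glob("*.tif")):
            match = PATTERN.match(image.name)
            if match is None:
                continue  # ignore files that do not follow the naming format
            rows.append({
                # The folder name is the region id (e.g. TMA544_reg001),
                # even if the file names only carry reg001
                "region": region_dir.name,
                "cycle": int(match["cycle"]),
                "channel": int(match["channel"]),
                "marker": match["marker"],
                "path": str(image),
            })
    return pd.DataFrame(rows, columns=["region", "cycle", "channel", "marker", "path"])


def summarize_markers(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Split the unique marker names into (non-blank markers, blank markers)."""
    unique = sorted(df["marker"].unique())
    markers = [m for m in unique if m != "Blank"]
    blank = [m for m in unique if m == "Blank"]
    return markers, blank
```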

Review the regions and markers you have.

We recommend copying the region and marker names printed here rather than retyping them, to avoid typos. Note that invalid names are renamed, so the printed names may differ from the raw file names.

In [5]:
metadatas.display_items(metadatas.regions)
metadatas.display_items(metadatas.unique_markers, ncol=5)

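|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | reg001 | reg002 | reg003 | reg004 | reg005 | reg006 | reg007 | reg008 | reg009 | reg010 |
| 1 | reg011 | reg012 | reg013 | reg014 |  |  |  |  |  |  |


|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0 | ATP5A | C1Q | CA9 | CD11b | CD11c |
| 1 | CD138 | CD16 | CD163 | CD20 | CD28 |
| 2 | CD31 | CD3e | CD4 | CD45 | CD45RA |
| 3 | CD45RO | CD56 | CD57 | CD68 | CD69 |
| 4 | CD8 | CD86 | Ch1Cy1 | Ch1Cy10 | Ch1Cy11 |
| 5 | Ch1Cy12 | Ch1Cy13 | Ch1Cy14 | Ch1Cy15 | Ch1Cy16 |
| 6 | Ch1Cy17 | Ch1Cy18 | Ch1Cy19 | Ch1Cy2 | Ch1Cy20 |
| 7 | Ch1Cy21 | Ch1Cy22 | Ch1Cy23 | Ch1Cy26 | Ch1Cy27 |
| 8 | Ch1Cy28 | Ch1Cy3 | Ch1Cy4 | Ch1Cy5 | Ch1Cy6 |
| 9 | Ch1Cy7 | Ch1Cy8 | Ch1Cy9 | Cytokeratin | DIG_TREM2 |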
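
When names are typed by hand instead of copied, a quick check against the printed names catches typos. A minimal sketch using the standard-library `difflib`; `check_names` is a hypothetical helper, and in this notebook `available` would come from `metadatas.unique_markers` or `metadatas.regions`.

```python
import difflib


def check_names(requested: list[str], available: list[str]) -> dict[str, list[str]]:
    """Map each requested name that is not available to its closest available names."""
    available_set = set(available)
    problems = {}
    for name in requested:
        if name not in available_set:
            # get_close_matches ranks candidates by similarity (default cutoff 0.6)
            problems[name] = difflib.get_close_matches(name, available, n=3)
    return problems


available = ["CD45", "CD45RA", "CD45RO", "DIG_TREM2", "PDL1_100_500ms", "Ki-67"]
print(check_names(["CD45", "Ki67", "TREM2"], available))
```

An empty dictionary means every requested name matches a printed name exactly.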


## 03. Output Data for Review

The Keyence platform produces multiple DAPI images (`Ch1Cy1`, `Ch1Cy2`, ...). Some of them contain heavy artifacts while others are clean, so you need to select the best DAPI image for each region for downstream analysis. You can also select, rename, and reorder the other markers.


### Output of this step

- `metadata_dapi.csv`: a `region` column listing all regions and a `dapi` column in which you enter the DAPI image selected for each region. Use the marker names displayed above (e.g., `Ch1Cy1`, not `reg001_cyc001_ch001_Ch1Cy1`).

- One OME-TIFF file per region containing all of its DAPI images, so that you can compare them.

- `metadata_marker.csv`: metadata of all markers other than DAPI. You need to:

    - Remove the rows of any images (including unused DAPI images, if listed) that should not be included in the final OME-TIFF file; in this example, `PDL1_50_250ms` is dropped

    - Fill in the `channel_name` column for each marker; it holds the name shown in the final OME-TIFF file (e.g., rename `DIG_TREM2` to `TREM2`)

    - Reorder the rows as needed; the row order becomes the channel order in the final OME-TIFF file
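
Before exporting, the edited marker table can be sanity-checked. A minimal sketch, assuming the `marker` and `channel_name` columns used in this notebook; `validate_marker_metadata` is hypothetical and not part of the pipeline. It leaves the row order untouched, since that order defines the channel order.

```python
import pandas as pd


def validate_marker_metadata(df: pd.DataFrame, available_markers: list[str]) -> list[str]:
    """Return a list of problems found in an edited marker table (empty if none)."""
    problems = []
    # Every marker must exist among the markers found on disk
    unknown = sorted(set(df["marker"]) - set(available_markers))
    if unknown:
        problems.append(f"Unknown markers: {unknown}")
    duplicated = df["marker"].duplicated()
    if duplicated.any():
        problems.append(f"Duplicated markers: {sorted(set(df.loc[duplicated, 'marker']))}")
    # channel_name must be filled in (NaN from empty CSV cells, or blank strings)
    names = df["channel_name"]
    empty = names.isna() | (names.astype(str).str.strip() == "")
    if empty.any():
        problems.append(f"Empty channel_name for: {sorted(df.loc[empty, 'marker'])}")
    # Channel names in the final file should be unique
    named = names[~empty]
    if named.duplicated().any():
        problems.append(f"Duplicated channel names: {sorted(set(named[named.duplicated()]))}")
    return problems
```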

In [6]:
################################################################################
dir_root = "/mnt/nfs/storage/RCC/RCC_formal_CODEX/RCC_TMA001-run1/reg_4x5/images/final/"
dir_output_review = "/mnt/nfs/storage/wenruiwu_temp/pipeline/keyence/01_preprocess"
################################################################################

keyence = KeyencePreprocessor(dir_root)

Summary of Regions:
    - Total regions: 14 ['reg001', 'reg002', 'reg003', 'reg004', 'reg005', 'reg006', 'reg007', 'reg008', 'reg009', 'reg010', 'reg011', 'reg012', 'reg013', 'reg014']
Summary of Markers:
    - Total unique markers: 73
    - Unique markers: 72 ['ATP5A', 'C1Q', 'CA9', 'CD11b', 'CD11c', 'CD138', 'CD16', 'CD163', 'CD20', 'CD28', 'CD31', 'CD3e', 'CD4', 'CD45', 'CD45RA', 'CD45RO', 'CD56', 'CD57', 'CD68', 'CD69', 'CD8', 'CD86', 'Ch1Cy1', 'Ch1Cy10', 'Ch1Cy11', 'Ch1Cy12', 'Ch1Cy13', 'Ch1Cy14', 'Ch1Cy15', 'Ch1Cy16', 'Ch1Cy17', 'Ch1Cy18', 'Ch1Cy19', 'Ch1Cy2', 'Ch1Cy20', 'Ch1Cy21', 'Ch1Cy22', 'Ch1Cy23', 'Ch1Cy26', 'Ch1Cy27', 'Ch1Cy28', 'Ch1Cy3', 'Ch1Cy4', 'Ch1Cy5', 'Ch1Cy6', 'Ch1Cy7', 'Ch1Cy8', 'Ch1Cy9', 'Cytokeratin', 'DIG_TREM2', 'FoxP3', 'G6PD', 'GLUT1', 'GranzymeB', 'HLA1', 'IDO-1', 'IFN-y', 'Ki-67', 'LAG-3', 'MPO', 'NaKATP', 'P53', 'PD-1', 'PDL1_100_500ms', 'PDL1_50_250ms', 'Podoplanin', 'T-bet', 'TCF1_7', 'Tim-3', 'Tox_Tox2', 'VDAC1', 'aSMA']
    - Blank markers: 1 ['Blank']

In [7]:
# Generate DAPI OME-TIFF and metadata
keyence.export_dapi_ometiff_and_metadata(dir_output_review)

  0%|          | 0/14 [00:00<?, ?it/s]

Exporting DAPI OME-TIFF for: reg001


Loading images: 100%|██████████| 26/26 [00:00<00:00, 496.16it/s]
Writing images: 100%|██████████| 6/6 [00:22<00:00,  3.70s/it]
  7%|▋         | 1/14 [00:22<04:51, 22.44s/it]


Exporting DAPI OME-TIFF for: reg002


Loading images: 100%|██████████| 26/26 [00:00<00:00, 498.28it/s]
Writing images: 100%|██████████| 6/6 [00:21<00:00,  3.66s/it]
 14%|█▍        | 2/14 [00:44<04:27, 22.30s/it]


Exporting DAPI OME-TIFF for: reg003


Loading images: 100%|██████████| 26/26 [00:00<00:00, 559.75it/s]
Writing images: 100%|██████████| 6/6 [00:25<00:00,  4.23s/it]
 21%|██▏       | 3/14 [01:10<04:22, 23.84s/it]


Exporting DAPI OME-TIFF for: reg004


Loading images: 100%|██████████| 26/26 [00:00<00:00, 599.09it/s]
Writing images: 100%|██████████| 6/6 [00:22<00:00,  3.80s/it]
 29%|██▊       | 4/14 [01:33<03:55, 23.52s/it]


Exporting DAPI OME-TIFF for: reg005


Loading images: 100%|██████████| 26/26 [00:00<00:00, 567.82it/s]
Writing images: 100%|██████████| 6/6 [00:21<00:00,  3.60s/it]
 36%|███▌      | 5/14 [01:55<03:26, 22.92s/it]


Exporting DAPI OME-TIFF for: reg006


Loading images: 100%|██████████| 26/26 [00:00<00:00, 586.04it/s]
Writing images: 100%|██████████| 6/6 [00:23<00:00,  3.84s/it]
 43%|████▎     | 6/14 [02:18<03:04, 23.04s/it]


Exporting DAPI OME-TIFF for: reg007


Loading images: 100%|██████████| 26/26 [00:00<00:00, 577.47it/s]
Writing images: 100%|██████████| 6/6 [00:22<00:00,  3.76s/it]
 50%|█████     | 7/14 [02:41<02:40, 22.96s/it]


Exporting DAPI OME-TIFF for: reg008


Loading images: 100%|██████████| 26/26 [00:00<00:00, 468.53it/s]
Writing images: 100%|██████████| 6/6 [00:24<00:00,  4.02s/it]
 57%|█████▋    | 8/14 [03:05<02:20, 23.43s/it]


Exporting DAPI OME-TIFF for: reg009


Loading images: 100%|██████████| 26/26 [00:00<00:00, 576.55it/s]
Writing images: 100%|██████████| 6/6 [00:27<00:00,  4.58s/it]
 64%|██████▍   | 9/14 [03:33<02:03, 24.77s/it]


Exporting DAPI OME-TIFF for: reg010


Loading images: 100%|██████████| 26/26 [00:00<00:00, 575.91it/s]
Writing images: 100%|██████████| 6/6 [00:24<00:00,  4.14s/it]
 71%|███████▏  | 10/14 [03:58<01:39, 24.86s/it]


Exporting DAPI OME-TIFF for: reg011


Loading images: 100%|██████████| 26/26 [00:00<00:00, 559.92it/s]
Writing images: 100%|██████████| 6/6 [00:29<00:00,  4.94s/it]
 79%|███████▊  | 11/14 [04:28<01:19, 26.41s/it]


Exporting DAPI OME-TIFF for: reg012


Loading images: 100%|██████████| 26/26 [00:00<00:00, 603.83it/s]
Writing images: 100%|██████████| 6/6 [00:21<00:00,  3.65s/it]
 86%|████████▌ | 12/14 [04:51<00:50, 25.30s/it]


Exporting DAPI OME-TIFF for: reg013


Loading images: 100%|██████████| 26/26 [00:03<00:00,  6.96it/s]
Writing images: 100%|██████████| 6/6 [00:33<00:00,  5.62s/it]
 93%|█████████▎| 13/14 [05:29<00:29, 29.11s/it]


Exporting DAPI OME-TIFF for: reg014


Loading images: 100%|██████████| 26/26 [00:00<00:00, 563.49it/s]
Writing images: 100%|██████████| 6/6 [00:23<00:00,  3.85s/it]
100%|██████████| 14/14 [05:52<00:00, 25.17s/it]







## 04. Organize Data

In [8]:
################################################################################
# Paste from Excel (tab-separated, with a header row)
string_metadata_dapi = """
region	dapi
reg001	Ch1Cy1
reg002	Ch1Cy2
reg003	Ch1Cy3
reg004	Ch1Cy1
reg005	Ch1Cy2
reg006	Ch1Cy3
reg007	Ch1Cy1
reg008	Ch1Cy2
reg009	Ch1Cy3
reg010	Ch1Cy1
reg011	Ch1Cy2
reg012	Ch1Cy3
reg013	Ch1Cy1
reg014	Ch1Cy2
"""

string_metadata_marker = """
marker	channel_name
CD45	CD45
CD3e	CD3e
CD8	CD8
CD4	CD4
CD45RO	CD45RO
CD45RA	CD45RA
CD69	CD69
CD57	CD57
CD56	CD56
FoxP3	FoxP3
CD28	CD28
CD86	CD86
T-bet	T-bet
TCF1_7	TCF1_7
IFN-y	IFN-y
GranzymeB	GranzymeB
Tox_Tox2	Tox_Tox2
Tim-3	Tim-3
PD-1	PD-1
LAG-3	LAG-3
CD20	CD20
CD138	CD138
CD68	CD68
DIG_TREM2	TREM2
CD163	CD163
CD16	CD16
CD11b	CD11b
CD11c	CD11c
MPO	MPO
IDO-1	IDO-1
PDL1_100_500ms	PD-L1
CA9	CA9
Cytokeratin	Cytokeratin
HLA1	HLA1
Ki-67	Ki-67
P53	P53
CD31	CD31
Podoplanin	Podoplanin
aSMA	aSMA
NaKATP	NaKATP
VDAC1	VDAC1
ATP5A	ATP5A
GLUT1	GLUT1
G6PD	G6PD
C1Q	C1Q
"""
################################################################################

df_metadata_dapi = pd.read_csv(StringIO(string_metadata_dapi), sep="\t")
df_metadata_marker = pd.read_csv(StringIO(string_metadata_marker), sep="\t")
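
The pasted DAPI table can be checked in the same spirit before export. The sketch below (hypothetical `validate_dapi_metadata`, not part of the pipeline) reports missing, unknown, or duplicated regions and DAPI names that do not exist; it assumes DAPI images follow the `Ch1Cy<cycle>` naming seen in this dataset.

```python
import re

import pandas as pd


def validate_dapi_metadata(
    df: pd.DataFrame, regions: list[str], available_markers: list[str]
) -> list[str]:
    """Return a list of problems in a region/dapi table (empty if none)."""
    problems = []
    missing = sorted(set(regions) - set(df["region"]))
    if missing:
        problems.append(f"Missing regions: {missing}")
    extra = sorted(map(str, set(df["region"]) - set(regions)))
    if extra:
        problems.append(f"Unknown regions: {extra}")
    duplicated = sorted(map(str, set(df.loc[df["region"].duplicated(), "region"])))
    if duplicated:
        problems.append(f"Duplicated regions: {duplicated}")
    unknown = sorted(map(str, set(df["dapi"]) - set(available_markers)))
    if unknown:
        problems.append(f"Unknown DAPI names: {unknown}")
    # Assumption: DAPI images are named Ch1Cy<cycle>, as in this dataset
    not_dapi = sorted(str(d) for d in set(df["dapi"]) if not re.fullmatch(r"Ch1Cy\d+", str(d)))
    if not_dapi:
        problems.append(f"Not a Ch1Cy<cycle> name: {not_dapi}")
    return problems
```

In this notebook, a call could look like `validate_dapi_metadata(df_metadata_dapi, metadatas.regions, metadatas.unique_markers)`; an empty list means no problems were found.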

In [9]:
################################################################################
dir_output_ometiff = "/mnt/nfs/storage/wenruiwu_temp/pipeline/keyence/02_ometiff"
################################################################################

# Generate final OME-TIFF
keyence.export_ometiff(dir_output_ometiff, df_metadata_dapi, df_metadata_marker)

  0%|          | 0/14 [00:00<?, ?it/s]

Exporting OME-TIFF for: reg001


Loading images: 100%|██████████| 46/46 [00:00<00:00, 539.19it/s]
Writing images: 100%|██████████| 6/6 [00:26<00:00,  4.42s/it]
  7%|▋         | 1/14 [00:26<05:48, 26.79s/it]


Exporting OME-TIFF for: reg002


Loading images: 100%|██████████| 46/46 [00:00<00:00, 525.71it/s]
Writing images: 100%|██████████| 6/6 [00:29<00:00,  4.98s/it]
 14%|█▍        | 2/14 [00:57<05:51, 29.31s/it]


Exporting OME-TIFF for: reg003


Loading images: 100%|██████████| 46/46 [00:05<00:00,  8.09it/s]
Writing images: 100%|██████████| 6/6 [00:32<00:00,  5.43s/it]
 21%|██▏       | 3/14 [01:36<06:09, 33.63s/it]


Exporting OME-TIFF for: reg004


Loading images: 100%|██████████| 46/46 [00:00<00:00, 589.18it/s]
Writing images: 100%|██████████| 6/6 [00:40<00:00,  6.67s/it]
 29%|██▊       | 4/14 [02:17<06:06, 36.63s/it]


Exporting OME-TIFF for: reg005


Loading images: 100%|██████████| 46/46 [00:06<00:00,  6.80it/s]
Writing images: 100%|██████████| 6/6 [00:30<00:00,  5.13s/it]
 36%|███▌      | 5/14 [02:55<05:33, 37.10s/it]


Exporting OME-TIFF for: reg006


Loading images: 100%|██████████| 46/46 [00:00<00:00, 223.41it/s]
Writing images: 100%|██████████| 6/6 [00:38<00:00,  6.39s/it]
 43%|████▎     | 6/14 [03:35<05:03, 37.98s/it]


Exporting OME-TIFF for: reg007


Loading images: 100%|██████████| 46/46 [00:06<00:00,  7.25it/s]
Writing images: 100%|██████████| 6/6 [00:28<00:00,  4.77s/it]
 50%|█████     | 7/14 [04:10<04:19, 37.09s/it]


Exporting OME-TIFF for: reg008


Loading images: 100%|██████████| 46/46 [00:00<00:00, 550.25it/s]
Writing images: 100%|██████████| 6/6 [00:26<00:00,  4.38s/it]
 57%|█████▋    | 8/14 [04:37<03:22, 33.72s/it]


Exporting OME-TIFF for: reg009


Loading images: 100%|██████████| 46/46 [00:00<00:00, 651.26it/s]
Writing images: 100%|██████████| 6/6 [00:21<00:00,  3.52s/it]
 64%|██████▍   | 9/14 [04:58<02:29, 29.83s/it]


Exporting OME-TIFF for: reg010


Loading images: 100%|██████████| 46/46 [00:00<00:00, 586.24it/s]
Writing images: 100%|██████████| 6/6 [00:23<00:00,  3.84s/it]
 71%|███████▏  | 10/14 [05:21<01:51, 27.78s/it]


Exporting OME-TIFF for: reg011


Loading images: 100%|██████████| 46/46 [00:00<00:00, 673.04it/s]
Writing images: 100%|██████████| 6/6 [00:27<00:00,  4.54s/it]
 79%|███████▊  | 11/14 [05:49<01:23, 27.68s/it]


Exporting OME-TIFF for: reg012


Loading images: 100%|██████████| 46/46 [00:00<00:00, 588.26it/s]
Writing images: 100%|██████████| 6/6 [00:21<00:00,  3.64s/it]
 86%|████████▌ | 12/14 [06:11<00:51, 25.95s/it]


Exporting OME-TIFF for: reg013


Loading images: 100%|██████████| 46/46 [00:00<00:00, 539.41it/s]
Writing images: 100%|██████████| 6/6 [00:21<00:00,  3.66s/it]
 93%|█████████▎| 13/14 [06:33<00:24, 24.78s/it]


Exporting OME-TIFF for: reg014


Loading images: 100%|██████████| 46/46 [00:00<00:00, 597.71it/s]
Writing images: 100%|██████████| 6/6 [00:24<00:00,  4.12s/it]
100%|██████████| 14/14 [06:58<00:00, 29.87s/it]







## 05. Review Output

One OME-TIFF file is exported per region, each in its own region folder, together with `metadata_dapi.csv` and `metadata_marker.csv`.
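
The expected layout can also be verified programmatically. A minimal sketch, assuming the `<region>/<region>.ome.tiff` layout shown in the directory listing of this section; `missing_ometiffs` is a hypothetical helper.

```python
from pathlib import Path


def missing_ometiffs(dir_output: str, regions: list[str]) -> list[str]:
    """Return the regions whose <region>/<region>.ome.tiff file does not exist."""
    root = Path(dir_output)
    return [r for r in regions if not (root / r / f"{r}.ome.tiff").is_file()]
```

For example, `missing_ometiffs(dir_output_ometiff, df_metadata_dapi["region"].tolist())` should return an empty list.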

In [10]:
!tree $dir_output_ometiff

/mnt/nfs/storage/wenruiwu_temp/pipeline/keyence/02_ometiff
├── metadata_dapi.csv
├── metadata_marker.csv
├── reg001
│   └── reg001.ome.tiff
├── reg002
│   └── reg002.ome.tiff
├── reg003
│   └── reg003.ome.tiff
├── reg004
│   └── reg004.ome.tiff
├── reg005
│   └── reg005.ome.tiff
├── reg006
│   └── reg006.ome.tiff
├── reg007
│   └── reg007.ome.tiff
├── reg008
│   └── reg008.ome.tiff
├── reg009
│   └── reg009.ome.tiff
├── reg010
│   └── reg010.ome.tiff
├── reg011
│   └── reg011.ome.tiff
├── reg012
│   └── reg012.ome.tiff
├── reg013
│   └── reg013.ome.tiff
└── reg014
    └── reg014.ome.tiff

14 directories, 16 files
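
To confirm the channel names and their order inside a file, its OME-XML header can be inspected. The helper below (hypothetical `ome_channel_names`) parses an OME-XML string with the standard library. Obtaining that string via `tifffile.TiffFile(path).ome_metadata`, and the pipeline storing marker names in the `Name` attribute of each `Channel` element, are assumptions not confirmed by this notebook.

```python
import xml.etree.ElementTree as ET


def ome_channel_names(ome_xml: str) -> list[str]:
    """Return the Name attribute of every Channel element in an OME-XML string, in order."""
    root = ET.fromstring(ome_xml)
    names = []
    for elem in root.iter():
        # Compare the local tag name so the OME schema namespace/version does not matter
        if elem.tag.rsplit("}", 1)[-1] == "Channel":
            names.append(elem.get("Name", ""))
    return names
```

If those assumptions hold, each file should list 46 channels: one DAPI plus the 45 rows of `df_metadata_marker`, matching the `Loading images: 46/46` lines in the export log.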
