# Data Structure for Keyence Data

The images of the different markers for each region are stored in the corresponding region folder (e.g., `final/reg001`).

All marker images follow a uniform naming format: `reg[region number]_cyc[cycle number]_ch[channel number]_[marker name].tif`. Note: `Ch1Cy1` is the DAPI image on channel 1 of cycle 1.

```
final
├── reg001
│   ├── reg001_cyc001_ch001_Ch1Cy1.tif
│   ├── reg001_cyc001_ch003_Blank.tif
│   ├── reg001_cyc001_ch004_Blank.tif
│   └── ...
└── ...
```
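The naming convention above can be parsed with a regular expression. The sketch below is not part of pycodex: `parse_marker_filename` and `MarkerFile` are hypothetical names, and it assumes the numeric fields contain only digits while the marker name may itself contain underscores (e.g. `PDL1_100_500ms`).

```python
import re
from typing import NamedTuple

# Assumed pattern: reg<digits>_cyc<digits>_ch<digits>_<marker>.tif
# Everything after the channel field (up to ".tif") is the marker name,
# so marker names with internal underscores are kept whole.
FILENAME_RE = re.compile(
    r"^reg(?P<region>\d+)_cyc(?P<cycle>\d+)_ch(?P<channel>\d+)_(?P<marker>.+)\.tif$"
)


class MarkerFile(NamedTuple):
    region: int
    cycle: int
    channel: int
    marker: str


def parse_marker_filename(filename: str) -> MarkerFile:
    """Split a Keyence marker file name into region, cycle, channel and marker."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return MarkerFile(
        region=int(match["region"]),
        cycle=int(match["cycle"]),
        channel=int(match["channel"]),
        marker=match["marker"],
    )


print(parse_marker_filename("reg001_cyc001_ch001_Ch1Cy1.tif"))
# MarkerFile(region=1, cycle=1, channel=1, marker='Ch1Cy1')
```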

## 01. Organize Data

First, organize your data from the different TMAs.

Put all region folders under a single root folder. Make sure that each region has a unique ID, especially when regions come from different TMAs.

Recommendation: name each region folder `[TMA id]_[region id]`, e.g., `TMA544_reg001`.

```
dir_root
├── [unique region id]
│   ├── reg001_cyc001_ch001_Ch1Cy1.tif
│   ├── reg001_cyc001_ch003_Blank.tif
│   ├── reg001_cyc001_ch004_Blank.tif
│   └── ...
└── ...
```
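One way to build this layout from several TMA exports is to copy each region folder into the root folder under the recommended `[TMA id]_[region id]` name. This is a minimal sketch, not a pycodex function: `organize_regions` is a hypothetical helper, and it assumes each TMA has its own folder of region subfolders (like `final` above). It refuses to overwrite an existing folder, so duplicate region IDs surface as an error instead of silently merging.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def organize_regions(tma_dirs: dict[str, Path], dir_root: Path) -> list[Path]:
    """Copy every region folder into dir_root as <TMA id>_<region id>.

    tma_dirs maps a TMA id (e.g. "TMA544") to the folder holding that TMA's
    region subfolders. Raises FileExistsError if a target name already exists.
    """
    dir_root.mkdir(parents=True, exist_ok=True)
    created: list[Path] = []
    for tma_id, tma_dir in tma_dirs.items():
        # Only subfolders are treated as regions; stray files are ignored.
        for region_dir in sorted(p for p in Path(tma_dir).iterdir() if p.is_dir()):
            target = dir_root / f"{tma_id}_{region_dir.name}"
            if target.exists():
                raise FileExistsError(f"Region id is not unique: {target.name}")
            # File names inside the folder (reg001_cyc001_...) are left unchanged.
            shutil.copytree(region_dir, target)
            created.append(target)
    return created
```

For example, `organize_regions({"TMA544": Path("/path/to/TMA544/final")}, Path("/path/to/dir_root"))` would produce `dir_root/TMA544_reg001`, `dir_root/TMA544_reg002`, and so on (the paths here are placeholders). Copying rather than moving keeps the original export untouched.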

## 02. Review Data

```python
# Run in the cellSeg_test environment: conda activate cellSeg_test
from pycodex.cls import Marker
from pycodex.utils import display_items

dir_root = "/mnt/nfs/storage/RCC/RCC_formal_CODEX/RCC_TMA001-run1/reg_4x5/images/final/"
markers = Marker(dir_root)
```

```python
# Review the root directory
# Ideally, the root directory contains only region folders
markers.summary_dir()
```

```
Folders: ['reg001', 'reg002', 'reg003', 'reg004', 'reg005', 'reg006', 'reg007', 'reg008', 'reg009', 'reg010', 'reg011', 'reg012', 'reg013', 'reg014']
```
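To double-check the root directory outside pycodex, you can list its entries with `pathlib` and flag anything that is not a folder. This is a minimal sketch; `check_root` is a hypothetical helper and is not how `summary_dir()` is implemented.

```python
from __future__ import annotations

from pathlib import Path


def check_root(dir_root: str | Path) -> tuple[list[str], list[str]]:
    """Return (folders, other_entries) found directly under dir_root."""
    entries = sorted(Path(dir_root).iterdir())
    folders = [p.name for p in entries if p.is_dir()]
    # Anything that is not a directory (stray files, notes, etc.) is reported.
    others = [p.name for p in entries if not p.is_dir()]
    return folders, others
```

Calling `folders, others = check_root(dir_root)` with the `dir_root` defined earlier should give an empty `others` list when the directory is organized as recommended.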


```python
# Organize the metadata of the marker images.
# The images of each region are stored in their own subfolder (region folder), so set `subfolders=True`.
markers.organize_metadata(platform="keyence", subfolders=True)
markers.summary_metadata()
```

```
Summary of Regions:
    - Total regions: 14 ['reg001', 'reg002', 'reg003', 'reg004', 'reg005', 'reg006', 'reg007', 'reg008', 'reg009', 'reg010', 'reg011', 'reg012', 'reg013', 'reg014']
Summary of Markers:
    - Total unique markers: 73
    - Unique markers: 72 ['ATP5A', 'C1Q', 'CA9', 'CD11b', 'CD11c', 'CD138', 'CD16', 'CD163', 'CD20', 'CD28', 'CD31', 'CD3e', 'CD4', 'CD45', 'CD45RA', 'CD45RO', 'CD56', 'CD57', 'CD68', 'CD69', 'CD8', 'CD86', 'Ch1Cy1', 'Ch1Cy10', 'Ch1Cy11', 'Ch1Cy12', 'Ch1Cy13', 'Ch1Cy14', 'Ch1Cy15', 'Ch1Cy16', 'Ch1Cy17', 'Ch1Cy18', 'Ch1Cy19', 'Ch1Cy2', 'Ch1Cy20', 'Ch1Cy21', 'Ch1Cy22', 'Ch1Cy23', 'Ch1Cy26', 'Ch1Cy27', 'Ch1Cy28', 'Ch1Cy3', 'Ch1Cy4', 'Ch1Cy5', 'Ch1Cy6', 'Ch1Cy7', 'Ch1Cy8', 'Ch1Cy9', 'Cytokeratin', 'DIG_TREM2', 'FoxP3', 'G6PD', 'GLUT1', 'GranzymeB', 'HLA1', 'IDO-1', 'IFN-y', 'Ki-67', 'LAG-3', 'MPO', 'NaKATP', 'P53', 'PD-1', 'PDL1_100_500ms', 'PDL1_50_250ms', 'Podoplanin', 'T-bet', 'TCF1_7', 'Tim-3', 'Tox_Tox2', 'VDAC1', 'aSMA']
    - Blank markers: 1 ['Blank']
```
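The total of 73 above is the 72 non-blank markers plus the single `Blank` marker. To sanity-check such counts, you can recompute them directly from the file names. This is a minimal sketch independent of pycodex: `summarize` is a hypothetical helper, it assumes the `reg..._cyc..._ch..._<marker>.tif` naming and one subfolder per region, and it treats only a marker named exactly `Blank` as blank. pycodex's own rules may differ.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming: reg<digits>_cyc<digits>_ch<digits>_<marker>.tif
MARKER_RE = re.compile(r"^reg\d+_cyc\d+_ch\d+_(?P<marker>.+)\.tif$")


def summarize(dir_root: str | Path) -> dict[str, list[str]]:
    """Collect region names and marker names from dir_root/<region>/*.tif."""
    regions: list[str] = []
    markers: set[str] = set()
    for region_dir in sorted(Path(dir_root).iterdir()):
        if not region_dir.is_dir():
            continue  # skip stray files in the root directory
        regions.append(region_dir.name)
        for tif in region_dir.glob("*.tif"):
            match = MARKER_RE.match(tif.name)
            if match:
                markers.add(match["marker"])
    # Separate blank channels from real markers.
    blanks = sorted(m for m in markers if m == "Blank")
    unique = sorted(m for m in markers if m != "Blank")
    return {"regions": regions, "unique_markers": unique, "blank_markers": blanks}
```

Comparing `len(summarize(dir_root)["unique_markers"])` with the count reported by `summary_metadata()` is a quick way to spot files that do not follow the naming convention.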

```python
# Review the regions and markers you have.
# We recommend copying the region and marker names printed here to avoid typos (invalid names are renamed).
display_items(markers.regions)
display_items(markers.unique_markers, ncol=5)
```

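| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | reg001 | reg002 | reg003 | reg004 | reg005 | reg006 | reg007 | reg008 | reg009 | reg010 |
| 1 | reg011 | reg012 | reg013 | reg014 | | | | | | |

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0 | ATP5A | C1Q | CA9 | CD11b | CD11c |
| 1 | CD138 | CD16 | CD163 | CD20 | CD28 |
| 2 | CD31 | CD3e | CD4 | CD45 | CD45RA |
| 3 | CD45RO | CD56 | CD57 | CD68 | CD69 |
| 4 | CD8 | CD86 | Ch1Cy1 | Ch1Cy10 | Ch1Cy11 |
| 5 | Ch1Cy12 | Ch1Cy13 | Ch1Cy14 | Ch1Cy15 | Ch1Cy16 |
| 6 | Ch1Cy17 | Ch1Cy18 | Ch1Cy19 | Ch1Cy2 | Ch1Cy20 |
| 7 | Ch1Cy21 | Ch1Cy22 | Ch1Cy23 | Ch1Cy26 | Ch1Cy27 |
| 8 | Ch1Cy28 | Ch1Cy3 | Ch1Cy4 | Ch1Cy5 | Ch1Cy6 |
| 9 | Ch1Cy7 | Ch1Cy8 | Ch1Cy9 | Cytokeratin | DIG_TREM2 |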
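These tables lay a flat list out row by row in a fixed number of columns, with empty cells padding the last row. A minimal pure-Python sketch of that layout follows; `to_grid` and its default of 10 columns are assumptions inferred from the regions table, not the pycodex `display_items` implementation.

```python
from __future__ import annotations


def to_grid(items: list[str], ncol: int = 10) -> list[list[str]]:
    """Arrange items row-major into rows of ncol, padding the last row with ''."""
    if ncol < 1:
        raise ValueError("ncol must be positive")
    rows = [list(items[i:i + ncol]) for i in range(0, len(items), ncol)]
    # Pad the final row so every row has exactly ncol cells.
    if rows and len(rows[-1]) < ncol:
        rows[-1].extend([""] * (ncol - len(rows[-1])))
    return rows


regions = [f"reg{i:03d}" for i in range(1, 15)]
for row in to_grid(regions, ncol=10):
    print(row)
```

A grid like this can be passed to `pandas.DataFrame(rows)` for notebook display; filling row-major keeps alphabetically sorted names reading left to right.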
