# Data Preprocessing for Fusion Data

Wenrui Wu, 2024-12-28

In [1]:
import os
from pathlib import Path

import numpy as np
from pyqupath.geojson import crop_dict_by_geojson
from pyqupath.ometiff import export_ometiff_pyramid_from_dict, load_tiff_to_dict



## 01. Data Structure

The Fusion platform writes its output in the following directory structure:

```
Scan1
├── [name].qptiff
└── .temp
    └── MarkerList.txt
```
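Given this layout, the two files used later in this notebook can be located with a few lines of `pathlib`. This is only a minimal sketch: `locate_fusion_files` is a hypothetical helper (not part of pyqupath), and it assumes the scan folder looks exactly like the tree above.

```python
from pathlib import Path


def locate_fusion_files(scan_dir):
    """Return (list of .qptiff paths, MarkerList.txt path or None).

    Hypothetical helper: assumes the .qptiff sits directly in the scan
    folder and MarkerList.txt sits in its hidden `.temp` subfolder.
    """
    scan_dir = Path(scan_dir)
    # All whole-slide images found directly in the scan folder
    qptiffs = sorted(scan_dir.glob("*.qptiff"))
    # The marker list lives under the hidden .temp folder
    markerlist = scan_dir / ".temp" / "MarkerList.txt"
    return qptiffs, (markerlist if markerlist.is_file() else None)
```

If the second return value is `None`, the `.temp` folder was probably not copied along with the scan.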

## 02. Organize Data

CODEX downstream analysis is usually performed at the core/region level, so the whole-slide image first needs to be cropped into multiple regions.

- Annotate the regions in QuPath using its Polygon tool, then export the annotations as a GeoJSON file.

- Put the following files into one folder (`dir_root`):
    - `.qptiff`
    - `MarkerList.txt`
    - `cropping_regions.geojson`

```
/path/dir_root
├── [name].qptiff
├── cropping_regions.geojson
└── MarkerList.txt
```
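Before running the rest of the notebook, it can be useful to confirm that the folder really has this layout. The sketch below uses a hypothetical helper, `check_dir_root`, that only relies on the file names listed above and returns a list of problems instead of raising.

```python
from pathlib import Path


def check_dir_root(dir_root):
    """Return a list of human-readable problems (empty if the folder looks fine)."""
    dir_root = Path(dir_root)
    problems = []
    # Exactly one whole-slide image is expected
    n_qptiff = len(list(dir_root.glob("*.qptiff")))
    if n_qptiff != 1:
        problems.append(f"expected exactly one .qptiff file, found {n_qptiff}")
    # The marker list and the QuPath annotations use fixed file names
    for filename in ("MarkerList.txt", "cropping_regions.geojson"):
        if not (dir_root / filename).is_file():
            problems.append(f"missing {filename}")
    return problems
```

An empty list means all three inputs are in place.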

In [2]:
################################################################################
dir_root = "/mnt/nfs/storage/wenruiwu_temp/pipeline/fusion/00_raw_data/"
################################################################################

dir_root = Path(dir_root)

# review all the files in the root directory
!tree $dir_root

/mnt/nfs/storage/wenruiwu_temp/pipeline/fusion/00_raw_data
├── cropping_regions.geojson
├── MarkerList.txt
└── Periodontal_CODEX-S8_Scan1.er.qptiff

0 directories, 3 files


In [3]:
# parse the dir_root
path_markerlist = dir_root / "MarkerList.txt"
path_geojson = dir_root / "cropping_regions.geojson"
paths_qptiff = list(dir_root.glob("*.qptiff"))
if len(paths_qptiff) == 1:
    path_qptiff = paths_qptiff[0]
else:
    raise ValueError("There should be only one qptiff file in the directory")
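It can also help to look at the annotations before cropping. The sketch below lists the name and geometry type of each feature in a GeoJSON file using only the standard library. It assumes the export is either a `FeatureCollection`, a bare list of features, or a single feature, and that an annotation's name (if one was set in QuPath) is stored under `properties.name`. How pyqupath itself derives the region names is not shown in this notebook, so `list_geojson_regions` is purely an illustrative helper.

```python
import json


def list_geojson_regions(path_geojson):
    """Return a list of (name, geometry_type) tuples, one per annotation."""
    with open(path_geojson) as f:
        data = json.load(f)
    # Accept a FeatureCollection, a single Feature, or a bare list of Features
    if isinstance(data, dict):
        features = data["features"] if data.get("type") == "FeatureCollection" else [data]
    else:
        features = data
    regions = []
    for feature in features:
        name = (feature.get("properties") or {}).get("name")  # None if unnamed
        geometry_type = (feature.get("geometry") or {}).get("type")
        regions.append((name, geometry_type))
    return regions
```

For example, `list_geojson_regions(path_geojson)` should return one tuple per polygon drawn in QuPath.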

In [4]:
# review the channels in the qptiff file
channels_name = np.loadtxt(path_markerlist, dtype=str).tolist()
channels_name

['DAPI',
 'CD56',
 'CD3e',
 'CD8',
 'CD15',
 'CD138',
 'HLA-E',
 'CD45',
 'CD31',
 'CD68',
 'Pax5',
 'CD11b',
 'CD11c',
 'CD4',
 'MUC5AC',
 'MUC5B',
 'HLA-DR',
 'CD44',
 'ICOS',
 'E-cadherin',
 'COLA1',
 'KRT14',
 'a-SMA',
 'HLA-1',
 'Ki67',
 'Vimentin',
 'Blank-75',
 'Blank-75']

## 03. Order and Rename Markers

`channels_order`: the markers to keep from `MarkerList.txt`, listed in the desired output order.

`channels_rename`: a list of the same length as `channels_order`, giving the new name for each marker in `channels_order`. If `None`, the markers keep their original names.

In [5]:
################################################################################
# select the channels that are needed (e.g., exclude the Blank channels)
channels_order = [
    "DAPI",
    "CD45",
    "CD3e",
    "CD4",
    "CD8",
    "CD56",
    "CD11b",
    "CD11c",
    "CD138",
    "Pax5",
    "CD68",
    "CD15",
    "CD31",
    "HLA-E",
    "HLA-DR",
    "E-cadherin",
    "MUC5AC",
    "MUC5B",
    "COLA1",
    "KRT14",
    "a-SMA",
    "Vimentin",
    "ICOS",
    "CD44",
    "Ki67",
    "HLA-1",
]
channels_rename = None  # If None, the channels will not be renamed
################################################################################
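A typo in `channels_order` or a length mismatch in `channels_rename` is easier to catch before the slide is loaded. The sketch below is a hypothetical pre-check (`check_channel_selection` is not a pyqupath function, and whether pyqupath performs similar validation internally is not shown here). It returns the old-name to new-name mapping implied by the two lists.

```python
def check_channel_selection(channels_name, channels_order, channels_rename=None):
    """Validate the selection and return {original name: output name}."""
    # Every selected marker must exist in the marker list
    missing = [c for c in channels_order if c not in channels_name]
    if missing:
        raise ValueError(f"Markers not found in MarkerList.txt: {missing}")
    # Selecting the same marker twice is almost certainly a mistake
    if len(set(channels_order)) != len(channels_order):
        raise ValueError("channels_order contains duplicated markers")
    # None means "keep the original names"
    if channels_rename is None:
        channels_rename = list(channels_order)
    if len(channels_rename) != len(channels_order):
        raise ValueError("channels_rename must have the same length as channels_order")
    return dict(zip(channels_order, channels_rename))
```

With the values above, `check_channel_selection(channels_name, channels_order, channels_rename)` would map each of the 26 selected markers to itself.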

## 04. Crop QPTIFF into Multiple OME-TIFF

In [6]:
################################################################################  
dir_output = "/mnt/nfs/storage/wenruiwu_temp/pipeline/fusion/01_preprocess/"
################################################################################

dir_output = Path(dir_output)

In [7]:
# Load QPTIFF file
im_dict = load_tiff_to_dict(
    path_qptiff,
    filetype="qptiff",
    channels_order=channels_order,
    channels_rename=channels_rename,
    path_markerlist=path_markerlist,
)

# Crop QPTIFF file into multiple OME-TIFF files
for name, crop_im_dict in crop_dict_by_geojson(im_dict, path_geojson):
    path_ometiff = dir_output / name / f"{name}.ome.tiff"
    path_ometiff.parent.mkdir(parents=True, exist_ok=True)
    if path_ometiff.exists():
        os.remove(path_ometiff)
    export_ometiff_pyramid_from_dict(crop_im_dict, str(path_ometiff))

Loading images: 100%|██████████| 26/26 [00:32<00:00,  1.24s/it]
Cropping regions:   0%|          | 0/6 [00:00<?, ?it/s]

Pyramid level sizes:
    level 1: 5667 x 12509 (original size)
    level 2: 2834 x 6255
    level 3: 1417 x 3128
    level 4: 709 x 1564
    level 5: 355 x 782
    level 6: 178 x 391
    level 7: 89 x 196

Writing level 1: 5667 x 12509
    channel 1
    channel 2
    channel 3
    channel 4
    channel 5
    channel 6
    channel 7
    channel 8
    channel 9
    channel 10
    channel 11
    channel 12
    channel 13
    channel 14
    channel 15
    channel 16
    channel 17
    channel 18
    channel 19
    channel 20
    channel 21
    channel 22
    channel 23
    channel 24
    channel 25
    channel 26

Resizing image for level 2: 2834 x 6255
Resizing image for level 3: 1417 x 3128
Resizing image for level 4: 709 x 1564
Resizing image for level 5: 355 x 782
Resizing image for level 6: 178 x 391
Resizing image for level 7: 89 x 196


Cropping regions:  17%|█▋        | 1/6 [00:29<02:25, 29.15s/it]


Pyramid level sizes:
    level 1: 5861 x 11782 (original size)
    level 2: 2931 x 5891
    level 3: 1466 x 2946
    level 4: 733 x 1473
    level 5: 367 x 737
    level 6: 184 x 369
    level 7: 92 x 185

Writing level 1: 5861 x 11782
    channel 1
    channel 2
    channel 3
    channel 4
    channel 5
    channel 6
    channel 7
    channel 8
    channel 9
    channel 10
    channel 11
    channel 12
    channel 13
    channel 14
    channel 15
    channel 16
    channel 17
    channel 18
    channel 19
    channel 20
    channel 21
    channel 22
    channel 23
    channel 24
    channel 25
    channel 26

Resizing image for level 2: 2931 x 5891
Resizing image for level 3: 1466 x 2946
Resizing image for level 4: 733 x 1473
Resizing image for level 5: 367 x 737
Resizing image for level 6: 184 x 369
Resizing image for level 7: 92 x 185


Cropping regions:  33%|███▎      | 2/6 [00:56<01:52, 28.15s/it]


Pyramid level sizes:
    level 1: 13258 x 16552 (original size)
    level 2: 6629 x 8276
    level 3: 3315 x 4138
    level 4: 1658 x 2069
    level 5: 829 x 1035
    level 6: 415 x 518
    level 7: 208 x 259
    level 8: 104 x 130

Writing level 1: 13258 x 16552
    channel 1
    channel 2
    channel 3
    channel 4
    channel 5
    channel 6
    channel 7
    channel 8
    channel 9
    channel 10
    channel 11
    channel 12
    channel 13
    channel 14
    channel 15
    channel 16
    channel 17
    channel 18
    channel 19
    channel 20
    channel 21
    channel 22
    channel 23
    channel 24
    channel 25
    channel 26

Resizing image for level 2: 6629 x 8276
Resizing image for level 3: 3315 x 4138
Resizing image for level 4: 1658 x 2069
Resizing image for level 5: 829 x 1035
Resizing image for level 6: 415 x 518
Resizing image for level 7: 208 x 259
Resizing image for level 8: 104 x 130


Cropping regions:  50%|█████     | 3/6 [02:20<02:40, 53.55s/it]


Pyramid level sizes:
    level 1: 13975 x 15925 (original size)
    level 2: 6988 x 7963
    level 3: 3494 x 3982
    level 4: 1747 x 1991
    level 5: 874 x 996
    level 6: 437 x 498
    level 7: 219 x 249

Writing level 1: 13975 x 15925
    channel 1
    channel 2
    channel 3
    channel 4
    channel 5
    channel 6
    channel 7
    channel 8
    channel 9
    channel 10
    channel 11
    channel 12
    channel 13
    channel 14
    channel 15
    channel 16
    channel 17
    channel 18
    channel 19
    channel 20
    channel 21
    channel 22
    channel 23
    channel 24
    channel 25
    channel 26

Resizing image for level 2: 6988 x 7963
Resizing image for level 3: 3494 x 3982
Resizing image for level 4: 1747 x 1991
Resizing image for level 5: 874 x 996
Resizing image for level 6: 437 x 498


Cropping regions:  67%|██████▋   | 4/6 [03:43<02:10, 65.12s/it]

Resizing image for level 7: 219 x 249

Pyramid level sizes:
    level 1: 14475 x 10360 (original size)
    level 2: 7238 x 5180
    level 3: 3619 x 2590
    level 4: 1810 x 1295
    level 5: 905 x 648
    level 6: 453 x 324
    level 7: 227 x 162

Writing level 1: 14475 x 10360
    channel 1
    channel 2
    channel 3
    channel 4
    channel 5
    channel 6
    channel 7
    channel 8
    channel 9
    channel 10
    channel 11
    channel 12
    channel 13
    channel 14
    channel 15
    channel 16
    channel 17
    channel 18
    channel 19
    channel 20
    channel 21
    channel 22
    channel 23
    channel 24
    channel 25
    channel 26

Resizing image for level 2: 7238 x 5180
Resizing image for level 3: 3619 x 2590
Resizing image for level 4: 1810 x 1295
Resizing image for level 5: 905 x 648
Resizing image for level 6: 453 x 324


Cropping regions:  83%|████████▎ | 5/6 [04:45<01:04, 64.06s/it]

Resizing image for level 7: 227 x 162

Pyramid level sizes:
    level 1: 9047 x 12679 (original size)
    level 2: 4524 x 6340
    level 3: 2262 x 3170
    level 4: 1131 x 1585
    level 5: 566 x 793
    level 6: 283 x 397
    level 7: 142 x 199

Writing level 1: 9047 x 12679
    channel 1
    channel 2
    channel 3
    channel 4
    channel 5
    channel 6
    channel 7
    channel 8
    channel 9
    channel 10
    channel 11
    channel 12
    channel 13
    channel 14
    channel 15
    channel 16
    channel 17
    channel 18
    channel 19
    channel 20
    channel 21
    channel 22
    channel 23
    channel 24
    channel 25
    channel 26

Resizing image for level 2: 4524 x 6340
Resizing image for level 3: 2262 x 3170
Resizing image for level 4: 1131 x 1585
Resizing image for level 5: 566 x 793
Resizing image for level 6: 283 x 397


Cropping regions: 100%|██████████| 6/6 [05:39<00:00, 56.58s/it]

Resizing image for level 7: 142 x 199
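The level sizes printed in the log above are consistent with a simple rule: each level halves both dimensions, rounding up, and levels are added until the larger dimension is at most 256 pixels. This rule is inferred from the printed sizes only, not taken from the pyqupath source, and the `max_top_size` parameter name is hypothetical. A minimal sketch that reproduces the logged numbers:

```python
def pyramid_level_sizes(size_a, size_b, max_top_size=256):
    """Return the list of (size_a, size_b) per pyramid level, level 1 first.

    Assumed rule (inferred from the log): halve with rounding up until the
    larger dimension is no more than `max_top_size`.
    """
    sizes = [(size_a, size_b)]
    while max(sizes[-1]) > max_top_size:
        a, b = sizes[-1]
        # (x + 1) // 2 is integer division by 2, rounded up
        sizes.append(((a + 1) // 2, (b + 1) // 2))
    return sizes


# First region in the log: 5667 x 12509 gives 7 levels, ending at 89 x 196
levels = pyramid_level_sizes(5667, 12509)
```

Under this rule the third region (13258 x 16552) gets an eighth level, because its seventh level (208 x 259) still exceeds 256 in one dimension, which matches the log.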






## 05. Review Output

An OME-TIFF file is exported for each region, into a subdirectory named after that region.

In [8]:
!tree $dir_output

/mnt/nfs/storage/wenruiwu_temp/pipeline/fusion/01_preprocess
├── reg001
│   └── reg001.ome.tiff
├── reg002
│   └── reg002.ome.tiff
├── reg003
│   └── reg003.ome.tiff
├── reg004
│   └── reg004.ome.tiff
├── reg005
│   └── reg005.ome.tiff
└── reg006
    └── reg006.ome.tiff

6 directories, 6 files
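The same review can be done programmatically, which is handy when `tree` is not installed or when the result feeds a later step. The sketch below assumes the `<region>/<region>.ome.tiff` layout shown above; `find_region_outputs` is a hypothetical helper, and it only checks that the files exist, not that they are valid OME-TIFFs.

```python
from pathlib import Path


def find_region_outputs(dir_output):
    """Map each region folder name to its OME-TIFF path (None if missing)."""
    dir_output = Path(dir_output)
    outputs = {}
    for region_dir in sorted(p for p in dir_output.iterdir() if p.is_dir()):
        # Each region folder is expected to hold one file named after it
        path_ometiff = region_dir / f"{region_dir.name}.ome.tiff"
        outputs[region_dir.name] = path_ometiff if path_ometiff.is_file() else None
    return outputs
```

Any region mapped to `None` indicates that the export for that region did not finish.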
