In [5]:
from pathlib import Path

import pandas as pd

from rtnls_fundusprep.preprocessor import parallel_preprocess

## Preprocessing

This code preprocesses the images and writes .png files containing the square-cropped fundus image and its contrast-enhanced version.

This step is not strictly necessary, but it is useful if you want to run preprocessing separately, before model inference.


Create a list of files to be preprocessed:

In [6]:
ds_path = Path("../data/ODIR")
files = list((ds_path / "original").glob("*"))

Images with a .dcm extension are read as DICOM, and their pixel_array is interpreted as RGB. All other images are read using PIL's Image.open.
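Because `glob("*")` collects every entry in the folder, it can help to check which files would take the DICOM path and which would go to PIL before starting the workers. The sketch below is only an illustration: `split_by_reader` is a hypothetical helper, not part of rtnls_fundusprep, and it assumes the extension check is case-insensitive, which the library may or may not do.

```python
from pathlib import Path


def split_by_reader(paths):
    """Partition paths into (dicom, other) based on the file extension."""
    dicom, other = [], []
    for p in map(Path, paths):
        # Assumption: ".DCM" is treated like ".dcm"; the library may differ.
        if p.suffix.lower() == ".dcm":
            dicom.append(p)
        else:
            other.append(p)
    return dicom, other


# Hypothetical file names, used only to demonstrate the helper.
example = [Path("a.dcm"), Path("b.jpg"), Path("c.DCM"), Path("d.png")]
dicom_files, other_files = split_by_reader(example)
print(len(dicom_files), "DICOM,", len(other_files), "other")
```

In the notebook, the same helper could be called on `files` to see how the dataset splits.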

In [7]:
bounds = parallel_preprocess(
    files,  # List of image files
    rgb_path=ds_path / "rgb",  # Output path for RGB images
    ce_path=ds_path / "ce",  # Output path for Contrast Enhanced images
    n_jobs=4,  # Number of preprocessing workers
)
df_bounds = pd.DataFrame(bounds).set_index("id")

0it [00:00, ?it/s][Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
8it [00:02,  2.85it/s][Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    3.1s
16it [00:03,  5.60it/s][Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    3.4s
20it [00:03,  7.26it/s][Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    3.9s
28it [00:04, 10.21it/s][Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    4.3s
36it [00:04, 12.72it/s][Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    4.9s
48it [00:05, 12.70it/s][Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    5.6s
56it [00:06, 13.76it/s][Parallel(n_jobs=4)]: Done  53 tasks      | elapsed:    6.3s
68it [00:07, 13.98it/s][Parallel(n_jobs=4)]: Done  64 tasks      | elapsed:    7.2s
80it [00:07, 13.49it/s][Parallel(n_jobs=4)]: Done  77 tasks      | elapsed:    8.1s
96it [00:09, 14.61it/s][Parallel(n_jobs=4)]: Done  90 tasks      | elapsed:    8.9s
108it [00:09, 15.72it/s][Parallel(n_jobs=4)]: Done 105 tasks      | 

The preprocessor produces RGB and contrast-enhanced images, each cropped to a square, and returns the image bounds, which can be used to reconstruct the original image. The bounds are collected above into the df_bounds dataframe. Output files keep the name of the input image but take a .png extension. Be careful when providing multiple inputs that share the same filename without extension (for example, eye.jpg and eye.png), as they will overwrite each other's output. Exceptions raised during preprocessing do not stop execution; an error message is printed instead. Images that failed preprocessing for any reason are marked with `success=False` in the df_bounds dataframe.
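Inputs that would collide on the same output name can be detected before preprocessing by grouping them on the filename without extension. This is a minimal sketch: `find_stem_collisions` is a hypothetical helper, and it assumes the output name is derived from `Path.stem`, which only strips the last extension.

```python
from collections import defaultdict
from pathlib import Path


def find_stem_collisions(paths):
    """Return {stem: [paths]} for every stem shared by more than one input."""
    by_stem = defaultdict(list)
    for p in map(Path, paths):
        by_stem[p.stem].append(p)
    return {stem: ps for stem, ps in by_stem.items() if len(ps) > 1}


# Hypothetical inputs: two files would both be written as img1.png.
example = [Path("a/img1.jpg"), Path("b/img1.png"), Path("a/img2.jpg")]
collisions = find_stem_collisions(example)
for stem, paths in collisions.items():
    print(stem, "->", [str(p) for p in paths])
```

Running `find_stem_collisions(files)` before calling `parallel_preprocess` should return an empty dict if no outputs will be overwritten.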

In [8]:
df_bounds.to_csv(ds_path / "meta.csv")
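When the saved metadata is loaded again later, the `success` column can be used to list the images that failed and to keep only the usable ones. The sketch below uses a small made-up table in place of the real df_bounds (only the `id` index and the `success` column are taken from the notebook) and an in-memory buffer in place of meta.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for df_bounds; the real one has more columns.
df_example = pd.DataFrame(
    {"id": ["img1", "img2", "img3"], "success": [True, False, True]}
).set_index("id")

# Round-trip through CSV, as done with meta.csv (in-memory buffer here).
buffer = io.StringIO()
df_example.to_csv(buffer)
buffer.seek(0)
df_loaded = pd.read_csv(buffer, index_col="id")

# Ids of images that failed preprocessing, and the rows that succeeded.
failed_ids = df_loaded.index[~df_loaded["success"]].tolist()
df_ok = df_loaded[df_loaded["success"]]
print("failed:", failed_ids)
```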