Initial draft of data specifications for cell classification task.

## Summary

Each file must be a zip archive (filename ending in '.zip') that contains:
- X.npy: raw image data, an array with dimensions (1, y, x, c)
- y.npy: instance labels, an array with dimensions (1, y, x, 1)
- channel_names.json
- classes/cell_type.json

This notebook walks through creating each component of the file and saving it in the correct output format. It also shows how to extract the file contents after annotation using Python's zipfile library.
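
As a rough orientation, the packaging and extraction steps might look like the sketch below. The helper names `save_annotation_zip` and `load_annotation_zip` are hypothetical, and the JSON payloads are placeholders: this section does not specify the schema of `channel_names.json` or `classes/cell_type.json`, so a plain list of names and an empty list are assumed here purely for illustration.

```python
import io
import json
import os
import tempfile
import zipfile

import numpy as np


def save_annotation_zip(path, X, y, channel_names, cell_types):
    """Write the four required members into one zip archive (hypothetical helper)."""
    with zipfile.ZipFile(path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for name, arr in (('X.npy', X), ('y.npy', y)):
            buf = io.BytesIO()
            np.save(buf, arr)  # serialize the array in .npy format in memory
            zf.writestr(name, buf.getvalue())
        # JSON structure is an assumption; adapt to the real schema
        zf.writestr('channel_names.json', json.dumps(channel_names))
        zf.writestr('classes/cell_type.json', json.dumps(cell_types))


def load_annotation_zip(path):
    """Read the four members back out of the archive (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        X = np.load(io.BytesIO(zf.read('X.npy')))
        y = np.load(io.BytesIO(zf.read('y.npy')))
        channel_names = json.loads(zf.read('channel_names.json'))
        cell_types = json.loads(zf.read('classes/cell_type.json'))
    return X, y, channel_names, cell_types


# Toy arrays with the documented shapes: (1, y, x, c) and (1, y, x, 1)
X = np.zeros((1, 8, 8, 2), dtype=np.float32)
y = np.zeros((1, 8, 8, 1), dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, 'example.zip')
    save_annotation_zip(zip_path, X, y, ['channel_a', 'channel_b'], [])
    with zipfile.ZipFile(zip_path) as zf:
        members = sorted(zf.namelist())
    X_out, y_out, names_out, types_out = load_annotation_zip(zip_path)
```

Writing the arrays through an in-memory buffer avoids creating intermediate `.npy` files on disk.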

In [None]:
import os
import re

import imageio
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
def sanitize(x):
    """Strip out non-alphanumeric characters from a string.

    Adapted from https://stackoverflow.com/a/1276774

    Returns a lowercase version of the string to help compare
    possible variations of channel or class names, e.g.:
        - 'B cell' vs 'Bcell' vs 'b_cell'

    Note that this will strip out '+' and '-' characters,
    so if that is the only difference between two class names,
    problems may arise! Use 'pos' or 'neg' when creating names
    for lineage classifications instead.
    """
    # \W alone does not match underscores, so strip them explicitly
    return re.sub(r'[\W_]+', '', x).lower()
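
Because '+' and '-' are discarded during normalization, it can help to check a list of class names for collisions before building the annotation file. The sketch below is one way to do that; `find_name_collisions` is a hypothetical helper, not part of the notebook, and it applies its own normalization (dropping punctuation, whitespace, underscores, and case) so that it runs on its own.

```python
import re
from collections import defaultdict


def find_name_collisions(names):
    """Group names that become identical after normalization (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        # Drop every non-alphanumeric character, including underscores
        key = re.sub(r'[\W_]+', '', name).lower()
        groups[key].append(name)
    # Keep only the keys shared by more than one original name
    return {key: found for key, found in groups.items() if len(found) > 1}


lineage_collisions = find_name_collisions(['CD4+', 'CD4-', 'CD8pos', 'CD8neg'])
spelling_variants = find_name_collisions(['B cell', 'Bcell', 'b_cell'])
```

Here 'CD4+' and 'CD4-' fall into the same group, while 'CD8pos' and 'CD8neg' stay distinct, which is the reason for preferring 'pos' and 'neg' in class names.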

In [None]:
# example data
DATA_DIR = os.path.abspath('../data/cell_classification_example')

### Starting from predicted classifications
Annotation files for DCL will likely need to be prepared from existing data in a different format.

In [None]:
# load the existing cell classification mapping
example_key = os.path.join(DATA_DIR, 'cell_key.csv')
example_key_df = pd.read_csv(example_key, header=None)

# the key describes the values used in the pixel-level classification array;
# we need it to convert from the label array to the
# cell class assignment dictionary we require
example_key_df
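
Turning the key into a lookup dictionary could look like the sketch below. The column layout of `cell_key.csv` is not shown in this section, so the sketch assumes a headerless two-column file (integer pixel value, then class name) and uses made-up rows in place of the real file.

```python
import io

import pandas as pd

# Stand-in for cell_key.csv; the rows and the two-column layout are assumptions
csv_text = "1,B cell\n2,T cell\n3,Macrophage\n"
key_df = pd.read_csv(io.StringIO(csv_text), header=None, names=['value', 'name'])

# Map each integer pixel value to its class name
value_to_name = dict(zip(key_df['value'].astype(int), key_df['name']))
```

If the real file has a different column order or extra columns, the `names` argument would need to change accordingly.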

In [None]:
# load and preview the pixel-level classification predictions
example_prediction_path = os.path.join(DATA_DIR, 'Point1', 'Point1_cell_overlay.tiff')
example_prediction_arr = imageio.imread(example_prediction_path)

classes_cmap = plt.get_cmap('Dark2')
classes_cmap.set_bad('black')
fig, ax = plt.subplots(figsize=(10, 10))

ax.imshow(np.ma.masked_equal(example_prediction_arr, 0),
          cmap=classes_cmap)
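
One possible way to go from a pixel-level overlay like this to per-cell class assignments is a majority vote inside each cell's mask. The sketch below is not necessarily the approach this notebook takes: `assign_cell_classes` is a hypothetical helper, and it assumes that an instance label array aligned with the overlay is available and that 0 means background in both arrays.

```python
import numpy as np


def assign_cell_classes(instance_labels, class_overlay):
    """Map each cell id to the most common nonzero class value under its mask."""
    assignments = {}
    for cell_id in np.unique(instance_labels):
        if cell_id == 0:  # assumed background label
            continue
        values = class_overlay[instance_labels == cell_id]
        values = values[values > 0].astype(np.int64)  # ignore unclassified pixels
        if values.size == 0:
            continue  # no class information for this cell
        assignments[int(cell_id)] = int(np.bincount(values).argmax())
    return assignments


# Toy arrays: cell 1 covers two pixels, cell 2 covers three
toy_instances = np.array([[1, 1, 0],
                          [2, 2, 2]])
toy_overlay = np.array([[3, 3, 0],
                        [1, 2, 2]])
toy_assignments = assign_cell_classes(toy_instances, toy_overlay)
```

Cells whose mask contains only unclassified pixels are left out of the result rather than assigned a default class.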