Connected components #25

LorenzLamm · 2023-07-25T20:45:12Z

This PR is about two things:

Added functionality to compute the connected components of the output segmentation. Each component gets its own label, so that different membranes can be told apart. This is also useful in combination with ColabSeg, which recommends connected components as inputs.
Dataloading: As suggested in remove flags from tomogram loading #23 and Pyto tomoloading #22 (review), I adjusted the dataloading. First, tomogram loading no longer depends on flags that control whether a header is returned; instead, it returns a Tomogram object that always contains data, header and voxel size. Second, I made data storing more efficient: depending on the tomogram data type, it tries to find the most efficient .mrc file mode to store the data in. Segmentations in particular can thus be stored much more efficiently (e.g. 2GB to 500MB). That's still way too much, but I think it's the best we can do with the mrc file format.

LorenzLamm · 2023-07-26T07:49:52Z

docs/Usage/Segmentation.md

Added description of connected components to docs.

LorenzLamm · 2023-07-26T07:50:26Z

src/membrain_seg/annotations/extract_patches.py

-    tomo = load_tomogram(tomo_path)
-    labels = load_tomogram(seg_path)
+    tomo = load_tomogram(tomo_path).data
+    labels = load_tomogram(seg_path).data


This script does not need the tomogram header, so only the data is read in.

LorenzLamm · 2023-07-26T07:59:01Z

src/membrain_seg/segmentation/cli/segment_cli.py

        sw_roi_size=sliding_window_size,
    )
+
+
+@cli.command(name="components", no_args_is_help=True)


It is now possible to compute connected components from the segmentation.
This can be done either directly as part of the segmentation command (above) or with this separate command after the segmentation has been computed.

LorenzLamm · 2023-07-26T08:02:01Z

src/membrain_seg/segmentation/dataloading/data_utils.py

@@ -209,12 +225,30 @@ def write_nifti(out_file: str, image: np.ndarray) -> None:
    sitk.WriteImage(out_image, out_file)


+@dataclass
+class Tomogram:


This new dataclass holds the data, header and voxel size of the tomogram.
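
A minimal sketch of what such a container might look like, using only the three fields named above. The example values are made up for illustration, and this is not claimed to be the merged implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional

import numpy as np


@dataclass
class Tomogram:
    """Bundle the voxel data with its metadata (illustrative sketch)."""

    data: np.ndarray
    header: Any
    voxel_size: Optional[Any] = None


# Callers that only need the voxels can read the .data attribute directly
tomo = Tomogram(data=np.zeros((4, 4, 4), dtype=np.float32), header=None)
print(tomo.data.shape, tomo.voxel_size)
```

Returning one object instead of a tuple whose length depends on a flag means every call site sees the same return type.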

LorenzLamm · 2023-07-26T08:02:50Z

src/membrain_seg/segmentation/dataloading/data_utils.py

-        return data, tomogram.voxel_size
-    return data
+
+_dtype_to_mode = {


These are the different data modes used by the mrcfile package.
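
For orientation, the MRC2014 format identifies the voxel data type by an integer mode. The table below is a sketch of how such a lookup might be written. The name `DTYPE_TO_MODE` is hypothetical, and the exact entries of the PR's `_dtype_to_mode` are not shown in this thread.

```python
import numpy as np

# MRC2014 mode numbers, keyed by the NumPy dtype they describe
DTYPE_TO_MODE = {
    np.dtype("int8"): 0,
    np.dtype("int16"): 1,
    np.dtype("float32"): 2,
    np.dtype("complex64"): 4,
    np.dtype("uint16"): 6,
    np.dtype("float16"): 12,
}

# Look up the mode for an 8-bit label volume
labels = np.zeros((2, 2, 2), dtype=np.int8)
mode = DTYPE_TO_MODE[labels.dtype]
print(mode)
```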

LorenzLamm · 2023-07-26T08:04:28Z

src/membrain_seg/segmentation/dataloading/data_utils.py

+}
+
+
+def convert_dtype(tomogram: np.ndarray) -> np.ndarray:


This function goes through the candidate mrcfile data types and tries to find the most efficient one for the input tomogram.
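
One way such a search could work is to try candidate types from smallest to largest and keep the first one that reproduces the input exactly. This is a sketch under that assumption only: the function name `to_smallest_dtype`, the candidate order, and the float32 fallback are hypothetical and may differ from what `convert_dtype` does in the PR.

```python
import numpy as np

# Candidate dtypes, ordered by bytes per voxel (assumed order)
CANDIDATES = (np.int8, np.int16, np.uint16, np.float16, np.float32)


def to_smallest_dtype(array: np.ndarray) -> np.ndarray:
    """Return a copy in the first candidate dtype that loses no information."""
    for dtype in CANDIDATES:
        # Out-of-range casts may warn; the equality check below rejects them
        with np.errstate(over="ignore", invalid="ignore"):
            converted = array.astype(dtype)
        if np.array_equal(converted, array):
            return converted
    # Nothing was lossless (e.g. float64 noise or NaNs): fall back to float32
    return array.astype(np.float32)


seg = np.array([0.0, 1.0, 1.0, 0.0])            # binary mask stored as float64
many_labels = np.arange(300)                    # too many labels for int8
print(to_smallest_dtype(seg).dtype, to_smallest_dtype(many_labels).dtype)
```

A binary mask held as 4-byte floats would shrink to 1 byte per voxel this way, a factor of four, which is at least consistent with the 2GB to 500MB figure quoted in the PR description (the thread does not say which types were involved).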

LorenzLamm · 2023-07-26T08:06:07Z

src/membrain_seg/segmentation/dataloading/data_utils.py

+        else:
+            data = tomogram
+            header = None
+        data = convert_dtype(data)


Previously, store_tomogram used the numpy array data type to choose the mrc file data type (done automatically by the mrcfile package). Now, it tries to find the most efficient data representation.

LorenzLamm · 2023-07-26T08:07:48Z

src/membrain_seg/segmentation/dataloading/data_utils.py

                setattr(out_mrc.header, attr, getattr(header, attr))
+            out_mrc.header.mode = dtype_mode
+        if voxel_size is not None:
+            out_mrc.voxel_size = voxel_size


The voxel_size argument is mainly used when rescaling the tomogram: the pixel size changes, but the original header still contains the original pixel size.

LorenzLamm · 2023-07-26T09:16:50Z

src/membrain_seg/segmentation/connected_components.py

+    structure = np.ones((3, 3, 3))
+    labeled_array, num_features = ndimage.label(binary_seg, structure=structure)
+
+    # remove small clusters


This is still relatively slow, particularly for many different connected components.
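
As a small illustration of the labeling call in the diff above: passing `np.ones((3, 3, 3))` as `structure` makes voxels that touch only at an edge or corner count as connected, whereas the SciPy default connects voxels only across faces. The toy volume is made up for this example.

```python
import numpy as np
from scipy import ndimage

# Two voxels that touch only at a corner
binary_seg = np.zeros((3, 3, 3), dtype=bool)
binary_seg[0, 0, 0] = True
binary_seg[1, 1, 1] = True

# Full 3x3x3 neighbourhood, as in the diff
structure = np.ones((3, 3, 3))
_, n_full = ndimage.label(binary_seg, structure=structure)

# Default structure: face neighbours only
_, n_default = ndimage.label(binary_seg)

print(n_full, n_default)
```

With the full neighbourhood the two voxels form one component; with the default they are two.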

alisterburt

hey @LorenzLamm - looking great as always, a few non-blocking comments below - please move forward as you see best fit and merge when you're happy!

alisterburt · 2023-07-27T21:36:38Z

src/membrain_seg/segmentation/connected_components.py

+    if size_thres is not None:
+        print(
+            "Removing components smaller than",
+            size_thres,
+            "voxels. (This can take a while)",
+        )
+        sizes = ndimage.sum(binary_seg, labeled_array, range(1, num_features + 1))
+        too_small = np.nonzero(sizes < size_thres)[0] + 1  # features labeled from 1
+        for feat_nr in too_small[::-1]:  # iterate in reverse order
+            labeled_array[labeled_array == feat_nr] = 0
+            labeled_array[labeled_array > feat_nr] -= 1


I've done it in 2D with a method which may be faster - see
https://github.com/teamtomo/fidder/blob/3ac64b13db256b598bf8951eb9a66137c04b86b6/src/fidder/predict/probabilities_to_mask.py#L5-L32
and specifically this utility function
https://github.com/teamtomo/fidder/blob/3ac64b13db256b598bf8951eb9a66137c04b86b6/src/fidder/utils.py#L128-L156

it's fast but memory requirements go way up for large numbers of regions, in 3D this may be limiting extremely quickly

maybe @kevinyamauchi has tips for doing this quickly in 3D?

Thank you! I tried it, and it did not give a huge boost in speed, unfortunately. But as you say, memory consumption is getting huge quickly. I tested it with a relatively noisy tomogram with 300+ components, leading to 300x1GB channels. Was kind of feasible on our cluster, but I guess for many users it won't be :/
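
For comparison, here is a sketch of a different approach that is not discussed in this thread and is neither what the PR nor the linked fidder utility does: count voxels per label once, then renumber all labels in a single pass through a lookup table. It assumes `labeled_array` holds non-negative integer labels as returned by `ndimage.label`; the function name is hypothetical and the speed has not been benchmarked here.

```python
import numpy as np


def remove_small_components(labeled_array: np.ndarray, size_thres: int) -> np.ndarray:
    """Zero out components below size_thres and renumber the rest 1..k."""
    # Voxel count per label; index 0 is the background
    sizes = np.bincount(labeled_array.ravel())
    keep = sizes >= size_thres
    keep[0] = False  # background always maps to 0
    # Lookup table: old label -> new consecutive label (0 for removed ones)
    lut = np.zeros(sizes.shape[0], dtype=labeled_array.dtype)
    lut[keep] = np.arange(1, keep.sum() + 1)
    return lut[labeled_array]


labeled = np.array([[1, 1, 0, 2],
                    [0, 0, 0, 3],
                    [3, 3, 3, 3]])
cleaned = remove_small_components(labeled, size_thres=2)
print(cleaned)
```

The extra memory is one array of the same shape as the label volume plus a table with one entry per label, rather than one channel per component.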

alisterburt · 2023-07-27T21:41:17Z

src/membrain_seg/segmentation/dataloading/data_utils.py

+    data : np.ndarray
+        The 3D array data representing the tomogram.
+    header : Any
+        The header information from the tomogram file.
+    voxel_size : Any, optional
+        The voxel size of the tomogram.
+    """
+
+    data: np.ndarray
+    header: Any
+    voxel_size: Optional[Any] = None


Great as a stopgap but I don't love the Any types here - longer term it might be worth constructing a specific model for the information we care about from the header

    class Header(BaseModel):
        bla: float
        bla2: tuple[int, int, int]

        @classmethod
        def from_file(cls, path: PathLike):
            with mrcfile.open(path, header_only=True, permissive=True) as mrc:
                # get stuff from header
                ...
            cls(**stuff)

what do you think?

Yes, that's much cleaner, thanks! This would also resolve the issue of skipping several header objects in the store_tomogram function, because we can easily control which properties are stored in the header.
Not sure which header items are the best ones to keep. Need to figure out what's necessary e.g. for compatibility with IMOD or Amira. But will try to clean this up in a follow-up :)
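
To make the idea concrete, here is a sketch of a typed header record. It swaps in a standard-library frozen dataclass for the pydantic `BaseModel` in the suggestion above, purely to keep the example dependency-free. The fields `voxel_size` and `shape` are hypothetical placeholders, since which header items to keep is exactly the open question.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Header:
    """Only the header items we explicitly choose to carry around (assumed fields)."""

    voxel_size: Tuple[float, float, float]
    shape: Tuple[int, int, int]


header = Header(voxel_size=(10.0, 10.0, 10.0), shape=(64, 64, 64))

# A storing function could iterate over exactly these fields
stored_fields = asdict(header)
print(sorted(stored_fields))
```

Because the set of fields is explicit, a storing function would not need to skip unknown header attributes.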

alisterburt · 2023-07-27T21:43:09Z

src/membrain_seg/segmentation/dataloading/data_utils.py

+    store_connected_components: bool = False,
+    connected_component_thres: int = None,


I think this should be factored out into a separate program membrain-seg label rather than added to the program for segmenting - it feels like a postprocessing step and people might want to use it on pre-existing segmentations

Yes, currently, both options are implemented: Users can either decide to directly output connected components from the membrain segment function, or use the membrain components function to process already existing segmentation files.

LorenzLamm added 8 commits July 24, 2023 20:15

Connected component functionality

dbe1cbd

Add efficient data storage and connected component storing

269fe57

Add CLI for only connected components and functionalities directly at segmentation time.

1d8faa3

Add connected components functionality

946d09e

Merge branch 'main' into connected_components

4d1be79

Add Tomogram class and adjust tomo loading and storing

6f3b07f

Change to new data loading / storing

869f52d

Add connected component description

dae6c09

Description for postprocessing connected components

d7c599b

LorenzLamm added 2 commits July 26, 2023 11:09

add long runtime hint

4528c80

add long runtime hint

256abf6

LorenzLamm marked this pull request as ready for review July 26, 2023 09:17

LorenzLamm mentioned this pull request Jul 27, 2023

Error during normalization while running membrain on WSL2 #27

Closed

alisterburt reviewed Jul 27, 2023

resolve conflicts with main

30ac769

LorenzLamm merged commit 25e7ee3 into main Jul 28, 2023
11 checks passed

LorenzLamm deleted the connected_components branch July 28, 2023 09:45

LorenzLamm mentioned this pull request Jul 28, 2023

remove flags from tomogram loading #23

Closed
