[PY-645][externa] Improved tolerance for dots in filenames & test linting #746

JBWilkie · 2023-12-12T14:00:47Z

Problem

The NIfTI exporter breaks if either of the following contains dots (".") that aren't part of a file extension:

The filename of any source file in any slot
The filename of the dataset item itself

Solution

Introduce logic that explicitly looks for .dcm, .nii, and .nii.gz extensions instead of pulling out suffixes
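
To illustrate why counting suffixes is fragile, here is a minimal sketch. It is not the code from this PR: `KNOWN_EXTENSIONS` and `has_known_extension` are hypothetical names, and only the three extensions are taken from the description above.

```python
from pathlib import PurePosixPath

# Hypothetical helper names; the three extensions come from the PR description.
KNOWN_EXTENSIONS = (".nii.gz", ".nii", ".dcm")


def has_known_extension(name: str) -> bool:
    """Return True if the name ends with one of the supported extensions."""
    return name.endswith(KNOWN_EXTENSIONS)


dotted = PurePosixPath("patient.01.scan.nii.gz")

# pathlib treats every dot-separated piece after the first as a suffix,
# so a check such as len(suffixes) == 2 no longer matches this name.
print(dotted.suffixes)  # ['.01', '.scan', '.nii', '.gz']
print(has_known_extension(dotted.name))  # True
```

Checking the end of the name directly does not depend on how many other dots the filename contains.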

Changelog

Improved tolerance for dots in medical filenames
Some test linting

linear · 2023-12-12T14:00:50Z

PY-642 BUG: Cannot export .NII due to too many dots(.) in filename.

JBWilkie · 2023-12-12T14:38:18Z

darwin/exporter/formats/nifti.py

-    if len(suffixes) == 2:
-        image_id = str(filename).rstrip("".join(suffixes))
-    elif len(suffixes) == 1:
-        image_id = str(filename.stem)


I'm a little unsure about this. Currently, if .nii.gz then we include this in the image_id

Otherwise, if .dcm or .nii, we don't include this in the image_id. Why is this?

Currently, if .nii.gz then we include this in the image_id

Is that actually true? image_id = str(filename).rstrip("".join(suffixes)) would strip out .nii.gz 🤔

Actually yes that is correct

Just tested and the behaviour now is:

If any slot has a name ending in anything other than .nii, .dcm, or .nii.gz, the exporter fails (we want this)

The filename itself can be anything, and if it ends with any of the 3 above we strip it away

I believe this is correct
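
The slot-name rule described above could be sketched roughly as follows. This is an illustration only: `SUPPORTED`, `validate_slot_names`, and the choice of `ValueError` are assumptions, not the exporter's actual code, and the lower-casing reflects the case-insensitivity requested elsewhere in this review.

```python
from typing import Iterable

# Hypothetical names; the extensions are the three listed in the comment above.
SUPPORTED = (".nii.gz", ".nii", ".dcm")


def validate_slot_names(slot_names: Iterable[str]) -> None:
    """Raise ValueError for the first slot name without a supported extension."""
    for name in slot_names:
        # Compare against a lower-cased copy so "SCAN.DCM" is accepted too.
        if not name.lower().endswith(SUPPORTED):
            raise ValueError(f"Unsupported slot file name: {name!r}")


validate_slot_names(["axial.v2.nii.gz", "SCAN.01.DCM"])  # passes silently
```

Extra dots earlier in the name do not affect the outcome, since only the end of the name is inspected.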

rslota · 2023-12-12T15:21:48Z

darwin/exporter/formats/nifti.py

-    if len(suffixes) == 2:
-        image_id = str(filename).rstrip("".join(suffixes))
-    elif len(suffixes) == 1:
-        image_id = str(filename.stem)


Currently, if .nii.gz then we include this in the image_id

Is that actually true? image_id = str(filename).rstrip("".join(suffixes)) would strip out .nii.gz 🤔

rslota · 2023-12-12T15:22:06Z

darwin/exporter/formats/nifti.py

-                    )
-            else:
+            if not (
+                filename.name.endswith(".nii.gz")


Can we also make sure that check is case-insensitive? That's going to be the next reported bug.

Updated, all 6 checks are now case-insensitive

rslota · 2023-12-12T16:19:17Z

darwin/exporter/formats/nifti.py

+    if filename.name.lower().endswith(".nii.gz"):
+        image_id = str(filename).rstrip(".nii.gz")
+    elif filename.name.lower().endswith(".nii"):
+        image_id = str(filename).rstrip(".nii")


Will this work when the extension is upper case? The condition will trigger, but the strip won't find the upper-case extension, right?

God I need to multitask less...

Updated. Now we check the lowered version of the filename so we're guaranteed to pick up any of the 3 matches
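
A side note on the snippet under discussion, based on documented `str.rstrip` behaviour rather than on anything stated in this thread: `rstrip` treats its argument as a set of characters to remove, not as a literal suffix. The sketch below shows the effect, together with one slicing-based alternative; `strip_suffix_ci` is a hypothetical helper, not code from this PR.

```python
def strip_suffix_ci(name: str, suffix: str) -> str:
    """Remove suffix from name, ignoring case, and leave the rest untouched."""
    if suffix and name.lower().endswith(suffix.lower()):
        return name[: -len(suffix)]
    return name


# rstrip removes any trailing run of the characters '.', 'n', 'i', 'g', 'z'.
print("brain.nii.gz".rstrip(".nii.gz"))  # bra
# Upper-case characters are not in that set, so nothing is removed.
print("SCAN.NII.GZ".rstrip(".nii.gz"))  # SCAN.NII.GZ

print(strip_suffix_ci("brain.nii.gz", ".nii.gz"))  # brain
print(strip_suffix_ci("SCAN.NII.GZ", ".nii.gz"))  # SCAN
```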

rslota · 2023-12-12T16:34:57Z

darwin/exporter/formats/nifti.py

-    elif len(suffixes) == 1:
-        image_id = str(filename.stem)
+    if filename.name.lower().endswith(".nii.gz"):
+        image_id = str(filename).lower().rstrip(".nii.gz")


Are you sure image_id can be a lower-cased version of the filename? I'm not sure how it's used. I would personally prefer to just strip the extension in a case-insensitive manner (I guess you may need a regex with re.IGNORECASE for that). But if you're sure image_id can be modified this way, that's fine by me :)

I don't believe it's an issue, but I cannot be certain. Therefore, I've updated this so that we strip only the extension, case-insensitively, and leave the rest of the image_id untouched
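
One way to do what the reviewer suggests is a regex anchored at the end of the name. This is a minimal sketch under that assumption, not necessarily what was committed; `_MEDICAL_EXT` and `strip_medical_extension` are hypothetical names.

```python
import re

# Hypothetical pattern: one of the three extensions, anchored at the end
# of the string, matched without regard to case.
_MEDICAL_EXT = re.compile(r"\.(?:nii\.gz|nii|dcm)$", re.IGNORECASE)


def strip_medical_extension(name: str) -> str:
    """Drop a trailing .nii.gz, .nii, or .dcm; keep the rest of the name as-is."""
    return _MEDICAL_EXT.sub("", name)


print(strip_medical_extension("Brain.Scan.NII.GZ"))  # Brain.Scan
print(strip_medical_extension("vol.1.Nii"))  # vol.1
print(strip_medical_extension("notes.txt"))  # notes.txt
```

Because only the matched extension is removed, the casing and any inner dots of the remaining name are preserved.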

linear · 2023-12-13T11:26:26Z

PY-645 Solution for .nii filename issue in darwin-py

linear · 2023-12-13T17:27:55Z

PY-642 BUG: Cannot export .NII due to too many dots(.) in filename.

Improved tolerance for dots in filenames & test linting

5311289

JBWilkie and others added 2 commits December 12, 2023 14:13

Merge branch 'master' into py-642

73df08b

Fixed broken tests

20428f3

JBWilkie commented Dec 12, 2023

rslota reviewed Dec 12, 2023

Added case-insensitivity

120936f

rslota reviewed Dec 12, 2023

Case insensitivity for dataset item name

8dd77e8

rslota reviewed Dec 12, 2023

Do not set image_id to lowercase

a84911e

Nathanjp91 changed the title ~~[PY-642][externa] Improved tolerance for dots in filenames & test linting~~ [PY-645][externa] Improved tolerance for dots in filenames & test linting Dec 13, 2023

rslota approved these changes Dec 13, 2023

Nathanjp91 merged commit 79859ab into master Dec 19, 2023
13 checks passed
