Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PY-645][externa] Improved tolerance for dots in filenames & test linting #746

Merged
merged 6 commits into from Dec 19, 2023

Conversation

JBWilkie
Copy link
Contributor

@JBWilkie JBWilkie commented Dec 12, 2023

Problem

If the filename of either:

  • Any source file in any slot, or:
  • The filename of the dataset item itself contains dots "." that aren't file extensions, the NifTI exporter will break

Solution

Introduce logic that explicitly looks for .dcm, .nii, and .nii.gz extensions instead of pulling out suffixes

Changelog

  • Improved tolerance for dots in medical filenames
  • Some test linting

Copy link

linear bot commented Dec 12, 2023

Comment on lines -180 to -183
if len(suffixes) == 2:
image_id = str(filename).rstrip("".join(suffixes))
elif len(suffixes) == 1:
image_id = str(filename.stem)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unsure about this. Currently, if .nii.gz then we include this in the image_id

Otherwise, if .dcm or .nii, we don't include this in the image_id. Why is this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if .nii.gz then we include this in the image_id

Is that actually true? image_id = str(filename).rstrip("".join(suffixes)) would strip out .nii.gz 🤔

Copy link
Contributor Author

@JBWilkie JBWilkie Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually yes that is correct

Just tested and the behaviour now is:

  • If any slot has a name in ending in anything other than .nii, .dcm, or .nii.gz, the exporter fails (we want this)
  • The filename itself can be anything, and if it ends with any of the 3 above we strip it away

I believe this is correct

Comment on lines -180 to -183
if len(suffixes) == 2:
image_id = str(filename).rstrip("".join(suffixes))
elif len(suffixes) == 1:
image_id = str(filename.stem)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if .nii.gz then we include this in the image_id

Is that actually true? image_id = str(filename).rstrip("".join(suffixes)) would strip out .nii.gz 🤔

)
else:
if not (
filename.name.endswith(".nii.gz")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also make sure we check that case-insensitive? That's gonna be next reported bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, all 6 checks are now case-insensitive

if filename.name.lower().endswith(".nii.gz"):
image_id = str(filename).rstrip(".nii.gz")
elif filename.name.lower().endswith(".nii"):
image_id = str(filename).rstrip(".nii")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work for case where we have upper cases? the condition will trigger, but the strip won't find the upper case extension, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God I need to multitask less...

Updated. Now we check the lowered version of the filename so we're guaranteed to pick up any of the 3 matches

elif len(suffixes) == 1:
image_id = str(filename.stem)
if filename.name.lower().endswith(".nii.gz"):
image_id = str(filename).lower().rstrip(".nii.gz")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure image_id can be a lower case of the filename? I'm not sure how it's used, I would personally prefer to just strip the extension in case-insensitive manner (I guess you may need regex for that with re.IGNORECASE). But if you're sure image_id can be modified this way, that's fine by me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's an issue, but I cannot be not certain. Therefore, I've updated this so that we only strip the extension in a case-insensitive manner and leave the rest of the image_id untouched

@Nathanjp91 Nathanjp91 changed the title [PY-642][externa] Improved tolerance for dots in filenames & test linting [PY-645][externa] Improved tolerance for dots in filenames & test linting Dec 13, 2023
Copy link

linear bot commented Dec 13, 2023

Copy link

linear bot commented Dec 13, 2023

@Nathanjp91 Nathanjp91 merged commit 79859ab into master Dec 19, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants