Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper duplicate name validation in filter_images() #270

Closed
CarloOtano opened this issue Apr 6, 2023 · 5 comments
Closed

Improper duplicate name validation in filter_images() #270

CarloOtano opened this issue Apr 6, 2023 · 5 comments

Comments

@CarloOtano
Copy link
Contributor

The filter_images function is supposed to validate that the image names are unique, but it is currently validating that the image paths are unique.

For example the following folder structure will not raise an error:
jpg_parent_folder
---> jpg_sub_folder_1
-------> imageA.jpg
-------> imageB.jpg
---> jpg_sub_folder_2
-------> imageA.jpg
-------> imageC.jpg

Since the images location can be provided as a glob, more than one sub directory can be specified:
--images "jpg_parent_folder/**/*.jpg"

The although the names are not unique, the image paths are unique:

  • jpg_parent_folder/jpg_sub_folder_1/imageA.jpg <---Name not unique
  • jpg_parent_folder/jpg_sub_folder_1/imageB.jpg
  • jpg_parent_folder/jpg_sub_folder_2/imageA.jpg <---Name not unique
  • jpg_parent_folder/jpg_sub_folder_2/imageC.jpg

One simple solution might be:

for i in stream_images(image_paths=get_image_paths(**kwargs)):
    filename = clean_filename(i.path)
    if filename in image_paths:
      duplicates.add(filename)
    image_paths.add(filename)

pix-plot/pixplot/pixplot.py

Lines 195 to 207 in 48c8e6b

'''Main method for filtering images given user metadata (if provided)'''
# validate that input image names are unique
image_paths = set()
duplicates = set()
for i in stream_images(image_paths=get_image_paths(**kwargs)):
if i.path in image_paths:
duplicates.add(i.path)
image_paths.add(i.path)
if duplicates:
raise Exception('''
Image filenames should be unique, but the following filenames are duplicated\n
{}
'''.format('\n'.join(duplicates)))

@duhaime
Copy link
Member

duhaime commented Apr 7, 2023

Yes this has been a challenge in the past.

When we copy the filenames to the output directory, do we use just the filename or the full path?

If the latter, we should also use the full path when deduping just as you suggest. If we copy just the basename, we'll need to use the full path when copying the images to the output (like replacing / with - or similar).

@CarloOtano
Copy link
Contributor Author

When we copy the filenames to the output directory, do we use just the filename or the full path?

You use just the filename when you copy the files. The filename are used in other areas such as the imagelist (and other places).

If the latter, we should also use the full path when deduping just as you suggest...

I am not suggesting we use the full path. Although the ideal solution might be to modify the code to allow the use of files with the same name, that is a more involved process.

For now I am only suggesting we modify the validation to catch duplicate basenames with different paths.

@duhaime
Copy link
Member

duhaime commented Apr 12, 2023

Ah roger that--makes sense. If you'd want to send up a diff with the code you posted when you created this issue, we can get it merged!

(P.S. Thank you for your help with the project--the team has scattered and resources are limited right now, but it's great to improve the software when there are easy wins to be had!)

CarloOtano added a commit to CarloOtano/pix-plot that referenced this issue Apr 13, 2023
@CarloOtano CarloOtano mentioned this issue Apr 13, 2023
@CarloOtano
Copy link
Contributor Author

Please review PR #271 and let me know if you need additional changes.

(P.S. Thank you all for taking the time to create and improve this project!)

@CarloOtano
Copy link
Contributor Author

Issues solved in PR #271

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants