Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix an issue where missing files would be indexed without verification #3816

Merged
merged 2 commits into from
Mar 31, 2022

Conversation

neutrinoceros
Copy link
Member

PR Summary

I think this fixes #2819
However I do not have a clear enough picture of what the code in the initial report is supposed to accomplish.

  • would it be preferable to log missing files
  • should an error be raised instead when files are missing ?

pinging @brittonsmith and @matthewturk for review

@neutrinoceros
Copy link
Member Author

neutrinoceros commented Feb 18, 2022

For reference I've been testing this locally with yt_astro_analysis 1.1.1 using this script (which is a self-contained version of OP's script)

import yt
from yt.extensions.astro_analysis.halo_analysis import HaloCatalog

data_ds = yt.load('snap_000.11')
hc = HaloCatalog(data_ds=data_ds, finder_method='hop')
hc.create()

as I'm writing this, the script has been running on one CPU for 30min and counting, I have no idea wether that's expected for a 500Mb dataset or if it means I'm actually stuck in an infinite loop.

edit: I used a continue instead of a break at first... so yeah I had an infinite loop. This is fixed now, and the script takes less than 5min to run

@neutrinoceros
Copy link
Member Author

this is clearly broken. Switching to draft for now

@neutrinoceros neutrinoceros marked this pull request as draft February 18, 2022 15:42
@neutrinoceros neutrinoceros added the code frontends Things related to specific frontends label Feb 18, 2022
df = cls(
self.dataset, self.io, template % {"num": i}, fi, (start, end)
)
except FileNotFoundError:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that other frontends will directly benefit from this without changes because they call open or h5py.File, which both raise FileNotFoundError naturally.

@neutrinoceros
Copy link
Member Author

@matthewturk I think the codetour watch workflow is broken (and probably always was)
Seeing that it's been directly copied from their documentation, my guess is that we never set up the required repo secret properly.

@brittonsmith
Copy link
Member

I don't think this is the right solution. If we are breaking out of this loop because a file is not found, then we are ignoring data files (and hence, particles) associated with this dataset. For Issue #2819, it looks like the directory of the data files is getting lost somehow. I think that's the thing that needs to be fixed.

@neutrinoceros
Copy link
Member Author

So the way I see it there are two solutions:

  • tag the issue as wontfix and close (the error message is already clear enough ?)
  • change this PR to add a warning that data is missing.

Personally I would still favour the second approach because I see no harm in allowing informed users to work with partial datasets. Your call :)

@brittonsmith
Copy link
Member

I think I'm only now fully understanding the original issue and I'm coming around to agreeing with your solution. I agree we should allow users to operate on incomplete datasets if they want to and if they understand that that is what is happening. I'm happy with your second option of adding a warning message.

@neutrinoceros
Copy link
Member Author

@brittonsmith there you go !

@neutrinoceros neutrinoceros removed the code frontends Things related to specific frontends label Feb 24, 2022
@matthewturk matthewturk merged commit ec79686 into yt-project:main Mar 31, 2022
@neutrinoceros neutrinoceros deleted the hotfix_2819 branch March 31, 2022 13:28
@neutrinoceros neutrinoceros added this to the 4.0.3 milestone Mar 31, 2022
neutrinoceros pushed a commit to neutrinoceros/yt that referenced this pull request Mar 31, 2022
BUG: fix an issue where missing files would be indexed without verification
(cherry picked from commit ec79686)
@neutrinoceros
Copy link
Member Author

backported as #3881

matthewturk added a commit that referenced this pull request Apr 13, 2022
Manual backport #3816 to yt-4.0.x (BUG: fix an issue where missing files would be indexed without verification)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug in Gadget frontend
4 participants