Feature: Find datadicts matching a set of conditions #379

Closed

Contributor

@yoshi74ls181 yoshi74ls181 commented Feb 20, 2023

This pull request adds a function, plottr.data.datadict_storage.search_datadicts, which returns an iterator over the datadicts matching a set of conditions.
The following conditions are currently supported:

  • since: Date (and time) in the format YYYY-mm-dd (or YYYY-mm-ddTHHMMSS).
  • until: Date (and time) in the format YYYY-mm-dd (or YYYY-mm-ddTHHMMSS). If not given, defaults to until = since.
  • name: Name of the dataset (if not given, match all datasets).

For convenience, I've also added a function, plottr.data.datadict_storage.search_datadict, which asserts that there is only one matching datadict.
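
To make the since/until conditions concrete, here is a minimal sketch of how the two accepted formats could be parsed into a search window. This is not the code from the pull request: the helper names parse_timestamp and search_window are hypothetical, and treating a date-only until as covering the whole day is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def parse_timestamp(text: str) -> Tuple[datetime, bool]:
    """Parse 'YYYY-mm-ddTHHMMSS' or 'YYYY-mm-dd'; return (datetime, has_time)."""
    try:
        return datetime.strptime(text, "%Y-%m-%dT%H%M%S"), True
    except ValueError:
        # Fall back to the date-only format (raises ValueError if that fails too).
        return datetime.strptime(text, "%Y-%m-%d"), False


def search_window(since: str, until: Optional[str] = None) -> Tuple[datetime, datetime]:
    """Return an inclusive (start, end) window; until defaults to since."""
    if until is None:
        until = since
    start, _ = parse_timestamp(since)
    end, end_has_time = parse_timestamp(until)
    if not end_has_time:
        # Assumption: a date-only upper bound extends to the end of that day.
        end = end + timedelta(days=1) - timedelta(seconds=1)
    return start, end


print(search_window("2023-03-17"))
print(search_window("2023-03-17T200540"))
```

With these assumptions, a date-only since selects the whole day, while a full timestamp selects a single instant, which fits a lookup of one specific dataset.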

@yoshi74ls181 yoshi74ls181 reopened this Feb 21, 2023
@yoshi74ls181
Contributor Author

yoshi74ls181 commented Feb 21, 2023

Resolved a merge conflict with #375.

@marcosfrenkel
Collaborator

I really like this feature! But at the moment, if the search encounters any invalid data (the writer always creates a file, even if nothing is inside it), the whole search fails. Because of this, it is hard to test on my end.

I am also a little unsure whether search_datadicts should return a generator rather than a list of all the matching datadicts. The generator makes sense since the datadicts might be big, but offering both the generator and a function that returns a list might be a good idea too, and shouldn't take much effort. @wpfff what do you think?
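
As a rough illustration of the "both" option, a list-returning variant can be a thin wrapper around the generator. The sketch below uses a hypothetical stand-in generator (fake_search) instead of the real search_datadicts so that it runs without any data on disk; all names in it are made up for the example.

```python
from typing import Any, Dict, Iterator, List, Tuple

Match = Tuple[str, Dict[str, Any]]


def fake_search(records: List[Match], name: str) -> Iterator[Match]:
    """Hypothetical stand-in for a generator-based search: yields matches lazily."""
    for foldername, datadict in records:
        if name in foldername:
            yield foldername, datadict


def fake_search_list(records: List[Match], name: str) -> List[Match]:
    """List-returning wrapper: consumes the generator and holds every match in memory."""
    return list(fake_search(records, name))


records = [
    ("2023-03-17T200540_test", {"x": [1, 2, 3]}),
    ("2023-03-17T200541_other", {"x": [4, 5, 6]}),
]
print(fake_search_list(records, "test"))
```

The wrapper is a single line, but it gives up the generator's main benefit: all matching datadicts are loaded at once.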

@yoshi74ls181
Contributor Author

Thanks! I think I've resolved the error you encountered by fixing a bug in datadict_from_hdf5. Could you test this again?

@yoshi74ls181
Contributor Author

Added the following search conditions:

  • only_complete: Only return datadicts tagged as complete. Defaults to True.
  • skip_trash: Skip datadicts tagged as trash. Defaults to True.
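
A minimal sketch of how these two flags could be applied to a dataset folder. It assumes that tags are stored as empty marker files named __complete__.tag and __trash__.tag inside the folder; that file layout and the helper names has_tag and passes_tag_filters are assumptions for illustration, not taken from this pull request.

```python
import tempfile
from pathlib import Path


def has_tag(folder: Path, tag: str) -> bool:
    """Assumed convention: a tag is an empty file '__<tag>__.tag' in the folder."""
    return (folder / f"__{tag}__.tag").exists()


def passes_tag_filters(folder: Path, only_complete: bool = True,
                       skip_trash: bool = True) -> bool:
    """Return True if the folder should be included in the search results."""
    if only_complete and not has_tag(folder, "complete"):
        return False
    if skip_trash and has_tag(folder, "trash"):
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "dataset"
    folder.mkdir()
    print(passes_tag_filters(folder))            # no tags yet
    (folder / "__complete__.tag").touch()
    print(passes_tag_filters(folder))            # tagged complete
    (folder / "__trash__.tag").touch()
    print(passes_tag_filters(folder))            # tagged complete and trash
```

With both defaults set to True, an untagged or trashed dataset is silently excluded, which is worth keeping in mind when a search unexpectedly returns nothing.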

@marcosfrenkel
Collaborator

marcosfrenkel commented Mar 16, 2023

Hello, sorry for the late response; it's been a busy couple of weeks.

I remember being able to test this, but no matter what I try now, the generator is always empty. @yoshi74ls181 could you give me an example of how it is supposed to be used?

@yoshi74ls181
Contributor Author

No worries! Sorry about flooding you with many pull requests recently, I don't mean to rush you at all.

Here's a usage example:

from plottr.data.datadict_storage import DataDict, DDH5Writer, search_datadicts, search_datadict

basedir = "C:\\plottr-data"

# create two datasets
data = DataDict(x=dict(), y=dict(axes=["x"]))
with DDH5Writer(data, basedir, name="test") as writer:
    writer.add_data(x=[1, 2, 3], y=[1, 2, 3])
data = DataDict(x=dict(), y=dict(axes=["x"]))
with DDH5Writer(data, basedir, name="test") as writer:
    writer.add_data(x=[1, 2, 3], y=[3, 2, 1])

# print all datasets named "test" from today
for foldername, datadict in search_datadicts(basedir, "2023-03-17", name="test"):
    print(foldername, datadict["x"]["values"], datadict["y"]["values"])

# print just the newest one
foldername, datadict = search_datadict(basedir, "2023-03-17", name="test", newest=True)
print(foldername, datadict["x"]["values"], datadict["y"]["values"])

# print the one with specific date and time
foldername, datadict = search_datadict(basedir, "2023-03-17T200540", name="test")
print(foldername, datadict["x"]["values"], datadict["y"]["values"])

@wpfff
Contributor

wpfff commented Mar 20, 2023

@yoshi74ls181 off-topic, but i couldn't find a way to message you in a different way :)
it was great meeting you at the APS meeting! could you maybe let me know your email address? (you can email me directly at wpfaff at illinois dot edu)

@yoshi74ls181
Contributor Author

@wpfff Have you received my email? I'm worried that it might have ended up in your spam folder because I sent it from my personal gmail account (I lost access to my university email when I graduated). No worries if it's just that you've been busy.

@wpfff
Contributor

wpfff commented May 23, 2023

this function is useful, and we have a similar one in our lab code -- but i'm not sure it should be part of plottr itself.
there's a few conceptual issues:

  • it's hard to make this usable from the monitr gui
  • it assumes a particular way of data naming/storing, which we don't want to enforce in the package (currently you can easily change how naming works by making your own data writer, and everything else will keep working)

we're currently thinking on how to filter better in monitr, but we're not sure yet on the correct approach.
I'm closing this for now, and we can re-open if needed.

@wpfff wpfff closed this May 23, 2023