
ENH: Allow stream frontends to accept callables #3421

Merged: 16 commits, Sep 23, 2022

Conversation

matthewturk (Member)

PR Summary

At present, all the stream frontends can only accept actual, instantiated arrays. For instance, this means that any call to load_uniform_grid requires that the entire array already be present in memory.

This isn't a hard-and-fast requirement of the Stream frontend -- in fact, it's the opposite of the Stream frontend's original purpose! -- but it has evolved to be the only way to load data into it. This PR starts the process of reversing that, enabling functions to be supplied instead.

While this is somewhat useful at present, in the long run it should allow us to expose a much simpler interface for prototyping frontends and for supplying data to the Stream frontend, without writing a fully-fledged frontend and without requiring that the data be loaded in advance.

It is still in a draft state, and is somewhat brittle.

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@matthewturk added labels: code frontends, enhancement (Jul 8, 2021)
@matthewturk (Member Author)

@chummels I wanted to ping you on this one, as I think it would make it much easier to prototype some of the frontends you've been looking at, specifically as it might allow you to dynamically repartition grids.

@@ -177,7 +180,9 @@ def process_data(data, grid_dims=None):
         # At this point, we have arrays for all our fields
         new_data = {}
         for field in data:
-            n_shape = len(data[field].shape)
+            n_shape = 3
matthewturk (Member Author)

I wasn't incredibly happy with this. Right now, if it's callable, it assumes it's 3D. I'm wondering if maybe we could have a decorator that supplied some metadata about the field that would either be returned or would be assigned to the function object.

neutrinoceros (Member)

we could provide a @shape(...) decorator that would add a .shape attribute to callable field functions, so this line would work transparently, without change. It would work something like this:

@shape(1, 2, 3)
def my_callable_field(...):
    ...

my_callable_field.shape == (1, 2, 3)

That said, if we only really care about dimensionality (i.e., len(data[field].shape)), then we should just call data[field].ndim and have a corresponding, simpler decorator instead.

@ndim(3)
def my_callable_field(...):
    ...
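
A minimal implementation of these two proposed decorators might look like the following (a sketch; neither decorator exists in yt at this point, and the names are just the ones floated above -- the (grid, field_name) signature matches the callables used later in this thread):

def shape(*dims):
    """Attach a .shape attribute to a callable field function."""
    def decorator(func):
        func.shape = tuple(dims)
        return func
    return decorator

def ndim(n):
    """Attach a .ndim attribute to a callable field function."""
    def decorator(func):
        func.ndim = n
        return func
    return decorator

@ndim(3)
def my_callable_field(grid, field_name):
    ...

assert my_callable_field.ndim == 3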

matthewturk (Member Author)

Great idea. I personally think an ndim would be the minimum ... a more expansive one could come later.

@@ -18,6 +18,8 @@ def __init__(self, ds):
     def _read_data_set(self, grid, field):
         # This is where we implement processor-locking
         tr = self.fields[grid.id][field]
+        if callable(tr):
matthewturk (Member Author)

This, and the change below, are not the most elegant. I'm not sure I have a better solution.
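
For context, the pattern under discussion is roughly the following (a sketch of the approach, not the literal diff):

def _read_data_set(self, grid, field):
    # This is where we implement processor-locking
    tr = self.fields[grid.id][field]
    if callable(tr):
        # The stored "field" is a function rather than an array:
        # invoke it now to generate this grid's data on demand.
        tr = tr(grid, field)
    return tr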

neutrinoceros (Member)

It's typing madness, but I'd say that, however inelegant, it's at least very compact and doesn't leak out of scope, so I think it's a reasonable approach overall.

matthewturk (Member Author)

OK

cphyc previously approved these changes Jul 27, 2021
@cphyc (Member) left a comment

Awesome! Provided we add tests and docs, this would be a really good addition.

@matthewturk matthewturk changed the title [WIP] Allow stream frontends to accept callables [WIP] ENH: Allow stream frontends to accept callables Oct 9, 2021
@matthewturk (Member Author)

I've rebased this to fix the tests.

@matthewturk matthewturk changed the title [WIP] ENH: Allow stream frontends to accept callables ENH: Allow stream frontends to accept callables Aug 2, 2022
@matthewturk matthewturk added this to the 4.1.0 milestone Aug 2, 2022
@matthewturk (Member Author) commented Aug 2, 2022

@chrishavlin and I have been talking and there are a handful of changes we want to make, in addition to fixing any broken tests and adding docs and examples.

  1. Make the callables accept a well-defined set of args and kwargs.
  2. Set up "virtual" grids for them, so that you can specify directly the base size and have it decompose using the existing machinery.

Probably more, but those are the main things at the moment.

@matthewturk (Member Author)

Ok, with a little more experimentation, I've come up with this as a workable thing:

import h5py
import numpy as np

import yt
from yt.utilities.decompose import decompose_array, get_psize

# Write out a 256^3 example dataset with three fields.
with h5py.File("example.h5", "w") as f:
    f1, f2, f3 = np.mgrid[0.0:1.0:256j, 0.0:1.0:256j, 0.0:1.0:256j]
    f.create_dataset("/density", data=f1)
    f.create_dataset("/temperature", data=f2)
    f.create_dataset("/dinosaurs", data=f3)

class WeirdFunGenerator:
    def __init__(self, filename):
        self.filename = filename
        self._handle = h5py.File(filename, "r")

    def read_data(self, grid, field_name):
        # Called lazily by yt, once per grid: read only the sub-block
        # that this grid covers.
        ftype, fname = field_name
        si = grid.get_global_startindex()
        ei = si + grid.ActiveDimensions
        return self._handle[fname][si[0]:ei[0], si[1]:ei[1], si[2]:ei[2]]

wfg = WeirdFunGenerator("example.h5")

# Decompose the 256^3 domain into 16 patches.
psize = get_psize(np.array((256, 256, 256)), 16)
left_edges, right_edges, shapes, slices = decompose_array(
    (256, 256, 256), psize, np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
)

grid_data = [
    {
        "left_edge": le,
        "right_edge": re,
        "dimensions": s,
        "level": 1,
        # callables stand in for the actual arrays
        "density": wfg.read_data,
        "temperature": wfg.read_data,
        "dinosaurs": wfg.read_data,
    }
    for le, re, s in zip(left_edges, right_edges, shapes)
]
ds = yt.load_amr_grids(grid_data, [256, 256, 256])

ds.r[:, 0.5, :].plot([("gas", "density")])

which does what we want!

[image: slice plot of the resulting density field]

I think if we get docs and a wrapper function that handles all the decomp, we should be good to go. The test-case example we could use is a very simple mmap-ing of an on-disk gigantic array.
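
As a rough illustration of that test case, an mmap-backed callable might look like this (a sketch; the file name and sizes are made up, and np.memmap only reads the slices that are actually requested):

import numpy as np

# Create a small raw binary file standing in for the "gigantic" array.
shape = (64, 64, 64)
scratch = np.memmap("huge_array.dat", dtype="float64", mode="w+", shape=shape)
scratch[:] = np.random.random(shape)
scratch.flush()

# Re-open read-only; no data is loaded until a slice is taken.
big = np.memmap("huge_array.dat", dtype="float64", mode="r", shape=shape)

def read_mmap(grid, field_name):
    # yt calls this per grid; only the covered sub-block touches disk.
    si = grid.get_global_startindex()
    ei = si + grid.ActiveDimensions
    return big[si[0]:ei[0], si[1]:ei[1], si[2]:ei[2]]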

@matthewturk (Member Author)

With a little bit more work, I made this:

https://gist.github.com/84911dce25e6aac8d6ece9a3d6dbe668

which I think might be getting close to what I'd be happy with.

@matthewturk (Member Author)

I think I'd like to get two loaders into loaders.py before this is ready -- one that does the hdf5 thing, and one that accepts a dict of objects that receive slices.

For the latter, the example I was thinking of was just a set of shaped grids that return their xyz coordinates, to demonstrate it (something like the sketch below). This would also work for the tests.
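
Such a coordinate-returning callable might look like this (a hypothetical sketch; LeftEdge, RightEdge, and ActiveDimensions are standard yt grid attributes, but the function itself is made up):

import numpy as np

def coord_callable(grid, field_name):
    # Generate the cell-centered coordinate named by the field
    # ("x", "y", or "z") on the fly, with no backing store at all.
    ftype, fname = field_name
    axis = "xyz".index(fname)
    le = grid.LeftEdge.to_value("code_length")
    re = grid.RightEdge.to_value("code_length")
    dims = grid.ActiveDimensions
    edges = np.linspace(le[axis], re[axis], dims[axis] + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Broadcast the 1D coordinate array to the full 3D grid shape.
    new_shape = [1, 1, 1]
    new_shape[axis] = dims[axis]
    return np.broadcast_to(centers.reshape(new_shape), tuple(dims)).copy()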

@neutrinoceros (Member) left a comment

here's some feedback

doc/source/examining/loading_data.rst (outdated, resolved)

@matthewturk (Member Author) commented Aug 4, 2022

I have a few more things to do:

  • add an even simpler slicer loader, that we can feed mmap data into even if it's not hdf5
  • really shore up some documentation and examples
  • add some tests for this, but likely not hdf5 loader tests (yet)
  • put in stubs for ndim, but probably not implement it yet
  • Add docs for the hdf5 loader

I'd appreciate some feedback on the existing function I added for HDF5 files.

@matthewturk (Member Author)

I suspect the docs build failure is related to my stubbing of non-existent files, etc.

@matthewturk (Member Author) commented Aug 4, 2022

I made a notebook that I think I might start with for showing off how you can load data from functional forms. I also intend to add an example that just loads from individual files -- maybe replicating the enzo data loading. I got a little confused about the best way to show overlapping wavemodes without doing any whole-domain convolution, and I'm not sure I like this all that much ... any suggestions for a better way to demonstrate?

https://gist.github.com/82e13dfab8a2f0d956cce791cddb9611

(n.b., this has no narration, and I think I messed things up with the 2**i bit)

@matthewturk (Member Author)

I've spent some time looking at the slicing stuff and I think I want to hold off for the time being. I'll check that one off and we can consider doing it at a later time. I'll get the tests added for the HDF5 file.

@neutrinoceros (Member)

Given your last message, I'm going to assume it's safe to remove this from the milestone.

@neutrinoceros neutrinoceros removed this from the 4.1.0 milestone Aug 16, 2022
@matthewturk (Member Author)

Sorry, no, what I meant was: the slices don't need to go in. I think I am ready for this to go in. I want to add tests for the existing functionality, but that's all.

@matthewturk matthewturk added this to the 4.1.0 milestone Aug 16, 2022
@chrishavlin (Contributor) left a comment

Took a quick look at the current state, since the hdf5 loader was added after I last looked. In addition to the in-line questions/comments, a couple more:

  • the load_amr_grids docstring entry for grid_data should be updated to reflect that you can supply callables
  • do you intend this to work fully with load_uniform_grid as well? It should work as it is now when nproc=1, but it'll break when nproc>1 (see the guard sketched below). I think the domain decomposition when nproc > 1 could be rewritten to use the domain_dimensions argument along with how you handled nchunks in load_hdf5_file, but I'm not sure that's needed for this PR.
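
A minimal version of such a guard might look like the following (a hypothetical sketch; nprocs and data are the existing load_uniform_grid arguments, but the exact placement and wording are made up):

# Fail early: fields supplied as callables cannot (yet) be decomposed
# across processors.
if nprocs > 1 and any(callable(v) for v in data.values()):
    raise NotImplementedError(
        "Fields supplied as callables are not supported with nprocs > 1"
    )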

doc/source/examining/Loading_Data_via_Functions.ipynb (outdated, resolved)
yt/frontends/stream/definitions.py (resolved)
yt/loaders.py (outdated)
    dataset_arguments: Optional[dict] = None,
):
    """
    Create a yt dataset given the path to an hdf5 file.
chrishavlin (Contributor)

Should mention this works only with grid data and not, e.g., particles stored in an HDF5 file (unless I'm wrong about that)

yt/loaders.py (resolved)
root_node: Optional[str] = "/",
fields: Optional[List[str]] = None,
bbox: np.ndarray = None,
nchunks: int = 0,
chrishavlin (Contributor)

does nchunks=0 have meaning beyond being a flag to trigger the auto-chunking? I'm inclined to change this to

Suggested change
-    nchunks: int = 0,
+    nchunks: Optional[int] = None,

and change the nchunks logic below to match. But I don't feel strongly about it.
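
The matching logic might then read something like this (a sketch; _resolve_nchunks and its at-most-64^3-cells heuristic are made up for illustration):

from typing import Optional, Tuple

def _resolve_nchunks(nchunks: Optional[int], shape: Tuple[int, int, int]) -> int:
    # None means "choose automatically"; any explicit value is validated.
    if nchunks is not None:
        if nchunks < 1:
            raise ValueError(f"Expected nchunks >= 1, got {nchunks}")
        return nchunks
    # Stand-in heuristic: target chunks of at most 64^3 cells.
    ncells = shape[0] * shape[1] * shape[2]
    return max(1, ncells // 64**3)

print(_resolve_nchunks(None, (256, 256, 256)))  # -> 64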

@matthewturk (Member Author)

I've addressed most of the suggestions, but I am running out of gas on this for the moment. I think the biggest remaining item is to throw an error if nproc > 1 and there are any callables.

@chrishavlin @neutrinoceros take a look?

chrishavlin previously approved these changes Sep 21, 2022
@chrishavlin (Contributor) left a comment

new tests look good to me!

In addition to adding the nproc error check, I think it'd be good to update the load_hdf5 docstring to mention that it's for loading grids (for now at least? unless I'm confused...).

@neutrinoceros (Member)

No further comments from me at this point; thank you for adding tests!

@matthewturk (Member Author)

I think I've addressed them all! I also added unit stuff like @chrishavlin suggested.

@neutrinoceros (Member)

I am very confused by the type-checking error. It hardly seems related to anything we're doing here. What seems to be happening is that we're running mypy targeting Python 3.7, but using numpy 1.23, which uses some Python 3.8+ syntax; that raises an error and prevents type checking from running at all. This seems like something that should happen every time the job runs, or not at all, but I've never seen it happen before.

@neutrinoceros (Member)

Before we do anything else, I suggest upgrading mypy, in the hope that the issue (which I'm assuming is internal to mypy) has already been resolved. I'm working on this in #4139.

@jzuhone jzuhone merged commit e7bb936 into yt-project:main Sep 23, 2022
Labels: code frontends, enhancement

5 participants