Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating derived fields with lambda functions #3434

Closed
forrestglines opened this issue Jul 13, 2021 · 9 comments · Fixed by #3440
Closed

Creating derived fields with lambda functions #3434

forrestglines opened this issue Jul 13, 2021 · 9 comments · Fixed by #3440
Labels
api-consistency naming conventions, code deduplication, informative error messages, code smells... bug UX user-experience

Comments

@forrestglines
Copy link
Contributor

Bug report

Bug summary

Adding fields using a lambda function fails to add the field to a dataset. Trying to access created fields that use a lambda function results in a YTFieldNotFound exception.

Code for reproduction

import yt

ds = yt.testing.fake_amr_ds(fields=["density"],units=["g/cm**3"])

for i in range(3):
    ds.add_field(
            name=("gas", f"field_{i}"),
            function=lambda field,data : i*data["density"],
            sampling_type="local",
            units="g/cm**3"
            )

for i in range(3):
    name = ("gas",f"field_{i}")
    print(f"{name} in ds.field_info:", name in ds.field_info) #All false

Actual outcome

yt : [INFO     ] 2021-07-13 16:50:44,250 Parameters: current_time              = 0.0
yt : [INFO     ] 2021-07-13 16:50:44,250 Parameters: domain_dimensions         = [32 32 32]
yt : [INFO     ] 2021-07-13 16:50:44,250 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2021-07-13 16:50:44,250 Parameters: domain_right_edge         = [1. 1. 1.]
yt : [INFO     ] 2021-07-13 16:50:44,250 Parameters: cosmological_simulation   = 0.0
('gas', 'field_0') in ds.field_info: False
('gas', 'field_1') in ds.field_info: False
('gas', 'field_2') in ds.field_info: False

None of the fields defined with lambda functions get added to field_info. No errors or exceptions are reported by add_field. But when I try to access the supposedly created fields, I get a YTFieldNotFound exception.

Expected outcome

Lambda functions should work with add_field like other function objects. Or it should be documented that they don't work.

I remember running into this issue once before several years ago when I was just starting out with yt, it might have existed for a while.

Version Information

  • Operating System: Linux
  • Python Version: Python 3.9.5
  • yt version: Fork of 4.0.dev0, installed from source
  • Other Libraries (if applicable):
@welcome
Copy link

welcome bot commented Jul 13, 2021

Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us to gather relevant information to help diagnosing and fixing the issue.

@matthewturk
Copy link
Member

Hi @forrestglines -- can you verify that if the field is defined as a function (potentially in its own closure), it does get added?

@forrestglines
Copy link
Contributor Author

Yes - I tried out a few ways of adding the fields

#Add the fields with functions defined in loop - works
for i in range(3,6):
    def _field_i(field,data):
        return np.full_like(data["density"],i)
    ds.add_field(
            name=("gas", f"field_{i}"),
            function=_field_i,
            sampling_type="local",
            units="g/cm**3"
            )
#Add fields
def _field_6(field,data):
    return np.full_like(data["density"],6)

ds.add_field(
    name=("gas", "field_6"),
    function=_field_6,
    sampling_type="local",
    units="g/cm**3"
        )

for i in range(7):
    name = ("gas",f"field_{i}")
    print(f"{name} in ds.field_info:", name in ds.field_info)

    if name in ds.field_info:
        print(f"  {name} mean = {ds.all_data()[name].mean()}")
('gas', 'field_3') in ds.field_info: True
  ('gas', 'field_3') mean = 3.0 g/cm**3
('gas', 'field_4') in ds.field_info: True
  ('gas', 'field_4') mean = 4.0 g/cm**3
('gas', 'field_5') in ds.field_info: True
  ('gas', 'field_5') mean = 5.0 g/cm**3
('gas', 'field_6') in ds.field_info: True
  ('gas', 'field_6') mean = 6.0 g/cm**3

These all work. However, defining the functions within the for-loop works as expected in this example but in my analysis code leads to them all sharing the last added function (i.e. they would all print 5.0 in the example shown here, not all different and not all 6.0). I haven't been able to reproduce that behavior in a sharable snippet.

@neutrinoceros
Copy link
Member

Looks like a bug to me. I'm not 100% sure it that it should work with lambdas (or generally with non-function callables), though I would be inclined to think so. On the off chances that there's a good, deep reason not to add support for it, it should at the very least produce a clear error message.
I note that there isn't a single example of this in the code base or documentation so if anything it doesn't look like it was ever actively pushed.

@neutrinoceros neutrinoceros added api-consistency naming conventions, code deduplication, informative error messages, code smells... UX user-experience bug labels Jul 15, 2021
@matthewturk
Copy link
Member

My guess is that there's something yt does with the function handles that is not available to lambdas.

To make the functions work the way you're wanting to (which I recognize is a stopgap) you can try adding a layer of enclosure to them. For instance,

def _make_field(n):
    def _field_i(field,data):
        return np.full_like(data["density"],n)
    return _field_i

for i in range(3,6):
    ds.add_field(
            name=("gas", f"field_{i}"),
            function=_make_field(i),
            sampling_type="local",
            units="g/cm**3"
            )

That should do it.

@neutrinoceros
Copy link
Member

yes this is how it is done internally.
I couldn't find a solution to this, and it looks like it's even hard to distinguish a function from a lambda at runtime, but there is a way to do it, so I'll use that to explicitly forbid lambdas in a PR.

@forrestglines
Copy link
Contributor Author

To make the functions work the way you're wanting to (which I recognize is a stopgap) you can try adding a layer of enclosure to them.

This works -- both in my example and in my analysis code.

I'm still a little confused and curious as to what features of lambdas are missing which make them break as a derived field.

@forrestglines
Copy link
Contributor Author

After a bit of googling, I found another piece of the puzzle. The function only needs to be given a name other than <lambda>, even changing to lambda works. I.e.

for i in range(3):
    name = ("gas", f"field_{i}")
    _my_field = lambda field,data : np.full_like(data["density"],i)
    _my_field.__name__ = "lambda"

    ds.add_field(
            name=name,
            function=_my_field,
            sampling_type="local",
            units="g/cm**3"
            )

works fine.

The only check I can find for <lambda> in the code is on yt/fields/derived_field.py:253. If I remove the check on that line and just always invoke e[self.name], my example works as expected. However, that's probably removing something important.

@neutrinoceros
Copy link
Member

Fascinating ! @matthewturk, it seems you authored this line 8yrs ago but the commit message doesn't help much, any idea if this check could have become irrelevant ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-consistency naming conventions, code deduplication, informative error messages, code smells... bug UX user-experience
Projects
None yet
3 participants