Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error if field accesses is ambiguous. #2967

Merged
merged 17 commits into from
Apr 23, 2021

Conversation

matthewturk
Copy link
Member

PR Summary

This addresses both #832 and #1467.

It instruments __setitem__ in the field info container to check if there is a possible ambiguity, which we then check against in the "guessing" portion of _get_field_info. We need to investigate the potential performance penalty for this instrumentation of __setitem__ but I hope it will be minimal.

This may also break our tests, but ideally it will not. I think it will likely break workflows where the particle type is not set, as it will throw an error for particle unions that are identical to particle types (i.e, if you just have 'nbody' particles, then 'nbody' and 'all' will be the same.)

PR Checklist

  • pass black --check yt/
  • pass isort . --check --diff
  • pass flake8 yt/
  • pass flynt yt/ --fail-on-change --dry-run -e yt/extern
  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this. I left minor comments and suggestions but as a whole I'm happy with the PR as it is :)

edit: of course my approval is a bit premature given the test failures.

yt/utilities/exceptions.py Outdated Show resolved Hide resolved
yt/fields/field_info_container.py Outdated Show resolved Hide resolved
matthewturk and others added 2 commits November 15, 2020 05:20
Co-authored-by: Clément Robert <cr52@protonmail.com>
@matthewturk matthewturk added api-consistency naming conventions, code deduplication, informative error messages, code smells... backwards incompatible This change will change behavior yt-4.0 feature targeted for the yt-4.0 release bug labels Nov 15, 2020
cphyc
cphyc previously approved these changes Nov 15, 2020
Copy link
Member

@cphyc cphyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am glad this feature is finally there, this is really great!

I am fine with this being merged with or without my comments taken into account (and assuming tests pass).

yt/fields/field_info_container.py Outdated Show resolved Hide resolved
yt/utilities/exceptions.py Outdated Show resolved Hide resolved
@cphyc
Copy link
Member

cphyc commented Nov 15, 2020

In order to magically get diffs that would fix the failing tests, you can use this script:

import sys
import os
from difflib import unified_diff

from nose.plugins import Plugin
from yt.utilities.exceptions import YTAmbiguousFieldName
from nose.plugins.plugintest import run

ROOT = os.path.abspath(__file__)

diff_file = open("diff.txt", mode="w")

class AmbiguousResolvePlugin(Plugin):
    """Plugin that takes no command-line arguments"""
    name = "ambiguous-solver"
    enabled = True

    def configure(self, options, conf):
        pass

    def options(self, parser, env={}):
        pass

    def addError(self, test, err: sys.exc_info):
        try:
            test_path = test.context.__file__
        except:
            test_path = os.path.join(
                ROOT,
                test.context.__module__.replace(".", "/")
            )
        t, v, tb = err
        if t is not YTAmbiguousFieldName:
            return

        import traceback
        # print('TYPE:', t)
        # print('VALUE:', v)
        # print('TRACEBACK:', tb)

        ambiguous_fname = v.fname
        ok = False
        for ft, _ in traceback.walk_tb(tb):
            line = open(ft.f_code.co_filename, "r").readlines()[ft.f_lineno].replace("\n", "")
            # print(f"{ft.f_code.co_filename}:{ft.f_code.co_firstlineno} {line}")
            if test_path == ft.f_code.co_filename:
                ft_err = ft
                ok = True

        if not ok:
            return

        # Now, ft contains the current frame,
        # need to correct the test!
        # print(f"error from {ft_err.f_code.co_filename}:{ft_err.f_code.co_firstlineno}:{ft_err.f_lineno}")

        with open(test_path, 'r') as f:
            lines = f.readlines()

        lineno = ft_err.f_lineno - 1

        suggested_ftype = v.possible_ftypes[-1]
        corrected = lines.copy()
        corrected[lineno] = corrected[lineno].replace(
            '"%s"' % ambiguous_fname,
            '("%s", "%s")' % (suggested_ftype, ambiguous_fname)
        ).replace(
            "'%s'" % ambiguous_fname,
            "('%s', '%s')" % (suggested_ftype, ambiguous_fname)
        )

        rel_path = os.path.relpath(test_path)
        diff_file.writelines(
            unified_diff(
                lines, corrected,
                fromfile=rel_path, tofile=rel_path
            )
        )
        diff_file.flush()


run(
    argv=["nosetests", "yt"],
    plugins=[AmbiguousResolvePlugin()]
)

Example

$ pwd
/path/to/yt/root
$ python nose_debug.py
[...]
$ cat diffs.txt
--- /path/to/yt/root/yt/yt/data_objects/level_sets/tests/test_clump_finding.py
+++ /path/to/yt/root/yt/yt/data_objects/level_sets/tests/test_clump_finding.py
@@ -42,7 +42,7 @@
     master_clump.add_validator("min_cells", 1)
 
     def _total_volume(clump):
-        total_vol = clump.data.quantities.total_quantity(["cell_volume"]).in_units(
+        total_vol = clump.data.quantities.total_quantity([("stream", "cell_volume")]).in_units(
             "cm**3"
         )
         return "Cell Volume: %6e cm**3.", total_vol
--- /path/to/yt/root/yt/yt/data_objects/level_sets/tests/test_clump_finding.py
+++ /path/to/yt/root/yt/yt/data_objects/level_sets/tests/test_clump_finding.py
@@ -105,7 +105,7 @@
     leaf_clumps = master_clump.leaves
 
     fn = master_clump.save_as_dataset(
-        fields=["density", "x", "y", "z", "particle_mass"]
+        fields=["density", ("enzo", "x"), "y", "z", "particle_mass"]
     )
     ds2 = load(fn)
[…]

As you can see with the last bit, it is far from perfect!

@cphyc
Copy link
Member

cphyc commented Nov 15, 2020

As you can see with the last bit, it is far from perfect!

Also, there is some freedom to pick which ftype should be used. By default here, it takes the last one but since it is fed from a set, well, that doesn't really matter.
Maybe one option would be to store in the exception which field yt used to infer, and set it as default in the diff that's created. It would also be a nice addition to inform the user what the old behavior was.

Note that you'll likely need to apply the following patch (if it is indeed a fix for the many fails that popped...)

--- a/yt/data_objects/derived_quantities.py
+++ b/yt/data_objects/derived_quantities.py
@@ -23,7 +23,7 @@ def get_position_fields(field, data):
             ftype = finfo.name[0]
         position_fields = [(ftype, f"particle_position_{d}") for d in axis_names]
     else:
-        position_fields = axis_names
+        position_fields = [("index", d) for d in axis_names]
 
     return position_fields

@cphyc
Copy link
Member

cphyc commented Jan 18, 2021

What's the status of this PR?

@matthewturk
Copy link
Member Author

matthewturk commented Jan 20, 2021 via email

Base automatically changed from master to main January 20, 2021 15:27
@cphyc
Copy link
Member

cphyc commented Feb 24, 2021

FYI I have applied the nose trick in #3092 to catch as many ambiguous access calls as possible. Apart from removing ambiguous accesses, is there something more you want to add in this PR @matthewturk ?

@jzuhone
Copy link
Contributor

jzuhone commented Feb 25, 2021

@matthewturk do you want help with this? I can look at the errors

@cphyc
Copy link
Member

cphyc commented Feb 26, 2021

@matthewturk do you want help with this? I can look at the errors

@jzuhone I've investigated a bit the issue(s) and the problem is that most of our tests use ambiguous field accesses. For example there are hundreds of occurences of ad["x"], but x is provided by ("gas", "x"), ("index", "x"), ("stream", "x") so each of one will result in an error. I have came up with an automated way of fixing them (using the type inferred by yt to replace the ambiguous line), which is located here #3101 .

@munkm munkm mentioned this pull request Mar 12, 2021
30 tasks
@cphyc cphyc mentioned this pull request Apr 12, 2021
2 tasks
cphyc added a commit to cphyc/yt that referenced this pull request Apr 12, 2021
Copy link
Member

@munkm munkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now I think this looks good!

In the triage meeting today we decided to add an interim step for the ambiguous fields that for 4.0 we will issue a warning about the field being ambiguous to allow users to migrate scripts. In 4.1 the error will start and hopefully by then users will have shifted over scripts.

@cphyc
Copy link
Member

cphyc commented Apr 16, 2021

@yt-fido test this please

@cphyc
Copy link
Member

cphyc commented Apr 16, 2021

I am expected one unit test to fail due to the pixelization routines relying on ambiguous fields that I couldn't fix in #3101.

[EDIT:] it failed indeed!

@cphyc cphyc requested review from cphyc, munkm and neutrinoceros and removed request for cphyc April 20, 2021 13:55
@cphyc
Copy link
Member

cphyc commented Apr 20, 2021

Note for other reviewers: I have updated the PR following @munkm comments about throwing a warning rather than an error until 4.1.0 and added some more tests.

@cphyc cphyc dismissed their stale review April 20, 2021 14:02

I prefer to be considered a contributor than a reviewer on this one.

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the tests are using pytest, the test file should be added to the ignore lists in tests/tests.yaml and nose_unit.cfg

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a few questions regarding the test but aside from that, I think this is very well done.

yt/fields/tests/test_ambiguous_fields.py Show resolved Hide resolved
yt/fields/tests/test_ambiguous_fields.py Show resolved Hide resolved
yt/fields/tests/test_ambiguous_fields.py Show resolved Hide resolved
Copy link
Member

@chummels chummels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this out, and it works great. I just have the requested comment above about informing the user what field was actually used.

"is ambiguous and corresponds to any one of "
f"the following field types\n {self.field_info._ambiguous_field_names[fname]}. "
"Please specify the requested field as an explicit "
"tuple (ftype, fname)."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be beneficial to inform the user which field it defaulted to in this pass, by including something like: "Defaulting to ('ftype', 'fname')."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good now!

yt/data_objects/static_output.py Outdated Show resolved Hide resolved
yt/data_objects/static_output.py Outdated Show resolved Hide resolved
yt/data_objects/static_output.py Outdated Show resolved Hide resolved
yt/data_objects/static_output.py Outdated Show resolved Hide resolved
yt/data_objects/static_output.py Outdated Show resolved Hide resolved
yt/data_objects/static_output.py Outdated Show resolved Hide resolved
Copy link
Member

@munkm munkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking great now!

@neutrinoceros neutrinoceros merged commit d207f80 into yt-project:main Apr 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-consistency naming conventions, code deduplication, informative error messages, code smells... backwards incompatible This change will change behavior bug yt-4.0 feature targeted for the yt-4.0 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants