BUG: Uninitialized memory in Athena++ selection #3619
Comments
This should not ever actually run in parallel, however.
Ah. Then I have no clue what's happening.
I'm going to close #3620, but I'd ask that you not delete the branch, as we may want to re-open it. I'll follow up here momentarily.
OK, works for me!
OK, so what I'm concerned about with disabling this test is that we're going to see it crop up somewhere else. What we've seen in the past is that an error comes in, shows up in one test, and then once that test is disabled, another one breaks; I'm skeptical that this is the problematic test. I'll start by saying that a full-on valgrind-level audit is probably not workable, but we can likely run the tests with the Cython optimizations all off, which could turn up errors that are currently being swallowed. I think what we should do is first check whether the failing test (which I know doesn't fail every time) is sensitive to the ordering of the tests, then run with all Cython optimizations off and look for runtime errors. Or maybe in reverse order.
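For concreteness, here is a minimal sketch of what "Cython optimizations all off" could look like: rebuilding a single extension with all safety directives enabled, so out-of-bounds and uninitialized-memory access raise Python errors instead of being swallowed. The module name `selection_routines.pyx` is a placeholder, not yt's actual build setup.

```python
# setup_debug.py -- hypothetical one-off rebuild of a Cython extension with
# safety checks on; run with: python setup_debug.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        "selection_routines.pyx",  # placeholder: the extension under suspicion
        compiler_directives={
            "boundscheck": True,       # raise IndexError on out-of-bounds access
            "wraparound": True,        # handle negative indices the Python way
            "initializedcheck": True,  # error on use of unset memoryviews
            "cdivision": False,        # raise ZeroDivisionError instead of C UB
        },
    ),
)
```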
AFAICT athena_pp runs in a separate thread every time, i.e. there are no tests that could influence the result except for yt/frontends/athena_pp/tests/test_outputs.py.
Any chance we could switch up the order of the tests in that file?
I'm 99% sure it's already the first test in that thread. They should be processed in the order they're written in tests.yaml.
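One way to probe order sensitivity, as a sketch: rerun the module with a shuffled test order under the pytest-randomly plugin (an assumption here; yt's CI drives these tests through its own runner) and see whether failures correlate with particular seeds.

```python
# order_probe.py -- sketch: shuffle test order with pytest-randomly
# (pip install pytest-randomly) and record which seeds fail.
import subprocess
import sys

for seed in range(10):
    result = subprocess.run(
        [
            sys.executable, "-m", "pytest",
            "yt/frontends/athena_pp/tests/test_outputs.py",
            f"--randomly-seed={seed}",  # fixed seed -> reproducible ordering
        ]
    )
    print(f"seed={seed} -> exit code {result.returncode}")
```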
With @Xarthisius's help, this reproduces the problem:

```python
import yt

yt.set_log_level(50)

import numpy as np
from yt.testing import (
    assert_allclose,
    assert_equal,
)
from yt.utilities.answer_testing.framework import (
    data_dir_load,
)

# dataset path for data_dir_load (unused here; load_sample fetches the data instead)
disk = "KeplerianDisk/disk.out1.00000.athdf"
ds = yt.load_sample("KeplerianDisk")
assert_equal(str(ds), "disk.out1.00000")
dd = ds.all_data()
# The analytic volume check from the test, commented out in the repro:
# vol = (ds.domain_right_edge[0] ** 3 - ds.domain_left_edge[0] ** 3) / 3.0
# vol *= np.cos(ds.domain_left_edge[1]) - np.cos(ds.domain_right_edge[1])
# vol *= ds.domain_right_edge[2].v - ds.domain_left_edge[2].v
# assert_allclose(dd.quantities.total_quantity(("gas", "cell_volume")), vol)
a = dd.quantities.total_quantity(("gas", "cell_volume"))
print(a, a.v == 43.23719530295646)
```

I'll note that this always passes on my machine, but running it via:

…

and ignoring everything before "Constructing meshes", the first error is some uninitialized memory used in …
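A sketch of that kind of invocation, wrapping the repro under valgrind's Memcheck (`repro.py` is a placeholder name for the script above; in practice a CPython suppressions file helps filter pymalloc false positives):

```python
# memcheck_repro.py -- sketch: run the repro script under valgrind Memcheck.
# Assumes valgrind is installed and the snippet above is saved as repro.py.
import subprocess
import sys

subprocess.run([
    "valgrind",
    "--tool=memcheck",
    "--track-origins=yes",  # trace each uninitialized value to its allocation
    sys.executable,
    "repro.py",
])
```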
@matthewturk @Xarthisius don't hesitate to edit this issue's title or labels as you make progress. If we can rule out parallelism issues, then it should be triaged again.
This has likely been resolved by #4562.
Bug report
Bug summary
There currently seems to be an issue with a single test that I've seen failing repeatedly these days. Namely, it is
yt/frontends/athena_pp/tests/test_outputs.py::test_disk
and the failing assertion is at yt/yt/frontends/athena_pp/tests/test_outputs.py, line 31 in 9a6def9.
We can see from the recent history that the call to total_quantity returns random results. Note that all the examples above are from running #3489, which is clearly unrelated to the failing code, but I've seen it for other PRs as well; it's hard to find another example within the recent history, because most jobs we still have logs for failed for different reasons or were cancelled.
I assume this is a parallelism bug, given that total_quantity is a TotalQuantity object, derived from DerivedQuantity, and worker synchronisation issues seem like a plausible explanation for getting random results from an otherwise deterministic function; see yt/yt/data_objects/derived_quantities.py, line 55 in 69ada92.
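As a toy illustration of that suspected failure mode, and not yt's actual code: an unsynchronized reduction in which workers accumulate into a shared total can lose updates and return a different result on each run.

```python
# race_demo.py -- toy unsynchronized reduction; the read-modify-write in
# "total += 1" is not atomic, so concurrent updates can be lost.
import threading

total = 0

def worker(n):
    global total
    for _ in range(n):
        total += 1  # racy: load, add, and store can interleave across threads

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Often prints less than 400000, varying between runs (how often depends on
# the interpreter version and timing).
print(total)
```

The thread version is only an analogy; the point is that an otherwise deterministic sum becomes run-to-run nondeterministic once partial results are combined without synchronisation.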
I think the breaking test isn't worth blocking PRs and consuming many hours of CI on resubmissions, so I'm going to open a PR with the problematic line deactivated, but it won't close this issue.