symlog and log default behaviors #3944

Closed
chrishavlin opened this issue May 26, 2022 · 5 comments · Fixed by #3986
Labels: bug, question, release critical (Highest priority in a milestone), viz: 2D
Milestone: 4.1.0

Comments

chrishavlin commented May 26, 2022

This issue is a dedicated discussion for resolving some open questions about how to handle default normalization behavior for symlog scaling (and, IMO, for log scaling as well). On the face of it, the question is simply: when using symlog scaling, how should yt select a default value for the linthresh parameter? But answering that is tied pretty closely to an upstream matplotlib bug and our attempts at handling it...

background

With the new plot norm and colorbar handling in #3849, we have returned to the discussion of choosing a good linthresh value. Since #3849 did not originate those changes, it's worth taking a broad look at related development threads to think through how to handle these defaults. So first, here's a quick summary of related development history:

The main conceptual issue with the current linthresh default is that most cosmological simulations will have a much larger range than 3 orders of magnitude -- below I'll give some ideas for improving that. But underlying all of this, and the reason why I pulled out the development history above, is the same matplotlib bug that gave rise to the need for #3161, #3754, and #3858 in the first place; I'll highlight this below as well.

improving the linthresh default

So the current way of calculating the linthresh default, max(abs(data)) / 1000, is IMO a nice approach because it will definitely not error for any range of data. But it's not great because many (most?) of the datasets that yt handles have much more than 3 orders of magnitude in range. So if we want to keep this absolutely as simple as possible, I think we can just increase that scale to something like max(abs(data)) / 1e10 and end up with a better image for a wider range of datasets. If we're going to have a fixed default, I think we should default to one that is likely to work well for the astro datasets that yt mostly handles.
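As a minimal sketch (where data stands in for the image array as a NumPy array), the fixed default would just become:

import numpy as np

# hypothetical fixed default: divide the max absolute value by 1e10
# instead of the current 1e3 (data is a placeholder for the image array)
linthresh = np.max(np.abs(data)) / 1e10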

But we could generalize this a little better by actually checking what the dynamic range of our image array is and scaling by a value appropriate to the data range with some cutoff:

import numpy as np

max_dynamic_range = 1e15
max_data_val = np.max(np.abs(data))
min_nonzero = np.min(np.abs(data[data != 0]))
# use the smallest nonzero magnitude as linthresh, unless the implied
# dynamic range (a ratio of max to min, not a difference) exceeds the cutoff
if max_data_val / min_nonzero < max_dynamic_range:
    linthresh = min_nonzero
else:
    linthresh = max_data_val / max_dynamic_range

It's not much more complicated than the current approach and would result in nice images for more datasets and fields than a fixed value would.
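For context, here's roughly where that value ends up (a sketch, not yt's actual plumbing): the computed linthresh is what ultimately feeds matplotlib's SymLogNorm, which is linear within +/- linthresh and logarithmic outside it. The vmin/vmax here are placeholder bounds:

import matplotlib.colors as mcolors

# sketch: hand the computed linthresh to the symlog norm;
# vmin/vmax stand in for the image array's actual range
norm = mcolors.SymLogNorm(linthresh=linthresh, vmin=-1e12, vmax=1e12, base=10)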

when the range is too large....

So the solution above with a cutoff is suspiciously similar to the logic that was introduced by #3161... because I'm 99.9% certain we are still seeing the same matplotlib bug. When you use min(abs(data[data!=0])) with a very large range of data, you can end up with rounding errors in calculations like max_data_val - min(abs(data[data!=0])) that result in matplotlib crashes. This is true in both log and symlog normalizations, and it still persists in yt even with the current linthresh default. For example, I took the simple reproducer script from #3858, modified it slightly, and it again errors deep in matplotlib's internal scaling methods:

import numpy as np
import yt

# a tiny positive value (near the smallest subnormal double) everywhere,
# plus a single huge negative value, gives an extreme dynamic range
shape = (64, 64, 1)
arr = np.full(shape, 5.0e-324)
arr[0, 0] = -1e12
d = {"scalar": arr}

ds = yt.load_uniform_grid(d, shape)
p = yt.SlicePlot(ds, "z", ("stream", "scalar"))
p.show()

the final bit of the stack trace is:

~/miniconda3/envs/yt_dev/lib/python3.7/site-packages/matplotlib/colors.py in inverse(self, value)
   1552             t_vmin, t_vmax = self._trf.transform([self.vmin, self.vmax])
   1553             if not np.isfinite([t_vmin, t_vmax]).all():
-> 1554                 raise ValueError("Invalid vmin or vmax")
   1555             value, is_scalar = self.process_value(value)
   1556             rescaled = value * (t_vmax - t_vmin)

ValueError: Invalid vmin or vmax

This errors with the latest matplotlib (3.5.2), on yt main, and on the new color norm branch. But it is the same error we saw previously... because we're not actually using the upstream fix. The upstream matplotlib fix relies on a new keyword argument to imshow (interpolation_stage), which we are not using. What I'm not sure about is whether it would be better to work on using this new argument or to just use default ranges that are not susceptible to this bug (like in the suggestions above).

summary

So in the short term, I think we should simply adjust the default linthresh value. My preference is for the more dynamic option I gave above, but I'm certainly open to other ideas!

In the slightly longer term, I'll try to see if we can actually use this new keyword argument -- it's easy enough to add to our imshow calls if the matplotlib version is sufficient (see the sketch below), but I'm not sure how/if we can use it directly with the matplotlib.colors norm objects. Using sensible defaults may be the best approach...
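A rough sketch of what that version gate could look like, assuming ax, image_data, and norm are placeholders for whatever yt's plotting layer actually holds (interpolation_stage first appeared in matplotlib 3.5):

import matplotlib
from packaging.version import Version

imshow_kwargs = {}
if Version(matplotlib.__version__) >= Version("3.5"):
    # interpolate after colormapping ("rgba") rather than in data space,
    # which, as I understand it, is what the upstream fix relies on
    imshow_kwargs["interpolation_stage"] = "rgba"

ax.imshow(image_data, norm=norm, **imshow_kwargs)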

chrishavlin (Contributor, Author) commented:

pinging @neutrinoceros and @chummels as the most interested parties, but all opinions are welcome!

@neutrinoceros neutrinoceros added this to the 4.1.0 milestone May 26, 2022
@neutrinoceros neutrinoceros added the release critical Highest priority (in a milestone) label May 26, 2022
chrishavlin (Contributor, Author) commented:

> In the slightly longer term, I'll try to see if we can actually use this new keyword argument ... but I'm not sure how/if we can use it directly with the matplotlib.colors norm objects.

Just a note: I was playing with this right now and I think just adding the keyword to our imshow calls should work; we wouldn't need to adjust anything with the Normalize objects. But I'm still not sure that's the best design choice... will keep playing with it.

neutrinoceros (Member) commented:

Thank you Chris for opening this discussion, you did a great job of summarising the issues.
I agree that we should use the upstream fix when it's available (with sufficiently recent versions of MPL) if it can be done, and otherwise provide better defaults.
Thanks a lot for showing that even the current, supposedly more robust, defaults are not 100% bulletproof. For the record, I added some tests for this category of edge cases in #3888. I can get behind any proposed new default that still survives these tests :)

chummels (Member) commented:

Thank you very much for opening this discussion and summarizing this somewhat complicated history, @chrishavlin. It's great to get this kind of context in an issue description! If MPL has provided a solution for these sorts of situations, I suppose it makes sense to use it. But I also agree with the solution you provided above, limiting the dynamic range to 1e15 rather than the current default of 1e3. I think that's a reasonably easy fix in the short term.

neutrinoceros (Member) commented:

Agreed, so can we fix this as a follow-up to #3849?
