symlog and log default behaviors #3944

Closed
chrishavlin opened this issue May 26, 2022 · 5 comments · Fixed by #3986
Labels: bug, question, release critical (Highest priority in a milestone), viz: 2D
Milestone: 4.1.0

Comments

chrishavlin commented May 26, 2022

This issue is a dedicated discussion for resolving some open questions about how to handle default normalization behavior for symlog scaling (and, IMO, for log scaling as well). On the face of it, the question is simply: when using symlog scaling, how should yt select a default value for the linthresh parameter? But answering that is tied pretty closely to an upstream matplotlib bug and our attempts at handling it...

background

With the new plot norm and colorbar handling in #3849, we have returned to the discussion of choosing a good linthresh value. Since #3849 did not originate those changes, it's worth taking a broad look at related development threads to think through how to handle these defaults. So first, here's a quick summary of related development history:

The main conceptual issue with the current linthresh default is that most cosmological simulations will have a much larger range than 3 orders of magnitude -- below I'll give some ideas for improving that. But underlying all of this, and the reason why I pulled out the development history above, is the same matplotlib bug that gave rise to the need for #3161, #3754, and #3858 in the first place; I'll highlight this below as well.

improving the linthresh default

So the current way of calculating the linthresh default, max(abs(data)) / 1000, is IMO a nice approach because it will definitely not error for any range of data. But it's not great because many (most?) of the datasets that yt handles have much more than 3 orders of magnitude in range. So if we want to keep this absolutely as simple as possible, I think we can just increase that scale to something like max(abs(data)) / 1e10 and end up with a better image for a wider range of datasets. If we're going to have a fixed default, I think we should default to one that is likely to work well for the astro datasets that yt mostly handles.
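As a minimal sketch (where data stands in for the image array as a NumPy array), the fixed default would just become:

import numpy as np

# hypothetical fixed default: divide the max absolute value by 1e10
# instead of the current 1e3 (data is a placeholder for the image array)
linthresh = np.max(np.abs(data)) / 1e10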

But we could generalize this a little better by actually checking what the dynamic range of our image array is and scaling by a value appropriate to the data range with some cutoff:

import numpy as np

max_dynamic_range = 1e15
max_data_val = np.max(np.abs(data))
min_nonzero = np.min(np.abs(data[data != 0]))
# use the smallest nonzero magnitude as linthresh, unless the implied
# dynamic range (a ratio of max to min, not a difference) exceeds the cutoff
if max_data_val / min_nonzero < max_dynamic_range:
    linthresh = min_nonzero
else:
    linthresh = max_data_val / max_dynamic_range

It's not much more complicated than the current approach and would result in nice images for more datasets and fields than a fixed value would.
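For context, here's roughly where that value ends up (a sketch, not yt's actual plumbing): the computed linthresh is what ultimately feeds matplotlib's SymLogNorm, which is linear within +/- linthresh and logarithmic outside it. The vmin/vmax here are placeholder bounds:

import matplotlib.colors as mcolors

# sketch: hand the computed linthresh to the symlog norm;
# vmin/vmax stand in for the image array's actual range
norm = mcolors.SymLogNorm(linthresh=linthresh, vmin=-1e12, vmax=1e12, base=10)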

when the range is too large....

So the solution above with a cutoff is suspiciously similar to the logic that was introduced by #3161... because I'm 99.9% certain we are still seeing the same matplotlib bug. When you use min(abs(data[data!=0])) with a very large range of data, you can end up with rounding errors in calculations like max_data_val - min(abs(data[data!=0])) that result in matplotlib crashes. This is true in both log and symlog normalizations, and it still persists in yt even with the current linthresh default. For example, I took the simple reproducer script from #3858, modified it slightly, and it again errors deep in matplotlib's internal scaling methods:

import numpy as np
import yt

# a tiny positive value (near the smallest subnormal double) everywhere,
# plus a single huge negative value, gives an extreme dynamic range
shape = (64, 64, 1)
arr = np.full(shape, 5.0e-324)
arr[0, 0] = -1e12
d = {"scalar": arr}

ds = yt.load_uniform_grid(d, shape)
p = yt.SlicePlot(ds, "z", ("stream", "scalar"))
p.show()

the final bit of the stack trace is:

~/miniconda3/envs/yt_dev/lib/python3.7/site-packages/matplotlib/colors.py in inverse(self, value)
   1552             t_vmin, t_vmax = self._trf.transform([self.vmin, self.vmax])
   1553             if not np.isfinite([t_vmin, t_vmax]).all():
-> 1554                 raise ValueError("Invalid vmin or vmax")
   1555             value, is_scalar = self.process_value(value)
   1556             rescaled = value * (t_vmax - t_vmin)

ValueError: Invalid vmin or vmax

This errors with the latest matplotlib (3.5.2), on yt main, and on the new color norm branch. But it is the same error we saw previously... because we're not actually using the upstream fix. The upstream matplotlib fix relies on a new keyword argument to imshow (interpolation_stage), which we are not using. What I'm not sure about is whether it would be better to work on using this new argument or to just use default ranges that are not susceptible to this bug (like in the suggestions above).

summary

So in the short term, I think we should simply adjust the default linthresh value. My preference is for the more dynamic option I gave above, but I'm certainly open to other ideas!

In the slightly longer term, I'll try to see if we can actually use this new keyword argument -- it's easy enough to add to our imshow calls if the matplotlib version is sufficient (see the sketch below), but I'm not sure how/if we can use it directly with the matplotlib.colors norm objects. Using sensible defaults may be the best approach...
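A rough sketch of what that version gate could look like, assuming ax, image_data, and norm are placeholders for whatever yt's plotting layer actually holds (interpolation_stage first appeared in matplotlib 3.5):

import matplotlib
from packaging.version import Version

imshow_kwargs = {}
if Version(matplotlib.__version__) >= Version("3.5"):
    # interpolate after colormapping ("rgba") rather than in data space,
    # which, as I understand it, is what the upstream fix relies on
    imshow_kwargs["interpolation_stage"] = "rgba"

ax.imshow(image_data, norm=norm, **imshow_kwargs)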

chrishavlin (Contributor, Author) commented:

pinging @neutrinoceros and @chummels as the most interested parties, but all opinions are welcome!

@neutrinoceros neutrinoceros added this to the 4.1.0 milestone May 26, 2022
@neutrinoceros neutrinoceros added the release critical Highest priority (in a milestone) label May 26, 2022
chrishavlin (Contributor, Author) commented:

> In the slightly longer term, I'll try to see if we can actually use this new keyword argument ... but I'm not sure how/if we can use it directly with the matplotlib.colors norm objects.

Just a note: I was playing with this right now and I think just adding the keyword to our imshow calls should work; we wouldn't need to adjust anything with the Normalize objects. But I'm still not sure that's the best design choice... will keep playing with it.

neutrinoceros (Member) commented:

Thank you Chris for opening this discussion, you did a great job of summarising the issues.
I agree that we should use the upstream fix when it's available (with sufficiently recent versions of MPL) if it can be done, and otherwise provide better defaults.
Thanks a lot for showing that even the current, supposedly more robust, defaults are not 100% bulletproof. For the record, I added some tests for this category of edge cases in #3888. I can get behind any proposed new default that still survives these tests :)

chummels (Member) commented:

Thank you very much for opening this discussion and summarizing this somewhat complicated history, @chrishavlin. It's great to get this kind of context in an issue description! If MPL has provided a solution for these sorts of situations, I suppose it makes sense to use it. But I also agree with the solution you provided above, limiting the dynamic range to 1e15 rather than the current default of 1e3. I think that's a reasonably easy fix in the short term.

neutrinoceros (Member) commented:

Agreed, so can we fix this as a follow-up to #3849?
