symlog and log default behaviors #3944
pinging @neutrinoceros and @chummels as the most interested parties, but all opinions are welcome!
just a note, was playing with this right now and I think just adding the keyword to our imshow calls should work, we wouldn't need to adjust anything with the norm objects
Thank you Chris for opening this discussion, you did a great job of summarising the issues.
Thank you very much for opening this discussion and summarizing this somewhat complicated history, @chrishavlin. It's great to get this kind of context in a PR description! If MPL has provided a solution for these sorts of situations, I suppose it makes sense to use it. That said, I also like the solution you provided above of limiting the dynamic range to 1e15, rather than the current default of 1e3. I think that's a reasonably easy fix in the short term.
Agreed, so can we fix this as a follow-up to #3849?
This issue is dedicated to resolving some open questions about default normalization behavior, primarily for symlog but IMO also for log scaling. On the face of it, the question is simply: when using symlog scaling, how should yt select a default value for the linthresh parameter? But answering that is actually tied pretty closely to an upstream matplotlib bug and our attempts at handling it...
background
With the new plot norm and colorbar handling in #3849, we have returned to the discussion of choosing a good linthresh value. Since #3849 did not originate those changes, it's worth taking a broad look at related development threads to think through how to handle these defaults. So first, here's a quick summary of the related development history:
- A matplotlib bug caused errors in imshow when using the default log scale. This was addressed by switching to symlog when certain conditions are met (0-values in the data, data range above a cutoff value, 3.3 <= matplotlib version < 3.5) (fixes in [bugfix] Fix log plots when using matplotilb 3.3.x if there are zeros in the plot #3161, BUG: add missing MPL version check to a temporary hack introduced in #3161 #3754).
- The upstream bug was later fixed in matplotlib (though it is in fact only partly fixed, see later on...), and the symlog linthresh was given a default value of max(abs(data)) / 1000.

The main conceptual issue with the current linthresh default is that most cosmological simulations will have a much larger range than 3 orders of magnitude -- below I'll give some ideas for improving that. But underlying all of this, and the reason why I pulled out the development history above, is the same matplotlib bug that gave rise to the need for #3161, #3754 and #3858 in the first place; I'll highlight this below as well.
improving the linthresh default
So the current way of calculating the linthresh default, max(abs(data)) / 1000, is IMO a nice approach because it will definitely not error for any range of data. But it's not great, because many (most?) of the datasets that yt handles have much more than 3 orders of magnitude in range. So if we want to keep this absolutely as simple as possible, I think we can just increase that scale to something like max(abs(data)) / 1e10 and end up with a better image for a wider range of datasets. If we're going to have a fixed default setting, I think we should default to one that is likely to work well for the astro datasets that yt mostly handles.

But we could generalize this a little better by actually checking the dynamic range of our image array and scaling by a value appropriate to that range, with some cutoff: it's not much more complicated than the current approach and would result in nice images for more datasets and fields than a fixed value.
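A sketch of what that dynamic-range approach could look like (the helper name `default_linthresh` and the 1e15 cutoff are illustrative, not yt API; the cutoff echoes the 1e15 dynamic-range limit mentioned above):

```python
import numpy as np

def default_linthresh(data, max_dynamic_range=1e15):
    # Hypothetical helper: pick a symlog linthresh from the data's
    # nonzero dynamic range, capped to avoid floating-point rounding
    # issues when the range is extreme.
    nonzero = np.abs(data[data != 0])
    max_abs = nonzero.max()
    min_abs = nonzero.min()
    if max_abs / min_abs > max_dynamic_range:
        # range too large: cap it so linthresh stays a "safe" distance
        # (in floating-point terms) below the maximum value
        return max_abs / max_dynamic_range
    # otherwise the smallest nonzero magnitude is a natural threshold
    return min_abs
```

When the data's range is modest, this behaves like using the smallest nonzero value directly; only in the extreme-range case does the cutoff kick in.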
when the range is too large....
So the solution above with a cutoff is suspiciously similar to the logic that was introduced by #3161... because I'm 99.9% certain we are still seeing the same matplotlib bug. When you use min(abs(data[data != 0])) with a very large range of data, you can end up with rounding errors in calculations like max_data_val - min(abs(data[data != 0])) that result in matplotlib crashes. This is true in both log and symlog normalizations, and it still persists in yt even with the current linthresh default. For example, I took the simple reproducer script from #3858 and modified it slightly, and it again errors deep in matplotlib's internal scaling methods (reproducer script and the final bit of the stack trace omitted here).
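(The reproducer and stack trace themselves are not shown here.) The underlying floating-point failure mode is easy to demonstrate in isolation; this is a minimal illustration of the rounding behavior, not the actual matplotlib code path:

```python
# float64 carries ~16 significant decimal digits, so a nonzero minimum
# many orders of magnitude below the maximum is absorbed entirely in a
# subtraction like max_data_val - min(abs(data[data != 0])):
vmax = 1e30   # e.g. the maximum of a wide-dynamic-range field
vmin = 1e-30  # e.g. the smallest nonzero |value| in the same field
assert vmax - vmin == vmax  # vmin contributes nothing to the result
```

Downstream scaling code that expects `vmax - vmin` to differ from `vmax` (or relies on similar differences being nonzero) can then misbehave.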
This errors with the latest matplotlib (3.5.2), on yt main and on the new color norm branch. But it is the same error we saw previously... because we're not actually using the upstream fix. The upstream matplotlib fix relies on a new keyword argument to imshow (interpolation_stage, link), which we are not using. What I'm not sure about is whether it would be better to work on using this new argument or just use default ranges that are not susceptible to this bug (like in the suggestions above).

summary
So in the short term, I think we should simply adjust the default linthresh value. My preference is for the more dynamic option I gave above, but certainly open to other ideas!
In the slightly longer term, I'll try to see if we can actually use this new keyword argument -- it's easy enough to add it to our imshow calls if the matplotlib version is sufficient, but I'm not sure how/if we can use it directly with the matplotlib.colors norm objects. Using sensible defaults may be the best approach...