ENH: perf improvements by skipping unnecessary symbolic unit computations #4071

yipihey · 2022-08-11T00:23:28Z

PR Summary

Profiled script

# t_prof.py

import yt
import numpy as np
from yt import derived_field
import unyt as u

for fn in range(5):
    ds = yt.load_sample("HiresIsolatedGalaxy")

    ds._periodicity = (False, False, False)
    v, c = ds.find_max(("gas", "density"))
    sp = ds.sphere(c, 10 * u.kpc)
    profiles = yt.create_profile(
        sp,
        "radius",
        [
            ("gas", "density"),
            ("gas", "radial_velocity"),
            ("gas", "tangential_velocity"),
            ("gas", "sound_speed"),
            ("gas", "mach_number"),
            ("gas", "temperature"),
        ],
        weight_field=("gas", "cell_mass"),
        units={"radius": "pc", "density": "amu/cm**3"},
        logs={"radius": True},
    )
    profiles.save_as_dataset("/tmp/test.h5")

Profiling and viz

python -m cProfile -o test.profile t_prof.py && snakeviz test.profile

On main: [snakeviz screenshot]

On this branch: [snakeviz screenshot]

PR Checklist

[N/A] New features are documented, with docstrings and narrative docs
[N/A] Adds a test for any bugs fixed. Adds tests for new features.

edited by @neutrinoceros with details

This is to speed up calculations of the grid coordinates x, y, z which also benefits radius calculations and other geometric operations in derived fields.

…triggering calls to unyt.

neutrinoceros · 2022-08-11T08:18:29Z

Hi @yipihey,
it looks like this branch includes the patch from #4066 and is a logical continuation of it, so I'd advise keeping this PR and closing the first one if that's okay with you.

Also, I have a good idea of what's going on because we discussed it on Slack, but it'd be good to add some context here so it's clearer to others. In particular, posting your benchmark script and showing which observable you measured (and how) would be very useful.

Thanks a lot for working on this with us !

yipihey · 2022-08-12T15:21:03Z

Ah yes. Indeed. This adds one more place where a .d is added to avoid triggering unit calculations when the outcome is known.
I'm just learning how to keep different pull requests separate from different adds and commits and much appreciate your patience!
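
The `.d` pattern mentioned here can be illustrated with a toy model. The sketch below is not unyt or yt code: `ToyQuantity`, `centers_with_units`, and `centers_raw` are hypothetical names made up for illustration, and only the attribute name `.d` is borrowed from unyt. It assumes all inputs are already known to share one unit, which is the precondition discussed in this thread.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class ToyQuantity:
    """Hypothetical stand-in for a unit-carrying array (not unyt's implementation)."""

    d: list  # raw values; the name mirrors unyt's `.d` attribute
    units: str
    unit_checks: ClassVar[int] = 0  # counts how often unit bookkeeping runs

    def __add__(self, other: "ToyQuantity") -> "ToyQuantity":
        type(self).unit_checks += 1  # every unit-aware operation pays this cost
        if self.units != other.units:
            raise ValueError(f"cannot add {self.units} and {other.units}")
        return ToyQuantity([a + b for a, b in zip(self.d, other.d)], self.units)


def centers_with_units(left_edges, half_widths):
    """Unit-aware path: one unit check per grid."""
    return [le + hw for le, hw in zip(left_edges, half_widths)]


def centers_raw(left_edges, half_widths, units):
    """Raw path: add the `.d` data directly, then reattach the known units."""
    return [
        ToyQuantity([a + b for a, b in zip(le.d, hw.d)], units)
        for le, hw in zip(left_edges, half_widths)
    ]


n_grids = 100
left = [ToyQuantity([float(i)] * 3, "code_length") for i in range(n_grids)]
half = [ToyQuantity([0.5] * 3, "code_length") for _ in range(n_grids)]

slow = centers_with_units(left, half)
checks_with_units = ToyQuantity.unit_checks

fast = centers_raw(left, half, "code_length")
checks_raw = ToyQuantity.unit_checks - checks_with_units
```

Both paths produce the same values; the difference is only in how many times the unit bookkeeping is executed, which is the cost this PR aims to avoid.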

matthewturk

This looks to me like it could be an enormous improvement in performance -- with big impacts on generating ghost zones, as well! Thanks, @yipihey . I think if we can guarantee that these will be in the same units (which I think we can) it should be good to go.

yipihey · 2022-08-12T17:54:09Z

Super @matthewturk! Yes, I imagine that as a rule we should always be able to do all computation in code_units. It should be safe to assume the users chose those units as the ones best suited for any calculation on their data fields. Then things like .in_base and similar methods should only show up when plotting labels, coloring, etc., which is done with the preferred “output” units.

neutrinoceros · 2022-08-12T18:13:30Z

Are you done or do you intend to push again here ?
In any case I will try it out this weekend with the benchmark you posted on slack !

jzuhone · 2022-08-12T19:03:37Z

Hi @yipihey,

It should be safe to assume the users chose those units as the ones best suited for any calculation on their data fields.

I would be a bit wary of assuming this in all cases. In this particular case I think it's justified, however.

yipihey · 2022-08-13T12:23:40Z

Yes. Definitely want to be careful. Perhaps it makes sense to think of it as a style guide? I.e. recommend that code units are used when doing internal calculations. Then over time we can see whether we find a case in which this would not be the best or at least equivalent choice.
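
One way to read the "be careful" caveat is as a guard: take the raw-data shortcut only when both operands are verified to carry the same units, and fall back to the unit-aware path otherwise. The sketch below uses plain tuples rather than unyt objects; `Tagged`, `convert`, `TO_CM`, and `add_lengths` are hypothetical names, and the conversion table is a toy assumption.

```python
from typing import NamedTuple


class Tagged(NamedTuple):
    """Hypothetical (values, units) pair standing in for a unit-carrying array."""

    values: tuple
    units: str


# Toy conversion factors to centimetres; an assumption for this sketch only.
TO_CM = {"cm": 1.0, "m": 100.0}


def convert(q: Tagged, units: str) -> Tagged:
    """Unit-aware (slow) path: rescale the values into `units`."""
    factor = TO_CM[q.units] / TO_CM[units]
    return Tagged(tuple(v * factor for v in q.values), units)


def add_lengths(a: Tagged, b: Tagged) -> Tagged:
    """Add two lengths, skipping conversion only when the units already match."""
    if a.units != b.units:
        b = convert(b, a.units)  # fall back to the careful path
    return Tagged(tuple(x + y for x, y in zip(a.values, b.values)), a.units)


same = add_lengths(Tagged((1.0, 2.0), "cm"), Tagged((3.0, 4.0), "cm"))
mixed = add_lengths(Tagged((1.0, 2.0), "cm"), Tagged((0.5, 1.0), "m"))
```

With a guard like this, the shortcut stays an optimization rather than an assumption: mismatched inputs still give the right answer, just more slowly.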

neutrinoceros · 2022-08-13T17:13:13Z

@yipihey I completed the original message, reran your test script, and uploaded some basic screenshots from snakeviz.
From the runs I did on my machine I see no actual perf gain so I guess I am missing something. Maybe I'm not up to date on the script you ran, or I am not looking at the right level in the profile result ? Can you check (and correct !) what I did ?

yipihey · 2022-08-13T17:51:27Z

Oh fantastic. Thanks for checking.
Yes what you tested was the original script I started with and is pointing us into places to fix up.
This first tiny pull request should only affect the creation of fcoords, widths, and volumes, but the profile of that script is dominated by very large memory allocations from other routines.

I think in the script below we should notice fewer calls to array.py in the snakeviz outputs.

# Aprof.py: Use yt to make 1-D profiles and save them to a file.
import yt

for i in range(5):
    ds = yt.load_sample("HiresIsolatedGalaxy")

    ds._periodicity = (False, False, False)  # Checking periodicity is slow in yt
    readit = ds.all_data()  # already defines xyz

    # Cell-center coordinates read directly from the fcoords array
    x = readit.fcoords[:, 0]
    y = readit.fcoords[:, 1]
    z = readit.fcoords[:, 2]

    # The same coordinates, plus cell volumes, through the field interface
    x2 = readit["index", "x"]
    y2 = readit["index", "y"]
    z2 = readit["index", "z"]
    vol = readit["index", "volume"]
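
To compare the number of calls into array.py between two runs without reading them off the snakeviz plot, the standard-library `pstats` module can tally them. The sketch below is one possible way to do that; `calls_in_file` and `workload` are hypothetical names, and `workload` is only a stand-in so the example runs without yt.

```python
import cProfile
import pstats


def calls_in_file(stats: pstats.Stats, fragment: str) -> int:
    """Sum the call counts of all functions whose source path contains `fragment`."""
    total = 0
    # stats.stats maps (filename, line, function) to
    # (primitive calls, total calls, total time, cumulative time, callers).
    for (filename, _line, _func), entry in stats.stats.items():
        if fragment in filename:
            total += entry[1]
    return total


def workload() -> list:
    """Stand-in for the profiled script; replace with a real profile file."""
    return sorted(str(i) for i in range(1000))


profiler = cProfile.Profile()
profiler.runcall(workload)
stats = pstats.Stats(profiler)

total_calls = sum(entry[1] for entry in stats.stats.values())
array_calls = calls_in_file(stats, "array.py")
print(f"{array_calls} of {total_calls} calls were in files matching 'array.py'")
```

For a real measurement, a profile written with `python -m cProfile -o test.profile ...` can be loaded with `pstats.Stats("test.profile")` and passed to the same function, once for main and once for this branch.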

neutrinoceros · 2022-08-13T21:46:46Z

yes with that script I see about 4% gain. It's more than 1%/changed line ! If it meets your goal here, I guess we can just merge as is.
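
The 4% figure is a relative gain between two timings. A minimal sketch of that arithmetic with the standard-library `timeit` module follows; `relative_gain`, `best_time`, and the two toy workloads are hypothetical names, and the real comparison would run the same script once on main and once on this branch.

```python
import timeit


def relative_gain(baseline_s: float, candidate_s: float) -> float:
    """Fraction of the baseline runtime saved by the candidate."""
    return (baseline_s - candidate_s) / baseline_s


def best_time(func, repeat: int = 5, number: int = 10) -> float:
    """Best-of-`repeat` wall time for `number` calls, to damp timing noise."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


def baseline() -> float:
    # Toy workload with an extra string round trip per element
    return sum(float(str(i)) for i in range(2000))


def candidate() -> float:
    # Toy workload producing the same result with less work
    return sum(float(i) for i in range(2000))


gain = relative_gain(best_time(baseline), best_time(candidate))
print(f"relative gain: {gain:.1%}")
```

Taking the minimum over several repeats is a common way to reduce the influence of background noise on small gains like this one.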

yipihey · 2022-08-13T21:59:37Z

Perfect. Yes. That is all for now. Thanks, Tom

yipihey · 2022-08-14T00:46:27Z

I found two more spots that are in exactly the same spirit as the previous changes. I think we should keep that in this PR.

neutrinoceros · 2022-08-14T05:35:09Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

welcome · 2022-08-14T06:20:11Z

Hooray! Congratulations on your first merged pull request! We hope we keep seeing you around! 🎆

yipihey · 2022-10-11T08:04:49Z

Yes let’s call it done for now. I think there are more similar tweaks that will be possible but it may touch more files and will take more time. Thanks! Tom

yipihey added 2 commits August 9, 2022 08:18

Update to grid_patch.py to avoid calling units

40ff988

This is to speed up calculations of the grid coordinates x, y, z which also benefits radius calculations and other geometric operations in derived fields.

Add same change to select_fwidth as we did for select_coord to avoid triggering calls to unyt.

87f011c

neutrinoceros added the enhancement and performance labels Aug 11, 2022

neutrinoceros mentioned this pull request Aug 12, 2022

Update to grid_patch.py to avoid calling units #4066 (closed)

matthewturk previously approved these changes Aug 12, 2022

neutrinoceros changed the title ~~Origin/patch 1~~ ENH: perf improvements by skipping unnecessary symbolic unit computations Aug 13, 2022

Add same change to two more files avoiding extra unit conversions.

6896ce2

yipihey dismissed matthewturk’s stale review via 6896ce2 August 14, 2022 00:44

[pre-commit.ci] auto fixes from pre-commit.com hooks

38f43fd

for more information, see https://pre-commit.ci

neutrinoceros enabled auto-merge (squash) August 14, 2022 05:39

neutrinoceros approved these changes Aug 14, 2022

neutrinoceros merged commit 19b07f4 into yt-project:main Aug 14, 2022
