Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable link time optimization #75

Merged
merged 1 commit into from
Aug 23, 2022

Conversation

mngr777
Copy link
Contributor

@mngr777 mngr777 commented Aug 16, 2022

Enabling link time optimization for h3-pg/h3 improves performance by ~12--15%.

Test query:

WITH coords AS (SELECT * from generate_series(-85, 85, 0.1) AS lat, generate_series(-180, 180, 0.2) AS lon)
SELECT h3_to_geo_boundary(h3_geo_to_h3(point(lon, lat), 8))

Added compilation flags:

  • -flto enables LTO;
  • -fvisibility=hidden allows to inline H3 library functions in h3-pg (not allowed with default visibility since definitions can change at link time in this case);
  • -fwrapv allows to inline functions like degsToRads/radsToDegs (PGXS compiles extension with -fwrapv);
  • -mpc64 mitigates effect of extened floating point precision on x86 (using x87) and also improves performance by ~2--5%

Without -mpc64, LTO (inlining specifically?) causes subtle differences in floating point calculation results on x86, several tests failing for both h3-pg and h3 library. -fexcess-presicion=standard and -ffloat-store do not solve this problem completely.
Another possible solution would be to use SSE with mfpmath=sse -msse2.

@mngr777 mngr777 marked this pull request as ready for review August 16, 2022 12:32
@Komzpa
Copy link
Contributor

Komzpa commented Aug 22, 2022

Looks good to me.

@zachasme
Copy link
Owner

Looks great! Does this finally solve #23?

And how does it relate to uber/h3#360? Should any of these be part of the Release flag upstream?

@zachasme zachasme requested review from zachasme and removed request for zachasme August 23, 2022 09:37
@zachasme zachasme self-assigned this Aug 23, 2022
@mngr777
Copy link
Contributor Author

mngr777 commented Aug 23, 2022

@zachasme
Yes, this solves #23, at least $ objdump -D h3.so | grep -E '(_square|radsToDegs)' shows no matches.
Full list of functions that are inlined or not possible to inline (with some explanation) can be obtailed with CUSTOM_COPT= <...> -fopt-info-inline-optimized-missed=inline.txt: inline.txt

The flags are related to LTO or specific to gcc, nothing to use upstream.

@zachasme zachasme merged commit 20882d9 into zachasme:master Aug 23, 2022
@zachasme
Copy link
Owner

Thanks, @mngr777! I'll merge this, and release it as part of v4 (should be in a few days).

zachasme added a commit that referenced this pull request Aug 23, 2022
@zachasme zachasme mentioned this pull request Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants