[READY] Optionally enable Link Time Optimisation (LTO) #1540

Closed
wants to merge 4 commits into from


inglor
Contributor

@inglor inglor commented Mar 4, 2021

@inglor
Contributor Author

inglor commented Mar 4, 2021

Benchmark results. The No-LTO and LTO values are the median of 5 runs, to reduce the effect of outliers. Each Δ column is LTO minus No-LTO, so a negative Δ time means LTO was faster.

Benchmarks

| IdentifierCompleterFixture CandidatesWithCommonPrefix | No-LTO iterations | No-LTO real_time | No-LTO cpu_time | LTO iterations | LTO real_time | LTO cpu_time | Δ iterations | Δ real_time | Δ cpu_time |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1/0 | 1035392 | 669.498 | 668.995 | 1331270 | 535.538 | 535.204 | 295878 | -133.96 | -133.791 |
| 16/0 | 215822 | 3305.02 | 3284.35 | 232671 | 2928.76 | 2926.72 | 16849 | -376.26 | -357.63 |
| 256/0 | 10630 | 65414.7 | 65341.5 | 11621 | 59855.2 | 59811.8 | 991 | -5559.5 | -5529.7 |
| 4096/0 | 544 | 1187060 | 1177480 | 595 | 1222740 | 1221580 | 51 | 35680 | 44100 |
| 65536/0 | 26 | 30538500 | 30265300 | 26 | 29411000 | 29354400 | 0 | -1127500 | -910900 |
| BigO | | 29.1129 | 28.8525 | | 28.9294 | 28.8733 | | -0.1835 | 0.0208 |
| RMS | | 0.0171755 | 0.0171014 | | 0.011392 | 0.0112637 | | -0.0057835 | -0.0058377 |
| 1/10 | 1037007 | 663.883 | 663.388 | 1191371 | 534.499 | 534.204 | 154364 | -129.384 | -129.184 |
| 16/10 | 216357 | 3181.41 | 3179.18 | 237600 | 2879.39 | 2877.37 | 21243 | -302.02 | -301.81 |
| 256/10 | 17681 | 39642.2 | 39610.6 | 18973 | 37010.5 | 36984.9 | 1292 | -2631.7 | -2625.7 |
| 4096/10 | 1185 | 586519 | 583039 | 1247 | 551788 | 551376 | 62 | -34731 | -31663 |
| 65536/10 | 63 | 10878800 | 10843900 | 63 | 11216800 | 11195500 | 0 | 338000 | 351600 |
| BigO | | 10.5496 | 10.5324 | | 10.6995 | 10.6792 | | 0.1499 | 0.1468 |
| RMS | | 0.0137034 | 0.0136282 | | 0.0102307 | 0.0103637 | | -0.0034727 | -0.0032645 |

| PythonSupportFixture FilterAndSortUnstoredCandidatesWithCommonPrefix | No-LTO iterations | No-LTO real_time | No-LTO cpu_time | LTO iterations | LTO real_time | LTO cpu_time | Δ iterations | Δ real_time | Δ cpu_time |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1/0 | 242177 | 2917.62 | 2888.34 | 242061 | 2925.14 | 2899.42 | -116 | 7.52 | 11.08 |
| 16/0 | 25716 | 27371.5 | 27322.6 | 23366 | 29988.9 | 29950.9 | -2350 | 2617.4 | 2628.3 |
| 256/0 | 1835 | 382566 | 382248 | 1769 | 395917 | 395599 | -66 | 13351 | 13351 |
| 4096/0 | 117 | 5872000 | 5867280 | 116 | 6080630 | 6075690 | -1 | 208630 | 208410 |
| 65536/0 | 7 | 102380000 | 102259000 | 6 | 104917000 | 104789000 | -1 | 2537000 | 2530000 |
| BigO | | 1561.69 | 1559.85 | | 1600.44 | 1598.5 | | 38.75 | 38.65 |
| RMS | | 0.010824 | 0.010778 | | 0.00819893 | 0.0081527 | | -0.00262507 | -0.0026253 |
| 1/50 | 245234 | 2878.51 | 2853.36 | 243985 | 2896.39 | 2861.34 | -1249 | 17.88 | 7.98 |
| 16/50 | 25748 | 27069.9 | 27035.5 | 25093 | 28080.9 | 28043.7 | -655 | 1011 | 1008.2 |
| 256/50 | 1939 | 359544 | 359229 | 1865 | 373679 | 373396 | -74 | 14135 | 14167 |
| 4096/50 | 126 | 5541290 | 5536810 | 120 | 5780400 | 5776280 | -6 | 239110 | 239470 |
| 65536/50 | 8 | 93582200 | 93479200 | 7 | 96732700 | 96624400 | -1 | 3150500 | 3145200 |
| BigO | | 1427.69 | 1426.11 | | 1475.77 | 1474.12 | | 48.08 | 48.01 |
| RMS | | 0.0062781 | 0.00624379 | | 0.00524658 | 0.0052236 | | -0.00103152 | -0.00102019 |

| PythonSupportFixture FilterAndSortStoredCandidatesWithCommonPrefix | No-LTO iterations | No-LTO real_time | No-LTO cpu_time | LTO iterations | LTO real_time | LTO cpu_time | Δ iterations | Δ real_time | Δ cpu_time |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1/0 | 1160462 | 600.242 | 599.783 | 1220665 | 580.282 | 579.808 | 60203 | -19.96 | -19.975 |
| 16/0 | 198516 | 3534.62 | 3531.83 | 200282 | 3513.16 | 3510.71 | 1766 | -21.46 | -21.12 |
| 256/0 | 10931 | 64062.1 | 64008.9 | 11235 | 62721.4 | 62670.8 | 304 | -1340.7 | -1338.1 |
| 4096/0 | 725 | 961166 | 960328 | 746 | 946425 | 945726 | 21 | -14741 | -14602 |
| 65536/0 | 27 | 25770400 | 25712800 | 28 | 24816800 | 24769200 | 1 | -953600 | -943600 |
| BigO | | 24.5656 | 24.5107 | | 23.6575 | 23.6122 | | -0.9081 | -0.8985 |
| RMS | | 0.0203502 | 0.0202539 | | 0.0181326 | 0.0180176 | | -0.0022176 | -0.0022363 |
| 1/50 | 1164363 | 597.419 | 596.935 | 1201266 | 578.338 | 578.001 | 36903 | -19.081 | -18.934 |
| 16/50 | 199314 | 3548.07 | 3545.31 | 199449 | 3502.68 | 3500.27 | 135 | -45.39 | -45.04 |
| 256/50 | 16977 | 41276.4 | 41243.2 | 16959 | 41089.1 | 41059 | -18 | -187.3 | -184.2 |
| 4096/50 | 1118 | 625531 | 624666 | 1093 | 660453 | 659938 | -25 | 34922 | 35272 |
| 65536/50 | 42 | 16417900 | 16385900 | 42 | 16738200 | 16706400 | 0 | 320300 | 320500 |
| BigO | | 15.6513 | 15.6208 | | 15.9574 | 15.9272 | | 0.3061 | 0.3064 |
| RMS | | 0.0189555 | 0.018896 | | 0.0152664 | 0.0151449 | | -0.0036891 | -0.0037511 |
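
The comment doesn't say how the five runs were collected. The iterations / real_time / cpu_time / BigO / RMS columns appear to match Google Benchmark's output, and Google Benchmark can repeat each benchmark and report the median itself. A minimal sketch of a helper target for that, assuming (hypothetically) that the benchmark executable target is named `ycm_core_benchmarks` and that this snippet runs after that target is defined; `benchmark_medians` is also an illustrative name:

```cmake
# Hypothetical helper: repeat every benchmark 5 times and print only the
# aggregate rows (mean, median, stddev) instead of each individual run.
# Assumes an executable target called ycm_core_benchmarks already exists.
if ( TARGET ycm_core_benchmarks )
  add_custom_target( benchmark_medians
    # Naming an executable target here makes CMake substitute its path
    # and build it before this custom target runs.
    COMMAND ycm_core_benchmarks
            --benchmark_repetitions=5
            --benchmark_report_aggregates_only=true
    COMMENT "Running benchmarks (5 repetitions, aggregates only)"
    VERBATIM )
endif()
```

Running `cmake --build . --target benchmark_medians` would then emit `_median` rows per benchmark, which correspond to the values in the tables above.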

Collaborator

@bstaletic bstaletic left a comment


Wow... Reviewable doesn't know how to render tables...

Anyway, LTO seems to help with small datasets and progressively loses ground as the workloads get heavier. The PythonSupportFixture benchmarks for really heavy workloads didn't improve, and those are our weakest points.
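
Reading the Δ columns as relative changes makes this trend easier to see. Taking Δ as LTO minus No-LTO (negative means LTO was faster) and dividing by the No-LTO time, using the real_time figures from the tables:

$$
\Delta_{\%} = \frac{t_{\text{LTO}} - t_{\text{No-LTO}}}{t_{\text{No-LTO}}} \times 100
$$

$$
\begin{aligned}
\text{IdentifierCompleterFixture},\ 1/0: &\quad \frac{535.538 - 669.498}{669.498} \times 100 \approx -20.0\% \\
\text{IdentifierCompleterFixture},\ 65536/0: &\quad \frac{29411000 - 30538500}{30538500} \times 100 \approx -3.7\% \\
\text{PythonSupportFixture (unstored)},\ 65536/0: &\quad \frac{104917000 - 102380000}{102380000} \times 100 \approx +2.5\%
\end{aligned}
$$

So the identifier completer's gain shrinks from about 20% on the smallest input to under 4% on the largest, while the heaviest unstored Python filtering case ends up about 2.5% slower.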

Reviewed 2 of 2 files at r1.
Reviewable status: 0 of 2 LGTMs obtained (waiting on @inglor)


cpp/CMakeLists.txt, line 231 at r1 (raw file):

```cmake
# diagnostics.
include(CheckIPOSupported)
check_ipo_supported( RESULT LTOAvailable )
```

We can just assume this is available, because we only support clang, gcc and msvc.
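
Dropping the probe, as suggested, reduces the switch to setting a target property. A minimal standalone sketch, assuming a hypothetical `USE_LTO` option (the PR's real option name isn't shown in this thread), a stand-in `ycm_core` target and a placeholder `core.cpp` source:

```cmake
# Minimal standalone sketch; USE_LTO, ycm_core and core.cpp are placeholders.
cmake_minimum_required( VERSION 3.9 )  # 3.9+ sets policy CMP0069 to NEW
project( lto_sketch CXX )

# Hypothetical opt-in switch, off by default.
option( USE_LTO "Enable link time optimisation" OFF )

# Stand-in for the real ycm_core library target.
add_library( ycm_core SHARED core.cpp )

if ( USE_LTO )
  # No check_ipo_supported() probe: Clang, GCC and MSVC are all
  # assumed to support IPO, as suggested in the review.
  set_property( TARGET ycm_core PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE )
endif()
```

With CMP0069 at NEW, CMake should stop with an error at generate time if it has no IPO flags for the active compiler, so an unsupported toolchain fails loudly instead of silently building without LTO.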


cpp/ycm/CMakeLists.txt, line 446 at r1 (raw file):

```cmake
###############################################################################

if ( LTOAvailable )
```

Do we want this in debug builds?
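
The thread doesn't show how this was resolved. One way to keep LTO out of Debug builds is CMake's per-configuration `INTERPROCEDURAL_OPTIMIZATION_<CONFIG>` target property in place of the unconditional one. A sketch, assuming a `ycm_core` target and a hypothetical `USE_LTO` option are defined earlier:

```cmake
# Assumes a target named ycm_core and a hypothetical USE_LTO option
# have already been defined earlier in the CMakeLists.txt.
if ( USE_LTO AND TARGET ycm_core )
  # Per-configuration IPO: optimised builds only, Debug is left alone.
  # Honoured by single-config generators (via CMAKE_BUILD_TYPE) and by
  # multi-config generators alike.
  set_target_properties( ycm_core PROPERTIES
    INTERPROCEDURAL_OPTIMIZATION_RELEASE        TRUE
    INTERPROCEDURAL_OPTIMIZATION_RELWITHDEBINFO TRUE
    INTERPROCEDURAL_OPTIMIZATION_MINSIZEREL     TRUE )
endif()
```

Testing `CMAKE_BUILD_TYPE` directly would also work with Makefile or Ninja generators, but not with multi-config generators such as Visual Studio or Xcode, which matters here because MSVC is a supported compiler.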

Contributor Author

@inglor inglor left a comment


Reviewable status: 0 of 2 LGTMs obtained (waiting on @bstaletic)


cpp/CMakeLists.txt, line 231 at r1 (raw file):

Previously, bstaletic (Boris Staletic) wrote…

> We can just assume this is available, because we only support clang, gcc and msvc.

Done.


cpp/ycm/CMakeLists.txt, line 446 at r1 (raw file):

Previously, bstaletic (Boris Staletic) wrote…

> Do we want this in debug builds?

Done.

@inglor inglor requested a review from bstaletic April 2, 2021 22:01
@inglor inglor changed the title Optionally enable Link Time Optimisation (LTO) [READY] Optionally enable Link Time Optimisation (LTO) Apr 3, 2021
@codecov

codecov bot commented Apr 22, 2021

Codecov Report

Merging #1540 (7d36556) into master (b69b980) will increase coverage by 0.05%.
The diff coverage is n/a.

❗ Current head 7d36556 differs from pull request most recent head 8f89dfb. Consider uploading reports for the commit 8f89dfb to get more accurate results.

```diff
@@            Coverage Diff             @@
##           master    #1540      +/-   ##
==========================================
+ Coverage   96.32%   96.37%   +0.05%
==========================================
  Files          90       90
  Lines        7839     7839
  Branches      164      164
==========================================
+ Hits         7551     7555       +4
+ Misses        235      231       -4
  Partials       53       53
```

Member

@puremourning puremourning left a comment


:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: 1 of 2 LGTMs obtained (waiting on @bstaletic)

Member

@puremourning puremourning left a comment


Reviewable status: 1 of 2 LGTMs obtained (waiting on @bstaletic)

Contributor Author

@inglor inglor left a comment


:lgtm:

Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained (waiting on @bstaletic)


@mark2185 mark2185 left a comment


Foo

cpp/ycm/CMakeLists.txt
Collaborator

@bstaletic bstaletic left a comment


@puremourning My only concern with this is the benchmarks. @inglor and I have both measured that LTO makes things anywhere from ~3% faster to ~3% slower. CI agrees, except on Windows, where it... let me check... all benchmarks turned out to be 10% slower with LTO. I would be fine with 3%, but 10%?

Now, that is CI, which is running on some unknown CPU under an unknown load. Not exactly a perfect measuring device.

Your thoughts?

Dismissed @mark2185 from a discussion.
Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained (waiting on @inglor)

@mergify
Contributor

mergify bot commented Apr 25, 2021

Thanks for sending a PR!

Member

@puremourning puremourning left a comment


Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained (waiting on @inglor)

a discussion (no related file):
Cancel.


Member

@puremourning puremourning left a comment


> Your thoughts?

If LTO does not produce a measurable, reliable improvement, then there is no real point in the change. My intuition is that 3% of the "fast part" is not going to move the needle once the "slow part" is added back (he said, without any measurements to prove this), even if the 3% is in our favour.
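
This intuition can be made concrete with Amdahl's law. Suppose, purely as an assumption (nobody in the thread measured it), that the C++ code affected by LTO accounts for a fraction $p$ of the end-to-end completion latency $T$, and that LTO scales that part's time by a factor $r$. The new latency is then:

$$
T' = (1 - p)\,T + p\,r\,T = T\,\bigl(1 - p\,(1 - r)\bigr)
$$

$$
\begin{aligned}
p = 0.2,\ r = 0.97: &\quad T' = (1 - 0.2 \times 0.03)\,T = 0.994\,T \\
p = 0.2,\ r = 1.10: &\quad T' = (1 + 0.2 \times 0.10)\,T = 1.02\,T
\end{aligned}
$$

With that hypothetical $p = 0.2$, a 3% speed-up of the C++ part is worth only 0.6% overall, while the 10% Windows CI regression would still cost 2% overall.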

We certainly don't want a 10% regression, that's for sure.

Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained (waiting on @inglor)

Contributor Author

@inglor inglor left a comment


I will close this and revisit once we get newer compiler versions on CI.

Reviewable status: :shipit: complete! 2 of 2 LGTMs obtained (waiting on @puremourning)

a discussion (no related file):

Previously, puremourning (Ben Jackson) wrote…

> Cancel.

Done.


@inglor inglor closed this Apr 26, 2021
4 participants