Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] perf run for rustc-hash candidate #125133

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

orlp
Copy link
Contributor

@orlp orlp commented May 14, 2024

@rustbot
Copy link
Collaborator

rustbot commented May 14, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels May 14, 2024
@rustbot
Copy link
Collaborator

rustbot commented May 14, 2024

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@orlp
Copy link
Contributor Author

orlp commented May 14, 2024

r? @thomcc

@rustbot rustbot assigned thomcc and unassigned Mark-Simulacrum May 14, 2024
@thomcc
Copy link
Member

thomcc commented May 14, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 14, 2024
@bors
Copy link
Contributor

bors commented May 14, 2024

⌛ Trying commit 4484f37 with merge 9c2b2da...

bors added a commit to rust-lang-ci/rust that referenced this pull request May 14, 2024
[DO NOT MERGE] perf run for rustc-hash candidate

See rust-lang/rustc-hash#37.
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented May 15, 2024

☀️ Try build successful - checks-actions
Build commit: 9c2b2da (9c2b2da640319fec27adc6d9bdb2781a5addeb00)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9c2b2da): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.3%] 12
Regressions ❌
(secondary)
0.3% [0.1%, 0.7%] 13
Improvements ✅
(primary)
-0.3% [-0.3%, -0.2%] 4
Improvements ✅
(secondary)
-0.7% [-1.1%, -0.5%] 13
All ❌✅ (primary) 0.1% [-0.3%, 0.3%] 16

Max RSS (memory usage)

Results (secondary -6.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-6.0% [-6.0%, -6.0%] 1
All ❌✅ (primary) - - 0

Cycles

Results (primary 1.4%, secondary -2.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [1.4%, 1.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.9% [-3.3%, -2.4%] 10
All ❌✅ (primary) 1.4% [1.4%, 1.4%] 1

Binary size

Results (secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 6
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Bootstrap: 678.372s -> 677.551s (-0.12%)
Artifact size: 316.02 MiB -> 316.07 MiB (0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 15, 2024
@thomcc
Copy link
Member

thomcc commented May 15, 2024

Looks green on cycles, walltime, task clock, cpu clock, ... Instructions are somewhat surprisingly a mixed bag (surprising as it does strictly less work, IIUC).

@rustbot rustbot added A-rustdoc-json Area: Rustdoc JSON backend A-testsuite Area: The testsuite used to check the correctness of rustc labels May 15, 2024
@rustbot
Copy link
Collaborator

rustbot commented May 15, 2024

Some changes occurred in exhaustiveness checking

cc @Nadrieril

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

rustdoc-json-types is a public (although nightly-only) API. If possible, consider changing src/librustdoc/json/conversions.rs; otherwise, make sure you bump the FORMAT_VERSION constant.

cc @CraftSpider, @aDotInTheVoid, @Enselic, @obi1kenobi

@orlp
Copy link
Contributor Author

orlp commented May 15, 2024

@thomcc It is not strictly less work, it adds an extra bit rotation for the pure integer hash case where there wasn't anything before.

These results are a bit more of a mixed bag than the results in my previous experiments, I believe because I gated some string optimizations behind a nightly feature, which I didn't do in my experimental code. I tried enabling the feature in the [patch] section which is silly as it does not work. Can you re-run the benchmark now that I've specified the feature in the respective Cargo.tomls?

@thomcc
Copy link
Member

thomcc commented May 15, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 15, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request May 15, 2024
[DO NOT MERGE] perf run for rustc-hash candidate

See rust-lang/rustc-hash#37.
@bors
Copy link
Contributor

bors commented May 15, 2024

⌛ Trying commit f4b59b1 with merge c358310...

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check-tidy failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
Getting action download info
Download action repository 'msys2/setup-msys2@v2.22.0' (SHA:cc11e9188b693c2b100158c3322424c4cc1dadea)
Download action repository 'actions/checkout@v4' (SHA:0ad4b8fadaa221de15dcec353f45205ec38ea70b)
Download action repository 'actions/upload-artifact@v4' (SHA:65462800fd760344b1a7b4382951275a0abb4808)
Complete job name: PR - mingw-check-tidy
git config --global core.autocrlf false
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
---
COPY scripts/sccache.sh /scripts/
RUN sh /scripts/sccache.sh

COPY host-x86_64/mingw-check/reuse-requirements.txt /tmp/
RUN pip3 install --no-deps --no-cache-dir --require-hashes -r /tmp/reuse-requirements.txt \
    && pip3 install virtualenv
COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
COPY host-x86_64/mingw-check/validate-error-codes.sh /scripts/

# NOTE: intentionally uses python2 for x.py so we can test it still works.
# NOTE: intentionally uses python2 for x.py so we can test it still works.
# validate-toolstate only runs in our CI, so it's ok for it to only support python3.
ENV SCRIPT TIDY_PRINT_DIFF=1 python2.7 ../x.py test \
           --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --allow-unsafe --generate-hashes reuse-requirements.in
---

#12 [5/8] COPY host-x86_64/mingw-check/reuse-requirements.txt /tmp/
#12 DONE 0.0s

#13 [6/8] RUN pip3 install --no-deps --no-cache-dir --require-hashes -r /tmp/reuse-requirements.txt     && pip3 install virtualenv
#13 0.441   Downloading binaryornot-0.4.4-py2.py3-none-any.whl (9.0 kB)
#13 0.527 Collecting boolean-py==4.0
#13 0.536   Downloading boolean.py-4.0-py3-none-any.whl (25 kB)
#13 0.560 Collecting chardet==5.1.0
---
#13 3.947 Building wheels for collected packages: reuse
#13 3.948   Building wheel for reuse (pyproject.toml): started
#13 4.285   Building wheel for reuse (pyproject.toml): finished with status 'done'
#13 4.286   Created wheel for reuse: filename=reuse-1.1.0-cp310-cp310-manylinux_2_35_x86_64.whl size=181117 sha256=f5f58750481f69515c2c0d1d503daf565e2565c370d07fc6aeb95fe3498b4269
#13 4.286   Stored in directory: /tmp/pip-ephem-wheel-cache-x6sm0_n7/wheels/c2/3c/b9/1120c2ab4bd82694f7e6f0537dc5b9a085c13e2c69a8d0c76d
#13 4.289 Installing collected packages: boolean-py, binaryornot, setuptools, reuse, python-debian, markupsafe, license-expression, jinja2, chardet
#13 4.312   Attempting uninstall: setuptools
#13 4.313     Found existing installation: setuptools 59.6.0
#13 4.314     Not uninstalling setuptools at /usr/lib/python3/dist-packages, outside environment /usr
---
#13 5.612   Downloading virtualenv-20.26.2-py3-none-any.whl (3.9 MB)
#13 5.784      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.9/3.9 MB 23.0 MB/s eta 0:00:00
#13 5.840 Collecting filelock<4,>=3.12.2
#13 5.847   Downloading filelock-3.14.0-py3-none-any.whl (12 kB)
#13 5.880 Collecting platformdirs<5,>=3.9.1
#13 5.887   Downloading platformdirs-4.2.2-py3-none-any.whl (18 kB)
#13 5.907 Collecting distlib<1,>=0.3.7
#13 5.915   Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
#13 5.927      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.9/468.9 KB 50.0 MB/s eta 0:00:00
#13 6.013 Installing collected packages: distlib, platformdirs, filelock, virtualenv
#13 6.178 Successfully installed distlib-0.3.8 filelock-3.14.0 platformdirs-4.2.2 virtualenv-20.26.2
#13 DONE 6.3s

#14 [7/8] COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
#14 DONE 0.0s
---
DirectMap4k:      208832 kB
DirectMap2M:     8179712 kB
DirectMap1G:    10485760 kB
##[endgroup]
Executing TIDY_PRINT_DIFF=1 python2.7 ../x.py test            --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint
+ TIDY_PRINT_DIFF=1 python2.7 ../x.py test --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint
    Finished `dev` profile [unoptimized] target(s) in 0.03s
##[endgroup]
downloading https://ci-artifacts.rust-lang.org/rustc-builds-alt/0160bff4b1bffa241299aba8c8c63e7a3cd871fe/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
extracting /checkout/obj/build/cache/llvm-0160bff4b1bffa241299aba8c8c63e7a3cd871fe-true/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz to /checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm
---
   Compiling ignore v0.4.20
   Compiling cargo_metadata v0.15.4
   Compiling miropt-test-tools v0.1.0 (/checkout/src/tools/miropt-test-tools)
   Compiling termcolor v1.4.1
   Compiling rustc-hash v1.1.0 (https://github.com/orlp/rustc-hash/?branch=faster-hash#d5b36344)
error[E0725]: the feature `hasher_prefixfree_extras` is not in the list of allowed features
  --> /cargo/git/checkouts/rustc-hash-e2a4418644c61f2d/d5b3634/src/lib.rs:18:42
   |
18 | #![cfg_attr(feature = "nightly", feature(hasher_prefixfree_extras))]

error[E0658]: use of unstable library feature 'hasher_prefixfree_extras'
error[E0658]: use of unstable library feature 'hasher_prefixfree_extras'
   --> /cargo/git/checkouts/rustc-hash-e2a4418644c61f2d/d5b3634/src/lib.rs:149:5
    |
149 | /     fn write_length_prefix(&mut self, len: usize) {
150 | |         // Most cases will specialize hash_slice anyway which calls write(),
151 | |         // which encodes the length already.
    | |_____^
    |
    = note: see issue #96762 <https://github.com/rust-lang/rust/issues/96762> for more information
    = help: add `#![feature(hasher_prefixfree_extras)]` to the crate attributes to enable
    = help: add `#![feature(hasher_prefixfree_extras)]` to the crate attributes to enable
    = note: this compiler was built on 2024-04-28; consider upgrading it if it is out of date

error[E0658]: use of unstable library feature 'hasher_prefixfree_extras'
   --> /cargo/git/checkouts/rustc-hash-e2a4418644c61f2d/d5b3634/src/lib.rs:156:5
156 | /     fn write_str(&mut self, s: &str) {
157 | |         // We don't need anything special here.
158 | |         self.write(s.as_bytes())
159 | |     }

@bors
Copy link
Contributor

bors commented May 15, 2024

☀️ Try build successful - checks-actions
Build commit: c358310 (c358310bb0ab238e25e60f1e77f96e0a2b270d0b)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c358310): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.3%] 5
Regressions ❌
(secondary)
0.3% [0.3%, 0.4%] 5
Improvements ✅
(primary)
-0.3% [-0.4%, -0.2%] 5
Improvements ✅
(secondary)
-0.7% [-1.2%, -0.3%] 15
All ❌✅ (primary) -0.0% [-0.4%, 0.3%] 10

Max RSS (memory usage)

Results (primary -1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.4% [2.4%, 2.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.1% [-2.7%, -1.2%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.0% [-2.7%, 2.4%] 4

Cycles

Results (secondary -3.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.4% [-4.3%, -2.8%] 9
All ❌✅ (primary) - - 0

Binary size

Results (secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 6
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Bootstrap: 679.642s -> 678.195s (-0.21%)
Artifact size: 315.96 MiB -> 316.01 MiB (0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 15, 2024
@Nilstrieb
Copy link
Member

it's mostly neutral, but there is a bit of green, that's awesome, congrats! makes sense then to move ahead and replace it in rustc_hash

@nnethercote
Copy link
Contributor

I see a small but clear instructions win for unused-warnings and a clear cycles win for tuple-stress, both secondary benchmarks. Very small improvements, really, and it's not obvious to me that this change should be merged. FxHash is used so heavily that I would prefer a stronger signal before replacing it with something entirely different.

@WaffleLapkin
Copy link
Member

It's intresting that cache-misses and branch-misses are so green.

@orlp
Copy link
Contributor Author

orlp commented May 16, 2024

@nnethercote If you'd like I could do a run where I remove the final rotate_left which I think would in general lead to green across the board for instruction count, but would also make the hash significantly more vulnerable to misuse like the current one is.

This PR isn't strictly a performance PR - it's also one that makes rustc-hash more robust against accidental heavy multicollisions if your only differences happens to be in the high bits of the last hashed word.

Also, for cycle counts there are 186 regressions but 353 improvements if you include all the results.

@nnethercote
Copy link
Contributor

This PR isn't strictly a performance PR - it's also one that makes rustc-hash more robust against accidental heavy multicollisions if your only differences happens to be in the high bits of the last hashed word.

So it seems the argument is that the new hash gives slightly better performance and has better protection against bad cases. On the latter item, from the design I can believe that the new algorithm is less collision-prone, but it would be good to have some empirical evidence.

A couple of ideas:

  • Compare the old and new versions through a hash function evaluation suite. (I know these exist, I can't remember the names of any now, I figure you probably know a suitable one.)
  • manually implement Hash for DefId #91660 is one example I know of where the existing FxHasher behaved really badly. The commits in that PR include comments that explain things well. There are two 32-bit fields in a struct that get combined into a 64-bit integer and then hashed, and if you do it one order with the existing FxHasher you get good results and if you do it in the other order you get terrible results. So you could repeat that "change the order" experiment with the current FxHasher to reconfirm the big slowdown, and then retry that experiment with the new hasher and see if it avoids the slowdown. The slowdowns are big enough that you might not even need to run rustc-perf. Just doing cargo check on stm32f might be sufficient.

Good results on one or both of those would be enough to convince me to switch algorithms. What do you think? I don't want to be a roadblock, but I also want to be cautious when changing such a critical piece of code.

@orlp
Copy link
Contributor Author

orlp commented May 17, 2024

@nnethercote Thanks for bringing #91660 to my attention, because this is precisely what this hash mitigates. I quote from the linked PR:

// The order here has direct impact on `FxHash` quality because we have far more `DefIndex` per
// crate than we have `Crate`s within one compilation. Or in other words, this arrangement puts
// more entropy in the low bits than the high bits. The reason this matters is that `FxHash`, which
// is used throughout rustc, has problems distributing the entropy from the high bits, so reversing
// the order would lead to a large number of collisions and thus far worse performance.
//
// On 64-bit big-endian systems, this compiles to a 64-bit rotation by 32 bits, which is still
// faster than another `FxHash` round.
#[cfg(target_pointer_width = "64")]
impl Hash for DefId {
    fn hash<H: Hasher>(&self, h: &mut H) {
        (((self.krate.as_u32() as u64) << 32) | (self.index.as_u32() as u64)).hash(h)
    }
}

If you don't mind, I'd like to do a simulated benchmark to demonstrate the effectiveness of the new hash in this scenario. Suppose we have 64 crates and 1024 def_index's per crate (which adds up to a total of 2^16 elements), so a hash table of 2^17 elements should fit all the data with a load factor of 0.5. Let's see how many collisions for hash table buckets we get in that scenario:

use core::hash::{BuildHasher, BuildHasherDefault};

fn compute_max_collisions(name: &str, hasher: impl BuildHasher) {
    let table_mask = (1 << 17) - 1;
    let mut table_orig_order = vec![0u64; 1 << 17];
    let mut table_swap_order = vec![0u64; 1 << 17];
    for krate in 0..64_u32 {
        for def_index in 0..1024_u32 {
            let h_orig = hasher.hash_one(((def_index as u64) << 32) | krate as u64);
            let h_swap = hasher.hash_one(((krate as u64) << 32) | def_index as u64);
            table_orig_order[(h_orig & table_mask) as usize] += 1;
            table_swap_order[(h_swap & table_mask) as usize] += 1;
        }
    }
    
    let max_orig = table_orig_order.into_iter().max().unwrap();
    let max_swap = table_swap_order.into_iter().max().unwrap();
    println!("{name} max orig collisions: {max_orig}");
    println!("{name} max swap collisions: {max_swap}");
}

fn main() {
    compute_max_collisions("current", BuildHasherDefault::<current::FxHasher>::default());
    compute_max_collisions("orlp", BuildHasherDefault::<orlp::FxHasher>::default());
}

You can run it yourself here, I just copy/pasted the current and my implementation: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=7ca94ca09b9ab282874e6ae4c08d7c88.

So what are the results?

current max orig collisions: 1024
current max swap collisions: 64
orlp max orig collisions: 2
orlp max swap collisions: 2

Now, hashbrown is a combination of larger hash table buckets + a tag instead of only a bucket index which mitigates this, but the principle stays the same.

@nnethercote
Copy link
Contributor

Ok, that satisfies me, thanks!

Perhaps the PR for this repo that incorporates the new FxHash should simplify the DefId hasher, then -- sounds like we won't need the big/little endian split and the long comment any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-json Area: Rustdoc JSON backend A-testsuite Area: The testsuite used to check the correctness of rustc perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants