@pull pull bot commented Jan 26, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


kddnewton and others added 9 commits January 26, 2026 14:43
Though very unlikely, it could potentially allocate a large array
of whitespace.

ruby/prism@3389947819

timer_thread_check_exceed() was returning true when the remaining time
was less than 1ms, treating it as "too short time". This caused
sub-millisecond sleeps (like sleep(0.0001)) to return immediately
instead of actually sleeping.

The fix removes this optimization that was incorrectly short-circuiting
short sleep durations. Now the timeout is only considered exceeded when
the actual deadline has passed.
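
The before/after behavior described above can be modeled in a few lines of Ruby. This is only an illustrative sketch: the real check is C code in the timer thread, and the method names, the nanosecond units and the `MSEC_NS` constant are assumptions made for the example.

```ruby
# Illustrative model only; the real timer_thread_check_exceed() is C code
# inside CRuby. Method names and nanosecond units are assumptions.
MSEC_NS = 1_000_000 # one millisecond, in nanoseconds

# Old behavior as described above: less than 1ms remaining counts as exceeded.
def exceeded_before_fix?(deadline_ns, now_ns)
  (deadline_ns - now_ns) < MSEC_NS
end

# New behavior: exceeded only once the deadline itself has passed.
def exceeded_after_fix?(deadline_ns, now_ns)
  now_ns >= deadline_ns
end

now_ns = 0
deadline_ns = now_ns + 100_000 # sleep(0.0001) asks for 100 microseconds

puts exceeded_before_fix?(deadline_ns, now_ns) # old check: sleep ends at once
puts exceeded_after_fix?(deadline_ns, now_ns)  # new check: sleep keeps waiting
```

In this model the old check reports the 100 microsecond deadline as exceeded before any time has passed, which matches the "return immediately" symptom in the commit message.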

Note: There's still a separate performance issue where MN_THREADS mode
is slower for sub-millisecond sleeps due to the timer thread using
millisecond-resolution polling. This will require a separate fix to
use sub-millisecond timeouts in kqueue/epoll.

[Bug #21836]
[Feature #21846]

There is a single path through our GC Sweeping code, and we always call
rb_gc_obj_free_vm_weak_references and rb_gc_obj_free before adding the
object back to the freelist.

We do this even when the object has no external resources that need to
be freed and has no weak references pointing to it.

This commit introduces a conservative fast path through gc_sweep_plane
that uses the object flags to identify certain cases where these calls
can be skipped - for these objects we just add them straight back on the
freelist. Any object for which gc_sweep_fast_path_p returns false will
use the current full sweep code (referred to here as the slow path).

Currently there are 2 checks that will _always_ require an object to go
down the slow path:

1. Has its object_id been observed and stored in the id2ref_table?
2. Does it have generic ivars in the gen_fields table?

If neither of these is true, then we run some flag checks on the object
and send the following cases down the fast path:

- Objects that are not heap allocated
- Embedded strings that aren't in the fstring table
- Embedded Arrays
- Embedded Hashes
- Embedded Bignums
- Embedded Strings
- Floats, Rationals and Complex
- Various IMEMO subtypes that do no allocation
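
The decision order above can be sketched as a small Ruby model. This is a hedged illustration, not the patch itself: the real gc_sweep_fast_path_p is C code that reads object flags, and the `SweepCandidate` struct, its fields and the type symbols are hypothetical. It also assumes that "not heap allocated" means a plain object whose fields live inside its slot, and it leaves out the IMEMO cases because the commit message does not name the subtypes.

```ruby
# Illustrative model only; struct, fields and type symbols are assumptions.
SweepCandidate = Struct.new(
  :type,          # e.g. :string, :array, :hash, :bignum, :float, :object
  :embedded,      # true when the payload lives inside the slot itself
  :fstring,       # true when the string is registered in the fstring table
  :id_observed,   # true when object_id was stored in the id2ref_table
  :generic_ivars, # true when the object has an entry in the gen_fields table
  keyword_init: true
)

def sweep_fast_path?(obj)
  # The two checks that always force the slow path.
  return false if obj.id_observed || obj.generic_ivars

  case obj.type
  when :float, :rational, :complex
    true
  when :string
    !!(obj.embedded && !obj.fstring)
  when :array, :hash, :bignum, :object
    !!obj.embedded
  else
    false # anything not listed stays on the full sweep path
  end
end

plain    = SweepCandidate.new(type: :string, embedded: true)
interned = SweepCandidate.new(type: :string, embedded: true, fstring: true)
puts sweep_fast_path?(plain)    # embedded, not an fstring: fast path
puts sweep_fast_path?(interned) # fstring table entry: slow path
```

Defaulting to the slow path for every unlisted case mirrors the "conservative" wording of the commit: an object is only skipped when it is positively known to need no cleanup.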

We've benchmarked this code using ruby-bench as well as the gcbench
benchmarks inside Ruby (benchmarks/gc) and this patch results in a
modest speed improvement on almost all of the headline benchmarks (2% in
railsbench with YJIT enabled), and an observable 30% improvement in time
spent sweeping during the GC benchmarks:

```
master: ruby 4.1.0dev (2026-01-19T12:03:33Z master 859920d) +YJIT +PRISM [x86_64-linux]
experiment: ruby 4.1.0dev (2026-01-16T21:36:46Z mvh-sweep-fast-pat.. c3ffe37) +YJIT +PRISM [x86_64-linux]

--------------  -----------  ----------  ---------------  ----------  ------------------  -----------------
bench           master (ms)  stddev (%)  experiment (ms)  stddev (%)  experiment 1st itr  master/experiment
lobsters        N/A          N/A         N/A              N/A         N/A                 N/A
activerecord    132.5        0.9         132.5            1.0         1.056               1.001
chunky-png      577.2        0.4         580.1            0.4         0.994               0.995
erubi-rails     902.9        0.2         894.3            0.2         1.040               1.010
hexapdf         1763.9       3.3         1760.6           3.7         1.027               1.002
liquid-c        56.9         0.6         56.7             1.4         1.004               1.003
liquid-compile  46.3         2.1         46.1             2.1         1.005               1.004
liquid-render   77.8         0.8         75.1             0.9         1.023               1.036
mail            114.7        0.4         113.0            1.4         1.054               1.015
psych-load      1635.4       1.4         1625.9           0.5         0.988               1.006
railsbench      1685.4       2.4         1650.1           2.0         0.989               1.021
rubocop         133.5        8.1         130.3            7.8         1.002               1.024
ruby-lsp        140.3        1.9         137.5            1.8         1.007               1.020
sequel          64.6         0.7         63.9             0.7         1.003               1.011
shipit          1196.2       4.3         1181.5           4.2         1.003               1.012
--------------  -----------  ----------  ---------------  ----------  ------------------  -----------------

Legend:
- experiment 1st itr: ratio of master/experiment time for the first benchmarking iteration.
- master/experiment: ratio of master/experiment time. Higher is better for experiment. Above 1 represents a speedup.
```
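
As a quick check of how to read the last column, the ratio from the legend can be recomputed from the two timing columns. The helper name below is hypothetical, and the rounding to three places is an assumption based on how the table is printed.

```ruby
# master/experiment ratio as defined in the legend above.
# Values above 1 mean the experiment branch was faster.
def speedup_ratio(master_ms, experiment_ms)
  (master_ms / experiment_ms.to_f).round(3)
end

puts speedup_ratio(1685.4, 1650.1) # railsbench row
```

Rows whose printed timings are equal can still show a ratio slightly off 1.000, presumably because the table was computed from unrounded timings.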

```
Benchmark      │    Wall(B)   Sweep(B)  Mark(B) │    Wall(E)   Sweep(E)  Mark(E) │   Wall Δ  Sweep Δ
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────
null           │     0.000s        1ms      4ms │     0.000s        1ms      4ms │       0%       0%
hash1          │     4.330s      875ms     46ms │     3.960s      531ms     44ms │    +8.6%   +39.3%
hash2          │     6.356s      243ms    988ms │     6.298s      176ms    1.03s │    +0.9%   +27.6%
rdoc           │    37.337s      2.42s    1.09s │    36.678s      2.11s    1.20s │    +1.8%   +13.1%
binary_trees   │     3.366s      426ms    252ms │     3.082s      275ms    239ms │    +8.4%   +35.4%
ring           │     5.252s       14ms    2.47s │     5.327s       12ms    2.43s │    -1.4%   +14.3%
redblack       │     2.966s       28ms     41ms │     2.940s       21ms     38ms │    +0.9%   +25.0%
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────

Legend: (B) = Baseline, (E) = Experiment, Δ = improvement (positive = faster)
        Wall = total wallclock, Sweep = GC sweeping time, Mark = GC marking time
        Times are median of 3 runs
```
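
The Δ columns appear to be the relative reduction against the baseline. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the formula is inferred from the legend rather than taken from the benchmark script.

```ruby
# Improvement as a percentage of the baseline; positive means faster.
def improvement_percent(baseline, experiment)
  ((baseline - experiment) / baseline.to_f * 100).round(1)
end

puts improvement_percent(875, 531)     # hash1 sweep time, in ms
puts improvement_percent(5.252, 5.327) # ring wall time, in s (a slowdown)
```

Recomputing from the printed values can differ in the last digit for some rows, which suggests the table was built from unrounded measurements.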

These results are also borne out when YJIT is disabled:

```
master: ruby 4.1.0dev (2026-01-19T12:03:33Z master 859920d) +PRISM [x86_64-linux]
experiment: ruby 4.1.0dev (2026-01-16T21:36:46Z mvh-sweep-fast-pat.. c3ffe37) +PRISM [x86_64-linux]

--------------  -----------  ----------  ---------------  ----------  ------------------  -----------------
bench           master (ms)  stddev (%)  experiment (ms)  stddev (%)  experiment 1st itr  master/experiment
lobsters        N/A          N/A         N/A              N/A         N/A                 N/A
activerecord    389.6        0.3         377.5            0.3         1.032               1.032
chunky-png      1123.4       0.2         1109.2           0.2         1.013               1.013
erubi-rails     1754.3       0.1         1725.7           0.1         1.035               1.017
hexapdf         3346.5       0.9         3326.9           0.7         1.003               1.006
liquid-c        84.0         0.5         83.5             0.5         0.992               1.006
liquid-compile  74.0         1.5         73.5             1.4         1.011               1.008
liquid-render   199.9        0.4         199.6            0.4         1.000               1.002
mail            177.8        0.4         176.4            0.4         1.069               1.008
psych-load      2749.6       0.7         2777.0           0.0         0.980               0.990
railsbench      2983.0       1.0         2965.5           0.8         1.041               1.006
rubocop         228.8        1.0         227.5            1.2         1.015               1.005
ruby-lsp        221.8        0.9         216.1            0.8         1.011               1.026
sequel          89.1         0.5         89.1             1.8         1.005               1.000
shipit          2385.6       1.6         2371.8           1.0         1.002               1.006
--------------  -----------  ----------  ---------------  ----------  ------------------  -----------------

Legend:
- experiment 1st itr: ratio of master/experiment time for the first benchmarking iteration.
- master/experiment: ratio of master/experiment time. Higher is better for experiment. Above 1 represents a speedup.
```

```
Benchmark      │    Wall(B)   Sweep(B)  Mark(B) │    Wall(E)   Sweep(E)  Mark(E) │   Wall Δ  Sweep Δ
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────
null           │     0.000s        1ms      4ms │     0.000s        1ms      3ms │       0%       0%
hash1          │     4.349s      877ms     45ms │     4.045s      532ms     44ms │    +7.0%   +39.3%
hash2          │     6.575s      235ms    967ms │     6.540s      181ms    1.04s │    +0.5%   +23.0%
rdoc           │    45.782s      2.23s    1.14s │    44.925s      1.90s    1.01s │    +1.9%   +15.0%
binary_trees   │     6.433s      426ms    252ms │     6.268s      278ms    240ms │    +2.6%   +34.7%
ring           │     6.584s       17ms    2.33s │     6.738s       13ms    2.33s │    -2.3%   +30.8%
redblack       │    13.334s       31ms     42ms │    13.296s       24ms    107ms │    +0.3%   +22.6%
───────────────┼─────────────────────────────────┼─────────────────────────────────┼──────────────────

Legend: (B) = Baseline, (E) = Experiment, Δ = improvement (positive = faster)
        Wall = total wallclock, Sweep = GC sweeping time, Mark = GC marking time
        Times are median of 3 runs
```

It relies too much on VM level concerns, such that it can't be built
with modular GC enabled.

We'll move it into the VM, and then expose it to the GC
implementations so they can use it.

Most compilers will optimise this anyway

@pull pull bot locked and limited conversation to collaborators Jan 26, 2026
@pull pull bot added the ⤵️ pull label Jan 26, 2026
@pull pull bot merged commit 3c63489 into turkdevops:master Jan 26, 2026
1 of 2 checks passed