Skip to content

Commit

Permalink
Optimized forwarding callers and callees
Browse files Browse the repository at this point in the history
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm fo `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](ruby#9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
  • Loading branch information
3 people committed Apr 12, 2024
1 parent 1521af3 commit 91b1ff4
Show file tree
Hide file tree
Showing 29 changed files with 675 additions and 101 deletions.
132 changes: 132 additions & 0 deletions bootstraptest/test_method.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1176,3 +1176,135 @@ def foo
foo
foo
}, '[Bug #20178]'

assert_equal 'ok', %q{
def bar(x); x; end
def foo(...); bar(...); end
foo('ok')
}

assert_equal 'ok', %q{
def bar(x); x; end
def foo(z, ...); bar(...); end
foo(1, 'ok')
}

assert_equal 'ok', %q{
def bar(x, y); x; end
def foo(...); bar("ok", ...); end
foo(1)
}

assert_equal 'ok', %q{
def bar(x); x; end
def foo(...); 1.times { return bar(...) }; end
foo("ok")
}

assert_equal 'ok', %q{
def bar(x); x; end
def foo(...); x = nil; 1.times { x = bar(...) }; x; end
foo("ok")
}

assert_equal 'ok', %q{
def bar(x); yield; end
def foo(...); bar(...); end
foo(1) { "ok" }
}

assert_equal 'ok', %q{
def baz(x); x; end
def bar(...); baz(...); end
def foo(...); bar(...); end
foo("ok")
}

assert_equal '[1, 2, 3, 4]', %q{
def baz(a, b, c, d); [a, b, c, d]; end
def bar(...); baz(1, ...); end
def foo(...); bar(2, ...); end
foo(3, 4)
}

assert_equal 'ok', %q{
class Foo; def self.foo(x); x; end; end
class Bar < Foo; def self.foo(...); super; end; end
Bar.foo('ok')
}

assert_equal 'ok', %q{
class Foo; def self.foo(x); x; end; end
class Bar < Foo; def self.foo(...); super(...); end; end
Bar.foo('ok')
}

assert_equal 'ok', %q{
class Foo; def self.foo(x, y); x + y; end; end
class Bar < Foo; def self.foo(...); super("o", ...); end; end
Bar.foo('k')
}

assert_equal 'ok', %q{
def bar(a); a; end
def foo(...); lambda { bar(...) }; end
foo("ok").call
}

assert_equal 'ok', %q{
class Foo; def self.foo(x, y); x + y; end; end
class Bar < Foo; def self.y(&b); b; end; def self.foo(...); y { super("o", ...) }; end; end
Bar.foo('k').call
}

assert_equal 'ok', %q{
def baz(n); n; end
def foo(...); bar = baz(...); lambda { lambda { bar } }; end
foo("ok").call.call
}

assert_equal 'ok', %q{
class A; def self.foo(...); new(...); end; attr_reader :b; def initialize(a, b:"ng"); @a = a; @b = b; end end
A.foo(1).b
A.foo(1, b: "ok").b
}

assert_equal 'ok', %q{
class A; def initialize; @a = ["ok"]; end; def first(...); @a.first(...); end; end
def call x; x.first; end
def call1 x; x.first(1); end
call(A.new)
call1(A.new).first
}

assert_equal 'ok', %q{
class A; def foo; yield("o"); end; end
class B < A; def foo(...); super { |x| yield(x + "k") }; end; end
B.new.foo { |x| x }
}

assert_equal "[1, 2, 3, 4]", %q{
def foo(*b) = b
def forward(...)
splat = [1,2,3]
foo(*splat, ...)
end
forward(4)
}

assert_equal "[1, 2, 3, 4]", %q{
class A
def foo(*b) = b
end
class B < A
def foo(...)
splat = [1,2,3]
super(*splat, ...)
end
end
B.new.foo(4)
}
87 changes: 73 additions & 14 deletions compile.c
Original file line number Diff line number Diff line change
Expand Up @@ -490,7 +490,7 @@ static int iseq_setup_insn(rb_iseq_t *iseq, LINK_ANCHOR *const anchor);
static int iseq_optimize(rb_iseq_t *iseq, LINK_ANCHOR *const anchor);
static int iseq_insns_unification(rb_iseq_t *iseq, LINK_ANCHOR *const anchor);

static int iseq_set_local_table(rb_iseq_t *iseq, const rb_ast_id_table_t *tbl);
static int iseq_set_local_table(rb_iseq_t *iseq, const rb_ast_id_table_t *tbl, const NODE *const node_args);
static int iseq_set_exception_local_table(rb_iseq_t *iseq);
static int iseq_set_arguments(rb_iseq_t *iseq, LINK_ANCHOR *const anchor, const NODE *const node);

Expand Down Expand Up @@ -871,12 +871,12 @@ rb_iseq_compile_node(rb_iseq_t *iseq, const NODE *node)

if (node == 0) {
NO_CHECK(COMPILE(ret, "nil", node));
iseq_set_local_table(iseq, 0);
iseq_set_local_table(iseq, 0, 0);
}
/* assume node is T_NODE */
else if (nd_type_p(node, NODE_SCOPE)) {
/* iseq type of top, method, class, block */
iseq_set_local_table(iseq, RNODE_SCOPE(node)->nd_tbl);
iseq_set_local_table(iseq, RNODE_SCOPE(node)->nd_tbl, (NODE *)RNODE_SCOPE(node)->nd_args);
iseq_set_arguments(iseq, ret, (NODE *)RNODE_SCOPE(node)->nd_args);

switch (ISEQ_BODY(iseq)->type) {
Expand Down Expand Up @@ -1441,7 +1441,7 @@ new_callinfo(rb_iseq_t *iseq, ID mid, int argc, unsigned int flag, struct rb_cal
{
VM_ASSERT(argc >= 0);

if (!(flag & (VM_CALL_ARGS_SPLAT | VM_CALL_ARGS_BLOCKARG | VM_CALL_KW_SPLAT)) &&
if (!(flag & (VM_CALL_ARGS_SPLAT | VM_CALL_ARGS_BLOCKARG | VM_CALL_KW_SPLAT | VM_CALL_FORWARDING)) &&
kw_arg == NULL && !has_blockiseq) {
flag |= VM_CALL_ARGS_SIMPLE;
}
Expand Down Expand Up @@ -2024,6 +2024,13 @@ iseq_set_arguments(rb_iseq_t *iseq, LINK_ANCHOR *const optargs, const NODE *cons
}
block_id = args->block_arg;

bool optimized_forward = (args->forwarding && args->pre_args_num == 0 && !args->opt_args);

if (optimized_forward) {
rest_id = 0;
block_id = 0;
}

if (args->opt_args) {
const rb_node_opt_arg_t *node = args->opt_args;
LABEL *label;
Expand Down Expand Up @@ -2080,7 +2087,7 @@ iseq_set_arguments(rb_iseq_t *iseq, LINK_ANCHOR *const optargs, const NODE *cons
if (args->kw_args) {
arg_size = iseq_set_arguments_keywords(iseq, optargs, args, arg_size);
}
else if (args->kw_rest_arg) {
else if (args->kw_rest_arg && !optimized_forward) {
ID kw_id = iseq->body->local_table[arg_size];
struct rb_iseq_param_keyword *keyword = ZALLOC_N(struct rb_iseq_param_keyword, 1);
keyword->rest_start = arg_size++;
Expand All @@ -2100,6 +2107,12 @@ iseq_set_arguments(rb_iseq_t *iseq, LINK_ANCHOR *const optargs, const NODE *cons
body->param.flags.has_block = TRUE;
}

// Only optimize specifically methods like this: `foo(...)`
if (optimized_forward) {
body->param.flags.forwardable = TRUE;
arg_size = 1;
}

iseq_calc_param_size(iseq);
body->param.size = arg_size;

Expand Down Expand Up @@ -2129,13 +2142,26 @@ iseq_set_arguments(rb_iseq_t *iseq, LINK_ANCHOR *const optargs, const NODE *cons
}

static int
iseq_set_local_table(rb_iseq_t *iseq, const rb_ast_id_table_t *tbl)
iseq_set_local_table(rb_iseq_t *iseq, const rb_ast_id_table_t *tbl, const NODE *const node_args)
{
unsigned int size = tbl ? tbl->size : 0;
unsigned int offset = 0;

if (node_args) {
struct rb_args_info *args = &RNODE_ARGS(node_args)->nd_ainfo;

// If we have a function that only has `...` as the parameter,
// then its local table should only be `...`
// FIXME: I think this should be fixed in the AST rather than special case here.
if (args->forwarding && args->pre_args_num == 0 && !args->opt_args) {
size -= 3;
offset += 3;
}
}

if (size > 0) {
ID *ids = (ID *)ALLOC_N(ID, size);
MEMCPY(ids, tbl->ids, ID, size);
MEMCPY(ids, tbl->ids + offset, ID, size);
ISEQ_BODY(iseq)->local_table = ids;
}
ISEQ_BODY(iseq)->local_table_size = size;
Expand Down Expand Up @@ -4110,7 +4136,7 @@ iseq_specialized_instruction(rb_iseq_t *iseq, INSN *iobj)
}
}

if ((vm_ci_flag(ci) & VM_CALL_ARGS_BLOCKARG) == 0 && blockiseq == NULL) {
if ((vm_ci_flag(ci) & (VM_CALL_ARGS_BLOCKARG | VM_CALL_FORWARDING)) == 0 && blockiseq == NULL) {
iobj->insn_id = BIN(opt_send_without_block);
iobj->operand_size = insn_len(iobj->insn_id) - 1;
}
Expand Down Expand Up @@ -6272,9 +6298,33 @@ setup_args(rb_iseq_t *iseq, LINK_ANCHOR *const args, const NODE *argn,
unsigned int dup_rest = 1;
DECL_ANCHOR(arg_block);
INIT_ANCHOR(arg_block);
NO_CHECK(COMPILE(arg_block, "block", RNODE_BLOCK_PASS(argn)->nd_body));

*flag |= VM_CALL_ARGS_BLOCKARG;
if (RNODE_BLOCK_PASS(argn)->forwarding && ISEQ_BODY(ISEQ_BODY(iseq)->local_iseq)->param.flags.forwardable) {
int idx = ISEQ_BODY(ISEQ_BODY(iseq)->local_iseq)->local_table_size;// - get_local_var_idx(iseq, idDot3);

RUBY_ASSERT(nd_type_p(RNODE_BLOCK_PASS(argn)->nd_head, NODE_ARGSPUSH));
const NODE * arg_node =
RNODE_ARGSPUSH(RNODE_BLOCK_PASS(argn)->nd_head)->nd_head;

int argc = 0;

// Only compile leading args:
// foo(x, y, ...)
// ^^^^
if (nd_type_p(arg_node, NODE_ARGSCAT)) {
argc += setup_args_core(iseq, args, RNODE_ARGSCAT(arg_node)->nd_head, dup_rest, flag, keywords);
}

*flag |= VM_CALL_FORWARDING;

ADD_GETLOCAL(args, argn, idx, get_lvar_level(iseq));
return INT2FIX(argc);
}
else {
*flag |= VM_CALL_ARGS_BLOCKARG;

NO_CHECK(COMPILE(arg_block, "block", RNODE_BLOCK_PASS(argn)->nd_body));
}

if (LIST_INSN_SIZE_ONE(arg_block)) {
LINK_ELEMENT *elem = FIRST_ELEMENT(arg_block);
Expand Down Expand Up @@ -6306,7 +6356,7 @@ build_postexe_iseq(rb_iseq_t *iseq, LINK_ANCHOR *ret, const void *ptr)
ADD_INSN1(ret, body, putspecialobject, INT2FIX(VM_SPECIAL_OBJECT_VMCORE));
ADD_CALL_WITH_BLOCK(ret, body, id_core_set_postexe, argc, block);
RB_OBJ_WRITTEN(iseq, Qundef, (VALUE)block);
iseq_set_local_table(iseq, 0);
iseq_set_local_table(iseq, 0, 0);
}

static void
Expand Down Expand Up @@ -9391,6 +9441,13 @@ compile_super(rb_iseq_t *iseq, LINK_ANCHOR *const ret, const NODE *const node, i
ADD_GETLOCAL(args, node, idx, lvar_level);
}

/* forward ... */
if (local_body->param.flags.forwardable) {
flag |= VM_CALL_FORWARDING;
int idx = local_body->local_table_size - get_local_var_idx(liseq, idDot3);
ADD_GETLOCAL(args, node, idx, lvar_level);
}

if (local_body->param.flags.has_opt) {
/* optional arguments */
int j;
Expand Down Expand Up @@ -12935,7 +12992,8 @@ ibf_dump_iseq_each(struct ibf_dump *dump, const rb_iseq_t *iseq)
(body->param.flags.has_block << 6) |
(body->param.flags.ambiguous_param0 << 7) |
(body->param.flags.accepts_no_kwarg << 8) |
(body->param.flags.ruby2_keywords << 9);
(body->param.flags.ruby2_keywords << 9) |
(body->param.flags.forwardable << 10);

#if IBF_ISEQ_ENABLE_LOCAL_BUFFER
# define IBF_BODY_OFFSET(x) (x)
Expand Down Expand Up @@ -13149,8 +13207,9 @@ ibf_load_iseq_each(struct ibf_load *load, rb_iseq_t *iseq, ibf_offset_t offset)
load_body->param.flags.ambiguous_param0 = (param_flags >> 7) & 1;
load_body->param.flags.accepts_no_kwarg = (param_flags >> 8) & 1;
load_body->param.flags.ruby2_keywords = (param_flags >> 9) & 1;
load_body->param.flags.anon_rest = (param_flags >> 10) & 1;
load_body->param.flags.anon_kwrest = (param_flags >> 11) & 1;
load_body->param.flags.forwardable = (param_flags >> 10) & 1;
load_body->param.flags.anon_rest = (param_flags >> 11) & 1;
load_body->param.flags.anon_kwrest = (param_flags >> 12) & 1;
load_body->param.size = param_size;
load_body->param.lead_num = param_lead_num;
load_body->param.opt_num = param_opt_num;
Expand Down
5 changes: 3 additions & 2 deletions imemo.c
Original file line number Diff line number Diff line change
Expand Up @@ -315,8 +315,9 @@ rb_imemo_mark_and_move(VALUE obj, bool reference_updating)
}
}
else {
if (vm_cc_super_p(cc) || vm_cc_refinement_p(cc)) {
if (vm_cc_orphan_p(cc) || vm_cc_refinement_p(cc)) {
rb_gc_mark_movable((VALUE)cc->cme_);
rb_gc_mark_movable((VALUE)cc->klass);
}
}

Expand Down Expand Up @@ -471,7 +472,7 @@ vm_ccs_free(struct rb_class_cc_entries *ccs, int alive, VALUE klass)
}
}

VM_ASSERT(!vm_cc_super_p(cc) && !vm_cc_refinement_p(cc));
VM_ASSERT(!vm_cc_orphan_p(cc) && !vm_cc_refinement_p(cc));
vm_cc_invalidate(cc);
}
ruby_xfree(ccs->entries);
Expand Down
27 changes: 25 additions & 2 deletions insns.def
Original file line number Diff line number Diff line change
Expand Up @@ -867,10 +867,22 @@ send
// attr rb_snum_t sp_inc = sp_inc_of_sendish(cd->ci);
// attr rb_snum_t comptime_sp_inc = sp_inc_of_sendish(ci);
{
VALUE bh = vm_caller_setup_arg_block(ec, GET_CFP(), cd->ci, blockiseq, false);
struct rb_forwarding_call_data adjusted_cd;
struct rb_callinfo adjusted_ci;

CALL_DATA _cd = cd;

VALUE bh = vm_caller_setup_args(GET_EC(), GET_CFP(), &cd, blockiseq, 0, &adjusted_cd, &adjusted_ci);

val = vm_sendish(ec, GET_CFP(), cd, bh, mexp_search_method);
JIT_EXEC(ec, val);

if (vm_ci_flag(_cd->ci) & VM_CALL_FORWARDING) {
if (_cd->cc != cd->cc && vm_cc_markable(cd->cc)) {
RB_OBJ_WRITE(GET_ISEQ(), &_cd->cc, cd->cc);
}
}

if (UNDEF_P(val)) {
RESTORE_REGS();
NEXT_INSN();
Expand Down Expand Up @@ -991,10 +1003,21 @@ invokesuper
// attr rb_snum_t sp_inc = sp_inc_of_sendish(cd->ci);
// attr rb_snum_t comptime_sp_inc = sp_inc_of_sendish(ci);
{
VALUE bh = vm_caller_setup_arg_block(ec, GET_CFP(), cd->ci, blockiseq, true);
CALL_DATA _cd = cd;
struct rb_forwarding_call_data adjusted_cd;
struct rb_callinfo adjusted_ci;

VALUE bh = vm_caller_setup_args(GET_EC(), GET_CFP(), &cd, blockiseq, 1, &adjusted_cd, &adjusted_ci);

val = vm_sendish(ec, GET_CFP(), cd, bh, mexp_search_super);
JIT_EXEC(ec, val);

if (vm_ci_flag(_cd->ci) & VM_CALL_FORWARDING) {
if (_cd->cc != cd->cc && vm_cc_markable(cd->cc)) {
RB_OBJ_WRITE(GET_ISEQ(), &_cd->cc, cd->cc);
}
}

if (UNDEF_P(val)) {
RESTORE_REGS();
NEXT_INSN();
Expand Down

0 comments on commit 91b1ff4

Please sign in to comment.