Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coverage: Memoize and simplify counter expressions #125106

Merged
merged 3 commits into from
May 20, 2024

Conversation

Zalathar
Copy link
Contributor

@Zalathar Zalathar commented May 14, 2024

When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like c1 + (c0 - c1), which is equivalent to just c0.

To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead:

  • (a - b) + ba.
  • (a + b) - ba.
  • (a + b) - ab.
  • a + (b - a)b.
  • a - (a - b)b.

Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification.

(Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.)


This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates.

This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by #124278 (comment).

This currently has no effect, but is expected to be useful when expanding
support for branch coverage and MC/DC coverage.
Some of these cases currently don't occur in practice, but are included for
completeness, and to avoid having to add them later as branch coverage and
MC/DC coverage start building more complex expressions.
@rustbot
Copy link
Collaborator

rustbot commented May 14, 2024

r? @davidtwco

rustbot has assigned @davidtwco.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 14, 2024
@rustbot
Copy link
Collaborator

rustbot commented May 14, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rustbot rustbot added the A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) label May 14, 2024
@Zalathar
Copy link
Contributor Author

@ZhuUx This is my proposal for how to add counter expression simplification.

It is inspired by #124154 and by your proposed patch from #124278 (comment), but is implemented a bit differently.

I decided to include all possible 3-operand simplifications, even though some of them currently don't occur in practice, to avoid having to keep adding more as branch coverage and MC/DC start introducing more complex expressions. We can always get rid of the useless ones later, after that work is more mature.

Number of files: 1
- file 0 => global file 1
Number of expressions: 164
Number of expressions: 7
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these improvements are pretty modest, but this particular test sees 164 counter expressions reduced to just 7!

@ZhuUx
Copy link
Contributor

ZhuUx commented May 14, 2024

Cool this should be more powerful than my expected. Since expressions do not cause runtime overhead in general, we'd better not cost much on simplifying it. This patch probably is the best choice to balance between cost and effectiveness.

I have tried other more aggressive simplifying methods like trie tree and a np algorithm, they do not give more drastic optimization compared to this pr but cost much more. Because most computational expressions are located in blocks that are relatively adjacent on the control flow graph, we can believe 3-operands simplification is enough in most cases.

Comment on lines -19 to -20
- Code(Expression(0, Add)) at (prev + 5, 1) to (start + 0, 2)
= (c1 + Zero)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any explanation of why this (essentially Counter(1)) is turned into Counter(0) instead?
This same pattern turns up in a bunch of files.

Copy link
Contributor

@ZhuUx ZhuUx May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general an expression can be lowered to zero due to some constants. For example, partial mir of this test function is

    coverage ExpressionId(0) => Expression { lhs: Counter(0), op: Subtract, rhs: Counter(1) };
    coverage ExpressionId(1) => Expression { lhs: Counter(1), op: Add, rhs: Expression(0) };

    bb0: {
        Coverage::CounterIncrement(0);
        // ... 
        _3 = const true;
        switchInt(move _3) -> [0: bb4, otherwise: bb1];
    }

    bb1: {
        Coverage::CounterIncrement(1);
        // ... 
    }

    bb4: {
        Coverage::ExpressionUsed(0);
        // ... 
    }

Since _3 is always true, we have Counter 1 == Counter 0 and Expression 0 can be lowered to Zero.
Then Expression 1 := Counter 1 + Expression 0, and we can see (c1 + Zero) here in previous.

This patch simplifies expressions before any expression is lowered to Zero. Therefore we get Expression 1:= Counter 1 + Expression 0 = Counter 1 + (Counter 0 - Counter 1) = Counter 0 first. Hence later it is not related to c1 or Zero but it should have same value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. So in other words:

  • Without expression simplification, this would have been c1 + (c0 - c1).
    • Then the block containing (c0 - c1) is removed by MIR opts, so coverage codegen replaces it with Zero, and we get c1 + Zero.
  • With simplification, we just immediately get c1, since c0 cancels itself out.

@RenjiSann
Copy link
Contributor

I am not 100% sure, but I remember seeing a simplification step operated on the LLVM side.
If the simplication does the same thing as we do, do we still need to do it on the Rust side, or can we let LLVM handle it ?

See this function that is used during codegen: https://github.com/llvm/llvm-project/blob/d9db2664994ff672f50d7fd0117477935dac04f1/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp#L77

@Zalathar
Copy link
Contributor Author

I am not 100% sure, but I remember seeing a simplification step operated on the LLVM side. If the simplication does the same thing as we do, do we still need to do it on the Rust side, or can we let LLVM handle it ?

See this function that is used during codegen: https://github.com/llvm/llvm-project/blob/d9db2664994ff672f50d7fd0117477935dac04f1/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp#L77

Ah, I remember having seen this code in the past, but I had forgotten about it until now.

I believe the reason we don't benefit from this simplification step is that (unlike clang) we never actually use CounterExpressionBuilder; instead we build a list of expressions and pass it directly to CoverageMappingWriter.

(So LLVM still removes unused expressions for us, but it won't simplify the ones that are used.)

Changing our FFI code to use CounterExpressionBuilder seems like more trouble than it's worth. If we wanted to perform the same simplification, it would probably be easier to reimplement it on the Rust side.

But doing so would require deeper changes to how we store and manipulate expressions in the instrumentor (or more complexity in codegen). The advantage of this PR is that it's a very “drop-in” solution, easy to add now, and (hopefully) easy to remove later if we switch over to a more thorough approach to simplifying expressions.

@davidtwco
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented May 20, 2024

📌 Commit d01df6f has been approved by davidtwco

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 20, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request May 20, 2024
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#124682 (Suggest setting lifetime in borrowck error involving types with elided lifetimes)
 - rust-lang#124917 (Check whether the next_node is else-less if in get_return_block)
 - rust-lang#125106 (coverage: Memoize and simplify counter expressions)
 - rust-lang#125173 (Remove `Rvalue::CheckedBinaryOp`)
 - rust-lang#125305 (add some codegen tests for issue 120493)
 - rust-lang#125314 ( Add an experimental feature gate for global registration)
 - rust-lang#125318 (Migrate `run-make/rustdoc-scrape-examples-whitespace` to `rmake.rs`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit e0d9228 into rust-lang:master May 20, 2024
6 checks passed
@rustbot rustbot added this to the 1.80.0 milestone May 20, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 20, 2024
Rollup merge of rust-lang#125106 - Zalathar:expressions, r=davidtwco

coverage: Memoize and simplify counter expressions

When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`.

To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead:
- `(a - b) + b` → `a`.
- `(a + b) - b` → `a`.
- `(a + b) - a` → `b`.
- `a + (b - a)` → `b`.
- `a - (a - b)` → `b`.

Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification.

(Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.)

---

This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates.

This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang#124278 (comment).
@Zalathar Zalathar deleted the expressions branch May 20, 2024 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants