Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Improve alias analysis for indirections #1736

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

dave-bartolomeo
Copy link
Contributor

This PR improves our alias analysis and SSA construction for access to memory via indirection, including fields of the indirection. Before, we could get precise flow through fields as long as we knew exactly which variable and which field was being accessed. Now, we can often tell that two accesses touch the same memory location, even if we don't know exactly which memory location that is.

Alias Analysis Changes

AliasAnalysis.qll is now parameterized base on a Configuration, which defines a type named Allocation. Allocation represents a contiguous region of memory that is disjoint from all other instances of Allocation. Two memory accesses that belong to different allocations can never overlap. When building unaliased SSA, Allocation represents an automatic (stack) variable, since that is all we care to model. When building aliased SSA, Allocation represents a ValueNumber for a VariableAddressInstruction, but in the future will also include InitializeParameterInstructions and possibly calls to malloc-like functions.

The entry points into AliasAnalysis.qll are now:

  • allocationEscapes(Configuration::Allocation allocation): Holds if the specified allocation's address escapes.
  • addressOperandBaseAndConstantOffset(AddressOperand addrOperand, Instruction base, int bitOffset): Holds if the specified AddressOperand points to an address equal to base plus a constant offset bitOffset. Two accesses with the same base address overlap if and only if their offsets overlap.
  • getAddressOperandAllocation(AddressOperand addrOperand): Gets the allocation into which addrOperand points, if known.

AliasedSSA.qll has been changed to track the allocation and base address for every indirect memory access. The overlap relationship now uses the following rules to determine overlap:

  • A def and use in different virtual variables never overlap.
  • A def and use that both have known but different allocations never overlap.
  • A def and use with the same base address overlap if and only if their offsets from that base address overlap. This ensures that *p and *p always exactly overlap, and p->x never overlaps p->y, even if we don't know exactly what p points to.

SimpleSSA.qll has been changed to track everything in terms of Allocations, but since an Allocation in SimpleSSA is just an IRAutomaticVariable, nothing really changed here.

Fixed handling of PointerOffsetInstruction in alias analysis to only propagate the pointer operand.

Test Changes

Added the first SSA sanity check, for operands with more than one associated MemoryLocation, and tests that run this check on our existing IR test sources.

Added a simple test for SSA for indirect parameter accesses.

Moved the points_to.ql test to its own directory, and made it use a new InlineExpectationsTest class. This class lets you define the results of your test in terms of (tag, value) pairs, each associated with a location. The base class looks for matching comments of the form $tag=value on the appropriate line, and reports an output row for each missing or unexpected result. The ".expected" file should always be an empty file. This allows us to modify the test sources, including inserting lines, without getting a bunch of diffs just due to line numbers changing.

Miscellaneous

Made duplicateOperand have the output of a problem query, including a link to the containing function.

Made IRVariable non-abstract, so that it is now possible to extend IRVariable to restrict rather than extend.

Fixed a bug where PrintSSA would print some properties on the wrong instruction.

Refactored the integer constant code a bit. Bit offsets are now printed in the format bytes:bits, where :bits is omitted if bits is zero. Thus, 21 bits is printed as 2:5, while 24 bits is printed as just 3.

@dave-bartolomeo
Copy link
Contributor Author

This produces the expected results on our tests, but I still need to test it (including performance) on real snapshots.

@jbj
Copy link
Contributor

jbj commented Aug 14, 2019

This is going to be very hard to review. Is there any way it can be split up into multiple commits that are reasonable in size and make sense on their own?

@dave-bartolomeo
Copy link
Contributor Author

@jbj As long as the individual commits don't have to actually work in isolation, I can definitely split it up.
I think the big chunks to review, in order, are:

  1. AliasAnalysis.qll
  2. SimpleSSA.qll+AliasConfiguration.qll (uses the new APIs from AliasAnalysis.qll)
  3. AliasedSSA.qll+AliasConfiguration.qll (uses the new APIs from AliasAnalysis.qll, but does more with them)
  4. The test changes
  5. InlineExpectationsTest.qll (used by the points_to test, but easier to understand once you've seen the changes to that test).
  6. Everything else (mostly simple changes)

I'll have it split in a couple hours.

@dave-bartolomeo
Copy link
Contributor Author

OK, I've curated all of the changes into separate commits. Many of those commits are pretty trivial. I've saved AliasedSSA.qll for last.

jbj added a commit to jbj/ql that referenced this pull request Sep 5, 2019
This uses the autoformatter from internal PR 34050. Some files that will
change in github#1736 have been spared.

    ./build -j4 target/jars/qlformat
    find ql/cpp/ql -name "*.ql"  -print0 | xargs -0 target/jars/qlformat --input
    find ql/cpp/ql -name "*.qll" -print0 | xargs -0 target/jars/qlformat --input
    (cd ql && git checkout 'cpp/ql/src/semmle/code/cpp/ir/implementation/**/*SSA*.qll')
    buildutils-internal/scripts/pr-checks/sync-identical-files.py --latest
jbj added a commit to jbj/ql that referenced this pull request Sep 9, 2019
Some files that will change in github#1736 have been spared.

    ./build -j4 target/jars/qlformat
    find ql/cpp/ql -name "*.ql"  -print0 | xargs -0 target/jars/qlformat --input
    find ql/cpp/ql -name "*.qll" -print0 | xargs -0 target/jars/qlformat --input
    (cd ql && git checkout 'cpp/ql/src/semmle/code/cpp/ir/implementation/**/*SSA*.qll')
    buildutils-internal/scripts/pr-checks/sync-identical-files.py --latest
/**
* A memory allocation that can be tracked by the AliasedSSA alias analysis.
* For now, we track all variables accessed within the function, including both local variables
* and global variables. In the future, we will track indirect parameters as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about modeled allocators or default new?

private Overlap getVariableMemoryLocationOverlap(VariableMemoryLocation def, VariableMemoryLocation use) {
overlappingIRVariableMemoryLocations(def, use) and
result = Interval::getOverlap(def.getStartBitOffset(), def.getEndBitOffset(), use.getStartBitOffset(), use.getEndBitOffset())
private Overlap getIndirectMemoryLocationOverlap(IndirectMemoryLocation def, IndirectMemoryLocation use) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written carefully to avoid a quadratic blowup on https://lgtm.com/projects/g/ALaDyn/piccante/latest/files/src/sobol.cpp - can you make sure it's still performant?

*/
private string getBaseAddressString(ValueNumber baseAddress) {
// If it's the value of a parameter, just use the parameter name.
result = baseAddress.getAnInstruction().(InitializeParameterInstruction).getParameter().toString() or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also special-case VariableAddressInstruction here?

@jbj
Copy link
Contributor

jbj commented Apr 29, 2020

I recently asked @dbartol to remind me what this PR does.

  • It's an analysis to connect definitions of *p with uses of *p, regardless of whether we have modelled the allocation behind it.
  • Most of the problem was already solved by modelling parameters as disjoint allocations.
  • An extra stage of value numbering is required, which has performance implications.
  • We decided to only revive this PR if we observe real issues.

@adityasharad adityasharad changed the base branch from master to main August 14, 2020 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants