Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reason about iterating over a map's key set using an enhanced for loop #554

Merged
merged 23 commits into from
Jan 21, 2022

Conversation

msridhar
Copy link
Collaborator

@msridhar msridhar commented Jan 15, 2022

Fixes #535

Uses the strategy outlined in this comment: #535 (comment)

@coveralls
Copy link

coveralls commented Jan 15, 2022

Pull Request Test Coverage Report for Build #720

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 44 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.1%) to 89.495%

Files with Coverage Reduction New Missed Lines %
../nullaway/src/main/java/com/uber/nullaway/dataflow/AccessPath.java 6 96.17%
../nullaway/src/main/java/com/uber/nullaway/dataflow/NullnessStore.java 9 89.02%
../nullaway/src/main/java/com/uber/nullaway/dataflow/AccessPathNullnessPropagation.java 29 90.07%
Totals Coverage Status
Change from base Build #716: 0.1%
Covered Lines: 4200
Relevant Lines: 4693

💛 - Coveralls

@msridhar msridhar marked this pull request as ready for review January 15, 2022 18:34
@msridhar msridhar changed the title [WIP] Reason about iterating over a map's key set using an enhanced for loop Reason about iterating over a map's key set using an enhanced for loop Jan 15, 2022
Copy link
Collaborator

@lazaroclapp lazaroclapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and questions, but overall looks good!

Did we run our performance benchmarks on this change?

.addSourceLines(
"Test.java",
"package com.uber;",
"import java.util.*;",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is test code, so arguably this doesn't matter, but shouldn't this be just import java.util.Map?

.addSourceLines(
"Test.java",
"package com.uber;",
"import java.util.*;",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

.addSourceLines(
"Test.java",
"package com.uber;",
"import java.util.*;",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even here, I'd just import Map and HashMap.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just, in general, mild vote for listing the imports individually as if this were real code. But I am open to the opposite argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, fixed in 4257191

"public class Test {",
" public void keySetStuff(Map<Object, Object> m) {",
" // BUG: Diagnostic contains: dereferenced expression",
" m.get(m.keySet().iterator().next()).toString();",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unhandled or a true positive? Technically the map could be empty....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Map were empty, this would cause a NoSuchElementException, not an NPE. So I would say our report is a false positive, since it refers to the possibility of an NPE.

* as this class is designed specifically for reasoning about iterating over map keys using an
* enhanced-for loop over a {@code keySet()}, and for such cases the iterator is always stored
* locally
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only support locals, is there a more precise type than Element?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, tightened to VariableElement in 76ddc4c

* {@code null}.
*/
@Nullable
private Node getMapNodeForKeySetIteratorCall(MethodInvocationNode invocationNode) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a preconditions check that the method being called is Set::iterator()? Or do worry about the extra performance hit of those checks and feel the isEnhancedForIteratorVariable(...) check on the caller and our assumptions about CF's CFG generation are enough of a check?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I was trying to minimize the overhead from unnecessary checks. I am very confident in the invariants we rely on in terms of the CF CFG right now. My hope was that if we bump the CF version and something changes, our tests would fail. I guess there is the possibility that CF would change in a way that would cause us to start thinking other types of expressions were non-null erroneously, introducing a new unsoundness. That seems pretty unlikely to me, but if you'd like, I can introduce some extra sanity checking. I would want to limit that checking to not run for every assignment statement, though, instead only for cases where we are pretty sure we're in an enhanced for loop. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call. I think you are way more familiar with the CF internals than I am, if you feel the invariant is pretty much set in stone and requires no runtime checking, I am fine with that. That said, if you think there is a reasonable value in adding sanity checking, I definitely agree we can defer it until right before the updates.set(...) call. Though, again, if you think our assumptions are unlikely to break, I am also fine leaving this as is.

updates.set(mapWithIteratorContentsKey, NONNULL);
}
}
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why else { if {...} } instead of else if { ... }?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed in e628368

} else {
// Check for an assignment lhs = iter#numX.next(). From the structure of Checker Framework
// CFGs, we know that if iter#numX is the receiver of a call on the rhs of an assignment, it
// must be a call to next().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should trust that CFG is stable enough here, or if we should be checking the method symbol for Iterator::next(). We can do this after verifying isEnhancedForIteratorVariable(...) but before generating the AP, if we are worried about the performance hit from doing so on every method invocation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. If you'd like a check here, is it ok to do it only if mapGetPath != null? That would limit the perf impact even more (we will likely have many enhanced-for loops that are not over the key set of a map).

* Creates an access path identical to {@code accessPath} (which must represent a map), but with
* {@code mapKey} as its map {@code get()} argument
*/
public static AccessPath withMapKey(AccessPath accessPath, MapKey mapKey) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the way we are using this method is that we have an AP of the form ap1 = x.y.m.get(iter#numX) (which is not real, more of a virtual map get) and see v = iter#numX.next() so we call this method with (ap1, v) to obtain ap1 = x.y.m.get(v). Correct?

If so, maybe this should be replaceMapKey(...)? Or, if that's still missing some usage where this is called an access path without an existing map key, then the javadoc should mention it nonetheless "{@code mapKey} as its map {@code get()} argument, replacing any map key information in the original {@code accessPath}."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. We currently only expect this method to be used when accessPath already represents a map get. Renamed the method and clarified Javadoc in 0fca272. I don't think a further precondition check is needed here, but I can add if you like

@msridhar
Copy link
Collaborator Author

Did we run our performance benchmarks on this change?

No, not yet, I'll do that now.

Thanks for the review @lazaroclapp! I addressed most of the comments. There is still an open question about how many well-formedness checks we should add on the CFG; see my inline comments.

Copy link
Collaborator

@lazaroclapp lazaroclapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below. I'd be fine with this landing as is after performance benchmarking. I leave adding well-formedness checks up to what you think makes sense. If you are pretty sure the CFG invariant will not change in an unexpected way for future Java constructs, then it makes sense to omit those checks for performance reasons (but having them and deferring them to the last possible moment is always an option).

* {@code null}.
*/
@Nullable
private Node getMapNodeForKeySetIteratorCall(MethodInvocationNode invocationNode) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call. I think you are way more familiar with the CF internals than I am, if you feel the invariant is pretty much set in stone and requires no runtime checking, I am fine with that. That said, if you think there is a reasonable value in adding sanity checking, I definitely agree we can defer it until right before the updates.set(...) call. Though, again, if you think our assumptions are unlikely to break, I am also fine leaving this as is.

@msridhar
Copy link
Collaborator Author

Here are the perf benchmarking results. master branch:

Benchmark                      Mode  Cnt   Score   Error  Units
AutodisposeBenchmark.compile  thrpt   25  10.837 ± 0.078  ops/s
CaffeineBenchmark.compile     thrpt   25   2.390 ± 0.013  ops/s

With this PR:

Benchmark                      Mode  Cnt   Score   Error  Units
AutodisposeBenchmark.compile  thrpt   25  10.851 ± 0.070  ops/s
CaffeineBenchmark.compile     thrpt   25   2.362 ± 0.019  ops/s

The changes seem to be within the noise.

@msridhar
Copy link
Collaborator Author

msridhar commented Jan 21, 2022

I decided to add the sanity checks, just to be safe. Will re-run perf benchmarks now. If the checks do end up causing overhead we care about, we can remove them later (or possibly find something else small to optimize).

Copy link
Collaborator

@lazaroclapp lazaroclapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved iff no performance regression :)

// receiver represents the map
return baseInvocation.getTarget().getReceiver();
}
}
return null;
}

private boolean isCallToMethod(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@msridhar
Copy link
Collaborator Author

Approved iff no performance regression :)

Re-ran and perf differences are within noise. Landing 🙂

@msridhar msridhar merged commit 4ec519c into master Jan 21, 2022
@msridhar msridhar deleted the keyset-iterator branch January 21, 2022 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better handling of iterating over a map's key set
3 participants