Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include generated code in determining effective regions. #118

Merged
merged 41 commits into from
Dec 22, 2022

Conversation

nimakarimipour
Copy link
Member

@nimakarimipour nimakarimipour commented Dec 20, 2022

This PR is built on #116

This PR enhances logic for selecting the potentially impacted regions to include generated code by processors (e.g. Lombok).

Trackers are used to detect potentially impacted regions for a fix. These trackers only include the regions where the element was used directly. But since some processors, will copy the annotations on the element over some generated methods, these trackers will miss the real impacted regions because of the copied annotation. This PR tries to address this issue.

Changes in this PR:

  • Adds GeneratedCodeDetector interface in annotator-scanner module which is responsible for determining if a TreePath is pointing to a generated code.
  • Adds LombokGeneratedCodeDetector which implements GeneratedCodeDetector and can detect if a path is pointing to a Lombok generated code. For this feature to work properly (lombok.addLombokGeneratedAnnotation = true must be set)
  • Adds a fifth column in the element use serialization in regions which denotes if the serialized region exists in source code or is generated by a processor. If the region exists in the source code it will have the value "SOURCE", otherwise, it will have the name of the processor that created it.
  • Adds GeneratedRegionTracker in annotator-core module which is responsible for extending the list of potentially impacted regions to include users of generated code that uses the target element.
  • Adds LombokTracker which implements GeneratedRegionTracker and can extend regions with other regions that Lombok has created.

To enable handling regions created by Lombok, the flag below must be passed:

-ardl, --activate-region-detection-lombok

@nimakarimipour nimakarimipour added the enhancement New feature or request label Dec 20, 2022
@nimakarimipour nimakarimipour self-assigned this Dec 20, 2022
Copy link
Member

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get the high-level idea. I have a few comments, mostly around use of enums instead of Strings

@@ -141,6 +141,8 @@ public class Config {
*/
private NullAwayVersionAdapter adapter;

public final ImmutableSet<String> generatedCodeDetectors;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than Strings it'd be better to create an enum of supported detectors. You could then create the set using Guava's Sets.immutableEnumSet https://guava.dev/releases/snapshot/api/docs/com/google/common/collect/Sets.html#immutableEnumSet(E,E...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion, I used enum 4bbd2e4 and f501021

+ values.length);
String regionMember =
!Region.getType(values[1]).equals(Region.Type.METHOD) ? "null" : values[1];
return new TrackerNode(new Region(values[0], regionMember), values[2], values[3]);
return new TrackerNode(new Region(values[0], regionMember, values[4]), values[2], values[3]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change relates to changed output from annotator-scanner, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly.

* Type of region. If region exists in source code, this value is {@code "SOURCE"}, otherwise it
* will be the name of the processor created this region. (e.g. {"LOMBOK"}).
*/
public final String sourceType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again an enum would be better than a String here. Maybe SOURCE should be in the enum I suggested above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @return Name of the generator that produced the code and {@code "SOURCE"} if element exists in
* source code.
*/
String getSourceForSymbolAtPath(TreePath path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again use enum for return

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public class LombokGeneratedCodeDetector implements GeneratedCodeDetector {

/** Associated name for Lombok in region serializations in <i>SOURCE_TYPE</i> column. */
private static final String NAME = "LOMBOK";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use enum

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Override
public Set<Region> extendWithGeneratedRegions(Set<Region> regions) {
return regions.stream()
// filter regions where are created by lombok
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here is that for any Lombok-generated method in regions, we also add all uses of that method to the impacted regions. Is this the right idea? If so, let's add some comments / Javadoc documenting the logic here. Also we should state briefly why we think this is sufficient to get all the relevant regions, given the documented assumptions on how Lombok propagates annotations (see previous comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@lazaroclapp lazaroclapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a pass. Also, again, I think the description could use a bit of an edit/proofread. It's useful, but there are a few sentences that are harder to parse than they need to be.

* @return Associated Source type of the generator that produced the code. If element exists in
* source code and not produced by any processor, it will return {@link SourceType#SOURCE}
*/
SourceType getSourceForSymbolAtPath(TreePath path);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this should be a method of Config or if it makes more sense to have Config merely return a separate object with this method (SymbolSourceResolver?), just to avoid config doing work not related to parsing configuration options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path.getLeaf() instanceof MethodTree
? (MethodTree) path.getLeaf()
: ASTHelpers.findEnclosingNode(path, MethodTree.class);
if (enclosingMethod == null && path.getLeaf() instanceof MethodTree) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this case not covered by the conditional expression above? Seems like an exact repetition of the first case of the conditional...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5a0eee1

Yes, that it true, thank you for noticing this.

enclosingMethod = (MethodTree) path.getLeaf();
}
if (enclosingMethod == null) {
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this method is incomplete... no? What happens with e.g generated inner classes or generated fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only focusing on generated methods which can be called outside of the class and extend the potentially impacted regions. But, you are right, I don't see any harm in having a comprehensive check on the generated code of any type. Please see 3621feb. Used the same logic for detecting region member which already covers all cases.

check("NullAway", CheckSeverity.ERROR)
check("AnnotatorScanner", CheckSeverity.ERROR)
option("NullAway:AnnotatedPackages", "test")
option("NullAway:SerializeFixMetadata", "true")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, internally we also run with the following related configuration flags, not sure if this makes lombok detection harder or easier for the auto-annotator, but worth keeping in mind:

-XepOpt:NullAway:ExcludedClassAnnotations=[...],lombok.Generated
-XepOpt:NullAway:ExternalInitAnnotations=[...],lombok.Builder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for noting this. I will add these flags, in the followup PR for adding more unit tests.

new Option(
"ardl",
"activate-region-detection-lombok",
false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we want this to be false by default? What's the cost on non-Lombok code? (Btw, I absolutely agree there should be a flag to control this, but maybe this should be on by default and possible to turn off via a flag?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

94f67a7 activated it by default.

new FieldRegionTracker(config, info),
methodRegionTracker,
new ParameterRegionTracker(tree, methodRegionTracker));
Set<GeneratedRegionTracker> generatedRegionTrackers = new HashSet<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not build with ImmutableSet.builder() rather than hashset and then copyOf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set<Region> regions = new HashSet<>();
this.trackers.forEach(tracker -> tracker.getRegions(location).ifPresent(regions::addAll));
this.generatedRegionsTrackers.forEach(
tracker -> regions.addAll(tracker.extendWithGeneratedRegions(regions)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor note here: This is safe as long as there aren't cases where generated code from one processor is derived from generated code from another, or if the GeneratedRegionTrackers are all listed in the same order as that in which processors run (assuming no cycles and no concurrency of annotation processors? Not sure how they actually work here, cc: @msridhar ). Otherwise this would need to be run to a fix-point. That said, I think it's very rare for generated code from one processor to be then fed to another, so I wouldn't worry about handle that case for now...

Optional<Set<Region>> ans = tracker.getRegions(onMethod);
return ans.isPresent() ? ans.get().stream() : Stream.of();
})
.collect(Collectors.toSet());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works on methods only and one hop, right? I.e. if our normal exploration adds @Nullable to field Foo.bar and Foo is @Data, our initial affected regions (before this fix) will include getBar(). Then this chance makes sure we add all methods/fields/etc that would be affected by getBar() itself automatically becoming @Nullable due to Lombok. Correct?

This might not be enough to track all the changes to fields and methods of the inner class Foo.Builder or their impacted regions if the class is also annotated @Builder. Right?

Happy if the answer is no and we can do that other case as a follow up PR, btw 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at generated code by @Builder and I did not see any generated code getting reused again. But I think as discussed in the follow up PR for more tests, we can also resolve this issue if it comes up 🙂

@nimakarimipour
Copy link
Member Author

@lazaroclapp Thank you for the review. I resolved all your comments and this is ready for another round. Thank you 🙂.

Base automatically changed from nimak/refactor-tracker-node to master December 21, 2022 19:34
Copy link
Member

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments this time. Approving assuming they will be addressed before landing

// add potentially impacted regions for the collected methods.
.flatMap(
onMethod -> {
Optional<Set<Region>> ans = tracker.getRegions(onMethod);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but it seems strange that the getRegions() method returns an Optional<Set<Region>> rather than just a Set<Region> that could be empty. Maybe fix in a follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msridhar I wanted to distinguish between trackers "not having any knowledge about the passed element" and "having data but not having any actual usage". But this was required in the old designs and should be updated in follow up PRs.

@Nullable
public static Symbol locateRegionMemberForSymbolAtPath(
TreePath path, Symbol.ClassSymbol enclosingClass) {
// Class is not generated by lombok. Let's find the region member, can be a field, method or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a comment about lombok here? Feels somewhat unrelated to what this method is doing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry thanks for noticing this. This was copied from existing method. 3738766

public class LombokGeneratedCodeDetector implements GeneratedCodeDetector {

/** Associated source type for Lombok in region serializations in <i>SOURCE_TYPE</i> column. */
private static final SourceType sourceType = SourceType.LOMBOK;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this constant, since it's only used once. Just return SourceType.LOMBOK from the method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* impacted as well.
*
* @param regions Set of potentially impacted regions.
* @return Set of potentially impacted regions including all knowing potentially impacted
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return Set of potentially impacted regions including all knowing potentially impacted
* @return Set of potentially impacted regions including all known potentially impacted

But also this is both hard to parse and - I think - inaccurate.

You are not extending the list exclusively with generated regions, but, rather whenever a region in the initial set (a region affected by a fix) is a generated region, we are also adding every region that would be impacted by a fix within the generated region, since the annotation processor might propagate changes from the original fix to this generated region independent of the annotator's own logic. Is this correct? Is there a clearer way to explain this? (cc: @msridhar )

nimakarimipour and others added 3 commits December 21, 2022 14:55
…ProneCLIFlagsConfig.java

Co-authored-by: Lázaro Clapp <lazaro@uber.com>
…ackers/generatedcode/GeneratedRegionTracker.java

Co-authored-by: Lázaro Clapp <lazaro@uber.com>
nimakarimipour and others added 2 commits December 21, 2022 17:20
…ackers/generatedcode/GeneratedRegionTracker.java

Co-authored-by: Manu Sridharan <msridhar@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants