-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialization version 3: update method serialization to exclude type use annotations and type arguments #735
Serialization version 3: update method serialization to exclude type use annotations and type arguments #735
Conversation
This PR refactors the serialization logic to call a `static` method in `Serializer` to convert a `Symbol` to its corresponding `String` representation. This PR is in preparation for the update on serialization service in #735.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments
nullaway/src/main/java/com/uber/nullaway/fixserialization/Serializer.java
Outdated
Show resolved
Hide resolved
nullaway/src/main/java/com/uber/nullaway/fixserialization/Serializer.java
Outdated
Show resolved
Hide resolved
} | ||
// Index for iteration in visiting method parameters. Used to determine if visiting the last | ||
// parameter. | ||
AtomicInteger index = new AtomicInteger(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer just using a loop with an index to a stream with an AtomicInteger
keeping the index. We could just directly update sb
as we go. To me, that would make for clearer code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also removed as part of d74cde1.
// manually convert the last parameter to varargs if the method is varargs. | ||
Type type = | ||
// if the parameter is the last parameter, | ||
(index.getAndIncrement() == methodSymbol.getParameters().size() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The getAndIncrement()
side effect inside the conditional here is not my style 🙂 But this would go away if we use a loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now removed as part of d74cde1 🙂
? ((Type.ArrayType) parameter.type).makeVarargs() | ||
: parameter.type; | ||
return type.accept( | ||
new Types.DefaultTypeVisitor<String, Void>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we adapt any of this code from Error Prone? If so we should note the source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msridhar Would you please tell me where should I write this ? Should it be inside the method javadoc, or at the first line of the method body in a comment ? Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just put it in the Javadoc. Add a sentence saying this code was based on method XYZ from Error Prone or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in javadoc, though with our latest version, I am not sure if the code is actually based on errorprone util method. We are not using a visitor class and the output is different.I added it anyway. But please let me know if you would like to remove it. 6e0e038
if (t.getTypeArguments().nonEmpty()) { | ||
sb.append('<'); | ||
sb.append( | ||
t.getTypeArguments().stream() | ||
.map(arg -> arg.accept(this, null)) | ||
.collect(joining(","))); | ||
sb.append(">"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think type arguments are necessary to uniquely identify a method. Do we need them in the string for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is not required. But changing that will require updating 2 unit test outputs and a change in Injector. Please let me know if I should remove type argument serialization. I am fine with changing Injector, but just wanted to confirm if we should change unit test output in this PR as well. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msridhar At this moment, I think removing type argument serialization will simplify both code in NullAway and Injector actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal preference is to have less code to maintain. So, if the code change to the injector is not too significant / difficult, I'd prefer not to serialize the type arguments.
On the other hand, if you feel that serializing type arguments is important for readability, or some other reason that I am missing, I don't strongly object to keeping them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree and I prefer to remove it as well. It will reduce complexity and worths the update on Injector as well. 424bdb5
nullaway/src/main/java/com/uber/nullaway/fixserialization/Serializer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple more comments
if (methodSymbol == null) { | ||
return "null"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this since methodSymbol
cannot be null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Override | ||
public String visitWildcardType(Type.WildcardType t, Void unused) { | ||
StringBuilder sb = new StringBuilder(); | ||
sb.append(t.kind); | ||
if (t.kind != BoundKind.UNBOUND) { | ||
sb.append(t.type.accept(this, null)); | ||
} | ||
return sb.toString(); | ||
} | ||
|
||
@Override | ||
public String visitCapturedType(Type.CapturedType t, Void s) { | ||
return t.wildcard.accept(this, null); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get rid of serializing type arguments, I don't think we will need either of these two overrides either. Maybe we can delete them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, since we decided to remove type arguments, we can implement this with simple iteration rather than a visitor. 424bdb5
@msridhar Thank you for the review, the code is much cleaner now. This is ready for another round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple more minor things, but otherwise this LGTM! Will wait for @lazaroclapp to take a quick look before landing
* Serializes the signature of the given {@link Symbol.MethodSymbol} to a string. This code is | ||
* based on {@link com.google.errorprone.util.Signatures#prettyMethodSignature}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a point, this code is pretty different from the Error Prone code now. So probably we can remove the sentence referencing the Signatures function. Sorry to go back and forth :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Agreed on this one!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, removed the reference 6e1406b
@@ -383,7 +383,7 @@ public void suggestNullableParamGenericsTest() { | |||
.setExpectedOutputs( | |||
new FixDisplay( | |||
"nullable", | |||
"newStatement(T,java.util.ArrayList<T>,boolean,boolean)", | |||
"newStatement(T,java.util.ArrayList,boolean,boolean)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since generics are erased at the bytecode level, the T
here would turn into java.lang.Object
in a JVM-level method descriptor. But, as long as we also serialize the enclosing class, I think keeping T
here should be no problem. @nimakarimipour do you agree? Assuming yes, we are fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msridhar Sorry, I am not sure if I understood correctly. Currently we are not serializing type arguments, therefore, for ArrayList<T>
in a method, we only serialize java.util.ArrayList
. Our current implementation does not visit the type arguments at all. If we decide to keep T
in this case. I am not sure how we should serialize Map<Object, ArrayList<T>>
without full serialization of all type arguments which requires switching to visitor. Our current implementation in this PR will only serialize java.util.Map
.
Considering Annotator, for a method serialization we require two conditions.
- A method serialization which is robust to annotation injection / deletion in source code.
- Preserve enough information in the string representation so that Injector can locate the exact method declaration.
I think java.util.ArrayList
or java.util.Map
satisfies both conditions. Would you please elaborate what I should do or what I am missing here. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Manu's question here is just, why the signature isn't simply:
newStatement(java.lang.Object,java.util.ArrayList,boolean,boolean)
if T
is type-erased to java.lang.Object
.
The problem is that if we want to perform type erasure of arguments with a type argument as their type, then we need to be very careful. And we need to be careful here and also in the javaparser code, no?
For example, here the compiler will throw an error, because indeed, at the JVM level, these two bar
methods are one and the same:
class Foo<T> {
public void bar(T t) { }
public void bar(Object t) { }
}
However, simply changing the code to:
class Foo<T extends String> {
public void bar(T t) { }
public void bar(Object t) { }
}
will caused to compile. Because now the JVM signature of the first bar
is bar(java.lang.String)
, which doesn't clash with bar(java.lang.Object)
below.
I think if both NullAway and the javaparser code are looking at the source and serializing "T"
, then that's the most robust case. But a question is, don't we serialize methods from bytecode jars when looking at the downstream dependencies? Is there a case where serialization is looking at a symbol from bytecode, yet we are then trying to resolve the method name to inject an annotation?
p.s. presumably getting the Signature.descriptor
already returns the type-erased signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @lazaroclapp that's what I was thinking about. And I think if we serialize T
in all cases we are ok. But you're right that if we need to serialize the same method signature both from source code and also from bytecode then that could be tricky. @nimakarimipour what do you think?
I presume Signature.descriptor
returns a type-erased signature but I haven't tested to see if it does the right thing for the cases above (I actually didn't even though that the upper bound of a type variable can affect the generated bytecode!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msridhar @lazaroclapp Sorry for the confusion and thank you for the clarification, now I understand the issue. I just tested the serialization. It serializes T
in all cases. I applied the same logic here for method serialization in annotator-scanner and checked for downstream dependencies, for a method with generic type in its argument, it can correctly:
- Scan the potentially impacted regions on downstream dependencies.
- Serialize a fix for an element in upstream.
Please see the example used in test:
// In upstream
Foo<T> {
public T bar(T p1) {
return null;
}
}
and
// In downstream
Dep {
Foo<Object> foo;
public Object baz() {
return foo.bar(null);
}
}
For method Foo#bar(T)
it can correctly detect Dep#baz()
is an impacted region and the fix for passing null
in Dep#baz()
is making Foo#bar(T)
accept nullable.
An interesting point is that parameter names are not consistent for Foo
while processing downstream dependencies since received as jar. We weren't depending on parameter names and were using index, but I should just keep in mind never depend on that for future works. 🙂.
So I think we can keep the serialization as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't Dep#baz()
return Object
rather than T
? But otherwise this makes sense to me. At runtime, for backward compatibility with pre-generics bytecodes, generic types are erased in method descriptors. But, all the generic type information still needs to be preserved somewhere in the class files, to enable proper type checking of downstream deps in javac
. Fortunately, it seems that javac
returns the proper generic type info for upstream for the APIs we are using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msridhar Yes, exactly. I simplified the actual tested source code and missed this. It was Object
in Dep#baz()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we get un-erased generics even when looking at jar files? Cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments below. Also, I changed the title of the PR to be a bit more descriptive and responded to a thread from @msridhar's review 🙂
.addSourceLines( | ||
"com/uber/Test.java", | ||
"package com.uber;", | ||
"import java.util.*;", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in tests, I'd prefer no star imports if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fixed it in 73f1a85
* Serializes the signature of the given {@link Symbol.MethodSymbol} to a string. This code is | ||
* based on {@link com.google.errorprone.util.Signatures#prettyMethodSignature}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Agreed on this one!)
// For constructors, we use the name of the enclosing class. | ||
Name name = encClass.getSimpleName(); | ||
if (name.isEmpty()) { | ||
// For anonymous classes, we use the name of the superclass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, since anonymous classes can't declare constructors, so if we see an error for:
List list = new ArrayList(10) {...}
the constructor involved will literally have to be ArrayList(int)
(though the class won't match, since that constructor is declared in java.util.ArrayList
and our encClass
will be com.example.Foo$1
or similar...).
Actually, meta note here: we have a big number of corner cases here to handle serializing constructors. Do we have existing test cases in NullAwaySerializationTest.java
for:
a) ErrorDisplay
with type/kind METHOD
and a standard constructor
b) This case for anonymous inner class "constructors".
?
(Basically, do we have coverage here for both branches of the conditional above?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lazaroclapp's comment made me think of something else. What uniqueness property do we need for these method signatures? If we have two anonymous subclasses of the same superclass, the signatures will be identical. And, both of these anonymous classes could appear in the same containing method. Is that a problem? Or are we guaranteed that other information alongside the method signature (e.g., a source location for the error) will give us the uniqueness property we need? It'd be good to document somewhere the required uniqueness properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are serializing the enclosing class, so we would get method ArrayList(int)
in Foo$1
and method ArrayList(int)
in Foo$2
. The suppression would end up in the method containing both anonymous inner classes, but that's just because there is nowhere to put it on the classes themselves, not because we don't have that info. At least that's my understanding. Is this correct, @nimakarimipour ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msridhar @lazaroclapp Yes the enclosing class (flat name) is always serialized along the method signature and the combination of two makes the pair unique. Regarding the conditions in the beginning of this method, I was just following logic implemented in ErrorProne
rather than calling the getName()
on the method symbol. But after trying calling getName()
on the symbol just now, realized that that simple name for constructors is always <init>
and not the class name, therefore, we have to manually locate the enclosing class for constructors. Regarding the case where the constructor is inside an anonymous class, I don't have an example as of now that how it can happen to test it. But I think maybe we can just keep this branch to be cautious that if for any reason the execution hit that branch, it serialize something rather than empty ? Please let me know your thoughts on this. I updated comments and added a test case specifically for constructors.
76c28d6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An anonymous class cannot declare its own constructors, period. I believe the only case where name.isEmpty()
would return true
is when you have a constructor call for an new anonymous class, i.e., something like new ArrayList(10) {...}
. For this NewClassTree
, I think if you look at the MethodSymbol
for the invoked method, it will be a constructor where the name of the enclosing class is empty. But, for our use case, we should never be trying to serialize such a MethodSymbol
, correct @nimakarimipour? If that is the case, I would rather throw a RuntimeException
here rather than using the superclass name. Then, we will know if we were correct in our assumption that this is impossible, or whether there is a case that we missed and need to think about more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added a comment ddb603c for future cases.
@lazaroclapp Thank you very much for the review. I think this is ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting a test case below, plus the version bump and compatibility discussed in Slack.
"Did not expect method serialization for anonymous class constructor: " | ||
+ methodSymbol | ||
+ ", in anonymous class: " | ||
+ encClass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we still add a test case with:
- A class
Foo
with a constructor taking a non-null arg (i.e.A(Object o)
) - An anonymous subclass of
Foo
, declared asnew Foo(null) { ... }
Just to make sure it doesn't attempt to serialize an anonymous class constructor when encountering that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, please find it cdbbae4
…ersion 3. (#738) This PR is merely refactoring which prepares serialization logic for version `v3` implemented in #735. Changes in this PR - Uses adapter to serialize method signatures. - Adds `SerializationAdapter` to `tabSeparatedToString` methods for symbol serialization. - Adds `SerializationAdapter` to `serializeSymbol` static method for symbol serialization.
@lazaroclapp Thank you for the review, this is ready for another round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Some small requests!
@@ -136,8 +136,8 @@ private SerializationAdapter initializeAdapter(int version) { | |||
switch (version) { | |||
case 1: | |||
return new SerializationV1Adapter(); | |||
case 2: | |||
return new SerializationV2Adapter(); | |||
case 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a explicit case for v2 that throws a RuntimeException(...)
noting that v2 is skipped and was used for an alpha version of the auto-annotator tool. Recommend the user to use v3
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, please find it here 56f58be
nullaway/src/test/java/com/uber/nullaway/NullAwaySerializationTest.java
Outdated
Show resolved
Hide resolved
"-XepOpt:NullAway:FixSerializationConfigPath=" + configPath)) | ||
.addSourceLines( | ||
// toString() call on method symbol will serialize the annotation below which should not | ||
// be present in string representation fo a method signature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which annotation? (Test looks fine to me, but comment seems unrelated. Also, this test case is subtle enough that rather than just removing the comment, if it's an incorrect copy paste, I'd rewrite it to explain what we are actually testing: correct serialization of names for constructors invoked while creating anonymous inner classes, where the name is technically the name of the super-class, not of the anonymous inner class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sorry mistake, fixed it in b767907 and added your explanation.
…Test.java Co-authored-by: Lázaro Clapp <lazaro.clapp@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚀
Serialization version 3: update method serialization to exclude type use annotations and type arguments (uber#735) This PR updates serialization to version `3` and burns version `2`. Changes in serialization version 3: - Type arguments and Type use annotations are excluded from the serialized method signatures. --- Additional context: This PR updates method serialization to ensure only relevant information in a method signature is serialized. Previous to this PR, method signatures serialization was relying on calling `.toString()` method, but if a parameter contains an annotation of type `@Target({TYPE_USE})` the annotation will exist in the `.toSting()` output which is not part of a method signature. This PR updates serialization to ensure exact method signature as expected is serialized. Suggested change suggested changes ternary operator as a method argument Docs: `-XepExcludedPaths` was added in 2.1.3, not 2.13 (uber#744) Add command line option to skip specific library models. (uber#741) This PR adds the CLI option `-XepOpt:NullAway:IgnoreLibraryModelsFor` which takes a list of methods, given as fully qualified class name + method simple name (i.e. independent of argument types). Methods matching this list will be skipped when loading library models (from any implementation of the `LibraryModels`'s `@AutoService` as well as our included models). This can be used in a per-compilation target basis to disable NullAway's library models for ease of upgrading to new versions with stricter modeling of common JDK or third-party APIs. We considered a few alternative approaches here, but decided against: - Simply using another instance of `LibraryModels` to "invert" the models given by NullAway's default library models: This would have required no code changes, but doesn't work in all cases. If NullAway's default models make the return of method `foo()` `@Nullable`, for example, then a new model that makes the return `@NonNull` will break `@Nullable` overrides of `foo()`. In general, we want to go back to "optimistic assumptions" rather than just replace the library model. - We could have a list of methods in the `LibraryModels` for which to ignore previous models, and have those override any models on those methods coming from a different `LibraryModels` implementation. But, from the point of view of the user configuring NullAway, this is complex: they need to have an instance of custom library models in their build, and changing java plugin classpath deps on a per-target basis is more complex than changing CLI arguments (e.g. due to JVM re-use by the build system). - We could provide more specific disabling of library models (e.g. a specific method signature or removing only one particular kind of model from a method, such as keeping the model on the return value, but removing it from an argument, or removing a null-implies-false model or similar). We could revisit this in the future, but supporting this would make the syntax of the CLI values a lot more complex. For now, we believe just turning off all models for a given method is a sufficient degree of granularity. - Per-package/per-class/regex based ignore specs: See above. Avoiding complexity until we need it. Note: If and when this lands, it needs a Wiki documentation update! --------- Co-authored-by: Manu Sridharan <msridhar@gmail.com>
This PR is built upon #738.#738 is landed.This PR updates serialization to version
3
and burns version2
.Changes in serialization version 3:
Additional context:
This PR updates method serialization to ensure only relevant information in a method signature is serialized. Previous to this PR, method signatures serialization was relying on calling
.toString()
method, but if a parameter contains an annotation of type@Target({TYPE_USE})
the annotation will exist in the.toSting()
output which is not part of a method signature. This PR updates serialization to ensure exact method signature as expected is serialized.