Call graph edges missing in programs with reflection set to STRING_ONLY compared to NONE #1116
Comments
It seems like there might be multiple different issues being discussed here, and I haven't read the related source code all that carefully. Let me try to guess at what is going on for the first example with Antlr. When reflection handling of some form is enabled, WALA tries to generate synthetic models of reflective methods like https://github.com/wala/WALA/blob/784ae143c3fe1bef8033741eaac102fdfd7a315f/com.ibm.wala.core/src/main/java/com/ibm/wala/analysis/reflection/ClassFactoryContextInterpreter.java (for When these context interpreters are being used, you may not see call graph edges to the unmodelled versions of Does this make sense? Does it explain WALA's behavior on the other examples? |
Hi Manu, I think this explains most of the behavior. I still don't understand the second example. Here, we do have a constant string, so I'm assuming WALA's context interpreter can model the call. Does WALA's modeling of the call with a constant string assume an exception can never be thrown? If so, why? Also, do you have any insight on why the first example only happens when the code is compiled with Java 11 and not with Java 8? |
@msridhar I have another question on the explanation you gave. Consider the following program:
Where the Under both configurations of reflection (STRING_ONLY and NONE) WALA's call graph contains the call to Thanks for your time, Austin |
This is a strange one, I agree. Can you come up with a reduced test case for this one? Ideally a self-contained program that does not involve hsqldb? That would make it much easier for me to debug.
This I don't fully follow. Are you saying that with Java 11, WALA successfully finds the call graph edges to
Yeah, I agree this is inconsistent with my explanation. I haven't looked at this code in a long time and I have forgotten to some degree how it works. Do you have a small example of code where the WALA call graph is missing CG edges to |
Sure. Here's a program that exhibits the behavior:
Under STRING_ONLY, WALA does not have an edge to Exception.getMessage(). On NONE, it does. I compiled the program with temurin-jdk 8, and am using the code here to run WALA and print the call graph: https://github.com/amordahl/WALAInterface
No. If I compile the antlr JAR with Java 8, then both STRING_ONLY and NONE show edges to Class.forName and Class.newInstance. If I compile the jar with Java 11, then only NONE has edges to these methods. I don't think either configuration has edges to
This is tricky, I've been trying to do this myself but it has been a difficult task. Here's a reduced version of hsqldb.jar hsqldb.zip that shows the behavior (only about 300 LoC). Specifically, the configuration with STRING_ONLY misses the edges here, in org.hsqldb.util.ScriptTool.java (this is the decompiled code as produced by IntelliJ):
|
This is all great info for debugging, thanks @amordahl! One more ask: can you paste your whole "driver" for building a WALA call graph? I just want to see what type of CFA builder you are using, etc. Thanks! |
Sure. It's kind of a long file since we made our driver try to support all of WALA's configurability:
The actual call graph is built and printed in |
Thanks! Is there a particular value of |
Yes, we used ZERO_CFA for these. |
Ok, I added a test for the first issue here (missing edge to I'm not seeing the issue you mentioned for
So there is an edge from the main method to @amordahl did I miss something here? |
Hey Manu, Thanks for the reply. The test case you added passed for me, too. I noticed you put your classes in packages, and, turns out, I can only recreate the behavior that I brought up when Application and MyClass are in the default package! When I moved them to the same package structure you had ( Default:
STRING_ONLY:
|
I can reproduce the issue using the default package! Weird 🤔 I will update when I've tracked down the root cause. |
Ok, I figured out a bit more here. First, WALA's modeling of Lines 98 to 108 in 784ae14
I think the default package stuff was a red herring; the issue was that my original test didn't pass the fully-qualified name for Now, when WALA's modeling kicks in, it generates a synthetic version of
public class Application {
  public static void main(String[] args) {
    try {
      Class clazz = Class.forName("MyClass");
    } catch (Exception e) {
      System.out.println(e.getMessage());
    }
  }
}
Now the question is, what should we do about this? If we look at the JDK 8 Javadoc for
To be more sound, we should probably model that some type of @amordahl does the above make sense and is it consistent with your examples? If so, what kind of change / fix in WALA do you think makes sense for these scenarios? |
That makes a lot of sense! Thanks for the explanation. I think this answers the issue in my second example, but I think we still haven't figured out why WALA is missing the methods in my first and third examples. To summarize:
This is a good question. In the case where WALA can successfully resolve the string target, it's a pretty reasonable assumption that we won't get a ClassNotFoundException. The best counterargument I can think of is in the case of a taint analysis: if sensitive data is leaked out of a catch block, we want to know that, because it potentially opens up an attack vector wherein a malicious actor could cause a leak by gaining access to the server and deleting a .class file. To me, it makes the most sense and is most sound to model throwing this exception, but I can definitely understand both sides of the argument. |
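To make the runtime behavior under discussion concrete, here is a minimal sketch. It is plain Java, not WALA code, and the class `ForNameDemo`, the helper `tryLoad`, and the name `no.such.pkg.MissingClass` are made up for illustration. It shows that `Class.forName` with a constant string still ends up in the catch block when the named class is not on the classpath, and that the name has to be fully qualified.

```java
class ForNameDemo {
    // Hypothetical helper: returns null if the class loads,
    // otherwise the message of the ClassNotFoundException.
    static String tryLoad(String className) {
        try {
            Class.forName(className);
            return null;
        } catch (ClassNotFoundException e) {
            // This is the kind of call whose call graph edge is in question.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Fully-qualified name of a class that is always present.
        System.out.println(tryLoad("java.lang.String"));
        // Simple name only: forName does not search packages.
        System.out.println(tryLoad("String"));
        // Constant string naming a class that is absent at run time.
        System.out.println(tryLoad("no.such.pkg.MissingClass"));
    }
}
```

Whether a static analysis should keep the edge out of the catch block when it can resolve the constant is the soundness versus precision tradeoff being debated here; the sketch only shows that the path exists at run time.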
Let's look at this case next. Do we have a reduced input for reproducing it?
After further thought, my feeling is that when |
Have you tried reproducing the bug in the reduced hsqldb.jar I provided? We've managed to automatically reduce that program to ~300 lines of code while still reproducing the example. My suspicion is that the issue I identified on antlr and hsqldb are the same.
I concede your point. I do think it might be a good idea to make these tradeoffs clearer in the documentation of |
I just tried with hsqldb.jar, using
I disagree that NONE is more sound. As I see it, STRING_ONLY is more precise, as it removes the spurious possibility of a |
I used the synthetic entrypoint obtained from |
Based on discussion in #1116, in particular #1116 (comment)
Thanks, I still can't repro. My changes are here: https://github.com/msridhar/WALA/tree/hsqldb-reflection You can run
|
@msridhar I also can't repro. I investigated a little closer and it actually seems like the problem is in the call to cg.getPossibleTargets on line 46 of my driver: I can see in the call graph itself that these targets are present. However, the call to getPossibleTargets returns an empty set for the call sites I showed you. I'm guessing this is because of the check here: WALA/com.ibm.wala.core/src/main/java/com/ibm/wala/ipa/callgraph/impl/ExplicitCallGraph.java Line 475 in e24abb1
ExplicitCallGraph .
Is this intended behavior? |
Hrm, still not sure what is going on. I modified my test code to be very similar to yours: And I still see those targets in the output:
Any other guesses? |
Hey @amordahl just curious if you found another root cause here? Or if not, can we close out this issue? Let me know if you found other problems. |
Hey Manu, sorry for the delay. ICSE deadline last week 😫 It's very weird; I can still recreate this issue. Which version of WALA do your test cases use? |
They are all against the latest master branch. You can test against master by publishing a |
Ah @msridhar, finally figured out why we're seeing different results. In my driver, I set both handleZeroLengthArray and handleStaticInit to false by default. In your test case, these are set to true. When I manually set them both to false, I am replicating the behavior I am describing. Does it make sense that these edges would be hidden if these two options are set to false only under STRING_ONLY? |
I would expect only Also, out of curiosity, can you comment on why you are setting these to false? I would expect |
Small update: disabling handling of zero-length arrays was added in d22ee36 and is related to using WALA with Averroes. In normal usage I would not recommend setting that to false. |
I extracted all configuration options I could find and, for boolean options, set them to FALSE by default. I can rerun my testing with handleZeroLengthArray set to true, but for now, are you aware of any reason why these edges would be hidden under this albeit uncommon configuration? |
I could see |
@amordahl any update on this issue? Do you still think there may be a WALA bug here? |
Hey Manu,
I’m currently on vacation. I will get back to this within one week. Thanks for your patience :)
|
Hi Manu, sorry again for the late reply. I am currently rerunning testing of WALA with the configuration you suggested (i.e. handleStaticInit and handleZeroLengthArrays) to see if this issue still manifests. Will update when this is done. |
Hi @msridhar, I have results. I modified my configuration model of WALA to make handleStaticInit and handleZeroLengthArrays true by default. I still see the strange behavior between the I've attached a package containing reduced versions of the apps as well as the .json reports produced by my testing tool. You can check out the "unexpected_diffs" element to see what causes the violation. I will be more responsive on this going forward :) Had a few back-to-back projects that sucked up all my time. |
I haven't had time to look at this again, and still don't for the next bit. Just FYI that it's on my TODO list 🙂 |
Hi,
I've noticed some (seemingly) strange behavior when running WALA on programs under different reflection settings.
For example, consider the antlr.jar (antlr.zip) program from DaCapo-2006: specifically, the following piece of code from antlr.Tool.doEverything.
When I run WALA with the NONE reflection option, I get a call graph that includes the calls to Class.forName and Class.newInstance in the inner try block, but not the calls to setBehavior, setAnalyzer, or setTool. Even more strangely, when I run WALA with reflection set to STRING_ONLY, I don't even see the edges to Class.forName or Class.newInstance. Notably, if I compile antlr with Java 8 (the attached version was compiled with Java 11), then the edges to setBehavior, setAnalyzer, and setTool are still missing, but both configurations report the edges to Class.forName and Class.newInstance, so maybe this is an issue with some newer JVM bytecode features like invokedynamic? However, I was able to inspect the control flow graph and see that the calls to newInstance and forName were reachable.
I've noticed this behavior on multiple programs. For example, on hsqldb.jar and hsqldb-deps.jar (hsqldb.zip) we see strange behavior on the following try-catch block in org.hsqldb.TestBase.setUp:
WALA run with the NONE reflection option reports an edge to Throwable.getMessage from the call to Exception.getMessage in the catch block. WALA run with the STRING_ONLY option does not report any outbound edges from this block, or indeed from the method this block is in at all.
Finally, on pmd.jar (pmd.zip), consider the following method in net.sourceforge.pmd.RuleSetFactory:
The WALA configuration with STRING_ONLY does not report the calls to ClassLoader.loadClass or Class.newInstance on the last line, while the configuration with NONE does.
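For readers without the pmd source at hand, here is a minimal sketch of the general shape of call being described. This is not the actual RuleSetFactory code; the class `ReflectiveLoadDemo` and the method `instantiate` are hypothetical names used only for illustration. The class name arrives as a parameter rather than as a string literal, and the object is created through ClassLoader.loadClass followed by Class.newInstance.

```java
class ReflectiveLoadDemo {
    // Hypothetical helper: load a class by name through the given loader
    // and create an instance with its public no-argument constructor.
    @SuppressWarnings("deprecation") // Class.newInstance is deprecated since Java 9
    static Object instantiate(ClassLoader loader, String className) {
        try {
            // The two reflective calls whose call graph edges are in question.
            return loader.loadClass(className).newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object o = instantiate(ClassLoader.getSystemClassLoader(), "java.util.ArrayList");
        System.out.println(o.getClass().getName());
    }
}
```

Regardless of which classes the reflective calls resolve to, the calls to loadClass and newInstance themselves are ordinary virtual call sites, which is why their absence from the call graph looked surprising to me.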
Any insight as to why this is happening? I'm not sure if this is a bug or expected behavior of the STRING_ONLY option. Thanks!
Austin