Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call graph edges missing in programs with reflection set to STRING_ONLY compared to NONE #1116

Open
amordahl opened this issue Aug 17, 2022 · 34 comments
Assignees
Labels

Comments

@amordahl
Copy link

amordahl commented Aug 17, 2022

Hi,

I've noticed some (seemingly) strange behavior when running WALA on programs under different reflection settings.

For example, consider the antlr.jar (antlr.zip) program from DaCapo-2006: specifically, the following piece of code from antlr.Tool.doEverything.

        try {
            ANTLRParser p = new ANTLRParser(tokenBuf, behavior, this);
            p.setFilename(grammarFile);
            p.grammar();
            if (hasError()) {
                fatalError("Exiting due to errors.");
            }
            checkForInvalidArguments(modifiedArgs, cmdLineArgValid);

            // Create the right code generator according to the "language" option
            CodeGenerator codeGen;

            // SAS: created getLanguage() method so subclass can override
            //      (necessary for VAJ interface)
            String codeGenClassName = "antlr." + getLanguage(behavior) + "CodeGenerator";
            try {
                // HERE
                Class codeGenClass = Class.forName(codeGenClassName);
                codeGen = (CodeGenerator)codeGenClass.newInstance();
                codeGen.setBehavior(behavior);
                codeGen.setAnalyzer(analyzer);
                codeGen.setTool(this);
                codeGen.gen();
            }
            catch (ClassNotFoundException cnfe) {
                panic("Cannot instantiate code-generator: " + codeGenClassName);
            }
            catch (InstantiationException ie) {
                panic("Cannot instantiate code-generator: " + codeGenClassName);
            }
            catch (IllegalArgumentException ie) {
                panic("Cannot instantiate code-generator: " + codeGenClassName);
            }
            catch (IllegalAccessException iae) {
                panic("code-generator class '" + codeGenClassName + "' is not accessible");
            }

When I run WALA with the NONE Reflection Option, I get a call graph that includes the calls to Class.forName and Class.newInstance in the inner try block, but not the calls to setBehavior, setAnalyzer, or setTool. Even more strangely, when I run WALA with Reflection set to the STRING_ONLY setting, I don't even the edges to class.forName or class.newInstance(). Notably, if I compile antlr with Java 8 (the attached version was compiled with Java 11), then the edges to setBehavior, setAnalyzer, and setTool are still missing, but both configurations report the edges to Class.forName and class.newInstance, so maybe this is an issue with some newer JVM bytecode features like invokedynamic? However, I was able to inspect the control flow graph and see that the call to newInstance and forName were reachable.

I've noticed this behavior on multiple programs. For example, on hsqldb.jar and hsqldb-deps.jar (hsqldb.zip)
we see strange behavior on the following try-catch block in org.hsqldb.TestBase.setUp:

        try {
            Class.forName("org.hsqldb.jdbcDriver");
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println(this + ".setUp() error: " + e.getMessage());
        }

WALA run with the NONE reflection option reports an edge to Throwable.getMessage from the call to Exception.getMessage in the catch block. WALA run with the STRING_ONLY option does not report any outbound edges from this block, and indeed, from the method this block is in at all.

Finally, on pmd.jar (pmd.zip), consider the following method in net.sourceforge.pmd.RuleSetFactory:

   private void parseInternallyDefinedRuleNode(RuleSet ruleSet, Node ruleNode)
            throws ClassNotFoundException, InstantiationException, IllegalAccessException {
        Element ruleElement = (Element) ruleNode;

        String attribute = ruleElement.getAttribute("class");
        Rule rule = (Rule) classLoader.loadClass(attribute).newInstance();
}

the WALA configuration with STRING_ONLY does not report the calls to ClassLoader.loadClass or Class.newInstance on the last line, while the configuration with NONE does.

Any insight as to why this is happening? I'm not sure if this is a bug or expected behavior of the STRING_ONLY option. Thanks!

Austin

@msridhar
Copy link
Member

It seems like there might be multiple different issues being discussed here, and I haven't read the related source code all that carefully. Let me try to guess at what is going on for the first example with Antlr. When reflection handling of some form is enabled, WALA tries to generate synthetic models of reflective methods like forName and newInstance, via context interpreters. See, e.g.:

https://github.com/wala/WALA/blob/784ae143c3fe1bef8033741eaac102fdfd7a315f/com.ibm.wala.core/src/main/java/com/ibm/wala/analysis/reflection/ClassFactoryContextInterpreter.java (for forName())
https://github.com/wala/WALA/blob/784ae143c3fe1bef8033741eaac102fdfd7a315f/com.ibm.wala.core/src/main/java/com/ibm/wala/analysis/reflection/ClassNewInstanceContextInterpreter.java (for newInstance())

When these context interpreters are being used, you may not see call graph edges to the unmodelled versions of forName() and newInstance(). Instead, you may only see edges to synthetic models of those methods, when WALA succeeds in modeling them. Unfortunately, for the Antlr example, I don't think it will model the forName() call successfully, since codeGenClassName is not a String constant. So, with the STRINGS_ONLY configuration, you end up not seeing any call graph edges to those methods, and the other edges are missing as well (since WALA does not discover any actual objects returned by the newInstance() call, required to find targets for the subsequent calls). I don't think this behavior has anything to do with invokedynamic.

Does this make sense? Does it explain WALA's behavior on the other examples?

@amordahl
Copy link
Author

Hi Manu,

I think this explains most of the behavior. I still don't understand the second example. Here, we do have a constant string, so I'm assuming WALA's context interpreter can model the call. Does WALA's modeling of the call with a constant string assume an exception can never be thrown? If so, why?

Also, do you have any insight on why the first example only happens when the code is compiled with Java 11 and not with Java 8?

@amordahl
Copy link
Author

amordahl commented Aug 19, 2022

@msridhar I have another question on the explanation you gave. Consider the following program:

import targets.HasHelloWorld;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

public class Application {

    public static void main(String[] args) throws ClassNotFoundException, NoSuchMethodException, InvocationTargetException, InstantiationException, IllegalAccessException, IOException {
        String targetClassName = "targets." + getClassName();
        Class<?> clazz = Class.forName(targetClassName);
        HasHelloWorld obj = (HasHelloWorld) clazz.getConstructor().newInstance();
        System.out.println(obj.helloWorld());
    }

    public static String getClassName() throws IOException {
        String line = null;
        try (BufferedReader br = new BufferedReader(new FileReader("prefs.txt"))) {
            StringBuilder sb = new StringBuilder();
            line = br.readLine();
        }
            return line.substring(0,1);
    }
}

Where the targets package contains an interface HasHelloWorld that defines a method String helloWorld(), and we have two instantiations of that interface, A and B. (The full source code is here: src.zip).

Under both configurations of reflection (STRING_ONLY and NONE) WALA's call graph contains the call to Class.forName and Class.newInstance from Application.main, even though I don't think WALA would be able to resolve the value of the string targetClassName. Given this, I'm still confused on when callgraphs constructed by STRING_ONLY contain these edges and when they don't.

Thanks for your time,

Austin

@msridhar
Copy link
Member

I think this explains most of the behavior. I still don't understand the second example. Here, we do have a constant string, so I'm assuming WALA's context interpreter can model the call. Does WALA's modeling of the call with a constant string assume an exception can never be thrown? If so, why?

This is a strange one, I agree. Can you come up with a reduced test case for this one? Ideally a self-contained program that does not involve hsqldb? That would make it much easier for me to debug.

Also, do you have any insight on why the first example only happens when the code is compiled with Java 11 and not with Java 8?

This I don't fully follow. Are you saying that with Java 11, WALA successfully finds the call graph edges to setBehavior() etc?

Under both configurations of reflection (STRING_ONLY and NONE) WALA's call graph contains the call to Class.forName and Class.newInstance from Application.main, even though I don't think WALA would be able to resolve the value of the string targetClassName. Given this, I'm still confused on when callgraphs constructed by STRING_ONLY contain these edges and when they don't.

Yeah, I agree this is inconsistent with my explanation. I haven't looked at this code in a long time and I have forgotten to some degree how it works. Do you have a small example of code where the WALA call graph is missing CG edges to Class.forName and Class.newInstance? That would be helpful to track down what is going on.

@amordahl
Copy link
Author

This is a strange one, I agree. Can you come up with a reduced test case for this one? Ideally a self-contained program that does not involve hsqldb? That would make it much easier for me to debug.

Sure. Here's a program that exhibits the behavior:

// Application.java
public class Application {

    public static void main(String[] args) {
        try {
            Class clazz = Class.forName("MyClass");
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
// MyClass.java
public class MyClass {

    public String sayHello() {
        return "Hello from MyClass!";
    }
}

Under STRING_ONLY, WALA does not have an edge to Exception.getMessage(). On NONE, it does. I compiled the program with temurin-jdk 8, and am using the code here to run WALA and print the call graph: https://github.com/amordahl/WALAInterface

This I don't fully follow. Are you saying that with Java 11, WALA successfully finds the call graph edges to setBehavior() etc?

No. If I compile the antlr JAR with Java 8, then both STRING_ONLY and NONE show edges to Class.forName and Class.newInstance. If I compile the jar with Java 11, then only NONE has edges to these methods. I don't think either configuration has edges to setBehavior etc. under either compilation setting. That's why I was wondering whether it might have something to do with invokedynamic.

Yeah, I agree this is inconsistent with my explanation. I haven't looked at this code in a long time and I have forgotten to some degree how it works. Do you have a small example of code where the WALA call graph is missing CG edges to Class.forName and Class.newInstance? That would be helpful to track down what is going on.

This is tricky, I've been trying to do this myself but it has been a difficult task. Here's a reduced version of hsqldb.jar hsqldb.zip that shows the behavior (only about 300 LoC). Specifically, the configuration with STRING_ONLY misses the edges here, in org.hsqldb.util.ScriptTool.java (this is the decompiled code as produced by IntelliJ):

    public void execute(String[] var1) {
        Properties var2 = pProperties;
        String var3 = var2.getProperty("driver", "org.hsqldb.jdbcDriver");
        boolean var4 = var2.getProperty("log", "false").equalsIgnoreCase("true");

        try {
            Class.forName(var3).newInstance();
        } catch (Exception var6) {
        }

    }
}

@msridhar
Copy link
Member

This is all great info for debugging, thanks @amordahl! One more ask: can you paste your whole "driver" for building a WALA call graph? I just want to see what type of CFA builder you are using, etc. Thanks!

@amordahl
Copy link
Author

Sure. It's kind of a long file since we made our driver try to support all of WALA's configurability:

class Application {

    //private static final Logger logger = LoggerFactory.getLogger(Application.class);

    private static CommandLineOptions clo;

    public static void main(String[] args)
            throws WalaException, CallGraphBuilderCancelException, IOException {
        // Initialize command line and print help if requested.
        Application.clo = new CommandLineOptions();
        new CommandLine(clo).parseArgs(args);
        if (clo.usageHelpRequested) {
            CommandLine.usage(new CommandLineOptions(), System.out);
            return;
        }

        // Build call graph.
        CallGraph cg = new Application().makeCallGraph(clo);

        // Print to output.
        FileWriter fw = new FileWriter(String.valueOf(clo.callgraphOutput));
        for (CGNode cgn : cg) {
            Iterator<CallSiteReference> callSiteIterator = cgn.iterateCallSites();
            while (callSiteIterator.hasNext()) {
                CallSiteReference csi = callSiteIterator.next();
                for (CGNode target : cg.getPossibleTargets(cgn, csi)) {
                    fw.write(String.format(
                            "%s\t%s\t%s\t%s\t%s\n",
                            cgn.getMethod(),
                            csi.toString(),
                            cgn.getContext(),
                            target.getMethod().getSignature(),
                            target.getContext()));
                }
            }
        }
        System.out.println("Wrote callgraph to " + clo.callgraphOutput.toString());
        fw.close();
    }

    public CallGraph makeCallGraph(CommandLineOptions clo)
            throws ClassHierarchyException, IOException, CallGraphBuilderCancelException {
        AnalysisScope scope =
                AnalysisScopeReader.makeJavaBinaryAnalysisScope(clo.appJar, null);

        ClassHierarchy cha = ClassHierarchyFactory.make(scope);

        Iterable<Entrypoint> entrypoints =
                com.ibm.wala.ipa.callgraph.impl.Util.makeMainEntrypoints(scope, cha);
        AnalysisOptions options = new AnalysisOptions(scope, entrypoints);
        options.setReflectionOptions(clo.reflection);
        options.setHandleStaticInit(clo.handleStaticInit);
        options.setUseConstantSpecificKeys(clo.useConstantSpecificKeys);
        options.setUseStacksForLexicalScoping(clo.useStacksForLexicalScoping);
        options.setUseLexicalScopingForGlobals(clo.useLexicalScopingForGlobals);
        options.setMaxNumberOfNodes(clo.maxNumberOfNodes);
        options.setHandleZeroLengthArray(clo.handleZeroLengthArray);

        // //
        // build the call graph
        // //
        CallGraphBuilder<InstanceKey> builder;
        switch (clo.callGraphBuilder) {
            case NCFA:
                builder = Util.makeNCFABuilder(clo.sensitivity, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case NOBJ:
                builder = Util.makeNObjBuilder(clo.sensitivity, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case VANILLA_NCFA:
                builder =
                        Util.makeVanillaNCFABuilder(clo.sensitivity, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case VANILLA_NOBJ:
                builder =
                        Util.makeVanillaNObjBuilder(clo.sensitivity, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case RTA:
                builder = Util.makeRTABuilder(options, new AnalysisCacheImpl(), cha, scope);
                break;
            case ZERO_CFA:
                builder = Util.makeZeroCFABuilder(Language.JAVA, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case ZEROONE_CFA:
                builder = Util.makeZeroOneCFABuilder(Language.JAVA, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case VANILLA_ZEROONECFA:
                builder =
                        Util.makeVanillaZeroOneCFABuilder(Language.JAVA, options, new AnalysisCacheImpl(), cha, scope);
                break;
            case ZEROONE_CONTAINER_CFA:
                builder = Util.makeZeroOneContainerCFABuilder(options, new AnalysisCacheImpl(), cha, scope);
                break;
            case VANILLA_ZEROONE_CONTAINER_CFA:
                builder = Util.makeVanillaZeroOneContainerCFABuilder(options, new AnalysisCacheImpl(), cha, scope);
                break;
            case ZERO_CONTAINER_CFA:
                builder = Util.makeZeroContainerCFABuilder(options, new AnalysisCacheImpl(), cha, scope);
                break;
            default:
                throw new IllegalArgumentException("Invalid call graph algorithm.");
        }
        long startTime = System.currentTimeMillis();

        MonitorUtil.IProgressMonitor pm = new MonitorUtil.IProgressMonitor() {
            private boolean cancelled;

            @Override
            public void beginTask(String s, int i) {

            }

            @Override
            public void subTask(String s) {

            }

            @Override
            public void cancel() {
                cancelled = true;
            }

            @Override
            public boolean isCanceled() {
                if (System.currentTimeMillis() - startTime > clo.timeout) {
                    cancelled = true;
                }
                return cancelled;
            }

            @Override
            public void done() {

            }

            @Override
            public void worked(int i) {

            }

            @Override
            public String getCancelMessage() {
                return "Timed out.";
            }
        };
        return builder.makeCallGraph(options, pm);
    }
}

The actual call graph is built and printed in main. clo is a JCommander command line object; you can see the options we support at https://github.com/amordahl/WALAInterface/blob/main/src/main/java/edu/utdallas/amordahl/CommandLineOptions.java. The pom.xml file is at https://github.com/amordahl/WALAInterface/blob/main/src/main/java/edu/utdallas/amordahl/CommandLineOptions.java; we are using wala 1.5.7.

@msridhar
Copy link
Member

Thanks! Is there a particular value of CommandLineOptions.callGraphBuilder that you were using when reproducing the strange behaviors above?

@amordahl
Copy link
Author

Yes, we used ZERO_CFA for these.

@msridhar
Copy link
Member

Ok, I added a test for the first issue here (missing edge to getMessage()):

#1117

I'm not seeing the issue you mentioned for STRING_ONLY, @amordahl. Here's what I see in the call graph when I print it:

Node: < Application, Lreflection/StringsOnlyGetMessage, main([Ljava/lang/String;)V > Context: Everywhere
 - invokestatic < Application, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; >@2
     -> Node: synthetic < Primordial, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; > Context: Everywhere
 - invokevirtual < Application, Ljava/lang/Exception, getMessage()Ljava/lang/String; >@14
     -> Node: < Primordial, Ljava/lang/Throwable, getMessage()Ljava/lang/String; > Context: Everywhere
 - invokevirtual < Application, Ljava/io/PrintStream, println(Ljava/lang/String;)V >@17
     -> Node: < Primordial, Ljava/io/PrintStream, println(Ljava/lang/String;)V > Context: Everywhere

So there is an edge from the main method to Throwable.getMessage() (which the Exception class inherits). You can reproduce by cloning the PR branch and running ./gradlew :com.ibm.wala.core:test --tests com.ibm.wala.core.tests.callGraph.ReflectionTest.testStringsOnlyGetMessage locally. I tested on JDK 8 and JDK 11 on my machine.

@amordahl did I miss something here?

@amordahl
Copy link
Author

amordahl commented Aug 22, 2022

Hey Manu,

Thanks for the reply. The test case you added passed for me, too. I noticed you put your classes in packages, and, turns out, I can only recreate the behavior that I brought up when Application and MyClass are in the default package! When I moved them to the same package structure you had (reflection.StringsOnlyGetMessage), the issue disappeared and the call graphs were the exact same. Any insight into what's happening? For reference, here are the call graphs I am getting when these classes are in the default package (using the TSV printer I pasted above).

Default:

synthetic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V >        invokestatic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V >@0  Everywhere      com.ibm.wala.FakeRootClass.fakeWorldClinit()V   Everywhere
synthetic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V >        invokespecial < Primordial, Ljava/lang/Object, <init>()V >@4    Everywhere      java.lang.Object.<init>()V      Everywhere
synthetic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V >        invokestatic < Application, LApplication, main([Ljava/lang/String;)V >@5        Everywhere      Application.main([Ljava/lang/String;)V  Everywhere
< Application, LApplication, main([Ljava/lang/String;)V >       invokestatic < Application, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; >@2  Everywhere      java.lang.Class.forName(Ljava/lang/String;)Ljava/lang/Class;    Everywhere
< Application, LApplication, main([Ljava/lang/String;)V >       invokevirtual < Application, Ljava/lang/Exception, getMessage()Ljava/lang/String; >@14  Everywhere      java.lang.Throwable.getMessage()Ljava/lang/String;      Everywhere

STRING_ONLY:

synthetic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V >	invokestatic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V >@0	Everywhere	com.ibm.wala.FakeRootClass.fakeWorldClinit()V	Everywhere
synthetic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V >	invokespecial < Primordial, Ljava/lang/Object, <init>()V >@4	Everywhere	java.lang.Object.<init>()V	Everywhere
synthetic < Primordial, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V >	invokestatic < Application, LApplication, main([Ljava/lang/String;)V >@5	Everywhere	Application.main([Ljava/lang/String;)V	Everywhere
< Application, LApplication, main([Ljava/lang/String;)V >	invokestatic < Application, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; >@2	Everywhere	java.lang.Class.forName(Ljava/lang/String;)Ljava/lang/Class;	DelegatingContext [A=JavaTypeContext<point: <Application,LMyClass>>, B=Everywhere]

@msridhar
Copy link
Member

I can reproduce the issue using the default package! Weird 🤔 I will update when I've tracked down the root cause.

@msridhar msridhar self-assigned this Aug 23, 2022
@msridhar msridhar added the bug label Aug 23, 2022
@msridhar
Copy link
Member

Ok, I figured out a bit more here. First, WALA's modeling of Class.forName() only kicks in when the argument is a String constant and WALA can actually find the class within the current IClassHierarchy:

if (symbolTable.isStringConstant(use)) {
String className =
StringStuff.deployment2CanonicalTypeString(symbolTable.getStringValue(use));
TypeReference t =
TypeReference.findOrCreate(
caller.getMethod().getDeclaringClass().getClassLoader().getReference(), className);
IClass klass = caller.getClassHierarchy().lookupClass(t);
if (klass != null) {
return new JavaTypeContext(new PointType(klass));
}
}

I think the default package stuff was a red herring; the issue was that my original test didn't pass the fully-qualified name for MyClass, so WALA's modeling didn't even kick in.

Now, when WALA's modeling kicks in, it generates a synthetic version of forName() that returns the right Class object for the given argument. It turns out this synthetic method is (unsoundly) modeled as not ever throwing an exception. So in the code below (when MyClass is present in the class hierarchy), WALA thinks no value is ever assigned to e, and hence getMessage() will never execute, so there are no corresponding targets in the call graph:

public class Application {
    public static void main(String[] args) {
        try {
            Class clazz = Class.forName("MyClass");
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

Now the question is, what should we do about this? If we look at the JDK 8 Javadoc for forName() it indicates the method can throw the following exceptions:

Throws:
LinkageError - if the linkage fails
ExceptionInInitializerError - if the initialization provoked by this method fails
ClassNotFoundException - if the class cannot be located

To be more sound, we should probably model that some type of LinkageError or ExceptionInInitializerError can be thrown from the synthetic method. But neither of those exception types will be caught by the catch block in the example above, so the call graph will remain the same. Should the synthetic method be able to throw ClassNotFoundException? While this would be possible at runtime, I think it's better to not model this possibility in the case where the given class was resolved by WALA; after all, we found the class! But I'm open to discussion on this.

@amordahl does the above make sense and is it consistent with your examples? If so, what kind of change / fix in WALA do you think makes sense for these scenarios?

@amordahl
Copy link
Author

amordahl commented Aug 23, 2022

does the above make sense and is it consistent with your examples?

That makes a lot of sense! Thanks for the explanation. I think this answers the issue in my second example, but I think we still haven't figured out why WALA is missing the methods in my first and third examples. To summarize:

  • 1. STRING_ONLY missing edges to Class.forName and Class.newInstance in ANTLR. I think we still haven't figured this one out yet, as I was able to construct a test case in which STRING_ONLY should not be able to model the string yet it includes the edges.
  • 2. Missing edges out of an exceptional block in STRING_ONLY. I think the explanation you just gave fully resolves this.
  • 3. STRING_ONLY missing edges to ClassLoader.loadClass and Class.newInstance. I'm guessing this one happened because WALA wasn't able to find a class "Class" in its hierarchy. I think this is probably the same issue as (1.); the question is when does WALA's STRING_ONLY mode add these edges and when doesn't it?

what kind of change / fix in WALA do you think makes sense for these scenarios?

This is a good question. In the case where WALA can successfully resolve the string target, it's a pretty reasonable assumption that we won't get a ClassNotFoundException. The best counterargument I can think of is in the case of a taint analysis: if sensitive data is leaked out of a catch block, we want to know that, because it potentially opens up an attack vector wherein a malicious actor could cause a leak by gaining access to the server and deleting a .class file. To me, it makes the most sense and is most sound to model throwing this exception, but I can definitely understand both sides of the argument.

@msridhar
Copy link
Member

  1. STRING_ONLY missing edges to Class.forName and Class.newInstance in ANTLR. I think we still haven't figured this one out yet, as I was able to construct a test case in which STRING_ONLY should not be able to model the string yet it includes the edges.

Let's look at this case next. Do we have a reduced input to for reproducing it?

This is a good question. In the case where WALA can successfully resolve the string target, it's a pretty reasonable assumption that we won't get a ClassNotFoundException. The best counterargument I can think of is in the case of a taint analysis: if sensitive data is leaked out of a catch block, we want to know that, because it potentially opens up an attack vector wherein a malicious actor could cause a leak by gaining access to the server and deleting a .class file. To me, it makes the most sense and is most sound to model throwing this exception, but I can definitely understand both sides of the argument.

After further thought, my feeling is that when Class.forName() is modeled, we should not have the model possibly throw a ClassNotFoundException. One of the main points of WALA is to run a whole-program analysis and see what interesting program facts can be discovered / verified in that scenario. (Of course this is "soundy" due to dynamic class loading, etc., but let's ignore that for the moment.) I don't think it makes sense for WALA analyses to try to be sound in the face of classes present at analysis time being deleted, modified, etc. We should possibly extend the Class.forName() models to throw runtime errors; I opened #1118 on that.

@amordahl
Copy link
Author

amordahl commented Aug 24, 2022

Let's look at this case next. Do we have a reduced input to for reproducing it?

Have you tried reproducing the bug in the reduced hsqldb.jar I provided? We've managed to automatically reduce that program to ~300 lines of code while still reproducing the example. My suspicion is that the issue I identified on antlr and hsqldb are the same.

After further thought, my feeling is that when Class.forName() is modeled, we should not have the model possibly throw a ClassNotFoundException. One of the main points of WALA is to run a whole-program analysis and see what interesting program facts can be discovered / verified in that scenario. (Of course this is "soundy" due to dynamic class loading, etc., but let's ignore that for the moment.) I don't think it makes sense for WALA analyses to try to be sound in the face of classes present at analysis time being deleted, modified, etc. We should possibly extend the Class.forName() models to throw runtime errors; I opened #1118 on that.

I concede your point. I do think it might be a good idea to make these tradeoffs clearer in the documentation of ReflectionOptions -- I'd be happy to do that in a PR. As it currently stands, ReflectionOptions gives (at least me) the impression that the only difference between NONE and STRING_ONLY is that the latter tries to do some string resolution, so I thought STRING_ONLY would be strictly more sound, but there are cases where NONE is actually a more sound option for handling reflection than STRING_ONLY.

@msridhar
Copy link
Member

Have you tried reproducing the bug in the reduced hsqldb.jar I provided? We've managed to automatically reduce that program to ~300 lines of code while still reproducing the example. My suspicion is that the issue I identified on antlr and hsqldb are the same.

I just tried with hsqldb.jar, using Harness.main() as the entrypoint, 0-CFA, and STRING_ONLY. Unfortunately, I didn't find any methods from org.hsqldb.util.ScriptTool to be present in the resulting call graph at all. Do I need any other jars in the analysis scope or other config? Or should I choose a different entrypoint?

I concede your point. I do think it might be a good idea to make these tradeoffs clearer in the documentation of ReflectionOptions -- I'd be happy to do that in a PR. As it currently stands, ReflectionOptions gives (at least me) the impression that the only difference between NONE and STRING_ONLY is that the latter tries to do some string resolution, so I thought STRING_ONLY would be strictly more sound, but there are cases where NONE is actually a more sound option for handling reflection than STRING_ONLY.

I disagree that NONE is more sound. As I see it, STRING_ONLY is more precise, as it removes the spurious possibility of a ClassNotFoundException being thrown. And, it may indeed be surprising that more reflection handling can add more precision, thereby resulting in removal of spurious call graph edges. I welcome a PR on this, though we may have to negotiate the verbiage 🙂

@amordahl
Copy link
Author

I used the synthetic entrypoint obtained from com.ibm.wala.ipa.callgraph.impl.Util.makeMainEntrypoints(scope, cha);

msridhar added a commit that referenced this issue Aug 26, 2022
Based on discussion in #1116, in particular #1116 (comment)
@msridhar
Copy link
Member

Thanks, I still can't repro. My changes are here:

https://github.com/msridhar/WALA/tree/hsqldb-reflection

You can run com.ibm.wala.core.tests.callGraph.ReflectionTest#testHSQLDB and see the call graph output (you'll have to adjust hsqldb.txt with the path to your jar). I see:

Node: < Application, Lorg/hsqldb/util/ScriptTool, execute([Ljava/lang/String;)V > Context: Everywhere
 - invokevirtual < Application, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; >@9
     -> Node: < Primordial, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; > Context: Everywhere
 - invokevirtual < Application, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; >@18
     -> Node: < Primordial, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; > Context: Everywhere
 - invokevirtual < Application, Ljava/lang/String, equalsIgnoreCase(Ljava/lang/String;)Z >@23
     -> Node: < Primordial, Ljava/lang/String, equalsIgnoreCase(Ljava/lang/String;)Z > Context: Everywhere
 - invokestatic < Application, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; >@29
     -> Node: synthetic < Primordial, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; > Context: Everywhere
 - invokevirtual < Application, Ljava/lang/Class, newInstance()Ljava/lang/Object; >@32
     -> Node: synthetic  factory < Primordial, Ljava/lang/Class, newInstance()Ljava/lang/Object; > Context: Everywhere

@amordahl
Copy link
Author

amordahl commented Aug 27, 2022

@msridhar I also can't repo. I investigated a little closer and it actually seems like the problem is in the call to cg.getPossibleTargets on line 46 of my driver:

https://github.com/amordahl/WALAInterface/blob/08f4b204daab883080ffef2815a7d25625dfa952/src/main/java/edu/utdallas/amordahl/Application.java#L46

I can see in the callgraph itself that these targets are present. However, the call to getPossibleTargets returns an empty set for the call sites I showed you. I'm guessing this is because of the check here:

public Set<CGNode> getPossibleTargets(CGNode node, CallSiteReference site) {
that the target is explicit? My callgraph at runtime is an instance of ExplicitCallGraph.

Is this intended behavior?

@msridhar
Copy link
Member

Hrm, still not sure what is going on. I modified my test code to be very similar to yours:

msridhar@fa9d656

And I still see those targets in the output:

CALLER: Node: < Application, Lorg/hsqldb/util/ScriptTool, execute([Ljava/lang/String;)V > Context: Everywhere
SITE: invokevirtual < Application, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; >@9
  CALLEE: Node: < Primordial, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; > Context: Everywhere
SITE: invokevirtual < Application, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; >@18
  CALLEE: Node: < Primordial, Ljava/util/Properties, getProperty(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; > Context: Everywhere
SITE: invokevirtual < Application, Ljava/lang/String, equalsIgnoreCase(Ljava/lang/String;)Z >@23
  CALLEE: Node: < Primordial, Ljava/lang/String, equalsIgnoreCase(Ljava/lang/String;)Z > Context: Everywhere
SITE: invokestatic < Application, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; >@29
  CALLEE: Node: synthetic < Primordial, Ljava/lang/Class, forName(Ljava/lang/String;)Ljava/lang/Class; > Context: Everywhere
SITE: invokevirtual < Application, Ljava/lang/Class, newInstance()Ljava/lang/Object; >@32
  CALLEE: Node: synthetic  factory < Primordial, Ljava/lang/Class, newInstance()Ljava/lang/Object; > Context: Everywhere

Any other guesses?

@msridhar
Copy link
Member

msridhar commented Sep 6, 2022

Hey @amordahl just curious if you found another root cause here? Or if not, can we close out this issue? Let me know if you found other problems.

@amordahl
Copy link
Author

amordahl commented Sep 6, 2022

Hey Manu, sorry for the delay. ICSE deadline last week 😫

It's very weird; I can still recreate this issue. Which version of WALA do your test cases use?

@msridhar
Copy link
Member

msridhar commented Sep 6, 2022

They are all against the latest master branch. You can test against master by publishing a SNAPSHOT build locally (clone WALA and do ./gradlew publishToMavenLocal) and then depending on WALA version 1.5.9-SNAPSHOT.

@amordahl
Copy link
Author

Ah @msridhar, finally figured out why we're seeing different results. In my driver, I set both handleZeroLengthArray and handleStaticInit to false by default. In your test case, these are set to true. When I manually set them both to false, I am replicating the behavior I am describing. Does it make sense that these edges would be hidden if these two options are set to false only under STRING_ONLY?

@msridhar
Copy link
Member

I would expect only handleStaticInit to have a potential impact here. Do you need to set both handleStaticInit and handleZeroLengthArray to false to have an impact? Or just setting one to false cause the issue?

Also, out of curiosity, can you comment on why you are setting these to false? I would expect handleZeroLengthArray to only make the analysis results better; in fact I can't really recall why we made it an option to disable that, other than for testing.

@msridhar
Copy link
Member

Small update: disabling handling of zero-length arrays was added in d22ee36 and is related to using WALA with Averroes. In normal usage I would not recommend setting that to false.

@amordahl
Copy link
Author

amordahl commented Oct 8, 2022

Also, out of curiosity, can you comment on why you are setting these to false? I would expect handleZeroLengthArray to only make the analysis results better; in fact I can't really recall why we made it an option to disable that, other than for testing.

I extracted all configuration options I could find and, for boolean options, set them to FALSE by default. I can rerun my testing with handleZeroLengthArray set to true, but for now, are you aware of any reason why these edges would be hidden under this albeit uncommon configuration?

@msridhar
Copy link
Member

msridhar commented Oct 8, 2022

I could see handleStaticInit having an impact. handleZeroLengthArray should have nothing to do with it.

@msridhar
Copy link
Member

@amordahl any update on this issue? Do you still think there may be a WALA bug here?

@amordahl
Copy link
Author

amordahl commented Nov 2, 2022 via email

@amordahl
Copy link
Author

Hi Manu, sorry again for the late reply. I am currently rerunning testing of WALA with the configuration you suggested (i.e. handleStaticInit and handleZeroLengthArrays) to see if this issue still manifests. Will update when this is done.

@amordahl
Copy link
Author

amordahl commented Nov 28, 2022

Hi @msridhar, I have results. I modified my configuration model of WALA to make handleStaticInit and handleZeroLengthArrays true by default. I still see the strange behavior between the STRING_ONLY and NONE reflection options on three apps in DaCapo: pmd.jar, hsqldb.jar, and antlr.jar.

I've attached a package containing reduced versions of the apps as well as the .json reports produced by my testing tool. You can check out the "unexpected_diffs" element to see what causes the violation.

wala_alt_default_config.zip

I will be more responsive on this going forward :) Had a few back-to-back projects that sucked up all my time.

@msridhar
Copy link
Member

I haven't had time to look at this again, and still don't for the next bit. Just FYI that it's on my TODO list 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants