Deal with multiple possible callables #208

khatchad · 2024-07-21T17:04:32Z

Return null when there are multiple possible callables.
Add test to exercise call string imprecision. Based on the call string length. See Method `com.ibm.wala.ipa.callgraph.MethodTargetSelector.getCalleeTarget()` and multiple targets WALA#1417 (reply in thread).
Expect the test to fail. In the past, we could add 0's to the parameters, but since we are not enforcing the existing of the node in the CG, we can no longer do that. Still, this test should now fail if Callable identification should be further up the analysis #207 is fixed.

* Return `null` when there are multiple possible callables. * Add test to exercise call string imprecision. Based on the call string length. See wala/WALA#1417 (reply in thread). * Expect the test to fail. In the past, we could add 0's to the parameters, but since we are not enforcing the existing of the node in the CG, we can no longer do that. Still, this test should now fail if wala#207 is fixed.

We don't the instance key that produces the callable right now.

msridhar

I'm having trouble following the overall flow here. It seems the method trampoline target selector looks at points-to sets to figure out a receiver, and then somewhere this receiver is passed to the superclass target selector. This doesn't fit my standard model of how things work. I would expect the pointer analysis to be iterating through possible receiver objects and then calling into the target selector to get the target (possibly null) for each one. Do you understand how this all fits together, and if so, could you explain it a bit?

khatchad · 2024-07-25T14:41:32Z

I'm having trouble following the overall flow here. It seems the method trampoline target selector looks at points-to sets to figure out a receiver, and then somewhere this receiver is passed to the superclass target selector. This doesn't fit my standard model of how things work. I would expect the pointer analysis to be iterating through possible receiver objects and then calling into the target selector to get the target (possibly null) for each one. Do you understand how this all fits together, and if so, could you explain it a bit?

That is sort of what happens, except that we don't call into a target selector. First, we detect whether the given receiver is a "callable" object:

ML/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/callgraph/PythonInstanceMethodTrampolineTargetSelector.java

Line 90 in b8eccd0

if (isCallable(receiver)) {

Then, we swap the receiver:

ML/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/callgraph/PythonInstanceMethodTrampolineTargetSelector.java

Lines 95 to 96 in b8eccd0

    
           // It's a callable. Change the receiver. 
        
           receiver = getCallable(caller, receiver.getClassHierarchy(), call);

The new receiver will actually represent the callable method:

ML/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/callgraph/PythonInstanceMethodTrampolineTargetSelector.java

Lines 209 to 218 in b8eccd0

    
             /** 
        
              * Returns the callable method of the receiver of the given {@link PythonInvokeInstruction}. 
        
              * 
        
              * @param caller The {@link CGNode} representing the caller of the given {@link 
        
              *     PythonInvokeInstruction}. 
        
              * @param cha The receiver's {@link IClassHierarchy}. 
        
              * @param call The {@link PythonInvokeInstruction} in question. 
        
              * @return The callable method the given {@link PythonInvokeInstruction}'s receiver. 
        
              */ 
        
             private IClass getCallable(CGNode caller, IClassHierarchy cha, PythonInvokeInstruction call) {

The points-to set iteration happens here:

ML/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/callgraph/PythonInstanceMethodTrampolineTargetSelector.java

Lines 221 to 240 in b8eccd0

    
           // Lookup the callable method. 
        
           PointerKeyFactory pkf = builder.getPointerKeyFactory(); 
        
           PointerKey receiver = pkf.getPointerKeyForLocal(caller, call.getUse(0)); 
        
           OrdinalSet<InstanceKey> objs = builder.getPointerAnalysis().getPointsToSet(receiver); 
        
           for (InstanceKey o : objs) { 
        
             AllocationSiteInNode instanceKey = getAllocationSiteInNode(o); 
        
             if (instanceKey != null) { 
        
               CGNode node = instanceKey.getNode(); 
        
               IMethod method = node.getMethod(); 
        
               IClass declaringClass = method.getDeclaringClass(); 
        
               final ClassLoaderReference classLoaderReference = 
        
                   declaringClass.getClassLoader().getReference(); 
        
               TypeName declaringClassName = declaringClass.getName(); 
        
               final String packageName = "$" + declaringClassName.toString().substring(1); 
        
               IClass callable = 
        
                   cha.lookupClass( 
        
                       TypeReference.findOrCreateClass( 
        
                           classLoaderReference, packageName, CALLABLE_METHOD_NAME));

That is where we find the __call__() method for the class. Note that when this happens, the original receiver is just Lobject (that is how we know that the "function call" is on an object rather than a function. Does that answer your question?

msridhar · 2024-07-25T21:57:22Z

Thanks, but I'm wondering, where is the target selector for call graph construction getting called in the first place? That's the place where I would expect the points-to set iteration to be happening. Like here is code in the WALA Java bytecode analysis:

https://github.com/wala/WALA/blob/dc3ad5e3740364a3a97786549f16a8808a84811a/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L2074-L2081

The flow is a little complex, but basically it gets the receiver type from an instance key, and only then calls into the MethodTargetSelector (gets called in getTargetForCall here).

khatchad · 2024-07-26T14:09:39Z

Thanks, but I'm wondering, where is the target selector for call graph construction getting called in the first place? That's the place where I would expect the points-to set iteration to be happening. Like here is code in the WALA Java bytecode analysis:

https://github.com/wala/WALA/blob/dc3ad5e3740364a3a97786549f16a8808a84811a/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L2074-L2081

Ariadne also has this as the starting point for getting the target selector for call graph construction.

The flow is a little complex, but basically it gets the receiver type from an instance key, and only then calls into the MethodTargetSelector (gets called in getTargetForCall here).

I see what you mean. So, instead of swapping the receiver inside the instance method trampoline target selector, we'd swap the receiver at the caller further up the call stack. I think that would work also.

I think a bigger issue perhaps is #207, i.e., I'm unsure whether using the points-to analysis in the first place is the right way to go. Does #207 make sense? Otherwise, I don't see why we can't move the points-to set iteration.

msridhar · 2024-07-26T15:37:53Z

Ok, unfortunately I don't have time to dig deep here to understand what is going on. Something does seem off. For now, maybe you can add a TODO comment in the code linking #207 and saying we need a deeper investigation to at least document how this is working? I don't want to block you so we can go ahead and land this PR once we have that.

khatchad · 2024-07-26T15:45:57Z

Yes, also short on time here. To summarize:

The current points-to set traversal can be moved. Instead of internally swapping the receiver, a different receiver can be sent in the first place as part of the outer call.
While we agree on (1) above, a question is open (Callable identification should be further up the analysis #207) as to whether using a points-to analysis in the first place makes sense.

msridhar

Approved but please add discussed code comment before merging

Per wala#208 (comment).

khatchad added 2 commits July 21, 2024 13:01

Use a set to track callable candidates.

853c796

We don't the instance key that produces the callable right now.

khatchad added enhancement New feature or request callables Callable objects in Python. labels Jul 21, 2024

khatchad requested a review from msridhar July 21, 2024 17:04

khatchad enabled auto-merge (squash) July 21, 2024 17:04

msridhar reviewed Jul 24, 2024

View reviewed changes

Merge branch 'master' into contrib_callables_multiple

8364b08

msridhar disabled auto-merge July 26, 2024 15:47

msridhar approved these changes Jul 26, 2024

View reviewed changes

Add TODO comment.

84e377a

Per wala#208 (comment).

khatchad enabled auto-merge (squash) July 26, 2024 18:21

khatchad merged commit 948e6bf into wala:master Jul 26, 2024
1 check passed

khatchad deleted the contrib_callables_multiple branch July 26, 2024 18:22

khatchad added a commit to ponder-lab/ML that referenced this pull request Jul 29, 2024

Add TODO comment.

11993eb

Per wala#208 (comment).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deal with multiple possible callables #208

Deal with multiple possible callables #208

khatchad commented Jul 21, 2024

msridhar left a comment

khatchad commented Jul 25, 2024

msridhar commented Jul 25, 2024

khatchad commented Jul 26, 2024

msridhar commented Jul 26, 2024

khatchad commented Jul 26, 2024

msridhar left a comment

Deal with multiple possible callables #208

Deal with multiple possible callables #208

Conversation

khatchad commented Jul 21, 2024

msridhar left a comment

Choose a reason for hiding this comment

khatchad commented Jul 25, 2024

msridhar commented Jul 25, 2024

khatchad commented Jul 26, 2024

msridhar commented Jul 26, 2024

khatchad commented Jul 26, 2024

msridhar left a comment

Choose a reason for hiding this comment