Python: How to find type information for a specific variable or object #16961
We don't really have this sort of functionality in the CodeQL Python libraries. There's an old, unsupported, and bitrotted part of the libraries that does a "points-to" analysis (i.e. figures out what possible values a given program element may point to at runtime). I tried it and it doesn't seem to work, and in any case I wouldn't recommend using it. With that in mind, I have two suggestions.

If you want to access the types as they are specified in the source code, then the key you're looking for is the …

As for inference, I think the best option would be to use API graphs, though this is likely to be very noisy. What I'm thinking is that you could do something like:

```ql
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.ApiGraphs

/** Holds if parameter `n` may receive a value whose API-graph node prints as `type`. */
predicate has_type(DataFlow::ParameterNode n, string type) {
  exists(API::Node a |
    // `n` is a value reachable from the API-graph node `a`.
    a.getAValueReachableFromSource() = n and
    // Keep only module members and (chains of) calls to them.
    a = API::moduleImport(_).getAMember*().getReturn*() and
    type = a.toString()
  )
}

// Example query: list each parameter together with its inferred "type".
from DataFlow::ParameterNode n, string type
where has_type(n, type)
select n, type
```

API graphs basically work by approximating the set of possible access paths of values in the code. For instance, if you do `import foo`, then wherever that … The line `a = API::moduleImport(_).getAMember*().getReturn*() and` restricts the set of API graph nodes to just those that are (calls to) attributes of modules. That is perhaps an okay approximation of "type". If you remove that line, you'll get a lot more results, some of them possibly nonsensical. Note that this is limited by our ability to figure out what flows where in the code, so it can only ever be an approximation.
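To make the access-path idea concrete, here is a small piece of Python code of the kind the `has_type` predicate would run over. The module `json` and the function `parse` are illustrative choices, not taken from the thread.

The comment marks the API-graph path one would expect for the `decoder` parameter, assuming CodeQL resolves the call to `parse`. The exact string that `toString()` produces for that node is deliberately not shown, since it may differ between CodeQL versions.

```python
import json


def parse(decoder, text):
    # Illustrative assumption: if the call below is resolved, the value reaching
    # `decoder` has the access path
    #   moduleImport("json").getMember("JSONDecoder").getReturn()
    # which matches API::moduleImport(_).getAMember*().getReturn*().
    return decoder.decode(text)


result = parse(json.JSONDecoder(), '{"a": 1}')
print(result)  # {'a': 1}
```

A parameter that only ever receives values created by code in the analyzed codebase would not match this pattern. That is the limitation discussed further down.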
That's a really great idea, thanks @tausbn! Basically, I want to run a query with the library itself as the source. I want to identify the types of each parameter and then group the functions that use the same argument types. I agree that using a repo that depends on the library might make sense, but then it might only exercise a limited set of the library's APIs. Is there any possibility of using some sort of TypeTracking here as well? I don't know what to use as the source, though: it seems hard to find generators for each type and then propagate them through the code.
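The grouping step described above can be sketched in plain Python once the query results have been exported. This sketch rests on some assumptions, none of which come from the thread:

- Each result row is hypothetically a `(function, parameter, type)` triple of strings, for example taken from the `select` of a `has_type`-style query.
- The sample rows are invented.
- The `"A instance"` spelling simply mirrors the `c.getName() + " instance"` format used in the type-tracker query.

```python
from collections import defaultdict


def group_by_type(rows):
    """Map each reported type name to the sorted functions that accept it."""
    by_type = defaultdict(set)
    for function, _parameter, type_name in rows:
        by_type[type_name].add(function)
    return {t: sorted(fns) for t, fns in by_type.items()}


def group_by_type_set(rows):
    """Group functions whose parameters were reported with the same set of types."""
    per_function = defaultdict(set)
    for function, _parameter, type_name in rows:
        per_function[function].add(type_name)
    groups = defaultdict(list)
    for function, types in per_function.items():
        # Identical type sets (ignoring parameter names) share one group.
        groups[frozenset(types)].append(function)
    return {types: sorted(fns) for types, fns in groups.items()}


# Hypothetical export of query results: (function, parameter, type).
rows = [
    ("lib.load", "fp", "A instance"),
    ("lib.dump", "fp", "A instance"),
    ("lib.make", "cls", "A"),
]
print(group_by_type(rows))  # {'A instance': ['lib.dump', 'lib.load'], 'A': ['lib.make']}
```

Grouping on a `frozenset` of type names ignores parameter names and positions. If argument order matters, the key would need to include parameter positions, which this hypothetical row format does not carry.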
You are right that API graphs are not well suited for tracking things that are defined in the given codebase itself. For instance, if you have something like the following code

```python
class A:
    ...

def foo(x):
    ...

foo(A())
```

then API graphs will not be able to figure out that …

I'm not entirely sure what you're trying to do based on your description. Maybe a small example would help?
Sure, you could do it using type trackers. In fact, we already have type trackers for classes and class instances. With that in mind, perhaps the following code more accurately captures what you want:

```ql
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.internal.DataFlowDispatch
import semmle.python.ApiGraphs

predicate has_type(DataFlow::ParameterNode n, string type) {
  // Values originating from imported modules, found via API graphs.
  exists(API::Node a |
    a.getAValueReachableFromSource() = n and
    a = API::moduleImport(_).getAMember*().getReturn*() and
    type = a.toString()
  )
  or
  // Classes and class instances defined in the codebase, found via type trackers.
  exists(Class c |
    n = classTracker(c) and type = c.getName()
    or
    n = classInstanceTracker(c) and type = c.getName() + " instance"
  )
}

// Example query: list each parameter together with its inferred "type".
from DataFlow::ParameterNode n, string type
where has_type(n, type)
select n, type
```

Here, the … Bear in mind that both of these are part of an internal API, and as such may change without warning.
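As a rough illustration of the two type-tracker cases in `has_type`, consider the Python code below. The names `Widget`, `register`, and `render` are made up.

Assuming CodeQL resolves both calls, one would expect the `classTracker` case to report `Widget` for the parameter `kind`, and the `classInstanceTracker` case to report `Widget instance` for `obj`. The runtime checks only confirm which kind of value each parameter actually receives; they say nothing about what CodeQL reports.

```python
class Widget:
    pass


def register(kind):
    # Receives the class object itself (expected classTracker case: "Widget").
    return isinstance(kind, type)


def render(obj):
    # Receives an instance of the class (expected classInstanceTracker case:
    # "Widget instance").
    return isinstance(obj, Widget)


print(register(Widget))  # True
print(render(Widget()))  # True
```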
Here is a better explanation of what I want to achieve. I am using CodeQL to extract information about Python libraries as part of a pipeline. The information I want is basically the following:

…

I am running these queries on the libraries themselves. The assumption is that there are enough test cases that these exported APIs are called at some point with specific types passed to them. This information is later parsed to group the APIs by type, perform other analysis, and generate statistics. I have CodeQL queries that extract all the information about functions, methods, call graphs, etc., but I am not able to get type information for each of the parameters, attributes, and return values. Annotated type hints are rather rare.

That aside, the type-tracking query does give me a lot of class object types, and it gives me some leads on improving it, so that's really helpful. Thank you! There seems to be a limitation in identifying basic data types such as `List`, `str`, etc. But this is a good start: I have managed to create some queries that do 1, 2, and 3, at least for the class types. I will play around with the API to see what else I can do!
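The pipeline already relies on test cases calling the exported APIs. Under that assumption, one complementary option is to record argument types at runtime while the tests run. Note that this is a different technique from CodeQL's static analysis.

Unlike the type-tracker query, this also sees builtin types such as `list` and `str`. The decorator `record_types` and the `OBSERVED` store are hypothetical names. This is a minimal sketch, not part of CodeQL or of any existing tool.

```python
import functools
import inspect
from collections import defaultdict

# Hypothetical store: (qualified function name, parameter name) -> set of type names.
OBSERVED = defaultdict(set)


def record_types(func):
    """Record the runtime type name of every argument explicitly passed to `func`."""
    sig = inspect.signature(func)
    name = f"{func.__module__}.{func.__qualname__}"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        # Defaults that were not passed are not recorded, and *args/**kwargs
        # parameters are recorded as "tuple"/"dict".
        bound = sig.bind(*args, **kwargs)
        for param, value in bound.arguments.items():
            OBSERVED[(name, param)].add(type(value).__name__)
        return func(*args, **kwargs)

    return wrapper


@record_types
def join(items, sep=","):
    return sep.join(items)


join(["a", "b"])
join(("c",), sep=";")
print(sorted(OBSERVED[(join.__module__ + ".join", "items")]))  # ['list', 'tuple']
```

Applying such a decorator to a library's exported functions during a test run would yield rows of the same `(function, parameter, type)` shape the static queries aim for. It only covers the code paths that the tests actually execute.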
Hello, I am writing Python queries for some libraries. I was trying to find all the types in the program and group the APIs that use type X, type Y, etc.
But the current API doesn't seem to have a way to connect a type to, say, function parameters or anything of that sort. Does CodeQL support this functionality? Is it possible to get at least an imprecise list of possible types for a parameter of a function?