-
Notifications
You must be signed in to change notification settings - Fork 1.7k
General issue - Using this taint-tracking script can only show the source node and sink node? #5104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
How do you view the results of your query? Only some of the several ways to do so will show you the paths -- for example, doing |
We have recently rewritten the data-flow analysis, which has led to slightly different path explanations. It is true that the previous version includes two more steps for the current example. I am curious as to which of those you find helpful? (I personally find the source and sink adequate in this case, but I am very interested in our users views here.) |
Thank for your reply. In fact, My teacher and I are doing some research on stain analysis recently (Especially for web applications written in Python). It's very important for us to analyze all the nodes of the data stream to reconstruct the data path. I want to konw how I need to modify it to achieve this. (●'◡'●) By the way, for some more complex programs, I think that getting a complete data path is conducive to the establishment of constraints, so as to solve the constraints and get the specific constraints from source to sink. I don't know if my expression is accurate. Do you have any good suggestions? |
That sounds interesting, I am always curious about research :-) However, I am not entirely convinced that your question is well-defined, in that "all the nodes" is a concept specific to the particular data-flow analysis chosen. Our data-flow implementation goes to some trouble to only report a useful set of nodes, both to leave out "trivial steps", but also to leave out various synthetic nodes or steps that are not strongly tied to the syntax and thus may be harder to interpret. We aim for a "path-explanation", a way of convincing a reader that data does indeed flow from the source to the sink. That being said, if you are doing this for research and you are feeling adventurous and happy to make customisations, I think you could change |
For accurate description, I took a screenshot of my query results in vscode. The arrow points to the node displayed in the query result, and the underline identifies some nodes in the data stream transmission in the actual running process (just those I want to get), because through these nodes, we can know the constraints of the data stream in the transmission process, such as If I can get the coordinates of these nodes in the query results, I can directly extract the constraints, so as to avoid the symbol execution in the global scope and improve the efficiency. I'm happy to make some changes to codeql for my research, but the project is very subtle and huge, and it really takes a lot of effort. |
Ah, I see. When you refer to the data stream, you are talking about the messages sent by the application, not the nodes on the data-flow path. It seems like you would be better served with a custom query written to collect constraints? |
Yes, I thought about it at first. But I'm not sure codeql supports such operations as extracting constraints. If I can, are there any templates for me to learn? |
CodeQL is a fairly rich language, I would it expect you could get something useful. I am not sure if we have good templates for this, I did not readily find a query that looked promising. However, we do have some utility classes such as If command injection is your interest, you could take that taint tracking configuration as your starting point. Then you could define "relevant constraints" as those that have flow into them from a source and which also have flow out of them to a sink (you may have to fiddle with the setup to get "constraints per sink/source-pair" which should be your path constraints). |
This issue is stale because it has been open 14 days with no activity. Comment or remove the |
When I used the script that was found in https://github.com/github/codeql/blob/main/python/ql/src/Security/CWE-078/CommandInjection.ql to do a command-injection taint tracking, I found that it can only show the source node and the sink node but not give me the node between them so that I can't build a full path to record the way that taint data passed to vulnerable function.
File: CommandInjection.ql
Example vuln code:
In my impression, the old version of CommandInjection example ql-script that does not encapsulate class “CommandInjectionConfiguration” can achieve the functions I mentioned above.
The text was updated successfully, but these errors were encountered: