-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Rust: Model std::net and tokio fs
, io
, net
#19446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extends the CodeQL Rust taint models to cover std::net
and Tokio’s fs
, io
, and net
APIs, reorganizes some existing test inputs, and refines the reqwest
model for async responses.
- Add
bytes
crate dependency to test options - Update expected DataFlow outputs for new sources (
stdin
, file, network) - Introduce YAML model files for Tokio and standard library networking, FS, IO, plus enhanced
reqwest
async return modeling
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
rust/ql/test/library-tests/dataflow/sources/options.yml | Added bytes crate to test dependencies |
rust/ql/test/library-tests/dataflow/sources/TaintSources.expected | Updated expected taint sources ordering and entries |
rust/ql/test/library-tests/dataflow/local/DataFlowStep.expected | Extended local dataflow summary with new TcpStream , Tokio async read, split, lines, reqwest futures, etc. |
rust/ql/lib/codeql/rust/frameworks/tokio/net.model.yml | New source and summary model for Tokio TCP connect and read variants |
rust/ql/lib/codeql/rust/frameworks/tokio/io.model.yml | New source and summary model for Tokio async IO primitives |
rust/ql/lib/codeql/rust/frameworks/tokio/fs.model.yml | New source model for Tokio async file read APIs |
rust/ql/lib/codeql/rust/frameworks/stdlib/net.model.yml | New source and summary model for std::net::TcpStream connect and IO |
rust/ql/lib/codeql/rust/frameworks/stdlib/io.model.yml | Added summary model for std::io::Split iterator next |
rust/ql/lib/codeql/rust/frameworks/reqwest.model.yml | Adjusted crate::get to return a future; added async and blocking response methods |
Comments suppressed due to low confidence (2)
rust/ql/lib/codeql/rust/frameworks/tokio/net.model.yml:6
- Consider adding a source model for
<crate::net::tcp::stream::TcpStream>::connect_timeout
similar toconnect
to cover timed-out connection flows.
- ["repo:https://github.com/tokio-rs/tokio:tokio", "<crate::net::tcp::stream::TcpStream>::connect", "ReturnValue.Future.Field[crate::result::Result::Ok(0)]", "remote", "manual"]
rust/ql/lib/codeql/rust/frameworks/reqwest.model.yml:27
- Add a mapping for
<crate::async_impl::response::Response>::text_with_charset
in the async model to ensure coverage of all async response body methods.
- ["repo:https://github.com/seanmonstar/reqwest:reqwest", "<crate::async_impl::response::Response>::chunk", "Argument[self]", "ReturnValue.Future.Field[crate::result::Result::Ok(0)].Field[crate::option::Option::Some(0)]", "taint", "manual"]
DCA shows a 4.3x increase in taint sources, 4.5x increase in taint reach 🎉 |
Fixed merge conflicts with #19466 . There's still a test failure. |
@@ -72,14 +72,15 @@ async fn test_reqwest() -> Result<(), reqwest::Error> { | |||
sink(remote_string4); // $ hasTaintFlow="example.com" | |||
|
|||
let remote_string5 = reqwest::get("example.com").await?.text().await?; // $ Alert[rust/summary/taint-sources] | |||
sink(remote_string5); // $ MISSING: hasTaintFlow | |||
sink(remote_string5); // $ hasTaintFlow="example.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: What does the "example.com"
mean here? I guess it's not that remote_string5
is tainted by the string, since the body of the response will not contain that URL, but that it's some test mechanism where the taint source (the return value of get
) somehow gets linked up with the argument>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you've got the right idea. Let me tell you a story...
We used to label sources by their line number:
fn source() -> i64 { 0 } // synthetic source function
let x = source();
sink(x); // $ hasTaintFlow=2
but this was a nuisance because when you moved the code the label had to be updated. So somebody invented labelled sources, where the source
function takes an argument that is ignored in code, but used as a label by the inline expectation tests:
fn source(label: i64) -> i64 { 0 } // synthetic source function with label
let x = source(123);
sink(x); // $ hasTaintFlow=123
So 123
is not the source, it's just a label, the call source(123)
is the source. So far so good. But what we have in the code above is:
let remote_string5 = reqwest::get("example.com").await?.text().await?;
sink(remote_string5); // $ hasTaintFlow="example.com"
what's happened here is that get
is a modelled source, not a synthetic one, and "example.com"
is both a real argument telling it what URL to load and is accidentally being picked up as a source label.
We could fix this be changing the labelling mechanism or disabling it when the source is a real modelled source rather than a synthetic source function. But to be honest for the most part these are OK labels and I'm not sure the mechanism is doing any harm as it is.
Sorry for the confusion. Let me know if you have any thoughts, e.g. a better way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining this! It makes sense now, but was a bit confusing at first sight.
match futures::executor::block_on(test_tokio_stdin()) { | ||
Ok(_) => println!("complete"), | ||
Err(e) => println!("error: {}", e), | ||
}*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe just because I was running the test to check some of the web-based stuff really works, and it was a pain having to type stuff into stdin
to get to it. Happy to uncomment it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI has some small diffs, but otherwise this looks good to be with a nice impact on DCA 🎉
Model
std::net
and tokiofs
,io
,net
. This includes a good number of high value taint sources. There are lots of test cases, particularly since at present it's tricky to get the repo / path correct in the model without a test to examine first.I've also moved some stuff around in the
dataflow/sources
tests as it was getting large and disorganized. And I've improved modelling ofreqwest
.A DCA run will reveal how well this all works...