Skip to content

Rust: Model std::net and tokio fs, io, net #19446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented May 1, 2025

Model std::net and tokio fs, io, net. This includes a good number of high value taint sources. There are lots of test cases, particularly since at present it's tricky to get the repo / path correct in the model without a test to examine first.

I've also moved some stuff around in the dataflow/sources tests as it was getting large and disorganized. And I've improved modelling of reqwest.

A DCA run will reveal how well this all works...

@Copilot Copilot AI review requested due to automatic review settings May 1, 2025 14:41
@geoffw0 geoffw0 added the Rust Pull requests that update Rust code label May 1, 2025
@geoffw0 geoffw0 requested a review from a team as a code owner May 1, 2025 14:41
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extends the CodeQL Rust taint models to cover std::net and Tokio’s fs, io, and net APIs, reorganizes some existing test inputs, and refines the reqwest model for async responses.

  • Add bytes crate dependency to test options
  • Update expected DataFlow outputs for new sources (stdin, file, network)
  • Introduce YAML model files for Tokio and standard library networking, FS, IO, plus enhanced reqwest async return modeling

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
rust/ql/test/library-tests/dataflow/sources/options.yml Added bytes crate to test dependencies
rust/ql/test/library-tests/dataflow/sources/TaintSources.expected Updated expected taint sources ordering and entries
rust/ql/test/library-tests/dataflow/local/DataFlowStep.expected Extended local dataflow summary with new TcpStream, Tokio async read, split, lines, reqwest futures, etc.
rust/ql/lib/codeql/rust/frameworks/tokio/net.model.yml New source and summary model for Tokio TCP connect and read variants
rust/ql/lib/codeql/rust/frameworks/tokio/io.model.yml New source and summary model for Tokio async IO primitives
rust/ql/lib/codeql/rust/frameworks/tokio/fs.model.yml New source model for Tokio async file read APIs
rust/ql/lib/codeql/rust/frameworks/stdlib/net.model.yml New source and summary model for std::net::TcpStream connect and IO
rust/ql/lib/codeql/rust/frameworks/stdlib/io.model.yml Added summary model for std::io::Split iterator next
rust/ql/lib/codeql/rust/frameworks/reqwest.model.yml Adjusted crate::get to return a future; added async and blocking response methods
Comments suppressed due to low confidence (2)

rust/ql/lib/codeql/rust/frameworks/tokio/net.model.yml:6

  • Consider adding a source model for <crate::net::tcp::stream::TcpStream>::connect_timeout similar to connect to cover timed-out connection flows.
      - ["repo:https://github.com/tokio-rs/tokio:tokio", "<crate::net::tcp::stream::TcpStream>::connect", "ReturnValue.Future.Field[crate::result::Result::Ok(0)]", "remote", "manual"]

rust/ql/lib/codeql/rust/frameworks/reqwest.model.yml:27

  • Add a mapping for <crate::async_impl::response::Response>::text_with_charset in the async model to ensure coverage of all async response body methods.
      - ["repo:https://github.com/seanmonstar/reqwest:reqwest", "<crate::async_impl::response::Response>::chunk", "Argument[self]", "ReturnValue.Future.Field[crate::result::Result::Ok(0)].Field[crate::option::Option::Some(0)]", "taint", "manual"]

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 2, 2025

DCA shows a 4.3x increase in taint sources, 4.5x increase in taint reach 🎉

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 19, 2025

Fixed merge conflicts with #19466 . There's still a test failure.

@@ -72,14 +72,15 @@ async fn test_reqwest() -> Result<(), reqwest::Error> {
sink(remote_string4); // $ hasTaintFlow="example.com"

let remote_string5 = reqwest::get("example.com").await?.text().await?; // $ Alert[rust/summary/taint-sources]
sink(remote_string5); // $ MISSING: hasTaintFlow
sink(remote_string5); // $ hasTaintFlow="example.com"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: What does the "example.com" mean here? I guess it's not that remote_string5 is tainted by the string, since the body of the response will not contain that URL, but that it's some test mechanism where the taint source (the return value of get) somehow gets linked up with the argument>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you've got the right idea. Let me tell you a story...

We used to label sources by their line number:

fn source() -> i64 { 0 } // synthetic source function
let x = source();
sink(x); // $ hasTaintFlow=2

but this was a nuisance because when you moved the code the label had to be updated. So somebody invented labelled sources, where the source function takes an argument that is ignored in code, but used as a label by the inline expectation tests:

fn source(label: i64) -> i64 { 0 } // synthetic source function with label
let x = source(123);
sink(x); // $ hasTaintFlow=123

So 123 is not the source, it's just a label, the call source(123) is the source. So far so good. But what we have in the code above is:

let remote_string5 = reqwest::get("example.com").await?.text().await?;
sink(remote_string5); // $ hasTaintFlow="example.com"

what's happened here is that get is a modelled source, not a synthetic one, and "example.com" is both a real argument telling it what URL to load and is accidentally being picked up as a source label.

We could fix this be changing the labelling mechanism or disabling it when the source is a real modelled source rather than a synthetic source function. But to be honest for the most part these are OK labels and I'm not sure the mechanism is doing any harm as it is.

Sorry for the confusion. Let me know if you have any thoughts, e.g. a better way to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining this! It makes sense now, but was a bit confusing at first sight.

match futures::executor::block_on(test_tokio_stdin()) {
Ok(_) => println!("complete"),
Err(e) => println!("error: {}", e),
}*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe just because I was running the test to check some of the web-based stuff really works, and it was a pain having to type stuff into stdin to get to it. Happy to uncomment it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI has some small diffs, but otherwise this looks good to be with a nice impact on DCA 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants