Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent rust test failure #6280

Closed
petemoore opened this issue Jun 1, 2023 · 3 comments · Fixed by #6309
Closed

Intermittent rust test failure #6280

petemoore opened this issue Jun 1, 2023 · 3 comments · Fixed by #6309
Assignees
Labels
bug Something isn't working

Comments

@petemoore
Copy link
Member

petemoore commented Jun 1, 2023

Describe the bug

I had a rust test failure today that I've not seen before. My guess is that it is intermittent, and unrelated to my changes.

Issue was here: https://github.com/taskcluster/taskcluster/runs/13918398445

failures:

---- factory::test::file_reader_twice stdout ----
thread 'factory::test::file_reader_twice' panicked at 'assertion failed: `(left == right)`
  left: `[]`,
 right: `[72, 69, 76, 76, 79, 47, 87, 79, 82, 76, 68]`', upload/src/factory.rs:87:9

Looking at the test code, it appears to write DATA to a temporary file ('HELLO/WORLD') and then reads it back from the same file, checking that the content matches what was written to the file. It attempts to read the data twice. The first read was the one that failed in this case (line 87), so wasn't related to data being read twice:

#[tokio::test]
async fn file_reader_twice() -> Result<()> {
let mut file: File = tempfile()?.into();
file.write_all(DATA).await?;
let mut factory = FileReaderFactory::new(file);
assert_eq!(&copy_from_factory(&mut factory).await?, DATA);
assert_eq!(&copy_from_factory(&mut factory).await?, DATA);
Ok(())
}

My guess is that line 84:

file.write_all(DATA).await?;

assumes that the statement blocks until the data is flushed to the file and the file handle has been closed. But perhaps execution continues before the OS has flushed the data, so that lines 86 and 87 that read the data from the file don't see the data that was written but not yet flushed on line 84. This could explain why the left side is empty (no bytes) in the test failure report. The right side is the full data (the data that was meant to be written to the file). It may also explain the intermittency, due to the race between the data flushing and the file being read.

@djmitche What do you think?

@djmitche
Copy link
Collaborator

djmitche commented Jun 3, 2023

I don't think this is a race -- operating systems are pretty good at indicating when data has been written to a file!

Even if the .await was missing on file.write_all(DATA), I wouldn't expect intermittent behavior:

  • Rust warns for unused Futures
  • Futures that are dropped (which is what would happen without .await) don't do anything, so the result would be that the data is never written to the file.

@djmitche
Copy link
Collaborator

djmitche commented Jun 3, 2023

Hm, this might be a tokio bug, though -- we actually duplicate the File (which we have to -- we only get one copy from tempfile) and perhaps that doesn't duplicate the buffer.

But internally it's using a std::fs::File directly, which just maps to OS calls -- no in-process buffering.

That said, adding a file.flush().await? does seem to stop this from reproducing. It's possible that updating tokio would fix some subtle bug here, too, although I don't see the bug.

@djmitche
Copy link
Collaborator

djmitche commented Jun 3, 2023

Oh! Tokio is running fs operations like write on a thread! But, it doesn't wait for those operations to complete when cloning.. That's a bug :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants