
Tokio file visibility problem #4796

Closed
v0y4g3r opened this issue Jun 30, 2022 · 7 comments
Labels
A-tokio (Area: The main tokio crate), C-bug (Category: This is a bug), M-fs (Module: tokio/fs)

Comments


v0y4g3r commented Jun 30, 2022

Version

$ cargo tree | grep tokio
└── tokio v1.19.2
    └── tokio-macros v1.8.0 (proc-macro)

Platform

$ uname -a
Linux VM-16-12-ubuntu 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description

What my program does

In src/main.rs, three async tasks cooperate with each other, synchronized by tokio::sync::Notify (a rough sketch follows the list):

  • The writer appends data to the end of the file, updates write_offset, and notifies the flush task.
  • The flush task waits for the write to finish, loads write_offset, flushes the file, and updates flush_offset to write_offset.
  • The reader waits for the flush to finish, loads flush_offset as the persisted file length, reads the file
    region [0, flush_offset), and checks whether the data read matches the data written.
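A rough sketch of that structure (not the exact MRE; the path, payload, and task wiring are assumptions), mirroring the try_clone-based file sharing discussed later in the thread:

```rust
use std::sync::{
    atomic::{AtomicU64, Ordering},
    Arc,
};
use tokio::{fs::OpenOptions, io::AsyncWriteExt, sync::Notify};

// Hypothetical payload (the logs below suggest the real MRE writes 123 bytes).
const DATA: &[u8] = b"some payload written by the writer task";

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Hypothetical path; the MRE uses a temporary directory.
    let path = std::env::temp_dir().join("file-visibility-sketch");
    let file = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(&path)
        .await?;

    let write_offset = Arc::new(AtomicU64::new(0));
    let flush_offset = Arc::new(AtomicU64::new(0));
    let write_done = Arc::new(Notify::new());
    let flush_done = Arc::new(Notify::new());

    // Writer: append data, publish write_offset, wake the flush task.
    // NOTE: there is no `file.flush().await` here; the thread below
    // identifies that as the actual bug.
    let writer = {
        let (write_offset, write_done) = (write_offset.clone(), write_done.clone());
        let mut file = file.try_clone().await?;
        tokio::spawn(async move {
            file.write_all(DATA).await.unwrap();
            write_offset.store(DATA.len() as u64, Ordering::SeqCst);
            write_done.notify_one();
        })
    };

    // Flush task: wait for the write, sync the file, publish flush_offset.
    let flusher = {
        let (write_offset, flush_offset) = (write_offset.clone(), flush_offset.clone());
        let flush_done = flush_done.clone();
        let file = file.try_clone().await?;
        tokio::spawn(async move {
            write_done.notified().await;
            let len = write_offset.load(Ordering::SeqCst);
            file.sync_all().await.unwrap();
            flush_offset.store(len, Ordering::SeqCst);
            flush_done.notify_one();
        })
    };

    // Reader: wait for the flush, then expect the file to contain [0, flush_offset).
    let reader = {
        let path = path.clone();
        tokio::spawn(async move {
            flush_done.notified().await;
            let len = flush_offset.load(Ordering::SeqCst);
            let contents = tokio::fs::read(&path).await.unwrap();
            // This is the assertion that fails on x86_64 Linux in the report.
            assert_eq!(len, contents.len() as u64);
        })
    };

    for task in [writer, flusher, reader] {
        task.await.unwrap();
    }
    Ok(())
}
```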

What is expected

As these async tasks are carefully synchronized, when the reader initiates a read on the file,
it should see that the data written by the writer has already been persisted: the file's length
in its metadata equals flush_offset, and the file content matches the data written.

What actually happens

When this program is run in a loop, given enough repetitions, it eventually panics at the assert statement in the
reader task, reporting that the file's length in its metadata is 0 and the file content read is empty.

$ cargo build

$ while ./target/debug/filemap; do echo ; done
# ... many runs

file: /tmp/file-test.YhIMnN3Mkb7a/data
write finish: 123
flush: 123
thread 'tokio-runtime-worker' panicked at 'assertion failed: `(left == right)`
  left: `123`,
 right: `0`', src/main.rs:57:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(9), ...)', src/main.rs:71:18

From the above logs, it is clear that sync_all succeeded before the read (since "flush: 123" was printed),
yet when the reader read the file, it could only see an empty file.

The problem can always be reproduced on this platform:

$ uname -a
Linux VM-16-12-ubuntu 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ rustup show
Default host: x86_64-unknown-linux-gnu

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu (default)

active toolchain
----------------

nightly-x86_64-unknown-linux-gnu (default)
rustc 1.64.0-nightly (830880640 2022-06-28)

Interestingly, this problem has not yet been reproduced on ARM platforms, including a Raspberry Pi and a MacBook Pro M1.

The minimal reproducible example can be found in this repo.

v0y4g3r added the A-tokio and C-bug labels on Jun 30, 2022

v0y4g3r commented Jun 30, 2022

After narrowing this down further, it turns out it has nothing to do with Notify or the synchronization; the problem seems to lie in tokio::fs::File. You can check out the no-spawn branch and the problem still exists. A coredump was also generated when the assertion failed.
core.filemap.1000.4a5b99198751433abb9c7c01c4b094a4.2971510.1656570551000000.zip

@Darksonn (Contributor)

You need to call flush on the File before sending the notifications. Otherwise the operation may still be in progress even though the call to write has returned.
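For example (a minimal sketch, assuming a writer task holding a tokio::fs::File named `file` and a tokio::sync::Notify named `write_done`, as in the description above):

```rust
use tokio::{fs::File, io::AsyncWriteExt, sync::Notify};

// Sketch of the suggestion: flush the same handle that performed the write
// before signalling the other tasks, so tokio's background write has finished.
async fn write_then_notify(file: &mut File, data: &[u8], write_done: &Notify) -> std::io::Result<()> {
    file.write_all(data).await?;
    file.flush().await?; // wait for the in-flight write operation to complete
    write_done.notify_one();
    Ok(())
}
```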

Darksonn added the M-fs label on Jun 30, 2022

v0y4g3r commented Jun 30, 2022

> You need to call flush on the File before sending the notifications. Otherwise the operation may still be in progress even though the call to write has returned.

Sure, on line 37 I do call sync_all, which internally calls fsync(3p) and does exactly what you mean by "flush".

@Darksonn (Contributor)

Right. This appears to be a misunderstanding of how try_clone works. Each clone has its own state field, so calling sync_all or flush on a different clone of the file will not wait for pending operations on the original file object.

I see how the documentation of try_clone is confusing here. It would be good to improve it.


v0y4g3r commented Jun 30, 2022

> Right. This appears to be a misunderstanding of how try_clone works. Each clone has its own state field, so calling sync_all or flush on a different clone of the file will not wait for pending operations on the original file object.
>
> I see how the documentation of try_clone is confusing here. It would be good to improve it.

Thanks for your explanation, I will rewrite the MRE to see if that resolves the problem :)


v0y4g3r commented Jun 30, 2022

I've found the root cause.

tokio::fs::File internally maintains a state that contains a buffer. When you invoke tokio::fs::File::write_all and wait for the future to complete, the data is only guaranteed to have been copied into the inner state of the current File instance. If the File is cloned via try_clone, this state is not shared with the cloned instance, so data written through the original File is not guaranteed to be flushed to disk even if you call sync_data on the duplicated File instance.
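A sketch illustrating that pitfall (the handle names here are hypothetical):

```rust
use tokio::{fs::File, io::AsyncWriteExt};

// Illustration of the pitfall described above: syncing a try_clone'd handle
// does not wait for the original handle's in-flight buffered write.
async fn demo(mut original: File) -> std::io::Result<()> {
    let clone = original.try_clone().await?;

    original.write_all(b"hello").await?;
    // At this point the bytes may still sit in `original`'s internal buffer;
    // the actual write runs in the background on the blocking pool.

    clone.sync_all().await?;
    // The clone has its own state, so this sync does not cover the pending
    // write above: the data may not be in the file yet, let alone on disk.

    original.flush().await?; // waiting on the handle that wrote closes the gap
    clone.sync_all().await?; // now the written data is visible and persisted
    Ok(())
}
```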

Switching to std::fs::File does solve the problem, since std::fs::File is just a thin wrapper around the libc syscalls.
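A minimal sketch of that workaround, assuming the synchronous I/O is moved onto the blocking pool with tokio::task::spawn_blocking so it does not stall the runtime (the path and data here are hypothetical):

```rust
use std::{fs::File, io::Write, path::PathBuf};

// Workaround sketch: with std::fs::File the write syscall has returned by the
// time write_all returns, so a subsequent sync_all covers the written data.
async fn write_and_sync(path: PathBuf, data: Vec<u8>) -> std::io::Result<()> {
    tokio::task::spawn_blocking(move || -> std::io::Result<()> {
        let mut file = File::options().create(true).append(true).open(&path)?;
        file.write_all(&data)?; // data handed to the kernel before this returns
        file.sync_all()?; // fsync: persist file data and metadata
        Ok(())
    })
    .await
    .expect("blocking task panicked")
}
```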


v0y4g3r commented Jun 30, 2022

Maybe the documentation could be clearer about this internal buffering mechanism :)
