
Fix stdout and stderr buffering on windows #2734

Merged · 9 commits · Sep 23, 2020

Conversation

MikailBag
Contributor

Motivation

Should fix #2380 (but not tested on Windows yet).

Solution

Wrap Blocking in Stdout and Stderr in a special wrapper, which is completely transparent on Linux and intercepts writes on Windows (a rough sketch is included at the end of this description):

  1. If the input buffer is larger than MAX_BUF, we shrink it to MAX_BUF.
  2. If the input buffer now has an incomplete char at the end, we shrink it further to drop that char.
    This should work, because in general it is OK for AsyncWrite::write to write fewer bytes than requested.

I don't think I can add a test for it, because on CI stdout is not a handle to a console, so the panic cannot occur.
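
Here is a rough sketch of the trimming idea (the wrapper name SplitWriter is made up for illustration; the real change may be organized differently):

use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;

const MAX_BUF: usize = 16 * 1024;

// Hypothetical wrapper: a pass-through everywhere except Windows consoles.
struct SplitWriter<W> {
    inner: W,
}

impl<W: AsyncWrite + Unpin> AsyncWrite for SplitWriter<W> {
    fn poll_write(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        mut buf: &[u8],
    ) -> Poll<io::Result<usize>> {
        let this = self.get_mut();
        if cfg!(windows) {
            // 1. Never hand more than MAX_BUF bytes to the blocking stdio layer.
            if buf.len() > MAX_BUF {
                buf = &buf[..MAX_BUF];
            }
            // 2. If the buffer is not valid UTF-8 (e.g. it ends with an
            //    incomplete character), cut it down to the valid prefix.
            //    Writing fewer bytes than requested is allowed by AsyncWrite.
            if let Err(e) = std::str::from_utf8(buf) {
                if e.valid_up_to() > 0 {
                    buf = &buf[..e.valid_up_to()];
                }
            }
        }
        Pin::new(&mut this.inner).poll_write(cx, buf)
    }

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Pin::new(&mut self.get_mut().inner).poll_flush(cx)
    }

    fn poll_shutdown(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Pin::new(&mut self.get_mut().inner).poll_shutdown(cx)
    }
}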

@Darksonn added the A-tokio (Area: The main tokio crate), C-enhancement (Category: A PR with an enhancement or bugfix), and M-io (Module: tokio/io) labels on Jul 31, 2020
@MikailBag
Contributor Author

MikailBag commented Aug 7, 2020

I tested my change on wine64, ver. wine-5.0.1.
The program I used:

use std::str;
use tokio::io::{self, AsyncWriteExt};

const MAX_BUF: usize = 16 * 1024;

#[tokio::main]
async fn main() {
    assert_eq!("█".len(), 3);
    // this will actually have size MAX_BUF * 3 bytes;
    // I could only trigger the error with a buffer this big!
    let string = str::repeat("█", MAX_BUF);
    let mut stdout = io::stdout();
    stdout.write_all(string.as_bytes()).await.unwrap();
    stdout.flush().await.unwrap();
}

On tokio v0.2.22:

(some chars)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: "Windows stdio in console mode does not support writing non-UTF-8 byte sequences" }', src/main.rs:13:5
stack backtrace:

On this PR's head:

(some chars, in a much larger amount, as expected)

That's why I think my PR actually fixes the mentioned bug.
cc @brunoczim as the original reporter.

@Darksonn
Contributor

Darksonn commented Sep 9, 2020

I feel like it should be possible to write an OS-independent test for this by making a writer that verifies that whatever is written to it is valid UTF-8.
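
For example, a minimal sketch of such a writer (the name is made up for illustration, this is not the test that ended up in the PR):

use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;

// Hypothetical sink that fails the test if any single write contains data
// that is not valid UTF-8, mimicking what a Windows console would reject.
struct Utf8CheckingSink;

impl AsyncWrite for Utf8CheckingSink {
    fn poll_write(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<io::Result<usize>> {
        assert!(std::str::from_utf8(buf).is_ok(), "received a non-UTF-8 chunk");
        Poll::Ready(Ok(buf.len()))
    }

    fn poll_flush(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Poll::Ready(Ok(()))
    }

    fn poll_shutdown(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Poll::Ready(Ok(()))
    }
}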

Comment on lines 46 to 49
return Poll::Ready(Err(std::io::Error::new(
    std::io::ErrorKind::InvalidInput,
    "provided buffer does not contain utf-8 data",
)));
Contributor

The original issue says:

So, windows stdout as a console only accepts utf-16 characters. Stdlib's stdout detects whether the Stdout is a console in Windows, and then it assumes the byte buffer is encoded in utf-8 and then converts it to utf-16 so it can be printed.

So stdout might not be a console, in which case non-utf8 is ok. If the data isn't utf-8, I think we should just try to forward it anyway.

Contributor Author

I changed the trimming logic.
Now, if the buffer has a UTF-8 error at the start, or more than 8 bytes would have to be skipped, we assume that the caller really wants to print non-UTF-8 data.
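
Roughly, the check becomes something like this (a sketch of the heuristic described above, not the exact code from the PR):

// Hypothetical helper: decide how many bytes of `buf` to forward.
fn bytes_to_write(buf: &[u8]) -> usize {
    // Assumed threshold: a real "text" write never needs more than a few
    // trailing bytes trimmed off.
    const MAX_SKIP: usize = 8;
    match std::str::from_utf8(buf) {
        // Valid UTF-8: forward everything.
        Ok(_) => buf.len(),
        Err(e) => {
            let valid = e.valid_up_to();
            if valid == 0 || buf.len() - valid > MAX_SKIP {
                // Error at the very start, or far from the end: the caller is
                // probably writing binary data on purpose, so don't trim.
                buf.len()
            } else {
                // Otherwise assume the tail is an incomplete character and
                // stop right before it.
                valid
            }
        }
    }
}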

Contributor

This seems rather complicated. Why not just set a boolean flag if you've already printed non-UTF-8 data?

Contributor Author

MikailBag commented Sep 9, 2020

I think such a flag can lead to the following scenario:

  1. The user creates a tokio::io::Stdout instance.
  2. The user passes this instance to a library. That library tries to write some binary data, gets an error and ignores it. The wrapper observes this and sets the flag to true.
  3. Now the user tries to write a long UTF-8 string into the same Stdout. However, since the flag is set, the wrapper no longer trims the buffer, and the write operation fails.

I.e. one binary write "poisons" the Stdout instance, and subsequent legitimate "text" writes will sometimes fail.

Contributor

I read the code again. You've convinced me.

@Darksonn
Contributor

So, although the code appears correct, I think it could be sped up significantly by only UTF-8 checking the last few bytes of the data?
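
Something along those lines could look like this, for example (an illustrative sketch; a UTF-8 character is at most 4 bytes long, so only the very end of the buffer can be cut mid-character):

// Hypothetical tail-only check: find where to cut without validating the
// whole buffer, by locating the start of the last character among the final
// few bytes.
fn trim_incomplete_suffix(buf: &[u8]) -> &[u8] {
    let tail_start = buf.len().saturating_sub(4);
    for i in (tail_start..buf.len()).rev() {
        // Continuation bytes look like 0b10xxxxxx; the first byte that is not
        // a continuation byte starts the last character.
        if buf[i] & 0b1100_0000 != 0b1000_0000 {
            let needed = match buf[i] {
                b if b & 0b1000_0000 == 0b0000_0000 => 1, // ASCII
                b if b & 0b1110_0000 == 0b1100_0000 => 2,
                b if b & 0b1111_0000 == 0b1110_0000 => 3,
                _ => 4,
            };
            // Keep the buffer if the last character is complete, otherwise cut
            // it off. (A real implementation also has to decide what to do when
            // the data is not UTF-8 at all.)
            return if buf.len() - i >= needed { buf } else { &buf[..i] };
        }
    }
    buf
}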

@Darksonn
Contributor

I have opened a new issue to remove the full utf-8 check.

@MikailBag
Contributor Author

Oh, I've implemented a more efficient validation (which considers at most the 8 final bytes, IIRC), but unfortunately my laptop broke at that very time :(
I'll submit a PR as soon as possible.

@MikailBag
Contributor Author

Followup: #2888

Labels
A-tokio (Area: The main tokio crate), C-enhancement (Category: A PR with an enhancement or bugfix), M-io (Module: tokio/io)
Development

Successfully merging this pull request may close these issues.

Unicode characters are split when writing to windows terminal