ANSI control characters not treated as zero width #24

Closed
stuhood opened this issue Nov 2, 2021 · 6 comments

@stuhood

stuhood commented Nov 2, 2021

Hey folks!

According to the README.md, control characters should be treated as zero width, but ANSI color sequences currently don't seem to be. Code like the following (using strip_ansi_escapes) fails for strings containing ANSI escape sequences:

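use unicode_width::UnicodeWidthStr;

// Asserts that unicode-width reports the same width for `s` as for `s`
// with its ANSI escape sequences stripped.
fn assert_width(s: String) {
    // `strip` returns a `Result` in the strip_ansi_escapes version used here, hence the `unwrap`.
    let stripped_width = std::str::from_utf8(&strip_ansi_escapes::strip(s.as_bytes()).unwrap())
        .unwrap()
        .width() as u16;
    let unicode_width = s.width() as u16;
    assert_eq!(
        stripped_width, unicode_width,
        "Mismatched width ({} vs {}) for `{:?}`",
        stripped_width, unicode_width, s
    );
}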

...such as:

"\u{1b}[1m========\u{1b}[0m"

Is this expected?

@Manishearth
Member

Manishearth commented Nov 3, 2021

It's unclear to me where the README file says that: the control characters are marked as "neutral" in https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt, and "neutral" is a narrow character in the spec (see http://www.unicode.org/reports/tr11/#ED7). The purpose of this crate is to follow the spec, not to implement rendered column width in terminals.

@stuhood
Author

stuhood commented Nov 3, 2021

@Manishearth: Sorry, I guess it wasn't the README but the docstring for width:

/// Control characters are treated as having zero width.

@Manishearth Manishearth reopened this Nov 3, 2021
@Manishearth
Member

Hmmm, that's interesting. I'm not sure where that came from; I'll need to go through the spec a bit more when I have time.

@stuhood
Author

stuhood commented Nov 3, 2021

It seems like the issue might be that colors are actually represented by sequences of characters, rather than with any individual character... strip_ansi_escapes uses the vte crate, which makes it clearer that a state machine is required to interpret the sequence.

A string like the example from the description (here "\u{1b}[32m=======\u{1b}[0m"), rendered with:

format!("Widths: {:#?}", s.chars().map(|c| format!("{:?}: {:?}", c, c.width())).collect::<Vec<_>>())

...looks like:

Widths: [
    "'\\u{1b}': None",
    "'[': Some(1)",
    "'3': Some(1)",
    "'2': Some(1)",
    "'m': Some(1)",
    "'=': Some(1)",
    "'=': Some(1)",
    "'=': Some(1)",
    "'=': Some(1)",
    "'=': Some(1)",
    "'=': Some(1)",
    "'=': Some(1)",
    "'\\u{1b}': None",
    "'[': Some(1)",
    "'0': Some(1)",
    "'m': Some(1)",
]

@Manishearth
Member

Oh, yeah, this is behaving as expected then (the \u{1b} is width zero). The docstring is about Unicode control characters, not general control sequences: terminals and other higher-level systems are welcome to define their own control sequences.
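
The distinction above can be illustrated with the standard library alone. This is only a sketch: classify is a hypothetical helper, and char::is_control stands in for the crate's own notion of a control character (both cover the Cc category, which includes U+001B). It shows that only the escape character itself is a control character, while the rest of a color sequence is ordinary printable ASCII.

```rust
/// Pairs each character of `s` with whether the standard library
/// considers it a control character (Unicode general category Cc).
/// Hypothetical helper for illustration; not part of unicode-width.
fn classify(s: &str) -> Vec<(char, bool)> {
    s.chars().map(|c| (c, c.is_control())).collect()
}

/// Counts the control characters in `s`.
fn control_count(s: &str) -> usize {
    classify(s).iter().filter(|(_, is_ctrl)| *is_ctrl).count()
}
```

For "\u{1b}[32m==\u{1b}[0m", control_count returns 2 out of 11 characters: the two ESC characters. The '[', digits, and 'm' are regular characters, which is why a per-character width lookup still counts them.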

@stuhood
Author

stuhood commented Nov 3, 2021

Ok, thanks: that makes sense.

For whoever might come across this issue next, this is what I think counting display columns with unicode-width looks like after stripping ANSI escape sequences using vte:

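use vte::{Parser, Perform};
use unicode_width::UnicodeWidthChar;

// Counts the blocks (columns) occupied by `s`, ignoring ANSI escape sequences.
// Assumes a vte version whose `Parser::advance` takes a single byte per call.
fn count_blocks(s: &str) -> usize {
    struct BlockCounter(usize);

    impl Perform for BlockCounter {
        // Called for each printable character outside of an escape sequence.
        fn print(&mut self, c: char) {
            self.0 += c.width().unwrap_or(0);
        }

        // Called for C0/C1 control bytes; only a newline is counted here.
        fn execute(&mut self, byte: u8) {
            if byte == b'\n' {
                self.0 += 1;
            }
        }
    }

    let mut block_counter = BlockCounter(0);
    let mut parser = Parser::new();
    // Feed the parser byte by byte; escape sequences never reach `print`.
    for b in s.as_bytes() {
        parser.advance(&mut block_counter, *b);
    }
    block_counter.0
}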
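
For comparison, here is a rough std-only sketch of the kind of state machine mentioned earlier in the thread. It is not how vte or strip_ansi_escapes are implemented: strip_csi and State are hypothetical names, and the sketch only recognizes CSI sequences (ESC, then '[', then parameters, then a final character in the range 0x40..=0x7E), which is enough for color codes like the ones in this issue but not for other escape types such as OSC.

```rust
/// States of a deliberately minimal escape-sequence skipper.
#[derive(Clone, Copy)]
enum State {
    Ground, // ordinary text
    Escape, // just saw ESC (U+001B)
    Csi,    // inside ESC [ ... waiting for the final character
}

/// Returns `s` with CSI sequences removed.
/// Hypothetical helper; handles only the CSI form, by assumption.
fn strip_csi(s: &str) -> String {
    let mut state = State::Ground;
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        state = match (state, c) {
            (State::Ground, '\u{1b}') => State::Escape,
            (State::Ground, _) => {
                out.push(c);
                State::Ground
            }
            (State::Escape, '[') => State::Csi,
            // Simplification: any other character after ESC is dropped
            // as a two-character escape.
            (State::Escape, _) => State::Ground,
            // A character in 0x40..=0x7E terminates the CSI sequence.
            (State::Csi, '\u{40}'..='\u{7e}') => State::Ground,
            (State::Csi, _) => State::Csi,
        };
    }
    out
}
```

The stripped string could then be measured with UnicodeWidthStr::width, as in the assert_width example from the description. For real terminal output, a full parser such as vte is likely the safer choice.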
